Madeline Version 0.93

May, 2001

(c) 1999 by the Regents of the University of Michigan, Ann Arbor.


Section 1
Overview and Features

What is Madeline?

Madeline is software written in ANSI C/C++ for:

Supported Platforms

Madeline has been compiled for the following platforms:

FUSION Study Support

Madeline was designed to meet the needs of the Finland-United States Investigation of NIDDM Genetics (FUSION) Study. Because of this, Madeline has specific knowledge about FUSION study IDs. A subset of Madeline’s functionality makes use of this knowledge (see FUSION box below).

The program continues to be modified to make it useful for genetic studies in general. Paragraphs or headings marked with the FUSION symbol describe FUSION-specific functionality.


FUSION
The Finland-United States Investigation of NIDDM Genetics Study

The aim of the FUSION study is to map and identify susceptibility genes for non-insulin dependent diabetes mellitus (NIDDM) and for the intermediate quantitative traits associated with NIDDM.

FUSION individual IDs are eight characters long. The first four characters represent the family ID. The fifth character is a dash, a plus sign, or a capital letter from A to Z. The final three digits uniquely identify the individual within the pedigree.



Sample ID:                 1021+402
                           |   |  |
         +-----------------+   |  +---------------------+
         |                     |                        |
Family ID begins with:  Encoded flag symbol:       Individual ID:
- "0" for FUSION 1      "-" for FUSION 1           "100" for probands
- "1" for FUSION 2 fam. "+" for FUSION 2           "200" for fathers
- "C" for control fam.  "A" to "Z" for resampled   "300" for mothers
- "T" for Trios         FUSION records             "400" for siblings of the
                                                         proband (enumerated)
                                                   "500" for proband spouses 
                                                         (enumerated)
                                                   "600" for proband offspring
                                                         (enumerated)
                                                   "700" for sibling spouses
                                                         (enumerated)
                                                   "800" for sibling offspring
                                                         (enumerated)

  

Madeline is internally aware of the structure of FUSION IDs and uses this information in specific situations to:

  • Determine family IDs when a family ID field is not provided in the pedigree table.
  • Determine the proband in a sibship when a proband indicator field is not provided in the pedigree table.
  • Insert virtual FUSION parents in a family when only siblings were sampled, even when the father and mother fields contain missing values.
  • Connect spouses to one another even if they have no sampled offspring.
  • Re-connect dummied-in siblings (known via their offspring) to their parents to restore proper pedigree structure when required.

Madeline currently uses the following rules to determine if an ID in a dataset is a FUSION ID:

  • The ID must be exactly 8 characters long.
  • The first character must be in the set {0,1,C,T,9}. The numeral 9 is included to support constructed IDs used when part of a larger pedigree is split off for separate analytical treatment.
  • The fifth character must be in the set {-,+,A-Z}. The capital letters A-Z are allowed to support resampled IDs.
  • The sixth character must be in the set {0-9}. 0 is allowed to support FUSION 1 control "probands" who have a 0 instead of a 1 in this position.

A data set can easily contain a mixture of FUSION IDs and non-FUSION IDs. Only IDs meeting the above criteria will be construed as FUSION IDs.
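The rules above are simple enough to express directly in code. The C fragment below is a minimal sketch, not Madeline's own implementation: the function names looks_like_fusion_id() and fusion_family_id() are hypothetical, and only the four stated rules are checked.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical check: does id satisfy the four FUSION ID rules listed above? */
bool looks_like_fusion_id(const char *id)
{
    if (id == NULL || strlen(id) != 8)
        return false;                          /* exactly 8 characters */
    if (strchr("01CT9", id[0]) == NULL)
        return false;                          /* first character in {0,1,C,T,9} */
    if (!(id[4] == '-' || id[4] == '+' || (id[4] >= 'A' && id[4] <= 'Z')))
        return false;                          /* fifth character in {-,+,A-Z} */
    return isdigit((unsigned char)id[5]) != 0; /* sixth character in {0-9} */
}

/* Hypothetical helper: copy the four-character family ID of a FUSION ID. */
void fusion_family_id(const char *id, char family[5])
{
    memcpy(family, id, 4);
    family[4] = '\0';
}
```

For example, "1021+402" passes all four tests and has the family ID "1021", while "1021*402" fails the fifth-character rule and would be treated as a non-FUSION ID.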

 

Running the Program Interactively and in Batch Mode

Instructions to Madeline are entered at a command prompt. Madeline's command interpreter is not sensitive to capitalization. However, capitalization is often used in this document for clarity of presentation.

Madeline can be run interactively or in batch mode (Fig. 1.1). To run Madeline interactively, type the name of the program at your system prompt and press return. Madeline’s "M>" prompt will appear.

There are two ways to run batch files. The first way is to provide the name of a batch file containing commands after the name of the program on the command line. The second way is to start Madeline interactively and then use the run command to execute the batch file. Madeline returns to interactive mode if an error occurs, or when a batch file terminates without a goodbye or quit command.

csvr1%                          <-- system prompt (on UNIX)
csvr1% madeline                 <-- starting the program in interactive mode

MADELINE Version  0.910
Copyright (c) 1999 by Edward H. Trager 
and the FUSION Study Group
(Finland-United States Investigation of NIDDM Genetics Study),
University of Michigan
Ann Arbor, Michigan, USA

Help facility loaded.

+-----------------------+-----------+-----------------------------------------+
| Variable or State Flag| Setting   | Description                             |
+-----------------------+-----------+-----------------------------------------+
| AutoExclude           | ON        | Exclude pedigrees automatically         |
| Color                 | ON        | Draw pedigrees in color                 |
| DividedDrawings       | ON        | Paginate drawings by founding group     |
| EvaluationInterval    |   0.50 cM | Value to write to control file.         |
| Help                  | HTML      | Extended HTML help documentation        |
| Language              | ENGLISH   | Language convention used for date, time |
| OffEndDistance        |  10.00 cM | Value to write to control file          |
| Orientation           | AUTOMATIC | Automatic based on drawing dimensions   |
| PaperMargin           | 1.00 cm   | Margin (in cm) on all four sides        |
| PaperSize             | USLETTER  | 8.5 x 11.0 inches                       |
| SaveAlleleFrequencies | OFF       | Calculate new frequencies on next OPEN  |
| Time                  | Current   | 17:07 Tuesday, September 28, 1999       |
| Verbosity             | VERBOSE   | All messages are printed to the console |
+-----------------------+-----------+-----------------------------------------+
M>
M>                              <-- Madeline’s  "M>" prompt appears
M> quit                         <-- quit interactive session
Releasing resources ...
Goodbye!
csvr1% 
csvr1% madeline process.batch   <-- starting the program in batch mode
Madeline Version  0.910
...
open '\test\chr20.dbf'          <-- executing first batch command
Calculating allele frequencies for   7. D20S173...
Calculating allele frequencies for  10. D20S889...
Calculating allele frequencies for  13. D20S898...
...

Fig. 1.1. Starting Madeline. Madeline can be run either interactively or in batch mode.

 

Start up Batch File

Madeline can set parameters and run commonly needed commands automatically each time it starts: place a special batch file named "autorun.bat" in the working directory from which Madeline will be invoked.

Any commands that can normally be invoked on the command line or in a batch file can be placed into autorun.bat. Assignments that specify default field names or environment settings are typically placed in autorun.bat (Fig. 1.2).

//
// Typical autorun.bat file for Unix/Linux environment:
//

//
// Environment settings:
//
quiet
set language to English
FileEditor="vi"
PostscriptViewer="gv"
//
// Pedigree drawing-specific settings:
//
set color off
set PaperSize to A4
// margin in centimeters:
set PaperMargin to 1.5
set orientation to automatic
//
// Pedigree database-specific settings:
//
GenderField='GENDER'
FamilyIDField='FAMILY'
IndividualIDField='INDIVIDUAL'
//
// Map standard missing value indicators:
//
NumericMissingValue[0]=-1
NumericMissingValue[1]=-9
//
// Map database-specific settings:
//
PositionField="POSTN"
OrdinalField ="ORDNL"

Fig. 1.2. Example autorun.bat file.

 

Starting with Madeline v. 0.91, a warning message is produced if an autorun.bat file is not found, and the "M>" prompt changes accordingly (Fig. 1.3).

...
Could not find "autorun.bat" file.
...
1 WARNING M>

Fig. 1.3. In Madeline v. 0.91 and later, a warning is produced if the autorun.bat file is not present.

 

Overview of Database Tables Used by the Program

A database table is a rectangular array of data. A record is a row in the array. A field is a column in the array. One row or record contains the data -- all the measured variables -- for one entity.

In Madeline, the measured entity is either an individual or a genetic marker. Key fields are fields that identify the entity; data fields contain the data measured on the entity. The specific set of key fields required depends upon the context. To uniquely identify an individual, two key fields are required: (1) a family identifier, and (2) an individual identifier. Other entities, such as a genetic marker, are identified by other combinations of key fields.

In Madeline, only three types of database tables occur:

  1. Pedigree tables
  2. Genetic Map tables
  3. Marker tables

Each type is described in turn below.

Pedigree Tables

In a pedigree table, each row or record contains the data for one individual. In Madeline, the names of the family and individual ID fields are stored in variables called FamilyIDField and IndividualIDField, respectively. Basic pedigree reconstruction additionally requires knowledge of the father, mother, and gender of each individual. Therefore, Madeline defines a set of five core fields that must be present in every pedigree database:

  1. FamilyIDField -- database key field
  2. IndividualIDField -- database key field
  3. FatherIDField -- required for pedigree reconstruction
  4. MotherIDField -- required for pedigree reconstruction
  5. GenderField -- required for pedigree reconstruction

The remaining data fields in a pedigree database can be classified into two groups: (1) phenotype and (2) genotype fields. Together with the core fields, this gives three categories, and Madeline classifies every field in a database table into one of them using the single-letter identifiers shown below:

  1. "C" -- Core fields
  2. "P" -- Phenotype fields
  3. "G" -- Genotype fields

The complete set of core fields consists of the five obligatory core fields listed above, as well as some additional, non-obligatory core "phenotype" fields such as AffectionStatusField and DateOfBirthField.

Genetic Map Tables

A map table contains map information related to markers on one or more chromosomes. The key fields in a map table are:

  1. ChromosomeField -- chromosome on which marker appears
  2. MarkerField -- name of the marker

The data fields in a map table are:

  1. PositionField -- map position of the marker, in centiMorgans, measured from the p terminus
  2. OrdinalField -- ordinal ranking of the marker in the map from 1 to n where n is the number of markers mapped for the given chromosome
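The relationship between the two data fields can be illustrated with a short sketch, which assumes a single chromosome and is not taken from Madeline: sorting markers by PositionField and numbering them from 1 to n yields values suitable for OrdinalField. MapEntry and assign_ordinals() are hypothetical names, and markers at identical positions receive ranks in an unspecified order because qsort() is not a stable sort.

```c
#include <stdlib.h>

/* Hypothetical in-memory form of one map table record. */
typedef struct {
    const char *marker;   /* MarkerField */
    double position;      /* PositionField, cM from the p terminus */
    int ordinal;          /* OrdinalField, 1..n */
} MapEntry;

/* qsort comparator: ascending map position. */
static int by_position(const void *a, const void *b)
{
    double pa = ((const MapEntry *)a)->position;
    double pb = ((const MapEntry *)b)->position;
    return (pa > pb) - (pa < pb);
}

/* Sorts the markers of one chromosome by position and numbers them 1..n. */
void assign_ordinals(MapEntry *entries, size_t n)
{
    qsort(entries, n, sizeof *entries, by_position);
    for (size_t i = 0; i < n; i++)
        entries[i].ordinal = (int)i + 1;
}
```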

Marker Tables

A marker table contains the alleles for a specific marker measured on a specific individual. Output from ABI machines is in this table format. This type of table has three key fields:

  1. FamilyIDField -- family of the individual
  2. IndividualIDField -- ID of the individual
  3. MarkerField -- name of the marker

There are only two essential data fields in a marker table:

  1. Allele1Field -- positive integer label assigned to the first allele
  2. Allele2Field -- positive integer label assigned to the second allele

In principle, the two allele fields could be represented by a single genotype field containing the numeric labels separated by a forward slash, "/". Madeline does not yet contain support for this option in marker tables.

Madeline provides support for integrating the information in a marker table into a pedigree table via the transpose and merge commands. The transpose command takes care of converting the paired allele fields into the single genotype fields expected in a pedigree table.
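The core of that conversion is combining two allele labels into one "a/b" string. The C helper below is a hypothetical sketch of that single step, not the transpose command itself; treating a non-positive label as missing and writing it as "0/0" (one of the default cmv[] entries given later in this section) is an assumption made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: combine Allele1Field and Allele2Field values into a
   genotype string such as "141/142". Non-positive labels are assumed to be
   missing and are written as "0/0". Returns the snprintf() result. */
int format_genotype(int allele1, int allele2, char *buf, size_t size)
{
    if (allele1 <= 0 || allele2 <= 0)
        return snprintf(buf, size, "0/0");
    return snprintf(buf, size, "%d/%d", allele1, allele2);
}
```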

Supported Database Formats

Madeline currently supports xbase (FoxPro, dBase III/IV), Visual FoxPro and SAS transport file formats, and space-delimited, column-aligned ASCII flat files. Madeline supports flat file tables directly by referencing a binary header file created using the recognize command. All pedigree databases are opened using the open command. Madeline’s database engine detects operating system and file byte-ordering at run time, thus permitting database tables from PCs to be opened on Unix workstations, and vice versa.
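Byte-order independence is a general technique rather than anything specific to Madeline. The sketch below shows two common approaches in C: probing the host's byte order at run time, and assembling a multi-byte integer from individual bytes with shifts. As a concrete case, the record count in an xbase file header is stored as a 32-bit little-endian integer. All function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores integers least-significant byte first. */
int host_is_little_endian(void)
{
    uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* inspect the lowest-addressed byte */
    return first == 1;
}

/* Reverses the byte order of a 32-bit value. */
uint32_t swap_u32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Reads a little-endian 32-bit value from a byte buffer; the result is the
   same on any host because it is built from individual bytes. */
uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

The shift-based reader needs no knowledge of the host's byte order, which is why it is often preferred to swapping bytes after the fact; the run-time probe is included because the text describes detection at run time.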

Supported Data Types

Madeline’s database engine supports character, numeric (floating point and integer), and date types of the supported database formats. A logical data type, such as the "L" field type of xbase, is not supported; use appropriately coded numeric variables instead. Other derived types, such as date-time or monetary types, are not supported.

Character Data

When character data are read from a database, leading and trailing space characters are trimmed. Thus, blank entries in a database appear as the empty string, "". When entered on the command line, literal character data must be delimited by a pair of matching single or double quotes, e.g., "0001-230" or '0980A'.
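A minimal sketch of the trimming rule in C; trim_spaces() is a hypothetical name, and, as the text describes, only the space character is treated as padding.

```c
#include <string.h>

/* Hypothetical helper: remove leading and trailing spaces in place.
   A field consisting only of spaces becomes the empty string "". */
char *trim_spaces(char *s)
{
    char *start = s;
    while (*start == ' ')
        start++;                          /* skip leading spaces */
    size_t len = strlen(start);
    while (len > 0 && start[len - 1] == ' ')
        len--;                            /* drop trailing spaces */
    memmove(s, start, len);               /* regions may overlap */
    s[len] = '\0';
    return s;
}
```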

Numeric Data

All numeric data types are converted to double-precision floating point numbers. Literal numeric values are entered on the command line without delimiters.

Logical or Boolean Data

In order to support multiple file formats and missing values in a uniform manner, Madeline does not recognize a logical data type separate from the numeric data type. In contexts where a value is to be interpreted as a logical value, Madeline treats zero as _false, and any non-zero non-missing value as _true. Binary true/false data should thus be coded using a numeric field type with values of 0, 1, and a missing value indicator if required.
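The three possible outcomes (false, true, or missing) can be sketched as follows. Representing a missing numeric value as a NaN is purely an assumption for this example; the text does not say how Madeline stores its internal missing value indicator. The Logical type and as_logical() are hypothetical.

```c
#include <math.h>

/* Hypothetical result of interpreting a numeric value as a logical. */
typedef enum { LOGICAL_FALSE = 0, LOGICAL_TRUE = 1, LOGICAL_MISSING = -1 } Logical;

/* Zero is false, any other non-missing value is true. Missing values are
   assumed (for illustration only) to be stored as NaN. */
Logical as_logical(double value)
{
    if (isnan(value))
        return LOGICAL_MISSING;
    return value != 0.0 ? LOGICAL_TRUE : LOGICAL_FALSE;
}
```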

Date Data

Date data read from a file are automatically converted to Julian day integers. When entered at the command line, dates must be enclosed in curly braces and must follow the ordering and capitalization conventions of the current language setting (Fig. 1.4). Madeline recognizes spaces, commas, periods, or forward slashes as delimiters between the month, day, and year elements of a date. Madeline recognizes correctly capitalized, unabbreviated month names and month ordinals. Madeline does not interpret two-digit years as belonging to the current century.
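For readers unfamiliar with Julian day numbers, the C sketch below shows the standard integer conversion from a Gregorian calendar date, plus the derived day of the week. It applies the Gregorian calendar to all dates (the proleptic Gregorian calendar); Madeline's own conversion may treat dates before the Gregorian reform differently, as the year-63 example in Fig. 1.4 suggests. The function names are hypothetical.

```c
/* Converts a (proleptic) Gregorian date to a Julian Day Number using the
   standard integer formula. */
long julian_day_number(int year, int month, int day)
{
    long a = (14 - month) / 12;        /* 1 for January and February, else 0 */
    long y = year + 4800 - a;          /* year counted from -4800, March-based */
    long m = month + 12 * a - 3;       /* 0 = March ... 11 = February */
    return day + (153 * m + 2) / 5 + 365 * y + y / 4 - y / 100 + y / 400 - 32045;
}

/* Day of the week for a Julian Day Number: 0 = Sunday ... 6 = Saturday. */
int day_of_week(long jdn)
{
    return (int)((jdn + 1) % 7);
}
```

With this formula, December 11, 1963 becomes day 2438375, and day_of_week() returns 3 (Wednesday), consistent with Fig. 1.4.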

M>show {December 11 1963}
{Wednesday, December 11, 1963}
M>show {December 11, 1963}
{Wednesday, December 11, 1963}
M>show {12/11/1963}
{Wednesday, December 11, 1963}
M>show {12/11/63}
{Sunday, December 9, 63}  <-- in the year 63 A.D., before the Gregorian calendar
M>show {dec 11 1963}     <-- Madeline does not recognize abbreviated month names ...
{}                       <-- ...so this evaluates to a missing date
M>set language to Suomi
M>show {11.12.1963}
{keskiviikko 11.12.1963}
M>

Fig. 1.4. Dates in Madeline. Dates entered at the command line must be delimited by curly braces and must adhere to the ordering and capitalization conventions of the current language setting.

 

Extent of Date Support

Date data may be displayed on pedigree drawings. Dates may also be used in an expression passed to a view or a draw command, to a subsetting command such as exclude, or to the sort command (which sorts the order of individuals on a pedigree drawing). There is currently no support for writing date data to an output file.

Missing Value Support

Madeline supports entry of missing values from the command line, and also provides a simple mechanism for the user to define sets of values in a database that should be mapped as missing values when the database is read by Madeline.

On the command line, Madeline provides the following external representations of its internal missing value indicators:

Some supported database formats, such as flat files and FoxPro database files, do not provide native missing value support for character and numeric types. Even when missing value support is provided by a database format, protocols in a study may require that different types of missing value codes be used when recording missing values. For example, in the FUSION Los Angeles data, different negative integers were used to code for assay pending, no assay, and no tube conditions.

Madeline therefore permits the user to specify lists of values that are to be treated as missing values. These lists of missing value indicators are stored in two arrays. CharacterMissingValue[] is used whenever character fields, including genotype fields, are referenced. NumericMissingValue[] is used whenever numeric fields are referenced (Table 1.1). For simplicity, these arrays can be referenced using their abbreviated names, cmv[] and nmv[], respectively.

Table 1.1. Character and numeric missing value arrays in Madeline.

Full Name                 Abbreviated Name   Default Values
------------------------  -----------------  --------------
CharacterMissingValue[]   cmv[]              cmv[0] = "."
                                             cmv[1] = "/"
                                             cmv[2] = "0/0"
                                             cmv[3] = "0/ 0"
                                             cmv[4] = "0/  0"
NumericMissingValue[]     nmv[]              nmv[0] = -9999

 

When data are read from a database, all native missing values (for example, a space-padded blank entry is a native missing value indicator in a flat file) and any values that match the values specified in Madeline’s CharacterMissingValue[] or NumericMissingValue[] arrays are converted to Madeline’s internal missing value indicators.
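The matching rule can be sketched in C as a simple list lookup. The function names are hypothetical; an empty string stands for a native blank entry, and numeric codes are compared exactly, which suits integer codes such as -9999 but is an assumption for non-integer values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: is a trimmed character value a native blank or one of the
   n entries of a cmv[]-style list? */
bool is_character_missing(const char *value, const char *const cmv[], size_t n)
{
    if (value[0] == '\0')
        return true;                       /* native blank entry */
    for (size_t i = 0; i < n; i++)
        if (strcmp(value, cmv[i]) == 0)
            return true;
    return false;
}

/* Hypothetical: does a numeric value match an entry of an nmv[]-style list? */
bool is_numeric_missing(double value, const double nmv[], size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (value == nmv[i])
            return true;
    return false;
}
```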

At startup, CharacterMissingValue[] and NumericMissingValue[] contain a set of default missing value indicators appropriate to most FUSION data. New values can be assigned to existing cells or appended to the end of these lists as required by the user (Fig. 1.5): this should be done before a database is opened so that the values will be recognized appropriately. The autorun.bat batch file is an appropriate place to set character and numeric missing value indicators. Note that all arrays in Madeline are zero-offset.

M>list cmv            <-- view CharacterMissingValue array
CMV has 5 elements:
CMV[ 0]="."
CMV[ 1]="/"
CMV[ 2]="0/0"
CMV[ 3]="0/ 0"
CMV[ 4]="0/  0"
M>cmv[5]="./."        <-- append new value to end of list
M>list cmv
CMV has 6 elements:
CMV[ 0]="."
CMV[ 1]="/"
CMV[ 2]="0/0"
CMV[ 3]="0/ 0"
CMV[ 4]="0/  0"
CMV[ 5]="./."
M>list nmv            <-- view NumericMissingValue array
NMV has 1 element:
NMV[ 0]=         -9999
M>nmv[0]=-1           <-- overwrite one value
M>nmv[1]=-9           <-- and append another value
M>list nmv
NMV has 2 elements:
NMV[ 0]=            -1
NMV[ 1]=            -9
M>

Fig. 1.5. Assigning missing value indicators. Missing value indicators may be assigned to existing cells or appended to the ends of Madeline’s character and numeric missing value lists.

 

Categorization of Data

Upon opening a pedigree table, Madeline categorizes each field into one of three categories: core ("C"), phenotype ("P"), or genotype ("G").

When a field is completely empty or contains only missing values, Madeline assigns the field to a null category represented by an asterisk, "*".

When required, Madeline allows the user to designate a subset of "P" phenotype fields as "V" covariate fields using the toggle command. Madeline does not automatically assign fields to the covariate category. Field categories are summarized in Table 1.2 and described in greater depth below.

Table 1.2. Summary of Field Categories in Madeline.

Data Category  Symbolic Designation  Description
-------------  --------------------  -------------------------------------------
Core           C                     Set of five required fields, such as
                                     GenderField, that must be present in all
                                     pedigree databases, plus additional optional
                                     fields, such as AffectionStatusField, that
                                     are not required by default but may be
                                     required for some operations.
Genotype       G                     Character fields containing two numeric
                                     labels separated by a forward slash
                                     character, e.g., "141/142".
Phenotype      P                     Character, numeric, or date fields that
                                     contain categorical or continuous phenotype
                                     information.
Covariate      V                     A subset of phenotype fields that are to be
                                     used as covariates. The user must use the
                                     toggle command to change the designation of
                                     a "P" field to "V".
Null           *                     Character, numeric, or date fields that are
                                     completely empty or contain only missing
                                     value indicators. In general, these fields
                                     cannot be operated upon.

 

Core Data Fields

Core "C" data fields provide key information about an individual (Table 1.3). Madeline identifies core fields by their names. These names are stored in internal variables whose values may be reassigned by the user. In conformance with the requirements of the supported database types, field names must be written in capital letters and cannot exceed 10 characters in length. Madeline automatically capitalizes and truncates any non-conformant field name identifiers.
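The capitalization and truncation rule amounts to the following. normalize_field_name() and MAX_FIELD_NAME are hypothetical names, and, as noted later in this section, Madeline does not otherwise validate the characters in a name.

```c
#include <ctype.h>
#include <stddef.h>

#define MAX_FIELD_NAME 10   /* xbase field name length limit */

/* Hypothetical: copy name into out, converting letters to upper case and
   truncating to MAX_FIELD_NAME characters. out must hold 11 bytes. */
void normalize_field_name(const char *name, char out[MAX_FIELD_NAME + 1])
{
    size_t i = 0;
    for (; i < MAX_FIELD_NAME && name[i] != '\0'; i++)
        out[i] = (char)toupper((unsigned char)name[i]);
    out[i] = '\0';
}
```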

Core data fields are either required or optional. The absence of one or more of the five required core fields will generate an error when a database is opened (an exception applies when only FUSION 1 data are used; see below).

Optional core fields may be required for some operations, but are not required by default. Madeline makes use of the additional information provided in optional core fields whenever they are present. For example, Madeline’s pedigree drawing functionality is enhanced by the presence of fields for affection, death, index case, monozygotic and dizygotic twin status.

Table 1.3. Core Data Fields in Madeline.

Variable Name         Description                        Default Value  Expected Field Type
--------------------  ---------------------------------  -------------  --------------------
I. Required core fields, which must always be present (1):
1. IndividualIDField  Individual identifier              "STUDYID"      Character only
2. FatherIDField      Father's identifier                "FATHER"       Character only
3. MotherIDField      Mother's identifier                "MOTHER"       Character only
4. GenderField        Gender                             "SEX"          Character or numeric
5. FamilyIDField (1)  Family identifier                  "FAMID"        Character only

II. Optional core fields:
AffectionStatusField  Affection status                   "NAFFECTE"     Numeric or character
DeathStatusField      Death status                       "DECEASED"     Numeric or character
IndexCaseField        Index case or proband indicator    "PROBAND"      Numeric only
LiabilityClassField   Liability class                    "LCLASS"       Numeric or character
MZTwinField           Monozygotic twin status indicator  "TWIN"         Character only
DZTwinField           Dizygotic twin status indicator    "DZTWIN"       Character only
DateOfBirthField      Date of birth                      "DOB"          Date only
DateOfDeathField      Date of death                      "DOD"          Date only

(1) The FamilyIDField is not required when data are restricted to FUSION 1 IDs only.

 

Interpretation of Core Data

Madeline interprets data from required and optional core fields in order to reconstruct pedigrees and evaluate key information. A clear understanding of how Madeline interprets core data is essential to proper use of the program.

Use of Arrays To Map External Values Into Internal Meanings

A key aspect of Madeline’s generality and flexibility is the use of a set of arrays to map external data values into internal meanings. We have already seen how Madeline uses CharacterMissingValue[] and NumericMissingValue[] in order to map external missing value indicators to uniform internal missing value representations. If a value in a core field such as the field for gender, affection status, or death status does not map to a missing value, Madeline uses a designated array for mapping the external categorical value into an internal representation.

For example, suppose the GenderField contains the value "F" for some record. Since "F" is not a missing value listed in CharacterMissingValue[], Madeline looks in CharacterSexValue[] (abbreviated as csv[]) and finds that "F" matches the second entry in the list, which is the entry reserved for female, _female. That is:

"F" = CharacterSexValue[ 1 ] = CharacterSexValue[ _female ]

So, Madeline knows that the individual is a female and records this information internally.

To ensure that Madeline recognizes values in core fields correctly, assignments to the designated arrays must be made before any open or load command.
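The lookup just described can be sketched in C. This is a hypothetical illustration, not Madeline's code: SEX_MALE and SEX_FEMALE stand in for _male and _female (C reserves file-scope identifiers that begin with an underscore), and treating an unmatched value as missing is an assumption. The check against CharacterMissingValue[] is assumed to have happened already.

```c
#include <string.h>

/* Hypothetical stand-ins for Madeline's _male (0) and _female (1). */
enum { SEX_MALE = 0, SEX_FEMALE = 1, SEX_MISSING = -1 };

/* Returns the index of the csv[] entry matching value, or SEX_MISSING. */
int map_character_sex(const char *value, const char *const csv[2])
{
    for (int i = SEX_MALE; i <= SEX_FEMALE; i++)
        if (strcmp(value, csv[i]) == 0)
            return i;
    return SEX_MISSING;
}

/* Same lookup for a numeric GenderField, using an nsv[]-style array. */
int map_numeric_sex(double value, const double nsv[2])
{
    for (int i = SEX_MALE; i <= SEX_FEMALE; i++)
        if (value == nsv[i])
            return i;
    return SEX_MISSING;
}
```

With the default csv[] contents ("M", "F"), the value "F" maps to index 1, the female entry, exactly as in the example above.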

Database Field Naming Conventions

Different database file formats impose different restrictions on the length and format of field names in a database. For example, up to 10 characters can be used for field names in an xbase file, but only up to 8 characters in a SAS transport file. Although Madeline now supports several different file formats, the program originally only supported the xbase file format. As a result, Madeline restricts field name identifiers as follows:

Madeline does not actively check for errors such as spaces or disallowed characters in field identifiers. This is the user's responsibility. Madeline also has no way of knowing in advance what type of database file will be opened. For example, the program will not notice if you enter a ten-letter name for use with a SAS transport file that permits only 8-letter field identifiers.

Family Identifier

The value in FamilyIDField tells Madeline the name of the family ID field in the database. The default value is "FAMID".

  The FamilyIDField is not strictly required when FUSION-compliant IDs are used. When the FamilyIDField is not present, Madeline automatically extracts the family identifier from individual IDs which "look" like FUSION IDs. However, FUSION 2 databases are likely to have "95x" individuals who are connected to pedigrees via unstudied individuals who are assigned IDs that are not FUSION-compliant. Thus, the FamilyIDField is required when reading such databases.

Individual and Parental Identifiers

The values in IndividualIDField, FatherIDField, and MotherIDField serve to identify the individual and parent ID fields in the database. The default values are "STUDYID", "FATHER", and "MOTHER", respectively.

Parent IDs should be present in both the FatherIDField and MotherIDField of all non-founder individuals. The program interprets any individual with missing value indicators for both parents as a founder.

In the event that one of the two parent IDs is missing for an individual or individuals in a sibship, Madeline provides a randomly-generated eight-letter identifier to represent the missing parent. The randomly-generated IDs begin and end with exclamation marks to distinguish them from regular IDs. Using the generated ID, Madeline constructs a virtual parent in memory who will appear on pedigree drawings (Fig. 1.6) and in output from the write command. Madeline assumes that the sibs are full sibs sharing a single pair of parents.

Fig. 1.6. Virtual parent in Madeline. A virtual parent with a randomly-generated ID (right) is constructed when the ID of one parent is missing among a sibship of individuals (not shown). Sibs are assumed to be full sibs.

 

  When FUSION-compliant IDs are used, it is possible to leave the FatherIDField and MotherIDField of non-founders both missing in cases where Madeline can determine the IDs of the parents. For example, Madeline knows that the parental IDs of a "100" or "401" individual must end in "200" and "300" for the father and mother, respectively. Madeline first looks for parents sampled during FUSION 1 or FUSION 2 in the database. If parents are not found in the database, Madeline dummies in virtual parents using FUSION 1 IDs. In other cases, if only one of the two parent IDs is missing, Madeline can reconstruct the correct ID of the missing parent from the parent whose ID is provided. For example, if an "801" individual is the offspring of a "402" sib, the missing parent’s ID must end in "702".
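Both inferences in the paragraph above can be expressed as small string operations on 8-character FUSION IDs. The C sketch below is hypothetical: it builds the FUSION 1 style IDs (flag "-") that would be used for dummied-in parents of a "100" or "4NN" individual, and derives the three-digit code of a sibling's spouse. The reverse mapping from a "7NN" spouse back to the "4NN" sibling is assumed by symmetry; the text only states the "402" to "702" case. The search of the database for already sampled parents is not shown.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical: for a proband ("100") or proband's sibling ("4NN"), build
   FUSION 1 style IDs for virtual parents ending in "200" and "300". */
bool fusion_virtual_parents(const char *child, char father[9], char mother[9])
{
    if (strlen(child) != 8)
        return false;
    const char *code = child + 5;          /* three-digit individual code */
    if (strcmp(code, "100") != 0 && code[0] != '4')
        return false;
    memcpy(father, child, 4);              /* same family ID */
    father[4] = '-';                       /* FUSION 1 flag */
    strcpy(father + 5, "200");
    memcpy(mother, father, 5);
    strcpy(mother + 5, "300");
    return true;
}

/* Hypothetical: given the known parent of a sibling's offspring, write the
   three-digit code of the other parent ("402" -> "702"). The "7NN" -> "4NN"
   direction is an assumption made by symmetry. */
bool fusion_other_parent_code(const char *known, char code[4])
{
    if (strlen(known) != 8 || (known[5] != '4' && known[5] != '7'))
        return false;
    code[0] = (known[5] == '4') ? '7' : '4';
    code[1] = known[6];
    code[2] = known[7];
    code[3] = '\0';
    return true;
}
```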

Gender Data

The default value for GenderField is "SEX". The GenderField can be either numeric or character. Madeline detects the field type when the database is opened. Madeline defines two constants, _male, which has a value of 0, and _female, which has a value of 1. These symbolic constants are used for indexing two arrays, NumericSexValue[] and CharacterSexValue[]. These arrays define the external values used in a database to designate gender (Table 1.4). Default values may be reassigned by the user as required.

Table 1.4. Character and Numeric Sex Value Arrays.

Array Name              Abbreviated Name   Default Values
----------------------  -----------------  ------------------
CharacterSexValue[]     csv[]              csv[_male  ] = "M"
                                           csv[_female] = "F"
NumericSexValue[]       nsv[]              nsv[_male  ] = 0
                                           nsv[_female] = 1

 

In Madeline, only terminal individuals without offspring may retain a gender attribute of missing. If during pedigree reconstruction Madeline detects any father or mother with a missing gender attribute, the program will automatically change the gender of the individual in memory to be consistent with the reconstruction, and will warn the user of the change. The database file on disk will not be changed.

Madeline will also automatically correct the gender attribute of mislabeled individuals in memory, for example, of a male listed as a mother, or of a female listed as a father. Madeline always warns the user of these types of database errors. Again, the database file on disk will not be changed -- that is the user's responsibility.

Madeline will warn the user and then terminate abruptly if conflicting and unresolvable gender roles exist for an individual, for example if an individual is listed as both a mother and a father.

Monozygotic and Dizygotic Twin Data

The MZTwinField should remain blank or missing for non-twins, and should contain a single-letter identifier for each twin pair or group. For example, "A" can be used to designate the first twin pair in a family, "B" the second pair, and so on. Starting with version 0.90 of the program, MZTwinField is considered an optional core field.

The optional DZTwinField, if present, should be coded in a similar manner to designate dizygotic twins.

Affection Status Data

The AffectionStatusField may be either numeric or character. Madeline defines two symbolic constants for describing the affection status of sampled individuals (the underscores are used to avoid confusion with possible field names and are required):

  • _unaffected -- sampled individual, unaffected
  • _affected -- sampled individual, affected

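In addition to these two categories, Madeline also recognizes the following categories for mapping unstudied individuals:

  • _UnstudiedUnaffected -- unstudied, reported unaffected
  • _UnstudiedAffected -- unstudied, reported affected
  • _UnstudiedConflicting -- unstudied, conflicting reports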

These additional categories are useful for drawing extended pedigrees which may include unstudied individuals in addition to sampled individuals. Madeline defines two arrays, CharacterAffectionStatus[] and NumericAffectionStatus[], for mapping external affection status values to one of the five internally recognized categories (Table 1.5).

Table 1.5. Character and Numeric Affection Status Arrays.

Array Name                  Abbreviated Name   Default Values
--------------------------  -----------------  --------------------------------
CharacterAffectionStatus[]  cas[]              cas[_unaffected] = "0"
                                               cas[_affected  ] = "1"
                                               cas[_UnstudiedUnaffected] = "2"
                                                 (unstudied, reported unaffected)
                                               cas[_UnstudiedAffected  ] = "3"
                                                 (unstudied, reported affected)
                                               cas[_UnstudiedConflicting] = "4"
                                                 (unstudied, conflicting reports)
NumericAffectionStatus[]    nas[]              nas[_unaffected] = 0
                                               nas[_affected  ] = 1
                                               nas[_UnstudiedUnaffected] = 2
                                                 (unstudied, reported unaffected)
                                               nas[_UnstudiedAffected  ] = 3
                                                 (unstudied, reported affected)
                                               nas[_UnstudiedConflicting] = 4
                                                 (unstudied, conflicting reports)

 

Note that categories 2-4 refer only to unstudied individuals. Guard against using the externally mapped values of categories 2-4 for sampled individuals, especially if the write command is used to produce a file for analysis.
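One way to act on this advice is to map each value to its category and then flag sampled individuals whose category is 2, 3, or 4. The C sketch below assumes the category order of Table 1.5 and uses hypothetical AFF_ names in place of Madeline's underscore constants; how an individual is known to be sampled is left to the caller.

```c
#include <stdbool.h>

/* Hypothetical names for the five internal categories, in the order of
   Table 1.5: _unaffected, _affected, _UnstudiedUnaffected,
   _UnstudiedAffected, _UnstudiedConflicting. */
enum {
    AFF_UNAFFECTED,              /* 0 */
    AFF_AFFECTED,                /* 1 */
    AFF_UNSTUDIED_UNAFFECTED,    /* 2 */
    AFF_UNSTUDIED_AFFECTED,      /* 3 */
    AFF_UNSTUDIED_CONFLICTING,   /* 4 */
    AFF_N_CATEGORIES,
    AFF_MISSING = -1
};

/* Maps a numeric AffectionStatusField value through an nas[]-style array. */
int map_numeric_affection(double value, const double nas[AFF_N_CATEGORIES])
{
    for (int i = 0; i < AFF_N_CATEGORIES; i++)
        if (value == nas[i])
            return i;
    return AFF_MISSING;
}

/* True for categories 2-4, which are reserved for unstudied individuals. */
bool is_unstudied_category(int category)
{
    return category >= AFF_UNSTUDIED_UNAFFECTED &&
           category <= AFF_UNSTUDIED_CONFLICTING;
}
```

A sampled individual for whom is_unstudied_category() returns true would be a candidate for correction before the write command is used to produce an analysis file.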

Death Status Field

The optional DeathStatusField may be either numeric or character. Madeline defines the constants _alive, with a value of 0, and _dead, with a value of 1, for indexing the CharacterDeathStatus[] and NumericDeathStatus[] arrays used to map external values in the DeathStatusField into internal representations (Table 1.6).

Table 1.6. Character and Numeric Death Status Arrays.

Array Name              Abbreviated Name   Default Values
----------------------  -----------------  -----------------
CharacterDeathStatus[]  cds[]              cds[_alive] = "N"
                                           cds[_dead ] = "Y"
NumericDeathStatus[]    nds[]              nds[_alive] = 0
                                           nds[_dead ] = 1

 

Index Case Field

The optional IndexCaseField must be numeric. Madeline assumes that the probands or index cases will be coded using a value of 1, and all other individuals with a value of 0.

 When FUSION-compliant IDs are used, Madeline automatically determines which individuals are probands directly from the IndividualIDField, making the IndexCaseField unnecessary.

Liability Class Field

Some output formats, such as Genehunter, have the option of including liability class information. The LiabilityClassField may be numeric or character. Madeline does not interpret the values in this field.

Date of Birth and Death Data

The DateOfBirthField and DateOfDeathField are optional core date fields. When these fields are present, Madeline checks that the dates in them are reasonable, and uses dates of birth to look for twins who have not been designated as such in the MZTwinField or DZTwinField.

Genotype Data

Genotype "G" data are character fields that contain allelic marker data separated by the forward slash "/" character. The allele labels themselves must be numeric, non-alphabetic labels, e.g. "141/142".
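The following C++17 sketch shows what the "numeric alleles separated by /" rule implies for a parser. ParseGenotype() and the Genotype type are hypothetical names; how Madeline itself represents untyped or malformed genotypes is not described here, so the sketch simply returns an empty std::optional for anything that does not follow the rule.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <utility>

// A genotype as a pair of numeric allele labels, e.g. "141/142" -> {141, 142}.
using Genotype = std::pair<int, int>;

// True if s is a non-empty run of decimal digits short enough to fit in an int.
bool IsNumericLabel(const std::string& s) {
    if (s.empty() || s.size() > 9) return false;
    for (unsigned char c : s)
        if (!std::isdigit(c)) return false;
    return true;
}

// Split a genotype field at its single '/' and convert both allele labels.
// Returns std::nullopt if there is not exactly one '/' or if either label
// is not purely numeric (alphabetic labels such as "A/B" are rejected).
std::optional<Genotype> ParseGenotype(const std::string& field) {
    const auto slash = field.find('/');
    if (slash == std::string::npos || field.find('/', slash + 1) != std::string::npos)
        return std::nullopt;
    const std::string a = field.substr(0, slash);
    const std::string b = field.substr(slash + 1);
    if (!IsNumericLabel(a) || !IsNumericLabel(b)) return std::nullopt;
    return Genotype{std::stoi(a), std::stoi(b)};
}
```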

The names of genotype fields should be the names of the markers themselves. This allows Madeline to automatically place the genotype fields into map order whenever a map database for the markers is loaded using the load command. Make sure that marker names in the map database are capitalized to correspond with the required capitalization of field names.

Estimation of Allele Frequencies from Genotype Data

When a database is opened, Madeline automatically estimates allele frequencies for all genotype fields by gene counting, ignoring family relationships. Allele frequencies are estimated from all records in the database. Allele frequencies calculated from one database may be saved for use when processing another database by issuing the set SaveAlleleFrequencies on command.
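Gene counting that ignores family relationships amounts to counting every allele in every typed record and dividing by the total number of alleles counted. A minimal C++17 sketch, assuming genotypes have already been parsed into numeric allele pairs and untyped records dropped; EstimateAlleleFrequencies() is a hypothetical name, not part of Madeline.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// A typed genotype as a pair of numeric allele labels.
using Genotype = std::pair<int, int>;

// Gene counting that ignores family relationships: each typed record
// contributes two alleles, and an allele's frequency is its count divided
// by the total number of alleles counted.
std::map<int, double> EstimateAlleleFrequencies(const std::vector<Genotype>& typed) {
    std::map<int, int> counts;
    for (const auto& g : typed) {
        ++counts[g.first];
        ++counts[g.second];
    }
    std::map<int, double> freq;
    const double total = 2.0 * typed.size();
    for (const auto& [allele, n] : counts)
        freq[allele] = n / total;  // counts is empty when typed is empty
    return freq;
}
```

For example, the three genotypes 141/142, 141/141, and 142/145 contribute six alleles, giving frequencies of 3/6 for allele 141, 2/6 for allele 142, and 1/6 for allele 145.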

Phenotype Data

Phenotype "P" fields are any remaining fields that are not core "C" or genotype "G" fields. Phenotype fields may be character, numeric, or date fields, and are assumed to contain categorical or continuous phenotype information. Because date fields cannot be written to output from the write command, date fields are the only type of phenotype field not flagged for output when a pedigree database is opened.

For some types of output, it may be necessary to designate certain phenotype fields as representing covariates. Madeline therefore maintains a separate covariate or "V" field category which is a subset of the "P" category. Covariate fields are automatically recognized as phenotype fields when writing any format that doesn’t distinguish between phenotype and covariate fields. "P" fields can be marked as "V" fields using the toggle command.

Marking and Ordering Data Fields for Output

When a pedigree database is opened, most core "C" fields, all genotype "G" fields, and all phenotype "P" fields (except date fields) are flagged, or toggled on, for output by default. Madeline indicates which fields in a database are toggled for output by placing the letter "o" after the category indicator "C", "G", or "P" (Fig. 1.7). A number after the "o" indicates the order in which fields will appear in pedigree drawings and file output. Fields may be manually reordered using the set field order command.

M>list fields
  1.FAMID      Co__1   20.D20S482    Go__6   39.D20S96     Go_25
  2.STUDYID    Co__2   21.D20S849    Go__7   40.D20S119    Go_26
  3.SEX        Co__3   22.D20S905    Go__8   41.D20S481    Go_27
  4.FATHER     Co__4   23.D20S846    Go__9   42.D20S836    Go_28
  5.MOTHER     Co__5   24.D20S892    Go_10   43.D20S888    Go_29
  6.TWIN       Co__6   25.D20S115    Go_11   44.D20S886    Go_30
  7.NAFFECTE   Co__7+  26.D20S851    Go_12   45.D20S197    Go_31
  8.BMI        Po__1   27.D20S917    Go_13   46.D20S178N   Go_32
  9.INS_FAST   Po__2   28.D20S894    Go_14   47.D20S866    Go_33
 10.INS_2H     Po__3   29.D20S189    Go_15   48.D20S196    Go_34
 11.BW_REAL    Po__4   30.D20S898    Go_16   49.D20S857    Go_35
 12.GLU_FAST   Po__5   31.D20S114    Go_17   50.D20S480    Go_36
 13.GLU_2H     Po__6   32.D20S912    Go_18   51.D20S211    Go_37
 14.GAD_DUP    Po__7   33.D20S477    Go_19   52.D20S840    Go_38
 15.D20S103    Go__1   34.D20S874    Go_20   53.D20S120    Go_39
 16.D20S117    Go__2   35.D20S195    Go_21   54.D20S100    Go_40
 17.D20S906    Go__3   36.D20S909    Go_22   55.D20S102    Go_41
 18.D20S193    Go__4   37.D20S107    Go_23   56.D20S171    Go_42
 19.D20S889    Go__5   38.D20S170    Go_24   57.D20S173    Go_43
M>

Fig. 1.7. Categorization of Fields in Madeline. The plus "+" sign after NAFFECTE indicates that Madeline has detected this field as the AffectionStatusField: categorical levels of this field will be used to color icon symbols on pedigree drawings. A field listing is shown when a database is first opened, or at any other time using the list fields command.


The order of genotype fields is automatically set to map order when a marker map database is loaded using the load command. Load can be issued either before (the preferred method) or after an open command. The order of genotype fields whose names match the names of markers in the map database will be set to the map order.
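Setting genotype fields to map order can be pictured as a sort keyed on each marker's position in the loaded map. The C++17 sketch below assumes the map has been read into a lookup from marker name to (chromosome, ordinal), using the fields described in Table 1.7. MarkerMap and OrderByMap() are hypothetical names, and placing unmapped fields after the mapped ones is an assumption about behavior the text does not specify. The name lookup is case-sensitive, which is why the capitalization of marker names in the map must match the field names.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// (chromosome, ordinal) for each marker name, as read from a map database.
using MarkerMap = std::map<std::string, std::pair<int, int>>;

// Return genotype field names in map order. Fields found in the map come first,
// sorted by chromosome and then ordinal; unmapped fields follow in their
// original relative order (this placement is an assumption).
std::vector<std::string> OrderByMap(std::vector<std::string> fields,
                                    const MarkerMap& markers) {
    const auto mappedEnd = std::stable_partition(
        fields.begin(), fields.end(),
        [&](const std::string& f) { return markers.count(f) > 0; });
    std::stable_sort(fields.begin(), mappedEnd,
                     [&](const std::string& a, const std::string& b) {
                         return markers.at(a) < markers.at(b);
                     });
    return fields;
}
```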

Fields toggled on for output are displayed in pedigree drawings created with the draw command.

When a write command is executed, the set of core "C" fields required by the specific format being produced will generally be output regardless of the on/off output flag status. For example, Madeline will output the GenderField even if you toggle it off, because it is required for almost all output formats. This behavior is required to ensure proper file construction. Genotype "Go" fields toggled for output will be written, along with phenotype "Po" (and possibly covariate "Vo") fields toggled for output if the analysis format supports phenotype fields. Some analysis programs, such as Genehunter and Siblink, do not use phenotype data beyond affection status (which is a core field).

Fields may be toggled on or off for output using the toggle command.

Genetic Map Data

Madeline makes use of marker map information to:

The load command is used to load a table containing genetic maps for one or more chromosomes. It may contain only one map for each chromosome. The map database must contain fields of information specifying the chromosome, rank or ordinal position of the marker within the map for a given chromosome, name of the marker, and the position of the marker in centiMorgans (Table 1.7). A map may be viewed using the list map command (Fig. 1.8).

Table 1.7. Map Database Fields in Madeline.

Variable (stores field name)  Default Value  Description
----------------------------  -------------  ------------------------------------------------------------
ChromosomeField               "CHROMOSOME"   Numeric field storing the chromosome number.
OrdinalField                  "ORDINAL"      Numeric field storing the ordinal position or rank of the
                                             marker on the map for this chromosome.
MarkerField                   "MARKERNAME"   Character field storing the name of the marker.
PositionField                 "POSITION"     Numeric field storing the map position from the p terminus
                                             in centiMorgans.

M>load '\maps\newmaps.dbf'
Marker maps based on \maps\newmaps.dbf are now installed.
M>list map for chromosome=7
Marker Name Ch Or Position
----------- -- -- --------
D7S2477      7  1    0.0000
D7S531       7  2    5.4000
D7S517       7  3    7.7000
D7S513       7  4   19.1000
D7S493       7  5   36.1000
D7S516       7  6   43.8000
D7S484       7  7   55.6000
D7S510       7  8   62.7000
D7S2422      7  9   74.2000
D7S669       7 10   87.4000
D7S657       7 11  102.6000
D7S515       7 12  111.8000
D7S2502      7 13  124.9000
D7S530       7 14  134.1000
D7S640       7 15  140.5000
D7S495       7 16  145.7000
D7S2513      7 17  150.9000
D7S483       7 18  167.7000
D7S550       7 19  182.4000
M>

Fig. 1.8. Loading and viewing marker maps in Madeline. A map database is installed using the load command. The list map command is used to print a table showing marker name, chromosome, mapped order, and position in centiMorgans.


 When using FUSION data with Madeline v. 0.90 and above, be sure to include the following two lines in your batch file, or in the autorun.bat file, in order to define the map database field names used in FUSION:

OrdinalField ="POSITION"
PositionField="KOSAMBICM"


Log and Error Reporting Features

Madeline produces three types of log files (Table 1.8). The first is a summary file that has a ".log" extension by default and records each command that was entered and a summary of execution results. For example, results of a write command indicate how many pedigrees and individuals were included, how many were excluded, and the total number of pedigrees and individuals. The second is a detail file that has a ".dtl" extension by default. It provides detailed information on which pedigrees and individuals were excluded and why they were excluded. The third log file is an error log that has a ".err" extension by default and records warning and error conditions that occur.

Table 1.8. Log Files in Madeline.

Type of File  Default Name  Purpose
------------  ------------  ------------------------------------------------------------
Summary       madeline.log  Records commands and summaries of execution results.
Detail        madeline.dtl  Records details regarding inclusion and exclusion of
                            individuals and pedigrees.
Error         madeline.err  Records warning and error conditions.

Display of Warning and Error Levels

If manageable errors do occur when a new pedigree database is opened, Madeline’s interactive "M>" prompt changes to display the number and type of error conditions detected. For example, "1 SYNTAX ERROR 10 WARNINGS M>" would indicate that one syntax error and 10 manageable error conditions or warnings occurred. Altogether, the program maintains four categories of warnings and errors:
  1. Syntax errors
  2. Warnings
  3. Severe Warnings
  4. Fatal Errors

A syntax error refers to an error in typing a command on the command line or in a batch file. A warning often indicates a manageable database error such as having only one instead of both parents listed in a database. A severe warning indicates a more severe type of database error such as having a male listed as the mother of an individual. Madeline will try to manage this type of situation, for example by changing the sex of the "male" mother to female. Such a change does not guarantee that the situation is remedied, much less corrected: later in the same database, the "male" mother may turn out to be listed as the "father" of another child! This would be a fatal error, causing the program to terminate, because there is no way to reconcile such inconsistent information. The warning and error conditions may be reviewed in the error log.

Pedigree Reconstruction and the Categorization of Individuals

When a pedigree database is opened, Madeline reconstructs pedigrees based on the core data fields. When records for the parents of non-founder individuals are absent from the database, Madeline dummies-in the parents using the IDs shown in the FatherIDField and MotherIDField. If one of the two parental IDs is missing, Madeline creates a random ID for the missing parent. Random IDs are always eight characters in length and begin and end with an exclamation point (e.g., "!EW12M5!", "!G79ER5!", etc.) to facilitate recognition.
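The random-ID format described above (eight characters, starting and ending with "!") can be produced by a generator along the following lines. This is a C++17 sketch under stated assumptions: the alphabet of capital letters and digits is inferred from the example IDs, checking each new ID against those already issued is an assumption, and MakeRandomId() is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <set>
#include <string>

// Generate an ID of the form '!' + six random characters + '!', such as
// "!EW12M5!". The alphabet is inferred from the example IDs, and retrying
// until the ID has not been issued before is an assumption.
std::string MakeRandomId(std::mt19937& rng, std::set<std::string>& issued) {
    static const std::string kAlphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    std::uniform_int_distribution<std::size_t> pick(0, kAlphabet.size() - 1);
    for (;;) {
        std::string id = "!";
        for (int i = 0; i < 6; ++i) id += kAlphabet[pick(rng)];
        id += '!';
        if (issued.insert(id).second) return id;  // insert fails on a repeat
    }
}
```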

 When FUSION IDs are used, Madeline dummies-in parents even when parental IDs are not provided in the FatherIDField and MotherIDField, and joins together spouses when they don’t have any offspring.

After reconstructing pedigrees, Madeline classifies individuals into categories (Table 1.9) and summarizes their distribution in a table (Fig. 1.9). Attached individuals are individuals in the database who have either parents, or offspring, or both. Unattached individuals are in the database, but remain unconnected because they don’t have parents or offspring. Unattached individuals often represent a set of unrelated controls in a data set.

 In the current version, childless spouses can only be detected when FUSION IDs are employed. When a FUSION couple does not have children listed in the database, usually one of the individuals has other connections to the pedigree and falls into the attached category, while the remaining spouse usually has no other connections to the pedigree and so is categorized as a childless spouse.

Table 1.9. Classes of Individuals in Madeline.

Category             Description
-------------------  --------------------------------------------------------------
In Database:
  Attached           Individuals in the database who have parents and/or offspring.
  Childless Spouses  Married individuals in the database who do not have children
                     and who are not otherwise attached to a pedigree.
                     FUSION note: in the current version, Madeline detects
                     marriages without offspring only when FUSION IDs are employed.
  Unattached         Individuals in the database who remain unconnected. These may
                     be controls.
Not In Database:
  Not In Database    Parents without records in the database who are inserted by
                     Madeline.

M> open '\test\test.dbf'
         .
         .
         .
-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total
-----------------------------  --------- --------- ---------
Pedigrees ...................        590         0       590
Individuals .................      3,317         0     3,317
 + In database ..............      2,178         0     2,178
 |  + Attached ..............      2,164         0     2,164
 |  + Childless spouses .....         14         0        14
 |  + Unattached ............          0         0         0
 + Not in database ..........      1,139         0     1,139
M>

Fig. 1.9. Summary table of pedigree count and distribution of individuals by category in Madeline. After a database is opened and pedigrees reconstructed, Madeline displays a table showing the number of pedigrees and distribution of individuals by category.
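The categories in Table 1.9 can be read as a small decision procedure. The C++17 sketch below is one possible encoding, assuming that each individual already carries counts of linked parents, offspring, and mates after pedigree reconstruction; the Individual struct and Classify() function are hypothetical. In the current version, the mate information needed to recognize childless spouses would only be available when FUSION IDs are used.

```cpp
#include <cassert>

// Categories from Table 1.9.
enum class Category { Attached, ChildlessSpouse, Unattached, NotInDatabase };

// Hypothetical per-individual summary after pedigree reconstruction.
struct Individual {
    bool inDatabase;  // false for parents dummied-in by Madeline
    int nParents;     // parents linked to this individual (0-2)
    int nOffspring;   // offspring linked to this individual
    int nMates;       // spouses, including marriages without offspring
};

Category Classify(const Individual& ind) {
    if (!ind.inDatabase) return Category::NotInDatabase;
    if (ind.nParents > 0 || ind.nOffspring > 0) return Category::Attached;
    if (ind.nMates > 0) return Category::ChildlessSpouse;  // married, otherwise unconnected
    return Category::Unattached;
}
```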

Madeline provides _unattached, _ChildlessSpouse, and _InDatabase as references which return boolean status information regarding the categorization of an individual. These references can be easily used in queries to find out about the categorization of individuals (Fig. 1.10).

M>view for _ChildlessSpouse
0007-500 in 0007 (rec. no.    42) * childless spouse *
0049-500 in 0049 (rec. no.   276) * childless spouse *
0409-500 in 0409 (rec. no.  2433) * childless spouse *
0442-500 in 0442 (rec. no.  2628) * childless spouse *
0497-500 in 0497 (rec. no.  2912) * childless spouse *
1040+500 in 1040 (rec. no.  3917) * childless spouse *
1360+500 in 1360 (rec. no.  4853) * childless spouse *
1366+500 in 1366 (rec. no.  4862) * childless spouse *

8 individuals in 8 pedigrees matched as follows:

Individuals ..............          8
 + In database ...........          8
 |  + Attached ...........          0
 |  + Childless spouses ..          8
 |  + Unattached .........          0
 + Not in database .......          0
M>

Fig. 1.10. References returning boolean status information about individuals, such as _ChildlessSpouse, can be easily incorporated into queries in Madeline.

Data Classifications of Individuals

Before writing a file in a specific format using the write command, Madeline determines which individuals in a pedigree have data that can be used in an analysis of that pedigree. Madeline does this by examining the phenotype "Po" and genotype "Go" fields toggled on for output. Madeline uses this information when deciding which individuals are required in output. This is described in more detail in Data Evaluation and Management.

After the file has been written, Madeline displays a summary table showing the distribution of included and excluded pedigrees and individuals by category (Fig. 1.11). In this table, Madeline sub-categorizes attached individuals based on whether they have data or not, or have been otherwise marked for exclusion by the user. Note that individuals marked for exclusion may actually be included in output, but without their data, in order to preserve pedigree structure.

M>write to '\test\test.ped' in genehunter format
          .
          .
          .
-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total
-----------------------------  --------- --------- ---------
Pedigrees ...................        574        16       590
Individuals .................      3,247        70     3,317
 + In database ..............      2,140        38     2,178
 |  + Attached ..............      2,140        24     2,164
 |  |  + With data ..........      2,139        15     2,154
 |  |  + Without data .......          1         9        10
 |  |  + Marked for exclusion          0         0         0
 |  + Childless spouses .....          0        14        14
 |  + Unattached ............          0         0         0
 + Not in database ..........      1,107        32     1,139
M>

Fig. 1.11. Summary table after a write command in Madeline. Madeline displays a summary table showing the distribution of included and excluded pedigrees and individuals by category. Attached individuals are sub-categorized based on whether they have data or not, or have been marked for exclusion by the user.


Twin Management

When these fields are present, Madeline uses the information in the MZTwinField, DZTwinField, and DateOfBirthField to evaluate monozygotic and dizygotic twinships. When the optional DateOfBirthField is included, Madeline verifies that the birth dates of twins match. Verification is extended to dizygotic twins when the optional DZTwinField is also included.

When the DateOfBirthField is included, Madeline looks for twins who are not marked in either the MZTwinField or the DZTwinField (if present). Apparent twins of opposite sex are categorized as dizygotic twins. Apparent same-sex twins are assigned to a special "twin of unknown type" category. Twins whose type is unknown are shown with a question mark between them in pedigree drawings.
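The rule for unmarked apparent twins can be sketched as a function over a pair of siblings. In this C++17 sketch, the Sib struct, the date-string format, and InferTwinType() are hypothetical; treating a pair in which either sex is unknown as "unknown type" is also an assumption, since the text covers only opposite-sex and same-sex pairs.

```cpp
#include <cassert>
#include <string>

enum class Sex { Male, Female, Unknown };
enum class TwinType { None, MZ, DZ, Unknown };

// Hypothetical sibling record. dob is a date string such as "1957-05-30";
// marked holds the twin type from MZTwinField/DZTwinField (None if unmarked).
struct Sib {
    std::string id;
    Sex sex;
    std::string dob;
    TwinType marked;
};

// Infer a twin type for two siblings who are not already marked as twins.
// Returns None when there is nothing to infer: the birth dates differ or are
// missing, or one of the pair is already marked.
TwinType InferTwinType(const Sib& a, const Sib& b) {
    if (a.dob.empty() || a.dob != b.dob) return TwinType::None;
    if (a.marked != TwinType::None || b.marked != TwinType::None) return TwinType::None;
    const bool oppositeSex = (a.sex == Sex::Male && b.sex == Sex::Female) ||
                             (a.sex == Sex::Female && b.sex == Sex::Male);
    return oppositeSex ? TwinType::DZ : TwinType::Unknown;
}
```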

If Madeline encounters single, unpaired individuals marked as twins in the MZTwinField or DZTwinField, the program automatically removes the twin flag and informs the user of the change. The flag is only altered in memory -- the data table itself remains unchanged.

Messages about twinships are recorded in the summary and detail log files.

Consanguinity

Madeline automatically detects consanguinity in pedigrees. Messages about consanguinity are recorded in the summary and detail log files.
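The text does not say how Madeline detects consanguinity. One common approach, shown here only as an assumption, flags a mating as consanguineous when the two mates share at least one ancestor, counting each mate as his or her own ancestor so that parent-offspring matings are also caught. The ParentMap layout and both functions in this C++17 sketch are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Parents (father, mother) of each individual; founders are simply absent.
using ParentMap = std::map<std::string, std::pair<std::string, std::string>>;

// Add id and all of its ancestors to out. Empty IDs (unknown parents) and
// individuals already collected are skipped.
void CollectAncestors(const ParentMap& parents, const std::string& id,
                      std::set<std::string>& out) {
    if (id.empty() || !out.insert(id).second) return;
    const auto it = parents.find(id);
    if (it == parents.end()) return;
    CollectAncestors(parents, it->second.first, out);
    CollectAncestors(parents, it->second.second, out);
}

// Flag a mating as consanguineous when the mates share at least one ancestor.
// Each mate counts as his or her own ancestor, so parent-offspring matings
// are flagged too.
bool IsConsanguineous(const ParentMap& parents, const std::string& a,
                      const std::string& b) {
    std::set<std::string> ancA, ancB;
    CollectAncestors(parents, a, ancA);
    CollectAncestors(parents, b, ancB);
    for (const auto& x : ancA)
        if (ancB.count(x) > 0) return true;
    return false;
}
```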

Multiple Mates

There is no limit to the number of spouses that an individual in a pedigree may have. Pedigree drawings can display up to 10 spouses of a single individual.

Multiple Original Founders

Madeline can model pedigrees having multiple original founders. When the DividedPages flag is on (the default), Madeline's draw command will draw pedigrees consisting of an ancestral founder with one or more founding spouses on a single virtual page. Pedigrees consisting of two or more founding ancestral mate groups will be printed on multiple virtual pages. (Whether a single virtual page is printed on one or more physical pages depends on the setting of orientation and the unscaled dimensions of the drawing.)

Data Evaluation And Management

Prior to writing output in a specific format, Madeline determines which individuals in a pedigree have data that can be used for analysis by examining the genotype "Go" fields and, if appropriate, the phenotype "Po" and covariate "Vo" fields toggled on for output.

In general, an individual is considered to have genotype data if he or she is typed for at least one marker among the set of "Go" fields. If applicable, an individual is considered to have phenotype data if all of his or her "Po" and "Vo" fields are non-missing.
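The two data tests just described can be written as simple predicates. In this C++17 sketch, the Record layout and the function names are hypothetical. Note that HasPhenotypeData() is vacuously true when no "Po" or "Vo" fields are toggled; whether Madeline behaves the same way in that case is not stated.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical record: one entry per "Go" field (true if typed) and one entry
// per "Po"/"Vo" field (std::nullopt if missing).
struct Record {
    std::vector<bool> typedMarkers;
    std::vector<std::optional<double>> phenotypes;
};

// Genotype data: typed for at least one "Go" marker.
bool HasGenotypeData(const Record& r) {
    for (bool t : r.typedMarkers)
        if (t) return true;
    return false;
}

// Phenotype data: every "Po"/"Vo" field is non-missing.
bool HasPhenotypeData(const Record& r) {
    for (const auto& p : r.phenotypes)
        if (!p) return false;
    return true;
}
```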

After flagging individuals in a pedigree who have usable data, Madeline decides whether the entire pedigree is usable or not. Madeline’s decisions depend on the specific format keyword associated with the write command. For example, using the GenehunterNpl keyword (for a non-parametric analysis) will result in a different set of pedigree exclusions than the genehunter keyword (for a parametric analysis), although there will certainly be overlap in the sets.

Only required individuals in included pedigrees are written to output. Required individuals consist of individuals who:

For example, records for unsampled parents are often required to show relationships among siblings. Terminal individuals without offspring who do not have data are excluded from output. Individuals who have been marked for exclusion by the user using the exclude command will be included, but without their data, only if they are required to maintain pedigree structure. Otherwise, they will be excluded.

It is possible to turn off Madeline's data evaluation machinery for most formats in order to include possibly unusable pedigrees and individuals in output by issuing the command set autoexclude off.

Tracking Inclusion and Exclusion of Pedigrees and Individuals

Madeline’s detail log file records which pedigrees were excluded from output. Fig. 1.12 shows an example detail log produced after requesting an output file in GenehunterNpl format.

         .
         .
         .
GenehunterPedigreeHasData(): excluding pedigree 0547: contains only a single affected individual.
GenehunterPedigreeHasData(): excluding pedigree 0557: contains only a single affected individual.
GenehunterPedigreeHasData(): excluding pedigree 0558: lacks an individual with data.
GenehunterPedigreeHasData(): excluding pedigree 0560: contains only a single affected individual.
GenehunterPedigreeHasData(): excluding pedigree 0572: contains only a single affected individual.
GenehunterPedigreeHasData(): excluding pedigree 0583: contains only a single affected individual.
GenehunterPedigreeHasData(): excluding pedigree 0587: contains only a single affected individual.
         .
         .
         .

Fig. 1.12. Excerpt from a Madeline detail log file produced after requesting output in GenehunterNpl format. Madeline’s detail log file records which pedigrees were excluded from output and why.

In addition, a draw command executed after a write command will automatically produce annotated pedigree drawings showing which individuals:

An example is shown in Fig. 1.13. In this example, the user marked individuals with a body mass index (BMI) greater than or equal to 35 for exclusion using the exclude command and then requested an output file in GenehunterNpl format.

Fig. 1.13. Annotated pedigree drawing produced by draw after a write command in Madeline. Madeline dummied-in the two founding parents, "200" and "300", who are indicated by dashed lines. They were included ("INCLUDED") in output. Two individuals, "500" and "601", were marked for exclusion by the user. The terminal individual, "601", was not included in output ("EXCLUDED"), but "500" was retained with data excluded in order to preserve pedigree structure ("DATA EXCL INDV INCL"). The remaining individuals are all annotated as having genotype data and were included in output ("HAS DATA - INCLUDED"). Affected individuals are shaded and labeled with "A", while unaffected individuals are unshaded and labeled with "U".


Queries and Subsetting

Madeline provides powerful mechanisms for querying and subsetting records in pedigree tables. Database management systems can generally match query criteria against only one record at a time. In contrast, Madeline is specialized for dealing with multiple relationships in a pedigree simultaneously.

Madeline provides mechanisms for referring to related records within a single query statement. In Madeline, you can reference an individual, his or her mother or father, mates, and offspring all in a single query statement.

You can also reference aggregate or summary information related to an entire sibship, such as the mean sibship value of a variable, as easily as you can reference values related to single individuals. These two mechanisms -- referencing related individuals and referencing sibship aggregate data -- make it easy to get answers to many questions in Madeline that can be tedious to obtain in general database management systems.

Referencing Internal Information About An Individual And Relatives

Madeline allows the user to look at internal information about an individual and his or her relatives using references. References are a subset of keywords that begin with an underscore character to distinguish them from similarly named variables or fields in databases. There are two types of references: references to internal information about an individual, and references to relatives.

References to Internal Information About An Individual

Madeline provides references to many items of internal information about an individual, such as the number of offspring (_noffspring) and number of mates (_nmates) an individual has, and total number of individuals in the individual's pedigree (_n). Example usage is shown in Fig. 1.14. Table 5.4 lists all references to internal information.

M>go 1901          <-- go to record no. 1901
M>show studyid     <-- display the studyid of this individual
"05100"
M>show bmi         <-- display body mass index
48.9809
M>show cpep        <-- display c peptide value
0.88
M>show _noffspring <-- display number of offspring
4
M>show _nmates     <-- display number of mates
1
M>show _n          <-- display total number of individuals in this individual’s pedigree
16
M>

Fig. 1.14. References to internal information about an individual in Madeline. The commands using _noffspring, _nmates, and _n are examples of references to internal information that Madeline maintains about each individual.

References To Relatives

Madeline also maintains references which point to relatives of an individual (Fig. 1.15). The references to mates, _mate[], and offspring, _o[], are treated as arrays. Alternate references, such as _spouse for _mate[0] and _FirstChild for _o[0], are also provided for convenience.

References can be chained using the dot operator, ".", in order to access information related to more distant relatives. For example, a maternal grandmother may be referenced using _mother._mother. Example usage is shown in Fig. 1.15. A complete list of references to relatives is provided in Table 5.4.

M>go 6174                  <-- go to record no. 6174
M>show frstname            <-- first name of individual
"William"
M>show lastname            <-- last name of individual
"Goodman"
M>show _noffspring         <-- number of offspring
11
M>show _nmates             <-- number of spouses
1
M>show _mate[0].frstname   <-- first name of  spouse
"Tessie"
M>show _FirstChild.dob     <-- date of birth of first listed child
{Thursday, May 30, 1957}
M>show _SecondChild.dob    <-- date of birth of second listed child
{Monday, December 19, 1966}
M>show _o[10].dob          <-- date of birth of eleventh listed child
{Sunday, January 25, 1953}
M>show _mother._mother.dob <-- date of birth of maternal grandmother (unknown)
{ }
M>show _mother._mother.lastname <-- last name of maternal grandmother
"Toughwoman"
M>

Fig. 1.15. Using References to Relatives in Madeline. Commands such as show _mate[0].frstname and show _mother._mother.dob use references to relatives. Note that children in the offspring vector are sorted by IndividualIDField, not by date of birth.
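One way to picture references and the dot operator is as pointers between in-memory records, where each "." follows one more link. The C++17 sketch below assumes such a layout; the Person struct and NthMaternalAncestor() are hypothetical and are not Madeline's internal representation. A chain can break when a relative is absent, so the sketch returns a null pointer in that case.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical in-memory individual. Relatives are pointers, so the chained
// reference _mother._mother corresponds to p.mother->mother; a null pointer
// means that relative is absent from the reconstructed pedigree.
struct Person {
    std::string lastname;
    Person* mother = nullptr;
    Person* father = nullptr;
    std::vector<Person*> mates;      // like _mate[]; _spouse would be mates[0]
    std::vector<Person*> offspring;  // like _o[]; _FirstChild would be offspring[0]
};

// Follow a chain of "mother" links, stopping early (and returning nullptr)
// if any link in the chain is missing. generations == 2 gives the maternal
// grandmother.
const Person* NthMaternalAncestor(const Person& p, int generations) {
    const Person* cur = &p;
    for (int i = 0; i < generations && cur != nullptr; ++i)
        cur = cur->mother;
    return cur;
}
```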

Aggregate Functions

In addition to references to individual information and relatives, Madeline provides aggregate functions that allow one to look at aggregate or summary information -- such as means and standard deviations -- of the offspring of an individual (Fig. 1.16).

M>go 1577            <-- go to record no. 1577
M>show studyid       <-- display studyid
"044301"
M>show _noffspring   <-- display number of offspring
2
M>show _o[0].bmi     <-- body mass index of first child
31.1327
M>show _o[1].bmi     <-- body mass index of second child
32.7896
M>show _omean(bmi)   <-- mean body mass index of offspring
31.9612
M>show _ostddev(bmi) <-- standard deviation of offspring bmi
1.17156
M>

Fig. 1.16. Aggregate Functions in Madeline. Aggregate functions such as _omean() and _ostddev() allow one to look at summary information such as means and standard deviations of the offspring of individuals.


All aggregate functions take as an argument an expression which evaluates to a numeric result. Table 6.2 lists the aggregate functions available in Madeline.
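For reference, the offspring mean and standard deviation shown in Fig. 1.16 can be computed with the sketch below. It uses the sample standard deviation (n - 1 denominator), which agrees with the 1.17156 in Fig. 1.16 to within the rounding of the displayed BMI values, whereas the population formula would give about 0.83. Skipping missing values, and the names OffspringMean() and OffspringStdDev(), are assumptions of this C++17 sketch.

```cpp
#include <cassert>
#include <cmath>
#include <optional>
#include <vector>

using Values = std::vector<std::optional<double>>;  // one entry per offspring

// Mean over non-missing offspring values; empty if none are present.
std::optional<double> OffspringMean(const Values& v) {
    double sum = 0.0;
    int n = 0;
    for (const auto& x : v)
        if (x) { sum += *x; ++n; }
    if (n == 0) return std::nullopt;
    return sum / n;
}

// Sample standard deviation (n - 1 denominator); empty if fewer than two values.
std::optional<double> OffspringStdDev(const Values& v) {
    const auto mean = OffspringMean(v);
    if (!mean) return std::nullopt;
    double ss = 0.0;
    int n = 0;
    for (const auto& x : v)
        if (x) { ss += (*x - *mean) * (*x - *mean); ++n; }
    if (n < 2) return std::nullopt;
    return std::sqrt(ss / (n - 1));
}
```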

Query and Subsetting Commands

The view command retrieves a subset of records that match query criteria. The exclude command allows the user to mark a subset of records for exclusion from output. The unexclude command performs the opposite function -- unmarking a subset of records previously marked for exclusion. Starting with version 0.90, the draw command can also be invoked with a query expression in order to draw a subset of pedigrees. Example usage is shown in Fig. 1.17.

M>view for _noffspring>=3 and _omean(bmi)>=50
2113-100 in 2113 (rec. no.    32)
2113-500 in 2113 (rec. no.    35)

2 individuals in 1 pedigree matched as follows:

Individuals ..............          2
 + In database ...........          2
 |  + Attached ...........          2
 |  + Childless spouses ..          0
 |  + Unattached .........          0
 + Not in database .......          0

M>exclude for _noffspring>=3 and _omean(bmi)>=50
2113-100 has been marked for exclusion
2113-500 has been marked for exclusion

2 individuals in 1 pedigree marked for exclusion as follows:

Individuals ..............          2
 + In database ...........          2
 |  + Attached ...........          2
 |  + Childless spouses ..          0
 |  + Unattached .........          0
 + Not in database .......          0
M>draw pedigrees for _noffspring>=3 and _omean(bmi)>=50
1 pedigree in result set
calling "gs madeline.ps"
M>

Fig. 1.17. Query and Subsetting Commands in Madeline. In this example, the view command is used to identify parents having three or more offspring whose mean body mass index is greater than or equal to 50. The query returns one couple, who are then marked for exclusion using exclude. The draw command is then invoked with the same query expression in order to draw the relevant pedigree. The command draw pedigree '2113' could also have been used.


Pedigree Drawings

Madeline's draw command produces drawings of pedigrees using Adobe Postscript language procedures and document structuring conventions (Fig. 1.18).

Fig. 1.18. An example pedigree drawn by Madeline. In this example, two categorical variables indicating disease conditions are graphically displayed on the left and right halves of the icons. The status of the first condition, on the left side, is coded using "U" for unaffected and "A" for affected. On the right side, the status of the second condition is coded using "U" for unaffected, "M" for moderate, and "S" for severe. Missing values are indicated by dots, ".". The icon drawn with a dashed line perimeter indicates an individual whose record was not found in the database. No ID was provided in the FatherIDField of the gender-unknown offspring, and so the program has assigned a random ID of !21A3F8! to the missing father. (The displayed data were invented to illustrate the drawing capabilities of the program).


Pedigree drawings can display any number of field variables present in a dataset. The toggle command is used to select fields for inclusion on a pedigree drawing: toggle output flags selects which fields appear as labels under the icons. The set field order command is used to order selected fields within their respective categories, "C", "P", or "G". On drawings, core "Co" fields always appear first, followed by phenotype "Po" fields, and finally genotype "Go" fields.

The toggle icon flags command toggles on or off the set of categorical variables to be displayed graphically by shading or coloring regions of the male and female icons. Madeline divides the icon into pie-slice shading regions based on the number of categorical variables selected. The program does not impose a limit on the number of categorical variables that can be graphed simultaneously.
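The pie-slice layout can be sketched as an equal division of the full circle among the selected variables. The angles follow the PostScript arc convention (degrees, counterclockwise from the positive x-axis). Starting the first slice at 90 degrees puts two variables on the left and right halves of the icon, as in Fig. 1.18, but that start angle and the PieSlices() name are assumptions of this C++17 sketch rather than Madeline's documented behavior.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Split an icon into n equal pie slices, one per categorical variable.
// Each slice is (startAngle, endAngle) in degrees, counterclockwise from the
// positive x-axis. With the default start of 90 degrees and n == 2, slice 0
// covers the left half and slice 1 the right half.
std::vector<std::pair<double, double>> PieSlices(int n, double start = 90.0) {
    std::vector<std::pair<double, double>> slices;
    if (n <= 0) return slices;
    const double width = 360.0 / n;
    for (int i = 0; i < n; ++i)
        slices.emplace_back(start + i * width, start + (i + 1) * width);
    return slices;
}
```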

The manner in which subtrees are divided across pages, the paper orientation, size, margins, and color may all be set using various set commands. When DividedDrawings is set on (the default), subtrees of a pedigree originating from different founding ancestor groups are printed on separate pages. Orientation may be set to portrait, landscape, automatic, or MultiPage. When orientation is set to automatic or MultiPage, Madeline decides on the orientation of individual pedigrees depending upon the width and height of each drawing. In the event that a drawing would require excessive reduction to fit on a single page, Madeline will automatically include Postscript commands to print the drawing in poster-style across several physical pages.

Madeline's Postscript drawing routines are efficient, typically producing hundreds of drawings per second on a modern Sun SparcStation or Intel Pentium machine. To view the drawings on screen, assign the name of a Postscript viewing application (such as GhostView, GV, or GSView) to Madeline's PostscriptViewer variable (Fig. 1.17). This assignment can be placed in the autorun.bat file.


  14.GAD_DUP    Po__7   33.D20S477    Go_19   52.D20S840    Go_38
  15.D20S103    Go__1   34.D20S874    Go_20   53.D20S120    Go_39
  16.D20S117    Go__2   35.D20S195    Go_21   54.D20S100    Go_40
  17.D20S906    Go__3   36.D20S909    Go_22   55.D20S102    Go_41
  18.D20S193    Go__4   37.D20S107    Go_23   56.D20S171    Go_42
  19.D20S889    Go__5   38.D20S170    Go_24   57.D20S173    Go_43
 M>toggle output flags for 1,3-6,20-57
 M>PostscriptViewer="gv"
 M>draw pedigrees for bmi>=45
 Drawing pedigree 0009, 0009-300's subtree (page 1 of 1) ...
 Drawing pedigree 0086, 0086-300's subtree (page 1 of 1) ...
 Drawing pedigree 0213, 0213+300's subtree (page 1 of 1) ...
 Drawing pedigree 0235, 0235-300's subtree (page 1 of 1) ...
 Drawing pedigree 0305, 0305-300's subtree (page 1 of 1) ...
 Drawing pedigree 0322, 0322-300's subtree (page 1 of 1) ...
 Drawing pedigree 0547, 0547-300's subtree (page 1 of 1) ...
 Drawing pedigree 0572, 0572-300's subtree (page 1 of 1) ...
 Drawing pedigree 0808, 0808+300's subtree (page 1 of 1) ...
 Drawing pedigree 1082, 1082-300's subtree (page 1 of 1) ...
 Drawing pedigree C161, C161+500's subtree (page 1 of 1) ...

 11 pedigrees in result set.

 Calling "gv madeline.ps" ...
 M>
 
 

Fig. 1.17. Drawing pedigrees in Madeline. Toggle output flags specifies which fields will appear on the pedigree drawings. Draw pedigrees for ... specifies a subset of pedigrees that match the query criteria. Madeline calls the Postscript viewing application named in PostscriptViewer (gv in the Linux environment shown).
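The selection lists passed to toggle, such as "1,3-6,20-57" in Fig. 1.17 and "2-5, bmi, affectstat, 12-20" in Fig. 2.1, mix field numbers, numeric ranges, and field names. The sketch below expands such a list. The grammar is inferred from those two examples only, the function names are hypothetical, and toggle is assumed to flip each selected flag, as its name suggests.

```python
import re


def parse_field_list(spec, field_names):
    """Expand a selection such as '2-5, bmi, 12-20' into 1-based field numbers.

    field_names lists the database fields in order, so field_names[0] is field 1.
    """
    number_of = {name.upper(): i for i, name in enumerate(field_names, start=1)}
    selected = []
    for item in (part.strip() for part in spec.split(",")):
        span = re.fullmatch(r"(\d+)-(\d+)", item)
        if span:                    # a numeric range such as 20-57
            lo, hi = int(span.group(1)), int(span.group(2))
            selected.extend(range(lo, hi + 1))
        elif item.isdigit():        # a single field number
            selected.append(int(item))
        else:                       # a field name, matched case-insensitively
            selected.append(number_of[item.upper()])
    return selected


def toggle(flags, numbers):
    """Flip the output flag of each selected field (flags maps field number -> bool)."""
    for n in numbers:
        flags[n] = not flags[n]
```

An unknown field name raises KeyError here; Madeline presumably reports such errors in its own way.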

Producing Output Files for Analysis

The write command is used to produce locus, pedigree, and control or parameter files for analysis. Keywords like Mendel and GenehunterNpl are used to specify the analysis file format.

For most formats that require a control or parameter file, a single write command produces both the pedigree file and the control file. In these cases, the control file often contains the required locus information. For other formats, the write locus file command produces the locus file separately from the write pedigree file command that creates the pedigree file. Section 4, Write Formats, documents the procedure for each supported format.


Section 2
Tutorial

Introduction to the Tutorial

Madeline is easy to use once you see how it works. The goal of this section is to enable you to use Madeline to accomplish real tasks in a very short time. An instructive command file is shown in Fig. 2.1. Comment lines begin with two forward slashes, "//"; every other line is a command. The effect of each command or group of commands is described in turn.

// Assign log files:
LogFile='chr8.log'
DetailFile='chr8.dtl'
ErrorFile='chr8.err'
quiet
system "dir \databases\chr8.*"
// Map missing value indicators:
list nmv
nmv[0]=-1
nmv[1]=-9
list nmv
// Map core field names:
GenderField='GENDER'
AffectionStatusField="AFFECTSTAT"
// Map codes used in core fields:
list csv
csv[_female]='FEMALE'
csv[_male]='MALE'
list csv
// Load a database containing genetic maps:
load '\maps\emap.dbf'
list map for chromosome=8
// Open pedigree database:
open '\databases\chr8.dbf'
// toggle off output of phenotype fields:
toggle output flag for bmi
list fields
// Example 1: Create files for Mendel USERM13 analysis:
write locus file to '\analysis\mendel.loc' in mendel format
write pedigree file to '\analysis\userm13.ped' in userm13 format
// Example 2: Create files for Genehunter non-parametric linkage analysis:
write locus file to '\analysis\ghnpl.loc' in genehunter format
write pedigree file to '\analysis\ghnpl.ped' in genehunternpl format
// Example 3: Create files for Siblink affected sib pair analysis:
// First, mark some individuals for exclusion:
exclude for bmi>=35
write to '\analysis\asp.ped' in SiblinkAffectedPairs format
// Draw pedigrees:
list fields
toggle output flags for 2-5, bmi, affectstat, 12-20
list fields
drawingfile='pedigrees.ps'
set color off
set orientation to automatic
set papermargin to 1.5
AffectstatLabel[0]="U"
AffectstatLabel[1]="A"
draw pedigrees '0001'-'0005','0472','0570'
// End session:
goodbye

Fig. 2.1. Example Madeline command file.

 

This tutorial includes sample commands to map missing values, assign core field names, and designate codes used in core fields. These commands are typically required, but some of them will not be needed when FUSION data are used. Madeline is generally quite flexible about the order in which commands are executed. The tutorial presents a recommended command sequence.

Assigning Log Files

LogFile, DetailFile, and ErrorFile store the names of the summary, detail, and error logs. By default, LogFile is set to "madeline.log", DetailFile to "madeline.dtl", and ErrorFile to "madeline.err". If the default names are used, these files are overwritten each time you start Madeline. When you assign new names (Fig. 2.2), the current contents of the log files are copied to the new files, and all subsequent messages are redirected to the new files. Reassign the log file names at the beginning of a session.

M>LogFile='chr8.log'
LogFile has been changed from "madeline.log" to "chr8.log"
M>DetailFile='chr8.dtl'
DetailFile has been changed from "madeline.dtl" to "chr8.dtl"
M>ErrorFile='chr8.err'
ErrorFile has been changed from "madeline.err" to "chr8.err"
M>

Fig. 2.2. Reassigning summary, detail, and error log file names in Madeline.
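The copy-then-redirect behavior of a log reassignment can be sketched as a small class. LogChannel and its methods are hypothetical names for illustration, not Madeline's implementation; the confirmation message mirrors the wording shown in Fig. 2.2.

```python
import os
import shutil
import tempfile


class LogChannel:
    """One reassignable log file (hypothetical sketch, not Madeline's implementation)."""

    def __init__(self, variable, path):
        self.variable = variable      # e.g. "LogFile", "DetailFile", "ErrorFile"
        self.path = path
        open(self.path, "w").close()  # a default-named log is overwritten at start-up

    def write(self, message):
        with open(self.path, "a") as fh:  # append each message to the current file
            fh.write(message + "\n")

    def reassign(self, new_path):
        old = self.path
        shutil.copyfile(old, new_path)  # carry over everything logged so far
        self.path = new_path            # subsequent messages go to the new file
        return f'{self.variable} has been changed from "{old}" to "{new_path}"'


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = LogChannel("LogFile", os.path.join(d, "madeline.log"))
        log.write("session started")
        print(log.reassign(os.path.join(d, "chr8.log")))
```

Because the old file is copied rather than moved, it keeps only the messages written before the reassignment.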

 

Quiet

By default, Madeline is in verbose mode, in which all messages, both summary and detail log messages, are sent to the screen. Writing many messages to the screen slows the program down somewhat and may be distracting, so Madeline supports two quieter levels. When quiet is issued, summary log messages continue to be printed to the screen, but detail log messages are no longer shown there. When silent or silence is issued, neither summary nor detail messages appear on the screen. Error messages are always printed to the screen, regardless of the verbosity setting. To return from a quiet state to the default, issue verbose. In every mode, messages continue to be written to the summary and detail log files, as appropriate. Quiet mode is recommended on platforms such as DOS32 and Windows whose terminal windows lack scrollable buffers.
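The screen rules of the three verbosity levels fit in a small lookup table. This sketch covers only what reaches the screen, since log files receive their messages at every level; SCREEN_KINDS and shown_on_screen are hypothetical names.

```python
# Message kinds echoed to the screen at each verbosity level, as described above.
SCREEN_KINDS = {
    "verbose": {"summary", "detail", "error"},
    "quiet": {"summary", "error"},
    "silent": {"error"},
}


def shown_on_screen(level, kind):
    """True if a message of the given kind ("summary", "detail", "error") reaches the screen."""
    if level == "silence":  # silence is a synonym for silent
        level = "silent"
    return kind in SCREEN_KINDS[level]
```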

System "dir \databases\chr8.*"

The system command passes a quoted command string to the operating system shell. This allows the user to obtain directory and file information, copy or move files, or run other software without exiting Madeline. System is especially useful for obtaining file or directory information with the DOS dir command or the UNIX ls command.

Mapping Missing Value Indicators

Nmv is the abbreviated name for the NumericMissingValue array. The list command instructs Madeline to list the elements of the array (Fig. 2.3).

M>list nmv
NMV has 1 element:
NMV[ 0]=         -9999
M>nmv[0]=-1
M>nmv[1]=-9
M>list nmv
NMV has 2 elements:
NMV[ 0]=            -1
NMV[ 1]=            -9

Fig. 2.3. Mapping missing value indicators in Madeline.

 

By default, nmv[] contains a single element, -9999, the missing value indicator used in the FUSION study. The assignment nmv[0]=-1 overwrites the value of the first cell with -1. The assignment nmv[1]=-9 assigns -9 to the second cell, automatically expanding the array if necessary. From then on, both -1 and -9 are recognized as missing value indicators whenever values are read from a database. Because Madeline's arrays expand themselves, there is no limit on the number of missing value indicators a database may use.
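The self-expanding behavior can be imitated with a list subclass that grows when a cell past its end is assigned. ExpandingArray and is_missing are hypothetical names. How Madeline fills any cells skipped by an assignment such as nmv[5] on a two-element array is not documented, so the fill value here is an assumption.

```python
class ExpandingArray(list):
    """List that grows when a cell past its end is assigned (sketch of nmv[] behavior)."""

    def __init__(self, values=(), fill=None):
        super().__init__(values)
        self.fill = fill  # hypothetical filler for any skipped cells

    def __setitem__(self, index, value):
        if isinstance(index, int) and index >= len(self):
            self.extend([self.fill] * (index + 1 - len(self)))
        super().__setitem__(index, value)


def is_missing(value, nmv):
    """True if value matches one of the current missing value indicators."""
    return value in nmv


nmv = ExpandingArray([-9999])  # default: the FUSION indicator
nmv[0] = -1                    # overwrite the first cell
nmv[1] = -9                    # the second cell is created automatically
```

After these assignments, -9999 is no longer treated as missing, because it was overwritten rather than kept alongside the new codes.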

Mapping Core Field Names

In a general setting, the names of core fields in a pedigree database may differ from Madeline's default names, which are based on field names used in the FUSION study. Assignments to the appropriate core field name variables (Fig. 2.4) tell Madeline which fields to treat as core fields when a pedigree database is subsequently opened. Madeline automatically capitalizes field names and truncates them to 10 characters if necessary.

M>GenderField='GENDER'
M>AffectionStatusField="AFFECTSTAT"

Fig. 2.4. Mapping core field names in Madeline.
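The capitalize-and-truncate rule, and its use when matching a core field name against the fields of a database, can be sketched as follows. The function names are hypothetical, and applying the same normalization to the database's own field names is an assumption.

```python
MAX_FIELD_NAME = 10  # field names are truncated to 10 characters


def normalize_field_name(name):
    """Capitalize a field name and truncate it to MAX_FIELD_NAME characters."""
    return name.upper()[:MAX_FIELD_NAME]


def find_core_field(db_fields, core_name):
    """1-based number of the database field matching a core field name, or None."""
    target = normalize_field_name(core_name)
    for i, field in enumerate(db_fields, start=1):
        if normalize_field_name(field) == target:
            return i
    return None
```

For example, an assignment of "AffectionStatus" would be stored as "AFFECTIONS", so a field named "AFFECTSTAT" fits the limit unchanged, while longer names lose their trailing characters.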

 

Mapping Codes Used In Core Fields

Arbitrary sets of codes may be used to represent core categorical information such as gender or affection status. Assignments to the appropriate arrays instruct Madeline to recognize a study's codes correctly. Fig. 2.5 shows how to tell Madeline to recognize the gender codes "MALE" and "FEMALE" in a database in place of the default codes "M" and "F". Because the symbolic constants _female and _male are used to index the array, you don't have to remember which cell is reserved for which sex.

M>list csv
CSV has 2 elements:
CSV[ 0]="M"
CSV[ 1]="F"
M>csv[_female]='FEMALE'
M>csv[_male]='MALE'
M>list csv
CSV has 2 elements:
CSV[ 0]="MALE"
CSV[ 1]="FEMALE"

Fig. 2.5. Mapping codes used in core fields in Madeline.
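The listing in Fig. 2.5 shows that cell 0 holds the male code and cell 1 the female code. In Python, an IntEnum can play the role of the _male and _female constants, because its members are valid list indices. The Sex class and decode_sex function are hypothetical, and exact, case-sensitive matching of codes is an assumption.

```python
from enum import IntEnum


class Sex(IntEnum):
    """Stand-ins for Madeline's _male and _female index constants."""
    MALE = 0    # CSV[0] holds the male code in the listing above
    FEMALE = 1  # CSV[1] holds the female code


csv = ["M", "F"]            # default core-field codes
csv[Sex.FEMALE] = "FEMALE"  # same effect as csv[_female]='FEMALE'
csv[Sex.MALE] = "MALE"      # same effect as csv[_male]='MALE'


def decode_sex(code, csv):
    """Return the Sex whose code matches exactly, or None for an unrecognized code."""
    for sex in Sex:
        if code == csv[sex]:
            return sex
    return None
```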

 

Loading a Database of Genetic Maps

The load command (Fig. 2.6) loads a table containing genetic maps for one or more chromosomes. The map table can be in any of the supported input database formats. It may contain only one map for each chromosome. The map table must contain fields of information specifying the chromosome, the rank or ordinal position of the marker within the map for a given chromosome, the name of the marker, and the position of the marker in centiMorgans.

After the load command completes, Madeline reports that the marker maps have been installed. You can view a map by issuing list map for chromosome=n, where n is a valid chromosome number (the human X chromosome may be designated by 23). To list all markers for all chromosomes present in the table, issue list map by itself.

M>load '\maps\emap.dbf'
Marker maps based on k:\emap\emap.dbf are now installed.
M>list map for chromosome=8
Marker Name Ch Or Position
----------- -- -- --------
D8S504       8  1    0.0000
D8S550       8  2   15.1000
D8S258       8  3   30.1000
D8S283       8  4   55.0000
Beta3        8  5   59.8000
D8S285       8  6   66.4000
D8S260       8  7   71.3000
D8S530       8  8   80.7000
D8S270       8  9   94.4000
D8S276       8 10  105.0000
GATA101F01   8 11  111.4000
D8S514       8 12  122.2000
D8S284       8 13  135.3000

Fig. 2.6. Loading a database containing genetic maps in Madeline.
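The four required map fields suggest a simple in-memory structure: one rank-ordered list of markers per chromosome. The sketch below builds it and reproduces the column layout of Fig. 2.6. The function names are hypothetical, and rejecting a repeated rank is only one possible way to enforce the one-map-per-chromosome rule.

```python
from collections import defaultdict


def install_maps(rows):
    """Group (chromosome, rank, marker, position_cM) rows into one ordered map per chromosome."""
    by_chrom = defaultdict(dict)
    for chrom, rank, marker, pos in rows:
        if rank in by_chrom[chrom]:  # would imply a second map for this chromosome
            raise ValueError(f"chromosome {chrom}: rank {rank} appears twice")
        by_chrom[chrom][rank] = (marker, pos)
    return {c: [(r, *by_chrom[c][r]) for r in sorted(by_chrom[c])] for c in by_chrom}


def list_map(maps, chromosome=None):
    """Lines of a map listing for one chromosome, or for all chromosomes if None."""
    chroms = sorted(maps) if chromosome is None else [chromosome]
    lines = ["Marker Name Ch Or Position", "----------- -- -- --------"]
    for c in chroms:
        for rank, marker, pos in maps.get(c, []):
            lines.append(f"{marker:<11} {c:>2} {rank:>2} {pos:>9.4f}")
    return lines


if __name__ == "__main__":
    rows = [(8, 2, "D8S550", 15.1), (8, 1, "D8S504", 0.0), (8, 10, "D8S276", 105.0)]
    print("\n".join(list_map(install_maps(rows), 8)))
```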

 

Toggling Fields

The USERM13, Genehunter, and Siblink pedigree files written later in this tutorial do not include phenotype information. With the exception of core "C" fields, which Madeline controls, it is imperative to toggle off every field in the database that should not be included in the output and that should not be considered when Madeline decides whether an individual or pedigree contains sufficient data for output. This is done with the toggle command (Fig. 2.7). The list fields command can then be used to verify that the correct subset of fields was turned off.

// toggle off output of phenotype fields:
M>toggle output flag for bmi
Note: genotype fields ordered according to current map
M>list fields
  1.STUDYID    Co__1    8.BMI        P       15.D8S276     Go__9
  2.GENDER     Co__2    9.D8S504     Go__1   16.D8S283     Go__4
  3.FATHER     Co__3   10.D8S550     Go__2   17.D8S285     Go__5
  4.MOTHER     Co__4   11.D8S258     Go__3   18.D8S260     Go__6
  5.TWIN       Co__5   12.GATA101F01 Go_10   19.D8S530     Go__7
  6.AFFECTSTAT C       13.D8S514     Go_11   20.D8S270     Go__8
  7.DOB        C       14.D8S284     Go_12
M>

Fig. 2.7. Toggling and listing fields in Madeline. After the toggle command, field 8, BMI, is no longer toggled on for output.
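Comparing Figs. 2.6 and 2.7 shows how the "Go" numbers relate to the map. Each genotype field is numbered by its map position among the markers actually present in the database, so Beta3, which is on the map but not in the database, leaves no gap: D8S285 is Go__5 although it is sixth on the map. The sketch below reproduces that numbering. The function name is hypothetical, and markers absent from the map are simply left unnumbered because the manual does not say how Madeline treats them.

```python
def genotype_order(db_markers, map_markers):
    """Number database markers 1, 2, ... by their position in the installed map."""
    rank = {name: i for i, name in enumerate(map_markers)}
    on_map = sorted((m for m in db_markers if m in rank), key=rank.__getitem__)
    return {m: i for i, m in enumerate(on_map, start=1)}


# Chromosome 8 map from Fig. 2.6, and genotype fields 9-20 of Fig. 2.7 in database order.
chr8_map = ["D8S504", "D8S550", "D8S258", "D8S283", "Beta3", "D8S285", "D8S260",
            "D8S530", "D8S270", "D8S276", "GATA101F01", "D8S514", "D8S284"]
db_markers = ["D8S504", "D8S550", "D8S258", "GATA101F01", "D8S514", "D8S284",
              "D8S276", "D8S283", "D8S285", "D8S260", "D8S530", "D8S270"]
order = genotype_order(db_markers, chr8_map)  # e.g. order["GATA101F01"] == 10
```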

 

Opening a Pedigree Database

Open opens a pedigree database. Madeline's database engine seamlessly opens all supported database types on all supported platforms, allowing you to open FoxPro files on Solaris, SAS transport files on a PC, and so on. The user does not need to tell Madeline the file type. To open an ASCII flat file database, see documentation for the recognize, convert, rectify, transpose and merge commands.

When a pedigree database is opened, Madeline first categorizes fields as core "C", genotype "G", phenotype "P", or null, "*". If genotype fields are present, allele frequencies are estimated from all of the data using gene counting, ignoring family relationships (a in Fig. 2.8). If a map table is already installed and contains a map for markers in the database, the genotype fields are automatically ordered according to the map (b in Fig. 2.8). Pedigrees are reconstructed based on the core information. Madeline performs additional data operations when optional core fields such as AffectionStatusField or DateOfBirthField are included (c in Fig. 2.8). In this example, Madeline marks several apparent dizygotic twinships. Madeline also flags the AffectionStatusField, AFFECTSTAT, with a plus sign, "+", indicating that the categorical levels of AFFECTSTAT will be displayed graphically on the male and female icons in pedigree drawings. Finally, the program displays a summary table showing the count of pedigrees and distribution of individuals by category (d in Fig. 2.8).

M>open '\hold\chr8.dbf'
Calculating allele frequencies for   9. D8S504...                  (a)
	…
Calculating allele frequencies for  20. D8S270...                  (a)
Database "\hold\chr8.dbf" opened with     2,506 records
Core information read in   2.00 seconds
	…
NOTE: 0471-100 and 0471-401 now marked with "a" indicating         (c)
an apparent dizygotic twinship.
NOTE: 0570-401 and 0570-402 now marked with "a" indicating         (c)
an apparent dizygotic twinship.
Pedigrees reconstructed in   0.1780 seconds
Note: genotype fields ordered according to current map             (b)
  1.STUDYID    Co__1    8.BMI        Po__1   15.D8S276     Go__9  
  2.GENDER     Co__2    9.D8S504     Go__1   16.D8S283     Go__4  
  3.FATHER     Co__3   10.D8S550     Go__2   17.D8S285     Go__5  
  4.MOTHER     Co__4   11.D8S258     Go__3   18.D8S260     Go__6  
  5.TWIN       Co__5   12.GATA101F01 Go_10   19.D8S530     Go__7  
  6.AFFECTSTAT C    +  13.D8S514     Go_11   20.D8S270     Go__8  
  7.DOB        C       14.D8S284     Go_12 
 
-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total       (d)
-----------------------------  --------- --------- ---------
Pedigrees ...................        958         0       958
Individuals .................      3,626         0     3,626
 + In database ..............      2,506         0     2,506
 |  + Attached ..............      2,115         0     2,115
 |  + Childless spouses .....         13         0        13
 |  + Unattached ............        378         0       378
 + Not in database ..........      1,120         0     1,120

Fig. 2.8. Opening a pedigree database in Madeline. Madeline performs a series of operations when the open command is used to open a pedigree database. See text for explanation.
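For codominant markers with observed genotypes, gene counting that ignores family relationships reduces to tallying allele copies over all typed individuals, as sketched below. The function name and the use of 0 as the untyped-allele code are assumptions. Because relatives are counted as if they were unrelated, this estimate can differ from the relationship-aware maximum likelihood estimate that Mendel's USERM13 computes.

```python
from collections import Counter


def allele_frequencies(genotypes, missing=0):
    """Gene-counting allele frequencies over all typed individuals, ignoring relationships.

    genotypes is a list of (allele1, allele2) pairs; `missing` is a hypothetical
    code for an untyped allele.
    """
    counts = Counter()
    for a1, a2 in genotypes:
        if missing in (a1, a2):
            continue  # an untyped individual contributes no alleles
        counts[a1] += 1
        counts[a2] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in sorted(counts.items())}
```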

Example 1: Creating Files for Mendel USERM13 Analysis

Mendel’s USERM13 module uses maximum likelihood methods to calculate allele frequencies, taking family relationships into account. All genotyped individuals in a database can be used in the analysis, including childless spouses, controls, and other singleton individuals that Madeline classifies as unattached.

USERM13 requires a locus file and a pedigree file as input. The locus file contains allele frequency information calculated by Madeline; the pedigree file contains the family and genotype information. The write locus file command with the generic mendel keyword creates the locus file (Fig. 2.9). The write pedigree file command with the userm13 keyword creates the pedigree file. As expected, childless spouses and a number of unattached individuals are included in the output file. The detail log file documents which individuals and pedigrees were excluded and why.

M>write locus file to '\analysis\mendel.loc' in mendel format
Locus file "\analysis\mendel.loc" has been written.
M>write pedigree file to '\analysis\userm13.ped' in userm13 format
Writing pedigree data to "\analysis\userm13.ped"
-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total
-----------------------------  --------- --------- ---------
Pedigrees ...................        810       148       958
Individuals .................      3,469       157     3,626
 + In database ..............      2,351       155     2,506
 |  + Attached ..............      2,107         8     2,115
 |  |  + With data ..........      2,107         0     2,107
 |  |  + Without data .......          0         8         8
 |  |  + Marked for exclusion          0         0         0
 |  + Childless spouses .....         13         0        13
 |  + Unattached ............        231       147       378
 + Not in database ..........      1,118         2     1,120

Fig. 2.9. Creating locus and pedigree files for a Mendel USERM13 analysis in Madeline.
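The indentation of the summary tables implies that each parent row equals the sum of the rows nested beneath it. For example, in the Included column of Fig. 2.9, 2,107 + 13 + 231 = 2,351 individuals in the database. A small consistency check makes this explicit. The row keys and function name are hypothetical, and the hierarchy is read from the table layout rather than from a documented rule.

```python
# Parent rows and the rows indented beneath them in the summary table.
HIERARCHY = [
    ("individuals", ["in_database", "not_in_database"]),
    ("in_database", ["attached", "childless_spouses", "unattached"]),
    ("attached", ["with_data", "without_data", "marked_for_exclusion"]),
]


def summary_problems(column):
    """Parent rows whose count differs from the sum of their child rows."""
    problems = []
    for parent, children in HIERARCHY:
        if parent in column and all(c in column for c in children):
            if sum(column[c] for c in children) != column[parent]:
                problems.append(parent)
    return problems


# "Excluded" column of Fig. 2.9.
excluded = {
    "individuals": 157, "in_database": 155, "not_in_database": 2,
    "attached": 8, "childless_spouses": 0, "unattached": 147,
    "with_data": 0, "without_data": 8, "marked_for_exclusion": 0,
}
```

Rows that a table does not report, such as "With data" in Fig. 2.8, simply skip the corresponding check.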

 

Example 2: Creating Files for Non-parametric Linkage Analysis in Genehunter

Like USERM13, Genehunter requires a locus file and a pedigree file for analysis. In addition to allele frequency information, Genehunter’s locus file contains map distance information obtained from the previously loaded map database. The generic genehunter keyword specifies the locus file format (Fig. 2.10).
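Map distance information between adjacent markers follows directly from the cumulative positions in the loaded map. The sketch below computes the gaps in centiMorgans. It does not attempt to reproduce the Genehunter locus file itself; whether Madeline writes these gaps in cM or converts them to recombination fractions is not stated here. The function name is hypothetical, and rounding to four decimals matches the precision of the map listing.

```python
def intermarker_distances(positions_cm):
    """Distances in cM between consecutive markers, given positions in map order."""
    gaps = [round(b - a, 4) for a, b in zip(positions_cm, positions_cm[1:])]
    if any(g < 0 for g in gaps):
        raise ValueError("positions must be listed in map order")
    return gaps


if __name__ == "__main__":
    # First five chromosome 8 positions from Fig. 2.6 (D8S504 ... Beta3).
    print(intermarker_distances([0.0, 15.1, 30.1, 55.0, 59.8]))
```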

M>write locus file to '\analysis\ghnpl.loc' in genehunter format
Locus file "\analysis\ghnpl.loc" has been written.
M>write pedigree file to '\analysis\ghnpl.ped' in genehunternpl format
Creating associated Genehunter control file called "\analysis\ghnpl.ctl"
Writing pedigree data to "\analysis\ghnpl.ped"
-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total
-----------------------------  --------- --------- ---------
Pedigrees ...................        533       425       958
Individuals .................      3,033       593     3,626
 + In database ..............      2,003       503     2,506
 |  + Attached ..............      2,003       112     2,115
 |  |  + With data ..........      2,003       104     2,107
 |  |  + Without data .......          0         8         8
 |  |  + Marked for exclusion          0         0         0
 |  + Childless spouses .....          0        13        13
 |  + Unattached ............          0       378       378
 + Not in database ..........      1,030        90     1,120

Fig. 2.10. Creating locus and pedigree files for Genehunter non-pa