May, 2001
(c) 1999 by the Regents of the University of Michigan, Ann Arbor.
Madeline is software written in ANSI C/C++ for:
Madeline has been compiled for the following platforms:
Madeline was designed to meet the needs of the Finland-United States Investigation of NIDDM Genetics (FUSION) Study. Because of this, Madeline has specific knowledge about FUSION study IDs. A subset of Madeline’s functionality makes use of this knowledge (see FUSION box below).
The program continues to be modified to make it useful for genetic studies in general.
Paragraphs or headings preceded by the FUSION icon describe FUSION-specific functionality.
FUSION
Sample ID: 1021+402
| ID Component | Value in Sample ID | Coding |
| Family ID | 1021 | Begins with "0" for FUSION 1, "1" for FUSION 2 families, "C" for control families, or "T" for Trios. |
| Encoded flag symbol | + | "-" for FUSION 1, "+" for FUSION 2, or "A" to "Z" for resampled FUSION records. |
| Individual ID | 402 | "100" for probands, "200" for fathers, "300" for mothers, "400" for siblings of the proband (enumerated), "500" for proband spouses (enumerated), "600" for proband offspring (enumerated), "700" for sibling spouses (enumerated), "800" for sibling offspring (enumerated). |
Madeline is internally aware of the structure of FUSION IDs and uses this information in specific situations to:
Madeline currently uses the following rules to determine if an ID in a dataset is a FUSION ID:
A data set can easily contain a mixture of FUSION IDs and non-FUSION IDs. Only IDs meeting the above criteria will be construed as FUSION IDs.
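The three-part structure of a FUSION sample ID can be illustrated with a short Python sketch. This is not Madeline's own code: the function and class names are hypothetical, and the splitting rule (last three characters are the individual ID, the character before them is the flag symbol, the rest is the family ID) is an assumption drawn from the sample ID 1021+402 shown in the FUSION box.

```python
from typing import NamedTuple


class FusionID(NamedTuple):
    family: str      # e.g. "1021"
    flag: str        # "-", "+", or "A" to "Z"
    individual: str  # three-digit code, e.g. "402"


# Role implied by the first digit of the individual ID (from the FUSION box).
ROLE_BY_FIRST_DIGIT = {
    "1": "proband",
    "2": "father",
    "3": "mother",
    "4": "sibling of the proband",
    "5": "proband spouse",
    "6": "proband offspring",
    "7": "sibling spouse",
    "8": "sibling offspring",
}


def split_fusion_id(sample_id: str) -> FusionID:
    """Split a sample ID into family ID, flag symbol, and individual ID.

    Assumption (hypothetical rule): the individual ID is always the last
    three characters and the flag symbol is the single character before it.
    """
    if len(sample_id) < 5:
        raise ValueError(f"too short to be a FUSION-style ID: {sample_id!r}")
    family, flag, individual = sample_id[:-4], sample_id[-4], sample_id[-3:]
    if not individual.isdigit():
        raise ValueError(f"individual ID is not numeric: {individual!r}")
    if not (flag in "-+" or "A" <= flag <= "Z"):
        raise ValueError(f"unexpected flag symbol: {flag!r}")
    return FusionID(family, flag, individual)


def describe_role(individual: str) -> str:
    """Return the role implied by the leading digit of the individual ID."""
    return ROLE_BY_FIRST_DIGIT.get(individual[0], "unknown")
```

For example, `split_fusion_id("1021+402")` yields family "1021", flag "+", and individual "402", which `describe_role` reports as a sibling of the proband.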
Running the Program Interactively and in Batch Mode
Instructions to Madeline are entered at a command prompt. Madeline's command interpreter is not sensitive to capitalization. However, capitalization is often used in this document for clarity of presentation.
Madeline can be run interactively or in batch mode (Fig 1.1). To run Madeline interactively, type the name of the program at your system prompt and press return. Madeline’s "M>" prompt will appear.
There are two ways to run batch files. The first way is to provide the name of a batch file containing commands after the name of the program on the command line. The second way is to start Madeline interactively and then use the run command to execute the batch file. Madeline returns to interactive mode if an error occurs, or when a batch file terminates without a goodbye or quit command.
csvr1%                    <-- system prompt (on UNIX)
csvr1% madeline           <-- starting the program in interactive mode
MADELINE Version 0.910
Copyright (c) 1999 by Edward H. Trager
Fig. 1.1. Starting Madeline. Madeline can be run either interactively or in batch mode.
An option is available to set parameters and run commonly needed commands automatically each time Madeline is started by providing a special batch file called "autorun.bat" in the working directory where Madeline will be invoked.
Any commands that can normally be invoked on the command line or in a batch file can be placed into autorun.bat. Assignments to specify default field names or environmental settings are typically placed in autorun.bat (Fig. 1.2).
//
// Typical autorun.bat file for Unix/Linux environment:
//
//
// Environment settings:
//
quiet
set language to English
FileEditor="vi"
PostscriptViewer="gv"
//
// Pedigree drawing-specific settings:
//
set color off
set PaperSize to A4
// margin in centimeters:
set PaperMargin to 1.5
set orientation to automatic
//
// Pedigree database-specific settings:
//
GenderField='GENDER'
FamilyIDField='FAMILY'
IndividualIDField='INDIVIDUAL'
//
// Map standard missing value indicators:
//
NumericMissingValue[0]=-1
NumericMissingValue[1]=-9
//
// Map database-specific settings:
//
PositionField="POSTN"
OrdinalField ="ORDNL"
Fig. 1.2. Example autorun.bat file.
Starting with Madeline v. 0.91, a warning message is produced if an autorun.bat file is not found, and the "M>" prompt changes accordingly (Fig 1.3).
...
Could not find "autorun.bat" file.
...
1 WARNING M>
Fig. 1.3. In Madeline v. 0.91 and later, a warning is produced if the autorun.bat file is not present.
Overview of Database Tables Used by the Program
A database table is a rectangular array of data. A record is a row in the array. A field is a column in the array. One row or record contains the data -- all the measured variables -- for one entity.
In Madeline, the measured entity is either an individual or a genetic marker. Key fields are fields that identify the entity. To uniquely identify an individual, two key fields are required: (1) a family identifier, and (2) an individual identifier. Data fields contain the data measured on the entity. Combinations of other fields will be required to identify other entities, such as a genetic marker. The specific set of key fields required depends upon the context.
In Madeline, only three types of database tables occur:
Each type is described in turn below.
In a pedigree table, each row or record contains the data for one individual. In Madeline, the names of the family and individual ID fields are stored in variables called FamilyIDField and IndividualIDField, respectively. Basic pedigree reconstruction additionally requires knowledge of the father, mother, and gender of each individual. Therefore, Madeline defines a set of five core fields that must be present in every pedigree database:
The remaining data fields in a pedigree database can be classified into two groups: (1) phenotype and (2) genotype fields. Madeline therefore classifies all fields in a database table into one of these three categories using the single-letter identifiers shown below:
The complete set of core fields consists of the five obligatory core fields listed above, as well as some additional, non-obligatory core "phenotype" fields such as AffectionStatusField and DateOfBirthField.
A map table contains map information related to markers on one or more chromosomes. The key fields in a map table are:
The data fields in a map table are:
Marker Tables
A marker table contains the alleles for a specific marker measured on a specific individual. Output from ABI machines is in this table format.
This type of table has three key fields:
There are only two essential data fields in a marker table:
In principle, the two allele fields could be represented by a single genotype field containing the numeric labels separated by a forward slash, "/". Madeline does not yet contain support for this option in marker tables.
Madeline provides support for integrating the information in a marker table into a pedigree table via the transpose and merge commands. The transpose command takes care of converting the paired allele fields into the single genotype fields expected in a pedigree table.
Madeline currently supports xbase (FoxPro, dBase III/IV), Visual FoxPro and SAS transport file formats, and space-delimited, column-aligned ASCII flat files. Madeline supports flat file tables directly by referencing a binary header file created using the recognize command. All pedigree databases are opened using the open command. Madeline’s database engine detects operating system and file byte-ordering at run time, thus permitting database tables from PCs to be opened on Unix workstations, and vice versa.
Madeline’s database engine supports character, numeric (floating point and integer), and date types of the supported database formats. A logical data type such as the "L" field type of xbase is not supported: use appropriately coded numeric variables instead. Other derived types, such as date-time or monetary types are not supported.
Character data are read from databases by trimming leading and trailing space characters. Thus, blank entries in a database appear as the empty string, "". When entered on the command line, literal character data must be delimited by a pair of matching single or double quotes, e.g., "0001-230" or '0980A'.
All numeric data types are converted to double-precision floating point numbers. Literal numeric values are entered on the command line without delimiters.
In order to support multiple file formats and missing values in a uniform manner, Madeline does not recognize a logical data type separate from the numeric data type. In contexts where a value is to be interpreted as a logical value, Madeline treats zero as _false, and any non-zero non-missing value as _true. Binary true/false data should thus be coded using a numeric field type with values of 0, 1, and a missing value indicator if required.
Date data read from a file are automatically converted to Julian day integers. When entered at the command line, dates must be delimited between curly braces and must be entered according to the ordering and capitalization conventions of the current language setting (Fig. 1.4). Madeline recognizes spaces, commas, periods or forward slashes as delimiters between the month, day, and year elements of a date. Madeline recognizes correctly capitalized, unabbreviated month names and month ordinals. Madeline does not recognize two-digit years as belonging to the current century.
M>show {December 11 1963}
{Wednesday, December 11, 1963}
M>show {December 11, 1963}
{Wednesday, December 11, 1963}
M>show {12/11/1963}
{Wednesday, December 11, 1963}
M>show {12/11/63}
{Sunday, December 9, 63} <-- in the year 63 A.D., before the Gregorian calendar
M>show {dec 11 1963} <-- Madeline does not recognize abbreviated month names ...
{} <-- ...so this evaluates to a missing date
M>set language to Suomi
M>show {11.12.1963}
{keskiviikko 11.12.1963}
M>
Fig. 1.4. Dates in Madeline. Dates entered at the command line must be delimited by curly braces and must adhere to the ordering and capitalization conventions of the current language setting.
Date data may be displayed on pedigree drawings. Dates may also be used in an expression passed to a view or a draw command, to a subsetting command such as exclude, or to the sort command (which sorts the order of individuals on a pedigree drawing). There is currently no support for writing date data to an output file.
Madeline supports entry of missing values from the command line, and also provides a simple mechanism for the user to define sets of values in a database that should be mapped as missing values when the database is read by Madeline.
On the command line, Madeline provides the following external representations of internal missing value indicators for the user to use:
Some supported database formats, such as flat files and FoxPro database files, do not provide native missing value support for character and numeric types. Even when missing value support is provided by a database format, protocols in a study may require that different types of missing value codes be used when recording missing values. For example, in the FUSION Los Angeles data, different negative integers were used to code for assay pending, no assay, and no tube conditions.
Madeline therefore permits the user to specify lists of values that are to be treated as missing values. These lists of missing value indicators are stored in two arrays. CharacterMissingValue[] is used whenever character fields, including genotype fields, are referenced. NumericMissingValue[] is used whenever numeric fields are referenced (Table 1.1). For simplicity, these arrays can be referenced using their abbreviated names, cmv[] and nmv[], respectively.
| Full Name | Abbreviated Name | Default Values |
| CharacterMissingValue[] | cmv[] | cmv[0] = "." cmv[1] = "/" cmv[2] = "0/0" cmv[3] = "0/ 0" cmv[4] = "0/ 0" |
| NumericMissingValue[] | nmv[] | nmv[0] = -9999 |
When data are read from a database, all native missing values (for example, a space-padded blank entry is a native missing value indicator in a flat file) and any values that match the values specified in Madeline’s CharacterMissingValue[] or NumericMissingValue[] arrays are converted to Madeline’s internal missing value indicators.
At startup, CharacterMissingValue[] and NumericMissingValue[] contain a set of default missing value indicators appropriate to most FUSION data. New values can be assigned to existing cells or appended to the end of these lists as required by the user (Fig. 1.5): this should be done before a database is opened so that the values will be recognized appropriately. The autorun.bat batch file is an appropriate place to set character and numeric missing value indicators. Note that all arrays in Madeline are zero-offset.
M>list cmv                <-- view CharacterMissingValue array
CMV has 5 elements:
CMV[ 0]="."
CMV[ 1]="/"
CMV[ 2]="0/0"
CMV[ 3]="0/ 0"
CMV[ 4]="0/ 0"
M>cmv[5]="./."            <-- append new value to end of list
M>list cmv
CMV has 6 elements:
CMV[ 0]="."
CMV[ 1]="/"
CMV[ 2]="0/0"
CMV[ 3]="0/ 0"
CMV[ 4]="0/ 0"
CMV[ 5]="./."
M>list nmv                <-- view NumericMissingValue array
NMV has 1 element:
NMV[ 0]= -9999
M>nmv[0]=-1               <-- overwrite one value
M>nmv[1]=-9               <-- and append another value
M>list nmv
NMV has 2 elements:
NMV[ 0]= -1
NMV[ 1]= -9
M>
Fig. 1.5. Assigning missing value indicators. Missing value indicators may be assigned to existing cells or appended to the ends of Madeline’s character and numeric missing value lists.
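The mapping of external values to a single internal missing value can be illustrated with a minimal Python sketch. The names below are hypothetical, `None` stands in for Madeline's internal missing value indicator, and only a subset of the default character indicators is listed; it is a sketch of the idea, not of Madeline's implementation.

```python
# Subset of the default indicators described above (illustrative only).
CHARACTER_MISSING = [".", "/", "0/0"]
NUMERIC_MISSING = [-9999.0]


def read_character(raw, missing=CHARACTER_MISSING):
    """Trim space padding; return None for blank or listed missing values."""
    value = raw.strip(" ")
    if value == "" or value in missing:
        return None
    return value


def read_numeric(raw, missing=NUMERIC_MISSING):
    """Convert to a double; return None for blank or listed missing values."""
    text = raw.strip(" ")
    if text == "":
        return None
    number = float(text)
    return None if number in missing else number
```

Appending a value to the list before reading, as in `read_numeric("-9", missing=NUMERIC_MISSING + [-9.0])`, corresponds to assigning `nmv[1]=-9` before a database is opened.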
Upon opening a pedigree table, Madeline categorizes each field into one of three categories:
When a field is completely empty or contains only missing values, Madeline assigns the field to a null category represented by an asterisk, "*".
When required, Madeline allows the user to designate a subset of "P" phenotype fields as "V" covariate fields using the toggle command. Madeline does not automatically assign fields to the covariate category. Field categories are summarized in Table 1.2 and described in greater depth below.
| Data Category | Symbolic Designation | Description |
| Core | C | Set of five required fields like GenderField that must be present in all pedigree databases, plus additional optional fields, like AffectionStatusField, that are not required by default but may be required for some operations. |
| Genotype | G | Character fields containing two numeric labels separated by a forward slash character, e.g., "141/142" |
| Phenotype | P | Character, numeric, or date fields that contain categorical or continuous phenotype information. |
| Covariate | V | A subset of phenotype fields that are to be used as covariates. The user must use the toggle command to change the designation of a "P" field to "V". |
| Null | * | Character, numeric, or date fields that are completely empty or contain only missing value indicators. In general, these fields cannot be operated upon. |
Core "C" data fields provide key information about an individual (Table 1.3). Madeline identifies core fields by their names. These names are stored in internal variables whose values may be reassigned by the user. In conformance with the requirements of the supported database types, field names must be written entirely in capital letters and cannot exceed 10 characters in length. Madeline automatically capitalizes and truncates any non-conformant field name identifiers.
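The capitalization and truncation rule for field name identifiers amounts to the following, shown as a small Python sketch with a hypothetical function name. It assumes truncation simply keeps the first 10 characters.

```python
MAX_FIELD_NAME_LENGTH = 10  # xbase limit described in the text


def normalize_field_name(name: str) -> str:
    """Capitalize a field name and keep only its first 10 characters."""
    return name.upper()[:MAX_FIELD_NAME_LENGTH]
```

Under this assumption, an identifier such as "IndividualID" would be stored as "INDIVIDUAL".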
Core data fields are either required or optional.
The absence of one or more of the five required core fields will generate an error when a database is opened (an exception applies when FUSION 1 data are used; see below).
Optional core fields may be required for some operations, but are not required by default. Madeline makes use of the additional information provided in optional core fields whenever they are present. For example, Madeline’s pedigree drawing functionality is enhanced by the presence of fields for affection, death, index case, monozygotic and dizygotic twin status.
| Variable Name | Description | Default Value | Expected Field Type |
| I. Required Core Fields, which must always be present [1]: | | | |
| 1. IndividualIDField | Individual identifier | "STUDYID" | Character only |
| 2. FatherIDField | Father's identifier | "FATHER" | Character only |
| 3. MotherIDField | Mother's identifier | "MOTHER" | Character only |
| 4. GenderField | Gender | "SEX" | Character or numeric |
| 5. FamilyIDField [1] | Family identifier | "FAMID" | Character only |
| II. Optional Core Fields: | | | |
| AffectionStatusField | Affection status | "NAFFECTE" | Numeric or character |
| DeathStatusField | Death status | "DECEASED" | Numeric or character |
| IndexCaseField | Index case or proband indicator | "PROBAND" | Numeric only |
| LiabilityClassField | Liability class | "LCLASS" | Numeric or character |
| MZTwinField | Monozygotic twin status indicator | "TWIN" | Character only |
| DZTwinField | Dizygotic twin status indicator | "DZTWIN" | Character only |
| DateOfBirthField | Date of birth | "DOB" | Date only |
| DateOfDeathField | Date of death | "DOD" | Date only |
[1] The FamilyIDField is not required when data are restricted to FUSION 1 IDs only.
Madeline interprets data from required and optional core fields in order to reconstruct pedigrees and evaluate key information. A clear understanding of how Madeline interprets core data is essential to proper use of the program.
Use of Arrays To Map External Values Into Internal Meanings
A key aspect of Madeline’s generality and flexibility is the use of a set of arrays to map external data values into internal meanings. We have already seen how Madeline uses CharacterMissingValue[] and NumericMissingValue[] in order to map external missing value indicators to uniform internal missing value representations. If a value in a core field such as the field for gender, affection status, or death status does not map to a missing value, Madeline uses a designated array for mapping the external categorical value into an internal representation.
For example, suppose the GenderField contains the value "F" for some record. Since "F" is not a missing value listed in CharacterMissingValue[], Madeline looks in CharacterSexValue[] (abbreviated as csv[]) and sees that "F" matches the second entry in the list, which is the entry reserved for female, _female. That is:
| "F" = CharacterSexValue[ 1 ] = CharacterSexValue[ _female ] |
So, Madeline knows that the individual is a female and records this information internally.
To ensure that Madeline recognizes values in core fields correctly, assignments to the designated arrays must be made before any open or load command.
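The two-step lookup described above (missing value list first, then the designated mapping array) can be sketched in Python. The names are hypothetical and `None` stands in for the internal missing value. Raising an error for an unrecognized value is an assumption made for the sketch, since this section does not say how Madeline treats such values.

```python
MALE, FEMALE = 0, 1  # mirror the _male and _female constants

# Arrays indexed by the constants above; defaults as given in the manual.
character_sex_value = ["M", "F"]
character_missing_value = [".", "/", "0/0"]  # illustrative subset


def interpret_gender(raw: str):
    """Map an external gender code to MALE, FEMALE, or None (missing)."""
    value = raw.strip(" ")
    if value == "" or value in character_missing_value:
        return None
    if value in character_sex_value:
        # The position in the array is the internal meaning.
        return character_sex_value.index(value)
    # Assumption: unrecognized codes are reported rather than guessed.
    raise ValueError(f"unrecognized gender code: {value!r}")
```

Reassigning `character_sex_value[FEMALE] = "N"` before reading data would play the role of the assignment `csv[_female]="N"` made before an open command.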
Database Field Naming Conventions
Different database file formats impose different restrictions on the length and format of field names in a database. For example, up to 10 characters can be used for field names in an xbase file, but only up to 8 characters in a SAS transport file. Although Madeline now supports several different file formats, the program originally only supported the xbase file format. As a result, Madeline restricts field name identifiers as follows:
Madeline does not actively check for errors such as spaces or disallowed characters in field identifiers. This is the user's responsibility. Madeline also has no way of knowing in advance what type of database file will be opened. For example, the program will not notice if you enter a ten-letter name for use with a SAS transport file that permits only 8-letter field identifiers.
The value in FamilyIDField tells Madeline the name of the family ID field in the database. The default value is "FAMID".
The FamilyIDField is not strictly required when FUSION-compliant IDs are used. When the FamilyIDField is not present, Madeline automatically extracts the family identifier from individual IDs which "look" like FUSION IDs. However, FUSION 2 databases are likely to have "95x" individuals who are connected to pedigrees via unstudied individuals who are assigned IDs that are not FUSION-compliant. Thus, the FamilyIDField is required when reading such databases.
Individual and Parental Identifiers
The values in IndividualIDField, FatherIDField, and MotherIDField serve to identify the individual and parent ID fields in the database. The default values are "STUDYID", "FATHER", and "MOTHER", respectively.
Parent IDs should be present in both the FatherIDField and MotherIDField of all non-founder individuals. The program interprets any individual with missing value indicators for both parents as a founder.
In the event that one of the two parent IDs is missing for an individual or individuals in a sibship, Madeline provides a randomly-generated eight-letter identifier to represent the missing parent. The randomly-generated IDs begin and end with exclamation marks to distinguish them from regular IDs. Using the generated ID, Madeline constructs a virtual parent in memory who will appear on pedigree drawings (Fig. 1.6) and in output from the write command. Madeline assumes that the sibs are full sibs sharing a single pair of parents.
Fig. 1.6. Virtual parent in Madeline. A virtual parent with a randomly-generated ID (right) is constructed when the ID of one parent is missing among a sibship of individuals (not shown). Sibs are assumed to be full sibs.
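A generated identifier of the kind described above might be produced as in the following Python sketch. The function name is hypothetical, and the sketch assumes the eight characters include the two exclamation marks, which the manual does not specify.

```python
import random
import string


def make_virtual_parent_id(rng=None) -> str:
    """Return an ID such as '!QWERTY!' for a virtual parent.

    Assumption: the total length is eight characters, i.e. six random
    capital letters enclosed in exclamation marks.
    """
    rng = rng or random
    letters = "".join(rng.choice(string.ascii_uppercase) for _ in range(6))
    return f"!{letters}!"
```

Because regular IDs do not begin and end with exclamation marks, an ID of this form cannot be confused with one read from the database.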
When FUSION-compliant IDs are used, it is possible to leave the FatherIDField and MotherIDField of non-founders both missing in cases where Madeline can determine the IDs of the parents. For example, Madeline knows that the parental IDs of a "100" or "401" individual must end in "200" and "300" for the father and mother, respectively. Madeline first looks for parents sampled during FUSION 1 or FUSION 2 in the database. If parents are not found in the database, Madeline dummies in virtual parents using FUSION 1 IDs. In other cases, if only one of the two parent IDs is missing, Madeline can reconstruct the correct ID of the missing parent from the parent whose ID is provided. For example, if an "801" individual is the offspring of a "402" sib, the missing parent's ID must end in "702".
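The two inferences in the examples above can be written out as a Python sketch. The function names are hypothetical, and the sketch covers only the cases stated in the text (parents of a proband or sibling, and the spouse of a sibling); it makes no claim about how Madeline handles other individual codes.

```python
def expected_parent_suffixes(individual_id: str):
    """Return (father, mother) ID endings for a proband or a sibling.

    Returns None for any other individual code, where the text gives
    no fixed rule.
    """
    if individual_id == "100" or individual_id.startswith("4"):
        return ("200", "300")
    return None


def spouse_of_sibling(sibling_id: str) -> str:
    """Return the ID ending of a sibling's spouse, e.g. '402' -> '702'."""
    if not (len(sibling_id) == 3 and sibling_id.startswith("4")):
        raise ValueError(f"not a sibling code: {sibling_id!r}")
    return "7" + sibling_id[1:]
```

So for an "801" individual whose known parent ends in "402", `spouse_of_sibling("402")` gives "702" as the ending of the missing parent's ID.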
The default value for GenderField is "SEX". The GenderField can be either numeric or character. Madeline detects the field type when the database is opened. Madeline defines two constants, _male, which has a value of 0, and _female, which has a value of 1. These symbolic constants are used for indexing two arrays, NumericSexValue[] and CharacterSexValue[]. These arrays define the external values used in a database to designate gender (Table 1.4). Default values may be reassigned by the user as required.
| Array Name | Abbreviated Name | Default Values |
| CharacterSexValue[] | csv[] | csv[_male ] = "M" csv[_female] = "F" |
| NumericSexValue[] | nsv[] | nsv[_male ] = 0 nsv[_female] = 1 |
In Madeline, only terminal individuals without offspring may retain a gender attribute of missing. If during pedigree reconstruction Madeline detects any father or mother with a missing gender attribute, the program will automatically change the gender of the individual in memory to be consistent with the reconstruction, and will warn the user of the change. The database file on disk will not be changed.
Madeline will also automatically correct the gender attribute of mislabeled individuals in memory, for example, of a male listed as a mother, or of a female listed as a father. Madeline always warns the user of these types of database errors. Again, the database file on disk will not be changed -- that is the user's responsibility.
Madeline will warn the user and then terminate abruptly if conflicting and unresolvable gender roles exist for an individual, for example if an individual is listed as both a mother and a father.
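The gender checks described in the last three paragraphs can be summarized in a Python sketch. The record layout (dictionaries with "id", "father", "mother", and "sex" keys) and the function name are assumptions for illustration; the sketch changes records in memory only, as Madeline does.

```python
MALE, FEMALE = 0, 1


def reconcile_genders(records):
    """Make each parent's gender consistent with its parental role.

    Returns a list of warning strings. Raises ValueError if an individual
    is listed as both a father and a mother.
    """
    fathers = {r["father"] for r in records if r["father"]}
    mothers = {r["mother"] for r in records if r["mother"]}
    conflicted = fathers & mothers
    if conflicted:
        raise ValueError(f"listed as both father and mother: {sorted(conflicted)}")

    warnings = []
    for r in records:
        if r["id"] in fathers:
            implied = MALE
        elif r["id"] in mothers:
            implied = FEMALE
        else:
            continue  # terminal individuals may keep a missing gender
        if r["sex"] != implied:
            warnings.append(f"gender of {r['id']} changed from {r['sex']} to {implied}")
            r["sex"] = implied
    return warnings
```

A missing gender on a father or mother and a mislabeled parent both produce a warning and an in-memory correction, while an individual appearing in both parental roles stops processing.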
Monozygotic and Dizygotic Twin Data
The MZTwinField should remain blank or missing for non-twins, and should contain a single-letter identifier for each twin pair or group. For example, "A" can be used to designate the first twin pair in a family, "B" the second pair, and so on. Starting with version 0.90 of the program, MZTwinField is considered an optional core field.
The optional DZTwinField, if present, should be coded in a similar manner to designate dizygotic twins.
The AffectionStatusField may be either numeric or character. Madeline defines two symbolic constants for describing the affection status of sampled individuals (the underscores are used to avoid confusion with possible field names and are required):
In addition to these two categories, Madeline also recognizes these additional categories for mapping unstudied individuals:
These additional categories are useful for drawing extended pedigrees which may include unstudied individuals in addition to sampled individuals. Madeline defines two arrays, CharacterAffectionStatus[] and NumericAffectionStatus[], for mapping external affection status values to one of the five internally recognized categories (Table 1.5).
| Array Name | Abbreviated Name | Default Values |
| CharacterAffectionStatus[] | cas[] | cas[_unaffected] = "0" cas[_affected ] = "1" cas[_UnstudiedUnaffected] = "2" (unstudied, reported unaffected) cas[_UnstudiedAffected ] = "3" (unstudied, reported affected) cas[_UnstudiedConflicting] = "4" (unstudied, conflicting reports) |
| NumericAffectionStatus[] | nas[] | nas[_unaffected] = 0 nas[_affected ] = 1 nas[_UnstudiedUnaffected] = 2 (unstudied, reported unaffected) nas[_UnstudiedAffected ] = 3 (unstudied, reported affected) nas[_UnstudiedConflicting] = 4 (unstudied, conflicting reports) |
Note that categories 2-4 refer only to unstudied individuals. Guard against using the externally mapped values of categories 2-4 for sampled individuals, especially if the write command is used to produce a file for analysis.
The optional DeathStatusField may be either numeric or character. Madeline defines the constants _alive, with a value of 0, and _dead, with a value of 1, for indexing the CharacterDeathStatus[] and NumericDeathStatus[] arrays used to map external values in the DeathStatusField into internal representations (Table 1.6).
| Array Name | Abbreviated Name | Default Values |
| CharacterDeathStatus[] | cds[] | cds[_alive] = "N" cds[_dead ] = "Y" |
| NumericDeathStatus[] | nds[] | nds[_alive] = 0 nds[_dead ] = 1 |
The optional IndexCaseField must be numeric. Madeline assumes that the probands or index cases will be coded using a value of 1, and all other individuals with a value of 0.
When FUSION-compliant IDs are used, Madeline automatically determines which individuals are probands directly from the IndividualIDField, making the IndexCaseField unnecessary.
Some output formats, such as Genehunter, have the option of including liability class information. The LiabilityClassField may be numeric or character. Madeline does not interpret the values in this field.
The DateOfBirthField and DateOfDeathField are optional core date fields. When present, Madeline performs checks to ensure that dates in these fields are reasonable, and looks for twins based on date of birth who have not been designated as such in the MZTwinField or DZTwinField.
Genotype "G" data are character fields that contain allelic marker data separated by the forward slash "/" character. The allele labels themselves must be numeric, non-alphabetic labels, e.g. "141/142".
The names of genotype fields should be the names of the markers themselves. This allows Madeline to automatically place the genotype fields into map order whenever a map database for the markers is loaded using the load command. Make sure that marker names in the map database are capitalized to correspond with the required capitalization of field names.
Estimation of Allele Frequencies from Genotype Data
When a database is opened, Madeline automatically estimates allele frequencies for all genotype fields by gene counting, ignoring family relationships. Allele frequencies are estimated from all records in a database. Allele frequencies calculated from one database may be saved for use when processing another database using the set SaveAlleleFrequencies on command.
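Gene counting without regard to family relationships reduces to tallying every observed allele across all records, as in this minimal Python sketch. The function name and the list of missing indicators are assumptions for illustration, not Madeline's implementation.

```python
from collections import Counter


def estimate_allele_frequencies(genotypes, missing=(".", "/", "0/0")):
    """Estimate allele frequencies from genotype strings such as '141/142'.

    Every non-missing genotype contributes two allele counts; pedigree
    structure is ignored.
    """
    counts = Counter()
    for genotype in genotypes:
        genotype = genotype.strip(" ")
        if genotype == "" or genotype in missing:
            continue
        first, second = genotype.split("/")
        counts[first] += 1
        counts[second] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}
```

For the genotypes "141/142" and "141/141", three of the four counted alleles are "141", giving estimated frequencies of 0.75 and 0.25.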
Phenotype "P" fields are any remaining fields that are not core "C" or genotype "G" fields. Phenotype fields may be character, numeric, or date fields, and are assumed to contain categorical or continuous phenotype information. Because date fields cannot be written to output from the write command, date fields are the only type of phenotype field not flagged for output when a pedigree database is opened.
For some types of output, it may be necessary to designate certain phenotype fields as representing covariates. Madeline therefore maintains a separate covariate or "V" field category which is a subset of the "P" category. Covariate fields are automatically recognized as phenotype fields when writing any format that doesn’t distinguish between phenotype and covariate fields. "P" fields can be marked as "V" fields using the toggle command.
Marking and Ordering Data Fields for Output
When a pedigree database is opened, most core "C" fields, all genotype "G" fields, and all phenotype "P" fields (except date fields), are flagged, or toggled on, for output by default. Madeline indicates which fields in a database are toggled for output by placing the letter "o" after the category indicator "C","G", or "P" (Fig. 1.7). A number after the "o" indicates the order in which fields will appear in pedigree drawings and file output. Fields may be manually reordered using the set field order command.
M>list fields
 1.FAMID    Co__1     20.D20S482  Go__6     39.D20S96   Go_25
 2.STUDYID  Co__2     21.D20S849  Go__7     40.D20S119  Go_26
 3.SEX      Co__3     22.D20S905  Go__8     41.D20S481  Go_27
 4.FATHER   Co__4     23.D20S846  Go__9     42.D20S836  Go_28
 5.MOTHER   Co__5     24.D20S892  Go_10     43.D20S888  Go_29
 6.TWIN     Co__6     25.D20S115  Go_11     44.D20S886  Go_30
 7.NAFFECTE Co__7+    26.D20S851  Go_12     45.D20S197  Go_31
 8.BMI      Po__1     27.D20S917  Go_13     46.D20S178N Go_32
 9.INS_FAST Po__2     28.D20S894  Go_14     47.D20S866  Go_33
10.INS_2H   Po__3     29.D20S189  Go_15     48.D20S196  Go_34
11.BW_REAL  Po__4     30.D20S898  Go_16     49.D20S857  Go_35
12.GLU_FAST Po__5     31.D20S114  Go_17     50.D20S480  Go_36
13.GLU_2H   Po__6     32.D20S912  Go_18     51.D20S211  Go_37
14.GAD_DUP  Po__7     33.D20S477  Go_19     52.D20S840  Go_38
15.D20S103  Go__1     34.D20S874  Go_20     53.D20S120  Go_39
16.D20S117  Go__2     35.D20S195  Go_21     54.D20S100  Go_40
17.D20S906  Go__3     36.D20S909  Go_22     55.D20S102  Go_41
18.D20S193  Go__4     37.D20S107  Go_23     56.D20S171  Go_42
19.D20S889  Go__5     38.D20S170  Go_24     57.D20S173  Go_43
M>
Fig. 1.7. Categorization of Fields in Madeline. The plus "+" sign after NAFFECTE indicates that Madeline has detected this field as the AffectionStatusField: categorical levels of this field will be used to color icon symbols on pedigree drawings. A field listing is shown when a database is first opened, or at any other time using the list fields command.
The order of genotype fields is automatically set to map order when a marker map database is loaded using the load command. Load can be issued either before (the preferred method) or after an open command. The order of genotype fields whose names match the names of markers in the map database will be set to the map order.
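Placing genotype fields into map order is essentially a sort keyed on the map, as in this Python sketch. The function name and the dictionary layout are hypothetical. Leaving unmatched fields after the matched ones, in their original relative order, is an assumption, since the text does not say where Madeline puts them.

```python
def order_by_map(field_names, marker_map):
    """Return genotype field names sorted into map order.

    marker_map maps a marker name to a (chromosome, ordinal) pair.
    Fields without a map entry are appended unchanged (an assumption).
    """
    mapped = [name for name in field_names if name in marker_map]
    unmapped = [name for name in field_names if name not in marker_map]
    return sorted(mapped, key=marker_map.__getitem__) + unmapped
```

Because the lookup is by exact name, a marker spelled in lower case in the map would not match a capitalized field name, which is why marker names in the map database should be capitalized.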
Fields toggled on for output are displayed in pedigree drawings created with the draw command.
When a write command is executed, the set of core "C" fields required by the specific format being produced will generally be output regardless of the on/off output flag status. For example, Madeline will output the GenderField even if you toggle it off because it is required for almost all output formats. This behavior is required to ensure proper file construction. Genotype "Go" fields toggled for output will be written, along with phenotype "Po" (and possibly covariate "Vo") fields toggled for output if the analysis format supports phenotype fields. Some analysis programs, such as Genehunter and Siblink, do not use phenotype data beyond affection status (which is a core field).
Fields may be toggled on or off for output using the toggle command.
Madeline makes use of marker map information to:
The load command is used to load a table containing genetic maps for one or more chromosomes. It may contain only one map for each chromosome. The map database must contain fields of information specifying the chromosome, rank or ordinal position of the marker within the map for a given chromosome, name of the marker, and the position of the marker in centiMorgans (Table 1.7). A map may be viewed using the list map command (Fig. 1.8).
| Variable For Storing Field Name | Default Value | Description |
| ChromosomeField | "CHROMOSOME" | Numeric field storing the chromosome number. |
| OrdinalField | "ORDINAL" | Numeric field storing the ordinal position or rank of the marker on the map for this chromosome. |
| MarkerField | "MARKERNAME" | Character field storing the name of the marker. |
| PositionField | "POSITION" | Numeric field storing the map position from the p terminus in centiMorgans. |
M>load '\maps\newmaps.dbf'
Marker maps based on \maps\newmaps.dbf are now installed.
M>list map for chromosome=7
Marker Name Ch Or Position
----------- -- -- --------
D7S2477      7  1   0.0000
D7S531       7  2   5.4000
D7S517       7  3   7.7000
D7S513       7  4  19.1000
D7S493       7  5  36.1000
D7S516       7  6  43.8000
D7S484       7  7  55.6000
D7S510       7  8  62.7000
D7S2422      7  9  74.2000
D7S669       7 10  87.4000
D7S657       7 11 102.6000
D7S515       7 12 111.8000
D7S2502      7 13 124.9000
D7S530       7 14 134.1000
D7S640       7 15 140.5000
D7S495       7 16 145.7000
D7S2513      7 17 150.9000
D7S483       7 18 167.7000
D7S550       7 19 182.4000
M>
|
Fig. 1.8. Loading and viewing marker maps in Madeline. A map database is installed using the load command. The list map command is used to print a table showing marker name, chromosome, mapped order, and position in centiMorgans.
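The effect of a loaded map on genotype field order can be illustrated outside Madeline. The following Python sketch is not Madeline code: the names MapRow and order_by_map are hypothetical, and two behaviors are assumptions of the sketch rather than documented facts, namely that name matching ignores capitalization and that fields matching no marker are kept after the mapped ones.

```python
from typing import NamedTuple


class MapRow(NamedTuple):
    """One row of a map table (hypothetical structure for this sketch)."""
    chromosome: int   # value of the ChromosomeField
    ordinal: int      # value of the OrdinalField
    marker: str       # value of the MarkerField
    position: float   # value of the PositionField, in centiMorgans


def order_by_map(genotype_fields, map_rows):
    """Return genotype field names sorted by chromosome, then ordinal.

    Assumption: fields that match no marker keep their relative order
    and are placed after the mapped fields.
    """
    rank = {row.marker.upper(): (row.chromosome, row.ordinal) for row in map_rows}
    mapped = [f for f in genotype_fields if f.upper() in rank]
    unmapped = [f for f in genotype_fields if f.upper() not in rank]
    mapped.sort(key=lambda f: rank[f.upper()])
    return mapped + unmapped


# First three markers of the chromosome 7 map shown in Fig. 1.8.
chr7 = [
    MapRow(7, 1, "D7S2477", 0.0),
    MapRow(7, 2, "D7S531", 5.4),
    MapRow(7, 3, "D7S517", 7.7),
]
fields = ["D7S517", "D7S2477", "D7S531"]
print(order_by_map(fields, chr7))
```

Sorting on the (chromosome, ordinal) pair rather than on position keeps the sketch independent of the map units.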
When using FUSION data with Madeline v. 0.90
and above, be sure to include the following two lines in your batch file,
or in the autorun.bat file, in order to define the
map database field names used in FUSION:
OrdinalField ="POSITION"
PositionField="KOSAMBICM"
|
Log and Error Reporting Features
Madeline produces three types of log files (Table 1.8). The first is a summary file that has a ".log" extension by default and records each command that was entered and a summary of execution results. For example, results of a write command indicate how many pedigrees and individuals were included, how many were excluded, and the total number of pedigrees and individuals. The second is a detail file that has a ".dtl" extension by default. It provides detailed information on which pedigrees and individuals were excluded and why they were excluded. The third log file is an error log that has a ".err" extension by default and records warning and error conditions that occur.
| Type of File | Default Name | Purpose |
| Summary | madeline.log | Records commands and summaries of execution results. |
| Detail | madeline.dtl | Records details regarding inclusion and exclusion of individuals and pedigrees. |
| Error | madeline.err | Records warning and error conditions. |
Display of Warning and Error Levels
If manageable errors do occur when a new pedigree database is opened, Madeline’s
interactive "M>" prompt changes to display the number and type of error conditions
detected. For example, "1 SYNTAX ERROR 10 WARNINGS M>"
would indicate that one syntax error and 10 manageable error conditions or
warnings occurred. Altogether, the program maintains four categories of warnings
and errors:
A syntax error refers to an error in typing a command on the command line or in a batch file. A warning often indicates a manageable database error such as having only one instead of both parents listed in a database. A severe warning indicates a more severe type of database error such as having a male listed as the mother of an individual. Madeline will try to manage this type of situation, for example by changing the sex of the "male" mother to female. Such a change does not guarantee that the situation is remedied, much less correct: later in the same database, the "male" mother may turn out to be listed as the "father" of another child! This would cause a fatal error, causing the program to terminate, because there is no way to rectify such inconsistent information. The warning and error conditions may be reviewed in the error log.
Pedigree Reconstruction and the Categorization of Individuals
When a pedigree database is opened, Madeline reconstructs pedigrees based on the core data fields. When records for the parents of non-founder individuals are absent from the database, Madeline dummies-in the parents using the IDs shown in the FatherIDField and MotherIDField. If one of the two parental IDs is missing, Madeline creates a random ID for the missing parent. Random IDs are always eight characters in length and begin and end with an exclamation point (e.g., "!EW12M5!", "!G79ER5!", etc.) to facilitate recognition.
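If you post-process Madeline output with your own scripts, it can help to recognize these placeholder IDs. The Python sketch below is an illustration only, not Madeline's generator: the function names are hypothetical, and the choice of uppercase letters and digits for the six inner characters is an assumption based on the examples shown.

```python
import random
import string

# Assumption: inner characters are uppercase letters and digits,
# as in the examples "!EW12M5!" and "!G79ER5!".
ALPHABET = string.ascii_uppercase + string.digits


def make_placeholder_id(rng=random):
    """Build an eight-character ID that begins and ends with '!'."""
    return "!" + "".join(rng.choice(ALPHABET) for _ in range(6)) + "!"


def is_placeholder_id(value):
    """True if value has the shape described for Madeline's random IDs."""
    return len(value) == 8 and value[0] == "!" and value[-1] == "!"


print(is_placeholder_id("!EW12M5!"))
print(is_placeholder_id(make_placeholder_id()))
```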
When FUSION IDs are used, Madeline dummies-in parents
even when parental IDs are not provided in the FatherIDField and
MotherIDField, and joins together spouses when they don’t have any offspring.
After reconstructing pedigrees, Madeline classifies individuals into categories (Table 1.9) and summarizes their distribution in a table (Fig. 1.9). Attached individuals are individuals in the database who have either parents, or offspring, or both. Unattached individuals are in the database, but remain unconnected because they don’t have parents or offspring. Unattached individuals often represent a set of unrelated controls in a data set.
In the current version, childless spouses can only
be detected when FUSION IDs are employed. When a FUSION couple does not have children
listed in the database, usually one of the individuals has other connections to the
pedigree and falls into the attached category, while the remaining spouse
usually has no other connections to the pedigree and so is categorized as a
childless spouse.
| Category | Description |
| In Database: | |
| Attached | Individuals in the database who have parents and/or offspring. |
| Childless Spouses | Married individuals in the database who do not have children and who are not otherwise attached to a pedigree. |
| Unattached | Individuals in the database who remain unconnected. These may be controls. |
| Not In Database: | |
| Not In Database | Parents without records in the database who are inserted by Madeline. |
M> open '\test\test.dbf'
.
.
.
----------------------------- --------- --------- ---------
Pedigrees and Individuals Included Excluded Total
----------------------------- --------- --------- ---------
Pedigrees ................... 590 0 590
Individuals ................. 3,317 0 3,317
+ In database .............. 2,178 0 2,178
| + Attached .............. 2,164 0 2,164
| + Childless spouses ..... 14 0 14
| + Unattached ............ 0 0 0
+ Not in database .......... 1,139 0 1,139
M>
|
Fig. 1.9. Summary table of pedigree count and distribution of individuals by category in Madeline. After a database is opened and pedigrees reconstructed, Madeline displays a table showing the number of pedigrees and distribution of individuals by category.
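The three non-FUSION categories can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Madeline's algorithm: the function name categorize and the input structure are hypothetical, childless spouses are not modeled (they depend on FUSION IDs), and no random ID is created when only one parent is named.

```python
def categorize(records):
    """Classify individuals from a mapping of ID -> (father_id, mother_id).

    A parental ID of None means the field is empty. Returns a dict that
    maps every ID, including parents lacking their own record, to
    'attached', 'unattached', or 'not in database'.
    """
    parents_of = {}
    has_offspring = set()
    for child, (father, mother) in records.items():
        named = [p for p in (father, mother) if p is not None]
        parents_of[child] = named
        has_offspring.update(named)

    categories = {}
    for person in records:
        if parents_of[person] or person in has_offspring:
            categories[person] = "attached"      # has parents and/or offspring
        else:
            categories[person] = "unattached"    # possibly a control
    for parent in has_offspring:
        if parent not in records:
            categories[parent] = "not in database"  # would be dummied-in
    return categories


example = {
    "100": ("200", "300"),
    "401": ("200", "300"),
    "200": (None, None),
    "900": (None, None),
}
print(categorize(example))
```

In this example, "300" is named as a mother but has no record of her own, so she falls into the not-in-database category, while "900" has no connections and is unattached.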
Madeline provides _unattached, _ChildlessSpouse, and _InDatabase as references which return boolean status information regarding the categorization of an individual. These references can be easily used in queries to find out about the categorization of individuals (Fig. 1.10).
M>view for _ChildlessSpouse
0007-500 in 0007 (rec. no. 42) * childless spouse *
0049-500 in 0049 (rec. no. 276) * childless spouse *
0409-500 in 0409 (rec. no. 2433) * childless spouse *
0442-500 in 0442 (rec. no. 2628) * childless spouse *
0497-500 in 0497 (rec. no. 2912) * childless spouse *
1040+500 in 1040 (rec. no. 3917) * childless spouse *
1360+500 in 1360 (rec. no. 4853) * childless spouse *
1366+500 in 1366 (rec. no. 4862) * childless spouse *
8 individuals in 8 pedigrees matched as follows:
Individuals .............. 8
+ In database ........... 8
| + Attached ........... 0
| + Childless spouses .. 8
| + Unattached ......... 0
+ Not in database ....... 0
M>
|
Fig. 1.10. References returning boolean status information about individuals, such as _ChildlessSpouse, can be easily incorporated into queries in Madeline.
Data Classifications of Individuals
Before writing a file in a specific format using the write command, Madeline determines which individuals in a pedigree have data that can be used in an analysis of that pedigree. Madeline does this by examining the phenotype "Po" and genotype "Go" fields toggled on for output. Madeline uses this information when deciding which individuals are required in output. This is described in more detail in Data Evaluation and Management.
After the file has been written, Madeline displays a summary table showing the distribution of included and excluded pedigrees and individuals by category (Fig. 1.11). In this table, Madeline sub-categorizes attached individuals based on whether they have data or not, or have been otherwise marked for exclusion by the user. Note that individuals marked for exclusion may actually be included in output, but without their data, in order to preserve pedigree structure.
M>write to '\test\test.ped' in genehunter format
.
.
.
----------------------------- --------- --------- ---------
Pedigrees and Individuals Included Excluded Total
----------------------------- --------- --------- ---------
Pedigrees ................... 574 16 590
Individuals ................. 3,247 70 3,317
+ In database .............. 2,140 38 2,178
| + Attached .............. 2,140 24 2,164
| | + With data .......... 2,139 15 2,154
| | + Without data ....... 1 9 10
| | + Marked for exclusion 0 0 0
| + Childless spouses ..... 0 14 14
| + Unattached ............ 0 0 0
+ Not in database .......... 1,107 32 1,139
M>
|
Fig. 1.11. Summary table after a write command in Madeline. Madeline displays a summary table showing the distribution of included and excluded pedigrees and individuals by category. Attached individuals (in bold) are sub-categorized based on whether they have data or not, or have been marked for exclusion by the user.
When present, Madeline relies on information contained in the MZTwinField, DZTwinField, and DateOfBirthField to evaluate monozygotic and dizygotic twinships. When the optional DateOfBirthField is included, Madeline verifies that birth dates of twins match. Verification is extended to dizygotic twins when the optional DZTwinField is also included.
When the DateOfBirthField is included, Madeline looks for twins who are not marked in either the MZTwinField or the DZTwinField (if present). Apparent twins of opposite sex are categorized as dizygotic twins. Apparent same-sex twins are assigned to a special twin of unknown type category. Twins whose type is unknown are shown with a question mark between them in pedigree drawings.
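The rule for unmarked twins can be sketched as follows. This Python fragment is an illustration, not Madeline's implementation: the function name, the record layout, and the assumption that "apparent twins" means siblings sharing a non-missing birth date are all hypothetical.

```python
from itertools import combinations


def apparent_twins(sibship):
    """Find sibling pairs that share a birth date but carry no twin flag.

    sibship is a list of dicts with 'id', 'sex', 'dob', and 'twin_flag'
    keys. Returns a list of (id_a, id_b, category) tuples.
    """
    found = []
    for a, b in combinations(sibship, 2):
        if a["dob"] is None or a["dob"] != b["dob"]:
            continue  # birth dates missing or different
        if a["twin_flag"] or b["twin_flag"]:
            continue  # already marked in the MZTwinField or DZTwinField
        # Opposite-sex pairs must be dizygotic; same-sex pairs are undetermined.
        category = "dizygotic" if a["sex"] != b["sex"] else "unknown type"
        found.append((a["id"], b["id"], category))
    return found


sibs = [
    {"id": "401", "sex": "M", "dob": "1950-03-02", "twin_flag": False},
    {"id": "402", "sex": "F", "dob": "1950-03-02", "twin_flag": False},
    {"id": "403", "sex": "F", "dob": "1953-01-25", "twin_flag": False},
]
print(apparent_twins(sibs))
```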
If Madeline encounters single, unpaired individuals marked as twins in the MZTwinField or DZTwinField, the program automatically removes the twin flag and informs the user of the change. The flag is only altered in memory -- the data table itself remains unchanged.
Messages about twinships are recorded in the summary and detail log files.
Madeline automatically detects consanguinity in pedigrees. Messages about consanguinity are recorded in the summary and detail log files.
There is no limit to the number of spouses that an individual in a pedigree may have. Pedigree drawings can display up to 10 spouses of a single individual.
Madeline can model pedigrees having multiple original founders. When the DividedPages flag is on (the default), Madeline's draw command will draw pedigrees consisting of an ancestral founder with one or more founding spouses on a single virtual page. Pedigrees consisting of two or more founding ancestral mate groups will be printed on multiple virtual pages. (Whether a single virtual page is printed on one or more physical pages depends on the orientation setting and the unscaled dimensions of the drawing.)
Data Evaluation And Management
Prior to writing output in a specific format, Madeline determines which individuals in a pedigree have data that can be used for analysis by examining the genotype "Go" fields and, if appropriate, the phenotype "Po" and covariate "Vo" fields toggled on for output.
In general, an individual is considered to have genotype data if he or she is typed for at least one marker among the set of "Go" fields. If applicable, an individual is considered to have phenotype data if all of his or her "Po" and "Vo" fields are non-missing.
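The asymmetry between the two rules (any marker versus all phenotype fields) is easy to overlook. A minimal Python sketch, assuming records are dictionaries in which None stands for a missing value; the function names are hypothetical and the code is not taken from Madeline:

```python
def has_genotype_data(record, go_fields):
    """True if the individual is typed for at least one 'Go' marker."""
    return any(record.get(f) is not None for f in go_fields)


def has_phenotype_data(record, po_fields, vo_fields=()):
    """True if every 'Po' and 'Vo' field is non-missing.

    Note: with no fields selected this is vacuously True, a choice of
    this sketch rather than documented Madeline behavior.
    """
    return all(record.get(f) is not None for f in (*po_fields, *vo_fields))


person = {"D8S504": "3/5", "D8S550": None, "bmi": 31.1, "cpep": None}
print(has_genotype_data(person, ["D8S504", "D8S550"]))   # one marker typed
print(has_phenotype_data(person, ["bmi", "cpep"]))       # one phenotype missing
```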
After flagging individuals in a pedigree who have usable data, Madeline decides whether the entire pedigree is usable or not. Madeline’s decisions depend on the specific format keyword associated with the write command. For example, using the GenehunterNpl keyword (for a non-parametric analysis) will result in a different set of pedigree exclusions than the genehunter keyword (for a parametric analysis), although there will certainly be overlap in the sets.
Only required individuals in included pedigrees are written to output. Required individuals consist of individuals who:
For example, records for unsampled parents are often required to show relationships among siblings. Terminal individuals without offspring who do not have data are excluded from output. Individuals who have been marked for exclusion by the user using the exclude command will be included, but without their data, only if they are required to maintain pedigree structure. Otherwise, they will be excluded.
It is possible to turn off Madeline's data evaluation machinery for most formats in order to include possibly unusable pedigrees and individuals in output by issuing the command set autoexclude off.
Tracking Inclusion and Exclusion of Pedigrees and Individuals
Madeline’s detail log file records which pedigrees were excluded from output. Fig. 1.12 shows an example detail log produced after requesting an output file in GenehunterNpl format.
.
.
.
GenehunterPedigreeHasData(): excluding pedigree 0547: contains only a single affected
individual.
GenehunterPedigreeHasData(): excluding pedigree 0557: contains only a single affected
individual.
GenehunterPedigreeHasData(): excluding pedigree 0558: lacks an individual with data.
GenehunterPedigreeHasData(): excluding pedigree 0560: contains only a single affected
individual.
GenehunterPedigreeHasData(): excluding pedigree 0572: contains only a single affected
individual.
GenehunterPedigreeHasData(): excluding pedigree 0583: contains only a single affected
individual.
GenehunterPedigreeHasData(): excluding pedigree 0587: contains only a single affected
individual.
.
.
.
|
Fig. 1.12. Excerpt from a Madeline detail log file produced after requesting output in GenehunterNpl format. Madeline’s detail log file records which pedigrees were excluded from output and why.
In addition, a draw command executed after a write command will automatically produce annotated pedigree drawings showing which individuals:
An example is shown in Fig. 1.13. In this example, the user marked individuals with a body mass index (BMI) greater than or equal to 35 for exclusion using the exclude command and then requested an output file in GenehunterNpl format.
Fig. 1.13. Annotated pedigree drawing produced by draw after a write command in Madeline. Madeline dummied-in the two founding parents, "200" and "300", who are indicated by dashed lines. They were included ("INCLUDED") in output. Two individuals, "500" and "601", were marked for exclusion by the user. The terminal individual, "601", was not included in output ("EXCLUDED"), but "500" was retained with data excluded in order to preserve pedigree structure ("DATA EXCL INDV INCL"). The remaining individuals are all annotated as having genotype data and were included in output ("HAS DATA - INCLUDED"). Affected individuals are shaded and labeled with "A", while unaffected individuals are unshaded and labeled with "U".
Madeline provides powerful mechanisms for querying and subsetting records in pedigree tables. Database management systems can generally match query criteria against only one record at a time. In contrast, Madeline is specialized for dealing with multiple relationships in a pedigree simultaneously.
Madeline provides mechanisms for referring to related records within a single query statement. In Madeline, you can reference an individual, his or her mother or father, mates, and offspring all in a single query statement.
You can also reference aggregate or summary information related to an entire sibship, such as the mean sibship value of a variable, as easily as you can reference values related to single individuals. These two mechanisms -- referencing related individuals and referencing sibship aggregate data -- make it easy to get answers to many questions in Madeline that can be tedious to obtain in general database management systems.
Referencing Internal Information About An Individual And Relatives
Madeline allows the user to look at internal information about an individual and his or her relatives using references. References are a subset of keywords which begin with an underscore character to distinguish them from similarly-named variables or fields in databases. There are two types of references:
References to Internal Information About An Individual
Madeline provides references to many items of internal information about an individual, such as the number of offspring (_noffspring) and number of mates (_nmates) an individual has, and total number of individuals in the individual's pedigree (_n). Example usage is shown in Fig. 1.14. Table 5.4 lists all references to internal information.
M>go 1901 <-- go to record no. 1901
M>show studyid <-- display the studyid of this individual
"05100"
M>show bmi <-- display body mass index
48.9809
M>show cpep <-- display c peptide value
0.88
M>show _noffspring <-- display number of offspring
4
M>show _nmates <-- display number of mates
1
M>show _n <-- display total number of individuals in this individual’s pedigree
16
M>
|
Fig. 1.14. References to internal information about an individual in Madeline. Command lines shown in blue are examples of references to internal information that Madeline maintains about each individual.
Madeline also maintains references which point to relatives of an individual (Fig. 1.15). The references to mates, _mate[], and offspring, _o[], are treated as arrays. Alternate references, such as _spouse for _mate[0] and _FirstChild for _o[0], are also provided for convenience.
References can be chained using the dot operator, ".", in order to access information related to more distant relatives. For example, a maternal grandmother may be referenced using _mother._mother. Example usage is shown in Fig. 1.15. A complete list of references to relatives is provided in Table 5.4.
M>go 6174 <-- go to record no. 6174
M>show frstname <-- first name of individual
"William"
M>show lastname <-- last name of individual
"Goodman"
M>show _noffspring <-- number of offspring
11
M>show _nmates <-- number of spouses
1
M>show _mate[0].frstname <-- first name of spouse
"Tessie"
M>show _FirstChild.dob <-- date of birth of first listed child
{Thursday, May 30, 1957}
M>show _SecondChild.dob <-- date of birth of second listed child
{Monday, December 19, 1966}
M>show _o[10].dob <-- date of birth of eleventh listed child
{Sunday, January 25, 1953}
M>show _mother._mother.dob <-- date of birth of maternal grandmother (unknown)
{ }
M>show _mother._mother.lastname <-- last name of maternal grandmother
"Toughwoman"
M>
|
Fig. 1.15. Using References to Relatives in Madeline. Command lines using references to relatives are shown in blue. Note that children in the offspring vector are sorted by IndividualIDField, not by date of birth.
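Readers familiar with object-oriented languages may find it helpful to think of chained references as attribute access on linked records. The following Python analogy is only that, an analogy: the Individual class and its attribute names are hypothetical and say nothing about Madeline's internal data structures. The mother's last name is invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Individual:
    lastname: str
    mother: Optional["Individual"] = None          # analogous to _mother
    father: Optional["Individual"] = None          # analogous to _father
    mates: List["Individual"] = field(default_factory=list)       # _mate[]
    offspring: List["Individual"] = field(default_factory=list)   # _o[]

    @property
    def spouse(self):
        """Convenience alias for mates[0], like _spouse for _mate[0]."""
        return self.mates[0] if self.mates else None

    @property
    def first_child(self):
        """Convenience alias for offspring[0], like _FirstChild for _o[0]."""
        return self.offspring[0] if self.offspring else None


grandmother = Individual("Toughwoman")
mother = Individual("Strongheart", mother=grandmother)  # invented name
person = Individual("Goodman", mother=mother)

# Chaining with "." mirrors the Madeline reference _mother._mother.lastname
print(person.mother.mother.lastname)
```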
In addition to references to individual information and relatives, Madeline provides aggregate functions that allow one to look at aggregate or summary information -- such as means and standard deviations -- of the offspring of an individual (Fig. 1.16).
M>go 1577 <-- go to record no. 1577
M>show studyid <-- display studyid
"044301"
M>show _noffspring <-- display number of offspring
2
M>show _o[0].bmi <-- body mass index of first child
31.1327
M>show _o[1].bmi <-- body mass index of second child
32.7896
M>show _omean(bmi) <-- mean body mass index of offspring
31.9612
M>show _ostddev(bmi) <-- standard deviation of offspring bmi
1.17156
M>
|
Fig. 1.16. Aggregate Functions In Madeline. Aggregate functions (blue) allow one to look at summary information such as means and standard deviations of the offspring of individuals.
All aggregate functions take as an argument an expression which evaluates to a numeric result. Table 6.2 lists the aggregate functions available in Madeline.
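The values in Fig. 1.16 can be checked by hand. The Python sketch below recomputes them from the two displayed BMI values. The function names omean and ostddev are hypothetical stand-ins; the use of the sample (n-1) standard deviation is an inference from the figure, whose result agrees with that formula to within the rounding of the displayed inputs, and is not a documented property of _ostddev.

```python
from statistics import mean, stdev


def omean(values):
    """Arithmetic mean of the offspring values."""
    return mean(values)


def ostddev(values):
    """Sample standard deviation (n-1 denominator); needs two or more values."""
    return stdev(values)


offspring_bmi = [31.1327, 32.7896]   # _o[0].bmi and _o[1].bmi from Fig. 1.16
print(omean(offspring_bmi))
print(ostddev(offspring_bmi))
```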
The view command retrieves a subset of records that match query criteria. The exclude command allows the user to mark a subset of records for exclusion from output. The unexclude command performs the opposite function -- unmarking a subset of records previously marked for exclusion. Starting with version 0.90, the draw command can also be invoked with a query expression in order to draw a subset of pedigrees. Example usage is shown in Fig. 1.17.
M>view for _noffspring>=3 and _omean(bmi)>=50
2113-100 in 2113 (rec. no. 32)
2113-500 in 2113 (rec. no. 35)
2 individuals in 1 pedigree matched as follows:
Individuals .............. 2
+ In database ........... 2
| + Attached ........... 2
| + Childless spouses .. 0
| + Unattached ......... 0
+ Not in database ....... 0
M>exclude for _noffspring>=3 and _omean(bmi)>=50
2113-100 has been marked for exclusion
2113-500 has been marked for exclusion
2 individuals in 1 pedigree marked for exclusion as follows:
Individuals .............. 2
+ In database ........... 2
| + Attached ........... 2
| + Childless spouses .. 0
| + Unattached ......... 0
+ Not in database ....... 0
M>draw pedigrees for _noffspring>=3 and _omean(bmi)>=50
1 pedigree in result set
calling "gs madeline.ps"
M>
|
Fig. 1.17. Query and Subsetting Commands in Madeline. In this example, the view command is used to identify parents having three or more offspring whose mean body mass index is greater than or equal to 50. The query result set contains one pair who are excluded using exclude. The draw command is then invoked with the same query expression in order to draw the relevant pedigree. The command draw pedigree '2113' could also have been used.
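To see why such a query is awkward in a record-at-a-time system, consider what the expression _noffspring>=3 and _omean(bmi)>=50 has to do: for each individual, it must gather that individual's offspring and aggregate over them. A rough Python equivalent is sketched below. The data layout and function names are hypothetical, and skipping missing BMI values before averaging is an assumption of the sketch.

```python
from statistics import mean


def offspring_mean(person, field_name):
    """Mean of a field over a person's offspring, ignoring missing values."""
    values = [c[field_name] for c in person["offspring"]
              if c.get(field_name) is not None]
    return mean(values) if values else None


def matches(person):
    """Rough mirror of: _noffspring>=3 and _omean(bmi)>=50"""
    m = offspring_mean(person, "bmi")
    return len(person["offspring"]) >= 3 and m is not None and m >= 50


people = [
    {"id": "A", "offspring": [{"bmi": 52.0}, {"bmi": 49.0}, {"bmi": 55.0}]},
    {"id": "B", "offspring": [{"bmi": 60.0}, {"bmi": 61.0}]},   # only two offspring
    {"id": "C", "offspring": []},                                # no offspring
]
selected = [p["id"] for p in people if matches(p)]
print(selected)
```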
Madeline's draw command produces drawings of pedigrees using Adobe Postscript language procedures and document structuring conventions (Fig. 1.18).
Fig. 1.18. An example pedigree drawn by Madeline. In this example, two categorical variables indicating disease conditions are graphically displayed on the left and right halves of the icons. The status of the first condition, on the left side, is coded using "U" for unaffected and "A" for affected. On the right side, the status of the second condition is coded using "U" for unaffected, "M" for moderate, and "S" for severe. Missing values are indicated by dots, ".". The icon drawn with a dashed line perimeter indicates an individual whose record was not found in the database. No ID was provided in the FatherIDField of the gender-unknown offspring, and so the program has assigned a random ID of !21A3F8! to the missing father. (The displayed data were invented to illustrate the drawing capabilities of the program).
Pedigree drawings can display any number of field variables present in a dataset. The toggle command is used to select fields for inclusion on a pedigree drawing. Toggle output flags toggles which fields appear as labels under the icons on a pedigree drawing. The set field order command is used to order selected fields within their respective categories, "C", "P", or "G". On drawings, core "Co" fields always appear first, followed by phenotype "Po" fields, and finally genotype "Go" fields.
Toggle icon flags toggles on or off the set of categorical variables to be displayed graphically by shading or coloring regions of the male and female icons. Madeline divides the icon into pie-slice shading regions based on the number of categorical variables selected. The program does not impose a limit on the number of categorical variables that can be graphed simultaneously.
The manner in which subtrees are divided across pages, the paper orientation, size, margins, and color may all be set using various set commands. When DividedDrawings is set on (the default), subtrees of a pedigree originating from different founding ancestor groups are printed on separate pages. Orientation may be set to portrait, landscape, automatic, or MultiPage. When orientation is set to automatic or MultiPage, Madeline decides on the orientation of individual pedigrees depending upon the width and height of each drawing. In the event that a drawing would require excessive reduction to fit on a single page, Madeline will automatically include Postscript commands to print the drawing in poster-style across several physical pages.
Madeline's Postscript drawing routines are efficient, typically permitting the construction of hundreds of drawings per second on a modern Sun SparcStation or Intel Pentium machine. In order to view the drawings on screen, the user needs to assign the name of a Postscript viewing application (such as GhostView, GV or GSView) to Madeline's PostscriptViewer variable (Fig. 1.17). This can be done in the autorun.bat file.
Fig. 1.17. Drawing pedigrees in Madeline. Toggle output flags specifies which fields will appear on the pedigree drawings. Draw pedigrees for ... specifies a subset of pedigrees that match the query criteria. Madeline calls the Postscript viewing application named in PostscriptViewer (gv in the Linux environment shown).
Producing Output Files for Analysis
The write command is used to produce locus, pedigree, and control or parameter files for analysis. Keywords like Mendel and GenehunterNpl are used to specify the analysis file format.
For most formats which require a control or parameter file, a single write command suffices to produce both the pedigree and control file. In these cases, the control file often contains the required locus information. For some other formats, the command write locus file is used to produce the locus file separately from the write pedigree command used to create the pedigree file. Section 4, Write Formats, documents the procedure required for supported formats.
Section 2
Tutorial
Introduction to the Tutorial
Madeline is easy to use once you see how it works. The goal of this section is to enable you to use Madeline to accomplish real tasks in a very short time. An instructive command file is shown in Fig. 2.1. Comment lines begin with two forward slashes, "//". Command lines are shown in bold. The effect of each command or group of commands is described in turn.
// Assign log files:
LogFile='chr8.log'
DetailFile='chr8.dtl'
ErrorFile='chr8.err'
quiet
system "dir \databases\chr8.*"
// Map missing value indicators:
list nmv
nmv[0]=-1
nmv[1]=-9
list nmv
// Map core field names:
GenderField='GENDER'
AffectionStatusField="AFFECTSTAT"
// Map codes used in core fields:
list csv
csv[_female]='FEMALE'
csv[_male]='MALE'
list csv
// Load a database containing genetic maps:
load '\maps\emap.dbf'
list map for chromosome=8
// Open pedigree database:
open '\databases\chr8.dbf'
// toggle off output of phenotype fields:
toggle output flag for bmi
list fields
// Example 1: Create files for Mendel USERM13 analysis:
write locus file to '\analysis\mendel.loc' in mendel format
write pedigree file to '\analysis\userm13.ped' in userm13 format
// Example 2: Create files for Genehunter non-parametric linkage analysis:
write locus file to '\analysis\ghnpl.loc' in genehunter format
write pedigree file to '\analysis\ghnpl.ped' in genehunternpl format
// Example 3: Create files for Siblink affected sib pair analysis:
// First, mark some individuals for exclusion:
exclude for bmi>=35
write to '\analysis\asp.ped' in SiblinkAffectedPairs format
// Draw pedigrees:
list fields
toggle output flags for 2-5, bmi, affectstat, 12-20
list fields
drawingfile='pedigrees.ps'
set color off
set orientation to automatic
set papermargin to 1.5
AffectstatLabel[0]="U"
AffectstatLabel[1]="A"
draw pedigrees '0001'-'0005','0472','0570'
// End session:
goodbye
|
Fig. 2.1. Example Madeline command file.
This tutorial includes sample commands to map missing values, assign core field names, and designate codes used in core fields. These commands are typically required, but some of them will not be needed when FUSION data are used. Madeline is generally quite flexible about the order in which commands are executed. The tutorial presents a recommended command sequence.
LogFile, DetailFile, and ErrorFile store the names of the summary, detail, and error logs. By default, LogFile is set to "madeline.log", DetailFile to "madeline.dtl", and ErrorFile to "madeline.err". If the default names are used, these files will be overwritten each time you start Madeline. When you provide new assignments (Fig. 2.2), the current contents of the log files are copied to the new files, and all subsequent messages are redirected to the new files. Reassignment of the log and detail files should be done at the beginning of a session.
M>LogFile='chr8.log'
LogFile has been changed from "madeline.log" to "chr8.log"
M>DetailFile='chr8.dtl'
DetailFile has been changed from "madeline.dtl" to "chr8.dtl"
M>ErrorFile='chr8.err'
ErrorFile has been changed from "madeline.err" to "chr8.err"
M>
|
Fig. 2.2. Reassigning summary, detail, and error log file names in Madeline.
By default, Madeline is in verbose mode. In verbose mode, all messages, both summary and detail log messages, are sent to the screen. Writing many messages to the screen slows the program down a bit and may be distracting, so Madeline supports two quieter levels. When quiet is issued, summary log messages continue to be printed to the screen, but detail log messages are suppressed from the screen. When silent or silence is issued, neither summary nor detail messages appear on the screen. Error messages are always printed to screen regardless of the verboseness setting. To return from a quiet state to the default, issue verbose. Under all circumstances, messages continue to be printed to the summary and detail log files, as appropriate. Quiet mode is recommended on platforms such as DOS32 and Windows that lack scrollable terminal window buffers.
System 'dir \databases\chr8.*'
The system command transfers a quoted-string command to the operating system shell. This allows the user to obtain directory and file information, copy or move files, or run other software without having to exit Madeline. System is especially useful when you need to obtain file or directory information using the DOS dir command or the UNIX ls command.
Mapping Missing Value Indicators
Nmv is the abbreviated name for the NumericMissingValue array. The list command instructs Madeline to list the elements of the array (Fig. 2.3).
M>list nmv
NMV has 1 element:
NMV[ 0]= -9999
M>nmv[0]=-1
M>nmv[1]=-9
M>list nmv
NMV has 2 elements:
NMV[ 0]= -1
NMV[ 1]= -9
|
Fig. 2.3. Mapping missing value indicators in Madeline.
By default, nmv[] contains a single element, -9999, which is a default missing value indicator used in the FUSION study. The assignment nmv[0]=-1 overwrites the value of the first cell with -1. The assignment nmv[1]=-9 assigns -9 to the second cell, automatically expanding the array if necessary. -1 and -9 will now be automatically recognized as missing value indicators when subsequently reading values in a database. Madeline’s self-expanding arrays do not impose a limit on the number of missing value indicators which may be used in a database.
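The behavior of a self-expanding array, and its use for screening values, can be mimicked in a few lines of Python. This is an illustrative sketch only; the helper names assign and clean are hypothetical, and representing a missing value as None is a convention of the sketch.

```python
def assign(array, index, value, fill=None):
    """Set array[index] = value, growing the list if index is past the end."""
    if index >= len(array):
        array.extend([fill] * (index + 1 - len(array)))
    array[index] = value


nmv = [-9999]        # default contents of nmv[]
assign(nmv, 0, -1)   # nmv[0]=-1 overwrites the first cell
assign(nmv, 1, -9)   # nmv[1]=-9 expands the array by one cell


def clean(value, missing=nmv):
    """Return None for any value listed as a missing value indicator."""
    return None if value in missing else value


print(nmv)
print([clean(v) for v in [27.5, -1, -9, 31.2]])
```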
In a general setting, the names of core fields in a pedigree database may differ from the default names used in Madeline which are based on field names encountered in the FUSION study. Assignments to the appropriate core field name variables (Fig. 2.4) instruct Madeline to recognize core field names when a pedigree database is opened subsequently. Madeline will automatically capitalize and truncate field names to 10 letters if necessary.
M>GenderField='GENDER'
M>AffectionStatusField="AFFECTSTAT"
Fig. 2.4. Mapping Core Field Names in Madeline.
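The capitalize-and-truncate rule mentioned above can be illustrated with a one-line Python function. The name `normalize_field_name` is hypothetical, and the sketch assumes the rule is simply "uppercase, then keep the first 10 characters"; the source does not say how Madeline treats whitespace or non-letter characters.

```python
def normalize_field_name(name, max_len=10):
    """Uppercase a field name and truncate it to max_len characters."""
    return name.upper()[:max_len]
```

Under this assumption, `normalize_field_name("AffectionStatus")` yields `"AFFECTIONS"`, while a name that is already uppercase and exactly 10 letters long, such as `"AFFECTSTAT"`, is returned unchanged.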
Mapping Codes Used In Core Fields
Arbitrary sets of codes may be used to represent core categorical information such as gender or affection status. Assignments to the appropriate arrays instruct Madeline to recognize study codings correctly. Fig. 2.5 shows how to tell Madeline to recognize the gender codes "MALE" and "FEMALE" in a database in place of the default codes "M" and "F". By using the symbolic constants _female and _male to index the array, you don't have to remember which cell is reserved for which sex.
M>list csv
CSV has 2 elements:
CSV[ 0]="M"
CSV[ 1]="F"
M>csv[_female]='FEMALE'
M>csv[_male]='MALE'
M>list csv
CSV has 2 elements:
CSV[ 0]="MALE"
CSV[ 1]="FEMALE"
Fig. 2.5. Mapping codes used in core fields in Madeline.
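The idea of indexing a code array with symbolic constants can be sketched in Python. The constants `MALE` and `FEMALE` below mirror `_male` and `_female`, with the cell positions taken from the listing in Fig. 2.5; the names `csv_codes` and `decode_gender`, and the exact-match comparison, are assumptions made for this illustration.

```python
# Cell positions as shown in Fig. 2.5: cell 0 holds the male code,
# cell 1 holds the female code.
MALE, FEMALE = 0, 1

csv_codes = ["M", "F"]        # default codes
csv_codes[FEMALE] = "FEMALE"  # no need to remember which cell is which
csv_codes[MALE] = "MALE"


def decode_gender(raw):
    """Map a raw database code to MALE or FEMALE, or None if unrecognized."""
    try:
        return csv_codes.index(raw)
    except ValueError:
        return None
```

After the two assignments, the old default code "M" is no longer recognized, which matches the "in place of" wording above.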
The load command (Fig. 2.6) loads a table containing genetic maps for one or more chromosomes. The map table can be in any of the supported input database formats. It may contain only one map for each chromosome. The map table must contain fields of information specifying the chromosome, the rank or ordinal position of the marker within the map for a given chromosome, the name of the marker, and the position of the marker in centiMorgans.
After load, Madeline will indicate that marker maps have been installed. You can view a map by issuing list map for chromosome=n, where n is a valid chromosome number (the human X chromosome may be designated by 23). To obtain a listing of all markers for all chromosomes present in the table, issue list map by itself.
M>load '\maps\emap.dbf'
Marker maps based on k:\emap\emap.dbf are now installed.
M>list map for chromosome=8
Marker Name  Ch  Or  Position
-----------  --  --  --------
D8S504        8   1    0.0000
D8S550        8   2   15.1000
D8S258        8   3   30.1000
D8S283        8   4   55.0000
Beta3         8   5   59.8000
D8S285        8   6   66.4000
D8S260        8   7   71.3000
D8S530        8   8   80.7000
D8S270        8   9   94.4000
D8S276        8  10  105.0000
GATA101F01    8  11  111.4000
D8S514        8  12  122.2000
D8S284        8  13  135.3000
Fig. 2.6. Loading a database containing genetic maps in Madeline.
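The four required fields of a map table (chromosome, ordinal position, marker name, position in centiMorgans) suggest a simple in-memory structure. The Python sketch below groups rows by chromosome and orders them by rank; it is an illustration, not Madeline's loader. The function names `build_maps` and `list_map` are hypothetical, and the two sanity checks (no repeated rank, non-decreasing positions) are assumptions about what a well-formed map looks like rather than documented Madeline behaviour.

```python
from collections import defaultdict

# A few rows from Fig. 2.6, deliberately out of order:
# (marker name, chromosome, ordinal position, position in cM)
rows = [
    ("D8S258", 8, 3, 30.1),
    ("D8S504", 8, 1, 0.0),
    ("D8S550", 8, 2, 15.1),
    ("D8S283", 8, 4, 55.0),
]


def build_maps(rows):
    """Group map rows by chromosome and sort each map by ordinal position."""
    maps = defaultdict(list)
    for marker, chromosome, order, position in rows:
        maps[chromosome].append((order, marker, position))
    for chromosome, entries in maps.items():
        entries.sort()
        orders = [order for order, _, _ in entries]
        if len(set(orders)) != len(orders):
            raise ValueError("repeated rank on chromosome %s" % chromosome)
        positions = [position for _, _, position in entries]
        if positions != sorted(positions):
            raise ValueError("positions out of order on chromosome %s" % chromosome)
    return dict(maps)


def list_map(maps, chromosome):
    """Return the marker names for one chromosome, in map order."""
    return [marker for _, marker, _ in maps[chromosome]]
```

With the rows above, `list_map(build_maps(rows), 8)` returns the markers in map order, starting with D8S504.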
The USERM13, Genehunter, and Siblink pedigree files that will be written subsequently do not include phenotype information. With the exception of core "C" fields, which Madeline controls, it is imperative to toggle off all fields in the database that should not be included in the output and that should not be considered when Madeline decides whether an individual or pedigree contains sufficient data for output. This is done using the toggle command (Fig. 2.7). The list fields command can then be used to verify that the correct subset of fields was turned off.
// toggle off output of phenotype fields:
M>toggle output flag for bmi
Note: genotype fields ordered according to current map
M>list fields
 1.STUDYID    Co__1    8.BMI        P       15.D8S276     Go__9
 2.GENDER     Co__2    9.D8S504     Go__1   16.D8S283     Go__4
 3.FATHER     Co__3   10.D8S550     Go__2   17.D8S285     Go__5
 4.MOTHER     Co__4   11.D8S258     Go__3   18.D8S260     Go__6
 5.TWIN       Co__5   12.GATA101F01 Go_10   19.D8S530     Go__7
 6.AFFECTSTAT C       13.D8S514     Go_11   20.D8S270     Go__8
 7.DOB        C       14.D8S284     Go_12
M>
Fig. 2.7. Toggling and listing fields in Madeline. After the toggle command, field 8. BMI is no longer toggled on for output.
Open opens a pedigree database. Madeline's database engine seamlessly opens all supported database types on all supported platforms, allowing you to open FoxPro files on Solaris, SAS transport files on a PC, and so on. The user does not need to tell Madeline the file type. To open an ASCII flat file database, see documentation for the recognize, convert, rectify, transpose and merge commands.
When a pedigree database is opened, Madeline first categorizes fields as core "C", genotype "G", phenotype "P", or null, "*". If genotype fields are present, allele frequencies are estimated from all of the data using gene counting, ignoring family relationships (a in Fig. 2.8). If a map table is already installed and contains a map for markers in the database, the genotype fields are automatically ordered according to the map (b in Fig. 2.8). Pedigrees are reconstructed based on the core information. Madeline performs additional data operations when optional core fields such as AffectionStatusField or DateOfBirthField are included (c in Fig. 2.8). In this example, Madeline marks several apparent dizygotic twinships. Madeline also flags the AffectionStatusField, AFFECTSTAT, with a plus sign, "+", indicating that the categorical levels of AFFECTSTAT will be displayed graphically on the male and female icons in pedigree drawings. Finally, the program displays a summary table showing the count of pedigrees and distribution of individuals by category (d in Fig. 2.8).
M>open '\hold\chr8.dbf'
Calculating allele frequencies for 9. D8S504...               (a)
…
Calculating allele frequencies for 20. D8S270...              (a)
Database "\hold\chr8.dbf" opened with 2,506 records
Core information read in 2.00 seconds
…
NOTE: 0471-100 and 0471-401 now marked with "a" indicating    (c)
an apparent dizygotic twinship.
NOTE: 0570-401 and 0570-402 now marked with "a" indicating    (c)
an apparent dizygotic twinship.
Pedigrees reconstructed in 0.1780 seconds
Note: genotype fields ordered according to current map        (b)
 1.STUDYID    Co__1    8.BMI        Po__1   15.D8S276     Go__9
 2.GENDER     Co__2    9.D8S504     Go__1   16.D8S283     Go__4
 3.FATHER     Co__3   10.D8S550     Go__2   17.D8S285     Go__5
 4.MOTHER     Co__4   11.D8S258     Go__3   18.D8S260     Go__6
 5.TWIN       Co__5   12.GATA101F01 Go_10   19.D8S530     Go__7
 6.AFFECTSTAT C +     13.D8S514     Go_11   20.D8S270     Go__8
 7.DOB        C       14.D8S284     Go_12
----------------------------- --------- --------- ---------
Pedigrees and Individuals      Included  Excluded     Total   (d)
----------------------------- --------- --------- ---------
Pedigrees ...................       958         0       958
Individuals .................     3,626         0     3,626
 + In database ..............     2,506         0     2,506
 |  + Attached ..............     2,115         0     2,115
 |  + Childless spouses .....        13         0        13
 |  + Unattached ............       378         0       378
 + Not in database ..........     1,120         0     1,120
Fig. 2.8. Opening a pedigree database in Madeline. Madeline performs a series of operations when the open command is used to open a pedigree database. See text for explanation.
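Estimating allele frequencies "ignoring family relationships" can be read as plain allele counting over every genotyped individual. The Python sketch below shows that reading only; it is not Madeline's code. The function name `allele_frequencies`, the genotype representation as allele pairs, and the use of 0 as the missing-allele code are all assumptions made for illustration.

```python
from collections import Counter


def allele_frequencies(genotypes, missing=(0,)):
    """Estimate allele frequencies by counting alleles across individuals.

    genotypes: iterable of (allele1, allele2) pairs, one per individual.
    missing:   allele codes treated as untyped (0 here is an assumption).
    Every non-missing allele counts once, whoever the individual is
    related to.
    """
    counts = Counter()
    for allele1, allele2 in genotypes:
        for allele in (allele1, allele2):
            if allele not in missing:
                counts[allele] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}
```

Because relatives share alleles, this kind of estimate treats correlated observations as independent, which is the limitation that a likelihood-based method taking family structure into account is meant to address.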
Example 1: Creating Files for Mendel USERM13 Analysis
Mendel’s USERM13 module uses maximum likelihood methods to calculate allele frequencies, taking family relationships into consideration. All genotyped individuals in a database can be used in an analysis, including childless spouses, controls, and other singleton individuals whom Madeline classifies as unattached.
USERM13 requires a locus file and a pedigree file as input. The locus file will contain allele frequency information calculated by Madeline. The pedigree file will contain the family and genotype information. The write locus file command with the generic mendel keyword creates the locus file (Fig. 2.9). The write pedigree file command with the userm13 keyword creates the pedigree file. As expected, childless spouses and a number of unattached individuals are included in the output file. The detail log file documents which individuals and pedigrees were excluded and why.
M>write locus file to '\analysis\mendel.loc' in mendel format
Locus file "\analysis\mendel.loc" has been written.
M>write pedigree file to '\analysis\userm13.ped' in userm13 format
Writing pedigree data to "\analysis\userm13.ped"
----------------------------- --------- --------- ---------
Pedigrees and Individuals      Included  Excluded     Total
----------------------------- --------- --------- ---------
Pedigrees ...................       810       148       958
Individuals .................     3,469       157     3,626
 + In database ..............     2,351       155     2,506
 |  + Attached ..............     2,107         8     2,115
 |  |  + With data ..........     2,107         0     2,107
 |  |  + Without data .......         0         8         8
 |  |  + Marked for exclusion         0         0         0
 |  + Childless spouses .....        13         0        13
 |  + Unattached ............       231       147       378
 + Not in database ..........     1,118         2     1,120
Fig. 2.9. Creating locus and pedigree files for a Mendel USERM13 analysis in Madeline.
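The summary table is hierarchical: in every row, Included plus Excluded equals Total, and each indented group of rows sums to its parent row. The short Python check below uses the figures from Fig. 2.9 to make that structure explicit. It is a reader's aid only; the names `table`, `row_balances`, and `children_sum_to_parent` are hypothetical.

```python
# (included, excluded, total) for each row of Fig. 2.9
table = {
    "Pedigrees": (810, 148, 958),
    "Individuals": (3469, 157, 3626),
    "In database": (2351, 155, 2506),
    "Attached": (2107, 8, 2115),
    "With data": (2107, 0, 2107),
    "Without data": (0, 8, 8),
    "Marked for exclusion": (0, 0, 0),
    "Childless spouses": (13, 0, 13),
    "Unattached": (231, 147, 378),
    "Not in database": (1118, 2, 1120),
}


def row_balances(row):
    """Check that included + excluded equals total for one row."""
    included, excluded, total = row
    return included + excluded == total


def children_sum_to_parent(parent, children):
    """Check that child rows add up to the parent row in every column."""
    return all(
        sum(table[child][column] for child in children) == table[parent][column]
        for column in range(3)
    )
```

The same checks can be applied to the summary tables printed by other write pedigree file commands when verifying that the number of excluded records is what you expect.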
Example 2: Creating Files for Non-parametric Linkage Analysis in Genehunter
Like USERM13, Genehunter requires a locus file and a pedigree file for analysis. In addition to allele frequency information, Genehunter’s locus file will contain map distance information obtained from the previously loaded map database. The generic genehunter keyword is used to specify the locus file format (Fig. 2.10).
M>write locus file to '\analysis\ghnpl.loc' in genehunter format
Locus file "\analysis\ghnpl.loc" has been written.
M>write pedigree file to '\analysis\ghnpl.ped' in genehunternpl format
Creating associated Genehunter control file called "\analysis\ghnpl.ctl"
Writing pedigree data to "\analysis\ghnpl.ped"
----------------------------- --------- --------- ---------
Pedigrees and Individuals      Included  Excluded     Total
----------------------------- --------- --------- ---------
Pedigrees ...................       533       425       958
Individuals .................     3,033       593     3,626
 + In database ..............     2,003       503     2,506
 |  + Attached ..............     2,003       112     2,115
 |  |  + With data ..........     2,003       104     2,107
 |  |  + Without data .......         0         8         8
 |  |  + Marked for exclusion         0         0         0
 |  + Childless spouses .....         0        13        13
 |  + Unattached ............         0       378       378
 + Not in database ..........     1,030        90     1,120
Fig. 2.10. Creating locus and pedigree files for Genehunter non-pa