May, 2001
(c) 1999 by the Regents of the University of Michigan, Ann Arbor.
Madeline is software written in ANSI C/C++ for:
Madeline has been compiled for the following platforms:
Madeline was designed to meet the needs of the Finland-United States Investigation of NIDDM Genetics (FUSION) Study. Because of this, Madeline has specific knowledge about FUSION study IDs. A subset of Madeline’s functionality makes use of this knowledge (see FUSION box below).
The program continues to be modified to make it useful for genetic studies in general.
Paragraphs or headings preceded by a FUSION icon describe FUSION-specific functionality.
FUSION
|
Sample ID: 1021+402
| | |
+-----------------+ | +---------------------+
| | |
Family ID begins with: Encoded flag symbol: Individual ID:
- "0" for FUSION 1 "-" for FUSION 1 "100" for probands
- "1" for FUSION 2 fam. "+" for FUSION 2 "200" for fathers
- "C" for control fam. "A" to "Z" for resampled "300" for mothers
- "T" for Trios FUSION records "400" for siblings of the
proband (enumerated)
"500" for proband spouses
(enumerated)
"600" for proband offspring
(enumerated)
"700" for sibling spouses
(enumerated)
"800" for sibling offspring
(enumerated)
|
|
Madeline is internally aware of the structure of FUSION IDs and uses this information in specific situations to:
Madeline currently uses the following rules to determine if an ID in a dataset is a FUSION ID:
A data set can easily contain a mixture of FUSION IDs and non-FUSION IDs. Only IDs meeting the above criteria will be construed as FUSION IDs. |
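As a rough illustration of the ID layout shown in the box above, the following Python sketch splits a sample ID into its three parts. The exact rules Madeline applies are not reproduced here; the field widths (a 4-character family ID, a 1-character flag, and a 3-digit individual ID) are assumptions read off the sample ID "1021+402", and `parse_fusion_id` is a hypothetical helper, not part of Madeline.

```python
import re

# Assumed layout, read off the sample ID "1021+402":
# 4-character family ID, 1-character flag, 3-digit individual ID.
FUSION_ID = re.compile(r"^([01CT][0-9A-Z]{3})([-+A-Z])([1-8][0-9]{2})$")

# Leading digit of the individual ID, as listed in the box above.
ROLE_BY_LEADING_DIGIT = {
    "1": "proband", "2": "father", "3": "mother", "4": "sibling",
    "5": "proband spouse", "6": "proband offspring",
    "7": "sibling spouse", "8": "sibling offspring",
}

def parse_fusion_id(sample_id):
    """Return (family, flag, individual, role), or None if the ID does not fit."""
    match = FUSION_ID.match(sample_id)
    if match is None:
        return None
    family, flag, individual = match.groups()
    return family, flag, individual, ROLE_BY_LEADING_DIGIT[individual[0]]

print(parse_fusion_id("1021+402"))
```

IDs that do not fit the assumed pattern return `None`, mirroring the idea that only IDs meeting the criteria are construed as FUSION IDs.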
Running the Program Interactively and in Batch Mode
Instructions to Madeline are entered at a command prompt. Madeline's command interpreter is not sensitive to capitalization. However, capitalization is often used in this document for clarity of presentation.
Madeline can be run interactively or in batch mode (Fig. 1.1). To run Madeline interactively, type the name of the program at your system prompt and press return. Madeline’s "M>" prompt will appear.
There are two ways to run batch files. The first way is to provide the name of a batch file containing commands after the name of the program on the command line. The second way is to start Madeline interactively and then use the run command to execute the batch file. Madeline returns to interactive mode if an error occurs, or when a batch file terminates without a goodbye or quit command.
csvr1%                <-- system prompt (on UNIX)
csvr1% madeline       <-- starting the program in interactive mode
MADELINE Version 0.910
Copyright (c) 1999 by Edward H. Trager
Fig. 1.1. Starting Madeline. Madeline can be run either interactively or in batch mode.
An option is available to set parameters and run commonly needed commands automatically each time Madeline is started by providing a special batch file called "autorun.bat" in the working directory where Madeline will be invoked.
Any commands that can normally be invoked on the command line or in a batch file can be placed into autorun.bat. Assignments to specify default field names or environmental settings are typically placed in autorun.bat (Fig. 1.2).
//
// Typical autorun.bat file for Unix/Linux environment:
//
//
// Environment settings:
//
quiet
set language to English
FileEditor="vi"
PostscriptViewer="gv"
//
// Pedigree drawing-specific settings:
//
set color off
set PaperSize to A4
// margin in centimeters:
set PaperMargin to 1.5
set orientation to automatic
//
// Pedigree database-specific settings:
//
GenderField='GENDER'
FamilyIDField='FAMILY'
IndividualIDField='INDIVIDUAL'
//
// Map standard missing value indicators:
//
NumericMissingValue[0]=-1
NumericMissingValue[1]=-9
//
// Map database-specific settings:
//
PositionField="POSTN"
OrdinalField ="ORDNL"
Fig. 1.2. Example autorun.bat file.
Starting with Madeline v. 0.91, a warning message is produced if an autorun.bat file is not found, and the "M>" prompt changes accordingly (Fig. 1.3).
...
Could not find "autorun.bat" file.
...
1 WARNING M>
Fig. 1.3. In Madeline v. 0.91 and later, a warning is produced if the autorun.bat file is not present.
Overview of Database Tables Used by the Program
A database table is a rectangular array of data. A record is a row in the array. A field is a column in the array. One row or record contains the data -- all the measured variables -- for one entity.
In Madeline, the measured entity is either an individual or a genetic marker. Key fields are fields that identify the entity. To uniquely identify an individual, two key fields are required: (1) a family identifier, and (2) an individual identifier. Data fields contain the data measured on the entity. Combinations of other fields will be required to identify other entities, such as a genetic marker. The specific set of key fields required depends upon the context.
In Madeline, only three types of database tables occur: pedigree tables, map tables, and marker tables.
Each type is described in turn below.
In a pedigree table, each row or record contains the data for one individual. In Madeline, the names of the family and individual ID fields are stored in variables called FamilyIDField and IndividualIDField, respectively. Basic pedigree reconstruction additionally requires knowledge of the father, mother, and gender of each individual. Therefore, Madeline defines a set of five core fields that must be present in every pedigree database: the family ID, individual ID, father ID, mother ID, and gender fields.
The remaining data fields in a pedigree database can be classified into two groups: (1) phenotype and (2) genotype fields. Madeline therefore classifies all fields in a database table into one of three categories, each with a single-letter identifier: core ("C"), genotype ("G"), or phenotype ("P").
The complete set of core fields consists of the five obligatory core fields listed above, as well as some additional, non-obligatory core "phenotype" fields such as AffectionStatusField and DateOfBirthField.
A map table contains map information related to markers on one or more chromosomes. The key fields in a map table are:
The data fields in a map table are:
Marker Tables
A marker table contains the alleles for a specific marker measured on a specific
individual. Output from ABI machines is in this table format.
This type of table has three key fields:
There are only two essential data fields in a marker table:
In principle, the two allele fields could be represented by a single genotype field containing the numeric labels separated by a forward slash, "/". Madeline does not yet contain support for this option in marker tables.
Madeline provides support for integrating the information in a marker table into a pedigree table via the transpose and merge commands. The transpose command takes care of converting the paired allele fields into the single genotype fields expected in a pedigree table.
Madeline currently supports xbase (FoxPro, dBase III/IV), Visual FoxPro and SAS transport file formats, and space-delimited, column-aligned ASCII flat files. Madeline supports flat file tables directly by referencing a binary header file created using the recognize command. All pedigree databases are opened using the open command. Madeline’s database engine detects operating system and file byte-ordering at run time, thus permitting database tables from PCs to be opened on Unix workstations, and vice versa.
Madeline’s database engine supports character, numeric (floating point and integer), and date types of the supported database formats. A logical data type such as the "L" field type of xbase is not supported: use appropriately coded numeric variables instead. Other derived types, such as date-time or monetary types are not supported.
Character data are read from databases by trimming leading and trailing space characters. Thus, blank entries in a database appear as the empty string, "". When entered on the command line, literal character data must be delimited by a pair of matching single or double quotes, e.g., "0001-230" or '0980A'.
All numeric data types are converted to double-precision floating point numbers. Literal numeric values are entered on the command line without delimiters.
In order to support multiple file formats and missing values in a uniform manner, Madeline does not recognize a logical data type separate from the numeric data type. In contexts where a value is to be interpreted as a logical value, Madeline treats zero as _false, and any non-zero non-missing value as _true. Binary true/false data should thus be coded using a numeric field type with values of 0, 1, and a missing value indicator if required.
Date data read from a file are automatically converted to Julian day integers. When entered at the command line, dates must be delimited between curly braces and must be entered according to the ordering and capitalization conventions of the current language setting (Fig. 1.4). Madeline recognizes spaces, commas, periods or forward slashes as delimiters between the month, day, and year elements of a date. Madeline recognizes correctly capitalized, unabbreviated month names and month ordinals. Madeline does not recognize two-digit years as belonging to the current century.
M>show {December 11 1963}
{Wednesday, December 11, 1963}
M>show {December 11, 1963}
{Wednesday, December 11, 1963}
M>show {12/11/1963}
{Wednesday, December 11, 1963}
M>show {12/11/63}
{Sunday, December 9, 63} <-- in the year 63 A.D., before the Gregorian calendar
M>show {dec 11 1963} <-- Madeline does not recognize abbreviated month names ...
{} <-- ...so this evaluates to a missing date
M>set language to Suomi
M>show {11.12.1963}
{keskiviikko 11.12.1963}
M>
Fig. 1.4. Dates in Madeline. Dates entered at the command line must be delimited by curly braces and must adhere to the ordering and capitalization conventions of the current language setting.
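The date rules above can be sketched in Python as follows. This is only an illustration for the English setting (month, day, year order): `parse_date_literal` and `julian_day_number` are hypothetical names, the Julian day epoch used by Madeline internally is an assumption, and the sketch uses Python's proleptic Gregorian calendar, so it does not reproduce Madeline's handling of years before the Gregorian calendar.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def parse_date_literal(text):
    """Parse '{December 11, 1963}' or '{12/11/1963}'; return None for a missing date."""
    if not (text.startswith("{") and text.endswith("}")):
        return None
    # Spaces, commas, periods, and forward slashes all act as delimiters.
    parts = [p for p in re.split(r"[ ,./]+", text[1:-1]) if p]
    if len(parts) != 3:
        return None
    month_token, day_token, year_token = parts
    if month_token in MONTHS:            # capitalized, unabbreviated names only
        month = MONTHS.index(month_token) + 1
    elif month_token.isdigit():          # month ordinal
        month = int(month_token)
    else:
        return None
    if not (day_token.isdigit() and year_token.isdigit()):
        return None
    try:
        # The year is taken literally: "63" means the year 63, not 1963.
        return date(int(year_token), month, int(day_token))
    except ValueError:
        return None

def julian_day_number(d):
    """Julian day number of a (proleptic Gregorian) date."""
    return d.toordinal() + 1721425

print(julian_day_number(parse_date_literal("{December 11, 1963}")))
```

An abbreviated month name such as "dec" falls through both branches and yields `None`, which corresponds to the missing date `{}` shown in Fig. 1.4.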
Date data may be displayed on pedigree drawings. Dates may also be used in an expression passed to a view or a draw command, to a subsetting command such as exclude, or to the sort command (which sorts the order of individuals on a pedigree drawing). There is currently no support for writing date data to an output file.
Madeline supports entry of missing values from the command line, and also provides a simple mechanism for the user to define sets of values in a database that should be mapped as missing values when the database is read by Madeline.
On the command line, Madeline provides the following external representations of internal missing value indicators for the user to use:
Some supported database formats, such as flat files and FoxPro database files, do not provide native missing value support for character and numeric types. Even when missing value support is provided by a database format, protocols in a study may require that different types of missing value codes be used when recording missing values. For example, in the FUSION Los Angeles data, different negative integers were used to code for assay pending, no assay, and no tube conditions.
Madeline therefore permits the user to specify lists of values that are to be treated as missing values. These lists of missing value indicators are stored in two arrays. CharacterMissingValue[] is used whenever character fields, including genotype fields, are referenced. NumericMissingValue[] is used whenever numeric fields are referenced (Table 1.1). For simplicity, these arrays can be referenced using their abbreviated names, cmv[] and nmv[], respectively.
| Full Name | Abbreviated Name | Default Values |
| CharacterMissingValue[] | cmv[] | cmv[0] = "." cmv[1] = "/" cmv[2] = "0/0" cmv[3] = "0/ 0" cmv[4] = "0/ 0" |
| NumericMissingValue[] | nmv[] | nmv[0] = -9999 |
When data are read from a database, all native missing values (for example, a space-padded blank entry is a native missing value indicator in a flat file) and any values that match the values specified in Madeline’s CharacterMissingValue[] or NumericMissingValue[] arrays are converted to Madeline’s internal missing value indicators.
At startup, CharacterMissingValue[] and NumericMissingValue[] contain a set of default missing value indicators appropriate to most FUSION data. New values can be assigned to existing cells or appended to the end of these lists as required by the user (Fig. 1.5): this should be done before a database is opened so that the values will be recognized appropriately. The autorun.bat batch file is an appropriate place to set character and numeric missing value indicators. Note that all arrays in Madeline are zero-offset.
M>list cmv <-- view CharacterMissingValue array
CMV has 5 elements:
CMV[ 0]="."
CMV[ 1]="/"
CMV[ 2]="0/0"
CMV[ 3]="0/ 0"
CMV[ 4]="0/ 0"
M>cmv[5]="./." <-- append new value to end of list
M>list cmv
CMV has 6 elements:
CMV[ 0]="."
CMV[ 1]="/"
CMV[ 2]="0/0"
CMV[ 3]="0/ 0"
CMV[ 4]="0/ 0"
CMV[ 5]="./."
M>list nmv <-- view NumericMissingValue array
NMV has 1 element:
NMV[ 0]= -9999
M>nmv[0]=-1 <-- overwrite one value
M>nmv[1]=-9 <-- and append another value
M>list nmv
NMV has 2 elements:
NMV[ 0]= -1
NMV[ 1]= -9
M>
Fig. 1.5. Assigning missing value indicators. Missing value indicators may be assigned to existing cells or appended to the ends of Madeline’s character and numeric missing value lists.
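The mapping of external missing value indicators to a single internal representation might be sketched in Python as below. This is not Madeline's implementation: `None` stands in for the internal missing value indicator, and `read_character` and `read_numeric` are hypothetical helper names.

```python
# Missing-value lists, initialized with some of the defaults listed above.
cmv = [".", "/", "0/0", "0/ 0"]
nmv = [-9999]

def read_character(raw):
    """Trim spaces; blank entries and listed indicators become missing (None)."""
    value = raw.strip(" ")
    if value == "" or value in cmv:
        return None
    return value

def read_numeric(raw):
    """Convert to a double; blank entries and listed indicators become missing."""
    text = raw.strip(" ")
    if text == "":
        return None
    value = float(text)
    return None if value in nmv else value

# As in the text, indicators are assigned before any data are read:
cmv.append("./.")      # append a new character indicator
nmv[0] = -1            # overwrite the default numeric indicator
nmv.append(-9)         # and append another one

print(read_character("./."), read_numeric("-9"), read_numeric("27.5"))
```

Because the lists are consulted at read time, changing them after the data have been read would have no effect on values already converted, which is why the text recommends setting them before a database is opened.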
Upon opening a pedigree table, Madeline categorizes each field into one of three categories: core ("C"), genotype ("G"), or phenotype ("P").
When a field is completely empty or contains only missing values, Madeline assigns the field to a null category represented by an asterisk, "*".
When required, Madeline allows the user to designate a subset of "P" phenotype fields as "V" covariate fields using the toggle command. Madeline does not automatically assign fields to the covariate category. Field categories are summarized in Table 1.2 and described in greater depth below.
| Data Category | Symbolic Designation | Description |
| Core | C | Set of five required fields like GenderField that must be present in all pedigree databases, plus additional optional fields, like AffectionStatusField, that are not required by default but may be required for some operations. |
| Genotype | G | Character fields containing two numeric labels separated by a forward slash character, e.g., "141/142" |
| Phenotype | P | Character, numeric, or date fields that contain categorical or continuous phenotype information. |
| Covariate | V | A subset of phenotype fields that are to be used as covariates. The user must use the toggle command to change the designation of a "P" field to "V". |
| Null | * | Character, numeric, or date fields that are completely empty or contain only missing value indicators. In general, these fields cannot be operated upon. |
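The categories in the table can be illustrated with a small Python sketch. How Madeline actually detects genotype fields is not described here, so the pattern test below is an assumption, as is the choice to check core names before the null test. `categorize_field` is a hypothetical name, and only the default names of the five required core fields are listed.

```python
import re

GENOTYPE = re.compile(r"^\d+/\d+$")   # two numeric labels separated by "/"
CORE_NAMES = {"FAMID", "STUDYID", "FATHER", "MOTHER", "SEX"}  # default required names

def categorize_field(name, values):
    """Return 'C', 'G', 'P', or '*' for one column; None marks a missing value."""
    if name in CORE_NAMES:
        return "C"
    present = [v for v in values if v is not None]
    if not present:
        return "*"                    # empty or all-missing field
    if all(isinstance(v, str) and GENOTYPE.match(v) for v in present):
        return "G"
    return "P"                        # everything else is a phenotype field

print(categorize_field("D20S103", ["141/142", None, "139/141"]))
```

Covariate "V" fields do not appear in the sketch because they are never assigned automatically; the user designates them with the toggle command.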
Core "C" data fields provide key information about an individual (Table 1.3). Madeline identifies core fields by their names. These names are stored in internal variables whose values may be reassigned by the user. In conformance with the requirements of the supported database types, field names must be entirely in capital letters and cannot exceed 10 characters in length. Madeline automatically capitalizes and truncates any non-conformant field name identifiers.
Core data fields are either required or optional.
The absence of one or more of the five required core fields will generate an error when a database is opened (an exception applies when FUSION 1 data are used; see below).
Optional core fields may be required for some operations, but are not required by default. Madeline makes use of the additional information provided in optional core fields whenever they are present. For example, Madeline’s pedigree drawing functionality is enhanced by the presence of fields for affection, death, index case, monozygotic and dizygotic twin status.
| Variable Name | Description | Default Value | Expected Field Type |
| I. Required Core Fields which must always be present¹: | | | |
| 1. IndividualIDField | Individual identifier | "STUDYID" | Character only |
| 2. FatherIDField | Father's identifier | "FATHER" | Character only |
| 3. MotherIDField | Mother's identifier | "MOTHER" | Character only |
| 4. GenderField | Gender | "SEX" | Character or numeric |
| 5. FamilyIDField¹ | Family identifier | "FAMID" | Character only |
| II. Optional Core Fields: | | | |
| AffectionStatusField | Affection status | "NAFFECTE" | Numeric or character |
| DeathStatusField | Death status | "DECEASED" | Numeric or character |
| IndexCaseField | Index case or proband indicator | "PROBAND" | Numeric only |
| LiabilityClassField | Liability class | "LCLASS" | Numeric or character |
| MZTwinField | Monozygotic twin status indicator | "TWIN" | Character only |
| DZTwinField | Dizygotic twin status indicator | "DZTWIN" | Character only |
| DateOfBirthField | Date of birth | "DOB" | Date only |
| DateOfDeathField | Date of death | "DOD" | Date only |

¹ The FamilyIDField is not required when data are restricted to FUSION 1 IDs only.
Madeline interprets data from required and optional core fields in order to reconstruct pedigrees and evaluate key information. A clear understanding of how Madeline interprets core data is essential to proper use of the program.
Use of Arrays To Map External Values Into Internal Meanings
A key aspect of Madeline’s generality and flexibility is the use of a set of arrays to map external data values into internal meanings. We have already seen how Madeline uses CharacterMissingValue[] and NumericMissingValue[] in order to map external missing value indicators to uniform internal missing value representations. If a value in a core field such as the field for gender, affection status, or death status does not map to a missing value, Madeline uses a designated array for mapping the external categorical value into an internal representation.
For example, suppose the GenderField contains the value "F" for some record. Since "F" is not a missing value listed in CharacterMissingValue[], Madeline looks in CharacterSexValue[] (abbreviated as csv[]) and sees that "F" matches the second entry in the list, which is the entry reserved for female, _female. That is:
| "F" = CharacterSexValue[ 1 ] = CharacterSexValue[ _female ] |
So, Madeline knows that the individual is a female and records this information internally.
To ensure that Madeline recognizes values in core fields correctly, assignments to the designated arrays must be made before any open or load command.
Database Field Naming Conventions
Different database file formats impose different restrictions on the length and format of field names in a database. For example, up to 10 characters can be used for field names in an xbase file, but only up to 8 characters in a SAS transport file. Although Madeline now supports several different file formats, the program originally only supported the xbase file format. As a result, Madeline restricts field name identifiers as follows:
Madeline does not actively check for errors such as spaces or disallowed characters in field identifiers. This is the user's responsibility. Madeline also has no way of knowing in advance what type of database file will be opened. For example, the program will not notice if you enter a ten-letter name for use with a SAS transport file that permits only 8-letter field identifiers.
The value in FamilyIDField tells Madeline the name of the family ID field in the database. The default value is "FAMID".
The FamilyIDField is not strictly required when
FUSION-compliant IDs are used. When the FamilyIDField is not present, Madeline
automatically extracts the family identifier from individual IDs which "look" like
FUSION IDs. However, FUSION 2 databases are likely to have "95x" individuals who are
connected to pedigrees via unstudied individuals who are assigned IDs that are
not FUSION-compliant. Thus, the FamilyIDField is required when
reading such databases.
Individual and Parental Identifiers
The values in IndividualIDField, FatherIDField, and MotherIDField serve to identify the individual and parent ID fields in the database. The default values are "STUDYID", "FATHER", and "MOTHER", respectively.
Parent IDs should be present in both the FatherIDField and MotherIDField of all non-founder individuals. The program interprets any individual with missing value indicators for both parents as a founder.
In the event that one of the two parent IDs is missing for an individual or individuals in a sibship, Madeline provides a randomly-generated eight-letter identifier to represent the missing parent. The randomly-generated IDs begin and end with exclamation marks to distinguish them from regular IDs. Using the generated ID, Madeline constructs a virtual parent in memory who will appear on pedigree drawings (Fig. 1.6) and in output from the write command. Madeline assumes that the sibs are full sibs sharing a single pair of parents.
Fig. 1.6. Virtual parent in Madeline. A virtual parent with a randomly-generated ID (right) is constructed when the ID of one parent is missing among a sibship of individuals (not shown). Sibs are assumed to be full sibs.
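The founder rule and the virtual parent mechanism can be sketched in Python as follows. The sketch assumes that "eight-letter identifier" means eight characters including the two exclamation marks, and it shares one virtual parent among all sibs that name the same known parent; both points are assumptions, and the function names are hypothetical.

```python
import random
import string

def make_virtual_parent_id(rng=random):
    """Random ID that begins and ends with '!' (assumed: 8 characters in total)."""
    letters = "".join(rng.choice(string.ascii_uppercase) for _ in range(6))
    return "!" + letters + "!"

def resolve_parents(father_id, mother_id, virtual_ids, rng=random):
    """Return (father, mother, is_founder); None marks a missing parent ID.

    virtual_ids caches generated IDs so that full sibs share one virtual parent.
    """
    if father_id is None and mother_id is None:
        return None, None, True            # both missing: treated as a founder
    if father_id is None:
        key = ("father", mother_id)
        if key not in virtual_ids:
            virtual_ids[key] = make_virtual_parent_id(rng)
        father_id = virtual_ids[key]
    elif mother_id is None:
        key = ("mother", father_id)
        if key not in virtual_ids:
            virtual_ids[key] = make_virtual_parent_id(rng)
        mother_id = virtual_ids[key]
    return father_id, mother_id, False

cache = {}
print(resolve_parents(None, "M01", cache))
print(resolve_parents(None, "M01", cache))   # same virtual father as above
```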
When FUSION-compliant IDs are used, the FatherIDField and MotherIDField of non-founders may both be left missing in cases where Madeline can determine the IDs of the parents. For example, Madeline knows that the parental IDs of a "100" or "401" individual must end in "200" and "300" for the father and mother, respectively. Madeline first looks in the database for parents sampled during FUSION 1 or FUSION 2. If the parents are not found, Madeline creates virtual parents using FUSION 1 IDs. In other cases, if only one of the two parent IDs is missing, Madeline can reconstruct the ID of the missing parent from the parent whose ID is provided. For example, if an "801" individual is the offspring of a "402" sib, the missing parent’s ID must end in "702".
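The two examples above can be expressed as a short Python sketch operating on the three-digit individual part of a FUSION ID. Only the cases named in the text are covered; the reverse pairing (deriving a "4nn" sib from a "7nn" spouse) is an assumption, and both function names are hypothetical.

```python
def expected_parent_suffixes(individual_suffix):
    """Proband '100' or sibling '4nn': father ends in '200', mother in '300'."""
    if individual_suffix == "100" or individual_suffix.startswith("4"):
        return "200", "300"
    return None

def missing_parent_suffix(child_suffix, known_parent_suffix):
    """For sibling offspring ('8nn'), pair sib '4nn' with sibling spouse '7nn'."""
    if not child_suffix.startswith("8"):
        return None
    swap = {"4": "7", "7": "4"}        # '7' -> '4' is an assumed reverse rule
    lead = known_parent_suffix[0]
    if lead not in swap:
        return None
    return swap[lead] + known_parent_suffix[1:]

print(expected_parent_suffixes("401"))
print(missing_parent_suffix("801", "402"))
```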
The default value for GenderField is "SEX". The GenderField can be either numeric or character. Madeline detects the field type when the database is opened. Madeline defines two constants, _male, which has a value of 0, and _female, which has a value of 1. These symbolic constants are used for indexing two arrays, NumericSexValue[] and CharacterSexValue[]. These arrays define the external values used in a database to designate gender (Table 1.4). Default values may be reassigned by the user as required.
| Array Name | Abbreviated Name | Default Values |
| CharacterSexValue[] | csv[] | csv[_male ] = "M" csv[_female] = "F" |
| NumericSexValue[] | nsv[] | nsv[_male ] = 0 nsv[_female] = 1 |
In Madeline, only terminal individuals without offspring may retain a gender attribute of missing. If during pedigree reconstruction Madeline detects any father or mother with a missing gender attribute, the program will automatically change the gender of the individual in memory to be consistent with the reconstruction, and will warn the user of the change. The database file on disk will not be changed.
Madeline will also automatically correct the gender attribute of mislabeled individuals in memory, for example, of a male listed as a mother, or of a female listed as a father. Madeline always warns the user of these types of database errors. Again, the database file on disk will not be changed -- that is the user's responsibility.
Madeline will warn the user and then terminate abruptly if conflicting and unresolvable gender roles exist for an individual, for example if an individual is listed as both a mother and a father.
Monozygotic and Dizygotic Twin Data
The MZTwinField should remain blank or missing for non-twins, and should contain a single-letter identifier for each twin pair or group. For example, "A" can be used to designate the first twin pair in a family, "B" the second pair, and so on. Starting with version 0.90 of the program, MZTwinField is considered an optional core field.
The optional DZTwinField, if present, should be coded in a similar manner to designate dizygotic twins.
The AffectionStatusField may be either numeric or character. Madeline defines two symbolic constants for describing the affection status of sampled individuals (the underscores are used to avoid confusion with possible field names and are required): _unaffected and _affected.
In addition to these two categories, Madeline also recognizes three additional categories for mapping unstudied individuals: _UnstudiedUnaffected, _UnstudiedAffected, and _UnstudiedConflicting.
These additional categories are useful for drawing extended pedigrees which may include unstudied individuals in addition to sampled individuals. Madeline defines two arrays, CharacterAffectionStatus[] and NumericAffectionStatus[], for mapping external affection status values to one of the five internally recognized categories (Table 1.5).
| Array Name | Abbreviated Name | Default Values |
| CharacterAffectionStatus[] | cas[] | cas[_unaffected] = "0" cas[_affected ] = "1" cas[_UnstudiedUnaffected] = "2" (unstudied, reported unaffected) cas[_UnstudiedAffected ] = "3" (unstudied, reported affected) cas[_UnstudiedConflicting] = "4" (unstudied, conflicting reports) |
| NumericAffectionStatus[] | nas[] | nas[_unaffected] = 0 nas[_affected ] = 1 nas[_UnstudiedUnaffected] = 2 (unstudied, reported unaffected) nas[_UnstudiedAffected ] = 3 (unstudied, reported affected) nas[_UnstudiedConflicting] = 4 (unstudied, conflicting reports) |
Note that categories 2-4 refer only to unstudied individuals. Guard against using the externally mapped values of categories 2-4 for sampled individuals, especially if the write command is used to produce a file for analysis.
The optional DeathStatusField may be either numeric or character. Madeline defines the constants _alive, with a value of 0, and _dead, with a value of 1, for indexing the CharacterDeathStatus[] and NumericDeathStatus[] arrays used to map external values in the DeathStatusField into internal representations (Table 1.6).
| Array Name | Abbreviated Name | Default Values |
| CharacterDeathStatus[] | cds[] | cds[_alive] = "N" cds[_dead ] = "Y" |
| NumericDeathStatus[] | nds[] | nds[_alive] = 0 nds[_dead ] = 1 |
The optional IndexCaseField must be numeric. Madeline assumes that the probands or index cases will be coded using a value of 1, and all other individuals with a value of 0.
When FUSION-compliant IDs are used, Madeline automatically determines which individuals are probands directly from the IndividualIDField, making the IndexCaseField unnecessary.
Some output formats, such as Genehunter, have the option of including liability class information. The LiabilityClassField may be numeric or character. Madeline does not interpret the values in this field.
The DateOfBirthField and DateOfDeathField are optional core date fields. When present, Madeline performs checks to ensure that dates in these fields are reasonable, and looks for twins based on date of birth who have not been designated as such in the MZTwinField or DZTwinField.
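A check for undesignated twins might look like the following Python sketch. The criterion used here (same father, same mother, same date of birth, and no entry in either twin field) is an assumption about what "based on date of birth" means, and `undeclared_twins` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date

def undeclared_twins(records):
    """records: dicts with 'id', 'father', 'mother', 'dob', 'mz', 'dz'.

    Returns the sorted IDs of sibs who share a birth date but carry no twin code.
    """
    sibships = defaultdict(list)
    for rec in records:
        if rec["dob"] is None or rec["father"] is None or rec["mother"] is None:
            continue
        sibships[(rec["father"], rec["mother"], rec["dob"])].append(rec)
    suspects = []
    for group in sibships.values():
        if len(group) > 1:
            suspects.extend(r["id"] for r in group
                            if r["mz"] is None and r["dz"] is None)
    return sorted(suspects)

sibs = [
    {"id": "401", "father": "200", "mother": "300", "dob": date(1950, 3, 1), "mz": None, "dz": None},
    {"id": "402", "father": "200", "mother": "300", "dob": date(1950, 3, 1), "mz": None, "dz": None},
    {"id": "403", "father": "200", "mother": "300", "dob": date(1953, 7, 9), "mz": None, "dz": None},
]
print(undeclared_twins(sibs))
```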
Genotype "G" data are character fields that contain allelic marker data separated by the forward slash "/" character. The allele labels themselves must be numeric, non-alphabetic labels, e.g. "141/142".
The names of genotype fields should be the names of the markers themselves. This allows Madeline to automatically place the genotype fields into map order whenever a map database for the markers is loaded using the load command. Make sure that marker names in the map database are capitalized to correspond with the required capitalization of field names.
Estimation of Allele Frequencies from Genotype Data
When a database is opened, Madeline automatically estimates allele frequencies for all genotype fields using gene counting ignoring family relationships. Allele frequencies are estimated from all records in a database. Allele frequencies calculated from one database may be saved for use when processing another database using the set SaveAlleleFrequencies on command.
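Gene counting that ignores family relationships amounts to tallying every observed allele and dividing by the total number of alleles. A minimal Python sketch, with `None` standing in for a missing genotype and `allele_frequencies` as a hypothetical name:

```python
from collections import Counter

def allele_frequencies(genotypes):
    """genotypes: iterable of 'a/b' strings, or None for missing genotypes."""
    counts = Counter()
    for genotype in genotypes:
        if genotype is None:
            continue                      # missing genotypes contribute nothing
        counts.update(genotype.split("/"))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}

print(allele_frequencies(["141/142", "141/141", None]))
```

Because relatives share alleles, counts taken over all records in this way are not independent observations; the sketch simply mirrors the stated approach and makes no correction for relatedness.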
Phenotype "P" fields are any remaining fields that are not core "C" or genotype "G" fields. Phenotype fields may be character, numeric, or date fields, and are assumed to contain categorical or continuous phenotype information. Because date fields cannot be written to output from the write command, date fields are the only type of phenotype field not flagged for output when a pedigree database is opened.
For some types of output, it may be necessary to designate certain phenotype fields as representing covariates. Madeline therefore maintains a separate covariate or "V" field category which is a subset of the "P" category. Covariate fields are automatically recognized as phenotype fields when writing any format that doesn’t distinguish between phenotype and covariate fields. "P" fields can be marked as "V" fields using the toggle command.
Marking and Ordering Data Fields for Output
When a pedigree database is opened, most core "C" fields, all genotype "G" fields, and all phenotype "P" fields (except date fields), are flagged, or toggled on, for output by default. Madeline indicates which fields in a database are toggled for output by placing the letter "o" after the category indicator "C","G", or "P" (Fig. 1.7). A number after the "o" indicates the order in which fields will appear in pedigree drawings and file output. Fields may be manually reordered using the set field order command.
M>list fields
 1.FAMID     Co__1    20.D20S482   Go__6    39.D20S96    Go_25
 2.STUDYID   Co__2    21.D20S849   Go__7    40.D20S119   Go_26
 3.SEX       Co__3    22.D20S905   Go__8    41.D20S481   Go_27
 4.FATHER    Co__4    23.D20S846   Go__9    42.D20S836   Go_28
 5.MOTHER    Co__5    24.D20S892   Go_10    43.D20S888   Go_29
 6.TWIN      Co__6    25.D20S115   Go_11    44.D20S886   Go_30
 7.NAFFECTE  Co__7+   26.D20S851   Go_12    45.D20S197   Go_31
 8.BMI       Po__1    27.D20S917   Go_13    46.D20S178N  Go_32
 9.INS_FAST  Po__2    28.D20S894   Go_14    47.D20S866   Go_33
10.INS_2H    Po__3    29.D20S189   Go_15    48.D20S196   Go_34
11.BW_REAL   Po__4    30.D20S898   Go_16    49.D20S857   Go_35
12.GLU_FAST  Po__5    31.D20S114   Go_17    50.D20S480   Go_36
13.GLU_2H    Po__6    32.D20S912   Go_18    51.D20S211   Go_37
14.GAD_DUP   Po__7    33.D20S477   Go_19    52.D20S840   Go_38
15.D20S103   Go__1    34.D20S874   Go_20    53.D20S120   Go_39
16.D20S117   Go__2    35.D20S195   Go_21    54.D20S100   Go_40
17.D20S906   Go__3    36.D20S909   Go_22    55.D20S102   Go_41
18.D20S193   Go__4    37.D20S107   Go_23    56.D20S171   Go_42
19.D20S889   Go__5    38.D20S170   Go_24    57.D20S173   Go_43
M>
Fig. 1.7. Categorization of Fields in Madeline. The plus "+" sign after NAFFECTE indicates that Madeline has detected this field as the AffectionStatusField: categorical levels of this field will be used to color icon symbols on pedigree drawings. A field listing is shown when a database is first opened, or at any other time using the list fields command.
The order of genotype fields is automatically set to map order when a marker map database is loaded using the load command. Load can be issued either before (the preferred method) or after an open command. The order of genotype fields whose names match the names of markers in the map database will be set to the map order.
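Placing genotype fields into map order can be sketched in Python as a sort keyed on chromosome and ordinal position. Where Madeline places genotype fields that have no matching marker is not stated here; appending them after the mapped fields is an assumption, and `order_genotype_fields` is a hypothetical name.

```python
def order_genotype_fields(field_names, marker_map):
    """marker_map: marker name -> (chromosome, ordinal position).

    Fields whose names match a marker are sorted into map order;
    unmatched fields keep their relative order and follow (assumed placement).
    """
    mapped = [f for f in field_names if f in marker_map]
    unmapped = [f for f in field_names if f not in marker_map]
    mapped.sort(key=lambda name: marker_map[name])
    return mapped + unmapped

marker_map = {"D7S2477": (7, 1), "D7S531": (7, 2), "D20S103": (20, 1)}
print(order_genotype_fields(["D20S103", "D7S531", "XYZ", "D7S2477"], marker_map))
```

The lookup is by exact name, which is why marker names in the map database need to be capitalized to match the field names.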
Fields toggled on for output are displayed in pedigree drawings created with the draw command.
When a write command is executed, the set of core "C" fields required by the specific format being produced will generally be output regardless of the on/off output flag status. For example, Madeline will output the GenderField even if you toggle it off because it is required for almost all output formats. This behavior is required to ensure proper file construction. Genotype "Go" fields toggled for output will be written, along with phenotype "Po" (and possibly covariate "Vo") fields toggled for output if the analysis format supports phenotype fields. Some analysis programs, such as Genehunter and Siblink, do not use phenotype data beyond affection status (which is a core field).
Fields may be toggled on or off for output using the toggle command.
Madeline makes use of marker map information to:
The load command is used to load a table containing genetic maps for one or more chromosomes. It may contain only one map for each chromosome. The map database must contain fields of information specifying the chromosome, rank or ordinal position of the marker within the map for a given chromosome, name of the marker, and the position of the marker in centiMorgans (Table 1.7). A map may be viewed using the list map command (Fig. 1.8).
| Variable For Storing Field Name | Default Value | Description |
| ChromosomeField | "CHROMOSOME" | Numeric field storing the chromosome number. |
| OrdinalField | "ORDINAL" | Numeric field storing the ordinal position or rank of the marker on the map for this chromosome. |
| MarkerField | "MARKERNAME" | Character field storing the name of the marker |
| PositionField | "POSITION" | Numeric field storing the map position from the p terminus in centiMorgans. |
M>load '\maps\newmaps.dbf'
Marker maps based on \maps\newmaps.dbf are now installed.
M>list map for chromosome=7
Marker Name    Ch  Or   Position
-----------    --  --   --------
D7S2477         7   1     0.0000
D7S531          7   2     5.4000
D7S517          7   3     7.7000
D7S513          7   4    19.1000
D7S493          7   5    36.1000
D7S516          7   6    43.8000
D7S484          7   7    55.6000
D7S510          7   8    62.7000
D7S2422         7   9    74.2000
D7S669          7  10    87.4000
D7S657          7  11   102.6000
D7S515          7  12   111.8000
D7S2502         7  13   124.9000
D7S530          7  14   134.1000
D7S640          7  15   140.5000
D7S495          7  16   145.7000
D7S2513         7  17   150.9000
D7S483          7  18   167.7000
D7S550          7  19   182.4000
M>
|
Fig. 1.8. Loading and viewing marker maps in Madeline. A map database is installed using the load command. The list map command is used to print a table showing marker name, chromosome, mapped order, and position in centiMorgans.
When using FUSION data with Madeline v. 0.90
and above, be sure to include the following two lines in your batch file,
or in the autorun.bat file, in order to define the
map database field names used in FUSION:
OrdinalField ="POSITION"
PositionField="KOSAMBICM"
|
Log and Error Reporting Features
Madeline produces three types of log files (Table 1.8). The first is a summary file that has a ".log" extension by default and records each command that was entered and a summary of execution results. For example, results of a write command indicate how many pedigrees and individuals were included, how many were excluded, and the total number of pedigrees and individuals. The second is a detail file that has a ".dtl" extension by default. It provides detailed information on which pedigrees and individuals were excluded and why they were excluded. The third log file is an error log that has a ".err" extension by default and records warning and error conditions that occur.
| Type of File | Default Name | Purpose |
| Summary | madeline.log | Records commands and summaries of execution results. |
| Detail | madeline.dtl | Records details regarding inclusion and exclusion of individuals and pedigrees. |
| Error | madeline.err | Records warning and error conditions. |
Display of Warning and Error Levels
If manageable errors do occur when a new pedigree database is opened, Madeline’s
interactive "M>" prompt changes to display the number and type of error conditions
detected. For example, "1 SYNTAX ERROR 10 WARNINGS M>"
would indicate that one syntax error and 10 manageable error conditions or
warnings occurred. Altogether, the program maintains four categories of warnings
and errors:
A syntax error refers to an error in typing a command on the command line or in a batch file. A warning often indicates a manageable database error such as having only one instead of both parents listed in a database. A severe warning indicates a more severe type of database error such as having a male listed as the mother of an individual. Madeline will try to manage this type of situation, for example by changing the sex of the "male" mother to female. Such a change does not guarantee that the situation is remedied, much less correct: later in the same database, the "male" mother may turn out to be listed as the "father" of another child! This would cause a fatal error, causing the program to terminate, because there is no way to rectify such inconsistent information. The warning and error conditions may be reviewed in the error log.
Pedigree Reconstruction and the Categorization of Individuals
When a pedigree database is opened, Madeline reconstructs pedigrees based on the core data fields. When records for the parents of non-founder individuals are absent from the database, Madeline dummies-in the parents using the IDs shown in the FatherIDField and MotherIDField. If one of the two parental IDs is missing, Madeline creates a random ID for the missing parent. Random IDs are always eight characters in length and begin and end with an exclamation point (e.g., "!EW12M5!", "!G79ER5!", etc.) to facilitate recognition.
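The shape of these random IDs can be sketched in a few lines of Python. This is not Madeline's code: the function names are hypothetical, and the use of uppercase letters and digits for the six inner characters is an assumption based only on the examples shown above.

```python
import random
import string


def make_placeholder_id(rng=random):
    """Build an eight-character ID that begins and ends with '!'."""
    # Assumption: inner characters are uppercase letters or digits.
    alphabet = string.ascii_uppercase + string.digits
    middle = "".join(rng.choice(alphabet) for _ in range(6))
    return "!" + middle + "!"


def is_placeholder_id(individual_id):
    """Recognize IDs that follow the '!......!' pattern."""
    return (len(individual_id) == 8
            and individual_id.startswith("!")
            and individual_id.endswith("!"))


print(make_placeholder_id())
```

A check such as `is_placeholder_id` is the kind of test that the exclamation points make easy when scanning output for dummied-in parents.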
When FUSION IDs are used, Madeline dummies-in parents
even when parental IDs are not provided in the FatherIDField and
MotherIDField, and joins together spouses when they don’t have any offspring.
After reconstructing pedigrees, Madeline classifies individuals into categories (Table 1.9) and summarizes their distribution in a table (Fig. 1.9). Attached individuals are individuals in the database who have either parents, or offspring, or both. Unattached individuals are in the database, but remain unconnected because they don’t have parents or offspring. Unattached individuals often represent a set of unrelated controls in a data set.
In the current version, childless spouses can only
be detected when FUSION IDs are employed. When a FUSION couple does not have children
listed in the database, usually one of the individuals has other connections to the
pedigree and falls into the attached category, while the remaining spouse
usually has no other connections to the pedigree and so is categorized as a
childless spouse.
| Category | Description |
| In Database: | |
| Attached | Individuals in the database who have parents and/or offspring. |
| Childless Spouses | Married individuals in the database who do not have children and who are not otherwise attached to a pedigree. |
| Unattached | Individuals in the database who remain unconnected. These may be controls. |
| Not In Database: | |
| Not In Database | Parents without records in the database who are inserted by Madeline. |
M> open '\test\test.dbf'
.
.
.
----------------------------- --------- --------- ---------
Pedigrees and Individuals Included Excluded Total
----------------------------- --------- --------- ---------
Pedigrees ................... 590 0 590
Individuals ................. 3,317 0 3,317
+ In database .............. 2,178 0 2,178
| + Attached .............. 2,164 0 2,164
| + Childless spouses ..... 14 0 14
| + Unattached ............ 0 0 0
+ Not in database .......... 1,139 0 1,139
M>
|
Fig. 1.9. Summary table of pedigree count and distribution of individuals by category in Madeline. After a database is opened and pedigrees reconstructed, Madeline displays a table showing the number of pedigrees and distribution of individuals by category.
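The attached, unattached, and not-in-database categories can be approximated with the following Python sketch. It is a simplification under stated assumptions, not Madeline's algorithm: the record layout (an ID mapped to a father/mother pair) and the function name `categorize` are hypothetical, the IDs are invented, and childless spouses are omitted because their detection relies on FUSION ID structure.

```python
def categorize(records):
    """Classify individuals as attached, unattached, or not in database.

    records maps an individual ID to a (father_id, mother_id) pair,
    with None for a parent that is not listed.
    """
    # Collect every ID that is named as somebody's parent.
    parents_named = set()
    for father, mother in records.values():
        parents_named.update(p for p in (father, mother) if p is not None)

    categories = {}
    for ind, (father, mother) in records.items():
        has_parents = father is not None or mother is not None
        has_offspring = ind in parents_named
        if has_parents or has_offspring:
            categories[ind] = "attached"
        else:
            categories[ind] = "unattached"

    # Parents named by a child but lacking a record of their own.
    for parent in parents_named - set(records):
        categories[parent] = "not in database"
    return categories


# Invented IDs: two siblings with unrecorded parents, plus one control.
example = {"101": ("201", "301"), "102": ("201", "301"), "900": (None, None)}
print(categorize(example))
```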
Madeline provides _unattached, _ChildlessSpouse, and _InDatabase as references which return boolean status information regarding the categorization of an individual. These references can be easily used in queries to find out about the categorization of individuals (Fig. 1.10).
M>view for _ChildlessSpouse
0007-500 in 0007 (rec. no. 42) * childless spouse *
0049-500 in 0049 (rec. no. 276) * childless spouse *
0409-500 in 0409 (rec. no. 2433) * childless spouse *
0442-500 in 0442 (rec. no. 2628) * childless spouse *
0497-500 in 0497 (rec. no. 2912) * childless spouse *
1040+500 in 1040 (rec. no. 3917) * childless spouse *
1360+500 in 1360 (rec. no. 4853) * childless spouse *
1366+500 in 1366 (rec. no. 4862) * childless spouse *
8 individuals in 8 pedigrees matched as follows:
Individuals .............. 8
 + In database ........... 8
 |  + Attached ........... 0
 |  + Childless spouses .. 8
 |  + Unattached ......... 0
 + Not in database ....... 0
M>
|
Fig. 1.10. References returning boolean status information about individuals, such as _ChildlessSpouse, can be easily incorporated into queries in Madeline.
Data Classifications of Individuals
Before writing a file in a specific format using the write command, Madeline determines which individuals in a pedigree have data that can be used in an analysis of that pedigree. Madeline does this by examining the phenotype "Po" and genotype "Go" fields toggled on for output. Madeline uses this information when deciding which individuals are required in output. This is described in more detail in Data Evaluation and Management.
After the file has been written, Madeline displays a summary table showing the distribution of included and excluded pedigrees and individuals by category (Fig. 1.11). In this table, Madeline sub-categorizes attached individuals based on whether they have data or not, or have been otherwise marked for exclusion by the user. Note that individuals marked for exclusion may actually be included in output, but without their data, in order to preserve pedigree structure.
M>write to '\test\test.ped' in genehunter format
.
.
.
----------------------------- --------- --------- ---------
Pedigrees and Individuals Included Excluded Total
----------------------------- --------- --------- ---------
Pedigrees ................... 574 16 590
Individuals ................. 3,247 70 3,317
+ In database .............. 2,140 38 2,178
| + Attached .............. 2,140 24 2,164
| | + With data .......... 2,139 15 2,154
| | + Without data ....... 1 9 10
| | + Marked for exclusion 0 0 0
| + Childless spouses ..... 0 14 14
| + Unattached ............ 0 0 0
+ Not in database .......... 1,107 32 1,139
M>
|
Fig. 1.11. Summary table after a write command in Madeline. Madeline displays a summary table showing the distribution of included and excluded pedigrees and individuals by category. Attached individuals (in bold) are sub-categorized based on whether they have data or not, or have been marked for exclusion by the user.
When present, Madeline relies on information contained in the MZTwinField, DZTwinField, and DateOfBirthField to evaluate monozygotic and dizygotic twinships. When the optional DateOfBirthField is included, Madeline verifies that birth dates of twins match. Verification is extended to dizygotic twins when the optional DZTwinField is also included.
When the DateOfBirthField is included, Madeline looks for twins who are not marked in either the MZTwinField or the DZTwinField (if present). Apparent twins of opposite sex are categorized as dizygotic twins. Apparent same-sex twins are assigned to a special twin of unknown type category. Twins whose type is unknown are shown with a question mark between them in pedigree drawings.
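The rule for unmarked twins can be sketched as follows in Python. This is a hedged illustration rather than Madeline's implementation: the function name, the dictionary keys, and the use of ISO date strings are all assumptions, the IDs and dates are invented, and the input is assumed to be a single sibship that has already been grouped by parents.

```python
from itertools import combinations


def infer_unmarked_twins(siblings):
    """Return (id_a, id_b, twin_type) for unmarked sibling pairs sharing a birth date.

    siblings is a list of dicts with keys 'id', 'sex', 'dob', and
    'twin_flag' (True when already marked in a twin field).
    """
    found = []
    for a, b in combinations(siblings, 2):
        if a["twin_flag"] or b["twin_flag"]:
            continue  # already marked as twins; nothing to infer
        if a["dob"] is None or a["dob"] != b["dob"]:
            continue  # unknown or differing birth dates
        # Opposite-sex pairs are dizygotic; same-sex pairs stay unknown.
        twin_type = "dizygotic" if a["sex"] != b["sex"] else "unknown"
        found.append((a["id"], b["id"], twin_type))
    return found


sibship = [
    {"id": "401", "sex": "M", "dob": "1950-03-01", "twin_flag": False},
    {"id": "402", "sex": "F", "dob": "1950-03-01", "twin_flag": False},
    {"id": "403", "sex": "F", "dob": "1952-07-15", "twin_flag": False},
    {"id": "404", "sex": "F", "dob": "1952-07-15", "twin_flag": False},
]
print(infer_unmarked_twins(sibship))
```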
If Madeline encounters single, unpaired individuals marked as twins in the MZTwinField or DZTwinField, the program automatically removes the twin flag and informs the user of the change. The flag is only altered in memory -- the data table itself remains unchanged.
Messages about twinships are recorded in the summary and detail log files.
Madeline automatically detects consanguinity in pedigrees. Messages about consanguinity are recorded in the summary and detail log files.
There is no limit to the number of spouses that an individual in a pedigree may have. Pedigree drawings can display up to 10 spouses of a single individual.
Madeline can model pedigrees having multiple original founders. When the DividedPages flag is on (the default), Madeline's draw command will draw pedigrees consisting of an ancestral founder with one or more founding spouses on a single virtual page. Pedigrees consisting of two or more founding ancestral mate groups will be printed on multiple virtual pages. Whether a single virtual page is printed on one or more physical pages depends on the orientation setting and the unscaled dimensions of the drawing.
Data Evaluation And Management
Prior to writing output in a specific format, Madeline determines which individuals in a pedigree have data that can be used for analysis by examining the genotype "Go" fields and, if appropriate, the phenotype "Po" and covariate "Vo" fields toggled on for output.
In general, an individual is considered to have genotype data if he or she is typed for at least one marker among the set of "Go" fields. If applicable, an individual is considered to have phenotype data if all of his or her "Po" and "Vo" fields are non-missing.
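The two rules differ in an important way: genotype data requires any one marker, while phenotype data requires every field. A minimal Python sketch of that distinction follows. The function names, the record layout, and the example values are hypothetical; only the any/all logic comes from the text above.

```python
def has_genotype_data(record, go_fields, missing=(None,)):
    """True if the individual is typed for at least one 'Go' marker."""
    return any(record.get(f) not in missing for f in go_fields)


def has_phenotype_data(record, po_vo_fields, missing=(None,)):
    """True if every 'Po' and 'Vo' field is non-missing."""
    return all(record.get(f) not in missing for f in po_vo_fields)


# Invented record: one marker typed, one phenotype coded as missing.
MISSING = (None, -9999)
person = {"D8S504": "3/5", "D8S550": None, "bmi": 27.4, "cpep": -9999}

print(has_genotype_data(person, ["D8S504", "D8S550"], MISSING))
print(has_phenotype_data(person, ["bmi", "cpep"], MISSING))
```

In this example the individual counts as having genotype data but not phenotype data, because a single typed marker is enough for the first rule while a single missing value fails the second.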
After flagging individuals in a pedigree who have usable data, Madeline decides whether the entire pedigree is usable or not. Madeline’s decisions depend on the specific format keyword associated with the write command. For example, using the GenehunterNpl keyword (for a non-parametric analysis) will result in a different set of pedigree exclusions than the genehunter keyword (for a parametric analysis), although there will certainly be overlap in the sets.
Only required individuals in included pedigrees are written to output. Required individuals consist of individuals who:
For example, records for unsampled parents are often required to show relationships among siblings. Terminal individuals without offspring who do not have data are excluded from output. Individuals who have been marked for exclusion by the user using the exclude command will be included, but without their data, only if they are required to maintain pedigree structure. Otherwise, they will be excluded.
It is possible to turn off Madeline's data evaluation machinery for most formats in order to include possibly unusable pedigrees and individuals in output by issuing the command set autoexclude off.
Tracking Inclusion and Exclusion of Pedigrees and Individuals
Madeline’s detail log file records which pedigrees were excluded from output. Fig. 1.12 shows an example detail log produced after requesting an output file in GenehunterNpl format.
.
.
.
GenehunterPedigreeHasData(): excluding pedigree 0547: contains only a single affected
individual.
GenehunterPedigreeHasData(): excluding pedigree 0557: contains only a single affected
individual.
GenehunterPedigreeHasData(): excluding pedigree 0558: lacks an individual with data.
GenehunterPedigreeHasData(): excluding pedigree 0560: contains only a single affected
individual.
GenehunterPedigreeHasData(): excluding pedigree 0572: contains only a single affected
individual.
GenehunterPedigreeHasData(): excluding pedigree 0583: contains only a single affected
individual.
GenehunterPedigreeHasData(): excluding pedigree 0587: contains only a single affected
individual.
.
.
.
|
Fig. 1.12. Excerpt from a Madeline detail log file produced after requesting output in GenehunterNpl format. Madeline’s detail log file records which pedigrees were excluded from output and why.
In addition, a draw command executed after a write command will automatically produce annotated pedigree drawings showing which individuals:
An example is shown in Fig. 1.13. In this example, the user marked individuals with a body mass index (BMI) greater than or equal to 35 for exclusion using the exclude command and then requested an output file in GenehunterNpl format.
Fig. 1.13. Annotated pedigree drawing produced by draw after a write command in Madeline. Madeline dummied-in the two founding parents, "200" and "300", who are indicated by dashed lines. They were included ("INCLUDED") in output. Two individuals, "500" and "601", were marked for exclusion by the user. The terminal individual, "601", was not included in output ("EXCLUDED"), but "500" was retained with data excluded in order to preserve pedigree structure ("DATA EXCL INDV INCL"). The remaining individuals are all annotated as having genotype data and were included in output ("HAS DATA - INCLUDED"). Affected individuals are shaded and labeled with "A", while unaffected individuals are unshaded and labeled with "U".
Madeline provides powerful mechanisms for querying and subsetting records in pedigree tables. Database management systems can generally match query criteria against only one record at a time. In contrast, Madeline is specialized for dealing with multiple relationships in a pedigree simultaneously.
Madeline provides mechanisms for referring to related records within a single query statement. In Madeline, you can reference an individual, his or her mother or father, mates, and offspring all in a single query statement.
You can also reference aggregate or summary information related to an entire sibship, such as the mean sibship value of a variable, as easily as you can reference values related to single individuals. These two mechanisms -- referencing related individuals and referencing sibship aggregate data -- make it easy to get answers to many questions in Madeline that can be tedious to obtain in general database management systems.
Referencing Internal Information About An Individual And Relatives
Madeline allows the user to look at internal information about an individual and his or her relatives using references. References are a subset of keywords which begin with an underscore character to distinguish them from similarly-named variables or fields in databases. There are two types of references:
References to Internal Information About An Individual
Madeline provides references to many items of internal information about an individual, such as the number of offspring (_noffspring) and number of mates (_nmates) an individual has, and total number of individuals in the individual's pedigree (_n). Example usage is shown in Fig. 1.14. Table 5.4 lists all references to internal information.
M>go 1901               <-- go to record no. 1901
M>show studyid          <-- display the studyid of this individual
"05100"
M>show bmi              <-- display body mass index
48.9809
M>show cpep             <-- display c peptide value
0.88
M>show _noffspring      <-- display number of offspring
4
M>show _nmates          <-- display number of mates
1
M>show _n               <-- display total number of individuals in this individual’s pedigree
16
M>
|
Fig. 1.14. References to internal information about an individual in Madeline. Command lines shown in blue are examples of references to internal information that Madeline maintains about each individual.
Madeline also maintains references which point to relatives of an individual (Fig. 1.15). The references to mates, _mate[], and offspring, _o[], are treated as arrays. Alternate references such as _spouse for _mate[0] and _FirstChild for _o[0] are also provided for convenience.
References can be chained using the dot operator, ".", in order to access information related to more distant relatives. For example, a maternal grandmother may be referenced using _mother._mother. Example usage is shown in Fig. 1.15. A complete list of references to relatives is provided in Table 5.4.
M>go 6174 <-- go to record no. 6174
M>show frstname <-- first name of individual
"William"
M>show lastname <-- last name of individual
"Goodman"
M>show _noffspring <-- number of offspring
11
M>show _nmates <-- number of spouses
1
M>show _mate[0].frstname <-- first name of spouse
"Tessie"
M>show _FirstChild.dob <-- date of birth of first listed child
{Thursday, May 30, 1957}
M>show _SecondChild.dob <-- date of birth of second listed child
{Monday, December 19, 1966}
M>show _o[10].dob <-- date of birth of eleventh listed child
{Sunday, January 25, 1953}
M>show _mother._mother.dob <-- date of birth of maternal grandmother (unknown)
{ }
M>show _mother._mother.lastname <-- last name of maternal grandmother
"Toughwoman"
M>
|
Fig. 1.15. Using References to Relatives in Madeline. Command lines using references to relatives are shown in blue. Note that children in the offspring vector are sorted by IndividualIDField, not by date of birth.
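For readers who think in terms of object graphs, chained references behave much like attribute access on linked records. The Python sketch below is an analogy only and says nothing about how Madeline stores pedigrees: the `Individual` class and its attribute names are hypothetical, and the names used are those shown in Fig. 1.15 (fields not shown in the figure are left empty).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Individual:
    frstname: str
    lastname: str
    mother: Optional["Individual"] = None
    father: Optional["Individual"] = None
    mates: List["Individual"] = field(default_factory=list)
    offspring: List["Individual"] = field(default_factory=list)


# Names taken from Fig. 1.15; unknown names are left as empty strings.
grandmother = Individual("", "Toughwoman")
mother = Individual("", "", mother=grandmother)
tessie = Individual("Tessie", "")
william = Individual("William", "Goodman", mother=mother, mates=[tessie])

# Analogous to: show _mate[0].frstname
print(william.mates[0].frstname)
# Analogous to: show _mother._mother.lastname
print(william.mother.mother.lastname)
```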
In addition to references to individual information and relatives, Madeline provides aggregate functions that allow one to look at aggregate or summary information -- such as means and standard deviations -- of the offspring of an individual (Fig. 1.16).
M>go 1577               <-- go to record no. 1577
M>show studyid          <-- display studyid
"044301"
M>show _noffspring      <-- display number of offspring
2
M>show _o[0].bmi        <-- body mass index of first child
31.1327
M>show _o[1].bmi        <-- body mass index of second child
32.7896
M>show _omean(bmi)      <-- mean body mass index of offspring
31.9612
M>show _ostddev(bmi)    <-- standard deviation of offspring bmi
1.17156
M>
|
Fig. 1.16. Aggregate Functions In Madeline. Aggregate functions (blue) allow one to look at summary information such as means and standard deviations of the offspring of individuals.
All aggregate functions take as an argument an expression which evaluates to a numeric result. Table 6.2 lists the aggregate functions available in Madeline.
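The idea of an aggregate function that takes an expression and evaluates it over an individual's offspring can be sketched in Python as follows. The helper names `omean` and `ostddev` mirror the Madeline references but are hypothetical Python functions, and the record layout is an assumption. The BMI values are those shown in Fig. 1.16. Using the sample (n-1) standard deviation is an inference: it reproduces the figure's values to about three significant digits, but the manual does not state which estimator Madeline uses.

```python
import statistics


def omean(offspring, expr):
    """Mean of expr evaluated over each offspring record."""
    return statistics.mean([expr(child) for child in offspring])


def ostddev(offspring, expr):
    """Sample (n-1) standard deviation of expr over the offspring."""
    return statistics.stdev([expr(child) for child in offspring])


# BMI values of the two offspring shown in Fig. 1.16.
children = [{"bmi": 31.1327}, {"bmi": 32.7896}]

print(omean(children, lambda c: c["bmi"]))
print(ostddev(children, lambda c: c["bmi"]))
```

Passing the expression as a function is what allows any numeric expression, not just a bare field name, to be aggregated.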
The view command retrieves a subset of records that match query criteria. The exclude command allows the user to mark a subset of records for exclusion from output. The unexclude command performs the opposite function -- unmarking a subset of records previously marked for exclusion. Starting with version 0.90, the draw command can also be invoked with a query expression in order to draw a subset of pedigrees. Example usage is shown in Fig. 1.17.
M>view for _noffspring>=3 and _omean(bmi)>=50
2113-100 in 2113 (rec. no. 32)
2113-500 in 2113 (rec. no. 35)
2 individuals in 1 pedigree matched as follows:
Individuals .............. 2
 + In database ........... 2
 |  + Attached ........... 2
 |  + Childless spouses .. 0
 |  + Unattached ......... 0
 + Not in database ....... 0
M>exclude for _noffspring>=3 and _omean(bmi)>=50
2113-100 has been marked for exclusion
2113-500 has been marked for exclusion
2 individuals in 1 pedigree marked for exclusion as follows:
Individuals .............. 2
 + In database ........... 2
 |  + Attached ........... 2
 |  + Childless spouses .. 0
 |  + Unattached ......... 0
 + Not in database ....... 0
M>draw pedigrees for _noffspring>=3 and _omean(bmi)>=50
1 pedigree in result set
calling "gs madeline.ps"
M>
|
Fig. 1.17. Query and Subsetting Commands in Madeline. In this example, the view command is used to identify parents having three or more offspring whose mean body mass index is greater than or equal to 50. The query result set contains one pair who are excluded using exclude. The draw command is then invoked with the same query expression in order to draw the relevant pedigree. The command draw pedigree '2113' could also have been used.
Madeline's draw command produces drawings of pedigrees using Adobe Postscript language procedures and document structuring conventions (Fig. 1.18).
Fig. 1.18. An example pedigree drawn by Madeline. In this example, two categorical variables indicating disease conditions are graphically displayed on the left and right halves of the icons. The status of the first condition, on the left side, is coded using "U" for unaffected and "A" for affected. On the right side, the status of the second condition is coded using "U" for unaffected, "M" for moderate, and "S" for severe. Missing values are indicated by dots, ".". The icon drawn with a dashed line perimeter indicates an individual whose record was not found in the database. No ID was provided in the FatherIDField of the gender-unknown offspring, and so the program has assigned a random ID of !21A3F8! to the missing father. (The displayed data were invented to illustrate the drawing capabilities of the program).
Pedigree drawings can display any number of field variables present in a dataset. The toggle command is used to select fields for inclusion on a pedigree drawing. Toggle output flags toggles which fields appear as labels under the icons on a pedigree drawing. The set field order command is used to order selected fields within their respective categories, "C", "P", or "G". On drawings, core "Co" fields always appear first, followed by phenotype "Po" fields, and finally genotype "Go" fields.
Toggle icon flags toggles on or off the set of categorical variables to be displayed graphically by shading or coloring regions of the male and female icons. Madeline divides the icon into pie-slice shading regions based on the number of categorical variables selected. The program does not impose a limit on the number of categorical variables that can be graphed simultaneously.
The manner in which subtrees are divided across pages, the paper orientation, size, margins, and color may all be set using various set commands. When DividedDrawings is set on (the default), subtrees of a pedigree originating from different founding ancestor groups are printed on separate pages. Orientation may be set to portrait, landscape, automatic, or MultiPage. When orientation is set to automatic or MultiPage, Madeline decides on the orientation of individual pedigrees depending upon the width and height of each drawing. In the event that a drawing would require excessive reduction to fit on a single page, Madeline will automatically include Postscript commands to print the drawing in poster-style across several physical pages.
Madeline's Postscript drawing routines are efficient, typically permitting the construction of hundreds of drawings per second on a modern Sun SparcStation or Intel Pentium machine. In order to view the drawings on screen, the user needs to assign the name of a Postscript viewing application (such as GhostView, GV or GSView) to Madeline's PostscriptViewer variable (Fig. 1.17). This can be done in the autorun.bat file.
Fig. 1.17. Drawing pedigrees in Madeline. Toggle output flags specifies which fields will appear on the pedigree drawings. Draw pedigrees for ... specifies a subset of pedigrees that match the query criteria. Madeline calls the Postscript viewing application named in PostscriptViewer (gv in the Linux environment shown).
Producing Output Files for Analysis
The write command is used to produce locus, pedigree, and control or parameter files for analysis. Keywords like Mendel and GenehunterNpl are used to specify the analysis file format.
For most formats which require a control or parameter file, a single write command suffices to produce both the pedigree and control file. In these cases, the control file often contains the required locus information. For some other formats, the command write locus file is used to produce the locus file separately from the write pedigree command used to create the pedigree file. Section 4, Write Formats, documents the procedure required for supported formats.
Section 2
Tutorial
Introduction to the Tutorial
Madeline is easy to use once you see how it works. The goal of this section is to enable you to use Madeline to accomplish real tasks in a very short time. An instructive command file is shown in Fig. 2.1. Comment lines begin with two forward slashes, "//". Command lines are shown in bold. The effect of each command or group of commands is described in turn.
// Assign log files:
LogFile='chr8.log'
DetailFile='chr8.dtl'
ErrorFile='chr8.err'
quiet
system "dir \databases\chr8.*"

// Map missing value indicators:
list nmv
nmv[0]=-1
nmv[1]=-9
list nmv

// Map core field names:
GenderField='GENDER'
AffectionStatusField="AFFECTSTAT"

// Map codes used in core fields:
list csv
csv[_female]='FEMALE'
csv[_male]='MALE'
list csv

// Load a database containing genetic maps:
load '\maps\emap.dbf'
list map for chromosome=8

// Open pedigree database:
open '\databases\chr8.dbf'

// toggle off output of phenotype fields:
toggle output flag for bmi
list fields

// Example 1: Create files for Mendel USERM13 analysis:
write locus file to '\analysis\mendel.loc' in mendel format
write pedigree file to '\analysis\userm13.ped' in userm13 format

// Example 2: Create files for Genehunter non-parametric linkage analysis:
write locus file to '\analysis\ghnpl.loc' in genehunter format
write pedigree file to '\analysis\ghnpl.ped' in genehunternpl format

// Example 3: Create files for Siblink affected sib pair analysis:
// First, mark some individuals for exclusion:
exclude for bmi>=35
write to '\analysis\asp.ped' in SiblinkAffectedPairs format

// Draw pedigrees:
list fields
toggle output flags for 2-5, bmi, affectstat, 12-20
list fields
drawingfile='pedigrees.ps'
set color off
set orientation to automatic
set papermargin to 1.5
AffectstatLabel[0]="U"
AffectstatLabel[1]="A"
draw pedigrees '0001'-'0005','0472','0570'

// End session:
goodbye
|
Fig. 2.1. Example Madeline command file.
This tutorial includes sample commands to map missing values, assign core field names, and designate codes used in core fields. These commands are typically required, but some of them will not be needed when FUSION data are used. Madeline is generally quite flexible about the order in which commands are executed. The tutorial presents a recommended command sequence.
LogFile, DetailFile, and ErrorFile store the names of the summary, detail, and error logs. By default, LogFile is set to "madeline.log", DetailFile to "madeline.dtl", and ErrorFile to "madeline.err". If the default names are used, these files will be overwritten each time you start Madeline. When you provide new assignments (Fig. 2.2), the current contents of the log files are copied to the new files, and all subsequent messages are redirected to the new files. Reassignment of the log and detail files should be done at the beginning of a session.
M>LogFile='chr8.log'
LogFile has been changed from "madeline.log" to "chr8.log"
M>DetailFile='chr8.dtl'
DetailFile has been changed from "madeline.dtl" to "chr8.dtl"
M>ErrorFile='chr8.err'
ErrorFile has been changed from "madeline.err" to "chr8.err"
M>
|
Fig. 2.2. Reassigning summary, detail, and error log file names in Madeline.
By default, Madeline is in verbose mode. In verbose mode, all messages, both summary and detail log messages, are sent to the screen. Writing many messages to the screen slows the program down a bit and may be distracting, so Madeline supports two quieter levels. When quiet is issued, summary log messages continue to be printed to the screen, but detail log messages are suppressed from the screen. When silent or silence is issued, neither summary nor detail messages appear on the screen. Error messages are always printed to screen regardless of the verboseness setting. To return from a quiet state to the default, issue verbose. Under all circumstances, messages continue to be printed to the summary and detail log files, as appropriate. Quiet mode is recommended on platforms such as DOS32 and Windows that lack scrollable terminal window buffers.
System 'dir \databases\chr8.*'
The system command transfers a quoted-string command to the operating system shell. This allows the user to obtain directory and file information, copy or move files, or run other software without having to exit Madeline. System is especially useful when you need to obtain file or directory information using the DOS dir command or the UNIX ls command.
Mapping Missing Value Indicators
Nmv is the abbreviated name for the NumericMissingValue array. The list command instructs Madeline to list the elements of the array (Fig. 2.3).
M>list nmv
NMV has 1 element:
NMV[ 0]= -9999
M>nmv[0]=-1
M>nmv[1]=-9
M>list nmv
NMV has 2 elements:
NMV[ 0]= -1
NMV[ 1]= -9
|
Fig. 2.3. Mapping missing value indicators in Madeline.
By default, nmv[] contains a single element, -9999, which is a default missing value indicator used in the FUSION study. The assignment nmv[0]=-1 overwrites the value of the first cell with -1. The assignment nmv[1]=-9 assigns -9 to the second cell, automatically expanding the array if necessary. -1 and -9 will now be automatically recognized as missing value indicators when subsequently reading values in a database. Madeline’s self-expanding arrays do not impose a limit on the number of missing value indicators which may be used in a database.
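The self-expanding behavior can be mimicked in Python as a rough mental model. This is a sketch under assumptions, not a description of Madeline's internals: the class name `SelfExpandingArray` and the helper `is_missing` are hypothetical, and padding skipped cells with `None` is an assumption, since the manual does not say how gaps are filled.

```python
class SelfExpandingArray:
    """List-like array that grows when a cell past the end is assigned."""

    def __init__(self, initial):
        self._cells = list(initial)

    def __setitem__(self, index, value):
        # Pad with None until the requested cell exists (padding value
        # is an assumption made for this example).
        while len(self._cells) <= index:
            self._cells.append(None)
        self._cells[index] = value

    def __contains__(self, value):
        return value in self._cells

    def __len__(self):
        return len(self._cells)


# Mirror the session in Fig. 2.3: start with -9999, then remap.
nmv = SelfExpandingArray([-9999])
nmv[0] = -1   # overwrites the first cell
nmv[1] = -9   # expands the array to two cells


def is_missing(value):
    """True when value matches one of the missing value indicators."""
    return value in nmv


print(len(nmv), is_missing(-9), is_missing(-9999))
```

Note that after the first assignment, -9999 is no longer treated as missing, because the cell that held it has been overwritten.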
In a general setting, the names of core fields in a pedigree database may differ from the default names used in Madeline which are based on field names encountered in the FUSION study. Assignments to the appropriate core field name variables (Fig. 2.4) instruct Madeline to recognize core field names when a pedigree database is opened subsequently. Madeline will automatically capitalize and truncate field names to 10 letters if necessary.
M>GenderField='GENDER'
M>AffectionStatusField="AFFECTSTAT"
Fig. 2.4. Mapping Core Field Names in Madeline.
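As a rough illustration of the capitalize-and-truncate rule, the following Python sketch normalizes a field name to at most 10 upper-case letters. The function name `normalize_field_name` is hypothetical, and the order of operations (capitalize, then truncate) is an assumption. For plain ASCII names the order makes no difference.

```python
def normalize_field_name(name: str, max_len: int = 10) -> str:
    """Capitalize a field name and truncate it to max_len characters."""
    return name.upper()[:max_len]
```

With this rule, `normalize_field_name("AffectionStatus")` gives `"AFFECTIONS"`, while a name that is already short, such as `"gender"`, simply becomes `"GENDER"`.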
Mapping Codes Used In Core Fields
Arbitrary sets of codes may be used to represent core categorical information such as gender or affection status. Assignments to the appropriate arrays instruct Madeline to recognize study codings correctly. Fig. 2.5. shows how to tell Madeline to recognize the gender codes "MALE" and "FEMALE" in a database in place of the default codes "M" and "F". By using the symbolic constants _female and _male to index the array, you don't have to remember specifically which cell is reserved for which sex.
M>list csv
CSV has 2 elements:
CSV[ 0]="M"
CSV[ 1]="F"
M>csv[_female]='FEMALE'
M>csv[_male]='MALE'
M>list csv
CSV has 2 elements:
CSV[ 0]="MALE"
CSV[ 1]="FEMALE"
Fig. 2.5. Mapping codes used in core fields in Madeline.
The load command (Fig. 2.6) loads a table containing genetic maps for one or more chromosomes. The map table can be in any of the supported input database formats. It may contain only one map for each chromosome. The map table must contain fields of information specifying the chromosome, the rank or ordinal position of the marker within the map for a given chromosome, the name of the marker, and the position of the marker in centiMorgans.
After load, Madeline will indicate that marker maps have been installed. You can view a map by issuing list map for chromosome=n, where n is a valid chromosome number (the human X chromosome may be designated by 23). To obtain a listing of all markers for all chromosomes present in the table, issue list map by itself.
M>load '\maps\emap.dbf'
Marker maps based on k:\emap\emap.dbf are now installed.
M>list map for chromosome=8
Marker Name Ch Or Position
----------- -- -- --------
D8S504       8  1   0.0000
D8S550       8  2  15.1000
D8S258       8  3  30.1000
D8S283       8  4  55.0000
Beta3        8  5  59.8000
D8S285       8  6  66.4000
D8S260       8  7  71.3000
D8S530       8  8  80.7000
D8S270       8  9  94.4000
D8S276       8 10 105.0000
GATA101F01   8 11 111.4000
D8S514       8 12 122.2000
D8S284       8 13 135.3000
Fig. 2.6. Loading a database containing genetic maps in Madeline.
The USERM13, Genehunter, and Siblink pedigree files that will be written subsequently do not include phenotype information. With the exception of core "C" fields, which Madeline controls, it is imperative to toggle off all fields in the database that should not be included in the output and that should not be considered when Madeline decides whether an individual or pedigree contains sufficient data for output. This is done using the toggle command (Fig. 2.7). The list fields command can then be used to verify that the correct subset of fields was turned off.
// toggle off output of phenotype fields:
M>toggle output flag for bmi
Note: genotype fields ordered according to current map
M>list fields
 1.STUDYID    Co__1    8.BMI        P       15.D8S276     Go__9
 2.GENDER     Co__2    9.D8S504     Go__1   16.D8S283     Go__4
 3.FATHER     Co__3   10.D8S550     Go__2   17.D8S285     Go__5
 4.MOTHER     Co__4   11.D8S258     Go__3   18.D8S260     Go__6
 5.TWIN       Co__5   12.GATA101F01 Go_10   19.D8S530     Go__7
 6.AFFECTSTAT C       13.D8S514     Go_11   20.D8S270     Go__8
 7.DOB        C       14.D8S284     Go_12
M>
Fig. 2.7. Toggling and listing fields in Madeline. After the toggle command, field 8. BMI is no longer toggled on for output.
Open opens a pedigree database. Madeline's database engine seamlessly opens all supported database types on all supported platforms, allowing you to open FoxPro files on Solaris, SAS transport files on a PC, and so on. The user does not need to tell Madeline the file type. To open an ASCII flat file database, see documentation for the recognize, convert, rectify, transpose and merge commands.
When a pedigree database is opened, Madeline first categorizes fields as core "C", genotype "G", phenotype "P", or null, "*". If genotype fields are present, allele frequencies are estimated from all of the data using gene counting, ignoring family relationships (a in Fig. 2.8). If a map table is already installed and contains a map for markers in the database, the genotype fields are automatically ordered according to the map (b in Fig. 2.8). Pedigrees are reconstructed based on the core information. Madeline performs additional data operations when optional core fields such as AffectionStatusField or DateOfBirthField are included (c in Fig. 2.8). In this example, Madeline marks several apparent dizygotic twinships. Madeline also flags the AffectionStatusField, AFFECTSTAT, with a plus sign, "+", indicating that the categorical levels of AFFECTSTAT will be displayed graphically on the male and female icons in pedigree drawings. Finally, the program displays a summary table showing the count of pedigrees and distribution of individuals by category (d in Fig. 2.8).
M>open '\hold\chr8.dbf'
Calculating allele frequencies for 9. D8S504...                     (a)
…
Calculating allele frequencies for 20. D8S270...                    (a)
Database "\hold\chr8.dbf" opened with 2,506 records
Core information read in 2.00 seconds
…
NOTE: 0471-100 and 0471-401 now marked with "a" indicating          (c)
an apparent dizygotic twinship.
NOTE: 0570-401 and 0570-402 now marked with "a" indicating          (c)
an apparent dizygotic twinship.
Pedigrees reconstructed in 0.1780 seconds
Note: genotype fields ordered according to current map              (b)
 1.STUDYID    Co__1    8.BMI        Po__1   15.D8S276     Go__9
 2.GENDER     Co__2    9.D8S504     Go__1   16.D8S283     Go__4
 3.FATHER     Co__3   10.D8S550     Go__2   17.D8S285     Go__5
 4.MOTHER     Co__4   11.D8S258     Go__3   18.D8S260     Go__6
 5.TWIN       Co__5   12.GATA101F01 Go_10   19.D8S530     Go__7
 6.AFFECTSTAT C    +  13.D8S514     Go_11   20.D8S270     Go__8
 7.DOB        C       14.D8S284     Go_12
----------------------------- --------- --------- ---------
Pedigrees and Individuals      Included  Excluded     Total         (d)
----------------------------- --------- --------- ---------
Pedigrees ...................       958         0       958
Individuals .................     3,626         0     3,626
 + In database ..............     2,506         0     2,506
 |  + Attached ..............     2,115         0     2,115
 |  + Childless spouses .....        13         0        13
 |  + Unattached ............       378         0       378
 + Not in database ..........     1,120         0     1,120
Fig. 2.8. Opening a pedigree database in Madeline. Madeline performs a series of operations when the open command is used to open a pedigree database. See text for explanation.
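Gene counting that ignores family relationships amounts to tallying every observed allele across all genotyped individuals and dividing by the total. The Python sketch below illustrates the idea for one marker. It assumes genotypes are stored as strings of the form "a/b" and that ".", "/", and "0/0" denote missing genotypes. The function name and the storage format are assumptions made for illustration, not a description of Madeline's internals.

```python
from collections import Counter

# Assumed missing-genotype codes (compared after blanks are removed).
MISSING = frozenset({".", "/", "0/0"})


def allele_frequencies(genotypes, missing=MISSING):
    """Estimate allele frequencies for one marker by simple gene counting.

    Every non-missing genotype contributes two alleles to the tally,
    regardless of how the individuals are related.
    """
    counts = Counter()
    for genotype in genotypes:
        genotype = genotype.replace(" ", "")
        if not genotype or genotype in missing:
            continue
        first, second = genotype.split("/")
        counts[first] += 1
        counts[second] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}
```

For the genotypes `["1/2", "2/2", "0/ 0", "."]`, two individuals are counted, giving a frequency of 0.25 for allele 1 and 0.75 for allele 2. The sketch makes no special provision for X-linked markers.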
Example 1: Creating Files for Mendel USERM13 Analysis
Mendel’s USERM13 module uses maximum likelihood methods to calculate allele frequencies, taking family relationships into consideration. All genotyped individuals in a database, including childless spouses, controls and other singleton individuals who are classified as unattached by Madeline can be used in an analysis.
USERM13 requires a locus and pedigree file as input. The locus file will contain allele frequency information calculated by Madeline. The pedigree file will contain the family and genotype information. The write locus file command with the generic mendel keyword creates the locus file (Fig. 2.9). The write pedigree file command with the userm13 keyword creates the pedigree file. As expected, childless spouses and a number of unattached individuals are included in the output file. The detail log file documents which individuals and pedigrees were excluded and why.
M>write locus file to '\analysis\mendel.loc' in mendel format
Locus file "\analysis\mendel.loc" has been written.
M>write pedigree file to '\analysis\userm13.ped' in userm13 format
Writing pedigree data to "\analysis\userm13.ped"
----------------------------- --------- --------- ---------
Pedigrees and Individuals      Included  Excluded     Total
----------------------------- --------- --------- ---------
Pedigrees ...................       810       148       958
Individuals .................     3,469       157     3,626
 + In database ..............     2,351       155     2,506
 |  + Attached ..............     2,107         8     2,115
 |  |  + With data ..........     2,107         0     2,107
 |  |  + Without data .......         0         8         8
 |  |  + Marked for exclusion         0         0         0
 |  + Childless spouses .....        13         0        13
 |  + Unattached ............       231       147       378
 + Not in database ..........     1,118         2     1,120
Fig. 2.9. Creating locus and pedigree files for a Mendel USERM13 analysis in Madeline.
Example 2: Creating Files for Non-parametric Linkage Analysis in Genehunter
Like USERM13, Genehunter also requires a locus and pedigree file for analysis. In addition to allele frequency information, Genehunter’s locus file will contain map distance information obtained from the previously loaded map database. The generic genehunter keyword is used to specify the locus file format (Fig. 2.10).
M>write locus file to '\analysis\ghnpl.loc' in genehunter format
Locus file "\analysis\ghnpl.loc" has been written.
M>write pedigree file to '\analysis\ghnpl.ped' in genehunternpl format
Creating associated Genehunter control file called "\analysis\ghnpl.ctl"
Writing pedigree data to "\analysis\ghnpl.ped"
----------------------------- --------- --------- ---------
Pedigrees and Individuals      Included  Excluded     Total
----------------------------- --------- --------- ---------
Pedigrees ...................       533       425       958
Individuals .................     3,033       593     3,626
 + In database ..............     2,003       503     2,506
 |  + Attached ..............     2,003       112     2,115
 |  |  + With data ..........     2,003       104     2,107
 |  |  + Without data .......         0         8         8
 |  |  + Marked for exclusion         0         0         0
 |  + Childless spouses .....         0        13        13
 |  + Unattached ............         0       378       378
 + Not in database ..........     1,030        90     1,120
Fig. 2.10. Creating locus and pedigree files for Genehunter non-parametric linkage analysis in Madeline.
The genehunternpl keyword specifies that Madeline exclude pedigrees that cannot be used in, or do not contribute to, a non-parametric linkage analysis in Genehunter. For a parametric linkage analysis, the generic genehunter keyword would have been used, which could have resulted in a different set of exclusions. Since Genehunter cannot make use of information from singleton individuals, all unattached individuals are excluded from the output file. Childless spouses are also excluded since they cannot contribute to an analysis in Genehunter.
For the Genehunter format, Madeline also creates a command file ending in a .ctl extension (ghnpl.ctl in the example). This file contains commands and parameter values for running the analysis in Genehunter (Fig. 2.11). Note that the values of Madeline’s internal variables OffEndDistance and EvaluationInterval are automatically inserted in the off end and increment distance commands. Whenever practical, Madeline produces control files in conjunction with data files for running analyses.
haplotype off
score all
ps on
off end 10.000000             <-- value from Madeline’s OffEndDistance variable
increment distance 0.500000   <-- value from Madeline’s EvaluationInterval variable
load \analysis\ghnpl.loc
scan \analysis\ghnpl.ped
total stat
\analysis\ghnpl.npl.ps
\analysis\ghnpl.lod.ps
\analysis\ghnpl.inf.ps
Fig. 2.11. Genehunter command file created by Madeline. This command file can be used directly to run the analysis. The values of Madeline’s internal variables OffEndDistance and EvaluationInterval are automatically inserted in the off end and increment distance commands.
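The two distance commands in the control file can be reproduced with simple fixed-point formatting. The Python sketch below is illustrative only: the function `distance_commands` is a hypothetical name, and the six-decimal format is inferred from the values shown in Fig. 2.11 rather than from any documented rule.

```python
def distance_commands(off_end_distance: float, evaluation_interval: float):
    """Format the 'off end' and 'increment distance' control-file lines.

    The six-decimal format mirrors the values shown in Fig. 2.11 and is
    an assumption about how the numbers are written.
    """
    return [
        f"off end {off_end_distance:.6f}",
        f"increment distance {evaluation_interval:.6f}",
    ]
```

With the values from the example session, `distance_commands(10.0, 0.5)` yields the lines `off end 10.000000` and `increment distance 0.500000`.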
Example 3: Excluding Individuals and Creating Files for Siblink Analysis
Madeline’s exclude command marks individuals for exclusion (Fig. 2.12). Marked individuals are dropped from output unless they are required to maintain pedigree structure, in which case they are retained without their data. A summary table shows the distribution of excluded individuals.
M>exclude for bmi>=35
223 individuals in 172 pedigrees marked for exclusion as follows:
Individuals ..............   223
 + In database ...........   223
 |  + Attached ...........   212
 |  + Childless spouses ..     1
 |  + Unattached .........    10
 + Not in database .......     0
Fig. 2.12. Excluding Individuals in Madeline. After an exclude command, Madeline produces a summary table showing the distribution of excluded individuals.
A single write command suffices to produce all required files for certain formats, such as Siblink (Fig. 2.13) which requires a pedigree file and a control file. Because the locus information is embedded in the control file, a separate write locus file command is not required. The write pedigree file command can always be abbreviated to write, as shown in the following example.
In addition to a table labeled "ACTUAL" showing the actual number of pedigrees and individuals included in the output file, Madeline produces a second table labeled "NUCLEAR FAMILY-BASED" which shows the number of nuclear families (labeled "Pedigrees"), individuals, and sibpairs included in the output file.
M>write to '\dump\asp.ped' in SiblinkAffectedPairs format
Creating associated SIBLINK control/parameter file called "\dump\asp.ctl"
Writing pedigree data to "\dump\asp.ped"
ACTUAL:
----------------------------- --------- --------- ---------
Pedigrees and Individuals      Included  Excluded     Total
----------------------------- --------- --------- ---------
Pedigrees ...................       425       533       958
Individuals .................     1,791     1,835     3,626
 + In database ..............       970     1,536     2,506
 |  + Attached ..............       970     1,145     2,115
 |  |  + With data ..........       958       152     1,110
 |  |  + Without data .......        12       781       793
 |  |  + Marked for exclusion         0       212       212
 |  + Childless spouses .....         0        13        13
 |  + Unattached ............         0       378       378
 + Not in database ..........       821       299     1,120
NUCLEAR FAMILY-BASED:
----------------------------- --------- --------- ---------
Pedigrees and Individuals      Included  Excluded     Total
----------------------------- --------- --------- ---------
Pedigrees ...................       425       365       790
Individuals .................     1,791     2,045     3,836
 + In database ..............       970     1,746     2,716
 |  + Attached ..............       970     1,355     2,325
 |  |  + With data ..........       958       336     1,294
 |  |  + Without data .......        12       782       794
 |  |  + Marked for exclusion         0       237       237
 |  + Childless spouses .....         0        13        13
 |  + Unattached ............         0       378       378
 + Not in database ..........       821       299     1,120
----------------------------- --------- --------- ---------
Number of Sibpairs ..........       626       898     1,524
----------------------------- --------- --------- ---------
M>
Fig. 2.13. Creating files for Siblink analysis in Madeline. A single write command produces the required pedigree and control files. Madeline produces a second table of nuclear family-based statistics for all formats that require decomposition of full pedigrees into nuclear families.
In the nuclear family-based statistics, individuals who appear as offspring in one nuclear family and subsequently as founding parents of their own nuclear families are counted twice, thus leading to an apparently greater number of individuals in the second table. There appear to be fewer pedigrees overall in the second table because singleton pedigrees counted in the "ACTUAL" table are not counted in the "NUCLEAR FAMILY-BASED" table. Madeline produces this second table for all output formats requiring the decomposition of full pedigrees into nuclear pedigrees, such as Siblink and Aspex.
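The double counting can be seen in a small Python sketch that decomposes a pedigree into nuclear families. The data layout (a dictionary mapping each individual to a father and mother pair) and the function names are assumptions made for illustration. The sibpair count here is simply all possible pairs within each sibship, which may differ from the criteria Madeline applies for a given output format.

```python
from collections import defaultdict


def nuclear_families(parents):
    """Group children by their (father, mother) pair.

    parents maps each individual ID to a (father, mother) tuple;
    founders have (None, None).
    """
    families = defaultdict(list)
    for child, (father, mother) in parents.items():
        if father is not None and mother is not None:
            families[(father, mother)].append(child)
    return dict(families)


def nuclear_counts(parents):
    """Return (families, individuals, sibpairs) on a nuclear-family basis."""
    families = nuclear_families(parents)
    # Each family contributes its two parents plus its children, so a
    # person who is a child in one family and a parent in another is
    # counted once in each.
    individuals = sum(2 + len(sibs) for sibs in families.values())
    sibpairs = sum(len(sibs) * (len(sibs) - 1) // 2 for sibs in families.values())
    return len(families), individuals, sibpairs


# Three-generation example: "a" is a child of gf and gm and a parent of "c".
pedigree = {
    "gf": (None, None), "gm": (None, None),
    "a": ("gf", "gm"), "b": ("gf", "gm"),
    "s": (None, None), "c": ("a", "s"),
}
```

Here the pedigree holds 6 people, but `nuclear_counts(pedigree)` reports 2 nuclear families, 7 individuals, and 1 sibpair, because "a" appears in both families.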
The toggle command is used to specify the set of fields to appear as labels on the pedigree drawings (Fig. 2.14). Fields can be referred to by either name or number, and a range of fields can be specified using a dash. DrawingFile indicates the name of the Postscript output file which will contain the drawings. Set color off instructs Madeline to use black and white. Set orientation to automatic tells Madeline to automatically decide which orientation is best, and to divide the drawing among several physical pages if necessary. Set PaperMargin to 1.5 instructs the program to leave margins of 1.5 centimeters on all four sides of the paper. Note that dimensions must be specified in centimeters.
M>list fields
 1.STUDYID    Co__1    8.BMI        Po__1   15.D8S276     Go__9
 2.GENDER     Co__2    9.D8S504     Go__1   16.D8S283     Go__4
 3.FATHER     Co__3   10.D8S550     Go__2   17.D8S285     Go__5
 4.MOTHER     Co__4   11.D8S258     Go__3   18.D8S260     Go__6
 5.TWIN       Co__5   12.GATA101F01 Go_10   19.D8S530     Go__7
 6.AFFECTSTAT C    +  13.D8S514     Go_11   20.D8S270     Go__8
 7.DOB        C       14.D8S284     Go_12
M>toggle output flags for 2-5, bmi, affectstat, 12-20
Note: genotype fields ordered according to current map
M>list fields
 1.STUDYID    Co__1    8.BMI        Po__1   15.D8S276     G
 2.GENDER     C        9.D8S504     Go__1   16.D8S283     G
 3.FATHER     C       10.D8S550     Go__2   17.D8S285     G
 4.MOTHER     C       11.D8S258     Go__3   18.D8S260     G
 5.TWIN       C       12.GATA101F01 G       19.D8S530     G
 6.AFFECTSTAT Co__2+  13.D8S514     G       20.D8S270     G
 7.DOB        C       14.D8S284     G
M>drawingfile='pedigrees.ps'
M>set color off
M>set orientation to automatic
M>set papermargin to 1.5
M>AffectstatLabel[0]="U"
M>AffectstatLabel[1]="A"
M>draw pedigrees '0001'-'0005','0472','0570'
Drawing page 1 of 1 page for pedigree 0001...
Drawing page 1 of 1 page for pedigree 0002...
Drawing page 1 of 1 page for pedigree 0003...
Drawing page 1 of 1 page for pedigree 0004...
Drawing page 1 of 1 page for pedigree 0005...
Drawing page 1 of 1 page for pedigree 0472...
Drawing page 1 of 1 page for pedigree 0570...
M>
Fig. 2.14. Drawing Pedigrees in Madeline.
Whenever a categorical field from a table is flagged for graphical display on a pedigree drawing, Madeline associates a labels array with the field. The user may designate a short label for each level of the categorical variable. The name of the labels array is simply the name of the categorical field with the word "label" appended to the end.
In this example, AFFECTSTAT was detected as the AffectionStatusField when the database was opened and automatically flagged for icon display by the program (one can also manually flag categorical fields using the toggle icon flag command). The name of the labels array is AffectstatLabel.
Madeline assigns default labels for each level of a categorical variable. The default labels will be either the values that the categorical variable takes at each level (such as 0, 1, 2, ... etc.) or sequential letters of the alphabet. The list command can be used to view the values in the array. Often one will want to assign different labels: here, unaffected individuals will be labeled with "U" and affected individuals with "A".
A range of pedigrees may be specified as a parameter to the draw command by using a dash to separate the starting and ending pedigrees. Individual pedigrees may be separated by commas. Note that pedigree IDs are string values that must be enclosed in quotes.
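Because pedigree IDs are strings, a range such as '0001'-'0005' is naturally interpreted with string comparison. The Python sketch below shows one way such a specification could be expanded. The function `select_pedigrees` and its argument layout are hypothetical, and the use of plain lexicographic comparison is an assumption.

```python
def select_pedigrees(all_ids, spec):
    """Expand a draw specification into a list of pedigree IDs.

    spec is a list whose items are either a single ID string or a
    (start, end) tuple denoting an inclusive range of IDs.
    """
    selected = []
    for item in spec:
        if isinstance(item, tuple):
            start, end = item
            # Assumption: ranges use ordinary string comparison.
            matches = [p for p in sorted(all_ids) if start <= p <= end]
        else:
            matches = [p for p in all_ids if p == item]
        for pedigree_id in matches:
            if pedigree_id not in selected:
                selected.append(pedigree_id)
    return selected


ids = ["0001", "0002", "0003", "0004", "0005", "0006", "0472", "0570"]
chosen = select_pedigrees(ids, [("0001", "0005"), "0472", "0570"])
```

With these IDs, `chosen` holds the seven pedigrees drawn in Fig. 2.14. An unquoted 0341 would be a number rather than a string, which is why the quotes are required.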
Because the draw command followed a write command, the pedigree drawings are annotated to show which individuals had data and were included in the Siblink output (Fig. 2.15). As expected, only affected individuals and their parents were included.
Fig. 2.15. Annotated pedigree drawing produced by Madeline. Individuals included in output are annotated with "INCLUDED", those contributing data are annotated with "HAS DATA".
2.15. Ending a Madeline Session
Goodbye (Fig. 2.16) is equivalent to quit. It terminates the current Madeline session.
M>goodbye
Releasing resources ...
Goodbye!
Fig. 2.16. Ending a Madeline Session
This section describes Madeline’s commands. Commands are presented in alphabetical order. A bold heading shows the name of each command. A second bold heading shows the syntax of the command. Note the following conventions (Table 3.1).
| Symbol | Description | |
| [ ] | Square brackets indicate optional items in the syntax. For example, DRAW PEDIGREE[S] means that DRAW PEDIGREE and DRAW PEDIGREES are both valid. | |
| < > | Angled brackets indicate an expression that can be evaluated by Madeline -- see below: | |
| <cXXX> | An expression beginning with a lower-case c indicates a character or string expression. For example, DRAW PEDIGREE <cFamilyID1> means that <cFamilyID1> must be in the form of a string, such as "0341". Issuing draw pedigree 0341 would result in an error. | |
| <nXXX> | An expression beginning with a lower-case n indicates a numeric expression. List map for chromosome=<nChrNo> means that <nChrNo> must be a number like 23. List map for chromosome=23 would succeed: list map for chromosome="X" would fail. | |
| <LXXX> | An expression beginning with an upper-case L indicates a logical expression that evaluates to either _true (1), or _false (0). This is usually an expression containing an equality or inequality operator, or a series of such operators joined by and or or. For example, view for <Lexpr> indicates that the view command requires a logical expression following the word for: view for studyage<=35 is a valid example of this command. | |
| <Field_i>, <InternalArray>, etc. | Other expressions in angled brackets, such as <Field_i> or <InternalArray>, represent database field variables or Madeline's internal variables or arrays. For example, toggle output flag for <field_i> means that the name of a field is expected to follow the word for: toggle output flag for D20S889 is a valid example. | |
| | | A bar indicates the word "or", indicating that either the option preceding or following the bar is valid. For example list fields|<InternalArray>|map indicates that list fields, list CharacterMissingValue, and list map are all valid variants of the list command. |
A description of the command follows the syntax heading, with at least one example showing how to use the command.
BANNER
BANNER
Displays program banner. See: HELLO, STATUS.
M>banner
MADELINE Version 0.910
Copyright (c) 1999 by Edward H. Trager
Clears exclusion flags from all individuals previously marked for exclusion using the exclude command. To clear exclusion flags from only a subset of individuals, use the unexclude command. See: EXCLUDE, UNEXCLUDE.
M>exclude for bmi>=35
213 individuals in 162 pedigrees marked for exclusion as follows:
Individuals ..............   213
 + In database ...........   213
 |  + Attached ...........   212
 |  + Childless spouses ..     1
 |  + Unattached .........     0
 + Not in database .......     0
M>clear exclusions
M>
CONVERT
CONVERT COMMA|TAB|<OTHER> DELIMITED FILE <INPUT_FILE> [TO <OUTPUT_FILE>]
Convert converts a comma-, tab- or other-delimited file to a space-delimited, column-aligned file that can be read by the recognize command. The keyword comma or tab can be used to specify comma- or tab-delimited files, respectively. Alternatively, you can specify the delimiter within single or double quotes.
If an output file is not specified, Madeline will create an output file having the same name as the input file, but with a ".mod" (i.e., modified) extension at the end. See: RECOGNIZE.
M>convert "*" delimited file "mydata.stars" to "mydata.dat"
Converting "mydata.stars" to "mydata.dat"
3547 lines were written.
M>
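The kind of transformation convert performs can be sketched in Python as follows. This is an illustration under stated assumptions, not Madeline's algorithm: the function `convert_delimited` is a hypothetical name, and replacing empty cells with "." is an assumption made so that every row keeps the same number of space-separated columns.

```python
def convert_delimited(lines, delimiter=",", empty="."):
    """Convert delimited text lines to space-delimited, column-aligned lines.

    Empty cells are replaced by `empty` (an assumption) so that columns
    stay aligned. Cell values containing blanks are not handled.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue                      # skip blank lines
        rows.append([cell.strip() or empty for cell in line.split(delimiter)])
    if not rows:
        return []
    ncols = max(len(row) for row in rows)
    for row in rows:
        row.extend([empty] * (ncols - len(row)))   # pad short rows
    widths = [max(len(row[i]) for row in rows) for i in range(ncols)]
    return [
        " ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]


aligned = convert_delimited(
    ["ID*SEX*BMI", "0001-100*M*27.5", "0001-401*F*"], delimiter="*"
)
```

In `aligned`, every column starts at the same character position on each line, which is the property a fixed-column reader needs.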
Draws pedigrees. Specify one or more pedigree (family) IDs separated by commas, or an alphabetically increasing range of pedigree IDs with a dash. Be sure to enclose pedigree IDs in quotes. Alternatively, a subset of pedigrees in which one or more individuals match a query expression may be drawn using draw pedigrees for <LExpression>.
Orientation, paper size, margins, and color vs. black-and-white printing may be set using set commands. The left-to-right sort order of siblings within sibships and multiple spouses connected to a single spouse may be explicitly set using the sort command.
Drawings are created using efficient Adobe Postscript language routines and document structuring conventions. Output, which may consist of one to hundreds of drawings, is sent to the file named in DrawingFile. A Postscript viewing application such as Ghostview or GV on Unix/Linux or GSView on Windows is required for on-screen viewing of drawings. The name and path of the Postscript viewing application is specified in PostscriptViewer: this can be included in the autorun.bat file.
Madeline v. 1.0 can draw most single- and multiple-founder pedigrees. When DividedPages is set on (the default), subtrees in a pedigree defined by each founding ancestral group are printed on separate virtual pages. A founding ancestral group consists of an ultimate founder and his or her one to many spouses. When DividedPages is off, the entire pedigree will be drawn on a single virtual page, regardless of structural complexity. DividedPages has no effect on simple pedigrees which originate with a single founding group. For complicated pedigrees, the DividedPages option separates a pedigree into several more easily-viewed sections.
The options for orientation are landscape, portrait, automatic, and MultiPage. When orientation is set to portrait or landscape, pedigree drawings are scaled to fit the dimensions of the physical page. The scaling factor required to reduce large pedigrees to small pages may result in loss of legibility (or new corrective lenses!); in these cases automatic, or MultiPage, is preferred.
Currently, the automatic and MultiPage options are identical. Automatic is preferred over portrait or landscape in most cases. When automatic is selected, Madeline chooses the best orientation based on the dimensions of the virtual drawing. If rescaling to fit a single physical page is likely to result in reduced legibility, the program inserts a Postscript routine for printing the drawing across two or more physical pages. Madeline automatically selects the number and orientation of physical pages that requires the least amount of rescaling.
Madeline produces a schematic index for assembling the individual pages after printing. The program may use up to 5 pages across and 5 pages down, or a total of not more than 25 pages, for printing a drawing in automatic mode. Normally only 2 to 4 pages are required for large drawings.
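One plausible way to choose a page layout that needs the least rescaling is to try every grid of up to 5 by 5 pages in both orientations and keep the one whose scale factor is closest to 1. The Python sketch below does exactly that. It is a guess at the general approach, not Madeline's actual routine: the function `choose_layout`, the tie-breaking rule (fewer pages first), and the absence of any legibility threshold are all assumptions.

```python
def choose_layout(width, height, page_w, page_h, max_pages=5):
    """Pick an orientation and page grid needing the least rescaling.

    width and height are the dimensions of the virtual drawing; page_w
    and page_h are the printable dimensions of one portrait page, in the
    same units. Returns (orientation, columns, rows, scale).
    """
    best = None
    for orientation, (pw, ph) in (
        ("portrait", (page_w, page_h)),
        ("landscape", (page_h, page_w)),
    ):
        for cols in range(1, max_pages + 1):
            for rows in range(1, max_pages + 1):
                # Largest uniform scale that fits the drawing on the grid.
                scale = min(cols * pw / width, rows * ph / height)
                # Assumption: prefer scale nearest 1, then fewer pages.
                key = (abs(1.0 - scale), cols * rows)
                if best is None or key < best[0]:
                    best = (key, orientation, cols, rows, scale)
    _, orientation, cols, rows, scale = best
    return orientation, cols, rows, scale
```

For a drawing 40 cm wide and 20 cm tall on a page with a 20 cm by 28 cm printable area, `choose_layout(40, 20, 20, 28)` selects two portrait pages side by side at a scale of 1.0.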
Due to the way that Madeline's Postscript routines manage the splitting of a large drawing for printing across multiple physical pages, Postscript viewing applications like Ghostview or GSView will generally only display the last section, or the viewer may appear to cycle through the individual pages of a split drawing without pausing. This limitation does not impair the correct printing of such drawings on a Postscript printer.
M>draw pedigrees '0001','0033','0317','0374'-'0376'
Drawing pedigree 0001, P0001006's subtree (page 1 of 2) ...
Printing drawing scaled to 0.91.
Drawing pedigree 0001, !EM89WP!'s subtree (page 2 of 2) ...
Drawing pedigree 0033, !FVQURR!'s subtree (page 1 of 1) ...
Printing drawing scaled to 0.94.
Drawing pedigree 0317, !A7Z3FP!'s subtree (page 1 of 2) ...
Printing virtual portrait drawing scaled to 1.02 on 4 physical pages wide by 2 physical pages tall.
(You may not be able to view entire drawing in Postscript viewing application).
Physical page print order index:
[5][6][7][8]
[1][2][3][4]
Drawing pedigree 0317, !9UE3V6!'s subtree (page 2 of 2) ...
Printing drawing scaled to 0.81.
Drawing pedigree 0374, P0374021's subtree (page 1 of 3) ...
Printing drawing scaled to 0.77.
Drawing pedigree 0374, P0374015's subtree (page 2 of 3) ...
Printing virtual landscape drawing scaled to 0.98 on 2 physical pages wide by 1 physical page tall.
(You may not be able to view entire drawing in Postscript viewing application).
Physical page print order index:
[1][2]
Drawing pedigree 0374, P0374018's subtree (page 3 of 3) ...
Printing virtual landscape drawing scaled to 0.98 on 2 physical pages wide by 1 physical page tall.
(You may not be able to view entire drawing in Postscript viewing application).
Physical page print order index:
[1][2]
Drawing pedigree 0375, P0375011's subtree (page 1 of 1) ...
Printing drawing scaled to 0.94.
Drawing pedigree 0376, P0376007's subtree (page 1 of 1) ...
Calling "gs madeline.ps" ...
M>
Up to ten mates of a single individual may be drawn. At the time of this writing, the drawing routines were being revised to provide better support for drawing consanguineous loops and other complicated pedigree structures.
Edit a file using the editor specified in the FileEditor variable. This allows you to edit files without having to exit Madeline.
M>FileEditor="emacs"
M>edit "datafile.ped"     <-- Madeline calls emacs to edit the file
EXCLUDE
EXCLUDE [FAMILIES] FOR <LExpression>
Mark individuals for exclusion. If exclude families is used, all individuals who match the criteria and their spouses and descendants will be excluded. See: CLEAR, UNEXCLUDE
M>exclude for _famid="0049"
0049-100 has been marked for exclusion
0049-401 has been marked for exclusion
0049-701 has been marked for exclusion
0049-801 has been marked for exclusion
0049-802 has been marked for exclusion
M>
In this example, _famid is a reference to the family ID. You can dereference _famid even when no FamilyIDField is present in the database (as is permitted for FUSION 1 data).
Go to a specified record, nRecNo, in a database. In Madeline, records are numbered from 0 to n-1 where n is the total number of records in the table (inserted parents do not contribute to this count, and you cannot go to the non-existent table record of an inserted parent).
M>show studyid
"0001-100"
M>go 197
M>show studyid
"0052-100"
M>view record
...
...
M>
Terminate the current Madeline session. Equivalent to the quit command. See: QUIT.
M>goodbye
Releasing resources ...
Goodbye!
Displays the current setting of Madeline’s boolean state flags and other status information. Identical to the status command.
M>hello
+-----------------------+-----------+-----------------------------------------+
| Variable or State Flag| Setting   | Description                             |
+-----------------------+-----------+-----------------------------------------+
| AutoExclude           | ON        | Exclude pedigrees automatically         |
| Color                 | ON        | Draw pedigrees in color                 |
| DividedDrawings       | ON        | Paginate drawings by founding group     |
| EvaluationInterval    | 0.50 cM   | Value to write to control file.         |
| Help                  | HTML      | Extended HTML help documentation        |
| Language              | ENGLISH   | Language convention used for date, time |
| OffEndDistance        | 10.00 cM  | Value to write to control file          |
| Orientation           | AUTOMATIC | Automatic based on drawing dimensions   |
| PaperMargin           | 1.00 cm   | Margin (in cm) on all four sides        |
| PaperSize             | USLETTER  | 8.5 x 11.0 inches                       |
| SaveAlleleFrequencies | OFF       | Calculate new frequencies on next OPEN  |
| Time                  | Current   | 16:37 Monday, October 4, 1999           |
| Verbosity             | VERBOSE   | All messages are printed to the console |
+-----------------------+-----------+-----------------------------------------+
M>
Invokes HTML-based help. Madeline will invoke the world wide web browser named in WebViewer with the URL named in WebAddress. The default WebViewer is "netscape". The default WebAddress is the current URL of the Madeline online documentation. Help assumes that the quoted-string provided as an argument is a valid Madeline token (i.e., command, variable, array, or other keyword recognized by the interpreter) or other valid bookmark in the online documentation and simply passes it as part of the URL:
M>quiet
M>show WebViewer
"netscape"
M>show WebAddress
"www.sph.umich.edu/group/fusion/programs/madeline.html"
M>help "genehunter"
M>
The browser will locate any valid bookmark found in the online documentation, including section and topic headings. For example, help "tutorial" would bring up the Tutorial section of the online documentation.
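Passing a token "as part of the URL" most likely means appending it as a bookmark (fragment) after the page address. The Python sketch below builds the browser invocation on that assumption. The function `help_command` is a hypothetical name, and the use of "#" as the separator is a guess based on how HTML bookmarks normally work. The text does not state the exact URL format.

```python
from urllib.parse import quote


def help_command(web_viewer: str, web_address: str, token: str):
    """Build the argument list used to open help for a token.

    Assumption: the token is appended to the address as an HTML
    bookmark (URL fragment).
    """
    url = f"{web_address}#{quote(token)}"
    return [web_viewer, url]


command = help_command(
    "netscape",
    "www.sph.umich.edu/group/fusion/programs/madeline.html",
    "genehunter",
)
```

The resulting list could be handed to a process launcher such as `subprocess.run` to start the browser.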
See Lookup if you need to determine the name or correct spelling of a command, variable, or other token recognized by Madeline's interpreter.
LIST
(1) LIST FIELDS
(2) LIST <InternalArray>
(3) LIST MAP [FOR CHROMOSOME=<nChrNo>]
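Shows current values in a list of items. The list may consist of the fields in the open database, the elements of an internal array, or the markers in an installed map.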
The command has the three forms shown above. Examples of each form of the command follow:
(1) LIST FIELDS
M>open "chr8.dbf"
. . .
M>toggle output flag for D8s270-GATA101F01
M>list fields
 1.FAMID      Co__1   10.D8S504     Go__1   19.D8S1757    Go_10
 2.STUDYID    Co__2   11.D8S550     Go__2   20.D8S270     G
 3.SEX        Co__3   12.D8S258     Go__3   21.D8S1778    G
 4.FATHER     Co__4   13.D8S1771    Go__4   22.D8S276     G
 5.MOTHER     Co__5   14.D8S1820    Go__5   23.GATA101F01 G
 6.TWIN       Co__6   15.D8S283     Go__6   24.D8S514     Go_11
 7.BMI        Po__1   16.D8S285     Go__7   25.D8S284     Go_12
 8.NAFFECTE   Co__7+  17.D8S260     Go__8   26.D8S534     Go_13
 9.STUDYAGE   Po__2   18.D8S530     Go__9   27.D8S1836    Go_14
M>
(2) LIST <InternalArray>
M>//
M>// cmv is the internal array of missing value indicators for
M>// character/string variables:
M>//
M>list cmv
CMV has 5 elements:
CMV[ 0]="."
CMV[ 1]="/"
CMV[ 2]="0/0"
CMV[ 3]="0/ 0"
CMV[ 4]="0/ 0"
M>
(3) LIST MAP [FOR CHROMOSOME=<nChrNo>]
M>load 'k:\emap\emap.dbf'
Marker maps based on k:\emap\emap.dbf are now installed.
Note: genotype fields ordered according to current map
Field ordering now set based on k:\emap\emap.dbf.
M>list map for chromosome=17
Marker Name Ch Or Position
----------- -- -- --------
D17S945     17  1   0.0000
D17S1803    17  2   8.8000
D17S1871    17  3  21.6000
D17S798     17  4  27.0000
D17S791     17  5  40.2000
D17S809     17  6  48.3000
D17S1835    17  7  57.2000
D17S1351    17  8  74.3000
D17S802     17  9  88.0000
D17S1806    17 10  93.2000
M>
Column headings in the listing refer to marker name, chromosome number (Ch), ordinal rank (Or), and position in centiMorgans.
Load a map database. The map database must contain fields for chromosome number, marker name, ordinal position of the marker in the map for the chromosome, and positional distance in centiMorgans (Table 3.1).
| Name of Variable Storing Field Name | Default Value | Description |
|---|---|---|
| ChromosomeField | "CHROMOSOME" | Chromosome |
| OrdinalField | "ORDINAL" | Ordinal position (or rank) of the marker on the map for this chromosome |
| MarkerField | "MARKERNAME" | Name of the marker |
| PositionField | "POSITION" | Map position from p terminus in centiMorgans |
The map database can contain maps for any number of chromosomes, but may contain only one map for each chromosome. As soon as Madeline detects that a map database has been installed, genotype fields in an open pedigree database will automatically be placed in map order. When possible, execute load prior to any open command. When a pedigree database is subsequently opened, genotype fields will then automatically appear in map order from the outset.
M>load 'k:\emap\emap.dbf'
Marker maps based on k:\emap\emap.dbf are now installed.
M>list map for chromosome=23
Marker Name Ch Or Position
----------- -- -- --------
DXS7100     23  1   0.0000
DXS7110     23  2  33.4000
DXS1214     23  3  45.1000
DXS993      23  4  63.7000
DXS1055     23  5  72.5000
DXS991      23  6  80.7000
DXS986      23  7  90.1000
DXS8096     23  8 105.5000
DXS8072     23  9 146.2000
DXS8011     23 10 180.9000
M>open "chrx.dbf"
. . .
M>
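The effect of placing genotype fields in map order can be illustrated with a minimal Python sketch. This is not Madeline's implementation: the `MapEntry` record, the `map_order` function, and the choice to leave unmapped fields at the end in their original order are all assumptions made for illustration. The marker data are taken from the chromosome 23 listing above.

```python
from collections import namedtuple

# Hypothetical record mirroring the four map database fields
# (ChromosomeField, OrdinalField, MarkerField, PositionField).
MapEntry = namedtuple("MapEntry", "chromosome ordinal marker position")

def map_order(genotype_fields, map_entries):
    """Return genotype field names sorted by (chromosome, ordinal).

    Fields with no map entry are kept after the mapped ones, in their
    original order (an assumption; the manual does not say what happens).
    """
    rank = {e.marker.upper(): (e.chromosome, e.ordinal) for e in map_entries}
    mapped = sorted(
        (f for f in genotype_fields if f.upper() in rank),
        key=lambda f: rank[f.upper()],
    )
    unmapped = [f for f in genotype_fields if f.upper() not in rank]
    return mapped + unmapped

entries = [
    MapEntry(23, 1, "DXS7100", 0.0),
    MapEntry(23, 2, "DXS7110", 33.4),
    MapEntry(23, 3, "DXS1214", 45.1),
]
print(map_order(["DXS1214", "DXS7100", "DXS7110"], entries))
```

Because the lookup is keyed on the marker name, this sketch only works when genotype field names match the marker names in the map.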
In the FUSION map database tables, the OrdinalField is called "POSITION" and the PositionField is called "KOSAMBICM". Therefore, with FUSION data, be sure to include the following lines in your autorun.bat file or elsewhere as applicable:
OrdinalField="POSITION"
PositionField="KOSAMBICM"
Look up a command or keyword by supplying a string containing the first few letters of the command or keyword:
M>lookup 'g'
GENDERFIELD is an internal variable. Its current value is: "SEX".
GENEHUNTER is a keyword.
GENEHUNTERNPL is a keyword.
GENEHUNTERQTL is a keyword.
GENERIC is a keyword.
GENOTYPE is a keyword.
GO is a command.
GOODBYE is a command.
M>
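The prefix matching shown in the listing above can be sketched in a few lines of Python. The `TOKENS` table and the `lookup` function here are hypothetical stand-ins, since the interpreter's real token table is not reproduced in this manual. The match ignores case, in keeping with the interpreter's insensitivity to capitalization.

```python
# Hypothetical token table; the names are copied from the lookup 'g'
# listing above, plus one non-matching entry for contrast.
TOKENS = {
    "GENDERFIELD": "internal variable",
    "GENEHUNTER": "keyword",
    "GENEHUNTERNPL": "keyword",
    "GENEHUNTERQTL": "keyword",
    "GENERIC": "keyword",
    "GENOTYPE": "keyword",
    "GO": "command",
    "GOODBYE": "command",
    "HELP": "command",
}

def lookup(prefix, tokens=TOKENS):
    """Return (name, kind) pairs whose names begin with prefix, ignoring case."""
    wanted = prefix.upper()
    return [(name, kind) for name, kind in sorted(tokens.items())
            if name.startswith(wanted)]

for name, kind in lookup("gene"):
    print(f"{name}: {kind}")
```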
Merges any number of input tables to an output table. All input tables must contain identically-named FamilyIDField and IndividualIDField names which are used as the keys for constructing records in the output table.
Output is in Madeline's Mbase format which consists of a rectangular ASCII data table and an associated binary header file. The binary header file usually has the same name as the ASCII table, but with a .mfh extension.
The TO <OUTPUT_FILE> clause is optional. When present, data are written to the specified output file and an associated header is created with a .mfh extension. When absent, Madeline creates a file name based on the name of the first table by adding a .mrg extension to the end. The associated binary header will have a .mfh extension. In the event that a .mfh file already exists, Madeline uses an extension of .cfh instead.
The IN ALPHA | PHYSICAL | <USER_DEFINED_FILE> ORDER clause is also optional. When absent, the default ALPHA ORDER is used. When ALPHA ORDER is used, fields from all input tables are arranged alphabetically in the output table. When PHYSICAL ORDER is specified, fields in the output table are arranged in the same order that they appear in the source tables starting with the first table. Even though the key index fields FamilyIDField and IndividualIDField are present in every input table, they only appear once in the output table, as you would expect.
As an alternative to ALPHA and PHYSICAL order, you can specify the order of fields precisely by creating a text file containing the field names in the order you want, separated by white space (i.e., spaces and/or carriage returns). For example, you can create a text file containing the marker fields listed in genetic map order (along with any other fields from the source tables). Assuming this file was called "map.order", the clause IN "map.order" ORDER would instruct Madeline to read field order from this file. When PHYSICAL or <USER_DEFINED_FILE> ORDER is used, make sure that the only fields duplicated in all source tables are the key index fields, FamilyIDField and IndividualIDField. Other fields cannot appear multiple times. Be especially careful with core fields like GenderField, FatherIDField, and MotherIDField, which may well appear in more than one table. If it is not possible to remove non-index fields that appear multiple times, simply rename them so that name conflicts do not occur.
When ALPHA ORDER is used, fields that appear more than once are not a problem and will appear only once in the output table. The field type, width, and numeric precision of duplicate fields are based on the first table in which the fields appear. The data for such fields are also pulled from the first table in which the fields appear. As you would expect, tables are merged horizontally or side-by-side. Note that Madeline also permits you to merge tables vertically, but only in the case where ALPHA ORDER is used. For example, if you had two tables containing identical fields but with one containing one set of individuals in your study, and the other containing another set of individuals, MERGE ... IN ALPHA ORDER will permit you to join the two tables vertically. The restriction that fields be sorted in alphabetic order is necessary so that Madeline can map individual field data correctly even though it appears that field names are "duplicated". After a "vertical" merge, one can always do a subsequent MERGE in which a preferred field order is specified -- Madeline allows you to "merge" a single table in order to redefine field order!
Regardless of the setting of ORDER specified, any subsets of individuals who do not appear in all tables will have missing values for fields extracted from tables in which those individuals did not appear.
Table merges are done in memory. For large data sets, Madeline will use a lot of memory. On modern workstations, this should rarely be an obstacle. Madeline does not use table indexes on disk, but instead creates its own indexes in memory. If problems do occur with large tables, it may be necessary to merge files in stages.
//
// merge uses FamilyIDField and IndividualIDField
// as the keys for merging:
//
M>FamilyIDField ="FAMID"
M>IndividualIDField="INDIVIDUAL"
M>merge 't1.dbf','t2.dbf','t3.dbf','t4.dbf','t5.dbf' to 'out.dat' in 'map.order' order
Building field and record trees ...
Writing 2711 records to "out.dat"
5 databases merged to "out.mfh" in 8.5 seconds
M>open 'out.mfh'
... ...
M>
Merge is part of Madeline's arsenal of commands designed to ease the task of manipulating flat files. Also see: CONVERT, RECTIFY, and TRANSPOSE.
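The ALPHA ORDER behavior described above can be illustrated with a short Python sketch. It is one plausible reading of the rules, not Madeline's implementation: the function name `merge_alpha`, the use of dictionaries for records, sorting the key fields alphabetically along with the others, sorting output records by key, and using "." for missing values are all assumptions. The sample records are invented for illustration.

```python
def merge_alpha(tables, key_fields=("FAMID", "INDIVIDUAL"), missing="."):
    """Merge tables (lists of dict records) keyed on key_fields.

    Output fields are arranged alphabetically. When a field occurs in
    several tables, the first table that supplies a value for a given
    individual wins. Individuals absent from a table get the missing
    value for that table's fields.
    """
    fields = sorted({name for table in tables for record in table for name in record})
    merged = {}
    for table in tables:
        for record in table:
            key = tuple(record[k] for k in key_fields)
            target = merged.setdefault(key, {})
            for name, value in record.items():
                target.setdefault(name, value)  # keep the earliest value seen
    rows = [[merged[key].get(name, missing) for name in fields]
            for key in sorted(merged)]
    return fields, rows

# Invented sample data: t2 repeats BMI and adds a genotype field.
t1 = [
    {"FAMID": "0001", "INDIVIDUAL": "0001-100", "BMI": "23.4"},
    {"FAMID": "0001", "INDIVIDUAL": "0001-200", "BMI": "27.1"},
]
t2 = [
    {"FAMID": "0001", "INDIVIDUAL": "0001-100", "BMI": "99.9", "D8S504": "141/142"},
]
fields, rows = merge_alpha([t1, t2])
print(fields)
for row in rows:
    print(row)
```

In this sketch a "vertical" merge needs no special handling: two tables with identical fields but different individuals simply contribute different keys.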
OPEN
OPEN <cDatabaseTableName>
Open opens a pedigree database. Madeline currently supports the following database table formats:
Madeline’s database engine detects operating system and file byte-ordering at run time, permitting database files from PCs to be opened on Unix workstations, and vice-versa. The user does not need to tell Madeline the file type. Madeline does not make use of associated index files, such as .cdx files used by FoxPro. To open an ASCII flat file database, see RECOGNIZE.
Internally, when Madeline opens a database, the following events occur:
M>open '\hold\chr8.dbf'
Calculating allele frequencies for 9. D8S504...
...
Calculating allele frequencies for 20. D8S270...
Database "\hold\chr8.dbf" opened with 2,506 records
Core information read in 2.00 seconds
...
NOTE: 0472-100 and 0472-401 now marked with "a" indicating an apparent dizygotic twinship.
NOTE: 0570-401 and 0570-402 now marked with "a" indicating an apparent dizygotic twinship.
Pedigrees reconstructed in 0.1780 seconds
Note: genotype fields ordered according to current map
 1.STUDYID    Co__1     8.BMI        Po__1    15.D8S276 Go__9
 2.GENDER     Co__2     9.D8S504     Go__1    16.D8S283 Go__4
 3.FATHER     Co__3    10.D8S550     Go__2    17.D8S285 Go__5
 4.MOTHER     Co__4    11.D8S258     Go__3    18.D8S260 Go__6
 5.TWIN       Co__5    12.GATA101F01 Go_10    19.D8S530 Go__7
 6.AFFECTSTAT C    +   13.D8S514     Go_11    20.D8S270 Go__8
 7.DOB        C        14.D8S284     Go_12
----------------------------- --------- --------- ---------
Pedigrees and Individuals     Included  Excluded  Total
----------------------------- --------- --------- ---------
Pedigrees ...................       958         0       958
Individuals .................     3,626         0     3,626
 + In database ..............     2,506         0     2,506
 |  + Attached ..............     2,115         0     2,115
 |  + Childless spouses .....        13         0        13
 |  + Unattached ............       378         0       378
 + Not in database ..........     1,120         0     1,120
M>
Specifies that "detail" messages are not shown on the screen. Summary log messages still appear on the screen, and both detail and summary messages are still written to the .dtl and .log files, respectively. See: SILENT, VERBOSE.
M>quiet
Madeline is now in quiet mode.
M>
Terminates the program session. Equivalent to goodbye. See: GOODBYE.
M>quit
Releasing resources ...
Goodbye!
RECOGNIZE
RECOGNIZE <INPUT_FILE> [TO <BINARY_HEADER_FILE_NAME>]
Recognize a space-delimited, column-aligned rectangular ASCII data file (i.e., a "flat file") as a database table by creating a binary header file that contains key information about the number of records, number of columns, column names, column data types, and so on. By default, Madeline adds ".mfh" to the name of the input file to create the name of the output file. However, you can specify any other name for the binary header file using the to clause. If you plan on using the recognize command, be sure to read all of the following documentation very carefully!
If necessary, a flat file in the appropriate space-delimited column format can usually be created using Madeline's convert or rectify commands. In fact, recognize will automatically call rectify if necessary -- if this does occur, it is usually a good idea to investigate why rectify was called and to run rectify manually on a data file with all field column header information stripped out.
After the data are in the correct rectangular format, a minimal header containing the column names and data types needs to be added at the top of the file, as described below. If either convert or rectify is required, don't add a header until after running these commands!
When stored in a computer, a database table has two parts:
An ASCII flat file that contains a rectangular array of data with spaces (not tabs or commas) separating the aligned columns can be considered the simplest form of a database table:
0001 0001-100 F 0001-200 0001-300 23.45 14.2 141/142
0001 0001-200 M .        .            . 10.2 138/141
0001 0001-300 F .        .        78.21 15.2 140/142
.    .        . .        .            .    . .
.    .        . .        .            .    . .
.    .        . .        .            .    . .
.    .        . .        .            .    . .
The problem with this "database" format is that it has no header! There are no records to establish what the columns mean, how many columns there are, or how many records are in the table.
Madeline tackles this problem by constructing a separate binary header file which is used to open the table indirectly. The binary header file is built by the recognize command and usually has a ".mfh" (i.e., Madeline Flat file Header) extension. The combination of a ".mfh" binary header and an ASCII flat file table is referred to as the Madeline Database, or Mbase file format.
Madeline can determine a lot of key information just by examining the flat file table itself. From a table with unlabeled columns (such as illustrated above), Madeline can:
Always determine:
Almost always determine:
and often determine:
The ability to determine the gender, individual, father, and mother ID fields provides a fruitful start to deciphering a file with unmarked columns. Still, there is no way for Madeline to know what all columns in an unmarked file represent. In the absence of additional information, Madeline provides default names based on whether the columns contain character, numeric, or date data. This is usually not what you want, unless you are in a great hurry!
Madeline provides the opportunity for the user to provide column names and, if necessary, column data types, at the top of the flat file before the first record. When present, the recognize command reads this minimal flat file "header" before parsing the rectangular data array. Once you are confident that you have the data in the correct rectangular format, it is highly recommended that you add a minimal header to the file, as described below.
The ONLY information that should be provided about each field in the header is:
The following set of single-letter options is permitted for designating column type:
Column name and type must be separated by spaces and can appear on any number of lines. The only requirement is that the lines of the header must be shorter in length than those of the records. This is how the program knows which lines are header lines and which are data records.
Some core fields such as FamilyIDField and IndividualIDField must be treated as "C" character fields, even though the IDs consist of only numbers. So, at a minimum, it may be necessary to supply the field types of these core fields. Here is an example:
FAMID C INDIVIDUAL GENDER C
FATHER MOTHER STUDYAGE GLUCOSE D20S119
0001 0001-100 F 0001-200 0001-300 23.45 14.2 141/142
0001 0001-200 M .        .            . 10.2 138/141
0001 0001-300 F .        .        78.21 15.2 140/142
.    .        . .        .            .    . .
.    .        . .        .            .    . .
.    .        . .        .            .    . .
.    .        . .        .            .    . .
The spacing and arrangement of the column labels and data type indicators in the header above is immaterial --except that all lines of the header are shorter than the data records. The INDIVIDUAL, FATHER, and MOTHER IDs contain dash characters, so they will automatically be interpreted as "C" character fields without being marked so (the program can see that they are not date fields). However, FAMID consists entirely of digits and would be interpreted as "N" numeric if it were not marked "C". The above example is now ready to be processed by the recognize command.
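The length-based rule for telling header lines from data records can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the recognize algorithm itself: `split_header`, `parse_header`, and `KNOWN_TYPES` are hypothetical names, the type set contains only the codes mentioned in this section, and the sketch assumes every data record has the same length, which is longer than any header line.

```python
# Type codes mentioned in this section; the full list may be longer.
KNOWN_TYPES = {"C", "N", "X", "A"}

def split_header(lines):
    """Split non-empty lines into (header_lines, data_lines) by length."""
    lines = [ln.rstrip("\n") for ln in lines if ln.strip()]
    record_length = max(len(ln) for ln in lines)
    header, data = [], []
    for ln in lines:
        # Leading lines shorter than a full record are treated as header.
        if not data and len(ln) < record_length:
            header.append(ln)
        else:
            data.append(ln)
    return header, data

def parse_header(header_lines):
    """Return (name, type_or_None) pairs from whitespace-separated tokens."""
    columns = []
    for token in " ".join(header_lines).split():
        if token in KNOWN_TYPES and columns and columns[-1][1] is None:
            columns[-1] = (columns[-1][0], token)  # type code for previous name
        else:
            columns.append((token, None))
    return columns

# Build fixed-width records so that all data lines have equal length.
ROW = "{:<4} {:<8} {:<1} {:<8} {:<8} {:>5} {:>4} {:<7}"
sample = [
    "FAMID C INDIVIDUAL GENDER C",
    "FATHER MOTHER STUDYAGE GLUCOSE D20S119",
    ROW.format("0001", "0001-100", "F", "0001-200", "0001-300", "23.45", "14.2", "141/142"),
    ROW.format("0001", "0001-200", "M", ".", ".", ".", "10.2", "138/141"),
    ROW.format("0001", "0001-300", "F", ".", ".", "78.21", "15.2", "140/142"),
]
header, data = split_header(sample)
print(parse_header(header))
```

A column whose name happens to be a single type letter would confuse this simple parser; a real implementation would need a stricter rule.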
There are a couple of special situations to pay attention to when constructing the flat file header.
First, only a gender field containing character string labels such as "M" and "F" or "male" and "female" should be designated as being of type "X". You can also designate such a gender field with the more generic "C" (as was done above), or not designate any type at all, and Madeline will figure it out for you.
Second, Madeline provides the opportunity to specify a special column type of "A" for allele fields. Allele fields are present in file formats such as the Genehunter format, where two contiguous space-delimited columns contain the allele labels that, taken together, represent the genotype for one marker. Since two columns are present, you must show the same column name twice in the flat file header: once for the first allele column, and once for the second. The column names should be the marker names. For example:
FAMID C
STUDYID C
FATHER C
MOTHER C
SEX X
NAFFECTE N
D20S100 A
D20S100 A
D20S200 A
D20S200 A
0001 0001-200                   M .  0 0  0 0
0001 0001-300                   F .  0 0  0 0
0001 0001-100 0001-200 0001-300 M 1  1 1  4 5
0001 0001-401 0001-200 0001-300 F 0  1 2  5 5
0001 0001-402 0001-200 0001-300 F 0  2 2  4 4
0001 0001-403 0001-200 0001-300 F 1  1 2  4 4
0001 0001-404 0001-200 0001-300 M 0  2 2  4 4
0001 0001-408 0001-200 0001-300 M 1  2 2  4 5
0001 0001-409 0001-200 0001-300 M 1  1 1  4 4
.    .        .        .        . .  . .  . .
.    .        .        .        . .  . .  . .
.    .        .        .        . .  . .  . .
Errors will result with unpaired "A" fields, so be careful! Madeline will combine the paired allele fields into genotype fields, as shown below. The column "Start" and "End" values confirm that Madeline has merged pairs of columns:
M>recognize "flat.test"
Recognizing file "flat.test" to "flat.test.mfh" ...
Skipping a total of 11 lines at top.
There are 10 non-empty header lines and 27 data lines.
Data records are 45 bytes long.
The gender field has been identified and will appear in the ".run" file
# .  Field Name  Start   End Length Prec. Space Type
---- ----------- ----- ----- ------ ----- ----- -----
  1. FAMID           1     4      4     0     1 C
  2. STUDYID         6    13      8     0     1 C
  3. FATHER         15    22      8     0     1 C
  4. MOTHER         24    31      8     0     1 C
  5. SEX            33    33      1     0     1 X
  6. NAFFECTE       35    35      1     0     2 N
  7. D20S100        38    40      3     0     2 G
  8. D20S200        43    45      3     0     0 G
Madeline recognition header written.
Type 'open "flat.test.mfh" ' to open the database.
The template batch file "flat.test.run" has been created.
NOTE: The ".run" file contains commands and parameters to assist
you in opening a flat file database, but generally requires
editing before use.
M>
After recognizing a file, the ".mfh" file can be used as the parameter to the open, load, transpose, or merge commands just like any other table.
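Returning to the paired "A" allele columns described earlier in this section, the pairing rule can be sketched in Python. The function `combine_alleles` is hypothetical and only models the header bookkeeping: two adjacent "A" columns with the same name become one "G" genotype column, and an unpaired "A" column is an error. How Madeline stores the combined genotype values is not modeled here.

```python
def combine_alleles(columns):
    """Collapse adjacent same-named 'A' columns into single 'G' columns.

    columns is a list of (name, type) pairs in header order.
    Raises ValueError when an 'A' column has no matching partner.
    """
    merged = []
    i = 0
    while i < len(columns):
        name, ctype = columns[i]
        if ctype == "A":
            if i + 1 >= len(columns) or columns[i + 1] != (name, "A"):
                raise ValueError(f"unpaired allele column: {name}")
            merged.append((name, "G"))
            i += 2  # consume both allele columns
        else:
            merged.append((name, ctype))
            i += 1
    return merged

header = [
    ("FAMID", "C"), ("STUDYID", "C"), ("FATHER", "C"), ("MOTHER", "C"),
    ("SEX", "X"), ("NAFFECTE", "N"),
    ("D20S100", "A"), ("D20S100", "A"), ("D20S200", "A"), ("D20S200", "A"),
]
print(combine_alleles(header))
```

With the ten-column header from the example, the result has eight columns, matching the eight fields reported by recognize.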
In addition to the ".mfh" file, Madeline creates a template batch command file with a ".run" extension. This command file contains parameter settings and commands to open the flat file database. For example, the ".run" file will specify the names used for the GenderField, IndividualIDField, FatherIDField, and MotherIDField if Madeline was successful at identifying these.
The ".run" file must be edited by the user. Madeline cannot identify certain information automatically. For example, blank fields and fields containing a single dot "." are always treated as missing values. However, Madeline cannot determine if other values are also used to represent missing data. In all data tables, arbitrary values are used to represent categorical states such as affected and unaffected: the program must be told about these values as well.
The ".run" file provides a template for opening a pedigree table. The recognize command can also recognize a map or marker table: in these latter instances, more modification of the ".run" file may be required.
M>recognize 'flat.dat'
Recognizing file "flat.dat" to "flat.mfh" ...
Skipping a total of 7 lines at top.
There are 6 non-empty header lines and 7046 data lines.
Data records are 319 bytes long.
The gender field has been identified and will appear in the ".run" file
# .  Field Name  Start   End Length Prec. Space Type
---- ----------- ----- ----- ------ ----- ----- -----
  1. FAMID           1     4      4     0     1 C
  2. STUDYID         6    13      8     0     1 C
  3. SEX            15    15      1     0     1 X
  4. FATHER         17    24      8     0     1 C
  5. MOTHER         26    33      8     0     1 C
  6. TWIN           35    35      1     0     1 C
  7. BMI            37    41      5     0     1 N
  8. NAFFECTE       43    56     14     8     1 N
  9. STUDYAGE       58    67     10     4     1 N
 10. D8S504         69    75      7     0     7 G
 11. D8S550         83    89      7     0     7 G
 12. D8S258         97   103      7     0     7 G
 13. D8S1771       111   117      7     0     7 G
 14. D8S1820       125   131      7     0     7 G
 15. D8S283        139   145      7     0     7 G
 16. D8S285        153   159      7     0     7 G
 17. D8S260        167   173      7     0     7 G
 18. D8S530        181   187      7     0     7 G
 19. D8S1757       195   201      7     0     7 G
 20. D8S270        209   215      7     0     7 G
 21. D8S1778       223   229      7     0     7 G
 22. D8S276        237   241      5     0     9 G
 23. GATA101F01    251   257      7     0     7 G
 24. D8S514        265   271      7     0     7 G
 25. D8S284        279   285      7     0     7 G
 26. D8S534        293   299      7     0     7 G
 27. D8S1836       307   313      7     0     6 G
Madeline recognition header written.
Type 'open "flat.mfh" ' to open the database.
The template batch file "flat.run" has been created.
NOTE: The ".run" file contains commands and parameters to assist
you in opening a flat file database, but generally requires
editing before use.
M>
RECTIFY
RECTIFY <INPUT_FILE> [TO <OUTPUT_FILE>]
In order for Madeline to use a flat file table, it must contain aligned columns that are delimited by space characters. Extra space characters are used to pad column widths so that the columns always line up. In addition, the table must be truly rectangular, which means that all data lines must be of equal length.
Embedded tab characters are usually replaced by space characters when a file is viewed in an editor or word processor on screen, leading to the false impression of a rectangular array with even line lengths, when in fact lines are actually of varying lengths. Extra (but invisible!) space or tab characters after the last column in a table can also result in varying line lengths.
The rectify command replaces all embedded tab characters with the appropriate number of space characters, and trims or pads lines so that all records are of equal length. Rectify contains an algorithm for determining what the tab interval in the original software used to create or edit the flat file must have been to achieve column alignment.
In some cases, rectify will report that a unique tab interval could not be determined. This may simply mean that the source file contained tab characters at the same horizontal offsets for every record in the file (a tab-delimited file is one example of such a file). Replacement of the tabs by any fixed number of spaces will always result in aligned columns in output. Madeline will always use one space, the least number required, for this type of file.
In some cases, Madeline may not be successful at rectifying the file without further editing by the user. Sparse data files containing many missing values are the most likely to be troublesome, especially if the missing values are represented by blank entries or single dots.
If an output file is not specified, Madeline will create an output file having the same name as the input file, but with a ".mod" extension at the end.
It is possible for rectify to be called from the recognize command if recognize determines that data records are of varying lengths. If this happens, a common problem is that the header lines which the user has inserted into the data file for recognize to use will also be padded out to the length of the data records. When this happens, Madeline will no longer be able to discriminate where the header lines end and the data records begin. The simple solution is to edit the file to manually remove extra trailing spaces from any header lines present in the file.
M>rectify "mydata.txt"
Rectifying "mydata.txt" to "mydata.mod"
Tab interval = 4
2567 lines were written.
M>
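The two core steps of rectify, tab expansion and padding to a common line length, can be sketched in Python. This is only an illustration: the function below is a hypothetical stand-in, and unlike Madeline it does not infer the tab interval but takes it as a parameter (4 by default, matching the transcript above).

```python
def rectify(lines, tab_interval=4):
    """Expand tabs to spaces and pad every line to the same length.

    Trailing whitespace is trimmed first, so invisible trailing spaces
    or tabs do not inflate the record length.
    """
    expanded = [ln.rstrip("\r\n").expandtabs(tab_interval).rstrip() for ln in lines]
    width = max((len(ln) for ln in expanded), default=0)
    return [ln.ljust(width) for ln in expanded]

raw = ["a\tb", "abc\td  ", "x"]
for line in rectify(raw):
    print(repr(line))
```

Note that padding every line to the record length is exactly what causes the header problem described above: header lines would no longer be shorter than the data records.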
Load and run a batch file. Batch files can themselves contain nested run commands. When commands from a batch file are being processed, Madeline displays the "M-Batch>" prompt in place of the "M>" prompt, and returns to the "M>" prompt after successful completion of batch commands. Madeline goes into quiet mode whenever a batch file is invoked with run: issue verbose after run if you want to return to verbose mode.
| Contents of load.bat: |
|
|
| Contents of task.bat: |
|
Script which runs the batch files:
M>run 'load.bat'
M-Batch> Running batch file "load.bat"... ***
M-Batch> run 'task.bat'
M-Batch> Running batch file "task.bat"... ***
M-Batch> quiet
Madeline is now in quiet mode
M-Batch> open '\test\Thursday.dbf'
... ...
M-Batch> write to '\test\mendel.ped' in mendel format
... ...
M-Batch> load 'k:\emap\emap.dbf'
Marker maps based on k:\emap\emap.dbf are now installed
M-Batch> write to '\test\siblink.ped' in siblink format
... ...
M-Batch>
M-Batch> Finished batch file "task.bat"... ***
M-Batch>
M-Batch> Finished batch file "load.bat"... ***
M>
Batch processing can also be invoked from the command line. In addition, a batch file named autorun.bat will be automatically invoked at program startup.
The set and turn commands are identical. See TURN for complete descriptions of both forms of the command.
SHOW
SHOW <nExpression>|<cExpression>|<LExpression>
Show the value of a single expression. Equivalent to what is. To display values in any kind of list, including field lists, arrays, and marker maps, use the list command instead. See LIST, WHAT IS.
M>show sin(pi/4)
0.707107
M>
Detail and summary log messages are not shown on the screen. Identical to silent. See SILENT.
Detail and summary log messages are not shown on the screen.
M>silent
M>
SORT
SORT ON <Expression> [ASCENDING]|DESCENDING
Sets the sort order for displaying siblings in a sibship and multiple spouses on pedigree drawings. <Expression> can be any expression that can be evaluated by Madeline. The default sort order is ascending.
M>//
M>// show siblings in descending order by date of birth:
M>//
M>sort on dob descending
M>draw pedigree '0535'
Drawing page 1 of 1 page for pedigree 0535...
M>//
M>// show siblings in ascending order by the number of offspring they have:
M>//
M>sort on _noffspring ascending
M>draw pedigree '0535'
Drawing page 1 of 1 page for pedigree 0535...
M>
| Pedigree drawn with siblings sorted on date of birth descending. | Same pedigree drawn with siblings sorted by number of offspring ascending. |
For more information on drawing pedigrees, see the draw and set commands.
Displays the current setting of Madeline’s boolean state flags and other status information. Identical to the hello command.
M>status
+-----------------------+-----------+-----------------------------------------+
| Variable or State Flag| Setting   | Description                             |
+-----------------------+-----------+-----------------------------------------+
| AutoExclude           | ON        | Exclude pedigrees automatically         |
| Color                 | ON        | Draw pedigrees in color                 |
| DividedDrawings       | ON        | Paginate drawings by founding group     |
| EvaluationInterval    | 0.50 cM   | Value to write to control file.         |
| Help                  | HTML      | Extended HTML help documentation        |
| Language              | ENGLISH   | Language convention used for date, time |
| OffEndDistance        | 10.00 cM  | Value to write to control file          |
| Orientation           | AUTOMATIC | Automatic based on drawing dimensions   |
| PaperMargin           | 1.00 cm   | Margin (in cm) on all four sides        |
| PaperSize             | USLETTER  | 8.5 x 11.0 inches                       |
| SaveAlleleFrequencies | OFF       | Calculate new frequencies on next OPEN  |
| Time                  | Current   | 16:37 Monday, October 4, 1999           |
| Verbosity             | VERBOSE   | All messages are printed to the console |
+-----------------------+-----------+-----------------------------------------+
M>
Transfers a quoted-string command to the operating system. This allows the user to obtain directory and file information, copy or move files, or run analysis software without having to exit Madeline. System is especially useful when you need to obtain file or directory information using the DOS dir command or the Unix ls command. Since Madeline is supported on multiple platforms, there is no built-in support for operating system-specific commands. Because system transfers control to the operating system, screen output from other programs or from operating system commands is not recorded in Madeline's log files.
M>system "ls -l /test/*.dbf"
-rw-rw-rw-a   8061 Tue Dec 02 14:34:24 1997 chr20dic.dbf
-rw-rw-rw-a 550246 Tue Jan 13 15:08:10 1998 c14.dbf
-rw-rw-rw-a 777954 Tue Dec 02 14:40:18 1997 chr20.dbf
-rw-rw-rw-a1001786 Mon Feb 16 14:53:10 1998 sib20.dbf
-rw-rw-rw-a 369746 Thu Feb 26 11:10:16 1998 draw.dbf
M>
Toggle database field category and status flags.
(1) TOGGLE [PHENOTYPE|GENOTYPE|COVARIATE|OUTPUT] FLAG[S] FOR <field_i>[,<field_j>[,<field_k>-<field_z>]]
Madeline automatically categorizes fields in a database table as being core "C", genotype "G", or phenotype "P" fields. Core "C" fields contain core information used to reconstruct pedigrees and classify individuals, such as the StudyIDField, GenderField, and AffectionStatusField. Genotype "G" fields contain marker information. The names of genotype fields should correspond with the marker names exactly. Fields that are not "C" or "G" fields are classified as phenotype "P" fields.
Core fields are determined by matching up field names in the database table with names stored in internal variables. Genotype fields are determined by sampling the data to find character fields that contain numeric labels separated by slash characters. By elimination, remaining fields are classified as phenotype fields. Certain output formats may require knowing which of the phenotype fields are to be used as covariates. Hence, there is also a covariate "V" category. By default, "C", "G" and all "P" fields except date fields are marked for output with the "o" flag. With the exception of core fields which Madeline handles automatically in most cases, only fields marked for output with the "o" flag will be examined and appear in output.
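The sampling rule for genotype fields, "numeric labels separated by slash characters", can be sketched with a regular expression. This Python fragment is an illustration only: the pattern, the function name `looks_like_genotype`, the sample size, and the treatment of "." and blank entries as missing are assumptions, not Madeline's actual test.

```python
import re

# Two runs of digits separated by a slash, with optional surrounding spaces.
GENOTYPE_PATTERN = re.compile(r"^\s*\d+\s*/\s*\d+\s*$")

def looks_like_genotype(values, missing=(".", ""), sample_size=100):
    """Return True if every sampled non-missing value looks like 'n/m'."""
    sampled = [v for v in values[:sample_size] if v.strip() not in missing]
    return bool(sampled) and all(GENOTYPE_PATTERN.match(v) for v in sampled)

print(looks_like_genotype(["141/142", ".", "138/141"]))
print(looks_like_genotype(["23.45", "14.2"]))
```

A field containing only missing values is not classified as a genotype field in this sketch, since there is nothing to sample.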
The most common use of the toggle command is to toggle the output flags on or off. Occasionally you might need to change the status of a phenotype "P" field to that of a covariate "V" field. Covariate "V" fields are still recognized as phenotype "P" fields when writing formats that do not require covariates.
M>open "/m55/newtest.dbf"
... ...
1.STUDYID Co__1 19.D20S889 Go__4 37.D20S481 Go_22
2.SEX Co__2 20.D20S482 Go__5 38.D20S836 Go_23
3.FATHER Co__3 21.D20S905 Go__6 39.D20S888 Go_24
4.MOTHER Co__4 22.D20S115 Go__7 40.D20S886 Go_25
5.TWIN Co__5 23.D20S851 Go__8 41.D20S197 Go_26
6.FUSION2 Po__1 24.D20S917 Go__9 42.D20S178N Go_27
7.CONTROL Po__2 25.D20S189 Go_10 43.D20S866 Go_28
8.CPEP Po__3 26.D20S898 Go_11 44.D20S196 Go_29
9.GLU_FAST Po__4 27.D20S114 Go_12 45.D20S857 Go_30
10.GLU_2H Po__5 28.D20S912 Go_13 46.D20S480 Go_31
11.STUDYAGE Po__6 29.D20S477 Go_14 47.D20S211 Go_32
12.LOGSI Po__7 30.D20S874 Go_15 48.D20S840 Go_33
13.BMI Po__8 31.D20S195 Go_16 49.D20S120 Go_34
14.TP Po__9 32.D20S909 Go_17 50.D20S100 Go_35
15.NAFFECTE C + 33.D20S107 Go_18 51.D20S102 Go_36
16.D20S103 Go__1 34.D20S170 Go_19 52.D20S171 Go_37
17.D20S117 Go__2 35.D20S96 Go_20 53.D20S173 Go_38
18.D20S906 Go__3 36.D20S119 Go_21
M>toggle output flags for 6-7,glu_fast,glu_2h,12-14
M>toggle covariate flag for studyage
M>list fields
1.STUDYID Co__1 19.D20S889 Go__4 37.D20S481 Go_22
2.SEX Co__2 20.D20S482 Go__5 38.D20S836 Go_23
3.FATHER Co__3 21.D20S905 Go__6 39.D20S888 Go_24
4.MOTHER Co__4 22.D20S115 Go__7 40.D20S886 Go_25
5.TWIN Co__5 23.D20S851 Go__8 41.D20S197 Go_26
6.FUSION2 P 24.D20S917 Go__9 42.D20S178N Go_27
7.CONTROL P 25.D20S189 Go_10 43.D20S866 Go_28
8.CPEP Po__1 26.D20S898 Go_11 44.D20S196 Go_29
9.GLU_FAST P 27.D20S114 Go_12 45.D20S857 Go_30
10.GLU_2H P 28.D20S912 Go_13 46.D20S480 Go_31
11.STUDYAGE Vo__2 29.D20S477 Go_14 47.D20S211 Go_32
12.LOGSI P 30.D20S874 Go_15 48.D20S840 Go_33
13.BMI P 31.D20S195 Go_16 49.D20S120 Go_34
14.TP P 32.D20S909 Go_17 50.D20S100 Go_35
15.NAFFECTE C + 33.D20S107 Go_18 51.D20S102 Go_36
16.D20S103 Go__1 34.D20S170 Go_19 52.D20S171 Go_37
17.D20S117 Go__2 35.D20S96 Go_20 53.D20S173 Go_38
18.D20S906 Go__3 36.D20S119 Go_21
M>
(2) TOGGLE ICON FLAG[S] FOR <field_i>[,<field_j>[,<field_k>-<field_z>]]
Toggle icon flag enables you to designate one or more categorical variables to display graphically on the male and female icons of a pedigree drawing. The AffectionStatusField is toggled with the icon flag on by default. You can designate any number of additional or alternate fields for graphical display. The number of fields you select determines the number of pie-slice regions into which the icons on the drawing will be divided. Each pie-slice region will be shaded to display the categorical level of the respective variable. Fields toggled with the icon flag on are displayed in the field list with a plus sign, "+" at the end. For example:
...
15.NAFFECTE  C +
16.HEARTCOND N +
...
When the icon flag of a field is toggled on, Madeline automatically determines how many non-missing categorical levels are present in the field:
15. NAFFECTE has 2 levels.
16. HEARTCOND has 3 levels.
Madeline also automatically constructs a label array for each flagged categorical variable, with entries for each level of the variable. The label arrays are used for assigning character string labels for each level of a variable when displayed on a pedigree drawing. The name of the label array is simply the name of the field variable with the word "label" appended to the end. Normally, the default entries are either the ordinals "1, 2, 3" or the letters of the alphabet "A, B, C" enumerated for each level of the categorical variable. These defaults can be changed easily:
M>list naffectelabel <-- list the default entries
naffectelabel[0]="A"
naffectelabel[1]="B"
M>NaffecteLabel[_unaffected]="U" <-- assign "U" as a label for unaffected individuals
M>NaffecteLabel[_affected ]="A" <-- assign "A" as a label for affected individuals
M>list naffectelabel <-- list the new entries
naffectelabel[0]="U"
naffectelabel[1]="A"
M>
|
Normally only single-character labels will fit within the male or female symbols. This is especially true when more than one categorical variable is selected so that the symbols are divided into pie-slice regions. Single-character labels can be legible when as many as five categorical variables have been selected. Assign the null string, "", to each element of an array if you do not want character labels displayed at all.
Watch out for two conditions! First, a variable with only a single non-missing categorical level may represent a problem in the database and can cause divide-by-zero errors in the PostScript drawing routines. Second, guard against selecting a variable with too many levels, which may also indicate a database problem. In any case, the shades of gray or color used to display different levels become indistinguishable as the number of levels increases:
15. NAFFECTE has 1 level. <-- Possible problem in the database. Why only one level?
16. HEARTCOND has 359 levels. <-- Whoa! Too many levels!
|
When drawing in black and white, Madeline assigns shades of gray for each level of an icon field variable, using white for the first level, and black for the last level. When drawing in color, Madeline selects alternating shades of red, green, and blue for each level of a variable. See Fig. 1.18 for an example pedigree drawing displaying two categorical variables graphically.
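The gray-scale rule can be illustrated with a short Python sketch. It assumes the shades are spaced evenly between white and black, which the text does not state; only the white-first, black-last endpoints come from the description above. Under that assumption the spacing is 1/(levels - 1), which would explain why a single-level variable leads to a division by zero. The function name `gray_shades` is hypothetical and is not part of Madeline.

```python
def gray_shades(n_levels):
    """Return one gray value per level, using the PostScript convention
    1.0 = white and 0.0 = black. Even spacing is an assumption."""
    if n_levels < 2:
        # With one level the step would be 1 / (n_levels - 1) = 1 / 0.
        raise ValueError("need at least two non-missing levels")
    step = 1.0 / (n_levels - 1)
    return [1.0 - i * step for i in range(n_levels)]


print(gray_shades(2))  # [1.0, 0.0]
print(gray_shades(3))  # [1.0, 0.5, 0.0]
```

With several hundred levels the step between neighboring shades falls well below what a printer can distinguish, which is the second condition warned about above.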
TRANSPOSE
TRANSPOSE <INPUT_FILE> [TO <OUTPUT_FILE>]
The transpose command converts a marker database containing the alleles of a given marker for a given individual in a given family to a table which contains a single record for each individual, marker names as column headings, and genotypes as field data. This command is designed for converting the output from genotyping machine software (such as ABI Genotyper) into a database form compatible with Madeline's pedigree database model:
INPUT:
---------------
FAMID  INDIVIDUAL  MARKERNAME  ALLELE1  ALLELE2  DISCARD
0001   0001-100    d20s100     112      114      G323
0001   0001-100    d20s898     120      122      G364
0001   0001-100    d20s129     98       100      G311
0002   0002-100    d20s100     116      116      G112
0002   0002-100    d20s898     115      118      G918
0002   0002-100    d20s129     94       96       G454
.      .           .           .        .        .
.      .           .           .        .        .
.      .           .           .        .        .
OUTPUT:
---------------
FAMID  INDIVIDUAL  D20S100  D20S898  D20S129
0001   0001-100    112/114  120/122  98/100
0002   0002-100    116/116  115/118  94/96
.      .           .        .        .
.      .           .        .        .
.      .           .        .        .
|
Before running transpose, be sure to specify the names of the three required key fields (FamilyIDField, IndividualIDField, and MarkerField) and the two allele fields (Allele1Field and Allele2Field). The input table may contain additional fields, but only FamilyIDField, IndividualIDField, and the marker fields will appear in the output database. For example, as shown above, the "DISCARD" field of the input table does not appear in output.
If an output file name is not provided, Madeline creates an output file with a ".trp" extension for the Mbase flat file output and a ".tfh" extension for the binary Mbase header if a .mfh file already exists.
Core family structure information fields (gender, parental IDs, twin status) or other phenotype or genotype data can be added to the transposed genotype table using the merge command.
M>FamilyIDField = "FAMID"
M>IndividualIDField = "INDIVIDUAL"
M>MarkerField = "MARKERNAME"
M>Allele1Field = "ALLELE1"
M>Allele2Field = "ALLELE2"
M>transpose "marker.mfh" to "genotypes.dat"
Transposing "marker.mfh" to "genotypes.dat"
Transposed file created.
M>
|
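For readers who want to see the reshaping step in isolation, the following Python sketch performs the same long-to-wide conversion on rows held in memory. It is an illustration only and does not read or write Mbase files. The function name `transpose`, the use of "." for a marker an individual lacks, and the upper-casing of marker names (suggested by the INPUT/OUTPUT example) are assumptions.

```python
def transpose(rows, fam="FAMID", ind="INDIVIDUAL", marker="MARKERNAME",
              allele1="ALLELE1", allele2="ALLELE2", missing="."):
    """Turn one-row-per-marker records into one row per individual."""
    markers = []   # marker columns, in order of first appearance
    people = {}    # (family, individual) -> {marker: "a1/a2"}
    for row in rows:
        name = row[marker].upper()
        if name not in markers:
            markers.append(name)
        genotypes = people.setdefault((row[fam], row[ind]), {})
        genotypes[name] = f"{row[allele1]}/{row[allele2]}"
    header = [fam, ind] + markers
    # Fields other than the key and allele fields (e.g. DISCARD) are dropped.
    table = [[f, i] + [g.get(m, missing) for m in markers]
             for (f, i), g in people.items()]
    return header, table


rows = [
    {"FAMID": "0001", "INDIVIDUAL": "0001-100", "MARKERNAME": "d20s100",
     "ALLELE1": "112", "ALLELE2": "114", "DISCARD": "G323"},
    {"FAMID": "0001", "INDIVIDUAL": "0001-100", "MARKERNAME": "d20s898",
     "ALLELE1": "120", "ALLELE2": "122", "DISCARD": "G364"},
]
header, table = transpose(rows)
print(header)    # ['FAMID', 'INDIVIDUAL', 'D20S100', 'D20S898']
print(table[0])  # ['0001', '0001-100', '112/114', '120/122']
```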
In Madeline, the set and turn commands are identical. Normally, of course, one will select the command verb that makes the most sense in English. Descriptions of all forms of the command follow.
(1) SET|TURN AUTOEXCLUDE|SAVEALLELEFREQUENCIES|DIVIDEDPAGES|COLOR ON|OFF
Turns boolean state flags on or off. The effect and default state of each boolean flag is described below. See Table 5.5 for a tabular summary of Madeline’s boolean state flags.
TURN AutoExclude [ON|OFF]
AutoExclude instructs Madeline, on a subsequent write command, to automatically exclude pedigrees with insufficient data. If AutoExclude is off, no pedigrees will be excluded. AutoExclude is on by default. There are few reasons to turn AutoExclude off.
M>turn autoexclude off
Autoexclude is now off
M>
|
TURN SaveAlleleFrequencies [ON|OFF]
SaveAlleleFrequencies instructs Madeline, on a subsequent open command, to retain the current set of allele frequencies, estimated from the current pedigree table, rather than calculating new frequencies from the new table. In order to use SaveAlleleFrequencies, the subsequent table must have the same number of fields prior to the set of genotype fields, and the genotype fields must match exactly in name and order (Table 3.2). SaveAlleleFrequencies is off by default.
Table 3.2. Field requirements for using SaveAlleleFrequencies. The subsequent database must have the same number of fields prior to the set of genotype fields, and the genotype fields must match exactly in name and order.
| Field set in first table used to calculate allele frequencies | Requirement | Field set in second table opened with SaveAlleleFrequencies on |
| 1. STUDYID Co__1 | Same number of fields preceding genotype fields (fields need not match) | 1. STUDYID Co__1 |
| 2. FATHER Co__2 | | 2. FATHER Co__2 |
| 3. MOTHER Co__3 | | 3. MOTHER Co__3 |
| 4. SEX Co__4 | | 4. SEX Co__4 |
| 5. TWIN Co__5 | | 5. TWIN Co__5 |
| 6. LOGSI Po__1 | | 6. DBP Po__1 |
| 7. BMI Po__2 | | 7. SBP Po__2 |
| 8. GLU_FAST Po__3 | | 8. GLU_2H Po__3 |
| 9. D20S103 Go__1 | Genotype fields match in order and name | 9. D20S103 Go__1 |
| 10. D20S906 Go__2 | | 10. D20S906 Go__2 |
| 11. D20S889 Go__3 | | 11. D20S889 Go__3 |
M>open '\test\fullset.ssd'
...
M>turn saveallelefrequencies on
SaveAlleleFrequencies is now on
M>open '\test\subset.ssd'
Existing allele frequency information has been saved...
Closing database "\test\fullset.ssd"...
Removing old pedigrees...
Database "\test\subset.ssd" opened with 1,856 records
...
M>
|
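The two field requirements can be expressed as a small check. The Python sketch below is not Madeline's code; it assumes each table is described as a list of (field name, category) pairs in physical order, with "G" marking genotype fields, and that all genotype fields follow the non-genotype fields as in Table 3.2. The function names are hypothetical.

```python
def split_fields(fields):
    """Split a field list at the first genotype ("G") field."""
    first_g = next((i for i, (_, cat) in enumerate(fields) if cat == "G"),
                   len(fields))
    leading = fields[:first_g]
    genotype_names = [name for name, _ in fields[first_g:]]
    return leading, genotype_names


def can_reuse_frequencies(first, second):
    """True when the second table meets both requirements."""
    lead1, geno1 = split_fields(first)
    lead2, geno2 = split_fields(second)
    # Leading fields need only agree in number; genotype fields must
    # agree in both name and order.
    return len(lead1) == len(lead2) and geno1 == geno2


first = [("STUDYID", "C"), ("LOGSI", "P"), ("D20S103", "G"), ("D20S906", "G")]
second = [("STUDYID", "C"), ("DBP", "P"), ("D20S103", "G"), ("D20S906", "G")]
print(can_reuse_frequencies(first, second))  # True
```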
TURN DividedPages [ON|OFF]
DividedPages controls how a pedigree with multiple founding groups is logically partitioned for drawing when draw is invoked. A founding group consists of an original founder and his or her one to many spouses. When DividedPages is on (the default), each subtree of a pedigree originating with a different founding group is drawn on a separate virtual page. The transfer of a drawing from a virtual page to one or more physical pages is still governed by the settings of orientation, the number of data fields displayed on the drawing, and the size of the pedigree. When DividedPages is off, a multiple founding group pedigree is drawn in its entirety on a single virtual page. DividedPages provides one way to logically partition a large complicated pedigree to make it easier to view.
NOTE: Due to the incomplete state of the drawing algorithms in versions 0.90 and 0.91, DividedPages is effectively always on. This toggle is provided to support the augmented feature set of version 1.0.
TURN COLOR [ON|OFF]
Pedigrees are printed in color when color is on (the default), and in black-and-white otherwise. This toggle affects a single boolean flag located near the top of the PostScript file that Madeline generates. Thus, any saved pedigree drawing can be printed in color or in black-and-white by simply changing the boolean INCOLOR flag in the PostScript file from true to false or vice versa:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Boolean toggle for color shading/printing:
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
/INCOLOR true def
|
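Editing the flag by hand is easy, but for a batch of saved drawings a small script may be more convenient. The Python sketch below rewrites the first INCOLOR definition in the text of a drawing file. It assumes the definition sits on its own line in the form shown above; the function name `set_incolor` is hypothetical.

```python
import re


def set_incolor(postscript_text, in_color):
    """Return the PostScript text with the INCOLOR flag set as requested."""
    value = "true" if in_color else "false"
    new_text, count = re.subn(
        r"^/INCOLOR\s+(true|false)\s+def\b",  # the flag definition line
        f"/INCOLOR {value} def",
        postscript_text,
        count=1,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("no /INCOLOR definition found")
    return new_text


sample = "%!PS\n/INCOLOR true def\n"
print(set_incolor(sample, False))
```

Reading the file, calling the function, and writing the result back is left to the caller so that the original drawing is never modified by accident.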
TURN HaplotypeDisplay ON|OFF
When HaplotypeDisplay is on, genotypes are shown with alleles delimited by "|" on pedigree drawings. When off, genotypes are shown delimited by "/". Off is the default setting.
NOTE: Madeline is not capable of inferring haplotypes and has no way of knowing whether alleles in a pedigree file are arranged to show haplotypes or not. If alleles in a pedigree file are arranged to show known or inferred haplotypes, then Madeline provides a convenient way to draw pedigrees of such data using the set HaplotypeDisplay on and draw pedigree commands. If haplotypes are not known, Madeline cannot help you.
(2) SET|TURN FIELD ORDER [TO <Field_i>[,<Field_j>[,<Field_k>-<Field_p>]] |AUTOMATICALLY]
Reorder fields by specifying field names or field indices separated by commas, or a range of contiguous field names or indices separated by a dash. Fields are ordered within their category (i.e., "C","P", or "G"). Covariate "V" fields are simply a subset of phenotype "P" fields and, thus, are numbered along with phenotype fields.
If within a category you specify only m of n fields (m<n, where n is either the number of "C", "P", or "G" fields), then the fields you specify will be numbered in the sequence you specify from 1 to m, and the remaining output fields will be numbered from m+1 to n in the physical order they occur in the database.
When specifying field order, you can mix and match any sequence of "C", "P", and "G" fields within a single set command. Specified fields not already toggled for output are ignored. Issuing a load command after a set field order command resets the order of all "Co" and "Po" fields, while "Go" field ordering is set to the map order. To avoid this behaviour, issue load prior to any set field order command.
Madeline controls the order of core fields when writing most output formats. Reordering of core fields is recognized by the view record and draw pedigree commands, and by the CommaDelimited and SpaceDelimited write formats.
Issuing set field order without the to clause resets the order of "Co" and "Po" fields to their natural order, while the order of "Go" fields will depend on whether a map database is loaded or not.
M>open '\test\nt2.dbf'
...
...
1.STUDYID Co__1 19.D20S889 Go__5 36.D20S836 Go_22
2.SEX Co__2 20.D20S103 Go__6 37.D20S888 Go_23
3.FATHER Co__3 21.D20S115 Go__7 38.D20S886 Go_24
4.MOTHER Co__4 22.D20S851 Go__8 39.D20S197 Go_25
5.TWIN Co__5 23.D20S912 Go__9 40.D20S178N Go_26
6.CPEP Po__1 24.D20S917 Go_10 41.D20S866 Go_27
7.GLU_FAST Po__2 25.D20S898 Go_11 42.D20S196 Go_28
8.GLU_2H Po__3 26.D20S114 Go_12 43.D20S857 Go_29
9.STUDYAGE Po__4 27.D20S477 Go_13 44.D20S480 Go_30
10.LOGSI Po__5 28.D20S874 Go_14 45.D20S211 Go_31
11.BMI Po__6 29.D20S195 Go_15 46.D20S120 Go_32
12.TP Po__7 30.D20S909 Go_16 47.D20S102 Go_33
13.NAFFECTE C 31.D20S107 Go_17 48.D20S173 Go_34
14.ISTYPED Po__8 32.D20S170 Go_18 49.D20S171 Go_35
15.D20S117 Go__1 33.D20S96 Go_19 50.D20S840 Go_36
16.D20S906 Go__2 34.D20S119 Go_20 51.D20S189 Go_37
17.D20S482 Go__3 35.D20S481 Go_21 52.D20S100 Go_38
18.D20S905 Go__4
M>load 'k:\emap\emap.dbf'
Marker maps based on k:\emap\emap.dbf are now installed.
Note: genotype fields ordered according to current map
Field ordering now set based on k:\emap\emap.dbf.
M>list fields
1.STUDYID Co__1 19.D20S889 Go__4 36.D20S836 Go_23
2.SEX Co__2 20.D20S103 Go__1 37.D20S888 Go_24
3.FATHER Co__3 21.D20S115 Go__7 38.D20S886 Go_25
4.MOTHER Co__4 22.D20S851 Go__8 39.D20S197 Go_26
5.TWIN Co__5 23.D20S912 Go_13 40.D20S178N Go_27
6.CPEP Po__1 24.D20S917 Go__9 41.D20S866 Go_28
7.GLU_FAST Po__2 25.D20S898 Go_11 42.D20S196 Go_29
8.GLU_2H Po__3 26.D20S114 Go_12 43.D20S857 Go_30
9.STUDYAGE Po__4 27.D20S477 Go_14 44.D20S480 Go_31
10.LOGSI Po__5 28.D20S874 Go_15 45.D20S211 Go_32
11.BMI Po__6 29.D20S195 Go_16 46.D20S120 Go_34
12.TP Po__7 30.D20S909 Go_17 47.D20S102 Go_36
13.NAFFECTE C 31.D20S107 Go_18 48.D20S173 Go_38
14.ISTYPED Po__8 32.D20S170 Go_19 49.D20S171 Go_37
15.D20S117 Go__2 33.D20S96 Go_20 50.D20S840 Go_33
16.D20S906 Go__3 34.D20S119 Go_21 51.D20S189 Go_10
17.D20S482 Go__5 35.D20S481 Go_22 52.D20S100 Go_35
18.D20S905 Go__6
M>set field order to father,mother,logsi,studyage,sex,twin,tp,bmi
M>list fields
1.STUDYID Co__5 19.D20S889 Go__4 36.D20S836 Go_23
2.SEX Co__3 20.D20S103 Go__1 37.D20S888 Go_24
3.FATHER Co__1 21.D20S115 Go__7 38.D20S886 Go_25
4.MOTHER Co__2 22.D20S851 Go__8 39.D20S197 Go_26
5.TWIN Co__4 23.D20S912 Go_13 40.D20S178N Go_27
6.CPEP Po__5 24.D20S917 Go__9 41.D20S866 Go_28
7.GLU_FAST Po__6 25.D20S898 Go_11 42.D20S196 Go_29
8.GLU_2H Po__7 26.D20S114 Go_12 43.D20S857 Go_30
9.STUDYAGE Po__2 27.D20S477 Go_14 44.D20S480 Go_31
10.LOGSI Po__1 28.D20S874 Go_15 45.D20S211 Go_32
11.BMI Po__4 29.D20S195 Go_16 46.D20S120 Go_34
12.TP Po__3 30.D20S909 Go_17 47.D20S102 Go_36
13.NAFFECTE C 31.D20S107 Go_18 48.D20S173 Go_38
14.ISTYPED Po__8 32.D20S170 Go_19 49.D20S171 Go_37
15.D20S117 Go__2 33.D20S96 Go_20 50.D20S840 Go_33
16.D20S906 Go__3 34.D20S119 Go_21 51.D20S189 Go_10
17.D20S482 Go__5 35.D20S481 Go_22 52.D20S100 Go_35
18.D20S905 Go__6
M>
|
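The numbering rule for a partial field order (the m specified fields first, then the remaining fields in physical order, separately within each category) can be reproduced with a few lines of Python. This is a sketch of the rule as described, not Madeline's implementation: it handles field names only (no indices or ranges), and the function name `number_fields` is hypothetical.

```python
def number_fields(physical, requested):
    """physical: (name, category) pairs in database order, category in "C", "P", "G".
    requested: field names in the desired order.
    Returns {name: (category, ordinal)}."""
    category = {name.upper(): cat for name, cat in physical}
    result = {}
    for cat in ("C", "P", "G"):
        # Requested fields of this category, in the order given; names
        # that are not in the output field list are ignored.
        chosen = [n.upper() for n in requested if category.get(n.upper()) == cat]
        # Everything else keeps its physical order after the chosen fields.
        rest = [n.upper() for n, c in physical
                if c == cat and n.upper() not in chosen]
        for ordinal, name in enumerate(chosen + rest, start=1):
            result[name] = (cat, ordinal)
    return result


physical = [("STUDYID", "C"), ("SEX", "C"), ("FATHER", "C"), ("MOTHER", "C"),
            ("TWIN", "C"), ("CPEP", "P"), ("GLU_FAST", "P"), ("GLU_2H", "P"),
            ("STUDYAGE", "P"), ("LOGSI", "P"), ("BMI", "P"), ("TP", "P"),
            ("ISTYPED", "P")]
order = number_fields(physical, "father,mother,logsi,studyage,sex,twin,tp,bmi".split(","))
print(order["STUDYID"])  # ('C', 5)
print(order["CPEP"])     # ('P', 5)
```

The two printed values agree with the Co__5 and Po__5 entries in the final field listing of the example session.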
(3) SET|TURN LANGUAGE TO ENGLISH|FINNISH|SUOMI|FRENCH
Sets the language and linguistic conventions used for displaying and entering dates and times in Madeline. FINNISH and SUOMI are identical.
M>set language to Suomi
+-----------------------+-----------+-----------------------------------------+
| Variable or State Flag| Setting | Description |
+-----------------------+-----------+-----------------------------------------+
| AutoExclude | ON | Exclude pedigrees automatically |
| Color | ON | Draw pedigrees in color |
| DividedDrawings | ON | Paginate drawings by founding group |
| EvaluationInterval | 0.50 cM | Value to write to control file. |
| Help | HTML | Extended HTML help documentation |
| Language | FINNISH | Language convention used for date, time |
| OffEndDistance | 10.00 cM | Value to write to control file |
| Orientation | AUTOMATIC | Automatic based on drawing dimensions |
| PaperMargin | 1.00 cm | Margin (in cm) on all four sides |
| PaperSize | USLETTER | 8.5 x 11.0 inches |
| SaveAlleleFrequencies | OFF | Calculate new frequencies on next OPEN |
| Time | Current | perjantai 8.10.1999, 10:00 |
| Verbosity | VERBOSE | All messages are printed to the console |
+-----------------------+-----------+-----------------------------------------+
M>show {8.10.1999}
{perjantai 8.10.1999}
M>set language to french
+-----------------------+-----------+-----------------------------------------+
| Variable or State Flag| Setting | Description |
+-----------------------+-----------+-----------------------------------------+
| AutoExclude | ON | Exclude pedigrees automatically |
| Color | ON | Draw pedigrees in color |
| DividedDrawings | ON | Paginate drawings by founding group |
| EvaluationInterval | 0.50 cM | Value to write to control file. |
| Help | HTML | Extended HTML help documentation |
| Language | FRENCH | Language convention used for date, time |
| OffEndDistance | 10.00 cM | Value to write to control file |
| Orientation | AUTOMATIC | Automatic based on drawing dimensions |
| PaperMargin | 1.00 cm | Margin (in cm) on all four sides |
| PaperSize | USLETTER | 8.5 x 11.0 inches |
| SaveAlleleFrequencies | OFF | Calculate new frequencies on next OPEN |
| Time | Current | 10:01 le vendredi 8 octobre 1999 |
| Verbosity | VERBOSE | All messages are printed to the console |
+-----------------------+-----------+-----------------------------------------+
M>show {8.10.1999}
{le vendredi 8 octobre 1999}
M>
|
When entering dates in curly brackets, Madeline applies ordering (i.e., month before day vs. day before month) and capitalization rules, and looks up month names that apply to the current language setting only. For example, you cannot enter English month names when language is set to French:
M>set language to french
...
M>// invalid date error because month is in English:
M>show {11 December 1612}
{}
M>// error because of inappropriate capitalization:
M>show {11 Decembre 1612}
{}
M>// the following is correct:
M>show {11 decembre 1612}
{le mardi 11 decembre 1612}
M>// this is OK too:
M>show {11.12.1612}
{le mardi 11 decembre 1612}
M>
|
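The language-specific lookup can be pictured with the following Python sketch, which accepts only the two French forms shown in the session above: day, month name, year, or numeric day.month.year. It is an illustration rather than Madeline's parser. The month table uses unaccented spellings because that is how the example session writes them, and the function name `parse_french_date` is hypothetical.

```python
from datetime import date

# Lower-case, unaccented spellings, following the example session.
FRENCH_MONTHS = ["janvier", "fevrier", "mars", "avril", "mai", "juin",
                 "juillet", "aout", "septembre", "octobre", "novembre",
                 "decembre"]


def parse_french_date(text):
    """Return a date, or None when the text is not a valid French date."""
    parts = text.split()
    if len(parts) == 3:                      # e.g. "11 decembre 1612"
        day, month_name, year = parts
        if month_name not in FRENCH_MONTHS:  # case-sensitive, French only
            return None
        month = FRENCH_MONTHS.index(month_name) + 1
    else:                                    # e.g. "11.12.1612"
        fields = text.split(".")
        if len(fields) != 3:
            return None
        day, month, year = fields
    try:
        return date(int(year), int(month), int(day))
    except ValueError:
        return None


print(parse_french_date("11 decembre 1612"))  # 1612-12-11
print(parse_french_date("11 December 1612"))  # None
print(parse_french_date("11 Decembre 1612"))  # None
```

Looking names up in a single per-language table is what makes both the English month and the capitalized month fail in the same way.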
(4) SET|TURN ORIENTATION TO LANDSCAPE|PORTRAIT|AUTOMATIC|MULTIPAGE
In landscape and portrait modes, Madeline resizes a pedigree drawing to fit on a single physical page in the desired orientation. For large pedigrees, the reduction necessary to fit a drawing on a single page may result in labels that are too small to read. In general, the default automatic or MultiPage mode is a better choice. The keywords automatic and MultiPage are identical.
When orientation is set to automatic, Madeline chooses the best orientation for a drawing based on its height and width. If Madeline determines that the reduction necessary to fit the drawing on a single page may make the labels difficult to read or illegible, the program inserts additional code into the drawing file to print the drawing centered across multiple physical pages. The program selects the number and orientation of physical pages that require the least amount of rescaling of a drawing. A schematic index is produced as a guide for assembling the drawing after printing. See DRAW for more information.
M>set orientation to landscape
...
M>
|
(5) SET|TURN PAPERSIZE TO USLETTER|USLEGAL|A4|A4LONG|A4SUPER
Sets the paper size to the specified standard printer paper size. Madeline does not send special commands to multi-tray printers, so be sure that the correct paper size is in the selected printer tray.
M>set papersize to usletter
...
M>
|
(6) SET|TURN PAPERMARGIN TO <nValue>
Sets the paper margins on all four sides to the specified value in centimeters. The default margin size is one centimeter. If a multiple-page drawing is produced, not only will the outer-edge margins be of the specified width, but also the drawings will overlap by exactly the margin width along the joining edges of the drawing. Do not set the margins to much less than one centimeter because most printers cannot print out to the physical edge of the paper.
M>set papermargin to 1.5
+-----------------------+-----------+-----------------------------------------+
| Variable or State Flag| Setting | Description |
+-----------------------+-----------+-----------------------------------------+
| AutoExclude | ON | Exclude pedigrees automatically |
| Color | ON | Draw pedigrees in color |
| DividedDrawings | ON | Paginate drawings by founding group |
| EvaluationInterval | 0.50 cM | Value to write to control file. |
| Help | HTML | Extended HTML help documentation |
| Language | FRENCH | Language convention used for date, time |
| OffEndDistance | 10.00 cM | Value to write to control file |
| Orientation | AUTOMATIC | Automatic based on drawing dimensions |
| PaperMargin | 1.50 cm | Margin (in cm) on all four sides |
| PaperSize | USLETTER | 8.5 x 11.0 inches |
| SaveAlleleFrequencies | OFF | Calculate new frequencies on next OPEN |
| Time | Current | 10:39 le vendredi 8 octobre 1999 |
| Verbosity | VERBOSE | All messages are printed to the console |
+-----------------------+-----------+-----------------------------------------+
M>
|
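To see how the margin setting interacts with multiple-page output, the Python sketch below estimates how many sheets are needed across one dimension of a drawing. It rests on stated assumptions rather than on Madeline's actual layout code: the printable extent of a sheet is the paper size minus two margins, neighboring sheets overlap by exactly one margin width, and no rescaling is applied. The function name `pages_across` is hypothetical.

```python
import math


def pages_across(drawing_cm, paper_cm, margin_cm):
    """Estimate the number of sheets needed to span drawing_cm."""
    printable = paper_cm - 2 * margin_cm
    if printable <= margin_cm:
        raise ValueError("margin too large for this paper size")
    if drawing_cm <= printable:
        return 1
    # Each additional sheet repeats one margin width of its neighbor,
    # so it contributes printable - margin of new drawing.
    step = printable - margin_cm
    return 1 + math.ceil((drawing_cm - printable) / step)


# A4 paper is 21.0 cm wide; with 1 cm margins the printable width is 19 cm.
print(pages_across(19.0, 21.0, 1.0))  # 1
print(pages_across(37.0, 21.0, 1.0))  # 2
```

Under these assumptions a larger margin reduces the usable area of every sheet twice over, once at the paper edge and once in the overlap, so wide margins can noticeably increase the page count of a large pedigree.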
UNEXCLUDE
UNEXCLUDE [FAMILIES] FOR <Lexpression>
Includes previously excluded individuals and pedigrees in output. If unexclude families is used, all individuals who match the criteria, together with their spouse(s) and descendants, who were excluded by a previous exclude families or other exclude command will be included again. See EXCLUDE.
M>verbose
Madeline is now in verbose mode.
M>exclude for _famid=='0172'
0172-100 has been marked for exclusion
0172-401 has been marked for exclusion
0172-402 has been marked for exclusion
0172-500 has been marked for exclusion
0172-601 has been marked for exclusion
0172-602 has been marked for exclusion
0172-603 has been marked for exclusion
0172-604 has been marked for exclusion
0172-605 has been marked for exclusion
M>unexclude for _famid=='0172'
0172-100 has been marked for inclusion
0172-401 has been marked for inclusion
0172-402 has been marked for inclusion
0172-500 has been marked for inclusion
0172-601 has been marked for inclusion
0172-602 has been marked for inclusion
0172-603 has been marked for inclusion
0172-604 has been marked for inclusion
0172-605 has been marked for inclusion
M>
|
VERBOSE
Prints all summary and detail messages to the screen. See QUIET, SILENT.
M>verbose
Madeline is now in verbose mode.
M>
|
VIEW
The view command has two forms, described below:
(1) VIEW [RECORD][FOR <Lexpression>]
When view is used without the record keyword, only the IndividualID and database record number of the individual, if in the database, are shown. If the record keyword is included, then those fields in the database currently toggled on for output are also shown. If view record is typed without a for query expression, only the current record is shown. View for <Lexpression> queries the IDs or records of a subset of the data. Upon completion, view prints a tally of the records matching the criteria. For examples of using Madeline’s internal references in view queries, see Table 5.4.
M>open 'chr8.dbf'
...
1.FAMID Co__1 10.D8S504 Go__1 19.D8S1757 Go_10
2.STUDYID Co__2 11.D8S550 Go__2 20.D8S270 Go_11
3.SEX Co__3 12.D8S258 Go__3 21.D8S1778 Go_12
4.FATHER Co__4 13.D8S1771 Go__4 22.D8S276 Go_13
5.MOTHER Co__5 14.D8S1820 Go__5 23.GATA101F01 Go_14
6.TWIN Co__6 15.D8S283 Go__6 24.D8S514 Go_15
7.BMI Po__1 16.D8S285 Go__7 25.D8S284 Go_16
8.NAFFECTE Co__7+ 17.D8S260 Go__8 26.D8S534 Go_17
9.STUDYAGE Po__2 18.D8S530 Go__9 27.D8S1836 Go_18
M>go 197
M>view record
CORE FIELDS: 3015 3015-602 M 3015-100 3015-500 . 0
PHENOTYPE FIELDS: 23.15242297 41.1663
GENOTYPE FIELDS: 129/139 200/206 153/153 297/301 112/112 119/121 319/319 203/205 222/238 292/296 110/110 202/209 77/79 227/227 210/215 281/294 178/209 139/146
M>
M>view for famid="3348"
3348+100 in 1348 (rec. no. 4839)
3348+401 in 1348 (rec. no. 4840)
3348+402 in 1348 (rec. no. 4841)
3348-200 in 1348 (not in database)
3348-300 in 1348 (not in database)
5 individuals in 1 pedigree matched as follows:
Individuals .............. 5
+ In database ........... 3
| + Attached ........... 3
| + Childless spouses .. 0
| + Unattached ......... 0
+ Not in database ....... 2
M>
M>toggle output flags for 3-9,13-27
M>view record for famid="0482"
0482+402 in 0482 (rec. no. 2863)
CORE FIELDS: 0482 0482+402
GENOTYPE FIELDS: ............. ............. .............
0482-100 in 0482 (rec. no. 2864)
CORE FIELDS: 0482 0482-100
GENOTYPE FIELDS: 133/139 185/206 150/153
0482-200 in 0482 (not in database)
-- not in database --
0482-300 in 0482 (not in database)
-- not in database --
0482-401 in 0482 (rec. no. 2865)
CORE FIELDS: 0482 0482-401
GENOTYPE FIELDS: ............. 185/193 .............
5 individuals in 1 pedigree matched as follows:
Individuals .............. 5
+ In database ........... 3
| + Attached ........... 3
| + Childless spouses .. 0
| + Unattached ......... 0
+ Not in database ....... 2
10 WARNINGS
M>
M>
|
(2) VIEW DISTINCT VALUES OF <cField_A>[,<cField_B>[,<cField_C>-<cField_Z>]]
View a histogram of the distinct values in a field or set of fields. A list of field names or field indices may be specified separated by commas. A range may be specified by separating the first and last field in a range with a dash. Madeline reports the number of non-missing levels of a variable. The number of missing cases is printed at the end of the list.
M>view distinct values of naffecte, 8
7. NAFFECTE has 2 levels:
Level Value Cases
------ ----- -----
1. 0 1336
2. 1 1514
.. ..... 2995 missing values in database
8. D20S103 has 27 levels:
Level Value Cases
------ ----- -----
1. 103/103 1
2. 89/103 28
3. 89/89 64
4. 89/91 11
5. 89/93 202
6. 89/95 399
7. 89/97 279
8. 89/99 20
9. 91/103 1
10. 91/93 9
11. 91/95 25
12. 91/97 18
13. 93/103 59
14. 93/93 235
15. 93/95 871
16. 93/97 549
17. 93/99 38
18. 95/102 1
19. 95/103 108
20. 95/95 784
21. 95/97 1046
22. 95/99 39
23. 97/103 65
24. 97/97 341
25. 97/99 30
26. 99/103 2
27. 99/99 1
.. ............. 619 missing values in database
M>
|
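A tally of this kind is straightforward to reproduce outside Madeline, for example to cross-check an exported column. The Python sketch below counts non-missing levels and missing cases separately. It assumes that levels are listed in plain string order, which is what the D20S103 listing above suggests ("103/103" sorts before "89/103"), and that "." and the empty string mark missing values. The function name `distinct_values` is hypothetical.

```python
from collections import Counter


def distinct_values(values, missing=(".", "")):
    """Return ([(value, cases), ...] in string order, number_missing)."""
    counts = Counter(v for v in values if v not in missing)
    n_missing = sum(1 for v in values if v in missing)
    return sorted(counts.items()), n_missing


levels, n_missing = distinct_values(["89/91", "103/103", ".", "89/91", "95/102"])
for rank, (value, cases) in enumerate(levels, start=1):
    print(f"{rank:>3}. {value:<10} {cases:>5}")
print(f"{n_missing} missing values")
```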
WHAT IS
WHAT IS <nExpression>|<cExpression>|<LExpression>
Shows the value of an expression. Equivalent to the show command. See SHOW.
M>what is studyid
"0052-100"
M>what is d20s889
"201/216"
M>
|
WRITE
WRITE [[PEDIGREE FILE]|LOCUS FILE] TO <cFileName> IN <FormatKeyword> FORMAT
There are two forms of the write command:
Write pedigree file writes the current set of core "C" fields and flagged output fields (e.g., "Go", "Po" and "Vo" fields) to a pedigree file, <cFileName>, in the format specified by <FormatKeyword>. Write pedigree file can be shortened to just write.
Write locus file creates a locus file containing allele frequency information for the "Go" genotype fields flagged for output in the current database.
After a write command, the value of OutputFile will be <cFileName>.
For certain formats, such as the Sage and Siblink formats, Madeline will
automatically create a parameter or control file at the same time the pedigree file is
created. A parameter or control file contains a template for running an analysis, along
with other core information required by the specific package, such as number of families
or sib pairs in the corresponding pedigree file. Madeline will provide all information
possible -- such as number of families or sib pairs -- but Madeline cannot guess what sort
of analysis is to be conducted, what genetic model to specify, and
so on. The user will need to edit the parameter file to meet specific needs.
For these formats, the value of OutputParameterFile will become the name of the parameter or control file that Madeline created.
Formats such as the Siblink format incorporate locus file information directly into the control file. In these cases, you do not need to create a separate locus file. Other packages, such as Crimap, do not require a locus file at all.
Some formats, such as the Siblink and Genehunter formats, incorporate map distance information into either a control file or locus file, and therefore require that a map database be loaded prior to the write command. Madeline will issue an error if you try to write such a file without first loading a map.
For specific formats and usage, see Section 4. Write Formats.
(1)
M>write to '\test\sibpal3.ped' in sibpal3 format
Creating associated SIBPAL parameter file called "\test\sibpal3.par"
Writing pedigree data to "\test\sibpal3.ped"
...
M>write locus file to '\test\sibpal3.loc' in sage format
M>
|
(2)
M>load 'k:\emap\emap.dbf'
Marker maps based on k:\emap\emap.dbf are now installed.
M>write to '\test\siblink.ped' in siblink format
Creating associated SIBLINK control/parameter file called "\test\siblink.ctl"
Writing pedigree data to "\test\siblink.ped"
...
M>
|
This section describes all formats currently supported by the write pedigree file and write locus file commands.
Format keywords are listed alphabetically within each group. Some keywords can be used for creating both a pedigree file and a locus file, while others cannot. To make these distinctions clear, the following codes in parentheses appear following the keyword headings:
| Code | Description |
| PED | indicates a keyword used with write pedigree file only. |
| LOC | indicates a keyword used with write locus file only. |
| PED, LOC | indicates a keyword used to write both pedigree and locus files. |
| PAR/CTL | indicates that a complementary parameter or control file is produced when the write pedigree file command is executed. |
For example, the sibpal3 keyword can only be used to create a pedigree file, while the sage keyword can only be used to create the corresponding locus file, so you will see SIBPAL3 (PED) and SAGE (LOC) as headings.
Depending upon analysis package, the parameter file may be called a control file or may have some other name. In Madeline, any file containing analysis control or parameter information is referred to as a parameter file. For many formats, the parameter file also contains locus (and sometimes map) information which eliminates the need for writing a locus file in a separate step.
Any one program may contain numerous settable parameters in the parameter file. For those formats that require it, Madeline provides a template parameter file that may be edited to set parameters to pass to an analysis program. Madeline provides default parameters to the extent possible, but these defaults are not necessarily the best choices for any given analysis and, in some cases, they may only be place-holder values like "0.00".
The commadelimited keyword is used to output a pedigree file as a comma-delimited ASCII flat file. Since this is a generic format, there is no fixed set of required core fields. It is necessary to toggle output flags on or off and set field order, as required, for core fields as well as for general phenotype and genotype fields. For readability, fields in the output are padded with white space so that columns align, just as in the SpaceDelimited format. Missing numeric values are printed using the value specified in the first cell of the numeric missing value array, NumericMissingValue[0]. Missing character values are printed using the value specified in the first cell of the character missing value array, CharacterMissingValue[0].
M>nmv[0]=-9
M>list nmv
NMV has 1 elements:
NMV[ 0]= -9
M>list cmv
CMV has 5 elements:
CMV[ 0]="."
CMV[ 1]="/"
CMV[ 2]="0/0"
CMV[ 3]="0/ 0"
CMV[ 4]="0/ 0"
M>write pedigree to 'commadlm.dat' in commadelimited format
. . .
M>
M>nmv[0]=-9
M>list nmv
NMV has 1 elements:
NMV[ 0]= -9
M>list cmv
CMV has 5 elements:
CMV[ 0]="."
CMV[ 1]="/"
CMV[ 2]="0/0"
CMV[ 3]="0/ 0"
CMV[ 4]="0/ 0"
M>write pedigree to 'spacedlm.dat' in spacedelimited format
. . .
M>
The generic keyword is used to output a locus file in a generic flat-file format that provides allele frequencies as well as the raw allele counts and allele ranks (Fig. 4.1). The output file is useful for checking alleles, and for matching up allele ranks (used in formats such as Siblink and Genehunter) against the original allele labels.
D20S103 has 7 alleles:
1.  90   454/ 4296 = 0.1057
2.  92    27/ 4296 = 0.0063
3.  94   909/ 4296 = 0.2116
4.  96  1663/ 4296 = 0.3871
5.  98  1094/ 4296 = 0.2547
6. 100    44/ 4296 = 0.0102
7. 104   105/ 4296 = 0.0244
D20S117 has 14 alleles:
1. 166     4/ 4198 = 0.0010
2. 168   153/ 4198 = 0.0364
3. 176   658/ 4198 = 0.1567
4. 178    22/ 4198 = 0.0052
5. 183     9/ 4198 = 0.0021
6. 185   132/ 4198 = 0.0314
. . .
Fig. 4.1. Excerpt from a locus file in generic format produced by Madeline.
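The three quantities reported for each allele (rank, count, and frequency) follow directly from the genotype column. The Python sketch below derives them for one marker. It is an illustration under stated assumptions, not Madeline's estimation routine: allele labels are taken to be integers ranked in ascending order, genotypes are "a/b" strings, and every typed individual is counted equally. The function name `allele_table` is hypothetical.

```python
from collections import Counter


def allele_table(genotypes, missing=(".", "/", "0/0")):
    """Return [(rank, allele_label, count, frequency), ...] for one marker."""
    counts = Counter()
    for genotype in genotypes:
        if genotype in missing:
            continue
        counts.update(genotype.split("/"))  # two alleles per genotype
    total = sum(counts.values())
    ordered = sorted(counts, key=int)       # rank by allele size
    return [(rank, allele, counts[allele], counts[allele] / total)
            for rank, allele in enumerate(ordered, start=1)]


for rank, allele, count, freq in allele_table(["90/94", "94/94", ".", "90/104"]):
    print(f"{rank}. {allele:>4} {count:>5}/ 6 = {freq:.4f}")
```

The rank column is what formats such as Siblink and Genehunter use in place of the original allele label, so a table like this serves as the cross-reference between the two.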
The programs in Aspex use a single pedigree file format. However, each program requires a different set of control parameters in the .tcl control file. Madeline therefore provides a format keyword for each program in the package and produces well-commented .tcl template files containing the default values for all relevant parameters. Two of the programs, sib_ibd and sib_phase (Madeline's sibibd and sibphase keywords), require marker information, so a marker map must be loaded prior to issuing the write command for these formats.
Used to specify the pedigree file format along with the .tcl parameter file used by the Aspex kinship program. Madeline creates a well-commented .tcl parameter file at the same time that the pedigree file is created.
Used to specify the pedigree file format along with the .tcl parameter file used by the Aspex sib_ibd program. Madeline creates a well-commented .tcl parameter file at the same time that the pedigree file is created. A map must be loaded prior to issuing the write command for this format.
Used to specify the pedigree file format along with the .tcl parameter file used by the Aspex sib_map program. Madeline creates a well-commented .tcl parameter file at the same time that the pedigree file is created.
Used to specify the pedigree file format along with the .tcl parameter file used by the Aspex sib_phase program. Madeline creates a well-commented .tcl parameter file at the same time that the pedigree file is created. A map must be loaded prior to issuing the write command for this format.
Used to specify the pedigree file format along with the .tcl parameter file used by the Aspex sib_tdt program. Madeline creates a well-commented .tcl parameter file at the same time that the pedigree file is created.
Used to specify Crimap .gen file format. Non-numeric characters in the study IDs are converted to their ASCII decimal equivalents. For example, "-" is converted to "45". Although this process lengthens the IDs, it maintains the uniqueness of each ID and provides the completely numeric IDs required by Crimap. Note that the integer value of a converted ID must not exceed the maximum integer that can be represented within Crimap on your platform. Crimap uses a signed long int for IDs, the maximum value of which is 2,147,483,647 on many systems. This could be a problem for unmodified FUSION control and trio IDs, but not for other FUSION 1 or 2 IDs.
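The ID conversion and the size limit can be checked ahead of time with a short Python sketch. It follows the rule as described (each non-digit character is replaced by its ASCII decimal code) and compares the result with 2,147,483,647. The function names are hypothetical, and the control-family ID "C021-100" is a made-up example, not taken from any dataset.

```python
CRIMAP_MAX_ID = 2_147_483_647  # largest signed 32-bit integer


def crimap_id(study_id):
    """Replace every non-digit character with its ASCII decimal code."""
    return "".join(ch if ch in "0123456789" else str(ord(ch))
                   for ch in study_id)


def fits_crimap(study_id):
    """True when the converted ID does not exceed CRIMAP_MAX_ID."""
    return int(crimap_id(study_id)) <= CRIMAP_MAX_ID


print(crimap_id("0052-100"), fits_crimap("0052-100"))  # 005245100 True
print(crimap_id("1021+402"), fits_crimap("1021+402"))  # 102143402 True
print(crimap_id("C021-100"), fits_crimap("C021-100"))  # 6702145100 False
```

A leading letter such as "C" (67) or "T" (84) adds two digits at the front of the ID, which is enough to push the value past the limit, whereas IDs that begin with a digit stay well below it.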
Madeline's Crimap routine currently only handles pedigrees with a single pair of founders (the founders may be dummied-in, as is done for FUSION sibship pedigrees). Criteria for including a pedigree are:
- There must be at least one nuclear family with both parents typed (either the founding pair or a proband-spouse or sibling-spouse pair) with at least one typed offspring.
- If only one parent is typed, then one grandparent must also be typed in addition to at least one typed offspring.
These criteria were defined by Beth Hauser and Mike Boehnke to prevent biased map lengths that occur when data are available on only a single generation of individuals.
For the Genehunter formats, genehunter, genehunternpl and genehunterqtl, any pedigrees consisting of a trio of two parents and a single offspring are excluded.
When the genehunternpl keyword is used to specify a file for non-parametric analysis, the following types of pedigrees are also excluded:
Used to specify a Genehunter pedigree file for parametric linkage analysis. Also used to create a Genehunter locus file. Madeline automatically converts the allele labels in the pedigree database to ordinals and prints these ordinal labels in both the locus and pedigree file. For cross-reference purposes, you may find it useful to also produce a generic locus file -- see GENERIC (LOC). A Genehunter locus file also contains inter-marker distance information. Be sure to load a map database prior to generating the locus file.
When used to create a pedigree file, the genehunter keyword instructs Madeline to exclude pedigrees that do not contribute to a parametric analysis. For a non-parametric analysis, use the genehunternpl keyword.
Used to specify a Genehunter pedigree file for non-parametric linkage analysis. Pedigrees that cannot be used or do not contribute to a non-parametric analysis will be excluded. For a parametric linkage analysis, use the genehunter keyword. To create the corresponding locus file, use the genehunter keyword. See above for Madeline's exclusion rules for this format.
Used to specify a Genehunter pedigree file for quantitative trait linkage analysis. Pedigrees are excluded using the same rule as for the genehunter keyword used for a parametric analysis file. Using genehunterqtl differs from using the genehunter keyword in that the complementary control file is customized for a quantitative trait linkage analysis. To create the corresponding locus file, use the genehunter keyword.
Linkage Disequilibrium (LDEQ) Formats
For linkage disequilibrium analyses, Madeline selects a single parent-offspring trio providing the most genetic information possible from each pedigree. The output file format is a flat file similar to that produced by the generic SpaceDelimited format. In addition to toggling the genotype fields required for output, the user must also designate which core fields are required, and the order in which the core fields are required, prior to executing the write command. Note that the AffectionStatusField is required in output. The three options for linkage disequilibrium analyses are presented below.
For the LDEQMARKER format, Madeline selects a trio providing the most information for a linkage disequilibrium analysis without regard to the affection status of the three individuals in the trio.
For the LDEQAFFECTEDSPOUSE format, Madeline selects a trio providing the most information for a linkage disequilibrium analysis with the additional condition that at least one of the parents must be affected. The status of the other parent and offspring can be affected, unaffected, or unknown (missing).
For the LDEQTDT format, Madeline selects a trio providing the most information for a linkage disequilibrium analysis with the additional condition that the offspring must be affected. The status of the two parents can be affected, unaffected, or unknown (missing).
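The affection-status conditions attached to the three LDEQ formats can be summarized as a small predicate. The Python sketch below is only an illustration: the function name and the use of True/False/None for affected/unaffected/unknown are assumptions, and it deliberately does not model how Madeline ranks trios by genetic information.

```python
from typing import Optional

# Affection status: True = affected, False = unaffected, None = unknown (missing).
Status = Optional[bool]


def trio_is_eligible(fmt: str, father: Status, mother: Status, child: Status) -> bool:
    """Check only the affection-status condition stated for each LDEQ format."""
    fmt = fmt.upper()  # Madeline's interpreter is not case sensitive
    if fmt == "LDEQMARKER":
        return True  # affection status is ignored
    if fmt == "LDEQAFFECTEDSPOUSE":
        return father is True or mother is True  # at least one affected parent
    if fmt == "LDEQTDT":
        return child is True  # the offspring must be affected
    raise ValueError(f"unknown LDEQ format: {fmt}")


if __name__ == "__main__":
    print(trio_is_eligible("ldeqtdt", None, False, True))
    print(trio_is_eligible("LDEQAFFECTEDSPOUSE", False, None, True))
```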
The Mendel and Fisher programs cannot use individuals whose gender is listed as missing. In Madeline, only terminal individuals without offspring may have gender listed as missing because Madeline will, when necessary, infer the gender of non-terminal individuals via the FatherIDField and MotherIDField of their offspring. Therefore, Madeline excludes individuals whose gender is missing when writing files in the various Mendel and Fisher formats even when such individuals have genotype data.
Used to specify Fisher file format with no ascertainment correction. Zeros are written in the header for each pedigree to indicate no proband ascertainment. Use the mendel keyword to write the corresponding locus file.
Used to specify Fisher file format with ascertainment correction. Ones are written in the header for each pedigree that has a proband to indicate proband ascertainment. Under fisher1, at least one non-proband individual in the pedigree must have sufficient data for the pedigree to be included in output. Use the mendel keyword to write the corresponding locus file.
Used to specify generic Mendel pedigree and locus file formats.
Used to specify Mendel UserM13 file format. Use the mendel keyword to write corresponding locus file. When userm13 is specified, all non-excluded genotyped individuals, including childless spouses and unattached individuals, are included in output.
The pedcheck keyword produces an output file for use with the Pedcheck program by Jeff O'Connell of the University of Pittsburgh. The format is essentially the Linkage program format. Records for all individuals with genotype data are written to output.
Used to specify Relpair file formats. Relpair's locus file format is very similar to the UserFQTL format, while the pedigree file format is identical to generic Mendel format.
The locus file contains map information, and therefore a map database must be loaded prior to the write locus file command.
To run a module in Sage such as Sibpal, you will need to have an FSP family data input file in addition to a Sibpal pedigree file. Be careful to use the same set of exclusions when creating both files. The Sage modules also require parameter files to run. Madeline provides template parameter files that require editing. The parameter files are generated at the same time as the pedigree files.
Note that since the FSP and Sibpal .ped or .par files could easily end up having the same names, be sure to differentiate the file names somewhere other than just in the file extension (Madeline will automatically provide .par as the extension for any of the Sage package parameter files).
Used to specify the Sage FSP data file format. Madeline creates a corresponding .par file at the same time that the pedigree file is created. When FSP0 is used, Madeline only outputs the core fields that FSP requires for construction of the family structure pointer ".lnk" file which is used as one input to SIBPAL. No genotype fields are output (hence the "0" in the format name). In order to place genotype fields in an FSP segregation analysis data file used as input to ASSOC and LODLINK, use the FSP format (below) instead of FSP0. If your only objective is to obtain a family structure pointer file to run SIBPAL, then you do not need to include any phenotype or genotype fields as input to FSP, and FSP0 is the preferred choice.
Used to specify the Sage FSP data file format. Madeline creates a corresponding .par file at the same time that the pedigree file is created. If you plan to run SIBPAL, it is more convenient to use the FSP0 format above. However, if you plan to run ASSOC or LODLINK, you should use the FSP format here in order to place genotype fields in the FSP segregation analysis data file.
Used to specify the Sage locus file format.
Used to specify Sage Sibpal quantitative trait linkage format. Be sure to toggle the covariate and output flags of any covariates -- see TOGGLE 3.26.
Madeline creates a corresponding .par file at the same time that the pedigree file is created.
Used to specify Sage Sibpal binary trait linkage format. Be sure to toggle the covariate and output flags of any covariates -- see TOGGLE 3.26.
Madeline creates a corresponding .par file at the same time that the pedigree file is created.
Used to specify Sage Sibpal binary trait linkage with variable age of onset format. Be sure to toggle the covariate and output flags of any covariates -- see TOGGLE 3.26. The age of onset variable must be the first of the specified covariates.
Madeline creates a corresponding .par file at the same time that the pedigree file is created.
Used to specify Sage Sibpal marker ordering (i.e., mapping) format. There is no demand for this format, and so it has not been thoroughly tested.
Madeline creates a corresponding .par file at the same time that the pedigree file is created.
In addition to the usual set of core fields, the AffectionStatusField must be present so that Madeline can choose sib pairs based on affection status. In addition, a map database must be loaded. Madeline creates a Siblink control file with a .ctl extension at the same time that the pedigree file is created. The control file contains locus information, including map distance information.
Madeline automatically converts the allele labels in the source database to ordinals and prints these ordinal labels in both the locus and pedigree file. For cross-reference purposes, you may find it useful to also produce a generic locus file -- see GENERIC.
SIBLINKAFFECTEDPAIRS (PED, PAR)
Used to specify a file in Siblink format containing only affected sib pairs.
SIBLINKUNAFFECTEDPAIRS (PED, PAR)
Used to specify a file in Siblink format containing only unaffected sib pairs.
Used to specify a file in Siblink format containing all affected and unaffected sib pairs. Siblings whose affection status is missing are excluded.
SIBLINKDISCORDANTPAIRS (PED, PAR)
Used to specify a file in Siblink format containing discordant affected-unaffected sib pairs. Siblings whose affection status is missing are excluded.
UserFQTL requires nuclear family blocks for input. Madeline enumerates each nuclear family block by affixing a dot "." followed by a sequential ordinal identifier after the original pedigree identifier. For example, if the pedigree ID is 0123, successive nuclear family blocks up to n will be identified as 0123.1, 0123.2, 0123.3 ... 0123.n in the family record headers of the resulting data file. A nuclear family must have at least one person with phenotype data for the pedigree to be included.
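The labelling scheme for nuclear family blocks can be sketched in a few lines of Python. The function name is hypothetical and the sketch covers only the naming convention described above, not the decomposition of a pedigree into nuclear families.

```python
from typing import List


def nuclear_family_labels(pedigree_id: str, n_blocks: int) -> List[str]:
    """Append "." and a 1-based ordinal to the pedigree ID for each block."""
    return [f"{pedigree_id}.{i}" for i in range(1, n_blocks + 1)]


if __name__ == "__main__":
    print(nuclear_family_labels("0123", 3))
```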
Used to specify UserFQTL locus file format.
Used to specify UserFQTL all nuclear families format. All nuclear families constructed by decomposing a full pedigree will be output.
Used to specify UserFQTL founding nuclear families format. Only nuclear families in the founding generation will be output.
Used to specify UserFQTL offspring nuclear families format. Only nuclear families in the offspring generation will be output.
Section 5
Internal Constants, Variables, Arrays, References, and Boolean Flags
Madeline maintains symbolic names for a number of numeric constants such as pi, the base of natural logarithms e, true and false. Madeline also has internal variables and arrays whose default values can be modified by the user. In addition, Madeline provides references to internal information related to individuals, such as the number of offspring that an individual has. References are also provided to directly access the parent, offspring and mate vectors of an individual without having to move to another record in the database table. Finally, Madeline maintains certain state information in globally-accessible boolean flags.
This section contains a table each for numeric constants, internal variables, arrays, references, and boolean flags. Table 5.1 shows numeric constants. Table 5.2 shows internal variables used to store file names, pedigree database field names, and map database field names. Table 5.3 shows internal arrays used to store lists of values that are recognized by the program as having specific meanings. For example, to inform Madeline that -7 represents a missing value in the data, -7 must be present in the NumericMissingValue[ ] array.
When not using Madeline's defaults, you must inform Madeline of the proper field names, such as AffectionStatusField, before you open a database. Similarly, you must tell Madeline what the field codings are before you issue a write command. Note that alternate short names are provided for arrays. For example, CharacterMissingValue[2] can be referenced as cmv[2].
Table 5.4 shows references that allow you to access information related to individuals such as number of offspring, number of affected offspring, or information related to their parents, children, or mates. These read-only references always begin with the underscore character to distinguish them from field or other variable names with which they might otherwise be confused. References are useful for querying information from pedigree databases.
Finally, Table 5.5 lists Madeline's boolean flags and their default states. Boolean flags can be set using the turn or set command.
Table 5.1. Internal Numeric Constants in Madeline.
| Constant Name | Value |
| e | 2.71828 ... |
| pi | 3.1415926 ... |
| missing | The uniform numeric missing value indicator. |
| _female | 1 |
| _male | 0 |
| _true | 1 |
| _false | 0 |
| _affected | 1 |
| _unaffected | 0 |
| _dead | 1 |
| _alive | 0 |
Table 5.2. Internal Variables in Madeline.
| Stores... | Name | Description | Default Value |
| Data Field Name | AffectionStatusField | Stores the name of the affection status field, an optional core field. This field can be either a numeric or character field. See: CharacterAffectionStatus[], NumericAffectionStatus[] | "NAFFECTE" |
| Map Field Name | ChromosomeField | Stores the name of the chromosome field in the map database. This field must be a numeric field. | "CHROMOSOME" |
| Data Field Name | DateOfBirthField | Stores the name of the date of birth field. | "DOB" |
| Data Field Name | DateOfDeathField | Stores the name of the date of death field. | "DOD" |
| Data Field Name | DeathStatusField | Stores the name of the death status field, an optional core field. This field can be a numeric or character field. See: CharacterDeathStatus[], NumericDeathStatus[] | "DECEASED" |
| File Name | DetailFile | Stores the name of the detail log file. | "madeline.dtl" |
| File Name | DrawingFile | Stores the name of the Postscript drawing output file. | "madeline.ps" |
| Data Field Name | DZTwinField | Stores the name of the dizygotic twin indicator field. Field must be a character field and only the first character is examined. | "DZTWIN" |
| Param. Value | EvaluationInterval | Stores the desired analysis evaluation interval in centiMorgans. Madeline automatically inserts this value into parameter and control files where appropriate. | 0.50 centiMorgans |
| Data Field Name | FamilyIDField | Stores the name of the family (pedigree) ID field. Must be a character field. This core field is not required when FUSION-compliant IDs are used. | "FAMID" |
| Data Field Name | FatherIDField | Stores the name of the father ID field. Must be a character field. Required core field. | "FATHER" |
| File Name | FileEditor | Stores the name of the file editor called when the edit command is issued. | "e" |
| Data Field Name | GenderField | Stores the name of the gender field. This field can be either a character or a numeric field. See: CharacterSexValue[], NumericSexValue[]. Required core field. | "SEX" |
| Data Field Name | IndexCaseField | Stores the name of the proband or index case indicator field. Must be a numeric field coded with 1 for proband, 0 otherwise. This optional core field is not required when FUSION-compliant IDs are used. | "PROBAND" |
| Data Field Name | IndividualIDField | Stores the name of the individual ID field. Must be a character field. Required core field. | "STUDYID" |
| Font Size Value | LabelFontSize | Stores the size, in points, of the typeface used to print labels on pedigree drawings. | 7 pt. |
| Font Size Value | LegendFontSize | Stores the size, in points, of the typeface used to print the legend on pedigree drawings. | 9 pt. |
| Data Field Name | LiabilityClassField | Stores the name of the liability class indicator field, an optional core field. This field can be either a numeric or character field. | "LCLASS" |
| File Name | LogFile | Stores the name of the log file. | "madeline.log" |
| File Name | MapDatabase | Stores the name of the map database. | "emap.dbf" |
| Map Field Name | MarkerField | Stores the name of the marker name field in the map database. This must be a character field. | "MARKERNAME" |
| Data Field Name | MotherIDField | Stores the name of the mother ID field. Must be a character field. Required core field. | "MOTHER" |
| Data Field Name | MZTwinField | Stores the name of the monozygotic twin indicator field. Must be a character field and only the first character in the field is examined. Required core field. | "TWIN" |
| Param. Value | OffEndDistance | Stores the desired analysis off-end evaluation distance in centiMorgans. Madeline automatically inserts this value into parameter and control files where appropriate. | 10.00 centiMorgans |
| Map Field Name | OrdinalField | Stores the name of the marker ordinal field in the map database. This field must be a numeric field. | "ORDINAL" |
| File Name | OutputFile | Holds the name of the most recent pedigree output file. This variable is reassigned each time a write command is executed. | "output.ped" |
| File Name | ParameterOutputFile | Holds the name of the most recent parameter output file. This variable is reassigned each time a write command uses a format, such as certain Sage formats, that requires concurrent writing of a parameter file. | "output.par" |
| Map Field Name | PositionField | Stores the name of the marker position field in the map database. This field must be a numeric field. | "POSITION" |
| File Name | PostscriptViewer | Stores the name of the Postscript viewing application used for viewing pedigree drawings. | "gs" |
Table 5.3. Internal Arrays in Madeline.
| Stores... | Name (Short Name) | Description | Default Values |
| Array | CharacterAffectionStatus[ ] (CAS[ ]) | Stores a list of string values representing affection status used in the AffectionStatusField when that field is a character field. See: NumericAffectionStatus[] | cas[Unaffected]="0"; cas[Affected]="1"; cas[2]="2" (unstudied, reported as unaffected); cas[3]="3" (unstudied, reported as affected); cas[4]="%" (unstudied, conflicting reports) |
| Array | CharacterDeathStatus[ ] (CDS[ ]) | Stores string values representing dead or alive, respectively, used in the DeathStatusField when that field is a character field. See: NumericDeathStatus[] | cds[Dead]="Y"; cds[Alive]="N" |
| Array | CharacterMissingValue[ ] (CMV[ ]) | Stores a list of string values representing missing values used in character fields in the database. | cmv[0]=""; cmv[1]="."; cmv[2]="0/0"; cmv[3]="0/ 0" |
| Array | CharacterSexValue[ ] (CSV[ ]) | Stores string values used to represent male and female, respectively, in the GenderField when that field is a character field. See: NumericSexValue[] | csv[_male]="M"; csv[_female]="F" |
| Array | NumericAffectionStatus[ ] (NAS[ ]) | Stores a list of numeric values representing affection status used in the AffectionStatusField when that field is a numeric field. See: CharacterAffectionStatus[] | nas[Unaffected]=0; nas[Affected]=1; nas[2]=2 (unstudied, reported as unaffected); nas[3]=3 (unstudied, reported as affected); nas[4]=4 (unstudied, conflicting reports) |
| Array | NumericDeathStatus[ ] (NDS[ ]) | Stores numeric values representing dead or alive, respectively, used in the DeathStatusField when that field is a numeric field. See: CharacterDeathStatus[] | nds[Alive]=0; nds[Dead]=1 |
| Array | NumericMissingValue[ ] (NMV[ ]) | Stores values that represent missing values in numeric fields in the database. | nmv[0]=MISSING; nmv[1]=-9999 |
| Array | NumericSexValue[ ] (NSV[ ]) | Stores values for male and female, respectively, when the GenderField is a numeric field. See: CharacterSexValue[] | nsv[_male]=0; nsv[_female]=1 |
Table 5.4. Internal References to Individual Information.
| Reference Type | Name | Description | Example |
| Pointer to an Individual | _EighthChild | Refers to an individual's eighth child. Equivalent to _o[7]. | See example for _FirstChild. |
| Numeric Variable | _excluded | True (1) if an individual has been marked for exclusion by the user. | M>view for _excluded |
| Character Variable | _famid | Individual's family ID. | M>exclude for _famid="0300" |
| Pointer to an Individual | _father | Refers to an individual's father. | M>view for _father.bmi>=25 |
| Pointer to an Individual | _FifthChild | Refers to an individual's fifth child. Equivalent to _o[4]. | See example for _FirstChild. |
| Pointer to an Individual | _FirstChild | Refers to an individual's first child. Equivalent to _o[0]. | M>view for _noffspring=2 and _firstchild.istyped and _secondchild.istyped |
| Pointer to an Individual | _FourthChild | Refers to an individual's fourth child. Equivalent to _o[3]. | See example for _FirstChild. |
| Numeric Variable | _HasData | True (1) if an individual has been marked as having data by the last write command. | M>view for _hasdata |
| Character Variable | _id | Individual's ID. | M>view record for _id="0125-100" |
| Vector of Pointers to Individuals | _mate | Refers to the vector of mates of an individual. | M>exclude for bmi>=30 and _nmates=1 and _mate[0].bmi>=30 |
| Pointer to an Individual | _mother | Refers to an individual's mother. | M>view for _mother.studyage-studyage<=17 |
| Numeric Variable | _n | Total number of individuals in this individual's pedigree. | M>view for _n>=40 |
| Numeric Variable | _nff | Number of founding fathers in this individual's pedigree. | M>view for _nff=3 |
| Numeric Variable | _nfm | Number of founding mothers in this individual's pedigree. | M>view for _nfm=_nff+1 |
| Pointer to an Individual | _NinthChild | Refers to an individual's ninth child. Equivalent to _o[8]. | See example for _FirstChild. |
| Numeric Variable | _nmates | Number of mates of an individual. | M>view for _nmates>=2 |
| Numeric Variable | _noffspring | Number of offspring of an individual. | M>view for _noffspring>=6 |
| Vector of Pointers to Individuals | _o | Refers to the vector of offspring of a female individual. | M>view for _noffspring=2 and _o[0].istyped=1 and _o[1].istyped=1 |
| Pointer to an Individual | _SecondChild | Refers to an individual's second child. Equivalent to _o[1]. | See example for _FirstChild. |
| Pointer to an Individual | _SeventhChild | Refers to an individual's seventh child. Equivalent to _o[6]. | See example for _FirstChild. |
| Pointer to an Individual | _SixthChild | Refers to an individual's sixth child. Equivalent to _o[5]. | See example for _FirstChild. |
| Pointer to an Individual | _spouse | Refers to an individual's first spouse, if present. Equivalent to _mate[0]. | M>view for affected and _spouse.affected |
| Pointer to an Individual | _TenthChild | Refers to an individual's tenth child. Equivalent to _o[9]. | See example for _FirstChild. |
| Pointer to an Individual | _ThirdChild | Refers to an individual's third child. Equivalent to _o[2]. | See example for _FirstChild. |
Table 5.5. Boolean State Flags in Madeline.
| Boolean Flag | Default Setting | Explanation |
| AutoExclude | ON | ON: When executing write pedigree file, Madeline automatically excludes pedigrees, nuclear families, or affected sib pairs having insufficient data. OFF: Program doesn't evaluate whether pedigrees have sufficient data when executing write pedigree file, resulting in the inclusion of all pedigrees, nuclear families, or affected sib pairs. There are few reasons to turn AutoExclude off in the current version of Madeline. |
| Color | ON | ON: Print pedigrees in color. OFF: Print pedigrees in black-and-white. |
| DividedPages | ON | ON: Print subtrees originating from distinct ancestral founding groups on separate drawing pages. OFF: If a pedigree consists of multiple subtrees originating from distinct ancestral groups, print all subtrees on a single drawing. |
| HaplotypeDisplay | OFF | ON: Genotypes on pedigree drawings are delimited by "|". OFF: Genotypes on pedigree drawings are delimited by "/". |
| Quiet | OFF | ON: Detail-level program messages are not sent to the terminal, but still appear in the detail log file. OFF: Detail-level program messages are sent to the terminal. |
| SaveAlleleFrequencies | OFF | ON: Allele frequencies already calculated from a previous open command are retained and used when a new database having the same structure is opened. OFF: New allele frequencies are calculated each time a database is opened. |
| Silent | OFF | ON: Summary and detail messages do not appear on the terminal, but are still sent to the summary and detail log files. OFF: Summary messages appear on the terminal. The setting of Quiet determines whether detail messages also appear on the terminal. |
Section 6
Mathematical and Aggregate Level Processing Functions
The following mathematical functions can be used in expressions (Table 6.1):
Table 6.1. Mathematical Functions Available in Madeline.
| Function Name | Description |
| ABS( ) | Take the absolute value of a real number |
| ACOS( ) | Take the arc cosine of a real number |
| ASIN( ) | Take the arc sine of a real number |
| ATAN( ) | Take the arc tangent of a real number |
| CEILING( ) | Take the ceiling of a real number (round up to the nearest whole number) |
| COS( ) | Take the cosine of a real number |
| COSH( ) | Take the hyperbolic cosine of a real number |
| EXP( ) | Calculate base e raised to the supplied power n |
| FLOOR( ) | Take the floor of a real number (round down to the nearest whole number) |
| INV( ) | Calculate the inverse (reciprocal) of a non-zero real number |
| LOG( ) | Take the natural log of a real number |
| LOG10( ) | Take the logarithm to base 10 of a real number |
| ROUND( ) | Round a real number up or down to the nearest whole number |
| SIN( ) | Take the sine of a real number |
| SINH( ) | Take the hyperbolic sine of a real number |
| SQRT( ) | Take the square root of a real number |
| TAN( ) | Take the tangent of a real number |
| TANH( ) | Take the hyperbolic tangent of a real number |
The following aggregate level processing functions are available in Madeline (Table 6.2):
Table 6.2. Aggregate Functions Available in Madeline.
| Function Name | Description | Example |
| _oCount(<nExpr>) | Returns the count of the number of times the numeric expression, nExpr, evaluates to non-missing among the offspring of an individual. | // Find the subset of mothers for whom the affection status (naffecte) of all of their children is known: M>view for _noffspring>=1 and _oCount(naffecte)=_nOffspring |
| _oCountFalse(<nExpr>) | Returns the count of the number of times the numeric expression, nExpr, evaluates to FALSE (zero) among the offspring of an individual. | // Find the subset of mothers with at least two unaffected offspring: M>view for _oCountFalse(naffecte)>=2 |
| _oCountMissing(<nExpr>) | Returns the count of the number of times the numeric expression, nExpr, evaluates to MISSING among the offspring of an individual. | // Find the subset of mothers for whom one or more offspring lack a glucose measurement: M>view for _oCountMissing(glu_fast)>=1 |
| _oCountTrue(<nExpr>) | Returns the count of the number of times the numeric expression, nExpr, evaluates to TRUE (non-zero, non-missing) among the offspring of an individual. | // Find the subset of mothers with exactly two affected and two unaffected offspring: M>view for _nOffspring=4 and _oCountTrue(naffecte)=_oCountFalse(naffecte) |
| _oMean(<nExpr>) | Returns the mean offspring value of nExpr | M>go 1673 M>show studyid "0470-701" M>show sex "F" M>show bmi 22.975 M>show _noffspring 6 M>show _oMean(bmi) 26.3063 M>show _oMean(studyage) 36.1821 M> |
| _oStdDev(<nExpr>) | Returns the standard deviation of nExpr among the offspring of an individual | // Find mothers for whom the coefficient of variation in glucose values among their children is greater than or equal to ½: M>view for _oCount(glu_fast)>=3 and _oStdDev(glu_fast)/_oMean(glu_fast)>=0.5 |
| _oSum(<nExpr>) | Returns the sum of nExpr among the offspring of an individual | // Find grandmothers with 20 or more grandchildren: M>view for _oSum(_noffspring)>=20 |
| _oVariance(<nExpr>) | Returns the variance of nExpr among the offspring of an individual | M>go 35 M>show studyid "0009-500" M>show _noffspring 4 M>show _oVariance(bmi) 140.682 M>show _oMean(bmi) 34.5261 M> |
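The counting rules behind these aggregate functions can be illustrated outside Madeline with a short Python sketch. Everything here is an assumption made for illustration: None stands in for MISSING, missing values are skipped when summing and averaging, and the variance and standard deviation use the sample (n-1) form. The manual does not state which conventions Madeline itself uses.

```python
import statistics
from typing import List, Optional

Value = Optional[float]  # None stands in for Madeline's MISSING (assumption)


def _present(values: List[Value]) -> List[float]:
    """Keep only the non-missing offspring values."""
    return [v for v in values if v is not None]


def o_count(values: List[Value]) -> int:
    return len(_present(values))


def o_count_missing(values: List[Value]) -> int:
    return len(values) - len(_present(values))


def o_count_true(values: List[Value]) -> int:
    return sum(1 for v in _present(values) if v != 0)  # non-zero, non-missing


def o_count_false(values: List[Value]) -> int:
    return sum(1 for v in _present(values) if v == 0)  # exactly zero


def o_sum(values: List[Value]) -> float:
    return sum(_present(values))


def o_mean(values: List[Value]) -> Optional[float]:
    present = _present(values)
    return statistics.mean(present) if present else None


def o_variance(values: List[Value]) -> Optional[float]:
    present = _present(values)
    # Sample variance (n-1 denominator) is an assumption of this sketch.
    return statistics.variance(present) if len(present) >= 2 else None


def o_std_dev(values: List[Value]) -> Optional[float]:
    variance = o_variance(values)
    return variance ** 0.5 if variance is not None else None


if __name__ == "__main__":
    naffecte = [1, 0, None, 1]  # affection status of four offspring
    print(o_count(naffecte), o_count_true(naffecte),
          o_count_false(naffecte), o_count_missing(naffecte))
```

Under these assumptions the four counts for a given expression always satisfy count = countTrue + countFalse, and count + countMissing equals the number of offspring.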
Section 7
String and Character Manipulation Functions
The following string and character manipulation functions are available in Madeline (Table 7.1):
Table 7.1. String and Character Manipulation Functions Available in Madeline.
| Function Name, parameters | Description |
| SubString(cString,nStart,nHowMany) | Extract a substring of nHowMany characters starting at position nStart in string cString: M>what is substring("Hello, World!",1,5) Hello M> |
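The example in Table 7.1 implies that nStart counts from 1 rather than 0. A minimal Python sketch of that indexing convention follows. The function is a hypothetical model, not Madeline's implementation, and its handling of out-of-range arguments (raising an error) is an assumption, since the manual does not describe that case.

```python
def substring(c_string: str, n_start: int, n_how_many: int) -> str:
    """Return n_how_many characters of c_string starting at 1-based n_start."""
    if n_start < 1 or n_how_many < 0:
        # Behaviour for invalid arguments is an assumption of this sketch.
        raise ValueError("n_start must be >= 1 and n_how_many must be >= 0")
    begin = n_start - 1  # convert to Python's 0-based indexing
    return c_string[begin:begin + n_how_many]


if __name__ == "__main__":
    print(substring("Hello, World!", 1, 5))
```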
Section 8
Characteristics Of The Expression Parser
There are a few important things to note about the expression parser (Madeline's command-line interpreter).
Madeline only supports exact string comparison. You can test for string equality using either = or ==. The comparison operator is the same in both cases. Two strings are equal if and only if (1) they are the same length and (2) have identical contents. Therefore, assuming FUSION-style IDs, this will exclude everyone in family 0100:
M>exclude for substring(studyid,1,4)="0100"
... but the following will not because FUSION study IDs are always more than four characters long:
M>exclude for studyid="0100"
Note that the latter case would work just fine in FoxPro and other data management systems that, by default at least, use = for inexact string comparisons.
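The difference between the two comparison styles can be modelled in Python. This is a sketch under stated assumptions: madeline_equal follows the two conditions given above, while inexact_equal is a loose model of an inexact "=" that compares only up to the length of the right-hand string. Both function names and the sample ID are hypothetical.

```python
def madeline_equal(left: str, right: str) -> bool:
    """Exact comparison: same length and identical contents."""
    return len(left) == len(right) and left == right


def inexact_equal(left: str, right: str) -> bool:
    """Loose model of an inexact "=": left only needs to begin with right."""
    return left.startswith(right)


if __name__ == "__main__":
    study_id = "0100-401"  # a FUSION-style ID in family 0100
    print(madeline_equal(study_id, "0100"))      # whole ID against family ID
    print(madeline_equal(study_id[:4], "0100"))  # first four characters only
    print(inexact_equal(study_id, "0100"))
```

Comparing the whole ID fails under exact comparison because the lengths differ, which is why the substring form is needed in Madeline.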
Internal Representation of Logical True and False
In Madeline, logical false is equivalent to zero, and logical true is equivalent to not zero. This is identical to the way things work in the C language, but different from the way things work in many interpreted environments which represent true and false using an additional level of abstraction. This allows for certain syntactical conveniences. For example, suppose that you have a numeric field called AFFECTED coded with one for affected individuals and zero for unaffected individuals. In Madeline, you can do this:
M>exclude for affected
Assuming there are no other values in AFFECTED other than 0 and 1, this would be equivalent to:
M>exclude for affected=1
If you feel uncomfortable with this sort of economy of expression, you can always express exactly what you want as shown in the latter case. The latter usage would also be necessary if the affected field possibly contained missing values, since MISSING is a non-zero value.
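The effect of a missing value on the two forms of the command can be shown with a small Python sketch. The MISSING constant below is a hypothetical stand-in (the manual says only that MISSING is non-zero), and the record IDs are invented.

```python
# Hypothetical stand-in for Madeline's numeric MISSING indicator.
# The actual internal value is not given in this manual; all that
# matters here is that it is non-zero.
MISSING = -9999.0


def is_true(value: float) -> bool:
    """C-style truth test: any non-zero value counts as true."""
    return value != 0


if __name__ == "__main__":
    # Invented records: ID -> value of the AFFECTED field.
    affected = {"A": 1, "B": 0, "C": MISSING}

    # Model of "exclude for affected": every non-zero value matches.
    bare = [rid for rid, value in affected.items() if is_true(value)]

    # Model of "exclude for affected=1": only the value 1 matches.
    explicit = [rid for rid, value in affected.items() if value == 1]

    print(bare)
    print(explicit)
```

In this sketch the bare form also picks up record "C", whose value is missing, while the explicit comparison selects only record "A".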