Madeline Version 0.93

May, 2001

(c) 1999 by the Regents of the University of Michigan, Ann Arbor.


Section 1
Overview and Features

What is Madeline?

Madeline is software written in ANSI C/C++ for:

Supported Platforms

Madeline has been compiled for the following platforms:

FUSION Study Support

Madeline was designed to meet the needs of the Finland-United States Investigation of NIDDM Genetics (FUSION) Study. Because of this, Madeline has specific knowledge about FUSION study IDs. A subset of Madeline’s functionality makes use of this knowledge (see FUSION box below).

The program continues to be modified to make it useful for genetic studies in general. Paragraphs or headings preceded by "" describe FUSION-specific functionality.


FUSION
The Finland-United States Investigation of NIDDM Genetics Study

The aim of the FUSION study is to map and identify susceptibility genes for non-insulin dependent diabetes mellitus (NIDDM) and for the intermediate quantitative traits associated with NIDDM.

FUSION individual IDs are eight characters long. The first four characters represent the family ID. The fifth character is a dash, a plus sign, or a capital letter from A to Z. The final three digits identify the individual within the pedigree.



Sample ID:                 1021+402
                           |   |  |
         +-----------------+   |  +---------------------+
         |                     |                        |
Family ID begins with:  Encoded flag symbol:       Individual ID:
- "0" for FUSION 1      "-" for FUSION 1           "100" for probands
- "1" for FUSION 2 fam. "+" for FUSION 2           "200" for fathers
- "C" for control fam.  "A" to "Z" for resampled   "300" for mothers
- "T" for Trios         FUSION records             "400" for siblings of the
                                                         proband (enumerated)
                                                   "500" for proband spouses 
                                                         (enumerated)
                                                   "600" for proband offspring
                                                         (enumerated)
                                                   "700" for sibling spouses
                                                         (enumerated)
                                                   "800" for sibling offspring
                                                         (enumerated)
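
The layout in the diagram can be decoded mechanically. The following Python sketch is illustrative only and is not Madeline's own code: the names decode_fusion_id and ROLE_BY_LEADING_DIGIT are hypothetical, and the role labels are paraphrased from the diagram and the ID rules in this section.

```python
# Illustrative sketch only: these names are hypothetical, not part of Madeline.
ROLE_BY_LEADING_DIGIT = {
    "0": "FUSION 1 control proband",
    "1": "proband",
    "2": "father",
    "3": "mother",
    "4": "sibling of the proband",
    "5": "proband spouse",
    "6": "proband offspring",
    "7": "sibling spouse",
    "8": "sibling offspring",
}

def decode_fusion_id(fusion_id):
    """Split an 8-character FUSION ID into family, flag, and individual parts."""
    if len(fusion_id) != 8:
        raise ValueError("FUSION IDs are exactly eight characters long")
    family = fusion_id[:4]       # characters 1-4: family ID
    flag = fusion_id[4]          # character 5: "-", "+", or "A" to "Z"
    individual = fusion_id[5:]   # characters 6-8: individual within the pedigree
    role = ROLE_BY_LEADING_DIGIT.get(individual[0], "unknown")
    return {"family": family, "flag": flag, "individual": individual, "role": role}

print(decode_fusion_id("1021+402"))
```

For the sample ID 1021+402, the sketch reports family "1021", flag "+", and individual "402", a sibling of the proband.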

  

Madeline is internally aware of the structure of FUSION IDs and uses this information in specific situations to:

  • Determine family IDs when a family ID field is not provided in the pedigree table.
  • Determine the proband in a sibship when a proband indicator field is not provided in the pedigree table.
  • Insert virtual FUSION parents in a family when only siblings were sampled, even when the father and mother fields contain missing values.
  • Connect spouses to one another even if they have no sampled offspring.
  • Re-connect dummied-in siblings (known via their offspring) to their parents to restore proper pedigree structure when required.

Madeline currently uses the following rules to determine if an ID in a dataset is a FUSION ID:

  • The ID must be exactly 8 characters long.
  • The first character must be in the set {0,1,C,T,9}. The numeral 9 is included to support constructed IDs used when part of a larger pedigree is split off for separate analytical treatment.
  • The fifth character must be in the set {-,+,A-Z}. The capital letters A-Z are allowed to support resampled IDs.
  • The sixth character must be in the set {0-9}. 0 is allowed to support FUSION 1 control "probands" who have a 0 instead of a 1 in this position.

A data set can easily contain a mixture of FUSION IDs and non-FUSION IDs. Only IDs meeting the above criteria will be construed as FUSION IDs.
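
The four rules above translate directly into a small predicate. This is a minimal Python sketch of the stated rules, not Madeline's implementation; the function name looks_like_fusion_id is an assumption introduced for illustration.

```python
import string

# Character sets taken from the rules listed above.
FIRST_CHARS = set("01CT9")
FLAG_CHARS = set("-+") | set(string.ascii_uppercase)
DIGITS = set(string.digits)

def looks_like_fusion_id(candidate):
    """Return True when an ID satisfies all four FUSION ID rules."""
    return (
        len(candidate) == 8              # rule 1: exactly 8 characters
        and candidate[0] in FIRST_CHARS  # rule 2: first character
        and candidate[4] in FLAG_CHARS   # rule 3: fifth character
        and candidate[5] in DIGITS       # rule 4: sixth character
    )

for sample in ["1021+402", "0001-230", "0980A", "2021+402"]:
    print(sample, looks_like_fusion_id(sample))
```

The length test comes first so that the indexed tests are never applied to a shorter string.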

 

Running the Program Interactively and in Batch Mode

Instructions to Madeline are entered at a command prompt. Madeline's command interpreter is not sensitive to capitalization. However, capitalization is often used in this document for clarity of presentation.

Madeline can be run interactively or in batch mode (Fig. 1.1). To run Madeline interactively, type the name of the program at your system prompt and press return. Madeline’s "M>" prompt will appear.

There are two ways to run batch files. The first way is to provide the name of a batch file containing commands after the name of the program on the command line. The second way is to start Madeline interactively and then use the run command to execute the batch file. Madeline returns to interactive mode if an error occurs, or when a batch file terminates without a goodbye or quit command.

csvr1%                          <-- system prompt (on UNIX)
csvr1% madeline                 <-- starting the program in interactive mode

MADELINE Version  0.910
Copyright (c) 1999 by Edward H. Trager 
and the FUSION Study Group
(Finland-United States Investigation of NIDDM Genetics Study),
University of Michigan
Ann Arbor, Michigan, USA

Help facility loaded.

+-----------------------+-----------+-----------------------------------------+
| Variable or State Flag| Setting   | Description                             |
+-----------------------+-----------+-----------------------------------------+
| AutoExclude           | ON        | Exclude pedigrees automatically         |
| Color                 | ON        | Draw pedigrees in color                 |
| DividedDrawings       | ON        | Paginate drawings by founding group     |
| EvaluationInterval    |   0.50 cM | Value to write to control file.         |
| Help                  | HTML      | Extended HTML help documentation        |
| Language              | ENGLISH   | Language convention used for date, time |
| OffEndDistance        |  10.00 cM | Value to write to control file          |
| Orientation           | AUTOMATIC | Automatic based on drawing dimensions   |
| PaperMargin           | 1.00 cm   | Margin (in cm) on all four sides        |
| PaperSize             | USLETTER  | 8.5 x 11.0 inches                       |
| SaveAlleleFrequencies | OFF       | Calculate new frequencies on next OPEN  |
| Time                  | Current   | 17:07 Tuesday, September 28, 1999       |
| Verbosity             | VERBOSE   | All messages are printed to the console |
+-----------------------+-----------+-----------------------------------------+
M>
M>                              <-- Madeline’s  "M>" prompt appears
M> quit                         <-- quit interactive session
Releasing resources ...
Goodbye!
csvr1% 
csvr1% madeline process.batch   <-- starting the program in batch mode
Madeline Version  0.910
...
open '\test\chr20.dbf'          <-- executing first batch command
Calculating allele frequencies for   7. D20S173...
Calculating allele frequencies for  10. D20S889...
Calculating allele frequencies for  13. D20S898...
...

Fig. 1.1. Starting Madeline. Madeline can be run either interactively or in batch mode.

 

Start up Batch File

To set parameters and run commonly needed commands automatically each time Madeline starts, place a special batch file called "autorun.bat" in the working directory from which Madeline will be invoked.

Any commands that can normally be invoked on the command line or in a batch file can be placed into autorun.bat. Assignments to specify default field names or environmental settings are typically placed in autorun.bat (Fig. 1.2).

//
// Typical autorun.bat file for Unix/Linux environment:
//

//
// Environment settings:
//
quiet
set language to English
FileEditor="vi"
PostscriptViewer="gv"
//
// Pedigree drawing-specific settings:
//
set color off
set PaperSize to A4
// margin in centimeters:
set PaperMargin to 1.5
set orientation to automatic
//
// Pedigree database-specific settings:
//
GenderField='GENDER'
FamilyIDField='FAMILY'
IndividualIDField='INDIVIDUAL'
//
// Map standard missing value indicators:
//
NumericMissingValue[0]=-1
NumericMissingValue[1]=-9
//
// Map database-specific settings:
//
PositionField="POSTN"
OrdinalField ="ORDNL"

Fig. 1.2. Example autorun.bat file.

 

Starting with Madeline v. 0.91, a warning message is produced if an autorun.bat file is not found, and the "M>" prompt changes accordingly (Fig. 1.3).

...
Could not find "autorun.bat" file.
...
1 WARNING M>

Fig. 1.3. In Madeline v. 0.91 and later, a warning is produced if an autorun.bat file is not present.

 

Overview of Database Tables Used by the Program

A database table is a rectangular array of data. A record is a row in the array. A field is a column in the array. One row or record contains the data -- all the measured variables -- for one entity.

In Madeline, the measured entity is either an individual or a genetic marker. Key fields are fields that identify the entity. To uniquely identify an individual, two key fields are required: (1) a family identifier, and (2) an individual identifier. Data fields contain the data measured on the entity. Combinations of other fields will be required to identify other entities, such as a genetic marker. The specific set of key fields required depends upon the context.

In Madeline, only three types of database tables occur:

  1. Pedigree tables
  2. Genetic Map tables
  3. Marker tables

Each type is described in turn below.

Pedigree Tables

In a pedigree table, each row or record contains the data for one individual. In Madeline, the names of the family and individual ID fields are stored in variables called FamilyIDField and IndividualIDField, respectively. Basic pedigree reconstruction additionally requires knowledge of the father, mother, and gender of each individual. Therefore, Madeline defines a set of five core fields that must be present in every pedigree database:

  1. FamilyIDField -- database key field
  2. IndividualIDField -- database key field
  3. FatherIDField -- required for pedigree reconstruction
  4. MotherIDField -- required for pedigree reconstruction
  5. GenderField -- required for pedigree reconstruction

The remaining data fields in a pedigree database can be classified into two groups: (1) phenotype and (2) genotype fields. Together with the core fields, this gives three categories. Madeline classifies every field in a database table into one of them using the single-letter identifiers shown below:

  1. "C" -- core fields
  2. "P" -- Phenotype fields
  3. "G" -- Genotype fields

The complete set of core fields consists of the five obligatory core fields listed above, as well as some additional, non-obligatory core "phenotype" fields such as AffectionStatusField and DateOfBirthField.

Genetic Map Tables

A map table contains map information related to markers on one or more chromosomes. The key fields in a map table are:

  1. ChromosomeField -- chromosome on which marker appears
  2. MarkerField -- name of the marker

The data fields in a map table are:

  1. PositionField -- map position from p terminus in centiMorgans
  2. OrdinalField -- ordinal ranking of the marker in the map from 1 to n where n is the number of markers mapped for the given chromosome

Marker Tables

A marker table contains the alleles for a specific marker measured on a specific individual. Output from ABI machines is in this table format. This type of table has three key fields:

  1. FamilyIDField -- family of the individual
  2. IndividualIDField -- ID of the individual
  3. MarkerField -- name of the marker

There are only two essential data fields in a marker table:

  1. Allele1Field -- positive integral numeric label assigned to first allele
  2. Allele2Field -- positive integral numeric label assigned to second allele

In principle, the two allele fields could be represented by a single genotype field containing the numeric labels separated by a forward slash, "/". Madeline does not yet contain support for this option in marker tables.

Madeline provides support for integrating the information in a marker table into a pedigree table via the transpose and merge commands. The transpose command takes care of converting the paired allele fields into the single genotype fields expected in a pedigree table.
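
The reshaping performed by transpose can be pictured as folding many marker-table rows into one record per individual. The Python sketch below shows only that idea under simple assumptions. It is not the transpose or merge command itself, the function name transpose_marker_rows is hypothetical, and the allele values are made-up sample data.

```python
def transpose_marker_rows(rows):
    """Fold (family, individual, marker, allele1, allele2) rows into one
    record per individual, with one "a1/a2" genotype string per marker."""
    pedigree = {}
    for family, individual, marker, allele1, allele2 in rows:
        # The two key fields identify the individual's record.
        record = pedigree.setdefault((family, individual), {})
        # The marker name becomes the genotype field name.
        record[marker] = f"{allele1}/{allele2}"
    return pedigree

# Made-up sample rows in marker-table layout.
rows = [
    ("1021", "1021+401", "D20S173", 141, 143),
    ("1021", "1021+401", "D20S889", 98, 98),
    ("1021", "1021+402", "D20S173", 141, 145),
]

for key, genotypes in transpose_marker_rows(rows).items():
    print(key, genotypes)
```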

Supported Database Formats

Madeline currently supports xbase (FoxPro, dBase III/IV), Visual FoxPro and SAS transport file formats, and space-delimited, column-aligned ASCII flat files. Madeline supports flat file tables directly by referencing a binary header file created using the recognize command. All pedigree databases are opened using the open command. Madeline’s database engine detects operating system and file byte-ordering at run time, thus permitting database tables from PCs to be opened on Unix workstations, and vice versa.

Supported Data Types

Madeline’s database engine supports character, numeric (floating point and integer), and date types of the supported database formats. A logical data type such as the "L" field type of xbase is not supported: use appropriately coded numeric variables instead. Other derived types, such as date-time or monetary types, are not supported.

Character Data

Character data are read from databases by trimming leading and trailing space characters. Thus, blank entries in a database appear as the empty string, "". When entered on the command line, literal character data must be delimited by a pair of matching single or double quotes, e.g., "0001-230" or '0980A'.

Numeric Data

All numeric data types are converted to double-precision floating point numbers. Literal numeric values are entered on the command line without delimiters.

Logical or Boolean Data

In order to support multiple file formats and missing values in a uniform manner, Madeline does not recognize a logical data type separate from the numeric data type. In contexts where a value is to be interpreted as a logical value, Madeline treats zero as _false, and any non-zero non-missing value as _true. Binary true/false data should thus be coded using a numeric field type with values of 0, 1, and a missing value indicator if required.

Date Data

Date data read from a file are automatically converted to Julian day integers. When entered at the command line, dates must be delimited between curly braces and must be entered according to the ordering and capitalization conventions of the current language setting (Fig. 1.4). Madeline recognizes spaces, commas, periods or forward slashes as delimiters between the month, day, and year elements of a date. Madeline recognizes correctly capitalized, unabbreviated month names and month ordinals. Madeline does not recognize two-digit years as belonging to the current century.

M>show {December 11 1963}
{Wednesday, December 11, 1963}
M>show {December 11, 1963}
{Wednesday, December 11, 1963}
M>show {12/11/1963}
{Wednesday, December 11, 1963}
M>show {12/11/63}
{Sunday, December 9, 63} <-- in the year 63 A.D., before the Gregorian Calendar
M>show {dec 11 1963}     <-- Madeline does not recognize abbreviated month names ...
{}                       <-- ... so this evaluates to a missing date
M>set language to Suomi
M>show {11.12.1963}
{keskiviikko 11.12.1963}
M>

Fig. 1.4. Dates in Madeline. Dates entered at the command line must be delimited by curly braces and must adhere to the ordering and capitalization conventions of the current language setting.
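
To illustrate what conversion to a Julian day integer involves, here is a short Python sketch of a well-known integer formula for Gregorian calendar dates. It assumes the standard astronomical Julian Day Number; the epoch Madeline uses internally is not stated in this document, and the function names are hypothetical.

```python
def gregorian_to_jdn(year, month, day):
    """Julian Day Number of a Gregorian calendar date (integer arithmetic)."""
    a = (14 - month) // 12      # 1 for January and February, otherwise 0
    y = year + 4800 - a         # shifted year, counted from March
    m = month + 12 * a - 3      # shifted month: March = 0 ... February = 11
    return (day + (153 * m + 2) // 5 + 365 * y
            + y // 4 - y // 100 + y // 400 - 32045)

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def weekday_name(jdn):
    """A Julian Day Number divisible by 7 falls on a Monday."""
    return WEEKDAYS[jdn % 7]

jdn = gregorian_to_jdn(1963, 12, 11)
print(jdn, weekday_name(jdn))   # prints: 2438375 Wednesday
```

Storing dates as integers makes comparison and sorting simple, and the weekday follows directly from the remainder modulo 7.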

 

Extent of Date Support

Date data may be displayed on pedigree drawings. Dates may also be used in an expression passed to a view or a draw command, to a subsetting command such as exclude, or to the sort command (which sorts the order of individuals on a pedigree drawing). There is currently no support for writing date data to an output file.

Missing Value Support

Madeline supports entry of missing values from the command line, and also provides a simple mechanism for the user to define sets of values in a database that should be mapped as missing values when the database is read by Madeline.

On the command line, Madeline provides the following external representations of internal missing value indicators for the user to use:

Some supported database formats, such as flat files and FoxPro database files, do not provide native missing value support for character and numeric types. Even when missing value support is provided by a database format, protocols in a study may require that different types of missing value codes be used when recording missing values. For example, in the FUSION Los Angeles data, different negative integers were used to code for assay pending, no assay, and no tube conditions.

Madeline therefore permits the user to specify lists of values that are to be treated as missing values. These lists of missing value indicators are stored in two arrays. CharacterMissingValue[] is used whenever character fields, including genotype fields, are referenced. NumericMissingValue[] is used whenever numeric fields are referenced (Table 1.1). For simplicity, these arrays can be referenced using their abbreviated names, cmv[] and nmv[], respectively.

Table 1.1. Character and numeric missing value arrays in Madeline.

Full Name                 Abbreviated Name   Default Values
CharacterMissingValue[]   cmv[]              cmv[0] = "."
                                             cmv[1] = "/"
                                             cmv[2] = "0/0"
                                             cmv[3] = "0/ 0"
                                             cmv[4] = "0/  0"
NumericMissingValue[]     nmv[]              nmv[0] = -9999

 

When data are read from a database, all native missing values (for example, a space-padded blank entry is a native missing value indicator in a flat file) and any values that match the values specified in Madeline’s CharacterMissingValue[] or NumericMissingValue[] arrays are converted to Madeline’s internal missing value indicators.
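
The mapping described above can be sketched in a few lines of Python. This is a simplified illustration, not Madeline's database engine: the names MISSING, read_character, and read_numeric are hypothetical, and None merely stands in for the internal missing value indicator.

```python
MISSING = None  # stand-in for an internal missing value indicator

# Default lists from Table 1.1.
character_missing_value = [".", "/", "0/0", "0/ 0", "0/  0"]
numeric_missing_value = [-9999.0]

def read_character(raw):
    """Trim padding, then map blanks and listed values to MISSING."""
    value = raw.strip(" ")
    if value == "" or value in character_missing_value:
        return MISSING
    return value

def read_numeric(raw):
    """Convert to a double, then map blanks and listed values to MISSING."""
    text = raw.strip(" ")
    if text == "":
        return MISSING
    number = float(text)
    return MISSING if number in numeric_missing_value else number

print(read_character("  0/ 0 "), read_character(" 141/142 "))
print(read_numeric("-9999"), read_numeric("  12.5"))
```

Because the lists are consulted at read time, any additions must be in place before the data are read.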

At startup, CharacterMissingValue[] and NumericMissingValue[] contain a set of default missing value indicators appropriate to most FUSION data. New values can be assigned to existing cells or appended to the end of these lists as required by the user (Fig. 1.5): this should be done before a database is opened so that the values will be recognized appropriately. The autorun.bat batch file is an appropriate place to set character and numeric missing value indicators. Note that all arrays in Madeline are zero-offset.

M>list cmv            <-- view CharacterMissingValue array
CMV has 5 elements:
CMV[ 0]="."
CMV[ 1]="/"
CMV[ 2]="0/0"
CMV[ 3]="0/ 0"
CMV[ 4]="0/  0"
M>cmv[5]="./."        <-- append new value to end of list
M>list cmv
CMV has 6 elements:
CMV[ 0]="."
CMV[ 1]="/"
CMV[ 2]="0/0"
CMV[ 3]="0/ 0"
CMV[ 4]="0/  0"
CMV[ 5]="./."
M>list nmv            <-- view NumericMissingValue array
NMV has 1 element:
NMV[ 0]=         -9999
M>nmv[0]=-1           <-- overwrite one value
M>nmv[1]=-9           <-- and append another value
M>list nmv
NMV has 2 elements:
NMV[ 0]=            -1
NMV[ 1]=            -9
M>

Fig. 1.5. Assigning missing value indicators. Missing value indicators may be assigned to existing cells or appended to the ends of Madeline’s character and numeric missing value lists.

 

Categorization of Data

Upon opening a pedigree table, Madeline categorizes each field into one of three categories:

  1. "C" -- core fields
  2. "G" -- genotype fields
  3. "P" -- phenotype fields

When a field is completely empty or contains only missing values, Madeline assigns the field to a null category represented by an asterisk, "*".

When required, Madeline allows the user to designate a subset of "P" phenotype fields as "V" covariate fields using the toggle command. Madeline does not automatically assign fields to the covariate category. Field categories are summarized in Table 1.2 and described in greater depth below.

Table 1.2. Summary of Field Categories in Madeline.

Data Category   Symbolic Designation   Description
Core            C                      Set of five required fields like GenderField that must be
                                       present in all pedigree databases, plus additional optional
                                       fields, like AffectionStatusField, that are not required by
                                       default but may be required for some operations.
Genotype        G                      Character fields containing two numeric labels separated by
                                       a forward slash character, e.g., "141/142"
Phenotype       P                      Character, numeric, or date fields that contain categorical
                                       or continuous phenotype information.
Covariate       V                      A subset of phenotype fields that are to be used as
                                       covariates. The user must use the toggle command to change
                                       the designation of a "P" field to "V".
Null            *                      Character, numeric, or date fields that are completely empty
                                       or contain only missing value indicators. In general, these
                                       fields cannot be operated upon.

 

Core Data Fields

Core "C" data fields provide key information about an individual (Table 1.3). Madeline identifies core fields by their names. These names are stored in internal variables whose values may be reassigned by the user. In conformance with the requirements of the supported database types, field names must be fully capitalized and cannot exceed 10 characters in length. Madeline automatically capitalizes and truncates any non-conformant field name identifiers.
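
The automatic capitalization and truncation can be pictured with the following Python sketch. The function name normalize_field_name is hypothetical, and the sketch assumes plain ASCII names; it only illustrates the stated rule.

```python
def normalize_field_name(name, max_length=10):
    """Capitalize a field name identifier and truncate it to max_length."""
    return name.upper()[:max_length]

print(normalize_field_name("famid"))
print(normalize_field_name("IndividualIDField"))
```

Under this rule a long identifier collapses to its first ten capitalized characters, so two names that differ only after the tenth character would refer to the same field.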

Core data fields are either required or optional. The absence of one or more of the five required core fields will generate an error when a database is opened (an exception applies when FUSION 1 data are used; see below).

Optional core fields may be required for some operations, but are not required by default. Madeline makes use of the additional information provided in optional core fields whenever they are present. For example, Madeline’s pedigree drawing functionality is enhanced by the presence of fields for affection, death, index case, monozygotic and dizygotic twin status.

Table 1.3. Core Data Fields in Madeline.

Variable Name            Description                         Default Value   Expected Field Type

I. Required Core Fields which must always be present [1]:

1. IndividualIDField     Individual identifier               "STUDYID"       Character only
2. FatherIDField         Father's identifier                 "FATHER"        Character only
3. MotherIDField         Mother's identifier                 "MOTHER"        Character only
4. GenderField           Gender                              "SEX"           Character or numeric
5. FamilyIDField [1]     Family identifier                   "FAMID"         Character only

II. Optional Core Fields:

AffectionStatusField     Affection status                    "NAFFECTE"      Numeric or character
DeathStatusField         Death status                        "DECEASED"      Numeric or character
IndexCaseField           Index case or proband indicator     "PROBAND"       Numeric only
LiabilityClassField      Liability class                     "LCLASS"        Numeric or character
MZTwinField              Monozygotic twin status indicator   "TWIN"          Character only
DZTwinField              Dizygotic twin status indicator     "DZTWIN"        Character only
DateOfBirthField         Date of birth                       "DOB"           Date only
DateOfDeathField         Date of death                       "DOD"           Date only

[1] The FamilyIDField is not required when data are restricted to FUSION 1 IDs only.

 

Interpretation of Core Data

Madeline interprets data from required and optional core fields in order to reconstruct pedigrees and evaluate key information. A clear understanding of how Madeline interprets core data is essential to proper use of the program.

Use of Arrays To Map External Values Into Internal Meanings

A key aspect of Madeline’s generality and flexibility is the use of a set of arrays to map external data values into internal meanings. We have already seen how Madeline uses CharacterMissingValue[] and NumericMissingValue[] in order to map external missing value indicators to uniform internal missing value representations. If a value in a core field such as the field for gender, affection status, or death status does not map to a missing value, Madeline uses a designated array for mapping the external categorical value into an internal representation.

For example, suppose the GenderField contains the value "F" for some record. Since "F" is not a missing value listed in CharacterMissingValue[], Madeline looks in CharacterSexValue[] (abbreviated as csv[]) and sees that "F" matches the second entry in the list, which is the entry reserved for female, _female. That is:

"F" = CharacterSexValue[ 1 ] = CharacterSexValue[ _female ]

So, Madeline knows that the individual is a female and records this information internally.

To ensure that Madeline recognizes values in core fields correctly, assignments to the designated arrays must be made before any open or load command.

Database Field Naming Conventions

Different database file formats impose different restrictions on the length and format of field names in a database. For example, up to 10 characters can be used for field names in an xbase file, but only up to 8 characters in a SAS transport file. Although Madeline now supports several different file formats, the program originally only supported the xbase file format. As a result, Madeline restricts field name identifiers as follows:

Madeline does not actively check for errors such as spaces or disallowed characters in field identifiers. This is the user's responsibility. Madeline also has no way of knowing in advance what type of database file will be opened. For example, the program will not notice if you enter a ten-letter name for use with a SAS transport file that permits only 8-letter field identifiers.

Family Identifier

The value in FamilyIDField tells Madeline the name of the family ID field in the database. The default value is "FAMID".

  The FamilyIDField is not strictly required when FUSION-compliant IDs are used. When the FamilyIDField is not present, Madeline automatically extracts the family identifier from individual IDs which "look" like FUSION IDs. However, FUSION 2 databases are likely to have "95x" individuals who are connected to pedigrees via unstudied individuals who are assigned IDs that are not FUSION-compliant. Thus, the FamilyIDField is required when reading such databases.

Individual and Parental Identifiers

The values in IndividualIDField, FatherIDField, and MotherIDField serve to identify the individual and parent ID fields in the database. The default values are "STUDYID", "FATHER", and "MOTHER", respectively.

Parent IDs should be present in both the FatherIDField and MotherIDField of all non-founder individuals. The program interprets any individual with missing value indicators for both parents as a founder.

In the event that one of the two parent IDs is missing for an individual or individuals in a sibship, Madeline provides a randomly-generated eight-letter identifier to represent the missing parent. The randomly-generated IDs begin and end with exclamation marks to distinguish them from regular IDs. Using the generated ID, Madeline constructs a virtual parent in memory who will appear on pedigree drawings (Fig. 1.6) and in output from the write command. Madeline assumes that the sibs are full sibs sharing a single pair of parents.

Fig. 1.6. Virtual parent in Madeline. A virtual parent with a randomly-generated ID (right) is constructed when the ID of one parent is missing among a sibship of individuals (not shown). Sibs are assumed to be full sibs.

 

  When FUSION-compliant IDs are used, it is possible to leave the FatherIDField and MotherIDField of non-founders both missing in cases where Madeline can determine the IDs of the parents. For example, Madeline knows that the parental IDs of a "100" or "401" individual must end in "200" and "300" for the father and mother, respectively. Madeline first looks for parents sampled during FUSION 1 or FUSION 2 in the database. If parents are not found in the database, Madeline dummies-in virtual parents using FUSION 1 IDs. In other cases, if only one of the two parent IDs is missing, Madeline can reconstruct the correct ID of the missing parent from the parent whose ID is provided. For example, if a "801" individual is the offspring of a "402" sib, the missing parent’s ID must end in "702".
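
The two inferences in the paragraph above can be expressed over the three-digit individual part of a FUSION ID. The Python sketch below covers only the cases named in the text and is not Madeline's algorithm. Both function names are hypothetical, and the reverse pairing from a "7nn" spouse back to the "4nn" sibling is an assumption.

```python
def default_parent_suffixes(individual_suffix):
    """Father and mother suffixes for a proband ("100") or a sibling ("4nn")."""
    if individual_suffix == "100" or individual_suffix.startswith("4"):
        return ("200", "300")
    return None  # no rule stated for other individuals

def missing_parent_suffix(known_parent_suffix):
    """Pair sibling "4nn" with sibling spouse "7nn" (reverse is assumed)."""
    partner = {"4": "7", "7": "4"}
    lead = known_parent_suffix[0]
    if lead in partner:
        # Keep the enumeration digits, swap the leading role digit.
        return partner[lead] + known_parent_suffix[1:]
    return None

print(default_parent_suffixes("401"))
print(missing_parent_suffix("402"))
```

For the example in the text, an "801" individual with known parent "402" yields "702" for the missing parent.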

Gender Data

The default value for GenderField is "SEX". The GenderField can be either numeric or character. Madeline detects the field type when the database is opened. Madeline defines two constants, _male, which has a value of 0, and _female, which has a value of 1. These symbolic constants are used for indexing two arrays, NumericSexValue[] and CharacterSexValue[]. These arrays define the external values used in a database to designate gender (Table 1.4). Default values may be reassigned by the user as required.

Table 1.4. Character and Numeric Sex Value Arrays.

Array Name            Abbreviated Name   Default Values
CharacterSexValue[]   csv[]              csv[_male  ] = "M"
                                         csv[_female] = "F"
NumericSexValue[]     nsv[]              nsv[_male  ] = 0
                                         nsv[_female] = 1

 

In Madeline, only terminal individuals without offspring may retain a gender attribute of missing. If during pedigree reconstruction Madeline detects any father or mother with a missing gender attribute, the program will automatically change the gender of the individual in memory to be consistent with the reconstruction, and will warn the user of the change. The database file on disk will not be changed.

Madeline will also automatically correct the gender attribute of mislabeled individuals in memory, for example, of a male listed as a mother, or of a female listed as a father. Madeline always warns the user of these types of database errors. Again, the database file on disk will not be changed -- that is the user's responsibility.

Madeline will warn the user and then terminate abruptly if conflicting and unresolvable gender roles exist for an individual, for example if an individual is listed as both a mother and a father.
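
The gender checks described in the last three paragraphs can be sketched as one pass over the records held in memory. This Python example is an illustration under simplified assumptions, not Madeline's reconstruction code. The record layout and the function name reconcile_genders are hypothetical, and None marks a missing value.

```python
_male, _female = 0, 1

def reconcile_genders(records):
    """Correct in-memory genders to match parental roles.

    records: list of dicts with keys "id", "father", "mother", "sex".
    Returns a list of warning strings; raises on unresolvable conflicts.
    """
    fathers = {r["father"] for r in records if r["father"]}
    mothers = {r["mother"] for r in records if r["mother"]}
    conflicts = fathers & mothers
    if conflicts:
        raise ValueError(f"listed as both father and mother: {sorted(conflicts)}")
    warnings = []
    for r in records:
        if r["id"] in fathers:
            expected = _male
        elif r["id"] in mothers:
            expected = _female
        else:
            continue  # terminal individuals may keep a missing gender
        if r["sex"] != expected:
            warnings.append(f"{r['id']}: gender changed from {r['sex']} to {expected}")
            r["sex"] = expected  # only the in-memory copy changes
    return warnings

records = [
    {"id": "A", "father": None, "mother": None, "sex": None},   # missing gender
    {"id": "B", "father": None, "mother": None, "sex": _male},  # mislabeled
    {"id": "C", "father": "A", "mother": "B", "sex": None},
]
for message in reconcile_genders(records):
    print(message)
```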

Monozygotic and Dizygotic Twin Data

The MZTwinField should remain blank or missing for non-twins, and should contain a single-letter identifier for each twin pair or group. For example, "A" can be used to designate the first twin pair in a family, "B" the second pair, and so on. Starting with version 0.90 of the program, MZTwinField is considered an optional core field.

The optional DZTwinField, if present, should be coded in a similar manner to designate dizygotic twins.

Affection Status Data

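The AffectionStatusField may be either numeric or character. Madeline defines two symbolic constants for describing the affection status of sampled individuals (the underscores are used to avoid confusion with possible field names and are required):

  • _unaffected
  • _affected

In addition to these two categories, Madeline also recognizes these additional categories for mapping unstudied individuals:

  • _UnstudiedUnaffected (unstudied, reported unaffected)
  • _UnstudiedAffected (unstudied, reported affected)
  • _UnstudiedConflicting (unstudied, conflicting reports)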

These additional categories are useful for drawing extended pedigrees which may include unstudied individuals in addition to sampled individuals. Madeline defines two arrays, CharacterAffectionStatus[] and NumericAffectionStatus[], for mapping external affection status values to one of the five internally recognized categories (Table 1.5).

Table 1.5. Character and Numeric Affection Status Arrays.

Array Name                   Abbreviated Name   Default Values
CharacterAffectionStatus[]   cas[]              cas[_unaffected] = "0"
                                                cas[_affected  ] = "1"
                                                cas[_UnstudiedUnaffected] = "2"    (unstudied, reported unaffected)
                                                cas[_UnstudiedAffected  ] = "3"    (unstudied, reported affected)
                                                cas[_UnstudiedConflicting] = "4"   (unstudied, conflicting reports)
NumericAffectionStatus[]     nas[]              nas[_unaffected] = 0
                                                nas[_affected  ] = 1
                                                nas[_UnstudiedUnaffected] = 2      (unstudied, reported unaffected)
                                                nas[_UnstudiedAffected  ] = 3      (unstudied, reported affected)
                                                nas[_UnstudiedConflicting] = 4     (unstudied, conflicting reports)

 

Note that categories 2-4 refer only to unstudied individuals. Guard against using the externally mapped values of categories 2-4 for sampled individuals, especially if the write command is used to produce a file for analysis.

Death Status Field

The optional DeathStatusField may be either numeric or character. Madeline defines the constants _alive, with a value of 0, and _dead, with a value of 1, for indexing the CharacterDeathStatus[] and NumericDeathStatus[] arrays used to map external values in the DeathStatusField into internal representations (Table 1.6).

Table 1.6. Character and Numeric Death Status Arrays.

Array Name               Abbreviated Name   Default Values
CharacterDeathStatus[]   cds[]              cds[_alive] = "N"
                                            cds[_dead ] = "Y"
NumericDeathStatus[]     nds[]              nds[_alive] = 0
                                            nds[_dead ] = 1

 

Index Case Field

The optional IndexCaseField must be numeric. Madeline assumes that the probands or index cases will be coded using a value of 1, and all other individuals with a value of 0.

 When FUSION-compliant IDs are used, Madeline automatically determines which individuals are probands directly from the IndividualIDField, making the IndexCaseField unnecessary.
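As a rough illustration of how proband status can be read from a FUSION ID, the sketch below splits an ID into the three parts described in the FUSION box (four-character family ID, flag symbol, three-digit individual ID) and treats "100" as the proband code. The function names and the exact regular expression are assumptions for this example. In particular, restricting the family ID to digits and capital letters is a guess based on the sample IDs, and the sketch does not reproduce Madeline's own parsing code.

```python
import re

# Hypothetical pattern based on the FUSION ID description:
# four-character family ID, one flag ("-", "+", or "A"-"Z"), three digits.
FUSION_ID = re.compile(r"^([0-9A-Z]{4})([-+A-Z])(\d{3})$")


def parse_fusion_id(study_id):
    """Split a FUSION ID into its parts, or return None if it does not fit."""
    match = FUSION_ID.match(study_id)
    if match is None:
        return None
    family, flag, individual = match.groups()
    return {"family": family, "flag": flag, "individual": individual}


def is_proband(study_id):
    """Probands carry the individual ID "100"."""
    parsed = parse_fusion_id(study_id)
    return parsed is not None and parsed["individual"] == "100"
```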

Liability Class Field

Some output formats, such as Genehunter, have the option of including liability class information. The LiabilityClassField may be numeric or character. Madeline does not interpret the values in this field.

Date of Birth and Death Data

The DateOfBirthField and DateOfDeathField are optional core date fields. When present, Madeline performs checks to ensure that dates in these fields are reasonable, and looks for twins based on date of birth who have not been designated as such in the MZTwinField or DZTwinField.

Genotype Data

Genotype "G" data are character fields that contain allelic marker data separated by the forward slash "/" character. The allele labels themselves must be numeric, non-alphabetic labels, e.g. "141/142".

The names of genotype fields should be the names of the markers themselves. This allows Madeline to automatically place the genotype fields into map order whenever a map database for the markers is loaded using the load command. Make sure that marker names in the map database are capitalized to correspond with the required capitalization of field names.

Estimation of Allele Frequencies from Genotype Data

When a database is opened, Madeline automatically estimates allele frequencies for all genotype fields using gene counting, ignoring family relationships. Allele frequencies are estimated from all records in a database. Allele frequencies calculated from one database may be saved for use when processing another database using the set SaveAlleleFrequencies on command.
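Gene counting without regard to family relationships amounts to tallying every allele observed in a genotype field and dividing by the total number of alleles counted. The Python sketch below illustrates the idea for one marker. The function name is hypothetical, and treating empty strings and None as missing genotypes is an assumption, since the manual does not state here how missing genotypes are coded.

```python
from collections import Counter


def estimate_allele_frequencies(genotypes):
    """Estimate allele frequencies for one marker by simple gene counting.

    `genotypes` is an iterable of strings such as "141/142". Empty or None
    entries are treated as missing (an assumption made for this sketch).
    """
    counts = Counter()
    for genotype in genotypes:
        if not genotype:
            continue
        alleles = genotype.split("/")
        if len(alleles) != 2:
            continue  # skip malformed entries rather than guess
        counts.update(alleles)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in counts.items()}
```

Because every record contributes equally, related individuals are counted as if they were independent, which is what "ignoring family relationships" implies.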

Phenotype Data

Phenotype "P" fields are any remaining fields that are not core "C" or genotype "G" fields. Phenotype fields may be character, numeric, or date fields, and are assumed to contain categorical or continuous phenotype information. Because date fields cannot be written to output from the write command, date fields are the only type of phenotype field not flagged for output when a pedigree database is opened.

For some types of output, it may be necessary to designate certain phenotype fields as representing covariates. Madeline therefore maintains a separate covariate or "V" field category which is a subset of the "P" category. Covariate fields are automatically recognized as phenotype fields when writing any format that doesn’t distinguish between phenotype and covariate fields. "P" fields can be marked as "V" fields using the toggle command.

Marking and Ordering Data Fields for Output

When a pedigree database is opened, most core "C" fields, all genotype "G" fields, and all phenotype "P" fields (except date fields) are flagged, or toggled on, for output by default. Madeline indicates which fields in a database are toggled for output by placing the letter "o" after the category indicator "C", "G", or "P" (Fig. 1.7). A number after the "o" indicates the order in which fields will appear in pedigree drawings and file output. Fields may be manually reordered using the set field order command.

M>list fields
  1.FAMID      Co__1   20.D20S482    Go__6   39.D20S96     Go_25
  2.STUDYID    Co__2   21.D20S849    Go__7   40.D20S119    Go_26
  3.SEX        Co__3   22.D20S905    Go__8   41.D20S481    Go_27
  4.FATHER     Co__4   23.D20S846    Go__9   42.D20S836    Go_28
  5.MOTHER     Co__5   24.D20S892    Go_10   43.D20S888    Go_29
  6.TWIN       Co__6   25.D20S115    Go_11   44.D20S886    Go_30
  7.NAFFECTE   Co__7+  26.D20S851    Go_12   45.D20S197    Go_31
  8.BMI        Po__1   27.D20S917    Go_13   46.D20S178N   Go_32
  9.INS_FAST   Po__2   28.D20S894    Go_14   47.D20S866    Go_33
 10.INS_2H     Po__3   29.D20S189    Go_15   48.D20S196    Go_34
 11.BW_REAL    Po__4   30.D20S898    Go_16   49.D20S857    Go_35
 12.GLU_FAST   Po__5   31.D20S114    Go_17   50.D20S480    Go_36
 13.GLU_2H     Po__6   32.D20S912    Go_18   51.D20S211    Go_37
 14.GAD_DUP    Po__7   33.D20S477    Go_19   52.D20S840    Go_38
 15.D20S103    Go__1   34.D20S874    Go_20   53.D20S120    Go_39
 16.D20S117    Go__2   35.D20S195    Go_21   54.D20S100    Go_40
 17.D20S906    Go__3   36.D20S909    Go_22   55.D20S102    Go_41
 18.D20S193    Go__4   37.D20S107    Go_23   56.D20S171    Go_42
 19.D20S889    Go__5   38.D20S170    Go_24   57.D20S173    Go_43
M>

Fig. 1.7. Categorization of Fields in Madeline. The plus "+" sign after NAFFECTE indicates that Madeline has detected this field as the AffectionStatusField: categorical levels of this field will be used to color icon symbols on pedigree drawings. A field listing is shown when a database is first opened, or at any other time using the list fields command.

 

The order of genotype fields is automatically set to map order when a marker map database is loaded using the load command. Load can be issued either before (the preferred method) or after an open command. The order of genotype fields whose names match the names of markers in the map database will be set to the map order.
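The reordering step can be sketched as a sort of genotype field names by their (chromosome, ordinal) entry in the map. This Python example is only an illustration: the function name and the data layout are hypothetical, and placing unmatched fields after the matched ones in their original order is an assumption, since the text above describes only what happens to fields whose names match markers in the map database.

```python
def order_by_map(field_names, marker_map):
    """Return genotype field names sorted into map order.

    `marker_map` maps a marker name to a (chromosome, ordinal) tuple.
    Fields without a map entry keep their relative order and are placed
    last (an assumption made for this sketch).
    """
    matched = [name for name in field_names if name in marker_map]
    unmatched = [name for name in field_names if name not in marker_map]
    matched.sort(key=lambda name: marker_map[name])
    return matched + unmatched
```

For example, with a map containing D7S2477, D7S531, and D7S517 at ordinals 1, 2, and 3 on chromosome 7, a field list given in any order would come back as D7S2477, D7S531, D7S517.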

Fields toggled on for output are displayed in pedigree drawings created with the draw command.

When a write command is executed, the set of core "C" fields required by the specific format being produced will generally be output regardless of the on/off output flag status. For example, Madeline will output the GenderField even if you toggle it off because it is required for almost all output formats. This behavior is required to ensure proper file construction. Genotype "Go" fields toggled for output will be written, along with phenotype "Po" (and possibly covariate "Vo") fields toggled for output if the analysis format supports phenotype fields. Some analysis programs, such as Genehunter and Siblink, do not use phenotype data beyond affection status (which is a core field).

Fields may be toggled on or off for output using the toggle command.

Genetic Map Data

Madeline makes use of marker map information to:

The load command is used to load a table containing genetic maps for one or more chromosomes. It may contain only one map for each chromosome. The map database must contain fields of information specifying the chromosome, rank or ordinal position of the marker within the map for a given chromosome, name of the marker, and the position of the marker in centiMorgans (Table 1.7). A map may be viewed using the list map command (Fig. 1.8).

Table 1.7. Map Database Fields in Madeline.
Variable For Storing Field Name  Default Value  Description
-------------------------------  -------------  -------------------------------------------
ChromosomeField                  "CHROMOSOME"   Numeric field storing the chromosome number.
OrdinalField                     "ORDINAL"      Numeric field storing the ordinal position
                                                or rank of the marker on the map for this
                                                chromosome.
MarkerField                      "MARKERNAME"   Character field storing the name of the
                                                marker.
PositionField                    "POSITION"     Numeric field storing the map position from
                                                the p terminus in centiMorgans.

 

M>load '\maps\newmaps.dbf'
Marker maps based on \maps\newmaps.dbf are now installed.
M>list map for chromosome=7
Marker Name Ch Or Position
----------- -- -- --------
D7S2477      7  1    0.0000
D7S531       7  2    5.4000
D7S517       7  3    7.7000
D7S513       7  4   19.1000
D7S493       7  5   36.1000
D7S516       7  6   43.8000
D7S484       7  7   55.6000
D7S510       7  8   62.7000
D7S2422      7  9   74.2000
D7S669       7 10   87.4000
D7S657       7 11  102.6000
D7S515       7 12  111.8000
D7S2502      7 13  124.9000
D7S530       7 14  134.1000
D7S640       7 15  140.5000
D7S495       7 16  145.7000
D7S2513      7 17  150.9000
D7S483       7 18  167.7000
D7S550       7 19  182.4000
M>

Fig. 1.8. Loading and viewing marker maps in Madeline. A map database is installed using the load command. The list map command is used to print a table showing marker name, chromosome, mapped order, and position in centiMorgans.

 

 When using FUSION data with Madeline v. 0.90 and above, be sure to include the following two lines in your batch file, or in the autorun.bat file, in order to define the map database field names used in FUSION:

OrdinalField ="POSITION"
PositionField="KOSAMBICM"

 

Log and Error Reporting Features

Madeline produces three types of log files (Table 1.8). The first is a summary file that has a ".log" extension by default and records each command that was entered and a summary of execution results. For example, results of a write command indicate how many pedigrees and individuals were included, how many were excluded, and the total number of pedigrees and individuals. The second is a detail file that has a ".dtl" extension by default. It provides detailed information on which pedigrees and individuals were excluded and why they were excluded. The third log file is an error log that has a ".err" extension by default and records warning and error conditions that occur.

Table 1.8. Log Files in Madeline.
Type of File  Default Name  Purpose
------------  ------------  ----------------------------------------------------
Summary       madeline.log  Records commands and summaries of execution results.
Detail        madeline.dtl  Records details regarding inclusion and exclusion of
                            individuals and pedigrees.
Error         madeline.err  Records warning and error conditions.

 

Display of Warning and Error Levels

If manageable errors do occur when a new pedigree database is opened, Madeline’s interactive "M>" prompt changes to display the number and type of error conditions detected. For example, "1 SYNTAX ERROR 10 WARNINGS M>" would indicate that one syntax error and 10 manageable error conditions or warnings occurred. Altogether, the program maintains four categories of warnings and errors:
  1. Syntax errors
  2. Warnings
  3. Severe Warnings
  4. Fatal Errors

A syntax error refers to an error in typing a command on the command line or in a batch file. A warning often indicates a manageable database error such as having only one instead of both parents listed in a database. A severe warning indicates a more severe type of database error such as having a male listed as the mother of an individual. Madeline will try to manage this type of situation, for example by changing the sex of the "male" mother to female. Such a change does not guarantee that the situation is remedied, much less correct: later in the same database, the "male" mother may turn out to be listed as the "father" of another child! This would cause a fatal error, causing the program to terminate, because there is no way to rectify such inconsistent information. The warning and error conditions may be reviewed in the error log.
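The escalation from a severe warning to a fatal error in the "male mother" example can be sketched as follows. This Python example is an assumption-laden illustration, not Madeline's algorithm: the names FatalPedigreeError and reconcile_parent_sex are hypothetical, and the sketch simply records which parental role each ID has been seen in, corrects a recorded sex that disagrees with that role, and stops when the same ID appears in both roles.

```python
class FatalPedigreeError(Exception):
    """Raised when parental roles cannot be reconciled (hypothetical name)."""


def reconcile_parent_sex(sex, children):
    """Correct parental sex codes and collect severe warnings.

    `sex` maps an ID to "M" or "F"; `children` is a list of
    (father_id, mother_id) pairs, where either entry may be None.
    Returns a corrected copy of `sex` and a list of warning strings.
    """
    corrected = dict(sex)
    required = {}  # sex implied by the parental role seen so far
    warnings = []
    for father, mother in children:
        for parent, needed in ((father, "M"), (mother, "F")):
            if parent is None:
                continue
            if required.setdefault(parent, needed) != needed:
                # Listed as a father in one record and a mother in another.
                raise FatalPedigreeError(
                    f"{parent} is listed as both father and mother"
                )
            if corrected.get(parent) not in (None, needed):
                warnings.append(f"severe warning: sex of {parent} changed to {needed}")
                corrected[parent] = needed
    return corrected, warnings
```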

Pedigree Reconstruction and the Categorization of Individuals

When a pedigree database is opened, Madeline reconstructs pedigrees based on the core data fields. When records for the parents of non-founder individuals are absent from the database, Madeline dummies-in the parents using the IDs shown in the FatherIDField and MotherIDField. If one of the two parental IDs is missing, Madeline creates a random ID for the missing parent. Random IDs are always eight characters in length and begin and end with an exclamation point (e.g., "!EW12M5!", "!G79ER5!", etc.) to facilitate recognition.

 When FUSION IDs are used, Madeline dummies-in parents even when parental IDs are not provided in the FatherIDField and MotherIDField, and joins together spouses when they don’t have any offspring.

After reconstructing pedigrees, Madeline classifies individuals into categories (Table 1.9) and summarizes their distribution in a table (Fig. 1.9). Attached individuals are individuals in the database who have either parents, or offspring, or both. Unattached individuals are in the database, but remain unconnected because they don’t have parents or offspring. Unattached individuals often represent a set of unrelated controls in a data set.
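The attached and unattached categories follow directly from the parental ID fields. The Python sketch below is a simplified illustration under stated assumptions: the function name and record layout are hypothetical, and the childless spouse category is left out.

```python
def classify_individuals(records):
    """Classify individuals from their parental links.

    `records` maps an individual ID to a (father_id, mother_id) pair, with
    None for a missing parent. Returns a dict mapping each ID to
    "attached", "unattached", or "not in database".
    """
    named_parents = set()
    for father, mother in records.values():
        named_parents.update(p for p in (father, mother) if p)

    categories = {}
    for individual, (father, mother) in records.items():
        has_parents = bool(father or mother)
        has_offspring = individual in named_parents
        categories[individual] = (
            "attached" if has_parents or has_offspring else "unattached"
        )
    # Parents who are named but have no record of their own.
    for parent in named_parents - set(records):
        categories[parent] = "not in database"
    return categories
```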

 In the current version, childless spouses can only be detected when FUSION IDs are employed. When a FUSION couple does not have children listed in the database, usually one of the individuals has other connections to the pedigree and falls into the attached category, while the remaining spouse usually has no other connections to the pedigree and so is categorized as a childless spouse.

Table 1.9. Classes of Individuals in Madeline.
Category             Description
-------------------  ----------------------------------------------------------
In Database:
  Attached           Individuals in the database who have parents and/or
                     offspring.
  Childless Spouses  Married individuals in the database who do not have
                     children and who are not otherwise attached to a pedigree.
                     In the current version, Madeline detects marriages without
                     offspring only when FUSION IDs are employed.
  Unattached         Individuals in the database who remain unconnected. These
                     may be controls.
Not In Database:
  Not In Database    Parents without records in the database who are inserted
                     by Madeline.

 

M>open '\test\test.dbf'
         .
         .
         .
-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total
-----------------------------  --------- --------- ---------
Pedigrees ...................        590         0       590
Individuals .................      3,317         0     3,317
 + In database ..............      2,178         0     2,178
 |  + Attached ..............      2,164         0     2,164
 |  + Childless spouses .....         14         0        14
 |  + Unattached ............          0         0         0
 + Not in database ..........      1,139         0     1,139
M>

Fig. 1.9. Summary table of pedigree count and distribution of individuals by category in Madeline. After a database is opened and pedigrees reconstructed, Madeline displays a table showing the number of pedigrees and distribution of individuals by category.

Madeline provides _unattached, _ChildlessSpouse, and _InDatabase as references which return boolean status information regarding the categorization of an individual. These references can be easily used in queries to find out about the categorization of individuals (Fig. 1.10).

M>view for _ChildlessSpouse
0007-500 in 0007 (rec. no.    42) * childless spouse *
0049-500 in 0049 (rec. no.   276) * childless spouse *
0409-500 in 0409 (rec. no.  2433) * childless spouse *
0442-500 in 0442 (rec. no.  2628) * childless spouse *
0497-500 in 0497 (rec. no.  2912) * childless spouse *
1040+500 in 1040 (rec. no.  3917) * childless spouse *
1360+500 in 1360 (rec. no.  4853) * childless spouse *
1366+500 in 1366 (rec. no.  4862) * childless spouse *

8 individuals in 8 pedigrees matched as follows:

Individuals ..............          8
 + In database ...........          8
 |  + Attached ...........          0
 |  + Childless spouses ..          8
 |  + Unattached .........          0
 + Not in database .......          0
M>

Fig. 1.10. References returning boolean status information about individuals, such as _ChildlessSpouse, can be easily incorporated into queries in Madeline.

Data Classifications of Individuals

Before writing a file in a specific format using the write command, Madeline determines which individuals in a pedigree have data that can be used in an analysis of that pedigree. Madeline does this by examining the phenotype "Po" and genotype "Go" fields toggled on for output. Madeline uses this information when deciding which individuals are required in output. This is described in more detail in Data Evaluation and Management.

After the file has been written, Madeline displays a summary table showing the distribution of included and excluded pedigrees and individuals by category (Fig. 1.11). In this table, Madeline sub-categorizes attached individuals based on whether they have data or not, or have been otherwise marked for exclusion by the user. Note that individuals marked for exclusion may actually be included in output, but without their data, in order to preserve pedigree structure.

M>write to '\test\test.ped' in genehunter format
          .
          .
          .
-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total
-----------------------------  --------- --------- ---------
Pedigrees ...................        574        16       590
Individuals .................      3,247        70     3,317
 + In database ..............      2,140        38     2,178
 |  + Attached ..............      2,140        24     2,164
 |  |  + With data ..........      2,139        15     2,154
 |  |  + Without data .......          1         9        10
 |  |  + Marked for exclusion          0         0         0
 |  + Childless spouses .....          0        14        14
 |  + Unattached ............          0         0         0
 + Not in database ..........      1,107        32     1,139
M>

Fig. 1.11. Summary table after a write command in Madeline. Madeline displays a summary table showing the distribution of included and excluded pedigrees and individuals by category. Attached individuals (in bold) are sub-categorized based on whether they have data or not, or have been marked for exclusion by the user.

 

Twin Management

When present, Madeline relies on information contained in the MZTwinField, DZTwinField, and DateOfBirthField to evaluate monozygotic and dizygotic twinships. When the optional DateOfBirthField is included, Madeline verifies that birth dates of twins match. Verification is extended to dizygotic twins when the optional DZTwinField is also included.

When the DateOfBirthField is included, Madeline looks for twins who are not marked in either the MZTwinField or the DZTwinField (if present). Apparent twins of opposite sex are categorized as dizygotic twins. Apparent same-sex twins are assigned to a special twin of unknown type category. Twins whose type is unknown are shown with a question mark between them in pedigree drawings.
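The rule for apparent twins can be sketched as a grouping of individuals by parents and date of birth. The Python example below is an illustration under stated assumptions: the function name and record layout are hypothetical, and requiring both parents to be known and shared is an assumption, since the text above says only that detection is based on date of birth.

```python
from collections import defaultdict
from itertools import combinations


def find_unmarked_twins(individuals):
    """Return (id_a, id_b, kind) for unmarked sibling pairs sharing a birth date.

    Each individual is a dict with keys: id, father, mother, sex ("M"/"F"),
    dob (any hashable value, or None), and twin (True if already marked).
    """
    sibships = defaultdict(list)
    for person in individuals:
        # Skip people who are already marked or lack the needed information.
        if person["twin"] or person["dob"] is None:
            continue
        if person["father"] is None or person["mother"] is None:
            continue
        key = (person["father"], person["mother"], person["dob"])
        sibships[key].append(person)

    found = []
    for group in sibships.values():
        for a, b in combinations(group, 2):
            # Opposite sex implies dizygotic; same sex cannot be resolved.
            kind = "DZ" if a["sex"] != b["sex"] else "unknown"
            found.append((a["id"], b["id"], kind))
    return found
```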

If Madeline encounters single, unpaired individuals marked as twins in the MZTwinField or DZTwinField, the program automatically removes the twin flag and informs the user of the change. The flag is only altered in memory -- the data table itself remains unchanged.

Messages about twinships are recorded in the summary and detail log files.

Consanguinity

Madeline automatically detects consanguinity in pedigrees. Messages about consanguinity are recorded in the summary and detail log files.

Multiple Mates

There is no limit to the number of spouses that an individual in a pedigree may have. Pedigree drawings can display up to 10 spouses of a single individual.

Multiple Original Founders

Madeline can model pedigrees having multiple original founders. When the DividedPages flag is on (the default), Madeline's draw command will draw pedigrees consisting of an ancestral founder with one or more founding spouses on a single virtual page. Pedigrees consisting of two or more founding ancestral mate groups will be printed on multiple virtual pages. (Whether a single virtual page is printed on one or more physical pages depends on the setting of orientation and the unscaled dimensions of the drawing.)

Data Evaluation And Management

Prior to writing output in a specific format, Madeline determines which individuals in a pedigree have data that can be used for analysis by examining the genotype "Go" fields and, if appropriate, the phenotype "Po" and covariate "Vo" fields toggled on for output.

In general, an individual is considered to have genotype data if he or she is typed for at least one marker among the set of "Go" fields. If applicable, an individual is considered to have phenotype data if all of his or her "Po" and "Vo" fields are non-missing.
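The two rules differ in an important way: genotype data requires any one typed marker, while phenotype data requires every selected field to be present. The Python sketch below makes the contrast explicit. The function names are hypothetical, and treating None and the empty string as missing values is an assumption made for this example.

```python
_MISSING = (None, "")  # assumed missing-value codes for this sketch


def has_genotype_data(record, go_fields):
    """True if at least one "Go" field holds a genotype."""
    return any(record.get(field) not in _MISSING for field in go_fields)


def has_phenotype_data(record, po_fields, vo_fields=()):
    """True if every "Po" and "Vo" field is non-missing."""
    fields = list(po_fields) + list(vo_fields)
    return all(record.get(field) not in _MISSING for field in fields)
```

One practical consequence is that toggling on an extra phenotype field with many missing values can reduce the number of individuals counted as having phenotype data, whereas toggling on an extra marker can only keep the genotype count the same or increase it.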

After flagging individuals in a pedigree who have usable data, Madeline decides whether the entire pedigree is usable or not. Madeline’s decisions depend on the specific format keyword associated with the write command. For example, using the GenehunterNpl keyword (for a non-parametric analysis) will result in a different set of pedigree exclusions than the genehunter keyword (for a parametric analysis), although there will certainly be overlap in the sets.

Only required individuals in included pedigrees are written to output. Required individuals consist of individuals who:

For example, records for unsampled parents are often required to show relationships among siblings. Terminal individuals without offspring who do not have data are excluded from output. Individuals who have been marked for exclusion by the user using the exclude command will be included, but without their data, only if they are required to maintain pedigree structure. Otherwise, they will be excluded.

It is possible to turn off Madeline's data evaluation machinery for most formats in order to include possibly unusable pedigrees and individuals in output by issuing the command set autoexclude off.

Tracking Inclusion and Exclusion of Pedigrees and Individuals

Madeline’s detail log file records which pedigrees were excluded from output. Fig. 1.12 shows an example detail log produced after requesting an output file in GenehunterNpl format.

         .
         .
         .
GenehunterPedigreeHasData(): excluding pedigree 0547: contains only a single affected 
individual.
GenehunterPedigreeHasData(): excluding pedigree 0557: contains only a single affected 
individual.
GenehunterPedigreeHasData(): excluding pedigree 0558: lacks an individual with data.
GenehunterPedigreeHasData(): excluding pedigree 0560: contains only a single affected 
individual.
GenehunterPedigreeHasData(): excluding pedigree 0572: contains only a single affected 
individual.
GenehunterPedigreeHasData(): excluding pedigree 0583: contains only a single affected 
individual.
GenehunterPedigreeHasData(): excluding pedigree 0587: contains only a single affected 
individual.
         .
         .
         .

Fig. 1.12. Excerpt from a Madeline detail log file produced after requesting output in GenehunterNpl format. Madeline’s detail log file records which pedigrees were excluded from output and why.

In addition, a draw command executed after a write command will automatically produce annotated pedigree drawings showing which individuals:

An example is shown in Fig. 1.13. In this example, the user marked individuals with a body mass index (BMI) greater than or equal to 35 for exclusion using the exclude command and then requested an output file in GenehunterNpl format.

Fig. 1.13. Annotated pedigree drawing produced by draw after a write command in Madeline. Madeline dummied-in the two founding parents, "200" and "300", who are indicated by dashed lines. They were included ("INCLUDED") in output. Two individuals, "500" and "601", were marked for exclusion by the user. The terminal individual, "601", was not included in output ("EXCLUDED"), but "500" was retained with data excluded in order to preserve pedigree structure ("DATA EXCL INDV INCL"). The remaining individuals are all annotated as having genotype data and were included in output ("HAS DATA - INCLUDED"). Affected individuals are shaded and labeled with "A", while unaffected individuals are unshaded and labeled with "U".

 

Queries and Subsetting

Madeline provides powerful mechanisms for querying and subsetting records in pedigree tables. Database management systems can generally match query criteria against only one record at a time. In contrast, Madeline is specialized for dealing with multiple relationships in a pedigree simultaneously.

Madeline provides mechanisms for referring to related records within a single query statement. In Madeline, you can reference an individual, his or her mother or father, mates, and offspring all in a single query statement.

You can also reference aggregate or summary information related to an entire sibship, such as the mean sibship value of a variable, as easily as you can reference values related to single individuals. These two mechanisms -- referencing related individuals and referencing sibship aggregate data -- make it easy to get answers to many questions in Madeline that can be tedious to obtain in general database management systems.

Referencing Internal Information About An Individual And Relatives

Madeline allows the user to look at internal information about an individual and his or her relatives using references. References are a subset of keywords which begin with an underscore character to distinguish them from similarly-named variables or fields in databases. There are two types of references:

References to Internal Information About An Individual

Madeline provides references to many items of internal information about an individual, such as the number of offspring (_noffspring) and number of mates (_nmates) an individual has, and total number of individuals in the individual's pedigree (_n). Example usage is shown in Fig. 1.14. Table 5.4 lists all references to internal information.

M>go 1901          <-- go to record no. 1901
M>show studyid     <-- display the studyid of this individual
"05100"
M>show bmi         <-- display body mass index
48.9809
M>show cpep        <-- display c peptide value
0.88
M>show _noffspring <-- display number of offspring
4
M>show _nmates     <-- display number of mates
1
M>show _n          <-- display total number of individuals in this individual’s pedigree
16
M>

Fig. 1.14. References to internal information about an individual in Madeline. Command lines shown in blue are examples of references to internal information that Madeline maintains about each individual.

References To Relatives

Madeline also maintains references which point to relatives of an individual (Fig. 1.15). The references to mates, _mate[], and offspring, _o[], are treated as arrays. Alternate references, such as _spouse for _mate[0] and _FirstChild for _o[0], are also provided for convenience.

References can be chained using the dot operator, ".", in order to access information related to more distant relatives. For example, a maternal grandmother may be referenced using _mother._mother. Example usage is shown in Fig. 1.15. A complete list of references to relatives is provided in Table 5.4.

M>go 6174                  <-- go to record no. 6174
M>show frstname            <-- first name of individual
"William"
M>show lastname            <-- last name of individual
"Goodman"
M>show _noffspring         <-- number of offspring
11
M>show _nmates             <-- number of spouses
1
M>show _mate[0].frstname   <-- first name of spouse
"Tessie"
M>show _FirstChild.dob     <-- date of birth of first listed child
{Thursday, May 30, 1957}
M>show _SecondChild.dob    <-- date of birth of second listed child
{Monday, December 19, 1966}
M>show _o[10].dob          <-- date of birth of eleventh listed child
{Sunday, January 25, 1953}
M>show _mother._mother.dob <-- date of birth of maternal grandmother (unknown)
{ }
M>show _mother._mother.lastname <-- last name of maternal grandmother
"Toughwoman"
M>

Fig. 1.15. Using References to Relatives in Madeline. Command lines using references to relatives are shown in blue. Note that children in the offspring vector are sorted by IndividualIDField, not by date of birth.
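The effect of chaining references with the dot operator can be sketched as a walk along links between records. The Python example below is a conceptual illustration only: it stores records as nested dictionaries, uses a hypothetical function name, does not handle the array forms _mate[] and _o[], and returns None when a link is missing, which is an assumption because the manual does not describe that case here.

```python
def resolve(record, chain):
    """Follow a dotted chain such as "_mother._mother.lastname".

    Returns None as soon as a link in the chain is missing.
    """
    current = record
    for name in chain.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(name)
    return current
```

With a record whose _mother entry points to another record, resolve(record, "_mother._mother.lastname") walks two generations up before reading the field, mirroring the _mother._mother example in Fig. 1.15.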

Aggregate Functions

In addition to references to individual information and relatives, Madeline provides aggregate functions that allow one to look at aggregate or summary information -- such as means and standard deviations -- of the offspring of an individual (Fig. 1.16).

M>go 1577            <-- go to record no. 1577
M>show studyid       <-- display studyid
"044301"
M>show _noffspring   <-- display number of offspring
2
M>show _o[0].bmi     <-- body mass index of first child
31.1327
M>show _o[1].bmi     <-- body mass index of second child
32.7896
M>show _omean(bmi)   <-- mean body mass index of offspring
31.9612
M>show _ostddev(bmi) <-- standard deviation of offspring bmi
1.17156
M>

Fig. 1.16. Aggregate Functions In Madeline. Aggregate functions (blue) allow one to look at summary information such as means and standard deviations of the offspring of individuals.

 

All aggregate functions take as an argument an expression which evaluates to a numeric result. Table 6.2 lists the aggregate functions available in Madeline.
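The offspring mean and standard deviation in Fig. 1.16 can be checked by hand. The Python sketch below recomputes them from the two BMI values shown in the figure. The function names omean and ostddev merely echo Madeline's _omean and _ostddev and are not its implementation. Using the sample (n-1) standard deviation and skipping None values are assumptions; the sample form is chosen because it gives a result close to the value displayed in the figure, with the small remaining difference presumably due to rounding of the displayed BMI values.

```python
from statistics import mean, stdev


def omean(values):
    """Mean of the non-missing offspring values, or None if there are none."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def ostddev(values):
    """Sample (n-1) standard deviation; needs at least two values."""
    present = [v for v in values if v is not None]
    return stdev(present) if len(present) >= 2 else None


# BMI values of the two offspring shown in Fig. 1.16.
offspring_bmi = [31.1327, 32.7896]
```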

Query and Subsetting Commands

The view command retrieves a subset of records that match query criteria. The exclude command allows the user to mark a subset of records for exclusion from output. The unexclude command performs the opposite function -- unmarking a subset of records previously marked for exclusion. Starting with version 0.90, the draw command can now also be invoked with a query expression in order to draw a subset of pedigrees. Example usage is shown in Fig. 1.17.

M>view for _noffspring>=3 and _omean(bmi)>=50
2113-100 in 2113 (rec. no.    32)
2113-500 in 2113 (rec. no.    35)

2 individuals in 1 pedigree matched as follows:

Individuals ..............          2
 + In database ...........          2
 |  + Attached ...........          2
 |  + Childless spouses ..          0
 |  + Unattached .........          0
 + Not in database .......          0

M>exclude for _noffspring>=3 and _omean(bmi)>=50
2113-100 has been marked for exclusion
2113-500 has been marked for exclusion

2 individuals in 1 pedigree marked for exclusion as follows:

Individuals ..............          2
 + In database ...........          2
 |  + Attached ...........          2
 |  + Childless spouses ..          0
 |  + Unattached .........          0
 + Not in database .......          0
M>draw pedigrees for _noffspring>=3 and _omean(bmi)>=50
1 pedigree in result set
calling "gs madeline.ps"
M>

Fig. 1.17. Query and Subsetting Commands in Madeline. In this example, the view command is used to identify parents having three or more offspring whose mean body mass index is greater than or equal to 50. The query result set contains one pair who are excluded using exclude. The draw command is then invoked with the same query expression in order to draw the relevant pedigree. The command draw pedigree '2113' could also have been used.

 

Pedigree Drawings

Madeline's draw command produces drawings of pedigrees using Adobe Postscript language procedures and document structuring conventions (Fig. 1.18).

Fig. 1.18. An example pedigree drawn by Madeline. In this example, two categorical variables indicating disease conditions are graphically displayed on the left and right halves of the icons. The status of the first condition, on the left side, is coded using "U" for unaffected and "A" for affected. On the right side, the status of the second condition is coded using "U" for unaffected, "M" for moderate, and "S" for severe. Missing values are indicated by dots, ".". The icon drawn with a dashed line perimeter indicates an individual whose record was not found in the database. No ID was provided in the FatherIDField of the gender-unknown offspring, and so the program has assigned a random ID of !21A3F8! to the missing father. (The displayed data were invented to illustrate the drawing capabilities of the program).

 

Pedigree drawings can display any number of field variables present in a dataset. The toggle command is used to select fields for inclusion on a pedigree drawing. Toggle output flags toggles which fields appear as labels under the icons on a pedigree drawing. The set field order command is used to order selected fields within their respective categories, "C", "P", or "G". On drawings, core "Co" fields always appear first, followed by phenotype "Po" fields, and finally genotype "Go" fields.

Toggle icon flags toggles on or off the set of categorical variables to be displayed graphically by shading or coloring regions of the male and female icons. Madeline divides the icon into pie-slice shading regions based on the number of categorical variables selected. The program does not impose a limit on the number of categorical variables that can be graphed simultaneously.

The manner in which subtrees are divided across pages, the paper orientation, size, margins, and color may all be set using various set commands. When DividedDrawings is set on (the default), subtrees of a pedigree originating from different founding ancestor groups are printed on separate pages. Orientation may be set to portrait, landscape, automatic, or MultiPage. When orientation is set to automatic or MultiPage, Madeline decides on the orientation of individual pedigrees depending upon the width and height of each drawing. In the event that a drawing would require excessive reduction to fit on a single page, Madeline will automatically include Postscript commands to print the drawing in poster-style across several physical pages.

Madeline's Postscript drawing routines are efficient, typically permitting the construction of hundreds of drawings per second on a modern Sun SparcStation or Intel Pentium machine. In order to view the drawings on screen, the user needs to assign the name of a Postscript viewing application (such as GhostView, GV or GSView) to Madeline's PostscriptViewer variable (Fig. 1.17). This can be done in the autorun.bat file.


  14.GAD_DUP    Po__7   33.D20S477    Go_19   52.D20S840    Go_38
  15.D20S103    Go__1   34.D20S874    Go_20   53.D20S120    Go_39
  16.D20S117    Go__2   35.D20S195    Go_21   54.D20S100    Go_40
  17.D20S906    Go__3   36.D20S909    Go_22   55.D20S102    Go_41
  18.D20S193    Go__4   37.D20S107    Go_23   56.D20S171    Go_42
  19.D20S889    Go__5   38.D20S170    Go_24   57.D20S173    Go_43
 M>toggle output flags for 1,3-6,20-57
 M>PostscriptViewer="gv"
 M>draw pedigrees for bmi>=45
 Drawing pedigree 0009, 0009-300's subtree (page 1 of 1) ...
 Drawing pedigree 0086, 0086-300's subtree (page 1 of 1) ...
 Drawing pedigree 0213, 0213+300's subtree (page 1 of 1) ...
 Drawing pedigree 0235, 0235-300's subtree (page 1 of 1) ...
 Drawing pedigree 0305, 0305-300's subtree (page 1 of 1) ...
 Drawing pedigree 0322, 0322-300's subtree (page 1 of 1) ...
 Drawing pedigree 0547, 0547-300's subtree (page 1 of 1) ...
 Drawing pedigree 0572, 0572-300's subtree (page 1 of 1) ...
 Drawing pedigree 0808, 0808+300's subtree (page 1 of 1) ...
 Drawing pedigree 1082, 1082-300's subtree (page 1 of 1) ...
 Drawing pedigree C161, C161+500's subtree (page 1 of 1) ...

 11 pedigrees in result set.

 Calling "gv madeline.ps" ...
 M>
 
 

Fig. 1.17. Drawing pedigrees in Madeline. Toggle output flags specifies which fields will appear on the pedigree drawings. Draw pedigrees for ... specifies a subset of pedigrees that match the query criteria. Madeline calls the Postscript viewing application named in PostscriptViewer (gv in the Linux environment shown).

Producing Output Files for Analysis

The write command is used to produce locus, pedigree, and control or parameter files for analysis. Keywords like Mendel and GenehunterNpl are used to specify the analysis file format.

For most formats which require a control or parameter file, a single write command suffices to produce both the pedigree and control file. In these cases, the control file often contains the required locus information. For some other formats, the command write locus file is used to produce the locus file separately from the write pedigree command used to create the pedigree file. Section 4, Write Formats, documents the procedure required for supported formats.


Section 2
Tutorial

Introduction to the Tutorial

Madeline is easy to use once you see how it works. The goal of this section is to enable you to use Madeline to accomplish real tasks in a very short time. An instructive command file is shown in Fig. 2.1. Comment lines begin with two forward slashes, "//". Command lines are shown in bold. The effect of each command or group of commands is described in turn.

// Assign log files:
LogFile='chr8.log'
DetailFile='chr8.dtl'
ErrorFile='chr8.err'
quiet
system "dir \databases\chr8.*"
// Map missing value indicators:
list nmv
nmv[0]=-1
nmv[1]=-9
list nmv
// Map core field names:
GenderField='GENDER'
AffectionStatusField="AFFECTSTAT"
// Map codes used in core fields:
list csv
csv[_female]='FEMALE'
csv[_male]='MALE'
list csv
// Load a database containing genetic maps:
load '\maps\emap.dbf'
list map for chromosome=8
// Open pedigree database:
open '\databases\chr8.dbf'
// toggle off output of phenotype fields:
toggle output flag for bmi
list fields
// Example 1: Create files for Mendel USERM13 analysis:
write locus file to '\analysis\mendel.loc' in mendel format
write pedigree file to '\analysis\userm13.ped' in userm13 format
// Example 2: Create files for Genehunter non-parametric linkage analysis:
write locus file to '\analysis\ghnpl.loc' in genehunter format
write pedigree file to '\analysis\ghnpl.ped' in genehunternpl format
// Example 3: Create files for Siblink affected sib pair analysis:
// First, mark some individuals for exclusion:
exclude for bmi>=35
write to '\analysis\asp.ped' in SiblinkAffectedPairs format
// Draw pedigrees:
list fields
toggle output flags for 2-5, bmi, affectstat, 12-20
list fields
drawingfile='pedigrees.ps'
set color off
set orientation to automatic
set papermargin to 1.5
AffectstatLabel[0]="U"
AffectstatLabel[1]="A"
draw pedigrees '0001'-'0005','0472','0570'
// End session:
goodbye

Fig. 2.1. Example Madeline command file.

 

This tutorial includes sample commands to map missing values, assign core field names, and designate codes used in core fields. These commands are typically required, but some of them will not be needed when FUSION data are used. Madeline is generally quite flexible about the order in which commands are executed. The tutorial presents a recommended command sequence.

Assigning Log Files

LogFile, DetailFile, and ErrorFile store the names of the summary, detail, and error logs. By default, LogFile is set to "madeline.log", DetailFile to "madeline.dtl", and ErrorFile to "madeline.err". If the default names are used, these files will be overwritten each time you start Madeline. When you provide new assignments (Fig. 2.2), the current contents of the log files are copied to the new files, and all subsequent messages are redirected to the new files. Reassignment of the log and detail files should be done at the beginning of a session.

M>LogFile='chr8.log'
LogFile has been changed from "madeline.log" to "chr8.log"
M>DetailFile='chr8.dtl'
DetailFile has been changed from "madeline.dtl" to "chr8.dtl"
M>ErrorFile='chr8.err'
ErrorFile has been changed from "madeline.err" to "chr8.err"
M>

Fig. 2.2. Reassigning summary, detail, and error log file names in Madeline.

 

Quiet

By default, Madeline is in verbose mode. In verbose mode, all messages, both summary and detail log messages, are sent to the screen. Writing many messages to the screen slows the program down a bit and may be distracting, so Madeline supports two quieter levels. When quiet is issued, summary log messages continue to be printed to the screen, but detail log messages are suppressed from the screen. When silent or silence is issued, neither summary nor detail messages appear on the screen. Error messages are always printed to screen regardless of the verboseness setting. To return from a quiet state to the default, issue verbose. Under all circumstances, messages continue to be printed to the summary and detail log files, as appropriate. Quiet mode is recommended on platforms such as DOS32 and Windows that lack scrollable terminal window buffers.
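
The routing rules just described can be summarized as a small lookup table. The Python sketch below is illustrative only; the names ON_SCREEN, LOG_VARIABLE, and destinations are hypothetical, and the pairing of error messages with ErrorFile is an assumption based on the description of the log file variables.

```python
# Which message kinds reach the screen at each verbosity level (from the text).
ON_SCREEN = {
    "verbose": {"summary", "detail", "error"},
    "quiet":   {"summary", "error"},
    "silent":  {"error"},
}

# Assumption: each message kind is logged to the file named by this variable.
LOG_VARIABLE = {"summary": "LogFile", "detail": "DetailFile", "error": "ErrorFile"}


def destinations(mode, kind):
    """Return where a message of the given kind is written in the given mode."""
    targets = [LOG_VARIABLE[kind]]      # log files always receive the message
    if kind in ON_SCREEN[mode]:
        targets.append("screen")
    return targets


for mode in ("verbose", "quiet", "silent"):
    print(mode, {kind: destinations(mode, kind) for kind in LOG_VARIABLE})
```

Only the screen output changes between modes; the log file destination is the same in every row.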

System "dir \databases\chr8.*"

The system command transfers a quoted-string command to the operating system shell. This allows the user to obtain directory and file information, copy or move files, or run other software without having to exit Madeline. System is especially useful when you need to obtain file or directory information using the DOS dir command or the UNIX ls command.

Mapping Missing Value Indicators

Nmv is the abbreviated name for the NumericMissingValue array. The list command instructs Madeline to list the elements of the array (Fig. 2.3).

M>list nmv
NMV has 1 element:
NMV[ 0]=         -9999
M>nmv[0]=-1
M>nmv[1]=-9
M>list nmv
NMV has 2 elements:
NMV[ 0]=            -1
NMV[ 1]=            -9

Fig. 2.3. Mapping missing value indicators in Madeline.

 

By default, nmv[] contains a single element, -9999, which is a default missing value indicator used in the FUSION study. The assignment nmv[0]=-1 overwrites the value of the first cell with -1. The assignment nmv[1]=-9 assigns -9 to the second cell, automatically expanding the array if necessary. -1 and -9 will now be automatically recognized as missing value indicators when subsequently reading values in a database. Madeline’s self-expanding arrays do not impose a limit on the number of missing value indicators which may be used in a database.
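
The behavior of a self-expanding missing value array can be sketched in Python as follows. This is not Madeline's implementation: the class SelfExpandingArray, the helper clean, and the sample BMI values are hypothetical, and filling skipped cells with None is an assumption.

```python
class SelfExpandingArray:
    """List-like array that grows when a cell past the end is assigned."""

    def __init__(self, initial):
        self.cells = list(initial)

    def __setitem__(self, index, value):
        # Assumption: cells skipped over by a distant index are filled with None.
        while len(self.cells) <= index:
            self.cells.append(None)
        self.cells[index] = value

    def __contains__(self, value):
        return value in self.cells

    def __len__(self):
        return len(self.cells)


nmv = SelfExpandingArray([-9999])   # default FUSION missing value indicator
nmv[0] = -1                         # overwrite the first cell
nmv[1] = -9                         # expands the array to two cells


def clean(value):
    """Return None for a missing value indicator, otherwise the value itself."""
    return None if value in nmv else value


bmi_column = [27.4, -9, 31.0, -1]   # invented values, for illustration only
print([clean(v) for v in bmi_column])
```

Note that after nmv[0] is overwritten, -9999 is no longer in the array, so it would be read as an ordinary number unless it is assigned to another cell.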

Mapping Core Field Names

In a general setting, the names of core fields in a pedigree database may differ from the default names used in Madeline, which are based on field names encountered in the FUSION study. Assignments to the appropriate core field name variables (Fig. 2.4) instruct Madeline to recognize core field names when a pedigree database is opened subsequently. Madeline will automatically capitalize and truncate field names to 10 letters if necessary.

M>GenderField='GENDER'
M>AffectionStatusField="AFFECTSTAT"

Fig. 2.4. Mapping Core Field Names in Madeline.
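
The capitalization and truncation of field names can be pictured with a one-line Python sketch. The function name normalize_field_name is hypothetical, and stripping surrounding spaces is an assumption not stated in the text.

```python
def normalize_field_name(name, max_len=10):
    """Capitalize a field name and truncate it to max_len characters."""
    # Assumption: leading and trailing spaces are dropped before truncation.
    return name.strip().upper()[:max_len]


print(normalize_field_name("gender"))           # GENDER
print(normalize_field_name("AffectionStatus"))  # AFFECTIONS
```

A consequence worth noting is that two long names sharing their first 10 letters would become indistinguishable after truncation.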

 

Mapping Codes Used In Core Fields

Arbitrary sets of codes may be used to represent core categorical information such as gender or affection status. Assignments to the appropriate arrays instruct Madeline to recognize study codings correctly. Fig. 2.5 shows how to tell Madeline to recognize the gender codes "MALE" and "FEMALE" in a database in place of the default codes "M" and "F". By using the symbolic constants _female and _male to index the array, you don't have to remember specifically which cell is reserved for which sex.

M>list csv
CSV has 2 elements:
CSV[ 0]="M"
CSV[ 1]="F"
M>csv[_female]='FEMALE'
M>csv[_male]='MALE'
M>list csv
CSV has 2 elements:
CSV[ 0]="MALE"
CSV[ 1]="FEMALE"

Fig. 2.5. Mapping codes used in core fields in Madeline.

 

Loading a Database of Genetic Maps

The load command (Fig. 2.6) loads a table containing genetic maps for one or more chromosomes. The map table can be in any of the supported input database formats. It may contain only one map for each chromosome. The map table must contain fields of information specifying the chromosome, the rank or ordinal position of the marker within the map for a given chromosome, the name of the marker, and the position of the marker in centiMorgans.

After load, Madeline will indicate that marker maps have been installed. You can view a map by issuing list map for chromosome=n, where n is a valid chromosome number (the human X chromosome may be designated by 23). To obtain a listing of all markers for all chromosomes present in the table, issue list map by itself.

M>load '\maps\emap.dbf'
Marker maps based on k:\emap\emap.dbf are now installed.
M>list map for chromosome=8
Marker Name Ch Or Position
----------- -- -- --------
D8S504       8  1    0.0000
D8S550       8  2   15.1000
D8S258       8  3   30.1000
D8S283       8  4   55.0000
Beta3        8  5   59.8000
D8S285       8  6   66.4000
D8S260       8  7   71.3000
D8S530       8  8   80.7000
D8S270       8  9   94.4000
D8S276       8 10  105.0000
GATA101F01   8 11  111.4000
D8S514       8 12  122.2000
D8S284       8 13  135.3000

Fig. 2.6. Loading a database containing genetic maps in Madeline.
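
The four required columns of a map table, and what a listing for one chromosome amounts to, can be sketched in Python. The record type Marker and the functions list_map and intervals are hypothetical; the rows are the first four markers of Fig. 2.6, deliberately entered out of order to show the effect of sorting by rank.

```python
from collections import namedtuple

# One row of a map table: the four pieces of information the text requires.
Marker = namedtuple("Marker", "name chromosome rank position_cm")

MAP_TABLE = [
    Marker("D8S258", 8, 3, 30.1),
    Marker("D8S504", 8, 1, 0.0),
    Marker("D8S283", 8, 4, 55.0),
    Marker("D8S550", 8, 2, 15.1),
]


def list_map(table, chromosome=None):
    """Return markers ordered by chromosome and rank, optionally for one chromosome."""
    rows = [m for m in table if chromosome is None or m.chromosome == chromosome]
    return sorted(rows, key=lambda m: (m.chromosome, m.rank))


def intervals(markers):
    """Return the distance in centiMorgans between each pair of adjacent markers."""
    return [(a.name, b.name, round(b.position_cm - a.position_cm, 4))
            for a, b in zip(markers, markers[1:])]


chromosome_8 = list_map(MAP_TABLE, chromosome=8)
print([m.name for m in chromosome_8])
print(intervals(chromosome_8))
```

Inter-marker distances like these are the kind of map information that a locus file for a multipoint analysis needs in addition to allele frequencies.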

 

Toggling Fields

The USERM13, Genehunter, and Siblink pedigree files that will be written subsequently do not include phenotype information. With the exception of core "C" fields, which Madeline controls, it is imperative to toggle off all fields in the database which should not be included in the output and which should not be considered when Madeline decides whether an individual or pedigree contains sufficient data for output. This is done using the toggle command (Fig. 2.7). The list fields command can then be used to verify that the correct subset of fields was turned off.

// toggle off output of phenotype fields:
M>toggle output flag for bmi
Note: genotype fields ordered according to current map
M>list fields
  1.STUDYID    Co__1    8.BMI        P       15.D8S276     Go__9
  2.GENDER     Co__2    9.D8S504     Go__1   16.D8S283     Go__4
  3.FATHER     Co__3   10.D8S550     Go__2   17.D8S285     Go__5
  4.MOTHER     Co__4   11.D8S258     Go__3   18.D8S260     Go__6
  5.TWIN       Co__5   12.GATA101F01 Go_10   19.D8S530     Go__7
  6.AFFECTSTAT C       13.D8S514     Go_11   20.D8S270     Go__8
  7.DOB        C       14.D8S284     Go_12
M>

Fig. 2.7. Toggling and listing fields in Madeline. After the toggle command, field 8. BMI is no longer toggled on for output.

 

Opening a Pedigree Database

Open opens a pedigree database. Madeline's database engine seamlessly opens all supported database types on all supported platforms, allowing you to open FoxPro files on Solaris, SAS transport files on a PC, and so on. The user does not need to tell Madeline the file type. To open an ASCII flat file database, see documentation for the recognize, convert, rectify, transpose and merge commands.

When a pedigree database is opened, Madeline first categorizes fields as core "C", genotype "G", phenotype "P", or null, "*". If genotype fields are present, allele frequencies are estimated from all of the data using gene counting, ignoring family relationships (a in Fig. 2.8). If a map table is already installed and contains a map for markers in the database, the genotype fields are automatically ordered according to the map (b in Fig. 2.8). Pedigrees are reconstructed based on the core information. Madeline performs additional data operations when optional core fields such as AffectionStatusField or DateOfBirthField are included (c in Fig. 2.8). In this example, Madeline marks several apparent dizygotic twinships. Madeline also flags the AffectionStatusField, AFFECTSTAT, with a plus sign, "+", indicating that the categorical levels of AFFECTSTAT will be displayed graphically on the male and female icons in pedigree drawings. Finally, the program displays a summary table showing the count of pedigrees and distribution of individuals by category (d in Fig. 2.8).
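
Gene counting that ignores family relationships reduces to tallying every observed allele and dividing by the total. The Python sketch below illustrates the idea under stated assumptions: the function allele_frequencies is hypothetical, genotypes are represented as pairs of allele labels with None for an untyped individual, and the sample genotypes are invented.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Estimate allele frequencies by simple gene counting.

    Every typed individual contributes two alleles, regardless of how the
    individuals are related to one another.
    """
    counts = Counter()
    for genotype in genotypes:
        if genotype is None:        # untyped individual: contributes nothing
            continue
        counts.update(genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()} if total else {}


# Invented genotypes at one marker for four individuals, one of them untyped.
sample = [(1, 2), (2, 2), None, (1, 3)]
print(allele_frequencies(sample))
```

Because relatives share alleles, estimates obtained this way treat correlated observations as independent, which is why a program such as Mendel's USERM13 estimates frequencies by maximum likelihood with family relationships taken into account.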

M>open '\hold\chr8.dbf'
Calculating allele frequencies for   9. D8S504...                  (a)
	…
Calculating allele frequencies for  20. D8S270...                  (a)
Database "\hold\chr8.dbf" opened with     2,506 records
Core information read in   2.00 seconds
	…
NOTE: 0471-100 and 0471-401 now marked with "a" indicating         (c)
an apparent dizygotic twinship.
NOTE: 0570-401 and 0570-402 now marked with "a" indicating         (c)
an apparent dizygotic twinship.
Pedigrees reconstructed in   0.1780 seconds
Note: genotype fields ordered according to current map             (b)
  1.STUDYID    Co__1    8.BMI        Po__1   15.D8S276     Go__9  
  2.GENDER     Co__2    9.D8S504     Go__1   16.D8S283     Go__4  
  3.FATHER     Co__3   10.D8S550     Go__2   17.D8S285     Go__5  
  4.MOTHER     Co__4   11.D8S258     Go__3   18.D8S260     Go__6  
  5.TWIN       Co__5   12.GATA101F01 Go_10   19.D8S530     Go__7  
  6.AFFECTSTAT C    +  13.D8S514     Go_11   20.D8S270     Go__8  
  7.DOB        C       14.D8S284     Go_12 
 
-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total       (d)
-----------------------------  --------- --------- ---------
Pedigrees ...................        958         0       958
Individuals .................      3,626         0     3,626
 + In database ..............      2,506         0     2,506
 |  + Attached ..............      2,115         0     2,115
 |  + Childless spouses .....         13         0        13
 |  + Unattached ............        378         0       378
 + Not in database ..........      1,120         0     1,120

Fig. 2.8. Opening a pedigree database in Madeline. Madeline performs a series of operations when the open command is used to open a pedigree database. See text for explanation.
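
The output order numbers shown for the genotype fields in Fig. 2.8 can be reproduced from the map in Fig. 2.6 with a short Python sketch. The function output_order is hypothetical, and leaving unmapped fields out of the ordering is an assumption. The marker ranks and field names are taken from the two figures.

```python
# Rank of each marker on the chromosome 8 map (Fig. 2.6).
MAP_RANK = {
    "D8S504": 1, "D8S550": 2, "D8S258": 3, "D8S283": 4, "Beta3": 5,
    "D8S285": 6, "D8S260": 7, "D8S530": 8, "D8S270": 9, "D8S276": 10,
    "GATA101F01": 11, "D8S514": 12, "D8S284": 13,
}

# Genotype fields in the order they occur in the database (fields 9 to 20).
DB_FIELDS = [
    "D8S504", "D8S550", "D8S258", "GATA101F01", "D8S514", "D8S284",
    "D8S276", "D8S283", "D8S285", "D8S260", "D8S530", "D8S270",
]


def output_order(fields, map_rank):
    """Number the fields 1, 2, 3, ... following their rank on the map."""
    # Assumption: fields that are absent from the map are left unnumbered.
    mapped = sorted((f for f in fields if f in map_rank), key=map_rank.get)
    return {name: position for position, name in enumerate(mapped, start=1)}


order = output_order(DB_FIELDS, MAP_RANK)
for name in DB_FIELDS:
    print(f"{name:<11}Go_{order[name]:_>2}")
```

Beta3 is on the map but not in the database, so markers that follow it on the map receive an output position one lower than their map rank; D8S285, for example, has rank 6 but output order 5.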

Example 1: Creating Files for Mendel USERM13 Analysis

Mendel’s USERM13 module uses maximum likelihood methods to calculate allele frequencies, taking family relationships into consideration. All genotyped individuals in a database, including childless spouses, controls, and other singleton individuals who are classified as unattached by Madeline, can be used in an analysis.

USERM13 requires a locus and pedigree file as input. The locus file will contain allele frequency information calculated by Madeline. The pedigree file will contain the family and genotype information. The write locus file command with the generic mendel keyword creates the locus file (Fig. 2.9). The write pedigree file command with the userm13 keyword creates the pedigree file. As expected, childless spouses and a number of unattached individuals are included in the output file. The detail log file documents which individuals and pedigrees were excluded and why.

M>write locus file to '\analysis\mendel.loc' in mendel format
Locus file "\analysis\mendel.loc" has been written.
M>write pedigree file to '\analysis\userm13.ped' in userm13 format
Writing pedigree data to "\analysis\userm13.ped"
-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total
-----------------------------  --------- --------- ---------
Pedigrees ...................        810       148       958
Individuals .................      3,469       157     3,626
 + In database ..............      2,351       155     2,506
 |  + Attached ..............      2,107         8     2,115
 |  |  + With data ..........      2,107         0     2,107
 |  |  + Without data .......          0         8         8
 |  |  + Marked for exclusion          0         0         0
 |  + Childless spouses .....         13         0        13
 |  + Unattached ............        231       147       378
 + Not in database ..........      1,118         2     1,120

Fig. 2.9. Creating locus and pedigree files for a Mendel USERM13 analysis in Madeline.

 

Example 2: Creating Files for Non-parametric Linkage Analysis in Genehunter

Like USERM13, Genehunter also requires a locus and pedigree file for analysis. In addition to allele frequency information, Genehunter’s locus file will contain map distance information obtained from the previously loaded map database. The generic genehunter keyword is used to specify the locus file format (Fig. 2.10).

M>write locus file to '\analysis\ghnpl.loc' in genehunter format
Locus file "\analysis\ghnpl.loc" has been written.
M>write pedigree file to '\analysis\ghnpl.ped' in genehunternpl format
Creating associated Genehunter control file called "\analysis\ghnpl.ctl"
Writing pedigree data to "\analysis\ghnpl.ped"
-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total
-----------------------------  --------- --------- ---------
Pedigrees ...................        533       425       958
Individuals .................      3,033       593     3,626
 + In database ..............      2,003       503     2,506
 |  + Attached ..............      2,003       112     2,115
 |  |  + With data ..........      2,003       104     2,107
 |  |  + Without data .......          0         8         8
 |  |  + Marked for exclusion          0         0         0
 |  + Childless spouses .....          0        13        13
 |  + Unattached ............          0       378       378
 + Not in database ..........      1,030        90     1,120

Fig. 2.10. Creating locus and pedigree files for Genehunter non-parametric linkage analysis in Madeline.

 

The genehunternpl keyword specifies that Madeline exclude pedigrees that cannot be used in, or do not contribute to, a non-parametric linkage analysis in Genehunter. For a parametric linkage analysis, the generic genehunter keyword would have been used, which could have resulted in a different set of exclusions. Since Genehunter cannot make use of information from singleton individuals, all unattached individuals are excluded from the output file. Childless spouses are also excluded since they cannot contribute to an analysis in Genehunter.

For the Genehunter format, Madeline also creates a command file ending in a .ctl extension (ghnpl.ctl in the example). This file contains commands and parameter values for running the analysis in Genehunter (Fig. 2.11). Note that the values of Madeline’s internal variables OffEndDistance and EvaluationInterval are automatically inserted in the off end and increment distance commands. Whenever practical, Madeline produces control files in conjunction with data files for running analyses.

haplotype off
score all
ps on
off end 10.000000            <-- value from Madeline’s OffEndDistance variable
increment distance 0.500000  <-- value from Madeline’s EvaluationInterval variable
load \analysis\ghnpl.loc
scan \analysis\ghnpl.ped
total stat
\analysis\ghnpl.npl.ps
\analysis\ghnpl.lod.ps
\analysis\ghnpl.inf.ps

Fig. 2.11. Genehunter command file created by Madeline. This command file can be used directly to run the analysis. The values of Madeline’s internal variables OffEndDistance and EvaluationInterval are automatically inserted in the off end and increment distance commands.
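
How the two variable values end up in the command file can be illustrated with a Python sketch that rebuilds the text of Fig. 2.11 from a file stem and the two distances. The function genehunter_control is hypothetical and is not Madeline's code; the line layout is copied from the figure, without the annotations on the right.

```python
def genehunter_control(stem, off_end_distance, evaluation_interval):
    """Return command file text laid out like Fig. 2.11.

    stem is the output path without extension; the two distances stand in
    for Madeline's OffEndDistance and EvaluationInterval variables.
    """
    lines = [
        "haplotype off",
        "score all",
        "ps on",
        f"off end {off_end_distance:.6f}",
        f"increment distance {evaluation_interval:.6f}",
        f"load {stem}.loc",
        f"scan {stem}.ped",
        "total stat",
        f"{stem}.npl.ps",
        f"{stem}.lod.ps",
        f"{stem}.inf.ps",
    ]
    return "\n".join(lines) + "\n"


# The values 10.0 and 0.5 are those shown in Fig. 2.11.
print(genehunter_control(r"\analysis\ghnpl", 10.0, 0.5), end="")
```

Deriving every file name from one stem keeps the locus, pedigree, and PostScript output names consistent with each other.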

 

Example 3: Excluding Individuals and Creating Files for Siblink Analysis

Madeline’s exclude command marks individuals for exclusion (Fig. 2.12). Marked individuals are retained in output, but without their data, only if they are required to maintain pedigree structure, and are otherwise excluded. A summary table will be produced to show the distribution of excluded individuals.

M>exclude for bmi>=35

223 individuals in 172 pedigrees marked for exclusion as follows:

Individuals ..............        223
 + In database ...........        223
 |  + Attached ...........        212
 |  + Childless spouses ..          1
 |  + Unattached .........         10
 + Not in database .......          0

Fig. 2.12. Excluding Individuals in Madeline. After an exclude command, Madeline produces a summary table showing the distribution of excluded individuals.

 

A single write command suffices to produce all required files for certain formats, such as Siblink (Fig. 2.13) which requires a pedigree file and a control file. Because the locus information is embedded in the control file, a separate write locus file command is not required. The write pedigree file command can always be abbreviated to write, as shown in the following example.

In addition to a table labeled "ACTUAL" showing the actual number of pedigrees and individuals included in the output file, Madeline produces a second table labeled "NUCLEAR FAMILY-BASED" which shows the number of nuclear families (labeled "Pedigrees"), individuals, and sibpairs included in the output file.

M>write to '\dump\asp.ped' in SiblinkAffectedPairs format
Creating associated SIBLINK control/parameter file called "\dump\asp.ctl"
Writing pedigree data to "\dump\asp.ped"

ACTUAL:
-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total
-----------------------------  --------- --------- ---------
Pedigrees ...................        425       533       958
Individuals .................      1,791     1,835     3,626
 + In database ..............        970     1,536     2,506
 |  + Attached ..............        970     1,145     2,115
 |  |  + With data ..........        958       152     1,110
 |  |  + Without data .......         12       781       793
 |  |  + Marked for exclusion          0       212       212
 |  + Childless spouses .....          0        13        13
 |  + Unattached ............          0       378       378
 + Not in database ..........        821       299     1,120

NUCLEAR FAMILY-BASED:
-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total
-----------------------------  --------- --------- ---------
Pedigrees ...................        425       365       790
Individuals .................      1,791     2,045     3,836
 + In database ..............        970     1,746     2,716
 |  + Attached ..............        970     1,355     2,325
 |  |  + With data ..........        958       336     1,294
 |  |  + Without data .......         12       782       794
 |  |  + Marked for exclusion          0       237       237
 |  + Childless spouses .....          0        13        13
 |  + Unattached ............          0       378       378
 + Not in database ..........        821       299     1,120
-----------------------------  --------- --------- ---------
Number of Sibpairs ..........        626       898     1,524
-----------------------------  --------- --------- ---------
M>

Fig. 2.13. Creating files for Siblink analysis in Madeline. A single write command produces the required pedigree and control files. Madeline produces a second table of nuclear family-based statistics for all formats that require decomposition of full pedigrees into nuclear families.

 

In the nuclear family-based statistics, individuals who appear as offspring in one nuclear family and subsequently as founding parents of their own nuclear families are counted twice, thus leading to an apparently greater number of individuals in the second table. There appear to be fewer pedigrees overall in the second table because singleton pedigrees counted in the "ACTUAL" table are not counted in the "NUCLEAR FAMILY-BASED" table. Madeline produces this second table for all output formats requiring the decomposition of full pedigrees into nuclear pedigrees, such as Siblink and Aspex.
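
The double counting can be seen in a small Python sketch that decomposes an invented three-generation pedigree into nuclear families. The IDs, the dictionary layout, and the function names are hypothetical, and the decomposition rule (one nuclear family per distinct pair of parents) is an assumption made for illustration.

```python
from collections import defaultdict

# Invented pedigree: id -> (father, mother); founders have (None, None).
PEDIGREE = {
    "GF": (None, None), "GM": (None, None),   # grandparents
    "S1": ("GF", "GM"), "D1": ("GF", "GM"),   # their two children
    "SP": (None, None),                       # spouse of S1
    "K1": ("S1", "SP"), "K2": ("S1", "SP"),   # children of S1 and SP
}


def nuclear_families(pedigree):
    """Group offspring by parent pair: one nuclear family per pair."""
    families = defaultdict(list)
    for person, (father, mother) in pedigree.items():
        if father is not None and mother is not None:
            families[(father, mother)].append(person)
    return dict(families)


def nuclear_individual_count(families):
    """Count individuals family by family: two parents plus their offspring."""
    return sum(2 + len(children) for children in families.values())


families = nuclear_families(PEDIGREE)
print(len(PEDIGREE), "individuals in the full pedigree")
print(len(families), "nuclear families")
print(nuclear_individual_count(families), "individuals counted family by family")
```

S1 appears once as an offspring of GF and GM and once as a parent of K1 and K2, so the family-by-family count is 8 although the pedigree contains only 7 people.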

Drawing Pedigrees

The toggle command is used to specify the set of fields to appear as labels on the pedigree drawings (Fig. 2.14). Fields can be referred to by either name or number, and a range of fields can be specified using a dash. DrawingFile indicates the name of the Postscript output file which will contain the drawings. Set color off instructs Madeline to use black and white. Set orientation to automatic tells Madeline to automatically decide which orientation is best, and to divide the drawing among several physical pages if necessary. Set PaperMargin to 1.5 instructs the program to leave margins of 1.5 centimeters on all four sides of the paper. Note that dimensions must be specified in centimeters.

M>list fields
  1.STUDYID    Co__1    8.BMI        Po__1   15.D8S276     Go__9  
  2.GENDER     Co__2    9.D8S504     Go__1   16.D8S283     Go__4  
  3.FATHER     Co__3   10.D8S550     Go__2   17.D8S285     Go__5  
  4.MOTHER     Co__4   11.D8S258     Go__3   18.D8S260     Go__6  
  5.TWIN       Co__5   12.GATA101F01 Go_10   19.D8S530     Go__7  
  6.AFFECTSTAT C    +  13.D8S514     Go_11   20.D8S270     Go__8  
  7.DOB        C       14.D8S284     Go_12  
M>toggle output flags for 2-5, bmi, affectstat, 12-20
Note: genotype fields ordered according to current map
M>list fields
  1.STUDYID    Co__1    8.BMI        Po__1   15.D8S276     G      
  2.GENDER     C        9.D8S504     Go__1   16.D8S283     G      
  3.FATHER     C       10.D8S550     Go__2   17.D8S285     G      
  4.MOTHER     C       11.D8S258     Go__3   18.D8S260     G      
  5.TWIN       C       12.GATA101F01 G       19.D8S530     G      
  6.AFFECTSTAT Co__2+  13.D8S514     G       20.D8S270     G      
  7.DOB        C       14.D8S284     G      
M>drawingfile='pedigrees.ps'
M>set color off
M>set orientation to automatic
M>set papermargin to 1.5
M>AffectstatLabel[0]="U"
M>AffectstatLabel[1]="A"
M>draw pedigrees '0001'-'0005','0472','0570'
Drawing page 1 of 1 page for pedigree 0001...
Drawing page 1 of 1 page for pedigree 0002...
Drawing page 1 of 1 page for pedigree 0003...
Drawing page 1 of 1 page for pedigree 0004...
Drawing page 1 of 1 page for pedigree 0005...
Drawing page 1 of 1 page for pedigree 0472...
Drawing page 1 of 1 page for pedigree 0570...
M>

Fig. 2.14. Drawing Pedigrees in Madeline.
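
The way a field list such as 2-5, bmi, affectstat, 12-20 resolves to field numbers, and the effect of toggling those fields, can be sketched in Python. The functions parse_field_spec and toggle are hypothetical and do not represent Madeline's parser. The starting state used here is the one listed in Fig. 2.7, in which BMI has already been toggled off.

```python
FIELDS = [
    "STUDYID", "GENDER", "FATHER", "MOTHER", "TWIN", "AFFECTSTAT", "DOB",
    "BMI", "D8S504", "D8S550", "D8S258", "GATA101F01", "D8S514", "D8S284",
    "D8S276", "D8S283", "D8S285", "D8S260", "D8S530", "D8S270",
]


def parse_field_spec(spec, field_names):
    """Resolve a comma-separated list of names, numbers, and ranges to field numbers."""
    numbers = {name.upper(): i for i, name in enumerate(field_names, start=1)}
    selected = []
    for token in (t.strip() for t in spec.split(",")):
        if not token:
            continue
        if token.upper() in numbers:            # a field name, in any case
            selected.append(numbers[token.upper()])
        elif "-" in token:                      # a range such as 12-20
            first, last = (int(part) for part in token.split("-", 1))
            selected.extend(range(first, last + 1))
        else:                                   # a single field number
            selected.append(int(token))
    return selected


def toggle(fields_on, selected):
    """Flip the output flag of every selected field."""
    return fields_on ^ set(selected)


selected = parse_field_spec("2-5, bmi, affectstat, 12-20", FIELDS)
on_before = {1, 2, 3, 4, 5} | set(range(9, 21))     # output flags per Fig. 2.7
on_after = toggle(on_before, selected)
print(sorted(on_after))
print([FIELDS[i - 1] for i in sorted(on_after)])
```

Since toggle flips each flag rather than setting it, listing the fields before and after the command, as in Fig. 2.14, is the safest way to confirm the outcome.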

 

Whenever a categorical field from a table is flagged for graphical display on a pedigree drawing, Madeline associates a labels array with the field. The user may designate a short label for each level of the categorical variable. The name of the labels array is simply the name of the categorical field with the word "label" appended to the end.

In this example, AFFECTSTAT was detected as the AffectionStatusField when the database was opened and automatically flagged for icon display by the program (one can also manually flag categorical fields using the toggle icon flag command). The name of the labels array is AffectstatLabel.

Madeline assigns default labels for each level of a categorical variable. The default labels will be either the values that the categorical variable takes at each level (such as 0, 1, 2, ... etc.) or sequential letters of the alphabet. The list command can be used to view the values in the array. Often one will want to assign different labels: here, unaffected individuals will be labeled with "U" and affected individuals with "A".

A range of pedigrees may be specified as a parameter to the draw command by using a dash to separate the starting and ending pedigrees. Individual pedigrees may be separated by commas. Note that pedigree IDs are string values that must be enclosed in quotes.
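
Since pedigree IDs are strings, a range of IDs is naturally resolved by string comparison rather than by arithmetic. The Python sketch below shows one way this could work; the function select_pedigrees is hypothetical, the comparison rule is an assumption, and the list of available IDs is a small selection of IDs that appear in the figures of this manual.

```python
def select_pedigrees(all_ids, *specs):
    """Select pedigree IDs; each spec is a single ID or a (first, last) range."""
    chosen = []
    for pedigree_id in sorted(all_ids):
        for spec in specs:
            if isinstance(spec, tuple):
                first, last = spec
                # Assumption: ranges are compared as strings, not as numbers.
                matched = first <= pedigree_id <= last
            else:
                matched = pedigree_id == spec
            if matched:
                chosen.append(pedigree_id)
                break
    return chosen


available = ["0001", "0002", "0003", "0004", "0005", "0009", "0472", "0570", "C161"]
print(select_pedigrees(available, ("0001", "0005"), "0472", "0570"))
```

String comparison also explains why the quotes matter: an unquoted 0001 would be read as the number 1, which matches no pedigree ID.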

Because the draw command followed a write command, the pedigree drawings are annotated to show which individuals had data and were included in the Siblink output (Fig. 2.15). As expected, only affected individuals and their parents were included.

Fig. 2.15. Annotated pedigree drawing produced by Madeline. Individuals included in output are annotated with "INCLUDED", those contributing data are annotated with "HAS DATA".

 

Ending a Madeline Session

Goodbye (Fig. 2.16) is equivalent to quit. It terminates the current Madeline session.

M>goodbye
Releasing resources ...
Goodbye!

Fig. 2.16. Ending a Madeline session.

 


Section 3
Commands

Introduction to Commands

This section describes Madeline’s commands. Commands are presented in alphabetical order. A bold heading shows the name of each command. A second bold heading shows the syntax of the command. Note the following conventions (Table 3.1).

Table 3.1. Command syntax conventions used in this document.
Symbol Description
[ ] Square brackets indicate optional items in the syntax. For example, DRAW PEDIGREE[S] means that DRAW PEDIGREE and DRAW PEDIGREES are both valid.
< > Angled brackets indicate an expression that can be evaluated by Madeline -- see below:
<cXXX> An expression beginning with a lower-case c indicates a character or string expression. For example, DRAW PEDIGREE <cFamilyID1> means that <cFamilyID1> must be in the form of a string, such as "0341". Issuing draw pedigree 0341 would result in an error.
<nXXX> An expression beginning with a lower-case n indicates a numeric expression. List map for chromosome=<nChrNo> means that <nChrNo> must be a number like 23. List map for chromosome=23 would succeed; list map for chromosome="X" would fail.
<LXXX> An expression beginning with an upper-case L indicates a logical expression that evaluates to either _true (1), or _false (0). This is usually an expression containing an equality or inequality operator, or a series of such operators joined by and or or. For example, view for <Lexpr> indicates that the view command requires a logical expression following the word for: view for studyage<=35 is a valid example of this command.
<Field_i>, <InternalArray>, etc. Other expressions in angled brackets, such as <Field_i> or <InternalArray>, represent database field variables or Madeline's internal variables or arrays. For example, toggle output flag for <field_i> means that the name of a field is expected to follow the word for: toggle output flag for D20S889 is a valid example.
| A bar indicates the word "or", indicating that either the option preceding or following the bar is valid. For example, list fields|<InternalArray>|map indicates that list fields, list CharacterMissingValue, and list map are all valid variants of the list command.

 

A description of the command follows the syntax heading, with at least one example showing how to use the command.

 

BANNER
BANNER

Displays program banner. See: HELLO, STATUS.

M>banner

MADELINE Version  0.910
Copyright (c) 1999 by Edward H. Trager 
and the FUSION Study Group
(Finland-United States Investigation of NIDDM Genetics Study),
University of Michigan
Ann Arbor, Michigan, USA
M>

 

CLEAR
CLEAR EXCLUSIONS

Clears exclusion flags from all individuals previously marked for exclusion using the exclude command. To clear exclusion flags from only a subset of individuals, use the unexclude command. See: EXCLUDE, UNEXCLUDE.

M>exclude for bmi>=35
213 individuals in 162 pedigrees marked for exclusion as follows:
Individuals ..............        213
 + In database ...........        213
 |  + Attached ...........        212
 |  + Childless spouses ..          1
 |  + Unattached .........          0
 + Not in database .......          0
M>clear exclusions
M>

 

CONVERT
CONVERT COMMA|TAB|<OTHER> DELIMITED FILE <INPUT_FILE> [TO <OUTPUT_FILE>]

Convert converts a comma-, tab- or other-delimited file to a space-delimited, column-aligned file that can be read by the recognize command. The keyword comma or tab can be used to specify comma- or tab-delimited files, respectively. Alternatively, you can specify the delimiter within single or double quotes.

If an output file is not specified, Madeline will create an output file having the same name as the input file, but with a ".mod" (i.e., modified) extension at the end. See: RECOGNIZE.

M>convert "*" delimited file "mydata.stars" to "mydata.dat"
Converting "mydata.stars" to "mydata.dat"
3547 lines were written.      
M>
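
The conversion that convert performs can be pictured with a short Python sketch. This is only an illustration, not Madeline's own code: the function name convert_delimited is hypothetical, the use of "." for empty cells is an assumption, and values that themselves contain spaces are not handled.

```python
import csv
import io


def convert_delimited(text, delimiter=","):
    """Rewrite delimited text as space-delimited, column-aligned lines."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    if not rows:
        return ""
    ncols = max(len(row) for row in rows)
    # Assumption: empty or absent cells are written as "." (a missing value).
    padded = [
        [(cell.strip() or ".") for cell in row] + ["."] * (ncols - len(row))
        for row in rows
    ]
    # Each column is as wide as its longest value, so every line has equal length.
    widths = [max(len(row[i]) for row in padded) for i in range(ncols)]
    lines = [" ".join(cell.ljust(w) for cell, w in zip(row, widths)) for row in padded]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(convert_delimited("0001*0001-100*F\n0001*0001-200*M\n", delimiter="*"), end="")
```

Padding every column to a fixed width keeps all records the same length, which is the rectangular layout that recognize expects.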

 

DRAW
(1) DRAW PEDIGREE[S] <cFamilyID1>[,<cFamilyID2>[,<cFamilyIDa> - <cFamilyIDz>]]
(2) DRAW PEDIGREE[S] FOR <LExpression>

Draws pedigrees. Specify one or more pedigree (family) IDs separated by commas, or an alphabetically increasing range of pedigree IDs with a dash. Be sure to enclose pedigree IDs in quotes. Alternatively, a subset of pedigrees in which one or more individuals match a query expression may be drawn using draw pedigrees for <LExpression>.

Orientation, paper size, margins, and color vs. black-and-white printing may be set using set commands. The left-to-right sort order of siblings within sibships, and of multiple spouses connected to a single spouse, may be explicitly set using the sort command.

Drawings are created using efficient Adobe Postscript language routines and document structuring conventions. Output, which may consist of one to hundreds of drawings, is sent to the file named in DrawingFile. A Postscript viewing application such as Ghostview or GV on Unix/Linux or GSView on Windows is required for on-screen viewing of drawings. The name and path of the Postscript viewing application is specified in PostscriptViewer: this can be included in the autorun.bat file.

Madeline v. 1.0 can draw most single- and multiple-founder pedigrees. When DividedPages is set on (the default), subtrees in a pedigree defined by each founding ancestral group are printed on separate virtual pages. A founding ancestral group consists of an ultimate founder and his or her one to many spouses. When DividedPages is off, the entire pedigree will be drawn on a single virtual page, regardless of structural complexity. DividedPages has no effect on simple pedigrees which originate with a single founding group. For complicated pedigrees, the DividedPages option separates a pedigree into several more easily-viewed sections.

The options for orientation are landscape, portrait, automatic, and MultiPage. When orientation is set to portrait or landscape, pedigree drawings are scaled to fit the dimensions of the physical page. The scaling factor required to reduce large pedigrees to small pages may result in loss of legibility (or new corrective lenses!). In these cases automatic, or MultiPage, is preferred.

Currently, the automatic and MultiPage options are identical. Automatic is preferred over portrait or landscape in most cases. When automatic is selected, Madeline chooses the best orientation based on the dimensions of the virtual drawing. If rescaling to fit a single physical page is likely to result in reduced legibility, the program inserts a Postscript routine for printing the drawing across two or more physical pages. Madeline automatically selects the number and orientation of physical pages that requires the least amount of rescaling.

Madeline produces a schematic index for assembling the individual pages after printing. The program may use up to 5 pages across and 5 pages down, or a total of not more than 25 pages, for printing a drawing in automatic mode. Normally only 2 to 4 pages are required for large drawings.
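
The page selection described above can be sketched in Python as a small search over page grids. This is a guess at the general idea, not Madeline's actual Postscript routine: the function name choose_page_grid, the legibility threshold min_legible_scale, and the tie-breaking rule (fewest pages) are all assumptions made for illustration.

```python
def choose_page_grid(draw_w, draw_h, page_w, page_h,
                     min_legible_scale=0.75, max_pages=5):
    """Return (scale, pages_across, pages_down, orientation) for one drawing.

    All sizes use the same unit. min_legible_scale is a hypothetical
    threshold below which a single-page drawing is considered illegible.
    """
    orientations = (("portrait", page_w, page_h), ("landscape", page_h, page_w))

    # Try a single physical page first, in the orientation that keeps
    # the drawing largest.
    single = max((min(pw / draw_w, ph / draw_h), name) for name, pw, ph in orientations)
    if single[0] >= min_legible_scale:
        return (single[0], 1, 1, single[1])

    # Otherwise search grids of up to max_pages x max_pages physical pages
    # for the scale factor closest to 1, preferring fewer pages on ties.
    best_key, best = None, None
    for name, pw, ph in orientations:
        for across in range(1, max_pages + 1):
            for down in range(1, max_pages + 1):
                scale = min(across * pw / draw_w, down * ph / draw_h)
                key = (round(abs(1.0 - scale), 6), across * down)
                if best_key is None or key < best_key:
                    best_key, best = key, (scale, across, down, name)
    return best


if __name__ == "__main__":
    # A wide drawing (30 x 8 inches) on US Letter paper, margins ignored.
    print(choose_page_grid(30, 8, 8.5, 11))
```

The sketch ignores paper margins and the print order index; it only shows how a 5 by 5 limit and a "least rescaling" rule can be combined.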

Due to the way that Madeline's Postscript routines manage the splitting of a large drawing for printing across multiple physical pages, Postscript viewing applications like Ghostview or GSView will generally only display the last section, or the viewer may appear to cycle through the individual pages of a split drawing without pausing. This limitation does not impair the correct printing of such drawings on a Postscript printer.

M>draw pedigrees '0001','0033','0317','0374'-'0376'
Drawing pedigree 0001, P0001006's subtree (page 1 of 2) ...
Printing drawing scaled to 0.91.
Drawing pedigree 0001, !EM89WP!'s subtree (page 2 of 2) ...

Drawing pedigree 0033, !FVQURR!'s subtree (page 1 of 1) ...
Printing drawing scaled to 0.94.

Drawing pedigree 0317, !A7Z3FP!'s subtree (page 1 of 2) ...
Printing virtual portrait drawing scaled to 1.02 on
4 physical pages wide by 2 physical pages tall.
(You may not be able to view entire drawing in Postscript viewing application).
Physical page print order index:

[5][6][7][8]
[1][2][3][4]

Drawing pedigree 0317, !9UE3V6!'s subtree (page 2 of 2) ...
Printing drawing scaled to 0.81.

Drawing pedigree 0374, P0374021's subtree (page 1 of 3) ...
Printing drawing scaled to 0.77.

Drawing pedigree 0374, P0374015's subtree (page 2 of 3) ...
Printing virtual landscape drawing scaled to 0.98 on
2 physical pages wide by 1 physical page tall.
(You may not be able to view entire drawing in Postscript viewing application).
Physical page print order index:

[1][2]

Drawing pedigree 0374, P0374018's subtree (page 3 of 3) ...
Printing virtual landscape drawing scaled to 0.98 on
2 physical pages wide by 1 physical page tall.
(You may not be able to view entire drawing in Postscript viewing application).
Physical page print order index:

[1][2]

Drawing pedigree 0375, P0375011's subtree (page 1 of 1) ...
Printing drawing scaled to 0.94.

Drawing pedigree 0376, P0376007's subtree (page 1 of 1) ...
Calling "gs madeline.ps" ...
M>


Up to ten mates of a single individual may be drawn. At the time of this writing, the drawing routines were being revised to provide better support for drawing consanguineous loops and other complicated pedigree structures.

EDIT
EDIT <cFileName>

Edit a file using the editor specified in the FileEditor variable. This allows you to edit files without having to exit Madeline.

M>FileEditor="emacs"
M>edit "datafile.ped" <-- Madeline calls emacs to edit the file

 

EXCLUDE
EXCLUDE [FAMILIES] FOR <LExpression>

Mark individuals for exclusion. If exclude families is used, all individuals who match the criteria, together with their spouses and descendants, will be excluded. See: CLEAR, UNEXCLUDE.

M>exclude for _famid="0049"
0049-100 has been marked for exclusion
0049-401 has been marked for exclusion
0049-701 has been marked for exclusion
0049-801 has been marked for exclusion
0049-802 has been marked for exclusion
M>

 

In this example, _famid is a reference to the family ID. You can dereference _famid even when no FamilyIDField is present in the database (as is permitted for FUSION 1 data).

GO
GO <nRecNo>

Go to a specified record, nRecNo, in a database. In Madeline, records are numbered from 0 to n-1 where n is the total number of records in the table (inserted parents do not contribute to this count, and you cannot go to the non-existent table record of an inserted parent).

M>show studyid
"0001-100"
M>go 197
M>show studyid
"0052-100"
M>view record
  ...   ...
M>

 

GOODBYE
GOODBYE

Terminate the current Madeline session. Equivalent to the quit command. See: QUIT.

M>goodbye
Releasing resources ...
Goodbye!

 

HELLO
HELLO

Displays the current settings of Madeline’s boolean state flags and other status information. Identical to the status command.

M>hello
+-----------------------+-----------+-----------------------------------------+
| Variable or State Flag| Setting   | Description                             |
+-----------------------+-----------+-----------------------------------------+
| AutoExclude           | ON        | Exclude pedigrees automatically         |
| Color                 | ON        | Draw pedigrees in color                 |
| DividedDrawings       | ON        | Paginate drawings by founding group     |
| EvaluationInterval    |   0.50 cM | Value to write to control file.         |
| Help                  | HTML      | Extended HTML help documentation        |
| Language              | ENGLISH   | Language convention used for date, time |
| OffEndDistance        |  10.00 cM | Value to write to control file          |
| Orientation           | AUTOMATIC | Automatic based on drawing dimensions   |
| PaperMargin           | 1.00 cm   | Margin (in cm) on all four sides        |
| PaperSize             | USLETTER  | 8.5 x 11.0 inches                       |
| SaveAlleleFrequencies | OFF       | Calculate new frequencies on next OPEN  |
| Time                  | Current   | 16:37 Monday, October 4, 1999           |
| Verbosity             | VERBOSE   | All messages are printed to the console |
+-----------------------+-----------+-----------------------------------------+
M>

 

HELP
HELP

Invokes HTML-based help. Madeline will invoke the world wide web browser named in WebViewer with the URL named in WebAddress. The default WebViewer is "netscape". The default WebAddress is the current URL of the Madeline online documentation. Help assumes that the quoted-string provided as an argument is a valid Madeline token (i.e., command, variable, array, or other keyword recognized by the interpreter) or other valid bookmark in the online documentation and simply passes it as part of the URL:

M>quiet
M>show WebViewer
"netscape"
M>show WebAddress
"www.sph.umich.edu/group/fusion/programs/madeline.html"
M>help "genehunter"
M>

 

The browser will locate any valid bookmark found in the online documentation, including section and topic headings. For example, help "tutorial" would bring up the Tutorial section of the online documentation.

See LOOKUP if you need to determine the name or correct spelling of a command, variable, or other token recognized by Madeline's interpreter.

LIST
(1) LIST FIELDS
(2) LIST <InternalArray>
(3) LIST MAP [FOR CHROMOSOME=<nChrNo>]

Shows current values in a list of items. The list may consist of the fields of the open database, the elements of an internal array, or the installed marker map.

The command has the three forms shown above. Examples of each form of the command follow:

(1) LIST FIELDS

M>open "chr8.dbf"
   . . .
M>toggle output flag for D8s270-GATA101F01
M>list fields
  1.FAMID      Co__1   10.D8S504     Go__1   19.D8S1757    Go_10
  2.STUDYID    Co__2   11.D8S550     Go__2   20.D8S270     G
  3.SEX        Co__3   12.D8S258     Go__3   21.D8S1778    G
  4.FATHER     Co__4   13.D8S1771    Go__4   22.D8S276     G
  5.MOTHER     Co__5   14.D8S1820    Go__5   23.GATA101F01 G
  6.TWIN       Co__6   15.D8S283     Go__6   24.D8S514     Go_11
  7.BMI        Po__1   16.D8S285     Go__7   25.D8S284     Go_12
  8.NAFFECTE   Co__7+  17.D8S260     Go__8   26.D8S534     Go_13
  9.STUDYAGE   Po__2   18.D8S530     Go__9   27.D8S1836    Go_14
M>

 

(2) LIST <InternalArray>

M>//
M>// cmv is the internal array of missing value indicators for 
M>// character/string variables:
M>//
M>list cmv
CMV has 5 elements:
CMV[ 0]="."
CMV[ 1]="/"
CMV[ 2]="0/0"
CMV[ 3]="0/ 0"
CMV[ 4]="0/  0"
M>

 

(3) LIST MAP [FOR CHROMOSOME=<nChrNo>]

M>load 'k:\emap\emap.dbf'
Marker maps based on k:\emap\emap.dbf are now installed.
Note: genotype fields ordered according to current map
Field ordering now set based on k:\emap\emap.dbf.
M>list map for chromosome=17
Marker Name Ch Or Position
----------- -- -- --------
D17S945     17  1    0.0000
D17S1803    17  2    8.8000
D17S1871    17  3   21.6000
D17S798     17  4   27.0000
D17S791     17  5   40.2000
D17S809     17  6   48.3000
D17S1835    17  7   57.2000
D17S1351    17  8   74.3000
D17S802     17  9   88.0000
D17S1806    17 10   93.2000
M>

 

Column headings in the listing refer to marker name, chromosome number (Ch), ordinal rank (Or), and position in centiMorgans.

LOAD
LOAD

Load a map database. The map database must contain fields for chromosome number, marker name, ordinal position of the marker in the map for the chromosome, and positional distance in centiMorgans (Table 3.1).

Table 3.1. Map Database Fields in Madeline.

Name of Variable Storing Field Name  Default Value  Description
-----------------------------------  -------------  -----------
ChromosomeField                      "CHROMOSOME"   Chromosome
OrdinalField                         "ORDINAL"      Ordinal position (or rank) of the marker on the map for this chromosome
MarkerField                          "MARKERNAME"   Name of the marker
PositionField                        "POSITION"     Map position from p terminus in centiMorgans

 

The map database can contain maps for any number of chromosomes, but may contain only one map for each chromosome. As soon as Madeline detects that a map database has been installed, genotype fields in an open pedigree database will automatically be placed in map order. When possible, execute load prior to any open command. When a pedigree database is subsequently opened, genotype fields will then automatically appear in map order from the outset.

M>load 'k:\emap\emap.dbf'
Marker maps based on k:\emap\emap.dbf are now installed.
M>list map for chromosome=23
Marker Name Ch Or Position
----------- -- -- --------
DXS7100     23  1    0.0000
DXS7110     23  2   33.4000
DXS1214     23  3   45.1000
DXS993      23  4   63.7000
DXS1055     23  5   72.5000
DXS991      23  6   80.7000
DXS986      23  7   90.1000
DXS8096     23  8  105.5000
DXS8072     23  9  146.2000
DXS8011     23 10  180.9000
M>open "chrx.dbf"
  . . .
M>
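
The effect of placing genotype fields in map order can be illustrated with a few lines of Python. This is a sketch under stated assumptions, not Madeline's implementation: the function name order_fields_by_map is hypothetical, map records are represented as dictionaries keyed by the default field names from Table 3.1, and the choice to leave unmapped fields at the end in their original order is an assumption.

```python
def order_fields_by_map(genotype_fields, map_records):
    """Sort genotype field names by (chromosome, ordinal rank) from the map."""
    rank = {
        rec["MARKERNAME"].upper(): (rec["CHROMOSOME"], rec["ORDINAL"])
        for rec in map_records
    }
    mapped = [f for f in genotype_fields if f.upper() in rank]
    # Assumption: fields with no map entry keep their original relative order.
    unmapped = [f for f in genotype_fields if f.upper() not in rank]
    return sorted(mapped, key=lambda f: rank[f.upper()]) + unmapped


if __name__ == "__main__":
    chr17 = [
        {"CHROMOSOME": 17, "ORDINAL": 1, "MARKERNAME": "D17S945", "POSITION": 0.0},
        {"CHROMOSOME": 17, "ORDINAL": 2, "MARKERNAME": "D17S1803", "POSITION": 8.8},
        {"CHROMOSOME": 17, "ORDINAL": 3, "MARKERNAME": "D17S1871", "POSITION": 21.6},
    ]
    print(order_fields_by_map(["D17S1871", "D17S945", "D17S1803"], chr17))
```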

 

In the FUSION map database tables, the OrdinalField is called "POSITION" and the PositionField is called "KOSAMBICM". Therefore, with FUSION data, be sure to include the following lines in your autorun.bat file or elsewhere as applicable:

OrdinalField="POSITION"
PositionField="KOSAMBICM"

 

LOOKUP
LOOKUP <sExpr>

Look up a command or keyword by supplying a string containing the first few letters of the command or keyword:

M>lookup 'g'
GENDERFIELD is an internal variable. Its current value is: "SEX".
GENEHUNTER is a keyword.
GENEHUNTERNPL is a keyword.
GENEHUNTERQTL is a keyword.
GENERIC is a keyword.
GENOTYPE is a keyword.
GO is a command.
GOODBYE is a command.
M>
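
The matching shown above behaves like a case-insensitive prefix search over the interpreter's token table. A minimal Python sketch of that idea follows; the function name lookup and the small token table are hypothetical stand-ins, and Madeline's real table is much larger.

```python
def lookup(prefix, tokens):
    """Return (name, kind) pairs whose name starts with prefix, ignoring case."""
    wanted = prefix.upper()
    return sorted(
        (name, kind) for name, kind in tokens.items() if name.upper().startswith(wanted)
    )


if __name__ == "__main__":
    # Hypothetical excerpt of a token table.
    tokens = {"GO": "command", "GOODBYE": "command", "GENOTYPE": "keyword", "HELP": "command"}
    for name, kind in lookup("g", tokens):
        print(f"{name} is a {kind}.")
```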

 

MERGE
MERGE <INPUT_FILE_1>[,<INPUT_FILE_2>,<INPUT_FILE_3>, ... <INPUT_FILE_N>]
[TO <OUTPUT_FILE>]
[IN ALPHA | PHYSICAL | <USER_DEFINED_FILE> ORDER]

Merges any number of input tables to an output table. All input tables must contain identically-named FamilyIDField and IndividualIDField names which are used as the keys for constructing records in the output table.

Output is in Madeline's Mbase format which consists of a rectangular ASCII data table and an associated binary header file. The binary header file usually has the same name as the ASCII table, but with a .mfh extension.

The TO <OUTPUT_FILE> clause is optional. When present, data are written to the specified output file and an associated header is created with a .mfh extension. When absent, Madeline creates a file name based on the name of the first table by adding a .mrg extension to the end. The associated binary header will have a .mfh extension. In the event that a .mfh file already exists, Madeline uses an extension of .cfh instead.

The IN ALPHA | PHYSICAL | <USER_DEFINED_FILE> ORDER clause is also optional. When absent, the default ALPHA ORDER is used. When ALPHA ORDER is used, fields from all input tables are arranged alphabetically in the output table. When PHYSICAL ORDER is specified, fields in the output table are arranged in the same order that they appear in the source tables starting with the first table. Even though the key index fields FamilyIDField and IndividualIDField are present in every input table, they only appear once in the output table, as you would expect.

As an alternative to ALPHA and PHYSICAL order, you can specify the order of fields precisely by creating a text file containing the field names in the order you want, separated by white space (i.e., spaces and/or carriage returns). For example, you can create a text file containing the marker fields listed in genetic map order (along with any other fields from the source tables). Assuming this file is called "map.order", the clause IN "map.order" ORDER instructs Madeline to read the field order from this file.

When PHYSICAL or <USER_DEFINED_FILE> ORDER is used, make sure that the only fields duplicated in all source tables are the key index fields, FamilyIDField and IndividualIDField. Other fields cannot appear multiple times. Be especially careful with core fields like GenderField, FatherIDField, and MotherIDField, which may well appear in more than one table. If it is not possible to remove non-index fields that appear multiple times, simply rename them so that name conflicts do not occur.

When ALPHA ORDER is used, fields that appear more than once are not a problem and will appear only once in the output table. The field type, width, and numeric precision of duplicate fields are based on the first table in which the fields appear. The data for such fields are also pulled from the first table in which the fields appear. As you would expect, tables are merged horizontally or side-by-side. Note that Madeline also permits you to merge tables vertically, but only in the case where ALPHA ORDER is used. For example, if you had two tables containing identical fields but with one containing one set of individuals in your study, and the other containing another set of individuals, MERGE ... IN ALPHA ORDER will permit you to join the two tables vertically. The restriction that fields be sorted in alphabetic order is necessary so that Madeline can map individual field data correctly even though it appears that field names are "duplicated". After a "vertical" merge, one can always do a subsequent MERGE in which a preferred field order is specified -- Madeline allows you to "merge" a single table in order to redefine field order!

Regardless of the setting of ORDER specified, any subsets of individuals who do not appear in all tables will have missing values for fields extracted from tables in which those individuals did not appear.

Table merges are done in memory. For large data sets, Madeline will use a lot of memory. On modern workstations, this should rarely be an obstacle. Madeline does not use table indexes on disk, but instead creates its own indexes in memory. If problems do occur with large tables, it may be necessary to merge files in stages.

//
// merge uses FamilyIDField and IndividualIDField
// as the keys for merging:
//
M>FamilyIDField    ="FAMID"
M>IndividualIDField="INDIVIDUAL"
M>merge 't1.dbf','t2.dbf','t3.dbf','t4.dbf','t5.dbf' to 'out.dat' in 'map.order' order
  Building field and record trees ...
  Writing 2711 records to "out.dat"
  5 databases merged to "out.mfh" in 8.5 seconds
M>open 'out.mfh'
  ... ...
M>
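
The ALPHA ORDER behavior described above (keyed merge, first table wins for duplicated fields, missing values for absent individuals) can be sketched in Python. This is an illustration only: the function name merge_tables is hypothetical, tables are modeled as lists of dictionaries, "." is assumed as the missing value, and placing the two key fields first in the output is an assumption.

```python
MISSING = "."  # assumed missing-value marker


def merge_tables(tables, key_fields=("FAMID", "INDIVIDUAL")):
    """Merge lists of dict records on key_fields, other fields alphabetically."""
    merged = {}          # key tuple -> combined record
    field_names = set()  # all non-key field names seen
    for table in tables:
        for record in table:
            key = tuple(record[k] for k in key_fields)
            target = merged.setdefault(key, {})
            for name, value in record.items():
                if name in key_fields:
                    continue
                field_names.add(name)
                # The first table that supplies a field for this individual wins.
                target.setdefault(name, value)
    ordered = sorted(field_names)
    header = list(key_fields) + ordered
    rows = [
        list(key) + [merged[key].get(name, MISSING) for name in ordered]
        for key in sorted(merged)
    ]
    return header, rows


if __name__ == "__main__":
    t1 = [{"FAMID": "0001", "INDIVIDUAL": "0001-100", "SEX": "F"},
          {"FAMID": "0001", "INDIVIDUAL": "0001-200", "SEX": "M"}]
    t2 = [{"FAMID": "0001", "INDIVIDUAL": "0001-100", "BMI": "23.45"}]
    header, rows = merge_tables([t1, t2])
    print(header)
    for row in rows:
        print(row)
```

Because records are matched on the key fields rather than on position, the same routine covers both the side-by-side and the "vertical" merges described above.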

 

Merge is part of Madeline's arsenal of commands designed to ease the task of manipulating flat files. Also see: CONVERT, RECTIFY, and TRANSPOSE.

OPEN
OPEN <cDatabaseTableName>

Open opens a pedigree database. Madeline currently supports the following database table formats:

Madeline’s database engine detects operating system and file byte-ordering at run time, permitting database files from PCs to be opened on Unix workstations, and vice-versa. The user does not need to tell Madeline the file type. Madeline does not make use of associated index files, such as .cdx files used by FoxPro. To open an ASCII flat file database, see RECOGNIZE.

Internally, when Madeline opens a database, the following events occur:

  1. File is opened and buffered.
  2. Fields are categorized as core "C", genotype "G", phenotype "P", or "*" null.
  3. If genotype fields are present, allele frequencies are estimated from all of the data using gene counting ignoring family relationships. If a map database is already installed and contains a map for markers in the database, the genotype fields are automatically ordered according to the map.
  4. Pedigrees are reconstructed from core information.
  5. Individuals are categorized by Madeline based on whether they are in the database or dummied-in, attached or unconnected. When FUSION-compliant IDs are used, unconnected spouses without children are joined to their mates and classified as childless spouses.
  6. Madeline performs additional data operations when optional core fields such as AffectionStatusField, DateOfBirthField, or DateOfDeathField are included.
  7. Madeline displays a summary table showing the count of pedigrees and distribution of individuals by category.

M>open '\hold\chr8.dbf'
Calculating allele frequencies for   9. D8S504...
	...
Calculating allele frequencies for  20. D8S270...
Database "\hold\chr8.dbf" opened with     2,506 records
Core information read in   2.00 seconds
	...
NOTE: 0472-100 and 0472-401 now marked with "a" indicating
an apparent dizygotic twinship.
NOTE: 0570-401 and 0570-402 now marked with "a" indicating
an apparent dizygotic twinship.
Pedigrees reconstructed in   0.1780 seconds
Note: genotype fields ordered according to current map
  1.STUDYID    Co__1    8.BMI        Po__1   15.D8S276     Go__9  
  2.GENDER     Co__2    9.D8S504     Go__1   16.D8S283     Go__4  
  3.FATHER     Co__3   10.D8S550     Go__2   17.D8S285     Go__5  
  4.MOTHER     Co__4   11.D8S258     Go__3   18.D8S260     Go__6  
  5.TWIN       Co__5   12.GATA101F01 Go_10   19.D8S530     Go__7  
  6.AFFECTSTAT C    +  13.D8S514     Go_11   20.D8S270     Go__8  
  7.DOB        C       14.D8S284     Go_12 
 

-----------------------------  --------- --------- ---------
Pedigrees and Individuals       Included  Excluded     Total
-----------------------------  --------- --------- ---------
Pedigrees ...................        958         0       958
Individuals .................      3,626         0     3,626
 + In database ..............      2,506         0     2,506
 |  + Attached ..............      2,115         0     2,115
 |  + Childless spouses .....         13         0        13
 |  + Unattached ............        378         0       378
 + Not in database ..........      1,120         0     1,120
M>
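
Step 3 of the sequence above estimates allele frequencies by counting alleles across all individuals without regard to family relationships. A minimal Python sketch of such counting follows. It assumes genotypes are stored as "allele/allele" strings and uses the missing-value strings shown in the list cmv example; the function name allele_frequencies is hypothetical and the code is not taken from Madeline.

```python
from collections import Counter


def allele_frequencies(genotypes, missing=(".", "/", "0/0")):
    """Estimate allele frequencies by counting alleles over all genotypes."""
    counts = Counter()
    for genotype in genotypes:
        genotype = genotype.replace(" ", "")  # "0/ 0" becomes "0/0"
        if genotype in missing or "/" not in genotype:
            continue  # skip missing or malformed values
        first, second = genotype.split("/", 1)
        counts[first] += 1
        counts[second] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {allele: n / total for allele, n in sorted(counts.items())}


if __name__ == "__main__":
    print(allele_frequencies(["141/142", "138/141", "140/142", ".", "0/ 0"]))
```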

 

QUIET
QUIET

Specifies that "detail" messages are not shown on the screen. Summary log messages still appear on the screen, and both detail and summary messages are still written to the .dtl and .log files, respectively. See: SILENT, VERBOSE.

M>quiet
Madeline is now in quiet mode.
M>

 

QUIT
QUIT

Terminates the program session. Equivalent to goodbye. See: GOODBYE.

M>quit
Releasing resources ...
Goodbye!

 

RECOGNIZE
RECOGNIZE <INPUT_FILE> [TO <BINARY_HEADER_FILE_NAME>]

Recognize a space-delimited, column-aligned rectangular ASCII data file (i.e., a "flat file") as a database table by creating a binary header file that contains key information about the number of records, number of columns, column names, column data types, and so on. By default, Madeline adds ".mfh" to the name of the input file to create the name of the output file. However, you can specify any other name for the binary header file using the to clause. If you plan on using the recognize command, be sure to read all of the following documentation very carefully!

If necessary, a flat file in the appropriate space-delimited column format can usually be created using Madeline's convert or rectify commands. In fact, recognize will automatically call rectify if necessary -- if this does occur, it is usually a good idea to investigate why rectify was called and to run rectify manually on a data file with all field column header information stripped out.

After the data are in the correct rectangular format, a minimal header containing the column names and data types needs to be added at the top of the file, as described below. If either convert or rectify is required, don't add a header until after running these commands!

When stored in a computer, a database table has two parts:

  1. A rectangular array of data.
  2. A header that defines the column names, types, number of records, file size, file type, and other key information.

An ASCII flat file that contains a rectangular array of data with spaces (not tabs or commas) separating the aligned columns can be considered the simplest form of a database table:

0001 0001-100 F 0001-200 0001-300  23.45  14.2  141/142
0001 0001-200 M .        .             .  10.2  138/141
0001 0001-300 F .        .         78.21  15.2  140/142
.    .        . .        .             .     .  .
.    .        . .        .             .     .  .
.    .        . .        .             .     .  .
.    .        . .        .             .     .  .

 

The problem with this "database" format is that it has no header! There are no records to establish what the columns mean, how many columns there are, or how many records are in the table.

Madeline tackles this problem by constructing a separate binary header file which is used to open the table indirectly. The binary header file is built by the recognize command and usually has a ".mfh" (i.e., Madeline Flat file Header) extension. The combination of a ".mfh" binary header and an ASCII flat file table is referred to as the Madeline Database, or Mbase file format.

Madeline can determine a lot of key information just by examining the flat file table itself. From a table with unlabeled columns (such as illustrated above), Madeline can:

Always determine:

Almost always determine:

and often determine:

The ability to determine the gender, individual, father, and mother ID fields provides a fruitful start to deciphering a file with unmarked columns. Still, there is no way for Madeline to know what all columns in an unmarked file represent. In the absence of additional information, Madeline provides default names based on whether the columns contain character, numeric, or date data. This is usually not what you want, unless you are in a great hurry!

Madeline provides the opportunity for the user to provide column names and, if necessary, column data types, at the top of the flat file before the first record. When present, the recognize command reads this minimal flat file "header" before parsing the rectangular data array. Once you are confident that you have the data in the correct rectangular format, it is highly recommended that you add a minimal header to the file, as described below.

The ONLY information that should be provided about each field in the header is:

  1. Column name -- Up to 10 letters or digits without spaces can be used to represent field names.
  2. Column type -- Data type of the column.

The following set of single-letter options is permitted for designating column type:

The column type designation is optional. If not provided, Madeline will make a determination. The program determines data type by looking at what characters are present in a column and how they are formatted. For example, if a dash "-" or a slash "/" occurs in a column, the column cannot be numeric; additional processing is used to decide if the column contains dates or genotypes, or something else. If you are uncertain whether Madeline will make the correct determination, then provide column types.
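
The kind of character-based determination described above can be approximated in a few lines of Python. This is a simplified sketch, not Madeline's actual rules: the function name infer_column_type is hypothetical, only the "N", "G", and "C" types are distinguished, date detection is omitted, and treating a leading sign as numeric is an assumption.

```python
import re

NUMBER_RE = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)$")
GENOTYPE_RE = re.compile(r"^\w+\s*/\s*\w+$")


def infer_column_type(values, missing=(".", "")):
    """Guess "N" (numeric), "G" (genotype), or "C" (character) for a column."""
    present = [v.strip() for v in values if v.strip() not in missing]
    if not present:
        return "C"  # nothing to go on; fall back to character
    if all(NUMBER_RE.match(v) for v in present):
        return "N"
    if all(GENOTYPE_RE.match(v) for v in present):
        return "G"
    return "C"


if __name__ == "__main__":
    print(infer_column_type(["23.45", ".", "78.21"]))   # numeric values
    print(infer_column_type(["141/142", "138/141"]))    # genotype values
    print(infer_column_type(["0001-100", "0001-200"]))  # IDs containing a dash
    print(infer_column_type(["0001", "0002"]))          # all-digit IDs
```

The last call shows why an all-digit ID column needs an explicit "C" designation: judged by its characters alone, it looks numeric.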

Column name and type must be separated by spaces and can appear on any number of lines. The only requirement is that the lines of the header must be shorter in length than those of the records. This is how the program knows which lines are header lines and which are data records.

Some core fields such as FamilyIDField and IndividualIDField must be treated as "C" character fields, even though the IDs consist of only numbers. So, at a minimum, it may be necessary to supply the field types of these core fields. Here is an example:

FAMID C
INDIVIDUAL
GENDER C
FATHER MOTHER   STUDYAGE  GLUCOSE  D20S119

0001 0001-100 F 0001-200 0001-300  23.45  14.2  141/142
0001 0001-200 M .        .             .  10.2  138/141
0001 0001-300 F .        .         78.21  15.2  140/142
.    .        . .        .             .     .  .
.    .        . .        .             .     .  .
.    .        . .        .             .     .  .
.    .        . .        .             .     .  .

 

The spacing and arrangement of the column labels and data type indicators in the header above is immaterial, except that every line of the header must be shorter than the data records. The INDIVIDUAL, FATHER, and MOTHER IDs contain dash characters, so they will automatically be interpreted as "C" character fields without being marked so (the program can see that they are not date fields). However, FAMID consists entirely of digits and would be interpreted as "N" numeric if it were not marked "C". The above example is now ready to be processed by the recognize command.
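
The rule that header lines are shorter than data records can be sketched in Python as follows. This is one possible reading of the rule, not Madeline's parser: the function name split_header is hypothetical, the record length is taken to be the longest line in the file, and the set of type codes is limited to those that appear in this manual.

```python
TYPE_CODES = {"C", "N", "X", "A", "G"}  # codes seen in this manual; full set is an assumption


def split_header(lines):
    """Split a flat file into ([(name, type_or_None), ...], data_lines)."""
    non_empty = [line.rstrip("\n") for line in lines if line.strip()]
    record_len = max((len(line) for line in non_empty), default=0)

    header, data = [], []
    for line in non_empty:
        # Leading lines shorter than a full record belong to the header.
        if not data and len(line) < record_len:
            header.append(line)
        else:
            data.append(line)

    fields = []
    for token in " ".join(header).split():
        # A type code directly after a still-untyped name is that field's type.
        if token in TYPE_CODES and fields and fields[-1][1] is None:
            fields[-1] = (fields[-1][0], token)
        else:
            fields.append((token, None))
    return fields, data


if __name__ == "__main__":
    sample = [
        "FAMID C",
        "INDIVIDUAL",
        "GENDER C",
        "FATHER MOTHER   STUDYAGE  GLUCOSE  D20S119",
        "",
        "0001 0001-100 F 0001-200 0001-300 23.45 14.2 141/142",
        "0001 0001-200 M 0001-201 0001-301 21.10 10.2 138/141",
    ]
    fields, data = split_header(sample)
    print(fields)
    print(len(data), "data lines")
```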

There are a couple of special situations to pay attention to when constructing the flat file header.

First, only a gender field containing character string labels such as "M" and "F" or "male" and "female" should be designated as being of type "X". You can also designate such a gender field with the more generic "C" (as was done above), or not designate any type at all, and Madeline will figure it out for you.

Secondly, Madeline provides the opportunity to specify a special column type of "A" for allele fields. Allele fields are present in file formats such as the Genehunter format, where two contiguous space-delimited columns contain the allele labels that, taken together, represent the genotype for one marker. Since two columns are present, the flat file header must give the same column name twice: once for the first allele column, and once for the second. The column names should be the marker names. For example:

FAMID C
STUDYID C
FATHER C
MOTHER C
SEX X
NAFFECTE N
D20S100 A
D20S100 A
D20S200 A
D20S200 A

0001 0001-200                   M .  0 0  0 0
0001 0001-300                   F .  0 0  0 0
0001 0001-100 0001-200 0001-300 M 1  1 1  4 5
0001 0001-401 0001-200 0001-300 F 0  1 2  5 5
0001 0001-402 0001-200 0001-300 F 0  2 2  4 4
0001 0001-403 0001-200 0001-300 F 1  1 2  4 4
0001 0001-404 0001-200 0001-300 M 0  2 2  4 4
0001 0001-408 0001-200 0001-300 M 1  2 2  4 5
0001 0001-409 0001-200 0001-300 M 1  1 1  4 4
  .      .        .        .    . .  . .  . .
  .      .        .        .    . .  . .  . .
  .      .        .        .    . .  . .  . .

 

Errors will result with unpaired "A" fields, so be careful! Madeline will combine the paired allele fields into genotype fields, as shown below. The column "Start" and "End" values confirm that Madeline has merged pairs of columns:

M>recognize "flat.test"
Recognizing file "flat.test" to "flat.test.mfh" ...
Skipping a total of 11 lines at top.
There are 10 non-empty header lines and 27 data lines.
Data records are 45 bytes long.

The gender field has been identified and will appear in the ".run" file

 # . Field Name  Start End   Length Prec. Space Type
---- ----------- ----- ----- ------ ----- ----- -----
  1. FAMID           1     4     4     0     1 C
  2. STUDYID         6    13     8     0     1 C
  3. FATHER         15    22     8     0     1 C
  4. MOTHER         24    31     8     0     1 C
  5. SEX            33    33     1     0     1 X
  6. NAFFECTE       35    35     1     0     2 N
  7. D20S100        38    40     3     0     2 G
  8. D20S200        43    45     3     0     0 G

Madeline recognition header written.
Type 'open "flat.test.mfh" ' to open the database.

The template batch file "flat.test.run" has been created.

NOTE: The ".run" file contains commands and parameters to assist
      you in opening a flat file database, but generally requires
      editing before use.

M>
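The pairing rule can also be illustrated outside Madeline. The following Python sketch is not Madeline's implementation; the function names merge_allele_columns and join_alleles are hypothetical. It assumes a header given as (name, type) pairs, collapses each adjacent pair of same-named "A" columns into one "G" field, and rejects an unpaired "A" column. Joining the two alleles with a slash follows the genotype notation used elsewhere in this manual.

```python
def merge_allele_columns(header):
    """Collapse adjacent same-named "A" columns into single "G" fields."""
    merged = []
    i = 0
    while i < len(header):
        name, col_type = header[i]
        if col_type != "A":
            merged.append((name, col_type))
            i += 1
            continue
        # An "A" column must be followed by a second "A" column of the same name.
        if i + 1 >= len(header) or tuple(header[i + 1]) != (name, "A"):
            raise ValueError(f"unpaired allele column: {name}")
        merged.append((name, "G"))
        i += 2
    return merged


def join_alleles(allele1, allele2):
    """Combine two allele labels into one genotype string."""
    return f"{allele1}/{allele2}"


header = [("FAMID", "C"), ("SEX", "X"), ("D20S100", "A"), ("D20S100", "A")]
print(merge_allele_columns(header))
print(join_alleles("1", "2"))
```

A header with a single "A" column, or with two "A" columns of different names, raises ValueError in this sketch.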

 

After recognizing a file, the ".mfh" file can be used as the parameter to the open, load, transpose, or merge commands just like any other table.

In addition to the ".mfh" file, Madeline creates a template batch command file with a ".run" extension. This command file contains parameter settings and commands to open the flat file database. For example, the ".run" file will specify the names used for the GenderField, IndividualIDField, FatherIDField, and MotherIDField if Madeline was successful at identifying these.

The ".run" file must be edited by the user. Madeline cannot identify certain information automatically. For example, blank fields and fields containing a single dot "." are always treated as missing values. However, Madeline cannot determine if other values are also used to represent missing data. In all data tables, arbitrary values are used to represent categorical states such as affected and unaffected: the program must be told about these values as well.
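The distinction between built-in and study-specific missing values can be sketched in Python. This is an illustration only, not Madeline code; is_missing and its extra_codes parameter are hypothetical names, and the code "-9" is an invented example of a study-specific missing value.

```python
def is_missing(value, extra_codes=()):
    """Return True for blank fields, a single dot, or a user-supplied code."""
    stripped = value.strip()
    return stripped == "" or stripped == "." or stripped in extra_codes


print(is_missing("   "))                       # blank field
print(is_missing("."))                         # single dot
print(is_missing("-9"))                        # not recognized by default
print(is_missing("-9", extra_codes=("-9",)))   # recognized once declared
```

The third call shows why the ".run" file needs editing: without being told, a program has no way to know that "-9" stands for missing data.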

The ".run" file provides a template for opening a pedigree table. The recognize command can also recognize a map or marker table: in these latter instances, more modification of the ".run" file may be required.

M>recognize 'flat.dat'
Recognizing file "flat.dat" to "flat.mfh" ...
Skipping a total of 7 lines at top.
There are 6 non-empty header lines and 7046 data lines.
Data records are 319 bytes long.

The gender field has been identified and will appear in the ".run" file

 # . Field Name  Start End   Length Prec. Space Type
---- ----------- ----- ----- ------ ----- ----- -----
  1. FAMID           1     4     4     0     1 C
  2. STUDYID         6    13     8     0     1 C
  3. SEX            15    15     1     0     1 X
  4. FATHER         17    24     8     0     1 C
  5. MOTHER         26    33     8     0     1 C
  6. TWIN           35    35     1     0     1 C
  7. BMI            37    41     5     0     1 N
  8. NAFFECTE       43    56    14     8     1 N
  9. STUDYAGE       58    67    10     4     1 N
 10. D8S504         69    75     7     0     7 G
 11. D8S550         83    89     7     0     7 G
 12. D8S258         97   103     7     0     7 G
 13. D8S1771       111   117     7     0     7 G
 14. D8S1820       125   131     7     0     7 G
 15. D8S283        139   145     7     0     7 G
 16. D8S285        153   159     7     0     7 G
 17. D8S260        167   173     7     0     7 G
 18. D8S530        181   187     7     0     7 G
 19. D8S1757       195   201     7     0     7 G
 20. D8S270        209   215     7     0     7 G
 21. D8S1778       223   229     7     0     7 G
 22. D8S276        237   241     5     0     9 G
 23. GATA101F01    251   257     7     0     7 G
 24. D8S514        265   271     7     0     7 G
 25. D8S284        279   285     7     0     7 G
 26. D8S534        293   299     7     0     7 G
 27. D8S1836       307   313     7     0     6 G

Madeline recognition header written.
Type 'open "flat.mfh" ' to open the database.

The template batch file "flat.run" has been created.

NOTE: The ".run" file contains commands and parameters to assist
      you in opening a flat file database, but generally requires
      editing before use.

M>

 

RECTIFY
RECTIFY <INPUT_FILE> [TO <OUTPUT_FILE>]

In order for Madeline to use a flat file table, it must contain aligned columns that are delimited by space characters. Extra space characters are used to pad column widths so that the columns always line up. In addition, the table must be truly rectangular, which means that all data lines must be of equal length.

Embedded tab characters are usually replaced by space characters when a file is viewed in an editor or word processor on screen, leading to the false impression of a rectangular array with even line lengths, when in fact lines are actually of varying lengths. Extra (but invisible!) space or tab characters after the last column in a table can also result in varying line lengths.

The rectify command replaces all embedded tab characters with the appropriate number of space characters, and trims or pads lines so that all records are of equal length. Rectify includes an algorithm that infers which tab interval the software used to create or edit the flat file must have had in order to align the columns.
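The effect of tab replacement and padding can be sketched in a few lines of Python. This is a simplified illustration, not Madeline's algorithm: the function name rectify_lines is hypothetical, and the tab interval is passed in as a parameter rather than inferred from the file.

```python
def rectify_lines(lines, tab_interval=8):
    """Expand tabs to spaces, trim trailing blanks, and pad to equal length."""
    expanded = [
        line.rstrip("\n").expandtabs(tab_interval).rstrip()
        for line in lines
    ]
    width = max((len(line) for line in expanded), default=0)
    # Pad every record with spaces so the table is truly rectangular.
    return [line.ljust(width) for line in expanded]


raw = ["a\tb\n", "abc\td  \n", "x\n"]
for record in rectify_lines(raw, tab_interval=4):
    print(repr(record))
```

With a tab interval of 4, all three records come out five characters long, although the raw lines had different lengths.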

In some cases, rectify will report that a unique tab interval could not be determined. This may simply mean that the source file contained tab characters at the same horizontal offsets for every record in the file (a tab-delimited file is one example of such a file). Replacing the tabs with any fixed number of spaces will always produce aligned columns in the output. For this type of file, Madeline always uses one space, the smallest number required.

In some cases, Madeline may not be successful at rectifying the file without further editing by the user. Sparse data files containing many missing values are the most likely to be troublesome, especially if the missing values are represented by blank entries or single dots.

If an output file is not specified, Madeline will create an output file having the same name as the input file, but with a ".mod" extension at the end.

Rectify may be called from the recognize command if recognize determines that data records are of varying lengths. A common problem in this case is that the header lines the user has inserted into the data file for recognize will also be padded out to the length of the data records. When this happens, Madeline can no longer tell where the header lines end and the data records begin. The simple solution is to edit the file and manually remove the extra trailing spaces from any header lines.

M>rectify "mydata.txt"
Rectifying "mydata.txt" to "mydata.mod"
Tab interval = 4
2567 lines were written.
M>

 

RUN
RUN

Load and run a batch file. Batch files can themselves contain nested run commands. When commands from a batch file are being processed, Madeline displays the "M-Batch>" prompt in place of the "M>" prompt, and returns to the "M>" prompt after successful completion of batch commands. Madeline goes into quiet mode whenever a batch file is invoked with run: issue verbose after run if you want to return to verbose mode.

Contents of load.bat:
run 'task.bat'
Contents of task.bat:
quiet
open '\test\thursday.dbf'
write to '\test\mendel.ped' in mendel format
load 'k:\emap\emap.dbf'
write to '\test\siblink.ped' in siblink format

Script which runs the batch files:

M>run 'load.bat'
M-Batch> Running batch file "load.bat"... ***
M-Batch> run 'task.bat'
M-Batch> Running batch file "task.bat"... ***
M-Batch> quiet
Madeline is now in quiet mode
M-Batch> open '\test\thursday.dbf'
         ...        ...
M-Batch> write to '\test\mendel.ped' in mendel format
...        ...
M-Batch> load 'k:\emap\emap.dbf'
Marker maps based on k:\emap\emap.dbf are now installed
M-Batch> write to '\test\siblink.ped' in siblink format
...        ...
M-Batch>
M-Batch> Finished batch file "task.bat"... ***
M-Batch>
M-Batch> Finished batch file "load.bat"... ***
M>
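The order in which nested batch commands are processed can be sketched as a recursive walk. The Python below is an illustration, not Madeline's implementation: run_batch is a hypothetical name, batch files are modeled as an in-memory dictionary instead of files on disk, and the max_depth guard against runaway nesting is an assumption of this sketch.

```python
def run_batch(name, batch_files, depth=0, max_depth=16):
    """Yield batch commands in execution order, following nested run commands."""
    if depth > max_depth:
        raise RecursionError(f"batch files nested too deeply at {name}")
    for line in batch_files[name]:
        command = line.strip()
        if not command:
            continue
        yield command
        words = command.split(None, 1)
        if words[0].lower() == "run" and len(words) == 2:
            # Strip the quotes around the file name, then descend into it.
            nested = words[1].strip("'\"")
            yield from run_batch(nested, batch_files, depth + 1, max_depth)


batch_files = {
    "load.bat": ["run 'task.bat'"],
    "task.bat": ["quiet", r"open '\test\thursday.dbf'"],
}
for command in run_batch("load.bat", batch_files):
    print(command)
```

Each nested file is processed to completion before the calling file resumes, which matches the order of the "Running" and "Finished" messages in the transcript.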

 

Batch processing can also be invoked from the command line. In addition, a batch file named autorun.bat will be automatically invoked at program startup.

SET
(1) SET|TURN AUTOEXCLUDE|SAVEALLELEFREQUENCIES|DIVIDEDPAGES|COLOR|HAPLOTYPEDISPLAY ON|OFF
(2) SET|TURN FIELD ORDER TO <Field_i>[,<Field_j>[,<Field_k>-<Field_p>]]
(3) SET|TURN LANGUAGE TO [ENGLISH|FRENCH|FINNISH|SUOMI]
(4) SET|TURN ORIENTATION TO [LANDSCAPE|PORTRAIT|AUTOMATIC|MULTIPAGE]
(5) SET|TURN PAPERSIZE TO [USLETTER|USLEGAL|A4|A4LONG|A4SUPER]
(6) SET|TURN PAPERMARGIN TO <nValue>

The set and turn commands are identical. See TURN for complete descriptions of both forms of the command.

SHOW
SHOW <nExpression>|<cExpression>|<LExpression>

Show the value of a single expression. Equivalent to what is. To display values in any kind of list, including field lists, arrays, and marker maps, use the list command instead. See LIST, WHAT IS.

M>show sin(pi/4)
0.707107
M>

 

SILENCE
SILENCE

Detail and summary log messages are not shown on the screen. Identical to silent. See SILENT.

SILENT
SILENT

Detail and summary log messages are not shown on the screen.

M>silent
M>

 

SORT
SORT ON <Expression> [ASCENDING]|DESCENDING

Sets the sort order for displaying siblings in a sibship and multiple spouses on pedigree drawings. <Expression> can be any expression that can be evaluated by Madeline. The default sort order is ascending.

M>//
M>// show siblings in descending order by date of birth:
M>//
M>sort on dob descending
M>draw pedigree '0535'
Drawing page 1 of 1 page for pedigree 0535...
M>//
M>// show siblings in ascending order by the number of offspring they have:
M>//
M>sort on _noffspring ascending
M>draw pedigree '0535'
Drawing page 1 of 1 page for pedigree 0535...
M>

 
 
[Figure: two pedigree drawings. First drawing: pedigree drawn with siblings sorted on date of birth descending. Second drawing: same pedigree drawn with siblings sorted by number of offspring ascending.]

For more information on drawing pedigrees, see the draw and set commands.
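The effect of the sort expression and direction can be sketched with an ordinary keyed sort. This Python example is illustrative only; sort_siblings is a hypothetical name and the sibling records are invented sample data, not taken from any pedigree in this manual.

```python
def sort_siblings(siblings, key, descending=False):
    """Order a sibship by an arbitrary expression; ascending is the default."""
    return sorted(siblings, key=key, reverse=descending)


# Invented sample sibship: birth year and number of offspring per sibling.
sibship = [
    {"id": "A", "dob": 1950, "noffspring": 3},
    {"id": "B", "dob": 1947, "noffspring": 0},
    {"id": "C", "dob": 1955, "noffspring": 1},
]

by_dob = sort_siblings(sibship, key=lambda s: s["dob"], descending=True)
by_kids = sort_siblings(sibship, key=lambda s: s["noffspring"])
print([s["id"] for s in by_dob])
print([s["id"] for s in by_kids])
```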

 

STATUS
STATUS

Displays the current settings of Madeline’s boolean state flags and other status information. Identical to the hello command.

M>status
+-----------------------+-----------+-----------------------------------------+
| Variable or State Flag| Setting   | Description                             |
+-----------------------+-----------+-----------------------------------------+
| AutoExclude           | ON        | Exclude pedigrees automatically         |
| Color                 | ON        | Draw pedigrees in color                 |
| DividedDrawings       | ON        | Paginate drawings by founding group     |
| EvaluationInterval    |   0.50 cM | Value to write to control file.         |
| Help                  | HTML      | Extended HTML help documentation        |
| Language              | ENGLISH   | Language convention used for date, time |
| OffEndDistance        |  10.00 cM | Value to write to control file          |
| Orientation           | AUTOMATIC | Automatic based on drawing dimensions   |
| PaperMargin           | 1.00 cm   | Margin (in cm) on all four sides        |
| PaperSize             | USLETTER  | 8.5 x 11.0 inches                       |
| SaveAlleleFrequencies | OFF       | Calculate new frequencies on next OPEN  |
| Time                  | Current   | 16:37 Monday, October 4, 1999           |
| Verbosity             | VERBOSE   | All messages are printed to the console |
+-----------------------+-----------+-----------------------------------------+
M>

 

SYSTEM
SYSTEM <cExpression>

Transfers a quoted-string command to the operating system. This allows the user to obtain directory and file information, copy or move files, or run analysis software without having to exit Madeline. System is especially useful when you need to obtain file or directory information using the DOS dir command or the Unix ls command. Since Madeline is supported on multiple platforms, there is no built-in support for operating system-specific commands. Because system transfers control to the operating system, screen output from other programs or from operating system commands is not recorded in Madeline's log files.

M>system "ls -l /test/*.dbf"
-rw-rw-rw-a   8061 Tue Dec 02 14:34:24 1997  chr20dic.dbf
-rw-rw-rw-a 550246 Tue Jan 13 15:08:10 1998  c14.dbf
-rw-rw-rw-a 777954 Tue Dec 02 14:40:18 1997  chr20.dbf
-rw-rw-rw-a1001786 Mon Feb 16 14:53:10 1998  sib20.dbf
-rw-rw-rw-a 369746 Thu Feb 26 11:10:16 1998  draw.dbf
M>

 

TOGGLE
(1) TOGGLE [PHENOTYPE|GENOTYPE|COVARIATE|OUTPUT] FLAG[S] FOR <field_i>[,<field_j>[,<field_k>-<field_z>]]
(2) TOGGLE ICON FLAG[S] FOR <field_i>[,<field_j>[,<field_k>-<field_z>]]

Toggle database field category and status flags.

(1) TOGGLE [PHENOTYPE|GENOTYPE|COVARIATE|OUTPUT] FLAG[S] FOR <field_i>[,<field_j>[,<field_k>-<field_z>]]

Madeline automatically categorizes fields in a database table as being core "C", genotype "G", or phenotype "P" fields. Core "C" fields contain core information used to reconstruct pedigrees and classify individuals, such as the StudyIDField, GenderField, and AffectionStatusField. Genotype "G" fields contain marker information. The names of genotype fields should correspond with the marker names exactly. Fields that are not "C" or "G" fields are classified as phenotype "P" fields.

Core fields are determined by matching up field names in the database table with names stored in internal variables. Genotype fields are determined by sampling the data to find character fields that contain numeric labels separated by slash characters. By elimination, remaining fields are classified as phenotype fields. Certain output formats may require knowing which of the phenotype fields are to be used as covariates. Hence, there is also a covariate "V" category. By default, "C", "G" and all "P" fields except date fields are marked for output with the "o" flag. With the exception of core fields which Madeline handles automatically in most cases, only fields marked for output with the "o" flag will be examined and appear in output.

The most common use of the toggle command is to toggle the output flags on or off. Occasionally you might need to change the status of a phenotype "P" field to that of a covariate "V" field. Covariate "V" fields are still recognized as phenotype "P" fields when writing formats that do not require covariates.

M>open "/m55/newtest.dbf"
      ...    ...
  1.STUDYID    Co__1   19.D20S889    Go__4   37.D20S481    Go_22  
  2.SEX        Co__2   20.D20S482    Go__5   38.D20S836    Go_23  
  3.FATHER     Co__3   21.D20S905    Go__6   39.D20S888    Go_24  
  4.MOTHER     Co__4   22.D20S115    Go__7   40.D20S886    Go_25  
  5.TWIN       Co__5   23.D20S851    Go__8   41.D20S197    Go_26  
  6.FUSION2    Po__1   24.D20S917    Go__9   42.D20S178N   Go_27  
  7.CONTROL    Po__2   25.D20S189    Go_10   43.D20S866    Go_28  
  8.CPEP       Po__3   26.D20S898    Go_11   44.D20S196    Go_29  
  9.GLU_FAST   Po__4   27.D20S114    Go_12   45.D20S857    Go_30  
 10.GLU_2H     Po__5   28.D20S912    Go_13   46.D20S480    Go_31  
 11.STUDYAGE   Po__6   29.D20S477    Go_14   47.D20S211    Go_32  
 12.LOGSI      Po__7   30.D20S874    Go_15   48.D20S840    Go_33  
 13.BMI        Po__8   31.D20S195    Go_16   49.D20S120    Go_34  
 14.TP         Po__9   32.D20S909    Go_17   50.D20S100    Go_35  
 15.NAFFECTE   C    +  33.D20S107    Go_18   51.D20S102    Go_36  
 16.D20S103    Go__1   34.D20S170    Go_19   52.D20S171    Go_37  
 17.D20S117    Go__2   35.D20S96     Go_20   53.D20S173    Go_38  
 18.D20S906    Go__3   36.D20S119    Go_21  
M>toggle output flags for 6-7,glu_fast,glu_2h,12-14
M>toggle covariate flag for studyage
M>list fields
  1.STUDYID    Co__1   19.D20S889    Go__4   37.D20S481    Go_22  
  2.SEX        Co__2   20.D20S482    Go__5   38.D20S836    Go_23  
  3.FATHER     Co__3   21.D20S905    Go__6   39.D20S888    Go_24  
  4.MOTHER     Co__4   22.D20S115    Go__7   40.D20S886    Go_25  
  5.TWIN       Co__5   23.D20S851    Go__8   41.D20S197    Go_26  
  6.FUSION2    P       24.D20S917    Go__9   42.D20S178N   Go_27  
  7.CONTROL    P       25.D20S189    Go_10   43.D20S866    Go_28  
  8.CPEP       Po__1   26.D20S898    Go_11   44.D20S196    Go_29  
  9.GLU_FAST   P       27.D20S114    Go_12   45.D20S857    Go_30  
 10.GLU_2H     P       28.D20S912    Go_13   46.D20S480    Go_31  
 11.STUDYAGE   Vo__2   29.D20S477    Go_14   47.D20S211    Go_32  
 12.LOGSI      P       30.D20S874    Go_15   48.D20S840    Go_33  
 13.BMI        P       31.D20S195    Go_16   49.D20S120    Go_34  
 14.TP         P       32.D20S909    Go_17   50.D20S100    Go_35  
 15.NAFFECTE   C    +  33.D20S107    Go_18   51.D20S102    Go_36  
 16.D20S103    Go__1   34.D20S170    Go_19   52.D20S171    Go_37  
 17.D20S117    Go__2   35.D20S96     Go_20   53.D20S173    Go_38  
 18.D20S906    Go__3   36.D20S119    Go_21
M>
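The renumbering visible in the field list above (CPEP becomes Po__1 and STUDYAGE becomes Vo__2) can be reproduced with a short Python sketch. This is an illustration under stated assumptions, not Madeline's code: parse_field_spec and output_numbers are hypothetical names, field lists are assumed to mix 1-based indices, names, and dash-separated ranges, and covariate "V" fields are assumed to share the phenotype counter.

```python
def parse_field_spec(spec, names):
    """Expand a list such as "6-7,glu_fast,12-14" into 0-based positions."""
    lowered = [name.lower() for name in names]

    def position(token):
        token = token.strip().lower()
        # Tokens made of digits are 1-based field indices; others are names.
        return int(token) - 1 if token.isdigit() else lowered.index(token)

    positions = []
    for part in spec.split(","):
        if "-" in part:
            first, last = part.split("-", 1)
            positions.extend(range(position(first), position(last) + 1))
        else:
            positions.append(position(part))
    return positions


def output_numbers(fields):
    """Number output-flagged fields within each category; "V" counts with "P"."""
    counters = {}
    numbers = {}
    for name, category, is_output in fields:
        if not is_output:
            continue
        group = "P" if category == "V" else category
        counters[group] = counters.get(group, 0) + 1
        numbers[name] = counters[group]
    return numbers


names = ["STUDYID", "SEX", "FATHER", "MOTHER", "TWIN", "FUSION2", "CONTROL",
         "CPEP", "GLU_FAST", "GLU_2H", "STUDYAGE", "LOGSI", "BMI", "TP"]
fields = [[n, "C" if i < 5 else "P", True] for i, n in enumerate(names)]
for pos in parse_field_spec("6-7,glu_fast,glu_2h,12-14", names):
    fields[pos][2] = not fields[pos][2]       # toggle the output flag
fields[names.index("STUDYAGE")][1] = "V"      # toggle covariate flag
print(output_numbers(fields))
```

Only the first fourteen fields of the example table are modeled; the genotype fields are omitted because their flags do not change.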

 

(2) TOGGLE ICON FLAG[S] FOR <field_i>[,<field_j>[,<field_k>-<field_z>]]

Toggle icon flag enables you to designate one or more categorical variables to display graphically on the male and female icons of a pedigree drawing. The AffectionStatusField is toggled with the icon flag on by default. You can designate any number of additional or alternate fields for graphical display. The number of fields you select determines the number of pie-slice regions into which the icons on the drawing will be divided. Each pie-slice region will be shaded to display the categorical level of the respective variable. Fields toggled with the icon flag on are displayed in the field list with a plus sign, "+" at the end. For example:

...
15.NAFFECTE   C   +
16.HEARTCOND  N   +
...

 

When the icon flag of a field is toggled on, Madeline automatically determines how many non-missing categorical levels are present in the field:

15. NAFFECTE has 2 levels.
16. HEARTCOND has 3 levels.

 

Madeline also automatically constructs a label array for each flagged categorical variable, with entries for each level of the variable. The label arrays are used for assigning character string labels for each level of a variable when displayed on a pedigree drawing. The name of the label array is simply the name of the field variable with the word "label" appended to the end. Normally, the default entries are either the ordinals "1, 2, 3" or the letters of the alphabet "A, B, C" enumerated for each level of the categorical variable. These defaults can be changed easily:

M>list naffectelabel              <-- list the default entries
naffectelabel[0]="A"
naffectelabel[1]="B"
M>NaffecteLabel[_unaffected]="U"  <-- assign "U" as a label for unaffected individuals
M>NaffecteLabel[_affected  ]="A"  <-- assign "A" as a label for affected individuals
M>list naffectelabel              <-- list the new entries
naffectelabel[0]="U"
naffectelabel[1]="A"
M>

 

Normally only single-character labels will fit within the male or female symbols. This is especially true when more than one categorical variable is selected so that the symbols are divided into pie-slice regions. Single-character labels can be legible when as many as five categorical variables have been selected. Assign the null string, "", to each element of an array if you do not want character labels displayed at all.

Watch out for two conditions! First, a variable with only a single non-missing categorical level may represent a problem in the database and can cause divide-by-zero errors in the Postscript drawing routines. Secondly, guard against selecting a variable with too many levels. This too may represent a database problem. In any case, the shades of gray or color used to display different levels will become indistinguishable as the number of levels increases:

15. NAFFECTE has 1 level.     <-- Possible problem in the database.  Why only one level?
16. HEARTCOND has 359 levels. <-- Whoa!  Too many levels!

 

When drawing in black and white, Madeline assigns shades of gray for each level of an icon field variable, using white for the first level, and black for the last level. When drawing in color, Madeline selects alternating shades of red, green, and blue for each level of a variable. See Fig. 1.18 for an example pedigree drawing displaying two categorical variables graphically.
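One way to realize the white-to-black rule is a linear ramp over the levels. The Python sketch below is an assumption about how such shades could be computed, not a description of Madeline's drawing code; gray_levels is a hypothetical name. Values follow the PostScript setgray convention, in which 1 is white and 0 is black. The sketch also suggests why a variable with a single level is troublesome: a linear ramp divides by the number of levels minus one.

```python
def gray_levels(n_levels):
    """Return one gray value per level, from white (1.0) to black (0.0)."""
    if n_levels < 2:
        # With one level the ramp below would divide by zero.
        raise ValueError("at least two categorical levels are required")
    return [1.0 - i / (n_levels - 1) for i in range(n_levels)]


print(gray_levels(2))
print(gray_levels(3))
```

As the number of levels grows, adjacent values move closer together, which is consistent with the warning above that shades become indistinguishable for variables with many levels.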

TRANSPOSE
TRANSPOSE <INPUT_FILE> [TO <OUTPUT_FILE>]

The transpose command converts a marker database containing the alleles of a given marker for a given individual in a given family to a table which contains a single record for each individual, marker names as column headings, and genotypes as field data. This command is designed for converting the output from genotyping machine software (such as ABI Genotyper) into a database form compatible with Madeline's pedigree database model:

   INPUT:
 ---------------
 
 FAMID
 INDIVIDUAL
 MARKERNAME
 ALLELE1
 ALLELE2
 DISCARD

 0001 0001-100 d20s100  112 114  G323
 0001 0001-100 d20s898  120 122  G364
 0001 0001-100 d20s129   98 100  G311
 0002 0002-100 d20s100  116 116  G112
 0002 0002-100 d20s898  115 118  G918
 0002 0002-100 d20s129   94  96  G454
 .    .        .          .   .  .
 .    .        .          .   .  .
 .    .        .          .   .  .


   OUTPUT:
 ---------------

 FAMID
 INDIVIDUAL
 D20S100
 D20S898
 D20S129

 0001 0001-100  112/114  120/122   98/100
 0002 0002-100  116/116  115/118    94/96
 .    .         .        .        .
 .    .         .        .        .
 .    .         .        .        .

 

Before running transpose, be sure to specify the names of the three required key fields (FamilyIDField, IndividualIDField, and MarkerField) and the two allele fields (Allele1Field and Allele2Field). The input table may contain additional fields, but only FamilyIDField, IndividualIDField, and the marker fields will appear in the output database. For example, as shown above, the "DISCARD" field of the input table does not appear in output.

If an output file name is not provided, Madeline creates an output file with a ".trp" extension for the Mbase flat file output and a ".tfh" extension for the binary Mbase header if a .mfh file already exists.

Core family structure information fields (gender, parental IDs, twin status) or other phenotype or genotype data can be added to the transposed genotype table using the merge command.

M>FamilyIDField     = "FAMID"
M>IndividualIDField = "INDIVIDUAL"
M>MarkerField       = "MARKERNAME"
M>Allele1Field      = "ALLELE1"
M>Allele2Field      = "ALLELE2"
M>transpose "marker.mfh" to "genotypes.dat"
Transposing "marker.mfh" to "genotypes.dat"
Transposed file created.
M>
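The reshaping that transpose performs can be sketched in Python. This is an illustration, not Madeline's implementation: the function name transpose is reused here only for readability, rows are assumed to arrive as tuples in the input column order shown earlier, marker names are upper-cased to match the output example, and a missing genotype is written as "." (an assumption based on the missing-value convention described for flat files).

```python
def transpose(rows):
    """Turn one-row-per-marker records into one-row-per-individual records."""
    markers = []
    table = {}
    for famid, individual, marker, allele1, allele2, *_ in rows:
        name = marker.upper()
        if name not in markers:
            markers.append(name)
        # Extra input fields (such as DISCARD) are captured by *_ and dropped.
        table.setdefault((famid, individual), {})[name] = f"{allele1}/{allele2}"
    header = ["FAMID", "INDIVIDUAL"] + markers
    records = [
        [famid, individual] + [genotypes.get(m, ".") for m in markers]
        for (famid, individual), genotypes in table.items()
    ]
    return header, records


rows = [
    ("0001", "0001-100", "d20s100", "112", "114", "G323"),
    ("0001", "0001-100", "d20s898", "120", "122", "G364"),
    ("0002", "0002-100", "d20s100", "116", "116", "G112"),
]
header, records = transpose(rows)
print(header)
for record in records:
    print(record)
```

In this shortened input, individual 0002-100 has no row for the second marker, so that genotype is written as ".".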

 

TURN|SET
(1) SET|TURN AUTOEXCLUDE|SAVEALLELEFREQUENCIES|DIVIDEDPAGES|COLOR|HAPLOTYPEDISPLAY ON|OFF
(2) SET|TURN FIELD ORDER TO <Field_i>[,<Field_j>[,<Field_k>-<Field_p>]]
(3) SET|TURN LANGUAGE TO [ENGLISH|FRENCH|FINNISH|SUOMI]
(4) SET|TURN ORIENTATION TO [LANDSCAPE|PORTRAIT|AUTOMATIC|MULTIPAGE]
(5) SET|TURN PAPERSIZE TO [USLETTER|USLEGAL|A4|A4LONG|A4SUPER]
(6) SET|TURN PAPERMARGIN TO <nValue>

In Madeline, the set and turn commands are identical. Normally, of course, one will select the command verb that makes the most sense in English. Descriptions of all forms of the command follow.

(1) SET|TURN AUTOEXCLUDE|SAVEALLELEFREQUENCIES|DIVIDEDPAGES|COLOR|HAPLOTYPEDISPLAY ON|OFF

Turns boolean state flags on or off. The effect and default state of each boolean flag is described below. See Table 5.5 for a tabular summary of Madeline’s boolean state flags.

TURN AutoExclude [ON|OFF]

AutoExclude instructs Madeline, on a subsequent write command, to automatically exclude pedigrees with insufficient data. If AutoExclude is off, no pedigrees will be excluded. AutoExclude is on by default. There are few reasons to turn AutoExclude off.

M>turn autoexclude off
Autoexclude is now off
M>

 

TURN SaveAlleleFrequencies [ON|OFF]

SaveAlleleFrequencies instructs Madeline, on a subsequent open command, to retain the current set of allele frequencies, estimated from the current pedigree table, rather than calculating new frequencies from the new table. In order to use SaveAlleleFrequencies, the subsequent table must have the same number of fields prior to the set of genotype fields, and the genotype fields must match exactly in name and order (Table 3.2). SaveAlleleFrequencies is off by default.

Table 3.2. Field requirements for using SaveAlleleFrequencies. The subsequent database must have the same number of fields prior to the set of genotype fields, and the genotype fields must match exactly in name and order.

Field set in first table used to     Field set in second table opened
calculate allele frequencies         with SaveAlleleFrequencies on
--------------------------------     --------------------------------
 1. STUDYID  Co__1                    1. STUDYID  Co__1
 2. FATHER   Co__2                    2. FATHER   Co__2
 3. MOTHER   Co__3                    3. MOTHER   Co__3
 4. SEX      Co__4                    4. SEX      Co__4
 5. TWIN     Co__5                    5. TWIN     Co__5
 6. LOGSI    Po__1                    6. DBP      Po__1
 7. BMI      Po__2                    7. SBP      Po__2
 8. GLU_FAST Po__3                    8. GLU_2H   Po__3
     (Same number of fields preceding genotype fields;
      fields need not match)
 9. D20S103  Go__1                    9. D20S103  Go__1
10. D20S906  Go__2                   10. D20S906  Go__2
11. D20S889  Go__3                   11. D20S889  Go__3
     (Genotype fields match in order and name)

 

M>open '\test\fullset.ssd'
  ...
M>turn saveallelefrequencies on
SaveAlleleFrequencies is now on
M>open '\test\subset.ssd'
Existing allele frequency information has been saved...
Closing database "\test\fullset.ssd"...
Removing old pedigrees...
Database "\test\subset.ssd" opened with     1,856 records
  ...
M>
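The two requirements summarized in Table 3.2 can be expressed as a small check. The Python below is a sketch, not Madeline's code; can_reuse_frequencies and split_at_first_genotype are hypothetical names, and the sketch assumes that every field from the first genotype field onward is a genotype field.

```python
def split_at_first_genotype(fields):
    """Split (name, category) pairs into leading fields and genotype names."""
    for i, (_, category) in enumerate(fields):
        if category == "G":
            return fields[:i], [name for name, _ in fields[i:]]
    return fields, []


def can_reuse_frequencies(first, second):
    """True if saved allele frequencies from `first` fit the table `second`."""
    lead1, geno1 = split_at_first_genotype(first)
    lead2, geno2 = split_at_first_genotype(second)
    # Leading fields need only agree in number; genotype fields must
    # agree in both name and order.
    return len(lead1) == len(lead2) and geno1 == geno2


first = [("STUDYID", "C"), ("LOGSI", "P"), ("D20S103", "G"), ("D20S906", "G")]
second = [("STUDYID", "C"), ("DBP", "P"), ("D20S103", "G"), ("D20S906", "G")]
print(can_reuse_frequencies(first, second))
```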

 

TURN DividedPages [ON|OFF]

DividedPages controls how a pedigree with multiple founding groups is logically partitioned for drawing when draw is invoked. A founding group consists of an original founder and his or her one to many spouses. When DividedPages is on (the default), each subtree of a pedigree originating with a different founding group is drawn on a separate virtual page. The transfer of a drawing from a virtual page to one or more physical pages is still governed by the settings of orientation, the number of data fields displayed on the drawing, and the size of the pedigree. When DividedPages is off, a multiple founding group pedigree is drawn in its entirety on a single virtual page. DividedPages provides one way to logically partition a large complicated pedigree to make it easier to view.

NOTE: Due to the incomplete state of the drawing algorithms in versions 0.90 and 0.91, DividedPages is effectively always on. This toggle is provided to support the augmented feature set of version 1.0.

TURN COLOR [ON|OFF]

Pedigrees are printed in color when color is on (the default), and in black-and-white otherwise. This toggle affects a single boolean flag located near the top of the Postscript file that Madeline generates. Thus, any saved pedigree drawing can be printed in color or in black-and-white by simply changing the boolean INCOLOR flag in the Postscript file from true to false or vice-versa:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
% Boolean toggle for color shading/printing:
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
/INCOLOR true def

TURN HaplotypeDisplay [ON|OFF]

When HaplotypeDisplay is on, genotypes are shown with alleles delimited by "|" on pedigree drawings. When off, genotypes are shown delimited by "/". Off is the default setting.

NOTE: Madeline is not capable of inferring haplotypes and has no way of knowing whether alleles in a pedigree file are arranged to show haplotypes or not. If alleles in a pedigree file are arranged to show known or inferred haplotypes, then Madeline provides a convenient way to draw pedigrees of such data using the set HaplotypeDisplay on and draw pedigree commands. If haplotypes are not known, Madeline cannot help you.

(2) SET|TURN FIELD ORDER [TO <Field_i>[,<Field_j>[,<Field_k>-<Field_p>]] |AUTOMATICALLY]

Reorder fields by specifying field names or field indices separated by commas, or a range of contiguous field names or indices separated by a dash. Fields are ordered within their category (i.e., "C","P", or "G"). Covariate "V" fields are simply a subset of phenotype "P" fields and, thus, are numbered along with phenotype fields.

If within a category you specify only m of n fields (m<n, where n is either the number of "C", "P", or "G" fields), then the fields you specify will be numbered in the sequence you specify from 1 to m, and the remaining output fields will be numbered from m+1 to n in the physical order they occur in the database.

When specifying field order, you can mix and match any sequence of "C", "P", and "G" fields within a single set command. Specified fields not already toggled for output are ignored. Issuing a load command after a set field order command resets the order of all "Co" and "Po" fields, while "Go" field ordering is set to the map order. To avoid this behaviour, issue load prior to any set field order command.

Madeline controls the order of core fields when writing most output formats. Reordering of core fields is recognized by the view record and draw pedigree commands, and by the CommaDelimited and SpaceDelimited write formats.

Issuing set field order without the to clause resets the order of "Co" and "Po" fields to their natural order, while the order of "Go" fields will depend on whether a map database is loaded or not.

M>open '\test\nt2.dbf'
  ...    ...
  1.STUDYID    Co__1   19.D20S889    Go__5   36.D20S836    Go_22  
  2.SEX        Co__2   20.D20S103    Go__6   37.D20S888    Go_23  
  3.FATHER     Co__3   21.D20S115    Go__7   38.D20S886    Go_24  
  4.MOTHER     Co__4   22.D20S851    Go__8   39.D20S197    Go_25  
  5.TWIN       Co__5   23.D20S912    Go__9   40.D20S178N   Go_26  
  6.CPEP       Po__1   24.D20S917    Go_10   41.D20S866    Go_27  
  7.GLU_FAST   Po__2   25.D20S898    Go_11   42.D20S196    Go_28  
  8.GLU_2H     Po__3   26.D20S114    Go_12   43.D20S857    Go_29  
  9.STUDYAGE   Po__4   27.D20S477    Go_13   44.D20S480    Go_30  
 10.LOGSI      Po__5   28.D20S874    Go_14   45.D20S211    Go_31  
 11.BMI        Po__6   29.D20S195    Go_15   46.D20S120    Go_32  
 12.TP         Po__7   30.D20S909    Go_16   47.D20S102    Go_33  
 13.NAFFECTE   C       31.D20S107    Go_17   48.D20S173    Go_34  
 14.ISTYPED    Po__8   32.D20S170    Go_18   49.D20S171    Go_35  
 15.D20S117    Go__1   33.D20S96     Go_19   50.D20S840    Go_36  
 16.D20S906    Go__2   34.D20S119    Go_20   51.D20S189    Go_37  
 17.D20S482    Go__3   35.D20S481    Go_21   52.D20S100    Go_38  
 18.D20S905    Go__4  
M>load 'k:\emap\emap.dbf'
Marker maps based on k:\emap\emap.dbf are now installed.
Note: genotype fields ordered according to current map
Field ordering now set based on k:\emap\emap.dbf.
M>list fields
  1.STUDYID    Co__1   19.D20S889    Go__4   36.D20S836    Go_23  
  2.SEX        Co__2   20.D20S103    Go__1   37.D20S888    Go_24  
  3.FATHER     Co__3   21.D20S115    Go__7   38.D20S886    Go_25  
  4.MOTHER     Co__4   22.D20S851    Go__8   39.D20S197    Go_26  
  5.TWIN       Co__5   23.D20S912    Go_13   40.D20S178N   Go_27  
  6.CPEP       Po__1   24.D20S917    Go__9   41.D20S866    Go_28  
  7.GLU_FAST   Po__2   25.D20S898    Go_11   42.D20S196    Go_29  
  8.GLU_2H     Po__3   26.D20S114    Go_12   43.D20S857    Go_30  
  9.STUDYAGE   Po__4   27.D20S477    Go_14   44.D20S480    Go_31  
 10.LOGSI      Po__5   28.D20S874    Go_15   45.D20S211    Go_32  
 11.BMI        Po__6   29.D20S195    Go_16   46.D20S120    Go_34  
 12.TP         Po__7   30.D20S909    Go_17   47.D20S102    Go_36  
 13.NAFFECTE   C       31.D20S107    Go_18   48.D20S173    Go_38  
 14.ISTYPED    Po__8   32.D20S170    Go_19   49.D20S171    Go_37  
 15.D20S117    Go__2   33.D20S96     Go_20   50.D20S840    Go_33  
 16.D20S906    Go__3   34.D20S119    Go_21   51.D20S189    Go_10  
 17.D20S482    Go__5   35.D20S481    Go_22   52.D20S100    Go_35  
 18.D20S905    Go__6  
M>set field order to father,mother,logsi,studyage,sex,twin,tp,bmi
M>list fields
  1.STUDYID    Co__5   19.D20S889    Go__4   36.D20S836    Go_23  
  2.SEX        Co__3   20.D20S103    Go__1   37.D20S888    Go_24  
  3.FATHER     Co__1   21.D20S115    Go__7   38.D20S886    Go_25  
  4.MOTHER     Co__2   22.D20S851    Go__8   39.D20S197    Go_26  
  5.TWIN       Co__4   23.D20S912    Go_13   40.D20S178N   Go_27  
  6.CPEP       Po__5   24.D20S917    Go__9   41.D20S866    Go_28  
  7.GLU_FAST   Po__6   25.D20S898    Go_11   42.D20S196    Go_29  
  8.GLU_2H     Po__7   26.D20S114    Go_12   43.D20S857    Go_30  
  9.STUDYAGE   Po__2   27.D20S477    Go_14   44.D20S480    Go_31  
 10.LOGSI      Po__1   28.D20S874    Go_15   45.D20S211    Go_32  
 11.BMI        Po__4   29.D20S195    Go_16   46.D20S120    Go_34  
 12.TP         Po__3   30.D20S909    Go_17   47.D20S102    Go_36  
 13.NAFFECTE   C       31.D20S107    Go_18   48.D20S173    Go_38  
 14.ISTYPED    Po__8   32.D20S170    Go_19   49.D20S171    Go_37  
 15.D20S117    Go__2   33.D20S96     Go_20   50.D20S840    Go_33  
 16.D20S906    Go__3   34.D20S119    Go_21   51.D20S189    Go_10  
 17.D20S482    Go__5   35.D20S481    Go_22   52.D20S100    Go_35  
 18.D20S905    Go__6  
M>

 

(3) SET|TURN LANGUAGE TO ENGLISH|FINNISH|SUOMI|FRENCH

Sets the language and linguistic conventions used for displaying and entering dates and times in Madeline. FINNISH and SUOMI are identical.

M>set language to Suomi
+-----------------------+-----------+-----------------------------------------+
| Variable or State Flag| Setting   | Description                             |
+-----------------------+-----------+-----------------------------------------+
| AutoExclude           | ON        | Exclude pedigrees automatically         |
| Color                 | ON        | Draw pedigrees in color                 |
| DividedDrawings       | ON        | Paginate drawings by founding group     |
| EvaluationInterval    |   0.50 cM | Value to write to control file.         |
| Help                  | HTML      | Extended HTML help documentation        |
| Language              | FINNISH   | Language convention used for date, time |
| OffEndDistance        |  10.00 cM | Value to write to control file          |
| Orientation           | AUTOMATIC | Automatic based on drawing dimensions   |
| PaperMargin           | 1.00 cm   | Margin (in cm) on all four sides        |
| PaperSize             | USLETTER  | 8.5 x 11.0 inches                       |
| SaveAlleleFrequencies | OFF       | Calculate new frequencies on next OPEN  |
| Time                  | Current   | perjantai 8.10.1999, 10:00              |
| Verbosity             | VERBOSE   | All messages are printed to the console |
+-----------------------+-----------+-----------------------------------------+
M>show {8.10.1999}
{perjantai 8.10.1999}
M>set language to french
+-----------------------+-----------+-----------------------------------------+
| Variable or State Flag| Setting   | Description                             |
+-----------------------+-----------+-----------------------------------------+
| AutoExclude           | ON        | Exclude pedigrees automatically         |
| Color                 | ON        | Draw pedigrees in color                 |
| DividedDrawings       | ON        | Paginate drawings by founding group     |
| EvaluationInterval    |   0.50 cM | Value to write to control file.         |
| Help                  | HTML      | Extended HTML help documentation        |
| Language              | FRENCH    | Language convention used for date, time |
| OffEndDistance        |  10.00 cM | Value to write to control file          |
| Orientation           | AUTOMATIC | Automatic based on drawing dimensions   |
| PaperMargin           | 1.00 cm   | Margin (in cm) on all four sides        |
| PaperSize             | USLETTER  | 8.5 x 11.0 inches                       |
| SaveAlleleFrequencies | OFF       | Calculate new frequencies on next OPEN  |
| Time                  | Current   | 10:01 le vendredi  8 octobre 1999       |
| Verbosity             | VERBOSE   | All messages are printed to the console |
+-----------------------+-----------+-----------------------------------------+
M>show {8.10.1999}
{le vendredi  8 octobre 1999}
M>

 

When entering dates in curly brackets, Madeline applies only the ordering rules (i.e., month before day vs. day before month), capitalization rules, and month names of the current language setting. For example, you cannot enter English month names when the language is set to French:

M>set language to french
...
M>// invalid date error because month is in English:
M>show {11 December 1612}
{}
M>// error because of inappropriate capitalization:
M>show {11 Decembre 1612}
{}
M>// the following is correct:
M>show {11 decembre 1612}
{le mardi 11 decembre 1612}
M>// this is OK too:
M>show {11.12.1612}
{le mardi 11 decembre 1612}
M>
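
The language-specific lookup shown in the session above can be sketched in Python. This is an illustration only, not Madeline's implementation: the function names and the unaccented French month and weekday tables are assumptions, and the sketch handles only the day-before-month ordering used when the language is French.

```python
from datetime import date

# Assumed tables for illustration (unaccented, lowercase, as in the session
# above); Madeline's internal tables are not documented here.
FRENCH_MONTHS = ["janvier", "fevrier", "mars", "avril", "mai", "juin", "juillet",
                 "aout", "septembre", "octobre", "novembre", "decembre"]
FRENCH_DAYS = ["lundi", "mardi", "mercredi", "jeudi", "vendredi", "samedi",
               "dimanche"]


def parse_french_date(text):
    """Parse 'D month YYYY' or 'D.M.YYYY'; return None for an invalid date."""
    parts = text.replace(".", " ").split()
    if len(parts) != 3:
        return None
    day, month, year = parts
    if month.isdigit():
        month_no = int(month)
    elif month in FRENCH_MONTHS:  # exact, case-sensitive match only
        month_no = FRENCH_MONTHS.index(month) + 1
    else:
        return None  # e.g. an English or wrongly capitalized month name
    try:
        return date(int(year), month_no, int(day))
    except ValueError:
        return None


def show_french_date(d):
    """Format a date in the style of the French output shown above."""
    return f"le {FRENCH_DAYS[d.weekday()]} {d.day:2d} {FRENCH_MONTHS[d.month - 1]} {d.year}"


print(parse_french_date("11 December 1612"))  # rejected: English month name
print(show_french_date(parse_french_date("11.12.1612")))
```

Python's date type uses the proleptic Gregorian calendar, which agrees with the weekday Madeline prints for the 1612 example.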

 

(4) SET|TURN ORIENTATION TO LANDSCAPE|PORTRAIT|AUTOMATIC|MULTIPAGE

In landscape and portrait modes, Madeline resizes a pedigree drawing to fit on a single physical page in the desired orientation. For large pedigrees, the reduction necessary to fit a drawing on a single page may result in labels that are too small to read. In general, the default automatic or MultiPage mode is a better choice. The keywords automatic and MultiPage are identical.

When orientation is set to automatic, Madeline chooses the best orientation for a drawing based on its height and width. If Madeline determines that the reduction necessary to fit the drawing on a single page may make the labels difficult to read or illegible, the program inserts additional code into the drawing file to print the drawing centered across multiple physical pages. The program selects the number and orientation of physical pages that require the least amount of rescaling of a drawing. A schematic index is produced as a guide for assembling the drawing after printing. See DRAW for more information.

M>set orientation to landscape
...
M>
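
The automatic page selection described above might be approximated as follows. This is a sketch under stated assumptions, not Madeline's actual algorithm: the function name choose_layout, the min_scale legibility threshold, the max_grid limit, and the rule "use the fewest pages that keep the scale acceptable" are all hypothetical. The sketch also ignores the overlap between adjoining pages.

```python
from itertools import product


def choose_layout(draw_w, draw_h, page_w=21.59, page_h=27.94, margin=1.0,
                  min_scale=0.5, max_grid=6):
    """Return (orientation, cols, rows, scale), or None if nothing fits.

    All lengths are in centimeters; the defaults describe US Letter paper
    with a one-centimeter margin. min_scale is an assumed threshold below
    which labels are treated as too small to read.
    """
    best = None
    for orientation, (pw, ph) in (("portrait", (page_w, page_h)),
                                  ("landscape", (page_h, page_w))):
        usable_w, usable_h = pw - 2 * margin, ph - 2 * margin
        for cols, rows in product(range(1, max_grid + 1), repeat=2):
            # Never enlarge a drawing; only shrink it when it does not fit.
            scale = min(1.0, cols * usable_w / draw_w, rows * usable_h / draw_h)
            if scale < min_scale:
                continue
            key = (cols * rows, -scale)  # fewest pages first, then least shrinkage
            if best is None or key < best[0]:
                best = (key, (orientation, cols, rows, scale))
    return best[1] if best else None


print(choose_layout(25, 10))   # a wide drawing: one landscape page
print(choose_layout(100, 20))  # a very wide drawing: several pages
```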

 

(5) SET|TURN PAPERSIZE TO USLETTER|USLEGAL|A4|A4LONG|A4SUPER

Sets the paper size to the specified standard printer paper size. Madeline does not send special commands to multi-tray printers, so be sure that the correct paper size is in the selected printer tray.

M>set papersize to usletter
...
M>

 

(6) SET|TURN PAPERMARGIN TO <nValue>

Sets the paper margins on all four sides to the specified value in centimeters. The default margin size is one centimeter. If a multiple-page drawing is produced, not only will the outer-edge margins be of the specified width, but also the drawings will overlap by exactly the margin width along the joining edges of the drawing. Do not set the margins to much less than one centimeter because most printers cannot print out to the physical edge of the paper.

M>set papermargin to 1.5
+-----------------------+-----------+-----------------------------------------+
| Variable or State Flag| Setting   | Description                             |
+-----------------------+-----------+-----------------------------------------+
| AutoExclude           | ON        | Exclude pedigrees automatically         |
| Color                 | ON        | Draw pedigrees in color                 |
| DividedDrawings       | ON        | Paginate drawings by founding group     |
| EvaluationInterval    |   0.50 cM | Value to write to control file.         |
| Help                  | HTML      | Extended HTML help documentation        |
| Language              | FRENCH    | Language convention used for date, time |
| OffEndDistance        |  10.00 cM | Value to write to control file          |
| Orientation           | AUTOMATIC | Automatic based on drawing dimensions   |
| PaperMargin           | 1.50 cm   | Margin (in cm) on all four sides        |
| PaperSize             | USLETTER  | 8.5 x 11.0 inches                       |
| SaveAlleleFrequencies | OFF       | Calculate new frequencies on next OPEN  |
| Time                  | Current   | 10:39 le vendredi  8 octobre 1999       |
| Verbosity             | VERBOSE   | All messages are printed to the console |
+-----------------------+-----------+-----------------------------------------+
M>

 

UNEXCLUDE
UNEXCLUDE [FAMILIES] FOR <Lexpression>

Includes previously excluded individuals and pedigrees in output. If unexclude families is used, all individuals who match the criteria, together with their spouse(s) and descendants, who were excluded by a previous exclude families or other exclude command are included again. See EXCLUDE.

M>verbose
Madeline is now in verbose mode.
M>exclude for _famid=='0172'
0172-100 has been marked for exclusion
0172-401 has been marked for exclusion
0172-402 has been marked for exclusion
0172-500 has been marked for exclusion
0172-601 has been marked for exclusion
0172-602 has been marked for exclusion
0172-603 has been marked for exclusion
0172-604 has been marked for exclusion
0172-605 has been marked for exclusion
M>unexclude for _famid=='0172'
0172-100 has been marked for inclusion
0172-401 has been marked for inclusion
0172-402 has been marked for inclusion
0172-500 has been marked for inclusion
0172-601 has been marked for inclusion
0172-602 has been marked for inclusion
0172-603 has been marked for inclusion
0172-604 has been marked for inclusion
0172-605 has been marked for inclusion
M>
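
The families keyword extends a match to spouses and descendants. The following Python sketch shows one way such an expansion could be computed. It is an illustration only: the function name, the parents mapping, and the decision to treat a "spouse" as the co-parent of a shared child are assumptions, and spouses of descendants are not added.

```python
def expand_with_family(matched, parents):
    """Return the matched individuals plus their spouses and descendants.

    parents maps each individual ID to a (father ID, mother ID) pair,
    with None for an unknown or founder parent.
    """
    # Invert the parent pointers so that offspring can be looked up directly.
    children = {}
    for child, pair in parents.items():
        for parent in pair:
            if parent is not None:
                children.setdefault(parent, set()).add(child)

    # Spouses: the other parent of any child of a matched individual.
    selected = set(matched)
    for person in matched:
        for child in children.get(person, ()):
            selected.update(p for p in parents[child] if p is not None)

    # Descendants: walk down the child pointers from each matched individual.
    seen = set(matched)
    stack = list(matched)
    while stack:
        person = stack.pop()
        for child in children.get(person, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return selected | seen


# Hypothetical pedigree using FUSION-style IDs.
pedigree = {
    "0172-200": (None, None),
    "0172-300": (None, None),
    "0172-100": ("0172-200", "0172-300"),
    "0172-401": ("0172-200", "0172-300"),
    "0172-500": (None, None),
    "0172-601": ("0172-100", "0172-500"),
    "0172-602": ("0172-100", "0172-500"),
}
print(sorted(expand_with_family({"0172-100"}, pedigree)))
```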

 

VERBOSE
VERBOSE

Prints all summary and detail messages to the screen. See QUIET, SILENT.

M>verbose
Madeline is now in verbose mode.
M>

 

VIEW
(1) VIEW [RECORD][FOR <Lexpression>]
(2) VIEW DISTINCT VALUES OF <cField_A>[,<cField_B>[,<cField_C>-<cField_Z>]]

The view command has two forms, described below:

(1) VIEW [RECORD][FOR <Lexpression>]

When view is used without the record keyword, only the IndividualID and database record number of the individual, if in the database, are shown. If the record keyword is included, then those fields in the database currently toggled on for output are also shown. If view record is typed without a for query expression, only the current record is shown. View for <Lexpression> queries the IDs or records of a subset of the data. Upon completion, view prints a tally of the records matching the criteria. For examples of using Madeline’s internal references in view queries, see Table 5.4.

M>open 'chr8.dbf'
   ...
  1.FAMID      Co__1   10.D8S504     Go__1   19.D8S1757    Go_10
  2.STUDYID    Co__2   11.D8S550     Go__2   20.D8S270     Go_11
  3.SEX        Co__3   12.D8S258     Go__3   21.D8S1778    Go_12
  4.FATHER     Co__4   13.D8S1771    Go__4   22.D8S276     Go_13
  5.MOTHER     Co__5   14.D8S1820    Go__5   23.GATA101F01 Go_14
  6.TWIN       Co__6   15.D8S283     Go__6   24.D8S514     Go_15
  7.BMI        Po__1   16.D8S285     Go__7   25.D8S284     Go_16
  8.NAFFECTE   Co__7+  17.D8S260     Go__8   26.D8S534     Go_17
  9.STUDYAGE   Po__2   18.D8S530     Go__9   27.D8S1836    Go_18
M>go 197
M>view record
CORE FIELDS:
3015 3015-602 M 3015-100 3015-500 .     0
PHENOTYPE FIELDS:
   23.15242297    41.1663
GENOTYPE FIELDS:
129/139       200/206       153/153       297/301       112/112
119/121       319/319       203/205       222/238       292/296
110/110       202/209       77/79         227/227       210/215
281/294       178/209       139/146
M>
M>view for famid="3348"
3348+100 in 1348 (rec. no.  4839)
3348+401 in 1348 (rec. no.  4840)
3348+402 in 1348 (rec. no.  4841)
3348-200 in 1348 (not in database)
3348-300 in 1348 (not in database)

5 individuals in 1 pedigree matched as follows:

Individuals ..............          5
 + In database ...........          3
 |  + Attached ...........          3
 |  + Childless spouses ..          0
 |  + Unattached .........          0
 + Not in database .......          2
M>
M>toggle output flags for 3-9,13-27
M>view record for famid="0482"
0482+402 in 0482 (rec. no.  2863)
CORE FIELDS:
0482 0482+402
GENOTYPE FIELDS:
............. ............. .............
0482-100 in 0482 (rec. no.  2864)
CORE FIELDS:
0482 0482-100
GENOTYPE FIELDS:
133/139       185/206       150/153
0482-200 in 0482 (not in database)
-- not in database --
0482-300 in 0482 (not in database)
-- not in database --
0482-401 in 0482 (rec. no.  2865)
CORE FIELDS:
0482 0482-401
GENOTYPE FIELDS:
............. 185/193       .............

5 individuals in 1 pedigree matched as follows:

Individuals ..............          5
 + In database ...........          3
 |  + Attached ...........          3
 |  + Childless spouses ..          0
 |  + Unattached .........          0
 + Not in database .......          2
10 WARNINGS M>
M>

 

(2) VIEW DISTINCT VALUES OF <cField_A>[,<cField_B>[,<cField_C>-<cField_Z>]]

View a histogram of the distinct values in a field or set of fields. A list of field names or field indices may be specified separated by commas. A range may be specified by separating the first and last field in a range with a dash. Madeline reports the number of non-missing levels of a variable. The number of missing cases is printed at the end of the list.

M>view distinct values of naffecte, 8

7. NAFFECTE has 2 levels:

Level  Value Cases
------ ----- -----
    1.     0  1336
    2.     1  1514

    .. .....  2995 missing values in database

8. D20S103 has 27 levels:

Level  Value         Cases
------ -----         -----
    1. 103/103           1
    2. 89/103           28
    3. 89/89            64
    4. 89/91            11
    5. 89/93           202
    6. 89/95           399
    7. 89/97           279
    8. 89/99            20
    9. 91/103            1
   10. 91/93             9
   11. 91/95            25
   12. 91/97            18
   13. 93/103           59
   14. 93/93           235
   15. 93/95           871
   16. 93/97           549
   17. 93/99            38
   18. 95/102            1
   19. 95/103          108
   20. 95/95           784
   21. 95/97          1046
   22. 95/99            39
   23. 97/103           65
   24. 97/97           341
   25. 97/99            30
   26. 99/103            2
   27. 99/99             1

    .. .............   619 missing values in database

M>
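
The tally produced by view distinct values can be reproduced outside Madeline with a few lines of Python. This is a sketch for checking results, not Madeline's code: the function name is hypothetical, and treating the listed strings as missing values is an assumption. Sorting the values as strings gives the same order as the listing above, where "103/103" precedes "89/103".

```python
from collections import Counter

# Assumed missing-value codes for illustration.
MISSING = frozenset({".", "/", "0/0", "0/ 0", "0/  0"})


def distinct_values(values, missing=MISSING):
    """Return ([(value, cases), ...] sorted by value, number of missing cases)."""
    values = list(values)
    counts = Counter(v for v in values if v not in missing)
    n_missing = sum(1 for v in values if v in missing)
    return sorted(counts.items()), n_missing


column = ["89/93", "93/95", "89/93", ".", "103/103", "0/0"]
levels, n_missing = distinct_values(column)
for level, (value, cases) in enumerate(levels, start=1):
    print(f"{level:5d}. {value:<13} {cases:5d}")
print(f"    .. {'.' * 13} {n_missing:5d} missing values")
```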

 

WHAT IS
WHAT IS <nExpression>|<cExpression>|<LExpression>

Shows the value of an expression. Equivalent to show command. See SHOW.

M>what is studyid
"0052-100"
M>what is d20s889
"201/216"
M>

 

WRITE
WRITE [[PEDIGREE FILE]|LOCUS FILE] TO <cFileName> IN <FormatKeyword> FORMAT

There are two forms of the write command:

Write pedigree file writes the current set of core "C" fields and flagged output fields (e.g., "Go", "Po" and "Vo" fields) to a pedigree file, <cFileName>, in the format specified by <FormatKeyword>. Write pedigree file can be shortened to just write.

Write locus file creates a locus file containing allele frequency information for the "Go" genotype fields flagged for output in the current database.

After a write command, the value of OutputFile will be <cFileName>.

For certain formats, such as the Sage and Siblink formats, Madeline automatically creates a parameter or control file at the same time the pedigree file is created. A parameter or control file contains a template for running an analysis, along with other core information required by the specific package, such as the number of families or sib pairs in the corresponding pedigree file. Madeline provides all the information it can, but it cannot guess what sort of analysis is to be conducted, what genetic model to specify, and so on. The user will need to edit the parameter file to meet specific needs. For these formats, the value of OutputParameterFile will become <cFileName> with the file extension replaced by ".par" or ".ctl", depending upon the naming conventions used for the format. Madeline prints a message informing the user that a complementary .par or .ctl file has been created.

Formats such as the Siblink format incorporate locus file information directly into the control file. In these cases, you do not need to create a separate locus file. Other packages, such as Crimap, do not require a locus file at all.

Some formats, such as the Siblink and Genehunter formats, incorporate map distance information into either a control file or locus file, and therefore require that a map database be loaded prior to the write command. Madeline will issue an error if you try to write such a file without first loading a map.

For specific formats and usage, see Section 4. Write Formats.

(1)

M>write to '\test\sibpal3.ped' in sibpal3 format
Creating associated SIBPAL parameter file called "\test\sibpal3.par"
Writing pedigree data to "\test\sibpal3.ped"
    ...
M>write locus file to '\test\sibpal3.loc' in sage format
M>

 

(2)

M>load 'k:\emap\emap.dbf'
Marker maps based on k:\emap\emap.dbf are now installed.
M>write to '\test\siblink.ped' in siblink format
Creating associated SIBLINK control/parameter file called "\test\siblink.ctl"
Writing pedigree data to "\test\siblink.ped"
...
M>
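
The naming rule for the complementary file, as seen in the two examples above, can be sketched in Python. The function name and the lookup table are hypothetical; only the two keyword-to-extension pairs shown in the examples are included, and other formats are left out rather than guessed.

```python
from pathlib import PureWindowsPath

# Extensions taken from the two examples above; other formats are omitted.
PARAMETER_SUFFIX = {"sibpal3": ".par", "siblink": ".ctl"}


def parameter_file_name(output_file, format_keyword):
    """Return the complementary file name, or None if the format is not listed."""
    suffix = PARAMETER_SUFFIX.get(format_keyword.lower())
    if suffix is None:
        return None
    # Replace only the final extension; the directory part is kept as given.
    return str(PureWindowsPath(output_file).with_suffix(suffix))


print(parameter_file_name(r"\test\sibpal3.ped", "sibpal3"))
print(parameter_file_name(r"\test\siblink.ped", "siblink"))
```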

 


Section 4
Write Formats

Introduction to Write Formats

This section describes all formats currently supported by the write pedigree file and write locus file commands.

Format keywords are listed alphabetically within each group. Some keywords can be used for creating both a pedigree file and a locus file, while others cannot. To make these distinctions clear, the following codes in parentheses appear following the keyword headings:

Code       Description
PED        Indicates a keyword used with write pedigree file only.
LOC        Indicates a keyword used with write locus file only.
PED, LOC   Indicates a keyword used to write both pedigree and locus files.
PAR/CTL    Indicates that a complementary parameter or control file is produced when the write pedigree file command is executed.

For example, the sibpal3 keyword can only be used to create a pedigree file, while the sage keyword can only be used to create the corresponding locus file, so you will see SIBPAL3 (PED) and SAGE (LOC) as headings.

Depending upon analysis package, the parameter file may be called a control file or may have some other name. In Madeline, any file containing analysis control or parameter information is referred to as a parameter file. For many formats, the parameter file also contains locus (and sometimes map) information which eliminates the need for writing a locus file in a separate step.

Any one program may have numerous settable parameters in its parameter file. For those formats that require it, Madeline provides a template parameter file that may be edited to set the parameters passed to an analysis program. Madeline provides default parameters to the extent possible, but these defaults are not necessarily the best choices for any given analysis and, in some cases, they may only be place-holder values like "0.00".

 

Generic Formats

COMMADELIMITED (PED)

Used to output a pedigree file as a comma-delimited ASCII flat file. Since this is a generic format, there is no fixed set of required core fields. It is necessary to toggle output flags on or off and set field order, as required, for core fields as well as for general phenotype and genotype fields. For readability, fields in the output are padded with white space so that columns align, just as in the SpaceDelimited format. Missing numeric values are printed using the value specified in the first cell of the numeric missing value array,  NumericMissingValue[0]. Missing character values are printed using the value specified in the first cell of the character missing value array, CharacterMissingValue[0].

M>nmv[0]=-9
M>list nmv
NMV has 1 elements:
NMV[ 0]=            -9
M>list cmv
CMV has 5 elements:
CMV[ 0]="."
CMV[ 1]="/"
CMV[ 2]="0/0"
CMV[ 3]="0/ 0"
CMV[ 4]="0/  0"
M>write pedigree to 'commadlm.dat' in commadelimited format
  . . .
M>
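
The substitution of missing values in a padded, comma-delimited record can be pictured with the following Python sketch. It is not Madeline's output routine: the function name, the field description tuples, the column widths, and the left-justified padding are assumptions made for illustration. Only the use of the first numeric and first character missing values comes from the description above.

```python
def format_record(record, fields, nmv0="-9", cmv0="."):
    """Format one record as a padded, comma-delimited line.

    fields is a list of (name, kind, width) tuples, where kind is "N" for
    numeric and "C" for character fields. A value of None is treated as
    missing and replaced by nmv0 or cmv0 according to the field kind.
    """
    cells = []
    for name, kind, width in fields:
        value = record.get(name)
        if value is None:
            value = nmv0 if kind == "N" else cmv0
        cells.append(str(value).ljust(width))  # pad so that columns align
    return ",".join(cells).rstrip()


fields = [("STUDYID", "C", 8), ("SEX", "C", 1), ("BMI", "N", 7), ("D20S103", "C", 7)]
record = {"STUDYID": "0052-100", "SEX": "M", "BMI": None, "D20S103": None}
print(format_record(record, fields))
```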

 

SPACEDELIMITED (PED)

Used to output a pedigree file as a column-aligned, space-delimited ASCII flat file. Since this is a generic format, there is no fixed set of required core fields. It is necessary to toggle output flags on or off and set field order, as required, for core fields as well as for general phenotype and genotype fields. Missing numeric values are printed using the value specified in the first cell of the numeric missing value array,  NumericMissingValue[0]. Missing character values are printed using the value specified in the first cell of the character missing value array, CharacterMissingValue[0].

M>nmv[0]=-9
M>list nmv
NMV has 1 elements:
NMV[ 0]=            -9
M>list cmv
CMV has 5 elements:
CMV[ 0]="."
CMV[ 1]="/"
CMV[ 2]="0/0"
CMV[ 3]="0/ 0"
CMV[ 4]="0/  0"
M>write pedigree to 'spacedlm.dat' in spacedelimited format
  . . .
M>

 

GENERIC (LOC)

Used to output a locus file in a generic flat-file format that provides allele frequencies as well as the raw allele counts and allele ranks (Fig. 4.1). The output file is useful for checking alleles, and for matching up allele ranks (used in formats such as Siblink and Genehunter) against the original allele labels.

D20S103 has 7 alleles:
1.  90  454/ 4296 = 0.1057
2.  92   27/ 4296 = 0.0063
3.  94  909/ 4296 = 0.2116
4.  96 1663/ 4296 = 0.3871
5.  98 1094/ 4296 = 0.2547
6. 100   44/ 4296 = 0.0102
7. 104  105/ 4296 = 0.0244

D20S117 has 14 alleles:
1. 166    4/ 4198 = 0.0010
2. 168  153/ 4198 = 0.0364
3. 176  658/ 4198 = 0.1567
4. 178   22/ 4198 = 0.0052
5. 183    9/ 4198 = 0.0021
6. 185  132/ 4198 = 0.0314
   . . .

Fig. 4.1. Excerpt from a locus file in generic format produced by Madeline.
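
The quantities in a generic locus file (allele rank, label, count, and frequency) can be recomputed from genotype strings as a cross-check. The Python sketch below is illustrative only: the function names and the missing-value codes are assumptions, and it assumes numeric allele labels separated by a slash, ranked in ascending order of label as in Fig. 4.1.

```python
from collections import Counter


def allele_table(genotypes, missing=(".", "/", "0/0")):
    """Return [(rank, label, count, frequency), ...] in ascending label order."""
    counts = Counter()
    for genotype in genotypes:
        if genotype in missing:
            continue
        counts.update(int(allele) for allele in genotype.split("/"))
    total = sum(counts.values())
    return [(rank, label, n, n / total)
            for rank, (label, n) in enumerate(sorted(counts.items()), start=1)]


def format_locus(name, table):
    """Lay the table out in the style of Fig. 4.1."""
    total = sum(n for _, _, n, _ in table)
    lines = [f"{name} has {len(table)} alleles:"]
    lines += [f"{rank}. {label:3d} {n:4d}/{total:5d} = {freq:.4f}"
              for rank, label, n, freq in table]
    return "\n".join(lines)


table = allele_table(["90/94", "94/94", ".", "90/104"])
print(format_locus("EXAMPLE", table))
```

The rank column is the ordinal that replaces the original allele label in formats such as Siblink and Genehunter, so a table like this one serves as the cross-reference between the two.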

 

Aspex Formats

The programs in Aspex use a single pedigree file format. However, each program requires a different set of control parameters in the .tcl control file. Madeline therefore provides a format keyword for each program in the package and produces well-commented .tcl template files containing the default values for all relevant parameters. Two of the programs, sib_ibd and sib_phase (Madeline’s sibibd and sibphase keywords), require marker information and so a marker map must be loaded prior to issuing the write command for these formats.

KINSHIP (PED, PAR/CTL)

Used to specify the pedigree file format along with the .tcl parameter file used by the Aspex kinship program. Madeline creates a well-commented .tcl parameter file at the same time that the pedigree file is created.

SIBIBD (PED, PAR/CTL)

Used to specify the pedigree file format along with the .tcl parameter file used by the Aspex sib_ibd program. Madeline creates a well-commented .tcl parameter file at the same time that the pedigree file is created. A map must be loaded prior to issuing the write command for this format.

SIBMAP (PED, PAR/CTL)

Used to specify the pedigree file format along with the .tcl parameter file used by the Aspex sib_map program. Madeline creates a well-commented .tcl parameter file at the same time that the pedigree file is created.

SIBPHASE (PED, PAR/CTL)

Used to specify the pedigree file format along with the .tcl parameter file used by the Aspex sib_phase program. Madeline creates a well-commented .tcl parameter file at the same time that the pedigree file is created. A map must be loaded prior to issuing the write command for this format.

SIBTDT (PED, PAR/CTL)

Used to specify the pedigree file format along with the .tcl parameter file used by the Aspex sib_tdt program. Madeline creates a well-commented .tcl parameter file at the same time that the pedigree file is created.

 

Crimap

CRIMAP (PED)

Used to specify Crimap .gen file format. Non-numeric characters in the study IDs are converted to their ASCII decimal equivalents. For example, "-" is converted to "45". Although this process lengthens the IDs, it maintains the uniqueness of each ID and provides the completely numeric IDs required by Crimap. Note that the integer value of a converted ID must not exceed the maximum integer that Crimap can represent on your platform. Crimap uses a signed long int for IDs, the maximum value of which is 2,147,483,647 on many systems. This limit could be a problem for unmodified FUSION control and trio IDs, but not for other FUSION 1 or 2 IDs.
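
The ID conversion and the size limit can be checked with a short Python sketch. The function name and the choice to raise an error on overflow are assumptions for illustration; Madeline's own handling of oversized IDs is not described here. The sample IDs are hypothetical.

```python
CRIMAP_MAX_ID = 2_147_483_647  # largest signed 32-bit integer


def crimap_id(study_id):
    """Replace each non-digit character by its ASCII decimal code."""
    digits = "".join(ch if ch in "0123456789" else str(ord(ch))
                     for ch in study_id)
    value = int(digits)
    if value > CRIMAP_MAX_ID:
        raise OverflowError(f"{study_id} converts to {value}, which is too large")
    return value


print(crimap_id("0172-100"))  # "-" becomes 45
print(crimap_id("1021+402"))  # "+" becomes 43
try:
    crimap_id("C021-100")     # "C" becomes 67, giving a ten-digit value
except OverflowError as error:
    print(error)
```

A leading control or trio letter adds two digits at the front of the ID, which is why those IDs exceed the limit while FUSION 1 and FUSION 2 IDs do not.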

Madeline's Crimap routine currently only handles pedigrees with a single pair of founders (the founders may be dummied-in, as is done for FUSION sibship pedigrees). Criteria for including a pedigree are:

These criteria were defined by Beth Hauser and Mike Boehnke to prevent biased map lengths that occur when data are available on only a single generation of individuals.

 

Genehunter

For the Genehunter formats, genehunter, genehunternpl and genehunterqtl, any pedigrees consisting of a trio of two parents and a single offspring are excluded.

When the genehunternpl keyword is used to specify a file for non-parametric analysis, the following types of pedigrees are also excluded:

GENEHUNTER (PED, PAR/CTL, LOC)

Used to specify a Genehunter pedigree file for parametric linkage analysis. Also used to create a Genehunter locus file. Madeline automatically converts the allele labels in the pedigree database to ordinals and prints these ordinal labels in both the locus and pedigree file. For cross-reference purposes, you may find it useful to also produce a generic locus file -- see GENERIC (LOC). A Genehunter locus file also contains inter-marker distance information. Be sure to load a map database prior to generating the locus file.

When used to create a pedigree file, the genehunter keyword instructs Madeline to exclude pedigrees that do not contribute to a parametric analysis. For a non-parametric analysis, use the genehunternpl keyword.

GENEHUNTERNPL (PED, PAR/CTL)

Used to specify a Genehunter pedigree file for non-parametric linkage analysis. Pedigrees that cannot be used or do not contribute to a non-parametric analysis will be excluded. For a parametric linkage analysis, use the genehunter keyword. To create the corresponding locus file, use the genehunter keyword. See the introduction to the Genehunter formats above for Madeline’s exclusion rules for this format.

GENEHUNTERQTL (PED, PAR/CTL)

Used to specify a Genehunter pedigree file for quantitative trait linkage analysis. Pedigrees are excluded using the same rule as for the genehunter keyword used for a parametric analysis file. Using genehunterqtl differs from using the genehunter keyword in that the complementary control file is customized for a quantitative trait linkage analysis. To create the corresponding locus file, use the genehunter keyword.

 

Linkage Disequilibrium (LDEQ) Formats

For linkage disequilibrium analyses, Madeline selects a single parent-offspring trio providing the most genetic information possible from each pedigree. The output file format is a flat file similar to that produced by the generic SpaceDelimited format. In addition to toggling the genotype fields required for output, the user must also designate which core fields are required, and the order in which the core fields are required, prior to executing the write command. Note that the AffectionStatusField is required in output. The three options for linkage disequilibrium analyses are presented below.

LDEQMARKER (PED)

For the LDEQMARKER format, Madeline selects a trio providing the most information for a linkage disequilibrium analysis without regard to the affection status of the three individuals in the trio.

LDEQAFFECTEDSPOUSE (PED)

For the LDEQAFFECTEDSPOUSE format, Madeline selects a trio providing the most information for a linkage disequilibrium analysis with the additional condition that at least one of the parents must be affected. The status of the other parent and offspring can be affected, unaffected, or unknown (missing).

LDEQTDT (PED)

For the LDEQTDT format, Madeline selects a trio providing the most information for a linkage disequilibrium analysis with the additional condition that the offspring must be affected. The status of the two parents can be affected, unaffected, or unknown (missing).
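
The affection-status conditions for the three LDEQ keywords can be summarized as a small predicate. The Python sketch below covers only those conditions: the function name and the status encoding are assumptions, and the step in which Madeline picks the most informative trio from each pedigree is not modeled.

```python
AFFECTED, UNAFFECTED, UNKNOWN = "affected", "unaffected", None


def trio_qualifies(format_keyword, father, mother, offspring):
    """Apply the affection-status condition of the given LDEQ format."""
    keyword = format_keyword.lower()
    if keyword == "ldeqmarker":
        return True                            # status is ignored
    if keyword == "ldeqaffectedspouse":
        return AFFECTED in (father, mother)    # at least one affected parent
    if keyword == "ldeqtdt":
        return offspring == AFFECTED           # offspring must be affected
    raise ValueError(f"not an LDEQ format keyword: {format_keyword}")


print(trio_qualifies("ldeqtdt", UNKNOWN, UNAFFECTED, AFFECTED))
print(trio_qualifies("ldeqaffectedspouse", UNKNOWN, UNAFFECTED, AFFECTED))
```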

 

Mendel and Fisher Formats

The Mendel and Fisher programs cannot use individuals whose gender is listed as missing. In Madeline, only terminal individuals without offspring may have gender listed as missing because Madeline will, when necessary, infer the gender of non-terminal individuals via the FatherIDField and MotherIDField of their offspring. Therefore, Madeline excludes individuals whose gender is missing when writing files in the various Mendel and Fisher formats even when such individuals have genotype data.
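
The inference rule described above can be sketched in Python. The record layout, the function names, and the "M"/"F" codes are assumptions for illustration and do not describe Madeline's internal data structures.

```python
def infer_gender(individual, records):
    """Return the recorded gender, or infer it from the parent fields of offspring.

    records maps individual ID -> {"sex": ..., "father": ..., "mother": ...}.
    Returns None when the gender is missing and cannot be inferred, which
    can only happen for individuals without offspring.
    """
    sex = records.get(individual, {}).get("sex")
    if sex is not None:
        return sex
    for record in records.values():
        if record.get("father") == individual:
            return "M"
        if record.get("mother") == individual:
            return "F"
    return None


def usable_for_mendel(records):
    """List the individuals whose gender is known or can be inferred."""
    return [i for i in records if infer_gender(i, records) is not None]


records = {
    "0482-200": {"sex": None, "father": None, "mother": None},
    "0482-300": {"sex": "F", "father": None, "mother": None},
    "0482-100": {"sex": "M", "father": "0482-200", "mother": "0482-300"},
    "0482-401": {"sex": None, "father": "0482-200", "mother": "0482-300"},
}
print(usable_for_mendel(records))
```

In this hypothetical pedigree, 0482-200 is inferred to be male because he is listed as a father, while the childless 0482-401 is dropped.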

FISHER0 (PED)

Used to specify Fisher file format with no ascertainment correction. Zeros are written in the header for each pedigree to indicate no proband ascertainment. Use the mendel keyword to write the corresponding locus file.

FISHER1 (PED)

Used to specify Fisher file format with ascertainment correction. Ones are written in the header for each pedigree that has a proband to indicate proband ascertainment. Under fisher1, at least one non-proband individual in the pedigree must have sufficient data for the pedigree to be included in output. Use the mendel keyword to write the corresponding locus file.

MENDEL (PED, LOC)

Used to specify generic Mendel pedigree and locus file formats.

USERM13 (PED)

Used to specify Mendel UserM13 file format. Use the mendel keyword to write corresponding locus file. When userm13 is specified, all non-excluded genotyped individuals, including childless spouses and unattached individuals, are included in output.

 

PedCheck

PEDCHECK (PED)

The pedcheck keyword produces an output file for use with the Pedcheck program by Jeff O'Connell of the University of Pittsburgh. The format is essentially the Linkage program format. Records for all individuals with genotype data are written to output.

 

Relpair

RELPAIR (PED, LOC)

Used to specify Relpair file formats. Relpair’s locus file format is very similar to the UserFQTL format, while the pedigree file format is identical to generic Mendel format.

The locus file contains map information, and therefore a map database must be loaded prior to the write locus file command.

 

Sage Formats

To run a module in Sage such as Sibpal, you will need to have an FSP family data input file in addition to a Sibpal pedigree file. Be careful to use the same set of exclusions when creating both files. The Sage modules also require parameter files to run. Madeline provides template parameter files that require editing. The parameter files are generated at the same time as the pedigree files.

Because the FSP and Sibpal .ped or .par files could easily end up with the same names, be sure to differentiate the file names somewhere other than in the file extension. Madeline automatically provides .par as the extension for all of the Sage package parameter files.

FSP0 (PED, PAR)

Used to specify the Sage FSP data file format. Madeline creates a corresponding .par file at the same time that the pedigree file is created. When FSP0 is used, Madeline only outputs the core fields that FSP requires for construction of the family structure pointer ".lnk" file which is used as one input to SIBPAL. No genotype fields are output (hence the "0" in the format name). In order to place genotype fields in an FSP segregation analysis data file used as input to ASSOC and LODLINK, use the FSP format (below) instead of FSP0. If your only objective is to obtain a family structure pointer file to run SIBPAL, then you do not need to include any phenotype or genotype fields as input to FSP, and FSP0 is the preferred choice.

FSP (PED, PAR)

Used to specify the Sage FSP data file format. Madeline creates a corresponding .par file at the same time that the pedigree file is created. If you plan to run SIBPAL, it is more convenient to use the FSP0 format above. However, if you plan to run ASSOC or LODLINK, you should use the FSP format here in order to place genotype fields in the FSP segregation analysis data file.

SAGE (LOC)

Used to specify the Sage locus file format.

SIBPAL1 (PED, PAR)

Used to specify Sage Sibpal quantitative trait linkage format. Be sure to toggle the covariate and output flags of any covariates -- see TOGGLE 3.26.

Madeline creates a corresponding .par file at the same time that the pedigree file is created.

SIBPAL2 (PED, PAR)

Used to specify Sage Sibpal binary trait linkage format. Be sure to toggle the covariate and output flags of any covariates -- see TOGGLE 3.26.

Madeline creates a corresponding .par file at the same time that the pedigree file is created.

SIBPAL3 (PED, PAR)

Used to specify Sage Sibpal binary trait linkage with variable age of onset format. Be sure to toggle the covariate and output flags of any covariates -- see TOGGLE 3.26. The age of onset variable must be the first of the specified covariates.

Madeline creates a corresponding .par file at the same time that the pedigree file is created.

SIBPAL4 (PED, PAR)

Used to specify Sage Sibpal marker ordering (i.e., mapping) format. There is no demand for this format, and so it has not been thoroughly tested.

Madeline creates a corresponding .par file at the same time that the pedigree file is created.

 

Siblink

In addition to the usual set of core fields, the AffectionStatusField must be present so that Madeline can choose sib pairs based on affection status. In addition, a map database must be loaded. Madeline creates a Siblink control file with a .ctl extension at the same time that the pedigree file is created. The control file contains locus information, including map distance information.

Madeline automatically converts the allele labels in the source database to ordinals and prints these ordinal labels in both the locus file and the pedigree file. For cross-reference purposes, you may find it useful to also produce a generic locus file -- see GENERIC.

SIBLINKAFFECTEDPAIRS (PED, PAR)

Used to specify a file in Siblink format containing only affected sib pairs.

SIBLINKUNAFFECTEDPAIRS (PED, PAR)

Used to specify a file in Siblink format containing only unaffected sib pairs.

SIBLINKALLPAIRS (PED, PAR)

Used to specify a file in Siblink format containing all affected and unaffected sib pairs. Siblings whose affection status is missing are excluded.

SIBLINKDISCORDANTPAIRS (PED, PAR)

Used to specify a file in Siblink format containing discordant affected-unaffected sib pairs. Siblings whose affection status is missing are excluded.

 

UserFQTL Formats

UserFQTL requires nuclear family blocks for input. Madeline enumerates each nuclear family block by affixing a dot "." followed by a sequential ordinal identifier after the original pedigree identifier. For example, if the pedigree ID is 0123, successive nuclear family blocks up to n will be identified as 0123.1, 0123.2, 0123.3 ... 0123.n in the family record headers of the resulting data file. A nuclear family must have at least one person with phenotype data for the pedigree to be included.
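The labelling rule described above can be sketched in a few lines of Python. This is an illustration only, not Madeline's own code: the function name nuclear_block_ids is hypothetical, and the sketch reproduces nothing beyond the "pedigree ID, dot, sequential ordinal" pattern.

```python
def nuclear_block_ids(pedigree_id: str, n_blocks: int) -> list[str]:
    """Return labels of the form '<pedigree ID>.<ordinal>', ordinals starting at 1."""
    return [f"{pedigree_id}.{k}" for k in range(1, n_blocks + 1)]


# Pedigree 0123 decomposed into three nuclear family blocks.
print(nuclear_block_ids("0123", 3))  # ['0123.1', '0123.2', '0123.3']
```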

USERFQTL (LOC)

Used to specify UserFQTL locus file format.

USERFQTLALL (PED)

Used to specify UserFQTL all nuclear families format. All nuclear families constructed by decomposing a full pedigree will be output.

USERFQTLFOUNDERS (PED)

Used to specify UserFQTL founding nuclear families format. Only nuclear families in the founding generation will be output.

USERFQTLOFFSPRING (PED)

Used to specify UserFQTL offspring nuclear families format. Only nuclear families in the offspring generation will be output.

 


 

Section 5
Internal Constants, Variables, Arrays, References,
and Boolean Flags

Madeline maintains symbolic names for a number of numeric constants such as pi, the base of natural logarithms e, true and false. Madeline also has internal variables and arrays whose default values can be modified by the user. In addition, Madeline provides references to internal information related to individuals, such as the number of offspring that an individual has. References are also provided to directly access the parent, offspring and mate vectors of an individual without having to move to another record in the database table. Finally, Madeline maintains certain state information in globally-accessible boolean flags.

This section contains a table each for numeric constants, internal variables, arrays, references, and boolean flags. Table 5.1 shows numeric constants. Table 5.2 shows internal variables used to store file names, pedigree database field names, and map database field names. Table 5.3 shows internal arrays used to store lists of values that are recognized by the program as having specific meanings. For example, to inform Madeline that -7 represents a missing value in the data, -7 must be present in the NumericMissingValue[ ] array.
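The role of a missing-value array can be pictured with a small Python sketch. It is an illustration only and is not Madeline syntax: the list numeric_missing_values, the helper is_missing, and the use of None as a stand-in for the MISSING constant are all assumptions made for this example.

```python
MISSING = None  # stand-in for Madeline's uniform missing value indicator (assumption)

# Mirrors the documented defaults nmv[0]=MISSING and nmv[1]=-9999.
numeric_missing_values = [MISSING, -9999]


def is_missing(value, missing_values=numeric_missing_values):
    """Return True if value is one of the registered missing-value codes."""
    return any(value is m or value == m for m in missing_values)


print(is_missing(-7))              # False: -7 is not yet registered
numeric_missing_values.append(-7)  # register -7 as a missing-value code
print(is_missing(-7))              # True: -7 is now treated as missing
```

The point of the sketch is the order of operations: a code such as -7 is treated as ordinary data until it has been added to the list.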

When not using Madeline's defaults, you must inform Madeline of the proper field names, such as AffectionStatusField, before you open a database. Similarly, you must tell Madeline what the field codings are before you issue a write command. Note that alternate short names are provided for arrays. For example, CharacterMissingValue[2] can be referenced as cmv[2].

Table 5.4 shows references that allow you to access information related to individuals such as number of offspring, number of affected offspring, or information related to their parents, children, or mates. These read-only references always begin with the underscore character to distinguish them from field or other variable names with which they might otherwise be confused. References are useful for querying information from pedigree databases.

Finally, Table 5.5 lists Madeline’s boolean flags and their default states. Boolean flags can be set using the turn or set command.

 

Table 5.1. Internal Numeric Constants in Madeline.

Constant Name Value
e 2.71828 ...
pi 3.1415926 ...
missing The uniform numeric missing value indicator.
_female 1
_male 0
_true 1
_false 0
_affected 1
_unaffected 0
_dead 1
_alive 0
   

 

Table 5.2. Internal Variables in Madeline.

Stores...

Name Description Default Value

Data Field Name

AffectionStatusField Stores the name of the affection status field, an optional core field. This field can be either a numeric or character field. See: CharacterAffectionStatus[], NumericAffectionStatus[] "NAFFECTE"

Map Field Name

ChromosomeField Stores the name of the chromosome field in the map database. This field must be a numeric field. "CHROMOSOME"

Data Field Name

DateOfBirthField Stores the name of the date of birth field. "DOB"

Data Field Name

DateOfDeathField Stores the name of the date of death field. "DOD"

Data Field Name

DeathStatusField Stores the name of the death status field, an optional core field. This field can be a numeric or character field. See: CharacterDeathStatus[], NumericDeathStatus[] "DECEASED"

File Name

DetailFile Stores the name of the detail log file. "madeline.dtl"

File Name

DrawingFile Stores the name of the PostScript drawing output file. "madeline.ps"

Data Field Name

DZTwinField Stores the name of the dizygotic twin indicator field. Field must be a character field and only the first character is examined. "DZTWIN"

Param. Value

EvaluationInterval Stores the desired analysis evaluation interval in centiMorgans. Madeline automatically inserts this value into parameter and control files where appropriate. 0.50 centiMorgans

Data Field Name

FamilyIDField Stores the name of the family (pedigree) ID field. Must be a character field. This core field is not required when FUSION-compliant IDs are used. "FAMID"

Data Field Name

FatherIDField Stores the name of the father ID field. Must be a character field. Required core field. "FATHER"

File Name

FileEditor Stores the name of the file editor called when the edit command is issued. "e"

Data Field Name

GenderField Stores the name of the gender field. This field can be either a character or a numeric field. See: CharacterSexValue[], NumericSexValue[]. Required core field. "SEX"

Data Field Name

IndexCaseField Stores the name of the proband or index case indicator field. Must be a numeric field coded with 1 for proband, 0 otherwise. This optional core field is not required when FUSION-compliant IDs are used. "PROBAND"

Data Field Name

IndividualIDField Stores the name of the individual ID field. Must be a character field. Required core field. "STUDYID"

Font Size Value

LabelFontSize Stores the size, in points, of the typeface used to print labels on pedigree drawings. 7 pt.

Font Size Value

LegendFontSize Stores the size, in points, of the typeface used to print the legend on pedigree drawings. 9 pt.

Data Field Name

LiabilityClassField Stores the name of the liability class indicator field, an optional core field. This field can be either a numeric or character field. "LCLASS"

File Name

LogFile Stores the name of the log file. "madeline.log"

File Name

MapDatabase Stores the name of the map database. "emap.dbf"

Map Field Name

MarkerField Stores the name of the marker name field in the map database. This must be a character field. "MARKERNAME"

Data Field Name

MotherIDField Stores the name of the mother ID field. Must be a character field. Required core field. "MOTHER"

Data Field Name

MZTwinField Stores the name of the monozygotic twin indicator field. Must be a character field and only the first character in the field is examined. Required core field. "TWIN"

Param. Value

OffEndDistance Stores the desired analysis off-end evaluation distance in centiMorgans. Madeline automatically inserts this value into parameter and control files where appropriate. 10.00 centiMorgans

Map Field Name

OrdinalField Stores the name of the marker ordinal field in the map database. This field must be a numeric field. "ORDINAL"

File Name

OutputFile Holds the name of the most recent pedigree output file. This variable is reassigned each time a write command is executed. "output.ped"

File Name

ParameterOutputFile Holds the name of the most recent parameter output file. This variable is reassigned each time a write command uses a format, such as certain Sage formats, that requires concurrent writing of a parameter file. "output.par"

Map Field Name

PositionField Stores the name of the marker position field in the map database. This field must be a numeric field. "POSITION"

File Name

PostscriptViewer Stores the name of the PostScript viewing application used for viewing pedigree drawings. "gs"

 

Table 5.3. Internal Arrays in Madeline.

Stores...

Name Description Default Values

Array

CharacterAffectionStatus[ ]

CAS[ ]

Stores a list of string values representing affection status used in the AffectionStatusField when that field is a character field. See: NumericAffectionStatus[] cas[Unaffected]="0"

cas[Affected ]="1"

cas[2]="2"

(unstudied, reported as unaffected)

cas[3]="3"

(unstudied, reported as affected)

cas[4]="%"

(unstudied, conflicting reports)

Array

CharacterDeathStatus[ ]

CDS[ ]

Stores string values representing dead or alive, respectively, used in the DeathStatusField when that field is a character field. See: NumericDeathStatus[] cds[Dead ]="Y"

cds[Alive]="N"

Array

CharacterMissingValue[ ]

CMV[ ]

Stores a list of string values representing missing values used in character fields in the database. cmv[0]=""

cmv[1]="."

cmv[2]="0/0"

cmv[3]="0/ 0"

Array

CharacterSexValue[]

CSV[]

Stores string values used to represent male and female, respectively, in the GenderField when that field is a character field. See: NumericSexValue[] csv[_male]="M"

csv[_female]="F"

Array

NumericAffectionStatus[]

NAS[]

Stores a list of numeric values representing affection status used in the AffectionStatusField when that field is a numeric field. See: CharacterAffectionStatus[] nas[Unaffected]=0

nas[Affected ]=1

nas[2]=2

(unstudied, reported as unaffected)

nas[3]=3

(unstudied, reported as affected)

nas[4]=4

(unstudied, conflicting reports)

Array

NumericDeathStatus[]

NDS[]

Stores numeric values representing dead or alive, respectively, used in the DeathStatusField when that field is a numeric field. See: CharacterDeathStatus[] nds[Alive]=0

nds[Dead ]=1

Array

NumericMissingValue[]

NMV[]

Stores a list of values that represent missing values in numeric fields in the database. nmv[0]=MISSING

nmv[1]=-9999

Array

NumericSexValue[]

NSV[]

Stores numeric values used to represent male and female, respectively, in the GenderField when that field is a numeric field. See: CharacterSexValue[] nsv[_male ]=0

nsv[_female]=1

 

Table 5.4. Internal References to Individual Information.

Reference Type

Name Description Example

Pointer to an Individual

_EighthChild Refers to an individual’s eighth child. Equivalent to _o[7]. See example for _FirstChild.

Numeric Variable

_excluded True (1) if an individual has been marked for exclusion by the user. M>view for _excluded

Character Variable

_famid Individual’s family ID. M>exclude for _famid="0300"

Pointer to an Individual

_father Refers to an individual’s father. M>view for _father.bmi>=25

Pointer to an Individual

_FifthChild Refers to an individual’s fifth child. Equivalent to _o[4]. See example for _FirstChild.

Pointer to an Individual

_FirstChild Refers to an individual’s first child. Equivalent to _o[0]. M>view for _noffspring=2 and _firstchild.istyped and _secondchild.istyped

Pointer to an Individual

_FourthChild Refers to an individual’s fourth child. Equivalent to _o[3]. See example for _FirstChild.

Numeric Variable

_HasData True (1) if an individual has been marked as having data by the last write command. M>view for _hasdata

Character Variable

_id Individual’s ID. M>view record for _id="0125-100"

Vector of Pointers to Individuals

_mate Refers to the vector of mates of an individual. M>exclude for bmi>=30 and _nmates=1 and _mate[0].bmi>=30

Pointer to an Individual

_mother Refers to an individual’s mother. M>view for _mother.studyage-studyage<=17

Numeric Variable

_n Total number of individuals in this individual’s pedigree. M>view for _n>=40

Numeric Variable

_nff Number of founding fathers in this individual’s pedigree. M>view for _nff=3

Numeric Variable

_nfm Number of founding mothers in this individual’s pedigree. M>view for _nfm=_nff+1

Pointer to an Individual

_NinthChild Refers to an individual’s ninth child. Equivalent to _o[8]. See example for _FirstChild.

Numeric Variable

_nmates Number of mates of an individual. M>view for _nmates>=2

Numeric Variable

_noffspring Number of offspring of an individual. M>view for _noffspring>=6

Vector of Pointers to Individuals

_o Refers to the vector of offspring of a female individual. M>view for _noffspring=2 and _o[0].istyped=1 and _o[1].istyped=1

Pointer to an Individual

_SecondChild Refers to an individual’s second child. Equivalent to _o[1]. See example for _FirstChild.

Pointer to an Individual

_SeventhChild Refers to an individual’s seventh child. Equivalent to _o[6]. See example for _FirstChild.

Pointer to an Individual

_SixthChild Refers to an individual’s sixth child. Equivalent to _o[5]. See example for _FirstChild.

Pointer to an Individual

_spouse Refers to an individual's first spouse, if present. Equivalent to _mate[0]. M>view for affected and _spouse.affected

Pointer to an Individual

_TenthChild Refers to an individual’s tenth child. Equivalent to _o[9]. See example for _FirstChild.

Pointer to an Individual

_ThirdChild Refers to an individual’s third child. Equivalent to _o[2]. See example for _FirstChild.
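The references in Table 5.4 behave like links between records. The following Python sketch shows one way to picture them; it is not Madeline's internal data structure. The Individual class, its attribute names, the helper functions, and the sample IDs and BMI values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Individual:
    """Hypothetical in-memory record with links to relatives."""
    id: str
    bmi: float
    father: Optional["Individual"] = None                          # like _father
    mother: Optional["Individual"] = None                          # like _mother
    mates: List["Individual"] = field(default_factory=list)       # like _mate
    offspring: List["Individual"] = field(default_factory=list)   # like _o

    @property
    def nmates(self) -> int:        # like _nmates
        return len(self.mates)

    @property
    def noffspring(self) -> int:    # like _noffspring
        return len(self.offspring)


def link_mates(a: Individual, b: Individual) -> None:
    """Record a and b as each other's mates."""
    a.mates.append(b)
    b.mates.append(a)


def add_child(father: Individual, mother: Individual, child: Individual) -> None:
    """Attach child to both parents and set its parent links."""
    child.father, child.mother = father, mother
    father.offspring.append(child)
    mother.offspring.append(child)


dad = Individual("P-200", bmi=31.0)
mom = Individual("P-300", bmi=32.5)
kid = Individual("P-100", bmi=24.0)
link_mates(dad, mom)
add_child(dad, mom, kid)
people = [dad, mom, kid]

# Analogue of: bmi>=30 and _nmates=1 and _mate[0].bmi>=30
selected = [p.id for p in people
            if p.bmi >= 30 and p.nmates == 1 and p.mates[0].bmi >= 30]
print(selected)  # ['P-200', 'P-300']

# Analogue of: _father.bmi>=25. Records without a father are skipped here;
# how Madeline treats such records is not covered by this sketch.
heavy_father = [p.id for p in people
                if p.father is not None and p.father.bmi >= 25]
print(heavy_father)  # ['P-100']
```

Because each record carries direct links to its parents, mates, and offspring, a condition about a relative can be evaluated without moving to another record, which is the convenience the references provide.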

 

Table 5.5. Boolean State Flags in Madeline.

Boolean Flag Default Setting Explanation
AutoExclude ON ON: When executing write pedigree file, Madeline automatically excludes pedigrees, nuclear families, or affected sib pairs having insufficient data.

OFF: Program doesn't evaluate whether pedigrees have sufficient data when executing write pedigree file, resulting in the inclusion of all pedigrees, nuclear families, or affected sib pairs.

There are few reasons to turn AutoExclude off in the current version of Madeline.

Color ON ON: Print pedigrees in color.

OFF: Print pedigrees in black-and-white.

DividedPages ON ON: Print subtrees originating from distinct ancestral founding groups on separate drawing pages.

OFF: If a pedigree consists of multiple subtrees originating from distinct ancestral groups, print all subtrees on a single drawing.

HaplotypeDisplay OFF ON: Genotypes on pedigree drawings are delimited by "|".

OFF: Genotypes on pedigree drawings are delimited by "/".

Quiet OFF ON: Detail-level program messages are not sent to the terminal, but still appear in the detail log file.

OFF: Detail-level program messages are sent to the terminal.

SaveAlleleFrequencies OFF ON: Allele frequencies already calculated from a previous open command are retained and used when a new database having the same structure is opened.

OFF: New allele frequencies are calculated each time a database is opened.

Silent OFF ON: Summary and detail messages do not appear on the terminal, but are still sent to the summary and detail log files.

OFF: Summary messages appear on the terminal. The setting of quiet determines whether detail messages also appear on the terminal.
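The interaction between Quiet and Silent described in Table 5.5 can be summarized as a small decision function. This Python sketch is an illustration of the documented behavior only; the function name terminal_shows and the level labels "summary" and "detail" are hypothetical.

```python
def terminal_shows(level: str, quiet: bool = False, silent: bool = False) -> bool:
    """Return True if a message of the given level ('summary' or 'detail')
    appears on the terminal. Log files are not affected by either flag."""
    if silent:
        return False          # Silent ON hides both summary and detail messages
    if level == "detail":
        return not quiet      # Quiet ON hides detail messages only
    return True               # summary messages appear unless Silent is ON


print(terminal_shows("summary"))               # True
print(terminal_shows("detail", quiet=True))    # False
print(terminal_shows("summary", quiet=True))   # True
print(terminal_shows("summary", silent=True))  # False
```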

 


Section 6
Mathematical and Aggregate Level Processing Functions

The following mathematical functions can be used in expressions (Table 6.1):

Table 6.1. Mathematical Functions Available in Madeline.

Function Name Description
ABS( ) Returns the absolute value of a real number
ACOS( ) Returns the arc cosine of a real number
ASIN( ) Returns the arc sine of a real number
ATAN( ) Returns the arc tangent of a real number
CEILING( ) Returns the ceiling of a real number (rounds up to the nearest whole number)
COS( ) Returns the cosine of a real number
COSH( ) Returns the hyperbolic cosine of a real number
EXP( ) Returns e raised to the supplied power n
FLOOR( ) Returns the floor of a real number (rounds down to the nearest whole number)
INV( ) Returns the inverse (reciprocal) of a non-zero real number
LOG( ) Returns the natural logarithm of a real number
LOG10( ) Returns the base-10 logarithm of a real number
ROUND( ) Rounds a real number to the nearest whole number
SIN( ) Returns the sine of a real number
SINH( ) Returns the hyperbolic sine of a real number
SQRT( ) Returns the square root of a real number
TAN( ) Returns the tangent of a real number
TANH( ) Returns the hyperbolic tangent of a real number

 

The following aggregate level processing functions are available in Madeline (Table 6.2):

Table 6.2. Aggregate Functions Available in Madeline.

Function Name Description Example
_oCount(<nExpr>) Returns the number of times the numeric expression, nExpr, evaluates to non-missing among the offspring of an individual.

//
// Find the subset of mothers for whom the affection status
// (naffecte) of all of their children is known:
//
M>view for _noffspring>=1 and _oCount(naffecte)=_nOffspring

_oCountFalse(<nExpr>) Returns the number of times the numeric expression, nExpr, evaluates to FALSE (zero) among the offspring of an individual.

//
// Find the subset of mothers with at least two
// unaffected offspring:
//
M>view for _oCountFalse(naffecte)>=2

_oCountMissing(<nExpr>) Returns the number of times the numeric expression, nExpr, evaluates to MISSING among the offspring of an individual.

//
// Find the subset of mothers for whom one or more
// offspring lack a glucose measurement:
//
M>view for _oCountMissing(glu_fast)>=1

_oCountTrue(<nExpr>) Returns the number of times the numeric expression, nExpr, evaluates to TRUE (non-zero, non-missing) among the offspring of an individual.

//
// Find the subset of mothers with exactly two affected
// and two unaffected offspring:
//
M>view for _nOffspring=4 and _oCountTrue(naffecte)=_oCountFalse(naffecte)

_oMean(<nExpr>) Returns the mean of nExpr among the offspring of an individual.

M>go 1673
M>show studyid
"0470-701"
M>show sex
"F"
M>show bmi
22.975
M>show _noffspring
6
M>show _oMean(bmi)
26.3063
M>show _oMean(studyage)
36.1821
M>

_oStdDev(<nExpr>) Returns the standard deviation of nExpr among the offspring of an individual.

//
// Find mothers for whom the coefficient of variation in
// glucose values among their children is greater than or
// equal to 0.5:
//
M>view for _oCount(glu_fast)>=3 and _oStdDev(glu_fast)/_oMean(glu_fast)>=0.5

_oSum(<nExpr>) Returns the sum of nExpr among the offspring of an individual.

//
// Find grandmothers with 20 or more grandchildren:
//
M>view for _oSum(_noffspring)>=20

_oVariance(<nExpr>) Returns the variance of nExpr among the offspring of an individual.

M>go 35
M>show studyid
"0009-500"
M>show _noffspring
4
M>show _oVariance(bmi)
140.682
M>show _oMean(bmi)
34.5261
M>
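The counting functions in Table 6.2 partition an individual's offspring into three groups: true, false, and missing. The Python sketch below illustrates that partition over a plain list of offspring values. It is not Madeline's implementation. The function names are hypothetical, None stands in for MISSING, and the choice to skip missing values in the sum and mean is an assumption, since the table does not state how missing values are handled there.

```python
MISSING = None  # stand-in for Madeline's missing value indicator (assumption)


def o_count(values):
    """Number of non-missing values (like _oCount)."""
    return sum(1 for v in values if v is not MISSING)


def o_count_missing(values):
    """Number of missing values (like _oCountMissing)."""
    return sum(1 for v in values if v is MISSING)


def o_count_true(values):
    """Number of non-zero, non-missing values (like _oCountTrue)."""
    return sum(1 for v in values if v is not MISSING and v != 0)


def o_count_false(values):
    """Number of zero values (like _oCountFalse)."""
    return sum(1 for v in values if v is not MISSING and v == 0)


def o_sum(values):
    """Sum of the non-missing values (assumption: missing values are skipped)."""
    return sum(v for v in values if v is not MISSING)


def o_mean(values):
    """Mean of the non-missing values, or MISSING if there are none (assumption)."""
    n = o_count(values)
    return o_sum(values) / n if n else MISSING


# Affection status of four offspring: affected, unaffected, unknown, affected.
naffecte = [1, 0, MISSING, 1]
print(o_count(naffecte), o_count_true(naffecte),
      o_count_false(naffecte), o_count_missing(naffecte))  # 3 2 1 1
```

Under these definitions the true, false, and missing counts always add up to the number of offspring, which is why a test such as _oCount(naffecte)=_nOffspring identifies parents whose children all have a known status.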

 

 

 


Section 7

String and Character Manipulation Functions

 

The following string and character manipulation functions are available in Madeline (Table 7.1):

Table 7.1. String and Character Manipulation Functions Available in Madeline.

Function Name, parameters Description
SubString(cString,nStart,nHowMany) Extracts a substring of nHowMany characters starting at position nStart in string cString. Character positions are counted from 1:

M>what is substring("Hello, World!",1,5)

Hello

M>
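For readers used to zero-based indexing, the following Python sketch mirrors the one-based behavior shown in the example above. The function is an illustrative analogue, not Madeline code. It assumes nStart is at least 1, and it makes no claim about what Madeline does when the requested range runs past the end of the string.

```python
def substring(c_string: str, n_start: int, n_how_many: int) -> str:
    """One-based analogue of SubString(cString, nStart, nHowMany)."""
    begin = n_start - 1  # convert the 1-based position to a 0-based index
    return c_string[begin:begin + n_how_many]


print(substring("Hello, World!", 1, 5))  # Hello
print(substring("0100-100", 1, 4))       # 0100
```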

 


Section 8

Characteristics Of The Expression Parser

 

There are a few important things to note about the expression parser (Madeline’s command-line interpreter).

Equality of Strings

Madeline supports only exact string comparison. You can test for string equality using either = or ==; both operators perform the same comparison. Two strings are equal if and only if (1) they are the same length and (2) they have identical contents. Therefore, assuming FUSION-style IDs, this will exclude everyone in family 0100:

M>exclude for substring(studyid,1,4)="0100"

... but the following will not because FUSION study IDs are always more than four characters long:

M>exclude for studyid="0100"

Note that the latter case would work just fine in FoxPro and other data management systems that, by default at least, use = for inexact string comparisons.
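The difference between the two kinds of comparison can be seen in a short Python sketch. The sample IDs are made up, and prefix_equal is only a rough stand-in for an inexact comparison that stops at the end of the right-hand string; it is not a description of any particular system's exact rules.

```python
def exact_equal(left: str, right: str) -> bool:
    """Exact comparison: same length and identical contents."""
    return left == right


def prefix_equal(left: str, right: str) -> bool:
    """Inexact comparison: true when left begins with right."""
    return left.startswith(right)


study_ids = ["0100-100", "0100-200", "0101-100"]

# Exact comparison against the bare family ID matches nothing.
print([s for s in study_ids if exact_equal(s, "0100")])      # []

# Inexact comparison matches every member of family 0100.
print([s for s in study_ids if prefix_equal(s, "0100")])     # ['0100-100', '0100-200']

# Exact comparison on the first four characters gives the same result.
print([s for s in study_ids if exact_equal(s[:4], "0100")])  # ['0100-100', '0100-200']
```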

 

Internal Representation of Logical True and False

In Madeline, logical false is equivalent to zero, and logical true is equivalent to any non-zero value. This is identical to the way things work in the C language, but different from many interpreted environments, which represent true and false using an additional level of abstraction. The C-style convention allows for certain syntactical conveniences. For example, suppose that you have a numeric field called AFFECTED coded with one for affected individuals and zero for unaffected individuals. In Madeline, you can do this:

M>exclude for affected

Assuming there are no values in AFFECTED other than 0 and 1, this would be equivalent to:

M>exclude for affected=1

If you feel uncomfortable with this sort of economy of expression, you can always express exactly what you want as shown in the latter case. The latter usage would also be necessary if the affected field possibly contained missing values, since MISSING is a non-zero value.
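The effect of a missing value on the two forms can be illustrated in Python, which also treats any non-zero number as true. The sentinel value used for MISSING below is hypothetical, since the actual value is not given here, and the record labels are made up.

```python
MISSING = -1.0e30  # hypothetical non-zero sentinel (assumption)

# Affection status by individual: affected, unaffected, unknown.
affected = {"A": 1, "B": 0, "C": MISSING}

# Like "exclude for affected": any non-zero value counts as true.
implicit = [k for k, v in affected.items() if v]

# Like "exclude for affected=1": only the value 1 matches.
explicit = [k for k, v in affected.items() if v == 1]

print(implicit)  # ['A', 'C']
print(explicit)  # ['A']
```

The implicit form also selects individual C, whose status is unknown, which is why the explicit comparison is the safer choice when missing values may be present.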

 


Section 9
Summary of Features

