Importing Family-Based Data
4.5 Importing Family-Based Data
4.5.1 Preparing Family Data
HelixTree supports the import of family-based data in FBAT/PBAT Pedigree format, FBAT/PBAT Phenotype format, text pedigree format, and family-based text phenotype format.
NOTE: HelixTree also supports joining family-based spreadsheets with other spreadsheets. This may be useful if the genetic data for your family-based study comes from sources such as the Affymetrix GeneChip™.
(At this time, HelixTree only supports import of the above family-based formats. In the future, we may support import of other variations on the pedigree format or of other family-based data formats.)
The format of a text pedigree should be as delineated in 4, with the following exceptions:
- The optional label column, whose position in the file you may specify in the import dialog, is mainly useful for indexing into other genetic data, such as Affymetrix GeneChip™ data.
- The first (non-row-label) column should be the family number or ID.
- The second (non-row-label) column should be the patient number or ID.
- The third (non-row-label) column should be the father number or ID.
- The fourth (non-row-label) column should be the mother number or ID.
- The fifth (non-row-label) column should be the gender. Use 1 for male and 2 for female.
- The sixth (non-row-label) column should be the affection status. Use 2 for affected, 1 for unaffected and 0 for unknown.
- The remaining (non-row-label) columns should be markers. Use the same format as delineated in 4.
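As an illustration of the column layout above, the following Python sketch reads a tab-delimited text pedigree that has no header line and no row-label column. It is not part of HelixTree: the function name `read_pedigree`, the field names, and the sample data are hypothetical. The sample also assumes that founders carry 0 as their father and mother ID, which this section does not state, and it leaves marker columns as raw strings because their format is defined elsewhere in this manual.

```python
import csv
import io
from typing import List, NamedTuple


class PedigreeRow(NamedTuple):
    family_id: str
    patient_id: str
    father_id: str
    mother_id: str
    gender: int         # 1 = male, 2 = female
    affection: int      # 2 = affected, 1 = unaffected, 0 = unknown
    markers: List[str]  # remaining columns, kept as raw strings


def read_pedigree(text, delimiter="\t"):
    """Parse pedigree text laid out in the column order described above."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not fields:
            continue  # skip blank lines
        if len(fields) < 6:
            raise ValueError(f"expected at least 6 columns, got {len(fields)}")
        gender, affection = int(fields[4]), int(fields[5])
        # Strict reading of the codes listed in this section.
        if gender not in (1, 2):
            raise ValueError(f"gender must be 1 or 2, got {gender}")
        if affection not in (0, 1, 2):
            raise ValueError(f"affection status must be 0, 1 or 2, got {affection}")
        rows.append(PedigreeRow(fields[0], fields[1], fields[2], fields[3],
                                gender, affection, fields[6:]))
    return rows


# Hypothetical three-person family: father, mother, affected child.
# The marker values are placeholders, not a documented genotype format.
SAMPLE = (
    "F1\t1\t0\t0\t1\t1\tA_B\n"
    "F1\t2\t0\t0\t2\t1\tA_A\n"
    "F1\t3\t1\t2\t1\t2\tA_B\n"
)
pedigree = read_pedigree(SAMPLE)
```

Running a check like this before import can catch miscoded gender or affection values early, although the import dialog remains the authority on what HelixTree accepts.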
The format of a text phenotype file should also be as delineated in 4, but with these exceptions:
- The first (non-row-label) column should be the family number or ID.
- The second (non-row-label) column should be the patient number or ID.
- The remaining (non-row-label) columns may be phenotype data.
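The phenotype layout can be sketched in the same way. The Python snippet below assumes a comma-delimited file with no header line and no row-label column, and indexes each record by its (family ID, patient ID) pair. The function name `read_phenotypes` and the sample values are hypothetical and not part of HelixTree.

```python
import csv
import io


def read_phenotypes(text, delimiter=","):
    """Map (family_id, patient_id) to the list of remaining phenotype fields."""
    table = {}
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not fields:
            continue  # skip blank lines
        if len(fields) < 2:
            raise ValueError("each row needs a family ID and a patient ID")
        key = (fields[0], fields[1])
        if key in table:
            raise ValueError(f"duplicate family/patient pair: {key}")
        table[key] = fields[2:]
    return table


# Hypothetical data: two phenotype columns per patient.
SAMPLE = "F1,1,172.5,0\nF1,2,160.0,1\nF2,1,181.2,0\n"
phenotypes = read_phenotypes(SAMPLE)
```

Keying on the pair rather than on the patient ID alone keeps records distinct if patient IDs happen to repeat across families, as they do in this sample.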
4.5.2 Importing Family Data
From within a HelixTree project’s main screen, open the PBAT menu. A sub-menu showing the file formats will appear. Select the desired file format. In the case of the FBAT/PBAT pedigree or phenotype format, a dialog box will appear to allow the file selection.
In the case of a text pedigree or family-based text phenotype format, a dialog will appear. Select the text file, how it is delimited, and which column, if any, holds row labels.
NOTE: To facilitate organizing Affymetrix GeneChip™ or other genetic data according to pedigree, use the CHP file name (or other genetic label) as one of the columns in your text file, and designate that column as the one that holds the row labels for your text pedigree file. Later on, you will be able to use “Join Spreadsheets on Row Labels” to join the text pedigree spreadsheet and the genetic information spreadsheet. (See 6.3.7.)
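To make the idea of matching on row labels concrete, here is a minimal Python sketch of a join keyed on a shared label. It only illustrates the concept and does not describe how HelixTree implements “Join Spreadsheets on Row Labels”. The function name `join_on_row_labels`, the CHP file names, and the values are hypothetical, and the sketch assumes that rows present in only one table are dropped.

```python
def join_on_row_labels(left, right):
    """Inner-join two {row_label: list_of_values} tables on their shared labels."""
    return {label: left[label] + right[label]
            for label in left if label in right}


# Pedigree rows keyed by a hypothetical CHP file name used as the row label.
pedigree = {
    "sample_001.CHP": ["F1", "1", "0", "0", "1", "1"],
    "sample_002.CHP": ["F1", "2", "0", "0", "2", "1"],
    "sample_003.CHP": ["F1", "3", "1", "2", "1", "2"],
}

# Genetic data keyed by the same labels; sample_002 has no genetic data here.
genotypes = {
    "sample_001.CHP": ["A_B"],
    "sample_003.CHP": ["A_A"],
}

joined = join_on_row_labels(pedigree, genotypes)
```

The join only works if the labels in the pedigree file match the labels of the genetic data exactly, which is why the note recommends copying the CHP file name into the label column.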
Once you select a file or click “OK” in the dialog, the import will start. For a very large file this may take some time, in which case a progress indicator is shown.
Once the data has been converted, you will see a summary of the number of records and the data types processed. Dismiss the summary by clicking the OK button.
When the import finishes, you are returned to the main navigation screen of HelixTree, where you will see a new icon for the imported data and a new icon for a spreadsheet containing the imported data. The spreadsheet is opened automatically and is ready for further analysis.
Note that the Family ID and Patient ID are the first two “data” columns in family-based spreadsheets. These act as a “virtual patient index” into these spreadsheets when they are used for PBAT family-based analysis.
NOTE: When creating your pedigree, remember to list the parents, even if their genotype information is not known, in order to group siblings together properly into families.
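A check along the following lines can flag parents who are referenced by a child but have no row of their own. This is a sketch, not a HelixTree feature: the function name `find_unlisted_parents` and the sample rows are hypothetical, and the use of 0 to mean "no parent recorded" is an assumption.

```python
def find_unlisted_parents(rows, unknown_id="0"):
    """Return sorted (family_id, parent_id) pairs that are referenced but not listed.

    rows: list of (family_id, patient_id, father_id, mother_id) tuples.
    unknown_id: code assumed to mean "no parent recorded" (an assumption).
    """
    listed = {(family, patient) for family, patient, _, _ in rows}
    missing = set()
    for family, _, father, mother in rows:
        for parent in (father, mother):
            if parent != unknown_id and (family, parent) not in listed:
                missing.add((family, parent))
    return sorted(missing)


rows = [
    ("F1", "1", "0", "0"),  # father, listed with no recorded parents
    ("F1", "3", "1", "2"),  # child; mother "2" has no row of her own
    ("F1", "4", "1", "2"),  # sibling with the same parents
]
missing = find_unlisted_parents(rows)
```

In this sample the mother would need her own row, even with unknown genotypes, before the two siblings can be grouped under both parents.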
NOTE: If unrelated families are listed together using the same family ID, PBAT will consider the data under that family ID to be invalid.
NOTE: When performing non-PBAT analyses of family-based spreadsheets in HelixTree, disable the Family ID and Patient ID columns.
NOTE: Non-family-based analyses using phenotypes may be facilitated by first joining the phenotype and pedigree spreadsheets.