PBAT Data Analysis for Copy Number Variation
23.5.1 Summary
A new feature of PBAT is the support of testing for copy-number variation (CNV) in a family-based setting. Family-based CNV analysis is discussed in [Ionita-Laza 2007].
The standard FBAT statistic is based on the coded genotypes of the family members at the locus being tested; the coding depends on the genetic model under consideration. The CNV FBAT statistic, by contrast, is based directly on the intensity values, or on quantities derived from them such as log2 ratios. These intensity-derived values are used in place of the coded genotypes, so no CNV genotyping algorithm is needed to analyze CNV data.
To obtain the expected intensity value for an offspring, the intensity values of the respective parents are averaged. If the parental information is missing, the intensity values of the siblings are averaged. (This is in place of finding an expected genotypic coding based on the genotypes of the parents or the genotypes of the siblings.)
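The averaging rule above can be sketched in a few lines of Python. This is only an illustration, not the PBAT implementation: the function name, the dictionary layout, and the choices to fall back to the sibling average whenever either parent's value is missing and to average over every offspring in the sibship with an observed value are all assumptions made for this example.

```python
from statistics import mean

def expected_intensity(intensity, father, mother, sibship):
    """Return the expected intensity value for an offspring.

    intensity: dict mapping individual ID -> intensity (or log2 ratio);
               individuals without data are absent or map to None.
    father, mother: IDs of the offspring's parents.
    sibship: IDs of all offspring in the family (assumed layout).
    """
    f = intensity.get(father)
    m = intensity.get(mother)
    # Both parents observed: average the two parental values.
    if f is not None and m is not None:
        return (f + m) / 2.0
    # Assumption: if either parent is missing, average the observed siblings.
    sib_values = [intensity[s] for s in sibship if intensity.get(s) is not None]
    if not sib_values:
        return None  # nothing to average
    return mean(sib_values)

values = {"dad": 0.5, "mom": -0.25, "kid1": 0.25, "kid2": -0.75}
with_parents = expected_intensity(values, "dad", "mom", ["kid1", "kid2"])

no_parents = {"kid1": 0.25, "kid2": -0.75}
without_parents = expected_intensity(no_parents, "dad", "mom", ["kid1", "kid2"])
```

With the hypothetical values shown, the first call averages the two parents and the second averages the two siblings.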
The variance is estimated empirically under the null hypothesis, because the theoretical variance cannot be computed from Mendelian transmissions in this context.
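As a rough illustration of how an empirical variance can enter the test, the following Python sketch sums per-family contributions and uses the sum of their squares as the variance estimate. This manual does not give the formula, so the form used here (the general FBAT layout with a trait value multiplying the deviation of the observed from the expected intensity) is an assumption, as are the function name and the input layout.

```python
import math

def cnv_fbat_z(families):
    """Return a Z statistic from per-family contributions.

    families: list of families; each family is a list of
              (trait, intensity, expected_intensity) tuples, one per offspring.
              The trait is assumed to be already offset-adjusted.
    """
    contributions = []
    for offspring in families:
        # Family contribution: sum of trait * (observed - expected intensity).
        u_i = sum(t * (x - e) for t, x, e in offspring)
        contributions.append(u_i)
    u = sum(contributions)
    # Assumed empirical variance under the null: sum of squared contributions.
    v = sum(u_i ** 2 for u_i in contributions)
    if v == 0:
        return None  # no informative families
    return u / math.sqrt(v)

example = [
    [(1.0, 0.5, 0.0)],
    [(1.0, -0.25, 0.0)],
    [(1.0, 0.25, 0.0)],
]
z = cnv_fbat_z(example)
```

Families whose offspring all have observed values equal to their expected values contribute zero to both the numerator and the variance.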
All robustness properties of the genotype FBAT approach are maintained in PBAT CNV analysis. In addition, all previously developed FBAT extensions, including FBATs for time-to-onset, multivariate FBATs, and FBAT-testing strategies, can be directly transferred to the analysis of copy-number variation.
The following CNV PBAT features are available in HelixTree:
- Computation of CNV FBAT statistics for nuclear families and for extended pedigrees.
- Multivariate CNV FBATs for multiple phenotypes: FBAT-GEE and FBAT-PC. FBAT-GEE is based on the generalized estimating equation approach.
- Transformation tools for continuous phenotypes that are not normally distributed.
- Including predictor variables in the CNV FBAT.
- Including gene-environment/drug interactions in the CNV FBAT statistic.
23.5.2 Using PBAT CNV Family-Based Analysis
23.5.2.1 Getting Started
The first step is to open an existing project or create a new project where you want to do the data analysis. See 3.1.2 for details about creating a new project. See 3.4.1 for details about opening an existing project.
Once you have opened or created a project, import a file containing the pedigree information, optionally with row labels identifying the test subjects. Using PBAT -> Import Text Pedigree from the project menu is recommended: this method does not require any genotypic information, and it can import row labels if necessary. See 4.5.
NOTE: When creating your pedigree, remember to list the parents, even if their CNV intensity information is not known, in order to group siblings together properly into families.
NOTE: If unrelated families are listed together using the same family ID, the results will be unpredictable.
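Both notes can be checked before import with a short script. The Python sketch below is not part of HelixTree; it assumes a hypothetical list of (family ID, individual ID, father ID, mother ID) tuples with "0" marking an unknown parent, and it reports parents that have no row of their own as well as family IDs whose members fall into more than one unconnected group.

```python
from collections import defaultdict

def check_pedigree(rows, missing="0"):
    """Return a list of problem descriptions for a pedigree.

    rows: iterable of (family_id, individual_id, father_id, mother_id).
    """
    rows = list(rows)
    members = defaultdict(set)
    for fam, ind, _, _ in rows:
        members[fam].add(ind)

    # Union-find over (family, individual) pairs linked by parent-child ties.
    parent = {(fam, ind): (fam, ind) for fam, ind, _, _ in rows}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    problems = []
    for fam, ind, father, mother in rows:
        for p in (father, mother):
            if p == missing:
                continue
            if p not in members[fam]:
                problems.append(f"family {fam}: parent {p} of {ind} has no row")
            else:
                ra, rb = find((fam, ind)), find((fam, p))
                if ra != rb:
                    parent[ra] = rb

    for fam, inds in members.items():
        groups = {find((fam, ind)) for ind in inds}
        if len(groups) > 1:
            problems.append(
                f"family {fam}: {len(groups)} unrelated groups share this ID"
            )
    return problems

good = [
    ("F1", "dad", "0", "0"),
    ("F1", "mom", "0", "0"),
    ("F1", "kid1", "dad", "mom"),
    ("F1", "kid2", "dad", "mom"),
]
mixed = good + [
    ("F1", "x", "0", "0"),
    ("F1", "y", "0", "0"),
    ("F1", "z", "x", "y"),
]
good_problems = check_pedigree(good)
mixed_problems = check_pedigree(mixed)
```

Note that siblings whose parents are not listed at all appear as separate groups in this check, which matches the advice to list parents even when their intensity data is unknown.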
Once the pedigree is imported, you can begin the analysis by selecting PBAT -> PBAT CNV Analysis from the project menu. A parameter selection dialog will open.
23.5.2.2 Major CNV Parameters
A few major parameters may be selected near the top of the screen. These are as follows:
- The phenotype spreadsheet. (See the beginning of Subsection 23.4.3 for a discussion of selecting the phenotype spreadsheet.)
- The DSF copy-number data file. (Select the “.dsf” file of log2 ratio data. Use the “Browse” button for easier file selection.)
- The selection of chromosomes from the DSF copy-number file. When you select the “.dsf” file of log2 ratio data, a list of chromosomes within that file will appear. To deselect some of these, press the “Choose” button to open a chromosome selector, which has a check box for each chromosome.
- The spreadsheet containing the pedigree, gender, and affection status information. Clicking on the “Browse” button here will pull up a spreadsheet selector. Click on the desired pedigree spreadsheet and click OK.
NOTE: The pedigree-information spreadsheet may also contain genotypic information; however, this is ignored during CNV analysis.
23.5.2.3 Other CNV Parameters
The other parameters for CNV PBAT analysis include phenotype (and other variable) selections, the type of analysis, type of screening, phenotype parameters, haplotype parameters, test statistic parameters and computational parameters. They are organized into four tabs, visible below the major parameter selections. These tabs are as follows:
- Select Phenotypes
Figure 23.19: The Select Phenotypes tab of the PBAT CNV analysis dialog.
Select the phenotypes in the same way that you would for genotypic PBAT analysis. (See 23.4.3.)
NOTE: Time-to-onset analysis is not available for HelixTree CNV PBAT. Thus, it will not be possible to select a censor variable.
- Phenotype Parameters
Figure 23.20: The Phenotype tab of the PBAT CNV analysis dialog.
The phenotype parameters supported by HelixTree CNV PBAT are:
- Maximum and Minimum Number of Phenotypes per Group
- Offset Choice
- Compute All Predictor Sub-Models
- Transformations
Please see the beginning of Subsection 23.4.4 for a discussion of these parameters.
- Test Statistic and Computational
Figure 23.21: The Test Statistic and Computational tab of the PBAT CNV dialog.
The test statistic and computational parameters supported by HelixTree CNV PBAT are:
- Test Statistic (NOTE: Only FBAT-GEE and FBAT-PC are available for HelixTree CNV PBAT.)
- Null Hypothesis
- Screening Type
- GFBAT
- Maximal Number of Non-Founders in One Pedigree
- Maximal Iterations for GEE
- Significance Level
- Output Format
See Subsection 23.4.5 for discussions of these parameters.
- Multiple Processes
NOTE: Multiple processing is not available for HelixTree CNV PBAT.
NOTE: In general, CNV computations will be faster than genotypic calculations because the algorithm for obtaining expected intensity values is simpler than that for obtaining expected genotypic scores.
23.5.2.4 Outputs
Outputs from PBAT copy-number variation analysis are nearly identical to the corresponding outputs from genotypic PBAT analysis, except that fields that do not apply to CNV analysis are omitted.
See 23.4.7.6 for a description of the outputs from genotypic PBAT analysis.