Family-Based Analysis Using SVS PBAT Tutorial

Welcome to the Family-Based Analysis Using SVS PBAT Tutorial!


Updated: January 28, 2021

Level: Advanced

Packages: PBAT Analysis

This tutorial leads you through family-based association analysis using the PBAT statistical package incorporated into SNP & Variation Suite (SVS). Covered workflows include data preparation, quality assurance testing, association analysis, and basic visualization of results.

Golden Helix SVS PBAT is developed in collaboration with Dr. Christoph Lange of Harvard University’s School of Public Health.

NOTE: The data used in this tutorial is for demonstration purposes only as it consists of simulated phenotypic information for the CEU HapMap samples.


To complete this tutorial you will need to download and unzip the following file, which includes several datasets.


Files included in the above ZIP file:

  • CEU – PED.csv – Actual pedigree information for the CEU HapMap samples (Phase III).
  • CEU – SIM – PHENO.csv – Simulated phenotype and clinical data.
  • CEU – GENO – Chr22.dsf – Actual chromosome 22 genotypes for the CEU HapMap samples (Phase III) generated from a combination of Affymetrix and Illumina arrays.

We hope you enjoy the experience and look forward to your feedback.

1. Data Preparation

In order to run PBAT in SVS you need, at minimum, a spreadsheet containing pedigree information (including Family ID, Patient ID, Mother ID, Father ID, Sex, and Affection Status) and genetic data (either genotypes or continuous variables, such as log ratios). If there is phenotype data in addition to (or instead of) the Affection Status, you first need to join it with your pedigree and genetic data in order for Golden Helix SVS PBAT to be able to access it. The following steps lead you through importing each data type separately and then merging this data into a single spreadsheet.

A. Import Pedigree Information

Before you can begin you need to create a new project.

  • Open SVS and from the Welcome Screen select File > New Project.
  • Name the project PBAT Tutorial, browse to a directory where you want the project saved, keep the default genome assembly Homo sapiens (Human) GRCh37 (hg19) (Feb 2009), and click OK. This will open the Project Navigator.

The first file to import is CEU – PED.csv contained within the downloaded zip file. This is a comma-delimited CSV file with pedigree information for the CEU HapMap samples (Phase III).

  • Select Import > Family Pedigree > Text Pedigree.
  • Browse to the directory where you saved CEU – PED.csv, select CEU – PED.csv, and click Open.
  • Under Row Labels select Use column number: 1.
  • Choose the Sex is encoded as 0/1/2 (or ?/1/2) radio button.
  • Choose the Affection Status is encoded as 0/1/2 (or ?/1/2) radio button.

NOTE: If the default options (?/0/1) are used for encoding Sex and Affection Status, the resulting spreadsheet will not be recognized as a pedigree spreadsheet.

  • Click OK.

This will create a new pedigree spreadsheet called CEU – PED Pedigree Dataset – Sheet 1 (Figure 1-1).

Figure 1-1: Pedigree spreadsheet.

NOTE: Pedigree spreadsheets are denoted as such by a pedigree icon in the Project Navigator as well as blue headers for the pedigree columns at the front of the spreadsheet. If your imported spreadsheet has neither of these, it has not been recognized as a pedigree spreadsheet, and so certain analysis options will not be available.
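
If you want to confirm the coding of a pedigree file before importing it, a short script outside SVS can help. The following is a minimal sketch, not part of SVS: the header names and the sample rows are assumptions made up for illustration, and the check only confirms that Sex and Affection Status use the codes 0, 1, 2, or ?. It does not reproduce how SVS recognizes a pedigree spreadsheet.

```python
import csv
import io

# Codes accepted by the "0/1/2 (or ?/1/2)" import options.
ALLOWED_CODES = {"0", "1", "2", "?"}


def find_bad_codes(rows, columns=("Sex", "Affection Status")):
    """Return (row_number, column, value) for every code outside ALLOWED_CODES."""
    problems = []
    for row_number, row in enumerate(rows, start=1):
        for column in columns:
            value = row[column].strip()
            if value not in ALLOWED_CODES:
                problems.append((row_number, column, value))
    return problems


# Made-up rows for illustration only; the header names are assumptions and
# may not match the headers in the tutorial's pedigree file.
sample_csv = """Family ID,Patient ID,Mother ID,Father ID,Sex,Affection Status
F1,P1,0,0,1,1
F1,P2,0,0,2,1
F1,P3,P2,P1,M,2
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
problems = find_bad_codes(rows)
print(problems)  # the third data row uses "M" instead of a numeric sex code
```

To check a real file, replace the `io.StringIO(sample_csv)` argument with an open file handle and adjust the column names to match your headers.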

B. Import Phenotype Information

Next you need to import CEU – SIM – PHENO.csv. This is a comma-delimited CSV file with simulated phenotype information. It is used for demonstration purposes only.

  • From the Project Navigator select Import > Text.
  • Browse to the directory where you saved CEU – SIM – PHENO.csv, select CEU – SIM – PHENO.csv, and click Open.
  • Leave the rest of the parameters as defaults and click OK.

This will create a new spreadsheet called CEU – SIM – PHENO – Dataset – Sheet 1 (Figure 1-2).

Figure 1-2: Simulated phenotype spreadsheet

C. Import Genotypes

Last, you need to import CEU – GENO – Chr22.dsf. This file contains actual genotypes on chromosome 22 for the CEU samples, which were generated by a combination of Affymetrix and Illumina platforms.

  • From the Project Navigator select Import > Golden Helix DSF.
  • Browse to the directory where you saved CEU – GENO – Chr22.dsf, select CEU – GENO – Chr22.dsf, and click Open.

This will create a new marker-mapped spreadsheet called CEU – GENO – Chr22 – Sheet 1 (Figure 1-3).

Figure 1-3: Genotype spreadsheet.

D. Merge Spreadsheets

Now that you have all three spreadsheets in the project you need to join them together. When joining spreadsheets it doesn’t matter which one you start from. However, if there is certain data you want located toward the front of your spreadsheet for easier viewing (e.g. phenotype data) you will want to initiate the join from that spreadsheet. When pedigree data is available (and denoted as such) this information will always be the first six columns of the spreadsheet.

  • Open CEU – PED Pedigree Dataset – Sheet 1 and select File > Join or Merge Spreadsheets.
  • From the spreadsheet chooser select CEU – SIM – PHENO – Dataset – Sheet 1 and click OK.
  • Enter PED + PHENO for New dataset name:.
  • Under Spreadsheet as Child of choose Current Spreadsheet.
  • Leave all other parameters as the defaults and click OK.

This will create a new spreadsheet PED + PHENO – Sheet 1. Now join this one with the genotype spreadsheet.

  • From PED + PHENO – Sheet 1 select File > Join or Merge Spreadsheets.
  • Select CEU – GENO – Chr22 – Sheet 1 and click OK.
  • Enter CEU All for New dataset name:.
  • Under Spreadsheet as Child of choose Project root.
  • Leave all other parameters as the defaults and click OK.

You now have all the data in one spreadsheet, CEU All – Sheet 1, and are ready for analysis.
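
Conceptually, each join matches rows from the two spreadsheets by their row labels (the sample IDs) and places the columns side by side. The following is a minimal sketch of that idea. It assumes an inner join that keeps only samples present in both tables, which may not match the SVS defaults; the sample IDs and values are made up.

```python
def join_by_row_label(left, right):
    """Inner-join two {row_label: {column: value}} tables on shared row labels."""
    joined = {}
    for label, left_columns in left.items():
        if label in right:
            merged = dict(left_columns)   # columns from the starting spreadsheet come first
            merged.update(right[label])   # then columns from the joined spreadsheet
            joined[label] = merged
    return joined


# Hypothetical sample IDs and values, for illustration only.
pedigree = {
    "S1": {"Family ID": "F1", "Sex": 1},
    "S2": {"Family ID": "F1", "Sex": 2},
}
phenotype = {
    "S1": {"Age": 40},
    "S3": {"Age": 51},
}

ped_pheno = join_by_row_label(pedigree, phenotype)
print(ped_pheno)  # only S1 appears in both tables
```

Because the merged row starts from the left-hand table, its columns appear first, which mirrors the advice above about starting the join from the spreadsheet whose data you want toward the front.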

CNV Analysis Through SVS PBAT

In addition to performing family-based association testing using genotypes as covariates, you can also perform association with various CNV covariates. Though not covered in this tutorial, you would go about PBAT CNV Analysis in the same manner as PBAT Genotype Analysis, except that instead of joining a genotype spreadsheet with your pedigree and phenotype information, you would join your CNV data. To learn more about processing microarray CNV data, see the SVS Microarray CNV Univariate Analysis Tutorial.

2. Quality Assurance

There are a number of quality control metrics in SVS to control for poor quality SNPs and samples, some of which are family-based and make use of SVS PBAT. This tutorial focuses specifically on PBAT Family-Based QC, which enables the detection of Mendelian errors and samples with overall poor genotype quality.

NOTE: Though not covered in this tutorial, it is still appropriate to apply other non-family-based quality assurance metrics to exclude poor quality samples and markers from analysis. Several additional options are available under the Genotype > Quality Assurance and Utilities spreadsheet menu. For more information about these options, see the Genotype Data Quality Assessment and Utilities section of the SVS Manual.

A. Quality Control by Marker

  • Open CEU All – Sheet 1 and select Genotype > PBAT Family-Based QA.
  • Under Computation parameters check Use alternative rapid pedigree algorithm. This option needs to be checked in order for PBAT to report Mendelian errors.
  • Under Output choose Output by marker.
  • Leave all other parameters as the defaults and click Run.
Figure 2-1: PBAT QA Results by marker

Upon completion a new spreadsheet is created, PBAT QA Results (by Marker) (Figure 2-1), with various quality control statistics. In this tutorial we’ll focus on removing SNPs that have one or more Mendelian errors.

  • Right-click the Mendelian errors column and select Activate by Threshold.
  • Select <= 0 and click OK.

This will inactivate all the rows where there are Mendelian errors. We will use the active rows in this spreadsheet to activate their respective columns in the CEU All – Sheet 1 spreadsheet.

  • From the PBAT QA Results (by Marker) spreadsheet go to Select > Apply Current Selection to Second Spreadsheet.
  • Choose to apply filtered rows to CEU All – Sheet 1, then click OK.

This will create a new spreadsheet, CEU All – Sheet 2, with 19,090 active columns. This tool will also inactivate the pedigree and phenotype columns. To reactivate these, left-click once on the Family ID column header, then, while holding down the Shift key, click on the Age phenotype column header.
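
To make the marker filter concrete, here is a minimal sketch of what a Mendelian error is and how a "<= 0" threshold keeps only error-free markers. It is an illustration under simplifying assumptions, not PBAT's algorithm: genotypes are two-letter strings for biallelic autosomal SNPs, missing calls are not handled, and the marker names and trios are made up.

```python
def is_mendelian_consistent(father, mother, child):
    """True if the child could have inherited one allele from each parent.

    Genotypes are two-character strings such as "AG".
    """
    first, second = child
    return (first in father and second in mother) or (
        second in father and first in mother
    )


def count_mendelian_errors(trios):
    """Count (father, mother, child) trios whose genotypes are inconsistent."""
    return sum(not is_mendelian_consistent(f, m, c) for f, m, c in trios)


# Hypothetical markers, each with (father, mother, child) genotypes.
markers = {
    "marker_a": [("AA", "AG", "AG"), ("AG", "AG", "GG")],
    "marker_b": [("AA", "AA", "AG")],  # the child's G allele has no parental source
}

errors = {name: count_mendelian_errors(trios) for name, trios in markers.items()}
kept = [name for name, count in errors.items() if count <= 0]
print(errors)
print(kept)
```

Only markers in `kept` would stay active, which corresponds to the rows left active by Activate by Threshold.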

B. Quality Assurance by Sample

PBAT incorporates a novel test that assesses the genotyping quality of individual probands in family-based association studies. Published in PLoS Genetics (Fardo et al., 2009), these tests are “ideally suited as the final layer of quality assurance filters in the cleaning process of genome-wide association studies.”

  • Open CEU All – Sheet 2 and select Genotype > PBAT Family-Based QA.
  • Again, check Use alternative rapid pedigree algorithm under Computation parameters.
  • This time select Output by proband under Output and click Run.
Figure 2-2: PBAT QA Results by proband

Another new spreadsheet is created, PBAT QA Results (by Proband) (Figure 2-2), this time with quality control metrics for each proband. In the paper cited above, Fardo et al. suggest that, on a genome-wide scale, probands with a score greater than 30 are considered to have poor genotyping quality.

  • Right-click on the Tgw column header and select Sort Descending.

Notice there are 5 samples with a Tgw value greater than 30. However, this particular dataset only contains genotypes for chromosome 22, so the statistics reported do not necessarily translate to a whole genome scale. Therefore, for this tutorial we will not exclude any samples.

3. Association Analysis

Now that important quality control metrics have been considered, you’re ready to run PBAT analysis on the remaining samples and SNPs. There are many different configurations of association tests and parameters one could run in PBAT. This tutorial covers a basic workflow. For more detailed information on the various options please reference the PBAT Family-Based Analysis section of the SVS Manual.

A. Run PBAT Genotype Analysis

  • Open the CEU All – Sheet 2 spreadsheet and select Genotype > PBAT Genotype Analysis.

This will open the PBAT Genotype Analysis window. The first window enables you to select various phenotypes, predictor variables, interactions, and more for analyses. For this tutorial we will only consider Affection Status.

  • Select Affection Status in the upper-left box on the Select Phenotypes tab.
  • Click the Test Statistic and Computational tab.
  • Check Output -log 10 p-values under Output Format.
  • Leave all other parameters as defaults and click Run.
Figure 3-1: PBAT Results spreadsheet

Upon completion, a results spreadsheet, PBAT Results, is created (Figure 3-1). This spreadsheet reports a number of statistics, the most important being -log10 pvalue(FBAT) and power(FBAT). For a complete description of these and the other statistics reported, please see the PBAT Family-Based Analysis section of the SVS Manual.
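
The -log10 transform is used because it turns very small p-values into large positive numbers, so the strongest associations appear as the tallest points in a plot. A minimal sketch of the transform follows; the function name is hypothetical and the input p-values are made up, not results from this dataset.

```python
import math


def neg_log10(p_value):
    """Return -log10(p) for a p-value in the interval (0, 1]."""
    if not 0 < p_value <= 1:
        raise ValueError("p-value must be greater than 0 and at most 1")
    return -math.log10(p_value)


# Made-up p-values for illustration only.
p_values = [0.5, 0.01, 0.001]
transformed = [neg_log10(p) for p in p_values]
print(transformed)  # smaller p-values give larger transformed values
```

Each tenfold decrease in the p-value raises the transformed value by one unit.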

B. Plot Results

We will examine both the -log10 pvalue(FBAT) and power(FBAT) columns.

  • From the PBAT Results spreadsheet, right-click on the -log10 pvalue(FBAT) column and select Plot Variable in GenomeBrowse.
  • Zoom into Chromosome 22 by copying and pasting 22: 13,501,202 – 51,304,566 into the address bar at the top of the GenomeBrowse window.
Figure 3-2: Plot of -log10 pvalues (FBAT)

This opens the plot viewer with -log10 pvalues displayed according to chromosome and position (Figure 3-2). You can add additional plots to this view from the User Graphs node in the Graph Control Interface.

  • Go to File > Plot and click the Project button. Select the PBAT Results spreadsheet, check the power(FBAT) item, and click Plot & Close.

You should now have two graphs in the plot viewer.

Updated on March 22, 2021
