
VarSeq Example Project

Example Tumor / Normal Pair Analysis

Work through a paired tumor/normal exome from a published gastric cancer study. Filter 11,079 variants down to confirmed somatic events using COSMIC, a 661-gene custom list, and side-by-side BAM review in GenomeBrowse.

This project contains an exome pair (Normal-N990005 and Tumor-T990005) from the gastric cancer study "Exome sequencing of gastric adenocarcinoma identifies recurrent somatic mutations in cell adhesion and chromatin remodeling genes," published in Nature Genetics. The exome sequence data was downloaded from the NCBI Sequence Read Archive (SRA) under accession SRA045832. Batch single-sample variant calling was done with BWA + GATK through the Seven Bridges Genomics pipeline. The full BAM and VCF data is available in VarSeq under Tools > Manage Data Sources, in Example Samples > Gastric Cancer Samples.

Initial Workflow Summary

The VCF files were imported using the Tumor-Normal template. We set the tumor/normal relationship and, on the last dialog, restricted import to chromosomes 3-5 within defined exon regions from RefSeq Genes 105v2, NCBI. The result: 11,079 variants from the combined pair.

Initial 11,079 variants from the tumor/normal pair
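The import restriction described above amounts to a coordinate test on each variant. The sketch below is illustrative only: VarSeq applies the restriction itself during import, and the function names, the 1-based inclusive interval convention, and the toy coordinates are assumptions rather than VarSeq internals or real RefSeq exon boundaries.

```python
# Illustrative sketch only; VarSeq performs this restriction during import.
ALLOWED_CHROMS = {"3", "4", "5"}


def build_exon_index(exons):
    """Group (chrom, start, end) exon intervals by chromosome.

    Assumption: intervals are 1-based and inclusive at both ends.
    """
    index = {}
    for chrom, start, end in exons:
        index.setdefault(chrom, []).append((start, end))
    return index


def keep_on_import(chrom, pos, exon_index):
    """Return True if the position is on chromosomes 3-5 and inside a listed exon."""
    if chrom not in ALLOWED_CHROMS:
        return False
    return any(start <= pos <= end for start, end in exon_index.get(chrom, []))


# Toy intervals, not real exon coordinates.
toy_index = build_exon_index([("3", 100, 200), ("6", 100, 200)])
```

A linear scan over the intervals keeps the sketch short; a real exon track would normally use a sorted or tree-based interval lookup.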

The variants were then annotated with the following sources:

  • RefSeq Genes 105v2, NCBI
  • COSMIC 71v2 Mutations Left Aligned 71v2, GHI
  • OMIM Genes and Phenotypes from the 2016-06-01 release

The variants were then filtered with:

  • Tumor Sample Filter field contains PASS
  • Tumor Sample Read Depth (DP) ≥ 10
  • Tumor Sample Alt Allele Freq > 0.01
  • Normal Sample Alt Allele Freq < 0.001 or missing
  • Variant present in COSMIC 71v2
  • Variant present in the 661 Study Target Genes selected gene list

Tumor/normal filter chain with COSMIC and gene list filters
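The six filter cards behave like a sequence of predicates, each applied to the survivors of the previous one. The following is a minimal sketch of that logic under stated assumptions: the record layout (`tumor_filter`, `tumor_dp`, `tumor_af`, `normal_af`, `key`, `gene`) and the function name are hypothetical, and the toy records are made up to exercise each card, not taken from the project.

```python
def run_filter_chain(variants, cosmic_keys, target_genes):
    """Apply the six filter cards in order; return survivors and per-card counts."""
    cards = [
        ("Tumor Filter contains PASS", lambda v: "PASS" in v["tumor_filter"]),
        ("Tumor DP >= 10", lambda v: v["tumor_dp"] >= 10),
        ("Tumor Alt Allele Freq > 0.01", lambda v: v["tumor_af"] > 0.01),
        ("Normal Alt Allele Freq < 0.001 or missing",
         lambda v: v.get("normal_af") is None or v["normal_af"] < 0.001),
        ("Present in COSMIC", lambda v: v["key"] in cosmic_keys),
        ("Gene in target list", lambda v: v["gene"] in target_genes),
    ]
    remaining = list(variants)
    counts = []
    for name, keep in cards:
        remaining = [v for v in remaining if keep(v)]
        counts.append((name, len(remaining)))  # the number shown at the bottom of a card
    return remaining, counts


# Toy records (hypothetical values, not project data).
toy = [
    {"key": ("3", 101, "G", "A"), "gene": "GENE1", "tumor_filter": "PASS",
     "tumor_dp": 40, "tumor_af": 0.25, "normal_af": None},
    {"key": ("4", 202, "C", "T"), "gene": "GENE2", "tumor_filter": "PASS",
     "tumor_dp": 8, "tumor_af": 0.30, "normal_af": 0.0},
    {"key": ("5", 303, "T", "C"), "gene": "GENE3", "tumor_filter": "PASS",
     "tumor_dp": 25, "tumor_af": 0.40, "normal_af": 0.45},
]
survivors, counts = run_filter_chain(toy, {("3", 101, "G", "A")}, {"GENE1"})
```

Recording the count after each card mirrors the numbers you click on in the filter chain; in the project those counts start at 11,079.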

Investigating Results

Of the 11,079 variants imported, 367 pass the first four filter cards: they meet the QA criteria of the first two cards (PASS filter and read depth) and show sufficient alternate allele frequency in the tumor with no alternate allele in the normal. As you scroll through the variant list, check each candidate against the BAM reads for both samples to verify that it is a true somatic mutation.

From these 367 variants, two Variant Sets were created: one for confirmed somatic mutations and one for potential somatic mutations.

Confirmed and potential somatic variant sets

Click the red-flagged row in the SV (Somatic Variants) variant set to see a confirmed somatic variant in GenomeBrowse: 3:9920151 G/A.

GenomeBrowse view of confirmed somatic variant 3:9920151 G/A

Click the green-flagged row in the PV (Potential Somatic Variants) variant set to see a potential somatic variant: 3:125725559 C/G.

GenomeBrowse view of potential somatic variant 3:125725559 C/G

Variants were placed in the Potential Somatic Mutations set if the normal sample had no coverage for the region or if the variant had only a single read for the alternate allele. Such variants mark regions that may need re-sequencing to confirm the finding.

Of the 367 variants, 50 are present in COSMIC 71v2. Click the 50 at the bottom of the COSMIC 71v2 filter card.

COSMIC subset showing 50 variants

Of those 50 COSMIC-matched variants, 4 are flagged as Somatic Variants (the 4 in the red square) and 2 are flagged as Potential Somatic Variants (the 2 in the green square).

Filtering by a Gene List

Match Gene List filter card with 2 matching variants

The last filter card was created with the Add > Computed Data > Match Gene List algorithm, which determines matches between the gene annotation of each variant and a user-selected list of gene or identifier symbols.

The original study identified 661 genes containing non-silent somatic point mutations, and we used that list to define the filter.
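Conceptually, gene list matching is a set-membership test between each variant's gene annotation and the chosen symbols. The sketch below is a rough illustration, not the Match Gene List algorithm itself: the function name, the case-insensitive comparison, and the whitespace trimming are assumptions, and the gene symbols are placeholders.

```python
def match_gene_list(variant_genes, gene_list):
    """Return one boolean per variant: True if any annotated gene is in the list.

    Assumption: symbols are compared case-insensitively after trimming whitespace.
    """
    wanted = {g.strip().upper() for g in gene_list if g.strip()}
    return [
        any(g.strip().upper() in wanted for g in genes)
        for genes in variant_genes
    ]


# Placeholder symbols; a variant may carry zero, one, or several gene names.
flags = match_gene_list([["GENE1"], ["GENE2", "gene3"], []], ["GENE3", " GENE9 "])
```

Passing a list of gene names per variant allows for variants that overlap more than one gene; a variant with no gene annotation never matches.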

Of the 50 variants reaching this stage, 2 fall within those 661 genes. Click the 2 at the bottom of the filter chain to update the variant table.

Final 2 variants matching the 661-gene study list

Exporting Data

To export an annotated VCF of the variants of interest for the combined pair, click the 367 on the Alt Allele Freq (Normal) < 0.001 OR missing card, then go to Export > VCF File. On the first dialog, choose to export only the variant table.

VCF export dialog with variant table selected

On the second dialog, in addition to the default checked items, include three sets of fields: RefSeq Genes 105v2, NCBI; the Summary of COSMIC 71v2 Mutations Left Aligned 71v2, GHI; and the Flag fields from the Variant Sets.

Field selection dialog for VCF export

The result is an annotated VCF file you can import into a new project for further analysis or load directly into a GenomeBrowse window for visualization.
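If you want to inspect the exported file outside VarSeq, the fixed VCF columns can be read with a few lines of plain Python. This is a minimal sketch assuming a standard tab-delimited VCF with at least eight columns; the function names are hypothetical, the sample line is made up, and the actual INFO keys depend on the fields you selected at export.

```python
def parse_vcf_line(line):
    """Parse the eight fixed VCF columns of one data line into a dict."""
    chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
    info_map = {}
    if info != ".":
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            # INFO flags have no "=value" part; store them as True.
            info_map[key] = value if sep else True
    return {
        "chrom": chrom,
        "pos": int(pos),
        "id": vid,
        "ref": ref,
        "alt": alt.split(","),  # multiple alternate alleles are comma-separated
        "filter": flt,
        "info": info_map,
    }


def read_vcf(lines):
    """Parse all data lines, skipping blank lines and '#' header lines."""
    return [parse_vcf_line(l) for l in lines if l.strip() and not l.startswith("#")]


# Made-up example content, not output from the project.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "3\t101\t.\tG\tA\t50\tPASS\tDP=40;SOMATIC",
]
records = read_vcf(example)
```

To read a file from disk, pass an open file handle to `read_vcf`, since iterating over the handle yields its lines.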
