7.1 Quality Assurance Overview

To ensure your data is of the highest quality, SVS provides a variety of features that not only help you assess the quality of your data, but remedy any problems as well.

There are general quality assurance measures as well as specific features for certain types of data, such as CNV. The features available are detailed below.

Allele and Genotype Frequencies/Counts
Calculate the allele frequencies for both the major and minor alleles and allele and genotype counts for each marker in your dataset. See Genotype Statistics by Marker for more information.
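
As a rough illustration of what these counts involve, the following Python sketch tallies genotype counts, allele counts, and allele frequencies for a single marker. It is not SVS code: the function name marker_stats and the "A_B" / "?_?" genotype encoding are assumptions made for the example.

```python
from collections import Counter


def marker_stats(genotypes, missing="?"):
    """Tally genotype counts, allele counts and allele frequencies for one marker.

    genotypes: strings such as "A_B"; any genotype containing `missing`
    is treated as a no-call (hypothetical encoding for this example).
    """
    # Sort the two alleles so that "A_B" and "B_A" count as the same genotype.
    called = ["_".join(sorted(g.split("_"))) for g in genotypes if missing not in g]
    genotype_counts = Counter(called)
    allele_counts = Counter(a for g in called for a in g.split("_"))
    total = sum(allele_counts.values())
    freqs = {a: n / total for a, n in allele_counts.items()} if total else {}
    # Order alleles from least to most frequent (ties broken alphabetically).
    ordered = sorted(freqs, key=lambda a: (freqs[a], a))
    return {
        "genotype_counts": dict(genotype_counts),
        "allele_counts": dict(allele_counts),
        "allele_freqs": freqs,
        "minor": ordered[0] if ordered else None,
        "major": ordered[-1] if ordered else None,
    }
```

For a monomorphic marker, this sketch reports the same allele as both minor and major.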

Filter Samples by Call Rates
Calculates the fraction of called genotypes for each marker or sample. See Genotype Filtering by Marker for more information.
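
The call rate itself is a simple ratio of called genotypes to total genotypes. A minimal Python sketch, assuming a samples-by-markers list of genotype strings in which "?_?" marks a missing call (both the layout and the function names are assumptions for illustration, not the SVS interface):

```python
def call_rate(genotypes, missing="?_?"):
    """Fraction of genotypes that are not the missing-call code."""
    if not genotypes:
        return 0.0
    called = sum(1 for g in genotypes if g != missing)
    return called / len(genotypes)


def call_rates(matrix):
    """Return (per-sample, per-marker) call rates.

    matrix: list of rows, one row per sample, one column per marker.
    """
    by_sample = [call_rate(row) for row in matrix]
    by_marker = [call_rate(list(col)) for col in zip(*matrix)]
    return by_sample, by_marker
```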

Hardy-Weinberg Equilibrium P-Value
Determine how closely respective genotypes in your dataset approximate a state of Hardy-Weinberg Equilibrium (HWE) by calculating HWE p-values. See Genotype Statistics by Marker and Genotype Filtering by Marker for more information.
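
One common way to obtain an HWE p-value is a chi-square goodness-of-fit test with one degree of freedom on the three genotype counts. The Python sketch below shows that approach under the assumption of a biallelic marker; whether SVS computes its HWE p-value in exactly this way is not stated here, and the function name is hypothetical.

```python
import math


def hwe_chi_square_p(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for departure from Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic marker: the test is not informative
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    return math.erfc(math.sqrt(chi2 / 2))
```

A small p-value suggests the marker departs from HWE, which is often treated as a sign of genotyping problems.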

Fisher’s Exact Test for HWE P-Value
Determine how closely respective genotypes in your dataset approximate a state of Hardy-Weinberg Equilibrium (HWE) by calculating Fisher’s Exact Test for HWE p-values. See Genotype Statistics by Marker and Genotype Filtering by Marker for more information.

Signed HWE Correlation R
Determine how closely respective genotypes in your dataset approximate a state of Hardy-Weinberg Equilibrium (HWE) by calculating HWE correlation R values. See Genotype Statistics by Marker and Genotype Filtering by Marker for more information.

Hardy-Weinberg Thw P-value
Determine if genotypes for samples violate population-based transmission Hardy-Weinberg principles. See Genotype Statistics by Sample for more information.

PBAT Family-Based QC Measures
Determine if genotypes violate PBAT family-based quality control measures, including Mendelian errors. See PBAT Family-Based Analysis for more information.

Wave Detection/Correction
Detect and optionally correct the genomic wave phenomenon described by Diskin et al. See Wave Detection and Correction for more information.

Genotype Principal Component Analysis
Adjust for population stratification on genotypic markers. See Genotypic Principal Component Analysis for more information.

Numeric Principal Component Analysis
Adjust for batch effects or population stratification on log2 ratio data or other numeric data. See Numeric Principal Component Analysis for more information.

SNP Concordance
Calculate various concordance rates for all SNPs for a given set of samples. See Quality Assurance Procedures for more information.

Filtering Markers
Exclude markers that are out of HWE, that have a minor allele frequency or call rate below a user-specified threshold, or that do not meet other quality control thresholds. See Genotype Filtering by Marker for more information.

Derivative Log Ratio Spread
The derivative log ratio spread (DLRS) is a measurement of point-to-point consistency or noisiness in log ratio data. Samples with higher values of DLRS tend to have poor signal-to-noise properties. See Quality Assurance Procedures for more information.
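
To make the idea of point-to-point noisiness concrete, the Python sketch below uses one commonly cited form of DLRS: the standard deviation of the differences between adjacent log ratios, divided by the square root of 2. This is an assumption for illustration; the exact estimator used by SVS (for example, a robust IQR-based variant) may differ.

```python
import math
import statistics


def dlrs(log_ratios):
    """Spread of adjacent-probe differences, scaled by 1/sqrt(2).

    log_ratios: log ratio values for one sample, in genomic order.
    """
    diffs = [b - a for a, b in zip(log_ratios, log_ratios[1:])]
    if len(diffs) < 2:
        raise ValueError("need at least three log ratio values")
    return statistics.stdev(diffs) / math.sqrt(2)
```

Because the measure is based on differences between neighbors, a long stretch of gained or lost copy number shifts the level but contributes little to the score, while probe-to-probe noise raises it.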

Percentile Based Winsorizing
Calculates thresholds for the user-specified top and bottom percentiles of log ratio data for the purpose of winsorizing, that is, replacing extreme log ratio values with the calculated thresholds. Winsorizing helps prevent segmentation algorithms from being driven by outlier values, which can lead to a more accurate determination of regions of copy number variation. See Quality Assurance Procedures for more information.
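
The operation can be sketched in a few lines of Python. The percentile rule used here (linear interpolation between sorted values) and the function names are assumptions for the example; SVS may compute its thresholds differently.

```python
def percentile(sorted_vals, pct):
    """Linear-interpolation percentile of an already sorted list (pct in 0..100)."""
    if not sorted_vals:
        raise ValueError("no values")
    pos = (len(sorted_vals) - 1) * pct / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * frac


def winsorize(values, lower_pct=5, upper_pct=95):
    """Clip values to the lower and upper percentile thresholds.

    Returns (clipped_values, low_threshold, high_threshold).
    """
    ordered = sorted(values)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    clipped = [min(max(v, low), high) for v in values]
    return clipped, low, high
```

Values between the two thresholds are left unchanged; only the tails are pulled in.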

Autosome Heterozygosity
Calculates the heterozygosity of samples by examining the marker mapped genotype columns in a spreadsheet. See Quality Assurance Procedures for more information.
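
In its simplest form, a sample's heterozygosity is the fraction of its called autosomal genotypes that are heterozygous. A minimal Python sketch, assuming genotypes encoded as strings such as "A_B" with "?_?" for a missing call (a hypothetical encoding; restricting the input to autosomal markers is left to the caller):

```python
def heterozygosity_rate(genotypes, missing="?_?"):
    """Fraction of called genotypes whose two alleles differ.

    Returns None when the sample has no called genotypes.
    """
    called = [g for g in genotypes if g != missing]
    if not called:
        return None
    het = sum(1 for g in called if len(set(g.split("_"))) == 2)
    return het / len(called)
```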

Identity by Descent Estimation
Estimates Identity by Descent (IBD) between all pairs of individuals, based on the data in your genotypic spreadsheet. This function should mainly be used as a quality control measure. The samples are required to be row wise, and only the autosomal genotype columns should be active. See Identity by Descent Estimation for more information.

  • NOTE: It is usually advisable to apply LD pruning before using this feature.

Inbreeding Coefficients
Calculates the inbreeding coefficients of the individuals corresponding to your samples by looking at the samples’ autosomal data. This function requires a marker mapped spreadsheet containing genotype columns with samples row wise. See Inbreeding Coefficients for more information.

  • NOTE: It is usually advisable to apply LD pruning before using this feature.
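
A widely used estimator of the inbreeding coefficient compares a sample's observed number of homozygous genotypes with the number expected under HWE: F = (O - E) / (N - E), where N is the number of called markers. The Python sketch below implements that method-of-moments form as an illustration only; the estimator used by SVS may differ, and the 0/1/2 genotype coding and function name are assumptions.

```python
def inbreeding_coefficient(sample_genotypes, allele_freqs):
    """Method-of-moments estimate F = (O - E) / (N - E).

    sample_genotypes: minor-allele counts per marker (0, 1 or 2; None = missing).
    allele_freqs: minor allele frequency for each marker, in the same order.
    Returns None if the estimate is undefined.
    """
    observed_hom = 0
    expected_hom = 0.0
    n = 0
    for g, p in zip(sample_genotypes, allele_freqs):
        if g is None:
            continue
        n += 1
        if g != 1:
            observed_hom += 1
        # Under HWE, P(homozygous) = 1 - 2p(1 - p).
        expected_hom += 1 - 2 * p * (1 - p)
    if n == 0 or n == expected_hom:
        return None
    return (observed_hom - expected_hom) / (n - expected_hom)
```

Values near 0 are consistent with random mating, strongly positive values indicate an excess of homozygosity, and strongly negative values can point to sample contamination.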

LD Pruning
Deactivates (“prunes”) genotypic data from the active columns of the current spreadsheet based on pairwise LD. If any two markers within the same moving window have pairwise LD greater than the specified threshold, the first marker of the pair is deactivated. See LD Pruning for more information.
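
The pruning rule can be sketched in Python as follows. This is a simplified illustration, not the SVS algorithm: it measures LD as the squared Pearson correlation between 0/1/2 genotype vectors, uses a window counted in markers, and all names and defaults are assumptions.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant marker carries no LD information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def ld_prune(markers, window=3, threshold=0.5):
    """Return one boolean per marker: True if it stays active.

    markers: genotype vectors (0/1/2 per sample), listed in map order.
    A marker is deactivated if it exceeds the threshold with any later
    marker inside the same window, so the first of each pair is dropped.
    """
    active = [True] * len(markers)
    for i in range(len(markers)):
        for j in range(i + 1, min(i + window, len(markers))):
            if r_squared(markers[i], markers[j]) > threshold:
                active[i] = False
                break
    return active
```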

SNP Density
Reports the SNP density of the current marker mapped genotypic columns. See Quality Assurance Procedures for more information.

X Heterozygosity Gender Inference
Predicts the gender of samples by examining X chromosome data. This function requires a spreadsheet that contains marker-mapped genotypic columns from the X chromosome, with samples row wise. See Quality Assurance Procedures for more information.

Multidimensional Outlier Detection
Identifies outliers in more than one dimension. This function computes a distance score for N columns and determines if samples have distance scores greater than a user-specified threshold. See Quality Assurance Procedures for more information.
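
The description above does not say how the distance score is defined. As one possible reading, the Python sketch below scores each sample by its Euclidean distance from the column means, with each column scaled by its standard deviation, and flags samples above a threshold. The scoring scheme and function names are assumptions for illustration and may not match what SVS computes.

```python
import math
import statistics


def distance_scores(rows):
    """Distance of each row from the column means, in standard-deviation units.

    rows: one list of N numeric values per sample.
    """
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.pstdev(c) for c in columns]
    scores = []
    for row in rows:
        # Constant columns (sd == 0) cannot separate samples, so skip them.
        z = [(v - m) / s for v, m, s in zip(row, means, sds) if s > 0]
        scores.append(math.sqrt(sum(x * x for x in z)))
    return scores


def flag_outliers(rows, threshold):
    """True for each sample whose distance score exceeds the threshold."""
    return [score > threshold for score in distance_scores(rows)]
```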

Column Statistics
Calculates statistics for all real-valued, integer-valued, and (optionally) binary columns in a spreadsheet. See Quality Assurance Procedures for more information.

Compare Columns
Compares two columns and inactivates the rows in which their values differ. See Quality Assurance Procedures for more information.

Row Average by Chromosome
Calculates the mean of the integer and real-valued columns for each row, creating a new spreadsheet with the respective row means. If the data is marker mapped, the row means are calculated by chromosome and overall. See Quality Assurance Procedures for more information.
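
For marker-mapped data, the calculation amounts to grouping a row's values by chromosome before averaging. A minimal Python sketch for a single row, assuming a parallel list of chromosome labels for the columns and None for missing values (the function name and data layout are hypothetical):

```python
from collections import defaultdict


def row_means_by_chromosome(row, chromosomes):
    """Mean of one row's values per chromosome, plus the overall mean.

    row: numeric values for one sample (None = missing).
    chromosomes: chromosome label of each column, in the same order.
    """
    groups = defaultdict(list)
    for value, chrom in zip(row, chromosomes):
        if value is not None:
            groups[chrom].append(value)
    means = {chrom: sum(vals) / len(vals) for chrom, vals in groups.items()}
    all_values = [v for vals in groups.values() for v in vals]
    means["overall"] = sum(all_values) / len(all_values) if all_values else None
    return means
```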