7.1 Quality Assurance Overview
To ensure your data is of the highest quality, SVS provides a variety of features that help you not only assess the quality of
your data but also remedy any problems.
There are general quality assurance measures as well as specific features for certain types of data, such as CNV. The
features available are detailed below.
Allele and Genotype Frequencies/Counts
Calculate the allele frequencies for both the major and minor alleles and allele and genotype counts for each marker in your
dataset. See Genotype Statistics by Marker for more information.
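As an illustration of what these counts and frequencies represent, the following is a minimal Python sketch for a single marker. It is not SVS code: the function name `marker_counts` and the genotype encoding (strings such as "A_G", with "?_?" for a missing call) are assumptions made for the example.

```python
from collections import Counter

def marker_counts(genotypes):
    """Count genotypes and alleles for one marker.

    genotypes: list of strings such as "A_G"; any call containing "?"
    is treated as missing (an assumed encoding for this example).
    """
    called = [g for g in genotypes if "?" not in g]
    # Sort the two alleles so that "A_G" and "G_A" count as the same genotype.
    genotype_counts = Counter("_".join(sorted(g.split("_"))) for g in called)
    allele_counts = Counter(a for g in called for a in g.split("_"))
    total = sum(allele_counts.values())
    allele_freqs = {a: n / total for a, n in allele_counts.items()} if total else {}
    return genotype_counts, allele_counts, allele_freqs

genotype_counts, allele_counts, allele_freqs = marker_counts(
    ["A_A", "A_G", "G_A", "G_G", "A_A", "?_?"]
)
# The minor allele is the one with the lower frequency.
minor_allele = min(allele_freqs, key=allele_freqs.get)
```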
Filter Samples by Call Rates
Calculates the fraction of called genotypes for each marker or sample. See Genotype Filtering by Marker for more
information.
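The call rate itself is a simple fraction, which can be sketched in Python as follows. The function `call_rates`, the use of `None` for a missing call, and the 0.5 threshold are assumptions for illustration only, not SVS names or defaults.

```python
def call_rates(matrix, missing=None):
    """Return (per-sample, per-marker) call rates.

    matrix: non-empty list of rows (one per sample), each a list with one
    genotype call per marker; `missing` marks an uncalled genotype.
    """
    n_samples = len(matrix)
    n_markers = len(matrix[0])
    sample_rates = [sum(g != missing for g in row) / n_markers for row in matrix]
    marker_rates = [
        sum(row[j] != missing for row in matrix) / n_samples
        for j in range(n_markers)
    ]
    return sample_rates, marker_rates

calls = [
    ["A_A", "A_G", None],
    ["A_A", None, None],
    ["A_G", "G_G", "A_A"],
]
sample_rates, marker_rates = call_rates(calls)

# Keep only the markers whose call rate meets a user-chosen threshold.
min_call_rate = 0.5
kept_markers = [j for j, rate in enumerate(marker_rates) if rate >= min_call_rate]
```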
Hardy-Weinberg Equilibrium P-Value
Determine how closely respective genotypes in your dataset approximate a state of Hardy-Weinberg Equilibrium (HWE) by
calculating HWE p-values. See Genotype Statistics by Marker and Genotype Filtering by Marker for more
information.
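For readers who want to see the idea behind an HWE p-value, here is a minimal sketch of the textbook chi-square goodness-of-fit test with one degree of freedom for a biallelic marker. Whether SVS computes its p-value in exactly this way is an assumption, and the function name `hwe_chi2_pvalue` is hypothetical.

```python
import math

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square HWE test from the three genotype counts of one marker."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p                         # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

chi2_eq, p_eq = hwe_chi2_pvalue(25, 50, 25)    # counts exactly at equilibrium
chi2_dev, p_dev = hwe_chi2_pvalue(50, 0, 50)   # no heterozygotes at all
```

A small p-value indicates that the observed genotype counts deviate from the proportions expected under HWE.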
Fisher’s Exact Test for HWE P-Value
Determine how closely respective genotypes in your dataset approximate a state of Hardy-Weinberg Equilibrium (HWE) by
calculating Fisher’s Exact Test for HWE p-values. See Genotype Statistics by Marker and Genotype Filtering by Marker for
more information.
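An exact test avoids the large-sample approximation of the chi-square test, which matters for rare alleles. The sketch below enumerates every possible heterozygote count given the observed allele counts and sums the probabilities of the outcomes that are no more likely than the observed one. This is the standard exact HWE test for a biallelic marker; that SVS follows the same procedure is an assumption, and `hwe_exact_pvalue` is a hypothetical name.

```python
import math

def hwe_exact_pvalue(n_aa, n_ab, n_bb):
    """Exact HWE p-value for one biallelic marker."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab      # copies of allele A
    n_b = 2 * n - n_a          # copies of allele B

    def log_prob(het):
        # Log-probability of `het` heterozygotes given n, n_a and n_b.
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (
            math.lgamma(n + 1)
            - math.lgamma(hom_a + 1) - math.lgamma(het + 1) - math.lgamma(hom_b + 1)
            + het * math.log(2)
            + math.lgamma(n_a + 1) + math.lgamma(n_b + 1)
            - math.lgamma(2 * n + 1)
        )

    # The heterozygote count always has the same parity as n_a.
    probs = {h: math.exp(log_prob(h)) for h in range(n_a % 2, min(n_a, n_b) + 1, 2)}
    observed = probs[n_ab]
    # The small tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs.values() if p <= observed * (1 + 1e-9)))

p_no_hets = hwe_exact_pvalue(1, 0, 1)
p_all_hets = hwe_exact_pvalue(0, 2, 0)
```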
Signed HWE Correlation R
Determine how closely respective genotypes in your dataset approximate a state of Hardy-Weinberg Equilibrium (HWE) by
calculating HWE correlation R values. See Genotype Statistics by Marker and Genotype Filtering by Marker for more
information.
Hardy-Weinberg Thw P-value
Determine if genotypes for samples violate population-based transmission Hardy-Weinberg principles. See Genotype
Statistics by Sample for more information.
PBAT Family-Based QC Measures
Determine if genotypes violate PBAT family-based quality control measures, including Mendelian errors. See PBAT
Family-Based Analysis for more information.
Wave Detection/Correction
Detect and optionally correct the genomic wave phenomenon described by Diskin et al. See Wave Detection and Correction
for more information.
Genotype Principal Component Analysis
Adjust for population stratification on genotypic markers. See Genotypic Principal Component Analysis for more
information.
Numeric Principal Component Analysis
Adjust for batch effects or population stratification on log2 ratio data or other numeric data. See Numeric Principal
Component Analysis for more information.
SNP Concordance
Calculate various concordance rates for all SNPs for a given set of samples. See Quality Assurance Procedures for more
information.
Filtering Markers
Exclude markers that are out of HWE, that have a minor allele frequency or call rate below a user-specified threshold, or that
do not meet other quality control thresholds. See Genotype Filtering by Marker for more information.
Derivative Log Ratio Spread
The derivative log ratio spread (DLRS) is a measurement of point-to-point consistency or noisiness in log ratio data. Samples
with higher values of DLRS tend to have poor signal-to-noise properties. See Quality Assurance Procedures for more
information.
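To make the measurement concrete, the sketch below uses one common definition of DLRS: the standard deviation of the differences between adjacent log ratio values, divided by the square root of 2. Robust variants based on the interquartile range also exist, and which variant SVS uses is not stated here, so treat the formula as an assumption. The function name `dlrs` is hypothetical.

```python
import math
import statistics

def dlrs(log_ratios):
    """Derivative log ratio spread of one sample.

    log_ratios: values in genomic order; at least three are needed so that
    there are two point-to-point differences.
    """
    diffs = [b - a for a, b in zip(log_ratios, log_ratios[1:])]
    # Dividing by sqrt(2) converts the spread of differences back to the
    # spread of a single measurement, assuming independent noise.
    return statistics.stdev(diffs) / math.sqrt(2)

quiet_sample = [0.0, 0.1, 0.0, 0.1, 0.0]
noisy_sample = [0.0, 1.0, 0.0, 1.0, 0.0]
quiet_dlrs = dlrs(quiet_sample)
noisy_dlrs = dlrs(noisy_sample)
```

Because the statistic is built from differences between neighboring points, a genuine copy number change contributes only at its boundaries, so DLRS mainly reflects noise.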
Percentile Based Winsorizing
Calculates thresholds at the user-specified top and bottom percentiles of the log ratio data for the purpose of
winsorizing, that is, replacing log ratio values beyond those thresholds with the thresholds themselves. Winsorizing prevents
segmentation algorithms from being driven by outlier values and results in a more accurate determination of regions of copy
number variation. See Quality Assurance Procedures for more information.
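The following Python sketch shows the winsorizing step on a short list of values. The percentile method (linear interpolation), the function names, and the 10th/90th percentile settings are assumptions chosen for the example; SVS may compute its thresholds differently.

```python
def percentile(sorted_vals, pct):
    """Linear-interpolation percentile (pct in 0..100) of a pre-sorted list."""
    k = (len(sorted_vals) - 1) * pct / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)

def winsorize(values, lower_pct, upper_pct):
    """Clamp values to the thresholds at the given percentiles."""
    ordered = sorted(values)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    # Values outside [low, high] are replaced by the nearer threshold.
    return [min(max(v, low), high) for v in values], (low, high)

log_ratios = [-10.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0]
clipped, thresholds = winsorize(log_ratios, lower_pct=10, upper_pct=90)
```

Unlike trimming, winsorizing keeps every data point in place, so marker positions stay aligned for later segmentation.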
Autosome Heterozygosity
Calculates the heterozygosity of samples by examining the marker mapped genotype columns in a spreadsheet. See Quality
Assurance Procedures for more information.
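As a rough illustration, the heterozygosity of a sample can be taken as the fraction of its called autosomal genotypes that are heterozygous. The sketch below assumes that definition, along with a hypothetical "A_G" string encoding with "?_?" for missing calls; it is not the SVS implementation.

```python
def heterozygosity_rate(genotypes, missing="?_?"):
    """Fraction of called genotypes that are heterozygous, or None if none are called."""
    called = [g for g in genotypes if g != missing]
    if not called:
        return None
    # A genotype is heterozygous when its two alleles differ.
    het = sum(1 for g in called if len(set(g.split("_"))) == 2)
    return het / len(called)

sample_rate = heterozygosity_rate(["A_A", "A_G", "G_G", "?_?", "C_T"])
```

Samples with unusually high heterozygosity may indicate contamination, while unusually low values may indicate inbreeding or genotyping problems.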
Identity by Descent Estimation
Estimates Identity by Descent (IBD) between all pairs of individuals, based on the data in your genotypic spreadsheet. This
function should mainly be used as a quality control measure. The samples are required to be row wise, and only
the autosomal genotype columns should be active. See Identity by Descent Estimation for more information.
- NOTE: It is usually advisable to apply LD pruning before using this feature.
Inbreeding Coefficients
Calculates the inbreeding coefficients of the individuals corresponding to your samples by looking at the samples’ autosomal
data. This function requires a marker mapped spreadsheet containing genotype columns with samples row wise. See
Inbreeding Coefficients for more information.
- NOTE: It is usually advisable to apply LD pruning before using this feature.
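One widely used method-of-moments estimator compares the observed number of homozygous genotypes with the number expected under HWE: F = (O - E) / (N - E), where N is the number of called markers. The sketch below assumes that estimator, which may differ from the one SVS uses. The function name and the 0/1/2 dosage encoding are hypothetical.

```python
def inbreeding_coefficient(dosages, allele_freqs):
    """Estimate F for one sample.

    dosages: per-marker count of a reference allele (0, 1 or 2), None if missing.
    allele_freqs: per-marker population frequency of that same allele.
    """
    observed_hom = 0
    expected_hom = 0.0
    n = 0
    for g, p in zip(dosages, allele_freqs):
        if g is None or p <= 0 or p >= 1:
            continue  # skip missing calls and monomorphic markers
        n += 1
        if g != 1:
            observed_hom += 1
        expected_hom += 1 - 2 * p * (1 - p)  # P(homozygous) under HWE
    if n == expected_hom:
        return None  # no informative markers
    return (observed_hom - expected_hom) / (n - expected_hom)

freqs = [0.5, 0.5, 0.5, 0.5]
f_all_hom = inbreeding_coefficient([0, 2, 0, 2], freqs)
f_as_expected = inbreeding_coefficient([1, 1, 0, 2], freqs)
f_all_het = inbreeding_coefficient([1, 1, 1, 1], freqs)
```

Values near 0 match HWE expectations, positive values indicate an excess of homozygotes, and negative values indicate an excess of heterozygotes.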
LD Pruning
Deactivates (“prunes”) genotypic data from the active columns of the current spreadsheet based on pairwise LD. If any pair
of markers which are both within a moving window has LD greater than the specified threshold, the first marker of the pair
will be deactivated. See LD Pruning for more information.
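The pruning rule described above can be sketched as follows. The sketch measures LD as the squared Pearson correlation of genotype dosages and defines the window as a count of consecutive markers; both choices are assumptions for illustration, as are the function names and the default parameter values.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation of two equal-length dosage lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant marker carries no LD information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def ld_prune(markers, window=3, threshold=0.5):
    """Return one active flag per marker.

    markers: per-marker dosage lists in genomic order, without missing values.
    A marker is deactivated when it is in LD above `threshold` with any later
    marker in the same window, so the first marker of each such pair is pruned.
    """
    active = [True] * len(markers)
    for i in range(len(markers)):
        for j in range(i + 1, min(i + window, len(markers))):
            if pearson_r2(markers[i], markers[j]) > threshold:
                active[i] = False
                break
    return active

m0 = [0, 1, 2, 0, 1, 2]
m1 = [0, 1, 2, 0, 1, 2]   # identical to m0, so r^2 = 1
m2 = [2, 0, 1, 1, 2, 0]   # r^2 with m1 is 0.25
active_flags = ld_prune([m0, m1, m2], window=2, threshold=0.5)
```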
SNP Density
Reports the SNP density of the current marker mapped genotypic columns. See Quality Assurance Procedures for more
information.
X Heterozygosity Gender Inference
Predicts the gender of samples by examining the X-Chromosome data. This function requires a spreadsheet that contains
marker-mapped genotypic columns from the X Chromosome with samples row wise. See Quality Assurance Procedures for
more information.
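The reasoning behind this inference is that males carry a single X chromosome and so should show almost no heterozygous X genotypes, whereas females typically show many. A minimal sketch, assuming hypothetical cutoff values and an "A_G" string encoding with "?_?" for missing calls:

```python
def infer_gender(x_genotypes, male_max_het=0.05, female_min_het=0.15, missing="?_?"):
    """Return (label, heterozygosity rate) from X-chromosome genotypes.

    The two cutoffs are illustrative assumptions, not SVS defaults.
    """
    called = [g for g in x_genotypes if g != missing]
    if not called:
        return "Unknown", None
    het_rate = sum(1 for g in called if len(set(g.split("_"))) == 2) / len(called)
    if het_rate <= male_max_het:
        return "Male", het_rate
    if het_rate >= female_min_het:
        return "Female", het_rate
    return "Unknown", het_rate  # ambiguous: between the two cutoffs

male_call = infer_gender(["A_A", "G_G", "C_C", "T_T"])
female_call = infer_gender(["A_G", "G_G", "C_T", "T_T"])
empty_call = infer_gender([])
```

Comparing the predicted gender with the recorded gender is a quick way to spot sample mix-ups.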
Multidimensional Outlier Detection
Identifies outliers in more than one dimension. This function computes a distance score for N columns and determines if
samples have distance scores greater than a user-specified threshold. See Quality Assurance Procedures for more
information.
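The distance score used by SVS is not specified in this overview. Purely as an illustration of the idea, the sketch below standardizes each column and uses the Euclidean distance of each sample from the column means as its score. The function names and the threshold of 2.0 are hypothetical.

```python
import math
import statistics

def distance_scores(rows):
    """Standardized Euclidean distance of each row from the column means."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    scores = []
    for row in rows:
        # Columns with zero spread are skipped to avoid dividing by zero.
        z2 = [((v - m) / s) ** 2 for v, m, s in zip(row, means, sds) if s > 0]
        scores.append(math.sqrt(sum(z2)))
    return scores

def find_outliers(rows, threshold):
    """Indices of rows whose distance score exceeds the threshold."""
    return [i for i, score in enumerate(distance_scores(rows)) if score > threshold]

data = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]]
outliers = find_outliers(data, threshold=2.0)
```

A typical use is to run such a score on the first few principal components to flag samples that sit far from the main cluster.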
Column Statistics
Calculates statistics for all real-valued, integer-valued, and (optionally) binary columns in a spreadsheet. See Quality Assurance
Procedures for more information.
Compare Columns
Compares two columns and inactivates the rows in which their values differ. See Quality Assurance Procedures for more
information.
Row Average by Chromosome
Calculates the mean of the integer- and real-valued columns for each row, creating a new spreadsheet with the respective row
means. If the data is marker mapped, the row means are calculated by chromosome and overall. See Quality Assurance
Procedures for more information.
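The calculation for a single row can be sketched as below. The function name, the use of `None` for missing values, and the "overall" key are assumptions for the example rather than SVS output column names.

```python
from collections import defaultdict
from statistics import mean

def row_means_by_chromosome(row, chromosomes):
    """Mean of one row's numeric values, per chromosome and overall.

    row: one value per column (None if missing).
    chromosomes: the chromosome label mapped to each column.
    """
    by_chromosome = defaultdict(list)
    for value, chrom in zip(row, chromosomes):
        if value is not None:
            by_chromosome[chrom].append(value)
    result = {chrom: mean(values) for chrom, values in by_chromosome.items()}
    all_values = [v for values in by_chromosome.values() for v in values]
    result["overall"] = mean(all_values) if all_values else None
    return result

means = row_means_by_chromosome(
    [1.0, 3.0, 10.0, None],
    ["1", "1", "2", "2"],
)
```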