15.7 Runs Of Homozygosity (ROH) Algorithm

The Runs of Homozygosity algorithm is designed to find runs of consecutive homozygous SNPs and then to find the regions where those runs are shared by at least a specified number of samples.

The first part of the analysis determines the homozygous runs, or Runs of Homozygosity (ROH). Each run can be represented by the index of the SNP where the run starts and the index of the last SNP in the run. A homozygous genotype is one where both alleles are the same, such as A_A or B_B.
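As a minimal sketch of this representation (not the SVS implementation), a run can be stored as a pair of inclusive SNP indices, and each genotype can be labeled before any runs are built. The names `Run` and `classify`, and the use of `?` to mark a missing allele, are assumptions made for illustration.

```python
from typing import NamedTuple


class Run(NamedTuple):
    """A run of homozygosity, stored as inclusive SNP indices."""
    start: int  # index of the SNP where the run starts
    end: int    # index of the last SNP in the run

    @property
    def n_snps(self) -> int:
        # Both endpoints belong to the run, hence the +1.
        return self.end - self.start + 1


def classify(genotype: str, missing: str = "?") -> str:
    """Label a genotype such as 'A_A' as 'hom', 'het' or 'missing'.

    The '?' missing-allele symbol is an assumed encoding.
    """
    first, second = genotype.split("_")
    if first == missing or second == missing:
        return "missing"
    return "hom" if first == second else "het"
```

For example, `classify("B_B")` yields `"hom"` and `Run(2, 5).n_snps` is 4.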

The PLINK ROH algorithm uses a moving window, which can introduce artificial runs and fails to capture runs larger than the window. The SVS algorithm instead works continuously across an entire chromosome, considering every possible run that matches the parameters specified in the Runs of Homozygosity window.

Each SNP is classified as homozygous, heterozygous or missing, and every homozygous SNP is treated as the potential start of a new run. Each potential run is then updated: it is extended if the SNP is homozygous, and otherwise its heterozygous or missing count is incremented. Runs that exceed the allowed number of heterozygous or missing SNPs are checked against the specified length and SNP density parameters to decide whether they should be thrown out.

At the end of a chromosome, the longest runs are greedily chosen from all of the potential runs, with no overlap between chosen runs allowed, until no more runs are available. The result is the set of longest runs of homozygous SNPs for a given chromosome and sample. This is repeated for each chromosome and sample in the dataset.
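The steps above can be illustrated with a simplified sketch for one sample on one chromosome. It is not the SVS implementation: the function name `find_runs` and its parameters are hypothetical, run length is measured only in SNPs, the SNP density check is omitted, and a run is assumed to end at its last homozygous SNP.

```python
def find_runs(calls, min_snps, max_het=0, max_missing=0):
    """Return non-overlapping runs as sorted (start, end) index pairs.

    calls: one 'hom', 'het' or 'missing' label per SNP, in chromosome order.
    """
    def keep(run):
        # Length filter only; a density filter would also go here.
        return run[1] - run[0] + 1 >= min_snps

    open_runs = []   # each entry: [start, last_hom_index, het_count, missing_count]
    candidates = []  # closed runs that passed the length filter

    for i, call in enumerate(calls):
        if call == "hom":
            for run in open_runs:
                run[1] = i                  # extend every open run
            open_runs.append([i, i, 0, 0])  # this SNP may also start a new run
            continue
        column = 2 if call == "het" else 3
        limit = max_het if call == "het" else max_missing
        still_open = []
        for run in open_runs:
            run[column] += 1
            if run[column] > limit:
                if keep(run):               # run is closed; keep or discard it
                    candidates.append((run[0], run[1]))
            else:
                still_open.append(run)
        open_runs = still_open

    # Runs still open at the end of the chromosome are closed there.
    candidates.extend((r[0], r[1]) for r in open_runs if keep(r))

    # Greedy selection: longest first, skipping anything that overlaps a chosen run.
    candidates.sort(key=lambda r: (-(r[1] - r[0] + 1), r[0]))
    chosen = []
    for start, end in candidates:
        if all(end < c_start or start > c_end for c_start, c_end in chosen):
            chosen.append((start, end))
    return sorted(chosen)
```

Because every homozygous SNP opens its own candidate, this sketch is quadratic in the worst case; it is meant to show the bookkeeping rather than to be efficient.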

A second algorithm is used to cluster the runs. Given the final list of ROHs calculated by the previous algorithm and a threshold Smin for the minimum number of samples that contain a run, a cluster is defined as a contiguous set of SNPs in which every SNP lies within a run in at least Smin samples. The algorithm sweeps across all the SNPs in the dataset looking for clusters that meet the run length requirements provided to the algorithm. The list of clusters is output to a spreadsheet. Optionally, a second spreadsheet is computed that gives, for each sample, the ratio of SNPs in each cluster that are members of a run of homozygosity.
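A minimal sketch of the clustering sweep follows, assuming runs are given as inclusive (start, end) SNP index pairs per sample and that the length requirement is a minimum SNP count. The names `find_clusters`, `sample_ratios`, `s_min` and `min_snps` are hypothetical and this is not the SVS implementation.

```python
def find_clusters(runs_by_sample, n_snps, s_min, min_snps=1):
    """Return clusters as (start, end) pairs of inclusive SNP indices.

    runs_by_sample: one list of non-overlapping (start, end) runs per sample.
    """
    # Count, for each SNP, how many samples have a run covering it.
    depth = [0] * n_snps
    for runs in runs_by_sample:
        for start, end in runs:
            for i in range(start, end + 1):
                depth[i] += 1

    clusters = []
    cluster_start = None
    # Sweep one position past the last SNP so an open cluster is closed.
    for i in range(n_snps + 1):
        inside = i < n_snps and depth[i] >= s_min
        if inside and cluster_start is None:
            cluster_start = i
        elif not inside and cluster_start is not None:
            if i - cluster_start >= min_snps:
                clusters.append((cluster_start, i - 1))
            cluster_start = None
    return clusters


def sample_ratios(runs_by_sample, cluster):
    """For each sample, the fraction of the cluster's SNPs that lie in a run."""
    c_start, c_end = cluster
    size = c_end - c_start + 1
    ratios = []
    for runs in runs_by_sample:
        covered = sum(
            max(0, min(end, c_end) - max(start, c_start) + 1)
            for start, end in runs
        )
        ratios.append(covered / size)
    return ratios
```

With three samples whose runs are `[(0, 4)]`, `[(2, 6)]` and `[(3, 9)]` over 10 SNPs and `s_min=2`, the sweep finds the single cluster `(2, 6)`, and the per-sample ratios for that cluster are 0.6, 1.0 and 0.8.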