A Glossary of Terms Used in Family-Based Analysis
The following is a short glossary of terms used either as a part of genetic analysis in general or as a part of family-based analysis in particular.
Allele:
One of two or more alternative nucleotide sequences at a single gene locus on a chromosome.
Allele frequency:
Allele frequency is a term in population genetics that is used in characterizing the genetic diversity of a species population or
equivalently the richness of its gene pool. Allele frequency is defined as follows:
Given:
- a particular chromosome locus,
- a gene occupying that locus,
- a population of individuals carrying n loci in each of their somatic cells (e.g. two loci in the cells of diploid species, which contain two sets of chromosomes), and
- a variant or allele of the gene,
the allele frequency of that allele is the fraction or percentage of loci that the allele occupies within the population. For instance, if the frequency of an allele is 20% in a given population, then among population members, one in five chromosomes carries that allele. The other four out of five are occupied by other variants of the gene, of which there may be one or many.
Example: Suppose a population has ten individuals and, at a given locus, two possible alleles, A and a. If the genotypes of the individuals are AA, Aa, AA, aa, Aa, AA, AA, Aa, Aa and AA, then the 20 allele copies consist of 14 copies of A and 6 copies of a, so the allele frequencies of allele A and allele a are:
frequency(A) = 14/20 = 0.70
frequency(a) = 6/20 = 0.30
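The counting in this example can be checked with a short Python sketch. The function name allele_frequencies and the encoding of each genotype as a two-character string are assumptions made for illustration; they are not part of PBAT or HelixTree.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency}; each genotype is a string of allele symbols."""
    # Count every allele copy carried by every individual.
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}

# The ten genotypes from the example above (20 allele copies in total).
genotypes = ["AA", "Aa", "AA", "aa", "Aa", "AA", "AA", "Aa", "Aa", "AA"]
freqs = allele_frequencies(genotypes)
print(freqs)  # allele A occupies 14 of 20 copies, allele a the remaining 6
```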
Association studies:
The primary means of establishing the correlation between a given gene and the risk of having a particular
disease.
Attributable fraction:
The proportion of disease occurrence that would potentially be eliminated if exposure were prevented.
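One common way to quantify this is the population attributable fraction, which compares the disease risk in the whole population with the risk among unexposed individuals. The sketch below assumes that formulation; the function name and the example risks are hypothetical.

```python
def population_attributable_fraction(risk_overall, risk_unexposed):
    """Fraction of disease occurrence that would vanish if everyone were unexposed.

    Assumes the formulation (risk_overall - risk_unexposed) / risk_overall.
    """
    if risk_overall <= 0:
        raise ValueError("risk_overall must be positive")
    return (risk_overall - risk_unexposed) / risk_overall

# Hypothetical risks: 10% overall, 6% among the unexposed.
paf = population_attributable_fraction(0.10, 0.06)
print(f"attributable fraction = {paf:.2f}")
```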
Binary trait:
A binary trait has only two possible values, e.g. presence of the trait vs. absence of the trait.
Continuous trait:
A continuous trait is a trait whose variation is measured on a scale over a range of values, rather than classified
into categories. Examples: height, body mass index (BMI), blood pressure, etc.
Covariate:
In statistics, a covariate is a variable that is possibly predictive of the outcome under study. A covariate may be of direct
interest or be a confounding variable or effect modifier, affecting the relationship between the dependent variable and
independent variables of primary interest.
Disease gene:
A gene that carries or is responsible for a disorder, defect or a disease.
Genetic model:
The overall specification of how the disease allele(s) act to influence the disease. For parametric (model-dependent)
linkage analysis, the genetic model must be specified for analysis. Components of the genetic model include the information
on whether the disorder is autosomal or X-linked, dominant or recessive, the frequency and penetrance of the
disease allele, the frequency of phenocopies and the mutation rate. A genetic model consists of three main
components:
- A model for disease susceptibility, connecting disease phenotypes to genotypes at disease susceptibility (DS) loci for the sibs;
- A population genetics model, describing the population joint distribution of genotypes at the DS loci of the parents; and
- A segregation model, describing the segregation of alleles at the DS loci during meiosis.
Genotype:
May mean the genetic composition (alleles) of an individual in total, but in PBAT and HelixTree, refers to the particular pair
of alleles that an individual possesses at a single gene locus on a chromosome.
Haplotype:
Set of closely linked genetic markers present on one chromosome which tend to be inherited together (not easily separable by
recombination).
Hardy Weinberg equilibrium:
A state attained by a population which displays constant allele and genotype frequencies from generation to generation. In the
case of a locus with two alleles, A and B, occurring at frequencies p and q, respectively, the frequency of genotype AA is p²,
the frequency of AB is 2pq and the frequency of BB is q². A population in HW equilibrium normally has to be large and
random-mating with no selection, mutation, or migration.
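As a minimal sketch of these proportions, the following Python function returns the expected genotype frequencies for a two-allele locus given the frequency p of allele A. The function name is an assumption made for illustration.

```python
def hardy_weinberg_frequencies(p):
    """Expected genotype frequencies at a two-allele locus under HW equilibrium."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p  # frequency of the second allele, B
    return {"AA": p * p, "AB": 2 * p * q, "BB": q * q}

# With p = 0.7 and q = 0.3 the three frequencies sum to 1.
expected = hardy_weinberg_frequencies(0.7)
for genotype, freq in expected.items():
    print(genotype, round(freq, 4))
```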
Heritability:
A measure of the degree to which the variance in the distribution of a phenotype is due to genetic causes. Specifically,
heritability is defined as the proportion of phenotypic variance explained by the analyzed marker. In PBAT, a negative sign for
a heritability is used to denote a negative correlation between the phenotype and the number of transmitted target/disease
alleles. Thus, a positive heritability in PBAT denotes that the allele is over-transmitted, and a negative heritability in PBAT
denotes that the allele is under-transmitted.
Linkage:
Two genes or markers that are so close together on a chromosome that they are rarely separated by recombination are said to
be linked. (Linkage analysis is a statistical method for detecting linkage between a disease and markers of known location by
following their inheritance in families.)
Linkage disequilibrium:
Linkage disequilibrium (LD) is the condition in which the haplotype frequencies in a population deviate from the values they
would have if the genes at each locus were combined at random. LD between two loci often indicates that they are physically
close to each other on a DNA strand.
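One standard way to express this deviation for two loci with two alleles each is the coefficient D = p_AB - p_A * p_B, where p_AB is the observed frequency of the haplotype carrying alleles A and B. The glossary itself does not name this measure, so the sketch below, including its function name and example frequencies, is an illustrative assumption.

```python
def ld_coefficient(hap_freqs):
    """Return D = p_AB - p_A * p_B for two biallelic loci.

    hap_freqs maps the four haplotypes 'AB', 'Ab', 'aB', 'ab' to
    frequencies that sum to 1.
    """
    p_a = hap_freqs["AB"] + hap_freqs["Ab"]  # frequency of allele A at locus 1
    p_b = hap_freqs["AB"] + hap_freqs["aB"]  # frequency of allele B at locus 2
    return hap_freqs["AB"] - p_a * p_b

# Hypothetical haplotype frequencies: alleles combined at random vs. associated.
d_random = ld_coefficient({"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25})
d_linked = ld_coefficient({"AB": 0.40, "Ab": 0.10, "aB": 0.10, "ab": 0.40})
print(d_random, round(d_linked, 4))
```

A value of D near zero is consistent with random combination of alleles; a value far from zero indicates disequilibrium.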
Marker gene:
A detectable genetic trait or segment of DNA that can be identified and tracked. A marker gene can serve as a flag for another
gene, sometimes called the target gene. A marker gene must be on the same chromosome as the target gene and near enough
to it so that the two genes (the marker gene and the target gene) are genetically linked and are usually inherited
together.
Monte-Carlo simulation:
An analytical technique in which a large number of simulations are run using random quantities for uncertain variables and in
which the distribution of results is used to infer which values are most likely. The name comes from the city of Monte Carlo,
which is known for its casinos.
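A small Python sketch of the idea: it estimates the probability that at least two of four children of two Aa carrier parents have genotype aa, assuming each parent transmits either allele with equal probability. The scenario, function name and parameters are hypothetical and chosen only for illustration. The exact binomial answer is 67/256, about 0.262, so the simulated estimate should land close to that value.

```python
import random

def simulate_affected_sibships(n_children=4, min_affected=2,
                               n_trials=50_000, seed=0):
    """Estimate P(at least min_affected children are 'aa') by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(n_trials):
        affected = 0
        for _ in range(n_children):
            # Each Aa parent transmits 'A' or 'a' with probability 1/2.
            from_father = rng.choice("Aa")
            from_mother = rng.choice("Aa")
            if from_father == "a" and from_mother == "a":
                affected += 1
        if affected >= min_affected:
            hits += 1
    return hits / n_trials

estimate = simulate_affected_sibships()
print(f"simulated: {estimate:.3f}, exact: {67 / 256:.3f}")
```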
Multivariate:
Statistical, mathematical or graphical technique which considers multiple variables simultaneously.
Null-Hypothesis:
This is usually a statement of "no effect", that is to say that the independent variable will not have any effect on the dependent
variable and that any differences between the experimental and control groups are attributable to chance. The null hypothesis
is usually represented by the symbol H0, and is stated in order that it can be rejected as an explanation for the results of the
experiment. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on
average, than the placebo. We would write H0 : there is no difference between the new drug and the placebo on
average.
Odds Ratio:
The odds ratio is a way of comparing whether the probability of a certain event is the same for two groups. An odds ratio of 1
implies that the event is equally likely in both groups. An odds ratio greater than 1 implies that the event is more likely in the
first group.
For instance, the odds ratio may describe the odds of an experimental patient suffering an adverse event relative to a control patient. Or, it may describe the ratio of the odds of having the target disorder in the experimental group relative to the odds in favor of having the target disorder in the control group. Or, it may describe the odds in favor of being exposed in subjects with the target disorder divided by the odds in favor of being exposed in control subjects (without the target disorder).
(Odds: the ratio of the number of people incurring an event to the number of people not incurring it.)
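The calculation can be sketched from a 2x2 table of counts. The function name and the counts below are hypothetical; the sketch does not handle zero cells beyond raising an error.

```python
def odds_ratio(events_1, non_events_1, events_2, non_events_2):
    """Odds of the event in group 1 divided by the odds in group 2."""
    if non_events_1 == 0 or events_2 == 0:
        raise ValueError("odds ratio is undefined for these counts")
    # (events_1 / non_events_1) / (events_2 / non_events_2), rearranged.
    return (events_1 * non_events_2) / (non_events_1 * events_2)

# Hypothetical counts: 20 of 100 experimental patients and 10 of 100
# control patients experience the event.
ratio = odds_ratio(20, 80, 10, 90)
print(ratio)  # greater than 1: the event is more likely in the first group
```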
P-Value:
The P-value is a measure of how much evidence we have against the null hypothesis: it is the probability, computed assuming
H0 is true, of obtaining a result at least as extreme as the one observed. The smaller the P-value, the more evidence we have
against H0; a large P-value means little or no evidence against it. Traditionally, researchers reject the null hypothesis if
the P-value is less than 0.05.
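As a hedged illustration, the sketch below computes an exact one-sided binomial P-value. The scenario is hypothetical: under H0 an allele is transmitted with probability 0.5, and we ask how surprising it would be to see at least 8 transmissions out of 10. The function name is an assumption, and this is not the test that PBAT itself performs.

```python
from math import comb

def binomial_p_value(successes, trials, p_null=0.5):
    """One-sided P(X >= successes) when X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

p_8_of_10 = binomial_p_value(8, 10)
p_9_of_10 = binomial_p_value(9, 10)
print(f"8 of 10: P = {p_8_of_10:.4f}")
print(f"9 of 10: P = {p_9_of_10:.4f}")
```

With the traditional 0.05 threshold, 8 of 10 transmissions (56/1024, about 0.055) would not lead to rejection of H0, while 9 of 10 (11/1024, about 0.011) would.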
Pedigree files:
Pedigree files contain information about family relationships, gender and genetic data.
With minor variations, the pre-MAKEPED format for the LINKAGE program is the de facto standard for pedigree files. This
format contains fields for pedigree number, individual ID, father’s ID, mother’s ID, sex, disease status, and the first and second
alleles of each of the markers.
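The field layout described above can be sketched as a small parser. It assumes whitespace-separated fields in the order listed and keeps every value as a raw string, since coding conventions vary between files. The class name, function name and sample line are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PedigreeRecord:
    pedigree_id: str
    individual_id: str
    father_id: str
    mother_id: str
    sex: str
    disease_status: str
    genotypes: list  # one (first allele, second allele) tuple per marker

def parse_pedigree_line(line):
    """Parse one line: six fixed fields followed by two alleles per marker."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("expected 6 fields plus an even number of alleles")
    alleles = fields[6:]
    genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
    return PedigreeRecord(*fields[:6], genotypes)

# Hypothetical line: pedigree 1, individual 3, father 1, mother 2, two markers.
record = parse_pedigree_line("1 3 1 2 1 2 1 2 2 2")
print(record)
```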
Phenotype files:
Phenotype files contain information about the individual phenotype values such as height, weight, body mass index (BMI),
whether the individual has the disease being studied, etc.
There are many different formats for phenotype files. However, they typically identify the pedigree ID and individual ID so
that phenotype and pedigree information may be matched.
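Matching on the two IDs can be sketched as a dictionary lookup keyed by (pedigree ID, individual ID). The dictionary keys and sample rows below are assumptions made for illustration; real files differ in their column names and layout.

```python
def merge_phenotypes(pedigree_rows, phenotype_rows):
    """Attach phenotype values to pedigree rows that share both IDs."""
    by_key = {
        (row["pedigree_id"], row["individual_id"]): row["values"]
        for row in phenotype_rows
    }
    merged = []
    for row in pedigree_rows:
        key = (row["pedigree_id"], row["individual_id"])
        # Individuals without a phenotype record get None.
        merged.append({**row, "phenotypes": by_key.get(key)})
    return merged

# Hypothetical rows: individual 3 has phenotypes, individual 4 does not.
pedigree_rows = [
    {"pedigree_id": "1", "individual_id": "3"},
    {"pedigree_id": "1", "individual_id": "4"},
]
phenotype_rows = [
    {"pedigree_id": "1", "individual_id": "3", "values": {"height": 172.0, "bmi": 23.1}},
]
merged = merge_phenotypes(pedigree_rows, phenotype_rows)
print(merged)
```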
Power (Statistical):
Statistical power is the probability that a study will detect a meaningful difference, or effect, if one really exists. Ideally,
studies should have power levels of 0.80 or higher, that is, an 80% chance or greater of finding an effect if one is really
there.
Another definition is: a gauge of the sensitivity of a statistical test, that is, its ability to detect relationships; specifically, the probability of rejecting a null hypothesis when it is false and therefore should be rejected. In general, statistical power increases with sample size. Also called the "power" of a test.
Another definition: The power of a statistical test is the probability that the test will reject a false null hypothesis, or in other words that it will not make a Type II error. The higher the power, the greater the chance of obtaining a statistically significant result when the null hypothesis is false.
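To make the dependence on sample size concrete, the sketch below computes the power of a one-sided, one-sample z-test with known variance. That particular test is an assumption chosen for simplicity and is not the test used by PBAT; effect_size is the true mean difference divided by the standard deviation, and the function name is hypothetical.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test for a standardized effect."""
    standard_normal = NormalDist()
    # Rejection threshold for the test statistic under the null hypothesis.
    critical = standard_normal.inv_cdf(1 - alpha)
    # Under the alternative, the statistic is shifted by effect_size * sqrt(n).
    return 1 - standard_normal.cdf(critical - effect_size * n ** 0.5)

for n in (10, 25, 50):
    print(n, round(z_test_power(0.5, n), 3))
```

With no true effect (effect_size of 0) the function returns alpha, the significance level; as n grows, the power for a fixed nonzero effect approaches 1.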
Predictor variables:
Variables or factors that are assumed to have an effect or influence on the selected phenotypes. E.g. height, weight, sex, age.
However, they are not necessarily the variables of primary interest. (See Covariate.)
Prevalence:
Prevalence is the total number of cases of a disease in a given population at a specific time, or the percentage of the population
estimated to have that particular disease. “Population”, when used as the denominator, is generally the projected population
calculated from the given model.

Proband:
The family member through whom a family’s medical history comes to light. For example, a proband might be a baby
with Down syndrome. The proband may also be called the index case, propositus (if male), or proposita (if
female).
Significance level:
The significance level of a test is the probability that the test will reject the null hypothesis when the null hypothesis is
true (a Type I error).
Simulation:
The use of a mathematical model to recreate a situation, often repeatedly, so that the likelihood of various outcomes can be
more accurately estimated.
SNP analysis:
Single nucleotide polymorphisms (SNPs) are DNA sequence variations that occur when a single nucleotide (A, T, C, or
G) in the genome sequence is changed. This occurs approximately once every 100 to 300 bases. There are
many techniques for SNP detection and genotyping, such as restriction fragment length polymorphism PCR
(RFLP-PCR), single-strand conformation polymorphism (SSCP) analysis, allele-specific hybridization, primer
extension, allele-specific oligonucleotide ligation, and sequencing.
Test statistic:
A test statistic is a quantity calculated from a sample of data. Its value is used to decide whether or not the null hypothesis
should be rejected in a hypothesis test. The choice of a test statistic will depend on the assumed probability model and the
hypothesis under question.