Missing Values

18.4.1 Using Missing Values for Genotypes

Your data may have missing values for some of the genotypes. The default for association testing and stratification correction is to drop these missing values.

However, it is sometimes desirable to test wholly or partly for “predictive missingness”, that is, whether and how the response depends on which genotype values are missing. To include missing values as predictors, check Use Missing Values As Predictors.

Note that only the following statistical tests can use missing values as predictors:

  • Chi-Squared Test (case/control dependent variable) (18.3.4)
  • Fisher’s Exact Test (case/control dependent variable) (18.3.5)
  • Analysis of Deviance (case/control dependent variable) (18.3.7)
  • F-Test (quantitative dependent variable) (18.3.8)

These tests do not impose anything like an “order” on the predictor’s values, so a missing value can be treated as one more unordered category alongside the observed genotypes.
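To illustrate why an unordered test can accommodate missing genotypes, the following is a minimal Python sketch, not HelixTree's implementation. It builds a genotype-by-status contingency table and computes the Pearson chi-squared statistic, either dropping missing genotypes (the default behavior described above) or counting them as an extra category. The `MISSING` label, the function names, and the sample data are hypothetical and chosen only for illustration.

```python
from collections import Counter

MISSING = "?"  # hypothetical label for a missing genotype call


def contingency_table(genotypes, statuses, use_missing=False):
    """Count (genotype, status) pairs.

    Missing genotypes (None) are dropped unless use_missing is True,
    in which case they are counted under the MISSING category.
    """
    counts = Counter()
    for genotype, status in zip(genotypes, statuses):
        if genotype is None:
            if not use_missing:
                continue
            genotype = MISSING
        counts[(genotype, status)] += 1
    rows = sorted({g for g, _ in counts})
    cols = sorted({s for _, s in counts})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


def chi_squared(table):
    """Return the Pearson chi-squared statistic and degrees of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Illustrative data only: None marks a missing genotype; 0 = control, 1 = case.
genotypes = ["A_A", "A_A", "A_B", "A_B", "B_B", "B_B", None, None, None, "A_A"]
statuses = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]

rows_drop, _, table_drop = contingency_table(genotypes, statuses)
rows_keep, _, table_keep = contingency_table(genotypes, statuses, use_missing=True)
stat_drop, dof_drop = chi_squared(table_drop)
stat_keep, dof_keep = chi_squared(table_keep)
```

With missing values included, the table gains one row and the test gains one degree of freedom. Nothing in the calculation depends on where the missing category falls relative to the genotypes, which is why only unordered tests are offered for this option.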

NOTE: No stratification correction is available when including missing data as predictors.

18.4.2 Missing Values in a Case/Control Variable

If your case/control data contain missing values, HelixTree will import the column as type “integer”. Normally, an “integer”-type dependent variable is subject to different tests than a “binary”-type dependent variable (with the exception of the correlation/trend test).

However, the HelixTree Association Test window (and the separate General Marker Statistics window, 18.10) will recognize such a column as “binary” if you have made it the dependent column, and will let you test the data as case/control data.

NOTE: When you make a column dependent, the HelixTree spreadsheet will deactivate all rows in which that response column has a missing value, before any tests are run.
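The behavior described in this section can be pictured with a short Python sketch. It is a conceptual illustration under stated assumptions, not HelixTree's actual logic: it assumes that missing responses are represented as `None` and that cases and controls are coded 1 and 0. The function name `prepare_dependent` is hypothetical.

```python
def prepare_dependent(column):
    """Return the indices of rows that stay active and whether the
    remaining values form a case/control (binary) response.

    Assumptions for this sketch: None marks a missing response,
    and case/control status is coded as 1/0.
    """
    # Rows with a missing response are deactivated before any test runs.
    active_rows = [i for i, value in enumerate(column) if value is not None]
    observed = {column[i] for i in active_rows}
    # An integer column whose non-missing values are exactly {0, 1}
    # can be tested as case/control data.
    is_binary = observed == {0, 1}
    return active_rows, is_binary


# Illustrative response column with two missing values.
status = [1, 0, None, 1, None, 0]
active_rows, is_binary = prepare_dependent(status)
```

Here the two rows with missing responses are excluded, and the four remaining values contain only 0 and 1, so the column can be treated as binary even though it was imported as an integer column.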