Release Notes

HelixTree is a software system for disentangling the complex relationships between multiple interacting genetic and environmental factors, on the one hand, and disease and clinical outcomes, on the other.

HelixTree employs novel recursive partitioning algorithms to understand relationships among variables using statistical hypothesis testing.

1.2.1 Improvements in version 6.4

  • The Import LogR DSF functionality is now a separate CNAM menu item.
  • Saving LogR DSF files as CNT files is now built directly into HelixTree, both as a menu item and as a Python command (“convertDsfToCntFiles()”).
  • HelixTree can optionally import all “genetic” data (data containing underscore delimiters) as purely categorical data rather than as genetic data.
  • Spreadsheet logs have been clarified and improved for both genotypic association testing and CNV association testing.
  • A separate spreadsheet with markers in DD/Dd/dd format (where the major and minor alleles are clearly identified as d and D, respectively) is now available.
  • Standard errors for the coefficients and the intercept have been added to the logistic regression output.
  • Imputed case frequencies and imputed control frequencies have been added to the output of a full-model-only logistic haplotype trend regression.
  • A “Save As” option has been added to the Python file chooser dialog.
  • The new Python command “getManualSplit()” finds the manual split that would be derived from the root node of a tree directly formed from the current spreadsheet.
  • A new PBAT option is available to speed up the analysis of extended pedigrees.
  • An optional “signed p-value” PBAT output is available. This output combines the magnitude of the p-value with the directionality of the effect.
  • Up to eight simultaneous PBAT jobs may now be run on your local machine.
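To illustrate the DD/Dd/dd convention described above, the following Python sketch recodes underscore-delimited genotypes such as "A_G" by treating the most frequent allele in the data as the major allele (d) and the other allele as the minor allele (D). The function name to_dd_format, the "?" placeholder for missing calls, and the alphabetical tie-break are illustrative assumptions; this is not HelixTree's own implementation or Python API.

```python
from collections import Counter

def to_dd_format(genotypes, missing="?"):
    """Recode underscore-delimited genotypes such as "A_G" as dd, Dd, or DD.

    The most frequent allele in the supplied calls is taken as the major
    allele (d); any other allele counts as the minor allele (D). Assumes a
    biallelic marker; ties are broken alphabetically. Calls that are empty
    or not of the form "x_y" are returned as `missing` (an assumed placeholder).
    """
    def parse(call):
        alleles = call.split("_") if call else []
        return alleles if len(alleles) == 2 else None

    parsed = [parse(g) for g in genotypes]
    counts = Counter(a for call in parsed if call for a in call)
    if not counts:
        return [missing] * len(genotypes)
    # Highest count first, then alphabetical, so the choice is deterministic.
    major = min(counts, key=lambda a: (-counts[a], a))
    labels = {0: "dd", 1: "Dd", 2: "DD"}
    # Label each call by how many copies of the minor allele it carries.
    return [
        labels[sum(a != major for a in call)] if call else missing
        for call in parsed
    ]
```

For example, in ["A_A", "A_G", "G_G"] the allele A is more frequent, so the calls become dd, Dd, and DD.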

1.2.2 Improvements in version 6.3

  • PBAT has been updated to support copy number variation (CNV) family-based association tests. (See Section 23.5.)
  • Join Spreadsheets With Uneven Row Numbers may now be used to append spreadsheets. (See Section 6.3.8.) In addition, using it on two pedigree spreadsheets produces a pedigree spreadsheet, and using it on two phenotype spreadsheets produces a phenotype spreadsheet.

1.2.3 Improvements in version 6.2

  • The Copy Number Analysis Module (CNAM) can now perform association tests and PCA correction directly on LogR copy number data (see Section 25.7).
  • HelixTree can now import and export the text PED/MAP format, and can import the binary PED (BED) format (see Section 4.3.7).
  • The feature for downloading Affymetrix marker maps now supports full marker maps that include Copy Number (CN) probes.
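The text PED/MAP format pairs a MAP file (one line per marker: chromosome, marker ID, genetic distance, base-pair position) with a PED file (six leading columns: family ID, individual ID, paternal ID, maternal ID, sex, phenotype, followed by two allele columns per marker). The Python sketch below parses both and joins each allele pair with an underscore, the delimiter HelixTree uses to recognize genetic data. Treating "0" as a missing allele follows the usual PED convention; the function name read_ped_map, the "?" missing placeholder, and the record layout are hypothetical, and the binary BED format is not handled. This is not HelixTree's importer.

```python
def read_ped_map(ped_text, map_text, missing="?"):
    """Parse text PED/MAP content into per-individual records.

    Each genotype is returned as "allele1_allele2", the underscore-delimited
    form that HelixTree treats as genetic data. Assumes whitespace-separated
    columns with one column per allele (not a compound "AG" layout).
    """
    # MAP columns: chromosome, marker ID, genetic distance, base-pair position.
    markers = [line.split()[1] for line in map_text.splitlines() if line.strip()]
    records = []
    for line in ped_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The first six PED columns describe the individual and the pedigree.
        family, individual, father, mother, sex, phenotype = fields[:6]
        alleles = fields[6:]
        if len(alleles) != 2 * len(markers):
            raise ValueError(
                f"{individual}: expected {2 * len(markers)} allele columns, "
                f"found {len(alleles)}"
            )
        genotypes = {}
        for i, marker in enumerate(markers):
            a1, a2 = alleles[2 * i], alleles[2 * i + 1]
            # "0" is the conventional PED code for a missing allele.
            genotypes[marker] = missing if "0" in (a1, a2) else f"{a1}_{a2}"
        records.append({
            "family": family, "individual": individual,
            "father": father, "mother": mother,
            "sex": sex, "phenotype": phenotype,
            "genotypes": genotypes,
        })
    return markers, records
```

Keeping the genotypes in a separate dictionary avoids collisions between marker names and the six pedigree column names.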

1.2.4 Improvements in version 6.1

  • The genetic association test module (see Chapter 18) has the following enhancements:
    • Tests using continuous dependent variables (quantitative traits) are now available.
    • Stratification correction through principal components analysis (PCA) (the “EIGENSTRAT” method) is now available.
    • Stratification correction through genomic control is now also available.
  • Principal components computed by PCA can also be output to a separate spreadsheet from a separate window, without running an association test.
  • Overall marker statistics are also available from a separate window.
  • The Copy Number Analysis Module (CNAM) (see Chapter 25) can now import CNCHP files and intensity CEL files as log2 ratios, for use in the optimal segmenting algorithm in CNAM and in association analysis in HelixTree.
  • For convenience, when columns of type Real or Int are sorted, missing values are placed at the end regardless of whether the sort order is ascending or descending.
  • The CNAM Convert CNT(s) to DSF tool has been optimized to handle very large CNT files.
  • Under Linux, the full set of binary modules from the Python standard library is now bundled.
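Genomic control, listed above as a stratification correction, is commonly computed by estimating an inflation factor λ as the median of the observed 1-df association χ² statistics divided by the median of the χ² distribution with one degree of freedom (about 0.4549), then dividing every statistic by λ. The Python sketch below shows that standard calculation. The function name genomic_control is hypothetical, and flooring λ at 1 is a common convention assumed here; HelixTree's own implementation may differ in its details.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549):
# if Z is standard normal, median(Z^2) = (75th percentile of Z)^2.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2

def genomic_control(chi2_stats, floor_at_one=True):
    """Return (lambda, corrected statistics, corrected p-values) for 1-df tests."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    if floor_at_one:
        # Assumed convention: never inflate statistics when lambda < 1.
        lam = max(lam, 1.0)
    corrected = [x / lam for x in chi2_stats]
    # Upper-tail p-value of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_values = [math.erfc(math.sqrt(x / 2)) for x in corrected]
    return lam, corrected, p_values
```

Because λ is based on the median rather than the mean, a small number of truly associated markers has little influence on the correction applied to the rest.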

1.2.5 Bug Fixes

  • Copy number segmentation has been fixed to use much less memory.
  • P-values from PBAT analyses run with interaction variables differed depending on the screening method used.
  • Applying certain trees which used logistic regression would sometimes hang the application.
  • Logistic regression handles certain difficult-to-regress situations better.
  • Uncorrelated data in the correlation/trend test could crash the application.

1.2.6 Known Bugs

  • When the toolbar Feedback feature is used from behind a firewall, the email is sometimes blocked from being sent. The current workaround is to copy the TO and Subject fields generated by HelixTree into your normal email program, along with your feedback.
  • Multithreading is currently disabled in tree building because of problems related to the third-party library used to handle threading and strings.
  • For some data formats, one of the third-party libraries used to import data from different file formats does not let you specify a column-header row other than its default.
  • When importing ASCII delimited data using the Import Wizard and specifying comma as the allele delimiter, the format of the file must be tab delimited. Auto-detection of other field (column) delimiters will not work.
  • Because HelixTree is a 32-bit application, using more than 4 GB of memory will crash the program on all platforms. Depending on their configuration, Windows systems limit a process to 2 GB or 3 GB. See Appendix E for details on how to increase the maximum size of a Windows process from 2 GB to 3 GB.
  • To improve manual-splitting performance, a dialog box appears after the variables have been scanned if there are too many splits for the manual split window to open quickly. You may then choose to list only the most important splits. P-value plots will still show data for all potential splitters.
  • When importing Affymetrix marker maps using Affymetrix NetAffx, HelixTree will not be able to receive annotation information if your internet connection uses a proxy that requires authentication. Strict firewalls may also affect HelixTree’s ability to receive annotations.
  • Under some circumstances, the iteration procedure for the logistic regression will be unstable and the regression may fail, even when the matrix has sufficient rank and significant regressors are included. (See Section 26.18.9.) At this time, the best workaround is to filter out the data that causes such instabilities.
  • When PBAT analysis is run with an interaction variable and conditional-power screening, the results from HelixTree PBAT differ from those of the free version of PBAT. The differences appear specifically in the following (non-compact output) columns:
    • FBAT p-value (only slightly different)
    • FBAT-I p-value
    • FBAT power (only slightly different)
    • FBAT-I power
    • main effect
    • Std error (main effect)
    • p-value (main effect)
    • h-main
    • h-interaction

    It has not yet been determined which results are correct.

    NOTE: The results for non-parametric screening (Wald test) agree, as do all non-power results between HelixTree PBAT using non-parametric screening and HelixTree PBAT using conditional power screening.