Contents

I  Installing ChemTree and Acquiring Data
1 Installing and Initializing ChemTree
 1.1 Installation Overview
  1.1.1 Installation Under Windows
  1.1.2 Installation Under Apple
  1.1.3 Installation Under Linux
  1.1.4 Software Registration and Activation
   1.1.4.1 Manually Setting Proxy Settings
   1.1.4.2 Adding Personal Information
  1.1.5 Alternate Software Activation via Email
 1.2 Release Notes
  1.2.1 Improvements in Version 5.1
  1.2.2 Bug Fixes
  1.2.3 Known Bugs
2 Welcome to ChemTree
 2.1 Goals for this Chapter
 2.2 Recursive Partitioning Primer
 2.3 The ChemTree Basic Workflow
 2.4 Tutorial 1: Performing the Basic Workflow in GUI Mode
  2.4.1 Create a New Project
  2.4.2 Import a Data Set
  2.4.3 View Imported Data in a Spreadsheet Viewer
  2.4.4 Identify the Data Columns to be Used in Analysis
  2.4.5 Perform the Analysis
  2.4.6 Checking Predictions of the Multitree Model
 2.5 Tutorial 2: Cherry Picking Compounds Using ChemTree
  2.5.1 Cherry Pick Options
  2.5.2 Cherry Pick Results
3 Navigating the Main Screen
 3.1 Main Screen Overview
  3.1.1 The Menu Bar
  3.1.2 New Project
 3.2 Project Viewer Window
  3.2.1 The Menu Bar
  3.2.2 The Tool Bar
  3.2.3 The Project Navigator Window
  3.2.4 The Node Change Log Window
  3.2.5 The Edit Annotations Window
 3.3 Navigator Nodes
  3.3.1 Viewer Windows
  3.3.2 Types of Navigator Nodes and Their Associated Viewers
   3.3.2.1 The Project Node
   3.3.2.2 The Dataset Node
   3.3.2.3 The Spreadsheet Node
   3.3.2.4 Tree Analysis Node
   3.3.2.5 Multitree Model Node
   3.3.2.6 Applied Tree Node
   3.3.2.7 Histogram Node
   3.3.2.8 Observation Distance Matrix Node
   3.3.2.9 Correlation Interaction Node
   3.3.2.10 P Value Node
   3.3.2.11 Text Node
   3.3.2.12 Compound Node
 3.4 The File Menu
  3.4.1 Open Project
  3.4.2 Save Project
  3.4.3 Close Project
  3.4.4 Import Data
  3.4.5 View SD File
  3.4.6 Recent Project Files
  3.4.7 Quit
 3.5 Tools Menu
  3.5.1 Options for Updates and New Projects
  3.5.2 Current Project’s Options
  3.5.3 Project Option Settings Available
   3.5.3.1 Tree Options
   3.5.3.2 Node View Options
   3.5.3.3 Appearance
   3.5.3.4 Other
  3.5.4 View Project Log - Sorted by Node
  3.5.5 View Project Log - Chronological
  3.5.6 Run Script
  3.5.7 Run Python Shell
  3.5.8 Update ChemTree
 3.6 The Help Menu
4 Importing Your Data Into ChemTree
 4.1 General Considerations
 4.2 Mathematical Considerations
  4.2.1 P-Values and Dependent Variables
  4.2.2 P-Value Algorithms
  4.2.3 Preparing Data Example
 4.3 Importing Data
  4.3.1 Importing Legacy GHD Files
  4.3.2 Importing HTS Files
   4.3.2.1 Atom Path Lengths
   4.3.2.2 Augmented Atoms
   4.3.2.3 User-Specified Descriptors
   4.3.2.4 User Descriptor/Potency File
   4.3.2.5 Import Dialog
   4.3.2.6 Input (MDL) SD File
   4.3.2.7 Augmented Atoms (Multivariate Only)
   4.3.2.8 Atom Path Lengths
   4.3.2.9 Descriptor/Potency File
   4.3.2.10 Additional Dialog(s)
5 Scripting and Other Integrated Statistical Tools
 5.1 Overview
 5.2 The Python Shell Window
  5.2.1 Using Shell Objects
  5.2.2 Using the Dir Command
  5.2.3 Using Python for Getting Help
 5.3 Running Scripts
  5.3.1 Command Line Invocation
  5.3.2 Python Shell Invocation
  5.3.3 Navigator Menu Invocation
  5.3.4 Script Server Invocation
 5.4 Selecting a Script Server
 5.5 Example Scripts
 5.6 Scripting Reference
  5.6.1 Project Related Commands
   5.6.1.1 Creating a New Project
   5.6.1.2 Creating a Temporary Project
   5.6.1.3 Open an Existing Project
   5.6.1.4 Saving a Project
   5.6.1.5 Closing a Project
  5.6.2 General GHI Commands
   5.6.2.1 Allowing Viewers to Display
   5.6.2.2 Allowing Log Messages to Be Created
   5.6.2.3 Display a GUI Message
   5.6.2.4 Display a GUI Error Message
   5.6.2.5 Getting a Specific Navigator Node
   5.6.2.6 Getting the Current Navigator Node
   5.6.2.7 Choosing a File
   5.6.2.8 Choosing a Directory
   5.6.2.9 Creating a Progress Bar
   5.6.2.10 Setting the Progress Bar’s Progress
   5.6.2.11 Checking Whether the Progress Bar Has Been Cancelled
   5.6.2.12 Disposing of a Progress Bar When Done
   5.6.2.13 Creating a Status Dialog
   5.6.2.14 Setting the Status Dialog’s Message
   5.6.2.15 Closing the Status Dialog When Done
  5.6.3 Commands Common to All Objects
   5.6.3.1 Change a Navigator Node Name
   5.6.3.2 Getting a Navigator Node Name
   5.6.3.3 Getting a Navigator Node Type
   5.6.3.4 Getting a Navigator Node ID
   5.6.3.5 Deleting a Navigator Node
   5.6.3.6 Closing a Navigator Viewer
   5.6.3.7 Showing a Navigator Viewer
   5.6.3.8 Finding a Node’s Parent
   5.6.3.9 Finding a Node’s Secondary Parent
   5.6.3.10 Getting a Node’s Annotations
   5.6.3.11 Appending to a Node’s Annotations
  5.6.4 Importing and Loading Data
   5.6.4.1 Importing GHD-Format DataSets
   5.6.4.2 Importing HTS Files
  5.6.5 Creating a New DataSet With Scripting
   5.6.5.1 Getting a Datasetbuilder Object
   5.6.5.2 Adding Row Labels
   5.6.5.3 Adding a Column of Boolean Values
   5.6.5.4 Adding a Column of Integer Values
   5.6.5.5 Adding a Column of Double Values
   5.6.5.6 Adding a Column of Nominal Values
   5.6.5.7 Creating the DataSet
  5.6.6 Spreadsheet Access and Manipulation
   5.6.6.1 Getting the Spreadsheet as a Dictionary
   5.6.6.2 Getting the Spreadsheet as a List of Lists
   5.6.6.3 Getting a Spreadsheet Cell
   5.6.6.4 Getting a Spreadsheet Column by Column Number
   5.6.6.5 Getting a Spreadsheet Column by Column Name
   5.6.6.6 Determining if a Spreadsheet is a Marker Map
   5.6.6.7 Get a Spreadsheet Column Type
   5.6.6.8 Get a Spreadsheet Column State
   5.6.6.9 Export a Spreadsheet to CSV File
   5.6.6.10 Finding a Column by Name
   5.6.6.11 Finding a Row by Name
   5.6.6.12 Invert Row States
   5.6.6.13 Getting the Number of Spreadsheet Columns
   5.6.6.14 Get the Number of Columns in a State
   5.6.6.15 Get the Number of Spreadsheet Rows
   5.6.6.16 Get the Number of Rows in a State
   5.6.6.17 Randomly Shuffle Rows
   5.6.6.18 Getting a Row of Data
   5.6.6.19 Change the State of a Single Column
   5.6.6.20 Change the State of a Range of Columns
   5.6.6.21 Setting the State of a Single Row
   5.6.6.22 Getting the State of a Single Row
   5.6.6.23 Setting the State of a Range of Rows
   5.6.6.24 Randomly Set a Number of Rows to a State
   5.6.6.25 Randomly Set a Percentage of Rows to a State
   5.6.6.26 Sort a Column in Ascending Order
   5.6.6.27 Sort a Column in Descending Order
   5.6.6.28 Remembering a Spreadsheet Page
  5.6.7 Using the P-Value Plot
   5.6.7.1 Getting P-Values
   5.6.7.2 Getting Simes P-Values
   5.6.7.3 Setting the Simes Window
   5.6.7.4 Getting FDR (aP)
   5.6.7.5 Getting All P-Values as a Spreadsheet
  5.6.8 Getting and Setting Tree Options
   5.6.8.1 Setting the Minimum Elements for Splitting
   5.6.8.2 Viewing the Minimum Elements Setting
   5.6.8.3 Setting the Number of Threads
   5.6.8.4 Viewing the Number of Threads Setting
   5.6.8.5 Setting the P Threshold
   5.6.8.6 Viewing the P Threshold Setting
   5.6.8.7 Setting the Pairwise Threshold
   5.6.8.8 Viewing the Pairwise Threshold Setting
   5.6.8.9 Setting the P Threshold Type
   5.6.8.10 Viewing the P Threshold Type Setting
   5.6.8.11 Setting the Segmenting Algorithm
   5.6.8.12 Viewing the Segmenting Algorithm
   5.6.8.13 Setting the Maximum Segments
   5.6.8.14 Viewing the Maximum Segments Setting
   5.6.8.15 Setting Linear Regression
   5.6.8.16 Viewing Linear Regression Setting
   5.6.8.17 Setting Use Missing Values Option
   5.6.8.18 Viewing Use Missing Values Option
   5.6.8.19 Setting Resample Iterations
   5.6.8.20 Viewing Resample Iterations Setting
  5.6.9 Creating a Tree Model
  5.6.10 Importing a Legacy Tree Model
  5.6.11 Tree Model Commands
   5.6.11.1 Get Variable Frequencies
   5.6.11.2 Get Tree Predictions
   5.6.11.3 Get Tree Variables
   5.6.11.4 Get Correlation Table
   5.6.11.5 Get Correlation Plot
   5.6.11.6 Cherry Picking Compounds
   5.6.11.7 Get Observation Distance Matrix Unsorted
   5.6.11.8 Get Observation Distance Matrix Sorted by First Principal Component
   5.6.11.9 Get Observation Distance Sorted by Similarity to One Observation
   5.6.11.10 Using the Distance Matrix Object
  5.6.12 Applying a Tree Model
  5.6.13 Performing Regression
  5.6.14 Output a C File
  5.6.15 Prompting the User for Input
  5.6.16 Text Viewer
   5.6.16.1 Getting the Text
   5.6.16.2 Saving Text to a File
  5.6.17 Regression Results
   5.6.17.1 Getting the Text
   5.6.17.2 Saving Text to a File
   5.6.17.3 Getting the Covariates
   5.6.17.4 Getting the Interactions
  5.6.18 Navigator Object Selection
   5.6.18.1 Selecting a Spreadsheet
   5.6.18.2 Selecting a Tree Model
 5.7 S-PLUS Integration
  5.7.1 S-PLUS Desktop Integration
  5.7.2 S-PLUS Client/Server Integration
 5.8 R Integration
6 Using the Spreadsheet Viewer
 6.1 Spreadsheet Overview
 6.2 Manipulating, Filtering and Preparing Data Using the Spreadsheet
  6.2.1 Dependent or Independent Variable?
  6.2.2 Selecting a Dependent
  6.2.3 Sorting Records
  6.2.4 Deactivating Unwanted Columns
  6.2.5 Activating/Deactivating Row Data
  6.2.6 Picking Random Record Sets
 6.3 Navigating the Spreadsheet Menus
  6.3.1 File Menu
   6.3.1.1 Save As - Exporting Data
   6.3.1.2 Save As Comma-Delimited Text File
   6.3.1.3 Import a Legacy Tree Model
   6.3.1.4 Closing the File
  6.3.2 Edit Menu
   6.3.2.1 Select Row Subset
   6.3.2.2 Activate All Rows
   6.3.2.3 Inverting the Records (Rows) Selected
   6.3.2.4 Inverting the Columns Selected
   6.3.2.5 Row Subset Spreadsheet
   6.3.2.6 Find Column Search Tool
   6.3.2.7 Inactivate All Columns/Activate All Columns
  6.3.3 Analysis Menu
   6.3.3.1 Interactive Tree Analysis
   6.3.3.2 Create a Multiple-Tree Model
   6.3.3.3 Apply a Tree Model
  6.3.4 Help Menu
II  Recursive Partitioning
7 Interactive Tree Analysis
 7.1 Tree Analysis Overview
 7.2 Setting Options for Tree Analysis
  7.2.1 The Tree Tab
   7.2.1.1 Minimum Elements per Child
   7.2.1.2 Segmenting Algorithm
   7.2.1.3 Max Segments
   7.2.1.4 Parallel Threads
   7.2.1.5 Resampling Iterations
   7.2.1.6 P Value Threshold
  7.2.2 The Node View Tab
   7.2.2.1 What the Node Values Mean
 7.3 Working with Nodes
  7.3.1 Node Pop-up Menu Selections
   7.3.1.1 Split Node
   7.3.1.2 Manual Split
   7.3.1.3 Collapse/Expand
   7.3.1.4 Recursive Split
   7.3.1.5 Spreadsheet
   7.3.1.6 Resample
  7.3.2 Visualize Split Data
  7.3.3 Visualize Split Data->Visualize Compounds
  7.3.4 Visualize Split Data->Multiple Tree Clustering
  7.3.5 Visualize Split Data->Multiple Tree Atom Highlighting
  7.3.6 Visualize Split Data->Histogram
  7.3.7 Showing Split Data
   7.3.7.1 Splits on Continuous Predictors
   7.3.7.2 Splits on Categorical Predictors
 7.4 Manually Splitting Nodes
  7.4.1 P Value Plots
  7.4.2 Define Split
  7.4.3 Don’t Split
  7.4.4 Using the Tree and Manual Split Window Together
 7.5 Defining Splits
  7.5.1 The Split Point
  7.5.2 The Split Point Controls the Node Information
  7.5.3 Smoothing the Data Points
  7.5.4 Refined or Coarse Data Points
  7.5.5 Zooming into a Specific Region
  7.5.6 Categorical Predictors
 7.6 The File Menu
  7.6.1 File->Print tree
  7.6.2 File->Save Tree Image
  7.6.3 File->View Predictions (In-Sample)
  7.6.4 File->Output C Code
  7.6.5 File->Close
 7.7 The Tree Menu
  7.7.1 Tree->Options
  7.7.2 Tree->Subset Spreadsheet
  7.7.3 Tree->Subset Tree
  7.7.4 Tree->Cherry Pick Compounds Using Current Tree
  7.7.5 Tree->Extend Current Tree Randomly
  7.7.6 Tree->Search Tree
   7.7.6.1 Tree->Search Tree->Find Observation
   7.7.6.2 Tree->Search Tree->Find Node
   7.7.6.3 Tree->Search Tree->Select Node by Threshold
   7.7.6.4 Tree->Search Tree->Highlight All Nodes
   7.7.6.5 Tree->Search Tree->Unhighlight All Nodes
 7.8 The Font Menu - Resizing and Formatting Tree View
  7.8.1 Font->Size
  7.8.2 Font->Family
8 Prediction Recipes
 8.1 Training and Validation Recipe
 8.2 Getting the Best Prediction Performance
9 Random Tree Generation
 9.1 Random Tree Overview
 9.2 Creating a Random Tree Model
 9.3 Multitree Model Browsing - Tree View
  9.3.1 Multitree Model – Tree List
  9.3.2 Multitree Model – Variable List
   9.3.2.1 Sorting
   9.3.2.2 Subset
   9.3.2.3 Variables->View Variable Usage
   9.3.2.4 Variables->View Variable Frequency
   9.3.2.5 Viewing Variable Correlations
  9.3.3 File->Analyze Current Tree Tools
  9.3.4 Predictions (In-Sample)->View Average Tree In-Sample Predictions
  9.3.5 Predictions (In-Sample)->Save All Tree In-Sample Predictions to CSV File
  9.3.6 Save “C” Prediction Program
  9.3.7 Close
  9.3.8 Help
10 Multivariate Tree Analysis
 10.1 Multivariate Analysis Overview
 10.2 Using More Than One Dependent Variable
  10.2.1 Continuous Multivariate Response
  10.2.2 Binary and/or Categorical Multivariate Response
  10.2.3 Multivariate Multiple Tree Clustering
  10.2.4 File->Output C Code
  10.2.5 Multivariate Compound View
  10.2.6 Visualize Split Data->Multiple Tree Atom Highlighting
  10.2.7 Multivariate Cherry Picking (Included in Cherry Picking Module)
11 Histogram Node Analysis
 11.1 Histogram Overview
 11.2 Viewing Split Data Histograms
  11.2.1 Creating Histograms
  11.2.2 Visualizing Node Relationships
  11.2.3 Changing Bins
  11.2.4 Zooming or Rubber Banding Data
  11.2.5 Menus
  11.2.6 File->Print
12 Viewing the Observation Distance Matrix
 12.1 Observation Distance Matrix Overview
  12.1.1 Creating an Observation Distance Matrix
  12.1.2 The Observation Distance Matrix
  12.1.3 Set Axes
  12.1.4 Stop Calculation/Stop Refresh and Restore Calculation/Restore Refresh
  12.1.5 Copy to Clipboard
  12.1.6 Creating a Spreadsheet or Tree View from the Matrix Plot
  12.1.7 Zoom Mode
  12.1.8 Modify Color Scaling
  12.1.9 Effect of Clicking on the Plot
  12.1.10 Color Drop Down Menu
 12.2 Viewing Observation Distance Matrix
  12.2.1 Viewing Spreadsheets or Trees of Subsets
  12.2.2 Zooming-In on a Subset of the Distance Matrix Plot
  12.2.3 Narrowing the Distance Range
  12.2.4 Menus
 12.3 Printing and Saving the Observation Distance Matrix
  12.3.1 Save Image or Print
  12.3.2 The Menus
  12.3.3 File->Save Obs. Distance Matrix (Sorted)
13 The Correlation Interaction View
 13.1 Correlation Interaction Overview
  13.1.1 Pick Wanted Variables
 13.2 Viewing Correlation Interactions
  13.2.1 About Correlation Interaction
  13.2.2 Determining Higher Order Effects
  13.2.3 The Correlation Interaction View
  13.2.4 The Upper Triangle
  13.2.5 The Lower Triangle
14 P-Value Plot
 14.1 Plotting P-Values
 14.2 P-Value Plot Types
  14.2.1 P-Value Plots Sorted by Var #
  14.2.2 P-Value Plots Sorted by Adjusted P-Value
 14.3 The P-Value Plot
 14.4 Reset View
 14.5 Copy to Clipboard
 14.6 Axis Selector
 14.7 Zooming into the Graph
 14.8 File Menu
 14.9 Create Bitmap
 14.10 Print Image
 14.11 P-Value Spreadsheet
15 Text Viewer
 15.1 Text Viewer Overview
 15.2 Navigating the Text Viewer Menus
  15.2.1 File Menu
   15.2.1.1 Save As Text File
16 Regression Analysis (Optional Module)
 16.1 Overview
 16.2 Performing Analysis
  16.2.1 Covariates
  16.2.2 Type of Regression
  16.2.3 Permutation Tests
  16.2.4 Create Residual Spreadsheet With Covariates
  16.2.5 Output and Running the Regression
III  The Science Behind ChemTree
17 Formulas and Theories
 17.1 Split-Prediction Methodology
 17.2 Normally Distributed Response Binomial Predictor
  17.2.1 Univariate Case
  17.2.2 Multivariate Case
 17.3 Normally Distributed Response Continuous-Ordinal Predictor
 17.4 Normally Distributed Response Categorical Predictor
 17.5 Linear Regression with Continuous Response
  17.5.1 Methodology
  17.5.2 Stepwise Regression
 17.6 Permutation Test Methodology (Optional Module)
 17.7 Results from Linear Regression
  17.7.1 Residual Spreadsheet
  17.7.2 Linear Regression Statistical Output Viewer
  17.7.3 Overall Statistics
  17.7.4 Regressor Statistics
  17.7.5 Left-Out Regressors
  17.7.6 Parameters
 17.8 Binomially Distributed Response Binary Predictor
  17.8.1 Univariate Case
  17.8.2 Multivariate Case
 17.9 Binomially Distributed Response Continuous/Ordinal Predictor
 17.10 Binomially Distributed Response Categorical Predictor
 17.11 Categorical Response
 17.12 Logistic Regression with Binomial Response
  17.12.1 Methodology
  17.12.2 Stepwise Regression
 17.13 Results from Logistic Regression
  17.13.1 Residual Spreadsheet
  17.13.2 Logistic Regression Statistical Output Viewer
  17.13.3 Overall Statistics
  17.13.4 Regressor Statistics
  17.13.5 Left-Out Regressors
  17.13.6 Parameters
 17.14 Caveats
 17.15 The False Discovery Rate and the Simes Method
  17.15.1 The False Discovery Rate
  17.15.2 Simes’ Method
A EULA
B References
C Bug Fix History
 C.1 Bugs Fixed in Version 5.1.0 of ChemTree
 C.2 Bugs Fixed in Version 5.0.0 of ChemTree
 C.3 Bugs Fixed in Version 4.1.0 of ChemTree
 C.4 Bugs Fixed in Version 4.0.3 of ChemTree
 C.5 Bugs Fixed in Version 4.0.0 of ChemTree
 C.6 Bugs Fixed in Version 3.2.2 of ChemTree