Contents
I Installing ChemTree and Acquiring Data
1 Installing and Initializing ChemTree
1.1 Installation Overview
1.1.1 Installation Under Windows
1.1.2 Installation Under Apple
1.1.3 Installation Under Linux
1.1.4 Software Registration and Activation
1.1.4.1 Manually Setting Proxy Settings
1.1.4.2 Adding Personal Information
1.1.5 Alternate Software Activation via Email
1.2 Release Notes
1.2.1 Improvements in version 5.1
1.2.2 Bug Fixes
1.2.3 Known Bugs
2 Welcome to ChemTree
2.1 Goals for this Chapter
2.2 Recursive Partitioning Primer
2.3 The ChemTree Basic Workflow
2.4 Tutorial 1: Performing the Basic Workflow in GUI Mode
2.4.1 Create a New Project
2.4.2 Import a Data Set
2.4.3 View Imported Data in a Spreadsheet Viewer
2.4.4 Identify the Data Columns to be Used in Analysis
2.4.5 Perform the Analysis
2.4.6 Checking Predictions of Multitree Model
2.5 Tutorial 2: Cherry Picking Compounds Using ChemTree
2.5.1 Cherry Pick Options
2.5.2 Cherry Pick Results
3 Navigating the Main Screen
3.1 Main Screen Overview
3.1.1 The Menu Bar
3.1.2 New Project
3.2 Project Viewer Window
3.2.1 The Menu Bar
3.2.2 The Tool Bar
3.2.3 The Project Navigator Window
3.2.4 The Node Change Log Window
3.2.5 The Edit Annotations Window
3.3 Navigator Nodes
3.3.1 Viewer Windows
3.3.2 Types of Navigator Nodes and Their Associated Viewers
3.3.2.1 The Project Node
3.3.2.2 The Dataset Node
3.3.2.3 The Spreadsheet Node
3.3.2.4 Tree Analysis Node
3.3.2.5 Multitree Model Node
3.3.2.6 Applied Tree Node
3.3.2.7 Histogram Node
3.3.2.8 Observation Distance Matrix Node
3.3.2.9 Correlation Interaction Node
3.3.2.10 P Value Node
3.3.2.11 Text Node
3.3.2.12 Compound Node
3.4 The File Menu
3.4.1 Open Project
3.4.2 Save Project
3.4.3 Close Project
3.4.4 Import Data
3.4.5 View SD File
3.4.6 Recent Project Files
3.4.7 Quit
3.5 Tools Menu
3.5.1 Options for Updates and New Projects
3.5.2 Current Project’s Options
3.5.3 Project Option Settings Available
3.5.3.1 Tree Options
3.5.3.2 Node View Options
3.5.3.3 Appearance
3.5.3.4 Other
3.5.4 View Project Log - Sorted by Node
3.5.5 View Project Log - Chronological
3.5.6 Run Script
3.5.7 Run Python Shell
3.5.8 Update ChemTree
3.6 The Help Menu
4 Importing Your Data Into ChemTree
4.1 General Considerations
4.2 Mathematical Considerations
4.2.1 P-Values and Dependent Variables
4.2.2 P-Value Algorithms
4.2.3 Preparing Data Example
4.3 Importing Data
4.3.1 Importing Legacy GHD Files
4.3.2 Importing HTS Files
4.3.2.1 Atom Path Lengths
4.3.2.2 Augmented Atoms
4.3.2.3 User-specified descriptors
4.3.2.4 User Descriptor/Potency file
4.3.2.5 Import Dialog
4.3.2.6 Input (MDL) SD File
4.3.2.7 Augmented Atoms (Multivariate Only)
4.3.2.8 Atom Path Lengths
4.3.2.9 Descriptor/Potency File
4.3.2.10 Additional Dialog(s)
5 Scripting and Other Integrated Statistical Tools
5.1 Overview
5.2 The Python Shell Window
5.2.1 Using Shell Objects
5.2.2 Using the Dir Command
5.2.3 Using Python for Getting Help
5.3 Running Scripts
5.3.1 Command Line Invocation
5.3.2 Python Shell Invocation
5.3.3 Navigator Menu Invocation
5.3.4 Script Server Invocation
5.4 Selecting a Script Server
5.5 Example Scripts
5.6 Scripting Reference
5.6.1 Project Related Commands
5.6.1.1 Creating a New Project
5.6.1.2 Creating a Temporary Project
5.6.1.3 Open an Existing Project
5.6.1.4 Saving a Project
5.6.1.5 Closing a Project
5.6.2 General GHI Commands
5.6.2.1 Allowing Viewers to Display
5.6.2.2 Allowing Log Messages to Be Created
5.6.2.3 Display a GUI Message
5.6.2.4 Display a GUI Error Message
5.6.2.5 Getting a Specific Navigator Node
5.6.2.6 Getting the Current Navigator Node
5.6.2.7 Choosing a File
5.6.2.8 Choosing a directory
5.6.2.9 Creating a Progress Bar
5.6.2.10 Setting the Progress Bar’s Progress
5.6.2.11 Checking Whether the Progress Bar Has Been Cancelled
5.6.2.12 Disposing of a Progress Bar When Done
5.6.2.13 Creating a Status Dialog
5.6.2.14 Setting the Status Dialog’s Message
5.6.2.15 Closing the Status Dialog When Done
5.6.3 Commands Common to All Objects
5.6.3.1 Change a Navigator Node Name
5.6.3.2 Getting a Navigator Node Name
5.6.3.3 Getting a Navigator Node Type
5.6.3.4 Getting a Navigator Node ID
5.6.3.5 Deleting a Navigator Node
5.6.3.6 Closing a Navigator Viewer
5.6.3.7 Showing a Navigator Viewer
5.6.3.8 Finding a Node’s Parent
5.6.3.9 Finding a Node’s Secondary Parent
5.6.3.10 Getting a Node’s Annotations
5.6.3.11 Appending to a Node’s Annotations
5.6.4 Importing and Loading Data
5.6.4.1 Importing GHD-format DataSets
5.6.4.2 Importing HTS Files
5.6.5 Creating a New DataSet With Scripting
5.6.5.1 Getting a Datasetbuilder Object
5.6.5.2 Adding Row Labels
5.6.5.3 Adding a Column of Boolean Values
5.6.5.4 Adding a Column of Integer Values
5.6.5.5 Adding a Column of Double Values
5.6.5.6 Adding a Column of Nominal Values
5.6.5.7 Creating the DataSet
5.6.6 Spreadsheet Access and Manipulation
5.6.6.1 Getting the Spreadsheet as a Dictionary
5.6.6.2 Getting the Spreadsheet as a List of Lists
5.6.6.3 Getting a Spreadsheet Cell
5.6.6.4 Getting a Spreadsheet Column by Column Number
5.6.6.5 Getting a Spreadsheet Column by Column Name
5.6.6.6 Determining if a Spreadsheet is a Marker Map
5.6.6.7 Get a Spreadsheet Column Type
5.6.6.8 Get a Spreadsheet Column State
5.6.6.9 Export a Spreadsheet to CSV File
5.6.6.10 Finding a Column by Name
5.6.6.11 Finding a Row by Name
5.6.6.12 Invert Row States
5.6.6.13 Getting the Number of Spreadsheet Columns
5.6.6.14 Get the Number of Columns in a State
5.6.6.15 Get the Number of Spreadsheet Rows
5.6.6.16 Get the Number of Rows in a State
5.6.6.17 Randomly Shuffle Rows
5.6.6.18 Getting a Row of Data
5.6.6.19 Change the State of a Single Column
5.6.6.20 Change the State of a Range of Columns
5.6.6.21 Setting the State of a Single Row
5.6.6.22 Getting the State of a Single Row
5.6.6.23 Setting the State of a Range of Rows
5.6.6.24 Randomly set a Number of Rows to a State
5.6.6.25 Randomly Set a Percentage of Rows to a State
5.6.6.26 Sort a Column in Ascending Order
5.6.6.27 Sort a Column in Descending Order
5.6.6.28 Remembering a Spreadsheet Page
5.6.7 Using the P-Value plot
5.6.7.1 Getting P-Values
5.6.7.2 Getting Simes P-Values
5.6.7.3 Setting the Simes Window
5.6.7.4 Getting FDR (aP)
5.6.7.5 Getting all P-Values as a spreadsheet
5.6.8 Getting and Setting Tree Options
5.6.8.1 Setting the Minimum Elements for Splitting
5.6.8.2 Viewing the Minimum Elements Setting
5.6.8.3 Setting the Number of Threads
5.6.8.4 Viewing the Number of Threads Setting
5.6.8.5 Setting the P Threshold
5.6.8.6 Viewing the P Threshold Setting
5.6.8.7 Setting the Pairwise Threshold
5.6.8.8 Viewing the Pairwise Threshold Setting
5.6.8.9 Setting the P Threshold Type
5.6.8.10 Viewing the P Threshold Type Setting
5.6.8.11 Setting the Segmenting Algorithm
5.6.8.12 Viewing the Segmenting Algorithm
5.6.8.13 Setting the Maximum Segments
5.6.8.14 Viewing the Maximum Segments Setting
5.6.8.15 Setting Linear Regression
5.6.8.16 Viewing Linear Regression Setting
5.6.8.17 Setting Use Missing Values Option
5.6.8.18 Viewing Use Missing Values Option
5.6.8.19 Setting Resample Iterations
5.6.8.20 Viewing Resample Iterations Setting
5.6.9 Creating a Tree Model
5.6.10 Importing a Legacy Tree Model
5.6.11 Tree Model Commands
5.6.11.1 Get Variable Frequencies
5.6.11.2 Get Tree Predictions
5.6.11.3 Get Tree Variables
5.6.11.4 Get Correlation Table
5.6.11.5 Get Correlation Plot
5.6.11.6 Cherry Picking Compounds
5.6.11.7 Get Observation Distance Matrix Unsorted
5.6.11.8 Get Observation Distance Matrix Sorted by First Principal Component
5.6.11.9 Get Observation Distance Sorted by Similarity to One Observation
5.6.11.10 Using the Distance Matrix Object
5.6.12 Applying a Tree Model
5.6.13 Performing Regression
5.6.14 Output a C File
5.6.15 Prompting the User for Input
5.6.16 Text Viewer
5.6.16.1 Getting the text
5.6.16.2 Saving text to a file
5.6.17 Regression Results
5.6.17.1 Getting the text
5.6.17.2 Saving text to a file
5.6.17.3 Getting the covariates
5.6.17.4 Getting the interactions
5.6.18 Navigator Object Selection
5.6.18.1 Selecting a Spreadsheet
5.6.18.2 Selecting a Tree model
5.7 S-PLUS Integration
5.7.1 S-PLUS Desktop Integration
5.7.2 S-PLUS Client/Server Integration
5.8 R Integration
6 Using the Spreadsheet Viewer
6.1 Spreadsheet Overview
6.2 Manipulating, Filtering and Preparing Data Using the Spreadsheet
6.2.1 Dependent or Independent Variable?
6.2.2 Selecting a Dependent
6.2.3 Sorting Records
6.2.4 Deactivating Unwanted Columns
6.2.5 Activating - Deactivating Row Data
6.2.6 Picking Random Record Sets
6.3 Navigating the Spreadsheet Menus
6.3.1 File Menu
6.3.1.1 Save As - Exporting Data
6.3.1.2 Save As Comma-Delimited Text File
6.3.1.3 Import a Legacy Tree Model
6.3.1.4 Closing the File
6.3.2 Edit Menu
6.3.2.1 Select Row Subset
6.3.2.2 Activate All Rows
6.3.2.3 Inverting the Records (Rows) Selected
6.3.2.4 Inverting the Columns Selected
6.3.2.5 Row Subset Spreadsheet
6.3.2.6 Find Column Search Tool
6.3.2.7 Inactivate All Columns/Activate All Columns
6.3.3 Analysis Menu
6.3.3.1 Interactive Tree Analysis
6.3.3.2 Create a Multiple-Tree Model
6.3.3.3 Apply a Tree Model
6.3.4 Help Menu
II Recursive Partitioning
7 Interactive Tree Analysis
7.1 Tree Analysis Overview
7.2 Setting Options for Tree Analysis
7.2.1 The Tree Tab
7.2.1.1 Minimum Elements per Child
7.2.1.2 Segmenting Algorithm
7.2.1.3 Max Segments
7.2.1.4 Parallel Threads
7.2.1.5 Resampling Iterations
7.2.1.6 P Value Threshold
7.2.2 The Node View Tab
7.2.2.1 What the Node Values Mean
7.3 Working with Nodes
7.3.1 Node Pop-up Menu Selections
7.3.1.1 Split Node
7.3.1.2 Manual Split
7.3.1.3 Collapse/Expand
7.3.1.4 Recursive Split
7.3.1.5 Spreadsheet
7.3.1.6 Resample
7.3.2 Visualize Split Data
7.3.3 Visualize Split Data->Visualize Compounds
7.3.4 Visualize Split Data->Multiple Tree Clustering
7.3.5 Visualize Split Data->Multiple Tree Atom Highlighting
7.3.6 Visualize Split Data->Histogram
7.3.7 Showing Split Data
7.3.7.1 Splits on Continuous Predictors
7.3.7.2 Splits on Categorical Predictors
7.4 Manually Splitting Nodes
7.4.1 P Value Plots
7.4.2 Define Split
7.4.3 Don’t Split
7.4.4 Using the Tree and Manual Split Window Together
7.5 Defining Splits
7.5.1 The Split Point
7.5.2 The Split Point Controls the Node Information
7.5.3 Smoothing the Data Points
7.5.4 Refined or Coarse Data Points
7.5.5 Zooming into a Specific Region
7.5.6 Categorical Predictors
7.6 The File Menu
7.6.1 File->Print tree
7.6.2 File->Save Tree Image
7.6.3 File->View Predictions (In-Sample)
7.6.4 File->Output C Code
7.6.5 File->Close
7.7 The Tree Menu
7.7.1 Tree->Options
7.7.2 Tree->Subset Spreadsheet
7.7.3 Tree->Subset Tree
7.7.4 Tree->Cherry Pick Compounds Using Current Tree
7.7.5 Tree->Extend Current Tree Randomly
7.7.6 Tree->Search Tree
7.7.6.1 Tree->Search Tree->Find Observation
7.7.6.2 Tree->Search Tree->Find Node
7.7.6.3 Tree->Search Tree->Select Node by Threshold
7.7.6.4 Tree->Search Tree->Highlight All Nodes
7.7.6.5 Tree->Search Tree->UnHighlight All Nodes
7.8 The Font Menu - Resizing and Formatting Tree View
7.8.1 Font->Size
7.8.2 Font->Family
8 Prediction Recipes
8.1 Training and Validation Recipe
8.2 Getting the Best Prediction Performance
9 Random Tree Generation
9.1 Random Tree Overview
9.2 Creating a Random Tree Model
9.3 Multitree Model Browsing - Tree View
9.3.1 Multitree Model – Tree List
9.3.2 Multitree Model – Variable List
9.3.2.1 Sorting
9.3.2.2 Subset
9.3.2.3 Variables->View Variable Usage
9.3.2.4 Variables->View Variable Frequency
9.3.2.5 Viewing Variable Correlations
9.3.3 File->Analyze Current Tree Tools
9.3.4 Predictions (InSample)->View Average Tree In-Sample Predictions
9.3.5 Predictions (InSample)->Save All Tree In-Sample Predictions to CSV File
9.3.6 Save “C” Prediction Program
9.3.7 Close
9.3.8 Help
10 Multivariate Tree Analysis
10.1 Multivariate Analysis Overview
10.2 Using More Than One Dependent Variable
10.2.1 Continuous Multivariate Response
10.2.2 Binary and/or Categorical Multivariate Response
10.2.3 Multivariate Multiple Tree Clustering
10.2.4 File->Output C Code
10.2.5 Multivariate Compound View
10.2.6 Visualize Split Data->Multiple Tree Atom Highlighting
10.2.7 Multivariate Cherry Picking (Included in Cherry Picking Module)
11 Histogram Node Analysis
11.1 Histogram Overview
11.2 Viewing Split Data Histograms
11.2.1 Creating Histograms
11.2.2 Visualizing Node Relationships
11.2.3 Changing Bins
11.2.4 Zooming or Rubber Banding Data
11.2.5 Menus
11.2.6 File->Print
12 Viewing the Observation Distance Matrix
12.1 Observation Distance Matrix Overview
12.1.1 Creating an Observation Distance Matrix
12.1.2 The Observation Distance Matrix
12.1.3 Set Axes
12.1.4 Stop Calculation/Stop Refresh and Restore Calculation/Restore Refresh
12.1.5 Copy to Clipboard
12.1.6 Creating a Spreadsheet or Tree view from the Matrix Plot
12.1.7 Zoom Mode
12.1.8 Modify Color Scaling
12.1.9 Effect of Clicking on the Plot
12.1.10 Color Drop Down Menu
12.2 Viewing Observation Distance Matrix
12.2.1 Viewing Spreadsheets or Trees of Subsets
12.2.2 Zooming-In on a Subset of the Distance Matrix Plot
12.2.3 Narrowing the Distance Range
12.2.4 Menus
12.3 Printing and Saving the Observation Distance Matrix
12.3.1 Save Image or Print
12.3.2 The Menus
12.3.3 File->Save Obs. Distance Matrix (Sorted)
13 The Correlation Interaction View
13.1 Correlation Interaction Overview
13.1.1 Pick Wanted Variables
13.2 Viewing Correlation Interactions
13.2.1 About Correlation Interaction
13.2.2 Determining Higher Order Effects
13.2.3 The Correlation Interaction View
13.2.4 The Upper Triangle
13.2.5 The Lower Triangle
14 P-Value Plot
14.1 Plotting P-values
14.2 P-Value plot types
14.2.1 P-Value Plots sorted by Var #
14.2.2 P-Value Plots sorted by adjusted P-value
14.3 The P-value Plot
14.4 Reset View
14.5 Copy to Clipboard
14.6 Axis Selector
14.7 Zooming into the Graph
14.8 File Menu
14.9 Create Bitmap
14.10 Print Image
14.11 P-Value Spreadsheet
15 Text Viewer
15.1 Text Viewer Overview
15.2 Navigating the Text Viewer Menus
15.2.1 File Menu
15.2.1.1 Save As Text File
16 Regression Analysis (Optional Module)
16.1 Overview
16.2 Performing Analysis
16.2.1 Covariates
16.2.2 Type of Regression
16.2.3 Permutation Tests
16.2.4 Create Residual Spreadsheet With Covariates
16.2.5 Output and Running the Regression
III The Science Behind ChemTree
17 Formulas and Theories
17.1 Split-Prediction Methodology
17.2 Normally Distributed Response Binomial Predictor
17.2.1 Univariate Case
17.2.2 Multivariate Case
17.3 Normally Distributed Response Continuous-Ordinal Predictor
17.4 Normally Distributed Response Categorical Predictor
17.5 Linear Regression with Continuous Response
17.5.1 Methodology
17.5.2 Stepwise Regression
17.6 Permutation Test Methodology (Optional Module)
17.7 Results from Linear Regression
17.7.1 Residual Spreadsheet
17.7.2 Linear Regression Statistical Output Viewer
17.7.3 Overall Statistics
17.7.4 Regressor Statistics
17.7.5 Left-Out Regressors
17.7.6 Parameters
17.8 Binomially Distributed Response Binary Predictor
17.8.1 Univariate Case
17.8.2 Multivariate Case
17.9 Binomially Distributed Response Continuous/Ordinal Predictor
17.10 Binomially Distributed Response Categorical Predictor
17.11 Categorical Response
17.12 Logistic Regression with Binomial Response
17.12.1 Methodology
17.12.2 Stepwise Regression
17.13 Results from Logistic Regression
17.13.1 Residual Spreadsheet
17.13.2 Logistic Regression Statistical Output Viewer
17.13.3 Overall Statistics
17.13.4 Regressor Statistics
17.13.5 Left-Out Regressors
17.13.6 Parameters
17.14 Caveats
17.15 The False Discovery Rate and the Simes Method
17.15.1 The False Discovery Rate
17.15.2 Simes’ Method
A EULA
B REFERENCES
C BUG FIX HISTORY
C.1 Bugs Fixed in Version 5.1.0 of ChemTree
C.2 Bugs Fixed in Version 5.0.0 of ChemTree
C.3 Bugs Fixed in Version 4.1.0 of ChemTree
C.4 Bugs Fixed in Version 4.0.3 of ChemTree
C.5 Bugs Fixed in Version 4.0.0 of ChemTree
C.6 Bugs Fixed in Version 3.2.2 of ChemTree