Using the Multitree Model created in Tutorial 1, we can predict the potency of a new set of untested compounds
with the Cherry Picking feature of ChemTree. See Chapter 7.7.4 for a more detailed discussion of Cherry
Picking. Remember that this model was built using a random half of the data. Normally you would build a
new Multitree Model with all the data before making predictions, but for this tutorial we will continue with
the previously created model.
2.5.1 Cherry Pick Options
The first step is to double-click the Multitree Model in the Project Navigator Window to open the Multitree Model Viewer.
Then select the menu choice File->Cherry Pick Compounds Using This Model. This starts the Cherry Picking Wizard,
which guides you through the process.
Figure 2.20: The "Cherry-picking" Wizard opening screen.
For this tutorial we will be making predictions on an SD formatted file, so start by clicking the radio button next to the SD
Format label if it is not already set. We will not be including an additional descriptor file, so the check box at the bottom of the
wizard can be left unchecked. Click Next to continue.
Figure 2.21: The "Cherry-picking" Wizard for selecting the data source.
On this wizard screen, click the Browse button, navigate to the example folder, select the aidssubset.SD file for Cherry
Picking, and click the Next button.
Figure 2.22: The "Cherry-picking" Wizard for selecting the Compound name field.
You are now shown a list of fields from the aidssubset.SD file. Click on the field that contains the compound names. For
this tutorial that field is called the "Header Name" field. We will not be importing any additional field descriptors for this
tutorial, so make sure the Do not use additional fields as descriptors radio button is selected. Then click Next to
continue.
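To make the idea of "fields" in an SD file concrete, the following minimal Python sketch lists the data fields of each record in SD-formatted text. It is not part of ChemTree and does not reflect how the wizard reads files. It assumes the standard SD layout (records terminated by a `$$$$` line, data items introduced by `> <FieldName>` lines) and assumes that the wizard's "Header Name" corresponds to the first line of each record; the function names and sample compound names are hypothetical.

```python
def parse_record(lines):
    """Return a dict of fields for one SD record (a list of text lines)."""
    # Assumption: "Header Name" is the first line of the record's molfile block.
    record = {"Header Name": lines[0].strip()}
    # Data items follow the "M  END" line that closes the structure block.
    stripped = [line.strip() for line in lines]
    i = stripped.index("M  END") + 1 if "M  END" in stripped else 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(">"):
            start, end = line.find("<"), line.rfind(">")
            if start != -1 and end > start:
                name = line[start + 1:end]
                i += 1
                values = []
                # A data item's value runs until the next blank line.
                while i < len(lines) and lines[i].strip():
                    values.append(lines[i].strip())
                    i += 1
                record[name] = "\n".join(values)
        i += 1
    return record


def read_sd_records(text):
    """Split SD-formatted text on '$$$$' lines and parse each record."""
    records, block = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if block:
                records.append(parse_record(block))
            block = []
        else:
            block.append(line)
    if any(line.strip() for line in block):
        records.append(parse_record(block))
    return records


# Tiny made-up example with two empty structures (hypothetical names).
sample = "\n".join([
    "cmpd-1", "  sketch", "", "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END", "> <Activity>", "1", "", "$$$$",
    "cmpd-2", "  sketch", "", "  0  0  0  0  0  0  0  0  0  0999 V2000",
    "M  END", "> <Activity>", "0", "", "$$$$",
])
records = read_sd_records(sample)
for rec in records:
    print(sorted(rec))
```

Running a listing like this on your own SD file before starting the wizard can help you recognize which field holds the compound names.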
Figure 2.23: The "Cherry-picking" Wizard for setting selection parameters.
We are going to select all compounds for prediction, so make sure that the radio button Select all compounds is set. The
Output Predicted Activity box should be checked by default. Leave the Project Spreadsheet radio button selected for the
output location. This will later produce a spreadsheet of predicted activity for us to review. No other settings need to be made,
so click the Finish button to start the Cherry Picking process.
2.5.2 Cherry Pick Results
Once the progress bars complete, an information dialog (Fig. 2.24) tells you that predictions
were made for 3019 compounds from the aidssubset data file using the 100 random trees from the Multitree
Model.
Figure 2.24: Dialog reporting completion of the Cherry Pick analysis.
After you close this dialog, a Cherry Pick Output spreadsheet is added to the Project Navigator and a Spread Sheet Viewer
(Fig. 2.25) is displayed with the results of the analysis.
Figure 2.25: Spreadsheet showing prediction results of the Cherry Pick analysis.
The untested compounds are effectively "dropped" down the tree into their appropriate place according to their structural
features. Compounds that fall in high potency nodes are more likely to have a high potency than compounds simply chosen
at random. Using multiple randomly generated trees, it is possible to get an average prediction across many models that is an
improvement over a single tree prediction. Hit rates in cherry-picked compounds tend to be anywhere from 10- to 100-fold
higher than those of randomly selected compounds, provided that the training set has sufficient structural similarity
with the holdout set.
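The idea of dropping a compound down several trees and averaging the results can be illustrated with a short Python sketch. This is a toy model only, not ChemTree's implementation. The tree representation (nested dictionaries with hypothetical keys `feature`, `present`, `absent` and `mean_potency`), the feature names, and all numbers are assumptions made up for illustration. The last function shows how a fold improvement in hit rate over random selection could be computed.

```python
from statistics import mean


def predict_one_tree(tree, features):
    """Drop a compound (a set of structural features) down one tree.

    Internal nodes split on the presence of a feature; terminal nodes
    store the mean potency of the training compounds that landed there.
    """
    node = tree
    while "mean_potency" not in node:
        branch = "present" if node["feature"] in features else "absent"
        node = node[branch]
    return node["mean_potency"]


def predict_multitree(trees, features):
    """Average the terminal-node potencies over all trees."""
    return mean(predict_one_tree(tree, features) for tree in trees)


def fold_enrichment(picked_hits, picked_total, random_hits, random_total):
    """Hit rate among picked compounds divided by the random hit rate."""
    return (picked_hits * random_total) / (picked_total * random_hits)


# Two hypothetical single-split trees with made-up potencies.
tree_a = {"feature": "nitro",
          "present": {"mean_potency": 6.0},
          "absent": {"mean_potency": 2.0}}
tree_b = {"feature": "amide",
          "present": {"mean_potency": 5.0},
          "absent": {"mean_potency": 3.0}}

compound = {"nitro"}  # has a nitro group, no amide
prediction = predict_multitree([tree_a, tree_b], compound)
print(prediction)  # 6.0 from tree_a and 3.0 from tree_b average to 4.5

# Made-up counts: 20 hits in 100 picked compounds vs 2 hits in 100 random ones.
print(fold_enrichment(20, 100, 2, 100))  # 10.0
```

In this toy case the two trees disagree about the compound, and the averaged value sits between them, which is the sense in which a multitree prediction smooths out the quirks of any single tree.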