Recursive Partitioning Analysis of Anti-HIV Compounds

Background

The NIH Developmental Therapeutics Program AIDS Antiviral Screen has tested tens of thousands of compounds for evidence of anti-HIV activity, and has made the results publicly available at dtp.nci.nih.gov/docs/aids/aids_data.html. We retrieved these data and selected 31,693 compounds that were measured for their protective effect on HIV-infected cells. The effect, EC50, is the molar concentration that protects 50% of infected cells. Where a compound had multiple measurements, we averaged the concentrations. Potency is expressed as pEC50 = -log10(EC50), so a higher value means a more potent compound. The log transform brings the data closer to a normal distribution, and hence makes them more suitable for statistical analysis.
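As a minimal sketch of the potency calculation described above, the snippet below averages replicate EC50 values and converts the result to pEC50. The function name `pec50`, the example concentrations, and the use of an arithmetic mean are assumptions made for illustration; the text says only that concentrations were averaged.

```python
import math
from statistics import mean

def pec50(ec50_molar_values):
    """Average replicate EC50 values (in molar units), then take -log10.

    Assumption: replicates are combined with an arithmetic mean
    before the log transform.
    """
    average_ec50 = mean(ec50_molar_values)
    return -math.log10(average_ec50)

# Hypothetical replicate measurements for a single compound, in molar units.
replicates = [1e-6, 3e-6]
print(f"pEC50 = {pec50(replicates):.2f}")

# An EC50 of 1e-8 molar corresponds to a potency of 8.
print(f"pEC50 = {pec50([1e-8]):.2f}")
```

Averaging before the log transform follows the order given in the text; averaging the pEC50 values instead would amount to a geometric mean of the concentrations and would give a slightly different number.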

Brute Force HTS

Brute-force HTS refers to conventional high-throughput screening (HTS) as it is practiced today, whereby large numbers of random compounds are screened and only a small proportion turn out to be hits.

Below is the potency distribution of the entire data set. This is the distribution one may expect from a typical brute-force random screen. A screener may typically select the top 1% of compounds for further development. As we see from the histogram, only a small proportion of the compounds have a high potency. In the table below, we see that 5% are above 6, 1.52% are above 7, and 0.12% are above 8. If the target potency is 8 (an EC50 of 1e-8 molar), only 38 out of 31,693 compounds are hits, a rather meager outcome.

31693 DTP AIDS COMPOUNDS

| Threshold | Hits | Hit Rate |
|---|---|---|
| Potency >6 | 1574 | 5.00% |
| Potency >7 | 481 | 1.52% |
| Potency >8 | 38 | 0.12% |
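The brute-force hit rates are simply hit counts divided by the number of compounds screened. The short sketch below recomputes them from the counts in the table; the names `TOTAL_SCREENED`, `hits_by_threshold`, and `hit_rate` are illustrative and not part of any tool mentioned here.

```python
# Hit counts taken from the table above; 31,693 compounds were screened.
TOTAL_SCREENED = 31693
hits_by_threshold = {6: 1574, 7: 481, 8: 38}

def hit_rate(hits, screened):
    """Return the hit rate as a percentage of the compounds screened."""
    return 100.0 * hits / screened

for threshold, hits in hits_by_threshold.items():
    rate = hit_rate(hits, TOTAL_SCREENED)
    print(f"potency > {threshold}: {hits} hits, {rate:.2f}% hit rate")
```

The recomputed rates match the table to within rounding (the first comes out just under 5%).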

 

Smart HTS

Smart HTS entails the following steps:

  1. Select and screen a small number of compounds.
  2. Build a ChemTree model using the data from the screen.
  3. Use the ChemTree model to cherry-pick additional compounds.
  4. Screen the cherry-picked compounds to obtain the desired hits.

The following example application demonstrates the advantage of smart HTS over brute-force HTS: at the highest potency threshold, the hit rate among cherry-picked compounds is roughly 16.7 times the brute-force hit rate (1,667% of it).

We generated 14,034 molecular descriptors (augmented atoms) for the 31,693 compounds. We selected a random 10% subset of the compounds, 3,169 in total, and used it to build a recursive partitioning-based ChemTree model. In that tree we marked the nodes whose compounds had a mean potency of 6 or higher. We then dropped the holdout sample, the remaining 90% of the compounds (28,524 compounds), down the training tree and cherry-picked the compounds that landed in the marked nodes. We repeated this process 10 times with independent random subsets of the data in order to show the variability in our predictions. Over the 10 repeats, the cherry-picked compounds had the following potency distribution:
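The procedure above can be sketched in a few dozen lines. The code below is not the ChemTree algorithm, whose splitting rules are not described here; it is a generic greedy regression tree on presence/absence descriptors, run on synthetic data, and every name in it (`build_tree`, `cherry_pick`, the simulated compounds) is a hypothetical stand-in. It shows only the shape of the workflow: train on a 10% sample, mark the leaves with a mean potency of 6 or higher, then drop the holdout compounds down the tree.

```python
import random
from statistics import mean

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(rows, n_features, min_leaf=5, max_depth=4, depth=0):
    """Grow a regression tree; rows is a list of (bits, potency) pairs."""
    potencies = [y for _, y in rows]
    node = {"mean": mean(potencies), "n": len(rows)}
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return node
    best_score, best_feature = sse(potencies), None
    for f in range(n_features):
        present = [y for bits, y in rows if bits[f]]
        absent = [y for bits, y in rows if not bits[f]]
        if len(present) < min_leaf or len(absent) < min_leaf:
            continue
        score = sse(present) + sse(absent)  # within-node error after the split
        if score < best_score:
            best_score, best_feature = score, f
    if best_feature is None:
        return node
    f = best_feature
    node["feature"] = f
    node["present"] = build_tree([r for r in rows if r[0][f]],
                                 n_features, min_leaf, max_depth, depth + 1)
    node["absent"] = build_tree([r for r in rows if not r[0][f]],
                                n_features, min_leaf, max_depth, depth + 1)
    return node

def leaf_mean(tree, bits):
    """Drop one compound down the tree and return its leaf's mean potency."""
    node = tree
    while "feature" in node:
        node = node["present"] if bits[node["feature"]] else node["absent"]
    return node["mean"]

def cherry_pick(tree, pool, min_mean=6.0):
    """Keep pool compounds that land in leaves with mean potency >= min_mean."""
    return [row for row in pool if leaf_mean(tree, row[0]) >= min_mean]

def hit_rate(rows, threshold=6.0):
    """Fraction of compounds whose potency exceeds the threshold."""
    return sum(y > threshold for _, y in rows) / len(rows)

# Synthetic data (an assumption): 12 binary descriptors; compounds carrying
# both descriptor 0 and descriptor 1 are made markedly more potent.
rng = random.Random(0)
N_FEATURES = 12

def make_compound():
    bits = tuple(int(rng.random() < 0.3) for _ in range(N_FEATURES))
    potency = rng.gauss(4.5, 0.5) + (2.5 if bits[0] and bits[1] else 0.0)
    return bits, potency

compounds = [make_compound() for _ in range(3000)]
rng.shuffle(compounds)
cut = len(compounds) // 10            # 10% training sample
train, holdout = compounds[:cut], compounds[cut:]

tree = build_tree(train, N_FEATURES)
picked = cherry_pick(tree, holdout)
print(f"holdout hit rate: {hit_rate(holdout):.1%}")
print(f"cherry-picked {len(picked)} compounds, hit rate: {hit_rate(picked):.1%}")
```

Repeating the split, fit, and cherry-pick steps with different random seeds would give a spread of results comparable in spirit to the 10 trials reported here, although the synthetic numbers say nothing about the real screen.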

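| Trial | Number of cherry-picked compounds selected by the ChemTree Model (from pool of 28524) | Potency >6 Hits | Potency >6 Hit Rate | Potency >7 Hits | Potency >7 Hit Rate | Potency >8 Hits | Potency >8 Hit Rate |
|---|---|---|---|---|---|---|---|
| 1 | 212 | 107 | 50.0% | 56 | 26.4% | 6 | 2.8% |
| 2 | 413 | 146 | 35.3% | 90 | 22.0% | 6 | 1.5% |
| 3 | 376 | 186 | 49.5% | 104 | 27.7% | 5 | 1.3% |
| 4 | 350 | 141 | 40.2% | 92 | 26.3% | 13 | 3.7% |
| 5 | 349 | 163 | 46.7% | 101 | 28.9% | 5 | 1.4% |
| 6 | 601 | 194 | 32.3% | 110 | 18.3% | 11 | 1.8% |
| 7 | 199 | 76 | 38.2% | 49 | 24.6% | 2 | 1.0% |
| 8 | 529 | 251 | 47.4% | 150 | 28.4% | 10 | 1.9% |
| 9 | 439 | 169 | 38.5% | 93 | 21.2% | 10 | 2.3% |
| 10 | 379 | 125 | 33.0% | 96 | 25.3% | 8 | 2.1% |
| Average | 384.7 | 155.8 | 40.5% | 94.1 | 24.5% | 7.6 | 2.0% |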
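The Average row can be reproduced from the per-trial counts. The sketch below assumes the average hit rates are the average hit counts divided by the average number of cherry-picked compounds (a pooled rate), which is the reading that matches the reported 40.5%, 24.5%, and 2.0%. The variable names are illustrative.

```python
# (cherry-picked, hits > 6, hits > 7, hits > 8) for each of the ten trials,
# copied from the table above.
trials = [
    (212, 107, 56, 6),
    (413, 146, 90, 6),
    (376, 186, 104, 5),
    (350, 141, 92, 13),
    (349, 163, 101, 5),
    (601, 194, 110, 11),
    (199, 76, 49, 2),
    (529, 251, 150, 10),
    (439, 169, 93, 10),
    (379, 125, 96, 8),
]

n_trials = len(trials)
avg_picked = sum(t[0] for t in trials) / n_trials
avg_hits = [sum(t[col] for t in trials) / n_trials for col in (1, 2, 3)]

# Pooled hit rate: average hits divided by average compounds picked.
pooled_rates = [100.0 * hits / avg_picked for hits in avg_hits]

print(f"average compounds cherry-picked: {avg_picked:.1f}")
for threshold, hits, rate in zip((6, 7, 8), avg_hits, pooled_rates):
    print(f"potency > {threshold}: {hits:.1f} hits on average, {rate:.1f}% hit rate")
```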

The last row of the table above averages the 10 repeats of the procedure. On average, 40.5% of the cherry-picked compounds have a potency above 6, 24.5% are above 7, and 2.0% are above 8. If the target potency is 8 (an EC50 of 1e-8 molar) and the pool were large enough to cherry-pick 31,693 compounds, we would obtain about 625 hits on average, compared with the 38 found among the 31,693 compounds of the brute-force screen. The average improvement in hit rates is as follows:

| Threshold | Brute Force Hit Rate | ChemTree Hit Rate | Avg. Hit Rate Increase |
|---|---|---|---|
| Potency >6 | 5.0% | 40.5% | 810% |
| Potency >7 | 1.52% | 24.5% | 1612% |
| Potency >8 | 0.12% | 2.0% | 1667% |

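40.5 / 5.0 = 8.10, or 810% of the brute-force rate, for potencies over 6

24.5 / 1.52 = 16.12, or 1612% of the brute-force rate, for potencies over 7

2.0 / 0.12 = 16.67, or 1667% of the brute-force rate, for potencies over 8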

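With only 3,169 compounds we were able to construct a model that raises the hit rate for cherry-picked compounds to more than 16 times the brute-force rate at potency thresholds of 7 and 8. If we wished to get 30 additional hits at a potency of 8 or higher, we would only have to test in the neighborhood of 1,520 cherry-picked compounds (30 divided by the average hit rate of 7.6/384.7), compared with roughly 25,000 compounds at the brute-force hit rate of 38/31,693.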
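The arithmetic behind these figures can be checked with a short script. The sketch below uses the rounded percentages from the tables for the enrichment ratios and the unrounded average counts for the screening-size estimates; the function names are illustrative assumptions.

```python
import math

# Rounded hit rates (percent) from the tables: threshold -> (brute force, ChemTree).
rates = {6: (5.0, 40.5), 7: (1.52, 24.5), 8: (0.12, 2.0)}

def fold_enrichment(brute_force_pct, chemtree_pct):
    """Ratio of the ChemTree hit rate to the brute-force hit rate."""
    return chemtree_pct / brute_force_pct

def compounds_needed(target_hits, hit_fraction):
    """Compounds to screen to expect target_hits hits at the given hit fraction."""
    return math.ceil(target_hits / hit_fraction)

for threshold, (brute, smart) in rates.items():
    fold = fold_enrichment(brute, smart)
    print(f"potency > {threshold}: {fold:.2f}-fold ({fold * 100:.0f}% of brute force)")

# Unrounded hit fractions at potency > 8.
chemtree_fraction = 7.6 / 384.7      # average hits / average compounds picked
brute_force_fraction = 38 / 31693    # hits in the full data set

print("cherry-picked compounds for 30 hits:", compounds_needed(30, chemtree_fraction))
print("random compounds for 30 hits:", compounds_needed(30, brute_force_fraction))
print("expected hits in 31,693 cherry-picked compounds:",
      round(31693 * chemtree_fraction))
```

Because the enrichment ratios use rounded percentages, recomputing them from the raw counts gives slightly different values (about 16.5-fold rather than 16.67-fold at a potency threshold of 8).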