Rising Above Uncertainty: Increasing Clinical Yield in Array-Based Cytogenetics

July 27, 2010

As Andy Ferrin and I made the five-hour drive home from a cytogenetics conference, we had plenty of time to reflect on the themes that kept recurring in presentations and in conversations among attendees. Taking something of an outsider’s view, we traced each complaint, each sigh of frustration, and the unspoken assumptions behind opposing viewpoints, and found that they all pointed to the same persistent systemic conflict. This conflict stemmed from the extra information, both knowledge and uncertainty, that labs were obtaining by using higher-resolution microarrays instead of the blunt (but certain) instrument of karyotyping. Some people were on one side of the conflict and others on the other, but most oscillated between the two in a compromise position.

What is this core conflict? Can it be broken to create systemic win-wins for all? Can understanding the core problem of a market lead to truly breakthrough solutions and sustainable competitive advantage? What follows is our attempt at addressing these questions.


Historically, cytogenetic testing based on karyotyping and fluorescence in situ hybridization (FISH) has had a rather dismal clinical yield of approximately 3-5%. That is, of 100 patients with obvious birth defects, 95-97 receive no genetic explanation for their condition. Despite this low yield, these tests have remained the technology standard for years and still account for the majority of the current US cytogenetic testing market.

Only a handful of years ago, innovators in cytogenetic testing challenged the acceptability of this 3-5% diagnosis rate. Despite early naysayers, these leaders pioneered the field by employing genome-wide BAC arrays, and then even higher-resolution arrays, eventually achieving a clinical yield of approximately 20%. In addition to providing direction for treatment, and peace of mind and closure for a greater number of patients and their families, this roughly fivefold improvement also created significant competitive advantage, delivering rapid growth, broad adoption, and the rise of multiple competitors. Now, only several years later, array-based cytogenetic testing accounts for more than 20% of the US cytogenetic market and is growing briskly. A recent consensus statement by the International Standard Cytogenomic Array (ISCA) Consortium recommends chromosomal microarrays as a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies, largely because of their greater clinical yield compared with a G-banded karyotype.

Given that higher-resolution arrays with more genomic content have led to increased clinical yields, and given that these higher clinical yields have been a major source of success, one might think a clear path to ongoing innovation and competitive advantage for cytogenetic service providers would be to keep increasing resolution and genomic content, and with them clinical yield. Such a course would help more patients, further widen the gap over previous-generation testing, and provide a significant advantage over competitors.

To an outsider, this seems like an obvious direction, yet knowledgeable and experienced people in the field strongly believe this course of innovation is fraught with danger. The unfortunate result is that while 20% of patients are now successfully diagnosed, the vast majority – the remaining 80% – are not, and progress appears to have plateaued.

The Conflict

Observation of the dialog at a recent cytogenetic microarray conference revealed that the field as a whole suffers from a systemic internal conflict, diagrammed below using the “Evaporating Cloud” model from the Theory of Constraints.

Figure: Cytogenetic systemic internal conflict (Evaporating Cloud diagram)

Everyone can agree on a common goal (A) of helping the patient and their family. To do so, we clearly need to (B) understand and diagnose the disease afflicting the patient and, at the same time, (C) ensure that healthcare decisions are made based on clear, reliable information. However, these two legitimate needs have led to conflicting behaviors. On one hand, in order to better understand and diagnose disease, there is pressure to (D) increase genetic information. On the other hand, to ensure that decisions are made based on clear, reliable information, there is pressure to take the opposite approach and (D’) limit genetic information to only that which is already known and well characterized.

When individuals are locked into such a systemic conflict, it generates friction between those who identify with opposite sides and inevitably leads to compromises and to living with inherent contradictions. Note how this conflict produces much of the compromise and pain experienced in the field of cytogenetics:

  • The majority of the undiagnosed patients (the 80%) are, surprisingly to outsiders like us, called “normal,” though clearly suffering from a genetic abnormality that was simply not detected or understood. (This use of language appears to be a holdover from the technological limitations of karyotyping, where the dismal 95-97% diagnostic failure rate, and the psychological burden of being unable to help parents and child, is perhaps repressed under the label “normal” rather than a more apt description such as “failed to diagnose.”)
  • Different cytogenetic labs have dramatically different thresholds for excluding deletions and duplications.
  • Some physicians want to know about every variant in the patient’s genome; others want to know only about abnormal variants that are well characterized and known.
  • Some labs report only “normal” versus “abnormal,” others report multiple levels of uncertainty.
  • Some high-resolution cytogenetic providers are moving to ultra-high density arrays, others are limiting resolution and/or masking regions to limit uncertainty.
  • Imperfect penetrance and interaction effects are acknowledged in principle but often ignored in practice.
  • When a known pathogenic variant is identified, there is a tendency to ignore other variants of less clear significance.
  • In a real-life experiment, several labs gave dramatically different answers on the same difficult-to-diagnose samples – with most of the reports not reflecting the inherent uncertainty or thinking process that went into what was, in essence, a judgment call.
  • Cytogeneticists are conflicted about stating that a variant is possibly or probably pathogenic, since doctors are then likely to consider the case closed, and parents are likely to cling to judgment calls that may later prove erroneous.
  • Doctors are reluctant to order high-density microarray testing for prenatal diagnosis because life-or-death decisions are made on the basis of its results. Some labs use more stringent criteria for prenatal calls and even use different arrays.
  • Cytogenetic service providers’ marketing pieces often promote “clarity” or “certainty” (perhaps in response to objections from those accustomed to the “near-certain” traditional methods). Yet this clarity comes at a cost: only 20% of patients are diagnosed.
  • Current policies require that humans evaluate all copy number variants (CNVs) identified by the computer; computers are not allowed to make these decisions. However, choosing lower-resolution arrays and less sensitive algorithms, which reduce false discoveries at the expense of sensitivity, is already, in effect, automating these decisions.

The Theory of Constraints states that every conflict that has a common goal (in this case, “helping the patient and their family”) can be resolved without compromise.  Further, it stands to reason that the company that focuses its innovation efforts on breaking this conflict will dominate the market for years to come, thereby creating the capital to continue building additional competitive advantage whether through additional products, new markets, or both.


Before proceeding, however, we must examine the underlying assumptions of this conflict. Specifically, would having more genetic information truly help us understand and diagnose disease better? Just as many traditional cytogeneticists doubted that more than a 3-5% yield was possible, evidence and experience suggest that many now doubt that much more than the current 20% yield is possible, even with higher-resolution arrays. Most of this doubt is likely due to not knowing how to deal with the uncertainty of having perhaps orders of magnitude more CNVs to evaluate or validate. Let us set aside for the moment how we might deal with the uncertainty of moving to higher-resolution arrays, and focus instead on whether there is actually more signal in the data that could account for a large portion of the 80% who remain undiagnosed. The following are compelling reasons to believe there is more to be found:

  • Unless mosaicism spans very large regions, current practices leave it undetected or ignored. Mosaicism accounts for several percent of current abnormal findings, so one would expect a higher clinical yield from detecting smaller mosaic regions and lower mosaic fractions than current methods can.
  • Loss of Heterozygosity (LOH) accounts for 2% or more of abnormal findings. Some companies detect it and others don’t, but most focus only on large-scale LOH. As with mosaicism, finer-resolution LOH detection is likely to improve clinical yield by additional percentage points.
  • Current cytogenetic practices mostly ignore multi-copy variants, considering only gain, neutral, or loss, plus some large mosaicism.  Further, areas containing benign common copy variants are avoided in some custom arrays in an effort to unburden the analyst.  However, while having one, two, or even three copies of a given segment may be benign, having zero or more than four might be pathogenic (recent literature contains several examples of different threshold effects for CNVs).
  • Interaction effects (epistasis) are acknowledged, but current cytogenetic practices do not provide a way to delineate them with confidence and thereby add to the clinical yield.
  • Ultimately, copy number variation exists at the scale of single base pair indels.  Research from Dr. Charles Lee of Harvard University and others reports that copy number variants account for seven times the genetic heterogeneity of SNPs, with each person having as many as several hundred thousand.  There are already disorders associated with single-digit base pair deletions.
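To see why smaller mosaic fractions and higher copy numbers are easy to miss, it helps to look at how small the signal becomes. The sketch below is a minimal illustration, assuming an idealized array with no probe compression, noise, or sample contamination: a segment at copy number c in a fraction f of cells, with the remaining cells diploid, has an expected log2 ratio of log2((f·c + (1 − f)·2) / 2). The function name `expected_log2_ratio` is hypothetical. Real arrays compress these ratios, so observed shifts are typically smaller still.

```python
import math


def expected_log2_ratio(copy_number: int, mosaic_fraction: float = 1.0, ploidy: int = 2) -> float:
    """Idealized log2 intensity ratio of a segment versus a diploid reference.

    A fraction `mosaic_fraction` of cells carry `copy_number` copies; the rest
    carry `ploidy` copies. Probe compression, noise, and contamination are ignored.
    """
    if not 0.0 <= mosaic_fraction <= 1.0:
        raise ValueError("mosaic_fraction must be between 0 and 1")
    # Average copy number across the cell population.
    mean_copies = mosaic_fraction * copy_number + (1.0 - mosaic_fraction) * ploidy
    if mean_copies == 0:
        return float("-inf")  # homozygous deletion in every cell: no signal at all
    return math.log2(mean_copies / ploidy)


if __name__ == "__main__":
    examples = [
        ("constitutional single-copy loss", 1, 1.0),
        ("20% mosaic single-copy loss", 1, 0.2),
        ("constitutional gain (3 copies)", 3, 1.0),
        ("4 copies", 4, 1.0),
        ("5 copies", 5, 1.0),
    ]
    for label, copies, fraction in examples:
        print(f"{label:32s} {expected_log2_ratio(copies, fraction):+.3f}")
```

Under these assumptions, a 20% mosaic single-copy loss shifts the ratio by only about -0.15, versus -1.0 for a constitutional loss, and going from 4 to 5 copies moves it by about 0.32, roughly half the 0.585 step from 2 to 3 copies. Distinguishing such small shifts from noise is what demands the more sensitive algorithms discussed here.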

Whether the field is still called “cytogenetics” in the future or is supplanted by a new one, it is clear that diagnosis of genetic defects must one day embrace higher-resolution techniques, up to and including next-generation sequencing.

Resolving the Conflict

If we accept that greater genomic content can lead to improved clinical yield, the conflict diagrammed at the beginning of this article must still be resolved before that content can be used effectively: a way must be found to increase array resolution, identify more variants, and diagnose more patients without sacrificing clarity, creating confusion, or eroding the confidence of the cytogenetic lab’s customers. In other words, the aim is to increase clinical yield while avoiding copy number calls of unclear clinical significance.

At the heart of the conflict, then, is the concern over finding and revealing CNVs of unknown significance, which creates a vicious cycle: CNVs cannot be shared until their clinical significance is known, yet their clinical significance cannot be determined until they are collected and analyzed.

As a result, less sensitive algorithms and methods become attractive as a way to avoid this very issue: if an unclear CNV isn’t detected, it doesn’t create uncertainty. However, by not finding and characterizing new, unknown CNVs, the field is locked in a period of stagnation, relying on external research organizations to find and characterize new syndromes associated with genetic anomalies.

While we do not claim to have all of the answers, we present ideas that point toward a solution, along with thoughts on how to turn those improvements into increased yield, revenues, and market share. Some ideas are policy changes that break out of limitations inherited from outworn traditional cytogenetics paradigms; others involve analytics research and development.

Capitalizing on the Conflict

At a high level, we recommend that cytogenetic labs implement highly sensitive algorithms capable of accurately finding CNVs of smaller size and lower relative intensity. Yet we also recommend that this new information be masked, hidden from both the cytogenetic directors and their customers, for a period of time. Alternatively, one might envision placing this data in some kind of research track that is not normally (or initially) part of the decision process.

During an average month, we estimate that the larger cytogenetic labs process approximately 1,000 or more samples, of which roughly 800 per 1,000 (the aforementioned 80%) go undiagnosed. Over the course of six months, that amounts to 4,800 undiagnosed samples, and nearly 10,000 over the course of a year.
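The volume estimate above is simple arithmetic, assuming a steady 1,000 samples per month and a constant clinical yield of about 20%:

```latex
\begin{aligned}
1{,}000 \times (1 - 0.20) &= 800 \text{ undiagnosed samples per month} \\
800 \times 6 &= 4{,}800 \text{ undiagnosed samples per six months} \\
800 \times 12 &= 9{,}600 \approx 10{,}000 \text{ undiagnosed samples per year}
\end{aligned}
```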

As ever more samples are processed at the proposed finer detail, patterns in the data will begin to emerge as CNVs of previously unknown clinical significance recur in the same regions alongside similar phenotypes. These patterns could be uncovered automatically through statistical association methods. Once a CNV region has become repeatedly associated with a phenotype, the filters or mechanisms that hid it can be selectively removed, allowing users to see these correlations and to associate the new variants with a repository of similar cases, not unlike how many labs work today. (Capturing quality phenotypes will be essential for this and is discussed later.)
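One way to make the statistical association step concrete is a one-sided Fisher's exact test for each genomic region, on a 2×2 table of whether a sample carries a masked CNV in that region versus whether it was reported with a given phenotype. The sketch below is an illustration under stated assumptions, not a validated pipeline: the function names, the `alpha` threshold, and the minimum-carrier rule are hypothetical, and the very small default `alpha` stands in for a proper multiple-testing correction across the many regions and phenotypes that would be tested. It also presumes consistently coded phenotypes.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test for enrichment in a 2x2 table.

                      phenotype   no phenotype
        carries CNV       a            b
        no CNV            c            d

    Returns P(X >= a), where X (carriers with the phenotype) follows the
    hypergeometric distribution implied by the fixed row and column totals.
    """
    carriers = a + b
    with_phenotype = a + c
    n = a + b + c + d
    total = comb(n, carriers)
    upper = min(carriers, with_phenotype)
    # Sum the hypergeometric probabilities of tables at least as extreme as observed.
    tail = sum(
        comb(with_phenotype, x) * comb(n - with_phenotype, carriers - x)
        for x in range(a, upper + 1)
    )
    return tail / total


def should_unmask(a: int, b: int, c: int, d: int,
                  alpha: float = 1e-6, min_carriers: int = 5) -> bool:
    """Hypothetical rule: reveal a masked CNV region once enough carriers share
    the phenotype and the enrichment is very unlikely to be due to chance."""
    return a >= min_carriers and fisher_exact_greater(a, b, c, d) < alpha
```

With hypothetical counts, suppose 8 of 10,000 undiagnosed samples carry a masked CNV in the same region and 7 of them share a phenotype reported in only 300 samples overall: `should_unmask(7, 1, 293, 9699)` returns `True`, while a region with a single affected carrier stays masked. In practice, `scipy.stats.fisher_exact` with `alternative="greater"` computes the same one-sided test.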

From the referring physician’s perspective, the result is that they learn not only of their patient’s known variants but also, drawing on the lab’s accumulated experience, of additional CNVs that are not part of known, published syndromes (which, increasingly, almost any array-based cytogenetic lab could report) but that experience has shown may be related to the patient’s condition. Although these results may be somewhat clinically ambiguous on their own, the lab can present them in a way that resolves the ambiguity, for example by pointing to the similar cases it has accumulated.

Building on this capability, or perhaps in place of this model, we also present the following ideas:

  • The cytogenetic service provider can create a subscription plan or premium service that includes longitudinal follow-up. For the life of the patient, if new information changes the patient’s diagnosis, the new findings and their impact will be communicated to the referring doctor, letting the physician know there is an ongoing commitment to finding the cause of their patients’ conditions. Note: with every patient a customer for life, additional follow-on offerings will also be easier to introduce.
  • Dedicate R&D efforts to increasing clinical yield.  Focus on detecting smaller variants of all kinds including mosaics, LOH, and epistatic effects.  Statistical and computer science expertise will be essential for this.
  • Segment the market by tolerance for uncertainty: some customers prefer information that is certain, while others prefer to understand the ambiguity and uncertainty of their cases. Since reviewing higher-resolution calls takes more time, the cytogenetic lab can charge accordingly for this extra service. A test might start at standard resolution and move to higher-resolution analysis depending on what is (or is not) found at low resolution; for example, the physician can choose to go to higher resolution if nothing is found. Offering traditional cytogenetic methods as the lowest-resolution tier might even help win over more conservative physicians: 95-97% of the time nothing would be found, providing an opportunity to up-sell the value of higher-yield methods.
  • Quantify the clinical uncertainty with statistical methods. Arm the cytogeneticist with a sound estimate of uncertainty based on global statistical analysis of thousands of samples. Educate physicians on the aforementioned conflict and how the lab helps them break it.
  • Address the large problem of phenotype ambiguity. Currently, phenotypic information provided by the referring physician is highly variable, ranging from very specific to extremely general, which makes statistical inferences to characterize new abnormal regions problematic. Given the volume of samples, the payback on capturing this extra information should be relatively quick.
    • Deploy a methodology to capture phenotypes consistently, simply, and efficiently, for example a drop-down taxonomy that lets entire categories of symptoms be excluded quickly via a hierarchical list.
    • Help doctors resolve the conflict of needing to submit one phenotype for insurance purposes and another to help diagnose their patient properly.
    • Incentivize full participation by physicians, perhaps by offering a better price to those who take the time to submit detailed phenotypic information.
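The hierarchical capture idea can be sketched as a small tree in which excluding a category marks every term beneath it as absent, so a physician can rule out a whole organ system in one step while still recording specific findings. The category and term names, the `Term` class, and the three status labels are illustrative assumptions; a production system would more likely draw its terms from an established vocabulary such as the Human Phenotype Ontology. The key design point is that "absent" and "not assessed" stay distinct, since an unrecorded symptom is not evidence that the symptom is missing, and statistical association across thousands of samples depends on that distinction.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A node in a hypothetical phenotype taxonomy (category or specific finding)."""
    name: str
    children: list["Term"] = field(default_factory=list)


def record_phenotype(root: Term, present: set[str], excluded: set[str]) -> dict[str, str]:
    """Assign 'present', 'absent', or 'not assessed' to every leaf term.

    `present` holds specific findings the physician selected; `excluded` holds
    categories (or terms) ruled out in one step. An explicitly selected finding
    takes precedence over an exclusion of its category.
    """
    status: dict[str, str] = {}

    def walk(term: Term, ruled_out: bool) -> None:
        # Exclusion is inherited by every descendant of an excluded category.
        ruled_out = ruled_out or term.name in excluded
        if not term.children:
            if term.name in present:
                status[term.name] = "present"
            elif ruled_out:
                status[term.name] = "absent"
            else:
                status[term.name] = "not assessed"
        for child in term.children:
            walk(child, ruled_out)

    walk(root, False)
    return status


# Hypothetical taxonomy for illustration only.
taxonomy = Term("All findings", [
    Term("Neurological", [Term("Seizures"), Term("Hypotonia")]),
    Term("Cardiac", [Term("Septal defect"), Term("Arrhythmia")]),
    Term("Growth", [Term("Short stature")]),
])

if __name__ == "__main__":
    result = record_phenotype(taxonomy, present={"Seizures"}, excluded={"Cardiac"})
    for term, state in result.items():
        print(f"{term}: {state}")
```

Here, selecting "Seizures" and excluding "Cardiac" records both cardiac findings as absent, while "Hypotonia" and "Short stature" remain "not assessed" rather than being silently treated as negative.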

Potential Pitfalls

There are many potential hurdles to adopting these ideas, and prudent planning beyond the scope of this document should obviously precede any action. Among them are undesirable consequences that are often overlooked: those of achieving tremendous success. For instance, what would happen if, by following the above suggestions, a given lab achieved not a 25% clinical yield but rather 40%, 60%, or more? Some problems that come to mind are:

  • Overloaded personnel (directors, counselors, etc.) due to the larger number of results and the increased number of calls of unclear clinical significance.
  • Dissatisfaction from clients due to increased uncertainty about the results they receive.
  • Current policies in some labs provide free testing for the parents of patients diagnosed as abnormal. Going from 20% to 40% or more means more free testing for parents, hurting margins and further burdening key personnel (directors, counselors, etc.).
  • FISH probes do not exist for many smaller variants, so traditional methods of verification are not possible.  This also means that verification of parents must be done with more expensive tests.

Some things to bear in mind: lab capacity (hardware and operations) scales relatively easily, so the only other bottleneck resource that needs to scale is the people who interpret the data. However, even if the number of directors had to be doubled or tripled to accommodate a doubling of volume (and revenues), they would in the end form only a small fraction of overall operating expense. Further, because other personnel costs would remain fixed or rise slowly, margins would rise much faster than under a growth strategy based on, for instance, pursuing wholly new products and/or new markets. Note also that the commoditization of arrays shifts the cost equation over time in favor of more tests for less money.

Also, informatics innovations may reduce the need for extra interpretation capacity, provided it is acknowledged that the computer is already part of the diagnostic process (for instance, in calling CNVs). One idea is to automatically calculate and assign a difficulty rating to each case, then route the more difficult cases to experts and the easier cases to less experienced staff.
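The difficulty-rating idea can be sketched as a scoring function plus a threshold. Everything here is a hypothetical illustration: the `CaseSummary` fields, the weights, and the expert threshold are assumptions that a lab would need to calibrate against its own sign-out history, not a validated scheme. The prenatal adjustment reflects the stricter criteria some labs already apply to prenatal calls.

```python
from dataclasses import dataclass


@dataclass
class CaseSummary:
    """Hypothetical per-case features a CNV-calling pipeline might summarize."""
    n_uncertain_calls: int      # calls matching neither known benign nor known pathogenic regions
    has_known_pathogenic: bool  # at least one call in a well-characterized syndrome region
    qc_noise: float             # array noise metric on an assumed 0-1 scale, higher = noisier
    is_prenatal: bool = False


def difficulty(case: CaseSummary) -> float:
    """Weighted sum of difficulty signals; the weights are illustrative assumptions."""
    score = 1.0 * case.n_uncertain_calls + 3.0 * case.qc_noise
    if case.has_known_pathogenic and case.n_uncertain_calls == 0:
        score -= 1.0  # clear-cut positive with nothing else to weigh: easier to sign out
    if case.is_prenatal:
        score += 2.0  # stricter criteria apply to prenatal calls
    return max(score, 0.0)


def route(case: CaseSummary, expert_threshold: float = 3.0) -> str:
    """Send harder cases to experts and easier ones to less experienced staff."""
    return "expert" if difficulty(case) >= expert_threshold else "staff"


if __name__ == "__main__":
    cases = [
        CaseSummary(0, True, 0.1),
        CaseSummary(4, False, 0.2),
        CaseSummary(1, False, 0.1, is_prenatal=True),
    ]
    for c in cases:
        print(f"{c} -> {route(c)} (difficulty {difficulty(c):.2f})")
```

A transparent weighted sum keeps the rating easy for directors to audit and override, which fits the current policy that humans make the final call on every CNV.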


There is no question that it is only a matter of time before array-based cytogenetic service providers implement technologies with higher density and increased resolution – that is already happening.  The question is which cytogenetic service provider will lead the way and, subsequently, will lead the market.

With these and other ideas, we see tremendous opportunity to sustain and increase competitive advantage, while bringing relief to thousands of undiagnosed patients and their families.

At the end of the day, high-end informatics are necessary but not sufficient for sustaining competitive advantage – a deep understanding of the limitations of existing policies and innovative changes to break core market conflicts are essential as well. …And that’s my two SNPs.

About Christophe Lambert

Dr. Christophe Lambert is the Chairman of Golden Helix, Inc., a bioinformatics software and services company he founded in Bozeman, MT, USA in 1998. Dr. Lambert received his Bachelor’s degree in Computer Science from Montana State University in 1992 and his Ph.D. in Computer Science from Duke University in 1997. He has performed interdisciplinary research in the life sciences for over twenty years.
