Multi-Way vs. Binary Splits
The real world is not linear, nor is the data that comes from it. In fact, data is full of discontinuities, irregular curves, and unexpected relationships. While binary segmentation works well with data that is linear, symmetrical, or dimensionally simple, their efficacy fails in complex scenarios.
Consider the following example:
- 1000 random points, mean=1
- Followed by 100 random points mean=1.15
- Followed by 1000 random points, mean=1
Goal
Find the best segmentation, minimizing sum of squared deviations of the data from the segment’s mean.
Best binary split has multiple problems (Figure 1):
- Does not find either edge of the real change
- The splitting is driven by noise
- The split shows nominal significance p=0.04 (not shown), yet is meaningless
- Subsequent binary splits will begin from this erroneous position and will not be able to “recover” to the proper segmentation, resulting in false bins.

Figure 1. Sub-optimal binary split.
Optimal fit is a 3-way split (Figure 2):
- In one single step, the discontinuity is found and properly segmented.
- Is highly statistically significant p=9.1e-16 (not shown)
- Captures full explanatory power of x in one step

Figure 2. Optimal three-way split