Client
This collaboration between Ghent University and Sunthetics focused on optimizing categorical variables in a complex Suzuki-Miyaura cross-coupling reaction. The joint team set out to compare the predictive power and resource efficiency of SuntheticsML against those of traditional high-throughput experimentation (HTE).
Challenge
- Determine the best-performing catalyst, base, and solvent combination from a large pool of categorical options.
- Handle high dimensionality in experimental design without requiring parameterization.
- Reduce the resource cost and scale of traditional HTE workflows, which demand hundreds of reactions and significant infrastructure.
Goal
- Achieve or surpass the performance of HTE using significantly fewer experiments.
- Demonstrate that ML can efficiently optimize categorical variables without prior assumptions or encoding.
- Reveal non-intuitive variables of influence to inform future R&D strategy.
Approach & Solution
- Used SuntheticsML’s proprietary Supervised + Active Learning (SL + AL) framework.
- No one-hot encoding or parameter mapping was needed—true categorical optimization was performed directly.
- Each of the 5 ML seeds started from 36 experiments selected at random from the 768-experiment design space.
- At each iteration, SuntheticsML suggested 6 experiments:
  - 5 predicted optimal points
  - 1 exploratory point for model refinement
- The recommendations adapted at each iteration as new experimental results were fed back into the model.
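The batch structure described above (36 random starting experiments, then 6 suggestions per iteration split 5 exploit / 1 explore) can be sketched as a small retrospective simulation. This is a minimal illustration, not the SuntheticsML algorithm, which is proprietary and not described in the source. The 12 x 8 x 8 split of the 768 combinations, the synthetic yields, the per-level-mean surrogate in `predict`, and all function names are assumptions made for the example.

```python
import itertools
import random

# Hypothetical factor levels: the source gives only the total (768), so the
# 12 x 8 x 8 split below is an assumption made purely for illustration.
CATALYSTS = [f"cat{i}" for i in range(12)]
BASES = [f"base{i}" for i in range(8)]
SOLVENTS = [f"solv{i}" for i in range(8)]
SPACE = list(itertools.product(CATALYSTS, BASES, SOLVENTS))  # 768 combinations


def make_synthetic_yields(rng):
    """Invented yields (not study data): additive level effects plus noise."""
    effects = [{lvl: rng.uniform(0, scale) for lvl in levels}
               for levels, scale in ((CATALYSTS, 20), (BASES, 50), (SOLVENTS, 20))]
    return {combo: sum(e[l] for e, l in zip(effects, combo)) + rng.uniform(0, 10)
            for combo in SPACE}


def predict(combo, observed):
    """Toy surrogate: average of the mean observed yield for each level.

    Works on the raw category labels, with no numeric encoding of the levels.
    """
    overall = sum(observed.values()) / len(observed)
    score = 0.0
    for pos, level in enumerate(combo):
        ys = [y for c, y in observed.items() if c[pos] == level]
        score += sum(ys) / len(ys) if ys else overall
    return score / len(combo)


def suggest_batch(observed, rng, n_exploit=5, n_explore=1):
    """Return the 5 top-predicted untested combos plus 1 random untested combo."""
    untested = [c for c in SPACE if c not in observed]
    ranked = sorted(untested, key=lambda c: predict(c, observed), reverse=True)
    exploit = ranked[:n_exploit]
    explore = rng.sample(ranked[n_exploit:], n_explore)  # stand-in for exploration
    return exploit + explore


def run_campaign(seed, n_initial=36, max_iterations=10):
    """Simulate one seed; returns (iterations used, experiments run)."""
    rng = random.Random(seed)
    truth = make_synthetic_yields(rng)
    best = max(truth.values())
    observed = {c: truth[c] for c in rng.sample(SPACE, n_initial)}
    iterations = 0
    # Retrospective-benchmark assumption: the true optimum is known from the
    # full map, so the loop can stop as soon as it has been sampled.
    while max(observed.values()) < best and iterations < max_iterations:
        for combo in suggest_batch(observed, rng):
            observed[combo] = truth[combo]
        iterations += 1
    return iterations, len(observed)


if __name__ == "__main__":
    for seed in range(5):
        print(seed, run_campaign(seed))
```

Under these assumptions, a run that stops after k iterations has consumed 36 + 6k experiments, the same bookkeeping that underlies the experiment counts in the results.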
Results & Metrics
- Traditional HTE:
  - Required 768 experiments to map the full design space.
- SuntheticsML outcomes:
  - Average performance: 6 iterations, 72 experiments, 91% experiment reduction
  - Best performance: 2 iterations, 48 experiments, 94% experiment reduction
  - Worst case: 8 iterations, 84 experiments, 89% experiment reduction
- Speed to optimum:
  - Reached the same global maximum as HTE at a fraction of the cost and time.
- Insight from variable importance analysis:
  - Contrary to expectation, base selection had the largest influence on reaction yield.
  - Solvent and catalyst changes carried significantly less predictive weight, which shifts future optimization priorities.
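The source does not say how SuntheticsML computes variable importance. As a generic illustration of ranking categorical factors by influence, the sketch below computes each factor's share of total yield variance from its level means (a main-effects decomposition). The level names, the yields, and the function `main_effect_shares` are invented for the example and are not data or methods from the study.

```python
from collections import defaultdict
from statistics import mean

FACTORS = ("catalyst", "base", "solvent")


def main_effect_shares(rows):
    """Fraction of total yield variance explained by each factor's level means."""
    yields = [y for _, y in rows]
    grand = mean(yields)
    total_ss = sum((y - grand) ** 2 for y in yields)
    shares = {}
    for pos, name in enumerate(FACTORS):
        by_level = defaultdict(list)
        for combo, y in rows:
            by_level[combo[pos]].append(y)
        # Between-level sum of squares for this factor.
        between_ss = sum(len(ys) * (mean(ys) - grand) ** 2
                         for ys in by_level.values())
        shares[name] = between_ss / total_ss
    return shares


# Invented yields (%) for a 2 x 2 x 2 toy design; not data from the study.
rows = [
    (("cat_A", "base_X", "solv_P"), 82), (("cat_A", "base_X", "solv_Q"), 78),
    (("cat_A", "base_Y", "solv_P"), 35), (("cat_A", "base_Y", "solv_Q"), 30),
    (("cat_B", "base_X", "solv_P"), 88), (("cat_B", "base_X", "solv_Q"), 80),
    (("cat_B", "base_Y", "solv_P"), 41), (("cat_B", "base_Y", "solv_Q"), 33),
]

shares = main_effect_shares(rows)
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.1%} of yield variance")
```

In this toy table the base levels differ far more in mean yield than the catalyst or solvent levels, so the base ranks first, mirroring the kind of ordering reported in the study.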
The Sunthetics Edge
"This study shows why categoricals don’t have to be the bottleneck. With SuntheticsML, turned a 768-experiment challenge into a 48-experiment solution—without sacrificing accuracy. The model revealed that base selection—not catalyst or solvent—was the key driver of performance. That’s the kind of insight you don’t get from brute force."
Key Takeaways
- Up to 94% fewer experiments than HTE (91% on average across the 5 seeds), reaching the same global maximum.
- Optimizes categorical variables directly, with no encoding or parameterization required.
- Identified base as the key driver of reaction yield, challenging conventional assumptions.
- Works with small data: 36 random starting points, under 5% of the 768-experiment space, were enough to converge on the optimum.
- Rapid iteration cycles (as few as 2 needed) accelerate discovery and reduce costs.
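The headline percentages follow from simple arithmetic on the figures reported in this case study: 768 experiments for the full HTE map, 36 starting experiments, and 6 experiments per iteration. The short check below reproduces them. The function names are illustrative, and rounding to the nearest whole percent is an assumption about how the figures were reported.

```python
FULL_SPACE = 768   # experiments in the full HTE map
N_INITIAL = 36     # random starting experiments per seed
BATCH_SIZE = 6     # 5 predicted-optimal + 1 exploratory per iteration


def total_experiments(iterations):
    """Experiments consumed after a given number of ML iterations."""
    return N_INITIAL + BATCH_SIZE * iterations


def reduction_pct(n_experiments):
    """Percent fewer experiments than the full HTE map, rounded to an integer."""
    return round(100 * (1 - n_experiments / FULL_SPACE))


print(total_experiments(2), reduction_pct(48))  # 48 94  (best case)
print(total_experiments(6), reduction_pct(72))  # 72 91  (average)
print(reduction_pct(84))                        # 89     (worst case)
```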