Machine Learning
-
Peak Performance or Just Noise?
It's easy to look at machine learning leaderboards and assume raw scores tell the whole story. In this post, I compare statistical methods to cut through the noise and help us spot genuine model superiority.
-
Working with Large Virtual Chemical Libraries: Part 3 - Thompson Sampling for Classification
Exhaustively screening billion-compound virtual libraries would take decades, so we need smarter ways to hunt for molecules. I look at how we can adapt Thompson Sampling, a classic multi-armed bandit technique from reinforcement learning, using the Beta distribution to efficiently find active compounds without scoring every molecule in the library.
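To give a flavor of the idea, here is a minimal sketch of Beta-Bernoulli Thompson Sampling. It is not the code from the post: the function name `thompson_sample`, the notion of an "arm" as some group of compounds with its own unknown hit rate, and the simulated hit rates are all assumptions made for illustration.

```python
import random


def thompson_sample(true_rates, n_rounds=2000, seed=0):
    """Beta-Bernoulli Thompson Sampling over a handful of hypothetical arms.

    true_rates holds simulated hit rates, unknown to the sampler. In a real
    screen, the active/inactive label would come from a scoring function.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    alpha = [1] * n_arms  # Beta(1, 1) prior: one pseudo-success per arm
    beta = [1] * n_arms   # and one pseudo-failure per arm
    picks = [0] * n_arms

    for _ in range(n_rounds):
        # Draw one plausible hit rate per arm from its current posterior.
        draws = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        # Evaluate the arm whose sampled hit rate is highest.
        arm = max(range(n_arms), key=draws.__getitem__)
        is_active = rng.random() < true_rates[arm]  # simulated assay result
        # Update that arm's Beta posterior with the observed outcome.
        if is_active:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        picks[arm] += 1

    return picks, alpha, beta


if __name__ == "__main__":
    picks, alpha, beta = thompson_sample([0.02, 0.05, 0.30])
    print("evaluations per arm:", picks)
```

Because each arm is chosen in proportion to how likely it is to be the best one, most evaluations end up on the arm with the highest hit rate, while the poor arms are only sampled often enough to rule them out.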
-
Interpretability vs. Explainability in Cheminformatics
Interpretability and explainability are different concepts in machine learning, yet many cheminformatics authors use the terms interchangeably.
-
Chemprop-RF: A Hybrid Approach to Chemical Property Prediction
Can we combine d-MPNNs and Random Forests to outperform each of them individually?
-
Drug Repurposing Using Artificial Intelligence
Finding new uses for existing, approved medications is a massive shortcut in drug discovery. After bad weather ruined my weekend hiking plans, I sat down to build an open-source deep learning workflow to virtually screen clinical libraries for hidden hits.
-
TabPFN for Chemical Datasets
TabPFN is a new transformer-based foundation model that claims to handle tabular data in a single, lightning-fast forward pass. I decided to put it to the test on several molecular property benchmarks to see how it holds up out of the box.
-
Working with Large Virtual Chemical Libraries: Part 1 - Active Learning
If a computational scoring function takes just one second per molecule, screening a billion-compound library would take nearly 32 years. In part one of this series, I look at how we can use active learning loops to train a machine learning model, allowing us to intelligently hunt down the highest-performing molecules without exhaustively testing the whole library.
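The 32-year figure is easy to check with a few lines of arithmetic. The helper name `screening_years` is hypothetical, and the calculation assumes a single worker running around the clock with no parallelism.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # average year, including leap days


def screening_years(n_molecules, seconds_per_molecule=1.0):
    """Wall-clock years needed to score every molecule one at a time."""
    return n_molecules * seconds_per_molecule / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = screening_years(1_000_000_000)
    print(f"{years:.1f} years")
```

Parallel hardware shortens the wall-clock time but not the total compute, which is the motivation for only scoring a small, well-chosen fraction of the library.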