Jon Swain

Machine Learning

  • May 13, 2026 • 1 min read

    Peak Performance or Just Noise?

    It's easy to look at machine learning leaderboards and assume raw scores tell the whole story. In this post, I compare statistical methods for cutting through the noise and telling whether one model is genuinely better than another.

  • Nov 26, 2025 • 17 min read

    Working with Large Virtual Chemical Libraries: Part 3 - Thompson Sampling for Classification

    Exhaustively screening billion-compound virtual libraries would take decades, so we need smarter ways to hunt for molecules. I look at how we can adapt Thompson Sampling, a classic reinforcement learning technique, using the Beta distribution to efficiently find active compounds without breaking our computers.

  • Nov 9, 2025 • 7 min read

    Interpretability vs. Explainability in Cheminformatics

    Interpretability and explainability are different concepts in machine learning, yet many cheminformatics authors use the terms interchangeably.

  • Sep 12, 2025 • 12 min read

    Chemprop-RF: A Hybrid Approach to Chemical Property Prediction

    Can we combine D-MPNNs and random forests to build a model that outperforms either one on its own?

  • May 3, 2025 • 11 min read

    Drug Repurposing Using Artificial Intelligence

    Finding new uses for existing, approved medications is a massive shortcut in drug discovery. After bad weather ruined my weekend hiking plans, I sat down to build an open-source deep learning workflow to virtually screen clinical libraries for hidden hits.

  • Jan 22, 2025 • 7 min read

    TabPFN for Chemical Datasets

    TabPFN is a new transformer-based foundation model that claims to handle tabular data in a single, lightning-fast forward pass. I decided to put it to the test on several molecular property benchmarks to see how it holds up out of the box.

  • May 18, 2024 • 15 min read

    Working with Large Virtual Chemical Libraries: Part 1 - Active Learning

    If a computational scoring function takes just one second per molecule, screening a billion-compound library would take nearly 32 years. In part one of this series, I look at how we can use active learning loops to train a machine learning model, allowing us to intelligently hunt down the highest-performing molecules without exhaustively testing the whole library.

I am a data scientist and cheminformatician, originally from the UK, but often found in Aotearoa (New Zealand). I'm interested in using data science and machine learning to solve problems in drug discovery. When not in front of a computer, I can usually be found in the mountains or on the water.