Jon Swain

Ultra-Large Libraries

  • Nov 26, 2025 • 17 min read

    Working with Large Virtual Chemical Libraries: Part 3 - Thompson Sampling for Classification

    Exhaustively screening billion-compound virtual libraries would take decades, so we need smarter ways to hunt for molecules. I look at how we can adapt Thompson Sampling, a classic reinforcement learning technique, using the Beta distribution to efficiently find active compounds without breaking our computers.


  • Jan 2, 2025 • 12 min read

    Working with Large Virtual Chemical Libraries: Part 2 - Genetic Algorithms

    When a virtual library is too large to screen one molecule at a time, genetic algorithms offer an elegant way out. In part two of this series, I explore how biologically inspired selection can navigate huge combinatorial spaces using only building-block data.


  • May 18, 2024 • 15 min read

    Working with Large Virtual Chemical Libraries: Part 1 - Active Learning

    If a computational scoring function takes just one second per molecule, screening a billion-compound library would take nearly 32 years. In part one of this series, I look at how we can use active learning loops to train a machine learning model, allowing us to intelligently hunt down the highest-performing molecules without exhaustively testing the whole library.



I am a data scientist and cheminformatician, originally from the UK, but often found in Aotearoa (New Zealand). I'm interested in using data science and machine learning to solve problems in drug discovery. When not in front of a computer, I can usually be found in the mountains or on the water.