Drug companies could save millions on research by employing a mathematical strategy akin to the one Netflix uses to suggest movies you’ll love.
Scientists have developed an algorithm that ranks chemical compounds based on their potential drug activity. The new technique, reported online in the Journal of Chemical Information and Modeling, outperforms current computational methods that seek therapeutic needles in enormous chemical haystacks.
The cost of developing a new drug is estimated by industry to be more than $1 billion. Much of this expense comes from the cost of pursuing initially promising molecules that ultimately fail, notes study coauthor Shivani Agarwal of MIT’s Computer Science and Artificial Intelligence Laboratory. Of 10,000 tested compounds, only one or two will make it to market.
Many researchers are turning to computers for help sorting through the millions of molecules in chemical libraries. Machine-learning techniques, which train computers with known solutions to a problem so they can then seek novel solutions on their own, can help researchers focus on the small number of really promising molecules, says Cynthia Rudin, an MIT expert in machine learning who was not involved with the research.
Such learning methods are used by outfits such as Netflix, where users rate how much they like a movie, and the computer uses that information to suggest others of the same ilk.
The new work used a particular kind of machine-learning algorithm, known as a ranking algorithm. Rather than just estimating the probability that a molecule will have some desired activity, the computer also ranks the molecules against one another. This can bring the cream of the crop to the top. So instead of knowing only that someone is a fan of 30 particular movies, the ranking method also asks: Is Chinatown better than Star Wars? Is Parenthood better than The Big Lebowski?
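To make the idea of comparing molecules with each other concrete, here is a minimal sketch of one common way to grade a ranking: count how often a known active compound ends up scored below an inactive one. The function name, the scores, and the labels are all hypothetical, and this is a generic measure chosen for illustration, not necessarily the one used in the study.

```python
from itertools import product


def pairwise_ranking_error(scores, labels):
    """Fraction of active/inactive pairs ranked in the wrong order.

    A pair is wrong when the inactive compound scores higher than the
    active one; a tie counts as half an error.
    """
    actives = [s for s, y in zip(scores, labels) if y == 1]
    inactives = [s for s, y in zip(scores, labels) if y == 0]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    errors = 0.0
    for a, i in product(actives, inactives):
        if a < i:
            errors += 1.0
        elif a == i:
            errors += 0.5
    return errors / (len(actives) * len(inactives))


# Hypothetical scores for five compounds; a label of 1 marks a known active.
scores = [0.9, 0.2, 0.6, 0.4, 0.1]
labels = [1, 0, 0, 1, 0]
print(pairwise_ranking_error(scores, labels))
```

In this toy example there are six active/inactive pairs, and only one of them is out of order (the active compound scored 0.4 sits below the inactive compound scored 0.6), so the error is one in six. A perfect ranking would score zero.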
The researchers began with five data sets, each of which had 50 compounds of known therapeutic activity buried among 2,092 inactive molecules. Half of these molecules were used to train the algorithm, by feeding it chemical information such as bond length, molecular weight, and which parts of the molecule have electrical charge. After being told which 25 of the 1,046 training compounds were winners for each data set, the algorithm had to figure out how to weigh all of the chemical information in order to find the other 25 promising compounds.
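The training step described above can be sketched in a few lines. This is an illustrative toy only: it assumes each compound is reduced to two made-up numerical descriptors, and it learns the weights with a generic pairwise logistic loss, which is a stand-in and not the algorithm from the study. The function names and all of the data are hypothetical.

```python
import math
import random


def train_pairwise_ranker(features, labels, epochs=50, lr=0.1, seed=0):
    """Learn weights w so that actives score higher than inactives.

    Runs stochastic gradient descent on the pairwise logistic loss
    log(1 + exp(-(w.x_active - w.x_inactive))) over all active/inactive pairs.
    """
    rng = random.Random(seed)
    actives = [x for x, y in zip(features, labels) if y == 1]
    inactives = [x for x, y in zip(features, labels) if y == 0]
    pairs = [(a, i) for a in actives for i in inactives]
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        rng.shuffle(pairs)
        for a, i in pairs:
            diff = [xa - xi for xa, xi in zip(a, i)]
            margin = sum(wj * dj for wj, dj in zip(w, diff))
            # Derivative of the loss with respect to the margin;
            # the cap only guards against overflow in exp().
            grad = -1.0 / (1.0 + math.exp(min(margin, 500.0)))
            w = [wj - lr * grad * dj for wj, dj in zip(w, diff)]
    return w


def rank_compounds(w, features):
    """Return compound indices ordered from highest to lowest score."""
    scores = [sum(wj * xj for wj, xj in zip(w, x)) for x in features]
    return sorted(range(len(features)), key=lambda k: scores[k], reverse=True)


# Hypothetical training data: two descriptors per compound, 1 = known active.
train_X = [[2.0, -1.0], [2.1, 0.0], [2.2, 1.0],
           [0.3, -1.0], [0.5, 0.0], [0.9, 1.0], [1.1, 0.0], [0.1, 1.0]]
train_y = [1, 1, 1, 0, 0, 0, 0, 0]

# Hypothetical held-out compounds whose activity the ranker has not seen.
held_out_X = [[0.4, 1.0], [2.3, 0.0], [0.8, -1.0], [1.9, -1.0]]

w = train_pairwise_ranker(train_X, train_y)
ranking = rank_compounds(w, held_out_X)
print(w, ranking)
```

The learned weights play the role of deciding how much each piece of chemical information counts. The held-out compounds are then sorted by their weighted score, and the ones at the top of the list are the candidates a chemist would test first.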
The technique outperformed, albeit modestly, standard algorithms that had scoured the same data. But this is just the beginning, says Agarwal. She and her colleagues are developing other ranking algorithms that concentrate on getting the very top of the list right, where the most promising compounds should sit. “It could really help pinpoint compounds,” she says.
“Ranking methods are relatively new, and have not been discovered by many industries,” adds Rudin. “My guess is that these new methods would really make a difference in finding the best candidates.”