TY - JOUR
T1 - Machine Learning for Toxicity Prediction Using Chemical Structures
T2 - Pillars for Success in the Real World
AU - Seal, Srijit
AU - Mahale, Manas
AU - García-Ortegón, Miguel
AU - Joshi, Chaitanya K.
AU - Hosseini-Gerami, Layla
AU - Beatson, Alex
AU - Greenig, Matthew
AU - Shekhar, Mrinal
AU - Patra, Arijit
AU - Weis, Caroline
AU - Mehrjou, Arash
AU - Badré, Adrien
AU - Paisley, Brianna
AU - Lowe, Rhiannon
AU - Singh, Shantanu
AU - Shah, Falgun
AU - Johannesson, Bjarki
AU - Williams, Dominic
AU - Rouquie, David
AU - Clevert, Djork-Arné
AU - Schwab, Patrick
AU - Richmond, Nicola
AU - Nicolaou, Christos A.
AU - Gonzalez, Raymond J.
AU - Naven, Russell
AU - Schramm, Carolin
AU - Vidler, Lewis R.
AU - Mansouri, Kamel
AU - Walters, W. Patrick
AU - Wilk, Deidre Dalmas
AU - Spjuth, Ola
AU - Carpenter, Anne E.
AU - Bender, Andreas
N1 - Publisher Copyright: © 2025 The Authors. Published by American Chemical Society.
PY - 2025/5/19
Y1 - 2025/5/19
AB - Machine learning (ML) is increasingly valuable for predicting molecular properties and toxicity in drug discovery. However, toxicity-related end points have always been challenging to evaluate experimentally with respect to in vivo translation due to the required resources for human and animal studies; this has impacted data availability in the field. ML can augment or even potentially replace traditional experimental processes depending on the project phase and specific goals of the prediction. For instance, models can be used to select promising compounds for on-target effects or to deselect those with undesirable characteristics (e.g., off-target or ineffective due to unfavorable pharmacokinetics). However, reliance on ML is not without risks, due to biases stemming from nonrepresentative training data, incompatible choice of algorithm to represent the underlying data, or poor model building and validation approaches. This might lead to inaccurate predictions, misinterpretation of the confidence in ML predictions, and ultimately suboptimal decision-making. Hence, understanding the predictive validity of ML models is of utmost importance to enable faster drug development timelines while improving the quality of decisions. This perspective emphasizes the need to enhance the understanding and application of machine learning models in drug discovery, focusing on well-defined data sets for toxicity prediction based on small molecule structures. We focus on five crucial pillars for success with ML-driven molecular property and toxicity prediction: (1) data set selection, (2) structural representations, (3) model algorithm, (4) model validation, and (5) translation of predictions to decision-making. Understanding these key pillars will foster collaboration and coordination between ML researchers and toxicologists, which will help to advance drug discovery and development.
UR - https://www.scopus.com/pages/publications/105004762127
U2 - 10.1021/acs.chemrestox.5c00033
DO - 10.1021/acs.chemrestox.5c00033
M3 - Review article
C2 - 40314361
AN - SCOPUS:105004762127
SN - 0893-228X
VL - 38
SP - 759
EP - 807
JO - Chemical Research in Toxicology
JF - Chemical Research in Toxicology
IS - 5
ER -