Motivation: One goal of metabolomics is to define and monitor the
entire metabolite complement of a cell, but this goal remains far from reach
because systematic and rapid approaches for determining the biotransformations
of newly discovered metabolites are lacking. In drug
development, the metabolic biotransformation of a new chemical
entity (NCE) is of particular interest because it may profoundly affect the compound's
bioavailability, activity and toxicity profile. The use of in silico methods
to predict the site of metabolism (SOM) in phase I cytochrome
P450-mediated reactions is usually a starting point for metabolic
pathway studies, which may also assist in the process of drug/lead
Results: This paper reports CYP450-mediated SOM prediction
for the six most important metabolic reaction types, combining
machine learning with semi-empirical quantum chemical calculations.
Non-local models were developed on the basis of a large
dataset comprising 1858 metabolic reactions extracted from 1034
heterogeneous chemicals. In validation, the overall accuracies for
all six reaction types were higher than 0.81, and four of them exceeded
0.90. In further receiver operating characteristic (ROC) analyses,
each of the SOM models gave a significant area under the curve (AUC)
value over 0.86, indicating good predictive power. An external test
was performed on a previously published dataset (Sheridan et al.,
2007), in which 80% of the experimentally observed SOMs were
correctly identified by applying the full set of our SOM models.
Availability: The program package SOME_v1.0 (Site Of Metabolism
Estimator) developed based on our models is available at
Contact: hljiang@mail.shcnc.ac.cn