Ghose 1999
1999, 1, 55-68
The discovery of various protein/receptor targets from genomic research is expanding rapidly. Along with the
automation of organic synthesis and biochemical screening, this is bringing a major change in the whole field of
drug discovery research. In the traditional drug discovery process, the industry tests compounds in the thousands.
With automated synthesis, the number of compounds to be tested could be in the millions. This two-dimensional
expansion will lead to a major demand for resources, unless the chemical libraries are made wisely. The
objective of this work is to provide both quantitative and qualitative characterization of known drugs which will
help to generate "drug-like" libraries. In this work we analyzed the Comprehensive Medicinal Chemistry (CMC)
database and seven different subsets belonging to different classes of drug molecules. These include some central
nervous system active drugs and cardiovascular, cancer, inflammation, and infection disease states. A
quantitative characterization based on computed physicochemical property profiles such as log P, molar
refractivity, molecular weight, and number of atoms as well as a qualitative characterization based on the
occurrence of functional groups and important substructures are developed here. For the CMC database, the
qualifying range (covering more than 80% of the compounds) of the calculated log P is between -0.4 and 5.6,
with an average value of 2.52. For molecular weight, the qualifying range is between 160 and 480, with an
average value of 357. For molar refractivity, the qualifying range is between 40 and 130, with an average value
of 97. For the total number of atoms, the qualifying range is between 20 and 70, with an average value of 48.
Benzene is by far the most abundant substructure in this drug database, slightly more abundant than all the
heterocyclic rings combined. Nonaromatic heterocyclic rings are twice as abundant as the aromatic heterocycles.
Tertiary aliphatic amines, alcoholic OH and carboxamides are the most abundant functional groups in the drug
database. The effective range of physicochemical properties presented here can be used in the design of drug-like
combinatorial libraries as well as in developing a more efficient corporate medicinal chemistry library.
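
The qualifying ranges quoted above can be read as a simple property filter. The following is a minimal sketch, assuming the four properties have already been computed by some external tool; the `Descriptors` container and the function names are hypothetical and are not part of the original work.

```python
from dataclasses import dataclass

# Qualifying ranges (covering more than 80% of the CMC compounds) as quoted in the abstract.
QUALIFYING_RANGES = {
    "alogp": (-0.4, 5.6),
    "molar_refractivity": (40.0, 130.0),
    "molecular_weight": (160.0, 480.0),
    "atom_count": (20, 70),
}


@dataclass
class Descriptors:
    """Precomputed properties of one compound (hypothetical container)."""
    alogp: float
    molar_refractivity: float
    molecular_weight: float
    atom_count: int


def violations(d):
    """Return the names of the properties that fall outside their qualifying range."""
    out = []
    for name, (low, high) in QUALIFYING_RANGES.items():
        value = getattr(d, name)
        if not (low <= value <= high):
            out.append(name)
    return out


def in_qualifying_range(d):
    """True when all four properties lie inside their qualifying ranges."""
    return not violations(d)
```

A compound with the CMC average values (ALOGP 2.3, AMR 96.7, MW 357, 48 atoms) passes this filter, whereas a compound with ALOGP 6.2 and MW 520 is flagged on both properties. Treating the ranges as a hard pass/fail test is a simplification chosen for the sketch; the ranges could equally be used to rank or prioritize compounds.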
1. Introduction
It is widely believed1,2 that the pharmaceutical and biotechnology industry will be one of the most active
industrial fields in the next century because of the information explosion in the field of genomics. The number
of target proteins that can yield important therapeutic agents is expected to increase dramatically in the near
future. The pharmaceutical drug discovery research is currently undergoing a tremendous change due to automated
parallel organic synthesis3,4 (combinatorial chemistry) and high-throughput biochemical screening.4 However, it
is necessary to avoid the pitfall of combinatorial explosion because, although the cost of high-throughput
screening or automated synthesis per compound may be low, it will become fairly expensive when multiplied by
millions. There are several ways to decrease the cost to a manageable level: (i) understanding the target protein
structure and designing focused compounds or libraries that fit the protein binding site;5-11 (ii) designing
compounds or libraries around "hits" that are often identified from the initial screening of the existing
corporate libraries, using pharmacophore modeling and three-dimensional quantitative structure-activity
relationship (3D-QSAR) methods;12-18 (iii) developing ultrasensitive high-throughput screening (HTS).4,19 The
design process can be further streamlined by focusing on "drug-like" molecules. A convenient starting point to
develop a consensus definition(s) of a drug-like molecule is to analyze the databases of known pharmaceutical
agents. As a first step, it is necessary to identify biologically and pharmacologically relevant properties which
are easily computable from the structure. In this context, it will be instructive to analyze the physicochemical,
topological, and electronic properties of all known drugs and compare the properties of different classes of
drugs. It will be easy to formulate a consensus definition if the drug molecules are clustered in one or more
property spaces. An earlier analysis20 of known drugs pertained to molecular frameworks and employed shape
description methods to prepare a list of common drug shapes. Another analysis21 of known drugs (the Comprehensive
Medicinal Chemistry (CMC) database) and other databases such as the Available Chemicals Directory (ACD) has been
devoted to identify the criteria to use in the selection of compounds for screening. This study
10.1021/cc9800071 CCC: $18.00 © 1999 American Chemical Society
Published on Web 12/18/1998
56 Journal of Combinatorial Chemistry, 1999, Vol. 1, No. 1 Ghose et al.
Table 1. Classes of Drugs Identified from the CMC Database Based on Keyword Searches and Their Average
Physicochemical Properties^a

database no.  disease type    keyword           no. of drugs  ALOGP       AMR           MW         no. of atoms
1             CMC (clean)                       6304          2.30 (2.6)  96.7 (45.3)   357 (174)  48.4 (25.0)
2             inflammation    antiinflammatory  290           3.09 (1.5)  89.2 (31.0)   335 (122)  43.3 (19.0)
3             CNS             antidepressant    208           3.05 (1.5)  85.8 (19.3)   291 (69)   42.0 (9.7)
4             CNS             antipsychotic     105           4.10 (1.5)  108.0 (22.8)  380 (83)   51.7 (12.2)
5             cardiovascular  antihypertensive  269           1.97 (2.1)  97.7 (33.5)   361 (123)  48.3 (18.2)
6             CNS             hypnotic          74            2.20 (1.5)  70.2 (25.3)   277 (99)   33.6 (11.5)
7             cancer          antineoplastic    349           1.59 (2.5)  87.6 (36.5)   332 (129)  44.1 (19.5)
8             infection       antiinfective     39            2.38 (2.7)  89.0 (42.9)   339 (139)  41.7 (25.2)

^a Mean and standard deviation (in parentheses) are listed for (a) calculated log P (ALOGP), (b) calculated molar
refractivity (AMR), (c) molecular weight (MW), and (d) number of atoms.
showed that the MDL keys provide at least one way to eliminate compounds least likely to satisfy
"drug-likeness". Lipinski et al.22 studied the USAN (United States Adopted Names) compound list in terms of the
computed lipophilicity by Moriguchi method23 and gave a set of rules ("the rule of 5"). According to "the rule of
5", a poor permeation or absorption is more likely when there are more than 5 H-bond donors, 10 H-bond acceptors,
the molecular weight is greater than 500, and the calculated log P (Clog P) is greater than 5 (or Moriguchi log P
> 4.15); compound classes that are substrates for biological transporters are exceptions to the rule. Hydrogen
bond donors, acceptors, and molecular weight are easily computed for any library, real or virtual, but
calculating lipophilicity accurately entails choosing a well-tested, commercially available method that is easy
to use and is generally applicable to all classes of organic molecules of medicinal interest. A recent study24
shows that the ALOGP method24-26 satisfies these criteria and is therefore likely to be more useful. The main
objective of the present work is to profile some pharmacologically relevant physicochemical properties including
log P (using the ALOGP24-26 method) and pharmacophorically relevant chemical functionalities of some important
classes of drugs along with the whole CMC data set, in order to empirically define a drug-like molecule. This, in
turn, will help to design the drug-like combinatorial libraries and to develop guidelines for prioritizing large
sets of compounds for biological testing.

2. Materials and Methods
(a) Molecular Databases. There are several commercially available drug-related databases, e.g., Chapman and
Hall's Dictionary of Pharmacological Agents27 which contains over 30 000 compounds of pharmacological interest,
the MDL Drug Data Report (MDDR) which has the structures and biological activity data for over 85 000
compounds,28 and the CMC database (version 97.1) which contains the structures and biological properties of 7183
compounds.28 We used here the CMC database, since the major source for compounds in this compendium was the drugs
identified by either USAN29 or INN (International Nonproprietary Names)29 generic names. These names include
practically all medicinal agents or compounds intended for clinical study in the advanced countries. In other
words, the CMC database is by far the closest database of drug-like molecules. The other databases contain a
large fraction of early discovery stage compounds which may not be drug-like. The current CMC version (v. 97.1)
contains 7183 structures. However, it has several classes of compounds such as diagnostic imaging agents,
solvents, and pharmaceutical aids that are important in the pharmaceutical industry but not necessarily
drug-like. The analyzed CMC database was therefore cleaned of these agents (see also ref 20). The search
expression for the removed agents was built from ISIS "query builder" and had the syntax

MOL > CLASS > class like "%keyword%"

where the keyword was one of the following: radiopaque, contrast, disinfectant, spermicide, wetting, flavoring,
pharmaceutical aid, surgical aid, dental, surfactant, sunscreen, ultraviolet screen, preservative, aerosol,
chelator, insecticide, astringent, herbicide, solvent, laxative, sweetener, adhesive, dentistry, veterinary,
buffer, scabicide. In addition we screened compounds with elements X (a symbolic representation of a resin), Li,
Be, and various transition elements as well as a few compounds with radioactive elements. A few drugs with a
mixture of more than one compound such as haloquinol were also removed. All this screening dropped the number of
"acceptable" compounds to 6454. The substructure search was done with this data set. However, during the
physicochemical property calculation a few compounds were deleted because of complex structures or unavailable
parameters. This database had 6304 compounds.
In addition to the whole CMC database, we analyzed several drug classes such as central nervous system (CNS),
cardiovascular, cancer, inflammation, and infectious diseases. These types and the specific keywords used in
these searches are shown in Table 1.
(b) Molecular Physicochemical Properties. The selection of physicochemical properties in the profiling may need
some discussion. Experimentally determined values are not directly useful in the design process since we need the
properties before the compounds are made. The best choice may be the experimentally measurable (pharmaceutically
and biologically relevant) properties that can be computed reliably. Calculated log P, molar refractivity, number
of hydrogen bond donors and acceptors, and molecular size (number of atoms and molecular weight) are some
examples of the properties that satisfy the criterion. Mathematical properties such as topological indices and
functional characteristics such as substructure counts may also be interesting to study. The quantum chemical
properties such as highest occupied molecular orbital and lowest unoccupied molecular orbital30 often are
conformation dependent and are
Characterization of Known Drug Databases Journal of Combinatorial Chemistry, 1999, Vol. 1, No. 1 57
difficult to study for large data sets. In the current study we included molecular weight, number of atoms,
calculated log P using the ALOGP24,31 method, and molar refractivity using the AMR25,31,32 method. The latter
property is related to the volume of the molecule and its molecular weight.
(c) Computational Steps for Physicochemical Profiling and Comparison of Drug Classes. The various computational
steps in this analysis are summarized below: (i) once the clean CMC and other drug class lists were created,
structures were exported from ISIS as SD files; (ii) the SD files were then converted into Galaxy databases for
the calculation of relevant properties, taking care to correct the representation of functionalities such as
nitro or N-oxides which are represented differently in ISIS28 and Galaxy;24 (iii) most of the simple hydrogen
halide salts were converted to the corresponding neutral system for a better computation of the overall
physicochemical properties; (iv) analysis of the physicochemical properties (range and percentile distribution)
was performed using the "Database Analysis" module of the Galaxy software. We determined two different ranges for
the physicochemical properties: the qualifying range which covers approximately 80% of the drugs studied and the
preferred range which covers approximately 50% of the drugs. Having two ranges may be useful as the distribution,
being γ type,33 may need a considerably larger range to cover 80% of known drugs, whereas the search/design for
new drugs may be more efficient if we simply consider compounds having a considerably shorter property range
(preferred range!) which has a high density of occupation within the qualifying range. On the basis of a careful
analysis of the property histograms showing the ranges occupied by different percentages of drugs in each drug
class and in the CMC database, the following definition of the preferred range appeared satisfactory: the
preferred range is the smallest range within the qualifying range occupied by approximately 50% of the drugs. It
is thus necessary to determine both the interval and location of these ranges. The interval of the range was
determined by starting from (mean - standard deviation) to (mean + standard deviation). The range was expanded
symmetrically on either side of the mean if it did not cover approximately 80% (or 50%) of the drugs and
contracted if it was considerably higher. Once the interval of the range was determined, it was shifted on either
side until the most densely populated area was obtained.
(d) Analysis of the Chemical Functionalities and Important Substructures. We determined the frequency of
occurrence of common organic functional groups and aromatic ring systems and a few interesting structural
moieties in the CMC database and in a few different drug classes. The substructure search was done using the ISIS
software.28 The search queries sometimes were a simple structure in a substructure search; sometimes they were a
complex query with the presence or absence of several substructures. These organic functional groups, ring
systems, and other structural moieties are shown in Figure 1.

3. Results and Discussion
The objectives of this work are (i) to develop a consensus definition of a drug-like molecule, physicochemically
and structurally; (ii) to compare property distributions among different drug classes and the complete database
and analyze the differences; (iii) to develop a practical strategy for designing combinatorial libraries or a
corporate medicinal chemistry compound library.
Physicochemical Properties. Molecular lipophilicity and molar refractivity of drug molecules are long known to be
important features strongly influencing receptor binding, cellular uptake, and bioavailability. As fragmental
constants, they are used to represent the hydrophobic and dispersive (van der Waals) interactions,32
respectively. These properties are used in QSAR34,35 and 3D-QSAR12,13,36,37 studies. Thus the range and
distribution of these properties can be used to fingerprint or characterize a library or drug class. It must,
however, be cautioned that this characterization pertains to the property of the molecule and not to the
distribution of the property within the molecule. Nevertheless, it may be regarded as a useful filter, and hence
it may be used to develop a consensus definition of drug-like character. The mean values and the corresponding
standard deviations of these properties are shown in Table 1. The frequency distributions of these properties are
shown in Figures 2-5.
Table 1 shows the average values calculated for log P (ALOGP24-26) and molar refractivity (AMR25,31,32) using the
atomic constant approach. The average ALOGP24-26 value of the CMC database is 2.3 with a standard deviation of
2.6. The qualifying and preferred ranges for the whole CMC database as well as the seven drug classes are shown
in Table 2. The qualifying range for the CMC database is -0.4 to 5.6 (see Figure 2 and Table 2). The
corresponding preferred range is 1.3 to 4.1. It may be interesting to analyze the compounds which are well beyond
this range; for example, the very hydrophilic drugs whose ALOGP24-26 values were less than -5.0 are shown in
Table 3. These compounds are mostly polyhydroxy polyamine antibacterial compounds, unblocked (zwitterionic)
peptides, and quaternary ammonium salts. Unlike the antibacterial compounds, the peptides and quaternary salts
have several hydrophobic cores in their structures. The antibacterial compounds are definitely a special class of
biologically active compounds, which are very different from the regular drugs. These findings clearly show that
unless one is interested in a special class of drugs such as antibacterial, the chance of success is at least 1
order of magnitude higher if we keep log P of the compound within -0.4 to 5.6. The analysis of the drugs whose
ALOGP24-26 values were greater than 7.0 (Table 4) did not show any predominant class, although quite a few
steroids were in this list. Most of these compounds had a relatively high molecular weight (>500). Some of these
compounds had a very hydrophobic hydrolyzable group. It is possible that some of these compounds were prodrugs,38
and the hydrophobic group helped to cross the cell membrane, blood brain barrier, or to enhance its chemical
stability, etc. Some of these outliers may resemble some naturally occurring compounds of the body and may have
an active transport mechanism over passive transport.39 Both of these lists showed some compounds that did not
have any "drug class" in the CMC database. These compounds were deleted from the analysis.
It is seen that CNS drugs differed considerably in their
Figure 1. Representation of substructures searched for the comparative analysis of drug databases of Table 1. In these structures, the atoms
are carbon unless otherwise indicated. "A" represents any element. The bonds with dashed lines stand for aromatic or delocalized bonds.
lipophilicity. The antipsychotic drugs are considerably more hydrophobic (mean log P = 4.10) than antidepressant
(mean log P = 3.10) or hypnotic (mean log P = 2.20) drugs. The standard deviation of log P for all the three
classes of CNS active drugs was 1.5 which is considerably lower than several other classes of drugs such as
anticancer, cardiovascular, or antiinfective drugs. This is due to the requirement that they should cross the
blood brain barrier. Surprisingly, the antiinflammatory drugs are very similar to the antidepressant drugs in
their physicochemical property profile. The anticancer drugs are the least lipophilic compounds with a high
standard deviation. This is a consequence of the fact that cancer is a complex disease affecting different parts
of the body and tissues and often works with chemical brute force rather than milder physical interactions. The
antihypertensive drugs are more hydrophilic than the average CMC compound, and they have a reasonably high
standard deviation. Despite the low number of the antiinfective drugs, the standard deviation is fairly high for
this drug class. Analysis of the ALOGP24-26 distribution curve shows that most drug classes have very distinct
sharp peaks. These peaks are distributed over the ALOGP24-26 range of 1.0 to 4.0 (see Figure 2). Because of this,
the distribution curve of the whole CMC database is considerably flattened compared to any particular drug class.
Although the distribution peaks for different classes are different, their overall distributions overlap
considerably as indicated by their overlapping qualifying and preferred ranges (Table 2). This indicates that
most drugs should fall within a particular range of log P to satisfy a proper physiological distribution and it
may be a tool for fine-tuning the efficacy of a drug, though log P by itself does not determine the drug class.
Molecular weight, the number of atoms, and molar refractivity all are related to molecular size. Although the
average value, the standard deviation, and the distributions of all these properties are given in Tables 1 and 2
and Figures 2-5, we will analyze only the molecular weight in a greater detail. The CMC database has an average
molecular weight of 357 and a standard deviation of 174. The average molecular weights of the drug classes are
close, with the
Figure 2. Histogram plots of octanol-water log P (ALOGP) distributions for the CMC database and a few other drug classes as described
in Table 1.
Figure 3. Histogram plots of molar refractivity (AMR) distributions for the CMC database and a few other drug classes as described in
Table 1.
exception of the antidepressant and hypnotic drug classes which are somewhat smaller in their average molecular
weight. All three classes of CNS active drugs we studied here had considerably smaller standard deviations. We
also analyzed the compounds which had a molecular weight significantly outside this range. The compounds in the
CMC database which had a molecular weight less than 150 are shown in Table 5. The most important drug classes
found
Figure 4. Histogram plots of molecular weight (MW) distributions for the CMC database and a few other drug classes as described in
Table 1.
Figure 5. Histogram plots of total number of atoms distributions for the CMC database and a few other drug classes as described in Table
1.
in this list are adrenergic, antineoplastic, anesthetic, anticonvulsant, and nutrient (amino acids). Many of the
drug classes such as antineoplastic, anesthetic, and nutrient should be considered differently! In new drug
discovery, we will probably not miss many important drugs if we avoid molecules that are too small. The unusually
large drugs with a
Table 2. Qualifying (Covering 80% of the Total Number of Compounds in That Class) and Preferred (Covering 50% of
the Total Number of Compounds) Ranges of the Various Physicochemical Properties in Different Drug Classes

no.  drug class    ALOGP 80%  ALOGP 50%  AMR 80%  AMR 50%  MW 80%    MW 50%    atoms 80%  atoms 50%
1    CMC clean     -0.4; 5.6  1.3; 4.1   40; 130  70; 110  160; 480  230; 390  20; 70     30; 55
2    inflammatory  1.4; 4.5   2.6; 4.2   59; 119  67; 97   212; 447  260; 380  24; 59     28; 40
3    depressant    1.4; 4.9   2.1; 4.0   62; 114  75; 95   210; 380  260; 330  32; 56     37; 48
4    psychotic     2.3; 5.2   3.3; 5.0   85; 131  94; 120  274; 464  322; 422  40; 63     49; 61
5    hypertensive  -0.5; 4.5  1; 3.4     54; 128  68; 116  206; 506  281; 433  28; 66     36; 58
6    hypnotic      0.5; 3.9   1.3; 3.5   43; 97   43; 73   162; 360  212; 306  20; 45     29; 38
7    neoplastic    -1.5; 4.7  0.0; 3.7   43; 128  60; 107  180; 475  258; 388  21; 63     30; 55
8    infective     -0.3; 5.1  0.8; 3.8   44; 144  68; 138  145; 455  192; 392  12; 64     12; 42
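
The procedure used in section 2(c) to obtain such ranges (start from mean ± standard deviation, expand or contract symmetrically until the target fraction is covered, then shift the interval to the most densely populated region) can be sketched as follows. This is an illustrative reading of the text and not the Galaxy "Database Analysis" implementation; the function names, the fixed `step` size, and the exhaustive sliding-window search used for the shifting step are all assumptions.

```python
from statistics import mean, pstdev


def coverage(values, low, high):
    """Fraction of values lying in the closed interval [low, high]."""
    return sum(low <= v <= high for v in values) / len(values)


def symmetric_half_width(values, target, step):
    """Start at mean +/- SD, then widen or narrow symmetrically about the mean."""
    center = mean(values)
    half = pstdev(values)
    if coverage(values, center - half, center + half) < target:
        # Too narrow: expand until the target fraction is covered.
        while coverage(values, center - half, center + half) < target:
            half += step
    else:
        # Too wide: contract while the narrower interval still reaches the target.
        while half > step and coverage(
            values, center - (half - step), center + (half - step)
        ) >= target:
            half -= step
    return half


def find_range(values, target, step):
    """Return (low, high): an interval of fixed width placed on the densest region.

    Assumes step > 0 and 0 < target <= 1.
    """
    center = mean(values)
    half = symmetric_half_width(values, target, step)
    width = 2 * half
    # Begin with the mean-centered interval as the best candidate.
    best_low = center - half
    best_cov = coverage(values, best_low, best_low + width)
    # Assumption: "shifting" is done by sliding a window of the same width
    # across the data and keeping the position that covers the most values.
    start = min(values)
    n_steps = max(0, int((max(values) - width - start) / step))
    for i in range(n_steps + 1):
        low = start + i * step
        cov = coverage(values, low, low + width)
        if cov > best_cov:
            best_low, best_cov = low, cov
    return best_low, best_low + width
```

Calling `find_range(values, 0.8, step)` gives a qualifying-type range and `find_range(values, 0.5, step)` a preferred-type range for a list of computed property values. Note that the sketch locates the 50% window over the whole data range rather than restricting it explicitly to the qualifying range.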
Table 3. List of Unusually Hydrophilic Compounds in the CMC Database (ALOGP < -5.0)
generic name drug class generic name drug class generic name drug class
acarbose α-glucosidase polymyxin b1 antibacterial obidoxime chloride cholinesterase reactivator
(inhibitor)
anthelmycin anthelmintic ristocetin antibacterial pralidoxime chloride cholinesterase reactivator
pentigetide antiallergic butirosin antibacterial alfadex complexing agent
paromomycin antiamebic homatropine anticholinergic edrophonium diagnostic aid
methylbromide (antispasmodic) chloride (myasthenia gravis)
carcainium chloride antiarrhythmic tiametonium iodide anticholinergic pentetic acid diagnostic aid
(antispasmodic)
capreomycin 1b antibacterial edrophonium chloride antidote (curare) thiamine enzyme cofactor
(tuberculostatic)
viomycin antibacterial sinefungin antifungal pentamethonium ganglionic blocker
(tuberculostatic) bromide
streptomycin antibacterial azamethonium antihypertensive trepirium iodide ganglionic blocker
(tuberculostatic) bromide
enviomycin antibacterial pentamethonium antihypertensive dicolinium iodide ganglionic blocker
(tuberculostatic) bromide
tobramycin antibacterial hexamethonium antihypertensive somatostatin gastro-duodenal ulcers (therapeutic
bromide for severe hemorrhage)
propikacin antibacterial ademetionine antiinflammatory eledoisin hypotensive
neomycin b [u] antibacterial lividomycin antimicrobial hymotrinan immunological agent
(broad spectrum)
arbekacin antibacterial oxiglutatione antineoplastic acemannan immunomodulator
isepamicin antibacterial prospidium chloride antineoplastic prolonium iodide iodine source
dihydrostreptomycin antibacterial peplomycin antineoplastic eledoisin lacrimal stimulant
dibekacin antibacterial talisomycin antineoplastic oxydipentonium muscle relaxant (skeletal)
chloride
ribostamycin antibacterial bleomycin a2 antineoplastic alcuronium chloride muscle relaxant (skeletal)
amphomycin antibacterial capreomycin 1b antitubercular benzoquinonium muscle relaxant (skeletal)
(antimycotic) chloride
bluensomycin antibacterial viomycin antitubercular succinylcholine muscle relaxant (skeletal)
(antimycotic) chloride
streptonicozid antibacterial streptomycin antitubercular hexcarbacholine muscle relaxant (skeletal)
(antimycotic) bromide
kanamycin antibacterial enviomycin antitubercular succinylcholine neuromuscular blocker
(antimycotic) chloride
furazolium chloride antibacterial tiametonium iodide antitussive gallamine neuromuscular blocker
triethiodide
bekanamycin antibacterial acemannan antiviral suxamethonium neuromuscular blocker
bromide
butikacin antibacterial goralatide bone marrow suxethonium neuromuscular blocker
therapeutic chloride
colistin antibacterial deltibant bradykinin pendetide scintigraphy (agent)
antagonist
betamicin antibacterial icatibant bradykinin argiprestocin testicular androgen
antagonist biosynthesis (inhibitor)
amikacin antibacterial ambenonium chloride cholinergic terlipressin vasopressor
apramycin antibacterial trimedoxime bromide cholinesterase thiamine vitamin (cofactor), vitamin
reactivator (provitamin)
molecular weight greater than 700 are collected in Table 6. A large portion of these compounds are simply the
repetition of the unusually hydrophilic antibacterial, antifungal, antibiotic, or anticancer drugs. Unless one is
interested in these classes of drugs, making compounds in this molecular weight range may not be an efficient
drug discovery process. The acceptable (qualifying) range here may be 160-480. The structures of the various
"unusual drugs", as given in Tables 3-6, are given in alphabetical order in Comprehensive Medicinal Chemistry29
and may be useful as a reference.
We also studied the correlations of property distributions among these drug classes. These correlation matrices
are given in Table 7. These matrices could be used to infer the similarity among drug classes. For example, with
respect to the log P distributions, (i) antiinflammatory drugs are highly correlated with antidepressants and
least correlated with
Table 4. List of Unusually Hydrophobic Compounds in the CMC Database (ALOGP > 7.0)
generic name drug class generic name drug class generic name drug class
quiflapon 5-lipoxygenase-activating- lecimibide antihyperlipidemic anipamil calcium channel blocker
protein (inhibitor)
flumethrin acaricide melinamide antihyperlipoproteinemic dihydrotachysterol calcium regulator
mebolazine anabolic salafibrate antihyperlipoproteinemic ubidecarenone cardiovascular agent
bolazine anabolic tiafibrate antihyperlipoproteinemic steroidal cellobioside
nandrolone anabolic tocofibrate antihyperlipoproteinemic carbamoyl derivs. cholesterol absorption (inhibitor)
decanoate
myrophine analgesic (narcotic) probucol antihyperlipoproteinemic stearylsulfamide dermatologic
olvanil analgesic clinolamide antihyperlipoproteinemic flumethrin ectoparasiticide
nabitan analgesic sitofibrate antihyperlipoproteinemic declaben emphysema (therapeutic)
menabitan analgesic moctamide antihyperlipoproteinemic cholesterol emulsifier
antrafenine analgesic octimibate antihyperlipoproteinemic stearyl alcohol emulsion adjunct
testosterone androgen terbuficin antihyperlipoproteinemic furostilbestrol estrogen
ketolaurate
nandrolone androgen cetaben antihyperlipoproteinemic estradiol undecylate estrogen
decanoate
dymanthine anthelmintic hexachlorophene antiinfective (topical) cistinexine expectorant
zilantel anthelmintic octenidine antiinfective (topical) benzquercin for capillary fragility
bisbendazole anthelmintic thymol iodide antiinfective prednisolone glucocorticoid
steaglate
bunamidine anthelmintic olvanil antiinflammatory acebrochol hypnotic
ly-290324 antiallergic iralukast antiischemic (platelet gamma oryzanol hypocholesterolemic
aggregation inhibitor)
iralukast antiallergic halofantrine antimalarial orlipastat hypolipidemic
symetine antiamebic lapinone antimalarial tereticornate-a influenza and diarrhea
(therapeutic)
butoprozine antianginal menoctone antimalarial iralukast leukotriene (antagonist)
butoprozine antiarrhythmic diathymosulfone antimycobacterial quiflapon leukotriene-synthesis (inhibitor)
(cardiac depressant)
amiodarone antiarrhythmic buclizine antinauseant tripalmitin lung disorders (therapeutic)
(cardiac depressant)
declaben antiarthritic carzelesin antineoplastic glyceryltrierucate multiple sclerosis (therapeutic)
iralukast antiasthmatic phenesterin antineoplastic duoperone neuroleptic
eldacimibe antiatherosclerotic thalicarpine antineoplastic iralukast phospholipase a2 (inhibitor)
chaulmosulfone antibacterial (leprostatic) verteporfin antineoplastic temoporfin photosensitizer
clofazimine antibacterial (tuberculostatic) bizelesin antineoplastic phytonadione prothrombogenic
(leprostatic)
chloramphenicol antibacterial atrimustine antineoplastic triolein i 125 radioactive agent
palmitate
clindamycin antibacterial aminoquinol antiprotozoal iodocholesterol i 131 radioactive agent
palmitate (leishmaniasis)
clofoctol antibacterial fluphenazine antipsychotic adibenzylnor- radioprotective
enanthate spermidine
quindecamine antibacterial fluphenazine antipsychotic tocofenoxate reverses aging of murine cells
decanoate
biclotymol antibacterial bromperidol antipsychotic acebrochol sedative
decanoate
diathymosulfone antibacterial haloperidol antipsychotic isopropyl palmitate vehicle (oleaginous)
decanoate
gefarnate anticholinergic pipotiazine antipsychotic ergocalciferol vitamin (antirachitic)
(antispasmodic) palmitate
menatetrenone anticoagulant penfluridol antipsychotic ergocalciferol vitamin (cofactor)
nabazenil anticonvulsant cholecalciferol antirachitic vitamin e vitamin (cofactor)
butoprozine antidepressant chloramphenicol antirickettsial phytonadione vitamin (cofactor)
palmitate
amiodarone antidepressant teroxalene antischistosomal cholecalciferol vitamin (cofactor)
naboctate antiemetic clofazimine antitubercular ergocalciferol vitamin (provitamin)
(antimycotic)
idoxifene antiestrogen dimyristoyl- antiviral vitamin e vitamin (provitamin)
phosphatidyl-azt (aids therapeutic)
thymol iodide antifungal cicloxolone antiviral (herpes genitalis) phytonadione vitamin (provitamin)
naboctate antiglaucoma agent avridine antiviral cholecalciferol vitamin (provitamin)
eldacimibe antihyperlipidemic perfluamine blood substitute scarlet red vulnerary
anticancer compounds; (ii) antidepressants are also least correlated with anticancer compounds; (iii)
antipsychotics are generally less correlated with other drug classes and the least with antihypertensives; (iv)
antihypertensives are well correlated with most drug classes and the most with hypnotics; (v) anticancer
compounds are understandably less correlated with most drug classes in general, but have the best correlation
with antiinfectives. With respect to molar refractivity distributions, (i) antiinflammatory drugs are highly
correlated with anticancer compounds and least with antipsychotics; (ii) antidepressants are least correlated
with antiinfectives; (iii) antipsychotics are least correlated with hypnotics. With respect to molecular weight
distributions, (i) antiinflammatory drugs are highly correlated with anticancer compounds and least correlated
with antipsychotic compounds; (ii) antidepressants are least correlated with antipsychotic compounds; (iii)
antipsychotics are generally less correlated with other drug classes and the least with
Characterization of Known Drug Databases Journal of Combinatorial Chemistry, 1999, Vol. 1, No. 1 63
Table 5. List of Unusually Low Molecular Weight Compounds in the CMC Database (MW < 150)
generic name drug class generic name drug class generic name drug class
glutamic acid acidifier (gastric) metacresol antifungal imexon immunostimulant
racemethionine acidifier (urinary) ornithine antihyperlipoproteinemic ethohexadiol insect repellent
fumaric acid acidifier oxiniacic acid antihyperlipoproteinemic deferiprone iron chelating agent
cyclopentamine adrenergic oxibetaine antihyperlipoproteinemic racemethionine lipotropic
phenpromethamine adrenergic γ-aminobutyric acid antihypertensive methionine lipotropic
tyramine adrenergic carbamide peroxide antiinfective (topical) choline chloride lipotropic
tuaminoheptane adrenergic dimethyl sulfoxide antiinflammatory (topical) mecysteine mucolytic
octodrine adrenergic phenylethyl alcohol antimicrobial agent fampridine multiple sclerosis
(ophthalmic) γ-aminobutyric acid (therapeutic)
fomepizole alcohol dehydrogenase sorbic acid antimicrobial agent neurotransmitter
(inhibitor)
methyl isobutyl alcohol denaturant carzolamide antineoplastic tetrazolylglycine nmda agonist
ketone
tromethamine alkalizer alanosine antineoplastic dimiracetam nootropic
diethanolamine alkalizing agent cycloleucine antineoplastic rolziracetam nootropic
diisopropanolamine alkalizing agent urethane antineoplastic methionine nutrient (amino acid)
trolamine alkalizing agent diazouracil antineoplastic lysine nutrient (amino acid)
trichloroethylene analgesic (inhalation) glycidyl methacrylate antineoplastic isoleucine nutrient (amino acid)
picolamine analgesic fluorouracil antineoplastic aspartic acid nutrient (amino acid)
gaboxadol analgesic thiocarzolamide antineoplastic leucine nutrient (amino acid)
trolamine analgesic imexon antineoplastic serine nutrient (amino acid)
salicylamide analgesic butanediol cyclic sulfite antineoplastic threonine nutrient (amino acid)
acetanilide analgesic dianhydrogalactitol antineoplastic alanine nutrient (amino acid)
trichloroethylene anesthetic (inhalation) imidazopyrazole antineoplastic valine nutrient (amino acid)
cyclopropane anesthetic (inhalation) guanazole antineoplastic proline nutrient (amino acid)
ethylene anesthetic (inhalation) methyl methanesulfonate antineoplastic glycine nutrient
fluroxene anesthetic (inhalation) methylformamide antineoplastic cinnamaldehyde perfume agent
norflurane anesthetic (inhalation) aminothiadiazole antineoplastic isaxonine peripheral neuropathies
(therapeutic)
ether anesthetic (inhalation) hydroxyurea antineoplastic cysteamine radioprotective
vinyl ether anesthetic (inhalation) ammonium lactate antipruritic (topical) glycerin reduces intraocular and
intracranial pressure
salicyl alcohol anesthetic (local) acetanilide antipyretic methyl nicotinate rubefacient
ethyl chloride anesthetic (topical) metacresol antiseptic (topical) monoethanolamine sclerosing agent
cathinone anorexic isoniazid antitubercular (antimycotic) paraldehyde sedative
phentermine anorexic pyrazinamide antitubercular (antimycotic) meparfynol sedative
levamfetamine anorexic cycloserine antitubercular (antimycotic) ethchlorvynol sedative
piperazine anthelmintic (as citrate) cyacetacide antitubercular dextroamphetamine stimulant (central)
metyridine anthelmintic cysteamine antiurolithic methamphetamine stimulant (central)
cyacetacide anthelmintic amitivir antiviral ampyzine stimulant (central)
thiouracil antianginal fosfonet antiviral piracetam stimulant (central)
parachlorophenol antibacterial (topical) dimepranol antiviral amphetamine stimulant (central)
isoniazid antibacterial foscarnet antiviral pentylenetetrazole stimulant (central)
(tuberculostatic)
pyrazinamide antibacterial kethoxal antiviral histamine stimulant (gastric
(tuberculostatic) secretory)
cycloserine antibacterial creatinine bulk agent for freeze-drying cathinone stimulant
(tuberculostatic)
methenamine antibacterial (urinary) heptaminol cardiotonic thiouracil thyroid (inhibitor)
bacitracin a antibacterial phenylpropanol choleretic aminothiazole thyroid (inhibitor)
fosfomycin antibacterial timonacic choleretic methimazole thyroid (inhibitor)
taurultam antibacterial piracetam cognition enhancer mipimazole thyroid (inhibitor)
benzyl alcohol antibacterial allyl isothiocyanate counter-irritant methylthiouracil thyroid (inhibitor)
ornithine anticholesteremic captamine depigmentor pimagedine hcl tooth discoloration
(inhibitor)
vigabatrin anticonvulsant hydroquinone depigmentor cetohexazine tranquilizer
(tardive dyskinesia)
valpromide anticonvulsant cysteine detoxicant emylcamate tranquilizer
trimethadione anticonvulsant ethyl nitrite diaphoretic valnoctamide tranquilizer
milacemide anticonvulsant isosorbide diuretic acetohydroxamic acid urease (inhibitor)
dimethadione anticonvulsant urea diuretic tiformin uremic diabetes
(therapeutic)
valproic acid anticonvulsant ethyl nitrite diuretic cyclopentamine vasoconstrictor
ethosuximide anticonvulsant niacinamide enzyme cofactor octodrine vasoconstrictor
levcycloserine anticonvulsant niacin enzyme cofactor nicotinyl alcohol vasodilator (peripheral)
milacemide antidepressant levcycloserine enzyme gaucher's betahistine vasodilator
disease (inhibitor)
phenelzine antidepressant guaiacol expectorant aminoethyl nitrate vasodilator
octamoxin antidepressant cinnamaldehyde flavor agent niacinamide vitamin (cofactor)
tranylcypromine antidepressant bacitracin a food additive niacin vitamin (cofactor)
mebanazine antidepressant amogastrin gastric secretion stimulant niacinamide vitamin (provitamin)
metformin antidiabetic aminocaproic acid hemostatic niacin vitamin (provitamin)
cysteamine antidote nafarelin acetate hormone agonist adenine vitamin b4
(acetaminophen) (gonadotrophin releasing)
dimercaprol antidote (heavy metal) pidolic acid humectant (as Na salt) allopurinol vulnerary
taurultam antifungal mequinol hyperpigmentation trientine wilson's disease
(therapeutic) (therapeutic)
octanoic acid antifungal paraldehyde hypnotic allopurinol xanthine oxidase
benzoic acid antifungal meparfynol hypnotic (inhibitor)
flucytosine antifungal cetohexazine hypnotic
64 Journal of Combinatorial Chemistry, 1999, Vol. 1, No. 1 Ghose et al.
Table 6. List of Unusually High Molecular Weight Compounds in the CMC Database (MW > 700)
generic name drug class generic name drug class generic name drug class
cp-331 analgesic paulomycin a antibacterial atrimustine antineoplastic
ivermectin b1a anthelmintic clarithromycin antibacterial vinrosidine antineoplastic
(onchocerciasis)
abamectin b1b anthelmintic scopafungin antibacterial peplomycin antineoplastic
fuladectin a3 anthelmintic pristinamycin antibiotic lanreotide antineoplastic
fuladectin a4 anthelmintic ardacin antibiotic talisomycin antineoplastic
iralukast antiallergic penimocycline antibiotic toyomycin antineoplastic
berythromycin antiamebic hamycin a antibiotic vinylglycinate antineoplastic
iralukast antiasthmatic salinomycin anticoccidial vincristine antineoplastic
pamaqueside antiatherosclerotic maduramicin anticoccidial dactinomycin antineoplastic
rifapentine antibacterial seglitide antidiabetic bleomycin a2 antineoplastic
(antitubercular)
rifampin antibacterial lypressin antidiuretic rodorubicin antineoplastic
(antitubercular)
rifamide antibacterial desmopressin antidiuretic vinfosiltine antineoplastic
(antitubercular)
rifabutin antibacterial suramin antifilarial docetaxol antineoplastic
(antitubercular)
14-hydroxycl- antibacterial candicidin antifungal (topical) nogalamycin antineoplastic
arithromycin (broad spectrum) (levorin a2)
maridomycin antibacterial levorin a2 antifungal agent bryostatin-1 antineoplastic
(gram-positive)
chaulmosulfone antibacterial basifungin antifungal agent vinformide antineoplastic
(leprostatic)
gramicidin antibacterial lucimycin antifungal eprinomectin b1b antiparasitic
(topical)
berythromycin antibacterial cilofungin antifungal eprinomectin b1a antiparasitic
mocimycin antibacterial nystatin a1 antifungal partricin a antiprotozoal
virginiamycin antibacterial amphotericin b antifungal pipotiazine palmitate antipsychotic
factor s
coumermycin a1 antibacterial rutamycin antifungal cp-331 antipyretic
rifamexil antibacterial scopafungin antifungal octreotide antisecretory (gastric)
phenyracillin antibacterial partricin a antifungal suramin antitrypanosomal
mirosamicin antibacterial fungimycin antifungal rifapentine antitubercular (antimycotic)
penimepicycline antibacterial itraconazole antifungal rifampin antitubercular (antimycotic)
rifametane antibacterial tiqueside antihyperlipidemic rifamide antitubercular (antimycotic)
lexithromycin antibacterial pantenicate antihyperlipoproteinemic rifabutin antitubercular (antimycotic)
primycin antibacterial glunicate antihyperlipoproteinemic sucrosofate antiulcerative (K salt)
josamycin antibacterial dextrothyroxine antihyperlipoproteinemic ilatreotide antiulcerative
flurithromycin antibacterial etiroxate antihyperlipoproteinemic dimyristoyl- antiviral (aids therapeutic)
phosphatidyl-azt
pristinamycin antibacterial ditekiren antihypertensive streptovarycin c antiviral
(renin inhibitor)
paldimycin b antibacterial saralasin antihypertensive acemannan antiviral
rokitamycin antibacterial sr-43845 antihypertensive l 731723 antiviral
amphomycin antibacterial fk-744 antihypertensive ritonavir antiviral
azithromycin antibacterial bietaserpine antihypertensive palinavir antiviral
erythromycin antibacterial protoveratrine a antihypertensive deltibant bradykinin antagonist
stinoprate
rifaximin antibacterial zankiren antihypertensive icatibant bradykinin antagonist
streptonicozid antibacterial mipragoside antiinflammatory gitoformate cardiac glycoside
erythromycin antibacterial proglumetacin antiinflammatory digitoxin cardiotonic
ethylsuccinate
carbomycin antibacterial cp-331 antiinflammatory acetyldigitoxin cardiotonic
midecamycin antibacterial iralukast antiischemic (platelet lanatoside c cardiotonic
aggregation inhibitor)
kitasamycin antibacterial lividomycin antimicrobial digoxin cardiotonic
(broad spectrum)
erythromycin antibacterial streptovarycin c antimicrobial deslanoside cardiotonic
vancomycin hcl antibacterial triptorelin antineoplastic (prostatic pengitoxin cardiotonic
carcinoma therapeutic)
erythromycin antibacterial leuprolide antineoplastic metildigoxin cardiotonic
propionate (prostatic carcinoma)
dirithromycin antibacterial vinorelbine antineoplastic alkaloid gitaloxin cardiotonic
diproleandomycin antibacterial carzelesin antineoplastic ubidecarenone cardiovascular agent
troleandomycin antibacterial aclarubicin antineoplastic sincalide choleretic
colistin antibacterial vindesine antineoplastic steroidal cellobioside
ramoplanin a2 antibacterial vinzolidine antineoplastic carbamoyl derivs. cholesterol absorption
(inhibitor)
megalomicin antibacterial vinblastine antineoplastic 11-ketotigogenin cholesterol absorption
cellobioside (inhibitor)
relomycin antibacterial plicamycin antineoplastic demecarium bromide cholinergic
(ophthalmic)
spiramycin antibacterial echinomycin antineoplastic salinomycin coccidiostat
aspartocin antibacterial taxol antineoplastic semduramicin coccidiostat
quinupristin antibacterial didemnin b antineoplastic sulbutiamine coenzyme precursor
polymyxin b1 antibacterial vinepidine antineoplastic alfadex complexing agent
piridicillin antibacterial olivomycin a antineoplastic truxipicurium curariform
iodide
tylosin antibacterial vinleucinol antineoplastic iobitridol diagnostic agent
erythromycin antibacterial ditercalinium antineoplastic pentagastrin diagnostic aid
acistrate chloride (gastric secretion
indicator)
ristocetin antibacterial vapreotide antineoplastic sulfobromophthalein diagnostic aid
(hepatic function)
roxithromycin antibacterial verteporfin antineoplastic ioxilan diagnostic aid
avilamycin-a antibacterial bizelesin antineoplastic iofratol diagnostic aid
Table 6 (Continued)
generic name drug class generic name drug class generic name drug class
teprotide enzyme angiotensin- geclosporin immunosuppressive cagutocin oxytocic
converting (inhibitor)
bisbutitiamine enzyme cofactor eledoisin lacrimal stimulant atosiban oxytocin antagonist
cistinexine expectorant iralukast leukotriene (antagonist) iralukast phospholipase a2 (inhibitor)
sorbinicate fibrinolytic deslorelin lhrh agonist goserelin prostatic carcinoma
(therapeutic)
flavin adenine flavin coenzyme lutrelin lhrh agonist triolein i 125 radioactive agent
dinucleotide
ahn-683 fluorescent ligand for analysis histrelin lhrh agonist thyroxine i 125 radioactive agent
of benzodiazepine receptors detirelix lhrh antagonist pendetide scintigraphy (agent)
benzquercin for capillary fragility tripalmitin lung disorders petrichloral sedative
(therapeutic)
somatostatin gastro-duodenal ulcers cetrorelix luteinizing-hormone- zaragozic acid squalene synthase
(therapeutic for severe releasing-hormone (inhibitor)
hemorrhage) antagonist
gonadorelin gonad-stimulating principle glyceryltrierucate multiple sclerosis ceruletide stimulant of gastric
(therapeutic) secretion
buserelin gonad-stimulant truxipicurium iodide muscle relaxant (general) argiprestocin testicular androgen
biosynthesis (inhibitor)
ganirelix gonad-stimulating principle pipecuronium bromide muscle relaxant (general) thyromedan thyroid hormone
fertirelin gonadotropin-releasing dimethyltubo- muscle relaxant (skeletal) levothyroxine thyroid hormone
hormone (vet.) curarinium chloride
examorelin growth hormone releaser doxacurium chloride muscle relaxant (skeletal) thyromedan thyromimetic
ritonavir hiv-1 and hiv-2 (inhibitor) alcuronium chloride muscle relaxant (skeletal) levothyroxine thyromimetic
l 731723 hiv-1 protease (inhibitor) atracurium besilate muscle relaxant (skeletal) lypressin vasoconstrictor
vasopressin hormone (antidiuretic) gallamine triethiodide neuromuscular blocker felypressin vasoconstrictor
petrichloral hypnotic metocurine iodide neuromuscular blocker angiotensin ii vasoconstrictor
pamaqueside hypocholesteremic mivacurium chloride neuromuscular blocker ornipressin vasoconstrictor
eledoisin hypotensive pancuronium bromide neuromuscular blocker angiotensin amide vasoconstrictor
acemannan immunomodulator laudexium methyl neuromuscular blocker inositol niacinate vasodilator (peripheral)
sulfate
thymoctonan immunomodulator cisatracurium neuromuscular terlipressin vasopressor
besylate blocking agent
ebiratide nootropic troxerutin venous disorders
(therapeutic)
romurtide immunostimulant oxytocin oxytocic bisbentiamine vitamin (cofactor)
sirolimus immunosuppressant carbetocin oxytocic bisbentiamine vitamin (provitamin)
tacrolimus immunosuppressant demoxytocin oxytocic bisbentiamine vitamin b1 source
oxeclosporin immunosuppressive agent
cyclosporine immunosuppressive
Table 8. Composition of Functional Groups (Identified in Figure 1) among Drugs Classified by Disease State:
(a) Antiinflammatory, (b) Antidepressants, (c) Antipsychotic, (d) Antihypertensive, (e) Hypnotics,
(f) Anticancer, (g) Antiinfectives, (h) CMC [a]

no.      description                       (a)    (b)    (c)    (d)    (e)    (f)    (g)    (h)
         no. of compounds                  293    222    110    368     75    431     37   6454
I        carboxyl                          114      6   2[b]     87      0     39      7    972
II       alcohol                            69     25     20     82      5    161      5   1668
III      aldehyde                         1[c]      0      0      0      0   2[c]      0  34[d]
IV       aliphatic primary amine             3     15      0     17      0     43      1    367
V        aliphatic secondary amine           1     46      1     63      0     16      3    587
VI       aliphatic tertiary amine           21    116    102     66      9     59      5   1910
VII      amino acid                          2      0      0      5      0      5      0     96
VIII     aromatic primary amine              6      3      2     15      2     35      4    350
IX       aromatic secondary amine [e]       38     10     13     30      0     53      1    462
X        aromatic tertiary amine            14     41     40     39      6     44      7    663
XI       carboxamide                        60     53     23    106     44    109      1   1752
XII      keto                               91      8     26     21      4    109      5   1014
XIII     N-oxide                             0      0      0      3      0      1      0     12
XIV      nitro                               2      1      0     13      2     11      5    170
XV       phenolic OH                        19      4      3     25      1     58      7    660
XVI      epoxy                               0      0      0      0      0     10      0     49
XVII     C-O-O-C                             0      0      0      0      0      0      0      4
XVIII    C-N-O-C                             2      0      0      1      0      1      0     16
XIX      C-N-N-C                            17      5      0     14      1      4      0    100
XX       C-N-S-C [f]                         0      0      0      0      0      0      0      4
XXI      C-S-S-C                             0      0      0      1      0      5      0     42
XXII     C-S-O-C [g]                         0      0      0      0      0      0      0      0
XXIII    nucleoside                          0      0      0      0      0     33      0     66
XXIV     pyridine                           29     20      5     30      2     27      5    521
XXV      pyrimidine                          2      2      0     15      0     27      0    158
XXVI     pyrrole                            24     16     12     22      0     30      0    286
XXVII    benzene                           224    205    107    292     34    189     24   4536
XXVIII   furan                               5      3      1      5      0      2      4    128
XXIX     thiophene                          13      4      3      9      2      6      1    124
XXX      imidazole                           9      1      2     22      3     38      3    388
XXXI     ester                              74      6      8     92      3     86      0   1174
XXXII    sulfonamide                        16      3      3     32      0      4      0    291
XXXIII   sulfonic acid                       0      0      0      1      0      4      0     42
XXXIV    aliphatic ether                     8      0      0      1      0      0      0     46
         aromatic ether                      2      4      1     27      1      1      0     81
         heterocyclic (any)                169    149    105    270     56    285     21   4314
         heterocyclic (aromatic)           103     52     27    118     12    127     13   1832

[a] The numbers in the column label show the number of compounds found in the CMC database for the
respective classes. [b] The acid counterpart of a salt. [c] Sugar or aromatic aldehydes. [d] Mostly
antibiotics, sugar, and aromatic aldehydes. [e] Diaryl or aryl alkyl. [f] Excluding C-N-S(=O)-C.
[g] Excluding C-O-S(=O)-C.
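The raw counts in Table 8 are easier to compare across classes once they are divided by the class sizes given in the column labels. The following is a minimal sketch of that arithmetic, not part of the original analysis. It assumes that each table entry counts compounds containing the group at least once, and the names `CLASSES`, `CLASS_SIZE`, `GROUP_COUNTS`, `percent_of_class`, and `fold_excess` are illustrative only. Only a few rows of the table are copied in.

```python
# Class sizes and selected rows copied from Table 8, in column order (a)-(h).
# Assumption: each entry is the number of compounds containing the group.
CLASSES = [
    "antiinflammatory", "antidepressant", "antipsychotic", "antihypertensive",
    "hypnotic", "anticancer", "antiinfective", "CMC",
]
CLASS_SIZE = dict(zip(CLASSES, [293, 222, 110, 368, 75, 431, 37, 6454]))

GROUP_COUNTS = {
    "carboxyl": [114, 6, 2, 87, 0, 39, 7, 972],
    "aliphatic primary amine": [3, 15, 0, 17, 0, 43, 1, 367],
    "aliphatic secondary amine": [1, 46, 1, 63, 0, 16, 3, 587],
    "aliphatic tertiary amine": [21, 116, 102, 66, 9, 59, 5, 1910],
    "benzene": [224, 205, 107, 292, 34, 189, 24, 4536],
}


def percent_of_class(group):
    """Percentage of compounds in each class that contain `group`."""
    counts = GROUP_COUNTS[group]
    return {
        cls: round(100.0 * n / CLASS_SIZE[cls], 1)
        for cls, n in zip(CLASSES, counts)
    }


def fold_excess(group_a, group_b, cls="CMC"):
    """Ratio of the compound counts of two groups within one class."""
    i = CLASSES.index(cls)
    return GROUP_COUNTS[group_a][i] / GROUP_COUNTS[group_b][i]


if __name__ == "__main__":
    print(percent_of_class("benzene")["CMC"])
    print(percent_of_class("carboxyl"))
    print(fold_excess("aliphatic tertiary amine", "aliphatic secondary amine"))
    print(fold_excess("aliphatic tertiary amine", "aliphatic primary amine"))
```

With these counts, the tertiary-to-secondary and tertiary-to-primary aliphatic amine ratios in the CMC column come out at roughly 3.3 and 5.2, and carboxyl groups account for under 3% of the antidepressant and antipsychotic compounds and none of the hypnotics.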
desolvation cost (ref 41) and can serve as a scaffold for polar or hydrophobic functionalities, and
it often contributes to hydrophobic interactions. The common heterocyclic aromatic rings, when each
type of ring is considered separately, had a fairly low occurrence in the CMC data set as well as in
the various drug classes studied here. Among nitrogen-containing heterocycles, the pyridine ring is
most common, which may be explained by its relative chemical inertness. In addition, it can act as
an H-bond acceptor. The number of compounds with a carbon-heteroatom bond in a ring was 4314. The
number of compounds which had both a benzene ring and a carbon-heteroatom ring bond was 3163,
indicating that this combination is likely to create successful drug candidates.

Among the common organic functional groups considered, the alcoholic hydroxyl (II) and the
carboxamide group (XI) occur with a high frequency in the CMC database as well as in the various
drug classes studied here. This is also expected, since they have both hydrogen-accepting and
hydrogen-donating abilities in a hydrogen bond. They are hydrophilic and chemically stable, neutral
groups. Aliphatic tertiary amine had a comparable presence in the CMC database. It outnumbered
aliphatic primary or secondary amines by 3- to 5-fold. Both basicity and biochemical stability may
be the reasons for the success of tertiary amines. The keto (XII) and the carboxy ester (XXXI)
groups were found in comparable occurrence. We did find compounds with a single bond between
heteroatoms; however, they are mostly stabilized by a conjugated C=X bond. The antineoplastic
compounds often have reactive functionalities, and such drugs should be considered as special brute
force drugs and should be avoided in the regular drug design process. The carboxylic acid, aromatic
secondary and tertiary amines, and aliphatic secondary amines also occurred with moderate frequency
in the drug database. It is interesting to note that carboxylic acid groups are virtually absent
among antipsychotic, antidepressant, and hypnotic drug classes, all of which act on the central
nervous system. As is well known, CNS-acting drugs have to cross the blood-brain barrier, requiring
them to be sufficiently lipophilic, thus disfavoring acid groups.

From the analysis presented here we may provide the following consensus definition of a drug-like
molecule: (i) an organic compound having a calculated log P (ALOGP) between -0.4 and 5.6, a molar
refractivity (AMR; refs 25, 31, 32)
between 40 and 130, a molecular weight between 160 and 480, and the total number of atoms between
20 and 70; (ii) structurally a combination of some of the following groups: a benzene ring, a
heterocyclic ring (both aliphatic and aromatic), an aliphatic amine (preferably tertiary), a
carboxamide group, an alcoholic hydroxyl group, a carboxy ester, and a keto group; (iii) chemically
stable in the physiological buffer, as evidenced by the absence of a reactive functional group or
structural moiety.

4. Conclusion

We provided here an analysis of some computable physicochemical properties and chemical
constitutions of known drug molecules available in the CMC database and seven known drug classes.
Our study showed that the qualifying range (the chance of missing good compounds is less than 20%)
of calculated log P (ALOGP; refs 24-26) for drug-like molecules is -0.4 to 5.6. The mean ALOGP is
2.3, and the preferred range (the most populated interval containing 50% of the drugs) is 1.3 to
4.1. For molar refractivity the qualifying range is 40 to 130. The mean is 97, and the preferred
range is 70 to 110. For molecular weight the qualifying range is 160 to 480. The mean molecular
weight is 360, and the preferred range is 230 to 390. For the total number of atoms the mean value
is 48, and the qualifying range is 20 to 70. The preferred range for the total number of atoms is 30
to 55. For different drug classes the ranges may be considerably tighter than what we stated above.

Benzene is the most abundant structural unit found in the drug database. It outnumbered any other
structural unit severalfold. Tertiary aliphatic amine is the most frequent functional group in the
drug molecules. Carboxamides and alcohols are the two other groups whose frequency of occurrence was
close to that of the aliphatic tertiary amine group. The frequency of occurrence of the common
heterocyclic aromatic rings is far less than that of the aliphatic heterocyclic rings.

The consensus definition of a drug-like molecule obtained here will help to streamline the design of
combinatorial chemistry libraries for drug design and to develop a more efficient corporate
medicinal chemistry library.

References and Notes

(1) Drews, J. Genomic sciences and the medicine of tomorrow. Nature Biotechnol. 1996, 14, 1516-1518.
(2) Drews, J. Sciences towards the Medicine of Tomorrow. Chimia 1996, 30, 507-510.
(3) Armstrong, R. W.; Combs, A. P.; Tempest, P. A.; Brown, S. D.; Keating, T. A. Multiple-component condensation strategies for combinatorial library synthesis. Acc. Chem. Res. 1996, 29, 123-131.
(4) Chaiken, I. M.; Janda, K. D. Molecular Diversity and Combinatorial Chemistry: ACS Conference Proceeding Series; American Chemical Society: Washington, DC, 1996.
(5) Caflisch, A.; Karplus, M. Computational combinatorial chemistry for de novo ligand design: Review and assessment. Perspect. Drug Discovery Des. 1995, 3, 51-84.
(6) Ghose, A. K.; Viswanadhan, V. N.; Wendoloski, J. J. Adapting structure-based drug design in the paradigm of combinatorial chemistry and high-throughput screening: An overview and new examples with important caveats for newcomers to combinatorial library design using pharmacophore models or multiple copy simultaneous search (MCSS) fragments. In Rational Drug Design; Parrill, A., Reddy, M. R., Eds.; American Chemical Society: Washington, DC, 1998; in press.
(7) Moran, E. J.; Sarshar, S.; Cargill, J. F.; Shahbaz, M. M.; Lio, A.; Mjalli, A. M. M.; Armstrong, R. W. Radio frequency tag encoded combinatorial library method for the discovery of tripeptide-substituted cinnamic acid inhibitors of the protein tyrosine phosphatase PTP1B. J. Am. Chem. Soc. 1995, 117, 10787-10788.
(8) Joseph-McCarthy, D.; Hogle, J. M.; Karplus, M. Use of the multiple copy simultaneous search (MCSS) method to design a new class of picornavirus capsid binding drugs. Proteins: Struct., Funct., Genet. 1997, 29, 32-58.
(9) (a) Kuntz, I. D. Structure-Based Strategies for Drug Design and Discovery. Science 1992, 257, 1078-1082. (b) Kick, E. K.; Roe, D. C.; Skillman, A. G.; Liu, G.; Ewing, T. J. A.; Sun, Y.; Kuntz, I. D.; Ellman, J. A. Structure-Based Design and Combinatorial Chemistry Yield Low Nanomolar Inhibitors of Cathepsin D. Chem. Biol. 1997, 4, 297-307. (c) See also the web page www.combinatorial.com.
(10) Kollman, P. Molecular-Dynamics and Free-Energy Perturbation Calculations - What Role Do They Play In Computer-Assisted Molecular Design. FASEB J. 1995, 9, A1253-A1253.
(11) Head, R. D.; Smythe, M. L.; Oprea, T. I.; Waller, C. L.; Green, S. M.; Marshall, G. R. VALIDATE: A new method for the receptor-based prediction of binding affinities of novel ligands. J. Am. Chem. Soc. 1996, 118, 3959-3969.
(12) Ghose, A. K.; Crippen, G. M. Use of physicochemical parameters in distance geometry and related three-dimensional quantitative structure-activity relationships: a demonstration using Escherichia coli dihydrofolate reductase inhibitors. J. Med. Chem. 1985, 28, 333-46.
(13) Ghose, A. K.; Crippen, G. M.; Revankar, G. R.; McKernan, P. A.; Smee, D. F.; Robins, R. K. Analysis of the in vitro antiviral activity of certain ribonucleosides against parainfluenza virus using a novel computer aided receptor modeling procedure. J. Med. Chem. 1989, 32, 746-756.
(14) Ghose, A. K.; Logan, M. E.; Treasurywala, A. M.; Wang, H.; Wahl, R. C.; Tomczuk, B.; Gowravaram, M.; Jaeger, E. P.; Wendoloski, J. J. Determination of Pharmacophoric Geometry for Collagenase Inhibitors Using a Novel Computational Method and Its Verification Using Molecular Dynamics, NMR and X-ray Crystallography. J. Am. Chem. Soc. 1995, 117, 4671-4682.
(15) Ghose, A. K.; Wendoloski, J. J. Pharmacophore Modeling: Methods, Experimental Verifications and Applications. In 3D QSAR in Drug Design: Ligand-Protein Interactions and Molecular Similarity; Kubinyi, H., Folkers, G., Martin, Y. C., Eds.; Kluwer Academic: The Netherlands, 1998.
(16) Hansch, C. Quantitative Structure-Activity Relationships and the Unnamed Science. Acc. Chem. Res. 1993, 26, 147-153.
(17) Martin, Y. C. 3D QSAR: Current State, Scope and Limitations. In 3D QSAR in Drug Design: Recent Advances; Kubinyi, H., Folkers, G., Martin, Y. C., Eds.; Kluwer/Escom: Dordrecht, 1998; Vol. 3, pp 3-23.
(18) Cramer, R. D.; Patterson, D. E.; Bunce, J. D. Comparative Molecular Field Analysis (CoMFA) 1. Effect of shape on binding of steroids to carrier proteins. J. Am. Chem. Soc. 1988, 110, 5959-5967.
(19) Koltermann, A.; Kettling, U.; Bieschke, J.; Winkler, T.; Eigen, M. Rapid assay processing by integration of dual-color fluorescence cross-correlation spectroscopy: high throughput screening for enzyme activity. Proc. Natl. Acad. Sci. U.S.A. 1998, 95, 1421-1426.
(20) Bemis, G. W.; Murcko, M. A. The Properties of Known Drugs. I. Molecular Frameworks. J. Med. Chem. 1996, 39, 2887-2893.
(21) McGregor, M. J.; Pallai, P. V. Clustering Large Databases of Compounds: Using MDL "Keys" as Structural Descriptors. J. Chem. Inf. Comput. Sci. 1997, 37, 443-448.
(22) Lipinski, C. A.; Lombardo, F.; Dominy, B. W.; Feeney, P. J. Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings. Adv. Drug Delivery Rev. 1997, 23, 3-25.
(23) Moriguchi, I.; Hirono, S.; Liu, Q.; Nakagome, Y.; Matsushita, Y. Simple method of calculating octanol/water partition coefficient. Chem. Pharm. Bull. 1992, 40, 127-130.
(24) Ghose, A. K.; Viswanadhan, V. N.; Wendoloski, J. J. Prediction of Hydrophobic Properties of Small Organic Molecules Using Fragmental methods: An Analysis of ALOGP and CLOGP methods. J. Phys. Chem. A 1998, 102, 3762-3772.
(25) Viswanadhan, V. N.; Ghose, A. K.; Revankar, G. R.; Robins, R. K. Atomic Physicochemical Parameters for Three-Dimensional Structure Directed Quantitative Structure-Activity Relationships. 4. Additional Parameters for Hydrophobic and Dispersive Interactions and Their Application for an Automated Superposition of Certain Naturally Occurring Nucleoside Antibiotics. J. Chem. Inf. Comput. Sci. 1989, 29, 163-172.
(26) Ghose, A. K.; Crippen, G. M. Atomic Physicochemical Parameters for Three-Dimensional Structure Directed Quantitative Structure-Activity Relationships I. Partition Coefficients as a Measure of Hydrophobicity. J. Comput. Chem. 1986, 7, 565-577.
(27) C&H Dictionary of Pharmaceutical Agents; available as a 3D UNITY database from Tripos, Inc.: St. Louis, MO, 1998.
(28) Integrated Scientific Information System (ISIS); available from MDL Information Systems, Inc.: San Leandro, CA, 1997.
(29) Craig, P. N. Drug Compendium. In Comprehensive Medicinal Chemistry; Hansch, C., Sammes, P. G., Taylor, J. B., Drayton, C. J., Eds.; Pergamon Press: Oxford, 1989; Vol. 6, p 237.
(30) Clark, T. A Handbook of Computational Chemistry: A Practical Guide to Chemical Structure and Energy Calculations; John Wiley: New York, 1985.
(31) Galaxy V. 2.5; available from AM Technologies, Inc.: San Antonio, TX, 1997.
(32) Ghose, A. K.; Crippen, G. M. Atomic physicochemical parameters for three-dimensional-structure-directed quantitative structure-activity relationships. 2. Modeling dispersive and hydrophobic interactions. J. Chem. Inf. Comput. Sci. 1987, 27, 21-35.
(33) Mood, A. M.; Graybill, F. A.; Boes, D. C. Introduction to the Theory of Statistics; McGraw-Hill: New York, 1974.
(34) Hansch, C. In Correlation Analysis in Chemistry; Chapman, N. B., Shorter, J., Eds.; Wiley: New York, 1978.
(35) Hansch, C.; Leo, A. J. Substituent Constants for Correlation Analysis in Chemistry; Wiley: New York, 1979.
(36) Viswanadhan, V. N.; Ghose, A. K.; Weinstein, J. N. Mapping the binding site of the nucleoside transporter protein: a 3D-QSAR study. Biochim. Biophys. Acta 1990, 1039, 356-66.
(37) Viswanadhan, V. N.; Ghose, A. K.; Hanna, N. B.; Matsumoto, S. S.; Avery, T. L.; Revankar, G. R.; Robins, R. K. Analysis of the in vitro antitumor activity of novel purine-6-sulfenamide, -sulfinamide, and -sulfonamide nucleosides and certain related compounds using a computer-aided receptor modeling procedure. J. Med. Chem. 1991, 34, 526-32.
(38) Macdonald, C. M.; Turcan, R. G. Sites of Drug Metabolism, Prodrugs and Bioactivation. In Comprehensive Medicinal Chemistry; Hansch, C., Sammes, P. G., Taylor, J. B., Ramsden, C. A., Eds.; Pergamon Press: London, 1990; Vol. 5, pp 111-138.
(39) Gaillot, J.; Bruno, R.; Montay, G. Distribution and Clearance Concept. In Comprehensive Medicinal Chemistry; Hansch, C., Sammes, P. G., Taylor, J. B., Eds.; Pergamon Press: Oxford, 1990; Vol. 5, pp 71-109.
(40) Rishton, G. M. Reactive compounds and in vitro false positives in HTS. Drug Discovery Trends 1997, 2, 384-386.
(41) Still, W. C.; Tempczyk, A.; Hawley, R. C.; Hendrickson, T. Semianalytical Treatment of solvation for molecular mechanics and dynamics. J. Am. Chem. Soc. 1990, 112, 6127-6129.

CC9800071
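As a supplementary illustration of part (i) of the consensus definition and of the preferred ranges given in the Conclusion, the sketch below checks a set of precomputed property values against both sets of ranges. It is only a sketch under stated assumptions: the property values are taken as already calculated (the paper uses ALOGP and AMR; no particular software is implied here), the bounds are treated as inclusive, and the names `Properties`, `QUALIFYING`, `PREFERRED`, and `violations` are hypothetical. Parts (ii) and (iii) of the definition, which concern substructures and chemical stability, are not covered.

```python
from dataclasses import dataclass

# Ranges quoted in the text; treating both bounds as inclusive is an assumption.
QUALIFYING = {
    "alogp": (-0.4, 5.6),
    "amr": (40, 130),
    "mw": (160, 480),
    "n_atoms": (20, 70),
}
PREFERRED = {
    "alogp": (1.3, 4.1),
    "amr": (70, 110),
    "mw": (230, 390),
    "n_atoms": (30, 55),
}


@dataclass
class Properties:
    """Precomputed descriptors for one compound (hypothetical container)."""
    alogp: float   # calculated log P
    amr: float     # calculated molar refractivity
    mw: float      # molecular weight
    n_atoms: int   # total number of atoms


def violations(props, ranges=QUALIFYING):
    """Return the names of the properties that fall outside `ranges`."""
    failed = []
    for name, (low, high) in ranges.items():
        value = getattr(props, name)
        if not low <= value <= high:
            failed.append(name)
    return failed


if __name__ == "__main__":
    # CMC average values reported in the abstract, used here as a test point.
    average = Properties(alogp=2.52, amr=97, mw=357, n_atoms=48)
    # Made-up values for a large, very lipophilic compound.
    outlier = Properties(alogp=6.2, amr=97, mw=750, n_atoms=48)
    print(violations(average), violations(average, PREFERRED))
    print(violations(outlier))
```

Returning the list of failed properties, rather than a single pass/fail flag, makes it easy to see which range excludes a candidate when a library is being filtered.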