50 million chemicals and counting
By Janet Raloff
Bring out the helium balloons, confetti and a noisemaker or two. Today, researchers the world over have reason to raise a toast. This afternoon, the Chemical Abstracts Service — an American Chemical Society subsidiary — identified the 50 millionth compound known. Arylmethylidene heterocycle — the molecule that qualified for the momentous spot during the long holiday weekend — is a future candidate for reducing neuropathic pain.
Since 1907, the Columbus, Ohio-based Chem Abstracts has maintained a registry of all publicly disclosed chemicals. Over the years, this registry has become the definitive one-stop shopping site for tracking down any and every known compound, including all of its names (some compounds have as many as 1,000 monikers), its structure and any general characteristics (such as melting point).
“Thirty years ago, we felt six or seven million substances might be about it,” says Roger Schenck, who manages content planning at Chem Abstracts. He says there had been a suspicion that once chemists had characterized all of these, his group might become little more than caretakers of a static database. However, chemical designers continue to keep his group plenty busy.
The 40 millionth compound that his organization identified was a synthetic analog of the anticancer drug taxol. To keep things simple, we’ll just refer to that member of the azulenobenzofuran family as 1073662-18-6 (its structure appears below). In the nine months since that chemical was added to the database, Chem Abstracts has identified another 10 million novel chemicals.
To search for new candidates, Chem Abstracts’ staff pores over journal articles, data filed with 59 patent authorities around the world, commercial chemical suppliers’ catalogs and announcements, and reports surfacing on the Internet. In all, “we cover over 50 languages,” Schenck says.
For instance, Chem Abstracts noted that the 50 millionth entrant “was identified by [its] scientists in the Examples section of a nearly 200-page patent document” that was issued on Aug. 13, 2009. The molecule’s formal name is a mouthful: (5Z)-5-[(5-Fluoro-2-hydroxyphenyl)methylene]-2-(4-methyl-1-piperazinyl)-4(5H)-thiazolone.
Tracking down each and every qualifying chemical has become a bit more than the chemists at Schenck’s organization can manage on their own. Computers now sift through machine-readable files, so “we don’t have to manually review each one,” he explains. Good thing, too, since it’s hard to imagine how a staff of 1,300 people could collectively screen and then add some 36,000 new chemicals to the database every day — year in and year out — complete with files describing who developed or first found a chemical and when; citations detailing the chemical’s isolation, function and properties; a chemical structure for the molecule; and often magnetic-resonance or other characteristic spectra.
With 50 million novel compounds in this database, how can anyone find what they’re looking for? Explains Schenck: “If someone knows a molecule’s name, they can search for that. Or if they even have a fragment of a name, we will look it up and find matches.” Input a known or suspected structure, he says, and if that chemical resides in the database, “we’ll get them an exact match. Or if someone only knows a piece of a structure, we can find all of the things in our collection that have that same piece in them.”
As you might expect, patent attorneys and patent examiners are key users of this encyclopedic, cross-indexed list of chemicals. So are synthetic chemists looking to cook up the next boffo plastic, alloy or pharmaceutical.
This summer, my daughter and her adviser worked in a materials science lab as part of a project funded by the National Science Foundation. Their goal: the development of novel quaternary diamondlike semiconductor crystals. My enterprising undergraduate successfully cooked up two such crystals possessing never-before-reported recipes. She was promised first authorship on a paper that reports their structures.
So I asked Schenck: When a paper comes out describing my daughter’s crystals, will each of them get added to Chem Abstracts’ database? “You bet,” he said, “with her name on them.”
I passed the information along to her over the weekend. And her typically nuanced response: “Sweeeeeet!”