By Peter Weiss
When Gary W. Flake searches for a certain computer scientist on the World Wide Web, he gets deluged with unwanted information about someone else. The scientist’s name? Michael Jordan.
A new algorithm developed by Flake of the NEC Research Institute in Princeton, N.J., and his colleagues could lead to more precise Web-searching engines, Flake says. Such an engine could, for instance, zero in on University of California, Berkeley Professor Jordan despite all those Web references to the basketball superstar.
Most conventional search engines seek Web pages with key words that simply match the words a searcher supplies, no matter the context. In the March issue of Computer, Flake and his coworkers describe their alternative: the community-identification algorithm. It recognizes communities of Web pages by taking advantage of patterns of interconnectivity in these so-called small-world networks (SN: 8/22/98, p. 124). In such networks, members tend to have lots of links to a small nucleus of other members and fewer links to everything else.
In proposed Web applications of Flake’s algorithm, a person would call up a Web page and then activate a program that follows links. It would start on the original page and crawl through the Web to find other pages for provisional membership in a community. Then, the algorithm would discard candidates with more than half of their links outside the evolving community.
In the case of Michael Jordan, the search engine could be asked to first identify a community of computer-science Web pages. Then typing “Michael Jordan” would lead to the professor rather than the sports icon, Flake says. The same algorithm would also be adept at demarcating seedy cyberneighborhoods as off-limits to children or bored employees, he adds. Moreover, because the algorithm can handle databases such as marketing data or phone records, it can potentially identify communities of shoppers with similar tastes or maybe even terrorist cells.