Edgar Allan Poe Review: Spring 2001
Poe in Cyberspace: Search engines grow up

Watch out -- Poe internet searching is growing up! Gone are the days of just firing up Yahoo, typing Poe in the text box, and sitting back to watch the marvelous results. The sheer size of the internet now -- according to one source, 1,326,920,000 charted web pages (yes, over a billion, not including those yet uncharted) -- makes any search very chancy. In addition, the speed with which pages spring up, change, move, or expire makes one's head spin. The original anxiety was of finding something, but the new anxiety is of finding too much. Today the main task of the search engine is to decide how to rank and thus display its riches for people infected with the browser habit of not looking at more than the top screenful or two.

In the old days (only six years ago!) search engines found that the way to seem smart was to use software to weigh the supposed relevance of the results of any search. Various algorithms used word counts of the web page title, keywords list, description, and beginning text, and each page was brazenly quantified as 98% or 73% or 6% relevant. No doubt the numbers were true to the algorithm, but to the Poe researcher they seemed meaningless if not silly. Too often unconnected or expired pages floated to the top, while meatier ones were lost below. So the software gurus hit on a very radical idea: why not bring back human beings? Editors were expected to use their judgment to create directories on fixed subjects, such as standard American authors. So the early machine-made search engines came to be supplemented by human-made directories. Although the human editors could get rid of irrelevant or expired links by actually checking them, their editorial judgment revealed that very few were librarians or literary scholars. Soon the syndication of these directories became a business. Today Open Directory serves Lycos, Hotbot, and Netscape with its 32,000 editors and 325,000 subject categories; Looksmart serves Altavista, Excite, and MSN with 200 editors and 200,000 subject categories; and Yahoo has its own 100+ editors and an undisclosed number of subject categories.
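For readers curious how a word-count relevance percentage of the kind described here might come about, the following is a minimal sketch in Python. It is not the formula of any actual search engine, none of which is given in this column; the field weights, the function name relevance_percent, and the two sample pages are all hypothetical. It also shows why the numbers could seem silly: a page that merely repeats the name outscores one with real substance.

```python
import re

# Hypothetical field weights: the title counts most, the body text least.
FIELD_WEIGHTS = {"title": 4.0, "keywords": 3.0, "description": 2.0, "body": 1.0}


def relevance_percent(page, term):
    """Return a 0-100 score from the share of words in each field that match term."""
    term = term.lower()
    weighted = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        # Split the field into lowercase words; a missing field counts as empty.
        words = re.findall(r"[a-z0-9]+", page.get(field, "").lower())
        if words:
            weighted += weight * words.count(term) / len(words)
    return 100.0 * weighted / sum(FIELD_WEIGHTS.values())


# Two invented pages: one stuffed with the search word, one with real content.
stuffed = {
    "title": "Poe Poe Poe",
    "keywords": "poe poe poe poe",
    "description": "poe",
    "body": "poe links",
}
scholarly = {
    "title": "The Tales of Edgar Allan Poe: Texts and Commentary",
    "keywords": "poe, tales, criticism, bibliography",
    "description": "Annotated texts of the tales with critical commentary",
    "body": "This archive collects the tales in their first printed versions.",
}

print(round(relevance_percent(stuffed, "poe")), "% relevant (stuffed page)")
print(round(relevance_percent(scholarly, "poe")), "% relevant (scholarly page)")
```

Under these assumed weights the stuffed page scores far higher than the scholarly one, which is the kind of result that sent the software gurus back to human editors.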

Yahoo retains its early advantage as the first popular search engine, but some researchers long ago drifted to AltaVista, which covered more pages and could be customized by the user, while others swore by Hotbot, which seemed to deliver a better percentage of significant answers. But anyone with the patience and determination to look at the thousands, tens of thousands, and even hundreds of thousands of promised hits in any Poe search was destined to discover that the navigational thread simply expired after the display of a certain number of screenfuls, delivering but a tiny percentage of all those promised. Where were all those additional hits, and what was the point of promising them if they couldn't be displayed? One tactic to overcome this dead end was to dig more narrowly and deeply into these resources by making very specific requests. Another tactic was to keep switching search engines, each operating according to different criteria and thus producing slightly different results. But in the last year or so the entire complexion of internet searching has changed. The promise of relying on human editors proved more complicated and overwhelming than legions of employees or volunteers could handle. So software was given a second chance.

A dominant new search engine emerged with a name as improbable as Yahoo (but not as improbable as Dogpile), namely Google at http://www.google.com, which avoids the pitfalls of relying exclusively on the taste and preferences of its house staff by using a consensus of other web page authors to weigh each page on Poe or other subjects. Google simply ranks each page according to how many other pages have written links to it, frankly paying more attention to what it deems the more important sites. Thus the ranking of any Poe site depends on how many other significant web authors, academic or general, have deemed it worthwhile. In some fashion the relative quantity translates into relative quality and thus produces a display rank order. The formula has weaknesses in theory but seems remarkably successful in practice. If you type more than one word, Google assumes you want pages which include both words, so if you type poe cryptography -- Google doesn't care about upper or lower case -- you will get hits on the combination of words, not on either word as some early search engines did. (In boolean terms, the default for combining words on Google is and rather than or.) If you are searching for an exact phrase -- for example, suspected plagiarism in a term paper on the Gothic which misspells Poe's name -- you might put the phrase in quotation marks, as in "Edgar Allen Poe" Gothic, the and being understood.
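The link-counting idea described in this section can be seen in miniature in the rough Python sketch below. It is not Google's actual formula, which this column does not give. Instead it uses a simplified iterative scheme in which a page passes a share of its own score to every page it links to, so that links from better-regarded pages count for more. The function name rank_by_links, the damping value, the number of rounds, and the four sample pages are all assumptions made for illustration.

```python
def rank_by_links(links, damping=0.85, rounds=50):
    """Order pages by a link-based score.

    links maps each page name to the list of pages it links to.
    damping and rounds are illustrative values, not known settings.
    """
    # Collect every page that links or is linked to.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start all pages equal, then let scores flow along the links.
    score = {page: 1.0 / n for page in pages}
    for _ in range(rounds):
        new = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * score[page] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * score[page] / n
                for other in pages:
                    new[other] += share
        score = new

    # Highest score first; ties are broken alphabetically.
    return sorted(pages, key=lambda page: (-score[page], page))


# Four invented pages and the links their authors have written.
links = {
    "poe-society": ["poe-texts"],
    "university-syllabus": ["poe-society", "poe-texts"],
    "fan-page": ["poe-society"],
    "poe-texts": ["poe-society"],
}

order = rank_by_links(links)
print(order)
```

In this invented example the page that most other authors point to comes out on top, while the two pages that nobody links to fall to the bottom regardless of what they contain.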

Google searches can be configured by the number of hits per run (up to 100), by languages, and by proximity range when multiple words are requested. Having scored a success with its pragmatic computer algorithm, Google is now consolidating its success by introducing human-made directories as well. To get to Poe, the top level directory category is Arts, under which 30,000 Literature entries are claimed. There are several possible pathways to Poe, including such familiar research rubrics as American Literature, Authors, Electronic Text Archives, Fantasy, Fiction, Genres, Mystery, Poetry, Romance, Short Story, and World Literature. I tested American Literature (21 general hits) -- the 19th century (6 hits, Poe being one of three highlighted authors) -- and finally Poe (82), under which came Biography (8) and Works (54), below which Criticism (3), Poems (16), and Tales (16) could be found. Although the structure and procedure felt familiar, at times the results seemed thin and random. Although the directory structure follows subject headings reminiscent of the Dewey or Library of Congress schemes, web pages, unfortunately, are not always composed with these subject classifications in mind.

In contrast to the author-centric structure of Google, a reader-centric structure can be found at Lycos at http://www.lycos.com/. This site reshapes itself according to the usage patterns of Web readers by following their browsing and searching patterns. The directory structure of Lycos proved to be fairly simple: Arts led to Literature (27,109), then American Literature (1,218), then 19th century (742) -- the last two giving the login names of their section editors -- and then Poe (156). Here the top level contained 31 general sites plus 123 more under Works, divided into Poems (19) and Tales (81). But the quality of the Lycos listings seemed an uneven mixture of academic and general sites with accidental duplicates and some dead links. In a second test, I searched directly for Poe, Edgar Allan and got a display page divided into three parts: first came the notice, "Popular: 4 Web sites were selected based on user selection traffic"; then "Web Sites," which contained ten more hits followed by the notice "66,362 Web sites were found in a search of the complete Lycos Web catalog"; and finally, an unavoidable section for "Shopping."

An uncannily well-targeted banner ad appeared unexpectedly at both the top and bottom of the page, asking "Which of Poe's Works?" No matter what I tried to choose I was taken to a specialist paper mill called www.poeessays.com, which offered about 60 Poe term papers at $9.85 a page, the first sample page being free. The Poeessays site is one of many operated by The Paper Store at www.paperstore.net, which drops its slogan one word at a time onto the screen for the benefit of slow readers: "Since 1994, The Paper Store Has Helped Tens of Thousands Of College & Graduate Students Overcome Their Greatest Fear: WRITING PAPERS!"

I wondered if Lycos regarded such advertising as a matter of protected free speech. I returned to the top level to do an Advanced Search, selecting the human edited Open Directory, then Reference, then Education (40,296), Subjects (2,999), Writing (201), Research Paper (112), and finally Paper Mills (43), with seven sites at the top level and a further division into Commercial (15), Non-commercial (9), and Directories (9). The editor of this directory, ChuckIII, included his own paper mill site.

At the bottom of the page of hits for Poe, Edgar Allan on Lycos, I noticed a "second opinion" link to Hotbot at http://hotbot.lycos.com. Here I found "People who did this search also searched for" followed by eight items, which included two different misspellings of Poe's full name. Some 51,000 hits were promised but only 10 could be viewed at a time. Hotbot, once the most sophisticated of all search engines, was now apparently a puppet of Lycos, down to the identical advertising banner for Poeessays.

The current restructuring of Lycos according to user reading or browsing patterns -- the opposite of web authoring patterns on Google -- results in a very different culture. Every web search, no matter which search engine is used, is likely to produce a promiscuous mixture of academic, popular, and plagiarism results, but I found the promotion of term papers dramatically more prominent on reader-centric Lycos than on author-centric Google. So when I searched for "Poe term paper" on Lycos, first I got four popular items and then the promise of 15,727 more. The same search on Google promised 17,300 hits but, as might be expected, the inclusion of papers on serious research technique seemed somewhat more palatable. The Google directory does have fee-based as a sub-category of term papers, but at least it puts research papers under writer's resources.

If customization is your need, the Altavista site at http://www.altavista.com provides the most help and the most versatility in configuring or limiting the results. Although it is claimed that its results are ranked by relevance, the algorithm to achieve this is not disclosed. Altavista may be usefully customized to give up to 50 hits per run and to order the results by date or size. (I especially liked being able to search for new sites since my last visit.) Testing Poe, Edgar Allan gave 90,584 hits, compared to Poe, E. A., which gave 59,079, or Edgar Allan Poe, yielding 309,626. Out of curiosity I tried just Poe, producing 113,190 hits, while Poe, Edgar Allen returned 8,632 hits, largely chatrooms and student papers. At this point I could not resist Poe term papers, which gave an unexpectedly large return of 78,748 leads; finally, Edgar Allen Poe gave an inexplicably large 600,580 hits. The lesson is that the precise search terms can make a considerable difference.

Altavista also provides a directory approach which begins with "Arts & Entertainment," moves to "Arts & Culture," and then offers choices which include "Humanities," "Library & Resources," "Literature," and "Literary Criticism." I opted for "Authors" and then moved in several steps to "P," then "Po," and finally "Poe," where the main categories included "Biographies," "Essays & Criticism," "Fan Pages," "Guides and Directories," "Online texts," "Organizations," and "Stage Adaptations." At this level I looked at "Genres," "Periods & Movements," "Poetry," and "Theory and Criticism." But once more the structure seemed to be more substantial than its contents.

The most pleasant surprise on Altavista was the "Media/Topic" directory, where I found varied but promising Poe results leading to 1,281 images, 1,236 MP3/Audio items, and 183 Videos. The News category, limited to the last 14 days, was richly endowed with hits, thanks to the recent prominence of the Baltimore Ravens football team.

Emboldened with this sports connection, I decided to pursue internet sites for other recent Poe items in the news, such as the agreement by New York University to preserve a portion of Poe's Amity Street house, or the New York Giants fan who put some athletically inspirational verse on Poe's grave in Baltimore, or the Super Bowl bet lost by New York Senators Chuck Schumer and Hillary Clinton for which they read "The Raven" on TV after the Baltimore Ravens crushed the Giants. For these I found two specialist tools to do the best job, Deja at http://www.deja.com for internet news discussion groups, and Northern Light at http://www.northernlight.com, which has a fee-based database of recent items in newspapers, magazines, and journals.

Further emboldened, I did a direct comparison of three search engines, Google, Northern Light, and Altavista, for information on Poe's connections to specific subjects: for cryptography, Google was best; for rabies, Northern Light and Google were good; for Amity Street, Google was best; for Derrida, Google and Northern Light were good. The third search engine, Altavista, came in last in each comparison. But there is a simple non-statistical test which will quickly characterize whether a search site is designed for serious researchers or more casual browsers. The more serious lists trusted readers to understand the original web address and site title with a short text quotation, while the casual sites conveniently repackaged everything with their own promotional paraphrases.

If you need more introductory matter on academic web searching and reference, see http://www.academicinfo.net/, which has a reference desk, guides to the internet and search engines, and even student study aids. For a recent evaluation of search engines, see an article in PC Magazine for November 15, 2000, online at http://www.zdnet.com/pcmag/stories/reviews/0,6755,2652815,00.html. For further tutorials and introductions see http://websearch.about.com/internet/websearch/. There's a great deal more at searchenginewatch.com, including a discussion of what search engines can't find at http://searchenginewatch.com/links/Specialty_Search_Engines/Invisible_Web/index.html. At Searchenginewatch I learned that in one survey the top eight search engines visited by users in 2000 were Yahoo (62%), MSN (52%), AOL (44%), Lycos (35%), Go (Infoseek) (27%), Netscape (23%), Excite (20%), and Altavista (18%) -- distantly followed by my choices, Google (7%) and NorthernLight (1.4%), which trailed in the bottom third of the top twenty. Ironically, the popularity of search engine sites seems the direct opposite of their depth. Three of the four largest sites in coverage are not among the top twenty in popularity.

A final irony is that now you can search the web without calling up a search engine at all. Just type Edgar Allan Poe in the address bar of recent browsers. Internet Explorer will launch MSN and give you 114 hits on Poe ordered by popularity. (Ignore the adjective-happy copy which calls the recommended Internet Public Library Online Literary Criticism site both a "potpourri" and a "comprehensive guide.") Netscape users will get six directory categories and 68 reviewed websites for Poe. These two browser-based search engines plus the subscriber-based AOL search engine have become three of the top six search sites in popularity, still following the leader, Yahoo. Perhaps avoiding heavy traffic there will also avoid their advertising banners.

This and other "Poe in Cyberspace" columns in the Edgar Allan Poe Review are indexed online at http://andromeda.rutgers.edu/~ehrlich/poe/.

Heyward Ehrlich
Department of English
Rutgers University
Newark NJ 07102
E-mail: ehrlich@andromeda.rutgers.edu