Interactive Media Consulting, LLC

Searching the Net - Part I

 

By Elizabeth Weise Moeller

Network Wizards (http://www.nw.com) estimates there were over 36 million World Wide Web (WWW) domains in July 1998, an increase of 21% over the January 1998 survey. With the WWW growing at such a breakneck pace, finding the information you need is becoming increasingly difficult. Fortunately, sites like Yahoo (http://www.yahoo.com), Hotbot (http://www.hotbot.com), and Excite (http://www.excite.com) can help. Unfortunately, not all search sites are created equal. Part I (this issue) explains how search sites work. Part II (next issue) will discuss ways to make the time you spend searching more productive.

Not all search sites are "search engines"; some are directories. The essential difference lies in how pages are entered into their databases: search engines are automated, while directories still rely on human review.

 

Search Engines

Search engines, such as Hotbot and Excite, use "spiders" (programs that follow links from page to page) to crawl the web and find web sites to add to their index. The search engine then often strips out the graphical elements and saves the text of each page. Some search engines also discard very common words, such as "the" or "and." When you run a search, a program sifts through everything in the search engine's index to find pages that may be relevant to the keywords you provided. These indexes tend to be very large: AltaVista (http://altavista.digital.com), consistently one of the largest, has over 140 million entries, and Excite, considered a "medium-sized" search engine, has approximately 55 million.
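To make the crawl-and-index process concrete, here is a minimal Python sketch. It is not how Hotbot, Excite, or AltaVista actually work: it assumes a tiny made-up "web" held in memory instead of fetching real pages, and the example.com URLs and the PAGES, STOPWORDS, crawl_and_index, and search names are all hypothetical. The spider follows links breadth-first, ignores images and other tags, drops stopwords, and records which pages contain each remaining word, an arrangement usually called an inverted index.

```python
import re
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "web": URL -> HTML. A real spider would fetch
# pages over the network; this stand-in keeps the sketch self-contained.
PAGES = {
    "http://example.com/": '<html><title>Home</title><body><img src="logo.gif">'
                           'Welcome to the <a href="http://example.com/books">book</a> page</body></html>',
    "http://example.com/books": "<html><title>Books</title><body>Charlotte's Web and "
                                '<a href="http://example.com/">home</a></body></html>',
}

# Hypothetical stopword list; each real engine chose its own.
STOPWORDS = {"the", "and", "to", "a", "of"}


class TextAndLinks(HTMLParser):
    """Collects visible text and link targets; images and other tags are ignored."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def tokenize(text):
    # Lowercase words; apostrophes are kept so "charlotte's" stays one token.
    return re.findall(r"[a-z0-9']+", text.lower())


def crawl_and_index(start_url):
    """Breadth-first crawl from start_url, building an inverted index: word -> set of URLs."""
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = PAGES.get(url)
        if html is None:
            continue  # unreachable page: skip it
        parser = TextAndLinks()
        parser.feed(html)
        parser.close()
        for word in tokenize(" ".join(parser.text)):
            if word not in STOPWORDS:  # very common words are not indexed
                index.setdefault(word, set()).add(url)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, query):
    """Return the URLs that contain every non-stopword keyword in the query."""
    words = [w for w in tokenize(query) if w not in STOPWORDS]
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for w in words[1:]:
        results &= index.get(w, set())
    return results
```

Running crawl_and_index("http://example.com/") visits both pages, and search(index, "the book") finds only the home page, because "the" is dropped from both the page text and the query.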

How do search engines decide which sites are relevant to your search? It depends on the search engine. Some rely on the words in the page title or in the META tags (text hidden in a page's HTML that lists items such as the author, the authoring software, keywords, and a description of the page). Others rely more heavily on the page's content, the length of the page, or the number of links pointing to it. Some search engines combine several of these factors. For example, Hotbot weights them so that a page with your keywords in its META tags ranks high, but not as high as a page with your keywords in its title. Excite, on the other hand, ranks a page highly when many other pages link to it.
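The weighting idea can be sketched as a scoring function. The numbers below are invented for illustration: the article says only that Hotbot ranks a title match above a META-tag match and that Excite favors pages with many inbound links, so the Page fields, the weights, and the score and rank functions are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical weights: a title match counts more than a META match,
# which counts more than a body match; each inbound link adds a bonus.
TITLE_WEIGHT = 3.0
META_WEIGHT = 2.0
BODY_WEIGHT = 1.0
LINK_WEIGHT = 0.5


@dataclass
class Page:
    url: str
    title: str
    meta_keywords: str
    body: str
    inbound_links: int = 0


def words(text):
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def score(page, query):
    """Sum weighted keyword matches in title, META keywords, and body, plus a link bonus."""
    total = 0.0
    for kw in words(query):
        if kw in words(page.title):
            total += TITLE_WEIGHT
        if kw in words(page.meta_keywords):
            total += META_WEIGHT
        if kw in words(page.body):
            total += BODY_WEIGHT
    return total + LINK_WEIGHT * page.inbound_links


def rank(pages, query):
    """Return pages sorted from highest to lowest score."""
    return sorted(pages, key=lambda p: score(p, query), reverse=True)
```

With these weights, a page whose title contains "spiders" scores 3.0 and one with "spiders" only in its META keywords scores 2.0, so the title match ranks first. A page with no matching keywords but 10 inbound links scores 5.0 and outranks both. Changing the weights is all it takes to turn a title-driven ranking into a link-driven one.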

Some keywords can cause problems in your search. For example, Hotbot treats the word "web" as a stopword, a word the search engine does not index because it appears so frequently. That makes it hard to search for information on the children's book Charlotte's Web. A Hotbot search for the exact phrase Charlotte's Web returned 3,589 possible web sites, and the first relevant one was listed 56th. AltaVista, on the other hand, returned 6,898 possible web sites with a relevant site listed first.
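A short sketch shows how a stopword can blur a phrase search. It assumes, for illustration only, that the engine removes stopwords from both the stored text and the query before matching; the sample pages and the tokens and phrase_search names are hypothetical, and real engines handled stopwords inside phrases in different ways.

```python
import re


def tokens(text, stopwords):
    """Lowercase words in document order, with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in stopwords]


def phrase_search(pages, phrase, stopwords):
    """Return URLs whose stopword-free text contains the phrase as consecutive words."""
    target = tokens(phrase, stopwords)
    n = len(target)
    hits = []
    for url, text in pages.items():
        words = tokens(text, stopwords)
        if n and any(words[i:i + n] == target for i in range(len(words) - n + 1)):
            hits.append(url)
    return hits


# Hypothetical pages: only the first one is about the book.
PAGES = {
    "book": "A review of Charlotte's Web, the children's book",
    "shop": "Charlotte's bakery opens downtown",
}
```

With an empty stopword list, phrase_search(PAGES, "Charlotte's Web", set()) returns only ["book"]. Treating "web" as a stopword shrinks the phrase to "charlotte's", so the bakery page matches too: one plausible way irrelevant pages can crowd out the relevant one.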

To illustrate the overall differences between Hotbot, Excite, and AltaVista, a search for the words professional communication society, using no special search techniques, produced the following results:

  • Hotbot returned 91,910 web pages with the first relevant web site at #31
  • Excite returned 3,090,791 web pages with the first relevant web site at #17
  • AltaVista returned 7,023,510 web pages with a relevant web site at the very top of the list

The difference between nearly 92 thousand, 3 million, and 7 million results is remarkable. Yet even though Excite returned more than 33 times as many pages as Hotbot, Excite listed a relevant site earlier (#17 versus #31). AltaVista, with the largest number of results, placed a relevant site right at the top.

AltaVista provided the most relevant results in these two examples, but that does not mean it is the best search engine; it simply happened to perform best on these particular searches. Part II will discuss ways to narrow searches so that fewer web sites are returned and relevant sites appear near the top of the results list more often, no matter which search engine you are using.

 

Directories

A directory, such as Yahoo, is compiled by people. Web designers submit their sites and suggest categories for them. Directory employees review each submission, decide which categories are appropriate, and only then add the site to their listings. This human review often translates into more accurate results, but it also means far fewer entries: Yahoo has approximately 750,000 entries, compared to the 50 million plus entries of Hotbot, Excite, and AltaVista. It also takes longer for a new site to be listed.
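The submit, review, and list workflow can be modeled in a few lines. This is a hypothetical sketch, not Yahoo's system: the Submission and Directory classes, the example.org URL, and the category path are made up, and it assumes the directory's search looks only at category names and the short titles and descriptions stored with each listing, never at full page text.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Submission:
    url: str
    title: str
    description: str
    suggested_category: str  # the site owner's suggestion; editors may ignore it


def words(text):
    return set(re.findall(r"[a-z0-9']+", text.lower()))


@dataclass
class Directory:
    listings: dict = field(default_factory=dict)  # category path -> list of Submissions
    pending: list = field(default_factory=list)   # submitted but not yet reviewed

    def submit(self, sub):
        """Queue a submission; it cannot be found until an editor reviews it."""
        self.pending.append(sub)

    def review(self, choose_category):
        """Simulate an editor: choose_category(sub) returns a category path, or None to reject."""
        for sub in self.pending:
            category = choose_category(sub)
            if category is not None:
                self.listings.setdefault(category, []).append(sub)
        self.pending.clear()

    def search(self, query):
        """Return (matching categories, matching site URLs), checking only category
        names and the stored titles and descriptions."""
        q = words(query)
        cats = [c for c in self.listings if q <= words(c)]
        sites = [s.url for subs in self.listings.values() for s in subs
                 if q <= words(s.title + " " + s.description)]
        return cats, sites
```

Until review() runs, a submitted site cannot be found at all, mirroring the delay before a site is listed. Afterward, a search for "intellectual property" turns up the category, and a search for "copyright" finds the site through its description.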

Yahoo fared worse on the same two searches. A search for Charlotte's Web returned 11 web sites, none of which had relevant information. A search for professional communication society returned 3 categories and 36 web pages; the categories were only marginally relevant, as were some of the web sites.

This does not mean a directory is a useless search tool. It does mean that different types of searches require different search strategies. A directory such as Yahoo is quite useful if you want to find a listing of web sites discussing intellectual property issues or if you want to find a listing of grant agencies and foundations. In these cases, the categorical listing makes searching easier.

Searching the Net is not as easy as it used to be. Since each search site behaves differently, it is best to try your searches on a few different sites. With a little experimentation, you will find the search site that best meets your needs. Part II of this column, in the next issue, will discuss ways to improve your search techniques at both search engines and directories.


© 2000 Interactive Media Consulting, LLC