Search engines are the key to finding
specific information on the vast expanse of the World Wide Web. Without
sophisticated search engines, it would be virtually impossible to
locate anything on the Web without already knowing its URL, especially
as the Web continues to grow every day. But do you
know how search engines work? And do you know what makes some search
engines more effective than others?
There are three basic types of
search engines: those that are powered by crawlers, or spiders; those
that are powered by human submissions; and those that are a combination
of the two.
- Crawler-based engines send crawlers,
or spiders, out into cyberspace. These crawlers visit a Web site, read
the information on the actual site, read the site's meta tags, and
follow the links that the site connects to. The crawler returns all
that information to a central repository, where the data is
indexed. The crawler periodically returns to the sites to check
for any information that has changed; how often this happens is
determined by the administrators of the search engine.
- Human-powered search engines rely on
humans to submit information that is subsequently indexed and
catalogued. Only information that is submitted is put into the index.
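
The crawler-based process in the first item above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular engine works: the `PAGES` dictionary is a hypothetical stand-in for the Web (a real crawler would fetch pages over HTTP), and the names `PageReader` and `crawl` are invented for this example.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "Web": URL -> HTML. No network access is used.
PAGES = {
    "http://a.example/": (
        '<html><head><meta name="keywords" content="search, engines"></head>'
        '<body>How search engines work. <a href="http://b.example/">next</a>'
        "</body></html>"
    ),
    "http://b.example/": (
        '<html><head><meta name="description" content="About spiders"></head>'
        '<body>Spiders follow links. <a href="http://a.example/">back</a>'
        '<a href="http://gone.example/">dead</a></body></html>'
    ),
}


class PageReader(HTMLParser):
    """Collects visible text, meta tags, and outgoing links from one page."""

    def __init__(self):
        super().__init__()
        self.chunks, self.meta, self.links = [], {}, []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and "name" in attrs:
            self.meta[attrs["name"]] = attrs.get("content", "")
        elif tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def crawl(start_url, pages):
    """Visit pages breadth-first from start_url and return what was read."""
    store, queue, seen = {}, [start_url], {start_url}
    while queue:
        url = queue.pop(0)
        html = pages.get(url)
        if html is None:  # page could not be reached: nothing to record
            continue
        reader = PageReader()
        reader.feed(html)
        store[url] = {
            "text": " ".join(reader.chunks),
            "meta": reader.meta,
            "links": reader.links,
        }
        for link in reader.links:  # follow the links the page connects to
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store


# The returned store plays the role of the central repository to be indexed.
store = crawl("http://a.example/", PAGES)
```

Running `crawl` again later and comparing the result with the earlier store would correspond to the periodic return visits described above.
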
In both cases, when you query a search
engine to locate information, you are actually searching through the
index that the search engine has created; you are not actually searching
the Web. These indices are giant databases of information that is
collected, stored, and subsequently searched. This explains why
a search on a commercial search engine, such as Yahoo! or
Google, will sometimes return results that are in fact dead links. Since the
search results are based on the index, if the index hasn't been updated
since a Web page became invalid, the search engine treats the page as
still an active link even though it no longer is. It will remain that
way until the index is updated.
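
To make the point about stale results concrete, here is a minimal sketch of an index that is searched instead of the Web itself. It assumes a toy word-to-URL index; the `live_web` data and the function names `build_index` and `search` are hypothetical, and real engines store far more than this.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contained it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word.strip(".,!?")].add(url)
    return index


def search(index, query):
    """Return URLs containing every query word, consulting only the index."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical snapshot of the Web at indexing time.
live_web = {
    "http://a.example/": "Search engines index the Web.",
    "http://b.example/": "Spiders crawl the Web.",
}
index = build_index(live_web)

# A page disappears after the index was built.
del live_web["http://b.example/"]

stale = search(index, "spiders")   # the old index still lists the dead page
index = build_index(live_web)      # updating the index removes it
fresh = search(index, "spiders")
```

Here `stale` still contains the removed page because the query never touches `live_web`, while `fresh` is empty once the index has been rebuilt.
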
So why will the same search on
different search engines produce different results? Part of the answer
is that not all indices are exactly the same; each depends on what the
spiders found or what humans submitted. More importantly, not every
search engine uses the same algorithm to search through its index. The
algorithm is what a search engine uses to determine the relevance of
the information in the index to what the user is searching for.
One of the elements that a search
engine algorithm scans for is the frequency and location of keywords on
a Web page. Pages on which the keywords appear more frequently are
typically considered more relevant. But search engine technology is
becoming increasingly sophisticated in its attempts to discourage what
is known as keyword stuffing, or spamdexing.
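
As a rough illustration of scoring by keyword frequency and location, the sketch below weights a keyword in the page title more heavily than one in the body and caps how much repetition can contribute. The function name `keyword_score`, the weights, and the cap are all assumptions chosen for this example; they are not taken from any real search engine.

```python
import re


def tokenize(text):
    """Lowercase text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(title, body, keyword, title_weight=3.0, max_density=0.2):
    """Score a page for one keyword by frequency and location.

    title_weight and max_density are illustrative values only.
    """
    keyword = keyword.lower()
    title_hits = tokenize(title).count(keyword)
    body_words = tokenize(body)
    if not body_words:
        return title_weight * title_hits
    body_hits = body_words.count(keyword)
    # Crude guard against keyword stuffing: cap the share of the body
    # that a single keyword is allowed to contribute.
    density = min(body_hits / len(body_words), max_density)
    return title_weight * title_hits + density * 100


body = "Compare prices for a used car in your area before you buy."
with_title = keyword_score("Used car prices", body, "car")
without_title = keyword_score("Price guide", body, "car")

# Repeating the keyword beyond the cap earns nothing extra.
stuffed_10 = keyword_score("Deals", "car " * 10, "car")
stuffed_50 = keyword_score("Deals", "car " * 50, "car")
```

With these inputs, `with_title` is higher than `without_title` because of the location weighting, and `stuffed_10` equals `stuffed_50` because of the density cap.
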
Another common element that algorithms
analyze is the way that pages link to other pages on the Web. By
analyzing how pages link to each other, an engine can determine both
what a page is about (if the keywords of the linked pages are similar to
the keywords on the original page) and whether that page is considered
"important" and deserving of a boost in ranking. Just as the technology
is becoming increasingly sophisticated at ignoring keyword stuffing, it is
also becoming savvier about Webmasters who build artificial links into
their sites in order to inflate their ranking.
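
One way to picture link-based "importance" is a PageRank-style calculation, in which each page repeatedly passes a share of its score to the pages it links to. The article does not say which method any engine uses, so the sketch below is an assumption for illustration only; the function name `link_scores`, the `damping` value, and the sample graph are hypothetical.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Estimate page importance from the link graph (PageRank-style sketch).

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                for other in pages:
                    new_scores[other] += damping * scores[page] / n
        scores = new_scores
    return scores


# Hypothetical graph: three pages link to "c", and "c" links back to "a".
graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_scores(graph)
most_important = max(scores, key=scores.get)
```

In this graph the page with the most inbound links, "c", ends up with the highest score. A scheme this simple is easy to game with artificial links, which is why engines look for ways to discount them.
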