Web-Search
Available search tools on the Web fall into two categories: net directories and search engines. Net directories, such as Yahoo!, give a hierarchical classification of documents; each document in the directory is associated with a node of the tree (either a leaf or an internal node).
Moving along the tree, a user can access a set of pages that have been manually pre-classified and placed in the tree. Yahoo!, for example, consists today of a classification tree with a depth of 10 or more (depending on the path followed). About 10-30 branches at each level of the tree lead to a total of a few hundred thousand pages. Search in a net directory is very convenient and usually leads users to the set of documents they are seeking, but it covers only a small fraction of the Web (often the commercial part). This limited coverage stems from the slow rate of manual classification.
Search engines such as AltaVista and Excite cover a large portion of the Web. The drawback of these search engines is that they only support syntactic, keyword-oriented search, i.e., the search returns a list of pages that include a given set of keywords (or phrases). Most queries return either no pages or a long list of pages, all of which include the given keywords, but most of which are irrelevant. The user must manually browse one document after another to find the page(s) sought.
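To make the problem concrete, here is a minimal sketch of a purely syntactic keyword search. The `PAGES` data and the `keyword_search` function are made up for illustration and are not how any real engine is implemented; the point is only that matching on words says nothing about meaning.

```python
# Toy corpus (hypothetical): page name -> page text.
PAGES = {
    "jaguar-cars": "the jaguar is a british luxury car with high speed",
    "jaguar-cats": "the jaguar is a big cat that hunts at high speed",
    "python-docs": "python is a programming language",
}


def keyword_search(pages, keywords):
    """Return the names of pages whose text contains every keyword."""
    wanted = [k.lower() for k in keywords]
    hits = []
    for name, text in pages.items():
        words = set(text.lower().split())
        if all(k in words for k in wanted):
            hits.append(name)
    return hits


# Both jaguar pages match, although the user probably wanted only one of them.
print(keyword_search(PAGES, ["jaguar", "speed"]))
# A query with a keyword that appears nowhere returns nothing at all.
print(keyword_search(PAGES, ["jaguar", "linux"]))
```

The first query returns the car page and the cat page alike, and the second returns an empty list, which is the "long list or no page" behaviour described above in miniature.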
Some search engines offer “advanced” search features that enable Boolean combinations of search terms to improve the precision of the search. Aside from the limited improvement this can afford, one should not expect non-computer-literate users (whose ranks are growing) to be experienced at forming such Boolean formulas. (Note also that the “find similar pages” features in Excite and Infoseek require the user to first find at least one relevant page using syntactic search.)
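As a rough illustration of what a Boolean combination buys you, the sketch below adds "must include", "must not include" and "any of" conditions to a word-matching search. The function name `boolean_search`, its parameters and the sample pages are all assumptions made up for this example, not the query syntax of any particular engine.

```python
# Toy corpus (hypothetical): page name -> page text.
PAGES = {
    "jaguar-cars": "the jaguar is a british luxury car with high speed",
    "jaguar-cats": "the jaguar is a big cat that hunts at high speed",
    "python-docs": "python is a programming language",
}


def boolean_search(pages, must=(), must_not=(), any_of=()):
    """Return pages matching: all of `must` AND none of `must_not`
    AND (if given) at least one of `any_of`."""
    hits = []
    for name, text in pages.items():
        words = set(text.lower().split())
        if not all(k in words for k in must):
            continue
        if any(k in words for k in must_not):
            continue
        if any_of and not any(k in words for k in any_of):
            continue
        hits.append(name)
    return hits


# "jaguar AND NOT car" narrows the result to the animal page.
print(boolean_search(PAGES, must=["jaguar"], must_not=["car"]))
```

The query is more precise, but the user has to know in advance which word to exclude, which is exactly the skill the paragraph above says most people do not have.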
So how does Google work?
At the heart of Google's software is a system called PageRank, which basically gives every site on the Internet a rank from 0 to 10. So how is this calculated? Well, the PageRank of your site is determined by the links to your web site. Each time somebody adds a link to your web site, Google interprets this as a vote for your site. The more links you have to your site, the more votes you get.
Google also looks a little deeper than just sheer volume of links, and analyses the importance of the web site that has cast a vote for your site. Sites that Google determines are important are those with a higher PageRank.
So a link to you from a site with a PageRank of 6 is better than a link from a site with a PageRank of 3. In fact, 1 link from a site with a PageRank of 6 is better than 10 links from PageRank 3 sites.
Still following? Almost there. When Google is determining how important the link to your site is, it also checks how many other links are on the page that links to you.
Take a PageRank 6 page, for example. If it has 1000 links on it, with your site being one of them, Google will determine that the page’s ‘vote’ for your web site is only worth 1/1000 of the PageRank 6 value. If there were only 3 other links on that page, its ‘vote’ for your site would be interpreted by Google as much more important.
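The arithmetic can be sketched in a few lines. Everything here is a simplified illustration, not Google's actual code: `vote_value` just divides a page's rank by its link count as described above, and `page_rank` is a toy version of the iterative calculation in which the link graph `LINKS`, the damping factor of 0.85 and the iteration count are assumptions chosen for the example. The ranks it produces are fractions that sum to 1, not the 0 to 10 scale.

```python
def vote_value(rank, links_on_page):
    """Share of a page's rank passed along by each of its links."""
    return rank / links_on_page


def page_rank(links, damping=0.85, iterations=50):
    """Toy PageRank over a dict mapping page -> list of pages it links to.

    Assumes every linked page is also a key in `links` and that every
    page has at least one outgoing link (true for the sample graph).
    """
    pages = list(links)
    n = len(pages)
    ranks = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline rank.
        new_ranks = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            # A page splits its current rank evenly among its links.
            share = ranks[page] / len(outgoing)
            for target in outgoing:
                new_ranks[target] += damping * share
        ranks = new_ranks
    return ranks


# The example from the text: one link among 1000 versus one among 4.
print(vote_value(6, 1000))
print(vote_value(6, 4))

# Hypothetical four-page web: "c" is linked from three pages, "d" from none.
LINKS = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = page_rank(LINKS)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In the sample graph, "c" ends up on top because it collects the most votes, "a" comes a close second because its single vote comes from the highly ranked "c", and "d" sits at the bottom because nobody links to it.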