International Journal of Science and Technology

An Analysis of Web Document Clustering Algorithms

Authors K. Sridevi, R. Umarani, V. Selvi
On Pages 275-282
Volume No. 1
Issue No. 6
Issue Date December 01, 2011
Publishing Date December 01, 2011
Keywords Information Retrieval, Document Clustering, Search Results Clustering, Web Clustering Engines


The amount of information available on the World Wide Web, the largest shared information source, has increased tremendously, and finding relevant information on it can be overwhelming. Even with today’s search engines indexing the web, it is difficult to wade through the large number of documents returned in response to a user query. Furthermore, users without domain expertise are often unfamiliar with the appropriate terminology and so do not submit the right query terms, which leads to the retrieval of more irrelevant pages; nor do the most relevant documents necessarily appear at the top of the result list. Users of web search engines are thus often forced to sift through the long ordered list of document “snippets” returned by the engines. This has led to the need to organize large sets of documents into categories through clustering. The Information Retrieval community has explored document clustering as an alternative method of organizing retrieval results. Grouping similar documents into clusters helps users find relevant information more quickly and allows them to focus their search in the appropriate direction. Various web document clustering techniques are now being used to give meaningful search results on the web. This paper presents an analysis of the various categories of web document clustering and of the existing web clustering engines together with the clustering techniques they use.

© 2010 IRPN Publishers