International Journal of Information Studies, Vol 1, No 4 (2009)

A Persian Web Page Classifier Applying a Combination of Content-Based and

Mojgan Farhoodi, Alireza Yari, Maryam Mahmoudi


Abstract


Many automatic classification methods and algorithms have been proposed that rely on content-based or context-based features of web pages. In this paper we analyze these features and exploit a combination of them to improve the accuracy of Persian web page classification. We propose a linear combination of different features, with the weights adjusted to their optimum values during application. To evaluate this approach, we conducted various experiments on a dataset consisting of all pages of Persian Wikipedia in the computer field. These experiments demonstrate the usefulness of combining content-based and context-based web page features in a linear weighted combination.
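
The linear weighted combination described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The feature source names (`content`, `title`), the category labels, the per-category scores, and the grid search used to pick the weights are all assumptions made for illustration; the abstract states only that the features are combined linearly and that the weights are adjusted to their optimum.

```python
from itertools import product


def combine_scores(feature_scores, weights):
    """Linearly combine per-category scores from several feature sources.

    feature_scores: {source_name: {category: score}}  (hypothetical layout)
    weights:        {source_name: weight}
    """
    combined = {}
    for source, scores in feature_scores.items():
        w = weights.get(source, 0.0)  # sources without a weight contribute nothing
        for category, score in scores.items():
            combined[category] = combined.get(category, 0.0) + w * score
    return combined


def classify(feature_scores, weights):
    """Return the category with the highest combined score."""
    combined = combine_scores(feature_scores, weights)
    return max(combined, key=combined.get)


def tune_weights(labeled_pages, sources, steps=4):
    """Assumed tuning procedure: exhaustive grid search over non-negative
    weights that sum to 1, keeping the first setting with the best accuracy.

    labeled_pages: list of (feature_scores, true_category) pairs
    """
    best_weights, best_accuracy = None, -1.0
    for combo in product(range(steps + 1), repeat=len(sources)):
        if sum(combo) != steps:
            continue  # keep only weight vectors that sum to 1
        weights = {s: c / steps for s, c in zip(sources, combo)}
        correct = sum(
            classify(scores, weights) == label for scores, label in labeled_pages
        )
        accuracy = correct / len(labeled_pages)
        if accuracy > best_accuracy:
            best_weights, best_accuracy = weights, accuracy
    return best_weights, best_accuracy


# Toy example with made-up scores for one page.
page = {
    "content": {"hardware": 0.6, "software": 0.4},  # e.g. body-text evidence
    "title": {"hardware": 0.2, "software": 0.8},    # e.g. title evidence
}
print(classify(page, {"content": 0.5, "title": 0.5}))
```

With equal weights, the title evidence outweighs the body-text evidence for this toy page, so the combined score favours `software`. In practice the weights would be tuned on labeled pages held out from the evaluation set.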