Author : Shipra Mittal
Affiliation : Department of Computer Science & Engineering, National Institute of Technology
Country : India
Category : Computer Science & Information Technology
Volume, Issue, Month, Year : 6, 4, November, 2016
With the growth of the Internet and the World Wide Web, information retrieval (IR) has attracted much attention in recent years. Quick, accurate and high-quality information mining is the core concern of successful search companies. At the same time, spammers try to manipulate IR systems to serve their own covert ends. Spamdexing (also known as web spamming) is one of the spamming techniques of adversarial IR, allowing users to manipulate the ranking of specific documents on the search engine result page (SERP). Spammers take advantage of different features of the web indexing system for nefarious motives. Suitable machine learning approaches can be useful in the analysis of spam patterns and the automated detection of spam. This paper examines content-based features of web documents and discusses the potential of feature selection (FS) in upcoming studies to combat web spam. The objective of feature selection is to select the salient features in order to improve prediction performance and to understand the underlying data generation techniques. A publicly available web data set, namely WEBSPAM-UK2007, is used for all evaluations.
Keywords : Web Spamming, Spamdexing, Content Spam, Feature Selection, Adversarial IR
For More Details : https://airccj.org/CSCP/vol6/csit65103.pdf