Wednesday, March 11, 2020

FEATURE SELECTION-MODEL-BASED CONTENT ANALYSIS FOR COMBATING WEB SPAM


Author :  Shipra Mittal

Affiliation :  Department of Computer Science & Engineering, National Institute of Technology

Country :  India

Category :  Computer Science & Information Technology

Volume, Issue, Month, Year :  6, 4, November, 2016

ABSTRACT

With the growth of the Internet and the World Wide Web, information retrieval (IR) has attracted much attention in recent years. Fast, accurate, high-quality information mining is the core concern of successful search companies. Spammers, in turn, try to manipulate IR systems to serve their own covert aims. Spamdexing (also known as web spamming) is an adversarial IR technique that lets spammers manipulate the ranking of specific documents on the search engine results page (SERP). Spammers exploit different features of the web indexing system for malicious purposes. Suitable machine learning approaches can help analyse spam patterns and detect spam automatically. This paper examines content-based features of web documents and discusses the potential of feature selection (FS) in future studies to combat web spam. The objective of feature selection is to select the salient features, both to improve prediction performance and to understand the underlying data generation process. A publicly available web data set, WEBSPAM-UK2007, is used for all evaluations.

Keywords :  Web Spamming, Spamdexing, Content Spam, Feature Selection, Adversarial IR
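
To make the idea of feature selection mentioned in the abstract more concrete, here is a minimal sketch of one simple filter-style approach: score each content feature by the absolute Pearson correlation between its values and the spam label, then keep the top k. This is an illustration only, not the method used in the paper. The feature names, the toy values, and the helper names `pearson` and `select_top_k` are all assumptions made up for this example; they are not taken from WEBSPAM-UK2007.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_top_k(rows, labels, names, k):
    """Rank features by |correlation with the label| and return the k best plus all scores."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = abs(pearson(column, labels))
    ranked = sorted(names, key=lambda name: scores[name], reverse=True)
    return ranked[:k], scores


# Hypothetical content features and made-up values, for illustration only.
FEATURE_NAMES = ["num_words", "avg_word_length", "compression_ratio"]
ROWS = [
    [120, 4.8, 2.1],
    [3400, 5.1, 6.3],
    [150, 5.0, 2.4],
    [2900, 4.9, 5.8],
    [200, 5.2, 2.0],
    [3100, 4.7, 6.0],
]
LABELS = [0, 1, 0, 1, 0, 1]  # 1 = spam, 0 = non-spam (toy labels)

selected, scores = select_top_k(ROWS, LABELS, FEATURE_NAMES, k=2)
print("selected features:", selected)
```

A correlation filter like this is cheap and independent of any classifier, which makes it a convenient baseline; wrapper or embedded methods that evaluate feature subsets with the classifier itself are the usual alternatives.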

For More Details :  https://airccj.org/CSCP/vol6/csit65103.pdf

