Detecting Spam Blogs: A Machine Learning Approach

  • Himanshu Jain, Student, Department of Computer Science Engineering, Global Institute of Technology, Jaipur, Rajasthan, India

Abstract

Weblogs, or blogs, are an important new way to publish information, take part in discussions, and form communities on the Internet. Unfortunately, the Blogosphere has been contaminated by several varieties of spam-like content. Blog search engines, for instance, are inundated by posts from splogs: fake blogs with machine-generated or hijacked content whose sole purpose is to host advertisements or raise the PageRank of target sites. We discuss how SVM models based on local and link-based features can be used to detect splogs. We present an evaluation of the learned models and their utility to blog search engines, systems that employ techniques differing from those of conventional web search engines.
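
As a rough illustration of the idea of an SVM over local and link-based features, the sketch below trains a small linear SVM on word features (local) and linked-host features (link-based). It is not the article's implementation: the feature names, the toy training posts, the `ads.example` and `friend-blog.example` hosts, the hyperparameters, and the use of plain subgradient descent on the hinge loss without a bias term are all assumptions made for this example.

```python
from collections import Counter
from urllib.parse import urlparse


def extract_features(text, outlinks):
    """Build a sparse feature vector from words (local) and linked hosts (link-based)."""
    features = Counter("word:" + token for token in text.lower().split())
    for url in outlinks:
        features["host:" + urlparse(url).netloc.lower()] += 1
    return features


def score(weights, features):
    """Linear score w . x (this sketch deliberately omits a bias term)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def train_linear_svm(examples, labels, epochs=20, learning_rate=0.1, reg=0.01):
    """Minimise L2-regularised hinge loss with per-example subgradient steps.

    labels: +1 for a splog, -1 for an authentic blog.
    """
    weights = {}
    for _ in range(epochs):
        for features, label in zip(examples, labels):
            violated = label * score(weights, features) < 1.0
            # Regularisation step: shrink every weight toward zero.
            for name in weights:
                weights[name] *= 1.0 - learning_rate * reg
            # Hinge-loss step: only examples inside the margin move the weights.
            if violated:
                for name, value in features.items():
                    weights[name] = weights.get(name, 0.0) + learning_rate * label * value
    return weights


def is_splog(weights, features):
    """Classify as splog when the linear score is strictly positive."""
    return score(weights, features) > 0.0


# Hypothetical toy data: (post text, outgoing links, label).
TRAINING = [
    ("buy cheap pills online now", ["http://ads.example/pills"], +1),
    ("cheap loans click here now", ["http://ads.example/loans"], +1),
    ("today i visited my grandmother", ["http://friend-blog.example/post"], -1),
    ("my thoughts on the new film", [], -1),
]
examples = [extract_features(text, links) for text, links, _ in TRAINING]
labels = [label for _, _, label in TRAINING]
weights = train_linear_svm(examples, labels)

new_post = extract_features("cheap pills here", ["http://ads.example/offer"])
print(is_splog(weights, new_post))
```

A real system would need far more training data and richer features than this toy set. The sparse dictionary representation is used here only because blog vocabularies and link targets are large and mostly absent from any single post.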


How to cite this article:
Jain H, Thakur H. Detecting Spam Blogs: A Machine Learning Approach. J Adv Res Appl Arti Intel Neural Netw 2020; 4(2): 12-18.

Published
2021-09-28
How to Cite
JAIN, Himanshu. Detecting Spam Blogs: A Machine Learning Approach. Journal of Advanced Research in Applied Artificial Intelligence and Neural Network, [S.l.], v. 4, n. 2, p. 12-18, sep. 2021. Available at: <http://thejournalshouse.com/index.php/neural-network-intelligence-adr/article/view/354>. Date accessed: 22 dec. 2024.