Web Content Mining Techniques for Structured Data: A Review
Keywords:
Web Mining, Content Mining, Structured Data Mining, Web Crawler
Abstract
The Web has accumulated vast volumes of data, which makes it difficult to extract the data that users actually need; web mining emerged to address this challenge. It draws on databases, information retrieval, and artificial intelligence. Web mining is a broad, interdisciplinary, and dynamic area comprising three categories: Web Content Mining, Web Structure Mining, and Web Usage Mining. This paper gives an overview of web mining and examines Web Content Mining techniques, such as Wrapper Generation, Page Content Mining, and Web Crawlers, including their classification and the tools used.
References
[1] Shukla, R. K., Sharma, P., Samaiya, N., & Kherajani, M. (2020). WEB USAGE MINING-A Study of Web data pattern detecting methodologies and its applications in Data Mining. In 2020 2nd International Conference on Data, Engineering and Applications (IDEA). IEEE. https://doi.org/10.1109/idea49133.2020.9170690
[2] WIKIPEDIA: "Web mining," URL: https://en.wikipedia.org/wiki/Web_mining
[3] Faustina Johnson and Santosh Kumar Gupta, "Web Content Mining Techniques: A Survey," 2012, International Journal of Computer Application (0975-888), DOI: 10.5120/7236-0266.
[4] Saleh Mowla, Ishita Bedi, and Nisha P.Shetty, "A Study on Web Mining Tools and Techniques," 2017, published in Journal of Engineering and Applied Sciences; DOI: 10.36478/jeasci.2017.6135.6142.
[5] Vandana Shrivastava, "A Methodical Study of Web Crawler," 2018, published in International Journal of Engineering Research and Applications; DOI: 10.9790/9622-0811 01 0108
[6] Mohd Amir Bin Mohd Azir, and Kamsuriah Binti Ahmad, "Wrapper Approaches For Web Data Extraction: A Review," 2017 published in IEEE Conference, DOI: 10.1109/ICEEI.2017.8312458.
[7] Mohd Shoaib and Ashish K. Maurya, "Comparative Study of Different Web Mining Algorithms to Discover Knowledge on the Web," 2014, conference paper (ResearchGate).
[8] SEOBILITY-WIKI: Structured Data; URL: https://www.seobility.net/en/wiki/Structured_Data.
[9] WIKIPEDIA: Semi-structured Data; URL: https://en.wikipedia.org/wiki/Semi-structured_data#cite_note-1
[10] Mylavarapu Kalyan Ram, M.Venkateswara Rao, and Challapalli Sujana, "An Overview on Multimedia Data Mining and Its Relevance Today," 2017, published in IJCST (International Journal of Computer Science Trends and Technology)-Vol. 5; URL: http://ijcstjournal.org/Vol5Issue3No1.html
[11] Lin Xuan Yu, Yeli Li, Qingtao Zeng, Yanxiong Sun, Yuning Bian, and Wei He, "Summary of web crawler technology research," 2020, Journal of Physics: Conference Series – IOP Publishing; DOI: 10.1088/1742-6596/1449/1/012036
[12] WIKIPEDIA: Web crawler; URL: https://en.wikipedia.org/wiki/Web_crawler
[13] Lu Zhang, Zhan Bu, Zhiang Wu, and Jie Cao, "Distributed and generic web crawler for online information extraction," 2017 – IEEE, DOI: 10.1109/BESC.2016.7804487
[14] Anish Gupta and Priya Anand, "FOCUSED WEB CRAWLERS AND ITS APPROACHES," 2015 – IEEE, DOI: 10.1109/ABLAZE.2015.7154936
[15] Kevin S. McCurley, "Incremental Crawling," Google Research; URL: https://research.google.com/pubs/archive/34403.pdf
[16] WIKIPEDIA: Wrapper, URL: https://en.wikipedia.org/wiki/Wrapper_(data_mining)
[17] Andemariam Mebrahtu and Balu Srinivasulu, "Web Content Mining Techniques and Tools," 2017 – IJCSMC; URL: https://www.ijcsmc.com/docs/papers/April2017/V6I4201725.pdf
[18] Kumar, S., & Kumar, R. (2021). “A Study on Different Aspects of Web Mining and Research Issues”. In IOP Conference Series: Materials Science and Engineering (Vol. 1022, Issue 1, p. 012018). IOP Publishing. https://doi.org/10.1088/1757-899x/1022/1/012018
[19] Sharma, P. S., Yadav, D., & Thakur, R. N. (2022). “Web Page Ranking Using Web Mining Techniques: A Comprehensive Survey”. In M. P. Kumar Reddy (Ed.), Mobile Information Systems (Vol. 2022, pp. 1–19). Hindawi Limited. https://doi.org/10.1155/2022/7519573
Published
2022-09-23
How to Cite
Bamboat, M., Khan, G., Mirbahar, N., & Memon, S. (2022). Web Content Mining Techniques for Structured Data: A Review. Journal of Software Engineering, 1(1), 1-10. Retrieved from https://sjhse.smiu.edu.pk/index.php/SJHSE/article/view/23
Section
Articles