Extraction of Web News from Web Pages Using a Ternary Tree Approach
- Title
- Extraction of Web News from Web Pages Using a Ternary Tree Approach
- Creator
- Laishram D.; Sebastian M.
- Description
- Given the spread of information available on the World Wide Web, the pursuit of quality data appears effortless and simple, but it remains a significant matter of concern. Various extractors and wrapper systems with advanced techniques have been studied that retrieve the desired data from a collection of web pages. In this paper we propose a method for extracting news content from multiple news web sites by exploiting similar patterns in their representation, such as the date, the place, and the content of the news. The method overcomes the cost and space constraints observed in previous studies, which work on a single web document at a time. It is an unsupervised web extraction technique that builds a pattern representing the structure of the pages from extraction rules learned from those pages: it creates a ternary tree that expands when a series of common tags is found in the web pages. The pattern can then be used to extract news from new web pages. The analysis and the results on real-time web sites validate the effectiveness of our approach. © 2015 IEEE.
- Source
- Proceedings - 2015 2nd IEEE International Conference on Advances in Computing and Communication Engineering, ICACCE 2015, pp. 628-633.
- Date
- 2015-01-01
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Subject
- Pattern generation; Quality data; Ternary tree; Unsupervised web extraction; Web extractors
- Coverage
- Laishram D., Department of Computer Science and Engineering, Christ University Faculty of Engineering, Bangalore, India; Sebastian M., Department of Computer Science and Engineering, Christ University Faculty of Engineering, Bangalore, India
- Rights
- Restricted Access
- Relation
- ISBN: 978-147991734-1
- Format
- Online
- Language
- English
- Type
- Conference paper
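The description above outlines learning a structural pattern from tags shared across several news pages and then applying it to new pages. The following is a minimal Python sketch of that general idea only. It does not reproduce the paper's ternary tree construction, which this record does not specify. Instead it assumes a simpler stand-in: a pattern is the set of tag paths that occur in every training page and whose text differs between pages. The sample pages and the names `tag_paths`, `learn_pattern`, and `extract` are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag; skipped so they do not pollute the path.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagPathText(HTMLParser):
    """Collect (tag path, text) pairs such as ('html/body/div/p', 'Some text')."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.items.append(("/".join(self.stack), text))


def tag_paths(html):
    parser = TagPathText()
    parser.feed(html)
    parser.close()
    return parser.items


def learn_pattern(pages):
    """Assumed heuristic: keep paths present in every page whose text varies."""
    per_page = []
    for page in pages:
        texts = {}
        for path, text in tag_paths(page):
            texts.setdefault(path, []).append(text)
        per_page.append(texts)
    common = set(per_page[0])
    for texts in per_page[1:]:
        common &= set(texts)
    first = per_page[0]
    return {p for p in common if any(t[p] != first[p] for t in per_page[1:])}


def extract(pattern, html):
    """Return the (path, text) pairs of a new page that match the pattern."""
    return [(path, text) for path, text in tag_paths(html) if path in pattern]


# Made-up pages sharing one template (illustrative data only).
TEMPLATE = ("<html><body><div><h1>Site News</h1>"
            "<span>{date}</span><p>{body}</p></div></body></html>")
page_a = TEMPLATE.format(date="1 January", body="Floods hit the city.")
page_b = TEMPLATE.format(date="2 January", body="Election results announced.")
new_page = TEMPLATE.format(date="3 January", body="New bridge opens.")

pattern = learn_pattern([page_a, page_b])
result = extract(pattern, new_page)
print(result)
```

In this sketch the shared heading is dropped because its text is identical on both training pages, while the date and body paths are kept because their text varies. At least two training pages are needed; with a single page nothing varies and the learned pattern is empty.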
Citation
Laishram D.; Sebastian M., “Extraction of Web News from Web Pages Using a Ternary Tree Approach,” CHRIST (Deemed To Be University) Institutional Repository, accessed February 23, 2025, https://archives.christuniversity.in/items/show/21025.