GWebPositionRank: Unsupervised Graph and Web-based Keyphrase Extraction form BERT Embeddings
- Title
- GWebPositionRank: Unsupervised Graph and Web-based Keyphrase Extraction form BERT Embeddings
- Creator
- Jose J.; Soundarabai B.P.
- Description
- Automatic keyphrase extraction, which identifies the descriptive phrases that represent the main content of a document, is a preliminary task in many Natural Language Processing (NLP) applications. Because supervised methods need a large amount of labelled training data, an unsupervised approach is well suited to keyphrase extraction and ranking. Keyphrase Extraction with BERT Transformers (KeyBERT) uses BERT embeddings and ranks candidate keyphrases by cosine similarity. However, ranking by cosine similarity alone does not consider the spatial dimension, that is, keyphrase position, either locally or globally. Hence, this work enhances the KeyBERT-based method with a Graph-based WebPositionRank (GWebPositionRank) design. The proposed unsupervised GWebPositionRank combines graph-based ranking, which provides the local analysis, with web-based ranking, which provides the global analysis. To examine keyphrases spatially, the approach analyses keyphrase position at the document level through graph-based ranking and at the web level through the WebPositionRank algorithm. The approach first extracts coarse-grained keyphrases with the KeyBERT model and then ranks them in two stages, yielding quality keyphrases and then fine-tuned keyphrases. Quality keyphrase ranking uses document-level position analysis and four graph centrality measures computed on a textual graph constructed for each document, whereas fine-tuned keyphrase ranking applies web-level position analysis and diversity computation to the quality keyphrases produced by the graph-based ranking. In this way, GWebPositionRank extracts a set of potential keyphrases for each document. The experimental results show that the proposed unsupervised algorithm outperforms the comparative baseline models on the SemEval2017 dataset. © 2024 IEEE.
- Source
- Proceedings of ICWITE 2024: IEEE International Conference for Women in Innovation, Technology and Entrepreneurship, pp. 45-52.
- Date
- 2024-01-01
- Publisher
- Institute of Electrical and Electronics Engineers Inc.
- Subject
- Graph-based Centrality measures; KeyBERT; Keyphrase Extraction; Keyphrase semantic diversity; Semantic similarity measure; Wikipedia source
- Coverage
- Jose J., Christ Deemed to Be University, Dept. of Computer Science, Bangalore, India; Soundarabai B.P., Christ Deemed to Be University, Dept. of Computer Science, Bangalore, India
- Rights
- Restricted Access
- Relation
- ISBN: 979-835038328-7
- Format
- Online
- Language
- English
- Type
- Conference paper
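Illustrative sketch (not part of the record or the paper): the Description mentions ranking candidate keyphrases by cosine similarity and analysing their position in the document. The Python below shows both ideas on toy data. The three-dimensional vectors stand in for BERT embeddings, the inverse-position weight is a PositionRank-style assumption, and multiplying the two scores is a hypothetical combination chosen only for illustration. The function names are likewise hypothetical, and none of this is the authors' GWebPositionRank algorithm.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_u * norm_v)


def position_weight(tokens, phrase):
    """Sum of 1/position over every occurrence of the phrase (1-based).

    Assumption: earlier and more frequent occurrences earn a larger weight.
    """
    words = phrase.split()
    n = len(words)
    weight = 0.0
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == words:
            weight += 1.0 / (i + 1)
    return weight


def rank_candidates(doc_vec, cand_vecs, tokens):
    """Rank candidates by similarity * position weight (hypothetical score)."""
    scored = []
    for phrase, vec in cand_vecs.items():
        score = cosine_similarity(doc_vec, vec) * position_weight(tokens, phrase)
        scored.append((phrase, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy inputs: made-up vectors in place of real BERT embeddings.
tokens = "keyphrase extraction ranks candidate phrases using graph centrality".split()
doc_vec = [1.0, 1.0, 0.0]
cand_vecs = {
    "graph centrality": [1.0, 0.0, 0.0],
    "keyphrase extraction": [1.0, 1.0, 0.0],
}
ranked = rank_candidates(doc_vec, cand_vecs, tokens)
for phrase, score in ranked:
    print(f"{phrase}: {score:.3f}")
```

With these toy inputs, "keyphrase extraction" ranks first because its vector is parallel to the document vector and it occurs at the start of the text. In a real pipeline the vectors would come from an embedding model, and the scoring would follow the paper's own definitions.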
Citation
Jose J.; Soundarabai B.P., “GWebPositionRank: Unsupervised Graph and Web-based Keyphrase Extraction form BERT Embeddings,” CHRIST (Deemed To Be University) Institutional Repository, accessed February 25, 2025, https://archives.christuniversity.in/items/show/19479.