A ranking-based approach for supporting the initial selection of primary studies in a Systematic Literature Review
Loading...
Date
2019
Journal Title
Journal ISSN
Volume Title
Publisher
Institute of Electrical and Electronics Engineers Inc.
Abstract
Traditionally most of the steps involved in a Systematic Literature Review (SLR) process are manually executed, causing inconvenience of time and effort, given the massive amount of primary studies available online. This has motivated a lot of research focused on automating the process. Current state-of-the-art methods combine active learning methods and manual selection of primary studies from a smaller set so they can maximize the finding of relevant papers while at the same time minimizing the number of manually reviewed papers. In this work, we propose a novel strategy to further improve these methods whose early success heavily depends on an effective selection of initial papers to be read by researchers using a PCAbased method which combines different document representation and similarity metric approaches to cluster and rank the content within the corpus related to an enriched representation of research questions within the SLR protocol. Validation was carried out over four publicly available data sets corresponding to SLR studies from the Software Engineering domain. The proposed model proved to be more efficient than a BM25 baseline model as a mechanism to select the initial set of relevant primary studies within the top 100 rank, which makes it a promising method to bootstrap an active learning cycle.
Resumen
Keywords
data mining, machinelearning, NLPPCA, ranking indexing, Systematic literature review, automation, knowledge graphs, text mining
