Please use this identifier to cite or link to this item: http://dspace.ucuenca.edu.ec/handle/123456789/34490
Title: A Comparative Evaluation of Preprocessing Techniques for Short Texts in Spanish
Authors: Orellana Cordero, Marcos Patricio
Trujillo, Andrea
Cedillo Orellana, Irene Priscila
Corresponding author: Cedillo Orellana, Irene Priscila, priscila.cedillo@ucuenca.edu.ec
Keywords: Text mining
Sentiment analysis
Twitter
Preprocessing
Natural Language Processing
Issue Date: 2020
Volume: 1130
Source: Advances in Intelligent Systems and Computing
DOI: 10.1007/978-3-030-39442-4_10
Publisher: Springer
City: San Francisco
Type: Conference paper
Abstract: 
Natural Language Processing (NLP) is used to identify key information, generate predictive models, and explain global events or trends. NLP also supports the process of creating knowledge. It is therefore important to apply refinement techniques in major stages such as preprocessing, where data is frequently produced and processed with poor results. This document analyzes and measures the impact of combinations of preprocessing techniques and libraries on short texts written in Spanish. The techniques were applied to tweets for sentiment analysis, considering the evaluation parameters of the analysis, the processing time, and the characteristics of the techniques in each library. The experiments give readers insight into choosing an appropriate combination of techniques during preprocessing. The results show an improvement of 5% to 9% in classification performance.
URI: http://dspace.ucuenca.edu.ec/handle/123456789/34490
https://link.springer.com/chapter/10.1007/978-3-030-39442-4_10
Source URI: https://link.springer.com/book/10.1007/978-3-030-39442-4
ISBN: 978-3-030-39441-7
ISSN: 2194-5357
Appears in Collections: Articles

Files in This Item:
File: documento.pdf
Size: 363.43 kB
Format: Adobe PDF


This item is protected by original copyright


