Outlier detection with data mining techniques and statistical methods

Orellana Cordero, Marcos Patricio; Cedillo Orellana, Irene Priscila

Please use this identifier to cite or link to this item: http://dspace.ucuenca.edu.ec/handle/123456789/34510

Title:	Outlier detection with data mining techniques and statistical methods
Author:	Orellana Cordero, Marcos Patricio; Cedillo Orellana, Irene Priscila
Keywords:	Chi-square; Data mining; Financial fraud; KNN; Outlier
FRASCATI broad knowledge area:	2. Engineering and Technology
FRASCATI detailed knowledge area:	2.2.4 Communication and Systems Engineering
FRASCATI specific knowledge area:	2.2 Electrical, Electronic and Information Engineering
UNESCO broad knowledge area:	07 - Engineering, Manufacturing and Construction
UNESCO detailed knowledge area:	0714 - Electronics and Automation
UNESCO specific knowledge area:	071 - Engineering and Engineering Trades
Publication date:	2019
Embargo end date:	15-Jun-2050
Volume:	Volume 11, no. 1
Source:	Proceedings - 2019 International Conference on Information Systems and Computer Science, INCISCOS 2019
DOI:	10.1109/INCISCOS49368.2019.00017
Publisher:	Institute of Electrical and Electronics Engineers Inc.
City:	Quito
Type:	CONFERENCE PAPER
Abstract:	Outlier detection is attracting particular interest in data mining and Knowledge Discovery from Data (KDD) because of its benefits. In the financial area, the patterns it extracts from data can help uncover possible fraud and user errors, which makes it essential to assess the truthfulness of the information. In this context, data auditing relies on data mining techniques that play a significant role in detecting unusual behavior. This paper proposes a method for detecting values that can be considered outliers in a nominal database. The method combines a global k-Nearest Neighbors algorithm, the k-means clustering algorithm, and the chi-square statistical method. The algorithms were applied to a database of loan applicants. Each test was run on a dataset of 1180 records into which outliers had been deliberately introduced. The experimental results show that the method detects all of the introduced values, which had been labeled beforehand so that they could be distinguished. In total, 48 tuples with outliers were found across 11 nominal columns. © 2019 IEEE.
URI :	https://ieeexplore.ieee.org/document/9052236
Source URI:	https://ieeexplore.ieee.org/xpl/conhome/9039808/proceeding
ISBN :	978-1-7281-5581-4
ISSN :	0000-0000
Appears in collections:	Articles
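
The abstract names a global k-Nearest Neighbors algorithm as one of the three components applied to nominal data. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes Hamming distance (the count of differing columns) as the dissimilarity between nominal tuples and scores each tuple by its mean distance to its k nearest neighbors. The function names, the choice of k, and the toy applicant tuples are all hypothetical and do not come from the paper or its dataset.

```python
from typing import List, Sequence

Row = Sequence[str]


def hamming(a: Row, b: Row) -> int:
    """Number of columns in which two nominal tuples differ (assumed distance)."""
    return sum(x != y for x, y in zip(a, b))


def global_knn_scores(rows: Sequence[Row], k: int = 2) -> List[float]:
    """Outlier score per row: mean Hamming distance to its k nearest neighbors."""
    if not 0 < k < len(rows):
        raise ValueError("k must be between 1 and len(rows) - 1")
    scores = []
    for i, row in enumerate(rows):
        # Distances to every other row, smallest first.
        dists = sorted(hamming(row, other) for j, other in enumerate(rows) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Toy loan-applicant tuples (hypothetical values, not the paper's data).
applicants = [
    ("employed", "owner", "married"),
    ("employed", "owner", "married"),
    ("employed", "renter", "married"),
    ("employed", "owner", "single"),
    ("retired", "other", "widowed"),  # deliberately unusual tuple
]

scores = global_knn_scores(applicants, k=2)
# Index of the tuple with the highest outlier score.
most_unusual = max(range(len(applicants)), key=scores.__getitem__)
```

A higher score means a tuple is farther from its closest neighbors. In practice a threshold or a top-n cut on the scores would decide which tuples to flag, and that choice is not specified in this record.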

Files in this item:

File	Description	Size	Format
documento.pdf (embargoed until 2050-06-15)	document	56.85 kB	Adobe PDF

This item is protected by original copyright

