Publication:
Outlier detection with data mining techniques and statistical methods

dc.contributor.author: Orellana Cordero, Marcos Patricio
dc.contributor.author: Cedillo Orellana, Irene Priscila
dc.date.accessioned: 2020-06-15T22:05:16Z
dc.date.available: 2020-06-15T22:05:16Z
dc.date.issued: 2019
dc.description: Outlier detection in data mining and Knowledge Discovery from Data (KDD) is attracting particular interest because of its benefits. It is useful in the financial sector, where the patterns found in data can help reveal possible fraud and user errors, which makes it essential to assess the truthfulness of the information. In this context, data auditing relies on data mining techniques that play a significant role in detecting unusual behavior. This paper proposes a method for detecting values that can be considered outliers in a nominal database. The method combines three techniques: a Global k-Nearest Neighbors algorithm, the k-means clustering algorithm, and the chi-square statistical test. The algorithms were applied to a database of loan applicants. Each test was run on a dataset of 1180 records into which outliers had been deliberately introduced. The experimental results show that the method detects all of the introduced values, which had been labeled beforehand so they could be identified. In total, 48 tuples with outliers were found across 11 nominal columns. © 2019 IEEE.
dc.description.city: Quito
dc.identifier.doi: 10.1109/INCISCOS49368.2019.00017
dc.identifier.isbn: 978-1-7281-5581-4
dc.identifier.issn: 0000-0000
dc.identifier.uri: https://ieeexplore.ieee.org/document/9052236
dc.language.iso: es_ES
dc.publisher: Institute of Electrical and Electronics Engineers Inc.
dc.source: Proceedings - 2019 International Conference on Information Systems and Computer Science, INCISCOS 2019
dc.subject: Chi-square
dc.subject: Data-mining
dc.subject: Financial-fraud
dc.subject: KNN
dc.subject: Outlier
dc.title: Outlier detection with data mining techniques and statistical methods
dc.type: Conference paper
dc.ucuenca.afiliacion: Orellana, M., Universidad del Azuay, Cuenca, Ecuador
dc.ucuenca.afiliacion: Cedillo, I., Universidad de Cuenca, Departamento de Ciencias de la Computación, Cuenca, Ecuador
dc.ucuenca.areaconocimientofrascatiamplio: 2. Engineering and Technology
dc.ucuenca.areaconocimientofrascatidetallado: 2.2.4 Communication and Systems Engineering
dc.ucuenca.areaconocimientofrascatiespecifico: 2.2 Electrical, Electronic, and Information Engineering
dc.ucuenca.areaconocimientounescoamplio: 07 - Engineering, Manufacturing and Construction
dc.ucuenca.areaconocimientounescodetallado: 0714 - Electronics and Automation
dc.ucuenca.areaconocimientounescoespecifico: 071 - Engineering and Engineering Trades
dc.ucuenca.comiteorganizadorconferencia: Sergio Luján, Oswaldo Moscoso, Luis Terán, R.S. Nithin, Giancarlo Agostini, Diego Ordóñez, William Chamorro, Joel Paredes, Guillermo Mosquera, Estevan Gómez.
dc.ucuenca.conferencia: 4th International Conference on Information Systems and Computer Science, INCISCOS 2019
dc.ucuenca.embargoend: 2050-06-15
dc.ucuenca.embargointerno: 2050-06-15
dc.ucuenca.fechafinconferencia: 2019-11-22
dc.ucuenca.fechainicioconferencia: 2019-11-20
dc.ucuenca.idautor: 0102668209
dc.ucuenca.idautor: 0102815842
dc.ucuenca.indicebibliografico: SCOPUS
dc.ucuenca.organizadorconferencia: Institute of Electrical and Electronics Engineers Inc.
dc.ucuenca.pais: ECUADOR
dc.ucuenca.urifuente: https://ieeexplore.ieee.org/xpl/conhome/9039808/proceeding
dc.ucuenca.version: Published version
dc.ucuenca.volumen: Volume 11, no. 1
dspace.entity.type: Publication
relation.isAuthorOfPublication: 9ecaad85-5b06-4b92-b05c-0d89c7b10660
relation.isAuthorOfPublication.latestForDiscovery: 9ecaad85-5b06-4b92-b05c-0d89c7b10660

Files

Original bundle

Name: documento.pdf
Size: 56.85 KB
Format: Adobe Portable Document Format
Description: document
