Water quality assessment with emphasis in parameter optimisation using pattern recognition methods and genetic algorithm

Sotomayor Valarezo, Gonzalo Patricio; Hampel, Henrietta; Vazquez Zambrano, Raul Fernando

Please use this identifier to cite or link to this item: http://dspace.ucuenca.edu.ec/handle/123456789/33209

Title:	Water quality assessment with emphasis in parameter optimisation using pattern recognition methods and genetic algorithm
Authors:	Sotomayor Valarezo, Gonzalo Patricio; Hampel, Henrietta; Vazquez Zambrano, Raul Fernando
Corresponding author:	Sotomayor Valarezo, Gonzalo Patricio, gonpa83@yahoo.com
Keywords:	Water quality; Pattern recognition; Genetic algorithm; Land cover
Frascati knowledge area (broad):	1. Natural and Exact Sciences
Frascati knowledge area (detailed):	1.5.10 Water Resources
Frascati knowledge area (specific):	1.5 Earth and Environmental Sciences
UNESCO knowledge area (broad):	05 - Physical Sciences, Natural Sciences, Mathematics and Statistics
UNESCO knowledge area (detailed):	0521 - Environmental Sciences
UNESCO knowledge area (specific):	052 - Environment
Issue Date:	2017
Embargo end:	30-Dec-2050
Volume:	130
Source:	Water Research
DOI:	https://doi.org/10.1016/j.watres.2017.12.010
Type:	Article
Abstract:	A non-supervised (k-means) and a supervised (k-Nearest Neighbour combined with genetic algorithm optimisation, k-NN/GA) pattern recognition algorithm were applied to evaluate and interpret a large, complex matrix of water quality (WQ) data collected over five years (2008, 2010–2013) in the Paute river basin (southern Ecuador). Twenty-one physical, chemical and microbiological parameters collected at 80 WQ sampling stations were examined. First, the k-means algorithm was used to identify classes of sampling stations according to their WQ status, considering three internal validation indexes: the Silhouette coefficient, Davies-Bouldin and Caliński-Harabasz. As a result, two WQ classes were identified, representing low (C1) and high (C2) pollution. The k-NN/GA algorithm was then applied to the available data to construct a classification model in which the two WQ classes previously defined by the k-means algorithm are the dependent variable and the 21 physical, chemical and microbiological parameters are the independent variables. This algorithm reduced the multidimensional space of independent variables to only nine, which are likely to explain most of the structure of the two identified WQ classes: electric conductivity, faecal coliforms, dissolved oxygen, chlorides, total hardness, nitrate, total alkalinity, biochemical oxygen demand and turbidity. Further, the land use cover of the study basin showed very good agreement with the WQ spatial distribution suggested by the k-means algorithm, supporting the credibility of the main results of the WQ data mining approach.
URI:	https://www.sciencedirect.com/science/article/pii/S0043135417310059?via%3Dihub
Source URI:	www.sciencedirect.com/journal/water-research
ISSN:	0043-1354
Appears in Collections:	Artículos
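
The k-NN/GA step described in the abstract, in which a genetic algorithm searches for a small subset of parameters that still lets a k-NN classifier reproduce the two WQ classes, can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the function names (`make_data`, `loo_knn_accuracy`, `ga_select`) are hypothetical, and the fitness definition (leave-one-out accuracy minus a small penalty per selected feature), the crossover and mutation scheme, and all parameter values are assumptions chosen only for illustration.

```python
import random


def make_data(n_samples=40, n_features=6, informative=(0, 1), seed=1):
    """Synthetic two-class data (assumption): only `informative` columns differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        for j in informative:
            row[j] += 4.0 * label  # shift the informative features for class 1
        X.append(row)
        y.append(label)
    return X, y


def loo_knn_accuracy(X, y, mask, k=3):
    """Leave-one-out accuracy of k-NN using only the features where mask == 1."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    hits = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance to every other sample on the selected columns.
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X)
            if o != i
        )
        votes = [label for _, label in dists[:k]]
        pred = max(set(votes), key=votes.count)  # majority vote among k neighbours
        hits += pred == y[i]
    return hits / len(X)


def ga_select(X, y, pop_size=16, generations=12, penalty=0.02, p_mut=0.1, seed=0):
    """Toy genetic algorithm over binary feature masks (illustrative settings)."""
    rng = random.Random(seed)
    n = len(X[0])
    cache = {}

    def fitness(mask):
        # Reward accuracy, penalise each selected feature to favour small subsets.
        key = tuple(mask)
        if key not in cache:
            cache[key] = loo_knn_accuracy(X, y, mask) - penalty * sum(mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)


X, y = make_data()
best_mask, best_score = ga_select(X, y)
print("selected feature mask:", best_mask, "fitness:", round(best_score, 3))
```

In this sketch the class labels are generated directly; in the workflow the abstract describes, they would instead come from the preceding k-means clustering, and the mask would span the 21 measured parameters rather than six synthetic columns.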

Files in This Item:

File	Description	Size	Format
documento.pdf (embargoed until 2050-12-30)	document	1.56 MB	Adobe PDF

This item is protected by original copyright
