Prof. David Gutiérrez Avilés, Ph.D., is a Computer Science Engineer (University of Seville, 2010) and holds a Master's degree in Software Engineering and Technology (University of Seville, 2013) and a Ph.D. (University of Seville, 2015). He is an Assistant Professor in the area of Languages and Information Systems at the University of Pablo de Olavide.
His main scientific achievement is the TrLab methodology for mining and evaluating behavior patterns from large time-dependent datasets. This novel method extracts patterns from large 3D data using triclustering and genetic algorithm techniques. This research has produced five JCR papers, six conference contributions, a research stay at the University of Chile, membership of one R&D team, participation in a regional project and a national project, his Ph.D. thesis, and the intellectual property rights for the TrLab application.
The research lines of Prof. David Gutiérrez Avilés, Ph.D., focus on electricity fraud detection in Big Data environments, online machine learning from Big Data streams, analysis of Internet of Things protocols, and sensor data analysis. This research has produced two conference papers (under review), membership of an R&D team, and participation in a European project, a national project, and four business projects.
Publications
2021
L. Melgar-García, D. Gutiérrez-Avilés, C. Rubio-Escudero, and A. Troncoso. Discovering three-dimensional patterns in real-time from data streams: An online triclustering approach. Journal Article. Information Sciences, in press, 2021.
@article{Melgar21_IS,
  title = {Discovering three-dimensional patterns in real-time from data streams: An online triclustering approach},
  author = {L. Melgar-García and D. Gutiérrez-Avilés and C. Rubio-Escudero and A. Troncoso},
  year = {2021},
  date = {2021-01-01},
  journal = {Information Sciences},
  volume = {in press},
  abstract = {Triclustering algorithms group sets of coordinates of 3-dimensional datasets. In this paper, a new triclustering approach for data streams is introduced. It follows a streaming scheme of learning in two steps: offline and online phases. First, the offline phase provides a summary model with the components of the triclusters. Then, the second stage is the online phase to deal with data in streaming. This online phase consists in using the summary model obtained in the offline stage to update the triclusters as fast as possible with genetic operators. Results using three types of synthetic datasets and a real-world environmental sensor dataset are reported. The performance of the proposed triclustering streaming algorithm is compared to a batch triclustering algorithm, showing an accurate performance both in terms of quality and running times.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
2020
L. Melgar-García, M. T. Godinho, R. Espada, D. Gutiérrez-Avilés, I. S. Brito, F. Martínez-Álvarez, A. Troncoso, and C. Rubio-Escudero. Discovering Spatio-Temporal Patterns in Precision Agriculture Based on Triclustering. Conference. SOCO 15th International Conference on Soft Computing Models in Industrial and Environmental Applications, Advances in Intelligent Systems and Computing, 2020.
@conference{SOCO20,
  title = {Discovering Spatio-Temporal Patterns in Precision Agriculture Based on Triclustering},
  author = {L. Melgar-García and M. T. Godinho and R. Espada and D. Gutiérrez-Avilés and I. S. Brito and F. Martínez-Álvarez and A. Troncoso and C. Rubio-Escudero},
  url = {https://link.springer.com/chapter/10.1007/978-3-030-57802-2_22},
  year = {2020},
  date = {2020-08-29},
  booktitle = {SOCO 15th International Conference on Soft Computing Models in Industrial and Environmental Applications},
  pages = {226-236},
  series = {Advances in Intelligent Systems and Computing},
  keywords = {},
  pubstate = {published},
  tppubtype = {conference}
}
F. Martínez-Álvarez, G. Asencio-Cortés, J. F. Torres, D. Gutiérrez-Avilés, L. Melgar-García, R. Pérez-Chacón, C. Rubio-Escudero, A. Troncoso, and J. C. Riquelme. Coronavirus Optimization Algorithm: A bioinspired metaheuristic based on the COVID-19 propagation model. Journal Article. Big Data, 8 (4), pp. 308-322, 2020.
@article{MARTINEZ-ALVAREZ20,
  title = {Coronavirus Optimization Algorithm: A bioinspired metaheuristic based on the COVID-19 propagation model},
  author = {F. Martínez-Álvarez and G. Asencio-Cortés and J. F. Torres and D. Gutiérrez-Avilés and L. Melgar-García and R. Pérez-Chacón and C. Rubio-Escudero and A. Troncoso and J. C. Riquelme},
  url = {https://www.liebertpub.com/doi/full/10.1089/big.2020.0051},
  doi = {10.1089/big.2020.0051},
  year = {2020},
  date = {2020-07-22},
  journal = {Big Data},
  volume = {8},
  number = {4},
  pages = {308-322},
  abstract = {This work proposes a novel bioinspired metaheuristic, simulating how the coronavirus spreads and infects healthy people. From a primary infected individual (patient zero), the coronavirus rapidly infects new victims, creating large populations of infected people who will either die or spread infection. Relevant terms such as reinfection probability, super-spreading rate, social distancing measures or traveling rate are introduced into the model in order to simulate the coronavirus activity as accurately as possible. The infected population initially grows exponentially over time, but taking into consideration social isolation measures, the mortality rate and number of recoveries, the infected population gradually decreases. The Coronavirus Optimization Algorithm has two major advantages when compared to other similar strategies. Firstly, the input parameters are already set according to the disease statistics, preventing researchers from initializing them with arbitrary values. Secondly, the approach has the ability to end after several iterations, without setting this value either. Furthermore, a parallel multi-virus version is proposed, where several coronavirus strains evolve over time and explore wider search space areas in less iterations. Finally, the metaheuristic has been combined with deep learning models, in order to find optimal hyperparameters during the training phase. As application case, the problem of electricity load time series forecasting has been addressed, showing quite remarkable performance.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
A. M. Fernández, D. Gutiérrez-Avilés, A. Troncoso, and F. Martínez-Álvarez. Automated Deployment of a Spark Cluster with Machine Learning Algorithm Integration. Journal Article. Big Data Research, 19-20, pp. 100135, 2020.
@article{FERNANDEZ20,
  title = {Automated Deployment of a Spark Cluster with Machine Learning Algorithm Integration},
  author = {A. M. Fernández and D. Gutiérrez-Avilés and A. Troncoso and F. Martínez-Álvarez},
  url = {https://www.sciencedirect.com/science/article/pii/S2214579620300034},
  doi = {10.1016/j.bdr.2020.100135},
  year = {2020},
  date = {2020-05-12},
  journal = {Big Data Research},
  volume = {19-20},
  pages = {100135},
  abstract = {The vast amount of data stored nowadays has turned big data analytics into a very trendy research field. The Spark distributed computing platform has emerged as a dominant and widely used paradigm for cluster deployment and big data analytics. However, to get started up is still a task that may take much time when manually done, due to the requisites that all nodes must fulfill. This work introduces LadonSpark, an open-source and non-commercial solution to configure and deploy a Spark cluster automatically. It has been specially designed for easy and efficient management of a Spark cluster with a friendly graphical user interface to automate the deployment of a cluster and to start up the distributed file system of Hadoop quickly. Moreover, LadonSpark includes the functionality of integrating any algorithm into the system. That is, the user only needs to provide the executable file and the number of required inputs for proper parametrization. Source codes developed in Scala, R, Python, or Java can be supported on LadonSpark. Besides, clustering, regression, classification, and association rules algorithms are already integrated so that users can test its usability from its initial installation.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
L. Melgar-García, D. Gutiérrez-Avilés, C. Rubio-Escudero, and A. Troncoso. High-content screening images streaming analysis using the STriGen methodology. Conference. SAC 35th Annual ACM Symposium on Applied Computing, 2020.
@conference{Melgar20_SAC,
  title = {High-content screening images streaming analysis using the STriGen methodology},
  author = {L. Melgar-García and D. Gutiérrez-Avilés and C. Rubio-Escudero and A. Troncoso},
  doi = {10.1145/3341105.3374071},
  year = {2020},
  date = {2020-03-01},
  booktitle = {SAC 35th Annual ACM Symposium on Applied Computing},
  pages = {537-539},
  keywords = {},
  pubstate = {published},
  tppubtype = {conference}
}
2019
J. F. Torres, D. Gutiérrez-Avilés, A. Troncoso, and F. Martínez-Álvarez. Random Hyper-Parameter Search-Based Deep Neural Network for Power Consumption Forecasting. Conference. IWANN 15th International Work-Conference on Artificial Neural Networks, vol. 11506, Lecture Notes in Computer Science, 2019.
@conference{TORRES19-2,
  title = {Random Hyper-Parameter Search-Based Deep Neural Network for Power Consumption Forecasting},
  author = {J. F. Torres and D. Gutiérrez-Avilés and A. Troncoso and F. Martínez-Álvarez},
  url = {https://link.springer.com/chapter/10.1007/978-3-030-20521-8_22},
  doi = {10.1007/978-3-030-20521-8_22},
  year = {2019},
  date = {2019-05-16},
  booktitle = {IWANN 15th International Work-Conference on Artificial Neural Networks},
  volume = {11506},
  pages = {259-269},
  series = {Lecture Notes in Computer Science},
  keywords = {},
  pubstate = {published},
  tppubtype = {conference}
}
A. M. Fernández, D. Gutiérrez-Avilés, A. Troncoso, and F. Martínez-Álvarez. Real-Time Big Data Analytics in Smart Cities from LoRa-based IoT Networks. Conference. SOCO 14th International Conference on Soft Computing Models in Industrial and Environmental Applications, Advances in Intelligent Systems and Computing, 2019.
@conference{SOCO2019,
  title = {Real-Time Big Data Analytics in Smart Cities from LoRa-based IoT Networks},
  author = {A. M. Fernández and D. Gutiérrez-Avilés and A. Troncoso and F. Martínez-Álvarez},
  url = {https://link.springer.com/chapter/10.1007/978-3-030-20055-8_9},
  year = {2019},
  date = {2019-01-01},
  booktitle = {SOCO 14th International Conference on Soft Computing Models in Industrial and Environmental Applications},
  series = {Advances in Intelligent Systems and Computing},
  keywords = {},
  pubstate = {published},
  tppubtype = {conference}
}
2018
D. Gutiérrez-Avilés, R. Giráldez, F. J. Gil-Cumbreras, and C. Rubio-Escudero. TRIQ: a new method to evaluate triclusters. Journal Article. BioData Mining, 11 (1), pp. 15, 2018.
@article{Gutierrez-Aviles2018,
  title = {TRIQ: a new method to evaluate triclusters},
  author = {D. Gutiérrez-Avilés and R. Giráldez and F. J. Gil-Cumbreras and C. Rubio-Escudero},
  url = {https://biodatamining.biomedcentral.com/articles/10.1186/s13040-018-0177-5},
  doi = {10.1186/s13040-018-0177-5},
  year = {2018},
  date = {2018-01-01},
  journal = {BioData Mining},
  volume = {11},
  number = {1},
  pages = {15},
  abstract = {Triclustering has shown to be a valuable tool for the analysis of microarray data since its appearance as an improvement of classical clustering and biclustering techniques. The standard for validation of triclustering is based on three different measures: correlation, graphic similarity of the patterns and functional annotations for the genes extracted from the Gene Ontology project (GO).},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
D. Gutiérrez-Avilés, J. A. Fábregas, J. Tejedor, F. Martínez-Álvarez, A. Troncoso, and J. C. Riquelme. SmartFD: A real big data application for electrical fraud detection. Conference. HAIS 13th International Conference on Hybrid Artificial Intelligence Systems, Lecture Notes in Computer Science, 2018.
@conference{HAIS2018,
  title = {SmartFD: A real big data application for electrical fraud detection},
  author = {D. Gutiérrez-Avilés and J. A. Fábregas and J. Tejedor and F. Martínez-Álvarez and A. Troncoso and J. C. Riquelme},
  url = {https://link.springer.com/chapter/10.1007/978-3-319-92639-1_11},
  year = {2018},
  date = {2018-01-01},
  booktitle = {HAIS 13th International Conference on Hybrid Artificial Intelligence Systems},
  series = {Lecture Notes in Computer Science},
  keywords = {},
  pubstate = {published},
  tppubtype = {conference}
}
2016
D. Gutiérrez-Avilés and C. Rubio-Escudero. TRIQ: A Comprehensive Evaluation Measure for Triclustering Algorithms. Conference. Hybrid Artificial Intelligent Systems: 11th International Conference, HAIS 2016, Seville, Spain, April 18-20, 2016, Proceedings, Lecture Notes in Computer Science, 2016.
@conference{Gutiérrez-Avilés2016,
  title = {TRIQ: A Comprehensive Evaluation Measure for Triclustering Algorithms},
  author = {D. Gutiérrez-Avilés and C. Rubio-Escudero},
  url = {https://link.springer.com/chapter/10.1007/978-3-319-32034-2_56},
  year = {2016},
  date = {2016-01-01},
  booktitle = {Hybrid Artificial Intelligent Systems: 11th International Conference, HAIS 2016, Seville, Spain, April 18-20, 2016, Proceedings},
  series = {Lecture Notes in Computer Science},
  keywords = {},
  pubstate = {published},
  tppubtype = {conference}
}
2015
D. Gutiérrez-Avilés and C. Rubio-Escudero. MSL: A Measure to Evaluate Three-dimensional Patterns in Gene Expression Data. Journal Article. Evolutionary Bioinformatics, 11, pp. 121-135, 2015.
@article{Gutierrez-Aviles2015,
  title = {MSL: A Measure to Evaluate Three-dimensional Patterns in Gene Expression Data},
  author = {D. Gutiérrez-Avilés and C. Rubio-Escudero},
  url = {https://journals.sagepub.com/doi/10.4137/EBO.S25822},
  doi = {10.4137/EBO.S25822},
  year = {2015},
  date = {2015-01-01},
  journal = {Evolutionary Bioinformatics},
  volume = {11},
  pages = {121-135},
  abstract = {Microarray technology is highly used in biological research environments due to its ability to monitor the RNA concentration levels. The analysis of the data generated represents a computational challenge due to the characteristics of these data. Clustering techniques are widely applied to create groups of genes that exhibit a similar behavior. Biclustering relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of the conditions. Triclustering appears for the analysis of longitudinal experiments in which the genes are evaluated under certain conditions at several time points. These triclusters provide hidden information in the form of behavior patterns from temporal experiments with microarrays relating subsets of genes, experimental conditions, and time points. We present an evaluation measure for triclusters called Multi Slope Measure, based on the similarity among the angles of the slopes formed by each profile formed by the genes, conditions, and times of the tricluster.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
F. Martínez-Álvarez, D. Gutiérrez-Avilés, A. Morales-Esteban, J. Reyes, J. L. Amaro-Mellado, and C. Rubio-Escudero. A Novel Method for Seismogenic Zoning Based on Triclustering: Application to the Iberian Peninsula. Journal Article. Entropy, 17 (7), pp. 5000-5021, 2015.
@article{martinez2015,
  title = {A Novel Method for Seismogenic Zoning Based on Triclustering: Application to the Iberian Peninsula},
  author = {F. Martínez-Álvarez and D. Gutiérrez-Avilés and A. Morales-Esteban and J. Reyes and J. L. Amaro-Mellado and C. Rubio-Escudero},
  url = {https://www.mdpi.com/1099-4300/17/7/5000},
  doi = {10.3390/e17075000},
  year = {2015},
  date = {2015-01-01},
  journal = {Entropy},
  volume = {17},
  number = {7},
  pages = {5000-5021},
  abstract = {A previous definition of seismogenic zones is required to do a probabilistic seismic hazard analysis for areas of spread and low seismic activity. Traditional zoning methods are based on the available seismic catalog and the geological structures. It is admitted that thermal and resistant parameters of the crust provide better criteria for zoning. Nonetheless, the working out of the rheological profiles causes a great uncertainty. This has generated inconsistencies, as different zones have been proposed for the same area. A new method for seismogenic zoning by means of triclustering is proposed in this research. The main advantage is that it is solely based on seismic data. Almost no human decision is made, and therefore, the method is nearly non-biased. To assess its performance, the method has been applied to the Iberian Peninsula, which is characterized by the occurrence of small to moderate magnitude earthquakes. The catalog of the National Geographic Institute of Spain has been used. The output map is checked for validity with the geology. Moreover, a geographic information system has been used for two purposes. First, the obtained zones have been depicted within it. Second, the data have been used to calculate the seismic parameters (b-value, annual rate). Finally, the results have been compared to Kohonen’s self-organizing maps.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
2014
D. Gutiérrez-Avilés and C. Rubio-Escudero. Mining 3D Patterns from Gene Expression Temporal Data: A New Tricluster Evaluation Measure. Journal Article. The Scientific World Journal, 2014, pp. 1-16, 2014.
@article{Gutierrez-Aviles2014,
  title = {Mining 3D Patterns from Gene Expression Temporal Data: A New Tricluster Evaluation Measure},
  author = {D. Gutiérrez-Avilés and C. Rubio-Escudero},
  url = {http://www.hindawi.com/journals/tswj/2014/624371/},
  doi = {10.1155/2014/624371},
  year = {2014},
  date = {2014-01-01},
  journal = {The Scientific World Journal},
  volume = {2014},
  pages = {1-16},
  abstract = {Microarrays have revolutionized biotechnological research. The analysis of new data generated represents a computational challenge due to the characteristics of these data. Clustering techniques are applied to create groups of genes that exhibit a similar behavior. Biclustering emerges as a valuable tool for microarray data analysis since it relaxes the constraints for grouping, allowing genes to be evaluated only under a subset of the conditions. However, if a third dimension appears in the data, triclustering is the appropriate tool for the analysis. This occurs in longitudinal experiments in which the genes are evaluated under conditions at several time points. All clustering, biclustering, and triclustering techniques guide their search for solutions by a measure that evaluates the quality of clusters. We present an evaluation measure for triclusters called Mean Square Residue 3D. This measure is based on the classic biclustering measure Mean Square Residue. Mean Square Residue 3D has been applied to both synthetic and real data and it has proved to be capable of extracting groups of genes with homogeneous patterns in subsets of conditions and times, and these groups have shown a high correlation level and they are also related to their functional annotations extracted from the Gene Ontology project.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}
D. Gutiérrez-Avilés and C. Rubio-Escudero. LSL: A new measure to evaluate triclusters. Conference. 2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2014.
@conference{Gutierrez-Aviles2014b,
  title = {LSL: A new measure to evaluate triclusters},
  author = {D. Gutiérrez-Avilés and C. Rubio-Escudero},
  url = {http://ieeexplore.ieee.org/document/6999244/},
  year = {2014},
  date = {2014-01-01},
  booktitle = {2014 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)},
  keywords = {},
  pubstate = {published},
  tppubtype = {conference}
}
D. Gutiérrez-Avilés, C. Rubio-Escudero, F. Martínez-Álvarez, and J. C. Riquelme. TriGen: A genetic algorithm to mine triclusters in temporal gene expression data. Journal Article. Neurocomputing, 132, pp. 42-53, 2014.
@article{GUTIERREZAVILES201442,
  title = {TriGen: A genetic algorithm to mine triclusters in temporal gene expression data},
  author = {D. Gutiérrez-Avilés and C. Rubio-Escudero and F. Martínez-Álvarez and J. C. Riquelme},
  url = {http://www.sciencedirect.com/science/article/pii/S0925231213011004},
  doi = {10.1016/j.neucom.2013.03.061},
  year = {2014},
  date = {2014-01-01},
  journal = {Neurocomputing},
  volume = {132},
  pages = {42-53},
  abstract = {Analyzing microarray data represents a computational challenge due to the characteristics of these data. Clustering techniques are widely applied to create groups of genes that exhibit a similar behavior under the conditions tested. Biclustering emerges as an improvement of classical clustering since it relaxes the constraints for grouping genes to be evaluated only under a subset of the conditions and not under all of them. However, this technique is not appropriate for the analysis of longitudinal experiments in which the genes are evaluated under certain conditions at several time points. We present the TriGen algorithm, a genetic algorithm that finds triclusters of gene expression that take into account the experimental conditions and the time points simultaneously. We have used TriGen to mine datasets related to synthetic data, yeast (Saccharomyces cerevisiae) cell cycle and human inflammation and host response to injury experiments. TriGen has proved to be capable of extracting groups of genes with similar patterns in subsets of conditions and times, and these groups have shown to be related in terms of their functional annotations extracted from the Gene Ontology.},
  keywords = {},
  pubstate = {published},
  tppubtype = {article}
}