Federico Divina obtained his Ph.D. in Artificial Intelligence from the Vrije Universiteit of Amsterdam, and afterwards worked as a postdoc at the University of Tilburg, within the European project NEWTIES. In 2006 he moved to the Pablo de Olavide University, where he is currently an Associate Professor.
He has been working on knowledge extraction since his Ph.D. thesis at the Vrije Universiteit of Amsterdam. He has extensive experience in the application of Machine Learning, especially techniques based on Soft Computing, for the extraction of knowledge from massive data.
His main research interests are:
- Bioinformatics
- Evolutionary Computation
- Machine Learning
- Big Data
Projects
Federico Divina has participated in various research projects, for instance:
- Differential: this project aims to develop new tools and methods to manage and analyse information coming from several sources with the final goal of better understanding how and when energy is consumed in distributed facilities. This project was developed as a coordinated project with three complementary research groups from three different universities (Universidad de Granada, Universidad Pablo de Olavide and Universidad de Castilla La Mancha).
- GALICIAME: project that aimed at applying machine learning tools in order to extract knowledge from genetic data related to spinal muscular atrophy (SMA), in collaboration with the “Centro Andaluz de Biología del Desarrollo” (CABD).
- NEWTIES: EU project that aimed at developing an artificial society. This project involved the Vrije Universiteit Amsterdam, the University of Tilburg, Napier University, the University of Surrey and Eötvös Loránd University.
Publications
For a complete list of his publications, please visit his Google Scholar profile or his ORCID record.
2022 |
F. Morales and M. García-Torres and G. Velázquez and F. Daumas-Ladouce and P. Gardel-Sotomayor and F. Gómez-Vela and F. Divina and J.L. Vázquez Noguera and C. Sauer Ayala and D. Pinto-Roa Analysis of Electric Energy Consumption Profiles Using a Machine Learning Approach: A Paraguayan Case Study Journal Article Electronics, 11 (2), pp. 267, 2022. @article{morales2022analysis, title = {Analysis of Electric Energy Consumption Profiles Using a Machine Learning Approach: A Paraguayan Case Study}, author = {F. Morales and M. García-Torres and G. Velázquez and F. Daumas-Ladouce and P. Gardel-Sotomayor and F. Gómez-Vela and F. Divina and J.L. Vázquez Noguera and C. Sauer Ayala and D. Pinto-Roa}, url = {https://www.mdpi.com/2079-9292/11/2/267}, doi = {10.3390/electronics11020267}, year = {2022}, date = {2022-01-01}, journal = {Electronics}, volume = {11}, number = {2}, pages = {267}, publisher = {Multidisciplinary Digital Publishing Institute}, abstract = {Correctly defining and grouping electrical feeders is of great importance for electrical system operators. In this paper, we compare two different clustering techniques, K-means and hierarchical agglomerative clustering, applied to real data from the east region of Paraguay. The raw data were pre-processed, resulting in four data sets, namely, (i) a weekly feeder demand, (ii) a monthly feeder demand, (iii) a statistical feature set extracted from the original data and (iv) a seasonal and daily consumption feature set obtained considering the characteristics of the Paraguayan load curve. Considering the four data sets, two clustering algorithms, two distance metrics and five linkage criteria a total of 36 models with the Silhouette, Davies–Bouldin and Calinski–Harabasz index scores was assessed.
The K-means algorithms with the seasonal feature data sets showed the best performance considering the Silhouette, Calinski–Harabasz and Davies–Bouldin validation index scores with a configuration of six clusters.}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
G. Velázquez and F. Morales and M. García-Torres and F. Gómez-Vela and F. Divina and J.L. Vázquez Noguera and F. Daumas-Ladouce and C. Ayala and D. Pinto-Roa and P. Gardel-Sotomayor Distribution level Electric current consumption and meteorological data set of the East region of Paraguay Journal Article Data in Brief, 40, pp. 107699, 2022. @article{velazquez2022distribution, title = {Distribution level Electric current consumption and meteorological data set of the East region of Paraguay}, author = {G. Velázquez and F. Morales and M. García-Torres and F. Gómez-Vela and F. Divina and J.L. Vázquez Noguera and F. Daumas-Ladouce and C. Ayala and D. Pinto-Roa and P. Gardel-Sotomayor}, url = {https://www.sciencedirect.com/science/article/pii/S2352340921009744}, doi = {10.1016/j.dib.2021.107699}, year = {2022}, date = {2022-01-01}, journal = {Data in Brief}, volume = {40}, pages = {107699}, publisher = {Elsevier}, abstract = {This paper presents a data set with information on meteorological data and electricity consumption in the department of Alto Paraná, Paraguay. The meteorological data were registered every three hours at the Aeropuerto Guarani, Department of Alto Paraná, which belongs to the Dirección Nacional de Aeronáutica Civil of Paraguay. The final data consists of a total of 22.445 records of temperature, relative humidity, wind speed and atmospheric pressure. On the other hand, the electrical energy consumption data set contains a total of 1.848.947 records, all of them coming from the one hundred and fifteen feeders located throughout the Alto Paraná region of Paraguay. Electrical energy consumption data was provided by Administración Nacional de Electricidad (ANDE).
The analysis of this data can yield insights regarding the energy consumption in the area.}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
2021 |
A. Lopez-Fernandez and D. Rodriguez-Baena and F. Gomez-Vela and F. Divina and M. Garcia-Torres A multi-GPU biclustering algorithm for binary datasets Journal Article Journal of Parallel and Distributed Computing, 147, pp. 209–219, 2021. @article{lopez2021multi, title = {A multi-GPU biclustering algorithm for binary datasets}, author = {A. Lopez-Fernandez and D. Rodriguez-Baena and F. Gomez-Vela and F. Divina and M. Garcia-Torres}, doi = {10.1016/j.jpdc.2020.09.009}, year = {2021}, date = {2021-01-01}, journal = {Journal of Parallel and Distributed Computing}, volume = {147}, pages = {209--219}, publisher = {Elsevier}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
R. Parra and V. Ojeda and J.L. Vázquez Noguera and M. García-Torres and J.C. Mello-Román and C. Villalba and J. Facon and F. Divina and O. Cardozo and V. Castillo A Trust-Based Methodology to Evaluate Deep Learning Models for Automatic Diagnosis of Ocular Toxoplasmosis from Fundus Images Journal Article Diagnostics, 11 (11), pp. 1951, 2021. @article{parra2021trust, title = {A Trust-Based Methodology to Evaluate Deep Learning Models for Automatic Diagnosis of Ocular Toxoplasmosis from Fundus Images}, author = {R. Parra and V. Ojeda and J.L. Vázquez Noguera and M. García-Torres and J.C. Mello-Román and C. Villalba and J. Facon and F. Divina and O. Cardozo and V. Castillo}, doi = {10.3390/diagnostics11111951}, year = {2021}, date = {2021-01-01}, journal = {Diagnostics}, volume = {11}, number = {11}, pages = {1951}, publisher = {Multidisciplinary Digital Publishing Institute}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
P.M. Martínez-García and M. García-Torres and F. Divina and J. Terrón-Bautista and I. Delgado-Sainz and F. Gómez-Vela and F. Cortés-Ledesma Genome-wide prediction of topoisomerase II $\beta$ binding by architectural factors and chromatin accessibility Journal Article PLoS computational biology, 17 (1), pp. e1007814, 2021. @article{martinez2021genome, title = {Genome-wide prediction of topoisomerase II $\beta$ binding by architectural factors and chromatin accessibility}, author = {P.M. Martínez-García and M. García-Torres and F. Divina and J. Terrón-Bautista and I. Delgado-Sainz and F. Gómez-Vela and F. Cortés-Ledesma}, doi = {10.1371/journal.pcbi.1007814}, year = {2021}, date = {2021-01-01}, journal = {PLoS computational biology}, volume = {17}, number = {1}, pages = {e1007814}, publisher = {Public Library of Science San Francisco, CA USA}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
S.A. Grillo and J.C. Román and J.D. Mello-Román and J.L. Vázquez Noguera and M. García-Torres and F. Divina and P.E. Sotomayor Adjacent Inputs With Different Labels and Hardness in Supervised Learning Journal Article IEEE Access, 9, pp. 162487–162498, 2021. @article{grillo2021adjacent, title = {Adjacent Inputs With Different Labels and Hardness in Supervised Learning}, author = {S.A. Grillo and J.C. Román and J.D. Mello-Román and J.L. Vázquez Noguera and M. García-Torres and F. Divina and P.E. Sotomayor}, doi = {10.1109/ACCESS.2021.3131150}, year = {2021}, date = {2021-01-01}, journal = {IEEE Access}, volume = {9}, pages = {162487--162498}, publisher = {IEEE}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
J. Ayala and M. García-Torres and J.L. Vázquez Noguera and F. Gómez-Vela and F. Divina Technical analysis strategy optimization using a machine learning approach in stock market indices Journal Article Knowledge-Based Systems, 225, pp. 107119, 2021. @article{ayala2021technical, title = {Technical analysis strategy optimization using a machine learning approach in stock market indices}, author = {J. Ayala and M. García-Torres and J.L. Vázquez Noguera and F. Gómez-Vela and F. Divina}, doi = {10.1016/j.knosys.2021.107119}, year = {2021}, date = {2021-01-01}, journal = {Knowledge-Based Systems}, volume = {225}, pages = {107119}, publisher = {Elsevier}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
F. Divina and F. Gómez-Vela and M. García-Torres Advanced Optimization Methods and Big Data Applications in Energy Demand Forecast Journal Article Applied Sciences, 11 (3), pp. 1261, 2021. @article{divina2021advanced, title = {Advanced Optimization Methods and Big Data Applications in Energy Demand Forecast}, author = {F. Divina and F. Gómez-Vela and M. García-Torres}, url = {https://www.mdpi.com/2076-3417/11/3/1261/htm}, doi = {10.3390/app11031261}, year = {2021}, date = {2021-01-01}, journal = {Applied Sciences}, volume = {11}, number = {3}, pages = {1261}, publisher = {Multidisciplinary Digital Publishing Institute}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
2020 |
F. Divina and J. F. Torres and M. García-Torres and F. Martínez-Álvarez and A. Troncoso Hybridizing deep learning and neuroevolution: Application to the Spanish short-term electric energy consumption forecasting Journal Article Applied Sciences, 10 (16), pp. 5487, 2020. @article{DIVINA2020, title = {Hybridizing deep learning and neuroevolution: Application to the Spanish short-term electric energy consumption forecasting}, author = {F. Divina and J. F. Torres and M. García-Torres and F. Martínez-Álvarez and A. Troncoso}, url = {https://www.mdpi.com/2076-3417/10/16/5487}, doi = {https://doi.org/10.3390/app10165487}, year = {2020}, date = {2020-07-30}, journal = {Applied Sciences}, volume = {10}, number = {16}, pages = {5487}, abstract = {The electric energy production would be much more efficient if accurate estimations of the future demand were available, since these would allow allocating only the resources needed for the production of the right amount of energy required. With this motivation in mind, we propose a strategy, based on neuroevolution, that can be used to this aim. Our proposal uses a genetic algorithm in order to find a sub-optimal set of hyper-parameters for configuring a deep neural network, which can then be used for obtaining the forecasting. Such a strategy is justified by the observation that the performances achieved by deep neural networks are strongly dependent on the right setting of the hyper-parameters, and genetic algorithms have shown excellent search capabilities in huge search spaces. Moreover, we base our proposal on a distributed computing platform, which allows its use on a large time-series. In order to assess the performances of our approach, we have applied it to a large dataset, related to the electric energy consumption registered in Spain over almost 10 years. 
Experimental results confirm the validity of our proposal since it outperforms all other forecasting techniques to which it has been compared.}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
F. M. Delgado-Chaves and F. Gómez-Vela and F. Divina and M. García-Torres and D. S. Rodríguez-Baena Computational Analysis of the Global Effects of Ly6E in the Immune Response to Coronavirus Infection Using Gene Networks Journal Article Genes, 11 (7), pp. 831-864, 2020. @article{Delgado-Chaves20, title = {Computational Analysis of the Global Effects of Ly6E in the Immune Response to Coronavirus Infection Using Gene Networks}, author = {F. M. Delgado-Chaves and F. Gómez-Vela and F. Divina and M. García-Torres and D. S. Rodríguez-Baena}, year = {2020}, date = {2020-01-01}, journal = {Genes}, volume = {11}, number = {7}, pages = {831-864}, abstract = {Gene networks have arisen as a promising tool in the comprehensive modeling and analysis of complex diseases. Particularly in viral infections, the understanding of the host-pathogen mechanisms, and the immune response to these, is considered a major goal for the rational design of appropriate therapies. For this reason, the use of gene networks may well encourage therapy-associated research in the context of the coronavirus pandemic, orchestrating experimental scrutiny and reducing costs. In this work, gene co-expression networks were reconstructed from RNA-Seq expression data with the aim of analyzing the time-resolved effects of gene Ly6E in the immune response against the coronavirus responsible for murine hepatitis (MHV). Through the integration of differential expression analyses and reconstructed networks exploration, significant differences in the immune response to virus were observed in Ly6E∆HSC compared to wild type animals. Results show that Ly6E ablation at hematopoietic stem cells (HSCs) leads to a progressive impaired immune response in both liver and spleen. Specifically, depletion of the normal leukocyte mediated immunity and chemokine signaling is observed in the liver of Ly6E∆HSC mice. 
On the other hand, the immune response in the spleen, which seemed to be mediated by an intense chromatin activity in the normal situation, is replaced by ECM remodeling in Ly6E∆HSC mice. These findings, which require further experimental characterization, could be extrapolated to other coronaviruses and motivate the efforts towards novel antiviral approaches.}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
D. S. Rodríguez-Baena and F. Gómez-Vela and M. García-Torres and F. Divina and C. D. Barranco and N. Díaz-Díaz and M. Jimenez and G. Montalvo Identifying livestock behavior patterns based on accelerometer dataset Journal Article Journal of Computational Science, 41, pp. 101076, 2020. @article{Rodriguez-Baena20, title = {Identifying livestock behavior patterns based on accelerometer dataset}, author = {D. S. Rodríguez-Baena and F. Gómez-Vela and M. García-Torres and F. Divina and C. D. Barranco and N. Díaz-Díaz and M. Jimenez and G. Montalvo}, url = {https://doi.org/10.1016/j.jocs.2020.101076}, doi = {10.1016/j.jocs.2020.101076}, year = {2020}, date = {2020-01-01}, journal = {Journal of Computational Science}, volume = {41}, pages = {101076}, abstract = {In large livestock farming it would be beneficial to be able to automatically detect behaviors in animals. In fact, this would allow to estimate the health status of individuals, providing valuable insight to stock raisers. Traditionally this process has been carried out manually, relying only on the experience of the breeders. Such an approach is effective for a small number of individuals. However, in large breeding farms this may not represent the best approach, since, in this way, not all the animals can be effectively monitored all the time. Moreover, the traditional approach heavily rely on human experience, which cannot be always taken for granted. To this aim, in this paper, we propose a new method for automatically detecting activity and inactivity time periods of animals, as a behavior indicator of livestock. In order to do this, we collected data with sensors located in the body of the animals to be analyzed. In particular, the reliability of the method was tested with data collected on Iberian pigs and calves.
Results confirm that the proposed method can help breeders in detecting activity and inactivity periods for large livestock farming.}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
T. Vanhaeren and F. Divina and M. García-Torres and F. Gómez-Vela and W. Vanhoof and P. M. Martínez-García A Comparative Study of Supervised Machine Learning Algorithms for the Prediction of Long-Range Chromatin Interactions Journal Article Genes, 11 (9), pp. 985, 2020. @article{Vanhaeren20, title = {A Comparative Study of Supervised Machine Learning Algorithms for the Prediction of Long-Range Chromatin Interactions}, author = {T. Vanhaeren and F. Divina and M. García-Torres and F. Gómez-Vela and W. Vanhoof and P. M. Martínez-García}, year = {2020}, date = {2020-01-01}, journal = {Genes}, volume = {11}, number = {9}, pages = {985}, abstract = {The role of three-dimensional genome organization as a critical regulator of gene expression has become increasingly clear over the last decade. Most of our understanding of this association comes from the study of long range chromatin interaction maps provided by Chromatin Conformation Capture-based techniques, which have greatly improved in recent years. Since these procedures are experimentally laborious and expensive, in silico prediction has emerged as an alternative strategy to generate virtual maps in cell types and conditions for which experimental data of chromatin interactions is not available. Several methods have been based on predictive models trained on one-dimensional (1D) sequencing features, yielding promising results. However, different approaches vary both in the way they model chromatin interactions and in the machine learning-based strategy they rely on, making it challenging to carry out performance comparison of existing methods. In this study, we use publicly available 1D sequencing signals to model cohesin-mediated chromatin interactions in two human cell lines and evaluate the prediction performance of six popular machine learning algorithms: decision trees, random forests, gradient boosting, support vector machines, multi-layer perceptron and deep learning. 
Our approach accurately predicts long-range interactions and reveals that gradient boosting significantly outperforms the other five methods, yielding accuracies of about 95%. We show that chromatin features in close genomic proximity to the anchors cover most of the predictive information, as has been previously reported. Moreover, we demonstrate that gradient boosting models trained with different subsets of chromatin features, unlike the other methods tested, are able to produce accurate predictions. In this regard, and besides architectural proteins, transcription factors are shown to be highly informative. Our study provides a framework for the systematic prediction of long-range chromatin interactions, identifies gradient boosting as the best suited algorithm for this task and highlights cell-type specific binding of transcription factors at the anchors as important determinants of chromatin wiring mediated by cohesin}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
2019 |
M. García-Torres and D. Becerra-Alonso and F. A Gómez-Vela and F. Divina and I. López Cobo and F. Martínez-Álvarez Analysis of Student Achievement Scores: A Machine Learning Approach Conference ICEUTE 10th International Conference on EUropean Transnational Education, Advances in Intelligent Systems and Computing 2019. @conference{Garcia2019, title = {Analysis of Student Achievement Scores: A Machine Learning Approach}, author = {M. García-Torres and D. Becerra-Alonso and F. A Gómez-Vela and F. Divina and I. López Cobo and F. Martínez-Álvarez}, url = {https://link.springer.com/chapter/10.1007/978-3-030-20005-3_28}, year = {2019}, date = {2019-01-01}, booktitle = {ICEUTE 10th International Conference on EUropean Transnational Education}, pages = {275-284}, series = {Advances in Intelligent Systems and Computing}, keywords = {}, pubstate = {published}, tppubtype = {conference} } |
F. Gómez-Vela and F. M Delgado-Chaves and D.S. Rodríguez-Baena and M. García-Torres and F. Divina Ensemble and Greedy Approach for the Reconstruction of Large Gene Co-Expression Networks Journal Article Entropy, 21 (12), pp. 1139, 2019. @article{Entropy2019, title = {Ensemble and Greedy Approach for the Reconstruction of Large Gene Co-Expression Networks}, author = {F. Gómez-Vela and F. M Delgado-Chaves and D.S. Rodríguez-Baena and M. García-Torres and F. Divina}, url = {https://www.mdpi.com/1099-4300/21/12/1139}, doi = {https://doi.org/10.3390/e21121139}, year = {2019}, date = {2019-01-01}, journal = {Entropy}, volume = {21}, number = {12}, pages = {1139}, abstract = {Gene networks have become a powerful tool in the comprehensive analysis of gene expression. Due to the increasing amount of available data, computational methods for networks generation must deal with the so-called curse of dimensionality in the quest for the reliability of the obtained results. In this context, ensemble strategies have significantly improved the precision of results by combining different measures or methods. On the other hand, structure optimization techniques are also important in the reduction of the size of the networks, not only improving their topology but also keeping a positive prediction ratio. In this work, we present Ensemble and Greedy networks (EnGNet), a novel two-step method for gene networks inference. First, EnGNet uses an ensemble strategy for co-expression networks generation. Second, a greedy algorithm optimizes both the size and the topological features of the network. Not only do achieved results show that this method is able to obtain reliable networks, but also that it significantly improves topological features. Moreover, the usefulness of the method is proven by an application to a human dataset on post-traumatic stress disorder, revealing an innate immunity-mediated response to this pathology. 
These results are indicative of the method’s potential in the field of biomarkers discovery and characterization.}, keywords = {}, pubstate = {published}, tppubtype = {article} } |
E.L. Mangas and A. Rubio and R. Álvarez-Marín and G. Labrador-Herrera and J. Pachón and M. Eugenia Pachón-Ibáñez and F. Divina and A.J. Pérez-Pulido Pangenome of Acinetobacter baumannii uncovers two groups of genomes, one of them with genes involved in CRISPR/Cas defence systems associated with the absence of plasmids and exclusive genes for biofilm formation Journal Article Microbial Genomics, pp. mgen000309, 2019. @article{MG2019, title = {Pangenome of Acinetobacter baumannii uncovers two groups of genomes, one of them with genes involved in CRISPR/Cas defence systems associated with the absence of plasmids and exclusive genes for biofilm formation}, author = {E.L. Mangas and A. Rubio and R. Álvarez-Marín and G. Labrador-Herrera and J. Pachón and M. Eugenia Pachón-Ibáñez and F. Divina and A.J. Pérez-Pulido}, url = {https://www.microbiologyresearch.org/content/journal/mgen/10.1099/mgen.0.000309}, doi = {10.1099/mgen.0.000309}, year = {2019}, date = {2019-01-01}, journal = {Microbial Genomics}, pages = {mgen000309}, abstract = {Acinetobacter baumannii is an opportunistic bacterium that causes hospital-acquired infections with a high mortality and morbidity, since there are strains resistant to virtually any kind of antibiotic. The chase to find novel strategies to fight against this microbe can be favoured by knowledge of the complete catalogue of genes of the species, and their relationship with the specific characteristics of different isolates. In this work, we performed a genomics analysis of almost 2500 strains. Two different groups of genomes were found based on the number of shared genes. One of these groups rarely has plasmids, and bears clustered regularly interspaced short palindromic repeat (CRISPR) sequences, in addition to CRISPR-associated genes (cas genes) or restriction-modification system genes. This fact strongly supports the lack of plasmids. Furthermore, the scarce plasmids in this group also bear CRISPR sequences, and specifically contain genes involved in prokaryotic toxin–antitoxin systems that could either act as the still little known CRISPR type IV system or be the precursors of other novel CRISPR/Cas systems. 
In addition, a limited set of strains present a new cas9-like gene, which may complement the other cas genes in inhibiting the entrance of new plasmids into the bacteria. Finally, this group has exclusive genes involved in biofilm formation, which would connect CRISPR systems to the biogenesis of these bacterial resistance structures.}, keywords = {}, pubstate = {published}, tppubtype = {article} } |