Miguel García Torres is an associate professor in the Escuela Politécnica Superior of the Universidad Pablo de Olavide. He received the BS degree in physics and the PhD degree in computer science from the Universidad de La Laguna, Tenerife, Spain, in 2001 and 2007, respectively. After obtaining his doctorate, he held a postdoc position in the Laboratory for Space Astrophysics and Theoretical Physics at the National Institute of Aerospace Technology (INTA). There, he joined the Gaia mission of the European Space Agency (ESA) and began participating in the Gaia Data Processing and Analysis Consortium (DPAC) as a member of the “Astrophysical Parameters” Coordination Unit (CU8). He has been involved in the “Object Clustering Analysis” (OCA) Development Unit since then. His research interests include machine learning, metaheuristics, big data, time series forecasting, bioinformatics and astrostatistics.
Publications
2024 |
M. García-Torres and D. P. Pinto-Roa and C. Núñez-Castillo and B. Quiñonez and G. Vázquez and M. Allegretti and M. E. García-Diaz Feature selection applied to QoS/QoE modeling on video and web-based mobile data services: An ordinal approach Journal Article In: Computer Communications, 2024. Nowadays, mobile service providers perceive the user experience as a reliable indicator of the quality associated with a service. Given a set of Quality of Service (QoS) factors, the aim is to predict the Quality of Experience (QoE), measured in terms of the Mean Opinion Score (MOS). Although this problem is receiving much attention, there are still some challenges that require more research in order to find effective solutions for meeting users’ expectations in terms of service quality. A core challenge in this topic is the analysis of the contribution of each factor to the QoS/QoE model. In this work, we study the mapping between QoS and QoE on video and web-based services using a machine learning approach. For this purpose, we design a lab-testing methodology to emulate different cellular transmission network scenarios. Then, we address the problem of inducing a predictive model and identifying relevant QoS factors. Results suggest that bandwidth is a key factor when analyzing users’ perception of service quality. |
F. Morales-Mareco and M. García-Torres and F. Divina and D. H. Stalder and C. Sauer Machine learning for electric energy consumption forecasting: Application to the Paraguayan system Journal Article In: Logic Journal of the IGPL, pp. jzae035, 2024. In this paper we address the problem of short-term electric energy prediction using a time series forecasting approach applied to data generated by a Paraguayan electricity distribution provider. The dataset used in this work contains data collected over a three-year period. This is the first time that these data have been used; therefore, a preprocessing phase of the data was also performed. In particular, we propose a comparative study of various machine learning and statistical strategies with the objective of predicting the electric energy consumption for a given prediction horizon, in our case seven days, using historical data. We tested the effectiveness of the techniques with different historical window sizes. Specifically, we considered two ensemble strategies, a neural network, a deep learning technique and linear regression. Moreover, we tested whether the inclusion of meteorological data can help achieve better predictions. In particular, we considered data regarding temperature, humidity, wind speed and atmospheric pressure registered during the three-year period of data collection. The results show that, in general, the deep learning approach obtains the best results and that such results are obtained when meteorological data are also considered. Moreover, when meteorological data are used, a smaller historical window size is required to obtain precise predictions. |
G. Sosa-Cabrera and S. Gómez-Guerrero and M. García-Torres and C. E. Schaerer Feature selection: A perspective on inter-attribute cooperation Journal Article In: International Journal of Data Science and Analytics, vol. 17, no. 2, pp. 139–151, 2024. High-dimensional datasets pose a challenge for learning tasks in data mining and machine learning. Feature selection is an effective technique for dimensionality reduction. It is often an essential data processing step prior to applying a learning algorithm. Over the decades, filter feature selection methods have evolved from simple univariate relevance ranking algorithms to more sophisticated relevance-redundancy trade-offs and to multivariate dependencies-based approaches in recent years. This tendency to capture multivariate dependence aims at obtaining unique information about the class from the intercooperation among features. This paper presents a comprehensive survey of the state-of-the-art work on filter feature selection methods assisted by feature intercooperation, and summarizes the contributions of different approaches found in the literature. Furthermore, current issues and challenges are introduced to identify promising future research and development. |
F. Divina and M. García-Torres and F. Gómez-Vela and D. S. Rodriguez-Baena A stacking ensemble learning for Iberian pigs activity prediction: a time series forecasting approach Journal Article In: AIMS Mathematics, vol. 9, no. 5, pp. 13358–13384, 2024. Automatic determination of abnormal animal activities can be helpful for the timely detection of signs of health and welfare problems. Usually, this problem is addressed as a classification problem, which typically requires manual annotation of behaviors. This manual annotation can introduce noise into the data and may not always be possible. This motivated us to address the problem as a time-series forecasting problem in which the activity of an animal can be predicted. In this work, different machine learning techniques were tested to obtain activity patterns for Iberian pigs. In particular, we propose a novel stacking ensemble learning approach that combines base learners with meta-learners to obtain the final predictive model. Results confirm the superior performance of the proposed method relative to the other tested strategies. We also explored the possibility of using predictive models trained on an animal to predict the activity of different animals on the same farm. As expected, the predictive performance degrades in this case, but it remains acceptable. The proposed method could be integrated into a monitoring system that may have the potential to transform the way farm animals are monitored, improving their health and welfare conditions, for example, by allowing the early detection of a possible health problem. |
2023 |
O. Cardozo and V. Ojeda and R. Parra and J. C. Mello-Román and J. L. Noguera Vázquez and M. García-Torres and F. Divina and S. Grillo and C. Villalba and J. Facon Dataset of fundus images for the diagnosis of ocular toxoplasmosis Journal Article In: Data in Brief, pp. 109056, 2023. Toxoplasmosis chorioretinitis is commonly diagnosed by an ophthalmologist through the evaluation of the fundus images of a patient. Early detection of these lesions may help to prevent blindness. In this article we present a data set of fundus images labeled into three categories: healthy eye, inactive and active chorioretinitis. The dataset was developed by three ophthalmologists with expertise in toxoplasmosis detection using fundus images. The dataset will be of great use to researchers working on ophthalmic image analysis using artificial intelligence techniques for the automatic detection of toxoplasmosis chorioretinitis. |
M. García-Torres and R. Ruiz and F. Divina Evolutionary feature selection on high dimensional data using a search space reduction approach Journal Article In: Engineering Applications of Artificial Intelligence, vol. 117, pp. 105556, 2023. Feature selection is becoming an increasingly challenging task due to the growing dimensionality of the data. The complexity of the interactions among features and the size of the search space make it infeasible to find the optimal subset of features. In order to reduce the search space, feature grouping has arisen as an approach that clusters features according to the information they share about the class. On the other hand, metaheuristic algorithms have proven to achieve sub-optimal solutions within a reasonable time. In this work we propose a Scatter Search (SS) strategy that uses feature grouping to generate an initial population comprised of diverse and high quality solutions. Solutions are then evolved by applying random mechanisms in combination with the feature group structure, with the objective of maintaining, during the search, a population of solutions that are both good and as diverse as possible. Not only does the proposed strategy provide the best subset of features found but it also reduces the redundancy structure of the data. We test the strategy on high dimensional data from biomedical and text-mining domains. The results are compared with those obtained by other adaptations of SS and other popular strategies. Results show that the proposed strategy can find, on average, the smallest subsets of features without degrading the performance of the classifier. |
M. Vázquez-Marrufo and E. Sarrias-Arrabal and M. García-Torres and R. Martín-Clemente and G. Izquierdo A systematic review of the application of machine-learning algorithms in multiple sclerosis Journal Article In: Neurología (English Edition), 2023. Introduction: The applications of artificial intelligence, and in particular automatic learning or “machine learning” (ML), constitute both a challenge and a great opportunity in numerous scientific, technical, and clinical disciplines. Specific applications in the study of multiple sclerosis (MS) have been no exception, and constitute an area of increasing interest in recent years. Objective: We present a systematic review of the application of ML algorithms in MS. Materials and methods: We used the PubMed search engine, which allows free access to the MEDLINE medical database, to identify studies including the keywords “machine learning” and “multiple sclerosis.” We excluded review articles, studies written in languages other than English or Spanish, and studies that were mainly technical and did not specifically apply to MS. The final selection included 76 articles, and 38 were rejected. Conclusions: After the review process, we established 4 main applications of ML in MS: 1) classifying MS subtypes; 2) distinguishing patients with MS from healthy controls and individuals with other diseases; 3) predicting progression and response to therapeutic interventions; and 4) other applications. Results found to date have shown that ML algorithms may offer great support for health professionals both in clinical settings and in research into MS. |
G. Sosa-Cabrera and S. Gómez-Guerrero and M. García-Torres and C. E. Schaerer Feature selection: A perspective on inter-attribute cooperation Journal Article In: International Journal of Data Science and Analytics, pp. 1–13, 2023. High-dimensional datasets pose a challenge for learning tasks in data mining and machine learning. Feature selection is an effective technique for dimensionality reduction. It is often an essential data processing step prior to applying a learning algorithm. Over the decades, filter feature selection methods have evolved from simple univariate relevance ranking algorithms to more sophisticated relevance-redundancy trade-offs and to multivariate dependencies-based approaches in recent years. This tendency to capture multivariate dependence aims at obtaining unique information about the class from the intercooperation among features. This paper presents a comprehensive survey of the state-of-the-art work on filter feature selection methods assisted by feature intercooperation, and summarizes the contributions of different approaches found in the literature. Furthermore, current issues and challenges are introduced to identify promising future research and development. |
2022 |
G. Velázquez and F. Morales and M. García-Torres and F. Gómez-Vela and F. Divina and J.L. Vázquez Noguera and F. Daumas-Ladouce and C. Ayala and D. Pinto-Roa and P. Gardel-Sotomayor Distribution level Electric current consumption and meteorological data set of the East region of Paraguay Journal Article In: Data in Brief, vol. 40, pp. 107699, 2022. This paper presents a data set with information on meteorological data and electricity consumption in the department of Alto Paraná, Paraguay. The meteorological data were registered every three hours at the Aeropuerto Guarani, Department of Alto Paraná, which belongs to the Dirección Nacional de Aeronáutica Civil of Paraguay. The final data consist of a total of 22,445 records of temperature, relative humidity, wind speed and atmospheric pressure. On the other hand, the electrical energy consumption data set contains a total of 1,848,947 records, all of them coming from the one hundred and fifteen feeders located throughout the Alto Paraná region of Paraguay. Electrical energy consumption data were provided by Administración Nacional de Electricidad (ANDE). The analysis of these data can yield insights regarding the energy consumption in the area. |
S. Gómez-Guerrero and I. Ortiz and G. Sosa-Cabrera and M. García-Torres and C.E. Schaerer Measuring Interactions in Categorical Datasets Using Multivariate Symmetrical Uncertainty Journal Article In: Entropy, vol. 24, no. 1, pp. 64, 2022. Interaction between variables is often found in statistical models, and it is usually expressed in the model as an additional term when the variables are numeric. However, when the variables are categorical (also known as nominal or qualitative) or mixed numerical-categorical, defining, detecting, and measuring interactions is not a simple task. In this work, based on an entropy-based correlation measure for n nominal variables, named Multivariate Symmetrical Uncertainty (MSU), we propose a formal and broader definition for the interaction of the variables. Two series of experiments are presented. In the first series, we observe that datasets where some record types or combinations of categories are absent, forming patterns of records, often display interactions among their attributes. In the second series, the interaction/non-interaction behavior of a regression model (entirely built on continuous variables) gets successfully replicated under a discretized version of the dataset. It is shown that there is an interaction-wise correspondence between the continuous and the discretized versions of the dataset. Hence, we demonstrate that the proposed definition of interaction enabled by the MSU is a valuable tool for detecting and measuring interactions within linear and non-linear models. |
F. Morales and M. García-Torres and G. Velázquez and F. Daumas-Ladouce and P. Gardel-Sotomayor and F. Gómez-Vela and F. Divina and J. L. Vázquez Noguera and C. Sauer Ayala and D. Pinto-Roa Analysis of Electric Energy Consumption Profiles Using a Machine Learning Approach: A Paraguayan Case Study Journal Article In: Electronics, vol. 11, no. 2, pp. 267, 2022. Correctly defining and grouping electrical feeders is of great importance for electrical system operators. In this paper, we compare two different clustering techniques, K-means and hierarchical agglomerative clustering, applied to real data from the east region of Paraguay. The raw data were pre-processed, resulting in four data sets, namely, (i) a weekly feeder demand, (ii) a monthly feeder demand, (iii) a statistical feature set extracted from the original data and (iv) a seasonal and daily consumption feature set obtained considering the characteristics of the Paraguayan load curve. Considering the four data sets, two clustering algorithms, two distance metrics and five linkage criteria, a total of 36 models were assessed with the Silhouette, Davies–Bouldin and Calinski–Harabasz index scores. The K-means algorithms with the seasonal feature data sets showed the best performance considering the Silhouette, Calinski–Harabasz and Davies–Bouldin validation index scores with a configuration of six clusters. |
D. Aquino-Brítez and J.A. Gómez and J.L. Vázquez Noguera and M. García-Torres and J.C. Mello Román and P.E. Gardel-Sotomayor and V.E. Castillo Benitez and I. Castro Matto and D.P. Pinto-Roa and J. Facon and S.A. Grillo Automatic Diagnosis of Diabetic Retinopathy from Fundus Images Using Neuro-Evolutionary Algorithms Journal Article In: Studies in Health Technology and Informatics, vol. 290, pp. 689–693, 2022. Due to the presence of high glucose levels, diabetes mellitus (DM) is a widespread disease that can damage blood vessels in the retina and lead to loss of vision. Retinography, which uses images of the fundus of the retina, is the most widely used method for diagnosing this condition, called Diabetic Retinopathy (DR). Deep Learning (DL) has achieved high performance in the classification of retinal images and has even come close to human performance in diagnostic tasks. However, the performance of DL architectures is highly dependent on the optimal configuration of the hyperparameters. In this article, we propose the use of Neuroevolutionary Algorithms to optimize the hyperparameters of the DL model for the diagnosis of DR. The results show that the proposed method outperforms the classical approach. |
P. Mugariri and H. Abdullah and M. García-Torres and B.D. Parameshchari and K.N. Abdul-Sattar Promoting Information Privacy Protection Awareness for Internet of Things (IoT) Journal Article In: Mobile Information Systems, vol. 2022, pp. 1–11, 2022. The Internet of Things (IoT) has had a considerable influence on our daily lives by enabling enhanced connection of devices, systems, and services that extends beyond machine-to-machine interactions and encompasses a wide range of protocols, domains, and applications. However, despite privacy concerns shown by IoT users, little has been done to reduce and protect individual information exposure. It is extremely difficult to protect IoT devices from reidentification threats, which is why it is still a major challenge for IoT users to securely protect their information. Trust controls how we regulate privacy in our IoT platforms in the same way that it governs personal relationships. As IoT devices become increasingly linked, more data is shared across individuals, businesses, governments, and ecosystems. Technologies, sensors, machines, data, and cloud connections all rely largely on trust relationships that have been formed. The rapid growth in the types of IoT devices being introduced expands privacy concerns, and it is difficult to develop trust with an IoT system or device without the option to regulate information privacy settings. Privacy has always been a barrier for many devices as they race for the early adoption of IoT technologies. Several Internet of Things devices or systems will continue to pose privacy threats. As a result, the main objective of this study was to examine the individual understanding of privacy and to promote information privacy protection awareness not only to IoT users but also to organizations that use IoT devices or platforms to run their day-to-day business operations. 
Furthermore, the objective extends to comparing user knowledge and concerns about IoT privacy, as well as to identifying any common attitudes and variances. To enhance individuals’ knowledge, an artifact was developed to educate and enhance information privacy awareness among IoT users. A pre- and postquestionnaire was generated to test and validate user knowledge regarding information privacy protection in IoT. The study was conducted using a quantitative research method. Findings indicate that IoT users’ awareness of information privacy protection turned out to be average, suggesting a need for education and awareness. Several participants stated that information privacy protection awareness is required within the community to educate, raise awareness, eliminate human error, and enable individuals to be conscious of their privacy when surfing the Internet. |
2021 |
M. García-Torres and F. Gómez-Vela and F. Divina and D.P. Pinto-Roa and J.L. Vázquez Noguera and J.C. Román Scatter search for high-dimensional feature selection using feature grouping Conference GECCO Genetic and Evolutionary Computation Conference, 2021. |
R. Parra and V. Ojeda and J.L. Vázquez Noguera and M. García-Torres and J.C. Mello-Román and C. Villalba and J. Facon and F. Divina and O. Cardozo and V. Castillo A Trust-Based Methodology to Evaluate Deep Learning Models for Automatic Diagnosis of Ocular Toxoplasmosis from Fundus Images Journal Article In: Diagnostics, vol. 11, no. 11, pp. 1951, 2021. |