Prof. Rubén Pérez Chacón, Ph.D. holds a Master’s Degree in Information Security (Autonomous University of Barcelona, 2014) and completed his Ph.D. in Computer Science (Pablo de Olavide University, 2021).
With over 10 years of experience, he currently leads an IT team in the public sector and combines this role with postdoctoral research in the Data Science and Big Data Lab at Pablo de Olavide University, where he also serves as an Assistant Professor in the area of Languages and Information Systems. His research focuses on data mining and machine learning, particularly time series forecasting on large volumes of data.
Publications
2024
R. Pérez-Chacón, G. Asencio-Cortés, A. Troncoso, and F. Martínez-Álvarez. Pattern sequence-based algorithm for multivariate big data time series forecasting: Application to electricity consumption. Journal Article. In: Future Generation Computer Systems, vol. 154, pp. 397-412, 2024.
Several interrelated variables typically characterize real-world processes, and a time series cannot be predicted without considering the influence that other time series might have on the target time series. This work proposes a novel algorithm to forecast multivariate big data time series. This new general-purpose approach consists first of a previous pattern recognition performed jointly using all time series that form the multivariate time series and then predicts the target time series by searching for similarities between pattern sequences. The proposed algorithm is designed to tackle multivariate time series forecasting problems within the context of big data. In particular, the algorithm has been developed with a distributed nature to enhance its efficiency in analyzing and processing large volumes of data. Moreover, the algorithm is straightforward to use, with only two parameters needing adjustment. Another advantage of the MV-bigPSF algorithm is its ability to perform multi-step forecasting, which is particularly useful in many practical applications. To evaluate the algorithm’s performance, real-world data from Uruguay’s power consumption has been utilized. Specifically, MV-bigPSF has been compared with both univariate and multivariate methods. Regarding the univariate ones, MV-bigPSF improved 12.8% in MAPE compared to the second-best method. Regarding the multivariate comparison, MV-bigPSF improved 44.8% in MAPE with respect to the second most accurate method. Regarding efficiency, the execution time of MV-bigPSF was 1.83 times faster than the second-fastest multivariate method, both in a single-core environment. Therefore, the proposed algorithm can be a valuable tool for practitioners and researchers working in multivariate time series forecasting, particularly in big data applications.
2020
F. Martínez-Álvarez, G. Asencio-Cortés, J. F. Torres, D. Gutiérrez-Avilés, L. Melgar-García, R. Pérez-Chacón, C. Rubio-Escudero, A. Troncoso, and J. C. Riquelme. Coronavirus Optimization Algorithm: A bioinspired metaheuristic based on the COVID-19 propagation model. Journal Article. In: Big Data, vol. 8, no. 4, pp. 308-322, 2020.
This work proposes a novel bioinspired metaheuristic, simulating how the coronavirus spreads and infects healthy people. From a primary infected individual (patient zero), the coronavirus rapidly infects new victims, creating large populations of infected people who will either die or spread infection. Relevant terms such as reinfection probability, super-spreading rate, social distancing measures or traveling rate are introduced into the model in order to simulate the coronavirus activity as accurately as possible. The infected population initially grows exponentially over time, but taking into consideration social isolation measures, the mortality rate and number of recoveries, the infected population gradually decreases. The Coronavirus Optimization Algorithm has two major advantages when compared to other similar strategies. Firstly, the input parameters are already set according to the disease statistics, preventing researchers from initializing them with arbitrary values. Secondly, the approach has the ability to end after several iterations, without setting this value either. Furthermore, a parallel multi-virus version is proposed, where several coronavirus strains evolve over time and explore wider search space areas in fewer iterations. Finally, the metaheuristic has been combined with deep learning models, in order to find optimal hyperparameters during the training phase. As an application case, the problem of electricity load time series forecasting has been addressed, showing quite remarkable performance.
R. Pérez-Chacón, G. Asencio-Cortés, F. Martínez-Álvarez, and A. Troncoso. Big data time series forecasting based on pattern sequence similarity and its application to the electricity demand. Journal Article. In: Information Sciences, vol. 540, pp. 160-174, 2020.
This work proposes a novel algorithm to forecast big data time series. Based on the well-established Pattern Sequence Forecasting algorithm, this new approach has two major contributions to the literature. First, the improvement of the aforementioned algorithm with respect to the accuracy of predictions, and second, its transformation into the big data context, having reached meaningful results in terms of scalability. The algorithm uses the Apache Spark distributed computation framework and it is a ready-to-use application with few parameters to adjust. Physical and cloud clusters have been used to carry out the experimentation, which consisted in applying the algorithm to real-world data from Uruguay electricity demand.
2019
R. Talavera-Llames, R. Pérez-Chacón, A. Troncoso, and F. Martínez-Álvarez. MV-kWNN: A novel multivariate and multi-output weighted nearest neighbors algorithm for big data time series forecasting. Journal Article. In: Neurocomputing, vol. 353, pp. 56-73, 2019.
This paper introduces a novel algorithm for big data time series forecasting. Its main novelty lies in its ability to deal with multivariate data, i.e. to consider multiple time series simultaneously, in order to make multi-output predictions. Real-world processes are typically characterised by several interrelated variables, and the future occurrence of certain time series cannot be explained without understanding the influence that other time series might have on the target time series. One key issue in the context of the multivariate analysis is to determine a priori whether exogenous variables must be included in the model or not. To deal with this, a correlation analysis is used to find a minimum correlation threshold that an exogenous time series must exhibit, in order to be beneficial. Furthermore, the proposed approach has been specifically designed to be used in the context of big data, thus making it possible to efficiently process very large time series. To evaluate the performance of the proposed approach we use data from Spanish electricity prices. Results have been compared to other multivariate approaches showing remarkable improvements both in terms of accuracy and execution time.
2018
R. Talavera-Llames, R. Pérez-Chacón, A. Troncoso, and F. Martínez-Álvarez. Big data time series forecasting based on nearest neighbors distributed computing with Spark. Journal Article. In: Knowledge-Based Systems, vol. 161, no. 1, pp. 12-25, 2018.
A new approach for big data forecasting based on the k-weighted nearest neighbours algorithm is introduced in this work. Such an algorithm has been developed for distributed computing under the Apache Spark framework. Every phase of the algorithm is explained in this work, along with how the optimal values of the input parameters required for the algorithm are obtained. In order to test the developed algorithm, a Spanish energy consumption big data time series has been used. The accuracy of the prediction has been assessed showing remarkable results. Additionally, the optimal configuration of a Spark cluster has been discussed. Finally, a scalability analysis of the algorithm has been conducted leading to the conclusion that the proposed algorithm is highly suitable for big data environments.
R. Pérez-Chacón, J. M. Luna, A. Troncoso, F. Martínez-Álvarez, and J. C. Riquelme. Big data analytics for discovering electricity consumption patterns in smart cities. Journal Article. In: Energies, vol. 11, no. 3, pp. 683, 2018.
New technologies such as sensor networks have been incorporated into the management of buildings for organizations and cities. Sensor networks have led to an exponential increase in the volume of data available in recent years, which can be used to extract consumption patterns for the purposes of energy and monetary savings. For this reason, new approaches and strategies are needed to analyze information in big data environments. This paper proposes a methodology to extract electric energy consumption patterns in big data time series, so that very valuable conclusions can be made for managers and governments. The methodology is based on the study of four clustering validity indices in their parallelized versions along with the application of a clustering technique. In particular, this work uses a voting system to choose an optimal number of clusters from the results of the indices, as well as the application of the distributed version of the k-means algorithm included in Apache Spark’s Machine Learning Library. The results, using electricity consumption for the years 2011–2017 for eight buildings of a public university, are presented and discussed. In addition, the performance of the proposed methodology is evaluated using synthetic big data, which can represent thousands of buildings in a smart city. Finally, policies derived from the patterns discovered are proposed to optimize energy usage across the university campus.
2016
R. Talavera-Llames, R. Pérez-Chacón, M. Martínez-Ballesteros, A. Troncoso, and F. Martínez-Álvarez. A Nearest Neighbours-Based Algorithm for Big Time Series Data Forecasting. Conference. HAIS 11th International Conference on Hybrid Artificial Intelligent Systems, Lecture Notes in Computer Science, 2016.
R. Pérez-Chacón, R. Talavera-Llames, F. Martínez-Álvarez, and A. Troncoso. Finding Electric Energy Consumption Patterns in Big Time Series Data. Conference. DCAI 13th International Conference on Distributed Computing and Artificial Intelligence, Advances in Intelligent Systems and Computing, 2016.