Big Time – Aware Data | Universidad Pablo Olavide

In recent years, advances in technology have caused the amount of data being generated and stored to grow so quickly that 90% of the data in the world has been generated in the last few years. Processing this huge volume has become essential and has driven the evolution of data analysis tools, which are jointly designated as Data Mining. This development has given rise to the term Big Data. Big Data begins with the development of tools capable of processing large amounts of data and extracting value from them, such as Google's MapReduce paradigm and its open-source implementation Hadoop or, more recently, Apache Spark.

This project aims to analyze massive data with the particularity that the data must be indexed over time, i.e., an essential component of the nature of the data is that they have been collected over time. This case is very common in the field of Big Data. For example, two of the main sources of Big Data are 1) repositories of open data generated by the Administration in order to deploy transparency policies and 2) smart cities, where multiple sensors provide information on consumption, traffic, pollution, etc. These two types of data are meaningless unless they are analyzed with respect to their evolution over time. For instance, measurements of electrical demand or pollution can be analyzed for different purposes: to predict their evolution, to predict outliers, to find patterns that allow their progress to be compared with other data, to establish relationships among variables, and so on.
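As a small illustration of what "data indexed over time" means for the analyses mentioned above, the following Python sketch builds a tiny hourly series and applies two deliberately simple techniques: a moving-average forecast and a standard-deviation rule for flagging outliers. The readings, the function names `moving_average_forecast` and `flag_outliers`, and the `window` and `threshold` parameters are all assumptions made for this example; they are not the project's methods, and a real Big Data setting would distribute this work over a platform such as Hadoop or Spark.

```python
from datetime import datetime, timedelta
from statistics import mean, stdev

# Hypothetical hourly electrical-demand readings (MW); the values are made up.
start = datetime(2020, 1, 1, 0, 0)
values = [100.0, 102.0, 101.0, 103.0, 180.0, 104.0, 102.0, 105.0]

# Each observation is a (timestamp, value) pair: the time index is part of the data.
series = [(start + timedelta(hours=i), v) for i, v in enumerate(values)]


def moving_average_forecast(series, window=3):
    """Predict the next value as the mean of the last `window` observations."""
    recent = [v for _, v in series[-window:]]
    return mean(recent)


def flag_outliers(series, threshold=2.0):
    """Return timestamps whose value lies more than `threshold` sample
    standard deviations away from the mean of the whole series."""
    vals = [v for _, v in series]
    mu, sigma = mean(vals), stdev(vals)
    return [t for t, v in series if abs(v - mu) > threshold * sigma]


# Forecast for the hour that follows the last observation.
next_timestamp = series[-1][0] + timedelta(hours=1)
forecast = moving_average_forecast(series)

# Timestamps of readings that deviate strongly from the rest.
outliers = flag_outliers(series)
```

Both functions depend on the order of the observations or on their timestamps, which is the property that separates this kind of analysis from processing an unordered collection of records.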

Thus, although the analysis of massive data indexed over time shares some features with the problems solved by Big Data technologies, it also has particularities of its own, and addressing them is the main objective of this project. Four sub-objectives arise: prediction of future values, classification or discovery of similarity patterns, clustering of data over time, and discovery of patterns in multidimensional time series, all from a Big Data perspective.