Machine learning for high frequency data

90% of the data generated in the world has been produced in recent years. A vast amount of this data is indexed by time, i.e., the time at which each value was obtained is an intrinsic feature of the data.

Moreover, data is often generated by electronic devices. Such data has a high temporal frequency, which gives rise to long sequences of data. This kind of data can be divided into two classes: time series and data streams. Time series are sequences of values measured at specific, equally spaced time intervals. Data streams, on the other hand, are continuous flows of data that arrive at high velocity, at times that are not necessarily equally spaced.
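
The distinction between the two classes can be illustrated with a minimal sketch that checks whether a sequence of timestamps is equally spaced. The function name `is_regularly_sampled`, the `tolerance` parameter and the sample timestamps are assumptions made for this illustration only; they are not part of the project.

```python
from datetime import datetime, timedelta


def is_regularly_sampled(timestamps, tolerance=timedelta(0)):
    """Return True if consecutive timestamps are equally spaced.

    Hypothetical helper: `tolerance` is the maximum allowed deviation
    of any gap from the first gap.
    """
    if len(timestamps) < 3:
        # Fewer than two gaps: nothing to compare.
        return True
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return all(abs(gap - gaps[0]) <= tolerance for gap in gaps)


start = datetime(2024, 1, 1)

# Time series: one value every second (illustrative data).
time_series = [start + timedelta(seconds=i) for i in range(5)]

# Data stream: arrival times with irregular gaps (illustrative data).
data_stream = [start + timedelta(seconds=s) for s in (0.0, 0.2, 1.7, 1.9, 4.0)]

print(is_regularly_sampled(time_series))
print(is_regularly_sampled(data_stream))
```

In practice, sensor clocks drift slightly, so a non-zero `tolerance` would probably be needed before treating real measurements as a time series.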

Of the three main features of data (velocity, variety and volume), volume has received the most attention from researchers. In recent years, many algorithms have been developed to deal with the problems posed by data volume. Nevertheless, velocity is also intrinsic to many problems, for example those involving data generated by sensors. Such data has a huge socioeconomic impact, for instance in Smart Cities or in Industry 4.0. As a result, the research community is now paying increasing attention to velocity.

The objective of this project is the analysis of high frequency data. In particular, we aim to address high frequency time series and temporal data streams. Both kinds of data present their own peculiarities and problems. We aim to develop machine learning algorithms for time series forecasting, and machine learning models for data streams that perform both prediction and anomaly detection in real time. To assess the validity of the algorithms, the prediction algorithms will be applied to electric vehicle and building consumption data, while the anomaly detection algorithms will be applied in the context of Smart Grids.

Project details