Big Data Analytics - Data Science & Big Data Research Lab

Big data analytics has become a powerful tool for firms that want to make the most of their data. Accurate predictive and descriptive data analysis can yield substantial economic benefits, and leading companies are well aware of this.

Big data can be studied from two perspectives. The first, big data infrastructures, concerns how information is stored and secured. The second, big data analytics, concerns how that information is mined and then exploited to gain knowledge.

This training activity is divided into two main modules, infrastructures and analytics, which can be studied independently. A Scala programming session is also offered for those who intend to develop their own big data applications.

Course Contents

Introduction to big data
Big data infrastructure
- Apache Hadoop
  - Hadoop Distributed File System: HDFS
  - MapReduce paradigm
- Cluster computing frameworks: Apache Spark
- Cluster manager: Apache Mesos
Big data analytics
- Processing data using Apache Spark
  - Scala programming
- Machine learning library (Spark’s MLlib)
  - Clustering
  - Classification
  - Regression
  - Streaming

Project details