Big Data

Teaching Methodologies

Teaching takes place in class or via videoconference and covers concepts, techniques, and methods, with a strong focus on solving practical problems. Software will be used to support problem solving. Students are evaluated by a work assignment and a final written exam, each counting equally toward the final grade. The assignment grade is only considered if the student obtains at least 8 (on a scale of 0 to 20) in the written exam.

Learning Outcomes

In everyday life, huge amounts of data are generated through websites, cell phones, wearable devices, and sensors associated with the Internet of Things, among others. Processing this data requires specific tools, because the volume exceeds the capacity of our PCs and even of some servers, making it necessary to use distributed systems for data processing. The main objective of this course is to familiarize students with the most important information technologies used in the manipulation, storage, and analysis of large amounts of data. One of the major examples is the Apache Spark framework, used for distributed computing.

Program

1. Relational databases (SQL)
2. NoSQL databases
3. Big Data fundamentals
4. Hadoop and Spark
5. Machine Learning in Big Data

Internship(s)

No

Bibliography

– Antonio Trigo. (2018, June 12). PyTrigo – Introdução à Data Science com Python (Version v0.12). Zenodo. http://doi.org/10.5281/zenodo.1288006
– Mining of Massive Datasets, A. Rajaraman, J. Ullman, Cambridge University Press, 2011.
– Big Data: Algorithms, Analytics, and Applications, Kuan-Ching Li et al., Chapman and Hall/CRC, 2015.
– Advanced Analytics with Spark: Patterns for Learning from Data at Scale, Sandy Ryza et al., O’Reilly Media, 2017.