Big Data – Instituto Politécnico de Coimbra

Recommended Prior Knowledge

Knowledge of databases and programming is recommended.

Teaching Methodologies

Teaching takes place in the classroom, where concepts, techniques and methods are presented with a strong focus on practical applications. Software will be used to help solve problems covered in the program.

Learning Outcomes

Data generation and storage have grown remarkably, requiring increasingly complex and comprehensive processes for managing and processing information. This trend can be observed in a wide range of areas, from data collected through loyalty cards, to data available online on social networks, to data automatically generated by clinical analysis devices, among others.

All this information opens up a vast number of opportunities, particularly for companies and for the advancement of knowledge in general.

The Big Data curricular unit aims to give students knowledge of big data analysis, namely the collection, processing and provision of this massive data, and to make them aware of the whole ecosystem associated with this type of data.

In terms of skills, students are expected to be able to:

Enumerate the concepts associated with data processing;
Use some of the most important technologies in this domain such as Hadoop or PySpark.

Program

Big Data Fundamentals
Big Data Storage
    Relational Databases
    NoSQL Databases
Data Processing in Big Data
    Hadoop MapReduce
    Apache Spark
    PySpark

Curricular Unit Teachers

António Rui Trigo Ribeiro

Internship(s)

No

Bibliography

Fundamental:

– Antonio Trigo (2023). PyTrigo – Introdução à Data Science com Python (Version v0.12). Zenodo. http://doi.org/10.5281/zenodo.1288006

– MongoDB Documentation: https://docs.mongodb.com/

– Apache Hadoop Documentation: https://hadoop.apache.org/docs/

– Apache Spark Documentation: https://spark.apache.org/docs/

– Bill Chambers and Matei Zaharia (2018). Spark: The Definitive Guide: Big Data Processing Made Simple. O’Reilly.

Complementary:

– Manuel Ignacio Franco Galeano (2018). Big Data Processing with Apache Spark: Efficiently Tackle Large Datasets and Big Data Analysis with Spark and Python. Packt Publishing.

– Bernard Marr (2016). Big Data in Practice: How 45 Successful Companies Used Big Data Analytics to Deliver Extraordinary Results. Wiley.

– Saeed, A., & Ebrahim, H. (2024). The Intersection of Machine Learning, Artificial Intelligence, and Big Data. In Big Data Computing (pp. 111-131). CRC Press.

– Hassan, M. A. M., Sardar, T. H., Fahim, M. F. H., Mohamed, M. S., Suleyman, R. M., & Usman, M. M. (2024). Artificial Intelligence and Deep Learning Applications on Big Data Computing Frameworks. In Big Data Computing (pp. 196-211). CRC Press.

– Pradhan, T., Nimkar, P., & Jhajharia, K. (2024). Machine Learning and Deep Learning for Big Data Analysis. In Big Data Analytics Techniques for Market Intelligence (pp. 209-240). IGI Global.

– Darius, P. S., Sowjanya, K., Manju, V. N., Saha, S., Mitra, P., Majumder, P., & Prabhu, S. M. (2024). From Data to Insights: A Review of Cloud-Based Big Data Tools and Technologies. In Big Data Computing (pp. 86-110). CRC Press.