Big Data

Base Knowledge

Knowledge of databases and programming is recommended. 

Teaching Methodologies

Teaching takes place in the classroom and presents concepts, techniques and methods, with a strong focus on practical applications. Software will be used to support solving the problems covered in the program.

Learning Outcomes

The generation and storage of data have grown remarkably, requiring increasingly complex and comprehensive information management and processing. This trend can be observed in a wide variety of areas, from data collected through loyalty cards, to data available on social networks, to data automatically generated by clinical analysis devices, among others.

All this information opens up a vast number of opportunities, particularly for companies and for knowledge in general.

The Big Data curricular unit aims to provide students with knowledge in the field of big data analysis, namely the collection, processing and availability of massive data, and to make students aware of the whole ecosystem associated with this type of data.

In terms of skills, students are expected to be able to:

  • Enumerate the concepts associated with data processing;
  • Use some of the most important technologies in this domain such as Hadoop or PySpark.

Program

  1. Big Data Fundamentals
  2. Big Data Storage
    1. Relational Databases
    2. NoSQL Databases
  3. Data Processing in Big Data
    1. Hadoop MapReduce
    2. Apache Spark
    3. PySpark

Curricular Unit Teachers

Internship(s)

No

Bibliography

Fundamental:

– Antonio Trigo. (2023). PyTrigo – Introdução à Data Science com Python (Version v0.12). Zenodo. http://doi.org/10.5281/zenodo.1288006

– MongoDB Documentation: https://docs.mongodb.com/

– Apache Hadoop Documentation: https://hadoop.apache.org/docs/

– Apache Spark Documentation: https://spark.apache.org/docs/

– Bill Chambers and Matei Zaharia (2018). Spark: The Definitive Guide: Big Data Processing Made Simple. O’Reilly.

Complementary:

– Manuel Ignacio Franco Galeano (2018). Big Data Processing with Apache Spark: Efficiently Tackle Large Datasets and Big Data Analysis with Spark and Python. Packt Publishing.

– Bernard Marr (2016). Big Data in Practice: How 45 Successful Companies Used Big Data Analytics to Deliver Extraordinary Results. Wiley.

– Saeed, A., & Ebrahim, H. (2024). The Intersection of Machine Learning, Artificial Intelligence, and Big Data. In Big Data Computing (pp. 111-131). CRC Press.

– Hassan, M. A. M., Sardar, T. H., Fahim, M. F. H., Mohamed, M. S., Suleyman, R. M., & Usman, M. M. (2024). Artificial Intelligence and Deep Learning Applications on Big Data Computing Frameworks. In Big Data Computing (pp. 196-211). CRC Press.

– Pradhan, T., Nimkar, P., & Jhajharia, K. (2024). Machine Learning and Deep Learning for Big Data Analysis. In Big Data Analytics Techniques for Market Intelligence (pp. 209-240). IGI Global.

– Darius, P. S., Sowjanya, K., Manju, V. N., Saha, S., Mitra, P., Majumder, P., & Prabhu, S. M. (2024). From Data to Insights: A Review of Cloud-Based Big Data Tools and Technologies. In Big Data Computing (pp. 86-110). CRC Press.