Master in Data Analysis and Decision Supporting Systems

Teaching Methodologies

The teaching activity takes place in class, combining the presentation of concepts, techniques, and methods with a strong focus on solving practical problems. Software is used to support problem-solving.

Learning Outcomes

In everyday life, vast amounts of data are generated by websites, mobile phones, wearable devices, and Internet of Things sensors, among other sources. Processing data at this scale exceeds the capacity of personal computers and even of some servers, which makes specialized tools and distributed systems necessary. The main objective of this course is to familiarize students with the most important information technologies for manipulating, storing, and analyzing large amounts of data, a prominent example being the Apache Spark framework for distributed computing.

Program

1. Big Data Fundamentals
1.1 Concepts and motivation.
1.2 The 5 Vs and data types.
1.3 Architectures and applications.
2. The Hadoop Ecosystem
2.1 HDFS and distributed storage.
2.2 MapReduce: principles and examples.
2.3 Ecosystem components.
3. Apache Spark
3.1 Core concepts and advantages.
3.2 RDDs, DataFrames, and transformations.
3.3 Persistence and actions.
4. Large-Scale Data Processing
4.1 Data pipelines.
4.2 Integration with distributed systems.
4.3 Use cases.
5. Machine Learning in Big Data
5.1 MLlib: basic models.
5.2 Distributed evaluation.
5.3 Applied examples.

Curricular Unit Teachers

Ricardo Manuel da Silva Malheiro

Internship(s)

No

Bibliography

Rajaraman, A., & Ullman, J. (2011). Mining of massive datasets. Cambridge University Press.
Ryza, S., et al. (2017). Advanced analytics with Spark: Patterns for learning from data at scale. O’Reilly Media.
Mendelevitch, O., Stella, C., & Eadline, D. (2016). Practical data science with Hadoop and Spark: Designing and building effective analytics at scale. Addison-Wesley.
Deitel, P., & Deitel, H. (2019). Intro to Python for computer science and data science: Learning to program with AI, big data and the cloud. Pearson.
Klosterman, S. (2019). Data science projects with Python: A case study approach to successful data science projects using Python, pandas, and scikit-learn. Packt Publishing.
Triguero, I., & Galar, M. (2023). Large-scale data analytics with Python and Spark: A hands-on guide to implementing machine learning solutions. Cambridge University Press.

Organic Unit(s): Instituto Superior de Contabilidade e Administração
School Year: 2026/2027
Scientific Area: Quantitative Methods and Management Information Systems
Teaching Mode: In-person
ECTS: 5
Cycle: Second
Year: 1st
Code: SEGUNDO
Teaching Language: Portuguese
Semester Type: Normal
Mandatory: Yes