Big Data

Base Knowledge

Knowledge of databases and programming is recommended. 

Teaching Methodologies

Classes take place in the classroom and combine the presentation of concepts, techniques and methods with a strong focus on practical applications. Software is used to support problem solving within the scope of the program.

Learning Outcomes

Data generation and storage have grown remarkably, requiring increasingly complex and comprehensive information management and processing. This trend can be observed in a wide variety of areas, from data collected through loyalty cards, to data available on social networks, to data generated automatically by clinical analysis devices, among others.

All this information opens up a vast number of opportunities, particularly for companies and for knowledge in general.

The Big Data curricular unit aims to provide students with knowledge in the field of big data analysis, namely the collection, processing and provision of massive data, and to make students aware of the whole ecosystem associated with this type of data.

In terms of skills, students are expected to be able to:

  • Enumerate the concepts associated with data processing;
  • Use some of the most important technologies in this domain, such as Hadoop or PySpark.

Program

  1. Big Data Fundamentals
  2. Big Data Storage
    1. Relational Databases
    2. NoSQL Databases
  3. Data Processing in Big Data
    1. Hadoop MapReduce
    2. Apache Spark
    3. PySpark
  4. Data analysis techniques in Big Data

Curricular Unit Teachers

Internship(s)

No

Bibliography

– Doug Laney, “3D Data Management: Controlling Data Volume, Velocity, and Variety”, Gartner, February 2001.

– Stonebraker et al., “MapReduce and Parallel DBMSs: Friends or Foes?”, Communications of the ACM, January 2010.

– Dean and Ghemawat, “MapReduce: A Flexible Data Processing Tool”, Communications of the ACM, January 2010.

– Rick Cattell, “Scalable SQL and NoSQL Data Stores”, SIGMOD Record, December 2010 (39:4).

– Ahuja, R.K., Magnanti, T.L., Orlin, J.B., 1993. Network flows: Theory, algorithms and applications, Prentice-Hall Inc, New Jersey, USA.

– Hair, J.F., Tatham, R.L., Anderson, R.E., Black, W., 1998. Multivariate data analysis, Prentice-Hall Inc, New Jersey, USA.

– Martins, P., Ladrón, A., Ramalhinho, H., 2014. Maximum cut-clique problem: ILS heuristics and a data analysis application, International Transactions in Operational Research 22(5), 775-809 (DOI: 10.1111/itor.12120).

– Teaching support materials prepared by the course instructor.