Big Data

Base Knowledge

Databases

Data Structures

SQL

Teaching Methodologies

1. Presentation of the theoretical foundations underlying Big Data

2. Presentation and analysis of the characteristics of NoSQL databases

3. Presentation and analysis of the characteristics of NewSQL databases

4. Theoretical exposition of Big Data processing technologies

5. Solving exercises and application examples

6. Practical assignments

Learning Outcomes

On successful completion of this curricular unit, the student should be able to:

1. Know and understand the principles and concepts of storage, processing and analysis of Big Data.

2. Use NoSQL and NewSQL databases

3. Evaluate databases using benchmarks

4. Identify and apply Big Data storage, processing and analysis concepts and techniques to real-world problems

5. Select and use appropriate tools for storing, processing and analysing large volumes of data (Big Data)

Program

1. Introduction to Big Data

– A brief history

– What is Analytics?

– What is Big Data?

– Characteristics of Big Data

– Domain-Specific Examples of Big Data

– Analytics Flow for Big Data

– Big Data Stack

 

2. What are NoSQL databases?

– What is wrong with the relational model?

– Big Data

– NoSQL

 

3. Characteristics of NoSQL databases

– NoSQL Architectures

– Data schemas

– Data Sharing and Sharding

– Consistency

– ACID and BASE models

 

4. Classification of NoSQL databases

– Key-Value

– Document

– Column

– Graph

 

5. NewSQL databases

– Main features

– Functionalities

– Differences between SQL, NoSQL and NewSQL

 

6. Benchmarks for Database Evaluation

– Relevant properties of a NoSQL benchmark

– Benchmarks for Key-value databases

– Benchmarks for Document databases

– Benchmarks for Column databases

– Benchmarks for Graph databases

– Examples of practical assignments

 

7. Big Data – Introduction to Distributed Processing

– Limitations of traditional systems

– Features

– MapReduce and Hadoop

– Big Data platforms

 

8. Big Data – Storage

– Architecture

– HDFS – Hadoop Distributed File System

 

9. Big Data – Processing

– Tools and techniques

– Processing with Hadoop and Spark

– SQL on Hadoop, Hive

– Stream data processing

Curricular Unit Teachers

Internship(s)

No

Bibliography

Recommended Bibliography:

– Slides and lecture notes provided by the instructors

 

Optional Bibliography:

– Santos, M. Y., & Costa, C. (2019). Big Data: Concepts, Warehousing, and Analytics. FCA.

– Erl, T., Khattak, W., & Buhler, P. (2013). Big Data Fundamentals: Concepts, Drivers & Techniques. Prentice Hall.

– Bahga, A., & Madisetti, V. (2016). Big Data Science & Analytics: A Hands-On Approach. VPT.

– Provost, F., & Fawcett, T. (2013). Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. O’Reilly Media.

– Bengfort, B., & Kim, J. (2016). Data Analytics with Hadoop: An Introduction for Data Scientists. O’Reilly Media.

– White, T. (2015). Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale. O’Reilly Media.

– Kumar, V. N., & Shindgikar, P. (2018). Modern Big Data Processing with Hadoop: Expert techniques for architecting end-to-end Big Data solutions to get valuable insights. Packt Publishing.

– Chodorow, K. (2013). MongoDB: The Definitive Guide: Powerful and Scalable Data Storage. O’Reilly Media.