Big Data

Base Knowledge

Databases

Data Structures

SQL

Teaching Methodologies

1. Presentation of Big Data fundamentals

2. Theoretical exposition and analysis of the characteristics of NoSQL databases

3. Theoretical exposition and analysis of the characteristics of NewSQL databases

4. Theoretical exposition of Big Data processing technologies

5. Application examples/exercises

6. Practical assignments

Learning Outcomes

On successful completion of this course unit, the student should be able to:

1. Know and understand the principles and concepts of Big Data storage, processing and analysis.

2. Use NoSQL and NewSQL databases.

3. Evaluate databases using benchmarks.

4. Identify and apply the concepts and techniques of Big Data storage, processing and analysis to solve practical problems.

5. Select and use appropriate tools for the storage, processing and analysis of large volumes of data.

Program

1. Introduction to Big Data

– A brief history

– What is Analytics?

– What is Big Data?

– Characteristics of Big Data

– Domain-Specific Examples of Big Data

– Analytics Flow for Big Data

– Big Data Stack

2. What are NoSQL databases?

– What is wrong with the relational model?

– Big Data

– NoSQL

3. Characteristics of NoSQL databases

– NoSQL Architectures

– Data schemas

– Data Sharing and Sharding

– Consistency

– ACID and BASE models

4. Classification of NoSQL databases

– Key-Value

– Document

– Column

– Graph

5. NewSQL databases

– Main features

– Functionalities

– Differences between SQL, NoSQL and NewSQL

6. Benchmarks for Database Evaluation

– Relevant properties of a NoSQL benchmark

– Benchmarks for Key-value databases

– Benchmarks for Document databases

– Benchmarks for Column databases

– Benchmarks for Graph databases

– Examples of practical assignments

7. Big Data – Introduction to Distributed Processing

– Limitations of traditional systems

– Features

– MapReduce and Hadoop

– Big Data platforms

8. Big Data – Storage

– Architecture

– HDFS – Hadoop Distributed File System

9. Big Data – Processing

– Tools and techniques

– Processing with Hadoop and Spark

– SQL on Hadoop, Hive

– Stream data processing

Curricular Unit Teachers

Internship(s)

No

Bibliography

Recommended Bibliography:

– Slides and lecture notes provided by the course teachers

Optional Bibliography:

– Big Data: Concepts, Warehousing, and Analytics. Santos, M. Y., & Costa, C. FCA, 2019, (ISBN 978-972-722-908-6).

– Big Data Fundamentals: Concepts, Drivers & Techniques. Erl, T., Khattak, W., & Buhler, P. Prentice Hall, (ISBN 978-0-13-429107-9).

– Big Data Science & Analytics: A Hands-On Approach. Bahga, A., & Madisetti, V. VPT, (ISBN 978-0-9960255-3-9).

– Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking. Provost, F., & Fawcett, T. O’Reilly Media, (ISBN 978-1-4493-6132-7).

– Data Analytics with Hadoop: An Introduction for Data Scientists. Bengfort, B., & Kim, J. O’Reilly Media, (ISBN 978-1-4919-1370-3).

– Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale. White, T. O’Reilly Media, (ISBN 978-1-4919-0163-2).

– Modern Big Data Processing with Hadoop: Expert techniques for architecting end-to-end Big Data solutions to get valuable insights. Kumar, V. N., & Shindgikar, P. Packt Publishing, (ISBN 978-1-78712-276-5).

– MongoDB: The Definitive Guide: Powerful and Scalable Data Storage. Chodorow, K. O’Reilly Media, (ISBN 978-1-4493-4468-9).