Teaching Methodologies
As determined in the curricular plan, classes include both theory and practice. In the theory part of the lesson, the expository method will be used predominantly to introduce concepts, fundamental results and methodologies. The practice sessions will exemplify procedures and problem solving under the guidance of the teacher, while encouraging autonomous work, individually or in small groups, with the support of a computational tool (predominantly Python). Theory and practice will be closely integrated.
The student will be graded on a project carried out within the course and on a written exam. Both the written exam and the project are mandatory. The final grade is 50% of the project grade plus 50% of the written exam grade.
Learning Results
In this curricular unit, the student is expected to know the techniques, with a special focus on statistical ones, aimed at understanding and preparing data in Data Science tasks, complementing and structuring those already covered in other curricular units. The student must also be able to select the most suitable techniques and apply them to a data set in a structured way, preferably implemented in Python, following the CRISP-DM methodology (Cross-Industry Standard Process for Data Mining).
Additionally, the student is expected to know and carry out simple resampling and simulation techniques and to identify concrete situations in which they are appropriate.
Program
1. Data understanding
1.1. Stages and tasks
1.2. Statistical toolkit
2. Statistical perspective on data preparation
2.1. Stages and tasks
2.2. Statistical tools: missing values treatment; outlier treatment; discretization, normalization and other transformations; techniques to eliminate redundancy; techniques for dimensionality reduction.
3. Topics of Computational Statistics
3.1. Resampling
3.2. Simulation
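As an illustration of the kind of technique covered under topic 3.1, the following is a minimal sketch of a percentile bootstrap confidence interval for a mean, using only the Python standard library. The function name `bootstrap_ci`, its parameters, and the sample values are assumptions made for this example; they are not taken from the course materials.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` of `data` (illustrative helper)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the estimates.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up sample for illustration only (not course data).
sample = [2.1, 3.4, 2.9, 5.6, 4.2, 3.8, 2.5, 4.9, 3.1, 4.4]
low, high = bootstrap_ci(sample)
print(f"mean = {statistics.mean(sample):.2f}, 95% CI ~ ({low:.2f}, {high:.2f})")
```

The percentile method is only one of several bootstrap interval constructions. It is used here because it is the simplest to state.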
Internship(s)
No
Bibliography
Bruce, P., Bruce, A., Gedeck, P. (2020). Practical Statistics for Data Scientists, 2nd Edition. O’Reilly Media.
Ciaburro, G. (2020). Hands-On Simulation Modeling with Python. Packt Publishing.
García, S., Luengo, J., Herrera, F. (2014). Data Preprocessing in Data Mining. Springer.
Hair, J.F., Black, W.C., Babin, B.J., Anderson, R.E. (2019). Multivariate Data Analysis, 8th Edition. Cengage.
Kuhn, M., Johnson, K. (2020). Feature Engineering and Selection – A Practical Approach for Predictive Models. CRC Press.
Moreira, J., Carvalho, A., Horvath, T. (2018). A General Introduction to Data Analytics. Wiley.