Data Integration

Base Knowledge

Knowledge of procedural programming.

Teaching Methodologies

The weekly teaching load is as follows:

– 2 theoretical hours, used to present new concepts and methodological approaches to data integration. Each concept is illustrated with simple examples;

– 2 practical hours, in which students apply the concepts learned to solve specific problems.

The assessment is divided into 2 components:

  • Theoretical (70%): Students may take the theoretical assessment either continuously, through two tests during the semester, or through a single exam at the end of the semester.
  • Practical (30%): Students carry out practical work throughout the semester.

To pass the curricular unit, students must obtain a minimum of 35% in each assessment component (theoretical and practical).

Learning Outcomes

Goals

To provide students with the ability to analyze and solve data integration problems at different levels of an organization. The unit focuses on different types and strategies of data integration and explores the use of tools and/or programming solutions for data integration.

Skills

Knowledge and understanding:
• Explain the main reasons for integrating data
• Identify the main types of data integration
• Identify and explain data integration technologies and strategies
• Explain the steps involved in the data integration process.

Knowledge application
• Acquisition and understanding of the essential concepts for the integration of data and systems.
• Acquisition and application of knowledge about communication between components and applications built with different languages and/or running on heterogeneous systems.
• Ability to use markup languages to query heterogeneous data sources.
• Ability to design, implement, integrate and maintain distributed applications and systems made up of heterogeneous sources.

Justification in decision-making
• Ability to make well-founded choices among integration technologies.

Making judgments
• Critical evaluation of systems integration solutions.

Communication
• Prepare clear documentation for the practical work, identifying and justifying the main decisions taken.

Autonomy and self-learning
• Ability to perform autonomous and group work.
• Development of autonomy in learning.
• Ability to address new systems integration problems and to build solutions that support that integration.

Program

1. Introduction
1.1. Motivation and context
1.2. Definition of Data Integration
1.3. Data Integration Difficulties
1.4. Data Integration Architectures
2. Data Extraction
2.1. Extracting Web Data
2.2. Extracting data from text
2.3. Regular Expressions
2.4. String matching algorithms
3. Formats for storing and integrating data
3.1. JSON
3.2. XML
4. JSON Data Models
4.1. Features
4.2. Advantages / Disadvantages
4.3. Examples
5. XML data models
5.1. XML Language
5.2. Validation of XML documents
5.3. Query and update languages: XPath, XSLT, XQuery, XQuery Update
6. Data Integration Techniques
6.1. Schema Mapping
6.2. Mediators and wrappers
6.3. Query processing
6.4. Matching Strings
7. Service-based integration
7.1. Web services
7.2. Simple Object Access Protocol (SOAP)
7.3. Web Services Description Language (WSDL)

Curricular Unit Teachers

Anabela Borges Simões

Internship(s)

No

Bibliography

Doan, A. H., Halevy, A., & Ives, Z. G. (2012). Principles of data integration. Waltham, MA: Morgan Kaufmann.
ISEC Library: 1A-13-49 (ISEC) – 16815

Martins, V. (2006). Integração de Sistemas de Informação – Perspectivas, Normas e Abordagens. Lisboa: Edições Sílabo.
ISEC Library: 1A-13-42 (ISEC) – 16048

Newcomer, E. (2006). Understanding Web Services. Boston, MA: Independent Technology Guides.
ISEC Library: 1A-12-165 (ISEC) – 16107

Linthicum, D. S. (2003). Next Generation Application Integration: From Simple Information to Web Services. Boston, MA: Addison-Wesley.
ISEC Library: 1A-13-37 (ISEC) – 15975