Computer Vision and Multimedia

Base Knowledge

C Language and Matlab; Mathematical Analysis, Linear Algebra.

Teaching Methodologies

Motivation and presentation of the topics in theoretical classes, including small practical examples.
Detailed exercises based on real applications.
Laboratory work carried out in groups of two students in practical laboratory classes,
including a project in the vision module.
Each group submits one report for the laboratory work and one for the project, followed by an individual discussion.
Attendance at laboratory classes is mandatory (maximum of 2 absences).
The ECTS organization provides 97 semester hours for the student’s autonomous work.

Practical assignments in the vision laboratory classes:
– Image acquisition and processing in Matlab.
– Filtering in the Spatial Domain.
– Feature detection and segmentation.
– Applications using the Sherlock tool and/or Matlab/C/Python/OpenCV.
– Applications with Deep Learning using Matlab/Python/OpenCV tools.

1 Applied Vision Project (using the tools Sherlock/Matlab/OpenCV/DLib).
Example: application to industrial quality control.

3 Practical assignments in multimedia laboratory classes:
– Sampling, Quantization, PCM, DPCM and spatial filtering.
– Transform coding (DCT).
– Image and video coding standards and representation (JPEG, JPEG 2000, MPEG, H.264, HEVC).
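
As an illustration of the first multimedia assignment (quantization and DPCM), the following is a minimal Python sketch. It is not taken from the course materials: the function names (quantize, dpcm_encode, dpcm_decode), the previous-sample predictor and the step size are assumptions chosen for the example.

```python
import math


def quantize(value, step):
    """Uniform quantizer: map value to the nearest multiple of step."""
    return step * math.floor(value / step + 0.5)


def dpcm_encode(samples, step):
    """Encode samples as quantized differences from the previous reconstruction."""
    prediction = 0  # assumed initial predictor state, shared with the decoder
    residuals = []
    for sample in samples:
        q = quantize(sample - prediction, step)
        residuals.append(q)
        # Update with the reconstructed value so the encoder tracks the decoder.
        prediction += q
    return residuals


def dpcm_decode(residuals):
    """Rebuild the signal by accumulating the quantized differences."""
    value = 0
    reconstructed = []
    for q in residuals:
        value += q
        reconstructed.append(value)
    return reconstructed


samples = [10, 12, 15, 15, 11]
residuals = dpcm_encode(samples, step=2)   # [10, 2, 4, 0, -4]
decoded = dpcm_decode(residuals)           # [10, 12, 16, 16, 12]
```

Because the predictor is updated with the reconstructed value rather than the original sample, the quantization error does not accumulate: in this example each decoded sample differs from the original by at most half the step size.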

For students covered by the Student Worker Statute, adjustments to the functioning of
components with mandatory attendance and distributed evaluation may be agreed
with the student.

Learning Outcomes

To recognize the importance of vision in current and future engineering problems, in areas such as industry, robotics,
safety, agriculture, the environment, medicine, sports and autonomous vehicles;
To understand the principles of the formation, acquisition and representation of images;
To understand and apply the most representative spatial image processing techniques, including spatial filtering and simple segmentation;
To understand basic techniques for color and texture representation;
To propose and develop industrial vision applications using specific development software;
To understand the digital representation of audio and video signals;
To understand the basic principles of information theory;
To know the main techniques and standards for compression, coding, storage and transmission of image, audio and video signals.

Learning outcomes and generic competences:
Ability to use image acquisition and processing devices to develop industrial vision solutions using dedicated software;
Ability to participate in the development and installation of multimedia systems involving networked audio and video equipment,
such as video surveillance applications;
Ability to identify, propose and implement vision solutions in engineering problems;
Motivation to investigate vision applications within the scope of Industry 4.0 and using Deep Learning.

Program

Part I – Computer Vision
Motivation for computer vision with a focus on industrial applications;
Objectives of computer vision;
The Human Visual System;
Sensors and imaging;
Fundamentals of digital imaging;
Binary image analysis;
Processing of images in the spatial domain;
Gray Level Transformations;
Histogram-based processing;
Processing using logical operators;
Spatial filtering – smoothing and sharpening;
Elements of digital morphology: dilation, erosion, opening and closing;
Notions of color and texture;
Introduction to segmentation;
Application examples;
Image processing using Deep Learning.
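
As an illustration of the digital morphology topic listed above, the following is a minimal Python sketch of dilation, erosion, opening and closing on a binary image. It is not taken from the course materials: the function names, the fixed 3x3 square structuring element and the choice to treat pixels outside the image as background (0) are assumptions made for the example.

```python
def _neighbourhood(image, r, c):
    """Yield the 3x3 neighbourhood of (r, c); pixels outside the image count as 0."""
    rows, cols = len(image), len(image[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            yield image[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0


def dilate(image):
    """A pixel becomes 1 if any pixel in its 3x3 neighbourhood is 1."""
    return [[int(any(_neighbourhood(image, r, c))) for c in range(len(image[0]))]
            for r in range(len(image))]


def erode(image):
    """A pixel stays 1 only if every pixel in its 3x3 neighbourhood is 1."""
    return [[int(all(_neighbourhood(image, r, c))) for c in range(len(image[0]))]
            for r in range(len(image))]


def opening(image):
    """Erosion followed by dilation: removes objects smaller than the element."""
    return dilate(erode(image))


def closing(image):
    """Dilation followed by erosion: fills holes smaller than the element."""
    return erode(dilate(image))


block = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

# The same block with an isolated noise pixel in the top-right corner.
noisy = [row[:] for row in block]
noisy[0][4] = 1

# The same block with a one-pixel hole in its centre.
holed = [row[:] for row in block]
holed[2][2] = 0

opened = opening(noisy)   # noise pixel removed, block preserved
closed = closing(holed)   # hole filled, block preserved
```

In this example both results equal the clean 3x3 block. Treating out-of-image pixels as background means that objects touching the image border are eroded there, so a different border convention may be preferable in practice.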

Part II – Multimedia
Multimedia concept;
Digital representation of audio and video signals;
Principles of information theory;
Image compression techniques;
Principles of audio compression;
Principles of video compression;
Main techniques and standards for representation, compression, coding, storage and transmission of multimedia signals;
Application examples.

Curricular Unit Teachers

Internship(s)

No

Bibliography

Recommended (available in the library or academic platforms):

– Shapiro, L. G., & Stockman, G. C. (2001). Computer vision. Upper Saddle River, NJ: Prentice Hall. ISBN 0-13-030796-3. Ref. ISEC:1A-11-41
– Davies, E. R. (1990). Machine vision: Theory, algorithms, practicalities. London: Academic Press. ISBN 0-12-206090-3. Ref. ISEC:1A-7-11
– Parker, J. R. (1997). Algorithms for image processing and computer vision. New York [etc.]: John Wiley & Sons, Inc. ISBN 0-471-14056-2. Ref. ISEC:1A-11-19
– Watkinson, J. (2004). The MPEG handbook: MPEG-1, MPEG-2, MPEG-4 (2nd ed.). Amsterdam [etc.]: Elsevier/Focal Press. ISBN 0-240-80578-X. Ref. ISEC:1A-12-51
– Lopes, F. (2018). Computer Vision and Multimedia. Slides to support lectures.
– Support texts and laboratory papers by teachers – updated every year.
– Manuals of the used software tools (Matlab, Sherlock, Python, etc.) – updated every year.
– Manuals and tools of the used hardware (cameras, monitors, etc.) – updated every year.

Complementary:

– Jayant, N. S. & Noll, P. (1984). Digital Coding of Waveforms. Prentice-Hall Signal Processing Series. ISBN 978-0132119139
– Ghanbari, M. (1999). Video Coding: An Introduction to Standard Codecs. IEE Telecommunications Series. ISBN 978-0852967621
– Haskell, B. G., Puri, A., & Netravali, A. N. Digital video: An introduction to MPEG-2. Ref. ISEC:1A-12-49
– Miano, J. (2000). Compressed image file formats: JPEG, PNG, GIF, XBM, BMP. Reading, MA [etc.]: Addison Wesley. ISBN 0-201-60443-4. Ref. ISEC:1B-1-20
– Ayache, N. (1991). Artificial vision for mobile robots: Stereo vision and multisensory perception. Cambridge, MA [etc.]: The MIT Press. ISBN 0-262-01124-7
– ImageProcessingPlace. http://www.imageprocessingplace.com/
– JPEG. https://jpeg.org/
– MPEG. https://mpeg.chiariglione.org/
– b-on: Online Knowledge Library
– https://blogs.mathworks.com/deep-learning/2022/01/03/deep-learning-for-computer-vision-using-python-and-matlab/
– Online platforms to test computer vision applications using deep learning – updated every year.