Keynote speakers

Keynote speech #1: The next generation of JPEG standards for human and machine consumption

Artificial Intelligence (AI) has emerged as a key technology in a wide range of applications, including those in visual information processing.
Computer vision based on AI has shown outstanding results in pattern recognition and content understanding, and AI-based image processing has demonstrated unparalleled performance in cognitive enhancement of visual content, as witnessed by advances in super-resolution and denoising that take advantage of deep learning techniques. More recent efforts in AI-based compression have shown performance superior to the state of the art in image coding.
Based on the above observations, the JPEG standardization committee has started new projects on visual information coding for next-generation applications, where visual content should be efficiently represented for both human and machine consumption. The first project in this direction, which is near completion, is referred to as JPEG AI. The scope of JPEG AI is defined as a learning-based image coding standard that offers a single-stream, compact, compressed-domain representation, targeting both human visualization, with significant compression efficiency improvement over current image coding standards, and effective performance on image processing and computer vision tasks. Other similar projects have been launched or are under investigation, such as JPEG Pleno learning-based point cloud compression and JPEG XE, towards standardization of event-based imaging.
In this talk, we will provide an overview of JPEG AI, its current status, and its performance compared to the state of the art. We will then discuss the roadmap and status of other JPEG standardization efforts motivated by recent progress in AI-centric imaging.
Keynote speech #2: Integration of contextual knowledge about urban environments in deep learning models to enhance their performance in visual tasks

More and more researchers are focusing on the integration of contextual knowledge in the development of machine vision systems, particularly with model-based approaches. In this talk, we will offer some insights into how contextual (e.g., geometrical, semantic) knowledge can be integrated into the learning process of deep models, with the aim of enhancing their performance in visual tasks, such as depth estimation from monocular images or panoptic segmentation, in urban road environments. For the former task (depth estimation), we propose to assist the learning of the deep model by introducing monocular clues into the model input. These clues are extracted from a semantic segmentation map using ontological reasoning. For the latter task (panoptic segmentation), we propose to introduce knowledge about spatial relationships among the objects of the acquired scene into the loss function of the model. The analysis of experimental results and their comparison with the state of the art are based on several public datasets of urban road scenes.
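To make the loss-function idea concrete, here is a minimal sketch of how knowledge about spatial relationships between scene classes might be turned into an extra loss term. All names and the exact penalty form are assumptions for illustration, not the speaker's actual formulation: we encode a prior such as "sky appears above road" in urban scenes and penalize predictions that violate it.

```python
import numpy as np

def base_loss(pred, target):
    # Ordinary per-pixel cross-entropy over class probabilities.
    # pred: (N, C) probabilities; target: (N,) class ids.
    eps = 1e-9
    return -np.mean(np.log(pred[np.arange(len(target)), target] + eps))

def spatial_relation_penalty(pred_map, above_cls, below_cls):
    # pred_map: (H, W) array of predicted class ids.
    # Crude encoding of "above_cls lies above below_cls": in each image
    # column, any above_cls pixel at or below the topmost below_cls pixel
    # counts as a violation of the spatial prior.
    H, W = pred_map.shape
    violations = 0
    for col in range(W):
        rows_below = np.where(pred_map[:, col] == below_cls)[0]
        if rows_below.size == 0:
            continue
        top_below = rows_below.min()
        violations += int(np.sum(pred_map[top_below:, col] == above_cls))
    return violations / (H * W)  # normalized violation rate

def total_loss(pred, target, pred_map, weight=0.1):
    # Combined objective: data term plus knowledge-based spatial term,
    # here for the prior "class 0 (sky) is above class 1 (road)".
    return base_loss(pred, target) + weight * spatial_relation_penalty(
        pred_map, above_cls=0, below_cls=1)
```

A consistent map (sky rows above road rows) incurs zero penalty, while a map with sky pixels below the road does not; in practice such a term would be made differentiable (e.g., over soft class probabilities) rather than computed on hard labels.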