Keynote speakers

Keynote speech #1: The next generation of JPEG standards for human and machine consumption

Artificial Intelligence (AI) has emerged as a key technology in a wide range of applications, including those in visual information processing.
Computer vision based on AI has shown outstanding results in pattern recognition and content understanding, and AI-based image processing has demonstrated unparalleled performance in cognitive enhancement of visual content, as witnessed by advances in super-resolution and denoising operations that take advantage of deep learning techniques. More recent efforts in AI-based compression have shown superior performance when compared to the state of the art in image coding.
Based on the above observations, the JPEG standardization committee has started new projects on visual information coding for next-generation applications, where visual content should be efficiently represented for both human and machine consumption. The first project in this direction, which is near completion, is referred to as JPEG AI. The scope of JPEG AI is defined as a learning-based image coding standard that offers a single-stream, compact, compressed-domain representation, targeting both human visualization, with significant compression efficiency improvement over current image coding standards, and effective performance on image processing and computer vision tasks. Other similar projects have been launched or are under investigation, such as JPEG Pleno learning-based point cloud compression and JPEG XE, towards standardization of event-based imaging.
In this talk, we will provide an overview of JPEG AI, its current status, and its performance compared to the state of the art. We will then discuss the roadmap and status of other JPEG standardization efforts motivated by recent progress in AI-centric imaging.
Touradj Ebrahimi is professor of image processing at Ecole Polytechnique Fédérale de Lausanne (EPFL), where he is active in teaching and research in multimedia signal processing and heads the Multimedia Signal Processing Group. Since 2014, he has been the Convenor (Chairman) of the JPEG standardization committee, which has produced a family of standards that have revolutionised the world of imaging. He heads the Swiss delegation to JTC 1 (in charge of standardization of information technology in ISO and IEC) and SC 29 (the body overseeing MPEG and JPEG standardization), and is a member of the ITU representing EPFL. He was previously the chairman of the SC 29 Advisory Group on Management until 2014. Prof. Ebrahimi has been involved in Ecma International as a member of its ExeCom since 2020; he serves as a consultant, evaluator, and expert for the European Commission and other governmental funding agencies in Europe, and advises a number of venture capital companies in Switzerland in their scientific and technical audits. He has founded several startup and spin-off companies over the past two decades, most recently RayShaper SA, a research company based in Crans-Montana working on AI-powered multimedia. Prof. Ebrahimi is the author or co-author of over 300 scientific publications and holds a dozen patents. His areas of interest include image and video compression, media security, quality of experience in multimedia, and AI-based image and video processing and analysis. He is a Fellow of the IEEE, SPIE, and EURASIP, and has been the recipient of several awards and distinctions, including a Prime Time Emmy Award for JPEG, the IEEE Star Innovator Award in Multimedia, and the SMPTE Progress Medal.
Keynote speech #2: Integration of contextual knowledge about urban environments in deep learning models to enhance their performance in visual tasks

More and more researchers focus on integrating contextual knowledge into machine vision systems, particularly in model-based approaches. In this talk, we will offer some insights into how contextual (e.g., geometrical, semantic) knowledge can be integrated into the learning process of deep models, with the aim of enhancing their performance in visual tasks, such as depth estimation from monocular images or panoptic segmentation, in urban road environments. For the former task (depth estimation), we propose to assist the learning of the deep model by introducing monocular clues into the model input. These clues are extracted from a semantic segmentation map using ontological reasoning. For the latter task (panoptic segmentation), we propose to introduce knowledge about the spatial relationships of the objects in the acquired scene into the loss function of the model. The analysis of the experimental results and their comparison with the state of the art are based on several public datasets of urban road scenes.
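As a rough illustration of the first idea, one common way to feed semantic clues into a depth network's input is to concatenate one-hot semantic channels onto the RGB tensor. The abstract does not specify the mechanism or architecture, so the function below is a hypothetical minimal sketch of this kind of input augmentation, not the speaker's actual method:

```python
import numpy as np

def add_semantic_clues(rgb, seg_map, num_classes):
    """Stack one-hot semantic channels onto an RGB image.

    rgb:       (H, W, 3) float array in [0, 1]
    seg_map:   (H, W) integer class labels from a semantic segmenter
    Returns a (H, W, 3 + num_classes) array that a monocular depth
    network could consume in place of the plain RGB input.
    """
    # Integer-array indexing into an identity matrix yields one-hot maps.
    one_hot = np.eye(num_classes, dtype=rgb.dtype)[seg_map]  # (H, W, C)
    return np.concatenate([rgb, one_hot], axis=-1)

# Toy example: a 4x4 image with 3 semantic classes (e.g. road, vehicle, sky)
rgb = np.random.rand(4, 4, 3)
seg = np.random.randint(0, 3, size=(4, 4))
x = add_semantic_clues(rgb, seg, num_classes=3)
print(x.shape)  # (4, 4, 6)
```

The same pattern extends to the second idea in spirit: extra knowledge enters the pipeline either at the input (as here) or as an additional term in the training loss.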
Yassine Ruichek received the Ph.D. degree in control and computer engineering and the Habilitation à Diriger des Recherches degree in physical science from the University of Lille, France, in 1997 and 2005, respectively. Since 2007, he has been a Full Professor at the University of Technology of Belfort-Montbéliard (UTBM). His research interests concern multisensor data-based perception and localization, including computer vision, pattern recognition and classification, machine learning, and data fusion, with applications to intelligent transportation systems and video surveillance.