ICIP 2010, The Hong Kong Convention and Exhibition Centre, Hong Kong
 

Tutorials

 
 Sunday, September 26, 09:30 - 12:30

 

T-1: Advanced Image Processing Based on Spatially Adaptive Nonlocal Image Filtering and Regularization
Presenters: Vladimir Katkovnik, Alessandro Foi, Karen Egiazarian.
Location: S423

 

T-2: Poisson Imaging Theory for Integrating Detectors: Algorithms and Applications
Presenters: Keigo Hirakawa and Patrick J. Wolfe
Location: S424

T-3: Perceptual Quality Measurement for Video
Presenter: Stefan Winkler
Location: S426

 

T-4: Video Processing Techniques for 3-D Television
Presenter: Yo-sung Ho
Location: S427

T-5: Video Tracking: overview, applications and recent developments
Presenters: Andrea Cavallaro and Emilio Maggio
Location: S428
 
 Sunday, September 26, 13:30 - 16:30

T-6: Photometric Methods for 3-D Modeling
Presenters: Yasuyuki Matsushita, Bennett Wilburn, and Moshe Ben-Ezra
Location: S423

T-7: To Tell a Good Picture from a Good One: Perceptual Visual Quality Evaluation
Presenter: Weisi Lin
Location: S424

T-8: Human Behavior Analysis
Presenters: Nicu Sebe and Nicola Conci
Location: S426

T-9: Image Denoising and the SURE-LET Methodology
Presenters: Thierry Blu and Florian Luisier
Location: S427

T-10: Low-rank Matrix Recovery: From Theory to Imaging Applications
Presenters: Yi Ma, Zhouchen Lin, John Wright
Location: S428

Tutorial 1: Advanced Image Processing Based on Spatially Adaptive Nonlocal Image Filtering and Regularization

Presenters
Vladimir Katkovnik, Alessandro Foi, Karen Egiazarian.
Department of Signal Processing, Tampere University of Technology,
Tampere, Finland.

Abstract
Nonlocal imaging techniques look for blocks (patches, fragments) that are similar to each other and process them jointly. Techniques of this sort appeared independently and in parallel in a number of different developments, in particular as the block matching proposed for video processing and as nonlocal means in nonparametric regression modeling. The last few years have witnessed an intensive flow of publications with ideas and techniques based on various nonlocal approximations. Some of these nonlocal developments report very good, and sometimes even extraordinarily good, performance. While a theory for these methods is far from being fully developed, the source of their advanced performance is clear: it originates from the fact that real-life images are characterized by the mutual similarity of their fragments.

Nonlocal imaging techniques appear in forms and modifications that can be so diverse that it is sometimes difficult even to recognize that they belong to the same class of algorithms.

One of the motivations of this tutorial is to provide a proper classification of nonlocal techniques that can serve as a guideline for navigating this intensively developing area. We analyze the evolution of nonlocal modeling in imaging, from the local Nadaraya-Watson kernel estimate to nonlocal means, and further to transform-domain filtering based on nonlocal block matching.

Another motivation is to demonstrate the efficiency and the state-of-the-art performance achieved by the recent nonlocal algorithms.

The considered methods are classified mainly according to two features: local/nonlocal and pointwise/multipoint. These alternatives, though obvious simplifications, allow us to impose a fruitful and transparent classification of the basic ideas in the advanced techniques. Here nonlocal is an alternative to local, and multipoint is an alternative to pointwise. In the multipoint case the data are typically processed in overlapping subsets, i.e., windows, blocks, or generic neighborhoods, and multiple estimates are obtained for each individual point. The final estimate is obtained by aggregating (fusing) the multiple multipoint estimates. It is found that this sort of redundant approximation, with multiple estimates for each pixel, dramatically improves the accuracy of estimation.
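
To make the nonlocal pointwise idea concrete, here is a minimal NL-means sketch in Python/NumPy (a didactic, deliberately slow illustration, not any specific published algorithm; the patch size, search radius, and filtering parameter h are illustrative choices):

    import numpy as np

    def nl_means_pixel(img, i, j, patch=3, search=10, h=0.1):
        """Estimate pixel (i, j) of a grayscale float image as a weighted
        mean of pixels whose surrounding patches resemble the patch
        around (i, j); weights decay with the patch distance."""
        r = patch // 2
        pad = np.pad(img, r + search, mode='reflect')
        ci, cj = i + r + search, j + r + search          # centre in padded image
        ref = pad[ci - r:ci + r + 1, cj - r:cj + r + 1]  # reference patch
        num, den = 0.0, 0.0
        for di in range(-search, search + 1):
            for dj in range(-search, search + 1):
                cand = pad[ci + di - r:ci + di + r + 1,
                           cj + dj - r:cj + dj + r + 1]
                d2 = np.mean((ref - cand) ** 2)          # patch dissimilarity
                w = np.exp(-d2 / (h * h))                # nonlocal weight
                num += w * pad[ci + di, cj + dj]
                den += w
        return num / den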

Within this framework, we discuss different forms of efficient redundant modeling as well as an original single- and multiple-model transform domain nonlocal collaborative filtering approach.
The tutorial is accompanied by numerous examples where these methods are applied to competitive image processing problems.

Applications include Gaussian and non-Gaussian denoising, deblurring (deconvolution), deblocking and deringing, inverse-halftoning, color image processing, etc.

Matlab software, which implements the presented techniques and experiments, is publicly available on the website http://www.cs.tut.fi/~foi.

Description of the tutorial and material to be covered:

1. Local pointwise modeling: pointwise weighted means, pointwise polynomial (non-polynomial) approximations;
2. Adaptivity of pointwise polynomial estimates: varying spatially adaptive optimal window size (scale) estimates, Lepski's approach for the window size selection, intersection of confidence intervals (ICI) rule;
3. Local polynomial approximation (LPA) with anisotropic adaptive supports;
4. Signal-dependent windows/weights;
5. Variational formulations of local pointwise modeling;
6. Local multipoint modeling: overcomplete transforms, model selection rules, order adaptive models, aggregation of multipoint estimates;
7. Shape-adaptive transform domain filtering;
8. Nonlocal pointwise modeling: weights defined by pointwise and neighborhood-wise differences, nonlocal (NL)-means;
9. Recursive reweighting;
10. Nonlocal pointwise higher-order models;
11. Variational formulations of nonlocal models;
12. Nonlocal multipoint modeling: single-model approach;
13. Nonlocal multipoint modeling: multiple-model approach, collaborative filtering, block-matching 3D transform (BM3D) algorithms;
14. Non-Gaussian image processing based on optimized variance-stabilizing transformations with exact unbiased inverse;
15. Applications: denoising, deblurring, super-resolution imaging, compressive sensing, still and video imaging, etc.


Speakers' Biographies

Prof. Vladimir Katkovnik received the M.Sc., Ph.D., and D.Sc. degrees in technical cybernetics from the Leningrad Polytechnic Institute, Leningrad, Russia, in 1960, 1964, and 1974, respectively. From 1964 to 1991, he held the positions of Associate Professor and Professor at the Department of Mechanics and Control Processes, Leningrad Polytechnic Institute. From 1991 to 1999, he was a Professor in the Department of Statistics of the University of South Africa, Pretoria. From 2001 to 2003, he was a Professor in the Mechatronics Department of Kwangju Institute of Science and Technology, South Korea. From 2000 to 2001, and again since 2003, he has been a Research Professor with the Department of Signal Processing, Tampere University of Technology (DSP/TUT). His research interests include stochastic signal processing, linear and nonlinear filtering, nonparametric estimation, imaging, nonstationary systems, and time-frequency analysis.

Dr. Alessandro Foi received the M.Sc. degree in Mathematics from the Università degli Studi di Milano, Italy, in 2001, the Ph.D. degree in Mathematics from the Politecnico di Milano in 2005, and the D.Sc.Tech. degree in Signal Processing from Tampere University of Technology, Finland, in 2007. His research interests include mathematical and statistical methods for signal processing, functional analysis, and harmonic analysis. Currently, he is a senior researcher at the Department of Signal Processing, Tampere University of Technology. His work focuses on spatially adaptive algorithms for denoising and deblurring of digital images and on noise modeling for digital imaging sensors.

Prof. Karen Egiazarian received the M.Sc. degree in Mathematics from Yerevan State University, Armenia, in 1981, the Ph.D. degree in physics and mathematics from Moscow State University, Moscow, Russia, in 1986, and the D.Tech. degree from Tampere University of Technology, Finland, in 1994. He has been Senior Researcher with the Department of Digital Signal Processing, Institute of Information Problems and Automation, National Academy of Sciences of Armenia. Since 1996, he has been an Assistant Professor with the DSP/TUT, where he is currently a Professor, leading the Transforms and Spectral Methods group. His research interests are in the areas of applied
mathematics, signal/image processing, and digital logic.

 

Tutorial 2: Poisson Imaging Theory for Integrating Detectors: Algorithms and Applications

Presenters
Keigo Hirakawa, University of Dayton
Patrick J. Wolfe, Harvard University

Abstract
The ubiquity of digital imaging in consumer, scientific, and medical applications places ever greater demands on engineers to understand how photons are accumulated and counted by pixel sensors as integrating detectors. In this tutorial, we address the key points of Poisson imaging theory as a means of understanding the dynamics of image data acquisition and processing and mitigating the effects of shrinking device footprints, increasing dynamic range requirements, and heightened image quality expectations. This topic is of broad interest yet sufficiently focused to provide a clear, structured framework for the tutorial setting. Its aim is to educate students, researchers, and industry practitioners in relevant theory, algorithms, and applications for working with Poisson data. Participants will gain an understanding of practical implementation trade-offs, along with a set of technical tools and frameworks for thinking holistically about integrating detectors and inference for Poisson processes in imaging.

Description of the tutorial and material to be covered:

I. Introduction: Photon Counts, Poisson Processes and Integrating Detectors
    A. Integrating detectors and relevant concepts from image sensing
    B. Basics of Poisson processes and statistical inference
    C. Sources of Poisson variability, missingness, and censoring in the data acquisition process
II. Algorithms for Sensing and Denoising Poisson Intensities
    A. Image resolution/distortion trade-offs in sensing
    B. Estimation with and without variance stabilization (see the sketch after this outline)
    C. Wavelet-based Poisson data processing
    D. High dynamic range (HDR) image acquisition
III. Applications and In-Depth Worked Examples
    A. Astronomy - Single-photon detection
    B. Medical Imaging - The photon paucity problem
    C. Consumer Color Cameras - Joint denoising/demosaicking
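
As a small illustration of item II.B, the following Python/NumPy sketch applies the classical Anscombe variance-stabilizing transform, a standard textbook tool (not necessarily the presenters' exact algorithms) that maps Poisson counts to approximately unit-variance data so that Gaussian denoisers can be applied:

    import numpy as np

    def anscombe(x):
        """Anscombe transform: Poisson counts -> roughly unit variance."""
        return 2.0 * np.sqrt(x + 3.0 / 8.0)

    def inverse_anscombe(y):
        """Asymptotically unbiased algebraic inverse; an exact unbiased
        inverse further improves accuracy at low counts."""
        return (y / 2.0) ** 2 - 1.0 / 8.0

    # Sanity check: the stabilized standard deviation stays near 1
    lam = np.linspace(1.0, 50.0, 50)
    counts = np.random.poisson(np.tile(lam, (10000, 1)))
    print(np.std(anscombe(counts), axis=0))   # all entries close to 1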

Speaker Biographies

The tutorial will be led by presenters with extensive experience in both the theoretical and practical aspects of statistical image acquisition and processing, and whose research has been funded by companies such as Sony Electronics, Inc., and Texas Instruments. Prof. Keigo Hirakawa, currently of the University of Dayton, has been an ASIC engineer and principal image scientist for the camera division of Hewlett-Packard/Agilent Technologies, and has been involved with camera development efforts at Micron, Sony, Texas Instruments, and Kodak. For two years he led a Harvard-Sony camera pipeline collaboration jointly with Prof. Patrick Wolfe, who heads the Statistics and Information Sciences Laboratory at Harvard. Jointly with Prof. Hirakawa, Prof. Wolfe received the ICIP 2007 DoCoMo paper award for work in color filter array design, and delivered an ICIP 2008 tutorial on the color imaging pipeline for digital cameras.

 

Tutorial 3: Perceptual Quality Measurement for Video

Presenter
Stefan Winkler
Cheetah Technologies - http://www.cheetahtech.com/
1901 Charcot Avenue, San Jose, CA 95131, USA

Abstract
The proliferation of digital video content (e.g., IPTV services, Internet video, etc.) has underlined the importance of subjective and objective methods for video quality measurement. At the same time, research on the various metrics, methods, and standards is highly active and constantly evolving, creating an abundance of video quality monitoring solutions that can be confusing to users. Significant progress has been made recently, both in terms of standardized approaches and in our understanding of the evaluation and performance of quality metrics.

This tutorial introduces participants to the fundamentals of visual perception and the algorithmic modeling of human vision. It will discuss the various factors that influence a viewer's perception of video quality. It will also briefly cover the basics of MPEG-2/H.264 video compression, IP transmission, and the most common distortions that result.

The focus of the tutorial will be subjective as well as objective methods for video quality assessment. Here we will cover various methods for subjective tests, how to obtain Mean Opinion Scores (MOS), and related considerations important for conducting such experiments. We will also review a number of popular video quality metrics and discuss the different approaches taken by these algorithms, as well as issues with metric validation.

In the last part, we will discuss international standards related to video quality and ongoing standardization activities in VQEG, ITU, ATIS, VSF, and other groups. We will also review publicly available resources for video quality testing and metrics development. Finally, the tutorial will conclude with an outlook on future trends in this area, in particular 3D video quality.

Course material will include the tutorial slides and a website with links to supporting material such as white papers, public datasets, and software resources.
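
As a small illustration of the metric validation mentioned above, the following Python sketch correlates objective metric scores against MOS using the Pearson and Spearman coefficients commonly reported in VQEG-style evaluations (all numbers below are made up for illustration):

    import numpy as np
    from scipy import stats

    # Hypothetical objective scores and subjective MOS for seven sequences
    metric = np.array([42.1, 38.7, 35.2, 30.9, 27.4, 45.3, 33.0])
    mos    = np.array([ 4.5,  4.1,  3.6,  2.9,  2.3,  4.7,  3.2])

    # Pearson gauges prediction accuracy, Spearman gauges monotonicity
    print("PLCC: ", stats.pearsonr(metric, mos)[0])
    print("SROCC:", stats.spearmanr(metric, mos)[0])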

Outline of the tutorial and material to be covered:

- Visual perception
- Vision modeling
- Quality factors
- Video compression, transmission, and other common distortions
- Subjective methods for video quality assessment
- Objective methods for video quality assessment
- How to validate quality metrics
- Video quality standards (VQEG, ITU, ATIS, VSF, etc.)
- Recent developments and trends (including 3D)

Speaker Biography

Stefan Winkler holds a M.Sc. degree in Electrical Engineering from the University of Technology in Vienna, Austria, and a Ph.D. degree from the École Polytechnique Fédérale de Lausanne (EPFL), Switzerland. He is currently Chief Scientist at Cheetah Technologies. Prior to that, he was Principal Technologist at Symmetricom and Chief Scientist at Genista, which he co-founded in 2001. He has also held assistant professor positions at the National University of Singapore (NUS) and the University of Lausanne, Switzerland.
Dr. Winkler has 15 years of experience in the areas of image/video processing and quality assessment. He has published more than 60 papers and is the author of the book "Digital Video Quality". He serves as an Associate Editor for IEEE Transactions on Image Processing. He has been an active contributor to the Video Quality Experts Group (VQEG) since it was founded in 1997 and is currently co-chair of the QoE Metrics Activity Group of the Video Services Forum (VSF).

 

Tutorial 4: Video Processing Techniques for 3-D Television

Presenter
Yo-Sung Ho, Gwangju Institute of Science and Technology (GIST), Korea

Abstract
In recent years, various multimedia services have become available and the demand for three-dimensional television (3DTV) is growing rapidly. Since 3DTV is considered the next-generation broadcasting service that can deliver real and immersive experiences by supporting user-friendly interactions, a number of advanced 3D video processing technologies have been developed. In this tutorial lecture, we will cover current state-of-the-art video processing techniques for 3DTV, including camera calibration and image rectification, illumination compensation and colour correction, depth map modelling and enhancement, 3D warping and depth map refinement, coding of multi-view video and depth maps, hole filling for occluded objects, and virtual view synthesis.
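
As a minimal illustration of the view-synthesis step, here is a hedged Python/NumPy sketch of forward 3-D warping for rectified cameras, where warping reduces to a horizontal disparity shift d = f * B / Z; the function and parameter names are illustrative, and practical systems add the hole-filling and refinement steps covered in the tutorial:

    import numpy as np

    def synthesize_view(color, depth, f, baseline):
        """Forward-warp a grayscale reference view to a horizontally
        shifted virtual camera. Pixels left at -1 are disocclusion
        holes to be filled by a subsequent hole-filling stage."""
        h, w = depth.shape
        virt = np.full_like(color, -1.0)
        zbuf = np.full((h, w), np.inf)
        disp = f * baseline / np.maximum(depth, 1e-6)   # per-pixel disparity
        for y in range(h):
            for x in range(w):
                xv = int(round(x - disp[y, x]))         # target column
                if 0 <= xv < w and depth[y, x] < zbuf[y, xv]:
                    zbuf[y, xv] = depth[y, x]           # z-buffer keeps nearest
                    virt[y, xv] = color[y, x]
        return virt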

Description of the tutorial and material to be covered:

Trend of Broadcasting Technologies
Review of 3DTV-related Activities
Video Processing Techniques for 3DTV
  Pre-processing of multi-view video
    - Illumination compensation and colour correction
    - Camera calibration and image rectification
  Depth map generation for multi-view video
    - Depth map modelling and enhancement
    - 3D warping and depth map refinement
  Coding of multi-view video and depth map
    - Prediction structure
    - Compression operation
  Intermediate view synthesis
    - Hole filling for occluded objects
    - Virtual view synthesis
Conclusions

Speaker Biography

Dr. Yo-Sung Ho received the B.S. and M.S. degrees in electronic engineering from Seoul National University, Seoul, Korea, in 1981 and 1983, respectively, and the Ph.D. degree in electrical and computer engineering from the University of California, Santa Barbara, in 1990. He joined ETRI (Electronics and Telecommunications Research Institute), Daejeon, Korea, in 1983. From 1990 to 1993, he was with North America Philips Laboratories, Briarcliff Manor, New York, where he was involved in the development of the Advanced Digital High-Definition Television (AD-HDTV) system. In 1993, he rejoined the technical staff of ETRI and was involved in the development of the Korean DBS Digital Television and High-Definition Television systems. Since 1995, he has been with Gwangju Institute of Science and Technology (GIST), where he is currently a Professor in the Department of Information and Communications. Since August 2003, he has also been Director of the Realistic Broadcasting Research Center at GIST. He has given several tutorial lectures at various international conferences, including the IEEE International Conference on Image Processing (ICIP) in 2009 and the Pacific-Rim Conference on Multimedia (PCM) in 2006 and 2008. He is presently serving as an Associate Editor of the IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT). His research interests include digital image and video coding, advanced source coding techniques, three-dimensional image modelling and representation, and three-dimensional television (3DTV).

 

Tutorial 5: Video Tracking: overview, applications and recent developments

Presenters
Andrea Cavallaro, Queen Mary University of London
Emilio Maggio, Vicon

Abstract
This tutorial will cover the fundamental aspects of algorithm and application development for image-based tracking. It sets forth the state of the art in image feature extraction, object detection and tracking algorithms, and their performance evaluation. Moreover, a number of video tracking applications relevant to the ICIP audience will be described, such as surveillance, robotics, smart environments, video editing and human-computer interfaces.

The tutorial will discuss and demonstrate the latest video tracking algorithms with unified and comprehensive coverage. Starting from the general problem definition and a review of existing and emerging video-based tracking applications, we will present popular methods, such as those based on correlation and gradient-descent minimization. Using practical examples and illustrations as support, we will engage the participants in a discussion of the advantages and the limitations of deterministic approaches, and we will guide them toward more efficient and accurate video-based tracking solutions. Recent algorithms based on the recursive Bayesian framework will be presented, and their application to real-world tracking scenarios will be discussed. To better exemplify these methods, particular attention will be given to the video surveillance problem and to multi-hypothesis data association algorithms. Next, an insight into the forefront of research is given by presenting, discussing and evaluating video-based multi-target tracking based on Finite Set Statistics. We will conclude the tutorial by introducing a collection of software resources and publicly available datasets to help the participants develop and test a video-based tracker.
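
As a minimal illustration of the recursive Bayesian machinery mentioned above, here is a hedged Python/NumPy sketch of a constant-velocity Kalman filter for 2-D point tracking; the noise covariances are illustrative tuning parameters, not values from the tutorial:

    import numpy as np

    dt = 1.0
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], float)   # constant-velocity transition
    H = np.array([[1, 0, 0, 0],
                  [0, 1, 0, 0]], float)    # we only observe position
    Q = 0.01 * np.eye(4)                   # process noise (tuning)
    R = 1.0 * np.eye(2)                    # measurement noise (tuning)

    def kalman_step(s, P, z):
        """One predict/update cycle: state s = [x, y, vx, vy],
        covariance P, measurement z = [x, y] from a detector."""
        s = F @ s                                     # predict state
        P = F @ P @ F.T + Q                           # predict covariance
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
        s = s + K @ (z - H @ s)                       # correct with innovation
        P = (np.eye(4) - K @ H) @ P                   # update covariance
        return s, P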

The lecturers will provide the course material, which will be organized into chapters according to the proposed syllabus. The tutorial slides will be provided as PDF for inclusion in the course distribution material. A website will be developed for the course, which will contain links to supporting material and video segments that enrich the learning experience of the participants.

Description of the tutorial and material to be covered:

1. Image-based tracking: introduction
This first part of the tutorial introduces participants to the video-based tracking problem and motivates it by reviewing and discussing the most popular and emerging applications of video-based tracking. Practical examples show state-of-the-art video-based tracking results.
a) Overview of the tutorial
b) Motivating application examples
c) Formalization of the image tracking problem

2. Image features and detectors
We discuss image pre-processing techniques used in target tracking. Image features that are significant for disambiguating the objects of interest from the background are discussed, both in general and for specific applications. As automated tracking systems incorporate a detection algorithm to initialize the tracker and to produce measurements for it to process, we will also discuss detection methodologies for both low-level features (i.e., edges, texture and corners) and high-level objects (i.e., faces, vehicles and people).
a) Image pre-processing
b) Feature extraction
c) Appearance representation
d) Modelling appearance changes
e) Shape approximation

3. Video-based single-object tracking
We discuss algorithms that treat each target independently and use as input the initial location of the target (initialization). First, simpler methods based on correlation and gradient descent are introduced.
Next, a comprehensive treatment of the probabilistic Bayesian framework is presented. Finally, we discuss the advantages of recursive filtering with the aid of practical examples.
a) Gradient-based trackers
b) Bayes' tracking and the Kalman filter
c) Particle filter
d) Hybrid methods

4. Video-based multi-object tracking
When the results from an object detector are available, the tracking problem can be solved by grouping subsequent instances of the same object over time, resolving association hypotheses in order to form object trajectories. Although the complexity of the problem grows combinatorially with the number of targets and the duration of the image sequence, simplifications can be introduced to reduce the number of association hypotheses. This part of the tutorial first introduces deterministic methods for data association, such as the Nearest Neighbour and the Graph Path Cover (a minimal linear-assignment sketch follows the outline below). Then, methodologies that solve data association as a probabilistic filtering problem are discussed. Finally, the chapter introduces and discusses the recent multi-target framework based on Finite Set Statistics and Random Finite Sets. This framework elegantly extends the Bayes' recursion to multiple objects and paves the way for a novel class of multi-object video tracking algorithms.

a) Data association
    - Nearest neighbour
    - Linear assignment
    - Multiple Hypothesis Tracking
b) Random Finite Sets for tracking
    - Random Finite Sets: introduction
    - Probability Hypothesis Density (PHD) filter

5. Conclusions
a) Future development (context modelling, on-line learning)
b) Open research issues (tracking in crowds, tracking from UAV and other mobile platforms)
c) Software and datasets available
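
Here is the minimal linear-assignment sketch referenced in part 4 above, using SciPy's optimal assignment solver (the Hungarian algorithm); the cost values are hypothetical:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # cost[i, j]: distance between predicted track i and detection j
    cost = np.array([[1.2, 7.5, 3.3],
                     [6.1, 0.8, 4.4],
                     [2.9, 5.0, 0.6]])

    rows, cols = linear_sum_assignment(cost)   # optimal one-to-one matching
    for t, d in zip(rows, cols):
        print(f"track {t} -> detection {d} (cost {cost[t, d]})")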

Useful Links

Andrea Cavallaro, Queen Mary University of London -
http://www.elec.qmul.ac.uk/staffinfo/andrea
Emilio Maggio, Vicon - http://www.vicon.com

 

Tutorial 6: Photometric Methods for 3-D Modeling

Presenters
Yasuyuki Matsushita, Bennett Wilburn, and Moshe Ben-Ezra, Microsoft Research Asia

Abstract
Over the past two decades, we have seen tremendous theoretical advances in photometric methods for 3-D modeling, where we wish to reconstruct scene geometry from images observed under varying lighting conditions. These advances are now driving more practical applications in industry, and even in our daily lives.

This tutorial is a focused, vertical introduction to 3-D modeling from photometric signals. In the first part of the tutorial, we will begin with imaging models and cover the various issues that arise when one applies photometric methods to 3-D modeling. We will also introduce the traditional photometric stereo problem in the calibrated and uncalibrated cases, and show solution methods for these problems. In the second part, we will show how to generalize the traditional methods to work under relaxed assumptions, such as non-Lambertian surfaces, dynamic scenes, unknown camera parameters, etc. The tutorial also covers the application of these techniques in real-world scenarios, such as computer graphics and digital archiving in e-Heritage.
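
As a concrete baseline for the calibrated Lambertian case covered in the first session, here is a minimal Python/NumPy sketch of classical photometric stereo (a textbook least-squares formulation, not the presenters' generalized methods):

    import numpy as np

    def photometric_stereo(images, lights):
        """Calibrated Lambertian photometric stereo.
        images: (k, h, w) intensities under k >= 3 distant light sources
        lights: (k, 3) unit lighting directions
        Solves I = L @ g per pixel, where g = albedo * normal."""
        k, h, w = images.shape
        I = images.reshape(k, -1)                        # k x (h*w)
        G = np.linalg.lstsq(lights, I, rcond=None)[0]    # 3 x (h*w)
        albedo = np.linalg.norm(G, axis=0)
        normals = G / np.maximum(albedo, 1e-8)           # unit surface normals
        return albedo.reshape(h, w), normals.reshape(3, h, w)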

Description of the tutorial and material to be covered:

The half-day tutorial will consist of two sessions of 1.5 hours each. The first session will cover the theory and algorithms of standard photometric methods, while the second will cover generalizations of the standard techniques and applications of the latest photometric methods. A tentative list of lectures and a schedule for the tutorial are given below. Slides associated with these lectures will be made available for inclusion in the ICIP CD-ROM, and will also be posted on the tutorial web page.

I. Introduction to Photometric Methods for 3-D Modeling (20 minutes)
II. Basic Theory and Algorithms in Photometric Methods (70 minutes)

    a. Basic Image Formation Models in Photometry
    b. Radiometric Calibration in the Imaging System
    c. Calibrated and Uncalibrated Photometric Stereo
    d. Surface Reconstruction from Surface Orientations

III. Generalization of Photometric 3-D Modeling Methods (45 minutes)

    a. Photometric Stereo with Non-Lambertian Surfaces
    b. Self-Calibrating Photometric Stereo
    c. Geometrically Constrained Photometric Stereo

IV. Photometric Stereo with Specialized Hardware (45 minutes)

    a. Handheld Photometric Stereo Camera
    b. Photometric Stereo and Video Relighting for Dynamic Scenes
    c. Dense Photometric Stereo Using a Very High Resolution Camera

Speaker Biographies

Yasuyuki Matsushita, Lead Researcher, Visual Computing Group, Microsoft Research Asia
Website: http://research.microsoft.com/en-us/people/yasumat/

Dr. Matsushita received his B.S., M.S. and Ph.D. degrees in EECS from the University of Tokyo in 1998, 2000, and 2003, respectively. He joined Microsoft Research Asia in April 2003. His areas of research are computer vision (photometric techniques, such as radiometric calibration, photometric stereo, and shape-from-shading) and computer graphics (image relighting, video analysis and synthesis). Dr. Matsushita served as an Area Chair for IEEE Computer Vision and Pattern Recognition (CVPR) 2009 and the International Conference on Computer Vision (ICCV) 2009, and he is an editorial board member of the International Journal of Computer Vision (IJCV) and the IPSJ Journal of Computer Vision and Applications (CVA). He also serves as a Program Co-Chair of PSIVT 2010.

Bennett Wilburn, Researcher, Visual Computing Group, Microsoft Research Asia.
Website: http://research.microsoft.com/en-us/people/bwilburn

Dr. Wilburn received his B.S. and M.S. in Electrical Engineering, with a focus on VLSI design, in 1993. After working on microprocessors for Hewlett Packard, AMD and Silicon Graphics, he returned to Stanford, receiving his Ph.D. in Electrical Engineering in 2005. For his thesis, he designed custom CMOS cameras for a scalable video camera array, and devised high-performance imaging methods using the 100-camera system. His primary research interest is hardware system design for graphics and vision, especially for real-world performance capture, modeling, and relighting. Recently he has begun researching 3D displays and new interaction styles for mobile devices.

Moshe Ben-Ezra, Lead Researcher, Visual Computing Group, Microsoft Research Asia.
Website: http://research.microsoft.com/en-us/people/mosheb

Dr. Ben-Ezra received his Ph.D. in Computer Science from the Hebrew University of Jerusalem in 2000. He did his post-doctoral studies at Columbia University in the City of New York. Before joining Microsoft Research Asia in 2006, he was a member of the technical staff at Siemens Corporate Research. His research interests include physics-based and hardware-related computer vision.

 

Tutorial 7: To Tell a Good Picture from a Good One: Perceptual Visual Quality Evaluation

Presenter
Weisi Lin, School of Computer Engineering, Nanyang Technological University

Abstract
Quality evaluation of images and video is useful in many applications, and is also crucial in shaping almost all visual processing algorithms/systems, as well as their implementation, optimization and testing. Since the human visual system (HVS) is the final receiver and appreciator of most processed images and videos (be they naturally captured or computer generated), it is beneficial to use a perceptual quality criterion in system design and optimization, instead of a traditional one (e.g., MSE, SNR, PSNR, QoS). Through evolution, the HVS has developed unique characteristics. Significant research effort has been devoted to modeling the HVS's picture quality evaluation mechanism, and to applying the resulting models in various situations (quality metrics themselves, or other applications, like image/video compression, watermarking, channel coding, signal restoration/enhancement, computer graphics, and visual content retrieval).

In this tutorial, we will first introduce the problems associated with perceptual visual quality metrics (PVQMs), i.e., metrics in line with HVS perception, the relevant physiological/psychological knowledge, and the major research and development work so far in the related fields. The basic computational modules are then discussed, with different applications presented (e.g., in visual signal compression, enhancement, image rendering, and content retrieval). Since such technology has started to find applications in industry, we will also discuss some examples of early industrial deployment, based on the presenter's substantial project experience with various companies.

The tutorial aims at providing a systematic, comprehensive and up-to-date overview of perceptual quality gauging for images and videos. It can also serve as a practical user's guide to the various relevant techniques; all approaches are presented with clear classification, and careful comparisons/comments wherever possible, based upon our understanding and experience in the said areas.

Description of the tutorial and material to be covered:

Part 1. Introduction & Problem Statements (~25 mins)
The relevant concepts, and the necessity and difficulties of perceptual visual quality evaluation, are first introduced. The progress, applications and challenges of the relevant research will be reviewed.

Part 2. Related Physiological & Psychological Findings (~35 mins)
An overview of the related human visual system (HVS) characteristics is given in this part, covering both physiological and psychological aspects. The results of important psycho-visual experiments are presented for both single-stimulus tests and real-world images. The emphasis is on the knowledge relevant to existing and future R&D efforts.

Part 3. Basic Computational Modules (~40 mins)
We present the oft-adopted computational modules in existing perceptual visual quality evaluation research and applications: spatial and temporal Contrast Sensitivity Function (CSF), luminance adaptation, visual attention, the effect of eye movement, intra- and inter-band contrast masking, common artefact detection, and just-noticeable difference (JND).
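
For concreteness, one widely cited spatial CSF model is that of Mannos and Sakrison (1974); the short Python/NumPy sketch below (an illustration, not necessarily the exact model used in the tutorial) shows its band-pass shape:

    import numpy as np

    def csf_mannos_sakrison(f):
        """Contrast sensitivity vs. spatial frequency f (cycles/degree):
        band-pass, peaking near 8 cpd and falling off at high frequency."""
        return 2.6 * (0.0192 + 0.114 * f) * np.exp(-(0.114 * f) ** 1.1)

    f = np.linspace(0.1, 50.0, 500)
    print("peak sensitivity near", f[np.argmax(csf_mannos_sakrison(f))], "cpd")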

Part 4. Perceptual Visual Quality Metrics (PVQMs) (~60 mins)
Different types of PVQMs will be presented: vision-based, signal-based and hybrid; full-reference, reduced-reference and no-reference. Discussion will be devoted to the various operating domains (i.e., pixels, DCT, wavelet or other decompositions), according to the requirements/constraints in practice. Different recent approaches will be highlighted for the two major processes: feature extraction and feature pooling. Subjective viewing tests and model verification with publicly available databases will also be discussed in this part. Some applications and early industrial deployment will be demonstrated with systems for both natural and computer-generated visual signals.

Part 5. Conclusions and Discussion on Future Work (~25 mins)
We will give a summary of this tutorial, the concluding remarks, and our views toward possible future research and development in the areas related to perceptual visual quality evaluation and modelling.

Tutorial material to be provided
A copy of the PowerPoint presentation and a list of related papers, book chapters and websites for further reading.

Speaker Biography

Weisi Lin graduated from Zhongshan University, China, with B.Sc. and M.Sc. degrees in 1982 and 1985, respectively, and from King's College, University of London, UK, with a Ph.D. in 1992. He has taught and conducted research at Zhongshan University, Shantou University (China), Bath University (UK), the National University of Singapore, the Institute of Microelectronics (Singapore), and the Institute for Infocomm Research (Singapore). He has been the project leader of 12 successfully delivered projects (mostly for industry) in digital multimedia technology development since 1997. He has also served as the Lab Head of Visual Processing and Acting Department Manager at the Institute for Infocomm Research. Currently, he is an Associate Professor in the School of Computer Engineering, Nanyang Technological University, Singapore. His areas of expertise include image processing, video and audio compression, perceptual modelling, computer vision, and multimedia communication. He is a Chartered Engineer and a Fellow of the IET. He is currently an Associate Editor of the Journal of Visual Communication and Image Representation. He believes that good theory is practical, and keeps a good balance between academic research and industrial development.

 

Tutorial 8: Human Behavior Analysis

Presenters
Nicu Sebe, Nicola Conci (University of Trento)

Abstract
This tutorial will take a holistic view of the research issues and applications of human behavior analysis, focusing on the image-processing-related aspects. There are two main directions of interest: (1) close-range behavior analysis, which includes facial expression, eye tracking and gaze, and head pose analysis, and (2) far-range analysis, which includes body tracking, trajectory representation and matching, and activity detection.

Description of the tutorial and material to be covered:

Image and video processing plays a fundamental role in many new types of interfaces and application areas (multimodal and attentive interfaces, applications such as surveillance, ambient assisted living, etc.) in which humans play a central role. This implies that building systems capable of analyzing and understanding behaviors lies at the crossroads of many research areas (psychology, artificial intelligence, pattern recognition, image processing, computer vision, etc.).

Domains where human behavior understanding is crucial (e.g., human-computer interaction, affective computing, surveillance, etc.) rely on advanced image processing and pattern recognition techniques to automatically interpret the complex behavioral cues generated when humans act in their natural environment. This is a challenging problem with many open issues, including the joint modeling of behavioral cues taking place at different time scales, the inherent uncertainty of machine-detectable evidence of human behavior, the presence of long-term dependencies in observations extracted from human behavior, and the important role of dynamics in human behavior understanding.

This tutorial is meant for researchers dealing with the problem of modeling human behavior under its multiple facets (expression of emotions, performance of individual or joint actions, etc.), with particular attention to image processing approaches that model the actual dynamics of behavior in close- and far-range domains. The contiguity with ICIP is expected to foster cross-pollination between several research communities, such as computer vision, image processing and human-centered computing, in order to merge the analyses carried out at different scales (gaze, posture, position), thereby improving the capability of capturing the whole dynamics of the action under investigation.

In this tutorial, we take a holistic approach to developing human-behavior understanding systems. We aim to identify the important research issues and to ascertain potentially fruitful future research directions in this area. In particular, we introduce key concepts and discuss technical approaches and open issues in two areas: (1) close-range behavior analysis: face tracking, facial expression analysis, eye tracking and gaze, head pose analysis; and (2) far-range analysis: body tracking, activity detection, trajectory analysis, etc.
Each topic will be complemented by a number of demonstrations and end-user applications that will be shown as viable implementations, in order to highlight the benefits that image and video processing can bring to the deployment of tools in the areas of HCI, visual surveillance, assisted living, rehabilitation, etc.

Benefits & List of Topics: This tutorial will enable the participants to understand key concepts, state-of-the-art techniques, and open issues in the areas described below. The tutorial will cover parts of the following topic areas:

- Vision for multimodal interaction: overview of techniques and state of the art in eye detection and visual gaze estimation.
- Emotion recognition for affective retrieval and in affective interfaces: approaches to multimedia content analysis and interaction that use facial expression recognition.
- Machine learning: adaptive interfaces and learning of visual patterns from user input for automatic detection and recognition.
- Vision for activity detection: analysis of trajectories of humans in indoor environments, posture and motion dynamics.
- Trajectory mining: query formulation, representation and retrieval of trajectories in video databases.
- Applications: traditional and emerging application areas will be described, with specific examples in smart conference room research, interaction for people with disabilities, entertainment, and others.

This tutorial has been specifically designed for the audience of ICIP; while the focus of the tutorial will be technical, we aim to give participants a broad view of research and important topics for developing human behavior analysis systems. Materials will include an overview of technical approaches for vision-based human-computer interaction, as well as materials from numerous sources not typically present at ICIP. Handouts will include the presentation slides as well as relevant references.

 

Tutorial 9: Image Denoising and the SURE-LET Methodology

Presenters
Thierry Blu, Dept of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong SAR.
Florian Luisier, Biomedical Imaging Laboratory, Swiss Federal Institute of Technology (EPFL) Lausanne, Switzerland.

Abstract
The goal of this tutorial is to introduce attendees to a new approach for dealing with noisy data - typically, images or videos here.
Image denoising consists in approximating the noiseless image by performing some, usually non-linear, processing of the noisy image. Most standard techniques involve assumptions on the result of this processing, i.e., on the denoised image (sparsity, low high-frequency content, etc.).
Instead, the SURE-LET methodology that we promote consists in approximating the processing itself (seen as a function) by a linear combination of elementary non-linear processings (LET: Linear Expansion of Thresholds), and in optimizing the coefficients of this combination by minimizing a statistically unbiased estimate of the mean squared error (SURE: Stein's Unbiased Risk Estimate, for additive Gaussian noise).
This tutorial will introduce the technique and outline its advantages (fast, noise-robust, flexible, image-adaptive). A very complete set of results will be shown and compared with the state of the art.
Extensions of the approach to Poisson noise reduction with application to microscopy imaging will also be shown.
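
To make the principle concrete, here is a toy Python/NumPy sketch of SURE-LET shrinkage applied to the coefficients of an orthonormal transform (e.g., one wavelet subband) under additive Gaussian noise of known variance. The two-function LET basis and the threshold scale are illustrative choices, not the presenters' exact algorithms; the key point is that SURE is quadratic in the LET weights, so the optimum solves a small linear system without any ground truth:

    import numpy as np

    def sure_let_denoise(y, sigma):
        """Toy SURE-LET shrinkage of noisy coefficients y = x + n,
        n ~ N(0, sigma^2 I). Estimator F(y) = a1*F1(y) + a2*F2(y),
        with F1(y) = y and F2(y) = y * exp(-(y/T)^2)."""
        T = 2.0 * sigma                   # illustrative threshold scale
        F1 = y
        F2 = y * np.exp(-(y / T) ** 2)
        # Divergences (sums of partial derivatives) in closed form:
        div1 = y.size
        div2 = np.sum(np.exp(-(y / T) ** 2) * (1.0 - 2.0 * y ** 2 / T ** 2))
        # Minimize ||F(y) - y||^2 + 2*sigma^2*div F(y) over (a1, a2):
        M = np.array([[F1 @ F1, F1 @ F2],
                      [F1 @ F2, F2 @ F2]])
        c = np.array([F1 @ y - sigma ** 2 * div1,
                      F2 @ y - sigma ** 2 * div2])
        a1, a2 = np.linalg.solve(M, c)
        return a1 * F1 + a2 * F2

    # Example: the optimized shrinkage reduces the MSE of a sparse signal
    rng = np.random.default_rng(0)
    x = rng.standard_normal(10000) * (rng.random(10000) < 0.1)
    y = x + 0.1 * rng.standard_normal(10000)
    x_hat = sure_let_denoise(y, sigma=0.1)
    print(np.mean((y - x) ** 2), np.mean((x_hat - x) ** 2))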

Description of the tutorial and material to be covered:

1. Review of usual image denoising approaches
2. Stein's Unbiased Risk Estimate (SURE): Estimating the MSE without assumptions on the ground truth.
3. Approximation of the denoising process by a linear combination of thresholds in a transformed domain (LET)
a. Wavelets and simple thresholds
b. Wavelet interscale thresholds
4. The SURE-LET optimization
a. Orthogonal representations/transformations
b. Non-orthogonal/Redundant representations
5. Algorithm description and results
a. Grayscale image denoising
b. Color/Multichannel image denoising
c. Video denoising
6. Extension to Poisson denoising
a. Poisson MSE estimate (PURE)
b. Interscale algorithm and results

Speaker Biography

Thierry Blu received Engineering Diplomas from École Polytechnique, France, in 1986 and from Telecom Paris (ENST), France, in 1988. In 1996, he obtained a Ph.D. in electrical engineering from ENST for a study on iterated rational filterbanks applied to wideband audio coding.
Between 1998 and 2007, he was with the Biomedical Imaging Group at the Swiss Federal Institute of Technology (EPFL) in Lausanne, Switzerland. He is now a Professor in the Department of Electronic Engineering, The Chinese University of Hong Kong.
Dr. Blu was the recipient of two Best Paper Awards from the IEEE Signal Processing Society (2003 and 2006). He is also coauthor (with F. Luisier) of a paper that received a Young Author Best Paper Award (2009) from the same society. From 2002 to 2006, he was an Associate Editor of the IEEE Transactions on Image Processing, and since 2006 he has been an Associate Editor of the IEEE Transactions on Signal Processing. He is also an Associate Editor of Elsevier Signal Processing and of the EURASIP Journal on Image and Video Processing. He is a member of the IEEE Technical Committee on "Signal Processing Theory and Methods".
Research interests: (multi)wavelets, multiresolution analysis, multirate filterbanks, interpolation, approximation and sampling theory, sparse sampling, image denoising, psychoacoustics, biomedical imaging, optics, wave propagation, etc.

Florian Luisier was born in Switzerland in 1981. In 2005, he received his Master's degree in Microengineering from the Swiss Federal Institute of Technology (EPFL) in Lausanne, Switzerland. In 2010, he obtained his Ph.D. in Computer, Communication, and Information Sciences from the same institution. Since 2005, he has been with EPFL's Biomedical Imaging Group, led by Prof. Michael Unser.
Dr. Luisier is the recipient of the 2009 Young Author Best Paper Award from the IEEE Signal Processing Society. Since 2006, he has served as a reviewer for various international scientific journals, including the IEEE Transactions on Image Processing and the IEEE Transactions on Medical Imaging.
His research interests include image processing, multiresolution representations, risk estimation techniques, and the restoration of multidimensional biomedical data.

 

Tutorial 10: Low-rank Matrix Recovery: From Theory to Imaging Applications

Presenters
Yi Ma, University of Illinois,
Zhouchen Lin, Microsoft Research Asia,
John Wright, Microsoft Research Asia

Abstract
The goal of this tutorial is to provide the ICIP community with an introduction to the quickly developing area of low-rank matrix recovery. Low-rank (or approximately low-rank) matrices arise in a great number of applications involving image and video data. A few recurrent examples in this tutorial will include aligning batches of images, super-resolution and video inpainting, and background modelling for visual tracking and surveillance. However, in real applications our observations are never perfect: observations are always noisy, often missing, and sometimes grossly or even maliciously corrupted. The recent excitement surrounding low-rank matrix recovery is due to very recent results showing that under fairly general circumstances, the low-rank recovery problem can be efficiently and exactly solved by convex programming. These theoretical advances have inspired a flurry of algorithmic work, giving increasingly practical and scalable algorithms for solving the corresponding convex programs.

The theory and algorithms described above, which the presenters have had a strong role in developing, are already beginning to influence practice in a number of areas, including collaborative filtering and computer vision. However, we believe these results are poised for even stronger impact in image processing. The purpose of this tutorial is to bring these ideas to the ICIP community by giving a solid and unified introduction to the existing theoretical and algorithmic state of the art in the area, and then showing how this theory and these algorithms are already being used to solve real imaging problems.

In particular, we will:

1. Familiarize participants with the basic problem setting and theory of low-rank matrix recovery, including when the problem is well-posed and when efficient solutions are possible.
2. Equip participants with algorithmic tools for solving matrix recovery problems, based on recent advances in non-smooth convex programming.
3. Give examples showing how these tools can be used to solve real-world problems involving images and videos.

All lecture material (notes and slides) will be made available during the tutorial, via the website:
http://watt.csl.illinois.edu/~perceive/matrix-rank/
That website also contains source code that will allow interested participants to get a hands-on feel for the performance of the methods, and to begin using them in their own research.

Description of the tutorial and material to be covered:

The half-day tutorial will consist of an introduction, followed by three tutorial sessions of roughly 45 minutes each. The introduction will motivate the tutorial by introducing several model applications, and putting the work to be presented in the context of existing work in the area.
The introductory session will be followed by a session introducing the current theoretical understanding of the low-rank recovery problem, including when the problem is well-posed and when one can hope for an efficient solution. This session will emphasize the ability of the new theory and algorithms to simultaneously cope with many non-ideal factors in real application data, including missing elements, errors and corruption, and noise.
Once participants have a good introductory feel for the current theoretical state of the art in the area, the tutorial will move into its second stage, in which we discuss practical and scalable algorithmic solutions to the low-rank recovery problem. This section will discuss recent advances in non-smooth convex optimization that now enable the solution of moderate-to-large scale matrix recovery problems in a matter of minutes on a standard PC. We will discuss techniques from numerical linear algebra that can further improve speed and scalability, from a user's perspective. Finally, we will show how the algorithms introduced can be parallelized (for very large scale applications) and implemented on the GPU (for time-critical applications). The algorithms introduced will correspond to publicly available code packages that participants can immediately begin using in their own applications.
With this theoretical and algorithmic groundwork established, we will close the tutorial with a number of applications from image and video processing, as well as computer vision. We will show how rank-minimization problems arise naturally in the applications contexts, and how they can be understood and solved using the tools from the first two sessions of the tutorial. We will emphasize how the basic theory and algorithms extend to meet application challenges, for example, in dealing with image transformations in batch image alignment or image and video super-resolution.
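
As a concrete illustration of the convex-programming approach, the following Python/NumPy sketch implements a basic ADMM solver for principal component pursuit, min ||L||_* + lambda*||S||_1 subject to L + S = M. The choices of lambda and mu follow common heuristics from the robust PCA literature; this is a didactic baseline, not the presenters' optimized or parallel implementations:

    import numpy as np

    def rpca_pcp(M, n_iter=500, tol=1e-7):
        """Recover a low-rank L and a sparse S with L + S = M by
        alternating singular-value thresholding and soft thresholding."""
        m, n = M.shape
        lam = 1.0 / np.sqrt(max(m, n))          # standard sparsity weight
        mu = m * n / (4.0 * np.abs(M).sum())    # common step-size heuristic
        L = np.zeros_like(M); S = np.zeros_like(M); Y = np.zeros_like(M)
        shrink = lambda X, t: np.sign(X) * np.maximum(np.abs(X) - t, 0.0)
        for _ in range(n_iter):
            # Low-rank step: singular-value thresholding
            U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
            L = (U * shrink(s, 1.0 / mu)) @ Vt
            # Sparse step: entrywise soft thresholding
            S = shrink(M - L + Y / mu, lam / mu)
            # Dual ascent on the constraint L + S = M
            Y = Y + mu * (M - L - S)
            if np.linalg.norm(M - L - S) <= tol * np.linalg.norm(M):
                break
        return L, S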

Below is a more detailed outline of the planned tutorial sessions:

I. Introduction and Overview of Tutorial (5 min), Motivating Scenarios (10 min)

II. Basic Theory of Low-rank Matrix Recovery (45 min)

a. Problem formulation: errors, missing data, noise
b. Well-posedness: when can any algorithm recover a low-rank matrix?
c. Guarantees: when can efficient algorithms recover a low-rank matrix?

III. Algorithms for Low-rank Matrix Recovery (45 min)

a. Fast first-order methods for nonsmooth convex programming
b. Warm starts and specific techniques for improved speed
c. Parallel and GPU implementations

IV. Applications in Image and Video Processing (45 min)

a. Batch image alignment by low-rank and sparse decomposition
b. Video superresolution via low-rank optimization
c. Applications in tracking and surveillance

Speaker Biographies

Yi Ma is an associate professor in the Electrical & Computer Engineering Department of the University of Illinois at Urbana-Champaign. Since January 2009, he has also been the research manager of the Visual Computing group at Microsoft Research Asia in Beijing. His main research interests are in computer vision, high-dimensional data analysis, and systems theory. He is the first author of the popular vision textbook "An Invitation to 3-D Vision," published by Springer in 2003. Yi Ma received two Bachelor's degrees, in Automation and Applied Mathematics, from Tsinghua University (Beijing, China) in 1995, a Master of Science degree in EECS in 1997, a Master of Arts degree in Mathematics in 2000, and a PhD degree in EECS in 2000, all from the University of California at Berkeley. He received the David Marr Best Paper Prize at the International Conference on Computer Vision in 1999, the Longuet-Higgins Best Paper Prize at the European Conference on Computer Vision in 2004, and the Sang Uk Lee Best Student Paper Award with his students at the Asian Conference on Computer Vision in 2009. He also received the CAREER Award from the National Science Foundation in 2004 and the Young Investigator Award from the Office of Naval Research in 2005. He is an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence and has served as chief guest editor for special issues of the Proceedings of the IEEE and the IEEE Signal Processing Magazine. He will also serve as a Program Chair for ICCV 2013 in Sydney, Australia. He is a senior member of the IEEE and a member of the ACM, SIAM, and ASEE.

Zhouchen Lin is a researcher in the Visual Computing group at Microsoft Research Asia. He received his Bachelor's degree in pure mathematics from Nankai University in 1993, and his Master's and doctoral degrees in applied mathematics from Peking University in 1996 and 2000, respectively. His research interests include computer vision, image processing, machine learning, pattern recognition, numerical computation and optimization, and computer graphics. He is a guest professor at Shanghai Jiaotong University, Beijing Jiaotong University, Southeast University, and the Institute of Computing Technology, Chinese Academy of Sciences. He is a senior member of the IEEE.

John Wright is a researcher in the Visual Computing group at Microsoft Research Asia. He received his PhD in Electrical Engineering from the University of Illinois at Urbana-Champaign. His graduate work focused on developing efficient and provably correct algorithms for error correction with high-dimensional data, and on their application in automatic face recognition. His research interests encompass a number of topics in vision and signal processing, including minimum description length methods for clustering and classification, error correction and inference with non-ideal data, video analysis and tracking, as well as face and object recognition. His work has received a number of awards and honors, including a UIUC Distinguished Fellowship, Carver Fellowship, Microsoft Research Fellowship, the UIUC Martin Award for Outstanding Graduate Research, and the Lemelson-Illinois Prize for Innovation.