Tutorials

Sunday, September 26, 09:30 - 12:30

T-1: Advanced Image Processing Based on Spatially Adaptive Nonlocal Image Filtering and Regularization
Presenters: Vladimir Katkovnik, Alessandro Foi, Karen Egiazarian
Location: S423

T-2: Poisson Imaging Theory for Integrating Detectors: Algorithms and Applications
Presenters: Keigo Hirakawa and Patrick J. Wolfe
Location: S424

T-3: Perceptual Quality Measurement for Video
Presenter: Stefan Winkler
Location: S426

T-4: Video Processing Techniques for 3-D Television
Presenter: Yo-Sung Ho
Location: S427

T-5: Video Tracking: Overview, Applications and Recent Developments
Presenters: Andrea Cavallaro and Emilio Maggio
Location: S428

Sunday, September 26, 13:30 - 16:30

T-6: Photometric Methods for 3-D Modeling
Presenters: Yasuyuki Matsushita, Bennett Wilburn, and Moshe Ben-Ezra
Location: S423

T-7: To Tell a Good Picture from a Good One: Perceptual Visual Quality Evaluation
Presenter: Weisi Lin
Location: S424

T-8: Human Behavior Analysis
Presenters: Nicu Sebe and Nicola Conci
Location: S426

T-9: Image Denoising and the SURE-LET Methodology
Presenters: Thierry Blu and Florian Luisier
Location: S427

T-10: Low-rank Matrix Recovery: From Theory to Imaging Applications
Presenters: Yi Ma, Zhouchen Lin, John Wright
Location: S428
Tutorial 1: Advanced Image Processing Based on Spatially Adaptive Nonlocal Image Filtering and Regularization

Presenters
Vladimir Katkovnik, Alessandro Foi, Karen Egiazarian
Department of Signal Processing, Tampere University of Technology, Tampere, Finland
Abstract
Nonlocal imaging techniques look for blocks (patches, fragments) that are similar to each other and process them jointly. Such techniques appeared independently and in parallel in a number of different developments, notably as the block matching proposed for video processing and as nonlocal means in nonparametric regression modeling. The last few years have witnessed an intensive flow of publications with ideas and techniques based on various nonlocal approximations. Some of these nonlocal developments report very good, sometimes even extraordinarily good, performance. While a theory for these methods is far from fully developed, the source of their advanced performance is clear: it originates from the fact that real-life images are characterized by mutual similarity of their fragments. Nonlocal imaging techniques appear in forms and modifications so diverse that it is sometimes difficult even to recognize that they belong to the same class of algorithms.
One motivation of this tutorial is to provide a proper classification of the nonlocal techniques that can serve as a guideline for orientation in this intensively developing area. We analyze the evolution of nonlocal modeling in imaging, from the local Nadaraya-Watson kernel estimate to nonlocal means, and further to transform-domain filtering based on nonlocal block-matching.
Another motivation is to demonstrate the efficiency and the state-of-the-art performance achieved by recent nonlocal algorithms.
The considered methods are classified according to two main features: local/nonlocal and pointwise/multipoint. These alternatives, though obvious simplifications, allow us to impose a fruitful and transparent classification of the basic ideas behind the advanced techniques. Here nonlocal is the alternative to local, and multipoint is the alternative to pointwise. In the multipoint case the data are typically processed in overlapping subsets, i.e. windows, blocks or generic neighborhoods, and multiple estimates are obtained for each individual point. The final estimate is obtained by aggregating (fusing) these multiple multipoint estimates. This sort of redundant approximation, with multiple estimates for each pixel, dramatically improves the accuracy of estimation.
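The multipoint idea can be illustrated in its simplest local form: every overlapping 8x8 block is denoised by hard-thresholding its 2-D DCT coefficients, and the resulting per-pixel estimates are averaged. The Python sketch below is a toy version of this scheme; the block size, threshold, and uniform aggregation weights are illustrative assumptions, not the collaborative BM3D-style filtering discussed in the tutorial.

```python
import numpy as np
from scipy.fft import dctn, idctn

def multipoint_dct_denoise(img, block=8, thr=0.1):
    """Denoise by hard-thresholding the 2-D DCT of every overlapping block,
    then aggregate the multiple per-pixel estimates by plain averaging."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    acc = np.zeros((H, W))          # sum of block estimates per pixel
    cnt = np.zeros((H, W))          # number of estimates per pixel
    for i in range(H - block + 1):
        for j in range(W - block + 1):
            coeffs = dctn(img[i:i + block, j:j + block], norm='ortho')
            coeffs[np.abs(coeffs) < thr] = 0.0        # hard threshold
            acc[i:i + block, j:j + block] += idctn(coeffs, norm='ortho')
            cnt[i:i + block, j:j + block] += 1.0
    return acc / cnt                # aggregation (uniform weights)
```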
Within this framework, we discuss different forms of efficient redundant
modeling as well as an original single- and multiple-model transform-domain nonlocal collaborative filtering approach.
The tutorial is accompanied by numerous examples where these methods
are applied to competitive image processing problems.
Applications include Gaussian and non-Gaussian denoising, deblurring
(deconvolution), deblocking and deringing, inverse-halftoning, color
image processing, etc.
Matlab software, which implements the presented techniques and experiments,
is publicly available on the website http://www.cs.tut.fi/~foi.
Description of the tutorial and material to be covered:
1. Local pointwise modeling: pointwise weighted means, pointwise polynomial (non-polynomial) approximations;
2. Adaptivity of pointwise polynomial estimates: varying spatially
adaptive optimal window size (scale) estimates, Lepski's approach
for the window size selection, intersection of confidence intervals
(ICI) rule;
3. Local polynomial approximation (LPA) with anisotropic adaptive
supports;
4. Signal-dependent windows/weights;
5. Variational formulations of local pointwise modeling;
6. Local multipoint modeling: overcomplete transforms, model selection
rules, order adaptive models, aggregation of multipoint estimates;
7. Shape-adaptive transform domain filtering;
8. Nonlocal pointwise modeling: weights defined by pointwise and
neighborhood-wise differences, nonlocal (NL)-means;
9. Recursive reweighting;
10. Nonlocal pointwise higher-order models;
11. Variational formulations of nonlocal models;
12. Nonlocal multipoint modeling: single-model approach;
13. Nonlocal multipoint modeling: multiple-model approach, collaborative
filtering, block-matching 3D transform (BM3D) algorithms;
14. Non-Gaussian image processing based on optimized variance stabilizing
transformations with exact unbiased inverse.
15. Applications: denoising, deblurring, super-resolution imaging,
compressive sensing, still and video imaging, etc.
Speakers' Biographies
Prof. Vladimir Katkovnik received the M.Sc., Ph.D., and
D.Sc. degrees in technical cybernetics from the Leningrad Polytechnic
Institute, Leningrad, Russia, in 1960, 1964, and 1974, respectively.
From 1964 to 1991, he held the positions of Associate Professor
and Professor at the Department of Mechanics and Control Processes,
Leningrad Polytechnic Institute. From 1991 to 1999, he was a Professor
of the Department of Statistics of the University of South Africa,
Pretoria. From 2001 to 2003, he was a Professor of the Mechatronics
Department of Kwangju Institute of Science and Technology, South
Korea. From 2000 to 2001, and again since 2003, he has been a Research Professor with the Department of Signal Processing, Tampere University of Technology (DSP/TUT). His research interests include stochastic signal
processing, linear and nonlinear filtering, nonparametric estimation,
imaging, nonstationary systems, and time-frequency analysis.
Dr.
Alessandro Foi received the M.Sc. degree in Mathematics from the Università degli Studi di Milano, Italy, in 2001, the Ph.D.
degree in Mathematics from the Politecnico di Milano in 2005, and
the D.Sc.Tech. degree in Signal Processing from Tampere University
of Technology, Finland, in 2007. His research interests include
mathematical and statistical methods for signal processing, functional
analysis, and harmonic analysis. Currently, he is a senior researcher
at the Department of Signal Processing, Tampere University of Technology.
His work focuses on spatially adaptive algorithms for denoising
and deblurring of digital images and on noise modeling for digital
imaging sensors.
Prof.
Karen Egiazarian received the M.Sc. degree in Mathematics
from Yerevan State University, Armenia, in 1981, the Ph.D. degree
in physics and mathematics from Moscow State University, Moscow,
Russia, in 1986, and the D.Tech. degree from Tampere University
of Technology, Finland, in 1994. He was a Senior Researcher with the Department of Digital Signal Processing, Institute of Information Problems and Automation, National Academy of Sciences of Armenia. He joined the DSP/TUT as an Assistant Professor in 1996 and is currently a Professor there, leading the Transforms and Spectral Methods group. His research interests are in the areas of applied
mathematics, signal/image processing, and digital logic.
Tutorial 2: Poisson Imaging Theory for Integrating Detectors: Algorithms and Applications

Presenters
Keigo Hirakawa, University of Dayton
Patrick J. Wolfe, Harvard University
Abstract
The ubiquity of digital imaging in consumer, scientific, and medical
applications places ever greater demands on engineers to understand
how photons are accumulated and counted by pixel sensors as integrating
detectors. In this tutorial we address the key points of Poisson
imaging theory as a means of understanding the dynamics of image
data acquisition and processing and mitigating the effects of shrinking
device footprints, increasing dynamic range requirements, and heightened
image quality expectations. This topic is of broad interest yet
sufficiently focused to provide a clear, structured framework for
the tutorial setting. Its aim is to educate students, researchers,
and industry practitioners in relevant theory, algorithms, and applications
for working with Poisson data. Participants will gain an understanding
of practical implementation trade-offs, along with a set of technical
tools and frameworks for thinking holistically about integrating
detectors and inference for Poisson processes in imaging.
Description of the tutorial and material to be covered:
I. Introduction: Photon Counts, Poisson Processes and Integrating Detectors
   A. Integrating detectors and relevant concepts from image sensing
   B. Basics of Poisson processes and statistical inference
   C. Sources of Poisson variability, missingness, and censoring in the data acquisition process
II. Algorithms for Sensing and Denoising Poisson Intensities
   A. Image resolution/distortion trade-offs in sensing
   B. Estimation with and without variance stabilization (a sketch follows this outline)
   C. Wavelet-based Poisson data processing
   D. High dynamic range (HDR) image acquisition
III. Applications and In-Depth Worked Examples
   A. Astronomy - Single-photon detection
   B. Medical Imaging - The photon paucity problem
   C. Consumer Color Cameras - Joint denoising/demosaicking
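To make item II.B concrete, here is a minimal Python sketch of the variance-stabilization route to Poisson denoising: the Anscombe transform renders the noise approximately Gaussian with unit variance, a Gaussian denoiser is applied, and the result is mapped back. The simple smoother and the plain algebraic inverse are placeholder assumptions; in practice one would use a stronger denoiser and an exact unbiased inverse.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def anscombe(x):
    """Variance-stabilizing transform: Poisson(x) -> approx N(., 1)."""
    return 2.0 * np.sqrt(x + 3.0 / 8.0)

def inverse_anscombe(y):
    """Plain algebraic inverse (biased at low counts; shown for simplicity)."""
    return (y / 2.0) ** 2 - 3.0 / 8.0

def poisson_denoise(counts, sigma_blur=1.5):
    y = anscombe(counts.astype(float))
    y_hat = gaussian_filter(y, sigma_blur)   # stand-in Gaussian denoiser
    return np.clip(inverse_anscombe(y_hat), 0, None)
```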
Speaker Biographies
The tutorial will be led by presenters with extensive experience of both the theoretical and practical aspects of statistical image acquisition and processing, and whose research has been funded by companies such as Sony Electronics, Inc., and Texas Instruments. Prof. Keigo Hirakawa, currently of the University of Dayton, has been an ASIC engineer and principal image scientist for the camera division of Hewlett-Packard/Agilent Technologies, and has been involved with camera development efforts at Micron, Sony, Texas Instruments, and Kodak. For two years he led a Harvard-Sony camera pipeline collaboration jointly with Prof. Patrick Wolfe, who heads the Statistics and Information Sciences Laboratory at Harvard. Together, Profs. Hirakawa and Wolfe received an ICIP 2007 DoCoMo paper award for work in color filter array design and delivered an ICIP 2008 tutorial on the color imaging pipeline for digital cameras.
Tutorial 3: Perceptual Quality Measurement for Video

Presenter
Stefan Winkler
Cheetah Technologies - http://www.cheetahtech.com/
1901 Charcot Avenue, San Jose, CA 95131, USA
Abstract
The proliferation of digital video content (e.g. IPTV services,
Internet video, etc.) has underlined the importance of subjective
and objective methods for video quality measurement. At the same
time, research on the various metrics, methods and standards is
highly active and constantly evolving, creating an abundance of
video quality monitoring solutions which can be confusing to users.
Significant progress has been made recently, both in terms of standardized approaches and in our understanding of the evaluation and performance of quality metrics.
This tutorial introduces participants to the fundamentals of visual
perception and the algorithmic modeling of human vision. It will
discuss the various factors that influence a viewer's perception
of video quality. It will also briefly cover the basics of MPEG-2/H.264
video compression, IP transmission, and the most common distortions
present as a result.
The focus of the tutorial will be subjective as well as objective
methods for video quality assessment. Here we will cover various
methods for subjective tests, how to obtain Mean Opinion Scores
(MOS), and related considerations important for conducting such
experiments. We will also review a number of popular video quality
metrics and discuss the different approaches taken by these algorithms,
as well as issues with metric validation.
In the last part, we will discuss international standards related
to video quality and ongoing standardization activities in VQEG,
ITU, ATIS, VSF, and other groups. We will also review publicly available
resources for video quality testing and metrics development. Finally,
the tutorial will conclude with an outlook on future trends in this
area, in particular 3D video quality.
Course material to be provided will include the tutorial slides
and a website with links to supporting material such as white papers,
public datasets, and software resources.
Outline of the tutorial and material to be covered:
- Visual perception
- Vision modeling
- Quality factors
- Video compression, transmission, and other common distortions
- Subjective methods for video quality assessment
- Objective methods for video quality assessment
- How to validate quality metrics
- Video quality standards (VQEG, ITU, ATIS, VSF, etc.)
- Recent developments and trends (including 3D)
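As a concrete illustration of the metric-validation item above, the sketch below follows the standard recipe: compare objective metric scores against subjective Mean Opinion Scores (MOS) using the Pearson linear correlation (prediction accuracy) and the Spearman rank correlation (prediction monotonicity). The score arrays are hypothetical stand-ins for a real subjective dataset.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical scores for 8 test sequences (stand-ins for a real dataset)
metric_scores = np.array([0.91, 0.85, 0.60, 0.72, 0.40, 0.55, 0.95, 0.33])
mos           = np.array([4.5,  4.1,  2.9,  3.4,  2.0,  2.7,  4.8,  1.7])

plcc, _  = pearsonr(metric_scores, mos)    # prediction accuracy (linearity)
srocc, _ = spearmanr(metric_scores, mos)   # prediction monotonicity (rank order)
print(f"PLCC = {plcc:.3f}, SROCC = {srocc:.3f}")
```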
Speaker Biography
Stefan Winkler holds an M.Sc. degree in Electrical Engineering from the University of Technology in Vienna, Austria, and a Ph.D. degree from the École Polytechnique Fédérale de Lausanne (EPFL), Switzerland.
He is currently Chief Scientist at Cheetah Technologies. Prior to
that, he was Principal Technologist at Symmetricom and Chief Scientist
at Genista, which he co-founded in 2001. He has also held assistant
professor positions at the National University of Singapore (NUS)
and the University of Lausanne, Switzerland.
Dr. Winkler has 15 years of experience in the areas of image/video
processing and quality assessment. He has published more than 60
papers and is the author of the book "Digital Video Quality".
He serves as an Associate Editor for IEEE Transactions on Image
Processing. He has been an active contributor to the Video Quality
Experts Group (VQEG) since it was founded in 1997 and is currently
co-chair of the QoE Metrics Activity Group of the Video Services
Forum (VSF).
Tutorial 4: Video Processing Techniques for 3-D Television

Presenter
Yo-Sung Ho, Gwangju Institute of Science and Technology (GIST), Korea
Abstract
In recent years, various multimedia services have become available
and the demand for three-dimensional television (3DTV) is growing
rapidly. Since 3DTV is considered the next-generation broadcasting service that can deliver real and immersive experiences by supporting
user-friendly interactions, a number of advanced 3D video processing
technologies have been developed. In this tutorial lecture, we will
cover the current state-of-the-art video processing techniques for
3DTV, including camera calibration and image rectification, illumination
compensation and colour correction, depth map modelling and enhancement,
3D warping and depth map refinement, coding of multi-view video
and depth map, hole filling for occluded objects, and virtual view
synthesis.
Description of the tutorial and material to be covered:
Trend of Broadcasting Technologies
Review of 3DTV-related Activities
Video Processing Techniques for 3DTV
Pre-processing of multi-view video
   - Illumination compensation and colour correction
   - Camera calibration and image rectification
Depth map generation for multi-view video
   - Depth map modelling and enhancement
   - 3D warping and depth map refinement
Coding of multi-view video and depth map
   - Prediction structure
   - Compression operation
Intermediate view synthesis
   - Hole filling for occluded objects
   - Virtual view synthesis
Conclusions
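As a taste of the view-synthesis topics above, here is a minimal Python sketch of depth-image-based rendering: each pixel of a reference view is shifted horizontally by a disparity derived from its depth, a z-buffer resolves occlusions, and pixels that receive no data are flagged as holes for later filling. The linear depth-to-disparity mapping and the parameter values are illustrative assumptions, not the lecture's exact formulation.

```python
import numpy as np

def synthesize_view(ref, closeness, max_disparity=16):
    """Toy depth-image-based rendering: shift each reference pixel
    horizontally by a disparity proportional to its closeness to the
    camera; a z-buffer keeps the nearest pixel where several land on
    the same target. Unfilled pixels are marked as holes (-1)."""
    H, W = ref.shape
    disp = (closeness / closeness.max() * max_disparity).astype(int)
    out = -np.ones((H, W))                  # -1 marks holes to be filled
    zbuf = np.full((H, W), -np.inf)
    for y in range(H):
        for x in range(W):
            xv = x - disp[y, x]             # position in the virtual view
            if 0 <= xv < W and closeness[y, x] > zbuf[y, xv]:
                zbuf[y, xv] = closeness[y, x]
                out[y, xv] = ref[y, x]
    return out                              # holes still need inpainting
```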
Speaker Biography
Dr. Yo-Sung Ho received the B.S. and M.S. degrees in electronic engineering
from Seoul National University, Seoul, Korea, in 1981 and 1983,
respectively, and the Ph.D. degree in electrical and computer engineering
from the University of California, Santa Barbara, in 1990. He joined
ETRI (Electronics and Telecommunications Research Institute), Daejeon,
Korea, in 1983. From 1990 to 1993, he was with North America Philips
Laboratories, Briarcliff Manor, New York, where he was involved
in development of the Advanced Digital High-Definition Television
(AD-HDTV) system. In 1993, he rejoined the technical staff of ETRI
and was involved in development of the Korean DBS Digital Television
and High-Definition Television systems. Since 1995, he has been with Gwangju Institute of Science and Technology (GIST), where he is currently a Professor in the Department of Information and Communications. Since August 2003, he has been Director of the Realistic Broadcasting Research Center at GIST. He has given several tutorial lectures at various international conferences, including the IEEE International Conference on Image Processing (ICIP) in 2009 and the Pacific-Rim Conference on Multimedia (PCM) in 2006 and 2008. He is presently serving as an Associate Editor of IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT). His research interests include
digital image and video coding, advanced source coding techniques,
three-dimensional image modelling and representation, and three-dimensional
television (3DTV).
Tutorial 5: Video Tracking: Overview, Applications and Recent Developments

Presenters
Andrea Cavallaro, Queen Mary University of London
Emilio Maggio, Vicon
Abstract
This tutorial will cover the fundamental aspects of algorithm and
application development for image-based tracking. The tutorial sets
forth the state-of-the-art in image feature extraction, object detection
and tracking algorithms and their performance evaluation. Moreover,
a number of video tracking applications relevant to the ICIP audience will be described, such as surveillance, robotics, smart environments, video editing and human-computer interfaces.
The tutorial will discuss and demonstrate the latest video tracking algorithms with a unified and comprehensive coverage. Starting from the general problem definition and a review of existing and emerging video-based tracking applications, we will present popular methods, such as those based on correlation and gradient-descent minimization. Using practical examples and illustrations as support, we will engage the participants in a discussion of the advantages and the limitations of deterministic approaches, and we will guide them toward more efficient and accurate video-based tracking solutions. Recent algorithms based on the Bayes' recursive framework will be presented and their application to real-world tracking scenarios will be discussed. To better exemplify these methods, particular attention will be given to the video surveillance problem and to multi-hypothesis data association algorithms. Next, we give an insight into the forefront of the research by presenting, discussing and evaluating video-based multi-target tracking based on Finite Set Statistics. We will conclude the tutorial by introducing a collection of software resources and publicly available datasets to help the participants develop and test a video-based tracker.
The lecturers will provide the course material, which will be organized
into chapters according to the proposed syllabus. The tutorial slides
will be provided as PDF for inclusion in the course distribution
material. A website will be developed for the course, which will
contain links to supporting material and video segments that enrich
the learning experience of the participants.
Description of the tutorial and material to be covered:
1. Image-based tracking: introduction
This first part of the tutorial introduces the reader to the video-based
tracking problem and motivates it by reviewing and discussing the
most popular and emerging applications of video-based tracking.
Practical examples show state-of-the-art video-based tracking results.
a) Overview of the tutorial
b) Motivating application examples
c) Formalization of the image tracking problem
2. Image features and detectors
We discuss image pre-processing techniques
used in target tracking. Image features that are significant for
the disambiguation of the objects of interest from the background
are discussed in general and for specific applications. As automated
tracking systems incorporate a detection algorithm to initialize
and to produce measurements to be processed by the tracker, we will
also discuss detection methodologies for both low-level features (e.g., edges, texture and corners) and high-level objects (e.g., faces, vehicles and people).
a) Image pre-processing
b) Feature extraction
c) Appearance representation
d) Modelling appearance changes
e) Shape approximation
3. Video-based single-object tracking
We discuss algorithms that treat each target independently and use
as input the initial location of the target (initialization). First,
simpler methods based on correlation and gradient descent are introduced.
Next, a comprehensive treatment of the probabilistic Bayesian framework
is presented. Finally, we discuss the advantages of recursive filtering
with the aid of practical examples.
a) Gradient-based trackers
b) Bayes' tracking and the Kalman filter
c) Particle filter
d) Hybrid methods
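To ground item b) above, here is a minimal Python sketch of Bayes' recursive tracking in its simplest closed form: a constant-velocity Kalman filter for a 2-D point, alternating a motion-model prediction with a correction from the measured position. The motion model and noise levels are illustrative assumptions.

```python
import numpy as np

dt = 1.0
F = np.array([[1, 0, dt, 0],     # constant-velocity motion model:
              [0, 1, 0, dt],     # state = [x, y, vx, vy]
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
Hm = np.array([[1, 0, 0, 0],     # we only measure position
               [0, 1, 0, 0]], dtype=float)
Q = 0.01 * np.eye(4)             # process noise (assumed)
R = 1.0 * np.eye(2)              # measurement noise (assumed)

def kalman_step(x, P, z):
    """One predict/update cycle given previous state x, covariance P,
    and new position measurement z."""
    # Predict with the motion model
    x = F @ x
    P = F @ P @ F.T + Q
    # Correct with the measurement
    S = Hm @ P @ Hm.T + R                   # innovation covariance
    K = P @ Hm.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ (z - Hm @ x)
    P = (np.eye(4) - K @ Hm) @ P
    return x, P
```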
4. Video-based multi-object tracking
When the results from an object detector are available, the tracking problem can be solved by grouping subsequent instances of the same object over time, resolving association hypotheses to form object trajectories. Although the complexity of the problem grows combinatorially with the number of targets and the duration of the image sequence, simplifications can be introduced to reduce the number of association hypotheses.
This part of the tutorial first introduces deterministic methods
for data association such as the Nearest Neighbor and the Graph
Path Cover. Then, methodologies that solve the data association
as a probabilistic filtering problem are discussed. Finally, the
chapter introduces and discusses the recent multi-target framework
based on Finite Set Statistics and Random Finite Sets. This framework
elegantly extends Bayes' recursion to multiple objects and paves the way for a novel class of multi-object video tracking algorithms.
a) Data association
   - Nearest neighbour
   - Linear assignment
   - Multiple Hypothesis Tracking
b) Random Finite Sets for tracking
   - Random Finite Sets: introduction
   - Probability Hypothesis Density filter
5. Conclusions
a) Future development (context modelling, on-line learning)
b) Open research issues (tracking in crowds, tracking from UAV and
other mobile platforms)
c) Software and datasets available
Useful Links
Andrea Cavallaro, Queen Mary University of London - http://www.elec.qmul.ac.uk/staffinfo/andrea
Emilio Maggio, Vicon - http://www.vicon.com
Tutorial 6: Photometric Methods for 3-D Modeling

Presenters
Yasuyuki Matsushita, Bennett Wilburn, and Moshe Ben-Ezra
Microsoft Research Asia
Abstract
Over the past two decades, we have seen tremendous theoretical advances in photometric methods for 3-D modeling, where we wish to reconstruct the scene geometry from observed images under varying lighting conditions. These advances now motivate more practical applications in industry and even in our daily life.
This tutorial is a focused, vertical introduction to 3-D modeling from photometric signals. In the first part of the tutorial, we will begin with imaging models and cover various issues that arise when one applies photometric methods to 3-D modeling. This part also introduces the traditional photometric stereo problem in calibrated and uncalibrated cases, and shows solution methods for these problems. In the second part, we will show how to generalize the traditional methods so as to work under relaxed assumptions, such as non-Lambertian surfaces, dynamic scenes, unknown camera parameters, etc. The tutorial also covers the application of these techniques in real-world scenarios, such as computer graphics and digital archiving in e-Heritage.
Description of the tutorial and material to be covered:
The half-day tutorial will consist of two sessions of 1.5 hours each. The first session will cover the theory and algorithms of standard photometric methods, while the second will cover generalizations of the standard techniques and applications of the latest photometric methods. A tentative list of lectures and schedule for the tutorial is given below. Slides associated with these lectures will be made available for inclusion in the ICIP CDROM, and will also be included on the tutorial web page.
I. Introduction to Photometric Methods for 3-D Modeling (20 minutes)
II. Basic Theory and Algorithms in Photometric Methods (70 minutes)
   a. Basic Image Formation Models in Photometry
   b. Radiometric Calibration in the Imaging System
   c. Calibrated and Uncalibrated Photometric Stereo (a sketch follows this outline)
   d. Surface Reconstruction from Surface Orientations
III. Generalization of Photometric 3-D Modeling Methods (45 minutes)
   a. Photometric Stereo with Non-Lambertian Surfaces
   b. Self-Calibrating Photometric Stereo
   c. Geometrically Constrained Photometric Stereo
IV. Photometric Stereo with Specialized Hardware (45 minutes)
   a. Handheld Photometric Stereo Camera
   b. Photometric Stereo and Video Relighting for Dynamic Scenes
   c. Dense Photometric Stereo Using a Very High Resolution Camera
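The calibrated case in item II.c has a classic closed-form core, sketched below in Python under Lambertian assumptions: given three or more images under known, non-coplanar directional lights, the scaled surface normal at each pixel follows by least squares. This is only the textbook starting point; Sections III and IV relax exactly these assumptions.

```python
import numpy as np

def photometric_stereo(images, lights):
    """Calibrated Lambertian photometric stereo.
    images: (K, H, W) intensities under K known lights
    lights: (K, 3) unit lighting directions
    Returns per-pixel albedo (H, W) and unit normals (H, W, 3)."""
    K, H, W = images.shape
    I = images.reshape(K, -1)              # (K, H*W) intensity matrix
    # Lambertian model: I = L @ (albedo * normal); solve by least squares
    G, *_ = np.linalg.lstsq(lights, I, rcond=None)   # (3, H*W) scaled normals
    albedo = np.linalg.norm(G, axis=0)
    normals = (G / np.maximum(albedo, 1e-8)).T.reshape(H, W, 3)
    return albedo.reshape(H, W), normals
```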
Speaker Biographies
Yasuyuki Matsushita, Lead Researcher, Visual Computing Group, Microsoft Research Asia
Website: http://research.microsoft.com/en-us/people/yasumat/
Dr. Matsushita received his B.S., M.S. and Ph.D. degrees in EECS from
the University of Tokyo in 1998, 2000, and 2003, respectively. He
joined Microsoft Research Asia in April 2003. His areas of research
are computer vision (photometric techniques, such as radiometric
calibration, photometric stereo, shape-from-shading) and computer graphics
(image relighting, video analysis and synthesis). Dr. Matsushita
served as an Area Chair for IEEE Computer Vision and Pattern Recognition
(CVPR) 2009 and International Conference on Computer Vision (ICCV)
2009, and he is an editorial board member of the International Journal of Computer Vision (IJCV) and the IPSJ Journal of Computer Vision and
Applications (CVA). He also serves as a Program Co-Chair of PSIVT
2010.
Bennett Wilburn, Researcher, Visual Computing Group, Microsoft Research Asia
Website: http://research.microsoft.com/en-us/people/bwilburn
Dr. Wilburn received his B.S. and M.S. in Electrical Engineering, with a focus on VLSI design, in 1993. After working on microprocessors for Hewlett Packard, AMD and Silicon Graphics, he returned to Stanford, receiving his Ph.D. in Electrical Engineering in 2005. For his thesis,
he designed custom CMOS cameras for a scalable video camera array,
and devised high-performance imaging methods using the 100 camera
system. His primary research interest is hardware system design
for graphics and vision, especially for real-world performance capture,
modeling, and relighting. Recently he has
begun researching 3D displays and new interaction styles for mobile
devices.
Moshe Ben-Ezra, Lead Researcher, Visual Computing Group, Microsoft Research Asia
Website: http://research.microsoft.com/en-us/people/mosheb
Dr. Ben-Ezra received his Ph.D. in Computer Science from the Hebrew University of Jerusalem in 2000. He did his post-doctoral studies at Columbia University in the City of New York. Before joining Microsoft Research Asia in 2006, he was a member of the technical staff at Siemens Corporate Research. His research interests include physics-based and hardware-related computer vision.
Tutorial 7: To Tell a Good Picture from a Good One: Perceptual Visual Quality Evaluation

Presenter
Weisi Lin, School of Computer Engineering, Nanyang Technological University
Abstract
Quality evaluation of images and video is useful in many applications,
and also crucial in shaping almost all visual processing algorithms/systems,
as well as their implementation, optimization and testing. Since
the human visual system (HVS) is the final receiver and appreciator
for most processed images and videos (be they naturally captured
or computer generated), it would be beneficial to use a perceptual
quality criterion in the system design and optimization, instead
of a traditional one (e.g., MSE, SNR, PSNR, QoS). As a result of evolution, the HVS has developed unique characteristics. Significant research effort has been made toward modeling the HVS's picture quality evaluation mechanism and applying the models to various situations (quality metrics themselves, or other applications like image/video compression, watermarking, channel coding, signal restoration/enhancement, computer graphics, and visual content retrieval). In this tutorial, we will first introduce the problems associated with building perceptual visual quality metrics (PVQMs) that are in line with HVS perception, the relevant physiological/psychological knowledge, and the major research and development work so far in the related fields. The basic computational modules are to be discussed, with different applications presented (e.g., in visual signal compression, enhancement, image rendering, and content retrieval). Since such technology has started to find applications in industry, we will also discuss some examples of early industrial deployment, based on the presenter's substantial project exposure with various companies. The tutorial aims at providing a systematic, comprehensive and up-to-date overview of perceptual quality gauging for images and videos. It also provides a practical user's guide to the relevant techniques; all approaches are presented with clear classification and careful comparisons/comments whenever possible, based upon our understanding and experience in these areas.
Description of the tutorial and material to be covered:
Part 1. Introduction & Problem Statements (~25 mins)
The relevant concepts, necessity and difficulties of perceptual visual quality evaluation are first introduced. The progress, applications and challenges of the relevant research will be reviewed.
Part 2. Related Physiological & Psychological Findings (~35 mins)
An overview of the related human visual system (HVS) characteristics is given in this part, covering both physiological and psychological aspects. The results of important psycho-visual experiments are presented for both single-stimulus tests and real-world images. The emphasis is on the knowledge relevant to existing and future R & D efforts.
Part 3. Basic Computational Modules (~40 mins)
We present the oft-adopted computational modules in existing perceptual visual quality evaluation research and applications: spatial and temporal Contrast Sensitivity Function (CSF), luminance adaptation, visual attention, effect of eye movement, intra- and inter-band contrast masking, common artefact detection, and just-noticeable difference (JND).
Part 4. Perceptual Visual Quality Metrics (PVQMs) (~60 mins)
Different types of PVQMs will be presented: vision-based, signal-based and hybrid; full-reference, reduced-reference and no-reference. Discussions are to be given toward various operating domains (i.e., pixels, DCT, wavelet or other decompositions), according to the requirements/constraints in practice. Different recent approaches are to be highlighted for the two major processes: feature extraction and feature pooling. Subjective viewing tests and model verification with publicly available databases will also be discussed in this part. Some applications and early industrial deployment will be demonstrated with systems for both natural and computer-generated visual signals.
Part 5. Conclusions and Discussion on Future Work (~25 mins)
We will give a summary of the tutorial, concluding remarks, and our views toward possible future research and development in the areas related to perceptual visual quality evaluation and modelling.
Tutorial material to be provided
A copy of the PowerPoint presentation and a list of related papers, book chapters and websites for further reading.
Speaker Biography
Weisi Lin graduated from Zhongshan University, China, with B.Sc. and M.Sc. degrees in 1982 and 1985, respectively, and from King's College, London University, UK, with a Ph.D. in 1992. He has taught and conducted research at Zhongshan University, Shantou University (China), Bath University (UK), the National University of Singapore, the Institute of Microelectronics (Singapore), and the Institute for Infocomm Research (Singapore). He has been the project leader of 12 successfully delivered projects (mostly for industry) in digital multimedia technology development since 1997. He also served as the Lab Head of Visual Processing and the Acting Department Manager at the Institute for Infocomm Research. Currently, he is an Associate Professor in the School of Computer Engineering, Nanyang Technological University, Singapore. His areas of expertise
include image processing, video and audio compression, perceptual
modelling, computer vision, and multimedia communication. He is
a Chartered Engineer, and a fellow of the IET. He is currently an
Associate Editor of Journal of Visual Communication and Image Representation.
He believes that good theory is practical, and keeps a good balance
between academic research and industrial development.
Tutorial 8: Human Behavior Analysis

Presenters
Nicu Sebe, Nicola Conci (University of Trento)
Abstract
This tutorial will take a holistic view of the research issues and applications of human behavior analysis, focusing on the image-processing-related aspects. There are two main directions of interest: (1) close-range behavior analysis, which includes facial expression, eye tracking and gaze, and head pose analysis; and (2) far-range analysis, which includes body tracking, trajectory representation and matching, and activity detection.
Description of the tutorial and material to be covered:
Image and video processing plays a fundamental role in many new types
of interfaces and application areas (multimodal and attentive interfaces,
applications such as surveillance, ambient assisted living, etc.)
in which humans play a central role. This implies that building
systems capable of analyzing and understanding behaviors lies at
the crossroads of many research areas (psychology, artificial intelligence,
pattern recognition, image processing, computer vision, etc.).
Domains where human behavior understanding is crucial (e.g., human-computer
interaction, affective computing, surveillance, etc.) rely on advanced
image processing and pattern recognition techniques to automatically
interpret complex behavioral cues generated when humans act in their
natural environment. This is a challenging problem where many issues
are still open, including the joint modeling of behavioral cues
taking place at different time scales, the inherent uncertainty
of machine-detectable evidence of human behavior, the presence
of long term dependencies in observations extracted from human behavior,
and the important role of dynamics in human behavior understanding.
This tutorial is meant for researchers dealing with the problem of modeling
human behavior under its multiple facets (expression of emotions,
performance of individual or joint actions, etc.), with particular
attention to image processing approaches that model the actual dynamics
of behavior in close- and far-range domains. The contiguity with ICIP is expected to foster cross-pollination between several research communities, such as computer vision, image processing and human-centered computing, in order to merge the analysis carried out at different scales (gaze, posture, position), thereby improving the capability of capturing the whole dynamics of the action under investigation.
In this tutorial, we take a holistic approach to developing human-behavior
understanding systems. We aim to identify the important research
issues, and to ascertain potentially fruitful future research directions
in this area. In particular, we introduce key concepts, discuss
technical approaches and open issues in two areas: (1) close-range
behavior analysis: face tracking, facial expression analysis, eye
tracking and gaze, head pose analysis and (2) far-range analysis:
body tracking, activity detection, trajectory analysis, etc.
Each topic will be complemented by a number of demonstrations and
end-user applications that will be shown as viable implementations,
in order to highlight the benefits that image and video processing can bring to the deployment of tools in the areas of HCI, visual
surveillance, assisted living, rehabilitation, etc.
Benefits & List of Topics: This tutorial will enable the participants to understand key concepts, state-of-the-art techniques, and open issues in the areas described below. The tutorial will cover parts of the following topic areas:
- Vision for multimodal interaction: overview of techniques and state of the art in eye detection and visual gaze estimation.
- Emotion recognition for affective retrieval and in affective interfaces: approaches to multimedia content analysis and interaction that use facial expression recognition.
- Machine learning: adaptive interfaces and learning of visual patterns from user input for automatic detection and recognition.
- Vision for activity detection: analysis of trajectories of humans in indoor environments, posture and motion dynamics.
- Trajectory mining: query formulation, representation and retrieval of trajectories in video databases.
- Applications: traditional and emerging application areas will be described with specific examples in smart conference room research, interaction for people with disabilities, entertainment, and others.
This tutorial has been specifically designed for the audience of ICIP;
while the focus of the tutorial will be technical, we aim at giving
participants a broad view of research and important topics for developing
Human Behavior Analysis Systems. Materials will include an overview
of technical approaches for Vision-Based Human-Computer Interaction
as well as materials from numerous sources not typically present
at ICIP. Handouts will include presentation slides as well as relevant references.
Tutorial 9: Image Denoising and the SURE-LET Methodology

Presenters
Thierry Blu, Dept of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong SAR
Florian Luisier, Biomedical Imaging Laboratory, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland
Abstract
The goal of this tutorial is to introduce the audience to a new approach for dealing with noisy data - typically, images or videos here.
Image denoising consists in approximating the noiseless image by performing some, usually non-linear, processing of the noisy image. Most standard techniques involve assumptions on the result of this processing, i.e. the denoised image (sparsity, low high-frequency content, etc.).
Instead, the SURE-LET methodology that we promote consists in approximating the processing itself (seen as a function) by a linear combination of elementary non-linear processings (LET: Linear Expansion of Thresholds), and in optimizing the coefficients of this combination by minimizing a statistically unbiased estimate of the Mean Square Error (SURE: Stein's Unbiased Risk Estimate, for additive Gaussian noise).
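The practical appeal of this formulation is that SURE is quadratic in the LET coefficients, so the optimal combination follows from a small linear system, with no clean reference image needed. The Python sketch below illustrates this on a 1-D signal with pointwise soft-thresholds as the elementary processings; the threshold values are illustrative, and in practice the expansion is applied to transform-domain (e.g., wavelet) coefficients.

```python
import numpy as np

def sure_let_denoise(y, sigma, thresholds=(1.0, 2.0, 3.0)):
    """SURE-LET on a 1-D signal y = x + N(0, sigma^2).
    Estimate x_hat = sum_k a_k f_k(y), where the f_k are soft-thresholds,
    with coefficients a_k minimizing Stein's unbiased MSE estimate."""
    T = [t * sigma for t in thresholds]
    # Elementary thresholding functions f_k(y), stacked as columns
    F = np.stack([np.sign(y) * np.maximum(np.abs(y) - t, 0.0) for t in T], axis=1)
    # Divergence of each f_k (soft-threshold derivative is 1 where |y| > t)
    div = np.array([(np.abs(y) > t).sum() for t in T], dtype=float)
    # SURE(a) = ||F a - y||^2 + 2 sigma^2 <a, div> + const
    # Setting the gradient to zero: (F'F) a = F'y - sigma^2 * div
    a = np.linalg.solve(F.T @ F, F.T @ y - sigma**2 * div)
    return F @ a
```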
This tutorial will introduce the technique to the audience and outline its advantages (fast, noise-robust, flexible, image-adaptive). A very complete set of results will be shown and compared with the state of the art.
Extensions of the approach to Poisson noise reduction, with application to microscopy imaging, will also be shown.
Description of the tutorial and material to be covered:
1. Review of usual image denoising approaches
2. Stein's Unbiased Risk Estimate (SURE): estimating the MSE without assumptions on the ground truth
3. Approximation of the denoising process using a linear combination of thresholds in a transformed domain (LET)
a. Wavelets and simple thresholds
b. Wavelet interscale thresholds
4. The SURE-LET optimization
a. Orthogonal representations/transformations
b. Non-orthogonal/Redundant representations
5. Algorithm description and results
a. Grayscale image denoising
b. Color/Multichannel image denoising
c. Video denoising
6. Extension to Poisson denoising
a. Poisson MSE estimate (PURE)
b. Interscale algorithm and results
Speaker Biographies
Thierry Blu received Engineering Diplomas from École Polytechnique, France, in 1986 and from Télécom Paris (ENST), France, in 1988. In 1996, he obtained a Ph.D. in electrical engineering from ENST for a study on iterated rational filterbanks applied to wideband audio coding.
Between 1998 and 2007, he was with the Biomedical Imaging Group
at the Swiss Federal Institute of Technology (EPFL) in Lausanne,
Switzerland. He is now a Professor in the Department of Electronic
Engineering, The Chinese University of Hong Kong.
Dr. Blu was the recipient of two Best Paper Awards from the IEEE Signal Processing Society (2003 and 2006). He is also coauthor (with F. Luisier) of a paper that received a Young Author Best Paper Award (2009) from the same society. Between 2002 and 2006, he was an Associate Editor of the IEEE Transactions on Image Processing, and since 2006 he has been an Associate Editor of the IEEE Transactions on Signal Processing. He is also an Associate Editor of Elsevier Signal Processing and of the EURASIP Journal on Image and Video Processing. He is a member of the IEEE Technical Committee on "Signal Processing Theory and Methods".
Research interests: (multi)wavelets, multiresolution analysis, multirate filterbanks, interpolation, approximation and sampling theory, sparse sampling, image denoising, psychoacoustics, biomedical imaging, optics, wave propagation...
Florian Luisier was born in Switzerland in 1981. In 2005, he received his Master's degree in Microengineering from the Swiss Federal Institute of Technology (EPFL) in Lausanne, Switzerland. In 2010, he obtained
his Ph.D. in Computer, Communication, and Information Sciences from
the same institution. Since 2005, he has been with the EPFL's Biomedical
Imaging Group led by Prof. Michael Unser.
Dr. Luisier is the recipient of the 2009 Young Author Best Paper
Award from the IEEE Signal Processing Society. Since 2006, he has served as a reviewer for various international scientific journals,
including IEEE Transactions on Image Processing and IEEE Transactions
on Medical Imaging.
His research interests include image processing, multiresolution
representations, risk estimation techniques, and the restoration
of multidimensional biomedical data.
Tutorial 10: Low-rank Matrix Recovery: From Theory to Imaging Applications

Presenters
Yi Ma, University of Illinois
Zhouchen Lin, Microsoft Research Asia
John Wright, Microsoft Research Asia
Abstract
The goal of this tutorial is to provide the ICIP community with
an introduction to the quickly developing area of low-rank matrix
recovery. Low-rank (or approximately low-rank) matrices arise in
a great number of applications involving image and video data. A few recurrent examples in this tutorial will include aligning batches of images, super-resolution and video inpainting, and background modelling for visual tracking and surveillance. However, in real applications our observations are never perfect: observations are always noisy, often missing, and sometimes grossly or even maliciously corrupted. The recent excitement surrounding low-rank matrix recovery is due to very recent results showing that, under fairly general circumstances, the low-rank recovery problem can be efficiently and exactly solved by convex programming. These theoretical advances have inspired a flurry of algorithmic work, giving increasingly practical and scalable algorithms for solving the corresponding convex programs.
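As a concrete instance of the convex-programming approach, the following Python sketch implements Principal Component Pursuit with a basic augmented-Lagrangian iteration: the observed matrix is split into a low-rank part via singular value thresholding and a sparse error part via soft-thresholding. The penalty-parameter heuristic and fixed iteration count are illustrative simplifications, not the tuned solvers covered in the tutorial.

```python
import numpy as np

def rpca(M, n_iter=200):
    """Principal Component Pursuit: min ||L||_* + lam ||S||_1 s.t. M = L + S,
    solved with a basic augmented-Lagrangian (ALM) iteration."""
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n))                 # standard weight
    mu = 0.25 * m * n / np.abs(M).sum()            # heuristic penalty parameter
    soft = lambda X, t: np.sign(X) * np.maximum(np.abs(X) - t, 0.0)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                           # dual variable
    for _ in range(n_iter):
        # L-update: singular value thresholding of M - S + Y/mu
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * soft(s, 1.0 / mu)) @ Vt
        # S-update: elementwise soft-thresholding of the residual
        S = soft(M - L + Y / mu, lam / mu)
        # Dual ascent on the constraint M = L + S
        Y += mu * (M - L - S)
    return L, S
```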
The theory and algorithms described above, which the proposers have
had a strong role in developing, are already beginning to influence
practice in a number of areas, including collaborative filtering
and computer vision. However, we believe these results are poised
for even stronger impact in image processing. The purpose of this
tutorial is to bring these ideas to the ICIP community, by giving a solid and unified introduction to the existing theoretical and algorithmic state of the art in the area, and then showing how this theory and these algorithms are already being used to solve real imaging problems.
In particular, we will:
1. Familiarize participants with the basic problem setting and theory
of low-rank matrix recovery, including when the problem is well-posed
and when efficient solutions are possible.
2. Equip participants with algorithmic tools for solving matrix recovery problems, based on recent advances in non-smooth convex programming.
3. Give examples showing how these tools can be used to solve real-world
problems involving images and videos.
All lecture material (notes and slides) will be made available during
the tutorial, via the website:
http://watt.csl.illinois.edu/~perceive/matrix-rank/
That website also contains source code that will allow interested
participants to get a hands-on feel for the performance of the methods,
and to begin using them in their own research.
Description of the tutorial and material to be covered:
The half-day tutorial will consist of an introduction, followed by three
tutorial sessions of roughly 45 minutes each. The introduction will
motivate the tutorial by introducing several model applications,
and putting the work to be presented in the context of existing
work in the area.
The introductory session will be followed by a session introducing
the current theoretical understanding of the low-rank recovery problem,
including when the problem is well-posed, and when one can hope
for an efficient solution. This session will emphasize the ability of the new theory and algorithms to cope simultaneously with many non-ideal
factors in real application data, including missing elements, errors
and corruption, and noise.
Once participants have a good introductory feel for the current
theoretical state of the art in the area, the tutorial will move
into its second stage, in which we discuss practical and scalable
algorithmic solutions to the low-rank recovery problem. This section
will discuss recent advances in non-smooth convex optimization that
now enable the solution of moderate-to-large scale matrix recovery
problems in a matter of minutes on a standard PC. We will discuss
techniques from numerical linear algebra that can further improve
speed and scalability, from a user's perspective. Finally, we will
show how the algorithms introduced can be parallelized (for very
large scale applications) and implemented on the GPU (for time-critical
applications). The algorithms introduced will correspond to publicly
available code packages that participants can immediately begin
using in their own applications.
With this theoretical and algorithmic groundwork established, we
will close the tutorial with a number of applications from image
and video processing, as well as computer vision. We will show how
rank-minimization problems arise naturally in the applications contexts,
and how they can be understood and solved using the tools from the
first two sessions of the tutorial. We will emphasize how the basic
theory and algorithms extend to meet application challenges, for
example, in dealing with image transformations in batch image alignment
or image and video super-resolution.
Below is a more detailed outline of the planned tutorial sessions:
I. Introduction and Overview of Tutorial (5 min), Motivating Scenarios (10 min)
II. Basic Theory of Low-rank Matrix Recovery (45 min)
   a. Problem formulation: errors, missing data, noise
   b. Well-posedness: when can any algorithm recover a low-rank matrix?
   c. Guarantees: when can efficient algorithms recover a low-rank matrix?
III. Algorithms for Low-rank Matrix Recovery (45 min)
   a. Fast first-order methods for nonsmooth convex programming
   b. Warm starts and specific techniques for improved speed
   c. Parallel and GPU implementations
IV. Applications in Image and Video Processing (45 min)
   a. Batch image alignment by low-rank and sparse decomposition
   b. Video superresolution via low-rank optimization
   c. Applications in tracking and surveillance
Speaker Biographies
Yi Ma is an Associate Professor in the Electrical & Computer Engineering Department of the University of Illinois at Urbana-Champaign. Since January 2009, he has also been the research manager of the Visual Computing group at Microsoft Research Asia in Beijing. His main research interests are in computer vision, high-dimensional data analysis, and systems theory. He is the first author of the popular vision
textbook "An Invitation to 3-D Vision," published by Springer
in 2003. Yi Ma received two Bachelor's degrees, in Automation and Applied Mathematics, from Tsinghua University (Beijing, China) in
1995, a Master of Science degree in EECS in 1997, a Master of Arts
degree in Mathematics in 2000, and a PhD degree in EECS in 2000,
all from the University of California at Berkeley. Yi Ma received
the David Marr Best Paper Prize at the International Conference
on Computer Vision 1999, the Longuet-Higgins Best Paper Prize at
the European Conference on Computer Vision 2004, and the Sang Uk
Lee Best Student Paper Award with his students at the Asian Conference
on Computer Vision in 2009. He also received the CAREER Award from
the National Science Foundation in 2004 and the Young Investigator
Award from the Office of Naval Research in 2005. He is an associate
editor of IEEE Transactions on Pattern Analysis and Machine Intelligence
and has served as the chief guest editor for special issues for
the Proceedings of IEEE and the IEEE Signal Processing Magazine.
He will also serve as Program Chair for ICCV 2013 in Sydney, Australia.
He is a senior member of IEEE and a member of ACM, SIAM, and ASEE.
Zhouchen Lin is a researcher in the Visual Computing group at Microsoft Research Asia. He received his Bachelor's degree in pure mathematics from Nankai University in 1993 and his Master's and doctoral degrees in applied mathematics from Peking University in 1996 and 2000, respectively. His research interests include computer vision, image processing, machine learning, pattern recognition, numerical computation and optimization, and computer graphics. He is a guest professor at Shanghai Jiaotong University, Beijing Jiaotong University, Southeast University, and the Institute of Computing Technology, Chinese Academy of Sciences. He is a senior member of the IEEE.
John Wright is a researcher in the Visual Computing group at Microsoft Research Asia. He received his Ph.D. in Electrical Engineering from
the University of Illinois at Urbana-Champaign. His graduate work
focused on developing efficient and provably correct algorithms
for error correction with high-dimensional data, and on their application
in automatic face recognition. His research interests encompass
a number of topics in vision and signal processing, including minimum
description length methods for clustering and classification, error
correction and inference with non-ideal data, video analysis and
tracking, as well as face and object recognition. His work has received
a number of awards and honors, including a UIUC Distinguished Fellowship,
Carver Fellowship, Microsoft Research Fellowship, the UIUC Martin
Award for Outstanding Graduate Research, and the Lemelson-Illinois
Prize for Innovation.