Sunday, June 28, 2009

Readings in Multimedia Indexing and Retrieval

I created this topic to collect materials related to multimedia indexing/retrieval. They were selected to give people who are just starting research in this field an overview of the state of the art, of the difficulties and challenges, and of the problems that are still being actively worked on (unsolved problems).

Survey papers:

1. Concept-Based Video Retrieval - Cees Snoek et al. 2009
This is the most recent survey and covers the field in considerable detail. It is quite long, more than 100 pages.
Link download: http://staff.science.uva.nl/%7Ecgmsnoek/pub/snoek-concept-based-video-retrieval-fntir.pdf

The paper on Concept-Based Video Retrieval by myself and Marcel Worring has appeared in Foundations and Trends® in Information Retrieval. In this paper, we review 300 references on video retrieval, indicating when text-only solutions are unsatisfactory and showing the promising alternatives which are in majority concept-based. Therefore, central to our discussion is the notion of a semantic concept: an objective linguistic description of an observable entity. Specifically, we present our view on how its automated detection, selection under uncertainty, and interactive usage might solve the major scientific problem for video retrieval: the semantic gap. To bridge the gap, we lay down the anatomy of a concept-based video search engine. We present a component-wise decomposition of such an interdisciplinary multimedia system, covering influences from information retrieval, computer vision, machine learning, and human-computer interaction. For each of the components we review state-of-the-art solutions in the literature, each having different characteristics and merits. Because of these differences, we cannot understand the progress in video retrieval without serious evaluation efforts such as carried out in the NIST TRECVID benchmark. We discuss its data, tasks, results, and the many derived community initiatives in creating annotations and baselines for repeatable experiments. We conclude with our perspective on future challenges and opportunities. The paper is available for download now.

2. Image Retrieval: Ideas, Influences, and Trends of the New Age - R. Datta et al. 2008
Published in ACM Computing Surveys. The authors lean toward the automatic image annotation approach.
Link download: http://infolab.stanford.edu/~wangz/project/imsearch/review/JOUR/datta.pdf

We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid foundation to such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger association of weakly related fields. In this article, we survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and in the process discuss the spawning of related subfields. We also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real world. In retrospect of what has been achieved so far, we also conjecture what the future may hold for image retrieval research.


3. Content-based Image Retrieval at the End of the Early Years - A. Smeulders et al. 2000
Published in PAMI and written by pioneers of the field, this paper reviews research results up to the year 2000. The definition of the semantic gap that other papers usually cite comes from this paper.
Link download: https://eprints.kfupm.edu.sa/32245/1/32245.pdf

The paper presents a review of 200 references in content-based image retrieval. The paper starts with discussing the working conditions of content-based retrieval: patterns of use, types of pictures, the role of semantics, and the sensory gap. Subsequent sections discuss computational steps for image retrieval systems. Step one of the review is image processing for retrieval sorted by color, texture, and local geometry. Features for retrieval are discussed next, sorted by: accumulative and global features, salient points, object and shape features, signs, and structural combinations thereof. Similarity of pictures and objects in pictures is reviewed for each of the feature types, in close connection to the types and means of feedback the user of the systems is capable of giving by interaction. We briefly discuss aspects of system engineering: databases, system architecture, and evaluation. In the concluding section, we present our view on: the driving force of the field, the heritage from computer vision, the influence on computer vision, the role of similarity and of interaction, the need for databases, the problem of evaluation, and the role of the semantic gap.
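
For readers new to the area, here is a minimal sketch of what a "global feature" and a similarity measure can look like in practice: a coarse color histogram compared by histogram intersection. Histogram intersection is only one common choice of similarity; the function names, the 4 bins per channel, and the toy pixel lists are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Normalized global color histogram of a list of (r, g, b) pixels in 0..255."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_id: n / total for bin_id, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of the bin-wise minima of two normalized histograms."""
    return sum(min(v, h2.get(b, 0.0)) for b, v in h1.items())


# Toy "images" (hypothetical data): all red, mostly red with one blue pixel, all blue.
red = [(250, 10, 10)] * 4
reddish = [(240, 20, 20)] * 3 + [(10, 10, 250)]
blue = [(10, 10, 250)] * 4

# Rank the two candidate images by their similarity to the red query image.
query = color_histogram(red)
ranked = sorted(
    {"reddish": reddish, "blue": blue}.items(),
    key=lambda item: histogram_intersection(query, color_histogram(item[1])),
    reverse=True,
)
print([name for name, _ in ranked])
```

A histogram like this ignores where colors appear in the image, which is one reason the survey also covers salient points, shape features, and structural combinations.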

4. Video Retrieval Based on Semantic Concepts - A. Hauptmann et al. 2008
Published by IEEE, this paper summarizes the research results of Prof. A. Hauptmann's group at CMU, which is very well known for the Informedia system, one of the first systems in this field.
Link download: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=04472084

An approach using many intermediate semantic concepts is proposed with the potential to bridge the semantic gap between what a color, shape, and texture-based low level image analysis can extract from video and what users really want to find, most likely using text descriptions of their information needs. Semantic concepts such as cars, planes, roads, people, animals, and different types of scenes (outdoor, night time, etc.) can be automatically detected in the video with reasonable accuracy. This leads us to ask how can they be used automatically and how does a user (or a retrieval system) translate the user’s information need into a selection of related concepts that would help find the relevant video clips, from the large list of available concepts. We illustrate how semantic concept retrieval can be automatically exploited by mapping queries into query classes and through pseudo-relevance feedback. We also provide evidence how a semantic concept can be utilized by users in interactive retrieval, through interfaces that provide affordances of explicit concept selection and search, concept filtering, and relevance feedback. How many concepts we actually need and how accurately they need to be detected and linked through various relationships is specified in the ontology structure.
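
To make the pseudo-relevance feedback idea mentioned in the abstract more concrete, here is a minimal sketch over concept detector scores. It is not the method of the paper: the Rocchio-style weight update, the function names, the parameters `top_k`, `alpha`, and `beta`, and the detector scores are all assumptions chosen for illustration. The initial weights stand in for whatever query-to-concept mapping a real system would use.

```python
def score(shot_scores, weights):
    """Weighted sum of the concept detector scores of one shot."""
    return sum(weights.get(c, 0.0) * s for c, s in shot_scores.items())


def rank(shots, weights):
    """Shot ids ordered by descending score."""
    return sorted(shots, key=lambda sid: score(shots[sid], weights), reverse=True)


def pseudo_relevance_feedback(shots, weights, top_k=2, alpha=1.0, beta=0.5):
    """Assume the top_k shots of the initial ranking are relevant and move the
    concept weights toward their mean detector scores (Rocchio-style update)."""
    top = rank(shots, weights)[:top_k]
    concepts = {c for sid in top for c in shots[sid]} | set(weights)
    updated = {}
    for c in concepts:
        mean = sum(shots[sid].get(c, 0.0) for sid in top) / len(top)
        updated[c] = alpha * weights.get(c, 0.0) + beta * mean
    return updated


# Hypothetical detector outputs: one score per concept and shot.
shots = {
    "shot1": {"car": 0.9, "road": 0.8, "outdoor": 0.7},
    "shot2": {"car": 0.8, "road": 0.1, "outdoor": 0.9},
    "shot3": {"car": 0.1, "road": 0.9, "outdoor": 0.8},
    "shot4": {"car": 0.0, "road": 0.0, "outdoor": 0.1},
}

# Initial query, assumed to have been mapped to a single concept.
initial = {"car": 1.0}
expanded = pseudo_relevance_feedback(shots, initial)

print(rank(shots, initial))
print(sorted(expanded))
print(rank(shots, expanded))
```

After feedback the query also carries weight on "road" and "outdoor", concepts that co-occur with "car" in the top-ranked shots. If the initial top results are wrong, the update reinforces the error, which is the usual caveat of pseudo-relevance feedback.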


5. TRECVID
TRECVID is currently the reference benchmark for the state of the art in this field. Every year it runs shared tasks on which research groups test (and compete with) their methods.
Link download: http://www-nlpir.nist.gov/projects/tvpubs/tv.pubs.org.html

6. Lectures
Summer School on Multimedia Semantics.
Link download: http://www.dcs.gla.ac.uk/ssms07/material.html

Leading research groups:
- MediaMill - UvA: A. Smeulders, M. Worring, C. Snoek.
- DVMM - Columbia University: Shih-Fu Chang.
- Informedia - CMU: A. Hauptmann.
- VIREO - CityUHK: Chong-Wah Ngo.
- IBM Research - J. Smith, R. Yan.

See more here: http://ledduy.blogspot.com/2006/10/people-of-interest.html


Conferences in the field:
- ACM Multimedia - held every year in October.
- IEEE ICME - held every year in June or July.
- ACM MIR - a conference recently spun off from a workshop of ACM Multimedia.
- ACM CIVR - held every year in March. Groups working on TRECVID often submit papers here.
- IEEE ICASSP - held every year in April. It is mainly about signal processing, but has tracks for video indexing and retrieval.

See more here: http://ledduy.blogspot.com/2006/10/conferences-of-interest.html

Journals in the field:
- IEEE Trans on Multimedia.
- ACM Trans on Multimedia Computing, Communications, and Applications.
- Multimedia Tools and Applications.
- Multimedia Systems.

See more here: http://ledduy.blogspot.com/2006/10/journals-of-interest.html

Lê Đình Duy

Read the full post at http://ledduy.blogspot.com/2009/06/readings-in-multimedia-indexing-and.html
