Tokyo Tech News

Semantic indexing system for video search using a data-driven approach


Published: February 14, 2014

The volume of video data on the Internet increases rapidly each year, and most of it consists of low-quality consumer videos without text tags. This creates strong demand for video search techniques based on image and video features, so-called “content-based video retrieval” (CBVR).

Video semantic indexing systems retrieve, from Internet video data, videos containing “concepts” that are meaningful to users, without using any text information such as tags or metadata. The concepts include objects such as cars and chairs, scenes such as sunsets and families having an enjoyable time, and events such as wedding ceremonies and fireworks.

Nakamasa Inoue, Koichi Shinoda, and colleagues at the Department of Computer Science at Tokyo Tech have developed a system using a data-driven method based on probability theory.

The researchers modelled each concept by a Gaussian Mixture Model (GMM) whose parameters were obtained by Maximum A Posteriori (MAP) estimation. A GMM supervector, which is made by concatenating the GMM means, was used as input for the detection process using Support Vector Machines (SVMs). The Tokyo Tech supercomputer, TSUBAME, provided the large computational resources needed to complete this task.
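To make the supervector idea concrete, the following is a minimal sketch in plain Python, not the researchers' implementation. It assumes, as is common with GMM supervectors, that a shared background GMM with diagonal covariances is MAP-adapted to the feature vectors of one video, and that only the means are adapted. The function names, the relevance factor `tau`, and the toy data are all hypothetical and chosen for illustration. The SVM stage is omitted; the resulting supervector would be passed to any SVM implementation as a fixed-length input.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian at point x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def map_adapt_means(features, weights, means, variances, tau=20.0):
    """MAP-adapt the means of a background GMM to one set of features.

    tau is an assumed relevance factor: larger values keep the adapted
    means closer to the background means.
    """
    num_components = len(weights)
    dim = len(means[0])
    counts = [0.0] * num_components                       # soft counts n_k
    sums = [[0.0] * dim for _ in range(num_components)]   # sum_i r_ik * x_i

    for x in features:
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:
            continue  # feature is too far from every component to count
        for k, like in enumerate(likes):
            resp = like / total  # responsibility of component k for x
            counts[k] += resp
            for d in range(dim):
                sums[k][d] += resp * x[d]

    # Adapted mean: (tau * prior_mean + weighted data sum) / (tau + n_k)
    return [
        [(tau * means[k][d] + sums[k][d]) / (tau + counts[k])
         for d in range(dim)]
        for k in range(num_components)
    ]


def supervector(adapted_means):
    """Concatenate the adapted component means into one flat vector."""
    return [value for mean in adapted_means for value in mean]


# Toy background model: two 2-D components (hypothetical values).
bg_weights = [0.5, 0.5]
bg_means = [[0.0, 0.0], [10.0, 10.0]]
bg_vars = [[1.0, 1.0], [1.0, 1.0]]

# Toy features for one video, all close to the first component.
video_features = [[1.0, 1.0]] * 20

sv = supervector(map_adapt_means(video_features, bg_weights, bg_means, bg_vars))
print(len(sv), sv)
```

In this toy case the first component's mean is pulled toward the video's features while the second stays essentially at its background value, so videos with different content yield different fixed-length vectors that a classifier can compare.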

At TRECVID, an annual international workshop on video information search technology hosted by the National Institute of Standards and Technology (NIST), the Shinoda lab system achieved the best performance for video semantic indexing in two consecutive years (2011 and 2012), outperforming 15 teams from all over the world.

This system is useful not only for searching videos using text input, but also as a component for building a detection system for "complex events" composed of several concepts, such as "changing a vehicle tire" and "making a sandwich". This project is sponsored by Canon Inc.

Examples of video search results


Authors:
Koichi Shinoda, Nakamasa Inoue
Title of original paper:
Reusing Speech Techniques for Video Semantic Indexing
Journal, volume, pages and year:
IEEE Signal Processing Magazine, vol. 30, no. 2, pp. 118-122, Mar. 2013

Authors:
Nakamasa Inoue, Koichi Shinoda
Title of original paper:
A Fast and Accurate Video Semantic-Indexing System Using Fast MAP Adaptation and GMM Supervectors
Journal, volume, pages and year:
IEEE Transactions on Multimedia, vol. 14, no. 4-2, pp. 1196-1205, Aug. 2012