Tokyo Tech News
Tokyo Institute of Technology merged with Tokyo Medical and Dental University to form Institute of Science Tokyo (Science Tokyo) on October 1, 2024.
Over time, content on this site will be migrated to the Science Tokyo Web. Any information published on this site will be valid in relation to Science Tokyo.
Published: February 14, 2014
The volume of video data on the Internet increases rapidly each year, and most of it consists of low-quality consumer videos without text tags. There is therefore strong demand for video search techniques based on image and video features, so-called content-based video retrieval (CBVR).
Video semantic indexing systems retrieve, from Internet video data, the videos that contain concepts meaningful to users, without relying on text information such as tags or metadata. The concepts include objects such as cars and chairs, scenes such as sunsets and families having an enjoyable time, and events such as wedding ceremonies and fireworks.
Nakamasa Inoue, Koichi Shinoda, and colleagues at the Department of Computer Science at Tokyo Tech have developed a system that uses a data-driven method based on probability theory.
The researchers modelled each concept by a Gaussian Mixture Model (GMM) whose parameters were obtained by Maximum A Posteriori (MAP) estimation. A GMM supervector, made by concatenating the GMM means, was used as the input to the detection process, which uses Support Vector Machines (SVMs). The Tokyo Tech supercomputer, TSUBAME, provided the large computational resources needed to complete this task.
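The idea of a GMM supervector can be illustrated with a small sketch. It is not the Shinoda lab implementation. It assumes the recipe common in speech processing, in which the means of a shared background GMM with diagonal covariances are MAP-adapted to the features of one video clip and then concatenated. The function names, the toy two-component model, and the relevance factor `tau` are all hypothetical, and the SVM stage that would consume the supervector is omitted.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def responsibilities(x, weights, means, variances):
    """Posterior probability of each mixture component given x."""
    logs = [
        math.log(w) + log_gauss_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)  # subtract the maximum for numerical stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def map_adapt_means(features, weights, means, variances, tau=20.0):
    """MAP-adapt the background GMM means to one clip's feature vectors.

    tau is an assumed relevance factor: larger values keep the adapted
    means closer to the background (prior) means.
    """
    n_comp, dim = len(means), len(means[0])
    counts = [0.0] * n_comp                       # soft counts per component
    sums = [[0.0] * dim for _ in range(n_comp)]   # weighted feature sums
    for x in features:
        post = responsibilities(x, weights, means, variances)
        for k, c in enumerate(post):
            counts[k] += c
            for d in range(dim):
                sums[k][d] += c * x[d]
    return [
        [(tau * means[k][d] + sums[k][d]) / (tau + counts[k]) for d in range(dim)]
        for k in range(n_comp)
    ]


def supervector(adapted_means):
    """Concatenate the adapted component means into one flat vector."""
    return [value for mean in adapted_means for value in mean]


# Toy background model: two components in two dimensions (assumed values).
WEIGHTS = [0.5, 0.5]
MEANS = [[0.0, 0.0], [4.0, 4.0]]
VARIANCES = [[1.0, 1.0], [1.0, 1.0]]

clip_features = [[1.0, 1.0]] * 10  # stand-in for descriptors from one clip
sv = supervector(map_adapt_means(clip_features, WEIGHTS, MEANS, VARIANCES))
print(len(sv), sv)
```

Every clip yields a vector of the same length (number of components times feature dimension), whatever the number of feature vectors extracted from it. That fixed length is what lets a standard classifier such as an SVM take the supervectors as input.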
At TRECVID, an annual international workshop on video information search technology hosted by the National Institute of Standards and Technology (NIST), the Shinoda lab system achieved the best performance in video semantic indexing in two consecutive years (2011 and 2012), outperforming 15 teams from all over the world.
This system is useful not only for searching videos with text input, but also as a component for building detection systems for "complex events" composed of several concepts, such as "changing a vehicle tire" and "making a sandwich". This project is sponsored by Canon Inc.
Authors: Koichi Shinoda, Nakamasa Inoue
Title of original paper: Reusing Speech Techniques for Video Semantic Indexing
Journal, volume, pages and year: IEEE Signal Processing Magazine, vol. 30, no. 2, pp. 118-122, Mar. 2013

Authors: Nakamasa Inoue, Koichi Shinoda
Title of original paper: A Fast and Accurate Video Semantic-Indexing System Using Fast MAP Adaptation and GMM Supervectors
Journal, volume, pages and year: IEEE Transactions on Multimedia, vol. 14, no. 4-2, pp. 1196-1205, Aug. 2012