Proceedings of the 2nd ACM International Conference on Multimedia Retrieval - ICMR '12, 2012
The estimation of demographic target groups for web videos -- with applications in ad targeting -- poses a challenging problem, as the textual descriptions and view statistics available for many clips are extremely sparse. Therefore, the goal of this paper is to link a clip's popularity across different viewer ages and genders on the one hand with the video content on the other: Employing user comments and user profiles on YouTube, we show that there is a strong correlation between demographic target groups and semantic concepts appearing in the video (like "teenage male" and "skateboarding"). Based on this observation, we suggest two approaches: First, the demographic target group of a clip is predicted automatically via content-based concept detection. Second, should sufficient view statistics already give a good impression of a video's audience, we show that this information can serve as a valuable additional signal to disambiguate concept detection. Our experimental results on a dataset of 14,000 YouTube clips commented on by 1 million users show that -- though content-based viewership estimation is a challenging problem -- suitable demographic groups can be suggested by concept detection. Also, a combination with demographic information as an additional signal leads to relative improvements of concept detection accuracy by 47%.
Proceedings of International Conference on Multimedia Retrieval - ICMR '14, 2014
The growing amounts of multimedia data being made available and shared via the Internet pose an increasing problem for law enforcement to investigate the distribution and possession of child sexual abuse (CSA) media. In this paper we address the automatic detection of CSA material in image and video data by multi-modal feature description. Instead of analyzing hash sums or file names, we propose content-based analysis of visual and, in the case of videos, also audio features. To this end, we apply multiple low-level features as well as SentiBank, a novel mid-level representation of visual content. In collaboration with police partners and European cyber crime units, we conducted experiments on several datasets, including real-world CSA media. Our quantitative evaluation reveals the challenging nature of child pornography detection, especially in the joint presence of non-illegal pornographic data, which renders skin detection, a popular feature for detecting pornography, less discriminative. Further, the utilization of SentiBank features shows high potential for detection and explainability of such content. Overall, multi-modal feature fusion can achieve an improved detection accuracy, reducing equal error rate from 17% to 10% for images and from 16% to 8% for videos as compared to best single feature performance for the challenging task of classifying CSA content from adult media.
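As background for the equal error rate figures quoted above: the equal error rate is the operating point at which the false acceptance rate and the false rejection rate of a detector coincide. The following is a minimal sketch of how such a value can be estimated from classifier scores; it is not the authors' evaluation code, and the function name `equal_error_rate` and the toy scores are assumptions made for illustration only.

```python
def equal_error_rate(scores, labels):
    """Estimate the equal error rate (EER) of a binary detector.

    scores: real-valued detector outputs (higher = more likely positive).
    labels: 1 for positive samples, 0 for negative samples.
    Returns (eer, threshold) at the threshold where the false acceptance
    rate and false rejection rate are closest to each other.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    best = None
    for t in sorted(set(scores)):
        # A sample is accepted as positive when its score is >= t.
        far = sum(s >= t for s in negatives) / len(negatives)  # false acceptances
        frr = sum(s < t for s in positives) / len(positives)   # false rejections
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # Report the mean of both rates at the closest crossing point.
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical toy data, not taken from the paper.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
eer, threshold = equal_error_rate(toy_scores, toy_labels)
print(f"EER = {eer:.2f} at threshold {threshold}")
```

With finitely many samples the two error rates rarely match exactly, so the sketch reports their mean at the threshold where they are closest.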
Multimedia event detection (MED) on user-generated content is the task of finding an event, e.g., a Flash mob or Attempting a bike trick, using its content characteristics. Recent research has focused on approaches that use semantically defined "concepts" trained with annotated audio clips. Using audio concepts allows us to show semantic evidence of their relationship to events, by looking at the probability distribution of the audio concepts per event. However, while the concept-based approach has been useful in image detection, audio concepts have generally not surpassed the performance of low-level audio features like Mel Frequency Cepstral Coefficients (MFCCs) in addressing the unstructured acoustic composition of video events. Such audio-concept based systems could benefit from temporal information, due to one of the intrinsic characteristics of audio: it occurs across a time interval. This paper presents a multimedia event detection system that uses audio concepts; it exploits the temporal correlation of audio characteristics for each particular event at two levels. The first level involves analyzing the short- and long-term surrounding context information for the audio concepts, through an implementation of a Hierarchical Deep Neural Network (H-DNN), to determine engineered audio-concept features. At the second level, we use Hidden Markov Models (HMMs) to describe the continuous and non-stationary characteristics of the audio signal throughout the video. Experiments using the TRECVID MED 2013 corpus show that an HMM system based on audio-concept features can perform competitively when compared with an MFCC-based system.
Run No. | Run ID | Run Description | infMAP (%)
Training on IACC data:
1 | F A DFKI-MADM 3 | SIFT visual words, Color Correlograms and Face-Detection separately trained, late fusion of SVMs scores | 5.0
2 | F A DFKI-MADM 4 | SIFT visual words with SVMs | 4.4
Training on YouTube:
3 | F D DFKI-MADM 1 | SIFT visual words, Color Correlograms and Face-Detection separately trained, late fusion of SVMs scores | 2.1
4 | F B DFKI-MADM 2 | SIFT visual words with SVMs | 1.3
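The run descriptions above mention late fusion of separately trained SVM scores. The listing does not say which fusion rule was used, so the following is only a sketch of one common variant, a weighted average of per-feature scores; the function name `late_fusion`, the default equal weights, and the example scores are all assumptions for illustration.

```python
def late_fusion(score_lists, weights=None):
    """Fuse per-feature classifier scores by a weighted average.

    score_lists: one list of scores per classifier (e.g. one per feature
                 type), all covering the same items in the same order.
    weights:     optional per-classifier weights; equal weights by default.
    Returns one fused score per item.
    """
    n_models = len(score_lists)
    if n_models == 0:
        raise ValueError("need at least one list of scores")
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("need exactly one weight per classifier")

    n_items = len(score_lists[0])
    if any(len(scores) != n_items for scores in score_lists):
        raise ValueError("all classifiers must score the same items")

    total = sum(weights)
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_lists)) / total
        for i in range(n_items)
    ]


# Hypothetical scores from three separately trained classifiers for two items.
sift_scores = [0.2, 0.8]
color_scores = [0.4, 0.6]
face_scores = [0.6, 0.4]
print(late_fusion([sift_scores, color_scores, face_scores]))
```

Averaging assumes the classifiers emit scores on a comparable scale (for example, calibrated probabilities); raw SVM margins would usually need normalization first.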
Gesellschaft für Informatik (GI) publishes this series in order to make available to a broad public recent findings in informatics (i.e., computer science and information systems), to document conferences that are organized in co-operation with GI and to publish the annual GI Award dissertation.
With the availability of large-scale online video platforms like YouTube, copyright infringement becomes a severe problem, such that the demand for robust copy detection systems is growing. Such a system must find multiple occurrences of copyright-protected material within video clips that are created, modified, remixed and uploaded by the user. A particular challenge is to find the exact position of a copy in a (potentially huge) reference database. For this purpose, this paper presents a Content Based Copy Detection system ...
Among the vast information available on the web, social media streams capture what people currently pay attention to and how they feel about certain topics. Awareness of such trending topics plays a crucial role in multimedia systems such as trend-aware recommendation and automatic vocabulary selection for video concept detection systems.
Papers by D. Borth