
Accommodating Hybrid Retrieval in a Comprehensive Video Data(4)


Figure 3.2 Activity Model and a “Football” example

In Figure 3.2, the left-hand side shows the architecture of the Activity Model and the right-hand side shows a “Football” example. The model consists of four levels: Activity, Event, Motion, and Object. The top three levels are mainly used by the query language processor to reason about which activity the user wants to retrieve, so that the processor can retrieve the video scene from the database according to the objects (i.e., features with spatio-temporal semantics) at the bottom level of the model.
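
As a concrete illustration (not the authors' actual schema), the four levels can be pictured as a nested dictionary; the sketch below is a minimal Python rendering of the “Football” branch from Figure 3.2, with all names chosen for illustration only.

# Hypothetical sketch of the four-level Activity Model ("Football" branch).
# Levels, from top to bottom: Activity -> Event -> Motion -> Object.
ACTIVITY_MODEL = {
    "sports": {                                  # Activity level
        "football": {                            # Event level
            "kicking": {                         # Motion level
                "objects": ["player", "football", "goal"],  # Object level
            },
            "passing": {
                "objects": ["player", "football"],
            },
        },
    },
}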

A user may input an activity into the Specification Language Processing component (shown in Figure 3.1 (c.1)) using terms such as “playing sports”, “playing football”, “kicking football”, or “kicking”. More specific terms yield more specific retrieval results. In general, the first word is a verb and the word that follows is a noun. The processor analyzes the term starting from the Motion level. Once some keywords are matched, the processor searches upward to the Event level using the second word. For example, if the term is “kicking football”, the processor searches for “kicking” at the Motion level, and then uses “football” to search at the Event level. If the term is “playing football” and there is no “playing” at the Motion level, the processor will consult a thesaurus for the word and then search again. However, if no word in the model matches, the processor simply skips the “verb” and searches for the “noun” from the Event level up to the Activity level. Once the search threshold is met, the processor descends to the corresponding Object level. It then inputs those objects from the Object level into the Feature Index Tree as features and asks the user to input some spatio-temporal semantics (ST-Features) into the database (shown in Figure 3.1 (c.2)).
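
The term-resolution procedure just described can be sketched as follows. This is only a minimal reading of the text, assuming the ACTIVITY_MODEL dictionary above and a hypothetical synonyms() helper standing in for the thesaurus lookup.

def synonyms(word):
    # Hypothetical thesaurus stand-in; a real system would consult an
    # actual thesaurus to map, e.g., "playing" onto known motions.
    return {"playing": ["kicking", "passing"]}.get(word, [])

def resolve_activity(term):
    # Assumes a two-word "verb noun" term, per the description above.
    verb, noun = term.split()
    # Step 1: match the verb at the Motion level (with thesaurus
    # fallback), then confirm the noun at the Event level.
    for events in ACTIVITY_MODEL.values():
        for event, motions in events.items():
            for candidate in [verb] + synonyms(verb):
                if candidate in motions and noun == event:
                    return motions[candidate]["objects"]
    # Step 2: no verb match at all -- skip the verb and search the noun
    # from the Event level up to the Activity level.
    for activity, events in ACTIVITY_MODEL.items():
        if noun == activity or noun in events:
            matched = events if noun == activity else {noun: events[noun]}
            objs = set()
            for motions in matched.values():
                for motion in motions.values():
                    objs.update(motion["objects"])
            return sorted(objs)
    return []

# resolve_activity("kicking football") -> ["player", "football", "goal"]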

At a later time, the user may want to retrieve video data from the database based on some activities. For example, he may input an activity query like “kicking football”. The Query Language Processor first gets a collection of objects from the Activity Model (shown in Figure 3.1 (e)) and then retrieves the result through the original query processing (CAROL/ST) by treating the collection of objects as Features and ST-Features. The main purpose of the Activity Model is therefore to facilitate the annotation of all common and significant activities.
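
Under the same assumptions, the hand-off to the original processing could look like the sketch below; carol_st_query() is a hypothetical stand-in for the existing CAROL/ST query processor, not its real interface.

def carol_st_query(features, st_features):
    # Placeholder for the original CAROL/ST retrieval, which matches
    # Features and ST-Features against the Feature Index Tree.
    raise NotImplementedError

def activity_query(term, st_features=()):
    # Resolve the activity term into objects, then treat those objects
    # as Features for the original CAROL/ST processing.
    features = resolve_activity(term)
    return carol_st_query(features, list(st_features))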

3.2 CBR Extension to CAROL/ST

While CAROL/ST can facilitate effective retrieval based on rich semantics, for multimedia data such as video the visual content is also an inseparable (and often more significant) part, which is difficult to describe with text. On the other hand, the content-based approach of automatically extracting and indexing visual features has been a major trend in computer vision and video processing. To draw on the best strengths of both areas, an extended version of VideoMAP, which we term VideoMAP+ [CWLZ01], is developed to support hybrid retrieval of videos through both query-based and content-based access. Here we adopt only visual content in our prototype.

Figure 3.3 Architecture of VideoMAP+

The architecture of VideoMAP+ is shown in Figure 3.3 (a modified version of Figure 3.1). Here, the Feature Extraction Component (FEC) is newly added. During Video Segmentation (by the VCC), visual feature vectors of the video and of the other objects defined in it are extracted, such as color, texture, and shape. The Hybrid Query Language Processing module supports three kinds of retrieval format: CAROL/ST Retrieval, the original format which mainly uses the semantic annotations and spatio-temporal relations of the video; Content-based Retrieval, the newly added format which mainly uses the visual information inherent in the video content; and their Hybrid Combination Retrieval. CBR query functions are incorporated to form a hybrid query language. Hence the indices are now built over more video objects, and the returned results also include more video object types.
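
The three retrieval formats can be pictured as one ranking function, as in the sketch below; only the combination idea is illustrated, with the two scoring functions left as hypothetical placeholders for the CAROL/ST and CBR machinery.

def semantic_score(segment, query):
    # Placeholder: CAROL/ST matching over semantic annotations and
    # spatio-temporal relations.
    raise NotImplementedError

def visual_score(segment, query):
    # Placeholder: CBR similarity over extracted feature vectors
    # (color, texture, shape, ...).
    raise NotImplementedError

def hybrid_retrieve(segments, query, mode="hybrid", alpha=0.5):
    if mode == "carol_st":               # original semantic retrieval
        score = lambda s: semantic_score(s, query)
    elif mode == "cbr":                  # newly added content-based retrieval
        score = lambda s: visual_score(s, query)
    else:                                # hybrid combination of both
        score = lambda s: (alpha * semantic_score(s, query)
                           + (1 - alpha) * visual_score(s, query))
    return sorted(segments, key=score, reverse=True)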


3.2.1 Foundation Classes

VideoMAP+ extends a conventional OODB to define video objects through a specific hierarchy (video → scene → segment → keyframe). In addition, it includes the concept of CBR to build indices on the visual features of these objects. Their class attributes, methods, and corresponding relations form a complex network (or a “map”, as shown in Figure 3.4). Below we enumerate the foundation classes of the VideoMAP+ objects at various granularities, namely: Keyframe, Segment, Scene, Video, and Visual Object (cf. Figure 3.4).
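
A minimal sketch of this hierarchy in code is given below; the attribute names are illustrative only, since the actual class attributes and methods are those of Figure 3.4.

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Keyframe:
    frame_no: int
    features: Dict[str, list] = field(default_factory=dict)  # CBR index: color, texture, shape vectors

@dataclass
class Segment:
    keyframes: List[Keyframe] = field(default_factory=list)
    annotations: List[str] = field(default_factory=list)     # CAROL/ST semantic annotations

@dataclass
class Scene:
    segments: List[Segment] = field(default_factory=list)

@dataclass
class Video:
    title: str
    scenes: List[Scene] = field(default_factory=list)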

In VideoMAP+, the bridge between the content-based and semantic sides is at the video segment level. This is not the only possible bridging level, as others (such as the keyframe and/or scene levels) are also meaningful for bridging the two. In VideoMAP+, the segment level is chosen as the direct bridge for reasons of simplicity and efficiency, because we regard video segments as the basic unit of retrieval.

3.2.2 Search paths with CBR

After integrating CBR with CAROL/ST, three main groups of objects (i.e., Keyframe, Visual-Object, and Image-Feature) are added to the VideoMAP+ system, as shown in the class diagram (Figure 3.4).

Image-Feature: a visual feature extracted from a video object, such as color, texture, or shape (a simple extraction sketch follows these definitions).

Keyframe: a fundamental image frame in a video sequence.


Visual-Object: all salient objects captured in a video’s physical space, whether represented visually or textually, are instances of a physical object. Furthermore, every object has a spatio-temporal layout in the image sequence.
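
As one concrete example of an Image-Feature, a color histogram over a keyframe can be computed as below; the particular quantization is our own illustrative choice, since the text names color, texture, and shape only in general terms.

import numpy as np

def color_histogram(frame, bins=8):
    # frame: H x W x 3 RGB array with 8-bit channels. Quantize each
    # channel into `bins` levels and build a joint color histogram.
    q = frame.astype(np.uint32) * bins // 256
    codes = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]
    hist = np.bincount(codes.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()  # normalize so differently sized frames compare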

Four new entry points to search for semantic features and visual objects (a dispatch sketch follows the list) are:

(a) Visual-Object,

(b) Image-Feature,

(c) Activity Model, and

(d) Object Level of the Activity Model.
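
The following is a minimal sketch of how these four entry points might be exposed as a single search dispatch. The per-entry lookup routines (the find_by_* methods on a hypothetical db object) are placeholders, not the actual VideoMAP+ API.

def search_entry(entry, query, db):
    # Dispatch a search to one of the four entry points; each find_by_*
    # call is a hypothetical placeholder for the corresponding index lookup.
    if entry == "visual_object":        # (a) start from a Visual-Object
        return db.find_by_visual_object(query)
    if entry == "image_feature":        # (b) start from an Image-Feature vector
        return db.find_by_image_feature(query)
    if entry == "activity_model":       # (c) reason down the Activity Model
        return db.find_by_activity(query)
    if entry == "object_level":         # (d) jump straight to the Object level
        return db.find_by_object(query)
    raise ValueError("unknown entry point: " + entry)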

The Object Level of the Activity Model [CL ……
