Multimodal Learning Toward Micro-Video Understanding
Tentative Chinese title: 多模態學習以理解微視頻
Nie, Liqiang, Liu, Meng, Song, Xuemeng
- Publisher: Morgan & Claypool
- Publication date: 2019-09-17
- List price: $3,530
- VIP price: 5% off, $3,354
- Language: English
- Pages: 186
- Binding: Hardcover (also called cloth, retail trade, or trade)
- ISBN: 1681736306
- ISBN-13: 9781681736303
Imported title, ordered from overseas (must be checked out separately)
Product Description
Micro-videos, a new form of user-generated content, have been spreading widely across various social platforms, such as Vine, Kuaishou, and TikTok.
Unlike traditional long videos, micro-videos are usually recorded with smart mobile devices, anywhere, and last only a few seconds. Owing to their brevity and low bandwidth cost, micro-videos are attracting ever more enthusiastic users. The blossoming of micro-videos opens the door to many promising applications, ranging from network content caching to online advertising. It is therefore highly desirable to develop an effective scheme for high-order micro-video understanding.
Micro-video understanding is, however, non-trivial due to the following challenges: (1) how to represent micro-videos that convey only one or a few high-level themes or concepts; (2) how to utilize the hierarchical structure of venue categories to guide micro-video analysis; (3) how to alleviate the influence of low quality caused by complex surrounding environments and camera shake; (4) how to model multimodal sequential data, i.e., textual, acoustic, visual, and social modalities, to enhance micro-video understanding; and (5) how to construct large-scale benchmark datasets for analysis. These challenges have been largely unexplored to date.
In this book, we focus on addressing the challenges presented above by proposing several state-of-the-art multimodal learning theories. To demonstrate the effectiveness of these models, we apply them to three practical tasks of micro-video understanding: popularity prediction, venue category estimation, and micro-video routing. In particular, we first build three large-scale real-world micro-video datasets for these tasks. We then present a multimodal transductive learning framework for micro-video popularity prediction. Furthermore, we introduce several multimodal cooperative learning approaches and a multimodal transfer learning scheme for micro-video venue category estimation. Meanwhile, we develop a multimodal sequential learning approach for micro-video recommendation. Finally, we conclude the book and outline future research directions in multimodal learning toward micro-video understanding.
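To give a flavor of the multimodal setup the book works with, the sketch below fuses hypothetical textual, acoustic, visual, and social feature vectors for a set of micro-videos and fits a simple ridge regressor for popularity prediction. This is a minimal illustration only: the feature dimensions, the synthetic data, and the use of plain early fusion with ridge regression are all assumptions for the sketch, not the transductive framework the book actually proposes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-video features, one matrix per modality
# (dimensions are illustrative, not from the book's datasets).
n_videos = 200
textual  = rng.normal(size=(n_videos, 10))   # e.g., caption embeddings
acoustic = rng.normal(size=(n_videos, 8))    # e.g., audio statistics
visual   = rng.normal(size=(n_videos, 16))   # e.g., frame features
social   = rng.normal(size=(n_videos, 4))    # e.g., poster's follower counts

# Early fusion: concatenate modality features into one representation.
X = np.hstack([textual, acoustic, visual, social])

# Synthetic popularity scores so the sketch is self-contained.
w_true = rng.normal(size=X.shape[1])
y = X @ w_true + 0.1 * rng.normal(size=n_videos)

# Ridge regression in closed form as a simple stand-in learner:
# w = (X^T X + lam * I)^-1 X^T y
lam = 1.0
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
pred = X @ w

rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print(f"train RMSE: {rmse:.3f}")
```

In practice, each modality would be extracted by its own encoder and the fusion step is where the book's cooperative and transductive learning schemes differ from this naive concatenation.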