Deep Learning for Video Understanding
Tentative Chinese title: 深度學習在視頻理解中的應用 (Applications of Deep Learning in Video Understanding)

Name: Deep Learning for Video Understanding
Price: 5634 TWD
Availability: Online only
Authors: Wu, Zuxuan; Jiang, Yu-Gang
ISBN: 3031576780

Publisher: Springer
Publication date: 2024-08-02
List price: NT$5,930
Member price: 5% off, NT$5,634
Language: English
Pages: 188
Binding: Hardcover (also called cloth, retail trade, or trade)
ISBN: 3031576780
ISBN-13: 9783031576782
Related category: Deep Learning

Overseas special-order book (must be checked out separately)

Product Description

This book presents deep learning techniques for video understanding. For deep learning basics, the authors cover machine learning pipelines and notation, as well as 2D and 3D convolutional neural networks (CNNs) for spatial and temporal feature learning. For action recognition, the authors introduce classical frameworks for image classification and then elaborate on both image-based and clip-based 2D/3D CNNs for action recognition. For action detection, the authors describe sliding-window and proposal-based detection methods, single-stage and two-stage approaches, and spatial and temporal action localization, followed by an introduction to datasets. For video captioning, the authors present language-based models and show how to perform sequence-to-sequence learning for video captioning. For unsupervised feature learning, the authors discuss why it is necessary to shift from supervised to unsupervised learning and then introduce how to design better surrogate training tasks for learning video representations. Finally, the book introduces recent self-supervised training pipelines such as contrastive learning and masked image/video modeling with transformers. The book also outlines promising directions, with the aim of promoting future research in video understanding with deep learning.

About the Authors

Zuxuan Wu received his Ph.D. in Computer Science from the University of Maryland in 2020. He is currently an Associate Professor in the School of Computer Science at Fudan University and previously worked as a Research Scientist at Facebook AI. His research interests are deep learning and large-scale video understanding. His work has been recognized with an AI 2000 Most Influential Scholars Award in 2022, a Microsoft Research PhD Fellowship (one of 10 recipients worldwide) in 2019, and a Snap PhD Fellowship (one of 10 recipients worldwide) in 2017.

Yu-Gang Jiang is a Chang Jiang Scholar Distinguished Professor at the School of Computer Science, Fudan University. His research focuses on multimedia, computer vision, and robust and trustworthy AI. As director of the Shanghai Collaborative Innovation Center of Intelligent Visual Computing and the Fudan Vision and Learning (FVL) Laboratory, he leads a group of researchers working on all aspects of robust and trustworthy visual analytics. He has published extensively in top journals and conferences, with over 25,000 citations and an h-index of 79. His research has had a major impact on applications such as mobile visual search/recognition and defect detection for high-speed railway infrastructure. His work has earned many awards, including the inaugural 2014 ACM China Rising Star Award, the 2015 ACM SIGMM Rising Star Award, several best paper awards, and various recognitions from NSF China, MOE China, and the Shanghai Government. He holds a Ph.D. in Computer Science from City University of Hong Kong and spent three years at Columbia University before joining Fudan in 2011. He is an elected Fellow of the IAPR and the IEEE.
