Advances in Multimodal Information Retrieval and Generation

Luo, Man, Gokhale, Tejas, Varshney, Neeraj

  • Publisher: Springer
  • Publication date: 2024-06-26
  • Price: $2,450
  • VIP price: $2,328 (5% off)
  • Language: English
  • Pages: 164
  • Binding: Hardcover - also called cloth, retail trade, or trade
  • ISBN: 3031578155
  • ISBN-13: 9783031578151
  • Overseas purchase-on-behalf title (must be checked out separately)

Product Description

This book provides an extensive examination of state-of-the-art methods in multimodal retrieval, generation, and the pioneering field of retrieval-augmented generation. The work is rooted in the domain of Transformer-based models, exploring the complexities of blending and interpreting the intricate connections between text and images. The authors present cutting-edge theories, methodologies, and frameworks dedicated to multimodal retrieval and generation, aiming to furnish readers with a comprehensive understanding of the current state and future prospects of multimodal AI. As such, the book is a crucial resource for anyone interested in delving into the intricacies of multimodal retrieval and generation. Serving as a bridge to mastering and leveraging advanced AI technologies in this field, the book is designed for students, researchers, practitioners, and AI aficionados alike, offering the tools needed to expand the horizons of what can be achieved in multimodal artificial intelligence.

About the Authors

Man Luo, Ph.D., is a Research Fellow at Mayo Clinic, Arizona. She received her Ph.D. at ASU in 2023. Her research interests lie in Natural Language Processing (NLP) and Computer Vision (CV), with a specific focus on open-domain information retrieval under multi-modality settings and Retrieval-Augmented Generation Models. She has published first-author papers at top conferences such as AAAI, ACL, and EMNLP. She serves as a guest editor of PLOS Digital Medicine Journal and has served as a reviewer for the AAAI, IROS, EMNLP, NAACL, and ACL conferences. Dr. Luo is an organizer of the ODRUM workshops at CVPR 2022 and CVPR 2023 and of Multimodal4Health at ICHI 2024.

Tejas Gokhale, Ph.D., is an Assistant Professor at the University of Maryland, Baltimore County. He received his Ph.D. from Arizona State University in 2023, his M.S. from Carnegie Mellon University in 2017, and his B.E. (Honours) from Birla Institute of Technology and Science, Pilani in 2015. Dr. Gokhale is a computer vision researcher working on robust visual understanding, with a focus on the connection between vision and language, semantic data engineering, and active inference. His research draws inspiration from the principles of perception, communication, learning, and reasoning. He is an organizer of the ODRUM workshops at CVPR 2022 and CVPR 2023, the SERUM tutorial at WACV 2023, and the RGMV tutorial at WACV 2024.

Neeraj Varshney is a Ph.D. candidate at ASU and works in natural language processing, primarily focusing on improving the efficiency and reliability of NLP models. He has published multiple papers in top-tier NLP and AI conferences, including ACL, EMNLP, EACL, NAACL, and AAAI, and is a recipient of the SCAI Doctoral Fellowship, the GPSA Outstanding Research Award, and the Jumpstart Research Grant. He has served as a reviewer for several conferences, including ACL, EMNLP, EACL, and IJCAI, and was selected as an outstanding reviewer by the EACL'23 conference.

Yezhou Yang, Ph.D., is an Associate Professor with the School of Computing and Augmented Intelligence (SCAI), Arizona State University. He received his Ph.D. from the University of Maryland. His primary interests lie in Cognitive Robotics, Computer Vision, and Robot Vision, especially exploring visual primitives in human action understanding from visual input, grounding them in natural language, and performing high-level reasoning over the primitives for intelligent robots.

Chitta Baral, Ph.D., is a Professor with the School of Computing and Augmented Intelligence (SCAI), Arizona State University, and received his Ph.D. from the University of Maryland. His primary interests lie in Natural Language Processing (NLP), Computer Vision (CV), the intersection of NLP and CV, and Knowledge Representation and Reasoning.
