From Extractive to Abstractive Summarization: A Journey

Mehta, Parth; Majumder, Prasenjit

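  • Publisher: Springer
  • Publication date: 2019-08-30
  • List price: $4,100
  • VIP price: $3,895 (5% off)
  • Language: English
  • Pages: 116
  • Binding: Hardcover (also called cloth, retail trade, or trade)
  • ISBN: 9811389330
  • ISBN-13: 9789811389337
  • Ordered from overseas on request (must be checked out separately)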

Product description

This book describes recent advances in text summarization, identifies remaining gaps and challenges, and proposes ways to overcome them. It begins with one of the most frequently discussed topics in text summarization, sentence extraction, examines the effectiveness of current techniques in domain-specific text summarization, and proposes several improvements. The book then turns to the application of summarization in the legal and scientific domains, introducing two new corpora that consist of more than 100,000 court judgments and more than 20,000 scientific articles, each with a corresponding manually written summary. The availability of these large-scale corpora opens up the possibility of using the now popular data-driven approaches based on deep learning. The book goes on to highlight the effectiveness of neural sentence extraction approaches, which perform just as well as rule-based approaches but without the need for any manual annotation. As a next step, it proposes multiple techniques for creating ensembles of sentence extractors, which deliver better and more robust summaries. In closing, the book presents a neural network-based model for sentence compression. Overall, the book takes readers on a journey that begins with simple sentence extraction and ends in abstractive summarization, while also covering key topics such as ensemble techniques and domain-specific summarization, which had not previously been explored in detail.

About the authors

Dr. Parth Mehta completed his M.Tech. in Machine Intelligence and his Ph.D. in Text Summarization at Dhirubhai Ambani Institute of ICT (DA-IICT), Gandhinagar, India. At DA-IICT he was part of the Information Retrieval and Natural Language Processing Lab. He was also involved in the national project "Cross Lingual Information Access", funded by the Govt. of India, which focused on building a cross-lingual search engine for nine Indian languages. Dr. Mehta has served as a reviewer for the journal Information Processing and Management and for the Forum for Information Retrieval Evaluation. Apart from several journal and conference papers, he has also co-edited a book on text processing published by Springer.

Prof. Prasenjit Majumder is an Associate Professor at Dhirubhai Ambani Institute of ICT (DA-IICT), Gandhinagar, and a Visiting Professor at the Indian Institute of Information Technology, Vadodara (IIIT-V). Prof. Majumder completed his Ph.D. at Jadavpur University in 2008 and worked as a postdoctoral fellow at University College Dublin before joining DA-IICT, where he currently heads the Information Retrieval and Language Processing Lab. His research interests lie at the intersection of Information Retrieval, Cognitive Science and Human-Computer Interaction. He has headed several projects sponsored by the Govt. of India. He is one of the pioneers of the Forum for Information Retrieval Evaluation (FIRE), which assesses research on Information Retrieval and related areas for South Asian languages. Since its founding in 2008, FIRE has grown into a respected conference that draws participants from across the globe. Prof. Majumder has authored several journal and conference papers, and co-edited two special issues of Transactions in Information Systems (ACM). He has co-edited two books, 'Multi Lingual Information Access in South Asian Languages' and 'Text Processing', both published by Springer.
