Low Resource Social Media Text Mining
Palakodety, Shriphani; KhudaBukhsh, Ashiqur R.; Jayachandran, Guha
- Publisher: Springer
- Publication date: 2021-10-03
- List price: $2,800
- VIP price: 5% off, $2,660
- Language: English
- 裝訂: Quality Paper - also called trade paper
- ISBN: 981165624X
- ISBN-13: 9789811656248
Related categories:
Text-mining
Overseas special-order book (requires separate checkout)
Product description
This book focuses on methods that are unsupervised or require minimal supervision, which is vital in the low-resource domain. Over the past few years, rapid growth in Internet access across the globe has resulted in an explosion of user-generated text content on social media platforms. This effect is especially pronounced in linguistically diverse areas of the world such as South Asia, where over 400 million people regularly access social media platforms. YouTube, Facebook, and Twitter report a monthly active user base in excess of 200 million from this region. Natural language processing (NLP) research and publicly available resources such as models and corpora prioritize Web content authored primarily by a Western user base. Such content is authored in English by a user base fluent in the language and can be processed by a broad range of off-the-shelf NLP tools. In contrast, text from linguistically diverse regions features high levels of multilinguality, code-switching, and varied language skill levels. Resources like corpora and models are also scarce. Due to these factors, newer methods are needed to process such text.
This book is designed for NLP practitioners who are well versed in recent advances in the field but unfamiliar with the landscape of low-resource multilingual NLP. It introduces the various challenges associated with social media content, quantifies them, and provides solutions and intuition. When possible, the methods discussed are evaluated on real-world social media data sets to emphasize their robustness to the noisy nature of the social media environment.
On completion of the book, the reader will be well versed in the complexity of text mining in multilingual, low-resource environments; will be aware of a broad set of off-the-shelf tools that can be applied to various problems; and will be able to conduct sophisticated analyses of such text.
About the authors
Shriphani Palakodety is a software engineer at Onai, USA.
Ashiqur R. KhudaBukhsh is a project scientist at Carnegie Mellon University. He received his PhD in Computer Science from CMU.
Guha Jayachandran is the CEO and founder of Onai, USA. He received his PhD in Computer Science from Stanford University.