Text Mining with Machine Learning: Principles and Techniques
Jan Zizka, Frantisek Dařena, Arnost Svoboda
- Publisher: CRC
- Publication date: 2021-06-30
- Price: $2,380
- VIP price: 5% off, $2,261
- Language: English
- Pages: 368
- Binding: Quality Paper - also called trade paper
- ISBN: 1032086211
- ISBN-13: 9781032086217
Related categories:
Text-mining, Machine Learning
Overseas purchase-on-behalf title (must be checked out separately)
Product description
This book provides a perspective on the application of machine learning-based methods to knowledge discovery from natural language texts. Analysing various data sets reveals conclusions that are not normally evident and that can be used for various purposes and applications. The book explains the principles of time-proven machine learning algorithms applied in text mining, together with step-by-step demonstrations of how to reveal the semantic content of real-world datasets using the popular R language with its implemented machine learning algorithms. The book is aimed not only at IT specialists but also at a wider audience that needs to process large sets of text documents and has basic knowledge of the subject, e.g. e-mail service providers, online shoppers, librarians, etc.
The book starts with an introduction to text-based natural language data processing, its goals, and its problems. It focuses on machine learning, presenting various algorithms with their uses and possibilities, and reviews their strengths and weaknesses. Beginning with the initial data pre-processing, a reader can follow the steps provided in the R language, including the integration of various available plug-ins into the resulting software tool. A big advantage is that R also contains many libraries implementing machine learning algorithms, so readers can concentrate on the principal target without needing to implement the details of the algorithms themselves. To make sense of the results, the book also explains the algorithms, which supports the final evaluation and interpretation of the results. The examples are demonstrated using real-world data from commonly accessible Internet sources.
About the authors
Jan Zizka is a consultant in machine learning and data mining. He has worked as a system programmer, developer of advanced software systems, and researcher. For the last 25 years, he has devoted himself to AI and machine learning, especially text mining. He has been a faculty member at a number of universities and research institutes. He has authored approximately 100 international publications.
Frantisek Dařena is an associate professor and the head of the Text Mining and NLP group at the Department of Informatics, Mendel University, Brno. He has published numerous articles in international scientific journals, conference proceedings, and monographs, and is a member of editorial boards of several international journals. His research includes text/data mining, intelligent data processing, and machine learning.
Arnost Svoboda is an expert programmer. His specialities include programming languages and systems such as R, Assembler, Matlab, PL/1, Cobol, Fortran, Pascal, and others. He started as a system programmer. For the last 20 years, Arnost has also worked as a teacher and researcher at Masaryk University in Brno. His current interests are machine learning and data mining.