Apache Mahout Cookbook
Piero Giacomelli
- Publisher: Packt Publishing
- Publication date: 2013-12-27
- List price: $1,840
- VIP price: 5% off, $1,748
- Language: English
- Pages: 250
- Binding: Paperback
- ISBN: 1849518025
- ISBN-13: 9781849518024
Related translations:
Mahout 實踐指南 (Apache Mahout Cookbook) (Simplified Chinese edition)
Overseas-order title (requires separate checkout)
Product description
Whether you're a beginner or an advanced user of Apache Mahout, this cookbook will expand your skills through a host of recipes, illustrations, and real-world examples. Your data mining will reach a new level of capability.
Overview
- Learn how to set up a Mahout development environment
- Start testing Mahout in a standalone Hadoop cluster
- Learn to find stock market direction using logistic regression
- Over 35 recipes with real-world examples to help both experienced and novice developers get the hang of Mahout's different features
In Detail
The rise of the Internet and social networks has created a new demand for software that can analyze large datasets, which can scale up to 10 billion rows. Apache Hadoop was created to handle such heavy computational tasks. Mahout gained recognition for providing data mining classification algorithms that can be used with datasets of this kind.
"Apache Mahout Cookbook" provides a fresh, scope-oriented approach to the Mahout world for beginners and advanced users alike. The book offers insight into how to write different data mining algorithms for use in the Hadoop environment and how to choose the one best suited to the task at hand.
"Apache Mahout Cookbook" looks at the various Mahout algorithms available and gives the reader a fresh, solution-centered approach to solving different data mining tasks. The recipes start easy and get progressively more complicated. A step-by-step approach guides the developer through the different tasks involved in mining a huge dataset. You will also learn how to code your own Mahout data mining algorithm to determine the best one for a particular task. Alongside this, a whole chapter is dedicated to loading data into Mahout from an external RDBMS. A lot of attention is also given to using your data mining algorithm inside your own code so that you can run it in a Hadoop environment. Theoretical aspects of the algorithms are covered for information purposes, but every chapter is written to let the developer get into the code as quickly and smoothly as possible. This means that with every recipe, the book provides code that can be reused through Maven, as well as the Maven Mahout source code.
By the end of this book, you will be able to code your own procedures for various data mining tasks with different algorithms, and to evaluate and choose the best ones for your tasks.
What you will learn from this book
- Configure a full development environment for Mahout from scratch with NetBeans and Maven
- Handle sequence files for better performance
- Query an RDBMS and store results in it with Sqoop
- Use logistic regression to predict the next step
- Understand text mining of raw data with Naïve Bayes
- Create and understand clusters
- Customize Mahout to evaluate different cluster algorithms
- Use the MapReduce approach to solve real-world data mining problems
Approach
"Apache Mahout Cookbook" uses over 35 recipes packed with illustrations and real-world examples to help beginners as well as advanced programmers get acquainted with the features of Mahout.
Who this book is written for
"Apache Mahout Cookbook" is great for developers who want to have a fresh and fast introduction to Mahout coding. No previous knowledge of Mahout is required, and even skilled developers or system administrators will benefit from the various recipes presented.