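Introduction to Big Data: Technical Practical Training (大數據導論技術實訓)

Liu Peng (劉鵬), general editor; Li Siming (李四明) and Liu Peng (劉鵬), editors

  • Publisher: Tsinghua University
  • Publication date: 2024-01-01
  • List price: $414
  • Sale price: $352 (15% off)
  • Language: Simplified Chinese
  • ISBN: 7302651426
  • ISBN-13: 9787302651420
  • Related category: Big Data
  • Availability: ordered from the supplier once the order is placed (about 4 to 6 weeks)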

Product Description

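This book is the companion practical-training textbook for 《大數據導論》 (Introduction to Big Data, ISBN 9787302500704). It aims to help readers consolidate fundamentals, work through realistic business scenarios, and build hands-on skills. Starting from the basic programming knowledge that big data development requires, it first covers the commands commonly used in a Linux development environment. It then introduces the basic operations of the data cleaning tool Kettle and common data visualizations such as pie charts, bar charts, line charts, and parallel coordinates plots. Finally, it applies popular big data techniques (data cleaning, data visualization, and data mining) to concrete cases in the environmental, financial, and e-commerce sectors, giving readers realistic big data scenarios to work in.

The book provides a wide range of project-based training cases that study real industry data, with the goal of developing the project skills of practice-oriented professionals. It can serve as a course textbook for application-oriented programs and as a self-study text and reference manual for developers.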

Table of Contents

Part 1 Getting Started with Linux

Lab 1 Creating, Accessing, Modifying, and Deleting Files

1.1 Lab Objectives

1.2 Lab Requirements

1.3 Lab Principles

1.4 Lab Steps

1.5 Lab Results

Lab 2 Creating, Viewing, and Editing the Contents of Files

2.1 Lab Objectives

2.2 Lab Requirements

2.3 Lab Principles

2.4 Lab Steps

2.5 Lab Results

Lab 3 Common Text-Editing Techniques: Copy, Paste, and Delete

3.1 Lab Objectives

3.2 Lab Requirements

3.3 Lab Principles

3.4 Lab Steps

3.5 Lab Results

Part 2 Data Cleaning

Lab 4 Extracting Data from a Text File into a Database

4.1 Lab Objectives

4.2 Lab Requirements

4.3 Lab Principles

4.3.1 Introduction to Kettle

4.3.2 Method for Extracting Data from a Text File into a Database

4.4 Lab Steps

4.4.1 Installation

4.4.2 Steps for Extracting Data from a Text File into a Database

4.5 Lab Results

Lab 5 Extracting Data from a CSV File into a Database

5.1 Lab Objectives

5.2 Lab Requirements

5.3 Lab Principles

5.4 Lab Steps

5.5 Lab Results

Lab 6 Importing Excel File Data into a Database

6.1 Lab Objectives

6.2 Lab Requirements

6.3 Lab Principles

6.4 Lab Steps

6.5 Lab Results

Lab 7 Migrating MySQL Data to MongoDB

7.1 Lab Objectives

7.2 Lab Requirements

7.3 Lab Principles

7.4 Lab Steps

7.5 Lab Results

Lab 8 Incremental Data Extraction from a Database

8.1 Lab Objectives

8.2 Lab Requirements

8.3 Lab Principles

8.4 Lab Steps

8.5 Lab Results

Lab 9 Incremental Updates for Inserted, Deleted, and Modified Data

9.1 Lab Objectives

9.2 Lab Requirements

9.3 Lab Principles

9.4 Lab Steps

9.5 Lab Results

Lab 10 Data Masking

10.1 Lab Objectives

10.2 Lab Requirements

10.3 Lab Principles

10.4 Lab Steps

10.5 Lab Results

Lab 11 Data Validation

11.1 Lab Objectives

11.2 Lab Requirements

11.3 Lab Principles

11.4 Lab Steps

11.4.1 Setting Validation Rules

11.4.2 Non-Null Validation

11.4.3 Date-Type Validation

Lab 12 Cleaning Missing Values

12.1 Lab Objectives

12.2 Lab Requirements

12.3 Lab Principles

12.4 Lab Steps

12.4.1 Cleaning by Running SQL Scripts

12.4.2 Cleaning with Components

Lab 13 Cleaning Format and Content

13.1 Lab Objectives

13.2 Lab Requirements

13.3 Lab Principles

13.4 Lab Steps

13.4.1 Cleaning "Format Error Type 1"

13.4.2 Cleaning "Format Error Type 2"

Lab 14 Cleaning Logical Errors

14.1 Lab Objectives

14.2 Lab Requirements

14.3 Lab Principles

14.4 Lab Steps

14.4.1 Cleaning "Logical Error Type 1"

14.4.2 Cleaning "Logical Error Type 2"

Part 3 Data Visualization

Lab 15 Drawing Pie Charts, Bar Charts, Line Charts, and Parallel Coordinates Plots

15.1 Lab Objectives

15.2 Lab Requirements

15.3 Lab Principles

15.4 Lab Steps

15.4.1 Importing Data and Modules

15.4.2 Data Extraction

15.4.3 Plotting

Lab 16 Visual Analysis of Bike-Sharing Data

16.1 Lab Objectives

16.2 Lab Requirements

16.3 Lab Steps

16.3.1 Data Preparation

16.3.2 Data Cleaning

16.3.3 Data Processing

16.3.4 Data Mining

16.3.5 Visual Analysis

Lab 17 Drawing a Word Cloud from a Novel

17.1 Lab Objectives

17.2 Lab Requirements

17.3 Lab Principles

17.3.1 Word Segmentation with jieba

17.3.2 Word Clouds with wordcloud

17.4 Lab Steps

17.4.1 Importing Modules

17.4.2 Reading the File and Setting the Path

17.4.3 Text Segmentation

17.4.4 Drawing the Word Cloud

Lab 18 Visualizing Basketball Shooting Percentage

18.1 Lab Objectives

18.2 Lab Requirements

18.3 Lab Principles

18.4 Lab Steps

18.4.1 Importing Modules and Data Files

18.4.2 Processing the Data

18.4.3 Visual Analysis

Part 4 Environmental Big Data in Practice

Lab 19 Forecasting Carbon Dioxide Levels

19.1 Lab Objectives

19.2 Lab Requirements

19.3 Lab Principles

19.4 Lab Steps

19.4.1 Importing Packages and Loading Data

19.4.2 Visualizing the Initial Data

19.4.3 The ARIMA Time Series Model

19.4.4 Parameter Selection for the ARIMA Time Series Model

19.4.5 Configuring the ARIMA Time Series Model

19.4.6 Validating Forecasts

19.4.7 Producing and Visualizing Forecasts

Lab 20 Analyzing the Causes of Air Pollution in Singapore

20.1 Lab Objectives

20.2 Lab Requirements

20.3 Lab Principles

20.4 Lab Steps

20.4.1 Data Preparation

20.4.2 Testing Hypothesis 1: Growth in Manufacturing Increases Air Pollution in Singapore

20.4.3 Testing Hypothesis 2: Growth in the Number of Buildings Constructed Increases Air Pollution in Singapore

20.4.4 Testing Hypothesis 3: Growth in the Number of Vehicles Increases Air Pollution in Singapore

Lab 21 Historical Weather Statistics for Shanghai

21.1 Lab Objectives

21.2 Lab Requirements

21.3 Lab Principles

21.4 Lab Steps

21.4.1 Writing the Mapper Program

21.4.2 Writing the Reducer Program

21.4.3 Computing Monthly Historical Weather Statistics for Shanghai in 2016

Lab 22 Monthly Air Quality Statistics for Shanghai

22.1 Lab Objectives

22.2 Lab Requirements

22.3 Lab Principles

22.4 Lab Steps

22.4.1 Writing the Mapper Program

22.4.2 Writing the Reducer Program

22.4.3 Computing Monthly Air Quality Statistics for Shanghai in 2016

Lab 23 Comparing Monthly Average Temperatures in Beijing and Shanghai

23.1 Lab Objectives

23.2 Lab Requirements

23.3 Lab Principles

23.4 Lab Steps

23.4.1 Writing the Mapper Program

23.4.2 Writing the Reducer Program

23.4.3 Comparing Monthly Average Temperatures in Beijing and Shanghai for 2016

Part 5 Financial Big Data in Practice

Lab 24 Optimal Portfolio (Part 1)

24.1 Lab Objectives

24.2 Lab Requirements

24.3 Lab Principles

24.4 Lab Steps

24.4.1 Importing the Required Modules

24.4.2 Reading the Data

24.4.3 Inspecting Missing Values

24.4.4 Data Visualization

24.4.5 Preliminary Statistical Analysis

24.4.6 Portfolio Optimization

24.4.7 Computing the Portfolio's Mean Return

24.5 Lab Results

Lab 25 Optimal Portfolio (Part 2)

25.1 Lab Objectives

25.2 Lab Requirements

25.3 Lab Principles

25.4 Lab Steps

25.4.1 Maximum Sharpe Ratio Portfolio

25.4.2 Minimum-Variance Portfolio

25.4.3 Drawing the Scatter Plot

25.5 Lab Results

Lab 26 Stock Trend Forecasting

26.1 Lab Objectives

26.2 Lab Requirements

26.3 Lab Principles

26.4 Lab Steps

26.4.1 Importing Modules

26.4.2 Building the ARIMA Model

26.4.3 Differencing the Data

26.4.4 Autocorrelation and Partial Autocorrelation Plots

26.4.5 Model Training

26.5 Lab Results

Part 6 Business Big Data in Practice

Lab 27 Sentiment Analysis of E-Commerce Product Reviews

27.1 Lab Objectives

27.2 Lab Requirements

27.3 Lab Principles

27.4 Lab Steps

27.4.1 Extracting Review Data

27.4.2 Deduplicating Review Text

27.4.3 Model Preparation

27.4.4 Removing Prefix Ratings

27.4.5 Text Segmentation

27.4.6 Model Construction

27.5 Lab Results

Lab 28 Analysis of eBay Car Sales Data

28.1 Lab Objectives

28.2 Lab Requirements

28.3 Lab Principles

28.3.1 Data Standardization

28.3.2 Data Visualization

28.4 Lab Steps

28.4.1 Loading and Describing the Data

28.4.2 Data Profiling

28.4.3 Preprocessing

28.4.4 Visual Analysis

28.5 Lab Results

Lab 29 Airline Customer Value Analysis

29.1 Lab Objectives

29.2 Lab Requirements

29.3 Lab Principles

29.4 Lab Steps

29.4.1 Data Preparation

29.4.2 Data Processing

29.4.3 Data Preprocessing

29.4.4 Building the Model

29.5 Lab Results

Lab 30 Market Basket Analysis

30.1 Lab Objectives

30.2 Lab Requirements

30.3 Lab Principles

30.3.1 MLxtend

30.3.2 Association Rules

30.3.3 Mining Frequent Itemsets with the Apriori Algorithm

30.4 Lab Steps

30.4.1 Importing Pandas and MLxtend and Reading the Data

30.4.2 Data Processing

30.4.3 One-Hot Encoding

30.4.4 Computing Association Rules with the Library

30.4.5 Reviewing the Results

30.4.6 Popular Combinations in Germany

Appendix A Big Data and Artificial Intelligence Lab Environments

A.1 Big Data Lab Environment

A.2 Artificial Intelligence Lab Environment