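Opening a New Era of Intelligent Dialogue: Exploration and Practice of Large Language Models
Cai Hua (蔡華), Xu Qing (徐清), Xuan Xiaohua (宣曉華)
Product Description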
"本書深度探討了當今科技領域最引人註目的大規模語言模型相關技術,內容主要圍繞大規模語言模型構建、評估和應用展開,分為以下四部分:第 1~5章主要介紹大規模語言模型的發展歷程及其訓練相關內容,包括語言模型的基本架構、大規模語言模型的高效微調技術、人類反饋強化學習和模型的分佈式訓練;第 6和 7章主要介紹大規模語言模型的推理優化技術、推理加速框架和模型的評估;第 8~10章主要介紹大規模語言模型擴展和應用,包括大規模語言模型和知識的融合、多模態大規模語言模型的技術介紹和其智能體擴展應用,以及大規模語言模型的垂直領域應用;第 11章主要介紹大規模語言模型研究的困難、挑戰和未來潛在研究方向。 本書面向技術愛好者、從業者、學術研究者和一般讀者。它提供大規模語言模型相關的全面介紹,幫助從業人員和專業人士瞭解大規模語言模型的應用及技術原理,支持學術界研究前沿技術,並以通俗的語言幫助讀者理解這一技術及其對生活的影響。 "
Table of Contents
Chapter 1 Background on Large Language Models
1.1 Stages in the Development of Language Modeling
1.2 Opportunities Created by Large Language Models
Chapter 2 From Statistical Language Models to Pre-trained Language Models
2.1 Statistical Language Models
2.2 Neural Network Language Models
2.2.1 Feedforward Neural Network Language Models
2.2.2 Recurrent Neural Network Language Models
2.2.3 Long Short-Term Memory Neural Network Language Models
2.2.4 Word2Vec Word Vector Representation Model
2.3 Pre-trained Language Models
2.3.1 ELMo
2.3.2 Transformer
2.3.3 BERT
2.3.4 ELECTRA
2.3.5 GPT 1-3
2.3.6 BART
2.3.7 T5
Chapter 3 Architectures of Large Language Models
3.1 Encoder Architecture
3.2 Encoder-Decoder Architecture
3.2.1 GLM
3.2.2 UL2
3.3 Decoder Architecture
3.3.1 PaLM
3.3.2 BLOOM
3.3.3 InstructGPT
3.4 The LLaMA Family
3.4.1 Pre-training Data
3.4.2 Model Architecture
3.4.3 Chinese LLaMA
3.4.4 Chinese Alpaca
Chapter 4 Training Methods for Large Language Models
4.1 Model Training Costs
4.1.1 Compute Estimation
4.1.2 Cost and Energy Consumption
4.2 Supervised Fine-Tuning
4.2.1 Prompt Learning
4.2.2 In-Context Learning
4.2.3 Instruction Tuning
4.3 Parameter-Efficient Fine-Tuning
4.3.1 Partial-Parameter Efficient Fine-Tuning
4.3.2 Additive-Parameter Efficient Fine-Tuning
4.3.3 Reparameterization-Based Efficient Fine-Tuning
4.3.4 Hybrid Efficient Fine-Tuning Methods
4.4 Reinforcement Learning from Human Feedback
4.4.1 Reinforcement Learning
4.4.2 Proximal Policy Optimization
4.4.3 Alignment with Human Feedback
4.5 Catastrophic Forgetting in Large Models
Chapter 5 Distributed Parallelism Techniques for Large Models
5.1 Distributed Systems
5.2 Data Parallelism
5.2.1 Input Data Partitioning
5.2.2 Model Parameter Synchronization
5.2.3 Data Parallelism Optimization
5.3 Model Parallelism
5.3.1 Tensor Parallelism
5.3.2 Pipeline Parallelism
5.3.3 Optimizer-Related Parallelism
5.4 Other Forms of Parallelism
5.4.1 Heterogeneous System Parallelism
5.4.2 Expert Parallelism
5.4.3 Multi-Dimensional Hybrid Parallelism
5.4.4 Automatic Parallelism
5.5 Parallel Training Frameworks
5.5.1 Megatron-LM
5.5.2 DeepSpeed
5.5.3 Colossal-AI
Chapter 6 Decoding and Inference Optimization Techniques for Large Language Models
6.1 Decoding Methods
6.1.1 Search-Based Decoding Methods
6.1.2 Sampling-Based Decoding Methods
6.2 Inference Optimization Methods
6.2.1 Principles of Inference
6.2.2 Inference Acceleration
6.3 Model Compression Techniques
6.3.1 Quantization
6.3.2 Pruning
6.3.3 Distillation
6.4 GPU Memory Optimization Techniques
6.4.1 Key-Value Cache
6.4.2 Attention Optimization
6.5 Operator Optimization Techniques
6.5.1 Operator Fusion
6.5.2 High-Performance Operators
6.6 Inference Acceleration Frameworks
6.6.1 HuggingFace TGI
6.6.2 vLLM
6.6.3 LightLLM
Chapter 7 Evaluation of Large Language Models
7.1 Evaluation Overview
7.2 Evaluation Framework
7.2.1 Knowledge and Capabilities
7.2.2 Ethics and Safety
7.3 Evaluation Methods
7.3.1 Automatic Evaluation
7.3.2 Human Evaluation
7.3.3 Other Evaluation Methods
7.4 Evaluation Domains
7.4.1 General Domains
7.4.2 Specific Domains
7.4.3 Comprehensive Evaluation
7.5 Evaluation Challenges
Chapter 8 Combining Large Language Models with Knowledge
8.1 Knowledge and Knowledge Representation
8.2 Introduction to Knowledge Graphs
8.3 Combining Large Language Models with Knowledge Graphs
8.4 Knowledge Graph-Enhanced Large Language Models
8.4.1 LLM Pre-training Stage
8.4.2 LLM Evaluation Stage
8.4.3 LLM Inference Stage
8.5 Large Language Model-Enhanced Knowledge Graphs
8.5.1 Knowledge Graph Embedding
8.5.2 Knowledge Graph Completion
8.5.3 Knowledge Graph Construction
8.5.4 Knowledge Graph-to-Text Generation
8.5.5 Knowledge Graph Question Answering
8.6 Synergy Between Large Language Models and Knowledge Graphs
8.6.1 Knowledge Representation
8.6.2 Knowledge Reasoning
8.7 Engineering Applications of Knowledge Retrieval-Augmented Large Language Models
8.7.1 Structured Data
8.7.2 Structured and Unstructured Data
8.7.3 Vector Databases
8.7.4 Knowledge Base Question Answering with LangChain
8.8 Future Directions
Chapter 9 Techniques and Applications of Multimodal Large Language Models
9.1 Multimodal Instruction Tuning
9.1.1 Modality Alignment
9.1.2 Data Collection
9.1.3 Modality Bridging
9.1.4 Model Evaluation
9.2 Multimodal In-Context Learning
9.3 Multimodal Chain of Thought
9.3.1 Modality Connection
9.3.2 Learning Paradigms
9.3.3 Chain Configuration and Form
9.4 LLM-Aided Visual Reasoning
9.4.1 Training Paradigms
9.4.2 Functional Roles
9.4.3 Model Evaluation
9.5 Extending LLMs to Agents
9.5.1 Agents
9.5.2 Memory Module
9.5.3 Task Planning
9.5.4 Action Module
9.5.5 Evaluation Strategies
9.6 Challenges for Multimodal Language Models
9.6.1 Technical Issues
9.6.2 Cost Issues
9.6.3 Social Issues
Chapter 10 Applications of Large Language Models
10.1 Law
10.1.1 Legal Prompt Research
10.1.2 Comprehensive Legal Evaluation
10.2 Education
10.2.1 Capability Assessment
10.2.2 Ethical Issues
10.2.3 Question-Answering Applications
10.3 Finance
10.3.1 Intelligent Application Scenarios
10.3.2 Difficulties and Challenges
10.4 Biomedicine
10.4.1 Potential and Value
10.4.2 Application Scenarios
10.4.3 Difficulties and Challenges
10.5 Code Generation
10.5.1 The Code Generation Problem
10.5.2 Large Language Models for Code
10.5.3 Development Trends
Chapter 11 Outlook and Conclusions
11.1 Limitations and Challenges
11.1.1 Limitations
11.1.2 Challenges
11.2 Directions and Recommendations
11.2.1 Data
11.2.2 Technology
11.2.3 Applications
11.2.4 Suggested Directions
11.3 Research Worth Exploring
11.3.1 Fundamental Theory Research
11.3.2 Efficient Computing Research
11.3.3 Safety and Ethics Research
11.3.4 Data and Evaluation Research
11.3.5 Cognitive Learning Problems
11.3.6 Efficient Adaptation Research
References