Site Reliability Engineering: How Google Runs Production Systems (Paperback)
Tentative Chinese title: 網站可靠性工程:Google 如何運行生產系統 (Paperback)

Niall Richard Murphy, Betsy Beyer, Chris Jones, Jennifer Petoff

Product Description

Building and operating distributed systems is fundamental to large-scale production infrastructure, but doing so in a scalable, reliable, and efficient way requires careful design and a good deal of trial and error. In this collection of essays and articles, key members of the Site Reliability Team at Google explain how the company has successfully navigated these deep waters over the past decade.

You’ll learn how Google continuously monitors and deploys some of the largest software systems in the world, how its Site Reliability Engineering team learns and improves after outages, and how the team uses error budgets to balance risk-taking against reliability.
