97 Things Every SRE Should Know: Collective Wisdom from the Experts

Emil Stolarsky, Jaime Woo

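  • Publisher: O'Reilly
  • Publication date: 2020-12-29
  • List price: $1,700
  • Sale price: $1,615 (5% off)
  • VIP price: $1,530 (10% off)
  • Language: English
  • Pages: 252
  • Binding: Quality Paper (also called trade paper)
  • ISBN: 1492081493
  • ISBN-13: 9781492081494
  • Related category: DevOps
  • Related translation: SRE工程師應知應會97件事 (Simplified Chinese edition)
  • Ships immediately (stock: 1)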

Product description

When your system goes down, every minute means lost business and angry customers venting frustration on social media. You may be at your wits' end, wishing you knew more about the problem. Enter site reliability engineering (SRE). This practical book offers actionable advice on a wide range of topics, including how to adopt SRE, where DevOps and SRE overlap, and how monitoring and observability differ.

Editors Jaime Woo and Emil Stolarsky, cofounders of Incident Labs, have collected 97 concise and useful tips from various colleagues and fellow professionals to help you expand your SRE skills through trusted best practices and new approaches to knotty problems. You'll hone your SRE skills through sound advice, including how to ask thought-provoking questions that will drive the direction of the field.

  • Learn how SRE relates to concepts including DevOps and resilience engineering
  • Assess how SRE is implemented across companies of different sizes
  • Implement foundational concepts of SRE, including SLOs, error budgets, incident response, game days, and post-mortems
  • Build and scale an SRE team for your organization's changing needs
  • Evaluate the progress of SRE adoption and strategies and relate them back to stakeholders

About the authors

Emil Stolarsky is a site reliability engineer who previously worked on caching, performance, and disaster recovery at Shopify and on the internal Kubernetes platform at DigitalOcean. He is the program co-chair for SREcon EMEA 2019 and SREcon Americas West 2020, and contributed a chapter to the O'Reilly book "Seeking SRE."

Jaime Woo is an award-nominated writer and a frequent speaker at SREcon EMEA, Americas West, and Americas East. He spent three years as a molecular biologist before working at DigitalOcean, Riot, and Shopify, where he launched the engineering communications function.
