Observability #
GitHub awesome-observability
Origins and Definitions #
1 Standards #
2 Analyst Reports #
3 Patents & Papers #
4 Open Source, Vendors & Cloud #
CNCF #
Elastic #
Splunk #
PingCAP #
Google Cloud #
5 Talks, Presentations & Blogs #
- 2022-07-05 Building observability capabilities that fit your organization: cracking "network mysteries" through hands-on practice https://mp.weixin.qq.com/s?__biz=MzU1MzY4NzQ1OA==&mid=2247510126&idx=1&sn=79a48c3983fbf31627c810b59eb3f292
- 2021-11-23 Netis: Understanding the difference between "monitoring" and "observability" in one article https://www.netis.com/2021/11/23/%e4%b8%80%e6%96%87%e8%af%bb%e6%87%82%e7%9b%91%e6%8e%a7%e4%b8%8e%e5%8f%af%e8%a7%82%e6%b5%8b%e6%80%a7%e7%9a%84%e5%8c%ba%e5%88%ab/
- 2021-10-15 PingCAP: Building foundational software that people can't put down: observability and interactivity https://mp.weixin.qq.com/s?__biz=MzI3NDIxNTQyOQ==&mid=2247497795&idx=1&sn=0cb882d6e2f71c7aea1bcdf6e7bffcb3
- 2021-03-01 Kubernetes Stability Assurance Handbook: Logging https://mp.weixin.qq.com/s/5ezU9Z6f1-Q8YyRU7O5ZXA
- 2020-12-07 Ship and visualize your Istio virtual service traces with AWS X-Ray https://aws.amazon.com/cn/blogs/containers/ship-and-visualize-your-istio-virtual-service-traces-with-aws-x-ray/
- 2020-10-15 Meituan: AIOps exploration and practice at Meituan: fault detection https://mp.weixin.qq.com/s/AjE7uP7ApVPyL_HdQDkk5g
- 2020-08-17 Splunk: Is observability just old wine in a new bottle? https://mp.weixin.qq.com/s/EFkggMBYt1Wxvudhx6y7gA
- 2020-05-14 Service Mesh high availability in enterprise production practice https://www.infoq.cn/article/tcVNEFUWTEdIECFyt2rg
- 2020-05-07 Elastic: Elastic Observability in SRE and incident response https://www.elastic.co/cn/blog/elastic-observability-sre-incident-response
- 2020-05-01 Cloud-native observability explained in ten thousand words https://zhuanlan.zhihu.com/p/137672436
- 2019-09 Cracking cloud-native observability with the Elastic Stack https://elasticsearch.cn/slides/232
- 2019-08-11 Ant Financial's exploration and practice of observability in a cloud-native architecture https://www.sofastack.tech/blog/sofa-meetup-3-cloud-original-retrospect/
- 2019-03-26 The Importance of Distributed Tracing for Apache Kafka Based Applications https://www.confluent.io/blog/importance-of-distributed-tracing-for-apache-kafka-based-applications/
- 2019-03-01 Elastic: Observability with the Elastic Stack https://www.elastic.co/cn/blog/observability-with-the-elastic-stack
- 2019-02-06 Tencent: Tianjige (天机阁), design and implementation of a full-link tracing system https://www.infoq.cn/article/JF-144XPDqDxxdizdfwT
- 2018-05-03 opensource.com:How the four components of a distributed tracing system work together https://opensource.com/article/18/5/distributed-tracing
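- 2018-04-06 Microservice architecture: distributed tracing (Nginx) https://my.oschina.net/yu120/blog/1790419
- 2017-10-07 Alibaba: The digital backbone behind end-to-end stability: Alibaba's EagleEye (鹰眼) technology demystified https://mp.weixin.qq.com/s/xyJ4GB955PoOXk7UOMqGBw
- 2017-05-31 Alibaba: Best practices for building a multi-dimensional monitoring system: distributed call tracing and monitoring in practice https://developer.aliyun.com/article/91435
- 2016-10-14 Meituan: Architecture design and practice of a distributed session tracing system https://tech.meituan.com/2016/10/14/mt-mtrace.html
- 2016-04-01 Performance monitoring in production with Datadog https://tech.glowing.com/cn/performance-monitoring-with-datadog/
- 2014-03-28 Weibo: Watchman, the Weibo platform's tracing and service quality assurance system https://www.infoq.cn/article/weibo-watchman
6 Case Studies #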
Elastic
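- Box: Deploying the Elastic Stack for observability, one microservice at a time
- Entel: Choosing Elastic to centralize multi-country observability data for a 360-degree view
- Delhivery: Powering operational excellence for third-party logistics
7 Industry News #
- 2021-02-17 OpenTelemetry Specification v1.0.0 | Tracing edition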
- 2021-02-11 Datadog Acquires Timber Technologies
- 2020-11-24 Splunk to Acquire Network Performance Monitoring Leader Flowmill
- 2020-10-22 OpenTelemetry, which unifies traces, metrics, and logs, releases its tracing specification RC and a GA plan
- 2020-10-20 Splunk to Acquire Plumbr and Rigor, Expanding the World’s Most Comprehensive Observability Portfolio
- 2019-10-02 Splunk Acquires SignalFx
- 2019-04-20 Merging OpenTracing and OpenCensus: A Roadmap to Convergence
Source: https://www.splunk.com/en_us/about-splunk/acquisitions.html
8 Other #
8.1 Structured Logging #
- OpenTelemetry Log Data Model
- Kubernetes - Introducing Structured Logs
- https://github.com/kubernetes/enhancements/tree/master/keps/sig-instrumentation/1602-structured-logging
- https://kubernetes.io/docs/concepts/cluster-administration/system-logs
- https://docs.docker.com/config/containers/logging/configure/#supported-logging-drivers
- Sumo Logic - Structured Logging
- Splunk - Logging best practices
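The references above all describe the same basic idea: emit log entries as key/value data rather than free-form text. A minimal sketch using only Python's standard `logging` module is shown here; the `JsonFormatter` class, the `build_logger` helper, the `fields` convention, and the output key names (`ts`, `level`, `logger`, `msg`) are illustrative assumptions, not taken from any of the linked guides.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "ts": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Extra key/value context passed as extra={"fields": {...}}
        # (a convention of this sketch, not a logging-module feature).
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, ensure_ascii=False)


def build_logger(stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("structured-demo")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = build_logger(sys.stdout)
    log.info("pod deleted", extra={"fields": {"pod": "kube-dns", "status": "ready"}})
```

Keeping the message constant and moving variable data into separate fields is what makes such entries easy to filter and aggregate downstream.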
Log field mapping reference, covering conversions between the following formats and the OpenTelemetry Log Data Model:
- RFC5424 Syslog
- Windows Event Log
- SignalFx Events
- Splunk HEC
- Log4j
- Zap
- Apache HTTP Server access log
- CloudTrail Log Event
- Google Cloud Logging
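To make the idea of a field mapping concrete, here is a rough sketch that converts one RFC 5424 syslog line into a dictionary shaped like an OpenTelemetry log record. The function name `syslog_to_otel`, the attribute keys (`syslog.facility`, `syslog.procid`, and so on), and the severity numbers reflect one reading of the data model's mapping appendix and should be treated as assumptions to verify against the specification. The parser is deliberately simplified: it does not handle escaped `]` inside structured data.

```python
import re
from datetime import datetime

# Syslog severity (0-7) -> (SeverityNumber, SeverityText).
# The numbers are an assumption based on the spec's mapping appendix.
_SEVERITY = {
    0: (21, "emerg"), 1: (19, "alert"), 2: (18, "crit"), 3: (17, "err"),
    4: (13, "warning"), 5: (10, "notice"), 6: (9, "info"), 7: (5, "debug"),
}

# <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID STRUCTURED-DATA [MSG]
_RFC5424 = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) (?P<ts>\S+) (?P<host>\S+) "
    r"(?P<app>\S+) (?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<sd>-|(?:\[[^\]]*\])+)(?: (?P<msg>.*))?",
    re.DOTALL,
)


def syslog_to_otel(line: str) -> dict:
    """Map one RFC 5424 message onto OpenTelemetry-style log record fields."""
    m = _RFC5424.fullmatch(line)
    if m is None:
        raise ValueError("not an RFC 5424 message")
    facility, severity = divmod(int(m["pri"]), 8)
    number, text = _SEVERITY[severity]
    record = {
        "SeverityNumber": number,
        "SeverityText": text,
        "Body": m["msg"] or "",
        "Resource": {},
        "Attributes": {
            "syslog.facility": facility,
            "syslog.version": int(m["version"]),
        },
    }
    if m["ts"] != "-":
        # Accept a trailing "Z" on Python versions older than 3.11.
        dt = datetime.fromisoformat(m["ts"].replace("Z", "+00:00"))
        # Nanoseconds since the Unix epoch.
        record["Timestamp"] = int(dt.timestamp()) * 10**9 + dt.microsecond * 1000
    # "-" is the RFC 5424 nil value; such fields are omitted.
    for group, target, key in (
        ("host", "Resource", "host.name"),
        ("app", "Resource", "service.name"),
        ("procid", "Attributes", "syslog.procid"),
        ("msgid", "Attributes", "syslog.msgid"),
        ("sd", "Attributes", "syslog.structured_data"),
    ):
        if m[group] != "-":
            record[target][key] = m[group]
    return record


if __name__ == "__main__":
    sample = ("<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - "
              "'su root' failed for lonvick on /dev/pts/8")
    print(syslog_to_otel(sample))
```

The sample line is adapted from the examples in RFC 5424; priority 34 decodes to facility 4 and severity 2.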
8.2 Log Formats #
There are many log formats, for example:
- JSON: Write Logs for Machines, use JSON
- Common Log Format
- RFC3164 Syslog
- CEE(Common Event Expression)
- Custom nginx log format
These formats are also referred to as codecs; see the Logstash codec plugins for details.
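As a rough illustration of what a codec does, the sketch below decodes one Common Log Format line into a dictionary and re-encodes it as JSON. The function names `decode_clf` and `encode_json` and the output key names are hypothetical choices for this example, and it is not how Logstash itself implements codecs.

```python
import json
import re
from datetime import datetime

# host ident authuser [date] "request" status bytes
_CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def decode_clf(line: str) -> dict:
    """Decode one Common Log Format line into a flat event dictionary."""
    m = _CLF.fullmatch(line.strip())
    if m is None:
        raise ValueError("not a Common Log Format line")
    when = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return {
        "host": m["host"],
        "user": None if m["user"] == "-" else m["user"],
        "timestamp": when.isoformat(),
        "request": m["request"],
        "status": int(m["status"]),
        # "-" means no response body was sent.
        "size": None if m["size"] == "-" else int(m["size"]),
    }


def encode_json(event: dict) -> str:
    """Encode an event dictionary as a single JSON line."""
    return json.dumps(event, ensure_ascii=False)


if __name__ == "__main__":
    line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
            '"GET /apache_pb.gif HTTP/1.0" 200 2326')
    print(encode_json(decode_clf(line)))
```

Separating the decode and encode steps mirrors the way a pipeline can read one format on input and write another on output.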