How Twitter Monitors Millions of Time-series
Complex distributed systems and services present new problems for traditional monitoring and debugging tools. These systems are significantly larger, produce far more data, and have many more interconnected and interdependent components to consider. As existing monitoring solutions can no longer keep pace, it is vital to develop new tools and architectures that enable developers and analysts to understand and scale today's large systems. The Twitter Observability stack is one such tool set, operating at very large scale to monitor the distributed system that is Twitter.
This talk will cover the end-to-end architecture of the stack, starting from instrumenting services and applications, through the storage technologies for managing millions of time series, and finally to the analytic and monitoring tools that let users put their data to work building and maintaining large-scale distributed systems. It will share lessons on how to keep many classes of users happy, including developers and operations staff, as well as lessons on how to build and scale monitoring stacks for your application, whether big or small.
Outline
- Starting with good data – what you measure is one of the most important aspects.
- The importance of low-level and high-level aggregate data in distributed systems, including the importance of data dependencies.
- The architecture for collecting and moving data from the applications through the system, and lessons learned along the way.
- Storage and how to manage explosive, continual growth of time series and trace datasets.
- How to delight users with powerful tools to query and visualize their data.
- Automatic monitoring and alerting: how to scale automated monitoring and prevent failures before they happen.
Yann Ramin
Senior Software Engineer, Twitter, Inc.
Yann is the technical lead of the Twitter Observability group, where he is responsible for guiding the growth and scale of the Observability service for all of Twitter. Yann is a software engineer with over 10 years of experience building large-scale distributed monitoring systems, ranging from thousand-node wireless sensor network and control systems to large-scale software and service monitoring stacks.