Note: as the video recording was lost due to a technical issue, the recovered low-quality audio has been remixed with the slides kindly provided by the speaker.

SLAs, SLIs, SLOs… oh my! What are all these definitions, and why should I need them? If it is unreliable, even the best-designed API is useless, and your clients will go elsewhere. But you should not overdo it either: failure should be accepted and permitted, because 100% availability is a myth. How do you define your reliability level? How many nines should your uptime have? How do you know if you're monitoring the right metrics? Do you know what an error budget is and how to use it? If reliability is a feature, how should it be prioritised against other features? In this talk, we will discuss site reliability engineering, a current trend even though the topic itself is not that new. I will also talk about how to implement this methodology and how to improve the reliability of our Conversational AI services at Swisscom, which serve as the main customer entry point.