Site Reliability Engineering at Ignite 2018
Are you attending or watching Ignite 2018? Here are the resources on Site Reliability Engineering available in person or online at Ignite. Come find out more about this role and how to transform your career to take it on.
Site Reliability Engineering (SRE) is a new role for many folks in the Microsoft ecosystem, although it has existed for some time at major companies such as Google, LinkedIn, Facebook, and Etsy. Microsoft has been part of, and has helped drive, the translation of the SRE role into enterprise IT organizations, both within Microsoft and for our customers. At Ignite 2018, you will hear how this transformation into SRE looks from the Service Engineering mindset.
For those of you attending Ignite 2018 in person, please join me, my fellow SRE speakers, and other speakers on how to succeed with Azure in the Customer Success in Azure area of the Microsoft Showcase, to the left of the Landmark as you walk in. Look for it within the Applications & Infrastructure area. I will be in the booth from 2:00 pm to 5:00 pm on Monday, Sept. 24, 2018, and from 10:00 am to 1:00 pm on Wednesday, Sept. 26, 2018. While speakers on the SRE role will be in the Customer Success in Azure area at all times, please feel free to stop by during my shifts to learn about the change from IT Operations to the SRE role.
While the Customer Success in Azure area is a great opportunity for those of you here in Orlando, there are also ways for folks attending virtually or in person to get more information on the SRE role. We have four sessions about the SRE role during the week, presented by some great speakers. Join us to hear more about how the SRE role works and how IT Pros can move into the SRE role in their careers. These sessions will be available live for those in person, live-streamed for those unable to be here in person, and recorded for viewing afterward.
Please join David Blank-Edelman, Kishore Jalleda, Jason Hand, and me to understand how this role fits not only into single-service online companies but also into the corporate IT environment.
Tuesday, September 25, 2018
BRK2272 - Introducing Site Reliability Engineering
David Blank-Edelman, Microsoft
9:00 AM in OCCC W240 (45 min)
Just within the last fifteen years, we have seen at least two separate communities evolve from the generic idea of operations. The first, DevOps, grew up very much in public. The second, Site Reliability Engineering (SRE), germinated more within the halls of public cloud providers, but is now starting to catch on like wildfire throughout the industry in organizations of all sizes and stripes. SRE is providing them with a concrete approach for preserving the stability of their production environment while maintaining the feature velocity crucial for the success of the business. Join us while we explore the basic ideas behind SRE and talk about how you can get started implementing its principles and practices in your own organization.
BRK2314 - Incident response: Where SRE and DevOps collide
Kishore Jalleda, Microsoft
Jason Hand, Microsoft
10:45 AM in OCCC W205 (75 min)
What happens when things go wrong? The 1ES Site Reliability Engineering (SRE) team has built an effective incident response process that drives reliability and performance in their own services and services they depend on. We dive into what incident response looks like from notification or detection all the way through the post-mortem and remediation of the contributing factors.
Thursday, September 27, 2018
BRK4025 - Implementing SRE practices on Azure: SLI/SLO deep dive
David Blank-Edelman, Microsoft
9:00 AM in OCCC W311 A-D (45 min)
One of the most useful practices many organizations embrace when they first implement Site Reliability Engineering (SRE) is the adoption of Service Level Indicators (SLIs) and Service Level Objectives (SLOs). Once in place, they can serve as a concrete foundation for the tricky negotiation between feature velocity and operational stability crucial for achieving the desired reliability of your services, systems, and products. Join us for a technical deep dive as we explore the basics of SLIs/SLOs and the tools Microsoft Azure provides to help implement and manage them in your environment.
BRK2362 - The SRE role: An unexpected journey
Jared Shockley, Microsoft
10:45 AM in OCCC W304 E-H (75 min)
As the world of information technology advances, the corresponding roles and responsibilities also continue to evolve. Examining the progress from IT operations through service engineering and into site reliability engineering, IT pros will need a strategic development plan that builds on current skill sets.
In this session, we discuss the mindset required for effective site reliability engineering, including how to most efficiently grow career skills, utilize specific tools and processes, and incorporate lessons learned from inherent failures. We also analyze the results of platform moves to modern engineering practices and systems.
LinkedIn SREInCon 2018
While attending Microsoft's first SRECon earlier this year, I found out that our LinkedIn SRE sisters and brothers not only have their own conference but have been putting it on for several years now. I worked with my boss to get an invitation, but I also wanted to bring something to their conference, so I applied to speak.
First, I want to thank the LinkedIn SREInCon 2018 Committee for letting me not only attend but also speak at this year's event. It was a tremendous honor, and I learned a lot about how LinkedIn runs its services and site, how its SREs work to improve site reliability, and the integrated roles SREs play with the engineering teams they work with.
Additionally, I want to thank all of the attendees who came to my session. I had a great time with the topic, and it seemed well received. The Q&A at the end was amazing as well. As this was an internal conference, I cannot provide any details about my session or any of its resources.