Site Reliability Engineer (R6095)


At Elastic, we see endless possibility in a world of endless data. And we use the power of search to help people and organizations turn that possibility into results. Elastic is the leading platform for search-powered solutions. With solutions in Enterprise Search, Observability, and Security, we help improve customer and employee search experiences, keep critical applications running smoothly, and protect against cyber threats. Organizations worldwide use the power of Elastic, including Netflix, Uber, BBC, Microsoft, and thousands of others.

Elastic was built on a foundation of being free and open, which trickles down to how we work. We’re a distributed organization and have been from the beginning. Being distributed isn’t just a way of doing business—it’s a mentality that is at the core of our culture.

Elastic Cloud is the fast-growing flagship provider of Elastic’s products. As part of the global Platform Engineering team, the Managed Kubernetes Infrastructure SRE team is responsible for designing, building, and maintaining our next-generation multi-cloud Kubernetes environments, ensuring they can host both internal and customer-facing services such as Elasticsearch Stateless. This also includes developing and hosting services that themselves support the rest of the infrastructure, so that we can rapidly deploy products from all corners of the organization. We need help building scalable, reliable, and easy-to-maintain infrastructure so that we can offer a truly exceptional customer experience. This is where you come in!

What you will be doing

  • Building the next generation of Kubernetes Infrastructure to host internal and external services
  • Building features for our quickly expanding Kubernetes fleet management system
  • Building tooling and automation to support the new infrastructure's scaling needs
  • Mentoring internal customers on how to build well-behaved Kubernetes applications
  • Staffing regular, but humane, on-call to support uptime
  • Working closely with the sister teams in SRE to present a unified service platform to the entire organization

What you bring along

You don't need to have all of these items, but these represent the skills that would serve the team well.

  • You have built or managed a Kubernetes-at-scale infrastructure (e.g. 50+ clusters, 100+ nodes each), ideally across multiple cloud providers, and the necessary automation to support it.
  • You have extensive operational experience with either managed or self-hosted Kubernetes clusters and at least one etcd or CNI horror story to tell.
  • You enjoy solving problems with code, and you've created and used APIs.
  • You have experience in at least one programming language, its ecosystem, its testing framework, its build system, and you’ve probably tried to profile it. You also know its rough edges and there is at least one thing you really hate about it.
  • You enjoy learning new technology. Infrastructure work is building and gluing things together.
  • You know Linux. Even Kubernetes still runs on computers! This means you can debug most normal issues with performance, networking, kernel drivers, package management, etc., or have a good idea where to start.
  • You have used at least one of the major cloud providers deeply and can reason about cloud-native architecture, controlling costs, and interacting with its APIs.
  • You like to work on projects as a team and help to define outcomes that measure their success.
  • You can thrive in a diverse, distributed environment across continents and ethnicities, communicating well in written English.

Bonus Points:

  • Experience writing non-trivial programs in Go
  • Experience building infra with Infrastructure as Code tooling
  • Experience with at least one Kubernetes Service Mesh solution
  • Worked in a large SaaS environment hosted in a public cloud provider
  • Designed, implemented, diagnosed, and/or resolved issues with the Elastic Stack
  • Worked on projects using GitHub for version control and/or you’ve tracked work using GitHub Issues
  • Experience with open-source communities, maybe even a project that you’ve fostered yourself
  • Experience with Crossplane

Additional Information - We Take Care of Our People

As a distributed company, diversity drives our identity. Whether you’re looking to launch a new career or grow an existing one, Elastic is the type of company where you can balance great work with great life. Your age is only a number. It doesn’t matter if you’re just out of college or your children are; we need you for what you can do.

We strive to have parity of benefits across regions, and while regulations differ from place to place, we believe taking care of our people is the right thing to do.

  • Competitive pay based on the work you do here and not your previous salary
  • Health coverage for you and your family in many locations
  • Ability to craft your calendar with flexible locations and schedules for many roles
  • Generous number of vacation days each year
  • Double your charitable giving - We match donations 1:1 up to $1500 USD (or local currency equivalent)
  • Up to 40 hours each year to use toward volunteer projects you love
  • Embracing parenthood with a minimum of 16 weeks of parental leave

Different people approach problems differently. We need that. Elastic is an equal opportunity/affirmative action employer committed to diversity, equity, and inclusion. Qualified applicants will receive consideration for employment without regard to race, ethnicity, color, religion, sex, pregnancy, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, disability status, or any other basis protected by federal, state or local law, ordinance or regulation.

We welcome individuals with disabilities and strive to create an accessible and inclusive experience for all individuals. To request an accommodation during the application or the recruiting process, please email We will reply to your request within 24 business hours of submission.

Applicants have rights under Federal Employment Laws; view the posters linked below:
Family and Medical Leave Act (FMLA) Poster; Equal Employment Opportunity (EEO) Poster; and Employee Polygraph Protection Act (EPPA) Poster.

Please see here for our Privacy Statement.
