Site Reliability Engineer II (Core SRE) (R4364)


At Elastic, we see endless possibility in a world of endless data. And we use the power of search to help people and organizations turn that possibility into results. Elastic is the leading platform for search-powered solutions. With solutions in Enterprise Search, Observability, and Security, we help improve customer and employee search experiences, keep critical applications running smoothly, and protect against cyber threats. Organizations worldwide, including Netflix, Uber, BBC, Microsoft, and thousands of others, use the power of Elastic.

Elastic was built on a foundation of being free and open, which trickles down to how we work. We’re a distributed organization and have been from the beginning. Being distributed isn’t just a way of doing business—it’s a mentality that is at the core of our culture.

Thanks to our ongoing expansion, we have the opportunity to grow our Cloud Reliability team. As part of Elastic Cloud engineering, we focus on delivering a reliable and resilient Elastic Cloud. We draw on our operational experience not only to troubleshoot issues with distributed systems, but also to influence the design of Elastic Cloud so that it remains a stable and reliable service. We’re looking for people who are as passionate about taking an engineering approach to operational problems as they are about using data and feedback to solve problems collaboratively.

In this role you will:

  • Lead technical initiatives aimed at improving the reliability of Elastic Cloud, taking an engineering approach to the prevention, detection, and timely mitigation of issues.
  • Contribute to SRE engineering through auto-remediation and systems engineering, continuing our work to reduce human intervention by automating processes and operational tasks.
  • Respond to major incidents, then correct and improve systems to prevent future incidents and to support growth at scale.
  • Solve the operational problems that you find in Elastic Cloud with full support from your team. You will contribute to a culture of elevating others, collaboration, and operational excellence.
  • Participate in a weekly on-call rotation, using a follow-the-sun model.

What you bring along:

  • You have a holistic view of and true appreciation for reliability, born of real-world experience operating production services. You have examples of using software engineering and SRE practices to solve operational problems.
  • You have a background in software engineering and can confidently collaborate with engineers to identify and resolve issues. Ideally, you have experience with public cloud (AWS, GCP, or Azure), preferably with distributed systems at scale.
  • You have outstanding interpersonal skills and are able to build strong relationships through inclusive communication. Experience working in distributed teams or working remotely is desirable.

Bonus Points:

You don't need to have all of these items, but they represent the types of work you will do on our Core SRE team.

  • You have operated a SaaS product in a public cloud (AWS, GCP, Azure, or SoftLayer preferred).
  • You have experience in system administration with professional skills in Linux on distributed systems at scale.
  • You have designed, implemented or diagnosed and resolved issues with the Elastic Stack.
  • You have demonstrable experience leading the adoption of alerting and major incident management best practices.
  • You are experienced in contributing in a self-organizing and collaborative team environment.
  • You have mentored, coached, and grown team members to bring out the best in them.
  • You are comfortable writing software to automate orchestration tasks at scale (we commonly use Python, Go, and Shell scripting).
  • You have used metrics systems (e.g. Elastic Stack, Graphite, Prometheus, Influx) effectively to diagnose issues and quantify impacts, sharing this information with others at varying levels in the organization.
  • You have worked with containerized services (such as Docker).

Additional Information - We Take Care of Our People

As a distributed company, diversity drives our identity. Whether you’re looking to launch a new career or grow an existing one, Elastic is the type of company where you can balance great work with great life. Your age is only a number. It doesn’t matter if you’re just out of college or your children are; we need you for what you can do.

We strive for parity of benefits across regions, and while regulations differ from place to place, we believe taking care of our people is the right thing to do.

  • Competitive pay based on the work you do here and not your previous salary
  • Health coverage for you and your family in many locations
  • Ability to craft your calendar with flexible locations and schedules for many roles
  • Generous number of vacation days each year
  • Double your charitable giving - We match donations 1:1 up to $1500 USD (or local currency equivalent)
  • Up to 40 hours each year to use toward volunteer projects you love
  • Embracing parenthood with a minimum of 16 weeks of parental leave

Different people approach problems differently. We need that. Elastic is an equal opportunity/affirmative action employer committed to diversity, equity, and inclusion. Qualified applicants will receive consideration for employment without regard to race, ethnicity, color, religion, sex, pregnancy, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, disability status, or any other basis protected by federal, state or local law, ordinance or regulation.

We welcome individuals with disabilities and strive to create an accessible and inclusive experience for all individuals. To request an accommodation during the application or the recruiting process, please email We will reply to your request within 24 business hours of submission.

Applicants have rights under federal employment laws; view the posters linked below:
Family and Medical Leave Act (FMLA) Poster; Equal Employment Opportunity (EEO) Poster; and Employee Polygraph Protection Act (EPPA) Poster.

Please see here for our Privacy Statement.
