Senior Site Reliability Engineer (Golang, K8s) (R5066)


At Elastic, we see endless possibility in a world of endless data. And we use the power of search to help people and organizations turn that possibility into results. Elastic is the leading platform for search-powered solutions. With solutions in Enterprise Search, Observability, and Security, we help improve customer and employee search experiences, keep critical applications running smoothly, and protect against cyber threats. Organizations worldwide, including Netflix, Uber, BBC, Microsoft, and thousands of others, use the power of Elastic.

Elastic was built on a foundation of being free and open, which trickles down to how we work. We’re a distributed organization and have been from the beginning. Being distributed isn’t just a way of doing business—it’s a mentality that is at the core of our culture.

What you will be doing

  • Investigate performance issues in our fleet and implement any necessary changes
  • Solve large-scale problems with code – we are developers first
  • Make configuration changes across thousands of VMs efficiently
  • Automate system testing to validate VM image builds and live running systems
  • Enhance observability of our systems so that we catch problems earlier
  • Ensure our systems can talk to each other across the network
  • Build features for our quickly expanding Kubernetes fleet management system
  • Maintain our CDN and associated services
  • Model for our internal customers how to deploy well-behaved Kubernetes applications
  • Staff regular, but humane, on-call to support uptime
  • Work closely with the sister teams in SRE to present a unified service platform to the entire organization

What you bring along

You don't need to have every one of these items, but these represent the skills that would serve the team well, and the more you have, the more you would enjoy the job.

  • You think in terms of core SRE tenets – error budgets, SLOs, and percentages. At our scale things always fail. The question is, “how much is too much?”
  • You like the configuration management space (Chef, Salt, Puppet, or Ansible)
  • You enjoy troubleshooting performance issues, you’ve built multiple dashboards (Kibana, Grafana, Prometheus, etc.), and you can maybe tell a story about a sysctl setting that’s bitten you in the past.
  • You have either used Kubernetes or want to ramp up with it quickly
  • You understand the differences between virtualization and containerization. You know how to troubleshoot applications running inside Linux containers, and you know how container networking works.
  • You have good knowledge of DNS, HTTP, caching, proxies, and CDNs
  • You understand Linux networking and feel comfortable using tcpdump to troubleshoot iptables issues. You know what CNIs are and have maybe used them to isolate k8s applications.
  • You have experience in at least one programming language, its ecosystem, its testing framework, its build system, and you’ve probably tried to profile it. You also know its rough edges and there is at least one thing you really hate about it. You’ve written something non-trivial with it, preferably a service with async code.
  • You enjoy learning new technology. Infrastructure work is building and gluing things together. You will not always be able to glue something on that you know well. You’re not afraid to read and understand code that you didn’t write.
  • You know Linux. Even Kubernetes still runs on computers! This means you can debug most normal issues with performance, networking, kernel drivers, package management, etc., or have a good idea where to start.
  • You have used at least one of the major cloud providers deeply and can reason about cloud-native architecture, controlling costs, and interacting with its APIs.
  • You like to work on projects as a team and help to define outcomes that measure their success.
  • You can thrive in a diverse, distributed environment across continents and ethnicities, communicating well in written English.

Bonus Points

  • Experience writing advanced Go
  • Experience writing Rust, Clojure, or Haskell
  • Experience building infra with Terraform, even better if you’ve written automation around Terraform
  • You have managed one or more production applications on Kubernetes.
  • You have used strace or eBPF to get insights into what an application is doing because you didn’t have access to the source code.
  • Worked in a large SaaS environment in a public cloud provider
  • Designed, implemented, diagnosed, and/or resolved issues with the Elastic Stack
  • Worked on projects using GitHub for version control and/or you’ve tracked work using GitHub Issues
  • Have mentored, coached, and grown team members to bring out their best
  • Experience with open-source communities, maybe even a project that you’ve fostered yourself

Additional Information - We Take Care of Our People

As a distributed company, diversity drives our identity. Whether you’re looking to launch a new career or grow an existing one, Elastic is the type of company where you can balance great work with great life. Your age is only a number. It doesn’t matter if you’re just out of college or your children are; we need you for what you can do.

We strive to have parity of benefits across regions and while regulations differ from place to place, we believe taking care of our people is the right thing to do.

  • Competitive pay based on the work you do here and not your previous salary
  • Health coverage for you and your family in many locations
  • Ability to craft your calendar with flexible locations and schedules for many roles
  • Generous number of vacation days each year
  • Double your charitable giving - We match donations 1:1 up to $1500 USD (or local currency equivalent)
  • Up to 40 hours each year to use toward volunteer projects you love.
  • Embracing parenthood with a minimum of 16 weeks of parental leave

Different people approach problems differently. We need that. Elastic is an equal opportunity/affirmative action employer committed to diversity, equity, and inclusion. Qualified applicants will receive consideration for employment without regard to race, ethnicity, color, religion, sex, pregnancy, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, disability status, or any other basis protected by federal, state or local law, ordinance or regulation.

We welcome individuals with disabilities and strive to create an accessible and inclusive experience for all individuals. To request an accommodation during the application or the recruiting process, please email us. We will reply to your request within 24 business hours of submission.

Applicants have rights under Federal Employment Laws; view the posters linked below:
Family and Medical Leave Act (FMLA) Poster; Equal Employment Opportunity (EEO) Poster; and Employee Polygraph Protection Act (EPPA) Poster.

Please see here for our Privacy Statement.

