Observability - Principal Software Engineer (R4220)


At Elastic, we see endless possibility in a world of endless data. And we use the power of search to help people and organizations turn that possibility into results. Elastic is the leading platform for search-powered solutions. With solutions in Enterprise Search, Observability, and Security, we help improve customer and employee search experiences, keep critical applications running smoothly, and protect against cyber threats. Organizations worldwide, including Netflix, Uber, BBC, Microsoft, and thousands of others, use the power of Elastic.

Elastic was built on a foundation of being free and open, which trickles down to how we work. We’re a distributed organization and have been from the beginning. Being distributed isn’t just a way of doing business—it’s a mentality that is at the core of our culture.

The Observability team develops solutions for application developers and for the engineers who run the infrastructure and services supporting these applications. Elasticsearch is an efficient datastore for logs, metrics, and application traces, supporting the three pillars of observability. With the recent acquisition of Optimyze, the developers of Prodfiler, Elastic has added support for datacenter-wide, whole-system, continuous profiling. In the Profiling team within Observability, our mission is to profile all your code, everywhere, all the time: every programming language, every containerisation solution, userland and kernel.

As a Profiling Software Engineer, you will be part of a team developing innovative, high-quality solutions for CPU, memory and I/O profiling, aimed at helping our customers understand the performance characteristics of their applications down to the very lines of code responsible for consuming resources of all kinds. As a team we thrive on, and encourage, deep understanding of problems and well-engineered solutions. Our work is a mix of low-level systems understanding and engineering, combined with large-scale data transfer, storage and analysis. On a day-to-day basis this can involve reverse engineering the internals of a programming language runtime, debugging the Linux kernel, implementing ideas from state-of-the-art research papers, developing novel approaches to systems engineering problems, code reviews, writing design documents, participating in our engineering mailing list, debugging issues in our production infrastructure, talking to customers and more.

The team is distributed across the world, predominantly in Europe, and collaborates on a daily basis over GitHub, mailing lists, Zoom, and Slack.

Why work with us

Work on hard problems - Our long-term vision is observability both at scale (entire data center) and in depth (entire operating system). This requires solving a variety of problems involving large-scale data gathering/transmission/storage as well as deep understanding of operating system and programming language internals. Along the way we are continuously looking for new technologies (e.g. eBPF) to help us solve these problems.

Work on meaningful problems - Deep/wide observability is a competitive differentiator. Companies that have it can detect problems and opportunities that their competitors cannot, leading to more efficient, scalable and better-performing solutions. The margin/cost-of-goods-sold for SaaS companies is largely related to their cloud costs, and improving it is a C-level goal for many companies.

Work with smart and curious people - Our existing team has a background in diverse areas and technologies. We hire people with a love of deep understanding and building great software. 

Work in a great environment - Elastic strives to ensure employees maintain work-life balance while still being ambitious and playing to win. 

What you will be working on

  • The mission of this role is to expand our capabilities for in-production profiling. As it stands, we support always-on CPU profiling, and we wish to expand its capabilities while also extending our features to include memory profiling, I/O profiling, and network profiling. The goal is to enable our customers to get deep insights into their entire technology stack in production environments.
  • The problems that we address can be broadly broken down into the following two areas. Some of our engineers work across both areas, while others are specialized into one or the other. We are open to hiring people that would like to fit into either mould.
    • On-host agent - We have an on-host agent that combines an eBPF component that runs inside the Linux kernel with a userland component that is written in Go. At a high level, an engineer working on the agent is concerned with extracting as much profiling information as they can from the system with the lowest performance impact possible. We pride ourselves on our agent consuming < 1% of CPU. Ensuring we stay there while also profiling every major language requires deep systems and programming language runtime understanding, and a dedication to well-engineered solutions.
    • Storage backend - Our profiling solution is designed to allow users to store everything and filter later, instead of using aggregations or other solutions that reduce data intake at the expense of accuracy or exploratory use cases. This requires us to think carefully and from first principles about both the hardware and software aspects of how we ingest, store, transform and read data. If the idea of efficiently storing and querying billions of events per day excites you, then you’ll enjoy working here!
  • We believe that engineers that build software should have some hand in operating it. Operating a SaaS brings engineers closer to the consequences of their engineering decisions, and this feedback is invaluable when one needs to squeeze every ounce of performance from their platform. We encourage everyone to get involved with deployments and debugging platform issues.
  • We also believe that engineers should be in contact with the problems their software is solving. This feedback inspires new product ideas, shows up deficiencies, and generally ensures we’re building the right things. We encourage all our engineers to both use our software and to talk to our customers, understand their use cases, and help solve their problems.
  • Software quality is hugely important to us. We value high performance, stability and maintainability. In practice this means everyone participates in code reviews, and we encourage engineers to dedicate time to improving and adding to our automated testing and benchmarking pipelines. 
  • We emphasize written documents for software design, and everyone participates in both the writing and reviewing of design documents. We also have an engineering mailing list for technical discussions in which everyone is encouraged to contribute. 

What you will bring along

  • Deep understanding of one or more of:
    • Operating system internals e.g. experience with the Linux kernel, eBPF
    • Programming language runtime internals e.g. OpenJDK, V8, Python, .NET
    • Compiler technologies and SDKs, e.g. profile-guided optimisation (PGO, AutoFDO), link-time optimisation (LTO), JIT compilers, clang & gcc internals
    • Building high-performance data ingestion, storage and querying platforms
    • Database internals e.g. ClickHouse, Scylla, Elasticsearch
  • 5+ years of experience in developing and shipping software  
  • A willingness to engage in all of the important tasks that are not purely writing code e.g. design documents, code reviews, debugging, interacting with customers, operations
  • Ability to work independently in a globally distributed team. We are predominantly based in Europe (Central European Time), and having 2-3 hours of overlap in your workday with 9-6 CET would be ideal. 

Bonus Skills

  • Previous experience working on observability platforms
  • Previous experience with Elasticsearch, eBPF, time series databases (e.g. ClickHouse), Go, Rust, C++
  • Experience building debuggers, disassemblers, or other similar tooling for low-level software introspection and understanding

Additional Information - We Take Care of Our People

As a distributed company, diversity drives our identity. Whether you’re looking to launch a new career or grow an existing one, Elastic is the type of company where you can balance great work with great life. Your age is only a number. It doesn’t matter if you’re just out of college or your children are; we need you for what you can do.

We strive to have parity of benefits across regions and while regulations differ from place to place, we believe taking care of our people is the right thing to do.

  • Competitive pay based on the work you do here and not your previous salary
  • Health coverage for you and your family in many locations
  • Ability to craft your calendar with flexible locations and schedules for many roles
  • Generous number of vacation days each year
  • Double your charitable giving - We match donations 1:1 up to $1500 USD (or local currency equivalent)
  • Up to 40 hours each year to use toward volunteer projects you love
  • Embracing parenthood with a minimum of 16 weeks of parental leave

Different people approach problems differently. We need that. Elastic is an equal opportunity/affirmative action employer committed to diversity, equity, and inclusion. Qualified applicants will receive consideration for employment without regard to race, ethnicity, color, religion, sex, pregnancy, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, disability status, or any other basis protected by federal, state or local law, ordinance or regulation.

We welcome individuals with disabilities and strive to create an accessible and inclusive experience for all individuals. To request an accommodation during the application or the recruiting process, please email candidate_accessibility@elastic.co. We will reply to your request within 24 business hours of submission.

Applicants have rights under Federal Employment Laws; view the posters linked below:
Family and Medical Leave Act (FMLA) Poster; Equal Employment Opportunity (EEO) Poster; and Employee Polygraph Protection Act (EPPA) Poster.

Please see here for our Privacy Statement.
