Senior Site Reliability Engineer - Klaviyo

Salary

156k - 235k USD Annually

    Job description

    Klaviyo is growing fast and we have openings for all skill levels across all of our teams. Learn more about our engineering culture at https://klaviyo.tech

    Site Reliability Engineering (SRE) is what you get when you treat system operations as a software engineering problem. The mission of the Site Reliability Engineering team is to provide services, tooling, and guidance to Klaviyo's product engineers to make them more productive and ensure their services are sufficiently reliable, scalable, and secure.

    The SRE team builds foundational backend services as well as tooling and automation to allow product teams to release and scale their software reliably and predictably. SREs are team players who work collaboratively amongst themselves and with engineers from product teams to build the platform Klaviyo relies on to power its products.

    As a Senior Site Reliability Engineer you will own multiple foundational Klaviyo services and make a big impact on the productivity of our product engineering teams.

    How you will make a difference:

    • Ship foundational services to enable Klaviyo engineering to move faster with confidence
    • Design and develop the systems and processes that keep Klaviyo's services highly available and scalable
    • Design, build and deliver software to dramatically improve the availability, scalability, latency, and efficiency of Klaviyo’s services
    • Achieve breakthroughs in system throughput by identifying and eliminating bottlenecks
    • Leverage technologies such as Python, Go, Bash, Django, AWS, Kubernetes, Terraform, MySQL, Apache Pulsar, Redis, and ClickHouse to advance Klaviyo’s platform
    • Champion best practices by actively collaborating with other teams in a culture that values technical design review
    • Contribute to the company as a subject matter expert in multiple areas, constantly pushing yourself to be a better engineer and leveling up your peers on your team and across Klaviyo
    • Mentor and pair with other Klaviyo engineers to build better software, focusing on performance, self-healing systems, configuration as code, defensive programming, application security, and more
    • Participate in periodic on-call duties, focusing on solving issues when they are discovered, preventing recurrences, and minimizing alert fatigue
    • Work hand-in-hand with product-facing engineers to ship impactful code
    • Perform quantitative analysis to understand and scale Klaviyo systems and manage the cross-functional effort to resolve scalability issues
    • Produce and advocate for preventative, upstream solutions with internal stakeholders and external vendors and dependencies
    • Confidently make informed, data-driven decisions in a fast-paced environment with competing priorities
    • Evangelize Site Reliability best practices across the engineering organization and community

    Who You Are:

    • BA or BS degree in Computer Science or a related field, or equivalent experience
    • 5+ years of experience operating and scaling complex distributed systems
    • Experience developing applications in Python, Ruby, Go, etc.
    • Experience working on an engineering team building software
    • Fundamental understanding of Linux (we run Ubuntu) and all layers of the networking stack; you should be confident administering and debugging production Linux systems
    • Ability to stay composed and manage complex systems during outages, and to drive failures through root cause analysis to the prevention of future issues