Senior Site Reliability Engineer, Observability

4 weeks ago

São Paulo, São Paulo, Brazil | Chainlink Labs | Full-time

Overview

Chainlink Labs is the primary contributing developer of Chainlink, the decentralized computing platform powering the verifiable web. Chainlink is the industry-standard platform for providing access to real-world data, offchain computation, and secure cross-chain interoperability across any blockchain. Chainlink Labs helps power verifiable applications for banking, DeFi, global trade, and gaming by collaborating with some of the world's largest financial institutions, notably Swift, DTCC, and ANZ. Chainlink Labs also works with top Web3 teams, including Aave, Compound, GMX, Maker, and Synthetix. Chainlink Labs was ranked as one of the Global Top 100 Most Loved Workplaces by Newsweek 2025.

The Observability Team enables Chainlink development and empowers engineers to continue building and supporting crucial products and services that have a profound impact on the blockchain industry. Reliability is vital to the success of our company. As a Senior SRE, you will help us accelerate and enable other engineering teams by increasing self-service and decreasing cognitive load. This role is a good fit for someone with a strong DevOps mentality, a passion for building and maintaining a mature GitOps environment, and experience focusing on observability. The team is expanding, offering opportunities to build, learn, and grow. We are committed to diversity and inclusion and encourage you to apply even if you don't match 100% of the job requirements.

Your Impact

Build and orchestrate a modern OpenTelemetry (OTel)-based observability platform
Support multiple telemetry types, such as metrics, logs, and traces
Define and support modern observability governance and tackle observability problems at scale
Ensure reliability, security, and performance exceed our defined SLAs
Collaborate with engineers across the company to troubleshoot issues, deploy new products and services, and increase velocity while decreasing cognitive load
Lead the design and deployment of monitoring/observability services that detect issues and alert the team when action is needed
Ingest, aggregate, transform, and utilize data from multiple sources in the real-time data pipeline
Oversee the availability, performance, and supportability of observability infrastructure
Create processes around alert response operations and support the team to ensure reliable delivery of oracle data
Make recommendations to ensure sufficient metrics are collected to create alerts with every new feature release
Champion reliability and security by taking the time to do your work right the first time

Requirements

7+ years of relevant professional experience. You have likely worked on a DevOps, infrastructure, SRE, and/or platform team before
Ability to develop software outside of the scope of typical infrastructure requirements and configurations
Experience programming in C, C++, Java, Python, Go, Perl, or Ruby
Expert knowledge in all aspects of designing, developing, and managing large real-time systems
Experience with monitoring and logging. You know how to export metrics using Prometheus, have built a Grafana dashboard, and have experience with a centralized logging solution such as the ELK Stack, Splunk, or the Grafana stack
Experience with distributed systems and container orchestration. You have maintained or built Kubernetes clusters and feel comfortable deploying new services on them
Strong communication skills. You can give and receive constructive feedback and participate in planning meetings and code reviews

Desired Qualifications

Excitement for blockchain, Web 3.0, and similar decentralized technologies
Experience running infrastructure in the blockchain/Web3 space
Ability to scale systems sustainably through automation and evolve systems for reliability and velocity
Experience working remotely in a distributed team
A strong desire to grow and challenge yourself by automating services to reduce toil

Tools and Services

AWS; Terraform/Terragrunt; Kubernetes, Calico and ArgoCD; Prometheus and Grafana; GitHub Actions; Packer
We expect you to be comfortable with most of those tools and proficient in several of them

All roles with Chainlink Labs are global and remote-based. Overlap with Eastern Standard Time (EST) is encouraged unless stated otherwise.

We carefully review all applications and aim to provide a response to every candidate within two weeks after the job posting closes. The closing date is listed on the job advert. We encourage thoughtful preparation of your application and will communicate the status after the closing date.

Commitment to Equal Opportunity

Chainlink Labs is an equal opportunity employer. All qualified applicants will receive equal consideration for employment in compliance with applicable laws, regulations, or ordinances. If you need assistance or accommodation due to a disability or special need when applying for a role or in our recruitment process, please contact us via this form.

Global Data Privacy Notice for Job Candidates and Applicants

Information collected and processed as part of your Chainlink Labs Careers profile, and any job applications you submit, is subject to our Privacy Policy. By submitting your application, you agree to our use and processing of your data as required.

Seniority level

Mid-Senior level

Employment type

Full-time

Job function

Engineering and Information Technology
Industries: Technology, Information and Internet


