Site Reliability Engineer

4 weeks ago

Brazil Kraken Full-time

Overview

Site Reliability Engineer - Data Platform role at Kraken. Join our Data Infrastructure team to uphold the reliability, scalability, and efficiency of our data platform.

Responsibilities

Design the data governance mechanisms that ensure our lakehouse is easy to interact with, secure and in compliance with all applicable regulations.
Implement the infrastructure we use to ingest our data, store it, catalog it with the right metadata and capture its lineage.
Provide a state-of-the-art suite of BI tools for multiple teams within the company.
Guarantee the availability, high performance, scalability and cost efficiency of our data platform.
Implement self-service data infrastructure solutions that support the needs of 10+ business units and over 100 engineers and data analysts
Utilize Infrastructure as Code (IaC) principles to design, provision, and manage both on-premises and cloud (AWS) infrastructure components using tools such as Terraform
Develop and maintain automation scripts using bash/shell scripting to automate operational tasks and deployments
Enhance and manage CI/CD pipelines to facilitate consistent software deployments across the data infrastructure
Implement robust data monitoring and alerting solutions to proactively detect anomalies and performance issues
Manage and implement role-based access control (RBAC) and permissions for multiple user groups and machine workflows across environments
Manage and maintain real-time streaming data architecture using technologies like Kafka and Debezium (CDC)
Ensure timely and accurate processing of streaming data for insights
Utilize Kubernetes to manage containerized applications within the data infrastructure
Implement incident response procedures and participate in on-call rotations
Collaborate with data analysts, engineers, and cross-functional teams to understand requirements and implement solutions
Document architecture, processes, and best practices to enable knowledge sharing and continuous improvement
Support AI/ML teams with their infrastructure requests

Qualifications

Proven experience (5+ years) as a Site Reliability Engineer, Infrastructure Engineer, Data Infrastructure Engineer, or in a similar role with a focus on data infrastructure and security
Experience with real-time data processing technologies such as Kafka, Flink, and Debezium
Experience managing hybrid multi-tenant cloud systems, particularly on AWS
Experience with Infrastructure as Code tools such as Terraform, Terragrunt, and Atlantis
Experience with containerization/orchestration tools (Kubernetes, Nomad, Docker)
Strong Bash/shell scripting and proficiency in at least one programming language (preferably Python or JVM languages)
Experience with data technologies: Apache Airflow, Apache Spark, databases, BI tooling
Experience solving data access management problems in large-scale data lakes
Familiarity with CI/CD deployment pipelines and related tools
Strong problem-solving skills and ability to troubleshoot complex systems

This job is accepting ongoing applications and there is no application deadline.
Please note, applicants may redact or remove information identifying age, date of birth, or dates of attendance/graduation on their resume.
We consider qualified applicants with criminal histories for employment consistent with the San Francisco Fair Chance Ordinance.

Kraken is powered by people from around the world and we celebrate diverse talents, backgrounds, and perspectives. We hire based on merit and encourage applying for roles even if you don't meet every listed requirement, especially if you're passionate about crypto. Kraken is an equal opportunity employer; we do not tolerate discrimination or harassment. See Kraken's Career and Privacy policies for more information.


Reliability Engineer, Mill

4 weeks ago

Brazil Rosebel Gold Mines N.V. Full-time

Overview: At Rosebel Gold Mines N.V., we are seeking a highly skilled and motivated Mill Reliability Engineer to join our Mill department. As a Reliability Engineer, you will play a crucial role in ensuring the smooth operation and maintenance of our fixed asset milling equipment. In this role, you will play a crucial role in enhancing the team's understanding...
Reliability Engineer

4 weeks ago

Brazil Flinks Full-time

Flinks is where financial data moves—with purpose, trust, and impact. We're on a mission to simplify access to financial data and help businesses build better, faster, and more secure financial products and experiences. Since 2016, we've been bridging the gap between fintechs, financial institutions, and consumers by enabling seamless, secure data...
System Reliability Expert

4 weeks ago

Brazil beBeeReliability Full-time

Job Title: Site Reliability Engineer Are you a skilled professional seeking a challenging and dynamic work environment? Our company is a multinational corporation specializing in the Management, Implementation, Development and Maintenance of Information Systems. We are looking for an experienced System Reliability Expert to join our team. With over 150...
Senior Site Reliability Engineer

4 weeks ago

Brazil Stone Full-time

Who is Stone Tech? Stone was born with the purpose of leading the transformation of the payments industry, striving to offer the best solutions for entrepreneurs in Brazil...
Senior Site Reliability Engineer

7 days ago

Brazil Stone Full-time

Who is Stone Tech? Stone was born with the purpose of leading the transformation of the payments industry, striving to offer the best solutions for entrepreneurs in Brazil. With that in mind, we built Stone Tech! It brings together the technology teams of Stone Co. and the group's financial companies that recognize the potential...
Reliability Expert

3 weeks ago

Brazil beBeeReliability Full-time

Job Description: We are seeking a skilled Site Reliability Engineer to fill this key role. The primary focus of this position is incident resolution via Critical Issue Response System (CIRS) and providing regular updates until successful resolution. Responsibilities: Handle major incidents using the CIRS platform. Perform in-depth application troubleshooting,...
Site Reliability Engineer

4 weeks ago

Brazil HCLTech Full-time

Responsibilities: Handling major incidents via CIRS (Critical Issue Response System) and providing frequent updates until resolution. Performing deep-dive application troubleshooting and identifying preventive actions. Managing CIRS-related requests including deployments, feature toggles, and data fixes. Following up on major production incidents and...
Site Reliability Engineer

3 weeks ago

Brazil HCLTech Full-time

Your role and responsibilities: Handling major incidents via CIRS (Critical Issue Response System) and providing frequent updates until resolution. Performing deep-dive application troubleshooting and identifying preventive actions. Managing CIRS-related requests including deployments, feature toggles, and data fixes. Following up on major production...
Senior DevOps Engineer

4 weeks ago

Brazil ITG Software, Inc. Full-time

Calling all DevOps Engineers! We're looking for an experienced DevOps Engineer who wants to do more than just keep the lights on. This is a chance to design, build, and secure production-grade, cloud-native platforms that power real-world impact. About the Role: You'll be at the heart of our engineering team, scaling infrastructure, automating workflows, and...
Application Reliability Engineer (SRE focused on applications)

4 weeks ago

Brazil Netvagas Full-time

Responsibilities: Develop and implement infrastructure solutions, ensuring stability,...
