
Site Reliability Engineer
4 weeks ago
Site Reliability Engineer - Data Platform role at Kraken. Join our Data Infrastructure team to uphold the reliability, scalability, and efficiency of our data platform.
Responsibilities
- Design the data governance mechanisms that ensure our lakehouse is easy to interact with, secure, and in compliance with all applicable regulations.
- Implement the infrastructure we use to ingest our data, store it, catalog it with the right metadata and capture its lineage.
- Provide a state-of-the-art suite of BI tools for multiple teams within the company.
- Guarantee the availability, high performance, scalability and cost efficiency of our data platform.
- Implement self-service data infrastructure solutions that support the needs of 10+ business units and over 100 engineers and data analysts
- Utilize Infrastructure as Code (IaC) principles to design, provision, and manage both on-premises and cloud (AWS) infrastructure components using tools such as Terraform
- Develop and maintain automation scripts using bash/shell scripting to automate operational tasks and deployments
- Enhance and manage CI/CD pipelines to facilitate consistent software deployments across the data infrastructure
- Implement robust data monitoring and alerting solutions to proactively detect anomalies and performance issues
- Manage and implement role-based access control (RBAC) and permissions for multiple user groups and machine workflows across environments
- Manage and maintain real-time streaming data architecture using technologies like Kafka and Debezium (CDC)
- Ensure timely and accurate processing of streaming data for insights
- Utilize Kubernetes to manage containerized applications within the data infrastructure
- Implement incident response procedures and participate in on-call rotations
- Collaborate with data analysts, engineers, and cross-functional teams to understand requirements and implement solutions
- Document architecture, processes, and best practices to enable knowledge sharing and continuous improvement
- Support AI/ML teams with their infra requests
Requirements
- Proven experience (5+ years) as a Site Reliability Engineer, Infrastructure Engineer, Data Infrastructure Engineer, or in a similar role with a focus on data infrastructure and security
- Experience with real-time data processing technologies such as Kafka, Flink, and Debezium
- Experience managing hybrid multi-tenant cloud systems, particularly on AWS
- Experience with Infrastructure as Code tools such as Terraform, Terragrunt, and Atlantis
- Experience with containerization/orchestration tools (Kubernetes, Nomad, Docker)
- Strong Bash/shell scripting and proficiency in at least one programming language (preferably Python or JVM languages)
- Experience with data technologies: Apache Airflow, Apache Spark, databases, BI tooling
- Experience solving data access management at large-scale data lakes
- Familiarity with CI/CD deployment pipelines and related tools
- Strong problem-solving skills and ability to troubleshoot complex systems
This job is accepting ongoing applications and there is no application deadline.
Please note, applicants may redact or remove information identifying age, date of birth, or dates of attendance/graduation on their resume.
We consider qualified applicants with criminal histories for employment consistent with the San Francisco Fair Chance Ordinance.
Kraken is powered by people from around the world and we celebrate diverse talents, backgrounds, and perspectives. We hire based on merit and encourage applying for roles even if you don't meet every listed requirement, especially if you're passionate about crypto. Kraken is an equal opportunity employer; we do not tolerate discrimination or harassment. See Kraken's Career and Privacy policies for more information.
-
Reliability Engineer, Mill
4 weeks ago
Brazil | Rosebel Gold Mines N.V. | Full-time
Overview: At Rosebel Gold Mines N.V., we are seeking a highly skilled and motivated Mill Reliability Engineer to join our Mill department. As a Reliability Engineer, you will play a crucial role in ensuring the smooth operation and maintenance of our fixed asset milling equipment. In this role, you will play a crucial role in enhancing the team's understanding...
-
Reliability Engineer
4 weeks ago
Brazil | Flinks | Full-time
Flinks is where financial data moves, with purpose, trust, and impact. We're on a mission to simplify access to financial data and help businesses build better, faster, and more secure financial products and experiences. Since 2016, we've been bridging the gap between fintechs, financial institutions, and consumers by enabling seamless, secure data...
-
System Reliability Expert
4 weeks ago
Brazil | beBeeReliability | Full-time
Job Title: Site Reliability Engineer. Are you a skilled professional seeking a challenging and dynamic work environment? Our company is a multinational corporation specializing in the Management, Implementation, Development and Maintenance of Information Systems. We are looking for an experienced System Reliability Expert to join our team. With over 150...
-
Senior Site Reliability Engineer
4 weeks ago
Brazil | Stone | Full-time
Who is Stone Tech? Stone was founded with the purpose of being a leading force in the transformation of the payments industry, striving to offer the best solutions for entrepreneurs in Brazil...
-
Senior Site Reliability Engineer
7 days ago
Brazil | Stone | Full-time
Who is Stone Tech? Stone was founded with the purpose of being a leading force in the transformation of the payments industry, striving to offer the best solutions for entrepreneurs in Brazil. With that in mind, we built Stone Tech! It brings together the technology teams of Stone Co. and the group's financial companies, which recognize the potential...
-
Reliability Expert
3 weeks ago
Brazil | beBeeReliability | Full-time
Job Description: We are seeking a skilled Site Reliability Engineer to fill this key role. The primary focus of this position is incident resolution via the Critical Issue Response System (CIRS) and providing regular updates until successful resolution. Responsibilities: Handle major incidents using the CIRS platform. Perform in-depth application troubleshooting,...
-
Site Reliability Engineer
4 weeks ago
Brazil | HCLTech | Full-time
Responsibilities: Handling major incidents via CIRS (Critical Issue Response System) and providing frequent updates until resolution. Performing deep-dive application troubleshooting and identifying preventive actions. Managing CIRS-related requests including deployments, feature toggles, and data fixes. Following up on major production incidents and...
-
Site reliability engineer
3 weeks ago
Brazil | HCLTech | Full-time
Your role and responsibilities: Handling major incidents via CIRS (Critical Issue Response System) and providing frequent updates until resolution. Performing deep-dive application troubleshooting and identifying preventive actions. Managing CIRS-related requests including deployments, feature toggles, and data fixes. Following up on major production...
-
Senior DevOps Engineer
4 weeks ago
Brazil | ITG Software, Inc. | Full-time
Calling all DevOps Engineers! We're looking for an experienced DevOps Engineer who wants to do more than just keep the lights on. This is a chance to design, build, and secure production-grade, cloud-native platforms that power real-world impact. About the Role: You'll be at the heart of our engineering team, scaling infrastructure, automating workflows, and...
-
Application Reliability Engineer (SRE focused on applications)
4 weeks ago
Brazil | Netvagas | Full-time
Responsibilities: Develop and implement infrastructure solutions, ensuring stability,...