SRE/Production Support Engineer
16 hours ago
*Native/Bilingual English is required for this role (read/written/spoken)
Please upload your CV/resume in English.
Monthly salary: $4,000 - $5,500 USD
Along with our partner, we are seeking a Senior SRE/Production Support Engineer to lead the operational reliability, stability, and performance of their production systems. The selected professional will serve as a technical leader for incident response, root cause analysis, and long-term operational improvements. This role requires deep expertise in AWS serverless architectures, Python backends, PostgreSQL, and frontend technologies like React/Amplify.
The Senior Production Support Engineer not only resolves incidents but also drives system improvements, mentors junior engineers, and shapes processes for reliability and monitoring.
Responsibilities:
- Lead incident management for production issues across AWS Lambda-based microservices, PostgreSQL (RDS), and React/Amplify frontend applications.
- Investigate, diagnose, and resolve complex production issues, including performance, data, and configuration problems.
- Conduct and lead post-incident reviews and root cause analyses (RCA), driving preventive solutions.
- Mentor and guide junior/mid-level production support engineers in troubleshooting and operational best practices.
- Maintain and enhance monitoring, alerting, logging, and observability tools (CloudWatch, X-Ray, DataDog, etc.).
- Collaborate with engineering teams to improve system reliability, scalability, and maintainability.
- Own and improve runbooks, playbooks, and operational documentation.
- Participate in on-call rotations, providing technical leadership during high-impact incidents.
- Analyze recurring issues and propose architectural or procedural improvements to prevent recurrence.
- Support deployment validation, emergency rollbacks, and operational changes.
- Partner with DevOps and Engineering teams to optimize performance, cost, and availability of cloud resources.
Required Qualifications:
- 5+ years of experience in production support, SRE, DevOps, or backend engineering roles.
- Strong expertise with AWS services, particularly Lambda, API Gateway, RDS (PostgreSQL), S3, Cognito, and CloudWatch.
- Proficient in Python, with the ability to read, debug, and modify code to resolve issues.
- Deep understanding of PostgreSQL, including query optimization, data integrity, and troubleshooting.
- Experience managing and improving observability, monitoring, and alerting in production systems.
- Proven experience handling high-severity incidents and leading incident response.
- Strong problem-solving skills and ability to navigate distributed systems.
- Excellent communication skills for incident reporting, collaboration, and mentoring.
Preferred Qualifications:
- Experience with frontend technologies (React, Amplify) for debugging full-stack issues.
- Familiarity with serverless architecture best practices and cost/performance optimization.
- Experience with infrastructure-as-code (CloudFormation, CDK, Terraform).
- Knowledge of automation and scripting for operational tasks (Python preferred).
- Prior experience in defining or improving SLOs, SLAs, and operational KPIs.
- Familiarity with modern CI/CD pipelines and automated deployment strategies.
- Hands-on experience with observability and monitoring platforms (DataDog, New Relic, Sentry).
Success Indicators:
- Production incidents are resolved quickly and effectively, minimizing business impact.
- Post-incident RCAs lead to measurable improvements in system reliability.
- Operational playbooks and runbooks are well-maintained and widely used.
- Junior/mid-level engineers are mentored effectively and develop troubleshooting skills.
- Systems are proactively monitored, optimized, and improved for stability, scalability, and cost efficiency.
Tools You May Use:
- AWS Services: Lambda, RDS (PostgreSQL), S3, API Gateway, Cognito, CloudWatch, X-Ray, SNS/SQS, EventBridge
- Languages & Scripting: Python
- Monitoring & Observability: CloudWatch, DataDog, Sentry, X-Ray
- Version Control & CI/CD: GitHub/GitLab, CI/CD pipelines
- Frontend Collaboration: React, Amplify
- Ticketing & Collaboration: Jira, Confluence
- AI Prompting: Cursor, ChatGPT
Benefits:
- A fully remote position with a structured schedule that supports work-life balance.
- The opportunity to join a forward-thinking company transforming the future of film and television production through cutting-edge technology.
- Two weeks of paid vacation per year.
- 10 paid days for local holidays.
Work Schedule: US Pacific Standard Time
*Please note our partner is only looking for full-time dedicated team members who are eager to fully integrate within their team.