Senior Site Reliability Operations Engineer
37 minutes ago
At Truelogic, we are a leading provider of nearshore staff augmentation services headquartered in New York. For over two decades, we've been delivering top-tier technology solutions to companies of all sizes, from innovative startups to industry leaders, helping them achieve their digital transformation goals.
Our team of 600+ highly skilled tech professionals, based in Latin America, drives digital disruption by partnering with U.S. companies on their most impactful projects. Whether collaborating with Fortune 500 giants or scaling startups, we deliver results that make a difference.
By applying for this position, you're taking the first step in joining a dynamic team that values your expertise and aspirations. We aim to align your skills with opportunities that foster exceptional career growth and success while contributing to transformative projects that shape the future.
Our Client
A leading Financial Services
Job Summary
The Site Reliability Operations (SRO) team ensures 24/7 stability of Pennymac's internal IT infrastructure and mission-critical backend systems. This role is not DevOps-focused, but is crucial in monitoring, coordinating, and restoring operations during incidents, particularly in a high-stakes, regulated environment.
The role balances incident command, technical troubleshooting, project leadership, and communication with multiple internal and external stakeholders.
Responsibilities
Lead incident response as Incident Commander, coordinating teams, communications, and service restoration
Produce executive-level incident reports, run RCAs, and drive continuous improvement
Monitor and improve observability using tools like AWS CloudWatch and New Relic, reducing alert noise and gaps
Provide hands-on system support across Linux and Windows environments, including complex infrastructure issues
Manage and execute deployments via Jenkins, GitLab, or similar CI/CD platforms
Own infrastructure initiatives such as migrations, upgrades, and process improvements
Enforce change management and risk assessment for production changes
Maintain documentation and SOPs, acting as a key liaison between engineering teams and external vendors
On-call rotation: one week at a time, subject to critical incident call-ins between 6:00 PM and 6:00 AM PT.
5+ years of experience in Windows and Linux environments with proven troubleshooting capabilities.
Strong knowledge of monitoring tools like AWS CloudWatch, New Relic, Nagios, SumoLogic.
Practical experience with CI/CD tools (Jenkins, GitLab) and backup tools (CommVault, AWS Backup).
Strong scripting skills in PowerShell, Python, or equivalent.
Outstanding communication skills, especially under pressure, including executive reporting.
Experience in high-paced environments and with on-call support models.
Autonomous and proactive attitude; capable of managing complex tasks independently.
100% Remote Work: Enjoy the freedom to work from the location that helps you thrive. All it takes is a laptop and a reliable internet connection.
Highly Competitive USD Pay: Earn excellent, market-leading compensation in USD that goes beyond typical market offerings.
Paid Time Off: We value your well-being. Our paid time off policies ensure you have the chance to unwind and recharge when needed.
Work with Autonomy: Enjoy the freedom to manage your time as long as the work gets done. Focus on results, not the clock.
Work with Top American Companies: Grow your expertise working on innovative, high-impact projects with industry-leading U.S. companies.
A Culture That Values You: We prioritize well-being and work-life balance, offering engagement activities and fostering dynamic teams to ensure you thrive both personally and professionally.
Diverse, Global Network: Connect with over 600 professionals in 25+ countries, expand your network, and collaborate with a multicultural team from Latin America.
Team Up with Skilled Professionals: Join forces with senior talent. All of our team members are seasoned experts, ensuring you're working with the best in your field.