Senior HPC Cluster Support Engineer

9 hours ago


Salvador, Brazil Sky Systems, Inc. (SkySys) Full-time

Role: HPC Cluster Support – CIBA 4 (Senior)

Position Type: Part-Time Contract (20 hrs/week)

Contract Duration: 6 months

Work Hours: EST or PST

Location: 100% Remote

We're seeking a Senior HPC Cluster Support Engineer to maintain and support large-scale production HPC environments running Bright Cluster Manager and Slurm. This role focuses on cluster operations, hardware troubleshooting, user support, and vendor coordination to ensure uninterrupted high-performance computing workloads.

Key Responsibilities

- Manage and support HPC clusters: job submission issues, queue management, and user troubleshooting
- Monitor cluster health and resolve node failures, networking issues, and domain problems
- Diagnose hardware faults (GPUs, boards, power, nodes) and perform remote checks using BMC tools (Dell iDRAC, HPE iLO, Supermicro)
- Troubleshoot InfiniBand, Panasas storage, and network integration issues
- Coordinate repairs and escalate with vendors (ParkPlace, VDura)
- Apply system updates, patches, and configurations
- Collaborate with users and provide regular status updates

Required Skills

- Strong experience with Bright Cluster Manager and Slurm
- Linux systems administration and advanced troubleshooting
- Hardware diagnostics and BMC remote management tools
- Experience with InfiniBand, HPC storage systems (Panasas), and vendor escalation
- Active Directory integration for Linux is a plus


