C++ Engineer AI Runtime

5 days ago


São Paulo, Brazil | Baasi | Full-time

Overview

We are a stealth-mode startup building next-generation infrastructure for the AI industry. Our team has decades of experience in software, systems, and deep tech. We are working on a new kind of AI runtime that pushes the boundaries of performance and flexibility, making advanced models portable, efficient, and customizable for real-world deployment. If you want to be part of a small, fast-moving team shaping the future of applied AI systems, this is your opportunity.

Role

We are looking for a C++ Engineer with a strong systems and GPU programming background to help extend and optimize an open-source AI inference runtime. You will work on the low-level internals of large language model serving, focusing on:

  • Dynamic adapter integration (e.g., LoRA/QLoRA)
  • Incremental model update mechanisms
  • Multi-session inference caching and scheduling
  • GPU performance improvements (Tensor Cores, CUDA/ROCm)

This is a hands-on role: you will be designing, coding, profiling, and iterating on high-performance inference code that runs directly on CPUs and GPUs.

Responsibilities

  • Implement support for runtime adapter loading (LoRA), enabling models to be customized on the fly without retraining or model merges.
  • Design and implement mechanisms for incremental model deltas, allowing models to be extended and updated efficiently.
  • Extend the runtime to handle multi-session execution, with isolation and caching strategies for concurrent users.
  • Optimize core math kernels and memory layouts to improve inference performance on CPU and GPU backends.
  • Collaborate with backend and infrastructure engineers to integrate your work into APIs and orchestration layers.
  • Write benchmarks, unit tests, and profiling tools to ensure correctness and measure performance gains.
  • Contribute to system architecture discussions and help define the roadmap for future runtime features.

Requirements

  • Strong proficiency in modern C++ (C++14/17/20) and systems programming.
  • Solid understanding of low-level performance optimization: memory management, multithreading, SIMD, cache efficiency.
  • Experience with CUDA and/or ROCm/HIP GPU programming.
  • Familiarity with linear algebra kernels (matrix multiply, attention) and how they map to hardware acceleration (Tensor Cores, BLAS libraries, etc.).
  • Exposure to machine learning inference frameworks (e.g., llama.cpp, TensorRT, ONNX Runtime, TVM, PyTorch internals) is a plus.
  • Comfortable working in a Unix/Linux environment; experience with build systems (CMake, Bazel) and CI pipelines.
  • Strong problem-solving and debugging skills; ability to dive deep into both code and performance traces.
  • Self-motivated and able to thrive in a fast-moving startup environment.

Nice to Have

  • Experience implementing LoRA or adapter-based fine-tuning in inference runtimes.
  • Knowledge of quantization methods and deploying quantized models efficiently.
  • Background in distributed systems or multi-GPU orchestration.
  • Contributions to open-source ML/AI systems.

Why Join

  • Build core IP at the intersection of AI and systems engineering.
  • Work with a highly technical founding team on problems that are both intellectually challenging and commercially impactful.
  • Opportunity to shape the direction of a new AI platform from the ground up.
  • Competitive compensation (contract or full-time), equity potential, and flexible remote work.



  • São Paulo, Brazil | Baasi Inc. | Full-time

    C++ Engineer AI Runtime (São Paulo, Brazil) - full time. Ref # About Us: We are a stealth-mode startup building next-generation infrastructure for the AI industry. Our team has decades of experience in software, systems, and deep tech. We are working on a new kind of AI runtime that pushes the boundaries of performance and flexibility, making advanced models...



  • C++ Engineer AI Runtime

    2 weeks ago


    São Paulo, Brazil | Baasi | Full-time

    Overview: We are a stealth-mode startup building next-generation infrastructure for the AI industry. Our team has decades of experience in software, systems, and deep tech. We are working on a new kind of AI runtime that pushes the boundaries of performance and flexibility, making advanced models portable, efficient, and customizable for real-world deployment....

  • Backend Engineer

    3 days ago


    São Paulo, Brazil | Baasi Inc. | Full-time

    About Us: We are a stealth-mode startup building new infrastructure for the AI industry. Our mission is to make advanced language models deployable, customizable, and secure across diverse environments. Our platform leverages an existing SaaS codebase...

  • Backend Engineer

    2 weeks ago


    São Paulo, Brazil | Baasi | Full-time

    About Us: We are a stealth-mode startup building new infrastructure for the AI industry. Our mission is to make advanced language models deployable, customizable, and secure across diverse environments. Our platform leverages an existing SaaS codebase for authentication, billing, and user management, and we are...


  • São Paulo, Brazil | AI Architechs | Full-time

    Software Engineer with AI Automation Experience. Remote | Global | Emerging Markets Welcome. Why You Will Want This Role: • Competitive compensation with opportunities for bonuses and long-term growth • Fully remote position with flexible working hours aligned with U.S. East Coast time • Continuous training in software and AI engineering, guided by leading...


  • São Paulo, Brazil | LatamCent | Full-time

    Overview: Senior AI Engineer. Full-Time | Remote from Latin America | Required Overlap: 9AM - 3PM PST (6 hours). EkLine is seeking a Senior AI Engineer to lead the development and optimization of AI-powered features for our documentation platform. You will design and deploy AI agents, fine-tune large language models (LLMs), and build ML-driven tools for content...

  • [C] Senior AI Engineer

    4 weeks ago


    São Paulo, Brazil | LatamCent | Full-time

    Overview: Senior AI Engineer. Full-Time | Remote from Latin America | Required Overlap: 9AM - 3PM PST (6 hours). EkLine is seeking a Senior AI Engineer to lead the development and optimization of AI-powered features for our documentation platform. You will design and deploy AI agents, fine-tune large language models (LLMs), and build ML-driven tools for content...

  • AI Engineer

    2 weeks ago


    São Paulo, Brazil | AllianceIT Inc | Full-time

    This is a full-time remote position working from LATAM with US clients. We offer: 100% remote work, $28-30 USD per hour. Title: AI Engineer. We are seeking an experienced AI Engineer to serve as a trusted technical advisor to our customers and help solve complex machine learning challenges. You will provide expert guidance on practical ML implementation,...
