AI/ML Evaluation Engineer

7 days ago


São Paulo, São Paulo, Brazil | Truelogic Software | Full-time | US$80,000 - US$120,000 per year

About Truelogic
At Truelogic, we are a leading provider of nearshore staff augmentation services, headquartered in New York. For over two decades, we've been delivering top-tier technology solutions to companies of all sizes, from innovative startups to industry leaders, helping them achieve their digital transformation goals.

Our team of 600+ highly skilled tech professionals, based in Latin America, drives digital disruption by partnering with U.S. companies on their most impactful projects. Whether collaborating with Fortune 500 giants or scaling startups, we deliver results that make a difference.

By applying for this position, you're taking the first step in joining a dynamic team that values your expertise and aspirations. We aim to align your skills with opportunities that foster exceptional career growth and success while contributing to transformative projects that shape the future.

Our Client
A global technology organization with a balanced engineering–creative model, focused on solving complex challenges across emerging technologies, AI, and modern consumer behavior. With multidisciplinary teams and a worldwide footprint, the company delivers secure, high-performance, and accessible digital experiences at scale.

Job Summary
We're looking for an AI/ML Evaluation Engineer to drive the accuracy, reliability, and performance of next-generation AI systems. You'll build evaluation pipelines, metrics, datasets, and automation that ensure model outputs are consistent, safe, and aligned with real-world expectations. This role is fully technical and highly collaborative, working closely with AI engineers, QA, data scientists, and product leaders.

Responsibilities

  • Write Python and SQL scripts to evaluate outputs from large language models (LLMs).
  • Design and implement LLM-as-Judge evaluations with clear scoring rubrics (faithfulness, relevance, completeness, correctness).
  • Define and calculate metrics such as exact match, token-level F1, ROUGE, cosine similarity, and subjective rubric scores.
  • Build and maintain ground-truth datasets for benchmarking and regression testing.
  • Automate evaluation workflows and integrate them into CI/CD pipelines.
  • Analyze large unstructured datasets to identify inconsistencies, anomalies, biases, and missing values.
  • Diagnose failure modes such as hallucinations, irrelevant answers, and formatting issues.
  • Produce clear reports summarizing evaluation findings and quality trends.
  • Collaborate with AI engineers, QA, data scientists, and product managers to define quality standards and release criteria.
  • Document all processes, evaluation setups, specifications, and architecture diagrams.
  • Maintain reproducibility and traceability for all evaluation runs and datasets.
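
For candidates who want a concrete picture of the metrics work, here is a minimal sketch of two of the metrics named above, exact match and token-level F1. It is an illustration only, not the client's actual pipeline: the function names are hypothetical, and the normalization (lowercasing, dropping punctuation and English articles) is an assumption borrowed from common question-answering evaluation practice.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Assumed normalization: lowercase, drop punctuation and articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized token sequences are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall, counting repeated tokens."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    # Multiset intersection keeps the minimum count of each shared token.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Exact match is strict and easy to regress against, while token-level F1 gives partial credit when a model answer contains the reference plus extra words.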

Qualifications and Job Requirements

  • Advanced Python skills, including writing, debugging, and automating scripts.
  • Strong SQL proficiency and experience manipulating large datasets.
  • Hands-on experience with Python libraries such as Pandas and NumPy.
  • Ability to clean, standardize, and analyze structured and unstructured data.
  • Experience inspecting datasets, visualizing distributions, and preparing data for analysis.
  • Solid understanding of large language models, prompt behavior, hallucinations, and grounding concepts.
  • Knowledge of retrieval-augmented generation (RAG) flows and embedding-based search.
  • Awareness of vector similarity concepts such as cosine similarity and dot product.
  • Experience with at least one LLM evaluation framework (RAGAS, TruLens, LangSmith, etc.) or ability to quickly learn one.
  • Ability to design or implement custom LLM-as-Judge evaluation systems.
  • Applied understanding of statistical concepts such as variance, confidence intervals, precision/recall, and correlation.
  • Ability to translate ambiguous quality expectations into measurable metrics.
  • Familiarity with cloud-run services and automation pipelines, preferably on Google Cloud Platform (GCP).
  • Ability to learn new infrastructure tools quickly.
  • Strong analytical and problem-solving abilities for open-ended technical challenges.
  • Excellent communication skills for collaborating with cross-functional teams and presenting technical findings.
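
As a rough illustration of the vector similarity concepts listed above, the sketch below contrasts dot product with cosine similarity using plain Python lists. The helper names are hypothetical, and returning 0.0 for a zero-length vector is an assumed convention rather than a standard one; production code would typically use NumPy or a vector database instead.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of element-wise products; grows with vector magnitude."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the vector norms; ignores magnitude."""
    norm_a = math.sqrt(dot_product(a, a))
    norm_b = math.sqrt(dot_product(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumed convention: a zero vector is treated as similar to nothing.
        return 0.0
    return dot_product(a, b) / (norm_a * norm_b)
```

Scaling one vector changes its dot product with another but leaves the cosine similarity unchanged, which is why the two measures rank embeddings identically only when the embeddings are normalized to unit length.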

What We Offer

  • 100% Remote Work: Enjoy the freedom to work from the location that helps you thrive. All it takes is a laptop and a reliable internet connection.
  • Highly Competitive USD Pay: Earn excellent, market-leading compensation in USD that goes beyond typical market offerings.
  • Paid Time Off: We value your well-being. Our paid time off policies ensure you have the chance to unwind and recharge when needed.
  • Work with Autonomy: Enjoy the freedom to manage your time as long as the work gets done. Focus on results, not the clock.
  • Work with Top American Companies: Grow your expertise working on innovative, high-impact projects with industry-leading U.S. companies.

Why You'll Like Working Here

  • A Culture That Values You: We prioritize well-being and work-life balance, offering engagement activities and fostering dynamic teams to ensure you thrive both personally and professionally.
  • Diverse, Global Network: Connect with over 600 professionals in 25+ countries, expand your network, and collaborate with a multicultural team from Latin America.
  • Team Up with Skilled Professionals: Join forces with senior talent. All of our team members are seasoned experts, ensuring you're working with the best in your field.
