Freelance Agent Evaluation Engineer

Mindrift


Date: 2 weeks ago
City: Hyderabad, Telangana
Contract type: Part time
Remote

Please submit your CV in English and indicate your level of English proficiency.

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.

What this opportunity involves

We're building a dataset to evaluate AI coding agents - how well a model handles real-world developer tasks.

You'll create challenging tasks and evaluation criteria within realistic simulated environments:

  • Build realistic developer environments - a virtual company with codebase, infrastructure, and context (tickets, docs, conversations) that forms a believable development history
  • Design tasks from intermediate states of these environments - craft the prompt, define what "solved" means, and ensure the task is solvable by an AI agent
  • Write tests that verify agent solutions - accept all valid approaches and reject incorrect ones, neither too strict nor too lenient
  • Iterate on tasks and tests based on QA feedback - review agent solutions, analyze failures, and refine until the evaluation is fair and robust

What this is NOT

  • Not data labeling
  • Not prompt engineering
  • Not writing code from scratch - the agent writes most of the code; you guide and evaluate

What we look for

  • 5+ years in software development
  • Core stack: Python (FastAPI), JavaScript/TypeScript (React), Docker, Postgres, Kafka, Redis
  • Experience writing tests (functional, integration)
  • English proficiency - B2+

Why this is hard

Frontier models are already good at coding. Creating a task that genuinely challenges the best models is non-trivial. You need to deeply understand where models fail and what scenarios reveal the difference between a good and a bad solution. Tasks have many valid solutions - writing tests that accept all correct solutions and reject incorrect ones is harder than it sounds.
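
As a rough illustration of that last point (a minimal sketch, not taken from the actual project), consider a hypothetical task: "return the unique items of a list, keeping first-seen order". All function names below are invented for this example. The check asserts only on observable behavior, so it accepts two differently written correct solutions and rejects one that looks plausible but breaks the ordering requirement.

```python
from typing import Callable

# Hypothetical task: return the unique items of a list, keeping first-seen order.
Dedupe = Callable[[list[str]], list[str]]


def solution_with_set(items: list[str]) -> list[str]:
    """Valid approach 1: track already-seen items in a set."""
    seen: set[str] = set()
    out: list[str] = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def solution_with_dict(items: list[str]) -> list[str]:
    """Valid approach 2: rely on dict insertion order."""
    return list(dict.fromkeys(items))


def solution_sorted(items: list[str]) -> list[str]:
    """Incorrect: removes duplicates but loses the original order."""
    return sorted(set(items))


def passes(solution: Dedupe) -> bool:
    """Behavioral check: compares outputs only, never how they were produced."""
    cases = [
        (["b", "a", "b", "c", "a"], ["b", "a", "c"]),
        ([], []),
        (["x", "x", "x"], ["x"]),
    ]
    # Pass a copy of each input so a solution that mutates it cannot affect other cases.
    return all(solution(list(inp)) == expected for inp, expected in cases)
```

A check that inspected the implementation instead (for example, requiring that a set is used) would be too strict and reject the dict-based solution; a check that only compared the set of returned items would be too lenient and accept the sorted one.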

How it works

Apply → Pass qualification(s) → Join a project → Complete tasks → Get paid

Effort estimate

Each task for this project is estimated to take about 20 hours to complete, depending on complexity. This is an estimate and not a schedule requirement; you choose when and how to work. Tasks must be submitted by the deadline and meet the listed acceptance criteria to be accepted.

Compensation

Up to $30/hr equivalent, depending on level and pace. Tasks are estimated at ~20 hours each; you set your own schedule.

How to apply

To apply for this job, you need to log in on our website. If you don't have an account yet, please register.
