About This Opportunity
The field of agentic AI — where systems autonomously plan, reason across multiple steps, delegate sub-tasks, and coordinate resources to achieve complex goals — is one of the most rapidly advancing areas in global technology. Building robust evaluation frameworks and benchmark tasks for these systems is foundational work that will shape how the entire industry assesses AI capability, safety, and reliability for years to come.
This Operations Engineer role sits at the frontier of applied AI research. You will design and build the operational scenarios — logistics problems, incident response simulations, capacity planning challenges, project management tasks — that multi-agent AI systems are evaluated against. This requires both strong Python engineering skills and genuine domain expertise in operational problem-solving. You need to understand how real operational problems are structured, what makes them hard, and how to encode that difficulty into a form that AI evaluation can measure precisely.
Opportunities to work directly on multi-agent AI evaluation at this level are rare and represent significant career capital in one of technology's most consequential emerging fields.
Role Responsibilities
Design and develop multi-agent benchmark tasks involving planning, scheduling, and resource allocation
Create real-world operational scenarios (logistics, project management, incident response, capacity planning)
Build constraint-rich problem statements with multiple dependencies and variables
Develop Python-based scripts to evaluate feasibility, completeness, and optimality
Break down complex problems into structured sub-tasks for multi-agent systems
Model scenarios with timelines, dependencies, and resource constraints
Collaborate with teams to improve task quality, coverage, and evaluation rigor
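To give a sense of what a validation script for such a task might look like, here is a minimal sketch in Python. It is illustrative only and not taken from the team's actual tooling: the Task fields, the check_schedule and makespan helpers, the warehouse scenario, and the single shared crew capacity are all assumptions made for the example. It checks a proposed schedule for completeness (every task scheduled), feasibility (dependencies and capacity respected), and reports the makespan as a simple optimality measure.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    # Hypothetical task model: duration in hours, crew = workers needed while running.
    name: str
    duration: int
    crew: int
    depends_on: list = field(default_factory=list)


def check_schedule(tasks, starts, capacity):
    """Return a list of violation messages; an empty list means the schedule is feasible."""
    violations = []
    by_name = {t.name: t for t in tasks}

    # Completeness: every task must have a start time.
    for t in tasks:
        if t.name not in starts:
            violations.append(f"{t.name}: not scheduled")
    scheduled = [t for t in tasks if t.name in starts]

    # Dependencies: a task may not start before each prerequisite has finished.
    for t in scheduled:
        for dep in t.depends_on:
            if dep not in by_name:
                violations.append(f"{t.name}: unknown dependency {dep}")
            elif dep in starts and starts[t.name] < starts[dep] + by_name[dep].duration:
                violations.append(f"{t.name}: starts before {dep} finishes")

    # Capacity: crew load only rises at a task start, so checking start times is enough.
    for at in sorted({starts[t.name] for t in scheduled}):
        load = sum(
            o.crew for o in scheduled
            if starts[o.name] <= at < starts[o.name] + o.duration
        )
        if load > capacity:
            violations.append(f"t={at}: crew load {load} exceeds capacity {capacity}")
    return violations


def makespan(tasks, starts):
    """Finish time of the last task; lower is better for a fully scheduled plan."""
    return max(starts[t.name] + t.duration for t in tasks)


# A small, made-up warehouse scenario.
tasks = [
    Task("pick", duration=2, crew=2),
    Task("pack", duration=3, crew=2, depends_on=["pick"]),
    Task("load", duration=1, crew=3, depends_on=["pack"]),
]
good = {"pick": 0, "pack": 2, "load": 5}

print(check_schedule(tasks, good, capacity=4))  # []
print(makespan(tasks, good))                    # 6
```

Returning a list of messages rather than a single pass/fail flag is a design choice for this sketch: it lets an evaluator see every broken constraint in an agent's plan at once.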
Applying for This Role
Requirements
5+ years of experience in operations, project management, logistics, or supply chain
Strong understanding of constraints, dependencies, and scheduling logic
Proficiency in Python for validation and verification scripting
Strong structured problem-solving and decomposition skills
Ability to model real-world operational scenarios
Clear technical communication and documentation skills
Ability to work in a fast-paced environment and meet deadlines
Preferred Qualifications
Experience with optimization techniques (linear programming, constraint satisfaction, scheduling algorithms)
Background in operations research
Experience with simulation or modeling tools
Familiarity with AI planning systems or automated reasoning
Experience with AI benchmarks (e.g., SWE-bench, Terminal-bench)
Hands-on experience with Docker
Application Process
Apply via Easy Apply / shared link and complete the Interest Check Form (ICF)
Complete the take-home assessment (post-shortlisting)
Shortlisted candidates will be reviewed further
The team will contact you with next steps
Compensation: $15/hour
Crossing Hurdles is a social enterprise focused on empowering individuals and communities through skills training, mentorship, and capacity-building programmes. The organisation partners with NGOs, public institutions, and private sector actors to help people overcome barriers to employment, education, and economic participation.
Operations leaders hire for process thinking, cross-functional coordination, and measurable efficiency improvement. They want candidates who can diagnose bottlenecks, implement fixes, and sustain improvements without constant oversight — and who communicate trade-offs clearly before committing resources.
Full-time roles typically include benefits (health insurance, pension contributions, paid leave). During salary negotiation, always consider the total compensation package — benefits can be worth 20–30% on top of base salary. Ask specifically about probation period, performance review cadence, and remote/hybrid flexibility before signing.
| Salary | Competitive |
| Type | Full-time |
| Location | — |
| Category | Operations |
| Posted | Apr 27, 2026 |