Can I participate solo?

Yes. There is no minimum team size. Solo participants are welcome and compete on equal footing with teams.

Can I be on multiple teams?

Yes. However, if two teams sharing a member submit substantially overlapping agents, the worse-performing submission will be excluded from award eligibility.

Is any LLM allowed, or are there restrictions on proprietary vs open-source?

In the Open Track, any LLM is allowed - proprietary, open-source, any size. In the Cerebras Track, you use models on Cerebras infrastructure via the free tier.

Can I submit to both tracks with the same agent?

Yes, you can enter both tracks. However, the Cerebras Track requires your agent to use Cerebras infrastructure, so you may need to adapt your agent. The same team can submit different agents to each track.

What happens if my agent hangs during evaluation?

Each task has a time limit. If your agent doesn’t respond within the limit, that trial is scored as a failure. The evaluation continues with remaining tasks. For dev submissions, you can debug and resubmit immediately. For test set evaluations, there is a 24-hour grace window to fix and resubmit.
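
The timeout behaviour described above can be reproduced locally to check how your agent copes with a hang. The following is a minimal sketch, not the official evaluation harness: `run_trial`, `run_all`, `demo_agent`, and the 0.05-second limit are all hypothetical names and values chosen for illustration.

```python
import asyncio


async def run_trial(agent_step, task, time_limit_s):
    """Run one task; if the agent exceeds the limit, score the trial as failed."""
    try:
        result = await asyncio.wait_for(agent_step(task), timeout=time_limit_s)
        return {"task": task, "passed": bool(result), "timed_out": False}
    except asyncio.TimeoutError:
        # The hung call is cancelled and the trial counts as a failure.
        return {"task": task, "passed": False, "timed_out": True}


async def run_all(agent_step, tasks, time_limit_s):
    """Evaluate tasks one by one; a timeout on one task does not stop the rest."""
    return [await run_trial(agent_step, t, time_limit_s) for t in tasks]


async def demo_agent(task):
    """Hypothetical agent that hangs on the task named 'slow'."""
    if task == "slow":
        await asyncio.sleep(10)
    return True


results = asyncio.run(run_all(demo_agent, ["fast", "slow", "fast2"], time_limit_s=0.05))
for r in results:
    print(r)
```

In this sketch the second task times out and is marked as failed, while the tasks before and after it are still evaluated.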

Are API credits provided?

For official evaluation runs, the organizers cover LLM API costs as far as possible, so there is no cost to you except for special LLM setups or usage scenarios. For development, you use your own API keys. Cerebras Track participants use the Cerebras free tier.

How is the hidden test set protected?

The hidden test set is a completely novel set that is never shared with participants. Evaluation runs in a controlled environment. Agents cannot observe or extract test set tasks. Any attempt to probe the test set is a rule violation resulting in disqualification.

Will the test set be released after the competition?

Post-competition release will be decided and announced separately. The train and dev sets are available now under the MIT License.

Is this competition in-person only, or can I participate remotely?

The competition runs entirely online. You develop and submit remotely. Only the final presentations happen in person at IJCAI-ECAI 2026 in Bremen (August 15–21); attendance is encouraged but not required for prize eligibility.

What if I can’t attend IJCAI-ECAI 2026 to present?

Winners who cannot attend may present remotely or designate a co-author to present on their behalf. Prize eligibility is not affected by attendance.

What counts as “hard-coding” vs “clever prompting”?

Hard-coding means encoding specific task answers or building lookup tables that map task descriptions to pre-determined solutions. This is prohibited. Clever prompting means crafting system prompts, few-shot examples, or reasoning strategies that help the agent generalize - this is encouraged. The litmus test: would your approach work on novel tasks it hasn’t seen? If yes, it’s prompting. If it only works on tasks whose answers you’ve memorized, it’s hard-coding.
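
To make the litmus test concrete, here is a small illustrative contrast. Everything in it is hypothetical (the task strings, the `MEMORIZED` table, the few-shot examples, and the function names); it is a sketch of the distinction, not a statement of how submissions are audited.

```python
# Prohibited pattern: a lookup table keyed by memorized task descriptions.
MEMORIZED = {
    "Cancel order 123 for Alice": "call cancel_order(order_id=123)",
}


def hard_coded_agent(task_description):
    """Returns a stored answer; has nothing to offer for an unseen task."""
    return MEMORIZED.get(task_description)


# Encouraged pattern: a task-agnostic prompt with generic few-shot examples.
FEW_SHOT = [
    ("User wants to undo a purchase.", "Check the policy first, then call the matching tool."),
    ("User asks about delivery time.", "Look up the order, then answer from the tool result."),
]


def build_prompt(task_description):
    """Builds the same kind of prompt for any task, seen or unseen."""
    shots = "\n".join(f"Example: {q}\nStrategy: {a}" for q, a in FEW_SHOT)
    return (
        "Follow the policies and use tools when needed.\n"
        f"{shots}\n"
        f"Task: {task_description}\nStrategy:"
    )


novel_task = "Change the shipping address for order 456"
print(hard_coded_agent(novel_task))  # no stored answer for a novel task
print(build_prompt(novel_task))
```

The lookup table only helps on the one task it has memorized, whereas the prompt builder produces a usable prompt for a task it has never seen.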

Can I use retrieval / RAG over the environment data?

Yes. Retrieval-augmented generation over the environment data (tools, policies, world data) is explicitly allowed and encouraged as a research direction.
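
As a starting point, retrieval over environment data can be as simple as ranking documents by word overlap with the query. The sketch below assumes the environment data has already been loaded into a dictionary of text snippets; the document names and contents are invented for illustration, and a real system would likely swap in embeddings or BM25.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, k=2):
    """Return up to k document ids ranked by token overlap with the query."""
    query_counts = Counter(tokenize(query))
    scored = []
    for doc_id, text in documents.items():
        doc_counts = Counter(tokenize(text))
        score = sum(min(query_counts[t], doc_counts[t]) for t in query_counts)
        scored.append((score, doc_id))
    # Highest score first; ties broken alphabetically by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:k] if score > 0]


# Hypothetical environment data: two policies and one tool description.
docs = {
    "refund_policy": "A refund is allowed within 30 days of purchase.",
    "shipping_policy": "Standard shipping takes five business days.",
    "tool_cancel_order": "cancel_order(order_id): cancels an order that has not shipped.",
}

top = retrieve("What is the refund window after purchase?", docs, k=1)
print(top)
```

The retrieved snippets would then be placed in the model's context before it answers or chooses a tool call.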

Can I use the training data for fine-tuning?

Yes. You may fine-tune on the provided training data using any method (SFT, RLHF, DPO, etc.). You may also use the dev set for validation. The training and dev data are released under the MIT License.

Will technical reports be published? Where?

Selected technical reports will be published as non-archival proceedings on the competition website. This means they won’t preclude later publication at a venue of your choice. Reports are 4 pages in IJCAI format (excluding references).

How does the A2A protocol work?

The A2A (Agent-to-Agent) protocol is a structured communication format between the evaluation environment (Evaluator Agent) and your agent (Agent-under-Test). Your agent receives user messages and tool results, and sends back responses and tool calls. The starter kit includes a complete working example. See the Data & Starter Kit page for details.
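
The turn-taking described above can be pictured as a function that maps each incoming message to an outgoing one. The sketch below is only a mental model: the dictionary fields (`type`, `text`, `content`, `name`, `arguments`) and the `lookup_order` tool are assumptions made for illustration and are not the actual A2A message schema. The starter kit example remains the authoritative reference.

```python
def agent_turn(message):
    """Map one incoming message to one outgoing message (hypothetical format)."""
    if message["type"] == "user_message":
        # Decide to gather information first by calling a tool.
        return {
            "type": "tool_call",
            "name": "lookup_order",  # hypothetical tool name
            "arguments": {"query": message["text"]},
        }
    if message["type"] == "tool_result":
        # Turn the tool output into a reply for the user.
        return {"type": "response", "text": f"Here is what I found: {message['content']}"}
    raise ValueError(f"Unknown message type: {message['type']}")


# A scripted exchange standing in for the Evaluator Agent.
incoming = [
    {"type": "user_message", "text": "Where is my order?"},
    {"type": "tool_result", "content": "Order 42 ships tomorrow."},
]
outgoing = [agent_turn(m) for m in incoming]
for m in outgoing:
    print(m)
```

Here the agent answers a user message with a tool call and a tool result with a user-facing response, which mirrors the two kinds of input and two kinds of output named in the answer.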

Still Have Questions?

Join the Discord for real-time help, or email us directly.