Future agents will flip the plan:execute paradigm to 80:20 from today's 20:80 and much, much more reliably one-shot features because of it.
I expect every coding harness to have some kind of skill or plugin model of conditional rules to use in specific situations, with some other model likely deciding when each rule is included, so that the rules don’t pollute context like the dreaded MCP context flood.
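One way to picture conditional rules is a registry in which each rule carries a short "when to use" description, and a selector decides per task which rule bodies enter the context. The sketch below is only an illustration: `Skill`, `keyword_selector`, and `build_context` are hypothetical names, and the keyword match is a stand-in for the evaluating model the highlight describes.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Skill:
    name: str
    when: str  # short description of when the rule applies
    body: str  # full rule text, only added to context if selected


def keyword_selector(task: str, skill: Skill) -> bool:
    """Assumption: a trivial stand-in for the evaluating model.

    A real harness would presumably ask a small model whether the
    skill is relevant; here we only check for shared words.
    """
    task_words = set(task.lower().split())
    return any(word in task_words for word in skill.when.lower().split())


def build_context(
    task: str,
    skills: Sequence[Skill],
    selector: Callable[[str, Skill], bool] = keyword_selector,
) -> str:
    """Return only the rule bodies the selector judged relevant."""
    selected = [s for s in skills if selector(task, s)]
    return "\n\n".join(f"# {s.name}\n{s.body}" for s in selected)


SKILLS = [
    Skill("migrations", "database migration schema", "Always write a down migration."),
    Skill("frontend", "react component css", "Use the design-system tokens."),
]

context = build_context("add a database column", SKILLS)
```

Only the selected bodies reach the prompt, so unrelated rules cost nothing in context.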
nested directory rules are table stakes
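A minimal sketch of how nested directory rules could be resolved: walk from the file being edited up to the repository root and collect every rule file on the way, so that more specific directories can refine the general rules. The file name `AGENTS.md` and the function `collect_rules` are assumptions made for illustration, not any particular harness's behavior.

```python
from pathlib import Path

RULE_FILE = "AGENTS.md"  # assumed rule file name; harnesses differ


def collect_rules(repo_root: Path, target: Path) -> list[Path]:
    """Return rule files that apply to `target`, ordered root first.

    Assumes `target` lives inside `repo_root`; otherwise the walk
    simply continues up to the filesystem root.
    """
    repo_root = repo_root.resolve()
    directory = target.resolve()
    if not directory.is_dir():
        directory = directory.parent

    found = []
    while True:
        candidate = directory / RULE_FILE
        if candidate.is_file():
            found.append(candidate)
        # Stop at the repo root, or at the filesystem root as a safety net.
        if directory == repo_root or directory == directory.parent:
            break
        directory = directory.parent

    # Walked from most specific to most general; reverse so that
    # deeper rules come last and can refine the general ones.
    return list(reversed(found))
```

Calling `collect_rules(repo, repo / "src" / "api" / "handler.py")` would return the root rule file first and the one in `src/api` last, if both exist.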
Best of N is a standard generative practice for picking the best candidate. For those unfamiliar, Best of N refers to the practice of having a model answer a question multiple times (N being a placeholder for the number of times you want it to run) and picking the best resulting answer from those generations.
today, you can let the model itself evaluate the generations and pick the best one or synthesize the results into a coherent form.
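Best of N with a model acting as judge can be sketched in a few lines. Everything here is illustrative: `best_of_n`, `fake_generate`, and `fake_judge` are hypothetical names, and the two stubs stand in for real model calls that would produce and score candidates.

```python
from typing import Callable


def best_of_n(
    prompt: str,
    generate: Callable[[str, int], str],
    judge: Callable[[str, str], float],
    n: int = 4,
) -> str:
    """Generate n candidates and return the one the judge scores highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [generate(prompt, i) for i in range(n)]
    scores = [judge(prompt, c) for c in candidates]
    best_index = max(range(n), key=scores.__getitem__)
    return candidates[best_index]


# Stubs standing in for model calls (assumptions for the demo only).
DRAFTS = ["return a+b", "return a + b  # adds two numbers", "pass"]


def fake_generate(prompt: str, seed: int) -> str:
    return DRAFTS[seed % len(DRAFTS)]


def fake_judge(prompt: str, candidate: str) -> float:
    # Toy scoring: reject placeholders, otherwise prefer longer answers.
    return 0.0 if candidate == "pass" else float(len(candidate))


winner = best_of_n("write add(a, b)", fake_generate, fake_judge, n=3)
```

Synthesizing the candidates into one answer, the other option the highlight mentions, would replace the `max` step with a further model call that receives all candidates.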
retro/meta agents reviewing the agent's process and doing continuous improvement, like tuning its system prompt, managing claude/agents.md, adding new tooling, adding hooks or guards, etc.
The entire goal of the prime agent that you interact with is only to manage subagents, which do all of the coding, validation, and task execution. Its job is to help validate subagent behavior and keep the work on course and in scope.
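The division of labor between a prime agent and its subagents might look roughly like the loop below: the prime agent writes no code itself, it only dispatches tasks, checks each result against scope, and retries or rejects. The names `Result`, `in_scope`, and `run_prime`, and the "allowed path prefix" notion of scope, are assumptions chosen to keep the sketch small.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Result:
    task: str
    files_touched: list[str]
    tests_passed: bool


def in_scope(result: Result, allowed_prefix: str) -> bool:
    """Scope check: every touched file must sit under the allowed path."""
    return all(f.startswith(allowed_prefix) for f in result.files_touched)


def run_prime(
    tasks: Sequence[str],
    subagent: Callable[[str, int], Result],
    allowed_prefix: str,
    max_attempts: int = 2,
) -> tuple[list[Result], list[str]]:
    """Dispatch each task to a subagent and accept only validated results."""
    accepted: list[Result] = []
    rejected: list[str] = []
    for task in tasks:
        for attempt in range(max_attempts):
            result = subagent(task, attempt)
            if result.tests_passed and in_scope(result, allowed_prefix):
                accepted.append(result)
                break
        else:
            # No attempt passed validation; surface the task to the user.
            rejected.append(task)
    return accepted, rejected


def flaky_subagent(task: str, attempt: int) -> Result:
    """Stub subagent: drifts out of scope on its first attempt."""
    if attempt == 0:
        return Result(task, ["src/api/x.py", "README.md"], True)
    return Result(task, ["src/api/x.py"], True)


accepted, rejected = run_prime(["add endpoint"], flaky_subagent, "src/api/")
```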
the ability to click into a sub-agent to see its documented steps, with visual displays of when it is returned to the primary agent and dismissed or when a new one is dispatched.
You should absolutely always, either manually or through tooling, ask a model to review its work after it is finished.
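Done "through tooling", the review step can be a small loop: produce a draft, ask for a review, and revise until the reviewer has no objections or a round limit is hit. This is a sketch under assumptions; `work_then_review` and both stubs are hypothetical, and in practice `worker` and `reviewer` would be model calls.

```python
from typing import Callable, Optional


def work_then_review(
    task: str,
    worker: Callable[[str, Optional[str]], str],
    reviewer: Callable[[str, str], Optional[str]],
    max_rounds: int = 3,
) -> str:
    """Draft, review, and revise; the reviewer returns None to approve."""
    draft = worker(task, None)
    for _ in range(max_rounds):
        feedback = reviewer(task, draft)
        if feedback is None:
            return draft
        draft = worker(task, feedback)
    return draft  # round limit reached; the last draft is unreviewed


def stub_worker(task: str, feedback: Optional[str]) -> str:
    # First draft has a bug; any feedback triggers the corrected version.
    if feedback is None:
        return "def add(a, b): return a - b"
    return "def add(a, b): return a + b"


def stub_reviewer(task: str, draft: str) -> Optional[str]:
    return "uses subtraction instead of addition" if "-" in draft else None


final = work_then_review("write add(a, b)", stub_worker, stub_reviewer)
```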