
There are levels to AI coding, and Matthew Berman reckons most people are stuck on the bottom rung. Beginners prompt, wait for the agent to finish, review the work, then prompt again. Round and round. Experts do something different: they automate the whole loop so the machine keeps working while they sleep. This video is his attempt to show you what the actual pros are doing.
Start with the tools. Berman uses all of them because it's his job, but his two daily drivers are Cursor and Codex. Cursor he likes because you can run models from different companies (OpenAI, Anthropic, Cursor's own), and because it was first to ship cloud agents. Codex he rates for the design and the concise summaries: it runs a command, gives you a one- or two-sentence note on what it did, and moves on. He cannot stand reading essays from an agent. Claude Code is great, but he burns through quota too fast, so he's drifted off it. Devin and Factory both get a nod too. His advice: go use them, find your own fit.
Then there's the config layer: rules, AGENTS.md, CLAUDE.md. These are where you tell the tool exactly how to behave: commit structure, commit message style, the personality of the model, your coding preferences. Nearly everything supports AGENTS.md except Claude Code, which has its own CLAUDE.md. By Berman's account, Cursor's "rules" basically just write to the AGENTS.md file. Start with the vibe and personality, then add your workflow and deploy process as you learn what annoys you.
Now the bit he hammers hardest: skills. Anything you do more than once should be a skill. Instead of pasting the same prompt over and over, you type slash, invoke the skill, done. He uses them for repeated tasks, domain-specific rules (company writing style, how you write GitHub issues), tool instructions (how to kick off tests, how to use a specific API or CLI), and quality gates (run all tests locally, 100% pass rate before opening a PR, fix anything that fails). Agents can even discover which skill to use at runtime, so you don't always have to invoke it. There are loads off the shelf: he points at "agent skills," 61,000 GitHub stars, covering idea to PRD to code to QA to deploy. Grab the URL, tell Cursor or Codex or Factory to install it, restart if needed. Done.
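To make the quality-gate idea concrete, here's a minimal sketch of the check such a skill might run before it lets a PR open. It isn't Berman's skill or any tool's real API: `quality_gate` and `GateResult` are made-up names, and it assumes the test results have already been collected into a map of test name to pass/fail.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    passed: bool         # True only when every test passed
    pass_rate: float     # fraction of tests that passed, 0.0 to 1.0
    failures: list[str]  # names of the tests the agent still has to fix


def quality_gate(results: dict[str, bool]) -> GateResult:
    """Decide whether a PR may be opened, given test name -> passed."""
    failures = sorted(name for name, ok in results.items() if not ok)
    total = len(results)
    # Assumption: an empty test run counts as a failure, since nothing was verified.
    pass_rate = (total - len(failures)) / total if total else 0.0
    return GateResult(
        passed=total > 0 and not failures,
        pass_rate=pass_rate,
        failures=failures,
    )
```

An agent following the skill would keep fixing whatever is listed in `failures` and re-running the gate, and only open the PR once `passed` comes back true.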
The sponsor was Greptile, and it slots into the workflow as a code reviewer. Connect it to a repo and it reviews every PR automatically: a summary of what changed, a confidence score from 0 to 5 on whether merging will land cleanly, the files touched, a flowchart, and specific issues with a copy-paste prompt to fix them. Used by Nvidia, Compass, WorkOS, Zapier, Brex and Scale, apparently.
Here's where it gets interesting: automations and loops. Automations prompt your agent off a trigger. His example: when a PR opens, wait until Greptile leaves comments, address each one, push the fixed code back. Cursor even auto-detects the tools you'll need to make it run. Codex lets you build the same automation through chat or manual setup. Loops are the other half. A loop is a trigger, a repeated action, and an end goal so it knows when to stop. He announced a free loop library (signals.future.ai/loop-library) with real ones you can nick. Examples: an overnight docs sweep at 1am that compares yesterday's changes to the docs and fixes the gaps. A "sub-50ms page load" loop that hammers every page, modal and sidebar until everything loads in under 50 milliseconds (he's run it for hours, and says the app ended up lightning fast). And a production error sweep that reads the logs every night, diagnoses errors, writes a fix and opens a PR, so there's a fix waiting when he wakes up.
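The shape of a loop (repeated action plus an end goal) can be sketched in a few lines. This is an illustration, not code from the loop library: `run_loop`, `measure` and `improve` are hypothetical names, the trigger (a 1am schedule, a PR opening) is assumed to live outside the function, and the `max_rounds` cap is an added safety assumption so a loop that never hits its goal still stops.

```python
from typing import Callable


def run_loop(
    measure: Callable[[], float],
    improve: Callable[[], None],
    target: float,
    max_rounds: int = 20,
) -> tuple[bool, int]:
    """Repeat `improve` until `measure()` drops below `target`.

    Returns (goal_reached, rounds_used). Gives up after `max_rounds`.
    """
    rounds = 0
    while measure() >= target:  # end goal not met yet
        if rounds == max_rounds:
            return False, rounds  # safety cap hit, stop without success
        improve()  # the repeated action, e.g. one agent pass
        rounds += 1
    return True, rounds


# Simulated "sub-50ms" run: each agent pass halves a 200 ms load time.
state = {"ms": 200.0}


def measure_ms() -> float:
    return state["ms"]


def halve_ms() -> None:
    state["ms"] /= 2


print(run_loop(measure_ms, halve_ms, target=50.0))  # prints (True, 3)
```

In a real setup `measure` would time actual pages and `improve` would hand the slowest one to the agent; the point is only that the stop condition is checked before every pass.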
His best-practice trio: 100% test coverage, always-fresh documentation, and exhaustive logging. Log everything, keep a 7- or 30-day window, and let the agent fix whatever shows up. He calls it the flywheel.
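The "keep a window, fix what shows up" step might look something like this. It's a sketch under assumptions: the log entries are taken to be dicts with `level`, `time` and `msg` keys, which is an invented shape rather than any particular logging tool's format, and `errors_in_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def errors_in_window(
    entries: list[dict], now: datetime, window_days: int = 7
) -> list[dict]:
    """Return ERROR entries no older than `window_days`, oldest first."""
    cutoff = now - timedelta(days=window_days)
    recent = [
        e for e in entries
        if e["level"] == "ERROR" and e["time"] >= cutoff
    ]
    return sorted(recent, key=lambda e: e["time"])
```

Whatever this returns is the nightly to-do list you'd hand the agent; switching `window_days` to 30 gives the longer window he mentions.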
On cloud versus local agents: cloud spins up an isolated environment per agent, so you can run as many in parallel as you like (no frying your laptop running 12 to 20 agents), reach them from your phone, and avoid the conflicts you hit when multiple agents write to the same repo. Cursor even hands you a video and screenshots of the changes. Local is faster (the environment's already warm) and gives you more control and earlier access to new features, but he's moving his whole workflow to cloud anyway. Git worktrees get a mention too: separate working copies per agent so they don't trip over each other, with conflicts resolved at merge. Use them unless agents are working in totally separate areas. And set cloud agents up with a full environment: keys, env vars, the lot.
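For the worktree setup, here's a rough sketch of giving each agent its own working copy. `git worktree add -b <branch> <path> <base>` is a real Git command; everything else is an assumption for illustration: the `agent/<name>` branch naming, putting the copy in a sibling directory, `main` as the base branch, and the helper name `worktree_command`. It only builds the command, so nothing touches your repo until you choose to run it.

```python
from pathlib import Path


def worktree_command(repo_root: Path, agent_name: str, base: str = "main") -> list[str]:
    """Build the git command that creates one agent's separate working copy."""
    branch = f"agent/{agent_name}"  # hypothetical naming scheme
    # Sibling directory next to the repo, e.g. /src/app -> /src/app-docs
    path = repo_root.parent / f"{repo_root.name}-{agent_name}"
    return [
        "git", "-C", str(repo_root),
        "worktree", "add",
        "-b", branch,  # create a new branch for this agent
        str(path),     # where the working copy is checked out
        base,          # the commit or branch to start from
    ]
```

You could pass the result to `subprocess.run(cmd, check=True)` once per agent; each then commits on its own branch, and the conflicts wait until merge time.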
He closes on multi-model. Speed and cost are why you don't just use the one frontier model. Build the plan with a big model that sees the whole codebase, write the code with something cheaper like Composer, then review with a different model again (he names GPT 5.5) for a fresh pair of eyes. All of it baked into a skill.
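The plan, code, review hand-off can be sketched as a small pipeline. This is not how his skill is written: the models are stood in for by plain functions that take a prompt and return text, the prompt wording is invented, and `plan_code_review` is a hypothetical name. In practice each callable would wrap whichever provider you use.

```python
from typing import Callable

# Assumption: a "model" is anything that maps a prompt string to a reply string.
Model = Callable[[str], str]


def plan_code_review(
    task: str, planner: Model, coder: Model, reviewer: Model
) -> dict[str, str]:
    """Run one task through three different models in sequence."""
    # 1. The big model with full codebase context writes the plan.
    plan = planner(f"Write an implementation plan for: {task}")
    # 2. A cheaper, faster model turns the plan into code.
    code = coder(f"Implement this plan:\n{plan}")
    # 3. A third model reviews, so the author isn't marking its own work.
    review = reviewer(
        f"Review this change against the plan.\nPlan:\n{plan}\nCode:\n{code}"
    )
    return {"plan": plan, "code": code, "review": review}
```

Keeping the three roles as separate arguments is what lets you swap the cheap coder or the reviewer without touching the rest.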
The one thing he admits nobody has cracked: merging and deploying a dozen parallel agents to production at once. He's asked OpenAI, Cursor, the best engineers going. Still a mess.
The future of coding is you asleep and the agents grafting. The catch is they still can't agree on who merges first.