> While it might seem like a simple screenshot, building a browser from scratch is extremely difficult.
> Another experiment was doing an in-place migration of Solid to React in the Cursor codebase. It took over 3 weeks with +266K/-193K edits. As we've started to test the changes, we do believe it's possible to merge this change.
In my view, this post does not go into sufficient detail or nuance to warrant any serious discussion, and the sparseness of info mostly implies failure, especially in the browser case.
It _is_ impressive that the browser repo can do _anything at all_, but if there was anything more noteworthy than that, I feel they'd go into more detail than volume metrics like 30K commits, 1M LoC. For instance, the entire capability on display could be constrained to a handful of lines that delegate to other libs.
And, it "is possible" to merge any change that avoids regressions, but the majority of our craft asks the question "Is it possible to merge _the next_ change? And the next, and the 100th?"
If they merge the MR they're walking the walk.
If they present more analysis of the browser it's worth the talk (not that useful a test if they didn't scrutinize it beyond "it renders")
Until then, it's a mountain of inscrutable agent output that manages to compile, and that contains an execution pathway which can screenshot apple.com by some undiscovered mechanism.
Time to raise the bar. By 2029 someone will build a new browser using mainly AI-assisted coding and the surprise is that it was designed to be used by pelicans.
You're either overestimating the capabilities of current AI models or underestimating the complexity of building a web browser. There are tons of tiny edge cases and standards to comply with where implementing one standard will break 3 others if not done carefully. AI can't do that right now.
```
pub fn render_placeholder(&self, frame_id: FrameId) -> Result<FrameBuffer, String> {
let (width, height) = self.viewport_css;
let len = (width as usize)
.checked_mul(height as usize)
.and_then(|px| px.checked_mul(4))
.ok_or_else(|| "viewport size overflow".to_string())?;
if len > MAX_FRAME_BYTES {
return Err(format!(
"requested frame buffer too large: {width}x{height} => {len} bytes"
));
}
// Deterministic per-frame fill color to help catch cross-talk in tests/debugging.
let id = frame_id.0;
let url_hash = match self.navigation.as_ref() {
Some(IframeNavigation::Url(url)) => Self::url_hash(url),
Some(IframeNavigation::AboutBlank) => Self::url_hash("about:blank"),
Some(IframeNavigation::Srcdoc { content_hash }) => {
let folded = (*content_hash as u32) ^ ((*content_hash >> 32) as u32);
Self::url_hash("about:srcdoc") ^ folded
}
None => 0,
};
let r = (id as u8) ^ (url_hash as u8);
let g = ((id >> 8) as u8) ^ ((url_hash >> 8) as u8);
let b = ((id >> 16) as u8) ^ ((url_hash >> 16) as u8);
let a = 0xFF;
let mut rgba8 = vec![0u8; len];
for px in rgba8.chunks_exact_mut(4) {
px[0] = r;
px[1] = g;
px[2] = b;
px[3] = a;
}
Ok(FrameBuffer {
width,
height,
rgba8,
})
}
The moment all code is interacted with through agents I cease to care about code quality. The only thing that matters is the quality of the product, cost of maintenance etc. exactly the thing we measure software development orgs against. It could be handy to have these projects deployed to demonstrate their utility and efficacy? Looking at PRs of agents feels a wrong headed, like who cares if agents code is hard to read if agents are managing the code base?
I used similar techniques to build tjs [1] - the worlds fastest and most accurate json schema validator, with magical TypeScript types. I learned a lot about autonomous programming. I found a similar "planner/delegate" pattern to work really well, with the use of git subtrees to fan out work [2].
I think any large piece of software with well established standards and test suites will be able to be quickly rewritten and optimized by coding agents.
This is going to sound sarcastic, but I mean this fully: why haven't they merged that PR.
The implied future here is _unreal cool_. Swarms of coding agents that can build anything, with little oversight. Long-running projects that converge on high-quality, complex projects.
But the examples feel thin. Web browsers, Excel, and Windows 7 exist, and they specifically exist in the LLM's training sets. The closest to real code is what they've done with Cursor's codebase .... but it's not merged yet.
I don't want to say, call me when it's merged. But I'm not worried about agents ability to produce millions of lines of code. I'm worried about their ability to intersect with the humans in the real world, both as users of that code and developers who want to build on top of it.
Not everything, only code-bases of existing (open-source?) applications.
But what would be the point of re-creating existing applications? It would be useful if you can produce a better version of those applications. But the point in this experiment was to produce something "from scratch" I think. Impressive yes, but is it useful?
A more practically useful task would be for Mozilla Foundation and others to ask AI to fix all bugs in their application(s). And perhaps they are trying to do that, let's wait and see.
Did anyone manage to run the tests from the repository itself? The code seems filled with errors and warnings, as far as I can tell none of them because of the platform I'm on (Linux). I went and looked at the Action workflow history for some pages, and seems CI been failing for a while, PRs also all been failing CI but merged. How exactly was this verified to be something to be used as an successful example, or am I misunderstanding what point they are trying to make? They mention a screenshot, but they never actually mention if their goal was successfully met, do they?
I'm not sure the approach of "completely autonomous coding" is the right way to go. I feel like maybe we'll be able to use it more effectively if we think of them as something to be used by a human to accomplish some thing instead, lean into letting the human drive the thing instead, because quality spirals so quickly out of control.
The browser it built, obviously the context window of the entire project is huge. They mention loads of parallel agents in the blog post, so I guess each agent is given a module to work on, and some tests? And then a 'manager' agent plugs this in without reading the code? Otherwise I can't see how, even with ChatGPT 5.2/Gemini 3, you could do this otherwise? In retrospect it seems an obvious approach and akin to how humans work in teams, but it's still interesting.
GPT-5.2-Codex has a 400,000 token window. Claude 4.5 Opus is half of that, 200,000 tokens.
It turns out to matter a whole lot less than you would expect. Coding Agents are really good at using grep and writing out plans to files, which means they can operate successfully against way more code than fits in their context at a single time.
Get a good "project manager" agents.md and it changes the whole approach of vibe coding. For a professional environment, with each person given a little domain, arranged in the usual hierarchy of your coding team, truly amazing things can get done.
Presumably the security and validation of code still needs work, I haven't read anything that indicates those are solved yet, so people still need to read and understand the code, but we're at the "can do massive projects that work" stage.
Division of labor and planning and hierarchy are all rapidly advancing, the orchestration and coordination capabilities are going to explode in '26.
Supposing agents and their organization improve, it seems like we’re approaching a point where the cost of a piece of software will be driven down to the cost of running the hardware, and the cost of the tokens required to replicate it.
The tokens were “expensive” from the minds of humans …
It will be driven down to the cost of having a good project and product manager effectively understanding what the customer wants, which has been the main barrier to excellent software for a good long time.
And not only understanding what the customer wants, but communicating that unambiguously to the AI. And note who is the "customer" here? Is it the end-users, or is it a client-company which contracts the project-manager for this task? But then the issue is still there, who in the client-company decides exactly what is needed and what the (potential) users want?
I think this situation emphasizes the importance of (something like) Agile. To produce something useful can only happen via experimentation and getting feedback from actual users, and re-iterating relentlessly.
Can a browser expert please go through the code the agent wrote (skim it), and let us know how it is. Is it comparable to ladybird, or Servo, can it ever reach that capability soon?
Define "from scratch" in "building a web browser from scratch". This thing has over 100 crates as dependencies... To implement css layouting, it uses Taffy, a crate used by existing browser implementations...
> Another experiment was doing an in-place migration of Solid to React in the Cursor codebase. It took over 3 weeks with +266K/-193K edits. As we've started to test the changes, we do believe it's possible to merge this change.
In my view, this post does not go into sufficient detail or nuance to warrant any serious discussion, and the sparseness of info mostly implies failure, especially in the browser case.
It _is_ impressive that the browser repo can do _anything at all_, but if there was anything more noteworthy than that, I feel they'd go into more detail than volume metrics like 30K commits, 1M LoC. For instance, the entire capability on display could be constrained to a handful of lines that delegate to other libs.
And, it "is possible" to merge any change that avoids regressions, but the majority of our craft asks the question "Is it possible to merge _the next_ change? And the next, and the 100th?"
If they merge the MR they're walking the walk.
If they present more analysis of the browser it's worth the talk (not that useful a test if they didn't scrutinize it beyond "it renders")
Until then, it's a mountain of inscrutable agent output that manages to compile, and that contains an execution pathway which can screenshot apple.com by some undiscovered mechanism.
But is this actually true? They don't say that as far as I can tell, and it also doesn't compile for me nor their own CI it seems.
I shared my LLM predictions last week, and one of them was that by 2029 "Someone will build a new browser using mainly AI-assisted coding and it won’t even be a surprise" https://simonwillison.net/2026/Jan/8/llm-predictions-for-202... and https://www.youtube.com/watch?v=lVDhQMiAbR8&t=3913s
This project from Cursor is the second attempt I've seen at this now! The other is this one: https://www.reddit.com/r/Anthropic/comments/1q4xfm0/over_chr...
What is `FrameState::render_placeholder`?
``` pub fn render_placeholder(&self, frame_id: FrameId) -> Result<FrameBuffer, String> { let (width, height) = self.viewport_css; let len = (width as usize) .checked_mul(height as usize) .and_then(|px| px.checked_mul(4)) .ok_or_else(|| "viewport size overflow".to_string())?;
} ```What is it doing in these diffs?
https://github.com/wilsonzlin/fastrender/commit/f4a0974594e3...
I'd be really curious to see the amount of work/rework over time, and the token/time cost for each additional actual completed test case.
I think any large piece of software with well established standards and test suites will be able to be quickly rewritten and optimized by coding agents.
[1] https://github.com/sberan/tjs
[2] /spawn-perf-agents claude command: https://github.com/sberan/tjs/blob/main/.claude/commands/spa...
The implied future here is _unreal cool_. Swarms of coding agents that can build anything, with little oversight. Long-running projects that converge on high-quality, complex projects.
But the examples feel thin. Web browsers, Excel, and Windows 7 exist, and they specifically exist in the LLM's training sets. The closest to real code is what they've done with Cursor's codebase .... but it's not merged yet.
I don't want to say, call me when it's merged. But I'm not worried about agents ability to produce millions of lines of code. I'm worried about their ability to intersect with the humans in the real world, both as users of that code and developers who want to build on top of it.
because it is absolutely impossible to review that code and there is gazillion issues there.
The only way it can get merged is YOLO and then fix issues for months in prod which kinda defeats the purpose and brings gains close to zero.
But what would be the point of re-creating existing applications? It would be useful if you can produce a better version of those applications. But the point in this experiment was to produce something "from scratch" I think. Impressive yes, but is it useful?
A more practically useful task would be for Mozilla Foundation and others to ask AI to fix all bugs in their application(s). And perhaps they are trying to do that, let's wait and see.
I'm not sure the approach of "completely autonomous coding" is the right way to go. I feel like maybe we'll be able to use it more effectively if we think of them as something to be used by a human to accomplish some thing instead, lean into letting the human drive the thing instead, because quality spirals so quickly out of control.
It turns out to matter a whole lot less than you would expect. Coding Agents are really good at using grep and writing out plans to files, which means they can operate successfully against way more code than fits in their context at a single time.
Who created those agents and gives them the tasks to work on. Who created the tests? AI, or the humans?
Presumably the security and validation of code still needs work, I haven't read anything that indicates those are solved yet, so people still need to read and understand the code, but we're at the "can do massive projects that work" stage.
Division of labor and planning and hierarchy are all rapidly advancing, the orchestration and coordination capabilities are going to explode in '26.
The tokens were “expensive” from the minds of humans …
I think this situation emphasizes the importance of (something like) Agile. To produce something useful can only happen via experimentation and getting feedback from actual users, and re-iterating relentlessly.