What an interesting article. I did not expect to read it to the end when I opened it, but the writing was super clear and easy to follow.
In the end, I admire the craft and patience that went into trying to solve code diff rendering, and wish the folks at GitHub would put the same effort into improving their platform.
On a side note, I feel that we’re going to see more and more of this type of agentic usage in well-defined subtasks, and the ability of a model to try many possibilities is a huge gift here.
It's cool seeing all the engineering that goes into optimizing performance of diffs. I'm working on a FreeCAD workbench that generates diffs on CAD model trees[0], and although my bottlenecks are a bit different, I can still implement some of your optimizations down the line if needed (such as deferred syntax highlighting).
My main bottleneck is that I do a complete diff on all open + changed documents in the repository up front, because due to how document properties are stored, I won't know if the file has meaningful changes until I compute the full diff (FreeCAD may save the document, but not have anything meaningful change.)
Feature request: `git diff --color-moved` uses colors to display moved chunks of code. Scanning https://diffs.com/docs, it isn't obvious that y'all support that; please add it :)
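For anyone wondering what "moved" means here, a rough sketch follows. It assumes a line counts as moved when the same text is removed in one place and added in another. This is far cruder than git's real implementation, which works on blocks and has several modes, and `classify_changes` is a hypothetical name, not part of git or diffs.

```python
from difflib import SequenceMatcher

def classify_changes(old_lines, new_lines):
    """Label each changed line as removed, added, moved-from, or moved-to."""
    removed, added = [], []
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(old_lines[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])
    # Assumption: identical non-blank text on both sides means the line moved
    moved = {line for line in removed if line.strip()} & set(added)
    result = []
    for line in removed:
        result.append(("moved-from" if line in moved else "removed", line))
    for line in added:
        result.append(("moved-to" if line in moved else "added", line))
    return result
```

A renderer could then pick a different color for the `moved-from` and `moved-to` labels than for plain removals and additions.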
I disagree with the theory that scrolling frame rate doesn't need to be smooth for scrolling to feel smooth.
On mobile it kinda does. Scrolling diffs on mobile just kinda feels crap.
I have been spoiled by years of engineer hours spent getting scrolling to be 60- or even 120Hz smooth to match my finger, and diffs just.. isn't.
I know this is frustrating to hear, and that this is technically compounded by mobile probably having the lowest device performance to be playing with too, but.. There you go.
> disagree with the theory that scrolling frame rate doesn't need to be smooth for scrolling to feel smooth
It's possible you're misunderstanding what I was trying to say here. 120hz scrolling on a 120hz device was the goal, and that's why one of those virtualization techniques was not acceptable to me, which led me to come up with a novel workaround to this problem (Inverse Sticky Technique).
CodeView uses a system that allows scrolling to update at your native framerate (120hz) WITHOUT JavaScript needing to keep up at 120hz. If you're seeing stuttering while scrolling on https://diffshub.com, I would love to know more context (device/diff link/etc) because that is very much NOT our experience.
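For readers new to virtualization, here is a minimal sketch of the generic windowing arithmetic that virtualized lists tend to use. It is not the Inverse Sticky Technique and not CodeView's code: it assumes every row has the same fixed height, and `visible_range` and `overscan` are hypothetical names.

```python
def visible_range(scroll_top, viewport_height, row_height, total_rows, overscan=5):
    """Return the half-open [start, end) range of rows worth rendering."""
    first = scroll_top // row_height
    # Ceiling division so a partially visible last row is included
    last = -(-(scroll_top + viewport_height) // row_height)
    # Render a few extra rows on each side to hide blank gaps while scrolling
    start = max(0, first - overscan)
    end = min(total_rows, last + overscan)
    return start, end
```

Only the rows in that range need to exist in the DOM. The hard part being discussed in this thread is keeping that set up to date without the scroll itself waiting on JavaScript.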
Even the linked ghostty PR on your home page shows this - this is Firefox Android on a Nokia XR21 / TA-1486.
It's not unusable, but it definitely feels like 'js hacking my scrolling' and not a native surface flinging around.
The experience is actually worse with smaller movements, I guess because my brain is more conscious of breaking the 'finger physically moving the text' illusion.
I don't mean to be dismissive - you're working on a really hard problem, and you're clearly approaching it with a mindset of perfection. I'm posting because I know you're probably able to solve this too :)
Edit: as a point of (unfair) comparison, the codemirror Huge File demo works fine: https://codemirror.net/examples/million/
It does suffer from the occasional partial paint when quickly coasting, but I'm not bothered by this at all, it's far less intrusive than dropping frames / stuttering / etc.
Just to be clear tho, we don't actually scroll-jack; native scrolling works as it should and content should move with normal GPU-composited scroll. That said, it's possible that loading that much data into memory is causing knock-on effects somehow that are just slowing everything down.
Matters a great deal on desktop too, and laptops for that matter. Even more on platforms like macOS that smooth scrolling by default, but it's very noticeable on Windows and various Linux distributions too when native scrolling is janky/choppy, and it frustrates even casual users.
I was hoping that this would talk more about the logic behind generating a diff, rather than the optimisations involved in rendering the text.
IMO (as someone who doesn't have to deal with the actual rendering) it should go a bit deeper into deciding how to show what has changed. There are a lot of improvements that could be made there, e.g. "whitespace has changed here", so there are no real code changes involved.
Or "this big list of imports has changed, and code formatting has line-wrapped the list into different lines" - gitlab for example copes poorly with this. I'd love to just see a clean diff that highlights the additional import, and not just ten lines of changes caused by adding one line to a big list of imported symbols/functions.
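A minimal sketch of both ideas, assuming a crude regex tokenizer is good enough to tell "only whitespace changed" apart from a real edit, and to pull the new name out of a re-wrapped import list. The function names are hypothetical, and a real tool would need a proper parser, since whitespace inside string literals does matter.

```python
import re

def tokens(text):
    """Split text into words and single punctuation marks, dropping whitespace."""
    return re.findall(r"\w+|[^\w\s]", text)

def whitespace_only_change(old, new):
    """True when the text differs but the token stream is identical."""
    return old != new and tokens(old) == tokens(new)

def added_names(old_import, new_import):
    """Names present in the new import statement but not in the old one."""
    old_names = set(re.findall(r"\w+", old_import))
    return [name for name in re.findall(r"\w+", new_import) if name not in old_names]

old = "from shapes import circle, square, triangle\n"
new = "from shapes import (\n    circle,\n    hexagon,\n    square,\n    triangle,\n)\n"
print(added_names(old, new))
```

Here the line-based diff would show one removed line and six added lines, while the only thing a reviewer cares about is the single new name.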
One of our next big projects is actually to support semantic diffs, which I think will be a lot more applicable to what you're asking for here. Currently diffs just takes a normal git patch file, or generates one from 2 versions of a file.
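To illustrate the "generate one from 2 versions of a file" path, here is a small sketch using Python's standard `difflib`. This is not the diffs API, and the output lacks git's `diff --git` header line. `make_patch` and the default path are made up for the example.

```python
import difflib

def make_patch(old_text, new_text, path="example.txt"):
    """Build a unified-diff style patch from two versions of one file."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    patch_lines = difflib.unified_diff(
        old_lines, new_lines, fromfile=f"a/{path}", tofile=f"b/{path}"
    )
    return "".join(patch_lines)

print(make_patch("one\ntwo\nthree\n", "one\n2\nthree\n"))
```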
Most projects start with tree-sitter and then switch to language-native parsers. Either way, it's not something you solve yourself: you just find the language-specific implementation and either load megabytes of WASM on the frontend or generate it on the backend.
difftastic, semanticdiff.. lots of projects like that. Obviously they can offer stuff like "function name changed" instead of showing you 30 lines of +newName -oldName
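As a toy illustration of the "function name changed" idea, the sketch below uses Python's standard `ast` module. It is not how difftastic or semanticdiff work. It assumes that two top-level functions with identical arguments and bodies but different names are a rename, and the helper names are hypothetical.

```python
import ast

def body_fingerprints(source):
    """Map a fingerprint of each top-level function's args and body to its name."""
    result = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            # ast.dump omits line numbers by default, so position is ignored
            shape = ast.dump(node.args) + "".join(ast.dump(s) for s in node.body)
            # Assumption: bodies are unique; identical bodies would collide here
            result[shape] = node.name
    return result

def renamed_functions(old_source, new_source):
    """Return (old_name, new_name) pairs for functions whose body is unchanged."""
    old = body_fingerprints(old_source)
    new = body_fingerprints(new_source)
    return [(old[s], new[s]) for s in old if s in new and old[s] != new[s]]
```

Recursive functions would defeat this version, because the old name also appears inside the body.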
> rather than the optimizations involved in rendering the text.
Any views they have on this topic are going to come across as quite opinionated given their choices for text rendering for this post and the general aesthetics of the website.
Naw, the truth is I'm not really smart or intelligent enough to build a semantic diff system. For that you'll need to wait on a post from one of our smarter devs, this was a post about rendering diffs in a browser.
It is SO NICE to see people working on making fast, nice-to-use tools. It's a lovely experience to use diffshub. Thank you for creating it, and thank you for the great write-up! (I made it "that far")
Yes and no. It would help to improve things a bit when it comes to the measure/reconciliation phase (hard to say how much). However, we've already done a pretty good job around batching writes vs reads.
However passing a million lines of code through pretext is unlikely to be very efficient, so a lot of the work around estimation is still very important.
That said, while I don't want to make pretext a direct dependency of the library, there's a good chance I'll explore the possibility of allowing devs to pass it in as an additional argument to perhaps improve performance a bit.
It should also be noted that we have a full API to support things like line annotations (comments, etc) that are entirely controlled by the user, so there's always a bit of a dynamic aspect there that would come into play
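To make the estimation point concrete, here is a rough sketch of estimating total height without measuring anything. It is not the library's estimator: it assumes a monospaced font that wraps at a fixed column count, and treats user-controlled annotations as extra heights keyed by line index. All names are hypothetical.

```python
import math

def estimate_height(line_lengths, columns, line_height, annotation_heights=None):
    """Estimate the pixel height of wrapped lines plus any per-line annotations."""
    annotation_heights = annotation_heights or {}
    total = 0
    for index, length in enumerate(line_lengths):
        # An empty line still occupies one visual row
        rows = max(1, math.ceil(length / columns))
        total += rows * line_height + annotation_heights.get(index, 0)
    return total
```

Real wrapping breaks on words and has to deal with wide characters, so an estimate like this still needs correcting once rows are actually measured.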
I feel like virtualization is not the right way to handle things. It adds so much complexity and makes the user experience buggy due to breaking optimizations and features of browsers.
Computers are very powerful these days and have a ton of resources that they can use. We should be able to handle large diffs without any crazy tricks.
Performance and optimization is one of many pieces, but yes, it's a meme to render 500k lines.
That said though, and maybe I didn't say it well in the post, the more performant and optimized your tool is, the less burden you put on developers and users.
Sure you won't review 100k lines, but maybe the diff includes a ton of testing snapshots, or maybe it's a long running feature branch and you need to just quickly jump in and look at a specific change from a specific file. The less the developer or the user needs to think about `how` to render the diff or `how to navigate the diff`, the better we did our job.
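One way to picture the "jump to a specific file" case is to index the patch by file up front. The sketch below assumes git-style `diff --git a/path b/path` headers and simple paths, and `index_patch_by_file` is a hypothetical helper, not something from diffs.

```python
def index_patch_by_file(patch_text):
    """Map each file path to the (start, end) line range of its patch section."""
    lines = patch_text.splitlines()
    index = {}
    current, start = None, 0
    for number, line in enumerate(lines):
        if line.startswith("diff --git "):
            if current is not None:
                index[current] = (start, number)
            # Assumption: the path contains no " b/" of its own
            current = line.split(" b/", 1)[-1]
            start = number
    if current is not None:
        index[current] = (start, len(lines))
    return index
```

With an index like this, a viewer can go straight to one file's section instead of walking the whole patch.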
https://repo.autonoma.ca/repo/treetrek/commit/3fe9360599ae23...
The diffs rendering library looks amazing: https://diffs.com/
Presumably the red-green issue is a simple CSS update?
[0] https://github.com/eblanshey/HistoryWorkbench
(I say this, having done a vibe-port of the code to a browser extension, so the underlying concept works.)
something i'd really want to see from forges is alternate diff techniques: like AST diffing.