Profiling Hacker News users based on their comments

(simonwillison.net)

51 points | by simonw 1 hour ago

21 comments

alexpotato 1 hour ago
Story time:
My first full time job (early 2000s) was working for a firm that did online cybersecurity related investigations for Fortune 500 companies (generally via a 3rd party law firm they had retained).
A big part of this was running investigations into people running "pump and dump" stock schemes on Yahoo message boards. We would generally start by scraping all of the posts for a user who had instigated one of these and then handing off the posts to an analyst.
It's amazing:
a. how much info people give out even when they think they are being careful
b. related to a, how even small tidbits combined over time can build a pretty accurate picture of who someone is.
e.g. they post "oh man, the Cubs lost", then a year later "went for a walk on Lake Shore Drive", and another year later "there was a fire at my local subway stop", etc. You pretty quickly narrow down the rough neighborhood where they live in Chicago.
Combine that with tools like LexisNexis and you get a list of people that you can narrow down by age, sex, occupation, etc. We could narrow it down to <20 people based on other info they had shared.
Then you fold in their posting patterns and it's pretty obvious who is at work (posting 9 to 5pm) vs home (posting 7pm to 1am).
Again, you keep adding constraints and the intersection of the Venn diagrams gets smaller and smaller.
This was all in the early 2000s before we had cellphones that tracked your location and ad infrastructure that followed you around the internet.
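A minimal sketch of that constraint-intersection idea, in JavaScript. Every candidate record, field name, and clue below is invented for illustration; none of it comes from a real investigation or a real data source.

```javascript
// Hypothetical candidate pool; every record here is made up.
const candidates = [
  { name: "A", city: "Chicago", neighborhood: "Lakeview", ageRange: "30-39" },
  { name: "B", city: "Chicago", neighborhood: "Lakeview", ageRange: "20-29" },
  { name: "C", city: "Chicago", neighborhood: "Hyde Park", ageRange: "30-39" },
  { name: "D", city: "Milwaukee", neighborhood: "Bay View", ageRange: "30-39" },
];

// Each clue gleaned from a post becomes a predicate over candidates.
const constraints = [
  { clue: "Cubs fan, walks on Lake Shore Drive", test: (c) => c.city === "Chicago" },
  { clue: "fire at the local subway stop", test: (c) => c.neighborhood === "Lakeview" },
  { clue: "age range from a records search", test: (c) => c.ageRange === "30-39" },
];

// Apply the constraints one at a time and record how the pool shrinks.
function narrow(pool, rules) {
  const steps = [];
  let remaining = pool;
  for (const rule of rules) {
    remaining = remaining.filter(rule.test);
    steps.push({ clue: rule.clue, remaining: remaining.map((c) => c.name) });
  }
  return steps;
}

const steps = narrow(candidates, constraints);
for (const s of steps) {
  console.log(`${s.clue}: ${s.remaining.join(", ")}`);
}
```

No single clue identifies anyone here; it is the intersection that does the work, which is why each extra tidbit matters more than it seems to.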
- martin-t 0 minutes ago
  I thought I had deja vu when reading your comment so I searched and found that you wrote something very similar 6 years ago, then 4 months ago and then 3 comments within the last month.
  Out of curiosity and without meaning it to sound like an accusation, did you write such similar posts by hand or do you use some form of automation for commenting?
- nunez 41 minutes ago
  People search engines do a lot of the heavy lifting and can give you that data on a platter for a few dollars. I pay for a service that employs people to periodically do data removal requests with them. It's not great that _they_ have a bunch of data about me, but I'd rather it be in one place that tries to safeguard it than in a bunch of places all over the Internet. (There are A LOT of people search engines.)
  As for using clues to discover people's whereabouts and such: lots of police/detective shows have turned "finding where people are through Instagram photos" into a meme. Most people don't think about cybersecurity outside of "oh, I need to change my password now."
- eth0up 1 hour ago
  So how do you think this situation will change now that LexisNexis, Oracle, Palantir, Clearview and others are all converging with our four frontier LLM models (plus military contracts) or directly with their own AI?
  What used to require a little work is now instant. And we're much further into the predictive part than most will admit.
- api 1 hour ago
  Think about browser fingerprinting. Every little bit of info is literally one more bit, so by the time you get to 32 bits you’ve narrowed it down to one in four billion. An oversimplification but that’s the idea.
  Being strongly private online requires spy tradecraft levels of precaution.
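  The arithmetic behind that, as a rough JavaScript sketch. It assumes every bit is an independent, evenly split yes/no fact, which real fingerprint attributes are not, and it uses a round figure of eight billion for the world population.

  ```javascript
  // With n independent, evenly split bits, a population shrinks by 2^n.
  // This is an idealized bound; real attributes are correlated and skewed.
  function anonymitySetSize(population, bits) {
    return population / 2 ** bits;
  }

  // Whole bits needed to single out one individual in a population.
  function bitsToIdentify(population) {
    return Math.ceil(Math.log2(population));
  }

  console.log(2 ** 32);                       // 4294967296
  console.log(bitsToIdentify(8e9));           // 33
  console.log(anonymitySetSize(2 ** 32, 32)); // 1
  ```

  So roughly 33 ideal bits would cover everyone alive; in practice it takes more, because the bits overlap.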
stego-tech 1 hour ago
This is...disquieting. It's one thing to know that it's possible, another thing to know nation states or large megacorps are doing it, but another thing entirely to see such verbose output from free models about, well, me.
The first two, I've made peace with (nothing I can do about it anyway). The last one picks quite fiercely at old trauma that really makes me reconsider my socials in general, not just HN.
But maybe that's just the anxiety and trauma talking, encouraging me to recede back into the shadows and re-apply the old mask of "acceptableness" I've been trying to toss aside. Maybe the fact a free chatbot can do such a thorough analysis is in fact reason enough to stop worrying about every aspect of my identity and its perception by others, and instead just...be me, and deal with whatever consequences arise from that.
I dunno. Just...lot of emotions, here, most of them quite bad.
- nirui 14 minutes ago
  From another perspective, it's like hearing others judging you behind your back. The first few times it's awkward and maybe even annoying, but given enough time you stop giving a damn about it.
  But the problem is real if nation states or megacorps are doing it. They'll use such tech in an unjustified way, make a misjudgement, and then ask you to explain your way out of the situation. Yeah, they're definitely going to do that, because they don't give a damn about it.
- simonw 1 hour ago
  Right, as is so often the case with AI stuff, the thing that's disconcerting is how cheap, low friction and widely available this ability is now.
  Anyone with access to a decent LLM can now perform a version of this in just a few seconds.
  - stego-tech 37 minutes ago
    It's a lot to take in, if I'm being honest. Growing up in the sort of cultures where gossip and tabloids were the norm, this tool is painful to me in a way I'm not sure many folks can understand. It's not even low friction anymore; it's no friction, in the sense that anyone with a chatbot and minimal rails can just ask it to do these sorts of profiles now, on anyone they choose.
    We desperately need to modernize laws around discrimination in light of the proliferation of these tools. No longer does someone need to thread the needle in interviews around "illegal" questions to find something to (metaphorically) hang an interviewee with, as these tools can pick it apart quite cleanly. People in protected classes are going to get reamed by bad actors leveraging these tools.
    That said, after rubber ducking with a friend on this, I've come to the conclusion that there's two paths forward from this point: flight (scrubbing socials, hiding online, creating an acceptable persona) or fight (being firmly authentic, owning your weirdness, and accepting you can't control the outcomes of others' actions using these tools). I've spent decades in 'flight', and I'm tired of it. I can't control who uses these tools and to what end, so I may as well just be my damn self anyhow and do regular threat assessments accordingly. The more people who behave authentically, the less power these tools have over us.
  - lysace 38 minutes ago
    That is apparently forbidden information. Your post went from the frontpage to page 3 in minutes.
johnfn 1 hour ago
> This is arguably their defining HN characteristic: they are one of the most vocal, persistent AI optimists on the platform. They claim ~90-95% of their shipped code is AI-generated, report 5-10x productivity gains, and have built a detailed methodology around it — using Playwright for visual verification, static typechecking as a hallucination filter, and e2e test suites as automated validation harnesses
Wow, I sound really annoying. Sorry about that everyone!
- hypercube33 1 hour ago
  I sound like an annoying old person, I guess, so I think I'm worse. Either way I forgive you. (GPT called me a wiring closet gremlin)
sachaa 1 hour ago
You can also do this with a simple bookmarklet, no extension needed.
Create a new bookmark in your browser, name it something like "Profile HN User", and paste this as the URL:
javascript:void(function(){var u;var m=window.location.href.match(/news\.ycombinator\.com\/user\?id=([^&]+)/);if(m){u=m[1]}else{u=prompt(%27Enter HN username:%27)}if(!u)return;var msg=%27Profile this HN user: https://hn.algolia.com/api/v1/search_by_date?tags=comment,au...})()
If you're on a HN profile page (news.ycombinator.com/user?id=someone) it grabs the username automatically. Otherwise it prompts you to type one. It copies the profiling prompt to your clipboard and opens a new Claude conversation, just Cmd/Ctrl+V and hit Enter.
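For readers who would rather not paste a minified one-liner, here is a readable sketch of just the username-detection step. It reuses the regex from the bookmarklet above; the function name extractHnUsername is made up for this example, and the rest of the bookmarklet (the prompt text and the clipboard and new-tab steps) is left out because the snippet above is truncated.

```javascript
// Same pattern as the bookmarklet: capture the id= value of an HN profile URL.
const PROFILE_RE = /news\.ycombinator\.com\/user\?id=([^&]+)/;

// Returns the username from a profile URL, or null if the URL is not one.
// (extractHnUsername is a hypothetical helper, not part of the original.)
function extractHnUsername(href) {
  const match = href.match(PROFILE_RE);
  return match ? match[1] : null;
}

// In the bookmarklet, a null result is the case where prompt() takes over.
console.log(extractHnUsername("https://news.ycombinator.com/user?id=someone")); // someone
console.log(extractHnUsername("https://news.ycombinator.com/item?id=1"));       // null
```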
zoogeny 20 minutes ago
It is interesting that Marc Andreessen was having a bit of an X crash-out over his belief that introspection is bad [1].
I disagree because I tend to seek a middle way. I would agree that too much (excessive) introspection is bad. But I would argue that too little is equally bad.
I think obsessively examining one's own comment history would verge on excessive. I'm wondering how much LLM analysis of my public and private life can remain healthy.
[1] https://x.com/pmarca/status/2035190797218587116
janalsncm 1 hour ago
Not doubting the method works in general, but Simon Willison is a public-enough figure so the baseline level of info is higher than just HN comments. If you turn off Claude’s web search:
> Simon Willison is a British software developer, blogger, and open-source advocate, best known for…
- simonw 1 hour ago
  I'm pretty sure Claude hasn't picked up my fondness for kākāpō parrots yet.
  - xnx 1 hour ago
    "Make an image of a kākāpō parrot riding a bicycle"
    - simonw 1 hour ago
      I usually ask for a skateboard, kākāpō have dumpy little legs.
  - chewbacha 1 hour ago
    Well, now it might.
- georgemcbay 1 hour ago
  [dead]
alexgandy 1 hour ago
This just in; posting ridiculous amounts of personal information on the internet can lead to you being profiled correctly. Wild stuff.
- Retr0id 1 hour ago
  We all know this, but in the past someone probably wasn't going to go through thousands of your comments unless you've really pissed them off. It's worth realising how much lower the activation energy is these days.
  - alexgandy 1 hour ago
    You’re right about the speed; but it doesn’t change the outcome. If you don’t want it to be associated with you, you simply can’t put it out there.
    - rdevilla 1 hour ago
      Yes, and then people wonder why you are a gaping hole in the social media surveillance dragnet with your absence. It took 12 years for me to make my first comment on this account.
  - SilentM68 1 hour ago
    Yeah, it takes just one carefully worded comment to turn the majority of readers in this forum against you, particularly if the comment does not align with the majority's point of view. Definitely a place with a low melting point, where my HN karma consistently takes a beating :)
- shusaku 1 hour ago
  The real game starts when you use a tool like this to introduce a counter narrative. Now excuse me while I’m off to play in my next professional basketball game.
  - llbbdd 1 hour ago
    I'll see you on the court, Victor Wembanyama.
- cm2012 1 hour ago
  There was an Irish mafia guy who was caught because his anonymous restaurant review profile got linked to him.
  - barrkel 1 hour ago
    Christy Kinahan is still at large, in or around Dubai.
- DoctorOetker 1 hour ago
  Not entirely correctly though, since there are forms of censorship even on HN, which selectively blind any method of analysis in a systematically biased way.
- stefan_ 1 hour ago
  Yeah, but now with AI. Because the shtick of this blog is "everything is ~~computer~~ AI".
- aaron695 1 hour ago
  [dead]
n2d4 1 hour ago
This was interesting to do on my own profile. It got a bunch of personality attributes about me right that I haven't directly mentioned on here, which is impressive.
I then followed it up with "Given my chat history, how do they compare to me?", and it started making comparisons of myself to myself. Very fun experience.
plun9 1 hour ago
You can just ask a chatbot about Hacker News or reddit users based on their username.
- aworks 6 minutes ago
  [dead]
michaelteter 58 minutes ago
And note that HN does not allow you to delete your comments after a short time passes.
If you contact them and ask for your data to be deleted, they will directly refuse.
few 1 hour ago
See also
https://antirez.com/news/150
https://antirez.com/hnstyle?username=pg&threshold=20&action=...
Which lets you find the alts of a handle
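To give a feel for how style matching can work in general, here is a toy JavaScript sketch that compares texts by how often they use a handful of common function words. This is not a description of how the linked tool works; the word list, the sample sentences, and the function names are all assumptions chosen for illustration.

```javascript
// A small, arbitrary list of function words; real stylometry uses far more.
const FUNCTION_WORDS = ["the", "and", "of", "to", "a", "i", "that", "it", "but", "not"];

// Relative frequency of each function word in a text.
function styleVector(text) {
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  const total = words.length || 1;
  return FUNCTION_WORDS.map((w) => words.filter((x) => x === w).length / total);
}

// Cosine similarity between two vectors; returns 0 if either is all zeros.
function cosine(a, b) {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const norm = (v) => Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Invented sample texts standing in for two accounts' comment histories.
const a = styleVector("I think that it is not the tool but the data that matters.");
const b = styleVector("I think that the data matters and not the tool, but it is close.");
const c = styleVector("Ship fast. Measure everything. Iterate weekly.");

console.log(cosine(a, b) > cosine(a, c)); // true
```

A high score only means two accounts write alike, so matches are best read as suspects rather than confirmed alts, especially when the text samples are short.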
- defrost 17 minutes ago
  suspected alts .. like the pg example, I also have similar accounts with similar match levels, and I know I've never had a HN alt, nor do I recognise any of my suggested alts as familiar accounts I've interacted with.
JSR_FDED 1 hour ago
Given a profile like this, how good would an LLM be at figuring out whether the profile is from a bot or a real person?
- wolvoleo 1 hour ago
  Probably pretty good. Humans are very consistent in ways that LLMs aren't.
Forgeties79 1 hour ago
> “Two things can be true at the same time” — he holds nuanced positions
I feel the need to point out that 99% of the time that phrase is essentially an insult and isn’t indicative of a “nuanced position” lol it generally means “you’re myopic in your views/your argument lacks nuance.” That strikes me as a pretty charitable interpretation by the model there.
You seem like a good dude, and I’m not going to pretend I haven’t thrown out the flippant quip here and there in my comments. I just thought that interpretation was pretty funny.
- simonw 1 hour ago
  Hah, that's completely fair. When I say "two things can be true at the same time" it's usually in a combative tone when I think someone is making a weak argument.
  - Forgeties79 52 minutes ago
    I’d be lying if I said I haven’t done literally the exact same thing. Hell I’ve probably done it recently.
sgbeal 57 minutes ago
(...does this thing to check own profile[^1]...)
> Old man raising fist at, and yelling at, clouds. Get off his lawn.
[^1]: not really - this is speculation (so... kinda the same thing the LLM is doing) but is possibly an accurate representation.
Simulacra 48 minutes ago
I've been doing this for a long time, it's amazing what ChatGPT can suss out with enough data. I like to feed it comments from message boards to try to uncover interesting business opportunities, or threads to follow for my own research.
tamimio 55 minutes ago
> Recurring Hobby Horse
>The word "engineer" being diluted by software/bootcamp culture is something they return to obsessively — arguably their strongest ideological position alongside surveillance criticism
Busted!!
That being said, I'm not surprised, because it listed exactly how I want my persona to appear. Does that mean I am like that IRL? No. I rarely bring up the above “engineer” term IRL, let alone obsess about it, but on HN it makes sense to bring up. The rest is mostly techie stuff that I usually don’t bring up with my friends or family. Also, this can apply to anything you produce, like your blog, books, YouTube, or anything else. That personality is what attracts (or repels) other people to be around you; it’s human society 101.
vpribish 1 hour ago
HAHAHA - I like me. but claude (sonnet 4.6) seemed like it was cheerleading a bit
SanjayMehta 1 hour ago
"Fetched 0 comments."
bibimsz 1 hour ago
hacker news is a goldmine since you can't delete comments nor even delete your account. this site is a privacy nightmare, in a world where everyone is excited to cancel and dox for unpopular opinions (on this site that means anything to the right of bernie sanders).
- nunez 33 minutes ago
  I actually appreciate this. I definitely panicked that one time I posted something spicy about a past employer using this handle and couldn't remove it, but forcing that accountability makes me think about what I say and whether it's worth posting it for everyone to see forever.
- raw_anon_1111 1 hour ago
  It could be worse, I have arguments from the 90s when I was on comp.sys.mac.advocacy
- SilentM68 1 hour ago
  Correct!
- b112 1 hour ago
  Have you read the FAQ? HN won't hang you out to dry.
  - lysace 1 hour ago
    They will offer to rename your account name to something random.
    They will not delete all your comments. They might agree to delete a very small number of comments.
    That does not help much in the profile-building/AI perspective.
    They should be transparent about this upfront, on the signup page, but I suppose that would hurt conversation stats.
    Cliché but true: On HN we are the product.
    - toomuchtodo 1 hour ago
      I ingest, process, and archive the HN firehose. I know others do as well. Regardless of how one feels, once you put something on the Internet, any hope of control of that info is gone forever. Act accordingly. They are kind enough to make changes within some forum integrity tolerances, even though those changes are likely to help very little from an opsec perspective.
      Edit: my use case is building a graph for archiving every link ever posted on HN (posts and comments), if that’s relevant. The contents of HN comments have little value to me for my workflow, nor do I profile users.
      - lysace 1 hour ago
        I know. I despise you.
        Edit: However, the real blame must go to YC who refuses to state any of these things on the extremely minimalistic signup page.
        - toomuchtodo 1 hour ago
          That’s a shame, I don’t have any feelings about you and wish you well regardless.
          - TimorousBestie 28 minutes ago
            Wow, so magnanimous.
            - toomuchtodo 26 minutes ago
              You mispronounced genuinely polite. It is free to be so. Is that not allowed here when someone doesn’t have strong feelings? If you care about this, you care too much imho; find something that matters to care about.
              - TimorousBestie 20 minutes ago
                It's not polite to be condescending.
                - toomuchtodo 18 minutes ago
                  My comment was genuine. It is a genuine shame someone states they despise me with little additional context, and as I said, it doesn’t bother me and I do wish them well. Where was the condescension? Am I supposed to have strong feelings? Am I supposed to lash out? No, all unnecessary. “Be conservative in what you do, be liberal in what you accept from others.” They are simply at a different point in their journey than I am.
                  How else would you politely respond to someone who comments only “I know. I despise you”?
- BenjiWiebe 1 hour ago
  Feel free to not post on here, if you're concerned about privacy.
raw_anon_1111 41 minutes ago
[dead]