No strcpy either

(daniel.haxx.se)

134 points | by firesteelrain 4 hours ago

13 comments

  • Tharre 47 minutes ago
    It's worth noting that strcpy() isn't just bad from a security perspective, on any CPU that's not completely ancient it's bad from a performance perspective as well.

    Take the best case scenario, copying a string where the precise length is unknown but we know it will always fit in, say, 64 bytes.

    In earlier days, I would always have used strcpy() for this task, avoiding the "wasteful" extra copies memcpy() would make. It felt efficient, after all you only replace a i < len check with buf[i] != null inside your loop right?

    But of course it doesn't actually work that way, copying one byte at a time is inefficient so instead we copy as many as possible at once, which is easy to do with just a length check but not so easy if you need to find the null byte. And on top of that you're asking the CPU to predict a branch that depends completely on input data.

    • amelius 17 minutes ago
      We should just move away from null-terminated strings, where we can, as fast as we can.
      • raverbashing 8 minutes ago
        Yes

        And maybe even have a (arch dependent) string buffer zone where the actual memory length is a multiple of 4 or even 8

  • t43562 2 hours ago
    I've always wondered at the motivatons of the various string routines in C - every one of them seems to have some huge caveat which makes them useless.

    After years I now think it's essential to have a library which records at least how much memory is allocated to a string along with the pointer.

    Something like this: https://github.com/msteinert/bstring

    • alexfoo 45 minutes ago
      Yet software developed in C, with all of the foibles of its string routines, has been sold and running for years with trillions of USD is total sales.

      A library that records how much memory is allocated to a string along with the pointer isn't a necessity.

      Most people who write in C professionally are completely used to it although the footgun is (and all of the others are) always there lurking.

      You'd generally just see code like this:-

          char hostname[20];
          ...
          strncpy( hostname, input, 20 );
          hostname[19]=0;
      
      The problem obviously comes if you forget the line to NUL that last byte AND you have a input that is greater than 19 characters long.

      (It's also very easy to get this wrong, I almost wrote `hostname[20]=0;` first time round.)

      I remember debugging a problem 20+ years ago on a customer site with some software that used Sybase Open/Server that was crashing on startup. The underlying TDS communications protocol (https://www.freetds.org/tds.html) had a fixed 30 byte field for the hostname and the customer had a particularly long FQDN that was being copied in without any checks on its length. An easy fix once identified.

      Back then though the consequences of a buffer overrun were usually just a mild annoyance like a random crash or something like the Morris worm. Nowadays such a buffer overrun is deadly serious as it can easily lead to data exfiltration, an RCE and/or a complete compromise.

      Heartbleed and Mongobleed had nothing to do with C string functions. They were both caused by trusting user supplied payload lengths. (C string functions are still a huge source of problems though.)

      • __float 28 minutes ago
        > Yet software developed in C, with all of the foibles of its string routines, has been sold and running for years with trillions of USD is total sales.

        This doesn't seem very relevant. The same can be said of countless other bad APIs: see years of bad PHP, tons of memory safety bugs in C, and things that have surely led to significant sums of money lost.

        > It's also very easy to get this wrong, I almost wrote `hostname[20]=0;` first time round.

        Why would you do this separately every single time, then?

        The problem with bad APIs is that even the best programmers will occasionally make a mistake, and you should use interfaces (or...languages!) that prevent it from happening in the first place.

        The fact we've gotten as far as we have with C does not mean this is a defensible API.

        • alexfoo 3 minutes ago
          Sure, the post I was replying to made it sound like it's a surprise that anything written in C could ever have been a success.

          Not many people starting a new project (commercial or otherwise) are likely to start with C, for very good reason. I'd have to have a very compelling reason to do so, as you say there are plenty of more suitable alternatives. Years ago many of the third party libraries available only had C style ABIs and calling these from other languages was clumsy and convoluted (and would often require implementing cstring style strings in another language).

          > Why would you do this separately every single time, then?

          It was just an illustration or what people used to do. The "set the trailing NUL byte after a strncpy() call" just became a thing lots of people did and lots of people looked for in code reviews - I've even seen automated checks. It was in a similar bucket to "stuff is allocated, let me make sure it is freed in every code path so there aren't any memory leaks", etc.

          Many others would have written their own function like `curlx_strcopy()` in the original article, it's not a novel concept to write your own function to implement a better version of an API.

    • jerf 20 minutes ago
      Yes, not having a length along with the string was a mistake. It dates from an era where every byte was precious and the thought of having two bytes instead of one for length was a significant loss.

      I have long wondered how terrible it would have been to have some sort of "varint" at the beginning instead of a hard-coded number of bytes, but I don't have enough experience with that generation to have a good feel for it.

    • lesuorac 1 hour ago
      It's from a time before computer viruses no?

      But also all of this book-keeping takes up extra time and space which is a trade-off easily made nowadays.

      • rini17 1 hour ago
        Yes, in the old times if you crashed a program or whole computer with invalid input, it was your fault.

        Viruses did exist, and these were considered users' fault too.

    • formerly_proven 2 hours ago
      strncpy is fairly easy, that's a special-purpose function for copying a C string into a fixed-width string, like typically used in old C applications for on-disk formats. E.g. you might have a char username[20] field which can contain up to 20 characters, with unused characters filled with NULs. That's what strncpy is for. The destination argument should always be a fixed-size char array.

      A couple years ago we got a new manual page courtesy of Alejandro Colomar just about this: https://man.archlinux.org/man/string_copying.7.en

      • Cyph0n 2 hours ago
        strncpy doesn’t handle overlapping buffers (undefined behavior). Better to use strncpy_s (if you can) as it is safer overall. See: https://en.cppreference.com/w/c/string/byte/strncpy.html.

        As an aside, this is part of the reason why there are so many C successor languages: you can end up with undefined behavior if you don’t always carefully read the docs.

        • Asooka 1 hour ago
          Back when strncpy was written there was no undefined behaviour (as the compiler interprets it today). The result would depend on the implementation and might differ between invocations, but it was never the "this will not happen" footgun of today. The modern interpretation of undefined behaviour in C is a big blemish on the otherwise excellent standards committee, committed (hah) in the name of extremely dubious performance claims. If "undefined" meaning "left to the implementation" was good enough when CPU frequency was measured in MHz and nobody had more than one, surely it is good enough today too.

          Also I'm not sure what you mean with C successor languages not having undefined behaviour, as both Rust and Zig inherit it wholesale from LLVM. At least last I checked that was the case, correct me if I am wrong. Go, Java and C# all have sane behaviour, but those are much higher level.

          • Cyph0n 40 minutes ago
            The problem isn't undefined behavior per se; I was using it as an example for strncpy. Rust is a no - in fact, the goal of (safe) Rust is to eliminate undefined behavior. Zig on the other hand I don't know about.

            In general, I see two issues at play here:

            1. C relies heavily on unsized pointers (vs. fat pointers), which is why strncpy_s had to "break" strncpy in order to improve bounds checks.

            2. strncpy memory aliasing restrictions are not encoded in the API and can only be conveyed through docs. This is a footgun.

            For (1), Rust APIs of this type operate on sized slices, or in the case of strings, string slices. Zig defines strings as sized byte slices.

            For (2), Rust enforces this invariant via the borrow checker by disallowing (at compile-time) a shared slice reference that points to an overlapping mutable slice reference. In other words, an API like this is simply not possible to define in (safe) Rust, which means you (as the user) do not need to pore over the docs for each stdlib function you use looking for memory-related footguns.

      • dundarious 1 hour ago
        Yes, these were also common in several wire formats I had to use for market data/entry.

        You would think char symbol[20] would be inefficient for such performance sensitive software, but for the vast majority of exchanges, their technical competencies were not there to properly replace these readable symbol/IDs with a compact/opaque integer ID like a u32. Several exchanges tried and they had numerous issues with IDs not being "properly" unique across symbol types, or time (intra-day or shortly before the open restarts were a common nightmare), etc. A char symbol[20] and strncpy was a dream by comparison.

      • ufo 2 hours ago
        A big footgun with strncpy is that the output string may not be null terminated.
        • kccqzy 2 hours ago
          Yeah but fixed width strings don’t need null termination. You know exactly how long the string is. No need to find that null byte.
          • ninkendo 2 hours ago
            Until you pass them as a `char *` by accident and it eventually makes its way to some code that does expect null termination.

            There’s languages where you can be quite confident your string will never need null termination… but C is not one of them.

            • kccqzy 1 hour ago
              You don’t do that by accident. Fixed-width strings are thoroughly outdated and unusual. Your mental model of them is very different from regular C strings.
          • Sharlin 2 hours ago
            Good luck though remembering not to pass one to any function that does expect to find a null terminator.
            • kevin_thibedeau 1 hour ago
              Ignore the prefix and always treat strncpy() as a special binary data operation for an era where shaving bytes on storage was important. It's for copying into a struct with array fields or direct to an encoded block of memory. In that context you will never be dependent on the presence of NUL. The only safe usage with strings is to check for NUL on every use or wrap it. At that point you may as well switch to a new function with better semantics.
            • integralid 45 minutes ago
              That's not a problem with strncpy, right? Fixed width records are a thing of the past, and even then it was only used for on-disk storage.
            • andrepd 1 hour ago
              Seriously. We have type systems and compilers that help us to not forget these things. It's not the 70s anymore!
      • dingi 1 hour ago
        Isn't strlcpy the safer solution these days?
  • ahoka 41 minutes ago
    "strncpy() is a weird function with a crappy API."

    Well if you bother looking up that it's originally created for non null-terminated strings, then it kinda makes sense.

    The real problem begun when static analyzers started to recommend using it instead of strcpy (the real alternative used to be snprintf, now strlcpy).

  • swinglock 2 hours ago
    I'm surprised curlx_strcopy doesn't return success. Sure you could check if dest[0] != '/0' if you care to, but that's not only clumsy to write but also error prone, and so checking for success is not encouraged.
    • jutter 2 hours ago
      This is especially bizarre given that he explains above that "it is rare that copying a partial string is the right choice" and that the previous solution returned an error...

      So now it silently fails and sets dest to an empty string without even partially copying anything!?

    • AlexeyBrin 2 hours ago
      I guess the idea is that if the code does not crash at this line:

          DEBUGASSERT(slen < dsize);
      
      it means it succeeded. Although some compilers will remove the assertions in release builds.

      I would have preferred an explicit error code though.

      • swinglock 1 hour ago
        assert() is always only compiled if NDEBUG is not defined. I hope DEBUGASSERT is just that too because it really sounds like it, even more so than assert does.

        But regardless of whether the assert is compiled or not, its presence strongly signals that "in a C program strcpy should only be used when we have full control of both" is true for this new function as well.

  • Waterluvian 48 minutes ago
    > Enforce checks close to code

    This makes a lot of sense but one time I find this gets messy is when there’s times I need to do checks earlier in a dataset’s lifetime. I don’t want to pay to check multiple times, but I don’t want to push the check up and it gets lost in a future refactor.

    I’m imagining a metadata for compile time that basically says, “to act on this data it must have been first checked. I don’t care when, so long as it has been by now.” Which I’m imagining is what Rust is doing with a Result type? At that point it stops mattering how close to code a check is, as long as you type distinguish between checked and unchecked?

  • pama 3 hours ago
    Congrats on the completion of this effort! C/C++ can be memory safe but take some effort.

    IMHO the timeline figure could benefit in mobile from using larger fonts. Most plotting libraries have horrible font size defaults. I wonder why no library picked the other extreme end: I have never seen too large an axis label yet.

    • saagarjha 2 hours ago
      Removing strcpy from your code does not make it memory safe.
      • kjjfnkeknrn 1 hour ago
        Removing strcpy from your code does make it a little memory safer.
        • alexfoo 42 minutes ago
          Removing strcpy from your code makes it a little less memory unsafe.

          (Depends on what you replace it with obviously...)

    • Tempest1981 2 hours ago
      Yes, the graph font-sizes seem intended for printing them on a single sheet of paper, vs squeezed into a single column in a blog.
  • Scubabear68 3 hours ago
    From the article:

    > It has been proven numerous times already that strcpy in source code is like a honey pot for generating hallucinated vulnerability claims

    This closing thought in the article really stood out to me. Why even bother to run AI checking on C code if the AI flags strcpy() as a problem without caveat?

    • CGamesPlay 2 hours ago
      It's not quite as black and white as the article implies. The hallucinated vulnerability reports don't flag it "without caveat", they invent a convoluted proof of vulnerability with a logical error somewhere along the way, and then this is what gets submitted as the vulnerability report. That's why it's so agitating for the maintainers: it requires reading a "proof" and finding the contradiction.
    • Sharlin 1 hour ago
      Because these people who run AI checks on OSS code and submit bogus bug reports either assume that AIs don't make mistakes, or just don't care if the report is legit or not, because there's little to no personal cost to them even if it isn't.
    • saagarjha 2 hours ago
      Because people are stupid and use AI for things it is not good at.
      • Tempest1981 2 hours ago
        > people are stupid

        people overestimate AI

        • lesuorac 1 hour ago
          Its weird though because looking through the hackone reports in the slop wiki page there aren't actually reproduction steps. It's basically always just a line of code and an explanation of how a function can be mis-used but not a "make a webserver that has this hardcoded response".

          So like why doesn't the person iterate with the AI until they understand the bug (and then ultimately discover it doesn't exist)? Like have any of this bug reports actually paid out? It seems like quickly people should just give up from a lack of rewards.

          • zahlman 49 minutes ago
            > So like why doesn't the person iterate with the AI until they understand the bug (and then ultimately discover it doesn't exist)? Like have any of this bug reports actually paid out? It seems like quickly people should just give up from a lack of rewards.

            This sounds a bit like expecting the people who followed a "make your own drop-shipping company" tutorial to try using the products they're shipping to understand that they suck.

          • amenhotep 1 hour ago
            As long as the number of people newly being convinced that AI generated bounty demands are a good way to make money equals or exceeds the number of people realising it isn't and giving up, the problem remains.

            Not helped, I imagine, that once you realise it doesn't work, an easy pivot is to start convincing new people that it'll work if they pay you money for a course on it.

            • zahlman 48 minutes ago
              Apparently FOSS developers have been getting this kind of slop report even though they clearly don't offer a bug bounty.
  • loeg 2 hours ago
    A weird Annex-K like API. The destination buffer size includes space for the trailing nul, but the source size only includes non-nul string bytes.

    I don't really think this adds anything over forcing callers to use memcpy directly, instead of strcpy.

  • zahlman 1 hour ago
    > To make sure that the size checks cannot be separated from the copy itself we introduced a string copy replacement function the other day that takes the target buffer, target size, source buffer and source string length as arguments and only if the copy can be made and the null terminator also fits there, the operation is done.

    ... And if the copy can't be made, apparently the destination is truncated as long as there's space (i.e., a null terminator is written at element 0). And it returns void.

    I'm really not sold on that being the best way to handle the case where copying is impossible. I'd think that's an error case that should be signaled with a non-zero return, leaving the destination buffer alone. Sure, that's not supposed to happen (hence the DEBUGASSERT macro), but still. It might even be easier to design around that possibility rather than making it the caller's responsibility to check first.

  • stabbles 2 hours ago
    Apart from Daniel Sternberg's frequent complaints about AI slop, he also writes [1]

    > A new breed of AI-powered high quality code analyzers, primarily ZeroPath and Aisle Research, started pouring in bug reports to us with potential defects. We have fixed several hundred bugs as a direct result of those reports – so far.

    [1] https://daniel.haxx.se/blog/2025/12/23/a-curl-2025-review/

  • snvzz 3 hours ago
    The AI chatbot vulnerability reports part sure is sad to read.

    Why is this even a thing and isn't opt-in?

    I dread the idea of starting to get notifications from them in my own projects.

    • trollbridge 2 hours ago
      Making a strcpy honeypot doesn’t sound like a bad idea…

        void nobody_calls_me(const char *stuff) {
                char *a, *b;
                const size_t c = 1024;
      
                a = calloc(c);
                if (!a) return;
                b = malloc(c);
                if (!b) {
                        free(a);
                        return;
                }
                strncpy(a, stuff, c - 1);
                strcpy(b, a);
                strcpy(a, b);
                free(a);
                free(b);
        }
      
      Some clever obfuscation would make this even more effective.
      • snvzz 10 minutes ago
        That got those Core SDI abo vibes.

        Flashback of writing exploits for these back in high school.

    • easterncalculus 1 hour ago
      It's a symptom of complete failure of this industry that maintainers are even remotely thinking about, much less implementing changes in their work to stave off harassment over false security impact from bots.
    • Y_Y 3 hours ago
      Because humans generate and relay the slop-reports in the hopes of being helpful
      • nottorp 21 minutes ago
        There is or was a cash bug bounty.

        And even if not, the motivation is building a reputation as a security “expert”.

      • captn3m0 3 hours ago
        s/being helpful/making money.
  • senthil_rajasek 3 hours ago
    Title is :

    No strcpy either

    @dang

    • Snild 3 hours ago
      I don't see a problem with that, but for the record, the title on the site is lower-case for me (both browser tab title, and the header when in reader mode).
      • 1f60c 2 hours ago
        I think the submission originally had a typo ("strpy", with no C)
  • TZubiri 2 hours ago
    LMAO

    After all this time the initial AI Slop report was right:

    https://hackerone.com/reports/2298307

    • lesuorac 1 hour ago
      ?

      Nonce and websockets don't appear at all in the blog post. The only thing the ai slop got right is that by removing strcpy curl will get less issues [submitted about it].