Welcome to yet another article that began life as a rant in the comments section of an article in The Coded Message. It's a good blog, pay it a visit!
Introduction
There are many good reasons to champion C++, and to want it to get safer. There are literally billions of lines of C++ in the world¹; this code is not going away any time soon. If it has bugs, those bugs should be cleaned up—and a more modern, safer C++ is a fantastic tool to achieve this.
There is also, however, a really bad reason to champion C++. That being: “But why would you consider C++ memory-unsafe when it's got so much better lately? Why bother switching to another language like Rust? There's no reason!”
This is, to put it mildly, absurdly false. As established earlier, Rust's memory safety is to C++'s as Mach 3 is to a particularly slow walker. Even calling Rust's memory safety “overwhelmingly better” than C++'s would vastly undersell it: a 20× improvement on C++'s safety still wouldn't reach the level of unsafe Rust, let alone safe Rust.
“We can become like Rust too! Well, uh, sort of, some day, maybe?”
Many of C++'s biggest advocates (Bjarne Stroustrup and Herb Sutter, to name just two) appear to be saying something to the effect of “With enough development, C++ will manage to offer a reasonable approximation of Rust's memory safety, if the programmer has enough discipline/time/courage/determination/money/blood/sweat/tears/sacrifices to Shub-Niggurath!”
And, well…
- Ample experimental evidence says this is not remotely true. But let's ignore that.
- Rust is memory-safe right now, not at some undefined point in the future. But let's ignore that.
- Playing catch-up is not remotely close to sufficient any more, not when Rust is breaking so much ground in new features. But let's ignore that.
- Rust gives developers memory safety essentially for free. No need for blood or sweat or tears, at least once they learn how the borrow-checker works. But let's ignore that.
- Rust gives practically-complete memory safety, not just “a reasonable approximation”. But, let's ignore even that!
Even if we ignore all five of those points, even if we assume that the pedestrian can become a Concorde, even if we assume that this will happen completely and painlessly and right now… Rust would nonetheless have an overwhelming advantage in one area: auditability.
Auditability: Rust vs C++
This article was indirectly inspired, to a large extent, by Bjarne Stroustrup's response to the NSA. Reading through it, the absolute most he can bring himself to promise is that one can convince oneself, and only oneself, of the safety of one's own C++ program. That would only be sufficient in a fantasy land where every programmer writes everything from scratch, every time. This might be the C/C++ experience for some, out of sheer necessity, but such programmers are a slim minority.
Out here in the 21st century, there exist these little things called “package managers”. They let us trivially incorporate other people's code into our own. Crucially, they also make it necessary to judge the memory safety of the code we incorporate.
How does each language go about this?
Rust
Auditing a Rust dependency for memory safety consists of running grep unsafe over its source; if it returns nothing, then you're done, bam, no more thought needed. Even if it returns something, it's only that specific minuscule fraction of the crate that you have to audit, not all of it. And while this admittedly can't detect whether the crate carries over memory unsafety from its own dependencies, said dependencies can in turn be audited just as easily. That alone gives Rust an edge that C++ could only ever dream of.
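To make that concrete, here's a sketch of the kind of thing such an audit typically lands on (the function is invented for illustration): one unsafe block, wrapped behind a safe function, with its justification sitting right next to it. Those few lines are the entire audit surface for the crate.

```rust
// Hypothetical excerpt from a dependency: the only `unsafe` in the whole crate,
// which is exactly where `grep unsafe` drops you.

/// Returns the first byte of `slice`, or `None` if the slice is empty.
pub fn first_byte(slice: &[u8]) -> Option<u8> {
    if slice.is_empty() {
        return None;
    }
    // SAFETY: we just checked that the slice is non-empty, so index 0 is in bounds.
    Some(unsafe { *slice.get_unchecked(0) })
}
```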
What's that you say? Having to do manual audits is just too time-consuming? cargo-geiger not good enough? Well then, you can just add --forbid unsafe_code --cap-lints=forbid to your RUSTFLAGS, and unsafe code is forbidden across all your dependencies, direct and transitive. Memory safety guaranteed by the compiler.
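And for one's own crates, the same ban can be pinned directly in the source with the #![forbid(unsafe_code)] attribute; a minimal sketch:

```rust
// At the crate root (lib.rs or main.rs): any `unsafe` block anywhere in this
// crate is now a hard compile error, and no `#[allow]` further down can undo it.
#![forbid(unsafe_code)]

/// Same job as the earlier example, but written against safe APIs only.
pub fn first_byte(slice: &[u8]) -> Option<u8> {
    slice.first().copied()
}
```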
C++
For comparison: the closest thing we can get to this for C++ is static analysers like Astrée. Their efficiency is, well… let me just quote the web-page for you:
a program of 132,000 lines of C [was] analyzed in 80 minutes on a 2.8 GHz 32-bit PC using 300 Mb of memory, and 50 minutes on a 64-bit AMD Athlon™ 64 using 580 Mb of memory.
Yeeeeeeeeeeeeah. Fancy doing that every time you change one line in the code-base? 'Cause I don't think I do. Granted, those results are from 2003 and computers have got much faster since then; on the other hand, 132kLoC is not exactly a gigantic code-base either. One would assume that, if they had more recent impressive results, they'd have bothered mentioning them on their web-page.
For Rust meanwhile, where static analysis is baked into the compilation pipe-line, “very slow to compile” means “more than five minutes”, even for projects spanning 500+kLoC.
And that's before we even get to the part where they don't even have a fixed price for their product; “contact us for a quote” tends to translate to “as much as we can extort you for”. Rust, meanwhile, is just… free.
The fundamental problem with static analysers like Astrée is that C/C++ currently has no way to expose safe interfaces, and because of that it has no local reasoning either. Thus, all a static analyser can do is examine the entire code-base for mistaken usages, every time a change happens. It follows that static analysis cannot examine code-bases in linear time: every part of the code-base has to be checked against every other part of the code-base. That is an $O(N^2)$ cost, meaning that a code-base 10 times larger needs roughly 100 times the compute to be examined. And, again: any change to the program, even to just one line, invalidates every prior analysis, so all of this has to be re-run for each change we want to make. This is very much a losing proposition.
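To illustrate what safe interfaces and local reasoning buy you, here's a minimal Rust sketch (the type is invented for illustration): the unsafety is confined to one small module, its invariant is established in exactly one place, and nothing the rest of the code-base does can invalidate an audit of this part, nor vice versa.

```rust
/// A vector that is guaranteed to hold at least one element.
/// The field is private, so the invariant can only be touched inside this module:
/// auditors check these few lines once and never again, no matter how the rest
/// of the code-base changes.
pub struct NonEmpty<T> {
    items: Vec<T>,
}

impl<T> NonEmpty<T> {
    /// The invariant (at least one element) is established here, and only here.
    pub fn new(first: T) -> Self {
        NonEmpty { items: vec![first] }
    }

    pub fn push(&mut self, item: T) {
        self.items.push(item);
    }

    /// Safe to call from anywhere in the program.
    pub fn first(&self) -> &T {
        // SAFETY: `items` is never empty by construction, so index 0 always exists.
        unsafe { self.items.get_unchecked(0) }
    }
}
```

A C++ static analyser gets no such boundary to lean on, which is why it has to start over, across the whole code-base, every single time.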
There's also an empirical factor: Astrée has existed since 2003. FAANG companies ought to be able to afford it, yet around 65% of their bugs are still memory-safety bugs. This can't be a solution, not if we assume that FAANG companies aren't dumb. Again: all of this is using Google's own data.
To once again quote Alex Gaynor: “Until you have the evidence, don’t bother with hypothetical notions that someone can write 10 million lines of C without ubiquitous memory-unsafety vulnerabilities—it’s just Flat Earth Theory for software engineers.”
═════════════════════════════════════════════════════════════════════════════
PS: Many thanks to this commenter for offering some much-needed clarifications, instead of (judging by the name) exploding everything in sight.
¹ Google alone has around 2 billion lines of code. Assuming that even a quarter of those is C++, that's half a GLoC of C++. Assuming similar sizes and percentages for the rest of the tech-industry giants, we get a rough estimate of at least 1–2 GLoC of C++ world-wide.