A few things I’ve learned while auditing open-source projects.
calloc
These days, the common wisdom is not to worry much about useless writes. For small, simple cases, this is great advice: people sometimes take real risks just to omit a few store instructions, even though modern compilers are often smart enough to eliminate dead assignments on their own. Trying to shave away needless initializations is rarely a valuable time investment.
calloc, however, can be a notable exception. As you likely know, calloc zeroes the memory that it allocates. It’s probably linked into your binary dynamically, so the compiler can’t identify its writes as useless in IR- or assembly-level optimizations.
Clang 3.7 with -O2 can reduce a trivial usage like this:
p = calloc(1, sizeof(long)); return p[0];
to return 0. However, a typical usage of the form:
p = calloc(n, sizeof(struct huge_struct)); memcpy(p, p2, n * sizeof(struct huge_struct));
will call the stdlib’s calloc function, and therefore includes the useless zeroing. Writes of this type can be tens of kilobytes, which is a big performance hit in a hot path.
The compiler doesn’t have an easy out, either. While there are platform-specific functions like OpenBSD’s reallocarray, there is no POSIX- or ANSI-specified calloc relative that doesn’t zero its memory. The compiler could add an overflow check and then call malloc, but it’s then responsible for the correctness and the size of this silently injected code.
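To make the "overflow check and then call malloc" idea concrete, here is a minimal sketch of what such injected code would have to do. The name malloc_array is hypothetical and not part of any standard library; the division-based check is one common way to detect overflow, not necessarily what a compiler would emit.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not a standard function): allocate room for n
 * objects of `size` bytes each WITHOUT zeroing them.
 * Returns NULL if n * size would overflow size_t, or if malloc fails. */
void *malloc_array(size_t n, size_t size)
{
    /* For size != 0, n * size overflows exactly when n > SIZE_MAX / size. */
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;
    return malloc(n * size);
}
```

Even this small version adds a division and a branch at every call site, which is the correctness and code-size burden described above.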
So, avoid calloc when immediately overwriting the allocated memory. Instead, use malloc if the number of elements is 1 and your platform’s reallocarray equivalent otherwise. If there isn’t one, you can always include the file from OpenSSH’s portability code.
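As a rough illustration of that advice, the calloc-then-memcpy pattern could be rewritten along these lines. Everything named here is an assumption for the example: HAVE_REALLOCARRAY stands in for whatever your build system defines when the platform provides reallocarray (some libcs also need a feature-test macro to declare it), alloc_array and dup_items are made-up names, and struct item stands in for struct huge_struct. The fallback is written in the spirit of OpenBSD’s overflow check, not copied from it.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifdef HAVE_REALLOCARRAY   /* hypothetical build-system macro */
#define alloc_array(n, size) reallocarray(NULL, (n), (size))
#else
/* sqrt(SIZE_MAX + 1): if both operands are below this value,
 * their product cannot overflow, so the division can be skipped. */
#define MUL_NO_OVERFLOW ((size_t)1 << (sizeof(size_t) * 4))

/* Fallback: overflow-checked, non-zeroing array allocation. */
static void *alloc_array(size_t n, size_t size)
{
    if ((n >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
        n > 0 && SIZE_MAX / n < size) {
        errno = ENOMEM;
        return NULL;
    }
    return malloc(n * size);
}
#endif

/* Stand-in for the post's struct huge_struct. */
struct item { char payload[4096]; };

/* Copy n items into a fresh buffer; every byte is written exactly once,
 * by memcpy, instead of being zeroed first and then overwritten. */
struct item *dup_items(const struct item *src, size_t n)
{
    struct item *p = alloc_array(n, sizeof *p);
    if (p == NULL)
        return NULL;
    memcpy(p, src, n * sizeof *p);
    return p;
}
```

The caller frees the result with free, exactly as it would for a calloc-allocated buffer.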
The above calloc example exposes a more general weakness in modern C. Compiler optimizations at the basic block and function level are very strong, yet it’s amazing how little Clang[1] leverages the libc API for optimization. There are a few complications, but it seems like a fruitful approach.
To this end, I’m writing a strnlen(3) optimization for Clang. Hopefully I’ll be able to write similar optimizations for other string.h functions, and hopefully I’ll have time to share details of the experience and related advice here.
[1] I haven’t had a chance to investigate how GCC does this. What I’ve heard about its architecture and code quality makes me hesitant…