I used to naively assume that Clang always handed off “basically” the same IR to
the LLVM optimisation pipeline regardless of optimisation level. I was at least
aware of the optnone attribute set on functions when compiling at -O0, but
I’ve slowly started to notice there are more divergences than just that.
Survey
In an attempt to gain a bit more understanding of exactly what kinds of
decisions depend on optimisation level in Clang, I surveyed the IR emission
code paths.
I examined the Clang source at commit 7c4c72b52038810a8997938a2b3485363cd6be3a
(2024-08).
I ignored decisions related to specialised language features (Objective-C, ARC, HLSL, OpenMP) and ABI details.
- When optimisation is disabled
- When optimisation is enabled
  - Check if errno is disabled
  - Pass __builtin_expect along via llvm.expect
  - Add various virtual table invariants and assumptions
  - Split constant struct / array stores into a sequence of stores, one per field
  - Add various variable invariants
  - Add load range metadata
  - Add matrix index assumptions
  - Collapse trap calls
  - Add exact dynamic casts
  - Add loop unrolling metadata
  - Add condition likelihood
  - Track condition likelihood
  - Pass __builtin_unpredictable along via metadata
  - Add lifetime markers
  - Add type-based alias analysis (TBAA) metadata
  - Add strict virtual table metadata
  - Preserve function declarations
  - Add opportunistic virtual tables
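To make one of these entries concrete, here is a minimal sketch of a function that uses __builtin_expect. The function name and its behaviour are hypothetical, chosen only so that there is a hinted branch to look at. Going by the survey, I'd expect the hint to reach the IR as a call to llvm.expect only when optimisation is enabled.

```c
#include <stddef.h>

/* Hypothetical helper: sums the non-negative entries of `values`.
 * Negative entries are treated as a rare case and skipped. */
long sum_non_negative(const int *values, size_t count) {
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        /* Hint that the condition is expected to be false (0). */
        if (__builtin_expect(values[i] < 0, 0)) {
            continue;
        }
        total += values[i];
    }
    return total;
}
```

The hint does not change the result of the function; it only gives the compiler a branch probability to work with.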
Example
If you’d like to explore the differences yourself, take a look at this Compiler
Explorer example. The input source is not too
interesting (I’ve grabbed a random slice of Git source files that I happened to
have on hand). The left IR view shows -O0 and the right IR view shows -O1
with LLVM passes disabled. We can ask Clang to produce LLVM IR without sending
it through the LLVM optimisation pipeline by adding -Xclang -disable-llvm-passes
(a useful tip for LLVM archaeology).
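The same comparison can be sketched locally. The file name, output names, and function below are hypothetical; the commands in the comment combine the flag mentioned above with Clang's standard -S -emit-llvm options. With a function like this, which stores through pointers of two different types, I'd expect the -O1 output to carry !tbaa metadata on the loads and stores and the -O0 output to mark the function optnone.

```c
/* example.c (hypothetical file name)
 *
 * Emit unoptimised-by-LLVM IR at two optimisation levels and compare:
 *
 *   clang -O0 -S -emit-llvm -Xclang -disable-llvm-passes example.c -o o0.ll
 *   clang -O1 -S -emit-llvm -Xclang -disable-llvm-passes example.c -o o1.ll
 *   diff o0.ll o1.ll
 */

/* Stores through an int pointer and a float pointer, then reloads the int.
 * The two accesses have distinct types, which is the kind of information
 * TBAA metadata describes. */
int store_both(int *i, float *f) {
    *i = 1;
    *f = 2.0f;
    return *i;
}
```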
After diffing the two outputs, there are two features that are only activated when optimisation is enabled that appear to be responsible for most of the differences in this example:
- Lifetime markers
- Type-based alias analysis (TBAA) metadata
Lifetime markers are especially interesting in this example, as Clang actually
reshapes control flow (adding several additional cleanup blocks) so that it
can insert these markers (which are calls to the LLVM intrinsic functions
llvm.lifetime.start/end).
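A small sketch of the kind of code where this might show up: a block-scoped buffer with more than one way out of its scope. The function is hypothetical and not taken from the Git sources. My assumption is that each exit from the inner block needs its own path to the end-of-lifetime marker for buffer, which is where the extra cleanup blocks would come from when optimisation is enabled.

```c
#include <string.h>

/* Hypothetical helper: copies at most 15 bytes of `src` into a block-scoped
 * buffer and returns the copied length, or -1 if `src` is NULL. */
int scoped_length(const char *src) {
    if (src == NULL) {
        return -1;
    }
    {
        char buffer[16]; /* lifetime is limited to this inner block */
        strncpy(buffer, src, sizeof buffer - 1);
        buffer[sizeof buffer - 1] = '\0';
        if (buffer[0] == '\0') {
            return 0; /* early exit from the scope */
        }
        return (int)strlen(buffer); /* second exit from the scope */
    }
}
```

Compiling this at -O0 and -O1 with LLVM passes disabled and diffing the two outputs is one way to check whether the assumption holds for a given Clang version.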