I used to naively assume that Clang always handed off “basically” the same IR to the LLVM optimisation pipeline regardless of optimisation level. I was at least aware of the optnone attribute set on functions when compiling at -O0, but I’ve slowly started to notice there are more divergences than just that.

Survey

To get a better understanding of exactly which decisions in Clang depend on optimisation level, I surveyed the IR emission code paths. I examined the Clang source at commit 7c4c72b52038810a8997938a2b3485363cd6be3a (2024-08).

I ignored decisions specific to specialised languages and extensions (Objective-C, ARC, HLSL, OpenMP), as well as ABI details.

Example

If you’d like to explore the differences yourself, take a look at this Compiler Explorer example. The input source is not too interesting (I’ve grabbed a random slice of Git source files that I happened to have on hand). The left IR view shows -O0 and the right IR view shows -O1 with LLVM passes disabled. We can ask Clang to produce LLVM IR without sending it through the LLVM optimisation pipeline by adding -Xclang -disable-llvm-passes (a useful tip for LLVM archaeology).

Compiler Explorer playground comparing O0 and O1 LLVM IR

After diffing the two outputs, most of the differences in this example appear to come from two features that are only activated when optimisation is enabled:

  • Lifetime markers
  • Type-based alias analysis (TBAA) metadata

Lifetime markers are especially interesting in this example: Clang actually reshapes control flow, adding several additional cleanup blocks, so that it can insert the markers, which are calls to the LLVM intrinsic functions llvm.lifetime.start and llvm.lifetime.end.
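For a smaller reproduction than the Git sources, something like the following should be enough to see both features. This is a minimal sketch: the file name and both functions are hypothetical, made up for illustration. Compiling it twice with the commands in the header comment and diffing the results, I'd expect the -O1 output to contain llvm.lifetime.start/end calls around the block-scoped local and !tbaa metadata on the loads and stores, with neither present at -O0. The exact IR will vary between Clang versions.

```c
/* Hypothetical example file: lifetime_tbaa.c
 *
 * Emit unoptimised IR at each level, then diff the two files:
 *   clang -O0 -S -emit-llvm lifetime_tbaa.c -o O0.ll
 *   clang -O1 -Xclang -disable-llvm-passes -S -emit-llvm lifetime_tbaa.c -o O1.ll
 */

/* `digit` lives only inside the inner block, so this is the kind of
 * variable whose alloca gets bracketed by lifetime markers when
 * optimisation is enabled. */
int sum_digits(const char *s) {
    int total = 0;
    for (; *s; ++s) {
        if (*s >= '0' && *s <= '9') {
            int digit = *s - '0';
            total += digit;
        }
    }
    return total;
}

/* Accesses through `int *` and `const float *` have different types,
 * so the loads and stores here are candidates for distinct TBAA tags. */
void scale(int *counts, const float *factor, int n) {
    for (int i = 0; i < n; ++i)
        counts[i] = (int)(counts[i] * *factor);
}
```

Searching the two .ll files for "lifetime" and "!tbaa" is a quick way to confirm which of the differences come from these two features.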