Phase 1: Build information
Before we can even think about recompiling C++ code, let alone patching machine code, we need to be able to answer a fundamental question: when a set of files that are part of the build (be it headers or translation units) is touched, which files do we actually have to recompile?
Remember that one of the goals we set initially was that the tool should not care about the project setup, the build setup, or the toolchain used for compiling the code. So, the question is: where do we start with this?
Debug information
If your initial idea was to look at the debug information generated for a build, you would be off to a good start. In addition to storing data about symbols, lines, and variables, which are mostly meant to be ingested by debuggers, debug information often also contains a lot of other useful bits.
For our purposes, we need to be able to answer the following obvious questions:
- Which translation units are part of the build?
- Which compiler options were used for compiling each translation unit? Different translation units will use different options.
- Which toolchain was used to compile each translation unit? A build might link in static libraries which were built with e.g. different versions of the MSVC compiler, for various reasons.
- Which header files are included by each translation unit? Changing something in a header file should trigger recompilation of all translation units that make use of this file in one way or another.
There is one more question we need to answer, which I think is non-obvious:
- Which translation unit contributed which symbols to the final build?
We will tackle this one much later, but I'll give you a hint for now: think about inlining and how inlined functions can produce different assembly code in different translation units, yet the linker ultimately has to choose only one (or, in some of the more general COMDAT cases, report an error). This is detailed in the PE/COFF specification if you want to take a look at the available options.
For those who want to follow along at home during the next sections, we are going to use Dia2Dump from Microsoft, which uses the DIA SDK for dumping PDB files.
Here's a simple .bat script onto which you can drop any .pdb file and have it produce a .pdb.dump.txt in the same directory, assuming Dia2Dump.exe is available in that directory as well:
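rem Switch to the directory of this script, even if it lives on a different drive.
cd /d "%~dp0"
rem Quote the dropped path so that .pdb paths containing spaces work.
Dia2Dump.exe -all "%~1" > "%~1.dump.txt"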
Warning: the -all option dumps everything, which will easily produce files larger than 10GB if you try to use it on large .pdb files.
Additionally, grab the latest Live++ examples, open Examples_x64\build\VS2022\Examples.sln, and build the first example named 01_HotReload with the Debug configuration.
Now take Examples_x64\bin\x64\01_HotReload_Debug.pdb, drop it onto our .bat script, and open 01_HotReload_Debug.pdb.dump.txt in a text editor of your choice.
Modules
The first thing you will see at the top of the file is the list of modules. Modules in DIA parlance are actually a mix of two things: translation units that were linked into the corresponding .exe/.dll, and any .dll that is loaded at process start because the executable links against the corresponding import library.
Looking at the dump file, you should see something like the following:
0001 D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\Cube.obj
0002 D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\FireDemo.obj
0003 D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\FrameBuffer.obj
0004 D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\Input.obj
0005 D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\MainHotReload.obj
0006 D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\MainLoop.obj
0007 D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\Matrix4x4.obj
0008 D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\Mutex.obj
0009 D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\Palette.obj
000A D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\PalettizedFrameBuffer.obj
000B D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\Timestamp.obj
000C D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\Window.obj
000D D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\XorShift.obj
These are all translation units which are part of the Live++ example itself. After that, you'll see the following:
000E KERNEL32.dll
000F USER32.dll
0010 GDI32.dll
These are .dlls used by the Live++ example, because it (implicitly) links against the corresponding import libraries.
The next part will probably come as a bit of a surprise to you:
0011 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\gshandler.obj
0012 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\amdsecgs.obj
0013 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\gs_cookie.obj
0014 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\fltused.obj
0015 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\new_array.obj
0016 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\delete_array.obj
0017 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\new_scalar.obj
0018 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\delete_scalar_size.obj
0019 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\exe_main.obj
001A D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\gs_report.obj
001B D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\delete_scalar.obj
001C D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\throw_bad_alloc.obj
001D D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\utility.obj
001E D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\gs_support.obj
001F D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\matherr.obj
0020 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\argv_mode.obj
0021 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\commit_mode.obj
0022 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\file_mode.obj
0023 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\new_mode.obj
0024 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\thread_locale.obj
0025 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\tncleanup.obj
0026 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\env_mode.obj
0027 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\invalid_parameter_handler.obj
0028 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\denormal_control.obj
0029 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\default_local_stdio_options.obj
002A D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\matherr_detection.obj
002B D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\dyn_tls_init.obj
002C D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\dyn_tls_dtor.obj
002D D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\utility_desktop.obj
002E D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\initsect.obj
002F D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\initializers.obj
0030 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\guard_support.obj
0031 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\std_type_info_static.obj
0032 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\cpu_disp.obj
0033 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\ucrt_detection.obj
0034 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\guard_dispatch.obj
0035 D:\a\_work\1\s\Intermediate\crt\vcstartup\build\xmd\msvcrt_xmd_kernel32.vcxproj\objd\amd64\guard_xfg_dispatch.obj
These .obj files are "compiler scaffolding" required by things your code explicitly or implicitly uses, and often depend on the actual compiler options used. Examples are "Enable Security Check (/GS)", things like locales, support for thread-local variables, and so on.
The paths you see are the actual directories these object files were originally compiled in. They simply became part of a static library (.lib), which is really nothing more than an archive of .obj files, so whatever debug information they were compiled with naturally becomes part of the static library as well.
Following in the dump is this:
0036 VCRUNTIME140_1D.dll
This means that our build uses the dynamic CRT (/MDd), and not the static CRT (/MTd). With the latter, we would have seen the actual .obj files that make up the static CRT as well. Go ahead and try it if you want; you will see files such as the following:
004B d:\os\obj\amd64fre\minkernel\crts\ucrt\src\appcrt\dll\xmt\..\..\tran\amd64\xmt\objfre\amd64\fmodf.obj
004C d:\os\obj\amd64fre\minkernel\crts\ucrt\src\appcrt\dll\xmt\..\..\heap\xmt\objfre\amd64\free.obj
004D d:\os\obj\amd64fre\minkernel\crts\ucrt\src\appcrt\dll\xmt\..\..\heap\xmt\objfre\amd64\malloc.obj
004E d:\os\obj\amd64fre\minkernel\crts\ucrt\src\appcrt\dll\xmt\..\..\stdio\xmt\objfre\amd64\_file.obj
004F d:\os\obj\amd64fre\minkernel\crts\ucrt\src\appcrt\dll\xmt\..\..\stdio\xmt\objfre\amd64\fclose.obj
0050 d:\os\obj\amd64fre\minkernel\crts\ucrt\src\appcrt\dll\xmt\..\..\stdio\xmt\objfre\amd64\fopen.obj
0051 d:\os\obj\amd64fre\minkernel\crts\ucrt\src\appcrt\dll\xmt\..\..\stdio\xmt\objfre\amd64\fread.obj
0052 d:\os\obj\amd64fre\minkernel\crts\ucrt\src\appcrt\dll\xmt\..\..\stdio\xmt\objfre\amd64\fseek.obj
0053 d:\os\obj\amd64fre\minkernel\crts\ucrt\src\appcrt\dll\xmt\..\..\stdio\xmt\objfre\amd64\ftell.obj
0054 d:\os\obj\amd64fre\minkernel\crts\ucrt\src\appcrt\dll\xmt\..\..\stdio\xmt\objfre\amd64\output.obj
0055 d:\os\obj\amd64fre\minkernel\crts\ucrt\src\appcrt\dll\xmt\..\..\tran\xmt\objfre\amd64\cosf.obj
0056 d:\os\obj\amd64fre\minkernel\crts\ucrt\src\appcrt\dll\xmt\..\..\tran\xmt\objfre\amd64\sinf.obj
0057 d:\os\obj\amd64fre\minkernel\crts\ucrt\src\appcrt\dll\xmt\..\..\stdlib\xmt\objfre\amd64\abs.obj
...
The last module in the list is a hardcoded module which contains information about the linking step used for producing the .exe:
0043 * Linker *
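To make the three kinds of modules a bit more tangible, here is a minimal sketch that classifies a module purely by its name as it appears in the dump. The names ModuleKind and classifyModule are made up for this example, and going by the file extension is an assumption that merely fits the output shown above. It is not necessarily how Live++ or DIA tell modules apart.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical classification of the module names seen in the dump.
enum class ModuleKind { ObjectFile, ImportedDll, Linker, Unknown };

// Case-insensitive check whether 'text' ends with 'suffix'.
static bool endsWithNoCase(const std::string& text, const std::string& suffix)
{
    if (suffix.size() > text.size())
        return false;
    return std::equal(suffix.rbegin(), suffix.rend(), text.rbegin(),
        [](char a, char b)
        {
            return std::tolower(static_cast<unsigned char>(a)) ==
                   std::tolower(static_cast<unsigned char>(b));
        });
}

// Assumption: the name alone is enough to tell the module kinds apart.
ModuleKind classifyModule(const std::string& name)
{
    if (name == "* Linker *")
        return ModuleKind::Linker;        // the hardcoded linker module
    if (endsWithNoCase(name, ".obj"))
        return ModuleKind::ObjectFile;    // a linked translation unit
    if (endsWithNoCase(name, ".dll"))
        return ModuleKind::ImportedDll;   // pulled in via an import library
    return ModuleKind::Unknown;
}
```

Only modules of kind ObjectFile are candidates for recompilation. The others are still useful for telling us what the executable depends on.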
Publics
The next section in the dump file will be public symbols. This section contains all symbols having external linkage. Although we are not interested in them today, we will come back to them in a future post.
Per-module information
For each module that became part of an executable file, there will be a separate stream in the .pdb that contains additional information. The first module you should see in the dump file is the following:
** Module: D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\Cube.obj
CompilandEnv : obj = "D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\Cube.obj"
CompilandDetails:
Language: C++
Target processor: x64
Compiled for edit and continue: no
Compiled without debugging info: no
Compiled with LTCG: no
Compiled with /bzalign: no
Managed code present: no
Compiled with /GS: yes
Compiled with /sdl: no
Compiled with /hotpatch: yes
Converted by CVTCIL: no
MSIL module: no
Frontend Version: Major = 19, Minor = 43, Build = 34808, QFE = 0
Backend Version: Major = 19, Minor = 43, Build = 34808, QFE = 0
Version string: Microsoft (R) Optimizing Compiler
We are now looking at what DIA calls the compiland details - a compiland in DIA parlance is simply a translation unit. As can be seen, each compiland knows which language it was compiled from, for which target, with which version of a compiler, along with the compiler's version string and a few extra bits for certain compiler options, such as /hotpatch.
While Live++ is not interested in most of these details, it does use them to check for managed code, link-time code generation, and the target machine, and will output an error if any of those don't match our requirements.
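A check along those lines could be sketched as follows. The struct and function names are hypothetical, and the concrete rules (no managed code, no LTCG, matching target) are my assumption of what such requirements might look like. They are not taken from the Live++ sources.

```cpp
#include <string>
#include <vector>

// Hypothetical mirror of a few fields from the compiland details in the dump.
struct CompilandDetails
{
    std::string targetProcessor;   // e.g. "x64"
    bool managedCodePresent;
    bool compiledWithLTCG;
};

// Returns a list of problems; an empty list means the compiland passed all checks.
// The rules below are assumptions made for illustration.
std::vector<std::string> validateCompiland(const CompilandDetails& details,
                                           const std::string& expectedTarget)
{
    std::vector<std::string> problems;
    if (details.managedCodePresent)
        problems.push_back("managed code is present");
    if (details.compiledWithLTCG)
        problems.push_back("compiled with LTCG");
    if (details.targetProcessor != expectedTarget)
        problems.push_back("unexpected target processor: " + details.targetProcessor);
    return problems;
}
```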
One other thing worth mentioning here is the following:
** Module: D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\Cube.obj
CompilandEnv : obj = "D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\Cube.obj"
We already know the path of the module, so why does the compiland environment contain seemingly the same path? While it may look like redundant information, it's actually not. It's just a coincidence both paths are the same in our example.
There's a crucial difference between those two paths:
- A module path stores the path to files that were linked into the executable. Those paths can be either absolute, or relative, which often happens when linking object files from static libraries.
- The compiland environment path stores the path to the file that was compiled. Those paths can point to remote paths (when using distributed build systems such as FASTBuild), or temporary files (e.g. for build systems that compile to .obj.tmp and then move the file to .obj).
A bit further down in the same module information dump you should find the following, skipping over all UsingNamespace, Function, Data and CallSite bits:
CompilandEnv : cwd = "D:\QA\LPP_2_11_1\Examples_x64\build\VS2022"
CompilandEnv : cl = "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.43.34808\bin\HostX64\x64\CL.exe"
CompilandEnv : src = "..\..\src\Cube.cpp"
CompilandEnv : pdb = "D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\vc143.pdb"
CompilandEnv : cmd = "-c -ID:\QA\LPP_2_11_1 -Zi -nologo -W3 -WX- -diagnostics:column -MP -Od -Gm- -EHs -EHc -MDd -GS -fp:precise -Zc:wchar_t -Zc:forScope -Zc:inline -external:W3 -Gd -TP -FC -errorreport:prompt -I"C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.43.34808\include" -I"C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.43.34808\atlmfc\include" -I"C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\VS\include" -I"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\ucrt" -I"C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\VS\UnitTest\include" -I"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\um" -I"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\shared" -I"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\winrt" -I"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\cppwinrt" -I"C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8.1\Include\um" -external:I"C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.43.34808\include" -external:I"C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.43.34808\atlmfc\include" -external:I"C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\VS\include" -external:I"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\ucrt" -external:I"C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\VS\UnitTest\include" -external:I"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\um" -external:I"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\shared" -external:I"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\winrt" -external:I"C:\Program Files (x86)\Windows Kits\10\Include\10.0.22621.0\cppwinrt" -external:I"C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8.1\Include\um" -X"
This is exactly the kind of information we need. Breaking it down, the compiland environment also stores the following pieces of information:
- 'cwd': The working directory used for invoking the compiler.
- 'cl': The path to the compiler executable used for compiling the corresponding compiland.
- 'src': An absolute or relative path to the source .cpp file that makes up this compiland.
- 'pdb': An absolute or relative path to the intermediate compiler .pdb that was built during compilation.
- 'cmd': The command-line used for invoking the compiler executable.
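Since 'src' and 'pdb' may be relative, they only become meaningful in combination with 'cwd'. Here is a minimal sketch of that step, using a purely lexical resolution of Windows-style paths. The function names are invented for this example, only drive-letter paths are handled (no UNC paths), and nothing is checked against the actual filesystem.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true for drive-letter paths such as "D:\foo". UNC paths are not handled.
static bool isAbsoluteDrivePath(const std::string& path)
{
    return path.size() >= 3 && path[1] == ':' && (path[2] == '\\' || path[2] == '/');
}

// Joins 'cwd' and 'path' if 'path' is relative, then collapses "." and ".." lexically.
std::string resolveAgainstCwd(const std::string& cwd, const std::string& path)
{
    const std::string full = isAbsoluteDrivePath(path) ? path : cwd + "\\" + path;

    std::vector<std::string> parts;
    std::string current;
    for (std::size_t i = 0; i <= full.size(); ++i)
    {
        const bool atEnd = (i == full.size());
        if (atEnd || full[i] == '\\' || full[i] == '/')
        {
            if (current == "..")
            {
                // Go up one directory, but never remove the drive itself.
                if (parts.size() > 1)
                    parts.pop_back();
            }
            else if (!current.empty() && current != ".")
            {
                parts.push_back(current);
            }
            current.clear();
        }
        else
        {
            current += full[i];
        }
    }

    std::string result;
    for (std::size_t i = 0; i < parts.size(); ++i)
    {
        if (i != 0)
            result += '\\';
        result += parts[i];
    }
    return result;
}
```

Fed with the 'cwd' and 'src' entries of Cube.obj from above, this yields D:\QA\LPP_2_11_1\Examples_x64\src\Cube.cpp.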
Since there seems to be confusion surrounding these vc*.pdb files in general, this warrants a quick explanation: If your build uses the "Program Database (/Zi)" compiler option (which our example does), the compiler will produce an intermediate .pdb file, which by default is named vc*.pdb. This .pdb file does not correspond to the final .pdb produced by the linker, and should therefore be treated like a build artefact. Never commit this file to version control or ship it to customers.
If you change the Debug Information Format to "C7 compatible (/Z7)", the compiler will no longer produce such an intermediate file, and the corresponding 'pdb' entry in the compiland environment will hold an empty path.
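To get from the raw 'cmd' string to individual options, it first needs to be split into arguments. The following is a rough sketch that honors double quotes and then looks for the debug information format. Both function names are made up, backslash-escaped quotes are deliberately ignored, and letting the last matching option win is an assumption about how conflicting options are treated.

```cpp
#include <string>
#include <vector>

// Splits a command line at whitespace. Text inside double quotes stays part of
// the current argument, and the quotes themselves are dropped.
// Simplification: backslash-escaped quotes are not handled.
std::vector<std::string> splitCommandLine(const std::string& cmd)
{
    std::vector<std::string> args;
    std::string current;
    bool inQuotes = false;
    bool hasToken = false;
    for (const char c : cmd)
    {
        if (c == '"')
        {
            inQuotes = !inQuotes;
            hasToken = true;
        }
        else if ((c == ' ' || c == '\t') && !inQuotes)
        {
            if (hasToken)
            {
                args.push_back(current);
                current.clear();
                hasToken = false;
            }
        }
        else
        {
            current += c;
            hasToken = true;
        }
    }
    if (hasToken)
        args.push_back(current);
    return args;
}

enum class DebugFormat { None, C7, ProgramDatabase };

// Looks for -Z7, -Zi and -ZI (or their '/' forms). Assumption: the last one wins.
DebugFormat debugFormat(const std::vector<std::string>& args)
{
    DebugFormat format = DebugFormat::None;
    for (const std::string& arg : args)
    {
        if (arg.size() != 3 || (arg[0] != '-' && arg[0] != '/'))
            continue;
        const std::string option = arg.substr(1);
        if (option == "Z7")
            format = DebugFormat::C7;
        else if (option == "Zi" || option == "ZI")
            format = DebugFormat::ProgramDatabase;
    }
    return format;
}
```

For the command line of Cube.obj shown above, debugFormat reports ProgramDatabase, which matches the vc143.pdb stored in the 'pdb' entry.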
Now, going back to our dump file, scroll down until you find the following, very last module in our list:
** Module: * Linker *
As mentioned above, this is a hardcoded module which is part of every .pdb. Since it is stored as a per-module stream in the .pdb internally, it also stores a similar compiland environment:
CompilandEnv : cwd = "D:\QA\LPP_2_11_1\Examples_x64\build\VS2022"
CompilandEnv : exe = "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Tools\MSVC\14.43.34808\bin\HostX64\x64\link.exe"
CompilandEnv : pdb = "D:\QA\LPP_2_11_1\Examples_x64\bin\x64\01_HotReload_Debug.pdb"
CompilandEnv : cmd = " /ERRORREPORT:PROMPT /OUT:D:\QA\LPP_2_11_1\Examples_x64\build\VS2022\..\..\bin\x64\01_HotReload_Debug.exe /INCREMENTAL:NO /NOLOGO /FUNCTIONPADMIN /MANIFEST "/MANIFESTUAC:level='asInvoker' uiAccess='false'" /manifest:embed /DEBUG:FULL /PDB:D:\QA\LPP_2_11_1\Examples_x64\build\VS2022\..\..\bin\x64\01_HotReload_Debug.pdb /SUBSYSTEM:CONSOLE /OPT:NOREF /OPT:NOICF /TLBID:1 /DYNAMICBASE /NXCOMPAT /IMPLIB:D:\QA\LPP_2_11_1\Examples_x64\build\VS2022\..\..\bin\x64\01_HotReload_Debug.lib /MACHINE:X64"
- 'cwd': The working directory used for invoking the linker.
- 'exe': The path to the linker executable used for linking the corresponding executable.
- 'pdb': An absolute or relative path to the final linker .pdb that was built during linking.
- 'cmd': The command-line used for invoking the linker executable.
While it might not be immediately obvious what Live++ would use this information for, it will become clearer in a future post. For now, we can check whether the linker command-line contains all required options, and does not contain options Live++ is not compatible with.
Still, one interesting thing to look at for now is what's called a CoffGroup in DIA parlance:
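Such a command-line check could be sketched like this. The names are invented for the example, the matching is a plain case-insensitive substring search, and the lists of required and unsupported options are left to the caller because the real requirements are not spelled out here.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

static std::string toUpper(std::string text)
{
    std::transform(text.begin(), text.end(), text.begin(),
        [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return text;
}

// Case-insensitive substring search. Simplification: no real tokenization.
bool containsOption(const std::string& cmd, const std::string& option)
{
    return toUpper(cmd).find(toUpper(option)) != std::string::npos;
}

// Hypothetical result type: which required options are missing,
// and which unsupported ones were found.
struct OptionCheck
{
    std::vector<std::string> missing;
    std::vector<std::string> unsupported;
};

OptionCheck checkLinkerOptions(const std::string& cmd,
                               const std::vector<std::string>& required,
                               const std::vector<std::string>& forbidden)
{
    OptionCheck result;
    for (const std::string& option : required)
    {
        if (!containsOption(cmd, option))
            result.missing.push_back(option);
    }
    for (const std::string& option : forbidden)
    {
        if (containsOption(cmd, option))
            result.unsupported.push_back(option);
    }
    return result;
}
```

As a placeholder example, passing { "/FUNCTIONPADMIN", "/OPT:NOREF" } as required and { "/LTCG" } as forbidden would report no problems for the linker command-line above.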
CoffGroup : [0x0001:0x00000000] 0x00001000, len = 00000060, characteristics = 60000020, .text$di
CoffGroup : [0x0001:0x00000060] 0x00001060, len = 00007670, characteristics = 60000020, .text$mn
CoffGroup : [0x0001:0x000076d0] 0x000086D0, len = 00000040, characteristics = 60000020, .text$mn$00
CoffGroup : [0x0001:0x00007710] 0x00008710, len = 0000018D, characteristics = 60000020, .text$x
CoffGroup : [0x0002:0x00000000] 0x00009000, len = 00000348, characteristics = C0000040, .idata$5
CoffGroup : [0x0002:0x00000348] 0x00009348, len = 00000038, characteristics = 40000040, .00cfg
CoffGroup : [0x0002:0x00000380] 0x00009380, len = 00000008, characteristics = 40000040, .CRT$XCA
CoffGroup : [0x0002:0x00000388] 0x00009388, len = 00000008, characteristics = 40000040, .CRT$XCAA
CoffGroup : [0x0002:0x00000390] 0x00009390, len = 00000010, characteristics = 40000040, .CRT$XCU
CoffGroup : [0x0002:0x000003a0] 0x000093A0, len = 00000008, characteristics = 40000040, .CRT$XCZ
CoffGroup : [0x0002:0x000003a8] 0x000093A8, len = 00000008, characteristics = 40000040, .CRT$XIA
CoffGroup : [0x0002:0x000003b0] 0x000093B0, len = 00000008, characteristics = 40000040, .CRT$XIAA
CoffGroup : [0x0002:0x000003b8] 0x000093B8, len = 00000008, characteristics = 40000040, .CRT$XIAC
CoffGroup : [0x0002:0x000003c0] 0x000093C0, len = 00000008, characteristics = 40000040, .CRT$XIZ
CoffGroup : [0x0002:0x000003c8] 0x000093C8, len = 00000008, characteristics = 40000040, .CRT$XPA
CoffGroup : [0x0002:0x000003d0] 0x000093D0, len = 00000008, characteristics = 40000040, .CRT$XPZ
CoffGroup : [0x0002:0x000003d8] 0x000093D8, len = 00000008, characteristics = 40000040, .CRT$XTA
CoffGroup : [0x0002:0x000003e0] 0x000093E0, len = 00000010, characteristics = 40000040, .CRT$XTZ
CoffGroup : [0x0002:0x000003e8] 0x000093E8, len = 00000000, characteristics = 40000040, .gehcont$y
CoffGroup : [0x0002:0x000003e8] 0x000093E8, len = 00000000, characteristics = 40000040, .gfids$y
CoffGroup : [0x0002:0x000003f0] 0x000093F0, len = 00000610, characteristics = 40000040, .rdata
CoffGroup : [0x0002:0x00000a00] 0x00009A00, len = 00000080, characteristics = 40000040, .rdata$CastGuardVftablesA
CoffGroup : [0x0002:0x00000a80] 0x00009A80, len = 00000080, characteristics = 40000040, .rdata$CastGuardVftablesC
CoffGroup : [0x0002:0x00000b00] 0x00009B00, len = 000001F4, characteristics = 40000040, .rdata$r
CoffGroup : [0x0002:0x00000cf4] 0x00009CF4, len = 000000DC, characteristics = 40000040, .rdata$voltmd
CoffGroup : [0x0002:0x00000dd0] 0x00009DD0, len = 00000400, characteristics = 40000040, .rdata$zzzdbg
CoffGroup : [0x0002:0x000011d0] 0x0000A1D0, len = 00000008, characteristics = 40000040, .rtc$IAA
CoffGroup : [0x0002:0x000011d8] 0x0000A1D8, len = 00000008, characteristics = 40000040, .rtc$IZZ
CoffGroup : [0x0002:0x000011e0] 0x0000A1E0, len = 00000008, characteristics = 40000040, .rtc$TAA
CoffGroup : [0x0002:0x000011e8] 0x0000A1E8, len = 00000008, characteristics = 40000040, .rtc$TZZ
CoffGroup : [0x0002:0x000011f0] 0x0000A1F0, len = 00000760, characteristics = 40000040, .xdata
CoffGroup : [0x0002:0x00001950] 0x0000A950, len = 000000EC, characteristics = 40000040, .xdata$x
CoffGroup : [0x0002:0x00001a3c] 0x0000AA3C, len = 00000000, characteristics = 40000040, .edata
CoffGroup : [0x0002:0x00001a3c] 0x0000AA3C, len = 00000078, characteristics = C0000040, .idata$2
CoffGroup : [0x0002:0x00001ab4] 0x0000AAB4, len = 00000014, characteristics = C0000040, .idata$3
CoffGroup : [0x0002:0x00001ac8] 0x0000AAC8, len = 00000348, characteristics = C0000040, .idata$4
CoffGroup : [0x0002:0x00001e10] 0x0000AE10, len = 00000796, characteristics = C0000040, .idata$6
CoffGroup : [0x0003:0x00000000] 0x0000C000, len = 000000C8, characteristics = C0000040, .data
CoffGroup : [0x0003:0x000000c8] 0x0000C0C8, len = 00000080, characteristics = C0000040, .data$r
CoffGroup : [0x0003:0x00000148] 0x0000C148, len = 00000028, characteristics = C0000040, .data$rs
CoffGroup : [0x0003:0x00000170] 0x0000C170, len = 00000850, characteristics = C0000080, .bss
CoffGroup : [0x0004:0x00000000] 0x0000D000, len = 000008B8, characteristics = 40000040, .pdata
CoffGroup : [0x0005:0x00000000] 0x0000E000, len = 00000008, characteristics = 40000040, .lpp_hotreload_prepatch_hooks
CoffGroup : [0x0006:0x00000000] 0x0000F000, len = 00000008, characteristics = 40000040, .lpp_hotreload_postpatch_hooks
CoffGroup : [0x0007:0x00000000] 0x00010000, len = 00000060, characteristics = 40000040, .rsrc$01
CoffGroup : [0x0007:0x00000060] 0x00010060, len = 00000180, characteristics = 40000040, .rsrc$02
These are useful for finding the address ranges occupied by the individual sections that make up the executable. As an example, if you ever wondered how Live++ hooks work, the COFF groups shown above already kind of give it away:
CoffGroup : [0x0005:0x00000000] 0x0000E000, len = 00000008, characteristics = 40000040, .lpp_hotreload_prepatch_hooks
CoffGroup : [0x0006:0x00000000] 0x0000F000, len = 00000008, characteristics = 40000040, .lpp_hotreload_postpatch_hooks
When registering a hook in your code via the Live++ C++ API, the underlying macro will put the registered hooks into specially named sections. You can see this happening in the Live++ examples, e.g. in MainLoop.cpp, line 106:
LPP_HOTRELOAD_PREPATCH_HOOK(FunctionCalledBeforePatching);
Expanding the macro, this yields the following:
LPP_HOOK(LPP_HOTRELOAD_PREPATCH_HOOK_SECTION, _function, LPP_NAMESPACE LppHotReloadPrepatchHookId, const wchar_t* const recompiledModulePath, const wchar_t* const* const modifiedFiles, unsigned int modifiedFilesCount, const wchar_t* const* const modifiedClassLayouts, unsigned int modifiedClassLayoutsCount)
where LPP_HOOK is implemented as follows:
#if defined(__clang__)
# define LPP_HOOK(_section, _function, ...) \
extern void (* const LPP_IDENTIFIER(LPP_CONCATENATE(lpp_hook_function, _function)))(__VA_ARGS__) __attribute__((section(_section))); \
extern void (* const LPP_IDENTIFIER(LPP_CONCATENATE(lpp_hook_function, _function)))(__VA_ARGS__) __attribute__((section(_section))) = &_function
#elif defined(_MSC_VER)
# define LPP_HOOK(_section, _function, ...) \
__pragma(section(_section, read)) __declspec(allocate(_section)) extern void (* const LPP_IDENTIFIER(LPP_CONCATENATE(lpp_hook_function, _function)))(__VA_ARGS__) = &_function
#else
# error("Live++: Unknown compiler.");
#endif
Ignoring the LPP_IDENTIFIER and LPP_CONCATENATE bits for now, the macro defines a global function pointer in a specially named section called LPP_HOTRELOAD_PREPATCH_HOOK_SECTION, which is ".lpp_hotreload_prepatch_hooks", the same name we saw in the COFF groups. Note that this works for any type of variable, not just function pointers. As an example, you could put a global integer into your own section named "my_globals" like this:
__pragma(section("my_globals", read)) __declspec(allocate("my_globals")) extern int g_myInt = 10;
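Coming back to the COFF groups for a moment: once a specially named section exists, a tool only has to find its group to know where the data lives. The sketch below parses one CoffGroup line of the text dump and computes how many entries of a given size fit into the group. The names are hypothetical, the line format is assumed to be exactly what Dia2Dump printed above, and a real tool would query DIA directly rather than parse text.

```cpp
#include <cstdio>
#include <string>

// Hypothetical representation of one CoffGroup line from the dump.
struct CoffGroup
{
    unsigned section;
    unsigned offset;
    unsigned rva;
    unsigned length;
    unsigned characteristics;
    std::string name;
};

// Parses a line in the format shown in the dump. Returns false on mismatch.
bool parseCoffGroupLine(const std::string& line, CoffGroup& out)
{
    CoffGroup group = {};
    char name[256] = {};
    const int matched = std::sscanf(line.c_str(),
        " CoffGroup : [%x:%x] %x, len = %x, characteristics = %x, %255s",
        &group.section, &group.offset, &group.rva,
        &group.length, &group.characteristics, name);
    if (matched != 6)
        return false;
    group.name = name;
    out = group;
    return true;
}

// Number of entries of 'entrySize' bytes that fit into the group.
unsigned entryCount(const CoffGroup& group, unsigned entrySize)
{
    return entrySize == 0 ? 0u : group.length / entrySize;
}
```

For the .lpp_hotreload_prepatch_hooks group with a length of 8 bytes and 8-byte pointers on x64, this gives exactly one entry, which would fit the single hook registered in MainLoop.cpp. That the group is laid out as a plain array of function pointers is my reading of the macro, not something the dump states.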
Globals
The next section in the dump file will be global symbols. Don't be misled by the name, as it has nothing to do with C++ globals. The section contains symbols having external linkage as well as some internally generated symbols that also have external linkage, e.g. "`MainLoop::Update'::`1'::dtor$0".
For now, we will simply ignore this section and come back to it in a future post, but do note that these symbols are stored with their undecorated name, in contrast to public symbols, which are stored with their mangled name. Rest assured that this decision made about 30 years ago is going to bite us in a future post.
Files
Following after the global symbols in the dump file will be a section named files. This section, again, stores information per compiland, this time a list of all files that contributed information during compilation. As an example, let's take a look at FrameBuffer.obj:
Compiland = D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\FrameBuffer.obj
D:\QA\LPP_2_11_1\Examples_x64\src\PalettizedFrameBuffer.h (0x3: C4A57A0DE6FE64104EF4A4D3F69CCCBBC45DA1F572F2F393772412BE697C243A)
D:\QA\LPP_2_11_1\Examples_x64\src\FrameBuffer.cpp (0x3: 5F93E31AF8A90F5A44909D40D06D856EAEA3592F3D634C79DB4CD92966EB2341)
D:\QA\LPP_2_11_1\Examples_x64\src\Palette.h (0x3: A0254E47D5B7124B1B9CFBB9068C97BECDFA1582037A369028770646CD0A62F7)
D:\QA\LPP_2_11_1\Examples_x64\src\RGB.h (0x3: 4E786F25A1D2298FDE5A19A1EFBD595830E18CB979CCD46371C1297747ADC9D9)
D:\QA\LPP_2_11_1\Examples_x64\src\FrameBuffer.h (0x3: 8D6D0F113FA308557B36D02E5750F0C5C5686A6BE94672BE7D08CC23448A0FC8)
That is our dependency database for recompiling individual translation units right there. Well, almost, since this maps from .obj going to .h/.cpp, but we need a mapping from .h and .cpp back to .obj. Of course, this data structure is trivial to build with the information stored in this section.
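A minimal sketch of that inversion, assuming the files section has already been read into a map from compiland to its list of files (the type and function names are invented for this example):

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// What the files section gives us: compiland (.obj) -> contributing files.
using FilesPerCompiland = std::map<std::string, std::vector<std::string>>;

// What we need for recompiling: file (.h/.cpp) -> compilands that depend on it.
using CompilandsPerFile = std::map<std::string, std::set<std::string>>;

CompilandsPerFile invertDependencies(const FilesPerCompiland& filesPerCompiland)
{
    CompilandsPerFile result;
    for (const auto& entry : filesPerCompiland)
    {
        for (const std::string& file : entry.second)
        {
            // entry.first is the .obj, 'file' is one of its dependencies.
            result[file].insert(entry.first);
        }
    }
    return result;
}
```

Looking up a modified header in the resulting map then directly yields every .obj that needs to be recompiled. Note that this only works reliably if the paths used as keys have been normalized beforehand.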
Lines
The next section after the files in the dump file will be a section named lines. This contains the mapping from source lines to generated assembly instructions, and is mostly used by debuggers and tools like profilers. It's of no interest to us.
Section contribution
This section follows after the line information and is also the last one we are interested in. For each address (or range of addresses) in the executable, this stores the module that contributed the corresponding symbol (or symbols) to that address range:
*** SECTION CONTRIBUTION
RVA Address Size Module
00001010 0001:00000010 0000001B D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\MainLoop.obj
00001040 0001:00000040 0000001B D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\MainLoop.obj
00001070 0001:00000070 000007BD D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\Cube.obj
00001840 0001:00000840 00000288 D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\FireDemo.obj
00001AD0 0001:00000AD0 00000047 D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\FireDemo.obj
00001B20 0001:00000B20 0000000E D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\FireDemo.obj
00001B40 0001:00000B40 0000000D D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\FireDemo.obj
00001B60 0001:00000B60 000001D4 D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\FrameBuffer.obj
00001D40 0001:00000D40 0000000B D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\FrameBuffer.obj
00001D60 0001:00000D60 0000000F D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\FrameBuffer.obj
00001D80 0001:00000D80 000000A7 D:\QA\LPP_2_11_1\Examples_x64\temp\x64_Debug_01_HotReload\Input.obj
...
As an example, the symbol with a size of 0x0000000F bytes at RVA (Relative Virtual Address) 0x00001D60 was contributed by FrameBuffer.obj. The symbol in question might have been a data or function symbol, so you could consult the list of public symbols to find out exactly which symbol it is:
PublicSymbol: [00001D60][0001:00000D60] ?GetData@PalettizedFrameBuffer@@QEBAPEBEXZ(public: unsigned char const * __cdecl PalettizedFrameBuffer::GetData(void)const )
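Going from an RVA to the contributing module can be sketched as a binary search over the contributions. The struct and function names are made up, and the sketch assumes the entries are sorted by RVA and don't overlap, as they appear in the dump.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical representation of one row of the section contribution table.
struct Contribution
{
    std::uint32_t rva;
    std::uint32_t size;
    std::string module;
};

// 'sorted' must be sorted by ascending RVA and contain no overlapping ranges.
// Returns the contribution covering 'rva', or nullptr if the address lies in a gap.
const Contribution* findContribution(const std::vector<Contribution>& sorted,
                                     std::uint32_t rva)
{
    // Find the first contribution starting after 'rva', then step back by one.
    auto it = std::upper_bound(sorted.begin(), sorted.end(), rva,
        [](std::uint32_t value, const Contribution& c) { return value < c.rva; });
    if (it == sorted.begin())
        return nullptr;
    --it;
    return (rva - it->rva < it->size) ? &*it : nullptr;
}
```

Note that addresses can fall into gaps between contributions (e.g. padding), which is why the function can return nullptr.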
Wrapping up
To recap, let us identify which pieces of information we need (as outlined in the beginning) and how or where we obtain them:
Which translation units are part of the build?
We can extract this from the list of modules, the per-module streams and the compiland environments.
Which compiler options were used for compiling each translation unit?
Which toolchain was used to compile each translation unit?
Similar to the above, these can be extracted from the compiland environment of the per-module streams.
Which header files are included by each translation unit?
We can build a dependency database with the information stored in the files section.
Which translation unit contributed which symbols to the final build?
This can be extracted from the section contributions.
While the above sounds easy in theory, building the actual database of all translation units and their environment is a bit more complicated in practice due to the mix of relative paths, absolute paths, remote paths, compiled translation units, and linked translation units.
Files stored with an absolute path in section A might be stored with a relative path in section B, people use virtual drives and symlinks, and the toolchains involved are usually not consistent, especially when it comes to path casing. They use all lowercase paths in one place and an almost case-correct path in another place.
Internally, Live++ only uses canonicalized paths with correct casing and figures out where .obj, .cpp and .h files actually live on disk when building the database for an executable. This is achieved by trying different combinations of compiler working directory, linker working directory, static library paths, .pdb paths, all of which is performed by a larger C++ function that is the culmination of more than 100 studios throwing their .pdb files at it.
Additionally, Live++ also employs caches that use internal Windows functions like NtQueryDirectoryFile directly, since hammering the filesystem with a few hundred thousand requests is way too slow for large, AAA-scale projects.
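To give a rough idea of what trying different base directories means, here is a heavily simplified sketch. It is not the actual Live++ function. The name locateFile, the order of candidates, and the injected exists predicate are all assumptions made for this example.

```cpp
#include <functional>
#include <string>
#include <vector>

// Tries 'relativePath' against each base directory in order and stores the first
// candidate for which 'exists' returns true. Returns false if none matched.
// 'exists' is injected so that a cached lookup can be used instead of hitting
// the filesystem for every single candidate.
bool locateFile(const std::string& relativePath,
                const std::vector<std::string>& baseDirectories,
                const std::function<bool(const std::string&)>& exists,
                std::string& resolved)
{
    for (const std::string& base : baseDirectories)
    {
        const std::string candidate = base + "\\" + relativePath;
        if (exists(candidate))
        {
            resolved = candidate;
            return true;
        }
    }
    return false;
}
```

The base directories would be things like the compiler working directory, the linker working directory, or the directory of a static library. Passing the existence check in as a function keeps the search logic independent of how (and how fast) the filesystem is queried.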
Outlook
While having .pdb files is great for Windows and Xbox platforms, here's a bit of a riddle for you until next time: what about .elf based platforms? Any idea how to get the same pieces of information there?