Embedding Source Code in PDBs (with Rust!)

October 30, 2021

Microsoft Windows has stellar debugging tools. I've spent a lot of my career debugging gnarly C++ issues with Visual Studio.

Debugging a program requires three things:

  1. .exe / .dll (compiled code) or .dmp (crash dump)
  2. .pdb (debug symbols)
  3. source code

If you have all three you can successfully debug a program line-by-line.

Microsoft symbol server allows a debugger to automatically download .pdb symbols. These can be hosted either publicly or privately. Many companies host an internal symbol server to store symbols for builds released to customers.

Source Indexing embeds commands in .pdb files which download source code from source control.

Symbol Server + Source Indexing is magical. Together they let you debug a customer crash dump from a 6-month-old build, with the correct symbols matched to the correct source code, in seconds. Literally all you do is open the .dmp in Visual Studio and press F5.

Critical Limitations

For 97% of users Symbol Server + Source Indexing is enough. If you're making builds from CI and storing symbols then you have all you'll ever need.

However this workflow fails in two scenarios:

  1. User does not have access to source control
  2. Source code is not present in source control

I readily admit these are niche scenarios. But let's consider them.

For #1, have you ever had to debug a crash on a trade show floor with no internet? (Oh god PTSD!) Do your playtest rooms have network access + credentials for source control? Do artists and designers have access to source code? Did they set it up?

The second case is even more interesting. Source Indexing only works if source code is available in source control. Obviously 🙄. This means that local builds can't be source indexed. If you make a build on your local machine and share it with a teammate via Dropbox they can't access the source used to compile that executable.

Do these limitations matter? Not for most people, most of the time! I've been burnt by both. I would love an effective way to distribute binaries + symbols + code in one concise package.

tldr;

I used Rust to write a tool called fts_pdbsrc. It is a cmdline tool that both embeds source code in PDBs and extracts it.

Unfortunately, I also had to write a tool called fts_pdbsrc_service. This is because Microsoft hates me and wants me to suffer. I was forced to write a small Windows service that watches for source embedded .pdb files. Grumble grumble.

I'm writing this blog for a few reasons.

  1. Maybe this tool will be useful for you!
  2. Documentation for this stuff is poor. This post chronicles my adventure.
  3. Now that I know how source indexing works under the hood I know how to support any source control system in any environment. This is pretty cool and worth sharing.

This project is dual-licensed under MIT and UNLICENSE. `fts_pdbsrc` is about 700 lines of Rust code. `fts_pdbsrc_service` is just 500 lines. The core operation performed is quite simple. I encourage readers to fork and modify this tool in a way that makes sense for your project, build system, and runtime environment.

Source Code: GitHub

fts_pdbsrc command line help

Sample Project - CrashTest

For testing purposes I created a stupid simple C++ project. It looks like this:

// CrashTest.cpp
#include <iostream>

int main() {
    std::cout << "Hello World! =D" << std::endl;

    int* x = nullptr;
    *x += 3;

    std::cout << "Goodbye cruel world! :(" << std::endl;
}

This program crashes immediately due to a null pointer dereference. My goal is to embed CrashTest.cpp into CrashTest.pdb so that the crash can be easily debugged.

How Source Indexing Works

Let's dig into how Source Indexing works. A srcsrv.ini file is embedded within the .pdb. This file has a list of indexed source files and a command that can be used to download them from source control. Here's an example (mostly) from the official docs:

SRCSRV: ini ------------------------------------------------ 
VERSION=1
VERCTRL=<source_control_type_str>
DATETIME=<date_time_str>
SRCSRV: variables ------------------------------------------ 
SRCSRVTRG=%sdtrg% 
SRCSRVCMD=%sdcmd% 
SRCSRVENV=var1=string1\bvar2=string2 
DEPOT=//depot 
SDCMD=sd.exe -p %fnvar%(%var2%) print -o %srcsrvtrg% -q %depot%/%var3%#%var4%
SDTRG=%targ%\%var2%\%fnbksl%(%var3%)\%var4%\%fnfile%(%var1%) 
WIN_SDKTOOLS= sserver.microsoft.com:4444 
SRCSRV: source files --------------------------------------- 
c:\path\to\FooProject\src\foo.cpp*FooProject\src\foo.cpp*foo.cpp
SRCSRV: end ------------------------------------------------

The most important pieces are SRCSRVCMD and SRCSRVTRG. This config says that to download foo.cpp you need to run the command:

sd.exe -p FILE print -o OUTPUTFILE -q DEPOTPATH

The exact value of OUTPUTFILE is the result of recursive macro expansion. It's a hot mess. What's important is this defines a command (SRCSRVCMD), which fetches the source file (foo.cpp), and copies it into a pre-determined location (SRCSRVTRG). Visual Studio can then open the file for debugging.
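To make the recursive expansion less mysterious, here is a minimal sketch of how `%name%` references could be resolved until nothing is left to substitute. It is my own illustration, not the debugger's implementation: the function names `expand` and `expand_once` are made up, the variable values are simplified stand-ins, and helper functions such as `%fnvar%(...)` or `%fnbksl%(...)` are not handled.

```rust
use std::collections::HashMap;

/// Expand `%name%` references using `vars`, repeating until the text stops
/// changing. `max_depth` guards against self-referential definitions.
fn expand(input: &str, vars: &HashMap<String, String>, max_depth: usize) -> String {
    let mut current = input.to_string();
    for _ in 0..max_depth {
        let next = expand_once(&current, vars);
        if next == current {
            break;
        }
        current = next;
    }
    current
}

/// One substitution pass. Names are looked up in lowercase (an assumption of
/// this sketch); unknown names are left in the text untouched.
fn expand_once(input: &str, vars: &HashMap<String, String>) -> String {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find('%') {
        out.push_str(&rest[..start]);
        let after = &rest[start + 1..];
        let known = after
            .find('%')
            .and_then(|end| vars.get(&after[..end].to_ascii_lowercase()).map(|v| (end, v)));
        match known {
            Some((end, value)) => {
                out.push_str(value);
                rest = &after[end + 1..];
            }
            None => {
                // Not a known variable: keep this '%' and rescan after it.
                out.push('%');
                rest = after;
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    // Hypothetical, simplified values loosely modeled on the example above.
    let mut vars = HashMap::new();
    for (k, v) in [
        ("srcsrvcmd", "%sdcmd%"),
        ("sdcmd", "sd.exe print -o %srcsrvtrg% -q %depot%/%var3%"),
        ("srcsrvtrg", r"%targ%\%var2%"),
        ("depot", "//depot"),
        ("targ", r"C:\cache"),
        ("var2", "foo.cpp"),
        ("var3", "src/foo.cpp"),
    ] {
        vars.insert(k.to_string(), v.to_string());
    }
    println!("{}", expand("%SRCSRVCMD%", &vars, 8));
}
```

Each pass replaces one layer of variables, so SRCSRVCMD becomes SDCMD, which in turn pulls in SRCSRVTRG, DEPOT and the per-file vars.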

Microsoft provides some Perl scripts to source index PDBs for common source control platforms such as Perforce and SVN. There are also poorly documented and magical commands to perform HTTP downloads, such as from GitHub.

Understanding PDBs

Documentation on PDBs is pretty crappy. I spent hours reading, googling, and bashing my head against the keyboard trying to figure everything out.

Microsoft provides a tool called pdbstr.exe (docs) which is short for, I assume, pdb stream. This tool can be used to both read and write "streams". A .pdb contains, amongst other things, named streams which contain arbitrary data. The srcsrv stream contains the srcsrv.ini file with all the source indexing information.

Given an already source indexed .pdb you can read the srcsrv stream with the command:

pdbstr -r -p:c:/path/to/foo.pdb -s:srcsrv

Similarly you can write a new stream with:

pdbstr -w -p:c:/path/to/foo.pdb -s:/mystream -i:c:/path/to/mystream.txt

As far as I can tell Microsoft provides no tool for dumping the list of all available streams. Furthermore, the PDB format is undocumented. They only provide the inscrutable project microsoft-pdb which demonstrates by example.

Thankfully the Rust crate pdb provides an easy to use interface for reading, but not writing, .pdb files.

Using this crate I was able to iterate and dump all streams.

// Import used by this snippet
use std::fs::File;

// Load PDB
let pdbfile = File::open(pdb_path)?;
let mut pdb = pdb::PDB::open(pdbfile)?;

// Iterate streams and print each name
let info = pdb.pdb_information()?;
let stream_names = info.stream_names()?;
for stream_name in stream_names.iter() {
  println!("{}", stream_name.name);
}

For my dummy C++ project the default streams were:

/UDTSRCLINEUNDONE
/src/headerblock
/LinkInfo
/TMCache
/names

Microsoft also provides a tool called srctool.exe (docs). srctool will take a .pdb and run the srcsrv specified commands to acquire files. This is the same operation that Visual Studio will perform when attempting to fetch source code for a pdb.

Example usage: srctool c:\path\to\Foo.pdb -x -n

Plan of Attack

Here's my plan:

  1. embed source code in .pdb via new streams
  2. extract source code via srcsrv defined command
  3. Profit! 💰

Embedding Source

This part is relatively easy. Using the same Rust pdb crate we can iterate all files referenced by a PDB.

// Imports used by this snippet
use std::fs::File;
use std::path::Path;

// Load PDB
let pdbfile = File::open(pdb_path)?;
let mut pdb = pdb::PDB::open(pdbfile)?;

// File names are stored as references into the PDB string table
let string_table = pdb.string_table()?;

// Iterate modules
let di = pdb.debug_information()?;
let mut modules = di.modules()?;
while let Some(module) = modules.next()? {
    if let Some(module_info) = pdb.module_info(&module)? {
        // Iterate files
        let line_program = module_info.line_program()?;
        let mut file_iter = line_program.files();
        while let Some(file) = file_iter.next()? {
            // Convert pdb::RawString to std::path::Path
            let raw_filepath = string_table.get(file.name)?;
            let filename_utf8 = std::str::from_utf8(raw_filepath.as_bytes())?;
            let filepath = Path::new(filename_utf8);
            println!("{:?}", filepath);
        }
    }
}

This will print something roughly like this:

"C:\\temp\\cpp\\CrashTest\\CrashTest.cpp"
"C:\\Users\\Forrest\\AppData\\Local\\Temp\\lnk{EDED327F-A1CC-417A-A652-9BF9A38D74DF}.tmp"
"C:\\Program Files (x86)\\Windows Kits\\10\\Include\\10.0.19041.0\\ucrt\\corecrt_wtime.h"
<many other files under Program Files>
"d:\\agent\\_work\\4\\s\\src\\ExternalAPIs\\Windows\\10\\sdk\\inc\\winerror.h"
<many other files under d:\\agent>

In my particular case the only file I really care about is CrashTest.cpp. The d:\\agent files are, I believe, from Microsoft provided static libs linked into my CrashTest.exe? I'm not sure. My computer doesn't have a d:\.
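One way to separate project files from SDK headers is to keep only the paths that sit under a known project root and derive a stream name from the remainder. The sketch below is an assumption about how that could look, not the tool's actual code: `normalize` and `stream_name_for` are hypothetical helpers, and the `/fts_pdbsrc/` prefix and the `C:\temp\cpp` root are picked to match the paths used in this post.

```rust
/// Normalize a Windows path for comparison: forward slashes, lowercase.
/// Both steps keep byte offsets unchanged.
fn normalize(path: &str) -> String {
    path.replace('\\', "/").to_ascii_lowercase()
}

/// If `file` lives under `root`, return a stream name such as
/// `/fts_pdbsrc/CrashTest/CrashTest.cpp`; otherwise return None.
fn stream_name_for(file: &str, root: &str) -> Option<String> {
    let root_norm = normalize(root);
    let root_norm = root_norm.trim_end_matches('/');
    let file_norm = normalize(file);
    // Require a separator right after the root so that C:/temp/cpp
    // does not match C:/temp/cppfoo.
    let rest = file_norm.strip_prefix(root_norm)?;
    if !rest.starts_with('/') {
        return None;
    }
    // Slice the original path (not the lowercased copy) to keep its casing.
    let rel = file[root_norm.len() + 1..].replace('\\', "/");
    Some(format!("/fts_pdbsrc/{}", rel))
}

fn main() {
    let root = r"C:\temp\cpp";
    let files = [
        r"C:\temp\cpp\CrashTest\CrashTest.cpp",
        r"C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\ucrt\corecrt_wtime.h",
    ];
    for file in files.iter() {
        println!("{} -> {:?}", file, stream_name_for(file, root));
    }
}
```

The comparison ignores case and slash direction because the paths recorded in a PDB do not always match what the user types on the command line.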

Now all we need to do is find the source files we care about and add each one to a unique stream in the .pdb. I chose to put them under /fts_pdbsrc/. The pdb Rust crate does not support writing into PDBs. Therefore fts_pdbsrc.exe invokes pdbstr.exe as a subprocess. Here's an example subprocess command:

pdbstr -w -p:c:/temp/pdb/CrashTest.pdb -s:/fts_pdbsrc/CrashTest/CrashTest.cpp -i:c:/projects/cpp/CrashTest/CrashTest.cpp
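Building that subprocess from Rust could look roughly like the following. This is a sketch using only `std::process::Command`; the helper name `pdbstr_write_command` is hypothetical, and it assumes `pdbstr` is reachable on the PATH.

```rust
use std::process::Command;

/// Build (but do not run) a `pdbstr -w` invocation that writes the file at
/// `input_path` into the stream `stream_name` of the PDB at `pdb_path`.
fn pdbstr_write_command(pdb_path: &str, stream_name: &str, input_path: &str) -> Command {
    let mut cmd = Command::new("pdbstr");
    cmd.arg("-w")
        .arg(format!("-p:{}", pdb_path))
        .arg(format!("-s:{}", stream_name))
        .arg(format!("-i:{}", input_path));
    cmd
}

fn main() {
    let cmd = pdbstr_write_command(
        "c:/temp/pdb/CrashTest.pdb",
        "/fts_pdbsrc/CrashTest/CrashTest.cpp",
        "c:/projects/cpp/CrashTest/CrashTest.cpp",
    );
    // Print rather than run: pdbstr.exe may not be installed on this machine.
    // To execute it, make `cmd` mutable, call `cmd.status()`, and check
    // `success()` on the returned status.
    println!("{:?}", cmd);
}
```

Passing each flag as its own argument avoids shell quoting problems when paths contain spaces.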

We can then extract that file with:

pdbstr -r -p:c:/temp/pdb/CrashTest.pdb -s:/fts_pdbsrc/CrashTest/CrashTest.cpp -i:C:\Users\Forrest\AppData\Local\fts\fts_pdbsrc\CrashTest\CrashTest.cpp

Or we can extract it to a specified outfile from Rust via:

// Load PDB
let pdb_file = File::open(pdb_path)?;
let mut pdb = pdb::PDB::open(pdb_file)?;

// Get file stream
let stream_name = "/fts_pdbsrc/CrashTest/CrashTest.cpp";
let file_stream = pdb
    .named_stream(stream_name.as_bytes())
    .expect(&format!("Failed to find stream named [{}]", stream_name));
let file_stream_str: &str = std::str::from_utf8(&file_stream)?;

// Write to outfilepath
let out_filepath = /* srcsrv specified */;
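// parent() returns an Option, so it can't use `?` alongside the Results here
let out_dir = out_filepath.parent().expect("out_filepath has no parent directory");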
fs::create_dir_all(out_dir)?;
let mut file = std::fs::File::create(out_filepath)?;
file.write_all(file_stream_str.as_bytes())?;

Creating a new srcsrv

At this point we know how to embed source code into a .pdb. We also know how to extract it. Now we just need to tell Visual Studio how to get it via srcsrv.

First, we need to embed a new, custom srcsrv.ini. We write this into the pdb with the same command as before:

pdbstr -w -p:c:/path/to/foo.pdb -s:srcsrv -i:c:/path/to/mysrcsrv.ini

Next, we need to generate mysrcsrv.ini. Specifically, we need to inject all the file paths that we care about. Here's what it should look like.

SRCSRV: ini ------------------------------------------------
VERSION=1
VERCTRL=fts_pdbsrc
SRCSRV: variables ------------------------------------------
SRCSRVTRG=%LOCALAPPDATA%\fts\fts_pdbsrc\CrashTest\%var2%
SRCSRVCMD=fts_pdbsrc extract_one --file %var2% --out %SRCSRVTRG%
SRCSRV: source files ------------------------------------------
C:\temp\cpp\CrashTest\CrashTest.cpp*CrashTest\CrashTest.cpp*CrashTest.cpp
SRCSRV: end ------------------------------------------------
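Generating this file is mostly string formatting. Here is a minimal sketch that reproduces the template above for a list of files. The `SourceEntry` struct and `build_srcsrv_ini` function are names I made up for illustration, and the hard-coded CrashTest target directory simply mirrors the example.

```rust
/// One indexed file: the full path recorded in the PDB plus its path
/// relative to the project root (both are inputs assumed by this sketch).
struct SourceEntry {
    full_path: String,
    rel_path: String,
}

/// Build the text of a srcsrv stream following the template above.
fn build_srcsrv_ini(entries: &[SourceEntry]) -> String {
    let mut ini = String::new();
    ini.push_str("SRCSRV: ini ------------------------------------------------\n");
    ini.push_str("VERSION=1\n");
    ini.push_str("VERCTRL=fts_pdbsrc\n");
    ini.push_str("SRCSRV: variables ------------------------------------------\n");
    ini.push_str("SRCSRVTRG=%LOCALAPPDATA%\\fts\\fts_pdbsrc\\CrashTest\\%var2%\n");
    ini.push_str("SRCSRVCMD=fts_pdbsrc extract_one --file %var2% --out %SRCSRVTRG%\n");
    ini.push_str("SRCSRV: source files ---------------------------------------\n");
    for entry in entries {
        // The third field is the bare file name: everything after the last backslash.
        let file_name = entry
            .rel_path
            .rsplit('\\')
            .next()
            .unwrap_or(entry.rel_path.as_str());
        ini.push_str(&format!(
            "{}*{}*{}\n",
            entry.full_path, entry.rel_path, file_name
        ));
    }
    ini.push_str("SRCSRV: end ------------------------------------------------\n");
    ini
}

fn main() {
    let entries = [SourceEntry {
        full_path: r"C:\temp\cpp\CrashTest\CrashTest.cpp".to_string(),
        rel_path: r"CrashTest\CrashTest.cpp".to_string(),
    }];
    print!("{}", build_srcsrv_ini(&entries));
}
```

The output can be saved to a file and passed to pdbstr with -i.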

Each line under "SRCSRV: source files" is split on * into numbered variables:

var1*var2*var3*var4*...*var10

var1 must be the fully qualified file path. The other vars are up to the user. Standard convention appears to be:

FullPath*RelPath*Filename*Extras.

Visual Studio or srctool will then expand SRCSRVCMD into the following command:

fts_pdbsrc extract_one --file FILE --out OUTFILE
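The lookup and substitution can be sketched in a few lines. This is my own approximation of what the debugger does, under two assumptions: the requested path is matched against var1 ignoring ASCII case, and only lowercase `%varN%` spellings are substituted. The helper names are hypothetical.

```rust
/// Split one "source files" line into var1, var2, ... (index 0 is var1).
fn split_vars(line: &str) -> Vec<&str> {
    line.split('*').collect()
}

/// Find the entry whose var1 matches the path the debugger is asking for.
/// Windows paths are case-insensitive, so compare ignoring ASCII case.
fn find_entry<'a>(source_lines: &[&'a str], requested: &str) -> Option<Vec<&'a str>> {
    source_lines
        .iter()
        .map(|&line| split_vars(line))
        .find(|vars| vars[0].eq_ignore_ascii_case(requested))
}

/// Replace %var1%..%varN% in a template with the values from one entry.
fn substitute_vars(template: &str, vars: &[&str]) -> String {
    let mut out = template.to_string();
    for (i, value) in vars.iter().enumerate() {
        out = out.replace(&format!("%var{}%", i + 1), value);
    }
    out
}

fn main() {
    let lines = [r"C:\temp\cpp\CrashTest\CrashTest.cpp*CrashTest\CrashTest.cpp*CrashTest.cpp"];
    let template = "fts_pdbsrc extract_one --file %var2% --out OUTFILE";
    match find_entry(&lines, r"c:\temp\cpp\crashtest\crashtest.cpp") {
        Some(vars) => println!("{}", substitute_vars(template, &vars)),
        None => println!("file is not source indexed"),
    }
}
```

OUTFILE is left as a literal here; in the real stream it is the expanded SRCSRVTRG.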

Voila!

Microsoft Hates Me and Wants Me to Suffer

Unfortunately there's a problem. As previously mentioned, Microsoft hates me.

This almost works as is. But only almost. The problem is that Microsoft designed srcsrv to work with source control servers. We're embedding everything in the pdb. Unfortunately when Visual Studio or srctool.exe invokes SRCSRVCMD we don't have a path to CrashTest.pdb.

There is not, to the best of my knowledge, a variable that represents the fully qualified pdb path. If there was then we'd be done. Pass the pdb_path to fts_pdbsrc and extract the file streams.

Unfortunately this isn't possible. Which means that fts_pdbsrc has no way of finding or accessing the relevant .pdb. :(

An Ugly Workaround

My solution to this problem is fts_pdbsrc_service.exe. It runs as a Windows service, scans specified folders for .pdb files containing a srcsrv stream with VERCTRL=fts_pdbsrc and VERSION=1. Then it runs a filewatcher looking for new or changed pdb files. Visual Studio runs fts_pdbsrc.exe which queries fts_pdbsrc_service via TCP on localhost for the path to the matching .pdb.
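The check for "is this one of my PDBs" boils down to reading two variables out of the srcsrv stream text. Here is a rough sketch of that check, assuming the stream has already been read into a string (for example via the pdb crate's named_stream). The function names are hypothetical.

```rust
/// Return the value of `name` from the lines that precede the
/// "SRCSRV: source files" section of a srcsrv stream.
fn srcsrv_var<'a>(stream_text: &'a str, name: &str) -> Option<&'a str> {
    for line in stream_text.lines() {
        let line = line.trim();
        if line.starts_with("SRCSRV: source files") {
            break; // only the file list follows
        }
        if line.starts_with("SRCSRV:") {
            continue; // section header
        }
        if let Some((key, value)) = line.split_once('=') {
            if key.trim().eq_ignore_ascii_case(name) {
                return Some(value.trim());
            }
        }
    }
    None
}

/// True if the stream was written by fts_pdbsrc (VERCTRL and VERSION match).
fn is_fts_pdbsrc_stream(stream_text: &str) -> bool {
    srcsrv_var(stream_text, "VERCTRL") == Some("fts_pdbsrc")
        && srcsrv_var(stream_text, "VERSION") == Some("1")
}

fn main() {
    let text = "SRCSRV: ini ------------------------------------------------\n\
                VERSION=1\n\
                VERCTRL=fts_pdbsrc\n\
                SRCSRV: variables ------------------------------------------\n\
                SRCSRVCMD=fts_pdbsrc extract_one --file %var2% --out %SRCSRVTRG%\n\
                SRCSRV: source files ---------------------------------------\n\
                SRCSRV: end ------------------------------------------------\n";
    println!("{}", is_fts_pdbsrc_stream(text));
}
```

PDBs indexed by other tools (Perforce, SVN, and so on) carry a different VERCTRL value and are skipped.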

"Matching PDB" is defined with a UUID. When embedding source, fts_pdbsrc generates a UUID and includes it in the srcsrv stream in a variable called FTS_PDBSRC_UUID. That UUID is passed to fts_pdbsrc as part of SRCSRVCMD. fts_pdbsrc sends a query to fts_pdbsrc_service and passes along the UUID.
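To give a feel for the round trip, here is a toy version of the lookup using only the standard library. The wire format is an assumption made for this sketch (one line with the UUID in, one line with the path out, empty line for "not found") and is not necessarily what fts_pdbsrc_service speaks. The UUID string and function names are likewise placeholders.

```rust
use std::collections::HashMap;
use std::io::{BufRead, BufReader, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Answer `connections` lookups from `index` (UUID -> PDB path), then return.
fn serve(
    listener: TcpListener,
    index: HashMap<String, String>,
    connections: usize,
) -> std::io::Result<()> {
    for stream in listener.incoming().take(connections) {
        let mut stream = stream?;
        let mut uuid = String::new();
        BufReader::new(stream.try_clone()?).read_line(&mut uuid)?;
        // An empty reply line means "no matching PDB".
        let reply = index.get(uuid.trim()).map(String::as_str).unwrap_or("");
        writeln!(stream, "{}", reply)?;
    }
    Ok(())
}

/// Client side: ask the service which PDB carries `uuid`.
fn query_pdb_path(addr: SocketAddr, uuid: &str) -> std::io::Result<Option<String>> {
    let mut stream = TcpStream::connect(addr)?;
    writeln!(stream, "{}", uuid)?;
    let mut reply = String::new();
    BufReader::new(stream).read_line(&mut reply)?;
    let reply = reply.trim();
    Ok(if reply.is_empty() {
        None
    } else {
        Some(reply.to_string())
    })
}

fn main() -> std::io::Result<()> {
    // Port 0 asks the OS for any free port on localhost.
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    let mut index = HashMap::new();
    index.insert(
        "example-uuid".to_string(),
        r"C:\temp\pdb\CrashTest.pdb".to_string(),
    );
    let server = thread::spawn(move || serve(listener, index, 2));

    println!("{:?}", query_pdb_path(addr, "example-uuid")?);
    println!("{:?}", query_pdb_path(addr, "unknown-uuid")?);
    server.join().expect("server thread panicked")?;
    Ok(())
}
```

A real service would keep listening indefinitely and rebuild its index as the file watcher reports new or changed PDBs.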

I am not happy with this solution. My goal was to eliminate the need for a source control server, and I basically wrote a local source control server. :(

Unfortunately I don't know of a better option. If there's a hidden, undocumented variable for the PDB path then fts_pdbsrc_service goes away. Alternatively, if someone at Microsoft would like to do me a solid this could be fixed in Visual Studio 2022. :)

File Encryption

Embedding source files in PDBs is dangerous and super scary. Most projects don't ship pdbs to customers. However many projects have accidentally forgotten to strip PDBs in some release. Accidentally releasing full source code would be catastrophic.

To mitigate this I added support for encrypting files before writing them into the PDB. Disclaimer: I know nothing about cryptography and you should not trust this. The code looks roughly as follows:

// Generate once for all files
let mut rng = rand::thread_rng();
let key_bytes = rng.gen::<[u8; 32]>();
let key = Key::from_slice(&key_bytes); // 256-bits; unique per PDB
let cipher = Aes256Gcm::new(&key);

// Per file
let file_bytes = /* omitted */;
let nonce_bytes = rng.gen::<[u8; 12]>();
let nonce = Nonce::from_slice(&nonce_bytes); // 96-bits; unique per file
let encrypted_text = cipher.encrypt(nonce, file_bytes);

// Write encrypted text into PDB

Decrypting looks like:

// Rebuild the cipher from the saved key
let key_bytes = /* omitted */;
let key = Key::from_slice(&key_bytes);
let cipher = Aes256Gcm::new(&key);

// The nonce is stored as a hex string alongside the encrypted file
let nonce_bytes = hex::decode(nonce_str)?;
let nonce = Nonce::from_slice(&nonce_bytes);

let cipher_text = /* omitted */;
let plain_text = cipher.decrypt(nonce, cipher_text);

This implementation relies entirely on the Rust crates rand and aes-gcm.

Files can be embedded in one of three modes:

  1. No encryption
  2. User provided encryption key
  3. Randomly generated encryption key

If a key is randomly generated then it is printed into the console and must be manually saved out.

Decryption keys are pulled from fts_pdbsrc_config.json. Each key is used to decrypt a file until one succeeds or all fail.
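The "try every key" loop is simple enough to sketch without any crypto crates. In the snippet below `try_decrypt` is a placeholder closure standing in for the real cipher call, and the two-byte keys are toy values rather than real 32-byte keys. `decode_hex` and `decrypt_with_any_key` are hypothetical names.

```rust
/// Decode a hex string (for example a 64-character key) into bytes.
fn decode_hex(s: &str) -> Option<Vec<u8>> {
    if s.len() % 2 != 0 || !s.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    (0..s.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&s[i..i + 2], 16).ok())
        .collect()
}

/// Try each key in order and return the first successful result.
/// `try_decrypt` stands in for the real cipher call and returns None on failure.
fn decrypt_with_any_key<F>(keys: &[Vec<u8>], try_decrypt: F) -> Option<Vec<u8>>
where
    F: Fn(&[u8]) -> Option<Vec<u8>>,
{
    keys.iter().find_map(|key| try_decrypt(key.as_slice()))
}

fn main() {
    // Toy keys as they might appear (hex encoded) in a config file.
    let keys: Vec<Vec<u8>> = ["00ff", "abcd"]
        .iter()
        .filter_map(|s| decode_hex(s))
        .collect();

    // Placeholder "cipher": pretend only the key [0xab, 0xcd] decrypts the file.
    let result = decrypt_with_any_key(&keys, |key| {
        if key == [0xab_u8, 0xcd] {
            Some(b"int main() {}".to_vec())
        } else {
            None
        }
    });
    println!(
        "{:?}",
        result.map(|bytes| String::from_utf8_lossy(&bytes).into_owned())
    );
}
```

With an authenticated cipher such as AES-GCM a wrong key fails the tag check, which is what makes "try until one succeeds" workable.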

Encryption Disclaimer

Full disclosure: I know absolutely nothing about encryption or cryptography. You should probably not use this feature. I implemented it as a learning exercise. Proceed with extreme caution.

Tying it all together

To embed:

  1. Run fts_pdbsrc embed --pdb c:/path/to/foo.pdb --roots c:/path/to/ProjectRoot --encrypt-mode Plaintext
    1. Encrypt with rng key: --encrypt-mode EncryptFromRngKey
    2. Encrypt with explicit key: --encrypt-mode EncryptWithKey(0124567890124567890124567890124567890124567890124567890124567890)

To extract:

  1. Install fts_pdbsrc.exe and fts_pdbsrc_service.exe into your path
  2. Add .pdb search directories to fts_pdbsrc_service_config.json
  3. (Optional) Add decryption keys to fts_pdbsrc_config.json
  4. (Admin) Run fts_pdbsrc.exe install_service once
    1. To uninstall: fts_pdbsrc.exe uninstall_service
  5. Debug with Visual Studio!

Rust is awesome

I wrote both fts_pdbsrc.exe and fts_pdbsrc_service.exe with Rust. This project would have never come together if not for the amazing Rust ecosystem.

Here are the crates that are worth a special shout out.

Plea to Microsoft

Dear Microsoft,

Please provide a mechanism for srcsrv to have a variable that expands into the full path of the .pdb file.

Sincerely,

Forrest

Letter to Rust Community

Curiously, this Rust written tool does not actually work with Rust derived .pdb files. :(

For some reason local project files are not listed within the PDB the same way they are for C++ projects. I'm not sure why.

I do my Rust debugging with VS Code. I'd love to see this work with Rust + VS Code. However I'm not sure if VS Code even supports source indexing?

Future Work

No further work is planned at this time. I'm happy to accept pull requests if anyone fixes bugs or adds broadly useful features.

If Microsoft exposes the .pdb path then this project could be super cool for open source projects. However I expect proprietary projects are more likely to build new tools based on my code rather than use it directly.

I think it would be super cool if both symbols and code were directly embedded in .exe / .dll files. I think it might be technically possible to embed a pdb (with embedded source) into a compiled binary. However I don't know if it's currently possible for Visual Studio to extract symbols during the "Load Symbols" operation. Maybe in Visual Studio 2025? :)

Discussion

I wrote a tool that allows source code to be embedded into .pdb files. This tool can be used by Visual Studio to automatically extract and open source files for debugging. It works and is pretty neat.

For open source projects I think embedding source is a cool idea. Tools like Blender could ship PDBs and with embedded source it would be trivial for users to debug crashes or even mystifying behavior.

For proprietary projects this tool is probably a bad idea. It's neat and does have value in niche cases. But it's much safer and more secure to pull source from an external source control server.

However Microsoft only provides scripts for a few services - Perforce, SVN, Team Foundation Server, and Visual SourceSafe. If you work at a company with a Git or Mercurial monorepo you'll need something custom.

This post + GitHub project should contain all the information needed to write a custom source indexing tool to work with any source control system in any corporate environment.

Thanks for reading.