Debugging a Dynamic Library that Wouldn't Unload

June 6, 2021

I recently ran into an unexpected bug that sent me on a bit of an adventure and had a very surprising conclusion. This is my tale.

Background

Once upon a time I wrote a blog post titled How to Reload Native Plugins in Unity. This is important because my workflow is mildly uncommon. It looks roughly like this.

  1. Launch Unity Editor
  2. Enter 'Play Mode'
  3. Load .dll plugins via LoadLibrary and GetProcAddress
  4. Exit 'Play Mode'
  5. Unload .dll plugins via FreeLibrary
  6. Modify C++ code, recompile code, replace .dll plugins, goto Step 2

The important thing here is that dynamic libraries are being loaded, unloaded, modified, and loaded again. Everything happens within the Unity.exe process which is never closed. Restarting the Unity editor is slow and cumbersome. This workflow allows me to rapidly iterate on new C++ plugins without having to restart the Unity editor between runs.
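For readers who haven't seen this pattern, here is a minimal sketch of one load, call, unload cycle (steps 3 to 5). It is not the code from my plugin reloader. The helper name runPluginOnce and the void() export signature are assumptions made up for illustration, and the non-Windows branch only exists so the snippet compiles elsewhere.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

// Assumed signature of the plugin's exported entry point (hypothetical).
using PluginEntryFn = void (*)();

// One load / call / unload cycle (steps 3 to 5 above).
// Returns true only if the export ran and the module is gone afterwards.
bool runPluginOnce(const char* dllName, const char* exportName) {
#ifdef _WIN32
    HMODULE module = LoadLibraryA(dllName);  // loads, or bumps the refcount
    if (module == nullptr) {
        std::printf("LoadLibraryA failed, error %lu\n", GetLastError());
        return false;
    }

    auto entry = reinterpret_cast<PluginEntryFn>(GetProcAddress(module, exportName));
    if (entry != nullptr) {
        entry();
    }

    FreeLibrary(module);  // drops our reference; unloads only at refcount zero

    // GetModuleHandleA does not change the refcount. A non-null result here
    // means something else still holds a reference to the module.
    bool stillLoaded = GetModuleHandleA(dllName) != nullptr;
    if (stillLoaded) {
        std::printf("%s is still loaded after FreeLibrary\n", dllName);
    }
    return entry != nullptr && !stillLoaded;
#else
    (void)dllName;
    (void)exportName;
    return false;  // this sketch only covers Windows
#endif
}
```

The check after FreeLibrary is the interesting part: a successful FreeLibrary call only means one reference was dropped, not that the module actually left memory.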

I've been using this workflow for almost two years. My Unity native plugin reloader has proven to be effective and reliable.

A Wild Bug Appears

One of my users started to observe weird program behavior. Impossible behavior even. They seemed to have some stale state sticking around between runs. That shouldn't be possible!

My workflow builds the world "On Enter Play" and tears the world down "On Exit Play". This is a trade-off. One really nice benefit of this architecture is no stale state. By cleaning everything and fully unloading the .dll I can guarantee there is zero stale state between runs. Every time I click "Play" in the editor I'm guaranteed to start clean.

Or so I thought.

Down the Rabbit Hole

After a little debugging it became very clear that there was indeed stale state. But how? My code avoids globals like the plague. (All globals are evil.) So even if some global or static snuck into a naughty .dll it shouldn't matter since the whole thing gets unloaded.

It turns out the .dll was not being unloaded! I attached the Visual Studio debugger and the modules window made it plain as day that Foo.dll was staying in memory between runs.

Loaded modules in Unity

What's weird is that my project loads multiple custom native plugins, and only Foo.dll was staying loaded between runs. My other plugins (Bar.dll, Baz.dll, etc.) were all successfully unloaded. Out of several plugins one, and only one, was failing to unload. Something was uniquely wrong with Foo.dll.

LoadLibrary is a reference counted operation. Calling LoadLibrary will load the library if necessary, otherwise it increments the existing refcount. Similarly, FreeLibrary decrements the refcount and unloads on zero. I added a few logs and confirmed that every call to LoadLibrary had a matching FreeLibrary.

Next I altered my API in Foo.dll to do nothing and return immediately. Lo and behold, Foo.dll unloads! This implies that something inside Foo.dll is bumping its refcount, causing it to not unload. But what?

WinDbg to the Rescue

Thanks to a friend I learned a new trick. Using WinDbg you can run the command bm *GetModuleHandle* to set a breakpoint on every function matching the pattern.

Hitting this breakpoint reveals the mystery.

Crash Callstack in WinDbg

Let's break this down.

  1. std::thread constructor causes GetModuleHandleExW to be invoked.
  2. rcx register contains 0x04. This corresponds to GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS.
  3. rdx contains 0x00007ffc9e1d1145. This corresponds to some function inside Foo.dll.
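To make the rcx value less magic, here is a small sketch that decodes the dwFlags argument of GetModuleHandleEx. The three constants use the values from the Windows SDK headers but are renamed so the snippet compiles without windows.h. The helper names describeFlags and takesReference are made up for illustration.

```cpp
#include <cstdint>
#include <string>

// Values as defined in the Windows SDK headers, renamed so this snippet
// compiles without <windows.h>.
constexpr std::uint32_t kFlagPin               = 0x1;  // GET_MODULE_HANDLE_EX_FLAG_PIN
constexpr std::uint32_t kFlagUnchangedRefcount = 0x2;  // GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT
constexpr std::uint32_t kFlagFromAddress       = 0x4;  // GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS

// Turns a raw dwFlags value (what rcx held at the breakpoint) into flag names.
std::string describeFlags(std::uint32_t flags) {
    std::string out;
    auto append = [&out](const char* name) {
        if (!out.empty()) out += " | ";
        out += name;
    };
    if (flags & kFlagPin) append("PIN");
    if (flags & kFlagUnchangedRefcount) append("UNCHANGED_REFCOUNT");
    if (flags & kFlagFromAddress) append("FROM_ADDRESS");
    return out.empty() ? "none" : out;
}

// Per the GetModuleHandleEx documentation, the call takes a reference on the
// module unless UNCHANGED_REFCOUNT is passed. (PIN goes further and keeps the
// module loaded until the process exits.)
bool takesReference(std::uint32_t flags) {
    return (flags & kFlagUnchangedRefcount) == 0;
}
```

With the value from my callstack, describeFlags(0x04) yields "FROM_ADDRESS" and takesReference(0x04) is true.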

The first line of documentation for GetModuleHandleExW reads:

Retrieves a module handle for the specified module and increments the module's reference count

🎉 Tada! 🎉

The std::thread constructor calls _beginthreadex which calls create_thread_parameter which calls GetModuleHandleExW. The call passes the flag GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS, so the module is looked up from an address inside Foo.dll, and because GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT is not set the module refcount is incremented. The matching decrement is a call to FreeLibraryAndExitThread, which the CRT makes on its thread-exit path (_endthreadex) once the thread function returns.
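The bookkeeping is easier to see in a toy model. The sketch below is not the Windows loader, and every name in it is hypothetical. It only replays the sequence of increments and decrements described above.

```cpp
#include <map>
#include <string>

// Toy model of loader refcounts. Not the Windows loader; names are hypothetical.
class ModuleTable {
public:
    // Models LoadLibrary, or GetModuleHandleEx without UNCHANGED_REFCOUNT.
    void addRef(const std::string& name) { ++counts_[name]; }

    // Models FreeLibrary / FreeLibraryAndExitThread.
    // Returns true if this call dropped the last reference (module unloads).
    bool release(const std::string& name) {
        auto it = counts_.find(name);
        if (it == counts_.end()) return false;
        if (--it->second > 0) return false;
        counts_.erase(it);
        return true;
    }

    bool isLoaded(const std::string& name) const { return counts_.count(name) != 0; }

private:
    std::map<std::string, int> counts_;
};

// Replays the sequence from this post and reports whether Foo.dll is still
// loaded right after the host calls FreeLibrary.
bool stillLoadedAfterHostFree(bool threadStillRunning) {
    ModuleTable modules;
    modules.addRef("Foo.dll");       // host: LoadLibrary            -> 1
    modules.addRef("Foo.dll");       // Foo.dll: std::thread created -> 2
    if (!threadStillRunning) {
        modules.release("Foo.dll");  // thread function returned     -> 1
    }
    modules.release("Foo.dll");      // host: FreeLibrary
    return modules.isLoaded("Foo.dll");
}
```

In this model stillLoadedAfterHostFree(true) is true and stillLoadedAfterHostFree(false) is false, which matches what I saw: every LoadLibrary had a matching FreeLibrary, yet the module stayed put while a thread was alive.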

The root cause of my "module won't unload" bug is a failure to properly clean up all background threads. Fixing my sloppy shutdown code allowed Foo.dll to actually unload. Success!
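My actual shutdown code isn't worth showing, but the shape of the fix is roughly the following. This is a minimal sketch with a hypothetical BackgroundWorker class, assuming the plugin exposes some shutdown function that the host calls before FreeLibrary.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical background worker owned by the plugin.
class BackgroundWorker {
public:
    void start() {
        if (thread_.joinable()) return;  // already running
        stopRequested_ = false;
        thread_ = std::thread([this] {
            while (!stopRequested_) {
                // ... one slice of expensive work goes here ...
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        });
    }

    // Call from the plugin's shutdown function, before the host calls FreeLibrary.
    void stop() {
        stopRequested_ = true;
        if (thread_.joinable()) {
            thread_.join();  // blocks until the thread function has finished
        }
    }

    bool isRunning() const { return thread_.joinable(); }

    ~BackgroundWorker() { stop(); }

private:
    std::atomic<bool> stopRequested_{false};
    std::thread thread_;
};
```

Once stop() returns the thread has exited, so the reference it took on the module should already be released and the host's FreeLibrary can bring the refcount to zero.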

To Crash or Not to Crash

At this point my mystery is solved. However, you should be asking yourself some questions.

Imagine for a second that you load some .dll and call a module function that spins up a background thread to perform some expensive operation. Then you call FreeLibrary. What happens?

  1. Program explodes catastrophically.
  2. Program continues to function.

I expected #1, program explodes. The thread is executing instructions, those instructions are unloaded from memory, kaboom. 💥

I observed #2. std::thread increments the module refcount which prevents the module from unloading. This surprised me and everybody I talked to. Maybe this behavior is obvious to you. It certainly wasn't to me!

But wait, there's more

Let's consider another scenario. Imagine a std::thread that starts out in a "safe" function outside the module and later calls a module function, and the module gets unloaded while that call is in flight. What happens?

Here's a minimal example to find out.

// Foo.cpp compiled into Foo.dll
#include <chrono>
#include <iostream>
#include <thread>

extern "C" {
  __declspec(dllexport) void ExpensiveFunc() {
    std::cout << "begin expensive operation" << std::endl;
    std::this_thread::sleep_for(std::chrono::milliseconds(500));
    std::cout << "end expensive operation" << std::endl;
  }

  // Not actually C. Simplified for blog.
  __declspec(dllexport) std::thread ExpensiveFuncAsync() {
    // std::thread is constructed by code inside Foo.dll: bumps refcount of Foo.dll
    return std::thread([]() {
      ExpensiveFunc();
    });
  }
}

// main.cpp compiled into main.exe
#include <windows.h>
#include <chrono>
#include <thread>

int main() {
  using VoidFn = void(*)();
  using ThreadFn = std::thread(*)();

  // Works
  {
    auto module = LoadLibraryA("Foo.dll"); // Foo.dll refcount = 1
    ThreadFn expensiveFuncAsyncFn = (ThreadFn)GetProcAddress(module, "ExpensiveFuncAsync");
    std::thread works = expensiveFuncAsyncFn(); // Foo.dll refcount = 2
    FreeLibrary(module); // Foo.dll refcount = 1
    works.join(); // Foo.dll refcount = 0; unloads
  }

  // Crashes
  {
    auto module = LoadLibraryA("Foo.dll"); // Foo.dll refcount = 1
    VoidFn expensiveFuncFn = (VoidFn)GetProcAddress(module, "ExpensiveFunc");

    // std::thread is constructed by code inside main.exe and calls the lambda,
    // which does NOT bump refcount of Foo.dll
    std::thread crashes = std::thread([expensiveFuncFn]() {
      expensiveFuncFn();
    });

    // Give the thread time to get inside ExpensiveFunc before unloading
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    FreeLibrary(module); // Foo.dll refcount = 0; unloads
    crashes.join(); // kaboom! Thread is still inside Foo.dll: access violation executing location
  }
}

There are two functions in Foo.dll – void ExpensiveFunc() and std::thread ExpensiveFuncAsync(). The first performs some expensive operation. The second creates a new thread which performs some expensive operation.

There are two blocks of code inside main.cpp. The first loads Foo.dll, calls std::thread ExpensiveFuncAsync(), frees Foo.dll, and joins the thread. This block of code does NOT crash because Foo.dll's refcount gets bumped when ExpensiveFuncAsync constructs a new std::thread.

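The second block constructs a std::thread inside main.exe which then calls void ExpensiveFunc() in Foo.dll. Nothing bumps Foo.dll's refcount, so FreeLibrary unloads it while the thread is still running its code. This version explodes catastrophically.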

What this means is that if you are naughty and "leak" a std::thread then your program MIGHT crash, or it might not. And it doesn't depend on what code is executing. It depends on what code created the std::thread.

I personally think MSVC's STL behavior here is highly questionable. Bumping the module refcount from std::thread is super extremely non-obvious. No one on my team expected this behavior. It didn't even protect me from a bug. It merely swept my sloppy bug under the rug and made it difficult to discover. I would rather my program explode the moment I call FreeLibrary. That would have been both obvious and trivial to fix.

Windows vs Linux

This entire blog post was written in the context of Windows compiling with Visual Studio 2019. I do not know if other operating systems with other STL implementations have the same behavior. If any reader would like to test and let me know then I'll happily update this post.

Conclusion

Debugging this particular issue was a bit of an adventure. My blog post title gave away the fact that a module wasn't unloading. It actually took a bit of time to make that discovery. I was so surprised by the fact that std::thread bumps the module refcount that I felt it worthy of a blog post.

Thanks for reading.