Writing a Custom D3D12 Renderer for Doom 2016 Level Geometry

July 30, 2024 - Armin Jahanpanah

This is a small write-up of my latest personal project to implement a custom DirectX 12 rendering framework in C++ with the goal of rendering level geometry from the game Doom (2016).

I have always been interested in examining how commercial games are built - both with respect to programming techniques being used as well as with regards to design and graphics aspects. This project seemed like a good combination of getting more familiar with DX12, examining some aspects of Doom's rendering code (and associated file formats) and having something nice to display on screen as well.

The resulting demo application showcases rendering of geometry data consisting of ~5 mio triangles and is running in 4K resolution at 60 Hz on a low-end Core i5-4460 3.2 GHz with NVidia 980GTX.

More specifically, the goals of the project were to

implement a simple D3D12 rendering interface and get more familiar with the new concepts introduced by modern rendering APIs (like Command lists, PSOs, explicit resource synchronization, etc.)
understand the Doom geometry file format and associated data structures and build a robust loader for them
create a minimal framework to abstract platform/system-dependent code and file I/O

The following sections will highlight some of the associated implementation aspects in more detail.

Data Formats and Loading

Doom stores level geometry in the so-called .bmodel file format, and most levels are roughly 200-300 MB in size. Some information about the file format can be found on the web [2], however, additional reverse engineering was done to improve data structures and implement a more robust loader.

An older version of the format was already used to some extent in Doom 3 (BFG). This most likely also explains why some parts of the data are stored in big endian data format, as the BFG edition was ported in particular to the Xbox 360 and PS3, both of which use a PowerPC CPU.

The geometry data file is split into several sections (or surfaces) consisting of:

a small header (number of vertices/indices, surface/material names, etc.),
the vertex/index data, and
its encompassing bounding box (min/max).

The introductory level of DOOM 2016 consists of roughly 2000 sections.

The vertex format uses a packed 32-bit format for normals, where each component (X,Y,Z) is encoded into 8 bits. Index data uses 16-bit unsigned indices.


    struct idVertex
    {
        float    position[3];
        float    texcoord[2];
        uint8_t  normal[4];
        uint8_t  tangent[4];
        uint8_t  color[4];
        float    texcoord2[2];
        uint8_t  unk0[4];
        uint8_t  unk1[4];
    };

While implementing the geometry loading code, it quickly became apparent that care has to be taken regarding file I/O operations to ensure optimal performance.
In this context, a major aspect is to avoid doing many small read operations directly from disk, and instead either load the complete file into a temporary RAM buffer (which can be discarded after uploading geometry data to GPU VRAM) or to use memory-mapped files.
A naive implementation that does individual reads for each value (and performing the necessary endian conversion) can easily result in loading times in the order of 15 minutes for one level. In contrast, the memory-based parsing approach of the data as described above merely takes seconds.

Rendering and Performance

In the demo application, all geometry is uploaded to a single Vertex-/Indexbuffer in GPU VRAM using a D3D12 default heap. This means all vertex/index data of the individual geometry sections need to be merged into one big vertex/index buffer (indices were adjusted accordingly).
All uploads are done at startup, before demo/rendering begins - as a consequence, the use of an D3D12 upload queue (with synchronization) can be avoided.

Interestingly, when we take a moment to review our rendering data, we can identify several aspects we can take advantage of to improve performance:

all geometry has been merged into one buffer
all vertex data is already specified in world-space, individual matrix transformations of geometry sections are not needed
we are using the same shader for all geometry

Based on these conditions, it turns out we can actually render the whole level in one drawcall which is a best-case scenario when aiming for brute-force GPU throughput.

While this is nice from a performance point of view in the context of a demo, for a more complete rendering implementation the following points would require more attention:

Level geometry usually consists of several thousand sections, each of them potentially associated with a different material/shader
(as mentioned, the introductory level consists of ~2000 sections, for example).
Dimensions of the levels are quite large, the implemented demo uses a depth/Z range of 100-65000 to display all geometry without far-plane clipping

As a consequence, extending the demo to support individual shaders per section would mean that many separate GPU state changes and drawcalls need to be performed, which can result in significant CPU/driver overhead. In this case, most likely some form of shader sorting and visibility culling would be necessary to reduce the amount of drawcalls to be processed per frame. Another way to address these issues is to use so-called ubershaders or a deferred rendering pipeline to reduce the number of required shader/state switches.

PSOs and Shaders

The demo implements three different render modes which can be selected via an on-screen menu:

simple Lambert shading with fixed directional light source
debug visualization of surface normal vectors
wireframe rendering (to showcase the visual complexity of geometry)

These render modes utilize simple Pipeline State Objects (PSOs, see [3]) and corresponding HLSL shaders. PSOs are a new feature of D3D12 that bundles GPU rendering state for the rasterizer stage, blending, depth stencil operations, and the primitive topology type of the submitted geometry as well as the shaders that will be used for rendering. This matches actual hardware architecture of modern GPUs more closely than previously available APIs and allows for more efficient data processing. However, one needs to be aware that the number of required PSOs can quickly increase if more fine-grained control is needed.

To integrate these shaders into the main C++ code they were compiled offline using the dxc shader compiler tool [4] to generate header files that can be included from C++ directly.

User Interface

To be able to display status information (e.g. number of vertices and triangles, framerate, ...) and allow the user to modify certain parameters of the demo Omar Cornut's Dear ImGUI library [5] was integrated.

The UI is very minimal and consists mainly of simple widgets like labels, sliders, list boxes and a color picker.

Framework

To ensure code modularity, the project uses a layered approach where the main application code is built on top of a framework of several reusable components.

The main application code is responsible for the high-level rendering logic and resource managment as well as running the main loop of the application.

The framework provides the following components, split into two main modules Core and Graphics:

The Core layer provides abstractions for platform/system-dependent code and file I/O
Graphics: DX12 graphics wrapper, ImGUI and model data structures and loading code

The relationship between the individual components is illustrated by the following diagram:

Compared to previous versions of Direct3D, a lot more detailed work is required by the application developer to setup the rendering pipeline and be able to execute draw commands. To make this work a little easier a DX12 graphics wrapper was implemented which helps to simplify common tasks like

Device creation as well as setup of swapchain and render targets (incl. depth-stencil)
DescriptorHeap management
handling of CommandLists and Queues (for both graphics and compute operations)
Fence-based synchronization

Conclusion

Working on this project was quite rewarding, as I not only gained a deeper understanding of current GPU paradigms due to hands-on usage of D3D12 and modern rendering API concepts, but as I also was able to peek a bit under the hood of a (somewhat) recent game engine.
Examining the data structures and formats of these engines, as well as aspects like storage and submission of geometry, aids in understanding important data layout and processing aspects. This knowledge helps in improving one's own code and designing better data structures and rendering abstractions in the future.

The demo application showcases the underlying concepts in practice and allows the user to get a impression of the geometric detail and complexity involved in the creation of modern games.

Two main features that could be added in the future to improve the current implementation are a proper material system/shader handling (incl. sorting/batching) and some form of visibility culling.

References

[1] id Software, Doom (2016), https://en.wikipedia.org/wiki/Doom_(2016_video_game)
[2] chronokun, Making sense of DOOM's .bmodel file format
[3] Microsoft, Managing Graphics Pipeline State in Direct3D 12
[4] https://github.com/microsoft/DirectXShaderCompiler
[5] Omar Cornut, Dear ImGUI, https://github.com/ocornut/imgui