PE Headers for Malware Analysts: From File Structure to Suspicious Indicators

Introduction

This blog post documents my understanding of the headers in a PE-COFF file: how they are structured, how they can be extracted, and why they matter for malware research and reverse engineering.

In preparation for this post, and to demonstrate software development proficiency as well as a decent understanding of the PE-COFF file headers, I developed a C++ project named PEDetect, which parses the relevant headers from a given executable.
PEDetect is a self-made attempt at mirroring some of the features of well-known industry tools. The aim is to understand how these tools work and, more importantly, how the PE-COFF and other headers in an executable are structured, what information they contain, and how that information can be extracted and correlated.

Throughout the description of the PE file structure in this first blog post, the HxDSetup executable is used to guide the reader through the different headers and structures. Because this project is approached from the perspective of a malware reverse engineer, we will also highlight the relevance of certain headers and fields for malware research.

Understanding the PE-COFF File Structure

PEDetect reads the input file starting at offset 0. It obtains its initial information from the DOS Header and builds on it to locate and parse each subsequent structure: the DOS Stub, the RICH Header, the COFF Header, the Optional Header, the Data Directories and the section headers.

Application Name | SHA256
HxDSetup.exe | DCCFA4B16AA79E273CC7FFC35493C495A7FD09F92A4B790F2DC41C65F64D5378

DOS Header

The DOS Header is the first header in a PE file and is 64 bytes long. It starts with the well-known magic bytes MZ (0x4D 0x5A), which read as the little-endian WORD 0x5A4D. The header structure itself is defined in the winnt.h header file as the IMAGE_DOS_HEADER struct.


typedef struct _IMAGE_DOS_HEADER {      
    WORD   e_magic;                 // Magic number (MZ)
    WORD   e_cblp;                  // Bytes on last page of file
    WORD   e_cp;                    // Pages in file
    WORD   e_crlc;                  // Relocations
    WORD   e_cparhdr;               // Size of header in paragraphs
    WORD   e_minalloc;              // Minimum extra paragraphs needed
    WORD   e_maxalloc;              // Maximum extra paragraphs needed
    WORD   e_ss;                    // Initial (relative) SS value
    WORD   e_sp;                    // Initial SP value
    WORD   e_csum;                  // Checksum
    WORD   e_ip;                    // Initial IP value
    WORD   e_cs;                    // Initial (relative) CS value
    WORD   e_lfarlc;                // File address of relocation table
    WORD   e_ovno;                  // Overlay number
    WORD   e_res[4];                // Reserved words
    WORD   e_oemid;                 // OEM identifier (for e_oeminfo)
    WORD   e_oeminfo;               // OEM information; e_oemid specific
    WORD   e_res2[10];              // Reserved words
    LONG   e_lfanew;                // File address of the new exe header (PE signature)
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

Often the most important field in the DOS header is e_lfanew. This field contains the file offset of the PE signature, also called the start of the NT Headers. At that offset, the loader expects PE\0\0, followed by the COFF/File Header and the Optional Header.
The image below depicts the output of PEDetect, which displays the size of the header and the location of the COFF header.

DOS header
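
The walk from the DOS Header to the PE signature can be sketched as follows. This is a minimal illustration that assumes the whole file has already been read into a byte buffer; the helper names (read_u16, read_u32, find_pe_signature) are hypothetical and are not taken from PEDetect.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Little-endian readers; callers must bounds-check before calling.
static std::uint16_t read_u16(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}
static std::uint32_t read_u32(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint32_t>(b[off]) |
           (static_cast<std::uint32_t>(b[off + 1]) << 8) |
           (static_cast<std::uint32_t>(b[off + 2]) << 16) |
           (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

// Returns the file offset of the "PE\0\0" signature, or nullopt if the
// DOS header or the signature is missing or out of bounds.
std::optional<std::uint32_t> find_pe_signature(const std::vector<std::uint8_t>& file) {
    constexpr std::size_t kDosHeaderSize = 64;
    constexpr std::size_t kLfanewOffset = 0x3C;  // position of e_lfanew
    if (file.size() < kDosHeaderSize) return std::nullopt;
    if (read_u16(file, 0) != 0x5A4D) return std::nullopt;  // "MZ"
    // e_lfanew is a signed LONG; read as unsigned, a negative value
    // becomes huge and is rejected by the bounds check below.
    const std::uint32_t lfanew = read_u32(file, kLfanewOffset);
    if (lfanew > file.size() || file.size() - lfanew < 4) return std::nullopt;
    if (read_u32(file, lfanew) != 0x00004550) return std::nullopt;  // "PE\0\0"
    return lfanew;
}
```

The bounds checks matter more than the happy path here: malformed samples frequently carry an e_lfanew that points outside the file.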


DOS Stub

The DOS Stub is located directly after the DOS Header. It is a small 16-bit MS-DOS program, native to Intel 8086 processors, that prints a default error message along the lines of This program cannot be run in DOS mode or This program must be run under Win32 when the executable is started in MS-DOS. Because the stub is real machine code, you can extract its bytes and load them into a disassembler such as Ghidra or IDA as 16-bit code. Once loaded, you will notice that it contains only a few simple instructions: it obtains the address of the error message, prints it using a DOS interrupt call (INT 21h), and exits with an error code.

DOS Stub
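
To extract the stub bytes for a disassembler, one option is to copy everything between the end of the 64-byte DOS Header and the offset stored in e_lfanew. The sketch below assumes the file is already in memory; extract_dos_stub is a hypothetical name. Note that on files built with the Microsoft toolchain this range also contains the RICH header, so the result is an upper bound on the stub rather than its exact extent.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

static std::uint32_t read_u32(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint32_t>(b[off]) |
           (static_cast<std::uint32_t>(b[off + 1]) << 8) |
           (static_cast<std::uint32_t>(b[off + 2]) << 16) |
           (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

// Returns the bytes in [64, e_lfanew), or an empty vector if the DOS
// header is missing or e_lfanew is implausible.
std::vector<std::uint8_t> extract_dos_stub(const std::vector<std::uint8_t>& file) {
    constexpr std::size_t kDosHeaderSize = 64;
    if (file.size() < kDosHeaderSize || file[0] != 'M' || file[1] != 'Z') return {};
    const std::uint32_t lfanew = read_u32(file, 0x3C);
    if (lfanew < kDosHeaderSize || lfanew > file.size()) return {};
    return std::vector<std::uint8_t>(
        file.begin() + static_cast<std::ptrdiff_t>(kDosHeaderSize),
        file.begin() + static_cast<std::ptrdiff_t>(lfanew));
}
```

The returned bytes can be written to a file and loaded as a raw 16-bit x86 binary.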


Relevance for Malware Research

Relatively recent research on adversarial attacks against machine-learning-based malware detectors demonstrated how PE-format manipulations can be used for evasion. The three attacks, named Full DOS, Extend and Shift, inject an adversarial payload by, respectively, manipulating the DOS header, extending it, and shifting the content of the first section. The research shows that these attacks outperform earlier ones in both white-box and black-box scenarios, achieve a better trade-off between evasion rate and the size of the injected payload, and also evade models that had been shown to be robust to previous attacks.

RICH Header

Executables built with the Microsoft Visual Studio toolset have a populated RICH header. The RICH header is officially undocumented; however, over the years researchers have been able to 'decode' the individual items in this header. You will notice that in the executable we have used so far, the RICH header area is fully nulled. A missing, zeroed, or invalid Rich Header may suggest the binary was not produced by the Microsoft linker, but the header can also be stripped, altered, packed, or intentionally corrupted. Treat it as a clue, not proof.

The RICH header consists of a chunk of XOR-ed data, followed by a signature and a 32-bit checksum which simultaneously acts as the XOR key. Let's start with the most straightforward part: the signature is a 4-byte value containing the ASCII string Rich (bytes 0x52 0x69 0x63 0x68).

The last part of the RICH Header is a 32-bit checksum. This checksum is also the XOR key needed to decode the data before the signature. Once decoded, the data starts with a signature containing the string DanS, which is likely to be followed by padding in the form of zeroed DWORD values. The remainder consists of pairs of DWORDs: the first DWORD of each pair packs a tool (product) identifier together with the build number of that tool, and the second holds the number of times the tool was used. PEDetect reads the XORed data blob into a fixed 88-byte array and parses the signature to determine whether a proper RICH header blob has been read. It then skips over the padding and parses every 8 bytes, corresponding to the DWORD pairs, to decode the build number, tool identifier and count.

RICH Header
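
The decoding steps described above can be sketched as follows. This is not PEDetect's implementation (which uses a fixed-size array); it is a hypothetical variant that searches for the Rich marker on 4-byte boundaries after the DOS Header, assumes exactly three padding DWORDs after DanS, and does not verify the checksum. The names RichEntry and decode_rich_header are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

static std::uint32_t read_u32(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint32_t>(b[off]) |
           (static_cast<std::uint32_t>(b[off + 1]) << 8) |
           (static_cast<std::uint32_t>(b[off + 2]) << 16) |
           (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

struct RichEntry {
    std::uint16_t product_id;  // high 16 bits of the first DWORD
    std::uint16_t build;       // low 16 bits of the first DWORD
    std::uint32_t count;       // second DWORD
};

// pe_offset is the value of e_lfanew. Returns an empty vector when no
// well-formed Rich header is found.
std::vector<RichEntry> decode_rich_header(const std::vector<std::uint8_t>& file,
                                          std::size_t pe_offset) {
    std::vector<RichEntry> entries;
    constexpr std::uint32_t kRich = 0x68636952;  // "Rich", little-endian
    constexpr std::uint32_t kDanS = 0x536E6144;  // "DanS", little-endian
    if (pe_offset > file.size() || pe_offset < 0x48) return entries;

    // Locate the plaintext "Rich" marker; the XOR key follows it.
    std::size_t rich = 0;
    for (std::size_t off = 0x40; off + 8 <= pe_offset; off += 4) {
        if (read_u32(file, off) == kRich) { rich = off; break; }
    }
    if (rich == 0) return entries;
    const std::uint32_t key = read_u32(file, rich + 4);

    // Walk backwards, decoding DWORDs until the "DanS" marker appears.
    std::size_t dans = 0;
    for (std::size_t off = rich; off >= 0x44; off -= 4) {
        if ((read_u32(file, off - 4) ^ key) == kDanS) { dans = off - 4; break; }
    }
    if (dans == 0) return entries;

    // Skip DanS plus three padding DWORDs (assumption), then read pairs.
    for (std::size_t off = dans + 16; off + 8 <= rich; off += 8) {
        const std::uint32_t comp_id = read_u32(file, off) ^ key;
        const std::uint32_t count = read_u32(file, off + 4) ^ key;
        entries.push_back({static_cast<std::uint16_t>(comp_id >> 16),
                           static_cast<std::uint16_t>(comp_id & 0xFFFF), count});
    }
    return entries;
}
```

Mapping product_id values to tool names requires a lookup table compiled by researchers, which is deliberately left out of this sketch.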


Relevance for Malware Research

ESET has published Rich Header research that explicitly discusses malware families such as Dridex and Industroyer and shows how Rich Header features can support clustering, hunting, and anomaly detection. It also makes the important point that Rich Headers are useful but not absolute proof of authorship.
A Rich Header can support "this was built in a similar environment" or "this resembles a known cluster", but it should not be treated as direct attribution proof.

PE Header

Now we get to the PE Header. We previously obtained the file offset of this header from the e_lfanew field in the DOS Header. The PE Header starts with the PE signature, which marks the start of the NT headers. The value is always \x50\x45\x00\x00, which is the ASCII string PE\0\0.

PE/COFF Header


COFF Header

The COFF Header, also commonly referred to as the PE File Header or IMAGE_FILE_HEADER, contains seven fields.

Offset | Size | Field | Description
0 | 2 | Machine | The number identifying the type of target machine
2 | 2 | NumberOfSections | The number of sections, i.e. the number of entries in the section table
4 | 4 | TimeDateStamp | The number of seconds since 1 January 1970 00:00 UTC, indicating when the file was created
8 | 4 | PointerToSymbolTable | The file offset of the COFF symbol table
12 | 4 | NumberOfSymbols | The number of entries in the symbol table
16 | 2 | SizeOfOptionalHeader | The size of the optional header in bytes
18 | 2 | Characteristics | The flags that indicate the attributes of the file

Using PEDetect, we extract all of these values and display them accordingly. A more detailed explanation of all fields will be given below.

PE/COFF Header
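
Reading the seven fields follows directly from the offsets in the table. The sketch below assumes the file is in memory and that pe_offset is the value of e_lfanew; the struct and function names are hypothetical rather than PEDetect's own.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

static std::uint16_t read_u16(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}
static std::uint32_t read_u32(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint32_t>(b[off]) |
           (static_cast<std::uint32_t>(b[off + 1]) << 8) |
           (static_cast<std::uint32_t>(b[off + 2]) << 16) |
           (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

struct CoffHeader {
    std::uint16_t machine;
    std::uint16_t number_of_sections;
    std::uint32_t time_date_stamp;
    std::uint32_t pointer_to_symbol_table;
    std::uint32_t number_of_symbols;
    std::uint16_t size_of_optional_header;
    std::uint16_t characteristics;
};

// The 20-byte COFF header starts right after the 4-byte PE signature.
std::optional<CoffHeader> parse_coff_header(const std::vector<std::uint8_t>& file,
                                            std::size_t pe_offset) {
    constexpr std::size_t kNeeded = 4 + 20;  // signature + COFF header
    if (file.size() < kNeeded || pe_offset > file.size() - kNeeded) return std::nullopt;
    const std::size_t base = pe_offset + 4;
    CoffHeader h{};
    h.machine = read_u16(file, base + 0);
    h.number_of_sections = read_u16(file, base + 2);
    h.time_date_stamp = read_u32(file, base + 4);
    h.pointer_to_symbol_table = read_u32(file, base + 8);
    h.number_of_symbols = read_u32(file, base + 12);
    h.size_of_optional_header = read_u16(file, base + 16);
    h.characteristics = read_u16(file, base + 18);
    return h;
}
```

Field-by-field reads are used instead of casting the buffer to a struct, which avoids alignment and endianness assumptions about the analysis host.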


Relevance for Malware Research

Timestamps are one of the artifacts used to support attribution. However, a timestamp should be treated as a clue, not truth. Compare the COFF timestamp against the Rich Header, debug directory, certificate timestamp, resource timestamps, and campaign context. The COFF timestamp is only one of several PE locations that may carry time information, and malware authors and packers can easily tamper with it.

Machine

The first field is 2 bytes in size and its value represents the target architecture for which the executable was built. Microsoft defines a fixed set of values, for example:

Value (hex) | Architecture
0x014C | Intel 386
0x01C0 | ARM
0xAA64 | ARM64
0x8664 | x64 (AMD64)
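
A small lookup covering the values listed above might look like this. It is a sketch: machine_name is a hypothetical helper, and the table only contains the four values from this post, not the full list Microsoft defines.

```cpp
#include <cstdint>
#include <string_view>

// Maps a COFF Machine value to a readable name; anything not listed in
// this post is reported as "unknown".
constexpr std::string_view machine_name(std::uint16_t machine) {
    switch (machine) {
        case 0x014C: return "Intel 386";
        case 0x01C0: return "ARM";
        case 0xAA64: return "ARM64";
        case 0x8664: return "x64 (AMD64)";
        default:     return "unknown";
    }
}
```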

NumberOfSections

The second field, again 2 bytes in size, holds the number of sections in the PE file. More precisely, it counts the section headers, that is, the entries in the section table, rather than the section contents themselves. The sections, which we will dive into later, are the different parts of the file that contain the code, data or other resources the executable requires.

TimeDateStamp

The third field, 4 bytes in size, holds a timestamp indicating when the file was built, stored as the number of seconds since 1 January 1970 00:00 UTC. This can be used in forensics and malware reverse engineering. For example, it allows for speculation about when certain campaigns were developed and can help differentiate between samples and track the development of a specific adversarial campaign. Although it can support timeline analysis, it should not be treated as reliable on its own, because malware authors and packers can tamper with it.
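
Converting the raw value into a readable UTC date is a one-liner with the C time functions. The sketch below uses a hypothetical helper name, format_timestamp_utc, and relies on std::gmtime, which returns a pointer to shared storage and is therefore not thread-safe.

```cpp
#include <cstdint>
#include <ctime>
#include <string>

// Formats a COFF TimeDateStamp (seconds since the Unix epoch) as UTC.
std::string format_timestamp_utc(std::uint32_t time_date_stamp) {
    const std::time_t t = static_cast<std::time_t>(time_date_stamp);
    const std::tm* utc = std::gmtime(&t);
    if (utc == nullptr) return "invalid";
    char buf[32];
    if (std::strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S UTC", utc) == 0) return "invalid";
    return buf;
}
```

For example, format_timestamp_utc(0) yields "1970-01-01 00:00:00 UTC". A value of zero, or one far in the future, is itself worth noting during triage.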

PointerToSymbolTable

The fourth field, 4 bytes in size, contains the file offset of the COFF symbol table. Note that this value is often set to 0x00000000, especially in executables and DLL files. Therefore, we will not discuss it any further at this point.

NumberOfSymbols

The subsequent field, also 4 bytes in size, contains the number of symbols (entries) in the symbol table. As the symbol table is often not present, this value is likely to be 0x00000000 as well.

SizeOfOptionalHeader

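The sixth field, 2 bytes in size, contains the size of the Optional Header in bytes. In normal PE triage, the two values you will most commonly see are 0xE0 (224 bytes) for PE32 and 0xF0 (240 bytes) for PE32+, both of which include the standard 16 data directories. These should not be confused with 0x10B and 0x20B, which are the values of the Optional Header's Magic field for PE32 and PE32+ respectively.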

Characteristics

Finally, we end with the Characteristics field, which is 2 bytes in size. It is a combination of one or more flags that indicate the attributes of the executable. Microsoft documents the full list of flags with their corresponding values and descriptions.
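
Because the field is a bitmask, decoding it means testing each known bit in turn. The sketch below covers only a subset of the IMAGE_FILE_* flags from Microsoft's list; decode_file_characteristics is a hypothetical helper and the shortened names are chosen for readability.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Returns the names of the set flags, in ascending bit order.
// Only a subset of the documented IMAGE_FILE_* flags is listed here.
std::vector<std::string> decode_file_characteristics(std::uint16_t value) {
    struct Flag { std::uint16_t mask; const char* name; };
    static constexpr Flag kFlags[] = {
        {0x0001, "RELOCS_STRIPPED"},
        {0x0002, "EXECUTABLE_IMAGE"},
        {0x0020, "LARGE_ADDRESS_AWARE"},
        {0x0100, "32BIT_MACHINE"},
        {0x0200, "DEBUG_STRIPPED"},
        {0x1000, "SYSTEM"},
        {0x2000, "DLL"},
    };
    std::vector<std::string> names;
    for (const Flag& f : kFlags) {
        if (value & f.mask) names.emplace_back(f.name);
    }
    return names;
}
```

A value of 0x2022, for instance, decodes to EXECUTABLE_IMAGE, LARGE_ADDRESS_AWARE and DLL.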

Optional Header

Despite its name, the optional header is present in every image file and provides information to the loader. As defined by Microsoft, the optional header is only optional in object files. The first step is to validate the optional header magic number and confirm that our earlier assumption about whether the executable is 32-bit or 64-bit is correct.

Depending on the format, PE32 or PE32+, the layout of the Optional Header differs slightly. PE32 files contain the BaseOfData field, which does not exist in PE32+ files. Furthermore, some fields (ImageBase and the stack and heap size fields) are 8 bytes in size in PE32+ instead of 4.

Within the code of PEDetect, we account for this accordingly and have developed both a PE32 and a PE32+ parser. Using PEDetect, the output will display all significant values that can be found in the header.

Optional Header
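
The PE32 versus PE32+ split can be handled by branching on the Magic value and picking the right offset and width for the fields that differ. The sketch below is one possible approach, not PEDetect's parser: it reads only a handful of fields, and the names OptionalCore, read_le and parse_optional_core are hypothetical. The opt argument is the file offset of the Optional Header, i.e. e_lfanew + 24.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Reads a little-endian integer of the given width (1 to 8 bytes).
static std::uint64_t read_le(const std::vector<std::uint8_t>& b, std::size_t off,
                             std::size_t width) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < width; ++i) {
        v |= static_cast<std::uint64_t>(b[off + i]) << (8 * i);
    }
    return v;
}

struct OptionalCore {
    bool is_pe32_plus;
    std::uint32_t entry_point_rva;
    std::uint64_t image_base;
    std::uint32_t section_alignment;
    std::uint32_t file_alignment;
    std::uint16_t subsystem;
    std::uint16_t dll_characteristics;
    std::uint32_t number_of_rva_and_sizes;
};

std::optional<OptionalCore> parse_optional_core(const std::vector<std::uint8_t>& file,
                                                std::size_t opt) {
    if (opt > file.size() || file.size() - opt < 2) return std::nullopt;
    const auto magic = static_cast<std::uint16_t>(read_le(file, opt, 2));
    if (magic != 0x10B && magic != 0x20B) return std::nullopt;
    const bool plus = (magic == 0x20B);
    const std::size_t fixed_size = plus ? 112 : 96;  // bytes before the data directories
    if (file.size() - opt < fixed_size) return std::nullopt;

    OptionalCore c{};
    c.is_pe32_plus = plus;
    c.entry_point_rva = static_cast<std::uint32_t>(read_le(file, opt + 16, 4));
    // ImageBase: 8 bytes at offset 24 in PE32+, 4 bytes at offset 28 in PE32.
    c.image_base = plus ? read_le(file, opt + 24, 8) : read_le(file, opt + 28, 4);
    c.section_alignment = static_cast<std::uint32_t>(read_le(file, opt + 32, 4));
    c.file_alignment = static_cast<std::uint32_t>(read_le(file, opt + 36, 4));
    c.subsystem = static_cast<std::uint16_t>(read_le(file, opt + 68, 2));
    c.dll_characteristics = static_cast<std::uint16_t>(read_le(file, opt + 70, 2));
    c.number_of_rva_and_sizes =
        static_cast<std::uint32_t>(read_le(file, opt + (plus ? 108 : 92), 4));
    return c;
}
```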

Relevance for Malware Research

The Optional Header tells you how the loader will map and start the image. For malware analysis, AddressOfEntryPoint, ImageBase, SectionAlignment, FileAlignment, Subsystem, and DllCharacteristics deserve special attention.

Offset (PE32/PE32+) | Size (PE32/PE32+) | Field | Description
0 | 2 | Magic | Specifies file format
2 | 1 | MajorLinkerVersion | The linker major version number
3 | 1 | MinorLinkerVersion | The linker minor version number
4 | 4 | SizeOfCode | The size of the code (text) section(s)
8 | 4 | SizeOfInitializedData | The size of the initialized data section(s) (.data, .rdata, resources, etc.)
12 | 4 | SizeOfUninitializedData | The size of the uninitialized data section(s) (BSS)
16 | 4 | AddressOfEntryPoint | The address of the entry point relative to the image base when loaded into memory
20 | 4 | BaseOfCode | The relative address of the beginning-of-code section
24 | 4 | BaseOfData | The relative address of the beginning-of-data section, does not exist in PE32+ executables
28/24 | 4/8 | ImageBase | The preferred address of the first byte of image when loaded into memory
32/32 | 4 | SectionAlignment | The alignment in bytes of sections when they are loaded into memory
36/36 | 4 | FileAlignment | The alignment in bytes that is used to align the raw data of sections
40/40 | 2 | MajorOperatingSystemVersion | The major version number of the required operating system
42/42 | 2 | MinorOperatingSystemVersion | The minor version number of the required operating system
44/44 | 2 | MajorImageVersion | The major version number of the image
46/46 | 2 | MinorImageVersion | The minor version number of the image
48/48 | 2 | MajorSubsystemVersion | The major version number of the subsystem
50/50 | 2 | MinorSubsystemVersion | The minor version number of the subsystem
52/52 | 4 | Win32VersionValue | Must be zero
56/56 | 4 | SizeOfImage | The size (in bytes) of the image, including all headers, as the image is loaded in memory
60/60 | 4 | SizeOfHeaders | The combined size of an MS-DOS stub, PE header and section headers
64/64 | 4 | Checksum | The image file checksum. Important for drivers and some system images; often zero or ignored for ordinary user-mode executables.
68/68 | 2 | Subsystem | The subsystem that is required to run this image
70/70 | 2 | DllCharacteristics | Flags describing security and loader characteristics of the image
72/72 | 4/8 | SizeOfStackReserve | The size of the stack to reserve
76/80 | 4/8 | SizeOfStackCommit | The size of the stack to commit
80/88 | 4/8 | SizeOfHeapReserve | The size of the heap to reserve
84/96 | 4/8 | SizeOfHeapCommit | The size of the heap to commit
88/104 | 4 | LoaderFlags | Reserved, must be zero
92/108 | 4 | NumberOfRvaAndSizes | The number of data-directory entries in the remainder of the optional header

Magic

Like all headers so far, the Optional Header starts with magic bytes. The magic is 2 bytes long and takes one of two values for images: 0x10B identifies a PE32 (32-bit) executable and 0x20B identifies a PE32+ (64-bit) executable.

Major/MinorLinkerVersion

These two one-byte fields make up the version number of the linker that generated the file. Combined, they can help determine the toolset that was used to create the executable and support assumptions about the programming language used.
An example value: 0x0E24 should be read as MajorLinkerVersion = 0x0E and MinorLinkerVersion = 0x24 which corresponds to Microsoft Linker version 14.36 (associated with Visual Studio 2022).

SizeOfCode

This value represents the total size of all sections that contain executable code. As we will discuss sections later that is all we need from this field for now.

SizeOf(Un)initializedData

These two fields represent the size of the initialized data section (.data, .rdata, resources, etc.) and the uninitialized data section (.bss).

AddressOfEntryPoint

The value in this field represents the Relative Virtual Address where the Windows loader transfers control after mapping the image and performing loader-managed initialization. It is not necessarily main, WinMain, or the first code that runs in the process. TLS callbacks and runtime startup code may execute before developer-controlled logic.

BaseOfCode

The value in this field represents the Relative Virtual Address of the start of the code section in memory.

BaseOfData

The value in this field represents the Relative Virtual Address of the start of the data section in memory.

ImageBase

The ImageBase is the preferred memory address at which the image should be loaded. For executables, the usual defaults are 0x400000 for PE32 and 0x140000000 for PE32+. This should be familiar to those who have loaded images into disassemblers like Ghidra or IDA and seen addresses starting with 0x4... or 0x14... respectively.

Section/FileAlignment

The alignment of sections in memory and of section data in the file on disk, respectively.
Section: This value is typically set to 0x1000 (4 KB). It must always be greater than or equal to FileAlignment.
File: This value is typically set to 0x200 (512 bytes). It should be a power of two between 512 bytes and 64 KB.
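
These constraints are easy to check programmatically. The sketch below uses hypothetical helper names and treats the rules as a triage heuristic: a file that fails the check is unusual, not necessarily invalid, since images with a very small SectionAlignment can legitimately use a smaller FileAlignment. The align_up helper rounds a size or offset up to the next multiple of a power-of-two alignment.

```cpp
#include <cstdint>

constexpr bool is_power_of_two(std::uint32_t v) {
    return v != 0 && (v & (v - 1)) == 0;
}

// Rounds value up to the next multiple of alignment (a power of two).
constexpr std::uint64_t align_up(std::uint64_t value, std::uint32_t alignment) {
    const std::uint64_t mask = std::uint64_t{alignment} - 1;
    return (value + mask) & ~mask;
}

// Heuristic: FileAlignment is a power of two in [512, 64 KB] and
// SectionAlignment is at least as large as FileAlignment.
constexpr bool alignments_look_typical(std::uint32_t section_alignment,
                                       std::uint32_t file_alignment) {
    return is_power_of_two(file_alignment) &&
           file_alignment >= 512 && file_alignment <= 65536 &&
           section_alignment >= file_alignment;
}
```

For example, align_up(0x1234, 0x200) gives 0x1400, which is how a raw section size of 0x1234 bytes ends up occupying 0x1400 bytes on disk.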

Major/MinorOperatingSystemVersion

The version number corresponding to the minimum Windows OS version that is required to run the executable. Possible values could be:

  • 5.1 -> "Windows XP",
  • 6.2 -> "Windows 8, Windows Server 2012",
  • 10.0 -> "Windows 10/11, Windows Server 2016/2019/2022"

Major/MinorImageVersion

The major and minor version number of the image. This is set by the developer and as such can have any value.

Major/MinorSubsystemVersion

See Major/MinorOperatingSystemVersion

Win32VersionValue

This field is reserved and must be zero.

SizeOfHeaders

The value in this field contains the combined size of the MS-DOS header and stub, the PE Header and the section headers, rounded up to a multiple of FileAlignment. Since we have already parsed the MS-DOS stub and the PE header, and therefore know their sizes, we can cross-check this value against the size of the section headers.
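
As a worked example of that relationship: the headers end after e_lfanew bytes of DOS header and stub, 4 bytes of PE signature, 20 bytes of COFF header, SizeOfOptionalHeader bytes, and 40 bytes per section header, and the result is rounded up to FileAlignment. The helper below is a hypothetical sketch for cross-checking the stored value; a mismatch is a reason to look closer, not proof of tampering.

```cpp
#include <cstdint>

// Computes the SizeOfHeaders value one would expect from the other
// header fields. If file_alignment is not a power of two, the unrounded
// size is returned.
constexpr std::uint64_t expected_size_of_headers(std::uint32_t e_lfanew,
                                                 std::uint16_t size_of_optional_header,
                                                 std::uint16_t number_of_sections,
                                                 std::uint32_t file_alignment) {
    const std::uint64_t raw = std::uint64_t{e_lfanew} + 4 + 20 +
                              size_of_optional_header +
                              std::uint64_t{number_of_sections} * 40;
    if (file_alignment == 0 || (file_alignment & (file_alignment - 1)) != 0) return raw;
    const std::uint64_t mask = std::uint64_t{file_alignment} - 1;
    return (raw + mask) & ~mask;
}
```

With e_lfanew = 0x80, a PE32 optional header of 0xE0 bytes, five sections and a FileAlignment of 0x200, the unrounded size is 0x240 bytes, which rounds up to 0x400.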

Checksum

The checksum is important for kernel-mode drivers and some system-critical images. For normal user-mode executables it is often zero or not meaningful during basic triage.

Subsystem

The subsystem field determines which Windows subsystem (if any) is required to run the image. The full list of values and corresponding descriptions can be found on the Microsoft website.
Examples:

  • 0x02 -> Windows GUI,
  • 0x03 -> Windows Console

DllCharacteristics

The DllCharacteristics field defines security and execution characteristics for a binary, such as whether it supports Address Space Layout Randomization (ASLR) or Data Execution Prevention (DEP). From a malware perspective, this is a valuable field because it shows which mitigations the binary opts into or avoids.
Common flags include:

  • HighEntropyVirtualAddressSpace (0x0020): Used for 64-bit images to support high-entropy ASLR.
  • DynamicBase (0x0040): Enables ASLR, allowing the image to be relocated at load time.
  • ForceIntegrity (0x0080): Enforces code integrity checks.
  • NxCompat (0x0100): Indicates the image is compatible with Non-eXecutable (NX) memory protection.
  • NoSeh (0x0400): Specifies that the image does not use Structured Exception Handling (SEH).
  • AppContainer (0x1000): Requires the image to run inside an AppContainer.
  • ControlFlowGuard (0x4000): Indicates support for Microsoft Control Flow Guard security mitigation.
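
The flags listed above can be decoded with a simple mask table. The sketch below uses hypothetical helper names and only covers the flags named in this post; bits outside the table are ignored. The second helper is a quick triage check for images that opt out of ASLR or DEP.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Returns the names of the set DllCharacteristics flags from this post.
std::vector<std::string> decode_dll_characteristics(std::uint16_t value) {
    struct Flag { std::uint16_t mask; const char* name; };
    static constexpr Flag kFlags[] = {
        {0x0020, "HighEntropyVirtualAddressSpace"},
        {0x0040, "DynamicBase"},
        {0x0080, "ForceIntegrity"},
        {0x0100, "NxCompat"},
        {0x0400, "NoSeh"},
        {0x1000, "AppContainer"},
        {0x4000, "ControlFlowGuard"},
    };
    std::vector<std::string> names;
    for (const Flag& f : kFlags) {
        if (value & f.mask) names.emplace_back(f.name);
    }
    return names;
}

// True when the image does not opt into ASLR (DynamicBase) or DEP (NxCompat).
bool lacks_aslr_or_dep(std::uint16_t value) {
    return (value & 0x0040) == 0 || (value & 0x0100) == 0;
}
```

Missing mitigations are common in older or hand-built binaries, so the result is a weak signal to combine with others rather than a verdict.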

SizeOf(Stack/Heap)(Commit/Reserve)

These values determine how much memory should be reserved and committed for the stack and the heap. With the Microsoft linker, the default stack reserve is 1 MB for both x86 and x64 images.

LoaderFlags

This is a reserved field and must be zero.

NumberOfRvaAndSizes

The number of Data Directory entries that follow this field and complete the Optional Header. Using this value, we can determine how many bytes to read to capture all Data Directory entries (8 bytes per entry). Most often, the value is 0x10 (16), which covers the standard PE directories such as the Import Table, Export Table and Import Address Table.
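
Because NumberOfRvaAndSizes is attacker-controlled, it should not be trusted blindly when sizing the read. The sketch below, with hypothetical names, clamps the count to 16 and to the bytes actually available in the buffer; opt is the file offset of the Optional Header.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

static std::uint32_t read_u32(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint32_t>(b[off]) |
           (static_cast<std::uint32_t>(b[off + 1]) << 8) |
           (static_cast<std::uint32_t>(b[off + 2]) << 16) |
           (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

struct DataDirectory {
    std::uint32_t virtual_address;
    std::uint32_t size;
};

std::vector<DataDirectory> read_data_directories(const std::vector<std::uint8_t>& file,
                                                 std::size_t opt, bool is_pe32_plus) {
    const std::size_t count_off = is_pe32_plus ? 108 : 92;  // NumberOfRvaAndSizes
    const std::size_t table_off = count_off + 4;            // first directory entry
    if (opt > file.size() || file.size() - opt < table_off) return {};

    std::size_t count = read_u32(file, opt + count_off);
    count = std::min<std::size_t>(count, 16);                // standard maximum
    count = std::min(count, (file.size() - opt - table_off) / 8);

    std::vector<DataDirectory> dirs;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t off = opt + table_off + 8 * i;
        dirs.push_back({read_u32(file, off), read_u32(file, off + 4)});
    }
    return dirs;
}
```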

Data Directories

The data directory is a set of pointers that are part of the Optional Header.

Offset (PE32/PE32+) | Size | Field name | Description
96/112 | 8 | Export Table | The export table address and size (.edata section)
104/120 | 8 | Import Table | The import table address and size (.idata section)
112/128 | 8 | Resource Table | The resource table address and size (.rsrc section)
120/136 | 8 | Exception Table | The exception table address and size (.pdata section)
128/144 | 8 | Certificate Table | The certificate table address and size
136/152 | 8 | Base Relocation Table | The base relocation table address and size (.reloc section)
144/160 | 8 | Debug | The debug data starting address and size (.debug section)
152/168 | 8 | Architecture | Reserved, must be zero
160/176 | 8 | Global Ptr | The RVA of the value to be stored in the global pointer register
168/184 | 8 | TLS Table | The thread local storage table address and size (.tls section)
176/192 | 8 | Load Config Table | The load configuration table address and size
184/200 | 8 | Bound Import | The bound import table address and size
192/208 | 8 | Import Address Table | The import address table address and size
200/216 | 8 | Delay Import Descriptor | The delay import descriptor address and size
208/224 | 8 | CLR Runtime Header | The CLR runtime header address and size (.cormeta section)
216/232 | 8 | Not specified | Reserved, must be zero

For most Data Directory entries, the first four bytes represent an RVA and the second four bytes represent the size. The Certificate Table is an important exception: its address is a file offset because certificate data is not mapped into memory like normal image data.
Each binary can and likely will have a different Data Directory layout because not every directory needs to be present in every binary.
PEDetect reads each directory entry and checks whether the RVA and size are present. If a directory is present, the tool can attempt to map the RVA to a section and parse the corresponding structure.
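
Mapping an RVA to a file offset means finding the section whose virtual range contains it and applying the same displacement to that section's raw data. The sketch below is one way to do this, under the assumption that the section headers have already been parsed into a simple struct; Section and rva_to_file_offset are hypothetical names. It should not be applied to the Certificate Table, whose address is already a file offset.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

struct Section {
    std::uint32_t virtual_address;  // RVA of the section
    std::uint32_t virtual_size;     // size in memory
    std::uint32_t raw_offset;       // PointerToRawData
    std::uint32_t raw_size;         // SizeOfRawData
};

// Returns the file offset backing the RVA, or nullopt if the RVA is
// outside every section or falls in a part that only exists in memory.
std::optional<std::uint32_t> rva_to_file_offset(std::uint32_t rva,
                                                const std::vector<Section>& sections) {
    for (const Section& s : sections) {
        if (rva < s.virtual_address) continue;
        const std::uint32_t delta = rva - s.virtual_address;
        const std::uint32_t span = std::max(s.virtual_size, s.raw_size);
        if (delta >= span) continue;
        if (delta >= s.raw_size) return std::nullopt;  // zero-filled, not on disk
        return s.raw_offset + delta;
    }
    return std::nullopt;
}
```

A directory whose RVA maps to no section at all is one of the malformed-layout signals worth flagging.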

Relevance for Malware Research

Mandiant documented a Ursnif/Gozi-ISFB sample that manipulated TLS callbacks while injecting into a child process. Their report also explains the key teaching point: TLS callbacks can execute before the normal AddressOfEntryPoint, meaning analysts and automated tools can miss the real first malicious code if they only break at the entry point.

Furthermore, data directories are where PE structure starts to become behaviorally meaningful. Imports suggest capability, TLS changes execution order, resources may hide payload/configuration, relocations affect mapping, and the certificate table affects trust decisions.

Optional Header


Sections

We saw in the Data Directories that most tables correspond to a specific section. A section in a PE file contains code or data that linkers and Microsoft Win32 loaders process without special knowledge of the section contents.
With PEDetect, a best-effort attempt has been made at reading and parsing the sections. Mainly the most common sections, such as .text, .rdata, .data and a few others were prioritized. A sample output of PEDetect is displayed below for a few of the parsed sections.

sections

Relevance for Malware Research

Packed files commonly show symptoms such as few imports, high-entropy regions, unusual section names, and entry points in unexpected places. Entropy is especially useful because compressed or encrypted regions often create visible entropy shifts, although entropy alone is not a verdict.
During triage, analysts should look for overlapping or misaligned sections, invalid entry-point mappings, corrupted data directories, malformed imports, fake UPX names, and packed-lookalike layouts.
When analyzing Section headers we can ask: "Does the file layout look like a normal compiler produced it, or does it look transformed by a packer, protector, loader, or adversarial manipulation?"
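
The entropy mentioned above is usually computed as Shannon entropy over byte frequencies, giving a value between 0 (a single repeated byte) and 8 (uniformly distributed bytes). The sketch below is a straightforward implementation with a hypothetical function name; it would typically be run over each section's raw data separately.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Shannon entropy in bits per byte, in the range [0, 8].
double shannon_entropy(const std::vector<std::uint8_t>& data) {
    if (data.empty()) return 0.0;
    std::array<std::size_t, 256> counts{};
    for (std::uint8_t b : data) ++counts[b];
    double h = 0.0;
    for (std::size_t c : counts) {
        if (c == 0) continue;
        const double p = static_cast<double>(c) / static_cast<double>(data.size());
        h -= p * std::log2(p);
    }
    return h;
}
```

Values close to 8 are typical for compressed or encrypted data, but any threshold is a judgment call and should be combined with the other indicators listed above.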

Each section header is 40 bytes long and contains the 10 fields outlined below:

Offset | Size | Field name | Description
0 | 8 | Name | An 8-byte, null-padded ASCII string representing the section name
8 | 4 | VirtualSize | The total size of the section in memory. If it is larger than SizeOfRawData, the remainder is zero-filled when the section is loaded
12 | 4 | VirtualAddress | The RVA of the section, relative to the image base
16 | 4 | SizeOfRawData | The size of the section data in the file, aligned to the File Alignment. Because of this rounding it can be larger than VirtualSize
20 | 4 | PointerToRawData | The file offset where the section's data starts
24 | 4 | PointerToRelocations | The file offset of the relocation entries for the section
28 | 4 | PointerToLinenumbers | The file offset of the line number entries for the section
32 | 2 | NumberOfRelocations | The number of relocation entries for the section
34 | 2 | NumberOfLinenumbers | The number of line number entries for the section
36 | 4 | Characteristics | Flags indicating attributes for the section
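
Parsing the section table and flagging one of the permission anomalies mentioned earlier can be sketched as follows. The names are hypothetical, only the fields needed for triage are kept, and table_offset is assumed to be e_lfanew + 24 + SizeOfOptionalHeader. The two mask values are the executable (0x20000000) and writable (0x80000000) section flags.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

static std::uint32_t read_u32(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint32_t>(b[off]) |
           (static_cast<std::uint32_t>(b[off + 1]) << 8) |
           (static_cast<std::uint32_t>(b[off + 2]) << 16) |
           (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

struct SectionHeader {
    std::string name;
    std::uint32_t virtual_size;
    std::uint32_t virtual_address;
    std::uint32_t size_of_raw_data;
    std::uint32_t pointer_to_raw_data;
    std::uint32_t characteristics;
};

// Parses up to `count` 40-byte section headers; stops early if the
// table runs past the end of the file.
std::vector<SectionHeader> parse_section_headers(const std::vector<std::uint8_t>& file,
                                                 std::size_t table_offset,
                                                 std::uint16_t count) {
    constexpr std::size_t kHeaderSize = 40;
    std::vector<SectionHeader> out;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t off = table_offset + i * kHeaderSize;
        if (off > file.size() || file.size() - off < kHeaderSize) break;
        SectionHeader s{};
        // The name is null-padded but not necessarily null-terminated.
        for (std::size_t j = 0; j < 8 && file[off + j] != 0; ++j) {
            s.name.push_back(static_cast<char>(file[off + j]));
        }
        s.virtual_size = read_u32(file, off + 8);
        s.virtual_address = read_u32(file, off + 12);
        s.size_of_raw_data = read_u32(file, off + 16);
        s.pointer_to_raw_data = read_u32(file, off + 20);
        s.characteristics = read_u32(file, off + 36);
        out.push_back(s);
    }
    return out;
}

// A section that is both writable and executable is a common packer trait.
bool is_writable_and_executable(const SectionHeader& s) {
    constexpr std::uint32_t kMemExecute = 0x20000000;
    constexpr std::uint32_t kMemWrite = 0x80000000;
    return (s.characteristics & kMemExecute) != 0 && (s.characteristics & kMemWrite) != 0;
}
```

If fewer headers are returned than NumberOfSections claims, the section table is truncated, which is itself a malformed-layout indicator.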

Conclusion

After parsing the individual headers, PEDetect can be used as a triage aid rather than simply as a PE structure viewer. A clean baseline executable should show coherent header offsets, expected section names, reasonable alignment values, a plausible import table, and section permissions that match their purpose.

For suspicious samples, the same fields can reveal weak signals: an unusual entry point, a missing or tiny import table, high-entropy sections, suspicious section permissions, malformed data directories, a stripped or inconsistent Rich Header, or timestamps that do not align with the rest of the file.

None of these indicators proves maliciousness on its own. The value of PE-header analysis is that it tells the analyst where to look next.