Introduction
This blog post aims to highlight my understanding of the headers within a PE-COFF file: how
they can be extracted and why they matter for malware research and reverse engineering.
In preparation for this post, and to demonstrate software development proficiency as well as a
decent understanding of the PE-COFF file headers, I developed a C++ project named PEDetect,
which parses the relevant headers from a given executable.
PEDetect is a self-made attempt at mirroring the features and capabilities of several
well-known industry tools, with the goal of developing an advanced understanding of the PE-COFF
file structure. With PEDetect, the aim is to understand how these tools work and, more
importantly, how the PE-COFF and other headers in executables are structured. This understanding
should in turn deepen our knowledge of the information these headers contain and how it can be
extracted and correlated.
Throughout this first blog post, the HxDSetup executable is used to guide the reader through
the different headers and structures. However, as this project is approached from the
perspective of a malware reverse engineer, we will also highlight the relevance of certain
headers and fields from a malware research perspective.
Understanding the PE-COFF File Structure
PEDetect works by reading the input file byte by byte, starting at offset 0. It first parses the DOS Header and then builds on that information to locate and parse the subsequent structures: the DOS Stub, Rich Header, PE signature, COFF header, Optional Header, Data Directories and section headers.
| Application Name | SHA256 |
|---|---|
| HxDSetup.exe | DCCFA4B16AA79E273CC7FFC35493C495A7FD09F92A4B790F2DC41C65F64D5378 |
DOS Header
The DOS Header is the first header in a PE file and is 64 bytes long. It starts with the well-known magic bytes MZ, or 0x4D 0x5A in hex (read as a little-endian WORD, e_magic therefore holds 0x5A4D). The header structure itself is defined in the winnt.h header file as the IMAGE_DOS_HEADER struct.
```cpp
typedef struct _IMAGE_DOS_HEADER {
    WORD e_magic;      // Magic number (MZ)
    WORD e_cblp;       // Bytes on last page of file
    WORD e_cp;         // Pages in file
    WORD e_crlc;       // Relocations
    WORD e_cparhdr;    // Size of header in paragraphs
    WORD e_minalloc;   // Minimum extra paragraphs needed
    WORD e_maxalloc;   // Maximum extra paragraphs needed
    WORD e_ss;         // Initial (relative) SS value
    WORD e_sp;         // Initial SP value
    WORD e_csum;       // Checksum
    WORD e_ip;         // Initial IP value
    WORD e_cs;         // Initial (relative) CS value
    WORD e_lfarlc;     // File address of relocation table
    WORD e_ovno;       // Overlay number
    WORD e_res[4];     // Reserved words
    WORD e_oemid;      // OEM identifier (for e_oeminfo)
    WORD e_oeminfo;    // OEM information; e_oemid specific
    WORD e_res2[10];   // Reserved words
    LONG e_lfanew;     // File offset of the PE signature (start of the NT headers), at offset 0x3C
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
```
For a parser, the most important field in the DOS header is e_lfanew.
This field contains the file offset of the PE signature, which marks the start of the NT
headers. At that offset, the loader expects PE\0\0, followed by the COFF/File Header and the
Optional Header.
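The following minimal C++17 sketch shows the two checks described above on a file that has already been loaded into memory: validate the MZ magic, read e_lfanew from offset 0x3C, and confirm that PE\0\0 sits at that offset. The helper names (read_le, find_pe_signature) are hypothetical and not taken from PEDetect; fields are decoded byte by byte as little-endian, so the code does not depend on struct packing or host endianness.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Reads a little-endian integer of type T at `off`, or returns nullopt if it would run past the buffer.
template <typename T>
std::optional<T> read_le(const std::vector<uint8_t>& buf, size_t off) {
    if (off > buf.size() || buf.size() - off < sizeof(T)) return std::nullopt;
    T value = 0;
    for (size_t i = 0; i < sizeof(T); ++i)
        value = static_cast<T>(value | (static_cast<T>(buf[off + i]) << (8 * i)));
    return value;
}

// Returns e_lfanew if the buffer starts with "MZ" and e_lfanew points at "PE\0\0"; otherwise nullopt.
std::optional<uint32_t> find_pe_signature(const std::vector<uint8_t>& buf) {
    auto magic = read_le<uint16_t>(buf, 0x00);            // e_magic
    if (!magic || *magic != 0x5A4D) return std::nullopt;  // 'M' 'Z' read as a little-endian WORD
    auto lfanew = read_le<uint32_t>(buf, 0x3C);           // last field of the 64-byte DOS header
    if (!lfanew) return std::nullopt;
    auto sig = read_le<uint32_t>(buf, *lfanew);
    if (!sig || *sig != 0x00004550) return std::nullopt;  // "PE\0\0" read as a little-endian DWORD
    return lfanew;
}
```

The bounds checks matter because e_lfanew comes from the file itself: a malformed sample can point it anywhere.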
The image below depicts the output of PEDetect, which displays the size of the header and the
location of the COFF header.
DOS Stub
The DOS Stub is located directly after the DOS Header. It is a small legacy code region that prints a default error message along the lines of "This program cannot be run in DOS mode" or "This program must be run under Win32" if the executable is started under MS-DOS. The stub is real machine code: a 16-bit MS-DOS program native to Intel 8086 processors, which you can disassemble and analyze with a disassembler like Ghidra or IDA by extracting the stub bytes and loading them as 16-bit code. Once loaded, you will notice that it contains only a few simple instructions: it loads the address of the error message, prints it using the DOS INT 21h print-string function, and then exits with an error code through another INT 21h call.
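To triage the stub without opening a disassembler, one option is to compare its first bytes against the stub the Microsoft linker usually emits. The sketch below assumes that common 14-byte instruction sequence (annotated with its 16-bit disassembly); has_default_ms_stub is a hypothetical helper, and a non-matching stub is only a hint, since other linkers may emit different stubs.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Instruction bytes of the common Microsoft linker stub, disassembled as 16-bit code:
//   0E         push cs
//   1F         pop  ds            ; DS = CS so DX can address the message
//   BA 0E 00   mov  dx, 000Eh     ; message starts 0x0E bytes into the stub
//   B4 09      mov  ah, 09h       ; DOS function 09h: print '$'-terminated string
//   CD 21      int  21h
//   B8 01 4C   mov  ax, 4C01h     ; DOS function 4Ch: terminate with return code 1
//   CD 21      int  21h
constexpr std::array<uint8_t, 14> kMsStubCode = {
    0x0E, 0x1F, 0xBA, 0x0E, 0x00, 0xB4, 0x09, 0xCD, 0x21, 0xB8, 0x01, 0x4C, 0xCD, 0x21};

// The stub directly follows the 64-byte DOS header; returns true if its code matches the sequence above.
bool has_default_ms_stub(const std::vector<uint8_t>& buf) {
    constexpr size_t kStubStart = 0x40;
    if (buf.size() < kStubStart + kMsStubCode.size()) return false;
    for (size_t i = 0; i < kMsStubCode.size(); ++i)
        if (buf[kStubStart + i] != kMsStubCode[i]) return false;
    return true;
}
```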
Relevance for Malware Research
Relatively recent research on adversarial machine learning demonstrated how PE-format manipulations can be used to evade learning-based malware detectors. These attacks, named Full DOS, Extend and Shift, inject the adversarial payload by, respectively, manipulating the DOS header, extending it, and shifting the content of the first section. The research goes on to show that these attacks outperform existing ones in both white-box and black-box scenarios, achieving a better trade-off between evasion rate and the size of the injected payload, while also evading models that had been shown to be robust to previous attacks.
RICH Header
Executables linked with the Microsoft Visual Studio toolset will normally have a populated RICH
header. The RICH header is officially undocumented; however, over the years researchers have
been able to decode the individual items in this header. You will notice that in the executable
we have used so far, the RICH header is fully nulled. A missing, zeroed, or invalid Rich Header
may suggest the binary was not produced by the Microsoft linker, but it can also be stripped,
altered, packed, or intentionally corrupted. Treat it as a clue, not proof.
The RICH header consists of a block of XOR-encoded data, followed by a signature and a 32-bit
checksum which simultaneously acts as the XOR key. Let's start with the most straightforward
part, the signature: a 4-byte value containing the plaintext string Rich (bytes 0x52 0x69 0x63
0x68).
The last part of the RICH Header is the 32-bit checksum. XOR-ing each DWORD before the signature
with this key decodes the data. Once decoded, the data starts with the signature DanS, followed
by padding, typically three zeroed DWORD values. The remainder consists of pairs of DWORDs: the
first DWORD holds a product (tool) identifier in its upper 16 bits and the build number of that
tool in its lower 16 bits, and the second DWORD holds the number of times the tool was used.
PEDetect reads the XORed data blob into a fixed 88-byte array and then parses the signature to
determine whether a proper RICH header blob has been read in. Subsequently, it skips over the
padding and parses every 8 bytes, corresponding to the DWORD pairs, to decode the tool number,
build number and count.
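The decoding steps can be sketched in C++ as follows. This is an illustrative sketch, not PEDetect's implementation: instead of a fixed-size buffer, it searches for the plaintext Rich marker between the end of the DOS header and e_lfanew, walks backwards XOR-decoding until DanS appears, and then reads the (comp.id, count) pairs. The names RichEntry and decode_rich are hypothetical, and the checksum itself is not verified.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct RichEntry {
    uint16_t prod_id;  // tool/product identifier (upper 16 bits of the comp.id DWORD)
    uint16_t build;    // build number of that tool (lower 16 bits)
    uint32_t count;    // number of times the tool was used
};

static uint32_t dword_at(const std::vector<uint8_t>& b, size_t off) {
    return uint32_t(b[off]) | uint32_t(b[off + 1]) << 8 | uint32_t(b[off + 2]) << 16 |
           uint32_t(b[off + 3]) << 24;
}

// Decodes a Rich Header located between the DOS header (0x40) and the PE signature (e_lfanew).
std::optional<std::vector<RichEntry>> decode_rich(const std::vector<uint8_t>& b, size_t e_lfanew) {
    const uint32_t kRich = 0x68636952;  // "Rich" read as a little-endian DWORD
    const uint32_t kDanS = 0x536E6144;  // "DanS" read as a little-endian DWORD
    const size_t end = std::min(e_lfanew, b.size());

    // 1. Find the plaintext "Rich" marker; the XOR key is the DWORD right after it.
    size_t rich = 0;
    for (size_t off = 0x40; off + 8 <= end; off += 4)
        if (dword_at(b, off) == kRich) { rich = off; break; }
    if (rich == 0) return std::nullopt;
    const uint32_t key = dword_at(b, rich + 4);

    // 2. Walk backwards, XOR-decoding each DWORD, until "DanS" appears.
    size_t dans = 0;
    for (size_t off = rich; off > 0x40;) {
        off -= 4;
        if ((dword_at(b, off) ^ key) == kDanS) { dans = off; break; }
    }
    if (dans == 0) return std::nullopt;

    // 3. Skip "DanS" plus three padding DWORDs, then decode (comp.id, count) pairs up to "Rich".
    std::vector<RichEntry> entries;
    for (size_t off = dans + 16; off + 8 <= rich; off += 8) {
        const uint32_t comp_id = dword_at(b, off) ^ key;
        const uint32_t count = dword_at(b, off + 4) ^ key;
        entries.push_back({uint16_t(comp_id >> 16), uint16_t(comp_id & 0xFFFF), count});
    }
    return entries;
}
```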
Relevance for Malware Research
ESET has published Rich Header research that explicitly discusses malware families such as
Dridex and Industroyer and shows how Rich Header features can support clustering, hunting, and
anomaly detection. It also makes the important point that Rich Headers are useful but not
absolute proof of authorship.
A Rich Header can support "this was built in a similar environment" or "this resembles a known
cluster", but it should not be treated as direct attribution proof.
PE Header
Now we get to the PE Header. We previously obtained the file offset of this header from the e_lfanew field of the DOS Header. The PE Header starts with the PE Signature, which marks the start of the NT headers. The value is always \x50\x45\x00\x00, the ASCII string PE followed by two null bytes (PE\0\0).
COFF Header
The COFF Header, also commonly referred to as the PE File Header or IMAGE_FILE_HEADER, contains seven fields.
| Offset | Size | Field | Description |
|---|---|---|---|
| 0 | 2 | Machine | The number identifying the type of target machine |
| 2 | 2 | NumberOfSections | The number of sections |
| 4 | 4 | TimeDateStamp | The low 32 bits of the number of seconds since 00:00 January 1, 1970 (UTC), indicating when the file was created |
| 8 | 4 | PointerToSymbolTable | The file offset of the COFF symbol table |
| 12 | 4 | NumberOfSymbols | The number of entries in the symbol table |
| 16 | 2 | SizeOfOptionalHeader | The size of the optional header |
| 18 | 2 | Characteristics | The flags that indicate the attributes of the file |
Using PEDetect, we extract all of these values and display them accordingly. A more detailed explanation of all fields will be given below.
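A minimal sketch of reading these seven fields, assuming the whole file is held in a byte vector and e_lfanew has already been validated. The helper parse_coff is hypothetical; reading each field at its documented offset, rather than casting the buffer to IMAGE_FILE_HEADER, keeps the code portable and avoids alignment issues.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// The seven IMAGE_FILE_HEADER fields in on-disk order (20 bytes in total).
struct CoffHeader {
    uint16_t machine;
    uint16_t number_of_sections;
    uint32_t time_date_stamp;
    uint32_t pointer_to_symbol_table;
    uint32_t number_of_symbols;
    uint16_t size_of_optional_header;
    uint16_t characteristics;
};

// Parses the COFF header that directly follows the 4-byte PE signature at e_lfanew.
std::optional<CoffHeader> parse_coff(const std::vector<uint8_t>& b, size_t e_lfanew) {
    const size_t off = e_lfanew + 4;
    if (off > b.size() || b.size() - off < 20) return std::nullopt;
    auto u16 = [&](size_t o) { return uint16_t(b[off + o] | b[off + o + 1] << 8); };
    auto u32 = [&](size_t o) { return uint32_t(u16(o)) | uint32_t(u16(o + 2)) << 16; };
    return CoffHeader{u16(0), u16(2), u32(4), u32(8), u32(12), u16(16), u16(18)};
}
```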
Relevance for Malware Research
Timestamps are one of the artifacts used to perform attribution. However, a timestamp should be treated as a clue, not the truth: malware authors and packers can easily tamper with it. The COFF timestamp is only one of several PE locations that may carry time information, so compare it against the Rich Header, debug directory, certificate timestamp, resource timestamps, and campaign context.
Machine
The first field is 2 bytes in size, and its value identifies the target architecture for which the executable was built. A few of the predefined values are:
| Value | Architecture |
|---|---|
| 0x014C | Intel 386 (x86) |
| 0x01C0 | ARM (little endian) |
| 0xAA64 | ARM64 |
| 0x8664 | x64 (AMD64) |
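A lookup such as the one below turns the raw value into a readable name. Both functions are hypothetical helpers that cover only the values listed above; the second one is handy for cross-checking the Machine field against the Optional Header magic.

```cpp
#include <cstdint>
#include <string>

// Maps the COFF Machine field to a readable name (only the values from the table above).
std::string machine_name(uint16_t machine) {
    switch (machine) {
        case 0x014C: return "Intel 386 (x86)";
        case 0x01C0: return "ARM";
        case 0xAA64: return "ARM64";
        case 0x8664: return "x64 (AMD64)";
        default:     return "unknown";
    }
}

// 64-bit machine types are expected to carry a PE32+ Optional Header.
bool is_64bit_machine(uint16_t machine) { return machine == 0x8664 || machine == 0xAA64; }
```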
NumberOfSections
The second field, once again 2 bytes in size, holds the number of sections in the PE file, which equals the number of entries in the section table (the section headers that directly follow the Optional Header). The sections, which we will dive into later, represent different parts of the file that contain code, data or other resources the executable requires.
TimeDateStamp
The third field, 4 bytes in size, holds the timestamp indicating when the file was built. This is useful in forensics and malware reverse engineering cases. For example, it allows for speculation about when certain campaigns were developed and can help to differentiate between samples and to track the development of specific adversarial campaigns. Although it can support timeline analysis, it should not be treated as reliable on its own: malware authors and packers can tamper with it, and reproducible-build toolchains may write a hash into this field instead of a real time.
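Converting the raw value into a readable date only needs the C standard library. This sketch assumes the value is seconds since the Unix epoch in UTC, as the specification defines; format_timestamp is a hypothetical helper name.

```cpp
#include <cstdint>
#include <ctime>
#include <string>

// Formats a COFF TimeDateStamp (seconds since 1970-01-01 00:00:00 UTC) as a UTC string.
// The result is only what the build tooling claims; it may be forged, zeroed, or a build hash.
std::string format_timestamp(uint32_t stamp) {
    std::time_t t = static_cast<std::time_t>(stamp);
    std::tm* utc = std::gmtime(&t);  // not thread-safe; gmtime_r/gmtime_s are alternatives
    if (utc == nullptr) return "invalid";
    char out[32];
    std::strftime(out, sizeof out, "%Y-%m-%d %H:%M:%S UTC", utc);
    return out;
}
```

For example, a raw value of 0x6553F100 (1700000000) formats as 2023-11-14 22:13:20 UTC.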
PointerToSymbolTable
The fourth field, 4 bytes in size, contains the file offset of the COFF symbol table, or zero if no COFF symbol table is present. Because COFF debugging information is deprecated for images, this value is normally 0x00000000 in executables and DLL files, so we will not discuss it any further at this point.
NumberOfSymbols
The subsequent field, also 4 bytes in size, contains the number of symbols (entries) in the symbol table. As the symbol table is often not present, this value is likely to be 0x00000000 as well.
SizeOfOptionalHeader
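The sixth field, 2 bytes in size, contains the size of the Optional Header in bytes. For a standard image with 16 data directories, this is 0xE0 (224) for PE32 and 0xF0 (240) for PE32+. Do not confuse these sizes with the Optional Header magic values 0x10B (PE32) and 0x20B (PE32+), which are covered below. Because the section table starts immediately after the Optional Header, this field is also what a parser uses to find the first section header.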
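Because the section table directly follows the Optional Header, its file offset can be derived from e_lfanew and SizeOfOptionalHeader. A small sketch of that arithmetic with hypothetical function names: 4 bytes of PE signature, 20 bytes of COFF header, the Optional Header, and then 40 bytes per section header.

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t kSignatureSize = 4;       // "PE\0\0"
constexpr size_t kCoffHeaderSize = 20;     // IMAGE_FILE_HEADER
constexpr size_t kSectionHeaderSize = 40;  // IMAGE_SECTION_HEADER

// File offset of the first section header, derived from the stored size rather than an assumed one.
size_t section_table_offset(uint32_t e_lfanew, uint16_t size_of_optional_header) {
    return size_t{e_lfanew} + kSignatureSize + kCoffHeaderSize + size_of_optional_header;
}

// File offset just past the last section header.
size_t section_table_end(uint32_t e_lfanew, uint16_t size_of_optional_header, uint16_t number_of_sections) {
    return section_table_offset(e_lfanew, size_of_optional_header) +
           size_t{number_of_sections} * kSectionHeaderSize;
}
```

For example, with e_lfanew = 0x80 and a 0xF0-byte PE32+ Optional Header, the section table starts at 0x80 + 4 + 20 + 0xF0 = 0x188.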
Characteristics
Finally, we end with the Characteristics field, which is 2 bytes in size. It is a combination of one or more flags that indicate the attributes of the file, for example whether it is an executable image (0x0002) or a DLL (0x2000). Microsoft documents the full list of flags with their corresponding values and descriptions.
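A sketch of decoding the flags, covering only a subset of the documented IMAGE_FILE_* values; the function name is hypothetical, and the remaining flags can be added from Microsoft's list in the same way.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Returns the names of the set flags, for a subset of the IMAGE_FILE_* characteristics.
std::vector<std::string> decode_file_characteristics(uint16_t c) {
    static const struct { uint16_t bit; const char* name; } kFlags[] = {
        {0x0001, "RELOCS_STRIPPED"},     {0x0002, "EXECUTABLE_IMAGE"},
        {0x0020, "LARGE_ADDRESS_AWARE"}, {0x0100, "32BIT_MACHINE"},
        {0x0200, "DEBUG_STRIPPED"},      {0x1000, "SYSTEM"},
        {0x2000, "DLL"},
    };
    std::vector<std::string> out;
    for (const auto& f : kFlags)
        if (c & f.bit) out.emplace_back(f.name);
    return out;
}
```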
Optional Header
Despite its name, the optional header is present in every image file and provides information
to the loader. As defined by Microsoft, the optional header is only optional in object files.
The first step is to validate the Optional Header magic number and confirm that it agrees with
what the COFF header suggests (for example, an x64 Machine value should be paired with a PE32+
magic).
The layout differs between the two formats. PE32 files contain one extra field, BaseOfData,
which does not exist in PE32+ executables. Furthermore, ImageBase and the four stack and heap
size fields are 8 bytes in size in PE32+ instead of 4.
Within the code of PEDetect, we account for this accordingly and have developed a PE32 and a
PE64 parser.
Using PEDetect, the output will display all significant values that can be found in the header.
Relevance for Malware Research
The Optional Header tells you how the loader will map and start the image. For malware analysis, AddressOfEntryPoint, ImageBase, SectionAlignment, FileAlignment, Subsystem, and DllCharacteristics deserve special attention.
| Offset (PE32/PE32+) | Size (PE32/PE32+) | Field | Description |
|---|---|---|---|
| 0 | 2 | Magic | Specifies file format |
| 2 | 1 | MajorLinkerVersion | The linker major version number |
| 3 | 1 | MinorLinkerVersion | The linker minor version number |
| 4 | 4 | SizeOfCode | The size of the code (text) section(s) |
| 8 | 4 | SizeOfInitializedData | The size of the initialized data section(s) (.data, .rdata, resources, etc.) |
| 12 | 4 | SizeOfUninitializedData | The size of the uninitialized data section(s) (BSS) |
| 16 | 4 | AddressOfEntryPoint | The address of the entry point relative to the image base when loaded into memory |
| 20 | 4 | BaseOfCode | The relative address of the beginning-of-code section |
| 24/n/a | 4 | BaseOfData | The relative address of the beginning-of-data section; does not exist in PE32+ executables |
| 28/24 | 4/8 | ImageBase | The preferred address of the first byte of image when loaded into memory |
| 32/32 | 4 | SectionAlignment | The alignment in bytes of sections when they are loaded into memory |
| 36/36 | 4 | FileAlignment | The alignment in bytes that is used to align the raw data of sections |
| 40/40 | 2 | MajorOperatingSystemVersion | The major version number of the required operating system |
| 42/42 | 2 | MinorOperatingSystemVersion | The minor version number of the required operating system |
| 44/44 | 2 | MajorImageVersion | The major version number of the image |
| 46/46 | 2 | MinorImageVersion | The minor version number of the image |
| 48/48 | 2 | MajorSubsystemVersion | The major version number of the subsystem |
| 50/50 | 2 | MinorSubsystemVersion | The minor version number of the subsystem |
| 52/52 | 4 | Win32VersionValue | Must be zero |
| 56/56 | 4 | SizeOfImage | The size (in bytes) of the image, including all headers, as the image is loaded in memory |
| 60/60 | 4 | SizeOfHeaders | The combined size of an MS-DOS stub, PE header and section headers |
| 64/64 | 4 | Checksum | The image file checksum. Important for drivers and some system images; often zero or ignored for ordinary user-mode executables. |
| 68/68 | 2 | Subsystem | The subsystem that is required to run this image |
| 70/70 | 2 | DllCharacteristics | Flags describing security and loader characteristics of the image |
| 72/72 | 4/8 | SizeOfStackReserve | The size of the stack to reserve |
| 76/80 | 4/8 | SizeOfStackCommit | The size of the stack to commit |
| 80/88 | 4/8 | SizeOfHeapReserve | The size of the heap to reserve |
| 84/96 | 4/8 | SizeOfHeapCommit | The size of the heap to commit |
| 88/104 | 4 | LoaderFlags | Reserved, must be zero |
| 92/108 | 4 | NumberOfRvaAndSizes | The number of data-directory entries in the remainder of the optional header |
Magic
Like the DOS Header and the PE signature, the Optional Header also starts with magic bytes. The magic is 2 bytes long and, for images, takes one of two values: 0x10B identifies a PE32 (32-bit) executable and 0x20B identifies a PE32+ (64-bit) executable.
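The sketch below, a hypothetical helper rather than PEDetect's PE32/PE64 parsers, shows how the magic value drives the layout: AddressOfEntryPoint sits at offset 16 in both formats, while ImageBase is a 4-byte field at offset 28 in PE32 and an 8-byte field at offset 24 in PE32+ (per the table above).

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

enum class PeKind { Pe32, Pe32Plus };

struct OptionalBasics {
    PeKind kind;
    uint32_t entry_point_rva;  // AddressOfEntryPoint, offset 16 in both formats
    uint64_t image_base;       // 4 bytes at offset 28 (PE32) or 8 bytes at offset 24 (PE32+)
};

// Reads an n-byte little-endian value (n <= 8).
static uint64_t read_le_n(const std::vector<uint8_t>& b, size_t off, size_t n) {
    uint64_t v = 0;
    for (size_t i = 0; i < n; ++i) v |= uint64_t(b[off + i]) << (8 * i);
    return v;
}

// `opt` is the file offset of the Optional Header (e_lfanew + 4 + 20).
std::optional<OptionalBasics> parse_optional_basics(const std::vector<uint8_t>& b, size_t opt) {
    if (opt > b.size() || b.size() - opt < 32) return std::nullopt;  // both layouts need 32 bytes here
    const auto magic = uint16_t(read_le_n(b, opt, 2));
    const auto entry = uint32_t(read_le_n(b, opt + 16, 4));
    if (magic == 0x10B) return OptionalBasics{PeKind::Pe32, entry, read_le_n(b, opt + 28, 4)};
    if (magic == 0x20B) return OptionalBasics{PeKind::Pe32Plus, entry, read_le_n(b, opt + 24, 8)};
    return std::nullopt;  // any other value is not a PE32/PE32+ image
}
```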
Major/MinorLinkerVersion
These two fields hold the major and minor version number of the linker that produced the file.
Combined with other artifacts such as the Rich Header, these values can help determine the
toolset that was used to create the executable and contribute to assumptions about the
programming language used.
For example, MajorLinkerVersion = 0x0E (14) and MinorLinkerVersion = 0x24 (36) correspond to
Microsoft Linker version 14.36 (associated with Visual Studio 2022).
SizeOfCode
This value represents the total size of all sections that contain executable code. As we will discuss sections later, that is all we need from this field for now.
SizeOf(Un)initializedData
These two fields represent the size of the initialized data section (.data, .rdata, resources, etc.) and the uninitialized data section (.bss).
AddressOfEntryPoint
The value in this field represents the Relative Virtual Address where the Windows loader transfers control after mapping the image and performing loader-managed initialization. It is not necessarily main, WinMain, or the first code that runs in the process. TLS callbacks and runtime startup code may execute before developer-controlled logic.
BaseOfCode
The value in this field represents the Relative Virtual Address of the start of the code section in memory.
BaseOfData
The value in this field represents the Relative Virtual Address of the start of the data section in memory. This field only exists in PE32 files.
ImageBase
The ImageBase represents the preferred memory address at which the image should be loaded. For executables, the common defaults are 0x400000 for PE32 and 0x140000000 for PE32+ (DLLs default to 0x10000000 and 0x180000000 respectively). This should be familiar to those who have loaded images into disassemblers like Ghidra or IDA and seen addresses starting with 0x4... or 0x14... respectively. When ASLR is enabled, the loader may place the image at a different base and apply base relocations.
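Addresses shown by a disassembler are virtual addresses, obtained by adding an RVA to the image base. A trivial sketch (hypothetical helper), assuming the image sits at its preferred ImageBase:

```cpp
#include <cstdint>

// Converts an RVA to a virtual address for an image loaded at `image_base`.
// If ASLR relocated the image, pass the actual load base instead of the header's ImageBase.
constexpr uint64_t rva_to_va(uint64_t image_base, uint32_t rva) { return image_base + rva; }
```

For a PE32+ executable at the default base, an entry point RVA of 0x1000 therefore appears as 0x140001000.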
Section/FileAlignment
The alignment of sections in memory and of section data in the file on disk, respectively.
- SectionAlignment: typically set to 0x1000 (4 KB). The value must always be greater than or equal to FileAlignment.
- FileAlignment: typically set to 0x200 (512 bytes). The specification requires a power of two between 512 bytes and 64 KB.
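The rules above translate into simple sanity checks. The following sketch (hypothetical function names) rounds values up to an alignment boundary and reports alignment values that break the specification's rules; as with the other indicators, an odd value is a reason to look closer, not proof of tampering.

```cpp
#include <cstdint>
#include <string>
#include <vector>

constexpr bool is_power_of_two(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// Rounds `value` up to the next multiple of `alignment` (which must be a power of two).
constexpr uint32_t align_up(uint32_t value, uint32_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Checks the two rules from the text and returns human-readable findings (empty if none).
std::vector<std::string> check_alignment(uint32_t section_alignment, uint32_t file_alignment) {
    std::vector<std::string> findings;
    if (!is_power_of_two(file_alignment) || file_alignment < 0x200 || file_alignment > 0x10000)
        findings.push_back("FileAlignment is not a power of two between 512 and 64K");
    if (section_alignment < file_alignment)
        findings.push_back("SectionAlignment is smaller than FileAlignment");
    return findings;
}
```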
Major/MinorOperatingSystemVersion
The version number corresponding to the minimum Windows OS version that is required to run the executable. Possible values could be:
- 5.1 -> "Windows XP",
- 6.2 -> "Windows 8, Windows Server 2012",
- 10.0 -> "Windows 10/11, Windows Server 2016/2019/2022"
Major/MinorImageVersion
The major and minor version number of the image. These values are set by the developer and can therefore have any value.
Major/MinorSubsystemVersion
The minimum version of the subsystem required to run the image. The values follow the same scheme as Major/MinorOperatingSystemVersion.
Win32VersionValue
This field is reserved and must be zero.
SizeOfHeaders
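The value in this field is the combined size of the MS-DOS header and stub, the PE Header (signature, COFF header and Optional Header) and the section headers, rounded up to a multiple of FileAlignment. Since the section table consists of NumberOfSections headers of 40 bytes each, we can compute the minimum value this field should hold and compare it against the stored value.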
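Putting the pieces together, the smallest value SizeOfHeaders can legitimately hold is the size of everything up to the end of the section table, rounded up to FileAlignment. A sketch of that computation (hypothetical helper, assuming FileAlignment is a power of two); a stored value smaller than this is inconsistent with the header layout and worth flagging, while a larger value is not necessarily suspicious.

```cpp
#include <cstdint>

// Lower bound for SizeOfHeaders: DOS header/stub up to e_lfanew, PE signature (4), COFF header (20),
// Optional Header, and 40 bytes per section header, rounded up to FileAlignment.
constexpr uint32_t expected_size_of_headers(uint32_t e_lfanew, uint16_t size_of_optional_header,
                                            uint16_t number_of_sections, uint32_t file_alignment) {
    uint32_t raw = e_lfanew + 4 + 20 + size_of_optional_header + uint32_t(number_of_sections) * 40;
    return (raw + file_alignment - 1) & ~(file_alignment - 1);
}
```

For instance, with e_lfanew = 0xF8, a 0xF0-byte Optional Header, 6 sections and FileAlignment = 0x200, the headers occupy 0x2F0 bytes, which rounds up to 0x400.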
Checksum
The checksum is important for kernel-mode drivers and some system-critical images. For normal user-mode executables it is often zero or not meaningful during basic triage.
Subsystem
The subsystem field determines which Windows subsystem (if any) is required to run the image.
The full list of values and corresponding descriptions can be found on the Microsoft
website.
Examples:
- 0x02 -> Windows GUI,
- 0x03 -> Windows Console
DllCharacteristics
The DllCharacteristics field defines security and execution characteristics for a binary, such
as whether it supports Address Space Layout Randomization (ASLR) or Data Execution Prevention
(DEP). From a malware perspective, this is a valuable field because it shows which mitigations
the binary opts into or avoids.
Common flags include:
- HighEntropyVirtualAddressSpace (0x0020): Used for 64-bit images to support high-entropy ASLR.
- DynamicBase (0x0040): Enables ASLR, allowing the image to be relocated at load time.
- ForceIntegrity (0x0080): Enforces code integrity checks.
- NxCompat (0x0100): Indicates the image is compatible with Non-eXecutable (NX) memory protection.
- NoSeh (0x0400): Specifies that the image does not use Structured Exception Handling (SEH).
- AppContainer (0x1000): Requires the image to run inside an AppContainer.
- ControlFlowGuard (0x4000): Indicates support for Microsoft Control Flow Guard security mitigation.
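As a triage aid, these flags can be checked programmatically. The sketch below (hypothetical helper) reports missing DynamicBase, NxCompat and, for PE32+ images, HighEntropyVirtualAddressSpace, which recent MSVC linkers enable by default. ControlFlowGuard is left out because it is opt-in. An absent flag hints at an older or unusual toolchain rather than proving anything.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Lists default-on mitigations that the DllCharacteristics value does not opt into.
std::vector<std::string> missing_mitigations(uint16_t dll_characteristics, bool is_pe32_plus) {
    std::vector<std::string> missing;
    if (!(dll_characteristics & 0x0040)) missing.push_back("DynamicBase (ASLR)");
    if (!(dll_characteristics & 0x0100)) missing.push_back("NxCompat (DEP)");
    if (is_pe32_plus && !(dll_characteristics & 0x0020)) missing.push_back("HighEntropyVA");
    return missing;
}
```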
SizeOf(Stack/Heap)(Commit/Reserve)
These values determine how much memory should be reserved and committed for the stack and the heap. With the MSVC linker, the default stack reserve is 1 MB (configurable with the /STACK option).
LoaderFlags
This is a reserved field and must be zero.
NumberOfRvaAndSizes
The number of data-directory entries that follow this field at the end of the Optional Header. Using this value, we can determine how many bytes we have to read to capture all entries of the Data Directory (8 bytes per entry). Most often, the value of this field is 0x10 (16), which covers the standard PE directories such as the Export Table, Import Table and Import Address Table. Because the value comes from the file itself, a parser should not trust it blindly.
Data Directories
The data directory is a set of pointers that are part of the Optional Header.
| Offset (PE32/PE32+) | Size | Field name | Description |
|---|---|---|---|
| 96/112 | 8 | Export Table | The export table address and size (.edata section) |
| 104/120 | 8 | Import Table | The import table address and size (.idata section) |
| 112/128 | 8 | Resource Table | The resource table address and size (.rsrc section) |
| 120/136 | 8 | Exception Table | The exception table address and size (.pdata section) |
| 128/144 | 8 | Certificate Table | The certificate table address and size |
| 136/152 | 8 | Base Relocation Table | The base relocation table address and size (.reloc section) |
| 144/160 | 8 | Debug | The debug data starting address and size (.debug section) |
| 152/168 | 8 | Architecture | Reserved, must be zero |
| 160/176 | 8 | Global Ptr | The RVA of the value to be stored in the global pointer register |
| 168/184 | 8 | TLS Table | The thread local storage table address and size (.tls section) |
| 176/192 | 8 | Load Config Table | The load configuration table address and size |
| 184/200 | 8 | Bound Import | The bound import table address and size |
| 192/208 | 8 | Import Address Table | The import address table address and size |
| 200/216 | 8 | Delay Import Descriptor | The delay import descriptor address and size |
| 208/224 | 8 | CLR Runtime Header | The CLR runtime header address and size (.cormeta section) |
| 216/232 | 8 | Not specified | Reserved, must be zero |
For most Data Directory entries, the first four bytes represent an RVA and the second
four bytes represent the size. The Certificate Table is an important exception: its
address is a file offset because certificate data is not mapped into memory like normal
image data.
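A sketch of reading the directory array, assuming the Optional Header offset is known so that the array starts 96 (PE32) or 112 (PE32+) bytes into it. The names are hypothetical. The count from NumberOfRvaAndSizes is capped at the 16 standard entries and at the end of the buffer, since a malformed file can claim more entries than exist.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct DataDirectory {
    uint32_t address;  // RVA, except for index 4 (Certificate Table), where it is a file offset
    uint32_t size;
};

// `dirs_off` is the file offset of the first entry (Optional Header offset + 96 or + 112).
std::vector<DataDirectory> read_data_directories(const std::vector<uint8_t>& b, size_t dirs_off,
                                                 uint32_t number_of_rva_and_sizes) {
    auto u32 = [&](size_t o) {
        return uint32_t(b[o]) | uint32_t(b[o + 1]) << 8 | uint32_t(b[o + 2]) << 16 | uint32_t(b[o + 3]) << 24;
    };
    // Cap the claimed count at the 16 standard entries and stop at the end of the buffer.
    const size_t count = std::min<uint32_t>(number_of_rva_and_sizes, 16);
    std::vector<DataDirectory> dirs;
    for (size_t i = 0; i < count && dirs_off + 8 * (i + 1) <= b.size(); ++i)
        dirs.push_back({u32(dirs_off + 8 * i), u32(dirs_off + 8 * i + 4)});
    return dirs;
}
```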
Each binary can and likely will have a different Data Directory layout because not every
directory needs to be present in every binary.
PEDetect reads each directory entry and checks whether the RVA and size are present.
If a directory is present, the tool can attempt to map the RVA to a section and parse
the corresponding structure.
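Mapping an RVA to a file offset means finding the section whose in-memory range contains it and translating by that section's PointerToRawData. The following is a simplified sketch with hypothetical names; it ignores RVAs that fall inside the headers and loader-specific quirks that dedicated tools handle.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

struct SectionSpan {
    uint32_t virtual_address;
    uint32_t virtual_size;
    uint32_t pointer_to_raw_data;
    uint32_t size_of_raw_data;
};

// Returns the file offset for `rva`, or nullopt if no section covers it or it lies in
// zero-filled memory that has no bytes on disk.
std::optional<uint32_t> rva_to_file_offset(const std::vector<SectionSpan>& sections, uint32_t rva) {
    for (const auto& s : sections) {
        // Some linkers leave VirtualSize at 0; fall back to SizeOfRawData in that case.
        const uint32_t span = s.virtual_size ? s.virtual_size : s.size_of_raw_data;
        if (rva >= s.virtual_address && rva - s.virtual_address < span) {
            const uint32_t delta = rva - s.virtual_address;
            if (delta >= s.size_of_raw_data) return std::nullopt;  // zero-filled tail of the section
            return s.pointer_to_raw_data + delta;
        }
    }
    return std::nullopt;
}
```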
Relevance for Malware Research
Mandiant documented
a Ursnif/Gozi-ISFB sample that manipulated TLS callbacks while injecting
into a child process. Their report also explains the key teaching point: TLS callbacks can
execute before the normal AddressOfEntryPoint, meaning analysts and automated tools can miss
the real first malicious code if they only break at the entry point.
Furthermore, data directories are where PE structure starts to become behaviorally meaningful.
Imports suggest capability, TLS changes execution order, resources may hide payload or
configuration data, relocations affect mapping, and the certificate table affects trust
decisions.
Sections
We saw in the Data Directories that most tables are conventionally associated with a specific
section, although linkers often merge them into other sections (MSVC, for example, typically
places the import data in .rdata). A section in a PE file contains code or data that linkers and
Microsoft Win32 loaders process without special knowledge of the section contents.
With PEDetect, a best-effort attempt has been made at reading and parsing the sections. The most
common sections, such as .text, .rdata, .data and a few others, were prioritized. A sample
output of PEDetect is displayed below for a few of the parsed sections.
Relevance for Malware Research
Packed files commonly show symptoms such as few imports, high-entropy regions, unusual section
names, and entry points in unexpected places. Entropy is especially useful because compressed or
encrypted regions often create visible entropy shifts, although entropy alone is not a verdict.
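Entropy here usually means Shannon entropy over byte values, which ranges from 0 (constant data) to 8 bits per byte (uniformly distributed bytes). A minimal sketch (hypothetical function name) that can be run over each section's raw data:

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Shannon entropy of a byte buffer in bits per byte: 0.0 for constant data, 8.0 at most.
double shannon_entropy(const uint8_t* data, size_t len) {
    if (len == 0) return 0.0;
    std::array<size_t, 256> counts{};
    for (size_t i = 0; i < len; ++i) ++counts[data[i]];
    double h = 0.0;
    for (size_t c : counts) {
        if (c == 0) continue;
        const double p = static_cast<double>(c) / static_cast<double>(len);
        h -= p * std::log2(p);
    }
    return h;
}
```

Values close to 8 are typical of compressed or encrypted data, while ordinary code and data tend to score lower; where to draw the line remains the analyst's judgment call.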
During triage, analysts should look for overlapping or misaligned sections, invalid entry-point
mappings, corrupted data directories, malformed imports, fake UPX names, and
packed-lookalike layouts.
When analyzing Section headers we can ask: "Does the file layout look like a normal
compiler produced it, or does it look transformed by a packer, protector, loader, or
adversarial manipulation?"
Each section header is 40 bytes long and contains the 10 fields outlined below:
| Offset | Size | Field name | Description |
|---|---|---|---|
| 0 | 8 | Name | An ASCII string representing the section name |
| 8 | 4 | VirtualSize | The total size of the section when loaded into memory. If it is larger than SizeOfRawData, the remainder is zero-filled |
| 12 | 4 | VirtualAddress | The RVA of the section, relative to the image base |
| 16 | 4 | SizeOfRawData | The size of the section data in the file, aligned to the File Alignment |
| 20 | 4 | PointerToRawData | The file offset where the section's data starts |
| 24 | 4 | PointerToRelocations | The file offset of the relocation entries for the section; zero for executable images |
| 28 | 4 | PointerToLinenumbers | The file offset of the COFF line-number entries for the section; zero for images because COFF debugging information is deprecated |
| 32 | 2 | NumberOfRelocations | The number of relocation entries for the section |
| 34 | 2 | NumberOfLinenumbers | The number of line number entries for the section |
| 36 | 4 | Characteristics | Flags indicating attributes for the section |
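The table maps directly onto a parser. In this sketch (hypothetical names, a subset of the fields), each 40-byte header is read at its documented offsets, the 8-byte name is handled without assuming NUL termination, and a helper flags sections that are both writable (IMAGE_SCN_MEM_WRITE, 0x80000000) and executable (IMAGE_SCN_MEM_EXECUTE, 0x20000000), a combination worth a second look during triage.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct SectionHeader {
    std::string name;
    uint32_t virtual_size, virtual_address, size_of_raw_data, pointer_to_raw_data, characteristics;
};

// Parses up to `count` 40-byte section headers starting at `table_off`; stops early if truncated.
std::vector<SectionHeader> parse_sections(const std::vector<uint8_t>& b, size_t table_off, uint16_t count) {
    auto u32 = [&](size_t o) {
        return uint32_t(b[o]) | uint32_t(b[o + 1]) << 8 | uint32_t(b[o + 2]) << 16 | uint32_t(b[o + 3]) << 24;
    };
    std::vector<SectionHeader> out;
    for (size_t i = 0; i < count && table_off + 40 * (i + 1) <= b.size(); ++i) {
        const size_t o = table_off + 40 * i;
        // Name: 8 bytes, NUL-padded, not NUL-terminated when all 8 bytes are used.
        size_t n = 0;
        while (n < 8 && b[o + n] != 0) ++n;
        std::string name(reinterpret_cast<const char*>(&b[o]), n);
        out.push_back({name, u32(o + 8), u32(o + 12), u32(o + 16), u32(o + 20), u32(o + 36)});
    }
    return out;
}

// Writable and executable at the same time is unusual in ordinary compiler output.
bool is_writable_and_executable(const SectionHeader& s) {
    return (s.characteristics & 0x80000000u) && (s.characteristics & 0x20000000u);
}
```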
Conclusion
After parsing the individual headers, PEDetect can be used as a triage aid rather than
simply as a PE structure viewer. A clean baseline executable should show coherent header
offsets, expected section names, reasonable alignment values, a plausible import table,
and section permissions that match their purpose.
For suspicious samples, the same fields can reveal weak signals: an unusual entry point,
a missing or tiny import table, high-entropy sections, suspicious section permissions,
malformed data directories, a stripped or inconsistent Rich Header, or timestamps that
do not align with the rest of the file.
None of these indicators proves maliciousness on its own. The value of PE-header analysis
is that it tells the analyst where to look next.