PE Structure
Introduction
According to Wikipedia, the Portable Executable (PE) format is a file format for executables, object code, DLLs and others used in 32-bit and 64-bit versions of Windows operating systems, and in UEFI environments. For anyone who wants to write or reverse engineer malware, it is important to understand the structure of a PE file in depth, so let's get started.
DOS Header
Any PE file starts with the two bytes 4d 5a, which are the ASCII characters MZ (read as a little-endian WORD, the value is 0x5A4D). These are known as the magic bytes of the file. The magic bytes of a file identify what kind of file it is; for example, a JPG starts with FF D8 FF.
The first 64 bytes of a PE file are the IMAGE_DOS_HEADER structure. The main reason for its existence is backward compatibility. We can find the structure definition in winnt.h (which can be found here: https://codemachine.com/downloads/win80/winnt.h)
typedef struct _IMAGE_DOS_HEADER { // DOS .EXE header
WORD e_magic; // Magic number
WORD e_cblp; // Bytes on last page of file
WORD e_cp; // Pages in file
WORD e_crlc; // Relocations
WORD e_cparhdr; // Size of header in paragraphs
WORD e_minalloc; // Minimum extra paragraphs needed
WORD e_maxalloc; // Maximum extra paragraphs needed
WORD e_ss; // Initial (relative) SS value
WORD e_sp; // Initial SP value
WORD e_csum; // Checksum
WORD e_ip; // Initial IP value
WORD e_cs; // Initial (relative) CS value
WORD e_lfarlc; // File address of relocation table
WORD e_ovno; // Overlay number
WORD e_res[4]; // Reserved words
WORD e_oemid; // OEM identifier (for e_oeminfo)
WORD e_oeminfo; // OEM information; e_oemid specific
WORD e_res2[10]; // Reserved words
LONG e_lfanew; // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
There are a lot of fields, but we are interested in only a few of them, such as e_lfanew, which holds the file offset of IMAGE_NT_HEADERS, another structure. Since we know the type of every member, we can calculate the offset of e_lfanew: a WORD is 2 bytes, and e_lfanew is preceded by 16 single WORD members, an array of 4 WORDs and an array of 10 WORDs, so (16 + 4 + 10) * 2 = 60 bytes, which is 0x3C in hex. In other words, the 4-byte value stored at offset 0x3C from the start of the PE file is the file offset of IMAGE_NT_HEADERS.
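The offset arithmetic above can be checked in code. The following is a minimal sketch, assuming fixed-width stand-ins for the Windows types (uint16_t for WORD, int32_t for LONG) so it compiles without winnt.h; DosHeader and read_e_lfanew are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for IMAGE_DOS_HEADER (WORD -> uint16_t, LONG -> int32_t). */
typedef struct {
    uint16_t e_magic;
    uint16_t e_cblp;
    uint16_t e_cp;
    uint16_t e_crlc;
    uint16_t e_cparhdr;
    uint16_t e_minalloc;
    uint16_t e_maxalloc;
    uint16_t e_ss;
    uint16_t e_sp;
    uint16_t e_csum;
    uint16_t e_ip;
    uint16_t e_cs;
    uint16_t e_lfarlc;
    uint16_t e_ovno;
    uint16_t e_res[4];
    uint16_t e_oemid;
    uint16_t e_oeminfo;
    uint16_t e_res2[10];
    int32_t  e_lfanew;
} DosHeader;

/* Hypothetical helper: returns the file offset of the NT headers, or -1
   if the buffer is too small, does not start with "MZ", or holds a
   value that does not fit in a positive 32-bit offset. */
long read_e_lfanew(const unsigned char *buf, size_t len)
{
    if (len < sizeof(DosHeader) || buf[0] != 0x4D || buf[1] != 0x5A)
        return -1;

    /* Assemble the 4 little-endian bytes stored at offset 0x3C. */
    const unsigned char *p = buf + offsetof(DosHeader, e_lfanew);
    uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);

    return v > 0x7FFFFFFFu ? -1 : (long)v;
}
```

Here offsetof(DosHeader, e_lfanew) evaluates to 0x3C and sizeof(DosHeader) to 64, matching the numbers derived by hand. The bytes are assembled one at a time rather than cast through a pointer so the sketch does not depend on the host's endianness or alignment rules.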
DOS Stub
After the IMAGE_DOS_HEADER we have the DOS stub, which again isn't important for us: it is a small DOS program that prints the message "This program cannot be run in DOS mode" and then exits. I'll ignore it here, but if you want to know more, you can check 0xrick's blog on it.
Rich Headers
The Rich Headers sit between the DOS Stub and the NT Headers. The special thing about this structure is that it is undocumented and that it is only present in PE files built with the Microsoft Visual Studio toolchain. It contains metadata about the tools used and their specific versions. You can check this page, and this one, for more on Rich Headers.
NT Headers
Here is the definition of the NT Headers from the winnt.h file. It has 2 variants, one for 32-bit and the other for 64-bit images. The first member is the same in both: a DWORD Signature with the constant bytes 50 45 00 00, which translates to PE followed by 2 null bytes. Apart from that, there is a File Header structure and an Optional Header structure.
typedef struct _IMAGE_NT_HEADERS64 {
DWORD Signature;
IMAGE_FILE_HEADER FileHeader;
IMAGE_OPTIONAL_HEADER64 OptionalHeader;
} IMAGE_NT_HEADERS64, *PIMAGE_NT_HEADERS64;
typedef struct _IMAGE_NT_HEADERS {
DWORD Signature;
IMAGE_FILE_HEADER FileHeader;
IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
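Checking the Signature member is usually the first sanity test after following e_lfanew. Below is a small sketch of that check; has_pe_signature is a hypothetical helper name, and nt_off is assumed to be the value already read from e_lfanew.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the 4 bytes at file offset nt_off
   are 50 45 00 00 ("PE" followed by two null bytes), 0 otherwise
   (including offsets that fall outside the buffer). */
int has_pe_signature(const unsigned char *buf, size_t len, size_t nt_off)
{
    static const unsigned char sig[4] = { 0x50, 0x45, 0x00, 0x00 };

    if (nt_off > len || len - nt_off < sizeof sig)
        return 0;
    return memcmp(buf + nt_off, sig, sizeof sig) == 0;
}
```

The bounds check comes first because e_lfanew is attacker-controlled in a malformed file and may point past the end of the data.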
File Header
The File Header structure (also known as the COFF Header) looks like this:
typedef struct _IMAGE_FILE_HEADER {
WORD Machine;
WORD NumberOfSections;
DWORD TimeDateStamp;
DWORD PointerToSymbolTable;
DWORD NumberOfSymbols;
WORD SizeOfOptionalHeader;
WORD Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;
It describes a few things:
Machine: the target CPU architecture of the PE (for example x86 or x64).
NumberOfSections: the number of sections the PE has.
TimeDateStamp: date and time of binary compilation.
PointerToSymbolTable: file offset of the COFF symbol table.
NumberOfSymbols: number of symbols in that table.
SizeOfOptionalHeader: size in bytes of the Optional Header that follows.
Characteristics: flags describing the characteristics of the PE file (for example, whether it is a DLL).
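As an illustration of how two of these fields are read in practice, here is a short sketch that maps a few common Machine values to names and tests the DLL bit in Characteristics. The constants are the values defined in winnt.h; machine_name and is_dll are hypothetical helper names, and only three architectures are covered.

```c
#include <stdint.h>

/* Values as defined in winnt.h. */
#define IMAGE_FILE_MACHINE_I386   0x014c
#define IMAGE_FILE_MACHINE_AMD64  0x8664
#define IMAGE_FILE_MACHINE_ARM64  0xAA64
#define IMAGE_FILE_DLL            0x2000

/* Hypothetical helper: human-readable name for a few Machine values. */
const char *machine_name(uint16_t machine)
{
    switch (machine) {
    case IMAGE_FILE_MACHINE_I386:  return "x86";
    case IMAGE_FILE_MACHINE_AMD64: return "x64";
    case IMAGE_FILE_MACHINE_ARM64: return "ARM64";
    default:                       return "other";
    }
}

/* Hypothetical helper: 1 if the Characteristics flags mark the file as a DLL. */
int is_dll(uint16_t characteristics)
{
    return (characteristics & IMAGE_FILE_DLL) != 0;
}
```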
Optional Header
Then we have the Optional Header, which is considered one of the most information-rich structures. Here's the 32-bit definition (from the MSDN page):
typedef struct _IMAGE_OPTIONAL_HEADER {
WORD Magic;
BYTE MajorLinkerVersion;
BYTE MinorLinkerVersion;
DWORD SizeOfCode;
DWORD SizeOfInitializedData;
DWORD SizeOfUninitializedData;
DWORD AddressOfEntryPoint;
DWORD BaseOfCode;
DWORD BaseOfData;
DWORD ImageBase;
DWORD SectionAlignment;
DWORD FileAlignment;
WORD MajorOperatingSystemVersion;
WORD MinorOperatingSystemVersion;
WORD MajorImageVersion;
WORD MinorImageVersion;
WORD MajorSubsystemVersion;
WORD MinorSubsystemVersion;
DWORD Win32VersionValue;
DWORD SizeOfImage;
DWORD SizeOfHeaders;
DWORD CheckSum;
WORD Subsystem;
WORD DllCharacteristics;
DWORD SizeOfStackReserve;
DWORD SizeOfStackCommit;
DWORD SizeOfHeapReserve;
DWORD SizeOfHeapCommit;
DWORD LoaderFlags;
DWORD NumberOfRvaAndSizes;
IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;
Magic: specifies whether the image is 32-bit PE32 (0x010B) or 64-bit PE32+ (0x020B).
AddressOfEntryPoint: tells us where Windows will begin execution of the PE. This is a Relative Virtual Address (RVA), meaning an offset from the image base, so in memory the entry point is at ImageBase + RVA.
ImageBase: the preferred base address of the PE when loaded into memory. It is generally 0x00400000 for 32-bit exe files, but the loader may place the image elsewhere (for example because of ASLR), so the actual base can differ.
BaseOfCode & BaseOfData: RVAs of the start of the code section and the data section (BaseOfData exists only in the 32-bit header).
Subsystem: lets us know which subsystem is required to run the image. (See the full list in the MSDN document.)
DataDirectory: indicates where to find other important components of executable information in the file. It is really nothing more than an array of IMAGE_DATA_DIRECTORY structures. There are 16 possible entries.
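The Magic check and the RVA arithmetic described above fit in a few lines. This is only a sketch: the two constants are the winnt.h values, while optional_header_bits and rva_to_va are hypothetical helper names. The image base is taken as a 64-bit value so the same function works for both header variants.

```c
#include <stdint.h>

/* Values as defined in winnt.h. */
#define IMAGE_NT_OPTIONAL_HDR32_MAGIC 0x10b
#define IMAGE_NT_OPTIONAL_HDR64_MAGIC 0x20b

/* Hypothetical helper: 32 for PE32, 64 for PE32+, 0 for anything else. */
int optional_header_bits(uint16_t magic)
{
    if (magic == IMAGE_NT_OPTIONAL_HDR32_MAGIC)
        return 32;
    if (magic == IMAGE_NT_OPTIONAL_HDR64_MAGIC)
        return 64;
    return 0;
}

/* Hypothetical helper: an RVA is an offset from the image base, so the
   virtual address is simply base + RVA. */
uint64_t rva_to_va(uint64_t image_base, uint32_t rva)
{
    return image_base + rva;
}
```

For example, with the preferred base 0x00400000 and an AddressOfEntryPoint of 0x1000, rva_to_va gives 0x00401000. If the loader relocates the image, the same RVA is added to the actual base instead.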
Data Directory
The last member is DataDirectory, an array of type IMAGE_DATA_DIRECTORY. IMAGE_NUMBEROF_DIRECTORY_ENTRIES is a constant with the value 16, so there are 16 data directory entries. The IMAGE_DATA_DIRECTORY structure looks like this:
typedef struct _IMAGE_DATA_DIRECTORY {
DWORD VirtualAddress;
DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
These are the directories, with the last one (index 15) being reserved. We can see this in the winnt.h file:
// Directory Entries
#define IMAGE_DIRECTORY_ENTRY_EXPORT 0 // Export Directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT 1 // Import Directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE 2 // Resource Directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION 3 // Exception Directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY 4 // Security Directory
#define IMAGE_DIRECTORY_ENTRY_BASERELOC 5 // Base Relocation Table
#define IMAGE_DIRECTORY_ENTRY_DEBUG 6 // Debug Directory
// IMAGE_DIRECTORY_ENTRY_COPYRIGHT 7 // (X86 usage)
#define IMAGE_DIRECTORY_ENTRY_ARCHITECTURE 7 // Architecture Specific Data
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR 8 // RVA of GP
#define IMAGE_DIRECTORY_ENTRY_TLS 9 // TLS Directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG 10 // Load Configuration Directory
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT 11 // Bound Import Directory in headers
#define IMAGE_DIRECTORY_ENTRY_IAT 12 // Import Address Table
#define IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT 13 // Delay Load Import Descriptors
#define IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14 // COM Runtime descriptor
The Export Directory and the Import Directory are the 2 most important directories from both a malware development and a malware analysis point of view; they lead to the Export Address Table (EAT) and the Import Address Table (IAT). The import information tells us which DLLs and WinAPI functions the binary uses. Runtime structures such as the Process Environment Block (PEB) and Thread Environment Block (TEB) are not part of it; they exist only in the memory of a running process. So seeing something like VirtualAllocEx and CreateRemoteThread along with WriteProcessMemory in the imports would point us towards a high possibility of process injection.
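As a rough illustration of this kind of triage, the sketch below checks whether a list of imported function names contains the allocate/write/execute trio. It assumes the names have already been extracted from the import table; has_import and looks_like_process_injection are hypothetical helpers, and CreateRemoteThread is used as the thread-creation API that normally accompanies the other two. A match is only a hint for the analyst, not proof of injection.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: 1 if wanted appears in the list of import names. */
int has_import(const char *const *names, size_t count, const char *wanted)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(names[i], wanted) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical heuristic: all three APIs of the classic remote
   allocate / write / start-thread pattern are imported. */
int looks_like_process_injection(const char *const *names, size_t count)
{
    return has_import(names, count, "VirtualAllocEx") &&
           has_import(names, count, "WriteProcessMemory") &&
           has_import(names, count, "CreateRemoteThread");
}
```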
Section Header
Then come the Section Headers, which contain information on the different sections of the PE file and their sizes. There is one header per section, and the structure looks like this (from winnt.h):
typedef struct _IMAGE_SECTION_HEADER {
BYTE Name[IMAGE_SIZEOF_SHORT_NAME];
union {
DWORD PhysicalAddress;
DWORD VirtualSize;
} Misc;
DWORD VirtualAddress;
DWORD SizeOfRawData;
DWORD PointerToRawData;
DWORD PointerToRelocations;
DWORD PointerToLinenumbers;
WORD NumberOfRelocations;
WORD NumberOfLinenumbers;
DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;
There are different sections with different purposes, listed below:
.text: stores the actual code of the program.
.data: holds the initialized variables.
.bss: holds the uninitialized data (declared variables with no assigned values).
.rdata: contains the read-only data.
.edata: contains exportable objects and related table information.
.idata: contains imported objects and related table information.
.reloc: contains image relocation information.
.rsrc: contains resources used by the program such as images, icons, embedded binaries, and the manifest file, as well as information about program versions, authors, company, and copyright.
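The section header fields are what let us turn an RVA (as found in the data directories) into an offset in the file on disk: find the section whose range contains the RVA, subtract its VirtualAddress and add its PointerToRawData. The sketch below assumes a trimmed stand-in struct with only the fields needed; Section and rva_to_file_offset are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Trimmed stand-in for IMAGE_SECTION_HEADER with only the fields used here. */
typedef struct {
    char     name[9];              /* 8-byte section name plus a terminator */
    uint32_t virtual_address;      /* VirtualAddress (an RVA) */
    uint32_t size_of_raw_data;     /* SizeOfRawData */
    uint32_t pointer_to_raw_data;  /* PointerToRawData (a file offset) */
} Section;

/* Hypothetical helper: returns the file offset that corresponds to rva,
   or -1 if no section's raw data covers it. */
long long rva_to_file_offset(const Section *s, size_t count, uint32_t rva)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t start = s[i].virtual_address;

        if (rva >= start && rva - start < s[i].size_of_raw_data)
            return (long long)s[i].pointer_to_raw_data + (rva - start);
    }
    return -1;
}
```

For a .text section with VirtualAddress 0x1000 and PointerToRawData 0x400, the RVA 0x1234 maps to file offset 0x400 + 0x234 = 0x634. RVAs that fall in the part of a section that exists only in memory (beyond SizeOfRawData) have no file offset, which is why the function can return -1.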
Import Directory (IAT & ILT)
Whenever we import functions from the Windows API, this information is typically stored in the .idata section. The .idata section holds the import directory table, which is a series of IMAGE_IMPORT_DESCRIPTOR structures, one per imported DLL, terminated by a null descriptor:
typedef struct _IMAGE_IMPORT_DESCRIPTOR
{
union {
DWORD Characteristics; // 0 for terminating null import descriptor
DWORD OriginalFirstThunk; // RVA to original unbound IAT (PIMAGE_THUNK_DATA) / ILT
} DUMMYUNIONNAME;
DWORD TimeDateStamp; // 0 if not bound,
// -1 if bound, and real date/time stamp
// in IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT (new BIND)
// O.W. date/time stamp of DLL bound to (Old BIND)
DWORD ForwarderChain; // -1 if no forwarders
DWORD Name;
DWORD FirstThunk; // RVA to IAT (if bound this IAT has actual addresses)
} IMAGE_IMPORT_DESCRIPTOR;
The OriginalFirstThunk member points to the ILT, or Import Lookup Table. It is very similar to the IAT; the difference is that the ILT remains static, with each entry holding either an ordinal or an RVA into the hint-name table for an imported function, while the IAT gets overwritten with the addresses of the imported functions when the binary is loaded. The reason behind this behavior is explained well here. The hint-name table structure is as follows:
typedef struct _IMAGE_IMPORT_BY_NAME {
WORD Hint;
CHAR Name[1];
} IMAGE_IMPORT_BY_NAME, *PIMAGE_IMPORT_BY_NAME;
Here the Hint is a number used to speed up the lookup of the function: it is first tried as an index into the Export Name Pointer Table of the DLL, and if the name at that index does not match, a binary search is performed.
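To make the "ordinal or hint-name" distinction concrete, here is a sketch of decoding one 32-bit ILT entry. In a PE32 image the high bit of the entry (IMAGE_ORDINAL_FLAG32 in winnt.h) marks an import by ordinal, with the ordinal in the low 16 bits; otherwise the low 31 bits are the RVA of an IMAGE_IMPORT_BY_NAME entry. IltEntry and decode_ilt_entry32 are hypothetical names, and PE32+ images use 64-bit entries with the flag in bit 63, which this sketch does not cover.

```c
#include <stdint.h>

#define IMAGE_ORDINAL_FLAG32 0x80000000u  /* value as defined in winnt.h */

/* Hypothetical decoded form of one 32-bit ILT entry. */
typedef struct {
    int      by_ordinal;     /* 1 = import by ordinal, 0 = import by name */
    uint16_t ordinal;        /* valid when by_ordinal is 1 */
    uint32_t hint_name_rva;  /* valid when by_ordinal is 0 */
} IltEntry;

/* Hypothetical helper: split a raw PE32 ILT entry into its parts. */
IltEntry decode_ilt_entry32(uint32_t thunk)
{
    IltEntry e = { 0, 0, 0 };

    if (thunk & IMAGE_ORDINAL_FLAG32) {
        e.by_ordinal = 1;
        e.ordinal = (uint16_t)(thunk & 0xFFFFu);
    } else {
        e.hint_name_rva = thunk & 0x7FFFFFFFu;
    }
    return e;
}
```

An entry of 0 is not decoded specially here; in the file it marks the end of the table for that DLL, so a caller walking the ILT would stop before passing it to this function.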
Conclusion
In conclusion, the IMAGE_IMPORT_DESCRIPTOR structure defines function imports in PE files, with OriginalFirstThunk pointing to the Import Lookup Table (ILT), which contains function names or ordinals, and FirstThunk pointing to the Import Address Table (IAT), where function addresses are stored once resolved at load time. The ILT remains static while the IAT is updated when the program loads, allowing dynamic linking of external libraries without tying the application to specific function addresses.
I will soon write another post dissecting a PE and showing most of the things I have said here.