PE Structure
Introduction
According to Wikipedia, the Portable Executable (PE) format is a file format for executables, object code, DLLs and others used in 32-bit and 64-bit versions of Windows operating systems, and in UEFI environments. If you want to write or reverse engineer malware, it is important to dig deep into the structure of a PE file, so let's get started.
DOS Header
Every PE file starts with the bytes 4D 5A ("MZ"), also known as the magic bytes of the file. Read as a little-endian WORD, these two bytes become 0x5A4D, which is why winnt.h defines the DOS signature as 0x5A4D. Magic bytes identify the type of a file; for example, a JPG starts with FF D8 FF.
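As a minimal sketch (assuming the file has already been read into a byte buffer), the signature check could look like the C below. `read_le16` and `has_mz_signature` are hypothetical helper names, not Windows APIs; reading the WORD byte by byte keeps the code portable and shows how 4D 5A becomes 0x5A4D.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same value winnt.h uses for IMAGE_DOS_SIGNATURE ("MZ" read as a little-endian WORD). */
#define DOS_SIGNATURE 0x5A4Du

/* Hypothetical helper: read a 16-bit little-endian value from a byte buffer. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: true if the buffer starts with the bytes 4D 5A ("MZ"). */
static bool has_mz_signature(const uint8_t *buf, size_t len)
{
    return len >= 2 && read_le16(buf) == DOS_SIGNATURE;
}
```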
The first 64 bytes of a PE file are the IMAGE_DOS_HEADER structure, which exists mainly for backward compatibility with MS-DOS. Its definition can be found in winnt.h (a copy is available at https://codemachine.com/downloads/win80/winnt.h):
```c
typedef struct _IMAGE_DOS_HEADER {  // DOS .EXE header
    WORD e_magic;                    // Magic number
    WORD e_cblp;                     // Bytes on last page of file
    WORD e_cp;                       // Pages in file
    WORD e_crlc;                     // Relocations
    WORD e_cparhdr;                  // Size of header in paragraphs
    WORD e_minalloc;                 // Minimum extra paragraphs needed
    WORD e_maxalloc;                 // Maximum extra paragraphs needed
    WORD e_ss;                       // Initial (relative) SS value
    WORD e_sp;                       // Initial SP value
    WORD e_csum;                     // Checksum
    WORD e_ip;                       // Initial IP value
    WORD e_cs;                       // Initial (relative) CS value
    WORD e_lfarlc;                   // File address of relocation table
    WORD e_ovno;                     // Overlay number
    WORD e_res[4];                   // Reserved words
    WORD e_oemid;                    // OEM identifier (for e_oeminfo)
    WORD e_oeminfo;                  // OEM information; e_oemid specific
    WORD e_res2[10];                 // Reserved words
    LONG e_lfanew;                   // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;
```

There are a lot of fields, but we are interested in only a few of them, mainly e_lfanew, which holds the file offset of another structure, IMAGE_NT_HEADERS. Since we know the type of every member, we can calculate the offset of e_lfanew: 16 single WORD members, a WORD array of 4 and a WORD array of 10 come before it. That is 30 WORDs of 2 bytes each, i.e. 60 bytes, or 0x3C in hex. So e_lfanew sits at offset 0x3C, and the 4-byte value stored there is the offset of IMAGE_NT_HEADERS from the start of the file.
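The offset arithmetic can be double-checked with a portable mirror of the structure. This sketch assumes WORD maps to `uint16_t` and LONG to `int32_t` (as on Windows); `dos_header_mirror` and `read_e_lfanew` are hypothetical names. The `_Static_assert` lines make the compiler confirm both the 0x3C offset and the 64-byte size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable mirror of IMAGE_DOS_HEADER (WORD = uint16_t, LONG = int32_t). */
typedef struct {
    uint16_t e_magic, e_cblp, e_cp, e_crlc, e_cparhdr, e_minalloc, e_maxalloc;
    uint16_t e_ss, e_sp, e_csum, e_ip, e_cs, e_lfarlc, e_ovno;
    uint16_t e_res[4];
    uint16_t e_oemid, e_oeminfo;
    uint16_t e_res2[10];
    int32_t  e_lfanew;
} dos_header_mirror;

/* 30 WORDs (60 bytes) precede e_lfanew, so it lands at 0x3C with no padding. */
_Static_assert(offsetof(dos_header_mirror, e_lfanew) == 0x3C, "e_lfanew offset");
_Static_assert(sizeof(dos_header_mirror) == 64, "IMAGE_DOS_HEADER is 64 bytes");

/* Read e_lfanew (little-endian) from a raw file buffer; returns -1 if the buffer is too short. */
static long read_e_lfanew(const uint8_t *buf, size_t len)
{
    if (len < 0x40)
        return -1;
    const uint8_t *p = buf + 0x3C;
    return (long)(int32_t)((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}
```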
DOS Stub
After the IMAGE_DOS_HEADER comes the DOS stub, which again isn't important for our purposes: when the program is run under DOS, it prints the message "This program cannot be run in DOS mode" and then exits. I'll skip it here, but if you want to know more, you can check 0xrick's blog post on it.
Rich Header
Between the DOS stub and the NT headers sits the Rich Header. What makes it special is that it is an undocumented structure, and it is only present in PE files built with Microsoft's toolchain (the MSVC linker used by Visual Studio). It contains metadata about the tools used to build the binary and their specific versions. You can check this page and this one for more on the Rich Header.
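A rough sketch of locating the Rich Header, based on its commonly described (but unofficial) layout: a "Rich" marker followed by a 4-byte XOR key, with the start of the header marked by "DanS" XORed with that same key. `find_rich_header` is a hypothetical helper, and searching only the bytes before e_lfanew on 4-byte boundaries is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RICH_MARKER 0x68636952u /* "Rich" as a little-endian DWORD */
#define DANS_MARKER 0x536E6144u /* "DanS" as a little-endian DWORD */

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: search buf[0..end) (end would normally be e_lfanew).
   Layout assumed here is unofficial. On success stores the XOR key and
   the offset of the masked "DanS" DWORD that starts the Rich Header. */
static bool find_rich_header(const uint8_t *buf, size_t end,
                             uint32_t *key, size_t *start)
{
    for (size_t i = 0; i + 8 <= end; i += 4) {
        if (read_le32(buf + i) != RICH_MARKER)
            continue;
        *key = read_le32(buf + i + 4);       /* the key follows "Rich" unmasked */
        for (size_t j = i; j >= 4; j -= 4) { /* walk back to the masked "DanS" */
            if ((read_le32(buf + j - 4) ^ *key) == DANS_MARKER) {
                *start = j - 4;
                return true;
            }
        }
        return false;
    }
    return false;
}
```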
NT Headers
IMAGE_NT_HEADERS is defined in winnt.h in two variants, IMAGE_NT_HEADERS32 for 32-bit and IMAGE_NT_HEADERS64 for 64-bit images. In both, the first member is the same DWORD Signature, which has the constant value 50 45 00 00, i.e. the characters "PE" followed by two null bytes. Apart from that, each variant contains a File Header structure and an Optional Header structure.
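Putting the DOS header and the signature together, a minimal sketch that finds IMAGE_NT_HEADERS in a raw file buffer could look like this. `find_nt_headers` is a hypothetical name; the 0x00004550 constant matches IMAGE_NT_SIGNATURE in winnt.h.

```c
#include <stddef.h>
#include <stdint.h>

/* From winnt.h (64-bit variant; the 32-bit one uses IMAGE_OPTIONAL_HEADER32):
   typedef struct _IMAGE_NT_HEADERS64 {
       DWORD Signature;
       IMAGE_FILE_HEADER FileHeader;
       IMAGE_OPTIONAL_HEADER64 OptionalHeader;
   } IMAGE_NT_HEADERS64, *PIMAGE_NT_HEADERS64; */

#define NT_SIGNATURE 0x00004550u /* "PE\0\0" read as a little-endian DWORD */

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: check "MZ", follow e_lfanew (stored at 0x3C), then
   check "PE\0\0". Returns the file offset of the NT headers, or -1. */
static long find_nt_headers(const uint8_t *buf, size_t len)
{
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;
    uint32_t nt = read_le32(buf + 0x3C);
    if (nt > len - 4 || read_le32(buf + nt) != NT_SIGNATURE)
        return -1;
    return (long)nt;
}
```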
File Header
The File Header structure (IMAGE_FILE_HEADER, also known as the COFF header) contains the following fields:
- Machine: the target CPU type, e.g. 0x014C for x86 (32-bit) or 0x8664 for x64 (64-bit).
- NumberOfSections: the number of sections in the PE.
- TimeDateStamp: the date and time the binary was built, as seconds since 1970-01-01 (UTC).
- PointerToSymbolTable: the file offset of the COFF symbol table, if any.
- NumberOfSymbols: the number of entries in that table.
- SizeOfOptionalHeader: the size in bytes of the Optional Header that follows.
- Characteristics: flags describing attributes of the file, such as whether it is an executable image or a DLL.
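A hedged sketch of reading these fields from the 20 bytes right after the "PE\0\0" signature. `coff_file_header` is a hypothetical portable mirror of IMAGE_FILE_HEADER, filled field by field in little-endian order so the code does not rely on struct packing; the two flag values are IMAGE_FILE_EXECUTABLE_IMAGE and IMAGE_FILE_DLL from winnt.h.

```c
#include <stdint.h>

#define FILE_EXECUTABLE_IMAGE 0x0002u /* IMAGE_FILE_EXECUTABLE_IMAGE */
#define FILE_DLL              0x2000u /* IMAGE_FILE_DLL */

/* Hypothetical portable mirror of IMAGE_FILE_HEADER (20 bytes on disk). */
typedef struct {
    uint16_t Machine;              /* e.g. 0x014C = x86, 0x8664 = x64 */
    uint16_t NumberOfSections;
    uint32_t TimeDateStamp;        /* seconds since 1970-01-01 UTC */
    uint32_t PointerToSymbolTable;
    uint32_t NumberOfSymbols;
    uint16_t SizeOfOptionalHeader;
    uint16_t Characteristics;
} coff_file_header;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) { return (uint32_t)rd16(p) | ((uint32_t)rd16(p + 2) << 16); }

/* p points just past the "PE\0\0" signature. */
static coff_file_header parse_file_header(const uint8_t *p)
{
    coff_file_header h;
    h.Machine              = rd16(p + 0);
    h.NumberOfSections     = rd16(p + 2);
    h.TimeDateStamp        = rd32(p + 4);
    h.PointerToSymbolTable = rd32(p + 8);
    h.NumberOfSymbols      = rd32(p + 12);
    h.SizeOfOptionalHeader = rd16(p + 16);
    h.Characteristics      = rd16(p + 18);
    return h;
}
```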
Optional Header
Next is the Optional Header, which is one of the most information-rich structures in the file (it is only optional for object files; every executable image has one). Its definition is documented on MSDN as IMAGE_OPTIONAL_HEADER32 / IMAGE_OPTIONAL_HEADER64. The most interesting fields are:
- Magic: whether the image is PE32, i.e. 32-bit (0x010B), or PE32+, i.e. 64-bit (0x020B).
- AddressOfEntryPoint: where Windows begins executing the PE. This is a Relative Virtual Address (RVA), meaning it is relative to the image base: the actual address is ImageBase + RVA.
- ImageBase: the preferred base address of the PE when loaded into memory. For 32-bit EXE files this is generally 0x00400000, but the loader can place the image elsewhere (for example because of ASLR or because the address is already taken), in which case it uses the relocation information to fix up addresses.
- BaseOfCode & BaseOfData: the RVAs of the code and data sections. BaseOfData exists only in the 32-bit (PE32) Optional Header.
- Subsystem: which subsystem is required to run the image (see the full list in the MSDN documentation).
- DataDirectory: tells us where to find other important components of the executable in the file. It is nothing more than an array of IMAGE_DATA_DIRECTORY structures, with 16 possible entries.
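To make the RVA arithmetic and the PE32/PE32+ difference concrete, here is a sketch that computes ImageBase + AddressOfEntryPoint from a raw Optional Header. The offsets follow the winnt.h layouts, and `entry_point_va` is a hypothetical helper; note how ImageBase moves and widens in PE32+, which drops BaseOfData.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_MAGIC_PE32     0x010Bu
#define OPT_MAGIC_PE32PLUS 0x020Bu

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) { return (uint32_t)rd16(p) | ((uint32_t)rd16(p + 2) << 16); }
static uint64_t rd64(const uint8_t *p) { return (uint64_t)rd32(p) | ((uint64_t)rd32(p + 4) << 32); }

/* Hypothetical helper. opt points at the Optional Header.
   AddressOfEntryPoint is at offset 16 in both layouts; ImageBase is a DWORD at 28
   in PE32 (after BaseOfData) and a ULONGLONG at 24 in PE32+ (no BaseOfData). */
static bool entry_point_va(const uint8_t *opt, size_t len, uint64_t *va)
{
    if (len < 32)
        return false;
    uint16_t magic = rd16(opt);
    uint32_t entry_rva = rd32(opt + 16);
    if (magic == OPT_MAGIC_PE32)
        *va = (uint64_t)rd32(opt + 28) + entry_rva;
    else if (magic == OPT_MAGIC_PE32PLUS)
        *va = rd64(opt + 24) + entry_rva;
    else
        return false; /* unknown Magic */
    return true;
}
```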
Data Directory
The last member of the Optional Header is DataDirectory, an array of IMAGE_DATA_DIRECTORY structures whose length is IMAGE_NUMBEROF_DIRECTORY_ENTRIES, a constant equal to 16. Each IMAGE_DATA_DIRECTORY holds just two DWORDs: VirtualAddress, the RVA of the directory, and Size, its size in bytes.
The 16 directories are listed in winnt.h as the IMAGE_DIRECTORY_ENTRY_* constants (indices 0 to 15); the last one, index 15, is reserved.
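A sketch of reading one entry of the DataDirectory array. The enum mirrors the IMAGE_DIRECTORY_ENTRY_* indices from winnt.h; `get_data_directory` is a hypothetical helper, and it assumes `opt` points at the start of the Optional Header inside a raw buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Indices as defined by the IMAGE_DIRECTORY_ENTRY_* constants in winnt.h. */
enum {
    DIR_EXPORT = 0, DIR_IMPORT = 1, DIR_RESOURCE = 2, DIR_EXCEPTION = 3,
    DIR_SECURITY = 4, DIR_BASERELOC = 5, DIR_DEBUG = 6, DIR_ARCHITECTURE = 7,
    DIR_GLOBALPTR = 8, DIR_TLS = 9, DIR_LOAD_CONFIG = 10, DIR_BOUND_IMPORT = 11,
    DIR_IAT = 12, DIR_DELAY_IMPORT = 13, DIR_COM_DESCRIPTOR = 14
    /* index 15 is reserved */
};

/* Mirror of IMAGE_DATA_DIRECTORY: an RVA and a size, 8 bytes per entry. */
typedef struct { uint32_t VirtualAddress; uint32_t Size; } data_directory;

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* NumberOfRvaAndSizes sits at offset 92 (PE32) or 108 (PE32+),
   and the DataDirectory array follows it directly. */
static bool get_data_directory(const uint8_t *opt, size_t len, unsigned index,
                               data_directory *out)
{
    if (len < 2)
        return false;
    uint16_t magic = (uint16_t)(opt[0] | (opt[1] << 8));
    size_t count_off = magic == 0x20B ? 108 : magic == 0x10B ? 92 : 0;
    if (count_off == 0 || len < count_off + 4)
        return false;
    uint32_t count = rd32(opt + count_off);
    size_t entry_off = count_off + 4 + (size_t)index * 8;
    if (index >= count || len < entry_off + 8)
        return false;
    out->VirtualAddress = rd32(opt + entry_off);
    out->Size = rd32(opt + entry_off + 4);
    return true;
}
```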
The Export Directory and the Import Directory are the two most important directories from both the malware development and the malware analysis point of view. The Export Directory leads to the Export Address Table (EAT), and the Import Directory describes the functions whose addresses end up in the Import Address Table (IAT). The imports tell us which WinAPI functions the binary uses, so seeing VirtualAllocEx, WriteProcessMemory and CreateRemoteThread together would point towards a high possibility of process injection.
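As a toy triage heuristic (an assumption for illustration, not a detection rule from any real tool), checking a list of imported names for that trio could look like this. `imports_suggest_injection` is hypothetical, and a match only means the binary deserves a closer look.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical triage helper: true only if all three APIs of the classic
   VirtualAllocEx -> WriteProcessMemory -> CreateRemoteThread pattern are
   imported. A hint for further analysis, not proof of injection. */
static bool imports_suggest_injection(const char *const *names, size_t n)
{
    static const char *const pattern[] = {
        "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread"
    };
    for (size_t p = 0; p < sizeof pattern / sizeof pattern[0]; p++) {
        bool found = false;
        for (size_t i = 0; i < n && !found; i++)
            found = strcmp(names[i], pattern[p]) == 0;
        if (!found)
            return false;
    }
    return true;
}
```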
Section Header
Then comes the section table, an array of IMAGE_SECTION_HEADER structures (one per section, NumberOfSections in total) that describes each section of the PE file: its name, where it lives in memory and in the file, and its size.
Different sections have different purposes:
- .text: the actual code of the program.
- .data: initialized global and static variables.
- .bss: uninitialized data (variables declared without an initial value).
- .rdata: read-only data.
- .edata: exported symbols and the related tables.
- .idata: imported symbols and the related tables.
- .reloc: image relocation information.
- .rsrc: resources embedded in the program, such as images, icons, embedded binaries, the manifest, and version information (program version, author, company, copyright).
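Because data directories store RVAs while a file on disk is laid out by raw offsets, the section table is what lets us translate one into the other. A minimal sketch, assuming `table` points at the first IMAGE_SECTION_HEADER in a raw buffer; `rva_to_offset` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTION_HEADER_SIZE 40 /* sizeof(IMAGE_SECTION_HEADER) */

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Offsets inside each 40-byte IMAGE_SECTION_HEADER: Name[8] at 0, VirtualSize at 8,
   VirtualAddress at 12, SizeOfRawData at 16, PointerToRawData at 20.
   The table starts right after the Optional Header, i.e. at
   e_lfanew + 4 + 20 + SizeOfOptionalHeader. */
static bool rva_to_offset(const uint8_t *table, uint16_t nsections,
                          uint32_t rva, uint32_t *offset)
{
    for (uint16_t i = 0; i < nsections; i++) {
        const uint8_t *s = table + (size_t)i * SECTION_HEADER_SIZE;
        uint32_t vsize = rd32(s + 8), va = rd32(s + 12);
        uint32_t raw_size = rd32(s + 16), raw_ptr = rd32(s + 20);
        uint32_t span = vsize ? vsize : raw_size; /* fall back if VirtualSize is 0 */
        if (rva >= va && rva - va < span) {
            if (rva - va >= raw_size)
                return false; /* in memory only, no file data (e.g. .bss-like tail) */
            *offset = rva - va + raw_ptr;
            return true;
        }
    }
    return false;
}
```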
Import Directory (IAT & ILT)
Whenever a program imports functions from the Windows API, that import information is described by the Import Directory, which traditionally lives in the .idata section (the MSVC linker usually merges it into .rdata). The Import Directory is an array of IMAGE_IMPORT_DESCRIPTOR structures, one per imported DLL, terminated by an all-zero entry.
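A sketch of walking that array in a raw buffer, assuming the import directory RVA has already been translated to a position in the file. `import_descriptor` mirrors IMAGE_IMPORT_DESCRIPTOR field by field, and `read_import_descriptors` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable mirror of IMAGE_IMPORT_DESCRIPTOR (20 bytes). */
typedef struct {
    uint32_t OriginalFirstThunk; /* RVA of the ILT (a union with Characteristics in winnt.h) */
    uint32_t TimeDateStamp;
    uint32_t ForwarderChain;
    uint32_t Name;               /* RVA of the DLL name, e.g. "KERNEL32.dll" */
    uint32_t FirstThunk;         /* RVA of the IAT */
} import_descriptor;

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copies descriptors into out until the all-zero terminator, the end of
   the buffer, or max entries. Returns the number of descriptors copied. */
static size_t read_import_descriptors(const uint8_t *p, size_t len,
                                      import_descriptor *out, size_t max)
{
    size_t n = 0;
    for (size_t off = 0; off + 20 <= len && n < max; off += 20) {
        import_descriptor d = {
            rd32(p + off), rd32(p + off + 4), rd32(p + off + 8),
            rd32(p + off + 12), rd32(p + off + 16)
        };
        if (!d.OriginalFirstThunk && !d.TimeDateStamp && !d.ForwarderChain &&
            !d.Name && !d.FirstThunk)
            break; /* all-zero terminator */
        out[n++] = d;
    }
    return n;
}
```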
The OriginalFirstThunk member points to the ILT (Import Lookup Table), while FirstThunk points to the IAT. On disk the two tables are very similar: each entry is either an ordinal or the RVA of a hint/name table entry for the imported function. The difference is that the ILT stays unchanged, whereas the IAT is overwritten with the actual addresses of the imported functions when the binary is loaded. The reason behind this behavior is explained well here. Each hint/name table entry (IMAGE_IMPORT_BY_NAME) consists of a WORD Hint followed by the null-terminated function name.
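Each ILT entry can be decoded with a single bit test on its high bit. This sketch uses the IMAGE_ORDINAL_FLAG64 / IMAGE_ORDINAL_FLAG32 values from winnt.h; the `decode_thunk64` and `decode_thunk32` helpers are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define ORDINAL_FLAG64 0x8000000000000000ull /* IMAGE_ORDINAL_FLAG64 */
#define ORDINAL_FLAG32 0x80000000u           /* IMAGE_ORDINAL_FLAG32 */

/* Decode one 64-bit ILT entry (PE32+). Returns true for an import by ordinal
   (value = low 16 bits); otherwise value = bits 30..0, the RVA of an
   IMAGE_IMPORT_BY_NAME entry (WORD Hint + null-terminated name). */
static bool decode_thunk64(uint64_t entry, uint32_t *value)
{
    if (entry & ORDINAL_FLAG64) {
        *value = (uint32_t)(entry & 0xFFFF);
        return true;
    }
    *value = (uint32_t)(entry & 0x7FFFFFFF);
    return false;
}

/* Same rules for PE32, where ILT entries are 32 bits wide. */
static bool decode_thunk32(uint32_t entry, uint32_t *value)
{
    if (entry & ORDINAL_FLAG32) {
        *value = entry & 0xFFFF;
        return true;
    }
    *value = entry & 0x7FFFFFFF;
    return false;
}
```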
The Hint speeds up the lookup: it is first used as an index into the DLL's export name pointer table, and if the name at that index does not match, a binary search over the table is performed.
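The hint-then-binary-search lookup can be simulated over an in-memory, sorted array of export names. This is only a model of the behavior described above, not the actual loader code, and `lookup_export` is a hypothetical helper. The returned index would then be mapped through the export ordinal table to reach the function's address.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model: names is the DLL's export name table, sorted so that
   binary search works. Returns the matching index, or -1 if not exported. */
static long lookup_export(const char *const *names, size_t n,
                          unsigned hint, const char *wanted)
{
    if (hint < n && strcmp(names[hint], wanted) == 0)
        return (long)hint;                 /* hint was correct: no search needed */
    size_t lo = 0, hi = n;                 /* binary search over [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = strcmp(wanted, names[mid]);
        if (c == 0)
            return (long)mid;
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}
```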
Conclusion
In conclusion, the IMAGE_IMPORT_DESCRIPTOR structure describes the functions a PE file imports from each DLL. OriginalFirstThunk points to the Import Lookup Table (ILT), which contains function names or ordinals, and FirstThunk points to the Import Address Table (IAT), where function addresses are stored once they are resolved at load time. The ILT remains static while the IAT is updated when the program loads, which allows dynamic linking against external libraries without tying the application to specific function addresses.
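To tie the ILT/IAT split together, here is a toy model of binding in which a hypothetical resolver table (`fake_export`) stands in for the loader's export lookup. It only illustrates that the ILT is read while the IAT is overwritten; it is not how the Windows loader is implemented, and `bind_iat` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a DLL's exports: a name and its resolved address. */
typedef struct { const char *name; uint64_t address; } fake_export;

/* Toy binding: the ILT (names here) is only read, while each IAT slot is
   overwritten with the resolved address. Returns the number of unresolved imports. */
static size_t bind_iat(const char *const *ilt, uint64_t *iat, size_t n,
                       const fake_export *exports, size_t nexports)
{
    size_t missing = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j = 0;
        while (j < nexports && strcmp(exports[j].name, ilt[i]) != 0)
            j++;
        if (j < nexports)
            iat[i] = exports[j].address;
        else
            missing++; /* a real loader would typically refuse to load the image */
    }
    return missing;
}
```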
I will soon write another post dissecting a PE and showing most of the things I have said here.
References