This document explains in detail how to start exploring and examining file formats, with a focus on Game Resource Archives. For beginners and advanced users alike.
The definitive word in archive exploration.
Download below, or scroll on down and read it here:
Authors: Mr.Mouse and Watto
Version: 1.0 as of November 2004
Rewritten for the WIKI by Dinoguy1000 as of August 2006
- 1 Table of Contents
- 2 Introduction
- 3 Tools of the Trade
- 4 Terms, Definitions and Data Structures
- 5 Archive Patterns
- 6 Checking Your Results
- 7 Encryption and Compression
- 8 Worked Examples
- 9 Appendix
- 10 Legal Information
|THE DEFINITIVE GUIDE TO|
|EXPLORING FILE FORMATS|
= Revision 2 =
Table of Contents
Computer games are vast and many, covering a wide range of genres and game styles, but there is one fundamental feature that all games require - resources. Every game has a range of resources that help make it unique - from texture images to audio soundtracks. With all these resources, there needs to be a way they can be stored so that games can use them, and the way this is typically done is to store them in a big archive file.
|An archive is a single computer file that contains the data for several smaller files. A common analogy would be a cardboard box - it can be used to store a lot of different items (paper, food, objects), and each item can have different properties (size, color, shape)|
The question that may arise is "why do game developers use archives to store their game resources? WouldnÃ¢â‚¬â„¢t it be easier to just store all the files normally?" The answer is yes, storing the files normally would be much easier, and certainly much better during the game development, but before the final production they are packaged into archives for several reasonsÃ¢â‚¬Â¦
- An archive can store a lot of files in a single location, so it is quicker to access the files from a hard disk or CD
- A large archive, due to it being in 1 block on the disk, can utilise features such as file buffers, further increasing read performance
- It reduces the number of files on the disk, making the reading of the file index quicker
- The files can be hidden away, making it harder to hack or modify the game
- All files can be accessed using a single file stream, reducing the time required to generate file stream objects, and making the file access programming simpler
- Files can be compressed easily, and other information such as file descriptions and ID numbers can be stored
Purpose Of This Book
Unfortunately, there is a downside to using archives - there are no real standards defined for the creation and use of archives. In order to read or write archives for a particular game, someone usually needs to analyse the file themselves, or perform other complicated and time-consuming tasks such as reverse engineering or hex editing.
Some of the more modern games produced these days recognise that they can gain extra advertising by allowing the internet community to mod their games. Due to this, some game developers have changed to supporting standard archive types, such as Zip archives, however there is still an overwhelming number of games with their own proprietary archive formats.
|Mod, short for modification, refers to the alteration of a computer game by a member of the internet community, usually to support extra functionality or to generate a different game built on top of the original. Some examples include changing the sounds and textures used by a game, or creating new game maps.|
This book aims to provide an insight into the way game archives are created, and how to analyse an archive to locate the files contained within. In the following pages, we will discuss some of the basic fundamentals of computer-stored numbering, common structures used by most archives, compression, encryption, and the tools that you can use to help get the job done. Hopefully, by the time you have finished reading this book, you will be able to analyse your own archives, and take the first step towards your own development and game modding.
Thanks for reading our book, we wish you the best of luck in your exploration Ã¯ÂÅ .
Formatting Used In This Book
|Link||A link to a website of interest or for further information.|
|Link||A link to a different section of the document.|
|Term||An important term, or a term that is being defined.|
A general comment, or clarification of a point.
|Value||A value, usually in an example|
|Caption||Caption for an image, or a reference to some information in the image|
|Reference||A tool reference, such as a menu, button, or action in a specific program.|
|Brief descriptions of a term, related notes, or other supplementary material will be presented in a box like this. This will often accompany a term.|
What is a GRAF?
The term GRAF describes the way a game archive is constructed, and in particular, the storage of the files within the archive. The format of an archive usually differs between each individual game, however occasionally a game developer will stick with a particular format for a few games of the same vintage, particularly if the games are built using the same underlying game engine.
|GRAF stands for Game Resource Archive Format, which is most simply the specifications describing the format of a particular archive.|
Programmers usually define their GRAFs according to the needs and structure of the game itself. For example, the memory in an XBOX game console is based around blocks of 2048 bytes - the GRAFs for most XBOX games utilise this so the game data can be opened efficiently.
The development of a GRAF is particularly troublesome - there is a constant weigh-up between factors such as efficient storage, quick loading, and fast targeting. One of the things that has great influence is human readability - the things that make archives easy for humans to use, often make it less efficient. For example, the storing of filenames in an archive tells humans the purpose and type of data, however it is very inefficient and slow to read filenames from an archive - thus the weigh-up.
|Efficient storage: Files need to be stored in a way that conserves space on the disk and/or in memory.|
|Quick Loading: When the game is loading, the required resources are loaded into memory - this needs to be done quickly, while still gathering all the required information.|
|Fast Targeting: When a resource is loaded into memory, it needs to be quick and easy for the game to find the file. This is usually a big weigh-up between human readability (filenames) vs. computer efficiency (hash fields and trees).|
During the game development, the actual resources used by the game change frequently. To make it quick and easy to adapt the changes, the GRAF is usually structured following a common and recognisable pattern, some of which will be described in later chapters.
Tools of the Trade
The generic hex editor is the main type of program used to view data in non-text files, such as archives. Similarly to the way word processors display text data, a hex editor displays the contents of a file using hex characters.
|Hex characters are an alternate way to represent the byte data in a file. Whereas word processors display byte values as letters, hex displays each byte as a 2-character code that represents all possible values 0-256 (00-FF). The way to read and construct hex values is discussed in a later chapter|
There are literally hundreds of hex editors available for use - the one that you choose is your own personal taste. All hex editors have the same basic functionality, but some provide other tools and features that make it quicker and easier to work with files. Most hex editors are freely available over the Internet.
Following, we will provide a brief introduction into our own preferred programs, so you can see the general style and features available to you. This list is personal preference only - we encourage you to actively seek out your own preferred programs.
Hex Workshop from Breakpoint Software will be used for the examples and screenshots in this book, however the processes and screens should be similar across all hex editors. Hex Workshop includes several handy functions for analysis work, such as:
- A hexadecimal calculator
- Lists of the data types at the current location in the file
- Colour mapping.
|Hex Workshop is available from http://www.bpsoft.com|
While we encourage you to try many different programs, the one you ultimately choose should be based on your needs.
Here we present a brief introduction into the use of Hex Workshop. Although this is the main program that will be used for the screenshots in this book, take note that almost everything in this program can be applied to other hex editors, including the interface structure and layout.
n[[Image:## Error Converting ##]] File:Guide To Exploring File Formats - 011 - 01.png Figure 3.1.1a: General layout of Hex Workshop
A. Hexadecimal representation of the file content
B. ASCII interpretation of the file content
C. Different representations of the data at current cursor position
D. User-assigned bookmarks and their descriptions
When you have installed Hex Workshop, a convenience link is added to the context menu of Windows Explorer. Just right-click on a file and select "Edit with Hex Workshop" to open the file in the program
|The context menu is the menu that appears when you right-click in a Windows program. Named due to the fact that the links in the menu depend on the context of the right-click. For example, right-clicking on a file will give different choices to right-clicking on a selected piece of text.|
Once you have opened a file, you will be presented with a view similar to that depicted in Figure 3.1.1a. You can examine the files hexadecimal interpretation in section A, or the ASCII interpretation of the same bytes in section B. The table to the far left shows the offset of the lines shown.
|An offset is the location of the file data in relation to the start of the file. For example, an offset of value 560 means there are 560 bytes of data before you reach the current location.|
In this example, we have opened one of the *.pk4 files from the game Doom 3. We will later see that these are actually generic *.zip files. For now, you can see the file starts with the characters PK. The characters at the beginning of a file are often referred to as a header, ID tag, or magic number - and are usually a reliable way to identify whether the file is a common type. For example, all *.zip archives have the characters PK at the beginning, therefore there is a strong probability that the archive in our example is a *.zip archive. A brief list of some common header tags can be found later in the book.
|A header tag is simply a small group of bytes at the start of a file that help to identify the format of the remaining data. The header tag is usually a 4-byte string, however it can also be a preset set of byte values. While it is true that a filesÃ¢â‚¬â„¢ extension can help determine a file format, it is often unreliable and can be easily changed, whereas a header tag is hard to alter and is usually unique. In reality, the best way to determine a files format is to use a combination of the file extension and the header tag.|
The current position of the cursor in our example is at offset 18. The Data Interpreter in section C shows the different interpretations of the data at this file position, ranging from numbers to strings. The different data interpretations are covered more completely in a later chapter.
In our example image, we have color mapped and bookmarked (as in section D) some areas of our interest. Any range of bytes can be bookmarked or color mapped - simply click and drag the cursor along your area of interest and select the appropriate option from the context menu. When you make a bookmark, you can choose the data interpretation of the selection (its value), and give a description. The bookmarks will be shown with their offset in the file and the length in bytes. This is a very useful feature, as it allows you to click on a bookmark to jump to that offset.
|Color mapping: assigns a color to the selected area, to make it stand out.|
|Bookmarking: records the current cursor location in section D, with a user-defined description.|
Hex Workshop has the ability to save the bookmarks and color maps, so that you can load them on another file and see if the pattern matches. ie, if you have solved the pattern of a GRAF, you can apply the bookmarks and colour mapping to other files that you expect to have the same format.
Hex Workshop has another handy function - GoTo. If you select a range of bytes in a file and choose GoTo from the context menu, you can jump to the location identified by the selected value.