Difference between revisions of "DGTEFF"

From XentaxWiki
Line 140: Line 140:
  
 
During the game development, the actual resources used by the game change frequently. To make it quick and easy to adapt the changes, the GRAF is usually structured following a common and recognisable pattern, some of which will be described in later chapters.
 
During the game development, the actual resources used by the game change frequently. To make it quick and easy to adapt the changes, the GRAF is usually structured following a common and recognisable pattern, some of which will be described in later chapters.
== Tools ==
 
  
=== Hex Editors ===
+
==Tools of the Trade==
 +
===Hex Editors===
 +
The generic hex editor is the main type of program used to view data in non-text files, such as archives. Similarly to the way word processors display text data, a hex editor displays the contents of a file using <font color="#008000">''hex''</font> characters.
 +
 
 +
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%"
 +
|align = "justify" bgcolor = "#E6E6E6"|<font color="#008000">Hex</font> characters are an alternate way to represent the byte data in a file. Whereas word processors display byte values as letters, hex displays each byte as a 2-character code that covers all possible byte values 0-255 (00-FF). The way to read and construct hex values is discussed in a later chapter.
 +
|-
 +
|}
 +
 
 +
There are literally hundreds of hex editors available for use - the one that you choose is your own personal taste. All hex editors have the same basic functionality, but some provide other tools and features that make it quicker and easier to work with files. Most hex editors are freely available over the Internet.
 +
 
 +
Following, we will provide a brief introduction into our own preferred programs, so you can see the general style and features available to you. This list is personal preference only - we encourage you to actively seek out your own preferred programs.
 +
 
 +
Mike<nowiki>’</nowiki>s editor of choice is <font color="#FF6600">''Hex Workshop''</font><font color="#800000"> </font>from Breakpoint Software. This editor will be used for the examples and screenshots in this book; however, the processes and screens should be similar across all hex editors. Hex Workshop includes several handy functions for analysis work, such as:
 +
 
 +
* A hexadecimal calculator
 +
* Lists of the data types at the current location in the file
 +
* Bookmarking
 +
* Colour mapping.
 +
 
 +
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%"
 +
|align = "justify" bgcolor = "#E6E6E6"|<font color="#FF6600">Hex Workshop</font> is available from <font color="#0000FF"><u>http://www.bpsoft.com</u></font>
 +
|-
 +
|}
 +
 
 +
WATTO<nowiki>’</nowiki>s editor of choice is his homemade program <font color="#FF6600">''Total Byte Informer''</font>. This program was developed specifically for analysis of files, and as such it carries a different focus and set of tools:
 +
 
 +
* Display 2 files side-by-side for easy comparison
 +
* View bytes as either byte values, hex, character, or color shades (or combinations of displays).
 +
* Coloring of null values for easy pattern identification
 +
* Lists the data types at the current location
 +
* Quick conversion between little-big endian formats
 +
 
 +
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%"
 +
|align = "justify" bgcolor = "#E6E6E6"|<font color="#FF6600">Total Byte Informer</font> is free for use, and includes the source code. It can be downloaded from <font color="#0000FF"><u>http://www.watto.org/program/java.html</u></font>
 +
|-
 +
|}
 +
 
 +
Both tools should also be used with the <font color="#FF6600">''Windows Calculator''</font> - it has a nice, simple interface and can convert values to/from hex, binary, and octal.
 +
 
 +
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%"
 +
|align = "justify" bgcolor = "#E6E6E6"|<font color="#FF6600">Windows Calculator</font> comes with all versions of Windows, and can be found either in the start menu, or at <font color="#0000FF"><u>c:\windows\system32\calc.exe</u></font>
 +
|-
 +
|}
 +
 
 +
While we encourage you to try many different programs, the one you ultimately choose should be based on your needs. For example, if you are devoted specifically to exploring archives, Total Byte Informer could be for you. However if you intend to explore all different types of files, or delve into areas such as compression and encryption, a general hex editor like Hex Workshop would be more beneficial. The choice is yours.
 +
 
 +
====Hex Workshop====
 +
Here we present a brief introduction into the use of <font color="#FF6600">''Hex Workshop''</font>. Although this is the main program that will be used for the screenshots in this book, take note that almost everything in this program can be applied to other hex editors, including the interface structure and layout.
 +
 
 +
[[Image:Guide_To_Exploring_File_Formats_-_011_-_01.png]]
 +
<font color="#000080">'''Figure 3.1.1a: General layout of Hex Workshop'''</font>
 +
 
 +
''A. ''Hexadecimal representation of the file content
 +
 
 +
''B. ''ASCII interpretation of the file content
 +
 
 +
''C. ''Different representations of the data at current cursor position
 +
 
 +
''D. ''User-assigned bookmarks and their descriptions
 +
 
 +
When you have installed Hex Workshop, a convenience link is added to the ''context menu'' of Windows Explorer. Just right-click on a file and select "Edit with Hex Workshop" to open the file in the program
 +
 
 +
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%"
 +
|align = "justify" bgcolor = "#E6E6E6"|The context menu is the menu that appears when you right-click in a Windows program. It is so named because the links in the menu depend on the context of the right-click. For example, right-clicking on a file will give different choices to right-clicking on a selected piece of text.
 +
|-
 +
|}
 +
 
 +
Once you have opened a file, you will be presented with a view similar to that depicted in '''Figure 3.1.1a'''. You can examine the file<nowiki>’</nowiki>s hexadecimal interpretation in section '''A''', or the ASCII interpretation of the same bytes in section '''B'''. The column to the far left shows the ''offset ''of each line shown.
 +
 
 +
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%"
 +
|align = "justify" bgcolor = "#E6E6E6"|An offset is the location of the file data in relation to the start of the file. For example, an offset of value 560 means there are 560 bytes of data before you reach the current location.
 +
|-
 +
|}
 +
 
 +
In this example, we have opened one of the <nowiki>*</nowiki>.pk4 files from the game Doom 3. We will later see that these are actually generic <nowiki>*</nowiki>.zip files. For now, you can see the file starts with the characters PK. The characters at the beginning of a file are often referred to as a ''header'', ''ID tag'', or ''magic number'' - and are usually a reliable way to identify whether the file is a common type. For example, all <nowiki>*</nowiki>.zip archives have the characters PK at the beginning, therefore there is a strong probability that the archive in our example is a <nowiki>*</nowiki>.zip archive. A brief list of some common ''header tags'' can be found later in the book.
 +
 
 +
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%"
 +
|align = "justify" bgcolor = "#E6E6E6"|A ''header tag'' is simply a small group of bytes at the start of a file that helps to identify the format of the remaining data. The header tag is usually a 4-byte string, however it can also be a preset set of byte values. While it is true that a file<nowiki>’</nowiki>s extension can help determine a file format, it is often unreliable and can be easily changed, whereas a header tag is rarely altered and is usually unique. In reality, the best way to determine a file<nowiki>’</nowiki>s format is to use a combination of the file extension and the header tag.
 +
|-
 +
|}
 +
 
 +
The current position of the cursor in our example is at offset 18. The Data Interpreter in section '''C''' shows the different interpretations of the data at this file position, ranging from numbers to strings. The different data interpretations are covered more completely in a later chapter.
 +
 
 +
In our example image, we have ''color mapped'' and ''bookmarked'' (as in section '''D''') some areas of our interest. Any range of bytes can be bookmarked or color mapped - simply click and drag the cursor along your area of interest and select the appropriate option from the context menu. When you make a bookmark, you can choose the data interpretation of the selection (its ''value''), and give a ''description''. The bookmarks will be shown with their ''offset ''in the file and the ''length'' in bytes. This is a very useful feature, as it allows you to click on a bookmark to jump to that offset.
 +
 
 +
{|cellspacing="0" cellpadding = "0" style="border-style:solid; border-width:1px; border-collapse:collapse" width="100%"
 +
|align = "justify" bgcolor = "#E6E6E6"|Color mapping: assigns a color to the selected area, to make it stand out.
 +
|-
 +
|align = "justify" bgcolor = "#E6E6E6"|&nbsp;
 +
|-
 +
|align = "justify" bgcolor = "#E6E6E6"|Bookmarking: records the current cursor location in section D, with a user-defined description.
 +
|-
 +
|}
 +
 
 +
Hex Workshop has the ability to save the bookmarks and color maps, so that you can load them on another file and see if the pattern matches. ie, if you have solved the pattern of a GRAF, you can apply the bookmarks and colour mapping to other files that you expect to have the same format.
 +
 
 +
Hex Workshop has another handy function - <font color="#008000">''GoTo''</font>. If you select a range of bytes in a file and choose GoTo from the context menu, you can jump to the location identified by the selected value.
  
=== Hex Workshop ===
 
  
 
== Terms, Definitions and Data Structures ==
 
== Terms, Definitions and Data Structures ==

Revision as of 14:12, 17 October 2006

This document explains in detail how to start exploring and examining file formats, with a focus on Game Resource Archives. For beginners and advanced users alike.
The definitive word in archive exploration.

Download below, or scroll on down and read it here:

DGTEFF as PDF

DGTEFF as ZIPPED PDF

Authors: Mr.Mouse and Watto

Version: 1.0 as of November 2004

Rewritten for the WIKI by Dinoguy1000 as of August 2006


Title page


THE DEFINITIVE GUIDE TO
EXPLORING FILE FORMATS
 

= Revision 2 =

WATTO

(www.watto.org)

Mike Zuurman

(www.xentax.com)

Table of Contents

Introduction

General Introduction

Computer games are vast and many, covering a wide range of genres and game styles, but there is one fundamental feature that all games require - resources. Every game has a range of resources that help make it unique - from texture images to audio soundtracks. With all these resources, there needs to be a way they can be stored so that games can use them, and the way this is typically done is to store them in a big archive file.

An archive is a single computer file that contains the data for several smaller files. A common analogy would be a cardboard box - it can be used to store a lot of different items (paper, food, objects), and each item can have different properties (size, color, shape).

The question that may arise is "why do game developers use archives to store their game resources? Wouldn’t it be easier to just store all the files normally?" The answer is yes, storing the files normally would be much easier, and certainly much better during the game development, but before the final production they are packaged into archives for several reasons…

  • An archive can store a lot of files in a single location, so it is quicker to access the files from a hard disk or CD
  • A large archive, due to it being in 1 block on the disk, can utilise features such as file buffers, further increasing read performance
  • It reduces the number of files on the disk, making the reading of the file index quicker
  • The files can be hidden away, making it harder to hack or modify the game
  • All files can be accessed using a single file stream, reducing the time required to generate file stream objects, and making the file access programming simpler
  • Files can be compressed easily, and other information such as file descriptions and ID numbers can be stored
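
The idea of several files living inside one archive, all read through a single file stream, can be sketched in a few lines of Python. The layout used here (a file count, then a name, offset and length for each file, then the raw file data) is made up purely for illustration and is not the format of any particular game; the function names are likewise hypothetical.

```python
import io
import struct

def pack_archive(files):
    """Pack a dict of name -> bytes into one blob, using a made-up layout."""
    # The directory size is needed first so the data offsets can be computed.
    dir_size = 4 + sum(1 + len(name.encode("ascii")) + 8 for name in files)
    directory = io.BytesIO()
    directory.write(struct.pack("<I", len(files)))        # number of files
    offset = dir_size
    for name, payload in files.items():
        raw_name = name.encode("ascii")
        directory.write(struct.pack("<B", len(raw_name)) + raw_name)
        directory.write(struct.pack("<II", offset, len(payload)))
        offset += len(payload)
    return directory.getvalue() + b"".join(files.values())

def read_file(stream, wanted):
    """Find one file by walking the directory, then seek to its data."""
    stream.seek(0)
    (count,) = struct.unpack("<I", stream.read(4))
    for _ in range(count):
        (name_len,) = struct.unpack("<B", stream.read(1))
        name = stream.read(name_len).decode("ascii")
        offset, length = struct.unpack("<II", stream.read(8))
        if name == wanted:
            stream.seek(offset)
            return stream.read(length)
    return None

archive = pack_archive({"a.txt": b"hello", "b.bin": b"\x00\x01\x02"})
stream = io.BytesIO(archive)          # stands in for one open archive file
print(read_file(stream, "b.bin"))
```

Only one stream is ever opened, no matter how many files the archive holds, which is the point made in the list above.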

Purpose Of This Book

Unfortunately, there is a downside to using archives - there are no real standards defined for the creation and use of archives. In order to read or write archives for a particular game, someone usually needs to analyse the file themselves, or perform other complicated and time-consuming tasks such as reverse engineering or hex editing.

Some of the more modern games produced these days recognise that they can gain extra advertising by allowing the internet community to mod their games. Due to this, some game developers have changed to supporting standard archive types, such as Zip archives, however there is still an overwhelming number of games with their own proprietary archive formats.

Mod, short for modification, refers to the alteration of a computer game by a member of the internet community, usually to support extra functionality or to generate a different game built on top of the original. Some examples include changing the sounds and textures used by a game, or creating new game maps.

This book aims to provide an insight into the way game archives are created, and how to analyse an archive to locate the files contained within. In the following pages, we will discuss some of the basic fundamentals of computer-stored numbering, common structures used by most archives, compression, encryption, and the tools that you can use to help get the job done. Hopefully, by the time you have finished reading this book, you will be able to analyse your own archives, and take the first step towards your own development and game modding.

Thanks for reading our book. We wish you the best of luck in your exploration.

Formatting Used In This Book

Link: A link to a website of interest or for further information.
Link: A link to a different section of the document.
Term: An important term, or a term that is being defined.
A general comment, or clarification of a point.
Value: A value, usually in an example.
Caption: Caption for an image, or a reference to some information in the image.
Reference: A tool reference, such as a menu, button, or action in a specific program.


Brief descriptions of a term, related notes, or other supplementary material will be presented in a box like this. This will often accompany a term.


     

What is a GRAF?

The term GRAF describes the way a game archive is constructed, and in particular, the storage of the files within the archive. The format of an archive usually differs between each individual game, however occasionally a game developer will stick with a particular format for a few games of the same vintage, particularly if the games are built using the same underlying game engine.

GRAF stands for Game Resource Archive Format. Put simply, it is the specification that describes the format of a particular archive.

Programmers usually define their GRAFs according to the needs and structure of the game itself. For example, the memory in an XBOX game console is based around blocks of 2048 bytes - the GRAFs for most XBOX games utilise this so the game data can be opened efficiently.
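
As a rough illustration of what block alignment means in practice, the sketch below rounds an offset up to the next multiple of 2048 bytes. The block size comes from the paragraph above; the function names are hypothetical, and real archives may pad in other ways.

```python
BLOCK_SIZE = 2048  # block size mentioned in the text above

def padding_needed(length, block_size=BLOCK_SIZE):
    """Number of filler bytes required to reach the next block boundary."""
    return (-length) % block_size

def aligned_offset(offset, block_size=BLOCK_SIZE):
    """Round an offset up to the next multiple of block_size."""
    return offset + padding_needed(offset, block_size)

print(padding_needed(5000))   # filler bytes after a 5000-byte file
print(aligned_offset(5000))   # where the next file would start
```

When a run of zero bytes follows each file and every file starts at a multiple of the same number, this kind of padding is a likely explanation.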

The development of a GRAF is particularly troublesome - there is a constant weigh-up between factors such as efficient storage, quick loading, and fast targeting. One factor with great influence is human readability - the things that make archives easy for humans to use often make them less efficient. For example, storing filenames in an archive tells humans the purpose and type of the data, but reading filenames from an archive is slow and inefficient - thus the weigh-up.

Efficient storage: Files need to be stored in a way that conserves space on the disk and/or in memory.
 
Quick Loading: When the game is loading, the required resources are loaded into memory - this needs to be done quickly, while still gathering all the required information.
 
Fast Targeting: When a resource is loaded into memory, it needs to be quick and easy for the game to find the file. This is usually a big weigh-up between human readability (filenames) vs. computer efficiency (hash fields and trees).
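
The weigh-up between filenames and hash fields can be pictured with a small sketch. Here CRC32 from Python's standard zlib module stands in for whatever hash a real game might use, and the file names, offsets and lengths are invented for the example.

```python
import zlib

def name_hash(filename):
    """Reduce a filename to a 32-bit number (CRC32 is an assumed choice)."""
    return zlib.crc32(filename.lower().encode("ascii"))

# Human-readable directory: name -> (offset, length). Values are invented.
entries = {
    "textures/wall.tga": (4096, 1200),
    "sound/door.wav": (8192, 560),
}

# Computer-friendly directory: fixed-size hash -> (offset, length).
by_hash = {name_hash(name): location for name, location in entries.items()}

def find(filename):
    """Look a file up by hash; the archive never needs to store the name."""
    return by_hash.get(name_hash(filename))

print(find("Sound/Door.WAV"))
```

An archive built this way is quick for the game to search but much harder for a person to read, because the original filenames may not be stored at all.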

During the game development, the actual resources used by the game change frequently. To make it quick and easy to adapt the changes, the GRAF is usually structured following a common and recognisable pattern, some of which will be described in later chapters.

Tools of the Trade

Hex Editors

The generic hex editor is the main type of program used to view data in non-text files, such as archives. Similarly to the way word processors display text data, a hex editor displays the contents of a file using hex characters.

Hex characters are an alternate way to represent the byte data in a file. Whereas word processors display byte values as letters, hex displays each byte as a 2-character code that covers all possible byte values 0-255 (00-FF). The way to read and construct hex values is discussed in a later chapter.
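
As a small illustration, the following Python sketch prints a handful of bytes the way a hex editor would show them, one 2-character code per byte. The function name is made up for this example.

```python
def to_hex(data):
    """Render each byte as a 2-character hex code, separated by spaces."""
    return " ".join(f"{byte:02X}" for byte in data)

print(to_hex(b"PK\x03\x04"))      # letters and non-printable bytes alike
print(to_hex(bytes([0, 255])))    # the smallest and largest byte values
```

A byte can hold 256 different values, numbered 0 to 255, which is why two hex digits (00 to FF) are always enough.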

There are literally hundreds of hex editors available - the one that you choose is a matter of personal taste. All hex editors have the same basic functionality, but some provide other tools and features that make it quicker and easier to work with files. Most hex editors are freely available over the Internet.

Below, we provide a brief introduction to our own preferred programs, so you can see the general style and features available to you. This list reflects personal preference only - we encourage you to actively seek out your own preferred programs.

Mike’s editor of choice is Hex Workshop from Breakpoint Software. This editor will be used for the examples and screenshots in this book; however, the processes and screens should be similar across all hex editors. Hex Workshop includes several handy functions for analysis work, such as:

  • A hexadecimal calculator
  • Lists of the data types at the current location in the file
  • Bookmarking
  • Colour mapping.
Hex Workshop is available from http://www.bpsoft.com

WATTO’s editor of choice is his homemade program Total Byte Informer. This program was developed specifically for analysis of files, and as such it carries a different focus and set of tools:

  • Display 2 files side-by-side for easy comparison
  • View bytes as either byte values, hex, character, or color shades (or combinations of displays).
  • Coloring of null values for easy pattern identification
  • Lists the data types at the current location
  • Quick conversion between little-endian and big-endian formats
Total Byte Informer is free for use, and includes the source code. It can be downloaded from http://www.watto.org/program/java.html
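
The little-endian and big-endian conversion mentioned in the list above can be tried out with Python's standard struct module. This is only a sketch of the idea, not how Total Byte Informer itself works; the helper name swap32 is made up.

```python
import struct

raw = bytes([0x30, 0x02, 0x00, 0x00])      # four bytes as they sit in a file

little = struct.unpack("<I", raw)[0]       # least significant byte first
big = struct.unpack(">I", raw)[0]          # most significant byte first
print(little, big)

def swap32(value):
    """Reverse the byte order of a 32-bit unsigned number."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")

print(swap32(little) == big)
```

The same four bytes give two very different numbers, so a value that looks implausibly large is often just being read in the wrong byte order.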

Both tools should also be used with the Windows Calculator - it has a nice, simple interface and can convert values to/from hex, binary, and octal.

Windows Calculator comes with all versions of Windows, and can be found either in the start menu, or at c:\windows\system32\calc.exe
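
If no calculator is at hand, the same conversions can be done with a few lines of Python. The sketch below uses only built-in functions; the function name is chosen for the example.

```python
def to_bases(value):
    """Show one number in hexadecimal, binary and octal."""
    return {
        "hex": format(value, "X"),
        "binary": format(value, "b"),
        "octal": format(value, "o"),
    }

print(to_bases(560))
print(int("FF", 16))            # hexadecimal text back to a number
print(int("1000110000", 2))     # binary text back to a number
```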

While we encourage you to try many different programs, the one you ultimately choose should be based on your needs. For example, if you are devoted specifically to exploring archives, Total Byte Informer could be for you. However, if you intend to explore many different types of files, or delve into areas such as compression and encryption, a general hex editor like Hex Workshop would be more beneficial. The choice is yours.

Hex Workshop

Here we present a brief introduction to the use of Hex Workshop. Although this is the main program used for the screenshots in this book, note that almost everything shown here, including the interface structure and layout, applies to other hex editors as well.

File:Guide To Exploring File Formats - 011 - 01.png

Figure 3.1.1a: General layout of Hex Workshop

A. Hexadecimal representation of the file content

B. ASCII interpretation of the file content

C. Different representations of the data at current cursor position

D. User-assigned bookmarks and their descriptions

When you have installed Hex Workshop, a convenience link is added to the context menu of Windows Explorer. Just right-click on a file and select "Edit with Hex Workshop" to open the file in the program.

The context menu is the menu that appears when you right-click in a Windows program. It is so named because the links in the menu depend on the context of the right-click. For example, right-clicking on a file will give different choices to right-clicking on a selected piece of text.

Once you have opened a file, you will be presented with a view similar to that depicted in Figure 3.1.1a. You can examine the file’s hexadecimal interpretation in section A, or the ASCII interpretation of the same bytes in section B. The column to the far left shows the offset of each line shown.

An offset is the location of the file data in relation to the start of the file. For example, an offset of value 560 means there are 560 bytes of data before you reach the current location.
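
The three columns described above (offset, hex and ASCII) are easy to reproduce. The following Python sketch builds a simple hex dump; the layout is a generic one and is not meant to match Hex Workshop exactly.

```python
def hex_dump(data, width=16):
    """Return lines of 'offset  hex bytes  ASCII text', width bytes per line."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{byte:02X}" for byte in chunk)
        # Non-printable bytes are shown as a dot, as most hex editors do.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return lines

for line in hex_dump(b"PK\x03\x04" + b"A" * 16):
    print(line)
```

The offset at the start of each line is simply the number of bytes that come before the first byte on that line.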

In this example, we have opened one of the *.pk4 files from the game Doom 3. We will later see that these are actually generic *.zip files. For now, you can see the file starts with the characters PK. The characters at the beginning of a file are often referred to as a header, ID tag, or magic number - and are usually a reliable way to identify whether the file is a common type. For example, all *.zip archives have the characters PK at the beginning, therefore there is a strong probability that the archive in our example is a *.zip archive. A brief list of some common header tags can be found later in the book.

A header tag is simply a small group of bytes at the start of a file that helps to identify the format of the remaining data. The header tag is usually a 4-byte string, however it can also be a preset set of byte values. While it is true that a file’s extension can help determine a file format, it is often unreliable and can be easily changed, whereas a header tag is rarely altered and is usually unique. In reality, the best way to determine a file’s format is to use a combination of the file extension and the header tag.
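
Checking a header tag by program could look something like the sketch below. The table holds only two well-known tags (the ZIP local file header and the Quake PAK tag) and is nowhere near complete; the function names are invented for the example.

```python
KNOWN_TAGS = {
    b"PK\x03\x04": "ZIP archive",
    b"PACK": "Quake PAK archive",
}

def identify(first_bytes):
    """Return a format name if the data starts with a known tag, else None."""
    for tag, description in KNOWN_TAGS.items():
        if first_bytes.startswith(tag):
            return description
    return None

def identify_file(path):
    """Read the first 4 bytes of a file and look them up."""
    with open(path, "rb") as handle:
        return identify(handle.read(4))

print(identify(b"PK\x03\x04\x14\x00"))
print(identify(b"XXXX"))
```

A match is a strong hint rather than proof, so it is worth checking the file extension and the rest of the file as well.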

The current position of the cursor in our example is at offset 18. The Data Interpreter in section C shows the different interpretations of the data at this file position, ranging from numbers to strings. The different data interpretations are covered more completely in a later chapter.
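
To get a feel for what a data interpreter does, the sketch below reads the bytes at one offset as several different number types using Python's struct module. It assumes little-endian byte order and covers only a few types; the function name and labels are made up and do not correspond to Hex Workshop's own list.

```python
import struct

FORMATS = (
    ("int8", "<b"), ("uint8", "<B"),
    ("int16", "<h"), ("uint16", "<H"),
    ("int32", "<i"), ("uint32", "<I"),
    ("float32", "<f"),
)

def interpret(data, offset):
    """Read the bytes at offset as each number type that still fits."""
    views = {}
    for label, fmt in FORMATS:
        if offset + struct.calcsize(fmt) <= len(data):
            views[label] = struct.unpack_from(fmt, data, offset)[0]
    return views

sample = bytes(18) + b"\xFF\xFF\xFF\xFF"   # 18 zero bytes, then four FF bytes
for label, value in interpret(sample, 18).items():
    print(label, value)
```

The bytes never change; only the interpretation does, which is why the same location can read as -1, 255, 65535 or 4294967295.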

In our example image, we have color mapped and bookmarked (as in section D) some areas of interest. Any range of bytes can be bookmarked or color mapped - simply click and drag the cursor along your area of interest and select the appropriate option from the context menu. When you make a bookmark, you can choose the data interpretation of the selection (its value), and give a description. The bookmarks will be shown with their offset in the file and the length in bytes. This is a very useful feature, as it allows you to click on a bookmark to jump to that offset.

Color mapping: assigns a color to the selected area, to make it stand out.
 
Bookmarking: records the current cursor location in section D, with a user-defined description.

Hex Workshop has the ability to save the bookmarks and color maps, so that you can load them on another file and see if the pattern matches. For example, if you have solved the pattern of a GRAF, you can apply the bookmarks and colour mapping to other files that you expect to have the same format.
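
The idea of reusing a set of bookmarks on another file can be sketched as a list of offsets, data types and descriptions that is applied to any data with the same layout. The three fields below are a hypothetical example and are not taken from the figure.

```python
import struct

# Each bookmark: (offset, struct format, description). Hypothetical layout.
BOOKMARKS = [
    (0, "4s", "header tag"),
    (4, "<I", "directory offset"),
    (8, "<I", "directory length"),
]

def apply_bookmarks(data, bookmarks):
    """Read the value under each bookmark from a block of file data."""
    return {
        description: struct.unpack_from(fmt, data, offset)[0]
        for offset, fmt, description in bookmarks
    }

first_file = b"PACK" + struct.pack("<II", 12, 64)
second_file = b"PACK" + struct.pack("<II", 500, 128)
print(apply_bookmarks(first_file, BOOKMARKS))
print(apply_bookmarks(second_file, BOOKMARKS))
```

If the values read from the second file look sensible, it probably shares the format of the first.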

Hex Workshop has another handy function - GoTo. If you select a range of bytes in a file and choose GoTo from the context menu, you can jump to the location identified by the selected value.
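
In code, the same idea amounts to reading a number from the file and then seeking to that position. The sketch below assumes the selected bytes hold a 4-byte little-endian offset; the function name go_to is made up and the sample data is invented.

```python
import io
import struct

def go_to(stream, selection_offset, size=4, byteorder="little"):
    """Read a number at selection_offset and move the stream to that offset."""
    stream.seek(selection_offset)
    target = int.from_bytes(stream.read(size), byteorder)
    stream.seek(target)
    return target

# A tiny invented file: a 4-byte offset, 4 filler bytes, then the data.
sample = io.BytesIO(struct.pack("<I", 8) + b"XXXX" + b"DATA")
print(go_to(sample, 0))     # the offset stored in the first four bytes
print(sample.read(4))       # whatever sits at that offset
```

If the jump lands on something recognisable, such as a filename or a header tag, the selected bytes are very likely an offset field.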


Terms, Definitions and Data Structures

Files

Bits

Bytes

16-bit (2-byte) numbers

32-bit (4-byte) numbers

64-bit (8-byte) numbers

Strings

Hexadecimal Numbering

Signed and Unsigned Numbers

Big-Endian and Little-Endian

File Offsets

Archive Patterns

Directory Archives

Tree Archives

Chunked Archives

Split Chunk Archives

External Directory Archives

Checking Your Results

Common Types of Fields

Validating Your Fields

Padding

Filename Patterns

Encryption and Compression

The Basics

XOR

NOT

SHL

SHR

Encryption

Painkiller Encryption

Compression

Worked Examples

Quake *.PAK

Appendix

Binary -> Byte Number Table

American Standard Code for Information Interchange (ASCII) Table

Formats of some Common Game Archives

Useful References

Common File Format Tags

Legal Information