On the Greenhouse forums, someone brought up the topic of making mods for the Penny Arcade game. At that point, someone had already used some FMOD tools to extract and re-package the sound files for the game, but the rest of the game data was packed into ".hha" files. Since those didn't seem to be in any known format, I started trying to decode the .HHA ("Hothead Archive", I'm assuming) format, with some help from Jon. I got distracted before I finished (weddings are so time-consuming), and by the time I got back from vacation, someone else had released source code to extract all files. I lost interest pretty quickly after that.
On a whim, I decided to pick up my little project and do something useful with it. Part 1 (this article) is about my first round of research. Part 2 will be about adding support to PhysicsFS (I have a few bugs to work out before I release that part). I might write a part 3 about actually doing something with the "Rain-slick Precipice of Darkness" files, or creating an archive, or something.
Info/header
Blocks represent DWORDs (aka 32-bit integers):
  0                1               2               3
+----------------+---------------+---------------+--------------+
| MAGIC          | VERSION       | FILENAMES SZ  | FILE ENTRIES |
+----------------+---------------+---------------+--------------+

Like most file formats, HHA starts with a "magic" number. This is 0xAC2FF34F, when read as little-endian, like all the numbers in the format. I'm just assuming the next DWORD is a version number, which is 0x00010000 in all the files I've seen. It probably represents something like "0.1.0.0", but I'm not sure. The next number is the size of the filenames list that follows (again, in LE). The last number in the header is the number of files contained in the archive.
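As a rough sketch of the header layout described above, this C reads the four little-endian DWORDs out of a 16-byte buffer and checks the magic number. The names `hha_header`, `hha_read_le32`, and `hha_parse_header` are made up for illustration and don't come from any existing extractor; the "version" field is still just my guess at what that DWORD means.

```c
#include <stddef.h>
#include <stdint.h>

#define HHA_MAGIC       0xAC2FF34Fu  /* magic number, read as little-endian */
#define HHA_HEADER_SIZE 16u          /* four DWORDs */

/* Hypothetical in-memory form of the header. */
struct hha_header {
    uint32_t magic;
    uint32_t version;        /* assumed to be a version; 0x00010000 so far */
    uint32_t filenames_size; /* size of the filename list, in bytes */
    uint32_t file_count;     /* number of file entries in the archive */
};

/* Assemble a little-endian DWORD byte by byte, so host endianness
   doesn't matter. */
static uint32_t hha_read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if the buffer is too short or the magic
   number doesn't match. */
static int hha_parse_header(const unsigned char *buf, size_t len,
                            struct hha_header *out)
{
    if (len < HHA_HEADER_SIZE)
        return -1;
    out->magic          = hha_read_le32(buf);
    out->version        = hha_read_le32(buf + 4);
    out->filenames_size = hha_read_le32(buf + 8);
    out->file_count     = hha_read_le32(buf + 12);
    return out->magic == HHA_MAGIC ? 0 : -1;
}
```

Reading byte by byte instead of casting the buffer to a struct avoids alignment and endianness surprises on non-x86 machines.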
Filename list
After the header is a null-delimited list of file and directory names. It's meant to be referenced by an offset that is stored with the file's other metadata (better explanation in the next section).
File metadata
Following the list of filenames is an array of 6-integer groups of file metadata, best represented by this structure:
    //file metadata structure
    struct hha_file_info {
        int dir;       //directory name (offset into filename list)
        int name;      //file (offset into filename list)
        int compress;  //compression level (0-2)
        int offset;    //offset from start of file
        int full_len;  //uncompressed file size
        int len;       //file size in archive
    };

(some day, this blog will have syntax highlighting)
To get the directory name, add the first number to the starting address of the filenames list. Read from there until you get to a null, like any C string. Do the same for the file name.
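The lookup described above might be sketched like this, assuming the whole filename list has already been read into memory. `hha_name_at` is a hypothetical helper name; the bounds checks are my own addition, so a bad offset or a missing terminator in a damaged archive can't run the read off the end of the list.

```c
#include <stddef.h>
#include <string.h>

/*
 * Look up a name in the filename list by its offset.
 * `names` points at the start of the list and `names_size` is the
 * FILENAMES SZ value from the header. Returns NULL if the offset is out
 * of range or no terminating null is found before the end of the list.
 */
static const char *hha_name_at(const char *names, size_t names_size,
                               size_t offset)
{
    if (offset >= names_size)
        return NULL;
    if (memchr(names + offset, '\0', names_size - offset) == NULL)
        return NULL;
    return names + offset;
}
```

Calling it once with the entry's `dir` value and once with its `name` value gives the two halves of the file's path.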
The compression level can be 0 (no compression), 1 (deflate/zlib), or 2 (LZMA). The offset determines where the file data begins. The next two numbers are the uncompressed and compressed sizes of the file (these numbers will, of course, be identical for uncompressed files).
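Here's a minimal sketch of decoding one 24-byte metadata entry and sanity-checking it against the rules above. All the `hha_`-prefixed names are hypothetical, and the fields are unsigned here, which is an assumption on my part. Since the metadata array follows the filename list, I'd expect it to start at byte 16 + FILENAMES SZ, and I'm taking "offset" to mean an offset from the start of the archive.

```c
#include <stddef.h>
#include <stdint.h>

#define HHA_ENTRY_SIZE 24u  /* six DWORDs per file entry */

/* Compression types as described above. */
enum hha_compression {
    HHA_COMP_NONE    = 0,
    HHA_COMP_DEFLATE = 1,  /* deflate/zlib */
    HHA_COMP_LZMA    = 2
};

/* Same fields as hha_file_info, with fixed-width unsigned types. */
struct hha_entry {
    uint32_t dir, name, compress, offset, full_len, len;
};

static uint32_t hha_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the entry at `index` from the raw metadata array. The caller
   must make sure `table` holds at least index + 1 entries. */
static void hha_decode_entry(const unsigned char *table, size_t index,
                             struct hha_entry *out)
{
    const unsigned char *p = table + index * HHA_ENTRY_SIZE;
    out->dir      = hha_le32(p);
    out->name     = hha_le32(p + 4);
    out->compress = hha_le32(p + 8);
    out->offset   = hha_le32(p + 12);
    out->full_len = hha_le32(p + 16);
    out->len      = hha_le32(p + 20);
}

/* Returns 1 if the entry looks consistent and its data lies inside the
   archive, 0 otherwise. */
static int hha_entry_is_sane(const struct hha_entry *e, uint64_t archive_size)
{
    if (e->compress > HHA_COMP_LZMA)
        return 0;  /* unknown compression type */
    if (e->compress == HHA_COMP_NONE && e->len != e->full_len)
        return 0;  /* uncompressed sizes must match */
    return (uint64_t)e->offset + e->len <= archive_size;
}
```

An extractor would then switch on `compress`: copy `len` bytes straight out for 0, and hand the data to a zlib or LZMA decoder for 1 or 2, expecting `full_len` bytes back.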
Before the above-mentioned vacation and loss of interest, I had yet to actually try to figure out compression types 1 & 2. Credit goes to Maks Verver [ maksverver (at) geocities.com ] for figuring that out.
Source code or it didn't happen
I've posted my original code on our Google Code project, though I don't recommend actually using it. It will only extract uncompressed files from the archive. Maks Verver posted a better version that supports the two compression types and can also create archives. For now, I would use his version instead, if you just want to play with your Penny Arcade game files.
Credits/disclaimer
Just to be clear, the author (me) does NOT work for Hothead Games, Greenhouse Interactive, Penny Arcade, etc. Any/all trademarks mentioned in this article belong to one of them, and the author is not associated with or endorsed by any of them in any way.
Also, buy their games. If you're a Linux geek/gamer like me, you know there are few games that are released with native Linux clients (and OS X, if you're into that sort of thing). Of course, the games are a lot of fun, too.