Difference between revisions of "Compression"
(Created page with "Terranigma uses a compression for most of the data. I re'ed it from the code and translated it to C code. So there won't be a real explanation on how it works. You can downl...") |
(fixed link) |
||
(4 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
Terranigma uses a compression for most of the data. | Terranigma uses a compression for most of the data. | ||
== Format == | |||
You can download a tool with the source to decompress data from any given offset here | Compressed data always starts with a 4 byte header: | ||
{| class="wikitable" style="font-family:Lucida Console;" | |||
|- | |||
! Offset !! Length !! Description | |||
|- | |||
| 0 || 1 || Chunk Counter ??, always 0 (not used in the game) | |||
|- | |||
| 1 || 2 || Uncompressed data size (LE) | |||
|- | |||
| 3 || 1 || First byte of uncompressed data | |||
|} | |||
After that follows the compressed data. | |||
It consists of data bytes intersparsed with a control code bitstream, both use MSB first (BE) encoding. | |||
Data bytes are always aligned at byte boundary. | |||
The control code stream is fit in between the data bytes so that a new cc byte is only placed | |||
when a bit needs to be used and the previous cc byte is full. | |||
{| class="wikitable" style="font-family:Lucida Console;" | |||
|+ Terranigma Compression Format | |||
|- | |||
! style="background:#F2F299;" | cc | |||
! style="background:#F2F299;" | cc | |||
! style="background:#F2F299;" | cc | |||
! style="background:#F2F299;" | cc | |||
! 8 bit | |||
! 5 bit | |||
! 3 bit | |||
! 8 bit | |||
! Description | |||
|- | |||
| style="background:#FFFFCC;" | 1 | |||
| colspan="3" style="background:#FFFFCC;" | - | |||
| Literal | |||
| colspan="3" | - | |||
| '''Literal Byte''' | |||
|- | |||
| rowspan="4" style="background:#FFFFCC;" | 0 | |||
| rowspan="3" style="background:#FFFFCC;" | 1 | |||
| rowspan="3" colspan="2" style="background:#FFFFCC;" | - | |||
| rowspan="3" colspan="2" | Offset | |||
| Length != 0 | |||
| - | |||
| Copy '''Length+2''' bytes from '''next_output_pos-0x2000+offset''' to the output | |||
|- | |||
| rowspan="2" | 0 | |||
| Length != 0 | |||
| Copy '''Length+1''' bytes from '''next_output_pos-0x2000+offset''' to the output | |||
|- | |||
| 0 | |||
| '''End of Data''' (usually in this case '''Offset''' is also 0) | |||
|- | |||
| style="background:#FFFFCC;" | 0 | |||
| colspan="2" style="background:#FFFFCC;" | Length | |||
| Offset | |||
| colspan="3" | - | |||
| Copy '''Length+2''' bytes from '''next_output_pos-0x100+offset''' to the output | |||
|} | |||
In the table, each '''cc''' stands for a control code bit. The second data byte is divided into 5+3 bits for mode '''01'''. | |||
== Compression used for the Game == | |||
In general there is more than one way to compress a chunk of data with the given format. | |||
But the game's developers used a very specific ''greedy'' algorithm to compress all the data in the game. | |||
It can be reproduced with the following algorithm: | |||
# Append 0x100 zero bytes to the (uncompressed) input data. These don't count into '''input_size''' and are only there for padding! | |||
# Output the header, using 00, '''input_size''', input[0] | |||
# '''input_pos''' = 1 | |||
# While there is still input to process, repeat the following: | |||
## Find the longest match of data for the current input_pos: | |||
##* Look in the range '''max(0,input_pos-0x2000)..(input_pos-1)''' for a match of up to '''0x100''' bytes. | |||
##* Call the position at which the match is found '''best_offset''' and the length of the match '''best_length'''. | |||
##* If there are multiple matches of the same length, use the one with the highest offset. | |||
## if '''input_pos + best_length > input_size''', then '''best_length''' = '''input_size - input_pos''' | |||
## Encode the match: | |||
##* if '''2 <= best_length <= 5''' and '''best_offset''' is no less than '''inpus_pos-0x100''', then encode the match using mode '''00'''. | |||
##* else if '''best_length <= 2''', then encode the current input byte as literal and set '''best_length = 1'''. | |||
##* else if '''best_length <= 9''', then encode the match using mode '''01'''. | |||
##* else encode the match using mode '''01L'''. | |||
## '''inpus_pos''' += '''best_length''' | |||
== Tool == | |||
Crediar re'ed the decompression from the code and translated it to C code. | |||
You can download a tool with the source to decompress data from any given offset [http://www.terranigma.be/td-v0.1-cred.rar here]. |
Latest revision as of 23:22, 22 October 2016
Terranigma uses a compression for most of the data.
Format
Compressed data always starts with a 4 byte header:
Offset | Length | Description |
---|---|---|
0 | 1 | Chunk Counter ??, always 0 (not used in the game) |
1 | 2 | Uncompressed data size (LE) |
3 | 1 | First byte of uncompressed data |
After that follows the compressed data. It consists of data bytes intersparsed with a control code bitstream, both use MSB first (BE) encoding. Data bytes are always aligned at byte boundary. The control code stream is fit in between the data bytes so that a new cc byte is only placed when a bit needs to be used and the previous cc byte is full.
cc | cc | cc | cc | 8 bit | 5 bit | 3 bit | 8 bit | Description |
---|---|---|---|---|---|---|---|---|
1 | - | Literal | - | Literal Byte | ||||
0 | 1 | - | Offset | Length != 0 | - | Copy Length+2 bytes from next_output_pos-0x2000+offset to the output | ||
0 | Length != 0 | Copy Length+1 bytes from next_output_pos-0x2000+offset to the output | ||||||
0 | End of Data (usually in this case Offset is also 0) | |||||||
0 | Length | Offset | - | Copy Length+2 bytes from next_output_pos-0x100+offset to the output |
In the table, each cc stands for a control code bit. The second data byte is divided into 5+3 bits for mode 01.
Compression used for the Game
In general there is more than one way to compress a chunk of data with the given format. But the game's developers used a very specific greedy algorithm to compress all the data in the game. It can be reproduced with the following algorithm:
- Append 0x100 zero bytes to the (uncompressed) input data. These don't count into input_size and are only there for padding!
- Output the header, using 00, input_size, input[0]
- input_pos = 1
- While there is still input to process, repeat the following:
- Find the longest match of data for the current input_pos:
- Look in the range max(0,input_pos-0x2000)..(input_pos-1) for a match of up to 0x100 bytes.
- Call the position at which the match is found best_offset and the length of the match best_length.
- If there are multiple matches of the same length, use the one with the highest offset.
- if input_pos + best_length > input_size, then best_length = input_size - input_pos
- Encode the match:
- if 2 <= best_length <= 5 and best_offset is no less than inpus_pos-0x100, then encode the match using mode 00.
- else if best_length <= 2, then encode the current input byte as literal and set best_length = 1.
- else if best_length <= 9, then encode the match using mode 01.
- else encode the match using mode 01L.
- inpus_pos += best_length
- Find the longest match of data for the current input_pos:
Tool
Crediar re'ed the decompression from the code and translated it to C code.
You can download a tool with the source to decompress data from any given offset here.