Difference between revisions of "Compression"

From Terranigma Wiki
Jump to navigation Jump to search
(described compression in detail)
(fixed link)
 
(2 intermediate revisions by one other user not shown)
Line 66: Line 66:
In the table, each '''cc''' stands for a control code bit. The second data byte is divided into 5+3 bits for mode '''01'''.
In the table, each '''cc''' stands for a control code bit. The second data byte is divided into 5+3 bits for mode '''01'''.


== Compression used for the Game ==
In general there is more than one way to compress a chunk of data with the given format.
But the game's developers used a very specific ''greedy'' algorithm to compress all the data in the game.
It can be reproduced with the following algorithm:
# Append 0x100 zero bytes to the (uncompressed) input data. These don't count into '''input_size''' and are only there for padding!
# Output the header, using 00, '''input_size''', input[0]
# '''input_pos''' = 1
# While there is still input to process, repeat the following:
## Find the longest match of data for the current input_pos:
##* Look in the range '''max(0,input_pos-0x2000)..(input_pos-1)''' for a match of up to '''0x100''' bytes.
##* Call the position at which the match is found '''best_offset''' and the length of the match '''best_length'''.
##* If there are multiple matches of the same length, use the one with the highest offset.
## if '''input_pos + best_length > input_size''', then '''best_length''' = '''input_size - input_pos'''
## Encode the match:
##* if '''2 <= best_length <= 5''' and '''best_offset''' is no less than '''inpus_pos-0x100''', then encode the match using mode '''00'''.
##* else if '''best_length <= 2''', then encode the current input byte as literal and set '''best_length = 1'''.
##* else if '''best_length <= 9''', then encode the match using mode '''01'''.
##* else encode the match using mode '''01L'''.
## '''inpus_pos''' += '''best_length'''


== Tool ==
== Tool ==
Line 71: Line 92:
Crediar re'ed the decompression from the code and translated it to C code.
Crediar re'ed the decompression from the code and translated it to C code.


You can download a tool with the source to decompress data from any given offset [http://crediar.no-ip.com/td-v0.1-cred.rar here].
You can download a tool with the source to decompress data from any given offset [http://www.terranigma.be/td-v0.1-cred.rar here].

Latest revision as of 23:22, 22 October 2016

Terranigma uses a compression for most of the data.

Format

Compressed data always starts with a 4 byte header:

Offset Length Description
0 1 Chunk Counter ??, always 0 (not used in the game)
1 2 Uncompressed data size (LE)
3 1 First byte of uncompressed data

After that follows the compressed data. It consists of data bytes intersparsed with a control code bitstream, both use MSB first (BE) encoding. Data bytes are always aligned at byte boundary. The control code stream is fit in between the data bytes so that a new cc byte is only placed when a bit needs to be used and the previous cc byte is full.

Terranigma Compression Format
cc cc cc cc 8 bit 5 bit 3 bit 8 bit Description
1 - Literal - Literal Byte
0 1 - Offset Length != 0 - Copy Length+2 bytes from next_output_pos-0x2000+offset to the output
0 Length != 0 Copy Length+1 bytes from next_output_pos-0x2000+offset to the output
0 End of Data (usually in this case Offset is also 0)
0 Length Offset - Copy Length+2 bytes from next_output_pos-0x100+offset to the output

In the table, each cc stands for a control code bit. The second data byte is divided into 5+3 bits for mode 01.

Compression used for the Game

In general there is more than one way to compress a chunk of data with the given format. But the game's developers used a very specific greedy algorithm to compress all the data in the game. It can be reproduced with the following algorithm:

  1. Append 0x100 zero bytes to the (uncompressed) input data. These don't count into input_size and are only there for padding!
  2. Output the header, using 00, input_size, input[0]
  3. input_pos = 1
  4. While there is still input to process, repeat the following:
    1. Find the longest match of data for the current input_pos:
      • Look in the range max(0,input_pos-0x2000)..(input_pos-1) for a match of up to 0x100 bytes.
      • Call the position at which the match is found best_offset and the length of the match best_length.
      • If there are multiple matches of the same length, use the one with the highest offset.
    2. if input_pos + best_length > input_size, then best_length = input_size - input_pos
    3. Encode the match:
      • if 2 <= best_length <= 5 and best_offset is no less than inpus_pos-0x100, then encode the match using mode 00.
      • else if best_length <= 2, then encode the current input byte as literal and set best_length = 1.
      • else if best_length <= 9, then encode the match using mode 01.
      • else encode the match using mode 01L.
    4. inpus_pos += best_length

Tool

Crediar re'ed the decompression from the code and translated it to C code.

You can download a tool with the source to decompress data from any given offset here.