Difference between revisions of "Compression"

Latest revision as of 23:22, 22 October 2016

Terranigma uses a compression for most of the data.

Format

Compressed data always starts with a 4 byte header:

Offset	Length	Description
0	1	Chunk Counter ??, always 0 (not used in the game)
1	2	Uncompressed data size (LE)
3	1	First byte of uncompressed data

After that follows the compressed data. It consists of data bytes intersparsed with a control code bitstream, both use MSB first (BE) encoding. Data bytes are always aligned at byte boundary. The control code stream is fit in between the data bytes so that a new cc byte is only placed when a bit needs to be used and the previous cc byte is full.

Terranigma Compression Format
cc	cc	cc	8 bit	5 bit	3 bit	8 bit	Description
1	-		Literal	-			Literal Byte
0	1	-	Offset		Length != 0	-	Copy Length+2 bytes from next_output_pos-0x2000+offset to the output
					0	Length != 0	Copy Length+1 bytes from next_output_pos-0x2000+offset to the output
					0	0	End of Data (usually in this case Offset is also 0)
	0	Length	Offset	-			Copy Length+2 bytes from next_output_pos-0x100+offset to the output

In the table, each cc stands for a control code bit. The second data byte is divided into 5+3 bits for mode 01.

Compression used for the Game

In general there is more than one way to compress a chunk of data with the given format. But the game's developers used a very specific greedy algorithm to compress all the data in the game. It can be reproduced with the following algorithm:

Append 0x100 zero bytes to the (uncompressed) input data. These don't count into input_size and are only there for padding!
Output the header, using 00, input_size, input[0]
input_pos = 1
While there is still input to process, repeat the following:
1. Find the longest match of data for the current input_pos:
  - Look in the range max(0,input_pos-0x2000)..(input_pos-1) for a match of up to 0x100 bytes.
  - Call the position at which the match is found best_offset and the length of the match best_length.
  - If there are multiple matches of the same length, use the one with the highest offset.
2. if input_pos + best_length > input_size, then best_length = input_size - input_pos
3. Encode the match:
  - if 2 <= best_length <= 5 and best_offset is no less than inpus_pos-0x100, then encode the match using mode 00.
  - else if best_length <= 2, then encode the current input byte as literal and set best_length = 1.
  - else if best_length <= 9, then encode the match using mode 01.
  - else encode the match using mode 01L.
4. inpus_pos += best_length

Tool

Crediar re'ed the decompression from the code and translated it to C code.

You can download a tool with the source to decompress data from any given offset here.

@@ Line 1: / Line 1: @@
 Terranigma uses a compression for most of the data.
-I re'ed it from the code and translated it to C code. So there won't be a real explanation on how it works.
+== Format ==
-You can download a tool with the source to decompress data from any given offset [http://crediar.no-ip.com/td-v0.1-cred.rar here].
+Compressed data always starts with a 4 byte header:
+{| class="wikitable" style="font-family:Lucida Console;"
+|-
+! Offset !! Length !! Description
+|-
+| 0 || 1 || Chunk Counter ??, always 0 (not used in the game)
+|-
+| 1 || 2 || Uncompressed data size (LE)
+|-
+| 3 || 1 || First byte of uncompressed data
+|}
+After that follows the compressed data.
+It consists of data bytes intersparsed with a control code bitstream, both use MSB first (BE) encoding.
+Data bytes are always aligned at byte boundary.
+The control code stream is fit in between the data bytes so that a new cc byte is only placed
+when a bit needs to be used and the previous cc byte is full.
+{| class="wikitable" style="font-family:Lucida Console;"
+|+ Terranigma Compression Format
+|-
+! style="background:#F2F299;" | cc
+! style="background:#F2F299;" | cc
+! style="background:#F2F299;" | cc
+! style="background:#F2F299;" | cc
+! 8 bit
+! 5 bit
+! 3 bit
+! 8 bit
+! Description
+|-
+| style="background:#FFFFCC;" | 1
+| colspan="3" style="background:#FFFFCC;" | -
+| Literal
+| colspan="3" | -
+| '''Literal Byte'''
+|-
+| rowspan="4" style="background:#FFFFCC;" | 0
+| rowspan="3" style="background:#FFFFCC;" | 1
+| rowspan="3" colspan="2" style="background:#FFFFCC;" | -
+| rowspan="3" colspan="2" | Offset
+| Length != 0
+| -
+| Copy '''Length+2''' bytes from '''next_output_pos-0x2000+offset''' to the output
+|-
+| rowspan="2" | 0
+| Length != 0
+| Copy '''Length+1''' bytes from '''next_output_pos-0x2000+offset''' to the output
+|-
+| 0
+| '''End of Data''' (usually in this case '''Offset''' is also 0)
+|-
+| style="background:#FFFFCC;" | 0
+| colspan="2" style="background:#FFFFCC;" | Length
+| Offset
+| colspan="3" | -
+| Copy '''Length+2''' bytes from '''next_output_pos-0x100+offset''' to the output
+|}
+In the table, each '''cc''' stands for a control code bit. The second data byte is divided into 5+3 bits for mode '''01'''.
+== Compression used for the Game ==
+In general there is more than one way to compress a chunk of data with the given format.
+But the game's developers used a very specific ''greedy'' algorithm to compress all the data in the game.
+It can be reproduced with the following algorithm:
+# Append 0x100 zero bytes to the (uncompressed) input data. These don't count into '''input_size''' and are only there for padding!
+# Output the header, using 00, '''input_size''', input[0]
+# '''input_pos''' = 1
+# While there is still input to process, repeat the following:
+## Find the longest match of data for the current input_pos:
+##* Look in the range '''max(0,input_pos-0x2000)..(input_pos-1)''' for a match of up to '''0x100''' bytes.
+##* Call the position at which the match is found '''best_offset''' and the length of the match '''best_length'''.
+##* If there are multiple matches of the same length, use the one with the highest offset.
+## if '''input_pos + best_length > input_size''', then '''best_length''' = '''input_size - input_pos'''
+## Encode the match:
+##* if '''2 <= best_length <= 5''' and '''best_offset''' is no less than '''inpus_pos-0x100''', then encode the match using mode '''00'''.
+##* else if '''best_length <= 2''', then encode the current input byte as literal and set '''best_length = 1'''.
+##* else if '''best_length <= 9''', then encode the match using mode '''01'''.
+##* else encode the match using mode '''01L'''.
+## '''inpus_pos''' += '''best_length'''
+== Tool ==
+Crediar re'ed the decompression from the code and translated it to C code.
+You can download a tool with the source to decompress data from any given offset [http://www.terranigma.be/td-v0.1-cred.rar here].

Difference between revisions of "Compression"

Latest revision as of 23:22, 22 October 2016

Format

Compression used for the Game

Tool

Navigation menu

Search