mirror of https://github.com/stenzek/duckstation

LZMA compression
----------------
Version: 9.35

This file describes LZMA encoding and decoding functions written in the C language.

LZMA is an improved version of the famous LZ77 compression algorithm.
It was improved to maximize the compression ratio while
keeping high decompression speed and low memory requirements for
decompressing.

Note: you can also read the LZMA Specification (lzma-specification.txt from the LZMA SDK).

You can also look at the source code for LZMA encoding and decoding:
  C/Util/Lzma/LzmaUtil.c


LZMA compressed file format
---------------------------
Offset Size Description
  0     1   Special LZMA properties (lc, lp, pb in encoded form)
  1     4   Dictionary size (little endian)
  5     8   Uncompressed size (little endian). -1 means unknown size
 13         Compressed data



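The 13-byte header described above can be unpacked with a few lines of C. The sketch below is illustrative only: the struct and function names are hypothetical and not part of the SDK, and the packing of the first byte as (pb * 5 + lp) * 9 + lc is taken from the LZMA specification rather than from this file.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded view of the 13-byte header (hypothetical names, not SDK types). */
typedef struct {
    unsigned lc, lp, pb;      /* values unpacked from the properties byte */
    uint32_t dict_size;       /* dictionary size in bytes */
    uint64_t unpacked_size;   /* all bits set (-1) means unknown size */
} LzmaHeaderInfo;

/* Returns 0 on success, -1 if the properties byte is out of range. */
static int parse_lzma_header(const unsigned char hdr[13], LzmaHeaderInfo *out)
{
    unsigned d = hdr[0];
    size_t i;

    /* Assumption from the LZMA specification: byte = (pb * 5 + lp) * 9 + lc */
    if (d >= 9 * 5 * 5)
        return -1;
    out->lc = d % 9;
    d /= 9;
    out->lp = d % 5;
    out->pb = d / 5;

    /* Offset 1: dictionary size, 4 bytes, little endian */
    out->dict_size = 0;
    for (i = 0; i < 4; i++)
        out->dict_size |= (uint32_t)hdr[1 + i] << (8 * i);

    /* Offset 5: uncompressed size, 8 bytes, little endian */
    out->unpacked_size = 0;
    for (i = 0; i < 8; i++)
        out->unpacked_size |= (uint64_t)hdr[5 + i] << (8 * i);
    return 0;
}
```

A caller would compare unpacked_size with UINT64_MAX to detect the "unknown size" case before sizing an output buffer.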
ANSI-C LZMA Decoder
~~~~~~~~~~~~~~~~~~~

Please note that the interfaces for the ANSI-C code were changed in LZMA SDK 4.58.
If you want to use the old interfaces, you can download a previous version of the LZMA SDK
from the sourceforge.net site.

To use the ANSI-C LZMA Decoder you need the following files:
1) LzmaDec.h + LzmaDec.c + 7zTypes.h + Precomp.h + Compiler.h

See the example code:
  C/Util/Lzma/LzmaUtil.c


Memory requirements for LZMA decoding
-------------------------------------

Stack usage of the LZMA decoding function for local variables is not
larger than 200-400 bytes.

The LZMA Decoder uses a dictionary buffer and an internal state structure.
The internal state structure consumes
  state_size = (4 + (1.5 << (lc + lp))) KB
that is, 4 KB plus 1.5 KB multiplied by 2^(lc + lp).
By default (lc=3, lp=0), state_size = 16 KB.


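The state_size formula above is not valid C as written, because a shift cannot be applied to 1.5. A minimal sketch in integer arithmetic follows, assuming 1 KB = 1024 bytes; the function name is hypothetical and not part of the SDK.

```c
#include <stddef.h>

/* state_size in bytes: (4 + 1.5 * 2^(lc + lp)) KB, assuming 1 KB = 1024 bytes.
   1.5 KB is 1536 bytes, so the shift can be done on an integer. */
static size_t lzma_state_size_bytes(unsigned lc, unsigned lp)
{
    return (size_t)4 * 1024 + ((size_t)1536 << (lc + lp));
}
```

With the default settings, lzma_state_size_bytes(3, 0) gives 16384 bytes, which matches the 16 KB figure stated above.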
How To decompress data
----------------------

The LZMA Decoder (ANSI-C version) now supports 2 interfaces:
1) Single-call Decompressing
2) Multi-call State Decompressing (zlib-like interface)

You must use an external allocator:
Example:
void *SzAlloc(void *p, size_t size) { p = p; return malloc(size); }
void SzFree(void *p, void *address) { p = p; free(address); }
ISzAlloc alloc = { SzAlloc, SzFree };

The p = p; statement is there only to suppress "unused parameter" compiler warnings.


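An external allocator can also carry bookkeeping, which is handy for leak checks while integrating the decoder. The sketch below is self-contained and uses a stand-in struct (DemoAlloc) that mirrors the two-callback shape of the ISzAlloc example above; DemoAlloc, the counter and the function names are assumptions made for illustration. It uses (void)p; instead of p = p; because some newer compilers warn about self-assignment.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for ISzAlloc so this sketch compiles on its own.
   In real code, include 7zTypes.h and use ISzAlloc instead. */
typedef struct {
    void *(*Alloc)(void *p, size_t size);
    void (*Free)(void *p, void *address);
} DemoAlloc;

static size_t g_liveBlocks = 0;   /* hypothetical counter of outstanding blocks */

static void *CountingAlloc(void *p, size_t size)
{
    void *block;
    (void)p;                      /* context pointer is unused here */
    block = malloc(size);
    if (block != NULL)
        g_liveBlocks++;
    return block;
}

static void CountingFree(void *p, void *address)
{
    (void)p;
    if (address != NULL)          /* freeing a null address is a no-op */
        g_liveBlocks--;
    free(address);
}

static DemoAlloc g_demoAlloc = { CountingAlloc, CountingFree };
```

If g_liveBlocks is not zero after all decoder structures have been freed, some allocation was not released.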
Single-call Decompressing
-------------------------
When to use: RAM->RAM decompressing
Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h
Compile defines: no defines
Memory Requirements:
  - Input buffer: compressed size
  - Output buffer: uncompressed size
  - LZMA Internal Structures: state_size (16 KB for default settings)

Interface:
  int LzmaDecode(Byte *dest, SizeT *destLen, const Byte *src, SizeT *srcLen,
      const Byte *propData, unsigned propSize, ELzmaFinishMode finishMode,
      ELzmaStatus *status, ISzAlloc *alloc);
  In:
    dest     - output data
    destLen  - output data size
    src      - input data
    srcLen   - input data size
    propData - LZMA properties  (5 bytes)
    propSize - size of propData buffer (5 bytes)
    finishMode - It has meaning only if the decoding reaches the output limit (*destLen).
         LZMA_FINISH_ANY - Decode just destLen bytes.
         LZMA_FINISH_END - Stream must be finished after (*destLen).
                           You can use LZMA_FINISH_END when you know that
                           the current output buffer covers the last bytes of the stream.
    alloc    - Memory allocator.

  Out:
    destLen  - processed output size
    srcLen   - processed input size

  Returns:
    SZ_OK
      status:
        LZMA_STATUS_FINISHED_WITH_MARK
        LZMA_STATUS_NOT_FINISHED
        LZMA_STATUS_MAYBE_FINISHED_WITHOUT_MARK
    SZ_ERROR_DATA - Data error
    SZ_ERROR_MEM  - Memory allocation error
    SZ_ERROR_UNSUPPORTED - Unsupported properties
    SZ_ERROR_INPUT_EOF - It needs more bytes in input buffer (src).

  If the LZMA decoder sees end_marker before reaching the output limit, it returns an OK result,
  and the output value of destLen will be less than the output buffer size limit.

  You can use multiple checks to test data integrity after full decompression:
    1) Check the result and the "status" variable.
    2) Check that output(destLen) = uncompressedSize, if you know the real uncompressedSize.
    3) Check that output(srcLen) = compressedSize, if you know the real compressedSize.
       You must use the correct finish mode in that case.


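The three integrity checks listed above can be folded into one helper. The following is only a sketch: the DEMO_ names are stand-ins defined locally so that it compiles without the SDK headers (real code would use SZ_OK and the ELzmaStatus values), and treating the "not finished" status as a failure is a policy assumption that suits the case where the whole stream was expected to fit in the output buffer.

```c
#include <stddef.h>

/* Local stand-ins, not the SDK definitions. */
enum { DEMO_SZ_OK = 0 };
typedef enum {
    DEMO_STATUS_FINISHED_WITH_MARK,
    DEMO_STATUS_NOT_FINISHED,
    DEMO_STATUS_MAYBE_FINISHED_WITHOUT_MARK
} DemoStatus;

/* Returns 1 if all three checks pass, 0 otherwise. */
static int decode_looks_complete(int res, DemoStatus status,
                                 size_t destLenOut, size_t uncompressedSize,
                                 size_t srcLenOut, size_t compressedSize)
{
    /* 1) result code and status */
    if (res != DEMO_SZ_OK)
        return 0;
    if (status == DEMO_STATUS_NOT_FINISHED)
        return 0;
    /* 2) the decoder produced exactly the expected number of bytes */
    if (destLenOut != uncompressedSize)
        return 0;
    /* 3) the decoder consumed exactly the available input */
    if (srcLenOut != compressedSize)
        return 0;
    return 1;
}
```

The destLenOut and srcLenOut arguments are the values left in *destLen and *srcLen after the decoding call returns.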
Multi-call State Decompressing (zlib-like interface)
----------------------------------------------------

When to use: file->file decompressing
Compile files: LzmaDec.h + LzmaDec.c + 7zTypes.h

Memory Requirements:
 - Buffer for input stream: any size (for example, 16 KB)
 - Buffer for output stream: any size (for example, 16 KB)
 - LZMA Internal Structures: state_size (16 KB for default settings)
 - LZMA dictionary (dictionary size is encoded in LZMA properties header)

1) Read LZMA properties (5 bytes) and uncompressed size (8 bytes, little-endian) to header:
   unsigned char header[LZMA_PROPS_SIZE + 8];
   ReadFile(inFile, header, sizeof(header));

2) Allocate CLzmaDec structures (state + dictionary) using LZMA properties

  CLzmaDec state;
  LzmaDec_Construct(&state);
  res = LzmaDec_Allocate(&state, header, LZMA_PROPS_SIZE, &g_Alloc);
  if (res != SZ_OK)
    return res;

3) Init LzmaDec structure before any new LZMA stream, and call LzmaDec_DecodeToBuf in a loop

  LzmaDec_Init(&state);
  for (;;)
  {
    ...
    int res = LzmaDec_DecodeToBuf(CLzmaDec *p, Byte *dest, SizeT *destLen,
        const Byte *src, SizeT *srcLen, ELzmaFinishMode finishMode);
    ...
  }


4) Free all allocated structures
  LzmaDec_Free(&state, &g_Alloc);

See the example code:
  C/Util/Lzma/LzmaUtil.c


How To compress data
--------------------

Compile files:
  7zTypes.h
  Threads.h
  LzmaEnc.h
  LzmaEnc.c
  LzFind.h
  LzFind.c
  LzFindMt.h
  LzFindMt.c
  LzHash.h

Memory Requirements:
  - (dictSize * 11.5 + 6 MB) + state_size

The LZMA Encoder can use two memory allocators:
1) alloc - for small arrays.
2) allocBig - for big arrays.

For example, you can use Large RAM Pages (2 MB) in the allocBig allocator for
better compression speed. Note that Windows has a poor implementation of
Large RAM Pages.
It's OK to use the same allocator for alloc and allocBig.


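The encoder memory requirement given above, (dictSize * 11.5 + 6 MB) + state_size, can be evaluated for a chosen dictionary size before starting compression. This is a rough sketch with a hypothetical function name; it assumes 1 MB = 2^20 bytes and 1 KB = 1024 bytes, and it reuses the decoder's state_size formula, so treat the result as an estimate rather than an exact figure.

```c
#include <stdint.h>

/* Rough encoder memory estimate in bytes:
   (dictSize * 11.5 + 6 MB) + state_size, assuming 1 MB = 2^20 bytes. */
static uint64_t lzma_encoder_memory_estimate(uint32_t dictSize,
                                             unsigned lc, unsigned lp)
{
    /* state_size = (4 + 1.5 * 2^(lc + lp)) KB, computed in bytes */
    uint64_t stateSize = (uint64_t)4 * 1024 + ((uint64_t)1536 << (lc + lp));
    /* dictSize * 11.5 done in integers as dictSize * 23 / 2 */
    uint64_t dictPart = (uint64_t)dictSize * 23 / 2;
    return dictPart + (uint64_t)6 * 1024 * 1024 + stateSize;
}
```

For a 16 MB dictionary with lc=3 and lp=0 this gives 199245824 bytes, about 190 MB.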
Single-call Compression with callbacks
--------------------------------------

See the example code:
  C/Util/Lzma/LzmaUtil.c

When to use: file->file compressing

1) You must implement callback structures for interfaces:
ISeqInStream
ISeqOutStream
ICompressProgress
ISzAlloc

static void *SzAlloc(void *p, size_t size) { p = p; return MyAlloc(size); }
static void SzFree(void *p, void *address) {  p = p; MyFree(address); }
static ISzAlloc g_Alloc = { SzAlloc, SzFree };

  CFileSeqInStream inStream;
  CFileSeqOutStream outStream;

  inStream.funcTable.Read = MyRead;
  inStream.file = inFile;
  outStream.funcTable.Write = MyWrite;
  outStream.file = outFile;


2) Create CLzmaEncHandle object:

  CLzmaEncHandle enc;

  enc = LzmaEnc_Create(&g_Alloc);
  if (enc == 0)
    return SZ_ERROR_MEM;


3) Initialize CLzmaEncProps properties:

  CLzmaEncProps props;
  LzmaEncProps_Init(&props);

  Then you can change some properties in that structure.

4) Send LZMA properties to LZMA Encoder

  res = LzmaEnc_SetProps(enc, &props);

5) Write encoded properties to header

    Byte header[LZMA_PROPS_SIZE + 8];
    size_t headerSize = LZMA_PROPS_SIZE;
    UInt64 fileSize;
    int i;

    res = LzmaEnc_WriteProperties(enc, header, &headerSize);
    fileSize = MyGetFileLength(inFile);
    for (i = 0; i < 8; i++)
      header[headerSize++] = (Byte)(fileSize >> (8 * i));
    MyWriteFileAndCheck(outFile, header, headerSize);

6) Call encoding function:
      res = LzmaEnc_Encode(enc, &outStream.funcTable, &inStream.funcTable,
        NULL, &g_Alloc, &g_Alloc);

7) Destroy LZMA Encoder Object
  LzmaEnc_Destroy(enc, &g_Alloc, &g_Alloc);


If a callback function returns an error code, LzmaEnc_Encode also returns that code,
or it can return a code such as SZ_ERROR_READ, SZ_ERROR_WRITE or SZ_ERROR_PROGRESS.


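Step 5 above appends the uncompressed size to the header one byte at a time, least significant byte first. The same loop is shown here as a standalone helper so it can be compiled and checked without the SDK; put_u64_le and DEMO_PROPS_SIZE are hypothetical names, with DEMO_PROPS_SIZE standing in for LZMA_PROPS_SIZE (5 bytes).

```c
#include <stddef.h>
#include <stdint.h>

enum { DEMO_PROPS_SIZE = 5 };   /* stand-in for LZMA_PROPS_SIZE */

/* Writes value at buf[pos..pos+7] in little-endian order and
   returns the offset just past the written bytes. */
static size_t put_u64_le(unsigned char *buf, size_t pos, uint64_t value)
{
    int i;
    for (i = 0; i < 8; i++)
        buf[pos++] = (unsigned char)(value >> (8 * i));
    return pos;
}
```

Passing UINT64_MAX writes eight 0xFF bytes, which is the "-1 means unknown size" encoding from the file format table.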
Single-call RAM->RAM Compression
--------------------------------

Single-call RAM->RAM Compression is similar to Compression with callbacks,
but you provide pointers to buffers instead of pointers to stream callbacks:

SRes LzmaEncode(Byte *dest, SizeT *destLen, const Byte *src, SizeT srcLen,
    const CLzmaEncProps *props, Byte *propsEncoded, SizeT *propsSize, int writeEndMark,
    ICompressProgress *progress, ISzAlloc *alloc, ISzAlloc *allocBig);

Return code:
  SZ_OK               - OK
  SZ_ERROR_MEM        - Memory allocation error
  SZ_ERROR_PARAM      - Incorrect parameter
  SZ_ERROR_OUTPUT_EOF - Output buffer overflow
  SZ_ERROR_THREAD     - Errors in multithreading functions (only for Mt version)



Defines
-------

_LZMA_SIZE_OPT - Enable some optimizations in LZMA Decoder to get smaller executable code.

_LZMA_PROB32   - It can increase the speed on some 32-bit CPUs, but memory usage for
                 some structures will be doubled in that case.

_LZMA_UINT32_IS_ULONG  - Define it if int is 16-bit on your compiler and long is 32-bit.

_LZMA_NO_SYSTEM_SIZE_T  - Define it if you don't want to use size_t type.


_7ZIP_PPMD_SUPPPORT - Define it if you don't want to support PPMD method in ANSI-C .7z decoder.


C++ LZMA Encoder/Decoder
~~~~~~~~~~~~~~~~~~~~~~~~
The C++ LZMA code uses COM-like interfaces. So if you want to use it,
you can study the basics of COM/OLE.
The C++ LZMA code is just a wrapper over the ANSI-C code.


C++ Notes
~~~~~~~~~
If you use some C++ code folders in 7-Zip (for example, C++ code for .7z handling),
you must check that you work correctly with the "new" operator.
7-Zip can be compiled with MSVC 6.0, which doesn't throw an exception from the "new" operator.
So 7-Zip uses "CPP\Common\NewHandler.cpp" that redefines the "new" operator:
void *operator new(size_t size)
{
  void *p = ::malloc(size);
  if (p == 0)
    throw CNewException();
  return p;
}
If you use an MSVC version that throws an exception from the "new" operator, you can compile without
"NewHandler.cpp", so the standard exception will be used. Actually some code of
7-Zip catches any exception in internal code and converts it to an HRESULT code.
So you don't need to catch CNewException if you call COM interfaces of 7-Zip.

---

http://www.7-zip.org
http://www.7-zip.org/sdk.html
http://www.7-zip.org/support.html