public class LZMA2Options extends FilterOptions
While this allows setting the LZMA2 compression options in detail, often you only need LZMA2Options() or LZMA2Options(int).
Modifier and Type | Field and Description |
---|---|
static int | DICT_SIZE_DEFAULT: The default dictionary size is 8 MiB. |
static int | DICT_SIZE_MAX: Maximum dictionary size for compression is 768 MiB. |
static int | DICT_SIZE_MIN: Minimum dictionary size is 4 KiB. |
static int | LC_DEFAULT: The default number of literal context bits is 3. |
static int | LC_LP_MAX: Maximum value for lc + lp is 4. |
static int | LP_DEFAULT: The default number of literal position bits is 0. |
static int | MF_BT4: Match finder: Binary tree 2-3-4 |
static int | MF_HC4: Match finder: Hash Chain 2-3-4 |
static int | MODE_FAST: Compression mode: fast. |
static int | MODE_NORMAL: Compression mode: normal. |
static int | MODE_UNCOMPRESSED: Compression mode: uncompressed. |
static int | NICE_LEN_MAX: Maximum value for niceLen is 273. |
static int | NICE_LEN_MIN: Minimum value for niceLen is 8. |
static int | PB_DEFAULT: The default number of position bits is 2. |
static int | PB_MAX: Maximum value for pb is 4. |
static int | PRESET_DEFAULT: Default compression preset level is 6. |
static int | PRESET_MAX: Maximum valid compression preset level is 9. |
static int | PRESET_MIN: Minimum valid compression preset level is 0. |
Constructor and Description |
---|
LZMA2Options(): Creates new LZMA2 options and sets them to the default values. |
LZMA2Options(int preset): Creates new LZMA2 options and sets them to the given preset. |
LZMA2Options(int dictSize, int lc, int lp, int pb, int mode, int niceLen, int mf, int depthLimit): Creates new LZMA2 options and sets them to the given custom values. |
Modifier and Type | Method and Description |
---|---|
Object | clone() |
int | getDecoderMemoryUsage(): Gets how much memory the LZMA2 decoder will need to decompress the data that was encoded with these options and stored in a .xz file. |
int | getDepthLimit(): Gets the match finder search depth limit. |
int | getDictSize(): Gets the dictionary size in bytes. |
int | getEncoderMemoryUsage(): Gets how much memory the encoder will need with these options. |
InputStream | getInputStream(InputStream in): Gets a raw (no XZ headers) decoder input stream using these options. |
int | getLc(): Gets the number of literal context bits. |
int | getLp(): Gets the number of literal position bits. |
int | getMatchFinder(): Gets the match finder type. |
int | getMode(): Gets the compression mode. |
int | getNiceLen(): Gets the nice length of matches. |
FinishableOutputStream | getOutputStream(FinishableOutputStream out): Gets a raw (no XZ headers) encoder output stream using these options. |
int | getPb(): Gets the number of position bits. |
byte[] | getPresetDict(): Gets the preset dictionary. |
void | setDepthLimit(int depthLimit): Sets the match finder search depth limit. |
void | setDictSize(int dictSize): Sets the dictionary size in bytes. |
void | setLc(int lc): Sets the number of literal context bits. |
void | setLcLp(int lc, int lp): Sets the number of literal context bits and literal position bits. |
void | setLp(int lp): Sets the number of literal position bits. |
void | setMatchFinder(int mf): Sets the match finder type. |
void | setMode(int mode): Sets the compression mode. |
void | setNiceLen(int niceLen): Sets the nice length of matches. |
void | setPb(int pb): Sets the number of position bits. |
void | setPreset(int preset): Sets the compression options to the given preset. |
void | setPresetDict(byte[] presetDict): Sets a preset dictionary. |
Methods inherited from class FilterOptions: getDecoderMemoryUsage, getEncoderMemoryUsage
public static final int PRESET_MIN
public static final int PRESET_MAX
public static final int PRESET_DEFAULT
public static final int DICT_SIZE_MIN
public static final int DICT_SIZE_MAX
The decompressor supports bigger dictionaries, up to almost 2 GiB. With HC4 the encoder would support dictionaries bigger than 768 MiB. The 768 MiB limit comes from the current implementation of BT4 where we would otherwise hit the limits of signed ints in array indexing.
If you really need a bigger dictionary for decompression, use LZMA2InputStream directly.
public static final int DICT_SIZE_DEFAULT
public static final int LC_LP_MAX
public static final int LC_DEFAULT
public static final int LP_DEFAULT
public static final int PB_MAX
public static final int PB_DEFAULT
public static final int MODE_UNCOMPRESSED
public static final int MODE_FAST
public static final int MODE_NORMAL
public static final int NICE_LEN_MIN
Minimum value for niceLen is 8.
public static final int NICE_LEN_MAX
Maximum value for niceLen is 273.
public static final int MF_HC4
public static final int MF_BT4
public LZMA2Options()
Creates new LZMA2 options and sets them to the default values. This is equivalent to LZMA2Options(PRESET_DEFAULT).
public LZMA2Options(int preset) throws UnsupportedOptionsException
UnsupportedOptionsException - preset is not supported
public LZMA2Options(int dictSize, int lc, int lp, int pb, int mode, int niceLen, int mf, int depthLimit) throws UnsupportedOptionsException
UnsupportedOptionsException - unsupported options were specified
public void setPreset(int preset) throws UnsupportedOptionsException
The presets 0-3 are fast presets with medium compression.
The presets 4-6 are fairly slow presets with high compression.
The default preset (PRESET_DEFAULT) is 6.
The presets 7-9 are like the preset 6 but use bigger dictionaries and have higher compressor and decompressor memory requirements. Unless the uncompressed size of the file exceeds 8 MiB, 16 MiB, or 32 MiB, it is a waste of memory to use the presets 7, 8, or 9, respectively.
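The size thresholds above can be turned into a small selection rule. The following is only a sketch: PresetChooser and highestUsefulPreset are hypothetical names, not part of LZMA2Options, and the rule assumes that preset 6 is the baseline and that memory use, not speed, is the only concern.

```java
/** Hypothetical helper, not part of the LZMA2Options API. */
class PresetChooser {
    private static final long MIB = 1024L * 1024L;

    /**
     * Returns the highest preset whose bigger dictionary can still pay off
     * for input of the given uncompressed size, using the documented
     * thresholds of 8, 16, and 32 MiB for presets 7, 8, and 9.
     */
    static int highestUsefulPreset(long uncompressedSize) {
        if (uncompressedSize > 32 * MIB) return 9;
        if (uncompressedSize > 16 * MIB) return 8;
        if (uncompressedSize > 8 * MIB) return 7;
        return 6; // the documented PRESET_DEFAULT
    }

    public static void main(String[] args) {
        System.out.println(highestUsefulPreset(5 * MIB));
        System.out.println(highestUsefulPreset(20 * MIB));
        System.out.println(highestUsefulPreset(100 * MIB));
    }
}
```

The returned value could then be passed to setPreset(int) or to the LZMA2Options(int) constructor.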
UnsupportedOptionsException - preset is not supported
public void setDictSize(int dictSize) throws UnsupportedOptionsException
The dictionary (or history buffer) holds the most recently seen uncompressed data. A bigger dictionary usually means better compression. However, using a dictionary bigger than the size of the uncompressed data is a waste of memory.
Any value in the range [DICT_SIZE_MIN, DICT_SIZE_MAX] is valid, but sizes of 2^n and 2^n + 2^(n-1) bytes are somewhat recommended.
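To make the two recommended forms concrete, here is a minimal sketch that tests whether a size is 2^n or 2^n + 2^(n-1) bytes. DictSizeCheck and its methods are hypothetical names; the two constants only mirror the documented 4 KiB and 768 MiB limits and are not read from the library.

```java
/** Hypothetical helper, not part of the LZMA2Options API. */
class DictSizeCheck {
    // Mirrors of the documented limits (assumed values: 4 KiB and 768 MiB).
    static final int DICT_SIZE_MIN = 4 << 10;
    static final int DICT_SIZE_MAX = 768 << 20;

    /** True if the size lies in [DICT_SIZE_MIN, DICT_SIZE_MAX]. */
    static boolean isValid(int size) {
        return size >= DICT_SIZE_MIN && size <= DICT_SIZE_MAX;
    }

    /** True if the size is valid and has the form 2^n or 2^n + 2^(n-1). */
    static boolean isRecommended(int size) {
        if (!isValid(size)) {
            return false;
        }
        int high = Integer.highestOneBit(size); // the 2^n part
        int rest = size - high;                 // whatever is left above 2^n
        return rest == 0 || rest == (high >>> 1);
    }

    public static void main(String[] args) {
        System.out.println(isRecommended(8 << 20)); // 8 MiB = 2^23
        System.out.println(isRecommended(6 << 20)); // 6 MiB = 2^22 + 2^21
        System.out.println(isRecommended(5 << 20)); // 5 MiB has neither form
    }
}
```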
UnsupportedOptionsException - dictSize is not supported
public int getDictSize()
public void setPresetDict(byte[] presetDict)
The .xz format doesn't support a preset dictionary for now. Do not set a preset dictionary unless you use raw LZMA2.
A preset dictionary can be useful when compressing many similar, relatively small chunks of data independently from each other. A preset dictionary should contain typical strings that occur in the files being compressed. The most probable strings should be near the end of the preset dictionary. The preset dictionary used for compression is also needed for decompression.
public byte[] getPresetDict()
public void setLcLp(int lc, int lp) throws UnsupportedOptionsException
The sum of lc and lp is limited to 4. Trying to exceed it will throw an exception. This function lets you change both at the same time.
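The reason a combined setter helps can be shown with a stand-in class. This is a sketch only: LcLpHolder is a hypothetical class that imitates the documented rule (lc + lp at most 4), and it throws IllegalArgumentException where the real class is documented to throw UnsupportedOptionsException.

```java
/** Hypothetical stand-in that imitates the documented lc + lp limit. */
class LcLpHolder {
    static final int LC_LP_MAX = 4; // mirrors the documented limit

    private int lc = 3; // documented LC_DEFAULT
    private int lp = 0; // documented LP_DEFAULT

    /** Validates both values together before storing either of them. */
    void setLcLp(int newLc, int newLp) {
        if (newLc < 0 || newLp < 0 || newLc + newLp > LC_LP_MAX) {
            throw new IllegalArgumentException(
                    "lc + lp must be in [0, 4]: " + newLc + " + " + newLp);
        }
        lc = newLc;
        lp = newLp;
    }

    void setLc(int newLc) { setLcLp(newLc, lp); }

    void setLp(int newLp) { setLcLp(lc, newLp); }

    int getLc() { return lc; }

    int getLp() { return lp; }

    public static void main(String[] args) {
        LcLpHolder options = new LcLpHolder();
        try {
            options.setLp(4); // fails: the current lc of 3 plus 4 exceeds the limit
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        options.setLcLp(0, 4); // succeeds: both values change in one step
        System.out.println(options.getLc() + " " + options.getLp());
    }
}
```

With separate setters the caller would have to lower lc before raising lp; changing both in one call removes that ordering concern.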
UnsupportedOptionsException - lc and lp are invalid
public void setLc(int lc) throws UnsupportedOptionsException
All bytes that cannot be encoded as matches are encoded as literals. That is, literals are simply 8-bit bytes that are encoded one at a time.
The literal coding makes an assumption that the highest lc bits of the previous uncompressed byte correlate with the next byte. For example, in typical English text, an upper-case letter is often followed by a lower-case letter, and a lower-case letter is usually followed by another lower-case letter. In the US-ASCII character set, the highest three bits are 010 for upper-case letters and 011 for lower-case letters. When lc is at least 3, the literal coding can take advantage of this property in the uncompressed data.
The default value (3) is usually good. If you want maximum compression, try setLc(4). Sometimes it helps a little, and sometimes it makes compression worse. If it makes it worse, test for example setLc(2) too.
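The US-ASCII observation can be checked with a few lines of bit arithmetic. The sketch below only extracts the highest lc bits of a byte; LiteralContext and context are hypothetical names, and the real literal coder does more than this (for example, it also takes the lp position bits into account).

```java
/** Hypothetical illustration of "the highest lc bits of the previous byte". */
class LiteralContext {
    /** Returns the highest lc bits of prev, treating it as an unsigned 8-bit value. */
    static int context(byte prev, int lc) {
        if (lc < 0 || lc > 8) {
            throw new IllegalArgumentException("lc out of range: " + lc);
        }
        return (prev & 0xFF) >>> (8 - lc);
    }

    public static void main(String[] args) {
        // With lc = 3, upper-case letters share context 2 (binary 010)
        // and lower-case letters share context 3 (binary 011).
        System.out.println(context((byte) 'A', 3));
        System.out.println(context((byte) 'Z', 3));
        System.out.println(context((byte) 'a', 3));
        System.out.println(context((byte) 'z', 3));
    }
}
```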
UnsupportedOptionsException - lc is invalid, or the sum of lc and lp exceeds LC_LP_MAX
public void setLp(int lp) throws UnsupportedOptionsException
This affects what kind of alignment in the uncompressed data is assumed when encoding literals. See setPb for more information about alignment.
UnsupportedOptionsException - lp is invalid, or the sum of lc and lp exceeds LC_LP_MAX
public int getLc()
public int getLp()
public void setPb(int pb) throws UnsupportedOptionsException
This affects what kind of alignment in the uncompressed data is assumed in general. The default (2) means four-byte alignment (2^pb = 2^2 = 4), which is often a good choice when there's no better guess.
When the alignment is known, setting the number of position bits accordingly may reduce the file size a little. For example with text files having one-byte alignment (US-ASCII, ISO-8859-*, UTF-8), using setPb(0) can improve compression slightly. For UTF-16 text, setPb(1) is a good choice. If the alignment is an odd number like 3 bytes, setPb(0) might be the best choice.
Even though the assumed alignment can be adjusted with setPb and setLp, LZMA2 still slightly favors 16-byte alignment. It might be worth taking into account when designing file formats that are likely to be often compressed with LZMA2.
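The relation between alignment and pb (alignment = 2^pb) can be inverted with a small helper. This is a sketch under stated assumptions: PbChooser and pbForAlignment are hypothetical names, the fallback to 0 for non-power-of-two alignments follows the "might be the best choice" hint above, and capping at the documented PB_MAX of 4 for alignments above 16 bytes is an assumption of this example.

```java
/** Hypothetical helper, not part of the LZMA2Options API. */
class PbChooser {
    static final int PB_MAX = 4; // mirrors the documented maximum

    /** Suggests a pb value for data with the given alignment in bytes. */
    static int pbForAlignment(int alignment) {
        if (alignment <= 0) {
            throw new IllegalArgumentException("alignment must be positive");
        }
        boolean powerOfTwo = (alignment & (alignment - 1)) == 0;
        if (!powerOfTwo) {
            return 0; // e.g. 3-byte alignment: fall back to no assumed alignment
        }
        // log2(alignment), capped at PB_MAX (the cap is an assumption of this sketch)
        return Math.min(Integer.numberOfTrailingZeros(alignment), PB_MAX);
    }

    public static void main(String[] args) {
        System.out.println(pbForAlignment(1)); // one-byte text such as UTF-8
        System.out.println(pbForAlignment(2)); // UTF-16 text
        System.out.println(pbForAlignment(4)); // the default four-byte guess
        System.out.println(pbForAlignment(3)); // odd alignment
    }
}
```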
UnsupportedOptionsException - pb is invalid
public int getPb()
public void setMode(int mode) throws UnsupportedOptionsException
This specifies the method to analyze the data produced by a match finder. The default is MODE_FAST for presets 0-3 and MODE_NORMAL for presets 4-9.
Usually MODE_FAST is used with Hash Chain match finders and MODE_NORMAL with Binary Tree match finders. This is also what the presets do.
The special mode MODE_UNCOMPRESSED doesn't try to compress the data at all (and doesn't use a match finder) and will simply wrap it in uncompressed LZMA2 chunks.
UnsupportedOptionsException - mode is not supported
public int getMode()
public void setNiceLen(int niceLen) throws UnsupportedOptionsException
Once a match of at least niceLen bytes is found, the algorithm stops looking for better matches. Higher values tend to give better compression at the expense of speed. The default depends on the preset.
UnsupportedOptionsException - niceLen is invalid
public int getNiceLen()
public void setMatchFinder(int mf) throws UnsupportedOptionsException
The match finder has a major effect on compression speed, memory usage, and compression ratio. Usually Hash Chain match finders are faster than Binary Tree match finders. The default depends on the preset: 0-3 use MF_HC4 and 4-9 use MF_BT4.
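Taken together with the setMode description, the preset defaults pair up as sketched below. PresetDefaults is a hypothetical class; it returns the constant names as strings because the numeric values of the constants are not stated on this page.

```java
/** Hypothetical summary of the documented preset defaults. */
class PresetDefaults {
    private static void checkPreset(int preset) {
        // 0 and 9 are the documented PRESET_MIN and PRESET_MAX.
        if (preset < 0 || preset > 9) {
            throw new IllegalArgumentException("unsupported preset: " + preset);
        }
    }

    /** Name of the default compression mode for the preset. */
    static String modeFor(int preset) {
        checkPreset(preset);
        return preset <= 3 ? "MODE_FAST" : "MODE_NORMAL";
    }

    /** Name of the default match finder for the preset. */
    static String matchFinderFor(int preset) {
        checkPreset(preset);
        return preset <= 3 ? "MF_HC4" : "MF_BT4";
    }

    public static void main(String[] args) {
        for (int preset = 0; preset <= 9; preset++) {
            System.out.println(preset + ": " + modeFor(preset)
                    + " with " + matchFinderFor(preset));
        }
    }
}
```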
UnsupportedOptionsException - mf is not supported
public int getMatchFinder()
public void setDepthLimit(int depthLimit) throws UnsupportedOptionsException
The default is a special value of 0 which indicates that the depth limit should be automatically calculated by the selected match finder from the nice length of matches.
A reasonable depth limit is 4-100 for Hash Chain match finders and 16-1000 for Binary Tree match finders. Using very high values can make the compressor extremely slow with some files. Avoid settings higher than 1000 unless you are prepared to interrupt the compression in case it is taking far too long.
UnsupportedOptionsException - depthLimit is invalid
public int getDepthLimit()
public int getEncoderMemoryUsage()
Specified by: getEncoderMemoryUsage in class FilterOptions
public FinishableOutputStream getOutputStream(FinishableOutputStream out)
Specified by: getOutputStream in class FilterOptions
public int getDecoderMemoryUsage()
The returned value may be bigger than the value returned by a direct call to LZMA2InputStream.getMemoryUsage(int) if the dictionary size is not 2^n or 2^n + 2^(n-1) bytes. This is because the .xz headers store the dictionary size in such a format and other values are rounded up to the next such value. Such rounding is harmless except it might waste some memory if an unusual dictionary size is used.
If you use raw LZMA2 streams and an unusual dictionary size, call LZMA2InputStream.getMemoryUsage(int) directly to get raw decoder memory requirements.
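The rounding described above can be sketched as follows. DictSizeRounding and roundUp are hypothetical names; the code only reproduces the "next 2^n or 2^n + 2^(n-1)" rule and does not implement the actual .xz header encoding.

```java
/** Hypothetical illustration of the rounding rule, not library code. */
class DictSizeRounding {
    /**
     * Rounds dictSize up to the nearest value of the form 2^n or
     * 2^n + 2^(n-1). Returns a long so that values close to 2 GiB
     * cannot overflow.
     */
    static long roundUp(int dictSize) {
        if (dictSize <= 0) {
            throw new IllegalArgumentException("dictSize must be positive");
        }
        long high = Long.highestOneBit(dictSize); // largest 2^n <= dictSize
        if (dictSize == high) {
            return high;                          // already 2^n
        }
        long mid = high + (high >> 1);            // 2^n + 2^(n-1)
        if (dictSize <= mid) {
            return mid;
        }
        return high << 1;                         // next power of two
    }

    public static void main(String[] args) {
        long mib = 1L << 20;
        System.out.println(roundUp(5 << 20) / mib + " MiB"); // 5 MiB rounds up
        System.out.println(roundUp(7 << 20) / mib + " MiB"); // 7 MiB rounds up
        System.out.println(roundUp(8 << 20) / mib + " MiB"); // 8 MiB is unchanged
    }
}
```

In this sketch a 5 MiB dictionary is treated as 6 MiB, which is the kind of extra memory the paragraph above refers to.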
Specified by: getDecoderMemoryUsage in class FilterOptions
public InputStream getInputStream(InputStream in) throws IOException
Specified by: getInputStream in class FilterOptions
IOException
Copyright © 2016 Internet2. All rights reserved.