public class SeekableXZInputStream extends SeekableInputStream
Each .xz file consists of one or more Streams. Each Stream consists of zero or more Blocks. Each Stream contains an Index of the Stream's Blocks. The Indexes from all Streams are loaded in RAM by a constructor of this class. A typical .xz file has only one Stream, and parsing its Index will need only three or four seeks.
To make random access possible, the data in a .xz file must be split into multiple Blocks of reasonable size. Decompression can only start at a Block boundary. When seeking to an uncompressed position that is not at a Block boundary, decompression starts at the beginning of the Block and throws away data until the target position is reached. Thus, smaller Blocks mean faster seeks to arbitrary uncompressed positions. On the other hand, smaller Blocks mean worse compression. So one has to make a compromise between random access speed and compression ratio.
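To get a feel for the seek cost described above, the following sketch estimates how many uncompressed bytes must be decoded and thrown away for a single seek. It assumes, purely for illustration, that every Block holds the same amount of uncompressed data; the class and method names are hypothetical and are not part of this library.

```java
final class SeekCostSketch {
    /**
     * Returns the number of uncompressed bytes that would be decoded and
     * discarded before reaching pos, assuming uniform Blocks of blockSize bytes.
     */
    public static long discardedBytes(long pos, long blockSize) {
        if (pos < 0 || blockSize <= 0)
            throw new IllegalArgumentException("pos must be >= 0 and blockSize > 0");

        // Decoding restarts at the start of the Block that contains pos,
        // so the waste is the distance from that Block boundary to pos.
        return pos % blockSize;
    }

    public static void main(String[] args) {
        long pos = 10_000_000L;
        long[] blockSizes = {1L << 20, 16L << 20, 256L << 20};
        for (long blockSize : blockSizes)
            System.out.println("Block size " + blockSize + ": discard "
                               + discardedBytes(pos, blockSize) + " bytes");
    }
}
```

In the worst case almost a full Block is discarded, which is why the Block size bounds the seek latency.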
Implementation note: This class uses linear search to locate the correct Stream from the data structures in RAM. It was the simplest to implement and should be fine as long as there aren't too many Streams. The correct Block inside a Stream is located using binary search and thus is fast even with a huge number of Blocks.
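As an illustration of the binary search step, the sketch below finds the Block that contains a given uncompressed position from a sorted array of Block start offsets. This is not the code used by this class; the array layout, the class name, and the method name are assumptions made for the example, and it assumes that no Block is empty so that the start offsets are strictly increasing.

```java
import java.util.Arrays;

final class BlockLookupSketch {
    /**
     * blockStarts[i] is the uncompressed start offset of Block i, in strictly
     * ascending order with blockStarts[0] == 0. totalSize is the uncompressed
     * size of the whole file.
     */
    public static int blockNumberFor(long[] blockStarts, long totalSize, long pos) {
        if (pos < 0 || pos >= totalSize)
            throw new IndexOutOfBoundsException("Invalid uncompressed position: " + pos);

        int i = Arrays.binarySearch(blockStarts, pos);

        // A non-negative result means pos is exactly a Block boundary.
        // Otherwise the result is -(insertionPoint) - 1 and the containing
        // Block is the one just before the insertion point.
        return i >= 0 ? i : -i - 2;
    }

    public static void main(String[] args) {
        long[] starts = {0, 100, 250};
        System.out.println(blockNumberFor(starts, 400, 249));
    }
}
```

The lookup cost grows with the logarithm of the Block count, which matches the claim that a huge number of Blocks is not a problem.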
The amount of memory needed for the Indexes is taken into account when checking the memory usage limit. Each Stream is calculated to need at least 1 KiB of memory and each Block 16 bytes of memory, rounded up to the next kibibyte. So unless the file has a huge number of Streams or Blocks, these don't take a significant amount of memory.
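The accounting rule can be turned into a rough estimate. The sketch below follows one reading of the paragraph above: 1 KiB per Stream, plus 16 bytes per Block rounded up to whole kibibytes within each Stream. The exact accounting in this class may differ, and the class and method names are hypothetical.

```java
final class IndexMemorySketch {
    /**
     * Estimates Index memory usage in KiB. blocksPerStream[i] is the number
     * of Blocks in Stream i.
     */
    public static long indexMemoryKiB(int[] blocksPerStream) {
        long totalKiB = 0;
        for (int blocks : blocksPerStream) {
            long blockBytes = 16L * blocks;
            // 1 KiB of fixed cost per Stream, plus the per-Block cost
            // rounded up to the next kibibyte.
            totalKiB += 1 + (blockBytes + 1023) / 1024;
        }
        return totalKiB;
    }

    public static void main(String[] args) {
        // One Stream with a million Blocks.
        System.out.println(indexMemoryKiB(new int[] {1_000_000}) + " KiB");
    }
}
```

Under these assumptions even a million Blocks cost on the order of 15 MiB, which can be compared against the memoryLimit passed to the constructor.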
When using XZOutputStream, a new Block can be started by calling its endBlock method. If you know that the decompressor will only need to seek to certain uncompressed positions, it can be a good idea to start a new Block at (some of) these positions (and only at these positions to get better compression ratio).
liblzma in XZ Utils supports starting a new Block with LZMA_FULL_FLUSH. XZ Utils 5.1.1alpha added threaded compression which creates multi-Block .xz files. XZ Utils 5.1.1alpha also added the option --block-size=SIZE to the xz command line tool. XZ Utils 5.1.2alpha added a partial implementation of --block-list=SIZES which allows specifying sizes of individual Blocks.
See Also:
SeekableFileInputStream, XZInputStream, XZOutputStream
Constructor and Description |
---|
SeekableXZInputStream(SeekableInputStream in) - Creates a new seekable XZ decompressor without a memory usage limit. |
SeekableXZInputStream(SeekableInputStream in, int memoryLimit) - Creates a new seekable XZ decompressor with an optional memory usage limit. |
Modifier and Type | Method and Description |
---|---|
int | available() - Returns the number of uncompressed bytes that can be read without blocking. |
void | close() - Closes the stream and calls in.close(). |
int | getBlockCheckType(int blockNumber) - Gets integrity check type (Check ID) of the given Block. |
long | getBlockCompPos(int blockNumber) - Gets the position where the given compressed Block starts in the underlying .xz file. |
long | getBlockCompSize(int blockNumber) - Gets the compressed size of the given Block. |
int | getBlockCount() - Gets the number of Blocks in the .xz file. |
int | getBlockNumber(long pos) - Gets the number of the Block that contains the byte at the given uncompressed position. |
long | getBlockPos(int blockNumber) - Gets the uncompressed start position of the given Block. |
long | getBlockSize(int blockNumber) - Gets the uncompressed size of the given Block. |
int | getCheckTypes() - Gets the types of integrity checks used in the .xz file. |
int | getIndexMemoryUsage() - Gets the amount of memory in kibibytes (KiB) used by the data structures needed to locate the XZ Blocks. |
long | getLargestBlockSize() - Gets the uncompressed size of the largest XZ Block in bytes. |
int | getStreamCount() - Gets the number of Streams in the .xz file. |
long | length() - Gets the uncompressed size of this input stream. |
long | position() - Gets the current uncompressed position in this input stream. |
int | read() - Decompresses the next byte from this input stream. |
int | read(byte[] buf, int off, int len) - Decompresses into an array of bytes. |
void | seek(long pos) - Seeks to the specified absolute uncompressed position in the stream. |
void | seekToBlock(int blockNumber) - Seeks to the beginning of the given XZ Block. |
Methods inherited from class SeekableInputStream: skip
Methods inherited from class InputStream: mark, markSupported, read, reset
public SeekableXZInputStream(SeekableInputStream in) throws IOException
Parameters:
in - seekable input stream containing one or more XZ Streams; the whole input stream is used
Throws:
XZFormatException - input is not in the XZ format
CorruptedInputException - XZ data is corrupt or truncated
UnsupportedOptionsException - XZ headers seem valid but they specify options not supported by this implementation
EOFException - less than 6 bytes of input was available from in, or (unlikely) the size of the underlying stream got smaller while this was reading from it
IOException - may be thrown by in
public SeekableXZInputStream(SeekableInputStream in, int memoryLimit) throws IOException
Parameters:
in - seekable input stream containing one or more XZ Streams; the whole input stream is used
memoryLimit - memory usage limit in kibibytes (KiB) or -1 to impose no memory usage limit
Throws:
XZFormatException - input is not in the XZ format
CorruptedInputException - XZ data is corrupt or truncated
UnsupportedOptionsException - XZ headers seem valid but they specify options not supported by this implementation
MemoryLimitException - decoded XZ Indexes would need more memory than allowed by the memory usage limit
EOFException - less than 6 bytes of input was available from in, or (unlikely) the size of the underlying stream got smaller while this was reading from it
IOException - may be thrown by in
public int getCheckTypes()
The returned value has a bit set for every check type that is present. For example, if CRC64 and SHA-256 were used, the return value is (1 << XZ.CHECK_CRC64) | (1 << XZ.CHECK_SHA256).
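A caller will usually want to turn the bit mask from getCheckTypes() back into individual Check IDs. The sketch below shows one way to do that without depending on this library: the constants are local stand-ins that use the Check ID values from the .xz file format (CRC32 = 1, CRC64 = 4, SHA-256 = 10) and are assumed to match the XZ.CHECK_* constants. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class CheckTypesSketch {
    // Local stand-ins for XZ.CHECK_CRC32, XZ.CHECK_CRC64 and XZ.CHECK_SHA256
    // (assumed values, taken from the .xz format's Check IDs).
    public static final int CHECK_CRC32 = 1;
    public static final int CHECK_CRC64 = 4;
    public static final int CHECK_SHA256 = 10;

    /** Returns true if the bit for checkId is set in the mask. */
    public static boolean uses(int mask, int checkId) {
        return (mask & (1 << checkId)) != 0;
    }

    /** Lists every Check ID (0 to 15) whose bit is set in the mask. */
    public static List<Integer> presentCheckIds(int mask) {
        List<Integer> ids = new ArrayList<>();
        for (int id = 0; id <= 15; id++)
            if (uses(mask, id))
                ids.add(id);
        return ids;
    }

    public static void main(String[] args) {
        // Same mask as in the example above: CRC64 and SHA-256 were used.
        int mask = (1 << CHECK_CRC64) | (1 << CHECK_SHA256);
        System.out.println(presentCheckIds(mask));
    }
}
```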
public int getIndexMemoryUsage()
public long getLargestBlockSize()
public int getStreamCount()
public int getBlockCount()
public long getBlockPos(int blockNumber)
Throws:
IndexOutOfBoundsException - if blockNumber < 0 or blockNumber >= getBlockCount().
public long getBlockSize(int blockNumber)
Throws:
IndexOutOfBoundsException - if blockNumber < 0 or blockNumber >= getBlockCount().
public long getBlockCompPos(int blockNumber)
Throws:
IndexOutOfBoundsException - if blockNumber < 0 or blockNumber >= getBlockCount().
public long getBlockCompSize(int blockNumber)
Throws:
IndexOutOfBoundsException - if blockNumber < 0 or blockNumber >= getBlockCount().
public int getBlockCheckType(int blockNumber)
Throws:
IndexOutOfBoundsException - if blockNumber < 0 or blockNumber >= getBlockCount().
See Also:
getCheckTypes()
public int getBlockNumber(long pos)
Throws:
IndexOutOfBoundsException - if pos < 0 or pos >= length().
public int read() throws IOException
Specified by:
read in class InputStream
Returns:
the next decompressed byte, or -1 to indicate the end of the compressed stream
Throws:
CorruptedInputException
UnsupportedOptionsException
MemoryLimitException
XZIOException - if the stream has been closed
IOException - may be thrown by in
public int read(byte[] buf, int off, int len) throws IOException
If len is zero, no bytes are read and 0 is returned. Otherwise this will try to decompress len bytes of uncompressed data. Fewer than len bytes may be read only in the following situations:
- The end of the compressed data is reached.
- An error is detected after at least one but fewer than len bytes have already been successfully decompressed. The next call with non-zero len will immediately throw the pending exception.
Overrides:
read in class InputStream
Parameters:
buf - target buffer for uncompressed data
off - start offset in buf
len - maximum number of uncompressed bytes to read
Returns:
the number of bytes read, or -1 to indicate the end of the compressed stream
Throws:
CorruptedInputException
UnsupportedOptionsException
MemoryLimitException
XZIOException - if the stream has been closed
IOException - may be thrown by in
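Because read(byte[], int, int) may return fewer bytes than requested, callers that need a fixed amount of data typically loop. The following sketch works with any java.io.InputStream and uses a ByteArrayInputStream in place of a real decompressor; the class and method names are hypothetical. Note that, unlike read itself, the helper returns 0 rather than -1 when the stream is already at its end.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ReadLoopSketch {
    /**
     * Reads until len bytes have been stored in buf or the end of the stream
     * is reached. Returns the number of bytes actually stored.
     */
    public static int readUpTo(InputStream in, byte[] buf, int off, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, off + total, len - total);
            if (n == -1)
                break; // end of stream: return what was read so far
            total += n;
        }
        return total;
    }

    /** Small demonstration: ask for 8 bytes from a 5-byte source. */
    public static int demo() {
        byte[] source = {1, 2, 3, 4, 5};
        byte[] buf = new byte[8];
        try (InputStream in = new ByteArrayInputStream(source)) {
            return readUpTo(in, buf, 0, buf.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Bytes read: " + demo());
    }
}
```

If an error was detected after a partial read, the next call inside the loop throws the pending exception, so the loop does not hide it.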
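public int available() throws IOException
Returns the number of uncompressed bytes that can be read without blocking. If the compressed data is corrupt, CorruptedInputException may be thrown before the number of bytes claimed to be available has been read from this input stream.
Overrides:
available in class InputStream
Throws:
IOException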
public void close() throws IOException
Closes the stream and calls in.close(). If the stream was already closed, this does nothing.
Specified by:
close in interface Closeable
Specified by:
close in interface AutoCloseable
Overrides:
close in class InputStream
Throws:
IOException - if thrown by in.close()
public long length()
Specified by:
length in class SeekableInputStream
public long position() throws IOException
Specified by:
position in class SeekableInputStream
Throws:
XZIOException - if the stream has been closed
IOException
public void seek(long pos) throws IOException
Seeks to the specified absolute uncompressed position in the stream. The actual seek is deferred until read is called to read at least one byte.
Seeking past the end of the stream is possible. In that case read will return -1 to indicate the end of the stream.
Specified by:
seek in class SeekableInputStream
Parameters:
pos - new uncompressed read position
Throws:
XZIOException - if pos is negative, or if the stream has been closed
IOException - if pos is negative or if a stream-specific I/O error occurs
public void seekToBlock(int blockNumber) throws IOException
Throws:
XZIOException - if blockNumber < 0 or blockNumber >= getBlockCount(), or if the stream has been closed
IOException
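The seek(long) contract documented above (a negative position is rejected, a position past the end is accepted, and the following read returns -1) can be illustrated with a tiny in-memory stand-in. This is not the implementation of this class: the byte array plays the role of the uncompressed data, a plain IOException stands in for XZIOException, and all names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

final class SeekContractSketch {
    private final byte[] data; // stands in for the uncompressed content
    private long pos;

    SeekContractSketch(byte[] data) {
        this.data = data;
    }

    /** Only records the target position; no data is touched here. */
    public void seek(long newPos) throws IOException {
        if (newPos < 0)
            throw new IOException("Negative seek position: " + newPos);
        pos = newPos;
    }

    /** Returns the byte at the current position, or -1 at or past the end. */
    public int read() {
        if (pos >= data.length)
            return -1;
        return data[(int) pos++] & 0xFF;
    }

    /** Convenience for the demonstration: seek to pos and read one byte. */
    public static int readAt(byte[] data, long pos) {
        SeekContractSketch s = new SeekContractSketch(data);
        try {
            s.seek(pos);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return s.read();
    }

    public static void main(String[] args) {
        byte[] data = {10, 20, 30};
        System.out.println(readAt(data, 1));
        System.out.println(readAt(data, 100));
    }
}
```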
Copyright © 2016 Internet2. All rights reserved.