This allows for reading the EOCD records without then automatically
reading all of the entry data as well, so that we can do other things
faster, like provide the number of entries in an archive.
When loading extra fields from both the central directory and local headers,
unknown fields were not merged correctly. They were being appended, which
means that we end up with the two versions stuck together - in some
cases duplicating the field completely.
This broke all kinds of things (like calculating the size of a local
header) in subtle ways.
This commit fixes this by implementing a new `Unknown` extra field type,
and making sure that when reading local and central extra fields they
are stored and preserved correctly. We cannot assume the unknown fields
use the same data in the local and central headers.
Fixes#505.
If a zip file has a comment that is 65,535 characters long - which is a
valid length and the maximum allowable length - the initial read of the
archive fails to find the End of Central Directory Record and therefore
cannot read the rest of the file.
This commit fixes this by making sure that we look far enough back into
the file from the end to find the EoCDR. Test added to catch
regressions.
Fixes#508.
`test_read_from_truncated_zip_file` was not testing what it thought it
was. It was testing whether we caught an out-of-bounds cdir offset, not
whether we caught a corrupted cdir entry.
This commit embraces the actual behaviour and tests that we catch an
out-of-bounds error for both standard `IO`s and `StringIO`s.
Previously, only the extra fields stored in the central directory were
being read in. In reality it is often the case that the extra field in
the central directory is just a marker, and the full data is in the
local header. So we need to read both in and merge the two into the
final correct extra field. This merging infrastructure was already
implemented in the extra field code but we never actually read the
local extra fields in until now.
Reading the central directory headers and local entry headers seems
rather fragile, so we can't just read one over the other and hope to end
up with a correctly merged set of extra fields because this breaks other
things. So we need to specifically read the local extra field data and
merge just those bits.
This commit also fixes a couple of tests that were 'broken' by us now
reading extra fields in correctly!
This commit adds the capability of creating archives larger than
4GB via zip64 extensions. It also fixes bugs reading archives of
this size (specifically, the 64-bit offset of the local file
header was not being read from the central directory entry).
To maximize compatibility, zip64 extensions are used only when
required. Unfortunately, at the time we write a local file header,
we don't know the size of the file and thus whether a Zip64
Extended Information Extra Field will be required. Therefore
this commit writes a 'placeholder' extra field to reserve space
for the zip64 entry, which will be written if necessary when
we update the local entry with the final sizes and CRC. I use
the signature "\x99\x99" for this field, following the example
of DotNetZip which does the same.
This commit also adds a rake task, zip64_full_test, which
fully tests zip64 by actually creating and verifying a 4GB zip
file. Please note, however, that this test requires UnZip
version 6.00 or newer, which may not be supplied by your OS.
This test doesn't run along with the main unit tests because
it takes a few minutes to complete.