A close reading of the ZIP spec insists that if bit 3 of the GP flags is set
then the archive cannot be read via `Zip::InputStream`. But in most cases
the correct information is present to be able to do so, both safely and
reliably, and v2.4 does allow this.
This commit ensures that behaviour is present in v3.0.
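For reference, the flag in question can be checked with a few lines of plain Ruby. This is only a sketch: `data_descriptor_flag?` is a hypothetical helper, not part of RubyZip, and the headers are built by hand following the 30-byte local file header layout.

```ruby
# Hypothetical helper, not part of RubyZip: reports whether bit 3 of the
# general purpose flags is set in a raw local file header.
LOCAL_HEADER_SIGNATURE = 0x04034b50
GP_FLAG_BIT_3 = 1 << 3 # sizes and CRC-32 follow the data in a data descriptor

def data_descriptor_flag?(header_bytes)
  signature, _version, flags = header_bytes.unpack('Vvv')
  raise ArgumentError, 'not a local file header' unless signature == LOCAL_HEADER_SIGNATURE

  (flags & GP_FLAG_BIT_3) != 0
end

# Minimal 30-byte local headers: signature, version, flags, compression
# method, time, date, CRC-32, compressed size, uncompressed size,
# name length, extra length.
streamed = [LOCAL_HEADER_SIGNATURE, 20, GP_FLAG_BIT_3, 8, 0, 0, 0, 0, 0, 0, 0].pack('VvvvvvVVVvv')
regular  = [LOCAL_HEADER_SIGNATURE, 20, 0, 8, 0, 0, 0, 0, 0, 0, 0].pack('VvvvvvVVVvv')

data_descriptor_flag?(streamed) # => true
data_descriptor_flag?(regular)  # => false
```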
This method returns `true` if the time instance was created with accurate
timezone information. Ultimately, only those times parsed from binary
DOS format are missing accurate timezone information, but we need this
flag because Ruby `Time` objects (from which `DOSTime` is descended) always
have a timezone set (usually whatever is local at the time).
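To see why the binary format cannot carry a timezone, here is a rough decoder for the two 16-bit DOS words. `decode_dos_datetime` is a hypothetical name used for illustration only; it is not how `DOSTime` is implemented.

```ruby
# Hypothetical helper: decode the 16-bit DOS date and time words. Every bit
# is used for a calendar or clock field, so there is no room for a zone.
def decode_dos_datetime(dos_date, dos_time)
  {
    year: ((dos_date >> 9) & 0x7f) + 1980,
    month: (dos_date >> 5) & 0x0f,
    day: dos_date & 0x1f,
    hour: (dos_time >> 11) & 0x1f,
    minute: (dos_time >> 5) & 0x3f,
    second: (dos_time & 0x1f) * 2 # stored with two-second resolution
  }
end

# 2021-06-15 13:45:30 encoded by hand.
dos_date = ((2021 - 1980) << 9) | (6 << 5) | 15
dos_time = (13 << 11) | (45 << 5) | (30 / 2)
fields = decode_dos_datetime(dos_date, dos_time)

# Ruby still has to pick a zone when building a Time from these fields;
# Time.local uses whatever zone is local, which may not be the original one.
local_guess = Time.local(*fields.values)
```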
This commit adds a parameter to the `File#extract` and `Entry#extract` methods
so that a base destination directory can be specified for extracting archives
in bulk to somewhere in the filesystem that isn't the current working
directory. This directory is `.` by default. It is combined with the entry
path - which shouldn't but could have relative directories (e.g. `..`) in it -
and tested for safety before extracting.
Resolves #540.
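The combine-then-test step could look roughly like the sketch below. `safe_extract_path` is a hypothetical helper and not the check RubyZip actually performs; it only normalises paths lexically and does not follow symlinks.

```ruby
# Hypothetical helper, not RubyZip's implementation: resolve an entry name
# against a destination directory and reject anything that escapes it.
def safe_extract_path(entry_name, destination_directory = '.')
  base = File.expand_path(destination_directory)
  target = File.expand_path(entry_name, base)

  # Compare against the base plus a separator so that "/tmp/out2" is not
  # mistaken for something inside "/tmp/out".
  prefix = base.end_with?(File::SEPARATOR) ? base : base + File::SEPARATOR
  inside = target == base || target.start_with?(prefix)
  raise ArgumentError, "unsafe entry path: #{entry_name}" unless inside

  target
end

safe_extract_path('docs/readme.txt', '/tmp/out') # resolves under /tmp/out

begin
  safe_extract_path('../../etc/passwd', '/tmp/out')
rescue ArgumentError => e
  warn e.message # the entry would land outside the destination
end
```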
Previously the central directory Zip64 data was written even if it wasn't
strictly needed. The standard allows for entries to include Zip64 data
(say, if they are streamed and their size is unknown when writing the file
data) without needing any Zip64 data in the central directory. So now we
only write central directory Zip64 data if there are over 65535 files or
the file data is huge.
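As a rough sketch of that decision: the predicate name and the exact thresholds below are assumptions based on the 16-bit and 32-bit field widths of the classic end of central directory record, not a copy of the RubyZip code.

```ruby
# Limits of the classic (non-Zip64) end of central directory record.
MAX_CLASSIC_ENTRIES = 0xFFFF      # 16-bit entry count
MAX_CLASSIC_SIZE    = 0xFFFF_FFFF # 32-bit size and offset fields

# Hypothetical predicate: does the central directory need Zip64 records?
def central_directory_zip64?(entry_count, cdir_size, cdir_offset)
  entry_count > MAX_CLASSIC_ENTRIES ||
    cdir_size > MAX_CLASSIC_SIZE ||
    cdir_offset > MAX_CLASSIC_SIZE
end

central_directory_zip64?(65_535, 4_000_000, 9_000_000) # => false
central_directory_zip64?(65_536, 4_000_000, 9_000_000) # => true
central_directory_zip64?(10, 1_000, 5 * 1024**3)       # => true
```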
With Zip64 write support enabled by default, it's important that we
only store the extra data when we need to. This commit ensures that
the Zip64 extra data is included for an entry if its size is over
4GB, or if we don't know how big it will be at the point of writing
the local header data.
This commit also removes the need for the Zip64Placeholder extra
data field. Now we just use the Zip64 field itself and ensure it's
filled in correctly.
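The per-entry rule boils down to something like this. `entry_zip64?` is a hypothetical name, and using `nil` for "size not known yet" is an assumption made for the sketch.

```ruby
FOUR_GIB_LIMIT = 0xFFFF_FFFF # largest value a 32-bit size field can hold

# Hypothetical predicate: nil stands for "size not known yet", as when an
# entry is streamed and the local header is written before the data.
def entry_zip64?(size)
  size.nil? || size > FOUR_GIB_LIMIT
end

entry_zip64?(1024)         # => false
entry_zip64?(5 * 1024**3)  # => true
entry_zip64?(nil)          # => true
```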
Previously if RubyZip attempted to create an archive with more than
64K entries, the central directory would truncate the count. `unzip`
and `zipinfo` would fail with an error message such as:
```
error: expected central file header signature not found (file #93272).
(please check that you have transferred or created the zipfile in the
appropriate BINARY mode and that you have compiled UnZip properly)
```
This generated a lot of confusion and a production issue since many
tools fail to decode a RubyZip-created archive if Zip64 is not enabled
for a large number of files. Since Zip64 support is now the norm,
enable this by default.
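The truncation itself is easy to reproduce in isolation. The snippet below is only an illustration of a 16-bit field overflowing, with 70,000 chosen as an arbitrary entry count; it is not the code path RubyZip used.

```ruby
entry_count = 70_000

# The classic end of central directory record stores the count in 16 bits.
# Array#pack with 'v' keeps only the low 16 bits of a larger integer.
stored = [entry_count].pack('v').unpack1('v')

stored                          # => 4464
stored == entry_count & 0xFFFF  # => true
```

A reader that trusts the stored count stops early and then trips over the remaining central directory headers.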
`GPFBit3Error` doesn't really mean anything to the general user, and
it's not descriptive of the issue at hand. This error is raised when a
zip file cannot be streamed via `InputStream`, so `StreamingError` makes
more sense.
Also standardize the error message while we're about it.

When reading an archive with `InputStream`, `Entry.ftype` was returning
`:file` for all entries, even if they were a directory. This is due to
various side-effects in many methods in `Entry`. This commit fixes the
behaviour, but not the side-effects.
Fixes #533.

Also, remove `Entry#extra=` as it makes no sense (and wasn't even being
tested).

And remove a slightly odd test that was assuming an archive would not be
changed if its utime was changed - even if it was being changed back
immediately. This test was merely confirming that we weren't catching
timestamp changes correctly.

Set the dirty flag to true by default - because a new `Entry` is dirty by
definition, having not been written yet. Then make sure that an `Entry`
that is created by reading from a zip file is set as not dirty.
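In outline, the behaviour described is the following. `SketchEntry` is a hypothetical stand-in written for this note; none of its method names are taken from RubyZip's `Entry`.

```ruby
# Hypothetical stand-in for an entry class; not RubyZip's `Entry`.
class SketchEntry
  attr_reader :name

  def initialize(name)
    @name = name
    @dirty = true # never written to an archive, so it needs writing
  end

  # Build an entry that mirrors what is already stored in an archive.
  def self.read_from_archive(name)
    new(name).tap(&:mark_clean)
  end

  def dirty?
    @dirty
  end

  def mark_clean
    @dirty = false
  end

  # Any later change makes the entry dirty again.
  def name=(new_name)
    @dirty = true
    @name = new_name
  end
end

SketchEntry.new('a.txt').dirty?               # => true
SketchEntry.read_from_archive('a.txt').dirty? # => false
```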
When passing an `Entry` type to `File#get_output_stream` the entry is
used to create a `StreamableStream`, which preserves all the info in the
entry, such as timestamp, etc. But then in `put_next_entry` all that is
lost due to the test for `kind_of?(Entry)` which a `StreamableStream` is
not. See #503 for details.
This change tests for `StreamableStream`s in `put_next_entry` and uses
them directly. Some set-up within `Entry` needed to be made more robust
to cope with this, but otherwise it's a low impact change, which does
fix the problem.
The reason this case was missed before is that the tests weren't
exercising `get_output_stream` with an `Entry` object, so I have added
that test too.

Fixes #503.

`CentralDirectory` shouldn't be in the public API for rubyzip and
there's nothing that `CentralDirectory::read_from_stream` did that
couldn't be done by just initializing an object first. Keeping it around
risked things getting out of date as we streamline and fix other things.

If a zip file has a comment that is 65,535 characters long - which is a
valid length and the maximum allowable length - the initial read of the
archive fails to find the Zip64 End of Central Directory Locator and
therefore cannot read the rest of the file.
This commit fixes this by making sure that we look far enough back into
the file from the end to find this locator, and then use the
information in it to find the Zip64 End of Central Directory Record.
Test added to catch regressions.
Fixes #509.
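A sketch of the arithmetic involved, assuming the usual record sizes (22 bytes for the fixed part of the End of Central Directory Record, 20 bytes for the Zip64 locator that sits directly before it). `zip64_eocd_offset` is a hypothetical helper, and it naively assumes the comment does not itself contain the EOCD signature.

```ruby
EOCD_SIGNATURE          = [0x06054b50].pack('V')
ZIP64_LOCATOR_SIGNATURE = [0x07064b50].pack('V')
EOCD_FIXED_SIZE         = 22
ZIP64_LOCATOR_SIZE      = 20
MAX_COMMENT_SIZE        = 0xFFFF

# Furthest distance from the end of the file at which the locator can start.
MAX_LOCATOR_DISTANCE = MAX_COMMENT_SIZE + EOCD_FIXED_SIZE + ZIP64_LOCATOR_SIZE

# Hypothetical helper: return the offset of the Zip64 EOCD record as stored
# in the locator, or nil if the given tail does not contain a locator.
def zip64_eocd_offset(tail)
  eocd_at = tail.rindex(EOCD_SIGNATURE)
  return nil if eocd_at.nil? || eocd_at < ZIP64_LOCATOR_SIZE

  locator = tail.byteslice(eocd_at - ZIP64_LOCATOR_SIZE, ZIP64_LOCATOR_SIZE)
  signature, _disk, offset, _total_disks = locator.unpack('a4VQ<V')
  signature == ZIP64_LOCATOR_SIGNATURE ? offset : nil
end

# Hand-built end of an archive with a maximum-length comment.
comment = ('x' * MAX_COMMENT_SIZE).b
locator = [0x07064b50, 0, 123_456, 1].pack('VVQ<V')
eocd    = [0x06054b50, 0, 0, 0, 0, 0, 0, comment.bytesize].pack('VvvvvVVv')
archive = 'earlier data'.b + locator + eocd + comment

too_short = archive.byteslice(-(MAX_COMMENT_SIZE + EOCD_FIXED_SIZE), MAX_COMMENT_SIZE + EOCD_FIXED_SIZE)
far_enough = archive.byteslice(-MAX_LOCATOR_DISTANCE, MAX_LOCATOR_DISTANCE)

zip64_eocd_offset(too_short)  # => nil, the locator was cut off
zip64_eocd_offset(far_enough) # => 123456
```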
This method provides a shortcut to finding out how many entries are in
an archive by reading this number directly from the central directory,
and not iterating through the entire set of entries.
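For illustration, the count lives at a fixed position in the End of Central Directory Record, so it can be unpacked directly. `entry_count_from_eocd` is a hypothetical helper, not the method added here, and it ignores Zip64 archives, where the real count is held in the Zip64 record instead.

```ruby
# Hypothetical helper: read the total entry count from a raw End of Central
# Directory Record without touching the entries themselves.
def entry_count_from_eocd(eocd_bytes)
  signature, _disk, _cdir_disk, _on_this_disk, total = eocd_bytes.unpack('Vvvvv')
  raise ArgumentError, 'not an EOCD record' unless signature == 0x06054b50

  total
end

# Signature, disk numbers, entries on this disk, total entries, central
# directory size, central directory offset, comment length.
eocd = [0x06054b50, 0, 0, 3, 3, 150, 1024, 0].pack('VvvvvVVv')
entry_count_from_eocd(eocd) # => 3
```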
When loading extra fields from both the central directory and local headers,
unknown fields were not merged correctly. They were being appended, which
meant that we ended up with the two versions stuck together - in some
cases duplicating the field completely.
This broke all kinds of things (like calculating the size of a local
header) in subtle ways.
This commit fixes this by implementing a new `Unknown` extra field type,
and making sure that when reading local and central extra fields they
are stored and preserved correctly. We cannot assume the unknown fields
use the same data in the local and central headers.
Fixes #505.
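The idea of keeping the two versions apart, rather than appending one to the other, can be sketched as follows. `parse_extra_fields`, `UnknownField` and `merge_unknown` are hypothetical names for this note and do not mirror the new `Unknown` class; the parser also assumes well-formed input.

```ruby
# Hypothetical parser: split an extra field blob into { header_id => data }.
# Each field is a 2-byte ID, a 2-byte data size, then the data itself.
def parse_extra_fields(blob)
  fields = {}
  offset = 0
  while offset + 4 <= blob.bytesize
    header_id, size = blob.byteslice(offset, 4).unpack('vv')
    fields[header_id] = blob.byteslice(offset + 4, size)
    offset += 4 + size
  end
  fields
end

# Hypothetical container that keeps local and central data side by side.
UnknownField = Struct.new(:local, :central)

def merge_unknown(local_blob, central_blob)
  merged = Hash.new { |hash, id| hash[id] = UnknownField.new }
  parse_extra_fields(local_blob).each { |id, data| merged[id].local = data }
  parse_extra_fields(central_blob).each { |id, data| merged[id].central = data }
  merged
end

# The same unknown field ID with different payloads in each header.
local_blob   = [0x9999, 4].pack('vv') + 'abcd'
central_blob = [0x9999, 2].pack('vv') + 'ab'
field = merge_unknown(local_blob, central_blob)[0x9999]
field.local   # => "abcd"
field.central # => "ab"
```

Because each version keeps its own length, the size of a local header can be computed from the local data alone.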
If a zip file has a comment that is 65,535 characters long - which is a
valid length and the maximum allowable length - the initial read of the
archive fails to find the End of Central Directory Record and therefore
cannot read the rest of the file.
This commit fixes this by making sure that we look far enough back into
the file from the end to find the EoCDR. Test added to catch
regressions.
Fixes #508.
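A minimal version of the backwards search, assuming a 22-byte fixed record followed by a comment of up to 65,535 bytes. `find_eocd_offset` is a hypothetical helper rather than the RubyZip code, and it takes the last occurrence of the signature without validating the record.

```ruby
require 'stringio'

EOCD_MAGIC    = [0x06054b50].pack('V')
EOCD_MIN_SIZE = 22
MAX_COMMENT   = 0xFFFF
MAX_EOCD_SPAN = EOCD_MIN_SIZE + MAX_COMMENT

# Hypothetical helper: find the EOCD record by scanning the last
# MAX_EOCD_SPAN bytes of an IO. Returns the absolute offset or nil.
def find_eocd_offset(io)
  file_size = io.size
  span = [file_size, MAX_EOCD_SPAN].min
  io.seek(file_size - span)
  tail = io.read(span)
  index = tail.rindex(EOCD_MAGIC)
  index && (file_size - span + index)
end

# Hand-built archive tail with a maximum-length comment.
comment = ('x' * MAX_COMMENT).b
eocd    = [0x06054b50, 0, 0, 0, 0, 0, 0, comment.bytesize].pack('VvvvvVVv')
archive = StringIO.new('file data'.b + eocd + comment)

find_eocd_offset(archive) # => 9
```

Reading any less than `MAX_EOCD_SPAN` bytes would cut the signature off when the comment is at its maximum length.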
We were previously trying to work out where the next entry would be,
even with GP bit 3 set, but the logic was flaky and cannot really be
correct given the data available. It's not expected behaviour, so raise
the error instead.
This means that we get rid of the incorrect `Entry.data_descriptor_size`
which was doing more harm than good.