# rubyzip
[![Gem Version](https://badge.fury.io/rb/rubyzip.svg)](http://badge.fury.io/rb/rubyzip)
[![Tests](https://github.com/rubyzip/rubyzip/actions/workflows/tests.yml/badge.svg)](https://github.com/rubyzip/rubyzip/actions/workflows/tests.yml)
[![Linter](https://github.com/rubyzip/rubyzip/actions/workflows/lint.yml/badge.svg)](https://github.com/rubyzip/rubyzip/actions/workflows/lint.yml)
[![Code Climate](https://codeclimate.com/github/rubyzip/rubyzip.svg)](https://codeclimate.com/github/rubyzip/rubyzip)
[![Coverage Status](https://img.shields.io/coveralls/rubyzip/rubyzip.svg)](https://coveralls.io/r/rubyzip/rubyzip?branch=master)

Rubyzip is a Ruby library for reading and writing zip files.
## Important notes
### Version 3.0
The public API of some classes has been modernized to use keyword arguments for optional parameters. Please check your usage of the following Rubyzip classes:

* `Zip::File`
* `Zip::Entry`
* `Zip::InputStream`
* `Zip::OutputStream`
### Older versions (pre 2.0)
The Rubyzip interface changed in version 1.0: you no longer need to `require "zip/zip"`, and the `Zip` prefix has been removed from class names (for example, `Zip::ZipFile` is now `Zip::File`).

If you have issues with any third-party gems that require an old version of rubyzip, you can use this workaround:
```ruby
gem 'rubyzip', '>= 1.0.0' # will load new rubyzip version
gem 'zip-zip' # will load compatibility for old rubyzip API.
```
## Requirements
- Ruby 2.4 or greater (for rubyzip 2.0; use 1.x for older rubies)
## Installation
Rubyzip is available on RubyGems:
```shell
gem install rubyzip
```
Or in your Gemfile:
```ruby
gem 'rubyzip'
```
## Usage
### Basic zip archive creation
```ruby
require 'zip'

folder = "/Users/me/Desktop/stuff_to_zip"
input_filenames = ['image.jpg', 'description.txt', 'stats.csv']

zipfile_name = "/Users/me/Desktop/archive.zip"

Zip::File.open(zipfile_name, create: true) do |zipfile|
  input_filenames.each do |filename|
    # Two arguments:
    # - The name of the file as it will appear in the archive
    # - The original file, including the path to find it
    zipfile.add(filename, File.join(folder, filename))
  end
  # Write an entry directly from a string, without a source file on disk
  zipfile.get_output_stream("myFile") { |f| f.write "myFile contains just this" }
end
```
### Zipping a directory recursively
Copied from [this sample](https://github.com/rubyzip/rubyzip/blob/9d891f7353e66052283562d3e252fe380bb4b199/samples/example_recursive.rb):

```ruby
require 'zip'

# This is a simple example which uses rubyzip to
# recursively generate a zip file from the contents of
# a specified directory. The directory itself is not
# included in the archive, rather just its contents.
#
# Usage:
#   directory_to_zip = "/tmp/input"
#   output_file = "/tmp/out.zip"
#   zf = ZipFileGenerator.new(directory_to_zip, output_file)
#   zf.write
class ZipFileGenerator
  # Initialize with the directory to zip and the location of the output archive.
  def initialize(input_dir, output_file)
    @input_dir = input_dir
    @output_file = output_file
  end

  # Zip the input directory.
  def write
    entries = Dir.entries(@input_dir) - %w[. ..]

    ::Zip::File.open(@output_file, create: true) do |zipfile|
      write_entries entries, '', zipfile
    end
  end

  private

  # A helper method to make the recursion work.
  def write_entries(entries, path, zipfile)
    entries.each do |e|
      zipfile_path = path == '' ? e : File.join(path, e)
      disk_file_path = File.join(@input_dir, zipfile_path)

      if File.directory? disk_file_path
        recursively_deflate_directory(disk_file_path, zipfile, zipfile_path)
      else
        put_into_archive(disk_file_path, zipfile, zipfile_path)
      end
    end
  end

  def recursively_deflate_directory(disk_file_path, zipfile, zipfile_path)
    zipfile.mkdir zipfile_path
    subdir = Dir.entries(disk_file_path) - %w[. ..]
    write_entries subdir, zipfile_path, zipfile
  end

  def put_into_archive(disk_file_path, zipfile, zipfile_path)
    zipfile.add(zipfile_path, disk_file_path)
  end
end
```
### Save zip archive entries sorted by name
To save zip archives with their entries sorted by name (as in the listing below), set `::Zip.sort_entries` to `true`:
```
Vegetable/
Vegetable/bean
Vegetable/carrot
Vegetable/celery
fruit/
fruit/apple
fruit/kiwi
fruit/mango
fruit/orange
```
Opening an existing zip file with this option set will not reorder its entries by itself. Altering the zip file (adding or renaming an entry, adding or changing the archive comment, etc.) will cause the ordering to be applied when the file is closed.
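The order in the listing above is plain byte-wise string ordering, which is why `Vegetable/` comes before `fruit/`. Assuming rubyzip compares entry names as ordinary Ruby strings when `Zip.sort_entries` is enabled, this plain-Ruby sketch (no rubyzip needed; the names are those from the listing) reproduces it:

```ruby
# Entry names in an arbitrary insertion order.
names = %w[
  fruit/kiwi Vegetable/carrot fruit/ Vegetable/ fruit/orange
  Vegetable/bean fruit/apple Vegetable/celery fruit/mango
]

# String#sort compares bytes: uppercase "V" (0x56) sorts before lowercase
# "f" (0x66), and a directory name such as "fruit/" sorts before
# "fruit/apple" because it is a prefix of it.
sorted = names.sort
puts sorted
```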
### Default permissions of zip archives
On POSIX file systems, the default file permissions applied to a new archive
are `0666 - umask`, which mimics the behavior of standard tools such as `touch`.
On Windows, the default file permissions are set to `0644`, as suggested by the
[Ruby File documentation](http://ruby-doc.org/core-2.2.2/File.html).
When modifying a zip archive, the file permissions of the archive are preserved.
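As a quick arithmetic check of the POSIX rule, this plain-Ruby sketch computes `0666 - umask` for a given umask. `default_archive_mode` is a hypothetical helper, not part of rubyzip. With the common umask `022`, the result is `0644`.

```ruby
# Hypothetical helper mirroring the rule described above for POSIX systems.
def default_archive_mode(umask)
  0o666 - umask
end

# Print the mode a new archive would get under the current process umask.
puts format('%o', default_archive_mode(File.umask))
```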
### Reading a Zip file
```ruby
require 'zip'

MAX_SIZE = 1024**2 # 1MiB (but of course you can increase this)

Zip::File.open('foo.zip') do |zip_file|
  # Handle entries one by one
  zip_file.each do |entry|
    puts "Extracting #{entry.name}"
    raise 'File too large when extracted' if entry.size > MAX_SIZE

    # Extract to file or directory based on name in the archive
    entry.extract

    # Read into memory
    content = entry.get_input_stream.read
  end

  # Find specific entry (glob returns an empty array if nothing matches)
  entry = zip_file.glob('*.csv').first
  if entry
    raise 'File too large when extracted' if entry.size > MAX_SIZE

    puts entry.get_input_stream.read
  end
end
```
### Notes on `Zip::InputStream`
`Zip::InputStream` can be used for faster reading of zip file content because it does not read the central directory up front.

There is one case where it cannot work, however: when the local entry headers do not contain enough information to extract an entry. This is indicated by bit 3 of the entry's general purpose flag being set.

> If bit 3 (0x08) of the general-purpose flags field is set, then the CRC-32 and file sizes are not known when the header is written. The fields in the local header are filled with zero, and the CRC-32 and size are appended in a 12-byte structure (optionally preceded by a 4-byte signature) immediately after the compressed data.

If `Zip::InputStream` finds such an entry in the zip archive, it will raise an exception (`Zip::GPFBit3Error`).

`Zip::InputStream` is not designed to be used for random access in a zip file. When performing any operations on an entry that you are accessing via `Zip::InputStream#get_next_entry`, you should complete them before the next call to `get_next_entry`.
```ruby
# The block form closes the underlying file when done.
Zip::InputStream.open('file.zip') do |zip_stream|
  while (entry = zip_stream.get_next_entry)
    # All required operations on `entry` go here.
  end
end
```
Any attempt to move about in a zip file opened with `Zip::InputStream` could result in the incorrect entry being accessed and/or Zlib buffer errors. If you need random access in a zip file, use `Zip::File`.
### Password Protection (Experimental)
Rubyzip supports reading/writing zip files with traditional zip encryption (a.k.a. "ZipCrypto"). AES encryption is not yet supported. It can be used with buffer streams, e.g.:
```ruby
require 'zip'
require 'stringio'

my_data = 'Some secret content'

# Returns the encrypted archive as a String
encrypted = Zip::OutputStream.write_buffer(
  ::StringIO.new, encrypter: Zip::TraditionalEncrypter.new('password')
) do |out|
  out.put_next_entry("my_file.txt")
  out.write my_data
end.string
```
This is an experimental feature and the interface for encryption may change in future versions.
## Known issues
### Modify docx file with rubyzip
Use `write_buffer` instead of `open`. Thanks to @jondruse

```ruby
# Assumes @zip_file is the original .docx opened with Zip::File, xml_doc and
# rels are the modified XML documents (e.g. Nokogiri), and new_path is the
# output path. The two paths below are the usual locations inside a .docx.
DOCUMENT_FILE_PATH = 'word/document.xml'
RELS_FILE_PATH = 'word/_rels/document.xml.rels'

buffer = Zip::OutputStream.write_buffer do |out|
  # Copy every entry except the two being replaced
  @zip_file.entries.each do |e|
    unless [DOCUMENT_FILE_PATH, RELS_FILE_PATH].include?(e.name)
      out.put_next_entry(e.name)
      out.write e.get_input_stream.read
    end
  end

  out.put_next_entry(DOCUMENT_FILE_PATH)
  out.write xml_doc.to_xml(indent: 0).gsub("\n", "")

  out.put_next_entry(RELS_FILE_PATH)
  out.write rels.to_xml(indent: 0).gsub("\n", "")
end

File.open(new_path, "wb") { |f| f.write(buffer.string) }
```
## Configuration
### Existing Files
By default, rubyzip will not overwrite files that already exist at the destination of an extracted entry. To change this behavior, you may specify a configuration option like so:
```ruby
Zip.on_exists_proc = true
```
If you're using rubyzip with Rails, consider placing this snippet of code in an initializer file such as `config/initializers/rubyzip.rb`.

Additionally, if you want to configure rubyzip to overwrite existing entries when adding files to a .zip archive, you can do so with the following:
```ruby
Zip.continue_on_exists_proc = true
```
### Non-ASCII Names
If you want to store non-English names and want to open them on Windows (pre Windows 7), you need to set this option:
```ruby
Zip.unicode_names = true
```
Sometimes file names inside a zip file contain non-ASCII characters. If you can assume which encoding was used for such names and want to be able to find such entries using `find_entry`, you can force the assumed encoding like so:
```ruby
Zip.force_entry_names_encoding = 'UTF-8'
```
Allowed encoding names are the same as those accepted by `String#force_encoding`.
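Why forcing the encoding matters can be seen in plain Ruby: a name held as raw bytes (`ASCII-8BIT`) does not compare equal to the same text in UTF-8 until it is re-tagged with `String#force_encoding`. This sketch assumes rubyzip hands back such names as binary strings when the archive does not flag them as UTF-8; it does not call rubyzip itself, and the file name is made up.

```ruby
# A name as it might come out of an archive: UTF-8 bytes tagged as binary.
raw_name = "caf\xC3\xA9.txt".b

# Non-ASCII strings with incompatible encodings never compare equal.
before = (raw_name == 'café.txt')

# Re-tagging the same bytes as UTF-8 makes the comparison succeed.
after = (raw_name.dup.force_encoding('UTF-8') == 'café.txt')

puts "before: #{before}, after: #{after}"
```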
### Date Validation
Some zip files might have an invalid date format, which will raise a warning. You can hide this warning with the following setting:
```ruby
Zip.warn_invalid_date = false
```
### Size Validation
By default (in rubyzip >= 2.0), rubyzip's `extract` method checks that an entry's reported uncompressed size is not (significantly) smaller than its actual size. This is to help you protect your application against [zip bombs](https://en.wikipedia.org/wiki/Zip_bomb). Before `extract`ing an entry, you should check that its size is in the range you expect. For example, if your application supports processing up to 100 files at once, each up to 10MiB, your zip extraction code might look like:
```ruby
require 'zip'

MAX_FILE_SIZE = 10 * 1024**2 # 10MiB
MAX_FILES = 100

Zip::File.open('foo.zip') do |zip_file|
  num_files = 0
  zip_file.each do |entry|
    num_files += 1 if entry.file?
    raise 'Too many extracted files' if num_files > MAX_FILES
    raise 'File too large when extracted' if entry.size > MAX_FILE_SIZE

    entry.extract
  end
end
```
If you need to extract zip files that report incorrect uncompressed sizes and you really trust them not to be too large, you can disable this setting with:
```ruby
Zip.validate_entry_sizes = false
```
Note that if you use the lower level `Zip::InputStream` interface, `rubyzip` does *not* check the entry `size`s. In this case, the caller is responsible for making sure it does not read more data than expected from the input stream.
### Compression level
When adding entries to a zip archive, you can set the compression level to trade off compressed size against compression speed. By default this is set to the underlying Zlib library's default (`Zlib::DEFAULT_COMPRESSION`), which zlib currently treats as level 6, a compromise between speed and size.

You can configure the default compression level with:
```ruby
Zip.default_compression = X
```
Where X is an integer between 0 and 9, inclusive. If this option is set to 0 (`Zlib::NO_COMPRESSION`) then entries will be stored in the zip archive uncompressed. A value of 1 (`Zlib::BEST_SPEED`) gives the fastest compression and 9 (`Zlib::BEST_COMPRESSION`) gives the smallest compressed file size.
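To get a feel for what these levels do, this sketch compresses the same repetitive string at several levels using Ruby's `Zlib::Deflate` directly. It does not go through rubyzip, and the exact byte counts depend on the input and on the zlib version.

```ruby
require 'zlib'

# Highly repetitive sample data compresses well at any non-zero level.
data = 'rubyzip compression level demo ' * 1000

levels = [
  Zlib::NO_COMPRESSION,      # 0: stored, no compression
  Zlib::BEST_SPEED,          # 1
  Zlib::DEFAULT_COMPRESSION, # -1: zlib's default
  Zlib::BEST_COMPRESSION     # 9
]
sizes = levels.map { |level| [level, Zlib::Deflate.deflate(data, level).bytesize] }.to_h

sizes.each do |level, size|
  puts "level #{level}: #{size} bytes (input: #{data.bytesize} bytes)"
end
```

Level 0 output is slightly larger than the input because stored data still carries zlib framing.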
This can also be set for each archive as an option to `Zip::File`:
```ruby
Zip::File.open('foo.zip', create: true, compression_level: 9) do |zip|
zip.add ...
end
```
### Zip64 Support
By default, Zip64 support is disabled for writing. To enable it do this:
```ruby
Zip.write_zip64_support = true
```
_NOTE_: If you enable Zip64 writing, you will need a zip extractor with Zip64 support to extract the archive.
### Block Form
You can set multiple settings at the same time by using a block:
```ruby
Zip.setup do |c|
c.on_exists_proc = true
c.continue_on_exists_proc = true
c.unicode_names = true
c.default_compression = Zlib::BEST_COMPRESSION
end
```
## Compatibility
Rubyzip is known to run on a number of platforms and under a number of different Ruby versions. Please see the table below for what we think the current situation is. Note: an empty cell means "unknown", not "does not work".

| OS | 2.4 | 2.5 | 2.6 | 2.7 | 3.0 | Head | JRuby 9.2.17.0 | JRuby Head | Truffleruby 21.1.0 | Truffleruby Head |
|----|-----|-----|-----|-----|-----|------|----------------|------------|--------------------|------------------|
|Ubuntu 20.04| CI | CI | CI | CI | CI | ci | CI | ci | CI | ci |
|Mac OS 10.15.7| CI | x | x | x | x | | x | | x | |
|Windows 10| | | | x | | | | | | |
|Windows Server 2019| CI | | | | | | | | | |

Key: `CI` - tested in CI, should work; `ci` - tested in CI, might fail; `x` - known working; `o` - known failing.

Please [raise a PR](https://github.com/rubyzip/rubyzip/pulls) if you know Rubyzip works on a platform/Ruby combination not listed here, or [raise an issue](https://github.com/rubyzip/rubyzip/issues) if you see a failure where we think it should work.
## Developing
Install the dependencies:
```shell
bundle install
```
Run the tests with `rake`:
```shell
rake
```
Please also run `rubocop` over your changes.
Our CI runs on [GitHub Actions](https://github.com/rubyzip/rubyzip/actions). Please note that `rubocop` is run as part of the CI configuration and will fail a build if errors are found.
## Website and Project Home
http://github.com/rubyzip/rubyzip

http://rdoc.info/github/rubyzip/rubyzip/master/frames
## Authors
See https://github.com/rubyzip/rubyzip/graphs/contributors for a comprehensive list.
### Current maintainers
* Robert Haines (@hainesr)
* John Lees-Miller (@jdleesmiller)
* Oleksandr Simonov (@simonoff)
### Original author
* Thomas Sondergaard
## License
Rubyzip is distributed under the same license as Ruby. See
http://www.ruby-lang.org/en/LICENSE.txt
## Research notice
Please note that this repository is participating in a study into sustainability
of open source projects. Data will be gathered about this repository for
approximately the next 12 months, starting from June 2021.
Data collected will include number of contributors, number of PRs, time taken to
close/merge these PRs, and issues closed.
For more information, please visit
[our informational page](https://sustainable-open-science-and-software.github.io/) or download our [participant information sheet](https://sustainable-open-science-and-software.github.io/assets/PIS_sustainable_software.pdf).