Zip files are great for compressing data and are used in many different circumstances. If you’re building a Python script and need to create or decompress Python ZIP files, you’re in luck.
In this tutorial, you’re going to learn how to zip and unzip (decompress) zip files, all with Python!
Let’s get started!
Prerequisites
If you’d like to follow along with this tutorial, be sure you have the following:
Creating a Zip File with Python
Let’s get started and first focus on zipping files with Python. This tutorial will begin with the simplest possible method and will build upon various techniques.
To zip a single file:
1. Open your favorite text editor.
2. If you’d like to follow along exactly with the tutorial, create a directory at ~/pythonzipdemo and download these BMP files into it. Once you do, you should four bitmap files; all_black.bmp, all_blue.bmp, all_green.bmp, and all_red.bmp. You’ll also have a demo.py Python script that contains all of the code in this tutorial.
The tutorial files are bitmap files, which greatly benefit from compression. Not all types of files will compress as well.
3. Create a new Python script called demo.py. This tutorial will be saving the demo script in the home directory or ~/demo.py.
4. In your Python script, start coding. First, import the zipfile
module. The zipfile
module is a built-in Python module that contains all of the functions you’ll need to zip and unzip files with Python.
import zipfile # imports the zipfile module so you can make use of it
5. Next, instantiate a zipfile
object using the ZipFile()
method, open the zip file in write mode (w
), add a file to it, and then close the instance (close()
).
## Instantiate a new zipfile object creating the single_zip.zip archive.
zf = zipfile.ZipFile('single_file.zip', mode='w')
## Add a file to the archive
zf.write('all_black.bmp')
## Close the archive releasing it from memory
zf.close()
This tutorial relies on relative paths. The relative path looks for the files in the current directory, where Python is currently running. You can specify absolute paths instead when specifying file names.
Once you run the script, you should now see a file called single_file.zip in the folder. Congrats! You’ve zipped up your first file! But wait. Notice that it’s not any smaller. By default, the zip file is not compressed. To compress the files inside, you must perform one more step.
Since
zipfile
is a context manager, you can also use thewith
statement e.g.
with ZipFile('single_file.zip', 'w') as zipfile:
zipfile.write('all_black.bmp')
6. Finally, add a compression
parameter when instantiating the zipfile
object. The zipfile
module uses various compression attributes such as ZIP_STORED, ZIP_DEFLATED, ZIP_BZIP2, and ZIP_LZMA that dictate how Python compresses the files inside. By default, that attribute is set to ZIP_STORED
, meaning no compression.
To compress the files inside the archive, you must specify the ZIP_DEFLATED
attribute, as shown below.
import zipfile
## Instantiate a new zipfile object creating the single_zip.zip archive and compressing it
zf = zipfile.ZipFile('single_file.zip', mode='w', compression=zipfile.ZIP_DEFLATED)
## Add a file to the archive
zf.write('all_black.bmp')
## Close the archive releasing it from memory
zf.close()
When you run this script, you should now see that the single_file.zip is much smaller than the actual bitmap file.
Applying Filters When Zipping Files
A lot about zipping files has been mentioned already, but only by referencing the files manually. When you have to deal with a large number of files, specifying each file manually is far from ideal.
You can filter the files you want to operate on based on filters. A filter can be based on the file size or the file name, for example.
There are no built-in ways to filter files being added to a zip file. You’ll need to build your own solution. One way to build your own solution is to create a Python function. A Python function is a set of code you can execute as one. Building a function is a great way to “encapsulate” code into one single unit.
The function to build for this task will consist of roughly four separate phases:
- Opening the zip file in write mode.
- Read all files in the source folder you’d like to add files to the zip file from.
- Iterate over each file and check to see if it matches a particular string.
- If the file matches the string, add it to the zip file.
You can see this function below. This function is more complex than what you may be used to but it shows how flexible Python is, allowing you to create whatever you need.
A
lambda
in Python is used to create small anonymous functions. Strings are not callable, and to makename
callable;lambda
convertsname
into a function.
# imports the zipfile module so you can make use of it
import zipfile
# for manipulating directories
import os
# for deleting any prefix up to the last slash character of a pathname file from a given directory that matches the filter
from os.path import basename
# zip the files from given directory that matches the filter
def filter_and_zip_files (destination_file, source_foulder, filter):
with zipfile.ZipFile(destination_file, mode="w", compression=zipfile.ZIP_DEFLATED) as new_zip:
# Iterate over all the files in the folder
test = os.walk(source_foulder)
for folder, subfolders, filenames in os.walk(source_foulder):
for filename in filenames:
if filter(filename):
# create complete filepath of file in directory
file_path = os.path.join(folder, filename)
# Add file to zip
new_zip.write(file_path, basename(file_path))
# zipping only colors that start with b to only_b_colors by using_filters
filter_and_zip_files("only_b_colors.zip", ".", lambda name : name.startswith("all_b"))
list_zip_file_contents("only_b_colors.zip")
Listing the Contents of a Zip File
Once you have created a zip file, you can also use Python to read the files inside. To do that, use the namelist()
method. The namelist()
method is a handy way to query all files in the zip file returned as an array of file names.
As shown below, building off the previous example, invoke the namelist()
method on the zipfile
object.
import zipfile
## Open the zip file for reading
zip_read = zipfile.ZipFile('single_file.zip', mode='r')
## Inspect the contents of single_file.zip
zip_read.namelist()
## Close the archive releasing it from memory
zip_read.close()
You’ll see that the namelist()
returns an array of each file in the zip file. At this time, you’ll only see one file (all_black.bmp) in the zip file.
If you’d like a more human-readable output of file names, use a Python for loop to read each file name and place a tab in the middle of each one, as shown below.
import zipfile
## Open the zip file for reading
zip_read = zipfile.ZipFile('single_file.zip', mode='r')
## Inspect the contents of single_file.zip
files = zip_read.namelist()
for file_name in files:
# prints the name of files within, with a tab for better readability
print("\t", file_name)
## Close the archive releasing it from memory
zip_read.close()
Adding Files to an Existing Zip File
So you’ve already got a zip file and would like to add some files to it. No problem! You’ll simply need to change the mode that you open the zip file in.
To add a file to an existing zip file, open the zip file in append mode (mode='a'
), then invoke the write()
method passing the file to add to the zip file, as shown below.
import zipfile
## Open the zip file for appending
zip_file = zipfile.ZipFile('single_file.zip', mode='a')
## Optionally see the files in the zip before the addition
zip_read.namelist()
## Add the file to the zip
zip_file.write('all_blue.bmp')
## Optionally see the files in the zip before after the addition
zip_read.namelist()
## Close the archive releasing it from memory
zip_read.close()
Extracting the Contents of Zip Files with the zipfile
Module
Let’s move on to extracting files from existing zip files. Using either the extract()
and extractall()
methods, you can make it happen.
The example below is opening the zip for reading (mode='r'
) then extracting a single file (all_blue.bmp
) from the zip to your home folder. It’s also demonstrating using the extractall()
method to extract all files in the zip to your home folder.
import zipfile
## Open the zip file for reading
zip_file = zipfile.ZipFile('single_file.zip', mode='r')
## Extract a single file from the zip file to your home directory
zip_file.extract('all_blue.bmp', path='~')
## Extract all files from the zip file to your home directory
zip_file.extractall('path='~')
## Close the archive releasing it from memory
zip_read.close()
Extracting the Contents of Python Zip Files with the shutil
module
As an alternative to using the zipfile
module, you can also use another built-in Python module called shutil
. shutil
is a module that’s not specifically focused on zip files but more general file management. It just so happens to have some handy zip file methods to use.
To extract the contents of zip files with the shutil
module, provide the zip file and path to extract all files to. Unlike the extract()
method on the zipfile
module, the unpack_archive()
method of shutil
does not allow you to pick files from the archive to extract.
# imports the shutil module
import shutil
## Extract all files in the single_zip.zip file to your home folder.
shutil.unpack_archive('single_file.zip', '~')
Applying Filters When Unzipping Files
As covered in the zip filtering section, let’s now apply the same methodology (creating a function) to unzip files matching a specific filter. You can see the function example below. This function reads a zip file, reads all files inside the zip file, iterates over each file, and if that file matches the defined filter, it passes it to the extract()
method.
import zipfile
# unzip only files withtin the zip file that match the filter
def unzip_filtered_files (source_file, destination_folder, filter):
with zipfile.ZipFile(source_file, "r") as source:
list_of_file_names = source.namelist() # you'll get an iterable object
for file_name in list_of_file_names:
# checks if the file matches the filter
if filter(file_name):
# Extract the files to destination_folder
source.extract(file_name,path=destination_folder)
unzip_filtered_files("multiple_files.zip", "not_b_colors", lambda name : not name.startswith("all_b"))
Conclusion
You should now have a good understanding of working with zip files in Python. Using both the zipfile
and shutil
module, along with even building some Python functions, you should now know how to manage whatever zip file task is thrown at you.
How will you integrate this newfound knowledge into your next Python project?