SharpZipLib Versus 7-Zip for Large File Sets

Recently I was asked to help reduce the size of a directory of rolling backups. And when I say size, I mean backups of around 12GB a day. The directory was already compressed, but I found that zipping up the folders got us even better compression (on the order of 60% or more additional savings).

First note: the zip utility built into Windows is not up to this. It won't even try to compress anything that size; it just throws an error. Nice.

I started out with two options: #ZipLib (SharpZipLib) or 7-Zip. #ZipLib is written entirely in C#, which lends itself nicely to logging and reporting, but in my experience it may not be the answer for large file sets (NOTE: I benchmarked with #ZipLib's ZipFile class, not with FastZip, which might be faster with the same compression results). 7-Zip is a fast utility, but it is an external application, so you have to launch it from your code and wait for its exit code. I realize this already gives 7-Zip an unfair advantage, but I needed the best tool for the job, so the comparison really comes down to which one you should use when you are working with large file sets.

Note: from here on out, when I mention 7-Zip I mean its stand-alone version (7za.exe). The current version is 4.57.

Both tools compressed a 2GB directory (500MB on a compressed drive) down to 185MB in ZIP format. #ZipLib took nearly an hour for that one directory, so I didn't let it continue with the others. 7-Zip took a little over an hour and a half to compress all 12GB. I am presenting the code for both, so if anyone sees a faster way to use #ZipLib, I am definitely open to suggestions, because it gives me better logging of what is going on while it runs than 7-Zip does.
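The numbers above, worked through as a rough sketch. I'm reading 2GB as 2048MB, "nearly an hour" as 60 minutes and "a little over an hour and a half" as 90 minutes, and assuming the 12GB is measured the same way (uncompressed) as the 2GB. The #ZipLib rate comes from one directory and the 7-Zip rate from all of them, so treat the ratio as ballpark only.

```latex
\begin{aligned}
\text{space saved vs. the compressed drive} &= 1 - \frac{185\,\text{MB}}{500\,\text{MB}} = 0.63 \approx 63\% \\
\text{archive size vs. original} &= \frac{185\,\text{MB}}{2048\,\text{MB}} \approx 0.09 \approx 9\% \\
\text{\#ZipLib throughput} &\approx \frac{2048\,\text{MB}}{3600\,\text{s}} \approx 0.57\,\text{MB/s} \\
\text{7-Zip throughput} &\approx \frac{12 \times 1024\,\text{MB}}{5400\,\text{s}} \approx 2.28\,\text{MB/s} \\
\text{speed ratio} &\approx \frac{2.28}{0.57} \approx 4
\end{aligned}
```

The first line is where the "60+%" figure comes from: the zip is roughly a third of what the compressed drive was already using.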

The IFileZipUtility Interface

The first thing we need is an interface IFileZipUtility.

namespace Tools.Compression
{
    /// <summary>
    /// Zips files up
    /// </summary>
    public interface IFileZipUtility
    {
        /// <summary>
        /// Zips files from a directory and subdirectories
        /// </summary>
        /// <param name="zipFilePath">The name of the zipped file</param>
        /// <param name="directory">directory to zip files up in</param>
        void GenerateZipFile(string zipFilePath, string directory);

        /// <summary>
        /// Zips files from a directory and subdirectories if recursive is true
        /// </summary>
        /// <param name="zipFilePath">The name of the zipped file</param>
        /// <param name="directory">directory to zip files up in</param>
        /// <param name="recursive">Would you like to zip up subdirectories as well?</param>
        void GenerateZipFile(string zipFilePath, string directory,bool recursive);
    }
}

The FileZipUtility Implementation of IFileZipUtility

Now we create our FileZipUtility, which hands the actual zipping off to an IZip application.

using log4net;

namespace Tools.Compression
{
    /// <summary>
    /// Zips Files up
    /// </summary>
    public sealed class FileZipUtility : IFileZipUtility
    {
        //todo: implement zipping filters

        private readonly ILog _logger = LogManager.GetLogger(typeof(FileZipUtility));
        private readonly IZip _zipApplication;

        /// <summary>
        /// Calls other constructor with default of 7-Zip for speedy compression
        /// </summary>
        public FileZipUtility() : this(new SevenZip())
        {
        }

        /// <summary>
        /// Creates a new zip utility that can zip files up
        /// </summary>
        /// <param name="zipApplication">Zip application to use on the back end</param>
        public FileZipUtility(IZip zipApplication)
        {
            _logger.DebugFormat("Initializing {0}.", GetType());
            _zipApplication = zipApplication;
        }

        /// <summary>
        /// Zips files from a directory and subdirectories
        /// </summary>
        /// <param name="zipFilePath">The name of the zipped file</param>
        /// <param name="directory">directory to zip files up in</param>
        public void GenerateZipFile(string zipFilePath, string directory)
        {
            GenerateZipFile(zipFilePath, directory, true);
        }

        /// <summary>
        /// Zips files from a directory, and from its subdirectories if recursive is true
        /// </summary>
        /// <param name="zipFilePath">The name of the zipped file</param>
        /// <param name="directory">directory to zip files up in</param>
        /// <param name="recursive">Would you like to zip up subdirectories as well?</param>
        public void GenerateZipFile(string zipFilePath, string directory, bool recursive)
        {
            _logger.InfoFormat("Attempting to zip up directory \"{0}\" to \"{1}\".", directory, zipFilePath);
            _zipApplication.GenerateZipFile(zipFilePath, directory, recursive);
        }
    }
}
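Because FileZipUtility takes any IZip through its constructor, you can time both back ends with the same harness. Here is a minimal sketch with hypothetical names (`ZipBenchmark`, `Measure`, `LogicalSizeInBytes`) that times one run with Stopwatch and reports throughput from the directory's size. It sums FileInfo.Length, which is the logical size of each file, so on a compressed drive it reports something like the 2GB figure rather than the 500MB actually used on disk.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

// Hypothetical helper (not part of the original post) for timing any zip back end.
public static class ZipBenchmark
{
    // Sums FileInfo.Length for every file under the directory. This is the logical
    // (uncompressed) size, not the space used on a compressed NTFS drive.
    public static long LogicalSizeInBytes(string directory)
    {
        return new DirectoryInfo(directory)
            .EnumerateFiles("*", SearchOption.AllDirectories)
            .Sum(f => f.Length);
    }

    // Runs the zip action once and returns the elapsed time plus throughput in MB/s.
    public static (TimeSpan Elapsed, double MegabytesPerSecond) Measure(string directory, Action zip)
    {
        long bytes = LogicalSizeInBytes(directory);
        Stopwatch watch = Stopwatch.StartNew();
        zip();
        watch.Stop();

        // Guard against dividing by zero when the action finishes instantly
        double seconds = Math.Max(watch.Elapsed.TotalSeconds, 1e-9);
        return (watch.Elapsed, bytes / (1024.0 * 1024.0) / seconds);
    }
}
```

To compare the two, pass something like `() => new FileZipUtility(new SharpZip()).GenerateZipFile(zipPath, dir)` and then the same with `new SevenZip()`, where `zipPath` and `dir` are placeholders for your own paths.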

The IZip Interface

This one is very simple: a single method.

namespace Tools.Compression
{
    /// <summary>
    /// Zips up files and folders
    /// </summary>
    public interface IZip
    {
        /// <summary>
        /// Generates zip files
        /// </summary>
        /// <param name="zipFilePath">Path of zip file to create</param>
        /// <param name="directoryToZip">The top level directory to zip up</param>
        /// <param name="recursive">Would you like to zip up subdirectories as well?</param>
        void GenerateZipFile(string zipFilePath, string directoryToZip,bool recursive);
    }
}

The #ZipLib Implementation of IZip

using System.IO;
using Tools.FileSystem;
using log4net;
using ICSharpCode.SharpZipLib.Zip;
using ICSharpCode.SharpZipLib.Core;

namespace Tools.Compression
{
    /// <summary>
    /// Zips up files and directories using the #ZipLibrary (SharpZipLibrary), which is native to C#.  Do not use if you are zipping up huge amounts of data.
    /// </summary>
    public sealed class SharpZip : IZip
    {
        private readonly ILog _logger = LogManager.GetLogger(typeof(SharpZip));
        private ZipFile _currentZipFile;
        private readonly IFileSystemAccess _fileSystem;

        /// <summary>
        /// Default constructor that calls the other with a windows file system
        /// </summary>
        public SharpZip() :this(new WindowsFileSystemAccess())
        {
        }

        /// <summary>
        /// Creates a new #zip item that can zip up files natively.
        /// </summary>
        /// <param name="fileSystem">Type of file system you are using.</param>
        public SharpZip(IFileSystemAccess fileSystem)
        {
            _logger.DebugFormat("Initializing {0}.", GetType());
            _fileSystem = fileSystem;
        }

        /// <summary>
        /// Generates zip files using the #ZipLib library
        /// </summary>
        /// <param name="zipFilePath">Path of zip file to create</param>
        /// <param name="directoryToZip">The top level directory to zip up</param>
        /// <param name="recursive">Would you like to zip up subdirectories as well?</param>
        public void GenerateZipFile(string zipFilePath, string directoryToZip,bool recursive)
        {
            if (string.IsNullOrEmpty(zipFilePath) || string.IsNullOrEmpty(directoryToZip))
            {
                _logger.WarnFormat("Either you have left out a path to zip files to or the directory that you want to zip up. You specified {0} and {1} respectively. No files will be zipped."
                    , zipFilePath ?? ""
                    , directoryToZip ?? ""
                    );
                return;
            }

            if (_fileSystem.DirectoryExists(directoryToZip))
            {
                _logger.InfoFormat("Using #ZipLib to zip up directory {0}.",directoryToZip);
                DirectoryInfo di = _fileSystem.GetDirectoryInfo(directoryToZip);
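                ZipFile zf = ZipFile.Create(zipFilePath);
                _currentZipFile = zf;

                try
                {
                    zf.BeginUpdate();
                    // Entry names are made relative to the parent directory, so the archive keeps the top-level folder name
                    zf.NameTransform = new ZipNameTransform(di.Parent.FullName);

                    // ".*" is a regular expression that matches every file name
                    FileSystemScanner scanner = new FileSystemScanner(".*")
                                                    {
                                                        ProcessFile = ProcessZipFile,
                                                        ProcessDirectory = ProcessZipDirectory
                                                    };
                    scanner.Scan(directoryToZip, recursive);

                    zf.CommitUpdate();
                }
                finally
                {
                    // Release the file handle even if scanning or committing throws
                    zf.Close();
                    _currentZipFile = null;
                }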
            }
        }

        /// <summary>
        /// Callback for adding a new file.
        /// </summary>
        /// <param name="sender">The scanner calling this delegate.</param>
        /// <param name="args">The event arguments.</param>
        private void ProcessZipFile(object sender, ScanEventArgs args)
        {
            _logger.DebugFormat("Adding file \"{0}\" to zip.", args.Name);
            _currentZipFile.Add(args.Name);
        }

        /// <summary>
        /// Callback for adding a new directory.
        /// </summary>
        /// <param name="sender">The scanner calling this delegate.</param>
        /// <param name="args">The event arguments.</param>
        /// <remarks>Directories are only added if they are empty and
        /// the user has specified that empty directories are to be added.</remarks>
        private void ProcessZipDirectory(object sender, DirectoryEventArgs args)
        {
            if (!args.HasMatchingFiles)
            {
                _logger.DebugFormat("Adding folder \"{0}\" to zip.", args.Name);
                _currentZipFile.AddDirectory(args.Name);
            }
        }
    }
}

The 7-Zip Implementation of IZip

using System;
using Tools.External;
using Tools.FileSystem;
using log4net;

namespace Tools.Compression
{
    /// <summary>
    /// Zips up files and folders using 7-Zip, a fast command-line tool
    /// </summary>
    public sealed class SevenZip : IZip
    {
        private readonly ILog _logger = LogManager.GetLogger(typeof(SevenZip));
        private readonly IFileSystemAccess _fileSystem;
        private readonly string _sevenZipExecutablePath = ".\\7Zip\\7za.exe";

        /// <summary>
        /// Default constructor that calls the other with a windows file system
        /// </summary>
        public SevenZip()
            : this(new WindowsFileSystemAccess())
        {
        }

        /// <summary>
        /// Creates a new 7-Zip ready to zip files
        /// </summary>
        /// <param name="fileSystem">Type of file system you are using.</param>
        public SevenZip(IFileSystemAccess fileSystem)
            : this(fileSystem, ".\\7Zip\\7za.exe")
        {
        }

        /// <summary>
        /// Creates a new 7-Zip ready to zip files
        /// </summary>
        /// <param name="fileSystem">Type of file system you are using.</param>
        /// <param name="sevenZipExecutablePath">Path to the 7-Zip stand alone file</param>
        public SevenZip(IFileSystemAccess fileSystem, string sevenZipExecutablePath)
        {
            _logger.DebugFormat("Initializing {0}.", GetType());
            _fileSystem = fileSystem;
            _sevenZipExecutablePath = sevenZipExecutablePath;
        }

        /// <summary>
        /// Generates zip files by running 7-Zip on the command line. Blindingly fast.
        /// </summary>
        /// <param name="zipFilePath">Path of zip file to create</param>
        /// <param name="directoryToZip">The top level directory to zip up</param>
        /// <param name="recursive">Would you like to zip up subdirectories as well?</param>
        public void GenerateZipFile(string zipFilePath, string directoryToZip, bool recursive)
        {
            if (string.IsNullOrEmpty(zipFilePath) || string.IsNullOrEmpty(directoryToZip))
            {
                _logger.WarnFormat("Either you have left out a path to zip files to or the directory that you want to zip up. You specified {0} and {1} respectively. No files will be zipped."
                    , zipFilePath ?? ""
                    , directoryToZip ?? ""
                    );
                return;
            }

            if (_fileSystem.DirectoryExists(directoryToZip))
            {
                _logger.InfoFormat("Using 7-Zip to zip up directory {0}.", directoryToZip);
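                // "a" adds files to the archive, -tzip selects the ZIP format and -r recurses into subdirectories.
                // Use "*" rather than "*.*": 7-Zip treats "*.*" as "names that have an extension",
                // so files without an extension would be silently skipped.
                string externalAppArgs = string.Format("a -tzip \"{0}\" \"{1}\\*\"", zipFilePath, directoryToZip);
                if (recursive)
                {
                    externalAppArgs += " -r";
                }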
                _logger.InfoFormat("Calling external process {0} with arguments {1}.", _sevenZipExecutablePath, externalAppArgs);
                SevenZipExitCodeType exitCode = (SevenZipExitCodeType)ExternalApplication.RunCommandLine(
                        _sevenZipExecutablePath
                        , externalAppArgs
                        , true
                        , false
                        );

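                if (exitCode != SevenZipExitCodeType.Success)
                {
                    // Formatting the enum prints its name, or the raw number if the code is not a named member
                    string error = string.Format("7-Zip Utility had an error zipping up files. The reported error was {0} (exit code {1}).",
                                      exitCode, (int)exitCode);
                    // The message is already formatted, so log it as-is rather than as a format string
                    _logger.Error(error);
                    throw new ApplicationException(error);
                }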
            }
        }
    }
}
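The argument string in SevenZip.GenerateZipFile is easy to get subtly wrong, and testing it currently means launching 7za.exe. Here is a sketch, under my own naming (`SevenZipArguments.ForZip` is hypothetical), that pulls it into a pure function you can unit test. It uses `*` rather than `*.*`, because the 7-Zip command-line documentation says 7-Zip treats `*.*` as matching only names that have an extension. I haven't verified how 7za.exe treats subfolders matched by `*` when `-r` is left off, so check the non-recursive case against your version.

```csharp
using System;

// Hypothetical pure helper (not in the original post): builds the 7za.exe argument
// string so the command line can be unit tested without launching 7-Zip.
public static class SevenZipArguments
{
    public static string ForZip(string zipFilePath, string directoryToZip, bool recursive)
    {
        if (string.IsNullOrEmpty(zipFilePath))
            throw new ArgumentException("A zip file path is required.", nameof(zipFilePath));
        if (string.IsNullOrEmpty(directoryToZip))
            throw new ArgumentException("A directory to zip is required.", nameof(directoryToZip));

        // "a" = add to archive, "-tzip" = ZIP format, "*" = every name in the directory.
        // The backslash is hard-coded because 7za.exe is a Windows tool.
        string args = string.Format("a -tzip \"{0}\" \"{1}\\*\"", zipFilePath, directoryToZip);

        // "-r" asks 7-Zip to recurse into subdirectories
        return recursive ? args + " -r" : args;
    }
}
```

Unlike the original method, this throws on missing paths instead of logging a warning, so the caller would keep its existing null-or-empty check.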

Our Best Friend Forever 7-Zip's Exit Codes

namespace Tools.Compression
{
    /// <summary>
    /// Exit codes associated with 7-zip
    /// </summary>
    public enum SevenZipExitCodeType
    {
        Success = 0,
        Warning = 1,
        FatalError = 2,
        CommandLineError = 7,
        NotEnoughMemoryError = 8,
        UserStoppedProcessingError = 255
    }
}
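RunCommandLine returns -1 when it does not wait for the process, and casting an int to an enum never fails in C#, so an unexpected value passes straight through the cast in SevenZip.GenerateZipFile without a name. A small, hypothetical helper (`SevenZipExitCodes.Describe`) using Enum.IsDefined keeps the log message readable for any value. The enum is repeated only so the sketch compiles on its own.

```csharp
using System;

// Same values as the SevenZipExitCodeType enum above, repeated so this sketch compiles alone.
public enum SevenZipExitCodeType
{
    Success = 0,
    Warning = 1,
    FatalError = 2,
    CommandLineError = 7,
    NotEnoughMemoryError = 8,
    UserStoppedProcessingError = 255
}

// Hypothetical helper: turns any integer exit code into a readable description.
public static class SevenZipExitCodes
{
    public static string Describe(int exitCode)
    {
        // Enum.IsDefined checks whether the value is one of the named members
        return Enum.IsDefined(typeof(SevenZipExitCodeType), exitCode)
            ? ((SevenZipExitCodeType)exitCode).ToString()
            : string.Format("Unknown exit code {0}", exitCode);
    }
}
```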

The External Application Utility

Because 7-Zip is an external application, we need a static class that launches it and returns its integer exit code, which we then cast to the 7-Zip exit codes above.

using System.IO;
using System.Diagnostics;
using log4net;

namespace Tools.External
{
    /// <summary>
    /// Allows access to external processes and applications through the command line
    /// </summary>
    public static class ExternalApplication
    {
        private static readonly ILog _logger = LogManager.GetLogger(typeof(ExternalApplication));

        /// <summary>
        /// Runs an external process on the command line
        /// </summary>
        /// <param name="filePath">Path to the executable</param>
        /// <param name="args">Any arguments you want passed to the executable</param>
        /// <returns>Returns the exit code of the executable</returns>
        public static int RunCommandLine(string filePath, string args)
        {
            return RunCommandLine(filePath, args, true, true);
        }

        /// <summary>
        /// Runs an external process on the command line
        /// </summary>
        /// <param name="filePath">Path to the executable</param>
        /// <param name="args">Any arguments you want passed to the executable</param>
        /// <param name="waitForExit">Whether to wait for external application to complete before returning</param>
        /// <param name="setWorkingDirectoryToPath">Whether to change the working directory of the executable to the executable's directory</param>
        /// <returns>Will return -1 every time if you do not wait for the executable to complete. Returns the exit code of the executable</returns>
        public static int RunCommandLine(string filePath, string args, bool waitForExit, bool setWorkingDirectoryToPath)
        {
            ProcessStartInfo process = new ProcessStartInfo(filePath, string.Format(" {0}", args))
                                           {
                                               UseShellExecute = false,
                                               RedirectStandardOutput = false,
                                               CreateNoWindow = false
                                           };

            if (setWorkingDirectoryToPath)
            {
                _logger.Debug("Setting working directory to the path of the executable.");
                process.WorkingDirectory = Path.GetDirectoryName(filePath);
            }

            _logger.InfoFormat("Starting process \"{0}\" with arguments \"{1}\" in working directory \"{2}\". Waiting for exit? {3}", 
                            process.FileName, 
                            process.Arguments, 
                            process.WorkingDirectory, 
                            waitForExit.ToString()
                            );

            return RunProcess(process, waitForExit);
        }

        private static int RunProcess(ProcessStartInfo processInfo, bool waitForExit)
        {
            using (Process process = new Process())
            {
                int exitCode = -1;

                process.StartInfo = processInfo;

                process.Start();
                if (waitForExit)
                {
                    process.WaitForExit();
                    exitCode = process.ExitCode;
                }

                return exitCode;
            }
        }
    }
}
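Earlier I said #ZipLib gives me better logging than 7-Zip. One way to narrow that gap, sketched here under my own assumptions rather than taken from the code above, is to redirect 7za.exe's standard output and pass each line to a callback such as a log4net logger's Info method. `ExternalApplicationWithOutput` and `RunAndCaptureOutput` are hypothetical names, and I have not checked exactly what 7za 4.57 prints, so don't rely on the format of its lines.

```csharp
using System;
using System.Diagnostics;

// Hypothetical variant of ExternalApplication.RunCommandLine (not from the original
// post) that redirects the child's standard output and hands each line to a callback.
public static class ExternalApplicationWithOutput
{
    public static int RunAndCaptureOutput(string filePath, string args, Action<string> onOutputLine)
    {
        ProcessStartInfo info = new ProcessStartInfo(filePath, args)
        {
            UseShellExecute = false,       // required for redirection
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(info))
        {
            // Read synchronously until the child closes stdout. Only stdout is
            // redirected, so there is no second pipe that could fill up and deadlock.
            string line;
            while ((line = process.StandardOutput.ReadLine()) != null)
            {
                onOutputLine(line);
            }

            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

For example, `ExternalApplicationWithOutput.RunAndCaptureOutput(@".\7Zip\7za.exe", externalAppArgs, line => _logger.Info(line))`. Reading synchronously keeps it simple; if you also redirect standard error, read one of the two streams asynchronously (for instance with OutputDataReceived and BeginOutputReadLine) to avoid deadlocks.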
 

The As_A_FileZipUtility Test

At the very least, you need to verify that your zip utility works. I definitely need more tests on this, including some with mocks.

using System.IO;
using Tools.FileSystem;
using MbUnit.Framework;
using Tools.Compression;

namespace Tools.Tests.Integration.Compression
{
    [TestFixture]
    public class As_A_FileZipUtility
    {
        private FileZipUtility _utility;
        private const string _sevenZipPath = "7Zip\\7Za.exe";

        [SetUp]
        public void I_want_to()
        {
            _utility = new FileZipUtility(new SevenZip(new WindowsFileSystemAccess(), _sevenZipPath));
        }

        [Test]
        public void Validate_FileZipUtility_is_created_successfully()
        {
            Assert.IsNotNull(_utility);
        }

        [Test]
        public void Validate_GenerateZipFile_creates_a_zipped_file()
        {
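            const string zipFileName = @".\TestZippedFile.zip";
            const string directory = ".\\7Zip";

            // 7-Zip's "a" command adds to an existing archive, so remove any leftover
            // file first; otherwise the assertion below could pass on a stale zip.
            if (File.Exists(zipFileName))
            {
                File.Delete(zipFileName);
            }

            _utility.GenerateZipFile(zipFileName, directory, true);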

            Assert.IsTrue(File.Exists(zipFileName));
        }
    }
}
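For the mock-based tests, a hand-rolled fake IZip is probably the lightest option. This is a sketch with a hypothetical class name (`RecordingZip`); the IZip interface is copied from above only so the block compiles on its own, and you would drop that copy in the real project.

```csharp
using System.Collections.Generic;

// Copy of IZip from above, repeated only so this sketch compiles on its own.
public interface IZip
{
    void GenerateZipFile(string zipFilePath, string directoryToZip, bool recursive);
}

// Hypothetical fake (not from the original post): records each call instead of
// zipping anything, so tests need neither 7za.exe, #ZipLib, nor the disk.
public sealed class RecordingZip : IZip
{
    public List<(string ZipFilePath, string DirectoryToZip, bool Recursive)> Calls { get; } =
        new List<(string ZipFilePath, string DirectoryToZip, bool Recursive)>();

    public void GenerateZipFile(string zipFilePath, string directoryToZip, bool recursive)
    {
        Calls.Add((zipFilePath, directoryToZip, recursive));
    }
}
```

With log4net referenced, something like `var fake = new RecordingZip(); new FileZipUtility(fake).GenerateZipFile("out.zip", "dir");` should leave a single recorded call with `Recursive == true`, because the two-argument overload passes `true` along.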

One thing you may have noticed is the IFileSystemAccess interface. It is my layer between the code and the file system, so the two stay orthogonal.
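The post doesn't show IFileSystemAccess itself, so here is a minimal sketch inferred only from the two members SharpZip and SevenZip call: DirectoryExists and GetDirectoryInfo. The real interface very likely has more, and this WindowsFileSystemAccess is a guess at a pass-through to System.IO, not the original class.

```csharp
using System.IO;

// Hypothetical reconstruction: only the members the zip classes above actually use.
public interface IFileSystemAccess
{
    bool DirectoryExists(string path);
    DirectoryInfo GetDirectoryInfo(string path);
}

// Thin pass-through to System.IO, standing in for the real WindowsFileSystemAccess.
public sealed class WindowsFileSystemAccess : IFileSystemAccess
{
    public bool DirectoryExists(string path)
    {
        return Directory.Exists(path);
    }

    public DirectoryInfo GetDirectoryInfo(string path)
    {
        return new DirectoryInfo(path);
    }
}
```

A test double can implement the same two members to pretend a directory exists (or doesn't) without touching the disk.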

This will quickly bring zipping, with logging, into your application (using one of my biggest loves: log4net). I am serious about wanting feedback on #ZipLib. If you have found FastZip to be extremely fast, I would love to hear about it (and may explore it then).

Posted @ Tuesday, September 23, 2008 11:21 PM

Comments on this entry:

re: SharpZipLib Versus 7-Zip for Large File Sets
by Robz at 9/23/2008 11:41 PM

It appears there may be an interface for calling 7Zip directly from code.

http://innerlimit.googlepages.com/sevenzipinterface

re: SharpZipLib Versus 7-Zip for Large File Sets
by Robz at 9/24/2008 12:12 AM

I have an alternate #ZipLib implementation that I didn't post about; it may benefit from a little of this:

http://community.sharpdevelop.net/forums/t/6873.aspx

re: SharpZipLib Versus 7-Zip for Large File Sets
by Chris Sutton at 10/1/2008 7:05 AM

Nice writeup. I've been using 7zip recently and really like its compression ratio.

Chris