Serving static gzip-compressed content with Apache

Posted in Linux on 28 Mar 2019 at 18:49 UTC

Background

The CI pipeline for one of my projects generates coverage reports as a collection of HTML files, which are published on one of my web servers. Each report is only ~8MB, but that starts to add up pretty quickly after a few dozen commits, so I wanted to compress the reports on disk and have them decompressed as needed rather than using up my precious disk space.

Strangely, this doesn't seem to be a widely-used (or at least well-documented) Apache configuration. All references I found were out of date or didn't do what I wanted.

So here's how I got it working.

Configuration

First, ensure your Apache installation has the following modules enabled:

mod_ext_filter
mod_headers
mod_rewrite

You may be able to do this by running a2enmod <modulename>; check the documentation for your distribution.
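
On Debian-style systems, for example, enabling the modules and reloading Apache might look something like this (assuming the service is named apache2):

sudo a2enmod ext_filter headers rewrite
sudo systemctl reload apache2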

The configuration below will allow serving pre-compressed HTML/CSS files from a single VirtualHost:

<VirtualHost *:80>
        ServerName example.com
        DocumentRoot /var/www/example.com/

        DirectoryIndex "index.html.gz" "index.html"

        # Don't list the compressed files in directory indexes - you probably don't want to expose
        # the .gz URLs to the outside. They're also misleading, since requesting them will give the
        # original files rather than compressed ones.
        IndexIgnore "*.html.gz" "*.css.gz"

        RewriteEngine On

        # Rewrite requests for .html/.css files to their .gz counterparts, if they exist.
        RewriteCond "%{DOCUMENT_ROOT}%{REQUEST_FILENAME}.gz" -s
        RewriteRule "^(.*)\.(html|css)$" "$1\.$2\.gz" [QSA]

        # Serve compressed HTML/CSS with the correct Content-Type header.
        RewriteRule "\.html\.gz$" "-" [T=text/html,E=no-gzip:1]
        RewriteRule "\.css\.gz$"  "-" [T=text/css,E=no-gzip:1]

        # Define a filter which decompresses the content using zcat.
        # CAVEAT: This will fork a new process for every request made by a client that doesn't
        # support gzip compression (most browsers do), so may not be suitable if you're running a
        # very busy site.
        ExtFilterDefine zcat cmd="/bin/zcat -"

        <FilesMatch "\.(html|css)\.gz$">
                <If "%{HTTP:Accept-Encoding} =~ /gzip/">
                        # Client supports gzip compression - pass it through.
                        Header append Content-Encoding gzip
                </If>
                <Else>
                        # Client doesn't support gzip compression - decompress it.
                        SetOutputFilter zcat
                </Else>

                # Force proxies to cache the gzipped & non-gzipped HTML/CSS files separately.
                Header append Vary Accept-Encoding
        </FilesMatch>
</VirtualHost>
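
Before reloading Apache with the new configuration, it's worth checking that the syntax is valid and that the modules above are actually loaded. Something like the following should work on most installations (the control command may be apachectl or apache2ctl, depending on your distribution):

apachectl configtest
apachectl -M | grep -E 'ext_filter|headers|rewrite'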

Once Apache is configured as described, you can start replacing HTML/CSS files with gzip-compressed counterparts as necessary. A command like the following will compress everything under your DocumentRoot (note that gzip removes each original file once it has been compressed):

find /var/www/example.com/ -type f \( -name '*.html' -o -name '*.css' \) \
    -exec gzip {} \;
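
You can then check the rewrite is doing what you expect by comparing the response headers with and without gzip support advertised. Assuming the example.com VirtualHost above, only the first of these requests should come back with a Content-Encoding: gzip header:

curl -s -D - -o /dev/null -H 'Accept-Encoding: gzip' http://example.com/index.html
curl -s -D - -o /dev/null http://example.com/index.html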

Appendix: Fixing directory indexes

If, like me, you have Apache configured to generate directory listings and want those to keep working as before, then there is a bodge you can use:

Linux supports sparse files, and Apache generates its directory index from each file's logical size. This means you can create an empty sparse xxx.html of the original size alongside xxx.html.gz: it will appear in the directory listing exactly as it did before, without taking any disk space (besides a directory entry), and anyone who requests it will get the .gz version instead.
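
Done by hand for a single file, the trick looks something like this (the file name is purely illustrative):

size=$(stat -c '%s' report.html)   # record the original size
gzip -k report.html                # compress, keeping the original around
truncate -s 0 report.html          # throw away the original's contents
truncate -s "$size" report.html    # grow it back to size as a sparse file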

The compress-and-make-sparse-file step can be automated with a command like the following:

find /var/www/example.com/ -type f \( -name '*.html' -o -name '*.css' \) \
    -exec bash -c 'if [ ! -f "$1.gz" ]; then
        size=$(stat -c "%s" "$1")
        gzip -k "$1" && truncate -s 0 "$1" && truncate -s "$size" "$1"
    fi' -- {} \;

I added the above to my server's crontab, with an extra -mmin +5 on the find command so that it only touches files that haven't been modified in the last 5 minutes, to avoid compressing partially-uploaded files.
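
One gotcha if you put the command directly in a crontab: cron treats unescaped % characters in the command field as newlines, so the stat format string has to be written as \%s (or the whole thing moved into a script). As a one-line sketch, assuming the same paths and an hourly schedule:

0 * * * * find /var/www/example.com/ -type f -mmin +5 \( -name '*.html' -o -name '*.css' \) -exec bash -c 'if [ ! -f "$1.gz" ]; then size=$(stat -c "\%s" "$1"); gzip -k "$1" && truncate -s 0 "$1" && truncate -s "$size" "$1"; fi' -- {} \;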

NOTE: Making copies/backups/etc with software which isn't sparse-file-aware will expand these files to their original size.
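
Comparing allocated size against apparent size is a quick way to confirm the files are still sparse (or to spot a tool that has expanded them):

du -sh /var/www/example.com/                  # blocks actually allocated
du -sh --apparent-size /var/www/example.com/  # logical size, holes included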

Results

Your results will vary, but if you have a site which is mostly static text then you can probably achieve similar space savings to mine.

Before:

# du -sh /var/www/example.com/
326M    /var/www/example.com/

After:

# du -sh /var/www/example.com/
19M     /var/www/example.com/
