Tanakuro Engineer Blog

The engineering team of "Tanakuro", taking on the world with fashion × IT


SwiftDataCompression - A Swift 3 extension wrapper around libcompression to easily inflate and deflate data

Today we will take a closer look at how to use a library with a C interface from within Swift, and what performance overhead that causes. As an example we will take the very common task of deflating and inflating data with the beloved zlib. In dire need to uncompress some deflated data, I was surprised to find that the Swift standard library provides no easy way to do it. Currently there are at least three ways to use zlib from within a Swift project.

Use an already written Objective-C extension that calls the zlib functions.
  • One more dependency
  • I always try to avoid adding Objective-C code to Swift projects
Use an already written extension in Swift.
  • Long story short, they all leave something to be desired. All of them leak memory, some get the Swift 3 pointer handling wrong, are overly complicated, or implement the adler32 hash function incorrectly.
Use the brand new libcompression library released together with iOS 9.
  • Among other algorithms, libcompression supports zlib deflate
  • No need to additionally link zlib into the project
  • Dependency free and maintained by Apple

So the choice of libcompression should be obvious. import Compression and we are done? Unfortunately it's not that easy. libcompression comes with its own problems and limitations, so this blog post will focus on how to overcome them.

Configuration limitations:

libcompression supports most of the common compression methods like ZLIB, LZMA, LZ4 and Apple's own proprietary compression algorithm LZFSE, but that basically sums up your choices. You can't configure the algorithms any further. For example, zlib deflate compression is fixed at level 5 (a balance between compression speed and compression ratio) and libcompression offers no way to change this. That alone could already be a deal breaker for some.

C-like interface

Even though it was released only recently, libcompression offers nothing but a C interface. In other words, we have to get our hands dirty with pointers and memory management. Also, until now every major release of Swift has thrown the previous pointer handling syntax overboard and come up with something new. (Not a bad thing, since pointer handling got better every time, but I have a feeling this wrapper will need some minor changes in the next major Swift release as well.)

zlib data format RFC-1950 vs RFC-1951

There are two common formats for deflated data, defined in RFC-1950 and RFC-1951.
Long story short: RFC-1950 is the raw stream from RFC-1951 wrapped in a header and a checksum. libcompression only supports the raw stream as defined in RFC-1951, so if we want to support zlib streams as defined in RFC-1950, we have to go the extra mile.

  • An example of RFC-1950 (wrapped) data would be the output of Zlib::Deflate in Ruby
  • An example of RFC-1951 (raw) data would be deflated data sent by a web server (Content-Encoding: deflate)
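To make the difference concrete, here is a minimal illustration. The byte values below are the zlib encoding of zero-byte input at default settings, which happens to be the shortest complete RFC-1950 stream:

```swift
// The zlib (RFC-1950) encoding of empty input with default settings:
// 2-byte header [0x78, 0x9c], the raw deflate (RFC-1951) stream [0x03, 0x00],
// and the big-endian adler32 of the (empty) uncompressed data, which is 1.
let wrapped: [UInt8] = [0x78, 0x9c, 0x03, 0x00, 0x00, 0x00, 0x00, 0x01]

// What libcompression understands: only the middle part.
let raw = Array(wrapped[2 ..< wrapped.count - 4])

// What we will have to handle ourselves: the header and trailing checksum.
let header = Array(wrapped.prefix(2))
let checksum = Array(wrapped.suffix(4))
```

A web server sending Content-Encoding: deflate would hand us only the middle slice; Ruby's Zlib::Deflate would produce all eight bytes.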

Stream or block operation

If you have ever used a compression library at a lower level interface you will probably have encountered the crucial question of "stream vs. block operation?". Which then will lead directly to the next question: "size of the destination buffer?".

Block operations:

compression_encode_buffer
compression_decode_buffer

  • easy to use (one function call)
  • if the destination buffer turns out to be too small, the whole operation has to be repeated with a larger buffer
Stream operations:

compression_stream_init
compression_stream_process
compression_stream_destroy

  • complicated stateful interface (init + loop over partial blocks + finalisation)
  • you can allocate buffer space on the fly, if more is needed

Block operations for decompressing are only practical if the size of the uncompressed data is known. This is basically never the case for zlib deflated data, since it is indeed a stream compressor. Block operation for compressing sounds good on paper, since one could assume that the destination buffer would always be smaller than the input buffer; unfortunately, that is not always the case. An empty input or an already compressed data stream can end up bigger after a compression operation. Ergo we will of course use the stream interface, which does not suffer from these problems, because more space can be allocated whenever needed.

Extension interface

Since we aim for an easy to use extension of the Data type, our interface should look like this:
Notice the optional return type Data?, indicated by the question mark, because compressing and decompressing can fail.

extension Data {
    func deflate() -> Data? { ... }
    func inflate() -> Data? { ... }
}



And a usage example would look like this: (ignoring error checking completely)

let str = "Other strings make fun of me because I'm sooo long. Help me :("
let utf = str.data(using: .utf8)!.deflate()!.inflate()!
let res = String(data: utf, encoding: .utf8)!
assert(str == res)

Interface implementation

Regardless of the operation (compression or decompression), the libcompression call procedure is exactly the same. Only the modus flag passed to compression_stream_init differs (COMPRESSION_STREAM_ENCODE or COMPRESSION_STREAM_DECODE). Because of that we will implement the complicated routine only once, preventing errors and other problems. But first, the extension wrapper code.

import Compression 

extension Data {

    /// Compresses the data using the zlib deflate algorithm.
    /// - returns: raw deflated data according to RFC-1951
    /// - note: Compression level 5 (trade off between compression speed and compression ratio)
    public func deflate() -> Data? {
        return self.withUnsafeBytes { (sourcePtr: UnsafePointer<UInt8>) -> Data? in
            return perform(modus: COMPRESSION_STREAM_ENCODE, source: sourcePtr, sourceSize: count)
        }
    }
    
    /// Decompresses the data using the zlib deflate algorithm. Self is expected to be a raw deflate
    /// stream according to RFC-1951.
    /// - returns: uncompressed data
    public func inflate() -> Data? {
        return self.withUnsafeBytes { (sourcePtr: UnsafePointer<UInt8>) -> Data? in
            return perform(modus: COMPRESSION_STREAM_DECODE, source: sourcePtr, sourceSize: count)
        }
    }
}

As explained, deflate and inflate both delegate their work to a function perform(modus:source:sourceSize:) -> Data? where the heavy lifting occurs.

func perform(modus: compression_stream_operation, source: UnsafePointer<UInt8>, sourceSize: Int) -> Data?

compression_stream_operation, COMPRESSION_STREAM_DECODE and COMPRESSION_STREAM_ENCODE are symbols imported directly from libcompression. As source data, libcompression expects a pointer in the form of an UnsafePointer<UInt8> together with the size of the input buffer, so we pass these along as well. The function returns an optional Data object, indicating success or failure, which is passed through the withUnsafeBytes closure back to the caller.

The new Swift 3 feature withUnsafeBytes is very convenient here, giving us a typed pointer, as expected by compression_stream_init, as we will see later. Without it we would have to do some ugly cast acrobatics, which would look roughly like this:

let ptr: UnsafePointer<UInt8> = (self as NSData).bytes.assumingMemoryBound(to: UInt8.self)


Implementation

fileprivate func perform(modus: compression_stream_operation, source: UnsafePointer<UInt8>, sourceSize: Int) -> Data?
{
    guard modus == COMPRESSION_STREAM_ENCODE || sourceSize > 0 else { return nil }

First we add a fast error return in case of decompressing empty data.

    let streamBase = UnsafeMutablePointer<compression_stream>.allocate(capacity: 1)
    defer { streamBase.deallocate(capacity: 1) }
    var stream = streamBase.pointee
    
    let status = compression_stream_init(&stream, modus, COMPRESSION_ZLIB)
    guard status != COMPRESSION_STATUS_ERROR else { return nil }

The compression_stream_init function expects a pointer to an uninitialized compression_stream struct. We could allocate it on the stack, since it won't leave this function, but that would require calling its constructor and initializing the stream fields, which, according to the documentation of compression_stream_init, can be overwritten by the call and would have to be set again. That is kind of ugly, so we just allocate the space, pass the pointer, and set the source and destination fields after the compression_stream_init call.

As memory management demands, the allocated space has to be freed again. We use Swift's defer statement here to make sure this happens when the function returns. Very convenient when there are multiple return paths, which will be the case.

defer { compression_stream_destroy(&stream) }

A call to compression_stream_init demands a corresponding call to compression_stream_destroy in the future. We accomplish that again with a defer statement.

    let bufferSize = Swift.max( Swift.min(sourceSize, 64 * 1024), 64)
    let buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: bufferSize)
    defer { buffer.deallocate(capacity: bufferSize) }
    
    stream.dst_ptr  = buffer
    stream.dst_size = bufferSize
    stream.src_ptr  = source
    stream.src_size = sourceSize

Since we use the stream interface, it is not that important how much space we allocate for the loop buffer, but we still don't want to allocate too large a chunk. So we go with the size of the source buffer, unless it is larger than 64 KB. This results in one or two loop iterations for small data, and reasonable chunk sizes for large data. (The zlib block operations use 64 KB as well.)
We also make sure to use a buffer of at least 64 bytes, to avoid strange behavior with very short data. Compressing empty data is also possible, so we certainly don't want a zero-byte buffer here.
And of course this buffer has to be deallocated as well.
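The sizing rule can be captured in a tiny standalone helper for experimentation (the name chunkSize is made up for illustration):

```swift
// Clamp the loop buffer to the range 64 bytes ... 64 KB,
// mirroring the rule used in the perform function.
func chunkSize(forSourceSize sourceSize: Int) -> Int {
    return Swift.max(Swift.min(sourceSize, 64 * 1024), 64)
}
```

So empty or tiny inputs still get a 64-byte buffer, medium inputs are processed in a single pass, and anything above 64 KB is processed in 64 KB chunks.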

    var res = Data()
    let flags: Int32 = Int32(COMPRESSION_STREAM_FINALIZE.rawValue)
    
    while true {
        switch compression_stream_process(&stream, flags) {
            
        case COMPRESSION_STATUS_OK:
            guard stream.dst_size == 0 else { return nil }
            res.append(buffer, count: bufferSize)
            stream.dst_ptr = buffer
            stream.dst_size = bufferSize
            
        case COMPRESSION_STATUS_END:
            res.append(buffer, count: stream.dst_ptr - buffer)
            return res
            
        default:
            return nil
        }
    }

Last but not least, the block processing loop. We accumulate the processed data in res, which will be our result in case of success. compression_stream_process is called as long as there is still data to process, returning COMPRESSION_STATUS_OK when the destination buffer is full and COMPRESSION_STATUS_END when all source data has been processed. We only expect COMPRESSION_STATUS_OK in case of a full destination buffer, so we guard against the other cases that can occur with corrupted data, which would otherwise result in an infinite loop. Yes, corrupted data does not automatically result in COMPRESSION_STATUS_ERROR. If the buffer is full, we simply append the processed data to the result and reuse our chunk buffer by resetting the pointer and size.

That completes our support for raw deflate streams according to RFC-1951. At this point our extension should look like this:

import Foundation
import Compression

extension Data
{
    /// Compresses the data using the zlib deflate algorithm.
    /// - returns: raw deflated data according to RFC-1951
    /// - note: Compression level 5 (trade off between compression speed and compression ratio)
    public func deflate() -> Data? {
        return self.withUnsafeBytes { (sourcePtr: UnsafePointer<UInt8>) -> Data? in
            return perform(modus: COMPRESSION_STREAM_ENCODE, source: sourcePtr, sourceSize: count)
        }
    }
    
    /// Decompresses the data using the zlib deflate algorithm. Self is expected to be a raw deflate
    /// stream according to RFC-1951.
    /// - returns: uncompressed data
    public func inflate() -> Data? {
        return self.withUnsafeBytes { (sourcePtr: UnsafePointer<UInt8>) -> Data? in
            return perform(modus: COMPRESSION_STREAM_DECODE, source: sourcePtr, sourceSize: count)
        }
    }
}



fileprivate func perform(modus: compression_stream_operation, source: UnsafePointer<UInt8>, sourceSize: Int) -> Data?
{
    guard modus == COMPRESSION_STREAM_ENCODE || sourceSize > 0 else { return nil }
    
    let streamBase = UnsafeMutablePointer<compression_stream>.allocate(capacity: 1)
    defer { streamBase.deallocate(capacity: 1) }
    var stream = streamBase.pointee
    
    let status = compression_stream_init(&stream, modus, COMPRESSION_ZLIB)
    guard status != COMPRESSION_STATUS_ERROR else { return nil }
    defer { compression_stream_destroy(&stream) }
    
    // will most likely result in 1 or 2 loops for small data
    let bufferSize = Swift.max( Swift.min(sourceSize, 64 * 1024), 64)
    let buffer = UnsafeMutablePointer<UInt8>.allocate(capacity: bufferSize)
    defer { buffer.deallocate(capacity: bufferSize) }
    
    stream.dst_ptr  = buffer
    stream.dst_size = bufferSize
    stream.src_ptr  = source
    stream.src_size = sourceSize
    
    var res = Data()
    let flags: Int32 = Int32(COMPRESSION_STREAM_FINALIZE.rawValue)
    
    while true {
        switch compression_stream_process(&stream, flags) {
            
        case COMPRESSION_STATUS_OK:
            guard stream.dst_size == 0 else { return nil }
            res.append(buffer, count: stream.dst_ptr - buffer)
            stream.dst_ptr = buffer
            stream.dst_size = bufferSize
            
        case COMPRESSION_STATUS_END:
            res.append(buffer, count: stream.dst_ptr - buffer)
            return res
            
        default:
            return nil
        }
    }
}

Supporting the zlib stream format according to RFC-1950

Adding support for the slightly more common zlib stream data format is desirable, since we can reuse everything written above. But we will have to do some hashing and header parsing, which means some pointer arithmetic... Let's do it!

The tl;dr version of the zlib stream format as defined in RFC-1950 goes like this:

[CMFFLG + ZDATA + CKSUM]
where
CMFFLG: a 2 byte header indicating stream type and compression level class.
ZDATA: the deflated data according to RFC-1951
CKSUM: a 4 byte adler32 checksum of the uncompressed data
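Decoding the individual CMFFLG fields can be sketched like this, using the common header bytes 0x78 0x9c as an example:

```swift
// CMF = 0x78: low 4 bits (CM) = 8 means deflate,
// high 4 bits (CINFO) = 7 means a 2^(7+8) = 32 KB window.
// FLG = 0x9c: top 2 bits (FLEVEL) = 2 means default compression,
// and CMF*256 + FLG must be a multiple of 31 (FCHECK).
let cmf: UInt8 = 0x78
let flg: UInt8 = 0x9c

let method = cmf & 0b1111
let windowSize = 1 << (Int(cmf >> 4) + 8)
let flevel = flg >> 6
let headerValid = (Int(cmf) * 256 + Int(flg)) % 31 == 0
```

We will reuse the method check and the multiple-of-31 check in the unzip implementation below.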

So first we need a method to calculate the adler32 checksum.

extension Data {
    public func adler32() -> UInt32 {
        var s1: UInt32 = 1
        var s2: UInt32 = 0
        let prime: UInt32 = 65521
        
        for byte in self {
            s1 += UInt32(byte)
            if s1 >= prime { s1 = s1 % prime }
            s2 += s1
            if s2 >= prime { s2 = s2 % prime }
        }
        
        return (s2 << 16) | s1
    }
}

This would be a hasty implementation. For hash functions, you would normally want a highly optimized version that is already available through a library. At this point we regret a little that we chose libcompression over zlib directly, since zlib offers an interface to calculate adler32 checksums. However, unlike crc32 and friends, adler32 is simple enough that a straightforward implementation will do, and we want to keep it simple here.
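We can sanity-check the algorithm against a known adler32 test vector. Here is a standalone, array-based variant of the extension above for easy experimentation:

```swift
// Same algorithm as the Data.adler32() extension, but over a plain byte array.
func adler32(_ bytes: [UInt8]) -> UInt32 {
    var s1: UInt32 = 1
    var s2: UInt32 = 0
    let prime: UInt32 = 65521
    for byte in bytes {
        s1 = (s1 + UInt32(byte)) % prime
        s2 = (s2 + s1) % prime
    }
    return (s2 << 16) | s1
}

// Well known test vector: the ASCII string "Wikipedia" hashes to 0x11E60398.
let checksum = adler32(Array("Wikipedia".utf8))
```

Note that the checksum of empty input is 1, which matches the trailer of a zlib-compressed empty stream.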

Unzipping the stream requires separating the header, data and checksum; validating the header; inflating the data; and finally validating the checksum. It looks like this:

    public func unzip(skipCheckSumValidation: Bool = false) -> Data?
    {
        // 2 byte header + 4 byte adler32 checksum
        let overhead = 6
        guard count > overhead else { return nil }
        
        let header: UInt16 = withUnsafeBytes { (ptr: UnsafePointer<UInt16>) -> UInt16 in
            return ptr.pointee.bigEndian
        }
        
        // check the compression method field (CM) for deflate (8)
        guard header >> 8 & 0b1111 == 0b1000 else { return nil }
        // check the header checksum
        guard header % 31 == 0 else { return nil }
       
        // inflate the data
        let cresult: Data? = withUnsafeBytes { (ptr: UnsafePointer<UInt8>) -> Data? in
            let source = ptr.advanced(by: 2)
            return perform(modus: COMPRESSION_STREAM_DECODE, source: source, sourceSize: count - overhead)
        }
        
        guard let inflated = cresult else { return nil }
        
        if skipCheckSumValidation { return inflated }

        // get the checksum from the end of the stream
        let cksum: UInt32 = withUnsafeBytes { (bytePtr: UnsafePointer<UInt8>) -> UInt32 in
            let last = bytePtr.advanced(by: count - 4)
            return last.withMemoryRebound(to: UInt32.self, capacity: 1) { (intPtr) -> UInt32 in
                return intPtr.pointee.bigEndian
            }
        }

        return cksum == inflated.adler32() ? inflated : nil
    }

Requiring more than 6 bytes should be self-evident, so that we can later safely dereference the header (2 bytes) and the checksum (4 bytes). RFC-1950 of course defines the header and checksum in network byte order, so using .bigEndian after dereferencing is crucial.
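The byte order handling can be verified in isolation with a plain byte array; recent Swift also offers UnsafeRawBufferPointer.load(as:) as an alternative to the rebinding dance:

```swift
// Four bytes in network byte order, as the adler32 trailer is stored.
let trailer: [UInt8] = [0xDE, 0xAD, 0xBE, 0xEF]

let value: UInt32 = trailer.withUnsafeBytes { raw in
    // load(as:) reads in host byte order; UInt32(bigEndian:) then
    // converts from big endian to host order (a byte swap on
    // little endian CPUs, a no-op on big endian ones)
    UInt32(bigEndian: raw.load(as: UInt32.self))
}
```

Without the conversion, a little endian machine would read the trailer backwards and every checksum comparison would fail.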

RFC-1950 states:

When viewed as a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG) must be a multiple of 31.

So we check for that, and for the deflate indication bit as well, advance the data pointer past the header, and pass that to our previously implemented perform function for inflation. Last but not least, we perform the checksum validation.

withUnsafeBytes does a great job again at cherry-picking the desired bytes here. Of course it looks a little more complicated than a C implementation, but only slightly. Advancing a byte pointer (advanced(by:)), rebinding it to another typed pointer (withMemoryRebound(to:)) and then dereferencing indeed gives you the infamous bare metal feeling. A C implementation of this should not require more than 2 or 3 assembler instructions. It would be interesting to see how much overhead Swift adds. Let's take a look...

Compiling only a minimal version of the pointer advancing, rebinding and dereferencing part as a standalone function with this command:

swiftc -c -O -emit-assembly tmp.swift

will reveal the assembly instructions. An unoptimized compile does indeed leave in all the closure parts and the function calls to advanced and withMemoryRebound, but after providing the -O flag, we get very interesting results:

func foo(data: Data) -> UInt32 {
    return data.withUnsafeBytes { (bytePtr: UnsafePointer<UInt8>) -> UInt32 in
        let last = bytePtr.advanced(by: data.count - 4)
        return last.withMemoryRebound(to: UInt32.self, capacity: 1) { (intPtr) -> UInt32 in
            return intPtr.pointee.bigEndian
        }
    }
}

boils down to:

// data.count - 4, rax is now our offset
subq    $4, %rax    
// subtraction overflow check        
jo      LBB6_1  
// load the pointee in the %ebx register, assuming $r14 contains the bytePtr at the start position           
movl    (%r14,%rax), %ebx   
// unrelated memory management
movq    %r15, %rdi     
// unrelated memory management     
callq   _rt_swift_release   
// big endian to little endian byte swap
bswapl  %ebx                

A very impressive optimization result from the Swift compiler and LLVM. No closures and no function calls to advanced or withMemoryRebound remain. Really nothing left to be desired, and a great indicator that Swift does indeed deserve the title of systems language. A C implementation would result in almost the same assembler instructions, only skipping the overflow check, since it is not part of any C standard (and of course without the memory management instructions).

Back to the zip and unzip implementation. Compressing data conforming to RFC-1950 is next.

    public func zip() -> Data?
    {
        var res = Data(bytes: [0x78, 0x5e])
        
        guard let deflated = self.deflate() else { return nil }
        res.append(deflated)
        
        var adler = self.adler32().bigEndian
        res.append(Data(bytes: &adler, count: MemoryLayout<UInt32>.size))
        
        return res
    }

The header contains a compression level field, but since libcompression pins us to level 5, for simplicity we can just use a static header here. 0x785e has FLEVEL set to 1 ("fast algorithm"); the field is purely informational, so any valid header works. Then we append the deflated data, append the checksum, and we are finally done.
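How such a static header can be derived from the RFC-1950 rules can be sketched as follows (the helper name zlibHeader is hypothetical):

```swift
// Build a CMFFLG header: CMF fixed to 0x78 (deflate, 32 KB window),
// FLEVEL in the top two bits of FLG, FDICT = 0, and FCHECK chosen
// so that CMF*256 + FLG becomes a multiple of 31.
func zlibHeader(flevel: UInt8) -> (cmf: UInt8, flg: UInt8) {
    let cmf: UInt8 = 0x78
    var flg: UInt8 = flevel << 6
    let rem = (UInt16(cmf) * 256 + UInt16(flg)) % 31
    if rem != 0 { flg += UInt8(31 - rem) }
    return (cmf, flg)
}
```

FLEVEL 1 yields the 0x78 0x5e bytes used above, FLEVEL 2 the common 0x78 0x9c produced by zlib's default settings, and FLEVEL 3 the 0x78 0xda of maximum compression.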

Still left to do would be some test cases. Distributing this extension with the new Swift package manager would be an interesting topic for a follow-up post.

For the time being, you can find the complete extension on github. This version also offers support for other compression algorithms like LZMA, LZ4, etc.
https://github.com/mw99/SwiftDataCompression

Conclusion

Since compression and decompression are such common tasks, it is surprising that these routines are not already provided by the standard library. Apple should provide its own extension wrapper for the Data type: all major scripting languages ship easy wrappers around zlib, gzip, etc., and Swift still lacks in that regard. Providing a new library with nothing but an archaic C interface leaves something to be desired.

Besides that, using pointers in Swift is no rocket science, which makes it very easy to connect C libraries and Swift code from within Swift itself. Other "high level languages" famously lack in that regard, and Swift really shines here as a modern systems language. Still, pointer arithmetic and manual memory allocation in Swift are unfamiliar territory for most, so analyzing an extension like this one with a leak checker or Valgrind is essential.
