2022-02-22

scheme-bytestructures: Guile library spotlight

In this post, we will take a quick look at scheme-bytestructures, a library for reading and writing structured binary data in Guile and other Scheme implementations.

Library homepage: scheme-bytestructures

scheme-bytestructures is really easy to use. With this library, handling binary data in Scheme can be more pleasant than doing it in C. Combined with macros, it lets you describe and read binary layouts almost at the speed of thought. The macro API provided by scheme-bytestructures does its work at macro-expansion time, so it should add no runtime overhead compared to writing the binary-interpreting code yourself. You get speed of development without giving up performance.

To install scheme-bytestructures, you can head over to the official page, download the tarball from the releases section, cd to the extracted directory, and run:

autoreconf -vif # initialize autotools
./configure # configure build system
make # build the library
sudo make install # install the library to the default path

Alternatively, you can install it using Guix by running:

guix package -i guile-bytestructures

You should now be able to import scheme-bytestructures using:

(use-modules (bytestructures guile))

Now, let's use the library in a real-world scenario: reading the file header of a .wav file. WAV files are used to store lossless audio and are very common in audio programming and production. They contain audio data in the linear pulse-code modulation (LPCM) format. This is a very simple way to store audio; feel free to read more about it if you are interested.

The header of the WAV file provides information on how to interpret the raw binary data in the file. In this post, we want to parse this header. The format of the header is described here: WAV audio format. Note that it consists of strings and integers, which is common for most binary data.

Using scheme-bytestructures, we simply replicate the structure described on the webpage in Scheme. We describe the format with a bs:struct descriptor, which takes a list of fields. Each entry is a two-element list: the name of the field, followed by the type (descriptor) of its value.
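As a smaller warm-up, here is a minimal sketch of a two-field descriptor built with the library's procedural API. The name riff-prefix and its cut-down layout are made up for illustration; only bs:struct, bs:string, uint32, and bytestructure-descriptor-size come from the library.

```scheme
(use-modules (bytestructures guile))

;; Hypothetical two-field layout: a 4-byte tag followed by a 32-bit size.
;; Each entry is a two-element list: (field-name descriptor).
(define riff-prefix
  (bs:struct `((filetype ,(bs:string 4 'utf8))
               (filesize ,uint32))))

;; A descriptor knows its total size in bytes: 4 for the string
;; plus 4 for the integer.
(define riff-prefix-size (bytestructure-descriptor-size riff-prefix))
```

Checking the size of a descriptor like this is a cheap way to confirm that a layout matches the byte counts given in a format specification.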

We can define this using the bytestructures macro API like this:

(define-bytestructure-accessors
  (bs:struct `((filetype ,(bs:string 4 'utf8))
               (filesize ,uint32)
               (filetype-header ,(bs:string 4 'utf8))
               (format-chunk-marker ,(bs:string 4 'utf8))
               (format-chunk-length ,uint32)
               (format-type ,uint16)
               (num-channels ,uint16)
               (sample-freq ,uint32)
               (bytes/sec ,uint32)
               (block-alignment ,uint16)
               (bits-per-sample ,uint16)
               (data-chunk-header ,(bs:string 4 'utf8))
               (data-chunk-size ,uint32)))
    wav-header-unwrap wav-header-ref wav-header-set! wav-header-ref* wav-header-set!*)

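Now to explain the code. Let's look at the first element in the struct's field list. We name it "filetype" and declare that its value is a string that is 4 bytes long and encoded as UTF-8. We call the next field "filesize" and declare that it is an unsigned 32-bit integer. And so on until the end of the header. One caveat: uint32 and uint16 use the machine's native byte order, while WAV headers are little-endian, so on a big-endian machine the explicitly little-endian descriptors (uint32le, uint16le) would be the safer choice.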

The identifiers (e.g., wav-header-unwrap) at the bottom represent macros that the define-bytestructure-accessors macro will define for us.

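wav-header-unwrap resolves a field to its location in a bytevector: given a bytevector, a starting offset, and a field name, it gives back the offset of that field (along with the bytevector and the field's descriptor). We can use it to find out that filetype-header starts 8 bytes from the start of the binary data.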
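The offset calculation can be sketched with the procedural counterpart of the unwrapper, bytestructure-unwrap. This is not the macro itself, just an illustration under the assumption that both resolve fields the same way; header-prefix and the other names defined here are hypothetical.

```scheme
(use-modules (bytestructures guile)
             (rnrs bytevectors))

;; First three fields of the header, enough to reach filetype-header.
(define header-prefix
  (bs:struct `((filetype ,(bs:string 4 'utf8))
               (filesize ,uint32)
               (filetype-header ,(bs:string 4 'utf8)))))

;; Wrap a zeroed 12-byte bytevector so the procedural API can be used.
(define prefix-bs
  (make-bytestructure (make-bytevector 12 0) 0 header-prefix))

;; bytestructure-unwrap returns three values: the bytevector, the byte
;; offset of the indexed field, and that field's descriptor.
(define filetype-header-offset
  (call-with-values
      (lambda () (bytestructure-unwrap prefix-bs 'filetype-header))
    (lambda (bytevector offset descriptor) offset)))
```

The offset is 8 because the field is preceded by a 4-byte string and a 4-byte integer.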

wav-header-ref, wav-header-set!, wav-header-ref*, and wav-header-set!* help us access and modify the contents of the binary data. The starred variants additionally take an offset into the bytevector, for cases where the structured data is embedded partway into a larger bytevector.
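The idea of reading structured data that starts partway into a bytevector can also be sketched with the procedural API, where make-bytestructure takes the offset explicitly. This is an illustration rather than the starred macros themselves; the chunk-header layout, the 16-byte buffer, and the value 1000 are all made up.

```scheme
(use-modules (bytestructures guile)
             (rnrs bytevectors))

;; Hypothetical chunk layout: a 4-byte marker and a 32-bit size.
(define chunk-header
  (bs:struct `((marker ,(bs:string 4 'utf8))
               (size ,uint32))))

;; A 16-byte buffer whose first 8 bytes are unrelated padding; the
;; chunk header starts at byte offset 8.
(define buffer (make-bytevector 16 0))
(bytevector-copy! (string->utf8 "data") 0 buffer 8 4)
(bytevector-u32-native-set! buffer 12 1000)

;; make-bytestructure takes the bytevector, the starting offset, and
;; the descriptor that describes the bytes found there.
(define chunk (make-bytestructure buffer 8 chunk-header))

(define chunk-marker (bytestructure-ref chunk 'marker))
(define chunk-size (bytestructure-ref chunk 'size))
```

Here the marker and size are read relative to offset 8, so the padding in front of the chunk never has to be copied away.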

For example, we can use wav-header-ref like this:

(wav-header-ref bytevector filesize)

This returns the value of the filesize field in the bytevector as an integer.
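Writing works the same way, with the new value as the last argument. Below is a small sketch on a cut-down, hypothetical two-field layout; the prefix-* names are invented for this example, and the cross-check against bytevector-u32-native-ref assumes the field sits at byte offset 4, right after the 4-byte string.

```scheme
(use-modules (bytestructures guile)
             (rnrs bytevectors))

;; Hypothetical cut-down layout; all prefix-* names are made up here.
(define-bytestructure-accessors
  (bs:struct `((filetype ,(bs:string 4 'utf8))
               (filesize ,uint32)))
  prefix-unwrap prefix-ref prefix-set! prefix-ref* prefix-set!*)

(define bv (make-bytevector 8 0))

;; Write the filesize field, then read it back two ways.
(prefix-set! bv filesize 36)
(define size-via-accessor (prefix-ref bv filesize))

;; The same bytes read by hand: filesize follows the 4-byte string.
(define size-via-rnrs (bytevector-u32-native-ref bv 4))
```

Both reads should agree, which shows that the accessor is only a named view over the same bytes.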

Now, let's use it on a real WAV file! First, we read a WAV file as a bytevector. Then we use the wav-header-ref macro to construct an alist representing the header, with each key being a symbol that names the field.

(use-modules (ice-9 binary-ports))

(define data (call-with-input-file "REC00056.WAV" get-bytevector-all))

(list
 (cons 'filetype (wav-header-ref data filetype))
 (cons 'filesize (wav-header-ref data filesize))
 (cons 'filetype-header (wav-header-ref data filetype-header))
 (cons 'format-chunk-marker (wav-header-ref data format-chunk-marker))
 (cons 'format-chunk-length (wav-header-ref data format-chunk-length))
 (cons 'format-type (wav-header-ref data format-type))
 (cons 'num-channels (wav-header-ref data num-channels))
 (cons 'sample-freq (wav-header-ref data sample-freq))
 (cons 'bytes/sec (wav-header-ref data bytes/sec))
 (cons 'block-alignment (wav-header-ref data block-alignment))
 (cons 'bits-per-sample (wav-header-ref data bits-per-sample))
 (cons 'data-chunk-header (wav-header-ref data data-chunk-header))
 (cons 'data-chunk-size (wav-header-ref data data-chunk-size)))

This evaluates to

((filetype . "RIFF")
 (filesize . 30536556)
 (filetype-header . "WAVE")
 (format-chunk-marker . "fmt ")
 (format-chunk-length . 16)
 (format-type . 1)
 (num-channels . 2)
 (sample-freq . 44100)
 (bytes/sec . 264600)
 (block-alignment . 6)
 (bits-per-sample . 24)
 (data-chunk-header . "data")
 (data-chunk-size . 30536520))

Wasn't that easy?
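A quick way to gain confidence in the parsed values is to check that they agree with each other. The sketch below uses plain Guile with the numbers copied from the output above; the helper name field and the *-ok? flags are made up, and the relationships assume the canonical 44-byte PCM header described in this post.

```scheme
;; Header values copied from the parsed output.
(define header
  '((filesize . 30536556)
    (num-channels . 2)
    (sample-freq . 44100)
    (bytes/sec . 264600)
    (block-alignment . 6)
    (bits-per-sample . 24)
    (data-chunk-size . 30536520)))

(define (field name) (assq-ref header name))

(define bytes-per-sample (/ (field 'bits-per-sample) 8))

;; One block holds one sample for every channel.
(define block-ok?
  (= (field 'block-alignment)
     (* (field 'num-channels) bytes-per-sample)))

;; The byte rate is the block size times the sample rate.
(define rate-ok?
  (= (field 'bytes/sec)
     (* (field 'sample-freq) (field 'block-alignment))))

;; filesize counts everything after the first 8 bytes of the file, so
;; it exceeds the data size by the remaining 36 header bytes.
(define size-ok?
  (= (field 'filesize) (+ (field 'data-chunk-size) 36)))

;; Playing time in seconds: data bytes divided by bytes per second.
(define duration-seconds
  (exact->inexact (/ (field 'data-chunk-size) (field 'bytes/sec))))
```

All three checks hold for this file, and the duration works out to a little under two minutes of stereo, 24-bit audio.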

Bonus: Macros

Generating the alist took a lot of copying. Let's try to do the same thing using a macro!

(define-syntax define-wav-header-parser
  (syntax-rules ()
    ((_ parser-name (field-name field-type) ...)
     (begin
       (define-bytestructure-accessors
         (bs:struct `((field-name ,field-type) ...))
         not-used-1 ref-macro not-used-2 not-used-3 not-used-4)
       (define (parser-name bv)
         (list
          (cons 'field-name (ref-macro bv field-name)) ...))))))

(define-wav-header-parser wav-header-parse
  (filetype (bs:string 4 'utf8))
  (filesize uint32)
  (filetype-header (bs:string 4 'utf8))
  (format-chunk-marker (bs:string 4 'utf8))
  (format-chunk-length uint32)
  (format-type uint16)
  (num-channels uint16)
  (sample-freq uint32)
  (bytes/sec uint32)
  (block-alignment uint16)
  (bits-per-sample uint16)
  (data-chunk-header (bs:string 4 'utf8))
  (data-chunk-size uint32))

(wav-header-parse data) ;; same output as above

That's all you need to have a function that parses a WAV file header. Note that most of the code just describes how the WAV file is structured. Actually parsing it took no code at all.

Conclusion

This post was a preview of scheme-bytestructures, which provides all you need to easily parse binary data in Guile Scheme. Note that I have only covered part of the library. There is also a runtime API that doesn't need macros. Additionally, the library supports additional data types such as unions, vectors, arrays, pointers, etc.
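To give a taste of those other pieces, here is a minimal sketch of the runtime API with a vector and a union. The variable and field names are made up for illustration, and the example assumes the one-argument form of bytestructure, which allocates a fresh zero-filled bytevector for the descriptor.

```scheme
(use-modules (bytestructures guile))

;; A vector descriptor: four signed 16-bit values in a row.
(define samples (bytestructure (bs:vector 4 int16)))
(bytestructure-set! samples 2 -123)
(define third-sample (bytestructure-ref samples 2))

;; A union descriptor: the same byte viewed as unsigned or signed.
(define byte-view
  (bytestructure (bs:union `((as-unsigned ,uint8)
                             (as-signed ,int8)))))
(bytestructure-set! byte-view 'as-unsigned 255)
(define signed-value (bytestructure-ref byte-view 'as-signed))
```

In the runtime API, indices are ordinary values (integers for vectors, quoted symbols for struct and union fields), so they can be computed while the program runs.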