In this post, we will take a quick look at scheme-bytestructures, a library for reading and writing structured binary data in Guile and other Schemes.
Library homepage: scheme-bytestructures
scheme-bytestructures is really easy to use. With it, handling binary data in Scheme becomes more pleasant than in C. Combined with macros, it lets you handle binary data almost at the speed of thought. The macro API provided by scheme-bytestructures ensures that there is zero runtime overhead compared with writing the binary-interpreting code yourself. You get both speed of development and performance, without trading one for the other.
To install scheme-bytestructures, you can head over to the official page, download the tarball from the releases section, cd to the extracted directory, and run:
autoreconf -vif # initialize autotools
./configure # configure build system
make # build the library
sudo make install # install the library to the default path
Alternatively, you can install it using Guix by running:
guix package -i guile-bytestructures
You should now be able to import scheme-bytestructures using:
(use-modules (bytestructures guile))
Now, let's use the library for a real-world scenario: reading the file header of a .wav file. WAV files are used to store lossless audio, and are very common in audio programming and production. They typically contain audio data in the linear pulse-code modulation (LPCM) format. This is a very simple way to store audio; feel free to read more about it if you are interested.
The header of the WAV file provides information on how to interpret the raw binary data in the file. In this post, we want to parse this header. The format of the header is described here: WAV audio format. Note that it consists of strings and integers, which is common for most binary data.
Using scheme-bytestructures, we simply replicate the structure described on the webpage in Scheme. We describe the format using a bs:struct descriptor, which takes a list of fields. Each entry pairs the "name" of the field with the descriptor of its type.
We can define this using the bytestructures macro API like this:
(define-bytestructure-accessors
  (bs:struct `((filetype ,(bs:string 4 'utf8))
               (filesize ,uint32)
               (filetype-header ,(bs:string 4 'utf8))
               (format-chunk-marker ,(bs:string 4 'utf8))
               (format-chunk-length ,uint32)
               (format-type ,uint16)
               (num-channels ,uint16)
               (sample-freq ,uint32)
               (bytes/sec ,uint32)
               (block-alignment ,uint16)
               (bits-per-sample ,uint16)
               (data-chunk-header ,(bs:string 4 'utf8))
               (data-chunk-size ,uint32)))
  wav-header-unwrap wav-header-ref wav-header-set! wav-header-ref* wav-header-set!*)
Now to explain the code. Let's look at the first element in the struct's field list. We name it "filetype" and declare that its value is a string that is 4 bytes long and encoded as UTF-8. We call the next field "filesize" and declare that it is an unsigned 32-bit integer. And so on until the end of the header.
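One thing the description leaves implicit is byte order. WAV files store their integers little-endian, while the plain uint16 and uint32 descriptors follow, as far as I know, the byte order of the machine running the code. The sketch below is one way to make that explicit. It assumes the library's little-endian descriptors uint16le and uint32le and its runtime procedure bytestructure-descriptor-size; the name wav-header-descriptor is made up for this example. It also asks the library for the total size, which should come out as 44 bytes if the fields are laid out without padding (four 4-byte strings, five 32-bit integers, and four 16-bit integers).

```scheme
(use-modules (bytestructures guile))

;; Hypothetical name. Same layout as the header described in this post,
;; but with explicit little-endian integer descriptors (assumed to be
;; exported by the library as uint16le and uint32le).
(define wav-header-descriptor
  (bs:struct `((filetype ,(bs:string 4 'utf8))
               (filesize ,uint32le)
               (filetype-header ,(bs:string 4 'utf8))
               (format-chunk-marker ,(bs:string 4 'utf8))
               (format-chunk-length ,uint32le)
               (format-type ,uint16le)
               (num-channels ,uint16le)
               (sample-freq ,uint32le)
               (bytes/sec ,uint32le)
               (block-alignment ,uint16le)
               (bits-per-sample ,uint16le)
               (data-chunk-header ,(bs:string 4 'utf8))
               (data-chunk-size ,uint32le))))

;; Total size of the described structure, in bytes.
(display (bytestructure-descriptor-size wav-header-descriptor))
(newline)
```

On a little-endian machine (which covers most desktops and laptops) the two spellings should behave the same, so this only matters if the code has to run elsewhere.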
The identifiers (e.g., wav-header-unwrap) at the bottom represent macros that the define-bytestructure-accessors macro will define for us.
wav-header-unwrap helps us get the offset of a specific item into a bytevector. We can use it to know that filetype-header starts 8 bytes from the start of the binary data.
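Here is a minimal sketch of such an offset lookup. It rests on an assumption about the unwrapper's calling convention: that it takes a bytevector (which can be a dummy value when we only care about the offset), a starting offset, and the field name, and returns three values, namely the bytevector, the field's offset, and the field's descriptor. The riff-* names are made up for this example, and only the first three header fields are declared to keep it short.

```scheme
(use-modules (bytestructures guile))

;; Hypothetical accessor names; only the first three header fields.
(define-bytestructure-accessors
  (bs:struct `((filetype ,(bs:string 4 'utf8))
               (filesize ,uint32)
               (filetype-header ,(bs:string 4 'utf8))))
  riff-unwrap riff-ref riff-set! riff-ref* riff-set!*)

;; Assumed convention: (unwrapper bytevector start-offset field ...)
;; returning three values. #f stands in for a bytevector we never read.
(define filetype-header-offset
  (call-with-values
      (lambda () (riff-unwrap #f 0 filetype-header))
    (lambda (bv offset descriptor) offset)))

;; filetype and filesize take 4 bytes each, so this field sits 8 bytes in.
(display filetype-header-offset)
(newline)
```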
wav-header-ref, wav-header-set!, wav-header-ref*, and wav-header-set!* help us access and modify the contents of the binary data. The star variants take an arbitrary offset into a bytevector, in case you have structured data partially embedded inside a bytevector.
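The exact argument order of the star variants is not spelled out here, so the sketch below shows the same idea (a structure that starts partway into a larger bytevector) through the library's runtime API instead. It assumes make-bytestructure, bytestructure-set!, and bytestructure-ref from the same module; the descriptor and field names are invented for illustration. Note that the runtime API takes field names as quoted symbols.

```scheme
(use-modules (bytestructures guile)
             (rnrs bytevectors))

;; Hypothetical two-field record.
(define pair-descriptor
  (bs:struct `((left ,uint16)
               (right ,uint16))))

;; An 8-byte buffer; pretend the first 4 bytes belong to something else.
(define buffer (make-bytevector 8 0))

;; Wrap the bytes starting at offset 4 with the descriptor.
(define embedded (make-bytestructure buffer 4 pair-descriptor))

;; Write a field, then read it back through the same wrapper.
(bytestructure-set! embedded 'right 513)
(display (bytestructure-ref embedded 'right))
(newline)
```

This goes through a wrapper object at run time, so it does not have the zero-overhead property of the macro API, but it is handy for exploring a format at the REPL.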
For example, we can use wav-header-ref like this:
(wav-header-ref bytevector filesize)
This returns the value of the filesize field in bytevector as an integer.
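If you do not have a WAV file at hand, you can try the getter on a few hand-written bytes first. The sketch below builds the first 12 bytes of a header with procedures from (rnrs bytevectors) and reads them back. The riff-* accessor names and the size value 36 are made up for this example; the integer is written in native byte order to match the plain uint32 descriptor.

```scheme
(use-modules (bytestructures guile)
             (rnrs bytevectors))

;; Hypothetical accessor names; only the first three header fields.
(define-bytestructure-accessors
  (bs:struct `((filetype ,(bs:string 4 'utf8))
               (filesize ,uint32)
               (filetype-header ,(bs:string 4 'utf8))))
  riff-unwrap riff-ref riff-set! riff-ref* riff-set!*)

;; Build 12 bytes by hand: "RIFF", a 32-bit size, "WAVE".
(define bv (make-bytevector 12 0))
(bytevector-copy! (string->utf8 "RIFF") 0 bv 0 4)
(bytevector-u32-native-set! bv 4 36)
(bytevector-copy! (string->utf8 "WAVE") 0 bv 8 4)

;; Read the fields back through the generated getter.
(display (list (riff-ref bv filetype)
               (riff-ref bv filesize)
               (riff-ref bv filetype-header)))
(newline)
```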
Now, let's use it on a real WAV file! First, we read a WAV file as a bytevector. We then use the wav-header-ref macro to construct an alist representing the header, with each key being a symbol that names the field.
(use-modules (ice-9 binary-ports))
;; Open the file in binary mode and read it into a single bytevector.
(define data
  (call-with-input-file "REC00056.WAV" get-bytevector-all #:binary #t))
(list
 (cons 'filetype (wav-header-ref data filetype))
 (cons 'filesize (wav-header-ref data filesize))
 (cons 'filetype-header (wav-header-ref data filetype-header))
 (cons 'format-chunk-marker (wav-header-ref data format-chunk-marker))
 (cons 'format-chunk-length (wav-header-ref data format-chunk-length))
 (cons 'format-type (wav-header-ref data format-type))
 (cons 'num-channels (wav-header-ref data num-channels))
 (cons 'sample-freq (wav-header-ref data sample-freq))
 (cons 'bytes/sec (wav-header-ref data bytes/sec))
 (cons 'block-alignment (wav-header-ref data block-alignment))
 (cons 'bits-per-sample (wav-header-ref data bits-per-sample))
 (cons 'data-chunk-header (wav-header-ref data data-chunk-header))
 (cons 'data-chunk-size (wav-header-ref data data-chunk-size)))
This evaluates to
((filetype . "RIFF")
 (filesize . 30536556)
 (filetype-header . "WAVE")
 (format-chunk-marker . "fmt ")
 (format-chunk-length . 16)
 (format-type . 1)
 (num-channels . 2)
 (sample-freq . 44100)
 (bytes/sec . 264600)
 (block-alignment . 6)
 (bits-per-sample . 24)
 (data-chunk-header . "data")
 (data-chunk-size . 30536520))
Wasn't that easy?
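A few of these fields are redundant, which gives us a cheap way to check that the header was read correctly. The sketch below copies the numbers from the output above into a plain alist and checks three relationships that hold for ordinary PCM files: the block alignment is the size of one sample frame, the byte rate is the sample rate times that frame size, and the RIFF size field is the data size plus the 36 header bytes that follow it. The names header, field, frame-bytes, and checks are made up for this example, and it needs nothing beyond core Guile.

```scheme
;; Numeric fields copied from the parsed header shown earlier.
(define header
  '((filesize . 30536556)
    (num-channels . 2)
    (sample-freq . 44100)
    (bytes/sec . 264600)
    (block-alignment . 6)
    (bits-per-sample . 24)
    (data-chunk-size . 30536520)))

(define (field name) (assq-ref header name))

;; Bytes per sample frame: one sample for every channel.
(define frame-bytes
  (* (field 'num-channels) (/ (field 'bits-per-sample) 8)))

(define checks
  (list
   (= (field 'block-alignment) frame-bytes)
   (= (field 'bytes/sec) (* (field 'sample-freq) frame-bytes))
   ;; The size field skips the first 8 bytes of the 44-byte header.
   (= (field 'filesize) (+ (field 'data-chunk-size) 36))))

(display checks) ; prints (#t #t #t)
(newline)

;; Playing time in seconds: data bytes divided by bytes per second.
(display (exact->inexact (/ (field 'data-chunk-size) (field 'bytes/sec))))
(newline)
```

The last line works out to roughly 115.4 seconds of audio for this recording.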
Bonus: Macros
Generating the alist took a lot of copying. Let's try to do the same thing using a macro!
(define-syntax define-wav-header-parser
  (syntax-rules ()
    ((_ parser-name (field-name field-type) ...)
     (begin
       (define-bytestructure-accessors
         (bs:struct `((field-name ,field-type) ...))
         not-used-1 ref-macro not-used-2 not-used-3 not-used-4)
       (define (parser-name bv)
         (list
          (cons 'field-name (ref-macro bv field-name)) ...))))))

(define-wav-header-parser wav-header-parse
  (filetype (bs:string 4 'utf8))
  (filesize uint32)
  (filetype-header (bs:string 4 'utf8))
  (format-chunk-marker (bs:string 4 'utf8))
  (format-chunk-length uint32)
  (format-type uint16)
  (num-channels uint16)
  (sample-freq uint32)
  (bytes/sec uint32)
  (block-alignment uint16)
  (bits-per-sample uint16)
  (data-chunk-header (bs:string 4 'utf8))
  (data-chunk-size uint32))

(wav-header-parse data) ;; same output as above
That's all you need to have a function that parses a WAV file header. Note that most of the code just describes how the WAV file is structured. Actually parsing it took no code at all.
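If the ellipsis pattern in the macro looks opaque, it may help to see it in isolation. The toy macro below has nothing to do with the library, and its name alist-of is made up. It takes any number of (name expression) pairs and expands to a list of conses, quoting each name and evaluating each expression, which is the same trick define-wav-header-parser uses to build its result.

```scheme
;; Hypothetical helper: (alist-of (a 1) (b 2)) expands to
;; (list (cons 'a 1) (cons 'b 2)).
(define-syntax alist-of
  (syntax-rules ()
    ((_ (name expr) ...)
     (list (cons 'name expr) ...))))

(define example
  (alist-of (channels 2)
            (rate (* 441 100))))

(display example) ; prints ((channels . 2) (rate . 44100))
(newline)
```

Everything followed by ... in the pattern is repeated once per pair in the expansion, which is why adding a field to the parser only takes one extra line.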
Conclusion
This post was a preview of scheme-bytestructures, which provides all you need to easily parse binary data in Guile Scheme. Note that I have only previewed part of the library. There is a runtime API which doesn't need macros. Additionally, it supports additional data types such as unions, vectors, arrays, pointers, etc.