Extracting NetMD messages with Rust macros
- Reading the disc title
- A cleaner API
- Generating code with macros
- Implementing our own macro
- Error handling
- Testing it out
- Rounding it up
- Final thoughts
- References
I was recently using the amazing Web MiniDisc 💽 tool written by Stefano Brilli, which allows you to rename, upload, delete, and move tracks on a MiniDisc hooked up to a NetMD player via USB. This amazed me not only because it's written entirely in TypeScript, but also because it's a web app, interfacing with NetMD devices via WebUSB and using a compiled version of ffmpeg targeting WebAssembly for audio conversion.
Me being me, this prompted me to make an attempt to rewrite (at least) the protocol in Rust. The Web MiniDisc web app uses the library netmd-js, which the same author ported from linux-minidisc, to interface with NetMD devices.
After a few attempts at reading the netmd-js source code and porting it over to Rust, my program was able to read and print out the disc title and total number of tracks from my Sony MZ-N505 player! I was very happy with the results, except there were a few ugly lines of code that irked me.
My personal Sony MZ-N505 MiniDisc player
Reading the disc title
After sending a query to a NetMD device, the device first responds with a header which tells you how many bytes to expect. The program then allocates a buffer with the aforementioned number of bytes, and the data is written into the buffer.
Each data frame that comes from the device has a known format. In the case of reading disc titles, the format looks like this:
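A template in this notation looks roughly like the following (the literal bytes here are illustrative, not the exact NetMD disc-title reply format):

```
18 06 02 20 18 01 %? %? 30 00 0a 00 %w %w %*
```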
The definitions of the placeholders in our example above are as follows:
- `%?`: any byte (we don't want to parse or match this)
- `%b`: one byte
- `%w`: two bytes (word) in big-endian order
- `%*`: raw `Vec<u8>` buffer
For our case, we want to read:
- two bytes `%w` in big endian, representing the current chunk size
- two bytes `%w` in big endian, representing the total number of chunks
- the rest of the bytes raw `%*`: the disc title of the MiniDisc, Shift JIS encoded (may be incomplete, but we'll ignore that for now)
I've written a function called `scan()` (which is semi-analogous to the original `scanQuery()` function implemented in TypeScript) that:
- returns data matched to their placeholders as a `Vec<&'a [u8]>`
- matches all defined bytes 1-to-1 in order (the non-placeholder part)

Here's the usage of `scan()` and what irked me:
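Here's a self-contained sketch of that code (`scan` and `parse_string` are stubbed out for illustration; the real versions operate on the device's reply buffer):

```rust
// Stub: pretend the device replied with two big-endian words and a title.
// A real implementation would match literal bytes against the template and
// slice out the placeholder regions; here we hard-code the split.
fn scan<'a>(_template: &'static str, data: &'a [u8]) -> Vec<&'a [u8]> {
    vec![&data[0..2], &data[2..4], &data[4..]]
}

fn parse_string(raw: &[u8]) -> String {
    // The real code decodes Shift JIS; plain UTF-8 is enough for a sketch.
    String::from_utf8_lossy(raw).into_owned()
}

fn main() {
    let reply = [0x00u8, 0x01, 0x00, 0x02, b'h', b'i'];
    let data = scan("%w %w %*", &reply);

    // This is the part that irked me: manual destructuring, manual parsing,
    // plus an `unreachable!()` arm just to satisfy the compiler.
    if let [a, b, title] = &data[..] {
        let chunk_size = u16::from_be_bytes([a[0], a[1]]);
        let total_chunks = u16::from_be_bytes([b[0], b[1]]);
        let title = parse_string(title);
        println!("{} {} {}", chunk_size, total_chunks, title);
    } else {
        unreachable!()
    }
}
```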
I didn't like this code, especially the part where I'm destructuring `data` into three raw `&&[u8]`s that I then needed to manually parse into `u16`s and `String`s, not to mention the `unreachable!()`. Rust complains when you do not add the `if let {} else {}` part, as the `let` binding requires an irrefutable pattern match:
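The error looks something like this (abridged; exact wording depends on the Rust version):

```
error[E0005]: refutable pattern in local binding
  = note: `let` bindings require an "irrefutable pattern"
```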
All this manual extraction and type conversion didn't seem sane to me, as I could still make mistakes such as:
- specifying too few destructures because I misread the template
- adding a new placeholder in the template and forgetting to add another destructure
- specifying the wrong `parse_*` types
I thought about adding a generic type `T` and specifying that type at the call site, but the problems above can still happen, and the compiler will not complain:
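A sketch of that idea; the body is left unimplemented since the point is the signature (nothing ties `T` to the template, so a wrong annotation still compiles):

```rust
// Hypothetical generic scan(): the caller picks T, nothing checks it
// against the template, so wrong annotations still compile.
fn scan<'a, T>(_template: &'static str, _data: &'a [u8]) -> T {
    unimplemented!() // stand-in: there's no way to build an arbitrary T
}

fn main() {
    // This compiles even though "%b %w" should be (u8, u16), not (u16, u16);
    // the mistake could only ever surface at runtime.
    let attempt = std::panic::catch_unwind(|| {
        let (_a, _b): (u16, u16) = scan("%b %w", &[0xff, 0x00, 0x01]);
    });
    assert!(attempt.is_err()); // our stub simply panics
}
```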
Either way, I don't think there's a way to return variadic tuples. How can I implement `scan()` in a way that `T` can be a tuple of any length?
If only there were another way...
A cleaner API
Let's wrap up everything that we know so far, and throw (almost) all of it out the window. Personally, I like to delete code and start over: knowing what I want, and what I don't want, from the code that I'm writing.
I would like an API that looks something like this:
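Something along these lines (a wish-list sketch, not working code yet):

```rust
// one macro call: literal bytes are matched for us, and the
// placeholders come back as a typed tuple
let (chunk_size, total_chunks, title) = scan!("%? %? %w %w %*", data)?;
```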
Since the template string literal contains a known number of placeholders and is `'static`, the return type of `scan()` should aptly correspond to the same number of values with their associated types, contained in a tuple. For example:
- `%b de ad be ef %w`: contains 2 values, `(u8, u16)`
- `ca fe ba be`: no placeholders, `()`
- `00 %*`: match-to-end wildcard of raw bytes, `&[u8]`
We can also leverage type-inference to our advantage. In an isolated example, the following code will automatically infer the return types.
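For instance, in this stand-alone snippet (an unrelated helper, purely to show inference at work), the tuple's element types flow from the function's return type into the bindings with no annotations at the call site:

```rust
fn produce() -> (u8, u16) {
    (0xde, 0xbeef)
}

fn main() {
    // no type annotations needed: `a` is inferred as u8, `b` as u16
    let (a, b) = produce();
    assert_eq!(a, 0xde);
    assert_eq!(b, 0xbeef);
}
```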
With that, we want our function `scan()` to return a concrete type which is known at compile time, and it should roughly work like the following:
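Roughly how `scan!("%b %w", data)` should behave once expanded, written by hand (`extract` is a stub standing in for the runtime helper):

```rust
// Stub: slice out the bytes for %b, then %w.
fn extract<'a>(_template: &'static str, data: &'a [u8]) -> Vec<&'a [u8]> {
    vec![&data[0..1], &data[1..3]]
}

fn main() {
    let data: &[u8] = &[0xde, 0xbe, 0xef];
    // the desired expansion: a block that extracts raw bytes, then
    // converts them into a tuple with concrete, compile-time-known types
    let (a, b): (u8, u16) = {
        let extracted = extract("%b %w", data);
        (
            extracted[0][0],                                         // %b
            u16::from_be_bytes([extracted[1][0], extracted[1][1]]),  // %w
        )
    };
    assert_eq!(a, 0xde);
    assert_eq!(b, 0xbeef);
}
```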
Generating code with macros
Macros are a way of writing code that writes other code (very meta). You've probably used macros before, such as the declarative `println!` macro, or the `#[derive(Serialize)]` derive macro from `serde`.
From the previous vision of a cleaner API, we would like some sort of way to automatically implement that block for us, be it none, two, three, or n amounts of placeholders. Turns out, we can use function-like procedural macros to do exactly what we want.
Before getting started, we need a way to inspect our macro output. Fortunately, I didn't have to look far before I stumbled upon the cargo-expand crate. This cool utility allows us to take a peek into expanded macros, just like `rustc` would during compile time.
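As a quick demonstration, take this tiny program that uses the built-in `vec!` macro:

```rust
fn main() {
    let v = vec![1, 2, 3];
    println!("{:?}", v);
}
```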
Running `cargo expand` outputs:
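For a small `main` that builds a `vec!` and prints it, the expansion looks roughly like this (abridged; the exact output varies across Rust versions):

```rust
fn main() {
    let v = <[_]>::into_vec(::std::boxed::Box::new([1, 2, 3]));
    {
        ::std::io::_print(::core::format_args!("{0:?}\n", v));
    };
}
```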
Implementing our own macro
This section assumes prior knowledge of macros and the basics of how they work. I've intentionally left out step-by-step instructions and instead focused on how to formulate the solution to the problem. If in doubt, please refer to reputable references, or get your hands dirty on the source code mentioned at the end of this article.
Remember that our `scan()` function takes two inputs:
- a string template of `&'static str`
- a data buffer of `&[u8]`
We would like our `scan!` macro to:
- match, in order, the bytes defined in the template
- extract all placeholder data into typed data
We can represent the input tokens of the macro with the following `struct`:
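A sketch of the struct and its `syn` parser (field names are my reconstruction, not necessarily the original's):

```rust
use syn::parse::{Parse, ParseStream};
use syn::{Expr, LitStr, Token};

// scan!("ff %b %w", data) -> a string literal, a comma, an expression
struct MacroInput {
    template: LitStr,
    data: Expr,
}

impl Parse for MacroInput {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        let template = input.parse()?;
        input.parse::<Token![,]>()?;
        let data = input.parse()?;
        Ok(MacroInput { template, data })
    }
}
```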
`MacroInput` can then be used in a `generate()` function that returns a `TokenStream`, which is a stream of tokenized representations of Rust code that will be inlined as code. The example below uses the `quote!` macro, which inlines the template string literal and the data as a tuple:
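A preliminary version might look like this (a sketch using the `quote` and `proc-macro2` crates):

```rust
use proc_macro2::TokenStream;
use quote::quote;

fn generate(input: MacroInput) -> TokenStream {
    let template = input.template;
    let data = input.data;
    // for now, just inline both inputs back out as a tuple
    quote! {
        (#template, #data)
    }
}
```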
In a separate crate, we can test our preliminary `scan!` macro by running `cargo expand`, which will show that the macro is expanding to a tuple of the expected type `(&str, &[u8])`!
For the placeholders `%_`, we can use an `enum` representation that facilitates type conversion after extracting data:
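A sketch of the enum (variant names are illustrative, and `%?` is omitted here since it captures nothing):

```rust
#[derive(Debug, PartialEq)]
enum Placeholder {
    Byte, // %b: one byte -> u8
    Word, // %w: two big-endian bytes -> u16
    Rest, // %*: all remaining raw bytes -> &[u8]
}
```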
We then implement `quote::ToTokens` for the newtype `Template`, which converts the enclosed `Vec<Placeholder>` into a `TokenStream` that can then be interpolated:
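Reconstructed as a sketch (the real implementation lives in the source linked at the end of this article):

```rust
use proc_macro2::TokenStream;
use quote::{quote, ToTokens};

struct Template(Vec<Placeholder>);

impl ToTokens for Template {
    fn to_tokens(&self, tokens: &mut TokenStream) {
        let all_tokens: Vec<TokenStream> = self
            .0
            .iter()
            .enumerate()
            .map(|(index, placeholder)| {
                // bind the raw bytes for this placeholder to `value`
                let getter = quote! { let value = extracted[#index]; };
                // convert the raw bytes into the placeholder's type
                let converter = match placeholder {
                    Placeholder::Byte => quote! { value[0] },
                    Placeholder::Word => {
                        quote! { u16::from_be_bytes([value[0], value[1]]) }
                    }
                    Placeholder::Rest => quote! { value },
                };
                quote! {
                    {
                        #getter
                        #converter
                    }
                }
            })
            .collect();
        // emit all the blocks as one tuple: ( { .. }, { .. }, ... )
        tokens.extend(quote! { ( #(#all_tokens),* ) });
    }
}
```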
Let's break it down: we're enumerating the `Vec<Placeholder>` and mapping each entry into tokens, where:
- `#getter` gets the value at index `#index` of the `extracted` variable, and binds the result to a variable `value`
- `#converter` matches the `Placeholder` and performs the associated conversion on the bound `value`
We then assemble the getter and the converter in a `{}` block with `quote!`:
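In isolation, that assembly is just (a fragment of the sketch):

```rust
quote! {
    {
        #getter
        #converter
    }
}
```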
We then collect all the tokens into the variable `all_tokens` of type `Vec<TokenStream>`, and call `tokens.extend()` to extend the existing `TokenStream` by interpolating and repeating `all_tokens`:
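As a fragment (`#(#all_tokens),*` is quote's repetition syntax, joining each block with commas):

```rust
tokens.extend(quote! { ( #(#all_tokens),* ) });
```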
Then, we define a function `parse_template()` that parses the template string literal into our newtype `Template`, and another function `extract()`, used and executed at runtime, which returns all the extracted bytes in order as a `Vec<&'a [u8]>`:
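Self-contained sketches of both helpers (error handling is deferred to the next section, and the `%?` placeholder is omitted for brevity):

```rust
#[derive(Debug, PartialEq)]
enum Placeholder {
    Byte, // %b
    Word, // %w
    Rest, // %*
}

struct Template(Vec<Placeholder>);

// Compile time: turn "ff %b %w" into Template(vec![Byte, Word]),
// validating that literal tokens are well-formed hex bytes.
fn parse_template(template: &str) -> Template {
    let placeholders = template
        .split_whitespace()
        .filter_map(|token| match token {
            "%b" => Some(Placeholder::Byte),
            "%w" => Some(Placeholder::Word),
            "%*" => Some(Placeholder::Rest),
            hex => {
                u8::from_str_radix(hex, 16).expect("invalid hex byte");
                None // literal bytes are matched, not captured
            }
        })
        .collect();
    Template(placeholders)
}

// Runtime: walk the buffer, check literal bytes 1-to-1, and slice out
// the byte ranges belonging to each placeholder, in order.
fn extract<'a>(template: &'static str, data: &'a [u8]) -> Vec<&'a [u8]> {
    let mut extracted = Vec::new();
    let mut pos = 0;
    for token in template.split_whitespace() {
        match token {
            "%b" => {
                extracted.push(&data[pos..pos + 1]);
                pos += 1;
            }
            "%w" => {
                extracted.push(&data[pos..pos + 2]);
                pos += 2;
            }
            "%*" => {
                extracted.push(&data[pos..]);
                pos = data.len();
            }
            hex => {
                let expected = u8::from_str_radix(hex, 16).unwrap();
                assert_eq!(data[pos], expected, "mismatched literal byte");
                pos += 1;
            }
        }
    }
    extracted
}
```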
These two functions can then be used inside our `generate()` function:
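Putting it together, `generate()` might look like this (a sketch):

```rust
fn generate(input: MacroInput) -> TokenStream {
    let literal = input.template;
    let data = input.data;
    // compile time: validate the template and build the Template newtype
    let template = parse_template(&literal.value());
    // the emitted block extracts the raw bytes at runtime, then converts
    // them via #template, which expands through our ToTokens implementation
    quote! {
        {
            let extracted = extract(#literal, #data);
            #template
        }
    }
}
```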
The `Template` that is generated with `parse_template()` is interpolated with the `ToTokens` implementation that we covered before, and the basics are done!
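A quick smoke test of the macro so far (illustrative template and data):

```rust
let data: &[u8] = &[0x2a, 0xde, 0xad, 0x01, 0x00];
let (a, b) = scan!("%b de ad %w", data);
```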
With type inference, `a` is correctly inferred as a `u8`, and `b` as a `u16`!
Error handling
Despite using a healthy dose of `.unwrap()`s, we would ideally return custom `Error` types. This can be done with the help of the fantastic thiserror crate, which does the heavy lifting for us.
For our usage, we want to define two error types:
- `CompileError`, representing compile-time errors, such as invalid hex characters within the template string literal
- `ExtractError`, representing runtime extraction errors, such as missing, mismatched, or residual unparsed data
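A sketch of both with thiserror (variant names and messages are illustrative):

```rust
use thiserror::Error;

#[derive(Error, Debug)]
pub enum CompileError {
    #[error("invalid hex byte `{0}` in template")]
    InvalidHex(String),
}

#[derive(Error, Debug)]
pub enum ExtractError {
    #[error("expected byte {expected:#04x}, got {actual:#04x}")]
    MismatchedByte { expected: u8, actual: u8 },
    #[error("not enough data to fill the template")]
    MissingData,
    #[error("residual unparsed data after matching the template")]
    ResidualData,
}
```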
The two error types can be used in our `parse_template()` and `extract()`, replacing the previous unit type `()`:
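The updated signatures (bodies elided):

```rust
fn parse_template(template: &str) -> Result<Template, CompileError> {
    // ...
}

fn extract<'a>(
    template: &'static str,
    data: &'a [u8],
) -> Result<Vec<&'a [u8]>, ExtractError> {
    // ...
}
```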
We can then update all occurrences of `.unwrap()` to `?` or `.map_err()?` wherever applicable, and update our `generate()` macro entry to propagate errors to the Rust compiler:
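A sketch of the entry point, turning a `CompileError` into a proper compiler diagnostic:

```rust
#[proc_macro]
pub fn scan(input: proc_macro::TokenStream) -> proc_macro::TokenStream {
    let input = syn::parse_macro_input!(input as MacroInput);
    match generate(input) {
        Ok(tokens) => tokens.into(),
        // surface our CompileError as a regular compile error
        Err(err) => syn::Error::new(proc_macro2::Span::call_site(), err)
            .to_compile_error()
            .into(),
    }
}
```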
Now, let's try compiling our example code with `cargo run`:
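We're greeted with an error along these lines (abridged):

```
error[E0277]: the `?` operator can only be used in a function that returns
              `Result` or `Option` (or another type that implements `FromResidual`)
```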
Oops, we hit a compile error.
No worries! We can use `cargo expand` to expand our test code, which helps illustrate the problem:
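The expansion, simplified down to the offending part:

```rust
fn main() {
    let (a, b) = {
        // `?` tries to return from the enclosing function: main()
        let extracted = extract("ff be %b %w", data)?;
        // ...conversions...
    };
}
```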
The block `let (a, b) = { /* ... */ }` uses the `?` operator, which returns from the nearest function or closure. In our case, that's the `main()` function; but this isn't allowed, since `main()` returns the unit type `()`. We could change it so that it returns a `Result`, but I wanted the macro to act like calling a function, so that's off the table.
An alternative is to wrap the block in a `try` block. Although this is ocularly pleasant and terse, it requires the unstable nightly feature flag `#![feature(try_blocks)]`:
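Roughly (nightly only):

```rust
#![feature(try_blocks)]

let result: Result<(u8, u16), ExtractError> = try {
    let extracted = extract("ff be %b %w", data)?;
    (
        extracted[0][0],
        u16::from_be_bytes([extracted[1][0], extracted[1][1]]),
    )
};
```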
Rather than fiddling with unstable feature flags, a workaround is to use an IIFE (an immediately invoked function expression, `(|| {})()`) with an explicit return type:
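Here's the pattern in a minimal, self-contained form (using `ParseIntError` as a stand-in for our `ExtractError`):

```rust
fn main() {
    // `?` inside the closure returns from the closure, not from main(),
    // so main() can keep its unit return type.
    let result = (|| -> Result<(u8, u16), std::num::ParseIntError> {
        let a: u8 = "255".parse()?;
        let b: u16 = "65535".parse()?;
        Ok((a, b))
    })();
    assert_eq!(result, Ok((255, 65535)));
}
```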
This expands our code into the following, while preserving type inference on the `Ok` variant:
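So the generated code ends up shaped like this (simplified):

```rust
let result = (|| -> Result<(u8, u16), ExtractError> {
    let extracted = extract("ff be %b %w", data)?;
    Ok((
        extracted[0][0],
        u16::from_be_bytes([extracted[1][0], extracted[1][1]]),
    ))
})();
```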
Testing it out
Let's try out the full example, where we match two bytes `0xff` and `0xbe`, and extract a `u8` and a `u16`:
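Something like the following (reconstructed; the exact template and values in the original may differ):

```rust
fn main() {
    let data: &[u8] = &[0xff, 0xbe, 0x2a, 0x01, 0x00];
    let (a, b) = scan!("ff be %b %w", data).unwrap();
    assert_eq!(a, 0x2a);   // %b -> u8
    assert_eq!(b, 0x0100); // %w -> u16, big endian
    println!("a = {}, b = {}", a, b);
}
```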
Running `cargo run` outputs:
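The output is along these lines (depending on the exact `println!` formatting used):

```
a = 42, b = 256
```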
It works, and there are no assertion panics! 🥳
Rounding it up
With all this effort, we can finally convert our original snippet:
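For reference, the original shape (reconstructed as a sketch):

```rust
let data = scan("... %w %w %*", &reply)?;
if let [a, b, title] = &data[..] {
    let chunk_size = u16::from_be_bytes([a[0], a[1]]);
    let total_chunks = u16::from_be_bytes([b[0], b[1]]);
    let title = parse_string(title);
    // ...
} else {
    unreachable!()
}
```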
to use our `scan!` macro:
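Which becomes (a sketch):

```rust
let (chunk_size, total_chunks, title) = scan!("... %w %w %*", &reply)?;
let title = parse_string(title);
```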
Much better! Especially without the awkward `if let [a, b] = &data[..]` and the `unreachable!()`!
The compiler can now:
- automatically infer the number of elements in the tuple
- infer the type associated with the placeholders
- provide feedback if I've messed up the template string
Exactly what I needed! Though I could also eliminate `parse_string(data)`, maybe with a template placeholder like `%*S` with the associated type `Placeholder::StringRest`?
Final thoughts
Writing macros isn't easy or simple. Since I hadn't really stumbled upon a good guide on macros, I experimented myself instead: reading documentation, searching for examples, and looking at GitHub issues before getting things working properly.
To be fair, this isn't my first time writing macros.
Last year, I had to write an Impala query library for work, which interfaces via Apache Thrift. I needed a way to transform a data row into a `struct` using a derive macro, with an API that should look like `sqlx`'s `FromRow`. I ended up browsing `sqlx`'s source code, figuring out how the code was generated, and adapting my macro to use my own getters and transformers.
People have done crazy and impressive things with macros, like yew's `html!` macro:
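For instance, something in this shape (yew's API has changed over the years; this follows its function-component style):

```rust
use yew::prelude::*;

#[function_component(App)]
fn app() -> Html {
    html! {
        <div class="app">
            <h1>{ "Hello, MiniDisc!" }</h1>
        </div>
    }
}
```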
In the end, I think it's worth it to invest some effort into providing better APIs for your own libraries and utilities. Sure, it might be a little overkill for some cases, but you would certainly still learn something from it.
That said, time to go back to experimenting with talking to NetMD devices in Rust!
References
- Source code: https://github.com/liangchunn/scan-macro-example
- MiniDisc logo source (GPLv2): md0.svg