Media – LSD::RELOAD

I could use some help with a design problem for a RESTful API.

The APIs we’re trying to design are for a media production process that uses MXF and AAF as interchange formats. Data comes out of a database and goes into complex, long-running processes that slice and dice it, and eventually it comes back to be merged into the database. That database is itself replicated across half a dozen sites in an eventually consistent pattern, and is connected in various ways to other (enterprise) databases. Because the full complexity of these media formats gets in the way of designing the API basics, I’ve come up with a simpler example. The weirdness of the example comes from it being distilled out of the complex use cases, where it does make (some) sense.

Setting the scene

Imagine a library of digital books. For reasons of storage efficiency (among others), the library has ripped all the books apart and stores the individual chapters. When you search through the library or fetch bits of content, you interact with a representation of the books and chapters (like a virtual index card) that does not include their content.

So books consist of 0 or more chapters, and chapters are part of one or more books. A chapter really can belong to multiple books: for example, The Collected Works of William Shakespeare is represented as all the chapters from all of his books stitched together.

Both books and chapters have 0 or more titles (usually one title per language, but there are various “also known as” edge cases).

Browsing through books

Imagine we represent a book as

<book xmlns="http://schemas.example.com/library/v1/"
      id="urn:uuid:084E014B-784D-41AE-9EF6-01CE202B5EDA"
      href="/library/book/084E014B-784D-41AE-9EF6-01CE202B5EDA">
  <title xml:lang="en-GB">The Merchant of Venice</title>
  <title xml:lang="nl">De Koopman van Venetië</title>
  <chapters>
    <chapter id="urn:uuid:B24B6A07-7E48-4C61-B10F-FE13CCE7B20E"
             href="/library/chapter/B24B6A07-7E48-4C61-B10F-FE13CCE7B20E">
      <title xml:lang="en-GB">FIRST ACT</title>
      <title xml:lang="nl">EERSTE BEDRIJF</title>
    </chapter>
  </chapters>
</book>

and a chapter as

<chapter xmlns="http://schemas.example.com/library/v1/"
         id="urn:uuid:B24B6A07-7E48-4C61-B10F-FE13CCE7B20E"
         href="/library/chapter/B24B6A07-7E48-4C61-B10F-FE13CCE7B20E">
  <title xml:lang="en-GB">FIRST ACT</title>
  <title xml:lang="nl">EERSTE BEDRIJF</title>
  <book id="urn:uuid:084E014B-784D-41AE-9EF6-01CE202B5EDA"
        href="/library/book/084E014B-784D-41AE-9EF6-01CE202B5EDA">
    <title xml:lang="en-GB">The Merchant of Venice</title>
    <title xml:lang="nl">De Koopman van Venetië</title>
  </book>
</chapter>

It’s hopefully obvious that you can do a GET /library/{book|chapter}/{uuid} to retrieve these representations.
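To make the shape of these representations concrete, here is a minimal sketch of how a client might read the book document after a GET. It uses only Python's standard library; the helper names `titles` and `read_book` are made up for illustration, and the HTTP call itself is left out.

```python
import xml.etree.ElementTree as ET

NS = {"lib": "http://schemas.example.com/library/v1/"}
# ElementTree exposes xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

BOOK_XML = """<book xmlns="http://schemas.example.com/library/v1/"
      id="urn:uuid:084E014B-784D-41AE-9EF6-01CE202B5EDA"
      href="/library/book/084E014B-784D-41AE-9EF6-01CE202B5EDA">
  <title xml:lang="en-GB">The Merchant of Venice</title>
  <title xml:lang="nl">De Koopman van Venetië</title>
  <chapters>
    <chapter id="urn:uuid:B24B6A07-7E48-4C61-B10F-FE13CCE7B20E"
             href="/library/chapter/B24B6A07-7E48-4C61-B10F-FE13CCE7B20E">
      <title xml:lang="en-GB">FIRST ACT</title>
      <title xml:lang="nl">EERSTE BEDRIJF</title>
    </chapter>
  </chapters>
</book>"""


def titles(element):
    """Map language tag -> title text for the element's direct <title> children."""
    return {t.get(XML_LANG): t.text for t in element.findall("lib:title", NS)}


def read_book(xml_text):
    """Turn a book representation into a plain dict (hypothetical helper)."""
    book = ET.fromstring(xml_text)
    return {
        "href": book.get("href"),
        "titles": titles(book),
        # Chapters keep document order, which is the order within the book.
        "chapters": [
            {"href": c.get("href"), "titles": titles(c)}
            for c in book.findall("lib:chapters/lib:chapter", NS)
        ],
    }
```

Calling `read_book(BOOK_XML)` gives the book's own titles plus, for each chapter, its href and the convenience copy of its titles.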

Changing book metadata

It’s also not difficult to imagine that you can do a PUT to the same URL to update the resource. You just PUT the same kind of document back.

What is a bit difficult is what happens when you do that PUT. The logic that I want is that a PUT of a book can be used to change the titles for that book and to change which chapters are part of that book. A PUT of a chapter should change the titles for that chapter, but should not add the chapter to or remove it from a book (the list of chapters is ordered, and the chapter doesn’t know where it sits in the ordering).

(Again these rules seem pretty artificial in the example but in MXF there’s a variety of complex constraints that dictate in many cases that a new UMID should be created if an object in the model changes in a way that matters)

This sort-of breaks the PUT contract, because no matter how often you GET a book document, change the title of a chapter inside the book, and PUT that changed representation, your change will not be picked up. You have to follow the href, get the representation for the chapter, change the title there, and PUT it back.
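The PUT rules and the resulting surprise can be sketched in a few lines. This is only an illustration under assumptions: the two dicts stand in for the real database, and `put_book` / `put_chapter` are hypothetical handler names, not part of any existing service.

```python
import xml.etree.ElementTree as ET

NS = {"lib": "http://schemas.example.com/library/v1/"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical in-memory store standing in for the real database.
books = {}     # book href -> {"titles": {...}, "chapters": [chapter href, ...]}
chapters = {}  # chapter href -> {"titles": {...}}


def titles(element):
    """Map language tag -> title text for the element's direct <title> children."""
    return {t.get(XML_LANG): t.text for t in element.findall("lib:title", NS)}


def put_book(href, xml_text):
    """Update a book's titles and its ordered chapter list."""
    doc = ET.fromstring(xml_text)
    books[href] = {
        "titles": titles(doc),
        # Only the chapter hrefs are stored; the nested chapter titles
        # are silently ignored, which is the surprising part.
        "chapters": [
            c.get("href") for c in doc.findall("lib:chapters/lib:chapter", NS)
        ],
    }


def put_chapter(href, xml_text):
    """Update a chapter's own titles; any nested <book> is ignored."""
    doc = ET.fromstring(xml_text)
    chapters.setdefault(href, {})["titles"] = titles(doc)
```

With this sketch, a client that edits a chapter title inside the book document and calls `put_book` will find `chapters[...]` unchanged afterwards; only `put_chapter` on the chapter's own URL changes it.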

This also breaks the common expectation people have with XML documents — if the data is there and you edit it and then you save it, normal things happen.

The problem with minimal representations

It’s easy to minimize the representations in use so this problem goes away. For example,

<chapter xmlns="http://schemas.example.com/library/v1/"
         id="urn:uuid:B24B6A07-7E48-4C61-B10F-FE13CCE7B20E">
  <title xml:lang="en-GB">FIRST ACT</title>
  <title xml:lang="nl">EERSTE BEDRIJF</title>
  <book href="/library/book/084E014B-784D-41AE-9EF6-01CE202B5EDA" />
</chapter>

It’s clear what you’re dealing with. The PUT does what it is supposed to do, and to learn the book title you just do another GET.

The problem with this approach is that the number of HTTP requests grows much larger if you want to display something in the UI, because the visual representation of a chapter shows the book title. To build snappy UIs that use ajax to communicate with my service, the rich representation that has the title information is much better.
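A toy model of the request count, assuming a hypothetical in-memory "server" and shortened made-up hrefs (`c1`, `c2`, `b1`), shows how the minimal representation adds one GET per referenced book for every chapter displayed:

```python
# Hypothetical in-memory "server": href -> minimal representation.
RESOURCES = {
    "/library/chapter/c1": {"title": "FIRST ACT", "books": ["/library/book/b1"]},
    "/library/chapter/c2": {"title": "SECOND ACT", "books": ["/library/book/b1"]},
    "/library/book/b1": {"title": "The Merchant of Venice"},
}


class CountingClient:
    """Stand-in for an HTTP client that counts how many GETs it performs."""

    def __init__(self):
        self.requests = 0

    def get(self, href):
        self.requests += 1
        return RESOURCES[href]


def render_chapter_minimal(client, href):
    """Build the display label for a chapter from minimal representations."""
    chapter = client.get(href)
    # One extra GET per referenced book just to learn its title.
    book_titles = [client.get(b)["title"] for b in chapter["books"]]
    return "{} ({})".format(chapter["title"], ", ".join(book_titles))
```

Rendering the two chapters above costs four GETs with this naive client, where the rich representation would need two. A client-side cache would reduce that, at the cost of more client complexity.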

Some options

So what should I do?

Use multiple representations

I could have /library/{book|chapter}/{uuid}/annotated as well as /library/{book|chapter}/{uuid}, with the latter serving the minimal representation and supporting PUT, or if I had smart ajax clients (I don’t) I could use some kind of content negotiation to get to the rich annotated version.
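As a rough sketch of the URL-based variant, a dispatcher could allow PUT only on the minimal resource and treat the `/annotated` view as read-only. The regular expression, the `dispatch` function, and the choice of 405 for a rejected method are assumptions made for illustration.

```python
import re

# Hypothetical route pattern; the "/annotated" suffix comes from the text above.
ROUTE = re.compile(r"^/library/(book|chapter)/([0-9A-Fa-f-]{36})(/annotated)?$")


def dispatch(method, path):
    """Return (status, which representation would be served or accepted)."""
    m = ROUTE.match(path)
    if m is None:
        return 404, None
    kind, _uuid, annotated = m.groups()
    if annotated:
        if method != "GET":
            return 405, None  # the annotated view is read-only
        return 200, "annotated " + kind
    if method in ("GET", "PUT"):
        return 200, "minimal " + kind
    return 405, None
```

For example, `dispatch("PUT", "/library/chapter/B24B6A07-7E48-4C61-B10F-FE13CCE7B20E/annotated")` yields `(405, None)`, while the same PUT without the suffix is accepted.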

This is quite a bit of work, and when documents leave the web for some kind of offline processing (the AAF files go into a craft edit suite and come back very different many weeks later, but they will still reference some of my original data), I run the risk that the “wrong” document makes it into that edit suite.

Document the situation

I could stick with my original richly annotated XML and simply document which fields are and aren’t processed when you do a PUT. I’d probably change the PUT to a POST to make it a bit clearer.

Document and enforce the situation

I could strictly validate all documents that are PUT to me to make sure they do not contain any elements (in my namespace) that I do not intend to save, and reject documents that do.
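A minimal sketch of such a check for a chapter PUT, assuming that `<title>` is the only element in my namespace that a chapter PUT may carry and that elements from other namespaces are left alone (`validate_chapter_put` is a hypothetical name):

```python
import xml.etree.ElementTree as ET

LIB = "http://schemas.example.com/library/v1/"
# Assumption: only <title> may be changed through a chapter PUT.
ALLOWED_IN_CHAPTER = {"{%s}title" % LIB}


def validate_chapter_put(xml_text):
    """Return a list of problems; an empty list means the document is acceptable."""
    doc = ET.fromstring(xml_text)
    if doc.tag != "{%s}chapter" % LIB:
        return ["unexpected root element " + doc.tag]
    problems = []
    for child in doc:
        in_my_namespace = child.tag.startswith("{%s}" % LIB)
        if in_my_namespace and child.tag not in ALLOWED_IN_CHAPTER:
            problems.append(
                "element %s cannot be updated through a chapter PUT" % child.tag
            )
    return problems
```

A server could answer a non-empty problem list with a client error (for instance 400 Bad Request) and include the messages in the response body, so that the rich chapter document from earlier is rejected rather than half-applied.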

Document the situation inside the XML

I could do something like

<chapter xmlns="http://schemas.example.com/library/v1/"
         id="urn:uuid:B24B6A07-7E48-4C61-B10F-FE13CCE7B20E"
         href="/library/chapter/B24B6A07-7E48-4C61-B10F-FE13CCE7B20E">
  <title xml:lang="en-GB">FIRST ACT</title>
  <title xml:lang="nl">EERSTE BEDRIJF</title>
  <referencedBy>
    <book id="urn:uuid:084E014B-784D-41AE-9EF6-01CE202B5EDA"
          href="/library/book/084E014B-784D-41AE-9EF6-01CE202B5EDA">
      <title xml:lang="en-GB">The Merchant of Venice</title>
      <title xml:lang="nl">De Koopman van Venetië</title>
    </book>
  </referencedBy>
</chapter>

This way it’s hopefully quite obvious to the API consumer what is going to happen when they PUT a document back. It is still rather unclean REST (so should I use POST?), but it avoids me having to design separate representations for browse vs edit.
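On the server side, handling such a PUT could start by dropping every `referencedBy` section before looking at anything else. The following is a sketch of that step only, with `strip_references` as a hypothetical helper name:

```python
import xml.etree.ElementTree as ET

LIB = "http://schemas.example.com/library/v1/"


def strip_references(xml_text):
    """Parse a chapter document and drop its read-only <referencedBy> sections."""
    doc = ET.fromstring(xml_text)
    # findall returns a list, so removing children while looping is safe.
    for ref in doc.findall("{%s}referencedBy" % LIB):
        doc.remove(ref)
    return doc
```

What remains after stripping is exactly the part of the document that the PUT is allowed to change, here the chapter's own titles.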

One disadvantage is that I have to keep more resource state around when parsing or generating the content. That is not an issue when things are built up in memory, but for large documents and/or for pipeline processing, I’ve made life a lot harder. There are other possibilities to alleviate this (like adding an isReference attribute, or inlining referencedBy sections throughout the document rather than putting them all at the bottom), but they’re even less pleasing aesthetically.

Something else?

Which approach do you think is best? Is there a better one? What would you do?

Right now, since I’m just doing some quick prototyping, I’ve gone for the “document the situation” approach, but I think that eventually I’d either like to somehow highlight the “this is a forward reference for your convenience but don’t edit it” bits of the XML, or go for the multiple representations approach.


Forward references for RESTful resource collections