Further Evolution
December 17, 2019

Bookish Development Diary, episode 4.

A tale of how the last version of the Datastore was just right, but it turns out that the latest version is even righter.

As I mentioned in a previous post, I really enjoy the way an API evolves as one slowly figures out what it is supposed to do.

This process is continuing with Datastore, the database backend that I’m using for Bookish.

It’s perpetually humbling to realise just how unfinished that thing that you thought was finished actually is. For instance: it turns out you actually need an API to delete records - who knew? [1]

Something that I particularly enjoy is the moment where two concepts/classes/algorithms that you thought were separate reveal themselves to be two aspects of the same thing.

This seems to be happening with Datastore…

The Story So Far…

To recap, Datastore is designed to be asynchronous, so rather than fetching entities and then manipulating them directly, you describe the entities you want to work with, ask the store to perform an operation on them, and receive the results when it completes.

In early versions of the API, this led to two fundamental concepts:

You always pass in some references, to describe what you want. If you’re performing an update, you also pass in a dictionary with the changes. If you’re looking up entities, you get back some references. If you’re looking up properties, you get back a dictionary.
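As a rough sketch of that early shape: references go in, optionally with a dictionary of changes, and references or dictionaries come back through a callback. `SketchStore`, its method names, and the string-only property dictionary are all assumptions made for illustration, not the real Datastore API, and the callbacks here are invoked immediately rather than asynchronously to keep the example small.

```swift
// A sketch only: SketchStore and its method names are assumptions,
// not the real Datastore API.
typealias PropertyDictionary = [String: String]

/// A neutral description of an entity; here just an identifier.
struct EntityReference: Hashable {
    let identifier: String
}

final class SketchStore {
    private var records: [String: PropertyDictionary] = [:]

    /// Update: references plus a dictionary of changes go in.
    /// (This sketch creates the record if it is missing.)
    func update(_ references: [EntityReference],
                with changes: PropertyDictionary,
                completion: ([EntityReference]) -> Void) {
        for reference in references {
            records[reference.identifier, default: [:]].merge(changes) { _, new in new }
        }
        completion(references)
    }

    /// Entity lookup: references go in, and references for the
    /// entities that actually exist come back.
    func entities(matching references: [EntityReference],
                  completion: ([EntityReference]) -> Void) {
        completion(references.filter { records[$0.identifier] != nil })
    }

    /// Property lookup: references go in, a dictionary comes back.
    func properties(of references: [EntityReference],
                    completion: ([EntityReference: PropertyDictionary]) -> Void) {
        var results: [EntityReference: PropertyDictionary] = [:]
        for reference in references {
            results[reference] = records[reference.identifier]
        }
        completion(results)
    }
}

// Usage: results arrive in callbacks rather than as return values.
let store = SketchStore()
let book = EntityReference(identifier: "book-1")
store.update([book], with: ["title": "Dune"]) { updated in
    store.properties(of: updated) { found in
        print(found[book] ?? [:])
    }
}
```

Note that the three operations have three different result shapes, which is the asymmetry the rest of this post is about.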

The references started off as quite simple things that just specified the entity identifier in a neutral way.

Then they evolved a bit so that they’re more like database queries that can be used to locate the entities to work with. Rather than just matching identifiers, they can also match properties.

Then they evolved further and became a way that you could not only describe an existing object, but also how to make it if it didn’t already exist. Passing in a bunch of these references has a pleasingly declarative feel about it. It’s as if you’re describing a subset of the database as you need it to be, and leaving the storage system to work out whether it needs to fetch existing entities or make new ones.
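The three stages of that evolution could be modelled, very loosely, as three cases of one type. Everything in this sketch is hypothetical: the case names, the `resolve` method, the generated identifiers, and the in-memory dictionary standing in for the real storage.

```swift
// A sketch only: the cases and the resolve method are assumptions,
// not the real Datastore API.
typealias PropertyDictionary = [String: String]

enum EntityReference {
    /// Stage 1: just an identifier.
    case identified(String)
    /// Stage 2: more like a query, matching on a property.
    case matching(key: String, value: String)
    /// Stage 3: find the entity, or make it with these initial values.
    indirect case createIfMissing(EntityReference, initial: PropertyDictionary)
}

final class SketchStore {
    private(set) var records: [String: PropertyDictionary] = [:]
    private var nextID = 1

    /// Returns the identifier of the record the reference describes,
    /// creating a record only if the reference says how to.
    func resolve(_ reference: EntityReference) -> String? {
        switch reference {
        case .identified(let id):
            return records[id] == nil ? nil : id
        case .matching(let key, let value):
            return records.first { $0.value[key] == value }?.key
        case .createIfMissing(let inner, let initial):
            if let existing = resolve(inner) { return existing }
            // Reuse the requested identifier if there is one,
            // otherwise invent one (an arbitrary choice for this sketch).
            var id = "generated-\(nextID)"
            if case .identified(let wanted) = inner { id = wanted }
            nextID += 1
            records[id] = initial
            return id
        }
    }
}

// Usage: the caller declares what it needs; the store decides
// whether that means fetching or creating.
let store = SketchStore()
let wanted = EntityReference.createIfMissing(.identified("book-1"),
                                             initial: ["title": "Untitled"])
let first = store.resolve(wanted)   // no record yet, so one is created
let second = store.resolve(wanted)  // the existing record is found
```

The caller's code is identical in both calls, which is where the declarative feel comes from.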

Meanwhile, another interesting aspect of references is that they can be resolved. As I mentioned above, to a client of the Datastore, they can be thought of as an immutable description, which is a way of locating an entity. Internally though, there has to be a process of resolving this description, to turn it into an actual entity record[2]. This means that there’s some potential value in passing back the resolved reference as well as whatever other results you’re supposed to return. A resolved reference is guaranteed to describe a record in the underlying database, and can hold on to some internal state allowing it to skip the lookup work.

As a result of this, the concept of a GuaranteedReference was born, and the API was modified so that whilst it took instances of EntityReference, it would also return instances of GuaranteedReference. These guaranteed references were interchangeable with normal ones, but the need to return them, potentially along with a property dictionary as well, made some of the result code a little messy[3].
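One way to picture the relationship between the two kinds of reference is as two conformers to a shared protocol, where the guaranteed one holds on to the resolved record. This is a sketch under assumptions: `Record` and `Context` are small stand-ins invented here so the example is self-contained (the real implementation resolves against CoreData), and the `Reference` protocol and `guaranteed(_:in:)` helper are hypothetical.

```swift
// A sketch only. Record and Context are invented stand-ins for the
// underlying storage objects; the protocol and helper are assumptions.
final class Record {
    let identifier: String
    var properties: [String: String]
    init(identifier: String, properties: [String: String] = [:]) {
        self.identifier = identifier
        self.properties = properties
    }
}

final class Context {
    var records: [String: Record] = [:]
    private(set) var lookups = 0

    func fetch(_ identifier: String) -> Record? {
        lookups += 1   // count lookups so the saving is visible
        return records[identifier]
    }
}

protocol Reference {
    func resolve(in context: Context) -> Record?
}

/// A plain description: resolving it costs a lookup and may fail.
struct EntityReference: Reference {
    let identifier: String
    func resolve(in context: Context) -> Record? {
        context.fetch(identifier)
    }
}

/// Already resolved: it keeps the record, so no lookup is needed.
struct GuaranteedReference: Reference {
    let record: Record
    func resolve(in context: Context) -> Record? {
        record
    }
}

/// Upgrades any reference to a guaranteed one, if it resolves.
func guaranteed(_ reference: Reference, in context: Context) -> GuaranteedReference? {
    reference.resolve(in: context).map { GuaranteedReference(record: $0) }
}

// Usage: the two kinds are interchangeable wherever a Reference is expected.
let context = Context()
context.records["book-1"] = Record(identifier: "book-1")
let plain = EntityReference(identifier: "book-1")
let upgraded = guaranteed(plain, in: context)      // one lookup
let again = upgraded?.resolve(in: context)         // no further lookup
```

Because both types satisfy the same protocol, a guaranteed reference returned from one operation can be passed straight into the next.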

Evolving References

I digress slightly.

Getting back to the evolution of references into things that could describe how to make something: doing this required some types of references gaining a property dictionary, used to populate the entity if it had to be created.

Which was interesting, because it offered a tantalising glimpse of a possible world where, rather than passing in references and getting back properties associated with them, perhaps you passed in references-with-associated-properties, and also got back references-with-associated-properties.

Instead of having to sometimes pass in a list of EntityReference instances, and other times a list of EntityReference:PropertyDictionary pairs, and sometimes getting back a list of GuaranteedReference instances, and other times a list of GuaranteedReference:PropertyDictionary pairs, maybe the API could be arranged so that you always pass in a list of references, and get back a list of references.

If the references own property dictionaries, then on the way in, these can be used as a description of how to set up the entity, or how to modify the entity. On the way out, the dictionaries can contain any properties that were looked up, and the returned references themselves can be upgraded to be guaranteed.

This leads to an API where every operation feels like a transformation. You pass in a list of existing references, and you get back a list of transformed references. Which feels nice.
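A minimal sketch of what that symmetry might look like, assuming a single `Reference` type that owns its property dictionary and carries a flag saying whether it has been resolved. The type, the `isGuaranteed` flag, and both method names are hypothetical, and the callbacks are invoked immediately rather than asynchronously to keep the example short.

```swift
// A sketch only: Reference, isGuaranteed and the method names are
// assumptions, not the real Datastore API.
typealias PropertyDictionary = [String: String]

struct Reference {
    let identifier: String
    var properties: PropertyDictionary = [:]
    var isGuaranteed = false
}

final class SketchStore {
    private var records: [String: PropertyDictionary] = [:]

    /// On the way in, each reference's dictionary says how to set up
    /// or modify the entity. On the way out, every reference is guaranteed.
    func update(_ references: [Reference],
                completion: ([Reference]) -> Void) {
        let results = references.map { reference -> Reference in
            records[reference.identifier, default: [:]]
                .merge(reference.properties) { _, new in new }
            return Reference(identifier: reference.identifier,
                             properties: records[reference.identifier] ?? [:],
                             isGuaranteed: true)
        }
        completion(results)
    }

    /// On the way out, each dictionary holds the properties that were
    /// looked up. References that do not resolve come back unchanged.
    func get(_ names: [String],
             of references: [Reference],
             completion: ([Reference]) -> Void) {
        let results = references.map { reference -> Reference in
            guard let record = records[reference.identifier] else { return reference }
            let found = record.filter { names.contains($0.key) }
            return Reference(identifier: reference.identifier,
                             properties: found,
                             isGuaranteed: true)
        }
        completion(results)
    }
}

// Usage: every operation takes [Reference] and hands back [Reference].
let store = SketchStore()
let draft = Reference(identifier: "book-1",
                      properties: ["title": "Dune", "author": "Herbert"])
store.update([draft]) { saved in
    store.get(["title"], of: saved) { fetched in
        print(fetched.map { $0.properties })
    }
}
```

Since input and output have the same type, the output of one operation can be fed directly into the next without unpacking a results dictionary.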

It also feels very much like what I will need for the user interface, where you’ll need to have some sort of placeholder objects, standing in for the entities, which the user interface code can extract properties from, and perhaps perform changes to.

The Future

This is where I’m heading with the next iteration of the Datastore API (which I should probably call 2.0, but may well just call 1.3, since things are moving so fast at the moment that I’m not being completely strict about semantic versioning).

There are still a few wrinkles to work out[4][5][6].

I’m not yet sure what all the answers are, but I’m looking forward to finding out.

  1. I might have forgotten to add one in earlier versions of the API. :facepalm:

  2. The current implementation uses CoreData under the hood, so needs to work with NSManagedObject and NSManagedObjectContext. We don’t want to leak these implementation details though, which is where references come in. The resolution process takes a reference and a CoreData context, and gets back the NSManagedObject corresponding to that entity. 

  3. For example, you’d ask for a list of properties, for a list of entities, and get back a dictionary where the keys were the corresponding guaranteed entities and the values were property dictionaries. The references you passed in and the ones you got back described the same entities, but were different objects, which complicated the process of extracting results from the dictionary. 

  4. One thing I’m pondering is whether the references that describe a new entity need to make a distinction between properties to use to create it if it’s missing, and properties for the actual operation you requested. For example, if you want to look up a record by id, but also set its name if you create it. If a record with the id exists, the name might have a different value, so you don’t want to set it in that situation. Does that mean that references need two dictionaries - an initial values one and a values-to-use-for-the-operation one? Or are the only operations that take a dictionary always going to overwrite anything that was there before anyway? 

  5. Another thing I’m pondering is whether these references really can now act as fully fledged stand-ins for the real entities in the user interface. In which case they probably need a new name. Can you fetch a few, display them in the UI, use them to collect property modifications from the UI, and then issue an update back to the store to commit them? Does this work cleanly in all situations? From the UI’s point of view, these references are stand-ins, but are not perfect copies of the real things, and may not contain all the properties that the real thing contains; they may also be out of date if a synchronisation from another device has changed something in the meantime. 

  6. Should the references’ dictionaries be mutable, or is it better that they are immutable and always replaced by a new object when the callback from the previous operation is called to indicate that it has completed? Is there a danger that using an older copy of a reference in a new operation might accidentally overwrite new property values with stale ones?