Bookish Development Diary, episode 2.
One of the development tasks I enjoy most is designing, then implementing, a new module. The smaller the module, the more self-contained its job, and the cleaner and simpler the resulting API, the better.
Perhaps that’s the reason I allowed myself to be distracted, and became convinced of the need to develop Datastore, my generic data storage layer for Bookish…
A large part of the fun is how totally wrong I get it to begin with. You create a new project, write a few tests and a bit of code, and think you’re done. Simples!
Then you go back to the client project to try to adopt the new API, and it all falls apart. You forgot X. And Y. Actually, you wrote it in the wrong language, missed out α through to Ω, and may be holding the map upside down. So you return to the API, fix your obvious mistakes, then try again. And come up short again. And so it goes… rinse and repeat.
I love this process. Not for the frequent reminders of what an idiot I am (healthy though that is), but for the sense of progress that you get as the real shape of the API slowly emerges out of the mist.
I won’t write in detail here about all of the inspirations behind the choices I made for the Datastore design (that’s for another post), but I’m a great believer in Occam’s Razor.
So I started with the simplest thing that I thought would meet the needs of Bookish: look up records with an identifier, fetch them in batches and get the results as dictionaries, and send batches of (identifier:dictionary) pairs back to the store to write or update records.
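In rough terms, that first cut looked a bit like this. This is a hypothetical Swift sketch, with names I've made up for illustration, not the actual Datastore API:

```swift
import Foundation

typealias EntityID = String
typealias Record = [String: Any]

// Illustrative sketch of the first, minimal API shape: batch fetches by
// identifier, results as dictionaries, batch writes/updates keyed by identifier.
protocol Datastore {
    /// Fetch a batch of records by identifier; results arrive asynchronously as dictionaries.
    func get(entities ids: [EntityID], completion: @escaping ([EntityID: Record]) -> Void)

    /// Write or update a batch of records, keyed by identifier.
    func add(entities: [EntityID: Record], completion: @escaping () -> Void)
}
```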
Bookish supports importing from other products (such as the Kindle app, or Tasty Archive¹), and they don’t always give you identifiers - sometimes you have to make do with names. So I realised that I needed to be able to fetch a record by name as well as by id. Back to the API drawing board.
Then I realised that entities have any number of properties, and name is just one of them. Being able to fetch by name was a special case, and I don’t like special cases. So I reworked the API to allow looking up a record by any arbitrary property.
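Something along these lines, again sketched in hypothetical Swift:

```swift
typealias EntityID = String
typealias Record = [String: Any]

// Illustrative sketch of the generalised lookup: matching on an arbitrary
// property, so that fetching by name is no longer a special case.
protocol Datastore {
    /// Fetch all entities whose `key` property equals `value`.
    /// Fetching by name becomes just one use of the general call:
    /// store.get(where: "name", equals: someName) { matches in ... }
    func get(where key: String, equals value: String,
             completion: @escaping ([EntityID: Record]) -> Void)
}
```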
There are competing tensions at work here, influencing the API. The first change was driven by need. The second by a desire for flexibility. Behind it, always, is that Occam’s Razor desire to keep the API as small as possible: “the simplest thing that works”.
Once I’d made those changes, I realised that there are times when I need to look up an object if it exists, or create it if it doesn’t. This was particularly relevant in cases where adding a property to one object established a relationship to another.
Doing this with the asynchronous API felt clunky and resulted in a chain of calls that didn’t feel natural. What I needed was a way to specify an object, along with a minimal set of values to use, if necessary, to create it. Back to the API drawing board, and entity references were born: lightweight objects that combine both of these abilities.
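As a rough sketch of the idea (hypothetical names again, not the real API):

```swift
typealias EntityID = String
typealias Record = [String: Any]

/// A lightweight reference: enough to find an entity, plus the minimal
/// set of property values to use if it has to be created.
struct EntityReference {
    let identifier: EntityID
    let initialProperties: Record  // only used when the entity doesn't exist yet

    init(_ identifier: EntityID, createWith initialProperties: Record = [:]) {
        self.identifier = identifier
        self.initialProperties = initialProperties
    }
}

// Usage: linking a book to an author, creating the author record if necessary.
let author = EntityReference("author-123", createWith: ["name": "Terry Pratchett"])
```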
As I adapted the Bookish import to use these references, something else fell out of the design. I wanted the importer to be able to cope with being run multiple times with the same source. This can’t be perfect in all cases, but it is nice if it does something vaguely sensible rather than just duplicating all the records again (especially for imports from things like your Kindle library, which is going to keep on changing outside of Bookish).
In order to do this, I needed some of the identifiers that the importer creates to resolve to the same thing for each import. If it’s run a second time, it’ll pick up the same records rather than making new ones. However, if a record with that identifier doesn’t exist, but a record with the right name already does (e.g. the user already had a record for a given named author), I probably want the importer to pick that one in preference.
So now my entity references needed to be able to specify a list of criteria to match against - identifier, then name - only falling back to making a new object if one of those failed. Back to the drawing board…
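In sketch form, the references ended up carrying something like this (hypothetical Swift; the `Matcher` and `EntityReference` names are mine):

```swift
typealias EntityID = String
typealias Record = [String: Any]

/// One way of matching an existing entity.
enum Matcher {
    case identifier(EntityID)
    case property(key: String, value: String)
}

/// A reference now carries an ordered list of matchers; the store tries
/// each in turn, and only creates a new entity when none of them matches.
struct EntityReference {
    let matchers: [Matcher]
    let initialProperties: Record  // used only if nothing matches
}

// The importer's reference for an author: match the stable import identifier
// first, fall back to matching on name, and only create a record as a last resort.
let author = EntityReference(
    matchers: [
        .identifier("import-kindle-terry-pratchett"),
        .property(key: "name", value: "Terry Pratchett")
    ],
    initialProperties: ["name": "Terry Pratchett"]
)
```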
Finally, I got to the point where the existing importer code could be rewritten cleanly to use Datastore. Hurrah!
On to the next bit of conversion, at which point I realised that I had some debug code that was counting objects. It’d be really inefficient to have to do that by fetching all of the actual entities. How could I have missed that use case? Back to the drawing board…
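What I wanted was something like this: a counting call that the store can answer directly, without fetching every entity. A hypothetical sketch:

```swift
// Illustrative sketch, not the real API: counting is something the backing
// store can usually do cheaply, so it deserves its own call.
protocol Datastore {
    /// Count the entities of a given type without fetching them.
    func count(entitiesOfType type: String, completion: @escaping (Int) -> Void)
}
```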
Then I realised that my existing Bookish implementation has the ability to start you off with a sample database (useful for real users), or to reset the database back to the sample at any point (useful for testers). That relies on some Core Data API to safely duplicate/delete the backing files, which needs hiding behind the Datastore API, since Datastore isn’t supposed to expose the technology it’s using for the store. Back to the drawing board…
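In sketch form, the kind of thing that needs to sit behind the API (hypothetical names; the real mechanics would wrap whatever the store is built on):

```swift
import Foundation

// Illustrative sketch: store-level management calls, so that callers never
// need to know the backing files are Core Data, or touch them directly.
protocol DatastoreManagement {
    /// Replace the store's contents with the sample database at `url`,
    /// duplicating/deleting the backing files safely behind the scenes.
    func reset(toSampleAt url: URL, completion: @escaping (Error?) -> Void)
}
```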
And so it goes on.
If I were just adding new API each time, this could feel like code-rot creeping in. Generally, though, I’m not adding, I’m refactoring. When I do have to add new API, I can usually do it in such a way that it generalises some old functionality at the same time.
The overall size of the API grows as I add new things, then shrinks as I spot generalisations, or realise that something I’ve added has made something else obsolete.
Each time this happens, it feels like I’ve got a little bit closer to the true expression of the problem I’m trying to solve. Which feels satisfying.
I know I’m not there yet. I think I’ve caught a hazy glimpse of something even simpler - maybe involving actual objects that are references on the way in, but can also contain properties: the current values on the way out, or new values on the way in. Or maybe that’s over-complicating things. Not quite sure, but I’m looking forward to finding out…
¹ This is the app that I currently use to catalogue my books. Its limitations, and the trajectory it has followed over the years since it was first released, are a large part of my reason for making Bookish. I may have its name slightly wrong.