darcs, haskell, ops
July 7, 2013

darcsden db thoughts

Spent about half of yesterday setting up Aditya’s darcsden patches on the dev instance of hub.darcs.net, testing them, and exploring db migration issues.

Following BSRK’s instructions, I got the dev instance authenticating via Google’s OAuth servers. Good progress. The UI flow I saw needs a bit more work - e.g. logging in with Google seemed to want me to register a new account. Or there may be a problem with my setup at Google (wrong callback URLs?) - will have to review it with BSRK.

Schema, migrations

My dev instance has so far been using the same database as the live production instance. This is partly because I don’t yet know how to run a second CouchDB instance, partly to reduce complexity, partly to be able to compare old and new code with the same realistic data set.

This can of course lead to trouble if old and new code require different schemas. darcsden uses CouchDB, a “schemaless” database, but the application code still requires an implicit schema, even if Couch doesn’t enforce one. I got more clarity on this when I noticed my dev instance experiments causing errors on the production app.

New darcsden code may include changes to the (implicit) db schema. In this case, there’s a change to the user’s password field. I need to notice such schema changes, and if I want to exercise them on the dev instance, I should first install them on the production instance too. Or use a separate CouchDB instance. Or use separate databases in the same CouchDB instance. Or possibly use separate views in the CouchDB databases?
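The "separate databases in the same instance" option could be set up by replicating the production database into a dev copy. A minimal sketch in Python, assuming a CouchDB server at localhost:5984 and database names darcsden and darcsden_dev (the URL and both names are guesses for illustration, not darcsden's actual configuration). It only builds the request for CouchDB's POST /_replicate endpoint; nothing is sent until the last line is uncommented.

```python
import json
import urllib.request


def replication_request(server, source, target):
    """Build (but do not send) a request asking CouchDB to copy source into target."""
    body = json.dumps({
        "source": source,
        "target": target,
        "create_target": True,  # create the target database if it is missing
    }).encode("utf-8")
    return urllib.request.Request(
        url=server.rstrip("/") + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server URL and database names.
req = replication_request("http://localhost:5984", "darcsden", "darcsden_dev")
# urllib.request.urlopen(req)  # uncomment to actually start the replication
```

The dev instance would then be pointed at the copy, so experiments with new document shapes never touch the data the production code reads.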

E.g., BSRK made the code read user documents (db records) in either the old or the new schema. Before testing it on the shared db I should have deployed that patch to production as well as dev.
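To make the idea concrete, here is a rough sketch of that kind of tolerant read, in Python rather than darcsden's Haskell. It is not BSRK's patch: the two document shapes (a bare password string versus an object with scheme and hash keys) are invented for illustration, since the real change to the password field isn't spelled out here.

```python
import json


def read_password(doc):
    """Return (scheme, hash) from a user document in either shape.

    Both shapes are hypothetical:
      old: {"_id": ..., "password": "<hash>"}
      new: {"_id": ..., "password": {"scheme": ..., "hash": ...}}
    """
    field = doc.get("password")
    if isinstance(field, str):
        # Old shape: a bare string, tagged here with a made-up "legacy" scheme.
        return ("legacy", field)
    if isinstance(field, dict) and "scheme" in field and "hash" in field:
        # New shape: the scheme is stored next to the hash.
        return (field["scheme"], field["hash"])
    raise ValueError("unrecognised password field in user document %r" % doc.get("_id"))


old_doc = json.loads('{"_id": "alice", "password": "abc123"}')
new_doc = json.loads('{"_id": "bob", "password": {"scheme": "pbkdf2", "hash": "def456"}}')

old_result = read_password(old_doc)
new_result = read_password(new_doc)
```

Code like this only protects readers that have it. The old production code, which knows just the old shape, still breaks on the first new-style document written to a shared db.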

Looking ahead, is this approach (including code to deal with all old schemas) the best way to handle this? Maybe. It makes things work and seems convenient, at least for now. But it also reminds me of years working with Zope’s ZODB (a schemaless Python object database), the layers of on-the-fly schema updating that built up, and the countless runtime bugs hunted down due to schema variations in individual objects.
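The usual alternative to reading every old shape forever is to upgrade each document once and record which version it is at. A small Python sketch of that, under made-up assumptions: the schema_version field and both password shapes are hypothetical, and the documents are plain dicts rather than anything fetched from Couch.

```python
def upgrade_user(doc):
    """Return a copy of a user document in the newest (hypothetical) shape."""
    upgraded = dict(doc)  # shallow copy, so the caller's document is left alone
    version = upgraded.get("schema_version", 1)  # unversioned documents count as v1
    if version < 2:
        # v1 -> v2: wrap the bare password string in an object.
        upgraded["password"] = {"scheme": "legacy", "hash": upgraded["password"]}
        upgraded["schema_version"] = 2
    return upgraded


docs = [
    {"_id": "alice", "password": "abc123"},
    {"_id": "bob", "schema_version": 2,
     "password": {"scheme": "pbkdf2", "hash": "def456"}},
]
migrated = [upgrade_user(d) for d in docs]
```

Run over every user document and written back, something like this leaves a single shape for the application code to handle, at the cost of having to run it at deploy time and before any old code is retired.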

Schema-less or schema-ful?

While recovering from this, I learned some more about managing CouchDB, schema migration, and current CouchDB alternatives.

Couch has some really good and unusual qualities, and I feel I’m only scratching the surface of its power. Even so, I’m starting to feel a schema-ful, relational database is a better fit for darcsden/darcs hub. Replacing Couch has been a topic of discussion on #darcs for some time, for other reasons. Here are some reasons to replace it:

Some reasons not to: