How my microservice approach to DynamoDB accidentally beat vendor lock-in!


One important thing I try to remember when making development decisions is to keep my options open. I want to commit to my technology choices as late as possible in the design process. One big technology choice is the persistent store to use for a microservice. There is a lot of choice in this area, from the style of store (relational, object, document, graph) to the specific technology (Oracle, MySQL, MongoDB, Cassandra, DynamoDB). The choices go on!
When I start building a microservice I rarely have enough information to make this choice. I typically want to put an MVP service in place and see how things go before committing. For this reason, I built a data store abstraction layer (https://github.com/rmetcalf9/object_store_abstraction) for use with my Python microservices. This allows me to code against a simple interface without being tied to a choice. I have developed a number of adaptors already, memory and file based stores being the simplest. I have wrapped SQLAlchemy to get access to many relational data stores. I have an adaptor for DynamoDB, and I am sure I will write Cassandra and MongoDB adaptors in the future when the need arises.
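To make that concrete, here is a minimal sketch of the idea. The class and method names below are invented for this post, not the real object_store_abstraction API; the point is that service code only ever sees a narrow interface, and each adaptor implements it.

```python
# Illustrative sketch only: these names are invented for this post and
# are NOT the real object_store_abstraction API.
from abc import ABC, abstractmethod
from typing import Optional


class ObjectStore(ABC):
    """The narrow interface every adaptor implements."""

    @abstractmethod
    def save_object(self, object_type: str, key: str, obj: dict) -> None:
        ...

    @abstractmethod
    def get_object(self, object_type: str, key: str) -> Optional[dict]:
        ...


class MemoryStore(ObjectStore):
    """Simplest possible adaptor: a dict held in memory."""

    def __init__(self):
        self._data = {}

    def save_object(self, object_type, key, obj):
        self._data[(object_type, key)] = obj

    def get_object(self, object_type, key):
        return self._data.get((object_type, key))
```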

The trade-off

On the plus side, my microservices can now use any object store I want; changing stores only requires a configuration change. On the downside, my microservices cannot take advantage of the particular features of any one storage technology. I figure I can get around this with strict adherence to TDD, which lets me refactor the abstraction layer out in favour of a specific data store interface at the point in time when I actually need those features.
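As a rough illustration of what "just a configuration change" means, adaptor selection might be driven by a factory like the one below. The factory, the config keys, and DynamoDBStore are all hypothetical stand-ins; only MemoryStore from the earlier sketch is defined here.

```python
# Hypothetical factory: the backing store is picked from configuration,
# so the service code never names a specific technology.
def create_store(config: dict) -> ObjectStore:
    store_type = config["type"]
    if store_type == "memory":
        return MemoryStore()
    if store_type == "dynamodb":
        # DynamoDBStore is a stand-in name; a real adaptor would take its
        # connection details from the config in the same way.
        return DynamoDBStore(table=config["table"], region=config["region"])
    raise ValueError(f"Unknown store type: {store_type}")


# Swapping the backing store is now a one-line configuration change:
store = create_store({"type": "memory"})
```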
I think this plays well with the theme of microservices. I can make them small and simple and only add complexity at the point in time when it is required. Microservices should be kept small enough to be refactored or replaced quickly so this change shouldn’t be a big problem.

So how does this avoid vendor lock-in?

I have been using the free trial of the Amazon RDS service for almost a year now, which means the trial is nearly up and I will have to start paying for my single micro database. I thought Amazon would tell me how much I would have been paying without the trial, but it seems they don't. I checked for myself and got a shock at how much a single micro MariaDB instance will cost! At other providers I can get a virtual machine for a year for much less!
Diving deeper into the Amazon costs I found that there are some always-free services, and DynamoDB is one of them. I decided to write a DynamoDB adaptor and move over to using it. I have a live service with data inside it and I didn't want to take it down to do the move. (I could have, but I wanted to work out how I would cope if that wasn't an option.) I would normally be nervous about using an Amazon-only datastore, but my abstraction layer makes me feel less uncomfortable about that.

Migration mode

The reason I am writing this post is that I worked out what I think is a neat way of achieving a smooth migration. As well as writing a new DynamoDB adaptor, I also created a migration adaptor. This is a kind of meta-adaptor that creates instances of two other adaptors, e.g. one for MariaDB and one for DynamoDB; which adaptors are used does not matter. One store is labelled current and the other is labelled the migration target. All reads use the current store, but writes are sent to both. This means the migration target receives all new data and slowly builds up the amount of data it holds.
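Here is a sketch of the idea, reusing the hypothetical interface from earlier (again, illustrative names, not the library's real API):

```python
# Sketch of a migration meta-adaptor: it satisfies the same interface as
# any other adaptor but wraps two real stores.
class MigrationStore(ObjectStore):
    def __init__(self, current: ObjectStore, target: ObjectStore):
        self.current = current
        self.target = target

    def save_object(self, object_type, key, obj):
        # Dual write: the migration target slowly accumulates every piece
        # of data the live service touches.
        self.current.save_object(object_type, key, obj)
        self.target.save_object(object_type, key, obj)

    def get_object(self, object_type, key):
        # Reads always come from the current (trusted) store.
        return self.current.get_object(object_type, key)
```

Because the meta-adaptor implements the same interface as any other adaptor, the service itself never knows a migration is happening.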
I wrote a rough-and-ready utility that let me query what percentage of the migration was complete, plus a process to migrate the old data that the dual writes would otherwise miss. In the future I may create a utility that does this in a more controlled manner, but for my purposes a quick script was good enough.
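That quick script might look something like this, assuming the adaptors expose a way to enumerate keys (list_keys here is a hypothetical helper, not part of the interface sketched above):

```python
def migration_progress(current, target, object_type):
    # Percentage of current-store keys already present in the target.
    current_keys = set(current.list_keys(object_type))
    target_keys = set(target.list_keys(object_type))
    if not current_keys:
        return 100.0
    return 100.0 * len(current_keys & target_keys) / len(current_keys)


def backfill(current, target, object_type):
    # Copy across any old data the dual writes have not reached yet.
    missing = set(current.list_keys(object_type)) - set(target.list_keys(object_type))
    for key in missing:
        target.save_object(object_type, key, current.get_object(object_type, key))
```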
While the migration is going on, the live service can keep chugging away, and once the migration reaches 100% it will stay at 100%, because every new write already goes to both stores. At that point all I need to do is reconfigure the microservice through my CD pipeline to use only the new datastore, and it's job done!
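The cutover itself is then just a configuration flip, something like this hypothetical before/after (the config format and keys are invented for illustration):

```python
# During the migration the service runs the migration meta-adaptor.
during_migration = {
    "type": "migration",
    "current": {"type": "sqlalchemy", "connection_string": "mysql://example"},
    "target": {"type": "dynamodb", "table": "objects", "region": "eu-west-1"},
}

# After the migration it points straight at the new store.
after_migration = {"type": "dynamodb", "table": "objects", "region": "eu-west-1"}
```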

TL;DR

By using a datastore abstraction layer I get to defer the datastore choice for my microservice, although I can't make use of particular datastore features without some refactoring. An unexpected benefit of the abstraction layer was that I was able to build a migration adaptor, which gave me the capability of live migration from one datastore to another, even across technologies. This is one way to reduce the risk of vendor lock-in!
