The SQLite team is preparing an efficient remote replication tool

The SQLite team is preparing an efficient remote replication tool

SQLite, the ubiquitous database engine, is likely to be enhanced with a replication tool that will ensure a consistent snapshot of a database even while there are active reads and writes taking place.

The sqlite3-rsync utility can copy an origin database to a local or remote location. If remote, ssh (secure shell) protocol is used, so that data is encrypted. The target, called REPLICA in the draft documentation, can already exist and have live connections during the replication process, though it is read-only. Further, the replica will end up as a snapshot of the origin database at the time the tool was invoked, so if any writes were made to the target but not the origin, they will be lost; and if any new writes were made to the origin during the process, they will not be included.

There is already a SQLite backup API but this API locks the database, though it can done in steps to avoid excessive delays for users. There is also a VACUUM INTO command that exports a database into a file.

The new tool though has distinctive features, one of which is that it designed to target remote replicas, not just local files. It is also very efficient, since it checks for data that does not need to be sent because the target is already identical. The way this works is that the replica sends a cryptographic hash of each of its pages (sections of storage) and if it is already the same as the origin, no data is transmitted for that page. “For databases with a 4096-byte page size, the minimum bandwidth required to run this utility is equivalent to about 0.5% of the database,” the documentation states.

Limitations of the new tools include that both databases must use write-ahead logging and use the same page size. In addition, the tool can only synchronize one database at a time. 

The new tool will offer another possibility for backup but is also interesting for scenarios where many more users need to read the data, than to write to it, and where replicas that are slightly out of date are acceptable.

SQLite is small and well-suited to embedded use, but can handle large amounts of data. The theoretical maximum size of a SQLite database is around 281 terabytes though the docs state that “this particular upper bound is untested since the developers do not have access to hardware capable of reaching this limit.”

There is no published timeline for the delivery of a production version of sqlite3-rsync but there is code and reported bugs are fixed promptly so there is no doubting its active development.