|
vkdb
A time series database engine in C++.
|
vkdb is built on log-structured merge (LSM) trees. These have an in-memory layer and a disk layer, paired with a write-ahead log (WAL) for persistence of in-memory changes.
When you instantiate a vkdb::Database, all of the prior in-memory information (in-memory layer, metadata, etc.) will be loaded in if the database already exists, and if not, a new one is set up. This persists on disk until you clear it via vkdb::Database::clear.
It's best to make all interactions via vkdb::Database, or the vkdb::Table type via vkdb::Database::getTable, unless you just want to play around with vq (more on the playground here).
$HOME environment variable is set correctly, as all database files will be stored in .vkdb within your home directory. Only tamper with this directory if moving databases between machines!
The LSM tree uses time-window compaction to efficiently organise and merge SSTables across different layers (C0-C7). Each layer has a specific time window size and maximum number of SSTables.
| Layer | Time Window | Max. SSTables |
|---|---|---|
| C0 | Overlapping | 32 |
| C1 | 1 day | 2,048 |
| C2 | 1 week | 1,024 |
| C3 | 1 month | 512 |
| C4 | 3 months | 256 |
| C5 | 6 months | 128 |
| C6 | 1 year | 64 |
| C7 | 3 years | 32 |
When the memtable fills up, it's flushed to C0 as an SSTable. C0 acts as a buffer for the later layers, and when it exceeds its SSTable limit, all the SSTables are merged into C1 at once, with each SSTable spanning a 1-day window.
When any other layer exceeds its SSTable limit, only its oldest, excess SSTables are merged with the next layer's SSTables based on the layer's time window. For example, if C1 has too many SSTables:
This time-window compaction strategy enables:
Lexing is done quite typically, with enumerated token types and line/column number stored for error messages. Initially, I directly executed queries as string streams, but that was a nightmare for robustness.
In terms of parsing, vq has been constructed to have an LL(1) grammar—this meant I could write a straightforward recursive descent parser for the language. This directly converts queries to an abstract syntax tree (AST) with std::variant.
Finally, the interpreter makes quick use of the AST via the visitor pattern, built into C++ with std::variant (mentioned earlier) and std::visit. This ended up making the interpreter (and pretty-printer) very satisfying to write.