In a prior age, MemSQL (YC11) first gained traction with a memory-optimized, skiplist-based row store table structure. As cloud RAM prices stayed high and disks got faster, MemSQL subsequently invested in building a disk-based columnar storage engine. So what happened to the row store?
If you already read the linked paper (which is much more interesting than this article), you know that row storage in SingleStore still exists in two ways.
First, column store tables have a row-store segment that buffers rows until they can be accumulated into batches and flushed to the columnar on-disk format.
Second, you can still create tables with the classic row storage structure, including the skiplist index, using the ROWSTORE option. But why would you do this?
SingleStore’s unified table structure is great for most use cases. It’s exceptional at analytical queries, and it supports high-volume read and write transactions due to years of optimization and enhancements. But for the absolute lowest and most predictable latency on point lookups and small inserts, updates, and deletes, a ROWSTORE table can outperform it, thanks to the universally applicable performance optimization of doing less work.
Unfortunately, my usual approach of using the SingleStore Free Shared Tier won’t work, because it doesn’t support ROWSTORE tables. So instead, I’ll turn to a convenient way to run SingleStore for development and testing: the SingleStore Dev Image. Running an ephemeral instance of SingleStore with Docker is as simple as the following:
docker run --rm \
-e ROOT_PASSWORD="password123" \
-p 3306:3306 -p 8080:8080 -p 9000:9000 \
ghcr.io/singlestore-labs/singlestoredb-dev:latest
After running this, just browse to http://localhost:8080, enter root and password123 for your credentials, and you’re up and running!
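If you’d rather check from code, here’s a minimal connectivity sketch using the singlestoredb Python client (installed with pip install singlestoredb), assuming the container above is still running; it connects over the MySQL protocol on port 3306 and runs a trivial query:

import singlestoredb as s2

# Connect to the Dev Image with the root credentials set above.
with s2.connect(
    user="root",
    password="password123",
    host="localhost",
    port=3306,
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT NOW()")
        print(cur.fetchall())  # prints the server's current timestamp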
Create a ROWSTORE table like this:
CREATE DATABASE IF NOT EXISTS test;
USE test;
CREATE ROWSTORE TABLE foo (
    k INT PRIMARY KEY,
    v INT NOT NULL
);
Now you can load a bunch of data to test it out! Here’s one way to bulk load with Python, adapted from loadit:
import io
import singlestoredb as s2
from queue import Queue

def load(queue):
    config = {
        "user": "root",
        "password": "password123",
        "host": "localhost",
        "port": 3306,
        "database": "test",
        "local_infile": True,
    }
    with s2.connect(**config) as conn:
        with conn.cursor() as cur:
            cur.execute("TRUNCATE TABLE foo;")
            # The ':stream:' filename plus infile_stream tells the client to
            # feed LOAD DATA from the queue, chunk by chunk, until it sees None.
            cur.execute(
                (
                    "LOAD DATA LOCAL INFILE ':stream:' INTO TABLE foo "
                    "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';"
                ),
                infile_stream=queue,
            )

print("preparing data...")
buffer = io.BytesIO()
queue = Queue()
# Chunks of 1M rows work better than one giant 16M row chunk.
# This should take maybe 10 seconds to run.
for i in range(16):
    buffer.seek(0)
    buffer.truncate()
    for j in range(1000000):
        buffer.write(f"{1000000*i + j},1\n".encode())
    queue.put(buffer.getvalue())
queue.put(None)

print("loading data...")
load(queue)
print("done")
If you need predictable, very low latencies for small transactions, and your data set fits in the available memory, SingleStore’s ROWSTORE tables can be an interesting choice. You can mix-and-match them with column store tables in the same database, so use whichever is the most appropriate for your workload. The ROWSTORE table option is used somewhat less frequently than column store tables, but for some use cases it’s just the right tool for the job.
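As a purely hypothetical sketch of that mix-and-match (the sessions and events table names are made up for illustration), you might keep a small, hot lookup table in memory as a ROWSTORE while bulkier analytical data lives in a default column store table in the same database:

import singlestoredb as s2

with s2.connect(
    user="root",
    password="password123",
    host="localhost",
    port=3306,
    database="test",
) as conn:
    with conn.cursor() as cur:
        # In-memory rowstore: hot, latency-sensitive point lookups by primary key.
        cur.execute(
            "CREATE ROWSTORE TABLE IF NOT EXISTS sessions ("
            " id BIGINT PRIMARY KEY,"
            " payload JSON NOT NULL"
            ");"
        )
        # Plain CREATE TABLE: a column store table (the default in recent
        # SingleStore versions), suited to large scans and analytics.
        cur.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            " ts DATETIME NOT NULL,"
            " session_id BIGINT NOT NULL,"
            " action TEXT,"
            " SORT KEY (ts)"
            ");"
        )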