In a prior age, MemSQL (YC11) first gained traction with a memory-optimized, skiplist-based row store table structure. As cloud RAM prices stayed high and disks got faster, MemSQL subsequently invested in building a disk-based columnar storage engine. So what happened to the row store?
If you already read the linked paper (which is much more interesting than this article), you know that row storage in SingleStore still exists in two ways.
First, column store tables have a row-store segment that buffers rows until they can be accumulated into batches and flushed to the columnar on-disk format.
Second, you can still create tables with the classic row storage structure, including the skiplist index, using the ROWSTORE option. But why would you do this?
SingleStore’s unified table structure is great for most use cases. It’s exceptional at analytical queries, and it supports high-volume read and write transactions due to years of optimization and enhancements. But for the absolute lowest and most predictable latency on point lookups and small inserts, updates, and deletes, a ROWSTORE table can outperform it, thanks to the universally applicable performance optimization of doing less work.
Unfortunately, my usual approach of using the SingleStore Free Shared Tier won’t work, because it doesn’t support ROWSTORE tables. So instead, I’ll turn to a convenient way to run SingleStore for development and testing: the SingleStore Dev Image. Running an ephemeral instance of SingleStore with Docker is as simple as the following:
docker run --rm \
-e ROOT_PASSWORD="password123" \
-p 3306:3306 -p 8080:8080 -p 9000:9000 \
ghcr.io/singlestore-labs/singlestoredb-dev:latest
After running this, just browse to http://localhost:8080, enter root and password123 for your credentials, and you’re up and running!
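If you’d rather check from code, here’s a minimal connectivity sketch using the singlestoredb Python client (installed with pip install singlestoredb), assuming the container above is still running; it connects over the MySQL protocol on port 3306 and runs a trivial query:

import singlestoredb as s2

# Connect to the Dev Image with the root credentials set above.
with s2.connect(
    user="root",
    password="password123",
    host="localhost",
    port=3306,
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT NOW()")
        print(cur.fetchall())  # prints the server's current timestamp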
Create a ROWSTORE table like this:
CREATE DATABASE IF NOT EXISTS test;
USE test;
CREATE ROWSTORE TABLE foo (
    k INT PRIMARY KEY,
    v INT NOT NULL
);
Now you can load a bunch of data to test it out! Here’s one way to bulk load with Python, adapted from loadit:
import io
import singlestoredb as s2
from queue import Queue

def load(queue):
    config = {
        "user": "root",
        "password": "password123",
        "host": "localhost",
        "port": 3306,
        "database": "test",
        "local_infile": True,
    }
    with s2.connect(**config) as conn:
        with conn.cursor() as cur:
            cur.execute("TRUNCATE TABLE foo;")
            # The ':stream:' filename plus infile_stream tells the client to
            # feed LOAD DATA from the queue, chunk by chunk, until it sees None.
            cur.execute(
                (
                    "LOAD DATA LOCAL INFILE ':stream:' INTO TABLE foo "
                    "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';"
                ),
                infile_stream=queue,
            )

print("preparing data...")
buffer = io.BytesIO()
queue = Queue()
# Chunks of 1M rows work better than one giant 16M row chunk.
# This should take maybe 10 seconds to run.
for i in range(16):
    buffer.seek(0)
    buffer.truncate()
    for j in range(1000000):
        buffer.write(f"{1000000*i + j},1\n".encode())
    queue.put(buffer.getvalue())
queue.put(None)

print("loading data...")
load(queue)
print("done")
If you need predictable, very low latencies for small transactions, and your data set fits in the available memory, SingleStore’s ROWSTORE tables can be an interesting choice. You can mix-and-match them with column store tables in the same database, so use whichever is the most appropriate for your workload. The ROWSTORE table option is used somewhat less frequently than column store tables, but for some use cases it’s just the right tool for the job.
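As a purely hypothetical sketch of that mix-and-match (the sessions and events table names are made up for illustration), you might keep a small, hot lookup table in memory as a ROWSTORE while bulkier analytical data lives in a default column store table in the same database:

import singlestoredb as s2

with s2.connect(
    user="root",
    password="password123",
    host="localhost",
    port=3306,
    database="test",
) as conn:
    with conn.cursor() as cur:
        # In-memory rowstore: hot, latency-sensitive point lookups by primary key.
        cur.execute(
            "CREATE ROWSTORE TABLE IF NOT EXISTS sessions ("
            " id BIGINT PRIMARY KEY,"
            " payload JSON NOT NULL"
            ");"
        )
        # Plain CREATE TABLE: a column store table (the default in recent
        # SingleStore versions), suited to large scans and analytics.
        cur.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            " ts DATETIME NOT NULL,"
            " session_id BIGINT NOT NULL,"
            " action TEXT,"
            " SORT KEY (ts)"
            ");"
        )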