Jason Thorsness

github
github icon
linkedin
linkedin icon
twitter
twitter icon
hi
18 ā€” Jan 22 25

Build On SingleStore šŸ› ļø
Analytics

At least until Tractor Loader takes off, my day job involves building the SingleStore database and managed service. SingleStore is the evolution and new name for MemSQL, which began years ago as a distributed in-memory SQL database. Itā€™s an independent engine, which makes it particularly interesting to work on and gives it unique strengths compared to PostgreSQL, MySQL, and other SQL databases.

One of the downsides to being a unique engine, as opposed to, say, a PostgreSQL plugin, is that thereā€™s less written about when and how to use SingleStore. So, Iā€™m going to cover a few use cases where SingleStore works well, from the ground-up, and apply them to my own web sites as an example.

So whatā€™s one good use-case? When you need analytics!

Show Me A Chart

Mostly my web sites use Vercel. Iā€™d like an independent count of page visits, since Vercelā€™s own analytics are mostly an upsell.

SingleStore is great for this scenario due to its two-tier storage for tables ā€” a row store segment for fresh data supporting fast writes and retrievals, flushed periodically to a columnar format and pushed to bottomless object storage. The application gets the best of both worlds: fast analytics and fresh data, and the database doesnā€™t have to work very hard.

Hereā€™s a real-time view of this pageā€™s traffic over SingleStore since I wrote this article. Try refreshing! Try to overwhelm my system by sharing this article with all of your friends šŸ˜‰.

Loading...

Since this blog is more aspirational than popular, most pages donā€™t get many views. But since I wrote this article one did: DeepSeek My User Agent. Check out the traffic for that page!

Loading...

Build It Yourself

OK first, if you just want analytics probably just buy something like Vercelā€™s analytics or Fathom or something else. But if you want to BUILD an analytics SaaS, or strictly do-it-yourself, then SingleStore can be the database for you.

The Database

First, youā€™ll need a SingleStore instance. Iā€™ve previously set up an account for my personal projects on SingleStoreā€™s Free Tier which I currently use as a caching layer for weather.bingo. Iā€™ll use it for my analytics backend as well.

Iā€™ll be connecting to SingleStore using the SingleStore Kai endpoint for MongoDB client drivers. Why MongoDB? I am a shill for SingleStore Kai, and also I want client-first schema because itā€™s easier. The same thing can be accomplished through the SQL interface, so use whatever you prefer.

Iā€™ll store my data in an analytics collection. The SingleStore Kai proxy will create the backing table with just two main columns, an _id and _more. This can be ported to a more optimized schema later (the subject of a future articleā€¦).

The Source

My Vercel site is static and cached in many places, so Iā€™ll need to trigger the userā€™s browser to make an API call on every page to be counted. Iā€™ll use a tried-and-true option: an invisible tracking pixel. This will call a server API where I can log the request.

Hereā€™s the client code Iā€™ll add to my root layout.tsx:

<img src="/api/tracking" alt="" width={1} height={1} className="hidden" />

Hereā€™s the API route handler.

import clientPromise from "../../mongodb";
import { waitUntil } from "@vercel/functions";
import { NextRequest, NextResponse, userAgent } from "next/server";

const pixel = Buffer.from(
  // curl https://upload.wikimedia.org/wikipedia/commons/c/c0/Blank.gif | base64
  "R0lGODlhAQABAIAAAP///wAAACH5BAAAAAAALAAAAAABAAEAAAICRAEAOw==",
  "base64"
);

export async function GET(req: NextRequest): Promise<NextResponse> {
  let path = "";
  let hostname = "";
  try {
    const url = new URL(req.headers.get("referer")!);
    path = url.pathname;
    hostname = url.hostname;
  } catch {
    // OK
  }

  // send to DB in background
  waitUntil(
    (await clientPromise)
      .db()
      .collection<any>("analytics")
      .insertOne({
        hostname: hostname,
        path: path,
        country: req.headers.get("x-vercel-ip-country"),
        region: req.headers.get("x-vercel-ip-country-region"),
        city: req.headers.get("x-vercel-ip-city"),
        ua: userAgent(req),
        ts: new Date(),
      })
  );

  return new NextResponse(pixel, {
    headers: {
      "Content-Type": "image/gif",
    },
  });
}

The Client

I want to display the up-to-the-second stats for visitors to my page, so Iā€™ll create an API route to serve the data. I asked ChatGPT to write this MongoDB-language query; GPT-4o is quite good at MongoDB queries.

import clientPromise from "../../mongodb";
import { NextRequest, NextResponse, userAgent } from "next/server";

export async function GET(req: NextRequest): Promise<NextResponse> {
  const { searchParams } = new URL(req.url);
  const path = searchParams.get("path");
  const col = (await clientPromise).db().collection("analytics");

  const response = await col
    .aggregate([
      { $match: { path } },
      {
        $group: {
          _id: "$country",
          count: { $sum: 1 },
        },
      },
      {
        $unionWith: {
          coll: "analytics",
          pipeline: [
            { $match: { path } },
            { $group: { _id: "Total", count: { $sum: 1 } } }
          ],
        },
      },
      { $project: { _id: { $ifNull: ["$_id", "Other"] }, count: 1 } },
      { $sort: { count: -1 } },
      { $limit: 10 },
    ])
    .toArray();

  return new NextResponse(JSON.stringify(response), {
    headers: {
      "Content-Type": "application/json",
    },
  });
}

Here is the data rendered with chart.js:

<Bar
  data={{
    labels: data.map((item: any) => item._id),
    datasets: [
      {
        label: "Views",
        data: data.map((item: any) => item.count),
        backgroundColor: "rgba(54, 162, 235, 0.5)",
      },
    ],
  }}
  options={{
    indexAxis: "y",
    animation: false,
    plugins: {
      tooltip: false,
    },
    hover: {
      mode: null,
    },
  }}
/>

But, Does It Scale?

The analytics use-case is a proven use of SingleStore, even at very large, multi-terabyte scale. For my own site, I wonā€™t need to scale beyond the free tier, even when I (hopefully) do start to get tens of thousands of views per article.

One of SingleStoreā€™s competitors, ClickHouse, publishes a benchmark where SingleStore ranks up with the best performing databases for web-style analytics. But what that benchmark doesnā€™t tell you is how SingleStore supports that performance over arbitrary JSON and BSON data, maintains the performance with constant real-time ingestion, supports many joins in analytics queries, and includes many other features that make the same system suitable for a wide variety of use-cases without requiring complex ETL moving the data around all the time.

Iā€™ll touch on more of those capabilities in some future articles. Thanks for reading!

Ā TopĀ 
TermsPrivacy