Whether you’re a developer, data analyst, QA engineer, DevOps engineer, or product manager, SQLite is the perfect tool for you.
There are a few well-known facts to begin with.
- SQLite is the most common DBMS in the world, supported by all popular operating systems.
- SQLite is serverless.
- For developers, SQLite can be embedded directly into applications.
- For others, there is a handy database console (REPL) that comes in the form of a single file (sqlite3.exe on Windows, sqlite3 on Linux/macOS).
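"Embedded" means the entire database engine lives inside your process, with no server to install or run. A minimal sketch using Python's built-in sqlite3 module (the table and data here are made up for illustration):

```python
import sqlite3

# SQLite is embedded: the whole engine runs inside this process.
# ":memory:" keeps the example self-contained; a file path works the same way.
conn = sqlite3.connect(":memory:")
conn.execute("create table city(name text, timezone text)")
conn.execute("insert into city values ('Berlin', 'UTC+1'), ('Moscow', 'UTC+3')")

count = conn.execute("select count(*) from city").fetchone()[0]
print(count)  # 2
```

The same few lines work in any language with SQLite bindings; there is no connection string, port, or credentials to manage.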
Console, import and export
The console is SQLite’s killer feature for data analysis: more powerful than Excel and simpler than pandas. You can import CSV data with a single command, and the table is created automatically.
> .import --csv city.csv city
> select count(*) from city;
1117
The console supports basic SQL functionality and displays query results in a nice ASCII-drawn table. Advanced SQL features are also supported, but more on that later.
select
century || ' century' as dates,
count(*) as city_count
from history
group by century
order by century desc;
┌──────────────┬────────────┐
│    dates     │ city_count │
├──────────────┼────────────┤
│ 21 century   │ 1          │
│ 20 century   │ 263        │
│ 19 century   │ 189        │
│ 18 century   │ 191        │
│ 17 century   │ 137        │
│ ...          │ ...        │
└──────────────┴────────────┘
Data can be exported as SQL, CSV, JSON, or even Markdown or HTML. All it takes is a few commands.
.mode json
.output city.json
select city, foundation_year, timezone from city limit 10;
.shell cat city.json
[
{ "city": "Amsterdam", "foundation_year": 1300, "timezone": "UTC+1" },
{ "city": "Berlin", "foundation_year": 1237, "timezone": "UTC+1" },
{ "city": "Helsinki", "foundation_year": 1548, "timezone": "UTC+2" },
{ "city": "Monaco", "foundation_year": 1215, "timezone": "UTC+1" },
{ "city": "Moscow", "foundation_year": 1147, "timezone": "UTC+3" },
{ "city": "Reykjavik", "foundation_year": 874, "timezone": "UTC" },
{ "city": "Sarajevo", "foundation_year": 1461, "timezone": "UTC+1" },
{ "city": "Stockholm", "foundation_year": 1252, "timezone": "UTC+1" },
{ "city": "Tallinn", "foundation_year": 1219, "timezone": "UTC+2" },
{ "city": "Zagreb", "foundation_year": 1094, "timezone": "UTC+1" }
]
If you use BI tools more often than the console: popular data exploration tools such as Metabase, Redash, and Superset all support SQLite.
Local JSON
Nothing is more convenient than SQLite for analyzing and transforming JSON. You can select data directly from the file as if it were a normal table. Or import the data into a table and select from it.
select
json_extract(value, '$.iso.code') as code,
json_extract(value, '$.iso.number') as num,
json_extract(value, '$.name') as name,
json_extract(value, '$.units.major.name') as unit
from
json_each(readfile('currency.sample.json'))
;
┌──────┬─────┬─────────────────┬──────────┐
│ code │ num │      name       │   unit   │
├──────┼─────┼─────────────────┼──────────┤
│ ARS  │ 032 │ Argentine peso  │ peso     │
│ CHF  │ 756 │ Swiss Franc     │ Franc    │
│ EUR  │ 978 │ Euro            │ Euro     │
│ GBP  │ 826 │ British Pound   │ Pound    │
│ INR  │ 356 │ Indian Rupee    │ Rupee    │
│ JPY  │ 392 │ Japanese yen    │ yen      │
│ MAD  │ 504 │ Moroccan Dirham │ Dirham   │
│ RUR  │ 643 │ Russian Rouble  │ Rouble   │
│ SOS  │ 706 │ Somali Shilling │ Shilling │
│ USD  │ 840 │ US Dollar       │ Dollar   │
└──────┴─────┴─────────────────┴──────────┘
No matter how deeply the JSON is nested, you can extract any nested object.
select
json_extract(value, '$.id') as id,
json_extract(value, '$.name') as name
from
json_tree(readfile('industry.sample.json'))
where
path like '$[%].industries'
;
┌────────┬──────────────────────┐
│   id   │         name         │
├────────┼──────────────────────┤
│ 7.538  │ Internet provider    │
│ 7.539  │ IT consulting        │
│ 7.540  │ Software development │
│ 9.399  │ Mobile communication │
│ 9.400  │ Fixed communication  │
│ 9.401  │ Fiber-optics         │
│ 43.641 │ Audit                │
│ 43.646 │ Insurance            │
│ 43.647 │ Bank                 │
└────────┴──────────────────────┘
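The same JSON functions are available from application code, not just the console. A small sketch using Python's sqlite3 module, with an inline JSON document standing in for currency.sample.json (the sample data here is made up):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# A made-up JSON document standing in for currency.sample.json.
doc = json.dumps([
    {"iso": {"code": "EUR", "number": "978"}, "name": "Euro"},
    {"iso": {"code": "USD", "number": "840"}, "name": "US Dollar"},
])

# json_each() turns each top-level array element into a row;
# json_extract() pulls fields out of the nested objects.
rows = conn.execute(
    """
    select
        json_extract(value, '$.iso.code') as code,
        json_extract(value, '$.name') as name
    from json_each(?)
    """,
    (doc,),
).fetchall()
print(rows)  # [('EUR', 'Euro'), ('USD', 'US Dollar')]
```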
CTEs and set operations
Of course, SQLite supports common table expressions (WITH clauses) and JOINs, which I won’t illustrate here. If the data is hierarchical (tables referring to themselves via columns like parent_id), WITH RECURSIVE will come in handy. Any hierarchy, no matter how deep, can be “expanded” with a single query.
with recursive tmp(id, name, level) as (
select id, name, 1 as level
from area
where parent_id is null
union all
select
area.id,
tmp.name || ', ' || area.name as name,
tmp.level + 1 as level
from area
join tmp on area.parent_id = tmp.id
)
select * from tmp;
┌──────┬──────────────────────────┬───────┐
│  id  │           name           │ level │
├──────┼──────────────────────────┼───────┤
│ 93   │ US                       │ 1     │
│ 768  │ US, Washington DC        │ 2     │
│ 1833 │ US, Washington           │ 2     │
│ 2987 │ US, Washington, Bellevue │ 3     │
│ 3021 │ US, Washington, Everett  │ 3     │
│ 3039 │ US, Washington, Kent     │ 3     │
│ ...  │ ...                      │ ...   │
└──────┴──────────────────────────┴───────┘
Sets? No problem: UNION, INTERSECT, EXCEPT.
select employer_id
from employer_area
where area_id = 1
except
select employer_id
from employer_area
where area_id = 2;
Need to calculate one column based on other columns? Enter generated columns.
alter table vacancy
add column salary_net integer as (
    case
        when salary_gross = true then round(salary_from / 1.04)
        else salary_from
    end
);
The generated columns can be queried in the same way as “normal” columns.
select
substr(name, 1, 40) as name,
salary_net
from vacancy
where
salary_currency = 'JPY'
and salary_net is not null
limit 10;
Mathematical statistics
Descriptive statistics? Simple: mean, median, percentile, standard deviation, whatever you want. You have to load an extension, but it’s also a single command (and a file).
.load sqlite3-stats
select
count(*) as book_count,
cast(avg(num_pages) as integer) as mean,
cast(median(num_pages) as integer) as median,
mode(num_pages) as mode,
percentile_90(num_pages) as p90,
percentile_95(num_pages) as p95,
percentile_99(num_pages) as p99
from books;
┌────────────┬──────┬────────┬──────┬─────┬─────┬──────┐
│ book_count │ mean │ median │ mode │ p90 │ p95 │ p99  │
├────────────┼──────┼────────┼──────┼─────┼─────┼──────┤
│ 1483       │ 349  │ 295    │ 256  │ 640 │ 817 │ 1199 │
└────────────┴──────┴────────┴──────┴─────┴─────┴──────┘
A note on extensions. SQLite lacks a lot of functionality compared to other DBMSs such as PostgreSQL. But missing functions are easy to add, and that is what people do, so the result is quite a mess.
Therefore, I decided to make a consistent set of extensions, grouped by domain area and compiled for the major operating systems. There are only a few of them so far, but more are on the way.
sqlean @ GitHub
More on the joys of statistics: you can plot the data distribution right in the console. Look how cute it is.
with slots as (
select
num_pages/100 as slot,
count(*) as book_count
from books
group by slot
),
max as (
select max(book_count) as value
from slots
)
select
slot,
book_count,
printf('%.' || (book_count * 30 / max.value) || 'c', '*') as bar
from slots, max
order by slot;
┌──────┬────────────┬────────────────────────────────┐
│ slot │ book_count │ bar                            │
├──────┼────────────┼────────────────────────────────┤
│ 0    │ 116        │ *********                      │
│ 1    │ 254        │ ********************           │
│ 2    │ 376        │ ****************************** │
│ 3    │ 285        │ **********************         │
│ 4    │ 184        │ **************                 │
│ 5    │ 90         │ *******                        │
│ 6    │ 54         │ ****                           │
│ 7    │ 41         │ ***                            │
│ 8    │ 31         │ **                             │
│ 9    │ 15         │ *                              │
│ 10   │ 11         │                                │
│ 11   │ 12         │                                │
│ 12   │ 2          │                                │
└──────┴────────────┴────────────────────────────────┘
Performance
SQLite can handle billions of records just fine. A plain INSERT shows about 240,000 records per second on my laptop. And if you connect the CSV file as a virtual table (there is an extension for that), inserts become twice as fast.
.load sqlite3-vsv
create virtual table temp.blocks_csv using vsv(
filename="ipblocks.csv",
schema="create table x(network text, geoname_id integer, registered_country_geoname_id integer, represented_country_geoname_id integer, is_anonymous_proxy integer, is_satellite_provider integer, postal_code text, latitude real, longitude real, accuracy_radius integer)",
columns=10,
header=on,
nulls=on
);
.timer on
insert into blocks
select * from blocks_csv;
Run Time: real 5.176 user 4.716420 sys 0.403866
select count(*) from blocks;
3386629
Run Time: real 0.095 user 0.021972 sys 0.063716
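The plain-INSERT path is just as simple from application code. A sketch of a fast bulk load in Python (the table and rows here are made up; throughput numbers will of course vary by machine):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table blocks(network text, latitude real, longitude real)")

# Hypothetical rows standing in for a parsed CSV file;
# csv.reader over a real file would yield the same kind of tuples.
rows = [(f"10.0.{i}.0/24", 50.0 + i * 0.001, 8.0 + i * 0.001)
        for i in range(1000)]

# One transaction around executemany is what keeps the insert rate high:
# committing per row would force a disk flush after every record.
with conn:
    conn.executemany("insert into blocks values (?, ?, ?)", rows)

count = conn.execute("select count(*) from blocks").fetchone()[0]
print(count)  # 1000
```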
There is a popular belief among developers that SQLite is unsuitable for the web because it does not support concurrent access. It’s a myth. In write-ahead logging (WAL) mode, which has been around for a long time, there can be as many concurrent readers as you want. There can be only one concurrent writer, but often one is enough.
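Switching to WAL is a one-time pragma per database file. A minimal sketch in Python (the file path is made up; WAL applies only to file-backed databases, not to :memory:):

```python
import os
import sqlite3
import tempfile

# WAL only applies to file-backed databases, so use a temporary file here.
path = os.path.join(tempfile.mkdtemp(), "app.db")
conn = sqlite3.connect(path)

# Switch the journal to write-ahead logging. The setting is persistent,
# so it only needs to be done once per database file.
mode = conn.execute("pragma journal_mode=wal").fetchone()[0]
print(mode)  # wal
```

From then on, any number of readers can run alongside a single writer without blocking each other.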
SQLite is the perfect choice for small websites and applications. sqlite.org uses SQLite as its database and doesn’t bother to optimize (≈200 requests per page). It handles 700,000 visits a month and serves pages faster than 95% of the sites I’ve seen.
Documents, graphs, and search
SQLite supports partial indexes and indexes on expressions, just like a “big” DBMS. You can index generated columns and even turn SQLite into a document database. Simply store raw JSON and build indexes on json_extract()-ed columns.
create table currency(
body text,
code text as (json_extract(body, '$.code')),
name text as (json_extract(body, '$.name'))
);
create index currency_code_idx on currency(code);
insert into currency
select value
from json_each(readfile('currency.sample.json'));
explain query plan
select name from currency where code = 'EUR';
QUERY PLAN
`--SEARCH TABLE currency USING INDEX currency_code_idx (code=?)
You can also use SQLite as a graph database. A bunch of complex WITH RECURSIVE queries will do, or maybe you’d prefer to add a little Python.
simple-graph @ GitHub
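The "little Python" approach can be as small as an edge table plus one recursive CTE. A toy sketch (the graph data here is made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A toy edge list: who follows whom in a made-up social graph.
conn.execute("create table edge(src text, dst text)")
conn.executemany(
    "insert into edge values (?, ?)",
    [("ann", "bob"), ("bob", "cindy"), ("cindy", "dave"), ("ann", "eve")],
)

# All nodes reachable from 'ann', found with a recursive CTE.
# UNION (not UNION ALL) deduplicates, so cycles would not loop forever.
reachable = [row[0] for row in conn.execute(
    """
    with recursive walk(node) as (
        select 'ann'
        union
        select edge.dst from edge join walk on edge.src = walk.node
    )
    select node from walk order by node
    """
)]
print(reachable)  # ['ann', 'bob', 'cindy', 'dave', 'eve']
```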
Full-text search works out of the box.
create virtual table books_fts
using fts5(title, author, publisher);
insert into books_fts
select title, author, publisher from books;
select
author,
substr(title, 1, 30) as title,
substr(publisher, 1, 10) as publisher
from books_fts
where books_fts match 'ann'
limit 5;
┌─────────────────────┬────────────────────────────────┬────────────┐
│       author        │             title              │ publisher  │
├─────────────────────┼────────────────────────────────┼────────────┤
│ Ruby Ann Boxcar     │ Ruby Ann's Down Home Trailer P │ Citadel    │
│ Ruby Ann Boxcar     │ Ruby Ann's Down Home Trailer P │ Citadel    │
│ Lynne Ann DeSpelder │ The Last Dance: Encountering D │ McGraw-Hil │
│ Daniel Defoe        │ Robinson Crusoe                │ Ann Arbor  │
│ Ann Thwaite         │ Waiting for the Party: The Lif │ David R. G │
└─────────────────────┴────────────────────────────────┴────────────┘
Perhaps you need an in-memory database for intermediate calculations? One line of Python code.
db = sqlite3.connect(":memory:")
You can even access it from multiple connections.
db = sqlite3.connect("file::memory:?cache=shared", uri=True)
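With the shared cache, two separate connections really do see the same in-memory database. A small sketch (note uri=True, which tells Python's sqlite3 to treat the string as a URI rather than a file name):

```python
import sqlite3

# Two separate connections to the same shared in-memory database.
uri = "file::memory:?cache=shared"
db1 = sqlite3.connect(uri, uri=True)
db2 = sqlite3.connect(uri, uri=True)

# Write through the first connection...
db1.execute("create table tmp(x integer)")
db1.execute("insert into tmp values (42)")
db1.commit()

# ...and read it back through the second one.
value = db2.execute("select x from tmp").fetchone()[0]
print(value)  # 42
```

The database lives as long as at least one connection to it stays open.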
And so much more
There are fancy window functions (as in PostgreSQL). UPSERT, UPDATE FROM, and generate_series(). R-Tree indexes. Regular expressions, fuzzy search, and geo. In terms of functionality, SQLite can compete with any “big” DBMS.
There are also great tools around SQLite. I particularly like Datasette, an open source tool for exploring and publishing SQLite datasets. DBeaver is an excellent open source database IDE that supports the latest version of SQLite.
I hope this article inspires you to try SQLite. Thanks for reading!
Follow @ohmypy on Twitter to learn about new articles 🚀
antonz.org/sqlite-is-n…