Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 1 | # SQLite abstraction layer |
| 2 | |
Victor Costan | f265dd7f | 2021-05-26 17:47:48 | [diff] [blame] | 3 | [TOC] |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 4 | |
| 5 | ## SQLite for system designers |
| 6 | |
| 7 | [SQLite](https://www.sqlite.org/) is a |
| 8 | [relational database management system (RDBMS)](https://en.wikipedia.org/wiki/Relational_database#RDBMS) |
| 9 | that [supports most of SQL](https://www.sqlite.org/lang.html). |
| 10 | |
| 11 | SQLite is architected as a library that can be embedded in another application, |
| 12 | such as Chrome. SQLite runs in the application's process, and shares its memory |
| 13 | and other resources. This is similar to embedded databases like |
| 14 | [LevelDB](https://github.com/google/leveldb) and |
| 15 | [BerkeleyDB](https://en.wikipedia.org/wiki/Berkeley_DB). By contrast, most |
| 16 | popular RDMBSes, like [PostgreSQL](https://www.postgresql.org/) and |
| 17 | [MySQL](https://www.mysql.com/), are structured as standalone server processes |
| 18 | that accept queries from client processes. |
| 19 | |
| 20 | TODO: Explain the process model and locking |
| 21 | |
| 22 | TODO: Explain Chrome decisions -- exclusive locking, full per-feature isolation |
| 23 | (separate databases and page caches) |
| 24 | |
| 25 | |
| 26 | ## SQLite for database designers |
| 27 | |
| 28 | The section summarizes aspects of SQLite that are relevant to schema and |
| 29 | query design, and may be surprising to readers with prior experience in other |
| 30 | popular SQL database systems, such as |
| 31 | [PostgreSQL](https://www.postgresql.org/) and [MySQL](https://www.mysql.com/). |
| 32 | |
| 33 | |
Victor Costan | d9f80b5 | 2021-09-23 23:50:23 | [diff] [blame] | 34 | ### Data storage model {#storage-model} |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 35 | |
Victor Costan | d9f80b5 | 2021-09-23 23:50:23 | [diff] [blame] | 36 | The main bottleneck in SQLite database performance is usually disk I/O. So, |
| 37 | designing schemas that perform well requires understanding how SQLite stores |
| 38 | data on disk. |
| 39 | |
| 40 | At a very high level, a SQLite database is a forest of |
| 41 | [B-trees](https://en.wikipedia.org/wiki/B-tree), some of which are |
| 42 | [B+-trees](https://en.wikipedia.org/wiki/B%2B_tree). The database file is an |
| 43 | array of fixed-size pages, where each page stores a B-tree node. The page size |
| 44 | can only be set when a database file is created, and impacts both SQL statement |
| 45 | execution speed, and memory consumption. |
| 46 | |
| 47 | The data in each table (usually called *rows*, *records*, or *tuples*) is stored |
| 48 | in a separate B-tree. The data in each index (called *entries*, *records* or |
| 49 | *tuples*) is also stored in a separate B-tree. So, each B-tree is associated |
| 50 | with exactly one table. The [*Indexing* section](#indexing-model) goes into |
| 51 | further details. |
| 52 | |
| 53 | Each B-tree node stores multiple tuples of values. The values and their |
| 54 | encodings are described in the [*Value types* section](#data-types). |
| 55 | |
| 56 | Tying everything together: The performance of a SQL statement is roughly the |
| 57 | number of database pages touched (read / written) by the statement. These pages |
| 58 | are nodes belonging to the B-trees associated with the tables mentioned in the |
| 59 | statement. The number of pages touched when accessing a B-tree depends on the |
| 60 | B-tree's depth. Each B-tree's depth depends on its record count (number of |
| 61 | records stored in it), and on its node width (how many records fit in a node). |
| 62 | |
| 63 | |
| 64 | #### Value types {#data-types} |
| 65 | |
| 66 | SQLite stores values using |
| 67 | [5 major types](https://www.sqlite.org/datatype3.html), which are summarized |
| 68 | below. |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 69 | |
| 70 | 1. NULL is a special type for the `NULL` value. |
| 71 | |
| 72 | 2. INTEGER represents big-endian twos-complement integers. Boolean values |
| 73 | (`TRUE` and `FALSE`) are represented as the integer values 1 and 0. |
| 74 | |
| 75 | 3. REAL represents IEEE 754-2008 64-bit floating point numbers. |
| 76 | |
| 77 | 4. TEXT represents strings (sequences of characters) encoded using a |
| 78 | [supported SQLite encoding](https://www.sqlite.org/c3ref/c_any.html). These |
| 79 | values are |
| 80 | [sorted](https://www.sqlite.org/datatype3.html#sort_order) according to |
| 81 | [a collating sequence](https://www.sqlite.org/datatype3.html#collation) or |
| 82 | [a collating function](https://www.sqlite.org/c3ref/create_collation.html). |
| 83 | |
| 84 | 5. BLOB represents sequences of bytes that are opaque to SQLite. These values are |
| 85 | sorted using the bitwise binary comparison offered by |
| 86 | [memcmp](https://en.cppreference.com/w/cpp/string/byte/memcmp). |
| 87 | |
| 88 | SQLite stores index keys and row values (records / tuples) using |
| 89 | [a tightly packed format](https://sqlite.org/fileformat2.html#record_format) |
| 90 | that makes heavy use of [varints](https://sqlite.org/fileformat2.html#varint) |
| 91 | and variable-length fields. The column types have almost no influence on the |
| 92 | encoding of values. This has the following consequences. |
| 93 | |
| 94 | * All SQL integer types, such as `TINYINT` and `BIGINT`, are treated as aliases |
| 95 | for `INTEGER`. |
| 96 | * All SQL non-integer numeric types, such as `DECIMAL`, `FLOAT`, and |
| 97 | `DOUBLE PRECISION` are treated as aliases for `REAL`. |
| 98 | * Numeric precision and scale specifiers, such as `DECIMAL(5,2)` are ignored. |
| 99 | * All string types, such as `CHAR`, `CHARACTER VARYING`, `VARCHAR`, and `CLOB`, |
| 100 | are treated as aliases for `TEXT`. |
| 101 | * Maximum string length specifiers, such as `CHAR(255)` are ignored. |
| 102 | |
| 103 | SQLite uses clever heuristics, called |
| 104 | [type affinity](https://www.sqlite.org/datatype3.html#type_affinity), |
| 105 | to map SQL column types such as `VARCHAR` to the major types above. |
| 106 | |
| 107 | Chrome database schemas should avoid type affinity, and should not include any |
| 108 | information ignored by SQLite. |
| 109 | |
| 110 | |
Victor Costan | d9f80b5 | 2021-09-23 23:50:23 | [diff] [blame] | 111 | #### Indexing {#indexing-model} |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 112 | |
| 113 | SQLite [uses B-trees](https://www.sqlite.org/fileformat2.html#pages) to store |
| 114 | both table and index data. |
| 115 | |
| 116 | The exclusive use of B-trees reduces the amount of schema design decisions. |
| 117 | Notable examples: |
| 118 | |
| 119 | * There is no equivalent to |
| 120 | [PostgreSQL's index types](https://www.postgresql.org/docs/13/indexes-types.html). |
| 121 | In particular, since there are no hashed indexes, the design does not need to |
| 122 | consider whether the index only needs to support equality queries, as opposed |
| 123 | to greater/smaller than comparisons. |
| 124 | |
| 125 | * There is no equivalent to |
| 126 | [PostgreSQL's table access methods](https://www.postgresql.org/docs/13/tableam.html). |
| 127 | Each table is clustered by a primary key index, which is implicitly stored in |
| 128 | the table's B-tree. |
| 129 | |
| 130 | By default, table rows (records / tuples) are stored in a B-tree keyed by |
| 131 | [rowid](https://sqlite.org/lang_createtable.html#rowid), an automatically |
| 132 | assigned 64-bit integer key. Effectively, these tables are clustered by rowid, |
| 133 | which acts as an implicit primary key. Opting out of this SQLite-specific |
| 134 | default requires appending |
| 135 | [`WITHOUT ROWID`](https://sqlite.org/withoutrowid.html) to the `CREATE TABLE` |
| 136 | instruction. |
| 137 | |
| 138 | SQLite's [B-tree page format](https://sqlite.org/fileformat2.html#b_tree_pages) |
| 139 | has optimized special cases for tables clustered by rowid. This makes rowid the |
| 140 | most efficient [surrogate key](https://en.wikipedia.org/wiki/Surrogate_key) |
| 141 | implementation in SQLite. To make this optimization easier to use, any column |
| 142 | that is a primary key and has an `INTEGER` type is considered an alias for |
| 143 | rowid. |
| 144 | |
Victor Costan | c44ad3c | 2021-06-02 18:43:03 | [diff] [blame] | 145 | Each SQLite index |
| 146 | [is stored in a B-tree](https://sqlite.org/fileformat2.html#representation_of_sql_indices). |
| 147 | Each index entry is stored as a B-tree node whose key is made up of the record's |
| 148 | index key column values, followed by the record's primary key column values. |
| 149 | |
| 150 | `WITHOUT ROWID` table indexes can include primary key columns without additional |
| 151 | storage costs. This is because indexes for `WITHOUT ROWID` tables enjoy |
| 152 | [a space optimization](https://sqlite.org/fileformat2.html#representation_of_sql_indices) |
| 153 | where columns in both the primary key and the index key are not stored twice in |
Andrew Paseltiner | 839fc23 | 2023-02-21 19:10:40 | [diff] [blame] | 154 | B-tree nodes. Note that data in such tables cannot be recovered by `sql::Recovery`. |
Victor Costan | c44ad3c | 2021-06-02 18:43:03 | [diff] [blame] | 155 | |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 156 | |
Victor Costan | d9f80b5 | 2021-09-23 23:50:23 | [diff] [blame] | 157 | ### Statement execution model {#query-model} |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 158 | |
Victor Costan | d9f80b5 | 2021-09-23 23:50:23 | [diff] [blame] | 159 | At [a very high level](https://www.sqlite.org/arch.html), SQLite compiles SQL |
| 160 | statements (often called *queries*) into bytecode executed by a virtual machine |
| 161 | called the VDBE, or [the bytecode engine](https://www.sqlite.org/opcode.html). |
| 162 | A compiled statement can be executed multiple times, amortizing the costs of |
| 163 | query parsing and planning. Chrome's SQLite abstraction layer makes it easy to |
| 164 | use compiled queries. |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 165 | |
Victor Costan | d9f80b5 | 2021-09-23 23:50:23 | [diff] [blame] | 166 | Assuming effective use of cached statements, the performance of a SQL statement |
| 167 | comes down to the *query plan* that SQLite generates for the statement. The |
| 168 | query plan is the sequence of B-tree accesses used to execute the statement, |
| 169 | which determines the number of B-tree pages touched. |
| 170 | |
| 171 | The rest of this section summarizes the following SQLite documentation pages. |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 172 | |
| 173 | 1. [query planner overview](https://www.sqlite.org/queryplanner.html) |
| 174 | 2. [query optimizer overview](https://www.sqlite.org/optoverview.html) |
| 175 | 3. [`EXPLAIN QUERY PLAN` output description](https://www.sqlite.org/eqp.html) |
| 176 | |
Victor Costan | d9f80b5 | 2021-09-23 23:50:23 | [diff] [blame] | 177 | At a high level, a SQLite query plan is a sequence of **nested** loops, where |
| 178 | each loop iterates over the data in a B-tree. Each loop can use the current |
| 179 | record of the outer loops. |
| 180 | |
| 181 | TODO: Complete this section. Cover joins, sorting, etc. |
| 182 | |
| 183 | #### Getting SQLite's query plans |
| 184 | |
| 185 | Ideally, the SQL schemas and statements used by Chrome features would be simple |
| 186 | enough that the query plans would be obvious to the reader. |
| 187 | |
| 188 | When this isn't the case, the fastest way to get the query plan is to load the |
| 189 | schema in [the SQLite shell](https://sqlite.org/cli.html), and use |
| 190 | [`EXPLAIN QUERY PLAN`](https://www.sqlite.org/eqp.html). |
| 191 | |
| 192 | The following command builds a SQLite shell that uses Chrome's build of SQLite, |
| 193 | and supports the `EXPLAIN QUERY PLAN` command. |
| 194 | |
| 195 | ```sh |
| 196 | autoninja -C out/Default sqlite_dev_shell |
| 197 | ``` |
| 198 | |
| 199 | Inside the SQLite shell, the `.eqp on` directive automatically shows the results |
| 200 | of `EXPLAIN QUERY PLAN` for every SQL statement executed in the shell. |
| 201 | |
| 202 | |
| 203 | #### Query steps {#query-step-types} |
| 204 | |
| 205 | Query steps are the building blocks of SQLite query plans. Each query step is |
| 206 | essentially a loop that iterates over the records in a B-tree. These loops |
| 207 | differ in terms of how many B-tree pages they touch, and how many records they |
| 208 | produce. This sub-section lists the types of steps implemented by SQLite. |
| 209 | |
| 210 | ##### Scans |
| 211 | |
| 212 | Scans visit an entire (table or index) B-tree. For this reason, scans are almost |
| 213 | never acceptable in Chrome. Most of our features don't have limits on the amount |
| 214 | of stored data, so scans can result in an unbounded amount of I/O. |
| 215 | |
| 216 | A *table scan* visits the entire table's B-tree. |
| 217 | |
| 218 | A *covering index scan* visits an entire index B-tree, but doesn't access the |
| 219 | associated table B-tree. |
| 220 | |
| 221 | SQLite doesn't have any special optimization for `COUNT(*)` queries. In other |
| 222 | words, SQLite does not track subtree sizes in its B-tree nodes. |
| 223 | |
| 224 | Reviewers sometimes emphasize performance issues by calling the scans *full* |
| 225 | table scans and *full* index scans, where "full" references the fact that the |
| 226 | number of B-tree pages accessed is proportional to the entire data set stored on |
| 227 | disk. |
| 228 | |
| 229 | TODO: Complete this section. Add examples in a way that doesn't make the section |
| 230 | overly long. |
| 231 | |
| 232 | ##### Searches |
| 233 | |
| 234 | Searches access a subset of a (table or index) B-tree nodes. Searches limit the |
| 235 | amount of nodes they need to access based on query restrictions, such as terms |
| 236 | in the `WHERE` clause. Seeing a `SEARCH` in a query plan is not a guarantee of |
| 237 | performance. Searches can vary wildly in the amount of B-tree pages they need to |
| 238 | access. |
| 239 | |
| 240 | One of the fastest possible searches is a *table search* that performs exactly |
| 241 | one B-tree lookup, and produces at most one record. |
| 242 | |
| 243 | The other fastest possible search is a *covering index search* that also |
| 244 | performs one lookup, and produces at most one record. |
| 245 | |
| 246 | TODO: Complete this section. Add examples in a way that doesn't make the section |
| 247 | overly long. |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 248 | |
| 249 | |
| 250 | ## General advice |
| 251 | |
| 252 | The following pieces of advice usually come up in code reviews. |
| 253 | |
Victor Costan | f265dd7f | 2021-05-26 17:47:48 | [diff] [blame] | 254 | |
| 255 | ### Quickly iterating on SQL statements |
| 256 | |
| 257 | [The SQLite shell](https://sqlite.org/cli.html) offers quick feedback for |
| 258 | converging on valid SQL statement syntax, and avoiding SQLite features that are |
| 259 | disabled in Chrome. In addition, the |
| 260 | [`EXPLAIN`](https://www.sqlite.org/lang_explain.html) and |
| 261 | [`EXPLAIN QUERY PLAN`](https://www.sqlite.org/eqp.html) statements show the |
| 262 | results of SQLite's query planner and optimizer, which are very helpful for |
Victor Costan | 54f8955a | 2021-07-09 04:29:08 | [diff] [blame] | 263 | reasoning about the performance of complex queries. The SQLite shell directive |
| 264 | `.eqp on` automatically issues `EXPLAIN QUERY PLAN` for all future commands. |
| 265 | |
Victor Costan | f265dd7f | 2021-05-26 17:47:48 | [diff] [blame] | 266 | |
| 267 | The following commands set up SQLite shells using Chrome's build of SQLite. |
| 268 | |
| 269 | ```sh |
| 270 | autoninja -C out/Default sqlite_shell sqlite_dev_shell |
| 271 | ``` |
| 272 | |
| 273 | * `sqlite_shell` runs the SQLite build that we ship in Chrome. It offers the |
| 274 | ground truth on whether a SQL statement can be used in Chrome code or not. |
| 275 | * `sqlite_dev_shell` enables the `EXPLAIN` and `EXPLAIN QUERY PLAN` statements, |
| 276 | as well as a few features used by [Perfetto](https://perfetto.dev/)'s analysis |
| 277 | tools. |
| 278 | |
| 279 | |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 280 | ### SQL style |
| 281 | |
| 282 | SQLite queries are usually embedded as string literals in C++ code. The |
| 283 | advice here has the following goals. |
| 284 | |
| 285 | 1. Easy to read queries. The best defense against subtle bugs is making the |
| 286 | queries very easy to read, so that any bugs become obvious at code review |
| 287 | time. SQL string literals don't benefit from our code analysis |
| 288 | infrastructure, so the only lines of defense against bugs are testing and |
| 289 | code review. |
| 290 | |
| 291 | 2. Simplify crash debugging. We will always have a low volume of non-actionable |
| 292 | crash reports, because Chrome runs on billions of devices, some of which have |
| 293 | faulty RAM or processors. |
| 294 | |
| 295 | 3. No unnecessary performance overheads. The C++ optimizer doesn't understand |
| 296 | SQL query literals, so the queries end up as written in the Chrome binary. |
| 297 | Extra characters cost binary size, as well as CPU time (which turns into |
| 298 | battery usage) during query parsing. |
| 299 | |
| 300 | 4. Match the embedding language (C++) style guide. This reduces the mental |
| 301 | context switch overhead for folks who write and/or review C++ code that |
| 302 | contains SQL. |
| 303 | |
| 304 | Format statements like so. |
| 305 | |
| 306 | ```cc |
| 307 | static constexpr char kOriginInfoSql[] = |
| 308 | // clang-format off |
| 309 | "CREATE TABLE origin_infos(" |
Victor Costan | 76898750 | 2021-09-17 22:29:34 | [diff] [blame] | 310 | "origin TEXT NOT NULL," |
| 311 | "last_modified INTEGER NOT NULL," |
| 312 | "secure INTEGER NOT NULL)"; |
| 313 | // clang-format on |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 314 | |
| 315 | static constexpr char kInsertSql[] = |
Victor Costan | 76898750 | 2021-09-17 22:29:34 | [diff] [blame] | 316 | // clang-format off |
| 317 | "INSERT INTO infos(origin,last_modified,secure) " |
| 318 | "VALUES(?,?,?)"; |
| 319 | // clang-format on |
Victor Costan | fd24d8a | 2021-04-23 20:23:21 | [diff] [blame] | 320 | |
| 321 | static constexpr char kSelectSql[] = |
Victor Costan | 76898750 | 2021-09-17 22:29:34 | [diff] [blame] | 322 | // clang-format off |
| 323 | "SELECT origin,last_modified,secure FROM origins " |
| 324 | "WHERE last_modified>? " |
| 325 | "ORDER BY last_modified"; |
| 326 | // clang-format on |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 327 | ``` |
| 328 | |
Victor Costan | c44ad3c | 2021-06-02 18:43:03 | [diff] [blame] | 329 | * [SQLite keywords](https://sqlite.org/lang_keywords.html) should use ALL CAPS. |
| 330 | This makes SQL query literals easier to distinguish and search for. |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 331 | |
| 332 | * Identifiers, such as table and row names, should use snake_case. |
| 333 | |
| 334 | * Identifiers, keywords, and parameter placeholders (`?`) should be separated by |
| 335 | exactly one character. Separators may be spaces (` `), commas (`,`), or |
| 336 | parentheses (`(`, `)`). |
| 337 | |
| 338 | * Statement-ending semicolons (`;`) are omitted. |
| 339 | |
| 340 | * SQL statements are stored in variables typed `static constexpr char[]`, or in |
| 341 | string literals passed directly to methods. |
| 342 | |
| 343 | * [`INSERT` statements](https://sqlite.org/lang_insert.html) should list all the |
| 344 | table columns by name, in the same order as the corresponding `CREATE TABLE` |
| 345 | statements. |
| 346 | |
Victor Costan | fd24d8a | 2021-04-23 20:23:21 | [diff] [blame] | 347 | * [`SELECT` statements](https://sqlite.org/lang_select.html) should list the |
| 348 | desired table columns by name, in the same order as the corresponding |
| 349 | `CREATE TABLE` statements. `SELECT *` is strongly discouraged, at least until |
| 350 | we have schema checks on database opens. |
| 351 | |
| 352 | * [`SELECT` statements](https://sqlite.org/lang_select.html) that retrieve more |
| 353 | than one row should include an |
| 354 | [`ORDER BY` clause](https://sqlite.org/lang_select.html#the_order_by_clause) |
| 355 | to clarify the implicit ordering. |
| 356 | * SELECTs whose outer loop is a table search or table scan implicitly order |
| 357 | results by [rowid](https://sqlite.org/lang_createtable.html#rowid) or, in |
| 358 | the case of [`WITHOUT ROWID`](https://sqlite.org/withoutrowid.html) tables, |
| 359 | by the table's primary key. |
| 360 | * SELECTs whose outer loop is an index scan or index search order results |
| 361 | according to that index. |
| 362 | |
| 363 | * [`CREATE INDEX` statements](https://sqlite.org/lang_createindex.html) should |
| 364 | immediately follow the |
| 365 | [`CREATE TABLE` statement](https://sqlite.org/lang_createtable.html) for the |
| 366 | indexed table. |
| 367 | |
| 368 | * Explicit `CREATE UNIQUE INDEX` statements should be preferred to |
| 369 | [`UNIQUE` constraints on `CREATE TABLE`](https://sqlite.org/lang_createtable.html#unique_constraints). |
| 370 | |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 371 | * Values must either be embedded in the SQL statement string literal, or bound |
| 372 | using [parameters](https://www.sqlite.org/lang_expr.html#varparam). |
| 373 | |
| 374 | * Parameter placeholders should always use the `?` syntax. Alternative syntaxes, |
| 375 | such as `?NNN` or `:AAAA`, have few benefits in a codebase where the `Bind` |
| 376 | statements are right next to the queries, and are less known to readers. |
| 377 | |
Victor Costan | 6c962e5 | 2022-02-14 19:10:49 | [diff] [blame] | 378 | * SQL statements should be embedded in C++ as string literals. The `char[]` type |
| 379 | makes it possible for us to compute query length at compile time in the |
| 380 | future. The `static` and `constexpr` qualifiers both ensure optimal code |
| 381 | generation. |
| 382 | |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 383 | * Do not execute multiple SQL statements (e.g., by calling `Step()` or `Run()` |
| 384 | on `sql::Statement`) on the same C++ line. It's difficult to get more than |
| 385 | line numbers from crash reports' stack traces. |
| 386 | |
| 387 | |
| 388 | ### Schema style |
| 389 | |
Victor Costan | c44ad3c | 2021-06-02 18:43:03 | [diff] [blame] | 390 | Identifiers (table / index / column names and aliases) must not be |
| 391 | [current SQLite keywords](https://sqlite.org/lang_keywords.html). Identifiers |
| 392 | may not start with the `sqlite_` prefix, to avoid conflicting with the name of a |
| 393 | [SQLite internal schema object](https://www.sqlite.org/fileformat2.html#storage_of_the_sql_database_schema). |
| 394 | |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 395 | Column types should only be one of the the SQLite storage types (`INTEGER`, |
| 396 | `REAL`, `TEXT`, `BLOB`), so readers can avoid reasoning about SQLite's type |
| 397 | affinity. |
| 398 | |
| 399 | Columns that will store boolean values should have the `INTEGER` type. |
| 400 | |
| 401 | Columns that will store `base::Time` values should have the `INTEGER` type. |
| 402 | Values should be serialized using `sql::Statement::BindTime()` and deserialized |
| 403 | using `sql::Statement::ColumnTime()`. |
| 404 | |
| 405 | Column types should not include information ignored by SQLite, such as numeric |
| 406 | precision or scale specifiers, or string length specifiers. |
| 407 | |
Victor Costan | 0166412 | 2021-05-26 19:11:38 | [diff] [blame] | 408 | Columns should have |
| 409 | [`NOT NULL` constraints](https://sqlite.org/lang_createtable.html#not_null_constraints) |
| 410 | whenever possible. This saves maintainers from having to reason about the less |
| 411 | intuitive cases of [`NULL` handling](https://sqlite.org/nulls.html). |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 412 | |
Victor Costan | 0166412 | 2021-05-26 19:11:38 | [diff] [blame] | 413 | `NOT NULL` constraints must be explicitly stated in column definitions that |
| 414 | include `PRIMARY KEY` specifiers. For historical reasons, SQLite |
| 415 | [allows NULL primary keys](https://sqlite.org/lang_createtable.html#the_primary_key) |
| 416 | in most cases. When a table's primary key is composed of multiple columns, |
| 417 | each column's definition should have a `NOT NULL` constraint. |
| 418 | |
| 419 | Columns should avoid `DEFAULT` values. Columns that have `NOT NULL` constraints |
| 420 | and lack a `DEFAULT` value are easier to review and maintain, as SQLite takes |
| 421 | over the burden of checking that `INSERT` statements aren't missing these |
| 422 | columns. |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 423 | |
| 424 | Surrogate primary keys should use the column type `INTEGER PRIMARY KEY`, to take |
| 425 | advantage of SQLite's rowid optimizations. |
| 426 | [`AUTOINCREMENT`](https://www.sqlite.org/autoinc.html) should only be used where |
| 427 | primary key reuse would be unacceptable. |
| 428 | |
| 429 | |
Evan Stade | ff25d2d | 2025-06-24 19:03:44 | [diff] [blame] | 430 | ### Sophisticated features |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 431 | |
Evan Stade | ff25d2d | 2025-06-24 19:03:44 | [diff] [blame] | 432 | SQLite exposes a vast array of functionality via SQL statements, many of which |
| 433 | are not a good match for average Chrome feature code. Use them with care and if |
| 434 | you know what you're doing. |
Victor Costan | 9e480bc1 | 2021-05-21 17:50:46 | [diff] [blame] | 435 | |
Victor Costan | bad4d6e2 | 2021-07-13 01:09:45 | [diff] [blame] | 436 | #### PRAGMA statements {#no-pragmas} |
Victor Costan | fd24d8a | 2021-04-23 20:23:21 | [diff] [blame] | 437 | |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 438 | [`PRAGMA` statements](https://www.sqlite.org/pragma.html) should never be used |
| 439 | directly. Chrome's SQLite abstraction layer should be modified to support the |
| 440 | desired effects instead. |
| 441 | |
| 442 | Direct `PRAGMA` use limits our ability to customize and secure our SQLite build. |
| 443 | `PRAGMA` statements may turn on code paths with less testing / fuzzing coverage. |
| 444 | Furthermore, some `PRAGMA` statements invalidate previously compiled queries, |
| 445 | reducing the efficiency of Chrome's compiled query cache. |
| 446 | |
Victor Costan | bad4d6e2 | 2021-07-13 01:09:45 | [diff] [blame] | 447 | #### Foreign key constraints {#no-foreign-keys} |
Victor Costan | fd24d8a | 2021-04-23 20:23:21 | [diff] [blame] | 448 | |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 449 | [SQL foreign key constraints](https://sqlite.org/foreignkeys.html) should not be |
| 450 | used. All data validation should be performed using explicit `SELECT` statements |
| 451 | (generally wrapped as helper methods) inside transactions. Cascading deletions |
| 452 | should be performed using explicit `DELETE` statements inside transactions. |
| 453 | |
| 454 | Chrome features cannot rely on foreign key enforcement, due to the |
| 455 | possibility of data corruption. Furthermore, foreign key constraints make it |
| 456 | more difficult to reason about system behavior (Chrome feature code + SQLite) |
| 457 | when the database gets corrupted. Foreign key constraints also make it more |
| 458 | difficult to reason about query performance. |
| 459 | |
Will Harris | 563a50c4 | 2023-03-29 21:40:35 | [diff] [blame] | 460 | As a result, foreign key constraints are not enforced on SQLite databases |
| 461 | opened with Chrome's `sql::Database` infrastructure. |
Victor Costan | 54d7138 | 2022-03-21 21:47:18 | [diff] [blame] | 462 | |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 463 | After |
| 464 | [WebSQL](https://www.w3.org/TR/webdatabase/) is removed from Chrome, we plan |
| 465 | to disable SQLite's foreign key support using |
| 466 | [SQLITE_OMIT_FOREIGN_KEY](https://sqlite.org/compile.html#omit_foreign_key). |
| 467 | |
Victor Costan | bad4d6e2 | 2021-07-13 01:09:45 | [diff] [blame] | 468 | #### CHECK constraints {#no-checks} |
Victor Costan | fd24d8a | 2021-04-23 20:23:21 | [diff] [blame] | 469 | |
| 470 | [SQL CHECK constraints](https://sqlite.org/lang_createtable.html#check_constraints) |
| 471 | should not be used, for the same reasons as foreign key constraints. The |
| 472 | equivalent checks should be performed in C++, typically using `DCHECK`. |
| 473 | |
Andrew Paseltiner | b181d0d | 2025-04-04 14:22:58 | [diff] [blame] | 474 | After |
| 475 | [WebSQL](https://www.w3.org/TR/webdatabase/) is removed from Chrome, we plan |
| 476 | to disable SQLite's CHECK constraint support using |
Victor Costan | fd24d8a | 2021-04-23 20:23:21 | [diff] [blame] | 477 | [SQLITE_OMIT_CHECK](https://sqlite.org/compile.html#omit_check). |
| 478 | |
Evan Stade | ff25d2d | 2025-06-24 19:03:44 | [diff] [blame] | 479 | #### Triggers {#triggers} |
Victor Costan | fd24d8a | 2021-04-23 20:23:21 | [diff] [blame] | 480 | |
Evan Stade | ff25d2d | 2025-06-24 19:03:44 | [diff] [blame] | 481 | [SQL triggers](https://sqlite.org/lang_createtrigger.html) should be used |
| 482 | thoughtfully. |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 483 | |
Evan Stade | ff25d2d | 2025-06-24 19:03:44 | [diff] [blame] | 484 | Use triggers only if they *increase* legibility and *reduce* query complexity. |
| 485 | The vast majority of Chrome features should not need triggers as their SQL |
| 486 | queries are typically very simple, and triggers can obfuscate behavior by |
| 487 | introducing an easy-to-miss side effect to certain queries. It's generally |
| 488 | preferred to "manually" execute a second query with the desired effect. |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 489 | |
Evan Stade | ff25d2d | 2025-06-24 19:03:44 | [diff] [blame] | 490 | There are exceptions to this rule, such as when the follow-up query would itself |
| 491 | be much more complex than the trigger, and/or a single trigger can take the |
| 492 | place of multiple queries. |
Victor Costan | 7c23482 | 2021-07-13 03:03:02 | [diff] [blame] | 493 | |
Evan Stade | ff25d2d | 2025-06-24 19:03:44 | [diff] [blame] | 494 | Triggers are disabled by default, with hopes that this will encourage feature |
| 495 | authors to carefully weigh their use against the alternatives, but can be |
| 496 | enabled via `DatabaseOptions::set_triggers_enabled()`. |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 497 | |
Victor Costan | bad4d6e2 | 2021-07-13 01:09:45 | [diff] [blame] | 498 | #### Common Table Expressions {#no-ctes} |
Victor Costan | fd24d8a | 2021-04-23 20:23:21 | [diff] [blame] | 499 | |
| 500 | [SQL Common Table Expressions (CTEs)](https://sqlite.org/lang_with.html) should |
| 501 | not be used. Chrome's SQL schemas and queries should be simple enough that |
| 502 | the factoring afforded by |
| 503 | [ordinary CTEs](https://sqlite.org/lang_with.html#ordinary_common_table_expressions) |
| 504 | is not necessary. |
| 505 | [Recursive CTEs](https://sqlite.org/lang_with.html#recursive_common_table_expressions) |
| 506 | should be implemented in C++. |
| 507 | |
| 508 | Common Table Expressions do not open up any query optimizations that would not |
| 509 | be available otherwise, and make it more difficult to review / analyze queries. |
| 510 | |
Victor Costan | bad4d6e2 | 2021-07-13 01:09:45 | [diff] [blame] | 511 | #### Views {#no-views} |
Victor Costan | fd24d8a | 2021-04-23 20:23:21 | [diff] [blame] | 512 | |
| 513 | SQL views, managed by the |
| 514 | [`CREATE VIEW` statement](https://www.sqlite.org/lang_createview.html) and the |
| 515 | [`DROP VIEW` statement](https://www.sqlite.org/lang_dropview.html), should not |
| 516 | be used. Chrome's SQL schemas and queries should be simple enough that the |
| 517 | factoring afforded by views is not necessary. |
| 518 | |
| 519 | Views are syntactic sugar, and do not open up any new SQL capabilities. SQL |
| 520 | statements on views are more difficult to understand and maintain, because of |
| 521 | the extra layer of indirection. |
| 522 | |
Victor Costan | fe078f9 | 2021-07-19 20:02:59 | [diff] [blame] | 523 | Access to views is disabled by default for SQLite databases opened with Chrome's |
| 524 | `sql::Database` infrastructure. This is intended to steer feature developers |
| 525 | away from the discouraged feature. |
| 526 | |
Victor Costan | fd24d8a | 2021-04-23 20:23:21 | [diff] [blame] | 527 | After |
| 528 | [WebSQL](https://www.w3.org/TR/webdatabase/) is removed from Chrome, we plan |
| 529 | to disable SQLite's VIEW support using |
| 530 | [SQLITE_OMIT_VIEW](https://www.sqlite.org/compile.html#omit_view). |
| 531 | |
Victor Costan | 5c2e8551 | 2021-07-17 00:11:53 | [diff] [blame] | 532 | #### Double-quoted string literals {#no-double-quoted-strings} |
| 533 | |
| 534 | String literals should always be single-quoted. That being said, string literals |
| 535 | should be rare in Chrome code, because any user input must be injected using |
| 536 | statement parameters and the `Statement::Bind*()` methods. |
| 537 | |
| 538 | Double-quoted string literals are non-standard SQL syntax. The SQLite authors |
| 539 | [currently consider this be a misfeature](https://www.sqlite.org/quirks.html#double_quoted_string_literals_are_accepted). |
| 540 | |
Victor Costan | 4e442d0 | 2021-07-20 17:43:13 | [diff] [blame] | 541 | SQLite support for double-quoted string literals is disabled for databases |
Andrew Paseltiner | 986a13ba | 2025-04-02 18:46:29 | [diff] [blame] | 542 | opened with Chrome's `sql::Database` infrastructure. |
Victor Costan | 5c2e8551 | 2021-07-17 00:11:53 | [diff] [blame] | 543 | |
Victor Costan | bad4d6e2 | 2021-07-13 01:09:45 | [diff] [blame] | 544 | #### Compound SELECT statements {#no-compound-queries} |
Victor Costan | fd24d8a | 2021-04-23 20:23:21 | [diff] [blame] | 545 | |
| 546 | [Compound SELECT statements](https://www.sqlite.org/lang_select.html#compound_select_statements) |
| 547 | should not be used. Such statements should be broken down into |
| 548 | [simple SELECT statements](https://www.sqlite.org/lang_select.html#simple_select_processing), |
| 549 | and the operators `UNION`, `UNION ALL`, `INTERSECT` and `EXCEPT` should be |
| 550 | implemented in C++. |
| 551 | |
| 552 | A single compound SELECT statement is more difficult to review and properly |
| 553 | unit-test than the equivalent collection of simple SELECT statements. |
Victor Costan | 9e480bc1 | 2021-05-21 17:50:46 | [diff] [blame] | 554 | Furthermore, the compound SELECT statement operators can be implemented more |
| 555 | efficiently in C++ than in SQLite's bytecode interpreter (VDBE). |
Victor Costan | fd24d8a | 2021-04-23 20:23:21 | [diff] [blame] | 556 | |
| 557 | After |
| 558 | [WebSQL](https://www.w3.org/TR/webdatabase/) is removed from Chrome, we plan |
| 559 | to disable SQLite's compound SELECT support using |
| 560 | [SQLITE_OMIT_COMPOUND_SELECT](https://www.sqlite.org/compile.html#omit_compound_select). |
| 561 | |
Victor Costan | bad4d6e2 | 2021-07-13 01:09:45 | [diff] [blame] | 562 | #### Built-in functions {#no-builtin-functions} |
Victor Costan | 9e480bc1 | 2021-05-21 17:50:46 | [diff] [blame] | 563 | |
| 564 | SQLite's [built-in functions](https://sqlite.org/lang_corefunc.html) should be |
| 565 | only be used in SQL statements where they unlock significant performance |
| 566 | improvements. Chrome features should store data in a format that leaves the most |
| 567 | room for query optimizations, and perform any necessary transformations after |
| 568 | reading / before writing the data. |
| 569 | |
| 570 | * [Aggregation functions](https://sqlite.org/lang_aggfunc.html) are best |
| 571 | replaced with C++ code that iterates over rows and computes the desired |
| 572 | results. |
| 573 | * [Date and time functions](https://sqlite.org/lang_datefunc.html) are best |
| 574 | replaced by `base::Time` functionality. |
| 575 | * String-processing functions, such as |
| 576 | [`printf()`](https://sqlite.org/printf.html) and `trim()` are best replaced |
| 577 | by C++ code that uses the helpers in `//base/strings/`. |
| 578 | * Wrappers for [SQLite's C API](https://sqlite.org/c3ref/funclist.html), such as |
| 579 | `changes()`, `last_insert_rowid()`, and `total_changes()`, are best replaced |
| 580 | by functionality in `sql::Database` and `sql::Statement`. |
| 581 | * SQLite-specific functions, such as `sqlite_source_id()` and |
| 582 | `sqlite_version()` should not be necessary in Chrome code, and may suggest a |
| 583 | problem in the feature's design. |
| 584 | |
| 585 | [Math functions](https://sqlite.org/lang_mathfunc.html) and |
| 586 | [Window functions](https://sqlite.org/windowfunctions.html#biwinfunc) are |
| 587 | disabled in Chrome's SQLite build. |
| 588 | |
Evan Stade | 1d80896e | 2024-02-28 18:40:26 | [diff] [blame] | 589 | #### ATTACH DATABASE statements |
Victor Costan | fd24d8a | 2021-04-23 20:23:21 | [diff] [blame] | 590 | |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 591 | [`ATTACH DATABASE` statements](https://www.sqlite.org/lang_attach.html) should |
Evan Stade | 1d80896e | 2024-02-28 18:40:26 | [diff] [blame] | 592 | be used thoughtfully. Each Chrome feature should store its data in a single database. |
Victor Costan | 841e5b7 | 2021-04-20 06:14:32 | [diff] [blame] | 593 | Chrome code should not assume that transactions across multiple databases are |
| 594 | atomic. |
| 595 | |
Victor Costan | 31dfb7e7 | 2022-02-28 21:14:24 | [diff] [blame] | 596 | ### Disabled features |
| 597 | |
| 598 | We aim to disable SQLite features that should not be used in Chrome, subject to |
| 599 | the constraint of keeping WebSQL's feature set stable. We currently disable all |
| 600 | new SQLite features, to avoid expanding the attack surface exposed to WebSQL. |
| 601 | This stance may change once WebSQL is removed from Chrome. |
| 602 | |
| 603 | The following SQLite features have been disabled in Chrome. |
| 604 | |
| 605 | #### JSON |
| 606 | |
| 607 | Chrome features should prefer |
| 608 | [procotol buffers](https://developers.google.com/protocol-buffers) to JSON for |
| 609 | on-disk (persistent) serialization of extensible structured data. |
| 610 | |
| 611 | Chrome features should store the values used by indexes directly in their own |
| 612 | columns, instead of relying on |
| 613 | [SQLite's JSON support](https://www.sqlite.org/json1.html). |
| 614 | |
| 615 | #### UPSERT |
| 616 | |
| 617 | [SQLite's UPSERT implementation](https://www.sqlite.org/lang_UPSERT.html) has |
| 618 | been disabled in order to avoid increasing WebSQL's attack surface. UPSERT is |
| 619 | disabled using the `SQLITE_OMIT_UPSERT` macro, which is not currently included |
| 620 | in [the SQLite compile-time option list](https://www.sqlite.org/compile.html), |
| 621 | but exists in the source code. |
| 622 | |
| 623 | We currently think that the new UPSERT functionality is not essential to |
| 624 | implementing Chrome features efficiently. An example where UPSERT is necessary |
| 625 | for the success of a Chrome feature would likely get UPSERT enabled. |
| 626 | |
| 627 | #### Window functions |
| 628 | |
| 629 | [Window functions](https://sqlite.org/windowfunctions.html#biwinfunc) have been |
| 630 | disabled primarily because they cause a significant binary size increase, which |
| 631 | leads to a corresponding large increase in the attack surface exposed to WebSQL. |
| 632 | |
| 633 | Window functions increase the difficulty of reviewing and maintaining the Chrome |
| 634 | features that use them, because window functions add complexity to the mental |
| 635 | model of query performance. |
| 636 | |
| 637 | We currently think that this maintenance overhead of window functions exceeds |
| 638 | any convenience and performance benefits (compared to simpler queries |
| 639 | coordinated in C++). |
Andrew Paseltiner | ab20e8b9 | 2024-05-15 19:29:45 | [diff] [blame] | 640 | |
| 641 | #### Virtual tables {#no-virtual-tables} |
| 642 | |
| 643 | [`CREATE VIRTUAL TABLE` statements](https://www.sqlite.org/vtab.html) are |
| 644 | disabled. The desired functionality should be implemented in C++, and access |
| 645 | storage using standard SQL statements. |
| 646 | |
| 647 | Virtual tables are [SQLite's module system](https://www.sqlite.org/vtab.html). |
| 648 | SQL statements on virtual tables are essentially running arbitrary code, which |
| 649 | makes them very difficult to reason about and maintain. Furthermore, the virtual |
| 650 | table implementations don't receive the same level of fuzzing coverage as the |
| 651 | SQLite core. |
| 652 | |
| 653 | Chrome's SQLite build has virtual table functionality reduced to the minimum |
| 654 | needed to support an internal feature. |
| 655 | [SQLite's run-time loading mechanism](https://www.sqlite.org/loadext.html) is |
| 656 | disabled, and most |
| 657 | [built-in virtual tables](https://www.sqlite.org/vtablist.html) are disabled as |
| 658 | well. |
| 659 | |
| 660 | Ideally we would disable SQLite's virtual table support using |
| 661 | [SQLITE_OMIT_VIRTUALTABLE](https://sqlite.org/compile.html#omit_virtualtable) |
| 662 | now that [WebSQL](https://www.w3.org/TR/webdatabase/) has been removed from |
| 663 | Chrome, but virtual table support is required to use SQLite's |
| 664 | [built-in corruption recovery module](https://www.sqlite.org/recovery.html). The |
| 665 | [SQLITE_DBPAGE virtual table](https://www.sqlite.org/dbpage.html) is also |
| 666 | enabled only for corruption recovery and should not be used in Chrome. |