jq is unbeatable for JSON processing, but its syntax takes real effort to learn. jonq wraps jq in a SQL-ish, Pythonic layer you can read and remember.
Who It's For: jonq is designed for anyone who needs to work with JSON data from the command line: quick JSON exploration, lightweight ETL tasks, or validating config files in CI pipelines.

jonq is NOT a database, and it is not competing with DuckDB or Pandas. jonq is a command-line tool that makes jq accessible by wrapping it in SQL-like syntax.
Good fit:

- Quick JSON exploration
- jq beginners
- Ad-hoc JSON tasks: no setup, just run a command
- Config file validation: quick checks in scripts

Use something else:

- Data analysis: use Pandas, Polars, or DuckDB
- Complex joins: use DuckDB, PostgreSQL, or whatever DB you want
- Business intelligence: use proper BI tools
Category | What you can do | Example |
---|---|---|
Selection | Pick fields | `select name, age` |
Wildcard | All fields | `select *` |
Filtering | Python-style ops `and` / `or` / `between` / `contains` | `if age > 30 and city = 'NY'` |
Aggregations | `sum` `avg` `min` `max` `count` | `select avg(price) as avg_price` |
Grouping | `group by` + `having` | `… group by city having count > 2` |
Ordering | `sort <field> [asc\|desc] <limit>` | `sort age desc 5` |
Nested arrays | `from [].orders` or inline paths | `select products[].name …` |
Inline maths | Real expressions | `sum(items.price) * 2 as double_total` |
CSV / stream | `--format csv`, `--stream` | `jonq data.json "select name" --format csv` |
Task | Raw jq filter | jonq one-liner |
---|---|---|
Select specific fields | `jq '.[] \| {name:.name,age:.age}'` | `jonq data.json "select name, age"` |
Filter rows | `jq '.[] \| select(.age > 30) \| {name,age}'` | `… "select name, age if age > 30"` |
Sort + limit | `jq 'sort_by(.age) \| reverse \| .[:5]'` | `… "select name, age sort age desc 5"` |
Deep filter | `jq '.[] \| select(.profile.address.city=="NY") \| {name,city:.profile.address.city}'` | `… "select name, profile.address.city if profile.address.city = 'NY'"` |
Count items | `jq 'map(select(.age>25)) \| length'` | `… "select count(age) as count if age > 25"` |
Group & count | `jq 'group_by(.city) \| map({city:.[0].city,count:length})'` | `… "select city, count(city) as count group by city"` |
Complex boolean | `jq '.[] \| select(.age>25 and (.city=="NY" or .city=="Chicago"))'` | `… "select * if age > 25 and (city = 'NY' or city = 'Chicago')"` |
Group & HAVING | `jq 'group_by(.city) \| map(select(length>2)) \| map({city:.[0].city,count:length})'` | `… "select city, count(city) as count group by city having count > 2"` |
Aggregation expression | - | `… "select sum(price) * 1.07 as total_gst"` |
Nested-array aggregation | - | `… "select avg(products[].versions[].pricing.monthly) as avg_price"` |
Take-away: a single jonq string replaces many pipes and brackets while still producing pure jq under the hood.
Aspect | jonq | DuckDB | Pandas |
---|---|---|---|
Primary use case | Fast, lightweight JSON querying directly from the command line | Analytical SQL queries on large datasets, including JSON | General-purpose data manipulation and analysis in Python |
Setup | No DB, streams any JSON | Requires a DB file / extension | Requires a Python environment with pandas and its dependencies installed |
Query language | Familiar SQL-ish, no funky `json_extract` | SQL + JSON functions | Python code for data manipulation and analysis |
Footprint | Minimal: requires only jq (a ~500 KB binary); no environment setup | ~140 MB binary | ~20 MB for pandas plus its dependencies |
Streaming | `--stream` processes line-delimited JSON lazily | Must load into a table | Can process large files with chunking, but not as memory-efficient as streaming |
Memory usage | Low; streams data to avoid loading full JSON into memory | In-memory database, but optimized for large data with columnar storage | Loads data into memory; can strain RAM with large datasets |
jq ecosystem | Leverages all jq filters for post-processing | No | No (integrates instead with NumPy, Matplotlib, scikit-learn, etc.) |
- Instant JSON Querying, No Setup Hassle

  Scenario: you have a JSON file (`data.json`) and need every record where `age > 30`, in seconds.

  - jonq: run `jonq data.json "select * if age > 30"`. Done. No environment setup, no imports: just install jq and go.
  - Pandas: fire up Python, write a script (`import pandas as pd; df = pd.read_json('data.json'); df[df['age'] > 30]`), and run it. More steps.
  - DuckDB: start the CLI and query the file (`SELECT * FROM read_json('data.json') WHERE age > 30`). Powerful, but overkill for a quick task.
- Command-Line Power

  Use case: chain commands in a pipeline, like `jonq data.json "select name, age" | grep "John"`. jonq thrives in shell scripts and CI/CD workflows; Pandas and DuckDB require scripting or a heavier integration layer.
- Lightweight and Efficient

  jonq uses jq's streaming mode (`--stream`) for large JSON files, processing data piece by piece instead of loading it all into memory.

  Comparison: Pandas loads everything into a DataFrame (RAM-intensive), and while DuckDB is memory-efficient for analytics, it is still a full database engine with corresponding overhead.
- SQL Simplicity for JSON

  Example: `jonq users.json "select name, email if status = 'active' sort name"`.

  Advantage: if you know SQL, jonq feels natural for JSON; there is no need to climb jq's learning curve first.
- Speed for Ad-Hoc Tasks

  Test case: querying a 1 GB JSON file for specific fields.

  - jonq: streams it in seconds with minimal memory use.
  - Pandas: might choke or require chunking hacks.
  - DuckDB: fast, but setup and SQL complexity add time.
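The streaming point above can be illustrated in plain Python. This is a sketch of how line-delimited (NDJSON) streaming keeps memory flat in general, not jonq's internals: each record is parsed, tested, and discarded before the next line is read, so peak memory is one record rather than the whole file.

```python
import io
import json

# Stand-in for a large line-delimited (NDJSON) file: one JSON object per line.
ndjson = io.StringIO(
    '{"name": "Alice", "age": 30}\n'
    '{"name": "Bob", "age": 25}\n'
    '{"name": "Carol", "age": 41}\n'
)

# Stream: parse and filter one line at a time; only one record is ever
# held in memory, regardless of how large the file is.
matches = []
for line in ndjson:
    record = json.loads(line)
    if record["age"] > 28:
        matches.append(record["name"])

print(matches)  # ['Alice', 'Carol']
```

Loading the same file with `json.load` would materialize every record at once, which is exactly the behavior streaming mode avoids.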
Supported Platforms: jonq works on Linux, macOS, and Windows (via WSL).

Requirements:

- Python 3.9+
- The jq command-line tool (https://stedolan.github.io/jq/download/)
```bash
# From PyPI
pip install jonq   # latest stable

# From source
git clone https://github.com/duriantaco/jonq.git
cd jonq && pip install -e .
```

Verify the installation: run `jonq --version` to ensure it's working correctly.
For users dealing with large or deeply nested JSON structures, we recommend installing the optional jonq_fast Rust extension:

```bash
pip install jonq-fast
```

We explain more about this below.
```bash
echo '[{"name":"Alice","age":30},{"name":"Bob","age":25}]' > data.json
jonq data.json "select name, age if age > 25"
# Output: [{"name":"Alice","age":30}]
```
The query syntax follows a simplified format:

```
select <fields> [if <condition>] [group by <fields>] [sort <field> [asc|desc] [limit]]
```

where:

- `<fields>` - comma-separated list of fields to select, or aggregations
- `if <condition>` - optional filtering condition
- `group by <fields>` - optional grouping by one or more fields
- `sort <field>` - optional field to sort by
- `asc|desc` - optional sort direction (default: `asc`)
- `limit` - optional integer to limit the number of results
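In plain-Python terms (an illustrative sketch, not how jonq actually executes queries), a query like `select name, age if age > 25 sort age desc 2` corresponds to four list operations applied in order:

```python
import json

records = json.loads(
    '[{"name":"Alice","age":30},{"name":"Bob","age":25},{"name":"Carol","age":41}]'
)

# if age > 25       -> filter rows
filtered = [r for r in records if r["age"] > 25]
# sort age desc     -> order by the field, descending
ordered = sorted(filtered, key=lambda r: r["age"], reverse=True)
# limit 2           -> keep the first two results
limited = ordered[:2]
# select name, age  -> project the requested fields
result = [{"name": r["name"], "age": r["age"]} for r in limited]

print(result)  # [{'name': 'Carol', 'age': 41}, {'name': 'Alice', 'age': 30}]
```

jonq compiles the same steps into a jq filter rather than running Python, but the semantics are the same.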
You can also refer to the json_test_files directory for the test JSONs, and see the USAGE.md guide for more detail. Let's start with simple.json.

Imagine a JSON file like the following:
```json
[
  { "id": 1, "name": "Alice", "age": 30, "city": "New York" },
  { "id": 2, "name": "Bob", "age": 25, "city": "Los Angeles" },
  { "id": 3, "name": "Charlie", "age": 35, "city": "Chicago" }
]
```
```bash
jonq path/to/simple.json "select *"
jonq path/to/simple.json "select name, age"
jonq path/to/simple.json "select name, age if age > 30"
jonq path/to/simple.json "select name, age sort age desc 2"
jonq path/to/simple.json "select sum(age) as total_age"
jonq path/to/simple.json "select avg(age) as average_age"
jonq path/to/simple.json "select count(age) as count"
```
Simple enough, I hope? Now let's move on to nested JSON. Imagine a nested JSON file like the one below:
```json
[
  {
    "id": 1,
    "name": "Alice",
    "profile": {
      "age": 30,
      "address": { "city": "New York", "zip": "10001" }
    },
    "orders": [
      { "order_id": 101, "item": "Laptop", "price": 1200 },
      { "order_id": 102, "item": "Phone", "price": 800 }
    ]
  },
  {
    "id": 2,
    "name": "Bob",
    "profile": {
      "age": 25,
      "address": { "city": "Los Angeles", "zip": "90001" }
    },
    "orders": [
      { "order_id": 103, "item": "Tablet", "price": 500 }
    ]
  }
]
```
```bash
# nested field access
jonq nested.json "select name, profile.age"
jonq nested.json "select name, profile.address.city"

# count array elements
jonq nested.json "select name, count(orders) as order_count"

# boolean logic (AND / OR / parentheses)
jonq nested.json "select name if profile.address.city = 'New York' or orders[0].price > 1000"
jonq nested.json "select name, profile.age if profile.address.city = 'New York' or orders[0].price > 1000"

# find users who are both under 30 and from Los Angeles
jonq nested.json "select name, profile.age if profile.age < 30 and profile.address.city = 'Los Angeles'"

# using parentheses for complex logic
jonq nested.json "select name, profile.age if (profile.age > 25 and profile.address.city = 'New York') or (profile.age < 26 and profile.address.city = 'Los Angeles')"
```
jonq can output results in CSV format using the `--format csv` or `-f csv` option:

```bash
jonq path/to/simple.json "select name, age" --format csv > output.csv
```
Using `flatten_json` in your code:

```python
from jonq.csv_utils import flatten_json

data = {
    "user": {
        "name": "Alice",
        "address": {"city": "New York"},
        "orders": [
            {"id": 1, "item": "Laptop", "price": 1200},
            {"id": 2, "item": "Phone", "price": 800}
        ]
    }
}

flattened = flatten_json(data, sep=".")
print(flattened)
```
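For reference, dot-separator flattening can be sketched in a few lines of plain Python. This is an illustrative reimplementation, not jonq's actual code; as with `jonq_fast.flatten`, list indices become path segments like `orders.0.id`:

```python
def flatten(obj, sep=".", prefix=""):
    """Recursively flatten nested dicts/lists into a single-level dict."""
    out = {}
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        # List indices become path segments: orders -> orders.0, orders.1, ...
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        return {prefix: obj}
    for key, value in items:
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, (dict, list)):
            out.update(flatten(value, sep, path))
        else:
            out[path] = value
    return out

flat = flatten({"user": {"name": "Alice", "orders": [{"id": 1}, {"id": 2}]}})
print(flat)  # {'user.name': 'Alice', 'user.orders.0.id': 1, 'user.orders.1.id': 2}
```

Flattening is what makes arbitrary nested JSON expressible as flat CSV rows.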
For large or deeply nested JSON structures, the optional jonq_fast Rust extension gives significantly better performance. Once installed, pass the `--fast` or `-F` flag on the command line:

```bash
jonq data.json "select name, age" --format csv --fast > output.csv
```

This flag speeds up CSV conversion by using a faster JSON-flattening implementation. The benefit is most noticeable with large or deeply nested JSON structures.
You can also use jonq_fast directly in your Python code:

```python
import jonq_fast
import csv

# fake data
data = {
    "user": {
        "name": "Alice",
        "address": {"city": "New York"},
        "orders": [
            {"id": 1, "item": "Laptop", "price": 1200},
            {"id": 2, "item": "Phone", "price": 800}
        ]
    }
}

flattened = jonq_fast.flatten(data, ".")
print(flattened)
# Output: {
#   "user.name": "Alice",
#   "user.address.city": "New York",
#   "user.orders.0.id": 1,
#   "user.orders.0.item": "Laptop",
#   "user.orders.0.price": 1200,
#   "user.orders.1.id": 2,
#   "user.orders.1.item": "Phone",
#   "user.orders.1.price": 800
# }

# write the flattened record out as a one-row CSV
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(flattened.keys())
    writer.writerow(flattened.values())
```
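The snippet above writes a single flattened record. For a list of records, whose flattened key sets may differ, `csv.DictWriter` is the usual approach; here is a sketch assuming each record has already been flattened:

```python
import csv
import io

# Assume these dicts are flattened output for two records
# (the second has an extra "age" key).
rows = [
    {"name": "Alice", "address.city": "New York"},
    {"name": "Bob", "address.city": "Los Angeles", "age": 25},
]

# Collect the union of keys so every column appears in the header.
fieldnames = []
for row in rows:
    for key in row:
        if key not in fieldnames:
            fieldnames.append(key)

buf = io.StringIO()  # use open('output.csv', 'w', newline='') for a real file
writer = csv.DictWriter(buf, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

`restval=""` fills in an empty cell when a record lacks one of the columns, which is common once nested structures of different shapes are flattened.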
For processing large JSON files efficiently, jonq supports streaming mode with the `--stream` or `-s` option:

```bash
jonq path/to/large.json "select name, age" --stream
```

New: streaming now uses async processing to handle chunks concurrently, improving performance on large files. No changes to commands: same flags, same syntax, just faster.
jq not found:

- Make sure jq is installed on your system
- Verify jq is in your PATH by running `jq --version`
- Install jq: https://stedolan.github.io/jq/download/

Invalid JSON:

- Check your JSON file for syntax errors
- Verify the file exists and is readable
- Use a JSON validator to check your file structure

Query errors:

- Verify your query follows the correct syntax format
- Ensure field names match exactly what's in your JSON
- Check for missing quotes around string values in conditions

Empty results:

- Verify your condition isn't filtering out all records
- Check if your field names match the casing in the JSON
- For nested fields, ensure the dot-notation path is correct

Limitations:

- Performance: for very large JSON files (100 MB+), processing may be slow.
- Advanced jq features: some advanced jq features aren't exposed in the jonq syntax.
- Multiple file joins: no support for joining data from multiple JSON files.
- Custom functions: user-defined functions aren't supported in the current version.
- Date/time operations: limited support for date/time parsing or manipulation.
Pandas: go here for complex analysis (e.g., merging datasets, statistical ops, plotting). jonq won't crunch numbers or integrate with machine-learning libraries.

DuckDB: pick this for big-data analytics with joins, aggregates, or window functions across multiple files. jonq is simpler; it is not a database.
Docs here: https://jonq.readthedocs.io/en/latest/
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- jq: This tool depends on the jq command-line JSON processor, which is licensed under the MIT License. jq is copyright (C) 2012 Stephen Dolan.
The jq tool itself is not included in this package - users need to install it separately.