cat on steroids — Parquet, CSV, JSONL, Avro, Excel, SQL queries, and remote sources
Every GNU cat flag works. -n, -b, -s, -A, -v — all of them. If you know cat, you know mcat.
Parquet, CSV, TSV, JSONL, JSON, Avro, Excel — auto-detected by extension or magic bytes.
Filter with --query using SQL WHERE clauses. Powered by DuckDB with predicate pushdown on Parquet.
Column profiling from Parquet metadata — min, max, nulls, uniques with zero full-file I/O.
Stream from S3, GCS, Azure, HTTP, MinIO, R2. Zero-config auth — uses your existing credentials.
Transparent decompression for gzip, zstd, bz2, lz4, xz. Works on local and remote files.
pip install mcat
# Install uv first (if you don't have it)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install mcat
uv tool install mcat
brew tap christyjacob4/tap
brew install mcat
# Add the PPA
sudo add-apt-repository ppa:christyjacob4/mcat
sudo apt update
# Install mcat
sudo apt install mcat
$ mcat sales_data.parquet
┌─────────────┬──────────┬─────────┬───────────┐
│ name        │ region   │ sales   │ quarter   │
├─────────────┼──────────┼─────────┼───────────┤
│ Alice Chen  │ APAC     │ 94,230  │ Q1 2024   │
│ Bob Muller  │ EMEA     │ 71,450  │ Q1 2024   │
│ Carol Smith │ Americas │ 88,920  │ Q1 2024   │
│ David Park  │ APAC     │ 102,100 │ Q2 2024   │
│ Eve Santos  │ Americas │ 67,800  │ Q2 2024   │
└─────────────┴──────────┴─────────┴───────────┘
$ mcat sales.parquet --query "sales > 80000 AND region = 'APAC'"
┌─────────────┬──────────┬─────────┬───────────┐
│ name        │ region   │ sales   │ quarter   │
├─────────────┼──────────┼─────────┼───────────┤
│ Alice Chen  │ APAC     │ 94,230  │ Q1 2024   │
│ David Park  │ APAC     │ 102,100 │ Q2 2024   │
└─────────────┴──────────┴─────────┴───────────┘
$ mcat --diff q1_sales.csv q2_sales.csv
┌─────┬────────┬─────────────┬─────────────────────────┐
│ Row │ Status │ name        │ sales                   │
├─────┼────────┼─────────────┼─────────────────────────┤
│ 0   │ ~      │ Alice Chen  │ 94,230 → 98,100         │
│ 1   │        │ Bob Muller  │ 71,450                  │
│ 2   │ ~      │ Carol Smith │ 88,920 → 91,340         │
│ 3   │ +      │ Frank Lee   │ 55,200                  │
└─────┴────────┴─────────────┴─────────────────────────┘
$ mcat --stats sales_data.parquet

Stats  sales_data.parquet (1,234,567 rows · 4 columns)

Column  Type     Non-Null   Null    Min     Max      Mean
—————————————————————————————————————————————————————————
name    STRING   1,234,567  0       Aaron   Zoe      —
age     INT64    1,230,000  4,567   18      94       36.4
salary  FLOAT64  1,200,000  34,567  22,000  450,000  87,432
region  STRING   1,234,567  0       APAC    EMEA     —
mcat file.txt
mcat -n file.txt
mcat -A file.txt
echo "hello" | mcat
mcat data.parquet
mcat data.csv
mcat --format jsonl data.parquet
mcat --schema data.parquet
mcat --head 10 data.parquet
mcat --tail 5 data.csv
mcat --columns name,age data.parquet
mcat --sample 20 data.parquet
mcat data.parquet \
--query "age > 30 AND city = 'NYC'"
mcat data.csv --query "salary > 50000" \
--format jsonl
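The `--query` flag takes a plain SQL WHERE clause; mcat hands it to DuckDB. The filtering idea can be sketched with Python's stdlib `sqlite3` in place of DuckDB (the table and rows here are made up for illustration):

```python
import sqlite3

# Toy data standing in for a file mcat would read.
rows = [("Alice", 34, "NYC"), ("Bob", 28, "NYC"), ("Carol", 41, "Boston")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (name TEXT, age INTEGER, city TEXT)")
con.executemany("INSERT INTO data VALUES (?, ?, ?)", rows)

# Same syntax you would pass to --query.
where = "age > 30 AND city = 'NYC'"
matches = con.execute(f"SELECT * FROM data WHERE {where}").fetchall()
print(matches)  # [('Alice', 34, 'NYC')]
```

With Parquet input, DuckDB additionally pushes the predicate down to row groups, so non-matching chunks are never decoded.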
mcat data.parquet --sort age
mcat data.parquet --sort -age,name
mcat data.csv --grep "Smith"
mcat data.csv --grep "NYC" --head 5
mcat --diff old.csv new.csv
mcat --stats data.parquet
mcat --count data.parquet
mcat --detect data.parquet
mcat s3://bucket/data.parquet
mcat gs://bucket/data.parquet
mcat https://example.com/data.csv
mcat --s3-endpoint https://play.min.io \
s3://mybucket/data.parquet
mcat data.parquet.gz
mcat data.csv.zst --head 100
mcat data.parquet -o data.jsonl \
--format jsonl
mcat data.parquet --pager
Formats are auto-detected by extension first, then by magic bytes (PAR1, Obj\x01) as a fallback.
| Format | Extensions | Features |
|---|---|---|
| Parquet | .parquet .pq | Row-group streaming, schema inspect, instant count/stats |
| Avro | .avro | Stream blocks, schema inspect |
| CSV | .csv | Table with headers, auto-detect delimiter |
| TSV | .tsv | Table with headers |
| JSONL | .jsonl .ndjson | Pretty-print records |
| JSON | .json | Array of objects or single object |
| Excel | .xlsx .xls | First sheet, both legacy and modern |
Streaming via fsspec — no full downloads, range requests where supported.
| Protocol | Backend | Notes |
|---|---|---|
| s3:// | s3fs + boto3 | AWS S3, MinIO, R2, B2, DO Spaces |
| gs:// | gcsfs | Google Cloud Storage |
| az:// | adlfs | Azure Blob Storage |
| https:// | fsspec built-in | Range requests where supported |
| S3-compatible | s3fs + --s3-endpoint | MinIO, Cloudflare R2, Backblaze B2 |
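The "no full downloads" behavior rests on HTTP range requests: ask the server for a byte slice rather than the whole object. fsspec handles this internally; a sketch of the raw mechanism with stdlib urllib (the URL is a placeholder):

```python
import urllib.request

def range_request(url: str, start: int, end: int) -> urllib.request.Request:
    """Build a request for bytes [start, end] of a remote file.

    Servers that support it answer with 206 Partial Content and only
    that slice, so e.g. a Parquet footer can be read without fetching
    the rest of the file.
    """
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})

# e.g. just the first 4 bytes of a remote Parquet file (its PAR1 magic):
req = range_request("https://example.com/data.parquet", 0, 3)
# urllib.request.urlopen(req).read() would then fetch only those 4 bytes
```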
Zero-config auth — mcat uses credentials you've already set up for your cloud provider.
# AWS CLI (recommended)
aws configure
mcat s3://my-bucket/data.parquet
# Environment variables
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
mcat s3://my-bucket/data.parquet
# Named profile
AWS_PROFILE=prod mcat s3://my-bucket/data.parquet
# Per-command endpoint
mcat --s3-endpoint https://play.min.io s3://mybucket/data.parquet
# Environment variable
export AWS_ENDPOINT_URL=https://play.min.io
mcat s3://mybucket/data.parquet
# Cloudflare R2
mcat --s3-endpoint https://<account>.r2.cloudflarestorage.com \
s3://bucket/file.parquet
# gcloud CLI (recommended)
gcloud auth application-default login
mcat gs://my-bucket/data.parquet
# Service account key
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
mcat gs://my-bucket/data.parquet