Open Source · MIT · Python 3.10+

mcat

cat on steroids — reads Parquet, CSV, JSONL,
Avro, Excel, JSON and remote sources.

$ uv tool install mcat
01 / Features

Everything cat does,
plus superpowers

01

Drop-in cat

Every GNU cat flag works. -n, -b, -s, -A, -v — all of them. If you know cat, you know mcat.

02

7+ formats

Parquet, CSV, TSV, JSONL, JSON, Avro, Excel — auto-detected by extension or magic bytes.

03

SQL queries

Filter with --query using SQL WHERE clauses. Powered by DuckDB with predicate pushdown on Parquet.

04

Instant stats

Column profiling from Parquet metadata — min, max, nulls, uniques with zero full-file I/O.

05

Cloud native

Stream from S3, GCS, Azure, HTTP, MinIO, R2. Zero-config auth — uses your existing credentials.

06

Compression

Transparent decompression for gzip, zstd, bz2, lz4, xz. Works on local and remote files.

02 / Demos

See it in action

parquet table
$ mcat sales_data.parquet
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ name        โ”ƒ region   โ”ƒ   sales โ”ƒ quarter โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ Alice Chen  โ”‚ APAC     โ”‚  94,230 โ”‚ Q1 2024 โ”‚
โ”‚ Bob Muller  โ”‚ EMEA     โ”‚  71,450 โ”‚ Q1 2024 โ”‚
โ”‚ Carol Smith โ”‚ Americas โ”‚  88,920 โ”‚ Q1 2024 โ”‚
โ”‚ David Park  โ”‚ APAC     โ”‚ 102,100 โ”‚ Q2 2024 โ”‚
โ”‚ Eve Santos  โ”‚ Americas โ”‚  67,800 โ”‚ Q2 2024 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
5 rows ยท 4 columns ยท parquet
$ mcat sales.parquet --query "sales > 80000 AND region = 'APAC'"
โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ name       โ”ƒ region โ”ƒ   sales โ”ƒ quarter โ”ƒ
โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚ Alice Chen โ”‚ APAC   โ”‚  94,230 โ”‚ Q1 2024 โ”‚
โ”‚ David Park โ”‚ APAC   โ”‚ 102,100 โ”‚ Q2 2024 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
2 rows ยท 4 columns ยท parquet
$ mcat --diff q1_sales.csv q2_sales.csv
โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“
โ”ƒ Row โ”ƒ Status โ”ƒ name        โ”ƒ sales                   โ”ƒ
โ”กโ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ
โ”‚   0 โ”‚ ~      โ”‚ Alice Chen  โ”‚ 94,230 -> 98,100          โ”‚
โ”‚   1 โ”‚        โ”‚ Bob Muller  โ”‚ 71,450                   โ”‚
โ”‚   2 โ”‚ ~      โ”‚ Carol Smith โ”‚ 88,920 -> 91,340          โ”‚
โ”‚   3 โ”‚ +      โ”‚ Frank Lee   โ”‚ 55,200                   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
1 unchanged ยท 2 modified ยท 1 added ยท 0 removed
$ mcat --stats sales_data.parquet
        Stats  sales_data.parquet  (1,234,567 rows · 4 columns)
┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━┓
┃ Column ┃ Type    ┃  Non-Null ┃   Null ┃    Min ┃     Max ┃   Mean ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━┩
│ name   │ STRING  │ 1,234,567 │      0 │  Aaron │     Zoe │      - │
│ age    │ INT64   │ 1,230,000 │  4,567 │     18 │      94 │   36.4 │
│ salary │ FLOAT64 │ 1,200,000 │ 34,567 │ 22,000 │ 450,000 │ 87,432 │
│ region │ STRING  │ 1,234,567 │      0 │   APAC │    EMEA │      - │
└────────┴─────────┴───────────┴────────┴────────┴─────────┴────────┘
 4.2 MB · parquet · compression: SNAPPY
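
Parquet footers store statistics (min, max, null count) for each column chunk in each row group, which is what allows profiling without scanning the data pages. The following is a minimal sketch of how such per-row-group statistics could be merged into file-level numbers. The `RowGroupStats` type, the `merge_stats` helper, and the sample values are hypothetical and only illustrate the idea; they are not mcat's internals.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Statistics for one column chunk in one row group (hypothetical type)."""
    num_values: int   # non-null values in this row group
    null_count: int
    min_value: float
    max_value: float


def merge_stats(groups: list[RowGroupStats]) -> dict:
    """Combine per-row-group statistics into file-level statistics."""
    if not groups:
        raise ValueError("no row groups")
    return {
        "non_null": sum(g.num_values for g in groups),
        "null": sum(g.null_count for g in groups),
        "min": min(g.min_value for g in groups),
        "max": max(g.max_value for g in groups),
    }


# Illustrative values only: two row groups of an integer column.
age_groups = [
    RowGroupStats(num_values=1000, null_count=5, min_value=18, max_value=70),
    RowGroupStats(num_values=800, null_count=0, min_value=21, max_value=94),
]
print(merge_stats(age_groups))
```

Only the footer has to be read for this, so the cost depends on the number of row groups rather than the number of rows.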
03 / Install

Up and running
in one command

pip

pip install mcat

uv

# Install uv first (if you don't have it)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install mcat
uv tool install mcat

Homebrew

brew tap christyjacob4/tap
brew install mcat

APT

# Add the PPA
sudo add-apt-repository ppa:christyjacob4/mcat
sudo apt update

# Install mcat
sudo apt install mcat
04 / Usage

Every flag you need,
at the command line

Drop-in cat

mcat file.txt
mcat -n file.txt
mcat -A file.txt
echo "hello" | mcat

Structured data

mcat data.parquet
mcat data.csv
mcat --format jsonl data.parquet
mcat --schema data.parquet

Filter & slice

mcat --head 10 data.parquet
mcat --tail 5 data.csv
mcat --columns name,age data.parquet
mcat --sample 20 data.parquet

SQL queries

mcat data.parquet \
  --query "age > 30 AND city = 'NYC'"
mcat data.csv --query "salary > 50000" \
  --format jsonl
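
`--query` takes only the predicate, not a full SQL statement. A rough sketch of that idea is to wrap the predicate in a `SELECT * ... WHERE` statement. The sketch uses `sqlite3` from the standard library in place of DuckDB so that it runs without extra dependencies. The table name `data`, the `filter_rows` helper, and the sample rows are assumptions for illustration; how mcat builds its query internally is not stated here.

```python
import sqlite3


def filter_rows(rows: list[dict], predicate: str) -> list[dict]:
    """Load rows into an in-memory table and apply a bare WHERE predicate."""
    if not rows:
        return []
    columns = list(rows[0])
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    col_defs = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f"CREATE TABLE data ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO data VALUES ({placeholders})",
        [tuple(r[c] for c in columns) for r in rows],
    )
    # The predicate is spliced in verbatim: acceptable for a local tool,
    # unsafe for untrusted input.
    cursor = conn.execute(f"SELECT * FROM data WHERE {predicate}")
    result = [dict(row) for row in cursor]
    conn.close()
    return result


people = [
    {"name": "Ann", "age": 34, "city": "NYC"},
    {"name": "Raj", "age": 28, "city": "NYC"},
    {"name": "Lee", "age": 41, "city": "SF"},
]
print(filter_rows(people, "age > 30 AND city = 'NYC'"))
```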

Sort & grep

mcat data.parquet --sort age
mcat data.parquet --sort -age,name
mcat data.csv --grep "Smith"
mcat data.csv --grep "NYC" --head 5

Diff & stats

mcat --diff old.csv new.csv
mcat --stats data.parquet
mcat --count data.parquet
mcat --detect data.parquet
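
The `--diff` demo output marks each row as unchanged, modified (`~`), or added (`+`). Below is a minimal sketch of a row diff that produces those statuses, using the values from the demo. Matching rows by position rather than by a key column is an assumption, as are the `diff_rows` helper and the `-` marker for removed rows.

```python
from itertools import zip_longest


def diff_rows(old: list[dict], new: list[dict]) -> list[tuple[int, str, dict]]:
    """Compare two row lists position by position.

    Returns (row_index, status, row) tuples where status is
    "" (unchanged), "~" (modified), "+" (added) or "-" (removed).
    """
    result = []
    for i, (a, b) in enumerate(zip_longest(old, new)):
        if a is None:
            result.append((i, "+", b))
        elif b is None:
            result.append((i, "-", a))
        elif a == b:
            result.append((i, "", b))
        else:
            result.append((i, "~", b))
    return result


old = [
    {"name": "Alice Chen", "sales": 94230},
    {"name": "Bob Muller", "sales": 71450},
    {"name": "Carol Smith", "sales": 88920},
]
new = [
    {"name": "Alice Chen", "sales": 98100},
    {"name": "Bob Muller", "sales": 71450},
    {"name": "Carol Smith", "sales": 91340},
    {"name": "Frank Lee", "sales": 55200},
]
for index, status, row in diff_rows(old, new):
    print(index, status or " ", row)
```

A positional diff is cheap but reports every row after an insertion in the middle as modified; a key-based diff avoids that at the cost of needing a key column.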

Remote sources

mcat s3://bucket/data.parquet
mcat gs://bucket/data.parquet
mcat https://example.com/data.csv
mcat --s3-endpoint https://play.min.io \
  s3://mybucket/data.parquet

Compression & output

mcat data.parquet.gz
mcat data.csv.zst --head 100
mcat data.parquet -o data.jsonl \
  --format jsonl
mcat data.parquet --pager
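
Transparent decompression usually comes down to choosing a decompressor from the leading bytes of the input. The sketch below covers only the three codecs available in the Python 3.10 standard library (gzip, bz2, xz); zstd and lz4 are left out because they need extra packages there. `read_maybe_compressed` is a hypothetical helper, not mcat's API.

```python
import bz2
import gzip
import lzma

# Leading "magic" bytes of each compressed container format.
MAGIC = {
    b"\x1f\x8b": gzip.decompress,
    b"BZh": bz2.decompress,
    b"\xfd7zXZ\x00": lzma.decompress,
}


def read_maybe_compressed(raw: bytes) -> bytes:
    """Return decompressed bytes if a known magic prefix matches, else the input."""
    for magic, decompress in MAGIC.items():
        if raw.startswith(magic):
            return decompress(raw)
    return raw


payload = b"name,age\nAnn,34\n"
print(read_maybe_compressed(gzip.compress(payload)) == payload)
```

This version decompresses the whole input in memory for brevity; a streaming reader would wrap the file object (for example with `gzip.open`) instead.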
05 / Formats

7 formats,
zero configuration

Auto-detected by extension, then by magic bytes (PAR1, Obj\x01) as fallback.

Format    Extensions        Features
Parquet   .parquet .pq      Row-group streaming, schema inspect, instant count/stats
Avro      .avro             Stream blocks, schema inspect
CSV       .csv              Table with headers, auto-detect delimiter
TSV       .tsv              Table with headers
JSONL     .jsonl .ndjson    Pretty-print records
JSON      .json             Array of objects or single object
Excel     .xlsx .xls        First sheet, both legacy and modern

Remote Sources

Streaming via fsspec — no full downloads, range requests where supported.

Protocol        Backend                 Notes
s3://           s3fs + boto3            AWS S3, MinIO, R2, B2, DO Spaces
gs://           gcsfs                   Google Cloud Storage
az://           adlfs                   Azure Blob Storage
https://        fsspec built-in         Range requests where supported
S3-compatible   s3fs + --s3-endpoint    MinIO, Cloudflare R2, Backblaze B2
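
Range requests matter most for Parquet, whose metadata sits in a footer at the end of the file: the last 8 bytes are a 4-byte little-endian footer length followed by the magic `PAR1`. The sketch below locates the footer with small reads from the end of the file instead of reading everything. It works on any seekable binary file object, which is the interface fsspec file objects generally expose as well. `footer_range` is a hypothetical helper, and the in-memory bytes are a stand-in with the right layout, not a real Parquet file.

```python
import io
import struct


def footer_range(f) -> tuple[int, int]:
    """Return (offset, length) of the Parquet footer metadata in a seekable file."""
    f.seek(-8, io.SEEK_END)
    tail = f.read(8)
    if tail[4:] != b"PAR1":
        raise ValueError("not a Parquet file (missing PAR1 trailer)")
    (footer_len,) = struct.unpack("<I", tail[:4])
    file_size = f.seek(0, io.SEEK_END)
    return file_size - 8 - footer_len, footer_len


# Stand-in bytes with the Parquet layout: magic, data, footer, length, magic.
footer = b"F" * 20
fake = io.BytesIO(
    b"PAR1" + b"D" * 100 + footer + struct.pack("<I", len(footer)) + b"PAR1"
)
offset, length = footer_range(fake)
fake.seek(offset)
print(offset, length, fake.read(length) == footer)
```

On a remote file each `seek` plus `read` pair can map to one range request, so schema, row counts, and column statistics are available after fetching only the footer.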
06 / Auth

Zero-config
cloud auth

mcat piggybacks on credentials you've already configured for your cloud provider.

AWS S3

# AWS CLI (recommended)
aws configure
mcat s3://my-bucket/data.parquet

# Environment variables
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
mcat s3://my-bucket/data.parquet

# Named profile
AWS_PROFILE=prod mcat s3://my-bucket/data.parquet

S3-compatible storage

# Per-command endpoint
mcat --s3-endpoint https://play.min.io s3://mybucket/data.parquet

# Environment variable
export AWS_ENDPOINT_URL=https://play.min.io
mcat s3://mybucket/data.parquet

# Cloudflare R2
mcat --s3-endpoint https://<account>.r2.cloudflarestorage.com \
  s3://bucket/file.parquet

Google Cloud Storage

# gcloud CLI (recommended)
gcloud auth application-default login
mcat gs://my-bucket/data.parquet

# Service account key
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
mcat gs://my-bucket/data.parquet