mcat

cat on steroids — Parquet, CSV, JSONL, Avro, Excel, SQL queries, and remote sources

Why mcat?

Drop-in cat

Every GNU cat flag works. -n, -b, -s, -A, -v — all of them. If you know cat, you know mcat.

7+ formats

Parquet, CSV, TSV, JSONL, JSON, Avro, Excel — auto-detected by extension or magic bytes.

SQL queries

Filter with --query using SQL WHERE clauses. Powered by DuckDB with predicate pushdown on Parquet.

Instant stats

Column profiling from Parquet metadata — min, max, nulls, uniques with zero full-file I/O.

Cloud native

Stream from S3, GCS, Azure, HTTP, MinIO, R2. Zero-config auth — uses your existing credentials.

Compression

Transparent decompression for gzip, zstd, bz2, lz4, xz. Works on local and remote files.

Install

# pip
pip install mcat

# uv
# Install uv first (if you don't have it)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv tool install mcat

# Homebrew
brew tap christyjacob4/tap
brew install mcat

# Ubuntu PPA
sudo add-apt-repository ppa:christyjacob4/mcat
sudo apt update
sudo apt install mcat

See it in action

parquet — table rendering
$ mcat sales_data.parquet
┌─────────────┬──────────┬─────────┬───────────┐
│ name        │ region   │ sales   │ quarter   │
├─────────────┼──────────┼─────────┼───────────┤
│ Alice Chen  │ APAC     │ 94,230  │ Q1 2024   │
│ Bob Muller  │ EMEA     │ 71,450  │ Q1 2024   │
│ Carol Smith │ Americas │ 88,920  │ Q1 2024   │
│ David Park  │ APAC     │ 102,100 │ Q2 2024   │
│ Eve Santos  │ Americas │ 67,800  │ Q2 2024   │
└─────────────┴──────────┴─────────┴───────────┘
5 rows · 4 columns · parquet
--query — SQL filtering
$ mcat sales.parquet --query "sales > 80000 AND region = 'APAC'"
┌─────────────┬──────────┬─────────┬───────────┐
│ name        │ region   │ sales   │ quarter   │
├─────────────┼──────────┼─────────┼───────────┤
│ Alice Chen  │ APAC     │ 94,230  │ Q1 2024   │
│ David Park  │ APAC     │ 102,100 │ Q2 2024   │
└─────────────┴──────────┴─────────┴───────────┘
2 rows · 4 columns · parquet
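
mcat hands the --query string to DuckDB as a WHERE clause, so the filter above behaves roughly like this standalone snippet (a sketch of the equivalence, not mcat's internal code):

import duckdb

# Scan the Parquet file with DuckDB; the WHERE predicate is pushed
# down to the Parquet row groups, so non-matching data is never decoded.
duckdb.sql("""
    SELECT * FROM 'sales.parquet'
    WHERE sales > 80000 AND region = 'APAC'
""").show()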
--diff — file comparison
$ mcat --diff q1_sales.csv q2_sales.csv
┌─────┬────────┬─────────────┬─────────────────────────┐
│ Row │ Status │ name        │ sales                   │
├─────┼────────┼─────────────┼─────────────────────────┤
│ 0   │ ~      │ Alice Chen  │ 94,230 → 98,100         │
│ 1   │        │ Bob Muller  │ 71,450                  │
│ 2   │ ~      │ Carol Smith │ 88,920 → 91,340         │
│ 3   │ +      │ Frank Lee   │ 55,200                  │
└─────┴────────┴─────────────┴─────────────────────────┘
1 unchanged · 2 modified · 1 added · 0 removed
--stats — column profiling
$ mcat --stats sales_data.parquet
Stats  sales_data.parquet  (1,234,567 rows · 4 columns)

 Column    Type      Non-Null       Null    Min       Max        Mean
 ─────────────────────────────────────────────────────────────────────
 name      STRING    1,234,567         0    Aaron     Zoe           —
 age       INT64     1,230,000     4,567    18        94         36.4
 salary    FLOAT64   1,200,000    34,567    22,000    450,000  87,432
 region    STRING    1,234,567         0    APAC      EMEA          —
4.2 MB · parquet · compression: SNAPPY
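
The zero-I/O stats come from Parquet itself: the file footer stores per-column min/max and null counts. A minimal pyarrow sketch of reading them (illustrative only; mcat's actual implementation may differ):

import pyarrow.parquet as pq

pf = pq.ParquetFile("sales_data.parquet")
print(pf.metadata.num_rows)               # row count straight from the footer

# Per-column statistics from the first row group; no data pages are read
rg = pf.metadata.row_group(0)
for i in range(rg.num_columns):
    col = rg.column(i)
    s = col.statistics
    if s is not None and s.has_min_max:
        print(col.path_in_schema, s.min, s.max, s.null_count)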

Usage

Drop-in cat

mcat file.txt
mcat -n file.txt
mcat -A file.txt
echo "hello" | mcat

Structured data

mcat data.parquet
mcat data.csv
mcat --format jsonl data.parquet
mcat --schema data.parquet

Filter & slice

mcat --head 10 data.parquet
mcat --tail 5 data.csv
mcat --columns name,age data.parquet
mcat --sample 20 data.parquet

SQL queries

mcat data.parquet \
  --query "age > 30 AND city = 'NYC'"
mcat data.csv --query "salary > 50000" \
  --format jsonl

Sort & grep

mcat data.parquet --sort age
mcat data.parquet --sort -age,name
mcat data.csv --grep "Smith"
mcat data.csv --grep "NYC" --head 5

Diff & stats

mcat --diff old.csv new.csv
mcat --stats data.parquet
mcat --count data.parquet
mcat --detect data.parquet

Remote sources

mcat s3://bucket/data.parquet
mcat gs://bucket/data.parquet
mcat https://example.com/data.csv
mcat --s3-endpoint https://play.min.io \
  s3://mybucket/data.parquet

Compression & output

mcat data.parquet.gz
mcat data.csv.zst --head 100
mcat data.parquet -o data.jsonl \
  --format jsonl
mcat data.parquet --pager
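
For reference, fsspec (which mcat already uses for remote sources) offers the same kind of extension-based decompression; a sketch, not necessarily mcat's code path:

import fsspec

# compression="infer" picks gzip/bz2/xz (and zstd/lz4 when their optional
# packages are installed) from the extension, so reads come back decompressed
with fsspec.open("data.csv.gz", "rt", compression="infer") as f:
    print(f.readline())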

Format Support

Formats are auto-detected by extension, then by magic bytes (PAR1, Obj\x01) as a fallback; a sketch of the detection order follows the table.

Format    Extensions        Features
Parquet   .parquet .pq      Row-group streaming, schema inspect, instant count/stats
Avro      .avro             Stream blocks, schema inspect
CSV       .csv              Table with headers, auto-detect delimiter
TSV       .tsv              Table with headers
JSONL     .jsonl .ndjson    Pretty-print records
JSON      .json             Array of objects or single object
Excel     .xlsx .xls        First sheet, both legacy and modern
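
A minimal sketch of that detection order (hypothetical helper, not mcat's actual code):

from pathlib import Path

EXTENSION_MAP = {
    ".parquet": "parquet", ".pq": "parquet", ".avro": "avro",
    ".csv": "csv", ".tsv": "tsv", ".jsonl": "jsonl", ".ndjson": "jsonl",
    ".json": "json", ".xlsx": "excel", ".xls": "excel",
}
MAGIC_MAP = {b"PAR1": "parquet", b"Obj\x01": "avro"}

def detect_format(path: str) -> str:
    ext = Path(path).suffix.lower()
    if ext in EXTENSION_MAP:              # 1. extension wins
        return EXTENSION_MAP[ext]
    with open(path, "rb") as f:           # 2. fall back to magic bytes
        head = f.read(4)
    for magic, fmt in MAGIC_MAP.items():
        if head.startswith(magic):
            return fmt
    return "text"                         # 3. plain cat behavior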

Remote Sources

Streaming via fsspec — no full downloads, with range requests where supported; a small sketch follows the table.

Protocol        Backend                Notes
s3://           s3fs + boto3           AWS S3, MinIO, R2, B2, DO Spaces
gs://           gcsfs                  Google Cloud Storage
az://           adlfs                  Azure Blob Storage
https://        fsspec built-in        Range requests where supported
S3-compatible   s3fs + --s3-endpoint   MinIO, Cloudflare R2, Backblaze B2
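
Concretely, "streaming" means ranged reads instead of downloads; a small fsspec sketch (bucket and key are hypothetical, and mcat's exact calls may differ):

import fsspec

# Requires s3fs for s3:// URLs. Reading 4 bytes issues a ranged GET,
# not a full-object download.
with fsspec.open("s3://my-bucket/data.parquet", "rb") as f:
    print(f.read(4))                      # b'PAR1' for a Parquet file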

Authentication

Zero-config auth — mcat uses credentials you've already set up for your cloud provider.

# AWS CLI (recommended)
aws configure
mcat s3://my-bucket/data.parquet

# Environment variables
export AWS_ACCESS_KEY_ID=AKIA...
export AWS_SECRET_ACCESS_KEY=...
mcat s3://my-bucket/data.parquet

# Named profile
AWS_PROFILE=prod mcat s3://my-bucket/data.parquet

# Per-command endpoint
mcat --s3-endpoint https://play.min.io s3://mybucket/data.parquet

# Environment variable
export AWS_ENDPOINT_URL=https://play.min.io
mcat s3://mybucket/data.parquet

# Cloudflare R2
mcat --s3-endpoint https://<account>.r2.cloudflarestorage.com \
  s3://bucket/file.parquet

# gcloud CLI (recommended)
gcloud auth application-default login
mcat gs://my-bucket/data.parquet

# Service account key
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
mcat gs://my-bucket/data.parquet
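
The zero-config behavior comes from the backends themselves: s3fs defers to botocore's standard credential chain (environment variables, ~/.aws/credentials, instance profiles), the same lookup the AWS CLI uses. A sketch with a hypothetical bucket:

import s3fs

# No keys passed in: s3fs resolves credentials exactly like the AWS CLI does.
fs = s3fs.S3FileSystem()
print(fs.ls("my-bucket"))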