Data Inventory
Live from the archive — updated daily by the compaction pipeline
Access the Data
No authentication required — all data is publicly accessible
Query with DuckDB
Query Parquet files directly from GCS using DuckDB's httpfs extension:
INSTALL httpfs;
LOAD httpfs;
SELECT *
FROM read_parquet(
    'http://parquet.gtfsrt.io/<feed_type>/date=<date>/base64url=<base64url>/data.parquet',
    hive_partitioning = true
)
LIMIT 100;
Read with Python
Use Polars to read Parquet files over HTTP:
# pip install polars
import polars as pl
df = pl.read_parquet(
    "http://parquet.gtfsrt.io/<feed_type>"
    "/date=<date>"
    "/base64url=<base64url>"
    "/data.parquet"
)
print(df.schema)
print(df.head(10))
Direct Download
Files are organized with Hive-style partitioning:
# Parquet files (compacted daily)
http://parquet.gtfsrt.io/<feed_type>/date=<date>/base64url=<base64url>/data.parquet
# Raw protobuf snapshots
http://protobuf.gtfsrt.io/<feed_type>/date=<date>/hour=<iso_hour>/base64url=<base64url>/<timestamp>.pb
The base64url partition is a URL-safe base64 encoding (no padding) of the feed URL.
Use the inventory table above to find feed URLs and their encoded values.
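As a concrete sketch, the encoding and a Parquet path can be built with the standard library. The helper names and the example feed URL below are my own, for illustration only; use the inventory table for real values:

```python
import base64


def encode_feed_url(feed_url: str) -> str:
    # URL-safe base64 with the '=' padding stripped, per the partition scheme.
    return base64.urlsafe_b64encode(feed_url.encode()).decode().rstrip("=")


def parquet_path(feed_type: str, date: str, feed_url: str) -> str:
    # Mirrors the Hive-style layout shown above.
    return (
        f"http://parquet.gtfsrt.io/{feed_type}"
        f"/date={date}/base64url={encode_feed_url(feed_url)}/data.parquet"
    )


# 'vehicle_positions' and the feed URL are placeholders, not real entries.
print(parquet_path("vehicle_positions", "2024-01-01", "https://example.com/vp.pb"))
```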
Explore with the Sandbox
The GTFS-RT Sandbox is a companion project for exploring this archived data using DuckDB and dbt. It includes staging models, TIDES-compliant transformations, and analytics views you can run on your laptop.
Source Code
Licensed under AGPL-3.0