Data Inventory
Live from the archive — updated daily by the compaction pipeline
Access the Data
No authentication required — all data is publicly accessible
Query with DuckDB
Query Parquet files directly from GCS using DuckDB's httpfs extension:
INSTALL httpfs;
LOAD httpfs;
SELECT *
FROM read_parquet(
    'http://parquet.gtfsrt.io/<feed_type>/date=<date>/base64url=<base64url>/data.parquet',
    hive_partitioning = true
)
LIMIT 100;
Read with Python
Use Polars to read Parquet files over HTTP:
# pip install polars
import polars as pl
df = pl.read_parquet(
    "http://parquet.gtfsrt.io/<feed_type>"
    "/date=<date>"
    "/base64url=<base64url>"
    "/data.parquet"
)
print(df.schema)
print(df.head(10))
Direct Download
Files are organized with Hive-style partitioning:
# Parquet files (compacted daily)
http://parquet.gtfsrt.io/<feed_type>/date=<date>/base64url=<base64url>/data.parquet
# Raw protobuf snapshots
http://protobuf.gtfsrt.io/<feed_type>/date=<date>/hour=<iso_hour>/base64url=<base64url>/<timestamp>.pb
The base64url partition is a URL-safe base64 encoding (no padding) of the feed URL.
Use the inventory table above to find feed URLs and their encoded values.
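As a concrete sketch, the encoding and a Parquet path can be built with the standard library. The helper names and the example feed URL below are my own, for illustration only; use the inventory table for real values:

```python
import base64


def encode_feed_url(feed_url: str) -> str:
    # URL-safe base64 with the '=' padding stripped, per the partition scheme.
    return base64.urlsafe_b64encode(feed_url.encode()).decode().rstrip("=")


def parquet_path(feed_type: str, date: str, feed_url: str) -> str:
    # Mirrors the Hive-style layout shown above.
    return (
        f"http://parquet.gtfsrt.io/{feed_type}"
        f"/date={date}/base64url={encode_feed_url(feed_url)}/data.parquet"
    )


# 'vehicle_positions' and the feed URL are placeholders, not real entries.
print(parquet_path("vehicle_positions", "2024-01-01", "https://example.com/vp.pb"))
```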
Explore with the Sandbox
The GTFS-RT Sandbox is a companion project for exploring this archived data using DuckDB and dbt. It includes staging models, TIDES-compliant transformations, and analytics views you can run on your laptop.
Source Code
Licensed under AGPL-3.0