gtfsrt.io

Open GTFS-Realtime data archive

A continuously updated archive of GTFS-Realtime feeds from transit agencies across the US. It provides raw protobuf snapshots and analytics-ready Parquet files, freely accessible to researchers and developers.

by Jarvus Innovations


Data Inventory

Live from the archive — updated daily by the compaction pipeline

Access the Data

No authentication required — all data is publicly accessible

Query with DuckDB

Query Parquet files directly from GCS using DuckDB's httpfs extension:

INSTALL httpfs;
LOAD httpfs;

SELECT *
FROM read_parquet(
  'http://parquet.gtfsrt.io/<feed_type>/date=<date>/base64url=<base64url>/data.parquet',
  hive_partitioning = true
)
LIMIT 100;

Read with Python

Use Polars to read Parquet files over HTTP:

# pip install polars
import polars as pl

# Substitute <feed_type>, <date>, and <base64url> with values
# from the inventory table
df = pl.read_parquet(
    "http://parquet.gtfsrt.io/<feed_type>"
    "/date=<date>"
    "/base64url=<base64url>"
    "/data.parquet"
)
print(df.schema)
print(df.head(10))

Direct Download

Files are organized with Hive-style partitioning:

# Parquet files (compacted daily)
http://parquet.gtfsrt.io/<feed_type>/date=<date>/base64url=<base64url>/data.parquet

# Raw protobuf snapshots
http://protobuf.gtfsrt.io/<feed_type>/date=<date>/hour=<iso_hour>/base64url=<base64url>/<timestamp>.pb

The base64url partition is a URL-safe base64 encoding (no padding) of the feed URL. Use the inventory table above to find feed URLs and their encoded values.
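If you already know a feed URL, you can compute its partition value locally with Python's standard library rather than looking it up. A minimal sketch (the example feed URL is hypothetical, not a feed in the archive):

```python
import base64

def encode_feed_url(feed_url: str) -> str:
    # URL-safe base64 encoding of the feed URL, with padding stripped
    return base64.urlsafe_b64encode(feed_url.encode()).decode().rstrip("=")

def decode_partition(value: str) -> str:
    # Recover the original feed URL: restore the stripped "=" padding,
    # then decode with the URL-safe alphabet
    padded = value + "=" * (-len(value) % 4)
    return base64.urlsafe_b64decode(padded).decode()

# Hypothetical feed URL, for illustration only
url = "https://example.com/gtfs-rt/tripupdates.pb"
encoded = encode_feed_url(url)
print(encoded)
assert decode_partition(encoded) == url
```

The decoder is handy in the other direction as well: given a `base64url=` path segment from the archive, it returns the upstream feed URL the snapshot was fetched from.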

Explore with the Sandbox

The GTFS-RT Sandbox is a companion project for exploring this archived data using DuckDB and dbt. It includes staging models, TIDES-compliant transformations, and analytics views you can run on your laptop.

Source Code

Licensed under AGPL-3.0