This Google Flights Scraper Uses Protobuf Secrets No One Talks About

Bright Coding
Author

What if I told you that Google hides your flight search data in plain sight—and that decoding it could make your scraper 10x faster than anything using Playwright?

Every developer who's tried to build with flight data knows the nightmare. Google's official Flights API? Dead since 2018, buried in what developers bitterly call the "Google Graveyard." Free alternatives? Choked by rate limits that make them unusable at scale. Enterprise solutions? Your wallet weeps just reading the pricing page.

And then there's the scraping path. You fire up Playwright, watch Chromium lumber to life, and wait... and wait... only to hit a Cloudflare wall on edge deployments. Sound familiar? I've been there. You've been there. We've all wasted hours on "solutions" that solve nothing.

But here's where it gets insane. A developer named AWeirdDev cracked open Google Flights' URL parameters and discovered something nobody expected: Base64-encoded Protobuf strings carrying all your search parameters in a compact binary format. No browser automation. No headless Chromium. Just clean, structured data flying over the wire at native speeds.

Enter fast-flights—a strongly-typed Python scraper that reverse-engineers Google's own serialization protocol to fetch flight data faster than you thought possible. This isn't scraping in the traditional sense. This is speaking Google's secret language back to them.

Ready to see how deep this rabbit hole goes? Let's decode it together.


What is fast-flights?

fast-flights is a fast, robust Google Flights scraper (API) implemented in Python that exploits Google's internal use of Protocol Buffers (Protobuf), Google's own binary serialization format, for constructing flight search queries. Currently at version 3.0rc0, this library represents a fundamental departure from conventional scraping approaches that rely on browser automation.

Created by AWeirdDev (2024-2026), the project emerged from a genuine developer pain point: building a chat-interface trip recommendation app without viable API access to flight data. Rather than accepting the slow, fragile Playwright-based solutions dominating PyPI, AWeirdDev performed genuine reverse engineering on Google's URL parameters—specifically the cryptic tfs query parameter that looks like gibberish but actually encodes structured Protobuf messages.
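
To make the tfs idea concrete, here is a minimal sketch of pulling that parameter out of a URL and turning it back into raw bytes. The URL, the payload bytes, and the helper name decode_tfs are all made up for illustration (the payload is built inside the snippet); none of them come from fast-flights or from a real Google Flights request.

```python
import base64
from urllib.parse import parse_qs, urlparse


def decode_tfs(url: str) -> bytes:
    """Return the raw bytes carried by the `tfs` query parameter of a URL."""
    params = parse_qs(urlparse(url).query)
    token = params["tfs"][0]
    # URL-safe Base64 is often sent without '=' padding; restore it first.
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded)


# Illustrative payload only: field 3 (length-delimited) wrapping field 2 = "2025-06-01".
sample_payload = b"\x1a\x0c\x12\x0a2025-06-01"
sample_token = base64.urlsafe_b64encode(sample_payload).decode().rstrip("=")
sample_url = f"https://example.com/search?tfs={sample_token}"  # placeholder host

decoded = decode_tfs(sample_url)
print(decoded.hex())
```

The hex dump is what a Protobuf decoder would then walk field by field; the "gibberish" in the URL is just this byte string in URL-safe Base64.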

The project's trending status stems from three converging factors:

  • Performance desperation: Developers are actively abandoning Playwright-based scrapers due to speed and deployment constraints
  • Protobuf literacy gap: Most developers encounter Protobuf in gRPC contexts, never suspecting it powers consumer web UIs
  • Google's API strategy vacuum: The 2018 shutdown of public QPX access created permanent demand for alternatives

What makes fast-flights genuinely different is its type safety and semantic clarity. Rather than manipulating raw strings or DOM selectors, you construct queries using strongly-typed Python objects—FlightQuery, Passengers, enumerated seat classes and trip types—that compile down to the same binary format Google's frontend generates. You're not hacking around Google's interface; you're replicating its internal communication protocol.

The library's evolution tells its own story: v2.0 introduced a succinct API with Playwright fallback support; v2.2 added local Playwright for direct requests; v3.0 pivots to JavaScript data extraction—showing continuous refinement of the core approach while maintaining the Protobuf foundation.


Key Features That Separate fast-flights from the Scraping Herd

Let's dissect what makes this library architecturally superior to conventional alternatives:

Protobuf-Native Query Construction

Instead of URL parameter munging or form simulation, fast-flights builds binary-serialized Protobuf messages that Google's backend recognizes natively. This eliminates an entire class of parsing errors and makes your queries structurally identical to legitimate Google Flights requests. The compact binary format also reduces payload size versus JSON equivalents—critical for latency-sensitive applications.

Strong Typing Throughout

Every component carries explicit types: FlightQuery objects with validated date strings and IATA airport codes; Passengers with constrained integer fields; seat restricted to "economy" | "business" | "first" | "premium-economy"; trip locked to "one-way" | "round-trip" | "multi-city". This catches errors at construction time, not at Google's 400 Bad Request response.
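
As a rough illustration of what construction-time validation can look like, the sketch below uses a frozen dataclass and Literal types. LegSpec and check_seat are hypothetical stand-ins written for this article, not the library's own classes, and the exact checks fast-flights performs are an assumption here.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Literal, get_args

Seat = Literal["economy", "premium-economy", "business", "first"]
Trip = Literal["one-way", "round-trip", "multi-city"]


@dataclass(frozen=True)
class LegSpec:
    """Hypothetical validated flight leg (not the library's FlightQuery)."""

    date: str
    from_airport: str
    to_airport: str

    def __post_init__(self) -> None:
        # Raises ValueError if the string is not a valid ISO 8601 calendar date.
        dt.date.fromisoformat(self.date)
        for code in (self.from_airport, self.to_airport):
            if not (len(code) == 3 and code.isascii() and code.isalpha() and code.isupper()):
                raise ValueError(f"not a 3-letter IATA-style code: {code!r}")


def check_seat(value: str) -> str:
    """Runtime counterpart of the Seat type: Literal alone is not enforced at runtime."""
    if value not in get_args(Seat):
        raise ValueError(f"unsupported seat class: {value!r}")
    return value


leg = LegSpec(date="2025-06-01", from_airport="JFK", to_airport="LAX")
print(leg, check_seat("economy"))
```

A static type checker flags a misspelled seat class before the code runs; the explicit runtime check covers values that arrive from user input or JSON.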

Integration-Ready Architecture

The library exposes a clean integration hook for proxy services—specifically Bright Data in the current implementation. This matters enormously because Google rate-limits aggressively; having first-class proxy rotation built into the API surface, not bolted on afterward, shows architectural foresight.

Multi-Language Response Support

The language parameter (e.g., "zh-TW" for Traditional Chinese) lets a single query pipeline serve several markets: results come back localized, so international applications do not need a separate scraping pipeline per market.

Zero Browser Dependencies (Core Path)

While v2.2 added Playwright as an escape hatch, the primary execution path requires no browser automation whatsoever. This translates to: sub-second cold starts versus Playwright's multi-second Chromium launches, compatibility with serverless/edge environments that choke on browser binaries, and dramatically lower memory footprints.

Active Development with Clear Governance

The contribution guidelines reveal maintainer priorities: no AI-generated slop, atomic changes, dependency minimalism, and realistic bandwidth expectations. This signals sustainable open-source stewardship, not an abandonware time bomb.


Use Cases Where fast-flights Absolutely Dominates

1. Travel Startup MVP Validation

You're building the next Hopper or Kayak competitor. You need real flight data for pricing algorithms, but airline direct APIs require lengthy certification processes and minimum revenue guarantees. fast-flights gets you production flight data in hours, not quarters—letting you validate demand before committing to expensive distribution partnerships.

2. Conversational AI / Trip Planning Bots

The original use case: chat interfaces that need to ground recommendations in actual bookable inventory. The strongly-typed API maps cleanly to function-calling schemas in modern LLMs (OpenAI functions, Claude tool use), making fast-flights an ideal retrieval backend for AI travel agents.
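
A sketch of how the typed query fields could be exposed to an LLM as a tool follows. The tool name search_flights, the parameter set, and the helper parse_tool_call are hypothetical; the outer envelope also differs between providers, so only the JSON Schema portion is meant to carry over.

```python
import json

SEARCH_FLIGHTS_TOOL = {
    "name": "search_flights",  # hypothetical tool name
    "description": "Search one-way flights between two airports on a date.",
    "parameters": {
        "type": "object",
        "properties": {
            "date": {"type": "string", "description": "Departure date, YYYY-MM-DD"},
            "from_airport": {"type": "string", "description": "IATA code, e.g. JFK"},
            "to_airport": {"type": "string", "description": "IATA code, e.g. LAX"},
            "seat": {
                "type": "string",
                "enum": ["economy", "premium-economy", "business", "first"],
            },
            "adults": {"type": "integer", "minimum": 1},
        },
        "required": ["date", "from_airport", "to_airport"],
    },
}


def parse_tool_call(arguments_json: str) -> dict:
    """Turn the model's JSON arguments into keyword arguments with defaults applied."""
    args = json.loads(arguments_json)
    spec = SEARCH_FLIGHTS_TOOL["parameters"]
    missing = [key for key in spec["required"] if key not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    seat = args.get("seat", "economy")
    if seat not in spec["properties"]["seat"]["enum"]:
        raise ValueError(f"unsupported seat class: {seat!r}")
    return {**args, "seat": seat, "adults": int(args.get("adults", 1))}


call = parse_tool_call('{"date": "2025-06-01", "from_airport": "JFK", "to_airport": "LAX"}')
print(call)
```

The returned dictionary holds exactly the values needed to build a FlightQuery and a Passengers object, which is why the enumerated seat and trip strings translate so directly into a tool schema.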

3. Price Monitoring & Alert Systems

Build arbitrage detectors, fare drop alerts, or historical pricing databases. The Protobuf-based approach's speed advantage compounds when you're issuing thousands of queries daily—Playwright overhead becomes a genuine cost center at scale.
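
The alerting logic itself is independent of how prices are fetched. A minimal sketch, assuming you already store past prices for a route as plain numbers (the function name fare_drop and the 15% default threshold are illustrative choices, not anything from fast-flights):

```python
from statistics import median
from typing import Optional, Sequence


def fare_drop(history: Sequence[float], latest: float, threshold: float = 0.15) -> Optional[float]:
    """Return the fractional drop versus the median past price, or None if below threshold."""
    if not history:
        return None
    baseline = median(history)
    if baseline <= 0:
        return None
    drop = (baseline - latest) / baseline
    return drop if drop >= threshold else None


alert = fare_drop([400, 420, 380], latest=300)
print(alert)
```

Comparing against the median rather than the previous observation keeps a single noisy quote from triggering an alert.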

4. Corporate Travel Dashboards

Enterprise travel management tools need unified search across carriers without GDS fees. fast-flights provides Google's aggregated view (which includes most major carriers) through a programmatic interface—perfect for internal tooling where per-search costs must approach zero.

5. Edge-Deployed Microservices

Serverless functions on Vercel, Cloudflare Workers, or AWS Lambda@Edge simply cannot run Playwright. fast-flights' pure HTTP+Protobuf architecture deploys anywhere Python runs—no Docker hacks, no layer gymnastics, no cold start death spirals.


Step-by-Step Installation & Setup Guide

Prerequisites

  • Python 3.8+ (3.11 recommended for performance)
  • pip or uv for package management
  • Optional: Bright Data account for production proxy rotation

Core Installation

# Standard installation (pre-release channel for v3 features)
pip install fast-flights==3.0rc0

# Or with uv for faster resolution
uv pip install fast-flights==3.0rc0

⚠️ Note: The README shows pip install fast-flights without version pinning, but PyPI currently hosts 3.0rc0 as the latest. Because pip skips pre-releases unless you pin one or pass --pre, pin explicitly to access v3 features.

Verify Installation

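import fast_flights  # Confirms the package imports cleanly
from importlib.metadata import version  # Standard library, Python 3.8+

print(version("fast-flights"))  # Should report 3.0rc0 or compatible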

Environment Configuration (Bright Data Integration)

For production deployments requiring proxy rotation:

# Set credentials via environment variables (recommended)
import os
os.environ["BRIGHTDATA_USERNAME"] = "your-zone-username"
os.environ["BRIGHTDATA_PASSWORD"] = "your-zone-password"

# Or configure in code (not recommended for committed code)
from fast_flights.integrations import BrightData
proxy = BrightData(
    username="your-zone-username",
    password="your-zone-password"
)

Development vs. Production Setup

| Environment | Configuration | Rationale |
|---|---|---|
| Local dev | No proxy, direct requests | Fast iteration, IP exposure acceptable |
| Staging | BrightData with limited zone | Test proxy behavior, cost control |
| Production | BrightData with rotating residential | Avoid rate limits, geographic distribution |

Health Check Pattern

from fast_flights import FlightQuery, Passengers, create_query, get_flights

# Minimal query to verify connectivity
test_query = create_query(
    flights=[FlightQuery(date="2025-06-01", from_airport="JFK", to_airport="LAX")],
    seat="economy",
    trip="one-way",
    passengers=Passengers(adults=1),
)

try:
    result = get_flights(test_query)
    print(f"✅ Connected: found {len(result.flights)} flights")
except Exception as e:
    print(f"❌ Failed: {e}")

REAL Code Examples from the Repository

Let's examine actual code from the fast-flights README, with deep technical commentary on what's happening under the hood.

Example 1: Basic Flight Search (The Core Pattern)

from fast_flights import (
    FlightQuery,           # Strongly-typed single flight leg specification
    Passengers,            # Passenger composition with validation
    create_query,          # Factory: compiles Python objects → Protobuf bytes
    get_flights            # Transport: executes HTTP request, parses response
)

# Build the search specification using domain objects
query = create_query(
    flights=[
        FlightQuery(
            date="YYYY-MM-DD",   # ISO 8601 date; validated at construction
            from_airport="MYJ",  # IATA code: Matsuyama Airport, Japan
            to_airport="TPE",    # IATA code: Taiwan Taoyuan International
        ),
    ],
    seat="economy",  # Enum-like string: constrains to valid cabin classes
    trip="one-way",  # Enum-like string: routing behavior modifier
    passengers=Passengers(adults=1),  # Extensible: supports children, infants
    language="zh-TW",  # BCP 47 language tag: affects carrier names, currency formatting
)

# Execute: serializes query to Protobuf → Base64 → URL parameter
# Receives JSONP-like response → parses to structured Python objects
res = get_flights(query)

What's actually happening here? The create_query function constructs a Protobuf message matching the schema the author reverse-engineered from Google Flights and named GoogleSucks (the author's label, not Google's). This message encodes flight segments with airport references as nested Airport messages (field numbers 13 and 14 in the discovered schema), passenger counts affecting pricing tiers, and locale metadata for response localization. Base64 encoding then makes the binary payload safe to place in a URL, at the cost of roughly a third more characters than the raw bytes.

Example 2: Proxy Integration for Production Resilience

from fast_flights import get_flights
from fast_flights.integrations import BrightData

# BrightData integration injects proxy configuration into the HTTP transport
get_flights(
    ...,  # Your existing query object
    integration=BrightData()  # Reads credentials from environment by default
)

Critical implementation detail: The integration parameter uses a strategy pattern: BrightData() returns a configuration object that the HTTP client layer applies when it initializes its requests or httpx session. This keeps proxy logic decoupled from core query construction, enabling future integrations (Oxylabs, Smartproxy, etc.) without API breakage.
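
The strategy pattern described above can be sketched in a few lines. Everything named here (Integration, StaticProxy, build_request_kwargs) is hypothetical and only illustrates the decoupling; it is not how fast-flights or its BrightData class is actually implemented.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


class Integration(Protocol):
    """Anything that can describe how outgoing requests should be routed."""

    def proxies(self) -> Dict[str, str]: ...


@dataclass
class StaticProxy:
    """Hypothetical integration that sends all traffic through one proxy URL."""

    url: str

    def proxies(self) -> Dict[str, str]:
        return {"http": self.url, "https": self.url}


def build_request_kwargs(url: str, integration: Optional[Integration] = None) -> dict:
    """Assemble keyword arguments for an HTTP client; query logic never sees the proxy."""
    kwargs = {"url": url, "timeout": 10}
    if integration is not None:
        kwargs["proxies"] = integration.proxies()
    return kwargs


direct = build_request_kwargs("https://example.com")
proxied = build_request_kwargs("https://example.com", StaticProxy("http://proxy.example:8080"))
print(direct, proxied)
```

Adding another provider under this design means writing one more class with a proxies() method; the query-building code and the call site stay untouched.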

Example 3: The Discovered Protobuf Schema (Reverse Engineering Artifact)

syntax = "proto3";

// Airport reference: field number 2 carries the IATA code
// Google's schema uses field 2 consistently for string identifiers
message Airport {
    string name = 2;
}

// Single flight leg: date at field 2, airports nested at 13/14
// The gap in field numbers (2 → 13 → 14) suggests evolution:
// earlier fields likely carried deprecated features
message FlightInfo {
    string date = 2;
    Airport dep_airport = 13;
    Airport arr_airport = 14;
}

// Top-level container: repeated field 3 holds all flight segments
// "GoogleSucks" = author's commentary on schema discoverability
message GoogleSucks {
    // Field name "data" is a placeholder; only the number 3 appears on the wire
    repeated FlightInfo data = 3;
}

Why this matters for users: You don't write this Protobuf directly—fast-flights handles compilation. But understanding the schema explains why the API is structured as it is. The FlightQuery Python class maps directly to FlightInfo; the flights=[...] list in create_query() becomes the repeated collection. The field number choices (2, 13, 14) aren't arbitrary—they're the actual wire positions Google's frontend uses, discovered through decoder analysis.
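
To show what those field numbers mean on the wire, here is a hand-rolled encoder for just the three messages in the excerpt above, using only the standard library. It is a sketch of the Protobuf wire format, not the library's implementation, and the function names are illustrative. Real Google Flights requests carry additional fields (seat, passengers, trip type) that the excerpt does not describe, so the token produced here is not a complete query.

```python
import base64


def varint(n: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Encode a length-delimited field (wire type 2): tag, length, payload."""
    tag = (field_number << 3) | 2
    return varint(tag) + varint(len(payload)) + payload


def airport(code: str) -> bytes:
    return length_delimited(2, code.encode())  # Airport.name = 2


def flight_info(date: str, dep: str, arr: str) -> bytes:
    return (
        length_delimited(2, date.encode())     # FlightInfo.date = 2
        + length_delimited(13, airport(dep))   # FlightInfo.dep_airport = 13
        + length_delimited(14, airport(arr))   # FlightInfo.arr_airport = 14
    )


def encode_query(legs) -> str:
    """Serialize legs as repeated field 3 and return URL-safe Base64 text."""
    body = b"".join(length_delimited(3, flight_info(*leg)) for leg in legs)
    return base64.urlsafe_b64encode(body).decode()


token = encode_query([("2025-06-01", "JFK", "LAX")])
print(token)
```

Each nested message is simply its own bytes wrapped in a tag and a length, which is why the Python list of legs maps so naturally onto the repeated field.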

Example 4: v3.0 JavaScript Data Extraction Pattern

While v3.0 specifics are evolving in the RC phase, the "Uses Javascript data instead" changelog entry points to a change in how responses are read, not in how queries are built: the library likely extracts structured data from <script> tags or window object hydration rather than relying on API responses. This pattern, also seen in Next.js and React applications, often exposes richer metadata than formal APIs:

# Conceptual v3.0 usage (inferred from changelog)
from fast_flights import get_flights

# Response may now include JavaScript-derived fields:
# - Real-time seat availability maps
# - Fare class breakdowns with baggage allowances
# - Carbon emission estimates
# - Airline-specific booking deep-links
res = get_flights(query)

# Access v3-enhanced attributes (hypothetical, based on direction)
for flight in res.flights:
    print(flight.co2_emissions_kg)  # Environmental data
    print(flight.baggage_included)   # Policy transparency

Advanced Usage & Best Practices

Query Batching for Multi-City Itineraries

The flights=[...] list isn't limited to one element. For complex trips:

multi_city = create_query(
    flights=[
        FlightQuery(date="2025-07-01", from_airport="NYC", to_airport="LON"),
        FlightQuery(date="2025-07-10", from_airport="LON", to_airport="BER"),
        FlightQuery(date="2025-07-20", from_airport="BER", to_airport="NYC"),
    ],
    trip="multi-city",  # Critical: must match flight count
    # ...
)
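
Before building a multi-city query it can help to sanity-check the legs. The helper below is a standalone sketch working on plain tuples; check_itinerary is a hypothetical name, and whether fast-flights performs any such validation itself is not something the README states.

```python
from datetime import date
from typing import List, Tuple

Leg = Tuple[str, str, str]  # (date, from_airport, to_airport)


def check_itinerary(legs: List[Leg]) -> List[str]:
    """Return human-readable problems; an empty list means the legs look consistent."""
    problems = []
    for i in range(1, len(legs)):
        prev, cur = legs[i - 1], legs[i]
        if date.fromisoformat(cur[0]) < date.fromisoformat(prev[0]):
            problems.append(f"leg {i + 1} departs before leg {i}")
        if cur[1] != prev[2]:
            problems.append(f"leg {i + 1} starts at {cur[1]}, but leg {i} ended at {prev[2]}")
    return problems


legs = [
    ("2025-07-01", "NYC", "LON"),
    ("2025-07-10", "LON", "BER"),
    ("2025-07-20", "BER", "NYC"),
]
print(check_itinerary(legs))
```

Open-jaw trips legitimately start a leg somewhere other than where the previous one ended, so the second check is better treated as a warning than as a hard error.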

Error Handling for Schema Evolution

Google changes field meanings without notice. Defensive code:

from fast_flights import get_flights

try:
    res = get_flights(query)
except Exception as e:  # Broad catch due to RC status; no dedicated exception type is assumed
    if "protobuf" in str(e).lower() or "decode" in str(e).lower():
        # Pin to last known working version
        # Report to: https://github.com/AWeirdDev/flights/issues
        raise RuntimeError(f"Schema drift detected: {e}") from e
    raise  # Unrelated failures propagate unchanged

Caching Strategy

Flight prices change, but availability patterns are semi-stable:

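import hashlib
import json

from fast_flights import get_flights

_cache = {}

def cached_get_flights(query, params: dict):
    # params: the plain values used to build `query` (dates, airports, seat, ...)
    # Key on a deterministic serialization of those search parameters
    key = hashlib.sha256(json.dumps(params, sort_keys=True).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = get_flights(query)
    return _cache[key]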
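
Because prices go stale, a cache for flight results usually needs an expiry. The class below is a small time-to-live cache written for illustration; TTLCache and its method names are hypothetical, and the fetch callable stands in for whatever performs the real search.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Keep each value for at most `ttl_seconds`, then fetch it again."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh
        value = fetch()
        self._store[key] = (now, value)
        return value


cache = TTLCache(ttl_seconds=900)
result = cache.get_or_fetch(("JFK", "LAX", "2025-06-01"), lambda: "search result placeholder")
print(result)
```

A short TTL (minutes) suits price displays, while a longer one is reasonable for data that changes slowly, such as which carriers serve a route.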

Rate Limiting Self-Protection

Even with BrightData, respect Google's infrastructure:

import time
from random import uniform

from fast_flights import get_flights

def respectful_search(queries):
    for q in queries:
        yield get_flights(q)
        time.sleep(uniform(2.0, 5.0))  # Human-like pacing: random 2-5 s pause
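
Fixed pauses handle the normal case; when a request is actually rejected, backing off is the usual complement. The wrapper below is a generic sketch of exponential backoff with full jitter. The name with_backoff and its defaults are illustrative, and it retries on any exception because the article does not document which errors the library raises when rate limited.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `call` with exponential backoff and full jitter; re-raise the last error."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Full jitter: wait a random time up to base_delay * 2**attempt seconds.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise ValueError("attempts must be at least 1")


value = with_backoff(lambda: "ok")
print(value)
```

In practice the callable would be something like lambda: get_flights(q), and narrowing the except clause to the specific rate-limit error is worthwhile once that error type is known.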

Comparison with Alternatives

| Dimension | fast-flights | hugoglvs/google-flights-scraper | SerpAPI | Enterprise QPX |
|---|---|---|---|---|
| Core Technology | Protobuf over HTTP | Playwright browser automation | Proxy farm + parser | Official Google API |
| Speed | ⚡ Sub-second | 🐢 5-30 seconds | ⚡ Fast (cached) | ⚡ Fast |
| Cost | Free (OSS) + proxy optional | Free (OSS) + infrastructure | $50-200/month | $$$$ (negotiated) |
| Type Safety | ✅ Strong (Python dataclasses) | ❌ Weak (dict/string manipulation) | ⚠️ JSON schema | ✅ Strong |
| Edge Deployable | ✅ Yes | ❌ No (Chromium required) | ✅ Via proxy config | ❌ Corporate VPN |
| Maintenance Burden | Low (stable Protobuf) | High (DOM changes break) | None (managed) | Low |
| Data Freshness | Real-time | Real-time | Cached (stale risk) | Real-time |
| Legal Clarity | Gray (scraping) | Gray (scraping) | Gray (reselling) | ✅ Licensed |

The Verdict: Choose fast-flights when you need production speed without enterprise budget, value type safety in Python ecosystems, and can tolerate maintenance of proxy infrastructure. It's the sweet spot between unmaintainable free scrapers and prohibitively expensive official channels.


FAQ: What Developers Actually Ask

Is fast-flights legal to use?

Google's Terms of Service prohibit automated access to consumer services. However, enforcement varies; the library operates in the same gray zone as SerpAPI and similar services. Use responsibly, respect rate limits, and consider legal counsel for commercial deployments.

Why Protobuf instead of JSON or GraphQL?

Google's frontend uses Protobuf natively—we're not choosing it, we're matching their protocol. This provides structural authenticity that reduces detection risk and payload efficiency that improves latency.

Will Google break this by changing their schema?

Possible, but Protobuf schemas evolve with backward-compatible field addition by design. The repeated and nested message patterns provide extension points. The v2→v3 transition shows the maintainer actively adapts to changes.

Can I use this without Bright Data?

Yes, for low-volume personal projects. Google's rate limiting will block repeated requests from single IPs. Bright Data (or similar) becomes essential for production workloads.

How does v3.0 differ from v2.x?

Per the changelog, v3.0 "uses Javascript data instead", which suggests that results are now read from data embedded in the page's JavaScript, likely hydration data from Google's frontend. This may provide richer metadata but could be more fragile to frontend updates.

Is this suitable for real-time price comparison?

With proper proxy rotation and caching, yes. Without infrastructure investment, you'll hit rate limits. The library is capable; your architecture determines reliability.

What about multi-passenger or complex cabin mixes?

The Passengers class supports adults, children, and infants. Cabin class applies to entire itineraries—per-segment mixed cabins aren't currently exposed in the API surface.


Conclusion: The Protobuf-Powered Path to Flight Data Freedom

The death of Google's official Flights API in 2018 created a vacuum that slow, brittle Playwright scrapers rushed to fill. fast-flights represents something smarter: understanding Google's actual communication protocol and speaking it fluently.

By reverse-engineering the Protobuf schemas hidden in tfs parameters, AWeirdDev built a tool that doesn't simulate user behavior—it replicates legitimate request structures with type-safe Python abstractions. The result is speed that browser automation cannot touch, deployability that serverless platforms welcome, and an API surface that professional developers can maintain.

Is it perfect? No—it's release candidate software in an adversarial environment where Google could shift schemas overnight. But the architectural approach is sound, the maintenance is active, and the alternative is either crawling through Playwright molasses or paying enterprise tolls.

If you're building anything that needs real flight data—AI agents, price monitors, travel tools, corporate dashboards—stop accepting slow as inevitable. The Protobuf secret is out. Use it.

👉 Get started now: github.com/AWeirdDev/flights

Install, decode, fly.
