Found in translation: announcing a library of GTFS Realtime translators

Tyler Green
IxN — The Intersection Blog
Jul 2, 2019


Converging on the specification aids in delivering accurate transit information to more riders.

Imagine unpacking books to place on a bookshelf, and discovering that every book is a different shape and size. Maybe the first one is a normal-sized paperback, but the next one is an oversized coffee table book. And what if the third book has the form of a heart-shaped box of chocolates? All of these books have pages, but they are not easy to store next to each other on the same shelf. Now imagine that each of those books is a set of real-time transit arrivals and the bookshelf is a data pipeline. How would you begin to process each of them?

At Intersection, our systems process thousands of transit data points each day. Building a separate system for each set of these would be time-consuming and inefficient.

Fortunately, a common language exists for transmitting real-time transit information: the GTFS Realtime specification. It is commonly thought of as the output of a producer (e.g., a transit operator) rather than something a consumer (e.g., a third-party app) creates, but as a data model it can provide tremendous value even when applied later in the transit data lifecycle.

We are excited to release a set of open-source GTFS Realtime Translators! They are released under the Apache 2.0 license, which we find to be a balanced license with advantages like broad compatibility. The best part: we are not announcing a new specification! Let’s strengthen the existing ones.

“How Standards Proliferate”, Source: xkcd “Standards”

Motivation

For a variety of historical reasons, many transit operators in North America publish their real-time updates in a custom format. Our goal is to help bring their transit data to more riders.

To achieve this, our technical motivation in building the GTFS Realtime Translators is straightforward: facilitate processing of transit data in a single pipeline. Language bindings already exist in many popular programming languages for consuming GTFS Realtime data. For data feeds in custom formats, the translators are a key step immediately before using a language binding. We see the translators as increasing the importance of the language bindings; the translators exist to construct the input for a language binding.

There is a softer side to why we built the translators: to increase awareness of the GTFS Realtime specification and promote adoption of its vocabulary.

The GTFS Realtime specification pairs up-to-the-second real-time data with a previously-scheduled trip from GTFS static. This interdependence with the static schedule data is fundamental to GTFS Realtime. It promotes a paradigm where you don’t need to transmit structural information about your transit system (“What routes are served by this stop?”, “Does this trip run on weekends?”, etc.) through a performant API. Just publish a rock-solid GTFS static feed, and offer up any updates to that through a GTFS Realtime API.

Architecture

The GTFS Realtime Translators are designed to sit between data produced by transit operators and a system to process GTFS Realtime feeds.

As the diagram above shows, the input to a translator is a custom real-time arrivals feed and the output is that same feed in GTFS Realtime format. We expect that users of these translators will want to integrate them as a pre-processing step in a larger transit data processing system. The translators are designed to make the first step of this easier for the transit community!

The GTFS Realtime Translators library (green) is a series of layers built around the GTFS Realtime specification (yellow).

At the core of the GTFS Realtime Translators is the GTFS Realtime Specification. This is a .proto file maintained in the google/transit repository. This defines the output schema of each individual translator.

The first layer provided in the library itself is the Factories. These are Python objects which wrap the fields and their relationships defined in the specification. For now, these are TripUpdate and FeedMessage. (Alert and VehiclePosition will be supported in future releases.) A FeedMessage consists of one or more TripUpdates. That a TripUpdate actually consists of a StopTimeUpdate, two StopTimeEvents, and a TripDescriptor (at minimum) is of no concern to a user of a factory. When building a new translator, the factories are intended to be your building blocks.
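To make the nesting that a factory hides more concrete, here is a minimal sketch using plain Python dictionaries. The message and field names (FeedMessage, TripUpdate, StopTimeUpdate, StopTimeEvent, TripDescriptor) come from the GTFS Realtime specification, but the function names make_trip_update and make_feed_message are hypothetical and are not the library's actual factory API.

```python
from typing import Any, Dict, Iterable


def make_trip_update(entity_id: str, trip_id: str, stop_id: str,
                     arrival_time: int, departure_time: int) -> Dict[str, Any]:
    """Hypothetical factory: one feed entity carrying a single TripUpdate."""
    return {
        "id": entity_id,
        "trip_update": {
            "trip": {"trip_id": trip_id},                # TripDescriptor
            "stop_time_update": [{                       # StopTimeUpdate
                "stop_id": stop_id,
                "arrival": {"time": arrival_time},       # StopTimeEvent
                "departure": {"time": departure_time},   # StopTimeEvent
            }],
        },
    }


def make_feed_message(entities: Iterable[Dict[str, Any]],
                      timestamp: int) -> Dict[str, Any]:
    """Hypothetical factory: wrap entities in a FeedMessage-shaped dict."""
    return {
        "header": {"gtfs_realtime_version": "2.0", "timestamp": timestamp},
        "entity": list(entities),
    }


# The caller supplies five flat values; the nesting stays out of sight.
# All identifiers and times below are made-up sample values.
feed = make_feed_message(
    [make_trip_update("1", "trip-42", "80122", 1562070000, 1562070030)],
    timestamp=1562069990,
)
```

The point of the design is that a translator author only passes flat values parsed from the custom feed and never has to assemble the nested messages by hand.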

The next layer is the Translators themselves. Our v0.2.0 release contains the LaMetroGtfsRealtimeTranslator and SeptaRegionalRailTranslator. The translator layer takes as input the custom real-time arrivals data from a producer in whatever size it is provided (more on this in the next section). A translator parses this real-time input and uses the factories to construct a GTFS Realtime compliant output. An example of translator usage appears as follows.

translator = LaMetroGtfsRealtimeTranslator(la_metro_rail_input_data, stop_id='80122')
feed_bytes = translator.serialize()

The final layer provided in the library is the Registry. The registry provides a standard interface for any downstream processing systems. To continue the LA Metro example, code which uses the translator does not need to be aware of the actual Python class name.

translator_klass = TranslatorRegistry.get('la-metro')
kwargs = {'stop_id': '80122'}
translator = translator_klass(input_data, **kwargs)
feed_bytes = translator.serialize()
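For readers curious how such a lookup can work, the following is a minimal sketch of the registry pattern in general. It is not the library's implementation: SketchRegistry, its register decorator, the "example-agency" key, and ExampleTranslator are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict


class SketchRegistry:
    """Hypothetical registry mapping string keys to translator classes."""

    _translators: Dict[str, type] = {}

    @classmethod
    def register(cls, key: str) -> Callable[[type], type]:
        """Class decorator that files a translator class under a key."""
        def decorator(klass: type) -> type:
            cls._translators[key] = klass
            return klass
        return decorator

    @classmethod
    def get(cls, key: str) -> type:
        """Return the class registered under key, or raise ValueError."""
        try:
            return cls._translators[key]
        except KeyError:
            raise ValueError(f"No translator registered for {key!r}") from None


@SketchRegistry.register("example-agency")
class ExampleTranslator:
    """Stand-in translator; a real one would build a GTFS Realtime feed."""

    def __init__(self, data: str, **kwargs: str) -> None:
        self.data = data
        self.stop_id = kwargs.get("stop_id")

    def serialize(self) -> bytes:
        # Placeholder payload, not a real protobuf serialization.
        return f"{self.stop_id}:{self.data}".encode("utf-8")


# Downstream code only needs the string key, never the class name.
translator_klass = SketchRegistry.get("example-agency")
payload = translator_klass("raw-arrivals", stop_id="80122").serialize()
```

Keeping the key-to-class mapping in one place means a pipeline can be driven by configuration (a list of agency keys) rather than by imports of specific classes.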

Compromises & Customizations

Remember when I less-than-eloquently said that a translator accepts data “in whatever size it is provided”? This is a key difference between data produced by a GTFS Realtime Translator and a GTFS Realtime feed produced directly by a transit operator.

Kurt Raschke gave an insightful presentation on this at TransportationCamp Philadelphia 2019. One of his main points is that a GTFS Realtime feed provides a “synoptic” view of a transit system. On a single request, a GTFS Realtime feed can give you a view of every trip and vehicle active in the system.

In contrast, a translator can only produce as much of a system view as it is provided. Since many custom real-time arrivals feeds only deliver data for a single transit stop (what Raschke labels “piecewise”), your translator's GTFS Realtime output will also be for a single stop. This means it can take multiple translator calls to produce a series of GTFS Realtime outputs for an entire system, even though each piece of output is GTFS Realtime compliant. (In theory, you could write a translator to accept data from more than one set of custom arrivals to turn the translator output into a synoptic view. This is left as an exercise to the reader.)
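As a starting point for that exercise, here is one possible sketch of combining several per-stop outputs into a single system-wide view. It assumes each piecewise feed has already been decoded into a FeedMessage-shaped dictionary and that every TripUpdate carries a trip_id and an arrival time. The function name merge_piecewise_feeds and the sample data are hypothetical, and sorting by arrival time is only a stand-in for the stop_sequence ordering that the specification asks for.

```python
from typing import Any, Dict, Iterable, List

Feed = Dict[str, Any]


def merge_piecewise_feeds(feeds: Iterable[Feed]) -> Feed:
    """Group the stop time updates from many per-stop feeds by trip_id."""
    by_trip: Dict[str, Dict[str, Any]] = {}
    timestamps: List[int] = []
    for feed in feeds:
        timestamps.append(feed["header"]["timestamp"])
        for entity in feed["entity"]:
            update = entity["trip_update"]
            trip_id = update["trip"]["trip_id"]
            merged = by_trip.setdefault(
                trip_id, {"trip": {"trip_id": trip_id}, "stop_time_update": []}
            )
            merged["stop_time_update"].extend(update["stop_time_update"])
    for update in by_trip.values():
        # Assumption: arrival time approximates the order of stops on a trip.
        update["stop_time_update"].sort(key=lambda u: u["arrival"]["time"])
    return {
        # Use the oldest input timestamp so the merged feed never
        # claims to be fresher than its stalest piece.
        "header": {"gtfs_realtime_version": "2.0",
                   "timestamp": min(timestamps, default=0)},
        # Entity ids must be unique within a feed, so renumber them.
        "entity": [{"id": str(i), "trip_update": u}
                   for i, u in enumerate(by_trip.values(), start=1)],
    }


def _entity(trip_id: str, stop_id: str, arrival: int) -> Dict[str, Any]:
    """Build a sample entity; all values are made up."""
    return {"id": "x", "trip_update": {
        "trip": {"trip_id": trip_id},
        "stop_time_update": [{"stop_id": stop_id, "arrival": {"time": arrival}}],
    }}


feed_a = {"header": {"timestamp": 100}, "entity": [_entity("T1", "A", 160)]}
feed_b = {"header": {"timestamp": 105},
          "entity": [_entity("T1", "B", 220), _entity("T2", "B", 300)]}
synoptic = merge_piecewise_feeds([feed_b, feed_a])
```

Feeds that lack a trip ID cannot be grouped this way, which is one reason a truly synoptic view is harder to build from piecewise sources than it first appears.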

Another interesting aspect of writing a translator is that some custom real-time arrivals feeds do not provide a trip ID to match an arrival back to static; for these, the GTFS Realtime Translator library recommends using custom extensions to GTFS Realtime. We have created Intersection extensions which are built into the translators and used by the factories.

For example, SEPTA’s real-time feed for regional rail departures provides enough data to be fully presented to riders, but no way to join each departure with its corresponding trip in GTFS static. Key trip details are usually supplied by GTFS static; therefore, losing this connection requires custom extensions to pass these trip details along in the real-time feed. The SeptaRegionalRailTranslator is an example of this: it uses the headsign field of the IntersectionTripUpdate message, which is an extension of the core TripUpdate.
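A consumer of such a feed might resolve the headsign roughly as sketched below: prefer GTFS static when a trip ID allows the join, and fall back to the extension field otherwise. This is an illustration only. The dictionary layout, the "intersection_trip_update" key, the resolve_headsign function, and the sample data are assumptions; only the headsign field name and the trip_headsign column of the static trips.txt file come from the sources.

```python
from typing import Any, Dict, Optional

# Stand-in for rows of GTFS static trips.txt, keyed by trip_id (sample data).
STATIC_TRIPS: Dict[str, Dict[str, str]] = {
    "trip-1": {"trip_headsign": "Airport"},
}


def resolve_headsign(trip_update: Dict[str, Any],
                     static_trips: Dict[str, Dict[str, str]]) -> Optional[str]:
    """Prefer the static headsign; fall back to the extension field."""
    trip_id = trip_update.get("trip", {}).get("trip_id")
    if trip_id in static_trips:
        return static_trips[trip_id].get("trip_headsign")
    # No join is possible, so read the value carried by the extension.
    extension = trip_update.get("intersection_trip_update", {})
    return extension.get("headsign")


with_trip_id = {"trip": {"trip_id": "trip-1"}}
without_trip_id = {"trip": {},
                   "intersection_trip_update": {"headsign": "Center City"}}

joined = resolve_headsign(with_trip_id, STATIC_TRIPS)
fallback = resolve_headsign(without_trip_id, STATIC_TRIPS)
```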

What’s Next?

We are thrilled to release the GTFS Realtime Translators library to the transit community! As I mentioned, future releases will support the Alert and VehiclePosition domain objects defined by the GTFS Realtime specification.

For details on how to install the GTFS Realtime Translators and some usage examples, please explore the library README.

Are you producing or consuming a custom real-time arrivals feed? Consider developing a translator and contributing it back to the library. We look forward to discovering patterns in custom feeds with the community and building that knowledge into future library releases.

Together we can reduce the friction that exists between how transit systems operate and how riders learn about the state of their operation.

Interested in joining Intersection’s engineering team? Apply to join us!
