Musings on the TransXChange UK PTI Profile
As part of the DfT Bus Open Data Service (BODS) project, a new TransXChange version 2.4 profile snappily entitled the UK Passenger Transport Information Profile (TXC-PTI for short) has been commissioned, with the aim of providing a common standard for exchanging UK timetable and schedule data within BODS and similar systems. The initial draft documentation, running to 66 pages, was issued on 30th April 2020. Subject to the laying of secondary legislation in parliament, every bus operator in England will be legally required to publish timetable data that complies with this profile from January 2021.
As anyone familiar with TransXChange will already be aware, it is a complex beast that presents a high barrier to entry, not least because there are so many different ways to represent the same information – and as you would expect, each software supplier has generally chosen the way that suits their underlying system best. Woe betide any transport data novice who suggests that they will “just pull the relevant trips out with a couple of lines of code”; if only life were so straightforward. The schema guide alone for TransXChange version 2.4 runs to some 210 pages. Compare that with the GTFS Specification that fits comfortably onto 21 sides of A4 (yes, that’s 10% of the size in crude terms) and you begin to understand why a little shudder goes down the spine of many software engineers whenever the word “TransXChange” is uttered. However, this article isn’t a TransXChange vs GTFS debate; there are plenty of column inches devoted to that elsewhere. Suffice to say that Elydium’s principal criterion when reviewing the TXC-PTI draft is an attempt to think from the perspective of future BODS data consumers and to evaluate the usability of a national data set that is presented as a large collection of TXC-PTI documents that have been generated by a multitude of data producers using a disparate set of supplier systems.
Coming at it from that perspective, it is therefore a great relief to note that we will (finally) have a more understandable and less ambiguous standard that everyone can rally behind. Our initial reading of the draft suggests that there are plenty of positives that we should be celebrating: the mandated use of National Operator Codes, consistent service coding across operators and regional boundaries, and clarity of version numbering requirements – to name but a few.
Now for the technical deep dive…
Historically, perhaps the biggest obstacles facing newcomers to TransXChange have been the hierarchical override/inheritance principle and the data entity normalisations, which were originally intended as a means to minimise the size of the resulting XML files. (Don’t forget that TransXChange was originally designed back in the days when a 512kbps internet connection was something pretty special and relational databases were regarded as the only professional show in town.) We were therefore relieved to see that many such features of the underlying XML schema have been identified and eliminated in the new profile, which goes so far as to state that:
“The overriding principle is that data should be stated, once, where is best placed. In TXC there is too much scope for writing a mixture of elements, some of which are inherited, some of which are not”
2.1 – General Principles
and later:
“TXC-PTI adopts the approach that the best place to hold the information about a trip is within the trip itself”
9.1 – Vehicle Journeys – Introduction
So far, so good. At Elydium we are strong advocates of these two principles, even though they are sometimes at odds with the more traditional approach in which data normalisation and minimal file sizes are paramount.
“A principle behind TXC-PTI is that data which is declared as a default and then over-ridden in a more specialised element should be avoided wherever possible, as this leads to confusion. On that basis, it is preferable that the Service/OperatingProfile element should be omitted. However, this does mean that the size of files could expand dramatically when there is no need to because all OperatingProfile information has to be encoded into individual VehicleJourney elements.”
5.3.3 – Operating Dates and Patterns
There’s nothing to disagree with here either: after all, a larger file size is a small price to pay if it avoids confusion and accidental misreading of the data, isn’t it? (Besides which, once an XML file is zipped for upload/download, the size difference is likely to be negligible anyway.)
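For anyone who wants to test that hunch about zipped sizes, here is a minimal Python sketch. It is a synthetic illustration only: the XML fragments are heavily simplified and hypothetical, the function name build_document is our own, and nothing here is a measurement of real BODS files. It builds one document with the operating profile stated once at Service level and another with the same profile repeated in every VehicleJourney, then compares how much the file grows before and after compression.

```python
import zlib

# Hypothetical, heavily simplified fragment; real TXC-PTI journeys carry far more detail.
PROFILE = (
    "<OperatingProfile><RegularDayType><DaysOfWeek><MondayToFriday/>"
    "</DaysOfWeek></RegularDayType></OperatingProfile>"
)


def build_document(journey_count: int, profile_per_journey: bool) -> bytes:
    """Build a synthetic document with the profile stated once or in every journey."""
    journeys = []
    for i in range(journey_count):
        hour = 6 + (i // 60) % 18
        minute = i % 60
        body = PROFILE if profile_per_journey else ""
        body += f"<VehicleJourneyCode>VJ{i}</VehicleJourneyCode>"
        body += f"<DepartureTime>{hour:02d}:{minute:02d}:00</DepartureTime>"
        journeys.append(f"<VehicleJourney>{body}</VehicleJourney>")
    # When the profile is not repeated per journey, state it once on the Service.
    service = "<Service>" + ("" if profile_per_journey else PROFILE) + "</Service>"
    return (service + "<VehicleJourneys>" + "".join(journeys) + "</VehicleJourneys>").encode()


shared = build_document(500, profile_per_journey=False)
repeated = build_document(500, profile_per_journey=True)

# Growth caused by repeating the profile, before and after compression.
raw_growth = len(repeated) - len(shared)
zipped_growth = len(zlib.compress(repeated)) - len(zlib.compress(shared))
print(f"raw growth: {raw_growth} bytes, zipped growth: {zipped_growth} bytes")
```

We have deliberately not quoted figures, since they depend entirely on the input; highly repetitive markup is exactly what general-purpose compression handles well, but real files should be measured before drawing firm conclusions.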
We were therefore somewhat dismayed to discover that, despite such good intentions, there are still several areas in TXC-PTI over which the “old ways” retain something of a stranglehold:
“For TXC-PTI, therefore, this principle can be relaxed for OperatingProfile, so long as the operating profile completely defines the operations of the largest number of trips in full (e.g. it defines Mondays to Fridays excluding Bank Holidays). Individual VehicleJourney elements then have either:
a) no OperatingProfile, because they follow the default pattern; or
b) a complete set of OperatingProfile elements which completely replace the default pattern,
and which describe in full when that VehicleJourney runs.”
5.3.3 – Operating Dates and Patterns
Oh dear, we were doing so well. By permitting this “either/or” approach, TXC-PTI immediately introduces unnecessary complexity that will result in additional development and support time for each data consumer, who now has to handle multiple use cases and varying interpretations of what constitutes compliant data across any number of supplying systems. So much for “the best place to hold the information about a trip is within the trip itself”. And all to save a few kilobytes per file. As you can probably guess, at Elydium we would much prefer a total ban on defining OperatingProfile within Service, so that the definition of when each trip is valid is always contained within the trip itself.
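To make the extra burden concrete, here is a minimal Python sketch of the fallback logic that every data consumer would need in order to honour cases (a) and (b) from the quoted section. It rests on several assumptions: the sample document is a hypothetical, heavily simplified fragment (the XML namespace and most mandatory elements are omitted), only the DaysOfWeek part of a profile is read, and the function name effective_days is ours rather than anything defined by TXC-PTI.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment: the real namespace and most mandatory
# elements are left out so that the fallback logic is easy to see.
SAMPLE = """
<TransXChange>
  <Services>
    <Service>
      <ServiceCode>SVC1</ServiceCode>
      <OperatingProfile>
        <RegularDayType><DaysOfWeek><MondayToFriday/></DaysOfWeek></RegularDayType>
      </OperatingProfile>
    </Service>
  </Services>
  <VehicleJourneys>
    <VehicleJourney>
      <VehicleJourneyCode>VJ1</VehicleJourneyCode>
      <ServiceRef>SVC1</ServiceRef>
    </VehicleJourney>
    <VehicleJourney>
      <OperatingProfile>
        <RegularDayType><DaysOfWeek><Saturday/></DaysOfWeek></RegularDayType>
      </OperatingProfile>
      <VehicleJourneyCode>VJ2</VehicleJourneyCode>
      <ServiceRef>SVC1</ServiceRef>
    </VehicleJourney>
  </VehicleJourneys>
</TransXChange>
"""


def effective_days(root):
    """Map each VehicleJourneyCode to the day elements that apply to it."""
    service_profiles = {
        svc.findtext("ServiceCode"): svc.find("OperatingProfile")
        for svc in root.iter("Service")
    }
    result = {}
    for vj in root.iter("VehicleJourney"):
        # Case (b): the journey's own profile replaces the default outright.
        profile = vj.find("OperatingProfile")
        if profile is None:
            # Case (a): fall back to the Service-level default.
            profile = service_profiles.get(vj.findtext("ServiceRef"))
        days_of_week = None if profile is None else profile.find("RegularDayType/DaysOfWeek")
        days = [] if days_of_week is None else [day.tag for day in days_of_week]
        result[vj.findtext("VehicleJourneyCode")] = days
    return result


print(effective_days(ET.fromstring(SAMPLE)))
# prints {'VJ1': ['MondayToFriday'], 'VJ2': ['Saturday']}
```

Were OperatingProfile banned at Service level, the lookup table and the fallback branch would disappear, and each trip could be read in isolation.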
“Additionally, a tendency in TXC is to export entire routes as single route sections, which then repeat sequences of stops when there are variations to routes e.g. short workings.”
7 – Routes and Tracks
“As with routes, there has been a tendency in TXC to generate journey patterns which consist of a single journey pattern section with all of the timing links required for that particular journey pattern.”
8.1 – Journey Timings on Standard Services – Introduction
Absolutely. It’s a case of simplicity over complexity every time if you’re a software engineer. Less complexity means a simpler technical specification – leading to shorter development cycles, fewer bugs and a happier life.
“It is strongly recommended that a TXC-PTI document uses efficiency when creating RouteSections, breaking them down into logical sections, so that re-use within Routes is readily achievable. This will help limit file sizes.”
7 – Routes and Tracks
“In TXC-PTI it is strongly recommended that JourneyPatternSections are logically structured to facilitate re-use within JourneyPatterns to help minimise file sizes”
8.1 – Journey Timings on Standard Services – Introduction
This is a shame. Having such recommendations means that all receiving systems need to be able to re-assemble RouteSections and JourneyPatternSections correctly. In and of itself, it probably won’t require a massive amount of additional development resource to achieve – but it’s another small hill to climb for an app developer who simply wants to extract a set of trips from an XML file and cannot comprehend why it’s so much more difficult to do this with a TransXChange file than it is with other standards.
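As a rough indication of what that re-assembly involves, here is a minimal Python sketch that flattens re-used JourneyPatternSections back into one stop sequence per JourneyPattern. The sample is a hypothetical, simplified fragment with the namespace omitted, the function name stop_sequences is ours, and the code assumes that consecutive timing links join end to start (it does not verify this).

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: two sections, re-used by a full-length pattern and a short working.
SAMPLE = """
<TransXChange>
  <JourneyPatternSections>
    <JourneyPatternSection id="JPS1">
      <JourneyPatternTimingLink id="L1">
        <From><StopPointRef>A</StopPointRef></From>
        <To><StopPointRef>B</StopPointRef></To>
        <RunTime>PT5M</RunTime>
      </JourneyPatternTimingLink>
    </JourneyPatternSection>
    <JourneyPatternSection id="JPS2">
      <JourneyPatternTimingLink id="L2">
        <From><StopPointRef>B</StopPointRef></From>
        <To><StopPointRef>C</StopPointRef></To>
        <RunTime>PT7M</RunTime>
      </JourneyPatternTimingLink>
    </JourneyPatternSection>
  </JourneyPatternSections>
  <Services><Service><StandardService>
    <JourneyPattern id="JP_FULL">
      <JourneyPatternSectionRefs>JPS1</JourneyPatternSectionRefs>
      <JourneyPatternSectionRefs>JPS2</JourneyPatternSectionRefs>
    </JourneyPattern>
    <JourneyPattern id="JP_SHORT">
      <JourneyPatternSectionRefs>JPS1</JourneyPatternSectionRefs>
    </JourneyPattern>
  </StandardService></Service></Services>
</TransXChange>
"""


def stop_sequences(root):
    """Return the ordered stop list for each JourneyPattern id."""
    # Index the sections by id so that each reference can be resolved.
    sections = {s.get("id"): s for s in root.iter("JourneyPatternSection")}
    sequences = {}
    for pattern in root.iter("JourneyPattern"):
        stops = []
        for ref in pattern.findall("JourneyPatternSectionRefs"):
            for link in sections[ref.text].findall("JourneyPatternTimingLink"):
                if not stops:
                    # Only the very first link contributes its origin stop.
                    stops.append(link.findtext("From/StopPointRef"))
                stops.append(link.findtext("To/StopPointRef"))
        sequences[pattern.get("id")] = stops
    return sequences


print(stop_sequences(ET.fromstring(SAMPLE)))
# prints {'JP_FULL': ['A', 'B', 'C'], 'JP_SHORT': ['A', 'B']}
```

None of this is difficult, but it is an indexing and stitching step that a consumer of a single-section file would never have to write, and RouteSections need the same treatment.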
It’s not that we are anti-re-use. Far from it; where software development is concerned we’re all for it! It’s the choice that we take issue with: with too many “recommended” and “strongly recommended” options, we end up in a minefield of possibilities when we try to process the resulting files. We would much prefer it to be mandatory to generate routes and journey patterns that each consist of a single route section or journey pattern section – even if the resulting files are a few kilobytes larger.
“However, for reasons of readability and compactness it is possible to define default patterns and timings that suit the majority of the trips, and then override them in individual vehicle journeys where appropriate.”
9.1 – Vehicle Journeys – Introduction
It’s hard to imagine whose readability is being referred to here; from the perspective of a data consumer, surely readability is improved by adhering to the aforementioned principle that “the best place to hold the information about x is within x itself”. Whilst setting of defaults (and then overriding them) is well-understood by those of us who have been using TransXChange for 15+ years, it really doesn’t seem to be a sensible way of encouraging wider adoption of the official national Bus Open Data repository by app developers who are not experts in UK transport data modelling.
Ultimately, it’s in everyone’s interest that the data supplied by BODS is as easily understandable and usable as it is accessible to its target users. The last thing anyone wants is endless arguments between data producers and consumers surrounding what the data is supposed to mean vs what the data actually says – yet whilst any complexities of inheritance, overrides and excessive normalisation remain in TXC-PTI, it seems highly likely that this will be a regular occurrence. Since there are no plans for an official arbiter of TransXChange correctness of interpretation, we run the risk of data consumers walking away from BODS and either using different sources (including unofficial GTFS feeds of varying quality) or worse still, shying away from the development of transport apps completely. Either way, this would rather defeat the original objective of investing so much time and effort in a national Bus Open Data repository.
Whatever happens, there are two unwritten rules that always seem to hold true: firstly, timetable and schedule data exchange is complex. Really, really complex when compared with the majority of Open Data that app developers are used to consuming. Secondly, it’s relatively easy to develop software that correctly handles data from a single data producer at a given point in time; it’s an entirely different proposition to ensure that your software correctly handles any compliant data from all data producers (who are at liberty to switch supplier systems as and when they please) – especially when the standard data format contains a number of optional ways of conveying the same piece of information. Surely the best solution is to simplify the standard as much as you dare!