Sunday, November 13, 2016

How far to the nearest bus stop?

One question that occurred to me when I attended the Rebooting the City Deal meeting was how dense the coverage of a public transport system needs to be to make it useful.

I've seen several statements along the lines of x% of Cambridge being within a 7-minute (or whatever) walk of a bus stop. But what is a reasonable walking distance?

I don't know, but I have a little data. OK, 2 data points. Where I live I have a choice of 2 bus stops. One is about 300m away, or 4 minutes. The other is about 500m with a slightly tricky crossing, or about 7 minutes. I would have no issues using the closer stop on a regular basis; the more distant one I'm not terribly happy with.

So, if we want to encourage large-scale use of public transport, I would say we want to aim for something close to the 300m mark. How does this relate to the actual distribution of distances in Cambridge?

I looked at OpenStreetMap. They have all the buildings, and bus stops. And it's open, so I can get at the data.

The good folks at BBBike make a bunch of exports of OSM data available. This includes the city of Cambridge, which is naturally convenient. I grabbed the main osm file.

It's in XML format, which is a little unfortunate. I then used osmfilter to reduce the data to more manageable proportions.

First, I want the locations of all the bus stops. Just the nodes, no relations or ways or dependencies. The bus stops have highway=bus_stop so the following filter command should do it:

./osmfilter Cambridge.osm --keep="highway=bus_stop" --drop-ways --drop-relations --ignore-dependencies

Getting the list of places (where place is somewhere a user may use as a starting point or destination) was a bit trickier. In the end I made the assumption that everywhere of interest would have a postcode. It's not quite true, but I'm just interested in the distribution here so it will do as a first approximation. That gives me a filter command like:

./osmfilter Cambridge.osm --keep="addr:postcode=" --drop-ways --drop-relations --ignore-dependencies

The format of the XML is quite simple, and it splits things up nicely onto separate lines. This makes it very easy to use basic Unix tools like grep and awk to pick out the lines that have latitude and longitude on them, which gives me lists of coordinates.
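For anyone who would rather stay in one language, the same extraction can be sketched in Java. This is not the grep and awk pipeline I actually used. It assumes each node sits on its own line with lat="..." and lon="..." attributes, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NodeCoords {
    // Assumption: coordinates appear as lat="..." and lon="..." attributes on the node line.
    private static final Pattern LAT = Pattern.compile("\\blat=\"(-?[0-9.]+)\"");
    private static final Pattern LON = Pattern.compile("\\blon=\"(-?[0-9.]+)\"");

    /** Returns {lat, lon} for a node line, or null for any other line. */
    static double[] parseLine(String line) {
        if (!line.contains("<node")) {
            return null;
        }
        Matcher lat = LAT.matcher(line);
        Matcher lon = LON.matcher(line);
        if (!lat.find() || !lon.find()) {
            return null;
        }
        return new double[] {
            Double.parseDouble(lat.group(1)),
            Double.parseDouble(lon.group(1))
        };
    }

    /** Collects the coordinates of every node line, skipping everything else. */
    static List<double[]> parseAll(List<String> lines) {
        List<double[]> coords = new ArrayList<>();
        for (String line : lines) {
            double[] c = parseLine(line);
            if (c != null) {
                coords.add(c);
            }
        }
        return coords;
    }
}
```

Feeding it the lines of each filtered file (for example via Files.readAllLines) gives one list of coordinates for bus stops and one for places.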

I then put together a very cheap and cheerful Java program to read the files of coordinates and calculate the distances between all of them, using a simple equirectangular approximation as described here. The first run of ~10000 places and ~1000 bus stops took less than half a second to print the distance to the nearest bus stop, so I wasn't going to optimize it any further.
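A minimal sketch of that calculation, for the curious. This is a reconstruction rather than the original program. The Earth radius of 6,371km is the usual mean value, and the class and method names are invented for illustration. The equirectangular approximation scales the longitude difference by the cosine of the mean latitude and then applies Pythagoras, which is plenty accurate over a few kilometres.

```java
class NearestStop {
    // Mean Earth radius in metres (an assumed constant, not taken from the original program).
    static final double EARTH_RADIUS_M = 6_371_000.0;

    /** Equirectangular approximation of the distance in metres between two lat/lon points. */
    static double distanceMetres(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        // Shrink the east-west difference by cos(mean latitude).
        double x = Math.toRadians(lon2 - lon1) * Math.cos((phi1 + phi2) / 2.0);
        double y = phi2 - phi1;
        return EARTH_RADIUS_M * Math.sqrt(x * x + y * y);
    }

    /** Brute-force search: distance from one place to the closest of the given stops. */
    static double nearest(double[] place, double[][] stops) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] stop : stops) {
            best = Math.min(best, distanceMetres(place[0], place[1], stop[0], stop[1]));
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up coordinates, roughly in Cambridge, purely to exercise the code.
        double[] place = {52.2050, 0.1200};
        double[][] stops = {{52.2060, 0.1200}, {52.2100, 0.1300}};
        System.out.printf("nearest stop: %.0f m%n", nearest(place, stops));
    }
}
```

With about 10,000 places and 1,000 stops the brute-force loop is only ten million distance calculations, which is why there was no need for anything cleverer.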

The most immediate question can then be answered - what is the distribution of distances to the nearest bus stop? A quick hack using the dist prefab in ploticus and I get the following graph:

That's the number of places against distance in metres, in 10m bins.
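The binning itself was done by ploticus, but the idea is simple enough to sketch by hand. The following is an illustration only, with hypothetical names: each distance is assigned to the bin starting at the nearest multiple of the bin width below it.

```java
import java.util.Map;
import java.util.TreeMap;

class DistanceHistogram {
    /**
     * Counts distances per bin. Keys are the lower edge of each bin in metres,
     * so with a width of 10 the key 250 covers distances from 250 up to (not including) 260.
     */
    static Map<Integer, Integer> bin(double[] distances, int binWidth) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (double d : distances) {
            int lowerEdge = (int) Math.floor(d / binWidth) * binWidth;
            counts.merge(lowerEdge, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up distances, just to show the shape of the result.
        double[] sample = {3.0, 9.9, 10.0, 257.4};
        bin(sample, 10).forEach((edge, n) -> System.out.println(edge + " " + n));
    }
}
```

A TreeMap keeps the bins in ascending order, so the output can be read straight off as the histogram.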

This is really quite interesting. It shows that most places in Cambridge are within a few hundred metres of a bus stop. In fact, almost all are within my acceptable distance.

In reality, it's not quite as good as that. The first thing to correct for is that these are distances as the crow flies. Actual walking distance would be a little further - it's probably reasonable to multiply by 1.4-1.5 to allow for corners, curves, and crossings, so a 300m walk corresponds to roughly 200-215m on the graph. Even then the bulk of Cambridge is still inside that 300m range.

The other problem, and it's a much bigger problem, is that this measures the distance to a bus stop, not to a bus service. A significant number of the bus stops in the data are no longer in use. Short of hand-editing those out I'm not sure how to approach this.

Furthermore, for those bus stops that are in service, the timetable matters. A stop with only one bus an hour (some stops see just one or two a day) is much less useful than its distance suggests. One simple approach would be to calculate the time to next bus, which would be the walking time plus half the time between buses. (I'm prepared to use the peak frequency here. In reality people would time their setting out to align with the timetable rather than setting out at random, although the less frequent a service the earlier you aim to arrive to make sure you don't miss a bus. It's complicated.)
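As a rough sketch of that measure: the walking speed of 75m per minute comes from my own 300m in 4 minutes, the detour factor is the crow-flies multiplier from above, and the names are invented for illustration. None of this has been run against real timetable data.

```java
class AccessTime {
    /**
     * Expected minutes until you are on a bus: walking time plus half the headway.
     * Half the headway is the average wait if you turn up at a random moment.
     */
    static double minutesToNextBus(double crowFliesMetres, double detourFactor,
                                   double walkMetresPerMinute, double headwayMinutes) {
        double walkMinutes = crowFliesMetres * detourFactor / walkMetresPerMinute;
        return walkMinutes + headwayMinutes / 2.0;
    }

    public static void main(String[] args) {
        // Same stop 300m away on foot, served every 10 minutes and then once an hour.
        System.out.println(minutesToNextBus(300, 1.0, 75, 10));
        System.out.println(minutesToNextBus(300, 1.0, 75, 60));
    }
}
```

With these numbers the frequent service works out at 9 minutes and the hourly one at 34, so for an infrequent service the wait swamps the walk.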

It would be nice to show the distances on a map, which would give you a much better visualization of where in Cambridge has good or bad access to a bus service. But that's only really worth doing if you have better data.

At the present time only a couple of dozen bus stops in the OSM dataset are annotated with the necessary information to allow more detailed analysis. It would be nice to get more accurate metadata (and to have the stops in the right places). There's a local Meetup group, but it's not terribly active. Still, the whole point of OpenStreetMap is that it's freely editable by anyone.
