OpenStreetMap

I took a tip from Dexter from the City of Detroit Office of Innovation, and started on the Detroit Mapping Challenge by browsing City of Detroit Open Data. What open data could help make OpenStreetMap Detroit the best map in the world? BikeShare locations looked like a useful and straightforward starting point. And now OSM Detroit has very accurate MoGo bike share docking stations. There turned out to be a few surprises getting there, and lessons to absorb for mapping all of Detroit.

The data looked decent on quick inspection, and is licensed public domain. Maybe a very small human-supervised import was in order. I browsed OSM to see what was already there, and it turned out all 43 docking stations had already been added by mapper175, with the changeset comment "Added nodes for MoGo Bike Share system stations (resurvey needed for most of them)". Are they in the right place? Is there something a remote mapper could do here?

I picked the Second Ave & Prentis St docking station at random, and opened it up in iD.

I cycled through all the imagery options available to OpenStreetMap, and found no evidence of a bike share docking station. It turns out MoGo launched just over a year ago, and all of the aerial imagery is apparently older than that.

I then tried street level imagery. Bing Streetside is comprehensive but was collected back in 2014. Fortunately, in Detroit, Mapillary and OpenStreetCam have extensive and recent coverage. Clicking through the street level images in iD to find something with the correct alignment to capture the dock was sometimes tricky: it depended on the precision of the capture position, the direction, the field of view, and the distance. Checking features seen in the street level imagery against aerial imagery helped pick a well-located shot.

For Second Ave and Prentis Street, the docking station had moved just a few meters, from the street into a parking lot. While the first location in OSM was probably accurate enough to find the docking station, the distinction between on-street and sidewalk placement matters substantially for high definition mapping and analysis.
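A displacement of "just a few meters" is easy to quantify from the two coordinate pairs with a haversine distance. A quick sketch in Python (the coordinates below are illustrative, not the actual station positions):

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 points."""
    r = 6371000  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# Two made-up points roughly 10 meters apart in latitude
d = haversine_m(42.35280, -83.06600, 42.35289, -83.06600)
print(round(d, 1))  # roughly 10 meters
```

At this scale, a few meters is exactly the difference between "on the street" and "in the parking lot".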

I next checked out the dock at Cass Ave & W Hancock St. The original position on West Hancock looked ok, but the OpenStreetCam imagery quickly confused me. One image showed the dock on West Hancock, and another from a few months later showed it on Cass Avenue. I couldn’t trust what my eyes told me here: is the location or direction of the images incorrect? Am I looking at the same dock?

I searched and found that this dock had moved due to street construction. The reported move date didn’t quite match up with the first image, but seemed reasonable enough to explain what I was seeing. The same report also mentioned the move of the 2nd Ave. and Prentis station; by luck, the first two stations I picked both had substantial changes.

The MoGo site also had a map, and when I browsed it, the locations for the above two docking stations were very accurate. It looked like the location in the MoGo map corresponded exactly with the docking station kiosk unit. Perhaps I could simply use these locations to correct the data in OSM. But what about the license? I asked Dexter and ..

.. he confirmed that I could use it in OSM, and that they would now update the data set on the main Open Data site. Open Data is more than a data source, it’s a conversation.

But first, I had to screen scrape. Viewing the page source, the dock locations were stored in two lists of coordinates. I wrote a quick script to scrape these, transform them into GeoJSON, and load the result into iD as a local file.
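The scrape itself was nothing fancy. Here is a minimal sketch of the idea in Python, assuming the page embeds the coordinates as two parallel JavaScript arrays (the variable names `lats`/`lons` and the sample coordinates are hypothetical, not MoGo's actual page structure):

```python
import json
import re

def scrape_to_geojson(html):
    """Pull two parallel coordinate lists out of page source and emit GeoJSON points."""
    lats = [float(x) for x in
            re.search(r"var\s+lats\s*=\s*\[([^\]]*)\]", html).group(1).split(",")]
    lons = [float(x) for x in
            re.search(r"var\s+lons\s*=\s*\[([^\]]*)\]", html).group(1).split(",")]
    features = [
        {
            "type": "Feature",
            # GeoJSON coordinate order is [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"amenity": "bicycle_rental", "network": "MoGo"},
        }
        for lat, lon in zip(lats, lons)
    ]
    return {"type": "FeatureCollection", "features": features}

if __name__ == "__main__":
    sample = "var lats = [42.3528, 42.3553]; var lons = [-83.0660, -83.0648];"
    print(json.dumps(scrape_to_geojson(sample), indent=2))
```

The resulting `.geojson` file can be dragged into iD as a custom map data layer, which is how the points showed up alongside the existing OSM data.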

I had confidence in this data from the first two stations, but decided to confirm each location from street level imagery before adjusting. I zoomed in on each station from the local file in turn, enabled street level imagery, and clicked through to find the most recent imagery with the best view, until I had confirmed the MoGo location.

It became a bit monotonous, though still interesting to investigate the streetscape across Detroit. I began daydreaming about a process that would make confirmation easier. Something that would show the current and new point, automatically determine the best source of imagery from all available (biased towards the most recent, and using machine learning to filter for images that probably have a dock in view), then let me step through each point, reposition if needed, and confirm.
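The ranking part of that daydream can be sketched in a few lines. This is a toy illustration only: the fields, sources, dates, and distances below are all made up, and a real tool would pull them from the Mapillary and OpenStreetCam APIs.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Shot:
    source: str
    captured: date      # capture date of the image
    distance_m: float   # capture position's distance to the candidate point

def rank(shots):
    """Order candidate images: most recent first, then closest to the point."""
    return sorted(shots, key=lambda s: (-s.captured.toordinal(), s.distance_m))

# Hypothetical candidates for one docking station
shots = [
    Shot("Bing Streetside", date(2014, 6, 1), 8.0),
    Shot("OpenStreetCam", date(2018, 4, 12), 15.0),
    Shot("Mapillary", date(2018, 4, 12), 6.0),
]
print(rank(shots)[0].source)  # prints "Mapillary"
```

A machine-learning filter for "probably contains a dock" would simply prune the candidate list before ranking.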

For the most part, the locations were spot on. This one confirmed with Mapillary imagery contributed by, in fact, Dexter!

And in just a couple of cases, very close but slightly off. Like the station above, where the MoGo map has the station on the street side of the sidewalk, but OpenStreetCam showed it closer to the building. I placed it as precisely as I could tell from the imagery.

I learned a lot from the short exercise.

  • OSM is iterative. The first version of this data was pretty good, now it’s great. And we’ll need to update again as MoGo changes and grows.
  • Open data is a conversation, not just a download site. Connect with data holders and it will help unlock more data and possibilities.
  • Street level imagery is a superb source. Utilize as many sources as possible, research from multiple angles, and pay close attention to recency.
  • Human review is always key, but we need to make it very easy with processes that minimize drudgery and take best advantage of human intellect.
  • You can learn a lot about a city, even from far away. It takes patience, but it’s rewarding to understand the geography of a city.

There’s going to be a lot to learn about the city, and especially about the process of mapping, from making Detroit the best map in the world. I’m excited to see what happens next.

Discussion

Comment from SK53 on 16 July 2018 at 19:54

Excellent post, there’s much in this workflow which can of course be applied elsewhere.

It’s pretty neat that it’s possible to sample at least some locations using Mapillary & OpenStreetCam. It may be possible to do similar things with other open data sets which represent highly visible objects (food safety ratings for restaurants & fast food is one obvious example).

I think small random (or semi-random) samples from open data would ideally be done at the outset.

I’ve done something like this myself, looking at 180 (out of 18,000) trees in the Birmingham open data set, selected at random within 1 km of two locations which were convenient for me to survey before a hospital appointment. In the vast majority of cases there was a tree at, or very close to, the position, but a rather higher proportion of the sample had errors of one sort or another. For a mapper just wanting tree positions the data is fine; for someone wanting to find particular trees there was work to be done. The worst error was the weirdest: the data was OK, it just shouldn’t have been there. About 40 or more trees which I sampled were not actually owned by Birmingham council at all: they have University of Birmingham tree inventory tags.

I really need to write my own post about all I’ve found in the past couple of years using Open Data on street trees, as I think there are other learning points. But I’d heartily endorse your approach here as the experiences you describe closely match my own with a different data domain.
