For the past two months, my team at Mapbox has been working full-time on one of our largest data projects to date: realigning the highways of Japan on OSM. The project is now reaching its final stages.
Japan looks thoroughly mapped on OSM and is dense with highways and local streets. On inspecting the data, however, we found that most of the roads do not match the underlying satellite imagery or GPS tracks.
To complicate matters, we found that the Bing imagery has strange distortions, is poorly orthorectified, and is offset from GPS tracks. Initially we thought this could be an isolated case, but comparing with the Strava Global Heatmap layer showed that these distortions continue throughout the country in an inconsistent manner.
Comparison of Bing and ortho imagery from Geospatial Information Authority of Japan (GSI)
Further research revealed that most of the map data comes from a poorly documented import from Yahoo Japan in 2011, and that the OSM-JP community has been struggling to execute a post-import cleanup of the massive dataset. A complete list of the data issues we found has now been documented in this diary post. Our finding was that most of the data has not been touched since the import and remains of questionable quality.
This motivated us to kick into action and see how our team resources could be used to help out in such a situation.
Confirming findings
The first step was reaching out to the OSM-JP community to make sure our findings were on the right track. The most accessible point of contact was Taichi-san, a prolific mapper and Director of OpenStreetMap Foundation Japan. Our full investigation and workflow were documented on our public /mapping repository and our findings were confirmed by Taichi - the imported data in Japan had serious data issues.
Thanks to the community, we learned about the orthorectified satellite imagery available from GSI, which perfectly matches GPS data and is the only reliable imagery source for the country. The imagery covers most urban areas but is not complete for Japan.
Coverage areas for GSI imagery compared with density of geolocated tweets
Moving the map
Realigning existing road networks on such a large scale is a tough problem, similar to that of the US TIGER import, which has still not been completely fixed after 8 years. In the US it was possible to compare later TIGER datasets with OSM to find differences to clean up. In Japan there was no way to automatically calculate the scale of the issue, which meant a completely manual cleanup by eyeballing every road: over 5 million roads in the import. The current set of mapping tools has been built for data creation and addition rather than rework. To establish a workflow, we did a trial run in a less dense part of northern Japan, using the tasking manager to evolve a process for the cleanup. Some of our observations:
- GSI imagery coverage was limited to parts of Japan and we would need to focus on these areas first
- Roads were split into small segments which needed to be merged for easy realignment
- Divided highways were traced as a single road instead of parallel oneway roads
- Unwanted tags like `yh-width` were present from the import and had to be discarded for easy merging
- Prioritizing the remapping by focusing only on major roads (motorway, trunk, primary, secondary and tertiary) was more practical than realigning all the roads, which would be a never-ending task.
- Realigning data in multiple passes, each limited to one class of roads, helps maintain focus on a task
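The tag cleanup and road-class filtering from the observations above can be sketched in a few lines of Python. This is only an illustration, not the tooling the team used. It assumes ways are modeled as plain dicts with a `tags` mapping. The names `strip_import_tags`, `is_major_road` and `UNWANTED_KEYS` are hypothetical, `yh-width` is the only import key named in this post, and the sample tag values are made up.

```python
# Hypothetical sketch: ways are plain dicts, not objects from a real OSM library.
MAJOR_CLASSES = {"motorway", "trunk", "primary", "secondary", "tertiary"}
UNWANTED_KEYS = {"yh-width"}  # only key named in the post; extend as needed


def strip_import_tags(tags, unwanted=UNWANTED_KEYS):
    """Return a copy of `tags` without the unwanted import keys."""
    return {key: value for key, value in tags.items() if key not in unwanted}


def is_major_road(tags):
    """True if the way belongs to one of the prioritized road classes."""
    return tags.get("highway") in MAJOR_CLASSES


# Made-up sample data: two adjacent segments that differ only in an import tag.
segments = [
    {"id": 1, "tags": {"highway": "primary", "yh-width": "5.5m"}},
    {"id": 2, "tags": {"highway": "primary", "yh-width": "3.0m"}},
    {"id": 3, "tags": {"highway": "residential", "yh-width": "3.0m"}},
]

cleaned = [
    {"id": seg["id"], "tags": strip_import_tags(seg["tags"])}
    for seg in segments
    if is_major_road(seg["tags"])
]
```

After the cleanup, segments 1 and 2 carry identical tags, which is what makes them candidates for merging. Segment 3 is left for a later pass.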
With a rough idea of the remapping process in hand, we created a series of projects with comprehensive instructions on the teachosm tasking manager, and spread the word on the OSM Diaries and the talk-ja mailing list to encourage more community participation in the effort.
Remapping project areas and statistics
### Tasks on http://tasks.teachosm.org
Note: A pass is a mapping run for a task area. Pass 1 = fixing motorway, trunk, primary, secondary roads. Pass 2 = fixing tertiary roads. Pass 3 = fixing unclassified and residential roads.
To scale up the merging process, @Rub21 made a script which smartly merges highways, taking into account the tags (to merge only roads with identical tags) and the angle between ways (to avoid merging streets meeting at intersections). We used the whole project as a chance to discover tools that can come in handy and speed up mapping.
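The two merge criteria, identical tags and a small angle between ways, can be illustrated with a minimal Python sketch. This is not @Rub21's script. The data model (dicts with `nodes` as `(lon, lat)` tuples and a `tags` mapping), the function names, and the 30 degree threshold are all assumptions made for the example.

```python
import math


def bearing(p, q):
    """Approximate bearing in degrees from p to q, both given as (lon, lat).

    Uses a planar approximation with longitude scaled by cos(latitude),
    which is adequate for the short segments considered here.
    """
    mean_lat = math.radians((p[1] + q[1]) / 2)
    dx = (q[0] - p[0]) * math.cos(mean_lat)
    dy = q[1] - p[1]
    return math.degrees(math.atan2(dx, dy)) % 360


def deflection(way_a, way_b):
    """Change of direction in degrees when continuing from way_a into way_b.

    Assumes way_a ends at the node where way_b starts.
    """
    a = bearing(way_a["nodes"][-2], way_a["nodes"][-1])
    b = bearing(way_b["nodes"][0], way_b["nodes"][1])
    diff = abs(a - b) % 360
    return min(diff, 360 - diff)


def can_merge(way_a, way_b, max_deflection=30.0):
    """Decide whether way_b may be appended to way_a.

    max_deflection is an assumed threshold, not a value from the post.
    """
    if way_a["nodes"][-1] != way_b["nodes"][0]:
        return False  # the ways do not share an end node
    if way_a["tags"] != way_b["tags"]:
        return False  # only roads with identical tags are merged
    return deflection(way_a, way_b) <= max_deflection


# Made-up sample geometry: a road heading east, its continuation, and a side street.
main = {"nodes": [(139.000, 35.0), (139.001, 35.0)], "tags": {"highway": "primary"}}
continuation = {"nodes": [(139.001, 35.0), (139.002, 35.0)], "tags": {"highway": "primary"}}
side_street = {"nodes": [(139.001, 35.0), (139.001, 35.001)], "tags": {"highway": "primary"}}
```

Here `can_merge(main, continuation)` holds because the road carries straight on, while `can_merge(main, side_street)` does not because the side street turns off at a right angle. A fuller version would also need to handle ways drawn in opposite directions and nodes shared by more than two ways.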
Outreach and community involvement
Carrying out such a large-scale map improvement would not have been possible without the support of the JP community. Regular OSM diaries from our team were helpfully translated into Japanese by volunteers from talk-ja. The project instructions were also simplified and translated for the convenience of local mappers interested in joining. A Facebook Group helped gain more visibility, as it tends to be a more popular medium for communicating with Japanese mappers than the mailing list.
Students of Taichi-san’s class involved in the remapping projects
In the meantime we are looking at translating our mapping guides into Japanese to allow more local mappers to join the projects using JOSM.
SOTM-Japan
State of the Map Japan was also timely: Mapbox got an opportunity to sponsor it, and a vacationing @planemad gave the closing keynote to bring attention to the data issues and the remapping.
@planemad at SOTM Japan
The road ahead
The scope of our team’s realignment efforts was limited to major highways (motorways, trunk, primary and secondary roads), and further by the availability of ortho imagery from GSI. The target was to correct the roads which carry the bulk of the road traffic, and then to keep improving areas where we detect active map users and map feedback. We have currently covered 90% of our target improvement area. In the coming weeks we will be evaluating the impact of the cleanup and building missing documentation, tools and JOSM plugins for the benefit of new mappers.
Correcting the entire street map data in Japan will take many more years and can only be accomplished by an active local community that can maintain and enrich the data using field knowledge. Meanwhile, this effort should serve as a reminder of the issues with large data imports and the impact they have on the dynamics of community growth and participation. It also exposes how, in spite of being united by a common map canvas, language still poses a significant barrier to how we communicate and support each other in OSM.
Diary entries on remapping
Status of remapping: Light Green = Done (Pass 1), Dark Green = Done (Pass 1 and 2), Red = Not done
Some Statistics
- Area of Japan: 377,000 km2
- Area covered by GSI imagery: 180,000 km2 (47%)
- Priority project area: 120,000 km2 (60%)
- Outside project area: 60,000 km2
- Cleanup progress as of 11/24: 106,000 km2 (90% of priority area)
- Data team strength: 19 members
- Number of weeks spent: 9
Here is a map showing all the roads aligned or merged by the data team during this project since September.
Next Steps
- As we wind down with Japan, it would be great to hear from the Japanese OSM community on the impact of this exercise. We invite Taichi-san and other members of OSM-JP to assess the work and provide us valuable feedback that can be used to improve how the Mapbox data team can help strengthen local mapping communities.
- The Japanese community can greatly benefit from localized mapping guides on using JOSM for cleanup. Interested translators can get in touch here.
- If you have an idea for a new tool or JOSM plugin to help in such map cleanup tasks, propose an idea in our /mapping repository.
- Do you consider yourself an expert mapper? Jump right into the latest project in Hiroshima.