OpenStreetMap

tomczk's Diary


Recently I found Streamlit, a pretty cool Python library that makes it easy to create web apps for visualising data.

I converted the changeset dump from planet.osm.org to the Parquet file format and uploaded it to AWS S3 storage. Then I created this Streamlit app in their free cloud: https://ttomasz-tt-osm-changeset-analyzer-main-apdkpy.streamlit.app/ which displays some basic statistics.

The app leverages the power of DuckDB, a database engine that can query these files over the internet on demand. Parquet files, which are a popular format in modern cloud data lakes, have several advantages over traditional file formats. They are column-oriented, compressed, and can be read with range requests, which means that you can download only the portion of the file you need instead of having to go through the entire file. That makes processing larger datasets much faster.
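To make the range request idea concrete, here is a minimal sketch using only the Python standard library. A Parquet file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`, so a reader can ask the server for just the last 8 bytes to learn where the metadata lives, then fetch only the column chunks it needs. The function names are made up for illustration, and this is not how DuckDB is actually implemented, only the general idea.

```python
import struct
import urllib.request

PARQUET_MAGIC = b"PAR1"  # every Parquet file ends with these 4 bytes


def tail_request(url: str, num_bytes: int = 8) -> urllib.request.Request:
    """Build a request for only the last `num_bytes` bytes of a remote file."""
    # "bytes=-N" is an HTTP suffix range: the final N bytes of the resource.
    return urllib.request.Request(url, headers={"Range": f"bytes=-{num_bytes}"})


def footer_length(tail: bytes) -> int:
    """Read the metadata footer length from the last 8 bytes of a Parquet file."""
    if len(tail) < 8 or tail[-4:] != PARQUET_MAGIC:
        raise ValueError("not the tail of a Parquet file")
    # The 4 bytes before the magic hold the footer size, little-endian.
    (length,) = struct.unpack("<I", tail[-8:-4])
    return length
```

Passing the request to `urllib.request.urlopen` and reading the response would give the 8-byte tail, provided the server supports range requests (S3 does). A second range request of `footer_length(tail)` bytes would then fetch the metadata that describes where each column is stored.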

DuckDB works similarly to SQLite in that it doesn’t have a dedicated server. You run the queries locally [0]. This makes the setup super simple: you either install the binary or configure a connection in an IDE like DBeaver, and you can run SQL queries.
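Besides the binary or an IDE, DuckDB can also be used from Python through the `duckdb` package. The sketch below shows roughly what a local query over a remote Parquet file could look like. The URL you pass in and the `created_at` column name are assumptions for illustration; the real file layout used by the app may differ.

```python
def build_yearly_counts_query(parquet_url: str) -> str:
    """Return SQL that counts changesets per year in a remote Parquet file."""
    # Escape single quotes so the URL stays a valid SQL string literal.
    safe_url = parquet_url.replace("'", "''")
    # `created_at` is an assumed TIMESTAMP column name, not confirmed by the post.
    return (
        "SELECT year(created_at) AS changeset_year, count(*) AS changesets\n"
        f"FROM read_parquet('{safe_url}')\n"
        "GROUP BY 1\n"
        "ORDER BY 1"
    )


def run_yearly_counts(parquet_url: str) -> list:
    """Run the query in an in-memory DuckDB database (no server involved)."""
    import duckdb  # third-party package: pip install duckdb

    con = duckdb.connect()  # in-memory database
    # The httpfs extension lets DuckDB read files over HTTP(S) and S3.
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    return con.execute(build_yearly_counts_query(parquet_url)).fetchall()
```

Calling `run_yearly_counts` with the address of a Parquet file returns a list of `(year, count)` tuples. Only the `created_at` column has to be downloaded, which is where the column-oriented layout pays off.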

Running these simple SQL queries over remote Parquet files takes about a minute or two. Trying to do the same with a custom script on the raw changesets.xml.bz2 file would take longer, not to mention that the effort to prepare the code would be much, much larger.
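For comparison, here is a minimal sketch of what such a custom script might look like for just one statistic, changesets per year. It streams the bz2-compressed XML with the standard library and assumes each `<changeset>` element carries a `created_at` attribute in ISO 8601 form. The sample data at the bottom is made up so the snippet can run without downloading the dump.

```python
import bz2
import io
import xml.etree.ElementTree as ET
from collections import Counter


def count_changesets_per_year(compressed_stream) -> Counter:
    """Count changesets per year from a bz2-compressed changeset XML stream."""
    counts = Counter()
    with bz2.open(compressed_stream, "rb") as xml_stream:
        context = ET.iterparse(xml_stream, events=("start", "end"))
        _, root = next(context)  # the first event is the start of the root <osm>
        for event, elem in context:
            if event == "end" and elem.tag == "changeset":
                created_at = elem.get("created_at", "")
                if created_at:
                    counts[created_at[:4]] += 1  # the year is the first 4 characters
                root.clear()  # drop parsed elements so memory stays bounded
    return counts


# Tiny made-up sample in the shape of the changeset dump.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<osm>
 <changeset id="1" created_at="2005-04-09T19:54:13Z" num_changes="2"/>
 <changeset id="2" created_at="2023-01-01T00:00:00Z" num_changes="5">
  <tag k="created_by" v="JOSM"/>
 </changeset>
 <changeset id="3" created_at="2023-06-15T12:30:00Z" num_changes="1"/>
</osm>
"""

sample_counts = count_changesets_per_year(io.BytesIO(bz2.compress(SAMPLE_XML)))
print(dict(sample_counts))
```

`bz2.open` also accepts a file path, so the same function can be pointed at the downloaded dump. Even this small script has to decompress and parse the entire file for every new question, while the Parquet approach reads only the columns a query touches.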

It would be great if OSM hosted more “consumption ready” data instead of relying on users to do their own coding and parsing.

Let me know if you have some ideas for charts/tables that could be added to the demo.

[0] - well, in this case they are running on Streamlit Cloud’s server, but you can easily run the queries locally on the same Parquet files

If you want to import OpenStreetMap data into a PostgreSQL (+ PostGIS) database, two popular tools are osm2pgsql and imposm3.

Both were designed to prepare data for rendering, although some time ago osm2pgsql gained scripting capabilities that go beyond simple mapping files, which only specify what tables to create and which objects to insert into them (filtered by tags).

There is also osmosis if you want a copy, more or less, of the OpenStreetMap database.

imposm3 development was put on hold due to lack of funding, while osm2pgsql is actively developed. This makes imposm3 a questionable choice, but it does have some nice qualities that can make it the better option for some projects.

In my opinion imposm3 is a very good tool that is useful when:

  • you have a webapp and want to use OSM data but Overpass API is too slow/limited
  • you want to use OSM data for spatial analysis and keep it updated

** The rest of the text, including instructions on how to import data, is on my GitHub page: https://ttomasz.github.io/2022-04-08/osm-import-data-with-imposm **