
Carnildo's Diary

Recent diary entries

Sigh

Posted by Carnildo on 23 September 2020 in English.

I spend hours studying news reports and carefully tracing building outlines in Malden, noting which ones did or didn’t survive the Babb fire.

And then a bunch of wanna-be do-gooders come by and crap out HOT-quality mapping, complete with duplicate buildings, vehicles mapped as buildings, scribbled outlines, and nothing but self-congratulatory hashtags for edit summaries. I think I’m going to just revert the whole batch.

Location: Malden, Whitman County, Washington, 99149, United States

What the robots.txt file does

Posted by Carnildo on 24 June 2019 in English.

Disclaimer: I am not an OSM website developer. All information here was obtained by looking at the OSM GitHub repository and poking at the OSM website.

There’s been some controversy recently over the contents of the OpenStreetMap robots.txt file. I think it might be informative to look at what the file actually does.

Allow: /user/

This does nothing. “Allow” lines in a robots.txt file permit the crawling of URLs that would otherwise be denied, but there’s nothing in the file that would deny the /user hierarchy.
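One way to check this claim is to parse a cut-down robots.txt twice, once with the Allow line and once without, and compare the answers. This is only a sketch: it uses Python's standard urllib.robotparser, the two-rule file is a reduced stand-in for the real one, and "ExampleBot" is a made-up crawler name. Python's parser does not understand wildcards, so the wildcard rules are left out here.

```python
from urllib.robotparser import RobotFileParser


def build_parser(lines):
    """Parse robots.txt content given as a list of lines."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


# Reduced stand-ins for the real file: identical except for the Allow line.
WITH_ALLOW = build_parser(["User-agent: *", "Allow: /user/", "Disallow: /edit"])
WITHOUT_ALLOW = build_parser(["User-agent: *", "Disallow: /edit"])

URLS = [
    "https://www.openstreetmap.org/user/Carnildo",
    "https://www.openstreetmap.org/edit",
    "https://www.openstreetmap.org/way/238241022",
]


def verdicts(parser):
    """Return one True/False crawl verdict per URL in URLS."""
    # "ExampleBot" is a hypothetical user agent; any name falls under "*".
    return [parser.can_fetch("ExampleBot", url) for url in URLS]


print(verdicts(WITH_ALLOW))
print(verdicts(WITHOUT_ALLOW))
```

Both lists come out the same, which is the sense in which the Allow line changes nothing: a URL that no Disallow rule matches is already crawlable by default.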

Disallow: /traces/tag/
Disallow: /traces/page/

These are alternate ways of browsing the GPS traces that have been uploaded to the site: listings filtered by tag, and the later pages of paginated listings. The main trace listing is still accessible.

Disallow: /trace/

This is the API endpoint for accessing GPS traces. It is not intended to be displayed in a web browser, and contains nothing useful for a search engine.

Disallow: /api/

This is the API endpoint for editing the map. It is not intended to be displayed in a web browser, and contains nothing useful for a search engine.

Disallow: /edit

This is the URL for the in-browser editor. Everything under this URL is behind a login barrier, and it contains nothing useful for a search engine.

Disallow: /message

This is the URL hierarchy for the on-site PM system. Everything under this URL is behind a login barrier, and it contains nothing useful for a search engine.

Disallow: /login

This is the above-mentioned login barrier. It contains nothing useful for a search engine.

Disallow: /history

This is the visual history browser. The contents change far too rapidly to meaningfully index on a search engine.

Disallow: /geocoder

This is the on-site search system. Search engines searching search engines never ends well.

Disallow: /browse
Disallow: /*lat=
Disallow: /*node=
Disallow: /*way=
Disallow: /*relation=

These are obsolete URL hierarchies for browsing individual map elements. The current URL hierarchy, with URLs of the form https://www.openstreetmap.org/way/238241022, can be indexed by search engines.

Disallow: /user/*/traces/
Disallow: /user/*/diary
Disallow: /diary

These are the only entries that block pieces of the site that might be of interest to a search engine. /user/*/traces/ matches the description pages for individual GPS traces, /user/*/diary matches each user's diary and its individual entries, and /diary is the main diary listing.
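To see how the wildcard rules sort URLs into crawlable and blocked, here is a minimal sketch of a matcher. It assumes the matching rules used by the major search engines (later written down in RFC 9309): `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching pattern wins, and Allow wins a tie. The helper names `pattern_to_regex` and `is_allowed` are made up for illustration, the rule list is a subset of the file, and individual crawlers may behave differently.

```python
import re

# (allow?, pattern) pairs: a subset of the rules discussed above.
RULES = [
    (True, "/user/"),
    (False, "/browse"),
    (False, "/*node="),
    (False, "/user/*/traces/"),
    (False, "/user/*/diary"),
    (False, "/diary"),
]


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything literally, then let each "*" match any characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_allowed(rules, path):
    """Longest matching pattern wins; no match at all means allowed."""
    best_len, best_allow = -1, True
    for allow, pattern in rules:
        # re.match anchors at the start, so patterns are prefix matches.
        if pattern_to_regex(pattern).match(path):
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


for path in [
    "/way/238241022",
    "/browse/way/238241022",
    "/?node=123",
    "/user/Carnildo",
    "/user/Carnildo/diary/12345",
    "/diary",
]:
    print(path, "->", "allowed" if is_allowed(RULES, path) else "blocked")
```

Under these assumptions, the current-style /way/238241022 URL is allowed, while the old /browse and ?node= forms are blocked. The diary URLs are blocked because /user/*/diary is a longer match than Allow: /user/, so the Allow line has no effect there either.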