OpenStreetMap

n76's Diary

Recent diary entries

This is not going very fast

Posted by n76 on 13 August 2020 in English.

I haven’t had much experience performing imports, and the Orange County, California buildings and addresses import is the first (and at this rate only) import I’ve instigated.

Other Imports

I assisted with a building import for Cupertino when I lived in Silicon Valley. And I added a couple of buildings in support of the Los Angeles County import a few years ago. But in both cases my contributions were very small.

Most of my experience with imports has been in attempting to clean them up.

TIGER and NHD

Anyone who has edited in the United States will have run into “TIGER deserts”, and I’ve spent my time in purgatory in those deserts. And if you’ve edited in the rural areas of California you may have run into some imports from the National Hydrography Dataset (NHD), which doesn’t seem to be much better for water than the old TIGER was for roads. The two can interact in annoying ways. At least annoying to someone who wants to keep the number of suspect issues reported by Osmose down. For example:

  • You map a track or trail out of the mountains to where it connects to a road in the Central Valley.
  • That road is from the TIGER import and has never been updated.
  • So you take a moment to correct the alignment where the new trail or track connects to the road and maybe add a surface tag, etc.
  • You are now the last editor that touched that road.
  • But that road extends a fair distance and it crosses a number of waterways from a NHD import.
  • Osmose (and possibly other tools) now report that you are the most recent editor of a road that is crossing waterways.
  • So you go back and try to correct those issues. In at least a few cases you will find the waterway goes under the road through a culvert.
  • So you split the waterways and add the appropriate tunnel=culvert and layer=-1 tags.
  • You are now the last editor of those waterways.
  • Those waterways happen to cross several more TIGER-imported roads, so your Osmose error count just went up.
  • Wash, rinse and repeat. I don’t think I’ll ever get done with this.

San Diego Address Imports

When I moved to Southern California several years back I became aware of problems with an address import that had been done for San Diego County. In attempting to make things better I made a conscious effort to touch as few tags as possible. So there were a lot of is_in:*=* tags I left even though they were deprecated at the time. Well, the QA tools have gotten more verbose over time and Osmose started nagging me about those, in addition to a bunch of duplicate addresses and/or address streets not matching street names that I can’t resolve remotely. My biggest regret about my attempt to clean up the San Diego addresses is that I used my normal OSM ID rather than creating a new “fix it” ID.

Mapping in San Clemente

When I moved to San Clemente I spent quite a few days walking the streets to collect address data. It gave me a pretty good overview of my new home town, gave me exercise, and allowed me to add what I surveyed to OpenStreetMap. I was able to map the addresses and buildings for nearly everything within a couple of miles of my house.

But there are some gated neighborhoods that I could not access. And there were many more neighborhoods that were too far from my home for easy mapping. So there have been annoying blank spots on the map of my city.

Maybe An Import For Those Blank Spots

I recently became aware that there is a “Map With AI” layer that contains building outlines that could be imported. I took a look at it and found that the outlines are the ones provided by Microsoft/Bing and are, in my opinion, of far too low a quality for OSM.

But that got me looking around for a better dataset that could be imported. It turns out that Orange County has published a dataset with a Public Domain license containing building outlines, addresses and, on some buildings, elevation and height information.

So I decided to set up an import.

Issues

I’ve imported a tiny fraction of the data and it is not easy going.

  • The building outlines are the same Microsoft/Bing data that was not worth importing by itself. The result is every single building needs to be corrected before it can be saved to OpenStreetMap.
  • It seems to take me longer to correct a bad building outline than it would to create a new one.
  • The address data is not as clean as I thought when I first examined it. Not horrible, but there are duplicate addresses, missing addresses and addresses that are obviously on the wrong street.

Slow going

I don’t want any changeset from this import to be thought of as yet another reason why imports should be discouraged, so it is taking me hours for each little area.

For example, yesterday I noticed an obvious address error. The fact it was an error was obvious but unfortunately the correction was not obvious. So this morning I drove to the area in question to do a survey.

If I actually have to survey each area, then what good is doing this as an import? And I am not up to walking every street in the county; I’d never get done.

Going Forward

For San Clemente I will continue to slog through this data and curate and edit it enough that I feel it is adequate for OpenStreetMap. But scaling the building-by-building corrections up for all of Orange County is way too big a job for me.

I am thinking about centroiding the building outlines into single points and then only importing the addresses for the rest of the county. That would probably be much faster and would provide some benefit to OpenStreetMap.
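As a rough illustration of the centroiding idea, here is a minimal sketch rather than the script actually used for the import. It assumes each building outline is available as a list of (lon, lat) pairs and that treating those coordinates as planar is acceptable at the scale of a single building. The function names and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    # Drop a repeated closing vertex, as found in closed OSM ways.
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]
    n = len(ring)
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        # Degenerate outline: fall back to the mean of the vertices.
        return (sum(x for x, _ in ring) / n, sum(y for _, y in ring) / n)
    return (cx / (3 * area2), cy / (3 * area2))


def building_to_address_node(ring: List[Point], tags: Dict[str, str]) -> dict:
    """Collapse a building outline to one point carrying only its addr:* tags."""
    lon, lat = polygon_centroid(ring)
    addr_tags = {k: v for k, v in tags.items() if k.startswith("addr:")}
    return {"lon": lon, "lat": lat, "tags": addr_tags}
```

Note that for an L-shaped or otherwise concave building the centroid can fall outside the outline, so the resulting points would still deserve a visual check.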

Hiking trails in OpenStreetMap

Posted by n76 on 3 June 2020 in English.

I have been a fairly avid hiker for decades and would like to make sure that the trails that I use are properly mapped in OpenStreetMap. As I render my own maps this is doubly important to me.

But there is some ambiguity in how this should be done.

First, the OSM terms/tags of footway and path are not really used in my dialect of English. Second, OSM tag names and values are more like code words, somewhat based on UK English, with OSM-specific meanings. It might be easier if something like “h1” were used as a value rather than, say, “trunk”. Then the baggage of your own local dialect of English would not get in the way.

We have sidewalks or walkways and we have trails. While we do use the word path we don’t use footway. At least not in everyday usage.

  • Sidewalks and/or walkways are usually in urban or suburban areas. They are generally hard surfaced with concrete being the most used surfacing material though compacted crushed rock bound with an acrylic binding material may be used in more “park like” areas. They are level and smooth enough that you can easily walk on them with flip-flops while pushing a stroller.
  • Trails are almost always multipurpose. While hiking may be the predominant use, most trails are also open to bicycles and equestrian use. Within US Federally designated wilderness bicycles are not allowed but trails are usually still shared use between equestrians and hikers. Note that while bicycles are allowed on most trails, it would be folly to try to ride them with anything other than a mountain bike. In addition there are trail systems specifically set up for “off highway vehicle” (OHV) use. They are called trails but you will find dirt bikes and other ATVs rather than hikers, equestrians and bicyclists.

OpenStreetMap Wiki

With that in mind we look at the various tagging defined in the OpenStreetMap Wiki that might be appropriate.

First, of course, is looking at highways and the obvious first stop is on the section about paths. The paragraph in that table about “footways” says it “includes walking tracks”. “Walking tracks” is not used in my dialect of English. To me it might mean hiking trails or it might not. Another possibility listed is “path”, which it says is “a non-specific path”, and it says to use “footway” “for paths mainly for walkers”.

So my first impression is that “path” more closely aligns with “trails” in the Western United States. But let’s dig a bit deeper to be sure.

“Footways”

The page about footways says that it should only be “used for mapping minor pathways which are used mainly or exclusively by pedestrians.” For more major pedestrian ways it says we should use highway=pedestrian. And “for multi-use or unspecified paths and trails used by a variety of non-motorised traffic the tag highway=path may be better suited”. It further says that path can be modified with tagging for sac_scale, trail_visibility, surface and access. My feeling from reading all this is that hiking trails should be tagged as “path”.

Apparently I am not alone as I see lots of other mappers in my area use “path” for hiking trails and “footway” for paved walkways in suburban and urban areas.

“Path”

Paths are “either multi-use or unspecified usage, open to all non-motorized vehicles and not intended for motorized vehicles unless tagged so separately.” Seems to fit our local trails open for hikers, equestrians and, in many cases, mountain bikers.

So again, from my local perspective, it seems hiking trails should be tagged with “path”.

A gap in the values for “highway”

The “non-motorized” adjective rules out using “path” for OHV trails. But there isn’t an alternative either: OHV trails are usually too narrow to be tagged as a track. And if you want to be pedantic about it, “tracks” are “roads for mostly agricultural or forestry uses.” Clearly, a recreational trail is not for agricultural or forestry use.

And, from reading the tagging email list, it seems there are parts of the world where the way from one isolated hamlet/village to another is too narrow for a normal sized vehicle so they are used by motorbikes. While for utilitarian use rather than recreation this seems to be the same gap in highway values as OHV trails fall into.

But tagging abhors a vacuum and “path” seems to be the closest fit so it has been used in those cases.

In France there are bicycle ways called “voie verte”. It is apparently a wide, smooth way for bicycles, usable by regular (not mountain) bikes, but does not meet the OSM standards for “cycleway”. So they are apparently often tagged as highway=path, bicycle=yes. Exactly the same tagging as a typical hiking trail with quite different characteristics where I live.

Because there were no highway values specifically applicable for OHV trails, access to remote villages, voie verte, and probably other types of highways, “path” has been used or misused. It is now a bit like highway=“road”. It lets you know something is there, probably narrow and unpaved, but not really what it is.

Hiking in other parts of the world

My impression is that in other parts of the world hiking trails are often restricted to hiking. No mountain bike use. And, I guess, no equestrians. In those areas hiking trails are often but not always tagged as “footway”.

Other Tags

So if you want to render or otherwise process OSM data for hiking trails you need to be able to handle both “footway” and “path”. And you should be able to tell the difference between a voie verte and a Western United States multipurpose trail, even though they may be tagged the same way.

This is at least part of the issue with “path” that has been raging on the tagging email lists recently.

At present my feeling is that “path” should be formally redefined to be the narrow/not suitable for normal motor vehicle equivalent to “highway=road” and that new purpose specific highway=* values be defined for hiking trails, voie verte, OHV trails, etc.

While I want a nice resolution to this, my immediate goal is to be able to render trail distances on my maps given the tagging that is currently being used “in the wild”. So let’s look at other tags that are often, but not always, found on hiking trails. The hope is that we can disambiguate hiking trails from other things.

sac_scale

The Swiss Alpine Club (SAC) scale wiki page was poorly translated into English. My impression from the photographs and descriptions is that only two values make sense for the trails worth rendering: “hiking” and “mountain_hiking”. The other, more difficult classifications say things like “the treadway may not be visible”. In my mind, if the treadway (trail) is not visible then it is not a trail. In my part of the world that would be classified as “off trail hiking”. But for deciding if a footway is a paved walkway or a hiking trail, the simple existence of this tag could be a clue.

trail_visibility

This tag comes from the same people that created the sac_scale tag. Again, it has some values that confuse me. “horrible” is defined as “often pathless” and “no” is defined as “mostly pathless”. As above, if you can’t see the trail then in my mind it isn’t a trail. Still, the presence of this tag strongly indicates we are dealing with a hiking trail.

Informal

The “informal” tag seems to be widely misunderstood. I recall the threads on it in the tagging list and from that it seems clear to me: It is used to mark a trail that is not officially sanctioned by the land manager.

If I recall correctly, it came out of trying to create an official map for a California state park where they had “social” or “desire” paths that they were trying to close off and restore back to nature. And they certainly did not want them shown on an official map.

But OSM maps what is on the ground and if the land manager removed the trails from OSM they would immediately be re-added by some mapper. The solution was to mark the trails as “informal” and also put an “access=no” tag. The informal tag allowed their renderer to ignore the trail. The access=no would, they hoped, keep OSM based apps from routing hikers past their piled brush barriers and over the illegal trail.

Anyway, for my purpose the “informal” tag is only seen on trails so if a way has it then it is highly likely that it is a trail. Maybe not one the land manager wants me to use, but it is a trail.

Surface

Surface is a horrible tag. Not because the concept is bad but because of the range of values in use. On the good side, there are some fairly common values that can help distinguish a walkway (hard surfaces like “paved”, “concrete” or “asphalt”) from a hiking trail (soft surfaces like “dirt”, “ground”, “sand”, “unpaved”). Of course there are some ambiguous values too.

Access

It seems to me that access can be used as a negative indicator. If a “path” has a “foot=no” tag then it is likely the path is for something else, possibly an OHV trail. But I have never seen this value on a path in the wild.

Name

At least where I am, very few suburban walkways have names but many trails do. In addition, most of the named trails have a suffix of “Trail”. So a way tagged “highway=path”, “name=‘Overlook Trail’” is likely to be a trail. This is, of course, a regional thing and not likely to be expected around the world.

Putting it together

A final wrinkle

A lot of the local “wilderness parks” have only recently been retired from being ranch land. And a lot of the designated trails were four wheel drive access roads. “Tracks” in OSM parlance. I want to treat these a bit like hiking trails, that is show distances on them. But I don’t want to show distances on all tracks. Fortunately I can use the “foot=” tag to detect that.

The decision tree

Based on all that, my current decision process for whether a way is a hiking trail or not is:

[Drawing: decision tree for deciding whether a way is a hiking trail]
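The clues discussed in the sections above can be combined in code along the following lines. This is only a sketch and not a transcription of the drawing: the order of the checks, the treatment of foot=yes and foot=designated on tracks, and the fallback for untagged ways are all assumptions, and the function name is hypothetical.

```python
HARD_SURFACES = {"paved", "concrete", "asphalt"}
SOFT_SURFACES = {"dirt", "ground", "sand", "unpaved"}
TRAIL_HINT_KEYS = ("sac_scale", "trail_visibility", "informal")


def is_hiking_trail(tags: dict) -> bool:
    """Guess whether a way's tags describe a hiking trail."""
    highway = tags.get("highway")

    # Former ranch roads: only tracks explicitly open to walkers count.
    if highway == "track":
        return tags.get("foot") in {"yes", "designated"}

    if highway not in {"path", "footway"}:
        return False

    # Negative indicator: a path closed to walkers is something else.
    if tags.get("foot") == "no":
        return False

    # The mere presence of any of these keys suggests a trail.
    if any(key in tags for key in TRAIL_HINT_KEYS):
        return True

    surface = tags.get("surface")
    if surface in HARD_SURFACES:
        return False
    if surface in SOFT_SURFACES:
        return True

    # Regional convention: named trails usually end in "Trail".
    if tags.get("name", "").endswith("Trail"):
        return True

    # Assumed fallback: local mappers use "path" for trails and
    # "footway" for paved walkways.
    return highway == "path"
```

For example, `is_hiking_trail({"highway": "path", "name": "Overlook Trail"})` is true, while a concrete footway or a track without a foot tag is rejected. Ambiguous surface values simply fall through to the later checks.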

Wouldn’t it just be easier if they were tagged as highway=hiking_trail? It wouldn’t be hard to convert all the local trails from highway=path and it would be easier for non-outdoor types to map a hiking trail.

Mapping a neighborhood

Posted by n76 on 15 May 2020 in English.

I created this video last summer to show how I map a neighborhood.

And now I am seeing if I can create an “embedded style” link to it on my OSM diary using kramdown.

Mapping a neighborhood

I’ve slightly revised my mapping and editing techniques since then but I think it is still a reasonable introduction or tutorial on using some Android tools and JOSM for mapping.

Location: San Clemente, Orange County, California, United States

Use of the Name tag

Posted by n76 on 17 January 2020 in English.

My native language is a dialect of the lingua franca of the late 20th and early 21st centuries. And I live in a culture that is notorious for being adamantly monolingual. But I thought I had some understanding of the issues of mapping names in a way friendly for internationalization.

It seems pretty clear cut when you read the wiki. Put the local name, in the local language, as the value for the “name” tag. You may also put it in the “name:<lg>” tag value too.

To be clear, I am not worrying about the legal name, short name, international name, alternate name, or other various names for a place. Just “the common default name” to put in the “name” tag.

I make paper maps for myself and when traveling I like to have both the local name, as I will find it on signs, and the name in English, if available, rendered. For example:

काठमाडौ
Kathmandu

I may not be able to read the local language but I can compare the glyphs on my map with the glyphs on a sign to see I am entering a specific village. And if an English name exists, even if only an (automatic) transliteration, I will have something to verbalize.

But my attempt to produce a map of a trekking destination in Nepal showed that it is not that simple.

First, the local mappers in Kathmandu and apparently throughout Nepal decided to put “Romanized” versions of their names in the name tag. I am not sure what “Romanized” means in this context as they did not specify what phonetics might be used when “Romanizing”. The current tagging of Kathmandu breaks Internationalization:

name=Kathmandu
name:ne=काठमाडौ

Please, please, don’t do this. It is specifically discouraged in the wiki. If a transliteration is needed, it can be done automatically by the data consumer. The tagging should be:

name=काठमाडौ
name:ne=काठमाडौ

Second, if the “name:<lg>” value is set for the local language but the “name” value does not match it, how can a renderer determine which local language to use for the area? So far the solutions are ad hoc. I have seen suggestions that this has been solved by the OSM DE people. But when I look at the tool kit I see the problem it solves is transliterating a local language if the tag for the desired language does not exist. That is a good and useful thing to do but it is not the problem I am worried about.

Another suggestion is to follow the lead of SomeoneElse who has produced maps with Welsh and Scots Gaelic names in areas where those languages are dominant. This is closer to what I am looking for.

But both the German automatic transliteration and SomeoneElse’s implementations reveal that they use internally coded polylines to define areas where a language is used. In effect they are implementing an additional geographic database to help interpret the OSM geographic database. This seems terribly wrong.

Adding language boundaries to OSM has been discussed and there are issues. So it is unlikely that it will be agreed upon.

One of the issues is that there are places that share a boundary with many localities with many languages. For example, the coast line of the Mediterranean Sea is shared with many countries where many languages are used. What name should be used for that feature? The English one I used? Probably not.

Looking at it from the point of view of a simple data consumer, it seems the following rules would go a long way toward improving how OSM derived maps can be presented:

  • If the feature has a name in only one language, then put that value in for the “name” tag.
  • Strongly encourage mappers to also put that value in the “name:<lg>” tag where “<lg>” is the code for the local language. This will allow data consumers to identify the language used in the “name” tag.
  • If more than one “name:<lg>” tag is present, one of them must match the value of the “name” tag. This allows identification of the language of the “name” tag.
  • If the feature is on a boundary with more than one language, then there may be no local “common default name”. In that case, remove the name tag. If there is a QA tool that complains, then put a “noname=yes” tag on the feature. Yes, the feature has a name, it has multiple names. But we don’t know which is the “common default name” used by the multiple sets of locals speaking different languages.
  • Don’t try to be helpful and put multiple versions of the name in the name tag. That is almost a guarantee that the name tag should be omitted instead. I’d fix that for Mount Everest except that the values in the “name” tag don’t match any of the values in the many “name:<lg>” tags and I haven’t a clue where to move them since I read neither Nepali nor Chinese.
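A QA check for the rules in this list might look roughly like the sketch below. It is an illustration under stated assumptions, not an existing tool: the function name is hypothetical, the regular expression is only an approximation of which “name:<lg>” suffixes are language codes, and the check is slightly stricter than the rules as written in that it flags a mismatch even when only one “name:<lg>” tag is present.

```python
import re

# Approximation: two or three lowercase letters, optionally followed by a
# script or region part (e.g. "zh-Hans"). Keys such as "name:left" or
# "name:etymology" do not match.
LANG_KEY = re.compile(r"name:([a-z]{2,3}(?:[-_][A-Za-z]+)?)")


def check_names(tags: dict) -> list:
    """Return a list of human-readable problems with a feature's name tags."""
    problems = []
    name = tags.get("name")

    localized = {}
    for key, value in tags.items():
        match = LANG_KEY.fullmatch(key)
        if match:
            localized[match.group(1)] = value

    if name is None:
        # Unnamed, or deliberately unnamed multi-language feature.
        return problems

    if not localized:
        problems.append("no name:<lg> tag identifies the language of name")
    elif name not in localized.values():
        problems.append("name does not match any name:<lg> value")
    return problems
```

A combined value in “name”, such as two names joined with a dash, is caught by the second check because it matches none of the individual “name:<lg>” values.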

The above suggestions will not fix all the issues with internationalization of names. But given the OSM schema as it stands today, it will make some types of processing possible. For example, if you want to render a map in language “xx” then you can:

  • Use the “name:xx” value if it exists.
  • Otherwise determine the language of the “name” tag, possibly by looking at all the “name:<lg>” values for a match. Once the language has been determined, automatically transliterate the “name” tag to language “xx”.
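The two steps above can be sketched as follows. The `transliterate` callback is a hypothetical placeholder for whatever transliteration library a data consumer chooses, and both function names are made up for this example.

```python
from typing import Callable, Optional


def name_language(tags: dict) -> Optional[str]:
    """Find the language of the name tag by matching it against name:<lg>."""
    name = tags.get("name")
    if name is None:
        return None
    for key in sorted(tags):  # sorted only to make the result deterministic
        if key.startswith("name:") and tags[key] == name:
            return key.split(":", 1)[1]
    return None


def label_for(
    tags: dict,
    lang: str,
    transliterate: Optional[Callable[[str, str, str], str]] = None,
) -> Optional[str]:
    """Pick the text to render for a feature on a map in language `lang`."""
    wanted = tags.get(f"name:{lang}")
    if wanted:
        return wanted
    name = tags.get("name")
    source = name_language(tags)
    if name and source and transliterate:
        # transliterate(text, from_language, to_language) is hypothetical.
        return transliterate(name, source, lang)
    # Language of name unknown, or no transliterator: show name as-is.
    return name
```

With the current Kathmandu tagging (name=Kathmandu, name:ne=काठमाडौ) `name_language` returns nothing, because no “name:<lg>” value matches “name”. That is exactly the situation the rules above are meant to prevent.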

Rendering a map where local names are used, like the reference one at www.openstreetmap.org or the ones created by SomeoneElse, is a harder issue. If a feature has names in multiple languages (e.g. Mediterranean Sea) then which name should be used? That does not seem to have a general answer. If the communities that border the feature have mutually agreed to a common default name then that could be put in the “name” tag. But I suspect the parties closest to the feature are unlikely to agree on a default name if they are using different languages.

Internationalization

Posted by n76 on 13 December 2019 in English.

I find that OpenStreetMap has the most current and most accurate data for hiking trails. But I also find websites and apps that use OpenStreetMap data like OpenTopoMap and CalTopo take a fairly long time, often months, to update. Some apps, like AllTrails, don’t seem to update their copy of OSM data at all (the OSM attribution by AllTrails is really buried too but that is a separate issue). Other apps like OsmAnd and Maps.me update frequently but either don’t show elevation (Maps.me) or are fairly complex with formatting other than what I like (OsmAnd).

I really like the look of the older (1950s through 1970s) USGS topographic maps that I was raised on. So for hiking I use Avenza Maps and I load in geo-referenced PDF maps that I render myself using publicly available elevation data and OpenStreetMap extracts. This allows me to have a current map I can use offline that looks good to me.

When I am planning on hiking in some far distant place I usually prepare by generating Avenza compatible geo-referenced PDF maps for the area. I start with looking at the area in JOSM with the various OSM compatible aerial imagery. I get a feel for how well the area has been mapped and sometimes make corrections if there is a glaringly obvious mistake or omission. Once I am comfortable with the data in the area I run my map generation scripts.

Next spring I will be trekking through some Himalayan villages in Nepal. My first step in getting ready for this trip was to revise my map generation scripts to handle internationalization. Once I fixed my scripts I realized that about the only features in the area that have Nepali names on them were those along the border. Those seem to be dual named in Nepali and Chinese. My guess is that it is politically expedient to have the name tag for Mount Everest be “珠穆朗玛峰 - सगरमाथा चुचुरो” rather than in only one of the two languages.

Away from the border most places don’t even have name:ne values at all much less have the Nepali name as the value for the name tag. They all seem to be “Romanized” (I did not notice what phonetics were used for the transliteration).

I was under the impression that the name value should be in the local language and in the local character set. Even Kathmandu has its name tag set to “Kathmandu” not “काठमाडौ”. Looking at the OSM Wiki, I see this follows the guidelines for mapping in the Kathmandu Valley and the change that specified this tagging was from over seven years ago.

Odd to me but apparently not unusual. Looking at the wiki pages for localization of the name tag and how some other countries are documented it seems this is not uncommon.

My preferred formatting for a name is the local name in the local alphabet with the English name, if different, below. For example:

काठमाडौ
Kathmandu

It looks like my map generation script would need to ignore the name field if a name:ne tag is present. The easiest way to do this is to create a special case that is specific to Nepal. And if I were to generate a map for some other country I’d have to special case that too.
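A Nepal-style special case could be as small as the following sketch. It is not taken from my map generation scripts; the function name is hypothetical, and treating the romanized “name” value as the second line when no name:en exists is an assumption about how the Kathmandu Valley convention is used.

```python
def two_line_label(tags: dict, local_lang: str = "ne") -> list:
    """Return the lines to render: local-script name first, English below."""
    local = tags.get(f"name:{local_lang}")
    default = tags.get("name")
    if local is None:
        # No local-language tag: all we have is the default name.
        return [default] if default else []
    # Prefer an explicit English name, else fall back to the (romanized) name.
    second = tags.get("name:en") or default
    if second and second != local:
        return [local, second]
    return [local]
```

For the current Kathmandu tagging this yields the local name on the first line and “Kathmandu” on the second. Another country would need a different `local_lang`, which is the special casing complained about above.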

I wonder why they decided to use this convention as it seems to make things more difficult for internationalizing maps. Why not just put the local official name value in the name tag?

I think the locals should decide how they map, so I will not try to “correct” their style of name tagging. Especially as there does not seem to be a world wide consensus about what is correct.

OSM has failed me

Posted by n76 on 17 May 2017 in English.

Actually, failed is too strong a word. Annoyed or disappointed is better.

I am on a vacation in Bilbao which is a lovely city with friendly people, photogenic streets and very good food. But I can’t use OsmAnd or Maps.me to find streets named in any of the tourist guides or even by the locals, including the very nice staff at our hotel.

Why? Because the directions are always in Spanish/Castilian and a year or so ago many of the name=* tags for the streets here were edited to remove the Spanish names and have only the Basque names.

I have been consciously observing street signs and they are consistently in both Basque and Spanish. Usually with Spanish on top. It is my understanding that OSM multilingual tagging calls for having all languages on the signs tagged. I have no issue with replacing the Spanish with Basque on the default name=* tags. But the Spanish names should have been put into name:es=* tags. That would allow visitors like myself a chance at a much better experience.

Imagine being able to find the street given to you by a friendly local who assumes, correctly, that you speak no Basque.

Not that it would have helped much with Maps.me as I don’t see any setting in that app to specify display or search on anything other than the default name=* tag. OsmAnd does have a setting for specifying the language but I don’t know how well it works as the data is not in OSM for it to work with.

So here I am with wonderful-looking OSM-based maps with an amazing number of points of interest. And I have apps that can give me detailed directions once I figure out where I want to go. But I am reduced to using Google to find my destination and then I need to compare the street geometry to see where the location is on my OSM based map.

Maybe not a “fail” for OSM, but definitely annoying and disappointing.

Blame me for duplicate addresses. . .

Posted by n76 on 6 March 2016 in English.

Here is my excuse

On the road

As a child in the 1950s and 60s our family visited my grandfather’s beach house every summer. When we drove west on US80 from Arizona we often ate at the Major’s Coffeeshop in Pine Valley for a late lunch as that was the first cool spot to stop at after crossing the desert.

I recently moved to a Southern California beach town and had to make a number of trips to Southern Arizona and so was retracing this old route, now on I-8 rather than US80. On one trip I wondered if the Major’s Coffeeshop still existed and if the food there was still decent. Looking up “Major’s Coffeeshop”, “Major” and even “Pine Valley” on OsmAnd (offline mode) turned up nothing. Not too surprising about the coffee shop not showing up but a little curious that “Pine Valley” did not show up.

But there was a freeway exit sign pointing to Pine Valley so we took it to see what we’d find. Sure enough, about 1/4 mile from the freeway on the old road was the 50+ year old sign for the Major’s Coffeeshop with a small addition on the bottom for the current name, Major’s Diner. We had lunch (adequate and average) and went on our way secure in knowing OsmTracker would show where we stopped and having a receipt that showed the address so I could add it to OSM.

When I pulled the area up in JOSM the building was missing so I added it and added the tags for the address and that it was a restaurant, etc.

We have a problem here

But I noticed that there were a lot of nodes sprinkled through the area with addresses. Curious as to why “Pine Valley” did not come up in my search on OsmAnd if there was address information in the area, I looked at them. They all were tagged with addr:city=“San Diego”.

Huh? San Diego is a long way from Pine Valley and on any direct route you would need to go through other incorporated cities like El Cajon and Alpine. Curiously, the nodes also had values set for addr:postcode and when I looked up the ZIP/postcode I found the U.S. Postal Service thinks that code is for Pine Valley. And when in the area I noticed the post office near the restaurant with “Pine Valley” clearly posted on the front. It seemed the addr:city values for all addresses in that area were wrong or at least very suspect.

I also noted that the addr:street values contained abbreviations, which is not standard OSM practice. Further, all these address nodes seemed to be from an old import and the mappers involved were no longer active and did not respond to my query about the import.

So I thought I could do some “arm chair” mapping and clean up the obvious errors. But first a check on the mailing lists to see what the group opinion might be. Typical responses from people far, far from the area who apparently didn’t even look at the place in question were pretty negative about trying to fix this without on the ground surveying. But a response from a mapper who was raised in the area confirmed that the locals there view it as Pine Valley and not some oddly displaced portion of the city of San Diego.

About that time another email showed up on the list from someone who noticed all the addr:street values throughout San Diego county had abbreviations. I took a look a bit farther afield than just Pine Valley and discovered they were right: All of San Diego county was covered with address nodes with bogus street names and it seemed that any node that was not in an incorporated city was tagged as being in the city of San Diego. Basically a mess with lots of wrong data.

Making things better

Despite the stigma of “arm chair” mapping and especially the stigma of automated or semi-automated edits, I thought I could improve the situation. I decided to start with the sparsely populated mountainous and desert area of eastern San Diego County. My first attempts were simply using JOSM. I’d do a grep search for an addr:street value that ended in, say, “Ave”, select one of the results and search for all matching key/values. Once I had a selection I’d zoom in and see what the underlying street name was and correct them. Wash, rinse and repeat. Once the addr:street values were taken care of, I’d repeat for addr:postcode values to assure the addr:city names seemed reasonable. Very slow, very tedious, very error prone. And that was in an area with almost no address nodes. This was not going to work for the densely populated western coastal area at all.

Time to semi-automate the process. I wrote a script that would read a .osm file and expand out the values found in addr:street tags on nodes only (I thought that any polygons would have been entered manually by an individual mapper rather than through the flawed import). It would also flag any node where there was a mismatch between the addr:postcode and addr:city values. The workflow would be to select an area in JOSM, download the data for it, export the data, run the script, load the changed data, verify all the changes and upload. Repeat for another area, etc.
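The postcode/city check can be illustrated with a short sketch. This is a reconstruction of the idea and not the original script. It assumes a lookup table from ZIP code to postal city is available from some other source; the table passed in, the placeholder ZIP code in the usage note, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET


def load_osm(path: str) -> ET.Element:
    """Parse a .osm XML file exported from JOSM and return its root element."""
    return ET.parse(path).getroot()


def node_tags(node: ET.Element) -> dict:
    """Collect the <tag k=... v=...> children of an element into a dict."""
    return {tag.get("k"): tag.get("v") for tag in node.findall("tag")}


def city_mismatches(root: ET.Element, postal_city: dict) -> list:
    """Return ids of nodes whose addr:city disagrees with their ZIP code.

    `postal_city` maps a ZIP code to the postal city name (assumed input).
    Only nodes are examined; ways and relations are left alone.
    """
    flagged = []
    for node in root.iter("node"):
        tags = node_tags(node)
        expected = postal_city.get(tags.get("addr:postcode"))
        if expected is None or "addr:city" not in tags:
            continue  # unknown ZIP or no city tag: nothing to compare
        if tags["addr:city"] != expected:
            flagged.append(node.get("id"))
    return flagged
```

A call might look like `city_mismatches(load_osm("area.osm"), {"99999": "Pine Valley"})`, where 99999 is a placeholder and not a real ZIP code. The flagged ids still need a human to look at them before anything is uploaded.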

I found you have to be very careful with expanding abbreviations. For example “St” at the beginning of a street name is probably “Saint” while at the end is probably “Street”. A “Tr” suffix is probably “Trail” in the eastern mountain or desert areas but likely “Terrace” in the western coastal areas. And some of the coastal cities have alphabetically named streets, so you want to expand “E Ave” to “E Avenue” not “East Avenue”.
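The position-dependent rules described above might be coded something like this. It is a minimal sketch with a deliberately tiny abbreviation table, not the expansion logic from the actual script. The table contents and the function name are assumptions, and trailing periods (“Ave.”) are not handled.

```python
# Default suffix table; pass a different one per region (e.g. "Tr" -> "Terrace").
SUFFIXES = {"St": "Street", "Ave": "Avenue", "Rd": "Road", "Dr": "Drive", "Tr": "Trail"}
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}


def expand_street(name: str, suffixes: dict = SUFFIXES) -> str:
    """Expand common abbreviations in an addr:street value."""
    words = name.split()
    if len(words) < 2:
        return name  # a lone word is too ambiguous to touch
    out = list(words)
    last = len(words) - 1

    # An abbreviation in last position is treated as the street type.
    if words[last] in suffixes:
        out[last] = suffixes[words[last]]

    if words[0] == "St":
        # "St" in first position is probably "Saint".
        out[0] = "Saint"
    elif words[0] in DIRECTIONS and len(words) > 2:
        # Expand a leading direction, but leave lettered streets such as
        # "E Ave" alone: with only two words the letter is the street name.
        out[0] = DIRECTIONS[words[0]]
    return " ".join(out)
```

With these rules “St James St” becomes “Saint James Street”, “E Ave” becomes “E Avenue”, and “Oak Tr” becomes “Oak Trail” unless a coastal table such as `{**SUFFIXES, "Tr": "Terrace"}` is passed in. Even so, every expansion is a guess that should be reviewed.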

There appear to be no hard and fast rules that work universally for expanding address abbreviations even for a limited area in the United States. Any expansion the script made had to be checked by a human. So I had the script output a log of the changes it made to help with verifying they made sense. This was slow going even with semi-automated find/replace logic but at least it was many times faster than just doing the same thing in JOSM alone.

This helped a lot in reducing the errors reported by OSM Inspector but there were still a bunch of “Street not found” issues. Taking another look, it seems that there were a bunch of capitalization issues and spelling errors. So the next version of the script was to have it build a list of highway names and then print a list of suspect addr:street values, those being ones that did not match a name value for a nearby highway. Looking at these, some could be resolved by reviewing the highway name (usually from the original TIGER import), the highway name shown in the latest TIGER overlay and the preponderance of spelling on addr:street values.
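A simplified version of that check is sketched below. It is not the original script, and it makes one loud simplification: instead of looking for a nearby highway it accepts a match against any named highway in the downloaded file, which is only reasonable when the file covers a small area. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def load_osm(path: str) -> ET.Element:
    """Parse a .osm XML file and return its root element."""
    return ET.parse(path).getroot()


def suspect_streets(root: ET.Element) -> Counter:
    """Count addr:street values on nodes that match no highway name."""
    highway_names = set()
    for way in root.iter("way"):
        tags = {t.get("k"): t.get("v") for t in way.findall("tag")}
        if "highway" in tags and "name" in tags:
            highway_names.add(tags["name"])

    suspects = Counter()
    for node in root.iter("node"):
        for tag in node.findall("tag"):
            if tag.get("k") == "addr:street" and tag.get("v") not in highway_names:
                suspects[tag.get("v")] += 1
    return suspects
```

Printing `suspect_streets(root).most_common()` puts the most frequent spellings first, which helps when judging the preponderance of spelling among the address nodes.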

But in many other cases, especially along state highways, it was not possible to guess the correct street name. If the addr:street values along a road are a mix of “Highway 79”, “SR 79” and “CA 79” and the highway does not have a name value only a ref tag, which to choose? It depends on the actual signs along the road so it can’t be resolved remotely.

The next issue discovered was that many addresses were imported more than once, often in widely different positions. So the script was modified again to output a list of duplicate addresses. Using Bing satellite imagery and the highway locations shown in JOSM sometimes these can be resolved. But in many cases, it just isn’t clear so a field survey is needed.
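Finding the duplicates themselves is the easy part, as the following sketch suggests. It is an illustration, not the original code, and it assumes that two nodes are duplicates when house number, street and postcode all agree. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def load_osm(path: str) -> ET.Element:
    """Parse a .osm XML file and return its root element."""
    return ET.parse(path).getroot()


def duplicate_addresses(root: ET.Element) -> dict:
    """Map each repeated (housenumber, street, postcode) to its node ids."""
    seen = defaultdict(list)
    for node in root.iter("node"):
        tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
        number = tags.get("addr:housenumber")
        street = tags.get("addr:street")
        if not number or not street:
            continue  # not an address node
        key = (number, street, tags.get("addr:postcode"))
        seen[key].append(node.get("id"))
    return {key: ids for key, ids in seen.items() if len(ids) > 1}
```

Deciding which copy of an address is in the right place is the part that cannot be automated.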

Result

The current state of addresses in San Diego county, after many, many thousands of address nodes were cleaned up:

  • The addr:city tag should match the postal city given by the ZIP code.

  • There should be no abbreviated values in the addr:street tags.

  • Most, but far from all, of the addr:street values match the name value for a nearby street.

  • Some of the duplicate addresses have been resolved.

But I am now listed as the most recent editor of a lot of address nodes that are duplicates and/or don’t have a matching highway name.