OpenStreetMap

Friendly_Ghost's Diary

Finding SEO spam in OSM

Posted by Friendly_Ghost on 18 June 2023 in English. Last updated on 23 June 2023.

After I came across some business descriptions in OSM that were of dubious quality, I decided to hunt them down systematically. OSM is, after all, not a place for advertisements. Now, about half a year and hundreds of POI tag fixes later, it is time to reflect on this project and to share my observations.

Introduction

People who map their business on OSM usually do so in a single changeset. They often get the opening hours syntax and the international formatting of phone numbers wrong, and there is usually a lot of info still missing. This is fine, since OSM data in general follows the trend where basic map data receives details, corrections and improvements from different mappers over time.

An issue arises when companies try to sneak in their brochure texts and other SEO spam. We want OSM to stay objective and neutral and we want data that relates to the real world, so this information is unwelcome. We can’t stop people from mapping their company details, but moderation is clearly needed if we are to uphold these principles.

I started looking for a way to detect the unwanted spam. The result is this Overpass query for buzzwords in the description tag. Think of phrases like “award winning”, “reliable service” and “conveniently located”. The list keeps changing: I regularly add buzzwords that I come across, both in the query results and elsewhere, and I remove words that produce false positives.
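To give an idea of what such a query can look like, here is a minimal sketch in Python that builds an Overpass QL query from a list of buzzwords. It is not the actual query linked above: the function name `build_buzzword_query`, the timeout and the choice to search nodes, ways and relations globally are assumptions for illustration, and the buzzword list only contains the three phrases mentioned in this post.

```python
# Sketch only: builds an Overpass QL query string, it does not send it anywhere.
# The phrases below are the three examples from the text; the real list is longer.
BUZZWORDS = ["award winning", "reliable service", "conveniently located"]


def build_buzzword_query(buzzwords, timeout=60):
    """Return an Overpass QL query matching description tags with any buzzword."""
    if not buzzwords:
        raise ValueError("at least one buzzword is needed")
    for phrase in buzzwords:
        # Keep phrases to letters, digits and spaces so that they need no
        # escaping inside the regular expression or the quoted string.
        if not all(ch.isalnum() or ch == " " for ch in phrase):
            raise ValueError(f"unsupported character in {phrase!r}")
    pattern = "|".join(buzzwords)
    return (
        f"[out:json][timeout:{timeout}];\n"
        # ",i" makes the regular expression match case-insensitively.
        f'nwr["description"~"{pattern}",i];\n'
        "out center;"
    )


if __name__ == "__main__":
    print(build_buzzword_query(BUZZWORDS))
```

The printed query can be pasted into Overpass Turbo. Adding or removing a buzzword is then a one-line change to the list.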

Image 1: Distribution of the results in Overpass Turbo

Results

Many businesses are properly tagged apart from the questionable description, and for those it’s a quick and easy process to delete the description and move on. Example

The more complicated cases can be categorised as follows:

Places that are missing a main feature tag

Some businesses are not tagged as anything, but instead they just have a name, address, description and with some luck a website. A close inspection is needed to figure out how to tag these businesses. Example

Descriptions with additional information to tag

Some descriptions contain both the unwanted spam and some useful information. It might be a hotel that offers free wi-fi, a pet-friendly café or an insurance company that mentions its phone number in the description tag. With enough understanding of OSM’s tagging practices it’s possible to turn this SEO spam into proper tags like internet_access, dog=yes or (contact:)phone=*. Example
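As a rough illustration of this conversion, the sketch below scans a description for the three kinds of information mentioned above and suggests tags. The function name `suggest_tags`, the keyword patterns and the phone number pattern are assumptions made for this example, and the sample description and phone number are invented. The output is only a suggestion that a mapper still has to check by hand.

```python
import re

# Loose pattern for something that looks like a phone number (an assumption,
# not a validator): optional "+", then digits mixed with common separators.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def suggest_tags(description):
    """Suggest OSM tags for useful facts hidden in a description."""
    text = description.lower()
    tags = {}
    if re.search(r"\bfree wi-?fi\b", text):
        tags["internet_access"] = "wlan"
        tags["internet_access:fee"] = "no"
    if re.search(r"\bpet[- ]friendly\b", text):
        tags["dog"] = "yes"
    match = PHONE_RE.search(description)
    if match:
        tags["phone"] = match.group(0)
    return tags


if __name__ == "__main__":
    sample = "Award winning hotel with free Wi-Fi, call +31 318 123456"
    print(suggest_tags(sample))
```

Anything the sketch does not recognise, including the spam itself, is simply left out of the suggested tags.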

Chaos

Sometimes a person manages to fit too much SEO spam into an OSM object. There might be emojis, inviting messages in bold text, names fully capitalised, 15+ payment options, cuisine tags packed with all the drinks that are offered, an image that’s just the logo, website tags that lead to a review page and address tags that link to Google Maps, all on top of the usual shenanigans. There is no way to speedrun a cleanup of these objects; they need to be inspected one tag at a time. I have seen too many of these and am now contemplating a position as a monk at the nearest monastery. Example

False positives

Sometimes buzzwords like “famous for” and “the best” are not intended to allure potential customers, but are somehow part of neutral descriptions of places. I saved the IDs of the false positives I found to exclude them from the query. Example
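The exclusion can also be done after the fact. The sketch below filters a list of elements, shaped like the `elements` array in Overpass JSON output, against a set of known false positives. This is a different approach from excluding the IDs inside the query itself, and the function name and the IDs shown are hypothetical.

```python
# Hypothetical (type, id) pairs; real ones would come from earlier reviews.
FALSE_POSITIVES = {("node", 1001), ("way", 2002)}


def drop_false_positives(elements, known):
    """Return the elements whose (type, id) pair is not a known false positive."""
    return [el for el in elements if (el["type"], el["id"]) not in known]


if __name__ == "__main__":
    results = [
        {"type": "node", "id": 1001, "tags": {"description": "famous for its dunes"}},
        {"type": "node", "id": 1002, "tags": {"description": "the best service in town"}},
    ]
    for element in drop_false_positives(results, FALSE_POSITIVES):
        print(element["type"], element["id"])
```

Storing the type together with the ID matters, because nodes, ways and relations each have their own ID space.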

Editing

To edit the tags of these objects I mainly use Overpass Turbo in conjunction with the OSM tags editor, which is an extension for Google Chrome that lets you edit tags with a minimalistic UI directly on osm.org. My main considerations are speed and simplicity, but for more versatility, like the ability to remove duplicate POIs or to have a validator tool, it pays off to choose JOSM or iD/Rapid instead.

MapRoulette

I have created MapRoulette challenges to ask for help with reviewing and removing business descriptions. So far, some helpful mappers have removed roughly 700 unwanted descriptions globally. These challenges only feature nodes for now. I just uploaded a new version of the challenge here.

Conclusion

OpenStreetMap is becoming an increasingly interesting medium for firms to make their presence known to the world. We generally welcome their contributions to the map, but since these people usually don’t return to OSM after their initial effort to map their businesses, we need to have a good look at their work to ensure that it meets community standards. I am taking a deep dive into the descriptions they add, and after working my way through hundreds of them I can conclude that there is a lot of room for improvement, either through removing spam or through converting it to other useful tags. As with everything else in OSM, it is an effort to which anyone can contribute.

Congratulations on making it to the end of my essay. Thank you for reading this.

P.S. I created a forum thread in which we can discuss this topic.

My first day of editing in JOSM

Posted by Friendly_Ghost on 19 March 2020 in English.

Today has been my first day of contributing to OSM. My hometown is still as good as empty, so I have a lot of work to do. I already added several POIs (mainly shops) and put some roads where they’re supposed to be. I have a lot to learn, but I already got a lot of help to get started. My goal is to rival Google Maps in completeness in about a 1 km radius from where I live.

Location: Bennekom Centrum, Bennekom, Ede, Gelderland, Netherlands