Nextcloud Memories: Putting Thousands of Photos on the World Map

Varun Patil
Feb 11, 2023

Over the last few months, I’ve been working on Memories, a free and fully featured photo suite designed to rival the likes of Google, Amazon and Apple Photos. An important missing piece of Memories since its inception has been geolocation. Modern cameras and phones tag photos with EXIF data containing the latitude and longitude of where they were taken; Memories made little to no use of this metadata. In this post, I will briefly describe the journey of arriving at the designs of two exciting new components introduced in v4.11:

  1. Reverse Geocoding: Getting the name of the location.
  2. Map Clustering: Showing the photos on an actual map. If you’re here just for this part, skip ahead!

Both of these use unconventional designs compared to existing open source implementations, so this will hopefully be interesting!

Photo map in Memories

Reverse Geocoding

The first part of the story is to look up the name of the place where a photo was taken, so that the user can, for example, browse all photos taken at a particular location.

In the spirit of not reinventing the wheel, I started by examining the approaches taken by other FOSS photo libraries that support this feature (maybe I missed some!):

  1. LibrePhotos: Uses the Mapbox API, which provides a lot of free lookups. This wasn’t an option, since the solution had to be completely free and private.
  2. PhotoPrism: Uses a special free API that doesn’t leak the coordinates. For similar reasons, this didn’t provide a satisfactory solution, since such a service still needs to be rate limited, for example, to prevent abuse.
  3. Immich: From the docs, Immich uses the free Geonames dataset for reverse geocoding (through the local-reverse-geocoder package). This was the strongest contender, since it provided a completely “offline” solution. But there was a problem.

Why not Geonames?

The Geonames dataset provides granularity up to the level of cities. Each city is represented by a single point: the coordinates of the “center” of the city. It also carries other useful metadata such as the name of the city, state, country, etc. Using this database for geocoding gives two advantages (besides being offline):

  1. It’s tiny. The cities500 database is just 10MB zipped, and covers roughly every city in the world with population >500.
  2. It’s easy to query. Since all you need (and can) do is find the nearest point, lookups are straightforward (see the sketch below).
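
For illustration, here is a minimal sketch of what such a nearest-point lookup could look like (the cities500.txt file name and column positions follow the Geonames dump format; this is not the actual Immich or Memories implementation):

    import csv
    from math import radians, sin, cos, asin, sqrt

    def haversine_km(lat1, lon1, lat2, lon2):
        # Great-circle distance between two coordinates, in kilometers.
        dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
        a = (sin(dlat / 2) ** 2
             + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
        return 6371 * 2 * asin(sqrt(a))

    def load_cities(path="cities500.txt"):
        # Each tab-separated row holds the city name (column 1) and its
        # center coordinates (columns 4 and 5).
        with open(path, encoding="utf-8") as f:
            reader = csv.reader(f, delimiter="\t")
            return [(row[1], float(row[4]), float(row[5])) for row in reader]

    def nearest_city(cities, lat, lon):
        # Brute-force scan; a real implementation would use a spatial index.
        return min(cities, key=lambda c: haversine_km(lat, lon, c[1], c[2]))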

The problem: granularity. In urban areas where cities may be clustered closely together, looking up the closest point more often than not yields an incorrect result. Here’s an example:

| Coordinates            | Actual location | Closest Geonames point |
| 34.044720, -118.259246 | Los Angeles     | Los Angeles            |
| 34.044720, -118.259246 | Los Angeles     | Westwood               |
| 33.961000, -118.310100 | Los Angeles     | Westmont               |
| 33.963117, -118.337780 | Inglewood       | Inglewood              |

The reason behind this becomes quite obvious when you look at a map of Los Angeles. With just the coordinates of the center, it is impossible to determine which city a given point falls in, considering LA County has 88 cities.

Map of the City of Los Angeles

The solution to the problem is simple — we need to store the actual boundaries of the cities.

Boundary Dataset

The most popular available data with these boundaries (and the source of the image above) is OpenStreetMap. Unfortunately, there is no OSM dataset containing only boundaries. The available dump of the entire planet is an unworkable 67GB, stored as a compact binary representation (PBF) that cannot be queried directly.

So the first task was to remove all data from the binary planet that we didn’t need. Fortunately, there are many tools for working with OSM, and keeping only boundaries from the dataset was quite easy with Osmium.

osmium tags-filter planet.osm.pbf wra/boundary=administrative -o boundaries.osm.pbf

Doing this brings down the size of the dump to 1GB! That is good already, but remember this is a binary format that cannot be queried. Converting it directly to an SQLite database would create a sizable database of ~8GB after indexing.

At this point, I noted that many of these boundaries were extremely detailed, and such detail is largely irrelevant to the users of a photo suite. As such, the boundaries could be simplified by a large margin. The shapely Python package provides an easy way to simplify polygons. After some clever simplification, the SQLite database came down to 3GB!
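
Here is a rough sketch of that simplification step with shapely (the tolerance value here is an assumption, not the value used by Memories):

    from shapely.geometry import shape, mapping

    def simplify_boundary(geojson_geometry, tolerance=0.001):
        # Convert the GeoJSON geometry to a shapely object, simplify it while
        # preserving topology, and convert it back. The tolerance is in degrees
        # (roughly 100 m at the equator) and is an assumed value.
        geom = shape(geojson_geometry)
        return mapping(geom.simplify(tolerance, preserve_topology=True))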

There was one piece left. To run spatial queries on a SQLite database with boundary data, one would need the SpatiaLite extension. Installing it would be extra maintenance overhead, as well as more moving parts to break.

Fortunately, the recommended way to run Nextcloud is with MySQL/MariaDB or Postgres, and both support basic querying and indexing of polygon data. So finally, I simply dumped all the simplified boundaries as JSON into a file that can be directly imported into the database. The result: a dataset of 80MB zipped before import, and sub-millisecond query times for the entire planet! And as far as accuracy goes,

| Coordinates            | Actual location | Boundary lookup |
| 34.044720, -118.259246 | Los Angeles     | Los Angeles     |
| 34.044720, -118.259246 | Los Angeles     | Los Angeles     |
| 33.961000, -118.310100 | Los Angeles     | Los Angeles     |
| 33.963117, -118.337780 | Inglewood       | Inglewood       |

Yay! Add in a bit of UX and Memories now has reverse geocoding!
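
For the curious, the lookup essentially boils down to a point-in-polygon test against the simplified boundaries. In Memories this runs as a query inside MySQL/MariaDB or Postgres, but the idea can be sketched with shapely (the boundaries.json file name and record layout here are assumptions):

    import json
    from shapely.geometry import shape, Point

    def load_boundaries(path="boundaries.json"):
        # Each record is assumed to carry a place name and a GeoJSON geometry.
        with open(path, encoding="utf-8") as f:
            return [(rec["name"], shape(rec["geometry"])) for rec in json.load(f)]

    def reverse_geocode(boundaries, lat, lon):
        # shapely points are (x, y), i.e. (longitude, latitude).
        point = Point(lon, lat)
        for name, boundary in boundaries:
            if boundary.contains(point):
                return name
        return None  # ocean, or no boundary matched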

Offline Reverse Geocoding in Memories

Map Clustering

The simplest approach to showing photos on a map is to load data about all the photos and then cluster them client-side. Such clustering has been highly optimized, and libraries such as Leaflet.markercluster are known to work with tens of thousands of points.

However, with large photo libraries, the entire pipeline of querying, serializing, transferring and deserializing (before the clustering itself) could seriously compromise the user experience. As a result, when building a map, this clustering needs to be done on the server, preferably by the RDBMS itself.

This crucial feature was contributed by Raymond Huang from Purdue University. By simply running a GROUP BY query on the latitude and longitude divided by a fixed cell size, we can divide the planet into progressively smaller sections inside the database, then SELECT the AVG of the points inside each section to show to the user.

As the user zooms in, we can shrink the section size and progressively show the user more clusters of photos. As long as the section sizes differ by powers of 2, the boundaries of smaller sections line up with those of larger ones, leading to a smooth user experience.
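
A minimal Python sketch of the idea follows; in Memories the grouping and averaging happen inside the SQL query itself, and the base cell size here is an assumption:

    from collections import defaultdict

    def cluster_for_zoom(photos, zoom, base_cell=45.0):
        # The cell size halves with every zoom level (powers of 2), so the
        # boundaries of finer grids line up with those of coarser ones.
        cell = base_cell / (2 ** zoom)
        buckets = defaultdict(list)
        for lat, lon in photos:
            # Dividing by the cell size and flooring assigns each photo to a
            # grid section, mirroring the GROUP BY key in the database.
            buckets[(int(lat // cell), int(lon // cell))].append((lat, lon))
        # Each cluster marker sits at the average position of its photos,
        # i.e. SELECT AVG(lat), AVG(lon), COUNT(*) per section.
        return [
            (sum(p[0] for p in pts) / len(pts),
             sum(p[1] for p in pts) / len(pts),
             len(pts))
            for pts in buckets.values()
        ]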

Cluster divisions for coarse (blue) and progressively finer (red) scales

However, such a query would require scanning over the entire database every time the user zooms in slightly or pans the map (since we would filter by the visible bounds).

To resolve this, we need to consider how people actually take photos. Most of our photos are concentrated around a few locations (say home or work). Even when visiting some place, you are very likely to take multiple photos at the same spot. Eventually, these photos would show up as a single point on the map.

We can leverage this by pre-clustering photos within a small radius. Setting this radius, somewhat arbitrarily, to approximately 30 meters, any photos within this range are eagerly clustered together before the map query is ever run. This way, when we do run the query, we can simply query the clusters instead of the photos, and get almost exactly the same result with an order of magnitude lower overhead!
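
The eager pre-clustering can be sketched as follows. This is a simplified illustration with a linear scan over in-memory clusters; the actual implementation works against the database and its indexes:

    from math import radians, sin, cos, asin, sqrt

    CLUSTER_RADIUS_M = 30  # photos within ~30 meters collapse into one pre-cluster

    def distance_m(lat1, lon1, lat2, lon2):
        # Great-circle distance between two coordinates, in meters.
        dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
        a = (sin(dlat / 2) ** 2
             + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
        return 6371000 * 2 * asin(sqrt(a))

    def add_photo(clusters, lat, lon):
        # Attach the photo to an existing nearby cluster, updating its running
        # centroid, or start a new cluster if none is within the radius.
        for c in clusters:
            if distance_m(c["lat"], c["lon"], lat, lon) <= CLUSTER_RADIUS_M:
                c["lat"] = (c["lat"] * c["count"] + lat) / (c["count"] + 1)
                c["lon"] = (c["lon"] * c["count"] + lon) / (c["count"] + 1)
                c["count"] += 1
                return
        clusters.append({"lat": lat, "lon": lon, "count": 1})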

Finally, we need to show these to the user. Fortunately, this is rather simple, with a combination of Leaflet and some CSS transition tricks. With one last query to show the user the photos within the current map bounds, we have our beautiful and efficient map ready for prime time!
