Technical White Paper
Introduction
In the beginning, tzf was designed for backend services that need to convert coordinates to timezones, mostly for geo and weather services.
As the project evolved, we need to add Python support since timezonefinder’s speed around borders could not satisfy our needs back then.
Thats why tzf(Golang), tzfpy(Python), tzf-rs(Rust) were created.
It’s clear that we need a high performance library for this purpose, and we can accept not too accurate around borders, let’s say incorrect result around borders 1KM range is acceptable for us.
So we need a library that can:
- Convert coordinates to timezones.
- Performance is more important than accuracy
- At least support Go and Python. (Rust was developed because of the the ecosystem of PyO3).
- Less distribution/binary size, since we need to use it in backend services.
In these white paper, two topics will be explained in detail:
- Polygon simplification
- Tile based indexing
Others, like the implementation details of the library, will be explained in the codebase.
Polygon simplification
First, we converted the GeoJSON polygon data into a binary encoding using Protocol Buffers, reducing the file size by approximately 80 MB:
1message Point {
2 float lng = 1;
3 float lat = 2;
4}
5
6message Polygon {
7 repeated Point points = 1;
8 repeated Polygon holes = 2;
9}
10
11message Timezone {
12 repeated Polygon polygons = 1;
13 string name = 2;
14}
15
16message Timezones {
17 repeated Timezone timezones = 1;
18}
This conversion still required about 900 MB of memory to load. To further optimize, we applied the Ramer–Douglas–Peucker (RDP) algorithm to simplify polygon shapes by reducing the number of points:
The effect of varying epsilon in a parametric implementation of RDP, source
After filtering, the data size dropped to approximately 11 MB. Finally, we employed Google Maps’ Polyline encoding algorithm to compress the coordinate sequences into a compact ASCII representation, reducing the file size to about 4.6 MB for efficient distribution.
If you want to know the actual parameter for the simplification, you can see the code.
Tile based indexing
A naïve Ray Casting approach operates in O(n^2) time, which is unsuitable for high-concurrency backend services. We considered spatial R-trees but found minimal performance gains given the small number of global time zones and their uneven area distributions.
Instead, we adopted a tile-based indexing scheme inspired by map tile formats used in weather data services. Each tile defines a quadrilateral region at a given zoom level:
1┌───────────┬───────────┬───────────┐
2│ │ │ │
3│ │ │ │
4│ x-1,y-1,z │ x+0,y-1,z │ x+1,y-1,z │
5│ │ │ │
6│ │ │ │
7├───────────┼───────────┼───────────┤
8│ │ │ │
9│ │ │ │
10│ x-1,y+0,z │ x+0,y+0,z │ x+1,y+0,z │
11│ │ │ │
12│ │ │ │
13├───────────┼───────────┼───────────┤
14│ │ │ │
15│ │ │ │
16│ x-1,y+1,z │ x+0,y+1,z │ x+1,y+1,z │
17│ │ │ │
18│ │ │ │
19└───────────┴───────────┴───────────┘
This quadtree-like layout ensures parent tiles contain exactly four child tiles, allowing aggregation without gaps:
Tile-based timezone index demo(not the real data being used), A live demo to show polygon and it’s index can be view via tzf-web
When querying a point’s timezone, we identify the corresponding tile at the precomputed zoom level and only test against the small subset of polygons indexed within that tile, achieving near-constant query time.