The Sanctuary

Writing about interests; Computer Science, Philosophy, Mathematics and AI.

The Haversine Formula: Building a Store Locator in MUMPS

MUMPSCachégeolocationalgorithmsHaversine

A deep dive into the Haversine formula and its implementation for geospatial store ranking in InterSystems Caché.

I. The Problem

A user opens a retail website. They want to find the nearest store. The browser knows their coordinates. The database holds the coordinates of every store. The question seems simple: which store is closest?

But the Earth is not flat. Euclidean distance—the straight line between two points—fails spectacularly when applied to a sphere. A degree of longitude near the equator spans roughly 111 kilometers; near the poles, it collapses toward zero. Any naive calculation that treats latitude and longitude as Cartesian coordinates will produce nonsense.

We need spherical geometry. We need the Haversine formula.

II. The Haversine Formula

The Haversine formula calculates the great-circle distance between two points on a sphere, given their latitudes and longitudes. It is named after the haversine function:

$$\text{hav}(\theta) = \sin^2\left(\frac{\theta}{2}\right)$$

The haversine of an angle is simply the square of the sine of half that angle. This function has a useful property: it remains numerically stable for small angles, avoiding the floating-point errors that plague direct cosine-based calculations.

Given two points with coordinates $(\phi_1, \lambda_1)$ and $(\phi_2, \lambda_2)$, where $\phi$ represents latitude and $\lambda$ represents longitude (both in radians), the Haversine formula proceeds as follows:

Step 1: Calculate the haversine of the central angle

$$a = \sin^2\left(\frac{\phi_2 - \phi_1}{2}\right) + \cos(\phi_1) \cdot \cos(\phi_2) \cdot \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)$$

Step 2: Calculate the central angle

$$c = 2 \cdot \text{atan2}\left(\sqrt{a}, \sqrt{1-a}\right)$$

Step 3: Calculate the distance

$$d = R \cdot c$$

Where $R$ is the Earth’s radius. For kilometers, $R \approx 6371$. For miles, $R \approx 3959$.

The result $d$ is the shortest distance between the two points along the surface of the Earth—the path an aircraft would fly.

III. Why Haversine?

Several formulas exist for calculating spherical distances. The spherical law of cosines is simpler:

$$d = R \cdot \arccos\left(\sin(\phi_1) \cdot \sin(\phi_2) + \cos(\phi_1) \cdot \cos(\phi_2) \cdot \cos(\lambda_2 - \lambda_1)\right)$$

But this formula suffers from numerical instability when the two points are very close together. The argument to $\arccos$ approaches 1, and floating-point rounding errors can push it beyond the valid domain $[-1, 1]$, causing computation failures.

The Haversine formula avoids this problem. By using the haversine function, it remains accurate even for distances of a few meters. For a store locator where users might be standing next to a store, this precision matters.

Vincenty’s formulae offer even greater accuracy by modeling the Earth as an oblate spheroid rather than a perfect sphere. But for distances under a few hundred kilometers—the typical range of a store locator—Haversine’s error is negligible, and its computational simplicity is preferable.

IV. The Algorithm

With the distance formula established, the store locator algorithm becomes straightforward:

  1. Obtain user coordinates via the browser’s Geolocation API
  2. Retrieve all store coordinates from the database
  3. Calculate the distance from the user to each store using Haversine
  4. Sort stores by distance in ascending order
  5. Return the ranked list to the frontend

The computational complexity is $O(n \log n)$ where $n$ is the number of stores—linear traversal for distance calculation, plus the sort. For a retail chain with hundreds or even thousands of locations, this executes in milliseconds.

For larger datasets, spatial indexing structures like R-trees or geohashing can reduce the search space. But for most retail applications, the brute-force approach is fast enough and far simpler to implement and maintain.

V. Implementation in MUMPS

InterSystems Caché—the modern incarnation of MUMPS—handles this problem elegantly. If you are unfamiliar with MUMPS, I have written an introduction to the language that covers its unique characteristics.

First, the Haversine calculation itself:

Haversine(lat1,lon1,lat2,lon2)
    ; Returns distance in kilometers
    NEW pi,R,dLat,dLon,a,c
    SET pi=3.141592653589793
    SET R=6371
    ; Convert degrees to radians
    SET lat1=lat1*pi/180
    SET lat2=lat2*pi/180
    SET dLat=(lat2-lat1)
    SET dLon=(lon2-lon1)*pi/180
    ; Haversine formula
    SET a=$ZSIN(dLat/2)**2+($ZCOS(lat1)*$ZCOS(lat2)*$ZSIN(dLon/2)**2)
    SET c=2*$ZARCTAN($ZSQR(a),$ZSQR(1-a))
    QUIT R*c

The $Z functions are Caché extensions providing trigonometric operations. The calculation mirrors the mathematical formula exactly.

Next, the ranking logic:

RankStores(userLat,userLon)
    ; Build a distance-indexed list of stores
    NEW storeId,storeLat,storeLon,dist
    SET storeId=""
    FOR  SET storeId=$ORDER(^Stores(storeId)) QUIT:storeId=""  DO
    . SET storeLat=^Stores(storeId,"lat")
    . SET storeLon=^Stores(storeId,"lon")
    . SET dist=$$Haversine(userLat,userLon,storeLat,storeLon)
    . ; Index by distance (with storeId to handle equal distances)
    . SET ^TMP($JOB,"dist",dist,storeId)=""
    ; Traverse in order (MUMPS globals are automatically sorted)
    NEW rank SET rank=0
    SET dist=""
    FOR  SET dist=$ORDER(^TMP($JOB,"dist",dist)) QUIT:dist=""  DO
    . SET storeId=""
    . FOR  SET storeId=$ORDER(^TMP($JOB,"dist",dist,storeId)) QUIT:storeId=""  DO
    . . SET rank=rank+1
    . . ; Output or build response
    . . WRITE rank,": Store ",storeId," - ",dist," km",!
    ; Cleanup
    KILL ^TMP($JOB,"dist")
    QUIT

The elegance here lies in MUMPS’s native behavior: global subscripts are automatically maintained in sorted order. By indexing stores under their calculated distance, we get sorting for free. The $ORDER function traverses this sorted structure, yielding stores from nearest to farthest without an explicit sort operation.

VI. The Browser Integration

The frontend obtains coordinates through the Geolocation API:

navigator.geolocation.getCurrentPosition(
    (position) => {
        const { latitude, longitude } = position.coords;
        fetch(`/api/stores/nearest?lat=${latitude}&lon=${longitude}`)
            .then(response => response.json())
            .then(stores => renderStoreList(stores));
    },
    (error) => {
        // Fall back to IP geolocation or manual entry
        handleGeolocationError(error);
    }
);

The backend receives these coordinates, executes the MUMPS ranking routine, and returns the sorted store list as JSON. The entire round trip—from permission grant to rendered results—typically completes in under a second.

Error handling deserves attention. Users may deny location access, or the browser may fail to obtain a fix. A robust implementation falls back to IP-based geolocation (less accurate but better than nothing) or prompts the user to enter their postal code manually.

VII. Reflections

There is something satisfying about implementing a spherical trigonometry formula in a language designed in 1966 for medical records. MUMPS was never intended for geospatial computation, yet its sorted globals and terse syntax make the solution almost elegant.

The Haversine formula itself carries a quiet beauty. It reduces a complex geometric problem—finding the shortest path across a curved surface—to a handful of trigonometric operations. Sailors used these calculations to navigate oceans centuries before GPS satellites existed. We use them to help someone find the nearest store that stocks their preferred brand of coffee.

The mathematics endures. The applications change. And occasionally, an aging language proves surprisingly well-suited to a modern problem.


The Haversine Formula: Building a Store Locator in MUMPS

A deep dive into the Haversine formula and its implementation for geospatial store ranking in InterSystems Caché.

Achraf SOLTANI — August 15, 2019