Elasticsearch in Action: Geospatial Data Types
This excerpt is taken from my book Elasticsearch in Action, Second Edition. The code is available in my GitHub repository, which includes executable Kibana scripts so you can run the commands in Kibana straight away. All code is tested against Elasticsearch 8.4.
In the last article, we looked at the fundamentals of location search. In this article, we look at the data types provided by Elasticsearch for geo-search.
Just as textual data is represented by the text data type, Elasticsearch provides two dedicated data types for spatial data: geo_point and geo_shape. The geo_point data type expresses a location as a latitude and longitude pair and supports location-based queries. The geo_shape type, on the other hand, lets us index geoshapes such as points, multilines, polygons, and a few others. Let’s look at these spatial data types in the following sections.
The geo_point data type
A location on a map is expressed universally by longitude and latitude. Elasticsearch supports the representation of such location data using a dedicated geo_point data type. The following listing creates the mapping for the bus_stops index with a couple of fields; once the mapping is ready, we can index a document.
PUT bus_stops
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "location": {
        "type": "geo_point"
      }
    }
  }
}
The bus_stops index is defined with two properties: name and location. The location field is a geo_point, which means it expects latitude and longitude values when a document is indexed. The request in the next listing indexes the London Bridge Station bus stop.
POST bus_stops/_doc
{
  "name": "London Bridge Station",
  "location": "51.07, 0.08"
}
As the request shows, the location field is given the latitude and longitude as a comma-separated string: "51.07, 0.08". The string format is not the only way to set the location field. We can also provide the coordinates as an object, an array, a well-known text (WKT) point, or a geohash. The following listing shows each of these input formats.
# As a WKT point (lon lat)
POST bus_stops/_doc
{
  "name": "London Victoria Station",
  "location": "POINT (0.14 51.49)"
}
# As a location object
POST bus_stops/_doc
{
  "name": "Leicester Square Station",
  "location": {
    "lon": -0.12,
    "lat": 51.50
  }
}
# As an array [lon, lat]
POST bus_stops/_doc
{
  "name": "Westminster Station",
  "location": [0.23, 51.54]
}
# As a geohash
POST bus_stops/_doc
{
  "name": "Hyde Park Station",
  "location": "gcpvh2bg7sff"
}
These requests index bus stop locations using several formats. The coordinates can be supplied as a latitude and longitude string, as in the earlier listing, or as a WKT-formatted POINT, an object, an array, or a geohash.
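To make the coordinate order of each format concrete, here is a minimal Python sketch (standard library only) that normalizes a geo_point value to a (lat, lon) pair. The function name parse_geo_point is hypothetical and is not part of any Elasticsearch client; Elasticsearch does this parsing itself at index time. Geohash decoding is left out for brevity.

```python
import re

# Matches a WKT point such as "POINT (-0.12 51.50)"
_WKT_POINT = re.compile(r"\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*", re.IGNORECASE)

def parse_geo_point(value):
    """Return (lat, lon) for a geo_point given as an object, array, WKT, or string.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, dict):
        # Object form: explicit keys, so the order does not matter
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, (list, tuple)):
        # Array form: [lon, lat]
        lon, lat = value
        return float(lat), float(lon)
    if isinstance(value, str):
        wkt = _WKT_POINT.fullmatch(value)
        if wkt:
            # WKT form: POINT (lon lat)
            return float(wkt.group(2)), float(wkt.group(1))
        if "," in value:
            # Plain string form: "lat, lon"
            lat, lon = value.split(",")
            return float(lat), float(lon)
        raise ValueError(f"unsupported string format (geohash?): {value!r}")
    raise TypeError(f"unsupported geo_point value: {value!r}")

# The same location in four formats resolves to the same pair
print(parse_geo_point("51.50, -0.12"))
print(parse_geo_point({"lon": -0.12, "lat": 51.50}))
print(parse_geo_point([-0.12, 51.50]))
print(parse_geo_point("POINT (-0.12 51.50)"))
```

Only the plain string puts latitude first; the array and WKT forms both put longitude first.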
Now that we understand the geo_point data type, it’s time to learn about the second type: geo_shape. As the name indicates, the geo_shape type helps index and search data using a particular shape, for example, a polygon. Let’s look at how we can index data for geoshapes.
The geo_shape data type
Whereas the geo_point type represents a single point on the map, Elasticsearch provides the geo_shape data type to represent shapes such as points, multipoints, lines, and polygons. The shapes are represented using an open standard called GeoJSON (http://geojson.org) and, accordingly, are written in JSON format. These geometric shapes are mapped to the geo_shape data type.
Let’s first create the mapping for a cafes index with a couple of fields. One of them is the address field, which points to the location of a cafe and is defined as a geo_shape type. The following listing demonstrates this.
PUT cafes
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text"
      },
      "address": {
        "type": "geo_shape"
      }
    }
  }
}
The code creates an index called cafes to house local restaurants. The notable field is address, which is defined as a geo_shape type. This type expects shapes as input in either GeoJSON or WKT format. For example, to represent a point on a map, we can provide the field as a Point in GeoJSON or a POINT in WKT, as this listing demonstrates.
# Inputting the address in GeoJSON format
PUT cafes/_doc/1
{
  "name": "Costa Coffee",
  "address": {
    "type": "Point",
    "coordinates": [0.17, 51.57]
  }
}
# Inputting the address in WKT format
PUT cafes/_doc/2
{
  "address": "POINT (0.17 51.57)"
}
This code shows two ways to input a geo_shape field: GeoJSON and WKT. GeoJSON expects a type attribute naming the shape ("type": "Point") and the corresponding coordinates ("coordinates": [0.17, 51.57]), as in the first example. The second example creates the same point using the WKT format ("address": "POINT (0.17 51.57)").
Note: There is a subtle difference in coordinate order between the string format and the other formats. The string format expects latitude followed by longitude, separated by a comma; for example, "51.57, 0.17". GeoJSON and WKT reverse the order to longitude followed by latitude; for example, "POINT (0.17 51.57)". The array format also takes longitude first.
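The ordering rule in the note can be sketched in a few lines of Python. Assuming we hold a location as a latitude and longitude pair, the hypothetical helpers below (not part of any Elasticsearch client) produce each representation with the coordinates in the order that format expects.

```python
import json

def to_latlon_string(lat, lon):
    # geo_point string form: latitude first, comma-separated
    return f"{lat}, {lon}"

def to_wkt_point(lat, lon):
    # WKT form: longitude first, space-separated
    return f"POINT ({lon} {lat})"

def to_geojson_point(lat, lon):
    # GeoJSON form: coordinates are [lon, lat]
    return {"type": "Point", "coordinates": [lon, lat]}

lat, lon = 51.57, 0.17
print(to_latlon_string(lat, lon))              # 51.57, 0.17
print(to_wkt_point(lat, lon))                  # POINT (0.17 51.57)
print(json.dumps(to_geojson_point(lat, lon)))  # {"type": "Point", "coordinates": [0.17, 51.57]}
```

Keeping the swap in one place like this is a design choice that avoids silently indexing a point with its latitude and longitude transposed.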
We can build various shapes using these formats. The table below provides a brief description of a few of them. I suggest consulting the Elasticsearch documentation on indexing and searching geoshapes to understand the concepts and examples in detail.
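As one illustration of a shape other than a point, the following Python sketch builds a GeoJSON Polygon body that could be supplied as the value of a geo_shape field. The helper name and the corner coordinates are made up for the example. It assumes the input is a list of (lat, lon) pairs, swaps each pair to GeoJSON's [lon, lat] order, and closes the ring, since GeoJSON requires the first and last positions of a polygon ring to be identical.

```python
def geojson_polygon(latlon_points):
    """Build a GeoJSON Polygon with a single outer ring from (lat, lon) pairs.

    Hypothetical helper for illustration only.
    """
    # GeoJSON positions are [lon, lat], so swap each pair
    ring = [[lon, lat] for lat, lon in latlon_points]
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three points")
    if ring[0] != ring[-1]:
        # Close the ring by repeating the first position
        ring.append(list(ring[0]))
    return {"type": "Polygon", "coordinates": [ring]}

# Made-up rectangle, corners listed counterclockwise
area = geojson_polygon([
    (51.50, -0.14),
    (51.50, -0.10),
    (51.52, -0.10),
    (51.52, -0.14),
])
print(area)
```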
That’s pretty much it for the geo data types. Don’t forget to read the last article, which describes the basics of location search.