[[geoip-processor]]
=== GeoIP processor
++++
<titleabbrev>GeoIP</titleabbrev>
++++
The `geoip` processor adds information about the geographical location of an
IPv4 or IPv6 address.

[[geoip-automatic-updates]]
By default, the processor uses the GeoLite2 City, GeoLite2 Country, and GeoLite2
ASN GeoIP2 databases from
http://dev.maxmind.com/geoip/geoip2/geolite2/[MaxMind], shared under the
CCA-ShareAlike 4.0 license. {es} automatically downloads updates for
these databases from the Elastic GeoIP endpoint:
https://geoip.elastic.co/v1/database. To get download statistics for these
updates, use the <<geoip-stats-api,GeoIP stats API>>.
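
As a minimal sketch, the statistics request takes no body. The response fields depend on your {es} version, so no output is shown here:

[source,console]
--------------------------------------------------
GET _ingest/geoip/stats
--------------------------------------------------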
If your cluster can't connect to the Elastic GeoIP endpoint or you want to
manage your own updates, see <<manage-geoip-database-updates>>.

If {es} can't connect to the endpoint for 30 days, all updated databases become
invalid. {es} then stops enriching documents with geoip data and adds a
`tags: ["_geoip_expired_database"]` field instead.
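
One way to spot documents that were ingested while the databases were expired is to search for that tag. This is only a sketch: it assumes the example index name `my-index-000001` and that the `tags` field was mapped dynamically.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "match": {
      "tags": "_geoip_expired_database"
    }
  }
}
--------------------------------------------------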
[[using-ingest-geoip]]
==== Using the `geoip` Processor in a Pipeline

[[ingest-geoip-options]]
.`geoip` options
[options="header"]
|======
| Name | Required | Default | Description
| `field` | yes | - | The field to get the IP address from for the geographical lookup.
| `target_field` | no | geoip | The field that will hold the geographical information looked up from the MaxMind database.
| `database_file` | no | GeoLite2-City.mmdb | The database filename referring to a database the module ships with (GeoLite2-City.mmdb, GeoLite2-Country.mmdb, or GeoLite2-ASN.mmdb) or a custom database in the `ingest-geoip` config directory.
| `properties` | no | [`continent_name`, `country_iso_code`, `country_name`, `region_iso_code`, `region_name`, `city_name`, `location`] * | Controls what properties are added to the `target_field` based on the geoip lookup.
| `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document.
| `first_only` | no | `true` | If `true`, only the first geoip data found is returned, even if `field` contains an array.
|======

*Depends on what is available in `database_file`:

* If the GeoLite2 City database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name`, `continent_name`, `region_iso_code`, `region_name`, `city_name`, `timezone`, `latitude`, `longitude`
and `location`. The fields actually added depend on what has been found and which properties were configured in `properties`.
* If the GeoLite2 Country database is used, then the following fields may be added under the `target_field`: `ip`,
`country_iso_code`, `country_name` and `continent_name`. The fields actually added depend on what has been found and which properties
were configured in `properties`.
* If the GeoLite2 ASN database is used, then the following fields may be added under the `target_field`: `ip`,
`asn`, `organization_name` and `network`. The fields actually added depend on what has been found and which properties were configured
in `properties`.
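
To try out the options above without creating a pipeline or indexing anything, you could use the simulate pipeline API. The following is a sketch rather than a tested snippet: the chosen `properties` subset and the two sample documents are illustrative assumptions. The second document has no `ip` field and, because `ignore_missing` is `true`, should pass through unmodified.

[source,console]
--------------------------------------------------
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "geoip": {
          "field": "ip",
          "properties": ["country_iso_code", "city_name"],
          "ignore_missing": true
        }
      }
    ]
  },
  "docs": [
    { "_source": { "ip": "8.8.8.8" } },
    { "_source": { "message": "no ip field" } }
  ]
}
--------------------------------------------------

Only the requested properties that the database actually contains for an address are added, so a result may include fewer fields than listed in `properties`.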
Here is an example that uses the default city database and adds the geographical information to the `geoip` field based on the `ip` field:

[source,console]
--------------------------------------------------
PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip"
      }
    }
  ]
}

PUT my-index-000001/_doc/my_id?pipeline=geoip
{
  "ip": "8.8.8.8"
}

GET my-index-000001/_doc/my_id
--------------------------------------------------

Which returns:

[source,console-result]
--------------------------------------------------
{
  "found": true,
  "_index": "my-index-000001",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 55,
  "_primary_term": 1,
  "_source": {
    "ip": "8.8.8.8",
    "geoip": {
      "continent_name": "North America",
      "country_name": "United States",
      "country_iso_code": "US",
      "location": { "lat": 37.751, "lon": -97.822 }
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term":1/"_primary_term" : $body._primary_term/]

Here is an example that uses the default country database and adds the
geographical information to the `geo` field based on the `ip` field. Note that
this database is included in the module. So this:

[source,console]
--------------------------------------------------
PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip",
        "target_field" : "geo",
        "database_file" : "GeoLite2-Country.mmdb"
      }
    }
  ]
}

PUT my-index-000001/_doc/my_id?pipeline=geoip
{
  "ip": "8.8.8.8"
}

GET my-index-000001/_doc/my_id
--------------------------------------------------

returns this:

[source,console-result]
--------------------------------------------------
{
  "found": true,
  "_index": "my-index-000001",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 65,
  "_primary_term": 1,
  "_source": {
    "ip": "8.8.8.8",
    "geo": {
      "continent_name": "North America",
      "country_name": "United States",
      "country_iso_code": "US"
    }
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no": \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/]

The database does not contain geo information for every IP address. When a
lookup finds nothing, no `target_field` is inserted into the document.

Here is an example of how a document is indexed when no information can be
found for "80.231.5.0":

[source,console]
--------------------------------------------------
PUT _ingest/pipeline/geoip
{
  "description" : "Add geoip info",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip"
      }
    }
  ]
}

PUT my-index-000001/_doc/my_id?pipeline=geoip
{
  "ip": "80.231.5.0"
}

GET my-index-000001/_doc/my_id
--------------------------------------------------

Which returns:

[source,console-result]
--------------------------------------------------
{
  "_index" : "my-index-000001",
  "_id" : "my_id",
  "_version" : 1,
  "_seq_no" : 71,
  "_primary_term": 1,
  "found" : true,
  "_source" : {
    "ip" : "80.231.5.0"
  }
}
--------------------------------------------------
// TESTRESPONSE[s/"_seq_no" : \d+/"_seq_no" : $body._seq_no/ s/"_primary_term" : 1/"_primary_term" : $body._primary_term/]

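
To find documents for which the lookup added nothing, one option is to search for documents that lack the target field. This sketch assumes the default `geoip` target field and the example index `my-index-000001`:

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "bool": {
      "must_not": {
        "exists": { "field": "geoip" }
      }
    }
  }
}
--------------------------------------------------
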
[[ingest-geoip-mappings-note]]
===== Recognizing Location as a Geopoint
Although this processor enriches your document with a `location` field containing
the estimated latitude and longitude of the IP address, this field will not be
indexed as a {ref}/geo-point.html[`geo_point`] type in Elasticsearch without explicitly defining it
as such in the mapping.

You can use the following mapping for the example index above:

[source,console]
--------------------------------------------------
PUT my_ip_locations
{
  "mappings": {
    "properties": {
      "geoip": {
        "properties": {
          "location": { "type": "geo_point" }
        }
      }
    }
  }
}
--------------------------------------------------

////
[source,console]
--------------------------------------------------
PUT _ingest/pipeline/geoip
{
"description" : "Add geoip info",
"processors" : [
{
"geoip" : {
"field" : "ip"
}
}
]
}
PUT my_ip_locations/_doc/1?refresh=true&pipeline=geoip
{
"ip": "8.8.8.8"
}
GET /my_ip_locations/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_distance": {
"distance": "1m",
"geoip.location": {
"lon": -97.822,
"lat": 37.751
}
}
}
}
}
}
--------------------------------------------------
// TEST[continued]
[source,console-result]
--------------------------------------------------
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value": 1,
"relation": "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_ip_locations",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"geoip" : {
"continent_name" : "North America",
"country_name" : "United States",
"country_iso_code" : "US",
"location" : {
"lon" : -97.822,
"lat" : 37.751
}
},
"ip" : "8.8.8.8"
}
}
]
}
}
--------------------------------------------------
// TESTRESPONSE[s/"took" : 3/"took" : $body.took/]
////
[[manage-geoip-database-updates]]
==== Manage your own GeoIP2 database updates
If you can't <<geoip-automatic-updates,automatically update>> your GeoIP2
databases from the Elastic endpoint, you have a few other options:
* <<use-proxy-geoip-endpoint,Use a proxy endpoint>>
* <<use-custom-geoip-endpoint,Use a custom endpoint>>
* <<manually-update-geoip-databases,Manually update your GeoIP2 databases>>
[[use-proxy-geoip-endpoint]]
**Use a proxy endpoint**
If you can't connect directly to the Elastic GeoIP endpoint, consider setting up
a secure proxy. You can then specify the proxy endpoint URL in the
<<ingest-geoip-downloader-endpoint,`ingest.geoip.downloader.endpoint`>> setting
of each node's `elasticsearch.yml` file.
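
For illustration, the setting might look like this in `elasticsearch.yml`. The proxy URL is a hypothetical placeholder, not a real service:

[source,yaml]
----
# Hypothetical proxy URL; replace it with the address of your own proxy.
ingest.geoip.downloader.endpoint: "https://geoip-proxy.example.com/v1/database"
----
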
[[use-custom-geoip-endpoint]]
**Use a custom endpoint**
You can create a service that mimics the Elastic GeoIP endpoint. You can then
get automatic updates from this service.
. Download your `.mmdb` database files from the
http://dev.maxmind.com/geoip/geoip2/geolite2[MaxMind site].
. Copy your database files to a single directory.
. From your {es} directory, run:
+
[source,sh]
----
./bin/elasticsearch-geoip -s my/source/dir [-t target/directory]
----
. Serve the static database files from your directory. For example, you can use
Docker to serve the files from an nginx server:
+
[source,sh]
----
docker run -v my/source/dir:/usr/share/nginx/html:ro nginx
----
. Specify the service's endpoint URL in the
<<ingest-geoip-downloader-endpoint,`ingest.geoip.downloader.endpoint`>> setting
of each node's `elasticsearch.yml` file.
+
By default, {es} checks the endpoint for updates every three days. To use
another polling interval, use the <<cluster-update-settings,update cluster
settings API>> to set
<<ingest-geoip-downloader-poll-interval,`ingest.geoip.downloader.poll.interval`>>.
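
As a sketch, the polling interval could be changed with a request like the following. The `7d` value is only an example:

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.poll.interval": "7d"
  }
}
--------------------------------------------------
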
[[manually-update-geoip-databases]]
**Manually update your GeoIP2 databases**
. Use the <<cluster-update-settings,update cluster settings API>> to set
`ingest.geoip.downloader.enabled` to `false`. This disables automatic updates
that may overwrite your database changes. This also deletes all downloaded
databases.
. Download your `.mmdb` database files from the
http://dev.maxmind.com/geoip/geoip2/geolite2[MaxMind site].
+
You can also use custom city, country, and ASN `.mmdb` files. These files must
be uncompressed and use the respective `-City.mmdb`, `-Country.mmdb`, or
`-ASN.mmdb` extensions.
. On {ess} deployments, upload the database using
a {cloud}/ec-custom-bundles.html[custom bundle].
. On self-managed deployments, copy the database files to `$ES_CONFIG/ingest-geoip`.
. In your `geoip` processors, configure the `database_file` parameter to use a
custom database file.
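
The first and last steps above can be sketched as two requests. The pipeline name `geoip-custom` and the file name `my-custom-City.mmdb` are hypothetical. Use the name of the file you actually copied or uploaded:

[source,console]
--------------------------------------------------
PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.enabled": false
  }
}

PUT _ingest/pipeline/geoip-custom
{
  "description" : "Add geoip info from a custom database",
  "processors" : [
    {
      "geoip" : {
        "field" : "ip",
        "database_file" : "my-custom-City.mmdb"
      }
    }
  ]
}
--------------------------------------------------
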
[[ingest-geoip-settings]]
===== Node Settings
The `geoip` processor supports the following setting:

`ingest.geoip.cache_size`::
The maximum number of results that should be cached. Defaults to `1000`.

Note that this is a node setting and applies to all `geoip` processors, i.e. there is one cache for all defined `geoip` processors.

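Because this is a node setting, it would typically go in each node's `elasticsearch.yml` file. A minimal sketch, where the value `2000` is an arbitrary example rather than a recommendation:

[source,yaml]
----
# Example value only; the default is 1000.
ingest.geoip.cache_size: 2000
----
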
[[geoip-cluster-settings]]
===== Cluster settings
[[ingest-geoip-downloader-enabled]]
`ingest.geoip.downloader.enabled`::
(<<dynamic-cluster-setting,Dynamic>>, Boolean)
If `true`, {es} automatically downloads and manages updates for GeoIP2 databases
from the `ingest.geoip.downloader.endpoint`. If `false`, {es} does not download
updates and deletes all downloaded databases. Defaults to `true`.
[[ingest-geoip-downloader-endpoint]]
`ingest.geoip.downloader.endpoint`::
(<<static-cluster-setting,Static>>, string)
Endpoint URL used to download updates for GeoIP2 databases. Defaults to
`https://geoip.elastic.co/v1/database`. {es} stores downloaded database files in
each node's <<es-tmpdir,temporary directory>> at
`$ES_TMPDIR/geoip-databases/<node_id>`.
[[ingest-geoip-downloader-poll-interval]]
`ingest.geoip.downloader.poll.interval`::
(<<dynamic-cluster-setting,Dynamic>>, <<time-units,time value>>)
How often {es} checks for GeoIP2 database updates at the
`ingest.geoip.downloader.endpoint`. Must be greater than `1d` (one day). Defaults
to `3d` (three days).