Docker - ELK 7.6 : Elastic Stack with Docker Compose
Git repo: Elasticsearch stack (ELK) with docker-compose.
The following docker-compose.yml brings up Elasticsearch, Logstash, and Kibana containers so we can see how things work. This all-in-one configuration is a handy way to bring up our first dev cluster before we build a distributed deployment with multiple hosts:
version: '3.7'

services:
  elasticsearch:
    build:
      context: elasticsearch/
      args:
        ELK_VERSION: $ELK_VERSION
    volumes:
      - type: bind
        source: ./elasticsearch/config/elasticsearch.yml
        target: /usr/share/elasticsearch/config/elasticsearch.yml
        read_only: true
      - type: volume
        source: elasticsearch
        target: /usr/share/elasticsearch/data
    ports:
      - "9200:9200"
      - "9300:9300"
    environment:
      ES_JAVA_OPTS: "-Xmx256m -Xms256m"
      ELASTIC_PASSWORD: changeme
      # Use single node discovery in order to disable production mode and avoid bootstrap checks
      # see https://www.elastic.co/guide/en/elasticsearch/reference/current/bootstrap-checks.html
      discovery.type: single-node
    networks:
      - elk

  logstash:
    build:
      context: logstash/
      args:
        ELK_VERSION: $ELK_VERSION
    volumes:
      - type: bind
        source: ./logstash/config/logstash.yml
        target: /usr/share/logstash/config/logstash.yml
        read_only: true
      - type: bind
        source: ./logstash/pipeline
        target: /usr/share/logstash/pipeline
        read_only: true
    ports:
      - "5000:5000/tcp"
      - "5000:5000/udp"
      - "9600:9600"
    environment:
      LS_JAVA_OPTS: "-Xmx256m -Xms256m"
    networks:
      - elk
    depends_on:
      - elasticsearch

  kibana:
    build:
      context: kibana/
      args:
        ELK_VERSION: $ELK_VERSION
    volumes:
      - type: bind
        source: ./kibana/config/kibana.yml
        target: /usr/share/kibana/config/kibana.yml
        read_only: true
    ports:
      - "5601:5601"
    networks:
      - elk
    depends_on:
      - elasticsearch

networks:
  elk:
    driver: bridge

volumes:
  elasticsearch:
Run docker-compose to bring up the single-node Elasticsearch, Logstash, and Kibana containers:
$ docker-compose up
Creating network "docker-elk_elk" with driver "bridge"
Creating docker-elk_elasticsearch_1 ... done
Creating docker-elk_kibana_1        ... done
Creating docker-elk_logstash_1      ... done
Attaching to docker-elk_elasticsearch_1, docker-elk_kibana_1, docker-elk_logstash_1
logstash_1       | OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
elasticsearch_1  | Created elasticsearch keystore in /usr/share/elasticsearch/config
...
elasticsearch_1  | OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
elasticsearch_1  | {"type": "server", "timestamp": "2020-04-02T17:34:33,288Z", "level": "INFO", "component": "o.e.e.NodeEnvironment", "cluster.name": "docker-cluster", "node.name": "9097740a0d56", "message": "using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/sda1)]], net usable_space [5.4gb], net total_space [58.4gb], types [ext4]" }
...
logstash_1       | [2020-04-02T17:35:44,439][INFO ][logstash.licensechecker.licensereader] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://elastic:xxxxxx@elasticsearch:9200/]}}
logstash_1       | [2020-04-02T17:35:46,084][WARN ][logstash.licensechecker.licensereader] Restored connection to ES instance {:url=>"http://elastic:xxxxxx@elasticsearch:9200/"}
...
elasticsearch_1  | {"type": "server", "timestamp": "2020-04-02T17:35:51,857Z", "level": "WARN", "component": "o.e.c.r.a.DiskThresholdMonitor", "cluster.name": "docker-cluster", "node.name": "9097740a0d56", "message": "high disk watermark [90%] exceeded on [AxHQR8ZNRy6JultUbzBtVg][9097740a0d56][/usr/share/elasticsearch/data/nodes/0] free: 5.4gb[9.2%], shards will be relocated away from this node; currently relocating away shards totalling [0] bytes; the node is expected to continue to exceed the high disk watermark when these relocations are complete", "cluster.uuid": "591mkpKPTqmIoyaiUNsc2g", "node.id": "AxHQR8ZNRy6JultUbzBtVg" }
kibana_1         | {"type":"log","@timestamp":"2020-04-02T17:35:52Z","tags":["warning","config","deprecation"],"pid":6,"message":"Setting [elasticsearch.username] to \"elastic\" is deprecated. You should use the \"kibana\" user instead."}
...
kibana_1         | {"type":"log","@timestamp":"2020-04-02T17:36:10Z","tags":["info","http","server","Kibana"],"pid":6,"message":"http server running at http://0:5601"}
Once we see output similar to the following, we know Kibana is ready:
kibana_1 | {"type":"log","@timestamp":"2020-04-02T04:31:03Z","tags":["info","http","server","Kibana"],"pid":6,"message":"http server running at http://0:5601"}
Once everything appears to be OK, we may want to stop it and run the Elastic stack in detached mode using docker-compose up -d.
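For reference, a typical sequence with standard docker-compose commands looks like this:

# start the stack in the background
$ docker-compose up -d

# follow the logs of a single service, e.g. Kibana
$ docker-compose logs -f kibana

# stop and remove the containers when done (add -v to also drop the data volume)
$ docker-compose down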
Point a browser at localhost:5601 and log in with "elastic/changeme".
Now, our ELK stack is running:
$ docker-compose ps
                           Name                                        Command               State                                         Ports
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
einsteinish-elk-stack-with-docker-compose_elasticsearch_1   /usr/local/bin/docker-entr ...   Up      0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp
einsteinish-elk-stack-with-docker-compose_kibana_1          /usr/local/bin/dumb-init - ...   Up      0.0.0.0:5601->5601/tcp
einsteinish-elk-stack-with-docker-compose_logstash_1        /usr/local/bin/docker-entr ...   Up      0.0.0.0:5000->5000/tcp, 0.0.0.0:5000->5000/udp, 5044/tcp, 0.0.0.0:9600->9600/tcp
We can check how many resources our containers are consuming using docker stats:
$ docker stats
CONTAINER ID   NAME                                                        CPU %   MEM USAGE / LIMIT     MEM %    NET I/O          BLOCK I/O   PIDS
af9519caa50f   einsteinish-elk-stack-with-docker-compose_logstash_1        2.81%   466.5MiB / 1.945GiB   23.43%   48.7kB / 216kB   0B / 0B     40
2b9c23122e48   einsteinish-elk-stack-with-docker-compose_kibana_1          1.16%   288MiB / 1.945GiB     14.46%   901kB / 11.7MB   0B / 0B     12
1704e0b35296   einsteinish-elk-stack-with-docker-compose_elasticsearch_1   1.43%   545.7MiB / 1.945GiB   27.40%   1.54MB / 609kB   0B / 0B     63
Our Elastic stack is ready; however, we haven't created an index pattern yet. Before doing that, let's inject some log entries.
The shipped Logstash configuration (/logstash/pipeline/logstash.conf) allows us to send content via TCP:
input {
    tcp {
        port => 5000
    }
}

output {
    elasticsearch {
        hosts => "elasticsearch:9200"
        user => "elastic"
        password => "changeme"
    }
}
$ lsof -nP -iTCP:5000
COMMAND     PID     USER   FD   TYPE             DEVICE SIZE/OFF NODE NAME
com.docke 33800  ki.hong   35u  IPv4 0xd2b8428b995337cb      0t0  TCP *:5000 (LISTEN)
com.docke 33800  ki.hong   36u  IPv6 0xd2b8428ba27692eb      0t0  TCP [::1]:5000 (LISTEN)
$ cat /tmp/logstash-tutorial.log |nc -c localhost 5000
We can get the logstash-tutorial.log.gz from here.
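As a minimal sketch of fetching the sample file, assuming it still lives at the Elastic Logstash tutorial download location (adjust the URL if it has moved):

# download and unpack the sample Apache log file used above
# (URL from the Elastic "Getting Started with Logstash" tutorial)
$ curl -LO https://download.elastic.co/demos/logstash/gettingstarted/logstash-tutorial.log.gz
$ gunzip logstash-tutorial.log.gz
$ mv logstash-tutorial.log /tmp/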
Now that we have logs, after creating an index pattern we can see them displayed in Kibana.
In Discover, we have access to every document in every index that matches the selected index pattern. The index pattern tells Kibana which Elasticsearch index we are currently exploring. We can submit search queries, filter the search results, and view document data.
Here are the steps:
In the side navigation, click Discover. Ensure "logstash*" is the current index pattern.
We can see a histogram that shows the distribution of documents over time. A table lists the fields for each matching document. By default, all fields are shown.
To choose which fields to display, hover the pointer over the list of Available fields, and then click add next to each field we want to include as a column in the table. For example, if we add the message and @timestamp fields, the display includes columns for those two fields.
Here, we'll play with queries.
Before we do that, let's make sure the xpack settings in elasticsearch/config/elasticsearch.yml are as follows:
xpack.license.self_generated.type: basic
xpack.security.enabled: true
xpack.monitoring.collection.enabled: true
Note that, with security now enabled, a request sent without authentication credentials is rejected with the following security_exception:
{"error":{"root_cause":[{"type":"security_exception","reason":"missing authentication credentials for REST request [/]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}}],"type":"security_exception","reason":"missing authentication credentials for REST request [/]","header":{"WWW-Authenticate":"Basic realm=\"security\" charset=\"UTF-8\""}},"status":401}
After the change, we need to restart the stack.
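Since elasticsearch.yml is bind-mounted into the container, restarting is enough to pick up the change; for example:

# restart only Elasticsearch after editing elasticsearch/config/elasticsearch.yml
$ docker-compose restart elasticsearch

# or recreate the whole stack
$ docker-compose down
$ docker-compose up -d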
We'll start with the cat APIs, which are intended only for human consumption via the Kibana console or the command line.
In a browser, we'd be prompted for authentication (Basic Authentication). If we're using curl, we need to pass the credentials with the request via -u username:password (elastic:changeme).
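For example, a quick check against the cluster health endpoint shows the difference (any endpoint would do; this one is just convenient):

# without credentials: rejected with a 401 security_exception like the one above
$ curl "localhost:9200/_cluster/health?pretty"

# with Basic Authentication: returns the cluster health document
$ curl -u elastic:changeme "localhost:9200/_cluster/health?pretty"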
Though not recommended, we can disable security by setting xpack.security.enabled to false:
xpack.license.self_generated.type: basic
xpack.security.enabled: false
xpack.monitoring.collection.enabled: true
To list all available commands:
- curl:
$ curl -X GET "localhost:9200/_cat" -u elastic:changeme =^.^= /_cat/allocation /_cat/shards /_cat/shards/{index} /_cat/master /_cat/nodes /_cat/tasks /_cat/indices /_cat/indices/{index} /_cat/segments /_cat/segments/{index} /_cat/count /_cat/count/{index} /_cat/recovery /_cat/recovery/{index} /_cat/health /_cat/pending_tasks /_cat/aliases /_cat/aliases/{alias} /_cat/thread_pool /_cat/thread_pool/{thread_pools} /_cat/plugins /_cat/fielddata /_cat/fielddata/{fields} /_cat/nodeattrs /_cat/repositories /_cat/snapshots/{repository} /_cat/templates
- browser: opening http://localhost:9200/_cat returns the same list.
Each of the _cat commands accepts a query string parameter v to turn on verbose output (column headings); the same requests can also be run from the Kibana Dev Tools console. Without v, only the values are returned:
$ curl -X GET "localhost:9200/_cat/nodes?" 192.168.96.2 60 92 5 0.35 0.32 0.41 dilm * 8b73d9076e68
The h query string parameter forces only the specified columns to appear:
$ curl -X GET "localhost:9200/_cat/nodes?h=ip,port,heapPercent,name&pretty" 192.168.96.2 9300 67 8b73d9076e68
We can also request multiple columns using simple wildcards like /_cat/thread_pool?h=ip,queue* to get all headers (or aliases) starting with queue.
$ curl -X GET "localhost:9200/_cat/thread_pool?h=ip,queue*" 192.168.96.2 0 16 192.168.96.2 0 100 ... 192.168.96.2 0 -1 192.168.96.2 0 4 192.168.96.2 0 -1 192.168.96.2 0 1000 192.168.96.2 0 200
If we want to find the largest index in our cluster (by storage used by all the shards, not by number of documents), the /_cat/indices API is ideal. We only need to add three things to the API request:
- The bytes query string parameter with a value of b to get byte-level resolution.
- The s (sort) parameter with a value of store.size:desc to sort the output by shard storage in descending order.
- The v (verbose) parameter to include column headings in the response.
$ curl -X GET "localhost:9200/_cat/indices?bytes=b&s=store.size:desc&v" health status index uuid pri rep docs.count docs.deleted store.size pri.store.size green open .monitoring-es-7-2020.04.06 XQSHRxs7RsOZ3ZeLsa0Y_Q 1 0 77410 55160 34506536 34506536 green open .monitoring-es-7-2020.04.10 Gjs4h4dqTIWKC3owN0cjqQ 1 0 9435 1443 10628315 10628315 green open .monitoring-logstash-7-2020.04.06 6itFo78lShiWIb1e5i6GKg 1 0 43969 0 3007501 3007501 green open .monitoring-kibana-7-2020.04.06 LWbl13UVQLq_cWwkCccatA 1 0 5522 0 1220685 1220685 green open .monitoring-logstash-7-2020.04.10 myGRrPNMRlebzYfOdBbHUQ 1 0 2577 0 542115 542115 green open .monitoring-kibana-7-2020.04.10 knd52K_vSTellI_qhGPYpA 1 0 544 0 269953 269953 green open .security-7 e21FT4JoQ2WML_oax2GFYA 1 0 36 0 99098 99098 green open .kibana_1 2yJ-CzinQ-Czv0H3Rg4mQg 1 0 10 1 39590 39590 yellow open logstash-2020.04.06-000001 YShJ9NKUQO-4TuwhS0MlXA 1 1 100 0 36727 36727 green open ilm-history-1-000001 rVfV3nLQSXOM7c6yN68dbg 1 0 18 0 32919 32919 green open .kibana_task_manager_1 y29CTX98TEuZt3pb6lnXhA 1 0 2 0 6823 6823 green open .apm-agent-configuration zC5fg2AhSVK0TvV3WUcv_Q 1 0 0 0 283 283
The following queries give the same response in JSON format:
$ curl 'localhost:9200/_cat/indices?format=json&pretty'
[
  {
    "health" : "green",
    "status" : "open",
    "index" : ".security-7",
    "uuid" : "e21FT4JoQ2WML_oax2GFYA",
    "pri" : "1",
    "rep" : "0",
    "docs.count" : "36",
    "docs.deleted" : "0",
    "store.size" : "96.7kb",
    "pri.store.size" : "96.7kb"
  },
...

$ curl 'localhost:9200/_cat/indices?pretty' -H "Accept: application/json"
[
  {
    "health" : "green",
    "status" : "open",
    "index" : ".security-7",
    "uuid" : "e21FT4JoQ2WML_oax2GFYA",
    "pri" : "1",
    "rep" : "0",
    "docs.count" : "36",
    "docs.deleted" : "0",
    "store.size" : "96.7kb",
    "pri.store.size" : "96.7kb"
  },
...
The s query string parameter sorts the table by the columns specified as the parameter value. Columns are specified either by name or by alias, and are provided as a comma-separated string. By default, sorting is done in ascending fashion. Appending :desc to a column will invert the ordering for that column. :asc is also accepted but exhibits the same behavior as the default sort order.
For example, with a sort string s=column1,column2:desc,column3, the table will be sorted in ascending order by column1, in descending order by column2, and in ascending order by column3.
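As a concrete illustration, the following request lists the indices sorted by document count in descending order and then by index name:

$ curl -X GET "localhost:9200/_cat/indices?v&s=docs.count:desc,index"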
Let's put JSON documents into an Elasticsearch index.
We can do this directly with a simple PUT request that specifies the index we want to add the document to, a unique document ID, and one or more "field": "value" pairs in the request body:
$ curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d' { "name": "John Doe" } ' { "_index" : "customer", "_type" : "_doc", "_id" : "1", "_version" : 1, "result" : "created", "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 0, "_primary_term" : 1 }
This request automatically creates the customer index if it doesn’t already exist, adds a new document that has an ID of 1, and stores and indexes the name field.
The new document is available immediately from any node in the cluster. We can retrieve it with a GET request that specifies its document ID:
$ curl -X GET "localhost:9200/customer/_doc/1?pretty" { "_index" : "customer", "_type" : "_doc", "_id" : "1", "_version" : 1, "_seq_no" : 0, "_primary_term" : 1, "found" : true, "_source" : { "name" : "John Doe" } }
If we have a lot of documents to index, we can submit them in batches with the bulk API (https://www.elastic.co/guide/en/elasticsearch/reference/7.6/docs-bulk.html).
Let's download the accounts.json sample data set:
$ curl -L https://github.com/elastic/elasticsearch/blob/master/docs/src/test/resources/accounts.json?raw=true -o accounts.json
The data set is randomly generated and represents user accounts with the following information:
{ "account_number": 0, "balance": 16623, "firstname": "Bradshaw", "lastname": "Mckenzie", "age": 29, "gender": "F", "address": "244 Columbus Place", "employer": "Euron", "email": "bradshawmckenzie@euron.com", "city": "Hobucken", "state": "CO" }
We're going to index the account data into the bank index with the following _bulk request:
$ curl -H "Content-Type: application/json" -XPOST \ "localhost:9200/bank/_bulk?pretty&refresh" \ --data-binary "@accounts.json" { "took" : 913, "errors" : false, "items" : [ { "index" : { "_index" : "bank", "_type" : "_doc", "_id" : "1", "_version" : 1, "result" : "created", "forced_refresh" : true, "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 0, "_primary_term" : 1, "status" : 201 } }, ... { "index" : { "_index" : "bank", "_type" : "_doc", "_id" : "995", "_version" : 1, "result" : "created", "forced_refresh" : true, "_shards" : { "total" : 2, "successful" : 1, "failed" : 0 }, "_seq_no" : 999, "_primary_term" : 1, "status" : 201 } } ] }
The --data-binary option posts the data exactly as specified, with no extra processing whatsoever, while --data (or -d) sends the data the way a browser submits an HTML form: curl uses the content type application/x-www-form-urlencoded and, when reading from a file with @, strips carriage returns and newlines. Since the bulk API relies on newline-delimited JSON, --data-binary is the right choice here.
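For context, accounts.json is already in that newline-delimited bulk format: each document is preceded by an action line, and the index name is taken from the request URL. Using the two accounts shown elsewhere in this post, the file looks roughly like this:

{"index":{"_id":"0"}}
{"account_number":0,"balance":16623,"firstname":"Bradshaw","lastname":"Mckenzie","age":29,"gender":"F","address":"244 Columbus Place","employer":"Euron","email":"bradshawmckenzie@euron.com","city":"Hobucken","state":"CO"}
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"amberduke@pyrami.com","city":"Brogan","state":"IL"}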
We can check if the 1,000 documents were indexed successfully:
$ curl -X GET "localhost:9200/_cat/indices?v" health status index uuid pri rep docs.count docs.deleted store.size pri.store.size green open .apm-agent-configuration zC5fg2AhSVK0TvV3WUcv_Q 1 0 0 0 283b 283b green open .kibana_1 2yJ-CzinQ-Czv0H3Rg4mQg 1 0 10 1 38.6kb 38.6kb green open .kibana_task_manager_1 y29CTX98TEuZt3pb6lnXhA 1 0 2 0 32kb 32kb green open .monitoring-es-7-2020.04.06 XQSHRxs7RsOZ3ZeLsa0Y_Q 1 0 77410 55160 32.9mb 32.9mb green open .monitoring-es-7-2020.04.10 Gjs4h4dqTIWKC3owN0cjqQ 1 0 30096 0 10.7mb 10.7mb green open .monitoring-es-7-2020.04.11 FMSpb4JKScGYhu8nEzXt1A 1 0 95 18 695.5kb 695.5kb green open .monitoring-kibana-7-2020.04.06 LWbl13UVQLq_cWwkCccatA 1 0 5522 0 1.1mb 1.1mb green open .monitoring-kibana-7-2020.04.10 knd52K_vSTellI_qhGPYpA 1 0 1752 0 534.6kb 534.6kb green open .monitoring-kibana-7-2020.04.11 GxM1BDKvRkGTv0gWyt8U_A 1 0 3 0 42.9kb 42.9kb green open .monitoring-logstash-7-2020.04.06 6itFo78lShiWIb1e5i6GKg 1 0 43969 0 2.8mb 2.8mb green open .monitoring-logstash-7-2020.04.10 myGRrPNMRlebzYfOdBbHUQ 1 0 8617 0 1.1mb 1.1mb green open .monitoring-logstash-7-2020.04.11 TAjxbas0Rd-5mb2JOmvr0A 1 0 15 0 95.5kb 95.5kb green open .security-7 e21FT4JoQ2WML_oax2GFYA 1 0 36 0 96.7kb 96.7kb yellow open bank bDhhObs0SMiHpPJti21rZA 1 1 1000 0 414.1kb 414.1kb yellow open customer Q68qN_NBSOqz3dnWG6P0yQ 1 1 1 0 3.4kb 3.4kb green open ilm-history-1-000001 rVfV3nLQSXOM7c6yN68dbg 1 0 18 0 32.1kb 32.1kb yellow open logstash-2020.04.06-000001 YShJ9NKUQO-4TuwhS0MlXA 1 1 100 0 35.8kb 35.8kb
Just to see the bank index:
$ curl -X GET localhost:9200/_cat/indices/bank
health status index uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   bank  bDhhObs0SMiHpPJti21rZA   1   1       1000            0    414.1kb        414.1kb
The following section is based on Elasticsearch Reference [7.9] » Getting started with Elasticsearch » Start searching.
Now that we have ingested some data into an Elasticsearch index, we can search it by sending requests to the _search endpoint. To access the full suite of search capabilities, we use the Elasticsearch Query DSL to specify the search criteria in the request body. We specify the name of the index we want to search in the request URI.
The following request, for example, retrieves all documents in the bank index sorted by account number:
$ curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d' { "query": { "match_all": {} }, "sort": [ { "account_number": "asc" } ] } ' { "query": { "match_all": {} }, "sort": [ { "account_number": "asc" } ] } ' { "took" : 3, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 1000, "relation" : "eq" }, "max_score" : null, "hits" : [ { "_index" : "bank", "_type" : "_doc", "_id" : "0", "_score" : null, "_source" : { "account_number" : 0, "balance" : 16623, "firstname" : "Bradshaw", "lastname" : "Mckenzie", "age" : 29, "gender" : "F", "address" : "244 Columbus Place", "employer" : "Euron", "email" : "bradshawmckenzie@euron.com", "city" : "Hobucken", "state" : "CO" }, "sort" : [ 0 ] }, ... { "_index" : "bank", "_type" : "_doc", "_id" : "9", "_score" : null, "_source" : { "account_number" : 9, "balance" : 24776, "firstname" : "Opal", "lastname" : "Meadows", "age" : 39, "gender" : "M", "address" : "963 Neptune Avenue", "employer" : "Cedward", "email" : "opalmeadows@cedward.com", "city" : "Olney", "state" : "OH" }, "sort" : [ 9 ] } ] } }
As we can see from the output above, by default, the hits section of the response includes the first 10 documents that match the search criteria.
The response also provides the following information about the search request:
- took – how long it took Elasticsearch to run the query, in milliseconds
- timed_out – whether or not the search request timed out
- _shards – how many shards were searched and a breakdown of how many shards succeeded, failed, or were skipped.
- hits.total.value - how many matching documents were found
- hits.max_score – the score of the most relevant document found
- hits.sort - the document’s sort position (when not sorting by relevance score)
- hits._score - the document’s relevance score (not applicable when using match_all)
To page through the search hits, specify the from and size parameters in our request. For example, the following request gets hits 10 through 12:
$ curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d' { "query": { "match_all": {} }, "sort": [ { "account_number": "asc" } ], "from": 10, "size": 3 } ' { "took" : 15, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 1000, "relation" : "eq" }, "max_score" : null, "hits" : [ { "_index" : "bank", "_type" : "_doc", "_id" : "10", "_score" : null, "_source" : { "account_number" : 10, "balance" : 46170, "firstname" : "Dominique", "lastname" : "Park", "age" : 37, "gender" : "F", "address" : "100 Gatling Place", "employer" : "Conjurica", "email" : "dominiquepark@conjurica.com", "city" : "Omar", "state" : "NJ" }, "sort" : [ 10 ] }, { "_index" : "bank", "_type" : "_doc", "_id" : "11", "_score" : null, "_source" : { "account_number" : 11, "balance" : 20203, "firstname" : "Jenkins", "lastname" : "Haney", "age" : 20, "gender" : "M", "address" : "740 Ferry Place", "employer" : "Qimonk", "email" : "jenkinshaney@qimonk.com", "city" : "Steinhatchee", "state" : "GA" }, "sort" : [ 11 ] }, { "_index" : "bank", "_type" : "_doc", "_id" : "12", "_score" : null, "_source" : { "account_number" : 12, "balance" : 37055, "firstname" : "Stafford", "lastname" : "Brock", "age" : 20, "gender" : "F", "address" : "296 Wythe Avenue", "employer" : "Uncorp", "email" : "staffordbrock@uncorp.com", "city" : "Bend", "state" : "AL" }, "sort" : [ 12 ] } ] } }
Now we can start to construct queries that are a bit more interesting than match_all.
To search for specific terms within a field, we can use a match query. For example, the following request searches the address field to find customers whose addresses contain mill or lane:
$ curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d' { "query": { "match": { "address": "mill lane" } } } ' { "took" : 18, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 19, "relation" : "eq" }, "max_score" : 9.507477, "hits" : [ { "_index" : "bank", "_type" : "_doc", "_id" : "136", "_score" : 9.507477, "_source" : { "account_number" : 136, "balance" : 45801, "firstname" : "Winnie", "lastname" : "Holland", "age" : 38, "gender" : "M", "address" : "198 Mill Lane", "employer" : "Neteria", "email" : "winnieholland@neteria.com", "city" : "Urie", "state" : "IL" } }, { "_index" : "bank", "_type" : "_doc", "_id" : "970", "_score" : 5.4032025, "_source" : { "account_number" : 970, "balance" : 19648, "firstname" : "Forbes", "lastname" : "Wallace", "age" : 28, "gender" : "M", "address" : "990 Mill Road", "employer" : "Pheast", "email" : "forbeswallace@pheast.com", "city" : "Lopezo", "state" : "AK" } },
To perform a phrase search rather than matching individual terms, we use match_phrase instead of match. For example, the following request only matches addresses that contain the phrase mill lane:
$ curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d' { "query": { "match_phrase": { "address": "mill lane" } } } ' { "took" : 45, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 1, "relation" : "eq" }, "max_score" : 9.507477, "hits" : [ { "_index" : "bank", "_type" : "_doc", "_id" : "136", "_score" : 9.507477, "_source" : { "account_number" : 136, "balance" : 45801, "firstname" : "Winnie", "lastname" : "Holland", "age" : 38, "gender" : "M", "address" : "198 Mill Lane", "employer" : "Neteria", "email" : "winnieholland@neteria.com", "city" : "Urie", "state" : "IL" } } ] } }
To construct more complex queries, we can use a bool query to combine multiple query criteria. We can designate criteria as required (must match), desirable (should match), or undesirable (must not match).
For example, the following request searches the bank index for accounts that belong to customers who are 33 years old, but excludes anyone who lives in Idaho (ID):
$ curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d' { "query": { "bool": { "must": [ { "match": { "age": "33" } } ], "must_not": [ { "match": { "state": "ID" } } ] } } } ' { "took" : 6, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 50, "relation" : "eq" }, "max_score" : 1.0, "hits" : [ { "_index" : "bank", "_type" : "_doc", "_id" : "18", "_score" : 1.0, "_source" : { "account_number" : 18, "balance" : 4180, "firstname" : "Dale", "lastname" : "Adams", "age" : 33, "gender" : "M", "address" : "467 Hutchinson Court", "employer" : "Boink", "email" : "daleadams@boink.com", "city" : "Orick", "state" : "MD" } }, ...
Each must, should, and must_not element in a Boolean query is referred to as a query clause. How well a document meets the criteria in each must or should clause contributes to the document’s relevance score. The higher the score, the better the document matches our search criteria. By default, Elasticsearch returns documents ranked by these relevance scores.
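As an illustration of a should clause (not part of the original walkthrough), the following sketch keeps the same age criterion but ranks accounts whose address contains Lane higher; documents that don't match the should clause are still returned, just with a lower score:

$ curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": "33" } }
      ],
      "should": [
        { "match": { "address": "Lane" } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ]
    }
  }
}
'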
The criteria in a must_not clause is treated as a filter. It affects whether or not the document is included in the results, but does not contribute to how documents are scored. We can also explicitly specify arbitrary filters to include or exclude documents based on structured data.
For example, the following request uses a range filter to limit the results to accounts with a balance between $20,000 and $30,000 (inclusive).
$ curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d' { "query": { "bool": { "must": { "match_all": {} }, "filter": { "range": { "balance": { "gte": 20000, "lte": 30000 } } } } } } ' { "took" : 3, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 217, "relation" : "eq" }, "max_score" : 1.0, "hits" : [ { "_index" : "bank", "_type" : "_doc", "_id" : "49", "_score" : 1.0, "_source" : { "account_number" : 49, "balance" : 29104, "firstname" : "Fulton", "lastname" : "Holt", "age" : 23, "gender" : "F", "address" : "451 Humboldt Street", "employer" : "Anocha", "email" : "fultonholt@anocha.com", "city" : "Sunriver", "state" : "RI" } },
This section is based on Elasticsearch Reference [7.9] » Getting started with Elasticsearch » Analyze results with aggregations
Elasticsearch aggregations enable us to get meta-information about our search results and answer questions like, "How many account holders are in Texas?" or "What’s the average balance of accounts in Tennessee?" We can search documents, filter hits, and use aggregations to analyze the results all in one request.
For example, the following request uses a terms aggregation to group all of the accounts in the bank index by state, and returns the ten states with the most accounts in descending order:
$ curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d' { "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state.keyword" } } } } ' { "took" : 11, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 1000, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "group_by_state" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 743, "buckets" : [ { "key" : "TX", "doc_count" : 30 }, { "key" : "MD", "doc_count" : 28 }, { "key" : "ID", "doc_count" : 27 }, { "key" : "AL", "doc_count" : 25 }, { "key" : "ME", "doc_count" : 25 }, { "key" : "TN", "doc_count" : 25 }, { "key" : "WY", "doc_count" : 25 }, { "key" : "DC", "doc_count" : 24 }, { "key" : "MA", "doc_count" : 24 }, { "key" : "ND", "doc_count" : 24 } ] } } }
The buckets in the response are the values of the state field. The doc_count shows the number of accounts in each state. For example, we can see that there are 27 accounts in ID (Idaho). Because the request set size to 0, the response contains only the aggregation results and does not include the account details, which would otherwise look like this:
"hits" : [ { "_index" : "bank", "_type" : "_doc", "_id" : "1", "_score" : 1.0, "_source" : { "account_number" : 1, "balance" : 39225, "firstname" : "Amber", "lastname" : "Duke", "age" : 32, "gender" : "M", "address" : "880 Holmes Lane", "employer" : "Pyrami", "email" : "amberduke@pyrami.com", "city" : "Brogan", "state" : "IL" } }, ...
We can combine aggregations to build more complex summaries of our data. For example, the following request nests an avg aggregation within the previous group_by_state aggregation to calculate the average account balances for each state.
$ curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d' { "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state.keyword" }, "aggs": { "average_balance": { "avg": { "field": "balance" } } } } } } ' { "took" : 38, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 1000, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "group_by_state" : { "doc_count_error_upper_bound" : 0, "sum_other_doc_count" : 743, "buckets" : [ { "key" : "TX", "doc_count" : 30, "average_balance" : { "value" : 26073.3 } }, { "key" : "MD", "doc_count" : 28, "average_balance" : { "value" : 26161.535714285714 } }, { "key" : "ID", "doc_count" : 27, "average_balance" : { "value" : 24368.777777777777 } }, { "key" : "AL", "doc_count" : 25, "average_balance" : { "value" : 25739.56 } }, { "key" : "ME", "doc_count" : 25, "average_balance" : { "value" : 21663.0 } }, { "key" : "TN", "doc_count" : 25, "average_balance" : { "value" : 28365.4 } }, { "key" : "WY", "doc_count" : 25, "average_balance" : { "value" : 21731.52 } }, { "key" : "DC", "doc_count" : 24, "average_balance" : { "value" : 23180.583333333332 } }, { "key" : "MA", "doc_count" : 24, "average_balance" : { "value" : 29600.333333333332 } }, { "key" : "ND", "doc_count" : 24, "average_balance" : { "value" : 26577.333333333332 } } ] } } }
Instead of sorting the results by count, we could sort using the result of the nested aggregation by specifying the order within the terms aggregation:
$ curl -X GET "localhost:9200/bank/_search?pretty" -H 'Content-Type: application/json' -d' { "size": 0, "aggs": { "group_by_state": { "terms": { "field": "state.keyword", "order": { "average_balance": "desc" } }, "aggs": { "average_balance": { "avg": { "field": "balance" } } } } } } ' { "took" : 37, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : { "value" : 1000, "relation" : "eq" }, "max_score" : null, "hits" : [ ] }, "aggregations" : { "group_by_state" : { "doc_count_error_upper_bound" : -1, "sum_other_doc_count" : 827, "buckets" : [ { "key" : "CO", "doc_count" : 14, "average_balance" : { "value" : 32460.35714285714 } }, { "key" : "NE", "doc_count" : 16, "average_balance" : { "value" : 32041.5625 } }, { "key" : "AZ", "doc_count" : 14, "average_balance" : { "value" : 31634.785714285714 } }, { "key" : "MT", "doc_count" : 17, "average_balance" : { "value" : 31147.41176470588 } }, { "key" : "VA", "doc_count" : 16, "average_balance" : { "value" : 30600.0625 } }, { "key" : "GA", "doc_count" : 19, "average_balance" : { "value" : 30089.0 } }, { "key" : "MA", "doc_count" : 24, "average_balance" : { "value" : 29600.333333333332 } }, { "key" : "IL", "doc_count" : 22, "average_balance" : { "value" : 29489.727272727272 } }, { "key" : "NM", "doc_count" : 14, "average_balance" : { "value" : 28792.64285714286 } }, { "key" : "LA", "doc_count" : 17, "average_balance" : { "value" : 28791.823529411766 } } ] } } }
Credits: this post is based on the Elastic stack (ELK) on Docker repo.
Docker & K8s
- Docker install on Amazon Linux AMI
- Docker install on EC2 Ubuntu 14.04
- Docker container vs Virtual Machine
- Docker install on Ubuntu 14.04
- Docker Hello World Application
- Nginx image - share/copy files, Dockerfile
- Working with Docker images : brief introduction
- Docker image and container via docker commands (search, pull, run, ps, restart, attach, and rm)
- More on docker run command (docker run -it, docker run --rm, etc.)
- Docker Networks - Bridge Driver Network
- Docker Persistent Storage
- File sharing between host and container (docker run -d -p -v)
- Linking containers and volume for datastore
- Dockerfile - Build Docker images automatically I - FROM, MAINTAINER, and build context
- Dockerfile - Build Docker images automatically II - revisiting FROM, MAINTAINER, build context, and caching
- Dockerfile - Build Docker images automatically III - RUN
- Dockerfile - Build Docker images automatically IV - CMD
- Dockerfile - Build Docker images automatically V - WORKDIR, ENV, ADD, and ENTRYPOINT
- Docker - Apache Tomcat
- Docker - NodeJS
- Docker - NodeJS with hostname
- Docker Compose - NodeJS with MongoDB
- Docker - Prometheus and Grafana with Docker-compose
- Docker - StatsD/Graphite/Grafana
- Docker - Deploying a Java EE JBoss/WildFly Application on AWS Elastic Beanstalk Using Docker Containers
- Docker : NodeJS with GCP Kubernetes Engine
- Docker : Jenkins Multibranch Pipeline with Jenkinsfile and Github
- Docker : Jenkins Master and Slave
- Docker - ELK : ElasticSearch, Logstash, and Kibana
- Docker - ELK 7.6 : Elasticsearch on Centos 7
- Docker - ELK 7.6 : Filebeat on Centos 7
- Docker - ELK 7.6 : Logstash on Centos 7
- Docker - ELK 7.6 : Kibana on Centos 7
- Docker - ELK 7.6 : Elastic Stack with Docker Compose
- Docker - Deploy Elastic Cloud on Kubernetes (ECK) via Elasticsearch operator on minikube
- Docker - Deploy Elastic Stack via Helm on minikube
- Docker Compose - A gentle introduction with WordPress
- Docker Compose - MySQL
- MEAN Stack app on Docker containers : micro services
- MEAN Stack app on Docker containers : micro services via docker-compose
- Docker Compose - Hashicorp's Vault and Consul Part A (install vault, unsealing, static secrets, and policies)
- Docker Compose - Hashicorp's Vault and Consul Part B (EaaS, dynamic secrets, leases, and revocation)
- Docker Compose - Hashicorp's Vault and Consul Part C (Consul)
- Docker Compose with two containers - Flask REST API service container and an Apache server container
- Docker compose : Nginx reverse proxy with multiple containers
- Docker & Kubernetes : Envoy - Getting started
- Docker & Kubernetes : Envoy - Front Proxy
- Docker & Kubernetes : Ambassador - Envoy API Gateway on Kubernetes
- Docker Packer
- Docker Cheat Sheet
- Docker Q & A #1
- Kubernetes Q & A - Part I
- Kubernetes Q & A - Part II
- Docker - Run a React app in a docker
- Docker - Run a React app in a docker II (snapshot app with nginx)
- Docker - NodeJS and MySQL app with React in a docker
- Docker - Step by Step NodeJS and MySQL app with React - I
- Installing LAMP via puppet on Docker
- Docker install via Puppet
- Nginx Docker install via Ansible
- Apache Hadoop CDH 5.8 Install with QuickStarts Docker
- Docker - Deploying Flask app to ECS
- Docker Compose - Deploying WordPress to AWS
- Docker - WordPress Deploy to ECS with Docker-Compose (ECS-CLI EC2 type)
- Docker - WordPress Deploy to ECS with Docker-Compose (ECS-CLI Fargate type)
- Docker - ECS Fargate
- Docker - AWS ECS service discovery with Flask and Redis
- Docker & Kubernetes : minikube
- Docker & Kubernetes 2 : minikube Django with Postgres - persistent volume
- Docker & Kubernetes 3 : minikube Django with Redis and Celery
- Docker & Kubernetes 4 : Django with RDS via AWS Kops
- Docker & Kubernetes : Kops on AWS
- Docker & Kubernetes : Ingress controller on AWS with Kops
- Docker & Kubernetes : HashiCorp's Vault and Consul on minikube
- Docker & Kubernetes : HashiCorp's Vault and Consul - Auto-unseal using Transit Secrets Engine
- Docker & Kubernetes : Persistent Volumes & Persistent Volumes Claims - hostPath and annotations
- Docker & Kubernetes : Persistent Volumes - Dynamic volume provisioning
- Docker & Kubernetes : DaemonSet
- Docker & Kubernetes : Secrets
- Docker & Kubernetes : kubectl command
- Docker & Kubernetes : Assign a Kubernetes Pod to a particular node in a Kubernetes cluster
- Docker & Kubernetes : Configure a Pod to Use a ConfigMap
- AWS : EKS (Elastic Container Service for Kubernetes)
- Docker & Kubernetes : Run a React app in a minikube
- Docker & Kubernetes : Minikube install on AWS EC2
- Docker & Kubernetes : Cassandra with a StatefulSet
- Docker & Kubernetes : Terraform and AWS EKS
- Docker & Kubernetes : Pods and Service definitions
- Docker & Kubernetes : Service IP and the Service Type
- Docker & Kubernetes : Kubernetes DNS with Pods and Services
- Docker & Kubernetes : Headless service and discovering pods
- Docker & Kubernetes : Scaling and Updating application
- Docker & Kubernetes : Horizontal pod autoscaler on minikubes
- Docker & Kubernetes : From a monolithic app to micro services on GCP Kubernetes
- Docker & Kubernetes : Rolling updates
- Docker & Kubernetes : Deployments to GKE (Rolling update, Canary and Blue-green deployments)
- Docker & Kubernetes : Slack Chat Bot with NodeJS on GCP Kubernetes
- Docker & Kubernetes : Continuous Delivery with Jenkins Multibranch Pipeline for Dev, Canary, and Production Environments on GCP Kubernetes
- Docker & Kubernetes : NodePort vs LoadBalancer vs Ingress
- Docker & Kubernetes : MongoDB / MongoExpress on Minikube
- Docker & Kubernetes : Load Testing with Locust on GCP Kubernetes
- Docker & Kubernetes : MongoDB with StatefulSets on GCP Kubernetes Engine
- Docker & Kubernetes : Nginx Ingress Controller on Minikube
- Docker & Kubernetes : Setting up Ingress with NGINX Controller on Minikube (Mac)
- Docker & Kubernetes : Nginx Ingress Controller for Dashboard service on Minikube
- Docker & Kubernetes : Nginx Ingress Controller on GCP Kubernetes
- Docker & Kubernetes : Kubernetes Ingress with AWS ALB Ingress Controller in EKS
- Docker & Kubernetes : Setting up a private cluster on GCP Kubernetes
- Docker & Kubernetes : Kubernetes Namespaces (default, kube-public, kube-system) and switching namespaces (kubens)
- Docker & Kubernetes : StatefulSets on minikube
- Docker & Kubernetes : RBAC
- Docker & Kubernetes Service Account, RBAC, and IAM
- Docker & Kubernetes - Kubernetes Service Account, RBAC, IAM with EKS ALB, Part 1
- Docker & Kubernetes : Helm Chart
- Docker & Kubernetes : My first Helm deploy
- Docker & Kubernetes : Readiness and Liveness Probes
- Docker & Kubernetes : Helm chart repository with Github pages
- Docker & Kubernetes : Deploying WordPress and MariaDB with Ingress to Minikube using Helm Chart
- Docker & Kubernetes : Deploying WordPress and MariaDB to AWS using Helm 2 Chart
- Docker & Kubernetes : Deploying WordPress and MariaDB to AWS using Helm 3 Chart
- Docker & Kubernetes : Helm Chart for Node/Express and MySQL with Ingress
- Docker & Kubernetes : Deploy Prometheus and Grafana using Helm and Prometheus Operator - Monitoring Kubernetes node resources out of the box
- Docker & Kubernetes : Deploy Prometheus and Grafana using kube-prometheus-stack Helm Chart
- Docker & Kubernetes : Istio (service mesh) sidecar proxy on GCP Kubernetes
- Docker & Kubernetes : Istio on EKS
- Docker & Kubernetes : Istio on Minikube with AWS EC2 for Bookinfo Application
- Docker & Kubernetes : Deploying .NET Core app to Kubernetes Engine and configuring its traffic managed by Istio (Part I)
- Docker & Kubernetes : Deploying .NET Core app to Kubernetes Engine and configuring its traffic managed by Istio (Part II - Prometheus, Grafana, pin a service, split traffic, and inject faults)
- Docker & Kubernetes : Helm Package Manager with MySQL on GCP Kubernetes Engine
- Docker & Kubernetes : Deploying Memcached on Kubernetes Engine
- Docker & Kubernetes : EKS Control Plane (API server) Metrics with Prometheus
- Docker & Kubernetes : Spinnaker on EKS with Halyard
- Docker & Kubernetes : Continuous Delivery Pipelines with Spinnaker and Kubernetes Engine
- Docker & Kubernetes : Multi-node Local Kubernetes cluster : Kubeadm-dind (docker-in-docker)
- Docker & Kubernetes : Multi-node Local Kubernetes cluster : Kubeadm-kind (k8s-in-docker)
- Docker & Kubernetes : nodeSelector, nodeAffinity, taints/tolerations, pod affinity and anti-affinity - Assigning Pods to Nodes
- Docker & Kubernetes : Jenkins-X on EKS
- Docker & Kubernetes : ArgoCD App of Apps with Heml on Kubernetes
- Docker & Kubernetes : ArgoCD on Kubernetes cluster
- Docker & Kubernetes : GitOps with ArgoCD for Continuous Delivery to Kubernetes clusters (minikube) - guestbook
Ph.D. / Golden Gate Ave, San Francisco / Seoul National Univ / Carnegie Mellon / UC Berkeley / DevOps / Deep Learning / Visualization