High Availability with Logstash

There are countless benefits that centralization brings to your infrastructure. In recent years several solutions have appeared, and among them Logstash stands out.

In this article I will explain how to create a high-availability cluster for centralized logging with Logstash and Elasticsearch. With this cluster we will be able to index 750 GB of logs per day.

Scenario

The infrastructure was built on AWS, using four c3.4xlarge instances, each with four 750 GB disks in RAID 0, running CentOS 6.5.

These machines receive logs from 350 Linux servers.


Installing and configuring Elasticsearch

http://www.elasticsearch.org

Elasticsearch is a document-oriented search engine that stores documents in JSON format, provides near-real-time search, and can be configured for high availability.

Installing Elasticsearch

This tutorial was written using CentOS 6.x as the server OS.

First, let's create the repository for Elasticsearch by creating the file elasticsearch.repo in /etc/yum.repos.d/ with the following content:

[elasticsearch-1.4]
name=Elasticsearch repository for 1.4.x packages
baseurl=http://packages.elasticsearch.org/elasticsearch/1.4/centos
gpgcheck=1
gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch
enabled=1
sslverify=true

Now install Elasticsearch and enable it to start at boot.

rpm --import http://packages.elasticsearch.org/GPG-KEY-elasticsearch
yum install elasticsearch
chkconfig --add elasticsearch
chkconfig elasticsearch on

Create the software RAID 0 array, as in the example below, then format and mount the Elasticsearch partition:

mdadm -C /dev/md0 --level=raid0 --raid-devices=4 /dev/xvdl /dev/xvdm /dev/xvdn /dev/xvdo
mkfs.ext4 /dev/md0
mkdir /elasticsearch
mount -t ext4 /dev/md0 /elasticsearch/
chown -R elasticsearch:elasticsearch /elasticsearch
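Optionally, to make the array and the mount survive a reboot, you can record the array in /etc/mdadm.conf and add an fstab entry. A minimal sketch, assuming the /dev/md0 array and mount point created above:

# Save the array definition so it is reassembled at boot
mdadm --detail --scan >> /etc/mdadm.conf

# Add an fstab entry and verify that it mounts cleanly
echo '/dev/md0  /elasticsearch  ext4  defaults  0 0' >> /etc/fstab
mount -a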

Now, let's point the Elasticsearch data directory at the partition we created, set the amount of memory for Elasticsearch, and configure it so that it uses only IPv4.

The c3.4xlarge instance has 30 GB of memory; I recommend allocating 20 GB to Elasticsearch.

Edit the file /etc/sysconfig/elasticsearch and set the following parameters:

ES_HEAP_SIZE=20g
ES_JAVA_OPTS="-Djava.net.preferIPv4Stack=true"
DATA_DIR=/elasticsearch

Restart the service:

service elasticsearch restart
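After the restart, you can quickly confirm that the node is up and picked up the 20 GB heap. A minimal check, assuming Elasticsearch is listening on the default HTTP port 9200:

# The heap_max_in_bytes value should reflect the ES_HEAP_SIZE set above (about 20 GB)
curl 'http://localhost:9200/_nodes/jvm?pretty' | grep heap_max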

Note: This configuration must be done in all servers of the cluster.

Creating the Elasticsearch Cluster

Edit the file /etc/elasticsearch/elasticsearch.yml to configure the cluster.

cluster.name: logstash-elasticsearch
node.name: "server1"
node.master: true
node.data: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["server1", "server2", "server3", "server4"]
discovery.zen.minimum_master_nodes: 3
discovery.zen.ping.timeout: 15s
cluster.routing.allocation.enable: all

cluster.name: The name of the cluster. It can be any name you prefer; it is used by nodes to discover and auto-join each other.

node.name: You may also want to change the default node name of each node to something like its hostname. By default, Elasticsearch will randomly pick a Marvel character name from a list of around 3000 names when the node starts up.

node.master: Allow this node to be eligible as a master node (enabled by default).

node.data: Elasticsearch allows you to configure whether a node is allowed to store data locally or not. Storing data locally basically means that shards of different indices can be allocated on that node. By default, each node is considered to be a data node; this can be turned off by setting node.data to false.

discovery.zen.ping.multicast.enabled: Multicast ping discovery of other nodes is done by sending one or more multicast requests which existing nodes will receive and respond to.

discovery.zen.ping.unicast.hosts: The unicast discovery allows for discovery when multicast is not enabled. It basically requires a list of hosts to use that will act as gossip routers.

discovery.zen.minimum_master_nodes: The minimum_master_nodes setting is extremely important to the stability of your cluster. This setting helps prevent split brains, the existence of two masters in a single cluster.

When you have a split brain, your cluster is in danger of losing data. Because the master is considered the supreme ruler of the cluster, it decides when new indices can be created, how shards are moved, and so forth. If you have two masters, data integrity becomes perilous, since you have two nodes that think they are in charge.

This setting tells Elasticsearch to not elect a master unless there are enough master-eligible nodes available. Only then will an election take place.

This setting should always be configured to a quorum (majority) of your master-eligible nodes. A quorum is (number of master-eligible nodes / 2) + 1. In this cluster all four nodes are master-eligible, so the quorum is (4 / 2) + 1 = 3, which is why discovery.zen.minimum_master_nodes is set to 3 above.

discovery.zen.ping.timeout: The discovery.zen.ping_timeout (which defaults to 3s) allows for the tweaking of election time to handle cases of slow or congested networks (higher values assure less chance of failure).

Shards Allocation: Shards allocation is the process of allocating shards to nodes. This can happen during initial recovery, replica allocation, rebalancing, or handling nodes being added or removed.

cluster.routing.allocation.enable: Controls shard allocation for all indices, by allowing specific kinds of shard to be allocated.

Can be set to:

  • all: (default) Allows shard allocation for all kinds of shards.
  • primaries: Allows shard allocation only for primary shards.
  • new_primaries: Allows shard allocation only for primary shards for new indices.
  • none: No shard allocations of any kind are allowed for all indices.

Note: This configuration must be done in all servers of the cluster, changing only node.name.
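Once this is done on all four servers and the service has been restarted on each of them, you can confirm that the nodes discovered each other. A minimal check with the cat API, assuming the default HTTP port 9200:

# Lists every node in the cluster; the elected master is marked with a '*' in the master column
curl 'http://localhost:9200/_cat/nodes?v'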

Elasticsearch Tuning

Edit the file /etc/elasticsearch/elasticsearch.yml:

bootstrap.mlockall: true
indices.memory.index_buffer_size: 50%
index.translog.flush_threshold_ops: 50000
index.number_of_shards: 3
index.number_of_replicas: 1

threadpool.search.type: fixed
threadpool.search.size: 20
threadpool.search.queue_size: 200

threadpool.index.type: fixed
threadpool.index.size: 60
threadpool.index.queue_size: 200

bootstrap.mlockall: Uses mlockall to lock the process address space into RAM, preventing any Elasticsearch memory from being swapped out.

Indexing Buffer: The indexing buffer setting allows you to control how much memory will be allocated for the indexing process. It is a global setting that applies to all the shards allocated on a specific node.

indices.memory.index_buffer_size: The indices.memory.index_buffer_size accepts either a percentage or a byte size value. It defaults to 10%, meaning that 10% of the total memory allocated to a node will be used as the indexing buffer size. This amount is then divided between all the different shards. Also, if a percentage is used, it is possible to set min_index_buffer_size (defaults to 48mb) and max_index_buffer_size (defaults to unbounded). With the 20 GB heap configured earlier and the 50% setting above, roughly 10 GB per node is reserved for indexing buffers, divided among the shards allocated on that node.

Translog: Each shard has a transaction log, or write-ahead log, associated with it. It guarantees that when an index/delete operation occurs, it is applied atomically, without "committing" the internal Lucene index for each request. A flush ("commit") still happens based on several parameters:

index.translog.flush_threshold_ops: After how many operations to flush. Defaults to unlimited.

Create Index: Elasticsearch provides support for multiple indices, including executing operations across several indices.

Index Settings: Each index created can have specific settings associated with it.

index.number_of_shards: The number of primary shards that an index should have, which defaults to 5. This setting cannot be changed after index creation.

index.number_of_replicas: The number of replica shards (copies) that each primary shard should have, which defaults to 1. This setting can be changed at any time on a live index.
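Because number_of_replicas can be changed on a live index, you can adjust it later through the REST API without reindexing. A minimal sketch, using a hypothetical index named logstash-2015.03.01:

# Raise the replica count of an existing index to 2 copies per primary shard
curl -XPUT 'http://localhost:9200/logstash-2015.03.01/_settings' -d '{
  "index": { "number_of_replicas": 2 }
}'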

Thread Pool: A node holds several thread pools in order to improve how thread memory consumption is managed within a node. Many of these pools also have queues associated with them, which allow pending requests to be held instead of discarded.

  • index: For index/delete operations. Defaults to fixed with a size of # of available processors, queue_size of 200.
  • search: For count/search operations. Defaults to fixed with a size of 3x # of available processors, queue_size of 1000.
  • suggest: For suggest operations. Defaults to fixed with a size of # of available processors, queue_size of 1000.
  • get: For get operations. Defaults to fixed with a size of # of available processors, queue_size of 1000.
  • bulk: For bulk operations. Defaults to fixed with a size of # of available processors, queue_size of 50.
  • percolate: For percolate operations. Defaults to fixed with a size of # of available processors, queue_size of 1000.
  • snapshot: For snapshot/restore operations. Defaults to scaling, keep-alive 5m with a size of (# of available processors)/2.
  • warmer: For segment warm-up operations. Defaults to scaling with a 5m keep-alive.
  • refresh: For refresh operations. Defaults to scaling with a 5m keep-alive.
  • listener: Mainly for the Java client, to execute actions when listener threading is enabled. Default size of (# of available processors)/2, max at 10.

Thread pool types: The following are the types of thread pools that can be used and their respective parameters.

  • cache: The cache thread pool is an unbounded thread pool that will spawn a thread if there are pending requests.
  • fixed: The fixed thread pool holds a fixed size of threads to handle the requests with a queue (optionally bounded) for pending requests that have no threads to service them.
  • size: The size parameter controls the number of threads, and defaults to the number of cores times 5.
  • queue_size: The queue_size parameter controls the size of the queue of pending requests that have no threads to execute them. By default, it is set to -1, which means it is unbounded. When a request comes in and the queue is full, the request is aborted.

Note: This configuration must be done in all servers of the cluster.
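To see how these pools behave under load, the cat API exposes per-node thread pool statistics. A minimal check, again assuming the default HTTP port 9200:

# Shows active, queued and rejected requests for the bulk, index and search pools on each node
curl 'http://localhost:9200/_cat/thread_pool?v'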

Using the Elasticsearch RESTful API

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs.html

The Elasticsearch API is very useful for various tasks, as it can be used to index, search, update, and delete documents.

Here are some examples we can use to check the health of the nodes.

http://localhost:9200/?pretty

{
  "status" : 200,
  "name" : "server1",
  "cluster_name" : "logstash-elasticsearch",
  "version" : {
    "number" : "1.4.3",
    "build_hash" : "36a29a7144cfde87a960ba039091d40856fcb9af",
    "build_timestamp" : "2015-02-11T14:23:15Z",
    "build_snapshot" : false,
    "lucene_version" : "4.10.3"
  },
  "tagline" : "You Know, for Search"
}

http://localhost:9200/_nodes/process?pretty

{
  "cluster_name" : "logstash-elasticsearch",
  "nodes" : {
    "hmmW5sBDTjezJ8-ZoRfEBg" : {
      "name" : "server1",
      "transport_address" : "inet[/192.168.0.1:9300]",
      "host" : "server1.example.com",
      "ip" : "127.0.0.1",
      "version" : "1.4.3",
      "build" : "36a29a7",
      "http_address" : "inet[/192.168.0.1:9200]",
      "attributes" : {
        "master" : "true"
      },
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 20490,
        "max_file_descriptors" : 65535,
        "mlockall" : true
      }
    },
    "PnXa3HANRYy168zrj-gHUA" : {
      "name" : "server2",
      "transport_address" : "inet[/192.168.0.2:9300]",
      "host" : "server2.example.com",
      "ip" : "127.0.0.1",
      "version" : "1.4.4",
      "build" : "c88f77f",
      "http_address" : "inet[/192.168.0.2:9200]",
      "attributes" : {
        "master" : "true"
      },
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 6807,
        "max_file_descriptors" : 65535,
        "mlockall" : true
      }
    },
    "gCqqiJwqQuCLbCYsHkmxaQ" : {
      "name" : "server3",
      "transport_address" : "inet[/192.168.0.3:9300]",
      "host" : "server3.example.com",
      "ip" : "127.0.0.1",
      "version" : "1.4.3",
      "build" : "36a29a7",
      "http_address" : "inet[/192.168.0.3:9200]",
      "attributes" : {
        "master" : "true"
      },
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 27220,
        "max_file_descriptors" : 65535,
        "mlockall" : true
      }
    },
    "8dBGxBvnSw2QgEDdDL6Emg" : {
      "name" : "server4",
      "transport_address" : "inet[/192.168.0.4:9300]",
      "host" : "server4.example.com",
      "ip" : "127.0.0.1",
      "version" : "1.4.3",
      "build" : "36a29a7",
      "http_address" : "inet[/192.168.0.4:9200]",
      "attributes" : {
        "master" : "true"
      },
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 11087,
        "max_file_descriptors" : 65535,
        "mlockall" : true
      }
    }
  }
}
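Another useful check is the cluster health endpoint, which summarizes the cluster status, the number of nodes, and the state of shard allocation:

curl 'http://localhost:9200/_cluster/health?pretty'

A green status means all primary and replica shards are allocated, yellow means one or more replicas are unassigned, and red means at least one primary shard is missing.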

Installing and Configuring Logstash

http://logstash.net/

Logstash is a tool to collect, store, and analyze logs.

Installing Logstash:

First, let's create the repository for Logstash. To do that, create the file logstash.repo in /etc/yum.repos.d/ with the following content:

[logstash-1.4]
name=logstash repository for 1.4.x packages
baseurl=http://packages.elasticsearch.org/logstash/1.4/centos
gpgcheck=1
gpgkey=http://packages.elasticsearch.org/GPG-KEY-elasticsearch
enabled=1
sslverify=true

Now, install Logstash and enable it to start at boot.

rpm --import http://packages.elasticsearch.org/GPG-KEY-elasticsearch
yum install logstash
chkconfig --add logstash
chkconfig logstash on

Edit the file /etc/init.d/logstash to set the following parameters:

LS_USER=root
LS_GROUP=logstash
LS_HOME=/var/lib/logstash
LS_HEAP_SIZE="4g"
LS_JAVA_OPTS="-Djava.io.tmpdir=${LS_HOME} -Djava.net.preferIPv4Stack=true"
LS_LOG_DIR=/var/log/logstash
LS_LOG_FILE="${LS_LOG_DIR}/$name.log"
LS_CONF_DIR=/etc/logstash/conf.d
LS_OPEN_FILES=65535
LS_NICE=19
LS_OPTS=""

Let's create the configuration file that will receive logs and send them to Elasticsearch.

Create the file logstash-server.conf in /etc/logstash/conf.d/ with the following content:

input {
 lumberjack {
  port => 5000
  ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
  ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
  type => "lumberjack"
 }
}

filter {
 multiline {
  pattern => "^\s"
  what => "previous"
 }
 grok {
  match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
  tag_on_failure => [""]
 }
 if [message] == "" {
  drop { }
 }
}

output {
 elasticsearch {
  cluster => "logstash-elasticsearch"
  protocol => http
  workers => 20
 }
}

input: An input plugin enables a specific source of events to be read by Logstash.

lumberjack: Receives events using the lumberjack protocol. This is mainly used to receive events shipped with lumberjack, whose reference implementation is now Logstash-Forwarder.

port: The port to listen on. This is a required setting. Value type is number. There is no default value for this setting.

ssl_certificate: SSL certificate to use. This is a required setting. Value type is path. There is no default value for this setting.

ssl_key: SSL key to use. This is a required setting. Value type is path. There is no default value for this setting.

type: Add a type field to all events handled by this input. Types are used mainly for filter activation. The type is stored as part of the event itself, so you can also use the type to search for it in Kibana. Value type is string. There is no default value for this setting.

filter: A filter plugin performs intermediary processing on an event. Filters are often applied conditionally depending on the characteristics of the event.

multiline: This filter will collapse multiline messages from a single source into one Logstash event. The original goal of this filter was to allow joining of multi-line messages from files into a single event. For example – joining java exception and stacktrace messages into a single event.

pattern: The regular expression to match. This is a required setting. Value type is string. There is no default value for this setting.

what: If the pattern matched, does the event belong to the next or previous event?

This is a required setting. Value can be any of: previous, next. There is no default value for this setting.

grok: Parse arbitrary text and structure it. Grok is currently the best way in logstash to parse crappy unstructured log data into something structured and queryable. This tool is perfect for syslog logs, apache and other webserver logs, mysql logs, and in general, any log format that is generally written for humans and not computer consumption.

match: A hash of matches of field ⇒ value. Value type is hash. Default value is {}.

tag_on_failure: Append values to the tags field when there has been no successful match. Value type is array. Default value is [“_grokparsefailure”].

output: An output plugin sends event data to a particular destination. Outputs are the final stage in the event pipeline.

elasticsearch: This output lets you store logs in Elasticsearch and is the most recommended output for Logstash. If you plan on using the Kibana web interface, you’ll need to use this output.

cluster: The name of your cluster if you set it on the Elasticsearch side. Useful for discovery. Value type is string. There is no default value for this setting.

protocol: Choose the protocol used to talk to Elasticsearch.

The node protocol will connect to the cluster as a normal Elasticsearch node (but will not store data). This allows you to use things like multicast discovery. If you use the node protocol, you must permit bidirectional communication on port 9300 (or whichever port you have configured).

The transport protocol will connect to the host you specify and will not show up as a node in the Elasticsearch cluster. This is useful in situations where you cannot permit connections outbound from the Elasticsearch cluster to this Logstash server.

The http protocol will use the Elasticsearch REST/HTTP interface to talk to elasticsearch.

All protocols will use bulk requests when talking to Elasticsearch.

The default protocol setting under Java/JRuby is "node". The default protocol on non-Java Rubies is "http".

Value can be any of: node, transport, http.

workers: The number of workers to use for this output. Note that this setting may not be useful for all outputs. Value type is number. Default value is 1.

Note: This configuration must be done on all Logstash servers of the cluster.

Creating certificates for Lumberjack (Logstash-Forwarder)

Lumberjack has recently been renamed to Logstash-Forwarder. The goal of this project is to provide low latency and security while shipping logs.

To create the key and the certificate, create a configuration file called default.config, as shown below:

[req]
distinguished_name = req_distinguished_name
x509_extensions = v3_req
prompt = no

[req_distinguished_name]
C = BR
ST = Sao Paulo
L = Sao Paulo
O = Movile Internet Movel
CN = *

[v3_req]
subjectKeyIdentifier = hash
authorityKeyIdentifier = keyid,issuer:always
basicConstraints = CA:TRUE
subjectAltName = @alt_names

[alt_names]
DNS.1 = *
DNS.2 = *.*
DNS.3 = *.*.*
DNS.4 = *.*.*.*
DNS.5 = *.*.*.*.*
DNS.6 = *.*.*.*.*.*
DNS.7 = *.*.*.*.*.*.*
IP.1 = IP Server1
IP.2 = IP Server2
IP.3 = IP Server3
IP.4 = IP Server4

It is very important to configure the IPs; otherwise, Logstash-Forwarder will not be able to connect to the server.

The following settings must be changed to match your environment:

C = BR
ST = Sao Paulo
L = Sao Paulo
O = Movile Internet Movel
CN = *

Now let’s create the certificate and the key with the following command:

sudo openssl req -x509  -batch -nodes -newkey rsa:2048 -keyout /etc/pki/tls/private/logstash-forwarder.key -out /etc/pki/tls/certs/logstash-forwarder.crt -config default.config -days 3650
chmod 600 /etc/pki/tls/private/logstash-forwarder.key
chmod 600 /etc/pki/tls/certs/logstash-forwarder.crt
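Before starting the service, it is worth checking that the certificate actually contains the IPs you listed and that Logstash accepts the configuration. A minimal sketch, assuming the paths used above and the default rpm install location /opt/logstash:

# Confirm the subject alternative names (DNS entries and server IPs) made it into the certificate
openssl x509 -in /etc/pki/tls/certs/logstash-forwarder.crt -noout -text | grep -A 2 'Subject Alternative Name'

# Check the Logstash configuration syntax without starting the pipeline
/opt/logstash/bin/logstash agent -f /etc/logstash/conf.d/logstash-server.conf --configtest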

Now we can start Logstash.

service logstash start

Note: The logstash-forwarder.crt certificate should be placed on all servers that will send logs to Logstash.

Note: The logstash-forwarder.key key should be placed on all Logstash servers.

Installing and Configuring Logstash-Forwarder

https://github.com/elasticsearch/logstash-forwarder

Note: Logstash-Forwarder must be installed on all servers that will send logs.

Install Logstash-Forwarder through the rpm package:

rpm -ivh https://download.elastic.co/logstash-forwarder/binaries/logstash-forwarder-0.4.0-1.x86_64.rpm

Add as a service:

sudo chkconfig --add logstash-forwarder
sudo chkconfig logstash-forwarder on

Let’s create the configuration file that will send logs to the Logstash server:

Create the file logstash-forwarder.conf in /etc, changing the content below to suit your needs:

{
  "network": {
    "servers": [ "server1:5000", "server2:5000", "server3:5000", "server4:5000" ],
    "timeout": 15,
    "ssl ca": "/etc/pki/tls/certs/logstash-forwarder.crt"
  },

  "files": [
    {
      "paths": [
        "/var/log/secure",
        "/var/log/messages"
      ],
      "fields": { "type": "syslog" },
      "dead time": "10m"
    }
  ]
}

In this configuration it is important to know that:

network: The network section covers network configuration.

servers: A list of downstream servers listening for our messages. Logstash-Forwarder will pick one at random and only switch if the selected one appears to be dead or unresponsive.

ssl ca: The path to your trusted ssl CA file. This is used to authenticate your downstream server.

files: The list of file configurations.

paths: The paths to your log files.

dead time: Controls how long a file may be idle before it is closed to save resources.

Now restart the service:

service logstash-forwarder restart
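To confirm that a forwarder host can reach one of the Logstash servers and trusts its certificate, you can open a test TLS connection from that host. A minimal sketch, assuming the certificate was copied to the same path used above:

# A successful handshake ends with "Verify return code: 0 (ok)"
openssl s_client -connect server1:5000 -CAfile /etc/pki/tls/certs/logstash-forwarder.crt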

Analyzing Logs with Kibana

https://github.com/elasticsearch/kibana

Kibana is one of the best web interfaces to use with Elasticsearch. Besides being a very pleasant interface, it is very interactive and easy to use.

Configuring Kibana

The Logstash package we installed comes with Logstash-Web, which is the name given to the bundled Kibana service. Since it is already installed, let's configure it.

Edit the file /opt/logstash/vendor/kibana/config.js and configure it as below, so that the browser talks to Elasticsearch through the Nginx proxy on port 80 that we will set up next:

elasticsearch: "http://"+window.location.hostname+":80",

Now, restart the service and enable it to start at boot.

sudo service logstash-web restart
sudo chkconfig logstash-web on

Now, let’s install Nginx.

sudo yum install nginx

Create the file kibana.conf in /etc/nginx/conf.d/ with the configuration below:

server {
  listen                *:80 ;

  server_name           localhost;
  access_log            /var/log/nginx/kibana.access.log;

  location / {
    root  /opt/logstash/vendor/kibana;
    index  index.html  index.htm;
  }

  location ~ ^/_aliases$ {
    proxy_pass http://127.0.0.1:9200;
    proxy_read_timeout 90;
  }
  location ~ ^/.*/_aliases$ {
    proxy_pass http://127.0.0.1:9200;
    proxy_read_timeout 90;
  }
  location ~ ^/_nodes$ {
    proxy_pass http://127.0.0.1:9200;
    proxy_read_timeout 90;
  }
  location ~ ^/.*/_search$ {
    proxy_pass http://127.0.0.1:9200;
    proxy_read_timeout 90;
  }
  location ~ ^/.*/_mapping {
    proxy_pass http://127.0.0.1:9200;
    proxy_read_timeout 90;
  }

  # Password protected end points
  location ~ ^/kibana-int/dashboard/.*$ {
    proxy_pass http://127.0.0.1:9200;
    proxy_read_timeout 90;
  }
  location ~ ^/kibana-int/temp.*$ {
    proxy_pass http://127.0.0.1:9200;
    proxy_read_timeout 90;
  }
}
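Before starting Nginx, you can validate the configuration syntax:

# Checks /etc/nginx/nginx.conf and the files it includes, such as kibana.conf
sudo nginx -t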

Start Nginx and enable it to start at boot.

sudo service nginx start
sudo chkconfig nginx on

To access Kibana:

http://localhost


References: