That is where parsing comes into the picture, structuring and enriching the data so that you can more easily analyze the various fields that make up the log message.
Fine-tuning Logstash to use a grok filter on your logs correctly is an art unto itself and can be extremely time-consuming. Take the timestamp format, for example. Just search for “Logstash timestamp” on Google, and you will quickly be drowned in thousands of StackOverflow questions from people who are having issues with log parsing because of bad grokking.
Also, logs are dynamic. Over time, they change in format and require periodic configuration adjustments. This all translates into hours of work and money.
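To make the problem concrete, here is a minimal sketch in plain Python (rather than an actual grok filter) of the kind of breakage involved. The log format and parsing logic are hypothetical; the point is that a one-character change in the timestamp format is enough to make a parser that worked yesterday fail today.

```python
from datetime import datetime

# Hypothetical log lines: the same application, two slightly different timestamp formats.
lines = [
    "2023-04-01 12:30:45,123 INFO  Starting worker",   # comma-separated milliseconds
    "2023-04-01 12:30:45.123 INFO  Starting worker",   # dot-separated milliseconds
]

# A parser written against the first format only.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

for line in lines:
    timestamp_str = " ".join(line.split()[:2])
    try:
        ts = datetime.strptime(timestamp_str, TIMESTAMP_FORMAT)
        print("parsed:", ts.isoformat())
    except ValueError as err:
        # The equivalent of a grok mismatch: in Logstash the event would be
        # tagged with _grokparsefailure, or lost if the failure is not handled.
        print("parse failure:", err)
```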
Mapping
Elasticsearch mapping defines the different types that reside within an index. For the documents of a specific type, it defines each field's data type (such as string or integer) and how the field should be indexed and stored in Elasticsearch.
With dynamic mapping (which is turned on by default), Elasticsearch automatically inspects the JSON properties in documents before indexing and storage. However, if your logs change (something that is especially common with application logs) and you index documents whose fields conflict with the existing mapping, Elasticsearch will reject them. So, unless you monitor the Elasticsearch logs, you will likely not notice the resulting “MapperParsingException” error and will lose the logs that Elasticsearch rejected.
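One way to reduce the risk is to define the mapping explicitly instead of relying on dynamic mapping. The sketch below uses the official Elasticsearch Python client; the cluster address, index name, and fields are hypothetical, and the exact call signatures vary somewhat between client versions.

```python
from elasticsearch import Elasticsearch

# Hypothetical cluster address and index name; adjust to your environment.
es = Elasticsearch("http://localhost:9200")

# Define the field types up front instead of letting dynamic mapping guess them.
es.indices.create(
    index="app-logs",
    body={
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "message": {"type": "text"},
                "response_code": {"type": "integer"},
            }
        }
    },
)

# A document whose field conflicts with the mapping is rejected by Elasticsearch
# with a mapper_parsing_exception; catching it here makes the failure visible
# instead of letting the log line disappear silently.
try:
    es.index(
        index="app-logs",
        body={
            "@timestamp": "2023-04-01T12:30:45Z",
            "message": "worker started",
            "response_code": "not-a-number",
        },
    )
except Exception as err:
    print("document rejected:", err)
```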
Scaling
You’ve got your pipeline set up, and logs are coming into the system. To ensure high availability and scalability, your ELK deployment must be robust enough to handle pressure. For example, an event in production can cause a sudden spike in traffic, with far more logs being generated than usual. Such cases require installing additional components on top of (or in front of) your ELK Stack.
Most production-grade ELK deployments now include a queuing system in front of Logstash. This ensures that bottlenecks do not form during periods of high traffic and that Logstash does not cave in during the resulting bursts of data. Installing additional Redis or Kafka instances means more time and more money, and in any case, you must make sure that these components scale whenever needed.
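From the application (or shipper) side, buffering through Kafka can look roughly like the following sketch, which uses the kafka-python package; the broker address and topic name are hypothetical, and Logstash would consume from the same topic via its Kafka input plugin.

```python
import json
from kafka import KafkaProducer  # the kafka-python package

# Hypothetical broker address and topic name.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda event: json.dumps(event).encode("utf-8"),
)

# Instead of shipping straight to Logstash, the application writes events to
# Kafka, which absorbs traffic bursts so that Logstash can consume at its own pace.
producer.send("app-logs", {"level": "INFO", "message": "worker started"})
producer.flush()
```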
You will also need to figure out how and when to scale your Logstash instances and Elasticsearch cluster in or out. Scaling out an Elasticsearch cluster involves creating an alert to identify rising load in real time, which should trigger logic that determines the number of additional nodes needed to meet the load, spins them up, and adds them to the cluster. Scaling in after a decrease in load follows similar steps, but also requires draining the selected nodes (i.e., migrating their indices to other nodes) before taking them down.
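The draining step itself can be automated. Here is a rough sketch using the official Elasticsearch Python client: the node name is hypothetical, and a production script would also verify that no shards remain allocated to the node before shutting it down.

```python
import time
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Exclude the (hypothetical) node from shard allocation, which tells
# Elasticsearch to migrate its shards to the remaining nodes.
es.cluster.put_settings(body={
    "transient": {"cluster.routing.allocation.exclude._name": "node-to-remove"}
})

# Wait for shard relocation to finish before taking the node offline.
while True:
    health = es.cluster.health()
    if health["relocating_shards"] == 0:
        break
    time.sleep(10)

print("relocation finished; the node can now be taken down")
```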
Performance tuning
While built for scalability, speed, and high availability, the ELK Stack — as well as the infrastructure (server, OS, network) on which you choose to set it up — requires fine-tuning and optimization to ensure high performance.
For example, you will want to configure the allocations for the different memory types used by Elasticsearch such as the JVM heap and OS swap. The number of indices handled by Elasticsearch affects performance, so you will want to make sure you remove or freeze old and unused indices.
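As an illustration of what pruning old indices can look like, here is a minimal sketch assuming daily indices named logstash-YYYY.MM.dd and a 14-day retention period; in practice, tools such as Curator or index lifecycle management usually handle this.

```python
from datetime import datetime, timedelta
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical retention policy and daily-index naming scheme.
RETENTION_DAYS = 14
cutoff = datetime.utcnow() - timedelta(days=RETENTION_DAYS)

for name in es.indices.get(index="logstash-*"):
    try:
        index_date = datetime.strptime(name, "logstash-%Y.%m.%d")
    except ValueError:
        continue  # skip indices that do not follow the naming scheme
    if index_date < cutoff:
        es.indices.delete(index=name)  # or es.indices.close(index=name) to keep the data
        print("deleted old index:", name)
```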
Fine-tuning shard size, configuring segment merges for indices that are no longer being written to, and handling shard recovery in the case of node failure — these are all tasks that will affect the performance of your ELK Stack deployment and will require planning and implementation.
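For example, shard counts are set when an index (or its template) is created, and segment merges can be forced on indices that are no longer being written to. The sketch below uses the official Elasticsearch Python client with hypothetical index names and shard counts.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical shard/replica counts: chosen when the index is created, since
# the number of primary shards cannot simply be changed later.
es.indices.create(
    index="app-logs-2023.04.02",
    body={"settings": {"number_of_shards": 3, "number_of_replicas": 1}},
)

# Force-merge an index that is no longer being written to, reducing its
# segment count so that searches against it are cheaper.
es.indices.forcemerge(index="app-logs-2023.03.01", max_num_segments=1)
```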