In a new article in Big Data Quarterly, NVIDIA's Jim Scott writes about cyBERT, an NLP-powered toolset for log parsing and a game changer in the critical and time-sensitive area of cybersecurity:
Network security logs are a ubiquitous record of system runtime states and messages of system activities and events. They become the primary source of system behavior and are critical when triaging abnormalities in otherwise normal system execution. The logs are usually unstructured textual messages that are difficult to go through manually because of the ever-increasing rate at which they are created. The raw data from the logs is unstructured, noisy, and inconsistent; thus, some preprocessing and parsing is essential.
Parsing logs with regular expressions is the most widely utilized method available for network log analysis. A regular expression (regex) is a sequence of characters specifying how to match a sequence of characters. Outside of one-off parsing, you are most likely going to use regular expressions to repeatedly parse and normalize log files as part of the analysis infrastructure. However, as the log file format changes, regular expressions fail, and this can create failures in how log data is processed and evaluated. This is often the case as log structures vary in source, format, and time. As the number of sources increases, the number of custom regex parsers increases as well.
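To make the brittleness described above concrete, here is a minimal Python sketch that is not taken from the article. The log layout, the field names, and the sample lines are all hypothetical, invented purely for illustration. It shows a regex parser that handles one format and silently stops matching when the timestamp layout changes.

```python
import re

# Hypothetical log layout (an assumption for this sketch, not from the article):
#   <timestamp> <host> <process>[<pid>]: <message>
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<host>\S+) (?P<process>\w+)\[(?P<pid>\d+)\]: (?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Invented sample line in the format the pattern was written for.
original = "2020-03-01T12:00:00Z web01 sshd[4242]: Failed password for root from 10.0.0.5"

# Same event after a hypothetical format change: the timestamp now contains a space.
changed = "2020-03-01 12:00:00 web01 sshd[4242]: Failed password for root from 10.0.0.5"

parsed_original = parse_line(original)  # dict of fields
parsed_changed = parse_line(changed)    # None: the pattern no longer fits

print(parsed_original)
print(parsed_changed)
```

The second line carries the same information as the first, yet the parser returns nothing for it. Each such change needs a new or amended pattern, which is how the number of custom regex parsers grows with the number of sources.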
Full article continues here.