Make Your Own Rules, ElastAlert Style

Right off the bat, I want to say that this blog does not cover installing and configuring ElastAlert in the usual sense, i.e. working with its pre-existing rules. What it does try to do, I hope, is help you understand what it takes to add a rule of your own.

This was a unique use case. Well, not that unique. A lot of people must have run into it before, but the details were probably different for each. Confused? Don't be, I'll explain. We needed to make sure that the backend response time of our API endpoints, as seen by HAProxy, did not exceed the usual value. What is the usual value, you ask? I must say, the requirement specification made it as clear as water. We had to learn the backend response time of each API endpoint over a period of a month and calculate the 95th percentile of its values. That would be the usual value. In the second part, every hour, we had to calculate the average response time of each endpoint over the last 5-10 minutes, match it against that endpoint's 95th percentile value, and throw alerts based on the comparison. Now, I don't know what you are thinking, but I thought I needed to create a new rule type, since ElastAlert doesn't have one that supports this requirement yet. Thankfully, it makes creating one's own rule type quite easy.
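To make the requirement concrete, here is a minimal sketch of the arithmetic, assuming the month of response times for each endpoint is already available as a list of numbers. The nearest-rank percentile method, the helper names, and the sample endpoints are illustrative assumptions (hypothetical), not something ElastAlert or our rule prescribes; percentile definitions that interpolate can give slightly different thresholds.

```python
import math
from statistics import mean


def nearest_rank_percentile(values, pct):
    """Return the pct-th percentile of values using the nearest-rank method."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    # Nearest rank: smallest value with at least pct% of the data at or below it.
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


def endpoints_over_baseline(baseline, recent, pct=95):
    """Compare each endpoint's recent average against its historical percentile.

    baseline: {endpoint: [response times collected over about a month]}
    recent:   {endpoint: [response times from the last few minutes]}
    Returns {endpoint: (recent_average, threshold)} for endpoints above it.
    """
    breaches = {}
    for endpoint, history in baseline.items():
        window = recent.get(endpoint)
        if not history or not window:
            continue  # nothing to compare for this endpoint
        threshold = nearest_rank_percentile(history, pct)
        average = mean(window)
        if average > threshold:
            breaches[endpoint] = (average, threshold)
    return breaches


if __name__ == "__main__":
    # Hypothetical sample data, purely for illustration.
    baseline = {"/api/orders": list(range(1, 101)), "/api/users": [20] * 50}
    recent = {"/api/orders": [90, 110, 120], "/api/users": [18, 22]}
    print(endpoints_over_baseline(baseline, recent))
```

With this sample data only /api/orders is reported: its recent average (about 106.7) is above its 95th percentile of 95, while /api/users averages exactly its threshold of 20 and is not flagged, because the comparison is strictly greater-than.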

Create a module

This is just like any other Python package: you create a directory with an __init__.py and write your module files in it. There's a catch, though. ElastAlert already provides a base class, RuleType. All you have to do is create a subclass of it and write your rule logic there. The class has a few methods that will help you along the way; go through the official documentation to understand them better (here). The only mandatory one is add_data. It receives the documents matching your query, which ElastAlert downloads max_query_size at a time (10,000 by default). This limit can be raised, but that is not recommended. If a larger number of results is expected but the documents themselves don't need to be downloaded, you can set use_count_query, which fetches only the number of hits instead of the documents (the rule then receives the counts through add_count_data rather than add_data). It is explained better in the documentation.
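Below is a minimal sketch of such a subclass. It assumes the percentile_value option from the rule file shown later in this post, and it assumes each HAProxy document carries the endpoint in http_request and the backend response time in a field I am calling time_backend_response (a hypothetical name, assumed to be in milliseconds; use whatever your mapping has). The alert logic is a simplified stand-in: each batch's average per endpoint is compared with the percentile of everything seen earlier. The try/except exists only so the sketch runs without ElastAlert installed; the fallback class is a stand-in, not ElastAlert's RuleType.

```python
import math
from collections import defaultdict

try:
    # The real base class, when ElastAlert is installed.
    from elastalert.ruletypes import RuleType
except ImportError:
    # Minimal stand-in so this sketch runs on its own (NOT ElastAlert's class).
    class RuleType(object):
        required_options = frozenset()

        def __init__(self, rules, args=None):
            self.rules = rules
            self.matches = []

        def add_match(self, event):
            self.matches.append(dict(event))


class PercentileRule(RuleType):
    """Hypothetical rule: match when an endpoint's batch average exceeds the
    configured percentile of the response times seen in earlier batches."""

    # ElastAlert checks that these options are present in the rule file.
    required_options = frozenset(["percentile_value"])

    def __init__(self, rules, args=None):
        super(PercentileRule, self).__init__(rules, args)
        # percentile_value is quoted ("95") in the YAML, so convert it.
        self.pct = float(self.rules["percentile_value"])
        self.history = defaultdict(list)  # endpoint -> earlier response times

    def _percentile(self, values):
        # Nearest-rank percentile of a non-empty list.
        ordered = sorted(values)
        rank = max(math.ceil(self.pct * len(ordered) / 100), 1)
        return ordered[rank - 1]

    def add_data(self, data):
        # ElastAlert passes the hits of each query as a list of dicts.
        batch = defaultdict(list)
        for doc in data:
            endpoint = doc.get("http_request")            # assumed field name
            elapsed = doc.get("time_backend_response")    # hypothetical field name
            if endpoint is not None and elapsed is not None:
                batch[endpoint].append(float(elapsed))

        for endpoint, times in batch.items():
            past = self.history[endpoint]
            if past:
                threshold = self._percentile(past)
                average = sum(times) / len(times)
                if average > threshold:
                    self.add_match({"endpoint": endpoint,
                                    "average": average,
                                    "threshold": threshold})
            # Remember this batch; later runs only get new documents.
            past.extend(times)

    def get_match_str(self, match):
        # Text included in the alert body for each match.
        return "%s averaged %.1f ms, above its %gth percentile of %.1f ms" % (
            match["endpoint"], match["average"], self.pct, match["threshold"])
```

Keeping the history inside the rule object is the important design point here: each call to add_data only sees the current batch, so anything needed later has to be stored by the rule itself.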

“If you expect a large number of results, consider using use_count_query for the rule. If max_query_size limit is reached, a warning will be logged but ElastAlert will continue without downloading more results. This setting will override a global max_query_size.”
ElastAlert Docs max_query_size

The best help on how to continue is right there in the ElastAlert home directory. The file /opt/elastalert/elastalert/ruletypes.py (assuming /opt/elastalert is the home directory) contains the definitions of all the existing ElastAlert rule types, and just going through them will make most things clear. Similarly, the /opt/elastalert/example_rules/ directory has examples of different rule configurations.

For reference, this is how I did it:

This is the ElastAlert home directory, /opt/elastalert in my case.

[adeel@opstreeLabs:/opt/elastalert](master) $ ls

changelog.md docker-compose.yml docs example_rules Makefile README.md requirements.txt setup.py tests
config.yaml.example Dockerfile-test elastalert LICENSE pytest.ini requirements-dev.txt setup.cfg supervisord.conf.example tox.ini

Here, create a new directory for the module's files:

[adeel@opstreeLabs:/opt/elastalert](master) $ mkdir elastalert_modules

Now, cd into that directory:

[adeel@opstreeLabs:/opt/elastalert](master) $ cd elastalert_modules

Once you are done writing the package files, running ls in /opt/elastalert/elastalert_modules should show something like this:

[adeel@opstreeLabs:/opt/elastalert/elastalert_modules](master) $ ls

__init__.py percentile_calculate.py time_backend_response_collect.py

This concludes my module creation.
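ElastAlert finds a custom rule type through the dotted path in the rule's type option (for example elastalert_modules.percentile_calculate.PercentileRule in the config below), which is why elastalert_modules needs its __init__.py and has to be importable from the directory ElastAlert runs in. The sketch below shows the general idea with importlib; load_rule_class is a hypothetical helper and a simplified illustration, not ElastAlert's own loader.

```python
import importlib


def load_rule_class(dotted_path):
    """Resolve 'package.module.ClassName' to the class object.

    Everything before the last dot is imported as a module; the part after it
    is looked up as an attribute of that module.
    """
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError("expected 'module.ClassName', got %r" % dotted_path)
    module = importlib.import_module(module_path)  # needs the package on sys.path
    return getattr(module, class_name)


if __name__ == "__main__":
    # A standard-library path stands in for elastalert_modules.percentile_calculate.
    print(load_rule_class("json.decoder.JSONDecoder"))
```

If the import fails with ModuleNotFoundError for elastalert_modules, the usual suspects are a missing __init__.py or starting ElastAlert from a directory where the package is not on the Python path.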

Write rule config files

Rule config files are easy to write. Just follow (this) document and you are ready to go. I'll cover the places where problems may arise, that is, the problems I faced:

Understanding the difference between run_every, buffer_time, and time_frame: Well, it is simple. Two of them, run_every and buffer_time, are standard options whose behavior is already defined; you just have to understand how they work. run_every configures the interval at which your rule is run. buffer_time is stated better in the docs: “ElastAlert will continuously query against a window from the present to buffer_time ago.” Keep in mind, though, that ElastAlert does not hand your rule documents it has already processed, so a buffer_time longer than run_every does not mean your rule sees the older data again. For example, if run_every is 5 minutes and buffer_time is 15 minutes, ElastAlert will run every 5 minutes and query the last 15 minutes of data that reached Elasticsearch. But between two runs, only 5 minutes of new data arrive; documents older than that were already processed in an earlier run and are skipped. If that got a bit confusing, here is the part that matters: if your alert logic needs older information repeatedly, you have to store it yourself across queries. The third one, time_frame (the timeframe option in ElastAlert rule files), is something you can use in your subclass to perform some action on, and get results from, the documents that fall within a certain period of time.
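The run_every/buffer_time example above can be modeled in a few lines. This is a simplified sketch under stated assumptions: minutes stand in for documents, the first run is assumed to happen once a full buffer_time of data exists, and dropping already-seen minutes stands in for ElastAlert skipping documents it has already processed. fresh_minutes_per_run is a hypothetical name.

```python
def fresh_minutes_per_run(run_every, buffer_time, first_run_at, runs):
    """Model which minutes of data each run actually hands to the rule.

    Each run queries the window [now - buffer_time, now); minutes already seen
    by an earlier run are removed, so only new data reaches the rule.
    """
    seen = set()
    fresh_per_run = []
    for i in range(runs):
        now = first_run_at + i * run_every
        window = set(range(now - buffer_time, now))  # minutes covered by the query
        fresh = sorted(window - seen)                # minutes not processed before
        seen |= window
        fresh_per_run.append(fresh)
    return fresh_per_run


if __name__ == "__main__":
    runs = fresh_minutes_per_run(run_every=5, buffer_time=15, first_run_at=15, runs=3)
    print([len(minutes) for minutes in runs])  # -> [15, 5, 5]
```

After the first run, each 15-minute query contributes only 5 new minutes, which is why a rule that needs a longer history has to keep it itself.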

realert: This is an important setting when you are writing your own rule types. It suppresses repeated alerts: once a rule has alerted, further matches for that rule (or for the same query_key value, if one is set) are ignored for the realert period, which is 1 minute by default. The bottom line is that if you want to be alerted about each hit, set it to 0.
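A simplified model of that suppression, assuming a single rule without query_key and treating each match as alerted at its own timestamp (ElastAlert itself silences relative to when the alert was actually sent). alerts_sent is a hypothetical name; it illustrates the idea, not ElastAlert's code.

```python
from datetime import datetime, timedelta


def alerts_sent(match_times, realert):
    """Return the match times that would produce an alert.

    After an alert, further matches are suppressed until `realert` has passed.
    With realert == timedelta(0), nothing is ever suppressed.
    """
    sent = []
    silenced_until = None
    for t in sorted(match_times):
        if silenced_until is None or t >= silenced_until:
            sent.append(t)
            silenced_until = t + realert
    return sent


if __name__ == "__main__":
    t0 = datetime(2020, 1, 1, 12, 0, 0)
    matches = [t0, t0 + timedelta(seconds=30), t0 + timedelta(seconds=70)]
    print(len(alerts_sent(matches, timedelta(minutes=1))))  # -> 2
    print(len(alerts_sent(matches, timedelta(0))))          # -> 3
```

With the default one-minute window, the match at 30 seconds is swallowed; with realert set to 0, every match gets its own alert, which is what an hourly per-endpoint check needs.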

For reference, below is my rule config file, percentile_calculate.yaml:

name: "95th Percentile Alert Prod"
index: prod-haproxy-*
type: "elastalert_modules.percentile_calculate.PercentileRule"
percentile_value: "95"
run_every:
  hours: 1
buffer_time:
  hours: 1
realert:
  minutes: 0
max_query_size: 10000
max_scrolling_count: 0
  
filter:
- query:
    query_string: {query: 'http_status_code:"200" AND NOT http_request:"/favicon.ico"'}

alert:
- slack
slack_webhook_url: "https://hooks.slack.com/services/***********************"
slack_username_override: "elastalert"
slack_channel_override: "#percentile-alert"
slack_emoji_override: ":robot_face:"

Test your rule

You can test your rules before running them using elastalert-test-rule. The prerequisites for running a test are:

A Python 3 virtual environment must be activated, or python3 must be the default Python.
All the required modules must be installed, which goes without saying.

Once the test runs successfully, you can be fairly confident that the actual rule will behave the same way. If it doesn't, it's troubleshooting time, I guess. Of course, all the details for running the test can be found in the docs (here).

There you have it, your own rule. I guess what they say is true: “Nobody can tell me what to do, I make my own rules.” Who says, you ask? Well, there are influencers, motivational speakers, hormonal teenagers, cranky old-timers, and, not to forget, criminals, of course. Jokes aside, this blog is not a word-for-word guide on how to create a rule type, mostly because the official documentation already covers that. It is meant to reassure anyone who is stuck that it can be done, and rather easily. Until next time.


Opstree is an End to End DevOps solution provider