Thread: [RESOLVED] Dynamic "max_check_attempts" based on state for log monitors

  1. #1
    Junior Member
    Join Date
    Sep 2011
    Posts
    27

    My Shinken deployment features a lot of log file monitoring; on the monitored nodes I'm using NSClient++ and the excellent cross-platform check_logfiles by ConSol Labs (http://labs.consol.de/lang/en/nagios/check_logfiles/).

    The "max_check_attempts" for these logfile services is set to 1, since re-checks don't make sense for log files.

    Unfortunately, some of my monitored nodes have sub-optimal network connectivity and the logfile checks occasionally exit with UNKNOWN status. Since "max_check_attempts" is 1, this UNKNOWN condition is notified immediately. However, the vast majority of the time, the next consecutive check works correctly and "catches up" on the logfile monitoring that it missed on the last run.

    I would like to increase the "max_check_attempts" so I don't get these nuisance notifications, but I can't because it would disrupt normal logfile monitoring and cause most WARNING and CRITICAL results to be ignored (since they do not repeat on consecutive checks).

    Ideally, I would like to use a different "max_check_attempts" value for each state. For UNKNOWN it would be 3, and for all other states it would be 1. It doesn't appear that this is possible with the current configuration options.
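
    To make the request concrete, here is a minimal sketch of the behaviour I have in mind. It is a simplified model, not Shinken's actual state logic: it only counts consecutive identical non-OK results, and the names `MAX_ATTEMPTS_BY_STATE`, `DEFAULT_MAX_ATTEMPTS` and `hard_states` are made up for illustration (no such option exists today).

```python
# Hypothetical per-state thresholds; this is NOT an existing Shinken option.
MAX_ATTEMPTS_BY_STATE = {"UNKNOWN": 3}
DEFAULT_MAX_ATTEMPTS = 1  # WARNING and CRITICAL become hard on the first result


def hard_states(results):
    """Return a list of (state, is_hard) pairs for a sequence of check results.

    Simplified model: the attempt counter restarts whenever the state changes,
    and an OK result is always treated as hard.
    """
    attempt = 0
    previous = "OK"
    out = []
    for state in results:
        if state == "OK":
            attempt = 0
            out.append((state, True))
        else:
            # Count how many times in a row this same non-OK state was seen.
            attempt = attempt + 1 if state == previous else 1
            limit = MAX_ATTEMPTS_BY_STATE.get(state, DEFAULT_MAX_ATTEMPTS)
            out.append((state, attempt >= limit))
        previous = state
    return out
```

    With these values, a single UNKNOWN followed by OK would never reach a hard state, a CRITICAL would be hard (and notified) on its first result, and only three UNKNOWN results in a row would be notified.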

    Does anyone have any ideas on how I could get the result I want? Is there any way this feature could ever make its way into the core Shinken feature set?

  2. #2

    Re: Dynamic "max_check_attempts" based on state for log monitors

    Hi,
    I hope you have also set "is_volatile 1", because you could miss events if you set "max_check_attempts" alone.
    If you don't want notifications on UNKNOWN, why don't you add "notification_options w,c,r,f,s" (the whole list without u)?

    Gerhard

  3. #3
    Junior Member
    Join Date
    Sep 2011
    Posts
    27

    Re: Dynamic "max_check_attempts" based on state for log monitors

    Yes, I am setting "is_volatile" to ensure that I don't miss consecutive CRITICAL checks.

    Unfortunately, I can't simply disable UNKNOWN alerts - there are legitimate cases where I need to be notified of check errors. My problem is there are occasional connection problems or other check errors (resulting in UNKNOWN) that resolve themselves on the next check, and I only wish to be notified of persistent check errors.

    Of course, because it's a logfile, I still need the WARNING and CRITICAL status checks to notify immediately on the first check - hence my predicament.

  4. #4
    Junior Member
    Join Date
    Sep 2011
    Posts
    27

    Re: Dynamic "max_check_attempts" based on state for log monitors

    I've implemented a work-around that gets the results I want, but it's not elegant.

    In my environment, the WARNING status isn't used since we don't need to distinguish severity levels. I have therefore rewritten my logfile plugin as follows:

    When a CRITICAL condition is detected, the plugin submits 2 passive check results with status WARNING but does not exit. These check results are not alerted, and serve only to "bump through" the soft states. After submitting the passive check results, the plugin (which is still running) sleeps for 5 seconds to allow the passive results to be processed, then exits normally with a CRITICAL status. Since the plugin's CRITICAL status is now the 3rd check result, it is considered a HARD state and is alerted immediately.
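
    The sequence above could be sketched roughly as follows. This is an illustration under assumptions, not the actual plugin: the helper names are made up, and `submit` stands for whatever transport delivers passive results in a given setup (it is supplied by the caller). The line format is the standard `PROCESS_SERVICE_CHECK_RESULT` external command.

```python
import time

WARNING, CRITICAL = 1, 2  # standard plugin return codes


def passive_result_line(host, service, code, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    ts = int(time.time() if now is None else now)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}"


def bump_then_exit_code(submit, host, service, settle_seconds=5, sleep=time.sleep):
    """Submit two passive WARNING results, wait, then return CRITICAL.

    `submit` is a hypothetical callable that delivers one command line to the
    monitoring server; how it does so is an assumption left to the caller.
    """
    for _ in range(2):
        submit(passive_result_line(host, service, WARNING, "bumping soft state"))
    # Give the server time to process the passive results before the
    # active result arrives; 5 seconds is the value used in the post.
    sleep(settle_seconds)
    return CRITICAL
```

    The plugin would then print its usual output and call `sys.exit()` with the returned code, so the CRITICAL arrives as the third consecutive non-OK result.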

    In this way, UNKNOWN results are still subject to the "max_check_attempts" restriction on notification, but CRITICAL status is notified immediately. Since I have no notifications attached to WARNING status, the passive check results will never cause notifications if the service is already in a hard state.

    Unfortunately, this method is not completely reliable; if the passive checks are not processed promptly, the CRITICAL status is processed first and becomes the soft state, so it is never notified. The ideal solution would still be a "max_check_attempts" that could be configured on a per-status basis.

  5. #5
    Shinken project leader
    Join Date
    May 2011
    Location
    Bordeaux (France)
    Posts
    2,131

    Re: Dynamic "max_check_attempts" based on state for log monitors

    I think such a case should be managed by the future "trigger" feature. Adding more complexity to max_check_attempts is not a good idea; it's already difficult for a lot of users to understand. But future triggers should be able to manage such a case (look at http://www.shinken-monitoring.org/wiki/triggers for the trigger draft).

    So for now your hack is the only way, but I hope one day we will have a better way to hack the internals.
    No direct support by personal message. Please open a thread so everyone can see the solution

  6. #6
    Junior Member
    Join Date
    Sep 2011
    Posts
    27

    Re: Dynamic "max_check_attempts" based on state for log monitors

    "Triggers" look like they will be a much simpler solution to my challenge, once they are available. Thanks!
