
Thread: Notification Problems

  1. #1

    Notification Problems

    Hello,
    We have a fully operational Shinken installation (v2.0.3) and are now setting up the notifications part.
    The problem we have encountered is that notifications (for both hosts and services) stop being sent after an undetermined number of them. Sometimes it happens at the fourth, sometimes at the sixth ... there is no visible pattern.

    There is no error in any log; the alerts are not raised and, as a result, the notifications are missing. The scheduler then begins to post this warning:
    Code:
    Warning : 1 action never came back for the satellite 'reactionner-master'. I reenable them for polling
    After waiting several hours, there were no more alerts or notifications for the monitored host.


    It is very easy to reproduce: a single host with short check and notification timings is enough. The check and notification values are these:

    Code:
    define host{
        use          generic-host
        host_name       dummy01
        address        192.168.122.246
    
        # Checking part
        check_command      check_host_alive
        max_check_attempts   2
        check_interval     1
    
        # Notification part
        contact_groups     C1
        notification_interval  3
        notification_period   24x7
        notification_options  d,u,r,f
        notifications_enabled  1
    
        # Check every time
        active_checks_enabled  1
        check_period      24x7
    }
    I think this must be the timeline:
    • T=0 First Detection, State Down (Soft)
    • T=1 Second Detection, State Down (Hard) + Notification 1
    • T=4 No host changes, Notification 2
    • T=7 No host changes, Notification 3
    • T=10 No host changes, Notification 4
    • T=13 No host changes, Notification 5
    • ...
    • T=n No host changes, Notification m
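
    The arithmetic behind this expected timeline can be sketched in a few lines of Python. This is only an illustration, assuming the first notification fires at the hard-state check (T=1) and then repeats every notification_interval minutes; the function name is hypothetical and is not part of Shinken.

```python
def expected_notification_times(first_hard_minute, notification_interval, count):
    """Return the minutes (T values) at which notifications 1..count are expected.

    Assumption: the first notification is sent when the host reaches the
    HARD state, and each following one is sent notification_interval
    minutes after the previous one while the state does not change.
    """
    return [first_hard_minute + k * notification_interval for k in range(count)]


# Values from the host definition above: hard state at T=1, interval of 3.
print(expected_notification_times(1, 3, 5))  # [1, 4, 7, 10, 13]
```

    With the values from the host definition, this gives the same T=1, 4, 7, 10, 13 sequence as the list above.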


    And these are the logs from the Shinken scheduler:
    Code:
    2014-11-03 14:16:55,774 [1415020615] HOST ALERT: dummy01;DOWN;SOFT;1;CRITICAL - 192.168.122.246: rta nan, lost 100%
    2014-11-03 14:17:55,893 [1415020675] HOST ALERT: dummy01;DOWN;HARD;2;CRITICAL - 192.168.122.246: rta nan, lost 100%
    2014-11-03 14:17:55,895 [1415020675] HOST NOTIFICATION: pedroC1;dummy01;DOWN;notify-host-by-email;CRITICAL - 192.168.122.246: rta nan, lost 100%
    2014-11-03 14:20:55,268 [1415020855] HOST NOTIFICATION: pedroC1;dummy01;DOWN;notify-host-by-email;CRITICAL - 192.168.122.246: rta nan, lost 100%
    2014-11-03 14:23:55,634 [1415021035] HOST NOTIFICATION: pedroC1;dummy01;DOWN;notify-host-by-email;CRITICAL - 192.168.122.246: rta nan, lost 100%
    2014-11-03 14:26:55,001 [1415021215] HOST NOTIFICATION: pedroC1;dummy01;DOWN;notify-host-by-email;CRITICAL - 192.168.122.246: rta nan, lost 100%
    
    2014-11-03 14:35:40,060 [1415021740] Warning : 1 actions never came back for the satellite 'reactionner-master'. I reenable them for polling
    ...
    ALWAYS THE PREVIOUS MESSAGE
    ...
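
    To make the gap in the log above easier to see, here is a small Python sketch that reads the epoch timestamps in square brackets, measures the spacing between notifications, and estimates how many were missed before the first warning. The helper names are hypothetical, and it assumes the scheduler log format is exactly as pasted above; it is a diagnostic aid, not a fix.

```python
import re

# Lines copied from the scheduler log above (notification and warning entries only).
LOG = """\
2014-11-03 14:17:55,895 [1415020675] HOST NOTIFICATION: pedroC1;dummy01;DOWN;notify-host-by-email;CRITICAL - 192.168.122.246: rta nan, lost 100%
2014-11-03 14:20:55,268 [1415020855] HOST NOTIFICATION: pedroC1;dummy01;DOWN;notify-host-by-email;CRITICAL - 192.168.122.246: rta nan, lost 100%
2014-11-03 14:23:55,634 [1415021035] HOST NOTIFICATION: pedroC1;dummy01;DOWN;notify-host-by-email;CRITICAL - 192.168.122.246: rta nan, lost 100%
2014-11-03 14:26:55,001 [1415021215] HOST NOTIFICATION: pedroC1;dummy01;DOWN;notify-host-by-email;CRITICAL - 192.168.122.246: rta nan, lost 100%
2014-11-03 14:35:40,060 [1415021740] Warning : 1 actions never came back for the satellite 'reactionner-master'. I reenable them for polling
"""


def epochs_for(log_text, marker):
    """Return the bracketed epoch of every log line whose message starts with marker."""
    pattern = re.compile(r"\[(\d+)\] " + re.escape(marker))
    return [int(m.group(1)) for m in pattern.finditer(log_text)]


def gaps(epochs):
    """Return the seconds elapsed between consecutive epochs."""
    return [later - earlier for earlier, later in zip(epochs, epochs[1:])]


notifications = epochs_for(LOG, "HOST NOTIFICATION:")
warnings = epochs_for(LOG, "Warning :")

interval_seconds = 3 * 60  # notification_interval 3, assuming the default 60 s interval_length
missed = (warnings[0] - notifications[-1]) // interval_seconds

print(gaps(notifications))  # [180, 180, 180]
print(missed)               # 2
```

    For this log, the four notifications are exactly 180 seconds apart, and two more (expected at 14:29:55 and 14:32:55) should have been sent before the first warning at 14:35:40.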


  2. #2

    Re: Notification Problems

    We have tested it with different values of "max_check_attempts" and "check_interval", and the soft and hard notifications (UP and DOWN) always work fine.

    The problem always appears with the notification interval. We have tested short and long intervals with the same result. The maximum number of notifications received so far is 8.






  3. #3

    Re: Notification Problems

    UPDATE (forgot to mention):
    We ran the tests with several retention configurations (pickle, mongo, none), with the same results.

  4. #4

  5. #5

    Re: Notification Problems

    UPDATE:
    Also tried with and without escalations (Shinken style), with the same results.

  6. #6

    Re: Notification Problems

    I think this should be moved to "Notification escalations" in the Advanced Topics.
