Hello,
We have a fully operational Shinken installation (v2.0.3) and we are setting up the notifications part.
The problem we have encountered is that notifications (for hosts and services) get lost after an undetermined number of them have been sent. Sometimes it happens at the fourth, sometimes at the sixth ... there is no visible pattern.

There is no error in any log; the alerts are simply not raised, and as a result the notifications are missing. The scheduler then begins to post this warning:
Code:
Warning : 1 action never came back for the satellite 'reactionner-master'. I reenable them for polling
After several hours of waiting, there were no more alerts or notifications for the monitored host.


It is very easy to reproduce: a single host with short check and notification intervals is enough. These are the check and notification values:

Code:
define host{
    use          generic-host
    host_name       dummy01
    address        192.168.122.246

    # Checking part
    check_command      check_host_alive
    max_check_attempts   2
    check_interval     1

    # Notification part
    contact_groups     C1
    notification_interval  3
    notification_period   24x7
    notification_options  d,u,r,f
    notifications_enabled  1

    # Check every time
    active_checks_enabled  1
    check_period      24x7
}
I think this must be the timeline:
  • T=0 First Detection, State Down (Soft)
  • T=1 Second Detection, State Down (Hard) + Notification 1
  • T=4 No host changes, Notification 2
  • T=7 No host changes, Notification 3
  • T=10 No host changes, Notification 4
  • T=13 No host changes, Notification 5
  • ...
  • T=n No host changes, Notification m
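
The timeline above is plain arithmetic on the three timing values, so it can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes that retries are spaced at check_interval (no retry_interval is set in the host definition shown, and one could be inherited from generic-host). Times are in interval units, which the logs suggest are one minute here.

```python
def expected_notification_times(check_interval, max_check_attempts,
                                notification_interval, count):
    """Return the times (in interval units, T=0 being the first failed
    check) at which the first `count` notifications would be expected."""
    # The HARD state is reached on the last of max_check_attempts failed
    # checks. Assumption: retries are spaced at check_interval.
    first = (max_check_attempts - 1) * check_interval
    # After that, one notification every notification_interval while the
    # host state does not change.
    return [first + i * notification_interval for i in range(count)]


# Values taken from the host definition for dummy01.
times = expected_notification_times(check_interval=1, max_check_attempts=2,
                                    notification_interval=3, count=5)
print(times)  # [1, 4, 7, 10, 13]
```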


And these are the logs from the Shinken scheduler:
Code:
2014-11-03 14:16:55,774 [1415020615] HOST ALERT: dummy01;DOWN;SOFT;1;CRITICAL - 192.168.122.246: rta nan, lost 100%
2014-11-03 14:17:55,893 [1415020675] HOST ALERT: dummy01;DOWN;HARD;2;CRITICAL - 192.168.122.246: rta nan, lost 100%
2014-11-03 14:17:55,895 [1415020675] HOST NOTIFICATION: pedroC1;dummy01;DOWN;notify-host-by-email;CRITICAL - 192.168.122.246: rta nan, lost 100%
2014-11-03 14:20:55,268 [1415020855] HOST NOTIFICATION: pedroC1;dummy01;DOWN;notify-host-by-email;CRITICAL - 192.168.122.246: rta nan, lost 100%
2014-11-03 14:23:55,634 [1415021035] HOST NOTIFICATION: pedroC1;dummy01;DOWN;notify-host-by-email;CRITICAL - 192.168.122.246: rta nan, lost 100%
2014-11-03 14:26:55,001 [1415021215] HOST NOTIFICATION: pedroC1;dummy01;DOWN;notify-host-by-email;CRITICAL - 192.168.122.246: rta nan, lost 100%

2014-11-03 14:35:40,060 [1415021740] Warning : 1 actions never came back for the satellite 'reactionner-master'. I reenable them for polling
...
(from here on, only the previous warning is repeated)
...
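
To see where the notifications stop, the gaps between consecutive HOST NOTIFICATION entries can be computed from the epoch value in square brackets. The following is a minimal sketch, not a Shinken tool: the function and variable names are made up for illustration, and it assumes the log lines keep the format shown above.

```python
import re

# Matches the epoch in brackets on HOST NOTIFICATION lines only.
NOTIF_RE = re.compile(r"\[(\d+)\] HOST NOTIFICATION: ")


def notification_gaps(lines):
    """Return the gaps, in seconds, between consecutive host notifications."""
    epochs = [int(m.group(1)) for m in map(NOTIF_RE.search, lines) if m]
    return [later - earlier for earlier, later in zip(epochs, epochs[1:])]


# Abbreviated lines from the scheduler log above.
sample = [
    "2014-11-03 14:17:55,893 [1415020675] HOST ALERT: dummy01;DOWN;HARD;2;CRITICAL",
    "2014-11-03 14:17:55,895 [1415020675] HOST NOTIFICATION: pedroC1;dummy01;DOWN;notify-host-by-email;CRITICAL",
    "2014-11-03 14:20:55,268 [1415020855] HOST NOTIFICATION: pedroC1;dummy01;DOWN;notify-host-by-email;CRITICAL",
    "2014-11-03 14:23:55,634 [1415021035] HOST NOTIFICATION: pedroC1;dummy01;DOWN;notify-host-by-email;CRITICAL",
    "2014-11-03 14:26:55,001 [1415021215] HOST NOTIFICATION: pedroC1;dummy01;DOWN;notify-host-by-email;CRITICAL",
]
print(notification_gaps(sample))  # [180, 180, 180]
```

With notification_interval 3, every gap should be 180 seconds. In the log above the last notification is at 14:26:55 and nothing follows until the warning at 14:35:40, so the notifications expected at 14:29:55 and 14:32:55 never appear.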