Thread: Arbiter pinging other Shinken daemons results in a 2-minute timeout for each daemon

  1. #1
    Junior Member
    Join Date
    Dec 2013
    Posts
    6

    Hi there,

    I'm running Shinken 1.4.1 and use two Shinken masters for HA.
    One master (FOO) is usually the active one and BAR is the spare, waiting for the active one to die.

    I set the timeout for each Shinken master daemon to 3 seconds. With 3 attempts this should declare the daemon dead and unreachable in 9 seconds:
    Code:
    define scheduler {
     data_timeout 3
     check_interval 30
     weight 2
     skip_initial_broks 0
     modules RedisRetention_bs
     spare 0
     timeout 3
     address FOO
     scheduler_name scheduler-FOO
     max_check_attempts 3
     realm All
     port 7768
    }
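    The 9-second figure is simply timeout multiplied by max_check_attempts. A minimal sketch of that arithmetic, assuming the attempts run back to back with no waiting in between (that is the expectation stated here, not confirmed Shinken behavior). The helper parse_define_block is hypothetical and is not part of Shinken:

```python
def parse_define_block(text):
    """Parse 'key value' lines of a Shinken-style define block into a dict.

    Hypothetical helper for illustration only; not Shinken's own parser.
    """
    settings = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        # Skip the 'define ... {' header and the closing '}' line.
        if len(parts) == 2 and parts[0] != "define":
            settings[parts[0]] = parts[1].strip()
    return settings


# Only the settings relevant to the timing question, copied from the config above.
CONFIG = """\
define scheduler {
 check_interval 30
 timeout 3
 max_check_attempts 3
}
"""

settings = parse_define_block(CONFIG)

# Assumption: attempts are retried immediately, one after another.
naive_seconds = int(settings["timeout"]) * int(settings["max_check_attempts"])
print(naive_seconds)  # 9
```

    Note that check_interval is also present in the block; this naive estimate ignores it.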
    This is what happens when I reboot the active master:
    • Quite quickly the spare one detects that the other master is dead and tries to dispatch its config
    • This results in a sequential pinging of each Shinken daemon found in the shinken-specific.cfg, which includes the previous master FOO!
    • Now each attempt to reach the dead master results in a 1 minute timeout times two for some reason, see logfile below


    Code:
    2014-06-23 15:54:59,442 [1403531699] Info :  Arbiter Master is dead. The arbiter Arbiter-Master-itinfra-mon-bap01 take the lead
    2014-06-23 15:54:59,442 [1403531699] Info :  Begin to dispatch configurations to satellites
    2014-06-23 15:54:59,442 [1403531699] Info :  Pinging scheduler-FOO
    2014-06-23 15:54:59,444 [1403531699] Info :   (PYROLOC://FOO:7768/ForArbiter)
    2014-06-23 15:56:02,540 [1403531762] Warning : Add failed attempt to scheduler-FOO (1/3) connection failed
    2014-06-23 15:57:05,645 [1403531825] Info :  Pinging scheduler-satellite2
    2014-06-23 15:57:05,647 [1403531825] Info :   (PYROLOC://satellite2:7768/ForArbiter)
    
    2014-06-23 15:57:07,216 [1403531827] Info :  Pinging reactionner-FOO
    2014-06-23 15:57:07,217 [1403531827] Info :   (PYROLOC://FOO:7769/ForArbiter)
    2014-06-23 15:58:10,348 [1403531890] Warning : Add failed attempt to reactionner-FOO (1/3) connection failed
    2014-06-23 15:59:13,453 [1403531953] Info :  Pinging reactionner-BAR
    2014-06-23 15:59:13,453 [1403531953] Info :   (PYROLOC://BAR:7769/ForArbiter)
    2014-06-23 15:59:13,455 [1403531953] Info :  Pinging poller-FOO
    2014-06-23 15:59:13,456 [1403531953] Info :   (PYROLOC://FOO:7771/ForArbiter)
    2014-06-23 16:00:16,556 [1403532016] Warning : Add failed attempt to poller-FOO (1/3) connection failed
    2014-06-23 16:01:19,661 [1403532079] Info :  Pinging poller-satellite2
    This goes on for each Shinken daemon on the dead master, which results in a long, unnecessary downtime for the monitoring service.
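
    To put numbers on the delays, here is a small sketch that reads the epoch timestamps in square brackets from the scheduler-FOO lines of the log and prints the gaps between them. The function name gaps is just for illustration:

```python
import re

# The three scheduler-FOO related lines, copied from the log in this post.
LOG = """\
2014-06-23 15:54:59,444 [1403531699] Info :   (PYROLOC://FOO:7768/ForArbiter)
2014-06-23 15:56:02,540 [1403531762] Warning : Add failed attempt to scheduler-FOO (1/3) connection failed
2014-06-23 15:57:05,645 [1403531825] Info :  Pinging scheduler-satellite2
"""

# The arbiter writes the Unix epoch time in square brackets on every line.
EPOCH = re.compile(r"\[(\d+)\]")


def gaps(log_text):
    """Return the seconds elapsed between consecutive log lines."""
    stamps = [int(m.group(1)) for m in EPOCH.finditer(log_text)]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


print(gaps(LOG))  # [63, 63]
```

    So each dead daemon costs two waits of about 63 seconds, roughly 126 seconds in total, before the arbiter moves on to the next daemon.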

    Is this 1 minute downtime a generic Python timeout or did I miss a config setting somewhere?

  2. #2
    Shinken project leader
    Join Date
    May 2011
    Location
    Bordeaux (France)
    Posts
    2,131

    The interval between checks is set by check_interval, which is 30 (in seconds) in your config.
    No direct support by personal message. Please open a thread so everyone can see the solution

  3. #3
    Junior Member
    Join Date
    Dec 2013
    Posts
    6

    Ah, yes, thanks! Didn't see that.
