Thread: setting up HA for Shinken 2.4, having issues on failover

  1. #1
    Junior Member
    Join Date
    Jul 2015
    Posts
    5

    setting up HA for Shinken 2.4, having issues on failover

    I did not want to post a long message with a lot of config files unless necessary, so I will start with the symptoms.
    I have set up two Shinken 2.4 nodes (A and B). When I set up A as the master and B as the spare, it never fails over.
    BUT if I set up B as the master and A as the spare, failover occurs but failback does not.
    On server B, I see this in the arbiter-debug log:
    Code:
    [1437967451] DEBUG: [Shinken] HTTP: calling lock for have_conf
    [1437967454] DEBUG: [Shinken] Debug perf: ping [args:4.05311584473e-06] [aqu_lock:9.53674316406e-07][calling:9.53674316406e-07] [json:8.10623168945e-06] [global:1.4066696167e-05]
    [1437967456] DEBUG: [Shinken] HTTP: calling lock for put_conf
    [1437967575] DEBUG: [Shinken] Received message to not run. I am the Master, ignore and continue to run.
    [1437967575] DEBUG: [Shinken] Debug perf: do_not_run [args:4.05311584473e-06] [aqu_lock:9.53674316406e-07][calling:0.0307230949402] [json:1.59740447998e-05] [global:0.0307440757751]
    [1437967576] DEBUG: [Shinken] Debug perf: ping [args:4.05311584473e-06] [aqu_lock:0.0][calling:1.90734863281e-06] [json:8.10623168945e-06] [global:1.4066696167e-05]
    [1437967576] DEBUG: [Shinken] Debug perf: what_i_managed [args:2.86102294922e-06] [aqu_lock:9.53674316406e-07][calling:1.31130218506e-05] [json:1.69277191162e-05] [global:3.38554382324e-05]
    [1437967576] DEBUG: [Shinken] HTTP: calling lock for have_conf
    [1437967579] DEBUG: [Shinken] Debug perf: ping [args:3.81469726562e-06] [aqu_lock:1.19209289551e-06][calling:9.53674316406e-07] [json:7.86781311035e-06] [global:1.38282775879e-05]
    [1437967580] DEBUG: [Shinken] HTTP: calling lock for put_conf
    I have similar log messages when B is the spare, but I never see this kind of message on server A (spare or master).
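To compare what each node's arbiter actually receives, the debug log can be tallied by call name and by the role the arbiter reports for itself. Below is a rough Python sketch, assuming only the line format visible in the log above; `summarize`, `PERF_RE`, and `ROLE_RE` are names made up for this illustration and are not part of Shinken.

```python
import re
from collections import Counter

# Assumed line formats, taken from the log excerpt in this post:
#   [<epoch>] DEBUG: [Shinken] Debug perf: <call> [args:...] ...
#   [<epoch>] DEBUG: [Shinken] Received message to not run. I am the <role>, ...
PERF_RE = re.compile(r"^\[(\d+)\] DEBUG: \[Shinken\] Debug perf: (\w+) ")
ROLE_RE = re.compile(r"Received message to not run\. I am the (\w+)")


def summarize(lines):
    """Count the HTTP calls seen and the role reported on each do_not_run."""
    calls = Counter()
    roles = Counter()
    for raw in lines:
        line = raw.strip()
        perf = PERF_RE.match(line)
        if perf:
            calls[perf.group(2)] += 1
            continue
        role = ROLE_RE.search(line)
        if role:
            roles[role.group(1).lower()] += 1
    return calls, roles


# Four lines copied from the server B excerpt above.
SAMPLE = """
[1437967575] DEBUG: [Shinken] Received message to not run. I am the Master, ignore and continue to run.
[1437967575] DEBUG: [Shinken] Debug perf: do_not_run [args:4.05311584473e-06] [aqu_lock:9.53674316406e-07][calling:0.0307230949402] [json:1.59740447998e-05] [global:0.0307440757751]
[1437967576] DEBUG: [Shinken] Debug perf: ping [args:4.05311584473e-06] [aqu_lock:0.0][calling:1.90734863281e-06] [json:8.10623168945e-06] [global:1.4066696167e-05]
[1437967576] DEBUG: [Shinken] Debug perf: what_i_managed [args:2.86102294922e-06] [aqu_lock:9.53674316406e-07][calling:1.31130218506e-05] [json:1.69277191162e-05] [global:3.38554382324e-05]
"""

if __name__ == "__main__":
    calls, roles = summarize(SAMPLE.strip().splitlines())
    print("calls:", dict(calls))
    print("roles on do_not_run:", dict(roles))
```

To run it against a real file, pass an open file handle for the node's arbiter-debug log to `summarize` in place of the sample lines. Running it on both nodes and comparing the two tallies should show quickly whether one side never receives any calls at all.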

    I suspect file ownership or permissions, or that some package was missed on server B. Any recommendations? (Hoping somebody else has run into this issue.)
    Oh, CentOS 6.3 is the OS.

  2. #2
    Junior Member
    Join Date
    Jul 2015
    Posts
    5
    I ended up blowing away server A and creating server C, being careful to perform the same steps used to create server B. When B is master, it will fail over to C. But if I make C master, the arbiter-debug.log contains entry after entry of the following, and it DOES NOT fail over. It seems that it is NOT waiting for the timeout to occur:
    [1437973948] DEBUG: [Shinken] HTTP: calling lock for put_conf
    [1437973948] DEBUG: [Shinken] Debug perf: put_conf [args:0.0327141284943] [aqu_lock:9.3936920166e-05][calling:6.91413879395e-06] [json:1.00135803223e-05] [global:0.0328249931335]
    [1437973948] DEBUG: [Shinken] Received message to not run. I am the spare, stopping.
    [1437973948] DEBUG: [Shinken] Debug perf: do_not_run [args:2.86102294922e-06] [aqu_lock:0.0][calling:0.000116109848022] [json:7.86781311035e-06] [global:0.000126838684082]
    [1437973948] DEBUG: [Shinken] HTTP: calling lock for have_conf
    [1437973948] DEBUG: [Shinken] Debug perf: have_conf [args:9.20295715332e-05] [aqu_lock:0.000365972518921][calling:3.09944152832e-05] [json:3.50475311279e-05] [global:0.000524044036865]
    [1437973948] DEBUG: [Shinken] Received message to not run. I am the spare, stopping.
    [1437973948] DEBUG: [Shinken] Debug perf: do_not_run [args:1.21593475342e-05] [aqu_lock:1.90734863281e-06][calling:0.000355005264282] [json:3.50475311279e-05] [global:0.000404119491577]
    [1437973949] DEBUG: [Shinken] HTTP: calling lock for have_conf
    [1437973949] DEBUG: [Shinken] Debug perf: have_conf [args:9.98973846436e-05] [aqu_lock:0.000424146652222][calling:1.8835067749e-05] [json:4.00543212891e-05] [global:0.000582933425903]
    [1437973949] DEBUG: [Shinken] Received message to not run. I am the spare, stopping.
    [1437973949] DEBUG: [Shinken] Debug perf: do_not_run [args:1.21593475342e-05] [aqu_lock:1.90734863281e-06][calling:0.000394105911255] [json:3.38554382324e-05] [global:0.000442028045654]
    [1437973949] DEBUG: [Shinken] HTTP: calling lock for have_conf
    [1437973949] DEBUG: [Shinken] Debug perf: have_conf [args:8.79764556885e-05] [aqu_lock:0.000424146652222][calling:1.69277191162e-05] [json:3.60012054443e-05] [global:0.000565052032471]
    [1437973949] DEBUG: [Shinken] Debug perf: ping [args:1.21593475342e-05] [aqu_lock:1.90734863281e-06][calling:3.09944152832e-06] [json:1.8835067749e-05] [global:3.60012054443e-05]
    [1437973949] DEBUG: [Shinken] HTTP: calling lock for put_conf
    [1437973949] DEBUG: [Shinken] Debug perf: put_conf [args:0.0344560146332] [aqu_lock:0.000123977661133][calling:6.91413879395e-06] [json:1.09672546387e-05] [global:0.0345978736877]
    [1437973949] DEBUG: [Shinken] Received message to not run. I am the spare, stopping.
    [1437973949] DEBUG: [Shinken] Debug perf: do_not_run [args:3.09944152832e-06] [aqu_lock:9.53674316406e-07][calling:0.000197887420654] [json:1.12056732178e-05] [global:0.000213146209717]
    When B is master, though, C's log file does show that it is waiting for the timeout. Very strange behavior. Is there something left over in a message queue or a file someplace that is not getting cleared when I change the .cfg files to switch "master" and "spare"? Is there something I need to run in addition to clear up the issue? (When I switch back to B being master, failover works again.)
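To put a number on the "not waiting for the timeout" impression, the gaps between consecutive "Received message to not run" lines can be measured from their timestamps. This is a minimal Python sketch, assuming the leading bracketed value is an epoch timestamp in seconds, as it appears to be in the excerpt above; `do_not_run_gaps` and `STOP_RE` are hypothetical names used only for this example.

```python
import re

# Assumed format: [<epoch seconds>] DEBUG: [Shinken] Received message to not run. ...
STOP_RE = re.compile(r"^\[(\d+)\] DEBUG: \[Shinken\] Received message to not run\.")


def do_not_run_gaps(lines):
    """Return the seconds elapsed between consecutive do_not_run messages."""
    stamps = []
    for raw in lines:
        match = STOP_RE.match(raw.strip())
        if match:
            stamps.append(int(match.group(1)))
    # Pair each timestamp with the next one and take the difference.
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# The four "not run" lines copied from the excerpt in this post.
SAMPLE = """
[1437973948] DEBUG: [Shinken] Received message to not run. I am the spare, stopping.
[1437973948] DEBUG: [Shinken] Received message to not run. I am the spare, stopping.
[1437973949] DEBUG: [Shinken] Received message to not run. I am the spare, stopping.
[1437973949] DEBUG: [Shinken] Received message to not run. I am the spare, stopping.
"""

if __name__ == "__main__":
    print(do_not_run_gaps(SAMPLE.strip().splitlines()))
```

Gaps of zero or one second, as in this excerpt, mean the message arrives several times within the same second or two, which fits the observation that no timeout is being waited out.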
    Last edited by LanceMurray; 07-27-2015 at 06:43 AM.

  3. #3
    Junior Member
    Join Date
    Jul 2015
    Posts
    5
    Re-installed again, and that seems to have resolved the issue. I have not been able to replicate it since. Please go ahead and mark this as closed.
