
Thread: 2 datacenters and distributed monitoring

  1. #11

    Re: 2 datacenters and distributed monitoring

    OK, I just disabled the spare arbiter on node 2.

    The arbiters shut down correctly and I no longer get the errors, only the following warnings:

    2013-01-02 16:29:52,138 [1357140592] Warning : Missing satellite poller for configuration 0:
    2013-01-02 16:29:52,138 [1357140592] Warning : Missing satellite poller for configuration 1:

    Are these warnings significant?

    I'm about to read what you added, xkilian. Thanks.

  2. #12

    Re: 2 datacenters and distributed monitoring

    Well, now I'm trying to understand how poller_tags works.

    I configured Shinken as follows (this is an excerpt with just the scheduler and poller configuration):
    Code:
    define poller {
     poller_name poll01
     realm PMU
     poller_tags siteA
     data_timeout 120
     timeout 3
     address server1
     modules
     port 7771
     manage_sub_realms 0
     max_workers 0
     check_interval 60
     polling_interval 1
     max_check_attempts 3
     min_workers 0
     processes_by_worker 256
    }
    
    define scheduler {
     data_timeout 120
     timeout 3
     weight 1
     skip_initial_broks 0
     modules ,PickleRetention
     spare 0
     check_interval 60
     address server1
     scheduler_name sched01
     realm PMU
     max_check_attempts 3
     port 7768
    }
    
    define poller {
     poller_name poll02
     realm PMU
     poller_tags siteB
     data_timeout 120
     timeout 3
     address server2
     modules
     port 7771
     manage_sub_realms 0
     max_workers 0
     check_interval 60
     polling_interval 1
     max_check_attempts 3
     min_workers 0
     processes_by_worker 256
    }
    
    define scheduler {
     data_timeout 120
     timeout 3
     weight 1
     skip_initial_broks 0
     modules ,PickleRetention
     spare 0
     check_interval 60
     address server2
     scheduler_name sched02
     realm PMU
     max_check_attempts 3
     port 7768
    }
    
    define broker {
     broker_name brok01-spare
     realm PMU
     modules Livestatus, Simple-log, NPCDMOD, WebUI
     manage_arbiters 1
     manage_sub_realms 0
     spare 1
     check_interval 60
     address server2
     port 7772
    }
    I configured my hosts as follows. The hosts located at siteA should be monitored by the poller at the same location.
    Code:
    define host{
        use           linux,ssh,http
        poller_tag       siteA
        contact_groups     admins
        host_name        server1
        address         server1
        icon_set        server
        }
    But when I shut down server1 (in order to simulate an outage), I noticed that the list of hosts still contains hosts from each poller_tag. 50% of the hosts have disappeared.
    I was expecting to see only the hosts from siteB...

    Maybe I misunderstood this feature too.

  3. #13
    Administrator
    Join Date
    Dec 2011
    Posts
    278

    Re: 2 datacenters and distributed monitoring

    List of hosts where? In the WebUI?

    The WebUI queries the broker, which gets the configuration for all hosts and services of a given realm (and possibly its sub-realms).
    So it is normal that hosts and services supervised by the poller with the siteA tag still show up; their state will move to unknown because they are no longer polled.

    Within siteA you should have two pollers defined.

    For larger installations you should have two poller daemons on one server and two on the other server, with one of the two daemons on a given server acting as a spare for the active poller daemon on the other server. Pollers and schedulers should be on physical servers if you have a lot of hosts/services to collect. You can read the large installation article on the documentation wiki, which gives best practices.
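
    As a toy illustration of the behaviour described above: a host is only polled while some live poller carries its poller_tag, yet it stays in the list the broker serves. This is not Shinken's dispatcher code; the class names, function names, the host names hostA1 and hostB1, and the extra poller poll01-bis are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Poller:
    name: str
    poller_tags: frozenset
    alive: bool = True


@dataclass(frozen=True)
class Host:
    name: str
    poller_tag: str


def live_pollers_for(host, pollers):
    """Names of the live pollers whose tags include the host's poller_tag."""
    return [p.name for p in pollers if p.alive and host.poller_tag in p.poller_tags]


def realm_view(hosts, pollers):
    """Every host of the realm stays listed; without a live matching poller it is just not polled."""
    return {
        h.name: "polled" if live_pollers_for(h, pollers) else "not polled"
        for h in hosts
    }


hosts = [Host("hostA1", "siteA"), Host("hostB1", "siteB")]

# server1 is down, so poll01 (the only siteA poller) is gone.
one_poller_per_site = [
    Poller("poll01", frozenset({"siteA"}), alive=False),
    Poller("poll02", frozenset({"siteB"})),
]
print(realm_view(hosts, one_poller_per_site))

# Hypothetical second siteA poller running on another machine.
two_site_a_pollers = one_poller_per_site + [Poller("poll01-bis", frozenset({"siteA"}))]
print(realm_view(hosts, two_site_a_pollers))
```

    In the first case hostA1 is still listed but nobody polls it, which matches what the WebUI shows after server1 goes down. In the second case the extra siteA poller keeps it polled.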

    Cheers,

    xkilian

  4. #14

    Re: 2 datacenters and distributed monitoring

    Hello xkilian,

    Sorry, let me give you more details.

    Following my tests, please find a quick and dirty diagram in the attached files.

    The architecture seems really simple, but I have 2 issues to address:

    1 - I can't get my spare broker working. For an unknown reason, when I stop the active broker, the spare doesn't take over its role.
    Livestatus should start on node2, but nothing happens.

    2 - I would like to get my spare arbiter working, but when I configure it on node2 I get the following error.

    Code:
    2013-01-02 16:08:37,442 [1357139317] Critical : Exception trace follows: Traceback (most recent call last):
     File "/usr/local/shinken/shinken/daemons/arbiterdaemon.py", line 553, in main
      self.do_mainloop()
     File "/usr/local/shinken/shinken/daemon.py", line 244, in do_mainloop
      self.do_loop_turn()
     File "/usr/local/shinken/shinken/daemons/arbiterdaemon.py", line 587, in do_loop_turn
      self.run()
     File "/usr/local/shinken/shinken/daemons/arbiterdaemon.py", line 685, in run
      self.dispatcher.check_dispatch()
     File "/usr/local/shinken/shinken/dispatcher.py", line 146, in check_dispatch
      arb.put_conf(self.conf.whole_conf_pack)
     File "/usr/local/shinken/shinken/satellitelink.py", line 133, in put_conf
      self.con.put_conf(conf)
     File "/usr/lib/python2.6/site-packages/Pyro/core.py", line 384, in __call__
      return self.__send(self.__name, args, kwargs)
     File "/usr/lib/python2.6/site-packages/Pyro/core.py", line 459, in _invokePYRO
      return self.adapter.remoteInvocation(name, Pyro.constants.RIF_VarargsAndKeywords, vargs, kargs)
     File "/usr/lib/python2.6/site-packages/Pyro/protocol.py", line 440, in remoteInvocation
      return self._remoteInvocation(method, flags, *args)
     File "/usr/lib/python2.6/site-packages/Pyro/protocol.py", line 501, in _remoteInvocation
      answer.raiseEx()
     File "/usr/lib/python2.6/site-packages/Pyro/errors.py", line 73, in raiseEx
      raise self.excObj
    NameError: global name 'cPickle' is not defined
    I attached my current configuration. Could you please have a look at it?

    Thanks

  5. #15

    Re: 2 datacenters and distributed monitoring

    Hi,

    For point 1, I don't know why, but when I came back a few minutes later the test worked. I repeated it several times without a problem.
    Maybe it needs a long time to sync?

    I still need a spare arbiter on the other side.

    See ya

  6. #16

    Re: 2 datacenters and distributed monitoring

    Point 1 is still an issue: I tried again this morning and didn't manage to get the spare broker working.

    None of the modules is started.
    I tried configuring just one module, Simple-log, but that didn't work either.

    [server2]/usr/local/shinken/var # tail -1 /usr/local/shinken/var/nagios.log
    [1357229570] SERVICE ALERT: xxx;UNKNOWN;SOFT;1;(Service Check Timed Out)
    [server2]/usr/local/shinken/var # date -d@1357229570
    Thu Jan 3 17:12:50 CET 2013
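
    The same epoch-to-date conversion can be done in Python, which is handy when several log lines have to be compared. A small sketch: the function name log_line_time is made up, and the fixed +01:00 offset assumes winter time (CET), as in the date output above.

```python
import re
from datetime import datetime, timedelta, timezone

# Fixed offset for CET; this does not handle daylight saving time.
CET = timezone(timedelta(hours=1), "CET")


def log_line_time(line, tz=CET):
    """Return the time of a '[epoch] ...' log line as a timezone-aware datetime."""
    match = re.match(r"\[(\d+)\]", line)
    if match is None:
        raise ValueError("no leading [epoch] field in line")
    return datetime.fromtimestamp(int(match.group(1)), tz)


line = "[1357229570] SERVICE ALERT: xxx;UNKNOWN;SOFT;1;(Service Check Timed Out)"
when = log_line_time(line)
print(when.isoformat())
```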

  7. #17

    Re: 2 datacenters and distributed monitoring

    In brokerd.log (server2), I have nothing more than the following line:
    2013-01-04 11:39:43,692 [1357295983] Info : Waiting for initial configuration
    Would it be useful to increase the log level to DEBUG?

    Maybe it needs the arbiter daemon?
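
    One basic thing that could be ruled out from the arbiter's host is whether the spare broker's port accepts TCP connections at all. A minimal sketch using only the Python standard library; the function name tcp_reachable is made up, and the check against a local throwaway listener is only there so the example runs anywhere.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Self-check against a throwaway local listener.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
free_port = listener.getsockname()[1]
open_result = tcp_reachable("127.0.0.1", free_port)
listener.close()
closed_result = tcp_reachable("127.0.0.1", free_port)
print(open_result, closed_result)
```

    On the arbiter's host the call would be tcp_reachable("server2", 7772), with the address and port taken from the broker definition earlier in the thread. A True result only shows that the port is open; it says nothing about whether a configuration is actually being sent.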

  8. #18

    Re: 2 datacenters and distributed monitoring


    Regarding the arbiter issue:
    I saw this post (in French): http://forums.monitoring-fr.org/index.php?topic=6024.0

    Maybe I should roll back to 1.2... I'll try that.

  9. #19

    Re: 2 datacenters and distributed monitoring

    Hello,

    You can add the "import cPickle" statement that is missing.

    @Nap, can you let him know where in the code he must add it?
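
    For reference, a NameError like this one means that the module calling cPickle never imported it. A minimal sketch of the pattern, not the actual Shinken code: receive_conf is a hypothetical stand-in, and which Shinken file needs the import is not stated here. The fallback to pickle is only there so the sketch also runs on Python 3, where cPickle no longer exists.

```python
# Python 2 ships cPickle; on Python 3 the same API lives in pickle.
try:
    import cPickle
except ImportError:
    import pickle as cPickle


def receive_conf(conf_bytes):
    """Hypothetical stand-in for a daemon method that unpickles a pushed configuration.

    Without the import at the top of the module, this call would fail with
    NameError: global name 'cPickle' is not defined (Python 2 wording).
    """
    return cPickle.loads(conf_bytes)


packed = cPickle.dumps({"realm": "PMU"})
restored = receive_conf(packed)
print(restored)
```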

    Thank you,

    xkilian

  10. #20

    Re: 2 datacenters and distributed monitoring

    Hello xkilian

    Thank you again. Regarding arbiter HA, I'll wait for Nap's answer.
    Do you have any idea about the spare broker? What can I do to investigate the issue?

    Sam
