
Thread: Discovering interfaces to monitor on a router/switch

  1. #11

    Re: Discovering interfaces to monitor on a router/switch

    Master:
    var/nagios.log:[1350317971] Info : [SnmpBooster] Initialization of the SNMP Booster 1.0
    var/nagios.log:[1350317971] Info : I correctly loaded the modules: [SnmpBooster]

    [1350318244] Info : [SnmpBooster] Get a snmp poller module for plugin SnmpBooster
    [1350318244] Info : Get a Named pipe module for plugin NamedPipe-Autogenerated
    [1350318244] Info : Trying to init module: PickleRetentionArbiter
    [1350318244] Info : Trying to init module: SnmpBooster
    [1350318244] Info : [SnmpBooster] Initialization of the SNMP Booster 1.0
    [1350318244] Info : I correctly loaded the modules: [PickleRetentionArbiter,SnmpBooster,NamedPipe-Autogenerated]


    var/schedulerd.log:2012-10-15 16:18:31,688 [1350317911] Info : [SnmpBooster] Initialization of the SNMP Booster 1.0
    var/schedulerd.log:2012-10-15 16:18:31,714 [1350317911] Info : I correctly loaded the modules: [PickleRetention,SnmpBooster]



    Remote probe:
    pollerd.log:2012-10-15 17:19:31,789 [1350317971] Info : I correctly loaded the modules: [SnmpBooster]
    pollerd.log:2012-10-15 17:19:32,815 [1350317972] Info : [SnmpBooster] Module SNMP Booster started!




    I couldn't find the equivalent message in arbiterd.log, though.
    I can't find any error messages from the recent restarts.
    The two servers are in different timezones, but their clocks agree in UTC.
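
The bracketed numbers in the log lines above are Unix timestamps, so they can be compared directly even when the wall-clock prefixes come from different timezones. A minimal Python sketch (the helper name `epoch_to_utc` is only illustrative):

```python
from datetime import datetime, timezone


def epoch_to_utc(ts):
    """Render a Unix timestamp as a UTC wall-clock string."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


# Bracketed epoch values copied from the schedulerd.log and pollerd.log lines above.
master_scheduler = epoch_to_utc(1350317911)
remote_poller = epoch_to_utc(1350317971)

print(master_scheduler)  # 2012-10-15 16:18:31
print(remote_poller)     # 2012-10-15 16:19:31
```

The master's 16:18:31 prefix matches UTC, while the remote probe's 17:19:31 prefix is one hour ahead of the UTC value of its own timestamp. That is consistent with a timezone offset rather than clock drift.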

  2. #12
    Administrator
    Join Date
    Dec 2011
    Posts
    278

    Re: Discovering interfaces to monitor on a router/switch

    Let's take it from the top:

    shinken-specific.cfg

    Have you placed the Defaults_unified.ini file in your shinken/etc directory?
    Have you correctly declared the path to it in your SnmpBooster module definition? <----- Probably this.
    Same for the memcached server and TCP port. You would get an error otherwise.

    Are your host and service definition files in a Shinken cfg directory? See nagios.cfg.
    Are you running at least Shinken 1.2 or Shinken git master? (It is stable, as it is about to become 1.2.1.)

    You have copied an SNMP template that has the commands and various options that the snmp poller module expects. If you had bad arguments the poller module would spit out an error.

    Please post a copy of your shinken-specific.cfg (arbiter, poller, scheduler and SnmpBooster module definitions), your SNMP template, and an excerpt of your host and service definitions.

    If you had a configuration error, you would have the SnmpBooster Arbiter module complaining about missing DS_TEMPLATES, or unknown OIDs or TRIGGER_TEMPLATES, etc.

    Which is why I suspect it is something to do with the module configuration in shinken-specific.cfg.

    Make sure you have no other import errors or warnings.

    Cheers,

    xkilian

  3. #13

    Re: Discovering interfaces to monitor on a router/switch

    Hi there,

    I had little time to debug it today.

    What I just discovered was that the poller started complaining about too many open files...

    2012-10-16 08:00:01,434 [1350370801] Error : Fail launching command: /usr/local/shinken/libexec/check_snmp.py -H (ip) -C public -V 2c -t _SERVICEDSTEMPLATE$ -i map(interface-name,ae1.1001) -T None [Errno 24] Too many open files True
    2012-10-16 08:00:01,504 [1350370801] Warning : [0] I DIE because I cannot do my job as I should (too many open files?)... forgot me please.
    It was also using the SNMP community public, which is not allowed on the monitored equipment :-[

    I had $SNMPCOMMUNITYREAD$ listed twice in resources.cfg; that is probably why it never worked. :-X
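
A duplicated macro like this is easy to miss by eye. The following is a rough Python sketch that lists macros defined more than once, assuming the resource file uses `$NAME$=value` lines. The sample values are made up for illustration, and the thread does not say which of two definitions takes effect.

```python
import re
from collections import defaultdict

# Matches lines such as "$SNMPCOMMUNITYREAD$=public".
MACRO_LINE = re.compile(r"^\s*(\$[A-Za-z0-9_]+\$)\s*=\s*(.*)$")


def find_duplicate_macros(text):
    """Return {macro: [(line_no, value), ...]} for macros defined more than once."""
    seen = defaultdict(list)
    for line_no, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#") or stripped.startswith(";"):
            continue
        match = MACRO_LINE.match(line)
        if match:
            seen[match.group(1)].append((line_no, match.group(2).strip()))
    return {name: defs for name, defs in seen.items() if len(defs) > 1}


# Hypothetical file content, not taken from the thread.
sample = """\
$PLUGINSDIR$=/usr/local/shinken/libexec
$SNMPCOMMUNITYREAD$=public
$SNMPCOMMUNITYREAD$=example-community
"""

duplicates = find_duplicate_macros(sample)
for name, definitions in duplicates.items():
    print(name, definitions)
```

To check a real file, pass its contents to `find_duplicate_macros` instead of the sample string.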

    I've fixed it and will keep you posted.

    Thanks

  4. #14
    Shinken project leader
    Join Date
    May 2011
    Location
    Bordeaux (France)
    Posts
    2,131

    Re: Discovering interfaces to monitor on a router/switch

    Did you get any zombie processes? If your ulimit is too low, you can play with min_workers, processes_by_worker and the ulimit command for your poller.
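
For the "Too many open files" error, it can help to see the actual descriptor limit. Here is a small Python sketch using the standard `resource` module (Unix only). It reports the limit for the process that runs it, so it would have to run as the same user and from the same environment as the poller to be meaningful. The `/proc/self/fd` count is Linux-specific and both helper names are only illustrative.

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count():
    """Count descriptors currently open in this process, or None if /proc is unavailable."""
    fd_dir = "/proc/self/fd"
    if not os.path.isdir(fd_dir):
        return None
    return len(os.listdir(fd_dir))


soft, hard = open_file_limits()
print("soft limit:", soft)
print("hard limit:", hard)
print("open descriptors in this process:", open_fd_count())
```

A soft limit of 1024 is a common default and is easy to exhaust when a poller forks many checks at once.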
    No direct support by personal message. Please open a thread so everyone can see the solution

  5. #15

    Re: Discovering interfaces to monitor on a router/switch

    Naparuba: no zombies, at least not now. I had already restarted Shinken, though, so any information we could have gathered is lost. Sorry.

    -------------

    XKilian, sorry that I'm too lazy to post all configuration :P

    What I don't get from both logs is that the phase for the arbiter is None...

    2012-10-16 13:45:43,926 [1350395143] Debug : Add module object {'configuration_errors': [], 'use': '', 'hash': '', 'name': '', 'tags': set([]), 'modules': [], 'customs': {}, 'configuration_warnings': [], 'module_name': u'SnmpBooster', 'properties': {'daemons': ['poller', 'scheduler', 'arbiter'], 'phases': ['running', 'late_configuration', None], 'type': 'snmp_poller', 'external': False, 'worker_capable': True}, 'memcached_port': u'11211', 'memcached_host': u'(--redacted--), 'plus': {}, 'datasource_file': u'/usr/local/shinken/etc/Defaults_unified.ini', 'module_type': u'snmp_poller', 'id': 1, 'imported_from': u'/usr/local/shinken/etc/shinken-specific.cfg'}
    It is configured as:

    Code:
    define arbiter {
    modules PickleRetentionArbiter,SnmpBooster
    spare 0
    address localhost
    port 7770
    arbiter_name Arbiter-Master
    }
    The SnmpBooster is as:

    Code:
    define module {
       module_name     SnmpBooster
       module_type     snmp_poller
       datasource_file   /usr/local/shinken/etc/Defaults_unified.ini  ; MODIFY THE PATH TO MATCH YOUR INSTALLATION
       memcached_host    (--redacted--) ; SET THE IP ADDRESS OF YOUR memcached SERVER, DO NOT USE 127.0.0.1
       memcached_port    11211 ; default port for a memcached process
     
    }
    SnmpBooster is defined long before Arbiter... It's actually on top of the file.


    The hosts configuration is exactly as generated by genDevConfig.

    > > > > > >

    Then I realized I had a missing bracket in the definition.

    Code:
    define command {
      command_name  check_snmp_booster
      command_line  $PLUGINSDIR$/check_snmp.py -H $HOSTNAME$ -C $SNMPCOMMUNITYREAD$ -V 2c -t $ARG1$ -i $_SERVICEINST$ -T $_SERVICETRIGGERGROUP$
    }
    That last bracket was not there and no tool said anything.
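
Since nothing flagged the missing brace, a rough standalone check can help. This Python sketch only tracks `define ... {` and `}` lines and assumes one block per `define`, with the opening brace on the same line. It is not how Shinken itself parses configuration.

```python
def check_define_blocks(text):
    """Return a list of (line_no, message) for unbalanced define blocks."""
    problems = []
    open_line = None  # line number of the currently open "define ... {"
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.split(";", 1)[0].strip()  # drop trailing ';' comments
        if not line or line.startswith("#"):
            continue
        if line.startswith("define ") and line.endswith("{"):
            if open_line is not None:
                problems.append((open_line, "block opened here is never closed"))
            open_line = line_no
        elif line == "}":
            if open_line is None:
                problems.append((line_no, "closing brace without an open block"))
            open_line = None
    if open_line is not None:
        problems.append((open_line, "block opened here is never closed"))
    return problems


# Shortened example with the closing brace of the first block missing.
sample = """\
define command {
  command_name  check_snmp_booster
  command_line  $PLUGINSDIR$/check_snmp.py -H $HOSTNAME$

define service {
  name  default-snmp-template
}
"""

print(check_define_blocks(sample))
# [(1, 'block opened here is never closed')]
```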


    Still I don't see any progress on it.

    On the remote poller log I see

    Code:
    2012-10-16 15:40:25,008 [1350398425] Info :  [SnmpBooster] Get a snmp poller module for plugin SnmpBooster
    2012-10-16 15:40:25,008 [1350398425] Info :  Trying to init module: SnmpBooster
    2012-10-16 15:40:25,008 [1350398425] Info :  [SnmpBooster] Initialization of the SNMP Booster 1.0
    2012-10-16 15:40:25,317 [1350398425] Info :  I correctly loaded the modules: [SnmpBooster]
    2012-10-16 15:40:26,323 [1350398426] Info :  [poller-XXX1_1] Allocating new fork Worker: 0
    2012-10-16 15:40:26,335 [1350398426] Info :  [poller-XXX1_1] Allocating new snmp_poller Worker: 1
    2012-10-16 15:40:26,346 [1350398426] Info :  [SnmpBooster] Module SNMP Booster started!

  6. #16

    Re: Discovering interfaces to monitor on a router/switch

    Code:
    define service {
      name        default-snmp-template
        check_command   check_snmp_booster!_SERVICEDSTEMPLATE$!$_SERVICEINST$!$_SERVICETRIGGERGROUP$
        _inst          None
      _triggergroup      None
        max_check_attempts   3  
      check_interval     1  
      retry_interval     1  
        use           generic-host
      register        0  
    }
    
    define host{
        name          default-snmp-host-template
        alias          default-snmp-host
        check_command    check_host_alive
        max_check_attempts   3  
      check_interval     1  
      retry_interval     1  
      use           generic-host
      register        0  
    }
    Could this "_inst None" be relevant?

    Some hosts

    Code:
    define host {
      host_name    (--redacted--)
      display_name   (--redacted--)
      _sys_location  
      address   (--redacted--)
      hostgroups   
      notes   
      parents   
          contact_groups            +admins
          poller_tag  (--redacted--)
            check_command             check-host-alive
            check_period             workhours
            notification_period          workhours
      use     default-snmp-host-template
      register   1  
    }
    
    define service {
      host_name    (--redacted--)
      service_description chassis
      display_name   chassis.generic - 
      _display_order  999 
      _dstemplate   Generic-Device
      _inst    0  
      active_checks_enabled 0
      notes    (--redacted, long JunOS version name line--)
      use     default-snmp-template
      register   1  
    }

  7. #17
    Administrator
    Join Date
    Dec 2011
    Posts
    278

    Re: Discovering interfaces to monitor on a router/switch

    Hello openglx,

    You are missing ONE key piece of information:

    define command {
    command_name check_snmp_booster
    command_line check_snmp_booster -H $HOSTNAME$ -C $SNMPCOMMUNITYREAD$ -V 2c -t $ARG1$ -i $_SERVICEINST$ -T $_SERVICETRIGGERGROUP$
    module_type snmp_poller ; <-------This was missing
    }

    Otherwise, how is the poller to know that it should use the built-in module?

    Look at the name I put as the command line. There is no path, and the name can be anything.

    My fault, bad cut and paste in genDevConfig sample template.

    Very, very sorry about that. The wiki was ok, but the sample file was not.

    You should now be set.

    xkilian

  8. #18

    Re: Discovering interfaces to monitor on a router/switch

    Hey xkilian, thanks for it! ;D

    Got some progress, services are now UNKNOWN and not CRITICAL. That's very good, state changed finally.


    I still have this in schedulerd.log on the master:

    Code:
    ==> /usr/local/shinken/var/schedulerd.log <==
    2012-10-16 20:16:08,357 [1350418568] Error :  [SnmpBooster] Host not found: (--ip address--)
    And on the check_mk "output of plugin" info I see:

    Code:
    Host not found in memcache: `(--ip address--)'
    How can I confirm that something is being written into memcache ?
    I made no configuration on memcached, could it be something missing?

    I see traffic from the remote probe towards my master's memcached, with the IP of my host to be checked there...

    Code:
        0x0000: 4500 003c 2e98 4000 3506 531e 0ae2 571e E..<..@.5.S...W.
        0x0010: 0ae6 5720 a79a 2bcb 38cf 1d83 0000 0000 ..W...+.8.......
        0x0020: a002 16d0 9c14 0000 0204 05b4 0402 080a ................
        0x0030: 7442 331b 0000 0000 0103 0307      tB3.........
    And the answer from memcached:

    Code:
        0x0000: 4500 003c 0000 4000 4006 76b6 0ae6 5720 E..<..@.@.v...W.
        0x0010: 0ae2 571e 2bcb a79a f818 cee2 38cf 1d84 ..W.+.......8...
        0x0020: a012 3890 28fc 0000 0204 05b4 0402 080a ..8.(...........
        0x0030: 6d3d 1d0e 7442 331b 0103 0307      m=..tB3.....
    Kinda clueless here.

    Once more, thank you!

  9. #19
    Administrator
    Join Date
    Dec 2011
    Posts
    278

    Re: Discovering interfaces to monitor on a router/switch

    Once again my template was bad.

    There is a missing $ before _SERVICEDSTEMPLATE$ in the check_command.

    I am not sure what it was doing to the command, but it can't be good.

    check_snmp_booster!$_SERVICEDSTEMPLATE$!$_SERVICEINST$!$_SERVICETRIGGERGROUP$


    That should get you fixed up.
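
A missing `$` like this can be caught mechanically. As a minimal Python sketch, assuming macros are always written `$NAME$`, any `!`-separated argument with an odd number of `$` signs is suspect. The function name is only illustrative.

```python
def unbalanced_macro_args(check_command):
    """Return the '!'-separated parts that contain an odd number of '$' signs."""
    return [part for part in check_command.split("!") if part.count("$") % 2 == 1]


broken = "check_snmp_booster!_SERVICEDSTEMPLATE$!$_SERVICEINST$!$_SERVICETRIGGERGROUP$"
fixed = "check_snmp_booster!$_SERVICEDSTEMPLATE$!$_SERVICEINST$!$_SERVICETRIGGERGROUP$"

print(unbalanced_macro_args(broken))  # ['_SERVICEDSTEMPLATE$']
print(unbalanced_macro_args(fixed))   # []
```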

    As to memcache:

    memdump (dump memcache list of keys)
    memcat (get value from key)

    Also, the Arbiter is responsible for writing the host keys to memcache. It should generate error messages if it has a problem. Look at your Arbiter logs.

    If you want extra logger messages, add them between lines 1411 and 1425 of shinken/modules/snmp_poller.py.

    Once again, sorry for the bad template…

    Cheers,

    xkilian

  10. #20

    Re: Discovering interfaces to monitor on a router/switch

    OK, added the missing $ on the line. More progress, now we got tracebacks!

    It is happening once a minute, give or take.

    Code:
    ==> /usr/local/shinken/var/nagios.log <==
    [1350462112] Error :  The instance SnmpBooster raised an exception timed out, I remove it!
    [1350462112] Error :  Back trace of this remove: Traceback (most recent call last):
     File "/usr/local/shinken/shinken/modulesmanager.py", line 131, in try_instance_init
      inst.init()
     File "/usr/local/shinken/shinken/modules/snmp_poller.py", line 1145, in init
      if not self.memcached.get_stats():
     File "/usr/lib/python2.6/site-packages/memcache.py", line 197, in get_stats
      line = readline()
     File "/usr/lib/python2.6/site-packages/memcache.py", line 889, in readline
      data = recv(4096)
    timeout: timed out

    That's strange, as memcached is running and there is no firewall in between (it's actually local to that machine). I'm using the machine's IP in the configuration, not 127.0.0.1.

    Code:
    496   25412 0.0 0.8 332968 15724 ?    Ssl Oct15  0:04 memcached -d -p 11211 -u memcached -m 64 -c 1024 -P /var/run/memcached/memcached.pid
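
One way to test the same `stats` round trip outside Shinken is to speak the memcached text protocol directly with a short timeout. This is a Python 3 sketch using only the standard library, not the python-memcached client seen in the traceback. The function names are illustrative.

```python
import socket


def parse_stats(payload):
    """Turn a memcached text-protocol 'stats' reply into a dict."""
    stats = {}
    for line in payload.splitlines():
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def memcached_stats(host, port=11211, timeout=3.0):
    """Send 'stats' to memcached and return the parsed reply.

    Raises socket.timeout (or another OSError) if the server does not
    accept the connection or does not answer within `timeout` seconds.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stats\r\n")
        buf = b""
        while not buf.endswith(b"END\r\n"):
            chunk = conn.recv(4096)
            if not chunk:  # server closed the connection early
                break
            buf += chunk
    return parse_stats(buf.decode("ascii", "replace"))


# Offline check of the parser with a shortened, made-up reply.
example_reply = "STAT pid 25412\r\nSTAT curr_items 0\r\nEND\r\n"
print(parse_stats(example_reply))
```

Calling `memcached_stats` with the address from `memcached_host` should return quickly if the server is reachable on that address. A timeout here as well would point at the listening address or the network rather than at SnmpBooster. The `curr_items` value shows whether any keys are currently stored.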

    Any ideas?
