
Thread: Problems with perl plugins - zombies

  1. #1
    Junior Member | Join Date: Jul 2011 | Posts: 2

    Problems with perl plugins - zombies

    Hello,

    At my company, we are testing both gearmand and Shinken for our next monitoring infrastructure.
    We are having problems with the Shinken pollers: it seems every check (run via Nagios Perl plugins, both official ones and our own) ends up as a zombie process.
    Sometimes we have up to 800 zombies at a time.
    The checks are done through.
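A zombie is a child process that has already exited but whose parent has not yet collected its exit status with wait(). Counting them is less informative than finding out which parent is failing to reap them (a poller worker, or something else). The Linux-only sketch below assumes the standard /proc/<pid>/stat layout ("pid (comm) state ppid ..."); the function name `zombie_parents` is hypothetical and not part of Shinken.

```python
import os
from collections import Counter


def zombie_parents(proc_root="/proc"):
    """Return a Counter mapping parent PID -> number of zombie children.

    Linux-only sketch: in /proc/<pid>/stat the field right after the
    parenthesised command name is the state ('Z' = zombie) and the next
    one is the parent PID, i.e. the process expected to reap it.
    """
    parents = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # skip non-process entries such as "self"
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            # The process vanished between listdir() and open(); ignore it.
            continue
        # The command name may itself contain spaces or ')', so split on
        # what follows the LAST ')' instead of splitting from the start.
        fields = stat[stat.rfind(")") + 2:].split()
        state, ppid = fields[0], int(fields[1])
        if state == "Z":
            parents[ppid] += 1
    return parents


if __name__ == "__main__" and os.path.isdir("/proc"):
    for ppid, count in zombie_parents().most_common():
        print(f"parent {ppid}: {count} zombie(s)")
```

If most zombies share one parent PID, `ps -o pid,cmd -p <PID>` shows which daemon that parent is.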

    The same plugins, run by Nagios with gearmand, show no problems.
    Does anyone have an idea?

    We are checking 16000 services with one poller. The same poller machine is used for Shinken and for Nagios/gearmand (not at the same time, of course).
    The average load is higher with Shinken than with Nagios/gearmand.

    The poller is an 8-core machine (16 hardware threads) with 12 GB of RAM.
    Another physical server runs the arbiter, the broker (NDO and NPCD), the receiver and the reactionner, and a VM runs the scheduler.

    Any idea is welcome.

    Regards

  2. #2
    Administrator | Join Date: Jun 2011 | Posts: 216

    Re: Problems with perl plugins - zombies

    Hi

    If you managed to get a distributed Shinken setup working, that's already a very good thing. On my side, I tried to add a simple distributed poller and it did not work very well.

    Concerning the zombie processes, I have no idea, but 16000 services is quite a lot. How often are they checked?
    On your poller server, I guess you have Shinken installed and only run shinken-poller, right?
    If so, did you try editing shinken-specific.cfg on the server running the arbiter?
    If you haven't, try changing the min_workers and max_workers values; maybe it will help.
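Whether the worker settings are large enough can be estimated with Little's law: the average number of checks in flight equals the rate at which checks are launched times their average duration. The sketch below assumes the Shinken poller parameters `max_workers` and `processes_by_worker` (taken here as concurrent checks per worker) and uses made-up numbers, since the thread gives neither the check interval nor the plugin run times. The helper names are hypothetical.

```python
import math


def checks_in_flight(services, check_interval_s, avg_duration_s):
    """Little's law: concurrent checks = launch rate x average duration."""
    launch_rate = services / check_interval_s  # checks started per second
    return launch_rate * avg_duration_s


def workers_needed(services, check_interval_s, avg_duration_s,
                   processes_by_worker):
    """Smallest number of workers whose combined slots cover the load.

    processes_by_worker mirrors the Shinken poller setting of the same
    name (assumed to mean concurrent checks handled by one worker).
    """
    in_flight = checks_in_flight(services, check_interval_s, avg_duration_s)
    return math.ceil(in_flight / processes_by_worker)


if __name__ == "__main__":
    # Hypothetical figures: 16000 services every 300 s, 32 checks per worker.
    for avg in (2, 10):  # a fast check vs. one that hangs until a 10 s timeout
        print(avg, round(checks_in_flight(16000, 300, avg)),
              workers_needed(16000, 300, avg, 32))
```

With these hypothetical figures, checks averaging 2 s keep about 107 checks in flight, while checks that hang until a 10 s timeout push that to about 533. This is why a burst of SNMP timeouts can exhaust worker capacity even when normal checks are fast.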

    Are the zombie plugins reported as timeouts in nagios.log?

  3. #3
    Junior Member | Join Date: Jul 2011 | Posts: 2

    Re: Problems with perl plugins - zombies

    Hi,

    Thanks for the reply.
    Yes, the poller is dedicated to Shinken (or Nagios).
    You are right, the zombies are reported as "snmp timeout".

    But it works fine with Nagios, using exactly the same config.
    I played a bit with shinken/nagios.cfg and shinken-specific.cfg, but I did not succeed.

    16000 is quite big, but it works fine with Nagios...

    Regards

  4. #4
    Administrator | Join Date: Jun 2011 | Posts: 216

    Re: Problems with perl plugins - zombies

    Hi,

    Well, I looked on the Internet for information about the SNMP timeout.
    Sometimes it is caused by a bad value for service_check_timeout. I think you use the same value in Nagios and Shinken.
    Sometimes it means that the server receiving the SNMP request is overloaded.
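Related to timeouts: killing a timed-out plugin is not enough on its own. The killed process stays a zombie until its parent collects the exit status, so any poller that kills timed-out checks without waiting on them will accumulate zombies. Whether that is what happens in Shinken here is not established in this thread. The Python sketch below only illustrates the kill-then-reap pattern with the standard subprocess module; `run_check` is a hypothetical helper, and returning 3 (UNKNOWN in the Nagios plugin return-code convention) on timeout is an arbitrary choice for the example.

```python
import subprocess
import sys


def run_check(argv, timeout_s):
    """Run a plugin; on timeout, kill it AND reap it so no zombie remains.

    Returns (return_code, output). Hypothetical helper for illustration.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return proc.returncode, out
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL, but the child is now only a zombie...
        proc.communicate()  # ...until this call drains the pipe and waits
        return 3, f"CHECK TIMEOUT after {timeout_s}s"


if __name__ == "__main__":
    # Demo: a stand-in "plugin" that sleeps far longer than the timeout.
    print(run_check([sys.executable, "-c", "import time; time.sleep(30)"], 1))
```

One caveat: if the plugin itself started child processes that inherited its output pipe, the second `communicate()` call also waits until those children close the pipe.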

    But be careful with your config: Shinken does not use every parameter from Nagios, even if it is written in your nagios.cfg.
    For some parameters, you need to load the appropriate module for them to work.

    Try launching in debug mode to see whether the poller, scheduler, etc. are overloaded.
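Besides debug mode, a quick way to judge whether the poller host itself is saturated is to compare its load average with the number of logical CPUs (16 on the poller described in the first post). A sustained ratio above roughly 1.0 means more tasks are runnable, or on Linux stuck in uninterruptible I/O, than there are CPUs. Zombies do not add to the load average, since they are neither runnable nor waiting. This is a generic Unix sketch, not a Shinken feature, and `load_per_cpu` is a hypothetical helper.

```python
import os


def load_per_cpu(loads=None, cpus=None):
    """Return the 1-, 5- and 15-minute load averages per logical CPU.

    loads and cpus default to the live values from the OS (Unix only);
    they can also be passed explicitly, e.g. numbers copied from uptime.
    """
    if loads is None:
        loads = os.getloadavg()
    if cpus is None:
        cpus = os.cpu_count() or 1
    return tuple(round(load / cpus, 2) for load in loads)


if __name__ == "__main__" and hasattr(os, "getloadavg"):
    per_cpu = load_per_cpu()
    print("load per CPU (1/5/15 min):", per_cpu)
    if per_cpu[1] > 1.0:
        print("5-minute load exceeds the CPU count: host looks saturated")
```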

  5. #5
    Shinken project leader | Join Date: May 2011 | Location: Bordeaux (France) | Posts: 2,131

    Re: Problems with perl plugins - zombies

    If I'm not wrong, this was the problem solved on the mailing list by using more workers per poller and fewer processes per worker, wasn't it? If so, we can rename this thread [RESOLVED].
    No direct support by personal message. Please open a thread so everyone can see the solution
