
Thread: A passive check that started running only as an active check

  1. #1
    Junior Member
    Join Date: Dec 2011
    Posts: 7

    A passive check that started running only as an active check

    Hi everyone,
    (sorry for the long post... and for my bad English)

    I'm still a noob with Shinken, and recently I ran into a problem with one of my passive checks.


    - The context is very simple:

    One small Shinken server receives dozens of passive check results through WS_Arbiter (all the monitored machines are behind firewalls).
    The hosts themselves are not checked at all, only their services: custom Python scripts check what I want and send the result to WS_Arbiter, generally every hour.
    If Shinken doesn't receive a result in time (freshness_threshold is 4200 seconds, to leave a margin for hourly passive checks), a dummy active check puts the service into the Critical state.
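    The freshness arithmetic described above can be sketched as follows. This is a simplified model, assuming a result counts as stale once the time since the last passive result exceeds freshness_threshold; the real scheduler may add its own latency or only evaluate freshness periodically, so treat it as an illustration rather than Shinken's exact algorithm. The names are hypothetical.

```python
"""Simplified model of a freshness check for a passive service.

Assumption: a result is stale once the time elapsed since the last
passive result exceeds freshness_threshold. Real schedulers may add
latency or only evaluate freshness at intervals.
"""

HOURLY_INTERVAL = 3600      # seconds between two expected passive results
FRESHNESS_THRESHOLD = 4200  # value used in the template in this thread


def is_stale(last_result_ts: int, now: int, threshold: int) -> bool:
    """Return True if no passive result arrived within `threshold` seconds."""
    return now - last_result_ts > threshold


def margin_minutes(interval: int, threshold: int) -> float:
    """Slack left for a late result, in minutes."""
    return (threshold - interval) / 60


if __name__ == "__main__":
    last = 1_000_000
    print(margin_minutes(HOURLY_INTERVAL, FRESHNESS_THRESHOLD))  # 10.0
    print(is_stale(last, last + 3700, FRESHNESS_THRESHOLD))      # False: result is 3700 s old
    print(is_stale(last, last + 4300, FRESHNESS_THRESHOLD))      # True: past the 4200 s limit
```

    With an hourly job, 4200 seconds leaves a 10-minute margin; a job that runs late by more than that triggers the dummy check.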

    The service configuration is:
    Code:
    #Service Definition#
    define service{
        host_name host01
        service_description Check_hourly
        use tpl-service-private-hourly
        check_command check_dummy!2 "Passive critical"
    }
    The template used is:
    Code:
    #Template Definition#
    define service{
        name                            tpl-service-private-hourly
        notifications_enabled           1
        event_handler_enabled           0
        flap_detection_enabled          0
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        contact_groups                  admins
        action_url                      ../../pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$

        active_checks_enabled           0
        passive_checks_enabled          1
        check_freshness                 1
        notification_period             24x7
        notification_options            w,u,c,r,s
        notification_interval           60

        check_period                    24x7
        check_interval                  1
        retry_interval                  1
        freshness_threshold             4200
        # a passive result must arrive at least every 4200 seconds (1 hour and 10 minutes)
        max_check_attempts              1
        register                        0
    }
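    To confirm what the configuration really says for a given service, the `use` inheritance can be resolved by hand. The helper below is hypothetical and deliberately simplified: it follows a chain of templates but supports only one template name per `use` line, whereas real Nagios/Shinken configs also allow comma-separated lists. It shows that, as configured, Check_hourly inherits active_checks_enabled 0, so a running value of 1 would have to come from somewhere other than these files.

```python
"""Resolve the effective value of a directive through `use` templates.

Hypothetical, simplified model of Nagios/Shinken-style inheritance: an
object's own directives override those of the template named in `use`.
Template chains are followed, but only one template per `use` line.
"""
from typing import Dict, Optional

Definition = Dict[str, str]

# Example data copied from the definitions in this post.
TEMPLATES: Dict[str, Definition] = {
    "tpl-service-private-hourly": {
        "active_checks_enabled": "0",
        "passive_checks_enabled": "1",
        "check_freshness": "1",
        "freshness_threshold": "4200",
        "register": "0",
    },
}
SERVICE: Definition = {
    "host_name": "host01",
    "service_description": "Check_hourly",
    "use": "tpl-service-private-hourly",
    "check_command": 'check_dummy!2 "Passive critical"',
}


def effective_value(obj: Definition, templates: Dict[str, Definition],
                    key: str) -> Optional[str]:
    """Walk the `use` chain until `key` is found; return None if absent."""
    seen = set()
    current: Optional[Definition] = obj
    while current is not None:
        if key in current:
            return current[key]
        parent = current.get("use")
        if parent is None or parent in seen:  # end of chain or a loop
            return None
        seen.add(parent)
        current = templates.get(parent)
    return None


if __name__ == "__main__":
    print(effective_value(SERVICE, TEMPLATES, "active_checks_enabled"))  # 0
```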

    - Here is my problem:

    One of the passive checks didn't receive its result in time, and the dummy active check was triggered, putting the service into Critical.
    I checked the problem on the host and manually ran the script that monitors the service and sends the result to Shinken.
    Shinken received the result through WS_Arbiter and changed the service state from Critical to OK.
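    For reference, pushing a result to WS_Arbiter from a Python script might look like the sketch below. It assumes the module exposes a `/push_check_result` route that accepts POST form fields `time_stamp`, `host_name`, `service_description`, `return_code` and `output`, listens on port 7760 and uses HTTP basic auth; check these against your own module configuration. The URL and credentials are hypothetical placeholders. The request is only built here; the actual send is left commented out.

```python
import base64
import time
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical placeholders: adapt to your WS_Arbiter module configuration.
WS_URL = "http://shinken.example.org:7760/push_check_result"
WS_USER = "shinken-user"
WS_PASSWORD = "change-me"


def build_push_request(host: str, service: str, return_code: int, output: str,
                       timestamp: Optional[int] = None) -> urllib.request.Request:
    """Build (but do not send) a POST carrying one passive check result."""
    fields = {
        "time_stamp": str(int(time.time()) if timestamp is None else timestamp),
        "host_name": host,
        "service_description": service,
        "return_code": str(return_code),  # 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN
        "output": output,
    }
    data = urllib.parse.urlencode(fields).encode("ascii")
    token = base64.b64encode(f"{WS_USER}:{WS_PASSWORD}".encode()).decode("ascii")
    req = urllib.request.Request(WS_URL, data=data, method="POST")
    req.add_header("Authorization", f"Basic {token}")
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


if __name__ == "__main__":
    req = build_push_request("host01", "Check_hourly", 0, "OK - job finished")
    # urllib.request.urlopen(req, timeout=10)  # uncomment to actually send
    print(req.get_method(), req.full_url)
```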

    However, for a reason I don't understand, that service continues to be actively checked at regular intervals, launching the dummy script that just returns a Critical result code.
    Of course, the service template includes active_checks_enabled 0.


    - What I've done so far:

    I verified the configuration of that service and its template: everything seems OK.
    I tested a similar service configured the same way (same template) and deliberately stopped its monitoring script: the service went to Critical and returned to OK when the passive result arrived again, but it was not actively checked afterwards.
    I restarted all Shinken services: nothing changed.

    I could delete this service from my config and create it again, but I'd rather understand what is going on and how to fix it.



    Has anyone already encountered the same problem?
    Thanks for the help

  2. #2
    gdizier2
    Guest

    Re: A passive check that started running only as an active check

    Hello,

    I have exactly the same problem, but with all of my passive checks.
    I also use WS_Arbiter to push the results of my checks, which are written in PowerShell.

    At first glance, it looks as if Shinken completely ignores the freshness_threshold parameter, or miscalculates it.

    If I run my checks manually, I get correct results in Shinken.
    The problem is that the expected freshness of my normal passive checks varies between 1 hour and 24 hours, depending on the job being checked.

    Does anyone have an idea?

  3. #3
    Shinken project leader
    Join Date: May 2011
    Location: Bordeaux (France)
    Posts: 2,130

    Re: A passive check that started running only as an active check

    Remember that active_checks_enabled (and passive_checks_enabled) are only read from the configuration the first time the service is launched. After that, they are saved in the retention data, so you have to use an external command to change them; editing the configuration won't.

    Try removing your retention data and restarting. That may solve it.
    No direct support by private message. Please open a thread so everyone can see the solution.
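    A toy model of the behaviour described above, assuming that at startup each service is first built from the configuration and that retained values for attributes such as active_checks_enabled are then applied on top. The function and key set are hypothetical and only illustrate why editing the config alone has no effect while stale retention data exists.

```python
"""Toy model: retained state overriding the configuration at startup.

Assumption for illustration: the scheduler builds the service from the
configuration, then overwrites selected attributes with retained values.
"""
from typing import Dict

# Hypothetical set of attributes restored from retention.
RETAINED_KEYS = ("active_checks_enabled", "passive_checks_enabled")


def load_service(config: Dict[str, int], retention: Dict[str, int]) -> Dict[str, int]:
    """Return the running state: config values, then retained overrides."""
    state = dict(config)
    for key in RETAINED_KEYS:
        if key in retention:
            state[key] = retention[key]
    return state


if __name__ == "__main__":
    config = {"active_checks_enabled": 0, "passive_checks_enabled": 1}
    # Retention saved while active checks were (somehow) enabled.
    print(load_service(config, {"active_checks_enabled": 1}))
    # {'active_checks_enabled': 1, 'passive_checks_enabled': 1}
    # After deleting the retention data, the config value wins again.
    print(load_service(config, {}))
    # {'active_checks_enabled': 0, 'passive_checks_enabled': 1}
```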

  4. #4
    gdizier
    Guest

    Re: A passive check that started running only as an active check

    Hi,

    Thank you, Naparuba, for your reply.

    I have written an event handler script that, when the active (dummy) check runs and reports my error message, re-enables passive checks and disables active checks.

    The definition of my passive check:

    Code:
    define service{
        name                   Passive-ArcServ-xxxx
        use                    service-xxxx
    
        active_checks_enabled     0
        passive_checks_enabled    1
    
        check_freshness           1
        freshness_threshold        7200
    
        flap_detection_enabled     0
    
        max_check_attempts       1
        check_interval             1
        retry_interval             1
    
        check_command           no_arcserv_result
        event_handler             reset_service_topassive
    
        register            0
    }
    
    
    define service {
        name                  ArcServ
        use                    Passive-ArcServ-xxxx
        service_description        vc2_arcserv
        host_name              vc2
    }
    This is the definition of my dummy command, which is executed when the service switches to active checking:

    Code:
    define command {
      command_name  no_arcserv_result
      command_line  $PLUGINSDIR$/check_dummy 1 "No ArcServ Passive Check Result (Freshness 1h) (CODE: 666)"
    }
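    Plugins such as check_dummy report their state through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), which is why `check_dummy 1 "..."` produces a WARNING and why the event handler script in this post reacts to WARNING. The stand-in below only roughly mimics check_dummy for illustration; it is not the real plugin.

```python
"""Minimal stand-in for the check_dummy plugin (illustration only).

Exit code convention: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
"""
import sys
from typing import Tuple

STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def dummy_check(code: int, text: str) -> Tuple[int, str]:
    """Return (exit code, output line) for the requested state."""
    if code not in STATES:
        return 3, f"UNKNOWN: invalid state {code}"
    return code, f"{STATES[code]}: {text}"


if __name__ == "__main__" and len(sys.argv) >= 2:
    # Usage: dummy_check.py 1 "No ArcServ Passive Check Result (CODE: 666)"
    exit_code, line = dummy_check(int(sys.argv[1]), " ".join(sys.argv[2:]))
    print(line)
    sys.exit(exit_code)
```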
    This is the definition of my event handler command:

    Code:
    # Change Service Passive Check Active to Passive
    define command {
      command_name  reset_service_topassive
      command_line  $PLUGINSDIR$/reset_service_topassive.sh "$HOSTNAME$" "$SERVICESTATE$" "$SERVICEDESC$" "$SERVICEOUTPUT$"
    }
    This is the code of my event handler script:

    Code:
    #!/bin/bash
    # Reset a passive service back to passive-only checking
    # Arguments: host name, service state, service description, service output

    HOSTNAME="$1"
    STATE="$2"
    SERVICE="$3"
    TEXT="$4"

    COMMANDFILE="/usr/local/shinken/var/rw/nagios.cmd"

    case "$STATE" in
    OK)
      # Nothing to do
      ;;
    WARNING)
      # Only react to the dummy freshness check, recognised by its marker
      if [[ "$TEXT" == *"(CODE: 666)"* ]]; then
          NOW=$(date +%s)
          # Re-enable passive checks, disable active checks and flap detection
          printf "[%lu] ENABLE_PASSIVE_SVC_CHECKS;%s;%s\n" "$NOW" "$HOSTNAME" "$SERVICE" > "$COMMANDFILE"
          printf "[%lu] DISABLE_SVC_CHECK;%s;%s\n" "$NOW" "$HOSTNAME" "$SERVICE" > "$COMMANDFILE"
          printf "[%lu] DISABLE_SERVICE_FLAP_DETECTION;%s;%s\n" "$NOW" "$HOSTNAME" "$SERVICE" > "$COMMANDFILE"
      fi
      ;;
    CRITICAL)
      # Nothing to do
      ;;
    esac

    exit 0
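    The same event handler logic can be sketched in Python, which makes the command formatting easy to test. This is a hypothetical port of the bash script above: it builds `[timestamp] COMMAND;host;service` lines only for a WARNING whose output carries the "(CODE: 666)" marker, and appends them to the command file path used by the script. These commands change the scheduler's runtime state only; they do not touch the configuration files.

```python
import time
from typing import List, Optional

COMMAND_FILE = "/usr/local/shinken/var/rw/nagios.cmd"  # path used by the bash script
MARKER = "(CODE: 666)"  # tag emitted by the dummy freshness command
RESET_COMMANDS = (
    "ENABLE_PASSIVE_SVC_CHECKS",
    "DISABLE_SVC_CHECK",
    "DISABLE_SERVICE_FLAP_DETECTION",
)


def format_command(name: str, host: str, service: str, ts: Optional[int] = None) -> str:
    """Build one external command line: [timestamp] NAME;host;service"""
    stamp = int(time.time()) if ts is None else ts
    return f"[{stamp}] {name};{host};{service}\n"


def reset_lines(host: str, state: str, service: str, output: str,
                ts: Optional[int] = None) -> List[str]:
    """Mirror the bash handler: act only on the dummy check's WARNING."""
    if state != "WARNING" or MARKER not in output:
        return []
    return [format_command(cmd, host, service, ts) for cmd in RESET_COMMANDS]


def send(lines: List[str], path: str = COMMAND_FILE) -> None:
    """Append the command lines to the external command file (or pipe)."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.writelines(lines)


if __name__ == "__main__":
    out = "WARNING: No ArcServ Passive Check Result (Freshness 1h) (CODE: 666)"
    for line in reset_lines("vc2", "WARNING", "vc2_arcserv", out):
        print(line, end="")
```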
    I can't remove the retention data because mine is stored in MongoDB.

    I have two problems.

    First, the freshness threshold I set is not respected at all, so the dummy command is executed too early.

    Second, once the service has switched to active checking, it never returns to passive, even though my event handler script runs and works properly.

    I'm a little lost.
