
Thread: duplicate hosts in Livestatus

  1. #1
    Junior Member
    Join Date
    Oct 2012
    Location
    Russia, Ekaterinburg
    Posts
    22

    duplicate hosts in Livestatus

    Hi.
    Shinken 1.2.2.
    I have one server, and I need to restart the Arbiter regularly.
    Livestatus periodically falls over. The problem went away when I started one more scheduler...

    But now I see duplicates of some hosts in Livestatus, and these hosts are not checked.
    If I restart Shinken, the problem comes back after some time.

  2. #2
    Administrator
    Join Date
    Jun 2011
    Posts
    216

    Re: duplicate hosts in Livestatus

    Livestatus is linked to the Broker; it's strange that adding a scheduler fixes the issue.

    BTW, your schedulers do not have the same hosts (packs are made before launching). How did you configure your schedulers? Is your architecture only one "tree"?

    Something else: what do you mean by "periodically fall"? Is anything written in the logs?
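
    To pin down what "periodically fall" means in practice, a minimal liveness probe against the Livestatus socket can show when the module stops answering. This is only a sketch; it assumes the mk_livestatus client used elsewhere in this thread, a Livestatus endpoint at tcp:shinken.local:50000, and that the standard status table is exposed:
    Code:
    import mk_livestatus  # Livestatus client, as used elsewhere in this thread

    def livestatus_is_up(endpoint="tcp:shinken.local:50000"):
        # Query the standard Livestatus "status" table; an answer means the
        # module is alive, an exception means it is not reachable.
        try:
            rows = mk_livestatus.SingleSiteConnection(endpoint).query_table_assoc("GET status\n")
            return len(rows) > 0
        except Exception:
            return False

    print(livestatus_is_up())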

  3. #3
    Junior Member
    Join Date
    Oct 2012
    Location
    Russia, Ekaterinburg
    Posts
    22

    Re: duplicate hosts in Livestatus

    Thank you for your response.

    [quote author=Seb-Solon link=topic=984.msg5307#msg5307 date=1372432838]
    Livestatus is linked to the Broker; it's strange that adding a scheduler fixes the issue.
    Something else: what do you mean by "periodically fall"? Is anything written in the logs?
    [/quote]
    I understand that. But right now I have no way to disable one scheduler and wait for the system to fall over so that I can post the logs here. Livestatus is not falling at the moment; I will come back to that problem later.

    Now I want to solve the problem of duplicate hosts, as Livestatus does not show correct information until I restart the Broker.

    [quote author=Seb-Solon link=topic=984.msg5307#msg5307 date=1372432838]
    How did you configure your schedulers? Is your architecture only one "tree"?
    [/quote]

    What do you mean by "only one tree"?
    I check about 7,000 hosts and 12,000 services. There are no dependencies between them in the config.
    My shinken-specific.cfg:
    Code:
    #===============================================================================
    # ARBITER (S1_Arbiter)
    #===============================================================================
    define arbiter {
      arbiter_name  Arbiter-Master
      host_name    shinken.local    ; CHANGE THIS if you have several Arbiters
      address     shinken.local  ; DNS name or IP
      port      7770
      spare      0      ; 1 = is a spare, 0 = is not a spare
      modules CommandFile
      ## Uncomment these lines in a HA architecture so the master and slaves know
      ## how long they may wait for each other.
      timeout       10  ; Ping timeout
      data_timeout     120 ; Data send timeout
      max_check_attempts  3  ; If ping fails N or more, then the node is dead
      check_interval    10 ; How often to ping this node
    }
    
    #===============================================================================
    # SCHEDULER (S1_Scheduler)
    #===============================================================================
    define scheduler {
      scheduler_name   Scheduler-shinken ; Just the name
      address       shinken.local  ; IP or DNS address of the daemon
      port        7768    ; TCP port of the daemon
      ## Optional
      spare        0  ; 1 = is a spare, 0 = is not a spare
      weight       1  ; Some schedulers can manage more hosts than others
      timeout       20  ; Ping timeout
      data_timeout    120 ; Data send timeout
      max_check_attempts 3  ; If ping fails N or more, then the node is dead
      check_interval   10 ; How often to ping this node
    
      modules   MongodbRetention
    
      ## Advanced Features
      # Realm is for multi-datacenters
      realm  All
      # Skip initial broks creation. Boot fast, but some broker modules won't
      # work with it!
      skip_initial_broks 0
      # In NATted environments, you declare each satellite ip[:port] as seen by
      # *this* scheduler (if port not set, the port declared by satellite itself
      # is used)
      #satellitemap  poller-1=1.2.3.4:1772, reactionner-1=1.2.3.5:1773, ...
    }
    
    define scheduler {
      scheduler_name   Scheduler2-shinken ; Just the name
      address       shinken.local  ; IP or DNS address of the daemon
      port        17768    ; TCP port of the daemon
    
      spare        0  ; 1 = is a spare, 0 = is not a spare
      weight       1  ; Some schedulers can manage more hosts than others
      timeout       20  ; Ping timeout
      data_timeout    120 ; Data send timeout
      max_check_attempts 3  ; If ping fails N or more, then the node is dead
      check_interval   10 ; How often to ping this node
    
      modules   MongodbRetention
    
      realm  All
      skip_initial_broks 0
    }
    
    
    #===============================================================================
    # POLLER (S1_Poller)
    #===============================================================================
    define poller {
      poller_name   Poller-shinken
      address     shinken.local
      port      7771
      spare      0
    
      ## Optional
      manage_sub_realms  0  ; Does it take jobs from schedulers of sub-Realms?
      min_workers     0  ; Starts with N processes (0 = 1 per CPU)
      max_workers     0  ; No more than N processes (0 = 1 per CPU)
      processes_by_worker 20 ; Each worker manages N checks
      polling_interval  1  ; Get jobs from schedulers each 1 second
      timeout       10  ; Ping timeout
      data_timeout    120 ; Data send timeout
      max_check_attempts 3  ; If ping fails N or more, then the node is dead
      check_interval   10 ; How often to ping this node
    
      modules
    
      ## Advanced Features
      #passive       0    ; For DMZ monitoring, set to 1 so the connections
                     ; will be from scheduler -> poller.
      #poller_tags     None
      realm         All
    }
    
    
    #===============================================================================
    # BROKER (S1_Broker)
    #===============================================================================
    define broker {
      broker_name   Broker-shinken
      address     shinken.local
      port      7772
      spare      0
      ## Optional
      manage_arbiters   1  ; Take data from Arbiter. There should be only one
                  ; broker for the arbiter.
      manage_sub_realms  1  ; Does it take jobs from schedulers of sub-Realms?
      timeout       10  ; Ping timeout
      data_timeout    120 ; Data send timeout
      max_check_attempts 3  ; If ping fails N or more, then the node is dead
      check_interval   10 ; How often to ping this node
    
      ## Modules
      modules   Livestatus
    
      ## Advanced
      realm  All
    }
    
    
    #===============================================================================
    # REACTIONNER (S1_Reactionner)
    #===============================================================================
    define reactionner {
      reactionner_name  Reactionner-shinken
      address       shinken.local
      port        7769
      spare        0
    
      ## Optional
      manage_sub_realms  0  ; Does it take jobs from schedulers of sub-Realms?
      min_workers     1  ; Starts with N processes (0 = 1 per CPU)
      max_workers     15 ; No more than N processes (0 = 1 per CPU)
      polling_interval  1  ; Get jobs from schedulers each 1 second
      timeout       10  ; Ping timeout
      data_timeout    120 ; Data send timeout
      max_check_attempts 3  ; If ping fails N or more, then the node is dead
      check_interval   10 ; How often to ping this node
    
      ## Modules
      modules
    
      ## Advanced
      realm  All
    }
    
    #===============================================================================
    # RECEIVER (S1_Receiver)
    #===============================================================================
    define receiver {
      receiver_name  receiver-shinken
      address     shinken.local
      port      7773
      spare      0
      ## Optional
      timeout       10
      data_timeout    120
      max_check_attempts 3
      check_interval   10
    
      ## Modules
      #modules     NSCA, CommandFile
      modules   CommandFile
    
      ## Advanced Feature
      direct_routing   0  ; If enabled, it will directly send commands to the
                  ; schedulers if it knows about the hostname in the
                  ; command.
      realm  All
    }
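
    As a quick sanity check that both scheduler daemons defined above are up and listening on their configured ports (7768 and 17768 on shinken.local), a plain TCP connect test is enough; this is only a generic sketch, not a Shinken tool:
    Code:
    import socket

    # Ports taken from the two scheduler definitions above.
    for port in (7768, 17768):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(5)
        try:
            s.connect(("shinken.local", port))
            print("scheduler port %d: reachable" % port)
        except socket.error as e:
            print("scheduler port %d: NOT reachable (%s)" % (port, e))
        finally:
            s.close()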

  4. #4
    Administrator
    Join Date
    Jun 2011
    Posts
    216

    Re: duplicate hosts in Livestatus

    OK, if there are no big dependencies between your hosts, then there should be no problem for the schedulers to have two different "packs" of hosts.

    I don't think the duplicate host in LS is related to the scheduler, because it is the arbiter that gives the conf to everyone.

    Can you try to open a ticket on GitHub with this?

  5. #5
    Shinken project leader
    Join Date
    May 2011
    Location
    Bordeaux (France)
    Posts
    2,131

    Re: duplicate hosts in Livestatus

    I don't understand how it is possible to have duplicate hosts on the broker. Can you look at the host ids in Livestatus to see whether they are different or not?
    No direct support by personal message. Please open a thread so everyone can see the solution

  6. #6
    Junior Member
    Join Date
    Oct 2012
    Location
    Russia, Ekaterinburg
    Posts
    22

    Re: duplicate hosts in Livestatus

    Hi,
    I caught the problem again.

    Filter:
    Code:
    import mk_livestatus  # Livestatus client providing SingleSiteConnection

    def get_livestatus_hosts_count():
        # Despite the name, this returns the matching host rows, not a count.
        try:
            return mk_livestatus.SingleSiteConnection("tcp:shinken.local:50000").query_table_assoc(
                "GET hosts\n"
                "Filter: name ~~ WFA001-20-SAR48-08.ekb\n"
                "Limit: 10")
        except Exception, e:  # Python 2 exception syntax
            print e

    for host in get_livestatus_hosts_count():
        print host
        print "\n"
    Result:
    {'last_time_unreachable': 1373731178, 'childs': [], 'action_url': '', 'num_services_warn': 0, 'low_flap_threshold': 25, 'filename': '', 'check_flapping_recovery_notification': 1, 'num_services_crit': 0, 'last_state': 'UP', 'display_name': u'WFA001-20-SAR48-08.ekb', 'notification_interval': 5, 'last_hard_state_change': 1375417037, 'retry_interval': 1, 'event_handler_enabled': 0, 'parents': [u'WFS001-SAR48-08.ekb'], 'execution_time': 0.031677961349487305, 'notifications_enabled': 0, 'services_with_info': [], 'no_more_notifications': 0, 'next_notification': 0, 'name': u'WFA001-20-SAR48-08.ekb', 'notes': '', 'custom_variable_values': [u'public'], 'num_services_hard_unknown': 0, 'num_services_pending': 0, 'num_services_hard_ok': 0, 'checks_enabled': 1, 'active_checks_enabled': 1, 'process_performance_data': 1, 'source_problems': [], 'in_notification_period': 1, 'total_services': '', 'accept_passive_checks': 1, 'notes_url': '', 'contacts': [u'duty'], 'last_time_up': 1375681192, 'last_hard_state': 'UP', 'comments': [], 'icon_image': '', 'state': 0, 'num_services_ok': 0, 'is_problem': 0, 'icon_image_expanded': '', 'num_services_unknown': 0, 'modified_attributes_list': [], 'comments_with_info': [], 'action_url_expanded': '', 'contact_groups': [u'admins'], 'downtimes_with_info': [], 'groups': [], 'address': u'192.168.237.164', 'services_with_state': [], 'acknowledgement_type': 1, 'business_impact': 2, 'max_check_attempts': 2, 'child_dependencies': [], 'hard_state': '', 'statusmap_image': '', 'current_notification_number': 0, 'in_check_period': 1, 'worst_service_hard_state': 0, 'check_options': '', 'last_notification': 0, 'check_type': 0, 'check_period': u'24x7', 'current_attempt': 1, 'worst_service_state': 0, 'parent_dependencies': [u'WFS001-SAR48-08.ekb'], 'percent_state_change': 0.0, 'plugin_output': u'OK - 192.168.237.164: rta 4.927ms, lost 0%', 'initial_state': 'u', 'first_notification_delay': 0, 'has_been_checked': 1, 'z_3d': '', 'pending_flex_downtime': 0, 'event_handler': '', 'x_3d': '', 'is_executing': 0, 'state_type': 1, 'criticity': 2, 'num_services': 0, 'scheduled_downtime_depth': 0, 'check_command': u'check-host-alive', 'last_state_change': 1375417037, 'y_3d': '', 'high_flap_threshold': 50, 'check_interval': 1, 'next_check': 1375681253, 'num_services_hard_warn': 0, 'perf_data': u'rta=4.927ms;1000.000;1000.000;0; pl=0%;99;99;; rtmax=7.641ms;;;; rtmin=2.498ms;;;;', 'check_freshness': 0, 'is_impact': 0, 'impacts': [], 'icon_image_alt': '', 'custom_variables': [(u'SNMP_COMMUNITY', u'public')], 'latency': 3.3800339698791504, 'alias': u'WiFi', 'custom_variable_names': [u'SNMP_COMMUNITY'], 'flap_detection_enabled': 1, 'last_check': 1375681189, 'got_business_rule': 0, 'services': [], 'notes_url_expanded': '', 'obsess_over_host': 0, 'num_services_hard_crit': 0, 'downtimes': [], 'acknowledged': 0, 'last_time_down': 1375416968, 'pnpgraph_present': 0, 'modified_attributes': 0L, 'notification_period': u'24x7', 'is_flapping': 0, 'long_plugin_output': u'', 'notes_expanded': ''}


    {'last_time_unreachable': 1373731178, 'childs': [], 'action_url': '', 'num_services_warn': 0, 'low_flap_threshold': 25, 'filename': '', 'check_flapping_recovery_notification': 1, 'num_services_crit': 0, 'last_state': 'UP', 'display_name': u'WFA001-20-SAR48-08.ekb', 'notification_interval': 5, 'last_hard_state_change': 1375289498, 'retry_interval': 1, 'event_handler_enabled': 0, 'parents': [u'WFS001-SAR48-08.ekb'], 'execution_time': 0.0530698299407959, 'notifications_enabled': 0, 'services_with_info': [], 'no_more_notifications': 0, 'next_notification': 0, 'name': u'WFA001-20-SAR48-08.ekb', 'notes': '', 'custom_variable_values': [u'public'], 'num_services_hard_unknown': 0, 'num_services_pending': 0, 'num_services_hard_ok': 0, 'checks_enabled': 1, 'active_checks_enabled': 1, 'process_performance_data': 1, 'source_problems': [], 'in_notification_period': 1, 'total_services': '', 'accept_passive_checks': 1, 'notes_url': '', 'contacts': [u'duty'], 'last_time_up': 1375360437, 'last_hard_state': 'UP', 'comments': [], 'icon_image': '', 'state': 0, 'num_services_ok': 0, 'is_problem': 0, 'icon_image_expanded': '', 'num_services_unknown': 0, 'modified_attributes_list': [], 'comments_with_info': [], 'action_url_expanded': '', 'contact_groups': [u'admins'], 'downtimes_with_info': [], 'groups': [], 'address': u'192.168.237.164', 'services_with_state': [], 'acknowledgement_type': 1, 'business_impact': 2, 'max_check_attempts': 2, 'child_dependencies': [], 'hard_state': '', 'statusmap_image': '', 'current_notification_number': 0, 'in_check_period': 1, 'worst_service_hard_state': 0, 'check_options': '', 'last_notification': 0, 'check_type': 0, 'check_period': u'24x7', 'current_attempt': 1, 'worst_service_state': 0, 'parent_dependencies': [u'WFS001-SAR48-08.ekb'], 'percent_state_change': 0.0, 'plugin_output': u'OK - 192.168.237.164: rta 15.131ms, lost 0%', 'initial_state': 'u', 'first_notification_delay': 0, 'has_been_checked': 1, 'z_3d': '', 'pending_flex_downtime': 0, 'event_handler': '', 'x_3d': '', 'is_executing': 0, 'state_type': 1, 'criticity': 2, 'num_services': 0, 'scheduled_downtime_depth': 0, 'check_command': u'check-host-alive', 'last_state_change': 1375289498, 'y_3d': '', 'high_flap_threshold': 50, 'check_interval': 1, 'next_check': 1375360499, 'num_services_hard_warn': 0, 'perf_data': u'rta=15.131ms;1000.000;1000.000;0; pl=0%;99;99;; rtmax=32.359ms;;;; rtmin=5.026ms;;;;', 'check_freshness': 0, 'is_impact': 0, 'impacts': [], 'icon_image_alt': '', 'custom_variables': [(u'SNMP_COMMUNITY', u'public')], 'latency': 13.62529706954956, 'alias': u'WiFi', 'custom_variable_names': [u'SNMP_COMMUNITY'], 'flap_detection_enabled': 1, 'last_check': 1375360433, 'got_business_rule': 0, 'services': [], 'notes_url_expanded': '', 'obsess_over_host': 0, 'num_services_hard_crit': 0, 'downtimes': [], 'acknowledged': 0, 'last_time_down': 1375289428, 'pnpgraph_present': 0, 'modified_attributes': 0L, 'notification_period': u'24x7', 'is_flapping': 0, 'long_plugin_output': u'', 'notes_expanded': ''}
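
    The two entries above are the same host name with different last_check / next_check / latency values. A narrower query makes that easier to compare; this is just a sketch reusing the connection pattern from the query above, assuming the standard Columns header is honoured:
    Code:
    import mk_livestatus

    conn = mk_livestatus.SingleSiteConnection("tcp:shinken.local:50000")
    rows = conn.query_table_assoc(
        "GET hosts\n"
        "Columns: name address last_check next_check latency\n"
        "Filter: name = WFA001-20-SAR48-08.ekb\n")
    for row in rows:
        print(row)  # one compact line per duplicate entry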

    [quote author=naparuba link=topic=984.msg5318#msg5318 date=1372748840]
    Can you look at the host ids in Livestatus to see whether they are different or not?
    [/quote]
    How can I do this?

  7. #7
    Shinken project leader
    Join Date
    May 2011
    Location
    Bordeaux (France)
    Posts
    2,131

    Re: duplicate hosts in Livestatus

    Can you query the "id" field in Livestatus? Depending on whether we get the same id or different ones, the problem may come from a different source.
    No direct support by personal message. Please open a thread so everyone can see the solution

  8. #8
    Junior Member
    Join Date
    Oct 2012
    Location
    Russia, Ekaterinburg
    Posts
    22

    Re: duplicate hosts in Livestatus

    OK, can you explain how I can get the "id" field?

    The hosts table has no "id" field.

    query:
    Code:
    "GET hosts\n"
    "Columns: name id\n"
    "Filter: name ~~ WFA001-20-SAR48-08.ekb\n"
    Result:
    Code:
    {'id': u'', 'name': u'WFA001-20-SAR48-08.ekb'}
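
    Since the id column comes back empty here, one workaround is to pull just the name column for every host and count repetitions on the client side; a sketch reusing the same connection pattern as above:
    Code:
    import mk_livestatus
    from collections import Counter

    conn = mk_livestatus.SingleSiteConnection("tcp:shinken.local:50000")
    rows = conn.query_table_assoc("GET hosts\nColumns: name\n")
    counts = Counter(row['name'] for row in rows)

    # Print every host name that Livestatus returns more than once.
    for name, n in counts.items():
        if n > 1:
            print("%s appears %d times" % (name, n))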

  9. #9
    Shinken project leader
    Join Date
    May 2011
    Location
    Bordeaux (France)
    Posts
    2,131

    Re: duplicate hosts in Livestatus

    Argh, so we will have to enable the debug output of Livestatus and look at the broker debug output.
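
    For reference, the Livestatus module is declared in shinken-specific.cfg along the lines of the snippet below. If I remember correctly it accepts debug and debug_queries options; treat the exact option names and paths as assumptions to verify against the sample file shipped with your version, and restart the Broker after changing them:
    Code:
    define module {
        module_name     Livestatus
        module_type     livestatus
        host            *                               ; listen on all interfaces
        port            50000                           ; the TCP port queried earlier in this thread
        database_file   /var/lib/shinken/livestatus.db  ; adjust to your installation
        debug           /tmp/livestatus.debug           ; assumed option: file receiving debug output
        debug_queries   0                               ; assumed option: set to 1 to also log every query
    }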
    No direct support by personal message. Please open a thread so everyone can see the solution
