Results 1 to 8 of 8

Thread: [RESOLVED] Multiple values for ws_arbiter

  1. #1
    aboisvert
    Guest

    [RESOLVED] Multiple values for ws_arbiter

    Hello everyone,

    Here is some context for my questions:

    I plan to do some passive monitoring with Shinken and Graphite. I have a custom agent on a distant machine and this agents reports in every 20 minutes with data.
    I saw the ws_arbiter module and I enabled it. It works as advertised ;D I can receive data on a web service and ultimately send it to the graphing solution (the Graphite broker).
    Everything works well, a simple curl command can send a metric value.

    Now, I want to receive more data in a single POST request. I looked at the code of the ws_arbiter and I found a way.
    The Bottle Web Framework used in the ws_arbiter allows us to recover more data in a single POST.
    I created a new version of the ws_arbiter which accepts more than one metric value. (here: http://nopaste.me/paste/328330054515c408d1bdda)
    It does work! The new ws_arbiter can fetch more data in a single POST.

    However, there is a problem.
    The ws_arbiter adds the command in a command queue (with the method app.from_q.put()).
    To my great surprise, if I add 3 commands to the queue, only the last one will end up on the Graphite broker.

    Here is an example:

    I send the values for a metric:
    (73% at 07:58, 74% at 07:59, 75% at 08:00)

    The values detected by the ws_arbiter are OK:
    time_stamp = 1364993880, host_name = ServerName, service_description = MetricIndex, return_code = 0, output = MetricIndex=73%
    time_stamp = 1364993940, host_name = ServerName, service_description = MetricIndex, return_code = 0, output = MetricIndex=74%
    time_stamp = 1364994000, host_name = ServerName, service_description = MetricIndex, return_code = 0, output = MetricIndex=75%
    The ws_arbiter then converts them into commands.

    I use a simple for loop to add the commands to the ws_arbiter queue
    [tt]for c in commands_list:
    ext = ExternalCommand(c)
    app.from_q.put(ext)[/tt]

    Only the value "75% at 08:00" (the last one) will be sent to Graphite.

    The only way I found to patch this (the ugly way) is to add a sleep() in the loop.

    I believe the new version of the code has some potential, but at the moment, it can't be used with the sleep() command.

    Am I missing something about the inner workings of the ws_arbiter?



    Thank you for reading.

  2. #2
    Shinken project leader
    Join Date
    May 2011
    Location
    Bordeaux (France)
    Posts
    2,130

    Re: Multiple values for ws_arbiter

    What about putting the value in the reverse order in the queue?

    I4m quite interested by a pull request about this new feature
    No direct support by personal message. Please open a thread so everyone can see the solution

  3. #3
    aboisvert
    Guest

    Re: Multiple values for ws_arbiter

    I will try this an report back as soon as I can.

    Thank you for the suggestion.

  4. #4
    aboisvert
    Guest

    Re: Multiple values for ws_arbiter

    Hello,

    I tried the suggestion. Ordering the commands in the ws_arbiter by the time_stamps.

    Sadly, the problem is still there

    To help debug, here is some additional information.

    The configuration (shinken-specific.cfg) for the setup:

    Code:
    # ws_arbiter (as defined)
    define module {
     username anonymous
     host 0.0.0.0
     module_type ws_arbiter
     module_name WS_Arbiter
     port 7760
    }
    
    # the receiver has the ws_arbiter module
    define receiver {
     data_timeout 120
     check_interval 60
     receiver_name receiver-1
     direct_routing 0
     modules NSCA,WS_Arbiter
     spare 0
     timeout 3
     address localhost
     realm All
     max_check_attempts 3
     port 7773
    }
    Context for the metrics:

    Host name = SL410
    Service Name = HappynessIndex (this is a random-ish value between 0% and 100%)

    Here are the logs (partial):

    In time_stamp increasing order

    receiverd.log
    Code:
    2013-04-03 11:34:01,932 [1365003241] Debug :  [receiver-1] [Ws_arbiter] time_stamp_list: ['1365003061', '1365003121', '1365003181']
    2013-04-03 11:34:01,933 [1365003241] Debug :  [receiver-1] [Ws_arbiter] host_name_list: ['SL410', 'SL410', 'SL410']
    2013-04-03 11:34:01,933 [1365003241] Debug :  [receiver-1] [Ws_arbiter] service_description_list: ['HappynessIndex', 'HappynessIndex', 'HappynessIndex']
    2013-04-03 11:34:01,933 [1365003241] Debug :  [receiver-1] [Ws_arbiter] return_code_list: ['0', '0', '0']
    2013-04-03 11:34:01,933 [1365003241] Debug :  [receiver-1] [Ws_arbiter] output_list: ['OK | HappynessIndex=69%', 'OK | HappynessIndex=69%', 'OK | HappynessIndex=72%']
    2013-04-03 11:34:01,933 [1365003241] Debug :  [receiver-1] [Ws_arbiter] commands = ['[1365003061] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=69%', '[1365003121] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=69%', '[1365003181] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=72%']
    2013-04-03 11:34:01,933 [1365003241] Debug :  [receiver-1] [Ws_arbiter] commands = ['[1365003061] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=69%', '[1365003121] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=69%', '[1365003181] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=72%']
    2013-04-03 11:34:02,113 [1365003242] Debug :  [receiver-1] Enqueuing an external command: {'__module__': 'shinken.external_command', 'my_type': 'externalcommand', '__init__': <function __init__ at 0x1c75d70>, '__doc__': None}
    2013-04-03 11:34:02,116 [1365003242] Debug :  [receiver-1] Enqueuing an external command: {'__module__': 'shinken.external_command', 'my_type': 'externalcommand', '__init__': <function __init__ at 0x1c75d70>, '__doc__': None}
    2013-04-03 11:34:02,116 [1365003242] Debug :  [receiver-1] Enqueuing an external command: {'__module__': 'shinken.external_command', 'my_type': 'externalcommand', '__init__': <function __init__ at 0x1c75d70>, '__doc__': None}
    schedulerd.log
    Code:
    2013-04-03 11:34:02,131 [1365003242] Debug :  scheduler resolves command '[1365003061] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=69%'
    2013-04-03 11:34:02,131 [1365003242] Debug :  mode= applyer, global= False
    2013-04-03 11:34:02,131 [1365003242] Debug :  Searching for a new arg: SL410 (1)
    2013-04-03 11:34:02,132 [1365003242] Debug :  For command arg: SL410
    2013-04-03 11:34:02,132 [1365003242] Debug :  Searching for a new arg: HappynessIndex (2)
    2013-04-03 11:34:02,132 [1365003242] Debug :  For command arg: HappynessIndex
    2013-04-03 11:34:02,132 [1365003242] Debug :  Searching for a new arg: 0 (2)
    2013-04-03 11:34:02,132 [1365003242] Debug :  For command arg: 0
    2013-04-03 11:34:02,133 [1365003242] Debug :  Searching for a new arg: OK | HappynessIndex=69% (3)
    2013-04-03 11:34:02,133 [1365003242] Debug :  For command arg: OK | HappynessIndex=69%
    2013-04-03 11:34:02,133 [1365003242] Debug :  Got commands PROCESS_SERVICE_CHECK_RESULT [<shinken.objects.service.Service object at 0x1c18380>, 0, 'OK | HappynessIndex=69%']
    2013-04-03 11:34:02,133 [1365003242] PASSIVE SERVICE CHECK: SL410;HappynessIndex;0;OK | HappynessIndex=69%
    2013-04-03 11:34:02,134 [1365003242] Debug :  scheduler resolves command '[1365003121] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=69%'
    2013-04-03 11:34:02,135 [1365003242] Debug :  mode= applyer, global= False
    2013-04-03 11:34:02,135 [1365003242] Debug :  Searching for a new arg: SL410 (1)
    2013-04-03 11:34:02,135 [1365003242] Debug :  For command arg: SL410
    2013-04-03 11:34:02,135 [1365003242] Debug :  Searching for a new arg: HappynessIndex (2)
    2013-04-03 11:34:02,135 [1365003242] Debug :  For command arg: HappynessIndex
    2013-04-03 11:34:02,136 [1365003242] Debug :  Searching for a new arg: 0 (2)
    2013-04-03 11:34:02,136 [1365003242] Debug :  For command arg: 0
    2013-04-03 11:34:02,136 [1365003242] Debug :  Searching for a new arg: OK | HappynessIndex=69% (3)
    2013-04-03 11:34:02,136 [1365003242] Debug :  For command arg: OK | HappynessIndex=69%
    2013-04-03 11:34:02,136 [1365003242] Debug :  Got commands PROCESS_SERVICE_CHECK_RESULT [<shinken.objects.service.Service object at 0x1c18380>, 0, 'OK | HappynessIndex=69%']
    2013-04-03 11:34:02,137 [1365003242] PASSIVE SERVICE CHECK: SL410;HappynessIndex;0;OK | HappynessIndex=69%
    2013-04-03 11:34:02,137 [1365003242] Debug :  scheduler resolves command '[1365003181] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=72%'
    2013-04-03 11:34:02,137 [1365003242] Debug :  mode= applyer, global= False
    2013-04-03 11:34:02,137 [1365003242] Debug :  Searching for a new arg: SL410 (1)
    2013-04-03 11:34:02,137 [1365003242] Debug :  For command arg: SL410
    2013-04-03 11:34:02,138 [1365003242] Debug :  Searching for a new arg: HappynessIndex (2)
    2013-04-03 11:34:02,138 [1365003242] Debug :  For command arg: HappynessIndex
    2013-04-03 11:34:02,138 [1365003242] Debug :  Searching for a new arg: 0 (2)
    2013-04-03 11:34:02,138 [1365003242] Debug :  For command arg: 0
    2013-04-03 11:34:02,138 [1365003242] Debug :  Searching for a new arg: OK | HappynessIndex=72% (3)
    2013-04-03 11:34:02,138 [1365003242] Debug :  For command arg: OK | HappynessIndex=72%
    2013-04-03 11:34:02,139 [1365003242] Debug :  Got commands PROCESS_SERVICE_CHECK_RESULT [<shinken.objects.service.Service object at 0x1c18380>, 0, 'OK | HappynessIndex=72%']
    2013-04-03 11:34:02,139 [1365003242] PASSIVE SERVICE CHECK: SL410;HappynessIndex;0;OK | HappynessIndex=72%
    brokerd.log
    Code:
    2013-04-03 11:34:04,158 [1365003244] Debug :  [broker-1] [Graphite broker] Groking: HappynessIndex=72%
    2013-04-03 11:34:04,159 [1365003244] Debug :  [broker-1] [Graphite broker] End of grok: HappynessIndex, 72%
    Now in time_stamp decreasing order:
    receiverd.log
    Code:
    2013-04-03 11:49:01,186 [1365004141] Debug :  [receiver-1] [Ws_arbiter] time_stamp_list: ['1365003961', '1365004021', '1365004081']
    2013-04-03 11:49:01,186 [1365004141] Debug :  [receiver-1] [Ws_arbiter] host_name_list: ['SL410', 'SL410', 'SL410']
    2013-04-03 11:49:01,187 [1365004141] Debug :  [receiver-1] [Ws_arbiter] service_description_list: ['HappynessIndex', 'HappynessIndex', 'HappynessIndex']
    2013-04-03 11:49:01,187 [1365004141] Debug :  [receiver-1] [Ws_arbiter] return_code_list: ['0', '0', '0']
    2013-04-03 11:49:01,187 [1365004141] Debug :  [receiver-1] [Ws_arbiter] output_list: ['OK | HappynessIndex=52%', 'OK | HappynessIndex=50%', 'OK | HappynessIndex=57%']
    2013-04-03 11:49:01,187 [1365004141] Debug :  [receiver-1] [Ws_arbiter] commands = ['[1365003961] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=52%', '[1365004021] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=50%', '[1365004081] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=57%']
    2013-04-03 11:49:01,187 [1365004141] Debug :  [receiver-1] [Ws_arbiter] commands = ['[1365004081] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=57%', '[1365004021] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=50%', '[1365003961] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=52%']
    2013-04-03 11:49:01,470 [1365004141] Debug :  [receiver-1] Enqueuing an external command: {'__module__': 'shinken.external_command', 'my_type': 'externalcommand', '__init__': <function __init__ at 0x190dd70>, '__doc__': None}
    2013-04-03 11:49:01,471 [1365004141] Debug :  [receiver-1] Enqueuing an external command: {'__module__': 'shinken.external_command', 'my_type': 'externalcommand', '__init__': <function __init__ at 0x190dd70>, '__doc__': None}
    2013-04-03 11:49:01,471 [1365004141] Debug :  [receiver-1] Enqueuing an external command: {'__module__': 'shinken.external_command', 'my_type': 'externalcommand', '__init__': <function __init__ at 0x190dd70>, '__doc__': None}
    schedulerd.log
    Code:
    2013-04-03 11:49:01,479 [1365004141] Debug :  scheduler resolves command '[1365004081] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=57%'
    2013-04-03 11:49:01,479 [1365004141] Debug :  mode= applyer, global= False
    2013-04-03 11:49:01,479 [1365004141] Debug :  Searching for a new arg: SL410 (1)
    2013-04-03 11:49:01,479 [1365004141] Debug :  For command arg: SL410
    2013-04-03 11:49:01,480 [1365004141] Debug :  Searching for a new arg: HappynessIndex (2)
    2013-04-03 11:49:01,480 [1365004141] Debug :  For command arg: HappynessIndex
    2013-04-03 11:49:01,480 [1365004141] Debug :  Searching for a new arg: 0 (2)
    2013-04-03 11:49:01,480 [1365004141] Debug :  For command arg: 0
    2013-04-03 11:49:01,480 [1365004141] Debug :  Searching for a new arg: OK | HappynessIndex=57% (3)
    2013-04-03 11:49:01,481 [1365004141] Debug :  For command arg: OK | HappynessIndex=57%
    2013-04-03 11:49:01,481 [1365004141] Debug :  Got commands PROCESS_SERVICE_CHECK_RESULT [<shinken.objects.service.Service object at 0x2d2a390>, 0, 'OK | HappynessIndex=57%']
    2013-04-03 11:49:01,481 [1365004141] PASSIVE SERVICE CHECK: SL410;HappynessIndex;0;OK | HappynessIndex=57%
    2013-04-03 11:49:01,482 [1365004141] Debug :  scheduler resolves command '[1365004021] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=50%'
    2013-04-03 11:49:01,482 [1365004141] Debug :  mode= applyer, global= False
    2013-04-03 11:49:01,483 [1365004141] Debug :  Searching for a new arg: SL410 (1)
    2013-04-03 11:49:01,483 [1365004141] Debug :  For command arg: SL410
    2013-04-03 11:49:01,483 [1365004141] Debug :  Searching for a new arg: HappynessIndex (2)
    2013-04-03 11:49:01,483 [1365004141] Debug :  For command arg: HappynessIndex
    2013-04-03 11:49:01,483 [1365004141] Debug :  Searching for a new arg: 0 (2)
    2013-04-03 11:49:01,483 [1365004141] Debug :  For command arg: 0
    2013-04-03 11:49:01,484 [1365004141] Debug :  Searching for a new arg: OK | HappynessIndex=50% (3)
    2013-04-03 11:49:01,484 [1365004141] Debug :  For command arg: OK | HappynessIndex=50%
    2013-04-03 11:49:01,484 [1365004141] Debug :  Got commands PROCESS_SERVICE_CHECK_RESULT [<shinken.objects.service.Service object at 0x2d2a390>, 0, 'OK | HappynessIndex=50%']
    2013-04-03 11:49:01,484 [1365004141] PASSIVE SERVICE CHECK: SL410;HappynessIndex;0;OK | HappynessIndex=50%
    2013-04-03 11:49:01,485 [1365004141] Debug :  scheduler resolves command '[1365003961] PROCESS_SERVICE_CHECK_RESULT;SL410;HappynessIndex;0;OK | HappynessIndex=52%'
    2013-04-03 11:49:01,485 [1365004141] Debug :  mode= applyer, global= False
    2013-04-03 11:49:01,485 [1365004141] Debug :  Searching for a new arg: SL410 (1)
    2013-04-03 11:49:01,485 [1365004141] Debug :  For command arg: SL410
    2013-04-03 11:49:01,485 [1365004141] Debug :  Searching for a new arg: HappynessIndex (2)
    2013-04-03 11:49:01,486 [1365004141] Debug :  For command arg: HappynessIndex
    2013-04-03 11:49:01,486 [1365004141] Debug :  Searching for a new arg: 0 (2)
    2013-04-03 11:49:01,486 [1365004141] Debug :  For command arg: 0
    2013-04-03 11:49:01,486 [1365004141] Debug :  Searching for a new arg: OK | HappynessIndex=52% (3)
    2013-04-03 11:49:01,486 [1365004141] Debug :  For command arg: OK | HappynessIndex=52%
    2013-04-03 11:49:01,486 [1365004141] Debug :  Got commands PROCESS_SERVICE_CHECK_RESULT [<shinken.objects.service.Service object at 0x2d2a390>, 0, 'OK | HappynessIndex=52%']
    2013-04-03 11:49:01,487 [1365004141] PASSIVE SERVICE CHECK: SL410;HappynessIndex;0;OK | HappynessIndex=52%
    brokerd.log
    Code:
    2013-04-03 11:49:03,502 [1365004143] Debug :  [broker-1] [Graphite broker] Groking: HappynessIndex=52%
    2013-04-03 11:49:03,503 [1365004143] Debug :  [broker-1] [Graphite broker] End of grok: HappynessIndex, 52%
    From what I see above, the time_stamp ordering does not affect the end result. Still only one value (the last one) gets to the Graphite broker.

    The 3 values are received in the ws_arbiter module and the values are added to the queue. The 3 values are also received by the scheduler as indicated in the log.

    For an unknown reason at this time, the scheduler does not transmit all the commands.

    I saw in the code the "hook tick". I will investigate further on this to better understand the workings.

    If anyone has an idea, I'm open.

    @naparuba:
    I will work on a pull request soon

  5. #5
    Shinken project leader
    Join Date
    May 2011
    Location
    Bordeaux (France)
    Posts
    2,130

    Re: Multiple values for ws_arbiter

    I think the key is in the scheduler in fact. Each "turn" you can only have on result for an element. So the last one is erasing the previous "check" object with the new value. Will be hard to avoid this problem, because it was design like this in the whole scheduler. We can try to print on the scheduler if it's what happened really.
    No direct support by personal message. Please open a thread so everyone can see the solution

  6. #6
    aboisvert
    Guest

    Re: Multiple values for ws_arbiter

    Thank you for the feedback.

    I will keep working on this on github. I'll see what I can do and will report back Monday.


  7. #7
    aboisvert
    Guest

    Re: Multiple values for ws_arbiter

    Hello,

    I did some tests on my shinken installation.

    With the new ws_arbiter module I wrote, I can send more than one value to the webservice and the values are sent to Graphite. I tested 2 hosts with 2 services by host. Everything worked OK.

    Problems only occur when I send the same metric (Host/Service) with multiple values. I can't send the following data properly:
    Host1/Service1/Value1/Timestamp1, Host1/Service1/Value2/Timestamp2
    I can't buffer metrics for a few minutes and then send the metrics in bulk. The scheduler seems to filter out "old" results. This behavior is built into shinken itself.

    I will mark this post as [RESOLVED] and open a "suggestion thread" on github.

    Thank you

  8. #8
    Administrator
    Join Date
    Dec 2011
    Posts
    278

    Re: [ANSWERED] Multiple values for ws_arbiter

    So the short answer is:

    Shinken cannot
    • [li]send two values related to a host/service within the same update period (1 second)[/li]
      [li]have more resolution of events than a second[/li]


    Cheers,

    xkilian

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •