Hello everyone,

I have set up a two-node Shinken HA architecture with Thruk and NagVis frontends, both using the Livestatus module. It worked for a while (it's not in production yet) until last Friday, when the Livestatus broker broke simultaneously and unexpectedly on both nodes :-(

Here are the error messages that are written in a loop to brokerd.log:
[1460239205] ERROR: [broker-master] LiveStatusClientError: Could not read from client: [Errno 104] Connection reset by peer
[1460239205] WARNING: [broker-master] Error on client socket shutdown: [Errno 107] Transport endpoint is not connected

My software versions:
  • CentOS 6.7 x86_64
  • Python 2.6.6
  • Shinken 2.4.2
  • Livestatus module 1.4.2
  • Thruk 2.02
  • NagVis 1.7.10

Livestatus config file:
## Module:      livestatus
## Loaded by:   Broker
# The LIVESTATUS API makes internal Shinken data available via the network
# using an SQL-like syntax. The API supports various access methods,
# authentication and sophisticated performance options. The premier interface
# to Shinken internal host and service states, historical data, performance
# data, configuration data, comments, maintenance periods, etc.
define module {
    module_name     livestatus
    module_type     livestatus
    host            *           ; * = listen on all configured IP addresses
    port            50000       ; port to listen
    socket          /var/lib/shinken/rw/live  ; If a Unix socket is required
    ## Available modules:
    # - logstore-sqlite: send historical logs to a local sqlite database
    # - logstore-mongodb: send historical logs to a mongodb database
    # - logstore-null : send historical logs to a black hole
    modules         logstore-sqlite
    #debug           /tmp/ls.debug   ; Enable only for debugging this module
    #debug_queries   0   ; Set to 1 to dump queries/replies too (very verbose)
}

Thank you for your help. I really have no idea what caused this!
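
For anyone trying to reproduce this: a direct query against the Livestatus TCP listener, bypassing Thruk and NagVis, may help tell whether the module itself still answers. Below is a minimal sketch, not a tested diagnostic. The helper names `build_query` and `query_livestatus` are hypothetical, and it assumes the `status` table exposes a `program_version` column and that the module accepts the `OutputFormat: json` header, as in the standard Livestatus protocol.

```python
import json
import socket


def build_query(table, columns):
    """Build a Livestatus query: a GET line, headers, then a blank line."""
    lines = [
        "GET %s" % table,
        "Columns: %s" % " ".join(columns),
        "OutputFormat: json",
    ]
    return "\n".join(lines) + "\n\n"


def query_livestatus(host, port, query, timeout=5.0):
    """Send one query over TCP and return the decoded JSON reply."""
    sock = socket.create_connection((host, port), timeout)
    try:
        sock.sendall(query.encode("utf-8"))
        # Half-close the socket so the server knows the request is complete.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    finally:
        sock.close()
    return json.loads(b"".join(chunks).decode("utf-8"))
```

On a broker node, calling `query_livestatus("127.0.0.1", 50000, build_query("status", ["program_version"]))` would be the obvious first check; the address is an assumption, and the port comes from the config above. A refused or reset connection here would point at the module rather than at the frontends.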