Hyper Open Edge Cloud

SlapOS process watchdog architecture explained

Describes the SlapOS watchdog architecture and basic implementation details.
  • Last Update: 2016-06-28
  • Version: 001
  • Language: en

Architecture

Every SlapOS machine runs a "watchdog" process that monitors other processes and calls "bang" on the master (either a real slapos.org master or a local slapproxy).

The purpose of bang is to inform the master that a process has died, so that the master can do whatever is needed for the next call to slapos node instance to actually start that process again.

Bang is called:

  • explicitly (for example by a promise or by a service itself)
  • when a process watched by the watchdog enters a state other than the one it is supposed to be in
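
The second trigger can be sketched as follows. This is only an illustration under stated assumptions, not the actual SlapOS watchdog code: `ProcessStatus`, `needs_bang`, `watch`, the `bang` callback, and the state strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ProcessStatus:
    """Hypothetical snapshot of one watched process."""
    name: str
    expected_state: str  # state the process is supposed to be in
    actual_state: str    # state the watchdog currently observes


def needs_bang(status: ProcessStatus) -> bool:
    """A bang is needed when the observed state differs from the expected one."""
    return status.actual_state != status.expected_state


def watch(statuses: Iterable[ProcessStatus],
          bang: Callable[[str], None]) -> List[str]:
    """Call bang once per process in an unexpected state; return their names."""
    banged = []
    for status in statuses:
        if needs_bang(status):
            # In a real deployment, bang would notify the master
            # (slapos.org or a local slapproxy); here it is a plain callback.
            bang(f"{status.name}: expected {status.expected_state}, "
                 f"got {status.actual_state}")
            banged.append(status.name)
    return banged
```

Passing `bang` as a callback keeps the sketch independent of how the master is reached. An explicit bang from a promise or a service would simply call the same callback directly.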

In theory, buildout runs all the time, repeatedly, and has zero execution time (theoretical model). In practice that would take 100% of the CPU, so we have to find ways to call it less often.

buildout is called:

  • every X (this can be configured at the profile level)
  • if promises are not all satisfied
  • if requested services are not available
  • as the result of bang
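
The four conditions above combine as a simple logical OR. The sketch below is a hypothetical helper written for illustration: `should_run_buildout` and its parameter names are assumptions, and `period_seconds` stands for the configurable interval X.

```python
def should_run_buildout(seconds_since_last_run: float,
                        period_seconds: float,
                        all_promises_satisfied: bool,
                        all_requested_services_available: bool,
                        bang_received: bool) -> bool:
    """Return True if any of the four triggers listed above applies."""
    return (
        seconds_since_last_run >= period_seconds   # every X
        or not all_promises_satisfied              # a promise is failing
        or not all_requested_services_available    # a requested service is missing
        or bang_received                           # result of a bang
    )
```

When nothing has changed and the period has not elapsed, the function returns False and the buildout run is skipped, which is what keeps CPU usage low.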

buildout is actually called by slapgrid. Slapgrid itself is called every Y (in theory Y = 0, but in practice Y = 1 minute). So, slapgrid is called:

  • at least every minute
  • right after a slapgrid call if something happened during that call (e.g. a request for a new service or a failing promise), with an increasing delay to reduce CPU load
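
One way to picture this schedule is a capped backoff. The text only states the one-minute ceiling and that the delay increases. The initial delay, the doubling factor, and the name `next_delay` below are assumptions made for the sketch, not the actual slapgrid behaviour.

```python
MAX_DELAY = 60.0     # from the text: slapgrid runs at least every minute
INITIAL_DELAY = 1.0  # assumed delay after the first eventful run
GROWTH = 2.0         # assumed growth factor between consecutive reruns


def next_delay(consecutive_eventful_runs: int) -> float:
    """Seconds to wait before the next slapgrid run.

    consecutive_eventful_runs counts how many runs in a row ended with
    something having happened (new service request, failing promise).
    """
    if consecutive_eventful_runs <= 0:
        # Nothing happened: fall back to the regular one-minute period.
        return MAX_DELAY
    delay = INITIAL_DELAY
    for _ in range(consecutive_eventful_runs - 1):
        delay = min(delay * GROWTH, MAX_DELAY)
        if delay == MAX_DELAY:
            break  # the cap is reached, no need to keep multiplying
    return delay
```

With these assumed constants, a persistently failing promise causes reruns after 1, 2, 4, 8, ... seconds until the delay settles at the regular one-minute period.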

Shortcut optimisations to consider

Currently, bang has to go through the master. In the future, a shortcut that does not go through the master could be considered, but I (JP) am not yet sure that it is a good idea. It is probably simpler and cleaner to run slapproxy locally if one needs full autonomy.