In Vitrage we use configuration files, called “templates”, to express rules regarding raising deduced alarms, setting deduced states, and detecting/setting RCA links. This page describes the format of the Vitrage templates, with some examples and open questions on extending this format. Additionally, a short guide on adding templates is presented.
```yaml
metadata:
  name: <unique template identifier>
  description: <what this template does>
definitions:
  entities:
    - entity: ...
    - entity: ...
  relationships:
    - relationship: ...
    - relationship: ...
scenarios:
  - scenario:
      condition: <if this statement is true, perform the actions>
      actions:
        - action: ...
```
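The template is divided into three main sections:

- metadata: contains the template name and a brief description of what the template does (optional)
- definitions: contains the definitions referenced later in the template:
  - entities: the resources and alarms relevant to the template, each identified by a template_id
  - relationships: the relationships between those entities, each also identified by a template_id
- scenarios: a list of if-then scenarios, each made of a condition to be met and a list of actions to take when it is met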
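As a rough illustration of this layout, the sketch below checks a template that has already been loaded from YAML into plain Python dicts and lists (loading is not shown). `check_sections` is a hypothetical helper, not part of Vitrage, and it only checks the shape described above. Relationships are treated as optional, since a template may have none.

```python
# Hypothetical structural check, not Vitrage's own validator. The template is
# assumed to be already parsed from YAML into dicts and lists.
REQUIRED_SECTIONS = ("metadata", "definitions", "scenarios")


def check_sections(template: dict) -> list:
    """Return a list of problems; an empty list means the layout looks right."""
    problems = [f"missing section: {s}" for s in REQUIRED_SECTIONS if s not in template]
    if "name" not in template.get("metadata", {}):
        problems.append("metadata.name is missing")
    # Entities are required; relationships are optional (a template may have none).
    if not template.get("definitions", {}).get("entities"):
        problems.append("definitions.entities is empty")
    for i, item in enumerate(template.get("scenarios", [])):
        scenario = item.get("scenario", {})
        if "condition" not in scenario:
            problems.append(f"scenario {i}: missing condition")
        if not scenario.get("actions"):
            problems.append(f"scenario {i}: missing actions")
    return problems


example = {
    "metadata": {"name": "example", "description": "demo"},
    "definitions": {
        "entities": [{"entity": {"category": "RESOURCE", "template_id": "r"}}],
    },
    "scenarios": [
        {"scenario": {"condition": "r",
                      "actions": [{"action": {"action_type": "set_state"}}]}},
    ],
}
print(check_sections(example))  # []
```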
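The condition that needs to be met is phrased using the entities and relationships previously defined. An expression is either a single entity or some logical combination of relationships. Expressions can be combined using the following logical operators:

- “and”: both expressions must be satisfied
- “or”: at least one of the expressions must be satisfied
- parentheses “()”: group expressions to set precedence

The following are examples of valid expressions, where X, Y and Z are relationships:

- X and Y
- X and Y and Z
- X and (Y or Z)
- (X or Y) and Z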
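To make the operators concrete, here is a minimal sketch of a condition evaluator. It simplifies heavily: each template id is treated as a boolean that is true when that entity or relationship has been matched, whereas the real engine matches patterns in the entity graph. It assumes the usual precedence, with “and” binding tighter than “or”. `tokenize` and `evaluate` are hypothetical names, not Vitrage functions.

```python
import re

# Tokens: parentheses or identifiers (template ids and the words and/or).
TOKEN_RE = re.compile(r"\s*(\(|\)|[A-Za-z_][A-Za-z0-9_]*)")


def tokenize(condition: str) -> list:
    tokens, pos, text = [], 0, condition.strip()
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected text at position {pos}: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def evaluate(condition: str, satisfied: set) -> bool:
    """Evaluate an and/or/parentheses condition; ids in `satisfied` are true."""
    tokens = tokenize(condition)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of condition")
        pos += 1
        return tokens[pos - 1]

    def parse_or():  # lowest precedence
        value = parse_and()
        while peek() == "or":
            take()
            rhs = parse_and()  # always parse the right side before combining
            value = value or rhs
        return value

    def parse_and():
        value = parse_atom()
        while peek() == "and":
            take()
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if token in ("and", "or", ")"):
            raise ValueError(f"unexpected {token!r}")
        return token in satisfied

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return result


condition = ("(alarm_on_host and host_contains_instance) or "
             "(alarm_on_zone and zone_contains_host and host_contains_instance)")
print(evaluate(condition, {"alarm_on_zone", "zone_contains_host", "host_contains_instance"}))  # True
print(evaluate(condition, {"alarm_on_host"}))  # False
```

A hand-written recursive-descent parser is used here only to keep the example dependency-free; it says nothing about how Vitrage itself parses conditions.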
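The following template demonstrates all three kinds of deduction. When a host has a high memory load alarm, it raises a deduced alarm on each instance on that host and sets the instance state to suboptimal; when the instance also has an instance_mem_performance_problem alarm, it adds a causal relationship from the host alarm to the instance alarm.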
```yaml
metadata:
  name: host_high_mem_load_to_instance_mem_suboptimal
  description: when there is high memory on the host, show implications on the instances
definitions:
  entities:
    - entity:
        category: ALARM
        type: host_high_mem_load
        template_id: host_alarm  # some string
    - entity:
        category: ALARM
        type: instance_mem_performance_problem
        template_id: instance_alarm
    - entity:
        category: RESOURCE
        type: nova.host
        template_id: host
    - entity:
        category: RESOURCE
        type: nova.instance
        template_id: instance
  relationships:
    - relationship:
        source: host_alarm  # source and target from entities section
        target: host
        relationship_type: on
        template_id: alarm_on_host
    - relationship:
        source: instance_alarm
        target: instance
        relationship_type: on
        template_id: alarm_on_instance
    - relationship:
        source: host
        target: instance
        relationship_type: contains
        template_id: host_contains_instance
scenarios:
  - scenario:
      condition: alarm_on_host and host_contains_instance  # condition uses relationship ids
      actions:
        - action:
            action_type: raise_alarm
            properties:
              alarm_name: instance_mem_performance_problem
              severity: warning
            action_target:
              target: instance  # entity template_id
        - action:
            action_type: set_state
            properties:
              state: suboptimal
            action_target:
              target: instance  # entity template_id
  - scenario:
      condition: alarm_on_host and alarm_on_instance and host_contains_instance
      actions:
        - action:
            action_type: add_causal_relationship
            action_target:
              source: host_alarm
              target: instance_alarm
```
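A toy sketch of what the first scenario of this template does, assuming a tiny in-memory entity graph. The `VERTICES` and `EDGES` structures and the function names are hypothetical, not Vitrage's data model: for each host_high_mem_load alarm on a host, every instance contained in that host gets an instance_mem_performance_problem alarm and the suboptimal state.

```python
# Toy entity graph (hypothetical structures): vertex id -> properties, and
# directed edges (source, target, relationship_type).
VERTICES = {
    "alarm1": {"category": "ALARM", "type": "host_high_mem_load"},
    "host1": {"category": "RESOURCE", "type": "nova.host"},
    "host2": {"category": "RESOURCE", "type": "nova.host"},
    "vm1": {"category": "RESOURCE", "type": "nova.instance"},
    "vm2": {"category": "RESOURCE", "type": "nova.instance"},
    "vm3": {"category": "RESOURCE", "type": "nova.instance"},
}
EDGES = [
    ("alarm1", "host1", "on"),
    ("host1", "vm1", "contains"),
    ("host1", "vm2", "contains"),
    ("host2", "vm3", "contains"),
]


def is_a(vertex_id, category, type_):
    props = VERTICES[vertex_id]
    return props["category"] == category and props["type"] == type_


def first_scenario_actions():
    """Actions for 'alarm_on_host and host_contains_instance'."""
    actions = []
    for alarm, host, rel in EDGES:  # bind alarm_on_host
        if rel != "on" or not is_a(alarm, "ALARM", "host_high_mem_load"):
            continue
        if not is_a(host, "RESOURCE", "nova.host"):
            continue
        for source, instance, rel2 in EDGES:  # bind host_contains_instance
            if (source == host and rel2 == "contains"
                    and is_a(instance, "RESOURCE", "nova.instance")):
                actions.append(("raise_alarm", "instance_mem_performance_problem", instance))
                actions.append(("set_state", "suboptimal", instance))
    return actions


for action in first_scenario_actions():
    print(action)
```

vm3 is left alone because its host has no high memory load alarm.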
The following template will change the state of a resource to “ERROR” if there is any alarm of severity “CRITICAL” on it.
```yaml
metadata:
  name: deduced_state_for_all_with_alarm
  description: deduced state for all resources with alarms
definitions:
  entities:
    - entity:
        category: RESOURCE
        template_id: a_resource  # entity ids are any string
    - entity:
        category: ALARM
        severity: critical
        template_id: high_alarm  # entity ids are any string
  relationships:
    - relationship:
        source: high_alarm
        target: a_resource
        relationship_type: on
        template_id: high_alarm_on_resource
scenarios:
  - scenario:
      condition: high_alarm_on_resource
      actions:
        - action:
            action_type: set_state
            properties:
              state: error
            action_target:
              target: a_resource
```
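The matching this template describes can be sketched over a toy graph. The data structures and the `deduced_states` function are hypothetical; the sketch assumes, as stated for these templates, that severity comparison is case-insensitive. Because a_resource specifies only a category, any resource type matches.

```python
# Toy data (hypothetical): vertex properties and "on" edges (alarm -> resource).
VERTICES = {
    "alarm1": {"category": "ALARM", "severity": "CRITICAL"},
    "alarm2": {"category": "ALARM", "severity": "warning"},
    "host1": {"category": "RESOURCE", "type": "nova.host"},
    "volume1": {"category": "RESOURCE", "type": "cinder.volume"},
}
ON_EDGES = [("alarm1", "host1"), ("alarm2", "volume1")]


def deduced_states(vertices, on_edges):
    """Return {resource_id: 'error'} for every resource with a critical alarm on it."""
    result = {}
    for alarm_id, resource_id in on_edges:
        alarm, resource = vertices[alarm_id], vertices[resource_id]
        # a_resource gives only a category, so any resource type matches;
        # severity is compared case-insensitively.
        if (alarm["category"] == "ALARM"
                and alarm.get("severity", "").lower() == "critical"
                and resource["category"] == "RESOURCE"):
            result[resource_id] = "error"
    return result


print(deduced_states(VERTICES, ON_EDGES))  # {'host1': 'error'}
```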
This template will cause an alarm to be raised on any host in state “ERROR”.
Note that in this template, there are no relationships. The condition is just that the entity exists. Also note that the states and severity are case-insensitive.
```yaml
metadata:
  name: deduced_alarm_for_all_host_in_error
  description: raise deduced alarm for all hosts in error
definitions:
  entities:
    - entity:
        category: RESOURCE
        type: nova.host
        state: error
        template_id: host_in_error
scenarios:
  - scenario:
      condition: host_in_error
      actions:
        - action:
            action_type: raise_alarm
            properties:
              alarm_name: host_in_error_state
              severity: critical
            action_target:
              target: host_in_error
```
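Because this template has no relationships, matching reduces to filtering vertices by their properties. A minimal sketch over a hypothetical dict of vertices, comparing the state case-insensitively; `hosts_in_error` and `raise_alarm_actions` are illustrative names only.

```python
def hosts_in_error(vertices: dict) -> list:
    """The template id binds to vertices directly: no edges are inspected."""
    return sorted(
        vertex_id for vertex_id, props in vertices.items()
        if props.get("category") == "RESOURCE"
        and props.get("type") == "nova.host"
        and props.get("state", "").lower() == "error"  # case-insensitive
    )


def raise_alarm_actions(vertices: dict) -> list:
    """One raise_alarm action per matching host, mirroring the template above."""
    return [
        {"action_type": "raise_alarm", "alarm_name": "host_in_error_state",
         "severity": "critical", "target": host}
        for host in hosts_in_error(vertices)
    ]


vertices = {
    "host1": {"category": "RESOURCE", "type": "nova.host", "state": "ERROR"},
    "host2": {"category": "RESOURCE", "type": "nova.host", "state": "available"},
    "vm1": {"category": "RESOURCE", "type": "nova.instance", "state": "error"},
}
print([a["target"] for a in raise_alarm_actions(vertices)])  # ['host1']
```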
This template will raise a deduced alarm on an instance, which can be caused by an alarm on the hosting zone or an alarm on the hosting host.
```yaml
metadata:
  name: deduced_alarm_two_possible_triggers
  description: deduced alarm using or in condition
definitions:
  entities:
    - entity:
        category: ALARM
        type: zone_connectivity_problem
        template_id: zone_alarm
    - entity:
        category: ALARM
        type: host_connectivity_problem
        template_id: host_alarm
    - entity:
        category: RESOURCE
        type: nova.zone
        template_id: zone
    - entity:
        category: RESOURCE
        type: nova.host
        template_id: host
    - entity:
        category: RESOURCE
        type: nova.instance
        template_id: instance
  relationships:
    - relationship:
        source: zone_alarm
        target: zone
        relationship_type: on
        template_id: alarm_on_zone
    - relationship:
        source: host_alarm
        target: host
        relationship_type: on
        template_id: alarm_on_host
    - relationship:
        source: zone
        target: host
        relationship_type: contains
        template_id: zone_contains_host
    - relationship:
        source: host
        target: instance
        relationship_type: contains
        template_id: host_contains_instance
scenarios:
  - scenario:
      condition: (alarm_on_host and host_contains_instance) or (alarm_on_zone and zone_contains_host and host_contains_instance)
      actions:
        - action:
            action_type: raise_alarm
            properties:
              alarm_name: instance_connectivity_problem
              severity: critical
            action_target:
              target: instance
```
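A toy sketch of how the “or” condition can be satisfied through either path, using a hypothetical in-memory graph. It assumes that alarm_on_host links the host alarm to the host, which is what the first branch of the condition requires. The structures and function names are illustrative only.

```python
# Toy graph (hypothetical): vertex id -> properties, and a set of
# (source, target, relationship_type) edges.
VERTICES = {
    "zone_alarm1": {"category": "ALARM", "type": "zone_connectivity_problem"},
    "host_alarm1": {"category": "ALARM", "type": "host_connectivity_problem"},
    "zone1": {"category": "RESOURCE", "type": "nova.zone"},
    "host1": {"category": "RESOURCE", "type": "nova.host"},
    "host2": {"category": "RESOURCE", "type": "nova.host"},
    "host3": {"category": "RESOURCE", "type": "nova.host"},
    "vm1": {"category": "RESOURCE", "type": "nova.instance"},
    "vm2": {"category": "RESOURCE", "type": "nova.instance"},
    "vm3": {"category": "RESOURCE", "type": "nova.instance"},
}
EDGES = {
    ("zone_alarm1", "zone1", "on"),
    ("zone1", "host1", "contains"),
    ("host_alarm1", "host2", "on"),
    ("host1", "vm1", "contains"),
    ("host2", "vm2", "contains"),
    ("host3", "vm3", "contains"),
}


def of_type(vertex_id, type_):
    return VERTICES[vertex_id]["type"] == type_


def instances_to_alarm():
    """Instances satisfying either branch of the 'or' condition."""
    ids = list(VERTICES)
    result = set()
    for host in ids:
        for instance in ids:
            if not (of_type(host, "nova.host") and of_type(instance, "nova.instance")
                    and (host, instance, "contains") in EDGES):
                continue  # host_contains_instance is required by both branches
            # Branch 1: alarm_on_host
            branch1 = any(of_type(a, "host_connectivity_problem")
                          and (a, host, "on") in EDGES for a in ids)
            # Branch 2: alarm_on_zone and zone_contains_host
            branch2 = any(
                of_type(a, "zone_connectivity_problem")
                and of_type(zone, "nova.zone")
                and (a, zone, "on") in EDGES
                and (zone, host, "contains") in EDGES
                for a in ids for zone in ids
            )
            if branch1 or branch2:
                result.add(instance)
    return sorted(result)


print(instances_to_alarm())  # ['vm1', 'vm2']
```

vm1 is reached through the zone alarm, vm2 through the host alarm, and vm3 through neither.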
block | key | supported values | comments |
---|---|---|---|
entity | category | ALARM, RESOURCE | |
entity (ALARM) | type | any string | |
entity (RESOURCE) | type | openstack.cluster, nova.zone, nova.host, nova.instance, cinder.volume, switch | These are for the datasources that come with Vitrage by default. Adding datasources will add more supported types, as defined in the datasource transformer. |
action | action_type | raise_alarm, set_state, add_causal_relationship, mark_down | |
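The supported values in the table above can be turned into a simple check. This is a hypothetical sketch, not Vitrage's validator: it compares values case-sensitively (a choice of the sketch) and leaves RESOURCE types open, because added datasources can register new ones.

```python
# Values from the table above. RESOURCE types are left open because added
# datasources can register new ones.
SUPPORTED_CATEGORIES = {"ALARM", "RESOURCE"}
SUPPORTED_ACTION_TYPES = {"raise_alarm", "set_state", "add_causal_relationship", "mark_down"}


def check_entity(entity: dict) -> list:
    category = entity.get("category")
    if category not in SUPPORTED_CATEGORIES:
        return [f"unsupported category: {category!r}"]
    return []


def check_action(action: dict) -> list:
    action_type = action.get("action_type")
    if action_type not in SUPPORTED_ACTION_TYPES:
        return [f"unsupported action_type: {action_type!r}"]
    return []


print(check_entity({"category": "ALARM", "type": "any string"}))  # []
print(check_action({"action_type": "delete_host"}))  # delete_host is a made-up, unsupported value
```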
Raise a deduced alarm on a target entity:

```yaml
action:
  action_type: raise_alarm
  properties:
    alarm_name: some_problem  # mandatory; string that is a valid variable name
    severity: critical  # mandatory; should match values in "vitrage.yaml"
  action_target:
    target: instance  # mandatory. entity (from the definitions section) to raise an alarm on. Should not be an alarm.
```
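The mandatory fields of raise_alarm can be checked mechanically. A minimal sketch, assuming the template's entities are available as a hypothetical `template_id -> entity` dict; “valid variable name” is approximated with Python's `str.isidentifier`, which may not match Vitrage's exact rule.

```python
def check_raise_alarm(action: dict, entities: dict) -> list:
    """entities: hypothetical map of template_id -> entity definition."""
    errors = []
    props = action.get("properties", {})
    name = props.get("alarm_name", "")
    # "Valid variable name" approximated with Python's identifier rules.
    if not name.isidentifier():
        errors.append(f"alarm_name {name!r} is not a valid variable name")
    if "severity" not in props:
        errors.append("severity is mandatory")
    target = action.get("action_target", {}).get("target")
    if target not in entities:
        errors.append(f"unknown target {target!r}")
    elif entities[target].get("category") == "ALARM":
        errors.append("target should not be an alarm")
    return errors


entities = {
    "instance": {"category": "RESOURCE", "type": "nova.instance"},
    "host_alarm": {"category": "ALARM", "type": "host_high_mem_load"},
}
action = {
    "action_type": "raise_alarm",
    "properties": {"alarm_name": "some_problem", "severity": "critical"},
    "action_target": {"target": "instance"},
}
print(check_raise_alarm(action, entities))  # []
```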
Set the state of a target entity:

```yaml
action:
  action_type: set_state
  properties:
    state: error  # mandatory; should match values in the relevant datasource_values YAML file for this entity.
  action_target:
    target: host  # mandatory. entity (from the definitions section) to change state
```
Add a causal relationship between two alarms:

```yaml
action:
  action_type: add_causal_relationship
  action_target:
    source: host_alarm  # mandatory. the alarm that caused the target alarm (name from the definitions section)
    target: instance_alarm  # mandatory. the alarm that was caused by the source alarm (name from the definitions section)
```
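Since both ends of add_causal_relationship are alarms, a template can be checked for that before it is loaded. A minimal sketch; `check_causal_relationship` and the `entities` lookup are hypothetical.

```python
def check_causal_relationship(action: dict, entities: dict) -> list:
    """Both ends of add_causal_relationship must be alarm entities from definitions."""
    errors = []
    target_block = action.get("action_target", {})
    for end in ("source", "target"):
        template_id = target_block.get(end)
        if template_id is None:
            errors.append(f"{end} is mandatory")
        elif entities.get(template_id, {}).get("category") != "ALARM":
            errors.append(f"{end} {template_id!r} is not an alarm entity")
    return errors


entities = {
    "host_alarm": {"category": "ALARM"},
    "instance_alarm": {"category": "ALARM"},
    "host": {"category": "RESOURCE"},
}
print(check_causal_relationship(
    {"action_target": {"source": "host_alarm", "target": "instance_alarm"}}, entities))  # []
```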
Set an entity's marked_down field. This can be used together with the Nova notifier to call force_down for a host:

```yaml
action:
  action_type: mark_down
  action_target:
    target: host  # mandatory. entity (from the definitions section, only host) to be marked as down
```
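A sketch of the two halves of mark_down: checking that the target is a host, and setting the field on a copy of the vertex. Both functions are hypothetical, and treating marked_down as a boolean is an assumption of this sketch.

```python
def check_mark_down(action: dict, entities: dict) -> list:
    """entities: hypothetical map of template_id -> entity definition."""
    target = action.get("action_target", {}).get("target")
    entity = entities.get(target)
    if entity is None:
        return [f"unknown target {target!r}"]
    if entity.get("category") != "RESOURCE" or entity.get("type") != "nova.host":
        return ["mark_down applies only to host entities"]
    return []


def apply_mark_down(vertex: dict) -> dict:
    """Return a copy of the vertex with marked_down set (assumed boolean)."""
    return {**vertex, "marked_down": True}


entities = {
    "host": {"category": "RESOURCE", "type": "nova.host"},
    "instance": {"category": "RESOURCE", "type": "nova.instance"},
}
print(check_mark_down({"action_target": {"target": "host"}}, entities))  # []
```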
We need to support a “not” operator, which indicates that the expression following it must not be satisfied for the condition to be met. “not” should apply to relationships, not to entities. We could then have a condition like:
condition: host_contains_instance and not alarm_on_instance
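One possible reading of “not” on a relationship is negation as absence: a binding of the other relationships survives only if the graph has no matching edge for the negated one. Since the operator is only proposed, these semantics are an assumption; the sketch below applies them to hypothetical sets of edges.

```python
# Hypothetical semantics for the proposed "not": keep a (host, instance)
# binding of host_contains_instance only if no alarm is "on" that instance.
CONTAINS = {("host1", "vm1"), ("host1", "vm2"), ("host2", "vm3")}
ALARM_ON = {("alarm1", "vm2")}  # (alarm, resource) pairs


def contained_without_alarm(contains: set, alarm_on: set) -> list:
    """Bindings for 'host_contains_instance and not alarm_on_instance'."""
    alarmed = {resource for _, resource in alarm_on}
    return sorted(
        (host, instance) for host, instance in contains
        if instance not in alarmed  # the negated relationship has no edge
    )


print(contained_without_alarm(CONTAINS, ALARM_ON))
# [('host1', 'vm1'), ('host2', 'vm3')]
```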
Consider a template that has two entities of the same category and type, for example two instances, instance1 and instance2:
```yaml
metadata:
  name: two_similar_instances
definitions:
  entities:
    - entity:
        category: RESOURCE
        type: nova.host
        template_id: host
    - entity:
        category: RESOURCE
        type: nova.instance
        template_id: instance1
    - entity:
        category: RESOURCE
        type: nova.instance
        template_id: instance2
    ...
  relationships:
    - relationship:
        source: host
        target: instance1
        relationship_type: contains
        template_id: link1
    - relationship:
        source: host
        target: instance2
        relationship_type: contains
        template_id: link2
    ...
```
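There are three options for how to interpret this template:

1. instance1 and instance2 refer to the same instance in the entity graph.
2. instance1 and instance2 refer to different instances.
3. instance1 and instance2 may refer either to the same instance or to different instances.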
Thus, we need a way to distinguish between options 2 and 3 (option 1 can be expressed by using only instance1). This can be done in two ways:

1. Introducing another logical operator, “neq”, to be used between expressions:

   condition: (instance1 neq instance2) and...

2. Introducing a new relationship type, “neq”:

   ```yaml
   relationship:
     source: instance1
     target: instance2
     relationship_type: neq
   ```
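The difference a “neq” constraint makes can be seen by enumerating bindings. In this sketch (hypothetical function and data), without the constraint instance1 and instance2 may bind to the same vertex; with it, only pairs of distinct instances remain.

```python
from itertools import product


def instance_bindings(contains: set, host: str, require_distinct: bool) -> list:
    """(instance1, instance2) bindings for link1 and link2 on the given host.

    Without require_distinct, both template ids may bind to the same vertex;
    with it, the pair must differ (the proposed "neq").
    """
    instances = sorted(target for source, target in contains if source == host)
    return [
        (inst1, inst2)
        for inst1, inst2 in product(instances, repeat=2)
        if not (require_distinct and inst1 == inst2)
    ]


CONTAINS = {("host1", "vm1"), ("host1", "vm2")}
print(instance_bindings(CONTAINS, "host1", require_distinct=False))
# [('vm1', 'vm1'), ('vm1', 'vm2'), ('vm2', 'vm1'), ('vm2', 'vm2')]
print(instance_bindings(CONTAINS, "host1", require_distinct=True))
# [('vm1', 'vm2'), ('vm2', 'vm1')]
```

With a single instance on the host, the distinct version yields no bindings at all, which is why “neq” can also express a cardinality of two.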
To support cardinality, for example to express that a host should have two instances on it, we could take one of two approaches:

1. Rely on the “neq” relationship described above. As in the example given in the previous section, stating that the two instances on the host are not equal is equivalent to a cardinality of two.
2. Expand the definition of the “relationship” clause with a cardinality setting. By default cardinality=1, which preserves backward compatibility.

For example, we might use one of the following formats:
```yaml
- relationship:  # option A
    source: host
    target: instance
    target_cardinality: 2  # two instances, but only one host
    relationship_type: contains
    template_id: host_contains_two_instances_A
- relationship:  # option B, same as option A but split into two lines
    source: host
    target: instance
    cardinality_for: instance
    cardinality: 2
    relationship_type: contains
    template_id: host_contains_two_instances_B
```
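To compare the proposals, here is a sketch that reads option A's target_cardinality (defaulting to 1, as proposed for backward compatibility) and counts contained instances per host in a hypothetical edge set. It interprets a cardinality of N as “at least N”, which is also what the “neq” formulation would match, since extra instances do not prevent two distinct ones from being bound; whether the proposal means “exactly N” is still open. `hosts_matching` is an illustrative name only.

```python
from collections import Counter


def hosts_matching(contains: set, relationship: dict) -> list:
    """Sources with at least target_cardinality 'contains' targets (default 1)."""
    cardinality = relationship.get("target_cardinality", 1)
    counts = Counter(source for source, _ in contains)
    return sorted(host for host, count in counts.items() if count >= cardinality)


CONTAINS = {("host1", "vm1"), ("host1", "vm2"), ("host2", "vm3")}
option_a = {"source": "host", "target": "instance", "target_cardinality": 2,
            "relationship_type": "contains",
            "template_id": "host_contains_two_instances_A"}
print(hosts_matching(CONTAINS, option_a))  # ['host1']
print(hosts_matching(CONTAINS, {"relationship_type": "contains"}))  # ['host1', 'host2']
```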