OS-Faults 0.2.7¶
OpenStack fault-injection library
The OS-Faults library provides a unified abstract API for performing destructive actions against an OpenStack cloud. The library is extensible via drivers. Basic drivers for DevStack, Linux services, and power management are already included.
Contents¶
Quickstart¶
This section describes how to start using os-faults.
Installation¶
At the command line:
$ pip install os-faults
Or, if you have virtualenvwrapper installed:
$ mkvirtualenv os-faults
$ pip install os-faults
The library contains an optional libvirt driver. If you plan to use it, install os-faults with the extra dependencies:
$ pip install os-faults[libvirt]
The library relies on Ansible, which needs to be installed separately. Please refer to https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html for installation instructions.
Basics¶
Configuration file¶
The cloud deployment configuration schema has simple YAML/JSON format:
cloud_management:
driver: devstack
args:
address: 192.168.1.240
auth:
username: stack
private_key_file: cloud_key
iface: enp0s8
power_managements:
- driver: libvirt
args:
connection_uri: qemu+ssh://ubuntu@10.0.1.50/system
By default, the library reads its configuration from a file named os-faults.json, os-faults.yaml, or os-faults.yml. The configuration file is searched for in the following default locations:
- current directory
- ~/.config/os-faults
- /etc/openstack
The name of the configuration file can also be specified via the OS_FAULTS_CONFIG environment variable:
$ export OS_FAULTS_CONFIG=/home/alex/my-os-faults-config.yaml
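With the variable set, a connection can be established without passing a file name explicitly. A minimal sketch, assuming the configuration is picked up from OS_FAULTS_CONFIG or one of the default locations above:

import os_faults

# No explicit path: os-faults falls back to OS_FAULTS_CONFIG and the
# default search locations listed above.
cloud_management = os_faults.connect()
cloud_management.verify()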
Execution¶
Establish a connection to the cloud and verify it:
import os_faults
cloud_management = os_faults.connect(config_filename='os-faults.yaml')
cloud_management.verify()
or via CLI:
$ os-faults verify -c os-faults.yaml
Perform a destructive action:
cloud_management.get_service(name='keystone').restart()
or via CLI:
$ os-inject-fault -c os-faults.yaml restart keystone service
API¶
The library operates with the following types of objects:
- service - software that runs in the cloud, e.g. nova-api
- container - a container that runs in the cloud, e.g. neutron-api
- node - a node that hosts the cloud, e.g. a hardware server with a hostname
Human API¶
Human API is used to specify faults as normal English sentences.
import os_faults
cloud_management = os_faults.connect(config_filename='os-faults.yaml')
os_faults.human_api(cloud_management, 'restart keystone service')
A service-oriented command performs the specified action against a service on all nodes, on one random node, or on the node specified by FQDN:
<action> <service> service [on (random|one|single|<fqdn> node[s])]
Examples:
- "Restart Keystone service" - restarts the Keystone service on all nodes.
- "kill nova-api service on one node" - kills Nova API on one randomly-picked node.
A node-oriented command performs the specified action on the node specified by FQDN or on a set of a service's nodes:
<action> [random|one|single|<fqdn>] node[s] [with <service> service]
Examples:
- "Reboot one node with mysql" - reboots one random node with MySQL.
- "Reset node-2.domain.tld node" - resets node node-2.domain.tld.
A network-oriented command is a subset of node-oriented commands and performs a network management operation on selected nodes:
<action> <network> network on [random|one|single|<fqdn>] node[s]
[with <service> service]
Examples:
- "Disconnect management network on nodes with rabbitmq service" - shuts down the management network interface on all nodes where rabbitmq runs.
- "Connect storage network on node-1.domain.tld node" - enables the storage network interface on node-1.domain.tld.
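The same commands can be issued programmatically. A short sketch combining the three command forms, reusing the examples from this section:

import os_faults

cloud_management = os_faults.connect(config_filename='os-faults.yaml')

# Service-oriented: acts on the service on all nodes.
os_faults.human_api(cloud_management, 'restart keystone service')

# Node-oriented: acts on one randomly-picked node running MySQL.
os_faults.human_api(cloud_management, 'reboot one node with mysql service')

# Network-oriented: acts on the management interface of rabbitmq nodes.
os_faults.human_api(cloud_management,
                    'disconnect management network on nodes with rabbitmq service')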
Extended API¶
1. Service actions¶
Get a service and restart it:
cloud_management = os_faults.connect(cloud_config)
service = cloud_management.get_service(name='glance-api')
service.restart()
Available actions:
- start - start Service
- terminate - terminate Service gracefully
- restart - restart Service
- kill - terminate Service abruptly
- unplug - unplug Service out of network
- plug - plug Service into network
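The other actions follow the same pattern. A sketch, reusing the glance-api service from the example above:

service = cloud_management.get_service(name='glance-api')
service.kill()     # terminate the service abruptly
service.unplug()   # block the service's network traffic
service.plug()     # restore the service's network traffic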
2. Container actions¶
Get a container and restart it:
cloud_management = os_faults.connect(cloud_config)
container = cloud_management.get_container(name='neutron_api')
container.restart()
Available actions:
- start - start Container
- terminate - terminate Container gracefully
- restart - restart Container
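The remaining container actions work the same way. A sketch, reusing the container from the example above:

container.terminate()  # stop the container gracefully
container.start()      # start it again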
3. Node actions¶
Get all nodes in the cloud and reboot them:
nodes = cloud_management.get_nodes()
nodes.reboot()
Available actions:
- reboot - reboot all nodes gracefully
- poweroff - power off all nodes abruptly
- reset - reset (cold restart) all nodes
- disconnect - disable network with the specified name on all nodes
- connect - enable network with the specified name on all nodes
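Network actions take the network name as an argument. A sketch, assuming the cloud defines a network named management:

nodes = cloud_management.get_nodes()
nodes.disconnect(network_name='management')  # disable the interface
nodes.connect(network_name='management')     # enable it again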
4. Operate with nodes¶
Get all nodes where a service runs, pick one of them, and reset it:
nodes = service.get_nodes()
one = nodes.pick()
one.reset()
Get nodes where l3-agent runs and disable the management network on them:
fqdns = neutron.l3_agent_list_hosting_router(router_id)
nodes = cloud_management.get_nodes(fqdns=fqdns)
nodes.disconnect(network_name='management')
5. Operate with services¶
Restart a service on a single node:
service = cloud_management.get_service(name='keystone')
nodes = service.get_nodes().pick()
service.restart(nodes)
Configuration specification¶
The configuration file contains the following parameters:
- cloud_management
- power_managements
- node_discover
- services
- containers
Each parameter specifies a driver or a list of drivers.
Example configuration:
cloud_management:
driver: devstack
args:
address: 192.168.1.240
auth:
username: ubuntu
iface: enp0s3
power_managements:
- driver: libvirt
args:
connection_uri: qemu+ssh://ubuntu@10.0.1.50/system
- driver: ipmi
args:
fqdn_to_bmc:
node-1.domain.tld:
address: 120.10.30.65
username: alex
password: super-secret
node_discover:
driver: node_list
args:
- fqdn: node-1.domain.tld
ip: 192.168.1.240
mac: 1e:24:c3:75:dd:2c
services:
glance-api:
driver: screen
args:
grep: glance-api
window_name: g-api
hosts:
- 192.168.1.240
containers:
neutron_api:
driver: docker_container
args:
container_name: neutron_api
cloud_management¶
This parameter specifies the cloud management driver and its arguments. cloud_management is responsible for configuring connections to the nodes and contains arguments such as SSH username/password/key/proxy.
cloud_management:
driver: devstack # name of the driver
args: # arguments for the driver
address: 192.168.1.240
auth:
username: ubuntu
iface: enp0s3
Drivers can support discovery of cloud nodes. For example, the saltcloud driver can discover information about nodes through the master/config node of the cloud.
List of supported drivers for cloud_management: Cloud management
power_managements¶
This parameter specifies a list of power management drivers. Such drivers allow controlling the power state of cloud nodes.
power_managements:
- driver: libvirt # name of the driver
args: # arguments for the driver
connection_uri: qemu+ssh://ubuntu@10.0.1.50/system
- driver: ipmi # name of the driver
args: # arguments for the driver
fqdn_to_bmc:
node-1.domain.tld:
address: 120.10.30.65
username: alex
password: super-secret
List of supported drivers for power_managements: Power management
node_discover¶
This parameter specifies the node discovery driver. node_discover is responsible for fetching the list of hosts for the cloud. If node_discover is specified in the configuration, then cloud_management only controls connection options to the nodes.
node_discover:
driver: node_list
args:
- fqdn: node-1.domain.tld
ip: 192.168.1.240
mac: 1e:24:c3:75:dd:2c
List of supported drivers for node_discover: Node discover
services¶
This parameter specifies a list of services and their types. It allows updating or adding services beyond those embedded in the cloud_management driver.
services:
glance-api: # name of the service
driver: screen # name of the service driver
args: # arguments for the driver
grep: glance-api
window_name: g-api
hosts: # list of hosts where this service is running
- 192.168.1.240
mysql: # name of the service
driver: process # name of the service driver
args: # arguments for the driver
grep: mysqld
port:
- tcp
- 3307
restart_cmd: sudo service mysql restart
start_cmd: sudo service mysql start
terminate_cmd: sudo service mysql stop
A service driver accepts an optional hosts parameter that controls discovery of the hosts where the service is running. If hosts is specified, service discovery is disabled for that service and the listed hosts are used; otherwise, the service is searched for across all nodes.
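Because the configuration is plain YAML/JSON, the equivalent dict can also be built in code and passed to os_faults.connect. A sketch, assuming a DevStack cloud and a custom process-based service (the address, key file, and commands are placeholders):

import os_faults

cloud_config = {
    'cloud_management': {
        'driver': 'devstack',
        'args': {
            'address': '192.168.1.240',
            'auth': {'username': 'stack', 'private_key_file': 'cloud_key'},
        },
    },
    'services': {
        'my-app': {
            'driver': 'process',
            'args': {'grep': 'my_app', 'restart_cmd': '/bin/my_app --restart'},
            'hosts': ['192.168.1.240'],  # disables discovery for this service
        },
    },
}

cloud_management = os_faults.connect(cloud_config)
cloud_management.get_service(name='my-app').restart()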
List of supported drivers for services: Service drivers
containers¶
This parameter specifies a list of containers and their types. It allows updating or adding containers beyond those embedded in the cloud_management driver.
containers:
neutron_api: # name of the container
driver: docker_container # name of the container driver
args: # arguments for the driver
container_name: neutron_api
hosts: # list of hosts where this container is running
- 192.168.1.240
A container driver accepts an optional hosts parameter that controls discovery of the hosts where the container is running. If hosts is specified, container discovery is disabled for that container and the listed hosts are used; otherwise, the container is searched for across all nodes.
List of supported drivers for containers: Container drivers
OS-Faults + Rally¶
The combination of OS-Faults and Rally gives a powerful tool for testing OpenStack high availability and failover under load.
Fault injection is implemented with the help of the Rally Fault Injection Hook. The following is an example of a Rally scenario that performs Keystone authentication while restarting one of the Memcached services:
---
Authenticate.keystone:
-
runner:
type: "constant_for_duration"
duration: 30
concurrency: 5
context:
users:
tenants: 1
users_per_tenant: 1
hooks:
-
name: fault_injection
args:
action: restart memcached service on one node
trigger:
name: event
args:
unit: iteration
at: [100]
The moment of fault injection can be specified as an iteration number or as a time relative to the beginning of the test:
trigger:
name: event
args:
unit: time
at: [10]
The action parameter contains the fault specification in human-friendly format; see Human API for details.
More on reliability testing of OpenStack:
- Reliability Test Plan in OpenStack performance documentation
- Keystone authentication with restart memcached report collected in OpenStack deployed by Fuel
- Introduction into reliability metrics video cast
CLI reference¶
os-inject-fault¶
usage: os-inject-fault [-h] [-c CONFIG] [-d] [-v] [command]
positional arguments:
command fault injection command, e.g. "restart keystone
service"
optional arguments:
-h, --help show this help message and exit
-c CONFIG, --config CONFIG
path to os-faults cloud connection config
-d, --debug
-v, --verify verify connection to the cloud
Built-in drivers:
devstack - DevStack driver
docker_container - Docker container
ipmi - IPMI power management driver
libvirt - Libvirt power management driver
node_list - Reads hosts from configuration file
process - Service as process
salt_service - Service in salt
saltcloud - SaltCloud management driver
system_service - System Service (systemd, upstart, SysV, etc.)
universal - Universal cloud management driver
*Service-oriented* commands perform specified action against service on
all, on one random node or on the node specified by FQDN:
<action> <service> service [on (random|one|single|<fqdn> node[s])]
where:
action is one of:
freeze - Pause service execution
kill - Terminate Service abruptly on all nodes or on particular subset
plug - Plug Service into network on all nodes or on particular subset
restart - Restart Service on all nodes or on particular subset
start - Start Service on all nodes or on particular subset
terminate - Terminate Service gracefully on all nodes or on particular subset
unfreeze - Resume service execution
unplug - Unplug Service out of network on all nodes or on particular subset
service is one of supported by driver:
devstack: cinder-api, cinder-scheduler, cinder-volume, etcd, glance-api,
heat-api, heat-engine, keystone, memcached, mysql,
neutron-dhcp-agent, neutron-l3-agent,
neutron-meta-agent, neutron-openvswitch-agent,
neutron-server, nova-api, nova-compute,
nova-scheduler, placement-api, rabbitmq
saltcloud: cinder-api, cinder-backup, cinder-scheduler, cinder-volume,
elasticsearch, glance-api, glance-glare,
glance-registry, grafana-server, heat-api,
heat-engine, horizon, influxdb, keystone, kibana,
memcached, mysql, nagios3, neutron-dhcp-agent,
neutron-l3-agent, neutron-metadata-agent,
neutron-openvswitch-agent, neutron-server, nova-api,
nova-cert, nova-compute, nova-conductor,
nova-consoleauth, nova-novncproxy, nova-scheduler,
rabbitmq
universal: /no built-in support/
Examples:
* "Restart Keystone service" - restarts Keystone service on all nodes.
* "kill nova-api service on one node" - restarts Nova API on one
randomly-picked node.
*Node-oriented* commands perform specified action on node specified by FQDN
or set of service's nodes:
<action> [random|one|single|<fqdn>] node[s] [with <service> service]
where:
action is one of:
connect - Connect nodes to <network_name> network
disconnect - Disconnect nodes from <network_name> network
poweroff - Power off all nodes abruptly
poweron - Power on all nodes abruptly
reboot - Reboot all nodes gracefully
reset - Reset (cold restart) all nodes
shutdown - Shutdown all nodes gracefully
stress - Stress node OS and hardware
service is one of supported by driver:
devstack: cinder-api, cinder-scheduler, cinder-volume, etcd, glance-api,
heat-api, heat-engine, keystone, memcached, mysql,
neutron-dhcp-agent, neutron-l3-agent,
neutron-meta-agent, neutron-openvswitch-agent,
neutron-server, nova-api, nova-compute,
nova-scheduler, placement-api, rabbitmq
saltcloud: cinder-api, cinder-backup, cinder-scheduler, cinder-volume,
elasticsearch, glance-api, glance-glare,
glance-registry, grafana-server, heat-api,
heat-engine, horizon, influxdb, keystone, kibana,
memcached, mysql, nagios3, neutron-dhcp-agent,
neutron-l3-agent, neutron-metadata-agent,
neutron-openvswitch-agent, neutron-server, nova-api,
nova-cert, nova-compute, nova-conductor,
nova-consoleauth, nova-novncproxy, nova-scheduler,
rabbitmq
universal: /no built-in support/
Examples:
* "Reboot one node with mysql" - reboots one random node with MySQL.
* "Reset node-2.domain.tld node" - reset node node-2.domain.tld.
*Network-oriented* commands are subset of node-oriented and perform network
management operation on selected nodes:
[connect|disconnect] <network> network on [random|one|single|<fqdn>] node[s]
[with <service> service]
where:
network is one of supported by driver:
devstack: all-in-one
saltcloud: /no built-in support/
universal: /no built-in support/
service is one of supported by driver:
devstack: cinder-api, cinder-scheduler, cinder-volume, etcd, glance-api,
heat-api, heat-engine, keystone, memcached, mysql,
neutron-dhcp-agent, neutron-l3-agent,
neutron-meta-agent, neutron-openvswitch-agent,
neutron-server, nova-api, nova-compute,
nova-scheduler, placement-api, rabbitmq
saltcloud: cinder-api, cinder-backup, cinder-scheduler, cinder-volume,
elasticsearch, glance-api, glance-glare,
glance-registry, grafana-server, heat-api,
heat-engine, horizon, influxdb, keystone, kibana,
memcached, mysql, nagios3, neutron-dhcp-agent,
neutron-l3-agent, neutron-metadata-agent,
neutron-openvswitch-agent, neutron-server, nova-api,
nova-cert, nova-compute, nova-conductor,
nova-consoleauth, nova-novncproxy, nova-scheduler,
rabbitmq
universal: /no built-in support/
Examples:
* "Disconnect management network on nodes with rabbitmq service" - shuts
down management network interface on all nodes where rabbitmq runs.
* "Connect storage network on node-1.domain.tld node" - enables storage
network interface on node-1.domain.tld.
For more details please refer to docs: http://os-faults.readthedocs.io/
os-faults¶
Usage: os-faults [OPTIONS] COMMAND [ARGS]...
Options:
-d, --debug Enable debug logs
--version Show version and exit.
--help Show this message and exit.
Commands:
discover Discover services/nodes and save them to output config file
drivers List os-faults drivers
nodes List cloud nodes
verify Verify connection to the cloud
os-faults verify¶
Usage: os-faults verify [OPTIONS]
Verify connection to the cloud
Options:
-c, --config FILE path to os-faults cloud connection config
--help Show this message and exit.
os-faults discover¶
Usage: os-faults discover [OPTIONS] OUTPUT
Discover services/nodes and save them to output config file
Options:
-c, --config FILE path to os-faults cloud connection config
--help Show this message and exit.
os-faults nodes¶
Usage: os-faults nodes [OPTIONS]
List cloud nodes
Options:
-c, --config FILE path to os-faults cloud connection config
--help Show this message and exit.
os-faults drivers¶
Usage: os-faults drivers [OPTIONS]
List os-faults drivers
Options:
--help Show this message and exit.
Drivers¶
Cloud management¶
universal [CloudManagement]¶
Universal cloud management driver
This driver is suitable for the most abstract (and thus universal) case. The driver does not have any built-in services or node discovery capabilities. All services need to be listed explicitly in the config file. The node list is specified using the node_list node discovery driver.
Example of multi-node configuration:
Note that in this configuration a node discovery driver is required.
cloud_management:
driver: universal
node_discover:
driver: node_list
args:
- ip: 192.168.5.149
auth:
username: developer
private_key_file: cloud_key
become_password: my_secret_password
- ip: 192.168.5.150
auth:
username: developer
private_key_file: cloud_key
become_password: my_secret_password
devstack [CloudManagement, NodeDiscover]¶
Driver for DevStack.
This driver requires DevStack installed with systemd (USE_SCREEN=False). It supports discovery of node MAC addresses.
Example configuration:
cloud_management:
driver: devstack
args:
address: 192.168.1.10
auth:
username: ubuntu
password: ubuntu_pass
private_key_file: ~/.ssh/id_rsa_devstack
iface: eth1
parameters:
- address - ip address of any devstack node
- username - username for all nodes
- password - password for all nodes (optional)
- private_key_file - path to key file (optional)
- iface - network interface name to retrieve mac address (optional)
Default services:
- cinder-api
- cinder-scheduler
- cinder-volume
- etcd
- glance-api
- heat-api
- heat-engine
- keystone
- memcached
- mysql
- neutron-dhcp-agent
- neutron-l3-agent
- neutron-meta-agent
- neutron-openvswitch-agent
- neutron-server
- nova-api
- nova-compute
- nova-scheduler
- placement-api
- rabbitmq
saltcloud [CloudManagement, NodeDiscover]¶
Driver for OpenStack cloud managed by Salt.
Supports discovery of slave nodes.
Example configuration:
cloud_management:
driver: saltcloud
args:
address: 192.168.1.10
auth:
username: root
password: root_pass
private_key_file: ~/.ssh/id_rsa_tcpcloud
slave_auth:
username: ubuntu
password: ubuntu_pass
become_username: root
slave_name_regexp: ^(?!cfg|mon)
slave_direct_ssh: True
get_ips_cmd: pillar.get _param:single_address
parameters:
- address - ip address of salt config node
- username - username for salt config node
- password - password for salt config node (optional)
- private_key_file - path to key file (optional)
- slave_username - username for salt minions (optional); username is used if slave_username is not specified
- slave_password - password for salt minions (optional); password is used if slave_password is not specified
- master_sudo - Use sudo on salt config node (optional)
- slave_sudo - Use sudo on salt minion nodes (optional)
- slave_name_regexp - regexp for minion FQDNs (optional)
- slave_direct_ssh - if False then salt master is used as ssh proxy (optional)
- get_ips_cmd - salt command to get IPs of minions (optional)
- serial - how many hosts Ansible should manage at a single time (optional, default: 10)
Default services:
- cinder-api
- cinder-backup
- cinder-scheduler
- cinder-volume
- elasticsearch
- glance-api
- glance-glare
- glance-registry
- grafana-server
- heat-api
- heat-engine
- horizon
- influxdb
- keystone
- kibana
- memcached
- mysql
- nagios3
- neutron-dhcp-agent
- neutron-l3-agent
- neutron-metadata-agent
- neutron-openvswitch-agent
- neutron-server
- nova-api
- nova-cert
- nova-compute
- nova-conductor
- nova-consoleauth
- nova-novncproxy
- nova-scheduler
- rabbitmq
Power management¶
libvirt [PowerDriver]¶
Libvirt driver.
Example configuration:
power_managements:
- driver: libvirt
args:
connection_uri: qemu+unix:///system
parameters:
- connection_uri - libvirt uri
Note that the libvirt domain name should be specified as a node attribute. Refer to node discovery (the node_list driver) for details.
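Once a power management driver is configured, power actions become available on node collections. A sketch, given a cloud_management object returned by os_faults.connect:

nodes = cloud_management.get_nodes()
nodes.poweroff()  # power off abruptly via the configured power driver
nodes.poweron()   # power the nodes back on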
ipmi [PowerDriver]¶
IPMI driver.
Example configuration:
power_managements:
- driver: ipmi
args:
mac_to_bmc:
aa:bb:cc:dd:ee:01:
address: 170.0.10.50
username: admin1
password: Admin_123
aa:bb:cc:dd:ee:02:
address: 170.0.10.51
username: admin2
password: Admin_123
fqdn_to_bmc:
node3.local:
address: 170.0.10.52
username: admin1
password: Admin_123
parameters:
- mac_to_bmc - mapping where keys are node MAC addresses and values are the corresponding BMC configurations with the following fields:
  - address - ip address of IPMI server
  - username - IPMI user
  - password - IPMI password
- fqdn_to_bmc - mapping like mac_to_bmc, but keyed by node FQDN (see the example above)
Node discover¶
node_list [NodeDiscover]¶
Node list.
Allows specifying a list of nodes in the configuration.
Example configuration:
node_discover:
driver: node_list
args:
- ip: 10.0.0.51
mac: aa:bb:cc:dd:ee:01
fqdn: node1.local
libvirt_name: node1
- ip: 192.168.1.50
mac: aa:bb:cc:dd:ee:02
fqdn: node2.local
auth:
username: user1
password: secret1
jump:
host: 10.0.0.52
username: ubuntu
private_key_file: /path/to/file
- ip: 10.0.0.53
mac: aa:bb:cc:dd:ee:03
fqdn: node3.local
become_password: my_secret_password
node parameters:
- ip - ip/host of the node
- mac - MAC address of the node (optional); used by the libvirt driver.
- fqdn - FQDN of the node (optional); used for filtering only.
- libvirt_name - Libvirt domain name (optional).
- auth - SSH related parameters (optional):
- username - SSH username (optional)
- password - SSH password (optional)
- private_key_file - SSH key file (optional)
- become_password - privilege escalation password (optional)
- jump - SSH proxy parameters (optional):
- host - SSH proxy host
- username - SSH proxy user
- private_key_file - SSH proxy key file (optional)
Service drivers¶
process [Service]¶
Service as process
“process” is a basic service driver that uses ps and kill in actions like kill / freeze / unfreeze. Commands for start / restart / terminate should be specified in the configuration; otherwise, these actions will fail at runtime.
Example configuration:
services:
app:
driver: process
args:
grep: my_app
restart_cmd: /bin/my_app --restart
terminate_cmd: /bin/stop_my_app
start_cmd: /bin/my_app
port: ['tcp', 4242, 'ingress']
parameters:
- grep - regexp for grep to find process PID
- restart_cmd - command to restart service (optional)
- terminate_cmd - command to terminate service (optional)
- start_cmd - command to start service (optional)
- port - tuple with two or three values - protocol, port number, direction (optional)
Note that network operations are based on iptables. They are applied to the whole host and not restricted to a single process.
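With port configured, plug/unplug act on that port through iptables. A sketch, reusing the app service from the example above (direction and other_port are optional, as described in the API reference):

service = cloud_management.get_service(name='app')
service.unplug(direction='ingress')  # block incoming traffic to the port
service.plug(direction='ingress')    # allow it again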
system_service [ServiceAsProcess]¶
System service
This is a universal driver for any system service supported by Ansible (e.g. systemd, upstart). Please refer to the Ansible documentation at http://docs.ansible.com/ansible/latest/service_module.html for the full list.
Example configuration:
services:
app:
driver: system_service
args:
service_name: app
grep: my_app
port: ['tcp', 4242, 'ingress']
parameters:
- service_name - name of a service
- grep - regexp for grep to find process PID
- port - tuple with two or three values - protocol, port number, direction (optional)
salt_service [ServiceAsProcess]¶
Salt service
Service that can be controlled by salt service.* commands.
Example configuration:
services:
app:
driver: salt_service
args:
salt_service: app
grep: my_app
port: ['tcp', 4242, 'egress']
parameters:
- salt_service - name of a service
- grep - regexp for grep to find process PID
- port - tuple with two or three values - protocol, port number, direction (optional)
Container drivers¶
docker_container [Container]¶
Docker container
This is a Docker container driver for any container supported by Ansible. Please refer to the Ansible documentation at https://docs.ansible.com/ansible/latest/modules/docker_container_module.html for the full list.
Example configuration:
containers:
app:
driver: docker_container
args:
container_name: app
parameters:
- container_name - name of the container
API Reference¶
os_faults.connect(cloud_config=None, config_filename=None)¶
Connects to the cloud.
Parameters:
- cloud_config – dict with cloud and power management params
- config_filename – name of the file to read the config from
Returns: CloudManagement object
os_faults.discover(cloud_config)¶
Connect to the cloud and discover nodes and services.
Parameters: cloud_config – dict with cloud and power management params
Returns: config dict with discovered nodes/services
os_faults.human_api(cloud_management, command)¶
Executes a command written as an English sentence.
Parameters:
- cloud_management – library instance as returned by the connect function
- command – text command
os_faults.register_ansible_modules(paths)¶
Registers Ansible modules from the provided paths. Allows using custom Ansible modules in the NodeCollection.run_task method.
Parameters: paths – list of paths to folders with Ansible modules
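A sketch of registering custom modules and invoking one via run_task (the module path and module name are hypothetical):

import os_faults

os_faults.register_ansible_modules(['/path/to/my/modules'])

cloud_management = os_faults.connect(config_filename='os-faults.yaml')
nodes = cloud_management.get_nodes()
# 'my_module' is the hypothetical custom Ansible module registered above.
result = nodes.run_task({'my_module': {'some_arg': 'value'}})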
class os_faults.api.cloud_management.CloudManagement¶

execute_on_cloud(hosts, task, raise_on_error=True)¶
Execute a task on specified hosts within the cloud.
Parameters:
- hosts – list of host FQDNs
- task – Ansible task
- raise_on_error – throw exception in case of error
Returns: Ansible execution result (list of records)

get_container(name)¶
Get the container with the specified name.
Parameters: name – name of the container
Returns: Container

get_nodes(fqdns=None)¶
Get nodes in the cloud. Returns a NodesCollection representing all nodes in the cloud, or only those with the specified FQDNs.
Parameters: fqdns – list of FQDNs, or None to retrieve all nodes
Returns: NodesCollection

get_service(name)¶
Get the service with the specified name.
Parameters: name – name of the service
Returns: Service

classmethod list_supported_networks()¶
Lists all networks supported by nodes returned by this driver.
Returns: [String] list of network names

classmethod list_supported_services()¶
Lists all services supported by this driver.
Returns: [String] list of service names

verify()¶
Verify connection to the cloud.
class os_faults.api.service.Service(service_name, config, node_cls, cloud_management, hosts=None)¶

discover_nodes()¶
Discover nodes where this Service is running.
Returns: NodesCollection

freeze(nodes=None, sec=None)¶
Pause service execution. Sends SIGSTOP to the Service on all nodes or on a particular subset. If sec is defined, the Service is paused for that number of seconds.
Parameters:
- nodes – NodesCollection
- sec – int

get_nodes()¶
Get nodes where this Service is running.
Returns: NodesCollection

kill(nodes=None)¶
Terminate Service abruptly on all nodes or on a particular subset.
Parameters: nodes – NodesCollection

plug(nodes=None, direction=None, other_port=None)¶
Plug Service into network on all nodes or on a particular subset.
Parameters:
- nodes – NodesCollection
- direction – str, traffic direction ‘ingress’ or ‘egress’
- other_port – int, port number which needs to be allowed

restart(nodes=None)¶
Restart Service on all nodes or on a particular subset.
Parameters: nodes – NodesCollection

start(nodes=None)¶
Start Service on all nodes or on a particular subset.
Parameters: nodes – NodesCollection

terminate(nodes=None)¶
Terminate Service gracefully on all nodes or on a particular subset.
Parameters: nodes – NodesCollection

unfreeze(nodes=None)¶
Resume service execution. Sends SIGCONT to the Service on all nodes or on a particular subset.
Parameters: nodes – NodesCollection

unplug(nodes=None, direction=None, other_port=None)¶
Unplug Service out of network on all nodes or on a particular subset.
Parameters:
- nodes – NodesCollection
- direction – str, traffic direction ‘ingress’ or ‘egress’
- other_port – int, port number which needs to be blocked
class os_faults.api.container.Container(container_name, config, node_cls, cloud_management, hosts=None)¶

discover_nodes()¶
Discover nodes where this Container is running.
Returns: NodesCollection

get_nodes()¶
Get nodes where this Container is running.
Returns: NodesCollection

restart(nodes=None)¶
Restart Container on all nodes or on a particular subset.
Parameters: nodes – NodesCollection

start(nodes=None)¶
Start Container on all nodes or on a particular subset.
Parameters: nodes – NodesCollection

terminate(nodes=None)¶
Terminate Container gracefully on all nodes or on a particular subset.
Parameters: nodes – NodesCollection
class os_faults.api.node_collection.NodeCollection(cloud_management=None, hosts=None)¶

connect(network_name)¶
Connect nodes to the <network_name> network.
Parameters: network_name – name of the network

disconnect(network_name)¶
Disconnect nodes from the <network_name> network.
Parameters: network_name – name of the network

pick(count=1)¶
Pick one Node out of the collection.
Returns: NodeCollection consisting of just one node

poweroff()¶
Power off all nodes abruptly.

poweron()¶
Power on all nodes abruptly.

reboot()¶
Reboot all nodes gracefully.

reset()¶
Reset (cold restart) all nodes.

revert(snapshot_name, resume=True)¶
Revert a snapshot on all nodes.

run_task(task, raise_on_error=True)¶
Run an Ansible task on the node collection.
Parameters:
- task – Ansible task as dict
- raise_on_error – throw exception in case of error
Returns: AnsibleExecutionRecord with results of the task

shutdown()¶
Shutdown all nodes gracefully.

snapshot(snapshot_name, suspend=True)¶
Create a snapshot on all nodes.

stress(target, duration=None)¶
Stress node OS and hardware.
Contributing¶
If you would like to contribute to the development of OpenStack, you must follow the steps described in the OpenStack contributor documentation.
If you already have a good understanding of how the system works and your OpenStack accounts are set up, you can skip to the development workflow section of that documentation to learn how changes to OpenStack should be submitted for review via the Gerrit tool.
Note that the primary repo is https://opendev.org/performa/os-faults/. Repos located on GitHub are mirrors and may be out of sync.
The project bug tracker is on Launchpad.
Release Notes¶
CHANGES¶
0.2.7¶
- Fix extended plug/unplug commands
0.2.6¶
- Extend network commands to operate with inter-service connectivity
- Add a job to mirror the code from OpenDev to Github
- Split requirements for py27 and >py36
0.2.5¶
- Update project links and package metadata
- Update jsonschema according to requirements
0.2.4¶
- Document how to use libvirt driver
- Fix reboot command
- Remove oom method since it has never been implemented
- Fix representation of signal.SIG* constants
0.2.3¶
- Re-implement network faults using standard iptables module
- Update constraints handling
0.2.2¶
- Fix service discovery
- Fix devstack and py27 jobs
- Update repo address in readme file
- OpenDev Migration Patch
0.2.1¶
- Declare support of python 3.7
- add python 3.7 unit test job
- Unpin pytest and its plugins
- Fix key management in devstack job
- Add DevStack plugin
0.2.0¶
- Use bindep to specify binary dependencies
- Update hacking version
- Add integration test for devstack cloud management driver
- Tests cleanup
- Rename `tcpcloud` driver into `saltcloud`
- Unify auth parameters between drivers
- Specify auth parameters in devstack tests
- Remove deprecated methods
- Simplify universal driver
- Modify integration tests job to run on DevStack
- Change openstack-dev to openstack-discuss
- Remove old drivers
- Do not link with Ansible code
- Fix README markup
- Optimizing the safety of the http link site in HACKING.rst
- Move Zuul jobs from global project-config to the repo
0.1.18¶
- Specify webhook id for documentation build
- Update containers documentation
- Add docker containers support
- Typo fix in universal driver documentation
- fix tox python3 overrides
- Replaces yaml.load() with yaml.safe_load()
- Zuul: Remove project name
- Add section describing OS-Faults and Rally integration
- Update documentation and examples
0.1.17¶
- Add integration job for Zuul
- Fix custom module support in Ansible 2.4
- Add glance-glare service to the tcpdriver
0.1.16¶
- Fix for Ansible >= 2.4.0
- Add logging into human API call
- Universal cloud management driver
- Universal driver for any system services (systemd, upstart, etc.)
0.1.15¶
- Upper constraint ansible to fix library
- Move all service drivers under services/ package
- Refactor services module
- Make privilege escalation password configurable
- Group drivers modules by type
- Fix process name patterns in DevStack driver
- Implement stress injection
0.1.14¶
- Add Devstack Systemd driver
- Switch from oslosphinx to openstackdocstheme
0.1.13¶
- [trivial] Several typos fix in tcpcloud driver
- [trivial] Several typo fixes in devstack driver
- Skip checking of Devstack hosts
0.1.12¶
- Auth configuration for each node
- Allow adding libvirt name for node
- Add serial parameter to cloud drivers
- [docs] Add documentation for config file
- [CLI] Add os-faults discover command
- [Core] Predefined ips for services in config
- [Core] Allow adding services in config
- [Core] Services as drivers
- Add ‘slave_direct_ssh’ parameter to fuel driver
0.1.11¶
- Bump ansible version
- Add ConnectTimeout=60 to default ssh args
- Allow binding BMC by node fqdn in ipmi driver
- Add shutdown method to all power drivers
- Update requirements
- Add monitoring services to tcpcloud driver
- Load modules before creating ansible playbook
0.1.10¶
- Remove brackets from Service.GREP variables
- Add password auth to devstack and tcpcloud
- Add docs for drivers
- [tcpcloud] Additional configuration parameters
- Allow usage of multiple power drivers at once
- Add node_list driver
- Use filter in get_nodes
0.1.9¶
- Add ironic services to devstack driver
- Support for multi-node devstack
- Add neutron services to tcpcloud driver
- Allow filtering of node collection
- Skip monitoring nodes in TcpCloud
0.1.8¶
- New configuration parameters for TCPCloud driver
- Add glance-registry and cinder-backup to TcpCloud driver
- Add missing Nova services to TcpCloud driver
- Search all nodes except master on TcpCloud driver
- Add Swift services to Fuel driver
- Add Cinder and Glance services to Fuel driver
- Add Nova services to fuel driver
- Add destroy before revert when using libvirt driver
0.1.7¶
- Add neutron agents to Fuel driver
- Add neutron-server
- Add snapshot and revert to libvirt driver
0.1.6¶
- Extend tcpcloud configuration
- Add start/terminate for all services
- Fix ssh to slave nodes
- Ironic service for Fuel driver
0.1.5¶
- Revert “Change requirement link to sphinxcontrib-programoutput”
0.1.4¶
- horizon and cinder-volume service for TCPCloud and Fuel
- Change requirement link to sphinxcontrib-programoutput
- Add Cinder services to TCPCloud and Fuel drivers
0.1.3¶
- Add Services to TCPCloud driver
- Add devstack services
- Support TCPCloud platform
- Add restart command for Fuel MySQL service
- Add support of standard operators to NodeCollection
- Add NodeCollection.get_fqdns method
- Fix RESTART_CMD for Fuel services
- Fix result logs in AnsibleRunner.execute
- Improve logging
- Fixed link to launchpad in docs
- Fix readthedocs build
- Move services to common module and reuse them in libvirt driver
0.1.2¶
- Fix AnsibleRunner.execute
- Return result from NodeCollection.run_task
- Fix release notes build for readthedocs
0.1.1¶
- Allow to run custom ansible modules on NodeCollection
- Fix docs build for readthedocs
- Raise ServiceError in case of unknown service requested
- Decorator for required variables
- Improvements for version
- Move plug/unplug bash command to ansible module
- Make libvirt-python an extra requirement
- Fuel services refactoring
- Fix doc for libvirt driver
- Rename private_key to private_key_file for devstack driver
0.1.0¶
- Configuration validation
- Update configuration
- Add usage and API reference into docs
- Improve help in command-line utility
- Load drivers dynamically
- Fix the issue with getting nodes by FQDNs
- Enable release notes translation
- Unit tests for os_faults.ansible.executor
- Rename and fix network-management commands in NodesCollection
- Add missing unit tests for libvirt and ipmi drivers
- Unit test for os_faults.cmd
- Small cleanup before the first release
- Move unit tests from os_faults/tests/ to os_faults/tests/unit/
- Improve logging for os_faults/drivers/fuel.py module
- Implement reboot action in Fuel driver
- Fix printing large output when getting hosts in debug mode
- Unit test for os_faults.utils
- Restart for nova-api, glance-api services
- Support nova compute/scheduler, neutron L2/L3 agent, heat api/engine
- Fix py35 tests
- Add human-friendly interface
- Unit tests for fuel network management ansible module
- Add memcached service operations
- Fix FuelService._run_task
- Fix RabbitMQ commands
- Allow to view logs from pytest
- Unit tests for devstack driver
- Unit tests for FuelService
- Tests for os_faults.connect
- Unit tests for FuelNodeCollection
- Use pytest to run test in tox
- Fix FuelManagement._retrieve_hosts_fqdn
- Unit tests for FuelManagement
- OS-Faults mechanism to close/open service ports on nodes
- Fix Fuel key distribution script
- Adding unit tests for IPMI driver
- Specify which key to use to access the cloud
- Adding unit tests for libvirt driver
- Docs cleanup
- SIGSTOP/SIGCONT signals for RabbitMQ/MySQL/Keystone-API/Nova-API services
- Read configuration from file or standard location
- Examples cleanup
- Fuel driver code improvements
- Rename os-failures into os-faults
- Update .gitreview post project rename
- Adding threads to IPMI driver
- Sigkill signal for RabbitMQ/MySQL/Keystone-API/Nova-API/Glance-API services
- Require Ansible 2 or higher
- Service discovering
- Adding threads to libvirt driver
- Add IPMI driver
- Adding a simple libvirt driver
- Fix project name in .gitreview
- Docs cleanup
- Run DevStack driver with sudo
- Add sample DevStack driver
- Fix formatting in readme
- Document use cases in readme file
- Add simple network management module
- Cleanup code and more logging
- Remove unnecessary utils module
- Enable logging
- Add network failures API
- Added get_nodes by fqdn
- Document API
- Added key distribution script
- Fix PEP8 errors
- Cleanup Ansible executor
- Add connection verification
- Flatten client configuration
- Added power management
- Add node operations
- Service API prototype
- Added Ansible runner
- Tune tox.ini and update readme
- Initial Cookiecutter Commit