NAME
keepalived.conf - configuration file for Keepalived
Note:
This documentation MUST be considered THE exhaustive source of information for configuring Keepalived. It is supported and maintained by the Keepalived Core-Team.
DESCRIPTION
keepalived.conf is the configuration file which describes all the Keepalived keywords. Keywords are placed in hierarchies of blocks and subblocks, each layer being delimited by ’{’ and ’}’ pairs.
Comments start with ’#’ or ’!’ and run to the end of the line; they can start anywhere in a line.
The keyword ’include’ and variants allow inclusion of other configuration files from within the main configuration file, or from subsequently included files.
The format of the include directive is:
include FILENAME
FILENAME can be a fully qualified or relative pathname, and can include wildcards, including csh style brace expressions such as "{foo/{,cat,dog},bar}" if glob() supports them.
After opening an included file, the current directory is set to the directory of the file itself, so any relative paths included from a file are relative to the directory of the including file itself.
The include variants add additional include checks to the current include_check level (see below). The variants are:
includer FILENAME - same as include_check readable
includem FILENAME - same as include_check match
includew FILENAME - same as include_check wildcard_match
includeb FILENAME - same as include_check brace_match
includea FILENAME - all include_check checks
NOTE: If the libc glob() function does not support GLOB_ALTDIRFUNC (e.g. Musl libc as on Alpine Linux etc.), then only includea, includer and includew of the above options will work.
Why do we want to allow errors? Suppose a configuration has optional files in /etc/keepalived/conf.d; then include /etc/keepalived/conf.d/* could be specified, but it should not produce an error if there are no files in the directory. In this case includer should be used. Otherwise it is sensible to use includea.
Include handling will not work if the include line uses conditional configuration or parameter substitution, since the include keywords are detected before conditional configuration and parameter substitution are processed.
The basic include keyword is retained for backward compatibility, since it does not produce config errors if files could not be opened etc.
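As an illustration of the include variants described above, the following sketch assumes a main configuration file in /etc/keepalived with optional fragments under /etc/keepalived/conf.d. All file names are hypothetical, and the choice of variant simply follows the guidance given above.

```
# Optional fragments: as suggested above, includer is used so that an
# empty conf.d directory is not treated as a configuration error.
includer /etc/keepalived/conf.d/*.conf

# A file that is expected to exist: includea applies all include_check checks.
includea /etc/keepalived/vrrp_instances.conf

# Plain include with a relative path; relative paths are resolved against
# the directory of the including file.
include common/global.conf
```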
PARAMETER SYNTAX
<BOOL> is one of on|off|true|false|yes|no
<TIMER> is a time value in seconds, including fractional seconds, e.g. 2.71828 or 3; the resolution of the timer is microseconds.
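A minimal sketch of how these parameter types might appear in practice, using keywords that this page documents as taking <BOOL> or <TIMER> values. The values are arbitrary, and the SNMP keyword is only available if keepalived was built with SNMP support.

```
global_defs {
    # <BOOL>: any of on|off|true|false|yes|no
    smtp_alert yes
    checker_log_all_failures off

    # <TIMER>: seconds, fractional values allowed (here 2.5 seconds)
    snmp_vs_stats_update_interval 2.5
}
```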
SCRIPTS
Three classes of scripts can be configured to be executed.
(a) Notify scripts that are run when a vrrp instance or vrrp group changes state, or a virtual server quorum changes between up and down.
(b) vrrp tracking scripts that will cause vrrp instances to go down if they exit with a non-zero exit status, or, if a weight is specified, will add the weight to or subtract it from the priority of that vrrp instance.
(c) LVS checker misc scripts that will cause a real server to be configured down if they exit with a non-zero status.
By default the scripts will be executed by user keepalived_script if that user exists, or if not by root, but for each script the user/group under which it is to be executed can be specified.
There are significant security implications if scripts are executed with root privileges, especially if the scripts themselves are modifiable or replaceable by a non root user. Consequently, security checks are made at startup to ensure that if a script is executed by root, then it cannot be modified or replaced by a non root user.
All scripts should be written so that they will terminate on receipt of a SIGTERM signal. Scripts will be sent SIGTERM if their parent terminates, or if keepalived is awaiting the script’s exit status and the script has run for too long.
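The script security settings discussed above could be configured along the following lines. This is only a sketch: it uses the script_user and enable_script_security keywords documented in the global_defs block later on this page, and assumes that a keepalived_script user exists on the system.

```
global_defs {
    # Run scripts as an unprivileged user rather than as root
    # (assumes the user keepalived_script has been created).
    script_user keepalived_script

    # Do not run scripts configured to run as root if any part of
    # their path is writable by a non-root user.
    enable_script_security
}
```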
Quoted strings
Quoted strings are specified between " or ’ characters and strings are delimited by whitespace. In the examples below the ´ characters are not part of the strings and should not be specified:
´abcd" efg h jkl "mnop´
will be the single string:
´abcd efg h jkl mnop´
whereas:
´abcd "efg h jkl" mnop´
will be the three strings:
´abcd´, ´efg h jkl´ and ´mnop´
i.e. the " and ’ characters are removed and any intervening whitespace is retained.
Quoted strings can also have escaped characters, like the shell. \a, \b, \E, \f, \n, \r, \t, \v, \nnn and \xXX (where nnn is up to 3 octal digits, and XX is any sequence of hex digits) and \cC (which produces the control version of character C) are all supported. \C for any other character C is just treated as an escaped version of character C, so \\ is a \ character and \" will be a " character, but it won’t start or terminate a quoted string.
For specifying scripts with parameters, unquoted spaces will separate the parameters. If it is required for a parameter to contain a space, it should be enclosed in single quotes (’).
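As a sketch of the quoting rules above applied to a script with parameters, the example below assumes the whole command is given as one double-quoted string, so that it is not confused with the optional username and groupname that may follow. The script path and arguments are hypothetical.

```
global_defs {
    # Intended to run the script with three parameters:
    #   eth0, "primary link" (containing a space) and up
    startup_script "/usr/local/sbin/ka-startup.sh eth0 'primary link' up"
}
```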
CONFIGURATION PARSER
Traditionally the configuration file parser has not been one of the strengths of keepalived. A lot of effort has been put into correcting this, even though it is not the primary goal of the project.
TOP HIERARCHY
The Keepalived configuration file is organized around a set of configuration blocks. Each block targets a specific daemon family feature. These features are:
GLOBAL CONFIGURATION
BFD CONFIGURATION
VRRPD CONFIGURATION
LVS CONFIGURATION
GLOBAL CONFIGURATION
contains subblocks of Global definitions, Linkbeat interfaces, Interface up/down transition delays, Static track groups, Static addresses, Static routes, and Static rules
Global definitions
# Following are global daemon facilities for running
# keepalived in a separate network namespace:
# --
# Set the network namespace to run in.
# The directory /run/keepalived will be created as an
# unshared mount point, for example for pid files.
# syslog entries will have _NAME appended to the ident.
# Note: the namespace cannot be changed on a configuration reload.
net_namespace NAME
# Add the IPVS configuration in the specified net namespace. This makes it
# easy to split the VIP traffic into a given namespace and keep the
# healthcheck traffic in another namespace. If NAME is not specified, then
# the default namespace will be used.
net_namespace_ipvs NAME
# ipsets weren’t network namespace aware until Linux 3.13, and so
# if running with an earlier version of the kernel, by default
# use of ipsets is disabled if using a namespace and vrrp_ipsets
# has not been specified. This option overrides the default and
# allows ipsets to be used with a namespace on kernels prior to 3.13.
namespace_with_ipsets
# If multiple instances of keepalived are run in the same namespace,
# this will create pid files with NAME as part of the file names,
# in /run/keepalived.
# Note: the instance name cannot be changed on a configuration reload.
instance NAME
# Create pid files in /run/keepalived
use_pid_dir
# Poll to detect media link failure using ETHTOOL, MII or ioctl interface
# otherwise uses netlink interface.
linkbeat_use_polling
# Time for main process to allow for child processes to exit on termination
# in seconds. This can be needed for very large configurations.
# (default: 5)
child_wait_time SECS
Note: All processes/scripts run by keepalived are run with parent death signal set to SIGTERM. All such processes/scripts should either not change the action for SIGTERM, or ensure that the process/script terminates once SIGTERM is received, possibly following any cleanup actions needed.
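A short sketch of how the namespace-related keywords above might be combined. The namespace name "blue" and instance name "lb1" are hypothetical, and the timeout value is arbitrary.

```
# Run keepalived inside the network namespace "blue"
net_namespace blue

# Distinguish pid files if several keepalived instances share the namespace
instance lb1

# Allow child processes up to 10 seconds to exit on termination
child_wait_time 10
```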
# Global definitions configuration block
global_defs {
# In order to ensure that all processes read exactly the same configuration,
# while the config is first read it is written, by default, to a memory based
# file (or to an anonymous file in /tmp/ if memfd_create() is not supported).
# If your configuration is very large, you may not want the copy to be
# held in memory, in which case specifying the tmp_config_directory causes the
# configuration to be written to an anonymous file on the filesystem on which
# the specified directory resides, which must be writeable by keepalived.
# This setting cannot be changed on a reload, and it should be specified as
# early as possible in the configuration.
tmp_config_directory DIRECTORY
# config_save_dir causes keepalived to save configuration state and
# configuration files before and after each reload. This is used for debugging
# purposes if there appear to be problems related to repeated reloads.
# The directory will be created if it does not exist, but all parent
# directories must exist.
config_save_dir DIRECTORY
# Set the process names of the keepalived processes to the default values:
# keepalived, keepalived_vrrp, keepalived_ipvs, keepalived_bfd
process_names
# Specify the individual process names
process_name NAME
vrrp_process_name NAME
checker_process_name NAME
bfd_process_name NAME
# keepalived by default resolves script path names to remove symlinks.
# To keep symlinks in pathnames, specify use_symlink_paths.
use_symlink_paths [<BOOL>]
# The startup and shutdown scripts are run once, when keepalived starts
# before any child processes are run, and when keepalived stops after
# all child processes have terminated, respectively.
# The original motivation for adding this feature was that although
# keepalived can set up IPVS configuration using firewall marks, there
# was no mechanism for adding configuration to set the firewall marks
# (or for removing it afterwards).
# This feature can also be used to set up the iptables framework required
# if using iptables (see vrrp_iptables option below), modify interface
# settings, or anything else that can be done from a script or program.
# Only one startup script and one shutdown script can be specified.
# The timeouts (in seconds, default 10 seconds) are the time allowed for
# scripts to run; if the timeout expires the scripts will be killed (this
# is to stop keepalived hanging waiting for the scripts to terminate).
startup_script SCRIPT_NAME [username [groupname]]
startup_script_timeout SECONDS # range [1,1000]
shutdown_script SCRIPT_NAME [username [groupname]]
shutdown_script_timeout SECONDS # range [1,1000]
# Set of email addresses to notify (To:). To include a display name, the
# whole email address must be included in double quotes (").
notification_email {
admin@example1.com
"My admin <admin@example2.com>"
...
}
# email from address that will be in the header (see comment above for
# including a display name).
# (default: keepalived@local_host_name)
notification_email_from admin@example.com
# Remote SMTP server used to send notification email.
# IP address or domain name with optional port number.
# (default port number: 25)
smtp_server 127.0.0.1 [<PORT>]
# Name to use in HELO messages.
# (default: local host name)
smtp_helo_name <STRING>
# SMTP server connection timeout in seconds.
smtp_connect_timeout 30
# Sets default state for all smtp_alerts
smtp_alert <BOOL>
# Sets default state for vrrp smtp_alerts
smtp_alert_vrrp <BOOL>
# Sets default state for checker smtp_alerts
smtp_alert_checker <BOOL>
# Logs every failed real server check in syslog
# (nevertheless, SMTP alert is only sent when all retry checks failed
# and real server transitions to DOWN state)
checker_log_all_failures <BOOL>
# Don’t send smtp alerts for fault conditions
no_email_faults
# String identifying the machine (doesn’t have to be hostname).
# (default: local host name)
router_id <STRING>
# Multicast Group to use for IPv4 VRRP adverts
# Defaults to the RFC5798 IANA assigned VRRP multicast address 224.0.0.18
# which you typically do not want to change.
vrrp_mcast_group4 224.0.0.18
# Multicast Group to use for IPv6 VRRP adverts
# (default: ff02::12)
vrrp_mcast_group6 ff02::12
# Sets the default interface for static addresses.
# (default: eth0)
default_interface p33p1.3
# The sync daemon as provided by the IPVS kernel code only supports
# one master and one backup daemon instance at a time to synchronize
# the IPVS connection table.
# See ipvsadm(8) man page for more details of the sync daemon.
# Parameters are binding interface, and optional:
# inst VRRP_INSTANCE (inst can be omitted for backward compatibility)
# syncid (0 to 255) for lvs syncd, default is the VRID of vrrp instance,
# or 0 if no vrrp instance
# maxlen (1..65507) maximum packet length (limit is mtu - 20 - 8)
# port (1..65535) UDP port number to use, default 8848
# ttl (1..255)
# group - multicast group address (IPv4 or IPv6), default 224.0.0.81
# If VRRP_INSTANCE is not specified, both the master and backup sync daemons
# will be run as long as keepalived is running, otherwise the sync daemon
# master/backup state tracks the state of the specified vrrp instance: if
# the vrrp instance is in master state, only the master sync daemon will run,
# if the vrrp instance is not master, only the backup sync daemon will run.
# NOTE: maxlen, port, ttl and group are only available on Linux 4.3 or later.
# See kernel source doc/Documentation/networking/ipvs-sysctl.txt for details of
# parameters controlling IPVS and the sync daemon.
# /proc/net/ip_vs* provide some details about the state of IPVS.
lvs_sync_daemon <INTERFACE> [[inst] <VRRP_INSTANCE>] [id <SYNC_ID>] \
[maxlen <LEN>] [port <PORT>] [ttl <TTL>] [group <IP ADDR>]
# lvs_timeouts specifies the tcp, tcp_fin and udp connection tracking timeouts
# in seconds. At least one value must be specified; not setting a value leaves
# it unchanged from when keepalived started.
lvs_timeouts [tcp SECS] [tcpfin SECS] [udp SECS]
# flush any existing LVS configuration at startup
lvs_flush
# flush remaining LVS configuration at shutdown (for large configurations
# this is much faster than the default approach of deleting each RS and
# each VS individually).
# If VS is specified, remove each keepalived managed virtual
# server without explicitly removing the real servers (the kernel will
# remove them).
lvs_flush_on_stop [VS]
# delay for second set of gratuitous ARPs after transition to MASTER.
# in seconds, 0 for no second set.
# (default: 5)
vrrp_garp_master_delay 10
# number of gratuitous ARP messages to send at a time after
# transition to MASTER.
# (default: 5)
vrrp_garp_master_repeat 1
# delay for second set of gratuitous ARPs after lower priority
# advert received when MASTER.
# (default: vrrp_garp_master_delay)
vrrp_garp_lower_prio_delay 10
# Default value for vrrp down_timer_adverts.
vrrp_down_timer_adverts [1:100]
# number of gratuitous ARP messages to send at a time after
# lower priority advert received when MASTER.
# (default: vrrp_garp_master_repeat)
vrrp_garp_lower_prio_repeat 1
# minimum time interval for refreshing gratuitous ARPs while MASTER.
# in seconds (resolution seconds).
# (default: 0 (no refreshing))
vrrp_garp_master_refresh 60
# number of gratuitous ARP messages to send at a time while MASTER
# (default: 1)
vrrp_garp_master_refresh_repeat 2
# Delay between gratuitous ARP messages sent on an interface
# decimal, seconds (resolution usecs).
# (default: 0)
vrrp_garp_interval 0.001
# Delay between unsolicited NA messages sent on an interface
# decimal, seconds (resolution usecs).
# (default: 0)
vrrp_gna_interval 0.000001
# By default keepalived sends 5 gratuitous ARP/NA messages at a
# time, and after transitioning to MASTER sends a second block of
# 5 messages 5 seconds later.
# With modern switches this is unnecessary, so setting vrrp_min_garp
# causes only one ARP/NA message to be sent, with no repeat 5 seconds
# later.
vrrp_min_garp [<BOOL>]
# The following option causes periodic GARP/NA messages to be sent on
# interfaces of VIPs/eVIPs that are not the interface of the VRRP
# instance, in order to ensure that switch MAC caches are maintained
# (specified in seconds).
# Many switches have a default cache timeout of 300 seconds, and so
# a garp repeat rate of 1/3rd of that would be sensible. The maximum
# permitted value is 1 day (86400 seconds).
# By default, it will only send on VMAC interfaces; specifying all
# will cause it to send GARP/NA on each interface used by the VRRP instance.
vrrp_garp_extra_if [all] 100
# If a lower priority advert is received, don’t send another advert.
# This causes adherence to the RFCs. Defaults to false, unless
# strict_mode is set.
vrrp_lower_prio_no_advert [<BOOL>]
# If we are master and receive a higher priority advert, send an advert
# (which will be lower priority than the other master), before we
# transition to backup. This means that if the other master has
# garp_lower_prio_repeat set, it will resend garp messages.
# This is to get around the problem of there having been two simultaneous
# masters, and the last GARP messages seen were from us.
vrrp_higher_prio_send_advert [<BOOL>]
# Set the default VRRP version to use
# (default: 2, but IPv6 instances will use version 3)
vrrp_version <2 or 3>
# See vrrp_instance description of V3_checksum_as_V2
v3_checksum_as_v2 [<BOOL>]
# keepalived uses a firewall (either nftables or iptables) for two purposes:
# i) To implement no_accept mode
# ii) To stop IGMP/MLD/Router-Solicit packets being sent on VMAC interfaces,
# and to move IGMP/MLD messages onto the underlying interface.
# If both vrrp_iptables and vrrp_nftables are specified, keepalived will use
# nftables and not iptables. Similarly, if the iptables command is generating
# nftables configuration, or there is no iptables command installed,
# keepalived will use nftables rather than iptables.
# If neither vrrp_nftables nor vrrp_iptables is specified but VMACs are in use
# or no_accept is specified, keepalived will use nftables if it is available.
# Use nftables as the firewall.
# TABLENAME must not exist, and must be different for each
# instance of keepalived running in the same network namespace.
# Default tablename is keepalived, and priority is -1.
# keepalived will create base chains in the table.
# counters means counters are added to the rules (primarily for
# debugging purposes).
# ifindex means create IPv6 link local sets using ifindex rather
# than ifnames. This is the default unless the vrrp_instance has
# set dont_track_primary. The alternative is to use interface names
# as part of the set key, but the nft utility prior to v0.8.3 will
# then not output interface names properly.
nftables [TABLENAME]
nftables_priority PRIORITY
nftables_counters
nftables_ifindex
# Similarly for IPVS iptables - used for setting fwmarks for virtual
# server groups. keepalived will allocate a fwmark for each virtual
# server group, so that only one virtual server for each group needs
# to be configured in IPVS, by using a fwmark, and nftables will be
# used to set the fwmark for each of the virtual server
# address/protocol/port combinations specified.
# nftables_ipvs_start_fwmark specifies the first fwmark for keepalived
# to use (default 1000). This will be incremented for each subsequent
# virtual server group.
nftables_ipvs [TABLENAME]
nftables_ipvs_priority PRIORITY
nftables_ipvs_start_fwmark NUMBER
# Use iptables as the firewall.
# Note: it is necessary for the specified chain to exist in
# the iptables and/or ip6tables configuration, and for the chain
# to be called from an appropriate point in the iptables configuration.
# It will probably be necessary to have this filtering after accepting
# any ESTABLISHED,RELATED packets, because IPv4 might select the VIP as
# the source address for outgoing connections.
# Note: although the default chains that are used are INPUT and OUTPUT,
# since those are the only chains that will always exist, it is not safe
# or sensible to use those chains and specific chains should be created
# and called from appropriate points in the iptables configuration. The
# chains used for keepalived should not be used for any other purpose, and
# should have no rules configured, other than the rules that keepalived
# manages.
# A startup_script (see above) can be used to create the chains and to
# add rules to call them. A shutdown_script can be used to remove the
# iptables configuration added by the startup_script.
# Note2: If using ipsets, the iptables VIP rules are appended to the end
# of the specified chains; if not using ipsets, the VIP rules are inserted
# at the beginning of the chains. Any IGMP rules are always appended to
# the end of the chains.
# (default: INPUT)
vrrp_iptables keepalived
# or for outbound filtering as well
# Note, outbound filtering won’t work with IPv4, since the VIP can be
# selected as the source address for an outgoing connection. With IPv6
# this is unlikely since the addresses are deprecated.
vrrp_iptables keepalived_in keepalived_out
# or to use default chains (INPUT and OUTPUT)
vrrp_iptables
# Keepalived may have the option to use ipsets in conjunction with
# iptables. If so, then the ipset names can be specified, defaults
# as below. If no names are specified, ipsets will not be used,
# otherwise any omitted names will be constructed by adding "_if"
# and/or "6" and _igmp/_mld/_nd to previously specified names.
vrrp_ipsets [keepalived [keepalived6 [keepalived_if6 [keepalived_igmp [keepalived_mld [keepalived_vmac_nd]]]]]]
# An alternative to moving IGMP messages from VMACs to their parent interfaces
# is to disable them altogether in the kernel by setting
# igmp_link_local_mcast_reports false.
# This stops IGMP join etc messages for 224.0.0.0/24, since they should
# always be forwarded to all interfaces (see RFC4541).
# This is available from Linux 4.3 onwards.
disable_local_igmp
# The following enables checking that when in unicast mode, the
# source address of a VRRP packet is one of our unicast peers.
vrrp_check_unicast_src
# Checking all the addresses in a received VRRP advert can be time
# consuming. Setting this flag means the check won’t be carried out
# if the advert is from the same master router as the previous advert
# received.
# (default: don’t skip)
vrrp_skip_check_adv_addr
# Enforce strict VRRP protocol compliance. This currently includes
# enforcing the following. Please note that other checks may be
# added in the future if they are found to be missing:
# 0 VIPs not allowed
# unicast peers not allowed
# IPv6 addresses not allowed in VRRP version 2
# First IPv6 VIP is link local
# State MASTER can be configured if and only if priority is 255
# Authentication is not supported
# Preempt delay is not supported
# Accept mode cannot be set for VRRPv2
# If accept/no accept is not specified, accept is set if priority
# is 255 and cleared otherwise
# Gratuitous ARP repeats cannot be enabled
# Cannot clear lower_prio_no_advert
# Cannot set higher_prio_send_advert
# Cannot use vmac_xmit_base
# Cannot have no VIPs with VRRPv3
vrrp_strict
# Send vrrp instance priority notifications on notify FIFOs.
vrrp_notify_priority_changes <BOOL>
# The following options can be used if vrrp, checker or bfd processes
# are timing out. This can be seen by a backup vrrp instance becoming
# master even when the master is still running, because the master or
# backup system is too busy to process vrrp packets.
# --
# keepalived can, if it detects that it is not running sufficiently
# soon after a timer should expire, increase its priority, first
# of all switching to realtime scheduling, and if that is not
# sufficient, it will then increase its realtime priority by one each
# time it detects a further delay in running. In the event that realtime
# scheduling is enabled, RLIMIT_RTTIME will be set, using the values for
# {bfd,checker,vrrp}_rlimit_rttime (see below). These values may need
# to be increased for slower processors.
# --
# To limit the maximum increased automatic priority, specify the following
# (0 doesn’t use automatic priority increases, and is the default. -1 disables
# the warning message at startup). Omitting the priority sets the maximum value.
max_auto_priority [<-1 to 99>] # 99 is really sched_get_priority_max(SCHED_RR)
# Minimum delay in microseconds after timer expires before keepalived is
# scheduled after which the process priority will be auto incremented
# (default is 1000000 usecs (1 second), maximum is 10000000 (10 seconds))
min_auto_priority_delay <delay in usecs>
# Set the vrrp child process priority (Negative values increase priority)
vrrp_priority <-20 to 19>
# Set the checker child process priority
checker_priority <-20 to 19>
# Set the BFD child process priority
bfd_priority <-20 to 19>
# Set the vrrp child process non swappable
vrrp_no_swap
# Set the checker child process non swappable
checker_no_swap
# Set the BFD child process non swappable
bfd_no_swap
# The following options can be used to force vrrp, checker and bfd
# processes to run on a restricted CPU set.
# You can either bind processes to a single CPU or define a set of
# CPUs. In the latter case the Linux kernel will be restricted to that CPU
# set during scheduling. Forcing process binding to a single CPU can
# increase performance on a heavily loaded box.
# The INTEGERs following the configuration keyword represent the cpu_id
# as shown in /proc/cpuinfo on the line "processor:"
# --
# Set CPU Affinity for the vrrp child process
vrrp_cpu_affinity <INTEGER> [<INTEGER>]...[<INTEGER>]
# Set CPU Affinity for the checker child process
checker_cpu_affinity <INTEGER> [<INTEGER>]...[<INTEGER>]
# Set CPU Affinity for the bfd child process
bfd_cpu_affinity <INTEGER> [<INTEGER>]...[<INTEGER>]
# Set the vrrp child process to use real-time scheduling
# at the specified priority
vrrp_rt_priority <1..99>
# Set the checker child process to use real-time scheduling
# at the specified priority
checker_rt_priority <1..99>
# Set the BFD child process to use real-time scheduling
# at the specified priority
bfd_rt_priority <1..99>
# Set the limit on CPU time between blocking system calls,
# in microseconds
# (default: 10000)
vrrp_rlimit_rttime >=2
checker_rlimit_rttime >=2
bfd_rlimit_rttime >=2
# If Keepalived has been built with SNMP support, the following
# keywords are available.
# Note: Keepalived, checker and RFC support can be individually
# enabled/disabled
# --
# Specify socket to use for connecting to SNMP master agent
# (see source module keepalived/vrrp/vrrp_snmp.c for more details)
# (default: unix:/var/agentx/master)
snmp_socket udp:1.2.3.4:705
# enable SNMP handling of vrrp element of KEEPALIVED MIB
enable_snmp_vrrp
# enable SNMP handling of checker element of KEEPALIVED MIB
enable_snmp_checker
# enable SNMP handling of RFC2787 and RFC6527 VRRP MIBs
enable_snmp_rfc
# enable SNMP handling of RFC2787 VRRP MIB
enable_snmp_rfcv2
# enable SNMP handling of RFC6527 VRRP MIB
enable_snmp_rfcv3
# enable SNMP traps
enable_traps
# When SNMP requests are made, the checker process only updates the
# virtual and real server stats from the kernel if the last time the
# stats for that virtual server were read was more than this configured
# interval (in seconds). The default interval is 5 seconds, and the
# valid range is 0.001 (1 milli-second) to 30 seconds.
snmp_vs_stats_update_interval <TIMER>
# Like snmp_vs_stats_update_interval but for real servers. Stats for
# real servers are only read if there is an SNMP request for real server
# stats.
snmp_rs_stats_update_interval <TIMER>
# If Keepalived has been built with DBus support, the following
# keywords are available.
# --
# Enable the DBus interface
enable_dbus
# Name of DBus service
# Useful if you want to run multiple keepalived processes with DBus enabled
# (default: org.keepalived.Vrrp1)
dbus_service_name SERVICE_NAME
# String to use for DBus path when VRRP instance has no interface configured
# Useful if your system has an interface named "none"!
# (default: "none")
dbus_no_interface_name NAME
# Specify the default username/groupname to run scripts under.
# If this option is not specified, the user defaults to keepalived_script
# if that user exists, otherwise the uid/gid under which keepalived is running.
# If groupname is not specified, it defaults to the user’s group.
script_user username [groupname]
# Don’t run scripts configured to be run as root if any part of the path
# is writable by a non-root user. Also, enforce the default script_user is
# keepalived_script, and don’t default to the user under which keepalived
# is running (usually root).
enable_script_security
# Rather than using notify scripts, specifying a fifo allows more
# efficient processing of notify events, and guarantees that they
# will be delivered in the correct sequence.
# NOTE: the FIFO names must all be different
# --
# FIFO to write notify events to
# See vrrp_notify_fifo and lvs_notify_fifo for format of output
# For further details, see the description under vrrp_sync_group.
# see doc/samples/sample_notify_fifo.sh for sample usage.
notify_fifo FIFO_NAME [username [groupname]]
# script to be run by keepalived to process notify events
# The FIFO name will be passed to the script as the last parameter
notify_fifo_script STRING|QUOTED_STRING [username [groupname]]
# FIFO to write vrrp notify events to.
# The string written will be a line of the form: INSTANCE "VI_1" MASTER 100
# and will be terminated with a new line character.
# For further details of the output, see the description under vrrp_sync_group
# and doc/samples/sample_notify_fifo.sh for sample usage.
vrrp_notify_fifo FIFO_NAME [username [groupname]]
# script to be run by keepalived to process vrrp notify events
# The FIFO name will be passed to the script as the last parameter
vrrp_notify_fifo_script STRING|QUOTED_STRING [username [groupname]]
# FIFO to write notify healthchecker events to
# The string written will be a line of the form:
# VS [192.168.201.15]:tcp:80 {UP|DOWN}
# RS [1.2.3.4]:tcp:80 [192.168.201.15]:tcp:80 {UP|DOWN}
# and will be terminated with a new line character.
lvs_notify_fifo FIFO_NAME [username [groupname]]
# script to be run by keepalived to process healthchecker notify events
# The FIFO name will be passed to the script as the last parameter
lvs_notify_fifo_script STRING|QUOTED_STRING [username [groupname]]
# By default, when keepalived reloads, the vrrp instance and sync group states
# are not written to the relevant FIFOs. Setting this option will cause the
# states to be sent to the FIFO(s) when keepalived reloads.
fifo_write_vrrp_states_on_reload
# Allow configuration to include interfaces that don’t exist at startup.
# This allows keepalived to work with interfaces that may be deleted and restored
# and also allows virtual and static routes and rules on VMAC interfaces.
# allow_if_changes allows an interface to be deleted and recreated with a
# different type or underlying interface, eg changing from vlan to macvlan
# or changing a macvlan from eth1 to eth2. This is predominantly used for
# reporting duplicate VRID errors at startup if allow_if_changes is not set.
dynamic_interfaces [allow_if_changes]
# The following options are only needed for large configurations, where either
# keepalived creates a large number of interfaces, or the system has a large
# number of interfaces. These options only need using if
# "Netlink: Receive buffer overrun" messages are seen in the system logs.
# If the buffer size needed exceeds the value in /proc/sys/net/core/rmem_max
# the corresponding force option will need to be set.
# --
# Set netlink receive buffer size. This is useful for
# very large configurations where a large number of interfaces exist, and
# the initial read of the interfaces on the system causes a netlink buffer
# overrun.
vrrp_netlink_cmd_rcv_bufs BYTES
vrrp_netlink_cmd_rcv_bufs_force <BOOL>
vrrp_netlink_monitor_rcv_bufs BYTES
vrrp_netlink_monitor_rcv_bufs_force <BOOL>
# The vrrp netlink command and monitor socket, the checker command and
# monitor socket, and the process monitor buffer sizes can be set independently.
# The force flag means to use SO_RCVBUFFORCE, so that the buffer size
# can exceed /proc/sys/net/core/rmem_max.
lvs_netlink_cmd_rcv_bufs BYTES
lvs_netlink_cmd_rcv_bufs_force <BOOL>
lvs_netlink_monitor_rcv_bufs BYTES
lvs_netlink_monitor_rcv_bufs_force <BOOL>
# As a guide for process_monitor_rcv_bufs, for 1400 processes terminating
# simultaneously, 212992 (the default on some systems) is insufficient, whereas
# 500000 is sufficient.
process_monitor_rcv_bufs BYTES
process_monitor_rcv_bufs_force <BOOL>
# When a socket is opened, the kernel configures the max rx buffer size for
# the socket to /proc/sys/net/core/rmem_default. On some systems this can be
# very large, and even generally this can be much larger than necessary.
# This isn’t a problem so long as keepalived is reading all queued data from
# its sockets, but if rmem_default was set sufficiently large, and if for
# some reason keepalived stopped reading, it could consume all system memory.
# The vrrp_rx_bufs_policy allows configuring of the rx bufs size when the
# sockets are opened. If the policy is MTU, the rx buf size is configured
# to the total of interface’s MTU * vrrp_rx_bufs_multiplier for each vrrp
# instance using the socket. Likewise, if the policy is ADVERT, then it is
# the total of each vrrp instance’s advert packet size * multiplier.
# (default: use system default)
vrrp_rx_bufs_policy [MTU|ADVERT|NUMBER]
# (default: 3)
vrrp_rx_bufs_multiplier NUMBER
# Send notifies at startup for real servers that are starting up
rs_init_notifies
# Don’t send an email every time a real server checker changes state;
# only send email when a real server is added or removed
no_checker_emails
# The umask to use for creating files. The number can be specified in hex, octal
# or decimal. BITS are I{R|W|X}{USR|GRP|OTH}, e.g. IRGRP, separated by ’|’s.
# IRWX{U|G|O} can also be specified.
# The default umask is IXUSR | IRWXG | IRWXO. This option cannot override the
# command-line option.
umask [NUMBER|BITS]
# On some systems when bond interfaces are created, they can start passing traffic
# and then have a several second gap when they stop passing traffic inbound. This
# can mean that if keepalived is started at boot time, i.e. at the same time as
# bond interfaces are being created, keepalived doesn’t receive adverts and hence
# can become master despite an instance with higher priority sending adverts.
# This option specifies a delay in seconds before vrrp instances start up after
# keepalived starts.
vrrp_startup_delay 5.5
# The following will cause logging of receipt of VRRP adverts for VRIDs not configured
# on the interface on which they are received.
log_unknown_vrids
# Specify the prefix for generated VMAC names (default "vrrp")
vmac_prefix STRING
# Specify the prefix for generated VMAC names for VIPs which use a VMAC but are not
# on the VRRP instance’s interface (default vmac_prefix value)
vmac_addr_prefix STRING
# Specify random seed for ${_RANDOM}, to make configurations repeatable (default
# is to use a seed based on the time, so that each time a different configuration
# will be generated).
random_seed UNSIGNED_INT
# If a configuration reload is attempted with an updated configuration file that has
# errors, keepalived may terminate, and possibly enter a loop indefinitely restarting
# and terminating. If reload_check_config is set, then keepalived will attempt to
# validate the configuration before initiating a reload, and only initiate the reload
# if the configuration is valid.
reload_check_config [LOG_FILE]
# Treat any missing include file as an error. The OPTIONS can be any combination of
#   readable       - error if a match is not a readable file
#   match          - error if no file matches (unless wildcard specified)
#   wildcard_match - error if no file matches (even if wildcard specified)
#   brace_match    - error if a brace expansion does not match a file
# Note: match, wildcard_match and brace_match include the readable check.
# The setting of include_check is saved when a new include file is opened, and restored
# when the file is closed. This means that the include_check setting when reading a
# file cannot be changed by a subsequently included file. To change the setting for all
# included files, include_check should be set at the beginning of the configuration file
# specified in the command line (default /etc/keepalived/keepalived.conf).
# Note2: If the libc glob() function does not support GLOB_ALTDIRFUNC (e.g. Musl libc as
# on Alpine Linux etc.), then only readable and wildcard_match of the above options will work.
# It is possible to add or remove individual settings; ’+’ means add the following
# checks, ’-’ means remove the following checks. For example
#   include_check +match -wildcard_match
# adds the requirement that there is a matching file, and removes the requirement for
# wildcard matches.
# If no option is specified, it is the same as specifying all options.
include_check [OPTIONS]
# reload_time_file allows a reload of keepalived to be scheduled in the future. This is
# particularly useful if there is a master keepalived and one or more backup keepalived
# instances and the new configuration is incompatible with the previous configuration,
# e.g. adding or removing VIPs which would cause adverts to be rejected.
# All the instances can be scheduled to reload at the same time, thereby ensuring that
# no mismatching adverts are received by the backup instances.
# The configuration specifies a file which keepalived will monitor. The first line of
# the file must contain a valid time or date/time exactly in the formats specified below.
# When keepalived starts up, it reads the file if it exists, and schedules a reload at
# the specified time. If the file does not exist, then when it is subsequently created
# a reload will be scheduled. If the file is updated, the reload time will be modified
# accordingly. If the file is deleted, the reload is cancelled.
# Normally when the reload occurs the specified file is deleted, since the reload has
# been done; if the file included a date then the reload will be in the past and so
# ignored. However, if there is no date, then if the file were reread following the
# reload, a reload would be scheduled for 24 hours’ time. In order to stop this, the
# file is deleted (unlinked) by default. If reload_repeat is specified, then the
# file is not deleted, and if the file contains a time only with no date, then
# keepalived will keep reloading at that time every day until the file is removed or
# modified.
# If the directory containing the file does not exist at startup/reload, or if the
# directory is removed or renamed, then no future scheduled reloads will occur until
# a manual (SIGHUP) reload is done or keepalived restarts.
# The permitted formats of the entry in the timer file are precisely:
#   HH:MM:SS
#   YY-MM-DD HH:MM:SS
#   YYYY-MM-DD HH:MM:SS
# each with an optional ’Z’ at the end.
# There must be no leading or trailing whitespace, and only one space between the date
# and the time.
# If there is a ’Z’ at the end of the time, the time is parsed as UTC, otherwise the
# time is the localtime for the environment in which keepalived is running. If the
# systems which are being reloaded are in different timezones, it is probably safer to
# use UTC.
# If using local time with daylight savings, beware that some times don’t exist and
# some times are duplicated and hence ambiguous.
reload_time_file ABSOLUTE-PATHNAME-OF-FILE
reload_repeat
# Some users frequently update their configurations and reload keepalived. reload_file
# provides a mechanism that allows the configuration update processes not to update the
# configuration files while keepalived is reading them.
# The reload file will be created by keepalived before it starts reading configuration
# files, unless the file exists. If the file already exists, it will be truncated. Once
# keepalived has completed reading the files it will remove the reload file.
# If reload_file with no file name is specified, the default filename keepalived.reload
# in the PID directory will be used.
# The best way to use the reload file is for the configuration update process to touch
# the reload file before it signals keepalived to reload, and then wait for the file
# to be deleted, which indicates that keepalived has finished reading the config files.
# When keepalived starts reading the configuration files, since it truncates the reload
# file, if the update process creates the reload_file with non-zero size, it can detect
# the reloading starting by the reload_file becoming zero length.
reload_file [ABSOLUTE-PATHNAME-OF-FILE]
# Sending SIGUSR1 to keepalived causes it to dump its data structures
# for debugging purposes, although some users use this feature and
# process the output. Please note that the format of the .data files
# produced is not guaranteed to maintain backward compatibility.
# The standard file names are keepalived_parent.data, keepalived.data,
# keepalived_check.data and keepalived_bfd.data. This causes a problem
# if more than one keepalived instance is running on a system.
# In order to alleviate this, enabling data_use_instance includes the
# instance name and network namespace in the file name of the .data files.
data_use_instance [<BOOL>]
# json_version 2 puts the VRRP data in a named array and adds
# track_process details. Default is version 1.
json_version {1|2}
}
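As a rough illustration of how several of the options above might be combined, the following sketch shows a global_defs block for a host with many interfaces. The buffer size, the delay and the pathname are assumptions chosen for illustration, not recommended values.

```
global_defs {
    # Interfaces may be created after keepalived starts
    dynamic_interfaces

    # Illustrative size only; consider raising it if
    # "Netlink: Receive buffer overrun" messages appear in the system logs
    vrrp_netlink_cmd_rcv_bufs 1048576
    # Only needed if the size above exceeds /proc/sys/net/core/rmem_max
    vrrp_netlink_cmd_rcv_bufs_force true

    # Size rx buffers from the interface MTU rather than rmem_default
    vrrp_rx_bufs_policy MTU
    vrrp_rx_bufs_multiplier 3

    # Give bond interfaces time to settle before vrrp instances start
    vrrp_startup_delay 5.5

    # Validate a new configuration before reloading it
    reload_check_config

    # Hypothetical pathname; the first line of the file holds the reload
    # time, for example 2030-01-01 03:00:00Z
    reload_time_file /etc/keepalived/reload.time
}
```

Writing the same UTC time into the reload time file on the master and on each backup is one way to have all instances reload together.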
Linkbeat interfaces
The linkbeat_interfaces block allows specifying which interfaces should use polling via MII, Ethtool or ioctl status rather than rely on netlink status updates. This allows more granular control of global definition linkbeat_use_polling.
This option is preferred over the deprecated use of linkbeat_use_polling in a vrrp_instance block, since the latter only allows using linkbeat on the interface of the vrrp_instance itself, whereas track_interface and virtual_ipaddresses and virtual_iproutes may require monitoring other interfaces, which may need to use linkbeat polling.
The default polling type to use is MII, unless that isn’t supported in which case ETHTOOL is used, and if that isn’t supported then ioctl polling. The preferred type of polling to use can be specified with MII or ETHTOOL or IOCTL after the interface name, but if that type isn’t supported, a supported type will be used.
The syntax for linkbeat_interfaces is:
linkbeat_interfaces {
eth2
enp2s0 ETHTOOL
}
Static track groups
Static track groups are used to allow vrrp instances to track static addresses, routes and rules. If a static address/route/rule specifies a track group, then if the address/route/rule is deleted and cannot be restored, the vrrp instance will transition to fault state.
The syntax for a track group is:
track_group GROUP1 {
group {
VI_1
VI_2
}
}
Static routes/addresses/rules
Keepalived can configure static addresses, routes, and rules. These addresses, routes and rules are NOT moved by vrrpd, they stay on the machine. If you already have IPs and routes on your machines and your machines can ping each other, you don’t need this section. The syntax for rules and routes is the same as for ip rule add/ip route add (except shortened option names are not supported due to ambiguities). The track_group specification refers to a named track_group which lists the vrrp instances which will track the address, i.e. if the address is deleted the vrrp instances will transition to backup.
NOTE: since rules without preferences can be added in different orders due to vrrp instances transitioning from master to backup etc, rules need to have a preference. If a preference is not specified, keepalived will assign one, but it will probably not be what you want.
The syntax is the same for virtual addresses and virtual routes. If no dev element is specified, it defaults to default_interface (default eth0). Note: the broadcast address may be specified as ’-’ or ’+’ to clear or set the host bits of the address.
If a route or rule could apply to either IPv4 or IPv6 it will default to IPv4. To force a route/rule to be IPv6, add the keyword "inet6".
By default keepalived prepends routes (the kernel’s default), which adds the route before any matching routes; this is the same behaviour as the (undocumented) ’ip route prepend’ command, as opposed to the ’ip route add’ command, which only adds the route if there is no matching route. If ’append’ is specified, the behaviour is the same as the ’ip route append’ command, i.e. the route is added after any matching route. Note: the rules for whether a route matches differ between IPv4 and IPv6; for example specifying a different proto means a matching route can be prepended/appended for IPv4 but not for IPv6. If in doubt, test it using the ’ip route add/prepend/append’ commands.
static_ipaddress
{
<IPADDR>[/<MASK>] [brd <IPADDR>] [dev <STRING>] [scope <SCOPE>]
    [label <LABEL>] [peer <IPADDR>] [home]
    [-nodad] [mngtmpaddr] [noprefixroute]
    [autojoin] [track_group GROUP] [preferred_lft nn|forever]
192.168.1.1/24 dev eth0 scope global
...
}
static_routes
{
192.168.2.0/24 via 192.168.1.100 dev eth0 track_group GROUP1
192.168.100.0/24 table 6909 nexthop via 192.168.101.1 dev wlan0
    onlink weight 1 nexthop via 192.168.101.2
    dev wlan0 onlink weight 2
192.168.200.0/24 dev p33p1.2 table 6909 tos 0x04 protocol bird
    scope link priority 12 mtu 1000 hoplimit 100
    advmss 101 rtt 102 rttvar 103 reordering 104
    window 105 cwnd 106 ssthresh lock 107 realms
    PQA/0x14 rto_min 108 initcwnd 109 initrwnd 110
    vrf blue features ecn add
2001:470:69e9:1:2::4 dev p33p1.2 table 6909 tos 0x04 protocol
    bird scope link priority 12 mtu 1000
    hoplimit 100 advmss 101 rtt 102 rttvar 103
    reordering 104 window 105 cwnd 106 ssthresh
    lock 107 rto_min 108 initcwnd 109 append
    initrwnd 110 features ecn fastopen_no_cookie 1
...
}
static_rules
{
from 192.168.2.0/24 table 1 track_group GROUP1
to 192.168.2.0/24 table 1
from 192.168.28.0/24 to 192.168.29.0/26 table small iif p33p1
    oif wlan0 tos 22 fwmark 24/12
    preference 39 realms 30/20 goto 40
to 1:2:3:4:5:6:7:0/112 from 7:6:5:4:3:2::/96 table 6908
    uidrange 10000-19999
to 1:2:3:4:6:6:7:0/112 from 8:6:5:4:3:2::/96 l3mdev protocol 12
    ip_proto UDP sport 10-20 dport 20-30
...
}
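To tie the pieces of this section together, here is a minimal sketch of a static address and a static rule that are both tracked through a track group. It assumes a vrrp_instance named VI_1 is defined elsewhere in the configuration; the addresses, table number and preference value are illustrative only.

```
# VI_1 is assumed to be a vrrp_instance defined elsewhere
track_group GROUP1 {
    group {
        VI_1
    }
}

static_ipaddress {
    # If this address is deleted and cannot be restored, VI_1 is affected
    192.168.1.1/24 dev eth0 scope global track_group GROUP1
}

static_rules {
    # An explicit preference avoids keepalived assigning one itself
    from 192.168.2.0/24 table 1 preference 100 track_group GROUP1
}
```

Giving every rule an explicit preference keeps the rule order stable regardless of the order in which vrrp instances change state.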
Track files
Adds a file to be monitored. The file will be read whenever it is modified. The value in the file will be recorded for all VRRP instances, sync groups and real servers which monitor it. Note that the file will only be read if at least one VRRP instance, sync group or real server monitors it.
A value will be read as a number in text from the file. If the weight configured against the track_file is 0, a non-zero value in the file will be treated as a failure status, and a zero value will be treated as an OK status, otherwise the value will be multiplied by the weight configured in the track_file statement.
For VRRP instances, if the result is less than -253 anything monitoring the file will transition to the fault state (the weight can be 254 to allow for a negative value being read from the file).
If the vrrp instance or sync group is not the address owner and the result is between -253 and 253, the result will be added to the initial priority of the VRRP instance (a negative value will reduce the priority), although the effective priority will be limited to the range [1,254]. Likewise for real servers.
If a vrrp instance using a track_file is a member of a sync group, unless sync_group_tracking_weight is set on the group weight 0 must be set. Likewise, if the vrrp instance is the address owner, weight 0 must also be set.
For real servers monitoring the file, the limits of values read from the track file are 2147483648 to -2147483648. The value, once multiplied by the weight, will be added to the real server’s IPVS weight. If the result is <= -2147483648 then the checker will be in the FAULT state.
NOTE: weights for track_file for real servers are not fully implemented yet. In particular allowing weight 0, handling negative calculated values and reloading.
The syntax for track file is:
track_file <STRING> {    # vrrp_track_file is a deprecated synonym
# file to track (weight defaults to 1)
file <QUOTED_STRING>
# optional default weight
weight <-2147483647..2147483647> [reverse]
# create the file and/or initialise the value
# This causes VALUE (default 0) to be written to
# the specified file at startup if the file doesn’t
# exist, unless overwrite is specified in which case
# any existing file contents will be overwritten with
# the specified value.
init_file [VALUE] [overwrite]
}
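The weight arithmetic described above can be illustrated with a minimal sketch of two track files. The names and pathnames are hypothetical, and the worked values assume a VRRP instance that is not the address owner.

```
# Weight 0: a non-zero value in the file is a failure, zero is OK
track_file maint_flag {
    file "/etc/keepalived/track/maint_flag"
    weight 0
    # Write 0 at startup if the file does not yet exist
    init_file 0
}

# Weight 10: the value read from the file is multiplied by 10.
#   file contains 3   -> result 30, added to the initial priority
#   file contains -5  -> result -50, which reduces the priority by 50
#   file contains -30 -> result -300, less than -253, so fault state
track_file load_score {
    file "/etc/keepalived/track/load_score"
    weight 10
    init_file 0
}
```

A VRRP instance or sync group then refers to maint_flag or load_score by name in its own track_file block.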
VRRP track processes
The configuration block looks like:
vrrp_track_process <STRING> {
# process to monitor (with optional parameters)
# A quoted string is treated as a single element, so if the first item
# after the process keyword is quoted, that will be the command name.
# For example:
#   process "/tmp/a b" param1 "param 2"
# would mean a process named ’/tmp/a b’ (quotes removed) with 2 parameters
# ’param1’ and ’param 2’.
process <STRING>|<QUOTED_STRING> [<STRING>|<QUOTED_STRING> ...]
# If matching parameters, this specifies a partial match (i.e. the first
# n parameters match exactly), or an initial match, i.e. the last
# parameter may be longer than the parameter configured.
# To specify that a command must have no parameters, don’t specify
# any parameters, but specify param_match.
param_match {initial|partial}
# default weight (default is 0). For description of reverse, see track_process.
# ’weight 0 reverse’ will cause the vrrp instance to be down when the
# quorum is up, and vice versa.
# A non-zero weight will adjust the VRRP priority of the tracking VRRP instance,
# whereas a 0 weight will cause the VRRP instance to enter FAULT state if the
# track process is in the failed state (see above for the effect of "reverse").
weight <-254..254> [reverse]
# minimum number of processes for success (default 1)
quorum NUM
# maximum number of processes for success. For example, setting
# this to 1 would cause a failure if two instances of the process
# were running (but beware forks - see fork_delay below).
# Setting this to 0 would mean failure if the matching process were
# running at all.
# Default is unlimited.
quorum_max NUM
# time to delay after process quorum gained after fork before
# considering process up (in fractions of second)
# This is to avoid up/down bounce for fork/exec
fork_delay SECS
# time to delay after process quorum lost before
# considering process down (in fractions of second)
# This is to avoid down/up bounce after terminate/parent refork.
terminate_delay SECS
# this sets fork_delay and terminate_delay
delay SECS
# Normally process string is matched against the process name,
# as shown on the Name: line in /proc/PID/status, unless
# parameters are specified.
# This option forces matching the full command line
full_command
}
To avoid having to frequently run a track_script to monitor the existence of processes (often haproxy or nginx), vrrp_track_process can monitor whether other processes are running.
One difference from pgrep is track_process doesn’t do a regular expression match of the command string, but does an exact match. ’pgrep ssh’ will match an sshd process, this track_process will not (it is equivalent to pgrep "^ssh$").
If full_command is used (equivalent to pgrep -f), /proc/PID/cmdline is used, but any updates to cmdline will not be detected (a process shouldn’t normally change it, although it is possible with great care, for example systemd).
Prior to Linux v3.2 track_process will not support detection of changes to a process name, since the kernel did not notify changes of process name prior to 3.2. Most processes do not change their process name, but, for example, firefox forks processes that change their process name to "Web Content". The process name referred to here is the contents of /proc/PID/comm.
Quorum is the number of matching processes that must be run for an OK status.
Delay might be useful if it is anticipated that a process may be reloaded (stopped and restarted), and it isn’t desired to down and up a vrrp instance.
A positive weight means that an OK status will add <weight> to the priority of all VRRP instances which monitor it. Conversely, a negative weight will be subtracted from the initial priority in case of insufficient processes.
If the vrrp instance or sync group is not the address owner and the result is between -253 and 253, the result will be added to the initial priority of the VRRP instance (a negative value will reduce the priority), although the effective priority will be limited to the range [1,254].
If a vrrp instance using a track_process is a member of a sync group, unless sync_group_tracking_weight is set on the group weight 0 must be set. Likewise, if the vrrp instance is the address owner, weight 0 must also be set.
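As a minimal sketch of the behaviour described above, the following block monitors a process named haproxy. The block name chk_haproxy and the delay value are assumptions for illustration.

```
vrrp_track_process chk_haproxy {
    # Exact match against the process name, not a pattern match
    process haproxy
    # At least one matching process must be running for an OK status
    quorum 1
    # Sets both fork_delay and terminate_delay, so that a quick
    # stop and restart of haproxy need not bounce the vrrp instance
    delay 2
    # Weight 0 (the default): tracking instances enter FAULT state
    # while fewer than quorum processes are running
    weight 0
}
```

Because the weight is 0, this form is also suitable for a vrrp instance that is the address owner or a member of a sync group.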
Rationale for not using pgrep/pidof/killall and the like:
Every time pgrep or its equivalent is run, it iterates through the /proc/[1-9][0-9]* directories, and opens the status and cmdline pseudo files in each directory. The cmdline pseudo file is mapped to the process’s address space, and so if that part of the process is swapped out, it will have to be fetched from the swap space. pgrep etc also include zombie processes whereas keepalived does not, since they aren’t running.
This implementation only iterates through /proc/[1-9][0-9]*/ directories at start up, and it won’t even read the cmdline pseudo files if ’full_command’ is not specified for any of the vrrp_track_process entries. After startup, it uses the process_events kernel <-> userspace connector to receive notification of process changes. If full_command is specified for any track_process instance, the cmdline pseudo file will have to be read upon notification of the creation of the new process, but at that time it is very unlikely that it will have already been swapped out.
On a busy system with a high number of process creations/terminations, using a track_script with pgrep/pidof/killall may be more efficient, although those processes are inefficient compared to the minimum that keepalived needs.
Using pgrep etc on a system that is swapping can have a significant detrimental impact on the performance of the system, due to having to fetch swapped memory from the swap space, thereby causing additional swapping.
BFD CONFIGURATION
This is an implementation of RFC5880 (Bidirectional forwarding detection), and it can be configured to work between 2 keepalived instances. However, using unweighted track_bfds between a master/backup pair of VRRP instances means that the VRRP instance will only be able to come up if both VRRP instances are running, which somewhat defeats the purpose of VRRP.
This implementation has been tested with OpenBFDD (available at https://github.com/dyninc/OpenBFDD).
The syntax for bfd instance is:
bfd_instance <STRING> {
# BFD Neighbor IP (synonym neighbour_ip)
neighbor_ip <IP ADDRESS>
# Source IP to use (optional, except in order to ensure that the
# local port is valid, it is required)
source_ip <IP ADDRESS>
# Required min RX interval, in ms (resolution is micro-seconds e.g. 3.312)
# (default is 10 ms)
min_rx <DECIMAL>
# Desired min TX interval, in ms (resolution is micro-seconds)
# (default is 10 ms)
min_tx <DECIMAL>
# Desired idle TX interval, in ms (resolution is micro-seconds)
# (default is 1000 ms)
idle_tx <DECIMAL>
# Number of missed packets after
# which the session is declared down
# (default is 5)
multiplier <INTEGER>
# Operate in passive mode (default is active)
passive
# outgoing IPv4 ttl to use (default 255)
ttl <INTEGER>
# outgoing IPv6 hoplimit to use (default 64)
hoplimit <INTEGER>
# maximum reduction of ttl/hoplimit
# in received packet (default 0)
# (255 disables hop count checking)
max_hops <INTEGER>
# RFC 5883 specifies port 4784 must be used for multihop bfd, rather than
# port 3784. Specifying multihop enables that option, but if multiple hops
# are in use, then max_hops (see above) will also need to be configured.
multihop [<BOOL>]
# Default tracking weight
# Normally, positive weights are added to the vrrp instance priority when
# the bfd instance is up, negative weights reduce the priority when it is down.
# However, if reverse is specified, the priority is decreased when up and
# increased when down. ’weight 0 reverse’ will cause the vrrp instance to be down
# when the bfd instance is up, and vice versa.
weight <-253:253> [reverse]
# Normally bfd event notifications are sent to both the VRRP and checker processes.
# Specifying vrrp or checker will cause event notifications for this bfd_instance
# only to be sent to the specified process
vrrp
checker
}
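As a hedged sketch of the syntax above, the following bfd_instance monitors an upstream router rather than the peer keepalived. The instance name, the addresses (taken from the documentation range 192.0.2.0/24) and the timer values are assumptions for illustration.

```
bfd_instance upstream_rtr {
    neighbor_ip 192.0.2.2
    source_ip 192.0.2.1
    # Intervals are in milliseconds
    min_rx 50
    min_tx 50
    # Declare the session down after 3 missed packets
    multiplier 3
    # Negative weight: reduce the priority of tracking vrrp
    # instances by 20 while the session is down
    weight -20
    # Only notify the VRRP process about this session
    vrrp
}
```

Using a weight rather than an unweighted tracker means a lost BFD session lowers the priority instead of forcing the instance into fault state.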
VRRPD CONFIGURATION
contains subblocks of VRRP script(s), VRRP synchronization group(s), VRRP gratuitous ARP and unsolicited neighbour advert delay group(s) and VRRP instance(s)
VRRP script(s)
The script will be executed periodically, every <interval> seconds. Its exit code will be recorded for all VRRP instances which monitor it. Note that the script will only be executed if at least one VRRP instance monitors it.
The default weight equals 0, which means that any VRRP instance monitoring the script will transition to the fault state after <fall> consecutive failures of the script. After that, <rise> consecutive successes will cause VRRP instances to leave the fault state, unless they are also in the fault state due to other scripts or interfaces that they are tracking.
A positive weight means that <rise> successes will add <weight> to the priority of all VRRP instances which monitor it. Conversely, a negative weight will be subtracted from the initial priority in case of <fall> failures.
The syntax for the vrrp script is:
# Adds a script to be executed periodically. Its exit code will be
# recorded for all VRRP instances and sync groups which are monitoring it.
vrrp_script <SCRIPT_NAME> {
# path of the script to execute
script <STRING>|<QUOTED-STRING>
# seconds between script invocations, (default: 1 second)
interval <INTEGER>
# seconds after which script is considered to have failed
timeout <INTEGER>
# adjust priority by this weight, (default: 0)
# For description of reverse, see track_script.
# ’weight 0 reverse’ will cause the vrrp instance to be down when the
# script is up, and vice versa.
weight <INTEGER:-253..253> [reverse]
# required number of successes for OK transition
rise <INTEGER>
# required number of failures for KO transition
fall <INTEGER>
# user/group names to run script under.
# group defaults to group of user
user USERNAME [GROUPNAME]
# assume script initially is in failed state
init_fail
}
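A minimal sketch of a weighted script, following the rise/fall behaviour described above. The script name chk_http and the pathname are hypothetical, and the timings are illustrative only.

```
vrrp_script chk_http {
    # Hypothetical check script; a zero exit code means success
    script "/usr/local/bin/check_http.sh"
    # Run every 2 seconds; a run longer than 1 second counts as failed
    interval 2
    timeout 1
    # 3 consecutive failures subtract 20 from the initial priority
    fall 3
    # 2 consecutive successes remove the adjustment again
    rise 2
    weight -20
}
```

With the default weight of 0 instead of -20, the same fall and rise counts would move tracking VRRP instances into and out of the fault state rather than adjusting their priority.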
VRRP synchronization group(s)
VRRP Sync Group is an extension to the VRRP protocol. The main goal is to define a bundle of VRRP instances to be synchronized together, so that a transition of one instance will be reflected to the other group members.
In addition there is an enhanced notify feature for fine state transition catching.
You can also define multiple track policies in order to force a state transition according to a third party event such as interface, scripts, file, BFD.
Important: for a SYNC group to run reliably, it is vital that all instances in the group are MASTER or that they are all either BACKUP or FAULT. A situation where some instances have higher priority on machine A and others have higher priority on machine B will lead to constant re-elections. For this reason, when instances are grouped, any track scripts/files configured against member VRRP instances must have their tracking weights unset (i.e. equal to zero). Any trackers with a non-zero priority will be ignored.
The syntax for vrrp_sync_group is:
vrrp_sync_group <STRING> {
group {
# name of the vrrp_instance (see below)
# Set of VRRP_Instance string
<STRING>
<STRING>
...
}
# Synchronization group tracking interface, script, file & bfd will
# update the status/priority of all VRRP instances which are members
# of the sync group.
# ’weight 0 reverse’ will cause the vrrp instance to be down when the
# interface is up, and vice versa.
track_interface {
eth0
eth1
eth2 weight <-253..253> [reverse]
...
}
# add a tracking script to the sync group (<SCRIPT_NAME> is the name
# of the vrrp_script entry) go to FAULT state if any of these go down
# if unweighted.
# reverse causes the direction of the adjustment of the priority to be reversed.
track_script {
<SCRIPT_NAME>
<SCRIPT_NAME> weight <-253..253> [reverse|noreverse]
}
# Files whose state we monitor, value is added to effective priority.
# <STRING> is the name of a track_file
# weight defaults to weight configured in track_file
track_file {
<STRING>
<STRING> weight <-254..254> [reverse|noreverse]
...
}
# Process to monitor, weight is added to effective priority.
# <STRING> is the name of a vrrp_track_process
# weight defaults to weight configured in vrrp_track_process.
# See vrrp_instance track_process for description of weight.
track_process {
<STRING>
<STRING> weight <-254..254> [reverse|noreverse]
...
}
# BFD instances we monitor, value is added to effective priority.
# <STRING> is the name of a BFD instance
track_bfd {
<STRING>
<STRING>
<STRING> weight <INTEGER: -253..253> [reverse|noreverse]
...
}
# notify scripts and alerts are optional
#
# filenames of scripts to run on transitions can be unquoted (if
# just filename) or quoted (if it has parameters)
# The username and groupname specify the user and group
# under which the scripts should be run. If username is
# specified, the group defaults to the group of the user.
# If username is not specified, they default to the
# global script_user and script_group
# to MASTER transition
notify_master /path/to_master.sh [username [groupname]]
# to BACKUP transition
notify_backup /path/to_backup.sh [username [groupname]]
# FAULT transition
notify_fault "/path/fault.sh VG_1" [username [groupname]]
# executed when stopping vrrp
notify_stop <STRING>|<QUOTED-STRING> [username [groupname]]
# notify_deleted causes DELETED to be sent to notifies rather
# than the default FAULT after a vrrp instance is deleted during a
# reload. If a script is specified, that script will be executed
# as well.
notify_deleted [<STRING>|<QUOTED-STRING> [username [groupname]]]
# for ANY state transition.
# "notify" script is called AFTER the notify_* script(s) and
# is executed with 4 additional arguments after the configured
# arguments provided by Keepalived:
#   $(n-3) = "GROUP"|"INSTANCE"
#   $(n-2) = name of the group or instance
#   $(n-1) = target state of transition (stop only applies to instances)
#            ("MASTER"|"BACKUP"|"FAULT"|"STOP"|"DELETED")
#   $(n)   = priority value
# $(n-3) and $(n-1) are ALWAYS sent in uppercase, and the possible
# strings sent are the same ones listed above
# ("GROUP"/"INSTANCE", "MASTER"/"BACKUP"/"FAULT"/"STOP"/"DELETED")
# (note: DELETED is only applicable to instances)
notify <STRING>|<QUOTED-STRING> [username [groupname]]
# The notify fifo output is the same as the last 4 parameters for the "notify"
# script, with the addition of "MASTER_RX_LOWER_PRI" instead of state for an
# instance, and also "MASTER_PRIORITY" and "BACKUP_PRIORITY" if the priority
# changes and notify_priority_changes is configured.
# MASTER_RX_LOWER_PRI is used if a master needs to set some external state, such
# as setting a secondary IP address when using Amazon AWS; if another keepalived
# has transitioned to master due to a communications break, the lower priority
# instance will have taken over the secondary IP address, and the proper master
# needs to be able to restore it.
# Send FIFO notifies for vrrp priority changes
notify_priority_changes <BOOL>
# Send email notification during state transition,
# using addresses in global_defs above (default no,
# unless global smtp_alert/smtp_alert_vrrp set)
smtp_alert <BOOL>
# DEPRECATED. Use track_interface, track_script and
# track_file on vrrp_sync_groups instead.
global_tracking
# allow sync groups to use differing weights.
# This probably WON’T WORK, but is a replacement for
# global_tracking in case different weights were used
# across different vrrp instances in the same sync group.
sync_group_tracking_weight
}
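A minimal sketch of a sync group that keeps two instances in step. It assumes vrrp_instance blocks named VI_1 and VI_2 and a vrrp_script named chk_sync (with the default weight 0) are defined elsewhere; the notify script pathname is hypothetical.

```
vrrp_sync_group VG_1 {
    group {
        VI_1
        VI_2
    }
    # Unweighted: all members go to FAULT state if eth0 goes down
    track_interface {
        eth0
    }
    # chk_sync is assumed to be a vrrp_script with weight 0
    track_script {
        chk_sync
    }
    # Run for any state transition, with the arguments
    #   GROUP VG_1 <STATE> <PRIORITY>
    # appended after any configured arguments
    notify /usr/local/bin/notify_group.sh
}
```

Keeping all trackers unweighted follows the requirement above that members of a sync group must not have differing priorities adjusted independently.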
VRRP gratuitous ARP and unsolicited neighbour advert delay group(s)
specifies the setting of delays between sending gratuitous ARPs and unsolicited neighbour advertisements. This is intended for when an upstream switch is unable to handle being flooded with ARPs/NAs.
Use interface when the limits apply on the single physical interface. Use interfaces when a group of interfaces are linked to the same switch and the limits apply to the switch as a whole.
Note: Only one of interface or interfaces should be used per block.
If the global vrrp_garp_interval and/or vrrp_gna_interval are set, any interfaces that aren’t specified in a garp_group will inherit the global settings.
The syntax for garp_group is :
garp_group {
# Sets the interval between Gratuitous ARP (in seconds, resolution microseconds)
garp_interval <DECIMAL>
# Sets the default interval between unsolicited NA (in seconds, resolution microseconds)
gna_interval <DECIMAL>
# The physical interface to which the intervals apply
interface <STRING>
# A list of interfaces across which the delays are aggregated.
interfaces {
<STRING>
<STRING>
...
}
}
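As an illustration of the syntax above, the following sketch groups two interfaces that are assumed to be linked to the same upstream switch, so that the delays apply to the switch as a whole. The interface names eth0 and eth1 and the interval values are assumptions chosen for the example, not recommended settings.

```
garp_group {
    # 1 ms between gratuitous ARPs (assumed value)
    garp_interval 0.001
    # 1 ms between unsolicited neighbour advertisements (assumed value)
    gna_interval 0.001
    # hypothetical interface names; this block uses "interfaces" only,
    # since only one of interface or interfaces should be used per block
    interfaces {
        eth0
        eth1
    }
}
```

Any interface not listed in a garp_group would still inherit the global vrrp_garp_interval and vrrp_gna_interval, if those are set.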
VRRP instance(s)
A VRRP instance is the key feature of the VRRP protocol. It defines and configures VRRP behaviour to run on a specific interface. Each VRRP instance is related to a unique interface.
The syntax for vrrp_instance is :
vrrp_instance <STRING> {
# Initial state, MASTER|BACKUP
# If the priority is 255, then the instance will transition immediately
# to MASTER if state MASTER is specified; otherwise the instance will
# wait between 3 and 4 advert intervals before it can transition,
# depending on the priority.
state MASTER
# interface for inside_network, bound by vrrp.
# Note: if using unicasting, the interface can be omitted as long
# as the unicast addresses are not IPv6 link local addresses (this is
# necessary, for example, if using asymmetric routing).
# If the interface is omitted, then all VIPs and eVIPs should specify
# the interface they are to be configured on, otherwise they will be
# added to the default interface.
interface eth0
# If using unicasting without specifying an interface, the VRF to operate
# in can be specified.
vrf VRF_IF
# Use VRRP Virtual MAC (macvlan).
# The macvlan will be created on the configured interface for
# the VRRP instance, and the VIPs, and eVIPs of the matching address
# family, which do not specify a different interface will be configured
# on the macvlan.
# The VRRP adverts will also be sent and received on the macvlan
# interface, unless vmac_xmit_base is configured.
# NOTE: If sysctl net.ipv4.conf.all.rp_filter is set,
# and this vrrp_instance is an IPv4 instance, using
# this option will cause the individual interfaces to be
# updated to the greater of their current setting, and
# all.rp_filter, as will default.rp_filter, and all.rp_filter
# will be set to 0.
# The original settings are restored on termination.
# NOTE 2: If using use_vmac with unicast peers,
# vmac_xmit_base must be set.
# The MAC address can be specified with only 5 octets, in which case
# the virtual_router_id will be used as the last octet.
# If netlink_notify_msg is specified, when keepalived creates a macvlan
# interface it will force a netlink message to be sent for the base interface
# since the kernel does not send one, even if the promiscuity of the base
# interface has been updated.
# By default the VMAC is created in the same link group as the parent interface.
# Specifying group GROUP_ID (where GROUP_ID is either a valid group number, or a
# name in /etc/iproute2/group) will create the interface in the specified group.
# The name option can be specified if you want to use an interface name "group".
use_vmac [[name] <VMAC_INTERFACE_NAME>] [MAC_ADDRESS] [netlink_notify_msg] [group GROUP_ID]
# use_vmac_addr is used to create VMAC (macvlan) interfaces for
# each interface that is used by a VIP or eVIP where the interface
# is not the same as the interface on which the VRRP instance is
# configured or the eVIP’s address family does not match the VRRP
# instance’s. Alternatively, use_vmac can be specified against each
# VIP/eVIP that specifies an interface (dev).
# NOTE: if use_vmac is specified and an eVIP is not the same address
# family as the vrrp instance, unless use_vmac_addr is specified, or
# use_vmac is specified for the eVIP, the eVIP will be configured on
# the vrrp instance’s VMAC, which will have the wrong MAC address for
# the address family of the eVIP.
use_vmac_addr
# Send/Recv VRRP messages from base interface instead of
# VMAC interface
vmac_xmit_base
# Use IPVLAN interface. keepalived will create a mode L2
# ipvlan interface on top of the specified interface.
# For IPv4 instances, an IP address is required, for IPv6
# the address is optional, in which case the link local
# address will be used.
# The mode flags default to bridge. NOTE: the mode flags must be the
# same for all ipvlans on the same underlying interface.
# It is safer to configure an interface name, in case keepalived crashes
# and restarts, in which case it can more reliably find a previously
# created interface.
# The name option can be specified if you want to use a name that would cause
# a parsing error (e.g. "bridge").
# For a description of the group option, see use_vmac.
use_ipvlan [[name] <INTERFACE_NAME>] [IP_ADDRESS] [bridge|private|vepa] [group GROUP_ID]
# force instance to use IPv6 (this option is deprecated since
# the virtual ip addresses determine whether IPv4 or IPv6 is used).
native_ipv6
# Ignore VRRP interface faults (default unset).
# Note: when using IPv6, setting the interface administratively down, e.g.
# ’ip link set IF down’ will by default cause all IPv6 addresses to be
# deleted from the interface, and consequently the VRRP instance will
# go to fault state due to the addresses being deleted. Setting sysctl
# net.ipv6.conf.IF.keep_addr_on_down to 1 will allow non link-local addresses
# to remain when the interface is downed.
dont_track_primary
# optional, monitor these as well.
# go to FAULT state if any of these go down if unweighted.
# When a weight is specified in track_interface, instead of setting the vrrp
# instance to the FAULT state in case of failure, its priority will be
# increased by the weight when the interface is up (for positive weights),
# or decreased by the weight’s absolute value when the interface is down
# (for negative weights), unless reverse is specified, in which case the
# direction of adjustment of the priority is reversed.
# The weight must be between -253 and +253 inclusive.
# 0 is the default behaviour which means that a failure implies a
# FAULT state. The common practice is to use positive weights to count a
# limited number of good services so that the server with the highest count
# becomes master. Negative weights are better to count unexpected failures
# among a high number of interfaces, as it will not saturate even with a high
# number of interfaces. Use reverse to increase priority if an interface is down
track_interface {
eth0
eth1
eth2 weight <-253..253> [reverse]
...
}
# add a tracking script to the interface
# (<SCRIPT_NAME> is the name of the vrrp_track_script entry)
# The same principle as track_interface can be applied to track_script entries,
# except that an unspecified weight means that the default weight declared in
# the script will be used (which itself defaults to 0).
# reverse causes the direction of the adjustment of the priority to be reversed.
track_script {
<SCRIPT_NAME>
<SCRIPT_NAME> weight <-253..253> [reverse|no_reverse]
}
# Files whose state we monitor, value is added to effective priority.
# <STRING> is the name of a track_file
track_file {
<STRING>
<STRING>
<STRING> weight <-254..254> [reverse|noreverse]
...
}
# Positive weights are added/subtracted when the process is running,
# negative weights are subtracted/added when the process is not running.
# If reverse is specified, the addition/subtraction is reversed.
# <STRING> is the name of a vrrp_track_process
# weight defaults to weight configured in vrrp_track_process
track_process {
<STRING>
<STRING> weight <-254..254> [reverse|noreverse]
...
}
# BFD instances we monitor, value is added to effective priority,
# unless reverse is specified, when the value is subtracted.
# Positive weights are added/subtracted when the bfd instance is up,
# negative weights are subtracted/added when the bfd instance is down.
# <STRING> is the name of a BFD instance
track_bfd {
<STRING>
<STRING>
<STRING> weight <INTEGER: -253..253> [reverse|noreverse]
...
}
# default IP for binding vrrpd is the primary IP
# on interface. If you want to hide the location of vrrpd,
# use this IP as src_addr for multicast or unicast vrrp
# packets. (since it’s multicast, vrrpd will get the reply
# packet no matter what src_addr is used).
# optional
mcast_src_ip <IPADDR>
unicast_src_ip <IPADDR>
# specify an alternative multicast address to use as the destination
# of VRRP adverts and for listening for adverts. Note, if you are using
# multiple VRRP instances with VMACs and different multicast addresses
# and the same VRID, you will have to specify alternative MAC addresses
# for at least all but one of the VMACs.
# IPv6 multicast addresses must be link-local, i.e. start ffX2:
# Using different multicast addresses with IPv6 on the same interface without
# using VMACs is only supported if the kernel supports IPV6_MULTICAST_ALL
# (from Linux v4.20).
mcast_dst_ip <MULTICAST_IPADDR>
# if the configured src_ip doesn’t exist or is removed put the
# instance into fault state
track_src_ip
# VRRP version to run on interface
# default is global parameter vrrp_version, but IPv6 instances will
# always use version 3.
version <2 or 3>
# The following enables checking that when in unicast mode, the
# source address of a VRRP packet is one of our unicast peers.
check_unicast_src
# Do not send VRRP adverts over a VRRP multicast group.
# Instead it sends adverts to the following list of
# ip addresses using unicast. It can be cool to use
# the VRRP FSM and features in a networking
# environment where multicast is not supported!
# IP addresses specified can be IPv4 as well as IPv6.
# If min_ttl and/or max_ttl are specified, the TTL/hop limit
# of any received packet is checked against the specified
# TTL range, and is discarded if it is outside the range.
# Specifying min_ttl or max_ttl turns on check_unicast_src.
unicast_peer {
<IPADDR> [min_ttl {0..255}] [max_ttl {0..255}]
...
}
# It is not possible to operate in unicast mode without any peers.
# Until v2.2.4 keepalived would silently operate in multicast mode
# if no peers were specified but a unicast keyword had been specified.
# Using this keyword stops defaulting to multicast if no peers are
# specified and puts the VRRP instance into fault state.
unicast_fault_no_peer
# Specify the unicast TTL/HLIM for sending unicast adverts
unicast_ttl {0..255}
# The checksum calculation when using VRRPv3 changed after v1.3.6.
# The reason for the change is that keepalived was calculating the
# checksum using the multicast address even when it was using
# unicast, whereas the checksum should be calculated using the
# actual address that is in the IPv4 header.
# Setting this flag forces the old checksum algorithm to be used
# to maintain backward compatibility, although keepalived will
# attempt to maintain compatibility anyway if it sees an old
# version checksum. Specifying never will turn off auto detection
# of old checksums. [This option may not be enabled - check output
# of ’keepalived -v’ for OLD_CHKSUM_COMPAT.]
old_unicast_checksum [never]
# Some manufacturers (e.g. Cisco and Juniper) interpret RFC5798 5.2.8
# as applying only to IPv6, since the pseudo-header in RFC2460 is
# specified only for IPv6, although most open source implementations,
# including tcpdump/wireshark, include the pseudo-header for IPv4.
# Keepalived by default uses a pseudo-header for VRRPv3 IPv4 as well.
# Setting this option turns off including the pseudo-header in the
# checksum calculation for VRRPv3 IPv4.
v3_checksum_as_v2 [<BOOL>]
# interface specific settings, same as global parameters.
# default to global parameters
garp_master_delay 10
garp_master_repeat 1
garp_lower_prio_delay 10
garp_lower_prio_repeat 1
garp_master_refresh 60
garp_master_refresh_repeat 2
# specifying 0 disables feature
# The VRRP RFCs state that the master down timer is 3 advert intervals plus
# a skew time. Setting down_timer_adverts means the master down timer will be
# down_timer_adverts advert intervals.
# The default is 3, to conform with the VRRP RFCs. Setting this to any other
# value is a deviation from the VRRP protocol. All virtual routers for a given
# VRRP instance MUST use the same value.
down_timer_adverts [1-100]
# Some users experience "thread_timer_expired" log messages. These are caused
# by the kernel not scheduling keepalived quickly enough after a timer expired,
# which is always due to insufficient CPU resources being available (if running
# keepalived in a VM it could be due to the VM itself not being scheduled), or
# keepalived not being run at a high enough priority (see realtime scheduling
# options above).
# If nopreempt is configured and another instance has become master, then there
# are circumstances where this instance is required not to resume as master, but
# rather transition to backup.
# If using this option (and nopreempt is configured), keepalived will calculate
# whether another instance may have taken over (based on the advert interval and
# the highest priority of the other instances - default 254 unless specified with
# this option), and if that time has expired since the last advert has been sent,
# the VRRP instance will revert to backup state (remember to include any track_script
# etc. weights when calculating the highest priority of other instances).
thread_timer_expired [HIGHEST_PRIORITY_OF_OTHER_INSTANCES]
# If keepalived is late running by more than 2 advert intervals for a VRRP instance,
# it is possible that another instance has taken over as master.
# If a lower priority advert is received, don’t send another advert.
# This causes adherence to the RFCs (defaults to global
# vrrp_lower_priority_dont_send_advert).
lower_prio_no_advert [<BOOL>]
# If we are master and receive a higher priority advert, send an advert
# (which will be lower priority than the other master), before we transition
# to backup. This means that if the other master has garp_lower_prio_repeat
# set, it will resend garp messages. This is to get around the problem of
# there having been two simultaneous masters, and the last GARP
# messages seen were from us.
higher_prio_send_advert [<BOOL>]
# arbitrary unique number from 1 to 255
# used to differentiate multiple instances of vrrpd
# running on the same network interface and address
# family and multicast/unicast (and hence same socket).
# Note: using the same virtual_router_id with the same
# address family on different interfaces has been known
# to cause problems with some network switches; if you
# are experiencing problems with using the same
# virtual_router_id on different interfaces, but the problems
# are resolved by not duplicating virtual_router_ids, your
# network switches are probably not functioning correctly.
#
# Whilst in general it is important not to duplicate a
# virtual_router_id on the same network interface, there is a
# special case when using unicasting if the unicast peers for
# the vrrp instances with duplicated virtual_router_ids on the
# network interface do not overlap, in which case virtual_router_ids
# can be duplicated.
# It is also possible to duplicate virtual_router_ids on an
# interface with multicasting if different multicast addresses
# are used (see mcast_dst_ip).
virtual_router_id 51
# for electing MASTER, highest priority wins.
# The valid range of values for priority is [1-255], with priority
# 255 meaning "address owner".
# To be MASTER, it is recommended to make this 50 more than on
# other machines. All systems should have different priorities
# in order to make behaviour deterministic. If you want to stop
# a higher priority instance taking over as master when it starts,
# configure nopreempt rather than using equal priorities.
# If no_accept is configured (or vrrp_strict which also sets
# no_accept mode), then unless the vrrp_instance has priority 255,
# the system will not receive packets addressed to the VIPs/eVIPs,
# and the VIPs/eVIPs can only be used for routing purposes.
# Further, if an instance has priority 255 configured, the priority cannot
# be reduced by track_scripts, track_process etc, and likewise
# track_scripts etc cannot increase the priority to 255 if the configured
# priority is not 255.
priority 100
# VRRP Advert interval in seconds (e.g. 0.92) (use default)
advert_int 1
# Note: authentication was removed from the VRRPv2 specification by
# RFC3768 in 2004.
# Use of this option is non-compliant and can cause problems; avoid
# using if possible, except when using unicast, where it can be helpful.
authentication {
# PASS|AH
# PASS - Simple password (suggested)
# AH - IPSEC (not recommended)
auth_type PASS
# Password for accessing vrrpd.
# should be the same on all machines.
# Only the first eight (8) characters are used.
auth_pass 1234
}
# addresses add|del on change to MASTER, to BACKUP.
# With the same entries on other machines,
# the opposite transition will be occurring.
# For virtual_ipaddress, virtual_ipaddress_excluded,
# virtual_routes and virtual_rules most of the options
# match the options of the command ip address/route/rule add.
# The track_group option only applies to static addresses/routes/rules.
# no_track is specific to keepalived and means that the
# vrrp_instance will not transition out of master state
# if the address/route/rule is deleted and the address/route/rule
# will not be reinstated until the vrrp instance next transitions
# to master.
# <LABEL>: is optional and creates a name for the alias.
# For compatibility with "ifconfig", it should
# be of the form <realdev>:<anytext>, for example
# eth0:1 for an alias on eth0.
# <SCOPE>: ("site"|"link"|"host"|"nowhere"|"global")
# preferred_lft is set to 0 to deprecate IPv6 addresses (this is the
# default if the address mask is /128). Use "preferred_lft forever"
# to specify that a /128 address should not be deprecated.
# NOTE: care needs to be taken if dev is specified for an address and
# your network uses MAC learning switches. The VRRP protocol ensures
# that the source MAC address of the interface sending adverts is
# maintained in the MAC cache of switches; however by default this
# will not work for the MACs of any VIPs/eVIPs that are configured on
# different interfaces from the interface on which the VRRP instance is
# configured, since the interface, especially if it is a VMAC interface,
# will only send using the MAC address of the interface in response to
# ARP requests. This may mean that the interface MAC addresses may
# time out in the MAC caches of switches. In order to avoid this, use
# the garp_extra_if or garp_extra_if_vmac options to send periodic
# GARP/ND messages on those interfaces.
virtual_ipaddress {
<IPADDR>[/<MASK>] [brd <IPADDR>] [dev <STRING>] [use_vmac] [scope <SCOPE>]
[label <LABEL>] [peer <IPADDR>] [home]
[-nodad] [mngtmpaddr] [noprefixroute]
[autojoin] [no_track] [preferred_lft nn|forever]
192.168.200.17/24 dev eth1
192.168.200.18/24 dev eth2 label eth2:1
}
# VRRP IP excluded from VRRP optional.
# For cases with large numbers (eg 200) of IPs
# on the same interface. To decrease the number
# of addresses sent in adverts, you can exclude
# most IPs from adverts.
# The IPs are add|del as for virtual_ipaddress.
# Can also be used if you want to be able to add
# a mixture of IPv4 and IPv6 addresses, since all
# addresses in virtual_ipaddress must be of the
# same family.
virtual_ipaddress_excluded {
<IPADDR>[/<MASK>] [brd <IPADDR>] [dev <STRING>] [scope <SCOPE>]
[label <LABEL>] [peer <IPADDR>] [home]
[-nodad] [mngtmpaddr] [noprefixroute]
[autojoin] [no_track]
<IPADDR>[/<MASK>] ...
...
}
# Specifying no virtual IP addresses is generally a configuration error
# and VRRP version 3 explicitly states that the minimum number of addresses
# is 1. Consequently keepalived warns if no VIPs are configured.
# There are, however, circumstances when it is useful to have no VIPs, for
# example with cloud servers, e.g. AWS, where floating IP addresses are
# managed administratively, and are not configured on the cloud virtual
# server. Specifying no_virtual_ipaddress suppresses warnings for no VIPs,
# and allows VRRPv3 to be used with no VIPs.
# WARNING - when using this with VRRPv3 it causes a protocol violation and
# may not work with other VRRP implementations.
no_virtual_ipaddress
# Set the promote_secondaries flag on the interface to stop other
# addresses in the same CIDR being removed when 1 of them is removed
# For example if 10.1.1.2/24 and 10.1.1.3/24 are both configured on an
# interface, and one is removed, unless promote_secondaries is set on
# the interface the other address will also be removed.
promote_secondaries
# routes add|del when changing to MASTER, to BACKUP.
# See static_routes for more details
virtual_routes {
# src <IPADDR> [to] <IPADDR>/<MASK> via|gw <IPADDR>
# [or <IPADDR>] dev <STRING> scope <SCOPE> table <TABLE>
src 192.168.100.1 to 192.168.109.0/24 via 192.168.200.254 dev eth1
192.168.110.0/24 via 192.168.200.254 dev eth1
192.168.111.0/24 dev eth2 no_track
192.168.112.0/24 via 192.168.100.254
192.168.113.0/24 via 192.168.200.254 or 192.168.100.254 dev eth1
blackhole 192.168.114.0/24
0.0.0.0/0 gw 192.168.0.1 table 100 # To set a default gateway into table 100.
}
# rules add|del when changing to MASTER, to BACKUP
# See static_rules for more details
virtual_rules {
from 192.168.2.0/24 table 1
to 192.168.2.0/24 table 1 no_track
}
# VRRPv3 has an Accept Mode to allow the virtual router when not the
# address owner to receive packets addressed to a VIP. This is the default
# setting unless strict mode is set. As an extension, this also works for
# VRRPv2 (RFC 3768 doesn’t define an accept mode).
# --
# Accept packets to non address-owner
accept
# Drop packets to non address-owner.
no_accept
# A higher priority VRRP instance will normally preempt a lower priority instance
# when it comes online. "nopreempt" stops the higher priority machine taking
# over the master role, and allows the lower priority machine to remain as
# master.
# NOTE: For this to work, the initial state must not be MASTER.
# --
nopreempt
# for backwards compatibility
preempt
# Seconds of delay until preemption after getting the advertisement timeout
# at startup or when seeing a lower priority master.
#
# Since it is a delay, it cannot speed up taking over as master.
# "preempt_delay" specifies the time in seconds to delay preempting compared
# to if "preempt_delay" is not specified. Advertisement timeout is
# 3 * advert_int + skew_time. Skew_time is defined by RFC3768 and RFC5798.
#
# So if "advert_int" is 1, and priority is 128, the instance would normally
# wait 3.5 seconds before taking over as master. If "preempt_delay 2" is
# specified, then the delay before taking over as master would be approximately
# 5.5 seconds.
#
# (if not disabled by "nopreempt").
# Range: 0 (default) to 1000 (e.g. 4.12)
# NOTE: For this to work, the initial state must not be MASTER.
preempt_delay 300 # waits 5 minutes
# See description of global vrrp_skip_check_adv_addr, which
# sets the default value. Defaults to vrrp_skip_check_adv_addr
skip_check_adv_addr [<BOOL>]
# See description of global vrrp_strict
# If strict_mode is not specified, it takes the value of vrrp_strict.
# If strict_mode without a parameter is specified, it defaults to on.
strict_mode [<BOOL>]
# Debug level, not implemented yet.
# LEVEL is a number in the range 0 to 4
debug <LEVEL>
# notify scripts, alert as above
notify_master <STRING>|<QUOTED-STRING> [username [groupname]]
notify_backup <STRING>|<QUOTED-STRING> [username [groupname]]
notify_fault <STRING>|<QUOTED-STRING> [username [groupname]]
# executed when stopping vrrp
notify_stop <STRING>|<QUOTED-STRING> [username [groupname]]
notify <STRING>|<QUOTED-STRING> [username [groupname]]
# The notify_master_rx_lower_pri script is executed if a master
# receives an advert with priority lower than the master’s priority.
notify_master_rx_lower_pri <STRING>|<QUOTED-STRING> [username [groupname]]
# Send vrrp instance priority notifications on notify FIFOs.
notify_priority_changes <BOOL>
# Send SMTP alerts
smtp_alert <BOOL>
# Set socket receive buffer size (see global_defs
# vrrp_rx_bufs_policy for explanation)
kernel_rx_buf_size
# Set use of linkbeat for the interface of this VRRP instance. This option is
# deprecated - use linkbeat_interfaces block instead.
linkbeat_use_polling
}
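To tie several of the keywords above together, here is a minimal sketch of a unicast VRRP instance that uses nopreempt and a weighted tracked interface. The instance name VI_1, the interface names and all addresses are hypothetical values chosen for illustration; only keywords described in the syntax above are used, and this is not a recommended production configuration.

```
vrrp_instance VI_1 {
    # nopreempt only works if the initial state is not MASTER
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    nopreempt

    # send adverts by unicast to one peer (assumed addresses)
    unicast_src_ip 192.168.200.11
    unicast_peer {
        192.168.200.12
    }

    # negative weight: priority is reduced by 20 while eth1 is down,
    # instead of the instance going to FAULT state
    track_interface {
        eth1 weight -20
    }

    virtual_ipaddress {
        192.168.200.17/24 dev eth0
    }
}
```

The peer machine would carry the mirror-image configuration, with the two unicast addresses swapped and a different priority so that behaviour stays deterministic.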
Interface up/down status change debouncing
If an interface that is being used (or tracked) by a VRRP instance goes to down state, the VRRP instance(s) will, by default, immediately transition to FAULT state, and when all relevant interfaces are back up again the VRRP instance(s) will immediately transition to BACKUP state.
This can cause problems if interfaces are bouncing, and so delays can be specified between the interface state change and the transition to FAULT/BACKUP state. If the interface returns to its original state before the delay expires, no associated VRRP instance state transition will occur.
interface_up_down_delays {
ifname down_delay [up_delay]
ifname2 down_delay [up_delay]
...
}
The delays are specified in seconds, with a resolution of microseconds, e.g. a delay of 0.00001 means 10 usecs. A delay of 0 means no delay in state change. The maximum delay that can be specified is 255 seconds.
If up_delay is omitted, it is set to be the same as the down delay.
The delay on an interface must be less than two (or more precisely one less than down_timer_adverts (default 3)) times the advert interval of any VRRP instance using that interface (otherwise a backup instance, while not receiving adverts may time out and become master before this instance transitions to FAULT state). Consequently the up/down delays can be dynamically reduced if another instance is master with a shorter advert interval.
If the VRRP instance is using a VMAC, it will inherit the up/down debounce delays of its parent interface.
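As a sketch of the debounce block described above, assume that every VRRP instance using these interfaces has advert_int 1 and the default down_timer_adverts of 3, so that each delay has to stay below 2 seconds. The interface names and delay values are assumptions for illustration only.

```
interface_up_down_delays {
    # eth0: wait 0.5 s after link down and 1.5 s after link up
    # before the VRRP instance changes state
    eth0 0.5 1.5
    # eth1: up_delay omitted, so it is the same as the down delay (0.25 s)
    eth1 0.25
}
```

If an instance on one of these interfaces used a shorter advert interval, the delays would need to be reduced to stay within the limit described above.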
LVS CONFIGURATION
contains subblocks of Virtual server group(s) and Virtual server(s)
The subblocks contain arguments for configuring the Linux IPVS (LVS) feature. Knowledge of ipvsadm(8) will be helpful here. Configuring LVS is achieved by defining virtual server groups, virtual servers and optionally SSL configuration. Every virtual server defines a set of real servers, and you can attach healthcheckers to each real server. Keepalived will then drive LVS operation by dynamically maintaining the topology.
For details of what configuration combinations are valid, see the ipvsadm(8) man page.
Note: Where an option can be configured for a virtual server, real server, and possibly checker, the virtual server setting is the default for real servers, and the real server setting is the default for checkers.
Note: Tunnelled real/sorry servers can differ from the address family of the virtual server and non tunnelled real/sorry servers, which all have to be the same. If a virtual server uses a fwmark, and all the real/sorry servers are tunnelled, the address family of the virtual server will be the same as the address family of the real/sorry servers if they are all the same, otherwise it will default to IPv4 (use ip_family inet6 to override this).
Note: The port for the virtual server can only be omitted if the virtual service is persistent.
Virtual server group(s)
This feature offers a way to simplify your configuration by factorizing virtual server definitions. If you need to define a bunch of virtual servers with exactly the same real server topology, then this feature will make your configuration much more readable, optimize the duplication of IPVS virtual servers if nftables_ipvs is used, and optimize the healthchecking task by spawning only one healthchecker where multiple virtual server declarations would each spawn a dedicated healthchecker for every real server, which would waste system resources.
Any combination of IP addresses, IP address ranges and firewall marks can be used, provided that the family of the IP addresses of the virtual server group matches the IP address family of all the real servers of any virtual server using the virtual server group. The one exception to this is that the virtual server group can be configured with both IPv4 and IPv6 addresses and fwmarks provided that all the real servers (and sorry servers) of all virtual servers using the virtual server group use tunnel forwarding; if fwmarks are specified in this case, the address family must be specified (the one exception to this is if the virtual server group has no IP addresses (i.e. fwmarks only) and all the real/sorry servers are tunnelled, it will default to IPv4; it is not good practice to rely on this and the address families of the fwmarks should be configured). Use of this option is intended for very large LVSs, but note, this can create a huge number of virtual servers unless nftables_ipvs is used. The use of nftables_ipvs is strongly recommended due to the very significant optimisations and efficiencies it provides.
NOTE: do not configure more than one TCP, one UDP and one SCTP virtual server with the same IP address family using the same virtual server group (or to put it another way do not have two virtual servers with the same protocol and address family using the same virtual server group); if all the real servers are tunnelled, then you must not have both IPv4 and IPv6 virtual servers with the same protocol.
The syntax for virtual_server_group is :
virtual_server_group <STRING> {
# Virtual IP Address and Port
<IPADDR> [<PORT>]
<IPADDR> [<PORT>]
...
# <IPADDR RANGE> is any of the following forms (or their IPv6 equivalents)
# XXX.YYY.ZZZ.WWW-VVV eg 192.168.200.1-10 (includes both .1 and .10)
# AAA.BBB.CCC.DDD-EEE.FFF.GGG.HHH eg 192.168.200.250-192.168.201.10
# III.JJJ.KKK.LLL/nn eg 192.168.202.8/29
<IPADDR RANGE> [<PORT>] # VIP range [VPORT]
<IPADDR RANGE> [<PORT>]
...
# Firewall Mark (fwmark)
# inet/inet6 should only be specified for virtual server groups
# where all real servers of the virtual servers are tunnelled.
fwmark <INTEGER>
fwmark <INTEGER> [inet|inet6]
...
}
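For illustration, a virtual server group combining a single address with the two range forms shown above might look like the following sketch. The group name web_vips, the addresses and the port are hypothetical; all entries are IPv4, so the address family would have to match that of the real servers of any virtual server using the group.

```
virtual_server_group web_vips {
    # a single VIP and port
    192.168.200.100 80
    # ten addresses, .1 to .10 inclusive, all on port 80
    192.168.200.1-10 80
    # a CIDR range
    192.168.202.8/29 80
}
```

A virtual server would then refer to the group by name, using the "virtual_server group <STRING>" form, here with web_vips as the string.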
Virtual server(s)
A virtual_server can be a declaration of one of <IPADDR> [<PORT>], fwmark <INTEGER> or group <STRING>.
The syntax for virtual_server is :
virtual_server <IPADDR> [<PORT>] |
virtual_server fwmark <INTEGER> |
virtual_server group <STRING> {
# LVS scheduler
lvs_sched rr|wrr|lc|wlc|lblc|sh|mh|dh|fo|ovf|lblcr|sed|nq|twos
# Enable flag-1 for scheduler (-b flag-1 in ipvsadm)
flag-1
# Enable flag-2 for scheduler (-b flag-2 in ipvsadm)
flag-2
# Enable flag-3 for scheduler (-b flag-3 in ipvsadm)
flag-3
# Enable sh-port for sh scheduler (-b sh-port in ipvsadm)
sh-port
# Enable sh-fallback for sh scheduler (-b sh-fallback in ipvsadm)
sh-fallback
# Enable mh-port for mh scheduler (-b mh-port in ipvsadm)
mh-port
# Enable mh-fallback for mh scheduler (-b mh-fallback in ipvsadm)
mh-fallback
# Enable One-Packet-Scheduling for UDP (-o in ipvsadm)
ops
# Override default LVS forwarding method (default is NAT).
# Default tunnel type is ipip. Since Linux 5.2 the GUE tunnel type can
# be specified. If using GUE, a port number is required. Since Linux 5.3
# if the tunnel type is GUE, the checksum option can also be specified.
# Since Linux 5.3, GRE tunnel type is also supported, but without the
# remcsum option.
lvs_method NAT|DR
or
lvs_method TUN [type {ipip|gue port NUM|gre} [nocsum|csum|remcsum]]
# LVS persistence engine name (currently only sip supported)
persistence_engine <STRING>
# LVS persistence timeout in seconds, default 6 minutes
persistence_timeout [<INTEGER>]
# LVS granularity mask (-M in ipvsadm)
persistence_granularity <NETMASK>
# L4 protocol
protocol TCP|UDP|SCTP
# If VS IP address is not set,
# suspend healthchecker’s activity
ha_suspend
# Send email
notification during quorum up/down transition,
# using addresses in global_defs above (default no,
# unless global smtp_alert/smtp_alert_checker set)
smtp_alert <BOOL>
# Default
VirtualHost string for HTTP_GET or SSL_GET
# eg virtualhost www.firewall.loc
# Overridden by virtualhost config of real server or checker
virtualhost <STRING>
# snmp_name is a text string that is returned as part of the snmp
# data for this virtual server. It can be used to help identify the
# virtual server when parsing SNMP output.
snmp_name <STRING>
# On daemon startup assume that all RSs are down
# and healthchecks failed. This helps to prevent
# false positives on startup. Alpha mode is
# disabled by default.
alpha
# On daemon shutdown consider quorum and RS
# down notifiers for execution, where appropriate.
# Omega mode is disabled by default.
omega
# Minimum total weight of all live servers in
# the pool necessary to operate VS with no
# quality regression. Defaults to 1.
quorum <INTEGER>
# Tolerate this many weight units compared to the
# nominal quorum, when considering quorum gain
# or loss. A flap dampener. Defaults to 0.
hysteresis <INTEGER>
# Script to execute when quorum is gained.
quorum_up <STRING>|<QUOTED-STRING> [username [groupname]]
# Script to execute when quorum is lost.
quorum_down <STRING>|<QUOTED-STRING> [username [groupname]]
# IP family for a fwmark service (only needed if all real servers are
# tunnelled and persistence_granularity is not specified). Defaults to
# inet if not specified.
ip_family inet|inet6
# setup realserver(s)
# RS to add to LVS topology when the quorum isn’t achieved.
# If a sorry server is configured, all real servers will
# be brought down when the quorum is not achieved and be
# replaced with the sorry server.
sorry_server <IPADDR> [<PORT>]
# Applies inhibit_on_failure behaviour to the sorry_server.
# It is very unlikely that you want to use this, since if you
# have a real server available, you almost certainly want to use
# it.
sorry_server_inhibit
# Sorry server LVS forwarding method. Default is the virtual
# server’s default.
# For details of tunnel type, see virtual_server details.
sorry_server_lvs_method NAT|DR
or
sorry_server_lvs_method TUN [type {ipip|gue port NUM|gre} [nocsum|csum|remcsum]]
# Optional connection timeout in seconds.
# The default is 5 seconds
connect_timeout <TIMER>
# Retry count to make additional checks if check
# of an alive server fails. Default: 1 unless specified below
retry <INTEGER>
# Delay before retry after failure. Defaults to delay_loop for DNS_CHECK,
# 3 seconds for HTTP_GET and SSL_GET, and 1 second otherwise.
delay_before_retry <TIMER>
# Optional random delay to start the initial check
# for maximum N seconds.
# Useful to scatter multiple simultaneous
# checks to the same RS. Enabled by default, with
# the maximum at delay_loop. Specify 0 to disable
warmup <TIMER>
# Delay timer for checker polling (60 seconds if not specified)
delay_loop <TIMER>
# Set weight to 0 when healthchecker detects failure
inhibit_on_failure
# One entry for each realserver
real_server <IPADDR> [<PORT>] {
# relative weight to use, default: 1
weight <INTEGER>
# LVS forwarding method
# For details of tunnel type, see virtual_server details. The default
# setting is taken from the virtual_server’s setting.
lvs_method NAT|DR
or
lvs_method TUN [type {ipip|gue port NUM|gre} [nocsum|csum|remcsum]]
# Script to execute when healthchecker
# considers service as up.
notify_up <STRING>|<QUOTED-STRING> [username [groupname]]
# Script to execute when healthchecker
# considers service as down.
notify_down <STRING>|<QUOTED-STRING> [username [groupname]]
# Maximum number of connections to server
uthreshold <INTEGER>
# Minimum number of connections to server
lthreshold <INTEGER>
# Send email notification during state transition,
# using addresses in global_defs above (default yes,
# unless global smtp_alert/smtp_alert_checker set)
smtp_alert <BOOL>
# Default VirtualHost string for HTTP_GET or SSL_GET
# eg virtualhost www.firewall.loc
# Overridden by virtualhost config of a checker
virtualhost <STRING>
# snmp_name is a text string that is returned as part of the snmp
# data for this real server. It can be used to help identify the
# real server when parsing SNMP output.
snmp_name <STRING>
alpha <BOOL> # see above
connect_timeout <TIMER> # see above
retry <INTEGER> # see above
delay_before_retry <TIMER> # see above
warmup <TIMER> # see above
delay_loop <TIMER> # see above
inhibit_on_failure <BOOL> # see above
# healthcheckers. Can be multiple of each type
# HTTP_GET|SSL_GET|TCP_CHECK|SMTP_CHECK|DNS_CHECK|MISC_CHECK|BFD_CHECK|UDP_CHECK|PING_CHECK|FILE_CHECK
# All checkers have the following options, except MISC_CHECK which only
# has options alpha onwards, and BFD_CHECK and FILE_CHECK which have none
# of the standard options:
CHECKER_TYPE {
# ======== generic connection options
# Optional IP address to connect to.
# The default is the realserver IP
connect_ip <IPADDR>
# Optional port to connect to
# The default is the realserver port
connect_port <PORT>
# Optional address to use to
# originate the connection
bindto <IPADDR>
# Optional interface to use; needed if
# the bindto address is IPv6 link local
bind_if <IFNAME>
# Optional source port to
# originate the connection from
bind_port <PORT>
# Optional fwmark to mark all outgoing
# checker packets with
fwmark <INTEGER>
alpha <BOOL> # see above
connect_timeout <TIMER> # see above
retry <INTEGER> # see above
delay_before_retry <TIMER> # see above
warmup <TIMER> # see above
delay_loop <TIMER> # see above
log_all_failures <BOOL> # log all failures when checker up
}
# The following options are additional checker specific
# HTTP and SSL healthcheckers
HTTP_GET|SSL_GET {
# HTTP protocol version, one of 1.0, 1.0C, 1.1
# Protocol version 1.0C means version 1.0 with the addition
# of a "Connection: close" line, which is included in
# version 1.1 by default.
http_protocol <PROTOCOL>
# When alpha mode is set, or when recovering from a failure,
# each URL is checked, with a delay of <delay_loop> between
# each check. If there were 20 URLs, and the <delay_loop> were
# 3 seconds, it would take 1 minute before the RS would come up
# following startup, or recovery from a failure. Setting
# fast_recovery removes the delay, both at start up and after
# recovery from a failure, meaning that the RS will come up
# once all the URLs have been checked, with no delay between
# checking each URL.
fast_recovery [<BOOL>]
# A URL to test
# can have multiple entries here
url {
# eg path / , or path /mrtg2/
path <STRING>
# healthcheck needs digest
# or status_code and digest
# Digest computed with genhash
# eg digest 9b3a0c85a887a256d6939da88aabd8cd
digest <STRING>
# status code returned in the HTTP header
# eg status_code 200 or status_code 200-299 400-499 503 505
# Default is 200-299
status_code <INTEGER|RANGE> [<INTEGER|RANGE>] ...
# VirtualHost string. eg virtualhost www.firewall.loc
# If not set, uses virtualhost from real or virtual server
virtualhost <STRING>
# Regular expression to search returned data against.
# A failure to match causes the check to fail.
regex <STRING>
# Reverse the sense of the match, so a match of the
# returned text causes the check to fail.
regex_no_match
# Space separated list of options for regex.
# See man pcre2api for a description of the options.
# The following options are supported:
# allow_empty_class alt_bsux auto_callout caseless
# dollar_endonly dotall dupnames extended firstline
# match_unset_backref multiline never_ucp never_utf
# no_auto_capture no_auto_possess no_dotstar_anchor
# no_start_optimize ucp ungreedy utf never_backslash_c
# alt_circumflex alt_verbnames use_offset_limit
regex_options <OPTIONS>
# For complicated regular expressions a larger stack
# may be needed, and this allows the start and maximum
# sizes in bytes to be specified. For more details see
# the documentation for pcre2_jit_stack_create()
regex_stack <START> <MAX>
# The minimum offset into the returned data to start
# checking for the regex pattern match. This can save
# processing time if the returned data is large.
regex_min_offset <OFFSET>
# The maximum offset into the returned data for the
# start of the subject match.
regex_max_offset <OFFSET>
# SSL_GET only - see SSL_GET below for description
tls_compliant
}
}
SSL_GET {
# When provided, send Server Name Indication (SNI) during SSL handshake
enable_sni
# Comply with TLS protocol - send close_notify alert
# (see SSL_set_quiet_shutdown(3) man page)
tls_compliant
}
# TCP healthchecker
TCP_CHECK {
# No additional options
}
# SMTP healthchecker
SMTP_CHECK {
# Optional string to use for the SMTP HELO request
helo_name <STRING>|<QUOTED-STRING>
}
# DNS healthchecker. Uses UDP protocol.
DNS_CHECK {
# The retry default is 3.
# DNS query type
# A|NS|CNAME|SOA|MX|TXT|AAAA
# The default is SOA
type <STRING>
# Domain name to use for the DNS query
# The default is . (dot)
name <STRING>
}
# MISC healthchecker, run a program
MISC_CHECK {
# The retry default is 0.
# External script or program
misc_path <STRING>|<QUOTED-STRING>
# Script execution timeout
misc_timeout <INTEGER>
# If misc_dynamic is set, the exit code from healthchecker
# is used to dynamically adjust the weight as follows:
# exit status 0: svc check success, weight
# unchanged.
# exit status 1: svc check failed.
# exit status 2-255: svc check success,
# then the RS weight is increased by
# (exit status - 2 - rs configured weight).
# An exit status of 10 will set the RS weight to 10. If
# the exit status subsequently changes to 20, the RS
# weight will become 20.
# If there is only one MISC_CHECK and no FILE_CHECKers
# the effect is to set the RS weight to two less than
# the exit status.
# (for example: an exit status of 255 would set
# weight to 253 if no other MISC_CHECKers or
# FILE_CHECKers were configured on the RS)
misc_dynamic
# Specify the username/groupname that the script should
# be run under.
# If GROUPNAME is not specified, the group of the user
# is used
user USERNAME [GROUPNAME]
}
# BFD instance name to check
BFD_CHECK {
name <STRING>
}
# PING healthchecker
# Note: using this checker may cause /proc/sys/net/ipv4/ping_group_range to be
# updated to allow root to use an IPPROTO_ICMP socket.
PING_CHECK {
# No additional options
}
# UDP healthchecker
# Note: for this checker to work properly, it relies on ICMP error messages such as
# HOST_UNREACH, NET_UNREACH, PORT_UNREACH. HOST_UNREACH relies on ARP requests
# timing out, and so connect_timeout should be long enough to allow for this (e.g.
# at least 4 seconds).
# If payload is specified, the HEX_STR will be sent as the UDP data, otherwise a
# random payload will be sent.
# If require_reply is specified, the received data length is checked to ensure that it
# lies between min_reply_length and max_reply_length.
# If require_reply without a hex string is specified, udp reply data must be received
# but the data content is not checked.
# If a require_reply HEX_STR is specified, the reply data will be checked against the
# HEX_STR, which must match up to the minimum of the received data length and the length
# of the require_reply HEX_STR.
# The format of HEX_STR is quite free format, for example:
# Ab12f 3 456 546443123
# would be interpreted as:
# AB 12 0F 03 45 06 54 64 43 12 03
# For the require_reply HEX_STR, a character can be specified as X or x, in which case
# the value of those 4 bits in the reply is ignored. This allows, for example, for
# some form of counter or otherwise.
# It may be that you will want to use PING_CHECK to the same server as well.
UDP_CHECK {
payload <HEX_STR>
require_reply [<HEX_STR>] # Require a reply packet for check to be successful
min_reply_length <INT> # default 0
max_reply_length <INT> # default is 255
}
# File checker
# This reads and monitors the contents of a file, where STRING is the name specified
# in the track_file configuration block (see above).
FILE_CHECK {
track_file <STRING>
# If dynamic is set, the value from the file is used
# to dynamically adjust the weight by adding the weight
# to the quorum and the LVS weight
dynamic
# The weight multiplier to apply to the value read from the file
weight <-2147483647..2147483647> [reverse]
}
}
}
# Parameters used for SSL_GET check.
# If none of the parameters are specified, the SSL context
# will be auto generated.
SSL {
# Password
password <STRING>
# CA file
ca <STRING>
# Certificate file
certificate <STRING>
# Key file
key <STRING>
}
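As an illustration of how the blocks above nest, the following is a minimal sketch of a complete virtual_server definition. The addresses, weights and timer values are hypothetical and chosen only for the example; every keyword is taken from the syntax listed above.

```
virtual_server 192.0.2.10 80 {
    delay_loop 10
    lvs_sched wrr
    lvs_method DR
    protocol TCP
    persistence_timeout 300

    # Total weight of live real servers needed to keep the service up
    quorum 2
    # Replaces the real servers when the quorum is not achieved
    sorry_server 192.0.2.100 80

    real_server 198.51.100.11 80 {
        weight 2
        HTTP_GET {
            url {
                path /
                status_code 200
            }
            connect_timeout 3
            retry 2
            delay_before_retry 2
        }
    }

    real_server 198.51.100.12 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 3
        }
    }
}
```

With these assumed weights the pool totals 3, so, following the description of quorum above, the quorum of 2 should be lost whenever the weight 2 server fails its check, at which point the sorry_server replaces the real servers.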
ADVANCED CONFIGURATION
The configuration parser has been extended to support advanced features such as conditional configuration and parameter substitution. These features are particularly useful in scripted environments where configurations are generated from templates (for example, in datacenters).
Conditional configuration and configuration id
The config-id defaults to the first part of the node name as returned by uname, and can be overridden with the -i or --config-id command line option.
Any configuration line starting with ’@’ is a conditional configuration line. The word immediately following (i.e. without any space) the ’@’ character is compared against the config-id, and if they don’t match, the configuration line is ignored.
Alternatively, ’@^’ is a negative comparison, so if the word immediately following does NOT match the config-id, the configuration line IS included.
The purpose of this is to allow a single configuration file to be used for multiple systems, where the only differences are likely to be the router_id, vrrp instance priorities, and possibly interface names and unicast addresses.
For example:
global_defs
{
@main router_id main_router
@backup router_id backup_router
}
...
vrrp_instance VRRP {
...
@main unicast_src_ip 1.2.3.4
@backup unicast_src_ip 1.2.3.5
@backup2 unicast_src_ip 1.2.3.6
unicast_peer {
@^main 1.2.3.4
@^backup 1.2.3.5
@^backup2 1.2.3.6
}
...
}
If keepalived is invoked with -i main, the router_id will be set to main_router; if invoked with -i backup, it will be set to backup_router. If it is not invoked with -i, or is invoked with -i and any other value, the router_id will not be set. The unicast peers for main will be 1.2.3.5 and 1.2.3.6.
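To make the matching rules concrete, the following sketch shows what the example above would be expected to reduce to when keepalived is invoked with -i backup. It is derived by hand from the rules described in this section rather than taken from keepalived output: lines marked @backup are kept, and lines marked @^main and @^backup2 are kept because backup matches neither word.

```
global_defs
{
    router_id backup_router
}
...
vrrp_instance VRRP {
    ...
    unicast_src_ip 1.2.3.5
    unicast_peer {
        1.2.3.4
        1.2.3.6
    }
    ...
}
```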
Parameter substitution
Substitutable parameters can be specified. The format for defining a parameter is:
$PARAMETER=VALUE
where there must be no space before the ’=’ and only whitespace may precede the ’$’. Empty values are allowed.
Parameter names can be made up of any combination of A-Za-z0-9 and _, but cannot start with a digit. Parameter names starting with an underscore should be considered reserved names that keepalived will define for various pre-defined options.
After a parameter is defined, any occurrence of $PARAMETER followed by whitespace, or any occurrence of ${PARAMETER} (which need not be followed by whitespace) will be replaced by VALUE.
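A minimal sketch of the two reference forms follows. The parameter names IFACE and VIP_NET, the instance name and the addresses are hypothetical, chosen only for illustration.

```
$IFACE=eth0
$VIP_NET=192.0.2

vrrp_instance VI_1 {
    # No braces needed: nothing follows the parameter on this line
    interface $IFACE
    virtual_router_id 51
    virtual_ipaddress {
        # Braces needed: the parameter is followed by '.'
        ${VIP_NET}.10/24
    }
}
```

Under the rules above, this should be read as interface eth0 and a virtual address of 192.0.2.10/24.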
Replacement is recursive, so that if a parameter value itself includes a replaceable parameter, then after the first substitution, the parameter in the value will then be replaced; the substitution is done at replacement time and not at definition time, so for example:
$ADDRESS_BASE=10.2.${ADDRESS_BASE_SUB}
$ADDRESS_BASE_SUB=0
${ADDRESS_BASE}.100/32
$ADDRESS_BASE_SUB=10
${ADDRESS_BASE}.100/32
will produce:
10.2.0.100/32
10.2.10.100/32
Note in the above examples the use of both ADDRESS_BASE and ADDRESS_BASE_SUB required braces ({}) since the parameters were not followed by whitespace (after the first substitution which produced 10.2.${ADDRESS_BASE_SUB}.100/32 the parameter is still not followed by whitespace).
If a parameter is not defined, it will not be replaced at all, so for example ${UNDEF_PARAMETER} will remain in the configuration if it is undefined; this means that existing configuration that contains a ’$’ character (for example in a script definition) will not be changed so long as no new parameter definitions are added to the configuration.
Parameter substitution works in conjunction with conditional configuration. For example:
@main $PRIORITY=240
@backup $PRIORITY=200
...
vrrp_instance VI_0 {
priority $PRIORITY
}
will produce:
...
vrrp_instance VI_0 {
priority 240
}
if the config_id is main.
$IF_MAIN=@main
$IF_MAIN priority 240
will produce:
priority 240
if the config_id is main and nothing if the config_id is not main, although why anyone would want to use this rather than simply the following is not known (but still possible):
@main priority 240
Multiline definitions are also supported, but when used there must be nothing on the line after the parameter name. A multiline definition is specified by ending each line except the last with a ’\’ character.
Example:
$INSTANCE= \
vrrp_instance VI_${NUM} { \
interface eth0.${NUM} \
use_vmac vrrp${NUM}.1 \
virtual_router_id 1 \
@high priority 130 \
@low priority 120 \
advert_int 1 \
virtual_ipaddress { \
10.0.${NUM}.254/24 \
} \
track_script { \
offset_instance_${NUM} \
} \
}
$NUM=0
$INSTANCE
$NUM=1
$INSTANCE
The use of multiline definitions can be nested.
Example:
$RS= \
real_server 192.168.${VS_NUM}.${RS_NUM} 80 { \
weight 1 \
inhibit_on_failure \
smtp_alert \
MISC_CHECK { \
misc_path "${_PWD}/scripts/vs.sh RS_misc.${INST}.${VS_NUM}.${RS_NUM}.0 10.0.${VS_NUM}.4:80->192.168.${VS_NUM}.${RS_NUM}:80" \
} \
MISC_CHECK { \
misc_path "${_PWD}/scripts/vs.sh RS_misc.${INST}.${VS_NUM}.${RS_NUM}.1 10.0.${VS_NUM}.4:80->192.168.${VS_NUM}.${RS_NUM}:80" \
} \
notify_up "${_PWD}/scripts/notify.sh RS_notify.${INST}.${VS_NUM}.${RS_NUM} UP 10.0.${VS_NUM}.4:80->192.168.${VS_NUM}.${RS_NUM}:80" \
notify_down "${_PWD}/scripts/notify.sh RS_notify.${INST}.${VS_NUM}.${RS_NUM} DOWN 10.0.${VS_NUM}.4:80->192.168.${VS_NUM}.${RS_NUM}:80" \
}
$VS= \
virtual_server 10.0.${VS_NUM}.4 80 { \
quorum 2 \
quorum_up "${_PWD}/scripts/notify.sh VS_notify.${INST} UP 10.0.${VS_NUM}.4:80" \
quorum_down "${_PWD}/scripts/notify.sh VS_notify.${INST} DOWN 10.0.${VS_NUM}.4:80" \
$RS_NUM=1 \
$RS \
$RS_NUM=2 \
$RS \
$RS_NUM=3 \
$RS \
}
$VS_NUM=0
$ALPHA=alpha
$VS
$VS_NUM=1
$ALPHA=
$VS
The above will create 2 virtual servers, each with 3 real servers.
Pre-defined definitions
The following definitions are pre-defined:
${_PWD} : The directory of the current configuration file (this can be changed if using the include directive).
${_INSTANCE} : The instance name (as defined by the -i option, defaults to hostname).
${_RANDOM [MIN [MAX]]} : This is replaced by a random integer in the range [MIN, MAX], where MIN and MAX are optional non-negative integers. Defaults are MIN=0 and MAX=32767.
${_HASH} : This is replaced by a ’#’ character, which would otherwise start a comment.
${_BANG} : This is replaced by a ’!’ character, which would otherwise start a comment.
Additional pre-defined definitions will be added as their need is identified. It will normally be quite straightforward to add additional pre-defined definitions, so if you need one, or have a good idea for one, then raise an issue at https://github.com/acassen/keepalived/issues requesting it.
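The fragments below sketch how some of the pre-defined definitions might be used. The file names, the script name and the track_file name are hypothetical, and the three fragments belong in different places in a configuration, as the comments indicate.

```
# Top level: a path relative to the directory of the file being read
track_file example_track {
    file "${_PWD}/track_files/example"
    weight -100
}

# Inside a real_server block: a literal hash character in a script argument
notify_up "${_PWD}/scripts/notify.sh ticket${_HASH}42"

# Inside a checker block: a random whole number of seconds from 1 to 10
warmup ${_RANDOM 1 10}
```

In the second fragment the script would be expected to receive ticket#42 as its argument, since ${_HASH} stands in for a character that would otherwise begin a comment.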
Sequence blocks
A line starting with ~SEQ(var, start, step, end) will cause the remainder of the line to be processed multiple times, with the variable $var set initially to start, and then $var will be incremented by step repeatedly, terminating when it is greater than end. step may be omitted, in which case it defaults to 1 or -1, depending on whether end is greater or less than start. start may also be omitted, in which case it defaults to 1 if end > 0 or -1 if end < 0. ~SEQx(...) is the same as ~SEQ(...), except the variable $var will be formatted in hexadecimal, which would be useful for IPv6 addresses.
Note: At the moment it is necessary to use different variables for the ~SEQ block from any previously defined variable, including one used as the variable in a previous ~SEQ block. This may change in the future, so do not rely on a ~SEQ block variable being defined after the end of the block.
Examples:
~SEQ(SUBNET, 0, 3) ip_address 10.0.${SUBNET}.1
would produce:
ip_address 10.0.0.1
ip_address 10.0.1.1
ip_address 10.0.2.1
ip_address 10.0.3.1
and
~SEQx(SUBNET, 144, 16, 192) ip_address fe80::20:${SUBNET}:1
or better
~SEQx(SUBNET, 0x90, 0x10, 0xc0) ip_address fe80::20:${SUBNET}:1
would produce:
ip_address fe80::20:90:1
ip_address fe80::20:a0:1
ip_address fe80::20:b0:1
ip_address fe80::20:c0:1
Another example:
virtual_ipaddress {
~SEQx(AD2, 0x90, 0x10, 0xc0) ~SEQx(AD1, 0x12, -1, 0x0c) fe81::10:${AD2}:${AD1}
}
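The defaults for an omitted step and start can be sketched as follows. The variable names N and M are hypothetical, and the expected results in the comments are worked out by hand from the rules above rather than taken from keepalived output. Different variables are used for the two blocks, in line with the note above.

```
# step omitted and end < start, so step defaults to -1
~SEQ(N, 5, 3) 10.0.0.${N}
# expected: 10.0.0.5, 10.0.0.4, 10.0.0.3

# start and step omitted and end > 0, so both default to 1
~SEQ(M, 3) 10.0.1.${M}
# expected: 10.0.1.1, 10.0.1.2, 10.0.1.3
```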
There can be multiple ~SEQ elements on a line, so for example:
$VI4= \
track_file offset_instance_4.${IF}.${NUM}.${ID} { \
file "${_PWD}/679/track_files/4.${IF}.${NUM}.${ID}" \
weight -100 \
} \
vrrp_instance vrrp4.${IF}.${NUM}.${ID} { \
interface bond${IF}.${NUM} \
use_vmac vrrp4.${IF}.${NUM}.${ID} \
virtual_router_id ${ID} \
priority 130 \
virtual_ipaddress { \
10.${IF}.${NUM}.${ID}/24 \
} \
track_file { \
offset_instance_4.${IF}.${NUM}.${ID} \
} \
}
~SEQ(IF,0,7) ~SEQ(NUM,0,31) ~SEQ(ID,1,254) $VI4
will produce 65024 vrrp instances with names from vrrp4.0.0.1 through to vrrp4.7.31.254.
List blocks
List blocks are similar to sequence blocks, except that the values to substitute into the variable are listed in the ~LST specification.
A line starting with ~LST(var, val1, val2, val3) will cause the remainder of the line to be processed multiple times, with the variable $var set initially to val1, and then val2, and finally val3. Any number of values can be specified, but at least one is required (although a single value would be pointless).
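A minimal sketch of the single-variable form follows. The variable name HOST and the addresses are hypothetical.

```
~LST(HOST, 10, 20, 30) 192.168.1.${HOST}
```

Following the description above, this would be expected to produce the three lines 192.168.1.10, 192.168.1.20 and 192.168.1.30, in that order.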
If it is desired to substitute more than one variable at a time, the variables and values need to be enclosed in {...} blocks. For example:
~LST({IP, IP1}, {10,1},{20,4},{5,6},{12,8}) 192.168.${IP}.${IP1}
would first set IP=10 and IP1=1, then IP=20 and IP1=4, etc, and produces:
192.168.10.1
192.168.20.4
192.168.5.6
192.168.12.8
List blocks can be nested, so:
~LST(IP, 1, 2, 3, 4) ~LST(IP1, 5,6,7) 192.169.${IP}.${IP1}
produces:
192.169.1.5
192.169.1.6
192.169.1.7
192.169.2.5
192.169.2.6
192.169.2.7
192.169.3.5
192.169.3.6
192.169.3.7
192.169.4.5
192.169.4.6
192.169.4.7
Finally, list blocks and sequence blocks can be combined, so:
~LST({IP, IP1}, {10,1},{20,4},{5,6},{12,8}) ~SEQ(IP2,168,2,172) 192.${IP2}.${IP}.${IP1}
produces:
192.168.10.1
192.170.10.1
192.172.10.1
192.168.20.4
192.170.20.4
192.172.20.4
192.168.5.6
192.170.5.6
192.172.5.6
192.168.12.8
192.170.12.8
192.172.12.8
KERNEL SETTINGS
If proxy_arp and proxy_arp_pvlan are enabled on an interface that has VIPs or eVIPs configured on it, ARP requests can receive incorrect replies, because the proxy replies to the request as well as the keepalived host. Both settings need to be set to 0 for keepalived to function properly.
AUTHORS
Initial by Joseph Mack. Extensive updates by Alexandre Cassen & Quentin Armitage.
SEE ALSO
ipvsadm(8), ip --help.