ipfw − IP firewall
int setsockopt (int socket, IPPROTO_IP, int command, void *data, int length)
The IP firewall facilities in the Linux kernel provide mechanisms for accounting IP packets, for building firewalls based on packet-level filtering, for building firewalls using transparent proxy servers (by redirecting packets to local sockets), and for masquerading forwarded packets. The administration of these functions is maintained in the kernel as a series of separate lists (hereafter referred to as chains) each containing zero or more rules. There are three builtin chains which are called input, forward and output which always exist. All other chains are user defined. A chain is a sequence of rules; each rule contains specific information about source and destination addresses, protocols, port numbers, and some other characteristics. Information about what to do if a packet matches the rule is also contained. A packet will match with a rule when the characteristics of the rule match those of the IP packet.
A packet always
traverses a chain starting at rule number 1. Each rule
specifies what to do when a packet matches. If a packet does
not match a rule, the next rule in that chain is tried. If
the end of a builtin chain is reached the default policy for
that chain is returned. If the end of a user defined chain
is reached then the rule after the rule which branched to
that chain is tried. The purpose of the three builtin chains
These rules regulate the acceptance of incoming IP packets. All packets coming in via one of the local network interfaces are checked against the input firewall rules (locally-generated packets are considered to come from the loopback interface). A rule which matches a packet will cause the rule’s packet and byte counters to be incremented appropriately.
These rules define the permissions for forwarding IP packets. All packets sent by a remote host having another remote host as destination are checked against the forwarding firewall rules. A rule which matches will cause the rule’s packet and byte counters to be incremented appropriately.
These rules define the permissions for sending IP packets. All packets that are ready to be be sent via one of the local network interfaces are checked against the output firewall rules. A rule which matches will cause the rule’s packet and byte counters to be incremented appropriately.
Each of the firewall rules contains either a branch name or a policy, which specifies what action has to be taken when a packet matches with the rule. There are five different policies possible: ACCEPT (let the packet pass the firewall), REJECT (do not accept the packet and send an ICMP host unreachable message back to the sender as notification), DENY (sometimes referred to as block; ignore the packet without sending any notification), REDIRECT (redirected to a local socket - input rules only) and MASQ (pass the packet, but perform IP masquerading - forwarding rules only).
The last two are special; for REDIRECT, the packet will be received by a local process, even if it was sent to another host and/or another port number. This function only applies to TCP or UDP packets.
For MASQ, the sender address in the IP packets is replaced by the address of the local host and the source port in the TCP or UDP header is replaced by a locally generated (temporary) port number before being forwarded. Because this administration is kept in the kernel, reverse packets (sent to the temporary port number on the local host) are recognized automatically. The destination address and port number of these packets will be replaced by the original address and port number that was saved when the first packet was masqueraded. This function only applies to TCP or UDP packets.
There is also a special target RETURN which is equivalent to falling off the end of the chain.
This paragraph describes the way a packet goes through the firewall. Packets received via one of the local network interfaces will pass the following chains:
input firewall (incoming device) Here, the device (network interface) that is used when trying to match a rule with an IP packet is listed between brackets. After this step, a packet will optionally be redirected to a local socket. When a packet has to be forwarded to a remote host, it will also pass the next set of rules: forwarding firewall (outgoing device) After this step, a packet will optionally be masqueraded. Responses to masqueraded packets will never pass the forwarding firewall (but they will pass both the input and output firewalls). All packets sent via one of the local network interfaces, either locally generated or being forwarded, will pass the following sets of rules: output firewall (outgoing device)
When a packet enters one of the three above chains rules are traversed from the first rule in order. When analysing a rule one of three things may occur.
If a rule is unmatched then the next rule in that chain is analysed. If there are no more rules for that chain the default policy for that chain is returned (or traversal continues back at the calling chain, in the case of a user-defined chain).
Rule matched (with branch to chain):
When a rule is matched by a packet and the rule contains a branch field then a jump/branch to that chain is made. Jumps can only be made to user defined chains. As described above, when the end of a builtin chain is reached then a default policy is returned. If the end of a used defined chain is reached then we return to the rule from whence we came.
There is a reference counter at the head of each chain which determines the number of references to that chain. The reference count of a chain must be zero before it can be deleted to ensure that no branches are effected. To ensure the builtin chains are never deleted their reference count is initialised to one. Also since no branches to builtin chains can be made, their reference counts are always one. The reference count on user defined chains are initialised to zero and are changed accordingly when rules are inserted, deleted etc.
Multiple jumps to different chains are possible which unfortunately make loops possible. Loop detection is therefore provided. Loops are detected when a packet tries to re-enter a chain it is already traversing. An example of a simple loop that could be created is if we set up two user defined chains called "test1" and "test2". We firstly insert a rule in the "input" chain which jumps to "test1". We then create a rule in the "test1" chain which points to "test2" and a rule in "test2" which points to "test1". Here we have obviously created a loop. When a packet then enters the input chain it will branch to the "test1" chain and then to the "test2" chain. From here it will try to branch back to the "test1" chain. A message in the syslog will be recorded along with the path which the packet traversed, to assist in debugging firewall rules.
Rule matched (special branch):
The special labels ACCEPT, DENY, REJECT, REDIRECT, MASQ or RETURN can be given which specify the immediate fate of the packet as discussed above. If no label is specified then the next rule in the chain is analysed.
Using this last option (no label) an accounting chain can be created. If each of the rules in this accounting chain have no branch or label then the packet will always fall through to the end of the chain and then return to the calling chain. Each rule that matches in the accounting chain will have its byte and packet counters incremented as expected. This accounting chain can be branched to from any other chain (eg input, forward or output chain). This is a very neat way of performing packet accounting.
The firewall administration can be changed via calls to setsockopt(2). The existing rules can be inspected by looking at two files in the /proc/net directory: ip_fwchains, ip_fwnames. These two files are readable only by root. The current administration related to masqueraded sessions can be found in the file ip_masquerade in the same directory.
Command for changing and setting up chains and rules is ipchains(8) Most commands require some additional data to be passed. A pointer to this data and the length of the data are passed as option value and option length arguments to setsockopt. The following commands are available:
This command allows a rule to be inserted in a chain at a given position (where 1 is considered the start of the chain). If there is already a rule in that position, it is moved one slot, as are any following rules in that chain. The reference count of any chains referenced by this inserted rule are incremented appropriately. The data passed with this command is an ip_fwnew structure, defining the position, chain and contents of the new rule.
Remove the first rule matching the specification from the given chain. The data passed with this command is an ip_fwchange structure, defining the rule to be deleted and its chain. The reference count of any chains referenced by this deleted rule are decremented appropriately. Note that the fw_mark field is currently ignored in rule comparisons (see the BUGS section).
Remove a rule from one of the chains at a given rule number (where 1 means the first rule). The data passed with this command is an ip_fwdelnum structure, defining the rule number of the rule to be deleted and its chain. The reference count of any chains referenced by this deleted rule are decremented appropriately.
Reset the packet and byte counters in all rules of a chain. The data passed with this command is an ip_chainlabel which defines the chain which is to be operated on. See also the description of the /proc/net files for a way to atomically list and reset the counters.
Remove all rules from a chain. The data passed with this command is an ip_chainlabel which defines the chain to be operated on.
Replace a rule in a chain. The new rule overwrites the rule in the given position. Any chains referenced by the new rule are incremented and chains referenced by the overwritten rule are decremented. The data passed with this command is an ip_fwnew structure, defining the contents of the new rule, the the chain name and the position of the rule in that chain.
Insert a rule at the end of one of the chains. The data passed with this command is an ip_fwchange structure, defining the contents of the new rule and the chain to which it is to be appended. Any chains referenced by this new rule have their refcount incremented.
Set the timeout values used for masquerading. The data passed with this command is a structure containing three fields of type int, representing the timeout values (in jiffies, 1/HZ second) for TCP sessions, TCP sessions after receiving a FIN packet, and UDP packets, respectively. A timeout value 0 means that the current timeout value of the corresponding entry is preserved.
Check whether a packet would be accepted, denied, rejected, redirected or masqueraded by a chain. The data passed with this command is an ip_fwtest structure, defining the packet to be tested and the chain which it is to be test on. Both builtin and user defined chains can be tested.
Create a chain. The data passed with this command is an ip_chainlabel defining the name of the chain to be created. Two chains can not have the same name.
Delete a chain. The data passed with this command is an ip_chainlabel defining the name of the chain to be deleted. The chain must not be referenced by any rule (ie. refcount must be zero). The chain must also be empty which can be achieved using IP_FW_FLUSH.
Changes the default policy on a builtin rule. The data passed with this command is an ip_fwpolicy structure, defining the chain whose policy is to be changed and the new policy. The chain must be a builtin chain as user-defined chains don’t have default policies.
ip_fw structure contains the following relevant
fields to be filled in for adding or replacing a rule:
struct in_addr fw_src, fw_dst
Source and destination IP addresses.
struct in_addr fw_smsk, fw_dmsk
Masks for the source and destination IP addresses. Note that a mask of 0.0.0.0 will result in a match for all hosts.
Name of the interface via which a packet is received by the system or is going to be sent by the system. If the option IP_FW_F_WILDIF is specified, then the fw_vianame need only match the packet interface up to the first NUL character in fw_vianame. This allows wildcard-like effects. The empty string has a special meaning: it will match with all device names.
Flags for this rule. The flags for the different options can be bitwise or’ed with each other.
The options are: IP_FW_F_TCPSYN (only matches with TCP packets when the SYN bit is set and both the ACK and RST bits are cleared in the TCP header, invalid with other protocols), The option IP_FW_F_MARKABS is described under the fw_mark entry. The option IP_FW_F_PRN can be used to list some information about a matching packet via printk(). The option IP_FW_F_FRAG can be used to specify a rule which applies only to second and succeeding fragments (initial fragments can be treated like normal packets for the sake of firewalling). Non-fragmented packets and initial fragments will never match such a rule. Fragments do not contain the complete information assumed for most firewall rules, notably ICMP type and code, UDP/TCP port numbers, or TCP SYN or ACK bits. Rules which try to match packets by these criteria will never match a (non-first) fragment. The option IP_FW_F_NETLINK can be specified if the kernel has been compiled with CONFIG_IP_FIREWALL_NETLINK enabled. This means that all matching packets will be sent out the firewall netlink device (character device, major number 36, minor number 3). The output of this device is four bytes indicating the total length, four bytes indicating the mark value of the packet (as described under fw_mark above), a string of IFNAMSIZ characters containing the interface name for the packet, and then the packet itself. The packet is truncated to fw_outputsize bytes if it is longer.
This field is a set of flags used to negate the meaning of other fields, eg. to specify that a packet must NOT be on an interface. The valid flags are IP_FW_INV_SRCIP (invert the meaning of the fw_src field) IP_FW_INV_DSTIP (invert the meaning of fw_dst) IP_FW_INV_PROTO (invert the meaning of fw_proto) IP_FW_INV_SRCPT (invert the meaning of fw_spts) IP_FW_INV_DSTPT (invert the meaning of fw_dpts) IP_FW_INV_VIA (invert the meaning of fw_vianame) IP_FW_INV_SYN (invert the meaning of fw_flg & IP_FW_F_TCPSYN) IP_FW_INV_FRAG (invert the meaning of fw_flg & IP_FW_F_FRAG). It is illegal (and useless) to specify a rule that can never be matched, by inverting an all-inclusive set. Note also, that a fragment will never pass any test on ports or SYN, even an inverted one.
The protocol that this rule applies to. The protocol number 0 is used to mean ’any protocol’.
__u16 fw_spts, fw_dpts
These fields specify the range of source ports, and the range of destination ports respectively. The first array element is the inclusive minimum, and the second is the inclusive maximum. Unless the rule specifies a protocol of TCP, UDP or ICMP, the port range must be 0 to 65535. For ICMP, the fw_spts field is used to check the ICMP type, and the fw_dpts field is used to check the ICMP code.
This field must be zero unless the target of the rule is "REDIRECT". Otherwise, if this redirection port is 0, the destination port of a packet will be used as the redirection port.
This field indicates a value to mark the skbuff with (which contains the administration data for the matching packet). This is currently unused, but could be used to control how individual packets are treated. If the IP_FW_F_MARKABS flag is set then the value in fw_mark simply replaces the current mark in the skbuff, rather than being added to the current mark value which is normally done. To subtract a value, simply use a large number for fw_mark and 32-bit wrap-around will occur.
__u8 fw_tosand, fw_tosxor
These 8-bit masks define how the TOS field in the IP header should be changed when a packet is accepted by the firewall rule. The TOS field is first bitwise and’ed with fw_tosand and the result of this will be bitwise xor’ed with fw_tosxor. Obviously, only packets which match the rule have their TOS effected. It is the responsibility of the user that packets with invalid TOS bits are not created using this option.
ip_fwuser structure, used when calling some of the
above commands contains the following fields:
struct ip_fw ipfw
ip_chainlabel label This is the label of the chain which is to be operated on.
ip_fwpkt structure, used when checking a packet,
contains the following fields:
struct iphdr fwp_iph
The IP header. See <linux/ip.h> for a detailed description of the iphdr structure.
struct udphdr fwp_protoh.fwp_udph
struct icmphdr fwp_protoh.fwp_icmph
The TCP, UDP, or ICMP header, combined in a union named fwp_protoh. See <linux/tcp.h>, <linux/udp.h>, or <linux/icmp.h> for a detailed description of the respective structures.
struct in_addr fwp_via
The interface address via which the packet is pretended to be received or sent.
The ability to add in extra chains other than just the standard input, output and forward chains is very powerful. The ability to branch to any chain makes the replication of rules unnecessary. Accounting becomes automatic as a single chain can be referenced by all builtin chains to do the accounting.
Fragments must now be handled explicitly; previously second and succeeding fragments were passed automatically.
The lowest TOS bit (MBZ) could not be effected previously; the kernel used to silently mask out any attempted manipulation of the lowest TOS bit. (’’So now you know how to do it - DON’T.’’).
The packet and byte counters are now 64-bit on 32-bit machines (actually presented as two 32-bit values).
The ability to specify an interface by an IP address was obsoleted by the ability to specify it by name; the combination of the two was error-prone and so only an interface name can now be used.
The old IP_FW_F_TCPACK flag was made obsolete by the ability to invert the IP_FW_F_TCPSYN flag.
The old IP_FW_F_BIDIR flag made the kernel code complex and is no longer supported.
The ability to specify several ports in one rule was messy and didn’t win much, so has been removed.
On success (or a straightforward packet accept for the CHECK options), zero is returned. On error, -1 is returned and errno is set appropriately. See setsockopt(2) for a list of possible error values. ENOENT indicates that the given chain name doesn’t exist. When the check packet command is used, zero is returned when the packet would be accepted without redirection or masquerading. Otherwise, -1 is returned and errno is set to ECONNABORTED (packet would be accepted using redirection), ECONNRESET (packet would be accepted using masquerading), ETIMEDOUT (packet would be denied), ECONNREFUSED (packet would be rejected), ELOOP (packet got into a loop), ENFILE
(packet fell off end of chain; only occurs for user defined chains).
directory /proc/net there are two entries to list the
currently defined rules and chains:
(for IP firewall chain names) One line per chain. Each line contains the chain name, policy, the number of references to that chain and the packet and byte counters which have matched the policy (represented as two pairs of 32-bit numbers; most significant 32-bits first).
(for IP firewall chains) One line per rule; rules are listed one chain at a time (from first to last as they appear in /proc/net/ip_fwnames) and in order from first to last down each chain.
The fields are: the chain name for that rule, source address and mask, destination address and mask, interface name (or "-"), the fw_flg field, the fw_invflg field, protocol number, packet and byte counters, the source and destination port ranges, the TOS and-mask, the TOS xor-mask, the fw_redirpt field, the fw_mark field, the fw_outputsize field, and the target (label). The IP addresses and masks are listed as eight hexadecimal digits, the TOS masks are listed as two hexadecimal digits preceded by the letters A and X, respectively, the fw_mark, fw_flg and fw_invflg fields are listed in hex, and the other values are represented in decimal format. The packet and bytes counters are represented as two space-separated 32-bit numbers, representing the most and least significant words respectively. Individual fields are separated by white space, by a "/" (the address and the corresponding mask), by "->" (the source and destination address/mask pairs), or "-" (the ranges for source and destination ports).
These files may also be opened in read/write mode. In that case, the packet and byte counters in all the rules of that category will be reset to zero after listing their current values.
The file /proc/net/ip_masquerade contains the kernel administration related to masquerading. After a header line, each masqueraded session is described on a separate line with the following entries, separated by white space or by ’:’ (the address/port number pairs): protocol name ("TCP" or "UDP"), source IP address and port number, destination IP address and port number, the new port number, the initial sequence number for adding a delta value, the delta value, the previous delta value, and the expire time in jiffies (1/HZ second). All addresses and numeric values are in hexadecimal format, except the last three entries, being represented in decimal format.
The setsockopt(2) interface is a crock. This should be put under /proc/sys/net/ipv4 and the world would be a better place.
There is no way to read and reset a single chain; stop packets traversing the chain and then list, reset and restore traffic.
The packet and byte counters should be presented in /proc as a single 64-bit value, not two 32-bit values.
The "fw_mark" field isn’t used for deletions of matching rules. This is to facilitate the ipfwadm compatibility script. Similarly, the IP_FW_F_MARKABS flag is ignored in comparisons.