IPSec is an IETF-standardized technology that provides secure
communications over the Internet by securing data traffic at the IP layer. IPSec
is essential on the Internet because IP datagrams are not secure by
themselves: their source address can be spoofed, their content can be
sniffed or modified, and many more vulnerabilities exist.
There are dozens of RFCs and articles that explain the IPSec protocols in depth. Although the major part of IPSec is embedded inside operating systems like Linux, its implementation details are rarely documented. This article is an attempt to peek into the details of the IPSec framework of the Linux kernel TCP/IP stack.
Basics at a Glance
IPsec is an end-to-end security scheme operating in the Internet Layer of the Internet Protocol Suite, while some other Internet security systems in widespread use, such as Transport Layer Security (TLS) and Secure Shell (SSH), operate in the upper layers of the TCP/IP model. Hence, IPsec can protect any application traffic across an IP network, and applications do not need to be specifically designed to use IPsec. Only the sender and the receiver have to be IPsec-compliant; the rest of the network can be ordinary IP.
Before we jump into the implementation details, let's have a quick
review of the essential IPSec terminology: AH, ESP,
transport mode, tunnel mode, security association, security policy, the IKE
protocol, IPSec configuration, etc. It may sound complicated at this point,
so let's run through each of them now.
Internet Protocol Security (IPSec) can be achieved using two protocols, namely AH (Authentication Header) or
ESP (Encapsulating Security Payload). IPSec also supports two modes of operation, namely transport
mode and tunnel mode. Based on the requirements, an admin chooses one of these
protocols and one of these modes while configuring IPSec.
Authentication Header (AH) protocol
This is one of the two IPSec protocols. AH provides source authentication and data integrity for IP datagrams, but it is not designed to provide confidentiality.
Encapsulating Security Payload (ESP) protocol
This is another widely used IPSec protocol. It provides source authentication, data integrity, and confidentiality.
IPSec operation modes:
IPSec operates in two different modes namely transport mode and tunnel mode.
In transport mode, usually only the payload of the IP packet is encrypted and/or authenticated. Routing is intact, since the IP addresses in the header are neither modified nor encrypted.
In tunnel mode, by contrast, the entire IP packet is encrypted and/or authenticated and then encapsulated into a new IP packet with a new IP header. Tunnel mode is used to create virtual private networks for network-to-network communications (e.g. between routers to link sites), host-to-network communications (e.g. remote user access) and host-to-host communications (e.g. private chat).
Security policy
A security policy is a rule that decides whether a given flow needs IPSec processing (a policy can also simply allow or discard the flow). If no policy matches, the packet takes the default path through the network stack.
Security association
Another important concept of the IP security architecture is the security association. A security association is simply the bundle of algorithms and parameters (such as keys) that is used to encrypt and authenticate a particular flow in one direction. IPsec uses the Security Parameter Index (SPI: an index into the security association database, SADB), along with the destination address in the packet header, to uniquely identify the security association for that packet. The Linux kernel's xfrm_state_lookup() also keys on the protocol (AH or ESP).
In Linux, the IPSec implementation is spread across the user space and kernel space of the
operating system. The kernel-space
part does the actual IPSec processing; we will look into it shortly.
The User space part of IPSec Architecture
In order to establish IPSec between two network entities (two routers, two hosts, etc.), both parties must agree on a few parameters, such as the key exchange algorithm, key lifetime, and the encryption, data integrity, and authentication algorithms. This can be achieved either by manual configuration or by a popular protocol called IKE (Internet Key Exchange). For bigger networks, manually configuring each node is a tedious task.
setkey (part of ipsec-tools) is a tool commonly used for manual IPSec configuration; it reads its commands from a file:
setkey -f <conf file>
Example config file for manual configuration:
/*####################################################*/
# Security policy
spdadd 172.16.1.0/24 172.16.2.0/24 any -P out ipsec esp/tunnel/192.168.1.100-172.16.2.23/require;
# Security association (SA)
add 192.168.1.100 172.16.2.23 esp 0x201 -m tunnel -E 3des-cbc
    0x7aeaca3f87d060a12f4a4487d5a5c3355920fae69a96c831 -A hmac-md5 0xc0291ff014dccdd03874d9e8e4cdf3e6;
/*###################################################*/
If an IKE daemon is used, the SA block above is not required,
as the SAs are negotiated and installed by the daemon. Popular Linux IKE implementations
include racoon, Openswan, and strongSwan. The following discussion assumes
that IPsec has been configured via one of the above methods.
Uspace – Kspace Communication: NETLINK_XFRM
Since part of IPSec lives in user space and part in kernel space, there must be a reliable communication mechanism between them. This is provided by a netlink socket family called NETLINK_XFRM (the older PF_KEY interface, implemented in net/key/af_key.c and used by setkey and racoon, is also supported).
If you want to debug
at this level, you may start with xfrm_netlink_rcv(), the kernel
function that receives messages on this netlink socket.
Table-1 shows the important netlink messages that XFRM supports for managing the SAD and SPD from user space.
Message | Purpose
XFRM_MSG_NEWSA | Add a new SA to the SAD
XFRM_MSG_DELSA | Delete an SA from the SAD
XFRM_MSG_GETSA | Get an SA (or dump all SAs) from the SAD
XFRM_MSG_FLUSHSA | Flush the SAD
XFRM_MSG_NEWPOLICY | Add a new policy to the SPD
XFRM_MSG_DELPOLICY | Delete a policy from the SPD
XFRM_MSG_GETPOLICY | Get a policy (or dump all policies) from the SPD
XFRM_MSG_FLUSHPOLICY | Flush the SPD
Table-1: XFRM netlink messages
Now let's delve into the IPSec framework in the kernel
We will look into the packet transmission and reception flow of an IPSec-enabled kernel. We will also see the SPD and SAD lookups and, of course, the important kernel objects involved.
XFRM framework of the kernel:
This is the 'IPSec coordinator' in the kernel. The actual IPSec processing happens inside this framework, which internally calls the protocol-specific implementations of the AH and ESP protocols (e.g. net/ipv4/esp4.c, net/ipv6/esp6.c). Though most of the XFRM framework (net/xfrm) is common to IPv4 and IPv6, the protocol-to-XFRM linking part is implemented in net/ipv4/xfrm4_policy.c and net/ipv6/xfrm6_policy.c.
XFRM initialization is done by two methods, xfrm4_init() and xfrm6_init().
Kernel cryptography:
The kernel's crypto layers (including the asynchronous crypto API, cryptd, and pcrypt, which parallelizes crypto work on multicore systems) already implement most common algorithms (DES, 3DES, AES, CAST, Blowfish, etc.). Two IPSec stacks have been used with Linux: the native NETKEY (XFRM) stack in the mainline kernel, and the older KLIPS stack from FreeS/WAN-derived projects such as Openswan. So an IPSec developer may not need to know all the complicated mathematics of cryptography, but can just call the crypto APIs :-).
To start with, the core object of XFRM is the 'xfrm' member
of 'struct net', i.e. each network namespace has its own xfrm object.
This object is used to access the hash tables of the SPD and SAD (remember the hash tables :) ),
and it also holds the state garbage collector (state_gc_work).
Data structures
The information in the SPD indicates
“what” to do with a datagram; the information in the SAD indicates “how” to do it.
The building block of the SPD (Security Policy Database) is struct xfrm_policy.
/* ################################################# */
struct xfrm_policy {
#ifdef CONFIG_NET_NS
struct net *xp_net;
#endif
struct hlist_node bydst;
struct hlist_node byidx;
/* This lock only affects elements except for entry. */
rwlock_t lock;
atomic_t refcnt;
struct timer_list timer;
struct flow_cache_object flo;
atomic_t genid;
u32 priority;
u32 index;
struct xfrm_mark mark;
struct xfrm_selector selector;
struct xfrm_lifetime_cfg lft;
struct xfrm_lifetime_cur curlft;
struct xfrm_policy_walk_entry walk;
struct xfrm_policy_queue polq;
u8 type;
u8 action;
u8 flags;
u8 xfrm_nr;
u16 family;
struct xfrm_sec_ctx *security;
struct xfrm_tmpl xfrm_vec[XFRM_MAX_DEPTH];
};
Important Fields:
- refcnt: reference count held by the users of the policy.
- selector: the embedded xfrm_selector object holds the source and destination IP addresses, source and destination ports, protocol, interface index, etc. xfrm_selector_match() checks whether a given flow matches the selector.
- lft: the policy lifetime configuration (curlft tracks the current usage).
- timer: handles policy expiry.
- polq: a queue that holds packets while no states are yet associated with this policy.
- action: decides the fate of the traffic (XFRM_POLICY_ALLOW or XFRM_POLICY_BLOCK).
- family: IPv4 or IPv6; as mentioned, this structure is common to both protocols.
The building block of the SAD (Security Association Database) is struct xfrm_state.
/* #################################################### */
/* Full description of state of transformer. */
struct xfrm_state {
#ifdef CONFIG_NET_NS
struct net *xs_net;
#endif
union {
struct hlist_node gclist;
struct hlist_node bydst;
};
struct hlist_node bysrc;
struct hlist_node byspi;
atomic_t refcnt;
spinlock_t lock;
struct xfrm_id id;
struct xfrm_selector sel;
/* Key manager bits */
struct xfrm_state_walk km;
/* Parameters of this state. */
struct {
u32 reqid;
u8 mode;
u8 replay_window;
u8 aalgo, ealgo, calgo;
u8 flags;
u16 family;
xfrm_address_t saddr;
int header_len;
int trailer_len;
u32 extra_flags;
} props;
struct xfrm_lifetime_cfg lft;
/* Data for transformer */
struct xfrm_algo_auth *aalg;
struct xfrm_algo *ealg;
struct xfrm_algo *calg;
struct xfrm_algo_aead *aead;
/* Data for encapsulator */
struct xfrm_encap_tmpl *encap;
/* ... (fields omitted) ... */
/* data for replay detection */
struct xfrm_replay_state replay;
struct xfrm_replay_state_esn *replay_esn;
struct xfrm_replay_state preplay;
struct xfrm_replay_state_esn *preplay_esn;
struct xfrm_replay *repl;
u32 replay_maxage;
u32 replay_maxdiff;
struct timer_list rtimer;
/* Statistics */
struct xfrm_stats stats;
struct xfrm_lifetime_cur curlft;
struct tasklet_hrtimer mtimer;
/* Last used time */
unsigned long lastused;
/* ... (fields omitted) ... */
/* Private data of this transformer, format is opaque,
 * interpreted by xfrm_type methods. */
void *data;
};
/* ###################################################### */
IPSec kernel APIs:
Function | Purpose
xfrm_lookup() | XFRM lookup (SPD and SAD)
xfrm_input() | XFRM processing for an ingress packet
xfrm_output() | XFRM processing for an egress packet
xfrm4_rcv() | IPv4-specific Rx method
xfrm6_rcv() | IPv6-specific Rx method
esp_input() | ESP processing for an ingress packet
esp_output() | ESP processing for an egress packet
ah_input() | AH processing for an ingress packet
ah_output() | AH processing for an egress packet
xfrm_policy_alloc() | Allocate an SPD object
xfrm_policy_destroy() | Free an SPD object
xfrm_policy_lookup() | SPD lookup
xfrm_policy_byid() | SPD lookup based on the policy index (id)
xfrm_policy_insert() | Add an entry to the SPD
xfrm_policy_delete() | Remove an entry from the SPD and release the policy object's resources
xfrm_bundle_create() | Create an XFRM bundle
xfrm_state_alloc() | Allocate an SAD object
xfrm_state_add() | Add an entry to the SAD
xfrm_state_delete() | Remove an entry from the SAD
xfrm_state_lookup_byaddr() | SAD lookup based on source and destination addresses
xfrm_state_find() | SAD lookup based on destination address
xfrm_state_lookup() | SAD lookup based on SPI
Table-2: XFRM APIs
Kernel Code flow:
Database lookup
The main API used for the IPSec lookup is xfrm_lookup(), which
internally does both the SPD lookup and the SAD lookup. It is called once the routing decision
has been taken, i.e. the route's dst_entry already exists. If the lookup succeeds,
xfrm_lookup() returns a bundle dst_entry whose output method is xfrm4_output()/xfrm6_output()
(which in turn calls xfrm_output()), so the packet's skb->dst->output leads into XFRM.
To make the lookup faster for subsequent packets, important
information such as the route entry (IPv4 or IPv6) and the matching policy is
cached by calling xfrm_bundle_create(). struct xfrm_dst is the object
used for this XFRM cache.
- The main APIs used for SPD lookup are xfrm_policy_lookup() and xfrm_policy_byid(). xfrm_policy_lookup() matches the flow's source and destination IP addresses, source and destination ports, protocol, and interface index against each policy's selector, while xfrm_policy_byid() fetches a policy by its index. More APIs are listed in Table-2 above.
- State lookup can be done in three ways: by SPI, by destination address, or by source address. XFRM maintains three hash tables per namespace (struct net) for this. The APIs are listed in Table-2.
Tables 3 and 4 walk you through the kernel methods involved in IPSec during packet transmission and reception, respectively.
IPSec in packet transmission
For better understanding, I have divided the IPSec transmission process into 9 steps:
Step-1: transport_layer_sendmsg()
    TCP/UDP-specific work (e.g. in tcp_sendmsg() or udp_sendmsg()) is done here before the route lookup.
Step-2: ip_route_output_slow() and xfrm_lookup()
    Route lookup, followed by the IPSec (SPD/SAD) lookup.
Step-3: ip_local_output()
Step-4: ip_local_out()
    The LOCAL_OUT netfilter hook applies here.
    Calls skb->dst->output(), which is xfrm4_output() for IPv4 and xfrm6_output() for IPv6.
Step-5: xfrm4_output()/xfrm6_output()
Step-6: esp_output()/ah_output()
Step-7: ip_output()
Step-8: dev_queue_xmit()
    Egress QoS is applied here.
Step-9: dev->netdev_ops->ndo_start_xmit()
Table-3: IPSec Tx steps
IPSec in packet reception
For better understanding, I have divided the IPSec reception process into 10 steps:
Step-1: netif_receive_skb()
Step-2: ip_rcv()
    The PRE_ROUTING netfilter hook applies here.
Step-3: ip_rcv_finish()
    Calls ip_route_input_noref(), which finds the route entry and sets dst->input for local delivery, forwarding, etc. IPSec is processed only at the IPSec endpoints, so we care about the case where the route says local delivery.
Step-4: ip_local_deliver()
    The LOCAL_IN netfilter hook applies here.
Step-5: ip_local_deliver_finish()
    Based on the protocol field of the IP header (IPPROTO_AH, IPPROTO_ESP), the packet is handed to xfrm4_rcv().
Step-6: xfrm4_rcv()
Step-7: xfrm_input()
    Calls xfrm_state_lookup(), then esp_input()/ah_input().
    The PRE_ROUTING netfilter hook is applied once again, this time to the decapsulated packet.
Step-8: xfrm4_rcv_encap_finish()
    Does the route lookup again for the decapsulated packet using ip_route_input_noref(); again, the route should point to local delivery.
Step-9: ip_local_deliver()
    The LOCAL_IN netfilter hook runs again, for the decapsulated packet. Now the protocol field is TCP/UDP, so the packet flows into the native TCP/UDP reception methods and is delivered to the socket.
Step-10: transport_layer_rcvmsg()
    Data is copied to user space.
Table-4: IPSec Rx steps
Here I conclude this document. We have briefly covered the various building blocks of IPSec, including the XFRM framework, essential data structures, APIs, and code flow. I hope this helps you build a platform to dig further into the IPSec feature of the kernel stack.
Thanks for a very useful article.
Can you please explain the purpose of the following commands and the differences between them?
ip xfrm state deleteall
ip xfrm state flush
According to the man page :
ip xfrm state deleteall - delete all existing state in xfrm
ip xfrm state flush - flush all state in xfrm
How can we have "non-existing" states in the SAD?