Sunday, 19 October 2014

IPSec Implementation in Linux Kernel Stack

IPSec is an IETF-standardized technology that provides secure communication over the Internet by protecting data traffic at the IP layer. It is essential because IP datagrams are not secure by themselves: the source address can be spoofed, the contents can be sniffed or modified in transit, and many other vulnerabilities exist.

Dozens of RFCs and articles explain the IPSec protocols in depth. Although a major part of IPSec is implemented inside operating systems such as Linux, its implementation details are rarely documented. This article is an attempt to peek into the IPSec framework of the Linux kernel TCP/IP stack.

Basics at a Glance 

IPsec is an end-to-end security scheme operating in the Internet Layer of the Internet Protocol Suite, while some other Internet security systems in widespread use, such as Transport Layer Security (TLS) and Secure Shell (SSH), operate in the upper layers of the TCP/IP model. Hence, IPsec can protect any application traffic across an IP network, and applications do not need to be specifically designed to use it. Only the sender and the receiver have to be IPsec compliant; the rest of the network can be plain IP.
Before we jump into implementation details, let's quickly review the essential terminology of IPSec: AH, ESP, transport mode, tunnel mode, security association, security policy, the IKE protocol and IPSec configuration. It may sound complicated at this point, so let's run through each of them.
IPSec protection is provided by one of two protocols, AH (Authentication Header) or ESP (Encapsulating Security Payload), and it supports two modes of operation, transport mode and tunnel mode. Based on the requirement, an admin chooses one of the protocols and one of the modes while configuring IPSec.

Authentication Header (AH) protocol
This is one of the two IPSec protocols. AH provides source authentication and data integrity for IP datagrams, but it is not designed to provide confidentiality.

Encapsulating Security Payload (ESP) protocol
This is the other, more widely used IPSec protocol. It provides source authentication, data integrity and confidentiality.

IPSec operation modes:
IPSec operates in two different modes, namely transport mode and tunnel mode.
In transport mode, usually only the payload of the IP packet is encrypted and/or authenticated. Routing is unaffected, since the IP header is neither modified nor encrypted.
In tunnel mode, the entire IP packet is encrypted and/or authenticated and then encapsulated into a new IP packet with a new IP header. Tunnel mode is used to create virtual private networks for network-to-network communications (e.g. between routers to link sites), host-to-network communications (e.g. remote user access) and host-to-host communications (e.g. private chat).

Security policy
A security policy is a rule that decides whether a given flow needs IPSec processing or not. If no policy matches, the packet takes the default path through the network stack.

Security association
Another important concept of the IP security architecture is the security association (SA). A security association is simply the bundle of algorithms and parameters (such as keys) used to encrypt and authenticate a particular flow in one direction, so bidirectional traffic needs a pair of SAs. IPsec uses the Security Parameter Index (SPI), an index into the Security Association Database (SAD, also called SADB), along with the destination address in the packet header and the security protocol (AH or ESP), to uniquely identify the security association for a packet.
The IPSec implementation in Linux is split between user space and kernel space. The kernel part does the actual IPSec processing; we will look into it shortly.

The User space part of IPSec Architecture
In order to establish IPSec between two network entities (two routers, two hosts etc.), both parties must agree on a few parameters such as the key exchange algorithm, key lifetime, and the encryption, data integrity and authentication algorithms. This can be achieved either by manual configuration or by a popular key exchange protocol called IKE. For bigger networks, manually configuring each node is a tedious task.
setkey is the tool used for configuring IPSec manually:
setkey -f <conf file>
Example config file for manual configuration:
/*####################################################*/
# Security policy
spdadd 172.16.1.0/24 172.16.2.0/24 any -P out ipsec esp/tunnel/192.168.1.100-172.16.2.23/require;
# Security association (state)
add 192.168.1.100 172.16.2.23 esp 0x201 -m tunnel -E 3des-cbc 0x7aeaca3f87d060a12f4a4487d5a5c3355920fae69a96c831 -A hmac-md5 0xc0291ff014dccdd03874d9e8e4cdf3e6;
/*###################################################*/
If we use an IKE daemon, the SA block above is not required, since the SAs are negotiated and installed by the daemon. Popular Linux IKE implementations are racoon, Openswan, strongSwan etc. The following discussion assumes that IPsec has been configured via one of the above methods.

Uspace – Kspace Communication: NETLINK_XFRM
Since part of IPSec lives in user space and part in kernel space, there must be a reliable mechanism for communication between them. This is done using a netlink socket of type NETLINK_XFRM.
If you want to debug at this level, you may start with xfrm_netlink_rcv(), the kernel function that handles messages arriving on this netlink socket.
Table-1 shows the important netlink messages that XFRM supports for managing the SAD and SPD from user space.
XFRM_MSG_NEWSA          To add a new SA to the SAD
XFRM_MSG_DELSA          To delete an SA from the SAD
XFRM_MSG_GETSA          To get an SA from the SAD
XFRM_MSG_FLUSHSA        To flush the SAD
XFRM_MSG_NEWPOLICY      To add a new policy to the SPD
XFRM_MSG_DELPOLICY      To delete a policy from the SPD
XFRM_MSG_GETPOLICY      To get a policy from the SPD
XFRM_MSG_FLUSHPOLICY    To flush the SPD

Table-1: XFRM netlink messages
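
To make the message format a little more concrete, here is a minimal user-space sketch that only builds an XFRM_MSG_FLUSHSA request in a buffer; it does not send anything. The helper name build_flushsa_request() is made up for this illustration, while struct nlmsghdr, struct xfrm_usersa_flush and the NLMSG_* macros come from the Linux uapi headers <linux/netlink.h> and <linux/xfrm.h>. Actually sending the message would additionally need a socket opened with socket(AF_NETLINK, SOCK_RAW, NETLINK_XFRM) and sufficient privileges.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/xfrm.h>

/*
 * Hypothetical helper (not part of any library): fill 'buf' with an
 * XFRM_MSG_FLUSHSA netlink request for the given IPSec protocol number
 * (for example 50 for ESP). 'buf' must be suitably aligned for
 * struct nlmsghdr. Returns the netlink message length, or 0 if the
 * buffer is too small.
 */
size_t build_flushsa_request(void *buf, size_t buflen, unsigned char proto)
{
    size_t len = NLMSG_LENGTH(sizeof(struct xfrm_usersa_flush));
    struct nlmsghdr *nlh = buf;
    struct xfrm_usersa_flush *flush;

    assert(buf != NULL);
    if (buflen < NLMSG_ALIGN(len))
        return 0;

    memset(buf, 0, NLMSG_ALIGN(len));

    /* Netlink header: message type selects the XFRM operation. */
    nlh->nlmsg_len = len;
    nlh->nlmsg_type = XFRM_MSG_FLUSHSA;
    nlh->nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
    nlh->nlmsg_seq = 1;

    /* Payload: which protocol's SAs should be flushed. */
    flush = NLMSG_DATA(nlh);
    flush->proto = proto;

    return len;
}
```

The other messages in Table-1 follow the same pattern: a netlink header whose nlmsg_type names the operation, followed by an operation-specific payload structure.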


Now let's delve into the IPSec framework in the kernel


We will look into the packet transmission and reception flow of an IPSec-enabled kernel. We will also see the SPD and SAD lookups and, of course, the important kernel objects involved.
XFRM framework of kernel:
This is the 'IPSec co-ordinator' in the kernel. The actual IPSec processing is performed inside this framework, which internally calls the protocol-specific implementations of AH and ESP (for example net/ipv4/ah4.c and net/ipv4/esp4.c for IPv4, net/ipv6/ah6.c and net/ipv6/esp6.c for IPv6). Though most of the XFRM framework is common to both IPv4 and IPv6 (net/xfrm), the part linking each protocol to XFRM is implemented in net/ipv4/xfrm4_policy.c and net/ipv6/xfrm6_policy.c.
The protocol-specific XFRM initialization is done by two functions, xfrm4_init() and xfrm6_init().

Kernel cryptography:
The kernel crypto layers, such as 'acrypto' (asynchronous crypto), cryptd and pcrypt (for multicore environments), already implement almost all the required algorithms (DES, 3DES, AES, RC5, IDEA, 3-IDEA, CAST, BLOWFISH etc.). Two IPSec stacks are used with the Linux kernel: the native NETKEY stack (synchronous) and the traditional KLIPS stack (asynchronous). So an IPSec developer does not need to know all the complicated mathematics of cryptography; calling the crypto APIs is enough :-).
To start with, the core object of XFRM is the 'xfrm' member of 'struct net', i.e. each network namespace has its own xfrm object. This object is used to access the hash tables of the SPD and SAD (remember these hash tables :) ), and it also holds the state garbage collector (state_gc_work).

Data structures
 Info in SPD indicates “what” to do with arriving datagram; Info in the SAD indicates “how” to do it.

The building block of the SPD (Security Policy Database) is struct xfrm_policy.
/* ################################################# */
struct xfrm_policy {
#ifdef CONFIG_NET_NS
        struct net                      *xp_net;
#endif
        struct hlist_node               bydst;
        struct hlist_node               byidx;
        /* This lock only affects elements except for entry. */
        rwlock_t                        lock;
        atomic_t                        refcnt;
        struct timer_list               timer;
        struct flow_cache_object        flo;
        atomic_t                        genid;
        u32                             priority;
        u32                             index;
        struct xfrm_mark                mark;
        struct xfrm_selector            selector;
        struct xfrm_lifetime_cfg        lft;
        struct xfrm_lifetime_cur        curlft;
        struct xfrm_policy_walk_entry   walk;
        struct xfrm_policy_queue        polq;
        u8                              type;
        u8                              action;
        u8                              flags;
        u8                              xfrm_nr;
        u16                             family;
        struct xfrm_sec_ctx             *security;
        struct xfrm_tmpl                xfrm_vec[XFRM_MAX_DEPTH];
};
/* ################################################# */

Important Fields:
 - refcnt: the reference count of the policy.
 - selector: the embedded xfrm_selector object, which holds the source and destination IP addresses, source and destination ports, protocol, interface index etc. The xfrm_selector_match() API checks whether a given packet matches the XFRM selector.
 - lft: the policy lifetime.
 - timer: handles the policy expiry.
 - polq: a queue that holds packets while there are no states associated with this policy.
 - action: decides the fate of the traffic (XFRM_POLICY_ALLOW or XFRM_POLICY_BLOCK).
 - family: v4 or v6 (as mentioned, this structure is common to both protocols).
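
As a rough illustration of what selector matching means, here is a simplified, IPv4-only user-space sketch. It is not the kernel's xfrm_selector_match(); struct toy_selector, struct toy_flow and the toy_* functions are names invented for this example, addresses are kept in host byte order, and a port or protocol value of 0 is treated as a wildcard, which is a simplification of the real selector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Build a host-order IPv4 address from four octets (illustration only). */
#define TOY_IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical, much reduced stand-in for struct xfrm_selector. */
struct toy_selector {
    uint32_t saddr, daddr;             /* network addresses */
    uint8_t  prefixlen_s, prefixlen_d; /* prefix lengths, 0..32 */
    uint16_t sport, dport;             /* 0 means "any port" */
    uint8_t  proto;                    /* 0 means "any protocol" */
};

/* The fields of a packet that the selector is compared against. */
struct toy_flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t  proto;
};

/* True if 'addr' falls inside net/prefixlen. */
bool toy_prefix_match(uint32_t addr, uint32_t net, uint8_t prefixlen)
{
    uint32_t mask;

    assert(prefixlen <= 32);
    if (prefixlen == 0)
        return true;            /* a /0 prefix matches everything */
    mask = UINT32_MAX << (32 - prefixlen);
    return (addr & mask) == (net & mask);
}

/* True if every selector field accepts the corresponding flow field. */
bool toy_selector_match(const struct toy_selector *sel,
                        const struct toy_flow *fl)
{
    return toy_prefix_match(fl->saddr, sel->saddr, sel->prefixlen_s) &&
           toy_prefix_match(fl->daddr, sel->daddr, sel->prefixlen_d) &&
           (sel->proto == 0 || sel->proto == fl->proto) &&
           (sel->sport == 0 || sel->sport == fl->sport) &&
           (sel->dport == 0 || sel->dport == fl->dport);
}
```

With a selector for 172.16.1.0/24 to 172.16.2.0/24 (the networks used in the setkey example), a flow from 172.16.1.5 to 172.16.2.9 matches, while a flow from 172.16.3.5 does not.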

The building block of the SAD (Security Association Database) is struct xfrm_state.
/* #################################################### */
/* Full description of state of transformer. */
struct xfrm_state {
#ifdef CONFIG_NET_NS
        struct net                      *xs_net;
#endif
        union {
                struct hlist_node       gclist;
                struct hlist_node       bydst;
        };
        struct hlist_node               bysrc;
        struct hlist_node               byspi;
        atomic_t                        refcnt;
        spinlock_t                      lock;
        struct xfrm_id                  id;
        struct xfrm_selector            sel;
        /* Key manager bits */
        struct xfrm_state_walk          km;
        /* Parameters of this state. */
        struct {
                u32                     reqid;
                u8                      mode;
                u8                      replay_window;
                u8                      aalgo, ealgo, calgo;
                u8                      flags;
                u16                     family;
                xfrm_address_t          saddr;
                int                     header_len;
                int                     trailer_len;
                u32                     extra_flags;
        } props;
        struct xfrm_lifetime_cfg        lft;
        /* Data for transformer */
        struct xfrm_algo_auth           *aalg;
        struct xfrm_algo                *ealg;
        struct xfrm_algo                *calg;
        struct xfrm_algo_aead           *aead;
        /* Data for encapsulator */
        struct xfrm_encap_tmpl          *encap;
        /* ... (fields omitted) ... */
        /* data for replay detection */
        struct xfrm_replay_state        replay;
        struct xfrm_replay_state_esn    *replay_esn;
        struct xfrm_replay_state        preplay;
        struct xfrm_replay_state_esn    *preplay_esn;
        struct xfrm_replay              *repl;
        u32                             replay_maxage;
        u32                             replay_maxdiff;
        struct timer_list               rtimer;
        /* Statistics */
        struct xfrm_stats               stats;
        struct xfrm_lifetime_cur        curlft;
        struct tasklet_hrtimer          mtimer;
        /* Last used time */
        unsigned long                   lastused;
        /* ... (fields omitted) ... */
        /* Private data of this transformer, format is opaque,
         * interpreted by xfrm_type methods. */
        void                            *data;
};
/* ###################################################### */


IPSec kernel APIs:

xfrm_lookup()                 xfrm lookup (SPD and SAD)
xfrm_input()                  xfrm processing for an ingress packet
xfrm_output()                 xfrm processing for an egress packet
xfrm4_rcv()                   IPv4 specific Rx method
xfrm6_rcv()                   IPv6 specific Rx method
esp_input()                   ESP processing for an ingress packet
esp_output()                  ESP processing for an egress packet
ah_input()                    AH processing for an ingress packet
ah_output()                   AH processing for an egress packet
xfrm_policy_alloc()           allocates an SPD object
xfrm_policy_destroy()         frees an SPD object
xfrm_policy_lookup()          SPD lookup
xfrm_policy_byid()            SPD lookup based on id
xfrm_policy_insert()          adds an entry to the SPD
xfrm_policy_delete()          removes an entry from the SPD and releases the resources of the policy object
xfrm_bundle_create()          creates an xfrm bundle
xfrm_state_add()              adds an entry to the SAD
xfrm_state_delete()           deletes an SAD object
xfrm_state_alloc()            allocates an SAD object
xfrm_state_lookup_byaddr()    SAD lookup based on source address
xfrm_state_find()             SAD lookup based on destination address
xfrm_state_lookup()           SAD lookup based on SPI

Table-2: XFRM APIs

Kernel Code flow:

database lookup
The main API used for the IPSec lookup is xfrm_lookup(), which internally does both the SPD lookup and the SAD lookup. The packet is given to xfrm_lookup() once the routing decision has been taken, i.e. the dst_entry object is already set in the packet (skb->dst). If the lookup succeeds, 'skb->dst->output' is set to xfrm_output().
To make the lookup faster for future packets, the important information, such as the route entry (IPv4 or IPv6) and the matching policy, is cached by calling xfrm_bundle_create(). The struct xfrm_dst is the object used for this xfrm cache.
 - The main APIs used for the SPD lookup are xfrm_policy_lookup() and xfrm_policy_byid(). They look for a match on the source and destination IP addresses, source and destination ports, protocol and interface index. More APIs are given in table-2 above.
 - The state lookup can be done in three ways: based on the SPI, based on the destination address, or based on the source address. XFRM maintains three hash tables per namespace (struct net) for this. The APIs are given in table-2.
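
The SPI-based state lookup can be pictured with a small user-space sketch. This is only a toy model of the idea, not the kernel's xfrm_state_lookup(): the names struct toy_state, struct toy_sad and the toy_* functions, the bucket count and the hash function are all assumptions made for this illustration. Only the 'byspi' table is modelled; the 'bydst' and 'bysrc' tables would be additional chains over the same objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SAD_BUCKETS 8u

/* Hypothetical, much reduced stand-in for struct xfrm_state. */
struct toy_state {
    uint32_t spi;                 /* Security Parameter Index */
    uint32_t daddr;               /* destination address of the SA */
    uint8_t  proto;               /* 50 = ESP, 51 = AH */
    struct toy_state *next_byspi; /* chain inside one hash bucket */
};

/* Hypothetical per-namespace SAD with a single hash table. */
struct toy_sad {
    struct toy_state *byspi[TOY_SAD_BUCKETS];
};

/* Toy hash over the (SPI, destination, protocol) triple. */
unsigned int toy_spi_hash(uint32_t spi, uint32_t daddr, uint8_t proto)
{
    uint32_t h = spi ^ daddr ^ proto;

    h ^= h >> 16;
    h ^= h >> 8;
    return h % TOY_SAD_BUCKETS;
}

/* Link a state at the head of its bucket. */
void toy_sad_insert(struct toy_sad *sad, struct toy_state *x)
{
    unsigned int b;

    assert(sad != NULL && x != NULL);
    b = toy_spi_hash(x->spi, x->daddr, x->proto);
    x->next_byspi = sad->byspi[b];
    sad->byspi[b] = x;
}

/* Walk one bucket and compare the full triple; NULL if no SA matches. */
struct toy_state *toy_sad_lookup(const struct toy_sad *sad, uint32_t spi,
                                 uint32_t daddr, uint8_t proto)
{
    struct toy_state *x;

    assert(sad != NULL);
    x = sad->byspi[toy_spi_hash(spi, daddr, proto)];
    for (; x != NULL; x = x->next_byspi) {
        if (x->spi == spi && x->daddr == daddr && x->proto == proto)
            return x;
    }
    return NULL;
}
```

On the receive side this is the natural lookup, because the SPI, the destination address and the protocol can all be read directly from the incoming packet.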
          
Table-3 and Table-4 take you through the kernel methods involved in IPSec during packet transmission and reception respectively.

IPSec in packet transmission

For better understanding, I have divided the IPSec transmission process into nine steps:
Step-1: Transport_layer_sendmsg() (e.g. tcp_sendmsg() or udp_sendmsg())
 The TCP/UDP specific work is done here before going for the route lookup.
Step-2: ip_route_output_slow()
 Route lookup; xfrm_lookup() is called for the resulting route.
Step-3: ip_local_output()
Step-4: ip_local_out()
 The LOCAL_OUT netfilter hook applies here.
 Calls skb->dst->output(), which is xfrm4_output() in the case of IPv4 and xfrm6_output() in the case of IPv6.
Step-5: xfrm4_output()/xfrm6_output()
Step-6: esp_output()/ah_output()
Step-7: ip_output()
Step-8: dev_queue_xmit()
 Egress QoS comes here.
Step-9: dev->ndo_start_xmit()

Table-3: IPSec Tx steps
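
The key trick in the transmit path is that the lookup swaps the route's output hook, so the rest of the stack does not need to know about IPSec. The following user-space sketch models just that idea. Everything named toy_* is invented for this illustration and stands in loosely for the kernel objects mentioned above (struct dst_entry, struct sk_buff, xfrm_lookup(), xfrm4_output(), ip_output()); it is not how the kernel code is actually written.

```c
#include <assert.h>
#include <stddef.h>

struct toy_skb;
typedef int (*toy_output_fn)(struct toy_skb *skb);

/* Hypothetical stand-in for struct dst_entry: an output hook plus a
 * pointer to the next (child) entry of the bundle. */
struct toy_dst {
    toy_output_fn output;
    struct toy_dst *child;
};

/* Hypothetical stand-in for struct sk_buff with flags we can inspect. */
struct toy_skb {
    struct toy_dst *dst;
    int esp_applied;
    int transmitted;
};

/* Plays the role of ip_output(): hand the packet towards the device. */
int toy_ip_output(struct toy_skb *skb)
{
    skb->transmitted = 1;
    return 0;
}

/* Plays the role of xfrm4_output() plus esp_output(): transform the
 * packet, then move to the child dst and continue with its hook. */
int toy_xfrm_output(struct toy_skb *skb)
{
    assert(skb->dst != NULL && skb->dst->child != NULL);
    skb->esp_applied = 1;
    skb->dst = skb->dst->child;
    return skb->dst->output(skb);
}

/* Plays the role of xfrm_lookup(): when a policy matches, stack the
 * xfrm bundle on top of the plain route; otherwise keep the route. */
struct toy_dst *toy_xfrm_lookup(struct toy_dst *route,
                                struct toy_dst *bundle,
                                int policy_matches)
{
    if (!policy_matches)
        return route;
    bundle->output = toy_xfrm_output;
    bundle->child = route;
    return bundle;
}

/* Plays the role of ip_local_out(): just call the output hook. */
int toy_ip_local_out(struct toy_skb *skb)
{
    return skb->dst->output(skb);
}
```

A packet whose flow matches a policy goes through toy_xfrm_output() first and then toy_ip_output(); a packet without a matching policy goes straight to toy_ip_output(), which mirrors steps 4 to 7 of Table-3.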

IPSec in packet reception

For better understanding, I have divided the IPSec reception process into ten steps:
Step-1: netif_receive_skb()
Step-2: ip_rcv()
 The PRE_ROUTING netfilter hook applies here.
Step-3: ip_rcv_finish()
 Calls ip_route_input_noref(), which finds the route entry and sets dst->input for local delivery, forwarding etc. IPSec processing is applied only at the IPSec endpoints, so we care only about the case where the route is set for local delivery.
Step-4: ip_local_deliver()
 The LOCAL_IN netfilter hook applies here.
Step-5: ip_local_deliver_finish()
 Based on the protocol field of the IP header (IPPROTO_AH, IPPROTO_ESP), the packet is given to the xfrm4_rcv() function.
Step-6: xfrm4_rcv()
Step-7: xfrm_input()
 Calls xfrm_state_lookup().
 Calls esp_input()/ah_input().
 Applies the PRE_ROUTING netfilter hook once again, now for the decapsulated packet.
Step-8: xfrm4_rcv_encap_finish()
 Does the route lookup again for the decapsulated packet using ip_route_input_noref(). Again, the route lookup should decide for local delivery.
Step-9: ip_local_deliver()
 The LOCAL_IN netfilter hook applies again, now for the decapsulated packet.
 The protocol field is now TCP/UDP, so the packet flows through the native reception methods of TCP/UDP and is delivered to the socket.
Step-10: transport_layer_rcvmsg()
 Delivers the data to user space.

Table-4: IPSec Rx steps
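
The "second pass" through local delivery in the steps above can be modelled with a few lines of user-space C. This is a toy sketch under the assumptions of this article's step list, not kernel code: struct toy_pkt, the toy_* functions and TOY_MAX_PASSES are invented names, and decapsulation is reduced to swapping the protocol field. The protocol numbers (TCP 6, UDP 17, ESP 50, AH 51) are the standard IP protocol numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    TOY_IPPROTO_TCP = 6,
    TOY_IPPROTO_UDP = 17,
    TOY_IPPROTO_ESP = 50,
    TOY_IPPROTO_AH  = 51
};

#define TOY_MAX_PASSES 4  /* arbitrary bound so the loop always ends */

/* Hypothetical packet: the outer protocol plus what is hidden inside. */
struct toy_pkt {
    uint8_t proto;        /* protocol field of the (current) IP header */
    uint8_t inner_proto;  /* protocol revealed after IPSec decapsulation */
    int local_in_passes;  /* how many times local delivery ran */
    int decapsulated;
    int delivered;
};

/* Plays the role of xfrm4_rcv()/xfrm_input(): strip the IPSec header
 * and expose the inner protocol. */
void toy_xfrm_rcv(struct toy_pkt *p)
{
    p->proto = p->inner_proto;
    p->decapsulated = 1;
}

/* Plays the role of ip_local_deliver(): dispatch on the protocol field.
 * ESP/AH packets are decapsulated and run through delivery again.
 * Returns the number of passes, or -1 if the bound was exceeded. */
int toy_ip_local_deliver(struct toy_pkt *p)
{
    assert(p != NULL);
    while (p->local_in_passes < TOY_MAX_PASSES) {
        p->local_in_passes++;
        if (p->proto == TOY_IPPROTO_ESP || p->proto == TOY_IPPROTO_AH) {
            toy_xfrm_rcv(p);
            continue;       /* second pass for the inner packet */
        }
        p->delivered = 1;   /* TCP/UDP: hand over to the socket layer */
        return p->local_in_passes;
    }
    return -1;
}
```

An ESP packet carrying TCP takes two passes (one for ESP, one for the decapsulated TCP packet), while a plain UDP packet takes one, which corresponds to steps 4 to 9 of Table-4.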

Here I conclude this document. We have briefly covered the various building blocks of IPSec in the kernel, including the XFRM framework, the essential data structures, the APIs and the code flow. I hope this gives you a platform from which to dig deeper into the IPSec feature of the kernel stack.


Comments:

  1. Thanks for a very useful article.
    Can you please explain the purpose of the following commands and differences between them.

    ip xfrm state deleteall
    ip xfrm state flush
    According to the man page :

    ip xfrm state deleteall - delete all existing state in xfrm

    ip xfrm state flush - flush all state in xfrm
    How can we have "not existing" states in the SAD?
