The IP security (IPsec) protocol consists of two main components:
The Encapsulating Security Payload (ESP) protocol securing the IP packets transferred between two IPsec endpoints.
The Internet Key Exchange Version 2 (IKEv2) auxiliary protocol, responsible for the mutual authentication of the IPsec endpoints and the automated establishment of encryption and data integrity session keys, both for the IKEv2 management protocol itself and for the ESP payload protection.
We do not cover the authentication-only Authentication Header (AH) protocol, which is rarely used, especially because it is not suited for NAT traversal.
The Encapsulating Security Payload (ESP) is defined in RFC 4303, has IP protocol number 50 and doesn’t have any ports. ESP allows the encryption of IP packets on the network layer carrying, e.g., Layer 4 TCP or UDP traffic.
In IPsec Transport mode the original IP header is retained and just the Layer 4 payload carried by the IP packet is encrypted. The ESP header is inserted between the original IP header and the encrypted payload.
Originally intended for protecting direct IPv6 host-to-host connections, transport mode is currently mainly used to secure the Layer 2 Tunneling Protocol (L2TP), see RFC 3193.
In IPsec Tunnel mode the complete IP packet is encapsulated by ESP and an outer IP header is prepended.
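The two modes differ in where the ESP header is placed and how much of the original packet ends up encrypted. The following Python sketch models the resulting header order as plain descriptive labels; the function name `esp_layout` and the label strings are illustrative and not taken from strongSwan or any library.

```python
def esp_layout(mode: str) -> list[str]:
    """Return the on-the-wire order of an ESP-protected packet (descriptive labels only)."""
    if mode == "transport":
        # The original IP header is kept; ESP is inserted in front of the Layer 4 payload.
        return ["IP header (original)", "ESP header", "Layer 4 payload [encrypted]",
                "ESP trailer [encrypted]", "Authentication Data"]
    if mode == "tunnel":
        # The complete original packet is encrypted and a new outer IP header is prepended.
        return ["IP header (outer)", "ESP header", "IP header (original) [encrypted]",
                "Layer 4 payload [encrypted]", "ESP trailer [encrypted]",
                "Authentication Data"]
    raise ValueError(f"unknown IPsec mode: {mode!r}")


for m in ("transport", "tunnel"):
    print(f"{m:>9}: " + " | ".join(esp_layout(m)))
```

Note that the Authentication Data is computed over the encrypted data and therefore stays in the clear in both modes.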
An ESP packet consists of an ESP header, the encrypted IP payload body and an ESP trailer needed for padding. The Authentication Data field appended at the end serves as a cryptographic checksum that guarantees data integrity.
The 32 bit Security Parameters Index (SPI) is used by the receiving IPsec peer as an index into its kernel-based database to look up the session keys needed to decrypt and authenticate the ESP packet. The SPI is also needed to determine the IPsec security policy that has to be enforced on the inbound plaintext IP packets after decryption.
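The ESP header and trailer fields described above can be built and parsed with Python's `struct` module. This is a minimal sketch assuming a 16-octet cipher block size (as for AES-CBC) and the RFC 4303 default padding content 1, 2, 3, and so on; the helper names are hypothetical, and no actual encryption or integrity protection is performed.

```python
import struct

# ESP header (RFC 4303): 32-bit SPI followed by a 32-bit Sequence Number, big-endian.
ESP_HEADER = struct.Struct("!II")


def esp_plaintext(payload: bytes, next_header: int, block_size: int = 16) -> bytes:
    """Append the ESP trailer (Padding, Pad Length, Next Header) prior to encryption.

    Payload + padding + the two trailer octets must fill whole cipher blocks;
    block_size=16 is an assumption matching ciphers such as AES-CBC."""
    pad_len = -(len(payload) + 2) % block_size
    padding = bytes(range(1, pad_len + 1))  # default padding content: 1, 2, 3, ...
    return payload + padding + bytes([pad_len, next_header])


def parse_esp_header(packet: bytes) -> tuple[int, int]:
    """Return (SPI, Sequence Number); the receiver uses the SPI to look up its SA."""
    return ESP_HEADER.unpack_from(packet, 0)
```

In tunnel mode the Next Header value would be 4 (an encapsulated IPv4 packet), in transport mode it is the Layer 4 protocol, e.g. 6 for TCP.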
Version 2 of the Internet Key Exchange (IKEv2) protocol, defined in RFC 7296, manages the setup of IPsec connections. The IKEv2 auxiliary protocol uses UDP datagrams with both source and destination ports set to the well-known UDP port 500.
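Every IKEv2 message starts with the fixed 28-octet IKE header defined in RFC 7296 (section 3.1). The sketch below packs such a header; the exchange type and flag constants are the RFC 7296 values, while the `ike_header` helper is a hypothetical illustration. In the very first IKE_SA_INIT request the Responder's SPI is still all zeros.

```python
import struct

IKE_PORT = 500  # well-known UDP port used as both source and destination port

# IKE header (RFC 7296, section 3.1): Initiator SPI (8), Responder SPI (8),
# Next Payload (1), Version (1), Exchange Type (1), Flags (1), Message ID (4), Length (4).
IKE_HEADER = struct.Struct("!8s8sBBBBII")

EXCHANGE_TYPES = {"IKE_SA_INIT": 34, "IKE_AUTH": 35,
                  "CREATE_CHILD_SA": 36, "INFORMATIONAL": 37}
FLAG_INITIATOR = 0x08  # I bit: set by the original Initiator of the IKE_SA
FLAG_RESPONSE = 0x20   # R bit: set in response messages
NEXT_PAYLOAD_SA = 33   # Security Association payload


def ike_header(spi_i: bytes, spi_r: bytes, exchange: str, flags: int,
               message_id: int, length: int,
               next_payload: int = NEXT_PAYLOAD_SA) -> bytes:
    version = 0x20  # major version 2 in the high nibble, minor version 0
    return IKE_HEADER.pack(spi_i, spi_r, next_payload, version,
                           EXCHANGE_TYPES[exchange], flags, message_id, length)


request = ike_header(b"\x11" * 8, bytes(8), "IKE_SA_INIT", FLAG_INITIATOR, 0, IKE_HEADER.size)
```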
The Initiator starts the negotiation by sending an IKE_SA_INIT request, which is answered by the Responder with an IKE_SA_INIT response.
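If the Responder comes to the conclusion that it is under a Denial of Service (DoS) attack, it can request a Cookie from the Initiator before sending the computationally expensive Key Exchange (KE) payload in the IKE_SA_INIT response. Because the Initiator has to return this Cookie in a repeated IKE_SA_INIT request, which it can only do if it actually receives responses at its claimed IP address, this effectively prevents IP spoofing.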
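RFC 7296 (section 2.6) suggests a stateless cookie of the form `<VersionIDofSecret> | Hash(Ni | IPi | SPIi | <secret>)`, so that the Responder does not have to keep any state for unverified Initiators. The sketch below follows that suggestion; the choice of SHA-256, the one-octet version ID and the `CookieJar` class are assumptions and not strongSwan's implementation.

```python
import hashlib
import secrets


class CookieJar:
    """Stateless IKEv2 cookies along the lines suggested in RFC 7296, section 2.6."""

    def __init__(self) -> None:
        self.version = 0                       # would change whenever the secret is rotated
        self.secret = secrets.token_bytes(32)  # known only to the Responder

    def make(self, ni: bytes, ip_i: bytes, spi_i: bytes) -> bytes:
        # Cookie = <VersionIDofSecret> | Hash(Ni | IPi | SPIi | <secret>)
        digest = hashlib.sha256(ni + ip_i + spi_i + self.secret).digest()
        return bytes([self.version]) + digest

    def verify(self, cookie: bytes, ni: bytes, ip_i: bytes, spi_i: bytes) -> bool:
        # Recompute instead of storing: no per-Initiator state is needed.
        return secrets.compare_digest(cookie, self.make(ni, ip_i, spi_i))
```

Since the cookie covers the Initiator's IP address, an attacker using spoofed source addresses never receives the cookie and cannot produce a valid one for those addresses.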
Based on the exchange of the Key Exchange (KE) and Nonce (N) payloads in IKE_SA_INIT, both endpoints can derive a Shared Secret which allows them to encrypt all following IKE messages based on the IKE_SA established via the SA1i and SA1r Security Association payloads.
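The derivation in RFC 7296 (sections 2.13 and 2.14) first computes `SKEYSEED = prf(Ni | Nr, g^ir)` and then expands it with `prf+` into the individual IKE_SA keys. This sketch assumes PRF_HMAC_SHA2_256 and, for simplicity, 32-octet keys throughout; real key lengths depend on the negotiated algorithms, and the `g_ir` value used below is placeholder bytes rather than the result of an actual Diffie-Hellman exchange.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()  # PRF_HMAC_SHA2_256 assumed


def prf_plus(key: bytes, seed: bytes, length: int) -> bytes:
    """prf+ (RFC 7296, 2.13): T1 = prf(K, S | 0x01), Tn = prf(K, Tn-1 | S | n)."""
    out, block, counter = b"", b"", 1
    while len(out) < length:
        if counter > 255:
            raise ValueError("prf+ output is limited to 255 blocks")
        block = prf(key, block + seed + bytes([counter]))
        out += block
        counter += 1
    return out[:length]


def derive_ike_keys(g_ir: bytes, ni: bytes, nr: bytes, spi_i: bytes, spi_r: bytes,
                    key_len: int = 32) -> dict[str, bytes]:
    skeyseed = prf(ni + nr, g_ir)  # nonces act as the PRF key
    names = ["SK_d", "SK_ai", "SK_ar", "SK_ei", "SK_er", "SK_pi", "SK_pr"]
    stream = prf_plus(skeyseed, ni + nr + spi_i + spi_r, key_len * len(names))
    return {n: stream[i * key_len:(i + 1) * key_len] for i, n in enumerate(names)}
```

Both endpoints run the same computation on the same public values plus their common `g^ir`, so they end up with identical keys without ever transmitting them.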
In the IKE_AUTH request the Initiator authenticates itself by sending its identity IDi and a Digital Signature in the AUTHi payload, accompanied by an optional Certificate payload CERTi. The Responder verifies the validity and trustworthiness of the received end entity certificate by going up the X.509 trust chain until a locally stored Root CA certificate is reached.
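Going up the trust chain can be modeled as following issuer names until a locally trusted root is reached. This is a toy sketch: `Cert` and `find_trust_anchor` are hypothetical, and unlike a real X.509 validator it does not check signatures, validity periods or revocation status.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str
    is_ca: bool = False


def find_trust_anchor(end_entity: Cert, intermediates: list[Cert],
                      trusted_roots: list[Cert], max_depth: int = 8) -> Optional[Cert]:
    by_subject = {c.subject: c for c in intermediates}
    roots = {c.subject: c for c in trusted_roots}
    cert = end_entity
    for _ in range(max_depth):
        if cert.issuer in roots:
            return roots[cert.issuer]   # reached a locally stored Root CA
        issuer = by_subject.get(cert.issuer)
        if issuer is None or not issuer.is_ca:
            return None                 # chain broken or issuer is not a CA
        cert = issuer
    return None                         # chain longer than max_depth
```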
Additionally the Initiator sends a Security Association proposal SA2i and a set of Traffic Selectors TSi and TSr to be used for the first CHILD_SA.
The Responder authenticates itself in turn with a Digital Signature in the AUTHr payload, accompanied by an optional Certificate payload CERTr, in the IKE_AUTH response and includes a selected Security Association proposal SA2r and a possibly narrowed set of Traffic Selectors TSi and TSr.
With this information the
CHILD_SA defining the encryption and data integrity
of the IPsec payload packets can be installed and activated.
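Narrowing the Traffic Selectors amounts to intersecting what the Initiator proposed with what the Responder's local policy allows. The sketch below uses Python's `ipaddress` module and simplifies Traffic Selectors to single CIDR subnets; real IKEv2 Traffic Selectors are address ranges combined with protocol and port ranges, and `narrow` is a hypothetical helper.

```python
import ipaddress
from typing import Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def narrow(proposed: str, allowed: str) -> Optional[Network]:
    """Intersect a proposed subnet with a locally allowed one (None if disjoint)."""
    p = ipaddress.ip_network(proposed)
    a = ipaddress.ip_network(allowed)
    if p.version != a.version or not p.overlaps(a):
        return None
    # Two overlapping CIDR blocks always nest, so the smaller one is the intersection.
    return p if p.subnet_of(a) else a
```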
If a Pre-Shared Key (PSK) is used for authentication, then the AUTHi and AUTHr payloads contain a hash over the exchanged IKEv2 messages and the pre-shared secret.
Since the Initiator is the first to send its password hash in the IKE_AUTH request, this poses a serious security risk when the PSK is weak: an active man-in-the-middle (MITM) can intercept the AUTHi payload and then run an offline dictionary or brute-force attack on it, potentially cracking the password. Therefore we strongly discourage the use of PSK-based authentication if sufficient password strength cannot be enforced.
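The sketch below illustrates why a weak PSK is dangerous. It computes the PSK-based AUTH value as specified in RFC 7296 (section 2.15), `AUTH = prf(prf(PSK, "Key Pad for IKEv2"), <InitiatorSignedOctets>)`, assuming HMAC-SHA-256 as the PRF. `signed_octets` is a placeholder for the real InitiatorSignedOctets, which an active MITM that terminated the Diffie-Hellman exchange with the Initiator is able to compute; the candidate passwords are made up.

```python
import hashlib
import hmac
from typing import Iterable, Optional

KEY_PAD = b"Key Pad for IKEv2"  # 17 ASCII octets, no terminating NUL


def prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()  # PRF_HMAC_SHA2_256 assumed


def psk_auth(psk: bytes, signed_octets: bytes) -> bytes:
    return prf(prf(psk, KEY_PAD), signed_octets)


def dictionary_attack(auth_i: bytes, signed_octets: bytes,
                      wordlist: Iterable[bytes]) -> Optional[bytes]:
    """Offline attack: try each candidate PSK until the AUTHi value matches."""
    for candidate in wordlist:
        if hmac.compare_digest(psk_auth(candidate, signed_octets), auth_i):
            return candidate
    return None
```

Each guess costs only two HMAC computations and no interaction with the victim, which is why only a long, random PSK resists this attack.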
In order to prevent the man-in-the-middle attacks possible with PSK-based authentication, EAP-based authentication has been introduced by the IKEv2 standard. If the Initiator doesn’t include an AUTHi payload in the IKE_AUTH request, the Responder sends its strong Digital Signature in the AUTHr payload first in order to establish trust, and at the same time initiates the EAP protocol by including a first EAP request in the IKE_AUTH response.
The Initiator can then use its PSK with EAP-MD5 or EAP-MSCHAPv2 to authenticate itself to the trusted Responder over the encrypted IKEv2 channel.
CREATE_CHILD_SA request/response pairs are used to negotiate additional CHILD_SAs or to do the periodic rekeying of either the IKE_SA or the CHILD_SAs. When the IKE_SA is rekeyed, no N(REKEY_SA) notification is included and the fresh Key Exchange (KE) payloads guarantee Perfect Forward Secrecy (PFS). With an N(REKEY_SA) notification included, a CHILD_SA is rekeyed, the Key Exchange (KE) payloads being optional.
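The payloads carried inside the encrypted part of the three kinds of CREATE_CHILD_SA requests (RFC 7296, section 1.3) can be summarized in a small helper. The function name and purpose strings are hypothetical; `pfs=False` models leaving out the optional KE payload when a CHILD_SA is created or rekeyed.

```python
def create_child_sa_payloads(purpose: str, pfs: bool = True) -> list[str]:
    """Payloads inside SK {...} of a CREATE_CHILD_SA request (RFC 7296, section 1.3)."""
    if purpose == "rekey_ike_sa":
        # No N(REKEY_SA) and no Traffic Selectors; KE is mandatory, giving PFS.
        return ["SA", "Ni", "KEi"]
    child = ["SA", "Ni"] + (["KEi"] if pfs else []) + ["TSi", "TSr"]
    if purpose == "new_child_sa":
        return child
    if purpose == "rekey_child_sa":
        # The Notify payload carries the SPI of the CHILD_SA being replaced.
        return ["N(REKEY_SA)"] + child
    raise ValueError(f"unknown purpose: {purpose!r}")
```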
Since the ESP protocol with IP protocol number 50 doesn’t have any ports, it is not per se suited for Port Address Translation, the standard method of traversing a NAT router with the TCP and UDP protocols.
Some NAT routers have a feature, often called something like IPsec Passthrough, that detects outbound IKE traffic from a single host behind the NAT device and forwards inbound IKE and ESP packets to that specific host, as shown in the figure below.
Unfortunately this won’t work with multiple IPsec clients behind the same NAT router that all want to communicate with the same VPN gateway, as shown in the network topology below.
The solution proposed by RFC 3948 is to encapsulate ESP packets in UDP datagrams, which then allows Port Address Translation to be applied as shown in the figure above. The well-known NAT Traversal UDP port 4500 is shared with the IKE protocol when a NAT situation is detected between the two IPsec endpoints. The detection is based on the NAT_DETECTION_SOURCE_IP and NAT_DETECTION_DESTINATION_IP notifications sent in the IKE_SA_INIT request and response, which contain source and destination IP address hashes, respectively.
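Per RFC 7296 (section 2.23), each NAT detection notification carries a SHA-1 digest over the SPIs (in header order), an IP address and a port. The receiver recomputes the digest over the address and port it actually sees on the packet; a mismatch reveals a NAT router in between. A minimal sketch, with hypothetical helper names and documentation example addresses:

```python
import hashlib
import ipaddress
import struct


def nat_detection_hash(spi_i: bytes, spi_r: bytes, ip: str, port: int) -> bytes:
    """SHA-1(SPIi | SPIr | IP address | port) as carried in NAT_DETECTION_* notifies."""
    packed_ip = ipaddress.ip_address(ip).packed
    return hashlib.sha1(spi_i + spi_r + packed_ip + struct.pack("!H", port)).digest()


def nat_detected(received: bytes, spi_i: bytes, spi_r: bytes,
                 observed_ip: str, observed_port: int) -> bool:
    """True if the hash doesn't match the address/port seen on the received packet."""
    return received != nat_detection_hash(spi_i, spi_r, observed_ip, observed_port)


# Initiator behind a NAT hashes its private source address; in the first
# IKE_SA_INIT request the Responder SPI is still zero.
spi_i, spi_r = b"\x11" * 8, bytes(8)
source_hash = nat_detection_hash(spi_i, spi_r, "192.168.0.10", 500)
```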
ESP-in-UDP encapsulation can be enforced even if no NAT situation exists by setting
encap = yes for a given connection definition in
swanctl.conf. If enabled, the
charon daemon will send a manipulated
NAT_DETECTION_SOURCE_IP notify payload so that it will look to the remote peer
as if there were a NAT situation.
ESP-in-UDP encapsulation means that an eight octet UDP header is inserted between
the IP Header and the ESP Header of the ESP packet. At the outset the UDP source
and destination ports are both set to the well-known value
4500 but might get
changed on the way by one or several NAT routers.
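Prepending the eight-octet UDP header to an ESP packet can be sketched as follows. The checksum is set to zero, which RFC 3948 permits for IPv4; the helper name is hypothetical, and the ports passed in are exactly the values a NAT router may later rewrite.

```python
import struct

NAT_T_PORT = 4500


def udp_encapsulate(esp_packet: bytes, src_port: int = NAT_T_PORT,
                    dst_port: int = NAT_T_PORT) -> bytes:
    """Insert a UDP header (source port, destination port, length, checksum) before ESP."""
    length = 8 + len(esp_packet)  # the UDP length field covers header + payload
    return struct.pack("!HHHH", src_port, dst_port, length, 0) + esp_packet
```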
The first field in the ESP header right after the UDP header is the 32 bit non-zero Security Parameters Index (SPI).
If the first 32 bits right after the UDP header are set to zero then instead of
an encapsulated ESP payload packet, an IKE management packet is carried. Thus this
four octet all-zero Non-ESP Marker is used to differentiate between ESP and IKE
traffic. ESP packets are processed in the kernel, whereas the IKE packets are
forwarded to the
charon userland IKE daemon.
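A receiver can demultiplex traffic arriving on UDP port 4500 by inspecting the first octets, as described above. This sketch also recognizes the single-octet `0xff` NAT-T keepalive from RFC 3948; the function names and return labels are hypothetical, and real ESP processing happens in the kernel rather than in Python.

```python
NON_ESP_MARKER = b"\x00\x00\x00\x00"
NAT_T_KEEPALIVE = b"\xff"


def classify_udp4500(payload: bytes) -> str:
    """Classify the payload of a UDP datagram received on the NAT-T port 4500."""
    if payload == NAT_T_KEEPALIVE:
        return "keepalive"  # only refreshes NAT mappings, then dropped
    if len(payload) < 4:
        return "invalid"
    if payload[:4] == NON_ESP_MARKER:
        return "ike"        # handed to the userland IKE daemon
    return "esp"            # non-zero SPI: processed by the kernel


def strip_non_esp_marker(payload: bytes) -> bytes:
    """Return the IKE message that follows the four-octet Non-ESP Marker."""
    if payload[:4] != NON_ESP_MARKER:
        raise ValueError("Non-ESP Marker missing, not an IKE packet")
    return payload[4:]
```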
When a NAT router applies Port Address Translation to an outbound IP packet, the address/port mapping is stored in an internal lookup table together with a time-to-live value. This mapping is needed by the router so that inbound IP packets can be translated back to the original address/port values.
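The address/port mapping and its expiry can be modeled with a small lookup table. The `PatTable` class below is a hypothetical toy model, not how any particular NAT router is implemented; timestamps are passed in explicitly to keep it deterministic, and an expired mapping is simply reused on the next outbound packet, whereas a real router might allocate a different public port.

```python
from typing import Optional


class PatTable:
    """Toy Port Address Translation table with a per-entry time-to-live (seconds)."""

    def __init__(self, public_ip: str, ttl: float = 30.0, first_port: int = 1024) -> None:
        self.public_ip = public_ip
        self.ttl = ttl
        self.next_port = first_port
        self.by_private: dict[tuple[str, int], int] = {}
        self.by_public: dict[int, tuple[tuple[str, int], float]] = {}

    def outbound(self, src_ip: str, src_port: int, now: float) -> tuple[str, int]:
        """Translate an outbound packet's source, creating or refreshing the mapping."""
        key = (src_ip, src_port)
        port = self.by_private.get(key)
        if port is None:
            port = self.next_port
            self.next_port += 1
            self.by_private[key] = port
        self.by_public[port] = (key, now + self.ttl)  # (private address, expiry time)
        return self.public_ip, port

    def inbound(self, dst_port: int, now: float) -> Optional[tuple[str, int]]:
        """Translate an inbound packet back, or None if no live mapping exists."""
        entry = self.by_public.get(dst_port)
        if entry is None or now > entry[1]:
            return None
        return entry[0]
```

Once the time-to-live has passed, `inbound` returns `None`, which is what happens to the packets of an idle IPsec connection whose NAT mapping was never refreshed.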
Since an established IPsec connection can be inactive for minutes or even hours,
the IPsec peer behind a NAT router has to send periodic NAT-T keepalive UDP
packets containing a single
0xff byte in order to refresh the NAT mapping entry
in the NAT router’s lookup table.
Of course the NAT-T keepalives also reach the IPsec peer on the other side of the connection, but the packets are silently dropped by the kernel. By default the keepalives are sent every 20s, but the interval can be configured via the charon.keep_alive parameter in strongswan.conf (set it to 0 to disable sending keepalives, e.g. behind a static DNAT aka port forwarding).
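The keepalive logic can be sketched as follows. The `interval` parameter mirrors `charon.keep_alive` with its 20s default, and 0 disables keepalives; treating any packet recently sent to the peer as a reason to skip the keepalive is an assumption about charon's behavior, and both helper names are hypothetical.

```python
import socket

NAT_T_PORT = 4500
KEEPALIVE = b"\xff"  # single-octet NAT-T keepalive payload


def keepalive_due(last_sent: float, now: float, interval: float = 20.0) -> bool:
    """Return True if a keepalive should be sent now.

    Assumption: last_sent is the time of the last packet of any kind sent to the
    peer, so keepalives are only needed on an otherwise idle connection."""
    if interval <= 0:
        return False  # keepalives disabled, e.g. behind a static DNAT
    return now - last_sent >= interval


def send_keepalive(sock: socket.socket, peer_ip: str, peer_port: int = NAT_T_PORT) -> None:
    # sock is expected to be the UDP socket bound to port 4500, so that the NAT
    # mapping refreshed is the one shared by ESP-in-UDP and IKE traffic.
    sock.sendto(KEEPALIVE, (peer_ip, peer_port))
```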