AWS: Creating VPNs between VPCs in different regions

At Amazon Web Services, one of the most common practices is to divide one account into multiple sub-accounts, each with its own credentials, instances and services. This practice adds complexity when the network is big, but it brings many benefits: each account can have people with different roles and permissions, without complex IAM rules; each cost center is entirely separated in a much easier way (without resorting to tags, as would be needed with a single account); big projects can have their own totally independent infrastructure; among others.

When we have multiple accounts in the same company, we usually need to link these accounts in a secure way. A great way of doing this is using VPC Peering to connect VPCs in the same region. However, when the VPCs are in different regions, how can they communicate with each other? In this case, instead of using the native VPC Peering offered by AWS, we can create EC2 instances with IPsec configured, establishing encrypted VPNs between any two networks. We call this type of VPN a site-to-site VPN.

Here at Movile we use multiple AWS accounts in many different regions. To keep our monitoring and automation services up and secure, we had to implement these mechanisms. In this tutorial, we show you how we did it.

Preparing

Before setting up the VPN between VPCs in different regions, it’s good to have the following observations in mind:

  • As with VPC Peering, it’s preferable that the two VPCs use CIDRs that don’t conflict. When creating a VPC, it’s important to reserve subnets in an organized way, so they never repeat. If, for any reason, the subnets’ CIDRs collide, it’s still possible to configure this scenario, but it gets a little more complex.
  • For each VPC you must have an EC2 instance running, which implies an extra cost for these servers.
  • The instances running IPsec will be a single point of failure for communication between the VPCs. It’s important to also configure some kind of High Availability scheme to keep at least one IPsec instance always running.
  • Since the instance is a single point of failure, all network traffic between the VPCs will pass through it. It’s really important to choose a proper AWS instance type, with good network connectivity (for high network throughput) and processing power (a little less important, but necessary to encrypt the packets between networks). For light traffic, an m3.medium may be sufficient. For bigger networks, with dozens of instances and considerable traffic, we recommend an m3.large or larger.
  • The software used in this tutorial is CentOS and openswan (or the newer libreswan). Although the installation method may vary, everything works normally on other distributions like Debian or Ubuntu.

Example structure

This is the data for our example structure. Feel free to replace any value with your own configuration.

VPC 1 (us-east-1)

  • CIDR: 10.70.0.0/24
  • Public subnet: 10.70.0.0/25
  • Private subnet: 10.70.0.128/25
  • Instance hostname: openswan-us
  • Instance type: m3.large
  • Instance elastic IP: 54.152.52.80

VPC 2 (sa-east-1)

  • CIDR: 10.80.0.0/24
  • Public subnet: 10.80.0.0/25
  • Private subnet: 10.80.0.128/25
  • Instance hostname: openswan-br
  • Instance type: m3.large
  • Instance elastic IP: 54.94.177.156

VPC Specifications

  • The instances are located in the public subnet. This is mandatory because they need a fixed Elastic IP reachable from the Internet in order to connect to each other.
  • The public subnet has an IGW (Internet Gateway) configured and associated with the default route (0.0.0.0/0). The private subnet only has the route to the local network, using the CIDR initially specified for the VPC (you can double-check this with the CLI sketch below).
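If you want to double-check this setup from the CLI, here is a hedged sketch (the VPC ID is a placeholder):

# List the route tables of a VPC to confirm the IGW default route and the subnet associations
aws ec2 describe-route-tables --filters Name=vpc-id,Values=vpc-xxxxxxxx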

Creating the instances

First of all, allocate an Elastic IP in each VPC’s region, to assign to the instances later. Write down the IP for each VPC.

[Screenshot: allocate-elastic-ip]

Or with the CLI:

# Run this once in each region
aws ec2 allocate-address --domain vpc

In our case, the Elastic IPs are the ones listed in the example structure above.

Now, create a security group called ipsec in each VPC, allowing what IPsec needs to function correctly: UDP 500 (IKE), UDP 4500 (NAT-T), Custom Protocol 50 (ESP) and Custom Protocol 51 (AH). Remember that each VPC’s security group must allow access from the instance in the other VPC. In our example, we’ll get something like this (a CLI sketch follows the screenshots):

For us-east-1:

[Screenshot: ipsec-security-group]

For sa-east-1:

[Screenshot: ipsec-security-group2]
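If you prefer the CLI, here is a hedged sketch of the us-east-1 side (the VPC and group IDs are placeholders; repeat it in sa-east-1, swapping in the other end’s Elastic IP):

aws ec2 create-security-group --group-name ipsec --description "ipsec VPN" --vpc-id vpc-xxxxxxxx
# Allow IKE (UDP 500), NAT-T (UDP 4500), ESP (protocol 50) and AH (protocol 51) from the other end
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol udp --port 500 --cidr 54.94.177.156/32
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol udp --port 4500 --cidr 54.94.177.156/32
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol 50 --cidr 54.94.177.156/32
aws ec2 authorize-security-group-ingress --group-id sg-xxxxxxxx --protocol 51 --cidr 54.94.177.156/32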

Now, create two instances, one on each VPC, using those security groups.

Next, you’ll have to disable the Source/Destination Check on the instance’s network interface. By default, when you create an instance, it receives a virtual network interface (ENI) with this check enabled, which makes AWS drop any traffic not addressed to the instance itself (e.g., to mitigate IP spoofing). Disabling this option means the instance can receive and forward such traffic, which is exactly what we want, since our instances will route traffic between the VPC networks through IPsec.

To do this through the AWS panel:

[Screenshot: src-dest-check-menu]

[Screenshot: src-dest-check]

Or with the CLI:

# The ID eni-7e712e1b corresponds to the instance's interface. Replace it with your instance's ID
aws ec2 modify-network-interface-attribute --network-interface-id eni-7e712e1b --no-source-dest-check

Configuring the VPC routes

This part is extremely important: without it, the instances inside each VPC won’t be able to communicate with the other VPC. To configure the needed routes in the AWS panel, go to VPC – Virtual Private Cloud – Route Tables and select your VPC’s route tables. Here we have two route tables: one for the public subnet (on the left in the screenshots below) and another for the private subnet (on the right).

In each route table, add one route to the CIDR of the other VPC. In our example, we’ll get something like this:

For us-east-1:

[Screenshot: routes-useast1]

For sa-east-1:

[Screenshot: routes-saeast1]

Notice the routes marked in blue: they say that if an instance tries to communicate with the network in the other VPC, the packets will be routed to the instance we created, so they can be forwarded through the VPN/IPsec software. This type of route is also useful for creating NAT instances on AWS.
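The CLI equivalent is a single command per route table; here is a hedged sketch for us-east-1 (the route table and instance IDs are placeholders; run it for both the public and the private route tables, and mirror it in sa-east-1 with 10.70.0.0/24 as the destination):

# Send traffic destined to the other VPC through the openswan instance
aws ec2 create-route --route-table-id rtb-xxxxxxxx --destination-cidr-block 10.80.0.0/24 --instance-id i-xxxxxxxx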

Configuring kernel parameters

As the instances will act as routers between the networks, it’s necessary to tune some kernel parameters.

Inside the instances, create a file called /etc/sysctl.d/ipsec.conf with the following content:

# Allow the kernel to forward packets between interfaces/networks
net.ipv4.ip_forward = 1

# Don't accept or send ICMP redirects; we want the routing to stay as configured
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.default.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
net.ipv4.conf.default.send_redirects = 0

# Disable reverse path filtering, which would drop the asymmetric tunnel traffic
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0

Load these configurations immediately, with the command:

sysctl -p /etc/sysctl.d/ipsec.conf

If the instance reboots, the distribution automatically loads the files with the .conf extension inside the /etc/sysctl.d/ directory, so this part is complete!

Configuring openswan/libreswan

Now, it’s time to configure ipsec on the instances. You can install openswan on CentOS with:

yum install openswan

Or, on Debian/Ubuntu, with:

apt-get install openswan

In both instances, create the file /etc/ipsec.conf with the following content:

config setup
	protostack=netkey
	dumpdir=/var/run/pluto/
	nat_traversal=yes

include /etc/ipsec.d/*.conf

The last line is important: it lets you organize the configuration in separate files inside the /etc/ipsec.d/ directory. It’s also worth noticing that the nat_traversal option is enabled, because AWS instances don’t have public IPs configured on their interfaces. Instead, they use an IP allocated from the VPC, with the Elastic IP mapped to it by AWS (in practice, it’s as if the VPN instance reached the Internet through NAT).
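You can see this from inside an instance by querying the EC2 metadata service: the interface only has the private address, and the Elastic IP shows up as the mapped public one:

curl http://169.254.169.254/latest/meta-data/local-ipv4
curl http://169.254.169.254/latest/meta-data/public-ipv4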

Instance 1: openswan-us

Now, on the openswan-us (us-east-1), create the file /etc/ipsec.d/us-to-br.conf with the following content:

conn us-to-br
	type=tunnel
	authby=secret
	left=%defaultroute
	leftid=54.152.52.80
	leftnexthop=%defaultroute
	leftsubnet=10.70.0.0/24
	right=54.94.177.156
	rightsubnet=10.80.0.0/24
	pfs=yes
	auto=start

The left* options describe the local network (us-east-1), while the right* options correspond to the network at the other VPC (sa-east-1). Notice that the Elastic IPs are specified in the leftid and right options.

To establish the IPsec connection between the instances, both have to know a shared password (Pre-Shared Key, or PSK). This password will be used as the key to encrypt/decrypt the packets.

Again, on the openswan-us server, create the file /etc/ipsec.d/us-to-br.secrets with the following content:

54.152.52.80 54.94.177.156: PSK "iamasecretpsk"

Note that the first IP is the elastic IP of the origin (left), and the second is from the other end (right). This means that the PSK “iamasecretpsk” will be used when the connection is made between the two servers. Later in this tutorial, you’ll see that the other end will have a very similar file with the same key.

Remember! Create a long, strong key, with uppercase and lowercase letters, numbers and symbols.
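One simple way to generate a key like this, using openssl:

# 48 random bytes, base64-encoded
openssl rand -base64 48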

Oh, and as the file contains a password, make sure its permissions are secure:

chmod 600 /etc/ipsec.d/us-to-br.secrets

Instance 2: openswan-br

On the openswan-br (sa-east-1), create the file /etc/ipsec.d/br-to-us.conf with the following content:

conn br-to-us
	type=tunnel
	authby=secret
	left=%defaultroute
	leftid=54.94.177.156
	leftnexthop=%defaultroute
	leftsubnet=10.80.0.0/24
	right=54.152.52.80
	rightsubnet=10.70.0.0/24
	pfs=yes
	auto=start

Now you can see that we switched the values. The left* options indicate the sa-east-1 VPC, while the right* options correspond to the us-east-1 VPC (that is now the other end).

Configure the key file /etc/ipsec.d/br-to-us.secrets with the following content:

54.94.177.156 54.152.52.80: PSK "iamasecretpsk"

And the file permission:

chmod 600 /etc/ipsec.d/br-to-us.secrets

Establishing the VPN

With openswan configured, start the service:

service ipsec start

After that, use the ipsec verify command to check if everything is right. If something shows up in red, you need to review your configuration. Example:

[root@openswan-us ipsec.d]# ipsec verify
Verifying installed system and configuration files

Version check and ipsec on-path                   	[OK]
Libreswan 3.8 (netkey) on 3.10.0-123.20.1.el7.x86_64
Checking for IPsec support in kernel              	[OK]
 NETKEY: Testing XFRM related proc values
         ICMP default/send_redirects              	[OK]
         ICMP default/accept_redirects            	[OK]
         XFRM larval drop                         	[OK]
Pluto ipsec.conf syntax                           	[OK]
Hardware random device                            	[N/A]
Checking rp_filter                                	[OK]
Checking that pluto is running                    	[OK]
 Pluto listening for IKE on udp 500               	[OK]
 Pluto listening for IKE/NAT-T on udp 4500        	[OK]
 Pluto ipsec.secret syntax                        	[OK]
Checking NAT and MASQUERADEing                    	[TEST INCOMPLETE]
Checking 'ip' command                             	[OK]
Checking 'iptables' command                       	[OK]
Checking 'prelink' command does not interfere with FIPS
Checking for obsolete ipsec.conf options          	[OK]
Opportunistic Encryption                          	[DISABLED]

After you do this on both instances, check the connection status:

ipsec auto --status

The last lines are what matter to us:

000 "us-to-br": 10.70.0.0/24===10.70.0.106[54.152.52.80]---10.70.0.1...54.94.177.156<54.94.177.156>===10.80.0.0/24; erouted; eroute owner: #4
000 "us-to-br":     oriented; my_ip=unset; their_ip=unset;
000 "us-to-br":   xauth info: us:none, them:none,  my_xauthuser=[any]; their_xauthuser=[any]; ;
000 "us-to-br":   modecfg info: us:none, them:none, modecfg policy:push, dns1:unset, dns2:unset, domain:unset, banner:unset;
000 "us-to-br":   labeled_ipsec:no, loopback:no; 
000 "us-to-br":    policy_label:unset; 
000 "us-to-br":   ike_life: 3600s; ipsec_life: 28800s; rekey_margin: 540s; rekey_fuzz: 100%; keyingtries: 0;
000 "us-to-br":   sha2_truncbug:no; initial_contact:no; cisco_unity:no; send_vendorid:no;
000 "us-to-br":   policy: PSK+ENCRYPT+TUNNEL+PFS+UP+IKEv2ALLOW+SAREFTRACK+IKE_FRAG; 
000 "us-to-br":   conn_prio: 24,24; interface: eth0; metric: 0; mtu: unset; sa_prio:auto;
000 "us-to-br":   newest ISAKMP SA: #8; newest IPsec SA: #4; 
000 "us-to-br":   IKE algorithm newest: AES_CBC_128-SHA1-MODP2048
000 "us-to-br":   ESP algorithm newest: AES_128-HMAC_SHA1; pfsgroup=<Phase1>
000  
000 Total IPsec connections: loaded 1, active 1
000  
000 State list:
000  
000 #8: "us-to-br":4500 STATE_MAIN_R3 (sent MR3, ISAKMP SA established); EVENT_SA_REPLACE in 1739s; newest ISAKMP; lastdpd=-1s(seq in:0 out:0); idle; import:not set
000 #4: "us-to-br":4500 STATE_QUICK_R2 (IPsec SA established); EVENT_SA_REPLACE in 16497s; newest IPSEC; eroute owner; isakmp#3; idle; import:not set
000 #4: "us-to-br" esp.d0297fbe@54.94.177.156 esp.1d146548@10.70.0.106 tun.0@54.94.177.156 tun.0@10.70.0.106 ref=0 refhim=4294901761 Traffic: ESPin=840B ESPout=840B! ESPmax=4194303B

Besides the configuration description and the path the packets take between the IPs, the last lines must contain these strings: ISAKMP SA established and IPsec SA established. If those messages appear, the VPN was successfully established.
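If you just want a quick check for those two strings:

ipsec auto --status | grep -E 'ISAKMP SA established|IPsec SA established'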

Another useful check is to verify that there is an IPsec policy directing the other VPC’s traffic through the Elastic IP. For example, running this on openswan-br:

[root@openswan-br ~]# ip xfrm policy
src 10.80.0.0/24 dst 10.70.0.0/24 
	dir out priority 2344 ptype main 
	tmpl src 10.80.0.32 dst 54.152.52.80
		proto esp reqid 16385 mode tunnel
src 10.70.0.0/24 dst 10.80.0.0/24 
	dir fwd priority 2344 ptype main 
	tmpl src 54.152.52.80 dst 10.80.0.32
		proto esp reqid 16385 mode tunnel

Looking at the output, we can see that the traffic between networks 10.80.0.0/24 and 10.70.0.0/24 uses the Elastic IP (54.152.52.80) and passes through the tunnel.

Oh, and don’t forget to enable the service to start at boot:

chkconfig ipsec on
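On distributions using systemd (like CentOS 7), the equivalent is:

systemctl enable ipsec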

Testing

The first thing that we can use to test and see if everything works is ping!

From us-east-1 instance to sa-east-1:

[root@openswan-us ~]# ping -c3 10.80.0.32
PING 10.80.0.32 (10.80.0.32) 56(84) bytes of data.
64 bytes from 10.80.0.32: icmp_seq=1 ttl=64 time=123 ms
64 bytes from 10.80.0.32: icmp_seq=2 ttl=64 time=123 ms
64 bytes from 10.80.0.32: icmp_seq=3 ttl=64 time=123 ms

--- 10.80.0.32 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 123.103/123.254/123.417/0.425 ms

From sa-east-1 instance to us-east-1:

[root@openswan-br ~]# ping -c3 10.70.0.106
PING 10.70.0.106 (10.70.0.106) 56(84) bytes of data.
64 bytes from 10.70.0.106: icmp_seq=1 ttl=64 time=123 ms
64 bytes from 10.70.0.106: icmp_seq=2 ttl=64 time=123 ms
64 bytes from 10.70.0.106: icmp_seq=3 ttl=64 time=122 ms

--- 10.70.0.106 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 122.954/123.128/123.427/0.212 ms

In these examples, 10.80.0.32 is the internal IP for openswan-br and 10.70.0.106 is the internal IP for openswan-us. If ping is working, great!

But besides pinging from one instance to another, we must ping other instances inside the subnets, to see if the VPC route tables are correct.

Suppose we have two more instances on the private subnets (or public, it doesn’t matter, since we configured both route tables). They will be identified as:

  • client-us: IP 10.70.0.250
  • client-br: IP 10.80.0.14

Testing with ping from one to another:

From us-east-1 instance to sa-east-1:

[root@client-us ~]# ping -c3 10.80.0.14
PING 10.80.0.14 (10.80.0.14) 56(84) bytes of data.
64 bytes from 10.80.0.14: icmp_seq=1 ttl=62 time=126 ms
64 bytes from 10.80.0.14: icmp_seq=2 ttl=62 time=125 ms
64 bytes from 10.80.0.14: icmp_seq=3 ttl=62 time=125 ms

--- 10.80.0.14 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 125.461/125.954/126.564/0.457 ms

From sa-east-1 instance to us-east-1:

[root@client-br ~]# ping -c3 10.70.0.250
PING 10.70.0.250 (10.70.0.250) 56(84) bytes of data.
64 bytes from 10.70.0.250: icmp_seq=1 ttl=62 time=125 ms
64 bytes from 10.70.0.250: icmp_seq=2 ttl=62 time=125 ms
64 bytes from 10.70.0.250: icmp_seq=3 ttl=62 time=125 ms

--- 10.70.0.250 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2002ms
rtt min/avg/max/mdev = 125.272/125.513/125.827/0.470 ms

Another interesting way to see the routing working is to use traceroute. An example from client-us to client-br:

[root@ip-10-70-0-250 ~]# traceroute -n 10.80.0.14
traceroute to 10.80.0.14 (10.80.0.14), 30 hops max, 60 byte packets
 1  10.70.0.106  2.791 ms  2.740 ms  2.697 ms
 2  10.80.0.32  124.710 ms  124.682 ms  126.029 ms
 3  10.80.0.14  127.201 ms  127.158 ms  127.109 ms

We can see that the packet goes first to the openswan-us instance (10.70.0.106), then directly to the openswan-br instance (10.80.0.32) in the other VPC, and then to its final destination: client-br. If the VPN were not active or the VPC route table were not configured, the packet would try to go through the Internet and it would not work, because the 10.80.0.0/24 network does not exist publicly on the Internet.

Troubleshooting

My two openswan instances ping between each other, but the internal instances don’t

  • Check if the kernel parameter /proc/sys/net/ipv4/ip_forward is 1. It needs to be 1 (see the quick check below).
  • Check if the VPC route tables are configured to use the openswan instance’s network interface (ENI) when trying to reach the other VPC’s network.
  • Check if the openswan instances are in a security group that allows traffic between all instances on the internal subnets.
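A quick way to check and temporarily fix the first item on the instance itself:

# Must print 1
cat /proc/sys/net/ipv4/ip_forward
# Enables forwarding immediately (the sysctl.d file from earlier makes it persistent)
sysctl -w net.ipv4.ip_forward=1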

Sometimes the transfer between VPCs gets too slow

Make sure that the instance type used for the openswan instances supports the network throughput you need. Raise the instance type to a better one and see if the problem persists. Use programs like iptraf to watch the bandwidth in real time and iperf to run tests between the networks.

Packet loss is also a common symptom in this scenario.
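To measure the throughput with iperf, using our example IPs: start a server on one end and a client on the other, then compare the result with what the instance type should deliver:

# On openswan-br (10.80.0.32)
iperf -s
# On openswan-us
iperf -c 10.80.0.32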

Openswan can’t establish a connection to the other end

If you check the ipsec status and see something like this:

000 #1: "us-to-br":500 STATE_MAIN_I1 (sent MI1, expecting MR1); EVENT_RETRANSMIT in 23s; nodpd; idle; import:admin initiate
000 #1: pending Phase 2 for "us-to-br" replacing #0

it’s because you didn’t pass Phase 1 of the IPsec connection. This means that openswan couldn’t even reach the other instance to negotiate the key, the cryptography, the networks, and so on.

This is generally easy to solve:

  • Make sure that both openswan instances are in a security group that allows: UDP 500, UDP 4500, Custom Protocol 50 (ESP) and Custom Protocol 51 (AH).
  • Review your openswan configuration and see if the left* and right* values are correct.

There’s no traffic between VPCs

Supposing openswan could establish a connection and negotiate Phase 1 and Phase 2, start monitoring the packets to see if they’re arriving and being encrypted/decrypted. On both ends, use the command:

ip xfrm monitor

You should see packets come and go between the instance’s local IP and the Elastic IP on the other end.

Also check that there’s no local firewall on the instance (iptables) blocking packets. Pay attention to the FORWARD chain of the filter table, and make sure there are no SNAT/DNAT rules rewriting the packets.
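A couple of commands to inspect both, with packet counters that help spot drops or rewrites:

# FORWARD chain of the filter table
iptables -L FORWARD -nv
# NAT rules that could rewrite the packets
iptables -t nat -L POSTROUTING -nv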

Finally, try what was described in the first item of this troubleshooting section.

My traffic only works with NAT or only works on one VPC

Some tutorials on the Internet teach how to configure openswan on the same instance that does NAT to the Internet (a common scenario). It’s possible that, in this configuration, packets to the other VPC leave with the Elastic IP in the source field. There are two ways to identify when this problem occurs:

  • When analyzing traffic on the openswan instance, the packets leave with the Elastic IP and not the internal one.
  • When analyzing traffic between internal instances of both VPCs, the IP that arrives at the destination is always the openswan instance’s, not the internal instance’s (the origin).

In the first case, when we execute tcpdump on the openswan instance and do a test with ping, we can see something like this:

[root@openswan-us ~]# tcpdump -i any -nn icmp
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
16:01:49.597273 IP 54.152.52.80 > 10.80.0.14: ICMP echo request, id 13053, seq 2, length 64
16:01:50.597281 IP 54.152.52.80 > 10.80.0.14: ICMP echo request, id 13053, seq 3, length 64
16:01:51.597269 IP 54.152.52.80 > 10.80.0.14: ICMP echo request, id 13053, seq 4, length 64

Notice that the source IP is the Elastic IP, but it should be the openswan-us internal IP (10.70.0.106). Also: the ping won’t work, because when it reaches the other end, the reply will try to come back through the Internet and not through the tunnel (since the Elastic IP is public, and not part of the 10.70.0.0/24 subnet).

This can happen if you use the following line in the connection configuration in openswan:

leftsourceip=54.152.52.80

So, don’t use this line unless you really know what you’re trying to do.

Another symptom of this problem is that, when the VPN connection gets established, openswan creates an additional route:

[root@openswan-us ~]# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.70.0.1       0.0.0.0         UG    0      0        0 eth0
10.70.0.0       0.0.0.0         255.255.255.128 U     0      0        0 eth0
10.80.0.0       10.70.0.1       255.255.255.0   UG    0      0        0 eth0

This route to the 10.80.0.0 network doesn’t need to exist. If you don’t use the leftsourceip option, the route won’t be created and the packets will pass normally through the tunnel, without being rewritten.

If you didn’t use this option, also check in iptables if there’s a NAT/masquerade rule rewriting the packets. For example, this rule, common on NAT instances, would not be a good idea:

iptables -t nat -A POSTROUTING -s 10.70.0.0/24 -j MASQUERADE

This would make iptables rewrite the packets to always use the openswan instance’s IP instead of the origin instance’s IP. A better rule would be:

iptables -t nat -A POSTROUTING -s 10.70.0.0/24 ! -d 10.80.0.0/24 -j MASQUERADE

This way the instance does NAT for its entire network, except for packets going to the other VPC. The packets that go through the tunnel are not rewritten.
