
My goal is to configure a container to behave as a router which load-balances over a number of VPN connections.

To do this, I'm probabilistically marking initiating packets with:

# restore any saved connmark onto the packet before the routing decision
iptables -I PREROUTING -t mangle -j CONNMARK --restore-mark
# mark ~50% of still-unmarked packets with 200...
iptables -A PREROUTING -t mangle -m statistic --mode random --probability 0.50 -j MARK --set-mark 200 -m mark --mark 0
# ...and the remaining unmarked packets with 201
iptables -A PREROUTING -t mangle -j MARK --set-mark 201 -m mark --mark 0
# save the packet mark back to the conntrack entry for the whole flow
iptables -A POSTROUTING -t mangle -j CONNMARK --save-mark
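
To sanity-check that flows pick up and keep their marks, the conntrack CLI (from conntrack-tools, assuming it's available in the container) can list tracked connections by mark:

conntrack -L --mark 200
conntrack -L --mark 201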

The mark then selects one of two routing tables:

echo "200     tun0" >> /etc/iproute2/rt_tables
echo "201     tun1" >> /etc/iproute2/rt_tables 
ip rule add fwmark 200 table tun0
ip rule add fwmark 201 table tun1
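
A quick way to confirm that the rules steer marked traffic as intended (a sketch; 8.8.8.8 is just an arbitrary external address):

ip rule show                    # should list both fwmark rules
ip route get 8.8.8.8 mark 200   # should resolve via table tun0
ip route get 8.8.8.8 mark 201   # should resolve via table tun1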

I believe the routing table is being selected correctly, because when I configure either of the tables tun0/1 to use the VPN gateway, traffic seems not to get returned. A tcpdump shows traffic exiting, but any command run over the connection fails.

ip route add default via 10.7.7.1 dev tun0 table tun0
ip route add default via 10.7.7.1 dev tun1 table tun1

If tables tun0/1 use the non-VPN gateway 10.10.10.1, traffic behaves as expected. I can also select between VPN gateways by setting the default route on the main table:

ip route add default via 10.7.7.1 dev tun0   # or tun1

So the problem appears to be when the VPN gateway is selected via one of the custom tables rather than the main table. Any clues/diagnostics/advice welcomed!

NB: I've configured the requisite options:

for f in /proc/sys/net/ipv4/conf/*/rp_filter; do echo 0 > "$f"; done
echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
sysctl -w net.ipv4.fwmark_reflect=1
sysctl -w net.ipv4.ip_forward=1

iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
iptables -t nat -A POSTROUTING -o eth1 -j MASQUERADE
iptables -t nat -A POSTROUTING -o tun0 -j MASQUERADE
iptables -t nat -A POSTROUTING -o tun1 -j MASQUERADE

ANSWER:

@A.B's answer provides the solution. I needed to add a route for traffic returning to the local network in the tun0/1 tables:

ip route add 10.10.10.0/24 via 10.10.10.1 table tun0
ip route add 10.10.10.0/24 via 10.10.10.1 table tun1

As @A.B said, without these, marked packets are sent back out the tun interface on which they were received.
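
For completeness, the resulting tables can be inspected with:

ip route show table tun0
ip route show table tun1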

1 Answer


Let's follow what happens.

  • A packet (the first of a new flow) arrives from a non-tunnel interface
  • conntrack creates a new entry for this packet, starting a new flow
  • the packet receives (randomly, this time) mark 200 before the routing decision
  • the packet gets routed using table 200
  • table 200 has a single possibility: the packet will be sent through tun0
  • the packet's mark gets saved for the whole flow in its conntrack entry (i.e. the connmark)

So far, so good: the packet (and its flow) has been load-balanced through tun0.

Now what happens when a reply packet in this flow comes back?

  • reply packet arrives from tun0

  • reply packet is identified by conntrack as part of an existing flow

  • packet inherits mark 200 from the connmark associated with the existing flow, before the routing decision

  • packet gets routed using table 200

  • table 200 has a single possibility: packet will be sent through tun0

    Oops: the reply packet is routed back to where it came from, the tunnel interface, instead of the interface the initial packet of the flow arrived on (see the route lookup sketch after this list).

  • depending on whether the next hop router (the tunnel's remote endpoint) has also disabled Strict Reverse Path Forwarding (rp_filter=0) or not, the packet either gets dropped or is routed back again, creating a loop until its TTL decrements to 0.
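
The wrong turn can be made visible with a reverse route lookup (a sketch; 10.10.10.50 stands in for a hypothetical LAN client):

ip route get 10.10.10.50 mark 200
# before the fix: resolves via table tun0's default route, i.e. back out tun0
# after the fix below: resolves via the copied LAN route, back toward the client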

"So the problem appears to be when the VPN gateway is selected via one of the custom tables rather than the main table."

Indeed, the main routing table contains more than just a default route: it typically includes one or more LAN routes. So when no mark is involved, the reply is routed correctly, following an evaluation of all of the main routing table's entries, not just its default route.

These additional LAN routes (the routes using eth0 and eth1, or at least the one carrying client requests, if not both) must also be copied to the additional routing tables 200 and 201, as sketched below.
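
A minimal sketch of that copy, assuming the LAN routes are the directly connected (scope link) entries of the main table:

# copy every directly connected route from the main table into both custom tables
ip route show table main scope link | while read -r route; do
    ip route add $route table tun0   # $route left unquoted so its fields remain separate arguments
    ip route add $route table tun1
done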


Additional remark (which doesn't apply to OP's case): in a setup working in the opposite direction, with original flows from separate nodes that use the very same (private) source IP address toward the same service, there could be two distinct flows looking identical (same 5-tuple: protocol, saddr, sport, daddr, dport) except for their tunnel interface. By default conntrack would see a single flow. To prevent this, one can use conntrack zones (with a value chosen to represent the interface) to have conntrack handle them separately.
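
If that situation did apply, assigning a zone per tunnel interface might look like this (a sketch; the zone numbers are arbitrary):

# track each tunnel's flows in a separate conntrack zone (the raw table is traversed before tracking)
iptables -t raw -A PREROUTING -i tun0 -j CT --zone 1
iptables -t raw -A PREROUTING -i tun1 -j CT --zone 2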

  • Fab, thanks @A.B! I've added the route '10.10.10.0/24 via 10.10.10.1' to both tun0/1 tables, and now traffic returns as expected. If I understand your additional remark, there's a problem only if services are exposed via the VPN? Fortunately that's not the case and connections will always be outbound. Am I understanding that correctly?
    – simonw
    Commented Apr 27, 2023 at 10:39
  • @simonw Yes, you got it. It would be a problem if two different systems using the same source address (obviously not a public one) are only differentiated by the interface they send packets through, and reach the same destination. As you've confirmed things, I'll edit the answer a little later, but will still leave something in place in case it helps other readers with other use cases.
    – A.B
    Commented Apr 27, 2023 at 13:09
  • @simonw Also, while this is needed in some setups, I wouldn't expect a directly attached LAN 10.10.10.0/24 to require a gateway in the same LAN. If 10.10.10.0/24 is on eth0, then that would simply be 10.10.10.0/24 dev eth0. Otherwise, even if that still works, 10.10.10.1 will probably start issuing ICMP redirects to tell the sender the destination is directly reachable (or, if 10.10.10.1 is the local system itself, this just has no effect).
    – A.B
    Commented Apr 27, 2023 at 18:14
