Building Dual Stack IPv4 / IPv6 Router on Linux: LinuxCon Japan, Yokohama, June 6-8, 2012
Mario & TimoJ
HUAWEI TECHNOLOGIES CO., LTD. Huawei proprietary. No spread without permission.
IPv4 Residential/SOHO Router Basic
Features
Gateway – ISPs view as focal hub for
• Cable/IPTV streaming to consumer devices
• Media Server, NAS/Print server, Home Automation, Guest Access
• ISP – eventually IPv6 Support required
Provisioning - DHCPv4/DNS Client, Server (PPPoE, PPTP, L2TP going away) – Impacts IPv6
• IPv6 Provisioning - WAN dhclient6 – IP address; LAN Prefix delegation + host OUI Unique IP and DHCP info request
• DNS – proxy server/or direct recursive – can handle resolving/caching ‘AAAA’ records
Performance
• Lowest 116Mbps/DS – 104Mbps/US – both DOCSIS/GPON will push up to 1Gbps
• Packet routing Zero impact to application Processor – ISPs require Offload engine for predictable growth
Network Features
• NAT – IPv6 N/A - Whole purpose of IPv6 – global IP space
• Bridging (i.e. Wireless/LAN, others) – IPv6 no impact
Bridged LAN+SSID and/or VLANs – only L2 header matters
• WDS – depending on topology - at L2 no impact at L3 extra configuration – rare configuration.
• VLANs (port based/Tagged) – Some IPv6 Impact
For routed vlan interfaces – Typical IPv6 Provisioning – LL, Prefix+OUI, DAD, IPv6 Routing
IPv4 Residential/SOHO Basic Router
Features
• Multicast routing
In IPv4 Gateway manages hosts queries/reports and forwards them on WAN and sets up multi-cast routes
Same is needed in IPv6
IPv4 Residential/SOHO Router
Features
Security
• IP/Port/Protocol Filtering - Same considerations as IPv4, require IPv6 FW rules
• ALGs
Same considerations as IPv4 (with exception of packet rewrite), require IPv6 equivalent ALGs for port opening
• SPI
Same considerations as IPv4 (tracking original/reply directions), require IPv6 equivalent FW rules
• Port Triggering - impacts IPv6
Same considerations as IPv4 (open in-bound port based on configured out-bound port), require IPv6 equivalent implementation.
• UPnP IGD – IPv6 impact
Management/Accessibility
• SSH, TFTP
For management require both protocol types to work
• Web (HTTP/HTTPS)
For UI require access from both protocols
• SNMP
Manage SNMP MIBs for IPv6 as well
• NTP – may not require IPv6
Many other features - gateways today do a lot more work! - NAT Bypass, VPN Pass-Through, Routed Subnets, Parental Control, TR-69, Wireless Roaming, …
Well known IPv4 Limitations
Well known Issues with IPv4 –
• Small IP Range – NAT implemented to reuse (private address range 10.xx…, 192.168….., 172.16. …) – Basic NAT operation
Example: client SRC 10.0.0.2, port 2012, connects to DST 1.2.3.4, port 3012; gateway LAN 10.0.0.1, WAN 2.3.4.5
Outbound SNAT: Src: 2.3.4.5:2012, Dst: 1.2.3.4:3012
Inbound DNAT: Src: 1.2.3.4:3012, Dst: 10.0.0.2:2012
NAT Bindings: 10.0.0.2:2012 <-> 2.3.4.5:2012
NAT introduces many issues – Performance – SNAT on way out, DNAT on way in – packet rewrite, IP, TCP checksum update
• NATs come in different flavors –
Symmetric (original dst IP/port only, separate mapping per destination) – most restrictive
Full Cone (any IP/port) – least restrictive – can hand over connection to other server
Restricted Cone NAT (original dst IP, any port) – handover within server
Port-Restricted Cone NAT (original dst IP and original dst port) – replies must come from the same server and port
NAT traversal discovery protocols STUN, TURN, ICE
IPv6 – Can connect from any IP/port to private device behind FW (TCP/UDP)
• Packet Rewrite (aka ALGs) – few protocols affected – complicates processing, affects CPU offload engines – few examples
PPTP (uses GRE) – several clients behind FW with same Call ID connecting to Server – must rewrite Call ID (on way out and in)
Well known IPv4 Limitations
[Figure: DNAT across the IPv4 Internet. Host 1.2.3.4 sends a request to the gateway WAN address 4.3.2.1/<port>; the gateway (LAN 192.168.1.1) translates it to 192.168.1.100/<port>.]
Related IPv6 Background
Primarily based on Cable eRouter standard for IPv6 delivery to premises
IPv6 Addressing
• In theory IPv6 has 340,282,366,920,938,463,463,374,607,431,768,211,456 (2^128) addresses and IPv4 has 4,294,967,296 (2^32); in practice
64 bits are used for the subnet prefix and 64 bits for the host, and the subnet can be subdivided further when OUI-based (EUI-64) host IDs are not used
• IPv6 addresses can be huge - e.g. 128.91.45.157.220.40.0.0.0.0.252.87.212.200.31.20 in dotted decimal would be an IPv6 address
• New Notation – compact hex in 16-bit groups, 805B:2D9D:DC28:0000:0000:FC57:D4C8:1F14, and one run of consecutive all-zero groups can be collapsed
so the address becomes 805B:2D9D:DC28::FC57:D4C8:1F14
• Subnet notation CIDR like 805B:2D9D:DC28::/48 – routing prefix match is on first 48 bits
• Address Space Allocation dictated by bit prefixes
Loopback ::1/128 – the 0’s collapse to ::
Global Unicast 2000::/3
Link-Local unicast FE80::/10 – address in range can’t make it outside of subnet
Multicast FF00::/8
4-bit scope field: 2 – link local, 5 – site local, 8 – organization local, … – MC can go beyond local subnet: FF02::1 all nodes MC, …
Unspecified address ::/128 – e.g. in DHCP messages when host does not know its IP address
• The elegant notation for ‘loopback’ interface comes from here - 0:0:0:0:0:0:0:1 is reduced to ::1
• IPv6 address is Network/Host; Host is OUI constructed (modified EUI-64: insert FF:FE in the middle of the MAC and flip the universal/local bit) - for MAC = 39-A7-94-07-CB-D0 Host is: 3BA7:94FF:FE07:CBD0
• IPv6 Header is 40 bytes – twice the minimum IPv4 header size (20 bytes)
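The notation rules and the OUI-based host construction above can be checked with a few lines of Python. This is a minimal sketch using only the standard `ipaddress` module; the helper names `dotted_to_ipv6` and `eui64_interface_id` are made up for illustration and do not come from the slides.

```python
import ipaddress


def dotted_to_ipv6(dotted: str) -> ipaddress.IPv6Address:
    """Turn 16 dot-separated decimal bytes into an IPv6Address."""
    octets = bytes(int(part) for part in dotted.split("."))
    if len(octets) != 16:
        raise ValueError("an IPv6 address needs exactly 16 bytes")
    return ipaddress.IPv6Address(octets)


def eui64_interface_id(mac: str) -> str:
    """Build the modified EUI-64 host part from a 48-bit MAC address."""
    raw = bytearray(int(b, 16) for b in mac.replace(":", "-").split("-"))
    if len(raw) != 6:
        raise ValueError("a MAC address needs exactly 6 bytes")
    raw[0] ^= 0x02  # flip the universal/local bit
    eui = bytes(raw[:3]) + b"\xff\xfe" + bytes(raw[3:])  # insert FF:FE
    return ":".join(eui[i:i + 2].hex().upper() for i in range(0, 8, 2))


addr = dotted_to_ipv6("128.91.45.157.220.40.0.0.0.0.252.87.212.200.31.20")
print(addr.exploded)    # all eight 16-bit groups
print(addr.compressed)  # run of zero groups collapsed to ::
print(eui64_interface_id("39-A7-94-07-CB-D0"))
```

Python prints addresses in lower case; the groups are otherwise the same as in the hex notation on the slide.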
Relevant IPv6 Background
IPv6 Packet Structure – combination of main header and extension header and option fields
• It does not have a header checksum
• Next Header TCP, UDP, Hop by Hop Header, ESP, AH, ICMPv6
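• Couple of examples of IPv6 payload construction
Plain TCP: [IPv6 hdr: NH=6, Src, Dst] [TCP]
Fragmented TCP: [IPv6 hdr: NH=44, Src, Dst] [Frag hdr: NH=6] [TCP]
ESP tunnel carrying a fragment: [IPv6 hdr: NH=50, Src, Dst] [ESP hdr] [IPv6 hdr: NH=44, Src, Dst] [Frag hdr: NH=6] [TCP] [ESP Trailer – pad, NH=41] [ESP Auth]
ENCRYPTED: inner IPv6 hdr through ESP Trailer
AUTHENTICATED: ESP hdr through ESP Trailer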
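To make the Next Header chaining concrete, here is a small sketch that walks the chain in a raw IPv6 packet. It is an illustration only: `next_header_chain` and `ipv6_header` are hypothetical helpers, addresses are zero-filled, and ESP and AH are treated as the end of the walk (ESP because what follows is encrypted, AH for brevity).

```python
import struct

IPV6_HDR_LEN = 40
NAMES = {0: "Hop-by-Hop", 6: "TCP", 17: "UDP", 41: "IPv6", 43: "Routing",
         44: "Fragment", 50: "ESP", 51: "AH", 58: "ICMPv6", 60: "Dest-Opts"}
VARIABLE_EXT = {0, 43, 60}  # length = (Hdr Ext Len + 1) * 8 bytes
FRAGMENT = 44               # fixed 8-byte extension header


def ipv6_header(next_header: int, payload_len: int) -> bytes:
    """Minimal fixed header: version 6, hop limit 64, zeroed addresses."""
    first_word = 6 << 28  # version in the top 4 bits, no class or flow label
    return struct.pack("!IHBB", first_word, payload_len, next_header, 64) + bytes(32)


def next_header_chain(packet: bytes) -> list:
    """Return the header names found by following Next Header fields."""
    nh = packet[6]          # Next Header of the fixed header
    offset = IPV6_HDR_LEN
    chain = [NAMES.get(nh, str(nh))]
    while True:
        if nh in VARIABLE_EXT:
            nh, ext_len = packet[offset], packet[offset + 1]
            offset += (ext_len + 1) * 8
        elif nh == FRAGMENT:
            nh = packet[offset]
            offset += 8
        else:
            break           # upper-layer protocol, ESP or AH: stop here
        chain.append(NAMES.get(nh, str(nh)))
    return chain


frag_hdr = struct.pack("!BBHI", 6, 0, 0, 0x1234)  # NH=6 (TCP), offset 0, id
plain = ipv6_header(6, 20) + bytes(20)
fragmented = ipv6_header(44, 8 + 20) + frag_hdr + bytes(20)
print(next_header_chain(plain))
print(next_header_chain(fragmented))
```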
Relevant IPv6 Background
• Routers don’t fragment, only hosts – minimum link MTU is 1280 bytes (IPv4 hosts only had to accept 576-byte datagrams)
• Key Changes since IPv4
Renamed - Traffic Class (IPv4 TOS), Hop Limit (IPv4 TTL),
Payload Length (IPv4 Total Length), Next Header (IPv4 Protocol)
Added - Flow Label; Removed - Internet Header Length, Identification, Flags (MF, ...), Fragment Offset, Header Checksum.
Related IPv6 Background
Neighbor Solicitation Messages, for Neighbor discovery, DAD, IP addr change – msg types NS, NA
Target address - 'Solicited-Node MC Address' - FF02::1:FF<XX:YYYY> - fixed prefix followed by lower
24 bits of target IP address (Ethernet MC MAC 33:33:FF:XX:YY:YY)
Only hosts subscribed to the MC address receive it – helpful to mobile devices in power-management (PM) mode.
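As a worked example of the mapping above, the sketch below derives the solicited-node multicast group and the matching Ethernet multicast MAC for a target address. The function name `solicited_node` and the sample link-local address are assumptions chosen for illustration.

```python
import ipaddress

SOLICITED_NODE_BASE = int(ipaddress.IPv6Address("ff02::1:ff00:0"))


def solicited_node(target: str):
    """Return (multicast group, Ethernet MAC) used to reach `target` with an NS."""
    low24 = int(ipaddress.IPv6Address(target)) & 0xFFFFFF
    group = ipaddress.IPv6Address(SOLICITED_NODE_BASE | low24)
    # IPv6 multicast maps to 33:33 followed by the low 32 bits of the group
    low32 = (int(group) & 0xFFFFFFFF).to_bytes(4, "big")
    mac = "33:33:" + ":".join(f"{b:02x}" for b in low32)
    return group, mac


group, mac = solicited_node("fe80::3ba7:94ff:fe07:cbd0")
print(group, mac)
```

Only hosts whose own addresses end in the same 24 bits join this group, which is why a sleeping device is rarely woken by its neighbours' solicitations.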
Related IPv6 Background
Security is built in – AH, ESP are further extension headers that protect or encapsulate the
payload
• Allows even securing NS, NA functions – in IPv4 ARP can’t be secured – man-in-the-middle (MITM) attacks mitigated
• Eliminates NAT issues - with several clients behind the FW (when same SPI is used)
• Related to security – scanning IPv6 address space much harder – larger address space
Related IPv6 Background
More on Stateful Configuration – More in Linux Dual-Stack GW
DHCPv6 three primary goals
• Address Configuration
• Non-Address Configurable Parameters (like DNS Server, Domain Name, NTP server …..)
• Prefix delegation – provide several prefixes (eRouter implementation)
Client known as “Requesting Router”, Server as “Delegating Router”
DHCP Server and Relay agent MC address: FF02::1:2 – this MC address has link scope
• Used by IPv6 DHCP clients, LL addresses are acceptable in source field for clients, although client may
use the unspecified address ::/128
• The recipient(s) may be a Server or Relay agent
All DHCP servers MC address: FF05::1:3 – this site scope
• Used by relay agents to contact a DHCP server, in this case the relay agent must have non-LL address
UDP over IPv6, Client uses port 546 and server uses port 547, that is client sends to port 547,
server replies to port 546.
[Figure: DHCPv6 relay topology. DHCP clients (IPv6 clients) on LAN segment 1 and LAN segment 2 reach DHCPv6 servers through a DHCPv6 relay; interface labels eth0, eth1, eth3, eth4.]
Related IPv6 Background
Architecture Principles
• Each subnet should not require a DHCP server, but at least a Relay agent
• DHCP server in the cloud is reached through a relay agent that listens on FF02::1:2. The relay agent then forwards to the DHCP server using the site-scoped All DHCP servers address (FF05::1:3).
• There can be several DHCPv6 Relay agents in route.
• Default DHCP client behavior is to send requests to local MC address
• Clients in IPv6 are known by a DUID (as opposed to IPv4 MAC or client user identifier option)
Combination of MAC and some other string
• For relay – interface identifier used to route back responses (in above figure DHCP Relay disambiguates between LAN Segment 1 or 2).
Messaging – many similarities to IPv4
• SOLICIT, ADVERTISE – client locates DHCP server, server advertises
• REQUEST, REPLY – client requests parameters including addresses, server replies
• INFORMATION-REQUEST – client sends to server - requests config params without an IP address assignment
(e.g. Stateless + Managed)
Two and Four Message Exchanges
• Two Message – ‘Rapid Commit’ - both client and server need to be configured for ‘rapid commit’
The client sends a SOLICIT and internally sets the ‘rapid commit’ option,
server responds with REPLY IP and config info
• Few other scenarios for 2-message exchange – like info request
Four Message Exchange – SOLICIT, ADVERTISE, REQUEST, REPLY
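The first message of either exchange can be sketched as raw bytes. The snippet below builds a DHCPv6 SOLICIT carrying a client DUID, IA_NA and IA_PD, and optionally the Rapid Commit option. It is an illustrative encoder, not a client: the function names, the IAID of 1 and the transaction ID are assumptions, and the DUID is the simple link-layer form (DUID-LL).

```python
import struct

SOLICIT = 1
OPT_CLIENTID, OPT_IA_NA, OPT_ELAPSED_TIME = 1, 3, 8
OPT_RAPID_COMMIT, OPT_IA_PD = 14, 25


def option(code: int, data: bytes = b"") -> bytes:
    """Encode one DHCPv6 option: 16-bit code, 16-bit length, data."""
    return struct.pack("!HH", code, len(data)) + data


def duid_ll(mac: bytes) -> bytes:
    """DUID type 3 (link-layer address), hardware type 1 (Ethernet)."""
    return struct.pack("!HH", 3, 1) + mac


def build_solicit(mac: bytes, xid: int, rapid_commit: bool = False) -> bytes:
    msg = bytes([SOLICIT]) + xid.to_bytes(3, "big")    # type + transaction id
    msg += option(OPT_CLIENTID, duid_ll(mac))
    msg += option(OPT_ELAPSED_TIME, struct.pack("!H", 0))
    ia = struct.pack("!III", 1, 0, 0)                  # IAID=1, T1=T2=0 (assumed)
    msg += option(OPT_IA_NA, ia)                       # ask for an address
    msg += option(OPT_IA_PD, ia)                       # ask for a delegated prefix
    if rapid_commit:
        msg += option(OPT_RAPID_COMMIT)                # request two-message exchange
    return msg


mac = bytes.fromhex("39a79407cbd0")
four_msg = build_solicit(mac, xid=0xABCDEF)
two_msg = build_solicit(mac, xid=0xABCDEF, rapid_commit=True)
print(len(four_msg), len(two_msg))
```

A real client would send this over UDP from port 546 to FF02::1:2 port 547, as described earlier.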
DNS in IPv6
• New Resource Record ‘AAAA’ added for Forward Lookup – ‘A’ for IPv4
• Reverse lookup – new reverse tree ‘ip6.arpa’ – ‘in-addr.arpa’ for IPv4
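For the reverse trees, Python's standard `ipaddress` module can produce the lookup names directly. The two addresses below are examples taken from elsewhere in these slides; treating them as hosts with PTR records is an assumption made purely for illustration.

```python
import ipaddress

v6_name = ipaddress.ip_address("2001:1:2:3::50").reverse_pointer
v4_name = ipaddress.ip_address("192.0.2.100").reverse_pointer

print(v6_name)  # one label per hex nibble, reversed, under ip6.arpa
print(v4_name)  # one label per decimal byte, reversed, under in-addr.arpa
```

The IPv6 name always has 32 nibble labels, so reverse zones are delegated on nibble boundaries rather than byte boundaries.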
Background on Conn Tracking, NAT, FW for
IPv4
Connection tracking – key concepts – few practical examples will follow
• Key to understanding stateful, FW, NAT and CPU Offload Engine
• Clear up confusion on ALGs in IPv6
• Foundation of Stateful Packet Inspection
• Conn tracking few key concepts
concept of ORIGINAL, REPLY direction
Monitor REPLY packet on receive – if FW rules – Filter/FORWARD – ESTABLISHED let through
conntrack structure with a hash – associated with each flow/session
FW rules integrated with conntracking – facilitates generic stateful packet inspection (i.e. FW rules)
• NAT – requires conntracking – to manage mapping for SNAT (MASQUERADE SNAT variant), DNAT
• Basis of ALGs – two parts – although coined as one –
Inbound Port opening (for example FTP)
Packet rewriting - (also FTP another good example)
• ALGs build on conntracking
Basis of ALGs – helper to monitor packet flow of ALG control stream
On match create an expectation, on hit mark as RELATED, - Filter/FORWARD – RELATED let through
• Network Offload Engine – Several reasons for it
New Technologies DOCSIS, GPON – high DL/UL data rates (128,256Mbps)
BOM – must be low
Preferable CPUs/SMP – built for high data rates into future – CPU(s) dedicated to Apps - QAM tuning/Stream MPEG over IP, DLNA/UPnP, NAS
Offload Engine – tightly integrated with conntracking, ALGs
ALG control stream – let OS handle
Manage TLU, entry into/out of OS
Manage conntrack states like conntrack timeout
Many other – routing table updates, interfaces up/down, ….,
IPv4 Connection Tracking – SPI
Key Connection Tracking Structures
• Conntrack hash table – can hit in original or reply direction; Default Namespace: init_net.ct.hash[]
• tuplehash – has the original and reply tuples and links direction for hash table
• status – and other fields, pointers, important later
nf_conn
….
struct nf_conntrack_tuple_hash tuplehash[IP_CT_DIR_MAX]; (original, reply)
….
tuplehash[original]:
struct hlist_nulls_node hnnode
struct nf_conntrack_tuple tuple;
SRC Ex: IP=10.0.0.2/2001, DST Ex: IP=1.2.3.4/3001, proto=TCP, dir=DIR_ORIGINAL
tuplehash[reply]:
struct hlist_nulls_node hnnode
struct nf_conntrack_tuple tuple;
SRC Ex: IP=1.2.3.4/3001, DST Ex: IP=10.0.0.2/2001, proto=TCP, dir=DIR_REPLY
IPv4 Connection Tracking – SPI
Example of stateful firewall, with no NAT -- subset of Network stack Tables and Chains illustrated
FILTER Table/FORWARD Chain
int ip_forward()
{
    return NF_HOOK(NFPROTO_IPV4, NF_INET_FORWARD, …, ip_forward_finish);
    iptable_filter_hook() - rule: input dev=wan, skb status ESTABLISHED or RELATED, action ACCEPT
}
ip_rcv()
{
    return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, ip_rcv_finish)
    CONNTRACK/PREROUTING: ipv4_conntrack_in()
}
int ip_mc_output(struct sk_buff *skb)
{
    return NF_HOOK_COND(NFPROTO_IPV4, NF_INET_POST_ROUTING, … , ip_finish_output, ..)
    CONNTRACK/POSTROUTING Chain: ipv4_confirm()
}
From WAN/LAN: netif_receive_skb()
To WAN/LAN: dev_queue_xmit()
Following Rule in Forward Chain: iptables -A FORWARD -i $WAN_IFACE -m state --state ESTABLISHED,RELATED -j ACCEPT
SYN, LAN -> WAN: Dst: 10.0.1.2, Src: 10.0.0.10
• Hits PREROUTING – ipv4_conntrack_in(), miss in conntrack hash creates new nf_conn with ORIGINAL: Dst: 10.0.1.2, Src: 10.0.0.10; REPLY: Dst: 10.0.0.10, Src: 10.0.1.2, plus ports and protocols
skb->nfct = &ct->ct_general, skb->nfctinfo = IP_CT_NEW
• In POSTROUTING hits ipv4_confirm(), gets CT from skb->nfct, inserts nf_conn on two hash chains
SYN/ACK, WAN -> LAN: Dst: 10.0.0.10, Src: 10.0.1.2
• ipv4_conntrack_in() hit in CT hash, dir=REPLY; sets IPS_SEEN_REPLY_BIT in ct->status for future reference and forwards the packet up
skb->nfct = &ct->ct_general, skb->nfctinfo = IP_CT_ESTABLISHED + IP_CT_IS_REPLY
• Filter Table in Forward Chain matches on skb->nfctinfo = IP_CT_ESTABLISHED and accepts the packet
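The ORIGINAL/REPLY bookkeeping described on this slide can be modelled in a few lines. This is a toy model, not the kernel's data structure: it inserts both tuples on the first packet instead of waiting for the confirm step in POSTROUTING, and the class names, port numbers and the unsolicited source address are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowTuple:
    src: str
    sport: int
    dst: str
    dport: int
    proto: str

    def inverted(self) -> "FlowTuple":
        return FlowTuple(self.dst, self.dport, self.src, self.sport, self.proto)


class Conntrack:
    """One hash table holding both directions of every connection."""

    def __init__(self):
        self.table = {}  # FlowTuple -> (conn dict, direction)

    def track(self, pkt: FlowTuple) -> str:
        hit = self.table.get(pkt)
        if hit is None:  # miss: create a new connection entry
            conn = {"original": pkt, "reply": pkt.inverted(), "seen_reply": False}
            self.table[conn["original"]] = (conn, "ORIGINAL")
            self.table[conn["reply"]] = (conn, "REPLY")
            return "NEW"
        conn, direction = hit
        if direction == "REPLY":
            conn["seen_reply"] = True  # analogous to IPS_SEEN_REPLY_BIT
        return "ESTABLISHED" if conn["seen_reply"] else "NEW"


def forward_from_wan(state: str) -> bool:
    """Mirror of the FORWARD rule: accept only ESTABLISHED or RELATED."""
    return state in ("ESTABLISHED", "RELATED")


ct = Conntrack()
syn = FlowTuple("10.0.0.10", 2001, "10.0.1.2", 80, "TCP")
syn_state = ct.track(syn)                      # LAN -> WAN
synack_state = ct.track(syn.inverted())        # WAN -> LAN reply
stray_state = ct.track(FlowTuple("10.0.1.9", 4444, "10.0.0.10", 22, "TCP"))
print(syn_state, synack_state, stray_state)
```

The unsolicited WAN packet is classified NEW, so the single FORWARD rule drops it while the SYN/ACK passes.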
IPv4 Connection Tracking – SPI+NAT
Extend to NAT with Conntracking – typical scenario client using private IP
FILTER Table/FORWARD Chain
int ip_forward()
{
    return NF_HOOK(NFPROTO_IPV4, NF_INET_FORWARD, …, ip_forward_finish);
    iptable_filter_hook() - rule: input dev=wan, skb status ESTABLISHED or RELATED, action ACCEPT
}
netif_receive_skb()
dev_queue_xmit()
SYN, LAN -> WAN: Dst: 1.2.3.4, Src: 10.0.0.10; GW Public=2.2.2.2
PREROUTING:
1. Hits PREROUTING – ipv4_conntrack_in(), miss in conntrack hash creates new nf_conn ORIGINAL: Dst: 1.2.3.4, Src: 10.0.0.10; REPLY: Dst: 10.0.0.10, Src: 1.2.3.4 and ports and protocols
skb->nfct = &ct->ct_general, skb->nfctinfo = IP_CT_NEW
2. nf_nat_in() – ct status not updated
POSTROUTING:
1. In POSTROUTING hits nf_nat_out()
a) IP_CT_NEW – hits MASQUERADE rule, updates CT DIR_REPLY tuple: Src: 1.2.3.4, Dst: 2.2.2.2. Replies can now hit in conntrack hash
b) Marks ct->status with IPS_SRC_NAT
c) Invert reply tuple – Src: 2.2.2.2; Dst: 1.2.3.4 (and L4 proto), update IP hdr and checksums
2. In POSTROUTING hits ipv4_confirm(), gets CT from skb->nfct, inserts nf_conn on two hash chains, oblivious to NAT updates.
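The key point of this slide is that masquerading rewrites the REPLY tuple so that return traffic addressed to the public IP still hits the connection. A minimal sketch of that idea follows; `NatTable`, the tuple layout `(src, sport, dst, dport)` and the port numbers are assumptions for illustration, and checksum updates are left out.

```python
def masquerade(original, public_ip):
    """Return (packet as sent on the WAN, reply tuple to hash on)."""
    src, sport, dst, dport = original
    on_wire = (public_ip, sport, dst, dport)  # source rewritten (SNAT)
    reply = (dst, dport, public_ip, sport)    # inverse of the rewritten packet
    return on_wire, reply


class NatTable:
    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.by_reply = {}                    # reply tuple -> original tuple

    def outbound(self, pkt):
        on_wire, reply = masquerade(pkt, self.public_ip)
        self.by_reply[reply] = pkt
        return on_wire

    def inbound(self, pkt):
        original = self.by_reply.get(pkt)
        if original is None:
            return None                       # no binding: nothing to translate
        src, sport, _, _ = original
        return (pkt[0], pkt[1], src, sport)   # destination rewritten (DNAT)


nat = NatTable("2.2.2.2")
out = nat.outbound(("10.0.0.10", 2012, "1.2.3.4", 3012))
back = nat.inbound(("1.2.3.4", 3012, "2.2.2.2", 2012))
stray = nat.inbound(("9.9.9.9", 1, "2.2.2.2", 2012))
print(out, back, stray)
```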
IPv4 Connection Tracking –
SPI+NAT+ALG
Add ALG to NAT with Conntracking – typical scenario client using private IP
ALG Example – FTP – the problem
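FTP Server 1.2.3.4 – IPv4 Internet – Router (WAN 4.3.2.1, LAN 192.168.0.1) – Residential/SOHO User 192.168.0.100
Passive mode off
FW State for Control Session: EST,REL
Client sends on control session: PORT 192,168,0,253,180,27
Server opens data connection: TCP SYN to 192.168.0.253, port 46107 – a private address the server cannot reach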
IPv4 Connection Tracking –
SPI+NAT+ALG
IPv4 FTP ALG example – covers both general ALG work with FTP specifics
Register FTP ALG Helper manually
• Register “FTP helper” – ‘nf_conntrack_ftp’ registers ‘helper’ in ‘nf_ct_helper_hash[]’ – monitor FTP pkts
• Later “NAT helper” – ‘nf_nat_ftp’ referenced (can be dynamically loaded)
FILTER Table/FORWARD Chain
int ip_forward()
{
    return NF_HOOK(NFPROTO_IPV4, NF_INET_FORWARD, …, ip_forward_finish);
    iptable_filter_hook() - rule: input dev=wan, skb status ESTABLISHED or RELATED, action ACCEPT
}
netif_receive_skb()
dev_queue_xmit()
LAN -> WAN: Client (w/Private IP) issues SYN to FTP server with port 21
• PREROUTING ipv4_conntrack_in() - misses; helper hash searched, “ftp_helper” associated with ‘CT’
skb->nfct = &ct->ct_general (with associated helper), skb->nfctinfo = IP_CT_NEW
• POSTROUTING ipv4_confirm() executes helper to parse packet – e.g. PORT command with Private IP (i.e. PORT 192,168,0,253,180,27)
- Do standard NAT work
Several packets go by before ALG command executed …….
LAN -> WAN: PREROUTING: Client issues PORT command (passive mode off)
skb->nfct = &ct->ct_general (with associated helper), skb->nfctinfo = IP_CT_ESTABLISHED + IP_CT_IS_REPLY
• POSTROUTING: helper finds PORT command
- modifies packet PORT IP with Public IP
- Creates a “nf_conntrack_expect” – programs DNAT
- Associates a NAT helper with expectation (nf_nat_ftp)
- enqueues ‘nf_conntrack_expect’ on init_net->ct.expect_hash[]
WAN -> LAN: PREROUTING: FTP data connection from server will miss conntrack hash table
- will hit the expect hash table, execute the NAT helper to create DNAT entry
- Set IPS_EXPECTED_BIT in ct->status, causes skb to be marked RELATED
skb->nfct = &ct->ct_general (with associated helper), skb->nfctinfo = IP_CT_RELATED
• FILTER TABLE/FORWARD CHAIN accepts packet, it’s marked RELATED
….. Data Connection statefully established by Connection Tracking and FW Rules ….
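The two halves of the FTP ALG, packet rewrite and inbound port opening, can be shown on the PORT command from the example. This is a simplified sketch in user-space Python, not the `nf_conntrack_ftp`/`nf_nat_ftp` code; the function names and the dictionary used as an "expectation" are assumptions.

```python
import re

PORT_RE = re.compile(r"^PORT (\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\r?\n?$")


def parse_port(cmd: str):
    """Extract (ip, port) from an FTP PORT command, or None if it is not one."""
    m = PORT_RE.match(cmd)
    if not m:
        return None
    n = [int(x) for x in m.groups()]
    return ".".join(str(x) for x in n[:4]), n[4] * 256 + n[5]


def rewrite_port(cmd: str, public_ip: str):
    """Rewrite the private IP and describe the expected data connection."""
    parsed = parse_port(cmd)
    if parsed is None:
        raise ValueError("not a PORT command")
    private_ip, port = parsed
    new_cmd = "PORT {},{},{}\r\n".format(
        public_ip.replace(".", ","), port >> 8, port & 0xFF)
    # Expectation: a connection to public_ip:port is RELATED and gets DNAT'ed
    expectation = {"dst": public_ip, "dport": port,
                   "dnat_to": (private_ip, port)}
    return new_cmd, expectation


parsed = parse_port("PORT 192,168,0,253,180,27\r\n")
new_cmd, expectation = rewrite_port("PORT 192,168,0,253,180,27\r\n", "4.3.2.1")
print(parsed, new_cmd.strip(), expectation)
```

The last two numbers of the command encode the port as high byte and low byte, which is how 180,27 becomes 46107.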
CPU Offload integrated with IPv4
Connection Tracking – SPI+NAT+ALG
General Principles of CPU Network Offload Integration
• ISPs want CPU heavy Apps on application processor – thus need to offload traffic
More predictable & cost effective than adding CPUs, i.e. as UL/DL rates go up
• CPU should be IDLE with Max packet bandwidth – GPON, DOCSIS 8-Ch Bonding 240 Mbps/DL 104 Mbps UL – higher in future
• Initial Packets – go up to kernel – to program offload engine – primarily L2/L3/L4 used
For DPI apps – Parental control – transparent proxy, content filtering more – kernel determines
• Must be fully integrated into: IP stack – connection tracking, NAT, ALGs, Routing, …..,
• Sessions Limited resource – must prioritize TLU – like streaming over casual browsing
Light DPI (f.e. HTTP persistent connection, Content-Type, …, fields)
General Operation
• LAN -> WAN
ingress hook: saves pre-SNAT’ed SRC L3/L4 (i.e. private IP), local MACs attaches info to skb
o PREROUTING drops packet if ALG, offloading stops here;
egress hook: saves SNAT’ed L3/L4 (i.e. Public IP) , WAN, GW MACs – programs Offload Engine
• WAN-> LAN (Path not shown)
ingress hook: saves pre DNAT’ed L3/L4 (i.e. public IP), GW, WAN MAC attaches to skb
o PREROUTING drops packet if ALG, offloading stops here;
egress hook: saves DNAT’ed L3/L4 (i.e. Private IP), local MACs – Programs Offload Engine
• Two Offload Entries programmed – on match packet headers update, switched to egress port
• Must see at least 2 packets – ORIGINAL/REPLY
• Must have: src/dst macs, src/dst IPs, src/dst ports (if applicable), protocol type, ingress, and egress port(s)
• Local packets to/from GW – obviously not accelerated
CPU Offload integrated with IPv4
Connection Tracking – SPI+NAT+ALG
General integration of Offload Engine into Linux Network Stack
[Figure: packet path through the IPv4 network stack with offload hooks. netif_receive_skb() ingress hook: save ingress L2, L3, L4, protocol and port fields and associate them with the ‘skb’. ip_rcv() / PREROUTING: check if a helper is associated (ALG control session); if yes, disable offload. ip_forward() / FORWARD: not offloaded if a drop rule is hit. ip_local_deliver(): locally delivered, not offloaded. ip_output() / POSTROUTING, then dev_queue_xmit() egress hook: program TLU with ingress/egress L2, L3, L4, protocol and port values (output ports for MC too). Offload table lookup on the LAN side; on table miss the packet goes up the stack.]
IPv6 Connection Tracking – SPI
nf_conn
….
tuplehash[original]:
struct hlist_nulls_node hnnode;
struct nf_conntrack_tuple tuple;
SRC Ex: IP=2001:1:2:3::1/2001, DST Ex: IP=2001:1:2:4::2/3001, proto=TCP, dir=DIR_ORIGINAL
tuplehash[reply]:
struct hlist_nulls_node hnnode;
struct nf_conntrack_tuple tuple;
SRC Ex: IP=2001:1:2:4::2/3001, DST Ex: IP=2001:1:2:3::1/2001, proto=TCP, dir=DIR_REPLY
….
Both IPv4/IPv6: Connection Track entries are IPv4/IPv6 agnostic
IPv6 Connection Tracking – SPI
ipv6_rcv()
{
    return NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING, ip6_rcv_finish)
    CONNTRACK/PREROUTING: ipv6_conntrack_in()
}
int ip6_output(struct sk_buff *skb)
{
    return NF_HOOK_COND(NFPROTO_IPV6, NF_INET_POST_ROUTING, … , ip6_finish_output, ..)
    CONNTRACK/POSTROUTING Chain: ipv6_confirm()
}
From WAN/LAN: netif_receive_skb()
To WAN/LAN: dev_queue_xmit()
Following Rule in Forward Chain: ip6tables -A FORWARD -i $WAN_IFACE -m state --state ESTABLISHED,RELATED -j ACCEPT
IPv6 Connection Tracking – SPI+ALG
Same ALG Example as for IPv4, NAT not applicable
FTP Server Router Residential/SOHO User
2001:4:3:2:1::100 2001:1:2:3::100 2001:1:2:3::1
2001:1:2:3::2
IPv6 Internet
Passive mode off
FW State for Control Session: EST,REL
Client sends on control session: EPRT |2|2001:1:2:3::2|46107|
Server opens data connection: TCP SYN to 2001:1:2:3::2, port 46107
ipv6_rcv()
{
    return NF_HOOK(NFPROTO_IPV6, NF_INET_PRE_ROUTING, ip6_rcv_finish)
    CONNTRACK/PREROUTING: ipv6_conntrack_in() - process expect
}
int ip6_output(struct sk_buff *skb)
{
    return NF_HOOK_COND(NFPROTO_IPV6, NF_INET_POST_ROUTING, … , ip6_finish_output, ..)
    CONNTRACK/POSTROUTING Chain: ipv6_confirm() – execute helper
}
From WAN/LAN: netif_receive_skb()
To WAN/LAN: dev_queue_xmit()
Register an ALG helper – ‘nf_conntrack_ftp’ – registers FTP helper on “nf_ct_helper_hash[]” - monitor for port 21
Client issues SYN to FTP server with port 21
• PREROUTING ipv6_conntrack_in() - misses; helper hash searched, “ftp_helper” associated with ‘CT’
skb->nfct = &ct->ct_general (with associated helper), skb->nfctinfo = IP_CT_NEW
• POSTROUTING ipv6_confirm() executes helper to parse packet – (e.g. command EPRT |2|2001:1:2:3::2|46107|)
Several packets go by before ALG command executed
PREROUTING: Client issues EPRT command (passive mode off)
skb->nfct = &ct->ct_general (with associated helper), skb->nfctinfo = IP_CT_ESTABLISHED + IP_CT_IS_REPLY
• POSTROUTING: helper finds EPRT command
- Creates a “nf_conntrack_expect”
- enqueues ‘nf_conntrack_expect’ on init_net->ct.expect_hash[]
PREROUTING: FTP data connection from server will miss conntrack hash table
- will hit the expect hash table
- Set IPS_EXPECTED_BIT in ct->status, causes skb to be marked RELATED
skb->nfct = &ct->ct_general (with associated helper), skb->nfctinfo = IP_CT_RELATED
• FILTER TABLE/FORWARD CHAIN accepts packet, it’s marked RELATED
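With no NAT in the path, the IPv6 helper only needs to read the EPRT command and register the expected data connection; nothing is rewritten. The sketch below parses the EPRT syntax (delimiter, protocol number, address, port, delimiter). It is illustrative only, and `parse_eprt` is a made-up name rather than a kernel function.

```python
import ipaddress


def parse_eprt(cmd: str):
    """Return (address, port) from an FTP EPRT command."""
    verb, _, arg = cmd.strip().partition(" ")
    if verb.upper() != "EPRT" or len(arg) < 2:
        raise ValueError("not an EPRT command")
    delim = arg[0]                        # usually '|'
    fields = arg.split(delim)             # ['', proto, addr, port, '']
    if len(fields) != 5:
        raise ValueError("malformed EPRT argument")
    proto, addr, port = fields[1], ipaddress.ip_address(fields[2]), int(fields[3])
    if {"1": 4, "2": 6}.get(proto) != addr.version:
        raise ValueError("protocol number does not match address family")
    return addr, port


addr, port = parse_eprt("EPRT |2|2001:1:2:3::2|46107|\r\n")
expectation = {"dst": addr, "dport": port}  # matched later, marked RELATED
print(addr, port)
```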
CPU Offload integrated with IPv6 SPI,
ALGs
General Operation
• LAN -> WAN - ingress hook: similar to IPv4, although deals with IPv6 header structures
PREROUTING drops packet if ALG, offloading stops here – allow helper to follow ALG control connection – pkt modified
egress hook: similar to IPv4 – programs Offload Engine
• WAN-> LAN (Path not shown) ingress hook: similar to IPv4
PREROUTING drops packet if ALG, offloading stops here; egress hook: similar to IPv4– Programs Offload Engine
• Two Offload Entries programmed – on match packet headers update, switched to egress port
• Must see at least 2 packets – ORIGINAL/REPLY
• Must have: src/dst macs, src/dst IPs, src/dst ports (if applicable), protocol type, ingress, and egress port(s)
• Local packets to/from GW – obviously not accelerated
• Issues with protocols like IPsec/ESP go away
• Configuration and source directories
CPU Offload integrated with IPv6 SPI,
ALGs
[Figure: packet path through the IPv6 network stack with offload hooks. netif_receive_skb() ingress hook: save ingress L2, L3, L4, protocol and port fields and associate them with the ‘skb’. ipv6_rcv() / PREROUTING: check if a helper is associated (ALG control session); if yes, disable offload. ip6_forward() / FORWARD: not offloaded if a drop rule is hit. ip6_input(): locally delivered, not offloaded. ip6_output() / POSTROUTING, then dev_queue_xmit() egress hook: program TLU with ingress/egress L2, L3, L4, protocol and port values (output ports for MC too). Offload table lookup on the LAN side; on table miss the packet goes up the stack.]
Offload table entries (one per direction):
L2: Src, Dst MAC - Src, Dst MAC changed
L3: Src, Dst IP - Hop Limit updated
L4: Ports or others, protocol#
6RD
Dual-Stack long term Preferable option for ISPs (example spec DOCSIS eRouter)
Expected both protocols will co-exist for years to come – few reasons - applications not migrated to
IPv6, embedded devices
DS-Lite another option – tunneling IPv4 in IPv6 – breaks IPv4 features – UPnP IGD, DDNS, DMZ, …
6RD – Rapid Deployment another option
• Tunneling IPv6 over IPv4 similar to 6to4
• tun6to4 device – used predefined IPv4 Anycast router to reach IPv6 internet
• 6RD allows an ISP-specific prefix (instead of 2002::/16) combined with the IPv4 address – e.g. 2001:4dc2::/32 with 192.0.2.100 gives 2001:4dc2:c000:264::/64
New DHCPv4 option carries the prefix and border router IPv4 address
• 6RD supported– ‘ip’ tool supports 6rd tunnel mode
[ ] IPv6: IPv6 Rapid Deployment (6RD) (EXPERIMENTAL) under “IPv6 protocols” enables 6RD
• Offload Engines – no support for IPv6 in IPv4
[Figure: 6rd gateway. LAN side: IPv6-ready client at 192.168.0.180 with LL fe80::<EUI-64> and 2001:4dc2:0403:0201:<EUI-64>, resolver -> A record and AAAA record; gateway LAN 192.168.0.1 running the RA daemon (2001:4dc2:0403:0201::<EUI>/64), ip6tables and iptables. WAN side: IPv4 4.3.2.1; 6rd device with 2001:4dc2::1/32 dev 6rd and default route ::/0 via ::1.2.3.4 dev 6rd, carrying IPv6 in IPv4 to the border router. 6rd delegated prefix: 2001:4dc2:0403:0201::/64.]
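The 6rd prefix arithmetic in the examples above can be reproduced directly: the delegated prefix is the ISP's 6rd prefix followed by the (unmasked part of the) WAN IPv4 address. The function below is a sketch under that reading; `sixrd_delegated_prefix` and its `ipv4_mask_len` parameter are names chosen for illustration.

```python
import ipaddress


def sixrd_delegated_prefix(sixrd_prefix: str, wan_ipv4: str,
                           ipv4_mask_len: int = 0) -> ipaddress.IPv6Network:
    """Append the WAN IPv4 bits (minus a common mask) to the 6rd prefix."""
    prefix = ipaddress.IPv6Network(sixrd_prefix)
    v4_bits = 32 - ipv4_mask_len
    v4 = int(ipaddress.IPv4Address(wan_ipv4)) & ((1 << v4_bits) - 1)
    new_len = prefix.prefixlen + v4_bits
    if new_len > 64:
        raise ValueError("delegated prefix would be longer than /64")
    value = int(prefix.network_address) | (v4 << (128 - new_len))
    return ipaddress.IPv6Network((value, new_len))


example = sixrd_delegated_prefix("2001:4dc2::/32", "192.0.2.100")
gateway = sixrd_delegated_prefix("2001:4dc2::/32", "4.3.2.1")
print(example, gateway)
```

192.0.2.100 is c0.00.02.64 in hex, which is where the `c000:264` groups come from.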
Dual Stack IPv4/IPv6
IPv6 GW Implementation (LAN / WAN)
• ip6tables - SPI, Port Scanning, DoS, ALGs - isolate guest SSID
• dropbear (ssh server)
• net-snmp - ipv6 MIBS
• iproute-2 - use -6 extensions for IPv6
• mcproxy - MLD proxy
• dnsmasq - DNS proxy server – server to LAN clients, client to upstream – ‘AAAA’ Records, …
• dhcp6c - IPv6 WAN prefix, Prefix Delegation; dhcp-script – configures client side
• radvd - Advertise (RAs) Prefix, Hop Limit, … params
• raparse - parse RAs, determine M=1, start provisioning sequence
• dhcp6s - info-request – DNS Server, Domain Name, NTP Server
IPv4 GW Implementation (WAN / LAN)
• iptables - NAT, SPI, Port Scanning, DoS, ALGs - isolate guest SSID
• Wireless Tools - iwpriv, iwconfig; Authenticator - Enterprise Radius Auth.
• dropbear (ssh server)
• net-snmp - ipv4 MIBS
• igmpproxy - query LAN group membership, report upstream to MC router
• iproute-2 - primarily ‘ip’ - heavily used everywhere; routing
• udhcpc - dhcp client; dhcp script – config WAN IP, DNS Server, Default GW
• dnsmasq - DNS proxy server – server to LAN clients, client to upstream – ‘A’ Records, …
• ebtables - advanced features like NAT by-pass
• bridge-utils - bridge management
• udhcpd - DHCP Server – private IP range and options
Protocols
• IPv6: ICMPv6 (58) - Neighbor Discovery NS, NA (Neighbor Cache); Router Discovery RA, RS; Echo/Reply, Dst unreachable, time exceeded; MLD Query/Report
• IPv4: IGMPv4 (2) Report/Query; ICMPv4 (1) echo, dst unreachable, time exceeded; TCP (6); UDP (17)
IPv6 Provisioning Cablelabs eRouter
Spec
eRouter is a dual stack spec – strong reference for IPv6 implementation
Can be configured for IPv4 only, IPv4+IPv6 or IPv6 only
• There are TLVs in TFTP config which determine mode – mode assumed here is Dual-Stack
IPv4 not covered – standard dhcp client /handler script - proxy DNS server, private IP DHCP server …
Procedure for ISP Facing Interface – most likely flow – other variants not practical
1. Construct link local address (LL) – ipv6/conf/wan/autoconf=1 – fe80::<OUI host> - join ND and all Hosts MC group
Router -> DHCPv6 Server and/or Gateway: DAD NS – Self (Solicited Node MC)
2. Get RAs – confirm managed mode (M flag set), get default router other params like Hop Limit, MTU
Expect RA – Message
struct nd_router_advert
inspect - ra->nd_ra_flags_reserved for ND_RA_FLAG_MANAGED
3. The M bit must be set – issue DHCPv6 request – get IA_NA, IA_PD – perm. IPv6 address, prefixes and DNS server
Router may use Rapid Commit option in future – discussed earlier
Message exchange between Router and DHCPv6 Server and/or Gateway:
Router -> Server: Solicit (FF02::1:2 UDP 547)
Server -> Router: Advertise (IA_NA, IA_PD, DUID for router, Rapid Commit support option, DNS recursive server IP - use UDP 546)
Router -> Server: Request (unicast)
Server -> Router: Reply (unicast)
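Step 2 of the procedure, inspecting the RA flags before starting DHCPv6, can be sketched as a parser for the fixed part of a Router Advertisement. The flag values mirror `ND_RA_FLAG_MANAGED` and `ND_RA_FLAG_OTHER` from `netinet/icmp6.h`; the function name `parse_ra` and the sample packet contents are assumptions, and RA options such as MTU or Prefix Information are not parsed.

```python
import struct

ND_ROUTER_ADVERT = 134
ND_RA_FLAG_MANAGED = 0x80  # M flag: addresses come from DHCPv6
ND_RA_FLAG_OTHER = 0x40    # O flag: other config comes from DHCPv6


def parse_ra(icmp6: bytes) -> dict:
    """Decode the 16-byte fixed part of an ICMPv6 Router Advertisement."""
    if len(icmp6) < 16:
        raise ValueError("truncated Router Advertisement")
    (msg_type, _code, _cksum, hop_limit, flags,
     lifetime, _reachable, _retrans) = struct.unpack("!BBHBBHII", icmp6[:16])
    if msg_type != ND_ROUTER_ADVERT:
        raise ValueError("not a Router Advertisement")
    return {
        "hop_limit": hop_limit,
        "managed": bool(flags & ND_RA_FLAG_MANAGED),
        "other": bool(flags & ND_RA_FLAG_OTHER),
        "router_lifetime": lifetime,
    }


sample = struct.pack("!BBHBBHII", ND_ROUTER_ADVERT, 0, 0, 64,
                     ND_RA_FLAG_MANAGED | ND_RA_FLAG_OTHER, 1800, 0, 0)
info = parse_ra(sample)
start_dhcpv6 = info["managed"]  # step 3 only runs when the M bit is set
print(info, start_dhcpv6)
```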
IPv6 Provisioning & Routing
Cablelabs eRouter Spec
Procedure for Customer Facing Interface(s)
The Customer Facing Interface configuration – follows ISP server configuration
1. Create LL address w/DAD, subscribe to ND & All hosts MC Groups
2. Construct IPv6 address for each interface
Use IA_PD + interface OUI, run DAD, subscribe to ND & All hosts MC Groups
3. Generate RAs – with O=1 and provide Prefix option
IA_PD – from DHCP on ISP facing interface
Client use SLAAC
For example RADVD configuration:
interface eth0.3
{
    AdvSendAdvert on;
    MinRtrAdvInterval 30;
    MaxRtrAdvInterval 100;
    AdvOtherConfigFlag on;
    prefix 2001:1:2:3::/64
    {
        ….
    };
};
4. Start up DHCPv6 Server
At very least pass DNS Server – determined from ISP Interface configuration
Other acceptable option – run proxy DNS server – update /etc/resolv.conf - pass router as DNS server
Example from dhcp6s –
option domain-name-servers 2001:1:2:3::50;
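Step 2 of the customer-facing procedure, building one prefix per LAN interface from the delegated IA_PD, can be sketched as follows. The delegated prefix 2001:1:2::/56 is a hypothetical value (the slides only show the resulting /64), and both helper names are invented for the example.

```python
import ipaddress
from itertools import islice


def carve_lan_prefixes(delegated: str, count: int):
    """Take the first `count` /64 subnets out of a delegated prefix."""
    pd = ipaddress.IPv6Network(delegated)
    if pd.prefixlen > 64:
        raise ValueError("delegated prefix is longer than /64")
    return list(islice(pd.subnets(new_prefix=64), count))


def interface_address(prefix: ipaddress.IPv6Network, interface_id: int):
    """Combine a /64 prefix with a 64-bit interface identifier."""
    return ipaddress.IPv6Interface(
        (int(prefix.network_address) | interface_id, prefix.prefixlen))


lans = carve_lan_prefixes("2001:1:2::/56", 2)        # e.g. one per VLAN/SSID
iid = 0x3BA794FFFE07CBD0                             # OUI-based host part
addr = interface_address(lans[1], iid)
print(lans, addr)
```

Each resulting /64 is what would go into the `prefix` line of the RA daemon configuration for that interface.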
Routing
• IPv6 addresses are globally routable – nothing special – off-link: ND for GW, on-link: ND for destination
• MLD – similar to IGMP – must manage LAN membership – provide reports to queries – on ISP facing interface
Managing Dual Stack Gateway – SW solution
ISPs concerned Dual-Stack will require more upgrades
• Limit Service Calls, sending out technicians
Prefer to isolate both stacks
Parameter changes – require total reboot – safest approach
With Dual-Stack – shared components (DNS, SNMP, TR-69, SSH, …) – updates impact both stacks
Upgrade Flexibility – update kernel, apps keep other stack running
Intelligent upgrade – constant interface
Virtualization one solution
• introduces new challenges
[Figure: three domains sharing the LAN IPv4/IPv6 and WAN interfaces. MANAGEMENT Domain: Web UI interface, NAS, Media Server, Mgmt Interface (upgrade, monitor, …). IPv4 Domain: DHCP Client/Server, DNS, DDNS, SSH, …. IPv6 Domain: DHCP6s, DHCP6c, DNS, RADVD, SSH, ….]
BR_FILTER in/out – Forward Filter ‘-p <ipv4,arp>’ drop on ipv6 interface; ‘-p<ipv6>’ drop on ipv4 interfaces
BR_FILTER local in – BROUTER drop
Same WAN MAC on both VMs
Allows independent management of stacks
CPU offload needs backend/frontend driver
Managing Dual Stack Gateway – HW
Support
[Figure: hardware-supported variant of the three domains sharing the LAN IPv4/IPv6 and WAN interfaces. MANAGEMENT Domain: Web UI interface, NAS, Media Server, Mgmt Interface (upgrade, monitor, …). IPv4 Domain: DHCP Client/Server, DNS, DDNS, SSH, …. IPv6 Domain: DHCP6s, DHCP6c, DNS, RADVD, SSH, ….]
Managing Dual Stack Gateway –
Upgrade
Typical Image Format without virtualization
boot loader | kernel side1 | rootfs side1 | kernel side2 | rootfs side2 | R/W FS
Typical Upgrade of CPE – complex procedure
• TR-69 RPC used – communicate image download
• HTTPS – used to download image
• CPE typically upgraded at image level
• After download – image burned to ‘other’ side, boot side switched
[Figure: upgrade message sequence. Labels: Inform; Hello; SSL Handshake; certificate request; certificate (MAC, Pub Key, version, ..); Download Request (XML format: File Name, type); HTTPS File Download; http POST; XML Config (Version, Parameters, Valid Duration); TFTP File request; Transfer Complete.]
Thank you
www.huawei.com