
Routing design in operational networks

2004, ACM SIGCOMM Computer Communication Review


Session 1: Network Geometry and Design

Routing Design in Operational Networks: A Look from the Inside ∗

David A. Maltz, Geoffrey Xie, Jibin Zhan, Hui Zhang
{dmaltz,geoffxie,jibin,hzhang}@cs.cmu.edu
Carnegie Mellon University

Gísli Hjálmtýsson†, Albert Greenberg
{gisli,albert}@research.att.com
AT&T Labs–Research

Abstract

In any IP network, routing protocols provide the intelligence that takes a collection of physical links and transforms them into a network that enables packets to travel from one host to another. Though routing design is arguably the single most important design task for large IP networks, there has been very little systematic investigation into how routing protocols are actually used in production networks to implement the goals of network architects. We have developed a methodology for reverse engineering a coherent global view of a network’s routing design from the static analysis of dumps of the local configuration state of each router. Starting with a set of 8,035 configuration files, we have applied this method to 31 production networks. In this paper we present a detailed examination of how routing protocols are used in operational networks. In particular, the results show the conventional model of “interior” and “exterior” gateway protocols is insufficient to describe the diverse set of mechanisms used by architects. We provide examples of the more unusual designs and examine their trade-offs. We discuss the strengths and weaknesses of our methodology, and argue that it opens paths towards new understandings of network behavior and design.

Keywords

Routing design, static configuration analysis, reverse engineering, network modeling

1. Introduction

By constructing the collective distributed routing state, routing protocols create the “network-wide intelligence” that transforms a collection of individual links and routers into an IP network. A network’s routing design is embodied in the configuration of these protocols.
While originally targeted at establishing basic reachability, in practice routing designs are used to attempt to deal with a large and complex set of objectives and constraints: (i) providing resiliency and predictable behavior under a wide set of internal or external fault or overload conditions; (ii) maintaining stable and efficient internal operations; (iii) maintaining contractual or business relationships between different administrative domains; (iv) coping with complex interactions between a wide set of protocols, which run concurrently, overlap in functionality, and collectively determine the forwarding tables within the routers. Creating a routing design is in practice a policy driven design task of specifying packet filters, link weights, routing policies, and so forth. Understanding a routing design is complicated by the enormous range of options the routing designer may choose from to realize given objectives and constraints – a diverse range of routing designs may all satisfy a given set of constraints. More importantly, intricate details of the design choices have significant impact on fundamental aspects of overall network performance and operations, including complexity, cost, and survivability. Routing design is both inherently hard and the single most important network design task. It is natural to think of numerous possibilities to improve the situation: e.g., construction of simpler, more robust and more efficient routing designs with available protocols; construction of better models for reasoning about the range of emergent routing states that may result from the design in the operational network; construction of better configuration languages and better protocols that more cleanly separate policy intent from implementation, so that policies can be better composed and reasoned about. 
To succeed, we need first to get some level of understanding of what routing designs look like in operational networks, and what the routing designers are attempting to achieve. In practice this must be an exercise in reverse engineering, in part because documentation lags the network as network technologies and business conditions change rapidly and overloaded operations staff only need the specific configuration rules rather than the global configuration intent.

Categories and Subject Descriptors: C.2.1 [Network Architecture and Design]
General Terms: Design, Management, Measurement

∗ This research was sponsored by the NSF under ITR Awards ANI-0085920 and ANI-0331653. Views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of AT&T, NSF, or the U.S. government.
† Also at Reykjavík University

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SIGCOMM’04, Aug. 30–Sept. 3, 2004, Portland, Oregon, USA. Copyright 2004 ACM 1-58113-862-8/04/0008 ...$5.00.

Two reverse engineering methodologies might be attempted. One, which we term “black-box”, involves using a variety of data available without administrative privilege, including trace routes, pings, BGP table dumps, and DNS lookups. An excellent set of “black-box” research results has come from the RocketFuel [25], Skitter [2], and Mercator [8] projects.
These projects have provided some understanding of the topology and POP structure of major backbone networks, providing an estimated snapshot of the routers and their IP connectivity within a given administrative domain.

Another approach, which we term “white-box”, involves using data available to those with network administrative privileges. Insights and results from white-box approaches can go substantially beyond IP topology snapshots. Topology snapshots emerge as the results of the interactions of routing configuration and routing protocols. White-box approaches shed direct light on the routing design that governs the protocols that produce the snapshots, and provide fundamental data needed to reason about why a particular topology emerges. In contrast to black-box approaches, there is little understanding of the power and limitations of white-box approaches.

With this paper we begin the process of structuring and analyzing IP routing designs. Our approach is pragmatic. Rather than theorizing about goals and metrics, we have chosen to begin our research by investigating routing configurations of existing production networks. By far the best sources of structure and design pattern information available for operational IP networks are the running configuration files associated with the routers. A router configuration file provides a dump of the complete set of configuration commands currently executing on the router. Roughly, a router configuration file corresponds to a program, and the set of router configuration files in a network corresponds to a distributed program. Just as program analysis has had a rich history (e.g., [16]) and a profound impact on computing technologies (e.g., modern compilers, RISC architectures), we believe routing design analysis may illuminate the way forward for better configuration languages and simpler, more robust network architectures.
In this paper, we propose a scalable white-box approach for reverse engineering of routing designs, from data easily and routinely archived today in virtually all operational networks. In collaboration with a major network service provider we retrieved and anonymized the configuration files of 23,417 production routers. With the aid of a trusted intermediary we selected for detailed analysis 8,035 configuration files constituting 31 production networks ranging in size from medium to large, representing regional enterprises, global enterprise networks, and segments of the provider’s backbone networks. To our knowledge such an undertaking has not been done before, and certainly not at a comparable scale. This is a significant contribution of this paper. An additional contribution is our methodology of working with anonymous data. Configuration files commonly contain sensitive and proprietary information. Working with anonymous data was key in getting access to the set of configuration files and makes our methods viable to the larger networking community. Using this methodology we perform a detailed examination of how routing protocols are used in operational networks and derive a number of interesting observations that could not have been made using any other existing approach.

Figure 1: An example topology showing routers, interfaces, and links. R1-R3 represent a small enterprise network customer connected to R4-R6, which represent part of a transit network that also serves R7. (Diagram labels: Enterprise Network, Backbone Network; legend entries: interface, logical link, external router.)

2. Background

This section explains the types of configurations required to implement a routing design. To make the explanation more concrete, Figure 1 shows a router-level view of an example topology. Routers R1 to R7, depicted as disks, are connected with physical links (shown as solid lines in the figure). Links terminate at interfaces, shown as small squares.
In this example, routers R1-R3 belong to a small enterprise network that obtains connectivity to the Internet through a transit backbone network, of which routers R4-R6 are a part. Router R7 belongs to another customer of the backbone network. Figure 2 shows part of the routing configuration file from router R2 in Figure 1. This “configlet” is in the Cisco IOS language. While the syntax of other router configuration languages differs, the granularity and type of information they contain are very similar. We use this example and these figures throughout this section. As described in Section 4, user-specific information is anonymized for privacy reasons.

 1  interface Ethernet0
 2   ip address 66.251.75.144 255.255.255.128
 3   ip access-group 143 in
 4  !
 5  interface Serial1/0.5 point-to-point
 6   ip address 66.253.32.85 255.255.255.252
 7   ip access-group 143 in
 8   frame-relay interface-dlci 28
 9  !
10  interface Hssi2/0 point-to-point
11   ip address 66.253.160.67 255.255.255.252
12  !
13  router ospf 64
14   redistribute connected metric-type 1 subnets
15   redistribute bgp 64780 metric 1 subnets
16   network 66.251.75.128 0.0.0.127 area 0
17  !
18  router ospf 128
19   redistribute connected metric-type 1 subnets
20   network 66.253.32.84 0.0.0.3 area 11
21   distribute-list 44 in Serial1/0.5
22   distribute-list 45 out
23  !
24  router bgp 64780
25   redistribute ospf 64 match route-map 8aTzlvBrbaW
26   neighbor 66.253.160.68 remote-as 12762
27   neighbor 66.253.160.68 distribute-list 4 in
28   neighbor 66.253.160.68 distribute-list 3 out
29  !
30  access-list 143 deny 134.161.0.0 0.0.255.255
31  access-list 143 permit any
32  route-map 8aTzlvBrbaW deny 10
33   match ip address 4
34  route-map 8aTzlvBrbaW permit 20
35   match ip address 7
36  ip route 10.235.240.71 255.255.0.0 10.234.12.7

Figure 2: Part of the router configuration file from R2 in Figure 1 showing three interface definitions, two different instances of OSPF, one instance of BGP, and assorted policies. User-specific information, such as IP addresses and route-map names, has been anonymized for privacy (e.g., the route-map name “8aTzlvBrbaW” is a random string that replaces the actual name).

2.1 Link-level Topology

Each router has one or more interfaces; each interface has one or more IP addresses and subnets that identify the set of other IP addresses directly reachable from that interface. Lines 1-11 of Figure 2 show interface definitions for three interfaces of R2, an Ethernet, Serial, and High Speed Serial (Hssi) interface, having IP addresses and subnets 66.251.75.144/25, 66.253.32.85/30, and 66.253.160.67/30, respectively. From the configuration files, we infer the logical IP links between routers by matching interfaces with the same subnet [footnote 1]. When an interface fails to match with any other interface in the network’s configuration files, we can usually declare the interface to be external facing and anything connected to it must be external to the network. In Figure 1, R6 is external to the enterprise network, and R7 is external to both enterprise and backbone networks.

Footnote 1: Interfaces can also be unnumbered, meaning that no IP address is assigned to them. These interfaces cannot easily be matched into links without additional information, but they are quite rare in the networks we have evaluated so far: we found only 528 unnumbered interfaces out of 96,487 total interfaces.

2.2 Routing Topology

Routing protocols are typically classified as either Interior Gateway Protocols (IGPs) used to exchange information inside a network (e.g., OSPF [20], IS-IS [4], RIP [13], and EIGRP [26]) or Exterior Gateway Protocols (EGPs) used to exchange information between networks (e.g., BGP [22]). Both IGPs and EGPs share the common goal of exchanging routing information between routers, but differ in the features and performance they provide. Each router can use multiple protocols simultaneously; moreover multiple instances of the same protocol may exist on a single router. To maintain boundaries on how routing information is shared, each routing protocol runs as a separate process on the router and is identified by a process-id. In Figure 2, lines 13-16, 18-22, and 24-28 define routing process 64 speaking OSPF, routing process 128 speaking OSPF, and routing process 64780 speaking BGP, respectively. By default, no information is exchanged between these routing processes.

Each routing process can be associated with one or more interfaces on the router. There are many ways to create this association, but the most common one is via a network command (e.g., line 16 in Figure 2) that covers the address assigned to the interface (line 2). For two routing processes on different routers to directly exchange routing information, the processes must be adjacent. The definition of adjacent depends on the type of the routing process. Two BGP processes are adjacent if the processes are explicitly configured to speak to each other and it is possible to open a TCP connection between the two routers. For OSPF, IS-IS, RIP, or EIGRP processes to be adjacent, the processes must be of the same type; there must be a link between the routers on which the processes run; and each process must be configured to cover the interface at its end of the link. If R4 and R5 in Figure 1 are running OSPF processes and both interfaces on those routers are associated with the process, then the OSPF processes on R4 and R5 would be adjacent.

2.3 Route Calculation and Selection

We model a route as an IP subnet address (e.g., 10.0.0.0/8) plus some additional attributes, such as weights or an AS path, that the router may use to calculate a next-hop to reach that subnet. There are several ways a router can learn a route. Routes to all the directly connected subnets are always available to the router, or routes can be manually configured (e.g., with static routes). An example of a static route is line 36 in Figure 2. Through routing protocols, routes can be learned dynamically. While different protocols exchange different types of routing information to convey routes between adjacent processes, e.g., OSPF and IS-IS use link-state advertisements and BGP uses path-vector records, the end result is the processes learning routes.

For the purpose of reasoning about routing designs, the details of a large class of routers can be abstracted to the model depicted in Figure 3, where each routing process maintains its own Routing Information Base, RIB, where its associated routing state is stored. The routes a router uses for forwarding packets are centrally stored in the router RIB. Route selection logic is used to select which routes from the routing process RIBs should be entered into the router RIB. Prior work [7, 9, 10, 21] has studied route selection in BGP. However, BGP route selection determines only which routes are present in the RIB of the BGP process. A second route selection process determines which of those routes are entered into the router RIB.

Figure 3: The relation between routing process RIBs, route creation, route redistribution, and the router RIB that stores routes used to forward packets. (Diagram labels: BGP RIB, OSPF 64 RIB, OSPF 128 RIB, local RIB fed by interface subnets and static routes, route redistribution, Route Selection, Router RIB.)

2.4 Redistribution, Routing Policies and Packet Filtering

Routing protocols exchange routing information, hence routes, between routers. Inside a single router, a mechanism called route redistribution is used to transfer routes between routing processes, as illustrated by the dashed arrows in Figure 3 [footnote 2]. To model the handling of routes for the subnets that are directly connected to the router and routes that are statically configured, we introduce a local RIB that holds these routes.
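The RIB model of Figure 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' tooling: the `Route` tuple, the `Router` class, and its `redistribute` method are hypothetical names, redistribution is modeled as a one-shot copy rather than a continuously running mechanism, and the BGP route 192.0.2.0/24 is a made-up placeholder. The connected subnets and the static route are derived from the interface addresses and line 36 of Figure 2.

```python
from collections import namedtuple

# A route reduced to its destination prefix plus how it was created.
Route = namedtuple("Route", ["prefix", "origin"])


class Router:
    """Hypothetical model: one set of routes (a RIB) per routing process."""

    def __init__(self):
        self.ribs = {}

    def add_rib(self, name, routes=()):
        self.ribs[name] = set(routes)

    def redistribute(self, src, dst, policy=lambda route: True):
        """Copy the routes in RIB `src` that `policy` permits into RIB `dst`."""
        moved = {r for r in self.ribs[src] if policy(r)}
        self.ribs[dst] |= moved
        return moved


def connected_only(route):
    # Stands in for "redistribute connected": static routes are not matched.
    return route.origin == "connected"


def build_r2():
    r2 = Router()
    r2.add_rib("local", {
        Route("66.251.75.128/25", "connected"),
        Route("66.253.32.84/30", "connected"),
        Route("66.253.160.64/30", "connected"),
        Route("10.235.0.0/16", "static"),
    })
    r2.add_rib("ospf 64")
    r2.add_rib("ospf 128")
    # Placeholder external route; not taken from the paper.
    r2.add_rib("bgp 64780", {Route("192.0.2.0/24", "bgp")})
    r2.redistribute("local", "ospf 64", connected_only)    # Figure 2, line 14
    r2.redistribute("bgp 64780", "ospf 64")                # Figure 2, line 15
    r2.redistribute("local", "ospf 128", connected_only)   # Figure 2, line 19
    return r2
```

In this sketch the static route stays out of both OSPF RIBs, because the commands on lines 14 and 19 redistribute only connected routes from the local RIB.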
This makes the handling of static routes parallel to that of dynamically learned routes, as route redistribution can then be used to redistribute routes from the local RIB into the other routing processes on the router. Lines 14 and 19 are examples of this type of redistribution. Routing policies are the mechanisms that control the exchange of routes between routers and between routing processes on the same router. Modern routers support a rich language to specify routing policies, and the complexity of routing design is largely incorporated into these policies. In our example above, R2 uses “distribute-list 4” to control the routes learned from R6, and “distribute-list 3” for routes announced out. It uses route-map “8aTzlvBrbaW” to control which routes can be redistributed from the “ospf 64” routing process into the BGP process. Redistribution and routing policy operate in the control plane of the network and determine the path that packets will take from their sources to their destinations. There is another kind of policy control in the network which works directly on the data plane. That is packet filtering. Packet filtering enables a router to classify the incoming or outgoing packet stream based on the properties associated with individual packets or packet streams. Matching packets are either forwarded (allowed) or dropped (denied). Unlike routing information, packet filters are interface specific, statically configured, are not shared across routers, and can only be changed manually. In Figure 2, lines 30 to 31 define a packet filter, which is assigned to the Serial1/0.5 interface in line 7. 3. 1200 1000 800 600 400 200 0 0 100 200 300 400 500 600 Router ID, sorted by configuration file size 700 800 900 Figure 4: Size distribution of the configuration files for net5 the network, examining even 10% of them by hand would be a major challenge. 
More importantly, the number of details in each configuration file and the distributed nature of router configuration makes manual reverse engineering unreliable. Correctly extracting the routing design requires creating the proper relationships between literally thousands of details, some located inside a single file and some distributed across many different files. It is like trying to capture by hand the global behavior of dozens of distributed yet interacting programs written in assembly language. What is needed is a framework for systematically reverse engineering, representing, and analyzing routing designs that enables the scientific study of the art of routing design. In this section we describe four abstractions we have developed that can be automatically reverse engineered from a network’s router configuration files: routing process graphs, routing instances, route pathway graphs, and address space structure. With these abstractions, we have a succinct means to capture the routing design of network and reduce the need for researchers and operators to work with routers only at the level of the configuration files. It opens the door to discussion of the performance and operation of complete networks, rather than individual protocols. A Model for Understanding Routing Design 3.1 Routing Process Graphs Our first step for extracting the routing design of a network is to build the routing process graph that models how routing information flows through the network. The vertices in this graph are the RIBs that store the routing information learned by each routing process. Since there is one RIB for each routing process on a router, the vertex list can be easily extracted from the configuration files. An edge between two RIBs is added to the graph whenever routes from one RIB might be transfered to the other RIB. 
These edges are discovered by parsing the configuration files for all commands that create adjacencies between routing processes or that import, export, or redistribute routes between processes. Policies that govern the exchange of routes can be modeled as annotations on the edges of the graph. Figure 5 shows the routing process graph for the example of While router configuration files are designed to be editable directly by human operators, it is extremely tedious to reverse engineer the routing design of a network by manually extracting information from the configuration files. Many production networks are large in terms of both the number of routers they contain and the size of each router’s configuration files. Figure 4 shows the configuration file size distribution of one network in our data set, which has a total of 881 routers. The configuration files for these routers contain an average of 270 lines of configuration commands each. With a total of 237,870 commands used to configure 2 JunOS and Gated use import and export commands, which always go through the router RIB, but this can be modeled in our framework. 
30 111 000 000 111 000 111 000 111 111 000 000 111 000 111 000 111 BGP RIB BGP RIB OSPF RIB instance 2 ospf 64 OSPF RIB 1111111 0000000 0000000 1111111 0000000 1111111 0000000 1111111 0000000 1111111 000 111 000 111 000 111 000 111 Router 7 OSPF RIB OSPF RIB EBGP BGP RIB IB G P IBGP Router 1 111 000 000 111 000 111 000 111 BGP RIB Router 4 000 111 111 000 111 000 000 000 111 111 000 000 111 111 111 000 000 111 000 111 000 111 000 000 111 111 000 111 BGP RIB OSPF RIB Router RIB OSPF RIB OSPF RIB Router RIB IBGP Router RIB Router RIB Router RIB Router 2 Router 6 Router 5 Router 3 instance 1 ospf 128 Router RIB EBGP Router RIB OSPF RIB 111 000 000 111 000 111 000 111 Router 1 instance 4 BGP AS 64780 111 000 000 111 000 111 000 111 OSPF RIB BGP RIB Router RIB Router 2 AS 8342 111111 000000 000000 111111 000000 111111 000000 111111 000000 111111 000 111 000 111 000 111 000 111 000 111 000 000111 111 instance 3 ospf 0 BGP RIB OSPF RIB Router RIB Typical backbone network Typical backbone network design Figure 6: A depiction of the routing designs for the networks in Figures 1 and 5. Figure 5: Routing process graph for the example networks in Figure 1. graph (Figure 5), is tremendously valuable in understanding the interaction between those processes. However, as we have discovered in our analysis of the production networks, the value of this type of model drops rapidly as the number of routers and the complexity of the routing design increases. To further abstract away details that obscure the structure of the routing design and enable the analysis and understanding of larger networks, we introduce the concept of a routing instance that can represent a large number of routing processes. A routing instance models the set of routing processes that share routing information. We compute the routing instances for a network from its configuration files by grouping together all the routing processes running the same protocol that are adjacent to each other. 
We first select from the network a routing process that has not yet been assigned an routing instance and assign to it a new unique instance number. We then locate all the adjacencies of that process, and compute the transitive closure to find the set of routers and routing processes belonging to the new routing instance. The closure operation flood fills through the routing process graph, stopping when it reaches an edge between routing processes of different types or an EBGP adjacency between BGP speakers with different AS numbers. The process of selecting an unassigned routing process is then repeated until all routing processes have been assigned to a routing instance. As described in Section 2.2, each routing process has a process ID assigned to it by the configuration file. However, the meaning of these process IDs is entirely separate from the routing instance defined above. In many production networks we examined the process ID gives no indication of how the routing processes are connected. It is very common for routing processes with the same process ID to be found in two different routing instances, or for processes with different IDs to be found in the same routing instance. In general, the only requirement enforced on routing process IDs is that they be unique on each router — they have no network-wide semantics. Figure 6 shows the result of applying our routing instance model to the example networks in Figure 5. The individual routers and routing processes in the networks have been removed and replaced with the routing instances that the routing processes are part of. The heavy lines between instances indicate where route exchange occurs between different protocols or ASs. In order to help a human reader Figure 1, in which a small enterprise network is connected to a transit backbone network. There are two main advantages of the routing process graph. 
First, it immediately becomes easier to frame questions that were previously unanswerable via static analysis, such as how many routes will a routing process have to handle or what destinations will be reachable from a particular router under any given failure scenario [27]. Second, the routing process graph exposes the detailed structure of the routing design so that alternative designs can be compared. For example, the enterprise network and backbone network in Figure 5 both contain the same number of routers, but have very different routing designs. In the backbone network (right half of the figure), routes to external subnets are learned from external peers (R7 and R2) via an External-BGP session (EBGP) and shared with the other routers in the network via Internal-BGP sessions (IBGP). The EBGP speaker also announces to the outside world the routes reachable via this AS. An IGP process (e.g., OSPF) is run on each router in the network and used to compute routes to all subnets internal to the network (aka infrastructure routes). This design is typical of large ISP transit networks. The hallmark of this design is that external routes are never redistributed into OSPF: the only place internal and external routes come together is in the router RIB of each individual router. Redistribution policies are not shown in the figure for clarity. In the enterprise network (left half of the figure), routes to external subnets are again learned from an external peer (R6) via an External-BGP session (EBGP), but here they are redistributed into OSPF on the border router (R2). The OSPF processes then exchange routes to both internal and external destinations. This design is typical of small enterprise networks, where it is chosen because BGP processes only need to be configured on the border routers, which are few in number. This also minimizes the size of the IBGP mesh that must be constructed inside the network. 
The border routers use BGP’s extensive routing policy features to craft a small number of key routes that summarize the external routes they have learned and inject these summaries into the IGP. 3.2 instance 5 BGP AS 12762 Router 5 Typical enterprise network Typical enterprise routing design 11111111 00000000 00000000 11111111 00000000 11111111 00000000 11111111 00000000 11111111 Routing Instance Graphs Directly showing the relationship between the routing processes on different routers, as is done in a routing process 31 themselves, and without additional processing the subnets found inside the configurations are too small and fragmented to reveal the overall structure of the address space usage. We recover the structure of the address space usage by finding a set of network numbers and netmasks that best covers the subnets that are mentioned in the configuration files. This discovery process starts with a list of all subnets mentioned in the network. The process then repeatedly joins together any two subnets whose network numbers differ in no more than the least two bits (basically, expanding the IP subnets so long as at least half the addresses in the enlarged subnet are “used” by the network) until no more joins are possible. The result is a hierarchical tree of address space blocks. In extracting a network’s routing design we make use of the address space structure in two ways. First, we can associate with each routing instance the set of address blocks that are connected to the instance. This reduces the number of individual subnets present in the extracted routing design and makes the design easier to understand and reason about. Second, it helps us to determine if the network being analyzed contains routers whose configuration state was left out of the data set. 
When a router R is missing from the data set, the routers in the data set will have their interfaces that should be adjacent to R erroneously marked as external-facing, since the matching interfaces on R cannot be found (see Section 2.1). Analysis of the address space structure can then help identify the missing router. Many networks assign their external-facing interfaces from a different block of addresses than the block used to assign internalfacing interfaces. If an interface is marked “external-facing” but has an address from the middle of an block used by many internal-facing interfaces, then it is very likely that the data set is missing a router’s configuration file, and that “external-interface” is actually connected to the missing router. External World instance 4 BGP AS 64780 111111 000000 000000 111111 000000 111111 000000 111111 000000 111111 000 111 000 111 000 111 000 111 instance 1 ospf 128 OSPF RIB Router RIB Router 1 (a) Typical Enterprise network External World 11111111 00000000 000000 00000000 111111 11111111 000000 111111 00000000 11111111 000000 00000000 111111 11111111 000000 111111 00000000 111111 11111111 000000 000 000111 111 000 111 000 111 000 111 000 111 000 000111 111 instance 5 BGP AS 12762 instance 3 ospf 0 BGP RIB OSPF RIB Router RIB Router 5 (b) Typical backbone network Figure 7: The pathways via which routing information is learned by Router 1 and Router 5. The pathway in (a) is typical of enterprise networks and the pathway in (b) is typical of backbone networks. understand how a particular router fits into the routing design, that router can be added back into the model as shown by routers R1, R2, and R5 in Figure 6. Dashed lines are used to indicate the routing instance to which each routing process belongs. The value of the routing instance model is its ability to scale to large numbers of routers without losing the ability to capture complex policies and interactions between multiple routing protocols. 
While in this example the routing process graph and routing instance graph have similar complexity, in later sections we will show examples from real networks where a single routing instance is able to represent hundreds of routing processes, radically reducing the complexity of the graph and rendering the structure of the network easily understandable.

3.3 Route Pathway Graphs

Using the routing instance model, we can construct for any router a route pathway graph showing where the routes used by that router come from. Starting at the Router RIB for the router in question, we perform a breadth-first search through the routing instance model, recording the instances through which the search passes. As shown in Figure 7, Router 1 learns all its routes from routing instance 1, which learns all its routes from instance 4, which learns routes from the outside world via an external peer. Router 5 learns routes from instances 5 and 3, and all routes to the external world must come via instance 5. Routing policies are typically applied at each edge in the pathway graph. Route pathways are useful for characterizing the routing design of a network. They can also be used to locate all the routing policies that affect the routes seen by any particular router, and pinpoint where the policies are applied.

4. Obtaining Network Configurations

The 31 networks whose configuration files we analyzed were obtained from a large telecommunications company that manages enterprise networks as part of its service portfolio. As is typical in real networks, the routing designs we see include those designed by the ISP’s engineers, those designed by the ISP’s customers, and hybrid mixes of the two. The value of router configuration files to networking researchers is that they contain intricate details about the structure and operation of the network they describe.
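The breadth-first search of Section 3.3 can be sketched as follows. The sketch assumes the routing instance model is available as a mapping from each node to the nodes it learns routes from; the mapping below is a hand transcription of the two pathways described for Figure 7, and the names `learns_from` and `route_pathway` are hypothetical.

```python
from collections import deque


def route_pathway(learns_from, start):
    """Return the edges (learner, source) reached by a breadth-first
    search from `start` through the routing instance model."""
    edges = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for source in learns_from.get(node, []):
            edges.append((node, source))  # policies would be attached here
            if source not in seen:
                seen.add(source)
                queue.append(source)
    return edges


# Assumed encoding of the pathways described in the text for Figure 7.
learns_from = {
    "Router 1 RIB": ["instance 1 (OSPF)"],
    "instance 1 (OSPF)": ["instance 4 (BGP AS 64780)"],
    "instance 4 (BGP AS 64780)": ["External World"],
    "Router 5 RIB": ["instance 5 (BGP AS 12762)", "instance 3 (OSPF)"],
    "instance 5 (BGP AS 12762)": ["External World"],
}

if __name__ == "__main__":
    for learner, source in route_pathway(learns_from, "Router 1 RIB"):
        print(f"{learner} <- {source}")
```

Each returned edge is a place where a routing policy may be applied, so the edge list doubles as a checklist of configurations to inspect for a given router.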
For this same reason, they are carefully guarded as proprietary secrets of the companies that own and manage the network they describe. This secrecy has hampered researchers from gaining access to network configurations, and has driven the development of black-box methods of network analysis as the only alternative. We have overcome this challenge by developing a means for anonymizing router configuration files that preserves enough structure in the files to enable our analysis, but prevents readers of the files from determining even the identity of the network being examined. To conduct our study, we combined configuration file anonymization with a single-blind methodology. Three members of our group had knowledge of the identity of the networks being examined and contact information for the designers of those networks, but this information was kept from the rest of the group. All analysis described in this paper was performed on the anonymized configuration files without knowledge of the network identity. Our results were then verified with the actual designers via the group members with the contact information for those networks. Without both anonymization and the single-blind methodology, access to these configuration files would have been inconceivable, especially at this scale. Our hope is that these techniques may enable other researchers to gain access to similar types of data.

3.4 Extracting the Address Space Structure

Network designers often have a structured plan for assigning addresses inside the network. Since the routes exchanged by the routing protocols are written in terms of subnets that represent blocks of IP addresses, understanding the structure used to divide the address space into blocks is very helpful in analyzing the routing design. Unfortunately, the configurations may never explicitly list the address blocks
Given the success of our methodology and the value we were able to generate from access to the configurations, perhaps more organizations can be led to anonymize and release their network configurations. The creation of a shared data-set of configurations would enable the direct clarification of many questions, such as network topology, that researchers have debated, and potentially increase network security for all concerned through an open-source-like review process.

[Figure 8 axes: x-axis, Number of routers in network (<10 to >1280); y-axis, Fraction of networks; series: networks in study, known networks.]

Figure 8: Distribution of the size of the 31 analyzed networks compared to the size distribution of all networks known in this repository.

4.1 Anonymizing Configuration Files

Our current anonymizer [17] is specific to Cisco IOS, but the strategy is generally applicable. All comments are removed from the configs using regular expressions. Under the assumption that no “secret” or identifying information would appear in the published Cisco IOS command reference guide, but that most valid IOS commands would be found there, all of the words found in the guide were extracted and turned into a list of tokens that do not require anonymization. All non-numeric tokens in the configurations are checked against this list, and any tokens not found in the list are hashed using SHA1 digests [1]. This anonymizes the names of class-maps, route-maps, and any other strings that could hold privileged information. Simple integers are generally not anonymized. Regexps are used to locate and anonymize all public AS numbers, although private AS numbers are not hashed since they do not leak information about the identity of the network. All IP addresses are hashed using a modified version of the tcpdpriv algorithm [19].
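The token-level part of this scheme can be sketched as follows. This is a minimal illustration, not the anonymizer of [17]: the whitelist `KNOWN_WORDS` is a tiny hypothetical stand-in for the word list extracted from the IOS command reference, the digest is truncated to 10 hex characters for readability (an assumption), and IP addresses and public AS numbers, which the text says receive separate treatment, are not handled.

```python
import hashlib

# Hypothetical stand-in for the list of words taken from the command reference.
KNOWN_WORDS = {
    "router", "ospf", "bgp", "route-map", "class-map",
    "permit", "deny", "match", "interface", "description",
}


def anonymize_line(line, known_words=KNOWN_WORDS):
    """Anonymize one configuration line, token by token."""
    if line.lstrip().startswith("!"):
        return ""  # comment lines are dropped entirely
    out = []
    for token in line.split():
        if token.isdigit() or token.lower() in known_words:
            out.append(token)  # simple integers and known keywords are kept
        else:
            digest = hashlib.sha1(token.encode("utf-8")).hexdigest()
            out.append(digest[:10])  # truncation is a readability choice
    return " ".join(out)


if __name__ == "__main__":
    print(anonymize_line("route-map ACME-CUSTOMERS permit 10"))
    print(repr(anonymize_line("! core router, building 7")))
```

Because the hash is deterministic, the same user-chosen name maps to the same token everywhere it appears, so references between a route-map definition and its uses survive anonymization.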
While hypothetical attacks on the tcpdpriv algorithm have been proposed [28], they use the frequency with which addresses appear in a packet trace, information that is not available from anonymized static configuration files. An attack on the IP address anonymization could be attempted by fingerprinting the pattern of address space usage inside a network and probing addresses in candidate networks to look for a match, but most of the networks in our data set filter the packets that would be needed to conduct the probing. Even if the network permitted probing, determining the number of /30s, /29s, etc., from ICMP Reply or backscatter packets would be quite challenging. In our case, these anonymization techniques have given our partner telecommunications company a sufficient level of comfort to grant us access to the configuration files for this study.

As a result of the anonymization process, all comments, documentation, semantically meaningful names, or anything that could convey intent is removed from the configurations. All that is left is raw mechanism. Since the configuration files for each network are placed in a single directory with filenames of the form “config1”, “config2”, ..., not even the DNS naming conventions that have been used by other groups to guess at POP structure and geographic location are available. The lack of comments and meta-data increases the challenge in reverse-engineering the routing design of the network, but highlights the value of the modeling techniques proposed in this paper to extract the structure of the routing design from an unorganized mass of configuration files.

4.2 Size of the Analyzed Networks

Figure 8 shows the distribution of the sizes of the 31 networks we studied in detail compared with the sizes of 2,400 networks in a repository available to us. Our study contains networks from across the range of network sizes found in the wild, with a slight overweighting towards networks with more than 20 routers.

5. Discoveries from Routing Design Analysis

Using tools built to embody our reverse engineering methodology, we conducted extensive analysis of 31 networks whose configurations were available to us. In this section we discuss some of the more interesting routing related observations that show the power of the analysis of static configuration files to frame and answer questions about networks that were previously unexamined.

5.1 Using the Routing Instance Model to Understand a Network’s Structure

To illustrate how reverse engineering the routing design can take even an extremely complicated network and make it intelligible, we examine an enterprise network called net5. This network contains a total of 881 routers and 14 different BGP ASs, all internal to the network. There are 24 routing instances, which range in size from the largest that contains 445 routers to the smallest that contains only a single router. The network connects to external networks through EBGP sessions with 16 different external ASs. A graph of the physical topology of net5 is an unintelligible “hairball”: a densely connected set of routers and links that gives no insight into how the network is structured or functions. In contrast, Figure 9 shows the routing instance graph for the majority of net5 (541 out of the 881 routers). As shown in the figure, most of the network’s routers are connected to one of three routing instances running the EIGRP protocol: 445 routers to instance 1, 32 routers to instance 6, and 64 routers to instance 7. Routes are redistributed between the EIGRP instances by four different BGP instances, each with a different AS number. Using the routing instance diagram as a key, it is possible to navigate through the configuration files finding the exact set of routers whose configuration must be understood in order to answer a given question. For example, a question might be “how many routers need to fail before instance 1 is partitioned from instance 2?” As shown in the routing instance graph, the function of router 2 in the routing design is to redistribute routes between routing instances 4 and 1. There are 6 routers in net5 that serve this same purpose, and they serve as redundant backups for each other. If all these 6 routers were to fail, these two instances would be separated (unless they were somehow reachable to each other through the external world, which is not true in this case).

[Figure 9 labels: instance 1 (EIGRP, 445 routers); instance 2 (BGP AS 65010, 39 routers); instance 3 (BGP AS 65040, 7 routers); instance 4 (BGP AS 65001, 6 routers); instance 5 (BGP AS 10436, 3 routers); instance 6 (EIGRP, 32 routers); instance 7 (EIGRP, 64 routers); Routers 1 to 5 with their BGP, EIGRP, and Router RIBs; also labeled: AS1629, AS6470.]

Figure 9: The routing design of three compartments in network net5. Net5 uses EIGRP as an inter-domain protocol to redistribute external routes between BGP instances 2 and 4, and EBGP as an intra-domain protocol to redistribute internal routes between BGP instances 2 and 3. Routes to external destinations learned by the 445 routers in EIGRP instance 1 have been passed through at least 3 layers of routing protocols and redistributions, all inside the single network.

Figure 10 shows the route pathway graph for router 3, which sits in the middle of net5. This figure can be used to answer questions such as “will packets sent to the outside world by router 3 use the egress point at the far left of the network, or the far right?” The route pathway graph shows which routers have policy that governs how routes are propagated to router 3. The route pathway graph for router 3 also makes clear that the routing design of net5 does not follow either the conventional enterprise or backbone network pattern. The use of routing protocols in net5 simply cannot be fit into the conventional model of a two layer EGP/IGP network.

[Figure 10 labels: External World; instance 1 (EIGRP, 445 routers); instance 2 (BGP AS 65010); instance 3 (BGP AS 65040); instance 4 (BGP AS 65001); instance 5 (BGP AS 10436); instance 6 (EIGRP, 32 routers); instance 7 (EIGRP, 64 routers); Router 3 (EIGRP RIB, Router RIB).]

Figure 10: The route pathway graph by which Router 3 learns routing information.

5.2 IGP vs EGP classification

To quantify how many networks use an unconventional routing design, we examine how the networks make use of routing protocols. Routing protocols are commonly categorized as either Interior Gateway Protocols (IGP) or Exterior Gateway Protocols (EGP). The IGP and EGP classification refers to a protocol’s routing scope in relation to an administrative domain. RIP, OSPF, IS-IS, and EIGRP have been cast as IGPs primarily because they lack the route selection attributes believed to be required of an EGP, and because of their historical application as protocols whose operation remained entirely inside an administrative domain. It is also widely accepted that BGP is the only EGP for IP networks. One of the first observations we make is that the use of routing protocols in many of the 31 networks does not follow the IGP-EGP classification. To compute the frequency with which each routing protocol serves in a given role, we developed a method to classify the roles of all routing protocol instances employed by the 31 networks. Routing protocol instances that have adjacencies with the instances of another network are considered to be serving as an EGP or inter-domain protocol; otherwise they are being used as an IGP or intra-domain protocol. Determining whether a routing protocol instance has an adjacency with a routing instance in another network is not easy, as many links in IP networks are multi-point and, unless all the addresses in the link’s subnet are found in the configuration files of other routers in the network, it is possible that an external router is present on the link, making the link an external peering point. Point-to-point links commonly use a /30 subnet, which generally contains only 2 usable IP addresses. If both addresses are present in the configuration files, the interface is declared internal-facing; otherwise it is declared external-facing. Multipoint links, such as Ethernet, commonly have much larger subnets assigned to them (e.g., a /24 containing 256 addresses). Such interfaces could be internal-facing and used to connect individual hosts, or they could implement a shared “DMZ” specifically designed to connect internal and external routers. However, routing configuration can be used to determine the true purpose of the link. If the interface is used as the next-hop for any external destinations (i.e., addresses not known to be inside the network as determined by analysis of the address space structure), then we assume there must be an external router on the link to accept and forward these packets and we mark the link as external-facing.

Table 1: Number of protocol instances performing intra- or inter-domain routing.

                      IGP                         EBGP
          OSPF     EIGRP     RIP     Total      Sessions
Intra-    9,624    12,741    156     22,521     1,490
Inter-    1,161    1,342     161     2,664      13,830

Table 1 shows the number of routing instances found in our 31 networks, broken out by routing protocol and by intra- or inter-domain role in the network. In our data we saw no use of IS-IS. The number for intra-domain use of EIGRP includes two instances of IGRP.
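The internal-facing versus external-facing decision described in Section 5.2 can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' classifier: the function name and its parameters are hypothetical, only /30 links are treated as point-to-point, and a multipoint interface that is not a next hop for any external destination is assumed to be internal-facing.

```python
import ipaddress


def classify_interface(iface_cidr, known_addresses,
                       is_next_hop_for_external=False):
    """Label an interface "internal" or "external".

    iface_cidr: interface address with prefix length, e.g. "10.0.0.1/30".
    known_addresses: every interface address found in the configuration
        files of the data set (including this interface's own address).
    is_next_hop_for_external: True if the routing configuration uses this
        interface as the next hop for destinations outside the network.
    """
    network = ipaddress.ip_interface(iface_cidr).network
    known = {ipaddress.ip_address(a) for a in known_addresses}
    if network.prefixlen == 30:
        # Point-to-point link: internal only if both usable addresses
        # appear somewhere in the data set.
        both_ends_known = all(host in known for host in network.hosts())
        return "internal" if both_ends_known else "external"
    # Multipoint link: fall back on how the routing configuration uses it.
    return "external" if is_next_hop_for_external else "internal"


if __name__ == "__main__":
    print(classify_interface("10.0.0.1/30", ["10.0.0.1", "10.0.0.2"]))
    print(classify_interface("10.0.0.5/30", ["10.0.0.5"]))
    print(classify_interface("192.168.1.1/24", ["192.168.1.1"], True))
```

A routing instance with at least one interface labeled external by such a rule would then be counted in the inter-domain row of Table 1.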
For OSPF, EIGRP and EBGP, the vast majority of sessions conform with conventional wisdom, with about 90% of OSPF and EIGRP sessions being used for intra-domain routing and about the same fraction of EBGP sessions being inter-domain. However, the data show a more diverse use, with a significant number of sessions breaking the conventional paradigm. Among the 31 networks, there are a total of 2,664 cases (11% of total) where an IGP protocol instance performs the function of an EGP and 1,490 instances (10% of total) where an EBGP session is used for intra-network routing. Three networks do not use BGP at all. We hypothesize several reasons why designers use IGPs like RIP, OSPF, EIGRP, etc. as edge protocols that talk to another network, be it their provider or their customer. It is widely believed that these protocols are easier to configure than BGP, and there is anecdotal evidence that these protocols consume fewer memory resources than a BGP process (which is significant for low-end enterprise routers). There are also several reasons why EBGP might be chosen for use as an internal protocol. Perhaps the network designer sought greater scalability by dividing the network into compartments, or perhaps the EBGP sessions are a legacy from when several separate networks were merged together to form a single network during a corporate merger. Alternatively, the use of BGP gives the network designer a fine degree of control over route selection and reachability with its extensive routing policy features.

5.3 Restricting Reachability in the Network

Another surprising result from our analysis is that some of the networks have many packet filters applied to the internal links. This type of reachability restriction inside the network has not been documented before. According to conventional wisdom, “protective” route and packet filtering are only required at the edge of a network to avoid bogus route advertisements and spoofed packets [6].
To evaluate how packet filtering is used in production networks, we gathered packet filter usage statistics for each network. Three networks do not have any packet filter definitions, and they are ignored for the purpose of this analysis, reducing the data set size to 28 networks. The basic building block of a packet filter definition is an access control list or route-map that consists of a variable number of “if condition then action” clauses. To measure the total amount of filtering policy on a link, regardless of how the policy is grouped into filters, we treat each clause as a separate filter rule. Figure 11 plots the cumulative distribution of the percentage of packet filter rules applied to internal links for the 28 networks. The figure shows that in more than 30% of the networks, at least 40% of the packet filter rules are applied at internal interfaces.

[Figure 11 axes: x-axis, Percent of filter rules applied to internal links (0 to 100); y-axis, Fraction of networks < x.]

Figure 11: CDF of percentage of packet filters applied to internal links.

We investigate further by examining the details of all the internal packet filters. The results show a great diversity in the policy goals that these filters try to achieve. For example, they are used to drop packets of a specific protocol (e.g., PIM) originating from an internal host, effectively disabling that protocol in all or parts of the network. They are used to block traffic that uses certain UDP or TCP ports. They are also used to dictate which set of hosts can use a particular application through selective filtering based on the application’s port. Our detailed look at the packet filters also reveals weaknesses in the Cisco IOS language that can make configuring routers more error prone and the maintenance of router configurations more difficult. For example, we had a hard time deciphering the purpose of one packet filter because it consists of 47 clauses defining several policies simultaneously.
A better design would be to create multiple packet filters, one for each policy. However, IOS only allows a single filter to be applied to each interface, thereby forcing the designer to place all 47 clauses into a single filter.

6. Case Studies of Routing Designs

From the 31 analyzed networks, it is very clear that the use of alternate architectures is not just a theoretical possibility; it is very common. Beyond even the unusual use of “interior” and “exterior” gateway protocols described in Section 5.2, we have found a vast array of different routing designs in use in these production networks. The diversity highlights the need to study and understand these designs. As a community, we must determine if they represent a clever optimization of previously unknown trade-offs, or if they are simple kludges and mis-designs that could be avoided if the research community develops a repository of best common practices for routing design through the study of such positive and negative examples. As a first step in this direction, this section presents case studies of the routing designs of two networks to show how the designers’ latent insights can be brought out. The first case study examines how the need for an IBGP mesh can be avoided, even in networks that must implement complicated routing policy. The second case study examines how reachability to external destinations can be controlled.

6.1 Avoiding an IBGP Mesh

Analysis of the routing design of net5 (Figure 9) shows that its highly unconventional structure is not a mistake, but rather a clever design optimized to the constraints of the network. For reasons we have been unable to discern from the configurations, the designer felt the need to segregate the network into compartments. This compartmentalization, combined with the large number of places the network peers with external networks, creates a large number of potential egress points for packets being routed in the middle of the network. Under conventional wisdom, the need for complex route selection logic in the middle of the network to choose among egress points and the need to pass route information between network compartments should cause the designer to decide to use a backbone-like routing design. In a backbone design (typified by router 5 in Figures 5 and 7), IBGP sessions are used to distribute external routes throughout the network, because implementing complex route selection logic generally requires the use of route attributes carried by BGP, such as AS-path information, that would be lost if the routes were redistributed into an IGP. However, the designer of net5 appears to have deliberately constructed the network to avoid the need for BGP attributes in route selection, enabling an IGP to be used for redistribution of all routes inside the compartments of the network. Two techniques were used to achieve this simplification. First, the address blocks used in the network were carefully laid out so that the routers and hosts in each network compartment use addresses from an address block made up by a small number of unique and non-overlapping subnets. This meant that even complicated redistribution policies could be expressed using only address-based route-maps, as the address space was laid out to support the containment the designer was trying to achieve. There was therefore no need for the use of BGP-style attributes like AS-paths to control route redistribution. Second, external routes were tagged to indicate their source as they were first redistributed into the network’s IGP instances. Route selection for the router RIB on each router was configured to key off the tag, and since the IGP can propagate these tags, the need for an IBGP mesh and related BGP configuration was avoided.

Backbone networks do not have the freedom to assign addresses to all the peers whose packets will transit their network, so their routers must use AS-path attributes to decide which routes should be placed in their RIBs or redistributed. In creating net5’s routing design, the network designer was able to avoid the need to configure an IBGP mesh that would distribute external routes throughout the network. Given the large number of nodes in the network, a simple IBGP mesh would not be scalable, and a complex set of IBGP reflectors would be required. The example of net5 indicates that the space of trade-offs in which network design takes place is much larger than originally believed. By analysis of an operational network’s routing design we found evidence supporting a new class of trade-offs: namely a tension between structured address assignment that enables simplified routing policies and arbitrary address assignment which requires more complex routing designs and routing policies. The exercise also validates our goal that routing design extraction can be used to help understand and assess the structure of networks for which no documentation is available.

6.2 Controlling External Reachability

While the ultimate goal of networking is to enable communication between hosts that are not directly connected, in the networks we studied we observed a wide diversity of mechanisms being used to limit the set of destinations that hosts could reach.

[Figure 12 labels: instance 1 (OSPF); instance 2 (BGP); instance 3 (BGP); instance 4 (BGP); instance 5 (BGP); instance 6 (OSPF); BGP AS 25286; BGP AS 12762; policies A1 to A5; address blocks AB0 to AB4. Legend: A1 with permit symbol = access-list 1 permit; A1 with deny symbol = access-list 1 deny; AB2 = Address Block 2.]

Figure 12: Routing design of network 15 annotated with policies.
A completely accurate answer to the question of which hosts can communicate is extremely difficult, as it requires modeling the details of the route selection algorithm used by each protocol on each router on the network. However, by applying routing design analysis there is a middle ground that avoids the need to model route selection but still provides an extremely useful view of the reachability provided by the network. We have developed a reachability analysis algorithm [27] that begins with the routing instances calculated as described in Section 3. Figure 12 shows an example of this analysis applied to net15, which has 79 routers. The network connection to the outside world is via EBGP peering sessions with two different public ASs (anonymized as AS 25286 and AS 12762). The network has six routing instances, shown with rounded boxes, and the links connecting them repre- 36 tion 3. In this section, we evaluate how closely production networks follow these architectures, and we verify whether heuristics commonly used to classify networks are accurate or not. Our data set of 31 networks contains 4 backbone networks, enterprise networks of varying sizes, and several large tier-2 ISPs. Table 2: Address blocks mentioned by redistribution policies. Policy # A1 A2 Contents AB0, AB1 AB2 Policy # A3 A4 A5 Contents AB0, AB3 AB4 AB0 Differences Among Routing Designs 7.1 Routing Design Although enterprise networks and backbone networks are commonly thought of as following canonical “textbook” architectures, in reality the routing designs used by networks are far more diverse than these two architectures. The four backbone networks do exhibit features close to the textbook backbone routing design: a large number of EBGP sessions are used to peer with external networks; IBGP is used for distribution of external routes from the border routers to interior routers; and a small number of IGP instances is used to distribute routes to internal subnets. 
The large tier-2 ISP has the BGP structure of a backbone network, but contains a very large number of staging IGP instances. These are routing instances of a traditional IGP protocol, like OSPF or EIGRP, that have only a single router inside the network, but a large number of external peers. Presumably these are used to connect customers that do not run BGP to the tier-2 ISP. Instead, an IGP protocol is used to distribute routing information to the customers. Network designers tell us that this is often done in preference to using static routes because the IGP provides ongoing validation that the link to the customer is still up. Seven of the 31 networks had routing designs very close to a textbook enterprise network: a small number of BGP speakers that communicate with the outside world and inject routes into a small number of IGP instances from which most of the network’s routers learn their routes. The largest of the seven divided up its 101 routers equally between two IGP instances, presumably to improve performance and scalability. The remaining 20 enterprise networks exhibited designs that were so markedly different both from textbook examples and from each other as to defy classification. Figure 7 shows how routes are redistributed between IGP and EGP for the canonical backbone and enterprise architectures using route pathway graphs. The difference between this figure and Figure 10 showing the route pathway graph for net5 makes clear how different production routing designs can be. The hierarchy of protocols and route redistribution used in net5’s routing design is only one example of the many different structures observed across the 31 networks we examined. 
17 of the networks involved some form of the redistribution of external routes learned via BGP into an IGP, but they differed in the number of ASs used internal to the network, the arrangement of those ASs, the completeness of the IBGP mesh inside the ASs, and the redistribution of routes between the ASs. Networks are designed and built to serve a purpose, such as connecting together the computers used by a corporation or creating a transit backbone. In other fields, commonality of purpose has led to commonality in the mechanisms used to achieve these purposes through the construction of “cookbooks” or “best practices.” In routing design, the classic textbooks [12] generally define only two architectures: the enterprise and backbone architectures presented in Sec- 7.2 Size Backbone networks are commonly assumed to be the largest networks, but network size is not a good indication of network type. The four networks with a backbone architecture range in size from 400 to 600 routers, with a mean of 540 routers. The seven networks with a classic enterprise structure tend to be quite small, with sizes ranging from 19 to 101 sent the route redistribution between the instances. Where there is policy restricting the redistribution, the link is annotated with a policy number and whether the routes specified by the policy are blocked (stop sign) or permitted (flag). The subnets mentioned in the policies have been aggregated into numbered Address Blocks, which are connected to the routing instances where those subnets are attached to the network. From this figure and Table 2, which shows which address blocks are mentioned in each policy, the following observations emerge: First, this is an example of a network where hosts do not have reachability to the Internet at large. The only routes allowed into the network from the public BGP ASs are those listed by policies A1, A3, and A5, which total two /16 networks and 3 /24s. There is no default route permitted. 
However, routes to the hosts connected to the network (address blocks AB2 and AB4) are allowed out. It is impossible to tell whether the public ASs propagate these routes further, but from a security standpoint it is possible that packets from the global Internet will find their way into this network, although the hosts inside the network will not be able to respond. Second, from the structure of the routing design, presumably the routers in instances 1, 2, and 3 are located in a different location from those in instances 4, 5, and 6. However, the connections to the public ASs are not used to create a virtual private network between the two sites. In fact, packets from hosts connected in Address Block 2 cannot reach hosts in Address Block 4 at all, or vice versa. The intersection of the policies that control routes leaving the left half the network with those controlling routes entering the right half of the network is the null set: A2 ∩ A5 = A2 ∩ A3 = A4 ∩ A1 = ∅. Third, as is common in enterprise networks, OSPF is used to redistribute routes inside the network. However, by using the routing instance model, it is now possible to begin predicting the scalability of the routing design. The reachability analysis establishes that the ingress filters A1, A3 and A5 are the factors that control the maximum number of external routes that can be injected into the OSPF instances. Combined with the number of routers in the OSPF instance, the maximum load on the OSPF processes can be predicted. 7. 37 from the configuration files is valuable in checking the consistency of external inventory databases, or to provide new inputs to the database after acquiring a new network. Snapshots of the routing design over time can be used to track the steps in adding or removing equipment from the network. 
The network designer can also use the current routing design to determine where and how to add a new link or router to the network, including the planning of which routing protocol adjacencies and parameters to configure. Vulnerability assessment: The routing design model provides a concise summary of the routing protocols and parameters used in the operational network. This information can be used to assess potential network vulnerabilities, such as vulnerability to network attacks, configuration errors, or violations of best common practices. For example, the operator can identify connections to neighboring domains that do not have packet or route filters, or internal links and routers with incomplete routing protocol adjacencies. Network engineering: The operators can also evaluate the robustness of the routing design to equipment failures and planned maintenance activities. For example, analysis of the routing design data can uncover scenarios where a single link or session failure would disconnect part of the network. The operators can also schedule maintenance activities to avoid disabling multiple routers with static routes to the same destination prefix, to limit the likelihood of service disruptions. In addition to the routing design data, survivability analysis requires “what if” tools that model the effects of changes to the network topology and routing configuration [5]. The analysis may also require additional information about the mapping of IP links to layer-two (e.g., ATM switches) and layer-1 (e.g., fiber spans and optical amplifiers) to accurately capture the effects of failures and maintenance activities in the underlying transport network. With accurate measurements of the offered traffic, the operators can use the “what if” tools to determine the effects of changes to the routing configuration and the underlying topology on the traffic load in the network. 
Anomaly detection and diagnosis: Detecting and diagnosing anomalous behavior is a crucial part of running a large IP network [3]. Operators need effective ways to identify why a user cannot reach a particular destination, or why a routing protocol is flapping. Ultimately, anomaly detection depends on analyzing a wide range of network measurements, including link and CPU load, packet and flow traces, performance statistics, fault alarms, and routing protocol messages. Diagnosing a problem may require the operator to probe the network using tools such as ping and traceroute. However, making sense of these data sets requires information about the routing design. For example, the routing design captures the fact that an EBGP session is associated with a particular edge interface, which may be important in explaining why the BGP session has failed. The routing design also reveals situations where two hosts should not be able to reach each other, due to packet or route filtering policies. Finally, the routing design is crucial for deciding where to place the measurement devices (such as packet monitors or routing monitors) to collect the most useful data.

Table 3: Types of interfaces found among the 31 networks in the analyzed data set.

Type               Count    Type               Count
Null                   2    Dialer              1296
Multilink              4    TokenRing           1344
Fddi                   6    GigabitEthernet     2171
CBR                   14    Hssi                2375
Channel               51    Ethernet            3685
Virtual               83    POS                 3937
Async                 90    ATM                 6242
Port                 151    FastEthernet       20420
Tunnel               202    Serial             53337
BRI                 1077    Total              96487

routers. In contrast, the 20 enterprise networks with an unclassifiable routing design vary in size from 4 to 1750 routers, with a mean of 300 and a median of 36. As shown in Figure 8, the distribution is skewed towards smaller networks, but there are still four networks larger than the largest backbone network, containing 760, 890, 1430, and 1750 routers. These networks belong to large enterprises and tier-2 ISPs.
7.3 Interface Composition

The interfaces used in a network are a relatively good predictor of the type of the network. As expected, three of four backbones are largely built using Packet-Over-SONET (POS) interfaces, which are high-cost interfaces normally reserved for high-speed long-haul links. Table 3 shows the type and frequency of the interfaces in use on the 8,035 devices in the 31 networks we studied. By far, serial interfaces are the most common. The links connected to these interfaces are most commonly implemented using private T1 lines or frame-relay circuits. Ten of the networks make extensive use of ATM as a layer-2 technology for connecting their routers. While POS interfaces are heavily used in three of the four backbone networks, they appear in two enterprise networks as well. The fourth backbone is based on High Speed Serial Interfaces (HSSI) and ATM.

8. Routing Design in a Larger Context

Our analysis of router configuration data sheds light on the poorly-understood art of routing design. In this section, we discuss why configuration data alone does not provide a complete view of the designers’ intent. Still, our model of the routing design can serve as a building block supporting important tasks in running a large IP network.

8.1 Routing Design Data as a Building Block

In practice, the design and operation of large IP networks consists of a wide variety of tasks on different timescales. Having an accurate, up-to-date view of the routing design, whether constructed from configuration files or from an external database, is extremely useful for conducting these tasks:

Inventory management: Maintaining an up-to-date view of the network equipment, router configuration, and the assignment of IP address blocks is an important part of running a large IP network.
In practice, an accurate, up-to-date view of the network topology, routing protocol configuration, and packet/route filters is crucial for supporting these and many other network management tasks.

8.2 Challenges of Inferring the Routing Design

Certain aspects of routing design are not captured in a snapshot of a network’s configuration state:

Uncovering the reasons for the design decisions: The analysis of our data reveals the network topology and routing protocol configuration but does not explain the reasoning behind these choices. For example, many large enterprises have a hub-and-spoke topology to provide spokes (e.g., retail stores) access to hubs (e.g., central places for maintaining inventory and credit-card charges). However, this does not imply that the spokes never communicate with each other, routing through the hub to do so. In some cases, the hub-and-spoke design might be motivated in part by cost constraints. In other cases, the network designer might not know that the spokes do indeed communicate (through the hub). Spoke-to-spoke communication would only be visible by measuring and analyzing traffic data. Similarly, a hierarchical routing design might suggest that a network consists of separate administrative regions; e.g., with different autonomous systems administered by different operations teams. However, the same design might arise to bound the processing load on the control plane of the routers within each autonomous system. In some cases, a more detailed analysis might hint that the explanation indeed lies in partitioning administrative responsibilities (e.g., presence of different passwords, operating system versions, or patterns of configuration commands). Still, understanding the true intent of the designer(s) is difficult without more information.

Absence of important side information: The configuration files do not include basic information such as physical locations or the distances between the routers.
Depending on the interface technology, the capacity of a link may be unknown or dependent on the underlying layer-two circuit. In some cases, important information may be gleaned from knowing the network operators’ conventions for naming the routers, entering comment strings, or assigning tunable parameters. For example, the hostname of a router might implicitly indicate its location, vendor, model, and role in the network. DNS names associated with routers and interfaces are sometimes used in a similar fashion. Comment fields in the interface section might indicate the role of the link (e.g., connection to a customer or peer), the name of the neighbor, and whether the interface is in the middle of provisioning. Specific values of routing protocol parameters (such as OSPF link costs) might indicate the type of link (e.g., intra-PoP or inter-PoP) and whether the link is undergoing maintenance. Acquiring this kind of “side information” from network databases and operators, while challenging and sometimes error prone, is extremely worthwhile because it makes newer, deeper forms of analysis possible.

Limited information about the neighboring domains: The router configuration files only provide information about one end of the links and sessions to neighboring domains. Although the packet and route filters on edge links constrain the behavior of the neighbors, reasoning about the expected or typical behavior is challenging. In the extreme, an edge link might not have any filtering at all, making it impossible to know what kinds of data packets and route advertisements to expect. This problem becomes much simpler if packet traces, routing table snapshots, or the configuration files for the remote routers are available. In addition, the configuration data does not reveal whether the routers inside the network can communicate with each other through neighboring domains.
For example, a network with links to two neighboring domains may have a “backdoor” route through these external connections. The presence of such backdoor routes is difficult to discern, even for the network operators themselves. Often, routing table dumps or traceroute data are necessary to uncover these kinds of situations.

Evolution of the routing design over time: In practice, routing design is not a discrete activity that takes place a single time when a network is first built. Instead, design is a continual process. At any given time, a network may have elements of old and new designs, including vestiges of incomplete or abandoned modifications to the configuration. Similarly, the provisioning and decommissioning of equipment may lead to network configurations that appear incomplete or inconsistent. In addition, mergers and acquisitions may lead to hybrid designs with distinct characteristics that date back to the original designs of the constituent networks. For example, a single network might use OSPF as the IGP in certain domains and EIGRP in others for purely historical reasons. Acquiring a deeper understanding of the evolution of the routing design requires a longitudinal analysis with multiple snapshots of the router configuration data over time. We plan to pursue this analysis as part of our ongoing work.

9. Related Work

In the absence of the data needed to conduct white-box analysis of routing designs, there has been significant work on black-box reverse engineering of network topology and IP connectivity [25, 2, 8]. The work of [5, 3] illustrates the potential power of white-box network analysis, via automated processing of router configuration files. Many network management tools for network and traffic engineering rely on similar methods to obtain topology and routing configuration information [14, 5].
A wealth of data on routing behavior has been gleaned from routing table dumps and route monitors, particularly the BGP data collected by the RouteViews project [18]. Such techniques, deployed within a given routing domain [24, 15], provide dynamic white-box measurements of IP connectivity and reachability information. Though such data would complement and enhance the investigation considered here, the associated instrumentation has not yet been widely deployed.

We considered existing data models, but none were appropriate for modeling routing designs. ITU-T M-series recommendations [23] are more geared for inventory management. The Distributed Management Task Force (DMTF) has created a model for representing the configuration of networks [11], but it is at the wrong granularity for the study of routing design. It provides no means of abstraction, like our routing instances, or means of analysis, like our route pathway and routing process graphs.

10. Summary

An IP network’s routing design is embodied in the configuration of its routing protocols. Through the routing design, network operators attempt to balance complex objectives and constraints, and to ensure robust network operations. In this paper, we make three primary contributions:

1. We present a methodology for working with the configuration files of production networks that supports the reverse engineering of the network’s routing design.

[5] Anja Feldmann, Albert Greenberg, Carsten Lund, Nick Reingold, and Jennifer Rexford. Netscope: Traffic engineering for IP networks. IEEE Network Magazine, pages 11–19, March 2000.
[6] P. Ferguson and D. Senie. Network Ingress Filtering: Defeating Denial of Service Attacks which Employ IP Source Address Spoofing. Internet Engineering Task Force, January 1998. RFC 2267.
[7] Lixin Gao and Feng Wang. The extent of AS path inflation by routing policies. In Proceedings of Global Internet 2002, 2002.
[8] Ramesh Govindan and Hongsuda Tangmunarunkit.
Heuristics for internet map discovery. In IEEE INFOCOM 2000, pages 1371–1380, Tel Aviv, Israel, March 2000. IEEE.
[9] Timothy Griffin, F. Bruce Shepherd, and Gordon T. Wilfong. Policy disputes in path-vector protocols. In Proceedings of the 7th Annual International Conference on Network Protocols, pages 21–30, Toronto, Canada, November 1999.
[10] Timothy G. Griffin and Gordon T. Wilfong. An analysis of BGP convergence properties. In Proceedings of SIGCOMM, pages 277–288, Cambridge, MA, August 1999.
[11] DMTF Networks Working Group. http://www.dmtf.org/standards/cim/cim schema v27.
[12] Sam Halabi and Danny McPherson. Internet Routing Architectures. Cisco Press, 2001.
[13] C. Hedrick. RFC 1058 - Routing Information Protocol, 1988.
[14] OPNET Technologies Inc. http://www.mil3.com/products/home.html.
[15] Packet Design Inc. http://www.packetdesign.com.
[16] D. E. Knuth. An empirical study of FORTRAN programs. Software - Practice and Experience, 1(2):105–133, April-June 1971.
[17] David A. Maltz, Jibin Zhan, Geoffrey Xie, Hui Zhang, Gisli Hjalmtysson, Albert Greenberg, and Jennifer Rexford. Structure preserving anonymization of router configuration data. Technical Report CMU-CS-04-149, Carnegie Mellon University, 2004.
[18] David Meyer and University of Oregon Route Views Project. http://antc.uoregon.edu/route-views/.
[19] Greg Minshall. tcpdpriv - remove private information from a tcpdump -w file. Software distribution available from http://ita.ee.lbl.gov/html/contrib/tcpdpriv.html, 1997.
[20] J. Moy. RFC 2178 - OSPF Version 2, 1997.
[21] Vern Paxson. End-to-end routing behavior in the Internet. IEEE/ACM Transactions on Networking, 5(5):601–615, 1997.
[22] Y. Rekhter and T. Li. RFC 1771 - A Border Gateway Protocol 4 (BGP-4), 1995.
[23] ITU-T M series recommendations. http://www.itu.int/rec/recommendation.asp?type=products&lang=e&parent=T-REC-M.
[24] A. Shaikh, L. Kalampoukas, R. Dube, and A. Varma. Routing stability in congested networks: Experimentation and analysis.
In Proc. ACM SIGCOMM’00, pages 163–174, Stockholm, Sweden, 2000.
[25] N. Spring, R. Mahajan, and D. Wetherall. Measuring ISP topologies with Rocketfuel. In Proc. ACM SIGCOMM, August 2002.
[26] Cisco Systems. Enhanced IGRP. http://www.cisco.com/univercd/cc/td/doc/cisintwk/ito doc/en igrp.htm.
[27] Geoffrey Xie, Jibin Zhan, David A. Maltz, Hui Zhang, Albert Greenberg, Gisli Hjalmtysson, and Jennifer Rexford. On static reachability analysis of IP networks. Technical Report CMU-CS-04-146, Carnegie Mellon University, 2004.
[28] Tatu Ylonen. Thoughts on how to mount an attack on tcpdpriv’s “-a50” option... Web White Paper available from http://ita.ee.lbl.gov/html/contrib/attack50/attack50.html.

2. We provide a general method to (i) automatically process the configuration files for a given network to extract the primitives that make up the network’s routing configuration and populate a router level model of the network, and (ii) derive coherent global views of the network’s routing design from the individual primitives spread across the configuration files. These global views include the routing process graph, routing instance graph, route pathway graph and address space structure. Together, they provide a means to abstract and summarize a network’s configuration that exposes the structure of the routing design and opens it up to direct analysis.

3. We demonstrate the value of our approach by presenting examples of the application of the techniques to thirty-one production networks and 8,035 configuration files. This structural information is essential for several important operational tasks: inventory management; vulnerability assessment; network engineering; anomaly detection and diagnosis.

Some of the unconventional features of the routing designs illustrate difficulties in meeting complex objectives and constraints in operational networks.
Others illustrate that, like programming, routing design is an art where many approaches might be used to try to achieve the same result. We believe that the best way forward for the operational and research communities to improve routing designs is to first understand the details, strengths and weaknesses of existing designs. We see our methodology as part of a series of steps towards a holistic theory of the design and operation of data networks. Understanding the mechanisms and dynamic behavior of individual routing protocols is insufficient. We must work towards a framework for understanding the interactions between the individual protocols and mechanisms from which the network is forged in order to make progress on the goal of achieving more scalable and robust networks.

11. Acknowledgments

We are grateful for the significant contributions Jennifer Rexford has made in advancing this work and revising this paper.

12. REFERENCES

[1] D. Eastlake 3rd and P. Jones. RFC 3174 - US Secure Hash Algorithm 1 (SHA1), 2001. Available from http://www.ietf.org/rfcs/rfc3174.html.
[2] CAIDA. http://www.caida.org/tools/measurement/skitter/, 2000.
[3] Don Caldwell, Anna Gilbert, Joel Gottlieb, Albert Greenberg, Gisli Hjalmtysson, and Jennifer Rexford. The cutting EDGE of IP router configuration. In Second Workshop on Hot Topics in Networks (HotNets-II), November 2003.
[4] R. Callon. RFC 1195 - Use of OSI IS-IS for routing in TCP/IP and dual environments, 1990.