Over the past couple of weeks I’ve been experiencing a strange problem with one of our clients’ networks. They paid for a 5Mbps connection but had the ISP temporarily open the pipe up to 100Mbps to their DR site so that the initial replication of servers would move a lot faster. The problem was that we weren’t getting anything close to 100Mbps in our speed tests. Results averaged in the 40Mbps range and sometimes dropped as low as 18Mbps.
Before I go any further, let me give you a brief overview of the current design. The client has their own /24 block and ASN. The same ISP that provides their primary internet at HQ also provides internet at the DR site, along with the leased line that connects the two. eBGP sessions have been established with the ISP at each site, along with an iBGP peering between HQ and DR. They’re also using EIGRP as their IGP.
After the above configuration was completed I performed some initial tests. We were receiving a partial table at the DR site but only a default route at HQ. To prevent the upstream ISP from unexpectedly sending the full table, I configured a prefix list at both sites to allow only a default route in. Further tests confirmed that the IGP was routing properly and that both sites were learning each other’s routes. The issue arose when we were ready to test replication of the servers from HQ to the DR site. The ISP informed us that they had opened up the pipe to 100Mbps and that we could begin replication. However, a number of bandwidth tests on speedtest didn’t back up their confirmation: speeds fluctuated greatly and never got anywhere close to 100Mbps. This is where the troubleshooting process began, and it is what exposed my BGP design #FAIL.
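The inbound filter described above could look something like this on each border router. This is a minimal illustration, not the client’s actual config. The prefix-list name DEFAULT-ONLY, the client ASN 64500, the ISP ASN 64496 and the peer address 203.0.113.1 are hypothetical placeholders taken from documentation ranges.

```text
! Hypothetical placeholders: 64500 = client ASN, 64496 = ISP ASN,
! 203.0.113.1 = ISP eBGP peering address
!
! 0.0.0.0/0 with no ge/le keywords matches only the default route itself
ip prefix-list DEFAULT-ONLY seq 5 permit 0.0.0.0/0
!
router bgp 64500
 neighbor 203.0.113.1 remote-as 64496
 ! Every other prefix hits the prefix list's implicit deny and is dropped
 neighbor 203.0.113.1 prefix-list DEFAULT-ONLY in
```

After applying the filter, "clear ip bgp 203.0.113.1 soft in" re-evaluates the received routes without tearing down the session, provided the router and peer support route refresh. "show ip bgp" should then list only 0.0.0.0/0 from that neighbor.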
Now, there could be any number of reasons why I wasn’t getting the desired results. So how do you go about troubleshooting an issue like this? A carefully planned process and my good old buddy Google, that’s how! After testing on my laptop, the next step was to test from some other desktops and servers on the network. This took my POS laptop, which freezes a zillion times a day, out of the equation. The tests on the servers produced the same results. Before running any more tests internally, I decided it would be more productive to test from the ISP’s fibre switch and work my way into the network. It made no sense to run a bunch of tests on the internal network when the problem might lie with the ISP’s switch, the border router or even the PIX (shoot me now), all of which sit inline before traffic reaches the internal network.
The link comes in from the ISP via fibre into their switch, which connects to the border router via Ethernet. The border router then connects to the PIX. To confirm I was getting the desired bandwidth from the ISP, I connected my laptop to the fibre switch, configured it with the IP address the border router uses to peer, and ran the tests. Results were good: I was getting 98Mbps on average, which proves we were indeed getting the full bandwidth. The next step was to reconnect the fibre switch to the border router and connect my laptop behind the router. The results of these tests weren’t promising. So now I’m baffled. Could it be the border router? It can’t possibly be! It’s a brand new 2901 ISR G2, and as you know, the new ISRs are built to handle a lot more bandwidth. Time to hit up my friend Google. I found another user with the exact same router model and the exact same design, using a fibre switch from their upstream ISP, who was having the exact same problem.
A lot of users suggested it could be a speed/duplex mismatch on the ports. That wasn’t the case for me, because both speed and duplex negotiated correctly. I saw @jastorino on Skype while I was researching the issue, so I pitched my problem to him to see what input he had. One idea that came up was that the ISP might be rate limiting based on the source address. That made sense: when I connected directly to the ISP’s switch using their IP I got full bandwidth, but when I connected behind the router using IPs from the client’s block I didn’t. The ISP denied doing any such rate limiting. At this point I knew I must be on to something. Why would I get full bandwidth when traffic was sourced from the ISP’s IP but not when it was sourced from the client’s IP? Something wasn’t right.
Moving on, I checked how the client’s block was seen as reachable from a couple of BGP looking glasses. A traceroute to an IP at HQ revealed something quite interesting: the path traffic took to reach HQ went through the DR site. In other words, return traffic into HQ was not taking the same path as outbound traffic. This led to a loud outburst of "ooooooooooooooooooooh" at my desk :), and everyone around me wanted to know what had happened. I’d found the problem! The internet-facing link at the DR site hadn’t been opened up to 100Mbps, while the leased line had. Return traffic was actually being limited by the DR’s internet link.

How do I fix it? My first thought was to split the /24 block into two /25s and reconfigure BGP to advertise those blocks to the upstream. Return traffic would then take the same path as outbound traffic, depending on which block it was coming from. Unfortunately, ISPs generally won’t accept or propagate prefixes more specific than a /24, so that wasn’t an option. AS path prepending was the next fix for this scenario.
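Prepending the client’s AS three times on advertisements from the DR site made the AS path via DR appear much longer than the path via HQ. BGP’s best-path selection prefers the shorter AS path when higher-priority attributes such as weight and local preference are equal, so return traffic now chose the HQ site.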
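The prepend itself can be sketched as an outbound route-map on the DR border router. Again, this is an illustration under assumptions, not the client’s config. Here 64500 stands in for the client’s ASN, 64496 for the ISP’s and 198.51.100.1 for the ISP peer at the DR site. PREPEND-DR is a hypothetical route-map name.

```text
! Hypothetical placeholders: 64500 = client ASN, 64496 = ISP ASN,
! 198.51.100.1 = ISP eBGP peer at the DR site
!
! A permit entry with no match clause applies to every prefix sent
route-map PREPEND-DR permit 10
 ! Add the client's own ASN three extra times to the AS path
 set as-path prepend 64500 64500 64500
!
router bgp 64500
 neighbor 198.51.100.1 remote-as 64496
 ! Applied only outbound toward the ISP at DR; HQ advertisements are untouched
 neighbor 198.51.100.1 route-map PREPEND-DR out
```

Refresh the session, for example with "clear ip bgp 198.51.100.1 soft out". A looking glass should then show the path via DR carrying the client’s ASN four times (the original plus three prepends), while the path via HQ carries it once. This steers traffic only as long as the ISP doesn’t override AS path length with a higher-priority attribute such as local preference.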
I know this probably isn’t the best solution, but it worked just fine for me. I’m by no means a BGP expert. However, working on this project taught me a lot about the protocol that I will be sure to take into consideration during future designs.
The Border Gateway Protocol (BGP) is used to exchange network layer reachability information (NLRI) between autonomous systems. Like most routing protocols, BGP has methods of preventing routing loops. iBGP in particular uses a split horizon rule: an iBGP router will not forward any prefixes learned via iBGP to other iBGP speakers, because it assumes a full mesh topology in which every iBGP speaker already peers with every other one. For example, take four routers, R1, R2, R3 and R4, connected in a daisy chain, with R1 in AS 100 and R2, R3 and R4 in AS 200. R2 advertises routes learned via its eBGP peering with R1 to its iBGP neighbor R3. R3 won’t pass them on to R4, however, so R4 never learns them.
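For comparison, the brute-force fix is simply to complete the full mesh that the split horizon rule assumes. Below is a minimal sketch using the addressing from the lab later in this post, where R2 is 192.168.2.2 and R4 is 192.168.3.4. It assumes R2 and R4 can already reach each other’s peering addresses. In this lab, those subnets are learned via BGP from R3. In production you’d more typically peer between loopbacks reachable through the IGP.

```text
! R2: add R4 as an iBGP neighbor (R2 already peers with R3)
router bgp 200
 neighbor 192.168.3.4 remote-as 200
!
! R4: add R2 as an iBGP neighbor (R4 already peers with R3)
router bgp 200
 neighbor 192.168.2.2 remote-as 200
```

With three iBGP speakers that’s only three sessions. A full mesh of n routers needs n(n-1)/2 sessions, however, so ten routers already need 45. That scaling cost is what the two techniques below avoid.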
To get around this problem, BGP offers two techniques that let your iBGP routers still receive all routing updates without requiring a full mesh topology:
- BGP Route Reflectors
  - Route reflectors allow iBGP speakers to use a partial mesh topology while still propagating all iBGP-learned routes to all iBGP speakers.
  - A route-reflector design consists of route-reflector servers, their route-reflector clients, and any non-client iBGP peers.
  - eBGP routes learned by a route-reflector server are advertised to other eBGP neighbors, route-reflector clients and non-clients.
  - iBGP routes learned from non-clients are advertised to eBGP neighbors and route-reflector clients, but not to other non-clients.
  - iBGP routes learned from route-reflector clients are advertised to other clients, non-clients and eBGP neighbors.
- BGP Confederations
  - Confederations achieve the same goal as route reflectors.
  - This is done by dividing the main AS into several smaller sub-autonomous systems.
  - Sub-AS numbers are typically taken from the private AS range 64512-65534.
  - Neighbors within each sub-AS must still be fully meshed.
  - Peers within the same sub-AS behave largely like iBGP peers, and peers in different sub-ASs behave largely like eBGP peers.
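Let’s first take a look at how to configure route reflectors, followed by confederations. We’ll use the four-router daisy chain described above. R1 (AS 100) peers with R2 over 192.168.1.0/24, R2 peers with R3 over 192.168.2.0/24, and R3 peers with R4 over 192.168.3.0/24. R2, R3 and R4 are all in AS 200, and R1 also advertises four /32 loopbacks.
Base Configuration for R1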
R1#sh run | s router bgp
router bgp 100
 no synchronization
 bgp log-neighbor-changes
 network 220.127.116.11 mask 255.255.255.255
 network 18.104.22.168 mask 255.255.255.255
 network 22.214.171.124 mask 255.255.255.255
 network 126.96.36.199 mask 255.255.255.255
 network 192.168.1.0
 neighbor 192.168.1.2 remote-as 200
 no auto-summary
Base Configuration for R2
R2#sh run | s router bgp
router bgp 200
 no synchronization
 bgp log-neighbor-changes
 network 192.168.1.0
 network 192.168.2.0
 neighbor 192.168.1.1 remote-as 100
 neighbor 192.168.2.3 remote-as 200
 no auto-summary
Base Configuration for R3
R3#sh run | s router bgp
router bgp 200
 no synchronization
 bgp log-neighbor-changes
 network 192.168.2.0
 network 192.168.3.0
 neighbor 192.168.2.2 remote-as 200
 neighbor 192.168.3.4 remote-as 200
 no auto-summary
Base Configuration for R4
R4#sh run | s router bgp
router bgp 200
 no synchronization
 bgp log-neighbor-changes
 network 192.168.3.0
 neighbor 192.168.3.3 remote-as 200
 no auto-summary
Now let’s take a look at the output of the “show ip bgp” command on each router.
Notice that R2 has learned about the loopbacks configured on R1 via its eBGP relationship with R1.
R2#sh ip bgp
   Network            Next Hop         Metric LocPrf Weight Path
*> 188.8.131.52/32    192.168.1.1           0             0 100 i
*> 184.108.40.206/32  192.168.1.1           0             0 100 i
*> 220.127.116.11/32  192.168.1.1           0             0 100 i
*> 18.104.22.168/32   192.168.1.1           0             0 100 i
*  192.168.1.0        192.168.1.1           0             0 100 i
*>                    0.0.0.0               0         32768 i
* i192.168.2.0        192.168.2.3           0    100      0 i
*>                    0.0.0.0               0         32768 i
*>i192.168.3.0        192.168.2.3           0    100      0 i
Notice that R3 also learns R1’s loopbacks, but this time via iBGP, as indicated by the “i” in front of each network.
R3#sh ip bgp
   Network            Next Hop         Metric LocPrf Weight Path
*>i22.214.171.124/32  192.168.1.1           0    100      0 100 i
*>i126.96.36.199/32   192.168.1.1           0    100      0 100 i
*>i188.8.131.52/32    192.168.1.1           0    100      0 100 i
*>i184.108.40.206/32  192.168.1.1           0    100      0 100 i
*>i192.168.1.0        192.168.2.2           0    100      0 i
* i192.168.2.0        192.168.2.2           0    100      0 i
*>                    0.0.0.0               0         32768 i
* i192.168.3.0        192.168.3.4           0    100      0 i
*>                    0.0.0.0               0         32768 i
Here on R4 we see the iBGP split horizon rule in effect.
R4#sh ip bgp
   Network            Next Hop         Metric LocPrf Weight Path
*>i192.168.2.0        192.168.3.3           0    100      0 i
* i192.168.3.0        192.168.3.3           0    100      0 i
*>                    0.0.0.0               0         32768 i
R4 doesn’t learn of R1’s loopback interfaces because R3 isn’t passing the routes on. To correct this, we’re going to use the first solution available to us: make R3 a route-reflector server and R4 a route-reflector client. This allows R3 to advertise R1’s loopbacks, which it learned from its non-client peer R2, to R4. The configuration only needs to be applied on the route-reflector server, which is R3 in this case.
Route-reflector configuration for R3
R3(config)#router bgp 200
R3(config-router)#neighbor 192.168.3.4 route-reflector-client
Now let’s take another look at the output of the “show ip bgp” command on R4 to see if it did indeed learn those routes.
R4#sh ip bgp
   Network            Next Hop         Metric LocPrf Weight Path
*>i220.127.116.11/32  192.168.1.1           0    100      0 100 i
*>i18.104.22.168/32   192.168.1.1           0    100      0 100 i
*>i22.214.171.124/32  192.168.1.1           0    100      0 100 i
*>i126.96.36.199/32   192.168.1.1           0    100      0 100 i
*>i192.168.1.0        192.168.2.2           0    100      0 i
*>i192.168.2.0        192.168.3.3           0    100      0 i
* i192.168.3.0        192.168.3.3           0    100      0 i
*>                    0.0.0.0               0         32768 i
R4#
R4#ping 188.8.131.52
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 184.108.40.206, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/5/8 ms
Success!!! R4 is now learning the routes via its iBGP neighbor R3, and we also have reachability.
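Moving on to BGP confederations. For this example we’ll use the same topology, but subdivide AS 200 into sub-AS 6500 and sub-AS 6501. Strictly speaking, these numbers fall outside the private range 64512-65534. That is harmless in an isolated lab, but production designs should stick to the private range.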
One thing to note about confederation configuration is that we first create the BGP process using the sub-AS number, and then identify our actual AS within BGP configuration mode using “bgp confederation identifier”. I personally don’t know why Cisco decided to implement confederations this way, but this wouldn’t be the first time we’ve all scratched our heads at how Cisco chooses to implement stuff. The configuration for R1 remains exactly the same. For all other routers in AS 200 (confederation sub-ASs 6500 and 6501), however, we first have to remove the “router bgp 200” configuration and then proceed. R2 and R3 are placed into sub-AS 6500, while R4 is placed into sub-AS 6501.
We place R4 in a separate sub-AS from R2 and R3 because confederation iBGP peers behave like regular iBGP peers. This means a full mesh is still required among all routers within the same sub-AS. Had we placed R4 in the same sub-AS as R2 and R3, the split horizon rule would have prevented R3 from advertising R1’s networks to R4.
Confederations config on R2
R2#sh run | s router bgp
router bgp 6500
 no synchronization
 bgp log-neighbor-changes
 bgp confederation identifier 200
 bgp confederation peers 6501
 network 192.168.1.0
 network 192.168.2.0
 neighbor 192.168.1.1 remote-as 100
 neighbor 192.168.2.3 remote-as 6500
 no auto-summary
Confederations config on R3
R3#sh run | s router bgp
router bgp 6500
 no synchronization
 bgp log-neighbor-changes
 bgp confederation identifier 200
 bgp confederation peers 6501
 network 192.168.2.0
 network 192.168.3.0
 neighbor 192.168.2.2 remote-as 6500
 neighbor 192.168.3.4 remote-as 6501
 no auto-summary
Confederations config on R4
R4#sh run | s router bgp
router bgp 6501
 no synchronization
 bgp log-neighbor-changes
 bgp confederation identifier 200
 bgp confederation peers 6500
 network 192.168.3.0
 neighbor 192.168.3.3 remote-as 6500
 no auto-summary
Output of “show ip bgp” on R4
R4#sh ip bgp
   Network            Next Hop         Metric LocPrf Weight Path
*> 220.127.116.11/32  192.168.1.1           0    100      0 (6500) 100 i
*> 18.104.22.168/32   192.168.1.1           0    100      0 (6500) 100 i
*> 22.214.171.124/32  192.168.1.1           0    100      0 (6500) 100 i
*> 126.96.36.199/32   192.168.1.1           0    100      0 (6500) 100 i
*> 192.168.1.0        192.168.2.2           0    100      0 (6500) i
*> 192.168.2.0        192.168.3.3           0    100      0 (6500) i
*  192.168.3.0        192.168.3.3           0    100      0 (6500) i
*>                    0.0.0.0               0         32768 i
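Success!!! R4 learns R1’s networks via its confederation eBGP relationship with R3. Notice that the networks no longer have the “i” in front of them that marks routes learnt via iBGP, and that the sub-AS 6500 appears in parentheses in the AS path. This shows how closely confederation eBGP peers resemble normal eBGP peers. They aren’t identical, though: the next hop (192.168.1.1) and local preference (100) are carried across the sub-AS boundary unchanged, which a regular eBGP session would not normally do.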
As we’ve seen throughout this post, both route reflectors and confederations can solve the iBGP split horizon problem without requiring a full mesh topology within your BGP network. I personally don’t have any production experience with these technologies, but if I had to choose one, I’d go with route reflectors simply because the configuration is a lot simpler :-). One other point came up during my group study session with @aconaway. You need to plan ahead if you ever want to implement confederations, because of the way they are configured in IOS. Let’s say you set up some eBGP peers early on, when your network was a lot smaller, but it has since grown to a size that requires confederations. You would need to remove the “router bgp AS” command, thereby removing all BGP configuration on that router, before proceeding with your confederation configs. With route reflectors, by contrast, you simply add the “neighbor x.x.x.x route-reflector-client” command to achieve the same result.
This post was mainly written to aid me with my studies. The views expressed are based on my understanding and research of these technologies. Because this is a learning process, mistakes may have crept in, and I would greatly appreciate it if you pointed them out in the comments section.