BGP Design #FAIL

Over the past couple of weeks I’ve been experiencing a strange problem with one of our clients’ networks. They paid for a 5Mbps connection but had the ISP temporarily open up the pipe to 100Mbps to their DR site so that the initial replication of servers would move a lot faster. The problem was that we weren’t getting anything close to 100Mbps whenever we ran speed tests. Results averaged in the 40Mbps range and sometimes dropped as low as 18Mbps.

Before I go any further, let me give you a brief overview of the current design. The client has their own /24 block and ASN. The same ISP that provides their primary internet at HQ also provides internet at the DR site, along with the leased line that connects the two sites. eBGP sessions have been established to the ISP at each site, along with an iBGP peering between HQ and DR. They’re also using EIGRP as their IGP.

After the above configurations were completed I performed some initial tests. We were receiving a partial table at the DR site while only getting a default route at HQ. To prevent the upstream ISP from sending the full table unexpectedly, I configured a prefix list at both sites to allow only a default route in. Further tests were performed to ensure the IGP was routing properly and that both sites were learning routes from each other. The issue arose when we were ready to test replication of the servers at HQ to the DR site. The ISP informed us that they had opened up the pipe to 100Mbps and that we could begin replication. However, a number of bandwidth tests on speedtest didn’t bear that out. Speeds fluctuated greatly but never got anywhere close to 100Mbps. This was where the troubleshooting process began, and it is what highlighted my BGP design #FAIL.
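To make the inbound policy concrete, here is a minimal Python sketch of what a "default route only" prefix list does to received routes. It is not the router configuration from this project; the function names and the sample prefixes (taken from the documentation ranges) are assumptions made up for illustration.

```python
import ipaddress

# The only prefix the inbound filter is meant to accept.
DEFAULT_ROUTE = ipaddress.ip_network("0.0.0.0/0")


def permit_inbound(prefix: str) -> bool:
    """Return True only for an exact match on 0.0.0.0/0."""
    return ipaddress.ip_network(prefix) == DEFAULT_ROUTE


def filter_updates(prefixes):
    """Keep the received prefixes that the filter permits."""
    return [p for p in prefixes if permit_inbound(p)]


# Hypothetical routes received from the upstream ISP.
received = ["0.0.0.0/0", "192.0.2.0/24", "198.51.100.0/24"]
accepted = filter_updates(received)
print(accepted)
```

The match is exact on purpose: a filter that accepted anything "inside" 0.0.0.0/0 would let every prefix through, which is the opposite of what the prefix list is for.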

Now there could be any number of reasons why I wasn’t getting the desired results. So how do you go about troubleshooting such an issue? A carefully planned process and my good old buddy Google. That’s how! After testing on my laptop, the next step was to test from some other desktops and servers in the network. This would take my POS laptop, which freezes a zillion times a day, out of the equation. The tests performed on the servers yielded the same results.

Ok, so on to the next set of tests. Before performing any more tests internally I decided it would be more productive to start at the ISP’s fibre switch and work my way into the network. It really didn’t make any sense to run a bunch of tests on the internal network when the problem might be with the ISP’s switch, the border router or even the PIX (shoot me now), all of which sit inline before the internal network. The link comes in from the ISP via fibre into their switch, which connects to the border router via Ethernet. The border router then connects to the PIX.

To make sure I was getting the desired bandwidth from the ISP, I connected my laptop to the fibre switch, configured it with the IP address used for the BGP peering and ran the tests. Results were good. I was getting 98Mbps on average. Ok, so this proves that we are indeed getting the full bandwidth. The next step was to reconnect the fibre switch to the border router and then connect the router to my laptop. The results of these tests weren’t promising. So now I’m baffled. Could it be the border router? It can’t possibly be! It’s a brand new 2901 ISR Generation 2 and, as you know, the new ISRs are built to handle a lot more bandwidth.

Time to hit up my friend Google. I found another user with the exact same model router and the exact same design, using a fibre switch from their upstream ISP, who was having the exact same problem. A lot of users were saying it could be a duplex/speed issue with the ports, but that wasn’t the case for me, as both speed and duplex had negotiated correctly. I saw @jastorino on Skype around the same time I was researching the issue, so I decided to pitch my problem to him to see what input he had. One idea that came up was that the ISP might be rate limiting based on source address. This made sense: when I connected directly to the ISP’s switch using their IP I got full bandwidth, but when I connected to the router using IPs from the client’s block I didn’t. The ISP denied that they were rate limiting as I was suggesting. At this point I knew I must be on to something. Why would I get full bandwidth when traffic was sourced from the ISP’s IP but not when it was sourced from the client’s IP? Something’s not right.

Moving on, I checked how the client’s block was being seen from a couple of BGP looking glasses. Performing a traceroute to an IP at HQ revealed something quite interesting. The path traffic was taking to reach HQ was through the DR site. This meant that return traffic was not coming back into the HQ network along the same path that exit traffic used. This led to a loud outburst of ooooooooooooooooooooh while at my desk :). Everyone around me wanted to know what happened. So now I’ve found the problem! The internet-facing link at the DR site hadn’t been opened to 100Mbps, while the leased line had. Return traffic was arriving over the DR site’s internet link and then crossing the leased line to HQ, so it was being limited by the DR internet link.

How do I fix it? My first thought was to split the /24 block into two /25s and then reconfigure BGP to advertise those blocks to the upstream. This would allow return traffic to take the same path as the exit traffic, depending on which block it was coming from. Unfortunately, because ISPs won’t advertise any prefix longer than a /24 (that is, any block smaller than a /24), this was not an option. AS-path prepending was the next fix to try for this scenario. Prepending the client’s AS 3 times at the DR site made the AS path appear much longer than that of HQ and, because of BGP’s path selection process, return traffic would choose the HQ site.
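The effect of the prepend can be sketched in a few lines of Python. This is only an illustration of the AS-path-length comparison, assuming every attribute BGP compares before it (such as local preference) is equal on both routes. The AS number 64500 and the names `Advertisement`, `advertise` and `shortest_as_path` are hypothetical, not taken from the client’s configuration.

```python
from dataclasses import dataclass

# Hypothetical client AS number from the documentation range.
CLIENT_AS = 64500


@dataclass(frozen=True)
class Advertisement:
    site: str
    as_path: tuple  # AS numbers as received by the upstream ISP


def advertise(site: str, prepend_count: int = 0) -> Advertisement:
    """Build the /24 advertisement from one site.

    The client AS appears once, plus once more per prepend.
    """
    return Advertisement(site, (CLIENT_AS,) * (1 + prepend_count))


def shortest_as_path(ads):
    """Return the advertisements that survive the AS-path-length step.

    More than one survivor means BGP falls through to later tie-breakers.
    """
    shortest = min(len(ad.as_path) for ad in ads)
    return [ad for ad in ads if len(ad.as_path) == shortest]


# Without prepending, the two sites tie on AS-path length.
before = shortest_as_path([advertise("HQ"), advertise("DR")])
print([ad.site for ad in before])

# With 3 prepends at DR, only the HQ advertisement survives this step.
after = shortest_as_path([advertise("HQ"), advertise("DR", prepend_count=3)])
print([ad.site for ad in after])
```

Prepending only matters when the comparison actually reaches the AS-path-length step. If the upstream prefers one path on an earlier attribute, the extra AS numbers change nothing, which is why prepending influences inbound traffic rather than dictating it.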

I know this probably isn’t the best solution, but it worked just fine for me. I’m by no means an expert at BGP. However, working on this project exposed me to a lot of useful aspects of the protocol that I will be sure to take into consideration in future designs.

  1. November 23, 2011 at 3:12 PM

    Nice explanation. I can tell you I have dealt with this before, and if AS-path prepending is not honored (some ISPs require a specific number of prepends), communities are sometimes another way to get what you need done.

  2. November 23, 2011 at 3:14 PM

    Interesting post, Jason.

    In reality, while providers won’t advertise prefixes longer than a /24 (and some limit to even shorter), many of them will still allow you to send a /24 _and_ the two /25 prefixes. The /25s won’t be advertised to their peers, but would be known within the provider’s network. That way, the advertised /24 drags traffic to your provider (who hosts both links), then when it gets within your provider, the /25s direct traffic the way you want.

    The benefit – if you are homed to two providers for example – of splitting the /24 into 2 x /25 is determinism. AS-path stuffing is great but isn’t a 100% solution; it influences the traffic rather than absolutely determines the path it will take – plus if you prepend the AS too many times you can find other problems down the road. Splitting a prefix on the other hand moves 100% of the traffic where you send it, due to longer match.

    All depends on your scenario and your provider, of course. AT&T, for example, allows you to do this, plus you can set specific communities on any route to indicate that it’s only to be advertised within AT&T themselves. Other providers I’ve worked with have similar mechanisms.

    Still – if the solution works, it works, and nicely done troubleshooting the problem!

  3. November 23, 2011 at 3:51 PM

    Nice post Jason, as Rob said above the next option I would have looked at would have been communities but that would have required some config both on your and the ISP’s side.

  4. November 23, 2011 at 5:20 PM

    Good post Jason. The return path of traffic is often overlooked and, as in this case, can make a huge difference to the perceived performance of the network.

    AS-path prepending is a pretty decent solution and one that doesn’t require any input from your ISP. With multi-homing to the same ISP, communities (if supported) are a good option. With different ISPs prepending works well, and I use it to influence traffic coming into our network from our T1 ISP transit connections and IX peering points.

  5. November 23, 2011 at 5:47 PM

    Thanks guys! My main reason for using AS-Path is that it gives me more control imo. If I need to perform a design change later to test the flow of traffic I wouldn’t need to contact the ISP. The provider had offered to change the metric for routes learnt from the DR site but I opted to stay with the as-path prepending.
