Keywords: Troubleshooting, OSPF, Router ID, MPLS, VPLS
I ran into a strange issue today. The problem was reported as “can’t upload on a speed test”.
I started by running my own speed test between routers and got the expected speeds. Then I rebooted everything in between. The same result was reported: the speed test was failing to complete the upload portion.
I tested without the router in place and got the full expected speeds.
I tested with the router again. The download would complete at the expected speed, but the upload test wouldn’t run at all. Input packets would suddenly drop to zero.
This was maddening!
This network runs VPLS. Two PE routers had also been added recently, one at a time. The first addition was 100% successful. The problems began (for both PE routers) when the second one came online.
This led me to check the LDP and OSPF neighbor relationships with the two new PE routers. Both appeared normal.
Next I went through the IP configuration of the new PE routers: no duplicate loopback IP addresses, no duplicate routing link IP addresses, no duplicate OSPF networks.
Up to this point I had only checked the two new PE routers, two adjacent PE routers with similar configuration (no duplication there either), and the P router that all four of those PE routers connected to. Everything appeared normal.
So I went to the far side of the VPLS tunnels, and I immediately noticed something. The VPLS to one of the new PEs would come up, then go down, and then the VPLS to the other new PE would come up. It kept flip-flopping this way.
Digging into this flip-flop, I found that it ran on exactly a 10-second timer.
Strange. OSPF problems often reveal themselves as something that stays up for 30-40 seconds, goes down, and then comes back up. Not 10-second intervals, though. Still, 10 seconds felt like OSPF.
I checked the routing table on the router that was showing the flip-flop, and indeed, the route for one new PE’s loopback would appear (and VPLS would come up), and then it would flip to the other new PE’s loopback (and that VPLS would come up). The route was also following this 10 second timer.
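One way to confirm a fixed flip period like this is to sample the routing table repeatedly and measure the time between changes. The sketch below is a minimal illustration, assuming you have already logged timestamped samples of which new PE’s loopback route is present; the log format, the `flip_intervals` helper, and the `PE-A`/`PE-B` labels are all hypothetical. The measured intervals are only as precise as your polling rate.

```python
from datetime import datetime


def flip_intervals(observations):
    """Return the seconds between consecutive flips of an observed value.

    observations: time-ordered list of (ISO-8601 timestamp, value) tuples,
    where value is whichever PE loopback route was seen in that sample
    (hypothetical log format). Intervals are measured flip-to-flip, so the
    first sample is not treated as a flip.
    """
    intervals = []
    prev_value = None
    prev_flip = None
    for ts, value in observations:
        t = datetime.fromisoformat(ts)
        if prev_value is not None and value != prev_value:
            # The route moved to the other PE: record time since the last flip.
            if prev_flip is not None:
                intervals.append((t - prev_flip).total_seconds())
            prev_flip = t
        prev_value = value
    return intervals


# Hypothetical samples taken every 5 seconds.
samples = [
    ("2024-01-01T00:00:00", "PE-A"),
    ("2024-01-01T00:00:05", "PE-A"),
    ("2024-01-01T00:00:10", "PE-B"),
    ("2024-01-01T00:00:15", "PE-B"),
    ("2024-01-01T00:00:20", "PE-A"),
    ("2024-01-01T00:00:30", "PE-B"),
]
print(flip_intervals(samples))
```

A steady list of identical intervals points at a timer-driven cause rather than random link loss.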
Having looked at the VPLS endpoint, and with nothing else in between having changed recently, I went back to the new PE routers to go over the details with a fine-toothed comb.
I checked again for duplicate IP addresses on loopbacks, on interfaces, and in OSPF networks.
Then I started checking all the other OSPF settings.
Here I stopped. In my haste, I had cloned one new PE from the other and left a duplicate router ID.
I restarted the OSPF process after fixing the router ID on that router.
And everything came up.
Beware of duplicate Router IDs.
The effects of a duplicate router ID depend on where the duplicates sit in the network.
In this case it was fairly non-destructive: the two PE routers simply fought over that ID, which caused the VPLS toward them to flap.
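Since cloning configs is an easy way to end up here, a quick inventory check can catch a duplicate router ID before a new PE goes live. The sketch below is a minimal example, assuming you can collect each router’s OSPF router ID into a simple name-to-ID mapping (from config backups, an inventory system, or similar); the `find_duplicate_router_ids` helper, router names, and addresses are hypothetical.

```python
import ipaddress
from collections import defaultdict


def find_duplicate_router_ids(router_ids):
    """Return {router_id: [router names]} for every router ID used more than once.

    router_ids: dict mapping a (hypothetical) router name to its configured
    OSPF router ID as a dotted-quad string.
    """
    by_id = defaultdict(list)
    for router, rid in router_ids.items():
        # Normalize through ipaddress so stray whitespace doesn't hide a
        # duplicate and malformed IDs raise a ValueError.
        key = str(ipaddress.IPv4Address(rid.strip()))
        by_id[key].append(router)
    return {rid: sorted(names) for rid, names in by_id.items() if len(names) > 1}


# Hypothetical inventory where pe-new-2 was cloned from pe-new-1.
inventory = {
    "p-core": "10.255.0.1",
    "pe-old-1": "10.255.0.11",
    "pe-new-1": "10.255.0.21",
    "pe-new-2": "10.255.0.21 ",
}
print(find_duplicate_router_ids(inventory))
```

An empty result means no two routers in the inventory share an ID; anything else names exactly which routers to fix.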