This is a mechanism to keep routes even when the BGP session goes down, but at a lower priority. Notably, this enables routing to keep working in case of disruption in the control plane while not keeping dead routes after a real failure.
Consider the following requirements:
- we want routes to be withdrawn immediately in case of a communication failure (link down on the path)
- we want routes to be kept when the control plane is disrupted
These two requirements seem contradictory, but we can reconcile them by keeping withdrawn routes only as last-resort routes. This is exactly what BGP Long-Lived Graceful Restart (LLGR) does.
It provides two mechanisms:
- an extension to graceful restart to mark and handle long-lived stale routes (configurable timer) with a lower priority than normal routes,
- a community to use to send those routes to other BGP routers understanding the extension.
We use only the first mechanism in this lab. Also, graceful restart is not enabled explicitly. This means the long-lived stale route timer starts immediately when the session goes down.
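As a rough illustration of the first mechanism, the sketch below ranks paths so that a long-lived stale path loses to any fresh path and is only selected when nothing else remains. The `Path` class, its fields and `best_path` are hypothetical names made up for this example; a real implementation applies this check inside a much longer BGP decision process.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Path:
    """One BGP path for a prefix (hypothetical, simplified model)."""
    peer: str
    local_pref: int = 100
    llgr_stale: bool = False  # set when the session is down and the path is retained


def best_path(paths: List[Path]) -> Optional[Path]:
    """Pick the preferred path; stale paths are last-resort only."""
    if not paths:
        return None
    # False sorts before True, so any fresh path beats every stale one.
    # Among equals, prefer the higher local-pref, then the lowest peer
    # string (an arbitrary tie-breaker chosen for this sketch).
    return min(paths, key=lambda p: (p.llgr_stale, -p.local_pref, p.peer))


paths = [
    Path("2001:db8:1::2", llgr_stale=True),
    Path("2001:db8:2::2", llgr_stale=True),
    Path("2001:db8:3::2"),
]
print(best_path(paths).peer)      # the only fresh path wins
print(best_path(paths[:2]).peer)  # only stale paths left: one is still used
```

With this ordering, a stale path never displaces a live one, but the prefix stays reachable as long as at least one stale path is retained.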
This has been tested with:
- vRR 16.1R2.11 (no problem)
- vRR 17.3R1.10 (ECMP routes broken but otherwise work)
The lab has 3 nodes:
- a Juniper vRR split into two logical systems,
- a Linux node running GoBGP with LLGR enabled (it needs a patch for interop)
- a Linux node acting as a switch and able to simulate some kind of control plane failure by dropping BFD and BGP packets.
LLGR is completely configured by the llgr group. Therefore, it's easy to test what happens with and without it. We focus on R1 (use set cli logical-system R1).
When everything is correct, three BGP sessions are established:
juniper@R:R1> show bgp summary
Groups: 3 Peers: 3 Down peers: 0
Table Tot Paths Act Paths Suppressed History Damp State Pending
inet6.0
3 3 0 0 0 0
Peer AS InPkt OutPkt OutQ Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
2001:db8:1::2 65000 3 2 0 0 19 Establ
inet6.0: 1/1/1/0
2001:db8:2::2 65000 4 0 0 1 19 Establ
inet6.0: 1/1/1/0
2001:db8:3::2 65000 4 0 0 1 13 Establ
inet6.0: 1/1/1/0
The same route is learnt over all peers:
juniper@R:R1> show route protocol bgp
inet6.0: 12 destinations, 14 routes (12 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
2001:db8:10::2/128 *[BGP/170] 00:00:34, localpref 100, from 2001:db8:1::2
AS path: I, validation-state: unverified
to 2001:db8:1::2 via em1.0
> to 2001:db8:2::2 via em2.0
to 2001:db8:3::2 via em3.0
[BGP/170] 00:00:39, localpref 100
AS path: I, validation-state: unverified
> to 2001:db8:2::2 via em2.0
[BGP/170] 00:00:34, localpref 100
AS path: I, validation-state: unverified
> to 2001:db8:3::2 via em3.0
LLGR is enabled on each neighbor:
juniper@R:R1> show bgp neighbor 2001:db8:1::2
Peer: 2001:db8:1::2+179 AS 65000 Local: 2001:db8:1::1+62976 AS 65000
Group: peer1 Routing-Instance: master
Forwarding routing-instance: master
Type: Internal State: Established Flags: <Sync>
Last State: OpenConfirm Last Event: RecvKeepAlive
Last Error: Open Message Error
Export: [ LOOPBACK NOTHING ]
Options: <Preference Ttl AddressFamily Multipath Refresh>
Options: <BfdEnabled LLGR>
Address families configured: inet6-unicast
Holdtime: 90 Preference: 170
NLRI inet6-unicast:
Number of flaps: 0
Error: 'Open Message Error' Sent: 1 Recv: 0
Peer ID: 1.0.0.2 Local ID: 1.0.0.1 Active Holdtime: 90
Keepalive Interval: 30 Group index: 0 Peer index: 0
I/O Session Thread: bgpio-0 State: Enabled
BFD: enabled, up
NLRI for restart configured on peer: inet6-unicast
NLRI advertised by peer: inet6-unicast
NLRI for this session: inet6-unicast
Peer supports Refresh capability (2)
Stale routes from peer are kept for: 300
Restart time requested by this peer: 0
Restart flag received from the peer: Notification
NLRI that peer supports restart for: inet6-unicast
NLRI peer can save forwarding state: inet6-unicast
NLRI that peer saved forwarding for: inet6-unicast
NLRI that restart is negotiated for: inet6-unicast
NLRI of received end-of-rib markers: inet6-unicast
NLRI of all end-of-rib markers sent: inet6-unicast
NLRI and times for LLGR configured for peer: inet6-unicast 00:02:00
NLRI and times that peer supports LLGR Restarter for: inet6-unicast 00:02:00
NLRI that peer saved LLGR forwarding for: inet6-unicast
Peer supports 4 byte AS extension (peer-as 65000)
Peer does not support Addpath
Table inet6.0 Bit: 20000
RIB State: BGP restart is complete
Send state: in sync
Active prefixes: 1
Received prefixes: 1
Accepted prefixes: 1
Suppressed due to damping: 0
Advertised prefixes: 1
Last traffic (seconds): Received 419 Sent 191 Checked 419
Input messages: Total 10 Updates 2 Refreshes 0 Octets 277
Output messages: Total 9 Updates 1 Refreshes 0 Octets 277
Output Queue[1]: 0 (inet6.0, inet6-unicast)
Notably:
juniper@R:R1> show bgp neighbor 2001:db8:1::2 | match llgr
Options: <BfdEnabled LLGR>
NLRI and times for LLGR configured for peer: inet6-unicast 00:02:00
NLRI and times that peer supports LLGR Restarter for: inet6-unicast 00:02:00
NLRI that peer saved LLGR forwarding for: inet6-unicast
If we simulate a control-plane problem with only one neighbor (using ddos wire1 on the Linux box), the route is updated accordingly:
juniper@R:R1> show bgp summary
Groups: 3 Peers: 3 Down peers: 1
Table Tot Paths Act Paths Suppressed History Damp State Pending
inet6.0
3 2 0 0 0 0
Peer AS InPkt OutPkt OutQ Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
2001:db8:1::2 65000 8 5 0 2 2 Active
inet6.0: 0/1/1/0
2001:db8:2::2 65000 9 5 0 2 2:25 Establ
inet6.0: 1/1/1/0
2001:db8:3::2 65000 9 8 0 1 2:30 Establ
inet6.0: 1/1/1/0
juniper@R:R1> show route protocol bgp
inet6.0: 12 destinations, 14 routes (12 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
2001:db8:10::2/128 *[BGP/170] 00:00:09, localpref 100
AS path: I, validation-state: unverified
> to 2001:db8:2::2 via em2.0
to 2001:db8:3::2 via em3.0
[BGP/170] 00:02:37, localpref 100
AS path: I, validation-state: unverified
> to 2001:db8:3::2 via em3.0
[BGP/170] 00:00:09, localpref 100
AS path: I, validation-state: unverified
> to 2001:db8:1::2 via em1.0
Notably, the details of the last route say:
juniper@R:R1> show route protocol bgp extensive inactive-path
inet6.0: 12 destinations, 14 routes (12 active, 0 holddown, 0 hidden)
2001:db8:10::2/128 (3 entries, 1 announced)
TSI:
KRT in-kernel 2001:db8:10::2/128 -> {indirect(1048574)}
BGP Preference: 170/-101
Next hop type: Indirect, Next hop index: 0
Address: 0xc7887d0
Next-hop reference count: 1
Source: 2001:db8:1::2
Next hop type: Router, Next hop index: 722
Next hop: 2001:db8:1::2 via em1.0, selected
Session Id: 0x148
Protocol next hop: 2001:db8:1::2
Indirect next hop: 0xb1d2400 1048576 INH Session ID: 0x149
State: <Int Ext>
Inactive reason: LLGR stale
Local AS: 65000 Peer AS: 65000
Age: 48 Metric2: 0
Validation State: unverified
Task: BGP_65000.2001:db8:1::2
AS path: I
Communities: llgr-stale
Accepted LongLivedStale
Localpref: 100
Router ID: 1.0.0.2
Indirect next hops: 1
Protocol next hop: 2001:db8:1::2
Indirect next hop: 0xb1d2400 1048576 INH Session ID: 0x149
Indirect path forwarding next hops: 1
Next hop type: Router
Next hop: 2001:db8:1::2 via em1.0
Session Id: 0x148
2001:db8:1::/120 Originating RIB: inet6.0
Node path count: 1
Forwarding nexthops: 1
Next hop type: Interface
Nexthop: via em1.0
Notably:
juniper@R:R1> show route protocol bgp extensive inactive-path | match "llgr|long"
Inactive reason: LLGR stale
Communities: llgr-stale
Accepted LongLivedStale
If we simulate a failure on the second link, we get:
juniper@R:R1> show bgp summary
Groups: 3 Peers: 3 Down peers: 2
Table Tot Paths Act Paths Suppressed History Damp State Pending
inet6.0
3 1 0 0 0 0
Peer AS InPkt OutPkt OutQ Flaps Last Up/Dwn State|#Active/Received/Accepted/Damped...
2001:db8:1::2 65000 0 0 0 3 1:08 Connect
inet6.0: 0/1/1/0
2001:db8:2::2 65000 17 12 0 3 1 Active
inet6.0: 0/1/1/0
2001:db8:3::2 65000 18 16 0 1 6:21 Establ
inet6.0: 1/1/1/0
juniper@R:R1> show route protocol bgp
inet6.0: 12 destinations, 14 routes (12 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
2001:db8:10::2/128 *[BGP/170] 00:06:28, localpref 100
AS path: I, validation-state: unverified
> to 2001:db8:3::2 via em3.0
[BGP/170] 00:01:15, localpref 100
AS path: I, validation-state: unverified
> to 2001:db8:1::2 via em1.0
[BGP/170] 00:00:08, localpref 100
AS path: I, validation-state: unverified
> to 2001:db8:2::2 via em2.0
But once we make the third link fail, the route doesn't disappear:
juniper@R:R1> show route protocol bgp
inet6.0: 12 destinations, 14 routes (12 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
2001:db8:10::2/128 *[BGP/170] 00:00:05, localpref 100, from 2001:db8:1::2
AS path: I, validation-state: unverified
to 2001:db8:1::2 via em1.0
> to 2001:db8:2::2 via em2.0
to 2001:db8:3::2 via em3.0
[BGP/170] 00:00:39, localpref 100
AS path: I, validation-state: unverified
> to 2001:db8:2::2 via em2.0
[BGP/170] 00:00:05, localpref 100
AS path: I, validation-state: unverified
> to 2001:db8:3::2 via em3.0
The prefixes stay as long as the timer is not expired (two minutes in our example):
juniper@R:R1> show bgp neighbor 2001:db8:3::2 | match "llgr|long"
Options: <BfdEnabled LLGR>
Time until long-lived stale routes deleted: inet6-unicast 00:00:47
LLGR-stale prefixes: 1
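The countdown in this output can be modelled as a simple deadline: the stale routes are kept for the configured time after the session drops, then deleted. The sketch below uses hypothetical helper names, and the 73 seconds of elapsed time is an assumed value chosen only to reproduce the 00:00:47 shown above with the two-minute timer of this lab.

```python
def llgr_remaining(stale_time: int, elapsed: int) -> int:
    """Seconds left before long-lived stale routes are deleted (never negative)."""
    return max(0, stale_time - elapsed)


def as_hms(seconds: int) -> str:
    """Format a duration as hh:mm:ss, like the JunOS output."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


STALE_TIME = 120  # two minutes, as configured in this lab

print(as_hms(llgr_remaining(STALE_TIME, elapsed=73)))  # 00:00:47
print(llgr_remaining(STALE_TIME, elapsed=130) == 0)    # True: routes are gone
```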
On vRR 16.1R2.11, it doesn't seem possible to use an import policy to match a stale route. The displayed community may have been added after import policies are evaluated.
We can check that the destination still answers pings despite the ongoing "DDoS" (with some packet loss, but better than nothing):
juniper@R:R1> ping 2001:db8:10::2 count 10 rapid
PING6(56=40+8+8 bytes) 2001:db8:2::1 --> 2001:db8:10::2
!!.!.!.!.!
--- 2001:db8:10::2 ping6 statistics ---
10 packets transmitted, 6 packets received, 40% packet loss
round-trip min/avg/max/std-dev = 0.580/1.516/2.251/0.677 ms
Long-lived BGP graceful restart is still a draft. It is implemented by Juniper and Cisco.
I didn't try to get the value used for LLGR stale community in JunOS. This is only a minor compatibility problem if they differ between implementations.
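For reference, the BIRD output further down shows the stale route tagged with community (65535,6), which is the LLGR_STALE value from the LLGR specification. A standard BGP community is a 32-bit value made of two 16-bit halves; the sketch below shows the conversion between both notations. The function names are illustrative, and whether JunOS uses the same value on the wire was not checked here.

```python
def encode_community(high: int, low: int) -> int:
    """Pack a (high, low) community pair into its 32-bit value."""
    if not (0 <= high <= 0xFFFF and 0 <= low <= 0xFFFF):
        raise ValueError("each half must fit in 16 bits")
    return (high << 16) | low


def decode_community(value: int) -> tuple:
    """Split a 32-bit community value back into its (high, low) pair."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


LLGR_STALE = encode_community(65535, 6)
print(hex(LLGR_STALE))               # 0xffff0006
print(decode_community(LLGR_STALE))  # (65535, 6)
```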
GoBGP has supported BGP LLGR for quite some time, but it can interoperate with JunOS only since version 1.33. The regular graceful restart timer cannot be set to 0, as this would disable everything; it therefore needs to be set to 1. GoBGP seems to correctly advertise BGP capabilities as an LLGR speaker and restarter, but it doesn't mark routes as stale once the BGP session goes down:
$ gobgp neighbor 2001:db8:102::1 adj-in
Neighbor 2001:db8:102::1's BGP session is not established
This needs to be investigated.
From JunOS point of view:
Peer: 2001:db8:104::4+60605 AS 65000 Local: 2001:db8:104::1+179 AS 65000
Group: peers Routing-Instance: master
Forwarding routing-instance: master
Type: Internal State: Established Flags: <Sync>
Last State: OpenConfirm Last Event: RecvKeepAlive
Last Error: None
Export: [ LOOPBACK NOTHING ]
Options: <Preference HoldTime Ttl AddressFamily Multipath Refresh>
Options: <BfdEnabled LLGR>
Address families configured: inet6-unicast
Holdtime: 6 Preference: 170
NLRI inet6-unicast:
Number of flaps: 1
Last flap event: Restart
Peer ID: 1.0.0.4 Local ID: 1.0.0.1 Active Holdtime: 6
Keepalive Interval: 2 Group index: 0 Peer index: 4
I/O Session Thread: bgpio-0 State: Enabled
BFD: enabled, up
NLRI for restart configured on peer: inet6-unicast
NLRI advertised by peer: inet6-unicast
NLRI for this session: inet6-unicast
Peer supports Refresh capability (2)
Stale routes from peer are kept for: 300
Restart time requested by this peer: 0
NLRI that peer supports restart for: inet6-unicast
NLRI peer can save forwarding state: inet6-unicast
NLRI that restart is negotiated for: inet6-unicast
NLRI of received end-of-rib markers: inet6-unicast
NLRI of all end-of-rib markers sent: inet6-unicast
NLRI and times for LLGR configured for peer: inet6-unicast 00:02:00
NLRI and times that peer supports LLGR Restarter for: inet6-unicast 00:02:00
Peer supports 4 byte AS extension (peer-as 65000)
Peer does not support Addpath
Table inet6.0 Bit: 20000
RIB State: BGP restart is complete
Send state: in sync
Active prefixes: 1
Received prefixes: 1
Accepted prefixes: 1
Suppressed due to damping: 0
Advertised prefixes: 1
Last traffic (seconds): Received 3150 Sent 2496 Checked 3150
Input messages: Total 1251 Updates 2 Refreshes 0 Octets 23899
Output messages: Total 1364 Updates 1 Refreshes 0 Octets 26060
Output Queue[1]: 0 (inet6.0, inet6-unicast)
We are missing the following line:
NLRI that peer saved LLGR forwarding for: inet6-unicast
From BIRD point of view:
BIRD 1.6.4 ready.
name proto table state since info
R1_1 BGP master up 10:35:03 Established
Preference: 100
Input filter: ACCEPT
Output filter: ACCEPT
Routes: 1 imported, 1 exported, 1 preferred
Route change stats: received rejected filtered ignored accepted
Import updates: 2 0 0 0 3
Import withdraws: 0 0 --- 0 0
Export updates: 12 10 0 --- 2
Export withdraws: 1 --- --- --- 0
BGP state: Established
Neighbor address: 2001:db8:104::1
Neighbor AS: 65000
Neighbor ID: 1.0.0.1
Neighbor caps: refresh restart-able llgr-able AS4
Session: internal AS4
Source address: 2001:db8:104::4
Hold timer: 5/6
Keepalive timer: 2/2
From JunOS point of view once LLGR kicks in:
juniper@R:R1> show bgp neighbor 2001:db8:104::4
Peer: 2001:db8:104::4+179 AS 65000 Local: 2001:db8:104::1+57667 AS 65000
Group: peers Routing-Instance: master
Forwarding routing-instance: master
Type: Internal State: Connect Flags: <>
Last State: Active Last Event: ConnectRetry
Last Error: None
Export: [ LOOPBACK NOTHING ]
Options: <Preference HoldTime Ttl AddressFamily Multipath Refresh>
Options: <BfdEnabled LLGR>
Address families configured: inet6-unicast
Holdtime: 6 Preference: 170
NLRI inet6-unicast:
Number of flaps: 2
Last flap event: Restart
Time until long-lived stale routes deleted: inet6-unicast 00:01:05
Table inet6.0 Bit: 20000
RIB State: BGP restart is complete
Send state: not advertising
Active prefixes: 0
Received prefixes: 1
Accepted prefixes: 1
Suppressed due to damping: 0
LLGR-stale prefixes: 1
Also:
BGP Preference: 170/-101
Next hop type: Indirect, Next hop index: 0
Address: 0xc776e10
Next-hop reference count: 1
Source: 2001:db8:104::4
Next hop type: Router, Next hop index: 778
Next hop: 2001:db8:104::4 via em1.104, selected
Session Id: 0x14a
Protocol next hop: 2001:db8:104::4
Indirect next hop: 0xb1d27c0 1048578 INH Session ID: 0x15c
State: <Int Ext>
Inactive reason: LLGR stale
Local AS: 65000 Peer AS: 65000
Age: 4 Metric2: 0
Validation State: unverified
Task: BGP_65000.2001:db8:104::4
AS path: I
Communities: llgr-stale
Accepted LongLivedStale
Localpref: 100
Router ID: 1.0.0.4
Indirect next hops: 1
Protocol next hop: 2001:db8:104::4
Indirect next hop: 0xb1d27c0 1048578 INH Session ID: 0x15c
Indirect path forwarding next hops: 1
Next hop type: Router
Next hop: 2001:db8:104::4 via em1.104
Session Id: 0x14a
2001:db8:104::/120 Originating RIB: inet6.0
Node path count: 1
Forwarding nexthops: 1
Next hop type: Interface
Nexthop: via em1.104
From BIRD point of view:
name proto table state since info
R1_1 BGP master start 11:20:17 Connect
Preference: 100
Input filter: ACCEPT
Output filter: ACCEPT
Routes: 1 imported, 0 exported, 0 preferred
Route change stats: received rejected filtered ignored accepted
Import updates: 2 0 0 0 4
Import withdraws: 0 0 --- 0 0
Export updates: 12 10 0 --- 2
Export withdraws: 1 --- --- --- 0
BGP state: Connect
Neighbor address: 2001:db8:104::1
Neighbor AS: 65000
Neighbor graceful restart active
LL stale timer: 32/-
The BGP connection is correctly torn down by BFD (so there is no risk of sending a hold timer expired message). Also:
2001:db8:10::1/128 via 2001:db8:204::1 on eth0.204 [R1_2 10:35:01] * (100) [i]
Type: BGP unicast univ
BGP.origin: IGP
BGP.as_path:
BGP.next_hop: 2001:db8:204::1 fe80::5254:3300:cc00:5
BGP.local_pref: 100
via 2001:db8:104::1 on eth0.104 [R1_1 11:22:51] (100s) [i]
Type: BGP unicast univ
BGP.origin: IGP
BGP.as_path:
BGP.next_hop: 2001:db8:104::1 fe80::5254:3300:6800:5
BGP.local_pref: 100
BGP.community: (65535,6)
The route gets the community and the "stale" bit. The stale route is not used to build the ECMP route, except if we only have stale routes.
2001:db8:10::1 via 2001:db8:204::1 dev eth0.204 proto bird metric 1024 pref medium
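The ECMP behaviour described above (stale paths are left out unless only stale paths exist) can be sketched as follows. This is a simplified model, not BIRD's actual code: `ecmp_next_hops` is a hypothetical helper, and each path is reduced to a next hop plus a stale flag, using the two next hops from the BIRD output.

```python
from typing import List, Tuple


def ecmp_next_hops(paths: List[Tuple[str, bool]]) -> List[str]:
    """Return the next hops to install for one prefix.

    Each path is a (next_hop, is_stale) pair. Fresh paths are used
    exclusively when available; stale ones are a fallback only.
    """
    fresh = [next_hop for next_hop, stale in paths if not stale]
    if fresh:
        return fresh
    return [next_hop for next_hop, _ in paths]


paths = [("2001:db8:204::1", False), ("2001:db8:104::1", True)]
print(ecmp_next_hops(paths))                        # ['2001:db8:204::1']
print(ecmp_next_hops([("2001:db8:104::1", True)]))  # ['2001:db8:104::1']
```

The first call matches the kernel route above, where only the next hop learnt over the live session is installed.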