This weekend I got a call from an educational institute: the WiFi was down. They use an SDA fabric with a fabric-enabled Cisco 9800 WLC for their wireless. The incident occurred right after they made a change in DNAC to push new RADIUS servers to the WLC. The problem was that no client station could connect. Clients stayed stuck in the Associated state.
After investigating the logs on the WLC, I saw that the new RADIUS servers were not reachable from the controller. That matters because all control-plane traffic, including authentication, is handled by the WLC. On the RADIUS side, I also saw no requests arriving at these new servers.
The network administrator had figured that out as well, so he decided to roll back the change by adding back the old RADIUS server (actually a load balancer VIP with the RADIUS servers as pool members).
Fig: Design/Network settings in the GUI of their WLC:
The network administrator did not want to change the config on the SSID again, so he manually pointed the WLAN's authentication list back at the old server.
The dnac-cts entry is the reference object used by the authentication list. On the CLI it looks like this:
//aaa reference
aaa authorization network dnac-cts-eduroam-022aa653 group dnac-rGrp-eduroam-022aa653
aaa authentication dot1x dnac-cts-eduroam-022aa653 group dnac-rGrp-eduroam-022aa653
//dnac group
aaa group server radius dnac-rGrp-eduroam-022aa653
 server name dnac-radius_10.xx.x.aaa
 ip radius source-interface VlanxXx
However, DNAC had removed the old server earlier and did not add it back (pink block) when it was needed again, because at that point it was not bound to anything.
The reference from the group was still there, but the server definition itself was gone.
So we manually added back the server definition that the reference points to:
radius server dnac-radius_10.xx.x.aaa
 address ipv4 10.xx.x.aaa auth-port 1812 acct-port 1813
 timeout 4
 retransmit 3
 pac key 7 052916357543623D4D005D3C0E0A1278
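The situation described here, a server group that still names a server whose definition is gone, can be checked offline. What follows is a minimal Python sketch, not a DNAC or WLC feature. It assumes the running config has been saved as plain text (for example from `show running-config`). The function name `find_dangling_radius_servers` and the sample config are made up for illustration, using the masked names from this post.

```python
import re


def find_dangling_radius_servers(config):
    """Return {group: [server names]} for every 'aaa group server radius'
    group that references a server with no 'radius server <name>' block."""
    defined = set()      # names that have a 'radius server <name>' definition
    referenced = {}      # group name -> list of 'server name <name>' entries
    current_group = None

    for raw in config.splitlines():
        line = raw.strip()

        m = re.match(r"aaa group server radius (\S+)$", line)
        if m:
            current_group = m.group(1)
            referenced[current_group] = []
            continue

        m = re.match(r"radius server (\S+)$", line)
        if m:
            defined.add(m.group(1))
            current_group = None  # a new top-level block ends the group
            continue

        m = re.match(r"server name (\S+)$", line)
        if m and current_group is not None:
            referenced[current_group].append(m.group(1))

    dangling = {}
    for group, names in referenced.items():
        missing = [name for name in names if name not in defined]
        if missing:
            dangling[group] = missing
    return dangling


# Illustrative config only: the old group still references a server
# whose 'radius server' definition has been removed.
SAMPLE = """\
aaa group server radius dnac-rGrp-eduroam-022aa653
 server name dnac-radius_10.xx.x.aaa
 ip radius source-interface VlanxXx
aaa group server radius dnac-rGrp-eduroam-fa8affd3
 server name dnac-radius_10.xx.x.bb
 server name dnac-radius_10.xx.x.cc
radius server dnac-radius_10.xx.x.bb
 address ipv4 10.xx.x.bb auth-port 1812 acct-port 1813
radius server dnac-radius_10.xx.x.cc
 address ipv4 10.xx.x.cc auth-port 1812 acct-port 1813
"""

if __name__ == "__main__":
    for group, names in find_dangling_radius_servers(SAMPLE).items():
        print(f"{group}: missing definition for {', '.join(names)}")
```

The sketch only follows the link from server group to server definition. The same idea could be extended one level up, from the `aaa authentication` and `aaa authorization` method lists to the groups they name.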
//group not working, because the WLC can't reach these radius servers
aaa authorization network dnac-cts-eduroam-fa8affd3 group dnac-rGrp-eduroam-fa8affd3
aaa authentication dot1x dnac-cts-eduroam-fa8affd3 group dnac-rGrp-eduroam-fa8affd3
aaa group server radius dnac-rGrp-eduroam-fa8affd3
 server name dnac-radius_10.xx.x.bb
 server name dnac-radius_10.xx.x.cc
 ip radius source-interface VlanXxX
// separate global radius servers
radius server dnac-radius_10.xx.x.bb
 address ipv4 10.xx.x.bb auth-port 1812 acct-port 1813
 timeout 4
 retransmit 3
 pac key 7 08035C745D162923460E462A2F2D327A
!
radius server dnac-radius_10.xx.x.cc
 address ipv4 10.xx.x.cc auth-port 1812 acct-port 1813
 timeout 4
 retransmit 3
 pac key 7 15301B36502507107C367F0C16010051
!
Conclusion
Automated provisioning is nice, but understanding how things are glued together is key to troubleshooting when things break. Every reference needs to be checked, because what a reference points to may not be what it seems.