Other members of the IPA team and I were pleased to attend NANOG 94, where we found lots of food, excellent drinks, great friends and colleagues, and state-of-the-art talks on many networking topics. This edition focused on datacenters, protocols, automation, platforms and, of course, AI.
AI development and adoption are still riding the hype, and they don’t seem to be slowing down any time soon; if anything, they are accelerating. One of the talks on this topic was “GenAI powered Network Automation”, which asked a very interesting question:
Can LLM Agents be Network Operators?
Network troubleshooting can (and should) follow a structured approach based on standards and best practices, which hopefully leads to a standardized methodology that applies to most vendors, systems and platforms. A structured methodology not only improves consistency in how issues are diagnosed and resolved but also enhances collaboration across teams.
By basing our efforts on established standards and industry best practices, we can reduce guesswork, minimize downtime, and increase efficiency. I mean, hey, I would really like to keep your phone from ringing with emergencies.
But we all know there will be bumps in the road. Is AI up to this task? Or can it at least give us a helping hand?
Of course, as with many new technologies over the years, there was excitement, consensus and disbelief, so we thought we would give it a try.
Preparing our test environment
Our objective here is to challenge some consumer AI tools with a very simple scenario.
Two routers running IOS-XE, a few IPv4 and IPv6 addresses, OSPF and OSPFv3 as IGPs, and iBGP between them.
By breaking protocols and the running state, we expect the agents to be able to restore the network to a functioning state, with no prior information about the network.
Configuration for both test routers is as simple as it gets.
Router 1
hostname R1
ipv6 unicast-routing
interface lo0
 ip address 192.168.1.1 255.255.255.255
 ipv6 address 2001:db8::192:168:1:1/128
 ipv6 ospf 1 area 0
interface gi1
 ip address 10.1.2.1 255.255.255.252
 ipv6 address 2001:db8::2001:10:1:2:1/64
 ipv6 ospf 1 area 0
router ospf 1
 router-id 192.168.1.1
 passive-interface default
 no passive-interface gi1
 network 192.168.1.1 0.0.0.0 area 0
 network 10.1.2.0 0.0.0.3 area 0
router ospfv3 1
 router-id 192.168.1.1
 passive-interface default
 no passive-interface gi1
Router 2
hostname R2
ipv6 unicast-routing
interface lo0
 ip address 192.168.1.2 255.255.255.255
 ipv6 address 2001:db8::192:168:1:2/128
 ipv6 ospf 1 area 0
interface gi1
 ip address 10.1.2.2 255.255.255.252
 ipv6 address 2001:db8::2001:10:1:2:2/64
 ipv6 ospf 1 area 0
router ospf 1
 router-id 192.168.1.2
 passive-interface default
 no passive-interface gi1
 network 192.168.1.2 0.0.0.0 area 0
 network 10.1.2.0 0.0.0.3 area 0
router ospfv3 1
 router-id 192.168.1.2
 passive-interface default
 no passive-interface gi1
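Before breaking anything, it is worth confirming that the link addresses in both configs really share a subnet in each address family. The following is a minimal Python sketch using only the standard library; the dictionary layout and the helper name shared_networks are our own illustration, and the addresses are copied from the gi1 interfaces above.

```python
import ipaddress

# Link addresses copied from the gi1 interfaces of R1 and R2.
LINK_ADDRESSES = {
    "R1": ["10.1.2.1/255.255.255.252", "2001:db8::2001:10:1:2:1/64"],
    "R2": ["10.1.2.2/255.255.255.252", "2001:db8::2001:10:1:2:2/64"],
}


def shared_networks(addresses_a, addresses_b):
    """Return the set of networks present on both sides of the link."""
    nets_a = {ipaddress.ip_interface(a).network for a in addresses_a}
    nets_b = {ipaddress.ip_interface(b).network for b in addresses_b}
    return nets_a & nets_b


common = shared_networks(LINK_ADDRESSES["R1"], LINK_ADDRESSES["R2"])
for net in sorted(common, key=lambda n: n.version):
    print(f"IPv{net.version} link network: {net}")
```

With these inputs the script reports one IPv4 network (a /30) and one IPv6 network (a /64), so the link is consistent for both protocols and any later failure has to come from something else.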
IPv4 and IPv6 Routing tables
The previous config puts the network in the following state. We know this, but we won’t feed the AI agents this information. We want to challenge them.
And of course, we need a prompt so the models understand what we are doing. We will use the following:
You are a network engineer. Your task is to help in tshoot and hopefully fix issues on your own. The only thing we know is something is broken between the communication between 2 routers. We dont have any knowledge of the previous state of the network, and lucky us, we can get CLI into both devices and check and change config. I dont want you to give me instructions but rather act as the engineer itself. I will just be your interface between your train of thought and the command line.
Okay, understood. I'm the network engineer, and you're my interface to the devices. My goal is to troubleshoot and fix the communication issue between these two routers, starting from a blank slate regarding their previous state. First, I need to establish a baseline. I'll start by checking the basic connectivity and configuration on both routers. My initial approach will be:
1. Verify Physical Connectivity and Interface Status: Are the interfaces up/up? Are there any obvious errors on the interfaces (CRC, input errors, etc.)?
2. Check IP Addressing: Are the interfaces on the same subnet? Is the subnet mask correct?
3. Test Basic Reachability (Ping): Can each router ping its directly connected neighbor's interface IP?
4. Check Routing Table: Do both routers have routes to reach each other's networks? Are there any default routes that might be misdirecting traffic? What routing protocols are in use, if any?
5. Inspect for ACLs or Firewall Rules: Are there any access control lists or firewall rules blocking traffic between the two routers?
6. Review Log Messages: Are there any relevant error messages in the router logs that indicate a problem?
Given this, my first step will be to log into both routers and check the interface status and IP configuration. What is the hostname or IP address of the first router you'd like me to investigate? Let's call them Router A and Router B for now. I'll need to know which one I'm on when I issue commands.
first router is R1, the other is R2
I can get cli now into R1
Excellent. I'm now on R1. My first command will be to check the interface status and IP configuration. Please issue the following command on R1: show ip interface brief This will give me a quick overview of all interfaces, their IP addresses, and their current status (up/down, administratively down).
Okay, this is a critical piece of information. On R1, the GigabitEthernet1 interface, which has the IP address 10.1.2.1, is in an "administratively down" state. This means it's been manually shut down. The other GigabitEthernet interfaces are also administratively down and unassigned. The Loopback0 interface is up, but that's not the interface connecting to R2. This is likely the root cause of the communication issue. We need to bring up the interface that connects to R2. Assuming GigabitEthernet1 is the interface intended to connect to R2 (given its IP 10.1.2.1), we need to enable it. My next step is to bring up GigabitEthernet1 on R1. Please enter the following commands on R1:
configure terminal
interface GigabitEthernet1
no shutdown
end
copy running-config startup-config (to save the change)
After this, I'll need to re-check the interface status on R1.
done, what should i do next
Great! Now that we've issued the no shutdown command on GigabitEthernet1 on R1 and saved the configuration, the immediate next step is to verify if the interface has come up. Please issue the show ip interface brief command on R1 again. I'm looking for the "Status" and "Protocol" columns for GigabitEthernet1 to both show "up". If they do, it means Layer 1 and Layer 2 are operational on R1's side.
Excellent! On R1, GigabitEthernet1 is now "up up". This means the physical layer and data link layer are operational on R1's side for that interface. Now we need to do the same investigation on R2. The communication is between two routers, so both ends of the link need to be functional. My next step is to investigate R2. Please get CLI access to R2. Once on R2, I'll need you to issue the same command we started with on R1: show ip interface brief This will give me the status of its interfaces and IP configuration, which is crucial for determining the next steps for R2.
R2 looks like this
Alright, I see I have console access to R2. Please press RETURN as prompted to get to the command prompt. Once you're at the prompt (it should look something like R2> or R2#), please issue the following command: show ip interface brief This will allow me to assess the status of R2's interfaces, similar to what we did on R1.
Okay, this is excellent news! On R2, GigabitEthernet1 is also "up up" and has the IP address 10.1.2.2. Let's summarize:
R1's GigabitEthernet1 is 10.1.2.1, status "up up".
R2's GigabitEthernet1 is 10.1.2.2, status "up up".
Both interfaces are in the same 10.1.2.0/24 subnet, and both are up at Layer 1 and Layer 2.
This suggests our physical and data link layers are working correctly between these two interfaces. Now that we have both ends of the link up and configured with IP addresses in the same subnet, the next logical step is to test basic IP reachability between them. My next step is to attempt a ping from R2 to R1's GigabitEthernet1 interface. Please issue the following command on R2:
ping 10.1.2.1
Tell me the results of the ping. This will confirm Layer 3 connectivity.
Excellent! We have 100% success rate on the ping from R2 to R1's GigabitEthernet1 interface (10.1.2.1). The !!!!! output confirms that all 5 ICMP echo requests were successful. This means that basic Layer 3 connectivity is established and working correctly between the GigabitEthernet1 interfaces of R1 and R2. The immediate communication issue between the two directly connected interfaces is resolved. Now, the initial problem description was "something is broken between the communication between 2 routers." While we've fixed the direct link, this doesn't necessarily mean all communication between the routers (e.g., between their internal networks or loopbacks) is resolved if there are other networks involved. My next step is to investigate the routing tables on both R1 and R2 to ensure they know how to reach each other's other networks (like the Loopback0 interfaces we saw earlier) or any other networks that might be behind them. Without knowledge of the network topology, I'll start with the Loopback0 interfaces as an example. First, let's check the routing table on R1. Please issue the following command on R1: show ip route I'll be looking for routes to 192.168.1.2 (R2's Loopback0) and any other expected networks.
Okay, I see R1's routing table. Looking at the output, I see the following relevant entries:
C 10.1.2.1/32 is directly connected, GigabitEthernet1: This is R1's own IP address on the link.
L 10.1.2.1/32 is directly connected, GigabitEthernet1: This is the local route for the G1 interface.
C 10.1.2.0/30 is directly connected, GigabitEthernet1: This is the directly connected network between R1 and R2.
C 192.168.1.1 is directly connected, Loopback0: This is R1's own Loopback0 interface.
O 192.168.1.2 [110/2] via 10.1.2.2, 00:11:45, GigabitEthernet1: This is a very important entry! It shows that R1 has learned a route to 192.168.1.2 (which we know is R2's Loopback0) via 10.1.2.2 (R2's GigabitEthernet1) through OSPF. The 00:11:45 indicates this route has been learned recently.
The presence of the O (OSPF) route to 192.168.1.2 on R1 is a good sign. It tells me that OSPF is likely running and exchanging routing information between R1 and R2. Now, I need to check the routing table on R2 to confirm it also has a route back to R1's Loopback0 (192.168.1.1). Please issue the following command on R2:
show ip route
This will help us confirm if two-way routing is established for these non-directly connected networks.
Excellent! Looking at R2's routing table, I see the following:
C 10.1.2.0/30 is directly connected, GigabitEthernet1: The directly connected link network.
L 10.1.2.2/32 is directly connected, GigabitEthernet1: R2's own IP on the link.
C 192.168.1.2 is directly connected, Loopback0: R2's own Loopback0 interface.
O 192.168.1.1 [110/2] via 10.1.2.1, 00:12:45, GigabitEthernet1: This confirms that R2 has also learned a route to R1's Loopback0 (192.168.1.1) via R1's GigabitEthernet1 IP (10.1.2.1) through OSPF.
This is exactly what we want to see. Both routers have routes to each other's directly connected networks and to each other's Loopback0 interfaces, and these routes are being learned via OSPF. This indicates that OSPF is functioning correctly and exchanging routes between R1 and R2. Since we've:
Brought up the interfaces on both sides.
Verified direct Layer 3 connectivity with ping.
Confirmed that OSPF is exchanging routes for the Loopback interfaces.
The next logical step is to attempt to ping from one router's Loopback0 to the other router's Loopback0. This will test end-to-end routing between the two routers beyond just the directly connected link. My next step: I'd like to attempt a ping from R1's Loopback0 to R2's Loopback0. Please issue the following command on R1:
ping 192.168.1.2 source Loopback0
Tell me the results. If this ping is successful, it means the fundamental communication between the two routers is fully restored and routing is working as expected.
Fantastic! We have a 100% success rate on the ping from R1's Loopback0 to R2's Loopback0. The !!!!! confirms all 5 echo requests were successful. This is a strong indication that the communication issue between the two routers has been resolved. We've gone from an "administratively down" interface to full end-to-end IP connectivity and verified routing protocol operation (OSPF). To summarize what we've done and the current state:
Identified Root Cause on R1: Found GigabitEthernet1 on R1 was administratively down.
Remediation on R1: Brought GigabitEthernet1 on R1 up with no shutdown.
Verification on R1: Confirmed GigabitEthernet1 on R1 was "up up".
Investigation on R2: Confirmed GigabitEthernet1 on R2 was already "up up" with an appropriate IP.
Basic Connectivity Test: Successfully pinged from R2's GigabitEthernet1 to R1's GigabitEthernet1, confirming Layer 3 reachability on the direct link.
Routing Table Verification: Confirmed both R1 and R2 had OSPF-learned routes to each other's Loopback0 interfaces.
End-to-End Connectivity Test: Successfully pinged from R1's Loopback0 to R2's Loopback0, demonstrating full routing capability between the two routers beyond the direct link.
At this point, based on the information provided and the successful pings, the core communication issue between R1 and R2 appears to be fixed. Is there anything else specific you'd like me to check, or any other aspects of their communication that were originally failing? Otherwise, my assessment is that the problem is resolved.
Gemini’s approach
Gemini just assumed we were talking about Cisco routers. Not good.
It never asked for the config or the intended state of the network. It just wanted to bring up whatever was down. But what if it was down for a good reason?
It also assumed we were running only IPv4.
It did a great job identifying the down interfaces, validating L3 reachability with pings and confirming OSPF routes.
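The concern about interfaces that are down for a good reason suggests a safer pattern: report what is administratively down and let a human decide, instead of enabling it right away. Below is a minimal Python sketch of that idea. It assumes text shaped like the output of show ip interface brief on IOS-XE; the sample text is illustrative (built from the addresses in this post, not captured from the lab), and admin_down_with_ip is a hypothetical helper name.

```python
# Illustrative sample shaped like "show ip interface brief" output.
# It is NOT a capture from the lab routers.
SAMPLE_OUTPUT = """\
Interface              IP-Address      OK? Method Status                Protocol
GigabitEthernet1       10.1.2.1        YES manual administratively down down
GigabitEthernet2       unassigned      YES unset  administratively down down
Loopback0              192.168.1.1     YES manual up                    up
"""


def admin_down_with_ip(output):
    """Return (interface, address) pairs that are shut down but addressed.

    The function only reports; it never changes device state.
    """
    found = []
    for line in output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6:
            continue  # not an interface row
        name, address = fields[0], fields[1]
        # Status may be two words, so join everything between the
        # Method column and the final Protocol column.
        status = " ".join(fields[4:-1])
        if status == "administratively down" and address != "unassigned":
            found.append((name, address))
    return found


candidates = admin_down_with_ip(SAMPLE_OUTPUT)
for name, address in candidates:
    print(f"{name} ({address}) is shut down: confirm intent before enabling")
```

Filtering on interfaces that already carry an address narrows the list to ports that were probably meant to be in service, which is the same inference both models made, but the decision to run no shutdown stays with the operator.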
Perfect. Since we don’t know anything about the previous state, we’ll treat this as a black-box scenario and go step by step, starting from basic reachability and progressively checking control plane and data plane. Let’s start with these assumptions:
Two routers, A and B.
Something’s broken between them.
I can interact with their CLI through you.
Let’s begin with Router A. Please run the following commands on Router A and share the outputs:
1. show ip interface brief
2. show ip route
3. ping <Router B interface IP> (if known, or else we’ll find it from #1)
4. show cdp neighbors or show lldp neighbors (depending on what’s supported)
Let’s gather these basics, and we’ll go from there.
Thanks. Here's what I see on Router A (R1):
All physical interfaces (Gi1 through Gi4) are administratively down. Only the loopback interface Loopback0 is up with IP 192.168.1.1. There are no routes other than the directly connected loopback. CDP and LLDP are both disabled, so we can't auto-discover neighbors. There's no interface on this router currently capable of communicating with anything external. So at this point, R1 is isolated—it’s not connected to anything, so there's no way it can reach Router B or any other device. Let’s now check Router B. Please run the same set of commands on Router B:
show ip interface brief
show ip route
show cdp neighbors or show lldp neighbors
Once I see the status of B, I’ll decide the next best step.
Excellent. Here's what we can conclude from R2:
GigabitEthernet1 is up and assigned 10.1.2.2/30.
Loopback0 is up with 192.168.1.2.
Routing table confirms that 10.1.2.0/30 is directly connected.
This router is ready and active on Gi1. Now, going back to R1, it has 10.1.2.1 configured on its GigabitEthernet1, but that interface is administratively down. That’s our problem. To bring communication up: Enable GigabitEthernet1 on R1.
Please run this on R1:
conf t
interface GigabitEthernet1
no shutdown
end
Then, confirm with:
show ip interface brief
Once that’s done, I’ll have you test connectivity.
Loopback0. That tells us:
Layer 1 and 2 are good (interface is up/up).
Layer 3 is functional (OSPF adjacency established).
OSPF is already configured and working between R1 and R2.
Now let’s validate full connectivity. From R1, please run:
ping 10.1.2.2
ping 192.168.1.2
show ip route
This will confirm end-to-end reachability and whether OSPF has populated the routing table properly. Let’s check that next.
Excellent — here’s the current status:
ICMP to both 10.1.2.2 (R2's interface) and 192.168.1.2 (R2's loopback) is successful. OSPF route to 192.168.1.2/32 is present and marked as O (intra-area), with a metric of 2. R1 has full Layer 3 connectivity to R2. OSPF adjacency is up and functioning. Conclusion: The communication issue between the two routers is resolved. The root cause was that GigabitEthernet1 on R1 was administratively shut down. Once brought up, OSPF automatically established adjacency and exchanged routes. Let me know if you want to validate any specific protocol or service over this path now — or if we consider this troubleshooting session complete.
ChatGPT’s approach
ChatGPT also assumed we were talking about Cisco routers. Are there no other platforms around?
It didn’t ask for the config or the intended state of the network either. It just wanted to bring up whatever was down.
It also assumed we were running only IPv4.
It tried to discover the network topology with CDP and LLDP, which I consider a good attempt.
It also figured out we were running OSPF, probably from the informational messages.
It did a great job identifying the down interfaces, validating L3 reachability with pings and confirming OSPF routes.
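Both models confirmed OSPF by reading route lines by eye. The same check can be sketched in Python: the example below extracts the fields of an OSPF route entry from lines in the condensed form quoted in the Gemini transcript. Real show ip route output has extra spacing and additional route codes (for example inter-area routes), so the regular expression is an assumption that would need adjusting, and parse_ospf_route is a hypothetical helper name.

```python
import re

# Matches the condensed intra-area OSPF entry quoted in the transcript,
# e.g. "O 192.168.1.2 [110/2] via 10.1.2.2, 00:11:45, GigabitEthernet1".
OSPF_ROUTE = re.compile(
    r"^O\s+(?P<prefix>\S+)\s+\[(?P<distance>\d+)/(?P<metric>\d+)\]\s+"
    r"via\s+(?P<next_hop>[\d.]+),\s+(?P<age>[\d:]+),\s+(?P<interface>\S+)$"
)

# Route lines as quoted in the transcripts above.
ROUTE_LINES = [
    "C 10.1.2.0/30 is directly connected, GigabitEthernet1",
    "O 192.168.1.2 [110/2] via 10.1.2.2, 00:11:45, GigabitEthernet1",
]


def parse_ospf_route(line):
    """Return a dict of route fields, or None if the line is not an OSPF route."""
    match = OSPF_ROUTE.match(line.strip())
    if match is None:
        return None
    route = match.groupdict()
    route["distance"] = int(route["distance"])
    route["metric"] = int(route["metric"])
    return route


ospf_routes = [r for r in map(parse_ospf_route, ROUTE_LINES) if r is not None]
for route in ospf_routes:
    print(f"{route['prefix']} via {route['next_hop']} (metric {route['metric']})")
```

Note that this only covers the IPv4 table. Since both models ignored IPv6, an equivalent check against the IPv6 routing table would be the natural next step, with a pattern adapted to that output.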
Introduction to Monitoring
Monitoring Tools
Metrics and KPIs
Network Monitoring
Server Monitoring
Application and Service Monitoring
Visualization and Reporting
Automation and Remote Actions
00:00 Intro and music
00:58 Welcome
01:24 What is monitoring?
01:50 Why monitor?
02:35 Importance of monitoring
04:41 Monitoring technologies
05:14 Agent / Agentless
06:06 SNMP
07:05 SNMP versions
08:11 WMI (Windows Management Instrumentation)
09:38 Telemetry
10:21 Monitoring software
10:49 Nagios
11:53 Zabbix
13:07 LibreNMS / Observium
14:09 PRTG
15:04 Prometheus
15:53 Grafana
16:33 Fluentd / Logstash
17:51 Graylog
19:00 ELK Stack (ElasticSearch, Logstash, Kibana)
20:13 Metrics and KPIs
20:52 Infrastructure monitoring
22:53 Network monitoring
24:08 Application and service monitoring
25:32 Automation and remote actions
26:12 Application and service analysis
27:17 Log analysis
28:13 Network traffic analysis
29:47 Outro and music
00:00 Intro and music
00:44 Welcome
01:21 What is out-of-band management
02:01 Importance of out-of-band management
03:34 Differences between in-band and out-of-band (OOB)
05:13 Logical and physical OOB
05:55 Introduction to IPMI
06:28 IPMI features
07:14 IPMI remote management functions
08:46 Virtual media, what is it?
09:37 Initial configuration and access
10:16 Standards, and not quite standards
11:04 Introduction to HPE iLO
11:38 iLO specifics
12:29 Introduction to iDRAC
12:37 iDRAC specifics
13:37 To each their own
13:47 Benefits of out-of-band management
15:01 Drawbacks of out-of-band management
16:08 Real iDRAC in production
19:15 Real iLO in production
19:48 Virtual media in production
20:50 Farewell
21:09 Outro and music
You have heard so much about cloud computing, SaaS, VPS, Cloudflare, AWS, GCP, MP3s, and "el wach stady hopen vader la computadora"
Let's briefly go over what we call cloud, what it includes, what it is all about, and why to use it
00:00 Intro – In the Cloud Age | Lords of Uptime
00:30 Welcome
01:51 What is cloud
00:00 Intro and the music you like
01:11 Welcome
01:44 What is virtualization
02:32 Advantages and benefits
Virtualization use cases
04:49 Server, Network, Storage, Desktop
05:14 Server
06:08 Classic structure
07:18 Network
08:26 Network functions
10:02 Storage
11:14 HCI (Hyperconverged Infrastructure)
12:09 Desktop
Server Virtualization
13:05 Server virtualization
13:08 Hypervisors
14:51 Paravirtualization – Emulation
17:01 Containers
19:28 Pros and cons
Network Virtualization
20:53 Network virtualization
21:16 SDN – Software Defined Networking
22:54 SD-WAN
23:44 VNF / NFV / Network Function Virtualization
25:41 Pros and cons
Storage Virtualization
27:02 Storage virtualization
Desktop Virtualization
28:10 Desktop virtualization
28:44 VMware Horizon / Citrix Virtual Desktops
29:36 Amazon WorkSpaces / Microsoft VDI
30:35 Pros and cons
Internet Organizations
02:51 Overview of Internet organizations
05:06 Ecosystem of Internet organizations
05:39 ICANN – Internet Corporation for Assigned Names and Numbers https://www.icann.org/
06:52 IANA – Internet Assigned Numbers Authority https://iana.org/
08:07 ISOC – Internet Society https://www.internetsociety.org/
09:16 IETF – Internet Engineering Task Force https://www.ietf.org/
09:59 Network operator groups: NANOG, LACNOG, NLNOG
NANOG on YouTube
@TeamNANOG
Importance of redundancy in services and networks
High-availability architectures
Fault tolerance and failover
Key components of a high-availability architecture
Redundancy strategies
00:00 Intro and music
00:53 Welcome
01:35 What is HA, or High Availability
02:59 Definition of HA
04:22 Murphy's Law
04:53 Why HA – from a technical point of view
07:07 Why HA – from a business point of view
09:32 Fault tolerance vs failover
HA Components
10:57 Components of an HA architecture
12:02 Hardware redundancy
12:48 Load balancing
14:12 Replication
15:24 Monitoring and failure detection
17:15 Disaster recovery
Network Redundancy
18:48 HA options in networking
19:26 STP and other flavors of Spanning Tree
20:27 VRRP as a gateway, and its variants
22:00 VRRP for signaling
22:52 Anycast
24:00 ECMP – Equal Cost Multi Path
25:27 EVPN VXLAN Anycast Gateway
26:48 Route Reflector redundancy
27:51 LAG and MLAG
29:29 Stacking
31:35 Virtual Chassis
Application Redundancy
32:32 HA options for applications
32:41 Kubernetes, but not everything is Kubernetes
33:18 Load balancing
34:47 Nginx, Istio, HAProxy
35:20 Hardware load balancing
35:59 Load balancing of network applications
37:51 Clustering
39:04 Replication
40:29 So which options do we have?
Compute Redundancy
41:46 Not everything is critical
41:59 VM replication
43:22 Distributed computing and HPC
44:56 Distributed computing – by hand
Conclusions
46:19 Conclusions... and where do we go from here?
47:37 Less is more
48:32 Downtime
49:48 Farewell and music
Storage, but going far beyond the hard drive in your computer.
Hardware and software RAID. NAS vs SAN. NFS, iSCSI, Fibre Channel.
Cloud storage and much more
00:00 Welcome
00:30 Types of storage
02:30 User storage
05:55 Server storage
The system is dooown!! The Internet isn't working!!
One of my files got corrupted!! The fridge isn't cooling!
Where to start?
How to properly approach a troubleshooting session, from scratch
And of course, how to turn your time and effort into $$$
00:00 Welcome
00:01:00 Types of IT problems
00:05:36 Standard problem-resolution procedure
00:09:00 Importance and urgency of problems
00:10:50 External systems – without a contract
00:15:06 External systems – with a contract
00:17:29 User equipment
00:20:11 Server and network hardware
00:24:22 Bare metal, VPS, instances, managed DBs
00:25:57 Corporate networks
00:30:56 Service networks
00:34:50 Electrical systems
00:36:51 CCTV – Printers
00:38:46 How I charge / Consulting rates
00:45:33 Know-how
00:46:27 Diagnostic techniques and commands on Linux
00:56:48 Diagnostic techniques and commands on Windows
00:59:18 Avoiding infrastructure headaches
01:01:40 Network diagnostic techniques and commands – user side
01:03:43 Network diagnostic techniques and commands – Cisco environments
01:04:55 Network diagnostic techniques and commands – Juniper environments
01:05:52 Network diagnostic techniques and commands – OSI model and general concepts
01:06:48 Conclusions