The case for Anycast default gateways

Or to deliver public IPv4 addressing, without breaking the bank and your routing tables.

Years passed by, technology became cheaper and more powerful, making the Internet bigger and bigger. The cloud emerged, and we saw the rise and fall of many service providers all around the globe. Now your CPE had a public IP address, maybe your printer too, someone thought it was a good idea that your fridge could have a public address to, and maybe your home cameras and your microwave and your mobile and your cat and your toilet and so on.

The network became so big, that we saw an enormous increase on the number service providers and AS numbers. Everyone needed public addresses, maybe a /18, or a /20. Some poor bastards were only delivered a /22.

At this point, people started talking about IPv4 exhaustion.

Maybe you had a small network, emailed ARIN/LACNIC asking for moar addresses!, and they told you no.

Maybe you asked your upstream for a /26 in your DIA, and they told you no.

Maybe your customers asked for public addressing, and you realized that it was just wasteful to assign a /30 to anyone. Who can afford such a waste!?

IP addresses brokers are making a shitload of money out of this. Small operators need more addresses in order to achieve sustainable growth. Most RIRs will just tell you “Dude, there is nothing left!”

On this post we are not considering standard routed customers or where we can provision a a /30 or similar into the PE equipment, and let a routing protocol do its thing.

The question is, how do you provision an interface to your residential customers, in such a way that they can have a routable public IP address on their CPE, while keeping separate broadcast domains or VLANs for customer blocks, and doing all of this without wasting addresses in the process.

You probably have a router somewhere, with an IP address which serves as a default gateway for the entire segment. Maybe this router also serves DHCP, or acts a a PPPoE server, or any IP addresses provisioning method. How do achieve an efficient IP addressing schema, efficient route aggregation, and efficient layer 2 segmentation.

Yeah it would be easy to take a /21 of publics and putting them all on a VLAN. We’ll cover this option on next articles.

Many roads to the same place

For us small and medium operators, most typical efforts in IP addresses saving involves some sort of layer 2 extension, or subnetting into smaller blocks. Let’s look at some of these alternatives.

Big subnet, VLAN your way out, single access router

Simple enough. Put an entire /24, you will lose 3 addresses on the network, broadcast, and default gateway. Extend your customer VLAN over all the required switches in between.

The good part about this is that you will make an efficient usage of your addresses by only losing 3 out of 255, which I guess is a decent tradeoff.

Of course, this is an administration nightmare. Huge broadcast domain, VLANs that split over multiple switches in several locations, a strong requirement for DHCP snooping to prevent rouge servers, and a big STP tree to take care of.

Small subnets, VLANs everywhere, multiple access router

The obvious alternative is the exact opposite to the one we just saw. Let’s take a /24, split it into decent chunks, and put them into their own VLAN, which will be targeted into a separate access router.

This approach allows to segment the huge broadcast domain into several smaller VLANs, enabling us to keep possible broadcasts isolated into their own domain. You can also run MST (or PVST maybe) on top of it, to isolate loops into single instances of STP, instead of having a big spanning tree covering everything.

Even if this looks better, there is an obvious tradeoff. For every subnet, we still lose 3 addresses.

3 in 255 is not so much. If you split that into, let’s say /26s, you will need 3 for every network, broadcast and gateway address in every subnet. 3 addresses for 4 /26s subnet, is 12 wasted addresses. And 12 in 255 is not a minor thing. Specially once you remember that you have to pay a fee to your RIR for every address, every year.

Big subnet, VPLS your way out, single access router

This is a similar approach as the first one, where we used a big single layer 2 domain. We can make this layer 2 segment over VPLS tunnels, which will extend layer 2 using a MPLS overlay.

The IP addressing here can be the same used on the first scenario. A big /24 with a default gateway, thus losing 3 addresses out of 25.

This is a commonly used alternative which works, although I am not a fan of it. On this topology can run everything over a single (and different) transit VLANs between every router, but there are some requirements.

You will need to put layer 3 capable CPEs on every customer. Those CPEs will need to talk extended MTUs, be able to run MPLS/VPLS, and a routing protocol to readvertise their loopbacks into the aggregator router, to be able to terminate VPLS tunnels. And of course as we are making a big layer 2 domain, you still have to consider DHCP snooping, and possible loops.

An alternative without reinventing the wheel

The main focus is to avoid wasting IP addresses. Readers of this humble blog are medium size operators, and every penny spent on IP addressing can make a huge difference on a long term. Hopefully, proving that we are capable of making a difference on making a smart usage of addresses, can help us to present a business case to ARIN/LACNIC/whoever, to successfully request and get additional blocks of public addresses.

MPLS/VPLS solutions can work on top of this, but many operators do not have gear capable of talking such protocols, either because they are offloading their access layer into feature-limited software solutions, or because their hardware needs to be licensed to be MPLS capable. Also, many operators don’t have skilled MPLS engineers to design and support such network topologies.

Is there a way we can accomplish this in an easier way?

Anycast at rescue

Anycast is a network addressing and routing methodology in which a single destination IP address is shared by devices in multiple locations. Anycast usually comes to mind when we think about CDNs, DNS servers, and any destination that has to be present on multiple locations at the same time, on a same IP address.

For our purposes, our destination will be our default gateway.

An anycast IP address is not related at all with the kind of service present on the upper layers, so we can easily provide network services over an anycast address.

Surely you are familiar with CloudFlare 1.1.1.1 or Google 8.8.8.8, which are anycast DNS services. The 8.8.8.8 I ping from home is probably not the same 8.8.8.8 you can ping from your end.

Think twice about this. Anycast means we will have duplicate IP addresses in our network, by design. This is not VRRP or any kind of HSRP where you can have active/passive addresses. This is in fact, using the same IP address over many devices, on active interfaces.

Our new topology

On this scenario, we will share a default gateway on multiple devices.

The 10.10.10.254/24 address will be present and active on two routers, which face different layer 2 segments on the inside. For this example, I’m using a PE simulating DIA services on one side, and a BNG or PPPoE concentrator, to simulate residential services like FTTH or unlicensed wireless access.

Our objective here is to able to handoff /24 addressing to customers, so they can use the 10.10.10.254 address as their default gateway.

This address won’t be installed on any routing protocols, and instead, we are looking to advertise /32 (or bigger summaries) v4 prefixes over BGP, install them on the core router, and be able to advertise a single /24 summary from the core to the rest of the network. This core will be in fact, an aggregator too.

By doing this, the entire /24 can be subnetted up /32 advertisements for single hosts, or any other subnet like /25 or /26 for access subnets, while all the addresses usable inside those subnets. We can think of them as pools of addresses instead of subnets.

For example, 10.10.10.0-10.10.10.61 is the first /24, but customers inside will use a /24 submask, to be able to reach 10.10.10.254 on the AGG/BNG.

I will illustrate examples on both IOS and MikroTik platforms on the following steps.

GNS3 Topology

Our lab topology looks as follows. CORE, AGG and PE are all Cisco CSR10000v 16.12.03, and the BNG is Mikrotik CHR 6.48.6.

Core router

This will act as BGP RR for AS 65000. For sake of simplicity, all interfaces are on OSPF area 0 to reditribute loopbacks.

interface Loopback0
 ip address 10.1.1.1 255.255.255.255
!
interface GigabitEthernet1
 ip address 10.255.255.5 255.255.255.252
 negotiation auto
 no mop enabled
 no mop sysid
!
interface GigabitEthernet2
 ip address 10.255.255.1 255.255.255.252
 negotiation auto
 no mop enabled
 no mop sysid
!
router ospf 1
 router-id 10.1.1.1
 passive-interface default
 no passive-interface GigabitEthernet1
 no passive-interface GigabitEthernet2
 network 10.1.1.1 0.0.0.0 area 0
 network 10.255.255.0 0.0.0.3 area 0
 network 10.255.255.4 0.0.0.3 area 0
!
router bgp 65000
 bgp router-id 10.1.1.1
 bgp log-neighbor-changes
 neighbor 10.10.1.2 remote-as 65000
 neighbor 10.10.1.2 update-source Loopback0
 neighbor 10.10.1.3 remote-as 65000
 neighbor 10.10.1.3 update-source Loopback0
!

BNG Aggregation Router

This node will act as BGP RR client for AS 65000. As with the core, we are peering between loopbacks. I highlighted the default 10.10.10.254 gateway on ether2, which faces the VPCS host behind.

/interface bridge
add name=lo0
/routing bgp instance
set default as=65000 redistribute-static=yes
/routing ospf instance
set [ find default=yes ] router-id=10.10.10.2
/ip address
add address=10.10.1.3 interface=lo0 network=10.10.1.3
add address=10.255.255.2/30 interface=ether1 network=10.255.255.0
add address=10.10.10.254/24 interface=ether2 network=10.10.10.0
/routing bgp peer
add name=peer1 remote-address=10.1.1.1 remote-as=65000 update-source=lo0
/routing ospf interface
add passive=yes
add interface=ether1
/routing ospf network
add area=backbone network=10.255.255.0/30
add area=backbone network=10.10.1.3/32

Verifiying routing protocols

At this point, we should have a full neighborship relation on OSPF, sucessful loopbacks redistribution, and established peerings between CORE and BNG.

CORE#sh ip ospf neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
10.10.1.3        1   FULL/BDR        00:00:34    10.255.255.2    GigabitEthernet2
10.10.1.2         1   FULL/BDR        00:00:37    10.255.255.6    GigabitEthernet1
CORE#sh ip bgp summ
CORE#sh ip bgp summary
BGP router identifier 10.1.1.1, local AS number 65000
BGP table version is 2, main routing table version 2
1 network entries using 248 bytes of memory
1 path entries using 136 bytes of memory
1/1 BGP path/bestpath attribute entries using 288 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 672 total bytes of memory
BGP activity 1/0 prefixes, 1/0 paths, scan interval 60 secs
1 networks peaked at 23:11:52 Aug 20 2022 UTC (00:09:10.360 ago)

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.10.1.2       4        65000     107     108        2    0    0 01:33:52        0
10.10.1.3       4        65000     100      96        2    0    0 01:25:23        1

CORE#sh ip ro
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, m - OMP
       n - NAT, Ni - NAT inside, No - NAT outside, Nd - NAT DIA
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       H - NHRP, G - NHRP registered, g - NHRP registration summary
       o - ODR, P - periodic downloaded static route, l - LISP
       a - application route
       + - replicated route, % - next hop override, p - overrides from PfR

Gateway of last resort is not set

      10.0.0.0/8 is variably subnetted, 8 subnets, 2 masks
C        10.1.1.1/32 is directly connected, Loopback0
O        10.10.1.2/32 [110/2] via 10.255.255.6, 01:34:00, GigabitEthernet1
O        10.10.1.3/32 [110/11] via 10.255.255.2, 01:25:28, GigabitEthernet2
B        10.10.10.2/32 [200/0] via 10.10.1.3, 00:09:13
C        10.255.255.0/30 is directly connected, GigabitEthernet2
L        10.255.255.1/32 is directly connected, GigabitEthernet2
C        10.255.255.4/30 is directly connected, GigabitEthernet1
L        10.255.255.5/32 is directly connected, GigabitEthernet1

BNG Route Advertisements

We will handle advertisements with static routes using the interface as the gateway for the desired hosts. Other methods are valid, like redistributing connected routes for the case of PPPoE interface sessions.

/routing bgp instance
set default as=65000 redistribute-static=yes

/ip route
add distance=1 dst-address=10.10.10.2/32 gateway=ether2

On the CORE side, this static route advertisement will look like this.

CORE#sh ip bgp
BGP table version is 4, local router ID is 10.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path, L long-lived-stale,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>i  10.10.10.2/32    10.10.1.3                     100      0 ?

PC2 VPCS Config

I’ll assign 10.10.10.2/24 to this host.

PC2> show ip

NAME        : PC2[1]
IP/MASK     : 10.10.10.2/24
GATEWAY     : 10.10.10.254
DNS         :
MAC         : 00:50:79:66:68:01
LPORT       : 20088
RHOST:PORT  : 127.0.0.1:20089
MTU         : 1500

Do we have a sucessful routing to the outside at this point? Let’s run a traceroute.

PC2> trace 10.1.1.1 -P 1
trace to 10.1.1.1, 8 hops max (ICMP), press Ctrl+C to stop
 1   10.10.10.254   0.941 ms  0.750 ms  0.828 ms
 2   10.1.1.1   2.662 ms  1.719 ms  1.811 ms

Awesome! At this point we have built a sucessful routed network – altough – this is nothing out of the ordinary.

Reusing default gateways

The VPCS host used 10.10.10.2/24 on a single layer 2 segment. Let’s consider the PE scenario, where we will asign 10.10.10.1/24, to another host, behind a PE, behing a totally different aggregator router. We want to keep 10.10.10.254/24 as the default gateway here.

AGG Config

The AGG router config is almost the same as the BNG. We are adding a vlan77 to manage the CPE behind the AGG.

interface Loopback0
 ip address 10.10.1.2 255.255.255.0
!
interface GigabitEthernet1
 ip address 10.255.255.6 255.255.255.252
 negotiation auto
!

router ospf 1
 router-id 10.10.1.2
 passive-interface default
 no passive-interface GigabitEthernet1
 network 10.10.1.2 0.0.0.0 area 0
 network 10.255.255.4 0.0.0.3 area 0
!
router bgp 65000
 bgp router-id 10.10.1.2
 bgp log-neighbor-changes
 neighbor 10.1.1.1 remote-as 65000
 neighbor 10.1.1.1 update-source Loopback0
!

PE Config

PE configuration is dead simple. An address on vlan16 for management, and a bridge-domain gi1 and gi4.

bridge-domain 1
 member GigabitEthernet1 service-instance 1
 member GigabitEthernet4 service-instance 1
!
interface GigabitEthernet1
 no ip address
 service instance 1 ethernet
  encapsulation untagged
 !
!
interface GigabitEthernet1.16
 encapsulation dot1Q 16
 ip address 172.16.100.100 255.255.0.0
!
interface GigabitEthernet4
 no ip address
 service instance 1 ethernet
  encapsulation untagged
 !
!

AGG BGP Advertisements

Here we are doing the same we did on the BNG; adding a static route for the destination host, and static redistribution under BGP.

ip route 10.10.10.1 255.255.255.255 GigabitEthernet2
!
router bgp 65000
 redistribute static

Verifying

At this point, we should see sucessful routing from the AGG to the CORE.

CORE#sh ip os neighbor

Neighbor ID     Pri   State           Dead Time   Address         Interface
10.10.1.3         1   FULL/BDR        00:00:33    10.255.255.2    GigabitEthernet2
10.10.1.2         1   FULL/BDR        00:00:37    10.255.255.6    GigabitEthernet1
CORE#sh ip bgp summ
CORE#sh ip bgp summary
BGP router identifier 10.1.1.1, local AS number 65000
BGP table version is 5, main routing table version 5
2 network entries using 496 bytes of memory
2 path entries using 272 bytes of memory
2/2 BGP path/bestpath attribute entries using 576 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 1344 total bytes of memory
BGP activity 2/0 prefixes, 2/0 paths, scan interval 60 secs
2 networks peaked at 19:14:11 Aug 21 2022 UTC (00:05:04.304 ago)

Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.10.1.2       4        65000    1427    1425        5    0    0 21:32:05        1
10.10.1.3       4        65000    1465    1416        5    0    0 21:23:36        1

CORE#sh ip bgp nei
CORE#sh ip bgp neighbors 10.10.1.2 re
CORE#sh ip bgp neighbors 10.10.1.2 rou
CORE#sh ip bgp neighbors 10.10.1.2 routes
BGP table version is 5, local router ID is 10.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
              t secondary path, L long-lived-stale,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>i  10.10.10.1/32    10.10.1.2                0    100      0 ?

Total number of prefixes 1
CORE#sh ip ro
Codes: L - local, C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2, m - OMP
       n - NAT, Ni - NAT inside, No - NAT outside, Nd - NAT DIA
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       H - NHRP, G - NHRP registered, g - NHRP registration summary
       o - ODR, P - periodic downloaded static route, l - LISP
       a - application route
       + - replicated route, % - next hop override, p - overrides from PfR

Gateway of last resort is not set

      10.0.0.0/8 is variably subnetted, 9 subnets, 2 masks
C        10.1.1.1/32 is directly connected, Loopback0
O        10.10.1.2/32 [110/2] via 10.255.255.6, 21:32:25, GigabitEthernet1
O        10.10.1.3/32 [110/11] via 10.255.255.2, 19:55:51, GigabitEthernet2
B        10.10.10.1/32 [200/0] via 10.10.1.2, 00:05:19
B        10.10.10.2/32 [200/0] via 10.10.1.3, 19:55:46
C        10.255.255.0/30 is directly connected, GigabitEthernet2
L        10.255.255.1/32 is directly connected, GigabitEthernet2
C        10.255.255.4/30 is directly connected, GigabitEthernet1
L        10.255.255.5/32 is directly connected, GigabitEthernet1

Ok, routes are there, how about reachability?

CORE#traceroute 10.10.10.1 source lo0
Type escape sequence to abort.
Tracing the route to 10.10.10.1
VRF info: (vrf in name/id, vrf out name/id)
1 10.255.255.6 4 msec 2 msec 1 msec
2 10.10.10.1 14 msec 4 msec 3 msec
CORE#

Looks good, and from the end host?

PC1> trace 10.1.1.1 -P 1
trace to 10.1.1.1, 8 hops max (ICMP), press Ctrl+C to stop
1 10.10.10.254 2.303 ms 1.750 ms 1.438 ms
2 10.1.1.1 2.495 ms 2.000 ms 1.838 ms

The first hop in path is the same as before, 10.10.10.254. We are resuing the same address on different routers, while keeping routing intact.

Stay tuned for the upcoming post where the BNG will act as a proper PPPoE termination point, and doing all of this dynamically without operator intervention.

UFiber v4 Python Client

Yeah, I finally remembered to make a post about this. I know it will like as a copy-paste of the previous one, because, in fact it is.

Ok, if you have been following the series, you should already know that I equally love and hate UFiber OLTs. They are affordable, deliver a lot of bang for the buck, and have an awful GUI.

Well, the GUI is lovely on v4.

Python in the middle

I wrote a quick and dirty client which acts as a sort of middleware between the HTTP inteface of the OLT and you.

It allows to provision non existing ONUs, GPON profiles, WiFi profiles, retrieve active ONU status and general configuration.

Take a look to it on https://github.com/baldoarturo/ufiber-client-4, and feel free to contribute if you want to.

How to help

It would be awesome to have docs 😀

Are you a pydoc master? Let’a add docstrings.

Do you have an OLT for me to test? Ping me and we can set up a VPN.

olt.py

This is the core of the project. It uses the OLTClient class to provide a middleware between you and the HTTP interface of the OLT.

Initialize a new OLTClient instance with:

client = OLTClient(host='192.168.1.1', username='ubnt', password='ubnt', debug_level=logging.DEBUG)

Required params are only host, and credentials.

The initialization will handle the login for you, altough you can call the login() method manually.

If the OLT is network reachable, and you have provided the right credentials, and the OLT GUI is alive and well, you should be ready to start.

What changes on v4

Well, UBNT got rid of the GPON profiles. 🙁

This software is intented to give you an alternative by keeping profiles as JSON in the ./profiles folder.

You can copy the template.json file and make your way using it as a starting point. It should be self descriptive.

There is an schema.json which validates your profile before pushing changes into the OLT.

UFiber Python Client

Ok, if you have been following the series, you should already know that I equally love and hate UFiber OLTs. They are affordable, deliver a lot of bang for the buck, and have an awful GUI.

Please, be aware that this can change for better or worse in the future, and at the time I’m writing this the latest firmware is v3.1.3. I trust in you UBNT, hope you can sort out this and give us a better product. I’ll keep my fingers crossed.

Python in the middle

I wrote a quick and dirty client which acts as a sort of middleware between the HTTP inteface of the OLT and you.

It allows to provision non existing ONUs, GPON profiles, WiFi profiles, retrieve active ONU status and general configuration.

Take a look to it on https://github.com/baldoarturo/ufiber-client, and feel free to contribute if you want to.

Edited on Aug 15 2020: I did the same for firmware version 4, which is cleaner and fixes a lot of bugs. Stay tuned!

ufiber-client

This is a quick dirty project built to provide a quick dirty client for Ubiquiti UFiber OLTs, using firmware version 3.x

There is also a CLI attempt, but I couldn’t find any ready to use packages to build a decent CLI.

More info about what am I doing this is on the following entries:

olt.py

This is the core of the project. It uses the OLTCLient class to provide a middleware between you and the HTTP interface of the olt.

Initialize a new OLTClient instance with:

client = olt.OLTClient(host, username, password)

The initialization will handle the login for you, altough you can call the login() method manually.

If the OLT is network reacheable, and you have provided the right credentials, and the OLT WEB GUI is alive and well, you should be ready to start.

You can also connect using cli.py:

$ /cli.py
UFiber Client for fw version 3.1.3
UFiber> help

Documented commands (type help <topic>):
========================================
connect  help  onu  quit  show

UFiber> connect 10.20.0.101
Username:admin
Password:
Logging to 10.20.0.101 ...
Connection OK
UFiber>

UFiber OLT API

In a previous post we took a quick look to the Ubiquiti UFiber OLT. As always, UBNT tries to offer a non expensive solution to provide last-mile conectivity for end users. I am using non-expensive because UBNT gear is not cheap. Yeah, it can be affordable, but you only get what you pay for.

We saw that the command line is very limited, even when the software is a fork of Vyatta. There is no way to get ONUs provisioned from the command line, so forgot about Ansible (we love Ansible), netmiko, and other SSH clients tools to ease your life.

UBNT wants you to use the web GUI, period. They offer a dockerized management system called UNMS, which really comes handy after you have provisioned your customers.

Both you and me, as network operators, know that provisioning customers is one of the more boring tasks, but is is still a critical one. Fast and precise provisioning translates in more customers, more stability, faster troubleshooting, and peace of mind.

Yeah, SONAR exists, but not all operators can work with their pricing and technology supports. And don’t even think to integrate billing if you are using electronic invoicing with AFIP in Argentina.

If you are still here, don’t give up. If there is a will, there is a way.

Under the hood

The OLT has a web GUI served by HTTPS, with a self-signed certificate, on port 443. There is no easy way to use a proper certificate here, but well, it’s something.

If you are not authenticated, this is what waits for you in the URL root.

I want to know if this is a standard HTML form. And indeed it is.

<form id="LoginForm" method="post" class="ui-form">
    <input id="Username" name="username" class="text-input" type="text" placeholder="Username" autocapitalize="off" autocorrection="off">
    <input id="Password" name="password" class="text-input" type="password" placeholder="Password">
    <input id="LoginButton" class="submit-input ui-button ui-widget ui-state-default ui-corner-all" type="submit" value="Login" role="button" aria-disabled="false">
</form>

What happens when we log in? I’m using Chrome version 81 and something, let’s open devtools to see the network activity.

General

Request URL:
https://x.x.x.x/
Request Method:
POST
Status Code:
303 See Other
Remote Address:
x.x.x.x:443

Request Headers

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,/;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: es,en;q=0.9
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 32
Content-Type: application/x-www-form-urlencoded
Cookie: PHPSESSID=5da99950e9f74ad8b727f219c9e41d76; X-CSRF-TOKEN=9f0c78e2ea8994b39834e0241466c21b68a28df59bf98364ece91dcd183bdab5; beaker.session.id=29fdb5243db8446f81f75587c9c2a722
DNT: 1
Host: x.x.x.x
Origin: https://x.x.x.x
Referer: https://x.x.x.x/
Sec-Fetch-Dest: document
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: same-origin
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36

Response

Content-Length: 0
Content-Type: text/html; charset=UTF-8
Date: Fri, 02 Jan 2015 08:54:30 GMT
Location: https://x.x.x.x/
Server: Server
Set-Cookie: PHPSESSID=a24b5cbbd6874a1eb09c2d086a93efc6
Set-Cookie: X-CSRF-TOKEN=6f13035a0b7aa4b375e6798c7c60f12e805ecea8c74a3306da81c710e6a3701b
Set-Cookie: beaker.session.id=a24b5cbbd6874a1eb09c2d086a93efc6; httponly; Path=/; secure

Form Data

username: ubnt
password: ubnt

So, this is a standard POST. And we got a cookie.

This can be translated to Python by using the request module.

host = 'olt.ubnt'
url = 'https://{host}'.format(host=host)

USER_AGENT = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'

HEADER_FORM_URLENCODED = {
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': USER_AGENT,
}

form_data = {
    'username': username,
    'password': password,
}
response = requests.post(
    verify=False,
    url=url,
    headers=HEADER_FORM_URLENCODED,
    data=form_data
)

Good enough for me. Of course this should be into a try/catch structure, but ymmv.

Shut up and take my ONUs

So, we are logged in. What about ONU configuration? The GUI allows to update firware, but this is done automatically starting from software version 3.1.3. We can also set many parameters of the ONU configuration, and provision them via OMCI, but the ONU should already exist in the ONU list.

There is no way to add non-existing ONUs to the configuration, which makes pre-provisioning impossible.

What happens when we click on save?

A POST request is made, passing an interesting payload to the /api/edge/batch.json endpoint. Let’s see how does it looks like.

This POST puts a payload with all the ONU configuration, and a bit more. It uses a JSON structure which looks like this.

{
    "SET": {
        "onu-list": {
            "UBNTffffffff": {
                "disable": "false",
                "profile": "profile-2",
                "name": "ARTURO TEST",
                "wifi": {
                    "provisioned": false,
                    "enabled": true,
                    "ssid": "UBNT-ONU",
                    "hide-ssid": false,
                    "auth-mode": "wpa2psk",
                    "wpapsk": "",
                    "channel": "auto",
                    "channel-width": "20/40",
                    "tx-power": "100"
                },
                "pppoe-mode": "auto",
                "pppoe-user": "ARTURO",
                "pppoe-password": "ARTURO",
                "wan-address": "null",
                "port-forwards": []
            }
        }
    }
}

Of course this is a fake ONU with a UBNTffffffff serial number. Yeah, we can fool the GUI and send whatever values we want.

This comes real handy because you can pre-provision ONUs before they show up in the PON port.

As an ISP, this means a time saver, because you can deliver ONUs ready to plug and play, and the OLT will hand out all the configuration without further intervention.

The web GUI is handy (altough not so stable), but it really doesn’t makes sense to have to manually provision each ONU when they are connected in the PON port.
Technicians have to spend to a lot time on customers houses waiting for the NOC to configure each new customer. This is no-bueno in pandemic times. You want to install as many new customers as possible, as fast as possible, and staying in foreign homes as less as possible.

It seems it’s time for me to code something.

If you are reading this, Robert Pera, please make me a CLI.

Digging into Ubiquiti’s UFiber OLT

As some of you might know, currently I’m working as a network engineer on a medium size ISP. The company had a long history working as a WISP, and in later times they moved into FTTH, trying several vendors among the lead players of the industry.

As some of you might also know, Argentina has a history of economic meltdowns, currency devaluations and import restrictions. Considering this, the best solution to implement a network here is usually the one you can afford, which can provide the performance you need, and over all things, the one you will be able to keep buying in the future.

So, considering all these factors, when planning for a GPON network for a medium size operator…while trying to keep costs low for both the company and customers:

It really doesn’t matter if Calix supports XGS-PON technologies…
Or if Huawei gear is compatible with almost everything…
Or if Furukawa Electric has some great management software…

The real questions to ask were:

Can the company afford the OLTs, and the ONUs for the planned customer base?
Will they be in the market in the years to come?

Enter Ubiquiti UFiber

UFiber offers internet and telecom service providers a cost‑effective fiber optic delivery system for Triple Play Services (data, voice, IPTV/VoD) with speeds of up to 2.488 Gbps downstream and 1.244 Gbps upstream.

OLTs come with dual hot-swap power supplies, 4 and 8 PON ports versions. Every PON port supports 128 CPEs, 20 Km maximum range. The uplinks are two SFP+, which can with in LACP.

The ONUs options, at the time when I’m writing, are:

UFiber Nano – one PON (of course), one Gigabit Ethernet, a fancy LCD display. Passive PoE powered.
UFiber Loco – a PON, a Giga Eth, passive PoE powered or external micro USB power.
UFiber Wifi, like above, but with 4 Giga Ethernet ports, and a 802.11n interface.
UFiber Instant, a nice SFP ONU.

Ok, sounds nice. How do we manage them? There is a web GUI…

Once logged in, the GUI has a nice dashboard which looks like this. And it crashes from time to time.

But this not EdgeOS, the OLT is a different product! Let’s ssh into it to get the real feel.

ssh admin@olt
The authenticity of host 'olt (olt)' can't be established.
ECDSA key fingerprint is SHA256:thnWRB2bImsdNuu1ar74GryFwv5r7PoHJsHhJOkHnCQ.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'olt' (ECDSA) to the list of known hosts.
Welcome to EdgeOS
By logging in, accessing, or using the Ubiquiti product, you
acknowledge that you have read and understood the Ubiquiti
License Agreement (available in the Web UI at, by default,
http://192.168.1.1) and agree to be bound by its terms.
admin@olt's password:
Linux olt 4.4.159+ #1 SMP Fri Feb 22 15:28:22 UTC 2019 mips
Welcome to EdgeOS
Last login: Tue May 26 15:50:16 2020 from 190.211.80.70
admin@olt:~$

Ok, this is definitely EdgeOS. So we got a fully featured command line interface, with commands similar to Juniper JunOS.

admin@olt:~$ show configuration | display set
-vbash: display: command not found

Ok, maybe the command line is not so-fully-featured. No worries, I’ll write Ansible playbooks to manage the OLTs anyway. Most of the configuration is Juniper-like, so all I need at this moment is find out how to configure GPON profiles, and provision ONUs.

admin@olt:~$ show configuration | match onu
admin@olt:~$ show configuration | match profile
admin@olt:~$ show configuration | match gpon
gpon {

There you are! Let’s go into configuration mode.

admin@olt:~$ configure
[edit]
admin@olt# show system gpon
isolation enable
mtu 1518
[edit]
admin@olt#

Well, this is awkward. There is nothing about GPON in the command line. Neither in the working configuration, and of course being Ubiquiti, there are no command line manuals.

Love the smell of undocumented commands in the morning

So, I saw two interesting lines before: Linux olt 4.4.15, and -vbash: display: command not found, which tells me this is Linux, not BSD as in Junos, and we have bash.

admin@olt:~$
Possible completions:
  add           Add an object to a service
  clear         Clear system information
  configure     Enter configure mode
  connect       Establish a connection
  copy          Copy data
  delete        Delete a file
  disconnect    Take down a connection
  generate      Generate an object
  initial-setup Enter initial configuration dialog
  no            Disable or reset operational variable
  ping          Send Internet Control Message Protocol (ICMP) echo request
  ping6         Send IPv6 Internet Control Message Protocol (ICMP) echo request
  reboot        Reboot the system
  release       Release specified variable
  rename        Re-name something.
  renew         Renew specified variable
  reset         Reset a service
  restart       Restart a service
  set           Set system or shell options
  show          Show system information
  shutdown      Shutdown the system
  telnet        Telnet to <hostname|IPv4 address>
  terminal      Control terminal behaviors
  traceroute    Track network path to <hostname|IPv4 address>
  traceroute6   Track network path to <hostname|IPv6 address>

No signs of bourne again shells in the horizon. Does my magic have any power here?

admin@olt:~$ sh
sh-4.4$ whoami
admin
sh-4.4$ sudo su
root@olt:/home/admin#

Finally a decent shell. Which world is this?

root@olt:~# uname -a
Linux olt 4.4.159+ #1 SMP Fri Feb 22 15:28:22 UTC 2019 mips GNU/Linux
root@olt:~# ls -l /etc/ | grep apt
drwxr-xr-x 6 root root 117 Feb 22 2019 apt

We have apt, so this is a Debian world. I checked on /etc/apt/ and there are no repositories, but I am sure I could run cowsay on this. But the fun can wait.

Where is my GPON configuration? It should say “onu” somewhere.

root@olt:/# grep -r "onu" / | more
grep: /proc/sys/net/ipv4/route/flush: Permission denied
/config/onu_config.json: "onu-policies": {
/config/onu_config.json: "onu-list": {
/config/onu_config.json: "onu-profiles": {
/home/admin/.history:show configuration | match onu
/home/admin/.history:show configuration | match onu
Binary file /lib/mipsel-linux-gnu/libbsd.so.0.8.3 matches
Binary file /lib/mipsel-linux-gnu/libnss_hesiod-2.24.so matches
Binary file /lib/udev/hwdb.bin matches
/lib/udev/hwdb.d/20-OUI.hwdb: ID_OUI_FROM_DATABASE=Monument Labs, Inc.
/lib/udev/hwdb.d/20-OUI.hwdb: ID_OUI_FROM_DATABASE=Optical Zonu Corporation
/lib/udev/hwdb.d/20-OUI.hwdb: ID_OUI_FROM_DATABASE=Presonus Corporation
/lib/udev/hwdb.d/20-usb-vendor-model.hwdb: ID_VENDOR_FROM_DATABASE=PreSonus Audio Electronics, Inc.
Binary file /opt/bcm68620/bcm68620_appl.bin matches
Binary file /opt/bcm68620/bcm_dev_ctrl_linux.ko matches
Binary file /opt/bcm68620/bcm_user_appl matches
/opt/vyatta/share/vyatta-cfg/templates/system/gpon/logging/module/node.def:syntax:expression: $VAR(@) in "main", "oltsys", "onu", "session", "events", "mon_th", "sdk"
/opt/vyatta/share/vyatta-cfg/templates/system/gpon/logging/module/node.def:allowed: echo main oltsys onu session events mon_th sdk

I bolded the interesting information.

There is a /config directory, which has a JSON file called onu_config.json
The operating system, is in fact, Vyatta.

If you are curious, this is the content of /config. We will dig deeper on the next article.

root@olt:/# ls -l /config
total 200
-rw-rw-r-- 1 root vyattacfg 3336 Jan 1 2015 2020
drwxrwsr-x 1 root vyattacfg 160 Feb 22 2019 auth
-rw-rw-r-- 1 root vyattacfg 3882 May 26 11:59 config.boot
-rw-r----- 1 root vyattacfg 2402 Dec 31 2014 config.boot.2015-01-01-0001.pre-migration
-rw-r----- 1 root vyattacfg 3151 Apr 13 2015 config.boot.2015-04-14-0130.pre-migration
-rw------- 1 root vyattacfg 187285 May 26 16:14 onu_config.json
drwxrwsr-x 1 root vyattacfg 232 Feb 22 2019 scripts
drwxr-sr-x 2 root vyattacfg 232 Dec 31 2014 snmp
drwxrwsr-x 1 root vyattacfg 160 Feb 22 2019 support
drwxr-xr-x 1 root root 160 Oct 29 2018 udapi-bridge
drwxrwsr-x 1 root vyattacfg 160 Feb 22 2019 user-data
drwxr-sr-x 3 www-data vyattacfg 224 Dec 31 2014 wizard

Using RSYNC with Ansible

The past week I found myself in a situation where I had to copy a directory to a remote SMB share, using it as a backup destination.

I didn’t had a login to the remote server, just a share and credentials for it, so the easiest way to sync all the data was to use rsync.

After I coded a small bash script to execute rsync, the business requirements changed, and this storage was indented to be used as an “offline” backup. Of course the best way to execute an offline copy is to set up an intermediate host, with the following steps:

Mount a share from the intermediate server
Copy the data to this share
Unmount the share
Mount the share in the destination server
Copy the data from the share
Unmount the share

By using an intermediate server, the source host of the data and the backup destination are never directly connected, meaning that a compromised origin server has no way to directly compromise the destination server, in the worst case scenario.

At the moment the intermediate server is waiting to be deployed, to I had to wrote a quick Ansible playbook to mount the remote share, copy the data, and unmount the share after the copy.

Instead of running rsync for the first copy, I suggest to run a standard copy because there is nothing to compare on the destination, and we will save some time and bandwidth.

An email notification was added to the playbook to get feedback about the synchronization result, as it was syncing about 1TB of data over a slow WAN link.

---
- hosts: remote_server
  gather_facts: no
  become: yes

  tasks:

    - name: Mount external storage
      mount:
        src: //this_is_a_smb_path/on_another_server
        path: /srv/external
        state: mounted
        fstype: cifs
        opts: username=myuser,password=mypass

    - name: Rsync /srv/data to /srv/external
      synchronize:
        archive: yes
        compress: yes
        src: /srv/data
        dest: /srv/external
      delegate_to: remote_server
      register: sync

    - name: Unmount external storage
      mount:
        src: //this_is_a_smb_path/on_another_server
        path: /srv/external
        state: unmounted

    - name: Send e-mail
      mail:
        host: my.smtp.server
        port: 25
        subject: Ansible Backup Report
        body: "Backup status is {{ sync.rc }}"
        from: Ansible Backups <ansible-backup@server.com>
        to:
        - notifications@domain.com

MikroTik VPN with Windows NPS RADIUS

With the advance of cheap MikroTik routers and ready to use CHR instances, setting up a VPN concentrator for remote access has become an easy task. Moving even further, a single router could provide VPN access and dynamic routing to integrate remote networks to the backbone.

I have started a gig as a consultant and sysadmin for a logistics insurance company, and one of my first proposals was to improve the network access for road warriors and remote workers.

The past

There was a Proxmox hypervisor, with some Windows 2012 R2 servers, providing Terminal Services, to execute a locally installed client for an ERP system. Proxmox was also using iptables on its the Debian backend of the to masquerade the VM networks with a public IP address, for Internet connectivity, dstnat rules for a NGINX reverse proxy, and RDP for the Windows servers..

I guess we all know having internet-exposed RDP is not a good idea, even if it is running in a non default port, so the former sysadmin transitioned to a SSH tunnel system, where the users connected to the hypervisor via SSH to establish tunnel to the desired server.

This solution, which I considered not elegant, was the only available at the moment due to networking constraints of the VPS provider, so really it was the best they were able to do, and it worked fine for them.

Over the Proxmox hypervisor, they also had a MikroTik CHR instance, with a P1 license, which was used to make a L2TP tunnel to a RB2011UiAS-rm located on their HQ.

Networks behind the tunnel endpoints were routed with static routers, so I configured a quick multi-area OSPF routing system, with the directly connected networks on area 0, along with the /30 network of the tunnel. I added an additional area on both ends, for the future VPN networks. Once OSPF was working as expected, I remove the static routes.

Securing the tunnels

This interconnection via the L2TP tunnel was just plain ol’ L2TP, without IPsec. This is no bueno, and could be improved. Fortunately, IPsec configuration on MikroTik is trivial. Just select “Use IPsec” on both ends, and use the same IPsec pre-shared key.

This can of course be configured via CLI. Would you like some RouterOS configuration Ansible on next posts? Let me know in the comments.

/interface l2tp-server server
set authentication=mschap1,mschap2 default-profile=VPN enabled=yes ipsec-secret="PUT_A_SECRET_HERE" use-ipsec=yes

VPN profiles

It’s always a good idea to copy the default-encryption profile, and create a new one based on that template. I set up a local address which was of course, part of the networks announced in a separate area by the OSPF process. I also added a IP pool to be able to provide dynamic addresses for the VPN users.

Maybe you are aware that in the Cisco world, you have to use tcp adjust-mss to adjust the maximum TCP segment size, to advoid fragmentation of packets over the tunnel. Fortunately, this is configured by default on RouterOS.

Finally, to be able to redirect the dial-in to a RADIUS server, we need to instruct the PPP AAA system to use RADIUS, as shown next.

Setting up RADIUS authentication

RADIUS servers are very simple to set up on RouterOS.

Under the RADIUS submenu, add a new server for PPP service, and configure the following parameters.

IP address of the radius server
RADIUS secret
Authentication and accounting ports, usually 1812 and 1813. Some servers use 1645 for accounting. Those are all UDP.
REALM if your server supports that extension
Which source address should the router use for its NAS-IP-Address

Using Windows NPS as a RADIUS server

NPS can work without a Certificate Authority but if you are working in an Active Directory environment, you’ll save a lot of headaches by installing the CA role.

In my particular scenario, the server was not part of a domain so the certificate generation and association was skipped.

Once the roles have been configured, I headed to the NPS service configuration, and add new RADIUS client.

Make sure to match the RADIUS secret and the source IP address as you configured on the MikroTik side.

Next, the network access policies. I wanted to match the NAS IPv4 address, and the authentication types. If you are not familiar with the RADIUS lang, NAS stands for Network Access Server, which in this case, is the MikroTik router which provides the VPN service.

I had to use a CHAP fallback due to some legacy devices withuout MSCHAP support.

Next, I added a new Access condition, matched the NAS address once again, and selected the local server as point of authentication.

Once everything was properly configured, I set up the VPN client on my side, which looks like as follows. The idea of using NPS as RADIUS was to be able to use my Windows account credentials for the VPN.

I verified the successful authentication on the router logs, and the VPN was sucessfully connected.

Creating passwordless logins with Ansible

What kind of users? Well, a special user called Ansible, which will use SSH keys to login into remote devices, allowing for full automation on playbooks.

Creating a new key

If you have been following the series, maybe you remember that we already created keys on the Juniper Junos SSH Keys post.

To create a new key, let’s issue the ssh-keygen command as follows. The -f flag tells the output path, and the -C flags specifies a comment.

$ ssh-keygen -f ansible.key -C ansible-login-passwordless

This should output two files, ansible.key and ansible.key.pub.

The public key should look something like this.

$ cat ansible.key.pub 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCtJUPfzJY6vKqLUssPPQe+LD7qRmIPbVhb/1i4Qab7T0Vf3x+ItfJyV4Ej4FsnRSU8iMU8J5eIdcetGQfsmwIZAm8glB0T6En5F9lvq2Yd+3RKIvxM3UlrIH6EaRedhsRUyV96CHfIO2nVqS9dmFfgrOJMIOwfTWIiRDNczUPw7aqw0FExslw9ZC0FO/1A6hYgofkGLrdIu9gK/WkNg5BE1EUCYPqbDBEHnnhv3C33LqiSJZnXJyqu53qz+jlv+1LZxerNHuovMGZMkjQsBo2f3r9Gk/9HqBmT0rcLr5prm4CqqryJ3S9VyVVlF599BlqYMuMjj+fCj277R8kSnLxl ansible-login-passwordless

Of course we need an inventory to use, which has the following content.

$ cat inventory.yml 
---
all:
  hosts:
    vars:
      ansible_ssh_user: ansible
      ansible_ssh_private_key_file: ansible.key
      ansible_python_interpreter: auto_silent
    hosts:
      localhost:

This inventory only has one host, localhost, and uses three main variables:

ansible_ssh_user, which tell Ansible to use the user ansible
ansible_ssh_private_key_file, which indicates the key for this user
ansible_python_interpreter, just to avoid non needed logs

The playbook will looks like this. Notice we don’t need to gather_facts here, and we will instruct ansible to use become to gain privileges on the destination host.

---
- hosts: all
  become: yes

  tasks:

    - name: Make sure we have a "wheel" group
      group:
        name: wheel
        state: present

    - name: Allow 'wheel' group to have passwordless sudo
      lineinfile:
        dest: /etc/sudoers
        state: present
        regexp: '^%wheel'
        line: '%wheel ALL=(ALL) NOPASSWD: ALL'
        validate: 'visudo -cf %s'
        
    - name: Create "ansible" user
      user:
        name: ansible
        comment: Ansible Automation User
        groups: wheel

    - name: Add ssh key
      authorized_key:
        user: ansible
        state: present
        key: "{{ lookup('file', './ansible.key.pub') }}"

First, we want to make sure there is a group called wheel which will group users with administrative privileges.

Then, the /etc/sudoers file will be edited by allowing the wheel group to gain privileges, with a failsafe using a visudo validation.

Once the group has been created, the new user will be created, and a SSH key will be added to it.

It seems allright, but, how should we run the playbook, if the default user is ansible and this user does not exists yet? Let’s give it a try.

$ ansible-playbook create-user.yml -i inventory.yml 

PLAY [all] ************************************************************************

TASK [Gathering Facts] ************************************************************
fatal: [localhost]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ansible@localhost: Permission denied (publickey,password).", "unreachable": true}

PLAY RECAP ************************************************************************
localhost                  : ok=0    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0

It fails, as expected, because the ansible user does not exists in the host.

Well, there is a way to provide a one-time password by connecting a as different user. You will need to install sshpass with your favourite package manager, like apt install sshpass.

One installed, run the playbook once again with the following arguments:

-e “ansible_ssh_user=xxxxx”, where xxxxx is a valid user on the remote host
-kK, which tell Ansible to ask for a login and a sudo password

$ ansible-playbook create-user.yml -i inventory.yml -e "ansible_ssh_user=arturo" -kK
SSH password: 
BECOME password[defaults to SSH password]: 

PLAY [all] ************************************************************************

TASK [Gathering Facts] ************************************************************
ok: [localhost]

TASK [Make sure we have a "wheel" group] ******************************************
changed: [localhost]

TASK [Allow 'wheel' group to have passwordless sudo] ******************************
changed: [localhost]

TASK [Create "ansible" user] ******************************************************
changed: [localhost]

TASK [Add ssh key] ****************************************************************
changed: [localhost]

PLAY RECAP ************************************************************************
localhost                  : ok=5    changed=4    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Awesome, we have sucessfully created a new user!

Let’s try to connect using the ansible user with its key, as defined in the playbook.

$ ansible -m ping -i inventory.yml all
localhost | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python"
    }, 
    "changed": false, 
    "ping": "pong"
}

Stay tuned for more automation using Ansible.

Ansible and Juniper – SSH Keys and Prompts

On previous posts we’ve seen how to connect with Ansible using credentials stored in a inventory file, and using SSH keys for authentication.

However, it isn’t a good idea to store credentials in plain text files, neither have to rebuild your inventory when you want to switch over to key authentication.

A possible solution is to first ask for credentials, run a playbook to install the SSH key, and then use this key for authentication on later playbooks.

You can find all the files for this post on the following repo.

https://github.com/baldoarturo/ansible-ssh-keys

Variable prompts

  vars_prompt:
    - name: "ansible_user"
      prompt: "Username"
      private: no

The vars_prompt section is used to prompt the user for information, which is stored in variables. System variables can be populated, for example the ansible_user and ansible_password variables, allowing us to provide credentials to connect.

Take a look to the new version of the uptime playbook.

---
- hosts: all
  gather_facts: no

  vars_prompt:
    - name: "ansible_user"
      prompt: "Username"
      private: no
      unsafe: yes

    - name: "ansible_password"
      prompt: "Password"
      private: yes
      unsafe: yes

  tasks:
    - name: Get uptime
      junos_command:
        commands:
            - show system uptime
      register: uptime
    
    - name: Show uptime
      debug: var=uptime

We’re prompting for the username and password on the vars_prompt section. The private settings indicates if the user input should appear on the screen. The unsafe option allows to enter special chars.

The task to execute are:

Get system uptime via the junos_command module, with “show system uptime”
Print the result using debug

And the new (and definitive) inventory looks like this now.

all:
    hosts:
      "192.168.227.101":
    vars:
      ansible_connection: netconf
      ansible_network_os: junos
      ansible_ssh_private_key_file: juniper-hosts.key
      ansible_python_interpreter: auto_silent

The ansible_python_interpreter variable is set to auto_silent just to avoid the warning about no Python interpreters on the remote end.

Let’s give the playbook a run, trying to login with user and password. If you have not been following the Ansible series, let me tell you that there is an user admin with a password of Password$1 on the router. Note that the password won’t be seen on the screen.

arturo@arturo-ThinkPad-L440:~/Desktop/ansible-01$ ansible-playbook junos-auth-with-key.yaml -i junos-hosts.yaml 
Username: admin
Password: 

PLAY [all] ******************************************************************************************************************

TASK [Get uptime] ***********************************************************************************************************
ok: [192.168.227.101]

TASK [Show uptime] **********************************************************************************************************
ok: [192.168.227.101] => {
    "uptime": {
        "ansible_facts": {
            "discovered_interpreter_python": "/usr/bin/python"
        }, 
        "changed": false, 
        "failed": false, 
        "stdout": [
            "Current time: 2020-01-13 17:12:32 UTC\nSystem booted: 2020-01-13 14:55:46 UTC (02:16:46 ago)\nProtocols started: 2020-01-13 14:56:03 UTC (02:16:29 ago)\nLast configured: 2020-01-12 16:09:02 UTC (1d 01:03 ago) by admin\n 5:12PM  up 2:17, 2 users, load averages: 0.00, 0.00, 0.00"
        ], 
        "stdout_lines": [
            [
                "Current time: 2020-01-13 17:12:32 UTC", 
                "System booted: 2020-01-13 14:55:46 UTC (02:16:46 ago)", 
                "Protocols started: 2020-01-13 14:56:03 UTC (02:16:29 ago)", 
                "Last configured: 2020-01-12 16:09:02 UTC (1d 01:03 ago) by admin", 
                " 5:12PM  up 2:17, 2 users, load averages: 0.00, 0.00, 0.00"
            ]
        ]
    }
}

PLAY RECAP ******************************************************************************************************************
192.168.227.101            : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Great, the prompts work.

What if we try to login with the user ansible we configured on the previous post? This user has an SSH key installed on the router, and the local private key is on juniper-hosts.key.

arturo@arturo-ThinkPad-L440:~/Desktop/ansible-01$ ansible-playbook junos-auth-with-key.yaml -i junos-hosts.yaml 
Username: ansible
Password: 

PLAY [all] ******************************************************************************************************************

TASK [Get uptime] ***********************************************************************************************************
ok: [192.168.227.101]

TASK [Show uptime] **********************************************************************************************************
ok: [192.168.227.101] => {
    "uptime": {
        "ansible_facts": {
            "discovered_interpreter_python": "/usr/bin/python"
        }, 
        "changed": false, 
        "failed": false, 
        "stdout": [
            "Current time: 2020-01-13 17:32:05 UTC\nSystem booted: 2020-01-13 14:55:46 UTC (02:36:19 ago)\nProtocols started:
 2020-01-13 14:56:03 UTC (02:36:02 ago)\nLast configured: 2020-01-12 16:09:02 UTC (1d 01:23 ago) by admin\n 5:32PM  up 2:36, 
1 user, load averages: 0.00, 0.01, 0.00"
        ], 
        "stdout_lines": [
            [
                "Current time: 2020-01-13 17:32:05 UTC", 
                "System booted: 2020-01-13 14:55:46 UTC (02:36:19 ago)", 
                "Protocols started: 2020-01-13 14:56:03 UTC (02:36:02 ago)", 
                "Last configured: 2020-01-12 16:09:02 UTC (1d 01:23 ago) by admin", 
                " 5:32PM  up 2:36, 1 user, load averages: 0.00, 0.01, 0.00"
            ]
        ]
    }
}

PLAY RECAP ******************************************************************************************************************
192.168.227.101            : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Excellent, by using the user ansible without password, it will fallback to the key authentication.

Ansible and Juniper Junos – Using SSH Keys

Previous posts introduced basics connection methods to manage Juniper devices using Ansible playbooks. The inventory files had sensitive information and credentials which should not be accessible to anyone.

SSH and NETCONF over SSH requires client authentication, for example with and username and password, which could looks like this:

admin> show configuration system login 
user admin {
    uid 2000;
    class super-user;
    authentication {
        encrypted-password "$1$./TeE4CZ$uAMigDedlRuuJgcZx4hYk0"; ## SECRET-DATA
    }
}

If you are a frequent SSH user, maybe you are aware that there are other login methods besides using usernames and passwords. By using a key-pair, with public and private keys, a password is no longer needed. The public key is installed on the remote host, and the private key is kept on the control node.

Although by using keys a password is no longer needed, a passphrase can be used with a key, adding an additional security factor to the connection. In fact, using SSH keys with passphrases is considered best practice. However, a private key with a passphrase is less useful for scheduled automation tasks because an operator may not be available to enter the passphrase at the scheduled time.

Creating a Key Pair

A key pair is a set of two cryptographic keys, a public one and a private one. The public key, as its name says, is the one we expose to the public. The private key, must be kept in a secure location.

To create a key pair, lauch ssh-keygen on a new console and follow the prompts. This utility will create two files, which are the public and private keys. Use the -f flag to specify the destination of the output files.

arturo@arturo-ThinkPad-L440:~/Desktop/ansible-01$ ssh-keygen -f ./juniper-hosts.key
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in ./juniper-hosts.key.
Your public key has been saved in ./juniper-hosts.key.pub.
The key fingerprint is:
SHA256:Kc/MZ11dLlXpcrK9PKO4L6XzaTrdczaek1ydzaFTFXw arturo@arturo-ThinkPad-L440
The key's randomart image is:
+---[RSA 2048]----+
|              ..+|
|               oE|
|              . =|
|         .   o O.|
|      . S     @.B|
|       *   . * +=|
|        = o = = +|
|         o =.o.%+|
|           +X=o+B|
+----[SHA256]-----+
arturo@arturo-ThinkPad-L440:~/Desktop/ansible-01$ ls 
juniper-hosts.key  juniper-hosts.key.pub

Now that we have created the key pair, let’s examine them to find out how a key looks like.

This is the private key, which in fact is a plain text file with a RSA key inside it.

arturo@arturo-ThinkPad-L440:~/Desktop/ansible-01$ cat juniper-hosts.key
-----BEGIN RSA PRIVATE KEY-----
MIIEpQIBAAKCAQEAn0XtdTPJxWHQHeWi8IwSOGtsgWwW3z86Z91edH0dBS6SWDzm
seWshSTD2PdD8EU0mGac1V9+rWIJYIw0VZlpTeEKiNnS+1/feHxln+S0LLwn91E6
fgVRt6kE/9uTv217Pa5eP5HLbPuSwPsVMwRgdCJL1pYIFagJLyrsakDWE0qDtBXZ
0fendmUr6NrrCLofjHBkRASviJ9E4Vvx4YqrG9ak+2jX21g0LRNxZ/nFv2uHwVeI
gibC+1JiZ/IGaBSNCovF131wb1AUyLm5Z5DNvM6hu2C6+cMTNNQ5fiBfgMpryboW
4+MVCGJQ31EthyFEx0XPgjos/EcS9Pp436OppwIDAQABAoIBAQCVczEws4qV2oVF
OG/fFSAXrr0e6ATCMHsmcLKrzaZIcX3CrEqwDNoICQp4cPRf5SBIDKkHElc0a/Ru
ksCcvZnxCMQwy2vMkhaH4PoewaRLAbbiu2aOT4FxO3jEeA44JovowdAQCEcAmUMI
L9GhkG7NKk1NKnSllYogpz81KGd3qw21sRqb1NTLAlYnE4KOhJz+GKmJV8NdAaRj
zjkVeaLf3t/FCxPRhdAtoADkRQSS1KSCjU0hx0lDCmsdJzM8gFJykltzzBQJ9tZu
voPZ0TkaIjrR7o8+Oez/pkvUSa1AJPhmH7l6P3RqHdSzMJGQraM1yuvBOTixbnkt
lsQ26tZpAoGBAM/CbeNhJvGfl18kLLLLSsNb4femXXlBimo6TW4tOGdOunD6+fyt
LiA0FMLpWfvAh++/yvWX/jee+E3uXkDLLfQVWqWBbfFqIhU4VOKLpMvE5sMQzkOc
OyooZNR5hui9e2+eU5P2qND/MUVy4YrUBdHtK58Or/cqYT6sHA78e9OtAoGBAMRB
Y8F79BaqYoH7x0lJf51A45U5rLzKom8eJ+aJujDx9RgDvCEWCZmZ53q6MjvNgNBp
v5AZptCDn0nIfAVOn2hCmTIPs1IvaLgVfMtmufxzdM1aGdYF7wEu0u7DV+Dzspf2
h9Q/9C4Or4YZ5oe25Qf5mkwb+xnnmTZCWZETC70jAoGALvskplp91/3i2RzxDq1y
BqNsgfgZAyaTClqMz/Fh49qlxo66oSz4VUfxufHS618qXkjcuJTaY/GK7PSOU9Ce
X6fEi9Cs7/60HmBSsbgqV/n6xPmz6w4VQv9HbdTdcRwIIcGH3NnWawyKM846uo4f
ks0zJBDKMfZfbzC0V5840TECgYEAraoTZR6Tsw7ZFp6/DZoNZBEMknsz4OgK7vsn
YbiUW0VwleyQKFMA8bwf+xkS5JqIF2TMT+5zD+a5KKhRHr0hEDiGqab9DofHScYx
5SelAsEEJcdKP3qGsWxG2WNguz3K1vAf5/Ej2THDno4C0itE5la4dAr6m0S27i2u
ZlMNOzMCgYEAr3/7pkN2LdUCVZYEjVMAW4YxSzi4R5JeixyidZwihVF4GqqqJ2Nl
VaOQzleNYUg23QWBs/n6yz2iDQaQdKXNCcOwrsQYzkX0aMuEz+4iWXkDKcRIvGWt
5H/7ShsKoXFsvwbjXkaTnirBMqAJh4jyv4R3oAqIy966zJ5K2vd0nT4=
-----END RSA PRIVATE KEY-----

And the public keys looks like this.

arturo@arturo-ThinkPad-L440:~/Desktop/ansible-01$ cat juniper-hosts.key.pub 
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCfRe11M8nFYdAd5aLwjBI4a2yBbBbfPzpn3V50fR0FLpJYPOax5ayFJMPY90PwRTSYZpzVX36tYglgjDRVmWlN4QqI2dL7X994fGWf5LQsvCf3UTp+BVG3qQT/25O/bXs9rl4/kcts+5LA+xUzBGB0IkvWlggVqAkvKuxqQNYTSoO0FdnR96d2ZSvo2usIuh+McGREBK+In0ThW/Hhiqsb1qT7aNfbWDQtE3Fn+cW/a4fBV4iCJsL7UmJn8gZoFI0Ki8XXfXBvUBTIublnkM28zqG7YLr5wxM01Dl+IF+AymvJuhbj4xUIYlDfUS2HIUTHRc+COiz8RxL0+njfo6mn arturo@arturo-ThinkPad-L440

This keypair can authenticate an user which connects via SSH.

Installing the public key on Junos

We already know that a basic authentication schema on Junos looks like this.

admin> show configuration system login 
user admin {
    uid 2000;
    class super-user;
    authentication {
        encrypted-password "$1$kYNQ.bg0$4T3W7GAPuXwsX3nbbsRCb/"; ## SECRET-DATA
    }
}

The main idea of using SSH keys, is to avoid user interaction, by trusting the keys instead of a credentials combination.

As seen above, the keys are plain text files. We need to install the public key on the Junos configuration, either configuring it manually, or using Ansible to configure it.

Manually configuring the key

I added a new user called ansible, set its class as super-user and configured its authentication as ssh-rsa.

[edit]
admin# show system login | display set                                                                                                                                 
set system login user admin uid 2000
set system login user admin class super-user
set system login user admin authentication encrypted-password "$1$MExZQJdK$lLhnzSw.CLSMQg5bdIiws."
set system login user ansible uid 2001
set system login user ansible class super-user
set system login user ansible authentication ssh-rsa "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCfRe11M8nFYdAd5aLwjBI4a2yBbBbfPzpn3V50fR0FLpJYPOax5ayFJMPY90PwRTSYZpzVX36tYglgjDRVmWlN4QqI2dL7X994fGWf5LQsvCf3UTp+BVG3qQT/25O/bXs9rl4/kcts+5LA+xUzBGB0IkvWlggVqAkvKuxqQNYTSoO0FdnR96d2ZSvo2usIuh+McGREBK+In0ThW/Hhiqsb1qT7aNfbWDQtE3Fn+cW/a4fBV4iCJsL7UmJn8gZoFI0Ki8XXfXBvUBTIublnkM28zqG7YLr5wxM01Dl+IF+AymvJuhbj4xUIYlDfUS2HIUTHRc+COiz8RxL0+njfo6mn arturo@arturo-ThinkPad-L440"

The SSH public key is copied and pasted between double quotes.

Using Ansible to configure the key

Altough the key can be configured manually on the remote hosts, what if we have hundreds, or thousands of hosts to configure?

The idea behind this series of posts is to use Ansible whenever possible, so, let’s write a quick playbook to automate the key configuration.

rturo@arturo-ThinkPad-L440:~/Desktop/ansible-01$ cat junos-install-ssh-key.yaml 
---
- hosts: all
  gather_facts: no

  vars:
    - auth_key: "{{lookup('file', '{{ key_file }}')}}"    

  tasks:
    - name: Install SSH key on remote host
      junos_config:
        lines:
          - set system login user ansible authentication ssh-rsa "{{ auth_key }}"
          - set system login user ansible class super-user

The playbook starts as usual, matching all hosts in the inventory, and without gathering facts, just for the sake of speed.

On vars, we are using the lookup plugin to read from a file and store its contents on a variable. Lookup can retrieve data from multiple sources, for example, take a secret from Hashicorp’s Vault. In this scenario, it will read a file which name is take from the key_file variable from the inventory.

It is possible to set a fixed file name on the playbook, but by taking the filename as a variable from the inventory, it gives us more flexibility. We could have multiple keys and rotate them by just changing the file name on the inventory, or use different keys per host group, and still apply the playbook to the full inventory, while using the proper key for each group.

The inventory for this playbook looks like the following. Notice the key_file variable which tells the playbook where to look for the key.

arturo@arturo-ThinkPad-L440:~/Desktop/ansible-01$ cat junos-hosts.yaml 
all:
    hosts:
      "192.168.227.101":
    vars:
      ansible_connection: netconf
      ansible_network_os: junos
      ansible_user: admin
      ansible_password: Password$1
      key_file: juniper-hosts.key.pub

Running the playbook to install the key

The current configuration of router logins is:

admin> show configuration system login 
user admin {
    uid 2000;
    class super-user;
    authentication {
        encrypted-password "$1$MExZQJdK$lLhnzSw.CLSMQg5bdIiws."; ## SECRET-DATA
    }
}

Let’s run the playbook to apply the new configuration which will create the ansible user, set ssh-rsa authentication for it, and set its class as super-user.

arturo@arturo-ThinkPad-L440:~/Desktop/ansible-01$ ansible-playbook junos-install-ssh-key.yaml -i junos-hosts.yaml 

PLAY [all] *****************************************************************************

TASK [Install SSH key on remote host] **************************************************
changed: [192.168.227.101]

PLAY RECAP *****************************************************************************
192.168.227.101            : ok=1    changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

Ok, the playbook executed with no errors, and Ansible says there is 1 changed host, which is what we expected.

Let’s check the router configuration again.

admin> show configuration system login    
user admin {
    uid 2000;
    class super-user;
    authentication {
        encrypted-password "$1$MExZQJdK$lLhnzSw.CLSMQg5bdIiws."; ## SECRET-DATA
    }
}
user ansible {
    uid 2001;
    class super-user;
    authentication {
        ssh-rsa "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCfRe11M8nFYdAd5aLwjBI4a2yBbBbfPzpn3V50fR0FLpJYPOax5ayFJMPY90PwRTSYZpzVX36tYglgjDRVmWlN4QqI2dL7X994fGWf5LQsvCf3UTp+BVG3qQT/25O/bXs9rl4/kcts+5LA+xUzBGB0IkvWlggVqAkvKuxqQNYTSoO0FdnR96d2ZSvo2usIuh+McGREBK+In0ThW/Hhiqsb1qT7aNfbWDQtE3Fn+cW/a4fBV4iCJsL7UmJn8gZoFI0Ki8XXfXBvUBTIublnkM28zqG7YLr5wxM01Dl+IF+AymvJuhbj4xUIYlDfUS2HIUTHRc+COiz8RxL0+njfo6mn arturo@arturo-ThinkPad-L440"; ## SECRET-DATA
    }
}

There is a new user called ansible, with all the parameters we specified. That’s great!

Authenticating using keys

I wrote another playbook to show the system uptime

---
- hosts: all
  gather_facts: no

  tasks:
    - name: Get uptime
      junos_command:
        commands:
            - show system uptime
      register: uptime
    
    - name: Show uptime
      debug: var=uptime

And our inventory now looks like this.

all:
    hosts:
      "192.168.227.101":
    vars:
      ansible_connection: netconf
      ansible_network_os: junos
      ansible_user: ansible
      ansible_ssh_private_key_file: juniper-hosts.key
      ansible_python_interpreter: auto_silent

There is no plain text password, but instead, by setting up the ansible_ssh_private_key_file variable, we are instructing Ansible to authenticate using the private key.

arturo@arturo-ThinkPad-L440:~/Desktop/ansible-01$ ansible-playbook junos-auth-with-key.yaml -i junos-hosts-w-key.yaml 

PLAY [all] *****************************************************************************

TASK [Get uptime] **********************************************************************
ok: [192.168.227.101]

TASK [Show uptime] *********************************************************************
ok: [192.168.227.101] => {
    "uptime": {
        "ansible_facts": {
            "discovered_interpreter_python": "/usr/bin/python"
        }, 
        "changed": false, 
        "failed": false, 
        "stdout": [
            "Current time: 2020-01-12 16:25:10 UTC\nSystem booted: 2020-01-12 13:42:06 UTC (02:43:04 ago)\nProtocols started: 2020-01-12 13:42:27 UTC (02:42:43 ago)\nLast configured: 2020-01-12 16:09:02 UTC (00:16:08 ago) by admin\n 4:25PM  up 2:43, 3 users, load averages: 0.08, 0.02, 0.01"
        ], 
        "stdout_lines": [
            [
                "Current time: 2020-01-12 16:25:10 UTC", 
                "System booted: 2020-01-12 13:42:06 UTC (02:43:04 ago)", 
                "Protocols started: 2020-01-12 13:42:27 UTC (02:42:43 ago)", 
                "Last configured: 2020-01-12 16:09:02 UTC (00:16:08 ago) by admin", 
                " 4:25PM  up 2:43, 3 users, load averages: 0.08, 0.02, 0.01"
            ]
        ]
    }
}

PLAY RECAP *****************************************************************************
192.168.227.101            : ok=2    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0

This is great, now Ansible authenticates using the SSH key. You maybe are thinking:

“Do i have to edit my inventory every time i want to use keys?”

The answer is, no, and in the next post we will set a interactive prompt to connect using user and password to run the first playbook, which will configure the key, and then we will run all the other playbooks connecting with this key.

Stay tuned for more!