Cisco Firepower - Automating Cellular Failover

In a recent post, I walked through setting up a Netgear LTE cellular modem with Google Fi. Might seem random - but I had a plan for this!

In trying to ensure I keep a stable internet connection for work, I eventually led myself down the path of buying a cellular modem. That was part one. The next step was being able to somewhat-intelligently monitor my home internet connection, and then fail over to the cell modem when connectivity was poor (think lazy SDWAN).

At first I figured this would be easy.... but of course nothing is :)

I currently have a Cisco FirePower 1010 appliance that I'm using for a home firewall. Unfortunately, while this thing does support IP SLA - it wouldn't quite accomplish what I wanted to do. And to be fair, this box is intended to be a security appliance - not SDWAN.

Anyways - I talked myself into just writing some Python automation to monitor my home internet, and dynamically inject/remove static routing entries. I needed a new project anyways.

For those of you wanting to jump straight to the code, the GitHub repo is here


Part 1 - Monitoring Packet Loss & Latency

So first thing - I regularly experience both complete internet outages (always at the worst times), as well as very high latency. Packet loss seems to be less common with my ISP - but hey, it happens sometimes too.

I opted to use the pythonping module as an easy way to collect some of the info I needed.

After importing the module, it's as easy as specifying a destination, packet size, and number of ICMP messages to send:

from pythonping import ping
result = ping('8.8.8.8', size=2, count=10, verbose=True)

To simplify at least one part of the operation, pythonping has a built-in attribute that returns the average round trip time in milliseconds:

>>> print(result.rtt_avg_ms)
74.23

The actual contents of our ping results are going to be in the following format:
Reply from 8.8.8.8, 10 bytes in 43.82ms

So a quick way for me to figure out packet loss was to simply search for the "Reply from" text in each response:

lost = 0
for packet in result:
    if "Reply from" in str(packet):
        pass
    else:
        lost += 1

Easy enough - and we can use that to calculate the amount of packet loss against the 10 total packets sent:

loss = 100 * lost / 10

Once all that is figured out, we can use a simple expression to determine whether or not the loss or latency thresholds have been exceeded. Here, response holds the measured latency and loss holds the loss percentage from above:

if response >= MAX_LATENCY or loss >= MAX_LOSS:
    print("Loss/Latency violate thresholds.")
    return True
else:
    print("Loss/Latency within thresholds.")
    return False
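
Here's a rough sketch of how those two checks could be wrapped into small functions that are easy to test without sending any pings. The function names and the threshold values are placeholders made up for illustration, and the sample strings simply stand in for str(packet) from pythonping:

```python
MAX_LATENCY = 150.0  # ms, assumed example threshold
MAX_LOSS = 20.0      # percent, assumed example threshold


def loss_percent(replies):
    """Return the percentage of replies that do not contain 'Reply from'."""
    replies = [str(r) for r in replies]
    if not replies:
        return 100.0  # nothing came back at all, so treat it as total loss
    lost = sum(1 for r in replies if "Reply from" not in r)
    return 100.0 * lost / len(replies)


def thresholds_exceeded(avg_rtt_ms, loss_pct,
                        max_latency=MAX_LATENCY, max_loss=MAX_LOSS):
    """True when either latency or loss meets or exceeds its threshold."""
    return avg_rtt_ms >= max_latency or loss_pct >= max_loss


# Eight good replies and two timeouts out of ten packets
sample = ["Reply from 8.8.8.8, 10 bytes in 43.82ms"] * 8 + ["Request timed out"] * 2
print(loss_percent(sample))
print(thresholds_exceeded(74.23, loss_percent(sample)))
```

Dividing by the number of replies actually received, rather than a hard-coded 10, keeps the math right if the ping count ever changes.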

The code I wrote then takes one of two paths.

  • If the loss or latency is ABOVE our thresholds, call to the FirePower module to inject a static default route over the LTE modem
  • If the loss and latency are both BELOW our thresholds, call to the FirePower module to delete the static default route - which returns traffic to the primary internet

These functions within the FirePower module were written so that each change will only be made once. For example, if the script checks and finds that the loss/latency is bad, but the static route towards the LTE modem already exists - then no additional changes are made.
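
That "only change things once" behavior boils down to a tiny decision table. A minimal sketch, assuming we already know whether the link is bad and whether the backup route exists (plan_action is a hypothetical name, not necessarily what the repo uses):

```python
def plan_action(link_is_bad, backup_route_present):
    """Return 'add', 'delete', or 'none' for the backup default route."""
    if link_is_bad and not backup_route_present:
        return "add"      # fail over to the LTE modem
    if not link_is_bad and backup_route_present:
        return "delete"   # fail back to the primary connection
    return "none"         # firewall is already in the desired state


for bad in (True, False):
    for present in (True, False):
        print(bad, present, plan_action(bad, present))
```

Keeping the decision separate from the API calls means the firewall only gets touched (and redeployed) when something actually needs to change.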

So next - Let's take a look at the FirePower module.

Part 2 - Authenticating to FDM & Initial Checks

This was my first time getting into the FirePower FDM APIs - but they ended up being fairly straightforward to use.

Turns out that FDM has built-in API documentation, which is extremely helpful. This can be found at https://<FDM IP>/#/api-explorer

[Image: firepower-api-explorer]

Okay - so first things first. Authenticating to the firewall:

oauth_data = {
    "grant_type": "password",
    "username": "admin",
    "password": "Cisco1234!"
}
headers = {
          "Content-Type": "application/json",
          "Accept": "application/json"
}
baseurl = "https://" + FDM + "/api/fdm/latest"

authurl = baseurl + "/fdm/token"
print("Posting AUTH request to FDM")
# POST request to FDM with headers / oauth data
resp = self.s.post(authurl, headers=headers,
                   data=json.dumps(oauth_data),
                   verify=False)
# If success - only pull out & return the auth token
if resp.status_code == 200:
    print("Auth success - got token.")
    return json.loads(resp.text)['access_token']
else:
    print("Authentication Failed.")
    print(resp.text)

In the code above - we take the headers and some basic authentication info (username, password, grant type) and send it in an HTTP POST request to https://<FDM IP>/api/fdm/latest/fdm/token

This returns an authentication token, which we'll need to include in any further HTTP request to the firewall. This can be done by sending a new set of headers in the future:

headers = {
          "Content-Type": "application/json",
          "Accept": "application/json",
          "Authorization": "Bearer " + self.token
}
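
For anyone who wants the auth flow as standalone helpers, here's a hedged sketch. It assumes you pass in a requests.Session() (or anything with a compatible post method); get_token and auth_headers are illustrative names, and verify=False mirrors the snippet above, which skips certificate validation for the firewall's self-signed cert:

```python
def get_token(session, fdm_host, username, password):
    """POST credentials to the FDM token endpoint and return the access token."""
    url = "https://" + fdm_host + "/api/fdm/latest/fdm/token"
    payload = {
        "grant_type": "password",
        "username": username,
        "password": password,
    }
    resp = session.post(url, json=payload,
                        headers={"Accept": "application/json"},
                        verify=False)
    if resp.status_code != 200:
        raise RuntimeError("FDM authentication failed: " + resp.text)
    return resp.json()["access_token"]


def auth_headers(token):
    """Headers to send with every request after authenticating."""
    return {
        "Content-Type": "application/json",
        "Accept": "application/json",
        "Authorization": "Bearer " + token,
    }
```

Usage would look something like token = get_token(requests.Session(), FDM, user, password), with the credentials pulled from an environment variable or config file rather than hard-coded.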

Next we need to figure out which routing table to insert the route into. Since I am only using the default routing table, the script will just grab the UID for the default global routing table:

vr_url = baseurl + "/devices/default/routing/virtualrouters"
routing_table = requests.get(vr_url, headers=headers)
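
The response is a list of virtual routers, so the ID still has to be pulled out of it. A minimal sketch, assuming the list comes back under an items key (like the other FDM list responses in this post) and that the default virtual router is named "Global"; both the helper name and that router name are assumptions worth checking in the API explorer:

```python
def find_virtual_router_id(payload, name="Global"):
    """Return the id of the virtual router with the given name, or None."""
    for vr in payload.get("items", []):
        if vr.get("name") == name:
            return vr.get("id")
    return None


# Made-up sample payload, shaped like a list response
sample = {"items": [{"name": "Global", "id": "1234-abcd", "type": "virtualrouter"}]}
print(find_virtual_router_id(sample))
```

With a real response this would be called as find_virtual_router_id(routing_table.json()).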

I mentioned earlier that the script will not make any changes if the firewall is already in the desired state. So we will take time now to check what state the firewall is in, before attempting to make any changes.

To do that, the first real action is to scan the routing table and see if our route already exists:

route_url = baseurl + "/devices/default/routing/virtualrouters/" + \
            self.globalVR + "/staticrouteentries"
route_data = requests.get(route_url, headers=headers)

# Convert the returned JSON text into a Python object
current_routes = json.loads(route_data.text)['items']
# Run through each route returned and see if it matches the one we're looking for
for route in current_routes:
    # For each route, we need to manually go look up the uid of our gateway & network objects
    gateway = self.getNetworkObject(route['gateway']['id'])
    dest_network = self.getNetworkObject(route['networks'][0]['id']).split('/')[0]
    # Match based on route prefix & upstream next hop gateway
    if gateway == GATEWAY and dest_network == ROUTE.split('/')[0]:
        print("Found route to %s via %s" % (dest_network, gateway))
        return route
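
The same matching logic can be pulled out into a plain function, which makes it testable without a firewall. This is only a sketch: find_route is a hypothetical name, and resolve stands in for getNetworkObject, meaning any callable that turns an object ID into its address string:

```python
def find_route(routes, resolve, gateway, prefix):
    """Return the first route matching the gateway and destination, else None."""
    wanted_net = prefix.split("/")[0]
    for route in routes:
        # Skip entries that lack a gateway or destination network reference
        if not route.get("gateway") or not route.get("networks"):
            continue
        gw = resolve(route["gateway"]["id"])
        net = resolve(route["networks"][0]["id"]).split("/")[0]
        if gw == gateway and net == wanted_net:
            return route
    return None


# Made-up object IDs and addresses for illustration
objects = {"gw-1": "192.168.8.1", "net-1": "0.0.0.0/0"}
routes = [{"id": "r-1", "gateway": {"id": "gw-1"}, "networks": [{"id": "net-1"}]}]
print(find_route(routes, objects.get, "192.168.8.1", "0.0.0.0/0"))
```

Returning None when nothing matches gives the caller a simple "route exists?" check.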

Depending on whether we were looking to add or remove a route, the script may proceed with doing so - or just quit if the action has already been taken previously.

Part 3 - Failover (Add Static Route)

Assuming that we need to add a route, we need to send a POST request to https://<FDM IP>/api/fdm/latest/devices/default/routing/virtualrouters/<Virtual Router ID>/staticrouteentries with some required information (like subnet info, next-hop, etc)

Unfortunately, we can't just send a POST with the target subnet & gateway. Those need to be created as network objects on the FirePower box beforehand. For each network object, we need to build a quick JSON object with the required info:

host_data = {}
host_data['name'] = name
host_data['description'] = "Created by ISP Failover automation"
host_data['subType'] = subtype
host_data['value'] = address
host_data['type'] = "networkobject"

Then we can send a POST to the FDM API to create the object with our supplied parameters:

posturl = baseurl + "/object/networks"
requests.post(posturl, headers=headers, json=host_data)
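
If you'd rather not pass the subtype in by hand, it can be derived from the address itself. A sketch using the standard ipaddress module; the helper name is made up, and the "HOST" / "NETWORK" subType strings are my assumption of what FDM expects, so confirm them in the API explorer:

```python
import ipaddress


def build_network_object(name, address):
    """Build a network object payload, picking the subtype from the address."""
    net = ipaddress.ip_network(address, strict=False)
    is_host = net.prefixlen == net.max_prefixlen
    return {
        "name": name,
        "description": "Created by ISP Failover automation",
        "subType": "HOST" if is_host else "NETWORK",  # assumed FDM values
        "value": str(net.network_address) if is_host else str(net),
        "type": "networkobject",
    }


print(build_network_object("lte_gateway", "192.168.8.1"))
print(build_network_object("any_ipv4", "0.0.0.0/0"))
```

A bare address (or a /32) is treated as a host, and anything with a shorter prefix is treated as a network.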

Once we have those objects created, we can compile all of that info into a JSON object with all the components needed to create a static route entry:

routeobject = {}
routeobject['name'] = "route_BACKUP"
routeobject['description'] = "Created by ISP Failover automation"
routeobject['iface'] = {}
routeobject['iface']['id'] = iface_ID
routeobject['iface']['type'] = "physicalinterface"
routeobject['iface']['name'] = iface_name
routeobject['networks'] = [{}]
routeobject['networks'][0] = {}
routeobject['networks'][0]['id'] = network_ID
routeobject['networks'][0]['type'] = "networkobject"
routeobject['networks'][0]['name'] = network_name
routeobject['gateway'] = {}
routeobject['gateway']['id'] = gateway_ID
routeobject['gateway']['type'] = "networkobject"
routeobject['gateway']['name'] = gateway_name
routeobject['metricValue'] = 1
routeobject['ipType'] = "IPv4"
routeobject['type'] = "staticrouteentry"
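
The same payload can also be written as a single dict literal, which I find a bit easier to read. A sketch under the assumption that you already hold the full interface, network, and gateway objects as dicts with id, type, and name keys; ref and build_static_route are hypothetical helper names:

```python
def ref(obj):
    """Reduce a full object to the id/type/name reference used in payloads."""
    return {"id": obj["id"], "type": obj["type"], "name": obj["name"]}


def build_static_route(iface, network, gateway, name="route_BACKUP", metric=1):
    """Build the static route entry payload from three existing objects."""
    return {
        "name": name,
        "description": "Created by ISP Failover automation",
        "iface": ref(iface),
        "networks": [ref(network)],
        "gateway": ref(gateway),
        "metricValue": metric,
        "ipType": "IPv4",
        "type": "staticrouteentry",
    }


# Made-up objects for illustration
iface = {"id": "if-1", "type": "physicalinterface", "name": "lte"}
network = {"id": "net-1", "type": "networkobject", "name": "any_ipv4"}
gateway = {"id": "gw-1", "type": "networkobject", "name": "lte_gateway"}
print(build_static_route(iface, network, gateway))
```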

Then pass all that into a POST request to create the route object:

add_url = baseurl + "/devices/default/routing/virtualrouters/" + self.globalVR + "/staticrouteentries"
requests.post(add_url, headers=headers, json=routeobject)

Success? Well not quite yet.

FirePower requires all changes to be deployed (or applied) to the system before they take effect.

So next we need to initiate a deployment task - and periodically check on it. Deployments can take a while to complete, depending on number of changes, processing power of the appliance, etc.

from time import sleep

# First send a POST to the deployment endpoint:
deploy_url = baseurl + "/operational/deploy"
deploy_response = requests.post(deploy_url, headers=headers)
# Grab the deployment task UID:
deploymentID = json.loads(deploy_response.text)['id']

# Loop while deployment is running:
deployed = False
while deployed is False:
    # Sleep for a few seconds:
    sleep(5)
    # Get list of all deployment tasks (the auth headers are needed here too):
    tasklist = json.loads(requests.get(deploy_url, headers=headers).text)
    for task in tasklist['items']:
        if task['id'] == deploymentID:
            print("Deployment status is: " + task['state'])
            # If changes are not yet deployed, the while loop checks again
            deployed = task['state'] == 'DEPLOYED'
            break

So in the above block, we POST a request to deploy changes. The response of that POST request contains our deployment ID, which we keep to check the status later.

Then the script waits a few seconds, and pulls a list of all deployment tasks. Search through that list for our deployment ID - and check to see if it has been completed yet. If not, wait and run the loop again.
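
One thing worth considering is a timeout, so a deployment that never reaches DEPLOYED doesn't leave the script looping forever. A minimal sketch, not what the repo necessarily does: wait_for_deployment is a made-up name, and get_state is any callable that returns the current state string for our deployment ID:

```python
import time


def wait_for_deployment(get_state, timeout=600.0, interval=5.0, sleep=time.sleep):
    """Poll get_state() until it returns 'DEPLOYED' or the timeout elapses."""
    waited = 0.0
    while waited < timeout:
        if get_state() == "DEPLOYED":
            return True
        sleep(interval)
        waited += interval
    return False  # gave up, caller decides what to do next


# Simulated run: two in-progress polls, then success (no real sleeping)
states = iter(["QUEUED", "DEPLOYING", "DEPLOYED"])
print(wait_for_deployment(lambda: next(states), sleep=lambda s: None))
```

Passing sleep in as a parameter is just there so the loop can be exercised in a test without actually waiting.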

And that's it! Once we get confirmation that our changes have been deployed - we've now routed traffic over the secondary connection (an LTE modem, in my case).

Part 4 - Fail-Back (Delete Static Route)

Okay - this section will be fairly quick. We've already looked at authenticating, checking if our route already exists, and how to add a new route. So the only thing remaining is removing our route if we wanted to migrate traffic back to the primary connection.

Same logic applies as earlier. If our path monitoring script runs and finds that loss & latency are within the thresholds we set, then we make a call to the FirePower module. First we check to ensure there is a route for us to delete, and if so - proceed with deleting it.

Adding a route is a little more work, since we may need to create network objects. However, when we delete the route - we'll just leave those objects on the FirePower box. They'll be there for the next time we need them (which also speeds up deployment times).

To get started, we just need the UID for the route we want to delete.

Remember that code I showed above, where we check to see if the static route exists? And match on our intended subnet / gateway pair? Well, I just made that into a function that will return the UID of the route if it finds it. Easy enough!

Next, we just send an HTTP DELETE to the static routing endpoint - and reference that UID:

del_url = baseurl + "/devices/default/routing/virtualrouters/" + \
          self.globalVR + "/staticrouteentries/" + route['id']
requests.delete(del_url, headers=headers)

Once that succeeds - we use the same logic as earlier to deploy the changes.

After that, our route is removed & traffic should be flowing over our primary connection again.


If it helps anyone - I also threw together a quick diagram to explain the flow of operations here:
[Image: flow-diagram]


Well - that's it. This was a fun side project to work on over the past week or two. I hadn't spent much time with the FirePower/FDM APIs just yet. While there was a bit of work in creating all the necessary network objects, the overall process was fairly simple.

It helped tremendously that the FDM API explorer exists, and is available on-box. This utility allows you to see all the available API calls, what parameters they require, and even run test calls from the web UI. This greatly reduced the time needed to figure this out.

Hope this was interesting. If you would like to see the whole project - check out the repo on my GitHub page.