Networking Basics on 0x2142 | Networking Nonsense

L2 Basics: Configuring an EtherChannel

Tue, 30 Jan 2018 10:00:46 +0000

Today we’re going to take a look at how to configure an etherchannel between two Cisco switches.

What is an etherchannel? It’s a way of taking multiple independent links and bundling them together, so that they appear as one logical connection between two devices. Etherchannels are commonly used between two switches, or between a switch and a host - which allows for both additional bandwidth and fault tolerance/redundancy. In the example today, we’ll be using an etherchannel protocol called Link Aggregation Control Protocol (LACP). LACP is an IEEE standard (802.3ad).

You might be thinking “Wait, wouldn’t multiple links cause a loop? Or trigger Spanning-tree to block ports?” Not in this case! Etherchannel technologies work around those problems by creating a single logical interface for spanning-tree to worry about. The etherchannel protocol itself worries about loop prevention in between the two devices, so we get multiple ports of non-blocking bandwidth.

For everything we cover in this example, we’ll be using the following topology:

So we have two switches, which are connected together via Eth0/0 and Eth0/1. Each switch has three VLANs configured - 10, 20, and 30.

Configuring an Etherchannel

I’ll only be showing the configuration from the perspective of 0x2142-SW1 - but all configuration is replicated on 0x2142-SW2.

! We'll use the interface range command to apply the etherchannel configuration to
! both Eth0/0 and Eth0/1 at the same time:
0x2142-SW1(config)#int range Eth0/0 - 1

! We specify which etherchannel protocol to use by configuring 'channel-protocol'
! PAgP is a Cisco Proprietary protocol, but we'll be using LACP for this example:
0x2142-SW1(config-if-range)#channel-protocol ?
  lacp  Prepare interface for LACP protocol
  pagp  Prepare interface for PAgP protocol
0x2142-SW1(config-if-range)#channel-protocol lacp

! Next we need to specify a channel-group and mode:
0x2142-SW1(config-if-range)#channel-group 1 mode ?
  active     Enable LACP unconditionally
  auto       Enable PAgP only if a PAgP device is detected
  desirable  Enable PAgP unconditionally
  on         Enable Etherchannel only
  passive    Enable LACP only if a LACP device is detected

0x2142-SW1(config-if-range)#channel-group 1 mode active
Creating a port-channel interface Port-channel 1

0x2142-SW1(config-if-range)#
*Jan 26 01:03:04.532: %LINEPROTO-5-UPDOWN: Line protocol on Interface Port-channel1, changed state to up

Let’s talk through a few notes about the above configuration. In order to enable etherchannel, we only need to configure two commands: channel-protocol and channel-group. The channel-protocol command tells the switch which etherchannel protocol to use for negotiation (LACP in this case). The channel-group command provides two necessary components: the group number and mode. The group number is just a device-local identifier for which group to add these ports to. When we specified group 1, the switch adds both Eth0/0 and Eth0/1 into the new logical interface Port-Channel 1.

The etherchannel mode is also important. The two primary options we want to look at for LACP are active and passive. Active tells the switch to preemptively send out LACP negotiation packets. In this case, the switch really wants the ports to become a bundle and will ask its partner device for an etherchannel to be formed. Passive mode tells our switch to only negotiate if another device wants to. In this mode, our switch won’t send out any etherchannel negotiation packets unless its partner device does so first.

Generally speaking, the most common configuration is to set the mode on both devices to active. This ensures that both devices actively participate in trying to establish an etherchannel. Placing one device in active and one in passive will work as well. However, if both devices are placed into passive mode, an etherchannel will never form.
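As a quick way to reason about which mode pairings will form a bundle, here is a small Python sketch of the rules above. The table and the function name are made up for illustration, and it only models the negotiation logic (same protocol on both ends, and at least one side willing to start talking), not anything the switch actually runs.

```python
# Negotiation behaviour per channel-group mode, following the IOS help text.
# "initiates" means the port sends negotiation packets on its own.
MODES = {
    "active":    {"protocol": "lacp", "initiates": True},
    "passive":   {"protocol": "lacp", "initiates": False},
    "desirable": {"protocol": "pagp", "initiates": True},
    "auto":      {"protocol": "pagp", "initiates": False},
    "on":        {"protocol": None,   "initiates": False},  # static, no negotiation
}


def channel_forms(local_mode: str, remote_mode: str) -> bool:
    """Return True if the two channel-group modes can form a bundle."""
    local, remote = MODES[local_mode], MODES[remote_mode]
    # Static "on" never negotiates, so it only pairs with another "on".
    if local["protocol"] is None or remote["protocol"] is None:
        return local["protocol"] is None and remote["protocol"] is None
    # Both ends must speak the same protocol ...
    if local["protocol"] != remote["protocol"]:
        return False
    # ... and at least one end has to start the conversation.
    return local["initiates"] or remote["initiates"]


if __name__ == "__main__":
    for pair in [("active", "active"), ("active", "passive"), ("passive", "passive")]:
        print(pair, "->", channel_forms(*pair))
```

The passive/passive case is the one to watch for: both sides are configured correctly on paper, but neither ever sends the first packet.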

Validation

So how do we validate that the etherchannel has formed correctly? One way is using the show etherchannel summary command:

0x2142-SW1#show etherchannel summary
Flags:  D - down        P - bundled in port-channel
        I - stand-alone s - suspended
        H - Hot-standby (LACP only)
        R - Layer3      S - Layer2
        U - in use      N - not in use, no aggregation
        f - failed to allocate aggregator

        M - not in use, minimum links not met
        m - not in use, port not aggregated due to minimum links not met
        u - unsuitable for bundling
        w - waiting to be aggregated
        d - default port

        A - formed by Auto LAG

Number of channel-groups in use: 1
Number of aggregators:           1

Group  Port-channel  Protocol    Ports
------+-------------+-----------+-----------------------------------------------
1      Po1(SU)         LACP      Et0/0(P)    Et0/1(P)

From the output above, we see that there is one group configured with the group ID of 1. It shows that both Eth0/0 and Eth0/1 have been added into the Port-channel 1 interface. The (SU) next to the Port-channel interface indicates that the etherchannel is in use (U) and configured for layer 2 (S). I mentioned earlier that spanning-tree only worries about the port-channel interface, not the individual member ports. We can also check that out by using the show spanning-tree command:

0x2142-SW1#sh spanning-tree vlan 20
VLAN0020
  Spanning tree enabled protocol rstp
  Root ID    Priority    32788
             Address     aabb.cc00.1000
             This bridge is the root
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    32788  (priority 32768 sys-id-ext 20)
             Address     aabb.cc00.1000
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time  300 sec

Interface           Role Sts Cost      Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Et0/2               Desg FWD 100       128.3    Shr
Et0/3               Desg FWD 100       128.4    Shr
<-- Output omitted -->
Po1                 Desg FWD 56        128.65   Shr
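If you need to run this kind of check across many switches, the data row of show etherchannel summary is easy to pick apart. Below is a rough Python sketch that parses a single row like the one shown earlier. The regular expressions and field names are my own assumptions based on that one sample line, so expect to adjust them for other platforms or software versions.

```python
import re

# One data row of 'show etherchannel summary', for example:
# "1      Po1(SU)         LACP      Et0/0(P)    Et0/1(P)"
ROW = re.compile(
    r"^(?P<group>\d+)\s+(?P<po>\S+)\((?P<po_flags>\w+)\)\s+"
    r"(?P<proto>\S+)\s+(?P<ports>.+)$"
)
PORT = re.compile(r"(?P<name>\S+?)\((?P<flags>\w+)\)")


def parse_summary_row(line: str) -> dict:
    """Split one port-channel row into group, flags, protocol and member ports."""
    match = ROW.match(line.strip())
    if match is None:
        raise ValueError(f"not a port-channel row: {line!r}")
    ports = {m["name"]: m["flags"] for m in PORT.finditer(match["ports"])}
    flags = match["po_flags"]
    return {
        "group": int(match["group"]),
        "port_channel": match["po"],
        "layer2": "S" in flags,   # S - Layer2
        "in_use": "U" in flags,   # U - in use
        "protocol": match["proto"],
        "ports": ports,
        # P - bundled in port-channel; anything else deserves a closer look
        "all_bundled": bool(ports) and all(f == "P" for f in ports.values()),
    }


if __name__ == "__main__":
    print(parse_summary_row("1      Po1(SU)         LACP      Et0/0(P)    Et0/1(P)"))
```

A member flagged with s (suspended) instead of P would make all_bundled false, which is usually the first thing worth alerting on.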

Making Configuration Changes to an Etherchannel

Now that we have a working etherchannel, we have a few things that need special attention. The individual port configurations, Eth0/0 and Eth0/1 in this case, need to match at all times! Port configuration mismatches are an easy way to inadvertently bring down the port-channel. The good thing is that we now have a convenient Port-Channel interface which we can use for configuration. This logical port will replicate any configuration changes to all member ports.

! Let's jump into our Port-Channel 1 interface and configure a trunk for VLAN 20
0x2142-SW1(config)#int po1
0x2142-SW1(config-if)#switchport mode trunk
0x2142-SW1(config-if)#switchport trunk allowed vlan 20
! Now we can check the individual port configs:
0x2142-SW1(config-if)#do sh run int e0/0
Building configuration...

Current configuration : 176 bytes
!
interface Ethernet0/0
 switchport trunk allowed vlan 20
 switchport mode trunk
 channel-protocol lacp
 channel-group 1 mode active
end

0x2142-SW1(config-if)#do sh run int e0/1
Building configuration...

Current configuration : 176 bytes
!
interface Ethernet0/1
 switchport trunk allowed vlan 20
 switchport mode trunk
 channel-protocol lacp
 channel-group 1 mode active
end

Easy enough, right? The configuration changes for the trunk are now on both Eth0/0 and Eth0/1.

Troubleshooting Etherchannels

There is always a possibility that something goes wrong - so let’s take a quick look at some common problems and how to fix them.

Remember how I said that the member port configurations had to match? Here’s what happens if we make a configuration change on only one of the two member ports:

0x2142-SW1(config)#int eth0/1
0x2142-SW1(config-if)#switchport trunk allowed vlan 30
0x2142-SW1(config-if)#
*Jan 28 20:43:55.458: %EC-5-CANNOT_BUNDLE2: Et0/1 is not compatible with Et0/0 and will be suspended (vlan mask is different)

Eth0/1 immediately gets put into a suspended state, and is no longer active in the port-channel interface. In this case the switch gives us a good hint as to what’s wrong - vlan mask is different. Error messages will vary slightly, but a suspended port is easy to fix by comparing individual port configurations and fixing the mismatch.
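Comparing the member port configurations by eye works for two ports, but it gets tedious quickly. As a minimal sketch, the Python below diffs two interface stanzas copied from show run output and reports the lines that only exist on one side. The function names are hypothetical, and the second stanza is what Eth0/1 would presumably look like after the change above.

```python
def config_lines(stanza: str) -> set:
    """Return the meaningful lines of one interface stanza as a set."""
    lines = (line.strip() for line in stanza.splitlines())
    return {
        line for line in lines
        if line and line != "end" and not line.startswith(("!", "interface "))
    }


def member_mismatch(first: str, second: str) -> tuple:
    """Return (lines only in first, lines only in second), each sorted."""
    a, b = config_lines(first), config_lines(second)
    return sorted(a - b), sorted(b - a)


ET0_0 = """
interface Ethernet0/0
 switchport trunk allowed vlan 20
 switchport mode trunk
 channel-protocol lacp
 channel-group 1 mode active
end
"""

# Assumed state of the second member after the one-sided change.
ET0_1 = """
interface Ethernet0/1
 switchport trunk allowed vlan 30
 switchport mode trunk
 channel-protocol lacp
 channel-group 1 mode active
end
"""

if __name__ == "__main__":
    only_first, only_second = member_mismatch(ET0_0, ET0_1)
    print("only on Et0/0:", only_first)
    print("only on Et0/1:", only_second)
```

Two empty lists mean the stanzas agree, at least for the lines visible in the running config.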

Here’s another one:

*Jan 28 21:06:07.346: %EC-5-L3DONTBNDL2: Et0/0 suspended: LACP currently not enabled on the remote port.
*Jan 28 21:06:08.009: %EC-5-L3DONTBNDL2: Et0/1 suspended: LACP currently not enabled on the remote port.

This error message can mean a few things - the common one being exactly what it states! Check both sides of the connection, and ensure that LACP is configured on each device. This error message can also occur on certain mismatches - like if one side is running as a Layer 2 etherchannel, but the other side is running as Layer 3.

One more:

Jan 28 20:83:55.458 %ETHPORT-5-IF_DOWN_PORT_CHANNEL_MEMBERS_DOWN: Interface port-channel1 is down (No operational members)

The above message is also somewhat self-explanatory. In this case, the switch is unable to bring up the port-channel interface, because none of the underlying member ports are coming online. Troubleshoot what might be wrong with those ports first, then the port-channel should come up.

Hope this was useful! In a later post, we’ll dig into more configuration and considerations - like packet hashing, layer 3 etherchannels, and how packets are weighted between interfaces.

Questions? Drop them in the comments below!

What's wrong with VLAN 1?

Tue, 05 Dec 2017 09:07:06 +0000

Earlier this year I was involved in a string of interviews for an open network engineer position. The questions and scenarios provided during the interviews were aimed at someone mid-level. One of the more basic-ish scenario questions I like to ask is the following:

Given a brand new switch, can you provide me the commands you would use to configure the first four ports for hosts in VLAN 15?

This question is always interesting because I get such a wide variety of responses. You can certainly filter out people quickly who have never touched a switch. Some people will start with conf t, while others just jump straight into setting the VLAN tag. Some people specify that they’ll use an interface range command, while others get confused and want to configure the ports as a trunk. In fact, during this year’s interviews I had quite a significant number of people who completed the scenario by providing the commands to configure a trunk port, instead of static access. One thing that struck me as noteworthy is that the people who did this also provided the commands they needed to change the default native VLAN.

For a vast majority of networking devices (maybe all of them), the default/native VLAN for a trunk port is VLAN 1. This is not the best configuration for reasons we’ll get into a bit later - but unfortunately it needs to be manually changed. In every interview where the candidate suggested this be changed, I followed up with asking why. I asked so that I could find out two things: how good the individual is at explaining concepts, and whether this is something they just do because they were taught to or something they actually understand the logic behind. Surprisingly, a vast majority of the candidates could only provide the reasoning that “Well, VLAN 1 is bad” - but they couldn’t elaborate on why.

So why is VLAN 1 bad to use?

Technically, VLAN 1 itself isn’t the problem. The concept of a default VLAN allows for someone to attack a network by taking advantage of how switches use a default VLAN. Since VLAN 1 is typically set as the default for most vendors, then it becomes a well-known configuration for attackers to abuse. If every vendor had set the default native VLAN to 52, then we would still run into the exact same problem except the ‘bad’ VLAN would be 52. So let’s back up just a little here and explain a bit of background. The concept of a VLAN is a segmented logical network. Rather than requiring a different piece of physical hardware to keep hosts separate, we can assign them into different virtual LAN segments. This is accomplished by ’tagging’ each Ethernet packet with a VLAN ID. Internally, the switch code does not allow packets that contain one VLAN ID tag to be sent to hosts configured with a different VLAN ID tag.

Let’s look at a few practical examples of how this works. When you configure a host port, you would configure the switch with the appropriate VLAN ID tag that the host should be assigned to. The actual server may know nothing about the VLAN it is in, but the switch knows to inject that VLAN tag into the Ethernet headers of every packet received from that server. For the rest of that packet’s life on the network, every switch reads that VLAN tag to assist in forwarding decisions. Trunk ports are commonly used between switches, and these ports are enabled to carry multiple VLAN ID tags. The problem here is that each switch is expecting that the packets it receives already contain a VLAN tag in the Ethernet header.

So what’s the problem with the default native VLAN? This is a special type of VLAN in which the switch never attaches a VLAN tag to the headers. Whenever we connect two switches via a trunk port, we are likely configuring multiple VLANs that need to cross that link. However, the switches themselves still need to communicate directly with each other (for protocol negotiations/spanning-tree), and each switch isn’t necessarily going to know what VLAN the other wants to use for this management traffic. This is where the concept of a native VLAN comes in. For every configured trunk link, each switch has a default native VLAN for which it expects to receive packets with no VLAN tag headers. Even though normal network traffic crossing a trunk link is going to require a VLAN tag in the headers, the switch-to-switch control-plane communication is sent with no header present.

This is where VLAN 1 becomes a problem, because of how native VLANs are processed/interpreted by the switch. Whenever a switch receives a packet which contains the VLAN ID set to the native VLAN, it knows that packet doesn’t need to contain a VLAN header - so it removes the VLAN ID header.

How could this be exploited? Well for example, let’s say that we had a network where every developer PC was using a trunk port and permitted to use VLAN tags. This might be so they could run local VMs for testing that connect to both the DEV and QA networks. If we leave the default native VLAN as 1, then a malicious developer could exploit this to gain access to another segment. This is accomplished by using a software package to double-tag an Ethernet packet with two separate VLAN ID headers. The first VLAN tag header will be set to the native VLAN (VLAN 1), and the second header will be set to the target VLAN - let’s say we use VLAN 20 for the accounting network. When the switch receives the packet across the trunk link, it will read the Ethernet headers. When it processes the first VLAN tag and sees that it matches the default native VLAN, the switch strips this header - which then leaves the second VLAN tag header for VLAN 20. When the switch goes to forward this traffic to its destination, the remaining VLAN header will allow the packet to bypass any security measures and be directly forwarded on VLAN 20.
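To make the tag handling concrete, here is a small Python model of the header logic described above. It only builds and reads 4-byte 802.1Q tag fields in memory (nothing is put on a wire), and the function names and the VLAN 999 example are made up for illustration. Real switches do a lot more than this, so treat it as a sketch of why the outer tag matters and not as a description of any particular platform.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def vlan_tag(vlan_id: int, priority: int = 0) -> bytes:
    """Build one 4-byte 802.1Q tag: 16-bit TPID plus 16-bit TCI.

    The TCI holds a 3-bit priority, a 1-bit DEI (left at 0) and a 12-bit VLAN ID.
    """
    tci = ((priority & 0x7) << 13) | (vlan_id & 0x0FFF)
    return struct.pack("!HH", TPID_8021Q, tci)


def read_tags(tag_bytes: bytes) -> list:
    """Return the VLAN IDs in a run of stacked 802.1Q tags, outermost first."""
    vlan_ids = []
    for offset in range(0, len(tag_bytes) - 3, 4):
        tpid, tci = struct.unpack_from("!HH", tag_bytes, offset)
        if tpid != TPID_8021Q:
            break  # no further VLAN tags
        vlan_ids.append(tci & 0x0FFF)
    return vlan_ids


def strip_native(tags: list, native_vlan: int) -> list:
    """Model the first switch: drop the outer tag only if it is the native VLAN."""
    if tags and tags[0] == native_vlan:
        return tags[1:]
    return tags


if __name__ == "__main__":
    stacked = read_tags(vlan_tag(1) + vlan_tag(20))
    print("tags as received:       ", stacked)
    print("native VLAN 1 (default):", strip_native(stacked, native_vlan=1))
    print("native VLAN 999:        ", strip_native(stacked, native_vlan=999))
```

With the native VLAN left at 1, the outer tag disappears and only the VLAN 20 tag is left for the next switch to act on. With an unused native VLAN such as the hypothetical 999, the outer tag no longer matches and both tags stay in place.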

Well what if we don’t provide end users with a trunk port? Could they still execute this type of attack? Remember that the default switchport configuration allows for dynamic negotiation - where the connecting computer can tell the switch whether or not it needs an access port or a trunk port.

If VLAN 1 is well-known as the default, what should the native VLAN be set to?

Anything you like! The key thing is that it should be a VLAN which has no access to any network resources - so it should be a VLAN with no hosts and no gateway. I usually don’t even create the VLAN on the switch itself, therefore immediately black-holing all traffic sent to that VLAN.

How else can this be prevented?

Always statically set your end user ports to switchport mode access, and enable switchport nonegotiate. This will prevent the switch from allowing port negotiations, which prevents a user from tricking the switch into assigning a trunk port.
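If you keep copies of your switch configurations around, this is an easy thing to audit. The Python sketch below checks one interface stanza for the two commands mentioned above and returns whichever are missing. The function name and the sample stanza (including the interface name and VLAN number) are hypothetical.

```python
# The two commands recommended above for end user ports.
REQUIRED = ("switchport mode access", "switchport nonegotiate")


def missing_hardening(interface_config: str) -> list:
    """Return the required commands that are absent from an interface stanza."""
    present = {line.strip() for line in interface_config.splitlines()}
    return [command for command in REQUIRED if command not in present]


# Hypothetical user port that sets access mode but still allows negotiation.
SAMPLE = """
interface Ethernet0/2
 switchport access vlan 15
 switchport mode access
"""

if __name__ == "__main__":
    print(missing_hardening(SAMPLE))
```

An empty list means both commands are present on that port.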

Newer Cisco switches also support a global configuration command: vlan dot1q tag native. This command will force the switch to require VLAN tags to be present on packets in the native VLAN. See more detail here. This will need to be configured on all switches in your network to be effective.

Do you change the native VLAN in your network? Or is it not a big security concern for you? Comment below!

L2 Basics: Spanning-Tree Protocol

Tue, 14 Nov 2017 08:00:22 +0000

Spanning-tree protocol (STP) is one of those network technologies that is easy to forget about. It exists in the background of almost every network, and for the most part it does its job without any issues. However, there is still a huge benefit to understanding what STP does and how it works - because its default behaviors might not be the best for every network.

I’ve been making progress going through my CCIE books, and the earlier sections are focusing on layer 1 and layer 2 technologies. A lot of this is review from CCNP studies, but with STP the book starts to get into additional detail on the inner workings of the protocol - which I’m finding to be quite fascinating. It seems like at many of the companies where I’ve worked, a majority of the IT staff (whether sysadmins or network admins) don’t really have a good handle on how STP works and why it needs to be tuned. So this post is meant to cover spanning-tree at a very high level, and I’ll include some examples from issues I’ve seen in the past.

So what is spanning-tree protocol anyways?

At its very basic level, STP is a communications protocol used between switches to allow them to identify redundant paths in the network. The goal of STP is to figure out what is the most efficient L2 path through the network, then block all other paths to prevent loops. The best way I’ve heard STP explained is that it’s essentially a routing protocol for layer 2. Rather than routers communicating and exchanging routes to determine the best path through a network, all of the switches will talk to determine the best (loop-free) layer 2 path.

STP determines the best layer 2 path - but the best path to what?

When configuring a standard routing protocol (like EIGRP or OSPF), you might have a node that advertises a route for 10.10.10.0/24. All other routers in the network are going to select a best path to the router who originates this advertisement - but how does something like this work when we’re talking about layer 2?

Spanning-tree relies on the concept of having a single root bridge for each network. At the beginning of a spanning-tree process, all switches will hold a quick election to determine who the root bridge is - then each switch will figure out what its own best path is to that device. The switch that ultimately becomes the root bridge will be based on the priority set by the administrator - but by default all switches are pre-configured with the same priority. In a tie, the switch with the lowest MAC address will win and become the root bridge.

What does that actually mean? More or less, one switch gets put in charge of defining the best path through the network. All other switches examine all of their redundant paths to the primary switch, then figure out which of those paths are more preferable than the others. An important note here, is that the “best path” selected is all from the specific viewpoint of whichever switch takes charge.
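The election itself boils down to a comparison: lowest priority wins, and the lowest MAC address breaks a tie. Here is a minimal Python sketch of that comparison. The switch names, priorities and MAC addresses in the example are invented for illustration.

```python
def bridge_id(priority: int, mac: str) -> tuple:
    """Comparable bridge ID: priority first, then the MAC address as a number."""
    return (priority, int(mac.replace(".", "").replace(":", ""), 16))


def elect_root(switches: dict) -> str:
    """Return the name of the switch with the lowest bridge ID."""
    return min(switches, key=lambda name: bridge_id(*switches[name]))


if __name__ == "__main__":
    # Hypothetical values: every switch left at the same default priority.
    defaults = {
        "SW1": (32768, "aabb.cc00.1000"),
        "SW2": (32768, "aabb.cc00.2000"),
        "SW4": (32768, "0011.2233.4455"),
    }
    print("untuned:", elect_root(defaults))

    # Lowering the priority on SW1 puts it in charge regardless of MAC.
    tuned = dict(defaults, SW1=(4096, "aabb.cc00.1000"))
    print("tuned:  ", elect_root(tuned))
```

In the untuned case the switch with the numerically lowest MAC address wins, whichever device that happens to be.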

For an example, let’s use the following topology:

In this example, we have five switches and a firewall - which are used to provide connectivity to two network segments (NET1 and NET2). For each of the two network segments, there are a number of different paths that traffic could take to reach the firewall. Without spanning tree, NET1 might send traffic to SW4, which in turn would forward it to both SW2 and SW3. This sounds like a good thing, since we would use all available paths to try and reach the firewall - but in reality this can cause other problems like the firewall receiving packets out of order.

So for the example above, let’s assume that SW1 becomes our root bridge. SW1 is now in charge of determining what the best path through the network is. It does this by sending out messages on all ports connected to other switches, called Bridge Protocol Data Units (BPDU). In this message, SW1 asserts its role as the root bridge - and provides some information for other switches to use for path selection. Each switch will examine the message from SW1 to determine which of its uplinks is the most efficient path to SW1. Once each switch does this, it will forward on the message to downstream switches - this time adding in some of its own information (or path cost).

After all that is complete, we might be left with the following path below:

The green lines above show the final path that was selected. For NET1 to reach the firewall, it would use SW4, then SW2, then up to SW1. For NET2, it would use SW5 > SW2 > SW1. This leaves the orange links unused. In fact, spanning-tree will place these links into a blocking state. The switches might still listen on those links, just in case their neighbor starts advertising a better path - but they will not forward any data traffic on these connections. In the case of SW2 suddenly failing, SW4 and SW5 would still be aware of their connections through SW3 - and after a brief period would begin using those links to reach the firewall.
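To see how the path costs add up, here is a rough Python sketch that picks each switch’s best path toward the root and lists the links left over. Keep in mind the assumptions: the link list is my guess at the topology described in the text, the cost of 4 per link is an arbitrary illustrative value, and ties are broken on the upstream switch name as a stand-in for the lowest bridge ID. It is a shortest-path calculation, not a BPDU-by-BPDU simulation of the protocol.

```python
import heapq


def root_paths(links: dict, root: str) -> dict:
    """Return {switch: (root path cost, upstream switch)} for non-root switches."""
    neighbors = {}
    for (a, b), cost in links.items():
        neighbors.setdefault(a, []).append((b, cost))
        neighbors.setdefault(b, []).append((a, cost))

    chosen = {}
    queue = [(0, "", root)]  # (root path cost, upstream switch, switch)
    while queue:
        cost, upstream, switch = heapq.heappop(queue)
        if switch in chosen:
            continue  # a cheaper (or equal cost, lower named) offer already won
        chosen[switch] = (cost, upstream)
        for neighbor, link_cost in neighbors[switch]:
            if neighbor not in chosen:
                heapq.heappush(queue, (cost + link_cost, switch, neighbor))
    del chosen[root]
    return chosen


def unused_links(links: dict, chosen: dict) -> list:
    """Links that are not on any switch's best path (candidates for blocking)."""
    used = {frozenset((switch, up)) for switch, (_, up) in chosen.items()}
    return sorted(tuple(sorted(link)) for link in links if frozenset(link) not in used)


# Assumed topology and costs, for illustration only.
LINKS = {
    ("SW1", "SW2"): 4, ("SW1", "SW3"): 4,
    ("SW2", "SW4"): 4, ("SW3", "SW4"): 4,
    ("SW2", "SW5"): 4, ("SW3", "SW5"): 4,
}

if __name__ == "__main__":
    paths = root_paths(LINKS, root="SW1")
    for switch in sorted(paths):
        print(switch, "->", paths[switch])
    print("unused:", unused_links(LINKS, paths))
```

With these assumed inputs, SW4 and SW5 both reach the root through SW2, and the two links toward SW3 are the ones left unused, which lines up with the paths described above.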

This is a very simplistic explanation, and there is a lot more in the background that actually happens during spanning-tree operation. There are a number of different STP standards that a switch can run, each with their own options for configuration and tuning. There are also methods of providing a loop-free path while still utilizing some redundant paths. I plan to cover some more detail on these topics in later posts.

So why should I care about STP?

Remember that part earlier when I said that if STP priority is not configured, then the switch with the lowest MAC becomes the root bridge? Well as it turns out, MAC addresses are the hardware addresses configured by the manufacturer - and these addresses increment as they produce new devices. So the lower MAC addresses are typically going to be the oldest equipment in your network. Unfortunately, this can have a dramatic effect on your network traffic if you’re not paying attention to STP.

From the earlier example, what would happen if SW4 became the root bridge? Maybe this was an old Cisco 2950 that someone forgot to replace and it’s just been left in the network. If the STP configuration went unmodified, then this switch would likely become the root bridge of our network. Let’s look at what that path might look like:

So in this case, SW4’s path to the firewall hasn’t changed. However, its best path to SW5 and NET2 is through SW3 - which means any traffic from NET2 to the firewall has to follow the path of SW5 > SW3 > SW4 > SW2 > SW1. Not only does that add more layer 2 hops that NET2 has to pass through, but it also adds more (unnecessary) load onto SW4. What if SW4 were so old that it still had 100M ports? It might get overwhelmed pretty quickly.

Now you might be thinking, “How often does this really happen?” Well, when I started at my last job they were experiencing a similar issue. The primary building had three floors, each with two Cisco 3548 switches to service users. Each of these switches linked back to a pair of Cisco 4500 core switches. All of the 3548 switches were purchased at the same time (far prior to the 4500s), and it turned out that one of them on the third floor had the lowest MAC address in the network. The entire layer 2 topology was then based on this switch as the central point of the network. This caused the interconnects between the core switches to be put into blocking mode - meaning that if a switch on the second floor needed to connect to the alternate core switch, then it would have to pass traffic through the third floor. A quick change to the spanning-tree priority (during a maintenance period) was all that was needed to put the core switches back in charge.

This doesn’t immediately make spanning-tree a bad technology. As with just about anything in IT, it’s something you need to understand and tune to fit your needs - otherwise you’ll just get less-than-ideal results. At another employer, I actually found out that the previous network administrator had manually disabled all of the redundant paths in the network - because he didn’t understand STP, and therefore thought that any redundant paths would cause a loop. Spanning-tree isn’t something we need to be afraid of - it just needs a little attention.

So next time you’re logged into one of the switches in your network, just run show spanning-tree and double-check that the switch you assume is your root bridge actually is.

Well I hope that this was helpful. As I mentioned earlier, I meant this as a fairly basic overview - but I intend on diving a bit deeper in later posts. The most fascinating part of networking to me is all the details on how things like spanning-tree actually work behind the scenes. Have any spanning-tree stories? Leave a comment below!

SRX Basics: Redundancy Groups and Failover

Tue, 18 Jul 2017 08:00:24 +0000

In last week’s post, we took a look at how to set up a chassis cluster on a Juniper SRX Firewall. So now that we have a basic cluster set up, let’s explore some of the additional options and configuration items.

Redundant Ethernet Interfaces

First things first - once you have a cluster configured, you’ll probably want to configure a few sets of redundant ethernet interfaces. These interfaces are also often referred to as reth interfaces. This will create a shared interface between your SRX pair, where you can configure IP address and VLAN information to be shared between the two. Let’s say that we have a Juniper SRX 1500 cluster, and we want to create a redundant interface for one of our 10Gb ports. Here is how we would do that:

root@testsrx# set interfaces xe-0/0/16 gigether-options redundant-parent reth1
root@testsrx# set interfaces xe-7/0/16 gigether-options redundant-parent reth1
root@testsrx# set interfaces reth1 redundant-ether-options redundancy-group 1

In the config above, we first take both of our interfaces (xe-0/0/16 on node0, and xe-7/0/16 on node1) and tell them that they now belong to a redundant interface group (reth1). Next, we enter into the reth1 config, and associate it to a redundancy group.

You’re also going to need to keep in mind that the SRX requires you to specify how many redundant ethernet interfaces will be configured. This is likely a memory thing, since each SRX also has a different maximum number of reth interfaces that can be configured. For example, if you tell the SRX that you need 5 reth interfaces, then the SRX will allocate system resources to manage those interfaces. In order to set the number of available reth interfaces, we’ll use the following command:

root@testsrx# set chassis cluster reth-count 5

Redundancy Groups

A redundancy group, or RG, is used as a container for logically grouping redundant interfaces/virtual routers which must fail over together. A single RG can be configured as primary on one of the two active SRX firewalls in a cluster - with the ability to fail over to the other node. For example, we might be planning on only using one virtual routing instance on our SRX - so we would create RG1 and assign our interfaces to belong to it.

A quick note - all interfaces in a single virtual router must belong to the same RG. This way the virtual routing instance and all of its associated interfaces will always run on the same SRX node. In order to achieve an active/active firewall configuration, you would need to create two separate virtual routers, each with their own reth interfaces and different RGs. Then you would make RG1 primary on node0, and RG2 primary on node1.

In most configurations, dumping all of your reth interfaces into RG1 will be sufficient. You’re likely going to want to set up a priority for each RG - and maybe even preemptive fail-over. In order to do that - you’ll have to configure each cluster member with a priority:

root@testsrx# set chassis cluster redundancy-group 0 node 0 priority 200
root@testsrx# set chassis cluster redundancy-group 0 node 1 priority 50
root@testsrx# set chassis cluster redundancy-group 1 node 0 priority 200
root@testsrx# set chassis cluster redundancy-group 1 node 1 priority 50
root@testsrx# set chassis cluster redundancy-group 1 preempt

The higher priority wins here - so if you set node0 to a higher priority and preempt is enabled, then node0 will actively try to take ownership of RG1. I would rather not set preempt on RG0 for a few reasons - which we’ll cover in the next section. Priorities can also be modified using interface monitoring, so if a particular interface goes offline we can decrement the priority of that node (also covered below).
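The interaction between priority and preempt is easier to see when written out. The Python sketch below is a deliberately simplified model of the behavior described above: the higher priority wins an initial election, and an existing primary only gets displaced by a higher-priority node when preempt is enabled. The function name is made up, and it ignores things like monitoring, timers and manual failover state.

```python
def rg_primary(priorities: dict, current_primary=None, preempt: bool = False) -> str:
    """Pick the primary node for one redundancy group (simplified model)."""
    highest = max(priorities, key=priorities.get)
    # With no primary yet, or with preempt enabled, the highest priority wins.
    if current_primary is None or preempt:
        return highest
    # Without preempt, whoever already holds the role keeps it.
    return current_primary


if __name__ == "__main__":
    priorities = {"node0": 200, "node1": 50}
    print("initial election:    ", rg_primary(priorities))
    print("node1 holds, no preempt:", rg_primary(priorities, "node1", preempt=False))
    print("node1 holds, preempt:   ", rg_primary(priorities, "node1", preempt=True))
```

This is also why leaving preempt off for a group means that group stays wherever it last landed until something else forces a move.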

A Note About RG0

You might notice from the last post that your output of show chassis cluster status already showed two redundancy groups: RG0 and RG1. RG0 is only used for management traffic and manages the routing engine for your SRX. Unfortunately, this can lead to some weird behaviors that you might not be expecting.

For example, whichever node is primary for RG0 is the only node that collects interface and monitoring statistics. If you’re using a monitoring tool that polls data from both of your SRXs, then the secondary for RG0 will report nothing about its interfaces, CPU, etc. This is also true if you log into the actual SRX itself - a show interfaces will actually return a bunch of default values, including showing that your ports are half-duplex. Don’t panic though, this is just an oddity of RG0. If you log back into the primary node for RG0, then it will show all of the proper statistics for both SRX firewalls.

Due to these weird things about RG0 - I prefer to always leave it on node0. That way I know which one to log into whenever I need to look at something, or which SRX to check in our monitoring tools. It’s also worth noting that whichever SRX is primary for RG0 is also the node you’re going to need to log into for configuration changes - even if all of your other redundancy groups are on the other SRX.

Weird, right?

Oh, and be warned that since RG0 controls the routing engine, a failover of this RG can cause brief outages. This is primarily because the routing table and firewall state information will be lost. The secondary node has to spin up new processes for the routing engine, and at least currently there isn’t a graceful sync of all of that data.

Interface Monitoring

I mentioned setting device priorities a bit earlier. Setting interface weights is going to be the primary method for dynamically affecting those priorities, and therefore possibly causing a preemptive failover. One example might be that you’re using an SRX cluster for your edge firewall, and you want it to automatically fail over if the primary loses its internet uplink.

Note that you must configure the physical interfaces here, not the redundant ethernet interfaces:

root@testsrx# set chassis cluster redundancy-group 1 interface-monitor xe-0/0/16 weight 160

Remember when we set the priorities of our firewalls earlier? Node0 was set to 200, and node1 to 50. So here we are saying that xe-0/0/16 on node0 is worth 160 points. So if xe-0/0/16 goes down, then node0 will decrement its priority by 160 - which will be 40. This will trigger a preemptive failover by node1. The reverse is also true - when xe-0/0/16 comes back up, then node0’s priority will go back up to 200. Then node0 will take back ownership of RG1.

Manual Failover

There is a pretty good chance at some point you might need to perform a manual failover of your SRX redundancy groups. Maybe you need to do some maintenance or upgrades, or you just want to make sure failover works as you expect. In either case, the commands to do this are pretty straightforward:

root@testsrx> show chassis cluster status
Cluster ID: 5
Node       Priority       Status       Preempt       Manual failover
Redundancy group: 0 , Failover count: 0
node0      200            primary      no            no 
node1      50             secondary    no            no

Redundancy group: 1 , Failover count: 0
node0      200            primary      yes           no 
node1      50             secondary    yes           no

root@testsrx> request chassis cluster failover redundancy-group 1 node 1 

root@testsrx> show chassis cluster status 
Cluster ID: 5
Node       Priority       Status       Preempt       Manual failover
Redundancy group: 0 , Failover count: 0
node0     200             primary      no            no 
node1     50              secondary    no            no

Redundancy group: 1 , Failover count: 1
node0     200             secondary    yes           yes
node1     255             primary      yes           yes

Okay - so let’s talk about a few things that have happened here. I always recommend that you run a show chassis cluster status first, so you know where things already stand. Then we can proceed by requesting a failover. To do this, you have to specify which redundancy group you want to fail over, and which node you want to become the new primary. So in this case, we made node1 the new primary of RG1.
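If you want to check the state from a script before and after a failover, here is a minimal Python sketch that parses output shaped like the show chassis cluster status text above. The function names are my own, and the column layout is an assumption based on the sample in this post - other JunOS releases may print extra columns, so treat it as a starting point rather than a finished tool.

```python
import re


def parse_cluster_status(text):
    """Return {rg_number: {node_name: {"priority": int, "status": str}}}."""
    groups = {}
    current = None
    for line in text.splitlines():
        rg = re.match(r"Redundancy group:\s*(\d+)", line)
        if rg:
            current = int(rg.group(1))
            groups[current] = {}
            continue
        # Node rows look like: "node0  200  primary  no  no"
        node = re.match(r"(node\d+)\s+(\d+)\s+(\S+)", line)
        if node and current is not None:
            groups[current][node.group(1)] = {
                "priority": int(node.group(2)),
                "status": node.group(3),
            }
    return groups


def primary_node(groups, rg):
    """Return the name of the primary node for a redundancy group, or None."""
    for name, info in groups[rg].items():
        if info["status"] == "primary":
            return name
    return None


# Sample copied from the post-failover output shown above.
SAMPLE = """\
Cluster ID: 5
Node       Priority       Status       Preempt       Manual failover
Redundancy group: 0 , Failover count: 0
node0     200             primary      no            no
node1     50              secondary    no            no

Redundancy group: 1 , Failover count: 1
node0     200             secondary    yes           yes
node1     255             primary      yes           yes
"""

status = parse_cluster_status(SAMPLE)
print(primary_node(status, 0))  # node0
print(primary_node(status, 1))  # node1
```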

You might also notice that the priorities have changed, and the devices are marked as being in a manual failover state. This is important, because you cannot manually fail back until you reset this state. That’s right - if you tried to run the failover command again to move RG1 back to node0, it would not work. An automatic failover due to hardware failure or interface monitoring will still be permitted. In order to perform a manual fail-back to node0, we have to run the following reset command:

root@testsrx> request chassis cluster failover reset redundancy-group 1

Hopefully between last week’s post and this one, you should have a good handle on the basics of configuring a chassis cluster on your new pair of Juniper SRX firewalls. Let me know in the comments below if this helped you!

SRX Basics: Clustering

Tue, 11 Jul 2017 08:00:07 +0000

So you just unboxed a brand new pair of Juniper SRX firewalls - now what? Well, the first thing you’re likely going to want to do is get the two devices hooked up and clustered together. That should be pretty simple, right? Yeah, mostly - though there are a few variations between device models, and there are a few fine-print steps that might keep you from getting everything working the first time.

So let’s take a look at what we need to do!

Physical configuration

First thing we need to do is get both devices unboxed and cabled appropriately. In order to get a successful cluster configured, we will need to get two critical ports connected: the HA control port and the cluster fabric port. Technically only the HA control port is required to get a cluster working, but you’ll want to get the fabric port working as well - here is what both ports are used for:

HA Control Port - This is used for communication between the cluster members. This connection is used just for control-plane stuff - like keepalives/heartbeats and config sync between the two nodes.

Fabric Port - This port is used for data sync between the cluster members. All routing/firewall state information is synced using this port, and any cross-cluster traffic is also transferred using this port. For example, if one SRX was primary for a redundancy group, but the secondary was the active BGP speaker for your upstream connection, then the traffic would come in through the secondary and cross this link to reach the primary for that RG.

The fabric port is the easiest to connect - because you can use any port you like, then specify which port to use in the CLI. The control port, however, must be the assigned port that Juniper allocates for this use. Unfortunately, this port varies between device models. The two most common SRXs that I’ve deployed are the 345 and 1500. The SRX 1500 has a dedicated 10G HA control port, but the SRX 345 actually uses ge-0/0/1 on both nodes for this. Juniper lists what all those port assignments are over on this page.

Once those ports are connected, go ahead and power on both devices!

JunOS Config

Okay - Once the physical configuration has been completed, there are a few things that need to be configured on both devices before you can establish a cluster.

When you first boot each device, you’ll log in with root and no password. Then you’ll be dropped into the JunOS shell, and you’ll need to type cli to start the JunOS command-line interface. Then type configure to get into the configuration mode.

root@testsrx% cli
root@testsrx> configure
root@testsrx#

In the config mode, we’ll need to set a root password before we can enable the clustering. This password must match on both devices!

root@testsrx# set system root-authentication plain-text-password
New Password: 
Retype new password:

For the SRX 1500 series, where there is a dedicated HA Control port, this is enough to get the cluster working. But for some of the branch SRXs, like the 300 series, you’ll need to make a few additional changes. These devices come with a default config, which includes IP addresses on certain interfaces. Unfortunately, this will conflict with your cluster config and will not allow your cluster to reach a healthy state.

In my config, I already plan on re-configuring all of the interfaces and security-zones to fit my needs - so I will just delete those entire config sections:

root@testsrx# delete interfaces
root@testsrx# delete security

After all that is done, we need to commit our changes:

root@testsrx# commit

Finally we can go ahead and set up the cluster! This config is actually done outside of configure mode, so you will need to exit that.

So one thing to note here - each cluster will be configured with a cluster-id. This MUST be unique within any layer 2 broadcast domain. So if we had multiple SRX clusters within a single broadcast domain, we would need to assign each one a different cluster ID. I’ll use cluster-id 5 in this example.

On whichever SRX you want to be the primary node:

root@testsrx# exit
root@testsrx> set chassis cluster cluster-id 5 node 0 reboot

I personally like to give the primary a minute or two into the boot process before I configure the secondary, but we’ll do so with a similar command (just specifying node 1 instead of 0):

root@testsrx> set chassis cluster cluster-id 5 node 1 reboot

After both nodes come back online, log into node 0 and run the following command:

root@testsrx> show chassis cluster status 
Cluster ID: 5
Node       Priority      Status      Preempt      Manual failover

Redundancy group: 0 , Failover count: 0
node0      100           primary     no           no 
node1      100           secondary   no           no

Redundancy group: 1 , Failover count: 0
node0      100           primary     yes          no 
node1      100           secondary   yes          no

Perfect! Now let’s go configure our fabric ports! Interface fab0 will be configured as the fabric port on node0, and interface fab1 will be configured as the fabric port on node1.

root@testsrx> configure
root@testsrx# set interfaces fab0 fabric-options member-interfaces ge-0/0/10
root@testsrx# set interfaces fab1 fabric-options member-interfaces ge-5/0/10
root@testsrx# commit

Now that we’re in a cluster, all of this configuration can be done on node0 - but note that in this case the secondary device’s ports all start with ge-5/x/x. This is another oddity of JunOS - that numbering scheme isn’t always the case. On the SRX1500s, the node1 ports all start with ge-7/x/x - so this will vary depending on what devices you’re working with. If you ever need to check this - you can run show interfaces terse to list all interfaces in the cluster.
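To make the renumbering concrete, here is a small Python sketch that translates a port name as printed on the chassis into the name it gets inside the cluster. The offset table only contains the two values mentioned in this post (5 for the SRX 345, 7 for the SRX 1500), and the function and dictionary names are my own - check Juniper's documentation for any other model.

```python
# Offset added to the first number of the interface name on node1.
# Only the two models discussed in this post are listed (assumption:
# node0 keeps its original numbering).
NODE1_SLOT_OFFSET = {"srx345": 5, "srx1500": 7}


def cluster_interface_name(model, node, local_name):
    """Map a standalone port name (e.g. ge-0/0/10) to its in-cluster name."""
    prefix, rest = local_name.split("-", 1)
    slot, pic, port = (int(part) for part in rest.split("/"))
    if node == 1:
        slot += NODE1_SLOT_OFFSET[model]
    return f"{prefix}-{slot}/{pic}/{port}"


print(cluster_interface_name("srx345", 0, "ge-0/0/10"))   # ge-0/0/10
print(cluster_interface_name("srx345", 1, "ge-0/0/10"))   # ge-5/0/10
print(cluster_interface_name("srx1500", 1, "ge-0/0/10"))  # ge-7/0/10
```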

As a final verification that all our ports are up, drop out of config mode and run show chassis cluster interfaces:

root@testsrx> show chassis cluster interfaces
Control link 0 name: ge-0/0/1
Control link status: Up

Fabric interfaces:
Name Child-interface Status
fab0 ge-0/0/10 up
fab0
fab1 ge-5/0/10 up
fab1
Fabric link status: up

Hooray! We now have a functioning SRX cluster!

Sometimes if this doesn’t work, the output of show chassis cluster status will show the secondary node as disabled or lost. I’ve found that lost usually indicates a conflicting configuration on the cluster interfaces (like leaving the default IPs configured). If you see disabled, try rebooting the secondary node again - and if that doesn’t work, then you may need to disable clustering on both nodes and re-configure. This can be done using the set chassis cluster disable reboot command.

Next week, we’ll look at redundancy-groups, performing manual failovers, and setting up interface monitoring for automatic failovers. Hope this was helpful!

Quick Tips for Better BGP

Tue, 02 May 2017 10:26:42 +0000

A while back I wrote some basic information on how to get started implementing multi-homed internet using BGP. The details and configurations listed in that post are enough to get the connection up and running - but not quite in an ideal state. So today I want to share some quick tips that will help you maintain a better and more secure BGP connection.

Securing your BGP peering (Know who you’re connecting to)

BGP is a little different from most other routing protocols, since it uses a single unicast TCP connection between peers to exchange routing updates. Lucky for us, that means we can easily restrict that traffic to known peers. Once you have direct connectivity up between your edge router/firewall and your direct peer, lock down that connection with an ACL. Permit TCP port 179 traffic ONLY from your directly connected peer IP - no one else.

While you’re at it, let’s take it another step further: Request that your ISP set up BGP authentication. Sure, a majority of BGP implementations today still require use of MD5 for auth (which is terrible) - but some authentication is still better than none. This can usually be arranged at the time of turning up peering. Both sides configure the same authentication password and with any luck the peering still establishes.

BGP by nature is unfortunately not the most secure protocol - but a few simple steps like this will help ensure you’re only connecting out to authorized peers.

Route filtering (Don’t trust anyone)

Usually when you’re filling out the BGP peering paperwork for your service provider, they will ask you what kinds of routes you want. In most cases, you should be able to request one of the following:

Default only - Exactly what it sounds like. Your provider will only advertise a route for 0.0.0.0/0. In many cases, this is probably what you’re going to want. With this type of advertisement, each upstream provider will just give us the same default route to the internet. From there we can weight which one we want to use, and traffic will automatically fail-over to the secondary connection should the primary fail.

Partial - If for any reason you want to weight routes to certain destinations differently, then we might request this. In this case, you’re probably going to still receive 0.0.0.0/0 plus any specific routes you ask for. A good example of this is if we wanted to specifically manipulate routes for a remote office we have. Maybe we want to weight Internet traffic for one uplink, and VPN traffic to a remote office on the other uplink.

Full - In 99% of typical business cases, this won’t be required. This option means the upstream providers will be dumping the entire Internet routing table on you. While this offers you a ton of control over path manipulation, it also requires significant memory resources on your routers in order to maintain that routing table.

After we figure this out, the next step is to make sure we are filtering the routes we accept from the upstream provider. Wait - didn’t we just tell them exactly what routes to send us? Why do we need to filter them? Well you can never be too safe here - and we would rather perform some unnecessary filtering than have an ISP accidentally misconfigure route advertisements. So if you’re only expecting a route for 0.0.0.0/0, then filter your inbound route advertisements so you only accept that route.

Same thing goes for outbound route advertisement - if we own a /24 of public IP space, then we only want that range to be advertised out. Some providers may already filter this on their end, but again it doesn’t hurt here to be extra cautious. If we are accepting anything other than a default route from our provider, then we run the risk of leaking those additional routes between the two providers - which would lead to inadvertently becoming a transit AS. Chances are pretty good that you don’t want that, so make sure you configure filtering for all outbound route advertisements.
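Both filters boil down to an exact-match allow-list. The Python sketch below only illustrates that logic - it is not router configuration, the function name is my own, and the prefixes come from the reserved documentation ranges, with 203.0.113.0/24 standing in for "our" /24.

```python
import ipaddress


def exact_match_filter(prefixes, allowed):
    """Split prefixes into (accepted, rejected) using an exact-match allow-list."""
    allowed_nets = {ipaddress.ip_network(p) for p in allowed}
    accepted, rejected = [], []
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        (accepted if net in allowed_nets else rejected).append(str(net))
    return accepted, rejected


# Inbound: we asked for default-only, so accept nothing but 0.0.0.0/0.
inbound_ok, inbound_dropped = exact_match_filter(
    ["0.0.0.0/0", "198.51.100.0/24"], allowed=["0.0.0.0/0"]
)

# Outbound: advertise only our own block, so a route learned from the other
# provider can never leak out and turn us into a transit AS.
outbound_ok, outbound_dropped = exact_match_filter(
    ["203.0.113.0/24", "198.51.100.0/24"], allowed=["203.0.113.0/24"]
)

print(inbound_ok, inbound_dropped)    # ['0.0.0.0/0'] ['198.51.100.0/24']
print(outbound_ok, outbound_dropped)  # ['203.0.113.0/24'] ['198.51.100.0/24']
```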

Minimum Advertisement (Oh no, we have to re-address everything)

I mentioned this in the original post - but typically when you are peering with two separate upstream providers, you need to advertise nothing smaller than a /24. We ran into this at my last job, where we had been provided a /25 by AT&T but we needed to bring in a second carrier via BGP. The reasoning behind this is to keep global routing tables as small as possible, by not allowing them to end up flooded with a ton of routes for smaller subnets. It makes sense, but on the other hand I feel like requiring a /24 in all cases can be a bit wasteful. My last job only required maybe 30 publicly addressable hosts - which meant that the remaining addresses went unused.
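As a quick sanity check before you fill out the peering paperwork, the /24 rule can be expressed in a couple of lines of Python. The limit of 24 is the rule of thumb from this post rather than something every provider enforces identically, and 203.0.113.0 is a documentation prefix used as a stand-in.

```python
import ipaddress

# Assumption: providers reject IPv4 prefixes longer than /24.
MAX_PREFIX_LEN = 24


def is_advertisable(prefix, max_len=MAX_PREFIX_LEN):
    """True if the prefix is no smaller than the minimum accepted size."""
    return ipaddress.ip_network(prefix).prefixlen <= max_len


print(is_advertisable("203.0.113.0/25"))  # False - too small, like our old /25
print(is_advertisable("203.0.113.0/24"))  # True
print(ipaddress.ip_network("203.0.113.0/24").num_addresses)  # 256
```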

At any rate - should you find yourself in this scenario then you’re going to have to face the inevitable: Renumbering into a new IP space. Any time you have to do this, it’s going to be a bit of a pain - but for external addressing like this it might be easier. So in our case, the entire /25 space was hosted on our external firewall then NAT’ed into DMZ servers.

Here are the quick steps that I used to do a side-by-side migration without taking any significant downtime:

Get the new subnet up and running - assign the interface addresses on your firewall and get BGP up and running
Assign new IP addresses to all of your existing services
Configure NAT rules for the new external IP addresses to the DMZ hosts - while leaving the existing NAT rules for the old subnet (Also make sure your firewall rules permit the same traffic to either IP)
Migrate DNS entries externally to point to the new IP space
Once traffic stops flowing to the old IP, remove the old NAT

As a side note - if you procure redundant internet connections through the same upstream provider, then you might be able to work out something else. They may be able to provide you a private ASN to use, and they will likely accept any minimum advertisement - since they will be summarizing upstream within their network anyways.

I had a few more things I originally intended to cover here - but it seems that these topics are filling way more space than I thought they would. Specifically, I’m thinking about a dedicated post to BGP path manipulation - which is probably something you’re going to want to implement after peering is established. Hopefully these tips help! If you have any questions, throw them in the comments below.

BGP: Getting Started with Multi-homed Internet

Tue, 10 Jan 2017 08:00:17 +0000

A few years back I worked for an organization that had a single 100Mb Internet connection. Not bad for just typical corporate traffic, but we also hosted our production web site out of that location as well. An incident occurred where our website was down due to Internet issues during an extremely inconvenient time. So we decided to procure a second Internet uplink through a different provider. At the time, I had no practical experience doing something like this - yet I was put in charge of the project. Let’s go over some of what I learned…

The easy part of the whole process is the first step - ordering a second Internet connection. Our CIO at the time placed a few calls and had a quote back pretty quickly. A local carrier was willing to run new fiber cables to our building in less than a month. Depending on how important uptime is to your organization, this is the point where you might want to ask about a diverse path into the building. If both connections run through the same physical paths, then a single incident could still cause an outage. For example - I once worked somewhere where the redundant Internet connections shared the same telephone pole across the street. So even though the connections were redundant, a single accident involving that telephone pole severed both of them.

Next - Ask about IP space. In terms of IPv4, the general rule for external BGP peering is that ISPs don’t like to accept any prefixes smaller than a /24. In our case, we had a single /25 block already allocated by our current provider - which wasn’t going to work. Luckily, the new service provider offered to give us a free /24 block along with the installation costs. Unfortunately, this meant that we had to re-address all of our public-facing services, which is almost always a pain to do. I have a few tips for this, which helped us to minimize downtime - but that’s a story for another time.

Next, we need to obtain a globally unique Autonomous System (AS) number, which will be used to advertise our network to the world. Since we were located in North America, we went through ARIN for this process - which was fairly painless. Sign up for an account, prove that you’re associated with the business, fill out a few forms to justify your need, and then just wait for the approval. One thing to watch out for is 2-byte vs 4-byte AS numbers. 2-byte is the standard and has been around forever, but only allows for up to 65,535 unique IDs. A 4-byte ASN allows for significantly more unique IDs, but I have actually run into instances where an ISP doesn’t support these. I would hope that in most cases a 4-byte ASN will be just fine, but it might be worth asking your ISP just in case.
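If you aren't sure which kind of ASN you've been assigned, the boundary is simply whether the number fits in 16 bits. A minimal Python sketch (the function name is my own, and the example numbers come from the ranges reserved for documentation):

```python
def asn_kind(asn):
    """Classify an AS number by the field width needed to carry it."""
    if not 0 <= asn <= 2**32 - 1:
        raise ValueError(f"{asn} is not a valid AS number")
    return "2-byte" if asn <= 2**16 - 1 else "4-byte"


print(asn_kind(64500))  # 2-byte
print(asn_kind(65550))  # 4-byte
```

Anything that comes back as 4-byte is worth a quick email to your ISP before turn-up day.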

At this point, you should be ready to hit the ground running as soon as that second Internet uplink is installed. This is also assuming you already run a router or multilayer switch on the edge of your network, which also has BGP capabilities. So let’s get down to the fun stuff - an extremely basic configuration to peer with two ISPs. I’ll dedicate another post to additional recommended settings and configurations - but for now let’s focus on getting this running. The configuration sample below is aimed at Cisco devices, but the same concepts apply to most vendors:

EdgeRouter(config)# router bgp <local-asn>   ! The AS number provided by ARIN
EdgeRouter(config-router)# network <public-subnet>   ! The subnet we need to advertise out both ISPs
EdgeRouter(config-router)# neighbor <isp1-peer-ip> remote-as <isp1-asn>   ! Provided by the first ISP - Their remote peer IP and ASN
EdgeRouter(config-router)# neighbor <isp2-peer-ip> remote-as <isp2-asn>   ! Provided by the second ISP

As I mentioned, this config is very basic and will just accomplish what we need to get going. Follow up with a quick show ip bgp neighbors and hopefully you’ll see two peers in the established state. Any other state indicates a problem bringing up the peer connection. I won’t get into too much detail here - but check the physical connection, ping the peer, and make sure there are no firewalls blocking TCP port 179 between the peer addresses.

Hope this was helpful! Comment below and let me know how your experiences have gone with this type of setup - and look forward to a few more posts regarding BGP peering setup with multiple ISPs.

IP Address Design (Part 2)

Tue, 03 Jan 2017 08:00:02 +0000

Last week in IP Address Design (Part 1) we discussed an example of a bad design for IP allocations and the problems that it caused. This week we will continue by discussing the proposed solution and how it resolved those issues.

The problems with our IP Addressing scheme bothered me quite a lot - especially because IP Addressing design doesn’t really seem to be something you can easily go back and fix. We are in a somewhat unique case since we often open new locations, which is a perfect opportunity to make a positive change going forward. About a year ago, I heard that we would be opening four new data center locations in the near future. So I finally sat down and figured out a new scheme, which ultimately we deployed to all new locations.

My first goal was to start making more proper use of address space, while still making it somewhat easy to remember. As I stated in the last post, our largest data center was only using about 4,000 addresses. I began the design by trying to figure out a good starting point. A single /16 is probably still too large, but if I split up a /16 into two /17s then people will get confused about where a subnet lives. Remember that we were migrating from a very simple scheme in the past, where the second octet dictated the network location. So for the sake of simplicity, I started the design using a single /16 per data center.

Next, I needed to split up that /16 into classless subnets which could be routed in a somewhat meaningful fashion within the data center. In also trying to keep human usability in mind, I decided to split the main /16 assignment into two /17s. The top /17 subnet would be designated to all edge subnets, like the DMZ and Out of Band Management - both of which were directly terminated off of the external firewall set. The bottom /17 would be designated for all internal, protected subnets. This included anything behind the internal firewall set, like our primary internal network and some of the new isolated network segments we had built.

So here is the final scheme:

10.15.0.0/16 - Overall data center allocation

10.15.0.0/17 - Edge subnets
- 10.15.0.0/18 - Main DMZ (10.15.0.0-10.15.63.255)
- 10.15.64.0/21 - Out of band management (10.15.64.0-10.15.71.255)
- 10.15.72.0/21 - Misc DMZ VLAN (10.15.72.0-10.15.79.255)
- 10.15.80.0/20 - Unused (10.15.80.0-10.15.95.255)
10.15.128.0/17 - Internal subnets
- 10.15.128.0/18 - Main Internal subnet (10.15.128.0-10.15.191.255)
- 10.15.192.0/22 - Protected subnet 1 (10.15.192.0-10.15.195.255)
- 10.15.196.0/22 - Protected subnet 2 (10.15.196.0-10.15.199.255)
- 10.15.200.0/21 - Unused (10.15.200.0-10.15.207.255)
- 10.15.208.0/20 - Unused (10.15.208.0-10.15.223.255)
- 10.15.224.0/19 - Unused (10.15.224.0-10.15.255.255)
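Address ranges like these are easy to get wrong by hand, so it is worth deriving them rather than typing them. Here is a short sketch using Python's standard ipaddress module: it prints the first and last address of a block and checks that every assigned block sits inside its parent /17 without overlapping a sibling. The helper names and the table layout are my own.

```python
import ipaddress

PARENTS = {
    "edge": ipaddress.ip_network("10.15.0.0/17"),
    "internal": ipaddress.ip_network("10.15.128.0/17"),
}

# (parent, description, CIDR) for the assigned blocks in the scheme above.
BLOCKS = [
    ("edge", "Main DMZ", "10.15.0.0/18"),
    ("edge", "Out of band management", "10.15.64.0/21"),
    ("edge", "Misc DMZ VLAN", "10.15.72.0/21"),
    ("internal", "Main Internal subnet", "10.15.128.0/18"),
    ("internal", "Protected subnet 1", "10.15.192.0/22"),
    ("internal", "Protected subnet 2", "10.15.196.0/22"),
]


def address_range(cidr):
    """Return 'first-last' for a CIDR block."""
    net = ipaddress.ip_network(cidr)
    return f"{net.network_address}-{net.broadcast_address}"


def validate(blocks, parents):
    """Raise ValueError if a block escapes its parent or overlaps another."""
    nets = [(parent, ipaddress.ip_network(cidr)) for parent, _, cidr in blocks]
    for parent, net in nets:
        if not net.subnet_of(parents[parent]):
            raise ValueError(f"{net} is outside {parents[parent]}")
    for i, (_, a) in enumerate(nets):
        for _, b in nets[i + 1:]:
            if a.overlaps(b):
                raise ValueError(f"{a} overlaps {b}")
    return True


for _, name, cidr in BLOCKS:
    print(f"{cidr:<18} {name:<24} {address_range(cidr)}")

print(validate(BLOCKS, PARENTS))       # True
print(address_range("10.15.80.0/20"))  # 10.15.80.0-10.15.95.255
```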

Now the first thing you may notice is that there is a large amount of unused IP space - but I’m accepting that as potential for future growth. Even the large /18 allocations will allow for over 16,000 hosts, which may be more than we will need in the foreseeable future. However, as I mentioned earlier I needed to balance conservation and efficiency with human readability.

So how does this help some of our problems? We’ve already addressed the problem of IP exhaustion by dropping each data center to a single /16 subnet rather than several /16s. Routing tables are immensely simplified now due to summarization. Oh, I need a route to that other data center? Sure, now it is only a single /16 route to the VPN peer for that location. Once the traffic gets over to that local network, then we can worry about trying to route the individual allocations within there. Even then, within the data center I only need a handful of small routes. The external firewall can point the whole 10.15.128.0/17 subnet to the internal firewall set and let it handle routing from there. And finally - that pesky problem of exponential VPN tunnels. Now that each data center has a single /16, we only have to create a single tunnel between two locations which saves us a ton of valuable CPU on the VPN gateways.
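To see the summarization benefit in numbers, here is a rough Python sketch using the standard ipaddress module. It collapses the two /17 halves of the new scheme into a single route, and for contrast tries the same thing with the four legacy blocks from Part 1, which can't be combined at all. The variable and function names are my own.

```python
import ipaddress

# New scheme: the two halves of a data center allocation.
new_scheme = [
    ipaddress.ip_network("10.15.0.0/17"),
    ipaddress.ip_network("10.15.128.0/17"),
]

# Legacy scheme: the four standard blocks listed in Part 1.
legacy_scheme = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.1.0.0/16", "10.11.0.0/16", "10.111.0.0/16", "10.211.0.0/16")
]

new_routes = list(ipaddress.collapse_addresses(new_scheme))
legacy_routes = list(ipaddress.collapse_addresses(legacy_scheme))

print(new_routes)          # [IPv4Network('10.15.0.0/16')]
print(len(legacy_routes))  # 4


def route_covers(route, host):
    """True if a single route entry covers the given host address."""
    return ipaddress.ip_address(host) in ipaddress.ip_network(route)


# One /17 route on the external firewall reaches every internal subnet.
print(route_covers("10.15.128.0/17", "10.15.196.20"))  # True
```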

Now, obviously these benefits only apply to locations where the new IP addressing scheme is the only addressing scheme. For connections back to a legacy data center, we would still have a single /16 on one side of the VPN while the other side had 4-6 /16 subnets. Even so, the VPN tunnels required for that configuration are significantly fewer than before. So to wrap this up, the design was proposed to the team and we decided to go with it for the four new data center builds. It is working quite well so far - and we are beginning to have conversations on back-porting this design to the legacy data centers (which will be another post for another time).

Have you ever had to re-design an IP addressing scheme? Or have you ever been bothered by the current design and wished you could change it? Comment with your thoughts!

IP Address Design (Part 1)

Tue, 27 Dec 2016 08:00:37 +0000

It’s funny when you think about basic networking concepts and wonder if they will ever actually prove to be useful. Kind of like that “Do I really need to learn complex geometry? When am I ever going to use this?”. What I’m here to talk about today is IP Addressing design. In many cases this will be something that is already in place and fairly solid, so there won’t be much to think about. This was the case at every company I worked at until the most recent one, which is a local cloud service provider. The type of architecture required for this environment is a bit different from what I’ve previously worked with.

So here is my first architecture tip:

No matter how small your organization is today, think about how your proposed design might look 5-10 years down the road.

The problem that I ran into here was that this cloud provider was still using an IP addressing design which was originally designed for a different set of needs. The design was intended to support the business back when we had two data centers and no one thought we would expand. Well, today we have over a dozen locations and there are constant discussions about adding more.

Let’s start with the original design, why it was a good idea, and why it doesn’t scale well today. Every data center location was assigned a few standard blocks of IP Addresses, where each block corresponded to a logical network location. The 10.0.0.0/8 space was used for this, and broken into the following blocks:

10.1.0.0/16 - Reserved
10.11.0.0/16 - DMZ
10.111.0.0/16 - Out of band Management
10.211.0.0/16 - Internal network

This was the bare minimum that each location received; in some cases another /16 or two might be allocated. So first, let’s cover the reasons why this was a good design for the time. All subnets were terminated at classful boundaries, which means there was never confusion on a subnet mask. The association of the second octet to network region made the subnets easy to remember - it was quick for anyone to say “10.2xx? Oh yeah that’s an internal segment”. Also, with a minimum of four /16 blocks, we would practically never run out of IP space in each location (>260,000 usable addresses). All that being said, the addressing scheme was perfect for what it was designed for: Easy to be read and remembered by humans.

While that may have been great for two data center locations, it doesn’t really scale well about eight years later. So let’s take a look at why this design doesn’t work in the long run. With the number of locations we have today, we are left with only ~40 unused /16 blocks in the 10.x.x.x space. That means we have room for ten or fewer new locations before we completely exhaust that IP space. Next, after some quick research it turns out that even our largest location was only consuming about 4,000 addresses - not even 2% of the total addresses allocated. Routing tables in each data center were a nightmare, because each location had to have several discontiguous /16 blocks routed back to it. And to top it all off - it turns out that our site-to-site VPN tunnel architecture between locations was configured to use subnet-pair tunnels. This meant that for each pair of data centers (4 /16s per site), there would be 16 VPN tunnels. While 16 isn’t a lot, the total grows quadratically as we add more locations, since they are all configured for full-mesh VPN connectivity.
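To put some rough numbers on that tunnel growth, here is a small Python sketch. It assumes a full mesh where every subnet at one site gets its own tunnel to every subnet at the other site. The site count of 12 is just an illustrative stand-in for "over a dozen", and the function names are my own.

```python
def site_pairs(sites):
    """Number of site-to-site pairs in a full mesh."""
    return sites * (sites - 1) // 2


def subnet_pair_tunnels(sites, subnets_per_site):
    """Total tunnels when each subnet pair between two sites needs its own tunnel."""
    return site_pairs(sites) * subnets_per_site ** 2


print(subnet_pair_tunnels(2, 4))   # 16
print(subnet_pair_tunnels(12, 4))  # 1056
print(subnet_pair_tunnels(12, 1))  # 66
```

The last line shows what happens when each site is described by a single block instead of four: the same twelve-site mesh needs one tunnel per site pair.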

I’m trying to keep these posts somewhat manageable - so look for a continuation of this post next week, where I’ll discuss the solution to this problem and how we implemented it.