VXLAN is Now RFC 7348

Last August, the Internet Engineering Task Force (IETF) consolidated roughly three years of work into a Request for Comments (RFC) document. Although still a young virtualization technology, Virtual eXtensible Local Area Networks (VXLANs) are already shaping how cloud networks function and are quickly becoming a fundamental building block of Software-Defined Networking (SDN).

As I explored in Chapter 15 of my book (“Data Center Virtualization Fundamentals”), VXLANs were first invented to provide Layer 2 communication between Virtual Machines (VMs) over IP networks. Nevertheless, their flexibility can also be used to implement specialized data center fabrics such as Cisco Application Centric Infrastructure (ACI).

The document was written by employees from Cisco Systems, Storvisor, Cumulus Networks, Arista, Broadcom, VMware, Intel and Red Hat and can be downloaded here: https://datatracker.ietf.org/doc/rfc7348/?include_text=1

Here are some of its highlights:

  • The RFC has achieved informational status, meaning that it should not be considered a mandatory standard but simply the publication of an experience. The IETF’s motto in these cases is “rather document than ignore” according to RFC 1796 (“Not All RFCs are Standards”), which, funnily enough, is also informational.
  • According to the RFC, VXLAN overcomes the following challenges: STP limitations, VLAN range size, multi-tenancy in cloud computing environments, and inadequate table sizes at Top-of-Rack switches.
  • The document focuses on a data plane learning scheme as the control plane for VXLAN, meaning that the association of a MAC address to a VTEP’s IP address is discovered via source MAC address learning.
  • Multicast is used for carrying unknown destination, broadcast, and multicast frames. VTEPs use (*,G) joins for that purpose.
  • The RFC does not rule out other control plane options for MAC/VTEP learning, such as Nexus 1000V’s Enhanced VXLAN.
  • To ensure traffic delivery without fragmentation, it is recommended that the MTUs across the physical network are set to a value that accommodates the larger frame size due to the VXLAN encapsulation.
  • The Internet Assigned Numbers Authority (IANA) assigned the value 4789 to VXLAN’s destination UDP port. It is recommended that the source port be calculated using a hash of the inner Ethernet frame’s headers to provide entropy on networks that deploy traffic load-balancing methods such as ECMP. Also, the UDP source port should be within the dynamic/private port range (49152-65535).
  • VXLAN gateways are defined as elements that can forward traffic between VXLAN and non-VXLAN environments.
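
Two of the highlights above can be illustrated with a few lines of Python: the extra bytes behind the MTU recommendation and the hashed UDP source port. This is only a minimal sketch, not code from the RFC. The function names are hypothetical, the underlay is assumed to be IPv4, and CRC32 is merely a stand-in for whatever hash a real VTEP uses.

```python
import zlib

# Bytes added in front of the inner IP packet (IPv4 underlay assumed).
INNER_ETHERNET = 14   # inner Ethernet header carried inside the tunnel
INNER_DOT1Q = 4       # optional inner 802.1Q tag
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20
VXLAN_OVERHEAD = INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4  # 50 bytes

VXLAN_DST_PORT = 4789              # IANA-assigned destination port
PORT_MIN, PORT_MAX = 49152, 65535  # dynamic/private range for the source port


def required_underlay_mtu(inner_mtu: int = 1500, inner_vlan_tag: bool = False) -> int:
    """IP MTU the physical network needs so encapsulated packets are not fragmented."""
    extra = INNER_DOT1Q if inner_vlan_tag else 0
    return inner_mtu + VXLAN_OVERHEAD + extra


def vxlan_source_port(inner_headers: bytes) -> int:
    """Map a hash of the inner frame headers into the 49152-65535 range.

    CRC32 is an assumption made for this sketch; the RFC only asks for "a hash".
    """
    span = PORT_MAX - PORT_MIN + 1
    return PORT_MIN + zlib.crc32(inner_headers) % span


# Inner destination MAC + source MAC + EtherType (IPv4), as raw bytes.
inner = bytes.fromhex("005056b058d9" "8843e1c2b4cc" "0800")
print(required_underlay_mtu())      # 1550
print(vxlan_source_port(inner))     # some port between 49152 and 65535
```

Because the source port depends only on the inner headers, all frames of one flow follow the same ECMP path while different flows are spread across the available paths.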

Even with informational status, the publication of VXLAN as an RFC will surely help the convergence of distinct vendor research and development efforts. This is exciting news indeed for network engineers, server virtualization admins, and cloud architects.

Have you already configured your first VXLAN?

Best regards,

Gustavo

Configuring Physical Nexus Switches as VXLAN Layer 2 Gateways

Today, a VLAN can be read as a “villain” in some data centers (pun unashamedly intended). The rate of virtual machine deployment in cloud and scalable server virtualization environments is seriously challenging physical networks in both provisioning speed and sheer capacity.

In my book (“Data Center Virtualization Fundamentals”, Cisco Press 2013), I explained the principles of Virtual eXtensible LAN (VXLAN): basically, a network virtualization technique focused on hypervisor-based server virtualization environments that encapsulates Ethernet frames into UDP datagrams. VXLAN is a hot topic today not because of how it encapsulates Ethernet frames over Layer 3 networks, but because of who does it: the hypervisor itself.

Rather than VLANs, a server virtualization administrator may use VXLANs to provide isolated broadcast domains between VMs, overcoming the following challenges:

  • Defining more than 4094 broadcast domains (you can theoretically provision more than 16M distinct VXLAN segments);
  • Provisioning additional Layer 2 segments without any operations on the physical network;
  • Avoiding MAC address table “explosions” on the physical network due to an extreme number of virtual machines. With VXLANs, only the MAC addresses of the VXLAN Tunnel End Points (VTEPs) are learned by the physical switches.
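
The first item in this list comes straight from the width of the identifier fields: a VLAN ID has 12 bits, while the VXLAN Network Identifier (VNI) has 24. The short Python sketch below shows the arithmetic and how a VNI sits inside the 8-byte VXLAN header described in RFC 7348. The function names are hypothetical and exist only for this illustration.

```python
import struct

VLAN_ID_BITS = 12
VNI_BITS = 24

usable_vlans = 2 ** VLAN_ID_BITS - 2   # IDs 0 and 4095 are reserved
vxlan_segments = 2 ** VNI_BITS

FLAG_VNI_VALID = 0x08  # the "I" flag in the first byte of the header


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte header: flags (8 bits), reserved (24), VNI (24), reserved (8)."""
    if not 0 <= vni < 2 ** VNI_BITS:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    _, second_word = struct.unpack("!II", header[:8])
    return second_word >> 8


print(usable_vlans)                        # 4094
print(vxlan_segments)                      # 16777216
print(pack_vxlan_header(10000).hex())      # 0800000000271000
```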

The first Cisco product that offered VXLAN was the Nexus 1000V, using the essential (free) license. As I explained in the book, Nexus 1000V allows the communication of several VMs using IP multicast for BUM (Broadcast, Unknown unicast, or Multicast) traffic exactly as defined in the first VXLAN draft. However, applications are surely not entirely composed of VMs. Physical servers and network appliances (such as firewalls and load balancers) must also communicate with them, and therefore VXLAN Gateways are needed.

Also in the book, I explored Layer 3 gateways, such as CSR 1000V and ASA 1000V. These specialized virtual machines can basically route VXLAN packets from VMs to VLANs and vice versa. Nevertheless, a very interesting question may arise from this discussion: how can a VXLAN-bound VM exchange Layer 2 traffic with a physical server (such as a database server) connected to a VLAN? In other words, a Layer 2 Gateway is needed to “weld” a VLAN+VXLAN pair, providing a single broadcast domain with these two network abstractions.

Last year, Cisco released a virtual service blade for the Nexus 1100 to provide this communication. The gateway’s main focus is to leverage Nexus 1000V’s Enhanced VXLAN, which allows unicast-only communication among VTEPs. Because this will be a topic for a future post, I will focus here on physical Nexus switches that can be used as Layer 2 Gateways.

Figure 1 represents a very simple topology where the Nexus switch is deployed as a Layer 2 Gateway between a server on VLAN 1500 and a virtual machine on VXLAN 10000.


Figure 1: NEXUS switch as a Layer 2 Gateway

This is the relevant (selected) part of the configuration for the physical switch.

 

NEXUS# show running-config
[output suppressed]

! Enabling VXLAN with VLAN-based configuration
feature nv overlay
feature vn-segment-vlan-based
[output suppressed]

! Allowing the advertisement of multicast routes
ip pim rp-address 2.2.2.2 group-list 235.1.1.1/32

! Changing the default VXLAN UDP port (4789) to the Nexus 1000V original port
vxlan udp port 8472

! “Welding” VLAN 1500 with VXLAN 10000
vlan 1500
  vn-segment 10000

! Configuring the virtual L2 interface
interface nve1
  no shutdown
  source-interface loopback0
  member vni 10000 mcast-group 235.1.1.1

! Configuring the physical L2 interface
interface Ethernet1/1
  switchport access vlan 1500
  speed 1000
[output suppressed]

! Physical L3 interface
interface loopback0
  ip address 4.1.1.2/32
  ip router ospf 1 area 0.0.0.0
  ip pim sparse-mode
[output suppressed]

 

As you can see, the configuration somewhat resembles the well-known OTV from Nexus 7000 and ASR 1000. Because VXLAN is also an Ethernet-over-IP overlay, the configuration requires that both protocols be correctly established (interfaces Ethernet1/1 and Loopback0, respectively). After that, the VXLAN processes are enabled (feature commands) and the UDP port is configured for compatibility with Nexus 1000V. Finally, a Network Virtualization Edge (NVE) interface is configured to behave as a VTEP.

A mild curiosity: if you check IANA’s service port numbers, UDP port 8472 is reserved for “Overlay Transport”. Later VXLAN drafts changed that to UDP port 4789 to further distance the protocol from its “humble” origins.

After all machines communicate with each other using ARP requests and replies, the Nexus switch from Figure 1 displays the following MAC address table for VLAN 1500.

NEXUS# show mac address-table vlan 1500
Legend:
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen, + - primary entry using vPC Peer-Link

   VLAN     MAC Address      Type      age     Secure NTFY   Ports/SWID.SSID.LID
---------+-----------------+--------+---------+------+----+------------------
* 1500     0050.56b0.5009    dynamic   100        F    F    nve1 /20.20.20.3
* 1500     0050.56b0.58d9    dynamic   10         F    F    nve1 /20.20.20.3
* 1500     8843.e1c2.b4cc    dynamic   10         F    F    Eth1/1

 

Meanwhile, Nexus 1000V shows the following MAC address table for VXLAN segment 10000.

VSM# show mac address-table bridge-domain segment-10000
Bridge-domain: segment-10000
          MAC Address       Type    Age       Port            IP Address      Mod
--------------------------+-------+---------+---------------+---------------+---
           0050.56b0.5009   static  0         Veth11          0.0.0.0         3
           0050.56b0.58d9   static  0         Veth12          0.0.0.0         3
           8843.e1c2.b4cc   dynamic 1         Eth3/3          4.1.1.2         3

Total MAC Addresses: 3

 

Notice that both devices have entries for all three end hosts. The physical switch sees the MACs on VLAN 1500 and VXLAN 10000, while the Nexus 1000V instance has them on VXLAN 10000 (or bridge domain “segment-10000”). Local MAC addresses are directed to local interfaces and remote addresses point to the remote VTEP’s IP address.
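
To make the logic behind these tables more concrete, here is a toy Python model of a single VLAN/VXLAN pair on a Layer 2 gateway. It is a simplified sketch and not how NX-OS is implemented: the class and method names are hypothetical, and flooding to the local VLAN ports is left out. The MAC addresses, port names, VTEP address, and multicast group are the ones from the outputs and configuration above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MacEntry:
    port: str                      # local port, or the NVE interface for remote MACs
    vtep_ip: Optional[str] = None  # remote VTEP address; None for local MACs


class L2GatewayModel:
    """Toy model of one VLAN/VXLAN pair with data plane (source MAC) learning."""

    def __init__(self, vlan: int, vni: int, mcast_group: str) -> None:
        self.vlan = vlan
        self.vni = vni
        self.mcast_group = mcast_group
        self.mac_table: Dict[str, MacEntry] = {}

    def learn_from_vlan(self, src_mac: str, port: str) -> None:
        """Source MAC seen on a local access port."""
        self.mac_table[src_mac] = MacEntry(port=port)

    def learn_from_vxlan(self, inner_src_mac: str, outer_src_ip: str) -> None:
        """Inner source MAC seen after decapsulation: remember the sending VTEP."""
        self.mac_table[inner_src_mac] = MacEntry(port="nve1", vtep_ip=outer_src_ip)

    def forward(self, dst_mac: str) -> Tuple[str, str]:
        """Return the action and its target for a frame sent to dst_mac."""
        entry = self.mac_table.get(dst_mac)
        if entry is None:
            # Unknown destination: send to the multicast group of the segment.
            return ("flood", self.mcast_group)
        if entry.vtep_ip is None:
            return ("local", entry.port)
        return ("encapsulate", entry.vtep_ip)


gw = L2GatewayModel(vlan=1500, vni=10000, mcast_group="235.1.1.1")
gw.learn_from_vlan("8843.e1c2.b4cc", "Eth1/1")
gw.learn_from_vxlan("0050.56b0.5009", "20.20.20.3")

print(gw.forward("8843.e1c2.b4cc"))  # ('local', 'Eth1/1')
print(gw.forward("0050.56b0.5009"))  # ('encapsulate', '20.20.20.3')
print(gw.forward("ffff.ffff.ffff"))  # ('flood', '235.1.1.1')
```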

Certainly, other aspects of this implementation, such as high availability and the possibility of VLAN extension between data centers, must be discussed further. But I hope that for now these simple principles are enough food for thought.

Please feel free to add your comments below.

Best regards and stay tuned!

Gustavo

CCIE: A Study Strategy

During the discussions that usually happen after my presentations, one of the most frequent questions is: “How did you study for the CCIE (Cisco Certified Internetwork Expert) certification?”

Although this certification has been widely recognized in the IT market for more than 20 years, it amazes me how it is still beset by doubts and exaggerated ideas. The sheer number of CCIEs over these years shows both the growing interest in the certification and how hard it is to actually achieve it:

CCIE-Statistics

(Source: http://www.bradreese.com/worldwide-ccie-count.htm)

Without any doubt, I can say that the greatest satisfactions I have ever had in my career at Cisco came from overcoming this challenge three times (CCIE Routing & Switching in 2002, CCIE Storage Networking in 2007, and CCIE Data Center this month, as I have posted earlier here). Each time, the preparation for the exams ultimately changed the way I studied, worked, and even thought.

Regardless of your intended CCIE specialty (Routing & Switching, Collaboration, Data Center, Security, Service Provider, Service Provider Operations, Voice, or Wireless), I truly believe that there are general good practices that can be used in the exam preparation. Of course, several other ways exist to reach a positive outcome, but here I will describe a methodology developed by Alexandre Moraes, Marcos Yamamoto and me in the early 2000s. At the time, we were systems engineers in Cisco’s Brasilia office, burning nights and weekends together to get certified.

A good analogy for the strategy would be: “train for a marathon and not for a 100-meter race”. Keeping your pace is as important as learning itself. And I can vouch that this strategy was so successful that both Alexandre and I became triple CCIEs, and Marcos became a CCIE and an international marathon runner.

An important note: respecting the Cisco Certification and Confidentiality Agreement signed by all CCIE applicants, I will absolutely not discuss the content or any other information about the exams. As mentioned above, my idea is solely to describe the study strategy I have been using for the last 12 years.

Preparing for the Written Exam

Nothing new here for those who have already studied for a certification exam from Cisco or any other IT vendor. The acronym “RTFB” (meaning, pardon my French, “Read the F… Books”) bluntly summarizes this step. At this stage, it is important that you follow what is being asked in the written test blueprint (visit http://www.cisco.com/go/ccie for these documents).

While studying, try to understand the concepts in depth. Believe me: you’ll need them repeatedly during the preparation for the lab exam.

Another valuable tip: schedule your written exam for no more than a month ahead. As the day approaches, you will be naturally “coerced” to study. And most importantly: even if you do not pass the first time, this first attempt will accurately assess your conceptual weaknesses.

To illustrate the importance of this tip, a little bit of personal history: before 2001, I only scheduled exams when I felt truly prepared for them. However, my manager at the time insisted on this very point and made me schedule the Routing & Switching written exam against my own will. I must say that he was absolutely correct (thanks, Wadih!) and that his lesson was simply “leave your ego aside and focus on the positive result”.

After passing the written test, you will be entitled to set a date for the lab exam. I would suggest that you look for a date four to six months in the future. In most cases it can be pretty difficult to get dates inside this window, but keep on trying: many candidates change their dates, and slots may become available for a few minutes.

Preparing for the Lab Exam

Now the fun really begins! Time planning is crucial for the lab exam study, so aim to get at least 200 hours of laboratory practice before the day of the exam.

For example, if you practice two hours a day on weekdays and eight hours on each day of the weekend, you will reach 26 hours per week. At this pace you will reach 216 hours in two months, which is a particularly good average considering that you will probably need to study a lot of theory during these early months. Again, this is an average figure: each individual obviously has their own pace, strengths and areas for improvement.
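
For those who like to check the math, the short Python sketch below reproduces these figures. It assumes a 60-day period that starts on a Monday, and the function name is just a hypothetical helper for this illustration.

```python
WEEKDAY_HOURS = 2   # Monday to Friday
WEEKEND_HOURS = 8   # Saturday and Sunday


def practice_hours(days: int, start_weekday: int = 0) -> int:
    """Total lab hours after `days` consecutive days (0 = Monday, 6 = Sunday)."""
    total = 0
    for day in range(days):
        weekday = (start_weekday + day) % 7
        total += WEEKEND_HOURS if weekday >= 5 else WEEKDAY_HOURS
    return total


print(practice_hours(7))    # 26 hours in one week
print(practice_hours(60))   # 216 hours in two months
```

With the same assumptions, the 200-hour mark is reached on day 55, which leaves a few days of slack inside the two months.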

It is important that you plan to have access to “real” lab equipment during your study time. I know there are several simulators for these devices, but I have always sought to practice in real environments as similar to the exam as possible (the CCIE website usually describes which devices and software versions are used in the exam). If you do not have equipment available for your immediate use, there are several companies, such as IPExpert and INE, that provide CCIE racks for reasonable prices. Another great tip is to use laboratories from Cisco PEC (Partner E-Learning Connection) for practice on specific topics.

My lab exam preparation was always separated into three main phases of approximately one month each. The first phase is focused on CONTENT, meaning the complete elimination of your doubts about the design, configuration, and troubleshooting of the features required in the lab exam blueprint. During this phase, it can be quite useful to attend a CCIE lab bootcamp. These trainings are extremely good at pointing out your hands-on weaknesses.

Then comes the time to move to the second phase, which is focused on the practice of FULL EXAMS. Your goal now is to learn how to use the content absorbed previously in simulations modeled on the lab exam. In this phase, both your mind and body will become used to the long consecutive hours of hands-on work required for the lab exam.

In the CCIE bootcamp training, you probably received some complete mock exams, while others can be purchased in published books or CCIE workbooks from websites such as www.iementor.com. With at least 5 mock exams in hand, you will be able to practice during the weekends and use the weekdays to resolve the doubts that hindered their execution. Be sure to record the time you took for each mock exam: you will use these times in the next phase. Try to solve problems in all possible ways and pay extreme attention to the mock exam’s text, since it will probably require you to use a specific methodology that may differ from your personal preference.

You may move to the third phase when you have total confidence in all possible tweaks of the features required in the blueprint (at this point I started having nightmares about commands that I had never heard of). The focus now is SPEED and your goal is quite simple: try to retake the mock exams in half the time you used in the second phase. This procedure will force you to develop your own method for faster configuration.

In my case, I used the weekends to practice at least four mock exams and the weekdays to develop and test acceleration techniques, such as copying and pasting from Notepad and memorizing wizards from graphical interfaces. At the end of this phase you will almost become an “IOS configuration printer” and time will no longer be your worst enemy.

Now you’re more than ready for the real thing. Until the exam, it is important that you do not miss a beat: continue to run new simulations and invent new methods to increase your speed. By all means, keep your pace for the ultimate marathon!

The Day of The Exam

Some quick tips for the D-day:

  • Plan to arrive early to the exam site.
  • Have a good breakfast.
  • Prior to the exam, do not do anything you’re not used to, like taking tranquilizers or eating exotic food.
  • Try to relax and trust your skills!

Do Not Give Up

As I have mentioned above, the CCIE certification may test you in ways you are not expecting at all. Of course, it is a great privilege to pass on the first attempt, but this is an extremely rare situation, as most candidates can attest. In my humble opinion, you should not “fight” the certification, complaining to yourself that a given question, topic or troubleshooting task does not reflect best practices. As my first CCIE proctor summarized: “this is not the real world, this is the CCIE lab exam”.

And if you don’t pass on the first attempt, as happened to me, remember that there is ALWAYS room for improvement. Or in the words of the greatest film director of all time, Mr. Stanley Kubrick:

“I’ve never been certain whether the moral of the Icarus story should only be, as is generally accepted, ‘don’t try to fly too high,’ or whether it might also be thought of as ‘forget the wax and feathers, and do a better job on the wings.’”


I sincerely hope this strategy can help you somehow. Good luck and Godspeed!

Gustavo