Friday, 14 August 2015

Logical Routing

By using the NSX platform, we have the ability to interconnect endpoints, whether virtual or physical, deployed in separate logical L2 networks. This is possible because the virtual network is decoupled from the physical one.

In the diagram below we can see a routed topology connecting two logical switches.



By deploying a logical router we can interconnect endpoints, either virtual or physical, that belong to separate L2 domains, or connect endpoints in those L2 domains to devices deployed in the external L3 physical environment. The first type of communication is east-west, while the latter is north-south.


Here, we can see both east-west and north-south routing in a multi-tier application.

Logical Routing Components 

Here we will discuss centralized and distributed routing.

Centralized routing represents the functionality that allows communication between the logical network and layer 3 physical infrastructure.







Here we can see that centralized routing can be used for both east-west and north-south routed communications. However, east-west routing is not optimized in a centralized routing deployment, since the traffic is always hair-pinned from the compute rack towards the edge. This occurs even when two VMs in separate logical networks reside on the same physical host.


The deployment of distributed routing prevents hair-pinning for VM-to-VM routed communication by providing hypervisor-level routing. Each hypervisor installs kernel-level routing information, ensuring a direct communication path even when the endpoints belong to separate IP subnets.


The DLR control plane is provided by the DLR Control VM. This VM supports dynamic routing protocols, such as BGP and OSPF. It exchanges routing updates with the next L3 hop device and communicates with the NSX Manager and the Controller Cluster. HA is provided through an Active-Standby pair of Control VMs.

At the data-plane level there are DLR kernel modules (VIBs) installed on the ESXi hosts. These modules hold a routing information base (RIB) that is pushed to them by the controller cluster, and they perform both route lookups and ARP lookups. The kernel modules are equipped with logical interfaces (LIFs) connecting to the different logical switches or VLAN-backed portgroups. Each LIF is assigned an IP address, representing the default gateway for that logical segment, as well as a vMAC address.
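As a rough sketch (the class and field names here are illustrative, not NSX APIs), the LIF table held by each kernel module can be pictured as a mapping from logical segments to gateway addresses, with every LIF sharing the same vMAC on every host:

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

# The same vMAC is used for every LIF on every host (illustrative value).
VMAC = "02:50:56:56:44:52"

@dataclass(frozen=True)
class LIF:
    """Logical interface of a DLR kernel module (hypothetical model)."""
    name: str        # e.g. "LIF1"
    segment: str     # logical switch the LIF connects to
    subnet: str      # IP subnet of that segment
    gateway_ip: str  # default gateway address for VMs on the segment
    vmac: str = VMAC

def egress_lif(lifs, dst_ip):
    """Return the LIF whose subnet contains dst_ip (a directly connected route)."""
    for lif in lifs:
        if ip_address(dst_ip) in ip_network(lif.subnet):
            return lif
    return None

lifs = [
    LIF("LIF1", "web-ls", "172.16.10.0/24", "172.16.10.1"),
    LIF("LIF2", "app-ls", "172.16.20.0/24", "172.16.20.1"),
]

print(egress_lif(lifs, "172.16.20.11").name)  # → LIF2
```

Because the vMAC is identical on every host, a VM keeps the same default-gateway MAC address even after a vMotion to another host.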





The diagram above shows how all the logical routing components integrate to enable distributed routing.
  1. A DLR instance is created from NSX Manager, either via the UI or an API call, and routing is enabled using the protocol of choice, either OSPF or BGP.
  2. The controller cluster uses its control-plane connection to the ESXi hosts to push the new DLR configuration, including the LIFs with their IP and vMAC addresses.
  3. OSPF/BGP peering is established between the NSX Edge and the DLR Control VM.
  4. The DLR Control VM pushes the IP routes learned from the NSX Edge to the controller cluster.
  5. The controller cluster is responsible for distributing routes learned from the DLR control VM across the hypervisors. Each controller node in the cluster distributes information for a particular logical router instance. 
  6. The DLR Routing kernel modules on the hosts handle the data path traffic for communications to the external network via the NSX edge.
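Step 5's slicing of work across the controller cluster can be sketched as a deterministic mapping of each logical router instance to one controller node; the hash scheme below is purely illustrative, not NSX's actual algorithm:

```python
import hashlib

CONTROLLER_NODES = ["controller-1", "controller-2", "controller-3"]

def owning_node(dlr_instance_id: str) -> str:
    """Map a logical router instance to the controller node that will
    distribute its routes to the hypervisors (illustrative hash scheme)."""
    digest = hashlib.sha256(dlr_instance_id.encode()).digest()
    return CONTROLLER_NODES[digest[0] % len(CONTROLLER_NODES)]

# Every host computes the same mapping, so all hosts agree on which
# controller node serves the routes for a given DLR instance.
print(owning_node("dlr-tenant-a"))
```

The point of such a scheme is that the mapping needs no negotiation: any host, given only the instance ID and the node list, arrives at the same answer.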



The above shows the required steps for routed communication between two virtual machines connected to separate logical segments.

  1. VM1 wants to send a packet to VM2, which is connected to a different VXLAN segment, so the packet is sent to VM2's default gateway interface, located on the local DLR.
  2. A routing lookup is performed at the local DLR, which determines that the destination subnet is directly connected to DLR LIF2. A lookup is performed in the LIF2 ARP table to determine the MAC address associated with VM2.
  3. An L2 lookup is performed in the local MAC address table to determine how to reach VM2; the original packet is then VXLAN encapsulated and sent out to the VTEP of ESXi2.
  4. ESXi2 de-encapsulates the packet and performs an L2 lookup in the local MAC table associated with the given VXLAN segment.
  5. The packet is delivered to VM2. 

Local routing will always take place on the DLR instance running in the kernel of the ESXi host that runs the workload initiating the communication.
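The five steps above can be sketched as a trace of the actions taken on the source host; the tables and names below are illustrative stand-ins for the state the controller pushes to the kernel module:

```python
from ipaddress import ip_address, ip_network

# Illustrative state pushed to the source host's DLR kernel module.
CONNECTED_ROUTES = {"172.16.10.0/24": "LIF1", "172.16.20.0/24": "LIF2"}
ARP_TABLES = {"LIF2": {"172.16.20.11": "00:50:56:aa:bb:02"}}  # per-LIF ARP cache
MAC_TABLE = {"00:50:56:aa:bb:02": "VTEP-ESXi2"}               # MAC -> remote VTEP

def route_east_west(dst_ip):
    actions = []
    # Steps 1-2: routing lookup on the local DLR finds the directly
    # connected egress LIF for the destination subnet.
    lif = next(n for p, n in CONNECTED_ROUTES.items()
               if ip_address(dst_ip) in ip_network(p))
    actions.append(f"routed via {lif}")
    # Step 2 (cont.): ARP lookup on that LIF gives the destination MAC.
    dst_mac = ARP_TABLES[lif][dst_ip]
    # Step 3: L2 lookup maps the MAC to a remote VTEP; encapsulate.
    actions.append(f"VXLAN encap to {MAC_TABLE[dst_mac]}")
    # Steps 4-5: the remote host de-encapsulates and delivers to the VM.
    actions.append("decap + deliver on remote host")
    return actions

print(route_east_west("172.16.20.11"))
```

Note that every step before encapsulation runs on the source host, which is exactly why the hair-pinning of the centralized model disappears.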




Above is an example of ingress traffic from an external network.

  1. A device on the external network wants to communicate with VM1.
  2. The packet is delivered from the external network to the ESXi host running the NSX Edge. The Edge receives the packet and performs a routing lookup.
  3. Both routing lookups, at the Edge level and at the DLR level, are performed locally on ESXi2.
  4. Since the destination is directly connected to the DLR, the packet is VXLAN encapsulated and routed from the transit network onto the correct VXLAN segment.
  5. ESXi1 de-encapsulates the packet and delivers it to VM1.
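The routing lookups in steps 2 and 3 are longest-prefix-match lookups. A minimal version, with a made-up table for illustration, could look like:

```python
from ipaddress import ip_address, ip_network

# Illustrative Edge routing table: prefix -> next hop or connected interface.
ROUTES = {
    "0.0.0.0/0":      "physical-next-hop",  # default route out of the Edge
    "172.16.0.0/16":  "dlr-transit",        # logical space reached via the DLR
    "192.168.1.0/24": "uplink-connected",
}

def lookup(dst_ip: str) -> str:
    """Longest-prefix match over the routing table."""
    dst = ip_address(dst_ip)
    best = max((ip_network(p) for p in ROUTES if dst in ip_network(p)),
               key=lambda n: n.prefixlen)
    return ROUTES[str(best)]

print(lookup("172.16.10.11"))  # → dlr-transit (more specific than the default)
print(lookup("8.8.8.8"))       # → physical-next-hop
```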





Above is an example of egress traffic to the external network.

  1. VM1 wants to reply to an external destination. The packet is sent to the default gateway, located on the local DLR.
  2. A routing lookup performed at the local DLR determines that the next hop is the NSX Edge on the transit network. This information was pushed to the DLR kernel module by the controller cluster.
  3. An L2 lookup is performed to determine how to reach the NSX Edge interface on the transit network. The packet is VXLAN encapsulated and sent to the VTEP of ESXi2.
  4. The Edge performs a routing lookup and sends the packet into the physical network towards the next L3 hop, and the physical network then delivers it to the destination.
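The VXLAN encapsulation mentioned in step 3 prepends an 8-byte VXLAN header, carrying a 24-bit VNI that identifies the logical switch, inside an outer UDP/IP packet exchanged between the VTEPs (see RFC 7348). A minimal pack/unpack sketch:

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags byte 0x08 (valid-VNI bit),
    24 reserved bits, the 24-bit VNI, 8 reserved bits (RFC 7348)."""
    return struct.pack("!II", 0x08 << 24, vni << 8)

def vxlan_vni(header: bytes) -> int:
    """Extract the 24-bit VNI from a VXLAN header."""
    _, word2 = struct.unpack("!II", header)
    return word2 >> 8

hdr = vxlan_header(5001)  # VNI 5001 identifies one logical switch
print(len(hdr), vxlan_vni(hdr))  # → 8 5001
```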



