Friday, 12 June 2015

Logical Switching


In NSX we can create isolated logical L2 networks. Both physical and virtual endpoints can be connected to these logical segments and communicate regardless of where they are deployed.

The diagram below shows the logical and physical views when logical switching is deployed using VXLAN. A logical switch lets us stretch an L2 domain across multiple server racks, independently of the underlying L2 or L3 infrastructure.






Replication Modes for Multi-Destination Traffic

If two VMs on different hosts need to communicate with each other, unicast VXLAN traffic is exchanged between the two VTEPs. Traffic that has to reach multiple VMs on the same logical switch (multi-destination traffic) comes in three types:

  • Broadcast
  • Unknown Unicast
  • Multicast

So how does NSX replicate traffic to multiple unknown remote hosts? The available replication modes are:

  • Multicast
  • Unicast
  • Hybrid
The logical switch inherits its replication mode from the transport zone by default, although this can be overridden on a per-switch basis.
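
The inheritance rule can be pictured with a small model. This is only an illustrative sketch: the class and field names below are made up for the example and are not NSX API objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransportZone:
    # Hypothetical model of a transport zone, not an NSX API object.
    name: str
    replication_mode: str  # "multicast", "unicast" or "hybrid"


@dataclass
class LogicalSwitch:
    # Hypothetical model of a logical switch.
    name: str
    transport_zone: TransportZone
    override_mode: Optional[str] = None  # set only when the switch overrides the zone

    @property
    def replication_mode(self) -> str:
        # Use the per-switch override if present, otherwise inherit from the zone.
        return self.override_mode or self.transport_zone.replication_mode


tz = TransportZone("TZ1", "unicast")
web = LogicalSwitch("web", tz)                        # inherits "unicast"
db = LogicalSwitch("db", tz, override_mode="hybrid")  # overrides the zone default
```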

The diagram below shows VTEP segments. There are 2 VTEP segments in this scenario: hosts 1 and 2 form one VTEP segment, and hosts 3 and 4 make up the second.
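
Since VTEP segments follow the IP subnet of the VTEP interfaces, the grouping can be sketched in a few lines. The IP addresses and the /24 prefix length are invented for the example; only the two-hosts-per-segment layout comes from the scenario above.

```python
import ipaddress
from collections import defaultdict


def vtep_segments(vteps, prefix_len=24):
    """Group hosts into VTEP segments keyed by the subnet of their VTEP IP."""
    segments = defaultdict(list)
    for host, ip in vteps.items():
        network = ipaddress.ip_interface(f"{ip}/{prefix_len}").network
        segments[str(network)].append(host)
    return dict(segments)


# Hypothetical VTEP addresses for the four hosts.
vteps = {
    "host1": "10.20.10.10",
    "host2": "10.20.10.11",
    "host3": "10.20.11.10",
    "host4": "10.20.11.11",
}
segments = vtep_segments(vteps)
```

With these sample addresses, hosts 1 and 2 land in 10.20.10.0/24 and hosts 3 and 4 in 10.20.11.0/24.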


                                 




Multicast


When Multicast replication mode is chosen for a given Logical Switch, NSX relies on the Layer 2 and Layer 3 Multicast capability of the data center physical network to ensure VXLAN encapsulated multi-destination traffic is sent to all the VTEPs.

Multicast mode is one way of handling Broadcast, Unknown Unicast and Multicast (BUM) traffic. Because it depends on multicast support in the physical network, it does not fully decouple the logical and physical networking infrastructures.

In Multicast mode a multicast IP address needs to be assigned to each logical switch. L2 multicast capability is used to replicate traffic to all VTEPs in the local segment, so IGMP Snooping must be enabled on the physical devices. Also, to ensure multicast traffic is delivered to VTEPs on a different subnet, L3 multicast routing and Protocol Independent Multicast (PIM) must be enabled.
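
As a side note on the L2 part: physical switches forward IP multicast using a MAC address derived from the low 23 bits of the group address (the standard IPv4 multicast mapping). The sketch below shows that mapping; the group addresses are just sample values, and the function name is my own.

```python
import ipaddress


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet multicast MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    low23 = int(addr) & 0x7FFFFF      # only the low 23 bits are carried over
    mac = 0x01005E000000 | low23      # fixed 01:00:5e prefix
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))


mac_a = multicast_mac("239.1.1.1")
mac_b = multicast_mac("239.129.1.1")  # differs only in a bit that is dropped
```

Both sample groups map to the same MAC, which is worth keeping in mind when picking one group per logical switch.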


In the example below the VXLAN segment 5001 has been associated with the multicast group 239.1.1.1. As a consequence, as soon as the first VM gets connected to that logical switch, the ESXi hypervisor hosting the VM generates an IGMP join message to notify the physical infrastructure of its interest in receiving multicast traffic sent to that specific group.
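
The "join on first VM" behaviour can be sketched as a tiny state tracker. This is a toy model, not hypervisor code: the class and method names are invented, and it only covers the join side described above.

```python
class EsxiHost:
    """Toy model of a host tracking which VMs sit on which VXLAN segment."""

    def __init__(self, name):
        self.name = name
        self.vms_by_vni = {}  # VNI -> set of VM names connected on this host

    def connect_vm(self, vm, vni):
        """Connect a VM; return True if this would trigger an IGMP join."""
        vms = self.vms_by_vni.setdefault(vni, set())
        first_vm_on_segment = not vms
        vms.add(vm)
        return first_vm_on_segment


host = EsxiHost("ESXi-1")
first = host.connect_vm("VM1", 5001)       # first VM on 5001: join for 239.1.1.1
second = host.connect_vm("VM1-bis", 5001)  # segment already joined: no new join
```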



                          



As a result of the IGMP Joins sent by ESXi-1 through ESXi-3, multicast state is built in the physical network to ensure delivery of multicast frames sent to the 239.1.1.1 destination. Notice that ESXi-4 does not send the IGMP Join since it does not host any active receivers (VMs connected to the VXLAN 5001 segment). The sequence of events required to deliver a BUM frame in multicast mode is depicted below.



                        


  • VM1 generates a BUM frame
  • ESXi-1 encapsulates the frame with VXLAN. The destination IP address in the outer IP header of the VXLAN packet is set to the multicast address 239.1.1.1.
  • The L2 switch receiving the multicast frame performs replication: assuming IGMP Snooping is configured on the switch, it replicates the frame only to the relevant interfaces, those connecting to ESXi-2 and the L3 router. If IGMP Snooping is not enabled or not supported, the L2 switch treats the frame as an L2 broadcast packet and replicates it to all the interfaces belonging to the same VLAN as the port where the packet was received.
  • The L3 router performs L3 multicast replication and sends the packet into the Transport subnet B.
  • The L2 switch then replicates the frame.
  • ESXi-2 and ESXi-3 decapsulate the received VXLAN packets, exposing the original Ethernet frames that are delivered to VM2 and VM3.
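
To make the encapsulation step a bit more concrete, here is a minimal sketch of the 8-byte VXLAN header as defined in RFC 7348, carrying segment ID 5001. It covers only the VXLAN header itself; the outer Ethernet, IP (destination 239.1.1.1) and UDP headers are left out, and the function names are my own.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag from RFC 7348


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags, 24 reserved bits, 24-bit VNI, 8 reserved bits."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Read the VNI back out of a VXLAN header, as the receiving VTEP would."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


header = pack_vxlan_header(5001)
vni = unpack_vni(header)
```

The receiving hosts use the VNI to decide which logical switch the decapsulated Ethernet frame belongs to.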





Unicast

Unicast mode fully decouples the logical and physical networks. The hosts in the NSX domain are divided into separate VTEP segments based on their IP subnet. A host in each subnet is selected to be the Unicast Tunnel End Point (UTEP). The UTEP is responsible for replicating multi-destination traffic. An NSX controller is required for this, as it acts as a cache server for ARP and MAC address tables.

Every UTEP will only replicate traffic to ESXi hosts on the local segment that have at least one VM actively connected to the logical network to which the multi-destination traffic is sent.

                                    


  • VM generates a BUM frame to be sent to all VMs.
  • ESXi-1 looks at its local VTEP table and determines the need to replicate the packet only to the other VTEP belonging to the local segment (ESXi-2) and to the UTEP of each remote segment. The unicast copy sent to the UTEP has the "REPLICATE_LOCALLY" bit set in the VXLAN header.
  • The UTEP receives the frame, looks at its local VTEP table and replicates it to all the hosts which are part of the local VTEP segment with at least one VM connected.
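
The steps above can be sketched as a small function that lists every unicast copy. It is a simplified model under a few assumptions: the UTEP is taken to be the first active host of each remote segment (how the UTEP is actually chosen is not covered here), and the data structures are invented for the example.

```python
def unicast_replication(segments, active, source):
    """Return (sender, receiver, replicate_locally) tuples for one BUM frame.

    segments: segment name -> list of hosts; active: hosts with a VM on the
    logical switch; source: host whose VM generated the frame.
    """
    copies = []
    for hosts in segments.values():
        targets = [h for h in hosts if h in active and h != source]
        if source in hosts:
            # Local segment: the source sends one unicast copy per active VTEP.
            copies += [(source, h, False) for h in targets]
        elif targets:
            utep = targets[0]  # assumption: first active host acts as UTEP
            copies.append((source, utep, True))  # REPLICATE_LOCALLY bit set
            copies += [(utep, h, False) for h in targets[1:]]
    return copies


segments = {"A": ["ESXi-1", "ESXi-2"], "B": ["ESXi-3", "ESXi-4"]}
active = {"ESXi-1", "ESXi-2", "ESXi-3", "ESXi-4"}
copies = unicast_replication(segments, active, "ESXi-1")
```

With all four hosts active, ESXi-1 sends one copy to ESXi-2 and one flagged copy to ESXi-3, which then sends a copy to ESXi-4.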


Hybrid


Hybrid Mode offers operational simplicity similar to Unicast Mode (no IP Multicast Routing configuration required in the physical network) while leveraging the Layer 2 Multicast capability of physical switches.

The specific VTEP responsible for performing local replication to the other VTEPs in the same subnet is now named “MTEP”. The reason is that in Hybrid Mode the [M]TEP uses L2 [M]ulticast to replicate BUM frames locally.



  • VM1 generates a BUM frame that needs to be replicated to all the other VMs that are part of VXLAN 5001. The multicast group 239.1.1.1 must be associated with the VXLAN segment, as multicast encapsulation is performed for local traffic replication.
  • ESXi-1 encapsulates the frame in a multicast packet addressed to the 239.1.1.1 group. Layer 2 multicast configuration in the physical network is leveraged to ensure that the VXLAN frame is delivered to all VTEPs in the local VTEP segment; in hybrid mode the ESXi hosts send an IGMP Join when there are local VMs interested in receiving multi-destination traffic.
  • At the same time ESXi-1 looks at the local VTEP table and determines the need to replicate the packet to the MTEP of each remote segment. The unicast copy is sent to the MTEP with the replication bit set in the VXLAN header (the same "REPLICATE_LOCALLY" indication used in unicast mode), telling the MTEP that this frame is coming from a remote VTEP segment and needs to be locally re-injected into the network.
  • The MTEP creates a multicast packet and sends it to the physical network, where it is replicated by the local L2 switching infrastructure.
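
The hybrid steps can be sketched the same way, listing only the packets that hosts put on the wire (the copies made by the physical switch are not counted). As before, this is a simplified model: the MTEP is assumed to be the first active host of each remote segment, and the names are invented for the example.

```python
def hybrid_replication(segments, active, source, group):
    """Return (sender, destination, kind) tuples for one BUM frame in hybrid mode."""
    packets = []
    for hosts in segments.values():
        receivers = [h for h in hosts if h in active and h != source]
        if not receivers:
            continue
        if source in hosts:
            # Local segment: one multicast packet, replicated by the L2 switch.
            packets.append((source, group, "l2-multicast"))
        else:
            mtep = receivers[0]  # assumption: first active host acts as MTEP
            packets.append((source, mtep, "unicast, replicate bit set"))
            if len(receivers) > 1:
                # The MTEP re-injects the frame as multicast in its own segment.
                packets.append((mtep, group, "l2-multicast"))
    return packets


segments = {"A": ["ESXi-1", "ESXi-2"], "B": ["ESXi-3", "ESXi-4"]}
active = {"ESXi-1", "ESXi-2", "ESXi-3", "ESXi-4"}
packets = hybrid_replication(segments, active, "ESXi-1", "239.1.1.1")
```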



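So to simplify:

Take a 2-rack environment with 2 VTEP networks and 2 hosts on each rack.

Unicast - One unicast copy goes to the UTEP on rack 2, which then sends a separate unicast copy to every other host on that rack (the source host does the same on rack 1). This host-side replication can cause a lot of overhead in larger environments.
Hybrid - Only one unicast frame has to be sent to rack 2. The MTEP re-injects it as a single L2 multicast frame and the physical switch replicates it to each host from there.
Multicast - Local and remote replication are both handled by multicast. Needs multicast addresses. The biggest challenge is configuring the physical network.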
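
A rough way to see how the modes scale is to count the VXLAN packets that hosts have to generate for a single BUM frame. The formulas below are my own back-of-the-envelope model, assuming every host has an active VM on the logical switch and at least 2 hosts per segment; copies made by physical switches and routers are not counted.

```python
def packets_sent(mode, segments, hosts_per_segment):
    """Count VXLAN packets generated by hosts for one BUM frame (simplified model)."""
    if segments < 1 or hosts_per_segment < 2:
        raise ValueError("model assumes >= 1 segment and >= 2 hosts per segment")
    remote = segments - 1
    if mode == "multicast":
        return 1  # the physical network does all the replication
    if mode == "hybrid":
        # 1 local multicast, plus per remote segment: 1 unicast to the MTEP
        # and 1 multicast re-injected by the MTEP.
        return 1 + remote * 2
    if mode == "unicast":
        # Local unicast copies, plus per remote segment: 1 unicast to the UTEP
        # and 1 unicast from the UTEP to each of its other hosts.
        return (hosts_per_segment - 1) + remote * hosts_per_segment
    raise ValueError(f"unknown mode: {mode}")


small = {m: packets_sent(m, 2, 2) for m in ("unicast", "hybrid", "multicast")}
large = {m: packets_sent(m, 2, 10) for m in ("unicast", "hybrid", "multicast")}
```

In the 2x2 example unicast and hybrid come out the same, but with 10 hosts per rack the unicast count grows with the number of hosts while hybrid stays flat.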



You can change the VXLAN mode at any stage by going to Logical Network Preparation - Transport Zones - Edit settings.

You can also migrate existing logical switches to the new control plane mode, by checking the check box.


To create a logical switch, select the logical switches menu icon, hit the plus button, give it a name, choose transport zone and replication mode.



































