VMware NSX network virtualization programmatically creates,
snapshots, deletes, and restores software-based virtual networks. With the ability to be deployed
on any IP network, including both existing networking models and next generation fabric architectures from
any vendor, NSX is a completely non-disruptive solution.
The design guide offers a useful analogy comparing NSX to traditional compute virtualization.
With NSX, the complete set of L2 - L7 networking services (switching, routing, firewalling and load balancing) is reproduced in software.
An NSX deployment is made up of a data plane, a control plane and a management plane.
Control Plane
The control plane runs in the NSX Controller cluster, which is responsible for managing the switching and routing modules in the hypervisors. It consists of controller nodes that manage the logical switches. Using the controller cluster to manage VXLAN-based logical switches eliminates the need for multicast support from the physical network, and the control plane also reduces ARP broadcasts. The User World Agent is installed on each host; it tells the VXLAN and DLR kernel modules what to do and relays communication between the different components and kernel modules. The Logical Router Control VM handles routing table updates and route distribution. Three controllers are recommended for redundancy.
Data Plane
This is where packets are actually moved around: frames are switched and packets are routed. The kernel modules installed here are VXLAN, the distributed logical router and the distributed firewall. The vSphere Distributed Switch is used here to switch the actual packets. The Edge Services Gateway is a VM that provides edge services such as NAT, edge firewalling, VPN termination and routing between network segments.
Management Plane
The management plane is what you interact with. It talks to the control plane to initiate changes: the NSX Manager communicates with the NSX Controllers, and a message bus is used to send commands between components. The NSX Manager also supplies the REST API entry point for the environment.
NSX Services
Switching - Enables extension of an L2 segment anywhere in the network, irrespective of the underlying physical network
Routing - Routing between IP subnets can be done without traffic having to leave through the physical router. Routing is performed in the kernel, providing an optimal data path for routed traffic (east-west communication). The NSX Edge provides a centralized point to integrate with the physical network (north-south communication)
Distributed Firewall - Security enforcement takes place at the kernel and vNIC level, which makes firewall rule enforcement highly scalable.
Load Balancing - L4 - L7 load balancing with SSL termination.
VPN - SSL VPN for remote access, plus L2 and L3 site-to-site VPN services
Connectivity to physical network - L2 and L3 gateway functions provide communication between logical and physical networks.
NSX Manager
The NSX Manager is the management plane virtual appliance; it is used to configure logical switches and to connect VMs to them. It provides the management UI and is the entry point for the NSX REST API, which can be used to automate the deployment of logical networks.
For every vCenter server in an environment there is one NSX manager.
The following diagram illustrates the order in which NSX manager is configured.
It is responsible for deploying the controllers and preparing the ESXi hosts, installing the vSphere Installation Bundles (VIBs) on the hosts to enable VXLAN, distributed routing, the distributed firewall and the User World Agent, which is used for communication with the control plane.
The NSX Manager is also responsible for deployment of the Edge Services Gateway and services such as load balancing, firewalling and NAT.
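Since the NSX Manager exposes the REST API entry point, a short script is a quick way to explore it. The sketch below is illustrative only: the manager address, credentials and the controller endpoint path are assumptions, so check them against the NSX API guide for your version.

```python
# Minimal sketch: querying NSX Manager's REST API with Python 'requests'.
# The hostname, credentials and endpoint path below are hypothetical placeholders.
import requests

NSX_MANAGER = "https://nsxmgr.lab.local"   # hypothetical NSX Manager address
AUTH = ("admin", "VMware1!")               # hypothetical credentials

def get_controllers():
    """Fetch the controller inventory from NSX Manager (returns raw XML)."""
    url = f"{NSX_MANAGER}/api/2.0/vdn/controller"      # assumed NSX-v endpoint path
    resp = requests.get(url, auth=AUTH, verify=False)  # lab only: skip TLS verification
    resp.raise_for_status()
    return resp.text

if __name__ == "__main__":
    print(get_controllers())
```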
Controller Cluster
The controller cluster is the control plane component responsible for managing the switching and routing modules in the hypervisors. It contains controller nodes that manage specific logical switches, which eliminates the need for multicast support on the physical network.
It is advised to deploy the controller cluster with an odd number of nodes. A slicing method is used to ensure all nodes are utilized.
If a node fails, the slices assigned to that node are reassigned to the remaining members of the cluster. To ensure this method works correctly, one of the cluster nodes is elected as master for each role. The master is responsible for allocating slices to individual nodes and for detecting node failures, in which case it redistributes the affected slices to the remaining nodes. A majority vote is needed to elect a master, which is why controllers must be deployed in odd numbers.
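The slicing and majority behaviour can be illustrated with a small sketch. This is conceptual logic only, not NSX code, and the node and slice names are made up:

```python
# Why an odd number of controller nodes is recommended, and how slices
# could be redistributed when a node fails (conceptual sketch).

def majority(cluster_size: int) -> int:
    """Votes needed to elect a master: more than half the cluster."""
    return cluster_size // 2 + 1

def tolerated_failures(cluster_size: int) -> int:
    """Nodes that can fail while a majority can still be formed."""
    return cluster_size - majority(cluster_size)

for size in (2, 3, 4, 5):
    print(f"{size} nodes -> majority {majority(size)}, tolerates {tolerated_failures(size)} failure(s)")
# 3 nodes tolerate 1 failure; 4 nodes still only tolerate 1 - hence odd numbers.

def assign_slices(slices, nodes):
    """Round-robin slices (e.g. logical switches) across surviving nodes."""
    return {s: nodes[i % len(nodes)] for i, s in enumerate(slices)}

slices = [f"slice-{i}" for i in range(6)]
print(assign_slices(slices, ["ctrl-1", "ctrl-2", "ctrl-3"]))
print(assign_slices(slices, ["ctrl-1", "ctrl-3"]))   # ctrl-2 failed: its slices move
```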
VXLAN Introduction
Definition
Virtual eXtensible Local Area Network (VXLAN): A Framework
for Overlaying Virtualized Layer 2 Networks over Layer 3 Networks
For a VXLAN deep dive, have a look at Joe Onisick's post on
Define the Cloud VXLAN Deepdive
Here is the VXLAN deepdive presentation from VMworld 2012 -
link
VXLAN is an L2-over-L3 encapsulation technology. The original Ethernet frame is encapsulated with outer VXLAN, UDP, IP and Ethernet headers so that it can be transferred between VTEPs.
A VXLAN segment can span multiple L3 networks, with full connectivity of VMs.
A VMkernel interface is used for VXLAN communication; all VXLAN traffic is tunneled directly between these VMkernel interfaces.
The VXLAN kernel module encapsulates the packet in a VXLAN header and sends it out the VMkernel interface, the VXLAN tunnel endpoint (VTEP), to the VTEP on the destination host, which decapsulates it and hands it to the VM. This process is completely transparent to the VM.
Depending on the teaming type, hosts may have a single VTEP or multiple VTEPs.
The VXLAN Network Identifier (VNI) is a 24-bit identifier associated with each L2 segment created. It is carried inside the VXLAN header and is associated with an IP subnet, much like a traditional VLAN. This VNI is the reason VXLAN can scale beyond the 4094-VLAN limitation.
VTEPs are identified by the source and destination IP addresses used in the external IP header.
Because the original Ethernet frame is encapsulated into a UDP packet, the size of the IP packet increases (the outer headers add roughly 50 bytes), so it is recommended to set the MTU on the transport network to a minimum of 1600 bytes.
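A quick back-of-the-envelope calculation shows where the 1600-byte recommendation comes from. The header sizes below are the standard VXLAN encapsulation values; the guest MTU of 1500 bytes is an assumption for the example:

```python
# VXLAN encapsulation overhead and VNI scale (back-of-the-envelope sketch).
OUTER_ETHERNET = 14   # outer MAC header (18 with an 802.1Q tag)
OUTER_IP       = 20   # outer IPv4 header
OUTER_UDP      = 8
VXLAN_HEADER   = 8    # flags + 24-bit VNI + reserved fields
INNER_ETHERNET = 14   # the original guest frame's own MAC header

overhead = OUTER_ETHERNET + OUTER_IP + OUTER_UDP + VXLAN_HEADER
print(f"Encapsulation overhead: {overhead} bytes")   # 50 bytes of outer headers

guest_mtu = 1500      # typical guest IP MTU (assumed for the example)
print(f"Encapsulated size of a {guest_mtu}-byte guest packet: {guest_mtu + INNER_ETHERNET + overhead} bytes")
# 1564 bytes - hence the recommendation of at least 1600 bytes on the transport network.

print(f"VLAN IDs available:   {2**12 - 2}")   # 4094 usable VLANs
print(f"VXLAN VNIs available: {2**24}")       # ~16 million segments
```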
Below we will look at L2 communication using VXLAN:
- VM1 originates a frame destined for VM2, which is part of the same logical L2 segment.
- The source ESXi host identifies the ESXi host (VTEP) where VM2 is connected and encapsulates the frame before sending it into the transport network.
- The transport network is only required to enable IP communication between the source and destination VTEPs.
- The destination ESXi host receives the VXLAN frame, decapsulates it and identifies the L2 segment it belongs to (leveraging the VNI value inserted in the VXLAN header by the source ESXi host).
- The frame is delivered to VM2.
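The walkthrough above can be modelled with a small toy sketch. The MAC-to-VTEP table, addresses and VNI below are made-up values; in a real deployment this mapping is maintained by the controller cluster and the VXLAN kernel module:

```python
# Toy model of the L2-over-VXLAN walkthrough (illustrative only).
VNI = 5001
mac_to_vtep = {                         # which host (VTEP IP) owns which VM MAC
    "00:50:56:aa:aa:01": "10.0.1.11",   # VM1 on the source ESXi host
    "00:50:56:aa:aa:02": "10.0.2.22",   # VM2 on the destination ESXi host
}

def encapsulate(inner_frame: bytes, dst_mac: str) -> dict:
    """Source VTEP: look up the destination VTEP and wrap the frame."""
    return {
        "outer_dst_ip": mac_to_vtep[dst_mac],  # destination VTEP
        "vni": VNI,                            # identifies the logical L2 segment
        "payload": inner_frame,                # original Ethernet frame, untouched
    }

def decapsulate(vxlan_packet: dict) -> tuple:
    """Destination VTEP: strip the outer headers, return (segment, original frame)."""
    return vxlan_packet["vni"], vxlan_packet["payload"]

pkt = encapsulate(b"<ethernet frame VM1 -> VM2>", "00:50:56:aa:aa:02")
vni, frame = decapsulate(pkt)   # the transport network only routed IP to 10.0.2.22
print(vni, frame)               # VM2 receives the unmodified frame
```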
NSX Edge Services Gateway
Services provided by the NSX Edge Services Gateway:
- Routing and NAT: the NSX Edge provides centralized on-ramp/off-ramp routing between the logical networks deployed in the NSX domain and the external physical infrastructure. It supports routing protocols such as OSPF, iBGP and eBGP, and can also use static routing. Source and destination NAT are performed here.
- Firewall: the Edge has stateful firewall capabilities which complement the distributed firewall running in the kernel of the ESXi hosts. While the distributed firewall enforces security policies for communication between workloads connected to the logical networks (east-west), the firewall on the Edge filters communication between the logical and physical networks (north-south)
- Load balancing: the NSX Edge can perform load-balancing services for server farms of workloads deployed in the logical space
- L2 and L3 VPN: L2 VPN is typically used to extend L2 domains between different DC sites. L3 VPNs provide IPsec site-to-site connectivity between two Edges or other VPN terminators, while SSL VPN provides remote user access to the logical networks.
- DHCP, DNS and IP address management: DNS relay, DHCP server and default gateway features are also available.
When HA is enabled, NSX Manager deploys a pair of NSX Edges on different hosts (anti-affinity). Heartbeat keepalives are exchanged every second between the active and standby Edges to monitor each other's health. These keepalives are L2 probes sent over an internal port group; VXLAN can be used to carry them, allowing this to happen over a routed network.
If the ESXi server hosting the active NSX Edge fails, the standby Edge takes over when the "Declare Dead Time" timer expires. The default timer is 15 seconds but can be decreased to 6.
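The failover behaviour can be sketched as a simple timer check. This is illustrative logic only (not how the Edge is implemented), using the documented defaults of 1-second keepalives and a 15-second Declare Dead Time:

```python
# Sketch: the standby declares the active edge dead once no heartbeat has been
# seen for the Declare Dead Time (default 15s, configurable down to 6s).
import time

DECLARE_DEAD_TIME = 15     # seconds (default)
HEARTBEAT_INTERVAL = 1     # keepalive sent every second

class StandbyEdge:
    def __init__(self):
        self.last_heartbeat = time.monotonic()
        self.active = False

    def on_heartbeat(self):
        self.last_heartbeat = time.monotonic()

    def check(self):
        if not self.active and time.monotonic() - self.last_heartbeat > DECLARE_DEAD_TIME:
            self.active = True                       # take over the active role
            print("Active edge declared dead - standby taking over")

standby = StandbyEdge()
standby.last_heartbeat -= DECLARE_DEAD_TIME + 1      # simulate 16s without heartbeats
standby.check()                                      # prints the takeover message
```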
The NSX Manager also monitors the state of health of deployed edges.
Transport Zone
A transport zone defines a collection of ESXi hosts that can communicate with each other across the physical network infrastructure. This communication happens by leveraging at least one VTEP on each host.
As a transport zone extends across one or more ESXi clusters, it defines the span of the logical switches.
A VDS can span across a number of ESXi hosts.
A Logical switch can extend across multiple VDS
NSX Distributed Firewall
The DFW provides L2-L4 stateful firewall services to any workload in the NSX environment. The DFW runs in kernel space and as such provides near line-rate network traffic protection. DFW performance and throughput scale linearly as new ESXi hosts are added.
The distributed firewall is activated as soon as the host preparation process is completed. If you want to exclude a VM from DFW service, you can add it to the exclusion list.
One DFW instance is created per VM vNIC, so if you create a new VM with 5 vNICs, 5 instances of the DFW will be allocated to that VM. When a DFW rule is created, a Point of Enforcement (PEP) can be selected; the options range from a vNIC to a logical switch. By default the "apply to" option is not selected and the DFW rule is applied to all instances.
DFW policy rules can be written in 2 ways: using L2 rules (Ethernet) or L3/L4 rules (General).
L2 rules map to Layer 2 of the OSI model: only MAC addresses can be used in the source and destination fields, and only L2 protocols (such as ARP) can be used in the service field.
L3/L4 rules map to Layers 3 and 4 of the OSI model: policy rules can be written using IP addresses and TCP/UDP ports. It is important to remember that L2 rules are always enforced before L3/L4 rules. As a concrete example, if the L2 default policy rule is changed to 'block', then all L3/L4 traffic will also be blocked by the DFW (and ping, for instance, would stop working).
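The evaluation order can be illustrated with a tiny sketch: the L2 (Ethernet) table is consulted before the L3/L4 (General) table, so an L2 default of 'block' stops ping before any L3/L4 rule is even read. The rules below are made up for the example:

```python
# Sketch: Ethernet (L2) rules are evaluated before General (L3/L4) rules.
l2_rules = [   # matched on MAC addresses / L2 protocols
    {"name": "l2-default", "match": lambda pkt: True, "action": "block"},
]
l3_rules = [   # matched on IP addresses / TCP-UDP ports
    {"name": "allow-icmp", "match": lambda pkt: pkt["proto"] == "icmp", "action": "allow"},
]

def dfw_verdict(pkt):
    for rule in l2_rules + l3_rules:   # the L2 table always comes first
        if rule["match"](pkt):
            return rule["name"], rule["action"]

print(dfw_verdict({"proto": "icmp"}))  # ('l2-default', 'block') - ping never reaches the L3/L4 rules
```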
The DFW is an NSX component designed to protect workload-to-workload network traffic, whether virtual-to-virtual or virtual-to-physical. The main goal of the DFW is to protect east-west traffic, but since DFW policy enforcement is applied at the vNIC, it can also be used to prevent communication between VMs and the physical network. The Edge Services Gateway is the first point of entry into the data centre, as it is primarily concerned with protecting north-south traffic.
The DFW operates at the vNIC level, meaning a VM is always protected no matter how it is connected to the logical network. A VM can be connected to a VDS VLAN-backed port group or to a logical switch (VXLAN-backed port group); all of these connectivity modes are fully supported. The ESG firewall can also be used to protect workloads sitting on physical servers and appliances, such as NAS.
There are 3 entities that make up the DFW architecture:
vCenter Server: This is the management plane of the DFW. Policy rules are created through the vSphere Web Client, and any vCenter container can be used in the source/destination field of a policy rule: cluster, VDS port-group, logical switch, VM, vNIC, resource pool, etc.
NSX Manager: This is the control plane of the DFW. It receives rules from vCenter, stores them in its central database and then pushes the DFW rules down to all hosts. NSX Manager can also receive rules directly through the REST API.
ESXi Host: This is the data plane of the solution. DFW rules are received from the NSX Manager and then translated into the kernel space for real-time execution. VM network traffic is inspected and enforced per ESXi host.
VMware Tools needs to be installed on the VMs so that the DFW can learn their IP addresses.
While a host is being prepared and the DFW is being activated, a kernel VIB called the VMware Service Insertion Platform (VSIP) is loaded into the hypervisor.
VSIP is responsible for all data plane traffic protection and runs at near line speed. A DFW instance is created per vNIC and this instance is located between the VM and the virtual switch.
A set of daemons called vsfwd runs permanently on the ESXi host and performs the following tasks:
- Interact with the NSX Manager to retrieve DFW rules.
- Gather DFW statistics and send them to NSX Manager.
- Send audit logs to the NSX Manager.
The communication path between the vCenter Server and the ESXi host (using the vpxa process on the ESXi host) is only used for vSphere related purposes like VM creation or storage modification and to program host with the IP address of the NSX Manager. This communication is not used at all for any DFW operation.
The VSIP kernel module adds services like SpoofGuard (which protects against IP spoofing) and traffic redirection to third-party services such as Palo Alto Networks.
How DFW rules are enforced
- DFW rules are enforced in top-to-bottom ordering.
- Each packet is checked against the top rule in the rule table before moving down to the subsequent rules in the table.
- The first rule in the table that matches the traffic parameters is enforced.
When creating rules it is recommended to put the most granular rules at the top, to ensure they are enforced before any broader rules.
The bottom rule in the table is a catch-all that applies to any traffic not matched by the rules above it. By default this catch-all rule is set to allow.
As an example, the VM sends an IP packet (the first packet of a new flow, Flow 3 - pkt1) that matches Rule 2. The order of operations is as follows:
1. A lookup is performed in the connection tracker table to check if an entry for the flow already exists.
2. As Flow 3 is not present in the connection tracker table (i.e. a miss), a lookup is performed in the rule table to identify which rule is applicable to Flow 3. The first rule that matches the flow will be enforced.
3. Rule 2 matches Flow 3, and its action is set to ‘Allow’.
4. Because the action is set to ‘Allow’ for Flow 3, a new entry is created in the connection tracker table and the packet is transmitted out of the DFW.
For subsequent packets, a lookup is performed in the connection tracker table to check if an entry for the flow already exists; if it does, the packet is transmitted out of the DFW without re-evaluating the rule table.
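The lookup sequence above can be sketched as a small first-match engine with a connection tracker. The rule table and flow tuple below are made up for the example (and the catch-all is shown as 'block' rather than the shipped default of 'allow'):

```python
# Sketch of the DFW fast path: the first packet of a flow misses the connection
# tracker, is matched top-to-bottom against the rule table, and (if allowed)
# creates a flow entry that later packets hit directly.

rule_table = [                                        # evaluated top to bottom
    {"id": 1, "dst_port": 22,   "action": "allow"},
    {"id": 2, "dst_port": 443,  "action": "allow"},
    {"id": 3, "dst_port": None, "action": "block"},   # catch-all (None matches anything)
]
connection_tracker = {}                               # flow tuple -> action

def process_packet(flow):
    if flow in connection_tracker:                    # subsequent packets: fast path
        return connection_tracker[flow], "connection tracker hit"
    for rule in rule_table:                           # first packet: rule table lookup
        if rule["dst_port"] in (None, flow[3]):
            if rule["action"] == "allow":
                connection_tracker[flow] = "allow"    # remember the allowed flow
            return rule["action"], f"matched rule {rule['id']}"

flow3 = ("10.0.0.5", "10.0.0.9", "tcp", 443)          # src IP, dst IP, proto, dst port
print(process_packet(flow3))                          # ('allow', 'matched rule 2')
print(process_packet(flow3))                          # ('allow', 'connection tracker hit')
```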