- Open vSwitch
- Contents
- Installation
- Configuration
- Overview
- Bridges
- Bonds
- VLANs Host Interfaces
- Rapid Spanning Tree (RSTP)
- Note on MTU
- Examples
- Example 1: Bridge + Internal Ports + Untagged traffic
- Example 2: Bond + Bridge + Internal Ports
- Example 3: Bond + Bridge + Internal Ports + Untagged traffic + No LACP
- Example 4: Rapid Spanning Tree (RSTP) — 1Gbps uplink, 10Gbps interconnect
- Multicast
- Using Open vSwitch in Proxmox
- Network Configuration
- Apply Network Changes
- Reboot Node to apply
- Reload Network with ifupdown2
- Naming Conventions
- Systemd Network Interface Names
- Choosing a network configuration
- Proxmox VE server in a private LAN, using an external gateway to reach the internet
- Proxmox VE server at hosting provider, with public IP ranges for Guests
- Proxmox VE server at hosting provider, with a single public IP address
- Default Configuration using a Bridge
- Routed Configuration
- Masquerading (NAT) with iptables
- Linux Bond
- VLAN 802.1Q
- VLAN for Guest Networks
- VLAN on the Host
- Disabling IPv6 on the Node
Open vSwitch
Open vSwitch (openvswitch, OVS) is an alternative to Linux native bridges, bonds, and VLAN interfaces. Open vSwitch supports most of the features you would find on a physical switch, and provides advanced features such as RSTP, VXLANs, OpenFlow, and multiple VLANs on a single bridge. If you need these features, it makes sense to switch to Open vSwitch.
Installation
Update the package index and then install the Open vSwitch packages by executing:
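A typical invocation (assuming the standard Debian package name openvswitch-switch):

    apt update
    apt install openvswitch-switch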
Configuration
Overview
Open vSwitch and Linux bonding, bridging, or VLAN interfaces MUST NOT be mixed. For instance, do not attempt to add a VLAN to an OVS bond, or add a Linux bond to an OVSBridge, or vice versa. Open vSwitch is specifically tailored to function within virtualized environments, so there is no reason to mix in the native Linux functionality.
Bridges
A bridge is another term for a switch. It directs traffic to the appropriate interface based on MAC address. Open vSwitch bridges should contain raw ethernet devices, along with virtual interfaces such as OVSBonds or OVSIntPorts. These bridges can carry multiple VLANs, and be broken out into 'internal ports' to be used as VLAN interfaces on the host.
It is recommended that the bridge be bound to a trunk port with no untagged VLANs; this means that the bridge itself will never have an IP address. If you need to work with untagged traffic coming into the bridge, tag it (assign it to a VLAN) on the originating interface before it enters the bridge (you can assign an IP address to the bridge directly for that untagged data, but it is not recommended). You can split out your tagged VLANs using virtual interfaces (OVSIntPorts) if you need access to those VLANs from the local host. Proxmox will assign each guest VM a tap interface associated with a VLAN, so you do NOT need a bridge per VLAN (as classic Linux networking requires). Think of your OVSBridge much like a physical hardware switch.
When configuring a bridge, in /etc/network/interfaces, prefix the bridge interface definition with allow-ovs $iface. For instance, a simple bridge containing a single interface would look like:
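A minimal sketch, assuming the bridge is named vmbr0 and contains the single physical interface eth0:

    allow-ovs vmbr0
    iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports eth0

    allow-vmbr0 eth0
    iface eth0 inet manual
        ovs_bridge vmbr0
        ovs_type OVSPort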
Remember, if you want to split out vlans with ips for use on the local host, you should use OVSIntPorts, see sections to follow.
Any interfaces (physical, OVSBond, or OVSIntPort) associated with a bridge should have their definitions prefixed with allow-$brname $iface, e.g. allow-vmbr0 bond0.
NOTE: All interfaces that are part of the bridge must be listed under ovs_ports, even if you have a port definition (e.g. an OVSIntPort) that cross-references the bridge.
Bonds
Bonds are used to join multiple network interfaces together to act as a single unit. Bonds must refer to raw ethernet devices (e.g. eth0, eth1).
When configuring a bond, it is recommended to use LACP (aka 802.3ad) for link aggregation; this requires switch support on the other end. A simple bond using eth0 and eth1 that will be part of the vmbr0 bridge might look like this:
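A minimal sketch, assuming the switch ports are configured for LACP (remember to also list bond0 in the bridge's ovs_ports):

    allow-vmbr0 bond0
    iface bond0 inet manual
        ovs_bridge vmbr0
        ovs_type OVSBond
        ovs_bonds eth0 eth1
        ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast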
NOTE: The interfaces that are part of a bond do not need to have their own configuration section.
VLANs Host Interfaces
In order for the host (i.e. the Proxmox host, not the VMs themselves!) to use a VLAN within the bridge, you must create OVSIntPorts. These split out a virtual interface in the specified VLAN that you can assign an IP address to (or use DHCP). You need to set ovs_options tag=$VLAN to let OVS know which VLAN the interface should be part of. In the switch world, this is commonly referred to as an RVI (Routed Virtual Interface) or IRB (Integrated Routing and Bridging) interface.
IMPORTANT: These OVSIntPorts you create MUST also show up in the actual bridge definition under ovs_ports. If they do not, they will NOT be brought up even though you specified an ovs_bridge. You also need to prefix the definition with allow-$bridge $iface
Setting up this vlan port would look like this in /etc/network/interfaces:
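A minimal sketch, assuming VLAN 50, a bridge named vmbr0, and an example static address:

    allow-vmbr0 vlan50
    iface vlan50 inet static
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=50
        address 10.50.10.44
        netmask 255.255.255.0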
Rapid Spanning Tree (RSTP)
Open vSwitch supports the Rapid Spanning Tree Protocol, but it is disabled by default. Rapid Spanning Tree is a network protocol used to prevent loops in a bridged Ethernet local area network.
WARNING: The stock PVE 4.4 kernel panics; you must use a 4.5 or higher kernel for stability. Also, the Intel i40e driver is known not to work; older-generation Intel NICs that use ixgbe are fine, as are Mellanox adapters that use the mlx5 driver.
In order to configure a bridge for RSTP support, you must use an "up" script, as the "ovs_options" and "ovs_extras" options do not emit the proper commands. An example would be to add this to your "vmbr0" interface configuration:
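A minimal sketch (bridge and port names are placeholders); the relevant part is the "up" line:

    allow-ovs vmbr0
    iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports eth0
        up ovs-vsctl set Bridge ${IFACE} rstp_enable=true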
It may be wise to also set a "post-up" script that sleeps for 10 or so seconds, waiting on RSTP convergence, before boot continues.
Other bridge options that may be set are:
- other_config:rstp-priority= Configures the root bridge priority, the lower the value the more likely to become the root bridge. It is recommended to set this to the maximum value of 0xFFFF to prevent Open vSwitch from becoming the root bridge. The default value is 0x8000
- other_config:rstp-forward-delay= The amount of time the bridge will sit in learning mode before entering a forwarding state. Range is 4-30, Default 15
- other_config:rstp-max-age= Range is 6-40, Default 20
You should also consider adding a cost value to all interfaces that are part of a bridge. You can do so in the ethX interface configuration:
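A minimal sketch for a 1GbE port named eth0 (adjust the cost to your link speed):

    allow-vmbr0 eth0
    iface eth0 inet manual
        ovs_bridge vmbr0
        ovs_type OVSPort
        ovs_options other_config:rstp-path-cost=20000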
Interface options that may be set via ovs_options are:
- other_config:rstp-path-cost= Default 2000 for 10GbE, 20000 for 1GbE
- other_config:rstp-port-admin-edge= Set to false if this port is known to be connected to a switch running RSTP, to prevent it from entering the forwarding state when no BPDUs are detected
- other_config:rstp-port-auto-edge= Set to false if this port is known to be connected to a switch running RSTP, to prevent it from entering the forwarding state when no BPDUs are detected
- other_config:rstp-port-mcheck= Set to true if the other end is known to be running RSTP and not STP; BPDUs will be sent immediately on link detection
You can look at the RSTP status for an interface via:
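For example (assuming OVS 2.5+ and an RSTP-enabled port named eth0):

    ovs-vsctl get Port eth0 rstp_status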
NOTE: Open vSwitch does not currently allow a bond to participate in RSTP.
Note on MTU
If you plan on using an MTU larger than the default of 1500, you need to mark any physical interfaces, bonds, and bridges with the larger MTU by adding an mtu setting to the definition (e.g. mtu 9000); otherwise, the larger MTU will be disallowed.
Odd Note: Some newer Intel Gigabit NICs have a hardware limitation which means the maximum MTU they can support is 8996 (instead of 9000). If your interfaces aren’t coming up and you are trying to use 9000, this is likely the reason and can be difficult to debug. Try setting all your MTUs to 8996 and see if it resolves your issues.
Examples
Example 1: Bridge + Internal Ports + Untagged traffic
The below example shows you how to create a bridge with one physical interface, with 2 vlan interfaces split out, and tagging untagged traffic coming in on eth0 to vlan 1.
This is a complete and working /etc/network/interfaces listing:
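A minimal sketch of such a configuration; eth0, the VLAN tags (1 and 50), and all addresses are placeholders to adapt:

    auto lo
    iface lo inet loopback

    # Physical port: untagged ingress traffic is placed into VLAN 1
    allow-vmbr0 eth0
    iface eth0 inet manual
        ovs_bridge vmbr0
        ovs_type OVSPort
        ovs_options tag=1 vlan_mode=native-untagged

    # Host management address on VLAN 1
    allow-vmbr0 vlan1
    iface vlan1 inet static
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=1
        address 192.168.1.10
        netmask 255.255.255.0
        gateway 192.168.1.1

    # Second host interface on VLAN 50 (e.g. storage)
    allow-vmbr0 vlan50
    iface vlan50 inet static
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=50
        address 10.50.10.44
        netmask 255.255.255.0

    # The bridge itself: every port above must be listed in ovs_ports
    allow-ovs vmbr0
    iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports eth0 vlan1 vlan50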
Example 2: Bond + Bridge + Internal Ports
The below example shows you a combination of all the above features. 2 NICs are bonded together and added to an OVS Bridge. 2 vlan interfaces are split out in order to provide the host access to vlans with different MTUs.
This is a complete and working /etc/network/interfaces listing:
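A minimal sketch, assuming an LACP bond of eth0 and eth1, a management interface on VLAN 5 at the default MTU, and a storage interface on VLAN 50 with jumbo frames; names, tags, and addresses are placeholders:

    auto lo
    iface lo inet loopback

    # LACP bond of the two NICs, jumbo frames
    # (the physical NICs may also need their MTU raised, see the MTU note)
    allow-vmbr0 bond0
    iface bond0 inet manual
        ovs_bridge vmbr0
        ovs_type OVSBond
        ovs_bonds eth0 eth1
        ovs_options bond_mode=balance-tcp lacp=active other_config:lacp-time=fast
        mtu 9000

    # Management interface on VLAN 5, default MTU
    allow-vmbr0 vlan5
    iface vlan5 inet static
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=5
        address 10.5.10.44
        netmask 255.255.255.0
        gateway 10.5.10.1
        mtu 1500

    # Storage interface on VLAN 50, jumbo frames
    allow-vmbr0 vlan50
    iface vlan50 inet static
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=50
        address 10.50.10.44
        netmask 255.255.255.0
        mtu 9000

    # Bridge: the bond and both internal ports are listed in ovs_ports
    allow-ovs vmbr0
    iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports bond0 vlan5 vlan50
        mtu 9000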
Example 3: Bond + Bridge + Internal Ports + Untagged traffic + No LACP
The below example shows you a combination of all the above features. 2 NICs are bonded together and added to an OVS Bridge. This example imitates the default Proxmox network configuration, but uses a bond instead of a single NIC; the bond will work without a managed switch that supports LACP.
This is a complete and working /etc/network/interfaces listing:
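A minimal sketch, assuming balance-slb as the non-LACP bond mode (active-backup would also work) and, as in the stock Proxmox setup, the management address assigned directly to the bridge:

    auto lo
    iface lo inet loopback

    # Bond without LACP: balance-slb needs no special switch support
    allow-vmbr0 bond0
    iface bond0 inet manual
        ovs_bridge vmbr0
        ovs_type OVSBond
        ovs_bonds eth0 eth1
        ovs_options bond_mode=balance-slb

    # Management address placed directly on the bridge (untagged traffic)
    allow-ovs vmbr0
    iface vmbr0 inet static
        ovs_type OVSBridge
        ovs_ports bond0
        address 192.168.1.10
        netmask 255.255.255.0
        gateway 192.168.1.1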
Example 4: Rapid Spanning Tree (RSTP) — 1Gbps uplink, 10Gbps interconnect
WARNING: The stock PVE 4.4 kernel panics; you must use a 4.5 or higher kernel for stability. Also, the Intel i40e driver is known not to work; older-generation Intel NICs that use ixgbe are fine, as are Mellanox adapters that use the mlx5 driver.
This example shows how you can use Rapid Spanning Tree (RSTP) to interconnect your Proxmox nodes inexpensively while uplinking to your core switches for external traffic, all while maintaining a fully fault-tolerant interconnection scheme. This means VM-to-VM traffic (or possibly Ceph-to-Ceph traffic) can operate at the speed of the network interfaces directly attached in a star or ring topology. In this example, we are using 10Gbps to interconnect our 3 nodes (direct-attach), and uplink to our core switches at 1Gbps. Spanning Tree configured with the right cost metrics will prevent loops and activate the optimal paths for traffic. We are using this topology because 10Gbps switch ports are very expensive, so this is strictly a cost-savings manoeuvre. You could use 40Gbps ports instead of 10Gbps ports; the key thing is that the interfaces used to interconnect the nodes are higher-speed than the interfaces used to connect to the core switches.
This assumes you are using Open vSwitch 2.5+, older versions did not support Rapid Spanning Tree, but only Spanning Tree which had some issues.
To better explain what we are accomplishing: each of the three nodes is directly attached to the other two at 10Gbps (forming a ring), and each node also has a 1Gbps uplink to a core switch.
This is a complete and working /etc/network/interfaces listing:
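A minimal sketch for one node, assuming eth0 is the 1Gbps uplink and eth2/eth3 are the 10Gbps direct-attach links to the neighbouring nodes; the RSTP path costs make the 10Gbps links preferred, and all names, tags, and addresses are placeholders:

    auto lo
    iface lo inet loopback

    # 1Gbps uplink to the core switch: higher path cost
    allow-vmbr0 eth0
    iface eth0 inet manual
        ovs_bridge vmbr0
        ovs_type OVSPort
        ovs_options other_config:rstp-path-cost=20000 other_config:rstp-port-admin-edge=false other_config:rstp-port-auto-edge=false other_config:rstp-port-mcheck=true

    # 10Gbps direct-attach links to the two neighbouring nodes: lower path cost
    allow-vmbr0 eth2
    iface eth2 inet manual
        ovs_bridge vmbr0
        ovs_type OVSPort
        ovs_options other_config:rstp-path-cost=2000 other_config:rstp-port-admin-edge=false other_config:rstp-port-auto-edge=false other_config:rstp-port-mcheck=true

    allow-vmbr0 eth3
    iface eth3 inet manual
        ovs_bridge vmbr0
        ovs_type OVSPort
        ovs_options other_config:rstp-path-cost=2000 other_config:rstp-port-admin-edge=false other_config:rstp-port-auto-edge=false other_config:rstp-port-mcheck=true

    # Host management address on VLAN 1
    allow-vmbr0 vlan1
    iface vlan1 inet static
        ovs_type OVSIntPort
        ovs_bridge vmbr0
        ovs_options tag=1
        address 10.1.10.44
        netmask 255.255.255.0
        gateway 10.1.10.1

    # Bridge with RSTP enabled via an "up" script; the sleep lets RSTP converge
    allow-ovs vmbr0
    iface vmbr0 inet manual
        ovs_type OVSBridge
        ovs_ports eth0 eth2 eth3 vlan1
        up ovs-vsctl set Bridge ${IFACE} rstp_enable=true
        post-up sleep 10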
On our Juniper core switches, we put in place this configuration:
Multicast
Right now Open vSwitch doesn't do anything with regard to multicast. Where you might typically tell Linux to enable the multicast querier on the bridge, you should instead set up the querier on your router or switch. Please refer to the Multicast_notes wiki for more information.
Using Open vSwitch in Proxmox
Using Open vSwitch isn’t that much different than using normal linux bridges. The main difference is instead of having a bridge per vlan, you have a single bridge containing all your vlans. Then when configuring the network interface for the VM, you would select the bridge (probably the only bridge you have), and you would also enter the VLAN Tag associated with the VLAN you want your VM to be a part of. Now there is zero effort when adding or removing VLANs!
Network Configuration
Network configuration can be done either via the GUI, or by manually editing the file /etc/network/interfaces, which contains the whole network configuration. The interfaces(5) manual page contains the complete format description. All Proxmox VE tools try hard to preserve direct user modifications, but using the GUI is still preferable, because it protects you from errors.
Once the network is configured, you can use the traditional Debian tools ifup and ifdown to bring interfaces up and down.
Apply Network Changes
Proxmox VE does not write changes directly to /etc/network/interfaces. Instead, changes are written to a temporary file called /etc/network/interfaces.new; this way you can make many related changes at once. It also allows you to verify that your changes are correct before applying them, as a wrong network configuration may render a node inaccessible.
Reboot Node to apply
With the default installed ifupdown network managing package you need to reboot to commit any pending network changes. Most of the time, the basic Proxmox VE network setup is stable and does not change often, so rebooting should not be required often.
Reload Network with ifupdown2
With the optional ifupdown2 network managing package you also can reload the network configuration live, without requiring a reboot.
Since Proxmox VE 6.1 you can apply pending network changes over the web-interface, using the Apply Configuration button in the Network panel of a node.
To install ifupdown2, first ensure you have the latest Proxmox VE updates installed.
Note that installing ifupdown2 will remove ifupdown. Because the removal scripts of ifupdown before version 0.8.35+pve1 have an issue where the network is fully stopped on removal (introduced with Debian Buster: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=945877), you must ensure that you have an up-to-date ifupdown package version.
For the installation itself you can then simply do:
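For example:

    apt install ifupdown2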
With that you’re all set. You can also switch back to the ifupdown variant at any time, if you run into issues.
Naming Conventions
We currently use the following naming conventions for device names:
Ethernet devices: en*, systemd network interface names. This naming scheme is used for new Proxmox VE installations since version 5.0.
Ethernet devices: eth[N], where 0 ≤ N (eth0, eth1, …). This naming scheme is used for Proxmox VE hosts which were installed before the 5.0 release. When upgrading to 5.0, the names are kept as-is.
Bridge names: vmbr[N], where 0 ≤ N ≤ 4094 (vmbr0 — vmbr4094)
Bonds: bond[N], where 0 ≤ N (bond0, bond1, …)
VLANs: Simply add the VLAN number to the device name, separated by a period (eno1.50, bond1.30)
This makes it easier to debug network problems, because the device name implies the device type.
Systemd Network Interface Names
Systemd uses the two-character prefix en for Ethernet network devices. The next characters depend on the device driver and on which schema matches first.
o<index>[n<phys_port_name>|d<dev_port>] — devices on board
s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — devices by hotplug id
[P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — devices by bus id
x<MAC> — device by MAC address
The most common patterns are:
eno1 — the first on-board NIC
enp3s0f1 — the NIC on PCI bus 3, slot 0, using NIC function 1
Choosing a network configuration
Depending on your current network organization and your resources you can choose either a bridged, routed, or masquerading networking setup.
Proxmox VE server in a private LAN, using an external gateway to reach the internet
The Bridged model makes the most sense in this case, and this is also the default mode on new Proxmox VE installations. Each of your guest systems will have a virtual interface attached to the Proxmox VE bridge. This is similar in effect to having the guest network card directly connected to a new switch on your LAN, with the Proxmox VE host playing the role of the switch.
Proxmox VE server at hosting provider, with public IP ranges for Guests
For this setup, you can use either a Bridged or Routed model, depending on what your provider allows.
Proxmox VE server at hosting provider, with a single public IP address
In that case the only way to get outgoing network access for your guest systems is to use Masquerading. For incoming network access to your guests, you will need to configure Port Forwarding.
For further flexibility, you can configure VLANs (IEEE 802.1q) and network bonding, also known as "link aggregation". That way it is possible to build complex and flexible virtual networks.
Default Configuration using a Bridge
Bridges are like physical network switches implemented in software. All virtual guests can share a single bridge, or you can create multiple bridges to separate network domains. Each host can have up to 4094 bridges.
The installation program creates a single bridge named vmbr0 , which is connected to the first Ethernet card. The corresponding configuration in /etc/network/interfaces might look like this:
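A sketch of such a default configuration, assuming eno1 as the first NIC and example addresses from 192.168.10.0/24:

    auto lo
    iface lo inet loopback

    iface eno1 inet manual

    auto vmbr0
    iface vmbr0 inet static
        address 192.168.10.2/24
        gateway 192.168.10.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0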
Virtual machines behave as if they were directly connected to the physical network. The network, in turn, sees each virtual machine as having its own MAC, even though there is only one network cable connecting all of these VMs to the network.
Routed Configuration
Most hosting providers do not support the above setup. For security reasons, they disable networking as soon as they detect multiple MAC addresses on a single interface.
Some providers allow you to register additional MACs through their management interface. This avoids the problem, but can be clumsy to configure because you need to register a MAC for each of your VMs.
You can avoid the problem by “routing” all traffic via a single interface. This makes sure that all network packets use the same MAC address.
A common scenario is that you have a public IP (assume 198.51.100.5 for this example), and an additional IP block for your VMs (203.0.113.16/29). We recommend the following setup for such situations:
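A sketch of such a routed setup, assuming eno1 is the uplink interface; IP forwarding and proxy_arp are enabled so the host routes and answers ARP for the guest subnet (use the prefix length assigned by your provider for the public IP):

    auto lo
    iface lo inet loopback

    auto eno1
    iface eno1 inet static
        address 198.51.100.5/24
        gateway 198.51.100.1
        post-up echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up echo 1 > /proc/sys/net/ipv4/conf/eno1/proxy_arp

    auto vmbr0
    iface vmbr0 inet static
        address 203.0.113.17/29
        bridge-ports none
        bridge-stp off
        bridge-fd 0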
Masquerading (NAT) with iptables
Masquerading allows guests having only a private IP address to access the network by using the host IP address for outgoing traffic. Each outgoing packet is rewritten by iptables to appear as originating from the host, and responses are rewritten accordingly to be routed to the original sender.
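A sketch of a masquerading setup, assuming eno1 carries the public address and the guests live on a private 10.10.10.0/24 behind vmbr0 (names and addresses are placeholders):

    auto lo
    iface lo inet loopback

    auto eno1
    iface eno1 inet static
        address 198.51.100.5/24
        gateway 198.51.100.1

    auto vmbr0
    iface vmbr0 inet static
        address 10.10.10.1/24
        bridge-ports none
        bridge-stp off
        bridge-fd 0
        post-up   echo 1 > /proc/sys/net/ipv4/ip_forward
        post-up   iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
        post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE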
In some masquerade setups with firewall enabled, conntrack zones might be needed for outgoing connections. Otherwise the firewall could block outgoing connections, since they will prefer the POSTROUTING of the VM bridge (and not MASQUERADE).
Adding these lines in the /etc/network/interfaces can fix this problem:
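For example, added to the vmbr0 stanza above (fwbr+ matches the firewall bridge devices created by Proxmox VE):

    post-up   iptables -t raw -I PREROUTING -i fwbr+ -j CT --zone 1
    post-down iptables -t raw -D PREROUTING -i fwbr+ -j CT --zone 1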
For more information about this, refer to the following links:
Linux Bond
Bonding (also called NIC teaming or Link Aggregation) is a technique for binding multiple NICs to a single network device. It is possible to achieve different goals, like making the network fault-tolerant, increasing performance, or both together.
High-speed hardware like Fibre Channel and the associated switching hardware can be quite expensive. By doing link aggregation, two NICs can appear as one logical interface, resulting in double speed. This is a native Linux kernel feature that is supported by most switches. If your nodes have multiple Ethernet ports, you can distribute your points of failure by running network cables to different switches and the bonded connection will failover to one cable or the other in case of network trouble.
Aggregated links can reduce live-migration delays and improve the speed of data replication between Proxmox VE cluster nodes.
There are 7 modes for bonding:
Round-robin (balance-rr): Transmit network packets in sequential order from the first available network interface (NIC) slave through the last. This mode provides load balancing and fault tolerance.
Active-backup (active-backup): Only one NIC slave in the bond is active. A different slave becomes active if, and only if, the active slave fails. The single logical bonded interface’s MAC address is externally visible on only one NIC (port) to avoid distortion in the network switch. This mode provides fault tolerance.
XOR (balance-xor): Transmit network packets based on [(source MAC address XOR’d with destination MAC address) modulo NIC slave count]. This selects the same NIC slave for each destination MAC address. This mode provides load balancing and fault tolerance.
Broadcast (broadcast): Transmit network packets on all slave network interfaces. This mode provides fault tolerance.
IEEE 802.3ad Dynamic link aggregation (802.3ad) (LACP): Creates aggregation groups that share the same speed and duplex settings. Utilizes all slave network interfaces in the active aggregator group according to the 802.3ad specification.
Adaptive transmit load balancing (balance-tlb): Linux bonding driver mode that does not require any special network-switch support. The outgoing network packet traffic is distributed according to the current load (computed relative to the speed) on each network interface slave. Incoming traffic is received by one currently designated slave network interface. If this receiving slave fails, another slave takes over the MAC address of the failed receiving slave.
Adaptive load balancing (balance-alb): Includes balance-tlb plus receive load balancing (rlb) for IPV4 traffic, and does not require any special network switch support. The receive load balancing is achieved by ARP negotiation. The bonding driver intercepts the ARP Replies sent by the local system on their way out and overwrites the source hardware address with the unique hardware address of one of the NIC slaves in the single logical bonded interface such that different network-peers use different MAC addresses for their network packet traffic.
If your switch supports the LACP (IEEE 802.3ad) protocol, then we recommend using the corresponding bonding mode (802.3ad). Otherwise you should generally use the active-backup mode.
If you intend to run your cluster network on the bonding interfaces, then you have to use active-passive mode on the bonding interfaces; other modes are unsupported.
The following bond configuration can be used as distributed/shared storage network. The benefit would be that you get more speed and the network will be fault-tolerant.
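A sketch, assuming eno1 and eno2 form an LACP bond used as the storage network (10.10.10.0/24) and eno3 carries the bridged management/guest network; names and addresses are placeholders:

    auto lo
    iface lo inet loopback

    iface eno1 inet manual
    iface eno2 inet manual
    iface eno3 inet manual

    # LACP bond used as a dedicated storage network
    auto bond0
    iface bond0 inet static
        bond-slaves eno1 eno2
        address 10.10.10.2/24
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

    # Regular bridge for guests and management on the third NIC
    auto vmbr0
    iface vmbr0 inet static
        address 192.168.10.2/24
        gateway 192.168.10.1
        bridge-ports eno3
        bridge-stp off
        bridge-fd 0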
Another possibility is to use the bond directly as the bridge port. This can be used to make the guest network fault-tolerant.
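A sketch, assuming eno1 and eno2 are bonded with LACP and the bond is the only bridge port:

    auto lo
    iface lo inet loopback

    iface eno1 inet manual
    iface eno2 inet manual

    auto bond0
    iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

    auto vmbr0
    iface vmbr0 inet static
        address 192.168.10.2/24
        gateway 192.168.10.1
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0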
VLAN 802.1Q
A virtual LAN (VLAN) is a broadcast domain that is partitioned and isolated in the network at layer two. So it is possible to have multiple networks (up to 4096) in a physical network, each independent of the others.
Each VLAN network is identified by a number often called a tag. Network packets are then tagged to identify which virtual network they belong to.
VLAN for Guest Networks
Proxmox VE supports this setup out of the box. You can specify the VLAN tag when you create a VM. The VLAN tag is part of the guest network configuration. The networking layer supports different modes to implement VLANs, depending on the bridge configuration:
VLAN awareness on the Linux bridge: In this case, each guest’s virtual network card is assigned to a VLAN tag, which is transparently supported by the Linux bridge. Trunk mode is also possible, but that makes configuration in the guest necessary.
«traditional» VLAN on the Linux bridge: In contrast to the VLAN awareness method, this method is not transparent and creates a VLAN device with associated bridge for each VLAN. That is, creating a guest on VLAN 5 for example, would create two interfaces eno1.5 and vmbr0v5, which would remain until a reboot occurs.
Open vSwitch VLAN: This mode uses the OVS VLAN feature.
Guest configured VLAN: VLANs are assigned inside the guest. In this case, the setup is completely done inside the guest and can not be influenced from the outside. The benefit is that you can use more than one VLAN on a single virtual NIC.
VLAN on the Host
To allow host communication with an isolated network, it is possible to apply VLAN tags to any network device (NIC, bond, bridge). In general, you should configure the VLAN on the interface with the least abstraction layers between itself and the physical NIC.
For example, here is a default configuration where the host management address is placed on a separate VLAN:
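A sketch, assuming the management address lives on VLAN 5; the VLAN is configured on the physical NIC (eno1.5) with its own bridge (vmbr0v5), following the "traditional" approach mentioned above, while vmbr0 carries the untagged guest traffic:

    auto lo
    iface lo inet loopback

    iface eno1 inet manual

    iface eno1.5 inet manual

    auto vmbr0v5
    iface vmbr0v5 inet static
        address 10.10.10.2/24
        gateway 10.10.10.1
        bridge-ports eno1.5
        bridge-stp off
        bridge-fd 0

    auto vmbr0
    iface vmbr0 inet manual
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0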
The next example is the same setup but a bond is used to make this network fail-safe.
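A sketch of the bonded variant, assuming eno1 and eno2 form an LACP bond and the management address again lives on VLAN 5:

    auto lo
    iface lo inet loopback

    iface eno1 inet manual
    iface eno2 inet manual

    auto bond0
    iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode 802.3ad
        bond-xmit-hash-policy layer2+3

    iface bond0.5 inet manual

    auto vmbr0v5
    iface vmbr0v5 inet static
        address 10.10.10.2/24
        gateway 10.10.10.1
        bridge-ports bond0.5
        bridge-stp off
        bridge-fd 0

    auto vmbr0
    iface vmbr0 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0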
Disabling IPv6 on the Node
Proxmox VE works correctly in all environments, irrespective of whether IPv6 is deployed or not. We recommend leaving all settings at the provided defaults.
Should you still need to disable support for IPv6 on your node, do so by creating an appropriate sysctl.conf (5) snippet file and setting the proper sysctls, for example adding /etc/sysctl.d/disable-ipv6.conf with content:
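For example (these sysctls disable IPv6 on all interfaces):

    net.ipv6.conf.all.disable_ipv6 = 1
    net.ipv6.conf.default.disable_ipv6 = 1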
This method is preferred to disabling the loading of the IPv6 module on the kernel commandline.