Troubleshoot issues with WSUS client agents
This article helps you diagnose and resolve issues with the Windows Server Update Services (WSUS) client agents.
Original product version: Windows Server Update Services
Original KB number: 10132
Issues with the WSUS client agents can manifest themselves in many ways. Common causes include:
- Incorrectly configured Group Policy client settings.
- Problems with the Background Intelligent Transfer Service (BITS).
- Problems with the WSUS agent service.
- A network issue, such as a proxy misconfiguration, that prevents the client from reaching the server.
- A corrupted Automatic Update Agent Store.
- Duplicate WSUS client IDs caused by disk cloning.
Verify that the client is configured correctly
When you troubleshoot issues with a WSUS client agent, first make sure the client is properly configured. Make sure the proper Active Directory Group Policy is being received by the client, and the details of the WSUS server are present. You can do so by running the following command:
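The command itself is not shown here; a likely form, assuming the standard gpresult tool and an arbitrary output path, is:

```console
gpresult /v > %temp%\gpresult.txt
notepad %temp%\gpresult.txt
```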
Open the text file in Notepad and find the name of your WSUS policy. For example, if your WSUS policy is named WSUS, it appears in the GPRESULT.TXT file within the Computer Settings section, under the Applied Group Policy Objects heading.
If the WSUS settings aren’t present, possible causes include:
- The system doesn’t have the Group Policy from the domain.
- The Group Policy isn’t targeted to the client system.
To fix this issue, ensure that the Group Policy is successfully updated on each client, and that the WSUS setting is properly configured.
To update the Group Policy on the client, run GPUpdate /force from a Command Prompt.
For more information about configuring Group Policy for WSUS clients, see Configure Automatic Updates by Using Group Policy.
Check for issues relating to BITS
Background Intelligent Transfer Service (BITS) is the service used by WSUS to download updates from Microsoft Update to the main WSUS server, and from WSUS servers to their clients. Some download issues may be caused by problems with BITS on the server or client computers. When you troubleshoot download problems, you should ensure that BITS is running properly on all affected computers.
The BITS service must run under the LocalSystem account by default. To configure the service to run under the correct account, follow these steps:
Open a Command Prompt and run the following command:
A space must occur between obj= and LocalSystem. If successful, you should receive the following output:
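A likely form of this command and its success output, using the standard sc syntax:

```console
C:\> sc config bits obj= LocalSystem
[SC] ChangeServiceConfig SUCCESS
```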
Stop and restart BITS.
To view the BITS service status, open a Command Prompt and run the following command:
If BITS is running, you should see the following output:
If BITS isn’t running, you’ll see the following output:
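A sketch of the status check using the standard sc tool:

```console
C:\> sc query bits

SERVICE_NAME: bits
        TYPE               : 20  WIN32_SHARE_PROCESS
        STATE              : 4  RUNNING
```

If the service is not running, the STATE line reads `1  STOPPED` instead.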
Usually it’s possible to resolve BITS issues by stopping the service and restarting it. To stop and restart the BITS service, run the following commands from a Command Prompt:
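For example:

```console
sc stop bits
sc start bits
```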
You must be logged on as a local administrator to stop and restart BITS.
BITS fails to start
If the BITS service fails to start, look in the event log for any BITS-related error. You can use the following table to diagnose the cause of these errors.
Error name | Error code | Description |
---|---|---|
ERROR_SERVICE_DOES_NOT_EXIST | 0x80070424 | See the section on repairing the BITS configuration below. |
ERROR_SERVICE_NOT_IN_EXE | 0x8007043B | BITS isn’t listed as one of the services in the netsvcs svchost group |
ERROR_SERVICE_DISABLED | 0x80070422 | BITS has been disabled. Enable the BITS service. |
ERROR_SERVICE_DEPENDENCY_DELETED ERROR_SERVICE_DEPENDENCY_FAIL | 0x80070433, 0x8007042c | A service in the BITS service dependency list cannot be started. Make sure the dependency list for the BITS service is correct. Windows Vista: RpcSs, EventSystem (plus http.sys and LanManWorkstation when peer caching is enabled); Windows Server 2003: Rpcss, EventSystem; Windows XP: Rpcss; Windows 2000: Rpcss, SENS, Wmi |
ERROR_PATH_NOT_FOUND | 0x80070003 | Pre-Windows Vista: %ALLUSERSPROFILE%\Microsoft\Network doesn’t exist |
ERROR_FILE_NOT_FOUND | 0x80070002 | The Parameters key is missing. Ensure that the following keys and values exist: HKLM\SYSTEM\CurrentControlSet\Services\BITS\Parameters\ServiceDll = %SystemRoot%\System32\qmgr.dll |
REGDB_E_CLASSNOTREG, EVENT_E_INTERNALERROR | 0x80040154, 0x80040206 | BITS for Windows 2000 is dependent on SENS and EventSystem services. If the COM+ catalog is corrupted, BITS may fail with this error code. |
BITS jobs are failing
If the client is properly configured to receive updates, BITS is configured correctly, and BITS appears to start and run properly, you may be experiencing an issue where BITS jobs themselves are failing. To verify it, look in the event log for any BITS-related errors. You can use the following table to diagnose the cause of these errors.
Error name | Error code | Description |
---|---|---|
E_INVALIDARG | 0x80070057 | An incorrect proxy server name was specified in the user’s Internet Explorer proxy settings. This error is also seen when credentials are supplied for authentication schemes that aren’t NTLM/Negotiate, but the user name or password is null. Change the user’s Internet Explorer proxy settings to be a valid proxy server. Or change the credentials not to be NULL user name/password for schemes other than NTLM/Negotiate. |
ERROR_WINHTTP_NAME_NOT_RESOLVED | 0x80072ee7 | The server/proxy could not be resolved by BITS. Internet Explorer on the same machine in the context of the job owner would see the same problem. Try downloading the same file via the web browser using the context of the job owner. |
ERROR_HTTP_INVALID_SERVER_RESPONSE | 0x80072f78 | It’s a transient error and the job will continue downloading. |
BG_E_INSUFFICIENT_RANGE_SUPPORT | 0x80200013 | BITS uses range headers in HTTP requests to request parts of a file. If the server or proxy server doesn’t understand range requests and returns the full file instead of the requested range, BITS puts the job into the ERROR state with this error. Capture the network traffic during the error and examine if HTTP GET requests with Range header are getting valid responses. Check proxy servers to ensure that they are configured correctly to support Range requests. |
BG_E_MISSING_FILE_SIZE | 0x80200011 | When BITS sends a HEAD request and the server/proxy doesn’t return Content-Length header in the response, BITS puts the job in ERROR state with this error. Check the proxy server and WSUS server to ensure that they are configured correctly. Some versions of the Apache 2.0 proxy server are known to exhibit this behavior. |
BG_E_HTTP_ERROR_403 | 0x80190193 | When the server returns HTTP 403 response in any of the requests, BITS puts the job in ERROR state with this error code. HTTP 403 corresponds to Forbidden: Access is denied. Check access permissions for the account running the job. |
ERROR_NOT_LOGGED_ON | 0x800704dd | The SENS service isn’t receiving user logon notifications. BITS (version 2.0 and later) depends on logon notifications from Service Control Manager, which in turn depends on the SENS service. Ensure that the SENS service is started and running correctly. |
Repair a corrupted BITS configuration
To repair a corrupted BITS service configuration, you can enter the BITS service configuration manually.
This action should only be taken in circumstances where all other troubleshooting attempts have failed. You must be an administrator to modify the BITS configuration.
To repair a corrupted BITS configuration, follow these steps:
Open a Command Prompt.
Enter the following commands, press ENTER after you type each command:
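The exact command list is not shown here; a sketch of the commonly documented sequence for restoring the default BITS configuration (service path, dependencies, account, and the ServiceDll value named in the table above) follows. Verify against current Microsoft documentation for your OS version before use:

```console
sc config bits binpath= "%systemroot%\system32\svchost.exe -k netsvcs"
sc config bits depend= RpcSs/EventSystem
sc config bits start= delayed-auto
sc config bits error= normal
sc config bits obj= LocalSystem
reg add HKLM\SYSTEM\CurrentControlSet\Services\BITS\Parameters /v ServiceDll /t REG_EXPAND_SZ /d %SystemRoot%\System32\qmgr.dll /f
```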
Stop and restart BITS.
Issues with the WSUS agent service
Make sure that the Windows Update service can start successfully.
To view the current status of the Windows Update service, open a Command Prompt and run the following command:
If WUAUSERV is running, you should see the following output:
If WUAUSERV isn’t running, you see the following output:
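A sketch of the status check using the standard sc tool:

```console
C:\> sc query wuauserv

SERVICE_NAME: wuauserv
        TYPE               : 20  WIN32_SHARE_PROCESS
        STATE              : 4  RUNNING
```

If the service is not running, the STATE line reads `1  STOPPED` instead.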
Verify that you can start the WUAUSERV service successfully. You must be logged on as a local administrator to stop and restart WUAUSERV.
To start the WUAUSERV service, run the following commands from a Command Prompt:
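For example:

```console
sc start wuauserv
```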
If the client agent fails to start and run properly, check the Windows Update Agent version. If the agent isn’t up to date, update the Windows Update Agent to the latest version.
After you run the fix or update the agent, run wuauclt /detectnow. Check windowsupdate.log to make sure there are no issues.
Make sure the WSUS server is reachable from the client
Make sure that you can access the URL http://<WSUSServerName>/iuident.cab (substitute your WSUS server name) and download the file without errors.
If the WSUS server is unreachable from the client, the most likely causes include:
- There’s a name resolution issue on the client.
- There’s a network-related issue, such as a proxy configuration issue.
Use standard troubleshooting procedures to verify that name resolution is working on the network. If name resolution is working, the next step is to check for proxy issues. Check windowsupdate.log (in C:\Windows) for any proxy-related errors. You can run the proxycfg command to check the WinHTTP proxy settings.
If there are proxy errors, go to Internet Explorer > Tools > Connections > LAN Settings, configure the correct proxy, and then make sure you can access the WSUS URL specified.
Once done, you can copy these user proxy settings to the WinHTTP proxy settings by using the proxycfg -u command. After the proxy settings are specified, run wuauclt /detectnow from a Command Prompt and check windowsupdate.log for errors.
Rebuild the Automatic Update Agent Store
When there are issues downloading updates and there are errors relating to the software distribution store, complete the following steps on the client:
- Stop the Automatic Updates service by running sc stop wuauserv from a Command Prompt.
- Rename the software distribution folder (for example, C:\Windows\SoftwareDistribution).
- Restart the Automatic Update service by running sc start wuauserv from a Command Prompt.
- From a Command Prompt, run wuauclt /resetauthorization /detectnow.
- From a Command Prompt, run wuauclt /reportnow.
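The steps above can be sketched as a single command sequence (the renamed folder name is arbitrary):

```console
sc stop wuauserv
ren %windir%\SoftwareDistribution SoftwareDistribution.old
sc start wuauserv
wuauclt /resetauthorization /detectnow
wuauclt /reportnow
```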
Check for clients with the same SUSclient ID
You may experience an issue where only one WSUS client appears in the console. Or you may notice that out of a group of clients, only one appears in the console at a time, but the exact one that appears may change over time. This issue can happen when systems are imaged and the clients end up having the same SUSclientID.
For those clients that aren't working properly because of the same SUSclientID, complete the following steps:
Stop the Automatic Updates service by running sc stop wuauserv from a Command Prompt.
Delete the SUSclientID registry key from the following location:
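The registry location is not shown here; in standard WSUS deployments the client ID value lives under the WindowsUpdate key (verify the value exists before deleting), for example:

```console
reg delete "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate" /v SusClientId /f
```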
Restart the Automatic Update service by running sc start wuauserv from a Command Prompt.
From a Command Prompt, run wuauclt /resetauthorization /detectnow .
From a Command Prompt, run wuauclt /reportnow .
Troubleshoot the Windows Server Software Defined Networking Stack
Applies to: Windows Server 2019, Windows Server 2016
This guide examines the common Software Defined Networking (SDN) errors and failure scenarios and outlines a troubleshooting workflow that leverages the available diagnostic tools.
For more information about Microsoft’s Software Defined Networking, see Software Defined Networking.
Error types
The following list represents the classes of problems most often seen with Hyper-V Network Virtualization (HNVv1) in Windows Server 2012 R2 in-market production deployments. It coincides in many ways with the types of problems seen in Windows Server 2016 HNVv2 with the new Software Defined Networking (SDN) stack.
Most errors can be classified into a small set of classes:
- **Invalid or unsupported configuration**: A user invokes the NorthBound API incorrectly or with an invalid policy.
- **Error in policy application**: Policy from the Network Controller was not delivered to a Hyper-V host, was significantly delayed, and/or is not up to date on all Hyper-V hosts (for example, after a Live Migration).
- **Configuration drift or software bug**: Data-path issues resulting in dropped packets.
- **External error related to NIC hardware/drivers or the underlay network fabric**: Misbehaving task offloads (such as VMQ) or a misconfigured underlay network fabric (such as MTU).
This troubleshooting guide examines each of these error categories and recommends best practices and diagnostic tools available to identify and fix the error.
Diagnostic tools
Before discussing the troubleshooting workflows for each of these types of errors, let's examine the diagnostic tools available.
To use the Network Controller (control-path) diagnostic tools, you must first install the RSAT-NetworkController feature and import the NetworkControllerDiagnostics module:
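A sketch of that setup (the feature name is as given above; Add-WindowsFeature is the standard Windows Server cmdlet):

```powershell
Add-WindowsFeature RSAT-NetworkController -IncludeManagementTools
Import-Module NetworkControllerDiagnostics
```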
To use the HNV Diagnostics (data-path) diagnostic tools, you must import the HNVDiagnostics module:
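The import referenced above is presumably:

```powershell
Import-Module hnvdiagnostics
```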
Network controller diagnostics
These cmdlets are documented on TechNet in the Network Controller Diagnostics Cmdlet Topic. They help identify problems with network policy consistency in the control-path between Network Controller nodes and between the Network Controller and the NC Host Agents running on the Hyper-V hosts.
The Debug-ServiceFabricNodeStatus and Get-NetworkControllerReplica cmdlets must be run from one of the Network Controller node virtual machines. All other NC Diagnostic cmdlets can be run from any host that has connectivity to the Network Controller and is either in the Network Controller Management security group (Kerberos) or has access to the X.509 certificate for managing the Network Controller.
Hyper-V host diagnostics
These cmdlets are documented on TechNet in the Hyper-V Network Virtualization (HNV) Diagnostics Cmdlet Topic. They help identify problems in the data-path between tenant virtual machines (East/West) and ingress traffic through an SLB VIP (North/South).
The Debug-VirtualMachineQueueOperation, Get-CustomerRoute, Get-PACAMapping, Get-ProviderAddress, Get-VMNetworkAdapterPortId, Get-VMSwitchExternalPortId, and Test-EncapOverheadSettings cmdlets are all local tests that can be run from any Hyper-V host. The other cmdlets invoke data-path tests through the Network Controller and therefore need access to the Network Controller as described above.
GitHub
The Microsoft/SDN GitHub Repo has a number of sample scripts and workflows which build on top of these in-box cmdlets. In particular, diagnostic scripts can be found in the Diagnostics folder. Please help us contribute to these scripts by submitting Pull Requests.
Troubleshooting Workflows and Guides
[Hoster] Validate System Health
There is an embedded resource named Configuration State in several of the Network Controller resources. Configuration state provides information about system health including the consistency between the network controller’s configuration and the actual (running) state on the Hyper-V hosts.
To check configuration state, run the following from any Hyper-V host with connectivity to the Network Controller.
The value for the NetworkController parameter should be either the FQDN or the IP address, based on the subject name of the X.509 certificate created for the Network Controller.
The Credential parameter only needs to be specified if the network controller is using Kerberos authentication (typical in VMM deployments). The credential must be for a user who is in the Network Controller Management Security Group.
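The check itself is not shown above; it is likely the Debug-NetworkControllerConfigurationState cmdlet from the NetworkControllerDiagnostics module, along the lines of:

```powershell
Debug-NetworkControllerConfigurationState -NetworkController <FQDN or NC REST IP>
```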
A sample Configuration State message is shown below:
There is a bug in the system where the Network Interface resources for the SLB Mux Transit VM NIC are in a Failure state with the error "Virtual Switch - Host Not Connected To Controller". This error can be safely ignored if the IP configuration in the VM NIC resource is set to an IP address from the Transit Logical Network's IP pool. There is a second bug in the system where the Network Interface resources for the Gateway HNV Provider VM NICs are in a Failure state with the error "Virtual Switch - PortBlocked". This error can also be safely ignored if the IP configuration in the VM NIC resource is set to null (by design).
The table below shows the list of error codes, messages, and follow-up actions to take based on the configuration state observed.
Code | Message | Action |
---|---|---|
Unknown | Unknown error | |
HostUnreachable | The host machine is not reachable | Check the Management network connectivity between Network Controller and Host |
PAIpAddressExhausted | The PA Ip addresses exhausted | Increase the HNV Provider logical subnet’s IP Pool Size |
PAMacAddressExhausted | The PA Mac addresses exhausted | Increase the Mac Pool Range |
PAAddressConfigurationFailure | Failed to plumb PA addresses to the host | Check the Management network connectivity between Network Controller and Host. |
CertificateNotTrusted | Certificate is not trusted | Fix the certificates used for communication with the host. |
CertificateNotAuthorized | Certificate not authorized | Fix the certificates used for communication with the host. |
PolicyConfigurationFailureOnVfp | Failure in configuring VFP policies | This is a runtime failure. No definite work arounds. Collect logs. |
PolicyConfigurationFailure | Failure in pushing policies to the hosts, due to communication failures or others error in the NetworkController. | No definite actions. This is due to failure in goal state processing in the Network Controller modules. Collect logs. |
HostNotConnectedToController | The Host is not yet connected to the Network Controller | Port Profile not applied on the host or the host is not reachable from the Network Controller. Validate that HostID registry key matches the Instance ID of the server resource |
MultipleVfpEnabledSwitches | There are multiple VFP enabled Switches on the host | Delete one of the switches, since the Network Controller Host Agent only supports one vSwitch with the VFP extension enabled |
PolicyConfigurationFailure | Failed to push VNet policies for a VmNic due to certificate errors or connectivity errors | Check if proper certificates have been deployed (Certificate subject name must match FQDN of host). Also verify the host connectivity with the Network Controller |
PolicyConfigurationFailure | Failed to push vSwitch policies for a VmNic due to certificate errors or connectivity errors | Check if proper certificates have been deployed (Certificate subject name must match FQDN of host). Also verify the host connectivity with the Network Controller |
PolicyConfigurationFailure | Failed to push Firewall policies for a VmNic due to certificate errors or connectivity errors | Check if proper certificates have been deployed (Certificate subject name must match FQDN of host). Also verify the host connectivity with the Network Controller |
DistributedRouterConfigurationFailure | Failed to configure the Distributed router settings on the host vNic | TCPIP stack error. May require cleaning up the PA and DR Host vNICs on the server on which this error was reported |
DhcpAddressAllocationFailure | DHCP address allocation failed for a VMNic | Check if the static IP address attribute is configured on the NIC resource |
CertificateNotTrusted CertificateNotAuthorized | Failed to connect to Mux due to network or cert errors | Check the numeric code provided in the error message code: this corresponds to the winsock error code. Certificate errors are granular (for example, cert cannot be verified, cert not authorized, etc.) |
HostUnreachable | MUX is Unhealthy (Common case is BGPRouter disconnected) | BGP peer on the RRAS (BGP virtual machine) or Top-of-Rack (ToR) switch is unreachable or not peering successfully. Check BGP settings on both Software Load Balancer Multiplexer resource and BGP peer (ToR or RRAS virtual machine) |
HostNotConnectedToController | SLB host agent is not connected | Check that the SLB Host Agent service is running, and refer to the SLB host agent logs for reasons why it is not. If SLBM (NC) rejected the certificate presented by the host agent, the running state will show nuanced information |
PortBlocked | The VFP port is blocked, due to lack of VNET / ACL policies | Check if there are any other errors, which might cause the policies to be not configured. |
Overloaded | Loadbalancer MUX is overloaded | Performance issue with MUX |
RoutePublicationFailure | Loadbalancer MUX is not connected to a BGP router | Check if the MUX has connectivity with the BGP routers and that BGP peering is setup correctly |
VirtualServerUnreachable | Loadbalancer MUX is not connected to SLB manager | Check connectivity between SLBM and MUX |
QosConfigurationFailure | Failed to configure QoS policies | Check whether sufficient bandwidth is available for all VMs if QoS reservation is used |
Check network connectivity between the network controller and Hyper-V Host (NC Host Agent service)
Run the netstat command below to validate that there are three ESTABLISHED connections between the NC Host Agent and the Network Controller node(s) and one LISTENING socket on the Hyper-V Host
- LISTENING on port TCP:6640 on Hyper-V Host (NC Host Agent Service)
- Two established connections from Hyper-V host IP on port 6640 to NC node IP on ephemeral ports (> 32000)
- One established connection from Hyper-V host IP on ephemeral port to Network Controller REST IP on port 6640
There may only be two established connections on a Hyper-V host if there are no tenant virtual machines deployed on that particular host.
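The netstat invocation referenced above was likely along these lines (filtering on the OVSDB port 6640):

```console
netstat -anp tcp | findstr 6640
```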
Check Host Agent services
The network controller communicates with two host agent services on the Hyper-V hosts: SLB Host Agent and NC Host Agent. It is possible that one or both of these services is not running. Check their state and restart if they’re not running.
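For example (service names as used elsewhere in this guide; NcHostAgent has dependent services, hence -Force):

```powershell
Get-Service NcHostAgent, SlbHostAgent

Restart-Service NcHostAgent -Force
Restart-Service SlbHostAgent -Force
```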
Check health of network controller
If there are not three ESTABLISHED connections or if the Network Controller appears unresponsive, check to see that all nodes and service modules are up and running by using the following cmdlets.
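The cmdlets in question are presumably the node and replica checks (Get-NetworkControllerReplica must run from a Network Controller node VM, as noted earlier):

```powershell
Get-NetworkControllerNode
Get-NetworkControllerReplica
```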
The network controller service modules are:
- ControllerService
- ApiService
- SlbManagerService
- ServiceInsertion
- FirewallService
- VSwitchService
- GatewayManager
- FnmService
- HelperService
- UpdateService
Check that the Replica Status is Ready for each service.
Check for corresponding HostIDs and certificates between network controller and each Hyper-V Host
On a Hyper-V Host, run the following commands to check that the HostID corresponds to the Instance Id of a server resource on the Network Controller
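A sketch of this comparison (the registry path is the standard NC Host Agent location; `<NC REST FQDN>` is a placeholder):

```powershell
# On the Hyper-V host: the HostId presented to the Network Controller
Get-ItemProperty "HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Parameters" -Name HostId

# Against the Network Controller: the Instance Id of each server resource
Get-NetworkControllerServer -ConnectionUri "https://<NC REST FQDN>" |
    Select-Object ResourceId, InstanceId
```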
Remediation: If using SDNExpress scripts or manual deployment, update the HostId key in the registry to match the Instance Id of the server resource, and then restart the Network Controller Host Agent on the Hyper-V host (physical server). If using VMM, delete the Hyper-V server from VMM and remove the HostId registry key; then re-add the server through VMM.
Check that the thumbprints of the X.509 certificates used by the Hyper-V host (the hostname will be the cert's Subject Name) for (SouthBound) communication between the Hyper-V host (NC Host Agent service) and Network Controller nodes are the same. Also check that the Network Controller's REST certificate has a subject name of CN=<FQDN or IP>.
You can also check the following parameters of each cert to make sure the subject name is what is expected (hostname or NC REST FQDN or IP), the certificate has not yet expired, and that all certificate authorities in the certificate chain are included in the trusted root authority.
- Subject Name
- Expiration Date
- Trusted by Root Authority
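A quick way to review these certificate properties on the Hyper-V host (a sketch; adjust the store paths as needed):

```powershell
# Machine certificates: subject, thumbprint, and expiration
Get-ChildItem Cert:\LocalMachine\My | Format-List Subject, Thumbprint, NotAfter

# Trusted root authorities available on the host
Get-ChildItem Cert:\LocalMachine\Root | Select-Object Subject, Thumbprint
```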
Remediation: If multiple certificates have the same subject name on the Hyper-V host, the Network Controller Host Agent will randomly choose one to present to the Network Controller. This may not match the thumbprint of the server resource known to the Network Controller. In this case, delete one of the certificates with the same subject name on the Hyper-V host, and then restart the Network Controller Host Agent service. If a connection still cannot be made, delete the other certificate with the same subject name on the Hyper-V host and delete the corresponding server resource in VMM. Then re-create the server resource in VMM, which will generate a new X.509 certificate and install it on the Hyper-V host.
Check the SLB Configuration State
The SLB Configuration State can be determined as part of the output of the Debug-NetworkController cmdlet. This cmdlet also outputs the current set of Network Controller resources in JSON files, all IP configurations from each Hyper-V host (server), and the local network policy from the Host Agent database tables.
Additional traces will be collected by default. To not collect traces, add the -IncludeTraces:$false parameter.
The default output location will be the \NCDiagnostics\ directory. The default output directory can be changed by using the -OutputDirectory parameter.
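For example (parameter names taken from the description above; `<FQDN or NC REST IP>` is a placeholder):

```powershell
Debug-NetworkController -NetworkController <FQDN or NC REST IP> -OutputDirectory C:\NCDiagnostics -IncludeTraces:$false
```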
The SLB Configuration State information can be found in the diagnostics-slbstateResults.Json file in this directory.
This JSON file can be broken down into the following sections:
- Fabric
- SlbmVips — This section lists the SLB Manager VIP address, which is used by the Network Controller to coordinate configuration and health between the SLB Muxes and SLB Host Agents.
- MuxState — This section will list one value for each SLB Mux deployed giving the state of the mux
- Router Configuration — This section will list the Upstream Router’s (BGP Peer) Autonomous System Number (ASN), Transit IP Address, and ID. It will also list the SLB Muxes ASN and Transit IP.
- Connected Host Info — This section will list the Management IP addresses of all the Hyper-V hosts available to run load-balanced workloads.
- Vip Ranges — This section will list the public and private VIP IP pool ranges. The SLBM VIP will be included as an allocated IP from one of these ranges.
- Mux Routes — This section will list one value for each SLB Mux deployed containing all of the Route Advertisements for that particular mux.
- Tenant
- VipConsolidatedState — This section will list the connectivity state for each Tenant VIP including advertised route prefix, Hyper-V Host and DIP endpoints.
SLB State can be ascertained directly by using the DumpSlbRestState script available on the Microsoft SDN GitHub repository.
Gateway Validation
From Network Controller:
From Gateway VM:
From Top of Rack (ToR) Switch:
sh ip bgp summary (for 3rd party BGP Routers)
Windows BGP Router
In addition, based on the issues seen so far (especially in SDNExpress-based deployments), the most common reason for the Tenant Compartment not getting configured on gateway VMs is that the gateway capacity in FabricConfig.psd1 is less than what is assigned to the Network Connections (S2S tunnels) in TenantConfig.psd1. This can be checked easily by comparing the outputs of the following commands:
[Hoster] Validate Data-Plane
After the Network Controller has been deployed, tenant virtual networks and subnets have been created, and VMs have been attached to the virtual subnets, additional fabric level tests can be performed by the hoster to check tenant connectivity.
Check HNV Provider Logical Network Connectivity
After the first guest VM running on a Hyper-V host has been connected to a tenant virtual network, the Network Controller assigns two HNV Provider IP addresses (PA IP addresses) to the Hyper-V host. These IPs come from the HNV Provider logical network's IP pool and are managed by the Network Controller. To find out what these two HNV IP addresses are:
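One way to list them is the Get-ProviderAddress local test mentioned in the diagnostic tools section, run on the Hyper-V host:

```powershell
Get-ProviderAddress
```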
These HNV Provider IP Addresses (PA IPs) are assigned to Ethernet Adapters created in a separate TCPIP network compartment and have an adapter name of VLANX where X is the VLAN assigned to the HNV Provider (transport) logical network.
Connectivity between two Hyper-V hosts using the HNV Provider logical network can be done by a ping with an additional compartment (-c Y) parameter where Y is the TCPIP network compartment in which the PAhostVNICs are created. This compartment can be determined by executing:
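The compartment IDs can be listed with the in-box NetTCPIP cmdlet:

```powershell
Get-NetCompartment
```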
The PA Host vNIC adapters are not used in the data-path, and so do not have an IP assigned to the "vEthernet (PAhostVNic)" adapter.
For instance, assume that Hyper-V hosts 1 and 2 have HNV Provider (PA) IP Addresses of:
Hyper-V Host | PA IP Address 1 | PA IP Address 2 |
---|---|---|
Host 1 | 10.10.182.64 | 10.10.182.65 |
Host 2 | 10.10.182.66 | 10.10.182.67 |
We can ping between the two using the following command to check HNV Provider logical network connectivity.
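For example, assuming the PA compartment ID on Host 1 is 3 (taken from the Get-NetCompartment output; the ID on your hosts may differ), from Host 1:

```console
ping -c 3 10.10.182.66
```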
Remediation: If the HNV Provider ping does not work, check your physical network connectivity, including the VLAN configuration. The physical NICs on each Hyper-V host should be in trunk mode with no specific VLAN assigned. The Management Host vNIC should be isolated to the Management Logical Network's VLAN.
Check MTU and Jumbo Frame support on HNV Provider Logical Network
Another common problem in the HNV Provider logical network is that the physical network ports and/or Ethernet card do not have a large enough MTU configured to handle the overhead from VXLAN (or NVGRE) encapsulation.
Some Ethernet cards and drivers support the new *EncapOverhead keyword, which is automatically set by the Network Controller Host Agent to a value of 160. This value is then added to the value of the *JumboPacket keyword, and the sum is used as the advertised MTU. For example, *EncapOverhead = 160 and *JumboPacket = 1514 => MTU = 1674B.
To test whether or not the HNV Provider logical network supports the larger MTU size end-to-end, use the Test-LogicalNetworkSupportsJumboPacket cmdlet:
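For example (parameter names are illustrative; check Get-Help Test-LogicalNetworkSupportsJumboPacket, and substitute your hosts' management IPs):

```powershell
Test-LogicalNetworkSupportsJumboPacket -SourceHost <Host1 Mgmt IP> -DestinationHost <Host2 Mgmt IP>
```

If the end-to-end MTU is insufficient: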
- Adjust the MTU size on the physical switch ports to be at least 1674B (including 14B Ethernet header and trailer)
- If your NIC card does not support the EncapOverhead keyword, adjust the JumboPacket keyword to be at least 1674B
Check Tenant VM NIC connectivity
Each VM NIC assigned to a guest VM has a CA-PA mapping between the private Customer Address (CA) and the HNV Provider Address (PA) space. These mappings are kept in the OVSDB server tables on each Hyper-V host and can be found by executing the following cmdlet.
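The cmdlet in question is presumably Get-PACAMapping, run locally on the Hyper-V host:

```powershell
Get-PACAMapping
```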
If the CA-PA mappings you expect are not output for a given tenant VM, please check the VM NIC and IP Configuration resources on the Network Controller using the Get-NetworkControllerNetworkInterface cmdlet. Also, check the established connections between the NC Host Agent and Network Controller nodes.
With this information, a tenant VM ping can now be initiated by the Hoster from the Network Controller using the Test-VirtualNetworkConnection cmdlet.
Specific Troubleshooting Scenarios
The following sections provide guidance for troubleshooting specific scenarios.
No network connectivity between two tenant virtual machines
- [Tenant] Ensure Windows Firewall in tenant virtual machines is not blocking traffic.
- [Tenant] Check that IP addresses have been assigned to the tenant virtual machine by running ipconfig.
- [Hoster] Run Test-VirtualNetworkConnection from the Hyper-V host to validate connectivity between the two tenant virtual machines in question.
The VSID refers to the Virtual Subnet ID. In the case of VXLAN, this is the VXLAN Network Identifier (VNI). You can find this value by running the Get-PACAMapping cmdlet.
Example
Create a CA ping between "Green Web VM 1" (SenderCA IP 192.168.1.4), on host "sa18n30-2.sa18.nttest.microsoft.com" (Mgmt IP 10.127.132.153), and ListenerCA IP 192.168.1.5; both are attached to virtual subnet (VSID) 4114.
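One possible invocation for this scenario is sketched below. The parameter names follow the SDN diagnostics module, so verify them against your version; the -OperationId value is an arbitrary tracking number:

```powershell
# Hoster-initiated CA ping between the two tenant VM NICs on VSID 4114;
# values come from the example above, credentials are prompted for.
Test-VirtualNetworkConnection -OperationId 27 `
    -HostName "sa18n30-2.sa18.nttest.microsoft.com" -MgmtIp "10.127.132.153" `
    -Creds (Get-Credential) -VMName "Green Web VM 1" `
    -VMNetworkAdapterName "Green Web VM 1" `
    -SenderCAIP "192.168.1.4" -SenderVSID 4114 `
    -ListenerCAIP "192.168.1.5" -ListenerVSID 4114
```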
- [Tenant] Check that there are no distributed firewall policies specified on the virtual subnet or VM network interfaces that would block traffic.
Query the Network Controller REST API, found in this demo environment at sa18n30nc in the sa18.nttest.microsoft.com domain.
Look at IP Configuration and Virtual Subnets which are referencing this ACL
- [Hoster] Run Get-ProviderAddress on both Hyper-V hosts hosting the two tenant virtual machines in question, and then run Test-LogicalNetworkConnection or ping -c from the Hyper-V host to validate connectivity on the HNV Provider logical network.
- [Hoster] Ensure that the MTU settings are correct on the Hyper-V hosts and any Layer-2 switching devices in between the Hyper-V hosts. Run Test-EncapOverheadValue on all Hyper-V hosts in question. Also check that all Layer-2 switches in between have the MTU set to at least 1674 bytes to account for the maximum overhead of 160 bytes.
- [Hoster] If PA IP Addresses are not present and/or CA Connectivity is broken, check to ensure network policy has been received. Run Get-PACAMapping to see if the encapsulation rules and CA-PA mappings required for creating overlay virtual networks are correctly established.
- [Hoster] Check that the Network Controller Host Agent is connected to the Network Controller. Run netstat -anp tcp |findstr 6640 to see whether there's an established connection between the NC Host Agent and the Network Controller nodes.
- [Hoster] Check that the Host ID in the NC Host Agent service registry key (under HKLM) matches the Instance ID of the server resources hosting the tenant virtual machines.
- [Hoster] Check that the Port Profile ID matches the Instance ID of the VM Network Interfaces of the tenant virtual machines.
Logging, Tracing and advanced diagnostics
The following sections provide information on advanced diagnostics, logging, and tracing.
Network controller centralized logging
The Network Controller can automatically collect debugger logs and store them in a centralized location. Log collection can be enabled when you deploy the Network Controller for the first time or any time later. The logs are collected from the Network Controller, and network elements managed by Network Controller: host machines, software load balancers (SLB) and gateway machines.
These logs include debug logs for the Network Controller cluster, the Network Controller application, gateway logs, SLB, virtual networking and the distributed firewall. Whenever a new host/SLB/gateway is added to the Network Controller, logging is started on those machines. Similarly, when a host/SLB/gateway is removed from the Network Controller, logging is stopped on those machines.
Enable logging
Logging is automatically enabled when you install the Network Controller cluster using the Install-NetworkControllerCluster cmdlet. By default, the logs are collected locally on the Network Controller nodes at %systemdrive%\SDNDiagnostics. It's strongly recommended that you change this location to a remote file share rather than keeping it local.
The Network Controller cluster logs are stored at %programData%\Windows Fabric\log\Traces. You can specify a centralized location for log collection with the DiagnosticLogLocation parameter; it's recommended that this also be a remote file share.
If you want to restrict access to this location, you can provide the access credentials with the LogLocationCredential parameter. If you provide the credentials to access the log location, you should also provide the CredentialEncryptionCertificate parameter, which is used to encrypt the credentials stored locally on the Network Controller nodes.
With the default settings, it is recommended that you have at least 75 GB of free space in the central location, and 25 GB on the local nodes (if not using a central location) for a 3-node Network Controller cluster.
Change logging settings
You can change logging settings at any time using the Set-NetworkControllerDiagnostic cmdlet. The following settings can be changed:
- Centralized log location. You can change the location to store all the logs, with the DiagnosticLogLocation parameter.
- Credentials to access log location. You can change the credentials to access the log location, with the LogLocationCredential parameter.
- Move to local logging. If you have provided a centralized location to store logs, you can move back to logging locally on the Network Controller nodes with the UseLocalLogLocation parameter (not recommended because of the large disk space requirements).
- Logging scope. By default, all logs are collected. You can change the scope to collect only Network Controller cluster logs.
- Logging level. The default logging level is Informational. You can change it to Error, Warning, or Verbose.
- Log aging time. Logs are stored in a circular fashion, with three days of logging data by default for both local and centralized logging. You can change this limit with the LogTimeLimitInDays parameter.
- Log aging size. By default, the maximum is 75 GB of logging data with centralized logging and 25 GB with local logging. You can change this limit with the LogSizeLimitInMBs parameter.
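For example, the following sketch moves the logs to a remote share and changes the retention limits. The share path and limit values are placeholders; run it against your Network Controller deployment:

```powershell
# Placeholder share and limits; LogLocationCredential is only needed when the
# share restricts access.
$cred = Get-Credential
Set-NetworkControllerDiagnostic -DiagnosticLogLocation "\\logserver\SDNDiagnostics" `
    -LogLocationCredential $cred -LogTimeLimitInDays 7 -LogSizeLimitInMBs 102400
```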
Collecting Logs and Traces
VMM deployments use centralized logging for the Network Controller by default. The file share location for these logs is specified when deploying the Network Controller service template.
If a file location has not been specified, local logging will be used on each Network Controller node with logs saved under C:\Windows\tracing\SDNDiagnostics. These logs are saved using the following hierarchy:
- CrashDumps
- NCApplicationCrashDumps
- NCApplicationLogs
- PerfCounters
- SDNDiagnostics
- Traces
The Network Controller uses (Azure) Service Fabric. Service Fabric logs may be required when troubleshooting certain issues. These logs can be found on each Network Controller node at C:\ProgramData\Microsoft\Service Fabric.
If a user has run the Debug-NetworkController cmdlet, additional logs will be available on each Hyper-V host that has been specified with a server resource in the Network Controller. These logs (and traces, if enabled) are kept under C:\NCDiagnostics.
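A minimal invocation might look like the following. The REST endpoint name comes from this article's demo environment; verify the parameter set against your module version:

```powershell
# Collect diagnostics from the Network Controller and the Hyper-V hosts it
# manages; per-host logs land under C:\NCDiagnostics.
Debug-NetworkController -NetworkController "sa18n30nc.sa18.nttest.microsoft.com"
```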
SLB Diagnostics
SLBM Fabric errors (Hosting service provider actions)
- Check that Software Load Balancer Manager (SLBM) is functioning and that the orchestration layers can talk to each other: SLBM -> SLB Mux and SLBM -> SLB Host Agents. Run DumpSlbRestState from any node with access to Network Controller REST Endpoint.
- Validate the SDNSLBMPerfCounters in PerfMon on one of the Network Controller node VMs (preferably the primary Network Controller node; use Get-NetworkControllerReplica to find it):
- Is Load Balancer (LB) engine connected to SLBM? (SLBM LBEngine Configurations Total > 0)
- Does SLBM at least know about its own endpoints? (VIP Endpoints Total >= 2 )
- Are Hyper-V (DIP) hosts connected to SLBM? (HP clients connected == num servers)
- Is SLBM connected to the Muxes? (Muxes Connected == Muxes Healthy on SLBM == Muxes reporting healthy == # SLB Mux VMs)
- Ensure the BGP router configured is successfully peering with the SLB MUX
- If using RRAS with Remote Access (i.e. BGP virtual machine):
- Get-BgpPeer should show connected
- Get-BgpRouteInformation should show at least a route for the SLBM self VIP
- If using physical Top-of-Rack (ToR) switch as BGP Peer, consult your documentation
- For example: # show bgp instance
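On an RRAS-based BGP virtual machine, the peering checks above can be run directly; both cmdlets come with the RemoteAccess module:

```powershell
# Run on the RRAS BGP VM: verify the peer session state and the learned routes.
Get-BgpPeer                  # ConnectivityStatus should show Connected
Get-BgpRouteInformation      # should include a route for the SLBM self VIP
```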
- Validate the SlbMuxPerfCounters and SLBMUX counters in PerfMon on the SLB Mux VM
- Check the configuration state and VIP ranges in the Software Load Balancer Manager resource (Get-NetworkControllerLoadBalancerConfiguration -ConnectionUri <REST uri> | ConvertTo-Json -Depth 8)
- Check that the SLB Host Agent is connected to the Network Controller (netstat -anp tcp |findstr 6640)
- Check that the HostId in the NC Host Agent service registry key matches the corresponding server resource's Instance ID (Get-NCServer | ConvertTo-Json -Depth 8). For the HostNotConnected error code, see the Appendix.
- Check that the port profile ID of the virtual machine port matches the corresponding virtual machine NIC resource's Instance ID
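The HostId comparison can be sketched like this. The registry path is an assumption based on the NC Host Agent service name, so verify it in your deployment:

```powershell
# Assumed location of the NC Host Agent parameters; compare the HostId value
# with the server resource's Instance ID returned by Get-NCServer.
Get-ItemProperty -Path "HKLM:\SYSTEM\CurrentControlSet\Services\NcHostAgent\Parameters" -Name HostId
```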
SLB Mux Tracing
Information from the Software Load Balancer Muxes can also be obtained through Event Viewer.
- In Event Viewer, select "Show Analytic and Debug Logs" on the View menu
- Navigate to "Applications and Services Logs" > Microsoft > Windows > SlbMuxDriver > Trace
- Right-click the log and select "Enable Log"
It's recommended that you only enable this logging for a short time while you try to reproduce a problem.
VFP and vSwitch Tracing
From any Hyper-V host that hosts a guest VM attached to a tenant virtual network, you can collect a VFP trace to determine where problems might lie.
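As a starting point, the VFP state of the virtual switch ports can be inspected directly on the host. The vfpctrl utility ships with the Hyper-V virtual switch; its switches may vary by build, so verify against your installation:

```powershell
# List the vmswitch ports known to VFP on this host; the port names shown
# here can then be used to dump per-port rule state for deeper inspection.
vfpctrl /list-vmswitch-port
```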