DNS Troubleshooting Guide

Updated at: 2025-10-27

This document outlines common DNS-related problems and errors, along with corresponding troubleshooting methods and solutions.

DNS troubleshooting process

Troubleshooting process

When DNS resolution fails, troubleshoot as follows:

  1. Determine the problem type from the error message returned by the client. For how to classify errors, see Category of common client errors.

    1. If the error indicates a network connectivity problem, see Troubleshoot by domain name type with resolution exception under DNS troubleshooting approach.
    2. If the error indicates a domain name resolution failure, see Troubleshoot by frequency of resolution exception under DNS troubleshooting approach.
  2. If the steps above do not resolve the problem, continue as follows:

    1. Check the dnsPolicy field in the pod configuration to confirm whether the pod uses CoreDNS. For details, see Pod DNS configuration check.

      1. If the pod does not use CoreDNS, it inherits the node's DNS configuration by default. See Node DNS troubleshooting.
      2. If the pod uses CoreDNS, check the following:

        1. CoreDNS load and running status. See CoreDNS pod running status troubleshooting.
        2. CoreDNS logs. See CoreDNS pod log troubleshooting.
    2. If NodeLocal DNS is used, see NodeLocal DNS troubleshooting.
  3. If the problem persists after the steps above, submit a ticket.

Category of common client errors

| Category | Error keywords |
|---|---|
| Network connectivity problem | network unreachable; connection timeout; connection reset by peer; no route to host |
| Domain name resolution failure | no such host; could not resolve host; name or service not known; NXDOMAIN |

Note: Error messages vary between SDKs, and the keywords listed above are not exhaustive. Base your judgment on the actual error message.
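As a rough aid for step 1 of the troubleshooting process, the keyword matching in the table can be sketched as a small POSIX shell function. The function name classify_dns_error and the extra variants "network is unreachable" and "connection timed out" are assumptions added for illustration, not part of CCE; adjust the patterns to the messages your SDK actually returns.

```shell
# Hypothetical helper (not part of CCE): classify a client error message
# using the keywords from the table above, plus a few common variants.
# Matching is case-insensitive. Prints "network", "resolution" or "unknown".
classify_dns_error() {
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"network unreachable"*|*"network is unreachable"*|*"no route to host"*)
      echo "network" ;;
    *"connection timeout"*|*"connection timed out"*|*"connection reset by peer"*)
      echo "network" ;;
    *"no such host"*|*"could not resolve host"*|*"name or service not known"*|*nxdomain*)
      echo "resolution" ;;
    *)
      echo "unknown" ;;
  esac
}

# Example: prints "resolution"
#   classify_dns_error "dial tcp: lookup api.example.com: no such host"
```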

DNS troubleshooting approach

Troubleshoot by domain name type with resolution exception

| Exception type | Solution |
|---|---|
| Both intra-cluster and extra-cluster domain names fail to resolve | Container network failure; High CoreDNS pod load; Node Conntrack table full; IPVS defect causing resolution exception |
| Only extra-cluster domain names fail to resolve | Extra-cluster domain name resolution exception |
| Only Headless Service domain names fail to resolve | Headless Service domain name resolution failure |

Troubleshoot by frequency of resolution exception

| Frequency of occurrence | Solution |
|---|---|
| Only during application peak hours | High CoreDNS pod load; Node Conntrack table full |
| Frequent | IPVS defect causing resolution exception |
| Occasional | Concurrent resolution exception for A and AAAA records |
| Only during node scaling or CoreDNS scale-down | IPVS defect causing resolution exception |

Troubleshooting methods

Pod DNS configuration check

To view the dnsPolicy field of a business pod, execute the following command:

```shell
kubectl -n <ns> get pod <pod-name> -o yaml | grep dnsPolicy
```

Possible values of this field:

  • "Default": The pod inherits the name resolution configuration of the node it runs on, that is, it uses the node's cloud-based DNS server for domain name resolution.
  • "ClusterFirst": The pod uses CoreDNS for domain name resolution, and the nameserver in the pod's /etc/resolv.conf points to the ClusterIP of the kube-dns Service. This is the value used when dnsPolicy is not specified.
  • "ClusterFirstWithHostNet": Pods running in hostNetwork mode must set the DNS policy explicitly to "ClusterFirstWithHostNet". A hostNetwork pod set to "ClusterFirst" falls back to the behavior of the "Default" policy.
  • "None": The pod ignores the DNS settings of the Kubernetes environment and uses only the DNS settings provided in its dnsConfig field.

You can also enter the business pod to check whether its DNS configuration file matches expectations.

First, enter the container:

```shell
kubectl -n <ns> exec -it <pod-name> -- bash
```

If bash is not available in the container, use sh instead.

Then check the DNS configuration file:

```shell
cat /etc/resolv.conf
```

Compare the output against the pod's dnsPolicy:

  • ClusterFirst or ClusterFirstWithHostNet: the nameserver in resolv.conf should be the ClusterIP of the kube-dns Service in the cluster.
  • Default: the nameserver in resolv.conf should match the one in /etc/resolv.conf on the node.
  • None: the contents of resolv.conf should match the dnsConfig field in the pod spec.
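To make this comparison repeatable, here is a minimal sketch under these assumptions: a POSIX shell, a copy of the pod's resolv.conf on the machine running the check, and a hypothetical helper named check_resolv_nameserver. It compares only the first nameserver entry, which covers the ClusterFirst case; for the Default and None policies, pass the node's nameserver or the first entry of dnsConfig.nameservers as the expected value instead.

```shell
# Hypothetical helper: succeed if the first "nameserver" entry in a
# resolv.conf-style file ($1) equals the expected IP address ($2).
check_resolv_nameserver() {
  file=$1
  expected=$2
  # awk prints the address on the first nameserver line and stops.
  actual=$(awk '$1 == "nameserver" { print $2; exit }' "$file")
  if [ "$actual" = "$expected" ]; then
    echo "OK: nameserver $actual"
    return 0
  fi
  echo "MISMATCH: nameserver is '${actual:-<none>}', expected $expected"
  return 1
}

# Usage (requires cluster access; <ns> and <pod-name> are placeholders):
#   CLUSTER_IP=$(kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.clusterIP}')
#   kubectl -n <ns> exec <pod-name> -- cat /etc/resolv.conf > /tmp/pod-resolv.conf
#   check_resolv_nameserver /tmp/pod-resolv.conf "$CLUSTER_IP"
```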

Connectivity exception troubleshooting

Start by verifying connectivity between the pod and the ClusterIP of CoreDNS (the kube-dns Service).

You can enter the pod using the following command:

```shell
kubectl -n <ns> exec -it <pod-name> -- bash
```

If bash is not available in the container, use sh instead.

If you cannot enter the pod via bash or sh (for example, because the image contains no shell), log in to the node where the pod runs and execute the following command there to obtain the container ID:

```shell
docker ps | grep <container-name>
```

Then obtain the container's PID:

```shell
docker inspect -f '{{.State.Pid}}' <container-id>
```

Finally, enter the pod network namespace via the following command:

```shell
nsenter -t <pid> -n bash
```

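Note: If the container runtime is containerd, use crictl instead of docker: crictl ps to find the container ID, and crictl inspect to obtain the PID (reported in the info.pid field of its output). Because nsenter -n enters only the network namespace, tools such as telnet and dig are taken from the node's file system and do not need to exist in the container image.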
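The PID lookup for both runtimes can be wrapped in one function. This is a sketch, not a CCE tool: container_pid is a hypothetical name, and the crictl branch assumes the containerd runtime reports the PID under info.pid in the crictl inspect output.

```shell
# Hypothetical helper: print the host PID of a container.
# $1 = runtime ("docker" or "containerd"), $2 = container ID.
container_pid() {
  case "$1" in
    docker)
      docker inspect -f '{{.State.Pid}}' "$2" ;;
    containerd)
      # Assumes crictl inspect exposes the PID under .info.pid.
      crictl inspect --output go-template --template '{{.info.pid}}' "$2" ;;
    *)
      echo "unsupported runtime: $1" >&2
      return 1 ;;
  esac
}

# Usage on the node, as root (<container-id> is a placeholder):
#   pid=$(container_pid containerd <container-id>)
#   nsenter -t "$pid" -n bash
```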

Next, obtain the ClusterIP of CoreDNS (the kube-dns service) with the following command:

Plain Text
kubectl -n kube-system get svc kube-dns

Then test connectivity to the ClusterIP and to a CoreDNS pod IP in turn with the following commands:

Plain Text
telnet <ClusterIP> 53
telnet <pod-ip> 53
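
If telnet is not installed in the container but Python 3 is, a rough equivalent of the two checks can be sketched with the standard socket module. This is a hypothetical helper (the script name check_tcp.py is an assumption), and, like telnet, it only tests TCP port 53; it says nothing about DNS over UDP.

```python
import socket
import sys


def tcp_port_open(host, port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    # Usage: python3 check_tcp.py <ClusterIP> <pod-ip>
    for target in sys.argv[1:]:
        print(target, "reachable" if tcp_port_open(target) else "unreachable")
```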

If the ClusterIP is unreachable but the CoreDNS pod IP is reachable, this points to a problem with Service forwarding in the cluster. Check the components responsible for Service load balancing, such as kube-proxy.

If the pod cannot reach the CoreDNS pod IP either, first make sure the cluster's pod network is working correctly, and resolve any pod network issues before proceeding. Common causes include node network outages and incorrect security group rules.

If the cluster pod network has no problems, the problem may lie with the CoreDNS pods. Test resolution with the following command:

Plain Text
dig <domain> @<ClusterIP>
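
If dig is not available in the business image, the following Python sketch sends a single A-record query over UDP and reports the response code, roughly what dig shows in its status field. It is a hypothetical helper (dns_probe.py, query_rcode) built only on the standard library and the RFC 1035 message format; it does not parse answer records and does not retry over TCP.

```python
import random
import socket
import struct
import sys

# DNS response codes from RFC 1035; the names match those in CoreDNS logs.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name, qtype=1, txid=None):
    """Build a minimal DNS query (QTYPE 1 = A, class IN) with recursion desired."""
    if txid is None:
        txid = random.randint(0, 0xFFFF)
    # Header: ID, flags (RD bit set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length and the name ends with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def query_rcode(server, name, port=53, timeout=3.0):
    """Send one UDP query to server and return (response code name, answer count)."""
    packet = build_query(name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server, port))
        data, _ = sock.recvfrom(4096)
    txid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", data[:12])
    if txid != struct.unpack("!H", packet[:2])[0]:
        raise ValueError("response transaction ID does not match the query")
    rcode = flags & 0x000F  # low 4 bits of the flags field
    return RCODES.get(rcode, str(rcode)), ancount


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 dns_probe.py <ClusterIP> <domain>
    print(query_rcode(sys.argv[1], sys.argv[2]))
```

A socket.timeout (TimeoutError) here corresponds to dig reporting that no servers could be reached.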

Based on your findings, consult other sections of this document for further troubleshooting guidance.

CoreDNS pod running status troubleshooting

First, check the running status of the CoreDNS pod by executing the following command:

Plain Text
kubectl -n kube-system get pod -o wide | grep coredns
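
To collect the phase, IP, readiness, and restart count of every CoreDNS pod in one pass, you can also parse the JSON output of kubectl -n kube-system get pod -l k8s-app=kube-dns -o json. The script below is a minimal sketch; the file and function names are hypothetical, and it relies only on standard PodList fields (metadata.name, status.phase, status.podIP, status.containerStatuses). The pod IPs it prints are the ones to use with the dig @<pod-ip> check in the log troubleshooting step.

```python
import json
import sys


def summarize_pods(pod_list):
    """Extract (name, phase, podIP, all containers ready, total restarts) per pod."""
    rows = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        containers = status.get("containerStatuses", [])
        ready = bool(containers) and all(c.get("ready", False) for c in containers)
        restarts = sum(c.get("restartCount", 0) for c in containers)
        rows.append((pod["metadata"]["name"], status.get("phase", "Unknown"),
                     status.get("podIP", ""), ready, restarts))
    return rows


def unhealthy(rows):
    """Names of pods that are not Running or have containers that are not ready."""
    return [name for name, phase, _ip, ready, _r in rows if phase != "Running" or not ready]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: kubectl -n kube-system get pod -l k8s-app=kube-dns -o json > pods.json
    #        python3 coredns_pods.py pods.json
    with open(sys.argv[1]) as f:
        rows = summarize_pods(json.load(f))
    for name, phase, ip, ready, restarts in rows:
        print(f"{name}\t{phase}\t{ip}\tready={ready}\trestarts={restarts}")
    print("check with kubectl describe:", ", ".join(unhealthy(rows)) or "none")
```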

All CoreDNS pods should be in the Running status. For any pod that is not Running, run the following command to investigate the cause:

Plain Text
kubectl -n kube-system describe pod <pod-name>

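You can also check CoreDNS pod resource usage with the following command to determine whether the pods are reaching their resource limits (kubectl top requires the Metrics API, for example metrics-server, to be available in the cluster):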

Plain Text
kubectl -n kube-system top pod -l k8s-app=kube-dns

If the CoreDNS pods are overloaded, increase the number of CoreDNS replicas.

If the CoreDNS pods appear healthy, analyze their logs for further clues.

CoreDNS pod log troubleshooting

The command is as follows:

Plain Text
kubectl -n kube-system logs <pod-name>

Because requests sent to the ClusterIP are load-balanced across multiple CoreDNS pods, checking the logs of a single pod is only a spot check: the request you are investigating may have been handled by another pod.

To send a DNS request to one specific CoreDNS pod, specify that pod's IP as the DNS server in the dig command:

Plain Text
dig baidu.com @<pod-ip>

Then check the logs of the CoreDNS pod with that IP.

Example CoreDNS log entry:

Plain Text
[INFO] 192.168.2.90:30639 - 8870 "A IN nfd-master.kube-system.svc.cluster.local. udp 69 false 1232" NOERROR qr,aa,rd 114 0.00010245s

The NOERROR keyword is the response code and indicates that resolution succeeded. Common response codes are:

  • NOERROR: Resolution succeeded
  • NXDOMAIN: The domain name does not exist
  • SERVFAIL: The server failed to complete the resolution, for example because of an error from the upstream DNS server
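
To spot-check many entries at once, the log lines can be parsed and grouped by response code. The sketch below assumes the field layout of the example entry above (CoreDNS query lines only appear when the log plugin is enabled in the Corefile); the regex, file name, and function names are illustrative assumptions, and lines that do not match, such as plugin errors, are skipped.

```python
import re
import sys
from collections import Counter

# Field layout taken from the example entry, e.g.
# [INFO] 192.168.2.90:30639 - 8870 "A IN example.com. udp 69 false 1232" NOERROR qr,aa,rd 114 0.00010245s
LOG_RE = re.compile(
    r'\[(?P<level>\w+)\] (?P<client>\S+):(?P<port>\d+) - (?P<id>\d+) '
    r'"(?P<qtype>\S+) (?P<qclass>\S+) (?P<name>\S+) (?P<proto>\S+) '
    r'(?P<size>\d+) (?P<do>\S+) (?P<bufsize>\d+)" '
    r'(?P<rcode>\S+) (?P<rflags>\S+) (?P<rsize>\d+) (?P<duration>[\d.]+)s'
)


def parse_line(line):
    """Return a dict of fields for one query log line, or None if it does not match."""
    m = LOG_RE.search(line)
    if not m:
        return None
    fields = m.groupdict()
    fields["duration"] = float(fields["duration"])  # seconds
    return fields


def summarize(lines):
    """Count response codes and collect (name, rcode) for entries that are not NOERROR."""
    counts, failures = Counter(), []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # not a query log line
        counts[entry["rcode"]] += 1
        if entry["rcode"] != "NOERROR":
            failures.append((entry["name"], entry["rcode"]))
    return counts, failures


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: kubectl -n kube-system logs <pod-name> > coredns.log
    #        python3 coredns_log_summary.py coredns.log
    with open(sys.argv[1], errors="replace") as f:
        counts, failures = summarize(f.read().splitlines())
    print(dict(counts))
    for name, rcode in failures:
        print(rcode, name)
```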

Also check the logs for other errors. Common ones include:

  • Unable to connect to the API server: check the status of the API server
  • Kubernetes API compatibility errors: usually caused by a CoreDNS version that is incompatible with the cluster's Kubernetes version; upgrading CoreDNS or Kubernetes typically resolves them

Troubleshooting for domain name resolution

Scenario 1: Resolution of in-cluster services succeeds, but resolution of public domain names fails. In this case, the CoreDNS pod logs usually show response codes such as NXDOMAIN or SERVFAIL. Contact the cloud DNS team for assistance.

Scenario 2: Resolution of private domain names fails. Verify that the private domain name is registered in the cloud DNS service. For CoreDNS to resolve private domain names, add the DNS server responsible for the private domain to the CoreDNS configuration file (Corefile).

Scenario 3: Resolution of a headless service fails. For a headless service, resolution directly returns the IPs of the service's backend pods, so check whether those pods are Running and Ready.
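
To see how many addresses a headless service name actually returns from inside a business pod, you can resolve it through the system resolver, which uses the pod's /etc/resolv.conf. This is a minimal sketch; the script name and the service name in the usage comment are hypothetical placeholders.

```python
import socket
import sys


def resolve_ipv4(name):
    """Return the sorted, de-duplicated IPv4 addresses the system resolver returns for name."""
    infos = socket.getaddrinfo(name, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Usage (hypothetical name): python3 resolve_all.py my-svc.my-ns.svc.cluster.local
    for target in sys.argv[1:]:
        addresses = resolve_ipv4(target)
        print(f"{target}: {len(addresses)} address(es): {', '.join(addresses)}")
```

If the number of addresses is lower than the number of backend pods you expect, compare it with the pods that are Running and Ready; a socket.gaierror means the name did not resolve at all.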

Node DNS troubleshooting

If a workload's dnsPolicy is set to Default, or the workload runs with hostNetwork without using ClusterFirstWithHostNet, the node's DNS configuration is used.
You can reproduce and troubleshoot the issue with dig on the node, for example:

Plain Text
dig baidu.com

Check node kernel logs by executing the following command:

Plain Text
dmesg

Check for network-related errors, such as:

  • queue failures
  • a full conntrack table (the kernel logs "nf_conntrack: table full, dropping packet")
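
When the kernel log is long, saving the dmesg -T output to a file and filtering it is easier than reading it by eye. The sketch below is hypothetical (file and function names are assumptions); its only built-in pattern is the kernel's conntrack overflow message, and you can append your own regular expressions for other keywords you are looking for.

```python
import re
import sys

# Kernel message emitted when the netfilter conntrack table overflows.
# Append further regular expressions for other errors you want to find.
DEFAULT_PATTERNS = [r"nf_conntrack: table full, dropping packet"]


def find_kernel_errors(lines, patterns=DEFAULT_PATTERNS):
    """Return the kernel log lines that match any of the given regex patterns."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [line for line in lines if any(rx.search(line) for rx in compiled)]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: dmesg -T > /tmp/dmesg.txt && python3 scan_dmesg.py /tmp/dmesg.txt
    with open(sys.argv[1], errors="replace") as f:
        for hit in find_kernel_errors(f.read().splitlines()):
            print(hit)
```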

If the node kernel logs show no anomalies, contact the cloud DNS team, because by default the node's /etc/resolv.conf points to the public cloud DNS servers.

NodeLocal DNS troubleshooting

Read CCE Node Local DNS Description first to understand how NodeLocal DNS works.
Then verify, as described in that document, that the NodeLocal DNS configuration has taken effect. For the other parts of the resolution path, follow the steps in this document.

Common DNS problems and solutions

Container network connectivity failure

Problem phenomenon:
Persistent DNS resolution failures for business pods

Root cause:
Network connectivity failure between business pods and CoreDNS pod containers

Solution:
Troubleshoot and restore container network connectivity. See "Connectivity exception troubleshooting" in the troubleshooting methods above.

Resolution of extra-cluster domain names is abnormal

Problem phenomenon:
Business pods resolve in-cluster domain names normally but fail to resolve certain domain names outside the cluster

Root cause:
The upstream DNS server returns errors when resolving these domain names

Solution:
Check the CoreDNS pod request logs for the response code. Example log entry:

Plain Text
[INFO] 192.168.2.90:30639 - 8870 "A x.y.com. udp 69 false 1232" NOERROR qr,aa,rd 114 0.00010245s

If the response code is not NOERROR, the upstream server returned an error. Common codes include:

  • NXDOMAIN: Domain name does not exist
  • SERVFAIL: Typically indicates upstream server failure or inability to connect to upstream servers

If the problem is confirmed to be with the upstream server, submit a ticket for resolution.

Headless Service Domain Name Resolution Failure

Problem phenomenon: The domain name of a headless service cannot be resolved.

Root cause:
For a headless service, resolution directly returns the IPs of the backend pods. If the backend pods are not Running, there are no addresses to return, so resolution fails.

Solution:
Make sure the pods backing the service are in the Running status.

High load on CoreDNS pods

Problem phenomenon: Business pods experience high DNS request latency, intermittent failures, or persistent failures

Root cause:
The CoreDNS pods are under high load and lack processing capacity, which increases request latency or causes requests to fail

Solution:
Increase the number of CoreDNS replicas or allocate more resources to CoreDNS.

Node Conntrack table full

Problem phenomenon: Business pods experience intermittent or persistent DNS request failures during peak traffic. Running dmesg -T on the node shows errors such as "nf_conntrack: table full, dropping packet" in the logs for the corresponding period.

Root cause:
During application peak hours, the kernel conntrack table fills up, and new TCP or UDP connections cannot be tracked, so their packets are dropped

Solution:
Submit a ticket to have the kernel conntrack table limit on the nodes increased.
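
Before submitting a ticket, you can confirm how close a node is to the limit by reading the conntrack sysctls under /proc on the node. The sketch below is a hypothetical helper; it assumes the nf_conntrack module is loaded, which is when nf_conntrack_count and nf_conntrack_max exist. A ratio close to 1.0 during peak hours is consistent with the "table full" errors described above.

```python
from pathlib import Path


def conntrack_usage(proc_dir="/proc/sys/net/netfilter"):
    """Return (current entries, maximum entries, usage ratio) of the conntrack table."""
    base = Path(proc_dir)
    count = int((base / "nf_conntrack_count").read_text().strip())
    limit = int((base / "nf_conntrack_max").read_text().strip())
    return count, limit, count / limit


if __name__ == "__main__":
    try:
        count, limit, ratio = conntrack_usage()
    except FileNotFoundError:
        print("nf_conntrack sysctls not found (is the nf_conntrack module loaded?)")
    else:
        print(f"conntrack entries: {count}/{limit} ({ratio:.1%})")
```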

Concurrent resolution exception of A and AAAA records

Problem phenomenon: Business pods experience intermittent domain name resolution failures

Root cause:
A and AAAA DNS queries sent concurrently from the same socket trigger a defect (a race condition) in the Linux kernel conntrack module, causing UDP packets to be dropped

Solution:

  • If the container image is based on Alpine, consider replacing the base image (Alpine uses musl libc, whose resolver does not support the single-request-reopen option)
  • Consider adopting the NodeLocal DNS cache solution to improve DNS resolution performance and reduce CoreDNS load
  • For base images such as CentOS and Ubuntu, tune the resolver with options such as timeout:2 attempts:3 rotate single-request-reopen, which can be set through the pod's dnsConfig field
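
In a pod spec, resolver options are written under spec.dnsConfig.options as a list of entries with a name and an optional value. The sketch below converts the resolv.conf-style option string above into that form and prints a merge-patch body for a Deployment's pod template; the helper name and the patch usage are assumptions for illustration, and a merge patch replaces any existing dnsConfig.options list.

```python
import json


def resolv_options_to_dns_config(options):
    """Convert options such as "timeout:2 rotate" into spec.dnsConfig.options entries."""
    result = []
    for token in options.split():
        name, sep, value = token.partition(":")
        entry = {"name": name}
        if sep:  # options like "rotate" have no value
            entry["value"] = value
        result.append(entry)
    return result


if __name__ == "__main__":
    opts = resolv_options_to_dns_config("timeout:2 attempts:3 rotate single-request-reopen")
    # Possible usage: kubectl patch deployment <name> --type merge -p "$(python3 dns_opts.py)"
    print(json.dumps({"spec": {"template": {"spec": {"dnsConfig": {"options": opts}}}}}))
```
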

IPVS defects cause resolution exceptions

Problem phenomenon: During cluster node scaling or CoreDNS scale-down, intermittent resolution failures may occur, typically lasting about five minutes.

Root cause:
If kube-proxy in the cluster runs in IPVS mode, on nodes with kernel versions earlier than 4.19 (for example, CentOS), after an IPVS UDP backend is removed, newly sent UDP packets are dropped if their source port conflicts with an existing connection entry

Solution:
Consider adopting the NodeLocal DNS solution. Because the NodeLocal DNS pod communicates with the CoreDNS pods over TCP, it tolerates the packet loss caused by this IPVS defect.
