getaddrinfo ENOTFOUND occasionally #3368

Closed
opened 2026-02-28 03:27:08 -05:00 by deekerman · 13 comments
Owner

Originally created by @sudoexec on GitHub (May 29, 2024).

📑 I have found these related issues/pull requests

  • https://github.com/louislam/uptime-kuma/issues/220
  • https://github.com/louislam/uptime-kuma/issues/3992
  • https://github.com/louislam/uptime-kuma/issues/4765

🛡️ Security Policy

  • [x] I agree to have read this project's Security Policy (https://github.com/louislam/uptime-kuma/security/policy)

Description

There are occasional getaddrinfo ENOTFOUND errors (0–3 errors per day).

Uptime Kuma is running in k8s. The upstream DNS is k8s's CoreDNS, and CoreDNS doesn't have any error logs.
I used while true; do nslookup example.com && sleep 1; done to test DNS resolution and saw no errors.

The error occurs randomly and I can't reproduce it.
Are there any methods to find details about this error?

👟 Reproduction steps

Can't reproduce.

👀 Expected behavior

No getaddrinfo ENOTFOUND errors.

😓 Actual Behavior

getaddrinfo ENOTFOUND

🐻 Uptime-Kuma Version

1.23.11

💻 Operating System and Arch

k8s

🌐 Browser

125.0.6422.112 (Official Build) Arch Linux (64-bit)

🖥️ Deployment Environment

  • Runtime: k8s v1.18.1
  • Database: sqlite
  • Filesystem used to store the database on: local storage via hostpath
  • number of monitors: 52

📝 Relevant log output

Failing: getaddrinfo ENOTFOUND

@CommanderStorm commented on GitHub (May 29, 2024):

Same steps as in https://github.com/louislam/uptime-kuma/issues/4765

getaddrinfo ENOTFOUND test.xyz

  • What is the TTL of the domains you are using?
  • Do you have DNS caching enabled in the settings?

Most commonly, this issue is caused by using a DNS resolver which does not like the level of DNS requests it is getting.
=> your DNS server is dropping SOME requests
=> have you enabled NSCD in the settings to lower the number of DNS requests to one per TTL (instead of one on every request)?


@sudoexec commented on GitHub (May 29, 2024):

Same steps as in #4765

getaddrinfo ENOTFOUND test.xyz

  • What is the TTL of the domains you are using?
  • Do you have DNS caching enabled in the settings?

Most commonly, this issue is caused by you using a DNS resolver which does not like the level of DNS requests it is getting. => your DNS Server is dropping SOME requests => have you enabled NSCD in the settings to lowered the amount of DNS requests to your TTL (instead of on every request)

  • TTL is 600
  • DNS caching is enabled
    (screenshot: https://github.com/louislam/uptime-kuma/assets/27770920/048e7366-2f7f-447b-8139-bfa3fe7c726d)

@CommanderStorm commented on GitHub (May 29, 2024):

I have no clue what could be causing this.

Let's rule out the obvious causes first:

  • Could you look in the log whether NSCD has been started successfully? (possible cause: using a custom UID/GID)
  • Have you verified that the TTL is actually 600?
  • Regarding "coredns don't have any error logs": just to make sure, have you activated https://coredns.io/plugins/errors/ and/or https://coredns.io/plugins/log/? What are the logs?

@sudoexec commented on GitHub (May 29, 2024):

could you look in the log whether NSCD has been started successfully? (possible cause: using a custom UID/GID)

ps aux shows NSCD is running.

have you verified that the TTL is actually 600?

I'm sure the TTL is 600.

coredns don't have any error logs

Just to make sure: have you activated https://coredns.io/plugins/errors/ and/or https://coredns.io/plugins/log/?
What are the logs?

I enabled the errors plugin but not the log plugin. I'll try enabling the log plugin to find more details.


@thielj commented on GitHub (May 30, 2024):

@sudoexec Alpine or other musl based Linux? Can you post a copy of your host's and the running container's /etc/resolv.conf?

I have seen similar issues in the past, including with Kubernetes, usually involving multiple DNS servers or related to search domains. The musl resolver would send out multiple parallel queries and ignore all replies but the first one. If that response was an error, this is what you would get. If the "good" lookup would usually win the race, you wouldn't see this error often.

Also, a regular nslookup or dig (or the DNS monitors in Kuma) does name-service lookups differently than, for example, curl or HTTP requests in Node, which use the resolver (getaddrinfo) provided by the C library. I just had a quick google and these might give some background:

https://jvns.ca/blog/2022/02/23/getaddrinfo-is-kind-of-weird/
https://medium.com/@hsahu24/understanding-dns-resolution-and-resolv-conf-d17d1d64471c

(this is just a personal opinion, but I wouldn't touch nscd with a barge pole)


@sudoexec commented on GitHub (May 30, 2024):

@thielj The host machine is Ubuntu 18.04.
Here are the resolv.conf files:

# Host
nameserver 119.29.29.29

# Container
nameserver 10.96.0.10                 # k8s coredns
search namespace.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

Thanks for the info you provided; I've learned more about DNS internals from it.

Additionally, I've added another nameserver to the Uptime Kuma pod, and there have been no errors in the past 2 days.


@thielj commented on GitHub (May 31, 2024):

If you get more getaddrinfo-related errors: those resolv.conf settings, and the internal DNS they lead to, are the rabbit hole you need to dig into, all the way from the container/pod down your stack.

https://coredns.io/2017/06/08/how-queries-are-processed-in-coredns/


@CommanderStorm commented on GitHub (May 31, 2024):

We should likely document this here
https://github.com/louislam/uptime-kuma/wiki/Troubleshooting

What is your second nameserver? (How did you find its IP? Do you have multiple coredns instances running?)

(Not a kubernetes/dns wizard 😅)


@sudoexec commented on GitHub (May 31, 2024):

@thielj Thanks again for your help. I'll try it.

@CommanderStorm

Additionally, I've added another nameserver to uptime kuma pod, and there're no errors in the past 2 days.

In fact, "another nameserver" is 1.1.1.1, in case the issue is caused by coredns.


@thielj commented on GitHub (May 31, 2024):

@sudoexec This probably doesn't do what you expect, and if it does, you're relying on implementation-specific behaviour of POSIX getaddrinfo (https://pubs.opengroup.org/onlinepubs/9699919799/functions/getaddrinfo.html). There are at least four different major implementations, and most of them can be further configured; see nsswitch.conf (https://www.gnu.org/software/libc/manual/html_node/NSS-Configuration-File.html) for an example.

The two most common, and their default behaviour with regard to the DNS resolver, are:

  • glibc, which will query the first server, and if it replies saying that it can't resolve your name, that's the final result. Only if the first server doesn't reply at all within the timeout, glibc would move on. For the purpose of monitoring, this can effectively mask problems in your Kubernetes DNS setup. Unless you monitor to show off "all green" to your boss or a client, it's probably not what you want.

  • musl, which will query both servers in parallel, and the first to reply wins. If 1.1.1.1 is faster than coredns and says the name is unresolvable, then that's the final result. This usually ends in your internal DNS winning the race 99.99% of the time. Instead of logging that your coredns is sometimes slow, you will log lookup failures (without knowing that they actually came from 1.1.1.1).

So: If you specify more than one server in resolv.conf, BOTH should be able to resolve ALL your hosts. If you want to implement fallbacks, query routing and such, configure a coredns or dnsmasq instance appropriately and point your resolv.conf to that. If you still want two DNS entries in your resolv.conf, configure two identically redundant instances.

Also, if you run frequent probes, you will eventually see failures. That's pretty normal: with 99.99% reliability, a < 0.01% failure rate would be acceptable. Configure your probes to allow for one retry, maybe?

Alpine/Musl name resolver differences: https://wiki.musl-libc.org/functional-differences-from-glibc.html


@skrue commented on GitHub (Jun 11, 2024):

I started seeing this behavior after setting up AdGuard Home. In my previous setup I only had Unbound DNS running on my OPNsense router/firewall. Now, AdGuard will relay all requests that it doesn't decide to block to Unbound, so AdGuard is the primary DNS. My entire home network is whitelisted in AdGuard as is the Uptime Kuma IP, so no blocking should be happening there. I am running Uptime Kuma as an LXC container on my Proxmox host. getaddrinfo ENOTFOUND errors pop up roughly once a day for each monitor that I have configured. I have now increased the retry value from 0 to 2, let's see if that helps.


@sudoexec commented on GitHub (Jun 23, 2024):

Weeks ago, I changed my upstream DNS (which is provided by the cloud service and managed by systemd-resolved) to 2 other public DNS servers. There have been no getaddrinfo ENOTFOUND errors since.


@benoitjpnet commented on GitHub (Feb 17, 2025):

I started seeing this behavior after setting up AdGuard Home. In my previous setup I only had Unbound DNS running on my OPNsense router/firewall. Now, AdGuard will relay all requests that it doesn't decide to block to Unbound, so AdGuard is the primary DNS. My entire home network is whitelisted in AdGuard as is the Uptime Kuma IP, so no blocking should be happening there. I am running Uptime Kuma as an LXC container on my Proxmox host. getaddrinfo ENOTFOUND errors pop up roughly once a day for each monitor that I have configured. I have now increased the retry value from 0 to 2, let's see if that helps.

Having the same setup and the same issue. I wonder what's wrong with AdGuard... Nothing in the logs.
