DOH requests hang after a network reconnection #2822

Closed
opened 2026-03-04 02:26:54 -05:00 by deekerman · 52 comments
Owner

Originally created by @spiralshapeturtle on GitHub (Jun 1, 2021).

Originally assigned to: @ainar-g on GitHub.

  • Version of AdGuard Home server:
    0.106.3
  • How did you install AdGuard Home:
    docker
  • How did you setup DNS configuration:
    Adguard -> router -> ISP
  • If it's a router or IoT, please write device model:
    OPNsense
  • CPU architecture:
    Intel
  • Operating system and version:
    Ubuntu running with Docker containers

Expected Behavior

AdGuard should always keep processing DNS requests.

Actual Behavior

Since the latest updates, AdGuard stops processing DNS requests. Even in the same VLAN (excluding the router) the software stops listening to DNS requests. The logs stay empty; the web GUI and ping still work (no network issue).

Screenshots

PRTG is polling my DNS setup; Pi-hole is deployed in the same setup and never shows an alarm. For now it seems related only to AdGuard.
![image](https://user-images.githubusercontent.com/16855935/120327086-ac5ee200-c2e9-11eb-8581-8bb1e17b4131.png)

Additional Information

sudo docker run -d --name adguard --hostname adguard --net=hostnetwork --ip=172.16.1.253 --cap-add=NET_ADMIN --cap-add=CAP_NET_RAW --cap-add=CAP_NET_BIND_SERVICE -e TZ=Europe/Amsterdam -v /home/rodeho/docker/adguard/workdir:/opt/adguardhome/work -v /home/rodeho/docker/adguard/confdir:/opt/adguardhome/ --restart unless-stopped adguard/adguardhome

deekerman 2026-03-04 02:26:54 -05:00
Author
Owner

@ainar-g commented on GitHub (Jun 1, 2021):

Hi. Is this really the full command that you use to run the container? Because it doesn't have any volumes attached, so any restart probably removes all previous configuration. See the quick start example on our Docker Hub page: https://hub.docker.com/r/adguard/adguardhome.

Author
Owner

@spiralshapeturtle commented on GitHub (Jun 1, 2021):

https://github.com/AdguardTeam/AdGuardHome/issues/2657

Could it be related?

No, that's not the full docker command; I removed the volumes for privacy reasons:
-v /home/foobar/docker/adguard/workdir:/opt/adguardhome/work -v /home/foobar/docker/adguard/confdir:/opt/adguardhome/

Author
Owner

@spiralshapeturtle commented on GitHub (Jun 1, 2021):

I think that when the internet upload is fully congested, the problem occurs faster. Fully utilized, like 100%, I mean.

Author
Owner

@ainar-g commented on GitHub (Jun 1, 2021):

I see. It will be hard to understand what is going on without the logs. Can you configure AGH to produce verbose logs and write them to a permanent location using log_file (see https://github.com/AdguardTeam/AdGuardHome/wiki/FAQ#verboselog), and either post the log here as an attachment or send it to devteam@adguard.com with the subject line “AdGuard Home Issue 3217”? Thanks.
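A minimal sketch of the corresponding AdGuardHome.yaml fragment, assuming the v0.106-era top-level keys from the FAQ above; the path is a placeholder:

```yaml
# Enable verbose logging and write it to a persistent file.
verbose: true
log_file: /opt/adguardhome/work/AdGuardHome.log  # placeholder path
```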

Author
Owner

@spiralshapeturtle commented on GitHub (Jun 1, 2021):

Sure, will do.

Author
Owner

@spiralshapeturtle commented on GitHub (Jun 1, 2021):


;; QUESTION SECTION:
;www.waze.com.	IN	 A

2021/06/01 22:56:31.435547 1#20385 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).udpHandlePacket(): error handling DNS (udp) request: talking to dnsUpstream failed, cause: all upstreams failed to respond, cause: couldn't do a GET request to 'https://149.112.112.11:443/dns-query', cause: Get "https://149.112.112.11:443/dns-query?dns=i9ABAAABAAAAAAABA3d3dwR3YXplA2NvbQAAAQABAAApCAAAAIAAAAA": context deadline exceeded (hidden: couldn't do a GET request to 'https://146.112.41.2:443/dns-query', cause: Get "https://146.112.41.2:443/dns-query?dns=i9ABAAABAAAAAAABA3d3dwR3YXplA2NvbQAAAQABAAApCAAAAIAAAAA": net/http: request canceled (Client.Timeout exceeded while awaiting headers), couldn't do a GET request to 'https://9.9.9.11:443/dns-query', cause: Get "https://9.9.9.11:443/dns-query?dns=i9ABAAABAAAAAAABA3d3dwR3YXplA2NvbQAAAQABAAApCAAAAIAAAAA": net/http: request canceled (Client.Timeout exceeded while awaiting headers))

This is the issue. It's a dual-ISP network; Pi-hole and Windows DNS do not report this problem. It seems almost impossible that there is no internet. Let me show you.

user@MacBook-Pro-van-user ~ % dig @172.16.1.253 github.com

...timeout...

Switching to the docker container itself, Adguard to be specific.

user@dockerstable:~$ sudo docker exec -it adguard ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1): 56 data bytes
64 bytes from 1.1.1.1: seq=0 ttl=55 time=10.323 ms
64 bytes from 1.1.1.1: seq=1 ttl=55 time=8.896 ms
64 bytes from 1.1.1.1: seq=2 ttl=55 time=10.350 ms
user@dockerstable:~$ sudo docker exec -it adguard ping github.com
PING github.com (140.82.121.4): 56 data bytes
64 bytes from 140.82.121.4: seq=0 ttl=52 time=17.865 ms
64 bytes from 140.82.121.4: seq=1 ttl=52 time=17.533 ms
64 bytes from 140.82.121.4: seq=2 ttl=52 time=17.141 ms
^C

Seems like a software issue regarding HTTPS DNS upstreams?

upstream config

https://146.112.41.2/dns-query
https://9.9.9.11/dns-query
https://149.112.112.11/dns-query
[/my.domain/]10.0.0.1
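As an aside, the dns= parameter in those GET URLs is just the raw DNS query, base64url-encoded per RFC 8484. A small Go sketch, using github.com/miekg/dns (the library dnsproxy builds on), to decode one back into a readable question:

```go
package main

import (
	"encoding/base64"
	"fmt"

	"github.com/miekg/dns"
)

func main() {
	// The "dns" parameter copied from the first failing GET in the log above.
	const param = "i9ABAAABAAAAAAABA3d3dwR3YXplA2NvbQAAAQABAAApCAAAAIAAAAA"
	raw, err := base64.RawURLEncoding.DecodeString(param)
	if err != nil {
		panic(err)
	}
	var m dns.Msg
	if err := m.Unpack(raw); err != nil {
		panic(err)
	}
	fmt.Println(m.Question) // expect the www.waze.com. IN A question from the log
}
```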
Author
Owner

@ameshkov commented on GitHub (Jun 1, 2021):

Client.Timeout exceeded while awaiting headers means that the TCP connection timed out.

Does the issue go away with time or is it necessary to restart the container?

When this happens, could you please record a tcpdump?
tcpdump dst 9.9.9.11 -w out.pcap

Also, it'd be interesting to see if you have any issues when you use a plain DNS upstream.
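For reference, a slightly fuller capture invocation on the Docker host might look like this; the interface choice and the port-443 filter are assumptions, adjust them to the setup:

```sh
# Capture traffic to one DoH upstream on all interfaces; stop with Ctrl-C.
sudo tcpdump -i any 'dst host 9.9.9.11 and tcp port 443' -w out.pcap
```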

Author
Owner

@spiralshapeturtle commented on GitHub (Jun 2, 2021):

Client.Timeout exceeded while awaiting headers means that the TCP connection timed out.

Does the issue go away with time or is it necessary to restart the container?

When this happens, could you please record a tcpdump?
tcpdump dst 9.9.9.11 -w out.pcap

Also, it'd be interesting to see if you have any issues when you use a plain DNS upstream.

OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: \"tcpdump\": executable file not found in $PATH": unknown

user@dockerstable:~$ sudo docker exec -it adguard find / -name tcpdump
user@dockerstable:~$

Could you tell me where tcpdump is hidden in the Docker image? Is there any way to run bash from the Docker host, i.e. as if you had SSH-ed into the container?

Author
Owner

@ameshkov commented on GitHub (Jun 2, 2021):

@rodeho, you don't need to run it inside the container; running it on your host machine should be just fine.

Author
Owner

@hplee0120 commented on GitHub (Jun 2, 2021):

I installed it on an OpenWrt router and it usually runs normally. But every time the WAN is reconnected, AGH stops resolving DNS and devices can't open some web pages; checking the logs confirms that no device DNS requests have been received by AGH. Everything returns to normal after shutting AGH down and starting it again.

Author
Owner

@hplee0120 commented on GitHub (Jun 3, 2021):

What might the logs say about this? It looks like a connectivity issue for DoH? But you actually need to close the program and restart it to get back to normal.
https://github.com/AdguardTeam/AdGuardHome/issues/3138

2021/06/03 05:37:06.267815 [info] Couldn't get version check json from https://static.adguard.com/adguardhome/release/version.json: *fmt.wrapError updater: HTTP GET https://static.adguard.com/adguardhome/release/version.json: Get "https://static.adguard.com/adguardhome/release/version.json": all upstreams failed to exchange request, cause: couldn't do a GET request to 'https://doh.pub:443/dns-query', cause: Get "https://doh.pub:443/dns-query?dns=M0EBAAABAAAAAAAABnN0YXRpYwdhZGd1YXJkA2NvbQAAAQAB": net/http: request canceled (Client.Timeout exceeded while awaiting headers) (hidden: couldn't do a GET request to 'https://dns.alidns.com:443/dns-query', cause: Get "https://dns.alidns.com:443/dns-query?dns=M0EBAAABAAAAAAAABnN0YXRpYwdhZGd1YXJkA2NvbQAAAQAB": net/http: request canceled (Client.Timeout exceeded while awaiting headers))

Author
Owner

@spiralshapeturtle commented on GitHub (Jun 3, 2021):

@ameshkov, there are 3 tickets complaining about ADH no longer processing requests when it detects an internet-down situation. All my other DNS servers keep running normally, no issue. Only ADH with DoH is having issues; with a client I can browse normally during the “internet down” alarms from ADH.

It really has something to do with the TCP implementation of ADH.

Author
Owner

@ameshkov commented on GitHub (Jun 3, 2021):

Okay, I have one idea. Our DoH client has a limit of 1 active connection to the remote host. Maybe, in your case, when the network change occurs, the connection is not closed and hangs in some half-open state. This would block further requests.

Let's try lifting that connection limit and see if it helps.
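A minimal sketch of the idea, assuming Go's net/http semantics; this is not AdGuard Home's actual code, and newDoHClient is a hypothetical helper:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// newDoHClient builds an HTTP client for a DoH upstream. With limit=true
// (the old behaviour) every request must reuse the single connection, so a
// half-open connection blocks all later queries until the client timeout
// fires. With limit=false a fresh connection can be dialed while the stale
// one dies.
func newDoHClient(limit bool) *http.Client {
	t := &http.Transport{IdleConnTimeout: 30 * time.Second}
	if limit {
		t.MaxConnsPerHost = 1 // requests queue behind the single connection
	}
	return &http.Client{
		Transport: t,
		Timeout:   10 * time.Second, // matches the ~10 s RTTs in the logs
	}
}

func main() {
	c := newDoHClient(false)
	fmt.Println("client ready, timeout:", c.Timeout)
}
```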

Author
Owner

@EugeneOne1 commented on GitHub (Jun 3, 2021):

@rodeho, the possible fix is implemented in the latest edge build (since 084564e, https://github.com/AdguardTeam/AdGuardHome/commit/084564e6b7dc2785a04acd46fcced20ef501c766). Could you please check if it solves the problem for you?

Author
Owner

@spiralshapeturtle commented on GitHub (Jun 3, 2021):

@rodeho, the possible fix is implemented in the latest edge build (since 084564e). Could you please check if it solves the problem for you?

Cool, let me check. Could you guide me on how to run this edge build in Docker?

Author
Owner

@EugeneOne1 commented on GitHub (Jun 3, 2021):

Running development builds in Docker is described here: https://github.com/AdguardTeam/AdGuardHome/wiki/Docker#-running-dev-builds.
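For a Docker setup that boils down to pulling the development tag instead of the stable one; the edge tag name follows the wiki page above, so verify it there:

```sh
# Pull the development (edge) image and recreate the container from it.
docker pull adguard/adguardhome:edge
```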

Author
Owner

@spiralshapeturtle commented on GitHub (Jun 3, 2021):

Running development builds in Docker is described here: https://github.com/AdguardTeam/AdGuardHome/wiki/Docker#-running-dev-builds.

And up and running: v0.107.0-a.62+084564e6

Author
Owner

@hplee0120 commented on GitHub (Jun 3, 2021):

After testing, it seems that the problem only exists with DoH; it does not occur with DoT or plain-port DNS.
How do I get the repaired version? Wait for the 107.b2 release?

Author
Owner

@EugeneOne1 commented on GitHub (Jun 4, 2021):

@hplee0120, we've got no information about how you installed your AdGuard Home. Could you please provide this info?

Author
Owner

@hplee0120 commented on GitHub (Jun 4, 2021):

@hplee0120, we've got no information about how you installed your AdGuard Home. Could you please provide this info?

I added the ADH LuCI package when compiling OpenWrt, using the binary files from https://github.com/AdguardTeam/AdGuardHome/releases.

Author
Owner

@spiralshapeturtle commented on GitHub (Jun 5, 2021):

Client.Timeout exceeded while awaiting headers means that the TCP connection timed out.

Does the issue go away with time or is it necessary to restart the container?

When this happens, could you please record a tcpdump?
tcpdump dst 9.9.9.11 -w out.pcap

Also, it'd be interesting to see if you have any issues when you use a plain DNS upstream.

<removed for privacy>

I don't think it's fully solved yet. It's very fragile; when the TCP connection drops, it runs into issues. Could the auto-recovery of the TCP session be optimized?
Author
Owner

@spiralshapeturtle commented on GitHub (Jun 6, 2021):

Hi there,

Did some further testing: when the internet line is OK, everything seems to be working well. When I change a FW rule, which I assume hits the current TCP state table of the router (OPNsense), the issue kicks in.

The session or reconnect implementation does not seem able to reconnect properly, and the outage lasts for up to ten minutes; after a while it resolves by itself.

Author
Owner

@spiralshapeturtle commented on GitHub (Jun 7, 2021):

Did some testing with 106.3, the stable one, with plain DNS resolving to Unbound within the same network. It also drops when there is a minimal glitch in internet connectivity.

The interesting part is that I monitor 4 DNS servers:
AdGuard edge
AdGuard stable
Windows
Pi-hole

And only the AdGuard servers have issues; all the other DNS servers are not mentioned once in the logs... in the past week. Always working fine; it has to be something in the AGH IP stack...

<removed for privacy>

Author
Owner

@ainar-g commented on GitHub (Jun 7, 2021):

@rodeho, thank you for the information! We're still investigating the issues. Just as a side note, what is the value of dns.max_goroutines in your configuration file? Does the situation improve if you set it to 0?

The errors you see on the stable version are caused by the fact that edge migrates its configuration to a newer schema version, and the stable releases don't know about the new versions. In your case, you should be able to simply replace schema_version: 11 with schema_version: 10 in your configuration file when you switch to the release version.
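A sketch of the two configuration changes being discussed, using the AdGuardHome.yaml key names from this thread; surrounding keys are omitted:

```yaml
schema_version: 10   # roll back from 11 when returning to a stable release
dns:
  max_goroutines: 0  # 0 appears to lift the limit on concurrent query handlers
```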

Author
Owner

@spiralshapeturtle commented on GitHub (Jun 7, 2021):

@rodeho, thank you for the information! We're still investigating the issues. Just as a side note, what is the value of dns.max_goroutines in your configuration file? Does the situation improve if you set it to 0?

The errors you see on the stable version are caused by the fact that edge migrates its configuration to a newer schema version, and the stable releases don't know about the new versions. In your case, you should be able to simply replace schema_version: 11 with schema_version: 10 in your configuration file when you switch to the release version.

Hi, dns.max_goroutines is not configured on the stable one; let me configure it on the edge. I'm running a load of servers in the homelab, so the DNS load should be higher than usual.

Author
Owner

@spiralshapeturtle commented on GitHub (Jun 7, 2021):

Another log from the stable one, before it stopped serving clients.

07-06 14:15+/- crash


2021/06/07 14:12:41.547581 1#4919 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).udpHandlePacket(): Start handling new UDP packet from 172.16.7.197:53645
2021/06/07 14:12:41.547677 1#4919 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): IN: ;; opcode: QUERY, status: NOERROR, id: 2
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;U3174986.Mf4f26d5dc7f2.sos.atlas.ripe.net.	IN	 AAAA

2021/06/07 14:12:41.547753 1#4919 [debug] IPv6 is disabled. Reply with NoError to u3174986.mf4f26d5dc7f2.sos.atlas.ripe.net. AAAA request
2021/06/07 14:12:41.547811 1#4919 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): OUT: ;; opcode: QUERY, status: NOERROR, id: 2
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;u3174986.mf4f26d5dc7f2.sos.atlas.ripe.net.	IN	 AAAA

;; AUTHORITY SECTION:
u3174986.mf4f26d5dc7f2.sos.atlas.ripe.net.	10	IN	SOA	fake-for-negative-caching.adguard.com. hostmaster.u3174986.mf4f26d5dc7f2.sos.atlas.ripe.net. 100500 1800 60 604800 86400

2021/06/07 14:12:41.884732 1#4920 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).udpHandlePacket(): Start handling new UDP packet from 172.16.1.139:36346
2021/06/07 14:12:41.884806 1#4920 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): IN: ;; opcode: QUERY, status: NOERROR, id: 204
;; flags: rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;ws.meethue.com.	IN	 A

2021/06/07 14:12:41.884881 1#4920 [debug] using settings for client HUE with ip 172.16.1.139 and id ""
2021/06/07 14:12:41.884937 1#4920 [debug] etchostscontainer: answer: ws.meethue.com -> []
2021/06/07 14:12:41.885048 1#4920 [debug] github.com/AdguardTeam/AdGuardHome/internal/dnsfilter.check(): SafeBrowsing: checking ws.meethue.com: 55ae.sb.dns.adguard.com.
2021/06/07 14:12:41.885110 1#4920 [debug] https://dns-family.adguard.com:443/dns-query: sending request TXT 55ae.sb.dns.adguard.com.
2021/06/07 14:16:55.276691 1#11 [debug] free os memory

Author
Owner

@spiralshapeturtle commented on GitHub (Jun 7, 2021):

<removed for privacy>
Author
Owner

@ainar-g commented on GitHub (Jun 7, 2021):

What exactly do you mean when you say that “dns.max_goroutines is not configured”? Did you set it to 0?

Also, the context cancelled error still means that AGH can't reach the remote DNS server. What happens when you run:

curl -v 'https://dns-family.adguard.com:443/dns-query'

From the machine with AGH?

Author
Owner

@spiralshapeturtle commented on GitHub (Jun 7, 2021):

What exactly do you mean when you say that “dns.max_goroutines is not configured”? Did you set it to 0?

Also, the context cancelled error still means that AGH can't reach the remote DNS server. What happens when you run:

curl -v 'https://dns-family.adguard.com:443/dns-query'

From the machine with AGH?

I'm testing two machines:

106.3, call it stable
107 edge, call it edge

The dns.max_goroutines change to 0 on the edge server does improve the situation; first impression is that the issue is less noticeable. What does this command do? What I meant to say: dns.max_goroutines is not available on the stable server?

curl is not available from within the container, so I used wget to take a baseline on the stable server. This is from a normal situation, to establish a baseline and compare the outcome later against the faulty situation.

user@dockerstable:~$ sudo docker exec -it adguard wget https://dns-family.adguard.com:443/dns-query
Connecting to dns-family.adguard.com:443 (94.140.14.15:443)
wget: server returned error: HTTP/1.1 400 Bad Request
user@dockerstable:~$ sudo docker exec -it adguard wget https://dns-family.adguard.com:443/dns-query
Connecting to dns-family.adguard.com:443 (94.140.14.15:443)
wget: server returned error: HTTP/1.1 400 Bad Request
user@dockerstable:~$ sudo docker exec -it adguard wget https://google.com
Connecting to google.com (142.250.179.142:443)
Connecting to www.google.com (142.250.179.196:443)
wget: can't open 'index.html': File exists
user@dockerstable:~$ sudo docker exec -it adguard wget https://github.com
Connecting to github.com (140.82.121.4:443)
wget: can't open 'index.html': File exists
user@dockerstable:~$ sudo docker exec -it adguard wget https://github.com
Connecting to github.com (140.82.121.4:443)
wget: can't open 'index.html': File exists
user@dockerstable:~$ sudo docker exec -it adguard ping dns-family.adguard.com
PING dns-family.adguard.com (94.140.14.15): 56 data bytes
64 bytes from 94.140.14.15: seq=0 ttl=55 time=17.046 ms
64 bytes from 94.140.14.15: seq=1 ttl=55 time=16.167 ms
64 bytes from 94.140.14.15: seq=2 ttl=55 time=15.775 ms
^C
--- dns-family.adguard.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 15.775/16.329/17.046 ms
user@dockerstable:~$ sudo docker exec -it adguard wget https://dns-family.adguard.com:443/dns-query
Connecting to dns-family.adguard.com:443 (94.140.15.16:443)
wget: server returned error: HTTP/1.1 400 Bad Request
user@dockerstable:~$
Author
Owner

@spiralshapeturtle commented on GitHub (Jun 8, 2021):

The edge server v0.107.0-a.65+4700dea9 just went down; all other DNS servers (Pi-hole, Windows, Unbound) stayed functional during this issue.

root@dockerstable:~# sudo docker exec -it adguard2 wget  https://dns-family.adguard.com:443/dns-query
Connecting to dns-family.adguard.com:443 (94.140.14.15:443)
wget: server returned error: HTTP/1.1 400 Bad Request
root@dockerstable:~# sudo docker exec -it adguard2 wget  https://dns-family.adguard.com:443/dns-query
Connecting to dns-family.adguard.com:443 (94.140.15.16:443)
wget: server returned error: HTTP/1.1 400 Bad Request
root@dockerstable:~# sudo docker exec -it adguard2 wget github.com
Connecting to github.com (140.82.121.4:80)
Connecting to github.com (140.82.121.4:443)
wget: can't open 'index.html': File exists
root@dockerstable:~#


;; QUESTION SECTION:
;gateway.fe.apple-dns.net.	IN	 A

2021/06/08 09:11:01.138654 1#44969 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).udpHandlePacket(): error handling DNS (udp) request: talking to dnsUpstream failed, cause: all upstreams failed to respond, cause: couldn't do a GET request to 'https://149.112.112.11:443/dns-query', cause: Get "https://149.112.112.11:443/dns-query?dns=L3cBAAABAAAAAAABB2dhdGV3YXkCZmUJYXBwbGUtZG5zA25ldAAAQQABAAApCAAAAIAAAAA": context deadline exceeded (hidden: couldn't do a GET request to 'https://9.9.9.11:443/dns-query', cause: Get "https://9.9.9.11:443/dns-query?dns=L3cBAAABAAAAAAABB2dhdGV3YXkCZmUJYXBwbGUtZG5zA25ldAAAQQABAAApCAAAAIAAAAA": net/http: request canceled (Client.Timeout exceeded while awaiting headers), couldn't do a GET request to 'https://146.112.41.2:443/dns-query', cause: Get "https://146.112.41.2:443/dns-query?dns=L3cBAAABAAAAAAABB2dhdGV3YXkCZmUJYXBwbGUtZG5zA25ldAAAQQABAAApCAAAAIAAAAA": context deadline exceeded)
2021/06/08 09:11:01.138681 1#44970 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).udpHandlePacket(): error handling DNS (udp) request: talking to dnsUpstream failed, cause: all upstreams failed to respond, cause: couldn't do a GET request to 'https://9.9.9.11:443/dns-query', cause: Get "https://9.9.9.11:443/dns-query?dns=4JUBAAABAAAAAAABB2dhdGV3YXkCZmUJYXBwbGUtZG5zA25ldAAAAQABAAApCAAAAIAAAAA": net/http: request canceled (Client.Timeout exceeded while awaiting headers) (hidden: couldn't do a GET request to 'https://146.112.41.2:443/dns-query', cause: Get "https://146.112.41.2:443/dns-query?dns=4JUBAAABAAAAAAABB2dhdGV3YXkCZmUJYXBwbGUtZG5zA25ldAAAAQABAAApCAAAAIAAAAA": net/http: request canceled (Client.Timeout exceeded while awaiting headers), couldn't do a GET request to 'https://149.112.112.11:443/dns-query', cause: Get "https://149.112.112.11:443/dns-query?dns=4JUBAAABAAAAAAABB2dhdGV3YXkCZmUJYXBwbGUtZG5zA25ldAAAAQABAAApCAAAAIAAAAA": context deadline exceeded)
2021/06/08 09:11:01.804985 1#45000 [debug] https://9.9.9.11:443/dns-query: response: couldn't do a GET request to 'https://9.9.9.11:443/dns-query', cause: Get "https://9.9.9.11:443/dns-query?dns=hmABAAABAAAAAAABB2dhdGV3YXkCZmUJYXBwbGUtZG5zA25ldAAAQQABAAApCAAAAIAAAAA": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2021/06/08 09:11:01.805072 1#44997 [debug] https://149.112.112.11:443/dns-query: response: couldn't do a GET request to 'https://149.112.112.11:443/dns-query', cause: Get "https://149.112.112.11:443/dns-query?dns=iN8BAAABAAAAAAABB2dhdGV3YXkCZmUJYXBwbGUtZG5zA25ldAAAAQABAAApCAAAAIAAAAA": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2021/06/08 09:11:01.805120 1#44999 [debug] https://146.112.41.2:443/dns-query: response: couldn't do a GET request to 'https://146.112.41.2:443/dns-query', cause: Get "https://146.112.41.2:443/dns-query?dns=hmABAAABAAAAAAABB2dhdGV3YXkCZmUJYXBwbGUtZG5zA25ldAAAQQABAAApCAAAAIAAAAA": context deadline exceeded
2021/06/08 09:11:01.804998 1#44995 [debug] https://146.112.41.2:443/dns-query: response: couldn't do a GET request to 'https://146.112.41.2:443/dns-query', cause: Get "https://146.112.41.2:443/dns-query?dns=iN8BAAABAAAAAAABB2dhdGV3YXkCZmUJYXBwbGUtZG5zA25ldAAAAQABAAApCAAAAIAAAAA": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2021/06/08 09:11:01.805080 1#44996 [debug] https://9.9.9.11:443/dns-query: response: couldn't do a GET request to 'https://9.9.9.11:443/dns-query', cause: Get "https://9.9.9.11:443/dns-query?dns=iN8BAAABAAAAAAABB2dhdGV3YXkCZmUJYXBwbGUtZG5zA25ldAAAAQABAAApCAAAAIAAAAA": context deadline exceeded
2021/06/08 09:11:01.805136 1#45001 [debug] https://149.112.112.11:443/dns-query: response: couldn't do a GET request to 'https://149.112.112.11:443/dns-query', cause: Get "https://149.112.112.11:443/dns-query?dns=hmABAAABAAAAAAABB2dhdGV3YXkCZmUJYXBwbGUtZG5zA25ldAAAQQABAAApCAAAAIAAAAA": context deadline exceeded
2021/06/08 09:11:01.805197 1#44994 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).Resolve(): RTT: 10001 ms
2021/06/08 09:11:01.805224 1#44993 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).Resolve(): RTT: 10001 ms
2021/06/08 09:11:01.805240 1#44994 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): OUT: ;; opcode: QUERY, status: SERVFAIL, id: 35039
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;gateway.fe.apple-dns.net.	IN	 A

2021/06/08 09:11:01.805258 1#44993 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).logDNSMessage(): OUT: ;; opcode: QUERY, status: SERVFAIL, id: 34400
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;gateway.fe.apple-dns.net.	IN	 HTTPS

<removed for privacy>
Author
Owner

@ameshkov commented on GitHub (Jun 8, 2021):

all the other DNS servers are not mentioned once in the logs

Are you using encrypted upstreams on these servers or are they using plain DNS upstreams?

I don't think it's fully solved yet. It's very fragile; when the TCP connection drops, it runs into issues. Could the auto-recovery of the TCP session be optimized?

Did you save that tcpdump? I'd like to take a look at it. Regarding TCP session recovery, no, it cannot be optimized or changed in any way.

To be honest, I doubt we're dealing with an AGH bug here.

There are three options:

  1. An issue with OPNsense's TCP stack.
  2. A Golang TCP issue (not likely).
  3. A Quad9 issue, since they have rather aggressive rate limits. I find this the most plausible explanation, but we're yet to see.

To eliminate these one by one:

  1. Replace the Quad9 upstreams. Try using any other upstream: Google, Cloudflare, or AdGuard Non-Filtering (see the example list below).
  2. If the issue continues to happen, please record the tcpdump; I'd like to see what's going on there. Ideally, I'll need AdGuard Home logs alongside this tcpdump (or at least the part of the log for the time period when the issue happened).
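As an illustration of step 1, a hypothetical replacement upstream list; these are the public DoH endpoints of those providers as of the time of writing, so verify them before use:

```
https://dns.google/dns-query
https://cloudflare-dns.com/dns-query
https://dns-unfiltered.adguard.com/dns-query
[/my.domain/]10.0.0.1
```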
Author
Owner

@spiralshapeturtle commented on GitHub (Jun 8, 2021):

Hi @ameshkov

To be honest, I doubt we're dealing with an AGH bug here.

There're three options:

  1. An issue with OPNsense's TCP stack.
  2. A Golang TCP issue (not likely).
  3. A Quad9 issue, since they have rather aggressive rate limits. I find this the most plausible explanation, but we're yet to see.

I greatly appreciate your cooperation. It's a complex setup: I have 5 DNS servers running in my home lab. Of these 5 servers, 2 are AdGuard, 1 is Pi-hole, 1 is the router, and 1 is Windows DNS. I monitor this entire environment with PRTG and get alerts if one of these DNS servers no longer responds to requests.

Of these 5 DNS servers, only 2 raise alarms: the two AGH servers. That makes it extra remarkable to me. AGH and Pi-hole both run as Intel NUC -> ESXi hypervisor -> Ubuntu VM -> Docker container (Pi-hole/AdGuard).

Everything is identical in terms of the underlying network; all containers are connected with macvlan in Docker, each using its own real IP on the host's network.

#1: That could be; I can force the issue to reproduce by sending bulk traffic up the ISP uplink. It still confuses me why 3 out of 5 (internal) DNS servers remain functional and only AGH stops serving clients.

#2: Based on the network setup above, I won't exclude the Go IP stack. I had the same issue with AdGuard 106.3 with UDP resolving to the internal router's Unbound on UDP 53; it also ended in AGH no longer processing DNS requests.

#3: Point taken, but please note that OpenDNS was also in my DoH upstream list. That points me to #2, Go IP-stack issues.

To eliminate these one by one:

  1. Replace the Quad9 upstreams. Try using any other upstream: Google, Cloudflare, or AdGuard Non-Filtering.
  2. If the issue continues to happen, please record the tcpdump; I'd like to see what's going on there. Ideally, I'll need AdGuard Home logs alongside this tcpdump (or at least the part of the log for the time period when the issue happened).

I will tighten the monitoring: it polls every 30 seconds and I receive a push alert after 3 failures; I'll try to reduce that so I have more time to start collecting logs.

Author
Owner

@ameshkov commented on GitHub (Jun 8, 2021):

It still confuses me why 3 out of 5 (internal) DNS servers remain functional

If you were using the same upstreams with the same protocol on all 5 that would be confusing indeed, but that's not the case.

I had the same issue with AdGuard 106.3 with UDP resolving to the internal router's Unbound on UDP 53; it also ended in AGH no longer processing DNS requests.

Note that if you have "browsing security" enabled, AGH internally uses the AdGuard DNS DoH upstream. If you want to do a clean test with UDP, it'd be better to disable both parental control and browsing security.
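A hedged AdGuardHome.yaml fragment for such a clean test; the key names follow the v0.106-era schema, so verify against your own file:

```yaml
dns:
  safebrowsing_enabled: false  # "browsing security"
  parental_enabled: false      # "parental control"
```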

Author
Owner

@spiralshapeturtle commented on GitHub (Jun 8, 2021):

If you were using the same upstreams with the same protocol on all 5 that would be confusing indeed, but that's not the case.

OK, point taken; all DNS servers are now pointing to Unbound (the router).

Note that if you have "browsing security" enabled, AGH internally uses the AdGuard DNS DoH upstream. If you want to do a clean test with UDP, it'd be better to disable both parental control and browsing security.

Yeah, I saw that in the logs. Let's first monitor all servers with the same uplink, Unbound only. If the issue still persists, the next step is to disable the parental control/browsing security features so that no DoH is used.


@hplee0120 commented on GitHub (Jun 8, 2021):

My test results show that this condition occurs only with DoH upstream servers; there is no problem with DoT or plain DNS on the normal port.


@spiralshapeturtle commented on GitHub (Jun 12, 2021):

Okay, time for an update. You might know that Docker supports macvlan, a way to give a container a real IP on the LAN; super easy for DNS servers ;-). To allow this nested-MAC design you need to enable promiscuous mode on ESXi, and then all packets appear to be replicated within the same port group (virtual switch port) when you sniff on the host. The NFS and iSCSI bulk sessions live on that same switch port, so a lot of traffic flows past the container's NIC.

My first thoughts are still the same: the Go network stack is not as stable as the one used by other DNS servers such as Pi-hole. I moved the Docker containers to a new VLAN, which makes the network cleaner. Let's monitor again.
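For reference, a macvlan setup like the one described here is typically created along these lines; a minimal sketch assuming the 172.16.1.0/24 LAN from this thread, with `eth0` as a placeholder for the host's parent interface and `.1` as an assumed gateway:

```
# Sketch of a Docker macvlan network; subnet is from this thread's
# setup, eth0 and the gateway are placeholders for the actual host.
docker network create -d macvlan \
  --subnet=172.16.1.0/24 \
  --gateway=172.16.1.1 \
  -o parent=eth0 dockervlan

# A container attached to it gets its own "real" IP on the LAN:
docker run -d --name adguard --network dockervlan --ip 172.16.1.253 \
  adguard/adguardhome
```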


@ameshkov commented on GitHub (Jun 13, 2021):

@hplee0120 have you tried the edge channel build?

@rodeho
Neither Go nor dnsmasq (Pi-hole) uses its own TCP/IP stack; they all rely on OS sockets. The difference that I see here is that in one case you're using plain DNS over UDP: no connection state, so it's simple to handle any networking issue just by re-sending the same query. At the same time, you're comparing it to DNS-over-HTTPS, which is quite different and more vulnerable to any network-related issues.

I still think that we should first make sure there are no issues when you use a plain DNS-over-UDP upstream in AGH. Once that's confirmed, the next thing to check would be a DoT or DNS-over-TCP upstream, just to see if there's any difference.

Note that in both cases you need to disable "browsing security" and "parental control" in AGH since they rely on using DOH internally.
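For reference, these two test steps map onto AdGuardHome.yaml roughly as in the fragment below. This is an illustrative sketch, not taken from the thread; the key names match 0.106-era configs and may differ in other versions, so check your own config file:

```
dns:
  upstream_dns:
    - 94.140.14.140                      # step 1: plain DNS over UDP (AdGuard non-filtering)
    # step 2, if the issue persists, try DoT or DNS-over-TCP instead:
    # - tls://dns-unfiltered.adguard.com
    # - tcp://94.140.14.140
  # disable the features that use DoH internally, for a clean test:
  safebrowsing_enabled: false
  parental_enabled: false
```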


@spiralshapeturtle commented on GitHub (Jun 14, 2021):

@ameshkov

What I did was the following: I moved the Docker guest (Ubuntu) to the Docker VLAN, which keeps the network quieter. Normal network behaviour is to listen only for frames addressed to your own MAC or to broadcasts, right? The monitoring tool alerted me that both servers went down again, at the same time. They were still running with the original configuration, because I wanted to capture a full outage with logs and Wireshark. I have a capture set; how could I share it with you in a privacy-safe way?

adguard1 recovered (stable build)
adguard2 still in crashed state (edge build)

I fired off some manual CLI commands; let me show you some logs:

```
2021/06/14 16:13:01.006225 1#48751 [debug] using settings for client PRTG with ip 172.16.1.8 and id ""
2021/06/14 16:13:01.006303 1#48751 [debug] etchostscontainer: answer: <cut> -> []
2021/06/14 16:13:01.006394 1#48751 [debug] github.com/AdguardTeam/AdGuardHome/internal/filtering.check(): SafeBrowsing: checking cut: 106e.1a06.sb.dns.adguard.com.
2021/06/14 16:13:01.006463 1#48751 [debug] https://dns-family.adguard.com:443/dns-query: sending request TXT 106e.1a06.sb.dns.adguard.com.
2021/06/14 16:13:01.006645 1#48785 [debug] github.com/AdguardTeam/dnsproxy/upstream.(*bootstrapper).createDialContext.func1(): Dialing to 94.140.14.15:443
2021/06/14 16:13:01.026505 1#48785 [debug] github.com/AdguardTeam/dnsproxy/upstream.(*bootstrapper).createDialContext.func1(): dialer has successfully initialized connection to 94.140.14.15:443 in 19 milliseconds
2021/06/14 16:13:01.027677 1#48751 [debug] https://dns-family.adguard.com:443/dns-query: response: couldn't do a GET request to 'https://dns-family.adguard.com:443/dns-query', cause: Get "https://dns-family.adguard.com:443/dns-query?dns=Li0BAAABAAAAAAAABDEwNmUEMWEwNgJzYgNkbnMHYWRndWFyZANjb20AABAAAQ": EOF
2021/06/14 16:13:01.027733 1#48751 [debug] github.com/AdguardTeam/AdGuardHome/internal/filtering.(*DNSFilter).checkSafeBrowsing(): SafeBrowsing lookup for cut; Elapsed time: 21ms
2021/06/14 16:13:01.027824 1#48751 [debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).udpHandlePacket(): error handling DNS (udp) request: talking to dnsUpstream failed, cause: filtering failed to check host "cut": safe browsing: couldn't do a GET request to 'https://dns-family.adguard.com:443/dns-query', cause: Get "https://dns-family.adguard.com:443/dns-query?dns=Li0BAAABAAAAAAAABDEwNmUEMWEwNgJzYgNkbnMHYWRndWFyZANjb20AABAAAQ": EOF
```

```
root@dockerstable:~# docker exec -it adguard2 wget https://dns-family.adguard.com
Connecting to dns-family.adguard.com (94.140.15.16:443)
ssl_client: dns-family.adguard.com: handshake failed: No error information
wget: error getting response: Connection reset by peer
root@dockerstable:~#
```

![PRTG sensor graph of the instance that is still down](https://user-images.githubusercontent.com/16855935/121907664-8f380380-cd2c-11eb-83a3-3556e97d76ae.png)

The recovered one:
![PRTG sensor graph of the recovered instance](https://user-images.githubusercontent.com/16855935/121907810-b393e000-cd2c-11eb-849d-94d56e22870b.png)

You are completely right about the next test with all the DoH options disabled. The one thing I keep asking myself: what is the benefit of testing further? This seems to be an AdGuard code issue; AGH is unable to call the DoH servers, but from the CLI I am able to?
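As a side note, a DoH endpoint can also be probed directly from the command line, which separates AGH from the transport. A sketch using kdig from knot-dnsutils (assuming Knot DNS 3.x or newer, which added the +https flag):

```
# Query the DoH endpoint directly, bypassing AGH entirely
# (kdig is part of knot-dnsutils; +https needs Knot DNS 3.x+).
kdig @dns-family.adguard.com +https example.org A
```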


@spiralshapeturtle commented on GitHub (Jun 14, 2021):

Next test implemented: all options off. The remaining DoH upstream providers in use: (removed because of privacy)


@ameshkov commented on GitHub (Jun 15, 2021):

> They were still running with the original configuration, because I wanted to capture a full outage with logs and Wireshark. I have a capture set; how could I share it with you in a privacy-safe way?

You could send them directly to me at am@adguard.com.

> This seems to be an AdGuard code issue; AGH is unable to call the DoH servers, but from the CLI I am able to?

@rodeho I see you're getting `wget: error getting response: Connection reset by peer` on dockerstable, so there was still a networking issue.

> The one thing I keep asking myself: what is the benefit of testing further? This seems to be an AdGuard code issue

The benefit is that we'll be able to narrow this down. I don't yet see what's causing this issue; all of my guesses from above are still possible.


@spiralshapeturtle commented on GitHub (Jun 16, 2021):

> > This seems to be an AdGuard code issue; AGH is unable to call the DoH servers, but from the CLI I am able to?
>
> @rodeho I see you're getting `wget: error getting response: Connection reset by peer` on dockerstable, so there was still a networking issue.

No, no network issue: "connection reset by peer" comes from the other side. There is a connection, but TLS seems to be failing. Could that lead to the root cause? I'll show you some other containers in the same network segment and their output.


```
root@dockerstable:~$ sudo docker exec -it pihole wget https://dns-family.adguard.com
--2021-06-16 21:23:00--  https://dns-family.adguard.com/
Resolving dns-family.adguard.com (dns-family.adguard.com)... 94.140.14.15, 94.140.15.16
Connecting to dns-family.adguard.com (dns-family.adguard.com)|94.140.14.15|:443... connected.
GnuTLS: The TLS connection was non-properly terminated.
Unable to establish SSL connection.

root@dockerstable:~$ sudo docker exec -it adguard wget https://dns-family.adguard.com
Connecting to dns-family.adguard.com (94.140.14.15:443)
ssl_client: dns-family.adguard.com: handshake failed: No error information
wget: error getting response: Connection reset by peer

root@dockerstable:~$ sudo docker exec -it adguard2 wget https://dns-family.adguard.com
Connecting to dns-family.adguard.com (94.140.14.15:443)
ssl_client: dns-family.adguard.com: handshake failed: No error information
wget: error getting response: Connection reset by peer

root@dockerstable:~$ sudo docker exec -it smokeping wget https://dns-family.adguard.com
Connecting to dns-family.adguard.com (94.140.14.15:443)
Connecting to adguard.com (172.67.3.157:443)
saving to 'index.html'
index.html           100% |*******************************| 423k  0:00:00 ETA
'index.html' saved
```
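The busybox wget in the AGH image only reports "handshake failed"; to see where the handshake actually breaks, a standard OpenSSL probe could be run from the same host (not part of the thread's original tests):

```
# Show the TLS handshake step by step: either the certificate chain
# is printed, or the exact point where the peer resets the connection.
openssl s_client -connect 94.140.14.15:443 \
  -servername dns-family.adguard.com </dev/null
```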

@ameshkov commented on GitHub (Jun 21, 2021):

Renaming the issue and merging #3270 here.

I tend to think the problem is specific to DoH and may be caused by a bug in Go's HTTP/2 implementation. The problem is that we still cannot reproduce it.

We can probably add a workaround (re-create the HTTP client on every timeout error, for instance; see the sketch below), but I'd like to figure out what's really causing this first.
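For illustration only, such a workaround could look roughly like the sketch below. This is hypothetical code, not AdGuard Home's actual implementation, and every name in it is made up; the idea is just to rebuild the http.Client (and thus its HTTP/2 connection pool) after a timeout, so one wedged connection can't block every future query.

```
// Hypothetical sketch, not AdGuard Home's actual code: rebuild the
// HTTP client after timeout errors so a stuck HTTP/2 connection
// can't wedge all subsequent DoH queries.
package dohclient

import (
	"errors"
	"net"
	"net/http"
	"sync"
	"time"
)

// newHTTPClient builds a fresh client; a fresh Transport means a fresh
// connection pool, dropping any connection left half-dead by a network
// change.
func newHTTPClient() *http.Client {
	return &http.Client{
		Timeout:   10 * time.Second,
		Transport: &http.Transport{ForceAttemptHTTP2: true},
	}
}

// resettingClient wraps an *http.Client and replaces it whenever a
// request fails with a timeout error.
type resettingClient struct {
	mu sync.Mutex
	c  *http.Client
}

func newResettingClient() *resettingClient {
	return &resettingClient{c: newHTTPClient()}
}

func (r *resettingClient) Do(req *http.Request) (*http.Response, error) {
	r.mu.Lock()
	c := r.c
	r.mu.Unlock()

	resp, err := c.Do(req)
	var nerr net.Error
	if err != nil && errors.As(err, &nerr) && nerr.Timeout() {
		r.mu.Lock()
		if r.c == c { // a concurrent caller may already have reset it
			c.CloseIdleConnections()
			r.c = newHTTPClient()
		}
		r.mu.Unlock()
	}
	return resp, err
}
```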


@spiralshapeturtle commented on GitHub (Jun 21, 2021):

@ameshkov thanks. Is there any way we can validate whether this is specific to the AdGuard cloud DoH servers, or common to DoH upstream servers in general?

When trying to simulate it in your lab, try adding some packet loss. It seems to happen more frequently when the upload link is heavily loaded.
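For what it's worth, that kind of lossy, saturated uplink can be approximated on a Linux lab box with tc/netem (`eth0` is a placeholder for the egress interface):

```
# Add 5% loss and 50 ms delay on the egress interface to mimic a
# saturated uplink, run the DoH test, then remove the qdisc again.
tc qdisc add dev eth0 root netem loss 5% delay 50ms
# ... reproduce the DoH failure ...
tc qdisc del dev eth0 root
```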


@ameshkov commented on GitHub (Jun 21, 2021):

@rodeho IMO, it does not seem to be specific to particular DoH servers, but rather to Go's HTTP client in general (which we use internally). If we're not able to reproduce it internally, we'll push another build to the edge channel that should, in theory, solve this by re-creating the HTTP client on timeout errors.


@fizsef commented on GitHub (Jun 21, 2021):

I have observed this problem twice, but when I specifically tried to reproduce it in order to post the log here, there were no problems and everything worked fine, although the settings were the same. I use the DoH server https://family.cloudflare-dns.com/dns-query.


@spiralshapeturtle commented on GitHub (Jun 21, 2021):

> @rodeho IMO, it does not seem to be specific to particular DoH servers, but rather to Go's HTTP client in general (which we use internally). If we're not able to reproduce it internally, we'll push another build to the edge channel that should, in theory, solve this by re-creating the HTTP client on timeout errors.

Just to inform you: I moved to a local Unbound forwarder, so 95% of my network now uses plain UDP on port 53. Only some clients (5%) were specifically configured with the extra parental-control checking, and they failed an hour ago. Analyzing the logs, only the DoH checks failed, which also means that the rest of the IP stack remained functional, at least for UDP packets.

`request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)`


@ainar-g commented on GitHub (Jun 22, 2021):

@rodeho, @hplee0120, @fizsef, we've pushed some fixes that may improve the situation to the edge channel in revision 90a85362. Could any of you check if the situation is better now? Thanks!


@hplee0120 commented on GitHub (Jun 25, 2021):

> @rodeho, @hplee0120, @fizsef, we've pushed some fixes that may improve the situation to the edge channel in revision 90a8536. Could any of you check if the situation is better now? Thanks!

From my personal use: after downloading the latest edge version, the problem seems to have been solved!


@spiralshapeturtle commented on GitHub (Jun 26, 2021):

> @rodeho, @hplee0120, @fizsef, we've pushed some fixes that may improve the situation to the edge channel in revision 90a8536. Could any of you check if the situation is better now? Thanks!

I upgraded to this edge build within my test setup; my primary DNS moved to Pi-hole during this debugging period, since AGH was causing too many outages.


@fizsef commented on GitHub (Jun 27, 2021):

And how do I install this corrected version? Where do I download it?


@ainar-g commented on GitHub (Jun 28, 2021):

@fizsef, there are instructions [here](https://github.com/AdguardTeam/AdGuardHome#test-unstable-versions) in our readme.


@spiralshapeturtle commented on GitHub (Jun 30, 2021):

> @rodeho, @hplee0120, @fizsef, we've pushed some fixes that may improve the situation to the edge channel in revision 90a8536. Could any of you check if the situation is better now? Thanks!

Last night my stable build went down, and the edge build remained up. Seems to be fixed! Thanks!
