Support edns-tcp-keepalive EDNS0 Option - RFC7828 #3146

Closed
opened 2026-03-04 03:04:37 -05:00 by deekerman · 16 comments
Owner

Originally created by @rskallies on GitHub (Oct 28, 2021).

Originally assigned to: @ainar-g on GitHub.

  • I am running the latest edge version
  • I checked the documentation and found no answer
  • I checked to make sure that this issue has not already been filed

Problem Description

Using the latest Stubby / getdns as a DoT client is not possible unless Stubby runs with the option idle_timeout: 0, because
the AdGuard Home server does not seem to parse the edns-tcp-keepalive EDNS0 option. Instead, the server logs:
[info] error handling TCP packet: dns: buffer size too small

Proposed Solution

The project already uses the https://github.com/miekg/dns/ Go library, which supports this feature.
It would be nice either to support the option as well or, at least, to emit a proper error message when a client requests it and the server cannot handle it.

It took me a while to figure this out and to get a Stubby client working.

deekerman 2026-03-04 03:04:37 -05:00
Author
Owner

@Harvester57 commented on GitHub (Dec 2, 2021):

It took me a long time to find this post and the solution to my problem: using DoT on an Asus router with Merlin firmware, I would always hit the same problem you described, whereas pointing to NextDNS works flawlessly.

After implementing your solution, I was able to restore the connection to my AdGuard server. So first, thank you for your post, and second, I concur with your proposal to add support for edns-tcp-keepalive!


@gspannu commented on GitHub (Dec 10, 2021):

> Using the latest Stubby / getdns as a DoT client is not possible unless Stubby runs with the option idle_timeout: 0, because the AdGuard Home server does not seem to parse the edns-tcp-keepalive EDNS0 option. Instead, the server logs: [info] error handling TCP packet: dns: buffer size too small
>
> Proposed Solution
>
> The project already uses the https://github.com/miekg/dns/ Go library, which supports this feature. It would be nice either to support the option as well or, at least, to emit a proper error message when a client requests it and the server cannot handle it.
>
> It took me a while to figure this out and to get a Stubby client working.

> It took me a long time to find this post and the solution to my problem: using DoT on an Asus router with Merlin firmware, I would always hit the same problem you described, whereas pointing to NextDNS works flawlessly.
>
> After implementing your solution, I was able to restore the connection to my AdGuard server. So first, thank you for your post, and second, I concur with your proposal to add support for edns-tcp-keepalive!

@rskallies @Harvester57

I am running AdGuard Home on a VPS and an Asus RTAX88U running Merlin firmware 386.3.2.

I am unable to get AdGuard Home working on the Asus router when using the Beta/Edge version (while NextDNS works without any issue).

1) Could you please elaborate on how you fixed this issue? What changes did you make on the Asus router? Which files / scripts need to be edited or added?

2) In addition, the only way I can get the Asus router to connect to AdGuard Home is to put the full address into the DNS-Privacy settings, as in the attached screenshot.
[Image: Screenshot 2021-12-11 at 03 59 10 am, https://user-images.githubusercontent.com/137664/145663014-02a3e413-a766-482b-b3a5-973a4da4ee47.png]

Do you have the same issue?


@Harvester57 commented on GitHub (Dec 11, 2021):

Hi @gspannu,

I created the post-conf script for Stubby (/jffs/scripts/stubby.postconf) with the following content:

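```
#!/bin/sh
# Merlin passes the path of the generated Stubby config as the first argument.
CONFIG=$1
# helper.sh provides the pc_replace function used below.
source /usr/sbin/helper.sh

# Optional: only relevant if you intend to use ECS.
pc_replace "edns_client_subnet_private: 1" "edns_client_subnet_private: 0" "$CONFIG"
# The actual workaround: set idle_timeout to 0.
pc_replace "idle_timeout: 9000" "idle_timeout: 0" "$CONFIG"
```

You don't need the `edns_client_subnet_private: 1` line if you don't intend to use ECS. `idle_timeout: 0` is the only parameter you need to change.

Do not forget to make your script executable with `chmod +x /jffs/scripts/stubby.postconf`; then you can restart the Stubby service.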

P.S. : I assume you already have a USB key and external JFFS scripts support enabled, and that you can connect to your router through SSH
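
For readers whose system does not ship Merlin's `pc_replace` helper, the same change can be sketched with plain `sed`. This is only a minimal sketch under assumptions: the function name `set_idle_timeout` is hypothetical, and the config file is assumed to contain a single `idle_timeout: <number>` line like the one the script above replaces.

```shell
#!/bin/sh
# Hypothetical helper (not part of Stubby or Merlin): rewrite the numeric
# idle_timeout value in a Stubby config file.
# Usage: set_idle_timeout <config-file> <milliseconds>
set_idle_timeout() {
    config=$1
    value=$2
    tmp="${config}.tmp.$$"
    # Replace the number after "idle_timeout:", keeping any leading indentation.
    # Write to a temporary file first so a failed sed never truncates the config.
    sed "s/^\([[:space:]]*idle_timeout:\)[[:space:]]*[0-9][0-9]*/\1 ${value}/" "$config" > "$tmp" &&
        mv "$tmp" "$config"
}
```

Calling `set_idle_timeout /path/to/stubby.yml 0` (the path is a placeholder) would apply the workaround; the Stubby service still has to be restarted afterwards.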


@gspannu commented on GitHub (Dec 13, 2021):

> Hi @gspannu,
> /* snip
> */

Thanks for your response.

Another quick query, if you could help me.

  • I have hosted my AdGuard Home on a VPS and
  • am using my own self-signed certificates for encryption settings.

AdGuard complains about chain of trust when adding the certificate, but if I copy my self-signed/ self-generated RootCA to /etc/ssl/certs path, then the self signed certificate is accepted by AdGuardHome.

To test that my self signed cert is actually working on DoT/ DoH...

  • I used the fabulous tool dnslookup (by Ameshkov, https://github.com/ameshkov/dnslookup) from another machine and executed some DoT/DoH queries.
  • These queries fail because the client machine does not have the RootCA.
  • Again, copying the RootCA to the client machine's /etc/ssl/certs folder works,
  • and all queries now work and are received by AdGuard Home as encrypted.
  • Tested for both DoT & DoH queries: all good so far from another machine to my AdGuard Home VPS.

Q: How do I copy this RootCA to Asus Router?
If I try and scp the file to /etc/certs... it fails with read-only error.


@Harvester57 commented on GitHub (Dec 13, 2021):

You can copy your cert to your /jffs partition (for example /jffs/mycert/mycert.pem) and bind-mount it into the /etc/ssl/certs directory:

```
mount -o bind /jffs/mycert/mycert.pem /etc/ssl/certs/mycert.pem
```

You should now see your cert in /etc/ssl/certs.

You can automatically do this during the router boot phase, by editing the file /jffs/scripts/services-start, and by adding the previous line in it.
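
Adding the line to the boot script can itself be scripted. The following is a minimal sketch, not something taken from the Merlin firmware: the helper name `ensure_line` is hypothetical, and it only appends the line when it is not already present, so it can be run repeatedly without duplicating the mount command.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a startup script only if it is not
# already there, creating the script with a shebang first if it is missing.
# Usage: ensure_line <script-file> <line>
ensure_line() {
    script=$1
    line=$2
    [ -f "$script" ] || printf '#!/bin/sh\n' > "$script"
    # -F: fixed string, -x: match the whole line, -q: no output.
    grep -qxF -- "$line" "$script" || printf '%s\n' "$line" >> "$script"
    # User scripts must be executable to be run at boot.
    chmod +x "$script"
}
```

On the router this could be called as `ensure_line /jffs/scripts/services-start "mount -o bind /jffs/mycert/mycert.pem /etc/ssl/certs/mycert.pem"`, using the paths from the example above.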


@gspannu commented on GitHub (Dec 13, 2021):

> You can copy your cert to your /jffs partition (for example /jffs/mycert/mycert.pem) and bind-mount it into the /etc/ssl/certs directory:
>
> `mount -o bind /jffs/mycert/mycert.pem /etc/ssl/certs/mycert.pem`
>
> You should now see your cert in /etc/ssl/certs.
>
> You can automatically do this during the router boot phase, by editing the file /jffs/scripts/services-start, and by adding the previous line in it.

You are a star.... Thanks.


@ameshkov commented on GitHub (Dec 14, 2021):

> The project already uses the https://github.com/miekg/dns/ Go library, which supports this feature.

Not really: https://github.com/miekg/dns/pull/1317

@ainar-g if the PR has not been merged by the time we release AGH, I suggest adding a replace directive to go.mod.


@rskallies commented on GitHub (Dec 14, 2021):

> > The project already uses the https://github.com/miekg/dns/ Go library, which supports this feature.
>
> Not really: miekg/dns#1317

@ameshkov Thank you for digging even deeper and for creating an upstream PR. Spotting and fixing this exceeded my skills. 😄

> @ainar-g if the PR has not been merged by the time we release AGH, I suggest adding a replace directive to go.mod.

👍


@ainar-g commented on GitHub (Dec 15, 2021):

@rskallies, the latest edge build includes Andrey's version of the fix. Can you check if that fixes your issue?


@rskallies commented on GitHub (Dec 15, 2021):

> @rskallies, the latest edge build includes Andrey's version of the fix. Can you check if that fixes your issue?

Yes it does. 👍
Stubby connects successfully when idle_timeout is set back to the (default) value of 10000.

```
[13:30:58.740760] STUBBY: 95.xxx.xxx.xxx  : Conn opened: TLS - Strict Profile
[13:30:58.947467] STUBBY: 95.xxx.xxx.xxx  : Verify passed : TLS
[13:31:09.151213] STUBBY: 95.xxx.xxx.xxx  : Conn closed: TLS - Resps=     1, Timeouts  =     0, Curr_auth =Success, Keepalive(ms)= 10000
[13:31:09.151375] STUBBY: 95.xxx.xxx.xxx  : Upstream   : TLS - Resps=     1, Timeouts  =     0, Best_auth =Success
[13:31:09.151451] STUBBY: 95.xxx.xxx.xxx  : Upstream   : TLS - Conns=     1, Conn_fails=     0, Conn_shuts=      0, Backoffs     =     0
```
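
To check the same thing on another setup, the Keepalive(ms) field can be pulled out of Stubby's "Conn closed" lines. This is only a sketch that assumes the log format shown in this comment; the function name `stubby_keepalive_ms` is hypothetical and not part of Stubby.

```shell
#!/bin/sh
# Hypothetical helper: read Stubby log lines on standard input and print the
# number that follows "Keepalive(ms)=", one value per matching line.
# Lines without that field produce no output.
stubby_keepalive_ms() {
    # In a basic regular expression, "(" and ")" match literally.
    sed -n 's/.*Keepalive(ms)=[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}
```

Piping the "Conn closed" line from the log above through `stubby_keepalive_ms` prints `10000`.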

I still wonder why dnsproxy does not connect via QUIC. I would prefer to use AdGuard dnsproxy instead of Stubby on this MIPS-based router. I'll create another issue with more information later.


@ainar-g commented on GitHub (Dec 15, 2021):

Thanks for testing! I'll close this issue then. I've left a TODO in the code to switch back to the original library once the PR is merged there.


@ameshkov commented on GitHub (Dec 15, 2021):

> Still wonder why dnsproxy does not connect via QUIC

This one is strange. Do you specify the port number when running dnsproxy?


@rskallies commented on GitHub (Dec 15, 2021):

Yes, `-u quic://fully.qualified.domain:784`, since I have been using dnsproxy for a long time on x86_64 and arm64 devices, and it defaulted to port 784 back then. I have already tested various different ports in case some middleware is blocking something, but still no success. Using the same config / arguments for dnsproxy from an x86 or arm64 device behind the affected MIPS / OpenWRT-based router works like a charm. The problem occurs only when running dnsproxy directly on the router. I also disabled all firewall rules on the router for testing purposes, but no success so far.

The log on the AdGuard server shows:

"got error when accepting a new QUIC stream: timeout: no recent network activity"

The log on the client (dnsproxy v0.39.13) shows:

"[debug] github.com/AdguardTeam/dnsproxy/proxy.(*Proxy).udpHandlePacket(): error handling DNS (udp) request: talking to dnsUpstream failed, cause: failed to open QUIC session to quic://fully.qualified.domain:784, cause: timeout: handshake did not complete in time"

It seems I need to enable debug logging on the server and see what is logged then.


@ameshkov commented on GitHub (Dec 15, 2021):

Hm, it looks as if something is wrong with UDP to port 784 in general. As if it's dropping packets somehow.


@rskallies commented on GitHub (Dec 15, 2021):

I disabled all firewalling on both sides and changed the UDP port, with no success. Interestingly, it also does not work with DoT. Using the same dnsproxy version from a client behind the router does work for both DoQ / QUIC and DoT against the same AdGuard server. Really strange.

I'll try to debug this using tcpdump on both sides.


@gspannu commented on GitHub (Dec 15, 2021):

> @rskallies, the latest edge build includes Andrey's version of the fix. Can you check if that fixes your issue?

Confirmed. The issue is now fixed in v0.107.0-b.16. 👍

I just tested it and removed the stubby.postconf script on the Asus Merlin router; everything works as expected.
