[Performance] HickoryDNS loses nearly 50% of queries #2613

Open · mokeyish opened this issue Nov 23, 2024 · 7 comments

@mokeyish (Contributor) commented Nov 23, 2024

Describe the bug

I created a simple DNS performance test repository and used the 1 million most popular domain names as a test set.

It turned out that HickoryDNS loses nearly 50% of the queries.

To Reproduce

https://github.com/mokeyish/dnsperf-testing

Steps to reproduce the behavior:

  • Case 1: use nameserver 119.29.29.29 directly (a hypothetical expansion of `just test` is sketched after this list)

    1. Configure the dnsperf `.env` file:

       ```ini
       ## dns server settings

       # the server to query (default: 127.0.0.1)
       SERVER=119.29.29.29
       # the port on which to query the server (default: udp/tcp 53, DoT 853 or DoH 443)
       PORT=53
       # transport mode: udp, tcp, dot or doh (default: udp)
       MODE=udp
       TYPE=A

       ## data
       INPUT_CSV=./data/raw/cloudflare-radar_top-1000000-domains_20241111-20241118.csv
       DOMAIN_FILE=./data/processed/domain.txt
       COUNT=10000

       ## dnsperf settings
       CLIENT_COUNT=100
       TIMEOUT=5
       ```

    2. Start the test:

       ```sh
       just test
       ```

    3. Results: [screenshot: dns_gold]
  • Case 2: use HickoryDNS forwarding to 119.29.29.29 (an equivalent library-level sketch also follows this list)

    1. HickoryDNS configuration file:

       ```toml
       [[zones]]
       zone = "."
       zone_type = "Forward"
       stores = { type = "forward", name_servers = [{ socket_addr = "119.29.29.29:53", protocol = "udp", trust_negative_responses = false }] }
       ```

    2. Start HickoryDNS:

       ```sh
       cargo run --bin=hickory-dns -- -c tests/test-data/test_configs/example_forwarder.toml -p 8054
       ```

    3. Update the dnsperf `.env`:

       ```diff
        ## dns server settings

        # the server to query (default: 127.0.0.1)
       -SERVER=119.29.29.29
       +SERVER=127.0.0.1
        # the port on which to query the server (default: udp/tcp 53, DoT 853 or DoH 443)
       -PORT=53
       +PORT=8054
        # transport mode: udp, tcp, dot or doh (default: udp)
        MODE=udp
        TYPE=A

        ## data
        INPUT_CSV=./data/raw/cloudflare-radar_top-1000000-domains_20241111-20241118.csv
        DOMAIN_FILE=./data/processed/domain.txt
        COUNT=10000

        ## dnsperf settings
        CLIENT_COUNT=100
        TIMEOUT=5
       ```

    4. Start the test:

       ```sh
       just test
       ```

    5. Results: [screenshot: dns_bad]
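For reference, the `just test` recipe presumably expands to a plain dnsperf invocation. A minimal sketch of what that command line might look like for Case 1, assuming dnsperf's standard flags and a data file with one `<name> <type>` pair per line (the exact flags are assumptions, not taken from the test repository):

```sh
# Hypothetical dnsperf invocation matching the .env above (Case 1).
# domain.txt is assumed to contain lines of the form "example.com. A".
dnsperf -s 119.29.29.29 -p 53 \
        -m udp \
        -d ./data/processed/domain.txt \
        -c 100 \
        -t 5
```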

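For context on what the forward zone in Case 2 does: hickory-dns hands each incoming query to its internal resolver, configured with the listed upstream. A minimal library-level sketch of that same upstream configuration, assuming the 0.24-era hickory-resolver API; this is an illustration, not the server's actual forwarding code path:

```rust
use std::net::{IpAddr, Ipv4Addr};

use hickory_resolver::config::{NameServerConfigGroup, ResolverConfig, ResolverOpts};
use hickory_resolver::TokioAsyncResolver;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Plain UDP/TCP upstream at 119.29.29.29:53; the final `false` mirrors
    // trust_negative_responses = false from the TOML above.
    let upstream = NameServerConfigGroup::from_ips_clear(
        &[IpAddr::V4(Ipv4Addr::new(119, 29, 29, 29))],
        53,
        false,
    );

    // No default domain, no search list: every query goes to the upstream.
    let config = ResolverConfig::from_parts(None, vec![], upstream);
    let resolver = TokioAsyncResolver::tokio(config, ResolverOpts::default());

    let response = resolver.lookup_ip("example.com.").await?;
    for ip in response.iter() {
        println!("{ip}");
    }
    Ok(())
}
```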
Expected behavior

Fix this performance issue.

System:

  • OS: Debian
  • Architecture: x86_64
  • Version: 12 (bookworm)
  • rustc version: rustc 1.82.0 (f6e511eec 2024-10-15)

Version:
  • Crate: server
  • Commit: 001ced8

@marcus0x62 (Collaborator)

Hi @mokeyish,

Running the same git commit as you, this is what I see against 1.1.1.1:

Statistics:

  Queries sent:         9999
  Queries completed:    9952 (99.53%)
  Queries lost:         47 (0.47%)

  Response codes:       NOERROR 9855 (99.03%), NXDOMAIN 97 (0.97%)
  Average packet size:  request 30, response 78
  Run time (s):         28.946179
  Queries per second:   343.810490

  Average Latency (s):  0.229788 (min 0.011245, max 4.347742)
  Latency StdDev (s):   0.340956

and 8.8.8.8:

Statistics:

  Queries sent:         9999
  Queries completed:    9992 (99.93%)
  Queries lost:         7 (0.07%)

  Response codes:       NOERROR 9895 (99.03%), NXDOMAIN 97 (0.97%)
  Average packet size:  request 30, response 77
  Run time (s):         14.472769
  Queries per second:   690.400020

  Average Latency (s):  0.126741 (min 0.018369, max 4.703716)
  Latency StdDev (s):   0.213172

Most of the few query failures I did see went away with a longer timeout (I was getting late-reply messages in a few cases).

So, with DNS servers that are reasonably close to me, I can't duplicate the behavior you are seeing. If you can reproduce this readily, trace logs from hickory would be helpful.

@mokeyish (Contributor, Author)

@marcus0x62 Are your two tests querying 1.1.1.1 and 8.8.8.8 directly?

In my setup, one test queries 119.29.29.29 directly, and the other starts HickoryDNS with its upstream set to 119.29.29.29.

@marcus0x62 (Collaborator) commented Nov 23, 2024

> @marcus0x62 Are your two tests querying 1.1.1.1 and 8.8.8.8 directly?

No, the output of dnsperf in my previous message shows requests to Hickory, configured to forward to 1.1.1.1 and 8.8.8.8. I used the same forwarding configuration as in your issue report.
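(For completeness, that presumably corresponds to a forward zone like the one in the issue report, with only the upstream address swapped, e.g.:)

```toml
[[zones]]
zone = "."
zone_type = "Forward"
stores = { type = "forward", name_servers = [{ socket_addr = "1.1.1.1:53", protocol = "udp", trust_negative_responses = false }] }
```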

@mokeyish (Contributor, Author)

Here are my logs from testing HickoryDNS (left) and dnsperf (right).

[screenshot: HickoryDNS log (left) and dnsperf output (right)]

@mokeyish (Contributor, Author)

Your numbers are very close to each other. Could the difference be that my machine is relatively slow? I am running the test under WSL.

@mokeyish (Contributor, Author)

@marcus0x62 I ran it on CloudCone and got exactly the same result as yours.

@marcus0x62 (Collaborator)

I think I'd need to see the full log output to venture any sort of guess as to what's going on. Try running your server with:

```sh
RUST_LOG=trace cargo run \
    --all-features \
    --bin hickory-dns \
    -- \
    --debug -p 8054 \
    --config tests/test-data/test_configs/example_forwarder.toml
```

and attach the full log file.
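If the full trace is too noisy to attach, narrowing the filter to the relevant crates may help. A sketch, assuming the server honors the standard tracing `EnvFilter` syntax and that the workspace crate names are `hickory_server`, `hickory_resolver`, and `hickory_proto`:

```sh
# Hypothetical narrowed filter; crate names are assumptions based on the
# hickory workspace layout, not verified against this exact commit.
RUST_LOG=hickory_server=trace,hickory_resolver=trace,hickory_proto=debug \
    cargo run --all-features --bin hickory-dns -- \
    --debug -p 8054 \
    --config tests/test-data/test_configs/example_forwarder.toml
```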
