Subject: Re: Am I using libcares correctly ?

From: Ben Greear <greearb_at_candelatech.com>
Date: Wed, 27 May 2015 16:30:20 -0700

On 05/27/2015 04:04 PM, Richard wrote:
> Hello:
>
> I am attaching with this email a wireshark capture file
> "failure01.pcapng" (size: 80KB). I don't know if I am allowed to attach
> files in a mailing list and if you will receive it on your end.

First, if you send more captures that you have already looked through, let us know
the relevant frame number to save a bit of time.

Anyway, that is interesting... The DNS server sends a response, but the client claims
it is not listening on the right port (the ports appear to change by +1 for each
request, except the last request, which was +2). Maybe a send attempt failed
due to ARP needing to be refreshed, and the c-ares retransmit logic improperly
incremented the sending port without properly fixing up the listening socket?

Maybe you are destructing the cares objects too soon?

Can you post your code, or if not that code, then something else
simple that can reproduce the problem?

I think cares can probably bind its source port to a particular port. If
your OS is having ephemeral port issues, that might resolve the problem.

If cares has bugs in its retry logic, that might also work around the problem,
but it would be best to actually fix the problem.
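
As a rough illustration of the fixed-source-port idea, here is a minimal
sketch using plain POSIX sockets rather than c-ares itself. The helper names
open_udp_bound() and bound_port() are made up for this example and are not
part of c-ares. On Windows the same calls would need Winsock setup
(WSAStartup, closesocket), which is omitted here. How best to hook this into
c-ares is not shown: ares_set_socket_callback() might be a suitable place,
but that is an assumption and has not been tested.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not part of c-ares): open a UDP socket and bind it
 * to a fixed local port on all IPv4 interfaces. Passing 0 lets the OS pick
 * an ephemeral port. Returns the socket descriptor, or -1 on failure. */
int open_udp_bound(uint16_t local_port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(local_port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: report the local port the socket is actually bound
 * to (host byte order), or -1 on error. */
int bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *)&addr, &len) != 0)
        return -1;
    return ntohs(addr.sin_port);
}
```

Calling open_udp_bound(0) leaves the port choice to the OS, while a nonzero
value pins it. A second bind to the same port normally fails while the first
socket is open, so a fixed source port only suits one resolver socket at a
time.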

Thanks,
Ben

>
> What I did in this experiment:
>
> - I am running my experiment on my laptop that has windows 7
>
> - My laptop (IP 192.168.1.109) is connected wirelessly to my router
> (IP 192.168.1.1)
>
> - On my laptop I am running a small program that runs in a loop to
> use libcares to resolve www.google.com to its IP address. The loop
> has 100 iterations. Each request to libcares to resolve the hostname
> to IP has a timeout of 10 seconds.
>
> - On the 95th iteration libcares times out and fails to resolve
> www.google.com to its IP address.
>
> - While performing the experiment I was capturing the traffic with
> wireshark. I used a filter of "arp || dns"
>
> - What I see is that on the 95th iteration I get an ICMP error
> "Destination unreachable".
>
> - I think it is quite possible to get such an error. After all, I am
> running libcares using UDP and my setup is wireless.
>
> - Am I using libcares wrong ? What is the proper way to use libcares
> ? I am asking libcares to perform a hostname to IP translation
> with a timeout of 10 seconds. Should libcares not try again within
> that 10 seconds if it gets an ICMP error while trying to reach the
> nameserver ?
>
> Thanks
> Richard
>
>
> On Wed, May 27, 2015 at 3:28 PM, Ben Greear <greearb_at_candelatech.com> wrote:
>
> On 05/27/2015 12:22 PM, Richard wrote:
> > Thanks for the response.
> >
> > I am using the latest stable release which is: c-ares-1.10.0
> >
> > Is that what I should be using ? Or should I be using the development version from github.
>
> Looks like you have a good test case to try it out...so I would try latest as well.
>
> If it is still broken, at least it's not some bug that is already fixed
> and it would be worth more time trying to find the issue.
>
> Thanks,
> Ben
>
> >
> > Regards
> > Richard
> >
> > On Wed, May 27, 2015 at 3:07 PM, Ben Greear <greearb_at_candelatech.com> wrote:
> >
> > On 05/27/2015 12:02 PM, Richard wrote:
> > > Hello:
> > >
> > > I am sorry if this message is a duplicate. I tried to post this same
> > > message a few weeks ago through gmane but I never got a response. I am
> > > trying to post again directly via the c-ares mailing list.
> > >
> > > I am a recent libcares user. I hope this is the right forum to post this
> > > question. If not I apologize and please let me know what is the correct
> > > forum or mailing list to use.
> > >
> > > The project I am working on recently switched to use libcares for
> > > DNS queries. Prior to that we were using the standard getaddrinfo()
> > > (see "man 3 getaddrinfo") for name to ip resolution. With getaddrinfo()
> > > we could not specify a timeout value to perform the DNS query. Hence we
> > > switched to use libcares.
> > >
> > > We are seeing some stability issues with libcares, especially if
> > > we use a wireless network. libcares will sometimes fail to resolve the
> > > hostname within our timeout of 60 seconds.
> > >
> > > Let me elaborate:
> > >
> > > - If I do a few DNS queries using libcares then everything is fine.
> > >
> > > - I wrote a profiling tool that does many DNS queries in a loop. For
> > > example, we resolve "www.google.com" and we do 100 iterations of that
> > > in a loop.
> > >
> > > - If I run that with my laptop that has a wired connection to my router
> > > then everything is fine. Name resolution with libcares is rock solid
> > > and it resolves www.google.com to its IP address on
> > > every iteration.
> > >
> > > - If I run that with my laptop on a wireless connection to my
> > > router, then at around the 60th iteration libcares will time out and not
> > > be able to resolve "www.google.com".
> > >
> > > There must be something I am doing wrong. I am probably using libcares
> > > in a way that it is not supposed to be used. I am wondering how other
> > > projects such as curl are using libcares.
> > >
> > > A few things come to my mind:
> > >
> > > [1] By default libcares makes DNS queries using UDP. Should I use the
> > > ARES_FLAG_USEVC to specify libcares to use TCP instead ?
> > >
> > > [2] Should I implement caching ? At my application level, should I
> > > cache the name to ip resolution and not invoke libcares to make a DNS
> > > query ? This then leads to the next question of what timeout I should
> > > use to evict old entries from my cache.
> > >
> > > Once again, I am probably using libcares in a wrong way. So your insight
> > > is much appreciated.
> >
> > I doubt using TCP is a good idea. Maybe cares has issues dealing with
> > retransmitting frames, or something like that.
> >
> > First, make sure you are using the latest libcares.
> >
> > Then, maybe a network capture that involves a failure would be helpful.
> >
> > You might also try using 'netem' or some other network impairment tool to
> > drop frames (and/or add delay) to reproduce this in a more controlled environment.
> >
> > Thanks,
> > Ben
> >
> >
> > --
> > Ben Greear <greearb_at_candelatech.com>
> > Candela Technologies Inc http://www.candelatech.com
> >
> >
>
>
> --
> Ben Greear <greearb_at_candelatech.com>
> Candela Technologies Inc http://www.candelatech.com
>
>

-- 
Ben Greear <greearb_at_candelatech.com>
Candela Technologies Inc  http://www.candelatech.com
Received on 2015-05-28