Subject: Re: New release?

From: John Engelhart <john.engelhart_at_gmail.com>
Date: Tue, 27 Oct 2009 18:02:25 -0400

On Tue, Oct 27, 2009 at 2:23 PM, Daniel Stenberg <daniel_at_haxx.se> wrote:

> On Tue, 27 Oct 2009, Jakub Hrozek wrote:
>
>> I know there had been a similar question asked by Daniel a couple of months
>> back, but since then some other patches have landed, so I wanted to ask again:
>> does c-ares upstream plan a new release to facilitate the changes since
>> 1.6.0?
>>
>
> I'm all for doing a release. It's really not that much work.
>
> Let's work on getting the relevant patches applied, then give a few days
> for everything to settle and then release!
>
> So are there any patches that have faded away that we need to bring out
> into the light again?

I'm investigating a possible bug with ares_timeout() and timeout logic. I'm
still trying to figure out if it's a misunderstanding on my part or an
actual problem.

While I've got your attention, let me describe what I'm seeing and see if
it tickles anything for someone who's more familiar with the code base.

Basically, I'm using c-ares as the default resolver for my application,
which means a channel stays open for essentially the application's entire
lifetime. I need resolution to be thoroughly asynchronous, mostly to keep
the user interface responsive.

The basic DNS work loop looks something like (this is Mac OS X / Cocoa /
Objective-C, btw):

        int numberOfActiveFileDescriptors = 0, selectResult = 0;
        fd_set readFileDescriptors, writeFileDescriptors;
        struct timeval timeOutTV, *timeOutTVPtr = NULL;

        FD_ZERO(&readFileDescriptors);
        FD_ZERO(&writeFileDescriptors);

        // ares_fds() returns one greater than the highest socket it set
        // (suitable as select()'s nfds), or 0 if no queries are active.
        if((numberOfActiveFileDescriptors = ares_fds(channel,
              &readFileDescriptors, &writeFileDescriptors)) == 0) {
          // Nothing in flight: wait on the condition for up to 1/37 s.
          [condition lock];
          if([[NSThread currentThread] isCancelled] == NO) {
            [condition waitUntilDate:[NSDate dateWithTimeIntervalSinceNow:(1.0 / 37.0)]];
          }
          [condition unlock];
        }
        else {
          timeOutTV.tv_sec = 0L;
          timeOutTV.tv_usec = 27027L; // (1.0 / 37.0) of a second.
          timeOutTVPtr = ares_timeout(channel, &timeOutTV, &timeOutTV);

          // XXX: Added to work around observed behavior!
          if((timeOutTVPtr->tv_sec == 0L) && (timeOutTVPtr->tv_usec == 0L)) {
            timeOutTVPtr->tv_sec = 0L;
            timeOutTVPtr->tv_usec = 27027L;
          } // <<--- XXX!!

          // Only hand the fd sets to c-ares when select() reports activity.
          if((selectResult = select(numberOfActiveFileDescriptors,
                &readFileDescriptors, &writeFileDescriptors, NULL, timeOutTVPtr)) > 0) {
            ares_process(channel, &readFileDescriptors, &writeFileDescriptors);
          }
        }

The problem is at the 'XXX' comment(s). Basically, I've observed behavior
where "very long running" queries were consuming an excessive amount of CPU,
particularly system CPU time. On investigation I found that ares_timeout
was returning a timeout of 0 seconds and 0 microseconds. The little digging
I've done so far turned up that ares_send.c sets the timeout to 0 seconds
and 0 microseconds "by default".

Since I don't know the code base, it's hard to determine the actual intent
of this, but at first glance it seems wrong. The net effect seems to be that
ares_timeout() will almost always return a wait of 0 seconds, 0
microseconds, which effectively translates into "return from select()
immediately if no data is available". Setting a "max timeout" doesn't seem
to help because the logic in ares_timeout.c 'prefers' the 0/0 result it has
determined. In the end, without the 'XXX' lines, the code sits in a very
tight loop endlessly calling select() until some data is available.
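
To check that I'm reading the ares_timeout.c logic the right way, here is a
simplified model of how I understand the caller's cap (maxtv) to combine
with the per-query deadlines. model_timeout_us is a hypothetical name and
this is not the library's source; it only assumes that maxtv acts as an
upper bound, so it can shorten the wait but never lengthen it. Under that
assumption, once any query's deadline has passed, the remaining time clamps
to 0 and no cap can raise it.

```c
/* Simplified model (hypothetical, NOT c-ares source) of how the time until
 * the earliest query deadline combines with a caller-supplied cap such as
 * ares_timeout()'s maxtv. All values are in microseconds.
 * max_us < 0 means "no cap" (the caller passed no maxtv). */
static long model_timeout_us(long now_us, long earliest_deadline_us, long max_us)
{
    long remaining = earliest_deadline_us - now_us;

    if (remaining < 0)
        remaining = 0;          /* deadline already passed: don't wait at all */

    /* The cap is an upper bound: it can shorten the wait, never lengthen it. */
    if (max_us >= 0 && max_us < remaining)
        return max_us;
    return remaining;
}
```

If that model is right, a 0/0 result persists until something retires the
expired deadline. The ares_process(3) man page says it handles timeouts as
well as I/O events, and the adig sample program calls it after every
select(), including one that timed out, whereas the loop above only calls
it when select() returns > 0.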

Unfortunately I don't have an example query handy that "takes a long time".
The one I had used TXT records to return some 'non-standard' information
(i.e., not the usual name-to-address lookup), and the query I was using as
a test, which took an excessive amount of time, now 'works correctly' and
returns almost instantly. :(

So... bug, or just a misunderstanding on my part?
Received on 2009-10-27