Subject: Re: Two questions: 1. Callback function not called; 2. TCP connecting to un-existing server prevents from using existing server.

From: Anlin Zhang <anlin.zhang_at_gmail.com>
Date: Thu, 9 Aug 2007 12:19:11 -0700

Hi,
Please ignore my question 1.
In my application, I did not call ares_process(channel, readset, writeset)
when select() did not return a positive value. That was wrong: I should
always call ares_process(channel, readset, writeset), no matter whether
select() returns a positive value or not. Now I always get callbacks, either
with a result or with a timeout.
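
For anyone who hits the same thing, here is a rough sketch of the corrected
loop shape. It is not c-ares code: wait_then_process(), process_fn and
count_call() are hypothetical names made up for this example, and the
callback only stands in for ares_process(channel, readset, writeset). The
point is that the callback runs after every select(), including when
select() returns 0 because the timeout expired.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Hypothetical stand-in for ares_process(): it receives the fd sets after
 * every select() call so that it can also expire timed-out queries. */
typedef void (*process_fn)(void *ctx, fd_set *readset, fd_set *writeset);

/* Example callback (hypothetical): counts how often it was invoked. */
void count_call(void *ctx, fd_set *readset, fd_set *writeset)
{
    (void)readset;
    (void)writeset;
    ++*(int *)ctx;
}

/* Wait up to timeout_ms for fd to become readable (pass fd < 0 to wait on
 * nothing), then ALWAYS call process(), whatever select() returned.
 * Returns the value select() returned. */
int wait_then_process(int fd, int timeout_ms, process_fn process, void *ctx)
{
    fd_set readset, writeset;
    struct timeval tv;
    int n;

    assert(process != NULL);
    FD_ZERO(&readset);
    FD_ZERO(&writeset);
    if (fd >= 0)
        FD_SET(fd, &readset);

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    n = select(fd >= 0 ? fd + 1 : 0, &readset, &writeset, NULL, &tv);
    if (n < 0) {
        /* The sets are unspecified after an error; hand over empty ones. */
        FD_ZERO(&readset);
        FD_ZERO(&writeset);
    }

    /* The original mistake was to skip this call when n <= 0. */
    process(ctx, &readset, &writeset);
    return n;
}
```

In a real program the fd sets and the timeout would come from ares_fds() and
ares_timeout() rather than from a single fd and a fixed timeout_ms.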

For question 2:
Attached is the diff -u output showing my fix for the TCP connection problem.
I am also including it inline here since it is not big:

--- ares_process.c.original_1.4.0 2007-08-09 11:44:31.000000000 -0700
+++ ares_process.c.myChangeOnOriginal_1.4.0 2007-08-09 11:57:42.000000000 -0700
@@ -27,6 +27,7 @@
 #endif
 #include <netinet/in.h>
 #include <netdb.h>
+#include <netinet/tcp.h>
 #include <arpa/nameser.h>
 #ifdef HAVE_ARPA_NAMESER_COMPAT_H
 #include <arpa/nameser_compat.h>
@@ -518,6 +519,15 @@
   server = &channel->servers[query->server];
   if (query->using_tcp)
     {
+
+ /* We can setup all connection here, or at init too.
+ int iii;
+ for ( iii=0; iii < channel->nservers; iii++) {
+ if (channel->servers[iii].tcp_socket == ARES_SOCKET_BAD){
+ open_tcp_socket(channel, &channel->servers[iii]);
+ }
+ }
+ */
       /* Make sure the TCP socket for this server is set up and queue
        * a send request.
        */
@@ -530,6 +540,35 @@
               return;
             }
         }
+
+ printf("test tcp_socket...\n" );
+ ares_socket_t s;
+ struct sockaddr_in sockin;
+
+ // Acquire a socket.
+ s = server->tcp_socket;
+ // test Connect to the server.
+ memset(&sockin, 0, sizeof(sockin));
+ sockin.sin_family = AF_INET;
+ sockin.sin_addr = server->addr;
+ sockin.sin_port = (unsigned short)(channel->tcp_port & 0xffff);
+ if (connect(s, (struct sockaddr *) &sockin, sizeof(sockin)) == -1) {
+ int err = SOCKERRNO;
+
+ if (err == EINPROGRESS || err == EWOULDBLOCK || err==EALREADY) {
+ printf("connect still EINPROGRESS, cannot use this one, skip to next\n");
+ next_server(channel, query, now);
+ return;
+ }
+ if (err == EISCONN) {
+ printf("already connected, use this one, fd is %d\n", server->tcp_socket);
+ }
+ } else {
+ printf("connect() return no error, use this one %d\n", server->tcp_socket);
+ }
+
+ printf("start put query in place...\n" );
+
       sendreq = calloc(sizeof(struct send_request), 1);
       if (!sendreq)
         {
@@ -658,6 +697,23 @@
   /* Set the socket non-blocking. */
   nonblock(s, TRUE);

+ int opt = 1;
+ if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, (const void *)&opt, sizeof(opt)) < 0) {
+ printf("setsockopt keepalive error, errno: %d\n", errno);
+ }
+ int keepIdle=2;
+ if (setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, (const void *)&keepIdle, sizeof(keepIdle)) < 0) {
+ printf("setsockopt keepalive-keepIdle error, errno: %d\n", errno);
+ }
+ int keepCnt=1;
+ if (setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, (const void *)&keepCnt, sizeof(keepCnt)) < 0) {
+ printf("setsockopt keepalive-keepCnt error, errno: %d\n", errno);
+ }
+ int keepIntvl=1;
+ if (setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, (const void *)&keepIntvl, sizeof(keepIntvl)) < 0) {
+ printf("setsockopt keepalive-keepIntvl error, errno: %d\n", errno);
+ }
+
   /* Connect to the server. */
   memset(&sockin, 0, sizeof(sockin));
   sockin.sin_family = AF_INET;

Thanks,
Anlin.

On 8/8/07, Anlin Zhang <anlin.zhang_at_gmail.com> wrote:
>
>
> Hi,
>
> We are trying to use c-ares in a fairly mission-critical application,
> but have encountered some problems. I would like to know if you have
> solutions.
>
> Two problems:
> Problem 1: Not all queries get a callback after calling ares_query() 200
> times. UDP packets might be lost, but shouldn't there be a callback
> reporting the timeout? Or do I need to set some channel options to get all
> callbacks?
>
> Problem 2: If I force "Always use TCP" and the primary DNS host machine is
> shut down, TCP setup to the primary DNS server takes more than a minute to
> give up before trying to connect to the secondary DNS server. (Shut down
> here means not just killing the named process but powering down the server;
> this makes a big difference since there is no TCP stack running to reply to
> the TCP SYN.) I did a quick fix for this problem, and now it switches to the
> secondary DNS server immediately. I also applied TCP keepalive to the socket
> to detect an unplugged network cable. I would like you to review the change
> (I only considered compiling on Linux here; TCP keepalive might not be
> portable). I am still seeing "Problem 1" with TCP if I shut down and restart
> the primary or secondary DNS server at random.
>
> Problem 1 Details:
> With the default channel settings (ares_init) and two DNS servers in
> /etc/resolv.conf,
> I call ares_query() 200 times for 200 host names (I see ares use UDP to
> send the queries), then run this loop:
> loop{
> ares_fds(),
> ares_timeout(),
> select(),
> ares_process(),
> }
> After looping many times, only some of the 200 host names get a callback;
> the rest seem to be simply lost, without even a timeout callback.
>
> Problem 2 Details:
> With optional settings on the channel (ares_init_options), I set opts.flags =
> ARES_FLAG_USEVC | ARES_FLAG_NORECURSE | ARES_FLAG_STAYOPEN ;
> and power down the primary DNS server (or use iptables to block TCP packets).
>
> I call ares_query() 200 times for 200 host names.
> ares tries a non-blocking connect to the primary DNS server. Since it gets
> no ACK for the SYN, the TCP handshake retransmission timeout starts, which
> can take more than 75 seconds to give up.
> Meanwhile the FD is considered valid, but if the application does a select()
> on the FD, there is no event on it. ares_process() does not seem to try the
> other servers either.
>
> I fixed problem 2 by checking whether the FD is really connected before
> queuing the query on the connection. How to detect whether the connection
> is set up or not? I simply call the non-blocking connect() again. If the new
> connect() returns the error EINPROGRESS, EWOULDBLOCK (EAGAIN) or EALREADY,
> the connection is still in progress and cannot be used to send queries. If
> connect() returns no error, or returns the error EISCONN, the socket is
> connected and ready to send queries. If it is not connected, I call
> next_server() to try connecting to the other servers. Here is the code I
> added in ares__send_query():
>
> printf("test tcp_socket...\n" );
> ares_socket_t s;
> struct sockaddr_in sockin;
>
> /* Acquire a socket. */
> s = server->tcp_socket;
> /* test Connect to the server. */
> memset(&sockin, 0, sizeof(sockin));
> sockin.sin_family = AF_INET;
> sockin.sin_addr = server->addr;
> sockin.sin_port = (unsigned short)(channel->tcp_port & 0xffff);
> if (connect(s, (struct sockaddr *) &sockin, sizeof(sockin)) == -1) {
> int err = SOCKERRNO;
>
> if (err == EINPROGRESS || err == EWOULDBLOCK || err==EALREADY) {
> printf("connect still EINPROGRESS, cannot use this one, skip to next\n");
> next_server(channel, query, now);
> return;
> }
> if (err == EISCONN) {
> printf("already connected, use this one, fd is %d\n", server->tcp_socket);
> }
> } else {
> printf("connect() return no error, use this one %d\n", server->tcp_socket);
> }
>
> This fix seems to have resolved the problem of not being able to use the
> secondary DNS server while TCP setup to the primary DNS server is in
> progress. But what happens if the primary DNS server is up and the
> connection was set up successfully at the beginning, but the server later
> becomes unreachable (for example, the network cable is unplugged)? In that
> case c-ares is fooled into thinking the FD is still usable, and indeed
> write() to it does not return an error! My solution here is to apply TCP
> keepalive to the connection. Here is the code I added in the
> open_tcp_socket() function:
>
> /* Set the socket non-blocking. */
> nonblock(s, TRUE);
>
> int opt = 1;
> if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, (const void *)&opt, sizeof(opt)) < 0) {
> printf("setsockopt keepalive error, errno: %d\n", errno);
> }
> int keepIdle=2;
> if (setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, (const void *)&keepIdle, sizeof(keepIdle)) < 0) {
> printf("setsockopt keepalive-keepIdle error, errno: %d\n", errno);
> }
> int keepCnt=1;
> if (setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, (const void *)&keepCnt, sizeof(keepCnt)) < 0) {
> printf("setsockopt keepalive-keepCnt error, errno: %d\n", errno);
> }
> int keepIntvl=1;
> if (setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, (const void *)&keepIntvl, sizeof(keepIntvl)) < 0) {
> printf("setsockopt keepalive-keepIntvl error, errno: %d\n", errno);
> }
>
> If the primary DNS server goes offline in the middle of a session, the
> keepalive mechanism shuts down the socket, making the FD unwritable and
> forcing write() to return an error; ares_process() will detect this and
> switch to the other connections.
> Now it seems to be working as expected. If someone can fix Problem 1, then
> it will be perfect.
> I will attach the whole ares_process.c file that I modified for your
> review (it is a little messy because of the inserted logging and, again, I
> have only considered compiling on Linux so far). I still don't understand
> most of the c-ares code yet, so hopefully I didn't break anything. Please
> review it; maybe there are better ways to achieve this (for example, we
> could try other connections once write() returns an error, or add another
> state to tcp_socket).
>
> Thanks,
> Anlin.
>
>

Received on 2007-08-09