[openamq-dev] OpenAMQ and Sockets Direct

R Hayes rfhayes at reillyhayes.com
Sat Jan 12 00:27:09 CET 2008


I have no idea if the problem I'm having is an issue with OpenAMQ or  
with the Sockets Direct Library.  I know there's not much you can *do*  
to help me resolve this without access to the target environment.  So  
I'm mostly looking for some information that will help me track it down.

Background:

We're running in a blade enclosure with an InfiniBand backplane (as
well as a gigE on each blade talking to the outside world).  We're
running RHEL5 with a low-latency kernel.

libsdp.so is a library for InfiniBand.  It loads as a shim (using
LD_PRELOAD) in between the standard socket implementation and the user
program.  When you make a TCP connection across an InfiniBand network
to a server that's also got libsdp.so loaded, TCP is bypassed and SDP
(using the reliable InfiniBand transport) is used instead.  This has
higher throughput AND lower latency over the same physical InfiniBand
network.

This all worked fine with OpenAMQ 1.2c1.  Perfectly.  It works fine  
with a number of other TCP based test applications.  OpenAMQ 1.2c4 is  
the only place we have problems.

With OpenAMQ 1.2c4, it no longer works on the amq_client side (no  
detectable issues on the amq server side).

The failure occurs when trying to connect to the amq_server.  Running  
amq_client we get a simple "could not connect":

$ LD_PRELOAD=libsdp.so /opt/local/ibase1.2c4/bin/amq_client -s ibnode1:5672
  17:46:03: E: could not connect to ibnode1:5672
(Hangs)

As it turns out, we can make it work by configuring libsdp to force  
all TCP traffic in the process to use SDP.  Which is fine for our  
components that only need to talk *inside* the blade enclosure, but  
not so good for the components that need to talk outside the blade  
enclosure AND use openamq.

The libsdp traces tell me that libsdp thinks the connection is being
made successfully.  But this knowledge isn't making it to OpenAMQ.

The libsdp traces lead me to believe that libsdp may do something  
"clever" with the file descriptors.

What I'm hoping for is some advice on places to look:

1) Did the process of connecting to the server change from 1.2c1 to  
1.2c4 in ways I should know about?  I'm going to look through code  
diffs, but advice is always helpful.

2) Any trickery relating to handling of file descriptors I should be  
aware of?
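For comparing behaviour outside of OpenAMQ, a small standalone client that connects in non-blocking mode could be run with and without LD_PRELOAD.  The sketch below uses the standard connect/poll/SO_ERROR sequence.  It assumes, without confirmation, that a non-blocking connect is the relevant path; try_connect() is a hypothetical helper and is not taken from the OpenAMQ sources.

```c
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical test helper (not OpenAMQ code).
 * Attempts a non-blocking TCP connect to addr and waits up to
 * timeout_ms for it to complete.  Returns 0 on success, otherwise
 * an errno value (ETIMEDOUT if the wait expires).
 */
int try_connect(const struct sockaddr_in *addr, int timeout_ms)
{
    struct pollfd pfd;
    socklen_t len;
    int fd, flags, rc;
    int err = 0;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    /* Switch the descriptor to non-blocking mode. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        err = errno;
        close(fd);
        return err;
    }

    if (connect(fd, (const struct sockaddr *) addr, sizeof *addr) != 0) {
        if (errno != EINPROGRESS) {
            err = errno;            /* immediate failure */
        } else {
            /* Connection in progress: wait until the socket is writable. */
            pfd.fd = fd;
            pfd.events = POLLOUT;
            pfd.revents = 0;
            rc = poll(&pfd, 1, timeout_ms);
            if (rc == 0) {
                err = ETIMEDOUT;
            } else if (rc < 0) {
                err = errno;
            } else {
                /* Writable: fetch the final result of the connect. */
                len = sizeof err;
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) != 0)
                    err = errno;
            }
        }
    }

    close(fd);
    return err;
}
```

If this returns 0 without the shim but times out or fails with it loaded, the difference is likely in how connection completion is reported for non-blocking sockets rather than in anything specific to OpenAMQ.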

thanks,

r hayes






