[openamq-dev] OpenAMQ and Sockets Direct
R Hayes
rfhayes at reillyhayes.com
Sat Jan 12 00:27:09 CET 2008
I have no idea if the problem I'm having is an issue with OpenAMQ or
with the Sockets Direct Library. I know there's not much you can *do*
to help me resolve this without access to the target environment. So
I'm mostly looking for some information that will help me track it down.
Background:
We're running in a blade enclosure with an InfiniBand backplane (as
well as a GigE interface on each blade talking to the outside world).
We're running RHEL5 with a low-latency kernel.
Libsdp.so is a library for InfiniBand. It loads as a shim (using
LD_PRELOAD) in between the standard socket implementation and the user
program. When you make a TCP connection across an InfiniBand network
to a server that's also got libsdp.so loaded, TCP is bypassed and SDP
(using the reliable InfiniBand transport) is used instead. This gives
higher throughput AND lower latency over the same physical InfiniBand
network.
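For readers unfamiliar with the LD_PRELOAD mechanism, here is a minimal sketch of how a preloaded library can sit between a program and the socket calls. It is NOT libsdp's code. It is a hypothetical tracing shim that only wraps connect(), logs the destination and the result, and passes the call on to the next definition in the link order. The file name connshim.c and the log format are assumptions made for illustration.

```c
#define _GNU_SOURCE            /* needed for RTLD_NEXT on glibc */
#include <arpa/inet.h>
#include <dlfcn.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

typedef int (*connect_fn)(int, const struct sockaddr *, socklen_t);

/* Same name and signature as the standard connect(), so that a preloaded
 * copy of this function is found before the one in libc. */
int connect(int fd, const struct sockaddr *addr, socklen_t len)
{
    static connect_fn real_connect;

    /* Look up the next connect() in the search order (libc, or another
     * preloaded library listed after this one). */
    if (real_connect == NULL) {
        real_connect = (connect_fn) dlsym(RTLD_NEXT, "connect");
        if (real_connect == NULL) {
            errno = ENOSYS;
            return -1;
        }
    }

    int rc = real_connect(fd, addr, len);
    int saved_errno = errno;   /* fprintf() below may clobber errno */

    /* Decode the destination for IPv4 only; anything else prints "?:0". */
    char ip[INET_ADDRSTRLEN] = "?";
    unsigned port = 0;
    if (addr != NULL && addr->sa_family == AF_INET &&
        len >= sizeof(struct sockaddr_in)) {
        const struct sockaddr_in *in = (const struct sockaddr_in *) addr;
        inet_ntop(AF_INET, &in->sin_addr, ip, sizeof ip);
        port = ntohs(in->sin_port);
    }

    fprintf(stderr, "[shim] connect(fd=%d, %s:%u) = %d (%s)\n",
            fd, ip, port, rc, rc == 0 ? "ok" : strerror(saved_errno));

    errno = saved_errno;       /* the caller must see the original errno */
    return rc;
}
```

Built with something like "gcc -shared -fPIC -o connshim.so connshim.c -ldl", it can be listed in LD_PRELOAD ahead of libsdp.so to show what the application itself passes to connect() and what it gets back, independent of libsdp's own traces.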
This all worked fine with OpenAMQ 1.2c1. Perfectly. It works fine
with a number of other TCP based test applications. OpenAMQ 1.2c4 is
the only place we have problems.
With OpenAMQ 1.2c4, connecting over SDP no longer works on the
amq_client side (no detectable issues on the amq_server side).
The failure occurs when trying to connect to the amq_server. Running
amq_client we get a simple "could not connect":
$ LD_PRELOAD=libsdp.so /opt/local/ibase1.2c4/bin/amq_client -s ibnode1:5672
17:46:03: E: could not connect to ibnode1:5672
(Hangs)
As it turns out, we can make it work by configuring libsdp to force
all TCP traffic in the process to use SDP. That is fine for our
components that only need to talk *inside* the blade enclosure, but
not so good for the components that need to talk outside the blade
enclosure AND use OpenAMQ.
The libsdp traces tell me that libsdp thinks the connection is being
made successfully. But this knowledge isn't making it to OpenAMQ.
The libsdp traces lead me to believe that libsdp may do something
"clever" with the file descriptors.
What I'm hoping for is some advice on places to look:
1) Did the process of connecting to the server change from 1.2c1 to
1.2c4 in ways I should know about? I'm going to look through code
diffs, but advice is always helpful.
2) Any trickery relating to handling of file descriptors I should be
aware of?
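One way to narrow down the two questions above without involving OpenAMQ at all is a small standalone probe that walks through a non-blocking connect step by step and prints what each call returns. This is only a sketch. It ASSUMES the client connects in non-blocking mode (connect, wait for writability, then check SO_ERROR), which nothing in this post confirms about OpenAMQ 1.2c4. The function name probe_connect and its output format are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: attempts a non-blocking TCP connect to host:port and
 * logs every step to stderr. Returns 0 if the connection was established,
 * -1 otherwise. */
int probe_connect(const char *host, const char *port, int timeout_ms)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;

    int gai = getaddrinfo(host, port, &hints, &res);
    if (gai != 0) {
        fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(gai));
        return -1;
    }

    int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    if (fd < 0) {
        perror("socket");
        freeaddrinfo(res);
        return -1;
    }
    fprintf(stderr, "socket() -> fd %d\n", fd);

    /* Switch the descriptor to non-blocking mode before connecting. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        perror("fcntl");
        freeaddrinfo(res);
        close(fd);
        return -1;
    }

    int rc = connect(fd, res->ai_addr, res->ai_addrlen);
    int connect_errno = errno;
    freeaddrinfo(res);
    fprintf(stderr, "connect() -> %d (%s)\n",
            rc, rc == 0 ? "ok" : strerror(connect_errno));
    if (rc < 0 && connect_errno != EINPROGRESS) {
        close(fd);
        return -1;
    }

    if (rc < 0) {
        /* Connection in progress: wait until the socket becomes writable. */
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        int n = poll(&pfd, 1, timeout_ms);
        fprintf(stderr, "poll() -> %d, revents=0x%x\n",
                n, n > 0 ? (unsigned) pfd.revents : 0u);
        if (n <= 0) {          /* timeout or poll error */
            close(fd);
            return -1;
        }
    }

    /* Writability alone does not mean success; SO_ERROR has the verdict. */
    int so_error = 0;
    socklen_t len = sizeof so_error;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0) {
        perror("getsockopt");
        close(fd);
        return -1;
    }
    fprintf(stderr, "SO_ERROR = %d (%s)\n", so_error, strerror(so_error));

    /* Check whether O_NONBLOCK is still set on the descriptor afterwards. */
    flags = fcntl(fd, F_GETFL, 0);
    fprintf(stderr, "O_NONBLOCK still set: %s\n",
            (flags >= 0 && (flags & O_NONBLOCK)) ? "yes" : "no");

    close(fd);
    return so_error == 0 ? 0 : -1;
}
```

Calling probe_connect("ibnode1", "5672", 5000) from a trivial main(), once with and once without LD_PRELOAD=libsdp.so, would show which step (connect, poll, or SO_ERROR) behaves differently under libsdp. If the blocking path is the one that matters, dropping the fcntl() call gives the equivalent blocking test.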
thanks,
r hayes