[openamq-dev] round-robin queues
Martin Sustrik
sustrik at imatix.com
Tue Feb 12 10:07:37 CET 2008
Hi Terry,
> Martin> Although this is not difficult to implement, there is a good reason
> Martin> not to do so. Actually, the load balancing in exchange would break
> Martin> the very idea of load balancing. There are two things to take into
> Martin> account:
>
> Martin> 1. If a service instance is not connected at a moment the load
> Martin> balancing algorithm shouldn't block any messages and it should send
> Martin> them to running instances of the same service immediately.
>
> Martin> 2. If there is no running instance of a service, the messages
> Martin> should be queued until one becomes available.
>
> Martin> After giving it a little thought you'll realise that load balancing
> Martin> should be done *after* the message is queued, never before (i.e. in
> Martin> the exchange).
>
> I've given it a little thought, but I really don't understand :-)
>
> In your 1, what's the difference between sending to all matching queues and
> sending to just one? I don't see what that has to do with blocking.
>
> I don't see what your 2 has to do with the difference between sending to
> all relevant queues or just picking one. As I'm thinking of it, there is no
> queue in this case - it's the running services that create the queues.
>
> Sorry if this sounds dense or like me being difficult - I really don't get
> it.
>
Sorry about this. I've been messing with the stuff so long that what
seems obvious to me may be unobvious to people from outside.
Firstly, if the load balancing is done on the exchange level and there
is a queue for each separate service instance, messages will still be
dispatched to the queue of a failed service instance, with nobody to
process them. The response time for any request dispatched to the
failed instance will then grow to several hours, or however long it
takes to fix the problem.
Secondly, if you try to avoid that problem by using auto-delete queues
(i.e. the queue disappears once the service instance fails), you'll
end up with no queue to dispatch the message to when no instance of
the service is running. The request will then be lost.
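To make the two failure modes concrete, here is a minimal in-memory
sketch in Python. It is only a toy model, not OpenAMQ code: the function
names (exchange_level, queue_level) and the idea of modelling queues as
deques are assumptions made for illustration. It compares round-robin in
the exchange (one queue per instance) with a single shared queue that
running instances consume from.

```python
from collections import deque
from itertools import cycle


def exchange_level(messages, instances, alive, auto_delete=False):
    """Round-robin in the exchange: one queue per service instance."""
    # With auto-delete, a failed instance's queue disappears with it.
    names = [n for n in instances if n in alive] if auto_delete else list(instances)
    queues = {n: deque() for n in names}
    targets = cycle(names) if names else None
    lost = []
    for msg in messages:
        if targets is None:
            lost.append(msg)  # no queue exists, so the message is dropped
        else:
            queues[next(targets)].append(msg)
    # Only running instances drain their own queue.
    processed = [m for n in names if n in alive for m in queues[n]]
    stuck = [m for n in names if n not in alive for m in queues[n]]
    return processed, stuck, lost


def queue_level(messages, instances, alive):
    """One shared queue: balancing happens after the message is queued."""
    shared = deque(messages)
    running = [n for n in instances if n in alive]
    if not running:
        return [], list(shared)  # nothing is lost, messages simply wait
    processed = []
    while shared:
        processed.append(shared.popleft())  # any running instance takes it
    return processed, []
```

With instances a, b, c and b failed, exchange_level leaves b's share of
the messages sitting in b's queue, and with auto_delete=True and nothing
running the messages are dropped. queue_level either processes
everything or keeps everything queued until an instance comes back.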
> I agree with all this, but it doesn't seem to move me towards a practical
> solution.
>
> For example, suppose I want to interact with resource r1. Do I need to
> maintain and consult some database table that tells me which queue that
> resource is currently available on? That puts the work of maintaining and
> deciding where to route messages to onto my code - which seems wrong
> because that's precisely what you use something like AMQP for.
>
> We have also talked about something like your approach. In it, we would
> keep a table that indicates which boxes (i.e., queues, seeing as this would
> be a 1-queue-per-box setup) hold which resources. So if you have a resource
> on box 1 which creates and listens on queue 1, and you replicate it onto
> box 2 (listening on queue 2), then you also tell another box or process
> that r1 is now also available on q2. To obtain service for r1 a client
> would then ask which queues r1 was on, get an answer of (q1, q2) and choose
> one at random.
>
> That would work. BUT..... why should we do that work? Isn't that exactly
> the kind of work a messaging hub is explicitly designed to do?
>
> Maybe there's something in all this about blocking that I don't understand.
>
If you want to change the topology of your services, you have to do
something. It won't come for free. It doesn't matter whether it's done
at your app's level or by instructing the OpenAMQ broker to do so, or
whether the task is automated or done by hand. You still have to do
*something*.
Now, the question is how to make this *something* as minimal as possible.
My solution would be to associate each resource with a queue name. This
can be done via a simple two-column DB table (resource name -> queue name).
By default the queue name is "common", i.e. there is only a single queue
called "common" for all non-replicated resources.
When you replicate a resource, you change its queue name to equal the
resource name, i.e. there will be a separate queue for each replicated
resource.
So, when starting a service you'll do something like this:
service_start (string resource_name)
{
    queue_name = sql ("SELECT queue_name FROM resources "
                      "WHERE resource_name = %s", resource_name);
    queue_declare (queue = queue_name);
    queue_bind (exchange = amq.direct, queue = queue_name,
                routing_key = resource_name);
    basic_consume (queue = queue_name);
}
When you are creating a new resource:
INSERT INTO resources (resource_name, queue_name) VALUES (..., 'common')
When replicating a resource:
UPDATE resources SET queue_name = resource_name WHERE resource_name = ...
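As a rough illustration of the table handling, here is a small Python
sketch using sqlite3 with an in-memory database. The table and column
names follow the statements above. The choice of SQLite and the helper
names (open_db, create_resource, replicate_resource, queue_for) are
assumptions made only for this example.

```python
import sqlite3

DEFAULT_QUEUE = "common"  # shared queue for all non-replicated resources


def open_db():
    """Create an in-memory database holding the two-column mapping table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE resources ("
        "resource_name TEXT PRIMARY KEY, queue_name TEXT NOT NULL)"
    )
    return db


def create_resource(db, resource_name):
    """A new resource starts out on the shared 'common' queue."""
    db.execute(
        "INSERT INTO resources (resource_name, queue_name) VALUES (?, ?)",
        (resource_name, DEFAULT_QUEUE),
    )


def replicate_resource(db, resource_name):
    """A replicated resource gets a queue named after itself."""
    db.execute(
        "UPDATE resources SET queue_name = resource_name "
        "WHERE resource_name = ?",
        (resource_name,),
    )


def queue_for(db, resource_name):
    """Look up the queue a service instance should declare and consume."""
    row = db.execute(
        "SELECT queue_name FROM resources WHERE resource_name = ?",
        (resource_name,),
    ).fetchone()
    if row is None:
        raise KeyError(resource_name)
    return row[0]
```

A service for r1 would call queue_for (db, "r1") at startup and use the
result as the queue name. Before replication that is "common", and after
replicate_resource (db, "r1") it is "r1", so every replica of r1
consumes from the same queue.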
Martin
> Martin> Ok, the above may not be exactly your case, but my point is that
> Martin> you should take the nature and performance profile of the resources
> Martin> into account rather than stating that 'we have 1,000,000 resources
> Martin> and we want the broker to handle them neatly, not depending on
> Martin> whether there's one message a day or 100,000 messages a second.'
>
> We do take this into account. Resources that are not being used can be
> taken offline. Resources that are heavily used can be replicated. Resources
> that are lightly used can all live on the same box. Resources that are big
> can live on their own box. Etc. All that can be automated. The question
> is: when we have a resource r1 replicated on several boxes, can we use AMQP
> to round-robin messages to those boxes?
>
> It looks like the answer is no; that we should be doing that work ourselves
> at the other end of the queue. I just don't understand why that's a better
> solution.
>
> Thanks again for your patience!
>
> Terry
>