No subject


Mon Feb 11 23:40:58 CET 2008


work so that we don't have to do it.

Suppose you're building a service that provides access to a (potentially
very large) number of resources, r1, r2, r3, etc. You want to distribute
the work of serving these resources over a bunch of boxes. In the simplest
case, each resource would be hosted on exactly one box, and each box could
host multiple resources. Then you'd just need to partition resources over
boxes.
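
To make the partitioning step concrete, here is a minimal sketch in Python. The box names and the choice of a stable hash are assumptions made for illustration only; any deterministic mapping from resource to box would do.

```python
import hashlib

def box_for(resource, boxes):
    """Map a resource name to exactly one box using a stable hash."""
    digest = hashlib.sha256(resource.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(boxes)
    return boxes[index]

# Hypothetical box names, for illustration only.
boxes = ["box-a", "box-b", "box-c"]
assignment = {r: box_for(r, boxes) for r in ["r1", "r2", "r3", "r4"]}
```

Each resource lands on exactly one box, and a box may end up hosting several resources.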

In one implementation approach (I-1), you could have a queue per resource,
and when something wanted to access that resource it could directly route
a message to the queue for that resource. But if you have 1M resources,
that's 1M queues.  At some point that approach won't scale.

Thus you might decide (I-2) to have one queue per box. Then you can set up
a headers exchange and route messages to the right queue based on a
resource header in messages.  That puts the work of deciding which queue to
route messages to onto the AMQP broker. That's of course a good thing,
providing the broker is designed to handle that sort of setup and load.
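
As a rough illustration of (I-2), the following toy model in Python mimics one queue per box with bindings on a "resource" header. It is a sketch of the idea only, not the OpenAMQ API; the class, method, and queue names are all made up.

```python
from collections import defaultdict

class ToyHeadersExchange:
    """Toy model of a headers exchange; not the OpenAMQ API."""

    def __init__(self):
        self.bindings = []               # (queue name, required headers)
        self.queues = defaultdict(list)  # queue name -> delivered bodies

    def bind(self, queue, **required):
        self.bindings.append((queue, required))

    def publish(self, headers, body):
        # Collect every queue with at least one fully matching binding.
        matched = []
        for queue, required in self.bindings:
            hit = all(headers.get(k) == v for k, v in required.items())
            if hit and queue not in matched:
                matched.append(queue)
        for queue in matched:
            self.queues[queue].append(body)
        return matched

# One queue per box; each box binds once per resource it hosts.
exchange = ToyHeadersExchange()
exchange.bind("box1", resource="r1")
exchange.bind("box1", resource="r2")
exchange.bind("box2", resource="r3")
delivered_to = exchange.publish({"resource": "r2"}, "GET r2")
```

With 1M resources this still means 1M bindings, but only as many queues as there are boxes.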

That's all well and good.

So you build and run your system. After a while you notice that resource r1
is extremely heavily used. At that point you'd like to replicate the r1
resource by making another box available that can also handle r1.

The ordinary thing to do here, I think, would be to set up >1 boxes on the
end of the existing queue for r1 and distribute the incoming requests to
these boxes (*). But it's work to set that up, and it makes your
architecture non-uniform because now on the end of some queues you have
something directly serving resources (e.g., r2), and on the end of others
you have something doing further request distribution (for r1).

A nicer thing to do would be to simply have another box in your original
setup (I-2) also handle r1.  Now you've got two boxes that can equally
handle r1 requests (bear in mind that each of those boxes is also handling
other resources).  So the AMQP broker could take on the work of round-robin
sending r1 requests to a machine serving that resource. That would be a
good thing because the AMQP broker would be doing the message distribution
work for you, and your architecture would remain simpler / uniform.


As I understand it, an AMQP exchange looks at the properties of an incoming
message and at its own binding tables and decides which queues to forward
copies of the message to. In our (limited) experience, it looks like the
OpenAMQ broker (presumably as specified or implied by the AMQP spec)
unconditionally sends
the message to all queues whose bindings match the incoming message.
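
The behaviour we think we are seeing can be written down in a few lines of Python. This models our understanding only, with made-up queue names; it is not taken from the broker's code.

```python
def route_to_all(bindings, headers):
    """Return every queue whose binding arguments all match the headers."""
    return [queue for queue, required in bindings
            if all(headers.get(k) == v for k, v in required.items())]

# Two boxes both bind for r1, so an r1 message is copied to both queues.
bindings = [
    ("box1", {"resource": "r1"}),
    ("box1", {"resource": "r2"}),
    ("box2", {"resource": "r1"}),
]
targets = route_to_all(bindings, {"resource": "r1"})
# targets == ["box1", "box2"]
```

So with two boxes bound for r1, every r1 request would be served twice rather than shared between them.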

Is that right?


Esteve is asking if it's somehow possible to have the broker use, for
example, round-robin to select just one of the matching queues to send to.

This seems like a useful thing to be able to do. It also seems like it
shouldn't be hard to implement.

Yes, it breaks (by extending) the AMQP spec, but that's why we have
standards :-) Seriously, seeing as this simply provides more functionality
and is backwards compatible, it could be added to the spec.

So Esteve proposes that on creating an exchange you could optionally tell
it to do queue round robin on matching messages. To make this even more
useful, clients creating queues could also indicate their weight and thus
receive more or fewer messages according to their processing abilities.
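
To show what the proposal might look like, here is a toy sketch in Python of an exchange that picks one matching queue per message, with optional weights. Nothing like this exists in OpenAMQ or the AMQP spec as far as we know; the class, its methods, and the weight parameter are all hypothetical, and the weighting is deliberately naive (a queue with weight 2 simply gets two consecutive turns).

```python
import itertools

class ToyRoundRobinExchange:
    """Hypothetical exchange delivering each message to ONE matching queue."""

    def __init__(self):
        self.bindings = []  # (queue name, required headers, weight)
        self.cycles = {}    # message headers -> repeating schedule of queues

    def bind(self, queue, required, weight=1):
        self.bindings.append((queue, required, weight))
        self.cycles.clear()  # bindings changed, so rebuild schedules lazily

    def pick(self, headers):
        key = tuple(sorted(headers.items()))
        if key not in self.cycles:
            schedule = []
            for queue, required, weight in self.bindings:
                if all(headers.get(k) == v for k, v in required.items()):
                    schedule.extend([queue] * weight)
            self.cycles[key] = itertools.cycle(schedule) if schedule else None
        cycle = self.cycles[key]
        return next(cycle) if cycle is not None else None

exchange = ToyRoundRobinExchange()
exchange.bind("box1", {"resource": "r1"}, weight=2)
exchange.bind("box2", {"resource": "r1"}, weight=1)
picks = [exchange.pick({"resource": "r1"}) for _ in range(6)]
# picks == ["box1", "box1", "box2", "box1", "box1", "box2"]
```

A real implementation would probably want to interleave the turns more smoothly, but the bookkeeping is about this small.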

Does that help make things clearer?

None of that seems unusual to me, so I still suspect that we're doing
something fairly common and we're just not looking at it, or describing it,
in terms that make sense to people who've been building these things for
years.

>>>>> "Martin" == Martin Sustrik <sustrik at imatix.com> writes:
Martin> Your use case looks like standard SOA architecture, i.e. client
Martin> sends a request with routing_key=service type=DB table name
Martin> ('book', 'dvd', etc.)  The request is then routed to the
Martin> appropriate queue (1 queue per service) and afterwards the queue
Martin> content is load balanced between service instances (individual
Martin> boxes where the table clones reside).

This is (*) above, if I'm reading you right. You suggest that we do work
"afterwards", but that work mainly involves message routing. We're simply
asking if the AMQP broker could be doing that work. Message routing like
that is its raison d'être, after all.
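
For comparison, the arrangement in (*) amounts to something like the following Python sketch: one shared queue for r1, whose contents are handed out to the boxes in turn. The strict round-robin hand-out is an assumption made for the example, and the request and box names are invented.

```python
from collections import deque

def drain(shared_queue, consumers):
    """Hand each queued request to the next consumer in turn."""
    handled = {name: [] for name in consumers}
    turn = 0
    while shared_queue:
        request = shared_queue.popleft()
        name = consumers[turn % len(consumers)]
        handled[name].append(request)
        turn += 1
    return handled

# A single queue for r1, served by two boxes.
r1_queue = deque(["req-1", "req-2", "req-3", "req-4"])
handled = drain(r1_queue, ["box1", "box2"])
# handled == {"box1": ["req-1", "req-3"], "box2": ["req-2", "req-4"]}
```

This works, but it needs a dedicated queue for r1 alongside the per-box queues, which is the non-uniformity we were hoping to avoid.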

Martin> How can you manage that amount of DB tables? I just cannot make
Martin> myself believe that they are really "independent". How big is the
Martin> team that administers the distributed database of that size? It
Martin> seems strange.  Maybe the categories form a tree-like hierarchy?
Martin> Maybe they are cartesian product of two criteria with 1,000
Martin> possible values for each of them? Etc.

It's not a hierarchy or a Cartesian product etc. You also don't need a big
team to build something supporting millions of resources - you can just use
Amazon EC2 :-)

Anyway, the question is independent of the size. It's whether this sort of
approach can work at all. Can an AMQP broker handle the round-robin
scheduling above, or do we need to: a) handle it ourselves on the end of a
queue, or b) add the round-robin approach to OpenAMQ?  And if b, what are
the chances that the wider community (OpenAMQ developers, AMQP designers)
would find this worthwhile supporting so that we didn't have to maintain a
patched broker (which is work, and which may restrict us to just OpenAMQ as
a broker)?

I hope that's a bit clearer. Sorry for so many words!

Thanks for your time and patience,
Terry

