Message ReSequencing in a Distributed
Publisher-Subscriber System
In a typical cloud based distributed
environment with many message publishers and subscribers, the message could be
processed by any of the subscriber and this is usually not predictable. In
certain scenarios, it could be necessary to have a group of messages processed
in sequence though they reached the subscribers randomly.
One solution to this is described
below. The solutions tries to satisfy the following requirements :
1.) Make sure there is no tie-up / hard-link
between subscribers and publishers: Any message can be received by any
subscriber and any publisher can push any message.
2.) When the messages are not
grouped, the system continues to behave as usual / before.
3.) When group messages are detected
at subscriber, change in overall processing time should be minimal.
4.) When group message are detected
at subscriber, no blocking operation should be performed and the subscriber
should continue to be available for receiving and processing other messages.
Each group message is expected to
contain the following items in addition to the message pay load itself :
a.)
Group Message ID.
b.)
Total Number of Messages in group
c.)
Group Message Sequence Number.
e.g.:-
a.)
GROUP_1
b.)
4 //GROUP_1 contains 4 messages in total
c.)
2 //this means that this message is 2nd in the group GROUP_1.
This solution employs subscribers
working in a distributed cooperative manner. As a group message is received by
a subscriber, it queries the distributed hash table to check if any other
subscriber is working on the same group number. If yes, the message received is
pushed onto that subscriber. (A push endpoint is expected to be available for
each subscriber. This list of end points too could be maintained in the
distributed hash table indicated earlier.) If there is no entry in the
distributed hash table for this group, the subscriber adds itself into the
distributed hash linking the message group id with itself.
GroupHash[GROUP_1] = SUBSCRIBER_ID
Additionally, the message received is
added into a local data structure/bag of the receiving subscriber.
When a new message is received in the
group_message_queue, the following steps are executed by the watcher@subscriber
owning the bag.
a.)
check if all messages for the group has been received.
If all messages have been received,
the messages are sorted based on the Group_Message_Sequence_Number and
processed one after the other or as the logic demands for the group.
Distributed hash entry is cleared for this group GroupHash[GROUP_1] =
"" once the processing of message group is complete.
The data structure/bag maintained by
the subscriber would be typically filled in the following scenarios :
a.)
message pushed from another subscriber.
b.)
message pushed by the local listener since no other subscriber is working on
this group.
Partitioning within groups can be
employed if required by employing the same strategy for sub-groups. In this
case, it could be required by subscriber to wait for the sub-group messages to
be processed before the proceeding with the group messages.
The system can be enhanced such that
the subscribers internally check the load of active group message subscriber
(as pointed by GroupHash) before pushing the message. This way, the subscriber
that received the message can take over the ownership of group message
voluntarily; especially if the message received was the last of the group
message expected, requiring a message process flow.