Saturday, 25 May 2013

Message ReSequencing in a Distributed Publisher-Subscriber System

In a typical cloud-based distributed environment with many message publishers and subscribers, a message could be processed by any of the subscribers, and which one is usually not predictable. In certain scenarios, it may be necessary to have a group of messages processed in sequence even though they reached the subscribers in random order.
One solution to this is described below. The solution tries to satisfy the following requirements:
1.) Make sure there is no tie-up / hard link between subscribers and publishers: any message can be received by any subscriber and any publisher can push any message.
2.) When messages are not grouped, the system continues to behave as before.
3.) When group messages are detected at a subscriber, the change in overall processing time should be minimal.
4.) When group messages are detected at a subscriber, no blocking operation should be performed and the subscriber should remain available for receiving and processing other messages.
Each group message is expected to contain the following items in addition to the message payload itself:
            a.) Group Message ID.
            b.) Total Number of Messages in group
            c.) Group Message Sequence Number.
e.g.:-
            a.) GROUP_1
            b.) 4 //GROUP_1 contains 4 messages in total
            c.) 2 //this means that this message is 2nd in the group GROUP_1.
This solution employs subscribers working in a distributed, cooperative manner. When a group message is received by a subscriber, it queries a distributed hash table to check whether any other subscriber is already working on the same group ID. If so, the received message is pushed on to that subscriber. (A push endpoint is expected to be available for each subscriber. The list of endpoints could also be maintained in the same distributed hash table.) If there is no entry in the distributed hash table for this group, the subscriber adds itself to the table, linking the message group ID with itself.
GroupHash[GROUP_1] = SUBSCRIBER_ID
Additionally, the message received is added into a local data structure/bag of the receiving subscriber.
When a new message is received in the group_message_queue, the following steps are executed by the watcher on the subscriber that owns the bag.
            a.) Check whether all messages for the group have been received.
If all messages have been received, they are sorted by Group_Message_Sequence_Number and processed one after the other, or as the logic demands for the group. The distributed hash entry for this group is cleared (GroupHash[GROUP_1] = "") once processing of the message group is complete.
The data structure/bag maintained by the subscriber is typically filled in the following scenarios:
            a.) message pushed from another subscriber.
            b.) message pushed by the local listener since no other subscriber is working on this group.
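The flow above could be sketched roughly as follows. This is only a minimal single-process illustration, not a definitive implementation: the names GroupMessage, Subscriber, receive and push are made up for this example, a plain dict stands in for the distributed hash table, and direct method calls stand in for the push endpoints. In a real deployment the check-and-add on GroupHash would need to be atomic, otherwise two subscribers could both claim the same group.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class GroupMessage:
    group_id: str  # a.) Group Message ID
    total: int     # b.) total number of messages in the group
    seq: int       # c.) sequence number of this message within the group
    payload: str


@dataclass
class Subscriber:
    subscriber_id: str
    group_hash: Dict[str, str]             # stand-in for the distributed hash table
    peers: Dict[str, "Subscriber"]         # stand-in for the push endpoints
    process: Callable[[GroupMessage], None]
    bag: Dict[str, List[GroupMessage]] = field(default_factory=dict)

    def receive(self, msg: GroupMessage) -> None:
        """Called by the local listener when a group message arrives."""
        # Claim the group if nobody owns it yet (assumed atomic in a real DHT).
        owner = self.group_hash.setdefault(msg.group_id, self.subscriber_id)
        if owner != self.subscriber_id:
            self.peers[owner].push(msg)    # another subscriber owns this group
        else:
            self.push(msg)

    def push(self, msg: GroupMessage) -> None:
        """Add a message to the local bag; process the group once complete."""
        pending = self.bag.setdefault(msg.group_id, [])
        pending.append(msg)
        if len(pending) == msg.total:
            for m in sorted(pending, key=lambda m: m.seq):
                self.process(m)
            del self.bag[msg.group_id]
            # The post sets the entry to ""; deleting the key is this sketch's equivalent.
            del self.group_hash[msg.group_id]
```

Nothing here blocks: a subscriber that receives an incomplete group simply stores or forwards the message and returns, so it stays available for other messages.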
Partitioning within groups can be achieved, if required, by applying the same strategy to sub-groups. In this case, the subscriber may have to wait for the sub-group messages to be processed before proceeding with the group messages.
The system can be enhanced so that subscribers check the load of the active group message subscriber (as pointed to by GroupHash) before pushing the message. This way, the subscriber that received the message can voluntarily take over ownership of the group; especially if the message received was the last one expected for the group, which would trigger the processing flow.



Thursday, 15 November 2012

Tyco Security Products - Bangalore Openings

The Tyco Security Products Bangalore team is looking for a lead engineer and a build engineer.
Find details here : http://sdrv.ms/SBx1g1

Monday, 11 June 2012

Assembly Line Programming - a practice for better code.



With the availability of skilled programmers on the decline, software development firms need to look at alternative approaches to releasing quality software. One possible approach is multi-level programming / assembly-line programming (a term I just coined).

Engineers could be placed in a particular level/rank based on their expertise, with any software development task picked up first by developers at the lowest level. S/he completes as much as possible based on their expertise and knowledge. The completed piece of software at each level is pushed to the next level for further work.

Key points:

1.) Completion at each level - after each level, the code is expected to be 100% functionally complete.

2.) More than just cleaning - it's about tweaking, making the code artsy/beautiful at each level.

3.) It's more than just a code review. It's about the next level of programmers picking up the code as their starting point and changing it (including heavy refactoring) to make it better, perfect and world class.

4.) Each level of programmer is expected to take ownership of the code concerned. When it's at your level, you own it.

To give programmers further down the levels a chance to learn, difference reports can be emailed to them after each check-in. No action is expected from the recipient developer at this stage - this is just an FYI; passionate developers can learn from it.

Upper-level developers shouldn't look at the piece of work as a cleaning/reviewing process, but as a complete development process. They could treat the inputs they receive as partially filled-in templates. Think of skeletons with minimal flesh - you have the flexibility to tone that piece of art based on your skill.

As you apply the same principle at each level, the code is expected to get cleaner and nearer perfection. Additionally, this makes sure that the very senior/experienced geek is not bothered with the very basic nitty gritty of things.

Three levels of programmers is a good start for any organization. It's quite easy to classify them based on experience at any software development house.

Pair programming can be a pain in situations where the wavelengths of the two developers don't match - it can get destructive for the experienced developer. With assembly-line programming, the experienced developer is on his/her own during the development process.

Theoretically, the existing unit test cases should not change, as the functionality itself is not expected to change. New unit tests would definitely be added as the code is polished up the levels. Having said that, there could be instances where the skeletons/contracts get changed at upper levels.

Cost-Benefit - Each organization/project is on its own to perform a cost-benefit analysis and tweak the number of levels as desired.

Thursday, 29 March 2012

Interpreting software capabilities - smartly ?

What level of /intelligence/ does a piece of software need so that it can interpret the capabilities / functions of another piece of software?

a. Given the codebase of a software component, can it parse it and deduce what that software is trying to achieve? (could be called: white-box analysis)

b. Can it watch the way this software component behaves in different situations (inputs) and then deduce its behavior? (black-box analysis). How much time before it has /understood/ 80% of the behaviour? How could we quantify 80% when we do not know what 100% is?

b.1 Would the observer system watch discreetly, or would there be an agreement wherein the observer provides a set of inputs for the observee to respond to?

(I think we are now in the realm of machine learning)

c. Most importantly, can it mimic the behaviors/capabilities that it learned?

d. Interestingly enough, can the same observer software learn and mimic itself?

A possible research area.

Friday, 23 March 2012

SessionFlow - seamless flow of sessions & context across devices


 
Typical pain point – you are chatting / composing a message / browsing on your mobile and, now that you have reached home, you want to switch to your iPad or another favorite device.

 
 
Solution – Track changes in mobile device orientation (gyro) plus the GPS location. If one device is tilted clockwise relative to another device just below it (think of water flowing from a jug into a cup), move the session state from the top device to the bottom one. The applications could be numerous - any application context/state that you need to push to another device could be moved using SessionFlow. For devices without a gyro/GPS, allow session push over shared WiFi/Bluetooth etc.
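As a rough illustration of the "jug to cup" check, here is a minimal sketch. Everything in it is an assumption made for the example: the DeviceState fields, the roll convention, and the threshold values are hypothetical, and a real implementation would have to cope with noisy gyro and GPS readings. The altitude bound is one crude way to avoid a session flowing to a device on another floor.

```python
import math
from dataclasses import dataclass


@dataclass
class DeviceState:
    lat: float         # degrees
    lon: float         # degrees
    altitude_m: float  # metres
    roll_deg: float    # clockwise tilt from the gyro (hypothetical convention)


def horizontal_distance_m(a: DeviceState, b: DeviceState) -> float:
    """Great-circle distance between two devices (haversine formula)."""
    earth_radius_m = 6371000.0
    phi_a, phi_b = math.radians(a.lat), math.radians(b.lat)
    d_phi = phi_b - phi_a
    d_lambda = math.radians(b.lon - a.lon)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi_a) * math.cos(phi_b) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(h))


def is_session_flow_request(source: DeviceState, target: DeviceState,
                            min_tilt_deg: float = 60.0,
                            max_horizontal_m: float = 1.0,
                            max_drop_m: float = 1.0) -> bool:
    """True if the source is tilted enough and the target sits just below it."""
    drop_m = source.altitude_m - target.altitude_m
    return (source.roll_deg >= min_tilt_deg
            and 0 < drop_m <= max_drop_m
            and horizontal_distance_m(source, target) <= max_horizontal_m)
```

With these made-up thresholds, a phone tilted 75 degrees half a metre above a tablet would qualify, while a device three metres lower would not.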

Stuff required to be implemented:

1.) Client libraries on multiple devices that:
  • Can interpret changes in orientation and location, and identify whether it’s a SessionFlow request.
  • Push state, session, context etc. to the SessionFlow servers.
  • Start the target app, apply the received state/session/context to the target device application etc., and finally report the result back.
  • Continually push location (lat/long) to the server.
2.) Services that:
  • Allow third-party servers to register (see point 3).
  • Push SessionFlow requests to third-party servers.
  • Interpret and identify another device below the mentioned device.
  • Can trigger target devices once a SessionFlow request is received.
3.) SessionFlow standards/protocols for client apps and servers to consume:
  • An entire deck of protocols to be created for communication across systems.
  • Define communication sequences and states.
  • Enable target applications and their service applications to use the SessionFlow services and libraries.
Business Angle:
  • Vendors would subscribe to SessionFlow servers on the cloud to enable SessionFlow.
  • They could use the client libraries to enable SessionFlow in their device applications, or just use the standard we provide.
  • Subscription to be based on the number of session moves.
  • Theoretically this can be enabled for anything – including copying files.
Challenges:
  • Exact interpretation of location, altitude and tilt can get tricky – we don’t want the session to flow to your neighbor one floor below.
  • Deducing whether the target device is just below can be complex unless some smart indexing is maintained at the server.
  • Expect very high load on the servers due to continuous position tracking. Additionally, think of non-server-based solutions where two devices just move the session using WiFi/Bluetooth etc.
 

Tuesday, 6 March 2012

Copying Images & Hyperlinks from a docx to another.



Copying the textual data from an OpenXML-based docx file was pretty OK - you just had to copy all the child nodes of the //body node from the source to the destination.

Conceptually this would be along the lines of:

1.) Load the source document's MainDocumentPart.Document into an XmlDocument.
2.) Load the destination document's MainDocumentPart.Document into another XmlDocument.
3.) Locate the //body child node in each XmlDocument.
4.) Copy all child nodes from the source //body node to the destination document's //body node. You might have trouble copying directly, since a node belongs to the XmlDocument that created it; hence use ImportNode and then do an add.
5.) Once done for all child nodes, save the XmlDocument back to the MainDocumentPart.Document.
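The same idea can be sketched outside .NET. The snippet below is only an illustration, using Python's standard xml.etree.ElementTree on the raw document XML rather than the OpenXML SDK. The function name append_body is made up for this example, and copy.deepcopy plays the role that ImportNode plays for XmlDocument. One extra assumption beyond the steps above: the destination's trailing sectPr (section properties) element is kept as the last child of the body, and the source's own sectPr is skipped.

```python
import copy
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the familiar "w:" prefix on output


def append_body(source_xml: str, dest_xml: str) -> str:
    """Copy the children of the source w:body into the destination w:body."""
    body_tag = f"{{{W_NS}}}body"
    sect_tag = f"{{{W_NS}}}sectPr"

    src_body = ET.fromstring(source_xml).find(body_tag)
    dest_root = ET.fromstring(dest_xml)
    dest_body = dest_root.find(body_tag)

    # Insert before the destination's sectPr, if it has one, so it stays last.
    insert_at = len(dest_body)
    if insert_at and dest_body[-1].tag == sect_tag:
        insert_at -= 1

    for child in src_body:
        if child.tag == sect_tag:
            continue  # skip the source's section properties
        dest_body.insert(insert_at, copy.deepcopy(child))
        insert_at += 1

    return ET.tostring(dest_root, encoding="unicode")
```

As with the XmlDocument approach, this only moves the markup; anything the copied nodes reference by relationship id still has to exist in the destination package.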

All seems fine, except that images/hyperlinks wouldn't appear in the destination document when opened. The following has to be done additionally to get this working:

1.) For each image part in the source, add a new image part to the destination. This makes sure the final document has the following entries added (rename the docx to zip and check):
a.) The word\media folder has the images as separate files.
b.) The relationships XML under word\_rels has an entry for each image, with its target pointing to the appropriate file in word\media.
c.) The content types XML in the root has an entry for this specific file type, say jpg.

Once you have the above three entries appearing right, your image should appear OK in the final document. Note that once the AddPart<ImagePart>() is done, you have to Close() the Package explicitly. This makes sure the above entries (relationship and content type entries) get saved correctly into the target document. This is a critical step. Just saving the MainDocumentPart.Document to a FileStream is not going to help.

2.) Similar to images, for each hyperlink in the source, add a new hyperlink relationship. This too would cause the relationship and content type entries in the destination to be created correctly.

Points of interest

If you are copying from multiple source documents, you have to make sure that, while copying the child nodes, the relation-id of the content concerned (say image/blip/hyperlink) is temporarily updated to a unique id so that it does not conflict with the same relation-id in a different source file.

Additionally, when adding the related image part/hyperlink, make sure that its relation-id is the same as the new id that you created. If everything goes well, when you save the Package, the OpenXML SDK renames all relation-ids to be sequential and also updates all references in the document content with the sequential ids it generated.
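The temporary re-keying could look something like the sketch below. It is a hedged illustration in Python's standard library, not the OpenXML SDK: remap_relationship_ids and the source_tag prefix scheme are made up for this example, and it assumes the relation-ids of interest sit in the r:embed, r:link and r:id attributes. The returned mapping tells you which new id to use when adding each image part or hyperlink relationship to the destination.

```python
import xml.etree.ElementTree as ET

R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
# Attributes assumed to carry relation-ids (e.g. r:embed on a blip, r:id on a hyperlink).
REL_ATTRS = [f"{{{R_NS}}}{name}" for name in ("embed", "link", "id")]


def remap_relationship_ids(body: ET.Element, source_tag: str) -> dict:
    """Rewrite relation-ids in place so they are unique per source document.

    Returns a dict mapping each old id to the new id that replaced it.
    """
    mapping = {}
    for element in body.iter():
        for attr in REL_ATTRS:
            old_id = element.get(attr)
            if old_id is None:
                continue
            # Reuse the same new id every time the same old id shows up.
            new_id = mapping.setdefault(old_id, f"{source_tag}_{old_id}")
            element.set(attr, new_id)
    return mapping
```

Running this once per source document, with a different source_tag each time, means two sources that both use rId1 end up with distinct ids before their nodes are merged.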

Good.

Thursday, 2 February 2012

Humans and Human body as the final Architectural reference point.

It's an interesting perspective to consider humans - their means of interacting with others (other humans, animals, machines), their growth, evolution and anatomy, among other things - as a software architecture reference point.

Given a scenario of high-load data transactions coming in, how would a typical single human handle it? How would many humans handle it? When multiple humans interact to achieve a goal, how do they interact and how do they establish trust (also read tree-of-trust)?

Would the human concerned work on multiple tasks in a round-robin fashion? Would the human sub-contract the work, or would s/he just say s/he can't do it? What happens as the human matures - is s/he in better shape to work on this task given the maturity and experience, and how? How do we mimic experience - is it always about machine learning?

Can we apply human interaction and behaviours in a crowd of strangers, and the resulting possible formation of a team, to software components that could discover each other, trust each other and perform as a single component, all dynamically and without any system/person intervention? Would the software need "intelligence" to achieve this?

Can we apply principles of healing (physical, psychological) to software that reports issues with its health - more than bug fixing, can we heal/fix/alter the behaviours of software components based on their interaction with other software components over the years? Can we apply mentoring and counselling theories to software design?

On a related note, the human body itself provides ample opportunity as a reference point for a software component. There are enough structures already present in the human body that can be mimicked in a single software component.

We could treat the electric impulses within nerves as messages in an integration project or as packets in a TCP/IP transaction, as the case may be. Can we include auto-heal modules in software components, similar to the way the skin auto-heals (kinda) in case of a bruise/burn? Can we patch systems with antibody software components to help the module recover better and sooner? Can we adapt the techniques of 'sense' such as touch, sight etc. so that the software component can adapt to the environment it lives in? In a queue-based integration, this could be about sensing the network load and message load, and perhaps moving itself to a different machine.

Do we also worry about mutations of software? Is reincarnation a versioning mechanism or is it about rewriting completely?

Is the spine analogous to the ESB? Is the food breakdown procedure and waste disposal a pipeline pattern, with the unused bytes of data left after a filter being the software component's waste?

Is there a software design problem that this reference model cannot assist with?

Related Tweet on applying stuff back to the 'normal' world