Saturday 25 May 2013

Message ReSequencing in a Distributed Publisher-Subscriber System


In a typical cloud-based distributed environment with many message publishers and subscribers, a message can be processed by any subscriber, and which one is usually not predictable. In certain scenarios, a group of messages must be processed in sequence even though they reach the subscribers in random order.
One solution to this is described below. It tries to satisfy the following requirements:
1.) Make sure there is no tie-up/hard link between subscribers and publishers: any message can be received by any subscriber, and any publisher can push any message.
2.) When messages are not grouped, the system continues to behave as usual/as before.
3.) When group messages are detected at a subscriber, the change in overall processing time should be minimal.
4.) When group messages are detected at a subscriber, no blocking operation should be performed, and the subscriber should remain available to receive and process other messages.
Each group message is expected to contain the following items in addition to the message payload itself:
            a.) Group Message ID.
            b.) Total Number of Messages in group
            c.) Group Message Sequence Number.
e.g.:-
            a.) GROUP_1
            b.) 4 //GROUP_1 contains 4 messages in total
            c.) 2 //this means that this message is 2nd in the group GROUP_1.
This solution employs subscribers working in a distributed, cooperative manner. As a group message is received by a subscriber, it queries the distributed hash table to check whether any other subscriber is already working on the same group. If yes, the received message is pushed to that subscriber. (A push endpoint is expected to be available for each subscriber; this list of endpoints, too, could be maintained in the distributed hash table indicated earlier.) If there is no entry in the distributed hash table for this group, the subscriber adds itself to the distributed hash, linking the message group ID to itself:
GroupHash[GROUP_1] = SUBSCRIBER_ID
Additionally, the received message is added into a local data structure/bag of the receiving subscriber.
When a new message is received in the group_message_queue, the following steps are executed by the watcher@subscriber owning the bag:
            a.) check whether all messages for the group have been received.
If all messages have been received, they are sorted by Group_Message_Sequence_Number and processed one after the other, or as the logic demands for the group. Once processing of the message group is complete, the distributed hash entry for the group is cleared: GroupHash[GROUP_1] = "".
The data structure/bag maintained by the subscriber would typically be filled in the following scenarios:
            a.) a message pushed from another subscriber.
            b.) a message pushed by the local listener, since no other subscriber is working on this group.
A minimal sketch of this subscriber-side flow is shown below.
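In the sketch, the IDistributedHash and IPeerPush interfaces and the process callback are hypothetical seams - the post does not prescribe a concrete DHT client or push transport, so treat those names as placeholders.

using System;
using System.Collections.Generic;
using System.Linq;

//hypothetical seam: a distributed hash table client with an atomic add
public interface IDistributedHash
{
    bool TryAdd(string key, string value); //false if the key is already present
    string Get(string key);
    void Remove(string key);
}

//hypothetical seam: the push endpoints of the other subscribers
public interface IPeerPush
{
    void Push(string subscriberId, GroupMessage message);
}

public class GroupMessage
{
    public string GroupId;      //e.g. "GROUP_1"
    public int TotalInGroup;    //e.g. 4
    public int SequenceNumber;  //e.g. 2 => 2nd message of GROUP_1
    public byte[] Payload;
}

public class GroupAwareSubscriber
{
    private readonly string subscriberId;
    private readonly IDistributedHash groupHash;
    private readonly IPeerPush peers;
    private readonly Action<GroupMessage> process;
    private readonly Dictionary<string, List<GroupMessage>> bags =
        new Dictionary<string, List<GroupMessage>>();

    public GroupAwareSubscriber(string subscriberId, IDistributedHash groupHash,
                                IPeerPush peers, Action<GroupMessage> process)
    {
        this.subscriberId = subscriberId;
        this.groupHash = groupHash;
        this.peers = peers;
        this.process = process;
    }

    //called by the local listener for every group message received
    public void OnGroupMessage(GroupMessage message)
    {
        //GroupHash[GROUP_1] = SUBSCRIBER_ID, but only if nobody owns the group yet
        if (!groupHash.TryAdd(message.GroupId, subscriberId))
        {
            string owner = groupHash.Get(message.GroupId);
            if (owner != subscriberId)
            {
                peers.Push(owner, message); //another subscriber owns this group
                return;
            }
        }
        AddToBag(message); //this subscriber owns the group: collect locally
    }

    //the watcher@subscriber step: collect, then process once the group is complete
    private void AddToBag(GroupMessage message)
    {
        List<GroupMessage> bag;
        if (!bags.TryGetValue(message.GroupId, out bag))
            bags[message.GroupId] = bag = new List<GroupMessage>();
        bag.Add(message);

        if (bag.Count == message.TotalInGroup) //all messages for the group received?
        {
            foreach (var m in bag.OrderBy(x => x.SequenceNumber))
                process(m); //process in sequence order

            bags.Remove(message.GroupId);
            groupHash.Remove(message.GroupId); //clear the GroupHash entry
        }
    }
}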
Partitioning within groups can be employed, if required, by applying the same strategy to sub-groups. In this case, the subscriber could be required to wait for the sub-group messages to be processed before proceeding with the group messages.
The system can be enhanced so that subscribers internally check the load of the active group-message subscriber (as pointed to by GroupHash) before pushing the message. This way, the subscriber that received the message can take over ownership of the group voluntarily - especially if the message received was the last one expected for the group, requiring a message-processing run.



Thursday 15 November 2012

Tyco Security Products - Bangalore Openings

The Tyco Security Products Bangalore team is looking for a lead engineer and a build engineer.
Find details here : http://sdrv.ms/SBx1g1

Monday 11 June 2012

Assembly Line Programming - a practice for better code.



With the availability of skilled programmers on the decline, software development firms need to look at alternative approaches to releasing quality software. A possible approach: multi-level programming, or assembly-line programming (a term coined just now).

Engineers could be placed in a particular level/rank based on their expertise, with any software development task picked up first by the lowest-level developers. They complete as much as possible based on their expertise and knowledge, and the piece of software completed at each level is pushed to the next level for further work.

Key points:

1.) Completion at each level - it should be noted that after each level, the code is expected to be functionally 100% complete.

2.) More than just cleaning - it's about tweaking, making the code artful/beautiful at each level.

3.) It's more than just a code review. It's about the next level of programmers picking up the code as their starting point and changing it (including heavy refactoring) to make it better, closer to perfect, world class.

4.) Each level of programmer is expected to take ownership of the code in question. While it is at your level, you own it.

A streamlined approach that lets programmers further down the levels learn can be implemented with difference reports emailed after each check-in. No action is expected from the recipient developer at this stage - it is just FYI; passionate developers will learn from it.

Upper-level developers shouldn't look at the piece of work as a cleaning/reviewing process, but as a complete development process. They could treat the inputs they receive as partially filled templates. Think of a skeleton with minimal flesh - you have the flexibility to shape that piece of art based on your skill.

As the same principle is applied at each level, the code is expected to get cleaner and nearer perfection. Additionally, this makes sure the very senior/experienced geek is not bothered with the very basic nitty-gritty of things.

Three levels of programmers is a good start for any organization. It is quite easy to classify developers based on experience at any software development house.

Pair programming can be a pain in situations where the wavelengths of the two developers don't match - it can get destructive for the experienced developer. With assembly-line programming, the experienced developer works alone during the development process.

Theoretically, the unit test cases should not change, as the functionality is not expected to change. New unit tests would definitely be added as the code is polished up the levels. Having said that, there could be instances where the skeletons/contracts change at upper levels.

Cost-benefit - each organization/project is on its own to perform a cost-benefit analysis and tweak the number of levels as desired.

Thursday 29 March 2012

Interpreting software capabilities - smartly?

What level of /intelligence/ does a piece of software need such that it can interpret the capabilities/functions of another piece of software?

a. Given the codebase of one software component, can it parse and deduce what the other software is trying to achieve? (could be called: white-box analysis)

b. Can it watch the way the software component behaves in different situations (inputs) and then deduce its behaviour? (black-box analysis). How much time before it has /understood/ 80% of the behaviour? And how could we quantify 80% when we do not know what 100% is?

b.1 Would the observer system watch discreetly, or would there be an agreement wherein the observer provides a set of inputs for the observee to respond to?

(I think we are now in the realms of machine learning)

c. Most importantly, can it mimic the behaviors/capabilities that it learned?

d. Interestingly enough, can the same observer software learn and mimic itself?

A possible research area.

Friday 23 March 2012

SessionFlow - seamless flow of sessions & context across devices


 
Typical pain point – you are chatting/composing a message/browsing on your mobile, and now that you have reached home, you want to switch to your iPad/your favourite other device.

 
 
Solution – track mobile device orientation (gyro) changes plus the GPS location. If one device is tilted clockwise relative to another device just below it (think of water flowing from a jug into a cup), move the session state from the top device to the bottom one. Applications of this are numerous - any application context/state that you need to push to another device could travel over SessionFlow. For devices without a gyro/GPS, allow a session push over shared WiFi/Bluetooth etc. A rough sketch of this hand-off heuristic follows.
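In the sketch below, every name and threshold is a made-up illustration (there is no real SessionFlow library); it only shows the kind of check the servers would run on two recent device snapshots.

using System;

public class DeviceSnapshot
{
    public string DeviceId;
    public double Latitude;       //GPS, degrees
    public double Longitude;      //GPS, degrees
    public double AltitudeMeters;
    public double TiltDegrees;    //gyro; 0 = lying flat
    public DateTime Timestamp;
}

public static class SessionFlowHeuristic
{
    //does the "jug to cup" gesture apply between these two devices?
    public static bool ShouldFlowSession(DeviceSnapshot source, DeviceSnapshot target)
    {
        double drop = source.AltitudeMeters - target.AltitudeMeters;

        bool sameSpot  = DistanceMeters(source, target) < 1.0;  //within arm's reach
        bool justBelow = drop > 0.1 && drop < 1.5;              //below, but not one floor down
        bool pouring   = source.TiltDegrees > 45;               //tilted like a jug
        bool fresh     = (source.Timestamp - target.Timestamp).Duration()
                             < TimeSpan.FromSeconds(2);         //readings taken together

        return sameSpot && justBelow && pouring && fresh;
    }

    //rough equirectangular approximation; adequate over a few metres
    private static double DistanceMeters(DeviceSnapshot a, DeviceSnapshot b)
    {
        const double earthRadiusMeters = 6371000;
        double dLat = (b.Latitude - a.Latitude) * Math.PI / 180;
        double dLon = (b.Longitude - a.Longitude) * Math.PI / 180;
        double x = dLon * Math.Cos(((a.Latitude + b.Latitude) / 2) * Math.PI / 180);
        return earthRadiusMeters * Math.Sqrt(dLat * dLat + x * x);
    }
}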

Stuff required to be implemented:

Client libraries on multiple devices that:
  • can interpret the change in orientation and location, and identify whether it is a SessionFlow request.
  • push state, session, context etc. to the SessionFlow servers.
  • start the target app, apply the received state/session/context on the target device's application, and finally report the update back.
  • continually push location (lat/long) to the server.
Services that:
  • allow third-party servers to register (see point 3).
  • push session-flow requests to third-party servers.
  • interpret and identify another device below the mentioned device.
  • can trigger target devices once a session-flow request is received.
SessionFlow standards/protocols for client apps and servers to consume:
  • an entire deck of protocols to be created for communication across systems.
  • defined communication sequences and states.
  • enablement for target applications and their service applications to use the SessionFlow services and libraries.
Business angle:
  • Vendors would subscribe to SessionFlow servers on the cloud to enable SessionFlow.
  • They could use the client libraries to enable SessionFlow in their device applications, or just use the standard we provide.
  • Subscription would be based on the number of session moves.
  • Theoretically this can be enabled for anything – including copying files.
Challenges:
  •                       Exact interpretation of location, altitude, tilt can get tricky – we don’t want the session to flow to your neighbor one floor below.
  •                      Deducing whether the target device is just below can be complex unless some smart indexing is maintained at the server.
  •                     Expect very high load on the servers due to continuous position tracking. Additionally think of non-server based solutions where two devices just move session using wifi/Bluetooth etc.
 

Tuesday 6 March 2012

Copying Images & Hyperlinks from one docx to another.



Copying the textual data from an OpenXML-based docx file was pretty OK - you just had to copy all the child nodes of the //body node from the source to the destination.

Conceptually, this would be along the lines of:

1.) Load the source document's MainDocumentPart.Document into an XmlDocument.
2.) Load the destination document's MainDocumentPart.Document into another XmlDocument.
3.) Locate the //body child node in each XmlDocument.
4.) Copy every child node from the source //body node to the destination document's //body node. You might have trouble copying directly, hence use an ImportNode first and then do the add.
5.) Once done for all child nodes, save the XmlDocument back to the MainDocumentPart.Document (see the sketch below).
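A sketch of those five steps with the OpenXML SDK (file names are placeholders; the part XML is read and written through the part's stream):

using System.IO;
using System.Xml;
using DocumentFormat.OpenXml.Packaging;

using (var src = WordprocessingDocument.Open("source.docx", false))
using (var dst = WordprocessingDocument.Open("destination.docx", true))
{
    var srcXml = new XmlDocument();
    srcXml.Load(src.MainDocumentPart.GetStream());

    var dstXml = new XmlDocument();
    dstXml.Load(dst.MainDocumentPart.GetStream());

    var ns = new XmlNamespaceManager(dstXml.NameTable);
    ns.AddNamespace("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");

    XmlNode srcBody = srcXml.SelectSingleNode("//w:body", ns);
    XmlNode dstBody = dstXml.SelectSingleNode("//w:body", ns);

    foreach (XmlNode child in srcBody.ChildNodes)
    {
        //appending a node that belongs to another XmlDocument throws; import it first
        XmlNode imported = dstXml.ImportNode(child, true);
        dstBody.AppendChild(imported);
    }

    //save the merged xml back into the destination part
    using (Stream partStream = dst.MainDocumentPart.GetStream(FileMode.Create))
        dstXml.Save(partStream);
}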

All seems fine, except for the fact that images/hyperlinks wouldn't appear in the destination document when opened. The following has to be done additionally to get this working:

1.) For each image part in the source, add a new image part to the destination. This makes sure the final document gets the following entries added (rename the docx to zip and check):
a.) the word\media folder contains the images as separate files;
b.) the relationships XML (word\_rels\document.xml.rels) has an entry for each image, with the target pointing to the appropriate file in word\media;
c.) the content-types XML in the root has an entry for this specific file type, say jpg.

Once these three entries appear correctly, your image should appear OK in the final document. Note that once the AddPart<ImagePart>() work is done, you have to Close() the Package explicitly. This makes sure the entries above, such as the relationship and content-type entries, get saved correctly into the target document. This is a critical step - just saving the MainDocumentPart.Document to a FileStream is not going to help.
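A sketch of step 1, assuming src and dst are the same open WordprocessingDocument instances as in the earlier snippet; the ridMap dictionary collects the old-to-new relationship-id mapping needed for the rewriting discussed under 'Points of interest' below:

using System.Collections.Generic;
using DocumentFormat.OpenXml.Packaging;

var ridMap = new Dictionary<string, string>(); //old r:embed id -> new r:embed id

foreach (ImagePart srcImage in src.MainDocumentPart.ImageParts)
{
    //AddImagePart creates the part under word\media, the relationship entry
    //and the content-type entry (e.g. for jpg) in one go
    ImagePart dstImage = dst.MainDocumentPart.AddImagePart(srcImage.ContentType);
    using (var imageBytes = srcImage.GetStream())
        dstImage.FeedData(imageBytes);

    ridMap[src.MainDocumentPart.GetIdOfPart(srcImage)] =
        dst.MainDocumentPart.GetIdOfPart(dstImage);
}
//remember: only closing/disposing the package flushes the relationship and
//content-type entries; saving document.xml alone is not enough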

2.) Similar to images, for each hyperlink in the source, add a new hyperlink relationship. This too gets the relationship (and content-type) entries in the destination created correctly.
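A matching sketch for step 2, reusing the same ridMap so the r:id attributes on the copied w:hyperlink elements can be rewritten afterwards:

using DocumentFormat.OpenXml.Packaging;

foreach (HyperlinkRelationship srcLink in src.MainDocumentPart.HyperlinkRelationships)
{
    //recreate the hyperlink relationship (external or internal) in the destination
    HyperlinkRelationship dstLink =
        dst.MainDocumentPart.AddHyperlinkRelationship(srcLink.Uri, srcLink.IsExternal);
    ridMap[srcLink.Id] = dstLink.Id;
}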

Points of interest

If you are copying from multiple source documents, you have to make sure that while copying the child nodes, the relationship ID of the content in question (say an image/blip/hyperlink) is temporarily updated to a unique ID, so that it does not conflict with the same relationship ID in a different source file.

Additionally, when adding the related ImagePart/hyperlink, make sure the ImagePart/hyperlink part ID is the same as the new ID you created. If everything goes well, when you save the Package, the OpenXML SDK renames all relationship IDs to be sequential and also updates all references in the document content with the sequential IDs it generated.

Good.

Thursday 2 February 2012

Humans and Human body as the final Architectural reference point.

It's an interesting perspective to consider humans - their means of interaction with others, including other humans, animals and machines; their growth and evolution; human anatomy, among others - as a software architecture reference point.

Given a scenario where high-load data transactions come in, how would a typical single human handle it? How would many humans handle it? In the case of multiple humans interacting to achieve a goal, how do they interact, and how do they establish trust (also read tree-of-trust)?

Would the human in question work on multiple tasks in a round-robin fashion? Would the human sub-contract the work, or just say it can't be done? What happens as the human matures - is s/he in better shape to work on this task given the maturity and experience, and how? How do we mimic experience - is it all the time about machine learning?

Can we apply human interaction and behaviour in a strange crowd, and the resulting formation of a team, to software components that could discover, trust and perform as a single component - all dynamically, without any system/person intervention? Would the software need "intelligence" to achieve this?

Can we apply principles of healing (physical, psychological) to software that reports issues with its health? More than bug fixing, can we heal/fix/alter the behaviour of software components based on their interaction with other software components over the years? Can we apply mentoring and counselling theories to software design?

On a related note, the human body itself provides ample opportunity as a reference point for a software component. There are enough structures already present in the human body that could be mimicked in a single software component.

We could treat the electric impulses within nerves as messages in an integration project, or as packets in a TCP/IP transaction, as the case may be. Can we include auto-heal modules in software components, similar to the way skin (kind of) auto-heals a bruise or burn? Can we patch systems with antibody software components to help a module recover better and sooner? Can we adapt the techniques of 'sense', such as touch and sight, so that a software component can adapt to the environment it lives in? In a queue-based integration, this could mean sensing the network load and message load, and perhaps moving itself to a different machine.

Do we also worry about mutations of software? Is reincarnation a versioning mechanism, or is it about rewriting completely?

Is the spine analogous to the ESB? Is the food breakdown and waste disposal procedure a pipeline pattern, with the unused bytes of data after a filter being software-component waste?

Is there a software design problem that this reference model cannot assist with?

Related Tweet on applying stuff back to the 'normal' world

Sunday 1 January 2012

COTS Architectures - a sure-shot architecture anti-pattern.

The typical trend in software architecture definitions these days appears to be what I prefer to call 'COTS architectures'. Why 'common off-the-shelf'? Because you take these architecture 'solutions' and they appear applicable to nearly all enterprise requirements! These COTS architectures appear to be used right from pre-sales proposals, and typically have the following (in a logical diagram) items:

1.) Three tiers - UI, business, data - with the usual layers:
1.1) 'UI process components', 'business objects', 'data transfer objects', a 'data access layer' and of course the persistence layer, with the much-regarded 'MVC components' spread across the tiers and layers.
2.) A couple of cross-cutting concerns - logging, exception handling, aspects, error handling etc.
3.) The interaction channels/protocols, usually TCP/HTTP.
4.) All nicely drawn in tempting Visio diagrams.
5.) Want to make it a bit more enterprisey? Add a couple of CDNs, 'web servers', 'app servers', 'search servers', 'cache servers', 'document servers', an 'enterprise service bus' et al. into a cloud, and there you go.

Having come across this issue with architects I have worked with over the past couple of years, my advice typically revolves around these lines:

0.) A 'COTS architecture' (if such a thing exists) is not a solution as such, but a starting/reference point only.
1.) Where are the specifics of your application(s)? Which blocks are specific to your application that I wouldn't typically find in another application's architecture document?
2.) List the 10 core requirements of your system and tell me which component/block deals with each, and what the overall strategy is for that particular requirement-solution. At this point, the architect should be in a position to tell which blocks become active and which other blocks they need. When defining a service-oriented application, it is more about which services are consumed by a specific service and what services it provides.
2.1) Follow the KISS, SOA and SOLID principles religiously. After defining your architecture, read through these principles and figure out whether it aligns.
3.) Always split the architecture definition diagrams into several - logical, technical, development and deployment at the bare minimum. Don't clutter and try to fit everything into one. If time permits, go for the conceptual, operations etc. too. This additionally makes sure you need to worry about only a few things at any point.

Please don't bring another 'COTS architecture' into your review meeting. It is not what the client/team wants to see.

Rooting days are here again - custom ROM, S-OFF and some free time -> God Mode

It is definitely quite an experience to root an Android device, with all sorts of custom ROMs readily available. It reminds one of the early Slackware days at college, when getting to see the console itself was termed a success. With my HTC Desire HD having seen enough of its stock ROM, and with some time at hand, it was that time again!

Rooting was pretty straightforward while religiously following the "Advanced ACE Hack Kit" manuals and instructions (install the device driver, set developer mode, USB in charge-only mode, etc.). While working from a virtual PC environment, I had to make sure the device was 'attached' manually each time it restarted. Overall, a smooth process to get the device rooted. The result is pretty awesome: terminal emulator + BusyBox + superuser = total control of the device! God-like.

Once rooted, it had to be the Android Revolution HD ROM for my HTC DHD. Revolution appeared to be the most mature of the custom ROM lot. Copied the zip file to the SD card, and it was time for the volume-down + power + recovery sequence, and then the install, which completed in less than 10 minutes. Cool! A 2.3.5 Android Revolution with previous data, contacts and apps intact! Nice.

The evening saw the braveheart in me showing its head - had to try a 4.0 Ice Cream Sandwich custom ROM. As per the discussion threads at xda-developers, BeatMod appeared to be the best and most active of the lot, though still in beta. No camera functionality, it seems - who needs it anyway. Brave.

Once again went through the vol-power-recovery sequence and installed the BeatMod ROM. Sadly, the device kept rebooting continuously, and once again it was an xda-developers forum thread that indicated a clean wipe was needed before flashing. Back to step one, but this time with a clean wipe of user data, cache and Dalvik cache. Would be losing all data including contacts, but what the hell - must have ICS this new year. Didn't bother to back up either! Continually brave.

4.0 this time didn't have the reboot issue, but a cool welcoming interface! Set up Exchange, Gmail, the browser etc., but alas, the text does not render quite right in these three apps! Though I could get the browser to render fine by disabling OpenGL, I had no luck with Exchange and Gmail - essential apps for me :(. Perhaps it is best to wait until BeatMod gets out of beta. Sad.

Got back to Revolution HD while at the chicken stall today. Neat.

End result: rooted, Android Revolution HD ROM 6.3, Android 2.3.5, HTC Sense 3.0, kernel 2.6. Not a bad way to start the new year, though I have lost all the data. But who cares - GodMode. Cool.

Fellows at xda-developers :  you rock! Thanks!

Update 03.03.2012: Moved to the IceCold Sandwich ROM. Android 4 rocks!


Sunday 11 September 2011

Distributed apps and the browser

Wondering what the possibilities are of exploiting the browser to execute distributed applications, hopefully extending around http://des.codeplex.com. It would be cool if browser users could voluntarily share their machines' computing power for distributed application needs.

Why the browser? Simply because it is the most used app in an internet-connected environment.

 Possibilities :

1.) JavaScript-based apps - exploiting the underlying machine's structures for threading and figuring out idle time might not be really feasible, though AJAX could provide some capabilities.

2.) HTML5 - a bit more promising, but how do you make sure the application lives outside the browsing session/page, as you would want the processing to continue even after the user moves across pages? This is worth a read: http://dev.w3.org/html5/workers/

3.) Browser plugins/extensions/apps - worth a detailed research effort, as these components live within the browser app and not necessarily within a browsing page session. With enough security rights set up, these components could in fact access the entire computing power of the underlying machine. Interesting.

Anyone currently implementing open-source solutions around these areas, or having thoughts - do ping me, please.


Tuesday 16 August 2011

ConsistentHash implementation in C# 4.0

For those not familiar with ConsistentHash, start here: Consistent Hashing

The following is an attempt to implement the functionality using features of C# 3/4. I needed this functionality for the distributed cache project I was working on (available at HoC).

using System;
using System.Collections.Generic;
using System.Linq;

namespace HoC.Common
{
    public class ConsistentHash : ICloneable
    {
        //the circle: item hashes kept sorted, so clockwise = ascending key order
        SortedList<string, string> itemCircle = new SortedList<string, string>();

        public string GetNearestItem(string key)
        {
            if (itemCircle.Count == 0)
                throw new ConsistentHashCircleEmpty();

            string keyHash = Hasher.GetHash(key);

            //find the first item whose hash comes after the passed-in key's hash (clockwise)
            Func<KeyValuePair<string, string>, bool> stringCompare = x => (x.Key.CompareTo(keyHash) > 0);
            KeyValuePair<string, string> item = itemCircle.FirstOrDefault(stringCompare);

            if (string.IsNullOrEmpty(item.Key)) 
            {
                //if no item, fallback to the first item => traverse circle clockwise 
                return itemCircle.First().Value; 
            }

            return item.Value;
        }

        public void StoreItem(string item)
        {
            string hash = Hasher.GetHash(item);
            itemCircle[hash] = item;
        }

        public void RemoveItem(string item)
        {
            string hash = Hasher.GetHash(item);
            if (itemCircle.ContainsKey(hash))
                itemCircle.Remove(hash);
        }

        public string GetNextItemInCircle(string key)
        {
            return GetNearestItem(key);
        }

        //this function traverses anticlockwise, whereas GetNearestItem() traverses clockwise
        public string GetPreviousItemInCircle(string key)
        {
            if (itemCircle.Count == 0)
                throw new ConsistentHashCircleEmpty();

            string keyHash = Hasher.GetHash(key);

            //traverse anticlockwise: the nearest predecessor is the LAST item whose hash precedes the key's hash
            Func<KeyValuePair<string, string>, bool> stringCompare = x => (x.Key.CompareTo(keyHash) < 0);
            KeyValuePair<string, string> item = itemCircle.LastOrDefault(stringCompare);

            if (string.IsNullOrEmpty(item.Key))
            {
                //if no item, fallback to the last item => traverse circle anti clockwise 
                return itemCircle.Last().Value;
            }

            return item.Value;
        }



        public object Clone()
        {
            return (ConsistentHash)MemberwiseClone();
        }
    }

    //thrown when an operation is attempted on an empty hash circle
    class ConsistentHashCircleEmpty : ApplicationException
    {
    }


}

A couple of drill-downs:

1.) Basically, each object gets assigned a hash (the Hasher class internally uses RIPEMD160Managed) so that it can be arranged on the circle, and it is then stored in the internal circle. RIPEMD160Managed, though a bit slower, supposedly has a low collision rate.

2.) The circle, as seen, is implemented using the SortedList class - itemCircle.

3.) The core functionality is implemented using the two methods:

3.1) GetNearestItem: traverses clockwise - basically, finds the object that comes up next in the sorted list after the given key.

3.2) GetPreviousItemInCircle : Opposite of GetNearestItem. Traverses anti-clockwise.

In the project where I use this internally, the objects are all serializable, hence they can easily be represented as strings.
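A hypothetical usage sketch (the Hasher class comes from the HoC project itself):

var circle = new ConsistentHash();
circle.StoreItem("cacheNodeA");
circle.StoreItem("cacheNodeB");
circle.StoreItem("cacheNodeC");

//the node responsible for the key: the first node clockwise from the key's hash
string owner = circle.GetNearestItem("user:42");

//its neighbour anticlockwise on the circle
string previous = circle.GetPreviousItemInCircle("user:42");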


Wednesday 10 August 2011

Object Pool - Quick and Short in .NET 4.0

A simple object pool can be used to keep a set of objects readily available in memory, especially if object creation time is heavy. E.g., creating a MemoryStream on demand is typically time-consuming in a server-based app; in this case, it could be best to have a set of MemoryStream objects readily available in memory. But we definitely have to make sure of a couple of things first:

1.) Filling the object pool happens asynchronously.
2.) Access to a pool instance from the client app is thread-safe.

To solve 1, we can exploit a delegate (AsyncMethodCaller) together with its BeginInvoke(), while to make sure that access to the pool is thread-safe, we can use the ConcurrentBag from .NET 4.0. So, where is the code?!

using System;
using System.Collections.Concurrent;

namespace HoC.Client
{
    /// <summary>
    /// maintains a pool/list of objects, provides it on request
    /// usually used for classes whose construction call is heavy
    /// </summary>
    /// <typeparam name="W"></typeparam>
    public class ObjectInstancePool<W>
    {
        private const int objectCount = 20; //can be updated to receive through the constructor
        private ConcurrentBag<W> objectList = new ConcurrentBag<W>();

        public ObjectInstancePool()
        {
            //refresh the bag first
            new AsyncMethodCaller(Filler).BeginInvoke(null, null);
        }

        public W GetInstance()
        {
            W result;

            if (!(objectList.TryTake(out result)))
            {
                result = Activator.CreateInstance<W>();
            }
            else
                new AsyncMethodCaller(Filler).BeginInvoke(null, null); //refresh the bag
            
            return result;
        }

        public delegate void AsyncMethodCaller();

        private void Filler()
        {
            //top up the bag; concurrent fillers may overshoot objectCount slightly, which is harmless here
            while (objectList.Count < objectCount)
            {
                objectList.Add(Activator.CreateInstance<W>());
            }
        }
    }
}
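
A hypothetical usage sketch, taking the MemoryStream example from above:

using System.IO;
using HoC.Client;

var pool = new ObjectInstancePool<MemoryStream>();

//served from the bag when available; falls back to a direct
//Activator.CreateInstance<MemoryStream>() when the bag is empty
MemoryStream stream = pool.GetInstance();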


I totally like it short and sweet :)



Friday 15 July 2011

Tree-Of-Trust

Overview

Enterprise applications today are no longer independent; they depend on other applications, services and components to complete their functionality. Authenticating/trusting another application typically required a new authentication/identity-management module to be written for each new application integrated. With the advent of claims-based/token-based authentication, this pain has eased considerably for developers, with supporting protocols, standards and frameworks such as SAML, WS-Security, WS-Federation, WIF and ADFS playing a huge role.

As of today, a developer of application A requiring access to the services of application B would configure the identity providers supported by B to trust A. As this task is mostly manual and done as part of the deployment phase, identity management today can be treated as static. In a world where SOA is exploited, there comes a need to build trust dynamically, in the same manner as a service is discovered dynamically.

Influences:
Similar to humans, who build trust based on relationships, introductions, recommendations etc., what is proposed under the "dynamic tree of trust" topic is a mechanism - comprising new frameworks, protocols, markups and standards - that collectively assists in building trust dynamically.

Possibility 1:
An application A wanting to access the functionality/services of an application B can claim that it is trusted by another set of applications, and provide this set of trusts to application B using a markup language. Application B can go through its internal trees of trust and figure out whether the trust claims are authentic.

Possibility 2:
An application A wanting to access the functionality/services of an application B can send its identity to application B. Application B can internally apply its 'tree-of-trust' locator algorithm to check whether there is any identity provider that appears to know this application A.

Ranking trust:
A ranking mechanism can be used by application B to compute an effective-trust index from:
a.) the number of providers that support/trust application A
b.) the depth/level within the 'tree-of-trust'.
A toy sketch of such an index is shown below.
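In the sketch, the formula and weights are purely illustrative assumptions, not part of any standard:

using System;

public static class TrustRanker
{
    //more vouching providers raise the index; deeper (more indirect) chains lower it
    public static double EffectiveTrustIndex(int supportingProviders, int depthInTree)
    {
        if (supportingProviders <= 0 || depthInTree <= 0)
            return 0.0;

        double breadth = 1.0 - Math.Exp(-supportingProviders / 3.0); //saturates toward 1
        double depthPenalty = 1.0 / depthInTree;                     //depth 1 = direct trust
        return Math.Round(breadth * depthPenalty, 2);
    }
}

//e.g. EffectiveTrustIndex(5, 1) returns 0.81 -> a soft trust of ST(0.81)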

Research Possibilities:
1.) Research into dynamically building trust; devise a fool-proof mechanism.
2.) Research into developing a set of
(2.1) protocols,
(2.2) standards,
(2.3) markups, and a
(2.4) sample framework for applications to build trust dynamically more easily, without any human intervention, while considering current trends in claims-based authentication and the WS-* standards, especially WS-Federation and WS-Trust.
3.) Research into how machine-learning mechanisms can be included within the framework to give a more robust trust-learning mechanism, such that the effective ranking is based on prior identity successes/failures.
4.) Research into more effective traversal algorithms when the logical structure (not necessarily the implementation) of the trust tree is tree-based.
4.1) Research into querying identity providers n levels deep about the authenticity of the requestor.
5.) Research into applications on mobile devices.


Applicable thoughts:
1.) Hard Trust - a direct trust set up on an identity provider.
2.) Soft Trust - a trust that was built up dynamically; ST(0.5), ST(0.8) etc., wherein the index indicates the effective trust index/rank.
3.) On-Behalf-Of – a trust that is directly vouched for by another.