Tuesday, 6 March 2012

Copying Images & Hyperlinks from a docx to another.



Copying the textual data from an OpenXML-based docx file was pretty OK: you just had to copy all the child nodes of the //body node from the source to the destination.

Conceptually this would be along the lines of:

1.) Load the source document's MainDocumentPart.Document into an XmlDocument.
2.) Load the destination document's MainDocumentPart.Document into another XmlDocument.
3.) Locate the //body node in each XmlDocument.
4.) Copy all child nodes from the source //body node to the destination //body node. You cannot add a node that belongs to another XmlDocument directly, hence use ImportNode first and then do the add.
5.) Once done for all child nodes, save the XmlDocument back to the MainDocumentPart.Document.
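Steps 3 and 4 can be sketched with plain System.Xml, without touching the package itself. This is only a minimal sketch: the class and method names are made up for illustration, and skipping the source w:sectPr while keeping the destination's w:sectPr as the last child of the body is my assumption about what a merge usually wants, not something the steps above prescribe.

```csharp
using System;
using System.Xml;

public static class BodyCopySketch
{
    // WordprocessingML main namespace (the w: prefix in document.xml).
    public const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Copies the children of the source w:body into the destination w:body.
    public static void CopyBody(XmlDocument source, XmlDocument destination)
    {
        var srcNs = new XmlNamespaceManager(source.NameTable);
        srcNs.AddNamespace("w", W);
        var dstNs = new XmlNamespaceManager(destination.NameTable);
        dstNs.AddNamespace("w", W);

        XmlNode srcBody = source.SelectSingleNode("//w:body", srcNs);
        XmlNode dstBody = destination.SelectSingleNode("//w:body", dstNs);
        if (srcBody == null || dstBody == null)
            throw new InvalidOperationException("w:body not found");

        // Assumption: the destination keeps its own section properties,
        // and they must stay the last child of the body.
        XmlNode dstSectPr = dstBody.SelectSingleNode("w:sectPr", dstNs);

        foreach (XmlNode child in srcBody.ChildNodes)
        {
            if (child.LocalName == "sectPr" && child.NamespaceURI == W)
                continue; // skip the source section properties

            // A node belongs to the document that created it, so import a deep copy first.
            XmlNode imported = destination.ImportNode(child, true);

            if (dstSectPr != null)
                dstBody.InsertBefore(imported, dstSectPr);
            else
                dstBody.AppendChild(imported);
        }
    }
}
```

ImportNode only copies the markup; anything the markup points to through a relation-id (images, hyperlinks) is not carried over by this step.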

All seems fine except for the fact that images/hyperlinks wouldn't appear in the destination document when opened. The following has to be done additionally to get this working:

1.) For each image part in the source, add a new image part to the destination. This makes sure the final document has got the following entries added (rename the docx to zip and check):
a.) The word\media folder has got the images as separate files.
b.) The relationships xml (word\_rels\document.xml.rels) has got an entry for each image, with the target pointing to the appropriate file in word\media.
c.) The content types xml in the root ([Content_Types].xml) has got an entry for this specific file type, say jpg.

Once you have the above three entries appearing right, your image should appear OK in the final document. Note that once the AddPart<ImagePart>() is done with, you have to Close() the Package explicitly. This makes sure the above entries (relation and content type entries) get saved into the target document. This is a critical step. Just saving the MainDocumentPart.Document to a FileStream is not going to help.
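The "rename to zip and check" step can also be done in code. Below is a rough sketch using System.IO.Compression only; the class and method names are hypothetical, and the string matching is deliberately crude (it assumes the content type is registered as a Default Extension entry and that the relation target contains "media/"), so treat it as a debugging aid rather than a validator.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

public static class DocxImageCheck
{
    // Returns true when the package has the three kinds of entries an image needs.
    // extension is given without the dot, e.g. "jpg".
    public static bool HasImageEntries(Stream docx, string extension)
    {
        // leaveOpen: true so the caller can keep using the stream.
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, true))
        {
            // a.) a file of that type under word/media
            bool media = zip.Entries.Any(e =>
                e.FullName.StartsWith("word/media/", StringComparison.OrdinalIgnoreCase) &&
                e.FullName.EndsWith("." + extension, StringComparison.OrdinalIgnoreCase));

            // b.) a relation whose target points into media/
            bool relation = ReadEntry(zip, "word/_rels/document.xml.rels")
                .IndexOf("media/", StringComparison.OrdinalIgnoreCase) >= 0;

            // c.) a content type entry for the extension (assumed to be a Default entry)
            bool contentType = ReadEntry(zip, "[Content_Types].xml")
                .IndexOf("Extension=\"" + extension + "\"", StringComparison.OrdinalIgnoreCase) >= 0;

            return media && relation && contentType;
        }
    }

    // Reads a zip entry as text; returns an empty string if the entry is missing.
    private static string ReadEntry(ZipArchive zip, string name)
    {
        ZipArchiveEntry entry = zip.GetEntry(name);
        if (entry == null) return string.Empty;
        using (var reader = new StreamReader(entry.Open()))
            return reader.ReadToEnd();
    }
}
```

If this returns false right after adding the image part, the usual suspect is that the Package was not closed before the file was inspected.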

2.) Similar to images, for each hyperlink in the source, add a new hyperlink relationship. This too causes the relationship entries and content type entries in the destination to be created correctly.

Points of interest

If you are copying from multiple source documents, you have to make sure that while copying the child nodes, the relation-id of the content concerned (say image/blip/hyperlink) is temporarily updated to a unique id so that it does not conflict with the same relation-id in a different source file.

Additionally, when adding the related image part/hyperlink, make sure that the id of the image part/hyperlink relationship is the same as the new id that you created. If everything goes well, when you save the Package, the OpenXML SDK renames all relation-ids to be sequential and also updates all references in the document content with the sequential ids it generated.
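One way to get the temporary unique ids is to prefix every relation-id in a source document with a per-source tag before copying its nodes. The sketch below is an assumption about how that could be done with System.Xml: the class name, method name and the prefix scheme are all hypothetical. It returns the old-to-new map so that the matching image part/hyperlink relationship can be added under the new id.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class RelationshipIdSketch
{
    // Namespace of the r: attributes (r:id, r:embed, r:link, ...) that carry relation-ids.
    public const string R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Rewrites every r:* attribute value to prefix + oldId and returns the old -> new map.
    public static Dictionary<string, string> PrefixIds(XmlDocument doc, string prefix)
    {
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("r", R);

        var map = new Dictionary<string, string>();

        // Materialise the attribute list first so we are not editing while the query runs.
        List<XmlAttribute> attrs = doc.SelectNodes("//@r:*", ns).Cast<XmlAttribute>().ToList();

        foreach (XmlAttribute attr in attrs)
        {
            string newId;
            if (!map.TryGetValue(attr.Value, out newId))
            {
                newId = prefix + attr.Value; // e.g. "src1_" + "rId5"
                map[attr.Value] = newId;
            }
            attr.Value = newId;
        }
        return map;
    }
}
```

The prefix should start with a letter or underscore (say "src1_", "src2_"), one per source document, so that ids from different sources cannot collide.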

Good.

Thursday, 2 February 2012

Humans and Human body as the final Architectural reference point.

It's an interesting perspective to consider humans, their means of interaction with others (including other humans, animals and machines), their growth, evolution and anatomy, among others, as a software architecture reference point.

Given a scenario of high-load data transactions coming in, how would a typical single human handle it? How would many humans handle it? In the case of multiple humans interacting to achieve a goal, how do they interact and how do they establish trust (also read tree-of-trust)?

Would the human concerned work on multiple tasks in a round-robin fashion? Would the human sub-contract the work, or would s/he just say s/he can't do it? What happens as the human matures: is s/he in better shape to work on this task given the maturity and experience, and how? How do we mimic experience: is it always about machine learning?

Can we apply the human interaction and behaviours in a strange crowd and the resulting possible formation of a team into software components that could discover, trust and perform as a single component all dynamically without any system/person intervention? Would the software need "intelligence" to achieve this?

Can we apply principles of healing (physical, psychological) to software that reports issues in its health? More than bug fixing, can we heal/fix/alter the behaviour of software components based on their interaction with other software components over the years? Can we apply mentoring and counselling theories to software design?

On a related note, the human body itself provides ample opportunity as a reference point for a software component. There are enough structures already present in a human body that can be mimicked in a single software component.

We could treat the electric impulses within nerves as messages in an integration project or as packets in a tcp/ip transaction, as the case may be. Can we include auto-heal modules in software components, similar to the way the skin auto-heals (kinda) in case of a bruise/burn? Can we patch systems with antibody software components to help the module recover better and sooner? Can we adapt the techniques of 'sense' such as touch, sight etc. so that the software component can adapt based on the environment it lives in? In a queue-based integration, this could be about sensing the network load and message load, and perhaps moving itself to a different machine.

Do we also worry about mutations of software? Is reincarnation a versioning mechanism, or is it about rewriting completely?

Is the spine analogous to the ESB? Is the food breakdown and waste disposal procedure a pipeline pattern, with the unused bytes of data left after a filter being the software component's waste?

Is there a software design problem that this reference model cannot assist with ?

Related Tweet on applying stuff back to the 'normal' world

Sunday, 1 January 2012

COTS Architectures - sure shot architecture anti-pattern.

The typical trend in software architecture definitions these days appears to be what I prefer to call 'COTS Architectures'. Why 'common-off-the-shelf'? You take these architecture 'solutions' and they appear to be applicable to nearly all enterprise requirements! These COTS Architectures appear to be used right from pre-sales proposals and typically have the following items (in a logical diagram):

1.) Three tiers - UI, Business, Data with usual layers:
1.1) 'UI process components', 'business objects', 'data transfer objects', 'data access layer' and of course the persistence layer with much regarded 'MVC components' spread across the tiers and layers.
2.) A couple of cross-cutting concerns - logging, exception handling, aspects, error handling etc.
3.) The interaction channels/protocols usually tcp/http
4.) All nicely drawn in tempting visio diagrams.
5.) You want to make it a bit more Enterprisy, add in a couple of CDN's, 'web servers', 'app servers', 'search servers', 'cache servers', 'document servers', 'enterprise service bus' et al into a cloud and there you go.

Having come across this issue with architects I have worked with in the past couple of years, my advice typically revolves around these lines:

0.) 'COTS architecture' (if it exists) as such is not a solution, but a starting/reference point only. 
1.) Where are the specifics for your application/s? What blocks are specific to your application that I wouldn't typically find in another application's architecture document?
2.) List down the 10 core requirements of your system and tell me which component/block deals with it and what is the overall strategy for the particular requirement-solution. At this point, the architect should be in a position to tell which blocks get active and which other blocks it needs. While defining a service-oriented application, it would be more about which services are consumed by a specific service and what services it provides. 
2.1) Follow the KISS, SOA, SOLID principles religiously. After defining your architecture, read through these principles and figure out if it aligns.
3.) Always split the architecture definition diagrams into many - logical, technical, development and deployment at the bare minimum. Don't clutter and try to fit everything into one. If time permits, go for the conceptual, operations etc. too. This additionally makes sure you need to worry about only a few things at any point.

Please don't get another 'COTS architecture' into your review meeting. This is not what the client/team want to see.

Rooting days are here again - custom ROM, S-OFF and some free time -> God Mode

It is definitely quite an experience to root an Android device with all sorts of custom ROMs readily available. Reminds one of the early slackware days at college when getting to see the console itself was termed a success. With my HTC Desire HD having seen enough of its stock ROM and with some time at hand, it was that time yet again!

Rooting was pretty straightforward while religiously following the "advanced ACE hack kit" manuals and instructions (install device driver, set developer mode, USB in charge only etc.). While working from a Virtual PC environment, I had to make sure the device was 'attached' manually each time the device restarted. Overall a smooth process to get your device rooted. The result is pretty awesome: Terminal emulator + BusyBox + Superuser = total control of the device! God-like.

Once rooted, it had to be the Revolution HD ROM for my HTC DHD. Revolution appeared to be the most mature of the custom ROM lot. Copied the zip file to the SD card and it was time for the volume-down button + power button + recovery sequence and then the install, which completed in less than 10 mins. Cool! A 2.3.5 Android Revolution with previous data, contacts, apps intact! Nice.

The evening saw the braveheart in me showing its head - had to try a 4.0 Ice Cream Sandwich Android custom ROM. As per the discussion threads at xda-developers, BeatMod appeared the best and most active of the lot, though still in beta. No camera functionality it seems - who needs it anyway. Brave.

Once again went through the vol-power-recovery sequence and installed the BeatMod ROM. Sadly, the device kept rebooting continuously, and once again it was an xda-developers forum thread that pointed to doing a clean wipe before flashing. Back to step one, but this time with a clean wipe of user data, cache and dalvik cache. Would be losing all data including contacts, but what the hell - must have ICS this new year. Didn't bother to back up either! Continually Brave.

4.0 this time didn't have the reboot issue but a cool welcoming interface! Did set up Exchange, Gmail, browser etc., but alas the text does not render quite right in these three apps! Though I could get the browser to render fine by disabling OpenGL, I had no luck with Exchange and Gmail - essential apps for me :(. Perhaps it is best to wait until BeatMod gets out of the beta stage. Sad.

Got back to Revolution HD while at the chicken stall today. Neat.

End result : Rooted, Android Revolution HD ROM 6.3, 2.3.5 Android, HTC sense 3.0, kernel 2.6. Not a bad way to start the new year though I have lost all the data. But, who cares - GodMode. Cool.

Fellows at xda-developers :  you rock! Thanks!

Update 03.03.2012 : Moved to IceCold Sandwich rom. Android 4 rocks!


Sunday, 11 September 2011

Distributed apps and the browser

Wondering what the possibilities are of exploiting the browser to execute distributed applications, hopefully extending around http://des.codeplex.com. Would be cool if browser users could voluntarily share their machines' computing power for distributed application needs.

Why browser ? Simply because that is the most used app in an internet connected environment.

 Possibilities :

 1.) Javascript based apps - exploiting the underlying machine structures for threading, figuring out idle time might not be really feasible though AJAX could provide some capabilities.

 2.) HTML5 - a bit more promising, but how do you make sure the application lives outside the browsing session/page as you would want the processing to continue even after user moves across pages. This is worth a read : http://dev.w3.org/html5/workers/

 3.) Browser plugins/extensions/apps - worth detailed research, as these components live within the browser app and not necessarily within a browsing page session. With enough security rights set up, these components could in fact access the entire computing power of the underlying machine. Interesting.

Anyone currently implementing open source solutions around these areas or having thoughts, do ping me please.


Tuesday, 16 August 2011

ConsistentHash implementation in C# 4.0

For those not familiar with ConsistentHash, start here: Consistent Hashing

The following is an attempt to implement the functionality using features of C# 3/4. I needed this functionality for the distributed cache project that I was working on (available at HoC).

using System;
using System.Collections.Generic;
using System.Linq;

namespace HoC.Common
{
    public class ConsistentHash  : ICloneable
    {
        SortedList<string, string> itemCircle = new SortedList<string, string>();

        public string GetNearestItem(string key)
        {
            if (itemCircle.Count == 0)
                throw new ConsistentHashCircleEmpty();

            string keyHash = Hasher.GetHash(key);

            //find the first item whose hash comes just after the passed in key's hash (clockwise)
            Func<KeyValuePair<string, string>, bool> stringCompare = x => (x.Key.CompareTo(keyHash) > 0);
            KeyValuePair<string, string> item = itemCircle.FirstOrDefault(stringCompare);

            if (string.IsNullOrEmpty(item.Key)) 
            {
                //if no item, fallback to the first item => traverse circle clockwise 
                return itemCircle.First().Value; 
            }

            return item.Value;
        }

        public void StoreItem(string item)
        {
            string hash = Hasher.GetHash(item);
            itemCircle[hash] = item;
        }

        public void RemoveItem(string item)
        {
            string hash = Hasher.GetHash(item);
            if (itemCircle.ContainsKey(hash))
                itemCircle.Remove(hash);
        }

        public string GetNextItemInCircle(string key)
        {
            return GetNearestItem(key);
        }

        //this function traverses anticlockwise, whereas GetNearestItem() traverses clockwise
        public string GetPreviousItemInCircle(string key)
        {
            if (itemCircle.Count == 0)
                throw new ConsistentHashCircleEmpty();

            string keyHash = Hasher.GetHash(key);

            //traverse anticlockwise: the nearest previous item is the LAST one (in sorted order) whose hash is below the key's hash
            Func<KeyValuePair<string, string>, bool> stringCompare = x => (x.Key.CompareTo(keyHash) < 0);
            KeyValuePair<string, string> item = itemCircle.LastOrDefault(stringCompare);

            if (string.IsNullOrEmpty(item.Key))
            {
                //if no item, fallback to the last item => traverse circle anti clockwise 
                return itemCircle.Last().Value;
            }

            return item.Value;
        }



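        public object Clone()
        {
            //MemberwiseClone() alone would share the same SortedList between both instances,
            //so give the copy its own circle
            ConsistentHash copy = (ConsistentHash)MemberwiseClone();
            copy.itemCircle = new SortedList<string, string>(itemCircle);
            return copy;
        }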
    }

    class ConsistentHashCircleEmpty : ApplicationException
    {

    }


}

Couple of drill downs:

1.) Basically each object gets assigned a hash (the Hasher class internally uses RIPEMD160Managed) so that it can be arranged in the circle, and then gets stored in the internal circle. RIPEMD160Managed, though supposedly a bit slower, produces a 160-bit hash, so collisions are very unlikely.

2.) The circle, as seen, is implemented using a SortedList - itemCircle.

3.) The core functionality is implemented using the two methods:

3.1) GetNearestItem : This method traverses clockwise - basically it finds the object that comes up next in the sorted list after the given key's hash, wrapping around to the first object if there is none.

3.2) GetPreviousItemInCircle : Opposite of GetNearestItem. Traverses anti-clockwise.

In the project that I use this internally, the objects are all serializable, hence the objects could be easily represented as a string.
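To see the wrap-around behaviour in isolation, here is a small self-contained sketch of the same idea. It is not the HoC code: the class name RingSketch is made up, SHA-256 is swapped in as a stand-in for the Hasher class (purely an assumption so that the sample runs on its own), and the list is sorted ordinally so the ordering does not depend on the current culture.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

public class RingSketch
{
    // Ordinal comparison keeps the order of hashes independent of the current culture.
    private readonly SortedList<string, string> circle =
        new SortedList<string, string>(StringComparer.Ordinal);

    // Hypothetical stand-in for Hasher.GetHash; SHA-256 is an assumption of this sketch.
    public static string Hash(string value)
    {
        using (SHA256 sha = SHA256.Create())
            return BitConverter.ToString(sha.ComputeHash(Encoding.UTF8.GetBytes(value)));
    }

    public void Store(string item) { circle[Hash(item)] = item; }

    public void Remove(string item) { circle.Remove(Hash(item)); }

    // Clockwise: first item whose hash is greater than the key's hash, else wrap to the first item.
    public string Next(string key)
    {
        if (circle.Count == 0) throw new InvalidOperationException("circle is empty");
        string h = Hash(key);
        for (int i = 0; i < circle.Count; i++)
            if (string.CompareOrdinal(circle.Keys[i], h) > 0) return circle.Values[i];
        return circle.Values[0];
    }

    // Anticlockwise: last item whose hash is smaller than the key's hash, else wrap to the last item.
    public string Previous(string key)
    {
        if (circle.Count == 0) throw new InvalidOperationException("circle is empty");
        string h = Hash(key);
        for (int i = circle.Count - 1; i >= 0; i--)
            if (string.CompareOrdinal(circle.Keys[i], h) < 0) return circle.Values[i];
        return circle.Values[circle.Count - 1];
    }
}
```

The property that makes this useful for a distributed cache: removing a node that is not the owner of a key leaves that key's owner unchanged, so only the keys of the removed node move.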


Wednesday, 10 August 2011

Object Pool - Quick and Short in .NET 4.0

A simple object pool that could be used to keep a set of objects readily available in memory, especially if you see that the object creation time is heavy. E.g. creating a MemoryStream on demand can be time consuming in a server-based app. In this case, it could be best to have a set of MemoryStream objects readily available in memory. But we definitely have to make sure of a couple of things first:

1.) The object pool is filled asynchronously.
2.) Access to a pool instance from the client app is thread safe.

To solve 1, we could exploit the AsyncMethodCaller together with its BeginInvoke(), while to make sure that the access to the pool is thread safe, we could check out the ConcurrentBag in .NET 4.0. So, where is the code?!

using System;
using System.Collections.Concurrent;

namespace HoC.Client
{
    /// <summary>
    /// maintains a pool/list of objects, provides it on request
    /// usually used for classes whose construction call is heavy
    /// </summary>
    /// <typeparam name="W">type of the pooled objects; needs a public parameterless constructor</typeparam>
    public class ObjectInstancePool<W>
    {
        private const int objectCount = 20; //can be updated to receive through the constructor
        private ConcurrentBag<W> objectList = new ConcurrentBag<W>();

        public ObjectInstancePool()
        {
            //refresh the bag first
            new AsyncMethodCaller(Filler).BeginInvoke(null, null);
        }

        public W GetInstance()
        {
            W result;

            if (!(objectList.TryTake(out result)))
            {
                result = Activator.CreateInstance<W>();
            }
            else
                new AsyncMethodCaller(Filler).BeginInvoke(null, null); //refresh the bag
            
            return result;
        }

        public delegate void AsyncMethodCaller();

        private void Filler()
        {
            while (objectList.Count < objectCount)
            {
                objectList.Add(Activator.CreateInstance<W>());
            }
        }
    }
}


I totally like it short and sweet :)
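A note for anyone trying this outside the .NET Framework: BeginInvoke() on a delegate is not supported on .NET Core and later, where it throws PlatformNotSupportedException. Below is a hedged variation of the same pool that refills through Task.Run instead. The class name, the factory delegate and the LastRefill property are my additions for illustration and are not part of the HoC code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class TaskBackedPool<T>
{
    private readonly ConcurrentBag<T> items = new ConcurrentBag<T>();
    private readonly Func<T> factory;
    private readonly int capacity;

    // Exposed only so that callers (and tests) can wait for the most recent refill.
    public Task LastRefill { get; private set; }

    public TaskBackedPool(Func<T> factory, int capacity)
    {
        this.factory = factory;
        this.capacity = capacity;
        LastRefill = Task.Run(() => Fill()); // fill the bag in the background
    }

    public int Count { get { return items.Count; } }

    public T GetInstance()
    {
        T item;
        if (items.TryTake(out item))
        {
            LastRefill = Task.Run(() => Fill()); // top the bag up again
            return item;
        }
        // Bag is empty (or not filled yet): create one on the spot.
        return factory();
    }

    // Concurrent fills may briefly overshoot the capacity; that is harmless for a pool.
    private void Fill()
    {
        while (items.Count < capacity)
            items.Add(factory());
    }
}
```

Usage would look like new TaskBackedPool<MemoryStream>(() => new MemoryStream(), 20), followed by GetInstance() wherever a stream is needed. Taking a factory delegate instead of Activator.CreateInstance also lets the pool hold types without a parameterless constructor.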