Bits from the software front...: programming

Tuesday, 16 August 2011

ConsistentHash implementation in C# 4.0

For those not familiar with ConsistentHash, start here: Consistent Hashing

The following is an attempt to implement the functionality using features of C# 3/4. I needed it for the distributed cache project that I was working on (available as HoC).

using System;
using System.Collections.Generic;
using System.Linq;

namespace HoC.Common
{
    public class ConsistentHash  : ICloneable
    {
        SortedList<string, string> itemCircle = new SortedList<string, string>();

        public string GetNearestItem(string key)
        {
            if (itemCircle.Count == 0)
                throw new ConsistentHashCircleEmpty();

            string keyHash = Hasher.GetHash(key);

            //find the first item whose hash comes just after the hash of the passed in key (clockwise)
            Func<KeyValuePair<string, string>, bool> stringCompare = x => (x.Key.CompareTo(keyHash) > 0);
            KeyValuePair<string, string> item = itemCircle.FirstOrDefault(stringCompare);

            if (string.IsNullOrEmpty(item.Key)) 
            {
                //if no item, fallback to the first item => traverse circle clockwise 
                return itemCircle.First().Value; 
            }

            return item.Value;
        }

        public void StoreItem(string item)
        {
            string hash = Hasher.GetHash(item);
            itemCircle[hash] = item;
        }

        public void RemoveItem(string item)
        {
            string hash = Hasher.GetHash(item);
            if (itemCircle.ContainsKey(hash))
                itemCircle.Remove(hash);
        }

        public string GetNextItemInCircle(string key)
        {
            return GetNearestItem(key);
        }

        //this function traverses anticlockwise, whereas GetNearestItem() traverses clockwise
        public string GetPreviousItemInCircle(string key)
        {
            if (itemCircle.Count == 0)
                throw new ConsistentHashCircleEmpty();

            string keyHash = Hasher.GetHash(key);

            //traverse anticlockwise: the nearest previous item is the LAST one whose hash is before the key's hash
            Func<KeyValuePair<string, string>, bool> stringCompare = x => (x.Key.CompareTo(keyHash) < 0);
            KeyValuePair<string, string> item = itemCircle.LastOrDefault(stringCompare);

            if (string.IsNullOrEmpty(item.Key))
            {
                //if no item, fallback to the last item => traverse circle anti clockwise 
                return itemCircle.Last().Value;
            }

            return item.Value;
        }



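        public object Clone()
        {
            //MemberwiseClone() alone would leave both instances sharing the same circle
            ConsistentHash clone = (ConsistentHash)MemberwiseClone();
            clone.itemCircle = new SortedList<string, string>(itemCircle);
            return clone;
        }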
    }

    class ConsistentHashCircleEmpty : ApplicationException
    {

    }


}

A couple of drill-downs:

1.) Each object is assigned a hash (the Hasher class internally uses RIPEMD160Managed), which fixes its position on the circle, and is then stored in the internal circle. RIPEMD160Managed, though a bit slower, supposedly has a very low collision rate.

2.) The circle is implemented using a SortedList instance, itemCircle, keyed by the hash.

3.) The core functionality is implemented using two methods:

3.1) GetNearestItem : Traverses clockwise, i.e. finds the object that comes next in the sorted list after the given key, wrapping around to the first item when the key lies past the last one.

3.2) GetPreviousItemInCircle : The opposite of GetNearestItem. Traverses anti-clockwise.

In the project where I use this, the objects are all serializable, hence each object can easily be represented as a string.
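The clockwise lookup described above scans the list linearly; on a sorted list the same answer can be found with a binary search. The following is a minimal sketch of that idea, not the HoC code: it assumes SHA-256 in place of the RIPEMD160-based Hasher, compares hashes ordinally, and the names HashRing, AddNode and GetNode are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Illustrative sketch only: HashRing, AddNode and GetNode are hypothetical names.
public class HashRing
{
    // Hashes are kept sorted (ordinal order) so that a binary search can be used.
    private readonly List<string> hashes = new List<string>();
    private readonly Dictionary<string, string> nodes = new Dictionary<string, string>();

    // Assumption: SHA-256 stands in for the RIPEMD160-based Hasher of the post.
    private static string Hash(string value)
    {
        using (SHA256 sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(value));
            return BitConverter.ToString(digest).Replace("-", "");
        }
    }

    public void AddNode(string node)
    {
        string hash = Hash(node);
        if (nodes.ContainsKey(hash)) return;

        int index = hashes.BinarySearch(hash, StringComparer.Ordinal);
        hashes.Insert(~index, hash);   // ~index is the sorted insertion point
        nodes[hash] = node;
    }

    // Returns the first node clockwise from the key, wrapping around at the end.
    public string GetNode(string key)
    {
        if (hashes.Count == 0)
            throw new InvalidOperationException("The ring is empty.");

        int index = hashes.BinarySearch(Hash(key), StringComparer.Ordinal);
        // Exact match: step past it, like the strict '>' comparison in GetNearestItem.
        index = index >= 0 ? index + 1 : ~index;
        return nodes[hashes[index % hashes.Count]];
    }
}
```

The modulo on the last line is what closes the list into a circle: a key that hashes past the last node falls back to the first one.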

Wednesday, 10 August 2011

Object Pool - Quick and Short in .NET 4.0

A simple object pool that can be used to keep a set of objects readily available in memory, especially when object creation is expensive. E.g. creating a MemoryStream on demand is typically time consuming in a server based app. In this case, it could be best to have a set of MemoryStream objects readily available in memory. But we have to make sure of a couple of things first :

1.) The object pool is filled asynchronously.
2.) Access to a pool instance from the client app is thread safe.

To solve 1, we can use an AsyncMethodCaller delegate together with its BeginInvoke(), while to make access to the pool thread safe, we can use the ConcurrentBag in .NET 4.0. So, where is the code?!

namespace HoC.Client
{
    /// <summary>
    /// maintains a pool/list of objects, provides it on request
    /// usually used for classes whose construction call is heavy
    /// </summary>
    /// <typeparam name="W"></typeparam>
    public class ObjectInstancePool<W>
    {
        private const int objectCount = 20; //can be updated to receive through the constructor
        private ConcurrentBag<W> objectList = new ConcurrentBag<W>();

        public ObjectInstancePool()
        {
            //refresh the bag first
            new AsyncMethodCaller(Filler).BeginInvoke(null, null);
        }

        public W GetInstance()
        {
            W result;

            if (!(objectList.TryTake(out result)))
            {
                result = Activator.CreateInstance<W>();
            }
            else
                new AsyncMethodCaller(Filler).BeginInvoke(null, null); //refresh the bag
            
            return result;
        }

        public delegate void AsyncMethodCaller();

        private void Filler()
        {
            while (objectList.Count < objectCount)
            {
                objectList.Add(Activator.CreateInstance<W>());
            }
        }
    }
}
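Delegate BeginInvoke() is not supported on .NET Core and later runtimes, and nothing above stops several Filler calls from running at once. As a hedged alternative, here is a minimal sketch of the same pool built on Task.Run (so it assumes .NET 4.6 or later rather than 4.0). The names TaskObjectPool and RefillAsync are made up for illustration, and an Interlocked flag ensures only one refill runs at a time.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch only: TaskObjectPool and its members are hypothetical names.
public class TaskObjectPool<T> where T : new()
{
    private readonly int capacity;
    private readonly ConcurrentBag<T> items = new ConcurrentBag<T>();
    private int filling; // 0 = idle, 1 = a refill is already running

    public TaskObjectPool(int capacity = 20)
    {
        this.capacity = capacity;
        RefillAsync();
    }

    public int Count { get { return items.Count; } }

    public T GetInstance()
    {
        T result;
        if (!items.TryTake(out result))
            result = new T();   // pool is empty, create on demand

        RefillAsync();          // top the pool up in the background
        return result;
    }

    // Starts a background refill unless one is already in progress.
    public Task RefillAsync()
    {
        if (Interlocked.CompareExchange(ref filling, 1, 0) != 0)
            return Task.CompletedTask;

        return Task.Run(() =>
        {
            try
            {
                while (items.Count < capacity)
                    items.Add(new T());
            }
            finally
            {
                Interlocked.Exchange(ref filling, 0);
            }
        });
    }
}
```

The new() constraint replaces Activator.CreateInstance, so a type without a public parameterless constructor is rejected at compile time instead of failing at run time.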

I totally like it short and sweet :)

Tuesday, 5 October 2010

HoC is up on codeplex

HoC - A distributed cache implementation using .NET 4.0 is now available for download with source at http://hoc.codeplex.com/

HoC = Herd Of Cache.

Friday, 5 February 2010

Concurrency & .NET

With each version of the Windows programming libraries, the options for concurrent programming seem to be on the rise. Gone are the days when you had to start with a plain CreateThread() Win32 call (remember setting all those security attributes?). Then came the wrappers, from CWinThread in MFC to TThread in VCL (a more elegant version - Delphi ruled those days).

With earlier versions of .NET, you had the Thread class, the BackgroundWorker class and the highly recommended ThreadPool class (let's not worry about all the sync objects that came along). With multi-core machines all around, there are many more options around .NET 4.0 :

a.) Parallel Extensions (PLINQ + TPL)

Integrating parallelism right into the framework design while exploiting extension methods has made expressing concurrency easier. Have a loop that you want to execute in parallel? Just use Parallel.For().
1 core? 2 cores? n cores? Not sure how to exploit them? Just use the framework provided by the TPL (Task Parallel Library) - your applications would scale (not worrying about the internal design/syncs for the moment) based on the number of cores. Nice. The best part is that the C# language and its supporting framework appear to be moving towards the functional programming paradigm, wherein you worry less about how to do the job and more about what to do. LINQ, the TPL, the Parallel Extensions etc. seem to be inspired by the functional paradigm as seen in Haskell (my current interest area) / F#.
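To make the Parallel.For() and PLINQ remarks concrete, here is a small sketch that computes the same sum both ways. The class and method names (ParallelDemo, SumOfSquares, SumOfSquaresPlinq) are made up for illustration; the Parallel.For overload with a per-thread subtotal is used so that the shared total is only touched once per worker.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch only: ParallelDemo and its methods are hypothetical names.
public static class ParallelDemo
{
    // Sums the squares of 0..n-1 with Parallel.For; each worker keeps a local subtotal.
    public static long SumOfSquares(int n)
    {
        long total = 0;
        Parallel.For(0, n,
            () => 0L,                                           // initial subtotal per worker
            (i, state, subtotal) => subtotal + (long)i * i,     // loop body
            subtotal => Interlocked.Add(ref total, subtotal));  // merge the subtotals
        return total;
    }

    // The same computation expressed declaratively with PLINQ.
    public static long SumOfSquaresPlinq(int n)
    {
        return Enumerable.Range(0, n).AsParallel().Sum(i => (long)i * i);
    }
}
```

Neither method mentions threads or core counts; the runtime decides how to partition the range, which is the "what, not how" point made above.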

Want to dig real deep with some great samples ? Check this out : http://code.msdn.microsoft.com/ParExtSamples

b.) Axum

A very interesting .NET programming language from the MS research yard to check out, built with concurrency as the primary design objective. You have 'agent's (think of a block of code being executed independently, like a thread) talking with each other through 'channel's using 'message's (think of all the sync objects you used to get two threads to talk with each other, but easier). Very promising - you could write your core domain objects in C# and use them within Axum, wherein you would have laid out your concurrency logic.

Check out http://msdn.microsoft.com/en-us/devlabs/dd795202.aspx , http://en.wikipedia.org/wiki/Axum_(programming_language)

c.) DirectCompute

Would you like to exploit the massive processing power of your GPU? Check out the DirectCompute library, a DirectX 11/10 based framework that lets you offload tasks onto the GPU - awesome. Along similar lines, also check out the Brahma framework written by my ex-colleague Ananth at http://brahma.ananthonline.net

Don't miss the DirectCompute session video (http://microsoftpdc.com/Sessions/P09-16), which also showed some cool applications. It was amazing to see the computationally intensive job being done by the GPU while the CPU stayed at ~0% utilization!

d.) Dryad

Yet another product from the MS research arsenal, Dryad appears to be targeted at making distributed applications easier to write. I need to check this out in detail - once I find an HPC server to do the installation, then perhaps port DES to it?

Check it out further at http://research.microsoft.com/en-us/projects/dryad/

Saturday, 31 October 2009

DES R2

DES R2 (Distributed Execution System Release 2) released to http://des.codeplex.com/SourceControl/changeset/view/33784

Core enhancement - support for child tasks, concept of collection gate, design changes/refactoring etc

Thursday, 18 June 2009

Making sense of Contravariance and Covariance in C# 2.0/4.0

Introduction

In simple words, covariance means that a more derived type can be used where a less derived type is expected, while contravariance allows the reverse: a less derived type where a more derived one is expected. Take the following class hierarchy: Doberman -> Dog -> Animal, where Animal is the root parent class. If you had a function

public static Dog ProcessDog(Dog myDog)
{
    return new Dog();
}

In all versions of C#,

Dog returnDog = ProcessDog(new Dog()); // is valid
Dog returnDog = ProcessDog(new Doberman()); //is valid
Dog returnDog = ProcessDog(new Animal()); //is INVALID.

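The invalid third call confirms that an argument can be of the declared parameter type or any of its descendants, but never one of its ancestors.

Similarly,

Dog returnDog = ProcessDog(new Dog()); // is valid
Object returnDog = ProcessDog(new Dog()); // is valid
Doberman returnDog = ProcessDog(new Dog()); // is INVALID.

confirms that a returned Dog can be stored in a Dog variable or in a variable of any ancestor type, but not in a variable of a descendant type. Both cases are plain assignment compatibility. Variance describes how that compatibility carries over to types built on top of these, such as generic interfaces and delegates: a type parameter used only for values that come OUT can be covariant, and one used only for values that go IN can be contravariant.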

Covariance and Contravariance in Generics / C# 2.0 / C# 4.0

In C# 2.0, generic interfaces did not allow for covariance/contravariance. Say I had

IEnumerable<Dog> dogs = new List<Dog>();
IEnumerable<Animal> animals;
animals = dogs;//is INVALID

I cannot do animals = dogs; though it makes sense literally - aren't dogs animals?! Note that you could still assign a single Dog instance to an Animal variable. Coming to C# 4.0, the above statement animals = dogs; works perfectly fine! Why? How?

To understand this, assume a case where IEnumerable<T> allowed setting the value of an item. ASSUME you could do

IEnumerable<Dog> dogs = new List<Dog>();
IEnumerable<Animal> animals;
animals = dogs;
animals[0] = cats[0]; //assume the Cat type to be a descendant of the Animal type

Accessing dogs[0] would now result in an invalid typecast, because you are trying to treat a Cat object as a Dog! Thankfully, IEnumerable<T> does not let you set the value of an item, which makes it a perfect candidate for covariance. For the animals = dogs statement to be safe in C# 4.0, IEnumerable<T> must never accept a value of the generic type T, whether as a function argument or through a setter (which is what animals[0] = cats[0] did).

In other words, what needs to be enforced is that IEnumerable<T> has no member that accepts an instance of T (or its descendant). If it did, we would be back to allowing animals[0] = cats[0] or animals.AddItem(cat) (declared perhaps as void AddItem(T item)). With that restriction in place, animals = dogs is guaranteed to be valid, which is what C# 4.0 enforces.

The idea is this : some generic interfaces (like IEnumerable<T>) only hand items out and never take items in, so an item can never be replaced with a value of the wrong type. Others only take items in. For these kinds of generic interfaces, C# 4.0 introduces the in and out keywords.

If you had an interface definition of the below kind:

interface Itest<out T>
{
    T GetVal();
}

the new 'out' keyword declares the type T as covariant. This effectively means that T can only be used for values coming out of the interface and never as an IN parameter. Say you tried to add a new method 'GetNewVal' to the above interface as :

void GetNewVal(T someParam);

The compiler would return an error, as the covariant T can be used ONLY to return a value.
Similarly, if you had an interface Itest2 as:

interface Itest2<in T>
{
    void getval(T another);
}

the 'in' keyword declares T as contravariant, such that T can be used only as an inbound entity (usually as an argument). Effectively, if you try to add a new method 'GetYetAnotherVal' as :

T GetYetAnotherVal();

the compiler would return an error.

Within C# 4.0, IEnumerable<T> is declared as :

public interface IEnumerable<out T> : IEnumerable
{
    IEnumerator<T> GetEnumerator();
}

Note the 'out' keyword, indicating that no function within IEnumerable<T> can accept T as input, which makes the interface covariant in T. Outputs are fine (as in GetEnumerator).
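The two keywords can be seen working side by side in the following sketch. It reuses the Animal/Dog hierarchy from the introduction; the interfaces IProducer and IConsumer and the classes implementing them are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public class Animal { public virtual string Name { get { return "Animal"; } } }
public class Dog : Animal { public override string Name { get { return "Dog"; } } }

// 'out T': T appears only in output positions, so IProducer<Dog> converts to IProducer<Animal>.
public interface IProducer<out T> { T Produce(); }

// 'in T': T appears only in input positions, so IConsumer<Animal> converts to IConsumer<Dog>.
public interface IConsumer<in T> { string Consume(T item); }

public class DogProducer : IProducer<Dog>
{
    public Dog Produce() { return new Dog(); }
}

public class AnimalConsumer : IConsumer<Animal>
{
    public string Consume(Animal item) { return "consumed " + item.Name; }
}

public static class VarianceDemo
{
    public static string Run()
    {
        IEnumerable<Dog> dogs = new List<Dog> { new Dog() };
        IEnumerable<Animal> animals = dogs;               // covariance, valid from C# 4.0

        IProducer<Animal> producer = new DogProducer();   // covariance
        IConsumer<Dog> consumer = new AnimalConsumer();   // contravariance

        int count = 0;
        foreach (Animal animal in animals) count++;       // reading items out is always safe

        return consumer.Consume(new Dog()) + ", produced " + producer.Produce().Name
               + ", enumerated " + count;
    }
}
```

The contravariant assignment is safe for the mirror-image reason: anything that can consume an arbitrary Animal can certainly consume a Dog.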

References

1.) http://research.microsoft.com/en-us/um/people/akenn/generics/ECOOP06.pdf
Above paper also at http://research.microsoft.com/pubs/64042/ecoop06.pdf

2.) http://en.wikipedia.org/wiki/Covariance_and_contravariance_%28computer_science%29

Friday, 17 April 2009

DES Uploaded to CodePlex

Have uploaded the initial version of my hobby project Distributed Execution System to http://des.codeplex.com/

Check the homepage at http://des.codeplex.com/ for a brief summary of the project.

Tuesday, 24 March 2009

Boxed & Secured Execution Of a .NET Type

One of the usual needs in an application developer's world is to instantiate a .NET type in a boxed/contained/isolated environment with zero impact on the current application process space. How do we do that? This article shows an easy way that can be readily adopted.

Usual Solution

The immediate answer is to use an application domain - create a new application domain and instantiate the type within it. Sounds straightforward. Sadly, no. If you thought the following lines of code would just work, you are mistaken. To confirm that it does not work, try unloading the app domain and then deleting the loaded assembly from Windows Explorer. It does not let you delete the assembly file. What happened here?

//create the application domain and create an instance of the object
AppDomain clientDomain = AppDomain.CreateDomain("ClientTaskDomain");
Object executionObject = clientDomain.CreateInstanceAndUnwrap("ABC.Test", "ABC.Test.MyTest"); //the type name must be namespace qualified

//find the Execute method and call it.
MethodInfo executionMethod = executionObject.GetType().GetMethod("Execute");
returnData = executionMethod.Invoke(executionObject, null);

As the type ABC.Test.MyTest was not a MarshalByRefObject descendant, the type instance gets loaded into the main application domain. If instead the type ABC.Test.MyTest did descend from MarshalByRefObject, the type would have been instantiated in the 'remote' application domain. That's the way it is designed.

Easy Way Out

Create a proxy type in your application which descends from MarshalByRefObject. Instantiate this proxy in the new application domain and call a routine on it which in turn instantiates the real type. In this way, as the proxy object already lives in the new application domain, the real type too would be created in the new application domain.

AppDomain clientDomain = AppDomain.CreateDomain("ClientTaskDomain");
try
{
AssemblyLoader _aLoader = (AssemblyLoader)clientDomain.CreateInstanceAndUnwrap("XYZ.Test", "XYZ.Test.AssemblyLoader");
returnData = _aLoader.LoadAndRun("ABC.Test", "ABC.Test.MyTest");
}
finally
{
AppDomain.Unload(clientDomain);
}

where AssemblyLoader is defined as an MBR descendant as :

[Serializable]
public class AssemblyLoader : MarshalByRefObject
{
    //loads the assembly, instantiates the type and calls its parameterless Execute method
    public Object LoadAndRun(string assemblyName, string typeName)
    {
        Assembly _assembly = Assembly.Load(assemblyName);
        Type _type = _assembly.GetType(typeName);
        MethodInfo _method = _type.GetMethod("Execute");
        return _method.Invoke(Activator.CreateInstance(_type), null);
    }
}

Using this method, we have made sure that "MyTest" is always instantiated in the new application domain.

Impersonate for Security

All good until now, but how do you make sure the code executes under a user-supplied account? Pretty simple if you know how to authenticate a username/password/domain. Sadly, there is no direct way to perform a Windows authentication in .NET. Not sure why there isn't a "bool WindowsPrincipal.Authenticate(userName, passWord, domain)" routine. No clues. We could go the LogonUser route, but it appears to have certain permission issues on NT/2000 based machines. Hence, let's write one using the NegotiateStream

public static class SSPIHelper
{
  public static WindowsPrincipal LogonUser(NetworkCredential credential)
  {
      TcpListener tcpListener = new TcpListener(IPAddress.Loopback, 0);
      tcpListener.Start();

      WindowsIdentity id = null;
      //signalled once the server side of the handshake has finished, successfully or not
      ManualResetEvent serverDone = new ManualResetEvent(false);

      tcpListener.BeginAcceptTcpClient(delegate(IAsyncResult asyncResult)
      {
          try
          {
              using (NegotiateStream serverSide = new NegotiateStream(
                       tcpListener.EndAcceptTcpClient(asyncResult).GetStream()))
              {
                  serverSide.AuthenticateAsServer(CredentialCache.DefaultNetworkCredentials,
                       ProtectionLevel.None, TokenImpersonationLevel.Impersonation);
                  id = (WindowsIdentity)serverSide.RemoteIdentity;
              }
          }
          catch (Exception)
          {
              id = null; //authentication failed on the server side
          }
          finally
          {
              serverDone.Set();
          }
      }, null);

      bool clientAuthenticated = false;
      try
      {
          using (NegotiateStream clientSide = new NegotiateStream(new TcpClient("localhost",
                       ((IPEndPoint)tcpListener.LocalEndpoint).Port).GetStream()))
          {
              clientSide.AuthenticateAsClient(credential,
                       "", ProtectionLevel.None, TokenImpersonationLevel.Impersonation);
              clientAuthenticated = true;
          }
      }
      catch (Exception)
      {
          clientAuthenticated = false; //wrong credentials or connection failure
      }

      //the identity is only available once the server side has completed too,
      //so wait for it instead of spinning on a shared flag
      if (clientAuthenticated)
          serverDone.WaitOne();

      tcpListener.Stop();
      return (clientAuthenticated && id != null) ? new WindowsPrincipal(id) : null;
  }
}

OK, we have a Windows principal. Now what? Impersonate it to execute the code under this principal, which happens to be the easy bit.

WindowsIdentity newId = (WindowsIdentity)windowsPrincipal.Identity; //the one received from SSPIHelper
WindowsImpersonationContext impersonatedUser = newId.Impersonate();

This makes sure that the code following the above Impersonate() call runs under the provided identity. To revert back to the original identity, just call Undo() on the impersonation context.

So effectively what we now have is an isolated and safe execution of a type provided by the client, using the credentials supplied by them. To summarize, the code should look similar to this:

//authenticate the client supplied credentials
WindowsPrincipal windowsPrincipal = SSPIHelper.LogonUser(credentials);
WindowsIdentity newId = (WindowsIdentity)windowsPrincipal.Identity;

//impersonate
WindowsImpersonationContext impersonatedUser = newId.Impersonate();
try
{
//create the application domain and create an instance of the object
AppDomain clientDomain = AppDomain.CreateDomain("ClientTaskDomain");
try
{
 //use the proxy MBR object
 AssemblyLoader _aLoader = (AssemblyLoader)clientDomain.CreateInstanceAndUnwrap("XYZ.Test", "XYZ.Test.AssemblyLoader");
 returnData = _aLoader.LoadAndRun("ABC.Test", "ABC.Test.MyTest");//call the client's method
}
finally
{
 AppDomain.Unload(clientDomain);
}
}
finally
{
impersonatedUser.Undo();//back to the normal a/c
File.Delete(assemblySaveLocation);//just to clean up things, clean the client's assembly too.
}

Wednesday, 5 March 2008

Injections

Dependency Injection:

Dependency Injection refers to the process by which functional components ('concerns' in AOP terms) are introduced into an object such that the object can use their functionality. Say you had a Customer Management module, and one of its functions was to audit the names of all users who updated a customer record. In the simplest terms, we might have the following classes :

MyBusinessObject - A base class for each of the business entity classes.
Customer - The business entity containing the customer details - name, DOB, address etc. Derived from MyBusinessObject.
CustomerManager - Manages all business functions related to the customer, say adding, deleting, searching, modifying etc.
MySimpleAudit - A class which audits the various operations.

In the simplest case, the CustomerManager class would directly instantiate the MySimpleAudit class and call the appropriate Audit function. All good. If we have more Audit classes, say StackTraceAudit (which audits the stack trace too.. why? I don't know) and ObjectStreamAudit (audits the current state of the object), standard design logic would call for an interface to separate the functionality out, in our case perhaps an IAudit interface.

interface IAudit
{
    void AuditDelete(MyBusinessObject item);
    void AuditCreate(MyBusinessObject item);
    void AuditModify(MyBusinessObject item);
}

We would then make sure that all our Audit classes (MySimpleAudit, StackTraceAudit, ObjectStreamAudit) implement this interface. The only confused class is the CustomerManager class, which does not know which IAudit implementation to use. Of course, it could depend on a configuration entry to get the audit class name, or it could just be hard-coded to use one of the classes.

What if we could directly tell the CustomerManager which audit object to use? This is the crux of dependency injection: injecting an object (the IAudit instance) into another object (the CustomerManager) such that the receiving object can use the functionality of the injected object.

You could pass the instance of the injected object in three standard ways: as part of the constructor, through a property, or through an interface definition.

For our customer example, passing the object via a constructor would be along the lines of :

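class CustomerManager
{
    private readonly IAudit _audit;

    //the audit implementation is handed in by whoever creates the manager
    public CustomerManager(IAudit audit)
    {
        _audit = audit;
    }

    public void DeleteCustomer(Customer customer)
    {
        //...delete the customer, then audit the operation
        _audit.AuditDelete(customer);
    }
}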

In this case, when the CustomerManager class is instantiated, the right audit instance is passed along, e.g. new CustomerManager(new StackTraceAudit())

Stuff noted:
1.) CustomerManager is not disturbed by any changes/additions to the IAudit implementations.
2.) Any new IAudit implementation class can be created without affecting the consuming class.
3.) What is depicted is effectively an 'Inversion of Control': the control of locating and creating the audit class is inverted to a different object.
4.) This pattern moves the logic of which object to use, and where it comes from, out of the consumer object.

Policy Injection

For a similar scenario as above, assume you had a single Audit object consumed by CustomerManager. Now if there is a new requirement to include the functionality of StackTrace and ObjectLogging in the audit system too, what would you do?

Though there are numerous immediate solutions to get the stuff working (modify the existing Audit class to call the other audits as well, create yet another master class which calls all the audit objects etc), Policy Injection calls for creating a proxy class for the currently available Audit class. It is this proxy class which gets used by the CustomerManager object instead of the Audit object.

The CustomerManager object might end up using a Factory pattern or a dependency injection pattern (!) to get the right proxy audit class (reread this line till it makes sense.)

Now interestingly, what the Audit proxy class would do is this:
1.) On the way in (when the request for audit happens), it would call the 'Pre' step routines of all registered audit handlers (StackTrace, ObjectStream etc) in sequence and finally call the original Audit class routines.
2.) On the way back (when the request for audit is done), it might call further 'Post' step routines on each of the handlers in the reverse order.

As seen, from the point of view of the CustomerManager, it is dealing with only one object, which is the new Audit proxy object. Whenever it calls the proxy Audit class to audit, all the handlers perform their audit (in the pre/post routines) around the call to the original Audit object.
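The pre/post flow described above can be sketched as follows. This is only an illustration of the idea, not the API of any particular policy injection library: IAuditHandler, NamedHandler, SimpleAudit and AuditProxy are made-up names, and each step merely records its name in a shared log so that the call order is visible.

```csharp
using System;
using System.Collections.Generic;

public class MyBusinessObject { }
public class Customer : MyBusinessObject { }

public interface IAudit { void AuditDelete(MyBusinessObject item); }

// Hypothetical handler contract: one 'Pre' and one 'Post' step per handler.
public interface IAuditHandler
{
    void Pre(MyBusinessObject item);
    void Post(MyBusinessObject item);
}

// Stand-in for handlers such as StackTrace or ObjectStream; it only logs its steps.
public class NamedHandler : IAuditHandler
{
    private readonly string name;
    private readonly List<string> log;
    public NamedHandler(string name, List<string> log) { this.name = name; this.log = log; }
    public void Pre(MyBusinessObject item) { log.Add(name + ".Pre"); }
    public void Post(MyBusinessObject item) { log.Add(name + ".Post"); }
}

// The original audit class that the proxy wraps.
public class SimpleAudit : IAudit
{
    private readonly List<string> log;
    public SimpleAudit(List<string> log) { this.log = log; }
    public void AuditDelete(MyBusinessObject item) { log.Add("SimpleAudit.AuditDelete"); }
}

// The proxy implements IAudit itself, so the consumer cannot tell the difference.
public class AuditProxy : IAudit
{
    private readonly IAudit target;
    private readonly IAuditHandler[] handlers;

    public AuditProxy(IAudit target, params IAuditHandler[] handlers)
    {
        this.target = target;
        this.handlers = handlers;
    }

    public void AuditDelete(MyBusinessObject item)
    {
        foreach (IAuditHandler handler in handlers)       // way in: registration order
            handler.Pre(item);

        target.AuditDelete(item);                         // the original audit

        for (int i = handlers.Length - 1; i >= 0; i--)    // way back: reverse order
            handlers[i].Post(item);
    }
}
```

Because AuditProxy is just another IAudit, it can be handed to the CustomerManager through its constructor exactly like any other audit implementation.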

Tuesday, 3 April 2007

Survival tips for the 'common programmer'

Learn Learn
For the 'common programmer' [a variant of R.K. Laxman's 'Common Man'], the importance of a good foundation in computer science and of continuously upgrading your knowledge cannot be emphasised enough. The reasons for this post are the many interviews conducted over the past few days, which have been very disappointing, and a few talks with my colleagues.

In addition to computer science fundamentals, what definitely appears to be missing in many software professionals is the passion to learn stuff, the desire to look into the details to know how things work.

The basic foundation, which should at least have covered computer architecture, OS fundamentals, networking fundamentals, programming concepts and constructs (for a more exhaustive list, check out the syllabus of any B.Tech or B.Sc Computer Science course), seems to be missing.

Second, you need to be aware of what's around and happening in this field. Now how would you do that? Subscribe to postings via a good RSS reader - Google Reader is a good option. Nearly all websites support RSS subscriptions. Most importantly, make sure you read through them periodically.

Information Overload ?
Now, while reading through all this stuff, how do you make sure it's relevant to you? There is no way anyone could read and understand every topic (that would take 25hrs daily... behind bars perhaps?). An easy option is to skip the details of the implementation but be aware of the concept; as in, know the fundas. Go in deeper only when the posting itself is interesting enough.

The same logic applies to newsgroup postings; subscribe to newsgroups which appear interesting, but be aware of what needs a closer read. The experts appear to 'read between the lines'; you could skip paragraphs and sentences while reading through an article to get an overall idea. If it does appear interesting, go back and read all the lines.

Overall, just make sure you are updated - make the above two steps a habit :)

Look Further
Now, when you learn something new, make sure you delve deeper than the skin to understand the hows and whys. These two questions should clear a lot of doubts about why the stuff is there in the first place and how it solves the problem.

E.g. most of us appear to know that the foreach construct in C# lets you loop through each item in a collection (solves the 'why' part). All good. Now, how does it do it, and how can I make my own System.Object descendant usable within the foreach construct? Enter the IEnumerable interface.

Another one - threads in C# let me run jobs in parallel (the 'why' part). Now how does the CLR manage user threads? Did you know that a thread need not be created at the OS level each time you need to run a job in the background? Enter the thread pool managed by the CLR.

What needs to be stressed is the importance of going deeper into anything you learn by answering the above two questions each time.
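To make the foreach example concrete, here is a small sketch of a class that becomes usable in foreach simply by implementing IEnumerable<int>. The Countdown class is made up for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical example type: a countdown that can be used directly in foreach.
public class Countdown : IEnumerable<int>
{
    private readonly int start;

    public Countdown(int start) { this.start = start; }

    // foreach calls GetEnumerator(); 'yield return' lets the compiler build the enumerator.
    public IEnumerator<int> GetEnumerator()
    {
        for (int i = start; i >= 0; i--)
            yield return i;
    }

    // The non-generic version is required by IEnumerable and just forwards the call.
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

With this in place, foreach (int n in new Countdown(3)) visits 3, 2, 1 and 0 in that order; behind the scenes the compiler turns the loop into calls to GetEnumerator(), MoveNext() and Current.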

All the best fellow programmers. Would like comments on how you guys learn and update yourself.
