Showing posts with label Distributed Systems. Show all posts
Showing posts with label Distributed Systems. Show all posts

Monday 19 October 2020

Distributed state management — refresher

  • Ease of Stateless Services

    Designing stateless distributed systems are relatively easy. You would have raised an event/message once the service had done processing. You typically are not worried about how the other systems consume your data. In fact in the majority of these scenarios, you don’t care what happens to the message/event after you were done.

    Think of fire and forget. This could be easily achieved using typical message broker queues & topics. Sadly, this stateless approach is quite bookish and not practical for most enterprise application needs.

    Pains of Stateful Services

    State could be as straightforward as an entity-update which another service is dependent on or a bit more trickier one as an amount being debited from one account and credited to another. This information/state needs to be maintained somewhere.

    In 2 / 3 tier applications, we had a central transaction/state coordinator/server through which all transactions must flow. In case of errors, it was the task of this coordinator to rollback all the child tasks relevant in the transaction context.

    One of the problems with this approach was about a single point of failure (SPoF) — what if this server went down — what happens to the workflow/process state ?

    In distributed architectures, we would require a distributed state log which is virtually “centralized”, but practically distributed across nodes/pods/VMs for enabling both availability and scalability.

    One of the main challenge raised in this distributed situation is state consistency :

    • How do we make the state synced across the nodes such that all nodes return the same state even if queried separately? — strong consistency.
    • Staleness/freshness — Is the state returned older than the state at other nodes ?
    • During a retry operation (say when part of the workflow failed), what is the impact of executing the service again ? Processing duplicate messages should not affect the underlying entity ever — think of 2 debit messages against the same account during a retry. Is the old state overwritten with new or should it be ignored ?

    Approaches

    There are many approaches employed these days for state management :

    Coordinators/Orchestrators : a set of system(s) that manages the state — think of an orchestrator in a music symphony.

    • Distributed locks : What if nodes in a cluster can elect a leader among themselves such that leader is the only one that can change the state. Paxos/Chubby/Raft being some of the prominent algorithms with many implementations.
    • 2-Phase commit : Employed mostly in systems migrating from 2 / 3 tier to the cloud; requires two cycles of requests across ALL participants — First cycle preparing the for commit — PrepareForCommmit and second the commit itself — CommitNow.
    • Eventual Consistency : state as received is processed as long it’s more newer than the one it is aware. All nodes are NOT required to be aware of the most recent state at any point in time. Once all of the applicable process completes, the eventual state is available in the DB/Cache. The UI depicts information available to it, though stale. For a specific workflow, UI would depict states as ‘InProgress’ and later as ‘Completed’ once all the tasks in the workflow are identified as completed or after a timeout. This approach is applicable for information that is not too business critical, where clients are OK with information that is stale.
    • Optimistic/TimeStamp based : If the timestamp of the state received is newer than the one available in the data store, the node applies it as a new state. As in most cases, locks might need to be applied on the DB record to make sure no one else is simultaneously applying.
    • Event Source Logs — EventSourcing depends on an append only store for all events. This distributed event-log could in fact be used to derive the state of an entity at a point in time by evaluating the events until a point in time. Snapshots of the state at a point in time can definitely speed up this evaluation of state.

    Depending on the consistency requirement, it’s usually a mix of the above approaches used for distributed state management.

    Technology Options

    Writing frameworks/libraries from scratch that address above challenges are complex and not recommended unless you are a software company with serious software research focus.

    Across the spectrum, there are interesting frameworks and stacks that can assist the ‘common engineer’ (derived from ‘common man’). As each framework has their (dis)advantages for adoption, based on the technical capability of the team, hybrid solutions too can be looked into.

    • Azure Durable Functions based orchestration : OrchestrationTrigger, when applied on services, exploits the capabilities of Durable Azure Functions that automatically orchestrate the state. The fundamental idea being, the context/state is available for all of the functions(services) participating in the workflow. For developers, this is like writing a 2 tier application with a single try-catch block to handle any error/state across libraries. Instead of libraries, you are calling services with the entire context of execution available to you across the services. From the AWS world, we now have the Step Functions which behaves the same.
    • Reliable Actors, Reliable Collection built over Service Fabric/Azure Service Fabric (SF) support orchestration for getting/updating state information across the distributed nodes while hiding away the intricacies on how nodes internally communicate and keep themselves in sync. This is a pretty good option if you have no plans to support multi cloud. Though it’s conceptually possible to host SF onto AWS, no assurance on the level of compatibility today. Possibly satisfying CP in CAP.
    • Orleans research project from Microsoft needs a special reference here as both SF and Orleans are based on the actor model though the design is different.
    • Akka and related Akka.Net : An actor model where a conductor/parent actor is internally aware of its child actors within a cluster. The model can be exploited for various distributed state management together with its supported persistent actor and singleton model. Compared with the Service Fabric model, SF does not support this parent-child relationship. (PS: if you are from a .NET background, do check Akka.net. Further ahead, together with Azure, Akka.net deployed on AKS pods is a cool experiment for massively scalable state management needs). CAP is deterministic here based on the storage/persistence store selected.
    • Dapr.io : Supporting strong consistency (all nodes must be in sync) for the state management, Dapr is deployed typically as a sidecar and does not disturb the service code (unlike Akka.net where runtime is part of the service). (do check out similar service mesh offerings like Istio too)
    • Kafka Streams , HazelCast Jet : acting as orchestrators, these stream processing engines have intrinsic support to make sure a message in the cluster is processed ‘exactly-once’. This out of the box feature can be exploited to manage state as you don’t need to worry how the set of nodes are talking with each other to reach an agreement internally. Intricacies on how the nodes in the cluster message each other over queues to reach an agreement is completely abstracted away. Possibly satisfying CP in CAP.
    • Axon, Eventuate.io, Camunda , Netflix Conductor : similar to the way above ServiceFabric/Akka/Kafka streams function, these too hide away the internals of inter node state synchronization and could be looked into.
    • NServicebus : the supporting framework requires an explicit transaction start call such that it can handle rest of the related messages in the transaction. Internally it can lock all related messages for a forced sequencing like a funnel in case there are too many consumers. Possibly satisfying CP in CAP.
    • Redis Locks : Use Redis for achieving a distributed lock before changing state.

    It’s highly recommended to check with your architect team who could weigh-in the features while considering the characteristics/NFR’s and KPI’s of your system. The underlying storage/persistence later for each of the above set of frameworks/services directly effect the CAP. In cases the framework allows for choosing a persistence store, must review whether its CP or AP of the CAP that you are planning to satisfy for your service.

    References

Tuesday 13 October 2020

EventChain

Applying Blockchain to Event Sourcing

Event Sourcing pattern at the core requires an event store to maintain the events. What if we add these events as it arrives into a blockchain ? This should effectively make sure the events have not been tampered with. The plan would be to initiate typical blockchain mining after which the event is added to the “block-chain of events” — an “EventChain”.

The definite side effect is that until the mining is complete, the business transaction cannot be internally marked as complete. Considering the time typically taken for mining, this would probably be an offline job.

Tamper Proof

The typical challenges faced by organizations who employ event sourcing and the event store is about securing the events. What if the DB admin for the event store manages to inject/remove events ? The replayed events and resulting projections are no longer valid in this case. Event chains should solve this issue for typical event stores.

Exploit the distributed infrastructure.

For private event chains , where businesses do not want the chain nor events to be exposed, existing distributed systems/hosts can be exploited for mining. Your event store DB cluster hosts, event sourcing services hosts, API hosts, cache cluster hosts and others that are spread across geography could be exploited for the same.

GDPR Challenges

There are cases where regulations require personal data to be removed from all data stores. In our case, this is about removing the related set of events from the event chain. Without the event chain, removing events from the event store was quick and easy.

Resetting the event chain when events are required to be deleted is challenging especially if there have been many events after the event(s) in concern. This would require re-mining the rest of events after removing the event(s) that had personal data all the way down to the most recent event. As this is an extremely time and compute intensive operation, it’s not recommended to store events that contain personal data in the event chain.

Snapshots

As the events from the event store can be played back to recreate a state at a point in time (“projections”), we could in fact have “snapshots” to identify a specific projection in time. We could link this snapshot as a child branch to the main event chain tree such that it’s not required to recalculate the projections each time; while making sure the projections themselves have not been tampered with.

We could look at having many child branches/trees for the different filters/conditions too.