Event-driven architecture is required to quickly and effectively react to real-time events in your organizational landscape. We’ve previously talked about the benefits of event-driven architecture, but also highlighted that several architectural components and investments may be required to unlock the benefits of this concept. Stream processing is one such component and a great tool for working with real-time events as they are produced. We’ve covered the concept of stream processing in a previous blog. This technical blog guides you through the basics of developing a stream processor and what to consider during development.
To develop a stream processor in eMagiz, a few key components are required. You start by building your foundation, the messaging infrastructure. Next, you define your stream processor logic, after which you need deployment infrastructure. Finally, you need to manage your stream processing application to enforce security and access throughout all your applications.
Start at the foundation, your event streaming infrastructure
Before you can start with stream processing, it is a key requirement to have an event broker that distributes events in real-time. Such an event broker must be highly scalable, fault tolerant, and provide ‘exactly once’ delivery guarantees. You may have never heard of these three attributes, so let’s discuss them first.
High scalability
Being highly scalable means that the broker must support high data throughput without introducing delays, and must also support horizontal scaling of producers and consumers to prevent bottlenecks, for instance by distributing incoming events across multiple instances of a single consumer. Scalability in the broker itself is usually achieved by spinning up multiple broker instances and distributing incoming events to the subscribers in parallel.
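As an illustration, here is a minimal sketch (in Java, using the standard Kafka client) of a consumer that joins a consumer group. The broker address, topic name and group id are assumptions for the example; the point is that every instance started with the same group.id joins the same group, and Kafka divides the topic’s partitions across the running instances.

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ScalableConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        // Every instance started with the same group.id joins the same consumer group;
        // Kafka splits the topic's partitions across the running instances.
        props.put("group.id", "order-processors");           // hypothetical group id
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));            // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```

Because Kafka assigns partitions to group members automatically, scaling out is simply a matter of starting additional copies of this program; no code changes are required.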
Fault tolerance
This means that you want to be able to recover from failure, either in your infrastructure or within one of your consumers. On the infrastructure side, fault tolerance should ensure that if part of the hardware infrastructure of the event broker goes down, the event broker can continue to function without data loss and without a significant impact on performance. Replication and lineage are two key techniques used to achieve this. Additionally, when consumers go offline, the broker should ensure that data is not lost. Retention is a key technique here: it allows consumers to go offline temporarily and then resume processing where they left off.
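In Kafka, both of these properties are configured per topic: replication protects against broker failures, and retention keeps data available for consumers that temporarily go offline. Below is a minimal sketch using Kafka’s AdminClient; the broker address, topic name, partition count and retention period are assumptions for the example.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Map;
import java.util.Properties;
import java.util.Set;

public class TopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions, each replicated to 3 brokers: losing one broker costs no data.
            NewTopic orders = new NewTopic("orders", 6, (short) 3);
            // Keep events for 7 days so consumers that go offline can resume where they left off.
            orders.configs(Map.of("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)));
            admin.createTopics(Set.of(orders)).all().get();
        }
    }
}
```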
Delivery guarantees
Delivery guarantees define how often an incoming event may be processed. Fault tolerance is one of the key factors in achieving ‘exactly once’ delivery. Many frameworks only support ‘at most once’ or ‘at least once’ delivery. However, for crucial business data, such as financial transactions, stricter processing semantics are crucial: every event must be processed exactly one time.
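In Kafka, exactly-once delivery between topics is built on idempotent, transactional producers combined with consumers that only read committed data. The sketch below is illustrative; the broker address, topic name and transactional id are assumptions.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ExactlyOnceProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");        // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("enable.idempotence", "true");                 // retries cannot create duplicates
        props.put("transactional.id", "payments-producer-1");    // hypothetical id

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            // Both records are delivered atomically: consumers reading with
            // isolation.level=read_committed see either all of them or none.
            producer.send(new ProducerRecord<>("payments", "tx-1", "debit 100"));
            producer.send(new ProducerRecord<>("payments", "tx-1", "credit 100"));
            producer.commitTransaction();
        }
    }
}
```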
Kafka is the most popular event broker that supports all of these demands: an open-source framework that can be used to create a distributed publish/subscribe messaging infrastructure for real-time communication. Kafka uses a concept called partitioning so that clients can read and write data from many brokers at the same time. These partitions are replicated to ensure availability and fault tolerance.
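The sketch below shows partitioning from a client’s perspective: records with the same key always land on the same partition, while different keys can be spread over partitions, and thus over brokers. The broker address and topic name are assumptions, and the topic is assumed to have multiple partitions.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class PartitionedProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (String customerId : new String[] {"customer-1", "customer-2", "customer-3"}) {
                // Records with the same key always land on the same partition, so all events
                // of one customer stay in order, while different customers can be handled
                // by different brokers and consumer instances in parallel.
                RecordMetadata meta = producer
                        .send(new ProducerRecord<>("orders", customerId, "order placed"))
                        .get();
                System.out.printf("key=%s -> partition=%d%n", customerId, meta.partition());
            }
        }
    }
}
```

The choice of key therefore determines both the ordering guarantees and how evenly load is spread across brokers.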
Kafka is open source and free to deploy on your own hardware or in the cloud environment of your choice. However, for enterprise applications it can be beneficial to opt for Kafka as a service, such as eMagiz Event Streaming, to ensure uptime, stability and enterprise-grade management functionality.
Next, define your stream processor
You’ve finished setting up your streaming infrastructure; the next step is to define your stream processor. There is a large landscape of options for defining stream processing applications. Again, we must consider non-functional attributes of our stream processor, such as fault tolerance, scalability and delivery guarantees. Additionally, we must consider other aspects such as deployment models, batch use cases and lookups.
Similar to your event streaming infrastructure, your processor must guarantee fault tolerance, scalability and delivery guarantees. This can be achieved through distributed instances of the processor that work together towards a shared goal. Some processors, such as Kafka Streams, rely heavily on the distribution capabilities of the event streaming infrastructure to achieve scalability, while others, such as Apache Flink, use internal distribution mechanisms. Using shared state stores, lineage and other fault recovery methods, stream processors achieve fault tolerance in a distributed manner as well.
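To make this concrete, here is a minimal Kafka Streams sketch that counts events per key. The topic names and application id are assumptions; the point is that every instance started with the same application.id joins the same group, Kafka spreads the partitions (and the matching slices of the state store) over the instances, and the store is backed up to a changelog topic for fault tolerance.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class OrderCountProcessor {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-count-processor"); // hypothetical id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");               // hypothetical input topic
        orders.groupByKey()
              .count()   // state lives in a local store, backed up to a changelog topic in Kafka
              .toStream()
              .to("order-counts", Produced.with(Serdes.String(), Serdes.Long())); // hypothetical output topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```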
The deployment models
Stream processors vary widely in their deployment models. Kafka Streams can be deployed standalone, on a per-instance basis; multiple instances automatically recognize each other and work together towards their shared goal. Typically, however, stream processing frameworks require a central coordinator to control all instances, such as Zookeeper, which manages the individual instances for processing. Apache Flink is an example of this. The suitability of the deployment model depends on the task at hand. For highly variable throughput rates, clustered solutions can be excellent due to their ability to automatically scale and coordinate when needed. However, when throughput is stable and processing must happen at various locations in the infrastructure, standalone processors are more suitable.
The deployment model also impacts the type of workload a stream processor is able to handle. Standalone processors are more suitable for event-based workloads, such as filtering and aggregations, while clustered solutions are more suitable for collaborative use cases such as enrichment, as well as for time-triggered events and batch cases, as they do not depend on incoming events to trigger jobs. Furthermore, standalone processors are not at all suitable for lookups or iterative problems that require an analysis of the whole dataset, as they only have access to individual events rather than the complete dataset.
Overall, what type of processor to use depends on your individual use case, as well as your ability to manage individual instances or host a cluster of processing power. While standalone processors are easier to deploy, they are harder to maintain and scale once deployed. Clustered solutions have a higher barrier to get started. To lower this barrier, processing services are available that abstract away infrastructure-level deployment choices and provide management, deployment and scaling out of the box.
Keep your engines running: management, monitoring and governance
Once you have established your infrastructure and deployed your stream processing application, you have your stream processor running. But will it keep working? And as your ecosystem grows, how do you keep it manageable? A crucial step that is sometimes forgotten in the lifecycle of stream processing applications is the management phase. We’ll discuss a few key things to consider.
Testing & migration
Before deploying any new version, make sure to test your stream processor extensively, not only by testing the logic itself, but also by deploying it to a test environment so your stream processor can be tested with actual data. As stream processors consume live data and have little tolerance for downtime, testing can also help you optimize the deployment process and ensure smooth migrations to newer versions with different business logic or different input requirements.
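For Kafka Streams, the kafka-streams-test-utils module lets you exercise a topology without a running broker, which is useful for testing the logic itself before moving to a test environment with real data. The sketch below uses a toy topology and hypothetical topic names to show the idea.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.TopologyTestDriver;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

import java.util.Properties;

public class StreamProcessorTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "topology-test");   // dummy values, no broker needed
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "dummy:1234");

        try (TopologyTestDriver driver = new TopologyTestDriver(buildTopology(), props)) {
            TestInputTopic<String, String> input =
                    driver.createInputTopic("orders", new StringSerializer(), new StringSerializer());
            TestOutputTopic<String, String> output =
                    driver.createOutputTopic("large-orders", new StringDeserializer(), new StringDeserializer());

            // Feed a test event and verify the business logic before it ever touches live data.
            input.pipeInput("customer-1", "25000");
            System.out.println(output.readKeyValuesToList());   // expect the large order to pass the filter
        }
    }

    // A toy stand-in for the production topology: keep only orders above 10,000.
    static Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()))
               .filter((key, value) -> Long.parseLong(value) > 10_000)
               .to("large-orders", Produced.with(Serdes.String(), Serdes.String()));
        return builder.build();
    }
}
```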
Monitoring
Depending on your choice of stream processing framework and deployment model, you will have different options for monitoring your applications. Make sure to set up an infrastructure for collecting errors, and trigger the right actions on them so that any failures are detected immediately. All stream processing frameworks provide you with the option to tap into the exception stream, but by default they may simply ignore it, compromising your delivery guarantees. Similarly, monitoring the metrics stream helps you scale your application appropriately and ensure that throughput times can be maintained. External tools (e.g. Grafana) can help you turn metrics and exceptions into easy-to-use dashboards with alerting, to assist you with monitoring your stream processors.
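As an illustration for Kafka Streams (assuming version 2.8 or later for the uncaught exception handler API), the sketch below shows where the exception and metrics hooks live; the handler choices and the logging are placeholders for your own alerting.

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.errors.LogAndFailExceptionHandler;
import org.apache.kafka.streams.errors.StreamsUncaughtExceptionHandler;

import java.util.Properties;

public class MonitoringSetup {
    // Call before building the KafkaStreams instance.
    static void configureExceptionHandling(Properties props) {
        // Fail fast (rather than silently skipping) on records that cannot be deserialized,
        // so delivery guarantees are not quietly compromised.
        props.put(StreamsConfig.DEFAULT_DESERIALIZATION_EXCEPTION_HANDLER_CLASS_CONFIG,
                  LogAndFailExceptionHandler.class);
    }

    // Call after building the KafkaStreams instance, before start().
    static void attachMonitoring(KafkaStreams streams) {
        // Tap into the exception stream: forward to your alerting and replace the failed thread.
        StreamsUncaughtExceptionHandler handler = exception -> {
            System.err.println("Stream thread failed: " + exception);   // replace with real alerting
            return StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.REPLACE_THREAD;
        };
        streams.setUncaughtExceptionHandler(handler);

        // Built-in metrics (throughput, lag, task states) can be scraped from streams.metrics()
        // or via JMX, and visualized in tools such as Grafana.
        streams.metrics().forEach((name, metric) ->
                System.out.println(name.name() + " = " + metric.metricValue()));
    }
}
```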
Management
Management is important in order to act when needed, and to enforce security and access throughout your applications. Especially as your number of stream processors grows, it is key to manage which stream processors have access to which data, and who is in charge of managing that data and the processor logic. This includes not only access rights and security, but also responsibilities and roles for scaling and monitoring.
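If your broker is Kafka with an authorizer enabled, access can be governed with ACLs. The sketch below grants a hypothetical service account read access to a single topic via the AdminClient; the principal, topic and broker address are assumptions.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

import java.util.Properties;
import java.util.Set;

public class AccessControl {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // Allow only the order-processor service account to read the orders topic.
            AclBinding readOrders = new AclBinding(
                    new ResourcePattern(ResourceType.TOPIC, "orders", PatternType.LITERAL),
                    new AccessControlEntry("User:order-processor", "*",
                            AclOperation.READ, AclPermissionType.ALLOW));
            admin.createAcls(Set.of(readOrders)).all().get();
        }
    }
}
```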
The monitoring and management side of stream processing is currently lacking in most major stream processing frameworks. Therefore, it is crucial to integrate this into your own stream processing solution, or into other monitoring tools in your application landscape. A platform like eMagiz can help you develop, monitor and manage your stream processing solutions out of the box, with a wide variety of options for deploying, securing, regulating and monitoring your environment, including alerting. eMagiz helps you focus on your business solution instead of the intricacies of event processing. Are you curious about the possibilities for your organization? Give us a call, we are happy to help!
By Mark de la Court, Software Developer @ eMagiz