Fast Data Architectures for Streaming Applications

Fast Data Architectures for Streaming Applications is a free report co-published by Lightbend and O'Reilly on the architectural characteristics of highly available, resilient, scalable, and responsive systems for data stream processing at scale.

The report surveys the common requirements for reliable streaming systems, such as coping with potential data loss, duplicate records, and late-arriving data. General system concerns, the so-called reactive principles, are also important for such systems, whose services and applications must run reliably for weeks, months, or even years.
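Two of these requirements, duplicates and late arrival, can be illustrated with a small in-memory filter. This is a minimal sketch, not code from the report: it assumes every event carries a unique producer-assigned `event_id` and an integer `event_time`, and the names `Event`, `DedupWithLateness`, and `allowed_lateness` are hypothetical. It tracks a simple watermark (the highest event time seen minus an allowed lateness) and rejects anything older. Data loss is not addressed here.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Event:
    event_id: str    # unique ID assigned by the producer (an assumption of this sketch)
    event_time: int  # when the event occurred, in seconds
    value: int


class DedupWithLateness:
    """Hypothetical filter: drops duplicate events and events behind a simple watermark."""

    def __init__(self, allowed_lateness: int) -> None:
        self.allowed_lateness = allowed_lateness
        self.max_event_time: Optional[int] = None  # highest event time accepted so far
        self.seen_ids: Set[str] = set()            # IDs already accepted (never pruned here)
        self.late: List[Event] = []                # events rejected as too late

    def watermark(self) -> Optional[int]:
        # Events with an event time below this value are treated as too late.
        if self.max_event_time is None:
            return None
        return self.max_event_time - self.allowed_lateness

    def accept(self, event: Event) -> bool:
        if event.event_id in self.seen_ids:
            return False  # duplicate, e.g. redelivered after a producer retry
        wm = self.watermark()
        if wm is not None and event.event_time < wm:
            self.late.append(event)
            return False
        self.seen_ids.add(event.event_id)
        if self.max_event_time is None or event.event_time > self.max_event_time:
            self.max_event_time = event.event_time
        return True


# Usage: "a" arrives twice, and "c" arrives long after newer events.
events = [
    Event("a", 10, 1),
    Event("b", 12, 2),
    Event("a", 10, 1),
    Event("c", 3, 5),
    Event("d", 9, 4),
]
dedup = DedupWithLateness(allowed_lateness=5)
accepted = [e for e in events if dedup.accept(e)]
print([e.event_id for e in accepted])    # ['a', 'b', 'd']
print([e.event_id for e in dedup.late])  # ['c']
```

The `seen_ids` set grows without bound in this sketch; a long-running service of the kind described above would need to expire IDs once they fall behind the watermark.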

Apache Kafka is the messaging backbone of these architectures, providing high scalability and reliability for ingesting data organized into topics (each split into ordered partitions), similar in use to conventional message queues.
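In Kafka, each topic is divided into partitions, and ordering is guaranteed only within a partition, so records that share a key are routed to the same partition to keep their relative order. The toy model below sketches that idea in memory; it is not the Kafka client API. `InMemoryTopic`, its methods, and the topic name `clicks` are hypothetical, and CRC32 stands in for the murmur2 hash that Kafka's Java producer applies to keys by default.

```python
import zlib
from typing import List, Tuple


class InMemoryTopic:
    """Hypothetical toy model of a topic split into append-only, ordered partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def partition_for(self, key: str) -> int:
        # Same key -> same partition. CRC32 is a stand-in; Kafka's Java
        # producer hashes keys with murmur2 by default.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key: str, value: str) -> Tuple[int, int]:
        # Returns (partition, offset); offsets grow by one within each partition.
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def read(self, partition: int, from_offset: int = 0) -> List[Tuple[str, str]]:
        # A consumer resumes reading from the offset it last committed.
        return self.partitions[partition][from_offset:]


topic = InMemoryTopic("clicks", num_partitions=3)
for i in range(5):
    topic.append("user-1", f"event-{i}")
topic.append("user-2", "event-x")

p = topic.partition_for("user-1")
user1_values = [v for k, v in topic.read(p) if k == "user-1"]
print(user1_values)  # ['event-0', 'event-1', 'event-2', 'event-3', 'event-4']
```

Because all `user-1` records land in one partition, a consumer of that partition sees them in the order they were appended, while records for other keys may be spread across partitions and consumed in parallel.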

The data can then be processed by one or more stream processors, including the following:

  • Apache Spark, which offers a rich variety of processing options, including SQL, on top of a minibatch processing model.
  • Apache Flink, which offers low-latency (vs. minibatch) processing with rich semantics for reliable processing.
  • Akka Streams, which offers very low-latency event processing with rich integrations with other systems.
  • Kafka Streams, for low-latency processing of Kafka topics.
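The latency difference between minibatch and event-at-a-time processing can be made concrete with a deliberately simplified model. This sketch assumes results are emitted only when a batch interval closes and ignores processing time entirely; it does not model how Spark or any other engine actually schedules work. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def group_into_microbatches(arrival_times: List[int], interval: int) -> Dict[int, List[int]]:
    """Assigns each arrival time to the half-open batch [k*interval, (k+1)*interval)."""
    batches: Dict[int, List[int]] = defaultdict(list)
    for t in arrival_times:
        batches[t // interval].append(t)
    return dict(batches)


def microbatch_latencies(arrival_times: List[int], interval: int) -> List[int]:
    """Wait from each arrival until its batch closes (processing time ignored)."""
    return [(t // interval + 1) * interval - t for t in arrival_times]


arrivals = [1, 4, 9, 12]
print(group_into_microbatches(arrivals, interval=5))  # {0: [1, 4], 1: [9], 2: [12]}
print(microbatch_latencies(arrivals, interval=5))     # [4, 1, 1, 3]
```

In this idealized model, each event waits up to one full batch interval before its result is available, whereas an event-at-a-time engine would emit each result on arrival, leaving only the processing cost itself.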

Rounding out the picture are tools for building other microservices, such as the Lightbend Reactive Platform, as well as management and monitoring tools.
