Echo Kanak

Consuming and Visualizing Real-Time Event Streams

Apr 7, 2025

In Part 1, I simulated user journeys on a fake shopping platform: clicks and transactions. Events were flying into the Kafka topics users, clickstream, and transactions.

Now we are going to consume data from these topics and put it to use: cleaning it, storing it, analyzing it, and visualizing it.

Each Kafka topic has its own structure:

  • users: sign-up info
  • clickstream: page views, devices, sessions
  • transactions: payment events + product details


The Kafka consumer script uses PySpark to process real-time data from the Kafka topics. It reads the data, applies a schema, and parses it into a structured format.
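
A minimal sketch of that read-and-parse step for the clickstream topic, assuming a broker at kafka:9092 and illustrative field names (adapt both to the repo's actual schema):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# The spark-sql-kafka connector is assumed to be supplied via spark-submit --packages
spark = SparkSession.builder.appName("kafka-consumer").getOrCreate()

# Illustrative schema for the clickstream topic (page views, devices, sessions)
clickstream_schema = StructType([
    StructField("user_id", StringType()),
    StructField("session_id", StringType()),
    StructField("page", StringType()),
    StructField("device", StringType()),
    StructField("timestamp", TimestampType()),
])

# Read the raw Kafka stream and parse the JSON value into structured columns
clickstream_df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")  # assumed broker address
    .option("subscribe", "clickstream")
    .option("startingOffsets", "latest")
    .load()
    .select(from_json(col("value").cast("string"), clickstream_schema).alias("data"))
    .select("data.*")
)
```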

After submitting the Spark job (spark-submit), you’ll start seeing logs confirming messages are being consumed and stored.

  • It writes this data to Minio in Parquet format for storage, and to PostgreSQL for querying and dashboarding (a sketch of both sinks follows this list).
  • Transaction data is further exploded to store individual product-level details in a separate transaction_items table.
  • The init.sql script creates the tables when you start the postgres container for the first time.
  • The users, clickstream, and transactions tables store the raw data exactly as produced by the Kafka producer.
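
As a rough sketch of both sinks (not the repo's exact code), a parsed stream like clickstream_df from the snippet above could be written out like this; the Minio endpoint, credentials, paths, and the Postgres password are assumed values:

```python
# Point the S3A connector at Minio (assumes hadoop-aws is on the classpath;
# endpoint and credentials here are placeholder values)
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.endpoint", "http://minio:9000")
hadoop_conf.set("fs.s3a.access.key", "minioadmin")
hadoop_conf.set("fs.s3a.secret.key", "minioadmin")
hadoop_conf.set("fs.s3a.path.style.access", "true")

# Sink 1: partitioned Parquet files in a Minio bucket
parquet_query = (
    clickstream_df.writeStream
    .format("parquet")
    .option("path", "s3a://clickstream/events")             # assumed layout inside the bucket
    .option("checkpointLocation", "s3a://clickstream/_checkpoints")
    .start()
)

# Sink 2: append each micro-batch to PostgreSQL over JDBC
def write_to_postgres(batch_df, batch_id):
    (
        batch_df.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://postgres:5432/streaming_analytics")
        .option("dbtable", "clickstream")
        .option("driver", "org.postgresql.Driver")
        .option("user", "postgres")
        .option("password", "postgres")                      # assumed password
        .mode("append")
        .save()
    )

postgres_query = clickstream_df.writeStream.foreachBatch(write_to_postgres).start()
```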


I add the transaction_items table, which breaks down the products array inside each transaction into one row per item. This makes it easier to analyze product-level insights like revenue per category or most-sold items.


```sql
CREATE TABLE IF NOT EXISTS transaction_items (
    transaction_id VARCHAR,
    product_id VARCHAR,
    product_category VARCHAR,
    quantity INT,
    unit_price FLOAT,
    product_spend FLOAT,
    timestamp TIMESTAMP
);
```
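
In PySpark, that explode step might look like the sketch below, assuming transactions_df is the parsed transactions stream (built the same way as clickstream_df earlier) and that the nested field names inside the products array mirror the columns above:

```python
from pyspark.sql.functions import explode, col

# One output row per product inside each transaction
transaction_items_df = (
    transactions_df
    .select(col("transaction_id"), col("timestamp"), explode(col("products")).alias("item"))
    .select(
        "transaction_id",
        col("item.product_id").alias("product_id"),
        col("item.product_category").alias("product_category"),
        col("item.quantity").alias("quantity"),
        col("item.unit_price").alias("unit_price"),
        # assumes product_spend is simply quantity * unit_price
        (col("item.quantity") * col("item.unit_price")).alias("product_spend"),
        "timestamp",
    )
)
```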

We can check our Postgres tables:

```bash
docker exec -it postgres psql -U postgres -d streaming_analytics
```

This opens a psql terminal where we can check our tables. It should look something like this:

Minio

Minio acts as an S3-compatible object store. Spark writes partitioned Parquet files into Minio buckets:

  • clickstream
  • transactions
  • user

When the Minio container starts for the first time, it creates these buckets. This is what the data looks like when we use the Minio user interface to check our files.
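
If you want a quick check outside the UI, the Parquet output can be read back with Spark as well; this assumes the same s3a settings and path layout as the write sketch earlier:

```python
# Batch-read the streamed Parquet files back from Minio for a sanity check
clicks = spark.read.parquet("s3a://clickstream/events")
clicks.printSchema()
clicks.show(5, truncate=False)
```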

Dashboard

  1. Open your browser and go to: http://localhost:3000
  2. Credentials as mentioned in docker-compose.yml:
    • Username: user
    • Password: pass
  3. Navigate to the pre-configured dashboard

When you run it for the first time, this is what you should see on the dashboard. Each panel runs a few queries for basic data insights.
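
To give a feel for the kind of query behind a panel, here is a hedged example of one product-level insight (revenue per category) run against the transaction_items table via psycopg2; the host port and password are assumptions based on a typical docker-compose setup:

```python
import psycopg2

# Connection details are placeholders; match them to your docker-compose.yml
conn = psycopg2.connect(
    host="localhost", port=5432, dbname="streaming_analytics",
    user="postgres", password="postgres",
)

with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT product_category, SUM(product_spend) AS revenue
        FROM transaction_items
        GROUP BY product_category
        ORDER BY revenue DESC;
        """
    )
    for category, revenue in cur.fetchall():
        print(category, revenue)

conn.close()
```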

In my project I haven’t used volumes. To persist data across container restarts, add Docker volumes for PostgreSQL, Kafka, and Minio in docker-compose.yml. Because volumes live outside the container lifecycle, data isn’t lost when you run docker-compose down.

The GitHub repo for this project can be found here: GitHub