Echo Kanak

Simulating Real-Time User Journeys with Python and Kafka

Apr 6, 2025

I wanted to build a project that deals with both clickstream user events and transaction data, so I decided to simulate the data with Python, since that lets me tweak things freely: change attributes, add volume, or adjust the behaviour as needed. This project is for learning purposes. It is the first step of my dockerized streaming pipeline project that uses Kafka, Spark Streaming, Minio, Postgres and Grafana.

Each new user follows a mini-journey:

  1. Registers with a name and location
  2. Clicks through 3–6 pages
  3. (Maybe) completes a transaction

All events are timestamped and streamed to Kafka topics.
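
Roughly, the simulation loop ties these steps together like this (a minimal sketch; the function names are placeholders for the pieces described in the steps below):

import random
import time

def simulate_journey():
    user = generate_user()                       # Step 1: register a new user
    session_id = new_session_id()                # one session per journey

    for _ in range(random.randint(3, 6)):        # Step 2: 3-6 page views
        send_click_event(user, session_id)
        time.sleep(random.uniform(0.5, 2.0))     # short, human-like delays

    if random.random() < 0.3:                    # Step 3: only some users buy
        send_transaction_event(user, session_id)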

Technologies Used

  • Python – to simulate data using Faker, requests, and random
  • RandomUser API – for realistic Indian user profiles
  • Kafka – to stream user, click, and transaction events
  • Docker – to containerize everything for reproducibility

Step 1: Simulating a New User

Each simulated user has:

  • A UUID user ID
  • A name, email, and city/state
  • A registration timestamp

The simulator tries the https://randomuser.me/api/?nat=in API first and falls back to Faker if the request fails. A generated user event looks like this:

{
  "user_id": "5df1b623-bf7f-40ad-a1e6-731c6a8fc639",
  "name": "Kanak Bisht",
  "email": "kanak.bisht@example.com",
  "location": "Bengaluru, Karnataka",
  "registered_at": "2025-04-02 20:51:12"
}
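
A minimal sketch of that fallback logic (the function name and Faker locale are my assumptions; the response parsing follows the standard randomuser.me payload):

import uuid
import requests
from datetime import datetime
from faker import Faker

fake = Faker("en_IN")

def generate_user():
    registered_at = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    try:
        resp = requests.get("https://randomuser.me/api/?nat=in", timeout=5)
        resp.raise_for_status()
        data = resp.json()["results"][0]
        name = f"{data['name']['first']} {data['name']['last']}"
        email = data["email"]
        location = f"{data['location']['city']}, {data['location']['state']}"
    except Exception:
        # Fall back to locally generated data if the API is unreachable
        name = fake.name()
        email = fake.email()
        location = f"{fake.city()}, {fake.state()}"
    return {
        "user_id": str(uuid.uuid4()),
        "name": name,
        "email": email,
        "location": location,
        "registered_at": registered_at,
    }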

Step 2: Clickstream Events

Each user clicks through 3–6 pages with short delays, generating actions like:

  • click, scroll, hover, navigate
  • Pages like /home, /products, /cart

Example click event:

{
  "user_id": "u123",
  "session_id": "SESS123",
  "timestamp": "2025-04-01 20:12:30",
  "page": "/products",
  "device": "iOS",
  "action": "click"
}

To make each user journey feel natural, just like how people behave on a real e-commerce site, I added weights to the random choices.

Here's the logic behind click events:

"page": random.choices(
    ["/home", "/products", "/cart", "/checkout", "/offers"],
    weights=[0.4, 0.2, 0.15, 0.1, 0.15],  
    k=1
)[0]

That means:

  • 40% of the time, users land on the homepage /home
  • 20% go to product listings /products
  • 15% check their cart /cart
  • 10% proceed to checkout /checkout
  • 15% check out special offers /offers

The weight distribution makes the data more believable and mirrors real-world activity: most users who visit an e-commerce website don't end up making a purchase.
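
Putting it together, a single click event can be assembled roughly like this (the pages and weights mirror the snippet above; the device and action lists are illustrative):

import random
from datetime import datetime

def generate_click_event(user_id, session_id):
    return {
        "user_id": user_id,
        "session_id": session_id,
        "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "page": random.choices(
            ["/home", "/products", "/cart", "/checkout", "/offers"],
            weights=[0.4, 0.2, 0.15, 0.1, 0.15],
            k=1,
        )[0],
        "device": random.choice(["iOS", "Android", "Windows", "macOS"]),
        "action": random.choice(["click", "scroll", "hover", "navigate"]),
    }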

Step 3: Transaction Event

Each transaction includes:

  • User & session IDs to trace back the journey
  • Order-level info: total amount, payment method and status
  • Items: product details including quantity and price

Here's an example of what a transaction event looks like when it gets sent to the transactions Kafka topic:

{
  "user_id": "bc12EF45GH67",
  "session_id": "SESS9832475901",
  "transaction_id": "TXN327594837210",
  "timestamp": "2025-04-04 20:42:33",
  "transaction_amount": 4319.97,
  "payment_method": "credit_card",
  "payment_status": "successful",
  "products": [
    {
      "product_id": "PROD13456",
      "product_category": "Electronics",
      "quantity": 1,
      "unit_price": 3599.99
    },
    {
      "product_id": "PROD98765",
      "product_category": "Books",
      "quantity": 2,
      "unit_price": 359.99
    }
  ]
}

Also, to keep transaction events realistic, I gave each product category its own price range, so we don't get prices that make no sense, e.g. a book at Rs 5 or an electronic item at Rs 10.
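
One way to express those per-category ranges (the numbers below are placeholders to show the idea, not the exact values used in the repo):

import random

# Illustrative per-category price ranges in INR
PRICE_RANGES = {
    "Electronics": (1000, 50000),
    "Books": (150, 1500),
    "Clothing": (300, 5000),
}

def generate_products(n_items):
    products = []
    for _ in range(n_items):
        category = random.choice(list(PRICE_RANGES))
        low, high = PRICE_RANGES[category]
        products.append({
            "product_id": f"PROD{random.randint(10000, 99999)}",
            "product_category": category,
            "quantity": random.randint(1, 3),
            "unit_price": round(random.uniform(low, high), 2),
        })
    return products

# The order total is then just the sum over the items:
# transaction_amount = round(sum(p["quantity"] * p["unit_price"] for p in products), 2)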

Dockerize the project

Docker-compose setup

services:
  kafka:
    image: bitnami/kafka
    container_name: kafka
    ports: ["9092:9092"]
    environment:
      - KAFKA_CFG_NODE_ID=0      
      - KAFKA_CFG_PROCESS_ROLES=controller,broker      
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093      
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093      
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER      
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
  producer:
    image: producer:v1
    volumes:
      - ./containers/producer/producer.py:/app/producer.py
    environment:
      - KAFKA_BROKER=kafka:9092
    depends_on:
      - kafka

To build the producer container, I created a containers folder that holds producer.py along with requirements.txt and the Dockerfile for the container.


FROM python:3.9-slim AS builder

WORKDIR /app

COPY requirements.txt .

RUN pip install -r requirements.txt

COPY producer.py .

ENV KAFKA_BROKER=kafka:9092

CMD ["python", "producer.py"]

Run docker-compose up -d

If the producer image doesn't already exist, Docker builds it from this Dockerfile.

The generated data is streamed to the Kafka topics.
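
Inside producer.py, publishing boils down to a few lines. This sketch assumes kafka-python as the client and reads the broker address from the KAFKA_BROKER variable set in the compose file:

import json
import os
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=os.environ.get("KAFKA_BROKER", "localhost:9092"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish(topic, event):
    # Serialize the dict to JSON and send it to the given topic
    producer.send(topic, event)
    producer.flush()

# e.g. publish("users", user_event), publish("clickstream", click_event), ...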

Kafka Topics in Use

The simulator pushes to three Kafka topics:

Topic         | What it Stores
users         | Basic user profiles
clickstream   | Page visits & interactions
transactions  | Order & payment information

Now we can subscribe to these topics and make further use of the data.
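
For a quick check, a small consumer can read one of the topics back (again assuming kafka-python, run from somewhere that can reach the kafka:9092 listener):

import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers="kafka:9092",  # broker address as advertised inside the Docker network
    auto_offset_reset="earliest",    # start from the beginning of the topic
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    print(message.value["page"], message.value["action"])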

This project is part of my larger streaming data pipeline series where this data feeds into:

  • Spark Streaming
  • PostgreSQL
  • Minio object storage
  • Grafana dashboard

Find the next part of the blog here: Part 2

The GitHub repo for this project can be found here: GitHub