I wanted to build a project that deals with clickstream user events as well as transaction data, so I decided to simulate the data with Python, where I can tweak the behaviour, add volume, or add any number of attributes. This project is for learning purposes. It is the first step of my dockerized streaming pipeline project that uses Kafka, Spark Streaming, Minio, Postgres and Grafana.
Each new user follows a mini-journey:
- Registers with a name and location
- Clicks through 3–6 pages
- (Maybe) completes a transaction
All events are timestamped and streamed to Kafka topics.
Technologies Used
- Python – to simulate data using Faker, requests, and random
- RandomUser API – for realistic Indian user profiles
- Kafka – to stream user, click, and transaction events
- Docker – to containerize everything for reproducibility
Step 1: Simulating a New User
Each simulated user has:
- A UUID user ID
- A name, email, and city/state
- A registration timestamp
The simulator tries the https://randomuser.me/api/?nat=in API first, and falls back to Faker if the request fails.
{
  "user_id": "5df1b623-bf7f-40ad-a1e6-731c6a8fc639",
  "name": "Kanak Bisht",
  "email": "kanak.bisht@example.com",
  "location": "Bengaluru, Karnataka",
  "registered_at": "2025-04-02 20:51:12"
}
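Here's a minimal sketch of that fallback logic (the field names match the event above; the actual producer code in the repo may differ slightly):

import uuid
from datetime import datetime

import requests
from faker import Faker

fake = Faker("en_IN")  # Indian locale for the Faker fallback

def generate_user():
    """Try the RandomUser API first, fall back to Faker if the call fails."""
    try:
        resp = requests.get("https://randomuser.me/api/?nat=in", timeout=5)
        resp.raise_for_status()
        data = resp.json()["results"][0]
        name = f"{data['name']['first']} {data['name']['last']}"
        email = data["email"]
        location = f"{data['location']['city']}, {data['location']['state']}"
    except Exception:
        # Fallback: purely synthetic profile
        name = fake.name()
        email = fake.email()
        location = f"{fake.city()}, {fake.state()}"
    return {
        "user_id": str(uuid.uuid4()),
        "name": name,
        "email": email,
        "location": location,
        "registered_at": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
    }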
Step 2: Clickstream Events
Each user clicks through 3–6 pages with short delays, generating:
- Actions like click, scroll, hover, navigate
- Pages like /home, /products, /cart
Example click event:
{
  "user_id": "u123",
  "session_id": "SESS123",
  "timestamp": "2025-04-01 20:12:30",
  "page": "/products",
  "device": "iOS",
  "action": "click"
}
To make the user journey feel natural, just like how people behave on an e-commerce site, I added weights to the random choices.
Here's the logic behind click events:
"page": random.choices(
["/home", "/products", "/cart", "/checkout", "/offers"],
weights=[0.4, 0.2, 0.15, 0.1, 0.15],
k=1
)[0]
That means:
- 40% of the time, users land on the homepage /home
- 20% go to product listings /products
- 15% check their cart /cart
- 10% proceed to checkout /checkout
- 15% check out special offers /offers
The weight distribution makes the data more believable and simulates real-world user activity, since most users who visit an e-commerce website don't end up making a purchase. Putting Step 2 together, a session generator along the lines sketched below emits 3–6 weighted click events per user.
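A sketch of that session generator (the action, device, and delay choices here are illustrative assumptions; only the page weights come from the snippet above):

import random
import time
from datetime import datetime

def generate_click_events(user_id):
    """Generate 3-6 click events for one session, with short delays between them."""
    session_id = f"SESS{random.randint(100, 999)}"
    events = []
    for _ in range(random.randint(3, 6)):
        events.append({
            "user_id": user_id,
            "session_id": session_id,
            "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            "page": random.choices(
                ["/home", "/products", "/cart", "/checkout", "/offers"],
                weights=[0.4, 0.2, 0.15, 0.1, 0.15],
                k=1,
            )[0],
            "device": random.choice(["iOS", "Android", "Windows", "macOS"]),
            "action": random.choice(["click", "scroll", "hover", "navigate"]),
        })
        time.sleep(random.uniform(0.5, 2))  # short delay between page views
    return events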
Step 3: Transaction Event
Each transaction includes:
- User & session IDs to trace back the journey
- Order-level info: total amount, payment method and status
- Items: product details including quantity and price
Here's an example of what a transaction event looks like when it gets sent to the transactions Kafka topic:
{
  "user_id": "bc12EF45GH67",
  "session_id": "SESS9832475901",
  "transaction_id": "TXN327594837210",
  "timestamp": "2025-04-04 20:42:33",
  "transaction_amount": 4319.97,
  "payment_method": "credit_card",
  "payment_status": "successful",
  "products": [
    {
      "product_id": "PROD13456",
      "product_category": "Electronics",
      "quantity": 1,
      "unit_price": 3599.99
    },
    {
      "product_id": "PROD98765",
      "product_category": "Books",
      "quantity": 2,
      "unit_price": 359.99
    }
  ]
}
Also, to make transaction events realistic, I gave each product category its own price range so we don't get random price values that make no sense, e.g. buying a book at Rs 5 or an electronics item at Rs 10.
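The per-category ranges and the transaction builder look roughly like this (a sketch; the exact rupee ranges, categories, and payment options are illustrative stand-ins, not the precise values from the repo):

import random
from datetime import datetime

# Assumed per-category price ranges (in Rs) so prices stay plausible
PRICE_RANGES = {
    "Electronics": (1500, 80000),
    "Books": (150, 1500),
    "Clothing": (300, 5000),
    "Groceries": (50, 2000),
}

def generate_transaction(user_id, session_id):
    """Build a transaction with 1-3 products priced within their category's range."""
    products = []
    for _ in range(random.randint(1, 3)):
        category = random.choice(list(PRICE_RANGES))
        low, high = PRICE_RANGES[category]
        products.append({
            "product_id": f"PROD{random.randint(10000, 99999)}",
            "product_category": category,
            "quantity": random.randint(1, 3),
            "unit_price": round(random.uniform(low, high), 2),
        })
    total = round(sum(p["quantity"] * p["unit_price"] for p in products), 2)
    return {
        "user_id": user_id,
        "session_id": session_id,
        "transaction_id": f"TXN{random.randint(10**11, 10**12 - 1)}",
        "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "transaction_amount": total,
        "payment_method": random.choice(["credit_card", "debit_card", "upi", "netbanking"]),
        "payment_status": random.choices(["successful", "failed"], weights=[0.9, 0.1], k=1)[0],
        "products": products,
    }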
Dockerize the project
Docker-compose setup
services:
  kafka:
    image: bitnami/kafka
    container_name: kafka
    ports: ["9092:9092"]
    environment:
      - KAFKA_CFG_NODE_ID=0
      - KAFKA_CFG_PROCESS_ROLES=controller,broker
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,CONTROLLER://:9093
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=0@kafka:9093
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
  producer:
    image: producer:v1
    volumes:
      - ./containers/producer/producer.py:/app/producer.py
    environment:
      - KAFKA_BROKER=kafka:9092
    depends_on:
      - kafka
To build the producer container, I created a containers folder that holds producer.py, a requirements.txt, and the Dockerfile for the container.
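A minimal requirements.txt for this setup might look like the following (I'm assuming kafka-python as the Kafka client; swap in whichever client the repo actually uses):

faker
requests
kafka-python

And the Dockerfile itself: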
FROM python:3.9-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY producer.py .
ENV KAFKA_BROKER=kafka:9092
CMD ["python", "producer.py"]
Run docker-compose up -d.
If the producer:v1 image doesn't exist yet, build it from this Dockerfile first (for example, docker build -t producer:v1 ./containers/producer), or add a build: section to the compose file, since Compose won't build an image that is only referenced by image:.
Once the containers are up, the generated data gets streamed to the Kafka topics.
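Inside producer.py, the part that streams events boils down to something like this (a sketch assuming the kafka-python client and the generator functions sketched in the earlier steps; the broker address comes from the KAFKA_BROKER environment variable set in the compose file):

import json
import os
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=os.environ.get("KAFKA_BROKER", "kafka:9092"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# One user journey: profile -> clicks -> (maybe) a transaction
user = generate_user()                                                # Step 1 sketch
clicks = generate_click_events(user["user_id"])                       # Step 2 sketch
txn = generate_transaction(user["user_id"], clicks[0]["session_id"])  # Step 3 sketch

producer.send("users", user)
for event in clicks:
    producer.send("clickstream", event)
producer.send("transactions", txn)
producer.flush()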
Kafka Topics in Use
The simulator pushes to three Kafka topics:
| Topic | What it Stores |
| --- | --- |
| users | Basic user profiles |
| clickstream | Page visits & interactions |
| transactions | Order & payment information |
Now we can subscribe to these topics and make further use of the data.
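For a quick sanity check, a small consumer can subscribe to one of the topics and print events as they arrive (a sketch, again assuming the kafka-python client, run from a container on the same Docker network so kafka:9092 resolves):

import json
from kafka import KafkaConsumer

# Subscribe to the clickstream topic and print events as they arrive
consumer = KafkaConsumer(
    "clickstream",
    bootstrap_servers="kafka:9092",   # as advertised in the compose file
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    print(message.value)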
This project is part of my larger streaming data pipeline series where this data feeds into:
- Spark Streaming
- PostgreSQL
- Minio object storage
- Grafana dashboard
You can find the next part of the blog here: Part 2
The GitHub repo for this project can be found here: GitHub