Echo Kanak

Simulating Real-Time User Journeys with Python and Kafka

Simulate clickstream and transaction data for users

Apr 6, 2025

I wanted to build a project that dealt with clickstream user events as well as transaction data, so I decided to simulate the data with Python, since that lets me tweak attributes, make changes, and scale the volume however I like. This project is for learning purposes. It is the first step of my Dockerized streaming pipeline project, which uses Kafka, Spark Streaming, Minio, Postgres, and Grafana.

 

Each new user follows a mini-journey:

  1. Registers with a name and location
  2. Clicks through 3–6 pages
  3. (Maybe) completes a transaction

All events are timestamped and streamed to Kafka topics.
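The whole journey can be sketched as one driver function. This is a minimal sketch, not the repo's actual code: the event payloads are trimmed down, `produce(topic, event)` stands in for the Kafka send call, and the 30% conversion rate is an assumption.

```python
import random
import time
import uuid
from datetime import datetime, timezone

def simulate_user_journey(produce):
    """Drive one user through the register → click → (maybe) buy journey.

    `produce(topic, event)` is a placeholder for the Kafka send call.
    """
    now = lambda: datetime.now(timezone.utc).isoformat()
    user_id = str(uuid.uuid4())

    # Step 1: registration event
    produce("users", {"user_id": user_id, "registered_at": now()})

    # Step 2: 3–6 page clicks with short delays
    session_id = str(uuid.uuid4())
    for _ in range(random.randint(3, 6)):
        produce("clickstream", {"user_id": user_id,
                                "session_id": session_id,
                                "timestamp": now()})
        time.sleep(0.01)  # delay shortened for this sketch

    # Step 3: only some users convert (30% is an assumed rate)
    if random.random() < 0.3:
        produce("transactions", {"user_id": user_id,
                                 "session_id": session_id,
                                 "timestamp": now()})

# Collect events into a list instead of Kafka, just to show the shape
events = []
simulate_user_journey(lambda topic, ev: events.append((topic, ev)))
```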

Technologies Used

  • Python – to simulate data using Faker, requests, and random
  • RandomUser API – for realistic Indian user profiles
  • Kafka – to stream user, click, and transaction events
  • Docker – to containerize everything for reproducibility

Step 1: Simulating a New User

Each simulated user has:

  • A UUID user ID
  • A name, email, and city/state
  • A registration timestamp

The simulator tries the https://randomuser.me/api/?nat=in API first and falls back to Faker if the request fails.

Step 2: Clickstream Events

Each user clicks through 3–6 pages with short delays, generating actions like:

  • click, scroll, hover, navigate
  • Pages like /home, /products, /cart

Example click event:
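A representative payload might look like this (the field names and values are illustrative, not the repo's exact schema):

```python
# Representative click event as sent to the clickstream topic
click_event = {
    "event_id": "4a1f2b3c-8d9e-4f01-a2b3-c4d5e6f70811",
    "user_id": "9d3a1b2c-7e8f-4a5b-9c0d-1e2f3a4b5c6d",
    "session_id": "5e6f7a8b-9c0d-4e1f-8a2b-3c4d5e6f7a8b",
    "action": "click",
    "page": "/products",
    "timestamp": "2025-04-06T10:15:30+05:30",
}
```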

To make each user journey feel natural, just like how people behave on an e-commerce site, I added weights to the random choices.

Here's the logic behind click events:

That means:

  • 40% of the time, users land on the homepage /home
  • 20% go to product listings /products
  • 15% check their cart /cart
  • 10% proceed to checkout /checkout
  • 15% check out special offers /offers

The weight distribution makes the data more believable and simulates real-world user activity, since most users who visit an e-commerce website don't end up making a purchase.

Step 3: Transaction Event

Each transaction includes:

  • User & session IDs to trace back the journey
  • Order-level info: total amount, payment method and status
  • Items: product details including quantity and price

Here's an example of what a transaction event looks like when it gets sent to the transactions Kafka topic:
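A representative payload might look like this (field names, categories, and amounts are illustrative):

```python
# Representative transaction event as sent to the transactions topic
transaction_event = {
    "transaction_id": "7b8c9d0e-1f2a-4b3c-8d4e-5f6a7b8c9d0e",
    "user_id": "9d3a1b2c-7e8f-4a5b-9c0d-1e2f3a4b5c6d",
    "session_id": "5e6f7a8b-9c0d-4e1f-8a2b-3c4d5e6f7a8b",
    "total_amount": 1499.00,
    "payment_method": "upi",
    "status": "success",
    "items": [
        {"product_id": "BK-1042", "category": "books",
         "quantity": 2, "price": 349.50},
        {"product_id": "EL-2210", "category": "electronics",
         "quantity": 1, "price": 800.00},
    ],
    "timestamp": "2025-04-06T10:16:02+05:30",
}
```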

Also, to make transaction events realistic, I gave each product category a price range, so that we don't get random price values that make no sense, e.g. buying a book at Rs 5 or an electronic item at Rs 10.
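The range idea can be sketched as a per-category lookup. The specific categories and rupee bounds here are assumptions for illustration:

```python
import random

# Assumed per-category price ranges in Rs, so generated prices stay plausible
PRICE_RANGES = {
    "books": (150, 1200),
    "clothing": (300, 3000),
    "groceries": (50, 2500),
    "electronics": (1000, 80000),
}

def random_price(category):
    """Draw a price within the category's plausible range."""
    low, high = PRICE_RANGES[category]
    return round(random.uniform(low, high), 2)
```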

Dockerize the project

Docker-compose setup
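A minimal compose sketch for this stack might look like the following; the image names, service layout, and paths are assumptions rather than the repo's actual file, and a real Kafka service needs additional listener/broker configuration:

```yaml
# Sketch of docker-compose.yml (services and images assumed)
services:
  kafka:
    image: bitnami/kafka:latest
    ports:
      - "9092:9092"
  producer:
    build: ./containers/producer
    depends_on:
      - kafka
```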

To build the producer container, I created a containers folder that holds producer.py along with requirements.txt and the Dockerfile for the container.

Run docker-compose up -d

If the producer image doesn't exist yet, Docker Compose builds it from the Dockerfile.
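A Dockerfile along these lines would do the job; the base image and file layout are assumptions on my part:

```dockerfile
# Sketch of the producer image (base image and paths assumed)
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY producer.py .
CMD ["python", "producer.py"]
```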

The generated data then gets streamed to the Kafka topics.

 

Kafka Topics in Use

The simulator pushes to three Kafka topics:

Topic          What it Stores
users          Basic user profiles
clickstream    Page visits & interactions
transactions   Order & payment information

Now we can subscribe to these topics and make further use of the data.
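A minimal consumer sketch, assuming the kafka-python client and a broker on localhost:9092 (both assumptions; the downstream parts of this series use Spark Streaming instead):

```python
import json

TOPICS = ("users", "clickstream", "transactions")

def make_consumer(bootstrap_servers="localhost:9092"):
    """Build a consumer subscribed to all three topics."""
    # Imported lazily so the sketch loads even without a running broker
    from kafka import KafkaConsumer
    return KafkaConsumer(
        *TOPICS,
        bootstrap_servers=bootstrap_servers,
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        auto_offset_reset="earliest",
    )

# Usage (requires a running broker):
#   for message in make_consumer():
#       print(message.topic, message.value)
```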

This project is part of my larger streaming data pipeline series where this data feeds into:

  • Spark Streaming
  • PostgreSQL
  • Minio object storage
  • Grafana dashboard
 

Find the next part of the blog here: Part 2

The GitHub repo for this project can be found here: GitHub

 
