Simulating Real-Time User Journeys with Python and Kafka
Simulate clickstream and transaction data for users
I wanted to build a project that dealt with clickstream user events as well as transaction data, so I decided to simulate the data with Python, since that lets me tweak attributes, change the schema, or add volume as needed. This project is for learning purposes. It is the first step of my dockerized streaming pipeline project that uses Kafka, Spark Streaming, MinIO, Postgres and Grafana.
Each new user follows a mini-journey:
- Registers with a name and location
- Clicks through 3–6 pages
- (Maybe) completes a transaction
All events are timestamped and streamed to Kafka topics.
Technologies Used
- Python – to simulate data using `Faker`, `requests`, and `random`
- RandomUser API – for realistic Indian user profiles
- Kafka – to stream user, click, and transaction events
- Docker – to containerize everything for reproducibility
Step 1: Simulating a New User
Each simulated user has:
- A UUID user ID
- A name, email, and city/state
- A registration timestamp
The simulator tries the https://randomuser.me/api/?nat=in API first, and falls back to Faker if the request fails.
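The real producer calls the API with `requests` and builds a Faker profile when the call fails. Here is a stdlib-only sketch of the fallback shape; the name and city pools are placeholders I made up, not what Faker or the API actually return:

```python
import random
import uuid
from datetime import datetime, timezone

# Small stand-in pools; the real project gets richer Indian profiles
# from the RandomUser API, with Faker as the fallback.
FIRST_NAMES = ["Asha", "Rohan", "Priya", "Vikram", "Neha"]
CITIES = [("Mumbai", "Maharashtra"), ("Bengaluru", "Karnataka"),
          ("Chennai", "Tamil Nadu"), ("Pune", "Maharashtra")]

def simulate_user() -> dict:
    name = random.choice(FIRST_NAMES)
    city, state = random.choice(CITIES)
    return {
        "user_id": str(uuid.uuid4()),          # UUID user ID
        "name": name,
        "email": f"{name.lower()}@example.com",
        "city": city,
        "state": state,
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }
```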
Step 2: Clickstream Events
Each user clicks through 3–6 pages with short delays, generating:
- Actions like `click`, `scroll`, `hover`, `navigate`
- Pages like `/home`, `/products`, `/cart`

To make the user journey feel natural, just like how people behave on an e-commerce site, I added weights to the random choices.
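A click event might look something like this; the field names and values are illustrative, not the project's exact payload:

```python
import json

# Hypothetical click event; field names are my guess at the schema.
click_event = {
    "user_id": "a3f1c2d4-0000-4000-8000-123456789abc",
    "session_id": "7e9d8f10-0000-4000-8000-abcdef123456",
    "page": "/products",
    "action": "click",
    "timestamp": "2024-05-01T10:15:30+00:00",
}

print(json.dumps(click_event, indent=2))
```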
The click logic picks each next page with weighted random choices:
- 40% of the time, users land on the homepage `/home`
- 20% go to product listings `/products`
- 15% check their cart `/cart`
- 10% proceed to checkout `/checkout`
- 15% check out special offers `/offers`
The weight distribution makes the data more believable and simulates real-world user activity, since most users who visit an e-commerce site don't end up making a purchase.
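The distribution above maps directly onto `random.choices`; a minimal sketch:

```python
import random

# Pages and weights matching the distribution described above.
PAGES = ["/home", "/products", "/cart", "/checkout", "/offers"]
WEIGHTS = [0.40, 0.20, 0.15, 0.10, 0.15]

def next_page() -> str:
    # random.choices draws one page according to the weights.
    return random.choices(PAGES, weights=WEIGHTS, k=1)[0]
```

A user's journey is then just 3–6 calls to `next_page()` with short sleeps in between.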
Step 3: Transaction Event
Each transaction includes:
- User & session IDs to trace back the journey
- Order-level info: total amount, payment method and status
- Items: product details including quantity and price
Each transaction event gets sent to the `transactions` Kafka topic.
Also, to make transaction events realistic, I gave each product category a price range so that we don't get random price values that make no sense, e.g. buying a book at Rs 5 or an electronic item at Rs 10.
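A sketch of how such a transaction event could be built. The category names, price bounds, and field names here are my assumptions, not the project's exact values:

```python
import random
import uuid
from datetime import datetime, timezone

# Per-category price ranges (in Rs) keep prices plausible; these
# bounds are illustrative assumptions.
PRICE_RANGES = {
    "books": (150, 1200),
    "electronics": (1500, 80000),
    "clothing": (300, 5000),
    "groceries": (50, 2000),
}

def simulate_transaction(user_id: str, session_id: str) -> dict:
    items = []
    for _ in range(random.randint(1, 3)):
        category = random.choice(list(PRICE_RANGES))
        low, high = PRICE_RANGES[category]
        items.append({
            "product_id": str(uuid.uuid4()),
            "category": category,
            "quantity": random.randint(1, 3),
            "price": round(random.uniform(low, high), 2),
        })
    total = round(sum(i["price"] * i["quantity"] for i in items), 2)
    return {
        "transaction_id": str(uuid.uuid4()),
        "user_id": user_id,          # trace back to the user
        "session_id": session_id,    # trace back to the session
        "amount": total,
        "payment_method": random.choice(["upi", "card", "netbanking"]),
        "status": random.choice(["success", "failed", "pending"]),
        "items": items,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
```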
Dockerize the project
Docker-compose setup
To build the producer container, I created a containers folder that holds producer.py along with its requirements.txt and the Dockerfile for the container.
Run `docker-compose up -d`.
If the producer image doesn't exist, Docker builds it using this Dockerfile.
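For reference, a minimal `docker-compose.yml` sketch of this setup. The image versions, service names, and ports are assumptions and will differ from the project's actual compose file:

```yaml
version: "3.8"
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.4.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.4.0
    depends_on: [zookeeper]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  producer:
    build: ./containers   # built from the Dockerfile in the containers folder
    depends_on: [kafka]
```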
The generated data gets streamed to the Kafka topics.
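The streaming step itself can be sketched as follows, assuming the `kafka-python` client; the broker address is an assumption (inside the Docker network it would typically be `kafka:9092` instead):

```python
import json

def serialize(event: dict) -> bytes:
    # Events are serialized to JSON bytes before being sent to Kafka.
    return json.dumps(event, default=str).encode("utf-8")

def stream_event(topic: str, event: dict) -> None:
    # Requires the kafka-python package and a reachable broker --
    # both assumptions; the real producer may configure this differently.
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers="localhost:9092",
                             value_serializer=serialize)
    producer.send(topic, event)
    producer.flush()
```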
Kafka Topics in Use
The simulator pushes to three Kafka topics:
| Topic | What it Stores |
| --- | --- |
| users | Basic user profiles |
| clickstream | Page visits & interactions |
| transactions | Order & payment information |
Now we can subscribe to these topics and make further use of the data.
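A consumer sketch for reading these topics back, again assuming `kafka-python` and a local broker:

```python
import json

def consume(topic: str):
    # Assumes kafka-python and a broker at localhost:9092; yields each
    # event as a Python dict, reading the topic from the beginning.
    from kafka import KafkaConsumer
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )
    for message in consumer:
        yield message.value
```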
This project is part of my larger streaming data pipeline series where this data feeds into:
- Spark Streaming
- PostgreSQL
- MinIO object storage
- Grafana dashboard
Find the next part of the blog here: Part 2
The GitHub repo for this project can be found here: GitHub