
CONSUMER SEGMENTATION
Companies now spend millions on business intelligence to support accurate decision-making: understanding the behavior of their customers, targeting new customers, and selling more of their products and services.

It also helps companies define their marketing campaigns. In offline shopping, retail chains often use market-basket analysis to understand their customers. For online sellers and e-commerce companies, segmenting customers is very important for maximizing revenue.

This project aims to categorize online customers into 5 clusters based on transactions per user.
Apart from transactions, eight other parameters are taken into consideration:
InvoiceNo: Invoice number. Nominal, a 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation.
StockCode: Product (item) code. Nominal, a 5-digit integral number uniquely assigned to each distinct product.
Description: Product (item) name. Nominal.
Quantity: The quantities of each product (item) per transaction. Numeric.
InvoiceDate: Invoice date and time. Numeric, the day and time when each transaction was generated.
UnitPrice: Unit price. Numeric, product price per unit in sterling.
CustomerID: Customer number. Nominal, a 5-digit integral number uniquely assigned to each customer.
Country: Country name. Nominal, the name of the country where each customer resides.
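
The fields above can be turned into per-customer transaction features along the following lines. This is only a minimal sketch, assuming pandas is available: the rows are hand-made illustrative values (not taken from the project's dataset), and the derived column names Cancelled, LineTotal, n_invoices and total_spent are hypothetical.

```python
import pandas as pd

# Hand-made illustrative rows using the eight fields described above.
columns = ["InvoiceNo", "StockCode", "Description", "Quantity",
           "InvoiceDate", "UnitPrice", "CustomerID", "Country"]
rows = [
    ("536365", "85123", "ITEM A", 6, "2010-12-01 08:26", 2.55, 17850, "United Kingdom"),
    ("536365", "71053", "ITEM B", 6, "2010-12-01 08:26", 3.39, 17850, "United Kingdom"),
    ("536366", "22633", "ITEM C", 2, "2010-12-01 08:28", 1.85, 13047, "United Kingdom"),
    ("C536379", "22633", "ITEM C", -1, "2010-12-01 09:41", 1.85, 13047, "United Kingdom"),
    ("536370", "22728", "ITEM D", 24, "2010-12-01 08:45", 3.75, 12583, "France"),
]
df = pd.DataFrame(rows, columns=columns)
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])

# An invoice number starting with 'c' (either case) marks a cancellation.
df["Cancelled"] = df["InvoiceNo"].str.upper().str.startswith("C")

# Value of each invoice line.
df["LineTotal"] = df["Quantity"] * df["UnitPrice"]

# Aggregate the non-cancelled lines per customer.
purchases = df[~df["Cancelled"]]
per_customer = purchases.groupby("CustomerID").agg(
    n_invoices=("InvoiceNo", "nunique"),
    total_spent=("LineTotal", "sum"),
)
print(per_customer)
```

A table like per_customer (one row per customer) is the kind of input a clustering or classification step would then work on.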
[Figure: sample dataset]
The following models are used to classify customers into the 5 categories.
LOGISTIC REGRESSION
The prediction accuracy achieved with this algorithm is 86.29 %. The learning curve is also plotted.
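
For illustration, a logistic regression classifier for a 5-class problem could be trained and scored as sketched below. This assumes scikit-learn; the data is a synthetic stand-in generated with make_blobs, not the project's customer features, and the split ratio and scaling step are assumptions, so the printed score is unrelated to the figure reported above.

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 500 "customers", 4 features, 5 segment labels.
X, y = make_blobs(n_samples=500, centers=5, n_features=4,
                  cluster_std=2.0, random_state=0)

# Hold out 20 % of the samples for testing, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scale the features, then fit a multi-class logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Accuracy: {100 * accuracy:.2f} %")
```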

K-NEAREST NEIGHBORS
Accuracy: 79.78 %
Learning curve:
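
A learning curve of this kind can be computed roughly as follows. This is a sketch assuming scikit-learn and NumPy, again on synthetic make_blobs data rather than the project's dataset; the choices n_neighbors=5, 5-fold cross-validation and five training sizes are assumptions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import learning_curve
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for per-customer features with 5 segment labels.
X, y = make_blobs(n_samples=500, centers=5, n_features=4,
                  cluster_std=2.0, random_state=0)

# Score the model on growing subsets of the training data,
# using 5-fold cross-validation at each size.
train_sizes, train_scores, cv_scores = learning_curve(
    KNeighborsClassifier(n_neighbors=5),
    X, y,
    train_sizes=np.linspace(0.1, 1.0, 5),
    cv=5,
    scoring="accuracy",
    shuffle=True,
    random_state=0,
)

# Average over the folds: one training and one validation score per size.
train_mean = train_scores.mean(axis=1)
cv_mean = cv_scores.mean(axis=1)
for n, tr, cv in zip(train_sizes, train_mean, cv_mean):
    print(f"{n:4d} examples: train={tr:.3f}  cross-validation={cv:.3f}")
```

Plotting train_mean and cv_mean against train_sizes (for example with matplotlib) gives the usual two-line learning-curve chart.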

DECISION TREE
The prediction accuracy achieved with this algorithm is 83.24 %. The learning curve is also plotted.

RANDOM FOREST
Precision: 89.61 %
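
Scores on this page are labelled either accuracy or precision. In a multi-class setting the two are computed differently, as the following plain-Python sketch shows on hypothetical labels (the label lists and the choice of macro averaging are assumptions made for illustration):

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision(y_true, y_pred):
    """Unweighted mean over classes of: correct predictions of a class
    divided by all predictions of that class."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for c in classes:
        # True labels of the samples that were predicted as class c.
        truths = [t for t, p in zip(y_true, y_pred) if p == c]
        per_class.append(sum(t == c for t in truths) / len(truths) if truths else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical labels for 8 customers across 5 segments.
y_true = [0, 0, 1, 1, 2, 2, 3, 4]
y_pred = [0, 1, 1, 1, 2, 0, 3, 4]

print(f"Accuracy:  {100 * accuracy(y_true, y_pred):.2f} %")
print(f"Precision: {100 * macro_precision(y_true, y_pred):.2f} %")
```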

ADABOOST
Precision: 54.57 %
The relationship between the number of training examples and the training and cross-validation scores is shown in a line chart.

GRADIENT BOOSTING CLASSIFIER
Precision: 89.47 %
Learning Curves:

Finally, Random Forest, Gradient Boosting and k-Nearest Neighbors are combined for prediction, because this leads to a slight improvement in predictions:
Precision: 75.46 %
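
One common way to combine several classifiers is a voting ensemble. The sketch below assumes scikit-learn's VotingClassifier with soft voting, which is not necessarily how the models were combined in this project; the data is synthetic and all hyperparameters are assumptions.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for per-customer features with 5 segment labels.
X, y = make_blobs(n_samples=500, centers=5, n_features=4,
                  cluster_std=2.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Soft voting averages the class probabilities predicted by each model.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)

score = accuracy_score(y_test, ensemble.predict(X_test))
print(f"Ensemble accuracy: {100 * score:.2f} %")
```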