Online Retail Store Analysis
- Radhin krishna
- Jul 2, 2024
- 7 min read
Forage Data Visualization Virtual Internship by TATA
Introduction
I conducted an analysis for the data visualization virtual internship offered by TATA on Forage. They provided a business scenario and resources for producing an analytical report. At the conclusion of the internship, I was required to create a visualization file (in Tableau or Power BI) and a five-minute video explaining my findings and inferences through data storytelling. Overall, this internship enhanced my visualization skills, familiarized me with real-world situations, and honed my presentation abilities.

Ask
The initial task involved collecting questions from stakeholders to comprehend their needs and concerns, which also helped us enhance our background knowledge of the attributes and the industry.
The major questions that arose were:
Which region is generating the highest revenue, and which region is generating the lowest?
What is the monthly trend of revenue, which months have faced the biggest increase/decrease?
Who are the top customers and how much do they contribute to the total revenue? Is the business dependent on these customers or is the customer base diversified? Which months generated the most revenue? Is there a seasonality in sales?
What is the percentage of customers who are repeating their orders? Are they ordering the same products or different?
For the repeat customers, how long does it take for them to place the next order after being delivered the previous one?
What revenue is being generated from the customers who have ordered more than once?
Who are the customers that have repeated the most? How much are they contributing to revenue?
Data
data is in Excel format and the given data had 541910 observations and 7 attributes.
Attributes are:
InvoiceNo | StockCode | Description | Quantity | InvoiceDate | UnitPrice | CustomerID | Country |
Data is fictional and free to use.
there are no customer names or any personal information hence there is no privacy and security concerns either.
Data cleaning and pre-processing
Load Data:
Go to the Data tab in Excel.
Click on Get Data > From File > From Excel.
Browse and select the local Excel file containing your data.
Choose the worksheet or table you want to import.
In the Navigator pane, select Transform Data to open the Power Query Editor.
Ensure Desired Format:
In the Power Query Editor, identify the columns containing data types like quantity, invoice number, and customer ID.
Use the Transform tab to change the data type of these columns to Whole Number.
Locate the stock ID column and change its data type to Text.
Identify and Address Inconsistent Observations:
Use the filter row in each column to check for null values.
Focus on columns like stock ID, invoice number, and invoice date.
You can filter out rows with null values by selecting the filter dropdown and choosing (Blanks), then clicking Remove Rows.
Similarly, filter the [quantity] column and remove rows with negative values
Note: You chose filtering because the number of inconsistent observations was low. For a higher percentage (e.g., >10%), consider imputation or substitution techniques.
4. Split Date Column:
Identify the date column containing both date and time information.
In the Transform tab, click on Split Column by Delimiter.
Choose Space as the delimiter to separate the date and time into separate columns.
5. Calculate Revenue:
Create a new custom column by clicking the New Column button.
Name the new column "Revenue."
In the formula bar, type [Quantity] * [Unit Price]
Now save the processed Data in the desired location.
Data Analysis & Visualisation
Here I Used PowerBI for the analysis
Import Data and Transformation (One-Time Setup):
Go to the Home tab in Power BI Desktop.
Click on Get Data and choose Excel.
Browse and select your Excel file containing the cleaned data.
Select the table you want to import.
In the Power BI import window, you might encounter data type suggestions. Ensure the data types for columns like InvoiceNo (Text), Quantity (Whole Number), UnitPrice (Currency), and CustomerID (Text) are correct.
Regional Revenue Performance:
Create a new Slicer visual by dragging the Country field into the Slicers pane.
Create a new Card visual. Set the Value field to SUM(Revenue). This will show the total revenue for the selected country (from the slicer).
By selecting different countries in the slicer, you can identify the region with the highest and lowest revenue.
Monthly Revenue Trend and Seasonality:
Create a new Line Chart visual.
Set the Axis field to Month from the InvoiceDate (assuming the date is formatted with a month dimension).
Set the Values field to SUM(Revenue). This will show the monthly revenue trend.
Analyze the trend to identify months with significant increases/decreases. You can also create a calculated column for a year if needed.
To assess seasonality, consider creating a year-over-year comparison by adding another line representing the previous year's revenue for the same month.
Top Customers and Revenue Contribution:
Create a new Stacked Bar Chart visual.
Set the Axis field to CustomerID.
Set the Values field to SUM(Revenue).
Sort the chart by revenue (descending) to identify the top customers and their contributions.
To assess customer base diversification, calculate the percentage of total revenue contributed by the top X customers (e.g., top 10%).
Repeat Customers and Order Analysis:
Identify Repeat Customers: Create a calculated column to flag repeat customers. One approach is to use a combination of COUNTROWS and filtering for previous orders based on CustomerID.
Order Frequency and Time Lapse: Create calculated columns to determine the number of orders per customer and the time difference between consecutive orders (using the DATEDIFF function with InvoiceDate).
Repeat Customer Revenue: Filter the data for repeat customers and create a new visual (e.g., bar chart) to show revenue generated from them.
Top Repeat Customers: Similar to top customers, create a visual to identify customers with the most repeat orders and their revenue contribution.
Percentage of Repeat Customers:
Identify Repeat Customers: We need to create a calculated column to flag repeat customers. Here's one approach: VAR firstOrderDate = MIN(InvoiceDate) RETURN VAR hasPreviousOrder = CALCULATE(COUNTROWS('YourTableName'), FILTER('YourTableName', CustomerID = EARLIER(CustomerID), InvoiceDate < firstOrderDate)) RETURN IF(hasPreviousOrder > 0, TRUE, FALSE)
Create a new calculated column named "IsRepeatCustomer".
Use the following DAX formula:
Calculate Percentage: DIVIDE( COUNTROWS(FILTER('YourTableName', IsRepeatCustomer = TRUE)), COUNTROWS(ALL('YourTableName')) )
Create a new calculated measure named "PercentageRepeatCustomers".
Use the following DAX formula:
Visualize Percentage:
Go to the Visualizations pane and select Card.
Drag the "PercentageRepeatCustomers" measure to the Values field.
Repeat Customer Order Analysis:
Identify Order Frequency: Create another calculated column named "NumberOfOrders". Use the COUNTROWS function filtered by CustomerID to count the number of orders per customer.
Re-order Time Lapse: Create a calculated column named "DaysBetweenOrders". This requires some additional logic based on the structure of your data.
You'll need to identify the previous order date for each order using techniques like EARLIER or LOOKUPVALUE functions.
Then, use the DATEDIFF function to calculate the difference between the current invoice date and the previous order date.
Analyze Repeat Customer Revenue:
Filter the data for rows where IsRepeatCustomer is TRUE.
Create a new visual (e.g., bar chart) and set the Values field to SUM(Revenue) to visualize the revenue generated from repeat customers.
Top Repeat Customers:
Visualize Top Repeaters: Use a similar approach as identifying top customers (by revenue). However, replace SUM(Revenue) with COUNTROWS in the Values field. This will show the customers with the most repeat orders.
Revenue from Top Repeaters: Filter for repeat customers and create a separate visual (e.g., bar chart) with CustomerID on the Axis and SUM(Revenue) on the Values field. This will show the top repeat customers and their revenue contribution
Map Visualization
Ensure your data has columns for:
Location (e.g., City, State, Country) - used for geographical mapping.
Sales amount - used for sizing the pie charts.
Additional category (optional) - used for coloring the pie chart slices (can represent product categories, sales reps, etc.).
2. Create the Map Visual:
Drag the Location field to the Location well in the Visualizations pane. Power BI will automatically recognize it and create a map visual.
3. Add Pie Chart Slices:
Drag the Sales amount field to the Size well in the Format pane for the map visual. This will create pie charts on the map, with the size of each pie representing the sales amount for that location.
4. Color by Category :
If you have a category to differentiate sales, drag that field to the Legend well in the Format pane. This will color the pie chart slices based on the category for each location.
5. Customize the Map and Pie Charts:
Use the formatting options in the Format pane to customize the map style, pie chart colors, legend labels, and other visual elements
Please note: The solution I am uploading is the standard solution PDF provided by the internship portal. To prevent reuse and copying, I cannot provide my personal solution here.
Conclusions
Regional Analysis (page1-2) :
Market Penetration and Revenue Streams:
The United Kingdom (UK) represents the company's core market, where it has established itself as a first mover and enjoys a dominant position. This translates to a significant portion of the company's total revenue and sales volume being generated in the UK.
Geographic Expansion and Revenue Diversification:
Beyond the UK, established markets like the Netherlands, Ireland, France, and Germany have also seen healthy revenue streams exceeding €200,000. These regions represent successful geographic expansion efforts and contribute to revenue diversification. Additionally, Australia emerges as a promising emerging market with sales and revenue exceeding €100,000.
Market Re-evaluation and Strategic Adjustments:
Conversely, countries like Brazil, the Czech Republic, Bahrain, and Saudi Arabia have yielded minimal revenue. This suggests the need for a strategic review of the company's business model and policies in these regions. Market re-evaluation might be necessary to identify areas for improvement and optimize performance.
Sales Growth and Marketing Effectiveness:
Recent trends over the past three months indicate a positive development. Sales growth has been observed in Ireland, Germany, and Australia, suggesting the effectiveness of current marketing strategies in these regions.
Customer Understanding (page 2-3):
Customer Retention and Purchase Frequency:
Customer churn is analyzed by calculating the number of times a customer makes a purchase (purchase frequency). This is achieved by counting the occurrences of customer IDs. The data is then visualized in a bar graph. The graph reveals that the percentage of returning customers who make three purchases within a specific timeframe falls below 36%. This highlights an opportunity to improve customer retention strategies.
Upselling and Customer Lifetime Value:
The analysis also reveals that 19% of customers increase their purchase value on their second order. This indicates a potential for upselling strategies to further enhance customer lifetime value.
High-Value Customers and Relationship Management:
The top ten customer IDs ({14646, 18102, 17450, 14911, 12415, 14156, 17511, 16684, 13694, 15311}) represent the company's most valuable clients, contributing significantly to overall revenue. Customer relationship management (CRM) efforts should prioritize these customers to ensure exceptional service and post-sales support, fostering long-term loyalty.

Customer segmentation: Based on purchasing behavior, customers have been clustered into seven groups. Their contributions, behaviors, and buying patterns have been visualized to assist the marketing team.

Product Analysis (Page 4)
The products generating the most revenue have stock codes: 22423, 22960, 85099B, 47566, 84879, 85123A, 79321, and 21621.
Products generating the least revenue have stock codes: 85071A, and 22941.
Time-series Analysis (Page 5)
The next concern was to determine if there is any seasonality in sales, and the answer is affirmative.
The company's sales demonstrate a steady increase starting from August and reaching a peak in November. During these three months, there is a high demand for the company's products. It is crucial to ensure that supply meets demand effectively to capitalize on this trend.



Comments