Review of India's Amazon Online Sales Report - March 2020-June 2020

Upon further review by my client, Amazon, of the online sales in India, we discovered that the most recent sales report data showed the status of several items as either cancelled, damaged, lost in transit, rejected by the buyer, returned or returning to the seller.

The Executive Team wanted to see the number and dollar amounts of the items in the above categories, the breakdown by month, the top three states the items were returned from, the type of clothing and sizes of these items.

I followed the data analysis process:

Step 1: Define the question;

Step 2: Collect the data;

Step 3. Clean the data;

Step 4: Analyze the data;

Step 5: Visualize and share the findings.

Data cleaning

To clean any data, it is required to remove irrelevant or duplicate, fix structural errors, do any type conversions if needed, handle missing data, deal with outliers, standardize/normalize data and then validate it.
Working through 120,378 of unique values was very difficult in Excel so I switched to BigQuery to clean the data via SQL.
I ran a query to see all the field names and the distinct rows in the file.
Missing values can create errors or skew the results during analysis.
Upon closer inspection, the column Qty. showed some zeros and in the Amount column nothing was listed for those rows. I deleted those rows by using the following SQL statement.
The query returned 7,795 rows where the amounts were null.
I decided to remove these rows from the analysis.
Since I don’t have a subscription to BigQuery, I was not allowed to make too many changes to the data file so I continued cleaning the data in Microsoft SQL Server Management Studio 2019.
I created the database Amazon and then I imported the flat file Amazon Sales Report.
I had to modify the data type in columns before the data could be imported in SQL Server successfully.
The last column titled “Unnamed_22” had only NULL values so I deleted it.
Many rows in the Qty column had 0 values so I wanted to see how many rows were there with those values.
After running the query, it showed 5,136 with 0 values in the Qty column.
I removed those rows as well.
I ran a statement to see how the data file looked and it now showed that the database had 116,044 rows.
Some of the cells in the promotion_ids and fulfilled_by columns had empty cells but I couldn’t delete the rows because it had values in other columns that I needed. Instead of NULL, I added 0 to make it look cleaner. I ran the following statement which completed successfully:
UPDATE dbo.[Amazon Sale Report]
SET promotion_ids = 0
WHERE
promotion_ids IS NULL;
Now that the file was cleaner, I wanted to see how many orders were cancelled so I can focus on those. After running the statement below, the query showed 5,630 rows of cancellations.
SELECT * FROM dbo.[Amazon Sale Report]
WHERE
Status = 'Cancelled';
I wanted to confirm the total amount of the items cancelled so I ran the query below. Once clean, I exported the file as a csv format for further analysis.
The Status showed items cancelled, shipped - damaged, shipped -lost in transit, shipped – rejected by buyer, shipped – returned to seller, shipped – returning to seller and some blank. I wanted to dig deeper and get more information on these items. I used Tableau Public to import the data file and get more visualizations.

Data analysis

Once the clean data file was imported, I was ready to explore it further and create visualizations in Tableau.
The primary focus was on items that had negative revenue, so after sorting the amounts I found that the cancelled amounts and the items that were shipped but were returned to seller warranted deeper analysis.
Most of the items were either cancelled or returned during the months of April, May and June.
I filtered the status by cancelled and returned to seller and added the cities as well.
Most of the cancellations, returns, etc. were happening throughout India but there were some states that had the most negative dollar returns. The top three were Maharashtra, Karnataka and Uttar Pradesh.
I wanted to focus on the top three states to get more information on the items being cancelled or returned so I created a filter just for those states by month. I added Telangana to the map since its numbers were somewhat similar to Uttar Pradesh.
A closer look at the items cancelled, damaged or returned by month and state.
Most of the returned items from all three states were kurtas and sets.
Sizes of the items analyzed.

Conclusion

Top three states heat map
Items by top three states and month
Focus on two categories
Please click on the map below for the full presentation.