-Identify the type of each attribute.
| Attributes | Type |
| --- | --- |
| store_nbr | nominal |
| family | category |
| onpromotion | discrete |
| sales | continuous (target attribute) |
| date | interval |
| city | category |
| cluster | discrete |
| type (store dataset) | category |
| holiday | category |
| type_holiday | category |
| transferred | nominal (binary) |
| oil price | continuous |
| dcoilwtico | continuous |
| Local | category |
| Local name | category |
| Description | category |
-Stores
- The company owns 54 branches nationwide.
- Not all branches were opened at the same time. Store 22 is an example.
- As shown in the figure, its total sales are zero over a long period, indicating that the store had not yet opened during that period.

Will this affect the accuracy of the model?
Yes, because the model would otherwise treat those pre-opening zeros as genuine sales. Therefore, this issue will be addressed in preprocessing.
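
One possible way to handle this in preprocessing is to drop each store's rows that come before its first day with non-zero sales. The sketch below is only an illustration in plain Python: the function name `trim_pre_opening`, the tuple layout, and the toy dates and values are assumptions, and it treats the first non-zero sales day as the opening date, which the source does not state.

```python
from datetime import date

def trim_pre_opening(rows):
    """Drop each store's rows dated before its first non-zero sales day.

    rows: list of (store_nbr, date, sales) tuples (hypothetical layout).
    """
    # First date with sales > 0 per store, used as the assumed opening date.
    first_sale = {}
    for store, day, sales in rows:
        if sales > 0 and (store not in first_sale or day < first_sale[store]):
            first_sale[store] = day
    # Keep rows on or after that date; stores with no sales at all are dropped.
    return [r for r in rows if r[0] in first_sale and r[1] >= first_sale[r[0]]]

# Toy data, not taken from the dataset.
rows = [
    (22, date(2015, 1, 1), 0.0),    # before opening: removed
    (22, date(2015, 1, 2), 0.0),    # before opening: removed
    (22, date(2015, 1, 3), 120.5),  # first non-zero day: kept
    (22, date(2015, 1, 4), 0.0),    # zero after opening: kept
    (1, date(2015, 1, 1), 300.0),
    (1, date(2015, 1, 2), 280.0),
]
trimmed = trim_pre_opening(rows)
```

Zero-sales days after the assumed opening date are kept on purpose, since they may be real closures or stock-outs rather than missing history.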
-City

- These stores are distributed across 22 cities and 16 states.
- The histogram of sales across the 22 cities shows that Quito has the highest total sales.

- Most of the other cities fall within a similar sales range (1,000 to 15,000).
- This suggests applying scaling in the data preprocessing stage.
- Does this mean that the region or city affects the sales of each store? Not necessarily, especially if the higher sales in a specific city or state are due to that city or state having the largest number of stores, as shown in the previous figure (Quito: 18 of the 54 stores, a share of 0.33).
- Confirming this hypothesis:
- The two plots show the sales of two stores in the same region. Their distributions and value ranges differ significantly, which indicates that other factors have a greater influence…
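
To separate the effect of a city from the effect of simply having more stores, total sales can be compared with sales per store. The following is a minimal sketch in plain Python, assuming two hypothetical mappings (`store_city` and `store_sales`) and toy numbers that are not taken from the dataset; `OtherCity` is a placeholder name.

```python
from collections import defaultdict

def city_summary(store_city, store_sales):
    """Return {city: (n_stores, total_sales, mean_sales_per_store)}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for store, city in store_city.items():
        totals[city] += store_sales.get(store, 0.0)
        counts[city] += 1
    return {c: (counts[c], totals[c], totals[c] / counts[c]) for c in totals}

# Toy numbers, chosen only to illustrate the point.
store_city = {1: "Quito", 2: "Quito", 3: "Quito", 4: "OtherCity"}
store_sales = {1: 900.0, 2: 600.0, 3: 300.0, 4: 800.0}

summary = city_summary(store_city, store_sales)
for city, (n, total, mean) in summary.items():
    print(f"{city}: stores={n}, total={total:.0f}, per_store={mean:.0f}")
```

In this toy case the city with the most stores has the highest total but a lower per-store average, and its three stores differ widely from one another, which is the pattern the bullets above describe.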