I need help plotting some categorical and numerical values in Python. The code is given below:

%%time  
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns




%%time
df = pd.read_csv('train_feature_store.csv')
df.info()
df.head()
df.columns



plt.figure(figsize=(20,6)) 
sns.countplot(x='Store', data=df) 
plt.show()



Size = df[['Size','Store']].groupby(['Store'], as_index=False).sum()
Size.sort_values(by=['Size'], ascending=False).head(10)



However, the dataset is so large (big data) that I can't produce a meaningful plot in Python. Basically, I just want to take the top 5 or top 10 values and plot them, as in the image below:

https://i.stack.imgur.com/pHcAI.png

To get there, I'm trying to put the result of the code below into a DataFrame and plot it, but I haven't been able to. Can anyone help me with this?

Size = df[['Size','Store']].groupby(['Store'], as_index=False).sum()
Size.sort_values(by=['Size'], ascending=False).head(10)
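
One possible way to get the top-N plot described above is to aggregate first and then plot only the aggregated rows, so the chart has 10 bars instead of thousands. This is a minimal sketch, not a definitive answer: the column names 'Store' and 'Size' come from the question, while the function names top_stores_by_size and plot_top_stores and the sample data are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt


def top_stores_by_size(df, n=10):
    """Return the n stores with the largest summed 'Size', largest first."""
    totals = df.groupby("Store", as_index=False)["Size"].sum()
    return totals.nlargest(n, "Size")


def plot_top_stores(top):
    """Draw a bar chart from an already-aggregated top-n frame."""
    fig, ax = plt.subplots(figsize=(10, 5))
    # Cast Store to str so the axis treats it as a category, not a number.
    ax.bar(top["Store"].astype(str), top["Size"])
    ax.set_xlabel("Store")
    ax.set_ylabel("Total Size")
    ax.set_title(f"Top {len(top)} stores by total Size")
    return ax


# Made-up sample data (an assumption), only to show the shape of the result.
sample = pd.DataFrame({
    "Store": [1, 1, 2, 2, 3, 4],
    "Size": [100, 100, 50, 50, 300, 10],
})
top3 = top_stores_by_size(sample, n=3)
ax = plot_top_stores(top3)
# In a notebook or script, call plt.show() here to display the figure.
```

With the real data, the same two calls would be applied to df instead of sample, for example top_stores_by_size(df, n=10).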


Below is a link to a sample dataset. It is only representative; the original dataset on which I'm doing the EDA has around 3 thousand unique stores and 60 thousand rows. Please help! Thanks!



https://drive.google.com/file/d/1j77Xvl1mzUAPNZ53b89LzODSu1ZsbvEJ/view?usp=sharing

Comments
Richard MacCutchan 14-Jul-22 4:39am    
You need to read less data in the first place. It is quite possible (although you have not explained the problem) that pandas is running out of memory.
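
If memory really is the bottleneck (the question does not confirm this), one way to read less data is to load only the columns that are needed and to process the file in chunks. The sketch below uses the usecols and chunksize parameters of pandas.read_csv; the function name store_size_totals, the chunk size, and the inline CSV text are assumptions made for illustration.

```python
import io
import pandas as pd


def store_size_totals(csv_source, chunksize=100_000):
    """Sum 'Size' per 'Store' while reading the CSV in chunks."""
    reader = pd.read_csv(
        csv_source,
        usecols=["Store", "Size"],  # skip every column the plot does not need
        chunksize=chunksize,        # yields DataFrames of at most this many rows
    )
    # Aggregate each chunk on its own, then combine the partial sums.
    partials = [chunk.groupby("Store")["Size"].sum() for chunk in reader]
    totals = pd.concat(partials).groupby(level=0).sum()
    return totals.sort_values(ascending=False)


# Made-up CSV text standing in for the real file (an assumption).
csv_text = """Store,Dept,Size
1,1,100
2,1,50
1,2,100
3,1,300
2,2,50
"""
totals = store_size_totals(io.StringIO(csv_text), chunksize=2)
```

For the real file, the file path would be passed in place of the io.StringIO object, and totals.head(10) would give the ten largest stores to plot.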

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)