for row in df.CUSTOMER_CONTENTION_TEXT.values: where can I find this in my Excel file?
    import matplotlib.pyplot as plt
    import pandas as pd
    plt.style.use('ggplot')
    from wordcloud import WordCloud, STOPWORDS

    df = pd.read_excel(r'crawling.xlsx')
    text = ''
    for row in df.CUSTOMER_CONTENTION_TEXT.values:
        text = text + row.value() + ' '

    wc = WordCloud(max_words=2000, stopwords=STOPWORDS, font_path=r'NanumBarunGothic.otf')
    # generate word cloud
    wc.generate(text)
    # store to file
    wc.to_file(r'first.png')
    # show
    plt.imshow(wc)
    plt.axis("off")
    plt.figure()
    plt.axis("off")
    plt.show()
I got this code from GitHub [https://gist.github.com/pybokeh/de5475328fb2bbb33cb7] (python_wordcloud_from_excel.ipynb). The line I am asking about is:
for row in df.CUSTOMER_CONTENTION_TEXT.values:
I cannot understand this line. Can you explain it? I want to adapt this code to my own Excel file, but I don't know what to change.
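As a rough illustration of what that line does: `df.CUSTOMER_CONTENTION_TEXT` is attribute-style access to the column whose header is CUSTOMER_CONTENTION_TEXT, and `.values` is the array of that column's cells, so the loop visits one cell per iteration. To use another spreadsheet, the name would be replaced by one of that file's own column headers. The sketch below assumes a small in-memory DataFrame standing in for `pd.read_excel(r'crawling.xlsx')`; the sample strings and the helper name `column_to_text` are hypothetical. Since each cell is already a plain string, no `.value()` call is needed.

```python
import pandas as pd

# Hypothetical stand-in for: df = pd.read_excel(r'crawling.xlsx')
df = pd.DataFrame({"CUSTOMER_CONTENTION_TEXT": ["engine noise", None, "brake squeal"]})


def column_to_text(frame, column):
    """Join every non-empty cell of one column into a single string."""
    # frame[column] is the same column that frame.CUSTOMER_CONTENTION_TEXT refers to
    # when column == "CUSTOMER_CONTENTION_TEXT"; bracket access also works for
    # headers that contain spaces.
    cells = frame[column].dropna().astype(str)  # skip empty cells, force text
    return " ".join(cells)


# The column headers pandas found in the file; pick the one that holds the text.
print(list(df.columns))

text = column_to_text(df, "CUSTOMER_CONTENTION_TEXT")
print(text)
```

The resulting `text` can then be passed to `wc.generate(text)` as in the question.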
See also questions close to this topic
Trying to engineer new columns using a for loop to create column names and then populate new columns with data from the dataset
I am working with the Olympic medal dataset, which can be found here. The data shows aggregated medals won by each country for both the Summer and Winter Olympics.
I am trying to create new columns showing the number of gold, silver and bronze medals won per games appearance for the summer games, the winter games, and combined. I tried building lists of medals and seasons, creating a new column in the dataframe for each combination, and dividing the original columns (e.g. summer gold, summer silver ...) by the total games attended for the given season (e.g. summer gold / summer games attended).
However, when I tried the code below I got KeyError: ('%s games attended', 'Summer'). Any suggestions for how to create these new features would be appreciated.
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    data = pd.read_csv('/Users/xx/Downloads/olympics.txt', header = 1)
    data.columns = ['Country', 'Summer games attended', 'Summer gold', 'Summer silver','Summer bronze','Summer total',
                    'Winiter games attended','Winter gold', 'Winter silver','Winter bronze','Winter total',
                    'Combined games attended','Combined gold', 'Combined silver','Combined bronze','Combined total']
    data['Country'] = data['Country'].str.split("\[", expand=True)
    data.drop(146, axis = 0, inplace = True)

    medals = ['gold', 'silver', 'bronze']
    seasons = ['Summer', 'Winter', 'Combined']
    for col in data.columns:
        for medal in medals:
            if medal in col:
                for season in seasons:
                    if season in col:
                        n = season + ' ' + medal + ' per games'
                        data[n] = [i for i in col / 2]
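One possible way to build the per-games columns is to loop over seasons and medals directly and divide whole columns, rather than looping over the column names. The tuple in the KeyError suggests that the format string and the season were passed to the dataframe as two separate items instead of being combined into one column name first. The sketch below uses a tiny made-up table (the countries and numbers are invented for illustration) and a hypothetical helper name, `add_per_games_columns`.

```python
import pandas as pd

# Made-up numbers for illustration only; the real table comes from olympics.txt.
data = pd.DataFrame({
    "Country": ["Aland", "Borduria"],
    "Summer games attended": [10, 5],
    "Summer gold": [20, 5],
    "Summer silver": [10, 0],
    "Winter games attended": [4, 2],
    "Winter gold": [2, 1],
    "Winter silver": [4, 3],
})


def add_per_games_columns(frame, seasons, medals):
    """Return a copy with '<season> <medal> per games' columns added."""
    out = frame.copy()
    for season in seasons:
        # Build the full column name as ONE string before indexing.
        attended = out[f"{season} games attended"]
        for medal in medals:
            # Column / column divides row by row, so no inner Python loop is needed.
            out[f"{season} {medal} per games"] = out[f"{season} {medal}"] / attended
    return out


result = add_per_games_columns(data, ["Summer", "Winter"], ["gold", "silver"])
print(result.filter(like="per games"))
```

Note that the column list in the question spells one header 'Winiter games attended', so the Winter lookup would raise a KeyError until that name is corrected. Countries with zero appearances in a season would also need separate handling, since the division then yields inf or NaN.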
How to iterate over file names when reading, and how to combine the results into a single Excel file
I am new to Python. I have a task to compute the following values for about 350 Excel files (1.xlsx to 350.xlsx), all contained in a single folder (Videos). I wrote the following code and it works fine, but it is time-consuming because I have to change the file name manually on every iteration. At the end of the process I also have to combine the processed data from all 350 Excel files into a single Excel file, but my code overwrites the output on every iteration. Please help me resolve this problem.
    data12 = pd.read_excel (r'C:\Users\Videos\1.xlsx')
    gxt = data12.iloc [:,0]
    gyan = data12.iloc [:,1]
    int= gyan.iloc[98:197]
    comp= gyan.iloc[197:252]
    seg= gyan.iloc[252:319]
    A= max(int)
    B= max(comp)
    C= min(comp)
    D= max(seg)
    s = pd.Series([A, B, C, D])
    frame_data= [gyan, comp, seg, stat]
    result = pd.concat(frame_data)
    result.to_excel("output.xlsx", sheet_name='modify_data', index=False)
Thank you for helping.
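A minimal sketch of one way to approach this: build each file name inside a loop, collect one summary row per file in a list, and write the output once after the loop so nothing is overwritten. The slice boundaries and the labels A to D are copied from the question; the function names `summarize` and `summarize_folder` and the extra "file" column are hypothetical. Reading .xlsx files with pandas is assumed to have an Excel engine such as openpyxl installed.

```python
from pathlib import Path

import pandas as pd


def summarize(frame):
    """Compute the four values from the question for one workbook."""
    second = frame.iloc[:, 1]        # second column, as in the question
    part_a = second.iloc[98:197]
    part_b = second.iloc[197:252]
    part_c = second.iloc[252:319]
    return {"A": part_a.max(), "B": part_b.max(), "C": part_b.min(), "D": part_c.max()}


def summarize_folder(folder, count=350):
    """Read 1.xlsx .. <count>.xlsx from folder and return one row per file."""
    rows = []
    for i in range(1, count + 1):
        path = Path(folder) / f"{i}.xlsx"   # file name changes on each iteration
        row = summarize(pd.read_excel(path))
        row["file"] = path.name              # remember which file the row came from
        rows.append(row)
    return pd.DataFrame(rows)


# Write once, after all files are processed (folder path taken from the question):
# summarize_folder(r'C:\Users\Videos').to_excel("output.xlsx", sheet_name="modify_data", index=False)
```

Collecting rows in a list and building the DataFrame at the end avoids both the manual renaming and the repeated overwrite of output.xlsx.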
Tkinter class and methods
I have this tkinter code, which works fine, but I want to put it into a class with a method for each process. I am very new to Python; how can I do this?
You don't have to do all of it; just the class and two methods will be fine, and I can learn to replicate the rest.
    root= tk.Tk()
    canvas1 = tk.Canvas(root, width = 400, height = 400, relief = 'raised')
    canvas1.pack()

    label1 = tk.Label(root, text='EDA')
    label1.config(font=('helvetica', 12))
    canvas1.create_window(200, 25, window=label1)

    label2 = tk.Label(root, text='Number of Clusters:')
    label2.config(font=('helvetica', 8))
    canvas1.create_window(200, 120, window=label2)

    entry1 = tk.Entry (root)
    canvas1.create_window(200, 140, window=entry1)

    browseButtonExcel = tk.Button(text=" Import Excel File (CSV) ", command=App.getExcel, bg='green', fg='white', font=('helvetica', 10, 'bold'))
    canvas1.create_window(200, 70, window=browseButtonExcel)

    processButton = tk.Button(text=' k-Means Clustering', command=cluster, bg='brown', fg='white', font=('helvetica', 10, 'bold'))
    canvas1.create_window(200, 170, window=processButton)

    root.mainloop()
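A possible starting point, not a definitive design: create the widgets in `__init__`, keep the ones that are needed later as attributes, and turn each button callback into a method. The method names `get_excel` and `cluster` and the attribute `file_path` are assumptions, and the body of `cluster` is only a placeholder because the clustering code itself is not shown in the question.

```python
import tkinter as tk
from tkinter import filedialog


class App:
    """Groups the window's widgets and their callbacks in one object."""

    def __init__(self, root):
        self.root = root
        self.file_path = None  # filled in by get_excel()

        self.canvas = tk.Canvas(root, width=400, height=400, relief='raised')
        self.canvas.pack()

        title = tk.Label(root, text='EDA', font=('helvetica', 12))
        self.canvas.create_window(200, 25, window=title)

        prompt = tk.Label(root, text='Number of Clusters:', font=('helvetica', 8))
        self.canvas.create_window(200, 120, window=prompt)

        # Kept as an attribute because cluster() reads it later.
        self.entry = tk.Entry(root)
        self.canvas.create_window(200, 140, window=self.entry)

        # command=self.get_excel passes the bound method, without calling it.
        browse = tk.Button(root, text=' Import Excel File (CSV) ', command=self.get_excel,
                           bg='green', fg='white', font=('helvetica', 10, 'bold'))
        self.canvas.create_window(200, 70, window=browse)

        process = tk.Button(root, text=' k-Means Clustering', command=self.cluster,
                            bg='brown', fg='white', font=('helvetica', 10, 'bold'))
        self.canvas.create_window(200, 170, window=process)

    def get_excel(self):
        """Ask for a file and remember its path for the other methods."""
        self.file_path = filedialog.askopenfilename()

    def cluster(self):
        """Placeholder: read the cluster count; the k-means step would go here."""
        n_clusters = int(self.entry.get())
        print(f"Would cluster {self.file_path} into {n_clusters} groups")


# To launch the window:
#   root = tk.Tk(); App(root); root.mainloop()
```

Further steps can follow the same pattern: one method per button, with shared state such as the file path stored on `self`.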
plot Word cloud without stopwords
I am looking to plot a word cloud using a column in my pandas dataframe.
Here is my code:
    all_words=''.join( [tweet for tweet in tweet_table['tokens'] ] )
    word_Cloud=WordCloud(width=500, height=300, random_state=21, max_font_size=119).generate(all_words)
    plt.imshow(word_Cloud, interpolation='bilinear')
The column tweet_table['tokens'] that I am looking to plot looks like this:
    0        [da, trumpanzee, follower, blm, balance, wp, g...
    1        [counting, blacklivesmatter, received, trainin...
    2        [okay, like, little, kids, pretty, smart, know...
    3        [thank, oscopelabs, got, mounted, loud, amp, p...
    4        [bpi, proud, supported, hoops, 4l, f, e, see, ...
    ...
    44713    [tomorrow, buy, charity, compilation, undergro...
    44714    [needs, erected, state, capitol, think, darkfa...
    44715    [clay, county, sheriffs, motto, screw, amp, in...
    44716    [films, eleven, films, bravo, bad, ass, video,...
    44717    [everybody, give, listen, blm]
My code above gives me the following error:
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-227-4066d6d1a153> in <module>
          2 # REMOVE STOP WORDS
          3
    ----> 4 all_words=''.join( [tweet for tweet in tweet_table['tokens'] ] )

    TypeError: sequence item 0: expected str instance, list found
How can I fix the error, please? The column tweet_table['tokens'] is tokenized and clean of any stop words.
PS: when I use similar code for the column tweet_table['clean_text'], the code works fine.
tweet_table['clean_text'] looks like this:
    0        You have a da trumpanzee follower in ...
    1        Over 279 and counting If BlackLivesMatte...
    2        Okay but like little kids are pretty smart and...
    3        Thank you oscopelabs got it mounted loud amp...
    4        BPI is proud to have supported Hoops4L Y F E ...
    ...
    44713    TOMORROW you can buy the charity compilation...
    44714    That needs to be erected at the State Capi...
    44715    Clay County Sheriffs Motto To Screw amp ...
    44716    Films Eleven Films bravo Bad ass vid...
    44717    everybody should give this a listen ...
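The error message says that `str.join` received lists where it expected strings: each row of the tokens column is itself a list, while each row of clean_text is a single string. One way around this is to flatten the rows first, joining every word of every row. The sketch below uses two short token lists as a stand-in for `tweet_table['tokens']`; the variable name `tokens` is hypothetical.

```python
# Stand-in for tweet_table['tokens']: each row is a list of words.
tokens = [
    ["da", "follower", "blm"],
    ["everybody", "give", "listen", "blm"],
]

# Walk through every row, then every word in that row, and join with a space
# so that neighbouring words are not glued together.
all_words = " ".join(word for row in tokens for word in row)
print(all_words)

# all_words is now a plain string and can be passed to WordCloud(...).generate(all_words).
```

Iterating over a pandas Series yields its rows in the same way, so the same expression should work with `tweet_table['tokens']` in place of `tokens`, provided every row is a list of strings.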
Stylecloud to generate wordcloud
I want to use stylecloud to generate a word cloud. I followed the instructions on its GitHub page (https://github.com/minimaxir/stylecloud), but somehow I cannot see any result. My Python version is 3.7.4.
How to print top words from wordcloud
I have a word cloud generated using the wordcloud library. I got an image of the wordcloud. Now I want a list of the top x words of the wordcloud. How do I do that? I don't want the most frequent words. I want the most frequent and most important words which is what the image shows.
Here is the wordcloud library: link
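One option, sketched under assumptions: after `generate()` has been called, a WordCloud object from the wordcloud library keeps a `words_` dictionary that maps each word in the image to its relative frequency, so the top x words can be read from it by sorting. The helper name `top_words` and the sample dictionary below are hypothetical; the sample stands in for `wc.words_` so the example runs without the library.

```python
def top_words(frequencies, n):
    """Return the n (word, weight) pairs with the largest weights."""
    ranked = sorted(frequencies.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


# Stand-in for wc.words_ (word -> relative frequency after wc.generate(text)).
sample = {"engine": 1.0, "noise": 0.5, "brake": 0.75}
print(top_words(sample, 2))

# With a real word cloud: top_words(wc.words_, 10)
```

The weights in that dictionary are frequency based (after stop words are removed), so this returns the words that were candidates for the largest sizes in the image rather than a separate importance score.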