Python/MatPlotLib: Set the Labels to Months
I have been trying to figure this out for hours. I am working with a dataset of Trump's approval ratings from FiveThirtyEight; the data comes specifically from Gallup polls, with 12 polls per month. I just cannot find a way to set the x-axis labels to months instead of the entry index: for example, entries 0 through 11 should be labeled Jan, 12 through 23 should be Feb, and so on. This is the code I currently have:
sns.set()
xcoords = [i for i in range(0, len(trumpGallup))]
plt.plot(xcoords, trumpGallup.adjusted_approve, 'g-', label="Adjusted Approval (Trump)")
plt.plot(xcoords, trumpGallup.adjusted_disapprove, 'r-', label="Adjusted Disapproval (Trump)")
plt.legend(loc=2)  # This enables and determines the location of the graph's legend
plt.xlabel("Trump's Ratings from January 2017 to November 2017")
plt.ylabel("Percentage")
plt.show()
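One way to get month names on the axis -- a minimal sketch with made-up data, assuming 12 polls per month running January through November 2017 as described above -- is to place one tick at the first entry of each month with plt.xticks:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov"]
approve = 40 + np.random.randn(12 * len(months))  # stand-in for trumpGallup

plt.plot(range(len(approve)), approve, "g-")

# Entry 0 starts Jan, entry 12 starts Feb, ... so tick every 12 entries.
tick_positions = [i * 12 for i in range(len(months))]
plt.xticks(tick_positions, months)
plt.savefig("approval.png")
```

If the dataframe had a real datetime column, matplotlib.dates.MonthLocator with DateFormatter('%b') would achieve the same without hand-computed positions.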
See also questions close to this topic
How to iterate over divs in Scrapy?
It is probably a very trivial question, but I am new to Scrapy. I've tried to find a solution to my problem, but I just can't see what is wrong with this code.
My goal is to scrape all of the opera shows from the given website. The data for every show is inside one div with class "row-fluid row-performance ". I am trying to iterate over the divs to retrieve it, but it doesn't work: I get the content of the first div on every iteration (19 copies of the same show instead of different items).
Thanks for any advice!
import scrapy
from ..items import ShowItem

class OperaSpider(scrapy.Spider):
    name = "opera"
    allowed_domains = ["http://www.opera.krakow.pl"]
    start_urls = [
        "http://www.opera.krakow.pl/pl/repertuar/na-afiszu/listopad"
    ]

    def parse(self, response):
        divs = response.xpath('//div[@class="row-fluid row-performance "]')
        for div in divs:
            item = ShowItem()
            item['title'] = div.xpath('//h2[@class="item-title"]/a/text()').extract()
            item['time'] = div.xpath('//div[@class="item-time vertical-center"]/div[@class="vcentered"]/text()').extract()
            item['date'] = div.xpath('//div[@class="item-date vertical-center"]/div[@class="vcentered"]/text()').extract()
            yield item
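A likely culprit, illustrated here with lxml rather than a full Scrapy run: inside the loop, an XPath beginning with // searches the whole document, not the current div, so every iteration returns the same first match. Prefixing with .// makes the query relative to the loop variable -- in the spider that would be div.xpath('.//h2[@class="item-title"]/a/text()') and likewise for time and date. The HTML below is an invented two-show stand-in:

```python
from lxml import html

# Two stand-in "performance" divs.
doc = html.fromstring("""
<div><div class="row"><h2 class="item-title"><a>Carmen</a></h2></div>
<div class="row"><h2 class="item-title"><a>Tosca</a></h2></div></div>
""")
divs = doc.xpath('//div[@class="row"]')

# '//h2...' ignores the loop variable and searches the whole document:
absolute = [d.xpath('//h2[@class="item-title"]/a/text()') for d in divs]

# './/h2...' is relative to the current div:
relative = [d.xpath('.//h2[@class="item-title"]/a/text()') for d in divs]
```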
Can't log into aspx website
I'm trying to log into this website https://www.moodys.com/Login.aspx using Python 3 but I'm having no luck. I've tried every method possible and can't seem to get it to work. I'm trying this code below but I can't figure out the login section
from robobrowser import RoboBrowser

url = 'https://moodys.com'
login_url = 'https://www.moodys.com/Login.aspx'
username = "XXXXX"
password = "XXXXX"

browser = RoboBrowser(history=True)
# This retrieves __VIEWSTATE and friends
browser.open(login_url)

signin = browser.get_form(id='aspnetForm')
signin["MdcUserName"].value = username
signin["MdcPassword"].value = password
signin["Log In"].value = "Log In"

browser.submit_form(signin)
browser.url
IndentationError when there is no indentation needed
My script is simple:
I use the interactive interpreter, and I get the error when I import the file called "tries1.py", which contains nothing but the code cited -- no spaces, no tabs.
The "r" is from an old version of the same file. I tried exiting the interpreter and the terminal and starting over -- this does not help. This happens both with Python 2.7.12 ("python") and with Python 3.5.2 ("python3") on Linux Mint. Is there a way to "clear" what the interpreter remembers, and how does it remember stuff?
Does anyone have a clue?
The full story:
user@host /media/user/New Volume/IT studies/Python $ ls
__pycache__  test2.py  test2.pyc  test.py  tries 1.py  tries1.py
user@host /media/user/New Volume/IT studies/Python $ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tries1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "tries1.py", line 2
    print("Hello", r)
        ^
IndentationError: expected an indented block
>>>
P.S. I am new to programming and this is my very first post about IT stuff!
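Two things are worth checking here. First, the ls output above shows both `tries 1.py` (with a space) and `tries1.py`; the interpreter imports the latter, so edits made to the wrong file will never be seen. Second, an already-imported module is cached in sys.modules, and a plain `import` does not re-read the file. A small sketch of forcing a re-read (the module name and contents are invented for the demo):

```python
import importlib
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()
sys.path.insert(0, tmp)
path = os.path.join(tmp, "tries_demo.py")

with open(path, "w") as f:
    f.write("GREETING = 'Hello'\n")
import tries_demo
first = tries_demo.GREETING

# Edit the file on disk. Re-importing returns the CACHED module ...
with open(path, "w") as f:
    f.write("GREETING = 'Goodbye'\n")
import tries_demo
cached = tries_demo.GREETING

# ... while importlib.reload() re-reads the source from disk.
reloaded = importlib.reload(tries_demo).GREETING
```

In Python 2 the equivalent is the builtin reload(tries1); deleting stale .pyc files and the __pycache__ directory also helps when the interpreter appears to remember old code.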
Saving pandas styler object in any way
Here's a styler object with a background gradient:
I'm just looking for any way to save it as is. I have tried using .render(), but I'm not sure what to do with that HTML code, and from reading other questions on the subject it seems there is no current way to save these. Is there a hackish way to do it?
Here is the array:
array([[ 0.264     ,  0.271     ,  0.285     ,  0.289     ,  0.329     ],
       [ 0.053     ,  0.051     ,  0.045     ,  0.038     ,  0.031     ],
       [ 0.006     ,  0.007     ,  0.009     ,  0.01      ,  0.01      ],
       [-3.98650106, -3.95728537, -3.99767582, -4.20624136, -3.54186842],
       [-3.22600677, -2.87623307, -2.03420988, -1.54443176, -1.41006671]])
and the line of code I have:
Use pandas autocorrelation plot to plot the autocorrelation of the adjusted monthend close AAPL
So I'm trying to plot the autocorrelation of AAPL, and also the autocorrelation of AAPL shifted by one month, in order to calculate the monthly return. There is no line in my second graph, and in both graphs the range of x values is not what I expected (0-1200, when there are 1795 days between the dates I gave).
import pandas_datareader.data as web
import datetime as dt
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from pandas.plotting import autocorrelation_plot

start = dt.datetime(2012, 7, 31)  # use the dt alias from the import above
end = dt.datetime(2017, 6, 30)

aapl = web.DataReader('AAPL', 'yahoo', start, end)
autocorrelation_plot(aapl['Adj Close'])
plt.show()
autocorrelation_plot(aapl.shift(30)['Adj Close'])
plt.show()
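Two observations, illustrated with synthetic data since the Yahoo download may not be reproducible. The x-axis of autocorrelation_plot runs over the number of observations, and five years of trading days is roughly 252 x 5, about 1260 rows, not 1795 calendar days. And shift(30) fills the first 30 rows with NaN, which blanks the second plot; drop them first:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import autocorrelation_plot

# Synthetic "price" series over business days only.
idx = pd.bdate_range("2012-07-31", "2017-06-30")
prices = pd.Series(100 + np.cumsum(np.random.randn(len(idx))), index=idx)

autocorrelation_plot(prices)   # x-axis tops out near len(prices), ~1280

# shift(30) leaves 30 leading NaNs; drop them before plotting.
shifted = prices.shift(30).dropna()
autocorrelation_plot(shifted)
plt.savefig("acf.png")
```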
Ordering a dataframe using value_counts
I have a dataframe in which under the column "component_id", I have component_ids repeating several times. Here is what the df looks like:
In : df.head()
Out:
   index  molregno      chembl_id  assay_id     tid     tid  component_id
0      0    942606  CHEMBL1518722    688422  103668  103668          4891
1      0    942606  CHEMBL1518722    688422  103668  103668          4891
2      0    942606  CHEMBL1518722    688721      78      78           286
3      0    942606  CHEMBL1518722    688721      78      78           286
4      0    942606  CHEMBL1518722    688779  103657  103657          5140

  component_synonym
0              LMN1
1              LMNA
2              LGR3
3              TSHR
4              MAPT
As can be seen, the same component_id can be linked to various component_synonyms (essentially the same gene, but different names). I want to find out the frequency of each gene, so that I can pick the top 20 most frequently hit genes; therefore I performed a value_counts on the column "component_id" and get something like this:
In : df.component_id.value_counts()
Out:
5432    804
3947    402
5147    312
3       304
2693    294
75      282
Name: component_id, dtype: int64
Is there a way for me to order the entire dataframe according to the component_id that is present the most number of times? And also, is it possible for my dataframe to contain only the first occurrence of each component_id?
Any advice would be greatly appreciated!
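Both parts can be done without leaving pandas -- a sketch on a toy frame (column values invented): map each row to its group's count, sort on that, then use drop_duplicates for the first occurrence per id. The top-20 list itself is just df['component_id'].value_counts().head(20).

```python
import pandas as pd

# Toy stand-in for the dataframe in the question.
df = pd.DataFrame({
    "component_id":      [286, 4891, 4891, 286, 4891, 5140],
    "component_synonym": ["LGR3", "LMN1", "LMNA", "TSHR", "LMN1", "MAPT"],
})

# Attach each row's group frequency, sort descending, drop the helper.
freq = df["component_id"].map(df["component_id"].value_counts())
df_sorted = (df.assign(freq=freq)
               .sort_values("freq", ascending=False)
               .drop(columns="freq"))

# Keep only the first occurrence of each component_id.
df_first = df_sorted.drop_duplicates(subset="component_id")
```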
Impact of NA's when filtering Data Frames
I have a large data frame, which includes the following 2 fields and the number of rows shown (just 2 columns shown for simplicity):
> nrow(df)
3541393
> summary(df$ttlVisits)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   1.000   1.000   1.527   1.000 118.000
> summary(df$AVGsessTOS)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
      1      27      30     115      72   21554  280146
I would like to remove rows with AVGsessTOS > 1628
> nrow(df[df$AVGsessTOS >= 1628,])
300645
So, I run the following command, expecting 300,645 rows to be removed, but instead get 20,499:
df <- df[ df$AVGsessTOS < 1628, ]
The impact of the command on row counts and the 2 original columns:
> 3541393 - nrow(df)
20499
> summary(df$ttlVisits)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
   1.00    1.00    1.00    1.53    1.00  118.00  280146
> summary(df$AVGsessTOS)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
    1.0    27.0    30.0   102.5    70.0  1627.5  280146
If I make a simple change to my filtering approach and use the 'which' function, I get the results that I expect.
df <- df.bak # restore original data frame
df <- df[ which(df$AVGsessTOS < 1628), ]
And the impacts of the command:
> 3541393 - nrow(df)
300645
> summary(df$ttlVisits)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   1.000   1.000   1.526   1.000 118.000
> summary(df$AVGsessTOS)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
    1.0    27.0    30.0   102.5    70.0  1627.5
My interpretation of the above is that Filter #1 caused the expected 300,645 rows to get dropped BUT had a side effect of adding 280,146 "empty rows" due to the presence of NA's in df$AVGsessTOS. ( 300,645 - 280,146 = 20,499)
Can someone confirm my interpretation of these results, and that this is the expected behavior of Filter #1?
Maybe this will help someone else avoid getting bit by this as well. Thanks
UPDATE: Replicating the issue with mtcars:
data(mtcars)
set.seed(66)
> nrow(mtcars)
32
Looking at breakdown of the distribution of the 'carb' column is as-expected, totaling 32:
> table(mtcars$carb)
 1  2  3  4  6  8
 7 10  3 10  1  1
Now set 3 carb values to NA (not entire row, just carb values) to create similar data to my dataset, to illustrate the problem:
set.seed(66)
mtcars[sample(1:nrow(mtcars), 3), ]$carb <- NA
Again, distribution of 'carb' column totals 29 is as expected, 3 less than the original after setting NA's:
> table(mtcars$carb)
 1  2  3  4  6  8
 6 10  1 10  1  1
Now, drop the 6 rows shown above, with carb value of 1
> mtcars2 <- mtcars[mtcars$carb >= 2,]
Confirm intended records were dropped:
> table(mtcars2$carb)
 2  3  4  6  8
10  1 10  1  1
However, row count does NOT agree with above counts:
> nrow(mtcars2)
26
Inspecting the data shows 3 entire rows of NA values. Where are these rows coming from?
View(mtcars2)  # replicate to see the output of View()
correct apply of pandas.apply with lambda
I would like the column 'extrema' of my DataFrame to be 'max2015' where 'max2015' is bigger than 'max', or 'min2015' where 'min2015' is smaller than 'min'.
I think the most elegant way to solve this is a df.apply/lambda combination, but I can't get a correct solution with it.
x['extrema'] = x.apply(
    lambda df: df['max2015'] if df['max2015'] > df['max']
    else df['min'] if df['min2015'] > df['min']
    else np.nan,
    axis=1)
I get the following result, which is not the correct solution.
What's my mistake or another good solution?
Thank you in advance!
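For what it's worth, the lambda as written returns df['min'] (rather than df['min2015']) and tests min2015 > min (rather than <). A vectorised alternative that implements the stated rule, shown on a toy frame with invented values, is np.select:

```python
import numpy as np
import pandas as pd

# Toy frame with the four columns from the question.
x = pd.DataFrame({
    "max":     [10, 10, 10],
    "min":     [ 0,  0,  0],
    "max2015": [12,  5,  5],
    "min2015": [ 1, -2,  1],
})

# Take max2015 when it exceeds max, min2015 when it falls below min,
# else NaN. np.select avoids a row-wise apply entirely.
x["extrema"] = np.select(
    [x["max2015"] > x["max"], x["min2015"] < x["min"]],
    [x["max2015"], x["min2015"]],
    default=np.nan,
)
```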
How to deal with tibbles when subsetting/indexing dataframe column in R?
I am currently subsetting my dataframe column like this
df_subset <- df[,c(2)]
   Measurement
1         2752
2         2756
3         2756
4         2740
5         2724
6         2536
7         2796
8         2800
The output says this is a 50 x 1 tibble, which makes sense cause there are 50 rows and 1 column. However, I am not sure how to deal with tibbles. From what I understand I cannot index it like I would a list or vector. Is it easy to index a tibble, and if so how? If not, how would I convert this to a list/vector instead? Just to give you an idea of what I want to do, let's say I want to index the 8th, 15th, 23rd, and 47th measurement values, and I would like the final output to ideally be a vector or something else that is easy to work with.
Python Spyder: show plots in new window instead of inline (not answered by previous posts)
Several existing posts address this question, but none of the solutions worked for me.
I am using Spyder 3.2.4 with Python 3.6. I'd like plots to show up in a new window instead of as tiny in-line figures in the IPython console.
I tried Tools > Preferences > IPython console > Graphics > Graphics backend > Automatic. I also tried Qt5 and Qt4 here, and closed and reopened the file I was trying to run (see code below).
I also tried
directly in the console, with no result.
I checked whether the windows might be popping up in the background, but they are not.
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(10)
y = x**2

plt.ion()
plt.plot(x, y)
plt.show()
How do I map df column values to hex color in one go?
I have a pandas dataframe with two columns. One column's values need to be mapped to hex colors. Another graphing process takes over from there.
This is what I have tried so far. Part of the toy code is taken from here.
import numpy as np
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

# Create dataframe
df = pd.DataFrame(np.random.randint(0, 21, size=(7, 2)),
                  columns=['some_value', 'another_value'])

# Add a nan to handle realworld
df.iloc[-1] = np.nan

# Try to map values to colors in hex
# Taken from here
norm = matplotlib.colors.Normalize(vmin=0, vmax=21, clip=True)
mapper = plt.cm.ScalarMappable(norm=norm, cmap=plt.cm.viridis)
df['some_value_color'] = df['some_value'].apply(lambda x: mapper.to_rgba(x))
df
How do I convert the 'some_value' column values to hex in one go? Ideally using the viridis colormap from the snippet above, though I am not opposed to using something other than it.
Thanks in advance.
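One way to get hex strings in a single pass is to pipe the ScalarMappable's RGBA output through matplotlib.colors.to_hex (available since matplotlib 2.0). A sketch on a toy column; NaN comes out as the colormap's "bad" colour rather than raising:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({"some_value": [0, 10, 20, np.nan]})

norm = mcolors.Normalize(vmin=0, vmax=21, clip=True)
mapper = plt.cm.ScalarMappable(norm=norm, cmap=plt.cm.viridis)

# to_rgba gives an RGBA tuple; to_hex collapses it to '#rrggbb'.
df["some_value_color"] = df["some_value"].apply(
    lambda v: mcolors.to_hex(mapper.to_rgba(v)))
```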
Seaborn saving blank graphs
I wrote a piece of code that, as I intended it, should plot two graphs on top of each other, save the result, and then continue. The code does produce the right plots on screen and creates the png files, but they are all blank.
Here's the code:
for i in plotlist:
    name = i + '.png'
    sns.kdeplot(dataset1[i], label=i + '- dataset1')
    sns.kdeplot(dataset2[i], label=i + '- dataset2')
    plt.legend()
    plt.show()
    plt.savefig(name)
What am I missing?
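The usual explanation for this symptom: plt.show() hands the figure to the GUI and, once the window is closed, leaves the current figure empty, so the savefig that follows writes a blank canvas. Saving before showing fixes it -- a self-contained sketch with an invented plot:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import os

x = np.linspace(-3, 3, 100)
plt.plot(x, np.exp(-x**2))  # stand-in for the two kdeplots
plt.legend(["demo"])
plt.savefig("demo.png")     # save BEFORE show
plt.show()                  # show() leaves the figure empty afterwards
plt.close()
```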
Setting 2 colors and text in the legend when plotting a graph with seaborn
I'm doing a project using Seaborn in Python to plot some graphs to display results. I'm wondering how to set the two colors (e.g. red and green) used when displaying the graph, and how to change the text in the graph legend from the 1 and 0 of the example supplied to Left and Stayed.
Also, I have tried color=' ' but I want to pick my own two colors!
Here is a snippet of code:
plt.subplots(figsize=(15, 5))
sns.countplot(y="Salary", hue='Turnover', data=df).set_title('Salary vs Turnover!');
What it displays: (screenshot of the resulting countplot omitted)
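A sketch of both requests on an invented dataframe: countplot's palette argument accepts a dict keyed by hue level, and the legend labels can be swapped afterwards via ax.legend (assuming the hue levels sort as 0, 1):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy stand-in for the HR dataframe in the question.
df = pd.DataFrame({
    "Salary":   ["low", "low", "medium", "high", "medium", "low"],
    "Turnover": [1, 0, 0, 1, 1, 0],
})

fig, ax = plt.subplots(figsize=(15, 5))
# palette maps each hue level to a color of your choosing.
sns.countplot(y="Salary", hue="Turnover", data=df,
              palette={0: "green", 1: "red"}, ax=ax)

# Re-label the legend entries from 0/1 to Stayed/Left.
handles, _ = ax.get_legend_handles_labels()
ax.legend(handles, ["Stayed", "Left"], title="Turnover")
fig.savefig("turnover.png")
```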