Plotting only specific points using matplotlib's imshow
import numpy as np
import matplotlib.pyplot as plt
N = 101
x = np.linspace(-1,1,N); ones = np.ones_like(x)
coords = np.outer(ones,x) #x coords
coords = np.concatenate([[coords], [coords.T]])
ourShape = np.zeros([N,N])
ourShape[np.square(coords[0,:,:]) + np.square(coords[1,:,:]) <= 1.] = 1.
fig, ax = plt.subplots();
ax.imshow(ourShape)
plt.show()
This plots a circle inscribed in a square. But how do I get Python to plot only the blue region, i.e. the part of the square that lies outside the circle? To be clear, I do not want to just turn the circle white; I want it not to be plotted at all. I tried
ax.imshow(ourShape[ourShape < 1.])
and that produces a TypeError.
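The TypeError comes from the indexing itself: `ourShape[ourShape < 1.]` returns a flat 1-D array of the selected values, and `imshow` cannot draw a 1-D array as an image. One possible approach, sketched here under the assumption that a masked array is acceptable, is to keep the 2-D shape and mask the circle; `imshow` leaves masked cells undrawn with its default colormap settings. The helper names `build_masked_shape` and `plot_masked_shape` are hypothetical.

```python
import numpy as np


def build_masked_shape(n=101):
    """Return the N x N array from the question with the circle masked out."""
    x = np.linspace(-1, 1, n)
    xx, yy = np.meshgrid(x, x)
    inside = xx**2 + yy**2 <= 1.0      # True for points inside the unit circle
    shape = np.zeros((n, n))
    shape[inside] = 1.0
    # Masked cells are skipped by imshow instead of being coloured.
    return np.ma.masked_where(inside, shape)


def plot_masked_shape(masked):
    """Draw the masked array; only the unmasked region is rendered."""
    # Imported here so the masking logic can be used without a plotting backend.
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.imshow(masked)
    return fig


masked = build_masked_shape()
```

Calling `plot_masked_shape(masked)` followed by `matplotlib.pyplot.show()` should display only the corners of the square.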
See also questions close to this topic

Syntax error is not actually an error
I am doing this problem:
Finding Numbers in a Haystack
In this assignment you will read through and parse a file with text and numbers. You will extract all the numbers in the file and compute the sum of the numbers.
Data Files
We provide two files for this assignment. One is a sample file where we give you the sum for your testing and the other is the actual data you need to process for the assignment. Sample data: http://py4edata.drchuck.net/regex_sum_42.txt (There are 90 values with a sum=445833)
 Actual data: http://py4edata.drchuck.net/regex_sum_97463.txt (There are 82 values and the sum ends with 873)
These links open in a new window. Make sure to save the file into the same folder as you will be writing your Python program.
Note: Each student will have a distinct data file for the assignment, so only use your own data file for analysis.
I wrote this code:
import re
fname = input('Enter a file name: ')
try:
    fhandle = open(fname)
except:
    print ('File cannot be opened:', fname)
    exit()
numlist = list()
for line in fhandle:
    line = line.rstrip()
    num = re.findall('^New .*: ([0-9]+)', line)
    if len(num) > 0:
        for number in num:
            number = float(number)
            numlist.append(number)
print (sum(numlist))
It should run, but I get an error at line six, at the print statement. I have linked a picture of the error that appears on screen. Any advice or solution is greatly appreciated.
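For reference, the extraction step the assignment describes (find every run of digits and add them up) can be sketched as follows. This is a minimal illustration, not the assignment's official solution; `sum_numbers` and the sample lines are made up for the example.

```python
import re


def sum_numbers(lines):
    """Sum every run of digits found in an iterable of text lines."""
    total = 0
    for line in lines:
        # findall returns each maximal run of digits as a string.
        for token in re.findall(r"[0-9]+", line):
            total += int(token)
    return total


# Hypothetical sample text standing in for the downloaded data file.
sample = [
    "Why should you learn 3 to write programs? 7746",
    "12 1929 8827",
    "no digits here",
]
print(sum_numbers(sample))
```

With a real file, something like `with open(fname) as fhandle: print(sum_numbers(fhandle))` would apply the same function line by line.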

Yield usage in recursive manner
I was playing around with yield and I made up a function that takes 3 numbers and a limit and computes them in a particular way (the computation is not important, but can be seen in the code below).
values = []
def compute(limit, a, b, c):
    if a <= limit:
        print("Got {}, {}, {}".format(a, b, c))
        values.append([a, b, c])
        compute(limit, b*c, a*c, a*b)
    else:
        print("Process ended")
compute(9999, 2, 3, 4)
print(values)
This works, and produces the output:
Got 2, 3, 4
Got 12, 8, 6
Got 48, 72, 96
Got 6912, 4608, 3456
Process ended
[[2, 3, 4], [12, 8, 6], [48, 72, 96], [6912, 4608, 3456]]
However, I am sure that this can be done with yield, mainly the values list creation part, which, as you can see, I fill manually inside the function. The reason I bother at all is the reason yield is used in the first place: here it doesn't matter, but what if my task required the 100000th value from this list? Constructing the whole list is not required, or is even plainly stupid, considering I don't want the previous 99999 values. I tried using something like:
def compute_y(a, b, c):
    while True:
        yield b*c, a*c, b*a
for a, b, c in compute_y(2, 3, 4):
    if a > 9999:
        break
The problem with this is that the numbers in the for loop are the same all the time (2, 3 and 4), so it's never going to reach other values, and because of that it's an infinite loop.
tl;dr: can I use yield here at all to make the algorithm more efficient for bigger lists?
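The attempted generator never updates `a`, `b` and `c`, so it yields the same triple forever. A possible fix, sketched below, is to rebind the three names after each `yield`; a recursive variant using `yield from` is shown alongside it. The name `compute_recursive` and the use of `itertools.islice` to pick out a single element are illustrative choices, not part of the original code.

```python
from itertools import islice


def compute(limit, a, b, c):
    """Yield successive (a, b, c) triples while a stays within the limit."""
    while a <= limit:
        yield a, b, c
        # Update the state so the next iteration yields the next triple.
        a, b, c = b * c, a * c, a * b


def compute_recursive(limit, a, b, c):
    """Same sequence, expressed recursively with 'yield from'."""
    if a <= limit:
        yield a, b, c
        yield from compute_recursive(limit, b * c, a * c, a * b)


# Materialise everything only when the whole list is really needed.
values = list(compute(9999, 2, 3, 4))

# Fetch just the third triple without storing the earlier ones.
third = next(islice(compute(9999, 2, 3, 4), 2, None))
```

The iterative form avoids Python's recursion limit, which matters if a very late element of a long sequence is wanted.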

Getting the sum of groupby as a new column with distinct values in Pandas
This is what my data looks like:
id      date      rt       dnm
101122  20170124  0.0      70
101122  20170108  0.0      49
101122  20170413  0.02976  67
101122  20170803  1.02565  39
101122  20161201  0.0      46
101122  20170125  0.0      69
101122  20170102  0.0      76
101122  20170718  0.02631  38
101122  20160602  0.0      120
221344  20161021  0.00182  176
221344  20160921  0.47732  194
221344  20160623  0.0      169
221344  20171010  0.91391  151
221344  20170429  0.0      33
221344  20170205  0.0      31
221344  20171016  0.0      196
221344  20160925  0.0      33
221344  20160717  0.0      21
221344  20160721  0.0      46
615695  20170712  0.0      21
615695  20170705  0.0      18
615695  20160711  0.0      38
615695  20160719  0.03655  29
615695  20170527  0.0      23
615695  20171222  0.0      20
615695  20170425  0.0      34
615695  20170323  0.0      20
615695  20160923  0.0      25
615695  20160618  0.0      25
I'm trying to get the sum of the 'dnm' column for each 'id' and give this new column a name like 'sum_values'. After that I need to get the ids whose 'sum_values' is higher than 300. The following code generates the first part:
data = pd.read_csv(file_name, sep='\t', header=0, parse_dates=[1], infer_datetime_format=True)
test = (data.assign(sum_values = data.groupby('id')['dnm'].transform(np.sum))
            .query('sum_values > 300'))
This will add a new column named 'sum_values' and repeat the sum value for each id several times. I need to get a unique value of 'id' and 'sum_values' column. But I can't figure out how/where to add the nunique().
This is the desired outcome:
id      sum_values(>300)
101122  574
221344  1050
Any ideas?
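One way to get a single row per id, sketched here as a suggestion rather than the only option, is to aggregate with `sum()` instead of `transform`, which collapses each group to one value. The DataFrame below is rebuilt by hand from the 'id' and 'dnm' columns of the sample data above; the other columns are left out because the aggregation does not use them.

```python
import pandas as pd

# 'id' and 'dnm' columns copied from the sample data in the question.
data = pd.DataFrame({
    "id": [101122] * 9 + [221344] * 10 + [615695] * 10,
    "dnm": [70, 49, 67, 39, 46, 69, 76, 38, 120,
            176, 194, 169, 151, 33, 31, 196, 33, 21, 46,
            21, 18, 38, 29, 23, 20, 34, 20, 25, 25],
})

# sum() returns one value per id; reset_index turns the result back into columns.
sums = data.groupby("id")["dnm"].sum().reset_index(name="sum_values")

# Keep only the ids whose total exceeds 300.
result = sums[sums["sum_values"] > 300]
print(result.to_string(index=False))
```

If the repeated-value column produced by `transform` is still needed elsewhere, calling `drop_duplicates` on the 'id' and 'sum_values' columns of that frame would be an alternative.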

InterpolatedUnivariateSpline and ax.fill_between yield unexpected result (filling wrong area) with low Y-values
I have a function that is supposed to take some raw data, plot it onto a canvas and then fill the area between the baseline and a predefined peak. This works well for high Y-values but gives the inverse result when using low Y-values. My question is then twofold:
 Why does this occur?
 What is a robust way to fix this issue? (I tried multiplying all Y-values by 1E6, performing the InterpolatedUnivariateSpline fit and then dividing the returned fit by 1E6 again, but there must be a better way to fix this.)
Snippet:
X = [16.08278,16.090878,16.098978,16.107077,16.115177,16.123279,16.13138,16.139482,16.147586,16.155689,16.163793,16.171899,16.180004,16.18811,16.196218,16.204325,16.212433,16.220543,16.228652,16.236762,16.244874,16.252985,16.261097,16.269211,16.277324,16.285439,16.293554,16.30167,16.309786,16.317904,16.326021,16.334139,16.342259,16.350379,16.358499,16.366621,16.374742]
Y = [1.496555,1.766111,2.074339,2.426317,2.825952,3.274024,3.764088,4.288722,4.839724,5.406741,5.978055,6.536869,7.064041,7.540824,7.948076,8.267242,8.48543,8.596198,8.598762,8.492928,8.279867,7.962899,7.55062,7.059239,6.508092,5.91964,5.318298,4.7234,4.148229,3.602356,3.094568,2.635609,2.231337,1.882143,1.58295,1.328678,1.113859]
Y2 = [1496555,1766111,2074339,2426317,2825952,3274024,3764088,4288722,4839724,5406741,5978055,6536869,7064041,7540824,7948076,8267242,8485430,8596198,8598762,8492928,8279867,7962899,7550620,7059239,6508092,5919640,5318298,4723400,4148229,3602356,3094568,2635609,2231337,1882143,1582950,1328678,1113859]
# Toggle low vs high Y-values
#Y = Y2
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline
fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111)
plt.plot(X, Y, 'b')
plt.legend(['Raw Data'], loc='best')
plt.xlabel("Retention Time [m]")
plt.ylabel("Intensity [au]")
newTime = np.linspace(X[0], X[-1], len(X))
f = InterpolatedUnivariateSpline(X, Y)
newIntensity = f(newTime)
ax.fill_between(X, newTime, newIntensity, alpha=0.5)
plt.show(fig)
This yields the following figures:
This is what I would expect (and occurs with high Y-values).
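A likely explanation, judging only from the snippet: `fill_between(x, y1, y2)` shades the region between the two y-curves, and the snippet passes `newTime` (x-values near 16) as `y1`. With Y-values in the millions the curve lies far above 16, so the shading looks like a fill down to the baseline; with Y-values between 1 and 9 the curve lies below 16, so the shading appears inverted. The sketch below passes an explicit baseline instead. The data is a synthetic peak, and the baseline of zero and the helper `plot_filled_peak` are assumptions for illustration.

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

# Synthetic peak standing in for the raw data (low Y-values on purpose).
x = np.linspace(16.08, 16.37, 37)
y = 1.0 + 7.5 * np.exp(-((x - 16.225) / 0.07) ** 2)

spline = InterpolatedUnivariateSpline(x, y)
new_x = np.linspace(x[0], x[-1], 200)
new_y = spline(new_x)

# The lower edge of the fill is a y-curve, not the x-grid; zero is assumed here.
baseline = np.zeros_like(new_x)


def plot_filled_peak(new_x, baseline, new_y):
    """Plot the fitted curve and shade between the baseline and the curve."""
    # Imported here so the fitting code can run without a plotting backend.
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.plot(new_x, new_y, "b")
    ax.fill_between(new_x, baseline, new_y, alpha=0.5)
    return fig
```

Because the fill no longer depends on the magnitude of the x-values, the same call should behave identically for low and high Y-values, without any rescaling by 1E6.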

How to remove 0% from pie chart?
I am working with text data to perform sentiment analysis. I have a data frame with the sentiment score of each sentence. Using this data I am creating a pie chart, but it shows 0% in the graph. I am not able to understand the meaning of this 0%. Here is my data frame df1:
          score
Negative  100.0
Neutral     0.0
Positive    0.0
and here is my code for creating a pie chart:
import matplotlib.pyplot as plt
import os
plt.figure(figsize=(4,3))
df1.plot(kind='pie', autopct='%1.1f%%', subplots=True,startangle=90, legend = False, fontsize=14)
plt.axis('off')
plt.show()
and here is my output plot:
How can I remove this 0% from my plot?
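The 0% texts are most likely the `autopct` labels of the two zero-sized wedges (Neutral and Positive), which are still labelled even though they have no area. Two possible ways around this are sketched below: pass a callable as `autopct` that returns an empty string for empty wedges, or drop the zero rows before plotting so that their category labels disappear as well. The names `hide_zero`, `nonzero` and `plot_pie` are hypothetical, and plain matplotlib is used instead of `DataFrame.plot` to keep the example self-contained.

```python
def hide_zero(pct):
    """Format a wedge percentage, returning an empty label for empty wedges."""
    return f"{pct:.1f}%" if pct > 0 else ""


# Scores copied from df1 in the question.
scores = {"Negative": 100.0, "Neutral": 0.0, "Positive": 0.0}

# Dropping zero entries also removes their category labels from the chart.
nonzero = {label: value for label, value in scores.items() if value > 0}


def plot_pie(values_by_label):
    """Draw a pie chart whose percentage labels skip empty wedges."""
    # Imported here so the helpers above can be used without a plotting backend.
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.pie(list(values_by_label.values()),
           labels=list(values_by_label.keys()),
           autopct=hide_zero, startangle=90)
    return fig
```

The same callable can be passed as `autopct=hide_zero` to `df1.plot(kind='pie', ...)`, since pandas forwards that argument to matplotlib's `pie`.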

python - plotly show command doesn't display figures
I know this question has been asked here several times in the past, but I have tried all the suggested solutions (regarding the backend, etc.) and none of them has solved my problem.
I'm using Spyder and Jupyter, and in both cases the figure is not displayed.
Can anyone help me solve this issue and understand what the problem is?
Thanks

How to precisely position the text labels for ggplot() using geom_text() or geom_label(). geom_text_repel() does not work; it throws an error if used
I've tried many combinations of geom_text() but couldn't get the exact output I need. I want the labels to remain at the same y-axis level unless they intersect. If they do intersect, I need them to be offset in the y-direction so that they no longer intersect. I want the layout to be efficient, so that not much space is left between them.
geom_text_repel() is not a fix because it throws an error. Can someone explain why there is an error when I use geom_text_repel()?
This is my Data and code:
library(ggplot2)
library(ggrepel)
Data=data.frame(Group=c("Group 2","Group 1","Group 3","Group 2","Group 5","Group 4","Group 6",
                        "Group 7","Group 4","Group 3","Group 1","Group 5","Group 6","Group 7",
                        "Group 2","Group 4","Group 6","Group 7","Group 3","Group 1"),
                Fruit=c("apple","apple","apple","mango","apple","apple","apple","apple","mango","mango",
                        "mango","mango","mango","mango","orange","orange","orange","orange","orange",
                        "orange"),
                Percentage=c(68.46846847,77.35849057,72.72727273,26.12612613,76.31578947,62.79069767,
                             71.05263158,69.23076923,30.23255814,25,20.75471698,23.68421053,23.68421053,
                             23.07692308,5.405405405,6.976744186,5.263157895,7.692307692,2.272727273,
                             1.886792453))
ggplot(Data,aes(x=reorder(Fruit,Percentage),y=Percentage,color=Group,label=paste0(round(Percentage),"%")))+
  geom_point(size=4)+
  coord_flip()+
  scale_y_continuous(limits=c(0,100))+
  theme(panel.grid.major.y=element_line(color="gray90",size = 0.7),
        panel.background=element_blank(),
        strip.background=element_blank(),
        panel.border=element_rect(color="black",fill=NA,size=1))+
  labs(x="",y="% of people",color="")+
  geom_text_repel(aes(label=paste0(round(Percentage),"%")))
The error is:
Error in .Call("_ggrepel_repel_boxes", PACKAGE = "ggrepel", data_points, : Incorrect number of arguments (11), expecting 9 for '_ggrepel_repel_boxes'

Octave plot statement for different values
How to plot the following statement in octave?
Total_delay_time = a*(1./(mu1-lambda./2)) + (1-a)*(1./(mu2-lambda./2));
This is what I've tried so far:
clc; clear all; close all;
a = 0.001:0.001:0.999 ;
lambda=10000;
mu1=128/15;
mu2=125/12;
Total_delay_time = a*(1./(mu1-lambda./2)) + (1-a)*(1./(mu2-lambda./2));
plot(Total_delay_time,a)

How to plot BER Vs SNR curve for binary data
I have two vectors of dimension (130,1), one for the transmitted data and the other for the received data. Using the following code I have obtained the BER and the number of errors in the vector. The problem is that I have no idea how to proceed with plotting the BER vs SNR curve. Though the SNR can be varied from 1 to 30 in steps of 1, the BER does not seem to change when plotted, and I'm getting a straight line instead of a waterfall curve. Please help. Thanks in advance.
clc
clear all
close all
A = 'sample25418.xlsx';
sheet1 = 1;
sheet2 = 2
Range = 'A1:AN140';
B = xlsread (A ,sheet1, Range);
C = xlsread (A ,sheet2, Range);
N = 10^4 % number of bits or symbols
rand('state',100); % initializing the rand() function
randn('state',200); % initializing the randn() function
% Transmitter
Eb_N0_dB = [3:10]; % multiple Eb/N0 values
for ii = 1:length(Eb_N0_dB)
    % counting the errors
    nErr(ii) = size(find([B - C]),1);
end
simBer = nErr/N; % simulated ber
theoryBer = 0.5*erfc(sqrt(10.^(Eb_N0_dB/10))); % theoretical ber
% plot
close all
figure
semilogy(Eb_N0_dB,theoryBer,'b.');
hold on
semilogy(Eb_N0_dB,simBer,'b*');
axis([3 10 10^-5 0.5])
grid on
legend('theory', 'simulation');
xlabel('Eb/No, dB');
ylabel('Bit Error Rate');
title('Bit error probability curve for BPSK modulation');
The plot looks as below.
Please help. Thanks in advance.
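Judging from the posted loop, the error count is computed from the same B and C on every iteration, so it cannot depend on Eb/N0 and the simulated curve comes out flat. A waterfall curve needs received data that was generated (or measured) separately at each SNR value. The sketch below illustrates this with simulated BPSK over an AWGN channel. It is written in Python rather than MATLAB, and the function names, bit count and seed are assumptions for illustration.

```python
import math

import numpy as np


def simulate_bpsk_ber(eb_n0_db, n_bits=100_000, seed=0):
    """Estimate the BPSK bit error rate over AWGN at one Eb/N0 value (in dB)."""
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, n_bits)
    symbols = 2.0 * bits - 1.0                      # map 0/1 to -1/+1, Eb = 1
    eb_n0 = 10.0 ** (eb_n0_db / 10.0)
    noise_std = math.sqrt(1.0 / (2.0 * eb_n0))      # noise variance is N0/2
    received = symbols + noise_std * rng.standard_normal(n_bits)
    decided = (received > 0).astype(int)            # hard decision at zero
    return np.count_nonzero(decided != bits) / n_bits


def theoretical_bpsk_ber(eb_n0_db):
    """Closed-form BPSK bit error rate, 0.5 * erfc(sqrt(Eb/N0))."""
    return 0.5 * math.erfc(math.sqrt(10.0 ** (eb_n0_db / 10.0)))


snr_db = list(range(0, 9, 2))
# The noise level changes with every SNR value, so the error count changes too.
sim_ber = [simulate_bpsk_ber(s) for s in snr_db]
theory_ber = [theoretical_bpsk_ber(s) for s in snr_db]
```

Plotting `sim_ber` and `theory_ber` against `snr_db` on a logarithmic y-axis should give the familiar falling curve. With a single fixed pair of transmitted and received vectors, only one BER point can be computed.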