geom bar for data in rows
I have a data.frame:
I would like to make a bar-like plot that visualizes the values in column Transaction for each name in the rows, sorted in decreasing or increasing order.
See also questions close to this topic

how to combine multiple vectors into a data frame using the names of the elements of each vector (in R)
I would like to concatenate multiple vectors into a data frame, using the names of each vector to guide the concatenation.
For instance, if I have vectors x1, x2, and x3:
```r
sample(1:50, 20) -> x1; sample(1:50, 20) -> x2; sample(1:50, 20) -> x3
```
and each vector has names such as:
```r
nam <- paste("A", 1:50, sep = "")
names(x1) <- as.character(sample(nam, 20))
names(x2) <- as.character(sample(nam, 20))
names(x3) <- as.character(sample(nam, 20))
```
I would like to generate a data frame in which the first column contains every name used in at least one vector, and the remaining columns contain the values associated with each vector, with NA when a vector has no value for that name. Something like this:
```
A1  3 NA NA
A2 NA  4  5
A3 NA  3 NA
A4 NA 22 NA
....
```
That would mean that the name A1 is associated with a value (3) only in x1, not in x2 or x3. A2 is associated with values only in x2 and x3, not in x1, and so on.
Any idea of how to do this?
Thank you very much,

Python DataFrame - Select dataframe rows based on values in another dataframe
I'm struggling with a DataFrame-related problem. There are two DataFrames, df and dff, as below:
```python
import numpy as np
import pandas as pd

data = np.array([['', 'col1', 'col2'], ['row1', 1, 2], ['row2', 3, 4]])
df = pd.DataFrame(data=data[1:, 1:].astype(int), index=data[1:, 0], columns=data[0, 1:])

filters = np.array([['', 'col1', 'col2'], ['row1', 1, 1], ['row2', 1, 2], ['row3', 3, 2]])
dff = pd.DataFrame(data=filters[1:, 1:].astype(int), index=filters[1:, 0], columns=filters[0, 1:])
```
I wish to select rows from df whose col2 value belongs to the list of col2 values found in the dff rows with the same col1 value. For example, for col1 equal to 1 that list should be [1, 2], and for col1 equal to 3 the list is [2].
My best attempt to solve this is
```python
df1 = df[df['col2'].isin(dff[dff['col1'] == df['col1']]['col2'])]
```
But that results in
```
ValueError: Can only compare identically-labeled Series objects
```
Any help would be appreciated. Thanks so much.
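The ValueError comes from `dff['col1'] == df['col1']`: the two Series have different indexes (row1 and row2 versus row1 to row3), and pandas refuses to compare Series that are not identically labeled. One way around it is to treat every (col1, col2) pair in dff as an allowed key and keep the rows of df whose own pair is in that set. A minimal sketch, assuming the frames from the question; `filter_by_allowed_pairs` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

# Frames from the question
data = np.array([['', 'col1', 'col2'], ['row1', 1, 2], ['row2', 3, 4]])
df = pd.DataFrame(data=data[1:, 1:].astype(int), index=data[1:, 0], columns=data[0, 1:])
filters = np.array([['', 'col1', 'col2'], ['row1', 1, 1], ['row2', 1, 2], ['row3', 3, 2]])
dff = pd.DataFrame(data=filters[1:, 1:].astype(int), index=filters[1:, 0], columns=filters[0, 1:])


def filter_by_allowed_pairs(df, dff, key="col1", value="col2"):
    """Keep the rows of df whose (key, value) pair occurs somewhere in dff."""
    # Set of allowed (col1, col2) combinations taken from dff
    allowed = set(zip(dff[key], dff[value]))
    # Mask built positionally, so the indexes of df and dff never need to align
    mask = np.array([(k, v) in allowed for k, v in zip(df[key], df[value])], dtype=bool)
    return df[mask]


result = filter_by_allowed_pairs(df, dff)
```

Here `result` keeps only row1: its pair (1, 2) appears in dff, while row2's pair (3, 4) does not. An inner `df.merge(dff.drop_duplicates(), on=['col1', 'col2'])` selects the same rows but replaces df's index with a new one, which is why the sketch uses a boolean mask instead.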

Find equivalents in two DataFrames and insert a new value into a column
I'm trying to find matches between the two DataFrames decoys and files. When there is a match, I want to insert '1' into a new column of the files DataFrame for that sample; samples that don't have a match should get '0'.
I'm doing this to clean my data and prepare it for a machine learning algorithm.
This is what I have so far, and it is not very efficient.
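```python
files['decoy'] = 0  # default: no match
for _, decoy in decoys.iterrows():
    for idx, file in files.iterrows():
        # decoy column 0 must equal file column 4, and file column 3 must equal decoy column 1
        if decoy.iloc[0] == file.iloc[4] and file.iloc[3] == decoy.iloc[1]:
            files.loc[idx, 'decoy'] = 1  # match
```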
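The nested iterrows loops compare every decoy with every file, so the work grows with len(decoys) times len(files). A vectorized alternative is a left merge on the paired columns with `indicator=True`, which marks each files row as matched ('both') or unmatched ('left_only'). A minimal sketch, assuming the paired columns share a dtype; `flag_decoys` and the column names key_a, key_b, d0, d1 used to exercise it are hypothetical stand-ins for files' columns 4 and 3 and decoys' columns 0 and 1:

```python
import pandas as pd


def flag_decoys(files, decoys, file_keys, decoy_keys):
    """Return a copy of `files` with a 0/1 'decoy' column.

    file_keys[i] in `files` is compared with decoy_keys[i] in `decoys`.
    """
    # Unique key combinations from decoys, renamed to match the files columns
    keys = decoys[decoy_keys].drop_duplicates()
    keys.columns = file_keys
    # A left merge keeps every files row in order; deduplicated keys prevent row duplication
    merged = files.merge(keys, on=file_keys, how="left", indicator=True)
    out = files.copy()
    # '_merge' is 'both' for matched rows and 'left_only' otherwise
    out["decoy"] = (merged["_merge"] == "both").astype(int).to_numpy()
    return out
```

With the positional layout from the question, the call would look like `flag_decoys(files, decoys, [files.columns[4], files.columns[3]], list(decoys.columns[:2]))`. The copy keeps the original index of files, because the merge result has the same rows in the same order.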

Is there a way to plot regression output with a continuous variable as the key in R?
Here I have longitudinal data on which I need to run a regression and visualize its output. I reshaped my data for plotting and tried to plot my regression output with ggplot2. For plotting regression output, I found several posts on SO and got ideas from there (a post about plotting regression output). But R raised an error that stopped it from making the plot. Here is the reproducible example and my code.
Reproducible data:
```r
df = data.frame(
  index = rep(c('dex111', 'dex112', 'dex113', 'dex114', 'dex115'), each = 30),
  year = 1980:2009,
  region = rep(c('Berlin', 'Stuttgart', 'Böblingen', 'Wartburgkreis', 'Eisenach'), each = 30),
  ln_gdp_percapita = rep(sample.int(40, 30), 5),
  ln_gva_agr_perworker = rep(sample.int(45, 30), 5),
  temperature = rep(sample.int(50, 30), 5),
  precipitation = rep(sample.int(60, 30), 5),
  bin1 = rep(sample.int(32, 30), 5),
  bin2 = rep(sample.int(34, 30), 5),
  bin3 = rep(sample.int(36, 30), 5),
  bin4 = rep(sample.int(38, 30), 5),
  bin5 = rep(sample.int(40, 30), 5),
  bin6 = rep(sample.int(42, 30), 5),
  bin7 = rep(sample.int(44, 30), 5),
  bin8 = rep(sample.int(46, 30), 5),
  bin9 = rep(sample.int(48, 30), 5),
  bin10 = rep(sample.int(50, 30), 5),
  bin11 = rep(sample.int(52, 30), 5)
)
```
Note that I used random integers to fill my reproducible data.frame.
Here is what I did (updated):
```r
library(plm)
library(dplyr)
library(broom)
library(tidyverse)
library(lmtest)

pdf <- pdata.frame(df, index = c("region", "year"))
model <- plm(ln_gdp_percapita ~ bin1+bin2+bin3+bin4+bin5+bin6+bin7+bin8+bin9+bin10+bin11,
             data = pdf, model = "pooling", effect = "twoways")
res = summary(coeftest(model), cluster = c("c"))

ft <- pdf %>%
  group_by(ln_gdp_percapita) %>%
  do(tidy(plm(ln_gdp_percapita ~ bin1+bin2+bin3+bin4+bin5+bin6+bin7+bin8+bin9+bin10+bin11,
              data = pdf, model = "pooling", effect = "twoways"))) %>%
  select(ln_gdp_percapita, estimate)

ggplot(pdf, aes(x = bin6, y = ln_gdp_percapita)) +
  geom_point() +
  geom_abline(data = ft, aes(slope = bin6, intercept = `(Intercept)`))
```
But I ran into a problem at the last step; R complained with this strange error:
Error in eval_tidy(enquo(var), var_env) : object '' not found
I don't know why. Perhaps I need to find another way to reshape the regression output for plotting, but how? Given the regression result, is there a better way to render a scatter plot with ggplot2? Any ideas?
Update: my second attempt:
```r
library(mgcv)

mdel <- dat$ln_gdp_percapita ~ dat$prec + dat$bin1 + dat$bin3 + dat$bin4 +
  s(dat$bin6, bs = 'cr') + s(dat$bin7, bs = 'cr')
ft <- bam(mdel, data = dat, discrete = TRUE, nthreads = 2)
predict.bam(ft, dat, se.fit = TRUE)
summary(ft)
plot.gam(ft, page = 1)
```
Update:
I also tried the above solution on my actual data, and it ran slowly. Is there a better way to plot regression estimation output? Any thoughts?
Desired plot:
Here is the desired plot, inspired by the corresponding literature (the plot is shown on page 7):
Is there a better way to get a plot like that? Thanks

How to plot a bar graph in R with each component corresponding to another component?
I have a data frame in the following format:
```
   Triplet Amino_Acid Fraction Freq_1k Number
1      AAA          K     0.62    27.1     39
2      AAC          N     0.32    13.9     20
3      AAG          K     0.38    16.7     24
4      AAT          N     0.68    29.2     42
5      ACA          T     0.21    10.4     15
6      ACC          T     0.31    15.3     22
7      ACG          T     0.28    13.9     20
8      ACT          T     0.20     9.7     14
9      AGA          R     0.17    16.7     24
10     AGC          S     0.18    13.9     20
```
I want to plot a graph with Triplet on the x-axis and Fraction on the y-axis. When I plot it, only 6 triplets appear on the x-axis, but I want one bar per triplet: the example above has 10 triplets with 10 fractions, so the graph should show all 10, not just 6.

ggplot2 - how to reorder stacked bar plot data points with labels by values and not alphabetically
I would like to reorder the data points of a stacked bar plot so that, within each bar, they are sorted from the largest to the smallest COMPETITOR by its total VALUE, not alphabetically.
I prepared the data so that I can use fct_reorder (the commented-out line), and the data points do get sorted, but the labels do not follow the new order. How can I make the labels follow suit and sit in the middle of their bar segments?
Here is my working reproducible example with the fct_reorder line commented out. If you uncomment it, the data points get sorted but the labels stay in the wrong positions.
```r
library(tidyverse)
library(scales)

data <- tibble::tribble(
  ~CUSTOMER, ~COMPETITOR, ~VALUE,
  "AAA", "XXX", 23400,
  "AAA", "YYY", 10000,
  "AAA", "ZZZ", 80000,
  "AAA", "YYY", 60000,
  "BBB", "XXX", 10000,
  "BBB", "YYY", 20000,
  "BBB", "ZZZ", 10000,
  "BBB", "YYY", 80000,
  "CCC", "YYY", 30000,
  "CCC", "ZZZ", 20000,
  "DDD", "YYY", 7000,
  "CCC", "VVV", 10000
)

unit_mln <- scales::unit_format(
  unit = "mln", sep = " ", scale = 1e-6, digits = 2, justify = "right"
)

col_competitors <- scale_fill_manual(
  "legend",
  values = c("XXX" = "navyblue", "YYY" = "red", "ZZZ" = "lightyellow", "VVV" = "green")
)

df_cust <- data %>%
  mutate(COMPETITOR = as.factor(COMPETITOR)) %>%
  group_by(CUSTOMER) %>%
  mutate(CUST_VALUE = sum(VALUE)) %>%
  ungroup() %>%
  group_by(COMPETITOR) %>%
  mutate(COMP_VALUE = sum(VALUE)) %>%
  ungroup() %>%
  group_by(CUSTOMER, COMPETITOR) %>%
  summarise(CUST_VALUE = max(CUST_VALUE), COMP_VALUE = max(COMP_VALUE), VALUE = sum(VALUE)) %>%
  arrange(desc(CUST_VALUE))

# df_cust <- df_cust %>% mutate(COMPETITOR = fct_reorder(COMPETITOR, COMP_VALUE))

df_comp <- data %>%
  group_by(COMPETITOR) %>%
  summarise(VALUE = sum(VALUE))

df_cust$CUSTOMER = str_wrap(df_cust$CUSTOMER, width = 30)

plt_main <- df_cust %>%
  ggplot(aes(x = fct_reorder(CUSTOMER, CUST_VALUE), y = VALUE)) +
  geom_col(aes(fill = COMPETITOR), alpha = 0.5,
           position = position_stack(reverse = T),
           col = "darkgray", show.legend = F) +
  geom_text(aes(label = unit_mln(round(VALUE, 4))),
            size = 3, position = position_stack(vjust = 0.5)) +
  xlab(" ") +
  ylab("Market share (GROSS PLN)") +
  ggtitle(paste("Top competitors in top customers: ", "Poland")) +
  theme_bw(base_size = 11) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5),
        legend.position = c(0.94, 0.75)) +
  col_competitors +
  scale_y_continuous(
    labels = function(n) { unit_mln(n) },
    # percentage axis relative to the grand total of VALUE in `data`
    sec.axis = sec_axis(~ . / sum(data$VALUE), labels = scales::percent)
  )
```

Visualize the GRU layer in Keras

Visualizing binary timeseries data in python
I've got a collection of a few different binary time series that I'd like to visualize on top of one another. The series are composed of cycle data, so each data point looks like
(start_ts, end_ts, state)
where start_ts
and end_ts
are both floats and state
is a boolean. Each time series is composed of a list of tuples like the one above, yielding something like
[(t0, t1, s1), (t1, t2, s2), ..., (tn-1, tn, sn)]
For example, you might have something like
[(0, 5, True), (5, 23, False), (23, 38, True)]
signifying that, for that particular time series, the value was True from 0 to 5 seconds, False from 5 to 23 seconds, and True again from 23 to 38 seconds.
In the end, I'd like output that looks like
```
series_1 XXXXXOOOOOXXXXXOOXOOOXO
series_2 XXXXXXXXOOOOOOXXXXXXOOO
series_3 XXXXOOOOXXXXXXXXXXOOOXX
```
but as a colored chart, not as a series of X's and O's
Do you have any recommendations on the best way to visualize this? Thank you!
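One common way to draw this kind of chart is matplotlib's `Axes.broken_barh`, which draws a row of horizontal bars from a list of (start, width) pairs. A minimal sketch, assuming each series is a list of (start_ts, end_ts, state) tuples as described; `segments` and `plot_binary_series` are hypothetical helper names, and the colors are arbitrary:

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float, bool]


def segments(series: Sequence[Interval], state: bool) -> List[Tuple[float, float]]:
    """Convert (start, end, state) tuples into (start, width) pairs for one state."""
    return [(start, end - start) for start, end, s in series if s == state]


def plot_binary_series(named_series: Dict[str, Sequence[Interval]],
                       true_color: str = "tab:green",
                       false_color: str = "tab:gray"):
    """Draw each series as one row of colored bars, one color per state."""
    import matplotlib.pyplot as plt  # imported here so segments() needs only the stdlib

    fig, ax = plt.subplots(figsize=(8, 1 + 0.5 * len(named_series)))
    names = list(named_series)
    for row, name in enumerate(names):
        for state, color in ((True, true_color), (False, false_color)):
            spans = segments(named_series[name], state)
            if spans:  # skip states that never occur in this series
                ax.broken_barh(spans, (row - 0.4, 0.8), facecolors=color)
    ax.set_yticks(range(len(names)))
    ax.set_yticklabels(names)
    ax.invert_yaxis()  # first series on top, like the text mock-up
    ax.set_xlabel("time (s)")
    return fig, ax
```

For example, `plot_binary_series({'series_1': [(0, 5, True), (5, 23, False), (23, 38, True)]})` followed by matplotlib's `plt.show()` draws the example series from the question as a green, gray, green row. Because each bar's width comes straight from end_ts minus start_ts, irregular cycle lengths are drawn to scale.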

Python Seaborn Boxplot: Overlay 95th percentile values on whisker
I want to overlay the 95th percentile values on a seaborn boxplot. I could not figure out how to overlay text, or whether seaborn has a built-in capability for that. How would I modify the following code to overlay the 95th percentile values on the plot?
```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(200, 4), columns=list('ABCD')) * 100
alphabet = list('AB')
# plain strings, so the hue labels show as A/B rather than bytes
df['Gr'] = np.random.choice(alphabet, df.shape[0])
df_long = pd.melt(df, id_vars=['Gr'], value_vars=['A', 'B', 'C', 'D'])
# whis=[5, 95]: whiskers are based on the 5th and 95th percentiles
sns.boxplot(x="variable", y="value", hue='Gr', data=df_long, whis=[5, 95])
plt.show()
```
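With `whis=[5, 95]`, matplotlib ends the upper whisker at the largest observation not above the 95th percentile, so the interpolated percentile sits at or just above the whisker tip. One approach is to compute those percentiles per box with pandas and write them on the Axes with `ax.text`. A minimal sketch; `percentile_table` and `annotate_percentiles` are hypothetical helpers, and the x positions assume seaborn's default dodging (each category slot is 0.8 wide and split evenly between hue levels), so the same `order` and `hue_order` must also be passed to `sns.boxplot`:

```python
import pandas as pd


def percentile_table(df_long: pd.DataFrame, q: float = 0.95) -> pd.Series:
    """q-quantile of `value` for every (variable, Gr) box, indexed by that pair."""
    return df_long.groupby(["variable", "Gr"])["value"].quantile(q)


def annotate_percentiles(ax, df_long, order, hue_order, q=0.95, width=0.8):
    """Write each box's q-quantile as text at that height, centered on the box.

    Assumes seaborn's default dodge layout: each category slot of total
    `width` is split evenly among the hue levels in `hue_order`.
    """
    pct = percentile_table(df_long, q)
    box_width = width / len(hue_order)
    for i, variable in enumerate(order):
        for j, group in enumerate(hue_order):
            if (variable, group) not in pct.index:
                continue  # no data for this box
            y = pct.loc[(variable, group)]
            x = i - width / 2 + box_width * (j + 0.5)  # center of box j in slot i
            ax.text(x, y, f"{y:.1f}", ha="center", va="bottom", fontsize=8)
```

For the question's data, set `order = list('ABCD')` and `hue = sorted(df_long['Gr'].unique())`, call `ax = sns.boxplot(x="variable", y="value", hue="Gr", data=df_long, whis=[5, 95], order=order, hue_order=hue)`, then `annotate_percentiles(ax, df_long, order, hue)`. pandas' `quantile` and `np.percentile` both default to linear interpolation, so the labels use the same percentile definition matplotlib uses when placing the whiskers.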