Ignore sign of numbers with ggplot2
I have the following df:
gene = c("a", "b", "c", "d")
fc = c(1, 2, 1, 2)
df = data.frame(gene, fc)
I am using the following code for plotting:
ggplot(df, aes(gene, fc)) + geom_point(size=df$fc) + theme_minimal()
How can I ignore the sign of the values in "fc" while plotting?
Thanks
1 answer

You can use the absolute value function abs() to ignore the negative sign. For example:
ggplot(df, aes(gene, fc)) + geom_point(aes(size=abs(fc))) + theme_minimal()
Just make sure to put properties that you want to map to data inside aes() at all times. You should rarely see a $ in ggplot code.
See also questions close to this topic

How to make a function that loops over two lists
I have an event A that is triggered when the majority of coin tosses in a series of tosses comes up heads. I have an unfair coin and I'd like to see how the likelihood of A changes as the number of tosses changes and the probability in each toss changes.
This is my function assuming 3 tosses
n <- 3
#victory requires majority of tosses heads
#tosses only occur in odd intervals
k <- seq(n/2+.5, n)
victory <- function(n,k,p){
  for (i in p) {
    x <- 0
    for (i in k) {
      x <- x + choose(n, k) * p^k * (1-p)^(n-k)
    }
    z <- x
  }
  return(z)
}
p <- seq(0,1,.1)
victory(n,k,p)
My hope is that the victory() function would:
1. find the probability of each of the outcomes where the majority of tosses are heads, given a particular value p
2. sum up those probabilities and add them to a vector z
3. go back and do the same thing given another probability p
I tested this with n <- 3, k <- c(2,3) and p <- c(.5,.75) and the output was 0.75000, 0.84375. I know that the output should've been 0.625, 0.0984375.
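The three steps listed above can be sketched without explicit loops. This is only a minimal illustration, not the asker's function: the name victory_prob is hypothetical, and it assumes "majority" means strictly more than half of the n tosses. For each p it sums the binomial probabilities of every head count k that forms a majority. For n = 3 the result can be checked by hand as 3p^2(1-p) + p^3.

```r
# Hypothetical helper: probability that a strict majority of n tosses are heads.
# n: number of tosses; p: vector of per-toss head probabilities.
victory_prob <- function(n, p) {
  k <- seq(floor(n / 2) + 1, n)  # head counts that form a majority
  # One sum per value of p, so k and p are never recycled against each other
  sapply(p, function(prob) sum(choose(n, k) * prob^k * (1 - prob)^(n - k)))
}

victory_prob(3, c(0.5, 0.75))
```

The same quantity is available from base R as pbinom(floor(n / 2), n, p, lower.tail = FALSE), which gives a quick cross-check.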
Exponentiation of Log Transformed Values in Mixed Effects Model
I have run a linear mixed-effects model in R using the nlme package in which my response variable (Proximal_Lead_Bowing) was transformed to log10 scale (Log_Bowing) due to a non-normal distribution of values. The estimated differences in Log_Bowing between different Deep Brain Stimulation Electrodes (DBS_Electrode) as estimated by the model using the "glht" function for multiple comparisons of means (Tukey contrasts) are as follows: (View screenshot for full glht() output: https://imgur.com/WVJ9KM6)
Linear Hypothesis: Estimate
Medtronic 3389 - Boston Scientific Versice == 0: 0.5766*
St. Jude Medical Infinity - Boston Scientific Versice == 0: 0.2208
St. Jude Medical Infinity - Medtronic 3389 == 0: 0.3558*
*Denotes significance
Exponentiating these values (10^Abs(Estimate)) provides me with the following estimates for true differences in Proximal_Lead_Bowing as estimated by our mixed-effects model:
Linear Hypothesis: Exponentiated estimate (in millimeters)
Medtronic 3389 - Boston Scientific Versice == 0: 3.77
St. Jude Medical Infinity - Boston Scientific Versice == 0: 1.66
St. Jude Medical Infinity - Medtronic 3389 == 0: 2.27
These values do not make sense considering that the average Proximal_Lead_Bowing ± 95% CI (in millimeters) for each DBS_Electrode in the sample is as follows:
Boston Scientific Versice: 2.10 ± 0.67
Medtronic 3389: 2.95 ± 0.58
St. Jude Medical Infinity: 2.00 ± 0.35
Thus I would expect true differences in Proximal_Lead_Bowing as estimated by our linear mixed model to be approximately 1.0 mm between Medtronic 3389 and the other DBS_Electrode models, but instead the exponentiated values I have calculated don't seem to make sense. Am I missing something in the process of exponentiation of log10 values and/or use of the "glht" function for multiple comparisons of means? Any feedback would be appreciated.
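One point about the arithmetic may help here. Because log10(a) - log10(b) = log10(a / b), raising 10 to a difference of log10 values gives a multiplicative ratio, not a difference in millimeters. The sketch below uses only the numbers quoted above. The variable names are illustrative, and it is not a diagnosis of this particular model.

```r
# Contrast estimates on the log10 scale, as quoted above (absolute values)
log10_diff <- c(
  medtronic_vs_boston = 0.5766,
  stjude_vs_boston    = 0.2208,
  stjude_vs_medtronic = 0.3558
)

# Back-transforming a difference of logs yields a ratio (unitless)
ratio <- 10^log10_diff
round(ratio, 2)

# Raw sample means (mm) quoted above, for comparison
means <- c(boston = 2.10, medtronic = 2.95, stjude = 2.00)
mean_ratio <- unname(means["medtronic"] / means["boston"])  # unitless ratio
mean_diff  <- unname(means["medtronic"] - means["boston"])  # difference in mm
c(ratio = mean_ratio, difference_mm = mean_diff)
```

Read this way, 3.77 would be compared against a ratio of means rather than against a gap of about 1.0 mm.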

What kind of statistical method for enrichment or overrepresentation should I use for a rank-ordered vector with binary status
I have gene expression data from 1065 different cell lines for one gene, let's say "BRAF". The cell lines are ordered by BRAF expression level. Most TP53-mutated cell lines have high BRAF expression (see the figure below). So what kind of statistical method should I use to test for enrichment or overrepresentation of TP53 status (WT vs Mutant) with respect to BRAF expression?

Geom_sf does not use geometry coordinates in axes but plots correct shape of polygon?
My overall aim is to combine multiple shape files (polygons of river sub-basins from within a large river basin) into one file and plot it as a map. This new combined file will later be combined with variable data (e.g. rainfall) and plotted by aes().
My problem is: ggplot()+geom_sf() plots the correct shapes of the polygons but doesn't have the correct coordinates on the axes; it doesn't use the values given in the geometry column on the axes.
My thoughts on what is wrong, though I'm not sure how to correct it:
- The shape file read in has geometry in 'long' 'lat' (crs = 4326) but the crs is saying the coordinates are in UTM Zone 48N WGS84 (crs = 32648). If I try and force the crs to 4326 the coordinate values change as if the conversion formula is trying to correct them.
- geom_sf and coord_sf are doing something that I don't understand!
library(sp)
library(raster)
library(ggplot2)
library(sf)
library(ggsf)
library(rgdal)
library(plyr)
library(dplyr)
library(purrr)
setwd("/Users/.../Sub_Basin_Outlines_withSdata/")
list.files('/Users/.../Sub_Basin_Outlines_withSdata/', pattern='\\.shp$')
Read in individual polygon shape files from folder. Combine with ID.
bangsai <- st_read("./without_S_data/", "Nam Bang Sai")
BasinID <- "BGS"
bangsai <- cbind(bangsai, BasinID)
ing <- st_read("./without_S_data/", "Nam Ing Outline")
BasinID <- "ING"
ing <- cbind(ing, BasinID)
The two individual shape files import as simple features, see image of R code.
Combine the individual sub-basin polygon shape files into one shapefile with multiple features.
all_sub_basins <- rbind(bangsai, ing)
The image shows the values of the coordinates of the polygons/features in all_sub_basins$geometry. They are in long lat format, yet the proj4string suggests UTM?
Plot the all_sub_basins simple feature shapefile in ggplot:
subbasins <- ggplot() +
  geom_sf(data = all_sub_basins, colour = "red", fill = NA)
subbasins
The result is a correctly plotted shape file with multiple features (there are more polygons in this image than read in above). However the axes are incorrect (nonsense values) and are not plotting the same values as in the geometry field.
If I add in coord_sf and confirm the crs:
subbasins <- ggplot() +
  geom_sf(data = all_sub_basins, colour = "red", fill = NA) +
  coord_sf(datum = st_crs(32648), xlim = c(94, 110), ylim = c(9, 34))
subbasins
Then I get the correct axes values but not as coordinates with N and E. It seems as if the geometry isn't recognised as coordinates, just as forced numbers?
I don't mind if the coordinates are UTM Zone 48N or lat long. Could I fix it in any of these ways? If so, how do I achieve that?
- Change the shape file crs without changing the values in the geometry column so geom_sf would know to plot the correct axes text.
- Extract the geometry from the shape file into a two column .csv file with long and lat columns. Convert csv into a sf and create my own shape file with correct crs.
- Last resort, leave the plot as it is and replace new axes text manually.
Any help is much appreciated!
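The first option in the list (relabel the crs without touching the coordinate values) can be sketched with sf's st_set_crs(). This is a rough sketch under stated assumptions: the stored values really are longitude/latitude, the polygon and the name basins are made up for illustration, and it is not the asker's data. Unlike st_transform(), st_set_crs() only changes the declared crs and leaves the numbers alone. Clearing the crs to NA first avoids the warning about replacing an existing crs.

```r
library(sf)

# Hypothetical polygon: lon/lat values stored under a wrongly declared UTM crs
poly <- st_polygon(list(rbind(c(100, 15), c(101, 15), c(101, 16), c(100, 15))))
basins <- st_sf(BasinID = "DEMO", geometry = st_sfc(poly, crs = 32648))

# Drop the wrong crs, then declare the assumed-correct one (no reprojection)
fixed <- st_set_crs(st_set_crs(basins, NA), 4326)

st_crs(fixed)$epsg  # declared crs after relabelling
st_bbox(fixed)      # coordinate values are unchanged
```

If the values were relabelled correctly, geom_sf(data = fixed) should label the axes in degrees without extra coord_sf() arguments.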

How do I force ggplot to use the ordered x axis
I have this data called test.melted below. I also have code to plot this data, but it doesn't plot x-axis values in order (x-axis should be 100pc, 95pc, 90pc and so on). How can I fix this? I also wanted to add a line instead of geom_point, but changing it to geom_line gives a blank plot.
data:
test.melted <- structure(list(`diluted sample` = c("100pc", "95pc", "90pc", "85pc", "0pc", "100pc", "95pc", "90pc", "85pc", "0pc", "100pc", "95pc", "90pc", "85pc"), variable = c(" of self", " of self", " of self", " of self", " of self", " with NA12878", " with NA12878", " with NA12878", " with NA12878", " with NA12878", " with NA12877", " with NA12877", " with NA12877", " with NA12877"), value = c(0.96, 0.87, 0.78, 0.71, 0.96, 1.13, 1.03, 0.98, 0.96, 0, 0, 0.03, 0.07, 0.14)), .Names = c("diluted sample", "variable", "value"), row.names = c(1L, 2L, 3L, 4L, 21L, 22L, 23L, 24L, 25L, 42L, 43L, 44L, 45L, 46L), class = "data.frame")
code:
p = ggplot(test.melted, aes( x = `diluted sample`, y = value, color = variable ))
p + geom_point()
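A common way to get the order described above is to turn the x column into a factor with explicit levels, since ggplot2 otherwise sorts character values alphabetically. With a discrete x, geom_line also needs a group aesthetic so that it knows which points to connect. The sketch below uses a trimmed copy of the data (two of the three series). The name demo is an assumption made for illustration.

```r
library(ggplot2)

# Trimmed copy of test.melted (two of the three series), for illustration only
demo <- data.frame(
  `diluted sample` = rep(c("100pc", "95pc", "90pc", "85pc", "0pc"), times = 2),
  variable = rep(c(" of self", " with NA12878"), each = 5),
  value = c(0.96, 0.87, 0.78, 0.71, 0.96, 1.13, 1.03, 0.98, 0.96, 0),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Fix the x-axis order by making the column a factor with explicit levels
demo$`diluted sample` <- factor(
  demo$`diluted sample`,
  levels = c("100pc", "95pc", "90pc", "85pc", "0pc")
)

# group = variable tells geom_line which points belong to the same line
p <- ggplot(demo, aes(x = `diluted sample`, y = value,
                      color = variable, group = variable)) +
  geom_line() +
  geom_point()
```

Printing p draws the plot. The same factor() call applied to test.melted should work unchanged.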

Log and break in y axis (ggplot2)
I have this graph
Code :
library("tidyverse")
library("scales")
#data
head(Vesself, n = 20L)
   AREA VESSELm VESSEL Clust
1   A10       5      1     4
2   A13       5      1     4
3   A16       5      1     4
4    A2       5      2     4
5   A23       5      1     4
6   A25       3      2     4
7   A25       5      5     4
8   A26       5      5     4
9   A26       3      2     4
10  A26       2      1     4
11  A27       5      1     4
12  A28       3      1     4
13  A28       5      6     4
14  A36       3      1     4
15  A39       5      1     2
16  A43       5      5     2
17  B25       5      1     4
18  B25       3      1     4
19  B26       3      1     4
20  B26       5      2     4
my_breaksx = c(1, 4, 16, 64, 256, 660)
#Plot
ggHist <- ggplot(data = Vesself, aes(VESSEL, color = Clust, fill = Clust)) +
  geom_bar(stat = "count", width = 0.08) +
  scale_color_manual(values = cols, name = "Group") +
  scale_fill_manual(values = cols, name = "Group") +
  scale_x_continuous(trans = log2_trans(), breaks = my_breaksx) +
  labs(x="Density of ships per area", y="Number of area", title="Distribution of ship density", subtitle="by scales") +
  theme_bw() +
  theme(plot.title = element_text(face="bold", hjust=0.5), plot.subtitle=element_text(hjust=0.5), legend.background = element_rect(fill="grey90", size=0.5, linetype="solid", colour ="black"), aspect.ratio = 1) +
  facet_wrap(~VESSELm)
ggHist
When I try to apply a logarithm transformation to the y axis, I don't have the same result as the x axis. The values are incredibly high. I don't understand why.
The result of the transformation without manual breaks :
scale_y_continuous(trans = log2_trans())
And the result with manual breaks :
my_breaksy = c(1, 4, 16, 64, 150)
scale_y_continuous(trans = log2_trans(), breaks = my_breaksy)
My goal is to have an equivalent representation as the x axis.

Preview Columns in DataFrame with dplyr select
Is there any way to preview a data frame's column names when using the dplyr select function? I would like to be able to obtain something in the spirit of using the $ in base R, which previews the names of objects in a data frame. Thanks.
how to reorder the search path?
I want to attach an environment (package or other) to position 2 and I want it to stay there.
I can use library with pos=3 to ensure this most of the time, but I have issues with tidyverse:
search()
# [1] ".GlobalEnv" "tools:rstudio" "package:stats"
# [4] "package:graphics" "package:grDevices" "package:utils"
# [7] "package:datasets" "package:methods" "Autoloads"
# [10] "package:base"
something <- list()
attach(something)
library(tidyverse, pos = 3)
search()
# [1] ".GlobalEnv" "package:forcats" "package:stringr"
# [4] "package:dplyr" "package:purrr" "package:readr"
# [7] "package:tidyr" "package:tibble" "package:ggplot2"
# [10] "something" "package:tidyverse" "tools:rstudio"
# [13] "package:stats" "package:graphics" "package:grDevices"
# [16] "package:utils" "package:datasets" "package:methods"
# [19] "Autoloads" "package:base"
tidyverse attaches its children packages at pos = 2. I want a search path starting with:
search()
# [1] ".GlobalEnv" "something" "package:forcats"...
How can I achieve this?

Replace column in a list of lists of dataframes with columns in another list of lists of dataframes. R
I have two sets of lists with the following format:
list(list(structure(list(X = c(3L, 4L, 5L, 7L, 2L, 8L, 9L, 6L, 10L, 1L), Y = structure(c(2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L), .Label = c("no", "yes"), class = "factor")), .Names = c("X", "Y"), row.names = c(NA, 10L), class = "data.frame"), structure(list( X = c(3L, 4L, 5L, 7L, 2L, 8L, 9L, 6L, 10L, 1L), Y = structure(c(2L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L), .Label = c("no", "yes" ), class = "factor")), .Names = c("X", "Y"), row.names = c(NA, 10L), class = "data.frame")))
and
list(list(structure(list(X = c(10L, 3L, 4L, 9L, 8L, 2L, 5L, 7L, 1L, 6L), Y = structure(c(2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L), .Label = c("no", "yes"), class = "factor")), .Names = c("X", "Y"), row.names = c(NA, 10L), class = "data.frame"), structure(list( X = c(5L, 7L, 4L, 3L, 10L, 2L, 9L, 1L, 8L, 6L), Y = structure(c(2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L), .Label = c("no", "yes" ), class = "factor")), .Names = c("X", "Y"), row.names = c(NA, 10L), class = "data.frame")))
My objective is to replace a[[1]][[i]]$x <- b[[1]][[i]]$x
This is fairly simple when two dataframes are outside lists:
df1$x <- df2$x
However, the code I wrote does not work:
replacex <- function(onelist, anotherlist){
  newlist <- list() #for storage
  onelist$x <- anotherlist$x
  newlist <- onelist
}
Dfs_new_X <- lapply(a, lapply, replacex, anotherlist = b)
It does not give an error, but it deletes the column instead.
Any help would be appreciated.
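One way to walk two nested lists in parallel is a pair of nested Map() calls, which pass matching elements of both lists to the function together. This is only a sketch, not a drop-in fix: the small data frames are hypothetical stand-ins for the two lists (called a and b in the question), replace_x is an illustrative name, and the column is assumed to be the capital X shown in the dput output.

```r
# Hypothetical stand-ins for the two nested lists from the question
a <- list(list(
  data.frame(X = 1:3, Y = c("no", "yes", "no")),
  data.frame(X = 4:6, Y = c("yes", "yes", "no"))
))
b <- list(list(
  data.frame(X = 7:9, Y = c("yes", "no", "no")),
  data.frame(X = 10:12, Y = c("no", "no", "yes"))
))

# Copy column X from one data frame into the other and return the result
replace_x <- function(df_a, df_b) {
  df_a$X <- df_b$X  # column names are case-sensitive: X, not x
  df_a              # return the modified data frame explicitly
}

# Outer Map pairs the inner lists; inner Map pairs the data frames
Dfs_new_X <- Map(
  function(inner_a, inner_b) Map(replace_x, inner_a, inner_b),
  a, b
)

Dfs_new_X[[1]][[2]]$X
```

Here each data frame in a keeps its own Y column and takes X from the data frame at the same position in b.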