Set color of point depending on a value
I have a dataframe that looks like this:
x1= c("Station 1", "Station 2", "Station 3", "Station 4", "Station 5", "Station 6")
x2= c(58.73, 57.20, 41.90, 38.00, 47.10, 67.30)
x3= c(16.55, 2.10, 8.80, 23.70, 24.50, 14.40)
x4= c(342, 1900, 283, 832, 212, 1533)
x5= c("rual", "rual", "urban", "suburban", "rual", "urban")
testframe = data.frame(Station=x1, LAT=x2, LON=x3, ALT=x4, AREA=x5)
I want to display the points in 3 different colors. Green for rual, yellow for suburban, red for urban.
So far I have only managed to display them all in one color. I did this:
library(ggmap)
library(ggplot2)
Europe = get_map(location = "Europe", zoom = 4)
p = ggmap(Europe)
p = p + geom_point(data=testframe, aes(x=testframe$LON, y=testframe$LAT), color = "red", size=1)
p
Can someone help me out please?
1 answer

You could try the following:
p + geom_point(data = testframe, aes(LON, LAT, color = AREA), size = 10) + scale_color_manual(name = "AREA", values = c("rual" = "darkgreen", "suburban" = "yellow", "urban" = "red"))
Or copy/paste this chunk of code:
library(ggmap)
library(ggplot2)
x1 = c("Station 1", "Station 2", "Station 3", "Station 4", "Station 5", "Station 6")
x2 = c(58.73, 57.20, 41.90, 38.00, 47.10, 67.30)
x3 = c(16.55, 2.10, 8.80, 23.70, 24.50, 14.40)
x4 = c(342, 1900, 283, 832, 212, 1533)
x5 = c("rual", "rual", "urban", "suburban", "rual", "urban")
testframe = data.frame(Station = x1, LAT = x2, LON = x3, ALT = x4, AREA = x5)
Europe = get_map(location = "Europe", zoom = 4)
p = ggmap(Europe)
p + geom_point(data = testframe, aes(LON, LAT, color = AREA), size = 10) +
  scale_color_manual(name = "AREA", values = c("rual" = "darkgreen", "suburban" = "yellow", "urban" = "red"))
See also questions close to this topic

How to replace values in one df with values from another df using an if statement
I have two data frames:
df1 <- data.frame(ID=c(1,2,3,4,5), date=c(NA,NA,NA,NA,NA), outcome=c(NA,1,NA,NA,0))
df1
  ID date outcome
1  1   NA      NA
2  2   NA       1
3  3   NA      NA
4  4   NA      NA
5  5   NA       0
df2 <- data.frame(ID=c(3,25,222,415,700), date=c(010215,032412,040513,041015,120314), outcome=c(1,1,1,1,1))
df2
   ID   date outcome
1   3  10215       1
2  25  32412       1
3 222  40513       1
4 415  41015       1
5 700 120314       1
If the ID in df1 is in df2 then I want to replace df1$date with df2$date. Also, if ID in df1 is in df2 I want to set df1$outcome = 1. I can do this with this code:
df1$date <- ifelse(df1$ID %in% df2$ID, df2$date[match(df1$ID,df2$ID)],df1$date)
df1$outcome <- ifelse(df1$ID %in% df2$ID, 1,df1$outcome)
df1
  ID  date outcome
1  1    NA      NA
2  2    NA       1
3  3 10215       1
4  4    NA      NA
5  5    NA       0
but I would like to understand how to do it with one if statement. I have come up with the following code:
for(i in 1:nrow(df1)){
  if(df1$ID[i] %in% df2$ID){
    df1$outcome[i]==1 & df1$date[i]==df2$date[match(df1$ID,df2$ID)]
  }
}
df1
  ID date outcome
1  1   NA      NA
2  2   NA       1
3  3   NA      NA
4  4   NA      NA
5  5   NA       0
which runs without errors, but does not complete the desired replacement. Can someone suggest how to modify what I have done to make it work like the first code chunk?

R Index error while trying to append multiple dataframes into one
I have a large dataframe, and I'm trying to attach scores that I calculated from about 17 other dataframes to it. I need to repeat this process 12 times. This is an example of the dataframe I have:
df=structure(list(ï..id = structure(c(2L, 7L, 5L, 4L, 3L, 1L, 6L, 8L), .Label = c("B12", "B7", "C2", "C9", "D3", "E2", "E6", "R4" ), class = "factor"), age = c(42L, 45L, 83L, 59L, 49L, 46L, 52L, 23L)), class = "data.frame", row.names = c(NA, 8L))
I need to calculate network metrics using the igraph package. Here are 2 matrices I have with different people in them:
net_mat1=structure(c("B7", "E6", "D3", "C9"), .Dim = c(2L, 2L), .Dimnames = list( NULL, c("ï..target", "partner")))
net_mat2=structure(c("C2", "B12", "E2", "R4"), .Dim = c(2L, 2L), .Dimnames = list( NULL, c("ï..target", "partner")))
Here is what I'm calculating
library(igraph)
g1=graph_from_edgelist(net_mat1)
g2=graph_from_edgelist(net_mat2)
degree_cent_close_1=centr_degree(g1, mode = "all")
degree.cent_close_1 #create object that contains metrics
degree.cent_close2=centr_degree(g2, mode = "all")
degree.cent_close2 #create another object that contains metrics
I then create dataframes that contain the metrics I calculated
cent_score_df1=data.frame(degree_cent_close_1$res, V(g1)$name)
cent_score_df1
cent_score_df2=data.frame(degree.cent_close2$res, V(g2)$name)
cent_score_df2
I then try to match and index the values of these metrics back into the df dataframe by doing this:
df$centrality_scores <- cent_score_df1[ match(df[['id']], cent_score_df1[['V.g1..name']] ) , 'degree_cent_close_1.res']
df$centrality_scores
df$centrality_scores <- cent_score_df2[ match(df[['id']], cent_score_df2[['V.g2..name']] ) , 'degree.cent_close2.res']
df$centrality_scores
However, it seems each time I try to merge my data with the original dataframe it can only attach half the data. I can never attach both dataframes. Does anyone have a better method that works for reattaching data? If there are faster and cleaner ways of doing this I would greatly appreciate the input

Running Rscript through a slurm system fails
I'm trying to run R code through Rscript calls on a Google cluster instance.
I've downloaded R:
R version 3.3.3 (2017-03-06)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
And set the path in my .bashrc:
export R_LIBS_USER=/home/akh/R
If I run the command on the head node:
Rscript <path_to_r_code.R>
It runs fine.
However, if I submit that as a job to the slurm system, it fails with this error message:
/var/lib/slurm/slurmd/job12197/slurm_script: 9: /var/lib/slurm/slurmd/job12197/slurm_script: Rscript: not found
Inside my R folder there is no bin folder, but rather a folder called bindr, in which there is another R folder, which in turn contains a bindr file:
akh@n2frontend001:~$ head R/bindr/R/bindr
# File share/R/nspackloader.R
# Part of the R package, http://www.R-project.org
#
# Copyright (C) 1995-2012 The R Core Team
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
This is the file I'm submitting to the slurm system:
#!/bin/sh
#SBATCH --partition=main
#SBATCH --job-name=bbh
#SBATCH --output=bbh.%J.OUT
#SBATCH --error=bbh.%J.ERR
#SBATCH --mem-per-cpu=5000
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=5
Rscript /home/akh/bbh.R args query="ptr" tr1.fn="/home/akh/RP/ptr.rn" pr2.fn="/home/akh/RP/ptr.aa" search="ann" tr2.fn="/home/akh/RP/ann.rn" pr2.fn="/home/akh/RP/ann.aa" out.fn="/home/akh/RP/ptr.ann.bbh"
Any idea?

R wavelet plots data example
I would like to generate some simple plots of different types of wavelets, comparable to the plots on Wikipedia:
https://en.wikipedia.org/wiki/Wavelet
After extensive searching, I still could not find any example data sets or functions, integrated in a package, that simply plot different example types of wavelets in R.
Any help appreciated!

ggplot facet_wrap as_labeller does not display the new sequence
I created a vector of ordered names and tried to replace each panel title with the ordered one (e.g., Jessie with 1.Jessie, Marion with 2.Marion, etc.). But I am getting NAs for each panel title instead. Any hints as to what is going wrong?
With the NAs
With the labeller commented out
list.top.35.names.ordered <- data.frame( cbind(order = c(1:35),list.top.35.names)) %>%
  unite( name.new, c("order" ,"list.top.35.names"), sep = ".")
list.top.35.names.ordered <- list.top.35.names.ordered$name.new[1:35]
str(list.top.35.names.ordered)
chr [1:35] "1.Jessie" "2.Marion" "3.Jackie" "4.Alva" "5.Trinidad" "6.Ollie" ...
data.babyname.all %>%
  ggplot( mapping = aes(x = year, y = perc, fill = sex)) +
  geom_density(stat = "identity", position = "stack" , show.legend = F ) +
  facet_wrap(~name, ncol= 7, nrow =5, labeller= as_labeller(list.top.35.names.ordered)) +
  scale_fill_manual(values = c('#E1AEA1','#9ABACF')) +
  geom_point(data = most.unisex.year.and.value, mapping = aes(x = year, y = perc), size = 3, fill = "white", color = "black", shape = 21) +
  scale_y_continuous(breaks = c(0,.50,1), labels= c("0%", "50%","%100")) +
  scale_x_continuous(breaks = c(1940, 1960, 1980,2000), labels= c('1940', "'60","'80",'2000')) +
  geom_text(mapping = aes(x =x , y = y , label = label), check_overlap = F, na.rm = T, position = position_dodge(width=.9), size=3) +
  theme_minimal() + #set theme
  theme(
    text = element_text(size = 10),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid = element_blank(),
    panel.border = element_blank(),
    plot.background = element_blank(),
    axis.ticks.x = element_line(color = "black"),
    axis.ticks.length =unit(.2,'cm'),
    strip.text = element_text(size = 10, margin = margin(l=10, b = .1)))

Adding a legend for vertical lines of histograms
I'm trying to add a legend to a graph I am creating. The idea is to compare the means and medians of a skewed and a symmetric distribution. This is my current code; however, show.legend = TRUE doesn't do the job.
set.seed(19971222)
sym <- as.data.frame(cbind(c(1:500), rchisq(500, df = 2))) # generate 500 random numbers from a symmetric distribution
colnames(sym) <- c("index", "rnum")
sym_mean <- mean(sym$rnum)
sym_med <- median(sym$rnum)
# get into a format that tidyverse likes
central_measures <- as.data.frame(cbind(sym_mean, sym_med))
colnames(central_measures) <- c("mean", "median")
sym %>% ggplot(aes(sym$rnum)) +
  geom_histogram(binwidth = 0.4, fill = "steelblue", colour = "navy", alpha = 0.9) +
  geom_vline(xintercept = sym_mean, colour = "red", show.legend = TRUE) +
  geom_vline(xintercept = sym_med, colour = "yellow", show.legend = TRUE) +
  labs(title = "Histogram of 500 Randomly Generated Numbers from the ChiSquared Distribution", x = "Value", y = "Frequency") +
  theme_minimal()
I just want to have a legend on the side saying that the red is the "Mean" and the yellow is the "Median".
Thank you!

How do I plot coordinates on a US map?
I am new to R and have only gotten this far using tutorials. My data "geolocation" contains coordinates for about 200 cities in the US.
library("ggmap")
library("tidyverse")
library("dplyr")
library ("usmap")
library("ggplot2")
library("maps")
geolocation <- read.csv("mapdata.csv")
geolocation <- geolocation[,1:7]
names(geolocation) <- c("City","State","Country","Size","Status","Lat","Long")
map("state")
p <- ggmap(USA)
p <- p + geom_point(data=geolocation, aes(x= Long, y= Lat), size=Size, color=Status)

Python binary RNN classification of timeseries coordinates
I have been attempting to create an RNN. I have, in total, a dataset of 1661 individual "entries" with 158 timeseries coordinates in each of those entries.
The following is a small part of one entry:
0.00000000e+00 1.92609687e-04 3.85219375e-04 5.77829062e-04 3.00669864e-04 2.35106660e-05 7.33379576e-04 1.49026982e-03
This is simply an array of 158 timeseries values.
Now, I would like to classify whether an array of values belongs to condition A or condition B.
I have looked at a lot of blogs, the Keras documentation, and YouTube videos, and came up with the following network:
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers.embeddings import Embedding
from sklearn.model_selection import train_test_split
import numpy as np
import matplotlib.pyplot as plt

# Set data and labels
# Somehow find a way to 'unpack' the data
datarnn = np.copy(normalized_data)
datarnn = np.array(rearrange_data(datarnn))
print(len(datarnn))

# Convert labels to binary labels
targetrnn = np.asarray(['1' if 'A' in str(x) else '0' for x in spineMidData_clean[:,0][1:]])

# Split data for training and testing
x_training,x_testing,y_training,y_testing = train_test_split(datarnn,targetrnn,test_size=0.2,random_state=4)

model=Sequential()
# Input layer
model.add(Embedding(1661, 1))
# Hidden layer
model.add(LSTM(3))
# Output layer with binary classification
model.add(Dense(1, activation='sigmoid'))

# Set training settings
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])

# Model diagnostics
model.summary()
history = model.fit(x_training,y_training,epochs=20,validation_data=(x_testing,y_testing))

# Predict the test data
results = model.predict(x_testing)
I was pretty excited to finally see it work. However, I can't seem to increase the accuracy, which stays at around 50%. Is there a way to make this network more accurate? For example, should I add more layers, or have I configured an existing one in a wrong or inefficient way?

How to make code calculate angle given 3 points?
So I wrote a program using Java to calculate an angle when I input 2 points. It uses the dot product formula and vectors to do so. However, Math.acos always outputs NaN unless the angle is perfectly 90 degrees.
One of my failed attempts is on the code link below. It works when the angle is 90 degrees only. For all other angles, it is NaN.
This code can change depending on the coordinates and the angles inputted but for what it is currently set to, it should output the degrees depending on the coordinates inputted. In this case, it only works for 90 degrees. How do I make Math.acos work and fix the formula in the program?

How to check a number every x amount
I need to write a conditional statement for a cloud function that runs every 2016th document created.
So I have a variable that gets iterated on every time a new document is created. In my mind I thought I would be able to use this one variable to check every x amount.
The current value of documentsCreated is just a random number, not a set variable.
const documentsCreated = 19239123;
function checkDocuments(){
  let x = (documentsCreated / 2016) % 2016;
  if(x === 2016){
    return true
  } else {
    return false
  }
}
This function should return true every time documentsCreated is a multiple of 2016.
I would love to do this with just one variable, but I am thinking that I might have to keep a second variable that I reset to 0 every time it hits 2016.
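The posted check can never succeed: the result of `% 2016` is always smaller than 2016, so comparing it with 2016 is always false. A multiple of 2016 is a number whose remainder after dividing by 2016 is 0, so a single counter is enough. A minimal sketch in Python (the function name `is_multiple_of` is made up for illustration; the same remainder test works in JavaScript):

```python
def is_multiple_of(count, interval=2016):
    """True when count is a positive multiple of interval."""
    # A multiple leaves no remainder; 0 is excluded so that the
    # check does not fire before any document has been created
    return count > 0 and count % interval == 0
```

For instance, `is_multiple_of(4032)` is true, while `is_multiple_of(19239123)` is false because that count leaves a remainder of 435.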

What does this where Condition mean in python script
I am trying to translate a Python script into Java. As I'm not very familiar with Python, I cannot understand a condition in this script. Here is the original script:
import numpy as np

def inverse_generalized_anscombe(x, mu, sigma, gain=1.0):
    test = np.maximum(x, 1.0)
    exact_inverse = (
        np.power(test/2.0, 2.0)
        + 1.0/4.0 * np.sqrt(3.0/2.0)*np.power(test, -1.0)
        - 11.0/8.0 * np.power(test, -2.0)
        + 5.0/8.0 * np.sqrt(3.0/2.0) * np.power(test, -3.0)
        - 1.0/8.0
        - np.power(sigma, 2)
    )
    exact_inverse = np.maximum(0.0, exact_inverse)
    exact_inverse *= gain
    exact_inverse += mu
    exact_inverse[np.where(exact_inverse != exact_inverse)] = 0.0
    return exact_inverse
The line that I don't understand is this line:
exact_inverse[np.where(exact_inverse != exact_inverse)] = 0.0
As I understand it, exact_inverse should be a single value and not an array, so why is there a pair of square brackets after it? What is the condition in the square brackets trying to check?
The exact_inverse != exact_inverse condition seems to be always false, or am I missing something here?
The original script can be found here
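The comparison is not always false. Under IEEE 754 floating-point rules, NaN is the one value that is not equal to itself, so `x != x` is true exactly where `x` is NaN. The square brackets suggest that `exact_inverse` is treated as an array and that the flagged positions are overwritten with 0.0. A minimal sketch of the same idea in plain Python, with a list standing in for the NumPy array (an assumption made only to keep the example dependency-free):

```python
import math

values = [1.5, math.nan, 0.0, math.nan]

# Elementwise "x != x": True only where the entry is NaN
mask = [v != v for v in values]

# Replace the flagged entries with 0.0, as the indexed assignment does
cleaned = [0.0 if flagged else v for v, flagged in zip(values, mask)]
```

In a Java translation, `Double.isNaN(value)` applied to each element expresses the same test more readably.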

How to draw from sample until at least one of each sample value is obtained and then stop
I am working on a homework problem where a cereal company is running a promotion where 1 of 4 free toys is included in each box. The goal is to collect all 4 toys through buying one box at a time.
There are two scenarios:
1. Each toy is equally likely.
2. The toys have selection probabilities 0.10, 0.25, 0.25, and 0.40.
I want to design a function that takes each toy number and each toy's selection probability as inputs, simulates buying boxes of cereal by sampling from the toy numbers until at least one of each toy is obtained, then stops and reports how many boxes were purchased.
The end goal is to use this function in a Monte Carlo simulation study to find out, on average, how many boxes consumers will have to buy to collect all the toys, and the proportion of consumers that will have to buy at least 14 boxes to collect all the toys.
I've tried to create a loop (while, repeat) which samples until a vector contains all the toy values, but the loops have run infinitely. I suspect there is a problem with the condition I am feeding the loop.
box_buyer <- function (purchase_options, probabilities) {
  boxes <- numeric()
  while (!purchase_options %in% boxes) {
    append(boxes, sample(purchase_options, 1, probabilities, replace = TRUE))
  }
  return(length(boxes))
}
box_buyer(c(1, 2, 3, 4), c(1/4, 1/4, 1/4, 1/4))
I am expecting a function that returns the number of boxes that were purchased. Currently what I get is an infinite loop that gives the error:
"the condition has length > 1 and only the first element will be used"
repeated until I terminate R.
How can I get the loop to sample until at least one of each toy is obtained and then stop and return the number of boxes purchased? Any help is appreciated.
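Two things in the posted loop look suspicious: the condition yields one TRUE/FALSE per toy rather than a single value, and the result of `append` is never stored, so `boxes` stays empty. The sketch below shows the intended logic in Python rather than R, with a single stopping condition and an explicit update. The names `boxes_until_complete` and `simulate` are made up for illustration, and it assumes the toy labels are unique and every probability is positive (otherwise the loop cannot finish):

```python
import random

def boxes_until_complete(toys, probabilities, rng=random):
    """Buy boxes until every toy has been seen; return the number bought."""
    seen = set()
    boxes = 0
    # Stop only when ALL toys are collected (a single True/False condition)
    while len(seen) < len(toys):
        toy = rng.choices(toys, weights=probabilities, k=1)[0]
        seen.add(toy)  # store the draw; the posted loop discards it
        boxes += 1
    return boxes

def simulate(toys, probabilities, trials=10_000, seed=1):
    """Monte Carlo estimate of the mean and of P(at least 14 boxes)."""
    rng = random.Random(seed)
    counts = [boxes_until_complete(toys, probabilities, rng) for _ in range(trials)]
    mean_boxes = sum(counts) / trials
    share_14_plus = sum(c >= 14 for c in counts) / trials
    return mean_boxes, share_14_plus
```

Calling `simulate([1, 2, 3, 4], [0.10, 0.25, 0.25, 0.40])` covers the second scenario. In R, the corresponding changes would be to wrap the condition in `all()` and to assign the result of `append` back to `boxes`.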

Cannot retrieve location map from R package dismo using 'gmap'
I am attempting to retrieve an image from the Google static maps webservice using package 'dismo' and 'gmap' within R but only get the following code back:
mymap <- gmap("France")
REQUEST_DENIED:France
[1] "try 2 ..."
REQUEST_DENIED:France
[1] "try 3 ..."
REQUEST_DENIED:France
[1] "try 4 ..."
REQUEST_DENIED:France
Error in gmap("France") : location not found
I have also tried to get the image using 'ggmap' and 'RgoogleMaps' but I am met with similar codes or:
mymap <- get_map(location = "France", zoom = 4)
Error in download.file(url, destfile = tmp, quiet = !messaging, mode = "wb") :
  cannot open URL 'http://maps.googleapis.com/maps/api/staticmap?center=France&zoom=4&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false'
In addition: Warning message:
In download.file(url, destfile = tmp, quiet = !messaging, mode = "wb") :
  cannot open URL 'http://maps.googleapis.com/maps/api/staticmap?center=France&zoom=4&size=640x640&scale=2&maptype=terrain&language=en-EN&sensor=false': HTTP status was '403 Forbidden'
I have not found any solutions by searching through web pages or the R documentation, and I have no previous experience with retrieving Google images within R, so any help would be much appreciated!

R GGMAP Plot UK Post Codes
I have an Excel spreadsheet with UK post codes and customer numbers in two columns. I'd like to plot them in RStudio using ggmap (I'm new to R). Is anyone able to point me towards the necessary code or a guide on how to do this? Thanks for your help, Gavin.

Geocoding Data Locations With Google in R
I am trying to use the very well written instructions from this blog: https://www.jessesadler.com/post/geocodingwithr/ to geocode locational data in R, including specific sites and cities in Hawaii. I am having issues pulling information from Google. When I run mutate_geocode, the code runs but no output is returned. I bypassed this for the time being by manually entering lat and lon for just one location in my dataset, in an attempt to troubleshoot. Now, when I use get_googlemap, I get the error message "Error in Download File".
I have tried using mutate_geocode as well as running a loop using geocode. I either get no output or I get the OVER_QUERY_LIMIT error (which seems to be very common). I have checked my query limit and I am nowhere near it.
Method 1:
BH <- rename(location, place = Location)
BH_df <- as.data.frame(BH)
location_df <- mutate_geocode(HB, Location)
Method 2:
origAddress <- read.csv("HSMBH.csv", stringsAsFactors = FALSE)
geocoded <- data.frame(stringsAsFactors = FALSE)
for(i in 1:nrow(origAddress)) {
  result <- geocode(HB$Location[i], output = "latlona", source = "google")
  HB$lon[i] <- as.character(result[1])
  HB$lat[i] <- as.character(result[2])
  HB$geoAddress[i] <- as.character(result[3])
}
After manually entering the lon and lat points, I run into this error:
map <- get_googlemap(center = c(-158.114, 21.59), zoom = 4)
I am hoping to gather lat and lon points for my locations, and then be able to use get_googlemap to draft a map with which I can plot density points of occurrences (I have the code for the points already).