Function that returns a function within lapply: nested lapply?
I thought I was being elegant with my code until I ran into an issue with the lapply function. I used dput to output a sample. Note that I am using data.table, not data.frame.
full_data <- structure(list(FireplaceQu = c("Gd", "Gd", "TA", "TA", "Gd",
"None", "Gd", "Gd", "None", "None", "None", "None", "Gd", "Gd",
"Gd", "None"), BsmtQual = c("TA", "Gd", "Gd", "TA", "Gd", "TA",
"Ex", "TA", "TA", "TA", "TA", "Ex", "TA", "Ex", "Ex", "Gd"),
CentralAir = c("Y", "Y", "Y", "Y", "Y", "Y", "Y", "Y", "N",
"N", "Y", "Y", "Y", "Y", "Y", "Y")), .Names = c("FireplaceQu",
"BsmtQual", "CentralAir"), class = "data.frame", row.names = c(NA,
16L))
library(data.table)
setDT(full_data)
cols = c('FireplaceQu', 'BsmtQual', 'CentralAir')
FireplaceQu=c('None','Po','Fa','TA','Gd','Ex')
BsmtQual=c('None','Po','Fa','TA','Gd','Ex')
CentralAir=NA
cust_levels <- list(FireplaceQu, BsmtQual, CentralAir)
# I modified a function from SO to sort based on set levels instead of by using default sort function.
# https://stackoverflow.com/questions/38620424/label-encoder-functionality-in-r
# function which returns function which will encode vectors with values of 'vec'
lev_index = 1
label_encoder = function(vec){
  levels = cust_levels[[lev_index]]
  lev_index = lev_index + 1
  function(x){
    match(x, levels)
  }
}
full_data[, (cols) := lapply(.SD, lapply(.SD, label_encoder)), .SDcols = cols]
I know I can get this to work in a for loop, but I thought I would try to use the lapply function. I'm confused about how to use it with a function that returns a function as its value and then needs to be evaluated.
I ultimately want to create integer values ordered based on the order of the cust_levels. Bonus if I can get rid of the lev_index!
Example input:
FireplaceQu BsmtQual CentralAir
None Gd Y
TA Gd Y
TA Gd Y
Gd TA Y
Example output:
FireplaceQu BsmtQual CentralAir
1 5 NA
4 5 NA
4 5 NA
5 4 NA
1 answer

You can do this with mapply:
full_data[, (cols) := mapply(match, .SD, cust_levels, SIMPLIFY = FALSE), .SDcols = cols]
# > full_data
#     FireplaceQu BsmtQual CentralAir
#  1:           5        4         NA
#  2:           5        5         NA
#  3:           4        5         NA
#  4:           4        4         NA
#  5:           5        5         NA
#  6:           1        4         NA
#  7:           5        6         NA
#  8:           5        4         NA
#  9:           1        4         NA
# 10:           1        4         NA
# 11:           1        4         NA
# 12:           1        6         NA
# 13:           5        4         NA
# 14:           5        6         NA
# 15:           5        6         NA
# 16:           1        5         NA
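If you still want the function-factory approach from the question, pairing each column with its own level set removes the need for lev_index entirely: each encoder closes over its levels. This is a sketch of that idea, not part of the answer above:

```r
library(data.table)

# Each call to label_encoder captures its own `levels` vector in a closure,
# so no external counter (lev_index) is needed.
label_encoder <- function(levels) {
  function(x) match(x, levels)
}

# One encoder per column, applied column-wise with Map
encoders <- lapply(cust_levels, label_encoder)
full_data[, (cols) := Map(function(f, col) f(col), encoders, .SD), .SDcols = cols]
```

The original `lapply(.SD, lapply(.SD, label_encoder))` fails because the inner lapply result is passed where a function is expected; Map instead walks the encoders and the columns of `.SD` in parallel.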
See also questions close to this topic

R. How to check if a dataset contains the same elements in another dataset
I have 2 datasets, "Dataset2016_17" and "PlayOffDataset2016_17". Dataset2016_17$TEAM looks like the following: [1] "Atlanta Hawks" "Boston Celtics" "Brooklyn Nets", etc. I would like to know whether the values in Dataset2016_17$TEAM occur in PlayOffDataset2016_17$TEAM. If so, I want something like a table of TRUE and FALSE.
I have already tried something like this
highlight_flag <- grepl(PlayOffDataset2016_17$TEAM, Dataset2016_17$TEAM)
But it did not work. Please let me know if there are any suggestions.
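grepl expects a single pattern, not a whole vector of team names, which is why the attempt above fails. A membership test is usually written with %in%; a minimal sketch, assuming both columns hold plain team-name strings:

```r
# TRUE/FALSE per row of Dataset2016_17: does the team appear in the play-off data?
highlight_flag <- Dataset2016_17$TEAM %in% PlayOffDataset2016_17$TEAM

# Summarise as counts of TRUE and FALSE
table(highlight_flag)
```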

how to capture a repeated group
I'm trying to create a regular expression to capture repeated groups using the stringr package.
my_text <- c("LPC 14:0", "PC 16:0_18:1", "TAG 18:0_20:1_22:2")
I'm trying to capture all the numbers:
- from LPC I want the 14 and 0,
- from PC I want the 16, 0, 18 and 1,
- from TAG I want 18, 0, 20, 1, 22 and 2.
So far I tried:
str_match_all(string = my_text, pattern = "^[A-Z]+ (([0-9]{2}):([0-9]{1})_?)*")
and several variations on this. I only succeed in capturing the first match or the last match. On regex101.com I get the message:
A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations.
But I just can't get it to work. Any help appreciated!!
Cheers, Rico
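As the regex101 message says, a repeated capturing group keeps only its last iteration, so a single str_match_all pattern cannot return every repetition. A common workaround, sketched here, is to extract the numbers directly rather than capturing the repeated group:

```r
library(stringr)

my_text <- c("LPC 14:0", "PC 16:0_18:1", "TAG 18:0_20:1_22:2")

# Pull every run of digits out of each string; returns a list of character vectors
str_extract_all(my_text, "[0-9]+")
```

For "LPC 14:0" this yields c("14", "0"), and correspondingly more numbers for the longer strings.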

Merge dataframes of list and obtain names of dataframes as column
I merged all data frames from a list in just one data frame.
The dataframes inside the list are called
TAI, NAM, HEE
and each data frame looks like this
YrM    Compound1 Compound2
201501 0.002     0.15
201502 0.004     0.02
201503 0.01      0.09
when I merge all dataframes with
meanall <- do.call(rbind, meaneach)
I get
      YrM    Compound1 Compound2
TAI.1 201501 0.002     0.15
TAI.2 201502 0.004     0.02
TAI.3 201503 0.01      0.09
.
.
.
NAM.1 201501 0.03      0.4
NAM.2 201502 0.001     0.005
I would like to get a column with the names of the list and not as rownames (like above), and without the numbers (TAI.1, TAI.2...), I just want the name TAI
So that I get this:
List YrM    Compound1 Compound2
TAI  201501 0.002     0.15
TAI  201502 0.004     0.02
TAI  201503 0.01      0.09
.
.
.
NAM  201501 0.03      0.4
NAM  201502 0.001     0.005
How can I do this?
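A sketch of one way to do this, assuming meaneach is the named list of dataframes: data.table::rbindlist can bind the list and write the list names into a new column via idcol, which avoids the TAI.1-style rownames altogether.

```r
library(data.table)

# Bind all dataframes; "List" holds each dataframe's name (TAI, NAM, HEE, ...)
meanall <- rbindlist(meaneach, idcol = "List")
```

In base R, a similar result can be had with do.call(rbind, Map(cbind, List = names(meaneach), meaneach)) followed by rownames(meanall) <- NULL.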

Neural Network - problem with backpropagation - invalid syntax
The following code, which I pulled from here in an effort to better understand how Machine Learning and Neural Networks work, isn't working. It keeps producing an "invalid syntax" error at line 31:
self.weights1 = self.weights1 + d_weights1
Here is the full code; any suggestions would be incredibly helpful, as I'm pushing the limits of what I understand with Python.
import numpy as np

def sigmoid(x):
    # ACTIVATION FUNCTION - dictates if the numeric output is true or false - grades each layer
    return 1.0/(1 + np.exp(-x))

def sigmoid_derivative(x):
    # BACKPROPAGATION - tells the appropriate amount to adjust the weights and biases
    return x * (1.0 - x)

class NeuralNetwork:
    # NEURAL NETWORK - where the training through generations happens
    def __init__(self, x, y):
        self.input = x
        self.weights1 = np.random.rand(self.input.shape[1], 4)
        self.weights2 = np.randow.rand(4, 1)
        self.y = y
        self.output = np.zeros(y.shape)

    def feedforward(self):
        # calculates through each layer of the network
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

    def backprop(self):
        # application of the chain rule to find derivative of the loss function with respect to weights2 and weights1
        d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * sigmoid_derivative(self.output)))
        d_weights1 = np.dot(self.input.T, (np.dot(2*(self.y - self.output) * sigmoid_derivative(self.output), self.weights2.T) * sigmoid_derivative(self.layer1))
        # update the weights with the derivative (slope) of the loss function
        self.weights1 += d_weights1
        self.weights2 += d_weights2

if __name__ == "__main__":
    X = np.array([[0, 0, 1],
                  [0, 1, 1],
                  [1, 0, 1],
                  [1, 1, 1]])
    y = np.array([[0], [1], [1], [0]])
    nn = NeuralNetwork(X, y)
    for i in range(1500):  # GENERATIONS
        nn.feedforward()
        nn.backprop()
    print(nn.output)

slow function by groups in data.table r
My experimental design has trees measured in various forests, with repeated measurements across years.
DT <- data.table(forest=rep(c("a","b"), each=6), year=rep(c("2000","2010"), each=3), id=c("1","2","3"), size=(1:12))
DT[, id := paste0(forest, id)]
> DT
    forest year id size
 1:      a 2000 a1    1
 2:      a 2000 a2    2
 3:      a 2000 a3    3
 4:      a 2010 a1    4
 5:      a 2010 a2    5
 6:      a 2010 a3    6
 7:      b 2000 b1    7
 8:      b 2000 b2    8
 9:      b 2000 b3    9
10:      b 2010 b1   10
11:      b 2010 b2   11
12:      b 2010 b3   12
For each tree i, I want to calculate a new variable, equal to the sum of the sizes of all the other individuals in the same group/year that are bigger than tree i.
I have created the following function:
f.new <- function(i, n){
  DT[forest == DT[id == i, unique(forest)] & year == n  # select the same forest & year as tree i
     & size > DT[id == i & year == n, size],            # select the trees larger than tree i
     sum(size, na.rm = T)]                              # sum the sizes of all such selected trees
}
When applied within the data table, I got the correct results.
DT[, new := f.new(id, year), by = .(id, year)]
> DT
    forest year id size new
 1:      a 2000 a1    1   5
 2:      a 2000 a2    2   3
 3:      a 2000 a3    3   0
 4:      a 2010 a1    4  11
 5:      a 2010 a2    5   6
 6:      a 2010 a3    6   0
 7:      b 2000 b1    7  17
 8:      b 2000 b2    8   9
 9:      b 2000 b3    9   0
10:      b 2010 b1   10  23
11:      b 2010 b2   11  12
12:      b 2010 b3   12   0
Note that I have a large dataset with several forests (40), repeated years (6) and single individuals (20,000), for a total of almost 50,000 measurements. When I run the above function it takes 8-10 minutes (Windows 7, i5-6300U CPU @ 2.40 GHz, RAM 8 GB). I need to repeat it often with several small modifications, and it takes a lot of time.
- Is there any faster way to do it? I checked the *apply functions but cannot figure out a solution based on them.
- Can I make a generic function that doesn't rely on the specific structure of the dataset (i.e. so I could use different columns as "size")?
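A sketch of a vectorised alternative using a data.table self non-equi join: each row is joined back to the rows in the same forest and year with a strictly larger size, and the matching sizes are summed per row. On the toy DT this should reproduce the `new` column; it has not been tried on the full dataset.

```r
library(data.table)

# For each row of DT (the inner i), find rows of DT in the same forest/year
# whose size is strictly larger, and sum those sizes per i row (.EACHI).
# Rows with no larger neighbour get sum(NA, na.rm = TRUE) = 0.
DT[, new := DT[DT, on = .(forest, year, size > size),
               sum(x.size, na.rm = TRUE), by = .EACHI]$V1]
```

Because the join replaces the per-row subsetting, it avoids calling f.new once per id/year group and should scale far better on 50,000 rows.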

Create columns like some sort of contingency table based on a solution variable
As my previous question about this topic from a month ago was not yet answered completely and I am missing 1 reputation point to add a bounty, I decided to ask it again with some added information.

I have a list of stores and I have a product (apples). I ran a system of linear equations to get the column 'var'; this value represents the amount of apples you will receive or have to give to another store. I can't figure out how to make an 'actionable dataframe' from it. I can't figure out the correct terms to correctly explain what I want so I hope below helps:
Data:
df <- data.frame(store = c('a', 'b', 'c', 'd', 'e', 'f'), sku = c('apple', 'apple', 'apple', 'apple', 'apple', 'apple'), var = c(-1, -4, 6, 1, -5, 3))
Output I want (or something similar):
output <- data.frame(store = c('a', 'b', 'c', 'd', 'e', 'f'),
                     sku = c('apple', 'apple', 'apple', 'apple', 'apple', 'apple'),
                     var = c(-1, -4, 6, 1, -5, 3),
                     ship_to_a = c(0,0,1,0,0,0),
                     ship_to_b = c(0,0,4,0,0,0),
                     ship_to_c = c(0,0,0,0,0,0),
                     ship_to_d = c(0,0,0,0,0,0),
                     ship_to_e = c(0,0,1,1,0,3),
                     ship_to_f = c(0,0,0,0,0,0))
Bonus: Ideally, I would like to fill the ship_to_store columns until all minus values are 'gone', for the case when sum(df$var) doesn't add up to zero.
This function was created by another user:
fun <- function(DF){
  n <- nrow(DF)
  mat <- matrix(0, nrow = n, ncol = n)
  VAR <- DF[["var"]]
  neg <- which(DF[["var"]] < 0)
  for(k in neg){
    S <- 0
    Tot <- abs(DF[k, "var"])
    for(i in seq_along(VAR)){
      if(i != k){
        if(VAR[i] > 0){
          if(S + VAR[i] <= Tot){
            mat[k, i] <- VAR[i]
            S <- S + VAR[i]
            VAR[i] <- 0
          }else{
            mat[k, i] <- Tot - S
            S <- Tot
            VAR[i] <- VAR[i] - Tot + S
          }
        }
      }
    }
  }
  colnames(mat) <- paste0("ship_to_", DF[["store"]])
  cbind(DF, mat)
}
The function worked in my specific example above, but it doesn't work in all cases, as it does not save the number of apples already received per store and therefore results in the store receiving too many apples. For example:
df <- data.frame(store = c('a', 'b', 'c', 'd', 'e'), sku = c('apple', 'apple', 'apple', 'apple', 'apple'), var = c(44, 151, -100, 52, -43))
Output has store B giving 100 apples to store C and store A 44 apples to C. That makes 144 instead of the 100 they should get.
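The root cause is that fun never records how much a positive store has already shipped across successive negative stores. Below is a hypothetical rewrite, a sketch rather than the original user's function; the name fun2 and the mat[i, k] orientation (rows ship to the ship_to_ columns, matching the desired output above) are my assumptions.

```r
fun2 <- function(DF) {
  n   <- nrow(DF)
  mat <- matrix(0, nrow = n, ncol = n)
  remaining <- pmax(DF[["var"]], 0)    # apples each store can still give away
  need      <- -pmin(DF[["var"]], 0)   # apples each store still needs

  for (k in which(need > 0)) {         # k: a store with a shortage
    for (i in which(remaining > 0)) {  # i: candidate donor stores
      give <- min(remaining[i], need[k])
      mat[i, k]    <- mat[i, k] + give # store i ships `give` apples to store k
      remaining[i] <- remaining[i] - give
      need[k]      <- need[k] - give
      if (need[k] == 0) break
    }
  }
  colnames(mat) <- paste0("ship_to_", DF[["store"]])
  cbind(DF, mat)
}
```

Because `remaining` is decremented as apples are allocated, a donor's stock can never be handed out twice, which addresses the 144-instead-of-100 case.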

Merge data which have a different number of observations with filling empty cells by zeros in R
I work with two datasets:
df1=structure(list(ad_set_id = c("23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700" ), realpurchase_cash = c(6.9900002, 1.49, 1.49, 1.49, 1.49, 1.49, 1.49, 1.49, 1.49, 3.99, 1.49, 1.49, 6.9900002, 1.49, 1.49, 6.9900002, 1.49)), .Names = c("ad_set_id", "realpurchase_cash"), row.names = c(NA, 17L), class = "data.frame")
and
df2=structure(list(spent = c(1.02, 30.13, 29.46, 28.7, 8.72, 50.27, 51.19, 50.14, 50.07, 50.91, 47.15, 47.4), ad_set_id = c("23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700", "23843069854090700")), .Names = c("spent", "ad_set_id"), row.names = c(NA, 12L), class = "data.frame")
when I try to cbind it, I get the error:
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 2081, 5530
I know what that means!
In df1 and df2, ad_set_id is the key column. In my reproducible example, the ad_set_id in df1 has 17 rows for the realpurchase_cash column, while the ad_set_id in df2 has 12 rows for the spent column. If an ad_set_id is present in both datasets (df1, df2) and has more observations in one than in the other, how do I pad the dataset with fewer rows with zero values in its metric variable (spent or realpurchase_cash, depending on the dataset)? In the reproducible example df2 has fewer rows per ad_set_id, so zeros should be added in the spent column.
I.e. the output:
   spent ad_set_id         ad_set_id.1       realpurchase_cash
1   1.02 23843069854090700 23843069854090700              6.99
2  30.13 23843069854090700 23843069854090700              1.49
3  29.46 23843069854090700 23843069854090700              1.49
4  28.70 23843069854090700 23843069854090700              1.49
5   8.72 23843069854090700 23843069854090700              1.49
6  50.27 23843069854090700 23843069854090700              1.49
7  51.19 23843069854090700 23843069854090700              1.49
8  50.14 23843069854090700 23843069854090700              1.49
9  50.07 23843069854090700 23843069854090700              1.49
10 50.91 23843069854090700 23843069854090700              3.99
11 47.15 23843069854090700 23843069854090700              1.49
12 47.40 23843069854090700 23843069854090700              1.49
13  0.00 23843069854090700 23843069854090700              6.99
14  0.00 23843069854090700 23843069854090700              1.49
15  0.00 23843069854090700 23843069854090700              1.49
16  0.00 23843069854090700 23843069854090700              6.99
17  0.00 23843069854090700 23843069854090700              1.49
zeros in the metric variable are filled for the ad_set_id values that matched in df2 and df1.
How to perform it?
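A base-R sketch, assuming every ad_set_id in one dataframe also occurs in the other: give each row a running index within its ad_set_id, outer-merge on (ad_set_id, index), and replace the NAs in the metric columns with zeros.

```r
# Row index within each ad_set_id
df1$idx <- ave(seq_len(nrow(df1)), df1$ad_set_id, FUN = seq_along)
df2$idx <- ave(seq_len(nrow(df2)), df2$ad_set_id, FUN = seq_along)

# Keep every row from both sides; the shorter side gets NA where it runs out
out <- merge(df1, df2, by = c("ad_set_id", "idx"), all = TRUE)

# Fill the metric columns with zeros
out$spent[is.na(out$spent)] <- 0
out$realpurchase_cash[is.na(out$realpurchase_cash)] <- 0
```

On the example above this pads spent with five zero rows, so both metrics end up with 17 observations for the shared ad_set_id.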

efficiency of chain rule in data.table?
The data.table chain rule is quite fascinating to me. It just goes with my thoughts without interruption. However, I am wondering whether it is more efficient than the step-by-step way, so I made a toy sample and tried to measure it.
library(data.table)
library(magrittr)
library(microbenchmark)
set.seed(100)
DT <- data.table(A = sample(1e4, 1e5, replace = T),
                 B = sample(1e4, 1e5, replace = T),
                 C = letters[sample(1:26, 1e5, replace = T)])
DX <- DT[, .(X = A*B/log(A)), by = .(C)][, .N, by = .(C)][order(N)][cumsum(N)/sum(N) > 0.8][DT, on = "C"][, cut(A, quantile(A), labels = 1:4)] %>% table
DY <- {
  DY <- DT[, .(X = A*B/log(A)), by = .(C)]
  DY <- DY[, .N, by = .(C)][order(N)][cumsum(N)/sum(N) > 0.8]
  DY <- DY[DT, on = "C"][, cut(A, quantile(A), labels = 1:4)]
  DY <- DY %>% table
}
The result was odd: sometimes it says DX is better, sometimes not.
> microbenchmark(DX, DY, times = 1000)
Unit: nanoseconds
 expr min lq  mean median uq  max neval
   DX   0  0 0.503      0  0  353  1000
   DY   0  0 1.908      0  0 1764  1000
> microbenchmark(DX, DY, times = 1000)
Unit: nanoseconds
 expr min lq  mean median uq  max neval
   DX   0  0 2.189      0  0 2116  1000
   DY   0  0 0.436      0  0  353  1000
How come this happens? And is chaining better in efficiency? P.S. Interestingly, according to my limited observation, the results always come out one faster and one slower.
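Part of the oddity is that microbenchmark(DX, DY) only times the lookup of the two already-computed objects, not the pipelines that built them, hence the near-zero nanosecond figures that flip at random. A sketch of a benchmark that times the expressions themselves:

```r
library(data.table)
library(microbenchmark)

microbenchmark(
  chained = DT[, .(X = A*B/log(A)), by = .(C)][, .N, by = .(C)][order(N)][
    cumsum(N)/sum(N) > 0.8][DT, on = "C"][, cut(A, quantile(A), labels = 1:4)],
  stepwise = {
    tmp <- DT[, .(X = A*B/log(A)), by = .(C)]
    tmp <- tmp[, .N, by = .(C)][order(N)][cumsum(N)/sum(N) > 0.8]
    tmp[DT, on = "C"][, cut(A, quantile(A), labels = 1:4)]
  },
  times = 100
)
```

With the full computation inside each expression, the two variants should come out close to identical: chaining saves the intermediate assignments, but the underlying work is the same.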

Memory requirement using `fread()` for large column vector
I have a human-readable file containing 1 billion doubles, all written on a single line (1 billion columns).
The file itself is only around 8G and I am using
fread("filename.data", sep=" ", header=FALSE, data.table=TRUE, showProgress=TRUE)
to load them into an R session. The script will always be "Killed", and the most information I get from showProgress is:
*** caught segfault *** address 0x7efc7bed2010, cause 'memory not mapped'
I've loaded much larger files (raw size) using the same approach in the past, but probably in "matrix form" and with fewer columns. I'm guessing that data.table needs to store 1 billion column names, which is costing a lot of memory... Is this correct?
- Is there no way to fread straight into a row vector (as opposed to transposing after reading)?
- Would this data be salvageable, or do I need to rewrite it as a row vector?

t.test: create lapply function for multiple grouping levels
I'm trying to create an lapply function to run multiple t.tests for multiple levels of grouping. I came across this question: Kruskal-Wallis test: create lapply function to subset data.frame? but they were only trying to group by one variable (phase). I would like to add another grouping level, color, where my iv is distance and dv is val, grouped by color then phase.
# create data
val <- runif(60, min = 0, max = 100)
distance <- floor(runif(60, min = 1, max = 3))
phase <- rep(c("a", "b", "c"), 20)
color <- rep(c("red", "blue", "green", "yellow", "purple"), 12)
df <- data.frame(val, distance, phase, color)
Their answer for grouping by phase was:
lapply(split(df, df$phase), function(d) { kruskal.test(val ~ distance, data=d) })
However, it doesn't account for another level (color) for grouping. I might be approaching this wrong, so I appreciate any help.
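One possible extension of the same split/lapply pattern is to split on both grouping variables at once. A sketch (note that with only 60 rows some color/phase cells are small, and t.test will error for any cell that doesn't contain both distance groups):

```r
# Each list element is one color/phase combination of df
lapply(split(df, list(df$color, df$phase)), function(d) {
  t.test(val ~ distance, data = d)
})
```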
Add placeholder rows to a list of dataframes
I have a list of dataframes.
# Split dataframe into list of dataframes on two factor variables.
DF_list <- split(DF, list(DF$Unit_number, DF$Compartment), drop = TRUE)
I know how to remove rows from lists of dataframes, but this time I want to add rows: one placeholder row at the bottom of every dataframe in the list.
This will prevent my rate of change calculations from creating false calculations for different factor levels that buttress each other in the normal dataframe structure.
Before splitting on compartment and unit number the dataframe looks like this;
DF <- data.frame(Unit_number=c(1,1,2,2,2,1,2,2,1,1), Compartment=c("Engine", "Engine", "Engine", "Transmission", "Transmission", "Transmission", "Tyres", "Tyres", "Tyres", "Tyres"))
The result needed is this;
Result <- data.frame(Unit_number=c(1,1,"Placeholder",2,"Placeholder",2,2,"Placeholder",1,"Placeholder",2,2,"Placeholder",1,1), Compartment=c("Engine", "Engine","Placeholder", "Engine","Placeholder", "Transmission", "Transmission","Placeholder", "Transmission","Placeholder", "Tyres", "Tyres","Placeholder", "Tyres", "Tyres"))
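A sketch of adding the placeholder rows with lapply over the split list. Note that rbind-ing the string "Placeholder" onto a numeric Unit_number column coerces that column to character, which matches the desired Result above.

```r
placeholder <- data.frame(Unit_number = "Placeholder", Compartment = "Placeholder")

# Append one placeholder row to the bottom of every dataframe in the list
DF_list_padded <- lapply(DF_list, function(d) rbind(d, placeholder))

# Optionally recombine into a single dataframe
Result <- do.call(rbind, DF_list_padded)
```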

How to get counts for every level of a factor in a data frame in R and get observations with low counts
I'm super new to R and programming in general, so my description of stuff may be a bit off. I'll try to be as clear as possible. I have a dataframe df.train that has hundreds of factor levels with obfuscated entries that may or may not be unique to their respective factor. I'm trying to get the row id of every observation that has at least one factor level occurring fewer times than some given amount. So if there is a nice way to get this, I would like that answer. If not, this is my current attempt at a solution and where I got stuck, with a sample dataframe:
structure(list(GKUhYLAE = structure(c(1L, 2L, 1L, 1L, 2L, 1L), .Label = c("DDOFi", "fVvMw"), class = "factor"), OnTaJkLa = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("LDyDX", "sbxXu"), class = "factor"), SsZAZLma = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = c("fSMSz", "Hltat"), class = "factor"), BMmgMRvd = structure(c(2L, 1L, 2L, 2L, 2L, 2L), .Label = c("IjEdt", "QZujc"), class = "factor"), OMtioXZZ = c(3L, 21L, 30L, 21L, 12L, 12L), bIBQTaHw = structure(c(1L, 3L, 1L, 3L, 3L, 2L), .Label = c("ALZyK", "qqkkL", "wQABW"), class = "factor")), row.names = 1013:1018, class = "data.frame")
which gives me the following table
     GKUhYLAE OnTaJkLa SsZAZLma BMmgMRvd OMtioXZZ bIBQTaHw
1013    DDOFi    LDyDX    fSMSz    QZujc        3    ALZyK
1014    fVvMw    LDyDX    fSMSz    IjEdt       21    wQABW
1015    DDOFi    LDyDX    fSMSz    QZujc       30    ALZyK
1016    DDOFi    LDyDX    fSMSz    QZujc       21    wQABW
1017    fVvMw    LDyDX    fSMSz    QZujc       12    wQABW
1018    DDOFi    LDyDX    fSMSz    QZujc       12    qqkkL
I can run the following to figure out which factors have counts lower than a certain amount:
library(dplyr)
library(plyr)
df.test.a = lapply(df.test[, !names(df.test) %in% c("id")], count)
df.test.freqcount <- as.data.frame(do.call(rbind, df.test.a))
df.test.list = df.test.freqcount[which(df.test.freqcount$freq < 2), ]
This returns:
               x freq
BMmgMRvd.1 IjEdt    1
OMtioXZZ.1    NA    1
OMtioXZZ.4    NA    1
bIBQTaHw.2 qqkkL    1
(The leftmost column is the column name with a .x suffix, which I assume indicates the factor level.) Here is where I'm stuck. I figured the best way to get what I want is to make a vector whose entries are TRUE (or 1) when that row of my dataframe has at least one column with a factor level that is in my list of flagged levels. I can't figure out how to do this, or how to construct a suitable list of flagged factor levels. What I want to write is this:
df.test.freqcount[which( df.test.freqcount$freq <2),]$x
Since the names are not unique and I'm getting NA values I don't expect, instead of checking by $x I want to check by the column.factorlevel that appears as a hidden column on the left of the table I gave, if possible.
what I would like as output is:
     low_count_factor
1013             TRUE
1014             TRUE
1015             TRUE
1016            FALSE
1017            FALSE
1018             TRUE
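A sketch of a more direct route to those row flags, skipping the intermediate count table (assuming df.test contains only the columns to check): compute each cell's level frequency within its own column, then flag rows where any cell's level is rarer than the threshold.

```r
threshold <- 2

# For every factor column, TRUE where the cell's level occurs
# fewer than `threshold` times in that column
rare_cell <- sapply(df.test[sapply(df.test, is.factor)], function(col) {
  counts <- table(col)
  counts[as.character(col)] < threshold
})

# TRUE for each row that contains at least one rare level
low_count_factor <- apply(rare_cell, 1, any)
```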

Get all ways to express a number as a product of two integers
I've been playing with finding divisors of numbers recently and decided to try and make a program, which prints all ways to express a number N as a product of two integers. I wrote one which works on positive numbers and only considers positive numbers to make up the product.
#include <iostream>
#include <cmath>

int main() {
    int n;
    std::cin >> n;
    int root = std::sqrt(n);
    for(int i = 1; i <= root; i++) {
        if(n % i != 0) continue;
        int j = n / i;
        std::cout << i << ", " << j << std::endl;
    }
    return 0;
}
The code above just finds all divisors of N and prints them as pairs. It works fine, but I wanted to try and make it find all possible ways to get to N, not only with positive numbers.
For example, if I input 10 in the program above, the results will be (1, 10), (2, 5); These are correct, but there are other ways to multiply two numbers and get to 10. It involves negative numbers: (-1, -10), (-2, -5) are also solutions, since when you multiply two negative numbers, you end up with a positive one.
If I wanted the program to only work on positive N values but also find negative multiples, I could just print the negative versions of i and j, since you can only get to a positive number by either multiplying two positive or two negative together.
That works, but now I want to get this code to work on negative N values. For example, an expected output for N = -10 would be: (1, -10), (-1, 10), (2, -5), (-2, 5);
The problem is, the algorithm above can only find positive divisors for positive numbers, since it involves square root, which is only defined for positive numbers, and the loop starts at a positive and ends at a positive.
I noticed that I can just calculate the square root of the absolute value of N, then make the loop start at -root and end at root to go over the negative divisors of N as well. I had to make sure to skip 0, though, because division by 0 isn't defined, and that made it crash. The code I ended up with looks like this:
#include <iostream>
#include <cmath>

int main() {
    int n;
    std::cin >> n;
    int root_n = std::sqrt(std::abs(n));
    for(int i = -root_n; i <= root_n; i++) {
        if(i == 0 || n % i != 0) continue;
        int j = n / i;
        std::cout << i << ", " << j << std::endl;
    }
    return 0;
}
It worked properly for all the tests I came up with, but I am not sure if it's the best way to write it. Is there anything that I can improve?
Thanks in advance!
EDIT: Tried using std::div as suggested by Caleth (and used the ReSharper add-on in VS to give me refactoring suggestions):
#include <iostream>
#include <cmath>
#include <cstdlib>

int main() {
    int n;
    std::cin >> n;
    const int sqrt_n = std::sqrt(std::abs(n));
    for(auto i = -sqrt_n; i <= sqrt_n; i++) {
        if (i == 0) continue;
        const auto div_res = std::div(n, i);
        if (div_res.rem) continue;
        std::cout << i << ", " << div_res.quot << std::endl;
    }
    return 0;
}
Instead of calculating the remainder, then calculating the quotient, I can just do a single call to std::div, which returns a struct, containing both values.

Convert numeric factors in column to String of factors
I have a dataframe: df
Person Mood Age
1      1    16//
2      2    32//
3      3    25//
4      4    22//
5      5    28//
6      1    37//
7      2    40//
8      3    26//
9      4    19//
10     5    37//
And I have a vector:
Emotions <- c("Happy", "Sad", "Angry", "Upset", "Neutral")
I want to convert the values in the Mood column so that they map to the Emotions vector:
Person Mood    Age
1      happy   16//
2      sad     32//
3      angry   25//
4      upset   22//
5      neutral 28//
6      happy   37//
7      sad     40//
8      angry   26//
9      upset   19//
10     neutral 37//
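Since Mood holds the codes 1 to 5, they can index the Emotions vector directly. A minimal sketch (capitalisation aside; as.character first guards against Mood being stored as a factor):

```r
Emotions <- c("Happy", "Sad", "Angry", "Upset", "Neutral")

# Use each numeric code in Mood as a position in Emotions
df$Mood <- Emotions[as.numeric(as.character(df$Mood))]
```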

Convert Date into Factors for sequential analysis
I want to convert Date into factor for sequential analysis.
I tried the following code:
start_month <- '2019-01-01'
elapsed_month <- function(end_date, start_date) {
  ed <- as.POSIXlt(end_date)
  sd <- as.POSIXlt(start_date)
  12 * (ed$year - sd$year) + (ed$mon - sd$mon)
}
trans_sequence$eventID <- elapsed_month(trans_sequence$repdate, start_month)
But I got the following output:
Instead, I want the following output:
Thank you for your help!
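Since the desired output is not shown above, this is only a guess at the intent: if the goal is to treat each elapsed-month value as a level for sequence analysis, the numeric result can be wrapped in factor(). A sketch, using trans_sequence and repdate as named in the question:

```r
start_month <- '2019-01-01'

elapsed_month <- function(end_date, start_date) {
  ed <- as.POSIXlt(end_date)
  sd <- as.POSIXlt(start_date)
  12 * (ed$year - sd$year) + (ed$mon - sd$mon)
}

# Turn the numeric month offsets into an ordered factor
trans_sequence$eventID <- factor(
  elapsed_month(trans_sequence$repdate, start_month),
  ordered = TRUE
)
```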