Subsetting for proportional representation in R
I can't wrap my tiny brain around this one. One dataframe contains observations, each with a gender and an age bracket. I'm trying to write a function that returns a subset of the rows of this dataframe in which each age-gender combination appears in a proportion roughly equal to the value in the "props" dataframe. Ideally, the function will trim as few observations as possible. The results can be approximate: by "roughly equal" I mean that each group's share of the output should be within 5% of the desired proportion, with the deviation generally as small as possible.
ages <- c("18-29", "30-39", "40-49", "50-59", "60+")
genders <- c("M", "F")
set.seed(101)
df <- data.frame("id" = paste0("p", c(1:500)),
                 "gender" = sample(genders, replace = TRUE, size = 500),
                 "age" = sample(ages, replace = TRUE, size = 500))
props <- data.frame("age" = c(ages, ages),
                    "gender" = genders,
                    "pcts" = c(.0835, .1145, .1145, .1145, .073, .0835, .1145,
                               .1145, .1145, .073))
select_max <- function(df, props) {
  ....
  return(subset)
}
I experimented with solutions using least common multiples and greatest common divisors, but these fell apart when the proportions didn't work nicely together. I'm considering a solution which adds and subtracts rows one at a time until it gets close enough to the desired proportions, but I feel there must be some more elegant solution. All help is appreciated. This is a fun one, for sure.
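One non-iterative way to think about it: each group can supply at most its own row count, so the largest balanced total is the smallest value of (rows available in the group) / (desired proportion) across all groups. The sketch below, in base R, computes that total, turns it into a per-group quota, and keeps the first rows of each group up to its quota. It assumes every proportion in props is positive and that props has exactly one row per age-gender combination; the helper name key is made up for illustration, and because quotas are rounded down the achieved shares are only approximately equal to pcts.

```r
# A sketch of select_max(); assumes props$pcts > 0 and one props row per group.
select_max <- function(df, props) {
  # Hypothetical helper: one label per age-gender combination
  key <- function(d) paste(d$age, d$gender, sep = "|")

  # Rows available in each group, in the same order as the rows of props
  avail <- as.numeric(table(factor(key(df), levels = key(props))))

  # Largest total N such that N * pct does not exceed any group's supply
  n_total <- floor(min(avail / props$pcts))

  # Rows to keep per group (rounded down, so shares are approximate)
  quota <- floor(n_total * props$pcts)
  names(quota) <- key(props)

  # Position of each row within its own group: 1, 2, 3, ...
  rank_in_group <- ave(seq_len(nrow(df)), key(df), FUN = seq_along)

  # Keep rows whose within-group position fits the quota;
  # rows whose group is missing from props are dropped
  q <- quota[key(df)]
  keep <- !is.na(q) & rank_in_group <= q
  df[keep, , drop = FALSE]
}

# Usage with data shaped like the question's
ages <- c("18-29", "30-39", "40-49", "50-59", "60+")
genders <- c("M", "F")
set.seed(101)
df <- data.frame(id = paste0("p", 1:500),
                 gender = sample(genders, replace = TRUE, size = 500),
                 age = sample(ages, replace = TRUE, size = 500))
props <- data.frame(age = c(ages, ages),
                    gender = genders,
                    pcts = c(.0835, .1145, .1145, .1145, .073,
                             .0835, .1145, .1145, .1145, .073))
out <- select_max(df, props)

# Achieved share per group, for comparison with props$pcts
round(prop.table(table(out$age, out$gender)), 3)
```

This version keeps the first rows of each group in their existing order. If a random selection is preferred, shuffling the rows first, for example with df[sample(nrow(df)), ], should give that without changing the function.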
See also questions close to this topic

Examine which row/function in R code takes the most time within function?
I have a set of functions that run within a wrapper:
wrapper_func <- function(x, y, z, .....) {
  t <- foo1(x, y)
  kuku <- foo2(t, z)
  ....
  final_res <- foo20(t, kuku, ...)
  return(final_res)
}
It runs slowly and I want to find out which function is the bottleneck. Please advise which tool can perform a deeper analysis (benchmark? microbenchmark? ...) and show a drill-down of which row/function takes the most time/resources.

Compile error with Boost include when using Rcpp in R
I got the error below although I already have Rtools and the BH package installed. (I'm using Win10 64-bit and R 3.5.1.)
Rtools works well when I compile other packages.
I have already read many related Q&As, but couldn't solve this.
Please help this newbie.
> sourceCpp('D:/Data/Drive/RCodes/scRNAseq/TransSyn/TransSyn.cpp')
c:/Rtools/mingw_64/bin/g++ -std=gnu++11 -I"C:/PROGRA~1/R/R-35~1.1/include" -DNDEBUG -I"C:/Users/CEO/Documents/R/win-library/3.5/Rcpp/include" -I"D:/Data/Drive/RCodes/scRNAseq/TransSyn" -O2 -Wall -mtune=generic -c TransSyn.cpp -o TransSyn.o
TransSyn.cpp:5:37: fatal error: boost/functional/hash.hpp: No such file or directory
 #include <boost/functional/hash.hpp>
                                     ^
compilation terminated.
make: *** [C:/PROGRA~1/R/R-35~1.1/etc/x64/Makeconf:215: TransSyn.o] Error 1
Error in sourceCpp("D:/Data/Drive/RCodes/scRNAseq/TransSyn/TransSyn.cpp") :
  Error 1 occurred building shared library.
WARNING: Rtools is required to build R packages but is not currently installed. Please download and install the appropriate version of Rtools before proceeding:
https://cran.rstudio.com/bin/windows/Rtools/

How to normalize the special ? character using dplyr pipe
I have the following data frame:
library(tidyverse)
ndf <- structure(list(experiment_status = c("Negative？", "Negative？",
    "Negative", "Negative？", "Negative？", "Negative？"), id = 1:6),
    class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))
ndf
#> # A tibble: 6 x 2
#>   experiment_status    id
#>   <chr>             <int>
#> 1 Negative？             1
#> 2 Negative？             2
#> 3 Negative               3
#> 4 Negative？             4
#> 5 Negative？             5
#> 6 Negative？             6
Notice that the experiment_status column contains an unusual full-width question mark, ？, instead of the ordinary ?.
How can I normalize the column using a pipe?
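A minimal base R sketch of one way to do the replacement, assuming "normalize" here means swapping the full-width question mark (U+FF1F) for the ASCII one. The data frame below is a small stand-in for ndf, built with data.frame() so that the example needs no packages. Inside a dplyr pipeline the same gsub() call could sit inside mutate().

```r
# Stand-in for ndf; "\uFF1F" is the full-width question mark
ndf <- data.frame(
  experiment_status = c("Negative\uFF1F", "Negative", "Negative\uFF1F"),
  id = 1:3,
  stringsAsFactors = FALSE
)

# fixed = TRUE treats the pattern as a literal string, not a regex
ndf$experiment_status <- gsub("\uFF1F", "?", ndf$experiment_status,
                              fixed = TRUE)

ndf
```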

I need to get data from sheet 1 and input it to sheet 2
I'm working with a reusable invoice I made in Google Sheets. My problem is that it's reusable, meaning I print and then clear the same sheet every time. So I need help figuring out how to copy and paste specific cells into sheet 2, column 1, then have the function in sheet 2, column 1 stay the same and have the data drop down to column 2, so that when I clear the data in the invoice the data in the report doesn't clear with it.

How do I print what a function returns?
My code looks like this:
from random import randint

Item = randint(1, 10)

def ItemName(Item):
    if Item == 1:
        return Hat
    elif Item == 2:
        return Gloves
    elif Item == 3:
        return Scarf
    elif Item == 4:
        return Top
    elif Item == 5:
        return Pants
    elif Item == 6:
        return Shoes
    elif Item == 7:
        return Socks
    elif Item == 8:
        return Sunglasses
    elif Item == 9:
        return Bag
    else:
        return Jacket

print (ItemName)
I expect the program to print one of the possible values it could return. Like, 'hat', or 'socks', or 'bag'. Instead, I get "" or some other variant of complete gibberish.

List output from function on ipywidgets
I constructed a function linked to MySQL which returns a list of children for a given parent id. I would like to output this list of children using ipywidgets.
I am having trouble linking the function to ipywidgets. So far I have:
from ipywidgets import widgets

text1 = widgets.Text()
text2 = widgets.Text()
button = widgets.Button(description = 'Run')
display(text1)
display(button)
display(text2)

idnum = text1.value
text2.value= list_children(idnum)

button.on_click(list_children)
The function is the following:
def list_children(parentid):
    value = parentid
    parent_80 = session.query(Parent).get(value)
    parent_80_children= parent_80.children
    childrenlist=[]

    for i in parent_80_children:
        childrenlist.append(i.UWI)

    return childrenlist
I keep getting the following error:
AttributeError: 'NoneType' object has no attribute 'children'
as it breaks in this line:
parent_80_children= parent_80.children
The function is correct if I run the python cell so I know it's working, but it breaks when I try to click the widget box "Run". Somehow there is no link between the function and the widget box.
I would like to have the output upon clicking the "Run" widget button as the following:
 1771860100
 1771860200
 1771860300
minus the bullet points.
Any input is appreciated.

Fire an event when near textbox boundary
I have a text box with a fixed width, and I have to fire an event when the typed text gets near the boundary of the textbox. For example, in the textbox below the event should fire each time the user reaches a (.) while typing.
[Diagram: a fixed-width text box (fixedTextBoxWidth, 227 px) next to an expandable one-line div (expandableDivWidth, unlimited width), with (.) marking the points at which the event should fire.]
Now, in order to do that:
 I have created a div with nowrap and auto height.
 And then copied the text typed by the user in it.
 After doing that I am just returning that div's offsetWidth.
 And then finally comparing that returned width with the fixed textbox width. To check if it crosses the boundary to fire an event.
But I am not able to get the maths correct. I tried 2 solutions:
 Detecting the first event is easy, i.e. fire an event when (227 - 220) < 30. For subsequent calls, I store the width at which the event was last fired and subtract it:
    int widthToCompare = (expandableDivWidth + 25) - _prevExpandableDivWidth;
    if ( widthToCompare > fixedTextBoxWidth ) {
        // fire event;
    }
 In the second method, I find the next multiple of 227, subtract the width returned from the expandable div to get the difference, and then check whether the difference is < 30:
    if ( (expandableDivWidth < fixedTextBoxWidth) && (fixedTextBoxWidth - expandableDivWidth < 30) ) {
        // fire event;
    }
    if ( expandableDivWidth > fixedTextBoxWidth) {
        int quotient = expandableDivWidth/fixedTextBoxWidth;
        int multiply = fixedTextBoxWidth * (quotient + 1);
        int difference = multiply - expandableDivWidth;
        if ( difference < 30 ) {
            // fire event;
        }
    }
 Make series start at same value

Can't find the bug in the following JavaScript code
<!DOCTYPE>
<html>
<head>
<script>
var sin0=0; var sin30=1/2; var sin45=Math.sqrt(1)/2; var sin60=Math.sqrt(3)/2; var sin90=1;
var cos0=1; var cos30=Math.sqrt(3)/2; var cos45=Math.sqrt(1)/2; var cos60=1/2; var cos90=0;
var tan0=0
var tan30=1/Math.sqrt(3); var tan45=1; var tan60=Math.sqrt(3);
var sec0=1; var sec30=2/Math.sqrt(3); var sec45=Math.sqrt(2); var sec60=2;
var cosec30=2; var cosec45=Math.sqrt(2); var cosec60=2/Math.sqrt(3); var cosec90=1;
var cot30=Math.sqrt(3); var cot45=1; var cot60=1/Math.sqrt(3); var cot90=0;
var p=document.getElementById("p").value;
var b=document.getElementById("b").value;
var h=document.getElementById("h").value;
var l=document.getElementById("l").value;
function func(){
  if(p==""&&b==""){
    if(l==0){ var pe=sin0*h; }
    if(l==30){ var pe=sin30*h; }
    if(l==45){ var pe=sin45*h; }
    if(l==60){ var pe=sin60*h; }
    if(l==90){ var pe=sin490*h; }
  }
  if(h==""&&b==""){
    if(l==0){ alert("Not Defined"); }
    if(l==30){ var hy=cosec30*p; }
    if(l==45){ var hy=cosec45*p; }
    if(l==60){ var hy=cosec60*p; }
    if(l==90){ var hy=cosec90*p; }
  }
  if(h==""&&p==""){
    if(l==0){ var hy=sec0*b; }
    if(l==30){ var hy=sec30*b; }
    if(l==45){ var hy=sec45*b; }
    if(l==60){ var hy=sec60*b; }
    if(l==90){ alert("Not Defined"); }
  }
  alert(pe);
}
</script>
</head>
<body>
<p> You need to have at least 1 Angle of the 'Triangle formed' and at least 1 side 's length (in metres). <br> To apply Trigonometry.<br><br> Which side do you have? <br> </p>
<label>Perpendicular</label> <input id="p" type="number"/>
<label>Base</label> <input id="b" type="number"/>
<label>Hypotenuse</label> <input id="h" type="number"/><br><br>
<label>Measure of Angle you have</label> <input type="number" id="l">
<input type="button" value="Solve" onclick="func()"/>
</body>
</html>
I've created this code for "Applications of Trigonometry". I think I have every variable well defined, but every time I run this, it shows 'undefined'.
You can enter only 0, 30, 45, 60 and 90 in the "Angle" input field, and fill in only one of the 'Perpendicular', 'Base', and 'Hypotenuse' fields.

String subset using a pattern and a range of string lengths in R
I have a data set that contains a column with strings made up of 4 letters (A,T,C,G); these strings range from 21991 characters long. I would like to subset all rows where the strings match a particular pattern. For example, I would like to create a new dataframe that subsets all rows where there are 010 consecutive Ts in column 17.
Please let me know if you require additional information and thank you for your time!

Can R automatically create vector based on time period?
I am trying to create vectors of indexes based on time (each quarter). I am wondering if there is any way to let R automatically generate new vector name and slice the data based on quarter. Ultimately, I want R to generate a new vector by the end of each quarter.
Below is the code that I have so far:
### Create indexes
#### First Sales before or start at 2013 Q1
PN_2013Q1 <- unique(FIRST_SALE_DATE$PNM_AUTO_KEY[FIRST_SALE_DATE$QUARTER < as.Date("2013-04-01")])
##### First Sales start at 2013 Q2
PN_2013Q2 <- unique(FIRST_SALE_DATE$PNM_AUTO_KEY[FIRST_SALE_DATE$QUARTER >= as.Date("2013-04-01") &
                                                 FIRST_SALE_DATE$QUARTER < as.Date("2013-07-01")])
##### First Sales start at 2013 Q3
PN_2013Q3 <- unique(FIRST_SALE_DATE$PNM_AUTO_KEY[FIRST_SALE_DATE$QUARTER >= as.Date("2013-07-01") &
                                                 FIRST_SALE_DATE$QUARTER < as.Date("2013-10-01")])
##### First Sales start at 2013 Q4
PN_2013Q4 <- unique(FIRST_SALE_DATE$PNM_AUTO_KEY[FIRST_SALE_DATE$QUARTER >= as.Date("2013-10-01") &
                                                 FIRST_SALE_DATE$QUARTER < as.Date("2014-01-01")])
##### First Sales start at 2014 Q1
PN_2014Q1 <- unique(FIRST_SALE_DATE$PNM_AUTO_KEY[FIRST_SALE_DATE$QUARTER >= as.Date("2014-01-01") &
                                                 FIRST_SALE_DATE$QUARTER < as.Date("2014-04-01")])
##### First Sales start at 2014 Q2
PN_2014Q2 <- unique(FIRST_SALE_DATE$PNM_AUTO_KEY[FIRST_SALE_DATE$QUARTER >= as.Date("2014-04-01") &
                                                 FIRST_SALE_DATE$QUARTER < as.Date("2014-07-01")])
Thanks!
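A possible way to avoid one hand-written vector per quarter is to keep the results in a single named list, built with split(). The sketch below is base R only and uses a small made-up data frame (first_sale) in place of FIRST_SALE_DATE, with the same two column names. Note one difference from the code above: it buckets strictly by calendar quarter, so sales before 2013 would get their own entries rather than being folded into the 2013 Q1 vector.

```r
# Made-up stand-in for FIRST_SALE_DATE
first_sale <- data.frame(
  PNM_AUTO_KEY = c(101, 102, 102, 103, 104),
  QUARTER = as.Date(c("2013-02-15", "2013-05-01", "2013-06-30",
                      "2013-07-01", "2014-01-10"))
)

# Label such as "PN_2013Q2": year from format(), quarter from quarters()
quarter_label <- paste0("PN_",
                        format(first_sale$QUARTER, "%Y"),
                        quarters(first_sale$QUARTER))

# One list element per quarter, each holding the unique keys for that quarter
pn_by_quarter <- lapply(split(first_sale$PNM_AUTO_KEY, quarter_label), unique)

names(pn_by_quarter)
pn_by_quarter[["PN_2013Q2"]]
```

New quarters then appear as new list elements whenever the data contains them, and an element can be looked up by name instead of by a separately named variable.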

R programming: Subset multiple times using conditions .. how to write a function
I've a dataset that looks like this.
Var1 Var2 Var3
Y    N    N
N    N    Y
N    Y    Y
Y    Y    N
I know how to subset this dataset to a smaller dataset using Var1=Y or Var3=N etc.
Can I write a function so that I don't have to subset the data manually multiple times? I want to call function(x) so that function(x=1) will give me a dataset using Var1=Y, for example. Different values of x should give me different datasets.
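One way this could look, as a sketch: store the conditions in a list and let x pick one of them. The names dat, conditions and subset_by are made up for illustration, and the mapping of x = 1 to Var1 == "Y" and x = 2 to Var3 == "N" is only an assumption based on the examples in the question.

```r
# The example dataset from the question
dat <- data.frame(
  Var1 = c("Y", "N", "N", "Y"),
  Var2 = c("N", "N", "Y", "Y"),
  Var3 = c("N", "Y", "Y", "N"),
  stringsAsFactors = FALSE
)

# Each entry takes a data frame and returns a logical vector of rows to keep
conditions <- list(
  function(d) d$Var1 == "Y",  # x = 1
  function(d) d$Var3 == "N"   # x = 2
)

# Return the rows of d selected by the x-th condition
subset_by <- function(d, x) {
  d[conditions[[x]](d), , drop = FALSE]
}

subset_by(dat, x = 1)
subset_by(dat, x = 2)
```

Adding another subset is then a matter of appending one more function to conditions rather than writing a new block of subsetting code.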