Create a column that identifies if all conditions are met
I have a data frame with numeric values. For each row, I want to check whether certain criteria are met, and create a new column that is TRUE if all criteria are met.
Example criteria: Current.eGFR is greater than or equal to 15, or less than 60, and Decline.12month is less than or equal to 4.
This is the head() of the data frame:

        ID Current.eGFR Decline.12month Decline.24.month
    1   13         18.0             1.3              8.9
    2   19         17.6             1.5              2.3
    3 1063         20.1             5.3             10.4
    4  700         28.0             0.2              2.7
    5 1518         14.6            14.7             45.2
    6  197         19.0            13.0              5.1
3 answers

One option is to use > or < along with &:

    df1$newcol <- with(df1, (Current.eGFR >= 15 & Current.eGFR < 60) & Decline.12month <= 4)
    df1$newcol
    #[1]  TRUE  TRUE FALSE  TRUE FALSE FALSE
data

    df1 <- structure(list(
      ID = c(13L, 19L, 1063L, 700L, 1518L, 197L),
      Current.eGFR = c(18, 17.6, 20.1, 28, 14.6, 19),
      Decline.12month = c(1.3, 1.5, 5.3, 0.2, 14.7, 13),
      Decline.24.month = c(8.9, 2.3, 10.4, 2.7, 45.2, 5.1)
    ), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))
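If the list of criteria grows, it may help to keep each one as its own logical vector and combine them at the end, so you can also see which criterion a row fails. A minimal sketch, assuming the same values as df1 above; the names checks and all_met and the labels inside the list are made up for illustration:

```r
# Same values as the df1 shown above (row names left at their defaults).
df1 <- data.frame(
  ID = c(13L, 19L, 1063L, 700L, 1518L, 197L),
  Current.eGFR = c(18, 17.6, 20.1, 28, 14.6, 19),
  Decline.12month = c(1.3, 1.5, 5.3, 0.2, 14.7, 13),
  Decline.24.month = c(8.9, 2.3, 10.4, 2.7, 45.2, 5.1)
)

# One logical vector per criterion; the list names are hypothetical labels.
checks <- with(df1, list(
  egfr_at_least_15  = Current.eGFR >= 15,
  egfr_below_60     = Current.eGFR < 60,
  decline_at_most_4 = Decline.12month <= 4
))

# Elementwise AND across all criteria: TRUE only where every check is TRUE.
df1$all_met <- Reduce(`&`, checks)

# Show the individual checks next to the combined result.
cbind(df1["ID"], as.data.frame(checks), all_met = df1$all_met)
```

Adding another criterion is then just one more entry in the list; the Reduce() call stays the same.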

First note that we need Current.eGFR >= 15 and Current.eGFR < 60 since all numbers would satisfy the condition if it were really or. Compare:

    1:70 >= 15 | 1:70 < 60  # bad: result is *always* TRUE
    ##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
    ## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
    ## [31] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
    ## [46] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
    ## [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

    1:70 >= 15 & 1:70 < 60  # good
    ##  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    ## [13] FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    ## [25]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    ## [37]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    ## [49]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
    ## [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Making that correction, use transform to create the new column:

    transform(mydf, ok = Current.eGFR >= 15 & Current.eGFR < 60 & Decline.12month <= 4)

giving:

        ID Current.eGFR Decline.12month Decline.24.month    ok
    1   13         18.0             1.3              8.9  TRUE
    2   19         17.6             1.5              2.3  TRUE
    3 1063         20.1             5.3             10.4 FALSE
    4  700         28.0             0.2              2.7  TRUE
    5 1518         14.6            14.7             45.2 FALSE
    6  197         19.0            13.0              5.1 FALSE

Note

The input mydf in reproducible form is assumed to be as follows:

    Lines <- "
      ID Current.eGFR Decline.12month Decline.24.month
    1 13 18.0 1.3 8.9
    2 19 17.6 1.5 2.3
    3 1063 20.1 5.3 10.4
    4 700 28.0 0.2 2.7
    5 1518 14.6 14.7 45.2
    6 197 19.0 13.0 5.1"
    mydf <- read.table(text = Lines)

Tidy way, just for completeness:

    library(dplyr)
    #>
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:stats':
    #>
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #>
    #>     intersect, setdiff, setequal, union

    df1 <- structure(list(
      ID = c(13L, 19L, 1063L, 700L, 1518L, 197L),
      Current.eGFR = c(18, 17.6, 20.1, 28, 14.6, 19),
      Decline.12month = c(1.3, 1.5, 5.3, 0.2, 14.7, 13),
      Decline.24.month = c(8.9, 2.3, 10.4, 2.7, 45.2, 5.1)
    ), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))

    df1 %>%
      mutate(
        conditions_met = if_else(
          Current.eGFR >= 15 & Current.eGFR < 60 & Decline.12month <= 4,
          TRUE,
          FALSE
        )
      )
    #>     ID Current.eGFR Decline.12month Decline.24.month conditions_met
    #> 1   13         18.0             1.3              8.9           TRUE
    #> 2   19         17.6             1.5              2.3           TRUE
    #> 3 1063         20.1             5.3             10.4          FALSE
    #> 4  700         28.0             0.2              2.7           TRUE
    #> 5 1518         14.6            14.7             45.2          FALSE
    #> 6  197         19.0            13.0              5.1          FALSE

Created on 2019-12-08 by the reprex package (v0.3.0)