Group unique values in R
I have the following data:
I want to summarise the data in three columns: Subject type, plus two other columns stating the number of schools and the number of students for each subject type. I have used the group_by function, but it keeps giving the number of times each subject type appeared rather than the number of schools or students.
TIA
See also questions close to this topic
-
Can't group employees list without years
My issue is that when I want to group a set of employees by the month of their birth date, Odoo separates the months by year, which I want to avoid because I want to generate a list of people who have their birthday in a given month.
This is the code:
<filter name="birthday_groupby" string="Birthday month" domain="[]" context="{'group_by': 'birthday:week'}" />
And this is the result:
As you can see, it separates by month, but also by year. I think the solution may be easier than I think, but I can't figure it out.
-
Fetching values from one column based on other column keys in long-formatted dataset
I have a long format dataset of 100,000+ individuals, capturing clinic visits at 5 different time points (not chronological). I've included an example dataset below that replicates the formatting of my data:
ID: participant ID
visit_number: order of the clinic visits in the original dataset
age_visit: age at the time of the visit
clinic_number: the identifier for the specific clinic location
age_sorted: for each ID, age sorted in ascending order across the 5 clinic visits
age_sorted_index: for each ID, the visit number corresponding to the sorted age
I would like to create a new column (clinic_number_extracted) that fetches the clinic identifier (clinic_number) corresponding to each sorted age (age_sorted value) for each participant. I was thinking that it might be possible to use the age_sorted_index and visit_number variables to do so (generating key-value pairs?), but am not quite sure how to do this outside of data.table. A tidyverse solution would be preferred.

I've looked on R community and stack exchange for clues, but haven't been able to find exactly what I'm looking for (likely not using the correct search terms). I tried to play around with group_by(across()) and with_order(order_by()) functions without much success. I can potentially create a new variable with a few case_when() conditions but might run into issues if there are repeated age_assessment_sorted values.

set.seed(42)

# Beginning dataset
das <- data.frame(id = rep(letters[1:3], each = 5),
                  visit_number = rep(1:5, times = 3),
                  age_visit = c(50, rep(NA_real_, times = 7), 34, 40, 72, rep(NA_real_, times = 3), 87),
                  clinic_number = sample(30:50, 15, replace=TRUE),
                  age_sorted = c(50, rep(NA_real_, times = 4), 34, 40,rep(NA_real_, times = 3), 72, 87, rep(NA_real_, times = 3)),
                  age_sorted_index = c(rep(1:5), 4, 5, rep(1:3), 1, 5, 2, 3, 4))

# Print out dataset
das
#>    id visit_number age_visit clinic_number age_sorted age_sorted_index
#> 1   a            1        50            46         50                1
#> 2   a            2        NA            34         NA                2
#> 3   a            3        NA            30         NA                3
#> 4   a            4        NA            39         NA                4
#> 5   a            5        NA            33         NA                5
#> 6   b            1        NA            47         34                4
#> 7   b            2        NA            46         40                5
#> 8   b            3        NA            44         NA                1
#> 9   b            4        34            36         NA                2
#> 10  b            5        40            33         NA                3
#> 11  c            1        72            34         72                1
#> 12  c            2        NA            43         87                5
#> 13  c            3        NA            49         NA                2
#> 14  c            4        NA            47         NA                3
#> 15  c            5        87            44         NA                4
Desired data:
das_final <- cbind(das,
                   clinic_number_extracted = c(46, rep(NA_real_, times = 4), 36, 33, rep(NA_real_, times = 3), 34, 44, rep(NA_real_, times = 3)))

# Print out final dataset
das_final
#>    id visit_number age_visit clinic_number age_sorted age_sorted_index
#> 1   a            1        50            46         50                1
#> 2   a            2        NA            34         NA                2
#> 3   a            3        NA            30         NA                3
#> 4   a            4        NA            39         NA                4
#> 5   a            5        NA            33         NA                5
#> 6   b            1        NA            47         34                4
#> 7   b            2        NA            46         40                5
#> 8   b            3        NA            44         NA                1
#> 9   b            4        34            36         NA                2
#> 10  b            5        40            33         NA                3
#> 11  c            1        72            34         72                1
#> 12  c            2        NA            43         87                5
#> 13  c            3        NA            49         NA                2
#> 14  c            4        NA            47         NA                3
#> 15  c            5        87            44         NA                4
#>    clinic_number_extracted
#> 1                       46
#> 2                       NA
#> 3                       NA
#> 4                       NA
#> 5                       NA
#> 6                       36
#> 7                       33
#> 8                       NA
#> 9                       NA
#> 10                      NA
#> 11                      34
#> 12                      44
#> 13                      NA
#> 14                      NA
#> 15                      NA
Created on 2022-05-06 by the reprex package (v2.0.1)
-
How does java.util.stream.Stream.distinct() method work? Can I override the equals() method of the stream of the objects?
My use case is that I am trying to use the distinct method of Stream to remove Students with the same roll number from a list of objects of class StudentCourseMapping. POJO details are below:
public class StudentCourseMapping implements Serializable {
    private String name;
    private String dept;
    private Integer roll;
    private String course;
Below is the equals method
@Override
public boolean equals(Object obj) {
    StudentCourseMapping other = (StudentCourseMapping) obj;
    if (roll == null) {
        if (other.roll != null)
            return false;
    } else if (!roll.equals(other.roll))
        return false;
    return true;
}
Below is the implementation
public class RemoveDuplicateUsingStream {
    public static void main(String[] args) {
        List<StudentCourseMapping> studentCourceList = JacksonJSONReaderObjectMapper.jsonReader();
        studentCourceList.stream().distinct().forEach(System.out::println);

        StudentCourseMapping s0 = studentCourceList.get(0);
        StudentCourseMapping s1 = studentCourceList.get(1);
        System.out.println(s0.equals(s1));

        Set<Integer> st = new HashSet();
        List<StudentCourseMapping> studentCourceList2 = studentCourceList.stream()
                .filter(s -> st.add(s.getRoll()))
                .collect(Collectors.toCollection(ArrayList::new));
        System.out.println(studentCourceList2.size());
    }
}
And the output is
StudentCourseMapping [name=Alu, dept=Physics, roll=12, course=Quantum Theory]
StudentCourseMapping [name=Alu, dept=Physics, roll=12, course=English]
StudentCourseMapping [name=Sam, dept=Commerce, roll=16, course=English]
StudentCourseMapping [name=Sam, dept=Commerce, roll=16, course=Accounts]
StudentCourseMapping [name=Joe, dept=Arts, roll=19, course=English]
StudentCourseMapping [name=Joe, dept=Arts, roll=19, course=Hindi]
true
3
JacksonJSONReaderObjectMapper.jsonReader(); is a custom method which reads the JSON below. I am able to achieve the same result by using filter and adding to a HashSet, but I really want to know what is wrong with my distinct implementation.
{
  "studentCourseMapping": [
    { "name": "Alu", "dept": "Physics", "roll": 12, "course": "Quantum Theory" },
    { "name": "Alu", "dept": "Physics", "roll": 12, "course": "English" },
    { "name": "Sam", "dept": "Commerce", "roll": 16, "course": "English" },
    { "name": "Sam", "dept": "Commerce", "roll": 16, "course": "Accounts" },
    { "name": "Joe", "dept": "Arts", "roll": 19, "course": "English" },
    { "name": "Joe", "dept": "Arts", "roll": 19, "course": "Hindi" }
  ]
}
When I tested the equals method directly, it worked properly and returned true, since both s0 and s1 have roll 12.
StudentCourseMapping s0 = studentCourceList.get(0);
StudentCourseMapping s1 = studentCourceList.get(1);
System.out.println(s0.equals(s1));
But when I use distinct, all the objects are printed, and when I debug in Eclipse the equals method I wrote is not called, even though the documentation says it should be. By the way, this is from the Oracle Java 8 docs, but I am using JDK 11:
Stream distinct() Returns a stream consisting of the distinct elements (according to Object.equals(Object)) of this stream.
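One likely explanation for the behaviour described above, offered as a sketch and not a confirmed diagnosis: the class shown overrides equals() but no hashCode() override appears in the question. The contract in Object requires equal objects to return equal hash codes, and as far as I know the OpenJDK implementation of distinct() collects elements in a hash-based set, so two objects with different identity hash codes are unlikely to ever reach equals(). The Student class below is a hypothetical, trimmed-down stand-in for StudentCourseMapping (the dept field and the Jackson reader are omitted), and distinctByRoll and sample are illustrative names only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class DistinctByRollSketch {

    // Hypothetical stand-in for StudentCourseMapping (assumption: only these fields matter here).
    static final class Student {
        final String name;
        final Integer roll;
        final String course;

        Student(String name, Integer roll, String course) {
            this.name = name;
            this.roll = roll;
            this.course = course;
        }

        // Equality is based on the roll number only, as in the question.
        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof Student)) return false;
            return Objects.equals(roll, ((Student) obj).roll);
        }

        // Kept consistent with equals(): equal roll numbers give equal hash codes.
        @Override
        public int hashCode() {
            return Objects.hashCode(roll);
        }

        @Override
        public String toString() {
            return "Student [name=" + name + ", roll=" + roll + ", course=" + course + "]";
        }
    }

    // Same rows as the JSON in the question, minus dept.
    static List<Student> sample() {
        return Arrays.asList(
                new Student("Alu", 12, "Quantum Theory"),
                new Student("Alu", 12, "English"),
                new Student("Sam", 16, "English"),
                new Student("Sam", 16, "Accounts"),
                new Student("Joe", 19, "English"),
                new Student("Joe", 19, "Hindi"));
    }

    // distinct() relies on the equals()/hashCode() pair defined above.
    static List<Student> distinctByRoll(List<Student> students) {
        return students.stream().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        distinctByRoll(sample()).forEach(System.out::println);
    }
}
```

With both methods overridden, the list should shrink to one entry per roll number. For an ordered stream such as this one, the Javadoc states that the first element encountered for each duplicate is the one preserved.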
-
Django multiple distinct values queryset for SQLDB
I'm trying to get a list of results that have distinct x and y values, but I also want to return color along with them.
Since I'm using an SQLDB I'm not allowed to use the field names argument of distinct. Is there a good workaround for this? None of the examples I've been looking up have worked out.
The query that I'm not allowed to run: queryset = Tile.objects.values('x', 'y', 'color').distinct('x', 'y')
Current workaround: For now I'm just flagging old rows as retired and pulling everything that is not retired. This is probably the right way of doing things, but I didn't want to add additional columns if not needed.