The one-way ANOVA function I'm using keeps spitting out F values that don't make sense
I'm working on a project for college and it's kicking my ass.
I downloaded a data file from https://www.kaggle.com/datasets/majunbajun/himalayan-climbing-expeditions
I'm trying to use an ANOVA to see if there's a statistically significant difference in time taken to summit between the seasons.
The F value I'm getting back doesn't seem to make any sense. Any suggestions?
#import pandas
import pandas as pd
#import expeditions as csv file
exp = pd.read_csv('C:\\filepath\\expeditions.csv')
#extract only the data relating to everest
exp= exp[exp['peak_name'] == 'Everest']
#create a subset of the data only containing the relevant columns
exp_peaks = exp[['peak_name', 'member_deaths', 'termination_reason', 'hired_staff_deaths', 'year', 'season', 'basecamp_date', 'highpoint_date']]
#extract successful attempts
exp_peaks = exp_peaks[(exp_peaks['termination_reason'] == 'Success (main peak)')]
#drop missing values from basecamp_date & highpoint_date
exp_peaks = exp_peaks.dropna(subset=['basecamp_date', 'highpoint_date'])
#convert basecamp date to datetime
exp_peaks['basecamp_date'] = pd.to_datetime(exp_peaks['basecamp_date'])
#convert highpoint date to datetime
exp_peaks['highpoint_date'] = pd.to_datetime(exp_peaks['highpoint_date'])
from datetime import datetime
exp_peaks['time_taken'] = exp_peaks['highpoint_date'] - exp_peaks['basecamp_date']
#convert seasons from strings to ints
exp_peaks['season'] = exp_peaks['season'].replace('Spring', 1)
exp_peaks['season'] = exp_peaks['season'].replace('Autumn', 3)
exp_peaks['season'] = exp_peaks['season'].replace('Winter', 4)
#remove summer and unknown
exp_peaks = exp_peaks[(exp_peaks['season'] != 'Summer')]
exp_peaks = exp_peaks[(exp_peaks['season'] != 'Unknown')]
#subset the data according to the season
exp_peaks_spring = exp_peaks[exp_peaks['season'] == 1]
exp_peaks_autumn = exp_peaks[exp_peaks['season'] == 3]
exp_peaks_winter = exp_peaks[exp_peaks['season'] == 4]
#calculate the average time taken in spring
exp_peaks_spring_duration = exp_peaks_spring['time_taken']
mean_exp_peaks_spring_duration = exp_peaks_spring_duration.mean()
#calculate the average time taken in autumn
exp_peaks_autumn_duration = exp_peaks_autumn['time_taken']
mean_exp_peaks_autumn_duration = exp_peaks_autumn_duration.mean()
#calculate the average time taken in winter
exp_peaks_winter_duration = exp_peaks_winter['time_taken']
mean_exp_peaks_winter_duration = exp_peaks_winter_duration.mean()
# Turn the season column into a categorical
exp_peaks['season'] = exp_peaks['season'].astype('category')
exp_peaks['season'].dtypes
from scipy.stats import f_oneway
# One-way ANOVA
f_value, p_value = f_oneway(exp_peaks['season'], exp_peaks['time_taken'])
print("F-score: " + str(f_value))
print("p value: " + str(p_value))
1 answer

answered 2022-05-04 11:22 by Stuart
It seems that f_oneway requires the different samples of continuous data as its arguments, rather than taking a categorical variable argument. You can achieve this using groupby:

f_oneway(*(group for _, group in exp_peaks.groupby("season")["time_taken"]))
Or equivalently, since you have already created series for each season:
f_oneway(exp_peaks_spring_duration, exp_peaks_autumn_duration, exp_peaks_winter_duration)
I would have thought there would be an easier way to perform an ANOVA in this common case but can't find it.
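One possibility, sketched here as an untested suggestion, is statsmodels' formula interface. Note that this assumes time_taken is first converted to a numeric day count (the days column below is hypothetical, not part of the original code):

import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical numeric duration column: ANOVA needs numbers, not timedeltas.
exp_peaks['days'] = exp_peaks['time_taken'].dt.days

# Fit a linear model with season as a categorical predictor,
# then produce the one-way ANOVA table (F statistic and p-value).
model = ols('days ~ C(season)', data=exp_peaks).fit()
anova_table = sm.stats.anova_lm(model, typ=2)
print(anova_table)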
See also questions close to this topic
-
Python File Tagging System does not retrieve nested dictionaries in dictionary
I am building a file tagging system using Python. The idea is simple. Given a directory of files (and files within subdirectories), I want to filter them out using a filter input and tag those files with a word or a phrase.
If I have the following contents in my current directory:
data/
    budget.xls
    world_building_budget.txt
a.txt
b.exe
hello_world.dat
world_builder.spec
and I execute the following command in the shell:
py -3 tag_tool.py -filter=world -tag="World-Building Tool"
My output will be:
These files were tagged with "World-Building Tool":

data/world_building_budget.txt
hello_world.dat
world_builder.spec
My current output isn't exactly like this but basically, I am converting all files and files within subdirectories into a single dictionary like this:
def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree
Right now, my dictionary looks like this:

key: ''

In the following function, I am turning the empty values '' into empty lists (to hold my tags):

def empty_str_to_list(d):
    for k, v in d.items():
        if v == '':
            d[k] = []
        elif isinstance(v, dict):
            empty_str_to_list(v)
When I run my entire code, this is my output:
hello_world.dat ['World-Building Tool']
world_builder.spec ['World-Building Tool']
But it does not see data/world_building_budget.txt. This is the full dictionary:

{'data': {'world_building_budget.txt': []}, 'a.txt': [], 'hello_world.dat': [], 'b.exe': [], 'world_builder.spec': []}
This is my full code:
import os, argparse

def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree

def empty_str_to_list(d):
    for k, v in d.items():
        if v == '':
            d[k] = []
        elif isinstance(v, dict):
            empty_str_to_list(v)

parser = argparse.ArgumentParser(description="Just an example",
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument("--filter", action="store", help="keyword to filter files")
parser.add_argument("--tag", action="store", help="a tag phrase to attach to a file")
parser.add_argument("--get_tagged", action="store", help="retrieve files matching an existing tag")
args = parser.parse_args()

filter = args.filter
tag = args.tag
get_tagged = args.get_tagged

current_dir = os.getcwd()
files_dict = fs_tree_to_dict(current_dir)
empty_str_to_list(files_dict)

for k, v in files_dict.items():
    if filter in k:
        if v == []:
            v.append(tag)
            print(k, v)
        elif isinstance(v, dict):
            empty_str_to_list(v)
            if get_tagged in v:
                print(k, v)
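For the nested case, here is a sketch of one possible approach (tag_files_recursive is a hypothetical helper, not part of the original code, assuming the same nested-dictionary layout that fs_tree_to_dict produces): recurse into sub-dictionaries and build the path prefix as you go, so that data/world_building_budget.txt is also reached.

def tag_files_recursive(d, filter_word, tag, prefix=""):
    # Lists are files (holding their tags); dicts are subdirectories.
    for name, value in d.items():
        path = prefix + name
        if isinstance(value, dict):
            # Recurse into the subdirectory, extending the path prefix.
            tag_files_recursive(value, filter_word, tag, path + "/")
        elif filter_word in name:
            value.append(tag)
            print(path, value)

# Usage sketch, after empty_str_to_list(files_dict):
# tag_files_recursive(files_dict, filter, tag)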
-
I am working on a project and it is showing "No module named pip_internal". Please help me with this. I am using PyCharm (conda interpreter).
File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\Scripts\pip.exe\__main__.py", line 4, in <module> File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\site-packages\pip\_internal\__init__.py", line 4, in <module> from pip_internal.utils import _log
I am using PyCharm with the conda interpreter.
-
Looping the function if the input is not a string
I'm new to Python (first of all). I have a homework assignment to write a function that checks whether an item exists in a dictionary or not.
inventory = {"apple" : 50, "orange" : 50, "pineapple" : 70, "strawberry" : 30} def check_item(): x = input("Enter the fruit's name: ") if not x.isalpha(): print("Error! You need to type the name of the fruit") elif x in inventory: print("Fruit found:", x) print("Inventory available:", inventory[x],"KG") else: print("Fruit not found") check_item()
I want the function to loop again only if the input is not a string. I've tried putting return under print("Error! You need to type the name of the fruit") but it didn't work. Help!
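A minimal sketch of one way to do this, reusing the same inventory dictionary from above: wrap the prompt in a while loop that keeps re-asking until isalpha() passes.

inventory = {"apple": 50, "orange": 50, "pineapple": 70, "strawberry": 30}

def check_item():
    x = input("Enter the fruit's name: ")
    # Re-prompt until the input is purely alphabetic.
    while not x.isalpha():
        print("Error! You need to type the name of the fruit")
        x = input("Enter the fruit's name: ")
    if x in inventory:
        print("Fruit found:", x)
        print("Inventory available:", inventory[x], "KG")
    else:
        print("Fruit not found")

check_item()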
-
Any efficient way to compare two dataframes and append new entries in pandas?
I have new files which I want to add to a historical table. Before that, I need to check the new file against the historical table by comparing two columns in particular: one is state and the other is the date column. First, I need to find max(state, date), then check the entries with max(state, date) against the historical table; if they are not in the historical table, append them, otherwise do nothing. I tried to do this in pandas with group-by on the new file and the historical table and comparing: if there are any new entries in the new file that are not in the historical data, add them. Now I have issues appending the new values to the historical table correctly in pandas. Does anyone have quick thoughts?

My current attempt:
import pandas as pd

src_df = pd.read_csv("https://raw.githubusercontent.com/adamFlyn/test_rl/main/src_df.csv")
hist_df = pd.read_csv("https://raw.githubusercontent.com/adamFlyn/test_rl/main/historical_df.csv")

picked_rows = src_df.loc[src_df.groupby('state')['yyyy_mm'].idxmax()]
I want to check picked_rows against hist_df, comparing by the state and yyyy_mm columns, so that I only add entries from picked_rows where state has the max value or the most recent dates. I created the desired output below. I tried an inner join and pandas.concat, but neither gives me the correct output. Does anyone have any ideas on this?

Here is the desired output that I want to get:
import pandas as pd

desired_output = pd.read_csv("https://raw.githubusercontent.com/adamFlyn/test_rl/main/output_df.csv")
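One common pattern for this kind of update, sketched here under the assumption that state and yyyy_mm together identify a row, is an anti-join via merge(..., indicator=True): keep only the picked rows absent from the historical table, then append them.

import pandas as pd

src_df = pd.read_csv("https://raw.githubusercontent.com/adamFlyn/test_rl/main/src_df.csv")
hist_df = pd.read_csv("https://raw.githubusercontent.com/adamFlyn/test_rl/main/historical_df.csv")
picked_rows = src_df.loc[src_df.groupby('state')['yyyy_mm'].idxmax()]

# Flag rows of picked_rows that already exist in hist_df.
merged = picked_rows.merge(
    hist_df[['state', 'yyyy_mm']],
    on=['state', 'yyyy_mm'],
    how='left',
    indicator=True,
)

# Keep only the rows not present in the historical table, then append them.
new_rows = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
hist_df = pd.concat([hist_df, new_rows], ignore_index=True)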
-
How to bring data frame into single column from multiple columns in python
I have data in the format of these multiple columns, and I want to bring all 4 pcp columns of data into a single column.
YEAR  Month  pcp1  pcp2  pcp3  pcp4
1984  1      0     0     0     0
1984  2      1.2   0     0     0
1984  3      0     0     0     0
1984  4      0     0     0     0
1984  5      0     0     0     0
1984  6      0     0     0     1.6
1984  7      3     3     9.2   3.2
1984  8      6.2   27.1  5.4   0
1984  9      0     0     0     0
1984  10     0     0     0     0
1984  11     0     0     0     0
1984  12     0     0     0     0
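A sketch of the usual reshape for this, assuming the table above is loaded into a DataFrame df: pandas.melt stacks the four pcp columns into one value column, keeping YEAR and Month as identifiers.

import pandas as pd

# Hypothetical sample matching the first two rows of the table above.
df = pd.DataFrame({
    'YEAR': [1984, 1984], 'Month': [1, 2],
    'pcp1': [0.0, 1.2], 'pcp2': [0.0, 0.0],
    'pcp3': [0.0, 0.0], 'pcp4': [0.0, 0.0],
})

# Stack pcp1..pcp4 into a single 'pcp' column.
long_df = df.melt(
    id_vars=['YEAR', 'Month'],
    value_vars=['pcp1', 'pcp2', 'pcp3', 'pcp4'],
    var_name='source',
    value_name='pcp',
)
print(long_df)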
-
Exclude Japanese Stopwords from File
I am trying to remove Japanese stopwords from a text corpus from twitter. Unfortunately the frequently used nltk does not contain Japanese, so I had to figure out a different way.
This is my MWE:
import urllib
from urllib.request import urlopen
import MeCab
import re

# slothlib
slothlib_path = "http://svn.sourceforge.jp/svnroot/slothlib/CSharp/Version1/SlothLib/NLP/Filter/StopWord/word/Japanese.txt"
sloth_file = urllib.request.urlopen(slothlib_path)

# stopwordsiso
iso_path = "https://raw.githubusercontent.com/stopwords-iso/stopwords-ja/master/stopwords-ja.txt"
iso_file = urllib.request.urlopen(iso_path)

stopwords = [line.decode("utf-8").strip() for line in iso_file]
stopwords = [ss for ss in stopwords if not ss == u'']
stopwords = list(set(stopwords))

text = '日本語の自然言語処理は本当にしんどい、と彼は十回言った。'

tagger = MeCab.Tagger("-Owakati")
tok_text = tagger.parse(text)

ws = re.compile(" ")
words = [word for word in ws.split(tok_text)]
if words[-1] == u"\n":
    words = words[:-1]

ws = [w for w in words if w not in stopwords]
print(words)
print(ws)
Successfully Completed: It does give out the original tokenized text as well as the one without stopwords
['日本語', 'の', '自然', '言語', '処理', 'は', '本当に', 'しんどい', '、', 'と', '彼', 'は', '十', '回', '言っ', 'た', '。']
['日本語', '自然', '言語', '処理', '本当に', 'しんどい', '、', '十', '回', '言っ', '。']
There are still 2 issues I am facing though:
a) Is it possible to take both stopword lists into account, namely iso_file and sloth_file, so that a word is removed if it is a stopword in either iso_file or sloth_file? (I tried to use line 14 as

stopwords = [line.decode("utf-8").strip() for line in zip('iso_file','sloth_file')]

but received an error, as tuple attributes may not be decoded.)

b) The ultimate goal would be to generate a new text file in which all stopwords are removed.
I had created this MWE
### first clean twitter csv
import pandas as pd
import re
import emoji

df = pd.read_csv("input.csv")

def cleaner(tweet):
    tweet = re.sub(r"@[^\s]+", "", tweet)  # Remove @username
    tweet = re.sub(r"(?:\@|http?\://|https?\://|www)\S+|\\n", "", tweet)  # Remove http links & \n
    tweet = " ".join(tweet.split())
    tweet = ''.join(c for c in tweet if c not in emoji.UNICODE_EMOJI)  # Remove Emojis
    tweet = tweet.replace("#", "").replace("_", " ")  # Remove hashtag sign but keep the text
    return tweet

df['text'] = df['text'].map(lambda x: cleaner(x))
df['text'].to_csv(r'cleaned.txt', header=None, index=None, sep='\t', mode='a')

### remove stopwords
import urllib
from urllib.request import urlopen
import MeCab
import re

# slothlib
slothlib_path = "http://svn.sourceforge.jp/svnroot/slothlib/CSharp/Version1/SlothLib/NLP/Filter/StopWord/word/Japanese.txt"
sloth_file = urllib.request.urlopen(slothlib_path)

# stopwordsiso
iso_path = "https://raw.githubusercontent.com/stopwords-iso/stopwords-ja/master/stopwords-ja.txt"
iso_file = urllib.request.urlopen(iso_path)

stopwords = [line.decode("utf-8").strip() for line in iso_file]
stopwords = [ss for ss in stopwords if not ss == u'']
stopwords = list(set(stopwords))

with open("cleaned.txt", encoding='utf8') as f:
    cleanedlist = f.readlines()
cleanedlist = list(set(cleanedlist))

tagger = MeCab.Tagger("-Owakati")
tok_text = tagger.parse(cleanedlist)

ws = re.compile(" ")
words = [word for word in ws.split(tok_text)]
if words[-1] == u"\n":
    words = words[:-1]

ws = [w for w in words if w not in stopwords]
print(words)
print(ws)
While it works for the simple input text in the first MWE, for the MWE I just stated I get the error
in method 'Tagger_parse', argument 2 of type 'char const *'
Additional information:
Wrong number or type of arguments for overloaded function 'Tagger_parse'.
Possible C/C++ prototypes are:
    MeCab::Tagger::parse(MeCab::Lattice *) const
    MeCab::Tagger::parse(char const *)
for this line:

tok_text = tagger.parse(cleanedlist)

So I assume I will need to make amendments to the cleanedlist?

I have uploaded cleaned.txt on GitHub for reproducing the issue: [txt on github][1]
Also: how would I be able to get the tokenized list that excludes stopwords back into a text format like cleaned.txt? Would it be possible, for this purpose, to create a df of ws? Or might there even be a simpler way?
Sorry for the long request, I tried a lot and tried to make it as easy as possible to understand what I'm driving at :-)
Thank you very much! [1]: https://gist.github.com/yin-ori/1756f6236944e458fdbc4a4aa8f85a2c
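For (a) and the Tagger_parse error, here is a sketch of one way it might be done (the output filename cleaned_no_stopwords.txt is hypothetical): merge both downloaded lists into a single set, and, since MeCab's parse expects a single string rather than a list, tokenize cleaned.txt line by line and write the filtered tokens back out as text, which also covers (b).

import MeCab
from urllib.request import urlopen

# Merge both stopword lists into one set (union), so a token is dropped
# if it appears in either list.
urls = [
    "http://svn.sourceforge.jp/svnroot/slothlib/CSharp/Version1/SlothLib/NLP/Filter/StopWord/word/Japanese.txt",
    "https://raw.githubusercontent.com/stopwords-iso/stopwords-ja/master/stopwords-ja.txt",
]
stopwords = set()
for url in urls:
    for line in urlopen(url):
        word = line.decode("utf-8").strip()
        if word:
            stopwords.add(word)

tagger = MeCab.Tagger("-Owakati")

# parse() takes a single string, not a list, so process one line at a time
# and write the stopword-free text back out.
with open("cleaned.txt", encoding="utf8") as f, \
        open("cleaned_no_stopwords.txt", "w", encoding="utf8") as out:
    for line in f:
        tokens = tagger.parse(line.strip()).split()
        kept = [t for t in tokens if t not in stopwords]
        out.write(" ".join(kept) + "\n")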
-
Issue with 'group_by' function when doing shapiro_test in R
I've asked this question previously with no luck, so here goes again:
My dataframe:
data.type <- c("DNA","DNA","DNA","DNA","DNA","DNA","DNA","DNA","DNA","DNA","DNA","DNA","DNA","DNA","DNA","DNA","DNA","DNA","DNA","DNA","DNA","RNA","RNA","RNA","RNA","RNA","RNA","RNA","RNA","RNA","RNA","RNA","RNA","RNA","RNA","RNA","RNA","RNA","RNA","RNA","RNA","RNA") hour <- c(1,1,1,2,2,2,24,24,24,48,48,48,96,96,96,168,168,168,672,672,672,1,1,1,2,2,2,24,24,24,48,48,48,96,96,96,168,168,168,672,672,672) zotu.count <- c(11,14,16,7,16,15,5,14,13,6,5,17,7,7,12,3,4,5,3,5,4,2,3,2,1,6,2,1,1,1,1,0,0,1,1,4,1,1,1,6,7,6) id <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42)
I am trying to do a shapiro test to test for normality of my data using the following code and am being given the following error:
dataset %>% group_by(data.type, hour) %>% shapiro_test(zotu.count) Error: Problem with `mutate()` column `data`. ℹ `data = map(.data$data, .f, ...)`. x Problem with `mutate()` column `data`. ℹ `data = map(.data$data, .f, ...)`. x all 'x' values are identical
This is very strange as it has worked before on another dataset with the same data structure but I have no idea why I'm getting this error now. I am very frustrated as I have scoured the internet for answers and have nothing. Anybody who might be able to help would be a godsend!
Thank you!
-
How to filter a list of models based on Pr(>F)
I ran a model which is basically this
models <- mclapply(frms, function(x) anova(lm(x, data = mrna.pcs)))
Now I want to filter the models based on the Pr(>F) value. The model class is list. This is the str of the models:

str(models)
List of 248
 $ PC1 ~ Sex :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 1 186 ..$ Sum Sq : num [1:2] 205562 63770 ..$ Mean Sq: num [1:2] 205562 343 ..$ F value: num [1:2] 600 NA ..$ Pr(>F) : num [1:2] 4.34e-60 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC1"
 $ PC2 ~ Sex :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 1 186 ..$ Sum Sq : num [1:2] 1098 185549 ..$ Mean Sq: num [1:2] 1098 998 ..$ F value: num [1:2] 1.1 NA ..$ Pr(>F) : num [1:2] 0.296 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC2"
 $ PC3 ~ Sex :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 1 186 ..$ Sum Sq : num [1:2] 6023 56650 ..$ Mean Sq: num [1:2] 6023 305 ..$ F value: num [1:2] 19.8 NA ..$ Pr(>F) : num [1:2] 1.5e-05 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC3"
 $ PC4 ~ Sex :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 1 186 ..$ Sum Sq : num [1:2] 88.1 48006.7 ..$ Mean Sq: num [1:2] 88.1 258.1 ..$ F value: num [1:2] 0.341 NA ..$ Pr(>F) : num [1:2] 0.56 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC4"
 $ PC5 ~ Sex :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 1 186 ..$ Sum Sq : num [1:2] 390 31192 ..$ Mean Sq: num [1:2] 390 168 ..$ F value: num [1:2] 2.33 NA ..$ Pr(>F) : num [1:2] 0.129 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC5"
 $ PC6 ~ Sex :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 1 186 ..$ Sum Sq : num [1:2] 58.3 24470 ..$ Mean Sq: num [1:2] 58.3 131.6 ..$ F value: num [1:2] 0.443 NA ..$ Pr(>F) : num [1:2] 0.506 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC6"
 $ PC7 ~ Sex :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 1 186 ..$ Sum Sq : num [1:2] 21.9 19772.5 ..$ Mean Sq: num [1:2] 21.9 106.3 ..$ F value: num [1:2] 0.206 NA ..$ Pr(>F) : num [1:2] 0.65 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC7"
 $ PC8 ~ Sex :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 1 186 ..$ Sum Sq : num [1:2] 7.39 17396.15 ..$ Mean Sq: num [1:2] 7.39 93.53 ..$ F value: num [1:2] 0.0791 NA ..$ Pr(>F) : num [1:2] 0.779 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC8"
 $ PC1 ~ fAge :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 2 185 ..$ Sum Sq : num [1:2] 717 268616 ..$ Mean Sq: num [1:2] 358 1452 ..$ F value: num [1:2] 0.247 NA ..$ Pr(>F) : num [1:2] 0.782 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC1"
 $ PC2 ~ fAge :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 2 185 ..$ Sum Sq : num [1:2] 238 186409 ..$ Mean Sq: num [1:2] 119 1008 ..$ F value: num [1:2] 0.118 NA ..$ Pr(>F) : num [1:2] 0.889 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC2"
 $ PC3 ~ fAge :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 2 185 ..$ Sum Sq : num [1:2] 5461 57211 ..$ Mean Sq: num [1:2] 2731 309 ..$ F value: num [1:2] 8.83 NA ..$ Pr(>F) : num [1:2] 0.000218 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC3"
 $ PC4 ~ fAge :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 2 185 ..$ Sum Sq : num [1:2] 3845 44250 ..$ Mean Sq: num [1:2] 1922 239 ..$ F value: num [1:2] 8.04 NA ..$ Pr(>F) : num [1:2] 0.00045 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC4"
 $ PC5 ~ fAge :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 2 185 ..$ Sum Sq : num [1:2] 1804 29778 ..$ Mean Sq: num [1:2] 902 161 ..$ F value: num [1:2] 5.61 NA ..$ Pr(>F) : num [1:2] 0.00433 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC5"
 $ PC6 ~ fAge :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 2 185 ..$ Sum Sq : num [1:2] 7.65 24520.66 ..$ Mean Sq: num [1:2] 3.82 132.54 ..$ F value: num [1:2] 0.0288 NA ..$ Pr(>F) : num [1:2] 0.972 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC6"
 $ PC7 ~ fAge :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 2 185 ..$ Sum Sq : num [1:2] 378 19416 ..$ Mean Sq: num [1:2] 189 105 ..$ F value: num [1:2] 1.8 NA ..$ Pr(>F) : num [1:2] 0.168 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC7"
 $ PC8 ~ fAge :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 2 185 ..$ Sum Sq : num [1:2] 239 17165 ..$ Mean Sq: num [1:2] 119.4 92.8 ..$ F value: num [1:2] 1.29 NA ..$ Pr(>F) : num [1:2] 0.279 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC8"
 $ PC1 ~ Index :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 23 164 ..$ Sum Sq : num [1:2] 30056 239277 ..$ Mean Sq: num [1:2] 1307 1459 ..$ F value: num [1:2] 0.896 NA ..$ Pr(>F) : num [1:2] 0.604 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC1"
 $ PC2 ~ Index :Classes ‘anova’ and 'data.frame': 2 obs. of 5 variables: ..$ Df : int [1:2] 23 164 ..$ Sum Sq : num [1:2] 8402 178245 ..$ Mean Sq: num [1:2] 365 1087 ..$ F value: num [1:2] 0.336 NA ..$ Pr(>F) : num [1:2] 0.998 NA ..- attr(*, "heading")= chr [1:2] "Analysis of Variance Table\n" "Response: PC2"
My attempt to filter the models based on the p-value is this:

myp2 <- lapply(models, function(x) x$"Pr(>F)")
myp2 <- lapply(models, function(x) x$"Pr(>F)" < 0.05,)

This gives me this error:

Error in FUN(X[[i]], ...) : unused argument (alist())

I know the code is not right.
My question: how do I extract the Pr(>F) p-value from my models so that I can filter for only the significant models?
Any suggestion or help would be really appreciated.
My sample data subset:
structure(list(Mouse.ID = c("DO.0661", "DO.0669", "DO.0670", "DO.0673", "DO.0674", "DO.0676", "DO.0677", "DO.0682", "DO.0683", "DO.0685", "DO.0686", "DO.0692", "DO.0693", "DO.0696", "DO.0698", "DO.0701", "DO.0704", "DO.0709", "DO.0710", "DO.0711"), Sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("F", "M"), class = "factor"), fAge = structure(c(2L, 3L, 2L, 3L, 2L, 2L, 2L, 2L, 3L, 3L, 2L, 3L, 3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L), .Label = c("6", "12", "18"), class = "factor"), Index = structure(c(21L, 24L, 11L, 20L, 12L, 19L, 20L, 7L, 1L, 7L, 6L, 15L, 19L, 23L, 14L, 17L, 8L, 22L, 13L, 12L), .Label = c("AR001", "AR002", "AR003", "AR004", "AR005", "AR006", "AR007", "AR008", "AR009", "AR010", "AR011", "AR012", "AR013", "AR014", "AR015", "AR016", "AR018", "AR019", "AR020", "AR021", "AR022", "AR023", "AR025", "AR027"), class = "factor"), Lane = structure(c(6L, 2L, 4L, 5L, 5L, 4L, 8L, 8L, 8L, 4L, 2L, 2L, 1L, 1L, 2L, 3L, 7L, 4L, 8L, 1L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8"), class = "factor"), Gen = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("8", "9", "10", "11", "12"), class = "factor"), PC1 = c(-23.147618298858, -23.004329868562, -17.0024755772689, -23.9178589007844, -56.7766982399411, -34.3969872418573, -27.7082679050298, -34.32038042076, -6.54582754257061, -48.2738527700051, -51.350816410461, -23.1430204310663, -44.8168212771171, -34.9912596308964, -57.2869816005964, -35.9007859727558, -13.396023721849, -70.4151952469644, -3.95389163762967, -35.2820334506896), PC2 = c(40.5243564641241, 2.99206119995141, -61.4176842149059, 7.10965422446634, 7.28461966315024, -64.1955797075099, 9.48345862615554, -1.04318789593829, 29.0090598234213, -72.8866334170873, -3.21615600827421, 0.792597778173725, -5.14192513442733, -11.7269589504179, 6.55428703944617, -11.5180102658871, 33.3869522894233, -35.1229326772949, 15.996339264987, -11.8901043502155 ), PC3 = c(-17.0598627155672, -22.1038475592448, -6.25238299099893, 23.500307567532, 53.4553992426852, -20.1077749520339, -11.8816581457792, -5.73256447673161, -22.0636009501435, 0.688509203223446, 16.5309171320498, -19.983643792547, -9.04327584423542, -2.27657333476154, 37.6402580806145, 3.45415683648683, -32.247947130388, 64.7524458379641, -22.9483534394309, -12.2002153235215), PC4 = c(-5.37605681469604, 28.8757760174757, 1.96723351126677, 10.1757811517044, 7.63553142427313, -0.61083387825962, -2.14595568269526, 6.96007000414511, -5.55019443290321, 10.7590865244751, -10.6766589136731, 2.57313118560919, -3.80955622632714, -3.66495004673328, 21.0056059162486, -6.43937479210278, -9.20567548365632, 16.1413805847049, 4.77454270484041, 2.14994000686116), PC5 = c(2.49156058897602, -2.2801673669604, -5.45494631567109, -5.44682692111089, -7.21616736676726, -11.0786655194642, 3.89806778409165, 6.1416402328447, -7.6800051817927, -1.30037456136107, -3.73786692756896, -19.2389148951544, 9.07153121652293, -10.2899662479029, 0.579736383131339, -0.0725346819879087, 16.3956001897781, -12.6980354901866, 2.24690751602866, 26.4308764499693 ), PC6 = c(-11.625850369587, 1.54093546690149, -4.87370378395642, -22.0735137415442, -2.44337914021456, 0.619440592140127, 10.0537326783752, 4.27431733991133, 13.6314815937122, 4.15399959062463, -10.1029165139482, 3.79816714568195, 11.054055138545, -8.56784129106846, -16.5277734318821, -11.1264688073482, -10.4604427054892, -9.80324924496993, -6.23395120489922, 11.8384546696797), PC7 = c(7.20873385839409, -17.719801994905, 
-0.811301497692041, 7.55418040146638, -4.68437054723712, 1.1158744957288, -15.1982758555559, 5.25257260525755, -8.31670233486223, -3.86077542839162, -5.29923744674506, -16.0223534779217, -18.0399629122521, -17.9420689996937, -14.3059444168904, -14.3249976727842, 12.4030641816896, 0.629537064989641, 1.01109826318526, -5.35255467845748), PC8 = c(-7.19678837565302, 6.24827779166403, 0.224651092284126, 6.10960416152842, -14.6615234719377, -0.410198021192528, 10.3006326038467, 7.37866876496142, -12.4177204278112, -11.712973299024, -2.00299875954171, 3.19952937463445, 8.81158436770453, 11.7845383750873, -4.79906390420115, 9.7890992316383, -6.26723664234847, -3.97353277391602, -7.12621186623398, -7.33366271961528)), row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))
Code to generate the model
iv <- c("Sex", "fAge", "Index", "Lane", "Gen") dv <- paste0('PC', 1:8) rhs <- unlist(sapply(1:length(iv), function(m) apply(combn(iv, m = m), 2, paste, collapse = ' * '))) frms <- with(expand.grid(dv, rhs), paste(Var1, Var2, sep = ' ~ ')) frms models <- mclapply(frms, function(x) anova(lm(x, data = mrna.pcs)))
-
Problem with `mutate()` column `data` when doing shapiro_test
I am trying to do a shapiro test to test for normality in my dataset. However, when I pass my dataset to a group_by argument, it gives me the following error when I run the command:
My data looks like this (example)
# dataset (example)
data.type <- c("DNA", "RNA", ...)
hour <- c("1", "1", "1", "2", "2", "2"...)
count <- c("12", "13", "7", "4", "8", "0", ...)
id <- c(1, 2, 3, 4, 5, 6 ...)

# command
dataset %>%
  group_by(data.type, hour) %>%
  shapiro_test(count)

Error: Problem with `mutate()` column `data`.
ℹ `data = map(.data$data, .f, ...)`.
x Problem with `mutate()` column `data`.
ℹ `data = map(.data$data, .f, ...)`.
x all 'x' values are identical
It's really weird because I've run this exact argument before and don't get an error with very similar datasets. All my x values are NOT identical. I don't know how to get rid of this error and can't find how to fix it anywhere so here I am.
This is what my output looked like before with the other datasets:
I have no idea why this one is not working.
Thanks in advance for your help!