Results of ridge regression
I've faced a problem connected with ridge regression.
As is well known, ridge regression is used when the features are strongly collinear (the design matrix is ill-conditioned). This is exactly my case: the determinant of my inter-factor correlation matrix is of the order of 10^(-18), so multicollinearity is clearly present. The data sample consists of 8 quantitative features.
Yet ridge regression gives worse results than (or the same results as) standard linear regression.
What leads to this result? How can the results be improved?
1 answer

Ridge regression has one obvious disadvantage. Unlike best subset, forward stepwise, and backward stepwise selection, which generally select models involving just a subset of the variables, ridge regression includes all predictors in the final model. The lasso is a relatively recent alternative to ridge regression that overcomes this disadvantage. However, have you already tried selecting the tuning parameter using cross-validation? Reference: Chapter 6, "Linear Model Selection and Regularization", ISLR.
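To make the cross-validation suggestion concrete, here is a minimal sketch using scikit-learn's RidgeCV; the data here is synthetic collinear data standing in for your 8 features, and the alpha grid is an arbitrary placeholder you would tune for your own problem:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Toy data standing in for 8 strongly correlated quantitative features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=100)   # induce near-collinearity
y = X @ rng.normal(size=8) + rng.normal(size=100)

# RidgeCV fits the model for each candidate alpha and keeps the one with
# the best 10-fold cross-validated score.
alphas = np.logspace(-4, 4, 50)
model = RidgeCV(alphas=alphas, cv=10).fit(X, y)
print(model.alpha_)
```

If the selected alpha sits at the very bottom of the grid, cross-validation is telling you that shrinkage does not help on this data, which would explain ridge matching plain least squares.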
See also questions close to this topic

Fine-tuning VGG16 on GPU in Keras: memory consumption
I'm fine-tuning VGG16 for my task. The idea is that I load the pretrained weights, remove the last layer (a softmax with 1000 outputs) and replace it with a softmax with a few outputs. Then I freeze all the layers but the last and train the model.
Here is the code that builds the original model and loads the weights.
def VGG_16(weights_path=None):
    model = Sequential()
    model.add(ZeroPadding2D((1, 1), input_shape=(224, 224, 3)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Conv2D(128, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Conv2D(256, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Conv2D(256, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Conv2D(256, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(ZeroPadding2D((1, 1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))
    model.add(ZeroPadding2D((1, 1)))
    model.add(Conv2D(512, (3, 3), activation='relu'))
    model.add(MaxPooling2D((2, 2), strides=(2, 2)))

    model.add(Flatten())
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(4096, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(1000, activation='softmax'))

    if weights_path:
        model.load_weights(weights_path)

    return model
Keras uses TensorFlow as a backend in my case, and TensorFlow is built with GPU support (CUDA). I currently have a rather old card: a GTX 760 with 2 GB of memory.
On my card I cannot even load the whole model (the code above): I get an out-of-memory error.
Here the author says that 4 GB is not enough either.
Here a GTX 1070 is able to actually train VGG16 (not just load it into memory), but only with certain batch sizes and in other frameworks (not in Keras). The GTX 1070 always comes with exactly 8 GB of memory.
So it seems that 4 GB is clearly not enough for fine-tuning VGG16, while 8 GB may be.
And the question is: how much memory is enough to fine-tune VGG16 with Keras+TF? Is 6 GB enough, is 8 GB the minimum and OK, or is something bigger needed?
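For reference, the replace-and-freeze step described above can be sketched as follows. This is a minimal illustration on a tiny stand-in model (the layer sizes and NUM_CLASSES are placeholders, not the real VGG16); the same pop/freeze mechanics apply to the full builder:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

NUM_CLASSES = 5  # placeholder for "a few outputs"

# Tiny stand-in for the full VGG_16() model, just to show the mechanics.
model = Sequential([
    Dense(16, activation='relu', input_shape=(8,)),
    Dense(1000, activation='softmax'),   # stands in for the original 1000-way head
])

model.pop()                                        # drop the old softmax
model.add(Dense(NUM_CLASSES, activation='softmax'))  # new small head

for layer in model.layers[:-1]:                    # freeze everything but the head
    layer.trainable = False

model.compile(optimizer='sgd', loss='categorical_crossentropy')
```

Note that freezing only reduces the memory needed for optimizer state and gradients of the frozen layers; the forward activations of all layers are still stored per batch, which is why out-of-memory errors persist even with most of VGG16 frozen.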

Predictive maintenance software
I have to develop predictive maintenance software for a power generation company. I am completely new to this field. Can I please get a step-by-step guide on how to start? Some references would also be helpful. Thanks.

f1 score for test data
To get the best F1 score from cross-validation I do this:
grid_search = GridSearchCV(pipeline, param_grid=param_grid, cv=10, verbose=10, scoring='f1')
grid_result = grid_search.fit(X_train, y_train)
print("best parameters", grid_search.best_params_)
print('Best score : {}'.format(grid_search.best_score_))
but for the test score I also need the F1 score, not the accuracy:
print("Test Score",grid_search.best_estimator_.score(X_test,y_test.reshape(y_test.shape[0])))
Is there a function, e.g.
f1_score()
that I can use, or should I write the function myself?
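scikit-learn does ship exactly such a function: sklearn.metrics.f1_score, which compares true and predicted labels. A minimal sketch (the labels here are made up; in the question's setting y_pred would come from grid_search.best_estimator_.predict(X_test)):

```python
from sklearn.metrics import f1_score

y_test = [0, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1]   # e.g. best_estimator_.predict(X_test)

# F1 is the harmonic mean of precision and recall.
score = f1_score(y_test, y_pred)
print(score)   # 2 TP, 1 FP, 1 FN -> precision = recall = 2/3 -> F1 = 2/3
```

For multi-class targets you would also pass an averaging mode, e.g. f1_score(y_test, y_pred, average='macro').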
Building a prediction model with the dpois function in R
Hello! I am in the beginning stages of learning how to build prediction models for sports, specifically using NHL statistics. I am using this code to get my data set:
library(tidyverse)
library(rvest)

all_urls <- c("http://www.hockey-reference.com/leagues/NHL_1991_games.html",
              "http://www.hockey-reference.com/leagues/NHL_1992_games.html",
              "http://www.hockey-reference.com/leagues/NHL_1993_games.html",
              "http://www.hockey-reference.com/leagues/NHL_1994_games.html",
              # "http://www.hockey-reference.com/leagues/NHL_1995_games.html",
              "http://www.hockey-reference.com/leagues/NHL_1996_games.html",
              "http://www.hockey-reference.com/leagues/NHL_1997_games.html",
              "http://www.hockey-reference.com/leagues/NHL_1998_games.html",
              "http://www.hockey-reference.com/leagues/NHL_1999_games.html",
              "http://www.hockey-reference.com/leagues/NHL_2000_games.html",
              "http://www.hockey-reference.com/leagues/NHL_2001_games.html",
              "http://www.hockey-reference.com/leagues/NHL_2002_games.html",
              "http://www.hockey-reference.com/leagues/NHL_2003_games.html",
              "http://www.hockey-reference.com/leagues/NHL_2004_games.html",
              # "http://www.hockey-reference.com/leagues/NHL_2005_games.html",
              "http://www.hockey-reference.com/leagues/NHL_2006_games.html",
              "http://www.hockey-reference.com/leagues/NHL_2007_games.html",
              "http://www.hockey-reference.com/leagues/NHL_2008_games.html",
              "http://www.hockey-reference.com/leagues/NHL_2009_games.html",
              "http://www.hockey-reference.com/leagues/NHL_2010_games.html",
              "http://www.hockey-reference.com/leagues/NHL_2011_games.html",
              "http://www.hockey-reference.com/leagues/NHL_2012_games.html",
              # "http://www.hockey-reference.com/leagues/NHL_2013_games.html",
              "http://www.hockey-reference.com/leagues/NHL_2014_games.html",
              "http://www.hockey-reference.com/leagues/NHL_2015_games.html",
              "http://www.hockey-reference.com/leagues/NHL_2016_games.html",
              "http://www.hockey-reference.com/leagues/NHL_2017_games.html")

outcomes <- NULL
for (i in 1:length(all_urls)) {
  season <- all_urls[i] %>%
    read_html %>%
    html_node("#games") %>%
    html_table(header = T)
  season <- season[, c(1:5, 7)]
  season$GVisitor <- season$G
  season$GHome <- season$G.1
  season <- season %>% select(Date, Visitor, GVisitor, Home, GHome, Att.)

  playoffs <- all_urls[i] %>%
    read_html %>%
    html_node("#games_playoffs") %>%
    html_table(header = T)
  playoffs <- playoffs[, c(1:5, 7)]
  playoffs$GVisitor <- playoffs$G
  playoffs$GHome <- playoffs$G.1
  playoffs <- playoffs %>% select(Date, Visitor, GVisitor, Home, GHome, Att.)

  whole_season <- rbind(season, playoffs)
  outcomes <- rbind(outcomes, whole_season)
}
This pulls in the game outcomes, and I am using the number of goals scored to learn how to predict game results. I have my code in this file (feel free to mess around):
https://github.com/papelr/nhldatar/blob/master/nhldatar/R/nhldatarphase3.R
I would love some pointers on the dpois part of my code, excerpted here:
# Using number of goals for prediction model
model_one <- rbind(
  data.frame(goals = outcomes$GHome,
             team = outcomes$Home,
             opponent = outcomes$Visitor,
             home = 1),
  data.frame(goals = outcomes$GVisitor,
             team = outcomes$Visitor,
             opponent = outcomes$Home,
             home = 0)) %>%
  glm(goals ~ home + team + opponent, family = poisson(link = log), data = .)

summary(model_one)

# Probability function / matrix
simulate_game <- function(stat_model, homeTeam, awayTeam, max_goals = 10) {
  home_goals <- predict(model_one,
                        data.frame(home = 1, team = homeTeam, opponent = awayTeam),
                        type = "response")
  away_goals <- predict(model_one,
                        data.frame(home = 0, team = awayTeam, opponent = homeTeam),
                        type = "response")
  dpois(0:max_goals, home_goals) %>%
    dpois(0:max_goals, away_goals)
}

simulate_game(model_one, "Nashville Predators", "Chicago Blackhawks", max_goals = 10)
I totally understand that a Poisson model isn't the best for sports predictions, but I am rebuilding a model I found for the EPL for learning/practice purposes and adapting it to the NHL (David Sheehan's model: https://dashee87.github.io/data%20science/football/r/predicting-football-results-with-statistical-modelling/).
Any tips would be great, because currently this model returns a bunch of warnings:
There were 11 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In dpois(., 0:max_goals, away_goals_avg) : non-integer x = 0.062689
2: In dpois(., 0:max_goals, away_goals_avg) : non-integer x = 0.173621
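The warnings themselves point at the likely bug: the pipe feeds the first dpois vector in as the x argument of the second dpois call, so dpois is being evaluated at non-integer probabilities. In Sheehan's original model the two Poisson vectors are instead combined with an outer product into a scoreline-probability matrix. Here is a Python sketch of that idea using scipy (the predicted mean goal counts are placeholders standing in for the glm predictions):

```python
import numpy as np
from scipy.stats import poisson

home_mean, away_mean = 3.1, 2.4   # placeholders for predict(..., type = "response")
max_goals = 10

# P(home scores i) and P(away scores j), goals assumed independent.
goals = np.arange(max_goals + 1)
home_pmf = poisson.pmf(goals, home_mean)
away_pmf = poisson.pmf(goals, away_mean)

# Outer product: entry (i, j) is P(final score is i-j). This is the `%o%`
# step that replaces the buggy dpois-into-dpois pipe.
score_matrix = np.outer(home_pmf, away_pmf)

home_win = np.tril(score_matrix, -1).sum()   # i > j
draw     = np.trace(score_matrix)            # i == j
away_win = np.triu(score_matrix, 1).sum()    # i < j
```

In the R code this corresponds to returning dpois(0:max_goals, home_goals) %o% dpois(0:max_goals, away_goals) from simulate_game. The matrix sums to slightly less than 1 because scores above max_goals are truncated.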

Impulse Response Function vs Cumulative Response Function
I am conducting a VAR analysis and, in that context, computing both Impulse Response and Cumulative Response Functions.
I have understood that the Impulse Response Function shows the period-by-period response when one of the variables is exposed to a shock.
What I am struggling with is what the Cumulative Response Function shows. Can someone please explain?
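One way to see it: the cumulative response function is simply the running sum of the impulse responses, so it shows the total effect of the shock accumulated up to each horizon (the effect on the level of a variable when the IRF is in differences) rather than the effect in each single period. A tiny numeric sketch, with made-up impulse-response values:

```python
import numpy as np

# Hypothetical impulse responses of y to a one-time shock, horizons 0..4.
irf = np.array([1.0, 0.5, 0.25, 0.125, 0.0625])

# Cumulative response: total accumulated effect at each horizon.
cirf = np.cumsum(irf)
print(cirf)   # for this geometric decay it converges toward 2.0
```

So while the IRF here dies out toward zero, the cumulative response levels off at the long-run effect of the shock.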

calculate median from stem and leaf plot in R
I have a data set like this:
data=c(876,578,718,388,562,971,698,298,673,537,642,856,376,508,529,393,354,735,811,504,807,719,464,410,491,557,771,685,448,571,189,661,877,563,647,447,336,526,624,605,496,296,628,481,224,868,804,210,421,435,291,393,605,341,352,374,267,684,685,460,466,498,562,739,562,817,690,720,758,731,480,559,505,703,809,706,631,626,639,585,570,928,516,885,751,561,1020,592,814,843)
When I calculate the median for this data set using the median() function it equals 574.5, but in a book, using a stem-and-leaf plot of the data, the median is calculated as 497.5. Why are the two medians different? How can I calculate the median from a stem-and-leaf plot in code in R?
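A complete stem-and-leaf plot is just the sorted data grouped by leading digits, so a median read off such a plot must equal the ordinary median; if the book gets 497.5, it is presumably working from a different (or truncated) data set. The mechanics can be sketched as follows; this is a Python illustration of the idea on a tiny made-up sample, and the same logic ports directly to R:

```python
from collections import defaultdict

def stem_and_leaf(values, stem_unit=10):
    """Group values into stems (value // stem_unit) with sorted leaves."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // stem_unit].append(v % stem_unit)
    return dict(plot)

def median_from_plot(plot, stem_unit=10):
    """Flatten the plot back into sorted values and take the middle one(s)."""
    xs = [s * stem_unit + leaf
          for s, leaves in sorted(plot.items())
          for leaf in leaves]
    n, mid = len(xs), len(xs) // 2
    return xs[mid] if n % 2 else (xs[mid - 1] + xs[mid]) / 2

sample = [12, 15, 21, 23, 34]
print(stem_and_leaf(sample))           # {1: [2, 5], 2: [1, 3], 3: [4]}
print(median_from_plot(stem_and_leaf(sample)))   # 21
```

Running median_from_plot on the full 90-value data set above reproduces median()'s 574.5, since flattening the plot recovers the sorted data exactly.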