R programming for linear model
model2 <- lm(formula = Losses.in.Thousands ~ Age, Years.of.Experience, Gender, Married, data = default)
Error in model.frame.default(formula = Losses.in.Thousands ~ Age, data = default, : object 'Married' not found
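For what it's worth, the error comes from how the formula is written: in R's formula interface all predictors must be joined with `+` on the right-hand side, i.e. Losses.in.Thousands ~ Age + Years.of.Experience + Gender + Married. Written with commas, the extra columns are matched positionally to other lm() arguments (subset, weights, na.action), and evaluating them produces the "object 'Married' not found" error. The same many-predictors-one-response fit can be sketched language-agnostically in Python with an ordinary least-squares design matrix; all data below is made up, since the `default` data frame isn't available here:

```python
import numpy as np

# Toy stand-ins for the question's columns (Age, Years.of.Experience, etc.).
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(18, 70, n)
experience = rng.uniform(0, 40, n)
gender = rng.integers(0, 2, n)   # 0/1-coded categorical
married = rng.integers(0, 2, n)
losses = 5 + 0.3 * age - 0.2 * experience + 2 * gender + 1.5 * married + rng.normal(0, 1, n)

# All predictors go into one design matrix -- the analogue of joining them
# with `+` on the right-hand side of R's formula.
X = np.column_stack([np.ones(n), age, experience, gender, married])
coef, *_ = np.linalg.lstsq(X, losses, rcond=None)
print(coef)  # intercept followed by one slope per predictor
```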
See also questions close to this topic

Using GloVe's pretrained glove.6B.50.txt as a basis for word embeddings in R
I'm trying to convert textual data into vectors using GloVe in R. My plan was to average the word vectors of a sentence, but I can't seem to get to the word-vectorization stage. I've downloaded the glove.6B.50.txt file and its parent zip file from https://nlp.stanford.edu/projects/glove/, and I have visited text2vec's website and tried running through their example where they load Wikipedia data, but I don't think it's what I'm looking for (or perhaps I am not understanding it). I'm trying to load the pretrained embeddings into a model so that if I have a sentence (say 'I love lamp') I can iterate through that sentence and turn each word into a vector that I can then average (turning unknown words into zeros) with a function like vectorize(word). How do I load the pretrained embeddings into a GloVe model as my corpus (and is that even what I need to do to accomplish my goal)?
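Not an R answer, but since the stumbling block is usually the file format: glove.6B.50.txt is plain text, one token per line followed by its numbers, so no model object is needed to use the pretrained vectors; you only parse the file into a lookup table and average. A Python sketch of exactly that pipeline, using a made-up 3-dimensional vocabulary in place of the real 50-dimensional file:

```python
import io
import numpy as np

# glove.6B.50.txt stores one word per line: the token followed by its floats.
# A tiny 3-dimensional stand-in is used here since the real file isn't bundled.
fake_glove = io.StringIO(
    "i 0.1 0.2 0.3\n"
    "love 0.4 0.5 0.6\n"
    "lamp 0.7 0.8 0.9\n"
)

def load_glove(handle):
    """Parse GloVe's plain-text format into a word -> vector dict."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:], dtype=float)
    return vectors

def sentence_vector(sentence, vectors, dim):
    """Average the word vectors; unknown words contribute zeros."""
    zeros = np.zeros(dim)
    return np.mean([vectors.get(w, zeros) for w in sentence.lower().split()], axis=0)

vecs = load_glove(fake_glove)
print(sentence_vector("I love lamp", vecs, dim=3))   # elementwise mean
print(sentence_vector("I love pizza", vecs, dim=3))  # 'pizza' averaged in as zeros
```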

Using dev.copy2pdf(), how can I set the filename of the pdf using paste()?
The dev.copy2pdf function allows you to export the currently displayed plot as a pdf file. You name the pdf file with dev.copy2pdf(file = "", ...). Because I am writing a loop to save multiple plots as pdfs, I want to be able to name the new file using an element from my character vector. Let's say I have a character vector called charactervector and the first element is "MyImage1". I could do NewName <- paste(charactervector[1]), but NewName wouldn't be recognized by file = "". It would simply save it as NewName.pdf and not MyImage1.pdf. How can I accomplish what I want to do?
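In R the usual fix is to build the whole path as a string first, e.g. file = paste0(charactervector[i], ".pdf"), so that the string's value (not the literal name "NewName") is used. The same build-the-filename-in-a-loop pattern, sketched in Python with made-up names:

```python
# The R idiom would be file = paste0(charactervector[i], ".pdf"); the same
# pattern in Python, looping over a vector of base names:
charactervector = ["MyImage1", "MyImage2", "MyImage3"]

filenames = []
for name in charactervector:
    filenames.append(f"{name}.pdf")  # concatenate base name and extension

print(filenames)  # ['MyImage1.pdf', 'MyImage2.pdf', 'MyImage3.pdf']
```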
Getting duplicate rows when merging two data frames in R
I am trying to get every pitch thrown by the pitcher Justin Verlander in a specific 2011 baseball game; however, when I use the merge function, rows repeat. I should have somewhere around 100+ rows (the total number of pitches he threw during that game), not the 7 thousand that I get from the output. I merged the 2 data frames by url as the primary key, but I am not sure if that is correct.
library("Lahman")
library("pitchRx")
library("ggplot2")
library("tidyverse")
library("dplyr")

pitching_05_07_2011 <- scrape(start = "2011-05-07", end = "2011-05-07")
atbats <- pitching_05_07_2011$atbat
pitches <- pitching_05_07_2011$pitch
head(atbats)
head(pitches)

verlander_nohitter <- filter(atbats, atbats$pitcher_name == "Justin Verlander")
verlander_nohitter
pitching_atbats <- merge(verlander_nohitter, pitches, by = "url")
pitching_atbats
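A likely cause, for anyone hitting the same thing: every row of both tables from one game shares the same url, so merging on url alone pairs every Verlander at-bat with every pitch of the game. pitchRx's atbat and pitch tables are conventionally joined on both url and num (the at-bat number within the game). A small Python sketch of the join cardinality, with made-up row counts:

```python
# Why merging on `url` alone explodes the row count: every at-bat row for the
# game shares the same url, as does every pitch row, so a join on url pairs
# each at-bat with each pitch (a many-to-many join within one key value).
atbats = [{"url": "game1", "num": n} for n in range(30)]         # 30 at-bats
pitches = [{"url": "game1", "num": n % 30} for n in range(100)]  # 100 pitches

merged_on_url = [(a, p) for a in atbats for p in pitches if a["url"] == p["url"]]
print(len(merged_on_url))  # 30 * 100 = 3000 rows

# Joining on (url, num) -- num identifies the at-bat within the game -- keeps
# only the pitches belonging to each at-bat:
merged_on_url_num = [
    (a, p) for a in atbats for p in pitches
    if a["url"] == p["url"] and a["num"] == p["num"]
]
print(len(merged_on_url_num))  # 100 rows, one per pitch
```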

KerasRegressor: ValueError: continuous is not supported
I am trying to apply a regression learning method to my data which has 28 dimensions.
Code:
import numpy
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.metrics import accuracy_score
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# load dataset
dataframe = pd.read_csv("gold_train_small.csv", header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:, 1:29]
Y = dataset[:, 0]

# load test set
dataframe = pd.read_csv("gold_test.csv", header=None)
dataset = dataframe.values
X_test = dataset[:, 1:29]
Y_test = dataset[:, 0]

# define base model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(28, input_dim=28, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam')
    return model

# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# evaluate model
estimator = KerasRegressor(build_fn=baseline_model, epochs=100, batch_size=5, verbose=0)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(estimator, X, Y, cv=kfold)
print("Results: %.2f (%.2f) MSE" % (results.mean(), results.std()))
# Baseline: 31.64 (26.82) MSE

# evaluate model with standardized dataset
numpy.random.seed(seed)
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=baseline_model, epochs=50, batch_size=5, verbose=0)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=10, random_state=seed)
results = cross_val_score(pipeline, X, Y, cv=kfold)
print("Standardized: %.2f (%.2f) MSE" % (results.mean(), results.std()))

estimator.fit(X, Y)
prediction = estimator.predict(X_test)
accuracy_score(Y_test, prediction)
However, I receive the following error for the last line:
ValueError: continuous is not supported
Should I use other measures?
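For reference, the error is raised because accuracy_score is a classification metric and Y_test here is continuous; a regression fit is scored with measures such as MSE, MAE, or R² (sklearn.metrics provides mean_squared_error, mean_absolute_error, and r2_score). The arithmetic behind those three, sketched directly in numpy with toy values:

```python
import numpy as np

# accuracy_score expects discrete class labels, so handing it continuous
# targets raises "continuous is not supported". Regression predictions are
# scored with MSE / MAE / R^2 instead; the arithmetic is shown directly here.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)
mae = np.mean(np.abs(y_true - y_pred))
r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)

print(mse, mae, r2)  # 0.375 0.5 0.948...
```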

How to specify degree argument in ns() in R, for constructing natural spline of degree 5?
library(ISLR)
library(splines)
fit <- lm(wage ~ bs(age, knots = c(25, 40, 60), degree = 5), data = Wage)
fit <- lm(wage ~ ns(age, knots = c(25, 40, 60), degree = 5), data = Wage)
I am able to build a regression spline with degree-5 polynomials, but how do I build a natural spline of degree 5, given that the ns() function lacks a degree argument?
I am only able to produce a cubic natural spline using ns(). Are there any other functions that could be used to produce, say, quadratic natural splines, etc.?
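One way to see why ns() has no degree argument: a "natural" spline is by definition a cubic spline constrained to be linear beyond the boundary knots, so its degree is fixed at 3. A regression spline of any other degree can, however, be built by hand from the truncated power basis (bs() uses a B-spline basis that spans the same space). A sketch in Python with numpy:

```python
import numpy as np

# Truncated power basis for a degree-d regression spline with interior knots:
# columns x, x^2, ..., x^d plus (x - k)_+^d for each knot k. This spans the
# same function space as a B-spline basis of the same degree and knots.
def truncated_power_basis(x, knots, degree):
    x = np.asarray(x, dtype=float)
    cols = [x ** p for p in range(1, degree + 1)]
    cols += [np.clip(x - k, 0, None) ** degree for k in knots]
    return np.column_stack(cols)

age = np.linspace(18, 80, 5)
B = truncated_power_basis(age, knots=[25, 40, 60], degree=5)
print(B.shape)  # (5, 8): 5 polynomial columns + 3 knot columns
```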

Wildfly 10.1 MySQL replication: second database is read-only exception
I started using Wildfly 10.1 with MySQL as the database. I have two databases, DB1 and DB2; Wildfly should connect to DB2 when DB1 disconnects. I have gotten that far, but when it connects to DB2 I get a "Connection is read only" error. I looked at the topic Database Fail Over in Jboss Data sources, but
> <connection-property name="readOnly">false</connection-property>
did not resolve it. I'm looking for a solution to that. I want master/master, not master/slave. My MySQL configuration in standalone.xml:
<datasources>
    <datasource jndi-name="java:jboss/datasources/ExampleDS" pool-name="ExampleDS" enabled="true" use-java-context="true">
        <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url>
        <driver>h2</driver>
        <security>
            <user-name>sa</user-name>
            <password>sa</password>
        </security>
    </datasource>
    <datasource jta="true" jndi-name="java:jboss/datasources/RailbaseLabDS" pool-name="RailbaseLabDS" enabled="true" use-java-context="true" use-ccm="true">
        <connection-url>jdbc:mysql://IP1:3306,IP2:3306/DBN?autoreconnect=true</connection-url>
        <driver-class>com.mysql.jdbc.Driver</driver-class>
        <driver>mysql</driver>
        <url-delimiter></url-delimiter>
        <security>
            <user-name>DBN</user-name>
            <password>DBN</password>
        </security>
        <validation>
            <valid-connection-checker class-name="org.jboss.jca.adapters.jdbc.extensions.mysql.MySQLValidConnectionChecker"/>
            <check-valid-connection-sql>select 1</check-valid-connection-sql>
            <background-validation>true</background-validation>
            <background-validation-millis>5000</background-validation-millis>
        </validation>
    </datasource>
    <drivers>
        <driver name="h2" module="com.h2database.h2">
            <xa-datasource-class>org.h2.jdbcx.JdbcDataSource</xa-datasource-class>
        </driver>
        <driver name="mysql" module="com.mysql">
            <xa-datasource-class>com.mysql.jdbc.jdbc2.optional.MysqlXADataSource</xa-datasource-class>
        </driver>
    </drivers>
</datasources>

Probabilistic classification with Gaussian Bayes Classifier vs Logistic Regression
I have a binary classification problem where I have a few great features that have the power to predict almost 100% of the test data because the problem is relatively simple.
However, as the nature of the problem requires, I cannot afford mistakes, so instead of giving a prediction I am not sure of, I would rather have the output as a probability, set a threshold, and be able to say: "if I am less than 95% sure, I will call this NOT SURE and act accordingly". Saying "I don't know" rather than making a mistake is better.
So far so good.
For this purpose, I tried Gaussian Bayes Classifier(I have a cont. feature) and Logistic Regression algorithms, which provide me the probability as well as the prediction for the classification.
Coming to my Problem:
GBC has around a 99% success rate, while Logistic Regression is lower, around 96%. So I would naturally prefer GBC. However, as successful as GBC is, it is also very sure of itself: the probabilities I get are either 1 or very close to 1, such as 0.9999997, which makes things tough for me, because in practice GBC no longer gives me usable probabilities.
Logistic Regression performs worse, but at least gives better, more "sensible" probabilities.
By the nature of my problem, the cost of misclassification grows as a power of 2: if I misclassify 4 of the products, I lose 2^4 times as much (it's unitless, but gives an idea anyway).
In the end, I would like to classify with higher success than Logistic Regression, but also have meaningful probabilities so I can set a threshold and point out the ones I am not sure of.
Any suggestions?
Thanks in advance.
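Two thoughts, for what they're worth: overconfident probabilities are a known trait of naive/Gaussian Bayes classifiers, since the conditional-independence assumption multiplies evidence that isn't really independent; probability calibration (e.g. Platt scaling or isotonic regression, as in scikit-learn's CalibratedClassifierCV) is the standard remedy and could keep GBC's accuracy while producing usable probabilities. The abstention rule itself is then simple; a minimal Python sketch with a hypothetical 95% threshold:

```python
# "Abstain below a confidence threshold": given the positive-class probability
# from any binary classifier, emit a label only when confidence clears the bar.
def decide(prob_positive, threshold=0.95):
    confidence = max(prob_positive, 1 - prob_positive)  # top-class probability
    if confidence < threshold:
        return "NOT SURE"
    return 1 if prob_positive >= 0.5 else 0

print(decide(0.999))  # 1
print(decide(0.80))   # NOT SURE
print(decide(0.02))   # 0
```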

How can I use stepwise regression to remove a specific coefficient in logistic regression within R?
When I run the logistic regression for a cars dataset:
carlogistic.fit4 <- glm(as.factor(Mpg01) ~ Weight + Year + Origin, data = carslogic, family = "binomial")
summary(carlogistic.fit4)
I get the below output:

Call:
glm(formula = as.factor(Mpg01) ~ Weight + Year + Origin, family = "binomial", data = carslogic)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.29189  -0.10014   0.00078   0.19699   2.60606

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)    -2.697e+01  5.226e+00  -5.161 2.45e-07 ***
Weight         -6.006e-03  7.763e-04  -7.737 1.02e-14 ***
Year            5.677e-01  8.440e-02   6.726 1.75e-11 ***
OriginGerman    1.256e+00  5.172e-01   2.428   0.0152 *
OriginJapanese  3.250e-01  5.462e-01   0.595   0.5519
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 549.79 on 396 degrees of freedom
Residual deviance: 151.06 on 392 degrees of freedom
AIC: 161.06
However, notice that the p-value for Japanese-origin cars is greater than 0.05 and hence insignificant. I want to remove this from the model; however, the column header is Origin, as you see in the initial code. How do I exclude the Japanese origin level specifically from the model?

Chi-square goodness-of-fit and R-square measures from a fixed-effect logistic regression using the 'feglm' function
I am trying to get chi-square goodness-of-fit and R-square measures from the following fixed-effect logistic regression fitted with the 'feglm' function. However, I find very limited information on how to even check this.
> regress=feglm(Y ~ X1+X2+X3+X4+X5+X6*X10+X7+X8+X9+X11 | Firm+Time, data=DATA, family=binomial(link="logit"))
> summary(regress)
binomial, Y ~ X1 + X2 + X3 + X4 + X5 + X6 * X10 + X7 + X8 + X9 + X11 | Firm + Time
l = [127, 15], n = 14139, deviance = 9891.112

Structural parameter(s):
         Estimate Std. error z value Pr(>|z|)
X1      7.006e-02  3.990e-03  17.560  < 2e-16 ***
X2      1.473e+00  1.047e-01  14.077  < 2e-16 ***
X3      9.105e-02  2.691e-02   3.384 0.000715 ***
X4      2.896e-04  3.294e-05   8.791  < 2e-16 ***
X5      1.223e-01  4.557e-03  26.848  < 2e-16 ***
X6      1.154e-01  2.267e-01   0.509 0.610699
X10     6.273e-03  2.387e+00   0.003 0.997903
X7      2.663e-02  1.192e-02   2.234 0.025453 *
X8      2.940e-01  9.002e-02   3.266 0.001092 **
X9      4.115e+00  1.080e-01  38.103  < 2e-16 ***
X11     1.115e-03  3.442e-01   0.003 0.997415
X6:X10  3.344e-02  2.533e-01   0.132 0.894962
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
( 6244 observation(s) deleted due to missingness )
I suppose I will need at least the log-likelihood, residual deviance, etc. to calculate the chi-square values and R-squares, and I cannot find those in the above result.
May I get help on this?
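For the R-square part at least, the deviance shown in the summary is enough once the null deviance is known: McFadden's pseudo-R² is 1 − deviance/null deviance, and the likelihood-ratio chi-square against the null model is their difference (deviance being −2 × log-likelihood up to a constant). A sketch of the arithmetic in Python; the null deviance below is a made-up placeholder, since it would have to come from refitting the model without the structural regressors:

```python
# McFadden's pseudo-R^2 and the LR chi-square need only two deviances.
deviance = 9891.112        # fitted model, from the summary in the question
null_deviance = 16500.0    # hypothetical: refit with fixed effects only

mcfadden_r2 = 1 - deviance / null_deviance          # pseudo goodness of fit
lr_chi2 = null_deviance - deviance                  # chi-square vs the null model

print(round(mcfadden_r2, 4), round(lr_chi2, 3))
```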

Linear Programming feasible using linprog and infeasible using Gurobi in Matlab
I have the following very simple linear programming problem to solve in Matlab
clear
% The unknown
% x = [x1, ..., x10];
% The constraints
% x2 + x8 = Phi12
% x3 + x7 = Phi21
% x5  = infvalue;
% x10 = infvalue;
% The known parameters
Phi12 = 3.3386;
Phi21 = 3.0722;
infvalue = 50;
sizex = 10;  % size of the unknown
The problem admits a solution.
When I implement this LP using linprog, it finds a solution. When I implement it using the Gurobi solver, it tells me the problem is infeasible.
What am I doing wrong? Here's my code.
beq = [Phi12; Phi21; infvalue; infvalue];
rAeq = [1 1 ...
        2 2 ...
        3 ...
        4];
cAeq = [2 8 ...
        3 7 ...
        5 10];
fillAeq = [1 1 ...
           1 1 ...
           ones(1, 2)];
Aeq = sparse(rAeq, cAeq, fillAeq, size(beq, 1), sizex);
Aeqfull = full(Aeq);

% linprog
f = zeros(sizex, 1);
xlinprog = linprog(f, [], [], Aeqfull, beq);

% Gurobi
clear model;
model.A = Aeq;
model.rhs = beq;
model.sense = repmat('=', size(Aeq, 1), 1);
model.obj = f;
resultgurobi = gurobi(model);
During my attempts to understand what is going on, I noticed that if I put any positive value in place of 3.3386, then Gurobi works perfectly.
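One plausible culprit, given that replacing the value makes Gurobi succeed: Matlab's linprog leaves variables unbounded when no bounds are passed, while Gurobi's model defaults to a lower bound of 0 on every variable (setting model.lb = -inf(sizex, 1) would match linprog's behavior). If the right-hand sides were in fact negative (the minus signs may have been lost in the post), the system is solvable with free variables but infeasible under nonnegativity, as this numpy sketch illustrates:

```python
import numpy as np

# The equality system from the question, with assumed-negative Phi values
# (a guess consistent with "any positive value makes Gurobi work").
A = np.zeros((4, 10))
A[0, [1, 7]] = 1   # x2 + x8 = Phi12   (0-based columns)
A[1, [2, 6]] = 1   # x3 + x7 = Phi21
A[2, 4] = 1        # x5  = infvalue
A[3, 9] = 1        # x10 = infvalue
b = np.array([-3.3386, -3.0722, 50.0, 50.0])

# With free-sign variables (linprog's default) the system is solvable:
x_free, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(A @ x_free, b))  # True

# Under x >= 0 (Gurobi's default lower bound) the first two rows are hopeless:
# a sum of nonnegative terms cannot equal a negative right-hand side.
print(all(bi >= 0 for bi in b[:2]))  # False, so lb = 0 makes it infeasible
```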
Extracting an LP model from AMPL
I have a very big and complicated LP model in AMPL. I need to extract the Ax <= b form of my LP; i.e., I need all of my data in the form of a matrix A and a vector b, with all variables concatenated into one large vector x. How can I do that?

Python pulp constraint: doubling the weight of whichever variable contributes the most
I am trying to use http://www.philipkalinda.com/ds9.html to set up a constrained optimisation.
prob = pulp.LpProblem('FantasyTeam', pulp.LpMaximize)
decision_variables = []
res = self.team_df

# Set up the LP
for rownum, row in res.iterrows():
    variable = str('x' + str(rownum))
    variable = pulp.LpVariable(str(variable), lowBound=0, upBound=1, cat='Integer')  # make variables binary
    decision_variables.append(variable)

print("Total number of decision_variables: " + str(len(decision_variables)))

total_points = ""
for rownum, row in res.iterrows():
    for i, player in enumerate(decision_variables):
        if rownum == i:
            formula = row['TotalPoint'] * player
            total_points += formula

prob += total_points
print("Optimization function: " + str(total_points))
The above, however, creates an optimisation where, if the points earned by variable x1 are X1, by x2 are X2, ..., and by xn are Xn, it maximises X1*x1 + X2*x2 + ... + XN*xn. However, in my case I need to double the points for the variable that earns the most points. How do I set this up?
Maximize OBJ: 38.1 x0 + 52.5 x1 + 31.3 x10 + 7.8 x11 + 42.7 x12 + 42.3 x13 + 4.7 x14 + 49.5 x15 + 21.2 x16 + 11.8 x17 + 1.4 x18 + 3.2 x2 + 20.8 x3 + 1.2 x4 + 24 x5 + 25.9 x6 + 27.8 x7 + 6.2 x8 + 41 x9
When I maximise the plain sum, x1 gets dropped; but when I maximise with the top scorer taking double points, it should stay in.
Here are the constraints I am using:
Subject To
_C1: 10.5 x0 + 21.5 x1 + 17 x10 + 7.5 x11 + 11.5 x12 + 12 x13 + 7 x14 + 19 x15 + 10.5 x16 + 5.5 x17 + 6.5 x18 + 6.5 x2 + 9.5 x3 + 9 x4 + 12 x5 + 12 x6 + 9.5 x7 + 7 x8 + 14 x9 <= 100
_C10: x12 + x2 + x6 >= 1
_C11: x10 + x11 + x17 + x3 <= 4
_C12: x10 + x11 + x17 + x3 >= 1
_C13: x0 + x10 + x11 + x12 + x13 + x14 + x15 + x18 + x2 <= 5
_C14: x0 + x10 + x11 + x12 + x13 + x14 + x15 + x18 + x2 >= 3
_C15: x1 + x16 + x17 + x3 + x4 + x5 + x6 + x7 + x8 + x9 <= 5
_C16: x1 + x16 + x17 + x3 + x4 + x5 + x6 + x7 + x8 + x9 >= 3
_C2: x0 + x1 + x10 + x11 + x12 + x13 + x14 + x15 + x16 + x17 + x18 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 = 8
_C3: x0 + x14 + x16 + x5 <= 4
_C4: x0 + x14 + x16 + x5 >= 1
_C5: x15 + x18 + x4 + x7 + x8 <= 4
_C6: x15 + x18 + x4 + x7 + x8 >= 1
_C7: x1 + x13 + x9 <= 4
_C8: x1 + x13 + x9 >= 1
_C9: x12 + x2 + x6 <= 4
Naturally, maximising A + B + C + D doesn't maximise max(2A+B+C+D, A+2B+C+D, A+B+2C+D, A+B+C+2D)
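A small brute-force illustration of that closing point, with made-up player data: under a budget constraint, the team with the best plain sum of points is not necessarily the team with the best "sum plus a second helping for the top scorer". (In the MIP itself, the bonus can be linearized with auxiliary binaries y_i <= x_i, sum of y_i = 1, adding sum of points_i * y_i to the objective; with nonnegative points, a maximizing solver places the bonus on the best chosen player automatically.)

```python
from itertools import combinations

# Made-up players: pick exactly 2 within a budget of 10.
points = {"A": 15, "B": 5, "C": 11, "D": 10}
cost = {"A": 8, "B": 2, "C": 5, "D": 5}
BUDGET, TEAM_SIZE = 10, 2

def feasible(team):
    return sum(cost[p] for p in team) <= BUDGET

def plain(team):
    return sum(points[p] for p in team)

def with_double(team):
    return plain(team) + max(points[p] for p in team)  # top scorer counts twice

teams = [t for t in combinations(points, TEAM_SIZE) if feasible(t)]
print(max(teams, key=plain))        # ('C', 'D'): best plain sum (21)
print(max(teams, key=with_double))  # ('A', 'B'): best once the top scorer doubles (35)
```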