R programming for linear model
model2<-lm(formula = Losses.in.Thousands~Age, Years.of.Experience,Gender, Married, data = default)
Error in model.frame.default(formula = Losses.in.Thousands ~ Age, data = default, : object 'Married' not found
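The error most likely comes from the commas. In an R formula, predictors are joined with `+`, so everything after `Age` is handed to `lm()` as separate positional arguments (which would land on `subset`, `weights`, and `na.action`) rather than as model terms. Below is a minimal sketch of the corrected call. The data frame is simulated purely for illustration; only the column names are taken from the question.

```r
# Simulated stand-in for the question's `default` data frame (values are made up).
set.seed(42)
n <- 50
default <- data.frame(
  Age                 = sample(18:70, n, replace = TRUE),
  Years.of.Experience = sample(0:40, n, replace = TRUE),
  Gender              = factor(sample(c("F", "M"), n, replace = TRUE)),
  Married             = factor(sample(c("Married", "Single"), n, replace = TRUE))
)
default$Losses.in.Thousands <- 500 - 3 * default$Age + rnorm(n, sd = 20)

# Predictors are separated by `+` inside the formula, not by commas.
model2 <- lm(
  Losses.in.Thousands ~ Age + Years.of.Experience + Gender + Married,
  data = default
)
summary(model2)
```

With the real data, only the `lm()` call should need to change; the simulated data frame can be dropped.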
See also questions close to this topic
Add Hive odbc driver to server
I'm using hosted RStudio on CentOS 7.
I would like to connect to a Hive database and have been looking at the odbc package after reading a how-to blog page on the RStudio site.
Example code from the page:
library(odbc)
con <- dbConnect(odbc::odbc(), driver = <driver>, host = <host>, dbname = <dbname>, user = <user>, password = <password>, port = 10000)
My current objective is to create a connection to Hive. The part that's tripping me up is the driver.
On the Hortonworks add-ons page I copied a link to the CentOS 7 (64-bit) driver: https://public-repo-1.hortonworks.com/HDP/hive-odbc/220.127.116.113/Linux/EL7/hive-odbc-native-18.104.22.1683-1.el7.x86_64.rpm
Then, on my Linux server:
sudo wget https://public-repo-1.hortonworks.com/HDP/hive-odbc/22.214.171.1243/Linux/EL7/hive-odbc-native-126.96.36.1993-1.el7.x86_64.rpm
sudo yum install hive-odbc-native-188.8.131.523-1.el7.x86_64.rpm
Everything appeared to work OK up till this point.
I added the driver to the con call:
con <- dbConnect(odbc::odbc(), driver = 'hive-odbc-native-184.108.40.2063-1.el7.x86_64', host = 'example.com', dbname = 'mydb', user = 'doug', password = 'password123', port = 10000)
However, R tells me: "Error: nanodbc/nanodbc.cpp:950: 01000: [unixODBC][Driver Manager]Can't open lib 'drivers/hive-odbc-native-220.127.116.113-1.el7.x86_64' : file not found ".
How can I use this ODBC driver for my connection?
fill in the blanks type question generation NLP (non-english)
I am working on an NLP project for the Mandarin language. The objective is to generate fill-in-the-blank questions from the corpus text data.
Is there any existing work, or any references to start with, especially for non-English languages? Suggestions are welcome.
Thanks in advance.
how to join two data frames in r
I have two data frames, dt and dt1, as follows:
dt
id  date
1   2018-09-20
1   2018-09-18
2   2018-09-16
3   2018-09-14
dt1
id  date
1   2018-09-20
2   2018-09-14
2   2018-09-15
How can I combine the two data frames to get the following output?
dt2
id  date
1   2018-09-20
1   2018-09-20
1   2018-09-18
2   2018-09-16
2   2018-09-15
2   2018-09-14
3   2018-09-14
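The desired output keeps every row from both data frames, so this looks like row-wise stacking rather than a key-based join. A minimal base R sketch, assuming the `date` columns are stored as `Date` values and that the wanted order is `id` ascending, then `date` descending (as the example output suggests):

```r
# Rebuild the two data frames from the question.
dt <- data.frame(
  id   = c(1, 1, 2, 3),
  date = as.Date(c("2018-09-20", "2018-09-18", "2018-09-16", "2018-09-14"))
)
dt1 <- data.frame(
  id   = c(1, 2, 2),
  date = as.Date(c("2018-09-20", "2018-09-14", "2018-09-15"))
)

# Stack the rows, then sort by id (ascending) and date (descending).
dt2 <- rbind(dt, dt1)
dt2 <- dt2[order(dt2$id, -as.numeric(dt2$date)), ]
rownames(dt2) <- NULL
dt2
```

`rbind()` requires both data frames to have the same column names; if the dates are stored as character strings, converting them with `as.Date()` first keeps the sort correct.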
Minimal glmnet example for factors
I am trying to understand how to use the R package glmnet.
Suppose I have a dataset, representing games played between two teams, with the 'win' column defining the result.
library(RcppAlgos)
library(dplyr)
data <- RcppAlgos::permuteGeneral(c("A", "B", "C", "D", "E"), 2, repetition = TRUE) %>%
  as.data.frame() %>%
  setNames(c("team1", "team2")) %>%
  mutate(win = rbinom(25, 1, 0.5))
where 1 represents that team1 won, and 0 represents that team1 lost.
I now want to run this data through glmnet, with the 'win' column as the response.
I know that I need to use model.matrix with my factor variables, but it doesn't seem to me that that would give the right result.
x <- model.matrix(data$win ~ data$team1 + data$team2)
fit <- glmnet(x, data$win)
Can anyone help?
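Two details in the snippet above may matter: `model.matrix()` adds an intercept column that glmnet does not need, and `glmnet()` defaults to a Gaussian response unless `family` is set. The sketch below is one possible setup, not a definitive answer. It builds the 25 pairings with base R's `expand.grid()` instead of RcppAlgos (an assumption made to keep the example dependency-free), and the penalty value `s = 0.05` is arbitrary.

```r
# Simulated games: every ordered pairing of five teams (25 rows), random results.
set.seed(1)
teams <- c("A", "B", "C", "D", "E")
games <- expand.grid(team1 = teams, team2 = teams, stringsAsFactors = TRUE)
games$win <- rbinom(nrow(games), 1, 0.5)  # 1 = team1 won, 0 = team1 lost

# Dummy-code the factors; drop the intercept column because glmnet adds its own.
x <- model.matrix(win ~ team1 + team2, data = games)[, -1]
y <- games$win

# Fit only if glmnet is installed; family = "binomial" treats y as a 0/1 outcome.
if (requireNamespace("glmnet", quietly = TRUE)) {
  fit <- glmnet::glmnet(x, y, family = "binomial")
  print(coef(fit, s = 0.05))  # coefficients at an arbitrary penalty value
}
```

Here `x` has eight 0/1 columns: four dummies for `team1` and four for `team2`, with team "A" as the reference level in each.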
One shot learning for a regression task
I know one shot learning can be used for classification as in the Siamese-network, but can we use one shot learning for a regression task?
Centering variables for multiple regression - interested in group effects
I'm trying to run a multiple regression model looking at the length-weight relationship in fish, so y = weight and x = length. What I want to examine specifically is whether the length-weight relationship differs between populations of the same species. I've run the model as:
weight = length * population
However, I have also been reading a lot about centering data in regression models. It seems to make no sense to grand-mean centre length for this analysis, as I'm specifically interested in the differences in the L-W relationship between the groups, but should I group-centre length? Or not centre at all?
Any help or pointers greatly appreciated.
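One way to see what grand-mean centering does here is to fit the model both ways on simulated data and compare coefficients. In the sketch below, the data, the population labels, and the slopes are all invented for illustration. Subtracting a single constant from length should leave the interaction term (the between-population difference in slope) unchanged and only re-express the intercept and the population main effect. Group-mean centering subtracts a different constant per population, so it is a different model and is not shown.

```r
# Simulated fish data: two hypothetical populations with different slopes.
set.seed(7)
n <- 40
fish <- data.frame(
  population = factor(rep(c("A", "B"), each = n)),
  length     = runif(2 * n, min = 10, max = 30)
)
slope <- ifelse(fish$population == "A", 2.0, 2.6)
fish$weight <- 5 + slope * fish$length + rnorm(2 * n, sd = 3)

# Grand-mean centre length (one constant subtracted from every fish).
fish$length_c <- fish$length - mean(fish$length)

raw      <- lm(weight ~ length * population, data = fish)
centered <- lm(weight ~ length_c * population, data = fish)

# The slope and interaction terms should match across the two fits;
# the intercept and population main effect are re-expressed.
coef(raw)
coef(centered)
```

In the centered fit, the population coefficient is the difference in predicted weight at the overall mean length rather than at length zero.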
Binary logistic regression: significance without an increase in Overall Percentage?
I have some data with two independent variables (IVs) and one dependent variable. I'm using SPSS and my IVs have an interaction. My results are below.
I don't have a stats background and am new to logistic regression (LG), so I'm not sure how to interpret my results. Specifically, as I highlight below, the data seems to show significance (χ2(1) = 7.737, p = .005), but the Overall Percentage for the model is the same as for the null model (60.0)?
Am I doing something wrong or can binary LG show significance in the data without a bump in Overall Percentage?
Unbalanced training samples for binary classification (90% vs 10%) - Tensorflow
I have a training sample of 100,000 rows with 5 features (90,000 classified as '0' and the rest classified as '1').
I am getting 98% accuracy, but precision and recall are only 55%.
Any suggestions for improving precision and recall using TensorFlow?
# Loss function after sigmoid applied on yy_
loss = tf.losses.log_loss(yy_, scores, scope="loss")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=.01)
train_op = optimizer.minimize(loss)
prediction = (scores > 0.5)
Logistic Regression Predicted Values Are Clustered
I have built a logistic regression model (using MSFT Machine Learning studio). The overall model seems to be decent.
However, when I use the predictive functionality of the web platform to see how it does, the values are not as evenly distributed as I saw in the training/testing phase. I did a rough check on the training data and data I am using to do predictions and there is not a significant difference.
Linear Optimization R
I am new to optimization so please bear with me. Here is my problem:
A, B, C, D and E are percentages of x (18%, 2%, 1%, 78%, 1%).
Maximize sum(A(x) + B(x) + C(x) + D(x) + E(x)), i.e. maximize x (x <= 499572), subject to:
- A(x) <= 20076
- B(x) <= 8619
- C(x) <= 145
- D(x) <= 465527
- E(x) <= 5205
How do I frame this problem in R?
I was using the lpSolve package but I am OK with any suggestions.
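One reading of the problem, stated here as an assumption: A(x) to E(x) are fixed shares of a single quantity x (for example A(x) = 0.18 * x), each share has its own cap, and x itself is capped at 499572. Under that reading there is only one decision variable, so the largest feasible x is simply the tightest of the bounds. The sketch below computes that directly and then sets up the same problem with `lpSolve::lp()` if the package is installed.

```r
shares <- c(A = 0.18, B = 0.02, C = 0.01, D = 0.78, E = 0.01)
caps   <- c(A = 20076, B = 8619, C = 145, D = 465527, E = 5205)
x_cap  <- 499572

# With a single variable, the LP reduces to the tightest upper bound on x.
bounds  <- c(caps / shares, total = x_cap)
x_max   <- min(bounds)
binding <- names(bounds)[which.min(bounds)]
x_max    # about 14500
binding  # "C"

# The same problem via lpSolve, if installed.
if (requireNamespace("lpSolve", quietly = TRUE)) {
  sol <- lpSolve::lp(
    direction    = "max",
    objective.in = 1,                              # shares sum to 1, so the objective is x
    const.mat    = matrix(c(shares, 1), ncol = 1), # one row per constraint
    const.dir    = rep("<=", 6),
    const.rhs    = c(caps, x_cap)
  )
  sol$solution
}
```

Under this interpretation the cap on C (145 at a 1% share) is what limits x. If the shares are meant to be decision variables rather than fixed percentages, the formulation would need more than one variable.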
how to get sub-optimal solution in changing set condition
Suppose I solve a set problem using binary integer programming, and the set then changes slightly. Can I find a sub-optimal solution for the changed set without running binary integer programming again? If so, how?
Is Microsoft Solver Foundation discontinued?
We have a linear programming problem. Currently we solve it with the Simplex method in our .NET desktop application.
We are planning to use Microsoft Solver Foundation in our application.
With reference to this link, we have the following questions:
- Is the product still active?
- Does the product have enterprise level support for active users?
- We have seen that the latest update on NuGet was in January 2017. Is Microsoft planning any new release in the near future?
It would be great if anyone could provide any additional information or pointers regarding the product.