What's the difference between fitting a model with feature X and a model that fits with X but then zeros it out?
Say I have many features for fitting a regression model. The features can be categorized into two classes, X and Y, and I'd like to know how X affects the labels. I wonder about the difference between two methods:

1. Fitting the model with X directly, leaving Y out;
2. Fitting with both X and Y included, but after fitting, zeroing out the coefficients of Y and looking at the coefficients of X.

What's the difference between the two? What are they modeling, respectively? Which should I take? Thanks.
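To make the two methods concrete, here is a minimal sketch with made-up data and coefficients. When X and Y are correlated, the X-only fit estimates the marginal association of X with the labels (part of Y's effect leaks into X's coefficients), while the joint fit estimates the partial effect of X with Y held fixed; zeroing Y's coefficients after the joint fit gives predictions that correspond to neither model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 1000

# X and Y are correlated by construction; all numbers here are made up.
X = rng.normal(size=(n, 2))
Y = X @ np.array([[0.5], [0.3]]) + rng.normal(size=(n, 1))
labels = X @ np.array([1.0, 2.0]) + 0.7 * Y[:, 0] + rng.normal(size=n)

# Method 1: fit with X only. X's coefficients absorb part of Y's effect.
m1 = LinearRegression().fit(X, labels)

# Method 2: fit with X and Y jointly, then look only at X's coefficients.
# These estimate X's effect with Y held fixed.
m2 = LinearRegression().fit(np.hstack([X, Y]), labels)

print("X-only coefficients:", m1.coef_)
print("joint-fit X coefficients:", m2.coef_[:2])
```

With this setup the two sets of X coefficients differ noticeably, which is exactly the gap between marginal and partial effects.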
See also questions close to this topic

How to handle the test data for predictions after model deployment when we use the string indexer or normalisation?
I picked the US census data (the Adult dataset) from the UCI Machine Learning Repository and tried to predict income from it. I used Python and scikit-learn to create the model. Since it takes numeric data, I used a string indexer on all the columns that were categorical in nature, like native country, workclass, etc. Then I created a machine learning model using a decision tree. I have deployed the model, but now I want to predict on test data that was never processed through the same pipeline as the training data. I am not sure whether, if I run the string indexer again, it will assign different IDs and my predictions will be wrong. How should I handle this? I hope this makes sense and is no longer too broad.
Below is my previous question, which was marked as too broad, although I think it was quite valid and specific as a design question:
I am trying to predict from a model that I created using a string indexer on many categorical columns, and I also applied decimal scaling on a few columns. I'm not sure how I can use this model for prediction. Do I need to apply the same string indexer to the test data and do the same decimal scaling on those columns before I send the test data to the model, or is there an alternative or easier way to do so?
Thanks for your response in advance.
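One common way to handle this, sketched below with assumed column names from the Adult dataset (this is not the asker's original code), is to put the indexer and the model in a single scikit-learn Pipeline. The encoder is fitted once on the training data, and the same fitted encoder is reused when you call predict on new data, so the category IDs stay consistent.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Tiny made-up training and test frames standing in for the Adult data.
train = pd.DataFrame({"workclass": ["Private", "State-gov", "Private"],
                      "native-country": ["US", "India", "US"]})
y_train = [0, 1, 0]
test = pd.DataFrame({"workclass": ["State-gov"],
                     "native-country": ["US"]})

# Fit the encoder on the training data only; unseen categories at predict
# time are mapped to -1 (requires scikit-learn >= 0.24).
pipe = Pipeline([
    ("encode", OrdinalEncoder(handle_unknown="use_encoded_value",
                              unknown_value=-1)),
    ("tree", DecisionTreeClassifier(random_state=0)),
])
pipe.fit(train, y_train)

# The test data goes through the SAME fitted encoder, so IDs match.
pred = pipe.predict(test)
print(pred)
```

Persisting the whole pipeline (e.g. with joblib) then carries the fitted encoder along with the model into deployment.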

Training model with large range for target variable
I am training a model using R and have a specific issue with a target variable whose values vary from 10^n to 10^m (6 orders of magnitude). I took a log transform before training the model, and the relative error [(pred/actual - 1) * 100] was reasonable (within +/- 7%), but when I transform the predictions back to the original scale, the relative error gets magnified (in some cases to ~40%). I want the error on the original scale to be within +/- 10-15%.
I tried a similar approach on an output variable that does not have such a large range, and the error is much better there on the original scale.
I am wondering if anyone has faced the same problem before and resolved it successfully. Any tips would be appreciated.
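The magnification is expected: an additive error in log space becomes a multiplicative factor after exponentiating back. A small numeric sketch with made-up values (shown in Python rather than R for brevity):

```python
import numpy as np

actual = 1e6
log_pred = np.log(actual) * 1.05   # only a 5% relative error in log space
pred = np.exp(log_pred)            # back-transform to the original scale
rel_err = (pred / actual - 1) * 100
print(rel_err)                     # roughly 100% error on the original scale
```

This is why a target spanning many orders of magnitude can look fine in log space yet miss badly after back-transformation; evaluating (or directly optimizing) the loss on the original scale avoids the surprise.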

Mongoose model architecture and Node require order
I am having trouble understanding how I should structure my Mongoose model files so that I don't run into dependency issues when I require them. I have a lot of model files, and when I require one model file inside another to use as a reference type, I get errors unless I require the model files in a particular order. What should I do differently, architecturally, if I can't load them in a linear order because the dependencies run both up and down the list of model files? Do I just have to juggle them down the line and hope I don't paint myself into a corner, or am I doing this wrong?
app.js (top of my app):
const path = require('path');
const bodyParser = require('body-parser');
const mongoose = require('mongoose');
const express = require('express');

require('./models/task');
require('./models/taskstory');
require('./models/majortask');
require('./models/majortaskinstance');
require('./models/client');
require('./models/discipline');
require('./models/disciplineinstance');
require('./models/estimate');
require('./models/project');
require('./models/user');

const taskstoryRoutes = require('./routes/taskstory');
const majortaskRoutes = require('./routes/majortask');
const taskRoutes = require('./routes/task');
const userRoutes = require("./routes/user");
const clientRoutes = require("./routes/client");
const projectRoutes = require("./routes/project");
const estimateRoutes = require('./routes/estimate');
const disciplineRoutes = require('./routes/discipline');

const app = express();
And an example model, estimate.js:
const mongoose = require('mongoose');
const disciplineinstanceschema = require('mongoose').model('disciplineInstance').schema;
var ObjectId = require('mongoose').Types.ObjectId;

const estimateScheme = mongoose.Schema({
  estimatename: { type: String, required: true, unique: true },
  summarytext: { type: String },
  scopetext: { type: String },
  disciplines: { type: [disciplineinstanceschema], sparse: true }
});

module.exports = mongoose.model('Estimate', estimateScheme);

Data perturbation: how to perform it?
I am doing some projects related to statistics simulation using R, based on "Introduction to Scientific Programming and Simulation Using R". In the student projects section (chapter 24) I am working on "The pipe spiders of Brunswick" problem, but I am stuck on one part of an evolutionary algorithm, where you need to perform some data perturbation according to the sentence below:
"With probability 0.5 each element of the vector is perturbed, independently of the others, by an amount normally distributed with mean 0 and standard deviation 0.1"
What does being "perturbed" really mean here? I don't really know which operation I should perform on my vector to make this perturbation happen, and I'm not finding any answers to this problem. Thanks in advance!
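The sentence means: for each element independently, with probability 0.5 add a draw from a normal distribution with mean 0 and standard deviation 0.1 to that element. The book works in R; a sketch of that one step in Python/NumPy (with a made-up vector) looks like:

```python
import numpy as np

rng = np.random.default_rng(42)
v = np.array([1.0, 2.0, 3.0, 4.0])

# Each element is perturbed with probability 0.5, independently of the
# others, by adding a draw from N(mean=0, sd=0.1).
mask = rng.random(v.size) < 0.5
noise = rng.normal(loc=0.0, scale=0.1, size=v.size)
perturbed = v + mask * noise

print(perturbed)
```

Elements where the coin flip fails are left exactly as they were; the others move by a small Gaussian amount.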

Python: Weighted coefficient of variation
How can I calculate the weighted coefficient of variation (CV) over a NumPy array in Python? It's okay to use any popular third-party Python package for this purpose.
I can calculate the (unweighted) CV using scipy.stats.variation, but it's not weighted:

import numpy as np
from scipy.stats import variation

arr = np.arange(-5, 5)
weights = np.arange(9, -1, -1)  # Same size as arr

cv = abs(variation(arr))  # Isn't weighted
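One way to get a weighted CV, assuming the weights are meant as frequency-style weights (other weighting conventions exist), is to compute the weighted mean and weighted standard deviation by hand with np.average:

```python
import numpy as np

arr = np.arange(-5, 5).astype(float)
weights = np.arange(9, -1, -1).astype(float)  # same size as arr

# Weighted mean, then weighted variance around that mean.
w_mean = np.average(arr, weights=weights)
w_var = np.average((arr - w_mean) ** 2, weights=weights)

# CV = weighted std / |weighted mean|
weighted_cv = abs(np.sqrt(w_var) / w_mean)
print(weighted_cv)
```

Note the CV is undefined when the weighted mean is zero, so this only makes sense for data away from zero.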

PCA results on imbalanced data with duplicates
I am using scikit-learn's IncrementalPCA decomposition and am surprised that if I delete duplicates from my dataset, the result differs from the one on the "unclean" data.
What is the reason? I would have thought the variance is the same.
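The variance generally is not the same: deleting duplicate rows changes the empirical distribution, so the mean, the covariance, and therefore the principal components can all legitimately shift. A tiny made-up example:

```python
import numpy as np

X = np.array([[0.0, 0.0],
              [10.0, 10.0],
              [10.0, 10.0],   # duplicate row
              [10.0, 10.0]])  # duplicate row
X_dedup = np.unique(X, axis=0)

# Duplicates act like extra weight on their point, pulling the mean and
# variance toward it; after deduplication both statistics change.
print(X.var(axis=0))
print(X_dedup.var(axis=0))
```

Since PCA diagonalizes the sample covariance, a dataset where duplicates carry extra weight and one where each distinct row counts once are simply different inputs, so different components are expected.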