Mapping a set of input variables to a single output variable; also optimization of inputs (e.g. car inputs to fuel economy)
I'm wondering what the state-of-the-art methods are for learning the mapping from a set of input variables to a single output. An example is mapping the inputs of a car to its fuel efficiency, where the data set contains these variables:
- Time and Date
- Speed (km/h)
- Car tire pressure (psi)
- Fuel Economy (Liters/gallon)
(I understand the data set is lacking and flawed, but this is just an example.)
Most cases use multiple linear regression to do this.
A basic neural network with an input, hidden, and output layer could also be used to solve this problem. The input is an m x n array (where m is the number of data points and n is the number of variables; 3 in the case above), and the output is a single variable representing fuel economy. You then compare the predicted fuel economy to the actual value and optimize the neural network via backpropagation. However, I learned this from a relatively old source (2013), and I was wondering whether there have been new advancements in solving this kind of problem.
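As a concrete baseline, the multiple-linear-regression setup described above can be sketched in a few lines of NumPy (the data here is randomly generated and purely illustrative; the three columns stand in for time, speed, and tire pressure):

```python
import numpy as np

rng = np.random.default_rng(0)

# m data points, n = 3 input variables (e.g. time, speed, tire pressure)
m, n = 200, 3
X = rng.normal(size=(m, n))
true_w = np.array([0.5, -1.2, 0.3])            # invented "ground truth" weights
y = X @ true_w + 0.1 * rng.normal(size=m)      # single output: fuel economy

# Least-squares fit of y ~ X w + b
Xb = np.hstack([X, np.ones((m, 1))])           # append a bias column
coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
w_hat, b_hat = coef[:n], coef[n]

print(w_hat.round(2))
```

With enough data and small noise, the recovered weights come out close to the ones used to generate the data.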
Can RNNs (LSTMs) be used to solve this? If the data is measured every minute, for example, I have reason to believe that the current fuel economy is affected by the previous car inputs (t-1, t-2, etc.).
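Short of a full LSTM, one common way to capture the t-1, t-2, ... dependence, assuming regularly sampled (e.g. per-minute) data, is to append lagged copies of the inputs as extra features and feed them to any ordinary regressor; a sketch:

```python
import numpy as np

def add_lags(X, n_lags):
    """Stack each row with its n_lags previous rows as extra features.

    X has shape (m, n); the result has shape (m - n_lags, n * (n_lags + 1)),
    with the current step first, then t-1, then t-2, ...
    """
    m = X.shape[0]
    blocks = [X[n_lags - k : m - k] for k in range(n_lags + 1)]
    return np.hstack(blocks)

X = np.arange(12, dtype=float).reshape(6, 2)   # 6 time steps, 2 variables
X_lagged = add_lags(X, n_lags=2)
print(X_lagged.shape)  # (4, 6): current step plus two lagged copies
```

The first output row is [4, 5, 2, 3, 0, 1]: time step t=2 followed by steps t=1 and t=0.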
I also learned that I can use evolutionary algorithms to optimize this sort of problem (e.g., what's the fastest I can go while keeping a fuel economy of 50 L/gal?). What are the most popular evolutionary algorithms for this kind of problem? Could anyone recommend papers or other material that could help me learn more about this?
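As a toy illustration of the evolutionary idea, here is a minimal (1+1) evolution strategy maximizing speed under a fuel budget; the quadratic fuel model and all constants are invented for the example:

```python
import random

random.seed(0)

def fuel_use(speed):
    # Hypothetical fuel model: consumption grows quadratically with speed.
    return 2.0 + 0.005 * speed ** 2

def fitness(speed):
    # Maximize speed, heavily penalizing solutions that exceed the fuel budget.
    penalty = 1000.0 * max(0.0, fuel_use(speed) - 50.0)
    return speed - penalty

speed = 10.0
for _ in range(2000):                  # (1+1)-ES: mutate, keep if not worse
    child = speed + random.gauss(0.0, 1.0)
    if child >= 0.0 and fitness(child) >= fitness(speed):
        speed = child

# For this toy model the optimum is where fuel_use(s) = 50, i.e. s ~ 98
print(round(speed, 1))
```

Population-based methods (genetic algorithms, CMA-ES, NSGA-II for multi-objective versions) follow the same mutate-and-select loop with more candidates per generation.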
See also questions close to this topic
cvxpy reports a PSD matrix as not convex
I ran into a strange issue today with the cvxpy package (https://www.cvxpy.org, version 1.0).
1) Say I have the covariance matrix Sigma below:
          0         1         2         3         4
0  0.000063 -0.000018  0.000033  0.000027  0.000008
1 -0.000018  0.000073 -0.000021 -0.000023 -0.000024
2  0.000032 -0.000022  0.000093  0.000029  0.000013
3  0.000027 -0.000023  0.000030  0.000064  0.000016
4  0.000008 -0.000025  0.000013  0.000015  0.000089
2) This matrix is positive semidefinite (PSD) because all eigenvalues are positive:
pdb> np.linalg.eigvals(Sigma)
array([1.63363083e-04, 8.51035789e-05, 3.60510844e-05, 4.48872115e-05,
       5.32619175e-05])
I double-checked the above using R.
3) However, cvxpy reports that w' * Sigma * w is not convex!
pdb> cvx.quad_form(w, Sigma).is_convex()
False
pdb> cvx.quad_form(w, Sigma).is_nonneg()
False
I must have misunderstood something...your help is appreciated!
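One possible culprit (an assumption, not a confirmed diagnosis): the printed Sigma is not exactly symmetric (e.g. entry (0, 2) is 0.000033 but (2, 0) is 0.000032), and cvx.quad_form can only certify convexity for a symmetric PSD matrix. A quick NumPy check, re-typing the printed values:

```python
import numpy as np

Sigma = np.array([
    [ 0.000063, -0.000018,  0.000033,  0.000027,  0.000008],
    [-0.000018,  0.000073, -0.000021, -0.000023, -0.000024],
    [ 0.000032, -0.000022,  0.000093,  0.000029,  0.000013],
    [ 0.000027, -0.000023,  0.000030,  0.000064,  0.000016],
    [ 0.000008, -0.000025,  0.000013,  0.000015,  0.000089],
])

print(np.array_equal(Sigma, Sigma.T))   # False: the printed matrix is asymmetric

Sigma_sym = (Sigma + Sigma.T) / 2       # symmetrize before passing to quad_form
print(np.all(np.linalg.eigvalsh(Sigma_sym) > 0))  # True: PSD after symmetrizing
```

If the asymmetry is just display rounding, symmetrizing the matrix before calling quad_form would be harmless; if the underlying matrix really is asymmetric, that alone would explain the convexity complaint.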
Generating a value in a step function based on a variable
I'm creating an optimization model using Pyomo (solved with Gurobi) and I am having trouble with one of my constraints. The constraint is used to establish the quantity and is based on supply and demand curves. The supply curve causes the problem, as it is a step curve. As seen in the code, the trouble is in the def MC section.
import pyomo.environ as pyo

Demand_Curve1_const = 250
Demand_Curve1_slope = -0.025
MC_water = 0
MC_gas = 80
MC_coal = 100
CAP_water = 5000
CAP_gas = 2500
CAP_coal = 2000

model = pyo.ConcreteModel()
model.Const_P1 = pyo.Param(initialize=Demand_Curve1_const)
model.slope_P1 = pyo.Param(initialize=Demand_Curve1_slope)
model.MCW = pyo.Param(initialize=MC_water)
model.MCG = pyo.Param(initialize=MC_gas)
model.MCC = pyo.Param(initialize=MC_coal)
model.CW = pyo.Param(initialize=CAP_water)
model.CG = pyo.Param(initialize=CAP_gas)
model.CC = pyo.Param(initialize=CAP_coal)

model.qw = pyo.Var(within=pyo.NonNegativeReals)
model.qg = pyo.Var(within=pyo.NonNegativeReals)
model.qc = pyo.Var(within=pyo.NonNegativeReals)
model.d = pyo.Var(within=pyo.NonNegativeReals)

def MC():
    if model.d <= 5000:
        return model.MCW
    if model.d >= 5000 and model.d <= 7500:
        return model.MCG
    if model.d >= 7500:
        return model.MCC

def Objective(model):
    return (model.Const_P1 * model.d + model.slope_P1 * model.d * model.d
            - (model.MCW * model.qw + model.MCG * model.qg + model.MCC * model.qc))

model.OBJ = pyo.Objective(rule=Objective, sense=pyo.maximize)

def P1inflow(model):
    return (MC == model.Const_P1 + model.slope_P1 * model.d * 2)

model.C1 = pyo.Constraint(rule=P1inflow)
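For reference, here is the step curve the def MC section seems to be aiming for, written for plain numbers. Note that branching on a Pyomo Var with if statements is evaluated once at model-build time, so inside the model this would need a Piecewise component or binary variables instead; this sketch only shows the intended curve:

```python
def marginal_cost(d):
    """Step supply curve: cheapest source first (thresholds from the model)."""
    if d <= 5000:
        return 0      # water
    if d <= 7500:
        return 80     # gas
    return 100        # coal

print(marginal_cost(3000), marginal_cost(6000), marginal_cost(9000))  # 0 80 100
```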
Is there a faster way to generate the required output than using a one-to-many join in PROC SQL?
I require an output that shows the total number of hours worked in a rolling 24 hour window. The data is currently stored such that each row is one hourly slot (for example 7-8am on Jan 2nd) per person and how much they worked in that hour stored as "Hour". What I need to create is another field that is the sum of the most recent 24 hourly slots (inclusive) for each row. So for the 7-8am example above I would want the sum of "Hour" across the 24 rows: Jan 1st 8-9am, Jan 1st 9-10am... Jan 2nd 6-7am, Jan 2nd 7-8am.
Rinse and repeat for each hourly slot.
There are 6000 people, and we have 6 months of data, which means the table has 6000 * 183 days * 24 hours = 26.3m rows.
I am currently doing this using the code below, which works on a sample of 50 people very easily, but grinds to a halt when I try it on the full table, somewhat understandably.
Does anyone have any other ideas? All date/time variables are in datetime format.
proc sql;
  create table want as
  select x.*
       , case when Hours_Wrkd_In_Window > 16 then 1 else 0 end as Correct
  from (
        select a.ID
             , a.Start_DTTM
             , a.End_DTTM
             , sum(b.hours) as Hours_Wrkd_In_Window
        from have a
        left join have b
          on a.ID = b.ID
         and b.start_dttm > a.start_dttm - (24 * 60 * 60)
         and b.start_dttm <= a.start_dttm
        where datepart(a.Start_dttm) >= &report_start_date.
          and datepart(a.Start_dttm) < &report_end_date.
        group by ID
               , a.Start_DTTM
               , a.End_DTTM
       ) x
  order by x.ID
         , x.Start_DTTM
;quit;
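Algorithmically, the self-join recomputes a sum that can be maintained incrementally: per person, in time order, a rolling sum over the last 24 slots is O(n) rather than O(n * 24). A Python sketch of the idea (in SAS this would correspond to a DATA step with a retained queue rather than PROC SQL):

```python
from collections import deque

def rolling_24h_sums(hours):
    """hours: one person's hourly 'Hour' values in time order.
    Returns, for each slot, the sum of the most recent 24 slots (inclusive)."""
    window, total, out = deque(), 0.0, []
    for h in hours:
        window.append(h)
        total += h
        if len(window) > 24:          # drop the slot that fell out of the window
            total -= window.popleft()
        out.append(total)
    return out

sums = rolling_24h_sums([1] * 30)
print(sums[:3], sums[-1])  # [1.0, 2.0, 3.0] 24.0
```

Applied per ID over the 26.3m rows, this touches each row a constant number of times instead of joining each row to up to 24 others.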
Word2vec compact models
Are there any w2v models that do not require a dictionary? Everything I found in torchtext first wants to build the vocabulary (build_vocab). But since I have a huge corpus of text, I would like a model that works at the level of phrases without a fixed vocabulary, and I have not found one.
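One vocabulary-free technique (the idea behind fastText's subword hashing, sketched here with untrained random vectors just to show the mechanism) is to hash character n-grams into a fixed number of buckets, so no build_vocab pass over the corpus is needed:

```python
import hashlib

import numpy as np

DIM, BUCKETS = 16, 1000

rng = np.random.default_rng(0)
bucket_vectors = rng.normal(size=(BUCKETS, DIM))   # fixed random bucket table

def embed(phrase, n=3):
    """Vocabulary-free phrase embedding: hash character trigrams into buckets
    and average their vectors. In fastText these vectors would be trained."""
    text = f"<{phrase}>"
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    idx = [int(hashlib.md5(g.encode()).hexdigest(), 16) % BUCKETS for g in grams]
    return bucket_vectors[idx].mean(axis=0)

v1 = embed("machine learning")
print(v1.shape)  # (16,)
```

Any phrase, seen or unseen, maps to a vector; similar surface forms share trigram buckets and so get correlated vectors.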
supervised learning for parcours
For my school project I have to implement a neural network for a parcours. I know it's a toy problem, but I want the neural net to learn a simple algorithm:
if front right is bigger than front left -> go right, else -> go left.
I want to use supervised learning. I have 2 input neurons, 2 hidden neurons, and 1 output neuron. The goal is that when the player has to go left, the output is a number below 0.5, and when the player has to go right, the network returns a number greater than 0.5.
Somehow I made a mistake and the NN always tries to return 0.5. Does anyone know what I did wrong and what I can do now?
This is what the parcours looks like: (screenshot)
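For comparison, here is a minimal self-contained NumPy version of the 2-2-1 setup described above, trained with plain backpropagation on synthetic sensor pairs (this is a sketch of the architecture, not the asker's code):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Training data: inputs are [front_left, front_right] sensor readings;
# target 1 means "go right" (front right bigger), 0 means "go left".
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 1] > X[:, 0]).astype(float).reshape(-1, 1)

# 2-2-1 network trained with full-batch backpropagation
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)
lr = 1.0
for _ in range(20000):
    h = sigmoid(X @ W1 + b1)        # hidden layer (2 neurons)
    out = sigmoid(h @ W2 + b2)      # output in (0, 1): > 0.5 means "go right"
    d_out = out - y                 # cross-entropy gradient through the sigmoid
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(axis=0)

accuracy = ((out > 0.5) == (y > 0.5)).mean()
print(round(accuracy, 2))
```

The rule is linearly separable, so even this tiny network learns it to high accuracy; an output stuck at exactly 0.5 often points at a bug in the backward pass or at weights that never receive updates.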
Categorical Variables and too many NA for ML model
We have a data set of 250 variables and 50,000 records. One variable is numeric, 248 variables are categorical, and one variable is binary (the target variable). Each categorical variable has more than 3,000 levels, and we have many NAs. Each row is the record of the diseases a patient has suffered, which is why there are so many NAs: one patient may have suffered 100 diseases while another has suffered only one. The objective is to predict whether a patient can have a specific disease from the information about the other diseases they have suffered. How can this data set be handled in machine learning?
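One common reshaping for this kind of data (assuming the 248 categorical columns are interchangeable "disease slots" rather than distinct attributes) is a multi-hot encoding: one binary indicator per disease code, so NAs simply become absent entries instead of missing values. A sketch with made-up codes:

```python
# Hypothetical rows: each patient's categorical columns hold disease codes,
# with None standing in for the NA "empty slots".
patients = [
    ["flu", "asthma", None, None],
    ["asthma", None, None, None],
    ["flu", "diabetes", "asthma", None],
]

# Build one multi-hot vector per patient: NA slots just disappear.
codes = sorted({c for row in patients for c in row if c is not None})
index = {c: i for i, c in enumerate(codes)}

def multi_hot(row):
    vec = [0] * len(codes)
    for c in row:
        if c is not None:
            vec[index[c]] = 1
    return vec

print([multi_hot(r) for r in patients])  # [[1, 0, 1], [1, 0, 0], [1, 1, 1]]
```

With 3,000+ real codes, the same idea is usually stored as a sparse matrix rather than dense lists.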
Should Pytorch Optimizer be extended for weight optimization by genetic algorithm?
I am reading a paper about the use of evolutionary algorithms for optimizing the weights of a neural network: https://eplex.cs.ucf.edu/papers/morse_gecco16.pdf. Such an approach is an alternative to stochastic gradient descent/backpropagation. Note also that this is different from using evolutionary algorithms to optimize the hyperparameters or architecture of a neural network.
I am using PyTorch, where the schematic code looks like this:
encoder_optimizer = optim.SGD(encoder.parameters(), lr=learning_rate)
encoder_optimizer.zero_grad()
loss = train(input_tensor, target_tensor, encoder, decoder,
             encoder_optimizer, decoder_optimizer, criterion)
loss.backward()
encoder_optimizer.step()
So there is a predefined optimizer, SGD, that takes as input all the weights and biases (w's and b's) of the encoder neural network, observes via the train function how those parameters affect the loss function, and then adjusts those parameters.
It does this by stochastic gradient descent. My task, however, would be to adjust those parameters (to find a new set of values for the next iteration) by creating a new generation of parameter values with an evolutionary algorithm.
So, architecturally, is my intention to extend the PyTorch Optimizer with my evolutionary-optimizer class a good one? I am studying the documentation at https://pytorch.org/docs/stable/optim.html, and it says that the gradient is provided as input for the optimization. That is clear, but this gradient is not needed for my evolutionary optimizer. Maybe it is architecturally wrong to simply ignore the provided gradient, and I should create a completely new class for my evolutionary optimizer? I am afraid of introducing unintended consequences by extending a class for a completely different optimization approach.
As far as I can tell, no one has implemented an evolutionary optimizer for PyTorch so far.
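On the architectural question: an evolutionary optimizer consumes no gradients at all, so the core loop can be written independently of torch.optim; it only needs a way to get/set the flattened weights and evaluate the loss. A NumPy sketch of a (mu + lambda) evolution strategy over a weight vector, with a toy loss standing in for the network's loss:

```python
import numpy as np

rng = np.random.default_rng(0)

def loss(w):
    # Stand-in for the network's loss as a function of its flattened weights.
    return float(np.sum((w - 3.0) ** 2))

# (mu + lambda) evolution strategy over the weight vector: no gradients used,
# which is why an EA "optimizer" has no need for the .grad fields SGD reads.
pop = [rng.normal(size=5) for _ in range(20)]
for _ in range(200):
    parents = sorted(pop, key=loss)[:5]                    # mu = 5 survivors
    children = [p + 0.1 * rng.normal(size=5)
                for p in parents for _ in range(3)]        # lambda = 15 children
    pop = parents + children

best = min(pop, key=loss)
print(round(loss(best), 4))
```

Wrapping this loop in a torch.optim.Optimizer subclass is possible (ignoring the gradients), but since the Optimizer contract assumes per-step gradient consumption, a standalone class that owns the population and writes the best candidate back into the model's parameters is arguably the cleaner fit.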
pinv(H) is not equal to pinv(H'*H)*H'
I'm testing the y = SinC(x) function with a single-hidden-layer feedforward neural network (SLFN) with 20 neurons.
With an SLFN, the output weights (OW) of the output layer can be described by

OW = pinv(H)*T

and, after adding a regularization parameter,

OW = pinv(I/gamma + H'*H)*H'*T

As gamma -> Inf, pinv(H'*H)*H'*T == pinv(H)*T, and likewise pinv(H'*H)*H' == pinv(H).
But when I try to calculate pinv(H), I find a huge difference between these two once the number of neurons is over 5 (with 5 or fewer, they are equal or almost the same). For example, when cond(H) = 21137561386980.3 and rank(H) = 10:
H = [0.736251410036783 0.499731137079796 0.450233920602169 0.296610970576716 0.369359425954153 0.505556211442208 0.502934880027889 0.364904559142718 0.253349959726753 0.298697900877265;
     0.724064281864009 0.521667364351399 0.435944895257239 0.337878535128756 0.364906002569385 0.496504064726699 0.492798607017131 0.390656915261343 0.289981152837390 0.307212326718916;
     0.711534656474153 0.543520341487420 0.421761457948049 0.381771374416867 0.360475582262355 0.487454209236671 0.482668250979627 0.417033287703137 0.329570921359082 0.315860145366824;
     0.698672860220896 0.565207057974387 0.407705930918082 0.427683127210120 0.356068794706095 0.478412571446765 0.472552121296395 0.443893207685379 0.371735862991355 0.324637323886021;
     0.685491077062637 0.586647027111176 0.393799811411985 0.474875155650945 0.351686254239637 0.469385056318048 0.462458480695760 0.471085139463084 0.415948455902421 0.333539494486324;
     0.672003357663056 0.607763454504209 0.380063647372632 0.522520267708374 0.347328559602877 0.460377531907542 0.452395518357816 0.498449772544129 0.461556360076788 0.342561958147251;
     0.658225608290477 0.628484290731116 0.366516925684188 0.569759064961507 0.342996293691614 0.451395814182317 0.442371323528726 0.525823695636816 0.507817005881821 0.351699689941632;
     0.644175558300583 0.648743139215935 0.353177974096445 0.615761051907079 0.338690023332811 0.442445652121229 0.432393859824045 0.553043275759248 0.553944175102542 0.360947346089454;
     0.629872705346690 0.668479997764613 0.340063877672496 0.659781468051379 0.334410299080102 0.433532713184646 0.422470940392161 0.579948548513999 0.599160649563718 0.370299272759337;
     0.615338237874436 0.687641820315375 0.327190410302607 0.701205860709835 0.330157655029498 0.424662569229062 0.412610204098877 0.606386924575225 0.642749594844498 0.379749516620049];

T = [-0.806458764562879 -0.251682808380338 -0.834815868451399 -0.750626822371170 0.877733363571576 1 -0.626938984683970 -0.767558933097629 -0.921811074815239 -1]';
There is a huge difference between the two results:

pinv(H'*H)*H'*T = [-4803.39093243484 3567.08623820149 668.037919243849 5975.10699147077 1709.31211566970 -1328.53407325092 -1844.57938928594 -22511.9388736373 -2377.63048959478 31688.5125271114]';
pinv(H)*T = [-19780274164.6438 -3619388884.32672 -76363206688.3469 16455234.9229156 -135982025652.153 -93890161354.8417 283696409214.039 193801203.735488 -18829106.6110445 19064848675.0189]'.
I also find that if I round H, the two expressions return the same answer. So I guess one of the reasons might be floating-point calculation issues inside MATLAB: since cond(H) is large, any small change in H may result in a large difference in the (pseudo)inverse of H. I therefore think the round function may not be a good option for testing this. As Cris Luengo mentioned, with a large cond, numerical imprecision affects the accuracy of the inverse.
In my test, I use 1000 training samples with noise between [-0.2, 0.2], and the test samples are noise-free; 20 neurons are selected. OW = pinv(H)*T gives reasonable results for SinC training, while the performance of OW = pinv(H'*H)*H'*T is worse. I then tried to increase the precision with pinv(vpa(H'*H)), but there was no significant improvement.
Does anyone know how to solve this?
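The gap between the two formulas can be reproduced in NumPy (a synthetic H standing in for the hidden-layer output matrix): forming H'*H squares the condition number, so with cond(H) around 1e12, cond(H'*H) is around 1e24, far beyond double precision, and pinv(H'*H)*H' breaks down while pinv(H) is computed from H directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build an ill-conditioned 10 x 10 matrix with cond(H) ~ 1e12 by prescribing
# its singular values; U and V are random orthogonal factors.
U, _ = np.linalg.qr(rng.normal(size=(10, 10)))
V, _ = np.linalg.qr(rng.normal(size=(10, 10)))
s = np.logspace(0, -12, 10)
H = U @ np.diag(s) @ V.T
T = rng.normal(size=10)

w_direct = np.linalg.pinv(H) @ T                 # pinv(H)*T
w_normal = np.linalg.pinv(H.T @ H) @ H.T @ T     # pinv(H'*H)*H'*T

print(f"cond(H) = {np.linalg.cond(H):.2e}")
print(np.allclose(w_direct, w_normal))           # False: they disagree badly
```

In double precision the tiny singular values of H'*H are pure round-off, so they are either truncated or inverted into garbage; this is the standard argument for avoiding the normal equations (or using a backslash/lstsq solve) when cond is large.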
Result of Linear regression in WEKA
I'm working with the WEKA program. I have a data set containing 22 attributes and 500 rows, and I want to apply linear regression to it. When I run linear regression, the results (correlation coefficient, etc.) appear individually for each attribute (e.g., attribute p1 = .123, p2 = .098, etc.).
How can I apply linear regression so that the result appears as in the picture below? I mean I want CHB1, CHB2, CHB3, and CHB4 shown the way p1, p2, p3 are in the picture.
Is that possible in WEKA? If yes, what are the steps, please?
"Error in as.Formula(formula) : could not find function "as.Formula"" in my codee
While using the sfa analysis package, this error comes up: "Error in as.Formula(formula) : could not find function 'as.Formula'".
install.packages("sfa")
library(frontier)
library(sfa)

truncated_normal <- sfa(formula = lMILK ~ lLAND + lFEED + lCOWS + lLABOR,
                        data = cs)
Is it possible to stack graphs in R?
(Screenshot of the current R window.)
I have 4 line graphs - produced using AIC stepwise multiple regression in the olsrr package.
Is it possible to plot all 4 graphs in one? And then code to change the colours of the lines?
Integer, multi-objective optimization with Platypus (Python)
I am exploring the Platypus library for multi-objective optimization in Python. It appears to me that Platypus should support variables (optimization parameters) as integers out of the box. However, this simple problem (two objectives, three variables, no constraints, and Integer variables with SMPSO) fails:
from platypus import *

def my_function(x):
    """Some objective function"""
    return [-x ** 2 - x ** 2, x - x]

def AsInteger():
    problem = Problem(3, 2)  # define 3 decision variables and 2 objectives (no constraints)
    problem.directions[:] = Problem.MAXIMIZE
    int1 = Integer(-50, 50)
    int2 = Integer(-50, 50)
    int3 = Integer(-50, 50)
    problem.types[:] = [int1, int2, int3]
    problem.function = my_function

    algorithm = SMPSO(problem)
    algorithm.run(10000)
Traceback (most recent call last):
  File "D:\MyProjects\Drilling\test_platypus.py", line 62, in <module>
    AsInteger()
  File "D:\MyProjects\Drilling\test_platypus.py", line 19, in AsInteger
    algorithm.run(10000)
  File "build\bdist.win-amd64\egg\platypus\core.py", line 405, in run
  File "build\bdist.win-amd64\egg\platypus\algorithms.py", line 820, in step
  File "build\bdist.win-amd64\egg\platypus\algorithms.py", line 838, in iterate
  File "build\bdist.win-amd64\egg\platypus\algorithms.py", line 1008, in _update_velocities
TypeError: unsupported operand type(s) for -: 'list' and 'list'
Similarly, if I try to use another optimization technique in Platypus (CMAES instead of SMPSO):
Traceback (most recent call last):
  File "D:\MyProjects\Drilling\test_platypus.py", line 62, in <module>
    AsInteger()
  File "D:\MyProjects\Drilling\test_platypus.py", line 19, in AsInteger
    algorithm.run(10000)
  File "build\bdist.win-amd64\egg\platypus\core.py", line 405, in run
  File "build\bdist.win-amd64\egg\platypus\algorithms.py", line 1074, in step
  File "build\bdist.win-amd64\egg\platypus\algorithms.py", line 1134, in initialize
  File "build\bdist.win-amd64\egg\platypus\algorithms.py", line 1298, in iterate
  File "build\bdist.win-amd64\egg\platypus\core.py", line 378, in evaluate_all
  File "build\bdist.win-amd64\egg\platypus\evaluator.py", line 88, in evaluate_all
  File "build\bdist.win-amd64\egg\platypus\evaluator.py", line 55, in run_job
  File "build\bdist.win-amd64\egg\platypus\core.py", line 345, in run
  File "build\bdist.win-amd64\egg\platypus\core.py", line 518, in evaluate
  File "build\bdist.win-amd64\egg\platypus\core.py", line 160, in __call__
  File "build\bdist.win-amd64\egg\platypus\types.py", line 147, in decode
  File "build\bdist.win-amd64\egg\platypus\tools.py", line 521, in gray2bin
TypeError: 'float' object has no attribute '__getitem__'
I get other types of error messages with other algorithms (OMOPSO, GDE3), while NSGAIII, NSGAII, SPEA2, etc. appear to be working.
Has anyone ever encountered such issues? Maybe I am specifying the problem in the wrong way?
Thank you in advance for any suggestion.
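One thing worth double-checking (an observation, not a confirmed diagnosis): with three decision variables, Platypus passes the objective a list of three values, while my_function above treats x as a scalar. A list-based signature, runnable here without Platypus, would look like:

```python
def my_function(x):
    """Objective for 3 decision variables: x is the list [x1, x2, x3].
    Returns the two objective values (the formula itself is illustrative)."""
    x1, x2, x3 = x
    return [-x1 ** 2 - x2 ** 2, x3 - x1]

print(my_function([1, 2, 3]))  # [-5, 2]
```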
Genetic training and test (Unseen) set prediction using NEAT
I am trying to use NEAT-python to make predictions on a categorical variable (with response true or false represented as 1 and 0) given a particular set of predictor variables.
I have split the data into training and test sets (75% and 25% respectively). Genetic training takes place on the training set using NEAT up to a specified fitness level (set in the config file); thereafter, predictions are made on the test set.
My training data is split relatively equally between trues and falses. However, my true positive rate (TPR) is much higher than my true negative rate (TNR) on the test set (about 95% and 75% respectively). I have tried changing the fitness level, but the TNR does not seem to improve. I need to get the TNR to around 85%. Please assist by suggesting something that could make the model better at identifying the falses correctly.
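For reference, the two rates being compared can be computed directly from binary labels and predictions; a generic sketch independent of NEAT:

```python
def rates(y_true, y_pred):
    """True positive rate and true negative rate from 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return tp / pos, tn / neg

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1]
print(rates(y_true, y_pred))  # (0.6666666666666666, 0.3333333333333333)
```

One generic lever, not specific to NEAT, is moving the 0.5 decision threshold on the network output until the TPR/TNR trade-off lands where it is needed.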
My config file is below:
[NEAT]
# 80% fitness on the training set is used (specified at fitness_threshold)
fitness_criterion     = max
fitness_threshold     = 11522
pop_size              = 300
reset_on_extinction   = False

[DefaultGenome]
# node activation options
activation_default      = sigmoid
activation_mutate_rate  = 0.0
activation_options      = sigmoid tanh
# node aggregation options
aggregation_default     = sum
aggregation_mutate_rate = 0.0
aggregation_options     = sum
# node bias options
bias_init_mean          = 0.0
bias_init_stdev         = 1.0
bias_max_value          = 30.0
bias_min_value          = -30.0
bias_mutate_power       = 0.5
bias_mutate_rate        = 0.7
bias_replace_rate       = 0.1
# genome compatibility options
compatibility_disjoint_coefficient = 1.0
compatibility_weight_coefficient   = 0.5
# connection add/remove rates
conn_add_prob           = 0.5
conn_delete_prob        = 0.5
# connection enable options
enabled_default         = True
enabled_mutate_rate     = 0.01
feed_forward            = False
initial_connection      = full_direct
# node add/remove rates
node_add_prob           = 0.2
node_delete_prob        = 0.2
# network parameters
num_hidden              = 0
num_inputs              = 13
num_outputs             = 1
# node response options
response_init_mean      = 1.0
response_init_stdev     = 0.0
response_max_value      = 30.0
response_min_value      = -30.0
response_mutate_power   = 0.0
response_mutate_rate    = 0.0
response_replace_rate   = 0.0
# connection weight options
weight_init_mean        = 0.0
weight_init_stdev       = 1.0
weight_max_value        = 30
weight_min_value        = -30
weight_mutate_power     = 0.5
weight_mutate_rate      = 0.8
weight_replace_rate     = 0.1

[DefaultSpeciesSet]
compatibility_threshold = 3.0

[DefaultStagnation]
species_fitness_func = max
max_stagnation       = 20
species_elitism      = 2

[DefaultReproduction]
elitism            = 2
survival_threshold = 0.2
Handling NLua.Exceptions.LuaScriptException: table error in Lua
So I saw a video where a user named Sethbling uses a modified version of his Lua scripts so that they are compatible with Super Mario Kart. With all of the tools open to everyone, I decided to try it out for myself, as I think these semi-DIY experiments with AI are neat.
I first went to download the Super Mario Kart ROM and the BizHawk emulator. Then I downloaded the scripts and converted them into the proper file types. I then ran the game and went to Single Player > Time Trials, selecting Mario as the racer along with the map Mario Circuit. Finally, I created a state file of right after the race had started, restarted the emulator, and ran the script once the race began again. However, I am given this error every time I go to run it.
I am new to coding and have been unsuccessful in finding a solution. If anyone could help me with finding a solution, I'd greatly appreciate it. The scripts are also listed down below.
- This is the main script, which I titled neatevolve.lua; it can be found here.
- Next, the file csv.lua is acquired here.
- Also, map.txt is found here.
And for anyone who wants to run this themselves: I initially encountered an error when all of these files were not in the same folder and titled with the names above (except neatevolve, which you can change).