Python PuLP error with PySpark: unsupported operand type(s) for *: 'int' and 'NoneType'
I have an optimization problem where, for each customer, I have an FTE column (number of full-time employees) and a profit column. There is a constraint on the total number of FTEs, and I need to maximize profit.
First I prepare the spark dataframe:
from pyspark.sql import functions as F

US_AEI = (
    US_AEI
    .filter(F.col("Channel") == "A")
    .select("Customer_Name", "ftes", "profit")
    .filter(F.col("profit").isNotNull())
    .filter(F.col("ftes").isNotNull())
    .withColumn("profit_per_fte", F.col("profit") / F.col("ftes"))
)
The FTE column is a double. For example, Customer A could need 0.01 FTEs to make a profit of 1000.
Then I prepare the lists and dictionaries:
keys = [x.Customer_Name for x in US_AEI.select('Customer_Name').collect()]
values = [x.profit_per_fte for x in US_AEI.select('profit_per_fte').collect()]
profit_per_fte = dict(zip(keys, values))
emp_values = [x.ftes for x in US_AEI.select('ftes').collect()]
ftes = dict(zip(keys, emp_values))
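The three separate `collect()` calls above only line up if Spark returns the rows in the same order each time, which is not guaranteed in general. A minimal sketch of building both dictionaries from a single collected list, so that each name and its values come from the same row. The `Row` namedtuple and the sample values are hypothetical stand-ins for the result of `US_AEI.select("Customer_Name", "ftes", "profit_per_fte").collect()`, used here so the snippet runs without a Spark session:

```python
from collections import namedtuple

# Hypothetical stand-in for the rows returned by a single
# US_AEI.select("Customer_Name", "ftes", "profit_per_fte").collect() call.
# A PySpark Row supports the same attribute access as this namedtuple.
Row = namedtuple("Row", ["Customer_Name", "ftes", "profit_per_fte"])
rows = [
    Row("Customer A", 0.01, 100000.0),
    Row("Customer B", 2.0, 2500.0),
]

# Every dictionary entry is taken from one row, so names and values cannot drift apart.
keys = [r.Customer_Name for r in rows]
profit_per_fte = {r.Customer_Name: r.profit_per_fte for r in rows}
ftes = {r.Customer_Name: r.ftes for r in rows}
```

This also triggers one Spark job instead of three.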
Maximum number of FTEs:
fte_envelope = 40
Since a customer is either chosen to have FTEs or not, I make the decision variable binary:
from pulp import LpMaximize, LpProblem, LpVariable, lpSum

customer_vars = LpVariable.dicts("Customer", keys, lowBound=0, cat='Binary')
Creating the problem and adding the constraint:
prob = LpProblem("Profit", LpMaximize)
prob += lpSum([profit_per_fte[i] * customer_vars[i] for i in keys])
# constraint
prob += lpSum([ftes[c] * customer_vars[c] for c in keys]) <= fte_envelope
prob.solve()
At this point I get an error on the objective line:
prob += lpSum(profit_per_fte[i]*customer_vars[i] for i in keys)
# TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'
I don't understand why I'm getting this error. When I return the data as a DataFrame, everything looks as expected: the profit and FTE columns are both doubles and neither contains nulls.
pandasDF = pd.DataFrame(keys, columns = ['Customer'])
pandasDF = pandasDF.assign(profit_per_fte = values)
pandasDF = pandasDF.assign(FTEs = emp_values)
spark = SparkSession.builder.getOrCreate()
sparkDF = spark.createDataFrame(pandasDF)
return sparkDF
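Since the error message names `NoneType` as one operand of `*`, a quick check is whether any coefficient in the two dictionaries is `None` on the Python side. A minimal sketch with made-up values (the customer names, numbers, and the helper `find_missing` are hypothetical). One possibility worth ruling out, stated here as an assumption about this data: with ANSI mode off, Spark SQL division returns null when the divisor is 0, so `profit_per_fte` could be null for a row with `ftes` equal to 0 even though `profit` and `ftes` themselves contain no nulls.

```python
# Hypothetical stand-ins for the dictionaries built from the collected rows.
profit_per_fte = {"Customer A": 100000.0, "Customer B": None, "Customer C": 2500.0}
ftes = {"Customer A": 0.01, "Customer B": 0.0, "Customer C": 2.0}


def find_missing(coefficients):
    """Return the keys whose value is None, or whose key itself is None."""
    return [k for k, v in coefficients.items() if k is None or v is None]


missing_profit = find_missing(profit_per_fte)
missing_ftes = find_missing(ftes)
print("missing profit_per_fte:", missing_profit)
print("missing ftes:", missing_ftes)

# Keep only customers whose coefficients are usable before building the model.
clean_keys = [
    k for k in profit_per_fte
    if k not in missing_profit and k not in missing_ftes
]
```

If `missing_profit` turns out to be non-empty on the real data, filtering on `profit_per_fte` (not only on `profit` and `ftes`) before collecting would be the natural next step.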
Any suggestions for getting past this error would be greatly appreciated.
The full data has 6 sales channels per customer. As a first step, I was trying to get it working with the data filtered to one channel only. For the full data I will also need to add a constraint so that at most one sales channel is chosen per customer. Not all customers need to be assigned an FTE.
This is the logic I've been unable to test because of the previous error. Any feedback would be appreciated.
channel = ["A", "B", "C", "D", "E", "F"]

# setting the decision variables
use_vars = LpVariable.dicts("UseChannel", channel, 0, 1, LpBinary)
chan_vars = LpVariable.dicts("Service", [(i, j) for i in customer for j in channel], 0)

prob += lpSum(atcost[j] * use_vars[j] for j in channel)
prob += lpSum(ftes[i] * customer[i] for i in customer) <= fte_envelope
prob.solve()
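For reference, the multi-channel problem described above can be written out as a small integer program. This is a sketch of the intended model, not a statement of what the code above implements, and all symbols are assumptions introduced here: $C$ is the set of customers, $J$ the set of six channels, $x_{ij}$ is 1 if customer $i$ is served through channel $j$, $p_{ij}$ is the objective coefficient for that pair, $f_{ij}$ is the FTEs it requires, and $E$ is `fte_envelope` (40).

```latex
\begin{aligned}
\max_{x} \quad & \sum_{i \in C} \sum_{j \in J} p_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{i \in C} \sum_{j \in J} f_{ij}\, x_{ij} \le E, \\
& \sum_{j \in J} x_{ij} \le 1 \quad \forall i \in C, \\
& x_{ij} \in \{0, 1\} \quad \forall i \in C,\ j \in J.
\end{aligned}
```

The per-customer constraint uses $\le 1$ rather than $= 1$ because not every customer needs to be assigned an FTE.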