numpy inner product and value error
I have a problem with the inner product of two vectors. I define subtract and distance with these shapes:
subtract = np.zeros((3,1), dtype=int)
distance = np.zeros((7,))
then when I perform this operation:
subtract = np.subtract(pix[i,j],cluster[k])
distance[k] = np.inner(subtract,np.transpose(subtract))
I get this error:
distance[k] = np.inner(subtract,np.transpose(subtract))
ValueError: shapes (1,3) and (1,3) not aligned: 3 (dim 1) != 1 (dim 0)
There is no problem with subtract itself: I can print both it and its transpose.
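A likely fix is to flatten subtract to 1-D before calling np.inner, since np.inner contracts the last axes of its arguments and 2-D shapes like (1, 3) versus its transpose trip the alignment check. A minimal sketch with made-up stand-ins for pix[i, j] and cluster[k]:

```python
import numpy as np

# Hypothetical 3-component pixel and cluster values (not from the question).
pix_ij = np.array([[10, 20, 30]])        # shape (1, 3)
cluster_k = np.array([[1, 2, 3]])        # shape (1, 3)

subtract = np.subtract(pix_ij, cluster_k)    # shape (1, 3)

# Flattening to 1-D sidesteps the 2-D shape bookkeeping entirely:
s = subtract.ravel()                     # shape (3,)
distance_k = np.inner(s, s)              # scalar squared Euclidean distance
```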
See also questions close to this topic

Creating a class to remove special characters in Python
I'd like to write a class that can remove a special character from any column of a dataframe that I would like. For instance, let's say I have data from a table below:
Column A  Column B
?a?       ?b?
I would like to return:
Column A  Column B
a         b
I tried writing a class so I could remove the special character from each column that I choose from my data. For instance, if I want to remove "?" from column A, I want to be able to do that for that specific column.
class a():
    def __int__(self, col):
        self.col = col

    def remove_char(self, col):
        for i, col in enumerate(df.col):
            df.iloc[:, i] = df.iloc[:, i].str.replace('?', '')
        return san_col

p = a()
san_data = p.remove_apost(df)
I get an error that states:

NameError: name 'san_col' is not defined

I am new to this, so any help would be appreciated.
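For reference, a minimal working sketch of the idea (the class and method names here are illustrative, not the ones from the question): the constructor must be spelled __init__, and the method should return the modified data rather than the undefined san_col.

```python
import pandas as pd

class CharRemover:
    def __init__(self, char):
        self.char = char                  # character to strip

    def remove_char(self, df, col):
        out = df.copy()
        # regex=False treats '?' literally instead of as a regex quantifier
        out[col] = out[col].str.replace(self.char, '', regex=False)
        return out

df = pd.DataFrame({'Column A': ['?a?', '?b?'], 'Column B': ['x', 'y']})
san_data = CharRemover('?').remove_char(df, 'Column A')
```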

Only last iteration of for loop being stored at every index of my 2d array in python
I am trying to create a 2d array in python using nested for loops. Here is my code:
count = 0
stateArray = []
state = [0]*2
for i in range(0, 2):
    for j in range(0, 2):
        state[0] = i
        state[1] = j
        stateArray.append(state)
        print(count)
        print(stateArray[count])
        count += 1

print(stateArray[0])
print(stateArray[1])
print(stateArray[2])
Here is my output:
0
[0, 0]
1
[0, 1]
2
[1, 0]
3
[1, 1]
[1, 1]
[1, 1]
[1, 1]
Why does my stateArray change to storing the last iteration of the for loop at every index once I've exited the loop?
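The cause is that append stores a reference to the single `state` list, which keeps being mutated; creating a fresh list on each iteration stores distinct objects. A minimal sketch of the fix:

```python
# `state` was one list object; every append stored a reference to it, so
# later mutations changed all stored entries. Building a new list per
# iteration stores four distinct objects:
stateArray = []
for i in range(2):
    for j in range(2):
        stateArray.append([i, j])    # new list object each time
```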

ordering list of coordinates on row basis
I am trying to order a set of coordinates in a specific way
If I have a yet-to-be-ordered set of coordinates that represent different cells in some grid,
{(2, 2), (3, 1), (1, 2), (3, 2), (1, 3)}
My goal is to order this from left to right starting from the bottom row.
schematic of coordinates:

  0 1 2 3 4
0 X X X X X
1 X X O O X
2 X X O X X
3 X O O X X

{(3, 1), (3, 2), (2, 2), (1, 2), (1, 3)}  # desired output
I came up with the following code, but it fails in the case where the row number is the same but the column numbers differ:
data = {(1, 2), (1, 4), (2, 2), (0, 3), (0, 4)}
data_1 = sorted(data, key=lambda x: x[0], reverse=True)
data_2 = sorted(data_1, key=lambda x: x[1])

>>> print(data_2)
[(2, 2), (1, 2), (0, 3), (1, 4), (0, 4)]

# desired output
[(2, 2), (1, 2), (1, 4), (0, 3), (0, 4)]
What improvement can I make?
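One option is a single sort with a composite key, row descending then column ascending, so ties on the row are broken by the column. A sketch with the question's own data:

```python
data = {(1, 2), (1, 4), (2, 2), (0, 3), (0, 4)}

# One sort with a composite key: negate the row so larger rows (bottom of
# the grid) come first, then order by column within each row.
ordered = sorted(data, key=lambda p: (-p[0], p[1]))
```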

Why is minimize_scalar not minimizing correctly?
I am a new Python user, so bear with me if this question is obvious.
I am trying to find the value of lmbda that minimizes the following function, given a fixed vector Z and scalar sigma:
def sure_sft(z, lmbda, sigma):
    indicator = np.abs(z) <= lmbda
    minimum = np.minimum(z**2, lmbda**2)
    return sigma**2 * np.sum(indicator) + np.sum(minimum)
When I pass in values of lmbda manually, I find that the function produces the correct value of sure_sft. However, when I try to use the following code to find the value of lmbda that minimizes sure_sft:
minimize_scalar(lambda lmbda: sure_sft(Z, lmbda, sigma))
it gives me an incorrect value for sure_sft (8.6731 for lmbda = 0.4916). If I pass 0.4916 in manually to sure_sft, I obtain 7.99809 instead. What am I doing incorrectly? I would appreciate any advice!
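One thing worth checking: sure_sft is discontinuous because of the indicator term, so evaluating it at a hand-typed, rounded lmbda can land on the other side of a jump. Comparing against the full-precision res.x avoids that. A sketch with made-up Z and sigma, since the question's data is not shown:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
Z = rng.normal(size=50)       # stand-in data; the question's Z is not given
sigma = 1.0

def sure_sft(z, lmbda, sigma):
    indicator = np.abs(z) <= lmbda
    minimum = np.minimum(z**2, lmbda**2)
    return sigma**2 * np.sum(indicator) + np.sum(minimum)

res = minimize_scalar(lambda lmbda: sure_sft(Z, lmbda, sigma))
# res.fun is the objective evaluated at the full-precision res.x;
# re-evaluating at a rounded copy of res.x may differ across a jump
# of the indicator term.
```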

How to Calculate Mean and Margin of Error for Random Normal Distribution?
I have the following dataframes as part of my dataset:
df1 = np.random.normal(32000, 200000, 3650)
df2 = np.random.normal(32000, 200000, 3650)
df3 = np.random.normal(43500, 140000, 3650)
df4 = np.random.normal(48000, 70000, 3650)
I need to calculate the Mean and Margin of Error for the following random normal distributions. I am aware that the formula for calculating Margin of Error is:
Margin of Error = Standard Error * C

where

Standard Error = Standard Deviation of Sample / sqrt(Number of Samples)
C = Constant
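As a sketch with one of the samples above (C = 1.96 here is an assumption, the critical value for a 95% confidence level under the normal approximation):

```python
import numpy as np

sample = np.random.normal(32000, 200000, 3650)

mean = sample.mean()
# standard error = sample standard deviation / sqrt(number of samples)
std_err = sample.std(ddof=1) / np.sqrt(len(sample))
C = 1.96                        # ~95% confidence critical value (assumed)
margin_of_error = C * std_err
```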

In pandas, when removing data using str.split, how can I skip rows?
I have the following data:
Name   X  Y
AA:AA  0  0
AA:BB  1  1
AA:CC  2  2
GG:AB  3  3
GG:AC  4  4
How can I filter out the 'AA' and the colon, but skip anything with GG? I have used this to filter out the colon and keep only the right-hand side of the data, but for GG I need to keep it as is:
data['Name'] = data['Name'].str.split(":").str[1]
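One way is to rewrite only the rows that actually start with 'AA:' using a boolean mask, leaving the GG rows untouched. A sketch with the sample data:

```python
import pandas as pd

data = pd.DataFrame({'Name': ['AA:AA', 'AA:BB', 'AA:CC', 'GG:AB', 'GG:AC'],
                     'X': range(5), 'Y': range(5)})

# Only rows whose Name starts with 'AA:' get split; everything else stays.
mask = data['Name'].str.startswith('AA:')
data.loc[mask, 'Name'] = data.loc[mask, 'Name'].str.split(':').str[1]
</imports>```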

I need some help: trying to implement a Teradata transpose/pivot kind of output, but just not getting there
I have 2 tables:
create TABLE Z_SA_ACT_ERRR_T_T.RT_HOTEL_MAST (
    HOTEL_CODE INTEGER,
    HOTEL_NAME VARCHAR(100),
    HOTEL_CITY VARCHAR(50)
);

create TABLE Z_SA_ACT_ERRR_T_T.RT_TRAVEL_HOTEL (
    HOTEL_CODE INTEGER,
    TRAVEL_SITE_CODE VARCHAR(100),
    HOTEL_SEARCH_NAME VARCHAR(200)
);
Inserts:
Insert into Z_SA_ACT_ERRR_T_T.RT_HOTEL_MAST Values (551, 'Crowne Plaza Richmond', 'Richmond');
Insert into Z_SA_ACT_ERRR_T_T.RT_HOTEL_MAST Values (11978, 'Super 8 Motel Port Clinton', 'Port Clinton');
Insert into Z_SA_ACT_ERRR_T_T.RT_HOTEL_MAST Values (406, 'Hyatt Arlington', 'Arlington');
insert into Z_SA_ACT_ERRR_T_T.RT_TRAVEL_HOTEL VALUES (00406, '061', '"56654hyattarlingtonHyatt Arlington"');
insert into Z_SA_ACT_ERRR_T_T.RT_TRAVEL_HOTEL VALUES (00406, '2309', '"19235Yyatt Centric"');
insert into Z_SA_ACT_ERRR_T_T.RT_TRAVEL_HOTEL VALUES (00406, '360', '"arlingtonvahyattcentricarlington_52462565Hyatt"');
insert into Z_SA_ACT_ERRR_T_T.RT_TRAVEL_HOTEL VALUES (11978, '061', '"1131685Super 8 Motel Port Clinton"');
insert into Z_SA_ACT_ERRR_T_T.RT_TRAVEL_HOTEL VALUES (11978, '2309', '"39798Super 8 by Wyndham Port Clinton"');
insert into Z_SA_ACT_ERRR_T_T.RT_TRAVEL_HOTEL VALUES (11978, '360', '"portclintonohsuper8bywyndhamportclinton5057"');
insert into Z_SA_ACT_ERRR_T_T.RT_TRAVEL_HOTEL VALUES (00551, '2309', '"24074Crowne Plaza Fairfield"');
insert into Z_SA_ACT_ERRR_T_T.RT_TRAVEL_HOTEL VALUES (00551, '360', '"albuquerquenmalbuquerquecrowneplaza174775"');

select * from Z_SA_ACT_ERRR_T_T.RT_HOTEL_MAST;

  HOTEL_CODE  HOTEL_NAME                  HOTEL_CITY
1 551         Crowne Plaza Richmond       Richmond
2 11978       Super 8 Motel Port Clinton  Port Clinton
3 406         Hyatt Arlington             Arlington

select * from Z_SA_ACT_ERRR_T_T.RT_TRAVEL_HOTEL;

  HOTEL_CODE  TRAVEL_SITE_CODE  HOTEL_SEARCH_NAME
1 406         061               "56654hyattarlingtonHyatt Arlington"
2 406         2309              "19235Yyatt Centric"
3 406         360               "arlingtonvahyattcentricarlington_52462565Hyatt"
4 551         2309              "24074Crowne Plaza Fairfield"
5 551         360               "albuquerquenmalbuquerquecrowneplaza174775"
6 11978       061               "1131685Super 8 Motel Port Clinton"
7 11978       2309              "39798Super 8 by Wyndham Port Clinton"
8 11978       360               "portclintonohsuper8bywyndhamportclinton5057"
What I want: an output which looks something like this:
Can someone help me achieve this?

Transpose a 1-dimensional array in Numpy without casting to matrix
My goal is to turn a row vector into a column vector and vice versa. The documentation for numpy.ndarray.transpose says:

    For a 1-D array, this has no effect. (To change between column and row vectors, first cast the 1-D array into a matrix object.)
However, when I try this:
my_array = np.array([1, 2, 3])
my_array_T = np.transpose(np.matrix(my_array))
I do get the wanted result, albeit in matrix form (matrix([[66], [640], [44]])), but I also get this warning:

PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices or deal with linear algebra (see https://docs.scipy.org/doc/numpy/user/numpy-for-matlab-users.html). Please adjust your code to use regular ndarray.
  my_array_T = np.transpose(np.matrix(my_array))
How can I properly transpose an ndarray, then?
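The ndarray way is to add an explicit axis rather than going through np.matrix. A short sketch:

```python
import numpy as np

a = np.array([1, 2, 3])           # shape (3,): no row/column orientation

# Insert a new axis to get a true 2-D column vector; .T then works as usual.
col = a[:, np.newaxis]            # shape (3, 1)
row = col.T                       # shape (1, 3)
```

a.reshape(-1, 1) produces the same column-vector shape.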
Python pandas dataframe transformations with groupby, pivot and transpose
I have a dataframe with two columns: date and bill_id. The dates in the date column span one year, from 01-01-2017 to 30-12-2017. There are 1000 unique bill_ids, and each bill_id may occur at least once in the bill_id column. The result is a DataFrame of 2 columns and 1,000,000 rows:

dt          bill_id
01-01-2017  bill_1
01-01-2017  bill_2
02-01-2017  bill_1
02-01-2017  bill_3
03-01-2017  bill_4
03-01-2017  bill_4

So, some bill_ids may occur on a specific day while others do not.
What I want to achieve is a dataframe where all unique bill_ids are columns, all unique dates are rows, and each bill_id has 0, 1 or 2 for the corresponding day, where 0 = did not appear on that date yet, 1 = appeared on that date, 2 = did not appear on that date but existed before. E.g. if a bill_id first existed on 02-01-2017, it would have 0 on 01-01-2017, 1 on 02-01-2017, and 2 on 03-01-2017 and on all consecutive days.
I did it in a few steps, but the code does not scale any more as it is slow:
def map_values(row, df_z, c):
    subs = df_z[[c, 'bill_id', 'date']].loc[df_z['date'] == row['dt']]
    if c not in subs['bill_id']:
        row[c] = max(subs[c].tolist())
    else:
        val = df_z[c].loc[(df_z['date'] == row['dt']) & (df_z['bill_id'] == c)].values
        assert len(val) == 1
        row[c] = val[0]
    return row

def map_to_one(x):
    bills_x = x['bill_id'].tolist()
    for b in bills_x:
        try:
            x[b].loc[x['bill_id'] == b] = 1
        except:
            pass
    return x

def replace_val(df_groupped, col):
    mask = df_groupped.loc[df_groupped['bill_id'] == col].index[df_groupped[col].loc[df_groupped['bill_id'] == col] == 1]
    min_dt = df_groupped.iloc[min(mask)]['date']
    max_dt = df_groupped.iloc[max(mask)]['date']
    df_groupped[col].loc[(df_groupped['date'] < min_dt)] = 0
    df_groupped[col].loc[(df_groupped['date'] >= min_dt) & (df_groupped['date'] <= max_dt)] = 1
    df_groupped[col].loc[(df_groupped['date'] > max_dt)] = 2
    return df_groupped

def reduce_cols(row):
    col_id = row['bill_id']
    row['val'] = row[col_id]
    return row

df = df.sort_values(by='date')
df = df[pd.notnull(df['bill_id'])]
bills = list(set(df['bill_id'].tolist()))
for col in bills:
    df[col] = 9

df_groupped = df.groupby('date')
df_groupped = df_groupped.apply(lambda x: map_to_one(x))
df_groupped = df_groupped.reset_index()
df_groupped.to_csv('groupped_in.csv', index=False)
df_groupped = pd.read_csv('groupped_in.csv')

for col in bills:
    df_groupped = replace_val(df_groupped, col)

df_groupped = df_groupped.apply(lambda row: reduce_cols(row), axis=1)
df_groupped.to_csv('out.csv', index=False)

cols = [x for x in df_groupped.columns if x not in ['index', 'date', 'bill_id', 'val']]
col_dt = sorted(list(set(df_groupped['date'].tolist())))
dd = {x: [0]*len(col_dt) for x in cols}
dd['dt'] = col_dt
df_mapped = pd.DataFrame(data=dd).set_index('dt').reset_index()
for c in cols:
    counter += 1
    df_mapped = df_mapped.apply(lambda row: map_values(row, df_groupped[[c, 'bill_id', 'date']], c), axis=1)
EDIT:
The answer from Joe is fine, but I decided to go with another option instead:
- get date.min() and date.max()
- df_groupped = groupby bill_id
- apply a function to df_groupped in which I check date_x.min() and date_x.max() per group, compare date.min() with date_x.min() and date.max() with date_x.max(), and in that way I know where the 0s, 1s and 2s go :)
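For what it's worth, the 0/1/2 encoding described above can also be vectorized with a crosstab plus a cumulative sum. A sketch on a tiny frame (column names as in the question, data made up, and assuming the date strings sort chronologically):

```python
import pandas as pd

df = pd.DataFrame({
    'dt': ['01-01-2017', '01-01-2017', '02-01-2017', '02-01-2017', '03-01-2017'],
    'bill_id': ['bill_1', 'bill_2', 'bill_1', 'bill_3', 'bill_4'],
})

# 0/1 presence matrix: rows are dates, columns are bill_ids.
seen = (pd.crosstab(df['dt'], df['bill_id']) > 0).astype(int)

# 1 where the bill appears that day, 2 where it is absent but appeared on an
# earlier date (cumulative count > 0), 0 before its first appearance.
out = seen + ((seen.cumsum() > 0) & (seen == 0)) * 2
```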

Complete a Johansen cointegration using two excel data sheets. ValueError: cannot copy sequence with size 0 to array axis with dimension 3580
I am trying to complete a Johansen cointegration using two Excel data sheets, but am getting this error.
ValueError: cannot copy sequence with size 0 to array axis with dimension 3580
I have the Johansen sheet running and working.
import matplotlib.pyplot as plt
import pandas as pd
from johansen import coint_johansen
from pandas import ExcelWriter
from pandas import ExcelFile

df_x = pd.read_excel('NASDAQ.xlsx', sheet_name='NASDAQ', index_col=0)
df_y = pd.read_excel('TA75.xlsx', sheet_name='TA75', index_col=0)
df = pd.DataFrame({'x': df_x, 'y': df_y}, index=[0])
coint_johansen(df, 0, 1)
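A likely culprit is the DataFrame construction: read_excel with index_col=0 returns whole DataFrames, and index=[0] collapses the result to a single row, so the Johansen routine sees far fewer observations than the 3580 it expects. A sketch of one way to build a proper two-column frame, with stand-in Series in place of the Excel reads:

```python
import pandas as pd

# Stand-ins for the two price series read from the Excel sheets.
df_x = pd.Series([100.0, 101.5, 102.2], name='x')
df_y = pd.Series([200.0, 203.1, 204.4], name='y')

# Align the two series on their shared index and drop unmatched rows,
# giving the cointegration test a full-length (n, 2) frame, not one row.
df = pd.concat([df_x, df_y], axis=1).dropna()
```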

ValueError: too many values
I am going through a Python tutorial. I typed in exactly what the tutorial has, but it won't run. I think the issue is that the tutorial uses Python 2 and I am using Python 3.5. For instance, the tutorial does not use parentheses after print and I have to, and it uses raw_input where I use just input.
This is what i am trying to run
def sumProblem(x, y):
    print('The sum of %s and %s is %s.' % (x, y, x+y))

def main():
    sumProblem(2, 3)
    sumProblem(1234567890123, 535790269358)
    a, b = input("Enter two comma separated numbers: ")
    sumProblem(a, b)

main()
This is the error i receive:
ValueError: too many values to unpack (expected 2)
If I put in just two numbers without a comma, it will concatenate them. I have tried to convert to integer, but it gives this error:
ValueError: invalid literal for int() with base 10:
When I searched on here, the answers did not seem to apply to my problem; they were much more involved, or I didn't understand them.
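In Python 3, input() returns one string, so a, b = input(...) tries to unpack the string's individual characters, which is why "12,34" raises "too many values to unpack". Splitting on the comma and converting each piece fixes both errors. A minimal sketch (parse_two_numbers is a made-up helper name, not from the tutorial):

```python
def parse_two_numbers(text):
    # "12,34" -> split on the comma, strip spaces, convert each piece to int
    a, b = (int(part.strip()) for part in text.split(','))
    return a, b

# In main(), this would replace the failing line:
#   a, b = parse_two_numbers(input("Enter two comma separated numbers: "))
```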

OCaml User-Defined Type and Function Return Error
I was writing a function with user-defined types in OCaml when I encountered an error message that I don't understand.
I'm currently using the OCaml interactive toplevel as well as Visual Studio Code to write my code. The strange thing is that the code compiles fine when I run it from Visual Studio Code, but it hits the error in the interactive toplevel.
The OCaml code that I am referring to is as follows:
type loc = int;;
type id = string;;
type value =
  | Num of int
  | Bool of bool
  | Unit
  | Record of (id -> loc)
;;
type memory = (loc * value) list;;
exception NotInMemory;;

let rec memory_lookup : (memory * loc) -> value = fun (mem, l) ->
  match mem with
  | [] -> raise NotInMemory
  | hd :: tl ->
      (match hd with
       | (x, a) -> if x = l then a else (memory_lookup (tl, l)))
;;
The code that I wrote is basically my rudimentary attempt at implementing/emulating looking up memory and returning corresponding values.
Here's an example input:
let memory1 = [ (1, Num 1) ; (2, Bool true) ; (3, Unit) ];;
Here's the expected output:
memory_lookup (memory1, 2);;
- : value = Bool true
However, here's the actual output:
Characters 179-180:
  (x, a) -> if x = l then a else (memory_lookup (tl, l)))
Error: This expression has type value/1076
       but an expression was expected of type value/1104
(Just for clarification: the error is regarding the character a.)
Does anybody know what type value/1076 and type value/1104 mean? Also, if there is anything wrong with the code that I wrote, would anybody be kind enough to point it out? Thank you.

CUDA inner product with matrix
I'm trying to use CUDA to accelerate the calculation of the inner product:
<x, W.x> = x^T . W . x
where W is a square matrix of size N and x a vector of size N.
In fact, I have to compute this inner product for a large number of vectors, but always with the same matrix W. Furthermore, N is large.
Any suggestion on a possible algorithm?
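The usual structure, sketched here in NumPy to show the algorithm: stack the vectors as rows of X and compute one matrix product Y = XW followed by a row-wise dot product. On the GPU the same two steps map to a single GEMM (e.g. via cuBLAS) plus a per-row reduction kernel. Shapes and data below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 4, 3                      # matrix size and number of vectors (made up)
W = rng.normal(size=(N, N))
X = rng.normal(size=(M, N))      # one vector per row

Y = X @ W                        # one GEMM handles all vectors at once
results = np.einsum('ij,ij->i', X, Y)   # row-wise dot: x_i^T W x_i
```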

Matlab inner product between two vectors
How can I find, in Matlab, the inner product of ["v1" "v2"] and ["f1" "f2" "f3"] that gives the result ["v1f1" "v1f2" "v1f3" "v2f1" "v2f2" "v2f3"]? Thanks.
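The requested result is every pairwise concatenation, i.e. a Kronecker-style outer product over strings rather than an inner product. Since the structure is language-agnostic, here is the pattern sketched in Python:

```python
from itertools import product

v = ["v1", "v2"]
f = ["f1", "f2", "f3"]

# All pairs (vi, fj) in row-major order, concatenated.
result = [a + b for a, b in product(v, f)]
```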