Python: "Too many indices for array" error when using sparse
I want to call get_dummies on a dataframe and turn the dummy columns into a sparse matrix.
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["a", "b", "c", "a"],
        "B": [1, 2, 3, 4]
    })
df['A'] = df['A'].astype('category')
one_hot = pd.get_dummies(df.to_sparse(), sparse=True)
print(one_hot)
one_hot.to_csv('test_sparse.csv', index=False)
one_hot:
   B  A_a  A_b  A_c
0  1    1    0    0
1  2    0    1    0
2  3    0    0    1
3  4    1    0    0
Error:
IndexError: too many indices for array
Any help would be appreciated!
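One possible way around this, sketched under the assumption of a recent pandas version: DataFrame.to_sparse() was removed in pandas 1.0, and get_dummies can produce sparse dummy columns on its own with sparse=True. The variable name dummies is only illustrative, and this does not claim to pinpoint the exact cause of the original IndexError.

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "a"], "B": [1, 2, 3, 4]})

# Encode only the categorical column; sparse=True makes each dummy column
# a sparse (SparseDtype) column instead of a dense one.
dummies = pd.get_dummies(df["A"], prefix="A", sparse=True)

# Keep the numeric column dense and place it next to the sparse dummies.
one_hot = pd.concat([df[["B"]], dummies], axis=1)

print(one_hot.dtypes)
print(dummies.sparse.density)  # fraction of values that are actually stored
```

If SciPy is installed, dummies.sparse.to_coo() should return the dummy columns as a SciPy sparse matrix. The .sparse accessor only works on a frame whose columns are all sparse, which is why the dense column B is kept separate here.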
See also questions close to this topic

Comparing list values and storing new ones in a separate list
import csv
with open("DADSA RESIT CWK JULY 2018.csv", newline='') as f:
    r = csv.reader(f)
    database = list(r)
del database[0]
names = []
names.append([])
def fillnames(d, n):
    for j in n:
        for i in d:
            if d[i][0] == n[j][0] and d[i][1] == n[j][1]:
                n[i][2] = n[i][2]+1
            else:
                names.append([d[i][0], d[i][1], 0])
fillnames(database, names)
for i in names:
    print(i)
The code above reads a CSV file into a list. I then want to count how many entries share the same name by copying each new name into a separate list and incrementing its count every time I find that name again. Every time I run this code it raises "TypeError: list indices must be integers or slices, not list".
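The TypeError arises because "for i in d" yields the rows themselves (lists), not integer positions, so d[i] indexes a list with a list. A minimal sketch of the counting step using collections.Counter follows; it assumes the name is held in the first two columns of each row, and it uses a small in-memory list in place of the CSV file. The function name count_names is hypothetical.

```python
from collections import Counter

def count_names(rows):
    """Count how many rows share the same values in their first two columns."""
    return Counter((row[0], row[1]) for row in rows)

# Stand-in for the rows read from the CSV file (header already removed).
rows = [
    ["Ann", "Lee", "x"],
    ["Bob", "Ray", "y"],
    ["Ann", "Lee", "z"],
]

counts = count_names(rows)
for (first, last), n in counts.items():
    print(first, last, n)
```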

Changing font of a list python
Say I have some code that prints a 16-element list as a 4 by 4 board:
nlist = [2,2,4,8, 0,0,0,0, 0,0,0,0, 0,0,0,0]
def drawBoard():
    count = 0
    for i in range(16):
        print(nlist[i], end = ' ')
        count += 1
        if count == 4:
            print("")
            count = 0
    print("")
drawBoard()
How can I change the font of everything in this list to size 26? I tried font = 'times 26', but I don't know where to put it or whether that command needs tkinter.

Get mouse coordinates without clicking in matplotlib
In a matplotlib plot, how can I continuously read the coordinates of the mouse when it is moved, but without waiting for clicks? This is possible in matlab, and there is a mpld3 plugin to do almost exactly what I want, but I can't see how to actually access the coordinates from it. There is also the package mpldatacursor, but this seems to require clicks. Searching for things like "matplotlib mouse coordinates without clicking" did not yield answers.
Answers using additional packages such as mpld3 are fine, but it seems like a pure matplotlib solution should be possible.

How to pass user defined function inside TfidfVectorizer.fit_transform()
I have a function for text preprocessing which simply removes stopwords:
def text_preprocessing():
    df['text'] = df['text'].apply(word_tokenize)
    df['text']=df['text'].apply(lambda x: [item for item in x if item not in stopwords])
    new_array=[]
    for keywords in df['text']: #converts list of words into string
        P=" ".join(str(x) for x in keywords)
        new_array.append(P)
    df['text'] = new_array
    return df['text']
I want to pass text_preprocessing() into another function, tf_idf(), which gives the feature matrix. What I essentially did was:
def tf_idf():
    tfidf = TfidfVectorizer()
    feature_array = tfidf.fit_transform(text_preprocessing)
    keywords_data=pd.DataFrame(feature_array.toarray(), columns=tfidf.get_feature_names())
    return keywords_data
I got an error as
TypeError: 'function' object is not iterable

Copy values of a column into an array in python
I want to copy the values of a column into an array and then use the array in a loop, but the loop replaces every value of the original column with the first value of the array.
E.g here is the original dataset
Score  Col1  Col2  Col3
1      2     6     1
2      5     0     1
3      1     13    1
4      1     0     0
The result I want is
Score  Col1  Col2  Col3
1      2     6     1
1      5     0     1
1      1     13    1
1      1     0     0

Score  Col1  Col2  Col3
2      2     6     1
2      5     0     1
2      1     13    1
2      1     0     0

Score  Col1  Col2  Col3
3      2     6     1
3      5     0     1
3      1     13    1
3      1     0     0

Score  Col1  Col2  Col3
4      2     6     1
4      5     0     1
4      1     13    1
4      1     0     0
But using my code I'm getting the results like
Score  Col1  Col2  Col3
1      2     6     1
1      5     0     1
1      1     13    1
1      1     0     0

Score  Col1  Col2  Col3
1      2     6     1
1      5     0     1
1      1     13    1
1      1     0     0

Score  Col1  Col2  Col3
1      2     6     1
1      5     0     1
1      1     13    1
1      1     0     0

Score  Col1  Col2  Col3
1      2     6     1
1      5     0     1
1      1     13    1
1      1     0     0
This is the code I'm using; it's quite simple:
df_arr = df1['Score'].values
for i in df_arr:
    df1['Score'] = i
    print(df1)
However, if I add a duplicate of the 'Score' column, e.g. 'Score1', and use it to build the array for the loop, I get the right results.
df_arr = df1['Score1'].values
for i in df_arr:
    df1['Score'] = i
    print(df1)
Edit: What I want is, for each value in my array, to get the dataset with the whole first column replaced by that value. I have provided a sample as well.
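A likely explanation, which depends on the pandas version, is that df1['Score'].values can be a view of the column's data, so overwriting the column also changes the array being looped over. The sketch below sidesteps this by taking an independent copy with tolist() and building a new frame per value with DataFrame.assign instead of modifying df1. The names scores and frames are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Score": [1, 2, 3, 4],
    "Col1": [2, 5, 1, 1],
    "Col2": [6, 0, 13, 0],
    "Col3": [1, 1, 1, 0],
})

# tolist() copies the values, so later changes to df1 cannot affect them.
scores = df1["Score"].tolist()

# assign() returns a new DataFrame each time and leaves df1 untouched.
frames = [df1.assign(Score=s) for s in scores]

for frame in frames:
    print(frame)
```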

checking range of number and writing a value in a new column in pandas dataframe
I need to iterate over the column 'movies_rated', check each value against the conditions, and write a value into a newly created column 'expert_level'. When I test on a subset of the data, it works. But when I run it against my whole dataset, the column only gets filled with the value 1.
for num in df_merge['movies_rated']:
    if num in range(20,31):
        df_merge['expert_level'] = 1
    elif num in range(31,53):
        df_merge['expert_level'] = 2
    elif num in range(53,99):
        df_merge['expert_level'] = 3
    elif num in range(99,202):
        df_merge['expert_level'] = 4
    else:
        df_merge['expert_level'] = 5
Here's a sample dataframe.
movies = [88,20,35,55,1203,99,2222,847]
name = ['angie','chris','pine','benedict','alice','spock','tony','xena']
df = pd.DataFrame(movies,name,columns=['movies_rated'])
Surely there's a less verbose way of doing this?
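In the loop above, each assignment such as df_merge['expert_level'] = 1 sets the entire column at once, so the column ends up holding whatever the last row produced. One way to compute a value per row, sketched here on the sample data, is to put the ranges into a small function and apply it with Series.map. The function name expert_level is hypothetical.

```python
import pandas as pd

def expert_level(num):
    """Map a rating count to a level, using the ranges from the question."""
    if 20 <= num <= 30:
        return 1
    if 31 <= num <= 52:
        return 2
    if 53 <= num <= 98:
        return 3
    if 99 <= num <= 201:
        return 4
    return 5  # everything below 20 or above 201

movies = [88, 20, 35, 55, 1203, 99, 2222, 847]
name = ['angie', 'chris', 'pine', 'benedict', 'alice', 'spock', 'tony', 'xena']
df = pd.DataFrame({'movies_rated': movies}, index=name)

# map() calls the function once per row and returns a new column.
df['expert_level'] = df['movies_rated'].map(expert_level)
print(df)
```

pandas.cut may be more compact when every value is guaranteed to fall inside one of the bins.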

How to do sparse * dense instead of dense * sparse in cuSparse
The cuSPARSE functions cusparse<t>csrmm() and cusparse<t>gemmi() both do dense x sparse matrix multiplication. But how can one do sparse x dense matrix multiplication with cuSPARSE?
If S is sparse and D is dense, one could use the cuBLAS function cublas<t>geam() to perform a transpose of the dense matrix and transpose the sparse matrix by modifying the entry indices. Then, the sparse * dense product can be found by:
S*D = (D^T*S^T)^T
However, this would waste 3 transposes. Also, using cublas<t>geam() to transpose the dense matrix by setting *alpha=1 and *beta=0 wastes a matrix addition.
Is there a more efficient way?

number of operations for sparse*dense matrix multiplication
How many floating point operations does it take to multiply a CSR sparse x dense matrix or a dense x CSR sparse matrix using optimized sparse routines (like cuSparse or Eigen or Matlab)?
In the limit where the sparse matrix is fully dense, the number of operations is N^2*(2*N-1), so why are sparse routines slower than dense routines when the sparse matrix is not sparse enough? What additional work is being done?
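A rough way to reason about the count: a straightforward CSR x dense product does one multiply and one add per stored nonzero per dense column, i.e. about 2*nnz*K operations for K dense columns. The sketch below is a plain Python loop written for illustration (the function name csr_times_dense is hypothetical); it is not how cuSPARSE, Eigen or MATLAB are implemented. Besides the arithmetic, each nonzero also needs a column-index lookup and an indirect read from the dense matrix, which is one plausible source of the extra cost.

```python
def csr_times_dense(indptr, indices, data, dense):
    """Multiply a CSR matrix by a dense matrix (list of rows), counting flops."""
    n_rows = len(indptr) - 1
    n_cols = len(dense[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    flops = 0
    for i in range(n_rows):
        # Stored nonzeros of row i live in positions indptr[i]..indptr[i+1]-1.
        for k in range(indptr[i], indptr[i + 1]):
            j, v = indices[k], data[k]  # column index lookup (extra work)
            for c in range(n_cols):
                out[i][c] += v * dense[j][c]  # one multiply and one add
                flops += 2
    return out, flops

# CSR form of [[1, 0, 2], [0, 0, 3], [4, 0, 0]], which has 4 nonzeros.
indptr = [0, 2, 3, 4]
indices = [0, 2, 2, 0]
data = [1.0, 2.0, 3.0, 4.0]
dense = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

product, flops = csr_times_dense(indptr, indices, data, dense)
print(product, flops)
```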

How are Values Placed into Sparse Matrix?
I have a dataframe similar to the following:
> dput(sparsed) structure(list(Movie = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 7L, 7L, 8L), .Label = c("1", "2", "3", "4", "5", "6", "7", "8"), class = "factor"), User = structure(c(3L, 2L, 9L, 4L, 9L, 14L, 6L, 2L, 7L, 14L, 11L, 9L, 1L, 5L, 14L, 2L, 15L, 1L, 8L, 13L, 12L, 3L, 10L, 9L, 2L), .Label = c("12", "2", "32", "34", "35", "4", "46", "5", "56", "64", "67", "69", "78", "89", "90"), class = "factor"), Rating = c(1L, 3L, 2L, 4L, 5L, 3L, 2L, 3L, 4L, 5L, 2L, 3L, 5L, 1L, 2L, 3L, 4L, 5L, 4L, 3L, 3L, 2L, 2L, 1L, 1L)), .Names = c("Movie", "User", "Rating"), row.names = c(NA, 25L), class = "data.frame")
I am trying to understand the logic used to place values into a sparse matrix (this is just a made up example, I am working with a much larger dataset that follows this pattern).
To get the previous dataframe into a sparse matrix, I do the following. I also do not understand why I have to do these conversions; it seems odd. But if I don't do them, my matrix does not come out with the correct dimensions (it should be a 15x8 sparse matrix when it's created).
library(Matrix)
sparsed$Movie <- as.factor(as.character(sparsed$Movie))
sparsed$User <- as.factor(as.character(sparsed$User))
sparse <- sparseMatrix(i=as.numeric(sparsed$Movie), j=as.numeric(sparsed$User), x=as.numeric(sparsed$Rating))
This results in a matrix like this:
> sparse
15 x 8 sparse Matrix of class "dgCMatrix"
 [1,] . . . 5 5 . . .
 [2,] 3 . 3 . 3 . . 1
 [3,] 1 . . . . 2 . .
 [4,] . 4 . . . . . .
 [5,] . . . . 1 . . .
 [6,] . . 2 . . . . .
 [7,] . . 4 . . . . .
 [8,] . . . . 4 . . .
 [9,] 2 5 . 3 . . 1 .
[10,] . . . . . . 2 .
[11,] . . 2 . . . . .
[12,] . . . . . 3 . .
[13,] . . . . 3 . . .
[14,] . 3 5 . 2 . . .
[15,] . . . . 4 . . .
This causes issues when I then need to link back to the original IDs, because they are seemingly in a random order. The first row corresponds to movie id 12, while the columns appear to be in numerical order. Can anyone explain what is happening? Or suggest a good way to control for this and get back to my original values?
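The order looks random because factor levels built from character strings are sorted as text, so "12" comes before "2" (the .Label vector in the dput output above shows exactly this order). The sketch below illustrates the effect in Python rather than R, with a shortened, made-up list of IDs, and shows one way to keep a lookup table so matrix indices can be mapped back to the original IDs. The names index_of and id_at are illustrative.

```python
user_ids = ["32", "2", "56", "34", "89", "4", "12"]

# Sorting strings compares character by character, so "12" sorts before "2".
text_levels = sorted(set(user_ids))

# Sorting with key=int gives the numeric order most people expect.
numeric_levels = sorted(set(user_ids), key=int)

# Keep both directions of the mapping so indices can be turned back into IDs.
index_of = {uid: i for i, uid in enumerate(numeric_levels)}
id_at = dict(enumerate(numeric_levels))

print(text_levels)
print(numeric_levels)
```

In R, the levels of the factor play the role of id_at: position k in levels(sparsed$User) is the ID that index k stands for.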