Python: "too many indices for array" error when using sparse data
I want to one-hot encode a DataFrame with get_dummies and turn the dummy columns into a sparse matrix.
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["a", "b", "c", "a"],
        "B": [1, 2, 3, 4]
    })
df['A'] = df['A'].astype('category')
one_hot = pd.get_dummies(df.to_sparse(), sparse=True)
print(one_hot)
one_hot.to_csv('test_sparse.csv', index=False)
one_hot:
   B  A_a  A_b  A_c
0  1    1    0    0
1  2    0    1    0
2  3    0    0    1
3  4    1    0    0
Error:
IndexError: too many indices for array
Any help would be appreciated!
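One possible workaround, sketched under assumptions: this targets pandas 1.0 or later (where `DataFrame.to_sparse` no longer exists) with SciPy installed, and it does not explain what triggers the IndexError in the original call. It encodes only the categorical column with `sparse=True`, converts the dummy columns to a SciPy COO matrix through the `.sparse` accessor, and densifies before writing the CSV. The `dtype=np.uint8` choice is mine, picked so the fill value is a plain 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c", "a"], "B": [1, 2, 3, 4]})
df["A"] = df["A"].astype("category")

# Encode only the categorical column; sparse=True gives SparseArray-backed columns.
dummies = pd.get_dummies(df["A"], prefix="A", sparse=True, dtype=np.uint8)

# Convert the sparse dummy columns to a scipy.sparse COO matrix.
dummy_matrix = dummies.sparse.to_coo()

# For CSV export, densify the dummies and put the numeric column back.
one_hot = pd.concat([df[["B"]], dummies.sparse.to_dense()], axis=1)
csv_text = one_hot.to_csv(index=False)
```

Passing a file name to `to_csv` instead of leaving it empty writes the same text to disk.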
See also questions close to this topic

How to display the contents of a text file one line at a time via a timer using Python on Windows?
This is the code:
import win32con
import win32gui
import win32ui

def wndProc(hWnd, message, wParam, lParam):
    if message == win32con.WM_PAINT:
        hdc, paintStruct = win32gui.BeginPaint(hWnd)

        dpiScale = win32ui.GetDeviceCaps(hdc, win32con.LOGPIXELSX) / 60.0
        fontSize = 36

        # http://msdn.microsoft.com/en-us/library/windows/desktop/dd145037(v=vs.85).aspx
        lf = win32gui.LOGFONT()
        lf.lfFaceName = "Times New Roman"
        lf.lfHeight = int(round(dpiScale * fontSize))
        #lf.lfWeight = 150
        # Use non-antialiased to remove the white edges around the text.
        # lf.lfQuality = win32con.NONANTIALIASED_QUALITY
        hf = win32gui.CreateFontIndirect(lf)
        win32gui.SelectObject(hdc, hf)

        rect = win32gui.GetClientRect(hWnd)
        # http://msdn.microsoft.com/en-us/library/windows/desktop/dd162498(v=vs.85).aspx
        win32gui.DrawText(
            hdc,
            'Glory be to the Father, and to the son and to the Holy Spirit.',
            -1,
            rect,
            win32con.DT_CENTER | win32con.DT_NOCLIP | win32con.DT_VCENTER
        )
        win32gui.EndPaint(hWnd, paintStruct)
        return 0
Where it says the "Glory be to the Father..." prayer, I would like that string to display a few different prayers on a timer. That is, I want to save short prayers to a text file and have the "Glory be..." line change to a new prayer every 60 seconds, cycling through a few prayers such as the Serenity Prayer.
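The text-handling half of this can be sketched independently of the window code. The names `load_prayers` and `PrayerCycler` below are hypothetical helpers, and the file is assumed to hold one prayer per line. The Windows side (a 60-second timer whose handler calls `advance()` and triggers a repaint, with the paint code drawing `cycler.current` instead of the fixed string) is not shown.

```python
from itertools import cycle


def load_prayers(path):
    """Return the non-empty lines of a text file, one prayer per line."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


class PrayerCycler:
    """Hold the prayer currently on display and step through the list forever."""

    def __init__(self, prayers):
        if not prayers:
            raise ValueError("need at least one prayer")
        self._iterator = cycle(prayers)
        self.current = next(self._iterator)

    def advance(self):
        """Move to the next prayer, wrapping around at the end of the list."""
        self.current = next(self._iterator)
        return self.current
```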

How to plot the frequency of my data per day in a histogram?
I want to plot the number of occurrences of my data per day. y represents the IDs of my data. x represents the timestamps, which I convert to a time and day. But I can't make the correct plot.
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import time

y = ['5914cce8fad645d1bec2e59e62823617', '1c2067e051734a1d8a75b18267ee4598',
     'db6830fffa9c4aa5b71ef6da9333f357', '672cc9d5360e4451bb7c03e3d0bd8f0d',
     'fb0f8122fffc47fea87ab2b749df173b', '558e96ca022240c7acc0e444f7663f53',
     'c3f86fd5eac348d3a44cb325f30b6139', '21dd849f895f4cf5a16845a4c1a9fbf9',
     'e3b4cd56e291467193b6d2226ee82ae7', '01346c48a8c443d1ac021efa33ca0f4e',
     '23b78b0f85be4ca799f41a5add76c12e', 'b1c036c00c2b4170a1708fd0add0dec2',
     '74737546e9c34126bcb24d34503421ca', '342991f5ec874c9d83eb9908f3e221aa',
     '4fdcd83aeb684e26b79b753c5e022a4e', 'b7fbeca9941643c49e909e71acc1eaba',
     '27c9d358a3ef4c69ba89eac16d8d3bdb', 'ef982c4ba11548a1aef12f672d7f1f00',
     'efedede29bb44c5298b18b03070df3fd', 'eb03ae1b4cde409c8d342a16a8be30d2']
x = ['1548143296750', '1548183033872', '1548346185194', '1548443373507',
     '1548446119319', '1548446239441', '1548446068267', '1548445962159',
     '1548446011209', '1548446259465', '1548446180380', '1548239985290',
     '1548240060367', '1548240045347', '1547627568993', '1548755333313',
     '1548673604016', '1548673443843', '1548673503914', '1548673563975']
date = []
for i in x:
    print(i)
    print()
    i = i[:10]
    print(i)
    readable = time.ctime(int(i))
    readable = readable[:10]
    date.append(readable)
print(date)
plt.hist(date, y)
plt.show()
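A minimal sketch of one way to get per-day counts, assuming the timestamps are Unix times in milliseconds and that days should be taken in UTC (the original used local time via `time.ctime`, so results near midnight may differ). It counts records per calendar day with `collections.Counter` and draws a bar chart, since the bar heights are the counts rather than the IDs. `day_of` and `plot_daily_counts` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime, timezone

x = ['1548143296750', '1548183033872', '1548346185194', '1548443373507',
     '1548446119319', '1548446239441', '1548446068267', '1548445962159',
     '1548446011209', '1548446259465', '1548446180380', '1548239985290',
     '1548240060367', '1548240045347', '1547627568993', '1548755333313',
     '1548673604016', '1548673443843', '1548673503914', '1548673563975']


def day_of(ms_string):
    """Convert a millisecond Unix timestamp string to a UTC calendar date."""
    return datetime.fromtimestamp(int(ms_string) / 1000, tz=timezone.utc).date()


# Number of records observed on each day.
counts = Counter(day_of(ts) for ts in x)


def plot_daily_counts(counts):
    """Draw one bar per day, in chronological order."""
    import matplotlib.pyplot as plt

    days = sorted(counts)
    plt.bar([d.isoformat() for d in days], [counts[d] for d in days])
    plt.xlabel("day")
    plt.ylabel("number of records")
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()
```

Calling `plot_daily_counts(counts)` then shows the chart.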

mysql.connector.errors.ProgrammingError: Error in SQL Syntax
I'm using the Python MySQL connector to add data to a table by updating a row. A user enters a serial number, and then the row with that serial number is updated. I keep getting a SQL syntax error and I can't figure out what it is.
query = ("UPDATE `items` SET salesInfo = %s, shippingDate = %s, warrantyExpiration = %s, item = %s, WHERE serialNum = %s")
cursor.execute(query, (info, shipDate, warranty, name, sn, ))
conn.commit()
Error:
mysql.connector.errors.ProgrammingError: 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'WHERE serialNum = '1B0000021A974726'' at line 1
"1B0000021A974726" is a serial number inputted by the user and it is already present in the table.

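The error text points at the spot just before WHERE, and the SET list in the query ends with a comma right there (`item = %s, WHERE`). A sketch of the same statement without that trailing comma follows. `update_item` is a hypothetical wrapper, and `conn` is assumed to be an open mysql.connector connection.

```python
# Same UPDATE as in the question, minus the comma after the last SET assignment.
UPDATE_ITEM_SQL = (
    "UPDATE `items` "
    "SET salesInfo = %s, shippingDate = %s, warrantyExpiration = %s, item = %s "
    "WHERE serialNum = %s"
)


def update_item(conn, info, ship_date, warranty, name, sn):
    """Run the parameterized UPDATE and commit; returns the affected row count."""
    cursor = conn.cursor()
    try:
        cursor.execute(UPDATE_ITEM_SQL, (info, ship_date, warranty, name, sn))
        conn.commit()
        return cursor.rowcount
    finally:
        cursor.close()
```
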
How to merge pandas dataframe into existing reportlab table?
example_df = [[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]]
I want to integrate the example_df pandas DataFrame into an existing ReportLab table where the number of rows varies (it could be 3, as in the example, or it could be 20):
rlab_table([['Mean','Max','Min','TestA','TestB'], ['','','','',''], ['','','','',''], ['','','','','']])
I have tried:
np.array(example_df).tolist()
but I get this error: AttributeError: 'int' object has no attribute 'wrapOn'
I am able to manually add each row to the ReportLab table by doing:
rlab_table([['Mean','Max','Min','TestA','TestB'], np.array(example_df).tolist()[0], np.array(example_df).tolist()[1], np.array(example_df).tolist()[2]])
However, the issue is that the number of rows in the dataframe is constantly changing, so I am seeking a solution similar to:
rlab_table([['Mean','Max','Min','TestA','TestB'], np.array(example_df).tolist()[0:X]])  # where X is the number of rows in the dataframe
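One way to handle a varying row count is to build the full list of rows first, by concatenating a one-element list holding the header with the list of data rows, instead of nesting the data rows as a single element. This is only a sketch: it assumes `example_df` is a real DataFrame with the five header names as columns, and that the result is passed to ReportLab's `Table` (not run here) in place of the hand-written list.

```python
import pandas as pd

header = ['Mean', 'Max', 'Min', 'TestA', 'TestB']
example_df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]],
    columns=header,
)

# [header] is a list containing one row; adding the data rows keeps every
# element of table_data a flat row, however many rows the DataFrame has.
table_data = [header] + example_df.values.tolist()

# With ReportLab installed, the rows could then be handed over like this:
# from reportlab.platypus import Table
# table = Table(table_data)
```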

Pandas Python aggregation and grouping - I want to show the sum on top of each different type

bins - Categorize column values using bins for ages
I have a .CSV file a snippet of which looks like this:
ID,SN, Age,Gender,Item ID,Item Name, Price
0,Lisim78, 20, Male, 108, Extraction Quickblade, 3.53
1,Lisovynya38, 40, Male, 143, Frenzied Scimitar, 1.56
2,Ithergue48, 24, Male, 92, Final Critic, 4.88
3,Chamassasya86, 24, Male, 100, Blindscythe, 3.27
4,Iskosia90, 23, Male, 131, Fury, 1.44
5,Yalae81, 22, Male, 81, Dreamkiss, 3.61
6,Itheria73, 36, Male, 169, Interrogator, 2.18
7,Iskjaskst81, 20, Male, 162, Abyssal Shard, 2.67
8,Undjask33, 22, Male, 21, Souleater, 1.1
9,Chanosian48, 35, Other, 136, Ghastly Adamantite, 3.58
10,Inguron55, 23, Male, 95, Singed Onyx Warscythe, 4.74
I need to establish bins for the 'Age' column which I have done like so:
bins = [0, 10, 15, 20, 25, 30, 35, 40, 45]
names = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']
df_bins = pd.cut(df['Age'], bins, labels=names)
How do I use the bins to categorize other columns, such as 'SN'? I want to get a count of all players in the 'SN' column who are <10, 10-14, 15-19 years old, and so on.
Any help is greatly appreciated!
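A minimal sketch of one way to count players per age group: store the binned ages as a new column and group by it. Two things here are my assumptions, not part of the question: the column name "Age Group", and `right=False`, which makes each bin include its lower edge so that an age of 20 lands in '20-24', matching the labels. The sample rows are the SN and Age values from the snippet above.

```python
import pandas as pd

df = pd.DataFrame({
    "SN": ["Lisim78", "Lisovynya38", "Ithergue48", "Chamassasya86", "Iskosia90",
           "Yalae81", "Itheria73", "Iskjaskst81", "Undjask33", "Chanosian48",
           "Inguron55"],
    "Age": [20, 40, 24, 24, 23, 22, 36, 20, 22, 35, 23],
})

bins = [0, 10, 15, 20, 25, 30, 35, 40, 45]
names = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']

# right=False gives intervals [0, 10), [10, 15), ... so the labels line up.
df["Age Group"] = pd.cut(df["Age"], bins, labels=names, right=False)

# Count players per group; observed=False keeps groups with no players (count 0).
counts = df.groupby("Age Group", observed=False)["SN"].count()
```

If the same player can appear in several rows, `nunique()` in place of `count()` would count each SN once per group.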

Summing sparse matrix rows by column groups
I have a scipy sparse matrix in coo format:
import numpy as np
from scipy.sparse import coo_matrix

data = np.asarray([[1, 0, 0], [.8, .2, 0], [0, 1, 0], [0.4, 0.3, 0.3]])
# data:
# array([[1. , 0. , 0. ],
#        [0.8, 0.2, 0. ],
#        [0. , 1. , 0. ],
#        [0.4, 0.3, 0.3]])
sparse_matrix = coo_matrix(data)
For each column I have a cluster assignment, and within each row I would like to sum the columns that share a cluster. I would like to stay in sparse format during this operation to limit memory use.
Example:
labels = ["a", "b", "b"]
Expected output:
1, 0
.8, .2
0, 1
.4, .6
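One way to do this without densifying, sketched here as a suggestion: build a sparse indicator matrix with one row per original column and one column per cluster, then multiply. Each output column is then the sum of the input columns carrying that label. The variable names `indicator` and `grouped` are mine.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

data = np.asarray([[1, 0, 0], [.8, .2, 0], [0, 1, 0], [0.4, 0.3, 0.3]])
sparse_matrix = coo_matrix(data)
labels = ["a", "b", "b"]

# Map each label to an integer cluster index (clusters sorted alphabetically).
clusters, cluster_idx = np.unique(labels, return_inverse=True)
n_cols = sparse_matrix.shape[1]

# indicator[j, c] == 1 when original column j belongs to cluster c.
indicator = csr_matrix(
    (np.ones(n_cols), (np.arange(n_cols), cluster_idx)),
    shape=(n_cols, len(clusters)),
)

# Sparse-by-sparse product; the result stays sparse.
grouped = sparse_matrix.tocsr() @ indicator
```

`grouped.toarray()` can be used to inspect the result on small inputs such as this one.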

How to additively merge columns in a dataframe with similar column names?
I have a large dataframe with several columns that need to be additively merged based on the first part of a string (before .S*)...
An example data frame can be generated with this code:
DF1 = structure(list(taxonomy = c("cat", "dog","horse","mouse","frog", "lion"),
                     A = c(0L, 5L, 3L, 0L, 0L, 0L),
                     D = c(2L, 1L, 0L, 0L, 2L, 0L),
                     C = c(0L, 0L, 0L, 4L, 4L, 2L)),
                .Names = c("taxonomy", "A.S595", "B.S596", "B.S487"),
                row.names = c(NA, 6L), class = "data.frame")
The data frame looks like this:
  taxonomy A.S595 B.S596 B.S487
1      cat      0      2      0
2      dog      5      1      0
3    horse      3      0      0
4    mouse      0      0      4
5     frog      0      2      4
6     lion      0      0      2
and I would like the output to look like this:
  taxonomy A B
1      cat 0 2
2      dog 5 1
3    horse 3 0
4    mouse 0 4
5     frog 0 6
6     lion 0 2

Matrix Inversion of Banded Sparse Matrix using SciPy
I am trying to solve the inverse of a banded sparse matrix in the most efficient way so that I can incorporate this into my real-time system. I am generating sparse-banded matrices which represent a convolution operation. Currently, I am using spsolve from the scipy.sparse.linalg library. I found that there is a better way by using solve_banded from the scipy.linalg library. However, solve_banded requires (l, u), which is the number of nonzero lower and upper diagonals, and ab, which is an (l + u + 1, M) array-like banded matrix. I am not sure how to convert my code so that I can use solve_banded. Any help in this regard is highly appreciated.
import numpy as np
from scipy import linalg
import math
import time
from scipy.sparse import spdiags
from scipy.sparse.linalg import spsolve

def ABC(deg, fc, N):
    r"""Generate sparse-banded matrices
    """
    omc = 2*math.pi*fc
    t = ((1-math.cos(omc))/(1+math.cos(omc)))**deg
    p = 1
    for k in np.arange(deg):
        p = np.convolve(p,np.array([1,1]),'full')
    P = spdiags(np.kron(p,np.ones((N,1))).T, np.arange(deg+1), N-deg, N)
    B = P.T.dot(P)
    q = np.sqrt(t)
    for k in np.arange(deg):
        q = np.convolve(q,np.array([1,1]),'full')
    Q = spdiags(np.kron(q,np.ones((N,1))).T, np.arange(deg+1), N-deg, N)
    C = Q.T.dot(Q)
    A = B + C
    return A,B,C

if __name__ == '__main__':
    mu = 0.1
    deg = 3
    wc = 0.1
    for i in np.arange(1,7,1):
        # some dense random vector
        x = np.random.rand(10**i,1)
        # generate sparse banded matrices
        A,_,C = ABC(deg, wc, 10**i)
        # another banded matrix
        G = mu*A.dot(A.T) + C.dot(C.T)
        # SCIPY SPSOLVE
        st = time.time()
        y = spsolve(G,x)
        et = time.time()
        print("SCIPY SPSOLVE: N = ", 10**i, "Time taken: ", et-st)
Results
SCIPY SPSOLVE: N = 10 Time taken: 0.0
SCIPY SPSOLVE: N = 100 Time taken: 0.0
SCIPY SPSOLVE: N = 1000 Time taken: 0.015689611434936523
SCIPY SPSOLVE: N = 10000 Time taken: 0.020943641662597656
SCIPY SPSOLVE: N = 100000 Time taken: 0.16722917556762695
SCIPY SPSOLVE: N = 1000000 Time taken: 1.7254831790924072
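A sketch of the conversion step, assuming the bandwidths l and u are known: solve_banded expects `ab[u + i - j, j] == a[i, j]`, so diagonal k (with k = j - i) goes into row u - k of `ab`. The helper name `to_banded` is hypothetical, and the tridiagonal test matrix below stands in for G. For the G in the question the bandwidth would presumably be larger on each side and should be checked (for example by inspecting the offsets of `G.todia()`) before use.

```python
import numpy as np
from scipy.linalg import solve_banded
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve


def to_banded(A, l, u):
    """Pack a square sparse matrix into solve_banded's (l + u + 1, n) layout."""
    A = A.tocsr()
    n = A.shape[0]
    ab = np.zeros((l + u + 1, n))
    for k in range(-l, u + 1):
        d = A.diagonal(k)          # k-th diagonal, length n - |k|
        if k >= 0:
            ab[u - k, k:] = d      # upper diagonals are left-padded
        else:
            ab[u - k, :n + k] = d  # lower diagonals are right-padded
    return ab


# Small tridiagonal system used only to check the packing.
n = 50
A = diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.arange(1.0, n + 1.0)

y_banded = solve_banded((1, 1), to_banded(A, 1, 1), b)
y_sparse = spsolve(A, b)
```

Whether this beats spsolve for a given size is something to time on the actual matrices; the packing loop itself touches only l + u + 1 diagonals.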