Python: "Too many indices for array" happens when using sparse
I want to apply get_dummies to a dataframe and turn the dummy columns into a sparse matrix.
df = pd.DataFrame(
    {
        "A": ["a", "b", "c", "a"],
        "B": [1, 2, 3, 4]
    })
df['A'] = df['A'].astype('category')
one_hot = pd.get_dummies(df.to_sparse(), sparse=True)
print(one_hot)
one_hot.to_csv('test_sparse.csv',index=False)
one_hot:
B A_a A_b A_c
0 1 1 0 0
1 2 0 1 0
2 3 0 0 1
3 4 1 0 0
Error:
IndexError: too many indices for array
Hoping for help!
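For reference, a minimal sketch (assuming a reasonably recent pandas) that produces the intended sparse dummies without the deprecated to_sparse() round-trip, which is the usual source of this IndexError:

```python
import pandas as pd

# A minimal sketch: get_dummies can produce sparse dummy columns directly
# from the dense frame (sparse=True gives them SparseDtype), so the
# deprecated to_sparse() call is not needed at all.
df = pd.DataFrame({"A": ["a", "b", "c", "a"], "B": [1, 2, 3, 4]})
df["A"] = df["A"].astype("category")
one_hot = pd.get_dummies(df, sparse=True)  # dummy columns are sparse
one_hot.to_csv("test_sparse.csv", index=False)
```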
See also questions close to this topic

How to detach Python child process on Windows (without setsid)?
I'm migrating some process code to Windows that worked well on Posix. Put simply: the code to launch a subprocess and immediately detach will not work, because setsid() is not available:
import os, subprocess, sys
p = subprocess.Popen([sys.executable, '-c', "print 'hello'"], preexec_fn=os.setsid)
I can remove the use of setsid, but then the child process ends when the parent ends. My question is: how do I achieve the same effect as setsid on Windows, so that the child process's lifetime is independent of the parent's? I'd be willing to use a particular Python package if one exists for this sort of thing. I'm already using psutil, for example, but I didn't see anything in it that could help me.
Recursively finding a base sequence
findStartRec(goal, count)
recursively searches forward from an initial start value of 0, and returns the smallest integer start value that reaches or exceeds the goal. The preconditions are goal >= 0 and count > 0. If the double (x * 2) and add 5 (+ 5) sequence starting at 0 cannot reach the goal in count steps, then try starting at 1. Continue this process until the program finds a starting value N that does reach or exceed the goal in count steps, and return that start value.
Example:
findStartRec(100, 3) returns 9
Here is what I have come up with so far
def findStartRec(goal, count, sequence=0, itter=0):
    if sequence == goal and count == 0:
        print("Sequence: ", sequence, "Itter: ", itter)
        return sequence, itter
    else:
        while count > 0:
            sequence = (itter * 2) + 5
            count = count + 1
            #return findStartRec(goal, count + 1, sequence, itter)
        else:
            return findStartRec(goal, count, sequence, itter + 1)
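For comparison, a minimal sketch of one way to meet the spec (the helper name run_sequence is my own, not part of the assignment): apply x -> x * 2 + 5 exactly count times from each candidate start, and recurse on the start value until the result reaches or exceeds the goal.

```python
# A minimal sketch; run_sequence is a hypothetical helper name.
def run_sequence(start, count):
    # Apply x -> x * 2 + 5 exactly `count` times, starting from `start`.
    if count == 0:
        return start
    return run_sequence(start * 2 + 5, count - 1)

def findStartRec(goal, count, start=0):
    # If this start reaches or exceeds the goal in `count` steps, done;
    # otherwise recurse on the next candidate start value.
    if run_sequence(start, count) >= goal:
        return start
    return findStartRec(goal, count, start + 1)
```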

Data preparation with python
I have a text file that I want to divide into four parts, as indicated in the code, but it always generates errors.
# First import pandas and the regex module
import pandas as pd
import numpy as np
import re

data = open("Discussion.txt", encoding="utf8")
contenu = data.read()
data.close()
print(contenu)

# Read the .txt file into a string
data = open("Discussion.txt", encoding="utf8")
string = data.read()
data.close()

# Split separate lines into list of strings
splitstring = string.splitlines()

# For each list item find the data needed (with regex or indexing)
# and assign to a dictionary
df = {}
for i in range(len(splitstring)):
    match = re.search(r'(.* .*)  (.*): (.*)', splitstring[1])
    line = {
        'Date': splitstring[i][:10],
        'Time': match.group(1),
        'Number': match.group(2),
        'Text': match.group(3)}
    df[i] = line

AttributeError                            Traceback (most recent call last)
<ipython-input-54-3a1f0fdf7c6> in <module>()
      8     line = {
      9         'Date' : splitstring[i][:10],
---> 10         'Time' : match.group(1),
     11         'Number' : match.group(2),
     12         'Text' : match.group(3)}
AttributeError: 'NoneType' object has no attribute 'group'

# Convert dictionary to pandas dataframe
dataframe = pd.DataFrame(df).T
# Finally send to csv
dataframe.to_csv(filepath)

  File "<ipython-input-62-b1b4e00c433>", line 3
    Finally send to csv
          ^
IndentationError: unexpected indent
Here is a preview of the content printed by print(contenu), as an image:
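A sketch of the likely fix for the AttributeError: re.search returns None for lines that do not match, so guard before calling .group(). The regex is the one from the question; the two sample lines are invented for illustration.

```python
import re

# Guard against None: re.search returns None for non-matching lines,
# which is exactly what raises 'NoneType' object has no attribute 'group'.
# The sample lines below are made up for illustration.
lines = [
    "12/01/2021 10:15  +33612345678: Bonjour",
    "a line that does not fit the pattern",
]
rows = []
for line in lines:
    match = re.search(r'(.* .*)  (.*): (.*)', line)
    if match is None:
        continue  # skip non-matching lines instead of crashing
    rows.append({
        'Date': line[:10],
        'Time': match.group(1),
        'Number': match.group(2),
        'Text': match.group(3),
    })
```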

pandas.value_counts calculates counts incorrectly
I have a column in a data frame from which I want to generate a frequency table using pandas.value_counts(), but the counts in freqtable are all wrong. Does anyone know why? For example, one of the values should have a count of 8, but freqtable says the count is 2. This is the code I used:
freqtable = data['A'].value_counts()
I also tried this, still the counts are wrong.
freqtable = data.groupby('A').count()
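A sketch of one common cause of "wrong" value_counts results: entries that look identical but differ by stray whitespace (or mixed types), so they are counted as distinct keys. The sample data below is invented for illustration.

```python
import pandas as pd

# Values that differ only by whitespace are distinct keys to value_counts;
# normalizing first merges them. Sample data is made up.
data = pd.DataFrame({'A': ['x', 'x ', ' x', 'x', 'y']})
raw = data['A'].value_counts()                  # three separate 'x' variants
cleaned = data['A'].str.strip().value_counts()  # variants merged
```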

When filtering a pandas dataframe, why are later column values being dropped?
I'm trying to filter dataframe columns based on whether the subsc_type column value is 'Registered' or 'Casual'. The original data looks like;
data.head()

   end_statn bike_nr subsc_type zip_code birth_date gender
        23.0  B00468 Registered   '97217     1976.0   Male
        23.0  B00554 Registered   '02215     1966.0   Male
I've not included all columns, but all columns up to and including subsc_type keep their values; the columns after it do not. The code I'm using, and the data after filtering, is:
registered = data[data.subsc_type == 'Registered']
casual = data[data.subsc_type == 'Casual']
casual.head()

   end_statn bike_nr subsc_type zip_code birth_date gender
        47.0  B00368     Casual      NaN        NaN    NaN
        40.0  B00358     Casual      NaN        NaN    NaN
Any help would be amazing! Thanks!
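A sketch showing that boolean-mask filtering keeps every column value of the selected rows intact: if the filtered frame shows NaN, those cells were already NaN in the source rows. The sample data is invented.

```python
import pandas as pd

# Boolean-mask filtering selects whole rows; it never blanks out columns.
# Sample data is made up to mirror the question's columns.
data = pd.DataFrame({
    'bike_nr':    ['B00468', 'B00368'],
    'subsc_type': ['Registered', 'Casual'],
    'zip_code':   ["'97217", "'02111"],
})
casual = data[data.subsc_type == 'Casual']
```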

matplotlib fill_between issues with y1 & y2 arguments?
I have a pandas dataframe and I'm trying to plot the data to make a better plot.
   wave     mean      median    mad
0  4050.32  0.016182  0.011940  0.008885
1  4208.98  0.023707  0.007189  0.032585
2  4374.94  0.001321  0.001196  0.000378
3  4379.74  0.002778  0.003380  0.004685
4  4398.01  0.002974  0.004633  0.011037
I'm trying to plot column wave against column mad, and I want to use the MAD of column mad as the error. Instead of drawing error bars, I want to use fill_between from matplotlib, but I don't understand what I should give as input to y1 and y2 in this:
plt.fill_between(x['wave'], y1= ???, y2= ???, color='k', alpha=.5)
I've tried this; is it correct?
plt.figure()
x = pd.concat([wave, mean, median, mad], axis=1, keys=['wave', 'mean', 'median', 'mad'])
x = x[np.abs(x['mad'] - x['mad'].mean()) <= (2 * x['mad'].mad())]
plt.plot(x['wave'], x['mean'], '<')
plt.fill_between(x['wave'], y1=+x['mad'].mad(), y2=-x['mad'].mad(), color='k', alpha=.5)
slope, intercept, r_value, p_value, std_err = stats.linregress(x['wave'], x['mean'])
plt.plot(x['wave'], intercept + slope*x['wave'], 'r', label='fitted line')
plt.show()
But it doesn't seem to be working; this (see image) is what I'm getting.
How can i do this?
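A sketch of one reasonable reading of y1 and y2: they are the lower and upper edges of the shaded band, here mean minus/plus the MAD of the 'mad' column. The dataframe reproduces the sample rows from the question; the MAD is computed by hand because Series.mad() was removed in pandas 2.0.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# y1/y2 are the band edges: mean -/+ a scalar spread. The rows below are
# the sample data from the question.
x = pd.DataFrame({
    'wave': [4050.32, 4208.98, 4374.94, 4379.74, 4398.01],
    'mean': [0.016182, 0.023707, 0.001321, 0.002778, 0.002974],
    'mad':  [0.008885, 0.032585, 0.000378, 0.004685, 0.011037],
})
err = (x['mad'] - x['mad'].mean()).abs().mean()  # scalar MAD of 'mad'
fig, ax = plt.subplots()
ax.plot(x['wave'], x['mean'], '<')
ax.fill_between(x['wave'], y1=x['mean'] - err, y2=x['mean'] + err,
                color='k', alpha=.5)
fig.savefig('band.png')
```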

How to solve the condition number of a large sparse matrix in compressed row storage (CRS) format?
I have a big sparse matrix generated by Fortran, stored in compressed row storage.
I want to use Matlab or Fortran to compute the condition number of the matrix.
How could I go about this?
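A sketch in Python/SciPy (in Matlab, the analogous one-liner is condest(A)) of the standard large-sparse trick: estimate the 1-norm condition number ||A||_1 * ||A^-1||_1 without ever forming A^-1 densely, applying the inverse through a sparse LU factorization. The test matrix here is random filler.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import LinearOperator, onenormest, splu

# Estimate cond_1(A) = ||A||_1 * ||A^-1||_1; the inverse is applied via
# a sparse LU factorization, never formed explicitly.
n = 200
A = sparse.random(n, n, density=0.05, format='csc', random_state=0) \
    + sparse.identity(n, format='csc')
lu = splu(A)  # sparse LU factorization of A
Ainv = LinearOperator(A.shape,
                      matvec=lambda b: lu.solve(b),
                      rmatvec=lambda b: lu.solve(b, trans='T'))
cond1 = abs(A).sum(axis=0).max() * onenormest(Ainv)
```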

3d array to sparse matrix
I would like to convert my 3-D array into a sparse matrix with the same dimensions.
Here is my example dummy code:
import numpy as np
from scipy import sparse

A = np.array([[[1,2,0],[0,0,3],[1,0,4]],[[1,2,0],[0,0,3],[1,0,4]]])
B = np.matrix([[[1,2,0],[0,0,3],[1,0,4]],[[1,2,0],[0,0,3],[1,0,4]]])  # to become like this

print(A.shape)
AA = sparse.csr_matrix(A)
print(AA)
After I run this code, I received:
TypeError: expected dimension <= 2 array or matrix
I understand that it needs to be a 2-D array or matrix, but is there any way to make it the same as B?
I hope A can become the same as B if possible. There was an example for turning a 2-D array into a 2-D sparse matrix, but none for 3-D to 3-D.
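A sketch of the usual workarounds: scipy's classic sparse matrices are strictly 2-D, so a 3-D array is either kept as a list of 2-D sparse slices or reshaped to 2-D (remembering the shape so it can be restored later).

```python
import numpy as np
from scipy import sparse

# scipy sparse matrices are 2-D only; two standard workarounds follow.
A = np.array([[[1, 2, 0], [0, 0, 3], [1, 0, 4]],
              [[1, 2, 0], [0, 0, 3], [1, 0, 4]]])

# Option 1: one CSR matrix per slice along the first axis.
slices = [sparse.csr_matrix(A[i]) for i in range(A.shape[0])]

# Option 2: merge the first two axes, convert, and reshape back on demand.
flat = sparse.csr_matrix(A.reshape(-1, A.shape[2]))
restored = flat.toarray().reshape(A.shape)
```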

Matrix inversion is difficult in Matlab when dealing with a sparse matrix
I am implementing an algorithm that relies on sparse matrix inversion.
The code:
kapa_t=phi_t*F_x'*(inv(inv(R_t)+F_x*phi_t*F_x'))*F_x*phi_t;
I wrote the code in Matlab. It gives me a warning: "Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = 4.419037e-18." But matrix inversion is an important part of my algorithm, so I am looking for an efficient way to do it, and I found this link: how to compute inverse of a matrix accurately?
So I changed my code as suggested:
kapa_t=phi_t*F_x'*(inv(inv(R_t)+F_x*phi_t*F_x'))\F_x*phi_t;
After that I get an error:
Error using \
Matrix dimensions must agree.
Error in EKF_SLAM_known (line 105)
kapa_t=phi_t*F_x'*(inv(inv(R_t)+F_x*phi_t*F_x'))\F_x*phi_t;
Here line no. 8 of the algorithm is equivalent to the code kapa_t=phi_t*F_x'*(inv(inv(R_t)+F_x*phi_t*F_x'))*F_x*phi_t;
What should I do with my code to get rid of this warning?
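A sketch in NumPy (the same idea applies verbatim in Matlab with the backslash operator) of the usual fix: replace inv(M) * V with a linear solve, so the badly conditioned inverse is never formed. The shapes of phi_t, F_x and R_t below are invented filler; only the pattern matters.

```python
import numpy as np

# Replace inv(M) @ V with a linear solve; this is what Matlab's M \ V does.
# phi_t, F_x, R_t shapes are made-up placeholders mirroring the question.
rng = np.random.default_rng(0)
phi_t = np.eye(4)
F_x = rng.standard_normal((3, 4))
R_t = np.eye(3)
M = np.linalg.inv(R_t) + F_x @ phi_t @ F_x.T
# kapa_t = phi_t * F_x' * inv(M) * F_x * phi_t, without forming inv(M):
kapa_t = phi_t @ F_x.T @ np.linalg.solve(M, F_x @ phi_t)
```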