Sort a numpy 2d array by 1st row, maintaining columns
In python, I have a numpy array of the form:
[4 8 2 0 5]
[3 1 6 8 1]
[2 2 6 0 3]
[9 7 6 7 8]
[5 8 1 1 4]
I want to sort it by the value of the first row, from left to right in ascending order, while keeping each column intact as a whole. The actual arrays have unspecified dimensions and are pretty gigantic, so writing something myself with for loops gets prohibitively slow. The result should be:
[0 2 4 5 8]
[8 6 3 1 1]
[0 6 2 3 2]
[7 6 9 8 7]
[1 1 5 4 8]
I can get a row vector with the column indexes ordered correctly using argsort, but don't know where to go from there on actually building the new array.
1 answer

Source array:
In [215]: a
Out[215]:
array([[4, 8, 2, 0, 5],
       [3, 1, 6, 8, 1],
       [2, 2, 6, 0, 3],
       [9, 7, 6, 7, 8],
       [5, 8, 1, 1, 4]], dtype=int64)
Using Numpy indexing:
In [218]: a[:, a[0].argsort()]
Out[218]:
array([[0, 2, 4, 5, 8],
       [8, 6, 3, 1, 1],
       [0, 6, 2, 3, 2],
       [7, 6, 9, 8, 7],
       [1, 1, 5, 4, 8]], dtype=int64)
Using Pandas:
In [212]: pd.DataFrame(a).sort_values(0, axis=1).values
Out[212]:
array([[0, 2, 4, 5, 8],
       [8, 6, 3, 1, 1],
       [0, 6, 2, 3, 2],
       [7, 6, 9, 8, 7],
       [1, 1, 5, 4, 8]], dtype=int64)
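As a sanity check, the fancy-indexing approach can be reproduced end to end; the same `argsort` index also gives descending order when reversed (a minimal sketch using the example array from the question):

```python
import numpy as np

a = np.array([[4, 8, 2, 0, 5],
              [3, 1, 6, 8, 1],
              [2, 2, 6, 0, 3],
              [9, 7, 6, 7, 8],
              [5, 8, 1, 1, 4]])

order = a[0].argsort()      # column order that sorts the first row
sorted_cols = a[:, order]   # reorder whole columns at once

# Reversing the index sorts by the first row in descending order instead.
descending = a[:, order[::-1]]

print(sorted_cols[0])  # first row is now sorted: [0 2 4 5 8]
```

Both operations are vectorized, so they stay fast on the "pretty gigantic" arrays the question mentions.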
See also questions close to this topic

How to detach Python child process on Windows (without setsid)?
I'm migrating some process code to Windows which worked well on POSIX. Very simply put: the code to launch a subprocess and immediately detach will not work, because setsid() is not available:

import os, subprocess, sys
p = subprocess.Popen([sys.executable, '-c', "print 'hello'"], preexec_fn=os.setsid)

I can remove the use of setsid, but then the child process ends when the parent ends. My question is: how do I achieve the same effect as setsid on Windows, so that the child process's lifetime is independent of the parent's? I'd be willing to use a particular Python package if one exists for this sort of thing. I'm already using psutil, for example, but I didn't see anything in it that could help me.
Recursively finding a base sequence
findStartRec(goal, count) recursively searches forward from an initial value of 0 and returns the smallest integer starting value that reaches or exceeds the goal. The preconditions are that goal >= 0 and count > 0. If the double (x * 2) and add 5 (+ 5) sequence starting at 0 cannot reach the goal in count steps, then try starting at 1. Continue this process until the program finds a starting value N that does reach or exceed the goal in count steps, and return that start value.
Example:
findStartRec( 100, 3 ) returns '9'
Here is what I have come up with so far
def findStartRec(goal, count, sequence=0, itter=0):
    if sequence == goal and count == 0:
        print("Sequence: ", sequence, "Itter: ", itter)
        return sequence, itter
    else:
        while count > 0:
            sequence = (itter * 2) + 5
            count = count + 1
            #return findStartRec(goal, count + 1, sequence, itter)
        else:
            return findStartRec(goal, count, sequence, itter + 1)
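For reference, one way to structure the recursion (a sketch, assuming "reach or exceed" means the value after exactly count double-and-add-5 steps is >= goal; the helper names here are my own, not from the assignment):

```python
def apply_steps(start, count):
    """Apply x -> x * 2 + 5 to start, count times."""
    value = start
    for _ in range(count):
        value = value * 2 + 5
    return value

def find_start_rec(goal, count, start=0):
    """Smallest start whose sequence reaches or exceeds goal in count steps."""
    if apply_steps(start, count) >= goal:
        return start
    # This start falls short, so recurse with the next candidate.
    return find_start_rec(goal, count, start + 1)

print(find_start_rec(100, 3))  # 9, since 9 -> 23 -> 51 -> 107 >= 100
```

Starting at 8 gives 8 -> 21 -> 47 -> 99, which falls short of 100, so 9 is indeed the smallest valid start.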

Data preparation with python
I have a text file that I want to divide into four parts, as indicated in the code, but it always generates errors.
# First import pandas and the regex module
import pandas as pd
import numpy as np
import re

data = open("Discussion.txt", encoding="utf8")
contenu = data.read()
data.close()
print(contenu)

# Read the .txt file into a string
data = open("Discussion.txt", encoding="utf8")
string = data.read()
data.close()

# Split separate lines into list of strings
splitstring = string.splitlines()

# For each list item find the data needed (with regex or indexing)
# and assign to a dictionary
df = {}
for i in range(len(splitstring)):
    match = re.search(r'(.* .*)  (.*): (.*)', splitstring[1])
    line = {
        'Date': splitstring[i][:10],
        'Time': match.group(1),
        'Number': match.group(2),
        'Text': match.group(3)}
    df[i] = line

AttributeError                            Traceback (most recent call last)
<ipython-input-54-3a1f0fdf7c6> in <module>()
      8     line = {
      9         'Date' : splitstring[i][:10],
---> 10         'Time' : match.group(1),
     11         'Number' : match.group(2),
     12         'Text' : match.group(3)}

AttributeError: 'NoneType' object has no attribute 'group'

# Convert dictionary to pandas dataframe
dataframe = pd.DataFrame(df).T

#Finally send to csv
dataframe.to_csv(filepath)

  File "<ipython-input-6-2b1b4e00c433>", line 3
    Finally send to csv
    ^
IndentationError: unexpected indent
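The AttributeError above occurs whenever re.search returns None, i.e. for any line that does not match the pattern. A minimal sketch of the guard (the sample lines here are made up, since the real content of Discussion.txt is not shown):

```python
import re

# Hypothetical lines in the chat-export-style format the regex expects.
lines = [
    "25/01/2019, 10:12  +33612345678: Bonjour",
    "this line does not match the pattern",
]

rows = []
for raw in lines:
    match = re.search(r'(.* .*)  (.*): (.*)', raw)
    if match is None:   # skip non-matching lines instead of crashing
        continue
    rows.append({'Date': raw[:10],
                 'Time': match.group(1),
                 'Number': match.group(2),
                 'Text': match.group(3)})

print(rows)
```

Note also that the original loop searches splitstring[1] on every iteration instead of splitstring[i], which would parse the same line repeatedly.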
Here is a preview of the content printed by print(contenu), shown in the image:

How to call object from array in JavaScript
I have an array with these objects:
const words =[ {PL: "1", EN: "a"}, {PL: "2", EN: "b"}];
How do I call one of the values? For example:
1
I have tried:
document.getElementById("word").innerHTML=Object.values(words[0]);
but it shows
1, a

C++ Compiler passing by reference without me telling it to?
So I made a program earlier in the year that taught us how to use arrays. Just recently I was taught about the use of pointers and structures in C++. Now that I have that knowledge and understanding (or so I think), I am confused as to how my earlier program managed to work!
Before I say more, here is the program code:
Pay special attention to my askSales function, as it leaves me baffled.
void askSales(int salesArray[], string namesArray[])
{
    for (int counter = 0; counter <= NUM_OF_POS; counter++)
    {
        cout << "How many jars of " << namesArray[counter] << "sold? : ";
        cin >> salesArray[counter];
        while (salesArray[counter] < 0)
        {
            cout << endl
                 << "You cannot enter a negative number for sales. If no jars were\nsold please enter 0. Please answer prompt again."
                 << endl << endl
                 << "How many jars of " << namesArray[counter] << "sold? : ";
            cin >> salesArray[counter];
        }
    }
}
Here's what I am confused about:
How in the world is my program able to modify the contents of the arrays passed to askSales when those arguments are not passed by reference?

Change JSON object to another format
How do I change this JSON object to be like the following?
Original JSON:
["Monte Mor, São Paulo", "Monte Mor, São Paulo", "Monte Mor, São Paulo"]
Expected:
[{location: 'Monte Mor, São Paulo'}, {location: 'Monte Mor, São Paulo'}, {location: 'Monte Mor, São Paulo'}]
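The question doesn't name a language (strictly, the "Expected" form with unquoted keys is a JavaScript object literal rather than JSON). For what it's worth, in Python the transformation is a one-line list comprehension over the decoded array (a sketch, with the array shortened):

```python
import json

original = json.loads('["Monte Mor, São Paulo", "Monte Mor, São Paulo"]')

# Wrap each string in an object with a "location" key.
converted = [{"location": s} for s in original]

print(json.dumps(converted, ensure_ascii=False))
```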

How to sort an ArrayList of HSV colors
I am doing an Android Launcher app that extracts the average color of every app icon.
The idea is to sort apps by their color in HSV and have a beautiful gradient made of apps.
First I extract the average RGB. Then I convert RGB to HSV and compare the colors with each other on the HUE scale (0 to 360).
The problem is that the order of colors is not visually pleasant. I think that's because the sorting only works on the HUE dimension. I mean, the sorting is working fine, but there could be a "red" that has full brightness and so to us looks "white".
I tried sorting by HUE, then by Saturation, then by Value. Same result: an ordered app color list that seems unordered to the eye.
Later I coded a lot of if statements in the comparison method, excluding apps whose Saturation/Brightness is greater than or lower than some threshold, etc.
Now I get something better, but not quite there yet:
(The average color of each app icon is represented as the background color.)
Then I added a bunch more if statements. But the same problem keeps occurring:
BOTTOM apps (disaster): these are the apps that the if statements exclude; they are then sorted into another list. Finally, I'm using .addAll() to append the "disaster" app list to the bottom.
I am looking to do something like this:
I found this question (the above image is from there), but I don't know if it is useful for my particular case: How to sort colors in two dimensions?
I tried looking at and implementing methods like Euclidean distance, the travelling salesman problem, and k-d trees... but I don't know if that is the right way to do it, or how I could use them for sorting the list of colors. I also tried searching for Java libraries for sorting colors, but I didn't find any.
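The question is about Java/Android, but the "hue-only sort looks wrong" problem is language-agnostic. One common trick (not from the question, just a sketch) is to quantize hue into a small number of bands and sort by (band, value), so brightness orders the colors within each hue band; sketched here in Python with the standard colorsys module:

```python
import colorsys

def hue_band_key(rgb, bands=8):
    """Sort key: coarse hue band first, then brightness (value)."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return (int(h * bands), v)

# Hypothetical average icon colors: blue, red, green, dark red.
colors = [(0, 0, 255), (255, 0, 0), (0, 255, 0), (128, 0, 0)]
ordered = sorted(colors, key=hue_band_key)
print(ordered)  # dark red and red end up adjacent, before green and blue
```

Tuning the number of bands trades off gradient smoothness against how often a bright and a dark color of similar hue get separated; the same key translates directly to a Java Comparator.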
Thank you very much for your time.

Stream.sorted() then collect, or collect then List.sort()?
In general, is there a performance difference between these two pieces of code?
List<Integer> list1 = someStream1.sorted().collect(toList());

// vs.

List<Integer> list2 = someStream2.collect(toList());
list2.sort(Comparator.naturalOrder());
Variant 2 is obviously yucky and should be avoided, but I'm curious if there are any performance optimizations built into the mainstream (heh, mainstream) implementations of Stream that would result in a performance difference between these two.
I imagine that because the stream has strictly more information about the situation, it would have a better opportunity to optimize. E.g. I imagine that if this had a findFirst() call tacked on, it would elide the sort in favor of a min operation.
Find not used values in array
I am writing a function in Java to find the first available (not used) values in an array. The range is 0 to 999.
For example,
{1,3,4,10} -> available 0,2,5
{0,1,3,4,10} -> available 2,5
My function works when 0 is not present. How do I make it work for both cases?
public class Values {
    public static void main(String[] args) {
        int myArrray[] = {0, 1, 3, 4, 10};
        int temp = 0;
        int index = 0;
        int available = 0;
        for (int i = 0; i < myArrray.length; i++) {
            if (temp == 0 && myArrray[i] != temp) {
                available = temp;
                System.out.println("value of temp: " + temp);
                System.out.println("value of available time: " + available);
                System.out.println("value of index: " + i);
            } else if (myArrray[i] - temp > 1) {
                available = temp + 1;
                temp = available;
                System.out.println("value of temp: " + temp);
                System.out.println("value of available time: " + available);
                System.out.println("value of index: " + i);
            } else {
            }
            temp = myArrray[i];
        }
    }
}
result
value of temp: 0
value of available time: 0
value of index: 1
value of temp: 2
value of available time: 2
value of index: 2
value of temp: 5
value of available time: 5
value of index: 4

How do I use numpy vectorize to iterate through a two-dimensional vector?
I am trying to use numpy.vectorize to iterate over a (2x5) matrix which contains two vectors representing the x- and y-values of coordinates. The coordinates (x- and y-values) are to be fed to a function returning a (1x1) vector for each iteration, so that in the end the result should be a (1x5) vector. My problem is that instead of iterating through each element, I want the algorithm to iterate through both vectors simultaneously, so it picks up the x- and y-values of each coordinate in parallel to feed to the function.
import numpy as np

data = np.transpose(np.array([[1, 2], [1, 3], [2, 1], [1, 1], [2, 1]]))
th_ = np.array([[1, 1]])
th0_ = 2

def positive(x, th=th_, th0=th0_):
    if signed_dist(x, th, th0)[0][0] > 0:
        return np.array([[1]])
    elif signed_dist(x, th, th0)[0][0] == 0:
        return np.array([[0]])
    else:
        return np.array([[-1]])

positive_numpy = np.vectorize(positive)
results = positive_numpy(data)
Reading the numpy documentation did not really help, and I want to avoid large workarounds for the sake of computation time. Thankful for any suggestions!
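For what it's worth, np.vectorize accepts a signature argument that makes it iterate over whole sub-vectors instead of scalars. A minimal sketch (since signed_dist isn't shown in the question, a hypothetical linear signed distance with th = [1, 1] and th0 = -2 is used; those values are my assumption, not from the question):

```python
import numpy as np

data = np.transpose(np.array([[1, 2], [1, 3], [2, 1], [1, 1], [2, 1]]))  # shape (2, 5)
th = np.array([1, 1])
th0 = -2  # hypothetical offset, chosen so one point lies on the boundary

def positive(point):
    # point is a whole (x, y) coordinate; return +1, 0, or -1
    return int(np.sign(np.dot(th, point) + th0))

# signature='(n)->()' : consume one length-n vector per call, emit a scalar
positive_vec = np.vectorize(positive, signature='(n)->()')
results = positive_vec(data.T)  # iterate over the 5 columns as (x, y) pairs

print(results)  # one label per coordinate, shape (5,)
```

With plain np.vectorize (no signature), the function would instead be called once per scalar element, which is exactly the behavior the question wants to avoid.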

How to get data from python datatype returned in MATLAB?
I have a python script like so:
import numpy as np

def my_function(x):
    return np.array([x])
And I have a MATLAB script to call it:
clear all; clc;

if count(py.sys.path,'') == 0
    insert(py.sys.path,int32(0),'');
end

myfunction_results = py.python_matlab_test.my_function(8);
display(myfunction_results);
And it displays:
myfunction_results =

  Python ndarray with properties:

           T: [1×1 py.numpy.ndarray]
        base: [1×1 py.NoneType]
      ctypes: [1×1 py.numpy.core._internal._ctypes]
        data: [1×8 py.buffer]
       dtype: [1×1 py.numpy.dtype]
       flags: [1×1 py.numpy.flagsobj]
        flat: [1×1 py.numpy.flatiter]
        imag: [1×1 py.numpy.ndarray]
    itemsize: 8
      nbytes: 8
        ndim: 1
        real: [1×1 py.numpy.ndarray]
       shape: [1×1 py.tuple]
        size: 1
     strides: [1×1 py.tuple]

    [8.]
But I do not know how to actually get the data out of this object. The type is py.numpy.ndarray, but I obviously want to use it in MATLAB as an array or matrix, or an integer or something. How do I convert it to one of those types? I've been looking at these:
https://www.mathworks.com/help/matlab/examples/call-python-from-matlab.html
https://www.mathworks.com/matlabcentral/answers/216498-passing-numpy-ndarray-from-python-to-matlab
https://www.mathworks.com/help/matlab/matlab_external/use-matlab-handle-objects-in-python.html
Some of the answers suggest writing to a .mat file. I DO NOT want to write to a file. This needs to be able to run in real time, and writing to a file would make it very slow for obvious reasons. There seems to be an answer here: "Converting" Numpy arrays to Matlab and vice versa, which shows:
shape = cellfun(@int64, cell(myfunction_results.shape));
ls = py.array.array('d', myfunction_results.flatten('F').tolist());
p = double(ls);
But I must say that is very cumbersome... Is there an easier way?

How to solve ValueError: cannot reindex from a duplicate axis in python
I have a dataset of all categorical columns, from which I need to find the proportion of the target class (i.e. 1) within each level of a categorical variable, and then append the correlation of each level with target_class by dummying the categorical variable. Below is an example of the input data and expected output:
# Input Data:
df_data = pd.DataFrame(
    {'production': ['1101100000', '1101100000', '100100000', '100100000', '1101100000',
                    '1101100000', '1001000000', '1101100000', '1101100000', '1101100000'],
     'enc_svod': ['Free', 'Free', 'Pay', '', 'Pay', 'Free', 'Free', '', '', 'Pay'],
     'status': [1, 0, 0, 0, 1, 0, 0, 0, 0, 1]})
Code to find proportions and correlation with target_class:
cat_cols = ['production', 'enc_svod']

# Code to find proportions and correlation with target_class:
# Now traverse through each column and calculate correlation and generate metrics
cat_count = 0
cat_metrics_df = pd.DataFrame()
for each_col in cat_cols:
    df_temp = pd.DataFrame()
    df_single_col_data = df_data[[each_col]]
    cat_count += 1

    # Calculate uniques and nulls in each column to display in log file.
    uniques_in_column = len(df_single_col_data[each_col].unique())
    nulls_in_column = df_single_col_data.isnull().sum()
    print('Working on column %s, converting to dummies and finding correlation with target' % (each_col))

    df_categorical_attribute = pd.get_dummies(df_single_col_data[each_col].astype(str), dummy_na=True, prefix=each_col)
    df_categorical_attribute = df_categorical_attribute.loc[:, df_categorical_attribute.var() != 0.0]  # Drop columns with 0 variance.
    df_temp['correlation'] = df_categorical_attribute.corrwith(df_data['status'])

    try:
        # Calculate Index : Proportions of 1's within each CAT level
        frames = [df_single_col_data, df_data['status']]
        df_proportions = pd.concat(frames, axis=1)
        df_proportions = df_proportions.fillna('nan').groupby(each_col, as_index=True).mean()
        df_proportions.index = [str(df_proportions.index.name) + '_' + str(x) for x in df_proportions.index.values]
        df_temp['Index'] = df_temp.join(df_proportions)['status']
        df_temp['Attribute'] = str(each_col)
        cat_metrics_df = cat_metrics_df.append(df_temp)
    except ValueError:
        print("Error for column %s:" % (each_col))
        continue
The reason I use try/except here is that for some variables there is a ValueError, as below:
Traceback (most recent call last):
  File "/user/data_processing_functions.py", line 443, in metrics_categorical
    df_temp['Index'] = df_temp.join(df_proportions)['disco_status']
  File "/user/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2331, in __setitem__
    self._set_item(key, value)
  File "/user/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2397, in _set_item
    value = self._sanitize_column(key, value)
  File "/user/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2547, in _sanitize_column
    value = reindexer(value)
  File "/user/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2539, in reindexer
    raise e
  File "/user/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py", line 2534, in reindexer
    value = value.reindex(self.index)._values
  File "/user/anaconda3/lib/python3.6/site-packages/pandas/core/series.py", line 2426, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "/user/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 2515, in reindex
    fill_value, copy).__finalize__(self)
  File "/user/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 2533, in _reindex_axes
    copy=copy, allow_dups=False)
  File "/user/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py", line 2627, in _reindex_with_indexers
    copy=copy)
  File "/user/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py", line 3886, in reindex_indexer
    self.axes[axis]._can_reindex(indexer)
  File "/user/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2836, in _can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
For some columns there are more unique values (19) than the number of categories left after:
df_categorical_attribute = df_categorical_attribute.loc[:, df_categorical_attribute.var() != 0.0]  # Drop columns with 0 variance.
This happens when I run it on the server, which has pandas version 0.20.3, whereas on my local machine it is the latest one, 0.23.4. I am not sure if that is the reason or if there is some other cause for this error. I thought of using try/except so that on such ValueErrors it skips that column. I am not sure why this is happening; I am guessing it is because of spaces in the whole data (2.5 million rows * 1200 columns; I am using a 50,000-row sample on my local machine), which my sample might not be capturing, I think.
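The error itself is easy to reproduce in isolation: assigning (or joining) a Series whose index contains duplicate labels forces pandas to reindex from a duplicate axis. A minimal sketch (the level names are made up to mirror the question's prefixing scheme):

```python
import pandas as pd

# A Series with a duplicated index label, e.g. two CAT levels that
# collapse to the same name after string conversion / grouping.
s = pd.Series([0.5, 0.7], index=['enc_svod_Free', 'enc_svod_Free'])
df = pd.DataFrame({'correlation': [0.1]}, index=['enc_svod_Free'])

try:
    df['Index'] = s  # pandas must reindex s to df.index -> ambiguous
except ValueError as e:
    print(e)         # message mentions reindexing with a duplicate axis/labels
```

Checking `df_proportions.index.is_unique` before the join would make the failing columns explicit, rather than hiding them behind the blanket try/except.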