Is it possible to create a numpy.memmap of array of arrays?
I have several (4,) arrays that I want to save to disk (the sizes I am working with cannot fit into memory, so I need to dynamically load what I need). However, I want to have all of them in a single numpy.memmap. Not sure if it is possible, but any suggestion would be greatly appreciated.
Here is what I have without numpy.memmap:
arr1 = [1,2,3,4]
arr2 = [2,3,4,5]
arr3 = [3,4,5,6]
arr4 = [4,5,6,7]
data = []
data.extend([arr1])
data.extend([arr2])
data.extend([arr3])
data.extend([arr4])
print(data)
[[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]]
I want to be able to do something like this:
import numpy as np
arr1 = np.memmap('./file1', np.dtype('O'), mode='w+', shape=(4,))
arr1[:] = [1,2,3,4]
arr2 = np.memmap('./file2', np.dtype('O'), mode='w+', shape=(4,))
arr2[:] = [2,3,4,5]
arr3 = np.memmap('./file3', np.dtype('O'), mode='w+', shape=(4,))
arr3[:] = [3,4,5,6]
arr4 = np.memmap('./file4', np.dtype('O'), mode='w+', shape=(4,))
arr4[:] = [4,5,6,7]
data = []
data.extend([arr1])
data.extend([arr2])
data.extend([arr3])
data.extend([arr4])
print(data)
[memmap([1, 2, 3, 4], dtype=object), memmap([2, 3, 4, 5], dtype=object), memmap([3, 4, 5, 6], dtype=object), memmap([4, 5, 6, 7], dtype=object)]
This requires me to create a different file per array, and I really want to have a single memmap that handles all of the mini-arrays of 4. Can someone provide a way to do this using memmaps?
The ability to extend, i.e. data.extend(), is important, as I don't know how many mini-arrays I have.
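Since every mini-array has length 4, one possible approach (a sketch, not from the question itself; the file path is hypothetical) is a single 2-D memmap with one row per mini-array. Reopening the file with a larger shape in 'r+' mode makes numpy grow the file on disk, which gives an extend-like operation without one file per array:

```python
import numpy as np
import os, tempfile

fname = os.path.join(tempfile.mkdtemp(), 'data.dat')  # hypothetical path

rows = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6], [4, 5, 6, 7]]
mm = np.memmap(fname, dtype=np.int64, mode='w+', shape=(len(rows), 4))
mm[:] = rows          # all mini-arrays live in one file
mm.flush()
del mm                # close the original map

# "extend": reopen with one extra row; in 'r+' mode numpy extends the file
mm = np.memmap(fname, dtype=np.int64, mode='r+', shape=(len(rows) + 1, 4))
mm[-1] = [5, 6, 7, 8]
mm.flush()
```

Each mini-array is then addressable as mm[i] without loading the whole file into memory.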
See also questions close to this topic

Original value of variable gets lost after initialising other variable (with it)
(This topic is similar to others, but those are often very specific; I tried to raise it at a high level.)
I initialised a variable X (type list) with another variable arrayMetricPI (type list). But as I change X, I also change arrayMetricPI, which doesn't make sense to me. Maybe it's Python specific? What can I do to keep the original value of a variable unchanged (here: arrayMetricPI) even if I change the initialised variable (here: X)?

print(arrayMetricPI)     # returns array "[0.0, 0.01, 0.02, 0.03, ...]"
X = arrayMetricPI
for xx in np.arange(0, counter, 1):
    # delete the first entry of the array "X"
    print(X)             # returns array "[0.01, 0.02, 0.03, ...]"
    print(arrayMetricPI) # returns array "[0.01, 0.02, 0.03, ...]"

I expected that arrayMetricPI still starts with [0.0, ...], as I just changed the list X.
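What the question describes is Python's name-binding behaviour: X = arrayMetricPI does not copy the list, it makes X a second name for the same list object, so mutations through either name are visible through both. A minimal sketch with hypothetical data:

```python
arrayMetricPI = [0.0, 0.01, 0.02, 0.03]

X = arrayMetricPI              # X is the SAME list object, not a copy
X_copy = arrayMetricPI.copy()  # a real (shallow) copy

del X[0]                       # mutating via X mutates the shared object

print(arrayMetricPI)           # [0.01, 0.02, 0.03] -- changed too
print(X_copy)                  # [0.0, 0.01, 0.02, 0.03] -- unchanged
```

list(arrayMetricPI) or the slice arrayMetricPI[:] also produce shallow copies; for nested lists, copy.deepcopy is needed.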
How to extract structured data from a PDF using regex
I have a pdf that repeats many times the following:
31102018 NATIONAL Initial Hearing Imputed: Maynor Steven Sevilla Flores Crime: murder Relation of facts: murder at 10 am in the neighborhood cox 20...… NOTE: xxxxxxxx... NOTE2:xxxxxxxx... DATA: xxxxxxx... 01112018 NATIONAL Initial Hearing Imputed: James Graden Crime: murder Relation of facts: murder at 11 am in the neighborhood bit 45...… . . .
I want to implement Python code along these lines:
import PyPDF2
import re

PATH_DOWNLOAD_PDF = '/home/Dev/Freelance/Webscrapping/test/file.pdf'
pdf_file = open(PATH_DOWNLOAD_PDF, 'rb')
read_pdf = PyPDF2.PdfFileReader(pdf_file)
# ...
I need to parse the PDF text with a Python regular expression to get the following result:
Expected result (a Python list of dicts):

[
  {
    "Date": "31102018",
    "Judge": "NATIONAL",
    "Initial Hearing": {
      "Imputed": "Maynor Steven Sevilla Flores",
      "Crime": "murder",
      "Relation of facts": "murder at 10 am in the neighborhood cox 20..."
    }
  },
  {
    "Date": "01112018",
    "Judge": "NATIONAL",
    "Initial Hearing": {
      "Imputed": "James Graden",
      "Crime": "murder",
      "Relation of facts": "murder at 11 am in the neighborhood bit 45..."
    }
  }
]
I'm new to programming and I need your help, please.
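Assuming the PDF text comes back as one long string (e.g. from PyPDF2's page text extraction), one hedged sketch is a regex with named groups over the repeating record layout. The pattern and field names are guesses based on the sample above, and the sample string here is a hypothetical stand-in for the extracted text:

```python
import re

# hypothetical stand-in for text extracted from the PDF
text = ("31102018 NATIONAL Initial Hearing Imputed: Maynor Steven Sevilla Flores "
        "Crime: murder Relation of facts: murder at 10 am in the neighborhood cox 20... "
        "01112018 NATIONAL Initial Hearing Imputed: James Graden "
        "Crime: murder Relation of facts: murder at 11 am in the neighborhood bit 45...")

# each record: 8-digit date, judge, then labelled fields; the lookahead stops
# "Relation of facts" at the next record's date or at end of text
pattern = re.compile(
    r"(?P<date>\d{8})\s+(?P<judge>\w+)\s+Initial Hearing\s+"
    r"Imputed:\s+(?P<imputed>.+?)\s+Crime:\s+(?P<crime>\w+)\s+"
    r"Relation of facts:\s+(?P<facts>.+?)(?=\s+\d{8}\s|\s*$)",
    re.DOTALL,
)

records = [
    {
        "Date": m.group("date"),
        "Judge": m.group("judge"),
        "Initial Hearing": {
            "Imputed": m.group("imputed"),
            "Crime": m.group("crime"),
            "Relation of facts": m.group("facts"),
        },
    }
    for m in pattern.finditer(text)
]
```

The "NOTE"/"DATA" fields from the sample would need extra optional groups along the same lines.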

avoid loading specific class in hierarchy with sqlalchemy query
I want to load an entity via a sqlalchemy query while explicitly avoiding loading a specific class of entity as a field on any instance of any child of my loaded entity. Take the below data model:
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, ForeignKey
from sqlalchemy.orm import relationship

base = declarative_base()

class Parent(base):
    __tablename__ = 'Parent'
    uid = Column(Integer, primary_key=True)

class Child(base):
    __tablename__ = 'Child'
    uid = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('Parent.uid'))
    parent = relationship('Parent', backref="children")

class OtherChild(base):
    __tablename__ = 'OtherChild'
    uid = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('Parent.uid'))
    parent = relationship('Parent', backref="other_children")

class Bicycle(base):
    __tablename__ = 'Bicycle'
    uid = Column(Integer, primary_key=True)
    child_id = Column(Integer, ForeignKey('Child.uid'))
    child = relationship('Child', backref="bicycles")
    other_child_id = Column(Integer, ForeignKey('OtherChild.uid'))
    other_child = relationship('OtherChild', backref="bicycles")
If I do a Parent.query.all(), then I'm going to get back any Child or OtherChild objects which are in those Parent objects in the children and other_children fields, respectively. Further, I'll get any Bicycle objects which are embedded inside either the Child or OtherChild objects.

I wish to do a query on Parent which explicitly avoids loading any Bicycle objects on any children, regardless of how deep they may be in the data structure.
Make a window of size L on a graph
I have a networkx lattice which looks like this
m = 100
G = nx.grid_2d_graph(m, m, periodic=True)
This is a lattice of dimension m × m. Now I want to create a window of dimension l, where l is from 0
How do I code this? An idea would be sufficient; I don't need any code.

Multiplying Matrices of Vectors using Dot Product
I am given two 2D matrices where each of the cells is a vector with three elements.
What I'm looking to do is 2D matrix multiplication where, when any of the cells are multiplied together, it takes the dot product of the two three-element vectors.
My linear algebra skills are lacking so apologies if there is an answer already, I've looked over many pages relating to tensordot and einsums but I don't understand how each of these might apply to my situation.
Here is basically what I am given:
import numpy as np

ar1 = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
ar2 = np.array([[[2, 3, 4], [4, 5, 6]], [[6, 7, 8], [8, 9, 10]]])
Here is how to make what I am looking for:
final = [[0 for x in range(2)] for y in range(2)]
final[0][0] = np.dot(ar1[0][0], ar2[0][0]) + np.dot(ar1[0][1], ar2[1][0])
final[0][1] = np.dot(ar1[0][0], ar2[0][1]) + np.dot(ar1[0][1], ar2[1][1])
final[1][0] = np.dot(ar1[1][0], ar2[0][0]) + np.dot(ar1[1][1], ar2[1][0])
final[1][1] = np.dot(ar1[1][0], ar2[0][1]) + np.dot(ar1[1][1], ar2[1][1])
final

Output: [[106, 142], [226, 310]]
In reality these matrices are going to be around 3000x40000x3 and 40000x40x3, so taking speed into account is greatly appreciated. Thanks!
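The four assignments above are exactly a matrix product whose scalar multiplication is replaced by a 3-vector dot product: final[i][l] = sum over j and k of ar1[i][j][k] * ar2[j][l][k]. np.einsum expresses that index pattern directly and avoids Python-level loops, which matters at the stated sizes:

```python
import numpy as np

ar1 = np.array([[[1, 2, 3], [3, 4, 5]], [[5, 6, 7], [7, 8, 9]]])
ar2 = np.array([[[2, 3, 4], [4, 5, 6]], [[6, 7, 8], [8, 9, 10]]])

# i, l index the output cells; j is the summed matrix index; k the vector index
final = np.einsum('ijk,jlk->il', ar1, ar2)
print(final)  # [[106 142]
              #  [226 310]]
```

For very large operands it may also be worth trying np.tensordot or einsum's optimize=True flag, which can pick a faster contraction path.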

Please help me with an issue about the seed
I am trying to implement some codes and can't figure out the difference between them.
x = np.random.RandomState(42)
y = np.random.RandomState(seed=None)
What is the difference between them?
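RandomState(42) fixes the seed, so the stream of numbers it generates is reproducible across runs; RandomState(seed=None) seeds from the operating system's entropy, so each such instance produces a different stream. A quick sketch:

```python
import numpy as np

a = np.random.RandomState(42)
b = np.random.RandomState(42)
c = np.random.RandomState(seed=None)  # fresh OS entropy, different every run

same = a.rand(3)
also_same = b.rand(3)
print(np.array_equal(same, also_same))  # True: identical seed, identical stream
print(c.rand(3))                        # no fixed value to expect here
```

Fixed seeds are what you want for reproducible experiments and tests; seed=None is what you want when the numbers should genuinely vary.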

How to get the original combination of numpy array?
I have two numpy arrays, and I am able to get all the combinations of adding these two arrays where none of the resulting rows has any zeros left. But while doing so I lose the original constituents of each combination, and I am not sure how to retrieve that piece of information. Please look at my code below:
x = np.array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
              [0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
              [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
              [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
              [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
y = np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
              [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]])

x = np.expand_dims(x, 1)
combos = (x + y).reshape(-1, 12).astype("int")
mask = np.any(np.equal(combos, 0), axis=1)
combos = combos[~mask]
print("combos:", combos)

# Prints
# combos: [[1 1 1 2 2 2 2 2 1 1 1 1]
#          [1 1 1 1 2 2 2 2 2 1 1 1]
#          [1 1 1 2 2 2 2 2 2 1 1 1]
#          [1 1 1 2 2 2 2 2 2 1 1 1]]
Now, from the above result, I need to know which row values of x and y created each row of combos; for example, for the first row:
Combos[0] = [1,1,1,2,2,2,2,2,1,1,1,1]
X = [1,1,1,1,1,1,1,1,0,0,0,0]
Y = [0,0,0,1,1,1,1,1,1,1,1,1]
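One way to keep the provenance (a sketch using the arrays from the question) is to compute the validity mask in its unflattened (9, 4) shape and use np.argwhere, so each surviving combo row is paired with the (x-row, y-row) indices that produced it:

```python
import numpy as np

x = np.array([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0],
              [0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
              [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0],
              [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
              [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
              [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]])
y = np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
              [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
              [0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
              [0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]])

sums = x[:, None, :] + y[None, :, :]   # shape (9, 4, 12): every x-row + y-row pair
valid = ~np.any(sums == 0, axis=2)     # shape (9, 4): True where no zeros remain
pairs = np.argwhere(valid)             # (x_index, y_index) for each valid combo
combos = sums[valid]                   # same row order as `pairs`

print(pairs)   # e.g. first entry says combos[0] == x[pairs[0,0]] + y[pairs[0,1]]
```

combos[k] is always x[pairs[k, 0]] + y[pairs[k, 1]], so the original constituents are recoverable by index.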

Splitting a numpy array based on the length of another list
I need to split a numpy.ndarray (matrix) based on the lengths of the sub-lists in another list of lists (position). For example:
position: [[0.0056, 0.0065, 0.008], [0.009, 0.1, 0.127], [0.232, 0.879]]
In this example the matrix is of size (5,8)
matrix: array([[0, 0, 1, 0, 0, 1, 1, 1],
               [1, 1, 0, 1, 1, 1, 1, 1],
               [0, 1, 0, 1, 0, 1, 0, 1],
               [1, 1, 1, 0, 0, 0, 0, 0],
               [1, 1, 0, 0, 1, 0, 1, 0]])
So, for example, the list at the first index of position is of length 3. Therefore I would like to cut matrix at column three, so that the output for the first list in position would be:
0, 0, 1
1, 1, 0
0, 1, 0
1, 1, 1
1, 1, 0
Similarly, the list at the second index of position is also of length 3; therefore, it'll take the next three columns of matrix, and the output would be:
0, 0, 1
1, 1, 1
1, 0, 1
0, 0, 0
0, 1, 0
Last index of position has a list of length two and so it'll take the last two columns of matrix.
I don't know how to achieve this. Help would be appreciated.
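np.split with the cumulative sub-list lengths as cut points (a sketch using the data above) produces exactly these column blocks:

```python
import numpy as np

position = [[0.0056, 0.0065, 0.008], [0.009, 0.1, 0.127], [0.232, 0.879]]
matrix = np.array([[0, 0, 1, 0, 0, 1, 1, 1],
                   [1, 1, 0, 1, 1, 1, 1, 1],
                   [0, 1, 0, 1, 0, 1, 0, 1],
                   [1, 1, 1, 0, 0, 0, 0, 0],
                   [1, 1, 0, 0, 1, 0, 1, 0]])

lengths = [len(p) for p in position]   # [3, 3, 2]
cuts = np.cumsum(lengths)[:-1]         # cut columns at [3, 6]
parts = np.split(matrix, cuts, axis=1) # one block per sub-list of position

for part in parts:
    print(part.shape)  # (5, 3), (5, 3), (5, 2)
```

The last cut point is dropped because np.split takes split positions, not block widths.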

How can I make a transparent background?
I have a .csv file which contains some data, where x, y, x1, y1 are the coordinate points and p is the value. My code below works very well for plotting, but when I plot the data I get a background color, like the purple color. I don't want any color in the background; I want the background to be transparent. My ultimate goal is to overlay this result on an image. I am new to Python. Any help will be highly appreciated.
Download link of the .csv file here or link2 or link3
My Code
import matplotlib.pyplot as plt
from scipy import ndimage
import numpy as np
import pandas as pd
from skimage import transform
from PIL import Image
import cv2

x_dim = 1200
y_dim = 1200

# Read CSV
df = pd.read_csv("flower_feature.csv")

# Create numpy array of zeros of same size
array = np.zeros((x_dim, y_dim), dtype=np.uint8)

for index, row in df.iterrows():
    x = int(row["x"])
    y = int(row["y"])
    x1 = int(row["x1"])
    y1 = int(row["y1"])
    p = row["p"]
    array[x:x1, y:y1] = p

map = ndimage.filters.gaussian_filter(array, sigma=16)
plt.imshow(map)
plt.show()
As per Ghassen's suggestion I am getting the results below (with alpha = 0, alpha = 0.5, and alpha = 1). I am still not getting a transparent background.
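One numpy-only way to get true transparency (a sketch with hypothetical data, not the CSV from the question) is to build an RGBA image where the alpha channel is 0 wherever there is no data. Passing such an array to plt.imshow, together with plt.savefig(..., transparent=True), leaves the empty regions see-through instead of filling them with the colormap's lowest color:

```python
import numpy as np

# hypothetical stand-in for the gaussian-filtered `map` array
heat = np.zeros((4, 4), dtype=float)
heat[1:3, 1:3] = 5.0

rgba = np.zeros(heat.shape + (4,), dtype=float)
rgba[..., 0] = heat / heat.max()         # red channel encodes intensity
rgba[..., 3] = (heat > 0).astype(float)  # alpha: fully transparent where heat == 0

# plt.imshow(rgba) followed by plt.savefig('out.png', transparent=True)
# would then show through to an underlying image wherever alpha is 0
```

A smoother alternative is to set alpha proportional to the values rather than a hard 0/1 mask.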

Is there a way to load a numpy unicode array into a memmap?
I am trying to create an array of dtype='U' and save it using numpy.save(); however, when trying to load the saved file into a numpy.memmap I get an error related to the size not being a multiple of 'U3'.

I am working with Python 3.5.2. I have tried the following code, where I create an empty array and another array with 3 entries (each 3 letters long), and then save the array into a file1.npy file:

import numpy as np

arr = np.empty((1, 0), dtype='U')
arr2 = np.array(['111', '222', '333'], dtype='U')
arr = np.concatenate((arr, arr2), axis=None)
print(arr)
np.save('file1', arr)
rArr = np.memmap('file1.npy', dtype='U3', mode='r')

However, when I try to load the file into a numpy.memmap I get the following error:

ValueError: Size of available data is not a multiple of the data-type size.

Is there a way to load the data into a numpy.memmap using strings? I feel I am missing something simple.
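The size error most likely comes from mapping the raw .npy file: np.save writes a header before the data, so the total byte count is not a multiple of the 'U3' itemsize. A sketch that sidesteps this by letting numpy parse the header and map only the data that follows it (np.load with mmap_mode returns a memmap; the file path here is hypothetical):

```python
import numpy as np
import os, tempfile

fname = os.path.join(tempfile.mkdtemp(), 'file1.npy')  # hypothetical path

arr = np.array(['111', '222', '333'], dtype='U3')
np.save(fname, arr)

# np.load reads the .npy header and memory-maps only the array data after it
rArr = np.load(fname, mmap_mode='r')
print(rArr[1])  # 222
```

Mapping the file with np.memmap directly would require supplying the header length as the offset argument, which np.load computes for you.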
Why does numpy.memmap.flush() not update file in Windows?
I am currently copying/reformatting data from disk into a memmapped numpy array. I call flush() every ~5000 elements, but the change date in Windows does not update.
dimx = 400000
dimy = 100
dimz = 16

data_in: np.memmap = np.memmap(fname_input, dtype='float32', mode='r+', shape=(dimx, dimy, dimz))
for i in range(dimx):
    data = next(data_generator)
    data_in[i] = data
    if (i % 5000) == 0 and i != 0:
        data_in.flush()
        print('changes flushed to disk.')

BUS error occurs while using numpy memmap
I am trying a simple numpy.memmap example from a book, which causes a BUS error no matter which numpy version I try (I have also tried 1.16.3, the latest). Any suggestion from experts would be great.
import numpy as np

nrows, ncols = 1000000, 100
f = np.memmap('memmapped.dat', dtype=np.float32, mode='w+', shape=(nrows, ncols))
for i in range(ncols):
    f[:, i] = np.random.rand(nrows)  # BUS error occurs here

How to use HugePages with pmem allocation?
In my Ubuntu 18.04 Intel system there are 356 GB of DDR. Of this memory, 300 GB are preallocated using the pmem mechanism (since we have external HW that sends us contiguous data chunks of 10 GB). We are accessing this memory in a totally random manner.
We've witnessed memory throughput degradation when accessing this memory, and we came to the understanding that the many accesses/updates to the TLB are causing the performance degradation.
For this reason we are trying to mmap our pmem with huge pages, but currently without success.
Does anybody has any experience with working with huge pages over pmem? Maybe other solutions for our scenario?
Thanks, Tom.

Numpy memmap inplace sort of a large matrix by column
I'd like to sort a matrix of shape (N, 2) on the first column, where N >> system memory.

With in-memory numpy you can do:

x = np.array([[2, 10], [1, 20]])
sortix = x[:, 0].argsort()
x = x[sortix]

But that appears to require that x[:, 0].argsort() fit in memory, which won't work for memmap where N >> system memory (please correct me if this assumption is wrong).

Can I achieve this sort in-place with numpy memmap?

(Assume heapsort is used for sorting and simple numeric data types are used.)
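One hedged sketch: view each (key, payload) row as a single structured element, so ndarray.sort can reorder whole rows in place inside the mapped buffer, with kind='heapsort' keeping the extra memory O(1). This is demonstrated on a tiny file with a hypothetical path; whether the OS pages the buffer efficiently when N >> RAM is a separate question from correctness:

```python
import numpy as np
import os, tempfile

fname = os.path.join(tempfile.mkdtemp(), 'big.dat')  # hypothetical path

x = np.memmap(fname, dtype=np.int64, mode='w+', shape=(3, 2))
x[:] = [[2, 10], [1, 20], [3, 30]]

# reinterpret each (key, value) row as one structured record sharing the buffer
v = x.view([('k', x.dtype), ('v', x.dtype)]).reshape(-1)
v.sort(order='k', kind='heapsort')  # in-place: rows move together, no index array

x.flush()
print(x)  # rows now ordered by the first column
```

Because v shares the memmap's buffer, no second N-sized array is allocated; only the heapsort's O(1) workspace is needed in RAM.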