Slicing overlapping sets of features from 2D numpy array into 3D numpy array
I am confronted with the following problem: I have a 2D numpy array of shape (total_samples, n_features). Think of total_samples as a time series, i.e. features recorded at every time step. I would like to reshape this 2D array into a 3D array of shape (n_batch, n_samples, n_features) to be used as input for a "many to one" long short-term memory (LSTM) neural network (NN). I would like to be able to specify n_samples and percentage_overlap between batches as inputs to a Python function that figures out how many batches or strides are possible from the available 2D array, given the desired percentage_overlap and n_samples.
    # desired signature: takes the 2D shape, window length and overlap,
    # and produces chunks of the 3D shape
    def create_chunks((total_samples, n_features), n_samples, overlap=.5):
        return (n_batch, n_samples, n_features)
Numerically, let’s say for a single feature and 4 samples:
[[1], [2], [3], [4]]
using n_samples=2 and percentage_overlap=.5 should output
[[[1], [2]], [[2], [3]], [[3], [4]]].
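A minimal sketch of such a function (my own illustration; it copies each window with a Python loop rather than using stride tricks, and drops any trailing samples that do not fill a whole window):

```python
import numpy as np

def create_chunks(data, n_samples, overlap=0.5):
    """Slice a (total_samples, n_features) array into overlapping windows.

    Returns an array of shape (n_batch, n_samples, n_features), where
    consecutive windows share roughly overlap * n_samples rows.
    """
    step = max(1, int(round(n_samples * (1 - overlap))))  # stride between window starts
    total_samples = data.shape[0]
    n_batch = (total_samples - n_samples) // step + 1
    return np.stack([data[i * step : i * step + n_samples] for i in range(n_batch)])

data = np.array([[1], [2], [3], [4]])
create_chunks(data, n_samples=2, overlap=0.5)
# -> [[[1], [2]], [[2], [3]], [[3], [4]]]
```

For large arrays and a step of 1, numpy.lib.stride_tricks.sliding_window_view (NumPy 1.20+) can produce the windows as views instead of copies.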
See also questions close to this topic

Multiple decorators wrapping a generator function in Python 3.7: all possible coin combinations problem
I want to write a program that shows all the possible ways a combination of different coins (5c, 10c, 50c, etc.) can amount to a certain value (for instance, $4). Let's say, for starters, I only want to consider 50c, $1 and $2 in the possible combinations. If I want to use brute force, I could write 3 nested while loops and return the combination of currency every time it sums up to the value I want (in this case, $4). However, as we consider more currency options (for instance, 1c, 5c, 10c, 25c, 50c, $1, $2, $5, $10, $50, $100) and bigger sums (let's say I want all combinations to get $400), the code becomes very verbose and difficult to maintain. If I want to make a generic algorithm that works for any country and combination of currencies, I can't rely on simple loop nesting. Given this problem, I've tried using decorators so I can nest the loops in a single function:

    INTENDED_SUM = 400

    coin_count = {
        200: 0,
        100: 0,
        50: 0,
    }

    def max_range(value):
        return INTENDED_SUM / value

    def coin_loop(coin_value):
        def decorator(func):
            while coin_count[coin_value] <= max_range(coin_value):
                yield from func()
                coin_count[coin_value] += 1
            else:
                coin_count[coin_value] = 0
        return decorator

    @coin_loop(100)
    def get_combinations():
        agg = sum([i * coin_count[i] for i in coin_count])
        if agg == INTENDED_SUM:
            yield coin_count

    for i in get_combinations:
        print(i)
This is the output:
{200: 0, 100: 4, 50: 0}
However, if I add multiple decorators to the get_combinations() function, a TypeError is raised:

    @coin_loop(200)
    @coin_loop(100)
    @coin_loop(50)
    def get_combinations():
        agg = sum([i * coin_count[i] for i in coin_count])
        if agg == INTENDED_SUM:
            yield coin_count

    Traceback (most recent call last):
      File "31coin_sums.py", line 29, in <module>
        for i in get_combinations:
      File "31coin_sums.py", line 15, in decorator
        yield from func()
    TypeError: 'generator' object is not callable
So I have two questions:
Why does get_combinations already show up as a generator, so that it does not need to be called in the last 2 lines of the code? Shouldn't they be like this?
    for i in get_combinations():
        print(i)
Why isn't decorator nesting working in this case? Expected output should be something like:
    {200: 2, 100: 0, 50: 0}
    {200: 1, 100: 2, 50: 0}
    {200: 1, 100: 1, 50: 2}
    {200: 1, 100: 0, 50: 4}
    {200: 0, 100: 4, 50: 0}
    {200: 0, 100: 3, 50: 2}
    {200: 0, 100: 2, 50: 4}
    {200: 0, 100: 1, 50: 6}
    {200: 0, 100: 0, 50: 8}
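For reference, a plain recursive generator (my own sketch, without decorators) enumerates exactly those combinations; counting each coin from its highest feasible count down matches the order of the expected output:

```python
def coin_combinations(coins, target):
    """Yield dicts mapping coin value -> count, for counts summing to target."""
    if not coins:
        if target == 0:
            yield {}
        return
    coin, rest = coins[0], coins[1:]
    # Try the highest count of this coin first, then work downwards
    for count in range(target // coin, -1, -1):
        for combo in coin_combinations(rest, target - coin * count):
            yield {coin: count, **combo}

for combo in coin_combinations([200, 100, 50], 400):
    print(combo)
```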

How to change timestamp to int in array
I'm trying to change timestamp datatype into int in array, for example,
    array([Timestamp('2011-12-29 00:00:00'), Timestamp('2013-12-12 00:00:00'),
           Timestamp('2014-01-09 00:00:00'), Timestamp('2014-01-29 00:00:00')])
to
array([20111229, 20131212, 20140109, 20140129])
How can I do this?
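Assuming these are pandas Timestamp objects, one straightforward sketch formats each one as YYYYMMDD and converts the string to int:

```python
import numpy as np
import pandas as pd

ts = np.array([pd.Timestamp('2011-12-29'), pd.Timestamp('2013-12-12'),
               pd.Timestamp('2014-01-09'), pd.Timestamp('2014-01-29')])

# Format each Timestamp as YYYYMMDD, then convert the string to int
as_int = np.array([int(t.strftime('%Y%m%d')) for t in ts])
# -> array([20111229, 20131212, 20140109, 20140129])
```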

Creating a dataset in tensorflow by adding random elements of two datasets
I have two sets of images, say set A and set B. These are available as numpy arrays. I will choose x random images from A and y random images from B and then add them together (taking the average at the end). These new images will become the input to a CNN.
Now my question is how can I do this using Tensorflow's data pipeline approach?
I can obviously do it outside of TensorFlow using numpy, but this pre-created data would be too big. I could also use feed_dicts, but I wanted to know if there's a way to use pipelines to achieve this, because that seems to be the favoured approach now.

Which python module contains file object methods?
While it is simple to search using help for most methods that have a clear help(module.method) arrangement, for example help(list.extend), I cannot work out how to look up the method .readline() in Python's built-in help function. Which module does .readline belong to? How would I search in help for .readline and related methods? Furthermore, is there any way I can use the interpreter to find out which module a method belongs to in future?
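One way to find out (a sketch; the throwaway temp-file path is just for illustration): file objects are instances of classes defined in the io module, so asking for the object's type reveals where the method lives:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "readline_demo.txt")  # throwaway file
with open(path, "w") as out:
    out.write("first\nsecond\n")

with open(path) as f:
    print(type(f))                   # <class '_io.TextIOWrapper'>
    print(isinstance(f, io.IOBase))  # True -- file objects come from the io module
    print(f.readline())              # first

os.remove(path)

# So the method can be looked up with either of:
# help(io.TextIOWrapper.readline)
# help(io.IOBase.readline)
```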

find a string inside another string
I am trying to write a function to find a string in a message but I'm having trouble setting up the indexes. I want my function to find all the occurrences of a string in a given message and print them. Any help is highly appreciated. Thank you!
Code:
    def find_substr(mssg, search):
        searchstr = str(search)
        mssgstr = str(mssg)
        ix = 0
        a = []
        found = False
        while ix < len(mssgstr):
            # Condition to loop
            if searchstr == mssgstr[]:
                a.append(c)
                found = True
            ix = ix + 1
        if found and c == len(mssgstr):
            for i in a:
                print(i)
        else:
            print(1)

    messag = " man man women women man"
    find_substr(messag, "man")
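For comparison, a working sketch using str.find with a moving start offset (one possible approach, not the only one):

```python
def find_all(mssg, search):
    """Return the start index of every occurrence of search in mssg."""
    hits = []
    ix = mssg.find(search)
    while ix != -1:
        hits.append(ix)
        ix = mssg.find(search, ix + 1)  # resume just past the last hit
    return hits

find_all(" man man women women man", "man")  # [1, 5, 21]
```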

passing value to np.zeros()
I'm learning how to add two matrices using a function, initializing the result with np.zeros(). But somehow it doesn't work at all; maybe I forgot something? Can anyone help me?

    import numpy as np

    def addMatrix(a : np.ndarray, x : np.array) -> np.ndarray:
        n, n_b = a.shape
        m, m_x = x.shape
        # Initialize result matrix with zeros
        result = np.zeros(n, m_x)  # **I need to use this**, but this is the problem here
        for i in range(len(a)):
            for j in range(len(a[0])):
                result[i:j] = a[i:j] + x[i:j]  # -> this is the big Problem
        return b

    c = np.array([[1,2,3], [2,4,6], [3,2,1]])
    d = np.array([[5,8,1], [6,7,3], [4,5,9]])
    addMatrix(c, d)
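Three things appear to go wrong in the code: np.zeros takes the shape as a single tuple, np.zeros((n, m)), not as separate arguments; result[i:j] is a slice, where element access needs result[i, j]; and the function returns the undefined name b. A corrected sketch:

```python
import numpy as np

def addMatrix(a: np.ndarray, x: np.ndarray) -> np.ndarray:
    n, m = a.shape
    result = np.zeros((n, m))      # shape goes in as one tuple
    for i in range(n):
        for j in range(m):
            result[i, j] = a[i, j] + x[i, j]   # comma-indexing, not slicing
    return result

c = np.array([[1, 2, 3], [2, 4, 6], [3, 2, 1]])
d = np.array([[5, 8, 1], [6, 7, 3], [4, 5, 9]])
addMatrix(c, d)  # same result as c + d
```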

Picking data randomly without a repeat in the index and create a new list out of it
My program needs to pick values randomly without repeating them. After that, the program will assign them random variables.
Assume this is the data:
    0     770000.000
    1     529400.000
    2     780000.000
    3     731300.000
    4     935000.000
    5     440000.000
    6     634120.000
    7     980000.000
    8     600000.000
    9     770000.000
    10    600000.000
    11    536613.000
    12    660000.000
    13    850000.000
    14    563600.000
    15    985000.000
    16    600000.000
    17    770000.000
    18    957032.000
    19    252000.000
    20    397000.000
    21    218750.000
    22    785578.000
As you can see, the data contains repeated numbers at indexes 0, 9, and 17. These numbers must not be ignored, as the indexes are different. I could not find any way to solve my problem. I made many attempts, like using data.iloc[0], but I received this error:

    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Or, in my other attempts, the data was reduced as the program excluded some similar data.
In my first attempt, I used the following code
    Col_list = []

    def Grab(repeat):
        for x in range(FixedRange):
            letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
            Three = [random.choice(letters) + \
                     random.choice(letters) + \
                     random.choice(letters)]
            A_Slice = random.randint(1, Total_Range_of_Data)
            [Col_list.append(data[A_Slice:A_Slice + 200]),
             Col_list.append(Three*len(data[A_Slice:A_Slice + 200]))]

    Col_list1 = pd.DataFrame(Col_list).T
    Col_listFinal = Col_list1
    Grab(0)
and the output will give something like
    ..   ...          ...  ...          ...       ...          ...  ...     ...
    190  1.06934e+06  kCn  3.46638e+06  EmV  ...  514564       LLl  450000  hfX
    191  250000       kCn  1.37e+06     EmV  ...  1.00430e+06  LLl  468305  hfX
    192  741088       kCn  1.25e+06     EmV  ...  312032       LLl  520000  hfX
    193  427500       kCn  726700       EmV  ...  1.0204e+06   LLl  495750  hfX
    194  969600       kCn  853388       EmV  ...  139300       LLl  530000  hfX
    195  388556       kCn  1.21e+06     EmV  ...  437500       LLl  598520  hfX
    196  2.045e+06    kCn  1.53636e+06  EmV  ...  547835       LLl  538250  hfX
    197  435008       kCn  752700       EmV  ...  712400       LLl  326000  hfX
    198  6.15566e+06  kCn  1.56282e+06  EmV  ...  1.385e+06    LLl  480000  hfX
    199  551650       kCn  1.222e+06    EmV  ...  771512       LLl  495750  hfX
But this is not helpful, as it is random and it may take some values more than once. Any suggestion to solve the problem?
By the way, the desired output should be something similar to the one above, but without duplicates.
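One way to guarantee no repeats (a sketch with made-up data; the 3-letter labels mirror the question's Three variable): sample positions rather than values, with replace=False, so equal values at different indexes each remain eligible exactly once:

```python
import random
import string

import numpy as np

data = [770000.0, 529400.0, 780000.0, 731300.0, 935000.0,
        440000.0, 634120.0, 980000.0, 600000.0, 770000.0]

rng = np.random.default_rng()
picked = rng.choice(len(data), size=4, replace=False)   # distinct positions, never reused

# Attach a random 3-letter label to each picked value
pairs = [(data[i], ''.join(random.choices(string.ascii_letters, k=3)))
         for i in picked]
```

Because the sampling is over index positions, the duplicate 770000.0 entries can each be drawn at most once, but none of them is excluded up front.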

Pandas Grouping: Row counts of groups within a group
This is the first time I've asked a question here, so let me know if more info is needed 
I currently have a pandas df that is grouped by three columns:

    # Group by employee, week-ending date and calendar date; sum the quantity of hours on each calendar date
    empHoursSum = df.groupby(['Employee ID', 'Week Ending', 'Calendar Date'])['Quantity'].sum().to_frame('Quantity')
This gives me an Employee ID with buckets for Week Ending (the date the calendar work week ends) and Calendar Date, with each date's associated hours summed. What I want to see is a running count for each Calendar Date within a Week Ending group. For instance, if someone worked 6 days in a work week there would be 6 rows of dates. I would like to see a column with a 1 on the first entry, a 2 on the second entry, and so on.

Using a Custom Dual Input Keras Layer
I am attempting to create a simple multilayer perceptron in Keras. The general structure I would like to create is one where a matrix A of dimension [n_a1, n_a2] is sent through a number of layers of a multilayer perceptron; at a certain point, the dot product of the morphed A matrix is taken with a randomly selected y vector [n_y, 1] from a set of y vectors, and the result then continues through several more layers before reaching the end, where it is compared with the input labels as normal.
Unfortunately, I am having issues figuring out how to implement this properly. I created a custom multi-input layer per the simple example offered at https://keras.io/layers/writing-your-own-keras-layers/, but it seems that I still can't have multiple inputs in the network. I am getting an error about setting an array element with a sequence:

    ValueError: setting an array element with a sequence.

Also, it's unclear to me how the network would know what to do with the list-formatted inputs for the other layers. Do I need to specify the list shape for each layer, and somehow only use the A in the [A, y] list?
Custom Layer
    class DualLayer(Layer):
        def __init__(self, output_dim, **kwargs):
            self.output_dim = output_dim
            super(DualLayer, self).__init__(**kwargs)

        def build(self, input_shape):
            # Trainable weight variable for layer
            self.kernel = self.add_weight(name='kernel',
                                          shape=(input_shape[1], self.output_dim),
                                          initializer='uniform',
                                          trainable=True)
            super(DualLayer, self).build(input_shape)

        def call(self, x):
            aOpt, y = x
            return [K.dot(aOpt, y)]

        def compute_outpute_shape(self, input_shape):
            assert isinstance(input_shape, list)
            shape_y, shape_aOpt = input_shape
            return [shape_y[0]]
Model
    def modFunc(y1, A1, y2, A2, xSim):
        model = Sequential()
        model.add(Flatten(input_shape = np.shape(A1)))
        model.add(Dense(np.shape(A1)[0]*np.shape(A1)[1],
                        activation = 'relu',
                        kernel_regularizer = 'l1',
                        activity_regularizer = 'l1'))
        model.add(DualLayer([y1, A1]))
        model.add(Dense(5, activation = 'relu'))
        clf = model.fit([y1, A1], xSim, epochs=5, batch_size=1, verbose=2)
        return clf.coef_

Training speed on a shallow neural network with a small dataset
My data consists of 1 feature and a label per feature, i.e.

    ["smallBigTest", "toastBob"]  <- features; 4 labels: ["mix", "small", "big", "medium"]

I have converted my features to numbers based on the alphabet, i.e.

    smallBigTest -> 18, 12, 0, 53, 53, 27, 8, 6, 45, 4, 18, 19
    toastBob     -> 19, 14, 0, 18, 19, 27, 14, 1, 1, 1, 1, 1

which I later one-hot encoded and reshaped, so the final array of features looks like

    [[hotencoded(18, 12, 0, 53, 53, 27, 8, 6, 45, 4, 18, 19)],
     [hotencoded(19, 14, 0, 18, 19, 27, 14, 1, 1, 1, 1, 1)]]

I simply made it into a 2D array from a 3D array to match my labels' shape; I have also one-hot encoded the labels. The training data is about 60k lines of text, a 1.2 MB csv file.
and here is my model:
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Dense(16, activation=tf.nn.sigmoid))
    model.add(tf.keras.layers.Dense(labelsDictSize, activation=tf.nn.softmax))

    optimizer = tf.train.GradientDescentOptimizer(0.05)
    model.compile(optimizer, loss=tf.losses.softmax_cross_entropy)
    model.fit(featuresOneHot, labelsOneHot, steps_per_epoch=dataCount, epochs=5, verbose=1)
I'm new to ML, so I might be doing something completely wrong or completely stupid, but I thought this amount of data would be fine. Training on my machine with a GTX 870M takes an hour per epoch, and on Google Colaboratory around 20-30 minutes per epoch.

TensorFlow Keras Error: expected flatten_input to have shape
I am using TensorFlow Keras to create a machine learning model for a set of data. I have been able to get the model to compile, however when I try to run a prediction, I get the following error:
    prediction = model.predict(test)
      File "C:\Users\mbaker\PycharmProjects\UnileverPatentWatch\venv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1864, in predict
        x, check_steps=True, steps_name='steps', steps=steps)
      File "C:\Users\mbaker\PycharmProjects\UnileverPatentWatch\venv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 992, in _standardize_user_data
        class_weight, batch_size)
      File "C:\Users\mbaker\PycharmProjects\UnileverPatentWatch\venv\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1117, in _standardize_weights
        exception_prefix='input')
      File "C:\Users\mbaker\PycharmProjects\UnileverPatentWatch\venv\lib\site-packages\tensorflow\python\keras\engine\training_utils.py", line 332, in standardize_input_data
        ' but got array with shape ' + str(data_shape))
    ValueError: Error when checking input: expected flatten_input to have shape (1872,) but got array with shape (1,)
My ML code is:
    train_features = np.array(train_features)

    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(len(train_features[0]),)),
        keras.layers.Dense(512, activation=tf.nn.relu),
        keras.layers.Dense(128, activation=tf.nn.relu),
        keras.layers.Dense(7, activation=tf.nn.softmax)
    ])
    model.compile(optimizer=tf.train.AdamOptimizer(),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(train_features, train_labels, epochs=10)

    x = 0
    for test in test_features:
        test = np.array(test)
        print(test)
        print(len(test))
        prediction = model.predict(test)
        print(test_patents[x], prediction, labelIndex[prediction])
        x += 1
test_features is a list of feature sets for which I want a prediction. I have tried alternately passing the feature set as a numpy array as seen above or as a list and both methods have resulted in the above error. The data is being passed to the predict function in the same format as was provided to train the model. Any help would be greatly appreciated.
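The error suggests model.predict is receiving a single 1-D feature vector where it expects a batch of such vectors; adding a batch dimension with reshape(1, -1) is one likely fix (a numpy-only sketch, using the 1872-feature size from the traceback):

```python
import numpy as np

test = np.arange(1872, dtype=np.float32)   # one sample's features, shape (1872,)
batch = test.reshape(1, -1)                # a batch of one sample, shape (1, 1872)
print(batch.shape)                         # (1, 1872) -- what predict expects
# prediction = model.predict(batch)        # then prediction[0] is this sample's output
```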