How to bring time series data into the following format for sequence-to-sequence prediction

Use shift in a loop with f-strings:

# Python 3.6+
for i in range(1, 5):
    df[f'demand_{i}'] = df['demand'].shift(-i)  # value i steps ahead

# Python below 3.6
for i in range(1, 5):
    df['demand_{}'.format(i)] = df['demand'].shift(-i)
Sample:
import pandas as pd

df = pd.DataFrame({
    'demand': [4, 7, 8, 3, 5, 0],
})
for i in range(1, 5):
    # shift(-i) pulls the value from i rows later into the current row
    df['demand_{}'.format(i)] = df['demand'].shift(-i)
print(df)

   demand  demand_1  demand_2  demand_3  demand_4
0       4       7.0       8.0       3.0       5.0
1       7       8.0       3.0       5.0       0.0
2       8       3.0       5.0       0.0       NaN
3       3       5.0       0.0       NaN       NaN
4       5       0.0       NaN       NaN       NaN
5       0       NaN       NaN       NaN       NaN
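The loop above adds only future values (the targets). For sequence-to-sequence training one often also wants past values as inputs and drops the incomplete rows at both edges. A minimal sketch, assuming a single series; the function name make_seq2seq_frame, the parameters n_in/n_out and the _t-k/_t+k column suffixes are hypothetical choices, not part of the answer:

```python
import pandas as pd


def make_seq2seq_frame(s, n_in, n_out):
    """Hypothetical helper: n_in past steps plus the current value as inputs,
    n_out future steps as targets; rows with missing values are dropped."""
    cols = {}
    for i in range(n_in, 0, -1):
        cols[f"{s.name}_t-{i}"] = s.shift(i)   # past values (inputs)
    cols[f"{s.name}_t"] = s                    # current value (input)
    for j in range(1, n_out + 1):
        cols[f"{s.name}_t+{j}"] = s.shift(-j)  # future values (targets)
    return pd.DataFrame(cols).dropna()


demand = pd.Series([4, 7, 8, 3, 5, 0], name="demand")
out = make_seq2seq_frame(demand, n_in=2, n_out=2)
print(out)
```

Only rows 2 and 3 of the sample have two values behind and two ahead, so the result has two rows; the input and target columns can then be split by suffix.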
See also questions close to this topic

Set header in python list()
How can I set a header for a Python list?
I'm currently doing it like this:
df = list()
df.append('header1')
Just wondering if there's a better solution...
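A plain Python list has no header slot; appending 'header1' just makes it the first element. Since the variable is named df, a pandas DataFrame may be what is wanted. A minimal sketch with made-up placeholder rows (the header names and values are hypothetical):

```python
import pandas as pd

header = ["header1", "header2"]  # hypothetical column names
rows = [[1, 2], [3, 4]]          # hypothetical data rows

# Option 1: a DataFrame keeps the header separate from the data.
df = pd.DataFrame(rows, columns=header)

# Option 2: stay with plain Python and key each row by the header.
records = [dict(zip(header, row)) for row in rows]
print(df)
print(records)
```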

Better way to Vlookup
I would like to know if there is a better alternative to Vlookup for finding matches between two cells (or two pandas DataFrames).
I want my code to check whether the values in DF1 are in DF2: if a value matches exactly OR partially, return the matching value from DF2.
Just like the values returned in rows 2 and 3 of the 4th column.
Thanks Amigo!
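The screenshot the question refers to is not available, so this sketch assumes "partial match" means one string contains the other, and that an exact match should win over a partial one. The frames, the column names key/name/match and the helper lookup are hypothetical:

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["apple", "ban", "cherry pie", "kiwi"]})  # hypothetical
df2 = pd.DataFrame({"name": ["apple", "banana", "cherry"]})          # hypothetical


def lookup(value, candidates):
    """Return an exact match if there is one, else the first candidate that
    contains value or is contained in it, else None."""
    for cand in candidates:
        if cand == value:
            return cand
    for cand in candidates:
        if value in cand or cand in value:
            return cand
    return None


names = df2["name"].tolist()
df1["match"] = df1["key"].apply(lambda v: lookup(v, names))
print(df1)
```

The nested scan is O(len(df1) * len(df2)); for large frames an exact pd.merge first, with the substring scan only on the leftovers, keeps it cheaper.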

How to have topic directive show up in table of contents?
The topic directive (found here: http://docutils.sourceforge.net/docs/ref/rst/directives.html#topic) is described as a "self-contained section".
How can I make the title of each topic box show up in the table of contents, like a normal section title would?

How to merge pandas dataframe into existing reportlab table?
example_df = [[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]]
I want to integrate the example_df pandas DataFrame into an existing ReportLab table where the number of rows changes (it could be 3, as in the example, or it could be 20):
rlab_table(['Mean','Max','Min','TestA','TestB'], ['','','','',''], ['','','','',''], ['','','','','']])
I have tried:
np.array(example_df).tolist()
but I get this error (AttributeError: 'int' object has no attribute 'wrapOn')
I can add each row to the ReportLab table manually by doing:
rlab_table(['Mean','Max','Min','TestA','TestB'], np.array(example_df).tolist()[0], np.array(example_df).tolist()[1], np.array(example_df).tolist()[2]])
However, the issue is that the number of rows in the dataframe is constantly changing, so I am seeking a solution similar to:
rlab_table(['Mean','Max','Min','TestA','TestB'], np.array(example_df).tolist()[0:X])] #Where X is the number of rows in the dataframe
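Instead of indexing rows one by one, the header row can be concatenated with the full list of data rows, which works for any row count. A minimal sketch; it only builds the list of rows. Passing that list as the data argument of reportlab.platypus.Table is the usual ReportLab pattern, but rlab_table is the asker's own wrapper, so whether it accepts this shape is an assumption:

```python
import pandas as pd

header = ['Mean', 'Max', 'Min', 'TestA', 'TestB']
example_df = pd.DataFrame([[1, 2, 3, 4, 5],
                           [1, 2, 3, 4, 5],
                           [1, 2, 3, 4, 5]], columns=header)

# One header row followed by every data row, however many there are.
table_data = [header] + example_df.values.tolist()
print(table_data)
```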

Pandas python Aggregation and Grouping
I want to show the sum on top of each different type.

Bins - Categorize column values using bins for ages
I have a .csv file, a snippet of which looks like this:
ID,SN, Age,Gender,Item ID,Item Name, Price
0,Lisim78, 20, Male, 108, Extraction Quickblade, 3.53
1,Lisovynya38, 40, Male, 143, Frenzied Scimitar, 1.56
2,Ithergue48, 24, Male, 92, Final Critic, 4.88
3,Chamassasya86, 24, Male, 100, Blindscythe, 3.27
4,Iskosia90, 23, Male, 131, Fury, 1.44
5,Yalae81, 22, Male, 81, Dreamkiss, 3.61
6,Itheria73, 36, Male, 169, Interrogator, 2.18
7,Iskjaskst81, 20, Male, 162, Abyssal Shard, 2.67
8,Undjask33, 22, Male, 21, Souleater, 1.1
9,Chanosian48, 35, Other, 136, Ghastly Adamantite, 3.58
10,Inguron55, 23, Male, 95, Singed Onyx Warscythe, 4.74
I need to establish bins for the 'Age' column which I have done like so:
bins = [0, 10, 15, 20, 25, 30, 35, 40, 45]
names = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']
df_bins = pd.cut(df['Age'], bins, labels=names)
How do I use the bins to categorize other columns, like column 'SN'? I want to get a count of all players in the 'SN' column who are <10, 10-14, 15-19 years old, and so on.
Any help is greatly appreciated!
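pd.cut returns one label per row, so the result can be stored as a new column and used as a groupby key. A minimal sketch using the eleven rows above (only SN and Age). Two choices here are additions, not from the question: right=False, because with the default right=True the intervals are (0, 10], (10, 15], ..., so an age of 10 would land in '<10'; and nunique, which counts distinct players (use .count() instead if every purchase row should count). With these bins, ages of 45 or more fall outside every interval and become NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "SN": ["Lisim78", "Lisovynya38", "Ithergue48", "Chamassasya86", "Iskosia90",
           "Yalae81", "Itheria73", "Iskjaskst81", "Undjask33", "Chanosian48",
           "Inguron55"],
    "Age": [20, 40, 24, 24, 23, 22, 36, 20, 22, 35, 23],
})

bins = [0, 10, 15, 20, 25, 30, 35, 40, 45]
names = ['<10', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40+']

# right=False gives [0, 10), [10, 15), ..., [40, 45), matching the labels.
df["Age Group"] = pd.cut(df["Age"], bins, labels=names, right=False)

# Distinct players per age group; observed=False keeps empty groups as 0.
counts = df.groupby("Age Group", observed=False)["SN"].nunique()
print(counts)
```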

Replace and Iterate over the x- and y-axis of a multidimensional np.matrix
I have written the following script:
import numpy as np

d = 0.05
limits = np.arange(0, 0.5, d)
matrix = np.zeros([limits.shape[0], limits.shape[0]])
matrix[:, 0] = limits
matrix[0, :] = limits
This gives me the following matrix:
How can I make the values in the first column and first row act as the "index" values? So instead of [0, 1, 2, ..., 9] there should be [0, 0.05, ..., 0.45].
Thank you
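NumPy array indices are always the integer positions 0..n-1, so they cannot themselves be 0, 0.05, .... One way to get labelled axes is a pandas DataFrame whose index and columns are the limits. This swaps in pandas, which the question does not use, and the fill rule x * y is a hypothetical placeholder:

```python
import numpy as np
import pandas as pd

d = 0.05
limits = np.arange(0, 0.5, d)

# Labels live in index/columns, separate from the data, so no row or
# column of the data has to be sacrificed to hold them.
grid = pd.DataFrame(np.zeros((len(limits), len(limits))),
                    index=limits, columns=limits)

# Iterate over positions and labels together.
for i, x in enumerate(grid.index):
    for j, y in enumerate(grid.columns):
        grid.iat[i, j] = x * y  # hypothetical fill rule

print(grid.round(4))
```

For a rule like this one, np.outer(limits, limits) fills the same values without a Python loop.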

Concatenate arrays in Python - output error
I am writing a program to concatenate two numpy arrays, and I want it to print a message for each possible outcome (horizontal, vertical, or no concatenation). With the following code, I don't understand why, even when the first condition (np.hstack) is met, it keeps evaluating the subsequent if and else statements and wrongly prints both that there is a horizontal concatenation (the first condition is met) and that concatenation is not possible.
import numpy as np

def fun1(a, b):
    if a.shape[0] == b.shape[0]:
        print("The horizontal concatenation is:", np.hstack((a, b)))
    if a.shape[1] == b.shape[1]:
        print("The vertical concatenation is:", np.vstack((a, b)))
    else:
        print("These arrays cannot be concatenated.")

a = np.floor(10*np.random.random((3, 2)))
b = np.floor(10*np.random.random((3, 4)))
fun1(a, b)
Output:
The horizontal concatenation is: [[5. 0. 1. 1. 3. 7.]
 [4. 1. 8. 3. 1. 9.]
 [9. 1. 5. 7. 0. 0.]]
These arrays cannot be concatenated.
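The two if statements are independent: the second if/else runs no matter what the first one did, and its else fires whenever the column counts differ (2 vs. 4 here). Chaining with elif makes the three branches mutually exclusive. A sketch of that fix; returning the result is an addition so the outcome can be checked, and when both shapes agree the horizontal branch wins:

```python
import numpy as np


def fun1(a, b):
    """Print and return the first concatenation the shapes allow, else None."""
    if a.shape[0] == b.shape[0]:
        result = np.hstack((a, b))
        print("The horizontal concatenation is:", result)
    elif a.shape[1] == b.shape[1]:
        result = np.vstack((a, b))
        print("The vertical concatenation is:", result)
    else:
        result = None
        print("These arrays cannot be concatenated.")
    return result


a = np.floor(10 * np.random.random((3, 2)))
b = np.floor(10 * np.random.random((3, 4)))
fun1(a, b)  # now prints only the horizontal concatenation
```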

How to merge different time series plots into a 2D plot in Python
I have three different time series datasets in the following format, where the first column is a timestamp and the second column is the value.
0.086206438,10
0.086425551,12
0.089227066,20
0.089262508,24
0.089744425,30
0.090036815,40
0.090054172,28
0.090377569,28
0.090514071,28
0.090762872,28
0.090912691,27
For reproducibility, I have shared the three time series datasets I am using here.
From column 2, I want to read the current row and compare it with the value of the previous row. If it is greater, I keep comparing. If the current value is smaller than the previous row's value, I take the difference. For example, in the sample record above, the seventh row (28) is smaller than the sixth row (40), so the difference is 40 - 28 = 12.
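The rule just described (whenever a value falls, record previous minus current) maps onto np.diff plus a boolean mask. A minimal sketch on the sample record; the variable names are illustrative, and the time delta kept alongside each drop is an addition that the plotting question needs:

```python
import numpy as np

# Sample record from the question: timestamps (column 1) and values (column 2).
t = np.array([0.086206438, 0.086425551, 0.089227066, 0.089262508,
              0.089744425, 0.090036815, 0.090054172, 0.090377569,
              0.090514071, 0.090762872, 0.090912691])
v = np.array([10, 12, 20, 24, 30, 40, 28, 28, 28, 28, 27])

dv = np.diff(v)     # current value minus previous value
dt = np.diff(t)     # current timestamp minus previous timestamp
falls = dv < 0      # True where the value is smaller than the previous one

drops = -dv[falls]  # previous minus current, e.g. 40 - 28 = 12
drop_dt = dt[falls] # time elapsed over each drop
print(drops)
```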
Here is my sample code.
import csv

import numpy as np
import pandas as pd
import scipy.stats
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
from statsmodels.graphics.tsaplots import plot_acf, acf

protocols = {}
types = {"data1": "data1.csv", "data2": "data2.csv", "data3": "data3.csv"}

for protname, fname in types.items():
    arr = []   # values (column 2)
    arr1 = []  # timestamps (column 1)
    with open(fname, mode='r', encoding='utf-8-sig') as f:
        reader = csv.reader(f, delimiter=",")
        for i in reader:
            arr.append(int(i[1]))
            arr1.append(float(i[0]))
    arr, arr1 = np.array(arr), np.array(arr1)
    diffs = np.diff(arr)
    diffs1 = np.diff(arr1)
    diffs1 = diffs1[diffs > 0]
    diffs = diffs[diffs > 0]  # To keep only the increased values
    protocols[protname] = {
        "rtime": np.array(arr1),
        "rdata": np.array(arr),
        "data": diffs,
        "timediff": diffs1,  # time deltas for the kept rows
    }

## change in time
for protname, values in protocols.items():
    d = values["rdata"]
    t = values["rtime"]
    d = np.diff(d, 1)  #/ d[:1]
    t = np.diff(t, 1)
    plt.plot(t, d, ".", label=protname, alpha=0.5)

plt.xlabel("Changes in time")
plt.ylabel("differences")
plt.legend()
plt.show()
This gives me the following plots
How can I plot the differences versus the change in time (column one) in a two-dimensional (2D) graph for each of the three datasets separately?
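One way to get a separate 2D panel per dataset is plt.subplots with one axes per entry, plotting on each axes instead of the shared pyplot state. A minimal sketch, assuming a dict shaped like the question's protocols (keys "rtime" and "rdata"); the helper name plot_separately is hypothetical, and the demo uses the first four points of the sample record in place of the shared CSV files:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs anywhere; remove to get a window
import matplotlib.pyplot as plt


def plot_separately(protocols):
    """Hypothetical helper: one scatter panel per dataset,
    value differences against time differences."""
    n = len(protocols)
    fig, axes = plt.subplots(1, n, figsize=(4 * n, 3.5), squeeze=False)
    for ax, (protname, values) in zip(axes[0], protocols.items()):
        d = np.diff(values["rdata"])  # change in value between consecutive rows
        t = np.diff(values["rtime"])  # change in timestamp between consecutive rows
        ax.plot(t, d, ".", alpha=0.5)
        ax.set_title(protname)
        ax.set_xlabel("Change in time")
        ax.set_ylabel("Difference")
    fig.tight_layout()
    return fig


demo = {
    "data1": {
        "rtime": np.array([0.086206438, 0.086425551, 0.089227066, 0.089262508]),
        "rdata": np.array([10, 12, 20, 24]),
    },
}
fig = plot_separately(demo)
```

With the question's script, plot_separately(protocols) followed by plt.show() (with an interactive backend) would give three side-by-side panels; squeeze=False keeps axes two-dimensional even for a single dataset.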