Correlation (pandas) between a date and an integer time series?
Suppose my data is in the form of
date  price
20170909  13000
20170908  20000
20170907  15000
20170906  13000
20170905  15000
How do I find the correlation between price and time? df.corr() ignores the date column.
1 answer

Convert the date column to datetime, then to numeric; after that corr picks it up:

df.date = pd.to_datetime(df.date)
df.date = pd.to_numeric(df.date)
df.corr()

Out[306]:
           date     price
date   1.000000  0.165647
price  0.165647  1.000000
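The conversion can be reproduced end to end on the sample data above. A minimal sketch; newer pandas versions may warn on pd.to_numeric over datetimes, so this uses astype('int64') to get the nanosecond integers instead:

```python
import pandas as pd

# the sample data from the question
df = pd.DataFrame({
    "date": ["20170909", "20170908", "20170907", "20170906", "20170905"],
    "price": [13000, 20000, 15000, 13000, 15000],
})

# parse the yyyymmdd strings, then view the datetimes as int64 nanoseconds
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d").astype("int64")

# Pearson correlation between time and price
corr = df["date"].corr(df["price"])
```

Because Pearson correlation is invariant under linear rescaling, it does not matter whether the dates become nanoseconds or day numbers; the result matches the 0.165647 shown in the answer.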
See also questions close to this topic

Implement insert and pop methods for a linked list
I have to create two classes (Node, LinkedList) that add, remove, etc. from a linked list. Everything else works, but I am struggling with the insert() and pop() methods. The instructions for what I am trying to do are: insert(after, item) adds a new Node with value=item to the list after the Node with value=after; it takes the value of the existing node and the new value to be inserted, and returns nothing. pop() removes and returns the value of the last Node in the list; it takes no parameters, modifies the list, and returns an item.
This is what I have for the classes + the methods:
class Node:
    def __init__(self, value, after):
        self.value = value
        self.next = None
        self.after = after
    def getValue(self):
        return self.value
    def getNext(self):
        return self.next
    def setValue(self, new_value):
        self.value = new_value
    def setNext(self, new_next):
        self.next = new_next
    def __str__(self):
        return ("{}".format(self.value))
    __repr__ = __str__

class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.count = 0
    def insert(self, after, item):
        if after == 0:
            self.add(item)
        elif item > self.size():
            print("Position index is out of range")
        elif item == self.size():
            self.append(item)
        else:
            temp = Node.Node(item, after)
            current = self.head
            prev = None
            current_after = 0
            while current_after != after:
                prev = current
                current = current.next
                current_after += 1
            prev.next = temp
            temp.next = current
    def pop(self):
        prev = None
        node = self.head
        i = 0
        while (node != None) and (i < index):
            prev = node
            node = node.next
            i += 1
        if prev == None:
            self.head = node.next
        else:
            prev.next = node.next
        return node.next
    def append(self, value):
        if self.head == None:
            new_node = Node(value)
            self.head = new_node
            self.tail = self.head
        elif self.tail == self.head:
            self.tail = Node(value)
            self.head.setNext(self.tail)
        else:
            new_node = Node(value)
            self.tail.setNext(new_node)
            self.tail = new_node
        self.count += 1
    def remove(self, value):
        current = self.head
        previous = None
        found = False
        while not found:
            if current.getValue() == value:
                found = True
            else:
                previous = current
                current = current.getNext()
        if previous == None:
            self.head = current.getNext()
        elif current.getNext() == None:
            self.tail = previous
            previous.setNext(None)
        else:
            previous.setNext(current.getNext())
        self.count -= 1
    def isEmpty(self):
        return self.head == None
    def size(self):
        return self.count
    def add(self, value):
        new_node = Node(value)
        new_node.setNext(self.head)
        self.head = new_node
        self.count += 1
        if self.size() == 1:
            self.tail = new_node
        self.printList()
    def search(self, value):
        current = self.head
        found = False
        while current != None and not found:
            if current.getValue() == value:
                return True
            else:
                current = current.getNext()
        return found
    def printList(self):
        temp = self.head
        while temp:
            print(temp.value, end=' ')
            temp = temp.next
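Per the spec quoted above, insert(after, item) should splice by value, not by position, and pop() should drop the tail. A minimal working sketch with simplified classes (not the exact ones in the question; a to_list helper is added just for checking):

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def append(self, value):
        # helper to build a list for testing
        node = Node(value)
        if self.head is None:
            self.head = node
            return
        cur = self.head
        while cur.next is not None:
            cur = cur.next
        cur.next = node

    def insert(self, after, item):
        # walk to the first node whose value equals `after`, splice item in after it
        cur = self.head
        while cur is not None and cur.value != after:
            cur = cur.next
        if cur is None:
            raise ValueError("{!r} not found".format(after))
        node = Node(item)
        node.next = cur.next
        cur.next = node

    def pop(self):
        # remove and return the value of the last node
        if self.head is None:
            raise IndexError("pop from empty list")
        prev, cur = None, self.head
        while cur.next is not None:
            prev, cur = cur, cur.next
        if prev is None:
            self.head = None
        else:
            prev.next = None
        return cur.value

    def to_list(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.value)
            cur = cur.next
        return out
```

The pop() in the question fails because it references an undefined index and returns node.next instead of the removed node's value; walking to the tail as above avoids both problems.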

Divide each value in list array
I am trying to divide each array value in the list by 80. What I have tried is:
dfs = pd.read_excel('ff1.xlsx', sheet_name=None)
dfs1 = {i: x.groupby(pd.to_datetime(x['date']).dt.strftime('%Y%m%d'))['duration'].sum()
        for i, x in dfs.items()}
d = pd.concat(dfs1).groupby(level=1).apply(list).to_dict()
print(d)
Output:
{'20170506': [197, 250], '20170507': [188, 80], '20170508': [138, 138], '20170509': [216, 222], '20170609': [6]}
But the expected output is:

1: Divide by 80
{'20170506': [2, 3], '20170507': [2, 1], '20170508': [2, 2], '20170509': [2, 2], '20170609': [0]}

2: Take the total of each array and subtract each value from it (3 + 2 = 5, then 5 - 3 and 5 - 2)
{'20170506': [3, 2], '20170507': [1, 2], '20170508': [2, 2], '20170509': [2, 2], '20170609': [0]}
How to do this using python?
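Both steps are plain dict comprehensions over the dictionary shown. A sketch using floor division; the expected output above is ambiguous about rounding, so swap in round(v / 80) if that matches better:

```python
d = {'20170506': [197, 250], '20170507': [188, 80],
     '20170508': [138, 138], '20170509': [216, 222], '20170609': [6]}

# step 1: divide every value by 80 (floor division here)
step1 = {k: [v // 80 for v in vals] for k, vals in d.items()}

# step 2: subtract each value from the total of its own list
step2 = {k: [sum(vals) - v for v in vals] for k, vals in step1.items()}
```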

Export list of data frames to CSVs in python
I've got a list of data frames I'm trying to perform a function on and export the results. The function spits out a result, and I then want to turn the results into a data frame and export to a .CSV. Here's what I currently have:
for df, filename in zip(df_list, filename_list):
    function(df)
    results_df = pd.DataFrame(function_results)
    results_df.to_csv(filename)
The error occurs when I try to export the .csv. If I just run the loop with the function and print results to the console like so:
for df in df_list:
    function(df)
It works fine. When I try to loop the .csv export, though, I get

AttributeError: 'list' object has no attribute 'close'
Any ideas?
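That AttributeError usually means to_csv received a list where it expected a single filename string. A sketch with the names from the question, assuming (hypothetically, since the real function isn't shown) that function returns its results rather than storing them in a global:

```python
import os
import tempfile
import pandas as pd

def function(df):
    # hypothetical stand-in for the real analysis: return the results
    return df.describe()

df_list = [pd.DataFrame({"a": [1, 2, 3]}), pd.DataFrame({"a": [4, 5, 6]})]

outdir = tempfile.mkdtemp()
filename_list = [os.path.join(outdir, "one.csv"), os.path.join(outdir, "two.csv")]

for df, filename in zip(df_list, filename_list):
    results_df = pd.DataFrame(function(df))
    results_df.to_csv(filename)  # filename must be one path string, not a list
```

If filename_list is accidentally a list of lists, to_csv treats the list as a file-like object and fails when pandas tries to close it, which matches the error shown.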

In pandas, my 2 data frames are as shown below; how do I get my expected data frame?
df1
c1  c2
1   1
2   2
4   2
df2
c3
5
6
7
8
The expected data frame should be like this (blank = null):

Result df4

c1  c2  c3
1   1   5
        6
        7
        8
2   2   5
        6
        7
        8
4   2   5
        6
        7
        8

I am concatenating the 2 data frames.
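This looks like a cross join of df1 with df2, with the repeated c1/c2 values blanked out after the first row of each group. A sketch (merge with how='cross' needs pandas >= 1.2):

```python
import pandas as pd

df1 = pd.DataFrame({"c1": [1, 2, 4], "c2": [1, 2, 2]})
df2 = pd.DataFrame({"c3": [5, 6, 7, 8]})

# every df1 row paired with every c3 value: 3 x 4 = 12 rows
df4 = df1.merge(df2, how="cross")

# use nullable integers so the blanks can hold NA, then blank out the
# repeated c1/c2 values, keeping them only on the first row of each group
df4[["c1", "c2"]] = df4[["c1", "c2"]].astype("Int64")
repeats = df4.duplicated(subset=["c1", "c2"])
df4.loc[repeats, ["c1", "c2"]] = pd.NA
```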

R-squared in Fama-MacBeth using rolling window
I am trying to run a Fama-MacBeth regression on some tradable factors using a 5-year rolling window updated monthly. However, I am a little confused when calculating the final R-squared of the model. I am thinking about two ways to deal with it:
For each rolling window, I have one R-squared. To calculate the final R-squared of the model, I just take the average of the R-squareds across all rolling windows (just like we do with lambda) >> I get a pretty good R-squared (around 70%-80%).
After extracting the final lambda for each factor, I use the R-squared formula to calculate the final R-squared >> I get a very bad R-squared (negative). In this case, the dependent variables are the average returns of each portfolio and the independent variables are the betas corresponding to the factors and portfolios.
So how is the final R-squared usually calculated?
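The second approach reduces to a plain cross-sectional R-squared with the averaged lambdas held fixed, and it can legitimately go negative whenever those lambdas predict the average returns worse than a constant would. A minimal numpy sketch with entirely hypothetical numbers:

```python
import numpy as np

# hypothetical betas (4 portfolios, 1 factor) and average portfolio returns
betas = np.array([[1.0], [0.8], [1.2], [0.5]])
avg_ret = np.array([0.05, 0.04, 0.06, 0.02])

# hypothetical averaged lambda from the rolling windows
lam = np.array([0.03])

pred = betas @ lam
ss_res = ((avg_ret - pred) ** 2).sum()
ss_tot = ((avg_ret - avg_ret.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot  # negative for these numbers
```

Averaging per-window R-squareds (approach 1) answers a different question, namely how well each window's own fitted lambdas explain that window, which is why the two numbers can disagree so sharply.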

I have data retrieved from an LRS and would like to plot some graphs with it. How do I do it?
Array (
  [0] => Array ( [Name] => Aishwarya Ravichandran [Email] => mailto:aishR@gmail.com [Verb] => http://adlnet.gov/expapi/verbs/completed [Activity] => https://app.acuizen.com/populate_form/965/1573/4690 )
  [1] => Array ( [Name] => Akshaya Manikandan [Email] => mailto:aksh.m14@gmail.com [Verb] => http://adlnet.gov/expapi/verbs/skipped [Activity] => https://app.acuizen.com/populate_form/965/1573/4300 )
  [2] => Array ( [Name] => Akshaya Manikandan [Email] => mailto:aksh.m14@gmail.com [Verb] => http://adlnet.gov/expapi/verbs/skipped [Activity] => https://app.acuizen.com/populate_form/965/1573/4690 )
  [3] => Array ( [Name] => Aishwarya Ravichandran [Email] => mailto:aishR@gmail.com [Verb] => http://adlnet.gov/expapi/verbs/completed [Activity] => https://app.acuizen.com/populate_form/965/1573/4690 )
  [4] => Array ( [Name] => Aishwarya Chandrashekar [Email] => mailto:aishuc@gmail.com [Verb] => http://adlnet.gov/expapi/verbs/completed [Activity] => https://app.acuizen.com/populate_form/965/1573/4690 )
  [5] => Array ( [Name] => Shreenidhi Rajendran [Email] => mailto:shree.r@gmail.com [Verb] => http://adlnet.gov/expapi/verbs/skipped [Activity] => https://app.acuizen.com/populate_form/965/1573/4690 )
  [6] => Array ( [Name] => Akshaii Narayanan [Email] => mailto:aksh.n07@gmail.com [Verb] => http://adlnet.gov/expapi/verbs/paused [Activity] => https://app.acuizen.com/populate_form/965/1573/4692 )
  [7] => Array ( [Name] => Akshaay [Email] => mailto:axe13@yahoo.com [Verb] => http://adlnet.gov/expapi/verbs/paused [Activity] => https://app.acuizen.com/populate_form/965/1573/4697 )
  [8] => Array ( [Name] => Vasudevan S [Email] => mailto:vauss@gmail.com [Verb] => http://adlnet.gov/expapi/verbs/skipped [Activity] => https://app.acuizen.com/populate_form/965/1573/4697 )
  [9] => Array ( [Name] => Kathiroli V [Email] => mailto:kadv93@gmail.com [Verb] => http://adlnet.gov/expapi/verbs/completed [Activity] => https://app.acuizen.com/populate_form/965/1573/4695 )
  [10] => Array ( [Name] => Yashwanth [Email] => mailto:yashkav@gmail.com [Verb] => http://adlnet.gov/expapi/verbs/completed [Activity] => https://app.acuizen.com/populate_form/965/1573/4699 )
)
The data retrieved looks like this. Now I would like to represent the same content grouped by Verb or Activity and display it graphically.
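One way, sketched in Python (the dump above is PHP, but the idea carries over): tally the last path segment of each Verb URI, then feed the counts to any bar-chart routine, e.g. matplotlib's bar(). A small hypothetical subset of the records shown:

```python
from collections import Counter

records = [
    {"Name": "Aishwarya Ravichandran", "Verb": "http://adlnet.gov/expapi/verbs/completed"},
    {"Name": "Akshaya Manikandan", "Verb": "http://adlnet.gov/expapi/verbs/skipped"},
    {"Name": "Akshaya Manikandan", "Verb": "http://adlnet.gov/expapi/verbs/skipped"},
]

# count records per verb, using the last URI segment as the label
verb_counts = Counter(r["Verb"].rsplit("/", 1)[-1] for r in records)

# to plot: plt.bar(verb_counts.keys(), verb_counts.values())
```

Grouping by Activity works identically, keyed on the Activity URL instead of the Verb.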

Error in data.frame(ID = sales$Store, Date = sales$Date, Prediction = test_sales) : arguments imply differing number of rows: 27320, 27214
I am working on a dataset to forecast sales. I have Store ID (50 stores), Date (2016-2017) and Sales, and I need to forecast the sales for 2018.
library(forecast)
library(fma)
library(dplyr)
library(ggplot2)    # visualizations
library(gridExtra)  # viewing multiple plots together
library(tidytext)   # text mining
library(wordcloud2)
library(caret)
library(lattice)

sales_2016 = read.csv("C:/Users/manjeet.singh/Desktop/2016XP.csv")
sales_2017 = read.csv("C:/Users/manjeet.singh/Desktop/2017XP.csv")
sales = rbind(sales_2016, sales_2017)

preProcValues <- preProcess(sales, method = c("knnImpute", "center", "scale"))
train_processed <- predict(preProcValues, sales)
sum(is.na(sales))

sales$Date <- as.Date(sales$Date, format = "%Y%m%d")
tsDataSales <- ts(sales$Sales, start = c(2016, 1), frequency = 52)
plot(tsDataSales)
plot(decompose(tsDataSales))

train_sales <- window(tsDataSales, start = c(2016, 1), end = c(2017, 52), frequency = 52)
test_sales <- window(tsDataSales, start = 2018, frequency = 52)
ndiffs(train_sales)
Acf(train_sales, lag.max = 52, plot = TRUE)
Pacf(train_sales, lag.max = 52, plot = TRUE, main = "Original Time Series")

tst_arima <- arima(train_sales, order = c(1, 1, 1))
summary(tst_arima)
tst_arima_resi <- residuals(tst_arima)
summary(tst_arima_resi)
plot(tst_arima_resi)
qqnorm(tst_arima_resi)
qqline(tst_arima_resi)
box <- Box.test(tst_arima_resi, lag = 52, type = "Ljung-Box", fitdf = 1)
box

forecastval <- forecast(tst_arima, h = 52)
plot(forecastval, main = "Prediction from Auto Arima for Weekly Sales")
lines(test_sales, col = "green")

result = data.frame("ID" = sales$Store, "Date" = sales$Date, "Prediction" = test_sales)  # <- line that errors
I am getting this error when I am trying to write the CSV file.
Error in data.frame(ID = sales$Store, Date = sales$Date, Prediction = test_sales) : arguments imply differing number of rows: 27320, 27214
Can you please tell me where I am going wrong?
I have tried everything but am still struggling.
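The message says it all: a data frame needs equal-length columns, and here sales has 27,320 rows while test_sales holds only 27,214 predictions. The same failure mode, and one possible fix (align both columns to the rows the predictions actually cover), sketched with small hypothetical vectors in pandas terms:

```python
import pandas as pd

ids = [1, 2, 3, 4, 5]       # stands in for sales$Store (longer)
preds = [10.0, 20.0, 30.0]  # stands in for test_sales (shorter)

try:
    pd.DataFrame({"ID": ids, "Prediction": preds})
except ValueError:
    pass  # same class of error: columns of differing lengths

# one fix: keep only the IDs that actually have a prediction
n = len(preds)
result = pd.DataFrame({"ID": ids[:n], "Prediction": preds})
```

In the R code above, the real fix is to build the result from only the rows of sales that correspond to the forecast horizon, rather than the whole two-year history.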

How to identify cycles in Time series with pandas tools?
My problem is that I have a dataframe like:
    Time                     Value     Invalid
0   2017-08-06 00:00:51.561  0.000000  False
1   2017-08-06 00:00:51.610  0.035937  False
2   2017-08-06 00:00:51.690  0.071875  False
3   2017-08-06 00:00:51.711  0.035020  False
4   2017-08-06 00:00:51.760  0.000000  False
...
(about 9,000,000 rows)
I would like to count how many cycles there are in this data and identify each cycle's starting time and ending time. There should be many cycles, and each one starts and ends at a zero value. How can I identify them with python/pandas?
Thank you
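One approach, sketched on a toy stand-in for the Value column: find the zero rows, then treat each pair of consecutive zeros as one cycle. Real data may need a tolerance (e.g. s.abs() < eps) instead of exact zeros, and the resulting index pairs can be mapped back to the Time column for start/end timestamps:

```python
import pandas as pd

# toy stand-in for the Value column: two cycles, each bounded by zeros
s = pd.Series([0.0, 0.035, 0.071, 0.035, 0.0, 0.01, 0.05, 0.0])

zero_idx = s.index[s.eq(0.0)]                    # positions where Value is zero
cycles = list(zip(zero_idx[:-1], zero_idx[1:]))  # (start, end) index pairs
```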

Comparing Garch and Neural Networks with Time Series
I want to compare how a GARCH model fits a time series versus a neural network. I have my GARCH model:
garch <- ugarchspec(variance.model = list(garchOrder = c(1, 1)),
                    mean.model = list(armaOrder = c(0, 0), include.mean = FALSE),
                    distribution.model = "std")
garch_fit <- ugarchfit(spec = garch, data = currency)
print(garch_fit)
So my data is a foreign currency converted to US dollars, and I scaled the raw prices in terms of the mean of all prices.
The next step of my project is to fit a neural network to the currency data and then compare which model works better, GARCH or the neural net. Any advice on how to proceed?