Problem with calculation of days from date in DataFrame in Python Pandas
I have a DataFrame like below:

```python
df = pd.DataFrame({"data": ["02.01.2020"]})
df["data"] = pd.to_datetime(df["data"])
```
And a list of special dates:

```python
special_date = pd.to_datetime(["04.01.2020", "01.01.2020"], dayfirst=True)
```
And I need to calculate 2 columns in this DataFrame:

col1 = number of days to the next special date
col2 = number of days from the last special date

So I need a result like below:

col1 = 2, because the next special date after 02.01.2020 is in 2 days (04.01.2020)
col2 = 1, because the last special date before 02.01.2020 was 1 day ago (01.01.2020)
1 answer
-
answered 2021-01-19 12:07
Celius Stingher
You can simply use the `-` (minus) operator:

```python
df = pd.DataFrame({"data": ["02.01.2020"]})
df["data"] = pd.to_datetime(df["data"], dayfirst=True)
special_date = pd.to_datetime(["04.01.2020", "01.01.2020"], dayfirst=True)
df['col1'] = abs(df['data'] - special_date[0])
df['col2'] = df['data'] - special_date[1]
```
This outputs:
```
        data   col1   col2
0 2020-01-02 2 days 1 days
```
If you'd like to have only the number, you can add `.dt.days` after the calculations, e.g. `abs(df['data'] - special_date[0]).dt.days`, which outputs:

```
        data  col_1  col_2
0 2020-01-02      2      1
```
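If there are many special dates, the hard-coded indices above don't generalize. As a sketch of one possible extension (same sample data as the question), the nearest past and future special dates can be picked per row:

```python
import pandas as pd

# Same sample data as the question, but this works for any number of special dates
df = pd.DataFrame({"data": ["02.01.2020"]})
df["data"] = pd.to_datetime(df["data"], dayfirst=True)
special_date = pd.to_datetime(["04.01.2020", "01.01.2020"], dayfirst=True)

def days_to_next(d):
    """Days until the nearest special date strictly after d (None if there is none)."""
    future = special_date[special_date > d]
    return (future.min() - d).days if len(future) else None

def days_from_last(d):
    """Days since the nearest special date strictly before d (None if there is none)."""
    past = special_date[special_date < d]
    return (d - past.max()).days if len(past) else None

df["col1"] = df["data"].apply(days_to_next)
df["col2"] = df["data"].apply(days_from_last)
```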
See also questions close to this topic
-
AzureML Machine Learning Tasks
Please email me for an easy AzureML task. Regression and classification with graphics are needed. I will provide you the details and pay you right away after we agree on the budget. Email: burakkdedeoglu gmail.com
Thanks for your help. I am looking forward to starting.
-
Methods to remove Time Limit exceeded error from my code?
https://www.codechef.com/MARCH21C/problems/COLGLF4
This is the question for which I have written the solution below.
It shows a Time Limit Exceeded (TLE) error.
I have optimised it to some extent; please suggest how I can optimise it further.
In my code: e -> egg, h -> chocolate
```cpp
#include <bits/stdc++.h>
using namespace std;
#define fastio() ios_base::sync_with_stdio(false);cin.tie(NULL);cout.tie(NULL)

// help returns the min cost for n items
int help(int n, int e, int h, int& a, int& b, int& c, int*** dp) {
    if (n == 0) return 0;
    if (dp[h][e][n] != -1) return dp[h][e][n];
    int p1 = INT_MAX, p2 = INT_MAX, p3 = INT_MAX;
    if (e >= 2) {
        int ans = help(n - 1, e - 2, h, a, b, c, dp);
        if (ans != INT_MAX) p1 = ans + a;
    }
    if (h >= 3) {
        int ans = help(n - 1, e, h - 3, a, b, c, dp);
        if (ans != INT_MAX) p2 = ans + b;
    }
    if (e >= 1 && h >= 1) {
        int ans = help(n - 1, e - 1, h - 1, a, b, c, dp);
        if (ans != INT_MAX) p3 = ans + c;
    }
    int f = min(p1, min(p2, p3));
    dp[h][e][n] = f;
    return f;
}

int main() {
    fastio();
    int T;
    cin >> T;
    for (int t = 0; t < T; t++) {
        int n, e, h, a, b, c;
        cin >> n >> e >> h >> a >> b >> c;
        int*** dp = new int**[h + 1];
        for (int i = 0; i <= h; i++) {
            dp[i] = new int*[e + 1];
            for (int j = 0; j <= e; j++) {
                dp[i][j] = new int[n + 1];
                for (int k = 1; k <= n; k++) dp[i][j][k] = -1;
            }
        }
        for (int i = 0; i <= h; i++)
            for (int j = 0; j <= e; j++)
                dp[i][j][0] = 0;
        int ans = help(n, e, h, a, b, c, dp);
        if (ans == INT_MAX) cout << -1 << '\n';
        else cout << ans << '\n';
        for (int i = 0; i <= h; i++) {
            for (int j = 0; j <= e; j++) delete[] dp[i][j];
            delete[] dp[i];   // was `delete dp[i]`: arrays need delete[]
        }
        delete[] dp;
    }
    return 0;
}
```
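As a side note on the recurrence itself (not a drop-in fix for the C++ TLE): the same DP can be filled bottom-up, which removes recursion overhead and keeps only two layers of the table. A sketch of that idea in Python, using the question's variable names (n items, e eggs, h chocolates, costs a for 2 eggs, b for 3 chocolates, c for 1 egg + 1 chocolate):

```python
INF = float("inf")

def min_cost(n, e, h, a, b, c):
    # After k outer iterations, prev[i][j] = min cost of making k items
    # starting with i eggs and j chocolates; 0 items always cost 0.
    prev = [[0] * (h + 1) for _ in range(e + 1)]
    for _ in range(n):
        cur = [[INF] * (h + 1) for _ in range(e + 1)]
        for i in range(e + 1):
            for j in range(h + 1):
                best = INF
                if i >= 2 and prev[i - 2][j] + a < best:                 # use 2 eggs
                    best = prev[i - 2][j] + a
                if j >= 3 and prev[i][j - 3] + b < best:                 # use 3 chocolates
                    best = prev[i][j - 3] + b
                if i >= 1 and j >= 1 and prev[i - 1][j - 1] + c < best:  # use 1 + 1
                    best = prev[i - 1][j - 1] + c
                cur[i][j] = best
        prev = cur
    return -1 if prev[e][h] == INF else prev[e][h]
```

Keeping only the previous layer drops the memory from O(n·e·h) to O(e·h), which in C++ also avoids the triple-pointer allocations per test case.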
-
Python Custom Config Files for Modules
I have the below directory structure:

```
|-- masterFile.py
|-- firstFolder
|   |-- first.py
|   |-- configFile
|-- SecondFolder
|   |-- second.py
|   |-- configFile
```

I have many folders like this (firstFolder, secondFolder, ...) and all of their Python files have the same kind of code; the only difference is in how they process the variables from their config file. I have put all the common code in masterFile.py, and the variable processing is kept in first.py, second.py, ... as a function.
What I want to do is: I will run masterFile.py and, depending on some input/signal, call the function from first.py, second.py, ..., and I want those functions to run with their own config-file variables.
I am using the below code format:
In first.py, second.py:

```python
import os
import configparser

config = configparser.ConfigParser()
config.read(os.path.join(os.path.dirname(__file__), 'config'))
var1 = config['xyz']['abc']
var2 = config['jkl']['mno']

def doSomeMagic(INPUT_VAR):
    ...  # some processing on var1, var2 and INPUT_VAR
```
Importing the modules in masterFile.py:

```python
from firstFolder import first
from secondFolder import second

# initialization of INPUT_VAR
if SIGNAL == 1:
    first.doSomeMagic(INPUT_VAR)
```
Depending on some input/signal, I will call the doSomeMagic function from first.py, second.py, ... files. I want these functions to use their own config file, but the config files are not accessible when I follow this format/structure.
Any help on how to achieve this, and any advice on changing this format/structure to be more Pythonic, would be appreciated.
I also have some folders with names starting with a digit, like 7Yes:

```python
from 7Yes import 7yes
```

How can I import the Python file from such a directory?
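One possible approach (assuming each folder is an importable package visible on sys.path) is to load the per-folder modules with importlib. It takes the module name as a string, so it also copes with folder names that are not valid Python identifiers, such as 7Yes. The `load_handler` helper below is a made-up name for illustration:

```python
import importlib

def load_handler(folder, module):
    # e.g. load_handler("firstFolder", "first") -> the first module;
    # its module-level config parsing runs on first import
    return importlib.import_module(f"{folder}.{module}")

# `from 7Yes import 7yes` is a SyntaxError, but as a string it works:
# seven = importlib.import_module("7Yes.7yes")
```

Because each module still reads its config relative to its own `__file__`, every dynamically loaded module picks up its own config file.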
-
How to read a Pandas DataFrame from one file in another file
I have a DataFrame (DB) in abc.py, and I want to read it in another file, xyz.py, and perform some other operations on it.
In xyz.py:

```python
import abc as abc
print(abc.DB)
```

It gives an empty DataFrame. I need a solution for how to access a DataFrame across multiple files.
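For reference, a minimal sketch of the module-level pattern that does work; the file names here are illustrative. Note also that naming a module abc.py shadows the standard-library abc module, which is worth ruling out as the source of the surprise:

```python
# --- contents of abc_data.py (renamed from abc.py to avoid shadowing the
# standard-library abc module) ---
import pandas as pd

# Built at import time, at module level, so importers see the populated frame.
# A DataFrame built only inside a function or `if __name__ == "__main__":`
# block would not be visible this way.
DB = pd.DataFrame({"x": [1, 2, 3]})

# --- contents of xyz.py ---
# from abc_data import DB
# print(DB)
```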
-
Expand Pandas Dataframe Column with JSON
I'm looking for a clean, fast way to expand a pandas dataframe column which contains a json object (essentially a dict of nested dicts), so I could have one column for each element in the json column in json normalized form; however, this needs to retain all of the original dataframe columns as well. In some instances, this dict might have a common identifier I could use to merge with the original dataframe, but not always. For example:
```python
import pandas as pd
import numpy as np

df = pd.DataFrame([
    {'col1': 'a',
     'col2': {'col2.1': 'a1',
              'col2.2': {'col2.2.1': 'a2.1', 'col2.2.2': 'a2.2'}},
     'col3': '3a'},
    {'col1': 'b', 'col2': np.nan, 'col3': '3b'},
    {'col1': 'c',
     'col2': {'col2.1': 'c1',
              'col2.2': {'col2.2.1': 'c2.1', 'col2.2.2': 'c2.2'}},
     'col3': '3c'},
])
```
Here is a sample dataframe. As you can see, col2 is either a dict (with another nested dict inside of it) or a null value; it contains nested elements I would like to be able to access. In this case, they have no ID that could link up to the original dataframe. My end goal would be essentially to have this:
```python
final = pd.DataFrame([
    {'col1': 'a', 'col2.1': 'a1',
     'col2.2.col2.2.1': 'a2.1', 'col2.2.col2.2.2': 'a2.2', 'col3': '3a'},
    {'col1': 'b', 'col2.1': np.nan,
     'col2.2.col2.2.1': np.nan, 'col2.2.col2.2.2': np.nan, 'col3': '3b'},
    {'col1': 'c', 'col2.1': 'c1',
     'col2.2.col2.2.1': 'c2.1', 'col2.2.col2.2.2': 'c2.2', 'col3': '3c'},
])
```
In my instance, the dict could have up to 50 nested key-value pairs, and I might only need to access a few of them. Additionally, I have about 50 - 100 other columns of data I need to preserve with these new columns (so an end goal of around 100 - 150). So I suppose there might be two methods I'd be looking for--getting a column for each value in the dict, or getting a column for a select few. The former option I haven't yet found a great workaround for; I've looked at some prior answers but found them to be rather confusing, and most threw errors. This seems especially difficult when there are dicts nested inside of the column. To attempt the second solution, I tried the following code:
```python
def get_val_from_dict(row, col, label):
    if pd.isnull(row[col]):
        return np.nan
    norm = pd.json_normalize(row[col])
    try:
        return norm[label]
    except:
        return np.nan

needed_cols = ['col2.1', 'col2.2.col2.2.1', 'col2.2.col2.2.2']
for label in needed_cols:
    df[label] = df.apply(get_val_from_dict, args=('col2', label), axis=1)
```
This seemed to work for this example, but for my actual dataframe which had substantially more data, this was quite slow--and, I would imagine, is not a great or scalable solution. Would anyone be able to offer an alternative to this sluggish approach to resolving the issue I'm having?
(Also, apologies also about the massive amounts of nesting in my naming here. If helpful, I am adding in several images of the dataframes below--the original, then the target, and then the current output.)
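One vectorised alternative to the row-by-row apply (assuming pandas >= 1.0 for `pd.json_normalize`) is to normalize the whole column in one call and join the flattened columns back on the index. A sketch using the question's sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([
    {'col1': 'a',
     'col2': {'col2.1': 'a1',
              'col2.2': {'col2.2.1': 'a2.1', 'col2.2.2': 'a2.2'}},
     'col3': '3a'},
    {'col1': 'b', 'col2': np.nan, 'col3': '3b'},
    {'col1': 'c',
     'col2': {'col2.1': 'c1',
              'col2.2': {'col2.2.1': 'c2.1', 'col2.2.2': 'c2.2'}},
     'col3': '3c'},
])

# NaN rows become empty dicts so json_normalize keeps an (all-NaN) row for them
dicts = df['col2'].apply(lambda d: d if isinstance(d, dict) else {}).tolist()
expanded = pd.json_normalize(dicts)   # nested keys are joined with '.'
expanded.index = df.index             # realign before joining
result = df.drop(columns='col2').join(expanded)
```

To keep only a select few of the flattened keys, subset `expanded[needed_cols]` before the join; normalizing once per column, rather than once per row per label, is where the speed-up comes from.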
-
Label groups of columns in Pandas
I am producing a data frame with many columns, and I would like to have groups of columns.
I believe this and this question are related, but I'm not sure how to apply that to my case.
In summary, I want to have a super label for sets of columns.
My data is getting generated similarly to:
```python
def inspect_pages(cases, checks=[check_one, check_two]):
    columns = list(chain.from_iterable(
        [[(check.name, column) for column in check.reportable]
         for check in checks]
    ))
    column_index = MultiIndex.from_tuples(columns, names=["Check Name", "Check"])
    print(column_index)
    results = DataFrame(
        index=[case.label for case in cases],
        columns=column_index
    )
    for n, case in enumerate(cases):
        for (name, result) in map(lambda c: (c.name, c(**case._asdict())), checks):
            for (key, value) in result.items():
                results.loc[case.label, key] = value
    return results
```
So what I would like is a merged column that spans the same number of columns I'm getting per test, labeled with the content of `test.name`, and regular columns with the `key` and `value` values per case. Any idea or pointer in the right direction?
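For what it's worth, a small self-contained sketch of the "super label" idea with a column MultiIndex; the check and case names here are made up for illustration:

```python
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [("check_one", "key"), ("check_one", "value"),
     ("check_two", "key"), ("check_two", "value")],
    names=["Check Name", "Check"],
)
results = pd.DataFrame(index=["case_a", "case_b"], columns=columns)

# Writing a single cell uses a (super label, sub label) tuple:
results.loc["case_a", ("check_one", "key")] = "k1"

# Selecting a whole group of columns by its super label:
group = results["check_one"]
```

When printed, the "Check Name" level renders as a merged header row spanning its sub-columns, which seems to be the effect described above.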
-
I am trying to send data frames to a platform, but keep getting this error: "TypeError: unhashable type: 'list'"
I have attached a snippet of the code and the error. Please let me know how to go about it. Thanks.
code written -
Getting the week range for a given month in Postgres, counting from the beginning of the month
Here is my current implementation
```sql
-- Getting the week range from week 1 to week 5
with t as (
    select date_trunc('month', '2019-03-11'::date) as aday  -- any date in the target month
),
s as (
    select d::date, d::date + 6 ed, extract('isodow' from d) wd
    from t, generate_series(aday, aday + interval '1 month - 1 day', interval '1 day') d
)
select format('Week %s', extract(day from d)::integer / 7 + 1) as weekname, d, ed
from s
where wd = 1;
```
When tested with `2019-03-11`, I get a wrong output. The expected output should be:

```
Week 1 : 01-03-2019 - 03-03-2019
Week 2 : 04-03-2019 - 10-03-2019
Week 3 : 11-03-2019 - 17-03-2019
Week 4 : 18-03-2019 - 24-03-2019
Week 5 : 25-03-2019 - 31-03-2019
```
The first week should start from the first day of the month.
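To make the intended week rule concrete, here is a sketch of it in plain Python; this is my reading of the expected output above (week 1 runs from the 1st of the month to the first Sunday, then full Monday-to-Sunday weeks, with the last week truncated at month end), not the Postgres fix itself:

```python
import calendar
from datetime import date, timedelta

def month_weeks(any_day):
    first = any_day.replace(day=1)
    last = any_day.replace(day=calendar.monthrange(any_day.year, any_day.month)[1])
    weeks, start, n = [], first, 1
    while start <= last:
        # end of this week: the coming Sunday (weekday 6), capped at month end
        end = min(start + timedelta(days=6 - start.weekday()), last)
        weeks.append((f"Week {n}", start, end))
        start, n = end + timedelta(days=1), n + 1
    return weeks

for name, s, e in month_weeks(date(2019, 3, 11)):
    print(name, ":", s, "-", e)
```

In SQL terms, this suggests numbering weeks from the Mondays plus the month's first day, rather than dividing the day of month by 7.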
-
Does a change of date, or an inconsistency between the date and hwclock, affect the IPv4 address (inet addr) shown by ifconfig in Linux?
I have a problem where the inet addr (IPv4) of eth0 sometimes disappears.
I guess that it occurs when I change the date with `sudo date MMDDHHmmYYYY.SS`.
When I changed the date to 030810102016.00 today (03/08/2021), the IP address disappeared within a few minutes, as I suspected it would. I've tried it multiple times and got the same results.
I'm using Ubuntu 16.04.6 LTS.
Can the date change cause a problem with the IP (especially setting it back to a past date)? Or can an inconsistency between the date and hwclock cause it?
-
Overlapping periods and creating new periods in Java
I have a problem with my algorithm to calculate the off-peak periods between two periods. The goal is that the periods must not intersect. For example, an agent has a list of career periods (codcar): we must keep those of type A and calculate whether new periods of type O are needed.

Here is an example:

```
-- Agent 123456
|--------------------------------------------------| O
     |----------------| A        |----------------| A
```

```
nudoss  adecod  codcar  dateff      datfin
123456  G04377  O       2020-01-01  2022-12-31
123456  E00452  A       2020-06-01  2020-10-31
123456  E00452  A       2022-01-01  2022-09-30
```

Expected result (the O coverage is split into new O periods around the A periods):

```
codcar  datgra      dateff      datfin      adecod
O       1996-09-01  2020-01-01  2020-05-31  G04377
A       1996-09-01  2020-06-01  2020-10-31  E00452
O       1996-09-01  2020-11-01  2021-12-31  G04377
A       1996-09-01  2022-01-01  2022-09-30  E00452
O       1996-09-01  2022-10-01  2022-12-31  G04377
```

-- Agent 234567:

```
nudoss  adecod  codcar  dateff      datfin
234567  GGTEAA  O       2020-01-01  2020-12-31
234567  GGTE2C  A       2020-06-01  2021-10-31
234567  GGTEBB  A       2021-11-01  2022-05-01
234567  GGTE2C  O       2022-04-01  2022-12-31
```

Expected result:

```
codcar  datgra      dateff      datfin      adecod  nudoss
O       1996-09-01  2020-01-01  2020-05-31  GGTHHH  234567
A       1996-09-01  2020-06-01  2021-10-31  GGTE2K  234567
A       1996-09-01  2021-11-01  2022-05-01  GGTE2C  234567
O       1996-09-01  2022-05-02  2022-12-31  GGTEJK  234567
```

-- Agent 999999:

```
O1 : 31/12/2019 --- 31/12/2020
A1 : 01/01/2020 --- 30/10/2021
O2 : 01/05/2020 --- 31/12/2021
A2 : 01/01/2022 --- 01/05/2022
```

Expected result:

```
codcar  dateff      datfin      adecod
O       2019-01-01  2019-12-31  GGTEAA
A       2020-01-01  2020-10-30  GGTE2C
O       2020-10-31  2021-12-31  GGTEBB
A       2022-01-01  2022-05-01  GGTE2C
O       2022-05-02  2022-12-31  GGTEKK
O       2023-01-01  2999-12-31  GGTEMM
```

-- Agent 909812, current data:

```
adecod  codcar  dateff      datfin
G01083  O       2007-09-01  2007-11-30
G01083  O       2007-12-01  2008-08-31
G01083  O       2008-09-01  2009-08-31
G01083  O       2009-09-01  2012-02-29
G01083  O       2012-03-01  2015-02-28
G01083  O       2015-03-01  2017-08-31
G04296  O       2017-09-01  2018-02-28
G04296  O       2018-03-01  2999-12-31
G04302  A       2020-12-01  2999-12-31
```

Expected result:

```
codcar  dateff      datfin      adecod
O       2007-09-01  2007-11-30  G01083
O       2007-12-01  2008-08-31  G01083
O       2008-09-01  2009-08-31  G01083
O       2009-09-01  2012-02-29  G01083
O       2012-03-01  2015-02-28  G01083
O       2015-03-01  2017-08-31  G01083
O       2017-09-01  2018-02-28  G04296
O       2018-03-01  2020-11-30  G04296
A       2020-12-01  2999-12-31  G04302
```

-- Agent 908699, current data:

```
adecod  codcar  dateff      datfin
GGCCCN  O       1982-10-01  1983-09-30
GGCCCN  O       1983-10-01  1986-09-30
GGCCCN  O       1986-10-01  1988-09-30
GGCCCN  O       1988-10-01  1990-09-30
GGCCCN  O       1990-10-01  1993-08-31
GGCCCN  O       1993-09-01  1995-12-31
GGCCCN  O       1996-01-01  1998-11-30
GGCCCN  O       1998-12-01  2004-08-31
G00682  O       2004-09-01  2005-08-31
G00681  O       2005-09-01  2008-08-31
G00681  O       2008-09-01  2016-12-31
E00462  A       2011-07-20  2012-07-19
E00462  A       2012-07-20  2013-07-19
E00462  A       2013-07-20  2020-12-31
G00680  O       2017-01-01  2017-12-30
G04377  O       2017-12-31  2017-12-31
G04377  O       2018-01-01  2018-12-31
G04377  O       2019-01-01  2999-12-31
E00986  A       2021-01-01  2024-12-31
```

Expected result:

```
codcar  dateff      datfin      adecod
O       1982-10-01  1983-09-30  GGCCCN
O       1983-10-01  1986-09-30  GGCCCN
O       1986-10-01  1988-09-30  GGCCCN
O       1988-10-01  1990-09-30  GGCCCN
O       1990-10-01  1993-08-31  GGCCCN
O       1993-09-01  1995-12-31  GGCCCN
O       1996-01-01  1998-11-30  GGCCCN
O       1998-12-01  2004-08-31  GGCCCN
O       2004-09-01  2005-08-31  G00682
O       2005-09-01  2008-08-31  G00681
O       2008-09-01  2011-07-19  G00681
A       2011-07-20  2012-07-19  E00462
A       2012-07-20  2013-07-19  E00462
A       2013-07-20  2020-12-31  E00462
A       2021-01-01  2024-12-31  E00986
O       2025-01-01  2999-12-31  G04377
```

Thank you if someone can help me with an algorithm. I tried two, but I end up with too many O periods.
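The core clipping step can be sketched independently of the Java code: keep every A period as-is and subtract the A periods from each O period, emitting only the uncovered sub-ranges as new O periods. A hypothetical Python sketch (field names follow the tables above, dates are inclusive on both ends):

```python
from datetime import date, timedelta

def subtract_periods(o_start, o_end, a_periods):
    """Return the sub-ranges of [o_start, o_end] not covered by any A period."""
    pieces = [(o_start, o_end)]
    for a_start, a_end in sorted(a_periods):
        next_pieces = []
        for s, e in pieces:
            if a_end < s or a_start > e:           # no overlap: keep untouched
                next_pieces.append((s, e))
                continue
            if a_start > s:                        # uncovered part before the A period
                next_pieces.append((s, a_start - timedelta(days=1)))
            if a_end < e:                          # uncovered part after the A period
                next_pieces.append((a_end + timedelta(days=1), e))
        pieces = next_pieces
    return pieces

# Agent 123456 from the question: one O period, two A periods
gaps = subtract_periods(
    date(2020, 1, 1), date(2022, 12, 31),
    [(date(2020, 6, 1), date(2020, 10, 31)),
     (date(2022, 1, 1), date(2022, 9, 30))],
)
```

Running each O row through this against the agent's A rows, then re-attaching the codes, would reproduce the expected O1/O2/O3 splits without ever producing overlapping periods.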