Any efficient way to compare two dataframes and append new entries in pandas?
I have new files which I want to add them to historical table, before that, I need to check new file with historical table by comparing its two column in particular, one is
state and another one is
date column. First, I need to check
max (state, date), then check those entries with
max(state, date) in historical table; if they are not historical table, then append them, otherwise do nothing. I tried to do this in pandas by
group-by on new file and historical table and do comparison, if any new entries from new file that not in historical data, then add them. Now I have issues to append new values to historical table correctly in pandas. Does anyone have quick thoughts?
My current attempt:
import pandas as pd src_df=pd.read_csv("https://raw.githubusercontent.com/adamFlyn/test_rl/main/src_df.csv") hist_df=pd.read_csv("https://raw.githubusercontent.com/adamFlyn/test_rl/main/historical_df.csv") picked_rows = src_df.loc[src_df.groupby('state')['yyyy_mm'].idxmax()]
I want to check
hist_df where I need to check by
yyyy_mm columns, so only add entries from
max value or recent dates. I created desired output below. I tried inner join or
pandas.concat but it is not giving me correct out. Does anyone have any ideas on this?
Here is my desired output that I want to get:
import pandas as pd desired_output=pd.read_csv("https://raw.githubusercontent.com/adamFlyn/test_rl/main/output_df.csv")