python dataframe convert epoch to readable datetime hour minutes seconds as zero
I have a dataframe as follows:
period
1651622400000.00000
1651536000000.00000
1651449600000.00000
1651363200000.00000
1651276800000.00000
1651190400000.00000
1651104000000.00000
1651017600000.00000
I have converted it into human readable datetime as:
df['period'] = pd.to_datetime(df['period'], unit='ms')
and this outputs:
2022-04-04 00:00:00
2022-04-05 00:00:00
2022-04-06 00:00:00
2022-04-07 00:00:00
2022-04-08 00:00:00
2022-04-09 00:00:00
2022-04-10 00:00:00
2022-04-11 00:00:00
2022-04-12 00:00:00
The hours, minutes, and seconds are all 0.
I checked this on https://www.epochconverter.com/, which gives
GMT: Monday, April 4, 2022 12:00:00 AM
Your time zone: Monday, April 4, 2022 5:45:00 AM GMT+05:45
How do I get h, m, and s as well?
2 answers
-
answered 2022-05-04 10:44
jezrael
https://www.epochconverter.com/ also shows the time in your local timezone. If you need a timezone on the column, use Series.dt.tz_localize and then Series.dt.tz_convert:
df['period'] = (pd.to_datetime(df['period'], unit='ms')
                  .dt.tz_localize('GMT')
                  .dt.tz_convert('Asia/Kathmandu'))
print(df)
                     period
0 2022-05-04 05:45:00+05:45
1 2022-05-03 05:45:00+05:45
2 2022-05-02 05:45:00+05:45
3 2022-05-01 05:45:00+05:45
4 2022-04-30 05:45:00+05:45
5 2022-04-29 05:45:00+05:45
6 2022-04-28 05:45:00+05:45
7 2022-04-27 05:45:00+05:45
-
answered 2022-05-04 10:52
My Work
There is no problem with your code or with pandas. And I don't think the timezone is an issue here either (as the other answer says). April 4, 2022 12:00:00 AM is the exact same time and date as 2022-04-04 00:00:00; one is just written in 12-hour AM/PM notation. You could specify timezones as jezrael writes, or with utc=True (check the docs), but I guess that's not your problem.
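As a small sketch of the utc=True alternative mentioned above: the frame below is a two-row stand-in for the question's period column, and Asia/Kathmandu is borrowed from the other answer purely for illustration.

```python
import pandas as pd

# Two of the millisecond epoch values from the question.
df = pd.DataFrame({"period": [1651622400000.0, 1651536000000.0]})

# utc=True returns timezone-aware timestamps in UTC, so no tz_localize step is needed.
utc_period = pd.to_datetime(df["period"], unit="ms", utc=True)

# Converting to a local zone changes only the displayed wall-clock time,
# not the instant being described.
local_period = utc_period.dt.tz_convert("Asia/Kathmandu")

print(utc_period)
print(local_period)
```

Both series describe the same instants; midnight in UTC simply displays as 05:45 in a +05:45 zone, which is the difference the question noticed on the converter site.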
See also questions close to this topic
-
Python File Tagging System does not retrieve nested dictionaries in dictionary
I am building a file tagging system using Python. The idea is simple. Given a directory of files (and files within subdirectories), I want to filter them out using a filter input and tag those files with a word or a phrase.
If I got the following contents in my current directory:
data/
    budget.xls
    world_building_budget.txt
a.txt
b.exe
hello_world.dat
world_builder.spec
and I execute the following command in the shell:
py -3 tag_tool.py -filter=world -tag="World-Building Tool"
My output will be:
These files were tagged with "World-Building Tool":
data/
    world_building_budget.txt
hello_world.dat
world_builder.spec
My current output isn't exactly like this but basically, I am converting all files and files within subdirectories into a single dictionary like this:
def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree
Right now, my dictionary looks like this:
key:''
In the following function, I am turning the empty values '' into empty lists (to hold my tags):
def empty_str_to_list(d):
    for k, v in d.items():
        if v == '':
            d[k] = []
        elif isinstance(v, dict):
            empty_str_to_list(v)
When I run my entire code, this is my output:
hello_world.dat ['World-Building Tool'] world_builder.spec ['World-Building Tool']
But it does not see data/world_building_budget.txt. This is the full dictionary:
{'data': {'world_building_budget.txt': []}, 'a.txt': [], 'hello_world.dat': [], 'b.exe': [], 'world_builder.spec': []}
This is my full code:
import os, argparse

def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree

def empty_str_to_list(d):
    for k, v in d.items():
        if v == '':
            d[k] = []
        elif isinstance(v, dict):
            empty_str_to_list(v)

parser = argparse.ArgumentParser(description="Just an example", formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument("--filter", action="store", help="keyword to filter files")
parser.add_argument("--tag", action="store", help="a tag phrase to attach to a file")
parser.add_argument("--get_tagged", action="store", help="retrieve files matching an existing tag")
args = parser.parse_args()

filter = args.filter
tag = args.tag
get_tagged = args.get_tagged

current_dir = os.getcwd()
files_dict = fs_tree_to_dict(current_dir)
empty_str_to_list(files_dict)

for k, v in files_dict.items():
    if filter in k:
        if v == []:
            v.append(tag)
            print(k, v)
        elif isinstance(v, dict):
            empty_str_to_list(v)
    if get_tagged in v:
        print(k, v)
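The final loop above only looks at the top-level keys, which would explain why data/world_building_budget.txt is never reached. A minimal sketch of a recursive walk over the same kind of dictionary follows; tag_matching and its prefix parameter are hypothetical names introduced here, not part of the original code.

```python
def tag_matching(tree, keyword, tag, prefix=""):
    """Recursively tag files whose name contains keyword; return the tagged paths."""
    tagged = []
    for name, value in tree.items():
        path = prefix + name
        if isinstance(value, dict):
            # A sub-directory: descend into it instead of testing its own name.
            tagged += tag_matching(value, keyword, tag, prefix=path + "/")
        elif keyword in name:
            if tag not in value:
                value.append(tag)
            tagged.append(path)
    return tagged


# The full dictionary shown in the question.
files_dict = {
    "data": {"world_building_budget.txt": []},
    "a.txt": [],
    "hello_world.dat": [],
    "b.exe": [],
    "world_builder.spec": [],
}
tagged_paths = tag_matching(files_dict, "world", "World-Building Tool")
print(tagged_paths)
```

The recursion mirrors the shape of fs_tree_to_dict: whenever a value is itself a dictionary, the same function is applied one level down.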
-
Actually, I am working on a project and it shows "no module named pip_internal". Please help me with this. I am using PyCharm (conda interpreter).
  File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\Scripts\pip.exe\__main__.py", line 4, in <module>
  File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\site-packages\pip\_internal\__init__.py", line 4, in <module>
    from pip_internal.utils import _log
I am using pycharm with conda interpreter.
-
Looping the function if the input is not string
I'm new to Python. I have homework to write a function that checks whether an item exists in a dictionary or not.
inventory = {"apple" : 50, "orange" : 50, "pineapple" : 70, "strawberry" : 30}

def check_item():
    x = input("Enter the fruit's name: ")
    if not x.isalpha():
        print("Error! You need to type the name of the fruit")
    elif x in inventory:
        print("Fruit found:", x)
        print("Inventory available:", inventory[x],"KG")
    else:
        print("Fruit not found")

check_item()
I want the function to loop again only if the input is not a string. I've tried putting return under print("Error! You need to type the name of the fruit"), but it didn't work. Help
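One way to repeat the prompt is a while loop that only exits once the entry passes the isalpha() check. The sketch below assumes that "not a string" means "not purely alphabetic", since input() always returns a string; the read parameter is a hypothetical addition so the function can be exercised without a keyboard.

```python
inventory = {"apple": 50, "orange": 50, "pineapple": 70, "strawberry": 30}


def check_item(read=input):
    """Keep asking until the entry is alphabetic, then report on the inventory."""
    while True:
        x = read("Enter the fruit's name: ")
        if x.isalpha():
            break  # valid entry: leave the loop
        print("Error! You need to type the name of the fruit")

    if x in inventory:
        print("Fruit found:", x)
        print("Inventory available:", inventory[x], "KG")
        return x
    print("Fruit not found")
    return None


# Demonstration with canned answers instead of the keyboard.
answers = iter(["123", "apple"])
result = check_item(read=lambda prompt: next(answers))
```

Calling check_item() with no argument falls back to the normal input() prompt.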
-
Any efficient way to compare two dataframes and append new entries in pandas?
I have new files that I want to add to a historical table. Before that, I need to check each new file against the historical table by comparing two columns in particular: state and date. First, I need to find max(state, date) in the new file, then check those entries against max(state, date) in the historical table; if they are not in the historical table, append them, otherwise do nothing. I tried to do this in pandas with a group-by on the new file and the historical table followed by a comparison: any entries from the new file that are not in the historical data should be added. I am having trouble appending the new values to the historical table correctly in pandas. Does anyone have quick thoughts?
My current attempt:
import pandas as pd
src_df = pd.read_csv("https://raw.githubusercontent.com/adamFlyn/test_rl/main/src_df.csv")
hist_df = pd.read_csv("https://raw.githubusercontent.com/adamFlyn/test_rl/main/historical_df.csv")
picked_rows = src_df.loc[src_df.groupby('state')['yyyy_mm'].idxmax()]
I want to check picked_rows against hist_df by the state and yyyy_mm columns, so that I only add entries from picked_rows where state has the max value, i.e. the most recent dates. I created the desired output below. I tried an inner join and pandas.concat, but neither gives me the correct output. Does anyone have any ideas on this?
Here is the desired output that I want to get:
import pandas as pd
desired_output = pd.read_csv("https://raw.githubusercontent.com/adamFlyn/test_rl/main/output_df.csv")
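A hedged sketch of one way to do the comparison described above is an anti-join: a left merge with indicator=True, keeping only the rows marked left_only. The two small frames are made-up stand-ins for the CSV files, and the value column is hypothetical.

```python
import pandas as pd

# Small stand-ins for the CSV files in the question (values are made up).
hist_df = pd.DataFrame({
    "state": ["CA", "NY"],
    "yyyy_mm": [202201, 202202],
    "value": [10, 20],
})
src_df = pd.DataFrame({
    "state": ["CA", "CA", "NY"],
    "yyyy_mm": [202201, 202203, 202202],
    "value": [10, 30, 20],
})

keys = ["state", "yyyy_mm"]

# Latest row per state in the new file, as in the question's attempt.
picked_rows = src_df.loc[src_df.groupby("state")["yyyy_mm"].idxmax()]

# Left merge against the historical keys; "_merge" marks rows missing from history.
flagged = picked_rows.merge(
    hist_df[keys].drop_duplicates(), on=keys, how="left", indicator=True
)
new_rows = flagged.loc[flagged["_merge"] == "left_only"].drop(columns="_merge")

# Append only the rows that were not already in the historical table.
updated_hist = pd.concat([hist_df, new_rows], ignore_index=True)
print(updated_hist)
```

Merging on only the key columns of hist_df avoids duplicated value columns in the result; whether yyyy_mm in the real files is numeric, as assumed here, would need checking.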
-
How to bring data frame into single column from multiple columns in python
I have data in the multiple columns shown below, and I want to bring all 4 pcp columns of data into a single column.
YEAR  Month  pcp1  pcp2  pcp3  pcp4
1984  1      0     0     0     0
1984  2      1.2   0     0     0
1984  3      0     0     0     0
1984  4      0     0     0     0
1984  5      0     0     0     0
1984  6      0     0     0     1.6
1984  7      3     3     9.2   3.2
1984  8      6.2   27.1  5.4   0
1984  9      0     0     0     0
1984  10     0     0     0     0
1984  11     0     0     0     0
1984  12     0     0     0     0
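A possible reading of "a single column" is a long-format table with one pcp value per row. The sketch below uses DataFrame.melt on the first two rows of the table; the names source and pcp for the new columns are assumptions, as is the row ordering.

```python
import pandas as pd

# First two rows of the table above, typed in by hand.
df = pd.DataFrame({
    "YEAR": [1984, 1984],
    "Month": [1, 2],
    "pcp1": [0.0, 1.2],
    "pcp2": [0.0, 0.0],
    "pcp3": [0.0, 0.0],
    "pcp4": [0.0, 0.0],
})

# melt turns the four pcp columns into (source column, value) pairs.
long_df = df.melt(id_vars=["YEAR", "Month"], var_name="source", value_name="pcp")

# Sort so each month's four values sit together rather than column by column.
long_df = long_df.sort_values(["YEAR", "Month", "source"]).reset_index(drop=True)
print(long_df)
```

If the values should instead be stacked column by column, the sort step can simply be dropped.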
-
Exclude Japanese Stopwords from File
I am trying to remove Japanese stopwords from a text corpus from twitter. Unfortunately the frequently used nltk does not contain Japanese, so I had to figure out a different way.
This is my MWE:
import urllib
from urllib.request import urlopen
import MeCab
import re

# slothlib
slothlib_path = "http://svn.sourceforge.jp/svnroot/slothlib/CSharp/Version1/SlothLib/NLP/Filter/StopWord/word/Japanese.txt"
sloth_file = urllib.request.urlopen(slothlib_path)

# stopwordsiso
iso_path = "https://raw.githubusercontent.com/stopwords-iso/stopwords-ja/master/stopwords-ja.txt"
iso_file = urllib.request.urlopen(iso_path)

stopwords = [line.decode("utf-8").strip() for line in iso_file]
stopwords = [ss for ss in stopwords if not ss==u'']
stopwords = list(set(stopwords))

text = '日本語の自然言語処理は本当にしんどい、と彼は十回言った。'
tagger = MeCab.Tagger("-Owakati")
tok_text = tagger.parse(text)

ws = re.compile(" ")
words = [word for word in ws.split(tok_text)]
if words[-1] == u"\n":
    words = words[:-1]

ws = [w for w in words if w not in stopwords]

print(words)
print(ws)
Successfully Completed: It does give out the original tokenized text as well as the one without stopwords
['日本語', 'の', '自然', '言語', '処理', 'は', '本当に', 'しんどい', '、', 'と', '彼', 'は', '十', '回', '言っ', 'た', '。']
['日本語', '自然', '言語', '処理', '本当に', 'しんどい', '、', '十', '回', '言っ', '。']
There are still two issues I am facing though:
a) Is it possible to take two stopword lists into account, namely iso_file and sloth_file, so that a word is removed if it is a stopword in either iso_file or sloth_file? (I tried changing line 14 to stopwords = [line.decode("utf-8").strip() for line in zip('iso_file','sloth_file')] but received an error saying that tuple attributes may not be decoded.)
b) The ultimate goal would be to generate a new text file in which all stopwords are removed.
I had created this MWE
### first clean twitter csv
import pandas as pd
import re
import emoji

df = pd.read_csv("input.csv")

def cleaner(tweet):
    tweet = re.sub(r"@[^\s]+","",tweet) #Remove @username
    tweet = re.sub(r"(?:\@|http?\://|https?\://|www)\S+|\\n","", tweet) #Remove http links & \n
    tweet = " ".join(tweet.split())
    tweet = ''.join(c for c in tweet if c not in emoji.UNICODE_EMOJI) #Remove Emojis
    tweet = tweet.replace("#", "").replace("_", " ") #Remove hashtag sign but keep the text
    return tweet

df['text'] = df['text'].map(lambda x: cleaner(x))
df['text'].to_csv(r'cleaned.txt', header=None, index=None, sep='\t', mode='a')

### remove stopwords
import urllib
from urllib.request import urlopen
import MeCab
import re

# slothlib
slothlib_path = "http://svn.sourceforge.jp/svnroot/slothlib/CSharp/Version1/SlothLib/NLP/Filter/StopWord/word/Japanese.txt"
sloth_file = urllib.request.urlopen(slothlib_path)

#stopwordsiso
iso_path = "https://raw.githubusercontent.com/stopwords-iso/stopwords-ja/master/stopwords-ja.txt"
iso_file = urllib.request.urlopen(iso_path)

stopwords = [line.decode("utf-8").strip() for line in iso_file]
stopwords = [ss for ss in stopwords if not ss==u'']
stopwords = list(set(stopwords))

with open("cleaned.txt",encoding='utf8') as f:
    cleanedlist = f.readlines()
cleanedlist = list(set(cleanedlist))

tagger = MeCab.Tagger("-Owakati")
tok_text = tagger.parse(cleanedlist)

ws = re.compile(" ")
words = [word for word in ws.split(tok_text)]
if words[-1] == u"\n":
    words = words[:-1]

ws = [w for w in words if w not in stopwords]

print(words)
print(ws)
While it works for the simple input text in the first MWE, for the MWE I just stated I get the error
in method 'Tagger_parse', argument 2 of type 'char const *'
Additional information:
Wrong number or type of arguments for overloaded function 'Tagger_parse'.
  Possible C/C++ prototypes are:
    MeCab::Tagger::parse(MeCab::Lattice *) const
    MeCab::Tagger::parse(char const *)
for this line:
tok_text = tagger.parse(cleanedlist)
So I assume I will need to make amendments to the cleanedlist?
I have uploaded the cleaned.txt on GitHub for reproducing the issue: https://gist.github.com/yin-ori/1756f6236944e458fdbc4a4aa8f85a2c
Also: How would I be able to get the tokenized list that excludes stopwords back to a text format like cleaned.txt? Would it be possible to create a df of ws for this purpose? Or might there even be a simpler way?
Sorry for the long request, I tried a lot and tried to make it as easy as possible to understand what I'm driving at :-)
Thank you very much!
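For part a), a minimal sketch of combining the two lists: zip() pairs lines up into tuples, which is where the decode error comes from, whereas itertools.chain reads one file after the other. The two BytesIO objects are tiny in-memory stand-ins for the urlopen() responses, with made-up contents.

```python
import io
from itertools import chain

# In-memory stand-ins for the two urlopen() responses (byte lines, like the real files).
iso_file = io.BytesIO("の\nは\n\n".encode("utf-8"))
sloth_file = io.BytesIO("と\n彼\nは\n".encode("utf-8"))

# chain() yields every line of the first file, then every line of the second.
stopwords = {line.decode("utf-8").strip() for line in chain(iso_file, sloth_file)}
stopwords.discard("")  # drop blank lines

words = ["日本語", "の", "自然", "言語", "は", "と", "彼"]
kept = [w for w in words if w not in stopwords]
print(kept)
```

A set also makes the membership test cheap and removes duplicates between the two lists. As for the Tagger_parse error, the listed prototypes accept a single string, so presumably each line of cleanedlist would have to be parsed on its own; that part is not shown here because it needs MeCab installed.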
-
How to convert YYYYMM to YYYY-MM datetime format without day?
I have two datasets that have monthly frequencies. For one of them, df, I had to aggregate some data to turn it from daily to monthly using the following code:
df_grouped = df.groupby([df.index.to_period('M'), 'City ID']).agg({
    'Estimated Population': 'mean',
    'Estimated Population_2019': 'mean',
    'Confirmed Rate': ['mean', 'std'],
    'Death Rate': ['mean', 'std'],
    'New Confirmed': 'sum',
    'New Deaths': 'sum'})
df_grouped.index.rename(['Month','City ID'],inplace=True)
After these changes my dates are in the format YYYY-MM, for example:
2020-01
2020-02
...
2021-07
My other dataset, df2, has the date in the format YYYYMM, so I used the following code to convert it:
df2['DATE'] = pd.to_datetime(df2['DATE'],format='%Y%m')
My new dates are in the format YYYY-MM-DD, where every DD is 01, as follows:
2020-01-01
2020-02-01
...
2021-07-01
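As a sketch of two ways the day component could be dropped after parsing: a monthly Period (the same kind of value that to_period('M') produced for the first dataset) or a plain YYYY-MM string. The three DATE values and the column names month_period and month_text are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical YYYYMM values standing in for df2['DATE'].
df2 = pd.DataFrame({"DATE": [202001, 202002, 202107]})

parsed = pd.to_datetime(df2["DATE"].astype(str), format="%Y%m")

# Option 1: a monthly Period, matching what index.to_period('M') yields.
df2["month_period"] = parsed.dt.to_period("M")

# Option 2: a plain 'YYYY-MM' string, if only the display matters.
df2["month_text"] = parsed.dt.strftime("%Y-%m")

print(df2)
```

The Period form is probably the better fit if the two datasets are to be joined on month, since both sides would then hold the same type.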
How can I convert the df2 dates now from YYYY-MM-DD to YYYY-MM?
I was thinking, maybe there is a way to convert straight from YYYYMM to YYYY-MM?
-
convert month of dates into sequence
I want to number the months across years as one sequence. For example, I have a dataframe like this:
stuff_id  date
1         2015-02-03
2         2015-03-03
3         2015-05-19
4         2015-10-13
5         2016-01-07
6         2016-03-20
I want to sequence the months of the dates. The desired output is:
stuff_id  date        month
1         2015-02-03  1
2         2015-03-03  2
3         2015-05-19  4
4         2015-10-13  9
5         2016-01-07  12
6         2016-03-20  14
This means Feb 2015 is the first month in the date list, and Jan 2016 is the 12th month counting from Feb 2015.
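The numbering described above can be sketched by turning each date into an absolute month count (year * 12 + month) and shifting it so the earliest month becomes 1. The frame reproduces the six rows from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "stuff_id": [1, 2, 3, 4, 5, 6],
    "date": pd.to_datetime([
        "2015-02-03", "2015-03-03", "2015-05-19",
        "2015-10-13", "2016-01-07", "2016-03-20",
    ]),
})

# Absolute month count, so that year changes are handled by plain arithmetic.
month_index = df["date"].dt.year * 12 + df["date"].dt.month

# Shift so the earliest month in the column becomes 1.
df["month"] = month_index - month_index.min() + 1
print(df)
```

This assumes the sequence should start at the earliest date in the column; a fixed starting month could be substituted for month_index.min().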
-
PHP | Parse specific german date format to yyyy-mm-dd
Hey, I am struggling with a date format. I need to adjust the display of some dates in a WordPress project.
It's not a duplicate of this question; I tried the suggestion over there and it's not working with this specific date format.
What I have is a German date in the format D, d. MM yyyy, looking like this: Fr, 6. Mai 2022
I want to convert it to yyyy-mm-dd = 2022-05-06, but I can't get it to work. I have tried to use date_parse_from_format and date_create_from_format, but they seem to fail because of the German month and day names.
$date_german = 'Fr, 6. Mai 2022';
$date_english = 'Fri, 6. May 2022';
print_r(date_create_from_format('D, d. F Y', $date_german)); // doesn't work
print_r(date_create_from_format('D, d. F Y', $date_english)); // works
Another try with IntlDateFormatter
$date = 'Fr, 6. Mai 2022';
$formatter = new IntlDateFormatter("de_DE", IntlDateFormatter::SHORT, IntlDateFormatter::NONE);
$formatter->setPattern('D, d. F Y');
$unixtime = $formatter->parse($date);
$datetime = new DateTime();
$datetime->setTimestamp($unixtime);
echo $datetime->format('Y-m-d');
Not working either, it returns:
1970-01-01, because $unixtime is empty?
I also tried to setLocale to de_DE before formatting, but still the same problem.
-
separate datetime column in R while keeping time accurate
I have dates in the format 4/12/2016 12:00:00 AM and have tried to use separate() to create two columns in the data frame where the data is present. When I do, the columns are created but the AM/PM is lost, so the times just become numbers or, worse, appear as "12H 0M 0S". Can anyone help me out? I'm pretty new to data analysis as a whole, and it would be much appreciated!
-
A date loop problem and list remove problem on JupyterLab
Hello everyone, I encountered a date looping problem in JupyterLab; the problem is shown in the attached picture:
It is very strange: the red circle in B should show the same as the red circle in A. Why is week 6 missing?
And with "if d.weekday() in [5, 6]: dates.remove(d)", days 5 and 6 should be removed, so how can 4/3 and 4/10 still be there?
I have restarted the kernel and the result is the same. It's amazing...
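Without the picture the original loop can only be guessed at, but the symptom is consistent with removing items from a list while iterating over it. The sketch below reproduces that effect on 1-10 April 2022 (an assumed date range) and shows a list comprehension as one alternative.

```python
from datetime import date, timedelta

# 1-10 April 2022; the 2nd and 9th are Saturdays, the 3rd and 10th are Sundays.
start = date(2022, 4, 1)
all_days = [start + timedelta(days=i) for i in range(10)]

# Removing while iterating: after a Saturday is removed, the following Sunday
# slides into its slot and the loop steps past it.
buggy = list(all_days)
for d in buggy:
    if d.weekday() in [5, 6]:
        buggy.remove(d)

# Building a new list avoids mutating the list being iterated.
weekdays_only = [d for d in all_days if d.weekday() not in (5, 6)]

print([d.day for d in buggy])
print([d.day for d in weekdays_only])
```

In the first result the Sundays survive, matching the 4/3 and 4/10 entries mentioned above; the comprehension leaves only Monday to Friday.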
-
Output the Tuesday 6 weeks in the future in Python?
UPDATE: post edited to add answer to end of post
Core Question
Using Python, how do I output the date of the Tuesday that occurs 6 weeks after a certain date range?
Context
I work at a SaaS company in a customer facing role. Whenever I do an implementation for a client, the client receives a survey email on the Tuesday that occurs in the 6th week after our initial interaction.
To know which Tuesday to be extra nice on, we currently have to reference a chart that says if the interaction falls in date range x, then the client receives their survey solicitation on Tuesday y.
An example of this would be: if the interaction happened sometime within Apr. 18 - Apr. 22, then the survey goes out on May 31.
I would prefer for this to be done without having to hard code the date ranges and their corresponding Tuesdays into my program (just because I'm lazy and don't want to update the dates manually as the months go by), but I'm open to that solution if that's how it has to be. :)
Code Attempt
I can use datetime to output a particular date x weeks from today's date, but I'm not sure how to get from here to what I want to do.
import time
from datetime import datetime, timedelta

time1 = (time.strftime("%m/%d/%Y")) #current date
time2 = ((datetime.now() + timedelta(weeks=6)).strftime('%m/%d/%Y')) #current date + six weeks
print(time1)
print((datetime.now() + timedelta(weeks=6)).strftime('%m/%d/%Y'))
Disclaimer: I am a beginner and although I did search for an answer to this question before posting, I may not have known the right terms to use. If this is a duplicate question, I would be thrilled to be pointed in the right direction. :)
~~~UPDATED ANSWER~~~
Thanks to @Mandias for getting me on the right track. I was able to use week numbers to achieve my desired result.
from datetime import datetime, timedelta, date

today = date.today() #get today's date
todays_week = today.isocalendar()[1] #get the current week number based on today's date
survey_week = todays_week + 6 #add 6 weeks to the current week number
todays_year = int(today.strftime("%Y")) #get today's calendar year and turn it from a str to an int
survey_week_tuesday = date.fromisocalendar(todays_year, survey_week, 2) #(year, week, day of week) 2 is for Tuesday

print("Current Week Number:")
print(todays_week)
print("Current Week Number + 6 Weeks:")
print(todays_week + 6)
print("Today's Year:")
print(todays_year)
print("The Tuesday on the 6th week from the current week (i.e. survey tuesday):")
print(survey_week_tuesday.strftime('%m-%d-%Y')) #using strftime to format the survey date into MM-DD-YYYY format because that's what we use here even though DD-MM-YYYY makes more sense
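The week-number approach above can run past the last week of the year (date.fromisocalendar raises ValueError for a week number that does not exist), so here is a hedged alternative that stays in plain date arithmetic. survey_tuesday and weeks_ahead are hypothetical names, and the rule is assumed to be "Tuesday of the week six weeks after the interaction's Monday-based week".

```python
from datetime import date, timedelta


def survey_tuesday(interaction: date, weeks_ahead: int = 6) -> date:
    """Tuesday of the week that starts weeks_ahead weeks after the interaction's week."""
    # Monday of the interaction's week (weekday() is 0 for Monday).
    week_start = interaction - timedelta(days=interaction.weekday())
    # Jump ahead by whole weeks, then one more day to land on Tuesday.
    return week_start + timedelta(weeks=weeks_ahead, days=1)


print(survey_tuesday(date(2022, 4, 18)))   # a date from the Apr. 18 - Apr. 22 example
print(survey_tuesday(date(2022, 12, 12)))  # crosses into the next year
```

For any interaction date from Apr. 18 to Apr. 22, 2022 this gives May 31, matching the example in the question, and it keeps working across a year boundary.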
-
Update date format in jsonb object in postgreSQL
The Postgres table has one JSONB column with the format below:
{
  "steps": [
    {
      "step": "Building",
      "status": "Complete",
      "end_date": "03/08/2018",
      "start_date": "03/08/2018"
    },
    {
      "step": "Underground Mechanical",
      "status": "Not Applicable",
      "end_date": "04/25/2018",
      "start_date": ""
    },
    {
      "step": "Close Mechanical Permit",
      "status": "Complete",
      "end_date": "04/25/2018",
      "start_date": ""
    }
  ],
  "people": [
    {
      "name": "Energy Resource Center",
      "role": "Contractor",
      "phone": "(000) 444-4447"
    },
    {
      "name": "XXX YYY",
      "role": "Owner",
      "phone": ""
    }
  ],
  "status": "Closed",
  "address": "XXX str",
  "sub_type": "Residential",
  "issue_date": "03/08/2018",
  "description": ""
}
steps can have any number of objects. I need to update end_date, start_date and issue_date to epoch format. Can anyone please help with a script?
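The question asks for a database script; what follows is only a Python sketch of the transformation itself, applied to one document. It assumes the dates are MM/DD/YYYY, that "epoch" means seconds at midnight UTC, and that empty strings should stay empty. to_epoch and convert_dates are hypothetical helper names.

```python
import calendar
import json
import time


def to_epoch(text):
    """Convert 'MM/DD/YYYY' to epoch seconds at midnight UTC; leave '' untouched."""
    if not text:
        return text
    return calendar.timegm(time.strptime(text, "%m/%d/%Y"))


def convert_dates(doc):
    """Return the document with issue_date and each step's dates as epoch seconds."""
    doc["issue_date"] = to_epoch(doc.get("issue_date", ""))
    for step in doc.get("steps", []):
        for key in ("start_date", "end_date"):
            step[key] = to_epoch(step.get(key, ""))
    return doc


# A shortened version of the document above.
raw = """{"steps": [
  {"step": "Building", "end_date": "03/08/2018", "start_date": "03/08/2018"},
  {"step": "Underground Mechanical", "end_date": "04/25/2018", "start_date": ""}],
  "issue_date": "03/08/2018"}"""
converted = convert_dates(json.loads(raw))
print(json.dumps(converted, indent=2))
```

The same per-element logic would have to be expressed in SQL, or run from a client program that reads and writes the column, to update the table in place.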
-
Postgres Epoch timestamp data type number of digits
I am creating an epoch timestamp, which is said to be in double precision format.
I saved this to my local DB and am using it to sync items to clients.
I have now run into situations where the timestamp only shows 5 digits after the decimal point, e.g. 1645604141.22873.
According to the Postgres documentation ("The to_timestamp function can also take a single double precision argument to convert from Unix epoch to timestamp with time zone."), double precision has 15 digits:
double precision 8 bytes variable-precision, inexact 15 decimal digits precision
This would mean that the format above would be correct,
BUT I am seeing the epoch timestamp in my table with 16 digits, like this:
1645604141.228735
Does anyone have a good explanation of how the epoch is really stored, and how it can guarantee my 6 digits after the decimal point (microseconds)?
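A Python float is an IEEE 754 double, the same 8-byte format as double precision, so the spacing between representable values near this timestamp can be inspected directly. The sketch below is only an illustration of the number format, not of how Postgres prints values; math.ulp needs Python 3.9 or later.

```python
import math

epoch = 1645604141.228735  # the 16-digit value from the table

# Spacing between neighbouring doubles at this magnitude.
gap = math.ulp(epoch)
print(gap)          # well below one microsecond
print(repr(epoch))  # shortest text that reads back as the same double

# Microsecond steps are still distinguishable at this magnitude.
next_microsecond = epoch + 1e-6
print(next_microsecond != epoch)
```

The "15 decimal digits" figure is a guaranteed minimum; a double carries between 15 and 17 significant digits depending on the value, which is consistent with sometimes seeing 15 and sometimes 16 digits for the same kind of timestamp.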