iat returns too much info
I'm trying to create a program that looks for a cell containing a specific value ("A1", for example). When it finds it, it takes a value from a cell in the same row and another value from a cell in a different sheet.
My problem is that when I try to use iat[] it returns the value that I want, but with some extra data, and I don't know where that comes from.
import re
import pandas as pd
import csv
import sys

rows = pd.read_excel(r'DSF test.xls', sheet_name='Raw Data', usecols="A")
lr = rows.index[-1] + 2

# this is what value you are looking for in the data
for m in ("A"):
    for n in range(1, 5):
        well = m + str(n)  # this creates A1 for example
        for i in range(0, 110):
            dfwell = pd.read_excel(r'DSF test.xls', sheet_name='Raw Data', header=7, usecols="A")
            vrst = dfwell.loc[i].to_string(index=False)
            if vrst == well:
                dfreading = pd.read_excel(r'DSF test.xls', sheet_name='Raw Data', header=7, usecols="F")
                csvreading = dfreading.loc[i].to_string(index=False)
                c = 4
                dftemp = pd.read_excel(r'DSF test.xls', sheet_name='Melt Region Temperature Data', header=7)
                csvtemp = dftemp.iat[i, c]
                c += 1
                print(csvtemp)
                csvdata = csvtemp + "," + csvreading
                filename = well + ".csv"
                with open(filename, "a", newline="") as f:
                    thewriter = csv.writer(f)
                    thewriter.writerow([csvdata])
                    f.close()
Also, the program is very slow to run, but I'm trying to make it do what I want first; I'll optimize it later.
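On the slowness: every pd.read_excel call re-parses the whole workbook, and the loops above call it hundreds of times. A minimal sketch of reading each sheet a single time outside the loops (the column positions mirror the original: column A is position 0, column F is position 5, and the temperature column stays at position 4); this is an illustration, not the original code:

import pandas as pd

# parse each sheet once instead of on every loop iteration
raw = pd.read_excel(r'DSF test.xls', sheet_name='Raw Data', header=7)
melt = pd.read_excel(r'DSF test.xls', sheet_name='Melt Region Temperature Data', header=7)

wells = [f"A{n}" for n in range(1, 5)]  # A1..A4
for i in range(len(raw)):
    well = str(raw.iloc[i, 0])    # column A
    if well in wells:
        reading = raw.iloc[i, 5]  # column F
        temp = melt.iat[i, 4]     # the same cell the original reads with iat
        print(temp, reading)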
See also questions close to this topic
-
Python File Tagging System does not retrieve nested dictionaries in dictionary
I am building a file tagging system using Python. The idea is simple. Given a directory of files (and files within subdirectories), I want to filter them out using a filter input and tag those files with a word or a phrase.
If I got the following contents in my current directory:
data/
    budget.xls
    world_building_budget.txt
a.txt
b.exe
hello_world.dat
world_builder.spec
and I execute the following command in the shell:
py -3 tag_tool.py -filter=world -tag="World-Building Tool"
My output will be:
These files were tagged with "World-Building Tool":
data/world_building_budget.txt
hello_world.dat
world_builder.spec
My current output isn't exactly like this but basically, I am converting all files and files within subdirectories into a single dictionary like this:
def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree
Right now, my dictionary looks like this:

key: ''

In the following function, I am turning the empty values ('') into empty lists (to hold my tags):

def empty_str_to_list(d):
    for k, v in d.items():
        if v == '':
            d[k] = []
        elif isinstance(v, dict):
            empty_str_to_list(v)
When I run my entire code, this is my output:
hello_world.dat ['World-Building Tool']
world_builder.spec ['World-Building Tool']
But it does not see data/world_building_budget.txt. This is the full dictionary:

{'data': {'world_building_budget.txt': []}, 'a.txt': [], 'hello_world.dat': [], 'b.exe': [], 'world_builder.spec': []}
This is my full code:
import os, argparse

def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree

def empty_str_to_list(d):
    for k, v in d.items():
        if v == '':
            d[k] = []
        elif isinstance(v, dict):
            empty_str_to_list(v)

parser = argparse.ArgumentParser(description="Just an example",
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument("--filter", action="store", help="keyword to filter files")
parser.add_argument("--tag", action="store", help="a tag phrase to attach to a file")
parser.add_argument("--get_tagged", action="store", help="retrieve files matching an existing tag")
args = parser.parse_args()

filter = args.filter
tag = args.tag
get_tagged = args.get_tagged

current_dir = os.getcwd()
files_dict = fs_tree_to_dict(current_dir)
empty_str_to_list(files_dict)

for k, v in files_dict.items():
    if filter in k:
        if v == []:
            v.append(tag)
            print(k, v)
    elif isinstance(v, dict):
        empty_str_to_list(v)
        if get_tagged in v:
            print(k, v)
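For reference, a minimal sketch of a tagging pass that also descends into nested dictionaries (the function name tag_matching and the "/" path separator are hypothetical choices, not from the question):

def tag_matching(d, keyword, tag, prefix=""):
    # walk the (possibly nested) dict; tag file entries whose name matches
    for k, v in d.items():
        path = prefix + k
        if isinstance(v, dict):
            tag_matching(v, keyword, tag, prefix=path + "/")  # recurse into subdirectory
        elif keyword in k:
            v.append(tag)
            print(path, v)

Called as tag_matching(files_dict, filter, tag), this would also reach data/world_building_budget.txt.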
-
Actually, I am working on a project and it is showing "no module named pip_internal". Please help me with this. I am using PyCharm (conda interpreter).
File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\Scripts\pip.exe\__main__.py", line 4, in <module> File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\site-packages\pip\_internal\__init__.py", line 4, in <module> from pip_internal.utils import _log
-
Looping the function if the input is not a string
I'm new to Python (first of all). I have a homework assignment to write a function that checks whether an item exists in a dictionary.
inventory = {"apple" : 50, "orange" : 50, "pineapple" : 70, "strawberry" : 30} def check_item(): x = input("Enter the fruit's name: ") if not x.isalpha(): print("Error! You need to type the name of the fruit") elif x in inventory: print("Fruit found:", x) print("Inventory available:", inventory[x],"KG") else: print("Fruit not found") check_item()
I want the function to loop again only if the input is not a string. I've tried putting return under print("Error! You need to type the name of the fruit"), but that didn't work. Help!
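A minimal sketch of one way to re-prompt until the input is alphabetic, reusing the inventory dict from the question:

inventory = {"apple": 50, "orange": 50, "pineapple": 70, "strawberry": 30}

def check_item():
    while True:
        x = input("Enter the fruit's name: ")
        if x.isalpha():
            break  # valid name: stop re-prompting
        print("Error! You need to type the name of the fruit")
    if x in inventory:
        print("Fruit found:", x)
        print("Inventory available:", inventory[x], "KG")
    else:
        print("Fruit not found")

check_item()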
-
How do I disable the Debian Python path/recursion limit?
So, as of late, I've been having path-length-limit and recursion-limit issues, and I really need to know how to disable these.
I can't even install modules like discord.py!!!!
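For the recursion side specifically, the limit is a CPython setting rather than anything Debian-specific; a minimal sketch of inspecting and raising it:

import sys

print(sys.getrecursionlimit())  # CPython's default is 1000
sys.setrecursionlimit(10_000)   # raising it trades RecursionError for a possible interpreter crash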
-
TypeError: 'float' object cannot be interpreted as an integer on linspace
TypeError                                 Traceback (most recent call last)
d:\website\SpeechProcessForMachineLearning-master\SpeechProcessForMachineLearning-master\speech_process.ipynb Cell 15' in <cell line: 1>()
----> 1 plot_freq(signal, sample_rate)

d:\website\SpeechProcessForMachineLearning-master\SpeechProcessForMachineLearning-master\speech_process.ipynb Cell 10' in plot_freq(signal, sample_rate, fft_size)
      2 def plot_freq(signal, sample_rate, fft_size=512):
      3     xf = np.fft.rfft(signal, fft_size) / fft_size
----> 4     freq = np.linspace(0, sample_rate/2, fft_size/2 + 1)
      5     xfp = 20 * np.log10(np.clip(np.abs(xf), 1e-20, 1e100))
      6     plt.figure(figsize=(20, 5))

File <__array_function__ internals>:5, in linspace(*args, **kwargs)

File ~\AppData\Local\Programs\Python\Python39\lib\site-packages\numpy\core\function_base.py:120, in linspace(start, stop, num, endpoint, retstep, dtype, axis)
     23 @array_function_dispatch(_linspace_dispatcher)
     24 def linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None,
     25              axis=0):
     26     """
     27     Return evenly spaced numbers over a specified interval.
    (...)
    118
    119     """
--> 120     num = operator.index(num)
    121     if num < 0:
    122         raise ValueError("Number of samples, %s, must be non-negative." % num)

TypeError: 'float' object cannot be interpreted as an integer
What is the solution to this problem?
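np.linspace expects an integer sample count, and fft_size/2 + 1 is a float under Python 3 division. A minimal sketch of the usual fix, using floor division (the sample_rate value here is a placeholder):

import numpy as np

fft_size = 512
sample_rate = 16000  # placeholder value for illustration

# // keeps num an int, which operator.index() requires
freq = np.linspace(0, sample_rate / 2, fft_size // 2 + 1)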
-
IndexError: list index out of range with api
all_currencies = currency_api('latest', 'currencies')  # {'eur': 'Euro', 'usd': 'United States dollar', ...}
all_currencies.pop('brl')

qtd_moedas = len(all_currencies)
texto = f'{qtd_moedas} Moedas encontradas\n\n'

moedas_importantes = ['usd', 'eur', 'gbp', 'chf', 'jpy', 'rub', 'aud', 'cad', 'ars']

while len(moedas_importantes) != 0:
    for codigo, moeda in all_currencies.items():
        if codigo == moedas_importantes[0]:
            cotacao, data = currency_api('latest', f'currencies/{codigo}/brl')['brl'], currency_api('latest', f'currencies/{codigo}/brl')['date']
            texto += f'{moeda} ({codigo.upper()}) = R$ {cotacao} [{data}]\n'
            moedas_importantes.remove(codigo)
        if len(moedas_importantes) == 0:
            break  # WITHOUT THIS LINE, GIVES ERROR
Why am I getting this error? The list actually runs out of elements, but the code only works with that if.
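After the last removal, the next iteration still evaluates moedas_importantes[0] on an empty list, hence the IndexError. A minimal sketch of a restructure that never indexes the list (it assumes the currency_api function from the question):

# iterate over the wanted codes directly instead of indexing position 0
for codigo in moedas_importantes:
    if codigo in all_currencies:
        resp = currency_api('latest', f'currencies/{codigo}/brl')  # one request per currency
        texto += f'{all_currencies[codigo]} ({codigo.upper()}) = R$ {resp["brl"]} [{resp["date"]}]\n'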
-
ValueError: All arrays must be of the same length when scraping
I try to input different zip codes and scrape information for Target products. However, it results in this error: ValueError: All arrays must be of the same length, and there is nothing in my CSV file. I guess this is because I did not successfully scrape all the information. Can anyone give me some suggestions on how to improve the code? I appreciate any help. Thanks.
Following is my code:
import concurrent.futures
from datetime import datetime
from time import sleep

import pandas as pd
import pytz
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Target Url list
urlList = [
    'https://www.target.com/p/pataday-once-daily-relief-extra-strength-drops-0-085-fl-oz/-/A-83775159?preselect=81887758#lnk=sametab',
    'https://www.target.com/p/kleenex-ultra-soft-facial-tissue/-/A-84780536?preselect=12964744#lnk=sametab',
    'https://www.target.com/p/claritin-24-hour-non-drowsy-allergy-relief-tablets-loratadine/-/A-80354268?preselect=14351285#lnk=sametab',
    'https://www.target.com/p/opti-free-pure-moist-rewetting-drops-0-4-fl-oz/-/A-14358641#lnk=sametab',
    'https://www.target.com/p/allegra-24-hour-allergy-relief-tablets-fexofenadine-hydrochloride/-/A-15068699?preselect=14042732#lnk=sametab',
    'https://www.target.com/p/nasacort-allergy-relief-spray-triamcinolone-acetonide/-/A-15143450?preselect=15503329#lnk=sametab',
    'https://www.target.com/p/genexa-dextromethorphan-kids-39-cough-and-chest-congestion-suppressant-4-fl-oz/-/A-80130848#lnk=sametab',
    'https://www.target.com/p/zyrtec-24-hour-allergy-relief-tablets-cetirizine-hcl/-/A-15075280?preselect=79847258#lnk=sametab',
    'https://www.target.com/p/pataday-twice-daily-eye-allergy-itch-and-redness-relief-drops-0-17-fl-oz/-/A-78780978#lnk=sametab',
    'https://www.target.com/p/systane-gel-drops-lubricant-eye-gel-0-33-fl-oz/-/A-14523072#lnk=sametab']
zipCodeList = [3911,4075,4467,96970,96960,49220,49221,49224,48001,49227,48101,48002,48003,48004]

while(True):
    priceArray = []
    nameArray = []
    zipCodeArray = []
    GMTArray = []
    TCIN = []
    UPC = []

    def ScrapingTarget(url):
        wait_imp = 10
        CO = webdriver.ChromeOptions()
        CO.add_experimental_option('useAutomationExtension', False)
        CO.add_argument('--ignore-certificate-errors')
        CO.add_argument('--start-maximized')
        wd = webdriver.Chrome(r'D:\chromedriver\chromedriver_win32new\chromedriver_win32 (2)\chromedriver.exe', options=CO)
        wd.get(url)
        wd.implicitly_wait(wait_imp)

        # needed to click onto the "Show more" to get the tcin and upc
        xpath = '//*[@id="tabContent-tab-Details"]/div/button'
        element_present = EC.presence_of_element_located((By.XPATH, xpath))
        WebDriverWait(wd, 5).until(element_present)
        showMore = wd.find_element(by=By.XPATH, value=xpath)
        sleep(3)
        showMore.click()
        # showMore = wd.find_element(by=By.XPATH, value="//*[@id='tabContent-tab-Details']/div/button")
        # sleep(2)
        # showMore.click()

        soup = BeautifulSoup(wd.page_source, 'html.parser')
        try:
            # gets a list of all elements under "Specifications"
            div = soup.find("div", {"class": "styles__StyledCol-sc-ct8kx6-0 iKGdHS h-padding-h-tight"})
            list = div.find_all("div")
            for a in range(len(list)):
                list[a] = list[a].text
            # locates the elements in the list
            tcin = [v for v in list if v.startswith("TCIN")]
            upc = [v for v in list if v.startswith("UPC")]
        except:
            tcin = "Error"
            upc = "Error"
        TCIN.append(tcin)
        UPC.append(upc)

        for zipcode in zipCodeList:
            try:
                # click the delivery address
                address = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[4]/div/div[1]/button[2]")
                address.click()
                # click the Edit location
                editLocation = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[4]/div/div[2]/button")
                editLocation.click()
            except:
                # directly click the Edit location
                editLocation = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[4]/div[1]/div/div[1]/button")
                editLocation.click()

            # input ZipCode
            inputZipCode = wd.find_element(by=By.XPATH, value="//*[@id='enter-zip-or-city-state']")
            inputZipCode.clear()
            inputZipCode.send_keys(zipcode)

            # click submit
            clickSubmit = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[4]/div/div[2]/div/div/div[3]/div/button[1]")
            clickSubmit.click()

            # start scraping
            name = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[1]/h1/span").text
            nameArray.append(name)
            price = wd.find_element(by=By.XPATH, value="//*[@id='pageBodyContainer']/div[1]/div[2]/div[2]/div/div[1]/div[1]/span").text
            priceArray.append(price)
            currentZipCode = zipcode
            zipCodeArray.append(currentZipCode)
            tz = pytz.timezone('Europe/London')
            GMT = datetime.now(tz)
            GMTArray.append(GMT)

    with concurrent.futures.ThreadPoolExecutor() as executor:
        executor.map(ScrapingTarget, urlList)

    data = {'prod-name': nameArray,
            'Price': priceArray,
            'currentZipCode': zipCodeArray,
            "Tcin": TCIN,
            "UPC": UPC,
            "GMT": GMTArray}
    df = pd.DataFrame(data, columns=['prod-name', 'Price', 'currentZipCode', "Tcin", "UPC", "GMT"])
    df.to_csv(r'C:\Users\12987\PycharmProjects\python\Network\priceingAlgoriCoding\export_Target_dataframe.csv', mode='a', index=False, header=True)
    sleep(20)
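The length mismatch comes from TCIN/UPC growing once per URL while nameArray, priceArray, zipCodeArray, and GMTArray grow once per URL-zip-code pair. A minimal sketch of collecting one complete record per scrape so every column stays aligned (the record() helper and the sample values are hypothetical):

import pandas as pd

rows = []  # one dict per (url, zipcode) scrape

def record(name, price, zipcode, tcin, upc, gmt):
    # appending whole rows keeps all columns the same length by construction
    rows.append({'prod-name': name, 'Price': price, 'currentZipCode': zipcode,
                 'Tcin': tcin, 'UPC': upc, 'GMT': gmt})

record('Pataday Once Daily Relief', '$16.99', 3911, 'TCIN: 83775159', 'UPC: 00300650192064', '2022-05-06 12:00')
df = pd.DataFrame(rows)  # "All arrays must be of the same length" can no longer occur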
-
Python: I have a list of dictionaries, but when I try to access a specific key and value it throws an error
def readParksFile(fileName="national_parks(1).csv"):
    nationalParks = open(fileName)
    headers = nationalParks.readline()
    keys = headers.split(",")
    numKeys = len(keys)
    parksList = []
    values = nationalParks.readlines()
    rowsList = []
    parksDictionary = {}
    for row in values:
        rowsList.append(row.split(","))
    for item in rowsList:
        parksDictionary = {}
        for i in range(numKeys):
            parksDictionary[keys[i]] = item[i]
        parksList.append(parksDictionary)
    for i in range(len(parksList)):
        return(parksList[i])
    nationalParks.close()
I created a list of dictionaries using the code above
def printParksInState(parksList):
    state = getStateAbbr()
    for parksDictionary in parksList:
        if state in parksDictionary["State"]:
            print(parksDictionary["Name"] + " (" + parksDictionary["Code"] + ")")
            print("\t" + "Location:" + parksDictionary["State"])
            print("\t" + "Area:" + parksDictionary["Acres"] + " acres")
            print("\t" + "Date Established:" + tasks.convertDate(parksDictionary["Date"]))
        else:
            print("There are no national parks in " + state + " or it is not a valid state")
I have functions that use the list of dictionaries to print information:
def main(): print("National Parks") parksList = tasks.readParksFile() menuDict = interface.getMenuDict() choice = ("") while choice != "Q": print(interface.displayMenu(menuDict)) choice = interface.getUserChoice(menuDict) if choice == "A": interface.printAllParks(parksList) elif choice == "B": interface.printParksInState(parksList) elif choice == "C": interface.printLargestPark(parksList) elif choice == "D": interface.printParksForSearch(parksList) else: print("This is not an option")
In my main function I call the other functions that use the list of dictionaries.
However, no matter which function I call, it throws the error that string indices must be integers, and I am not sure what this error means or how to fix it. Please help!
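One likely culprit: readParksFile has its return inside a loop, so it returns parksList[0] (a single dict) instead of the list; iterating over that dict yields key strings, and indexing a string with "State" raises "string indices must be integers". A minimal sketch of the reader returning the whole list (a guess at the intent, not the assignment's required solution):

def readParksFile(fileName="national_parks(1).csv"):
    with open(fileName) as nationalParks:  # the with-block also closes the file
        keys = nationalParks.readline().split(",")
        parksList = []
        for row in nationalParks:
            parksList.append(dict(zip(keys, row.split(","))))  # one dict per park
    return parksList  # return the list itself, after the loop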
-
R Studio keeps crashing when I'm trying to merge multiple csv files into a data frame. How do I fix this?
I have 12 csv files that I need to merge for an analysis project, and their sizes range from 20 MB to 120 MB per file.
I attempted cutting down to only using the necessary columns by using fread() so it reads 6 columns instead of the total 11.
I've assigned each of them into a data frame as shown below.
However, at some point while doing this manually, especially when using View() on the data frame that contains the 12 csv files' data, RStudio keeps crashing, probably due to memory usage; the whole environment resets and I have to do everything over again.
Is there a shorter and less ugly way to do this without crashing?
Packages <- c("dplyr", "janitor", "skimr", "readr", "lubridate","tidyverse","tidyr") lapply(Packages, library, character.only = TRUE) library("data.table") td2105 <- fread("/cloud/project/Capstone Cyclistic Project/202105-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name","end_station_name","member_casual")) td2106 <- fread("/cloud/project/Capstone Cyclistic Project/202106-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name","end_station_name","member_casual")) td2107 <- fread("/cloud/project/Capstone Cyclistic Project/202107-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name","end_station_name","member_casual")) td2108 <- fread("/cloud/project/Capstone Cyclistic Project/202108-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name","end_station_name","member_casual")) td2109 <- fread("/cloud/project/Capstone Cyclistic Project/202109-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name","end_station_name","member_casual")) td2110 <- fread("/cloud/project/Capstone Cyclistic Project/202110-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name","end_station_name","member_casual")) td2111 <- fread("/cloud/project/Capstone Cyclistic Project/202111-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name","end_station_name","member_casual")) td2112 <- fread("/cloud/project/Capstone Cyclistic Project/202112-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name","end_station_name","member_casual")) td2201 <- fread("/cloud/project/Capstone Cyclistic Project/202201-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name","end_station_name","member_casual")) td2202 <- fread("/cloud/project/Capstone Cyclistic Project/202202-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name","end_station_name","member_casual")) td2203 <- fread("/cloud/project/Capstone Cyclistic Project/202203-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name","end_station_name","member_casual")) td2204 <- fread("/cloud/project/Capstone Cyclistic Project/202204-divvy-tripdata.csv", select = c("rideable_type", "started_at", "ended_at", "start_station_name","end_station_name","member_casual")) td_2105_to_2204 <- rbind(td2105,td2106,td2107,td2108,td2109,td2110,td2111,td2112,td2201,td2202,td2203,td2204) View(td_2105_to_2204)
-
How to rename columns after looping through directory in pandas?
I have a directory
directory = '//data-share/jobs/Escalations'
This directory has multiple files called
1.xls
2.xls
3.xls
and so on. More could be added in the future with the same data formats, so ideally I'd want the script to loop through the OS directory.
The issue is that the column names within these files are not the same even though they mean the same thing.
for example
1.xls has the column 'billing address' in 3 sheets
2.xls has the column 'bill-to address' in 3 sheets
3.xls has the column 'billed address' in 3 sheets
I just want to rename all these columns from all sheets to 'billing address' in a new df. There are other columns too, like 'Zip', 'Billing zip', and 'Bill-To Zip Code'; I want them all renamed 'Zip'. Likewise 'Billing city', 'Bill-To City', and 'City' should all become 'City'. So the new df will have all the renamed columns in one unified dataframe. My final df should look like the below for each xls file's df, because my final objective is to concatenate everything under one df, and in order to do this, all column names need to match.
zip | billing address | city
Note that the number of columns and the order of the columns are not the same across the files.
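A minimal sketch of one way to do this with a shared rename map (the synonym lists and the file-matching rule are assumptions for illustration):

import os
import pandas as pd

RENAME_MAP = {
    'bill-to address': 'billing address', 'billed address': 'billing address',
    'Billing zip': 'Zip', 'Bill-To Zip Code': 'Zip',
    'Billing city': 'City', 'Bill-To City': 'City',
}

directory = '//data-share/jobs/Escalations'
frames = []
for name in os.listdir(directory):
    if not name.endswith('.xls'):
        continue
    sheets = pd.read_excel(os.path.join(directory, name), sheet_name=None)  # dict of all sheets
    for sheet in sheets.values():
        frames.append(sheet.rename(columns=RENAME_MAP))

combined = pd.concat(frames, ignore_index=True)  # column names now line up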
-
Loading data into df after finding blank line
What is the best way to find the first blank line in a file when the input file is sometimes a .csv and sometimes an .xls? The blank line is guaranteed, but it appears at a random row when reading the file. The input file has a certain number of rows at the top before the data, and this count varies by a line or two, so I can't just skip the first 4, 5, or 6 rows. My goal here is to read the data beyond that point into a DataFrame, skipping those first rows; the line right after the first blank line is where I will start reading the data into the df. So something that just skips this variable number of rows is what I am missing. I have a small function that identifies the file type: if it returns true the file is an xls file, and if false the file is a CSV file. In my example file below, the first blank row is at row 7.

1: CSV

import pandas as pd

file = 'input_file.csv'
f = open(file)
while f.readline() not in ('\n'):
    pass
final_df = pd.read_csv(f, header=None)

This reads forever and I have to interrupt execution for the program to quit. A key point: when running f.readline() and looking at the output line by line, I notice the file passes over the blank line because it is not '\n' as expected. Instead it's always something like ',,,,,,,,,,\n', with no consistency across my many csv files. How can I identify this as a blank line without always tweaking the code to account for a new number of commas in the first blank row of the CSV file?
Example file:

.report
random info
more info
Project number    111111
Order number
Plates    Plate1    Plate2    Plate3

DNA \ Assay    id1    id2    id3
Name1          C:C    G:G    T:C
Name2          C:C    G:G    C:C
Name3          C:C    G:G    T:C

Current output for the readline function that is looking for the newline, at the newline:

',,,,,,,,,,\n'
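A minimal sketch of a blank-line test that works regardless of how many commas the row contains (the filename input_file.csv is taken from the question):

import pandas as pd

def is_blank(line):
    # a "blank" CSV row is empty, whitespace, or nothing but commas
    return all(field.strip() == "" for field in line.split(","))

with open("input_file.csv") as f:
    for line in f:
        if is_blank(line):
            break  # f now points at the line after the first blank row
    final_df = pd.read_csv(f, header=None)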
Expected output for final_df:

DNA \ Assay    id1    id2    id3
Name1          C:C    G:G    T:C
Name2          C:C    G:G    C:C
Name3          C:C    G:G    T:C

2: XLS
When the files are in the xls file format, they appear the exact same as my example file used above. The example file provides the data exactly as needed for this question, no changes needed.
My idea for reading the files when they are input as an xls file is:

import tempfile

import pandas as pd

df = pd.read_excel(file)
f = tempfile.NamedTemporaryFile()
df.to_csv(f)
f.seek(0)
line = str(f.readline()).strip()
and the current output after a print(line) returns:

b',report,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26,Unnamed: 27,Unnamed: 28,Unnamed: 29,Unnamed: 30,Unnamed: 31,Unnamed: 32,Unnamed: 33,Unnamed: 34,Unnamed: 35,Unnamed: 36,Unnamed: 37,Unnamed: 38,Unnamed: 39,Unnamed: 40,Unnamed: 41,Unnamed: 42,Unnamed: 43,Unnamed: 44,Unnamed: 45,Unnamed: 46\n'
I'd rather not continue reading the file this way if there is another way to find the first blank line with pd.read_excel(line). The expected output is the same as listed above in final_df. I would ideally use something like final_df = pd.read_csv(line) to produce the final_df, but that does not work.

Expected output:

DNA \ Assay    id1    id2    id3
Name1          C:C    G:G    T:C
Name2          C:C    G:G    C:C
Name3          C:C    G:G    T:C
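For the xls branch, a minimal sketch that skips the temp-file round trip by finding the first all-empty row with pandas (the header handling is an assumption about the file layout, and it presumes a blank row exists):

import pandas as pd

raw = pd.read_excel("input_file.xls", header=None)

# position of the first row whose cells are all NaN (the blank separator row)
blank = raw.isna().all(axis=1).idxmax()

data = raw.iloc[blank + 1:].copy()
data.columns = data.iloc[0]                      # row after the blank line is the header
final_df = data.iloc[1:].reset_index(drop=True)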