Using requests in Python to download an xls file
On this page you will find a link to download an xls file (under "Attachment" / "Adjuntos"): https://www.banrep.gov.co/es/emisiones-vigentes-el-dcv
The link to download the xls file is: https://www.banrep.gov.co/sites/default/files/paginas/emisiones/EMISIONES.xls
I was using this code to automatically download that file:
import requests
import os

path = os.path.abspath(os.getcwd())  # directory where the file will be downloaded
path = path.replace("\\", '/') + '/'
url = 'https://www.banrep.gov.co/sites/default/files/paginas/emisiones/EMISIONES.xls'
myfile = requests.get(url, verify=False)
open(path + 'EMISIONES.xls', 'wb').write(myfile.content)
This code was working well, but suddenly the downloaded file started coming out corrupted.
If I run the code, it raises this warning:
InsecureRequestWarning: Unverified HTTPS request is being made to host 'www.banrep.gov.co'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
warnings.warn(
1 answer
-
answered 2022-01-23 03:05
Cristian Quintero
The error is related to how your request is being built. The status code returned by the request is 403 [Forbidden]. You can see it by typing:
myfile.status_code
I guess the security issue is related to the cookies and headers in your GET request, so I suggest you look at how the web page builds the headers of its own requests before you send yours to that URL.
TIP: start your web browser in developer mode and, using the Network tab, try to identify the headers the page sends.
To solve the cookie issue, look at how to pick up cookies naturally by first visiting a previous page on www.banrep.gov.co, using requests.Session:
session_ = requests.Session()
Before coding, you could test your requests with Postman or another REST API testing tool.
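As a minimal sketch of the approach described above (the header values are illustrative assumptions, not values confirmed for this site; copy the real ones your browser sends from the Network tab):

import requests

url = "https://www.banrep.gov.co/sites/default/files/paginas/emisiones/EMISIONES.xls"

# Illustrative browser-like headers (assumptions).
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Accept": "*/*",
}

with requests.Session() as session:
    session.headers.update(headers)
    # Visit the page that links to the file first, so the session
    # collects whatever cookies the site expects.
    session.get("https://www.banrep.gov.co/es/emisiones-vigentes-el-dcv")
    response = session.get(url)
    response.raise_for_status()  # surface the 403 instead of saving an error page
    with open("EMISIONES.xls", "wb") as f:
        f.write(response.content)

Note that verify=False only silences certificate checking and is unrelated to the 403; the headers and cookies are what matter here.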
See also questions close to this topic
-
Python File Tagging System does not retrieve nested dictionaries in dictionary
I am building a file tagging system using Python. The idea is simple. Given a directory of files (and files within subdirectories), I want to filter them out using a filter input and tag those files with a word or a phrase.
If I have the following contents in my current directory:
data/
    budget.xls
    world_building_budget.txt
a.txt
b.exe
hello_world.dat
world_builder.spec
and I execute the following command in the shell:
py -3 tag_tool.py -filter=world -tag="World-Building Tool"
My output will be:
These files were tagged with "World-Building Tool":
data/world_building_budget.txt
hello_world.dat
world_builder.spec
My current output isn't exactly like this but basically, I am converting all files and files within subdirectories into a single dictionary like this:
def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree
Right now, my dictionary looks like this (every key maps to ''):

{key: ''}

In the following function, I am turning the empty values ('') into empty lists (to hold my tags):

def empty_str_to_list(d):
    for k, v in d.items():
        if v == '':
            d[k] = []
        elif isinstance(v, dict):
            empty_str_to_list(v)
When I run my entire code, this is my output:
hello_world.dat ['World-Building Tool']
world_builder.spec ['World-Building Tool']
But it does not see data/world_building_budget.txt. This is the full dictionary:

{'data': {'world_building_budget.txt': []}, 'a.txt': [], 'hello_world.dat': [], 'b.exe': [], 'world_builder.spec': []}
This is my full code:
import os, argparse

def fs_tree_to_dict(path_):
    file_token = ''
    for root, dirs, files in os.walk(path_):
        tree = {d: fs_tree_to_dict(os.path.join(root, d)) for d in dirs}
        tree.update({f: file_token for f in files})
        return tree

def empty_str_to_list(d):
    for k, v in d.items():
        if v == '':
            d[k] = []
        elif isinstance(v, dict):
            empty_str_to_list(v)

parser = argparse.ArgumentParser(description="Just an example",
                                 formatter_class=argparse.ArgumentDefaultsHelpFormatter)
parser.add_argument("--filter", action="store", help="keyword to filter files")
parser.add_argument("--tag", action="store", help="a tag phrase to attach to a file")
parser.add_argument("--get_tagged", action="store", help="retrieve files matching an existing tag")
args = parser.parse_args()

filter = args.filter
tag = args.tag
get_tagged = args.get_tagged

current_dir = os.getcwd()
files_dict = fs_tree_to_dict(current_dir)
empty_str_to_list(files_dict)

for k, v in files_dict.items():
    if filter in k:
        if v == []:
            v.append(tag)
            print(k, v)
        elif isinstance(v, dict):
            empty_str_to_list(v)
            if get_tagged in v:
                print(k, v)
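The final loop above only visits the top level of the dictionary, which is why nested entries such as data/world_building_budget.txt are never tagged. A hedged sketch of one possible fix (a recursive walk that tracks the path prefix; the helper name tag_files is my own, hypothetical):

def tag_files(tree, keyword, tag_phrase, prefix=""):
    # Recursively visit every entry: dict values are subdirectories,
    # list values are the tag lists attached to files.
    for name, value in tree.items():
        path = prefix + name
        if isinstance(value, dict):
            tag_files(value, keyword, tag_phrase, path + "/")
        elif keyword in name:
            value.append(tag_phrase)
            print(path, value)

# usage, replacing the final loop:
# tag_files(files_dict, filter, tag)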
-
Actually, I am working on a project and it is showing "no module named pip_internal". Please help me with this. I am using PyCharm (conda interpreter).
File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\Scripts\pip.exe\__main__.py", line 4, in <module> File "C:\Users\pjain\AppData\Local\Programs\Python\Python310\lib\site-packages\pip\_internal\__init__.py", line 4, in <module> from pip_internal.utils import _log
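The traceback shows pip trying to import pip_internal instead of pip._internal, which suggests the pip installation itself is broken. As a hedged suggestion (not a confirmed fix for this machine), reinstalling pip with the bundled ensurepip module is one thing to try from a terminal:

python -m ensurepip --upgrade
python -m pip install --upgrade --force-reinstall pip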
-
Looping the function if the input is not string
I'm new to Python (first of all). I have a homework assignment: write a function that checks whether an item exists in a dictionary or not.
inventory = {"apple" : 50, "orange" : 50, "pineapple" : 70, "strawberry" : 30} def check_item(): x = input("Enter the fruit's name: ") if not x.isalpha(): print("Error! You need to type the name of the fruit") elif x in inventory: print("Fruit found:", x) print("Inventory available:", inventory[x],"KG") else: print("Fruit not found") check_item()
I want the function to loop again only if the input is not a string. I've tried typing return under print("Error! You need to type the name of the fruit"), but that didn't work. Help!
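A minimal sketch of one way to do this (re-prompting in a while loop until the input is alphabetic; this restructures the original function, it is not the only possible answer):

inventory = {"apple": 50, "orange": 50, "pineapple": 70, "strawberry": 30}

def check_item():
    x = input("Enter the fruit's name: ")
    while not x.isalpha():
        # Keep asking only while the input is not a plain word.
        print("Error! You need to type the name of the fruit")
        x = input("Enter the fruit's name: ")
    if x in inventory:
        print("Fruit found:", x)
        print("Inventory available:", inventory[x], "KG")
    else:
        print("Fruit not found")

check_item()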
-
.Body & .HTMLBody Indent and Default Signature Issue
I've set up a macro to send an e-mail through Outlook.
.Body is read from a cell inside the file with indents. Since the value will change depending on the usage, I need to reference that cell for the body.
However, two issues arise: using .HTMLBody I lose the indents, which are constructed with CHAR(10), but I keep the default HTML signature.
When using just .Body, the indents are displayed correctly, but the default signature is not rendered as HTML and I lose the images.
How should I go about fixing this issue?
My code:
sig = .HTMLBody
body = xlSht.Range("B4").Value
.To = xlSht.Range("B2").Value
.CC = ""
.Subject = xlSht.Range("B1").Value
.Body = body & sig
.Display
I'd really appreciate your assistance.
Thanks.
-
Yahoo Finance no longer returns VBA cookie request for .getResponseHeader("Set-Cookie")
The following Excel VBA code segment has worked for years, but stopped working around 28 Apr 2022. I receive the responseText, but the .getResponseHeader("Set-Cookie") returns null.
Set httpReq = New WinHttp.WinHttpRequest
DownloadURL = "https://finance.yahoo.com/lookup?s=" & stockSymbol
With httpReq
    .Open "GET", DownloadURL, False
    .setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
    .Send
    .waitForResponse
    response = .responseText
    cookie = Split(.getResponseHeader("Set-Cookie"), ";")(0)
End With
-
export excel rows to individual json files in python
My Excel file has 500 rows of data. I am trying to get 500 individual JSON files, each with data from only one row. Thank you in advance.
import json
import pandas

excel_data_df = pandas.read_excel("F:/2/N.csv.xlsx", sheet_name='Sheet1')
json_str = excel_data_df.to_json(orient='records')

for idx, row in enumerate(json_str):
    fpath = str(idx) + ".json"
    with open(fpath, "w+") as f:
        json.dump(row, f)
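As written, the loop iterates over the characters of the JSON string rather than the rows. A minimal sketch of one way to get one file per row (using DataFrame.iterrows; the path F:/2/N.csv.xlsx is taken from the question):

import pandas as pd

excel_data_df = pd.read_excel("F:/2/N.csv.xlsx", sheet_name="Sheet1")

for idx, row in excel_data_df.iterrows():
    # Serializing one row (a Series) produces one JSON object per file.
    with open(f"{idx}.json", "w") as f:
        f.write(row.to_json())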
-
How to search list of numbers in many files
I have 5 files containing numbers and need to search for a list of numbers and print the name of the file each one is in. I tried this code but it doesn't work:
import os

out = open('output', 'w')
numbers = [23175, 2080, 6277, 6431, 19846, 10330, 25408, 25811, 8454, 10515]
filenames = {
    'G': 'green.txt',
    'R': 'red.txt',
    'B': 'blue.txt',
    'Y': 'yellow.txt',
    'O': 'orange.txt',
}
for k, filename in filenames.items():
    j = 0
    with open(filename, 'r') as f:
        for line in f:
            if int(line.strip()) == numbers[j]:
                print(filename)
                print(numbers[j])
            else:
                j += 1
I got:

if int(line.strip()) == numbers[j]:
IndexError: list index out of range
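The index j grows past the end of the list on every non-matching line, which is what raises the IndexError. A hedged sketch of an alternative that avoids indexing altogether (collect the numbers in a set and test each line for membership; the file names are the ones from the question):

numbers = {23175, 2080, 6277, 6431, 19846, 10330, 25408, 25811, 8454, 10515}
filenames = ['green.txt', 'red.txt', 'blue.txt', 'yellow.txt', 'orange.txt']

for filename in filenames:
    with open(filename) as f:
        for line in f:
            line = line.strip()
            # Skip blank or non-numeric lines instead of crashing on int().
            if line.isdigit() and int(line) in numbers:
                print(filename, line)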
-
Pycharm "Save console output to file" not working
I see that no file is saved, even though I specify the file path log.log in the Logs tab (Run/Debug Configurations in PyCharm).
-
Validate user input with data in .txt file
I have searched and searched and tried everything. I am creating a game where the user will input a pre-assigned PIN, and I want to validate that PIN against a .txt file in Python. I have tried so many different lines of code, and my result is either that everything is valid or nothing is valid. What am I doing wrong? The PINs are alphanumeric, one per line, like this:
1DJv3Awv5
1DGw2Eql8
3JGl1Hyt7
2FHs4Etz4
3GDn9Buf8
1CEa9Aty0
2AIt9Dxz9
5DFu0Ati4
3AJu9Byi4
1EAm0Cfn1
3BEr0Gwk0
7JAf8Csf8
4HFu0Dlf4
Here is what I have:
user_input = input('Please enter your PIN: ')
if user_input in open("PINs.txt").read():
    print('Congratulations! Click the button below to get your Bingo Number.')
else:
    print('The PIN you entered does not match our records. Please check your PIN and try again.')
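Note that user_input in open("PINs.txt").read() is a substring test, so a partial PIN (or an empty string) also matches. A minimal sketch of an exact, line-by-line comparison (assuming one PIN per line, as shown above):

with open("PINs.txt") as f:
    valid_pins = {line.strip() for line in f if line.strip()}

user_input = input("Please enter your PIN: ").strip()
if user_input in valid_pins:
    print("Congratulations! Click the button below to get your Bingo Number.")
else:
    print("The PIN you entered does not match our records. Please check your PIN and try again.")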
-
Electron JS ReCaptcha Harvester
I need to make a reCAPTCHA v2 harvester in Electron JS. I tried the following code, but all I get is "ERROR for site owner: Invalid domain for site key". I think I need to intercept requests. Someone suggested I use the "interceptBufferProtocol" method, but I don't know how to use it. Can someone help me?
<html>
<head>
    <title>Captcha Harvester</title>
    <script src="https://www.google.com/recaptcha/api.js" async defer></script>
</head>
<body>
    <form action="/submit" method="POST">
        <div class="g-recaptcha" id="captchaFrame" data-sitekey="actual-site-key" data-callback="sub"></div>
    </form>
    <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.2.1/jquery.min.js"></script>
</body>
</html>
-
Add body to luasocket POST request with generic form?
From https://w3.impa.br/~diego/software/luasocket/http.html, there are two ways to make a request, simple and generic. I have gotten the body to work with the simple method. However, when I add an LTN12 source to the generic method, an empty body is sent to the server.
http.request(url [, body])

http.request{
    url = string,
    [sink = LTN12 sink,]
    [method = string,]
    [headers = header-table,]
    [source = LTN12 source],
    [step = LTN12 pump step,]
    [proxy = string,]
    [redirect = boolean,]
    [create = function]
}
This works:
http.request("http://localhost:56218/sendState", "at=" .. AT)
This doesn't:
local reqbody = "hi" local respbody = {} local body, code, headers, status = http.request { url = "http://localhost:56218/sendState", source = ltn12.source.string(reqBody), headers = { ["content-length"] = string.len(reqbody) } sink = ltn12.sink.table(respbody) }
When I try to read the body of the above line of code in my server, it is empty. What am I doing wrong?
-
how to Access to the Response body in the retrofit?
I changed from the Volley library to Retrofit. Now I want to access the response body the way I did with Volley. I searched the internet and came up with this solution, but when I run the program, it closes and shows the error below. Thank you in advance for your help.
apiService Class
public void getVerifyCode(String mobile, RequestStatus requestStatus) {
    Log.i(TAG, "getVerifyCode: Called");
    JsonObject jsonObject = new JsonObject();
    jsonObject.addProperty("command", "register_user");
    jsonObject.addProperty("mobile", mobile);
    Log.i(TAG, "getVerifyCode: requestCode: " + jsonObject.toString());
    retrofitApi.getVerifyCode(jsonObject).enqueue(new Callback<ResponseBody>() {
        @Override
        public void onResponse(Call<ResponseBody> call, Response<ResponseBody> response) {
            Log.i(TAG, "getVerifyCode:onResponse: " + response.toString());
            requestStatus.onSuccess(response.toString());
        }

        @Override
        public void onFailure(Call<ResponseBody> call, Throwable t) {
            Log.e(TAG, "getVerifyCode:onFailure= " + t.getMessage());
            requestStatus.onError(new Exception(t));
        }
    });
}
Retrofit Callback
@POST(".") Call<ResponseBody> getVerifyCode(@Body JsonObject body);
Logcat
I/ApiService: getVerifyCode: Called
I/ApiService: getVerifyCode: requestCode: {"command":"register_user","mobile":"0915*******7"}
I/ApiService: getVerifyCode:onResponse: Response{protocol=http/1.1, code=200, message=OK, url=http://**********ion.freehost.io/}
W/System.err: at ir.*****pp.*****k.ApiService$2.onResponse(ApiService.java:75)
-
How to rename columns after looping through directory in pandas?
I have a directory
directory = '//data-share/jobs/Escalations'
This directory has multiple files called
1.xls 2.xls 3.xls
and so on. More could be added in the future with the same data format, so ideally I'd want the script to loop through the OS directory.
The issue is that the columns within these files are not named the same even though they mean the same thing.
for example
1.xls has the column 'billing address' in 3 sheets
2.xls has the column 'bill-to address' in 3 sheets
3.xls has the column 'billed address' in 3 sheets
I just want to rename all these columns from all sheets and call them 'billing address' in a new df. There are other columns too, like 'Zip', 'Billing zip', and 'Bill-To Zip Code', which I want all renamed to 'Zip', and 'Billing city', 'Bill-To City', and 'City', which I want all renamed to 'City'. So the new df will have all the renamed columns in one unified DataFrame. My final df should look like the one below for each xls file, because my final objective is to concatenate them all under one df, and in order to do this, all column names need to match.
zip billing address city
Note that the number and order of columns is not the same across the files.
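A minimal sketch of one approach (the rename mapping, directory path, and sheet handling are assumptions based on the column names given above):

import os
import pandas as pd

directory = '//data-share/jobs/Escalations'

# Map every known variant to its canonical name (assumed from the question).
rename_map = {
    'bill-to address': 'billing address',
    'billed address': 'billing address',
    'Billing zip': 'Zip',
    'Bill-To Zip Code': 'Zip',
    'Billing city': 'City',
    'Bill-To City': 'City',
}

frames = []
for filename in os.listdir(directory):
    if filename.endswith('.xls'):
        # sheet_name=None reads every sheet into a dict of DataFrames.
        sheets = pd.read_excel(os.path.join(directory, filename), sheet_name=None)
        for df in sheets.values():
            # rename() ignores mapping keys that a sheet does not have.
            frames.append(df.rename(columns=rename_map))

final_df = pd.concat(frames, ignore_index=True)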
-
iat returns too much info
I'm trying to create a program that looks for a cell value with a specific name (A1 for example). When it finds it, it takes a value from a cell in the same row and another value from a cell in a different sheet.
My problem is that when I try to use iat[] it returns the value that I want, but with some extra data that I don't know where it comes from.
import re
import pandas as pd
import csv
import sys

rows = pd.read_excel(r'DSF test.xls', sheet_name='Raw Data', usecols="A")
lr = rows.index[-1] + 2

# this is what value you are looking for in the data
for m in ("A"):
    for n in range(1, 5):
        well = m + str(n)  # this creates A1 for example
        for i in range(0, 110):
            dfwell = pd.read_excel(r'DSF test.xls', sheet_name='Raw Data', header=7, usecols="A")
            vrst = dfwell.loc[i].to_string(index=False)
            if vrst == well:
                dfreading = pd.read_excel(r'DSF test.xls', sheet_name='Raw Data', header=7, usecols="F")
                csvreading = dfreading.loc[i].to_string(index=False)
                c = 4
                dftemp = pd.read_excel(r'DSF test.xls', sheet_name='Melt Region Temperature Data', header=7)
                csvtemp = dftemp.iat[i, c]
                c += 1
                print(csvtemp)
                csvdata = csvtemp + "," + csvreading
                filename = well + ".csv"
                with open(filename, "a", newline="") as f:
                    thewriter = csv.writer(f)
                    thewriter.writerow([csvdata])
                f.close()
Also, the program is very slow to run, but I'm trying to make it do what I want first; I'll optimize it later.
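For what it's worth, .iat[row, col] returns a single scalar, while .loc[i].to_string(index=False) renders a whole row as text, which can bring extra data along. A hedged sketch that reads each sheet once, outside the loops, and pulls single cells with .iat (the column positions are assumptions based on the question's usecols arguments):

import pandas as pd

# Read each sheet once instead of once per loop iteration.
raw = pd.read_excel(r'DSF test.xls', sheet_name='Raw Data', header=7)
temps = pd.read_excel(r'DSF test.xls', sheet_name='Melt Region Temperature Data', header=7)

for i in range(len(raw)):
    well = raw.iat[i, 0]      # column A: well name, e.g. "A1" (assumed position)
    reading = raw.iat[i, 5]   # column F: the reading (assumed position)
    temp = temps.iat[i, 4]    # fifth column of the temperature sheet (assumed)
    print(well, reading, temp)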
-
Loading data into df after finding blank line
What is the best way to find the first newline in a file when the input file is sometimes a .csv and sometimes a .xls? The newline is guaranteed, but it is always at a random row when reading the file. The input file will have a certain number of rows, always at the top, and this header data varies by a line or two, so I cannot just skip the first 4, 5, or 6 rows. My goal here is to read the data beyond that point into a DataFrame, skipping those first rows. The line right after the first blank line is where I will start reading the data into the df. So something that just skips this variable number of rows is what I am missing. I have a small function that identifies the file type: if it returns true the file is an xls file, and if false the file is a CSV file. In my example file below, the first blank row is at row 7.

1: CSV

import pandas as pd

file = 'input_file.csv'
f = open(file)
while f.readline() not in ('\n'):
    pass
final_df = pd.read_csv(f, header=None)

This reads forever and I have to interrupt execution for the program to quit. A key point: when running f.readline() and looking at the output line by line, I notice the code passes the blank line because it is not '\n' as expected. Instead it's always something like ',,,,,,,,,,\n', with no consistency across my many csv files. How can I identify this as a blank line without always tweaking the code to account for a new number of commas in the first blank row of the CSV file?
Example file:

.report
random info
more info
Project number  111111
Order number
Plates  Plate1  Plate2  Plate3

DNA \ Assay  id1  id2  id3
Name1        C:C  G:G  T:C
Name2        C:C  G:G  C:C
Name3        C:C  G:G  T:C

Current output for the readline call that is looking for the newline, at the blank line:

',,,,,,,,,,\n'
final_df expected output:

DNA \ Assay  id1  id2  id3
Name1        C:C  G:G  T:C
Name2        C:C  G:G  C:C
Name3        C:C  G:G  T:C

2: XLS
When the files are in the xls format, they look exactly the same as the example file above; the example provides the data exactly as needed for this question, no changes required.

My idea for reading the files when they come in as xls is:

import tempfile
import pandas as pd

df = pd.read_excel(file)
f = tempfile.NamedTemporaryFile()
df.to_csv(f)
f.seek(0)
line = str(f.readline()).strip()
and the current output after print(line) returns:

b',report,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26,Unnamed: 27,Unnamed: 28,Unnamed: 29,Unnamed: 30,Unnamed: 31,Unnamed: 32,Unnamed: 33,Unnamed: 34,Unnamed: 35,Unnamed: 36,Unnamed: 37,Unnamed: 38,Unnamed: 39,Unnamed: 40,Unnamed: 41,Unnamed: 42,Unnamed: 43,Unnamed: 44,Unnamed: 45,Unnamed: 46\n'
I don't want to continue reading the file this way if there is another way to find the first blank line with pd.read_excel(line). The expected output is the same as listed above in final_df. I would ideally use something like final_df = pd.read_csv(line) to produce the final_df, but that does not work:

DNA \ Assay  id1  id2  id3
Name1        C:C  G:G  T:C
Name2        C:C  G:G  C:C
Name3        C:C  G:G  T:C
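A minimal sketch of one way to handle the CSV case (an is_blank helper of my own that treats a line of only commas and whitespace as blank; input_file.csv is the name from the question):

import pandas as pd

def is_blank(line):
    # A "blank" line in these files may be '\n' or only separators,
    # e.g. ',,,,,,,,,,\n'; removing commas and whitespace leaves nothing.
    return line.replace(',', '').strip() == ''

with open('input_file.csv') as f:
    for line in f:
        if is_blank(line):
            break
    # read_csv picks up from the current file position,
    # i.e. the line right after the first blank line.
    final_df = pd.read_csv(f, header=None)

print(final_df)

For the xls case, a similar test can be applied after pd.read_excel(file, header=None) by looking for the first row that is entirely NaN (df.isna().all(axis=1)) and slicing the frame below it.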