Python Unicode error with Scrapy
Traceback (most recent call last):
  File "F:/python/graduate.py", line 37, in __init__
    dodgers =
  File "F:\anaco\lib\site-packages\pandas\io\parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "F:\anaco\lib\site-packages\pandas\io\parsers.py", line 449, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "F:\anaco\lib\site-packages\pandas\io\parsers.py", line 818, in __init__
    self._make_engine(self.engine)
  File "F:\anaco\lib\site-packages\pandas\io\parsers.py", line 1049, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "F:\anaco\lib\site-packages\pandas\io\parsers.py", line 1695, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 562, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas/_libs/parsers.pyx", line 790, in pandas._libs.parsers.TextReader._get_header
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8e in position 0: invalid start byte
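Byte 0x8e is not valid UTF-8, which usually means the CSV was saved in some other encoding (cp1252, GBK, ...). A minimal sketch of the usual fix, passing an explicit encoding to pd.read_csv (the file name and codec here are assumptions, not from the traceback):

```python
import pandas as pd

# Create a small CSV containing a non-UTF-8 byte to reproduce the error
# ("data.csv" is a placeholder name, not from the question).
with open("data.csv", "wb") as f:
    f.write("name,price\n\x8ea,10\n".encode("latin-1"))

try:
    df = pd.read_csv("data.csv")          # pandas defaults to utf-8
except UnicodeDecodeError:
    # Fall back to a legacy single-byte codec; latin-1 decodes any byte
    # sequence, though characters may be mis-mapped if the file's true
    # encoding differs (try "cp1252", "gbk", etc. as appropriate).
    df = pd.read_csv("data.csv", encoding="latin-1")
```

Opening the file in a text editor or running `chardet`/`file` on it is the reliable way to find the real encoding rather than guessing.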
See also questions close to this topic
Python equivalent of R's tmvtnorm::rtmvnorm
To simulate some data, I need to sample random numbers from the truncated multivariate normal distribution, which is what the R function tmvtnorm::rtmvnorm does.
I have tried the function in R, but my script is mostly written in Python, so I would like to know if there is any function that can do the same thing.
I have tried truncnorm in scipy and emcee (a Python library), but neither produces results like the output of tmvtnorm::rtmvnorm.
For now, I am using rpy2 to get the result from R.
Here are my questions:
- Are there any tools that work like tmvtnorm::rtmvnorm?
- Can anyone explain the differences between tmvtnorm::rtmvnorm and truncnorm in scipy?
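For reference, one common way to approximate tmvtnorm::rtmvnorm in pure Python is rejection sampling: draw from the untruncated multivariate normal and keep only the points inside the box. (scipy.stats.truncnorm truncates a single univariate normal per coordinate, so it cannot reproduce correlated draws; the helper below is my own sketch, not a library API, and it becomes slow when the truncation box has low probability.)

```python
import numpy as np

def rtmvnorm_reject(n, mean, cov, lower, upper, seed=None, batch=10000):
    """Draw n samples from a multivariate normal truncated to the box
    [lower, upper] by simple rejection sampling (a rough stand-in for
    R's tmvtnorm::rtmvnorm)."""
    rng = np.random.default_rng(seed)
    mean, lower, upper = map(np.asarray, (mean, lower, upper))
    out, kept = [], 0
    while kept < n:
        cand = rng.multivariate_normal(mean, cov, size=batch)
        # keep candidates where every coordinate lies inside the box
        ok = np.all((cand >= lower) & (cand <= upper), axis=1)
        accepted = cand[ok]
        out.append(accepted)
        kept += len(accepted)
    return np.concatenate(out)[:n]

samples = rtmvnorm_reject(
    1000,
    mean=[0.0, 0.0],
    cov=[[1.0, 0.5], [0.5, 1.0]],   # correlated, unlike truncnorm
    lower=[-1.0, -1.0],
    upper=[1.0, 1.0],
    seed=0,
)
```

This also answers the second question in spirit: truncnorm handles one dimension at a time and ignores correlations, whereas tmvtnorm::rtmvnorm (and the sketch above) truncates the joint distribution.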
Scikit-learn KerasClassifier evaluation error
I have created a KerasClassifier for k-fold validation. Below is the function for building the classifier.
def build_classifier():
    classifier = Sequential()
    classifier.add(Dense(activation="relu", input_dim=11, units=6, kernel_initializer="uniform"))
    classifier.add(Dense(activation="relu", units=6, kernel_initializer="uniform"))
    classifier.add(Dense(activation="sigmoid", units=1, kernel_initializer="uniform"))
    classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return classifier
I have initialized the classifier as below.
classfier = KerasClassifier(build_fn = build_classifier, batch_size=10, epochs=100)
I am trying to get the accuracy scores from this 10-fold validation.
accuracies = cross_val_score(estimator = classifier, X = X_train, y = y_train, cv = 10, n_jobs = -1)
However, it shows me a TypeError:
TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator does not.
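For what it's worth, this exact TypeError is raised by scikit-learn's scoring check when the object passed as estimator has a fit method but no score method. Note the misspelling above: `classfier` is assigned, but `classifier` is passed to cross_val_score, so cross_val_score likely received some other object (e.g. a bare Keras model) rather than the KerasClassifier wrapper. A small sketch reproducing the error and the working case with stand-in estimators (the class and variable names below are illustrative, not from the question):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=200, n_features=11, random_state=0)

# Stand-in for what cross_val_score may actually have received: an
# object with fit() but no score() method.
class NoScoreEstimator:
    def fit(self, X, y):
        return self
    def get_params(self, deep=True):
        return {}

err = None
try:
    cross_val_score(estimator=NoScoreEstimator(), X=X, y=y, cv=5)
except TypeError as e:      # "...should have a 'score' method..."
    err = e

# With a proper estimator bound to the SAME name used in the call,
# cross-validation works as expected.
clf = LogisticRegression(max_iter=1000)
accuracies = cross_val_score(estimator=clf, X=X, y=y, cv=5)
```

So the first thing to check is simply that the name used in cross_val_score matches the name the KerasClassifier was assigned to.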
Running Flask via Supervisor
I'm running a Flask web app (say it's app.py) with Supervisor.
app.py hangs without crashing.
Is it possible or recommended to create another program which listens for heartbeats from app.py and restarts it if it doesn't get a heartbeat within X seconds?
My other thought was simply to wrap problem spots in try/except and deliberately crash the program, using Supervisor to restart it. This is not a long-term fix, though.
How would you solve this problem?
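One hedged sketch of the heartbeat idea: have app.py touch a file periodically (e.g. from a background thread), and run a small watchdog, itself under Supervisor, that restarts the app when the file goes stale. The file path, timeout, and supervisorctl program name below are all assumptions:

```python
import os
import subprocess
import time

HEARTBEAT_FILE = "/tmp/app_heartbeat"   # path is an assumption
TIMEOUT_SECONDS = 30                     # the "X seconds" from the question

def heartbeat_is_stale(path, timeout, now=None):
    """True if the heartbeat file is missing or older than `timeout`."""
    now = time.time() if now is None else now
    try:
        return (now - os.path.getmtime(path)) > timeout
    except OSError:          # file never written yet
        return True

def watchdog_loop():
    # The Supervisor program name "flaskapp" is hypothetical.
    while True:
        if heartbeat_is_stale(HEARTBEAT_FILE, TIMEOUT_SECONDS):
            subprocess.run(["supervisorctl", "restart", "flaskapp"])
        time.sleep(TIMEOUT_SECONDS)
```

In app.py, a background thread would simply `os.utime(HEARTBEAT_FILE)` (or rewrite the file) every few seconds; if the main process hangs, the thread may keep running, so tying the heartbeat to actual request handling is worth considering.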
pandas df to_csv with row and column offset
With pd.ExcelWriter, you can specify a row and column offset with the startrow and startcol parameters. Similarly, when using df.to_csv() in append mode, is there a way to specify row and column offsets? I want to write multiple dataframes iteratively, with row spacing between each iteration, AND my dataframes have different numbers of columns that I also want to offset when writing to the CSV file.
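to_csv() itself has no startrow/startcol, but in append mode the offsets can be emulated by writing blank lines for the row offset and prefixing each line with empty cells for the column offset. A rough sketch (the helper name is mine, not a pandas API, and the naive line-splitting breaks if a cell value contains an embedded newline):

```python
import pandas as pd

def append_with_offset(df, path, startrow=0, startcol=0):
    """Append `df` to the CSV at `path`, preceded by `startrow` blank
    lines, with each line shifted right by `startcol` empty cells."""
    csv_text = df.to_csv(index=False)
    pad = "," * startcol                 # empty cells = column offset
    with open(path, "a", newline="") as f:
        f.write("\n" * startrow)         # blank lines = row offset
        for line in csv_text.splitlines():
            f.write(pad + line + "\n")

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df2 = pd.DataFrame({"x": [5], "y": [6], "z": [7]})

append_with_offset(df1, "out.csv")                       # top-left
append_with_offset(df2, "out.csv", startrow=2, startcol=4)
```

Since CSV has no real grid model, the "offset" is just padding; a spreadsheet program will render it the same way Excel renders startrow/startcol.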
Iteratively write from zipped list to csv
I have been trying to figure out how to write from a zipped list of lists to a CSV file iteratively.
def write_to_Excel(a, b, path):
    # first extract data from a and b
    # c, d, e, f are formatted dataframes of varying lengths
    c, d = extract(a)
    e, f = extract(b)
    outname = "file" + ".xlsx"
    out_path = os.path.join(path, outname)
    writer = pd.ExcelWriter(out_path, engine='xlsxwriter')
    offset = len(f)
    for df in (a, b, c, d, e):
        n = 0
        df.to_excel(writer, sheet_name='Sheet1', startcol=offset, startrow=n, index=False)
        offset = offset + len(df.columns) + 2
        n += 10
    writer.save()

def main():
    # x, y are lists of dataframes containing the data I want to write to csv at specified path
    for (a, b) in zip(x, y):
        write_to_Excel(a, b, path)
I want to take the extracted data and write it out, then add 10 or so rows, then write the extracted data from the next item of the zip, and so on until all iterations of the zip have been exhausted. The output Excel file I am getting contains only the data extracted from the last iteration, not from each iteration. How do I go about either appending the data from a multiple-output function (extract) or iteratively writing each set of data to the same file? I was able to write each iteration to a new Excel file, but I am looking to do it in the same one.
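The likely reason only the last iteration survives is that write_to_Excel creates a fresh ExcelWriter for the same path on every call, overwriting the file each time. One sketch of the fix: create the writer once, outside the loop, and carry a running row offset across iterations (the toy dataframes below stand in for the question's x and y):

```python
import pandas as pd

# Toy stand-ins for the lists of dataframes x and y from the question.
x = [pd.DataFrame({"a": [1, 2]}), pd.DataFrame({"a": [3]})]
y = [pd.DataFrame({"b": [4]}), pd.DataFrame({"b": [5, 6]})]

# Create the writer ONCE; each zipped pair lands below the previous one
# instead of overwriting the whole file.
with pd.ExcelWriter("file.xlsx", engine="xlsxwriter") as writer:
    row = 0
    for a, b in zip(x, y):
        col = 0
        for df in (a, b):
            df.to_excel(writer, sheet_name="Sheet1",
                        startrow=row, startcol=col, index=False)
            col += len(df.columns) + 2   # 2 blank columns between frames
        # advance past the tallest frame (+1 header row), plus ~10 rows
        # of spacing, as described in the question
        row += max(len(a), len(b)) + 1 + 10
```

The same pattern applies with the extract() helper: pass the shared writer (or the running row offset) into write_to_Excel rather than constructing a new writer inside it.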
Change the format of an attribute in SQLite3 from a CSV file
I have to import a table from a CSV file, for which I create a table like this:
table productos(id int, name varchar, description varchar, price int)
My problem is that in the price column of the CSV file I have the price in this format:
But I need it in this way:
Thanks a lot, guys!
How to append dishes to its right category when scraping restaurant menu with Scrapy
I have been stuck on this for a couple of days already, and I couldn't find a post here on Stack Overflow that fixed it for me.
I am using Scrapy to scrape restaurant menus, for example: https://www.foodora.ca/restaurant/s0se/wow-chicken-kensington
I would like to append the dishes to their right categories, but I don't know how to accomplish that. Can someone help me with that?
How to use Scrapy sitemap spider on sites with text sitemaps?
I tried using a generic scrapy.Spider to follow links, but it didn't work, so I hit upon the idea of simplifying the process by accessing the sitemap.txt instead, but that didn't work either!
I wrote a simple example (to help me understand the algorithm) of a spider to follow the sitemap specified on my site: https://legion-216909.appspot.com/sitemap.txt. It is meant to navigate the URLs specified in the sitemap, print them to the screen, and output the results into a links.txt file. The code:
import scrapy
from scrapy.spiders import SitemapSpider

class MySpider(SitemapSpider):
    name = "spyder_PAGE"
    sitemap_urls = ['https://legion-216909.appspot.com/sitemap.txt']

    def parse(self, response):
        print(response.url)
        return response.url
I ran the above spider with scrapy crawl spyder_PAGE > links.txt, but that produced an empty text file. I have gone through the Scrapy docs multiple times, but there is something missing. Where am I going wrong?
Scrapy Selector attribute
I used the following command to test:
scrapy shell http://example.webscraping.com/places/default/user/login#
And did some tests. The query response.xpath('//div[@style]/input') returns:
[<Selector xpath='//div[@style]/input' data='<input name="_next" type="hidden" value='>, <Selector xpath='//div[@style]/input' data='<input name="_formkey" type="hidden" val'>, <Selector xpath='//div[@style]/input' data='<input name="_formname" type="hidden" va'>]
I then compared two further queries:
1. response.xpath('//div//@style/input')
2. response.xpath('//div[style]/input')
I want to know how different 1 and 2 are. Thanks.
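The two predicates behave quite differently: `//div[@style]` matches div elements that have a style *attribute*, while `//div[style]` matches div elements that contain a child *element* named style; and `//div//@style/input` can never match anything, because attribute nodes have no children to step into. A small lxml demonstration (lxml implements the same XPath 1.0 engine Scrapy's selectors use):

```python
from lxml import etree

doc = etree.fromstring(
    '<root>'
    '  <div style="color:red"><input name="a"/></div>'   # style attribute
    '  <div><style>p {}</style><input name="b"/></div>'  # <style> child element
    '</root>'
)

by_attr = doc.xpath('//div[@style]/input')   # divs with a style attribute
by_child = doc.xpath('//div[style]/input')   # divs with a <style> child element
dead_end = doc.xpath('//div//@style/input')  # attribute nodes have no children
```

So on the login page above, `//div[@style]/input` matched the hidden inputs because their parent divs carry an inline style attribute; `//div[style]/input` would only match if a div literally contained a `<style>` element.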