Use selenium to get information out of a table with changing xpaths
I am trying to loop through a list of companies and scrape their environmental ratings from CSRhub. I would post the link as an example, but the site is accessible by login only. My scraper has not been getting accurate numbers because the location of the ratings changes depending on the number of rows in the table on the webpage.
For example, Target has 5 rows in the table, and the xpath for 73 (its Energy & Climate Change rating) is:
//*[@id="rating-section"]/div/div[2]/div/div/table/tbody/tr[23]/td[5]/div/table/tbody/tr/td[2]/div/div/span[1]/span
But companies vary in their number of rows; here are the xpaths for the different elements I am trying to gather.
The table and webpage features do not have ids or well-labeled classes. I am fairly new to front-end work. How can I select the correct element regardless of the number of rows a company has?
1 answer
-
answered 2020-11-25 20:11
BCR
Since you can't rely on the row numbering, identify what you can rely on--in this case the text label of the value you are looking for. Use the xpath contains() method to check the text. I can't read the HTML in your screenshot so it's hard to give exact code, but it will look something like this:
if the element is
<span class="something useless">I am a label!</span>
use
"//*[@id='rating-section']//table//span[contains(text(),'I am a label')]"
BTW a handy trick is to use "//" anywhere there is a lot of non-specific code, so you don't need to have all the /div/span/div cruft in your xpath.
Also look at using child and parent nodes. Identify a highly static element near the one you want, then use the child node expression (and additional xpath if needed) to reach the element you need.
Xpath is daunting when you first start out, but I encourage you to keep trying and learning. It's really powerful in cases like this.
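As a rough sketch of the label-based approach in Python: the helper names rating_xpath and get_rating are hypothetical, and the final step (/ancestor::tr[1]//span/span) assumes the rating sits in nested spans in the same table row as its label, which can't be confirmed without the real HTML. The string "xpath" is the value of Selenium's By.XPATH constant, so the helper works with whatever driver object you already have.

```python
def rating_xpath(label):
    """Build an XPath anchored on a visible text label instead of row numbers.

    Assumption: the value lives in span/span inside the nearest <tr> that
    contains the label. Adjust the last step to match the real markup.
    Labels containing a single quote would need extra escaping.
    """
    return (
        "//*[@id='rating-section']//table"
        f"//*[contains(text(), '{label}')]"
        "/ancestor::tr[1]//span/span"
    )


def get_rating(driver, label):
    """Return the stripped text of the first element matching the label XPath."""
    # "xpath" is the same string as selenium.webdriver.common.by.By.XPATH
    element = driver.find_element("xpath", rating_xpath(label))
    return element.text.strip()
```

Calling get_rating(driver, "Energy & Climate Change") for each company in the loop would then no longer depend on how many rows the table has, provided the label text itself is stable.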
See also questions close to this topic
-
Error non-linear-regression python curve-fit
Hello guys, I want to do a non-linear regression in Python with curve_fit. This is my code:
# fit a fourth degree polynomial to the economic data
from numpy import arange
from scipy.optimize import curve_fit
from matplotlib import pyplot
import math

x = [17.47,20.71,21.08,18.08,17.12,14.16,14.06,12.44,11.86,11.19,10.65]
y = [5,35,65,95,125,155,185,215,245,275,305]

# define the true objective function
def objective(x, a, b, c, d, e):
    return ((a)-((b)*(x/3-5)))+((c)*(x/305)**2)-((d)*(math.log(305))-math.log(x))+((e)*(math.log(305)-(math.log(x))**2))

popt, _ = curve_fit(objective, x, y)
# summarize the parameter values
a, b, c, d, e = popt
# plot input vs output
pyplot.scatter(x, y)
# define a sequence of inputs between the smallest and largest known inputs
x_line = arange(min(x), max(x), 1)
# calculate the output for the range
y_line = objective(x_line, a, b, c, d, e)
# create a line plot for the mapping function
pyplot.plot(x_line, y_line, '--', color='red')
pyplot.show()
This is my error:
Traceback (most recent call last):
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\main.py", line 16, in <module>
    popt, _ = curve_fit(objective, x, y)
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\venv\lib\site-packages\scipy\optimize\minpack.py", line 784, in curve_fit
    res = leastsq(func, p0, Dfun=jac, full_output=1, **kwargs)
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\venv\lib\site-packages\scipy\optimize\minpack.py", line 410, in leastsq
    shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\venv\lib\site-packages\scipy\optimize\minpack.py", line 24, in _check_func
    res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\venv\lib\site-packages\scipy\optimize\minpack.py", line 484, in func_wrapped
    return func(xdata, *params) - ydata
  File "C:\Users\Fahmi\PycharmProjects\pythonProject\main.py", line 13, in objective
    return ((a)-((b)*(x/3-5)))+((c)*(x/305)**2)-((d)*(math.log(305))-math.log(x))+((e)*(math.log(305)-(math.log(x))**2))
TypeError: only size-1 arrays can be converted to Python scalars
Thanks in advance.
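The traceback points at the math.log(x) calls inside objective: curve_fit passes x in as a NumPy array, and math.log only accepts a single number. A small sketch of the difference follows. The three-value array and the helper objective_term are illustrative only (they are not the poster's full model), and the sketch assumes NumPy is installed.

```python
import math

import numpy as np

x = np.array([17.47, 20.71, 21.08])

# math.log works on one scalar at a time, so an array raises TypeError.
try:
    math.log(x)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)

# np.log is applied element-wise and returns an array of the same shape.
logs = np.log(x)


def objective_term(x, d):
    """Hypothetical single term written with np.log so it accepts arrays."""
    return d * (np.log(305) - np.log(x))
```

Replacing math.log with np.log throughout the objective function is the usual way to make it array-safe; whether the fit then converges for this particular model is a separate question.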
-
beautifulsoup (webscraping) not updating variables when HTML text has changed
I am new to Python and I can't understand why this isn't working, but I've narrowed the issue down to one line of code.
The purpose of this bot is to scrape HTML from a website (using BeautifulSoup) and post to Discord when the text changes. I use FC2 and FR2 (flightcategory2 and flightrestrictions2) as memory variables for the code to check against every time it runs. If they're the same, the code waits for _ minutes and checks again; if they're different, it posts the new text.
However, when running this code, the variables "flightCategory" and "flightRestrictions" change the first time the code runs, but for some reason stop changing when the HTML text on the website changes. The line in question is inside this if statement:
if 1==1:  # using 1==1 so this loop constantly runs for testing, otherwise I have it set for a time
    flightCategory, flightRestrictions = und.getInfo()
In debug mode the line IS executed, but the variables don't update, and I am confused as to why they would update the first time the code is run but not on subsequent runs. This line is critical to the operation of my code.
Here's an abbreviated version of the code to make it easier to read. I'd appreciate any help.
import time

import requests
from bs4 import BeautifulSoup

FC2 = 0
FR2 = 0
flightCategory = ""
flightRestrictions = ""

class UND:
    def __init__(self):
        page = requests.get("http://sof.aero.und.edu")
        self.soup = BeautifulSoup(page.content, "html.parser")

    def getFlightCategory(self):  # Takes the appropriate html text and sets it to a variable
        flightCategoryClass = self.soup.find(class_="auto-style1b")
        return flightCategoryClass.get_text()

    def getRestrictions(self):  # Takes the appropriate html text and sets it to a variable
        flightRestrictionsClass = self.soup.find(class_="auto-style4")
        return flightRestrictionsClass.get_text()

    def getInfo(self):
        return self.getFlightCategory(), self.getRestrictions()

und = UND()

while 1 == 1:
    if 1==1:  # using 1==1 so this loop constantly runs for testing, otherwise I have it set for a time
        flightCategory, flightRestrictions = und.getInfo()  # scrape the html from the web
        if flightCategory == FC2 and flightRestrictions == FR2:  # if previous check is the same as this check then skip posting
            pass  # Do Something
        elif flightCategory != FC2 or flightRestrictions != FR2:  # if any variable has changed since the last time
            FC2 = flightCategory  # set the comparison variable to equal the variable
            FR2 = flightRestrictions
            if flightRestrictions == "Manager on Duty:":  # if this is seen only output category
                pass  # Do Something
            elif flightRestrictions != "Manager on Duty:":
                pass  # Do Something
    else:
        print("Outside Time")
    time.sleep(5)  # Wait _ seconds. This would be set for 30 min but for testing it is 5 seconds.
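One thing visible in the code above is that UND.__init__ downloads and parses the page once, and und is created once outside the loop, so every later getInfo() call reads the same stored soup. A minimal sketch of the alternative, fetching a fresh copy on every poll, might look like this. PageWatcher and its fetch argument are hypothetical names; fetch stands in for whatever performs the requests.get and BeautifulSoup extraction.

```python
from typing import Callable, Optional, Tuple


class PageWatcher:
    """Re-fetch the page text on every poll and report whether it changed."""

    def __init__(self, fetch: Callable[[], str]):
        # Store the function that downloads the page, not the downloaded page.
        self._fetch = fetch
        self._last: Optional[str] = None

    def poll(self) -> Tuple[str, bool]:
        current = self._fetch()           # a new download on each call
        changed = current != self._last   # compare with the previous poll
        self._last = current
        return current, changed
```

In the bot, the equivalent change would be to move the requests.get and BeautifulSoup calls out of __init__ and into getInfo (or to create a new UND() inside the loop), so that each check sees the current HTML.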
-
Need to reload vosk model for every transcription?
The vosk model that I'm using is vosk-model-en-us-aspire-0.2 (1.4GB). Loading it takes quite a long time on every run. Is it necessary to recreate the vosk object for every transcription? Loading the model takes a lot of time; if the model were loaded only once, it would save at least half of the time.
-
How to add data to multiple complex xsd types
I'm trying to add different data to two complex XSD types. Below is an example of some of the code:
stackStock = new NASDAQ();
ratStock = new StockData();
stackStock.setStockName("Microsoft");
stackStock.setIPO("MSFT");
stackStock.setCost((int) 500);
stackStock.setShares(40);
ratStock.setDate("2021-06-01");
ratStock.setCurrency("USD");
ratStock.setSharePrice(180);
stockList.add(stackStock);
stockList.add(ratStock);
Below is part of the XSD
<xsd:complexType name="NASDAQ">
    <xsd:sequence>
        <xsd:element name="StockName" type="xsd:string"/>
        <xsd:element name="IPO" type="xsd:string"/>
        <xsd:element name="shares" type="xsd:int"/>
        <xsd:element name="cost" type="xsd:int"/>
        <xsd:element name="Stock_Collection" type="tns:StockData"/>
    </xsd:sequence>
</xsd:complexType>
<xsd:complexType name="StockData">
    <xsd:sequence>
        <xsd:element name="date" type="xsd:string"/>
        <xsd:element name="currency" type="xsd:string"/>
        <xsd:element name="share_price" type="xsd:double"/>
    </xsd:sequence>
</xsd:complexType>
Because the data has to live in two different complexTypes, I'm struggling to find a way to add the inputted data to the same 'stockList' that will be used to output the XML. Since they are two different complex types, the return functions are in two different Java files and cannot share the same variable name. Any help is appreciated!
-
Android SwitchPreference for dark mode crashes
I would really like to code a switch in order to get a dark mode setting. I'm trying hard but I still don't know how to make it work.
root_preferences.xml
<PreferenceCategory app:title="Dark Mode">
    <SwitchPreference
        app:key="tema"
        app:summaryOff="Not active"
        app:summaryOn="Active"
        app:title="Dark mode" />
</PreferenceCategory>
SettingsActivity.java
public class SettingsActivity extends AppCompatActivity {

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.settings_activity);
        if (savedInstanceState == null) {
            getSupportFragmentManager()
                    .beginTransaction()
                    .replace(R.id.settings, new SettingsFragment())
                    .commit();
        }
        ActionBar actionBar = getSupportActionBar();
        if (actionBar != null) {
            actionBar.setDisplayHomeAsUpEnabled(true);
        }
    }

    public static class SettingsFragment extends PreferenceFragmentCompat {
        @Override
        public void onCreatePreferences(Bundle savedInstanceState, String rootKey) {
            setPreferencesFromResource(R.xml.root_preferences, rootKey);
            SwitchPreference thememode = findPreference("tema");
            thememode.setOnPreferenceChangeListener(new Preference.OnPreferenceChangeListener() {
                @Override
                public boolean onPreferenceChange(Preference preference, Object o) {
                    if (thememode.isChecked()) {
                        Toast.makeText(getContext(), "Unchecked", Toast.LENGTH_SHORT).show();
                        // Checked the switch programmatically
                        AppCompatDelegate.setDefaultNightMode(AppCompatDelegate.MODE_NIGHT_YES);
                        thememode.setChecked(false);
                    } else {
                        Toast.makeText(getContext(), "Checked", Toast.LENGTH_SHORT).show();
                        // Unchecked the switch programmatically
                        AppCompatDelegate.setDefaultNightMode(AppCompatDelegate.MODE_NIGHT_NO);
                        thememode.setChecked(true);
                    }
                    return true;
                }
            });
        };
    }
}
The app crashes when I try to open the settings activity. I get this error:
java.lang.RuntimeException: Unable to start activity ComponentInfo{mypackage.MainActivity}: android.view.InflateException: Binary XML file line #15 in mypackage:layout/activity_main: Binary XML file line #11 in mypackage:layout/content_main: Error inflating class fragment
-
BeautifulSoup parsing XML to table
I'm back with another issue. I'm really new to parsing XML with BeautifulSoup and have been stuck on this problem for 2 weeks now. I would appreciate your help. I have this structure:
<detail>
    <page number="01">
        <Bloc code="AF" A="000000000002550" B="000000000002550"/>
        <Bloc code="AH" A="000000000035826" C="000000000035826" D="000000000035826"/>
        <Bloc code="AR" A="000000000026935" B="000000000024503" C="000000000002431" D="000000000001669"/>
    </page>
    <page number="02">
        <Bloc code="DA" A="000000000038486" B="000000000038486"/>
        <Bloc code="DD" A="000000000003849" B="000000000003849"/>
        <Bloc code="EA" A="000000000001029"/>
        <Bloc code="EC" A="000000000063797" B="000000000082427"/>
    </page>
    <page number="03">
        <Bloc code="FD" C="000000000574042" D="000000000610740"/>
        <Bloc code="GW" C="000000000052677" D="000000000075362"/>
    </page>
</detail>
This is my code (I know that it's poor and I have to improve it :'( ):
if soup.find_all('bloc') != None:
    for element in soup.find_all('bloc'):
        code_element = element['code']
        if element.find('m1'):
            m1_element = element['m1']
        else:
            None
        if element.find('m2'):
            m2_element = element['m2']
        else:
            None
        print(code_element, m1_element, m2_element)
I get an error because the 'm2' element does not exist in all the pages. I don't know how to handle this issue.
I would like to put the result in a DataFrame like this:
DataFrame =
CODE  A           B             C           D           Page
AF    0000002550  00002550      NULL        NULL        01
AH    000035826   NULL          000035826   0000035826  01
AR    000026935   000000024503  0000002431  0000001669  01
....etc.
Thank you so much for your help
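In the XML shown above, A, B, C and D are attributes of each Bloc element rather than child tags, so they can be read with an attribute lookup that returns None when the attribute is missing. The sketch below uses the standard library's xml.etree.ElementTree instead of BeautifulSoup, purely to stay self-contained. The function name bloc_rows and the shortened sample document are illustrative.

```python
import xml.etree.ElementTree as ET

XML = """<detail>
  <page number="01">
    <Bloc code="AF" A="000000000002550" B="000000000002550"/>
    <Bloc code="AH" A="000000000035826" C="000000000035826" D="000000000035826"/>
  </page>
  <page number="03">
    <Bloc code="FD" C="000000000574042" D="000000000610740"/>
  </page>
</detail>"""

COLUMNS = ["CODE", "A", "B", "C", "D", "Page"]


def bloc_rows(xml_text):
    """Return one tuple per Bloc; attributes that are absent become None."""
    rows = []
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        for bloc in page.iter("Bloc"):
            rows.append((
                bloc.get("code"),
                bloc.get("A"),   # .get() returns None instead of raising
                bloc.get("B"),
                bloc.get("C"),
                bloc.get("D"),
                page.get("number"),
            ))
    return rows


rows = bloc_rows(XML)
```

The resulting list can be passed to pandas as pandas.DataFrame(rows, columns=COLUMNS). With BeautifulSoup the same idea applies: element.get('A') looks up an attribute, whereas element.find(...) searches for child tags.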
-
Can not find file that is in resource folder - java
I have a question regarding file handling. I am automating a page using Selenium, and I need to upload a file on this page. I want to put the file in the resources folder and read its path in the test (since the OS, and therefore the path, differs between computers: Windows/Mac).
I put the file manually in the resources folder, which placed it in:
X:\Project_11_01_2021\src\test\resources
When I used the ClassLoader to find the file, it was not found. I saw that it is found if I manually put it in this path:
X:\Project_11_01_2021\out\test\resources
The problem is that I am using git: if I add the file to src\test\resources, it is uploaded to git and everyone gets the change, but when I put it in out\test\resources it is not displayed in the source tree to commit to git. Is there a way to make the ClassLoader search in the first location and not in the second?
/******* test *******/
public void entertax() throws Exception {
    WebDriver deiver2 = getWebDriver();
    Thread.sleep(1000);
    ClassLoader classLoader = getClass().getClassLoader();
    String path = classLoader.getResource("TAX12.pdf").getPath();
    System.out.println("\n\n path is " + path);
    deiver2.switchTo().activeElement();
    deiver2.findElement(By.xpath("//input[@type='file']"))
            .sendKeys("X:\\Project_11_01_2021\\out\\test\\resources\\fw8TAX12.pdf");
    System.out.println("END");
}
-
Can not get file path from resource folder - Java (exception)
I am trying to automate a page where I need to upload a PDF file. I want Selenium (via Java) to get the file from the project's resources folder rather than from a hard-coded path. (Some of us use Windows, some Mac and some Ubuntu, so the path has to be resolved dynamically.)
When I hard-code the path manually it works, but when I let Java resolve the path it crashes. This is my code:
/******* test *******/
public void testFileUpload() throws Exception {
    WebDriver deiver2 = getWebDriver();
    Thread.sleep(3000);
    deiver2.switchTo().activeElement();
    deiver2.findElement(By.xpath("//input[@type='file']"))
            .sendKeys("X:\\project\\src\\test\\resources\\TAX12.pdf");
    ClassLoader classLoader = getClass().getClassLoader();
    String path = classLoader.getResource("TAX12.pdf").getPath();
    System.out.println("\n\n path is " + path);
    System.out.println("END");
}
I want sendKeys to send the file. The file exists, but this does not work:
java.lang.NullPointerException
    at org.testng.Assert.fail(Assert.java:97)
    at tests.ExPubPaymentPageTest.tryFile(ExPubPaymentPageTest.java:58)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:134)
    at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:597)
    at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:173)
    at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46)
    at org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:816)
    at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:146)
    at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146)
    at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128)
    at java.util.ArrayList.forEach(ArrayList.java:1259)
    at org.testng.TestRunner.privateRun(TestRunner.java:766)
    at org.testng.TestRunner.run(TestRunner.java:587)
    at org.testng.SuiteRunner.runTest(SuiteRunner.java:384)
    at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:378)
    at org.testng.SuiteRunner.privateRun(SuiteRunner.java:337)
    at org.testng.SuiteRunner.run(SuiteRunner.java:286)
    at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:53)
    at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:96)
    at org.testng.TestNG.runSuitesSequentially(TestNG.java:1187)
    at org.testng.TestNG.runSuitesLocally(TestNG.java:1109)
    at org.testng.TestNG.runSuites(TestNG.java:1039)
    at org.testng.TestNG.run(TestNG.java:1007)
    at com.intellij.rt.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:66)
    at com.intellij.rt.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:109)
The file exists in the folder and the upload works if I set the path manually, but the file is not located when I want the code to find it by itself (for Windows / Mac / Linux etc.).