HtmlAgilityPack - Wait for JavaScript to run
I'm trying to get information from a webpage using HtmlAgilityPack in C#. My issue is that the page first loads a preliminary static page and then updates it with a .js script a few seconds later.
Is it possible to make HtmlAgilityPack wait for the script to finish before saving and parsing the HTML?
Or is there a better alternative?
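HtmlAgilityPack's HtmlDocument is a parser, not a browser: it parses whatever HTML string it is given and never executes JavaScript, so it cannot wait for a script on its own. One common approach is to let a browser-automation tool (for example Selenium WebDriver or PuppeteerSharp) render the page, poll until the dynamic content appears, and only then hand the HTML to HtmlDocument.LoadHtml. The sketch below shows only the polling step, using the base class library; getCurrentHtml is a placeholder you would wire to something like Selenium's driver.PageSource, and the class name, method name and readiness check are assumptions.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class RenderWait
{
    // Polls a caller-supplied snapshot of the page until isReady returns true,
    // then returns that HTML. Throws TimeoutException if it never becomes ready.
    public static string WaitForRenderedHtml(
        Func<string> getCurrentHtml,   // hypothetical hook, e.g. () => driver.PageSource
        Func<string, bool> isReady,    // e.g. html => html.Contains("id=\"results\"")
        TimeSpan timeout,
        TimeSpan pollInterval)
    {
        var clock = Stopwatch.StartNew();
        while (true)
        {
            string html = getCurrentHtml();
            if (isReady(html))
                return html; // pass this string to HtmlDocument.LoadHtml(...)

            if (clock.Elapsed >= timeout)
                throw new TimeoutException("Page content did not appear within " + timeout + ".");

            Thread.Sleep(pollInterval);
        }
    }
}
```

If the page's script fetches its data from a JSON endpoint, requesting that endpoint directly is often simpler than rendering the page at all.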
See also questions close to this topic
-
How to change Prettier formatting for React Native
Prettier doesn't format my React Native code correctly. I don't understand where to configure it, although formatting works well with Flutter.
From this code:
    import { View, Text } from 'react-native'
    import React from 'react'
    export default function App() {
      return (
        <View>
          <Text>Apps</Text>
        </View>
      )
    }
it is formatted into this:
    import { View, Text } from 'react-native'
    import React from 'react'
    export default function App() {
      return (
        < View >
          < Text > Apps < /Text>
        < /View>
      )
    }
-
MarkLogic server-side JavaScript and XQuery
I am just starting to use the NoSQL database MarkLogic and am trying to choose the best query language to learn and use in the future. On the server side, MarkLogic offers both a JavaScript API and an XQuery API.
I would like some advice: which language is better to concentrate on and learn, JavaScript or XQuery?
- Popover in a Chrome extension using JS
-
C# - Adding a condition to a Func results in a StackOverflowException
I have a Func, as part of a specification class, that sorts the given IQueryable:
Func<IQueryable<T>, IOrderedQueryable<T>>? Sort { get; set; }
When I add more than one condition to the Func, as below, it results in a stack overflow exception:
spec.OrderBy(sc => sc.Case.EndTime).OrderBy(sc => sc.Case.StartTime);
The OrderBy method is implemented like this:
    public ISpecification<T> OrderBy<TProperty>(Expression<Func<T, TProperty>> property)
    {
        _ = Sort == null
            ? Sort = items => items.OrderBy(property)
            : Sort = items => Sort(items).ThenBy(property);
        return this;
    }
Chaining or using separate lines doesn't make a difference.
The problem goes away if I create a new instance of the specification and set its Func, but I don't want to create a new instance every time. What am I missing here, and how can I reuse the same instance (if possible)?
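A likely cause: the lambda items => Sort(items).ThenBy(property) does not capture the old delegate; it reads the Sort property every time it runs. After the assignment, Sort refers to that very lambda, so invoking it calls itself until the stack overflows. Copying the current delegate into a local before reassigning should break the cycle and lets the same instance be reused. The minimal Specification<T> below is a hypothetical stand-in for the question's class, not its actual implementation.

```csharp
#nullable enable
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical, simplified stand-in for the question's specification class.
public class Specification<T>
{
    public Func<IQueryable<T>, IOrderedQueryable<T>>? Sort { get; private set; }

    public Specification<T> OrderBy<TProperty>(Expression<Func<T, TProperty>> property)
    {
        if (Sort == null)
        {
            Sort = items => items.OrderBy(property);
        }
        else
        {
            // Capture the delegate that exists *now*. Referring to the Sort
            // property inside the lambda would resolve to this lambda itself.
            var previous = Sort;
            Sort = items => previous(items).ThenBy(property);
        }
        return this;
    }
}
```

With this change, spec.OrderBy(s => s.Length).OrderBy(s => s) on a Specification<string> sorts by length first and alphabetically within equal lengths, and further calls keep appending ThenBy clauses to the same instance.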
-
How to project fields for a dictionary (C#, MongoDB)
I am trying my luck here. I have a model like the following:
    public class RowData : BaseBsonDefinition
    {
        // ...
        [BsonExtraElements]
        [BsonDictionaryOptions(DictionaryRepresentation.ArrayOfDocuments)]
        public Dictionary<string, object> Rows { get; set; } = new(StringComparer.OrdinalIgnoreCase);
        // ...
    }
As a result, the document in MongoDB looks like this:
    {
      "_id": { "$binary": { "base64": "HiuI1sgyT0OZmcgGUit2dw==", "subType": "03" } },
      "c1": "AAA",
      "c8": "Fully Vac",
      "c10": ""
    }
The c1, c8 and c10 fields are keys from the dictionary. My question is: how can I project those fields dynamically?
I tried
Builders<RowData>.Projection.Exclude(p => "c1")
It seems the MongoDB driver cannot handle a constant value directly.
Could anyone point me in the right direction?
Thanks,
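Because [BsonExtraElements] writes the dictionary keys as ordinary top-level fields, they can be addressed by name as plain strings rather than through a lambda: p => "c1" just returns a constant, which gives the driver no member to map to a field. The driver's projection builder should accept string field names (for example Builders<RowData>.Projection.Exclude("c1")), and ProjectionDefinition<T> also converts implicitly from a JSON string. The BCL-only sketch below builds such a JSON exclusion projection from key names known only at runtime; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

public static class DynamicProjection
{
    // Builds an exclusion projection such as {"c1":0,"c8":0} from runtime key names.
    // MongoDB field names are case-sensitive, so duplicates are removed ordinally.
    public static string ExcludeFields(IEnumerable<string> fieldNames)
    {
        Dictionary<string, int> projection = fieldNames
            .Distinct()
            .ToDictionary(name => name, _ => 0);
        return JsonSerializer.Serialize(projection);
    }
}
```

Through the typed builder, Builders<RowData>.Projection.Combine(keys.Select(k => Builders<RowData>.Projection.Exclude(k))) should express the same projection; use Include instead of Exclude to return only the listed keys.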
-
How do I add a new DataSource to an already data-bound CheckBoxList?
I'm building a web form that shows a database's items (tables, rows, FKs, ...).
I have a CheckBoxList of tables (chkListTable) that shows a new CheckBoxList of rows (chkListRow) every time the SelectedIndexChanged event fires on chkListTable. The problem is that I can show the rows when one item in chkListTable is selected, but I don't know how to show chkListRow when multiple items in chkListTable are selected. Here is my code:
aspx:
    <div>
        <asp:Label ID="Label2" runat="server" Text="Table: "></asp:Label>
        <asp:CheckBoxList ID="chkListTable" runat="server" DataTextField="name" DataValueField="name"
            AutoPostBack="true" OnSelectedIndexChanged="chkListTable_SelectedIndexChanged">
        </asp:CheckBoxList>
    </div>
    <div>
        <asp:CheckBoxList ID="chkListRow" runat="server" DataTextField="COLUMN_NAME" DataValueField="COLUMN_NAME"
            RepeatDirection="Horizontal">
        </asp:CheckBoxList>
    </div>
aspx.cs:
    protected void chkListTable_SelectedIndexChanged(object sender, EventArgs e)
    {
        tableName.Clear();
        foreach (ListItem item in chkListTable.Items)
        {
            if (item.Selected)
            {
                tableName.Add(item.Text.Trim());
            }
        }
        for (int i = 0; i < tableName.Count; i++)
        {
            String query = "USE " + dbname + " SELECT * FROM information_schema.columns"
                + " WHERE table_name = '" + tableName[i] + "'"
                + " AND COLUMN_NAME != 'rowguid'";
            chkListRow.DataSource = Program.ExecSqlDataReader(query);
            chkListRow.DataBind();
            Program.conn.Close();
        }
    }
Program.cs:
    public static bool Connect()
    {
        if (Program.conn != null && Program.conn.State == ConnectionState.Open)
            Program.conn.Close();
        try
        {
            Program.conn.ConnectionString = Program.constr;
            Program.conn.Open();
            return true;
        }
        catch (Exception e)
        {
            return false;
        }
    }

    public static SqlDataReader ExecSqlDataReader(String query)
    {
        SqlDataReader myreader;
        SqlCommand sqlcmd = new SqlCommand(query, Program.conn);
        sqlcmd.CommandType = CommandType.Text;
        if (Program.conn.State == ConnectionState.Closed)
            Program.conn.Open();
        try
        {
            myreader = sqlcmd.ExecuteReader();
            return myreader;
            myreader.Close();
        }
        catch (SqlException ex)
        {
            Program.conn.Close();
            return null;
        }
    }
I want my display to be like this:
    [x]Table1 [x]Table2 [ ]Table3
    [ ]Row1(Table1) [ ]Row2(Table1) [ ]Row3(Table1) [ ]Row1(Table2) [ ]Row2(Table2)
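In the handler, every pass through the for loop assigns a new DataSource to chkListRow and calls DataBind(), which replaces whatever the previous pass bound, so only the last selected table's columns remain. One way around this is to collect the columns of all selected tables first and bind once after the loop. The sketch below shows that collecting step with plain BCL types; ColumnItem and ColumnListBuilder are hypothetical names, and reading each SqlDataReader into (table, column) pairs is left to the existing code.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical binding item: use DataTextField="Text" and DataValueField="Value".
public sealed class ColumnItem
{
    public string Text { get; set; } = "";
    public string Value { get; set; } = "";
}

public static class ColumnListBuilder
{
    // Flattens (table, column) pairs from all selected tables into a single list,
    // labelled like "Row1(Table1)", so the CheckBoxList is bound only once.
    public static List<ColumnItem> Build(IEnumerable<(string Table, string Column)> columns)
    {
        return columns
            .Select(c => new ColumnItem
            {
                Text = c.Column + "(" + c.Table + ")",
                Value = c.Table + "." + c.Column // keeps values unique across tables
            })
            .ToList();
    }
}
```

In the page, chkListRow.DataTextField and DataValueField would then be set to "Text" and "Value", the list assigned to DataSource, and DataBind() called once after the loop. Passing the table name as a SqlCommand parameter (for example with Parameters.AddWithValue("@table", ...)) instead of concatenating it into the WHERE clause would also avoid SQL injection.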
-
Downloading a JSON string from a webpage fails
I am trying to download a JSON object from this particular URL:
https://www.dhl.de/int-verfolgen/data/search?piececode=00000
I tried using
Dim JSON As String = New System.Net.WebClient().DownloadString(URL)
and also with HtmlAgilityPack:
    Dim CurrWebPage As New HtmlAgilityPack.HtmlWeb
    Dim CurrHTMLDoc As HtmlAgilityPack.HtmlDocument
    CurrHTMLDoc = CurrWebPage.LoadFromBrowser(URL)
but that does not work either.
The program just stops at this point.
What makes this page special? Can someone please help me get this JSON string?
Note: I do not get any exception. It behaves like a deadlock; the debugger just doesn't continue.
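WebClient gives no hint about why it hangs. One possible explanation, not verified for this URL, is that the server treats requests without browser-like headers as bots and stalls or drops them. The sketch below uses HttpClient with explicit headers and relies on a finite HttpClient.Timeout, so a stall should end in a TaskCanceledException instead of an endless wait. The header values and the JsonDownloader name are assumptions, and the server may still require other headers or cookies.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static class JsonDownloader
{
    // Builds a GET request that identifies itself like a browser and asks for JSON.
    public static HttpRequestMessage CreateRequest(string url)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, url);
        request.Headers.UserAgent.ParseAdd("Mozilla/5.0 (Windows NT 10.0; Win64; x64)");
        request.Headers.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
        return request;
    }

    // The caller's HttpClient should have a finite Timeout so a stalled server
    // surfaces as TaskCanceledException rather than hanging forever.
    public static async Task<string> DownloadJsonAsync(HttpClient client, string url)
    {
        using var request = CreateRequest(url);
        using var response = await client.SendAsync(request);
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```

For example, create one new HttpClient { Timeout = TimeSpan.FromSeconds(15) } and await DownloadJsonAsync(client, URL). Awaiting the call, rather than blocking on .Result from a UI thread, also avoids the classic sync-over-async deadlock.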
-
C# Remove HTML numeric character codes from HtmlAgilityPack inner text
I would like to know whether there is a way to remove numeric character codes (such as &#91;) from the inner text retrieved from a site node using HtmlAgilityPack.
Programming language: C#
Below is the code I used to get the inner text:
    HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
    HtmlAgilityPack.HtmlDocument htmlDocument = web.Load(result[0] /* Here comes the url of a wikipedia's page */);
    result_title.Text = result[1];
    result_content.Text = "";
    for (int i = 2; i <= 4; i++)
    {
        try
        {
            foreach (var item in htmlDocument.DocumentNode.SelectNodes("/html/body/div[3]/div[3]/div[5]/div[1]/p[" + i + "]"))
            {
                result_content.Text += item.InnerText;
            }
        }
        catch (NullReferenceException)
        {
            result_content.Text = "This is a content page. Please refer the url";
        }
        catch (Exception)
        {
            break;
        }
    }
    System.Windows.Documents.Hyperlink hyperlink = new System.Windows.Documents.Hyperlink
    {
        NavigateUri = new Uri("" + result[0]),
    };
    hyperlink.Inlines.Add("Read More...");
    hyperlink.RequestNavigate += Hyperlink_RequestNavigate;
    result_content.Inlines.Add(hyperlink);
    Search_Progress.Visibility = Visibility.Collapsed;
    selected_result_scroll.Visibility = Visibility.Visible;
The output is below:
As you can see in the image, the output consists of the inner text grabbed from the Wikipedia page's body. I would like to know whether there is any way to remove those numeric character codes from it (the ones marked in red).
The text shown on the Wikipedia site is below (in case you want to know what those codes display as on the web):
Ambareesha is a 2014 Indian Kannada-language action film directed and produced by Mahesh Sukhadhare under the Sri Sukhadhare Pictures banner.[1][2] The film stars Darshan, Rachita Ram and Priyamani. Dr.Ambareesh and his wife Sumalatha Ambareesh will be seen in guest roles.[3] The soundtrack and score is composed by V. Harikrishna and the cinematography is by Ramesh Babu.
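As the output in the question suggests, InnerText here returns the citation brackets still encoded, for example &#91;1&#93; for [1]. A minimal sketch, assuming you want both the entities decoded and the [n] citation markers dropped: decode with WebUtility.HtmlDecode, then strip bracketed numbers with a regular expression. HtmlAgilityPack's own HtmlEntity.DeEntitize could be used for the decoding step instead; WikiText is a hypothetical name.

```csharp
using System.Net;
using System.Text.RegularExpressions;

public static class WikiText
{
    // Matches citation markers such as [1] or [23] after entities are decoded.
    private static readonly Regex CitationMarker = new Regex(@"\[\d+\]", RegexOptions.Compiled);

    public static string Clean(string innerText)
    {
        // "&#91;1&#93;" becomes "[1]", "&amp;" becomes "&", and so on.
        string decoded = WebUtility.HtmlDecode(innerText);
        return CitationMarker.Replace(decoded, string.Empty);
    }
}
```

In the loop, result_content.Text += WikiText.Clean(item.InnerText); would then append text without the markers. Removing the sup elements with class "reference" from each paragraph before reading InnerText is another option.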
-
My XPath with double slash always selects the same element from the HtmlAgilityPack result
I'm trying to scrape the list of songs for a given date from the Billboard website. After obtaining a list of divs (one per song), I try to get the title and the artist from each div, but my foreach loop always returns the same values, as if it never moves on to the next div.
    using HtmlAgilityPack;
    using System;
    using System.Collections.Generic;

    namespace Billboard_Scraping
    {
        internal class Program
        {
            static void Main(string[] args)
            {
                var billBoardMusic = GetMusicList("https://www.billboard.com/charts/hot-100/2000-08-12");
            }

            static List<Music> GetMusicList(string url)
            {
                var musics = new List<Music>();
                HtmlWeb web = new HtmlWeb();
                HtmlDocument document = web.Load(url);
                HtmlNodeCollection linknode = document.DocumentNode.SelectNodes("//div[contains(@class,\"o-chart-results-list-row-container\")]");
                foreach (var link in linknode)
                {
                    var music = new Music();
                    var titleXPath = "//h3[contains(@class,\"c-title\")]";
                    var artistXPath = "//span[contains(@class,\"c-label a-no-trucate\")]";
                    music.Title = link.SelectSingleNode(titleXPath).InnerText.Trim();
                    music.Autor = link.SelectSingleNode(artistXPath).InnerText.Trim();
                    musics.Add(music);
                }
                return musics;
            }
        }

        public class Music
        {
            public string Title { get; set; }
            public string Autor { get; set; }
        }
    }
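In XPath, an expression that starts with // is evaluated from the document root even when it is run against a specific node, so link.SelectSingleNode("//h3[...]") returns the first matching h3 in the whole page on every iteration. Prefixing it with a dot (.//h3[...]) restricts the search to descendants of the current node. HtmlAgilityPack follows standard XPath here, so the difference can be demonstrated with System.Xml alone; the sketch below does that on a tiny hypothetical document, and XPathScopeDemo is an illustrative name.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Xml;

public static class XPathScopeDemo
{
    // Runs titleXPath against each row and collects what it finds.
    public static List<string> TitlesPerRow(string xml, string titleXPath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        var titles = new List<string>();
        foreach (XmlNode row in doc.SelectNodes("//div[@class='row']")!)
        {
            // "//h3" restarts at the document root on every row;
            // ".//h3" only searches below the current row.
            XmlNode? title = row.SelectSingleNode(titleXPath);
            titles.Add(title?.InnerText.Trim() ?? "");
        }
        return titles;
    }
}
```

Applied to the question's loop, the two expressions would become .//h3[contains(@class,"c-title")] and .//span[contains(@class,"c-label a-no-trucate")].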