Puppeteer: Get selector from ElementHandle OR set page.on('request'...) for new pages

I am trying to grab the relevant clickable elements on a page and then click each of them individually, so that I can capture the intermediate redirects and the final landing URL.

const page = await browser.newPage();
const url = "someurl";

await page.goto(url, {
    waitUntil: 'networkidle0',
    timeout: 30000
});

//Grab the clickable elements
let anchors, buttons, inputs, onclicks, elements;
anchors = await page.$$('a');
buttons = await page.$$('button');
inputs = await page.$$('input[type=submit]');
onclicks = await page.$$('[onclick]');
elements = anchors.concat(buttons, inputs, onclicks);

// Click each element and wait for navigation
for (var i = elements.length - 1; i >= 0; i--) {
    await Promise.all([
        elements[i].click(),
        page.waitForNavigation({waitUntil: 'networkidle0'})
    ]);
    // Go back to page and click on next element
    await page.goBack({
        waitUntil: 'networkidle0',
        timeout: 30000
    });
    // Alternatively: reload page
    //await page.goto(url, {
    //    waitUntil: 'networkidle0',
    //    timeout: 30000
    //});
}
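A workaround that avoids keeping handles across navigations is to re-query the elements after every reload and address them by index. Here is a minimal sketch. It assumes the page renders the same elements in the same order on every load. `clickEachAndRecord` and `CLICKABLE` are hypothetical names, not Puppeteer APIs. The function takes the `page` as a parameter, so it needs no imports of its own.

```javascript
// Sketch (hypothetical helper): ElementHandles die with the document they came
// from, so fetch fresh handles on every iteration and address them by index.
// A single combined selector returns each element once, in document order.
const CLICKABLE = 'a, button, input[type=submit], [onclick]';

async function clickEachAndRecord(page, url, selector = CLICKABLE) {
  const navOptions = { waitUntil: 'networkidle0', timeout: 30000 };
  const landingUrls = [];

  await page.goto(url, navOptions);
  const count = (await page.$$(selector)).length;

  for (let i = 0; i < count; i++) {
    const elements = await page.$$(selector); // handles for the current document
    if (i >= elements.length) break;          // the page changed shape; stop early

    try {
      // Start waiting before clicking so a fast navigation is not missed.
      await Promise.all([
        page.waitForNavigation(navOptions),
        elements[i].click(),
      ]);
      landingUrls.push(page.url());
    } catch (err) {
      // The click failed, or it did not navigate within the timeout.
      landingUrls.push(null);
    }

    // Reload the starting page so the next index refers to the same layout.
    await page.goto(url, navOptions);
  }
  return landingUrls;
}
```

Unlike `anchors.concat(buttons, inputs, onclicks)`, the combined selector does not list an element twice (for example, an `<a onclick=...>`). Elements that never navigate cost one full timeout each, so the 30-second value is worth tuning.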

Warning: the code above does NOT work. Once you navigate away from the page, the ElementHandles obtained from it are disposed of, no matter how you get back to the original page (goBack() or goto()). I've also tried element.click({ button: 'middle' }), which opens each element in a new tab/page. However, I can't capture the intermediate redirects that way: the new Page object is created separately, and it has already redirected by the time I can get hold of it and attach the page.on listener shown below.

// saveRedirect is my own function; in current Puppeteer, request.url() is a method
page.on('request', interceptedRequest => {
    saveRedirect(interceptedRequest.url());
});
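If the goal is the HTTP redirect chain of a single click, Puppeteer can report it directly. The response returned by `page.waitForNavigation()` exposes `response.request().redirectChain()`. Here is a minimal sketch, assuming a current Puppeteer version. `clickAndGetRedirects` is a hypothetical helper name. Only server-side (3xx) redirects appear in the chain; JavaScript or meta-refresh redirects start separate navigations and are not included.

```javascript
// Sketch (hypothetical helper): click an element, wait for the resulting
// navigation, and read the HTTP redirect chain from the final response.
async function clickAndGetRedirects(page, elementHandle) {
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle0', timeout: 30000 }),
    elementHandle.click(),
  ]);

  if (!response) {
    // Same-document navigations (e.g. #hash links, History API) resolve to null.
    return { redirects: [], finalUrl: page.url() };
  }

  // redirectChain() lists the requests that were redirected (empty if none);
  // the final request itself is not part of the chain.
  const redirects = response.request().redirectChain().map(req => req.url());
  return { redirects, finalUrl: response.url() };
}
```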

I also tried getting the CSS selector or XPath from the ElementHandle, so that I could use it to click on elements later, but the documentation is pretty slim on what an ElementHandle's properties look like. So I tried printing out the Map to see whether the selector or XPath was a property, but I got the error TypeError: Converting circular structure to JSON. I tried quite a few other things, like Map.values() and iterating over the map, but I get either this TypeError or an empty array.
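An ElementHandle is a Node-side reference to an object that lives in the browser. Its own JavaScript properties are Puppeteer internals (connection and execution-context objects), which is why serializing it hits a circular structure. To see the element itself, one option is to pass the handle into `page.evaluate` and copy serializable fields back. Here is a minimal sketch. `describeElement` and the fields it picks are hypothetical choices for illustration.

```javascript
// Sketch (hypothetical helper): the callback runs in the browser with the real
// DOM element; only the plain object it returns is serialized back to Node.
async function describeElement(page, handle) {
  return page.evaluate(el => ({
    tag: el.tagName.toLowerCase(),
    id: el.id,
    href: el.getAttribute('href'),
    text: (el.textContent || '').trim().slice(0, 80),
    outerHTML: el.outerHTML.slice(0, 200),
  }), handle);
}
```

For example, `console.log(await describeElement(page, elements[0]))` prints a small plain object instead of throwing.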

I could use page.evaluate(() => document.querySelectorAll('a')) for each kind of element I need and store the results, but, correct me if I'm wrong, I'd run into the same problem of losing context once I navigate off the page.
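DOM nodes returned from `page.evaluate` do not come back to Node as usable elements. Plain strings do, and they stay valid after navigation. For links specifically, one option is to collect the resolved `href` values with `page.$$eval` and visit each one with `page.goto`. Here is a minimal sketch. `collectLinkTargets` is a hypothetical name. Buttons and `onclick` handlers have no URL to extract, so they would still need to be clicked.

```javascript
// Sketch (hypothetical helper): extract plain URL strings, which survive
// navigation, instead of DOM nodes or ElementHandles.
async function collectLinkTargets(page) {
  // a.href is the absolute, resolved URL (unlike getAttribute('href')).
  const hrefs = await page.$$eval('a[href]', anchors => anchors.map(a => a.href));
  // Drop duplicates and non-HTTP schemes such as mailto: or javascript:.
  return [...new Set(hrefs)].filter(h => /^https?:/i.test(h));
}
```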

So my questions:

  1. Is there a way to set a page.on listener like the one above on every new page globally, or otherwise handle all responses for the whole browser?
  2. Alternatively, is there a way to keep ElementHandle references valid after navigating?
  3. Failing that, is there a way to get the CSS path or XPath of the element from the ElementHandle object?
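For question 3, Puppeteer does not store a selector on the handle, but one can be computed in the page. Here is a minimal sketch that builds a `tag:nth-of-type(n)` path up to the nearest ancestor with an id. `cssPathOf` is a hypothetical helper. The path is only as stable as the page: it assumes ids are unique and that the DOM structure is the same after navigating back.

```javascript
// Sketch (hypothetical helper): compute a CSS path for the element behind a
// handle, so it can be re-queried later with page.$(path).
async function cssPathOf(page, handle) {
  return page.evaluate(el => {
    const parts = [];
    for (let node = el; node && node.nodeType === 1; node = node.parentElement) {
      if (node.id) {
        // Assume ids are unique and stop at the first one found.
        parts.unshift('#' + CSS.escape(node.id));
        break;
      }
      let part = node.tagName.toLowerCase();
      const parent = node.parentElement;
      if (parent) {
        // Disambiguate among siblings that share the same tag name.
        const sameTag = Array.from(parent.children).filter(c => c.tagName === node.tagName);
        if (sameTag.length > 1) part += `:nth-of-type(${sameTag.indexOf(node) + 1})`;
      }
      parts.unshift(part);
    }
    return parts.join(' > ');
  }, handle);
}
```

After `await page.goto(url)`, `await page.$(path)` gives a fresh handle for the same element, provided the page renders identically.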

I appreciate any help. Big thanks to Puppeteer; it is a very robust tool, and as far as I've seen, development moves very quickly.

1 answer

  • answered 2017-11-15 20:43 Bobby Singh

    Figured it out. Whenever a new target (a page, a service worker, etc.) is created, the browser emits a 'targetcreated' event, and the handler receives a Target object. If the target is a page, target.page() resolves to that Page; otherwise it resolves to null. From there you can attach a page.on listener to capture any redirects/navigation.

    const elements = await page.$$('a');
    // Resolve with the Page of the next target the browser creates (null if it is not a page)
    const newPagePromise = new Promise(fulfill => browser.once('targetcreated', target => fulfill(target.page())));
    await elements[0].click({ button: 'middle' });
    const newPage = await newPagePromise;
    if (newPage != null) {
        console.log(`Got new page ${newPage.url()}`);
        newPage.on('request', interceptedRequest => {
            console.log(`-------InterceptedRequest.url:${interceptedRequest.url()}`);
        });
    }
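`browser.once` only catches the next target. To cover question 1 (every new page, browser-wide), the same idea can be registered with `browser.on`. Here is a minimal sketch. `watchAllPages` and its `onRequest` callback are hypothetical names. Because the listener is attached only after `target.page()` resolves, requests issued before that point (including very fast redirects) may still be missed.

```javascript
// Sketch (hypothetical helper): attach a 'request' listener to every page the
// browser creates from now on. Returns a function that stops watching.
function watchAllPages(browser, onRequest) {
  const handler = async target => {
    if (target.type() !== 'page') return; // skip service workers, etc.
    const newPage = await target.page();
    if (!newPage) return;
    newPage.on('request', request => onRequest(newPage, request.url()));
  };
  browser.on('targetcreated', handler);
  return () => browser.off('targetcreated', handler);
}
```

For example, calling `const stop = watchAllPages(browser, (p, url) => saveRedirect(url));` before the middle-clicks records the requests of every tab that opens, and calling `stop()` afterwards detaches the browser-level listener.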