Selenium get URL of "a" Tag without href attribute

ghz 12hours ago ⋅ 5 views

I'm facing an element like:

<li _ngcontent-bcp-c271="">
   <a _ngcontent-bcp-c271="">2018</a>
   <!---->
   <!---->
</li>

This element is clickable, but it has no href attribute, so I assume a script handles the click event. I can't find a way to get the URL from this element.

The code that I use most of the time is as follows:

driver.find_element(By.TAG_NAME, 'li').find_element(By.TAG_NAME, 'a').get_attribute('href')

Update: I need to know the URL before I click on the button.

Answer

You're correct that in many cases, clickable elements like the one you showed might not have a direct href attribute because the URL is dynamically handled by JavaScript (likely by triggering an event on click, such as navigating or fetching content via AJAX).

If the a tag does not have an href attribute, you need to investigate how the URL is determined and/or how the JavaScript handles the navigation. There are a few ways you can proceed to extract the URL before clicking:

1. Check for JavaScript Event Listeners on the Element

The _ngcontent-... attributes in your markup indicate an Angular application, so the URL is most likely produced by a click handler or the router rather than stored in the markup. Here's what you can do:

  • Look for the event listener: Open the browser's Developer Tools (DevTools), select the element in the Elements panel, and check the Event Listeners pane for handlers attached to it or to its ancestors. Once you identify the click handler, you can follow the link to the JavaScript code that generates the URL.

  • Monitor network requests: Another approach is to monitor the network requests made when the element is clicked. You can use the Network tab in DevTools to track any requests sent when the element is clicked. This can often reveal the URL being fetched.

2. Use JavaScript to Retrieve the URL

If the URL is determined by a script when the element is clicked, you can attempt to simulate the click event using JavaScript and inspect what happens:

Approach 1: Using JavaScript in Selenium to Extract the URL

You can execute a JavaScript snippet to simulate the click event and possibly retrieve the URL that the script generates.

Here’s how you can do it with Selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By

# Setup WebDriver
driver = webdriver.Chrome()

# Navigate to the page
driver.get("your_page_url")

# Find the element
element = driver.find_element(By.TAG_NAME, 'li').find_element(By.TAG_NAME, 'a')

# Use JavaScript to simulate the click and get the URL (if URL is dynamically generated)
url = driver.execute_script("""
    var element = arguments[0];
    var event = new MouseEvent('click', {
        'bubbles': true,
        'cancelable': true,
        'view': window
    });
    element.dispatchEvent(event);
    // Try to capture the URL if it's set in window.location or some global variable
    return window.location.href;  // or check some other global variable where the URL might be stored
""", element)

print("URL:", url)

This snippet dispatches a click event and then reads window.location.href. Note that it does perform the click, so the page may navigate away. If the script navigates asynchronously, the value returned here can still be the old URL, in which case you need to wait for the address to change (or read some other variable, depending on how the site works).
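
If clicking once to discover the URL and then returning is acceptable, the waiting step could look roughly like the sketch below. It assumes the click changes driver.current_url and that going back restores the original page; url_after_click, timeout and poll are hypothetical names, not part of Selenium.

```python
import time


def url_after_click(driver, element, timeout=5.0, poll=0.1):
    """Click `element`, wait for the address to change, then go back.

    `driver` is a Selenium WebDriver and `element` a WebElement.
    Returns the new URL, or None if the URL did not change in time.
    """
    start_url = driver.current_url
    element.click()

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        current = driver.current_url
        if current != start_url:
            driver.back()  # return to the page we started from
            return current
        time.sleep(poll)
    return None  # assumption violated: the click did not change the URL
```

After driver.back() the previously located elements are usually stale, so locate them again before collecting the next URL.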

Approach 2: Check for JavaScript Variables or Methods

Sometimes the URL might be stored in a JavaScript variable, or it could be fetched via a function call. You can try to evaluate JavaScript code to capture that.

For example, if the script that runs on click sets a variable or triggers a function that contains the URL, you can extract that information using:

url = driver.execute_script('return someJavascriptVariableOrFunction();')
print("URL:", url)

3. Monitor Network Requests

If the element triggers an HTTP request (AJAX or full page load), you can use browser developer tools or programmatically monitor the network traffic to capture the URL.

  • Chrome DevTools: In the Network tab, look for the request triggered when you click the element. The URL for that request will be the one you're looking for.

  • Selenium and BrowserMob Proxy: You can also use BrowserMob Proxy or mitmproxy to intercept the HTTP requests sent by the browser when interacting with the page. This allows you to capture the exact URL that gets triggered.

Example using BrowserMob Proxy:

from browsermobproxy import Server
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

# Set path to your BrowserMob Proxy binary
server = Server("path_to_browsermob_proxy")
server.start()
proxy = server.create_proxy()

# Set up WebDriver with proxy (Selenium 4 style: pass the proxy through Options)
options = Options()
options.proxy = proxy.selenium_proxy()
driver = webdriver.Firefox(options=options)

# Start the WebDriver session
driver.get("your_page_url")

# Start capturing the traffic
proxy.new_har("capture_traffic")

# Interact with the page (click on the element)
element = driver.find_element(By.TAG_NAME, 'li').find_element(By.TAG_NAME, 'a')
element.click()

# Retrieve the URL from captured traffic
har = proxy.har  # This is the captured HTTP Archive (HAR) data
for entry in har['log']['entries']:
    print(entry['request']['url'])  # Extract URLs
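
A capture usually contains many unrelated requests (scripts, images, analytics), so it can help to filter the entries. The sketch below works on the HAR dictionary only; urls_from_har and its contains parameter are hypothetical names, and the substring filter is just one possible way to narrow the list.

```python
def urls_from_har(har, contains=None):
    """Return request URLs from a HAR dict, in order, without duplicates.

    If `contains` is given, keep only URLs that include that substring.
    """
    seen = set()
    urls = []
    for entry in har.get("log", {}).get("entries", []):
        url = entry.get("request", {}).get("url")
        if not url or url in seen:
            continue
        if contains is not None and contains not in url:
            continue
        seen.add(url)
        urls.append(url)
    return urls
```

For example, urls_from_har(proxy.har, contains="2018") would list only the captured requests whose URL mentions 2018. Keep in mind that a purely client-side route change may not produce a request for the page URL at all.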

4. Look for data- Attributes

Another possibility is that the URL is stored in a data- attribute or some other custom attribute. You can check if the element contains any such attributes.

url = driver.find_element(By.TAG_NAME, 'li').find_element(By.TAG_NAME, 'a').get_attribute('data-url')

If the a element has a data-url attribute or similar, you can retrieve it and use that URL.
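
If you don't know the attribute name in advance, you can dump every attribute of the element and look for values that resemble a URL. This is a minimal sketch: all_attributes and url_like are hypothetical helpers, and the list of prefixes is only a guess at what a URL or route might start with on your site.

```python
# JavaScript that copies every attribute of the element into a plain object,
# which Selenium returns to Python as a dict.
ALL_ATTRIBUTES_JS = """
    var attrs = {};
    for (var i = 0; i < arguments[0].attributes.length; i++) {
        var a = arguments[0].attributes[i];
        attrs[a.name] = a.value;
    }
    return attrs;
"""


def all_attributes(driver, element):
    """Return {attribute name: value} for a WebElement."""
    return driver.execute_script(ALL_ATTRIBUTES_JS, element)


def url_like(attrs):
    """Keep only attributes whose value starts like a URL or path."""
    prefixes = ("http://", "https://", "/", "./", "../", "#/")
    return {
        name: value
        for name, value in attrs.items()
        if isinstance(value, str) and value.startswith(prefixes)
    }
```

Usage would be url_like(all_attributes(driver, element)). For the element in the question this returns an empty dict, since its only attribute is _ngcontent-bcp-c271 with an empty value, which confirms the URL is not stored in the markup.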

Summary:

  • Monitor the network requests: Check the browser's network traffic to see what happens when the element is clicked.
  • Simulate the click using JavaScript: You can trigger the click and try to capture the resulting URL using window.location.href or other global variables.
  • Check for data- attributes: The URL might be stored in a custom attribute like data-url.
  • Use browser dev tools: Inspect the element in the browser to see if there are event listeners or JavaScript code that sets or redirects to a URL.

By combining these methods, you should be able to track down how the URL is generated and retrieve it before performing the click action.