How to bypass the alert window while downloading a zipfile?

ghz 2days ago ⋅ 11 views

If I open the link: https://dibbs2.bsm.dla.mil/Downloads/RFQ/Archive/ca210731.zip

This link shows the window and I need to press the OK button and it downloads the file.

The alert is not from the browser, it is from the page itself.

But When I tried the script:

from io import BytesIO
from zipfile import ZipFile
import requests


def get_zip(file_url):
    url = requests.get(file_url)
    zipfile = ZipFile(BytesIO(url.content))
    zipfile.extractall("")

file_link ='https://dibbs2.bsm.dla.mil/Downloads/RFQ/Archive/ca210731.zip'

get_zip(file_link)

This throws the error:

zipfile.BadZipFile: File is not a zip file

And when I tried:

import requests

url = r'https://dibbs2.bsm.dla.mil/Downloads/RFQ/Archive/ca210731.zip'
output = r'downloadedfile.zip'

r = requests.get(url)
with open(output, 'wb') as f:
    f.write(r.content)

This downloads the content of the page showing the OK button. Any idea how to solve this:, the link downloads the zip file.

Answer

The issue arises because the file you are trying to download is not directly accessible; instead, the webpage requires user interaction to confirm (via an "OK" button) before serving the file. This confirmation process likely involves cookies, headers, or form submissions that your requests library does not automatically handle.

You can solve this by simulating the browser behavior using a tool like Selenium, which can interact with the webpage and handle such dialogs.


Using Selenium to Automate the Download

  1. Install Selenium and a WebDriver: Install Selenium using pip:

    pip install selenium
    

    Download a WebDriver (e.g., ChromeDriver for Chrome) that matches your browser version.

  2. Automate the Download: Use Selenium to interact with the webpage and click the "OK" button to initiate the download.

Here’s an example script:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
import time

# Set up WebDriver (make sure to download and specify the path to your WebDriver)
driver = webdriver.Chrome(executable_path="path_to_chromedriver")

try:
    # Open the URL
    driver.get("https://dibbs2.bsm.dla.mil/Downloads/RFQ/Archive/ca210731.zip")

    # Wait for the "OK" button to appear
    time.sleep(5)  # Adjust this depending on your network speed

    # Click the "OK" button
    ok_button = driver.find_element(By.XPATH, "//input[@value='OK']")
    ok_button.click()

    # Wait for the download to complete
    time.sleep(10)  # Adjust this as needed based on file size and download speed

finally:
    # Close the browser
    driver.quit()

Notes:

  1. Download Location: By default, the file will download to the browser's default download directory. To specify a custom download location, configure browser options:

    from selenium.webdriver.chrome.options import Options
    
    chrome_options = Options()
    chrome_options.add_experimental_option("prefs", {
        "download.default_directory": "/path/to/your/download/directory",
        "download.prompt_for_download": False,
    })
    driver = webdriver.Chrome(executable_path="path_to_chromedriver", options=chrome_options)
    
  2. Headless Mode: If you don’t want the browser window to appear, you can run the browser in headless mode:

    chrome_options.add_argument("--headless")
    chrome_options.add_argument("--disable-gpu")
    

Alternative (Manual Cookies and Headers with Requests)

If you can identify the HTTP request sent after clicking the "OK" button (using browser developer tools), you may be able to mimic the behavior using requests. This requires:

  • Capturing the headers and cookies after the "OK" interaction.
  • Reproducing the exact request.

For example:

import requests

headers = {
    # Add headers captured from the browser
}
cookies = {
    # Add cookies captured from the browser
}
url = "https://dibbs2.bsm.dla.mil/Downloads/RFQ/Archive/ca210731.zip"

response = requests.get(url, headers=headers, cookies=cookies)
with open("downloadedfile.zip", "wb") as f:
    f.write(response.content)

However, using Selenium is often simpler for such cases.