If I open the link: https://dibbs2.bsm.dla.mil/Downloads/RFQ/Archive/ca210731.zip
This link shows the window and I need to press the OK button and it downloads the file.
The alert is not from the browser, it is from the page itself.
But When I tried the script:
from io import BytesIO
from zipfile import ZipFile
import requests
def get_zip(file_url):
url = requests.get(file_url)
zipfile = ZipFile(BytesIO(url.content))
zipfile.extractall("")
file_link ='https://dibbs2.bsm.dla.mil/Downloads/RFQ/Archive/ca210731.zip'
get_zip(file_link)
This throws the error:
zipfile.BadZipFile: File is not a zip file
And when I tried:
import requests
url = r'https://dibbs2.bsm.dla.mil/Downloads/RFQ/Archive/ca210731.zip'
output = r'downloadedfile.zip'
r = requests.get(url)
with open(output, 'wb') as f:
f.write(r.content)
This downloads the content of the page showing the OK button. Any idea how to solve this:, the link downloads the zip file.
Answer
The issue arises because the file you are trying to download is not directly accessible; instead, the webpage requires user interaction to confirm (via an "OK" button) before serving the file. This confirmation process likely involves cookies, headers, or form submissions that your requests
library does not automatically handle.
You can solve this by simulating the browser behavior using a tool like Selenium, which can interact with the webpage and handle such dialogs.
Using Selenium to Automate the Download
-
Install Selenium and a WebDriver: Install Selenium using pip:
pip install selenium
Download a WebDriver (e.g., ChromeDriver for Chrome) that matches your browser version.
-
Automate the Download: Use Selenium to interact with the webpage and click the "OK" button to initiate the download.
Here’s an example script:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
import time
# Set up WebDriver (make sure to download and specify the path to your WebDriver)
driver = webdriver.Chrome(executable_path="path_to_chromedriver")
try:
# Open the URL
driver.get("https://dibbs2.bsm.dla.mil/Downloads/RFQ/Archive/ca210731.zip")
# Wait for the "OK" button to appear
time.sleep(5) # Adjust this depending on your network speed
# Click the "OK" button
ok_button = driver.find_element(By.XPATH, "//input[@value='OK']")
ok_button.click()
# Wait for the download to complete
time.sleep(10) # Adjust this as needed based on file size and download speed
finally:
# Close the browser
driver.quit()
Notes:
-
Download Location: By default, the file will download to the browser's default download directory. To specify a custom download location, configure browser options:
from selenium.webdriver.chrome.options import Options chrome_options = Options() chrome_options.add_experimental_option("prefs", { "download.default_directory": "/path/to/your/download/directory", "download.prompt_for_download": False, }) driver = webdriver.Chrome(executable_path="path_to_chromedriver", options=chrome_options)
-
Headless Mode: If you don’t want the browser window to appear, you can run the browser in headless mode:
chrome_options.add_argument("--headless") chrome_options.add_argument("--disable-gpu")
Alternative (Manual Cookies and Headers with Requests)
If you can identify the HTTP request sent after clicking the "OK" button (using browser developer tools), you may be able to mimic the behavior using requests
. This requires:
- Capturing the headers and cookies after the "OK" interaction.
- Reproducing the exact request.
For example:
import requests
headers = {
# Add headers captured from the browser
}
cookies = {
# Add cookies captured from the browser
}
url = "https://dibbs2.bsm.dla.mil/Downloads/RFQ/Archive/ca210731.zip"
response = requests.get(url, headers=headers, cookies=cookies)
with open("downloadedfile.zip", "wb") as f:
f.write(response.content)
However, using Selenium is often simpler for such cases.