Using pypdfocr library from within a Python script

ghz 12hours ago ⋅ 2 views

How can you run pypdfocr from within a Python script, as opposed to the command line?

import pypdfocr
from pypdfocr import pypdfocr
from pypdfocr.pypdfocr import PyPDFOCR as pocr

filepath = 'C:/myfolder/myPDF.pdf'

newfile = pocr.run_conversion(filepath)

This throws an error:

Unbound method  run_conversion must be called with PyPDFOCR instance as first argument.

Can someone help me fill in the (likely obvious) missing piece?

Answer

To run pypdfocr from within a Python script, you need to correctly instantiate the PyPDFOCR class before calling the run_conversion method. The error message you are seeing (Unbound method run_conversion must be called with PyPDFOCR instance as first argument) suggests that you're trying to call the method on the class directly, rather than on an instance of the class.

Here's how you can modify your code to correctly use pypdfocr:

  1. Import the PyPDFOCR class.
  2. Create an instance of the class.
  3. Call the run_conversion method on that instance.

Here's the corrected version of your code:

import pypdfocr
from pypdfocr.pypdfocr import PyPDFOCR

# Create an instance of PyPDFOCR
pocr = PyPDFOCR()

# Path to your PDF file
filepath = 'C:/myfolder/myPDF.pdf'

# Run the conversion
newfile = pocr.run_conversion(filepath)

# Optionally, print the output
print(f'Converted file saved at: {newfile}')

Explanation:

  1. Instantiating PyPDFOCR: The method run_conversion needs to be called on an instance of PyPDFOCR. You are essentially creating an object (pocr = PyPDFOCR()) that holds the configuration and methods needed to process the PDF.

  2. Calling run_conversion: Once the instance pocr is created, you can call its run_conversion method to convert the PDF.

Additional Notes:

  • Path Handling: Make sure the file path you pass to run_conversion is correct and uses appropriate path separators (e.g., use forward slashes / or double backslashes \\ in file paths on Windows).

  • Error Handling: You may want to add error handling (like checking if the file exists) to make your script more robust.

  • Output: The run_conversion method returns the path to the output file. You can use that to further process or move the output PDF as needed.

With this setup, you should be able to call pypdfocr functions from within a Python script successfully!