How can you run pypdfocr from within a Python script, as opposed to the command line?
import pypdfocr
from pypdfocr import pypdfocr
from pypdfocr.pypdfocr import PyPDFOCR as pocr
filepath = 'C:/myfolder/myPDF.pdf'
newfile = pocr.run_conversion(filepath)
This throws an error:
Unbound method run_conversion must be called with PyPDFOCR instance as first argument.
Can someone help me fill in the (likely obvious) missing piece?
Answer
To run pypdfocr from within a Python script, you need to instantiate the PyPDFOCR class before calling the run_conversion method. The error message (Unbound method run_conversion must be called with PyPDFOCR instance as first argument) means you are calling the method on the class itself rather than on an instance of the class. In your code, pocr is only an alias for the class, so pocr.run_conversion(filepath) never receives an instance.
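To see the difference in isolation, here is a minimal sketch that does not use pypdfocr at all. Converter is a hypothetical stand-in class, and the "_ocr.pdf" naming is made up for the illustration.

```python
class Converter(object):
    """Hypothetical stand-in for a class like PyPDFOCR."""

    def run_conversion(self, path):
        # A real converter would do OCR here; this only builds a file name.
        return path.replace('.pdf', '_ocr.pdf')


def call_on_class(path):
    # Mirrors the failing pattern: the method is looked up on the class,
    # so no instance is supplied for `self` and Python raises a TypeError.
    return Converter.run_conversion(path)


def call_on_instance(path):
    # Mirrors the fix: create an instance first, then call the method on it.
    return Converter().run_conversion(path)
```

Calling call_on_class raises a TypeError on both Python 2 and Python 3 (only the wording of the message differs), while call_on_instance returns normally.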
Here's how you can modify your code to use pypdfocr correctly:
- Import the PyPDFOCR class.
- Create an instance of the class.
- Call the run_conversion method on that instance.
Here's the corrected version of your code:
import pypdfocr
from pypdfocr.pypdfocr import PyPDFOCR
# Create an instance of PyPDFOCR
pocr = PyPDFOCR()
# Path to your PDF file
filepath = 'C:/myfolder/myPDF.pdf'
# Run the conversion
newfile = pocr.run_conversion(filepath)
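# Optionally, print the output. str.format is used instead of an f-string
# because the "unbound method" error indicates Python 2, which has no f-strings.
print('Converted file saved at: {}'.format(newfile))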
Explanation:
- Instantiating PyPDFOCR: The method run_conversion needs to be called on an instance of PyPDFOCR. Writing pocr = PyPDFOCR() creates an object that holds the configuration and methods needed to process the PDF.
- Calling run_conversion: Once the instance pocr is created, you can call its run_conversion method to convert the PDF.
Additional Notes:
- Path Handling: Make sure the file path you pass to run_conversion is correct and uses appropriate path separators (e.g., use forward slashes / or double backslashes \\ in file paths on Windows).
- Error Handling: You may want to add error handling (like checking if the file exists) to make your script more robust.
- Output: The run_conversion method returns the path to the output file. You can use that to further process or move the output PDF as needed.
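As a sketch of the error handling mentioned above, the helper below checks a path before it is handed to the converter. check_pdf_path is a hypothetical name, not part of pypdfocr, and the checks are only one reasonable choice.

```python
import os


def check_pdf_path(path):
    """Return the normalized path, or raise if it is not an existing .pdf file."""
    # normpath converts separators to the platform's own style.
    normalized = os.path.normpath(path)
    if not normalized.lower().endswith('.pdf'):
        raise ValueError('Not a PDF file: {}'.format(normalized))
    if not os.path.isfile(normalized):
        raise IOError('File not found: {}'.format(normalized))
    return normalized
```

You could then call pocr.run_conversion(check_pdf_path(filepath)), so a mistyped path fails early with a clear message instead of partway through the conversion.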
With this setup, you should be able to call pypdfocr functions from within a Python script.
One caveat, which is an assumption on my part rather than something confirmed above: the command-line entry point appears to create a PyPDFOCR instance and call its go method with the argument list, which parses options and sets up the external tools (Ghostscript and Tesseract) before converting. If run_conversion raises an AttributeError on a freshly created instance, try pocr.go([filepath]) instead, and check the source of your installed version to confirm.