Python- Move All PDF Files in Folder to NewDirectory Based on Matching Names, Using Glob or Shutil
I'm trying to write code that will move hundreds of PDF files from a :/Scans folder into another directory based on the matching each client's name. I'm not sure if I should be using Glob, or Shutil, or a combination of both. My working theory is that Glob should work for such a program, as the glob module "finds all the pathnames matching a specified pattern," and then use Shutil to physically move the files?
Here is a breakdown of my file folders to give you a better idea of what I'm trying to do.
Within :/Scans folder I have thousands of PDF files, manually renamed based on client and content, such that the folder looks like this:
lastName, firstName - [contentVariable] (repeat the above 100,000x)
Within the :/J drive of my computer I have a folder named 'Clients' with sub-folders for each and every client, similar to the pattern above, named as 'lastName, firstName'
I'm looking to have the program go through the :/Scans folder and move each PDF to the matching client folder based on 'lastName, firstName'
I've been able to write a simple program to move files between folders, but not one that will do the aforesaid name matching.
shutil.copy('C:/Users/Kenny/Documents/Scan_Drive','C:/Users/Kenny/Documents/Clients')
^ Moving a file from one folder to another.. quite easily done.
Is there a way to modify the above code to apply to a regex (below)?
shutil.copy('C:/Users/Kenny/Documents/Scan_Drive/\w*', 'C:/Users/Kenny/Documents/Clients/\w*')
EDIT: @Byberi - Something as such?
path = "C:/Users/Kenny/Documents/Scans" dirs = os.path.isfile()
This would print all the files and directories
for file in dirs: print file
dest_dir = "C:/Users/Kenny/Documents/Clients" for file in glob.glob(r'C:/*'): print(file)
Answer
Here's how you can achieve your goal of moving PDF files from the :/Scans
folder into their corresponding client folders based on names, using a combination of glob
, os
, and shutil
. The approach is as follows:
Step-by-Step Solution
-
Extract Last Name and First Name from File Names
Use a regular expression to parse thelastName, firstName
pattern from each PDF file name in the:/Scans
folder. -
Match Client Folders
Search for a matching subfolder in the:/Clients
directory. -
Move Files
Useshutil.move()
to move the files to their corresponding client folders.
Implementation
Here’s the complete Python script:
import os
import re
import shutil
import glob
# Define source and destination directories
scans_dir = "C:/Users/Kenny/Documents/Scans"
clients_dir = "C:/Users/Kenny/Documents/Clients"
# Regular expression to extract "lastName, firstName" from file names
name_pattern = re.compile(r'^(.*?, .*?) - ')
# Iterate over all PDF files in the Scans directory
for pdf_file in glob.glob(os.path.join(scans_dir, "*.pdf")):
# Extract the file name from the full path
file_name = os.path.basename(pdf_file)
# Match the client name pattern
match = name_pattern.match(file_name)
if match:
client_name = match.group(1) # Extract "lastName, firstName"
print(f"Processing file: {file_name}, client: {client_name}")
# Construct the expected client folder path
client_folder = os.path.join(clients_dir, client_name)
# Check if the client folder exists
if os.path.isdir(client_folder):
# Move the file to the client's folder
dest_path = os.path.join(client_folder, file_name)
shutil.move(pdf_file, dest_path)
print(f"Moved {file_name} to {client_folder}")
else:
print(f"Client folder not found for: {client_name}. Skipping.")
else:
print(f"File name doesn't match the expected pattern: {file_name}")
Explanation
-
Regex Matching
Thename_pattern
regular expression^(.*?, .*?) -
looks for the "lastName, firstName" format at the start of the file name. -
Directory Matching
The script checks if a folder matching the client name exists in the:/Clients
directory. If it doesn’t exist, the file is skipped, and an error is logged. -
File Moving
Theshutil.move()
function moves the file to the matched client folder. -
Error Handling
The script skips files with unmatched names or missing client folders, logging these cases for manual review.
Sample Folder Structure
Source Directory (Scans
)
Scans/
├── Doe, John - Report.pdf
├── Smith, Jane - Invoice.pdf
├── Doe, John - Summary.pdf
Destination Directory (Clients
)
Clients/
├── Doe, John/
├── Smith, Jane/
Output
For the above folder structure, the script will produce:
Processing file: Doe, John - Report.pdf, client: Doe, John
Moved Doe, John - Report.pdf to C:/Users/Kenny/Documents/Clients/Doe, John
Processing file: Smith, Jane - Invoice.pdf, client: Smith, Jane
Moved Smith, Jane - Invoice.pdf to C:/Users/Kenny/Documents/Clients/Smith, Jane
Processing file: Doe, John - Summary.pdf, client: Doe, John
Moved Doe, John - Summary.pdf to C:/Users/Kenny/Documents/Clients/Doe, John
Notes
-
File Matching Issues
If client names in the file names and folder names don't match exactly (e.g., extra spaces or different casing), you'll need to normalize the strings (e.g.,str.lower()
andstr.strip()
). -
Handling Missing Folders
Optionally, you can create missing client folders:os.makedirs(client_folder, exist_ok=True)
-
Performance
This script is efficient for hundreds or thousands of files but might need optimization for millions of files or folders.
Let me know if you need further clarification or enhancements!