ghz 15hours ago ⋅ 7 views

Python - Move All PDF Files in Folder to New Directory Based on Matching Names, Using Glob or Shutil

I'm trying to write code that will move hundreds of PDF files from a :/Scans folder into another directory based on matching each client's name. I'm not sure if I should be using glob, shutil, or a combination of both. My working theory is that glob should find the files, since the glob module "finds all the pathnames matching a specified pattern," and that shutil should then physically move them. Is that right?

Here is a breakdown of my file folders to give you a better idea of what I'm trying to do.

Within :/Scans folder I have thousands of PDF files, manually renamed based on client and content, such that the folder looks like this:

lastName, firstName - [contentVariable] (repeat the above 100,000x)

Within the :/J drive of my computer I have a folder named 'Clients' with sub-folders for each and every client, similar to the pattern above, named as 'lastName, firstName'

I'm looking to have the program go through the :/Scans folder and move each PDF to the matching client folder based on 'lastName, firstName'

I've been able to write a simple program to move files between folders, but not one that will do the aforesaid name matching.

shutil.copy('C:/Users/Kenny/Documents/Scan_Drive','C:/Users/Kenny/Documents/Clients')

^ Moving a file from one folder to another.. quite easily done.

Is there a way to modify the above code to apply to a regex (below)?

shutil.copy('C:/Users/Kenny/Documents/Scan_Drive/\w*', 'C:/Users/Kenny/Documents/Clients/\w*')

EDIT: @Byberi - Something as such?

path = "C:/Users/Kenny/Documents/Scans"
dirs = os.path.isfile()

This would print all the files and directories

for file in dirs:
    print file

dest_dir = "C:/Users/Kenny/Documents/Clients"
for file in glob.glob(r'C:/*'):
    print(file)

Answer

Here's how you can achieve your goal of moving PDF files from the :/Scans folder into their corresponding client folders based on names, using a combination of glob, os, and shutil. The approach is as follows:


Step-by-Step Solution

  1. Extract Last Name and First Name from File Names
    Use a regular expression to parse the lastName, firstName pattern from each PDF file name in the :/Scans folder.

  2. Match Client Folders
    Look for a subfolder in the Clients directory whose name matches the extracted "lastName, firstName".

  3. Move Files
    Use shutil.move() to move the files to their corresponding client folders.


Implementation

Here’s the complete Python script:

import os
import re
import shutil
import glob

# Define source and destination directories
scans_dir = "C:/Users/Kenny/Documents/Scans"
clients_dir = "C:/Users/Kenny/Documents/Clients"

# Regular expression to extract "lastName, firstName" from file names
name_pattern = re.compile(r'^(.*?, .*?) - ')

# Iterate over all PDF files in the Scans directory
for pdf_file in glob.glob(os.path.join(scans_dir, "*.pdf")):
    # Extract the file name from the full path
    file_name = os.path.basename(pdf_file)

    # Match the client name pattern
    match = name_pattern.match(file_name)
    if match:
        client_name = match.group(1)  # Extract "lastName, firstName"
        print(f"Processing file: {file_name}, client: {client_name}")

        # Construct the expected client folder path
        client_folder = os.path.join(clients_dir, client_name)

        # Check if the client folder exists
        if os.path.isdir(client_folder):
            # Move the file to the client's folder
            dest_path = os.path.join(client_folder, file_name)
            shutil.move(pdf_file, dest_path)
            print(f"Moved {file_name} to {client_folder}")
        else:
            print(f"Client folder not found for: {client_name}. Skipping.")
    else:
        print(f"File name doesn't match the expected pattern: {file_name}")

Explanation

  1. Regex Matching
    The name_pattern regular expression '^(.*?, .*?) - ' captures the "lastName, firstName" text at the start of the file name, up to the first " - " separator.

  2. Directory Matching
    The script checks whether a folder matching the client name exists in the Clients directory. If it doesn't, the file is skipped and a message is printed.

  3. File Moving
    The shutil.move() function moves the file to the matched client folder.

  4. Error Handling
    The script skips files with unmatched names or missing client folders and prints a message for each, so these cases can be reviewed manually.
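
To see what the pattern actually captures, here is a small sketch that runs it against a few file names. The names are made up for illustration, and extract_client is a hypothetical helper, not part of the script above. Because both groups are non-greedy, the match stops at the first " - ", so extra dashes in the content part should not affect the captured client name.

```python
import re

# Same pattern as in the script above
name_pattern = re.compile(r'^(.*?, .*?) - ')

def extract_client(file_name):
    """Return the 'lastName, firstName' prefix, or None if the name doesn't fit."""
    match = name_pattern.match(file_name)
    return match.group(1) if match else None

# Made-up file names for illustration
samples = [
    "Doe, John - Report.pdf",              # the expected format
    "Smith-Jones, Anne - Tax - 2019.pdf",  # extra " - " in the content part
    "Doe John - Report.pdf",               # missing comma
    "Doe, John-Report.pdf",                # missing spaces around the dash
]
for name in samples:
    print(repr(name), "->", repr(extract_client(name)))
```

Names that come back as None are the ones the main script reports as not matching the expected pattern.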


Sample Folder Structure

Source Directory (Scans)

Scans/
├── Doe, John - Report.pdf
├── Smith, Jane - Invoice.pdf
├── Doe, John - Summary.pdf

Destination Directory (Clients)

Clients/
├── Doe, John/
├── Smith, Jane/

Output

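For the above folder structure, the script will print output like the following. The order of the files depends on what glob.glob() returns, which is not guaranteed to be sorted, and on Windows os.path.join() puts a backslash before the client folder name: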

Processing file: Doe, John - Report.pdf, client: Doe, John
Moved Doe, John - Report.pdf to C:/Users/Kenny/Documents/Clients/Doe, John
Processing file: Smith, Jane - Invoice.pdf, client: Smith, Jane
Moved Smith, Jane - Invoice.pdf to C:/Users/Kenny/Documents/Clients/Smith, Jane
Processing file: Doe, John - Summary.pdf, client: Doe, John
Moved Doe, John - Summary.pdf to C:/Users/Kenny/Documents/Clients/Doe, John
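
Before moving hundreds of files for real, it may help to preview what would happen. The following is a minimal sketch of a dry run, assuming the same naming pattern as the script above. plan_moves is a hypothetical helper introduced here for illustration. It only builds a list of planned moves and never touches a file.

```python
import os
import re

# Same pattern as in the script above
name_pattern = re.compile(r'^(.*?, .*?) - ')

def plan_moves(scans_dir, clients_dir):
    """Return (moves, skipped) without moving anything.

    moves   -- list of (source_path, destination_path) pairs
    skipped -- PDF file names with no matching pattern or client folder
    """
    moves, skipped = [], []
    # sorted() gives a predictable order that is easier to review
    for file_name in sorted(os.listdir(scans_dir)):
        if not file_name.lower().endswith(".pdf"):
            continue  # ignore anything that is not a PDF
        match = name_pattern.match(file_name)
        client_folder = os.path.join(clients_dir, match.group(1)) if match else None
        if client_folder and os.path.isdir(client_folder):
            moves.append((os.path.join(scans_dir, file_name),
                          os.path.join(client_folder, file_name)))
        else:
            skipped.append(file_name)
    return moves, skipped
```

Once the plan looks right, looping over moves and calling shutil.move(src, dest) for each pair performs the actual move.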

Notes

  • File Matching Issues
    If client names in the file names and folder names don't match exactly (e.g., extra spaces or different casing), you'll need to normalize the strings (e.g., str.lower() and str.strip()).

  • Handling Missing Folders
    Optionally, you can create missing client folders:

    os.makedirs(client_folder, exist_ok=True)
    
  • Performance
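    The script makes one os.path.isdir() check per matched file, which is fine for hundreds or thousands of files. For much larger numbers of files or folders, consider listing the client folders once and looking names up in memory.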
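
The first and last notes above can be combined. Below is a rough sketch that scans the Clients directory once and looks names up in a dictionary, ignoring case and extra whitespace. The helper names (normalize, build_folder_index, find_client_folder) and the normalization rule are assumptions for illustration, so adjust them to however your names actually differ.

```python
import os

def normalize(name):
    """Collapse runs of whitespace and ignore case (one possible rule)."""
    return " ".join(name.split()).casefold()

def build_folder_index(clients_dir):
    """Map each normalized client folder name to its real path, scanning once."""
    index = {}
    with os.scandir(clients_dir) as entries:
        for entry in entries:
            if entry.is_dir():
                # If two folders normalize to the same key, the last one wins
                index[normalize(entry.name)] = entry.path
    return index

def find_client_folder(index, client_name):
    """Return the matching folder path, or None if there is no match."""
    return index.get(normalize(client_name))
```

In the main script, you would call build_folder_index(clients_dir) once before the loop and replace the os.path.isdir() check with find_client_folder(index, client_name).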

Let me know if you need further clarification or enhancements!