ghz ⋅ yesterday

Python Multiprocessing: Execute code serially before and after parallel execution

Novice here: I am trying to execute some code serially, then create a pool of worker processes and execute some code in parallel. After the parallel execution is done, I want to execute some more code serially.

For example...

import time
from multiprocessing import Pool

print("I only want to print this statement once")



def worker(i):
    """worker function"""
    now = time.time()
    time.sleep(i)
    then = time.time()
    print(now, then)

if __name__ == '__main__':
    with Pool(3) as p:
        p.map(worker, [1, 1, 1])
        p.close()

print("Only print this once as well")

I would like this to print:

I only want to print this statement once
1533511478.0619314 1533511479.0620182
1533511478.0789354 1533511479.0791905
1533511478.0979397 1533511479.098235
Only print this once as well

However, what it actually prints is this:

I only want to print this statement once
I only want to print this statement once
Only print this once as well
I only want to print this statement once
Only print this once as well
I only want to print this statement once
Only print this once as well
I only want to print this statement once
Only print this once as well
I only want to print this statement once
Only print this once as well
1533511478.0619314 1533511479.0620182
1533511478.0789354 1533511479.0791905
1533511478.0979397 1533511479.098235
Only print this once as well

So it seems to run the print statements an additional time for each process in the pool.

Any help would be appreciated!

Answer

The issue comes from how multiprocessing.Pool starts its workers. Each worker is a separate Python process, and under the "spawn" start method (the default on Windows, and on macOS since Python 3.8) every child starts a fresh interpreter and re-imports your main script before it runs any tasks.

When a child imports the script, it loads it under the module name __mp_main__ rather than __main__, so the if __name__ == '__main__': test is false there and the guarded block is skipped. Every other top-level statement still runs. In your script that includes both print() calls, which is why each worker adds another copy of each line. With the "fork" start method (long the default on Linux), children are copies of the parent and do not re-run the script, so this duplication typically shows up on Windows and macOS.
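To see the mechanism without starting any processes, the sketch below runs a cut-down copy of the question's script twice with runpy.run_path(): once with run_name="__main__", as when you launch the script yourself, and once with run_name="__mp_main__", the name a spawned child uses when it loads your main script. This is only roughly what CPython's spawn bootstrap does (the real start-up does more). SCRIPT and run_script_as() are illustrative names, and the Pool call is replaced by a print() so nothing is actually spawned.

```python
import contextlib
import io
import os
import runpy
import tempfile

# Cut-down copy of the question's script. The Pool/map call is replaced by a
# print() so that running this demo starts no processes.
SCRIPT = '''
print("I only want to print this statement once")

def worker(i):
    return i

if __name__ == "__main__":
    print("guarded block: creates the Pool and calls p.map(...)")

print("Only print this once as well")
'''


def run_script_as(run_name):
    """Execute SCRIPT from a temporary file with __name__ set to run_name.

    Returns the lines the script printed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            runpy.run_path(path, run_name=run_name)
    return buf.getvalue().splitlines()


# Parent process: all three lines, including the guarded one.
print(run_script_as("__main__"))
# Spawned child: only the two unguarded print() calls.
print(run_script_as("__mp_main__"))
```

Each extra process in the pool behaves like the second call, which matches the duplicated lines in the question's output.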

Solution:

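  1. Keep the code that creates the pool inside the if __name__ == '__main__': block (your script already does this). Under spawn, a pool created at module level would be created again in every child while it is still importing the script, and multiprocessing refuses that with a RuntimeError.

  2. Move every other statement that should run only once, including both print() calls, into the same block. There is nothing extra to delete: the repeated "Only print this once as well" lines come from the single module-level print() that each spawned child executes when it imports the script.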

Corrected Code:

import time
from multiprocessing import Pool

# Keep worker at module level: a spawned child finds it by name
# after re-importing this script.
def worker(i):
    """worker function"""
    now = time.time()
    time.sleep(i)
    then = time.time()
    print(now, then)

if __name__ == '__main__':
    print("I only want to print this statement once")

    # Create the Pool and map tasks
    with Pool(3) as p:
        p.map(worker, [1, 1, 1])

    print("Only print this once as well")

Key Changes:

  1. Moved print("I only want to print this statement once") and print("Only print this once as well") into the if __name__ == '__main__': block, so they execute only in the main process, not in the child processes.

  2. Left Pool(3) and p.map(worker, ...) inside the same block, so only the main process creates the pool. The children still import the module, which defines worker, but they skip the guarded block, so the only output they produce is the line printed by each worker call. The explicit p.close() was dropped: map() blocks until every task has finished, and the with statement shuts the pool down on exit.

Expected Output (your timestamps will differ):

I only want to print this statement once
1533511478.0619314 1533511479.0620182
1533511478.0789354 1533511479.0791905
1533511478.0979397 1533511479.098235
Only print this once as well
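Because each worker prints from its own process, the timestamp lines are not guaranteed to come out in task order. If order matters, one option is to have the workers return their timestamps and let the main process do all the printing, since Pool.map() returns results in input order. The sketch below assumes that design; run() is a hypothetical helper name, not something from the original code.

```python
import time
from multiprocessing import Pool


def worker(seconds):
    """Sleep, then return (start, end) timestamps instead of printing them."""
    start = time.time()
    time.sleep(seconds)
    return start, time.time()


def run(delays):
    """Hypothetical helper: serial step, parallel step, serial step."""
    # Serial: runs once, in the main process only.
    print("I only want to print this statement once")

    # Parallel: map() blocks until every task is done and returns the
    # results in the same order as `delays`.
    with Pool(3) as p:
        timestamps = p.map(worker, delays)

    # Serial again: the main process prints the results in task order.
    for start, end in timestamps:
        print(start, end)
    print("Only print this once as well")
    return timestamps


if __name__ == "__main__":
    run([1, 1, 1])
```

All module-level code here is imports and definitions, so a spawned child that re-imports the script prints nothing.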

Explanation:

  • The first print statement (print("I only want to print this statement once")) runs in the main process.
  • Next, the pool starts three worker processes, and the three worker(1) calls run in parallel across them, each printing its start and end times.
  • Once p.map() returns, meaning all tasks have finished, the final print statement (print("Only print this once as well")) runs in the main process.

By keeping everything that should run only once inside the if __name__ == '__main__': block, and leaving only imports and definitions such as worker at module level, you avoid the redundant output: each spawned process re-runs only that harmless top-level code when it imports the script.