Python Multiprocessing: Execute code serially before and after parallel execution
Novice here: I am trying to execute some code serially, then create a pool of worker processes and execute some code in parallel. After the parallel execution is done, I want to execute some more code serially.
For example...
import time
from multiprocessing import Pool

print("I only want to print this statement once")

def worker(i):
    """worker function"""
    now = time.time()
    time.sleep(i)
    then = time.time()
    print(now, then)

if __name__ == '__main__':
    with Pool(3) as p:
        p.map(worker, [1, 1, 1])
        p.close()

print("Only print this once as well")
I would like this to return...
I only want to print this statement once
1533511478.0619314 1533511479.0620182
1533511478.0789354 1533511479.0791905
1533511478.0979397 1533511479.098235
Only print this once as well
However what it returns is this:
I only want to print this statement once
I only want to print this statement once
Only print this once as well
I only want to print this statement once
Only print this once as well
I only want to print this statement once
Only print this once as well
I only want to print this statement once
Only print this once as well
I only want to print this statement once
Only print this once as well
1533511478.0619314 1533511479.0620182
1533511478.0789354 1533511479.0791905
1533511478.0979397 1533511479.098235
Only print this once as well
So it seems to be running the print statements an additional time for each process in the pool.
Any help would be appreciated!
Answer
The issue you're encountering comes from how multiprocessing.Pool starts its workers. Pool launches a separate process for each worker, and depending on how those processes are started, any code outside the if __name__ == '__main__': block can be executed again in each of them.
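This is expected behavior with multiprocessing, particularly on Windows, where processes are started with the spawn method (also the default on macOS since Python 3.8). A spawned worker is a fresh interpreter that imports your script as a module, so every module-level statement outside the if __name__ == '__main__': block runs again in each worker, which produces the repeated print output. With the fork start method, long the default on Linux, workers inherit the parent's state instead, and the module-level code is not re-run.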
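To see this for yourself, here is a minimal sketch that labels each line of output with the process that produced it. The helper names describe_process and square are made up for illustration. Under spawn, the "module level reached" line should appear once for the parent and once per worker; under fork it should appear only once.

```python
import multiprocessing as mp
import os


def describe_process():
    """Return a label naming the process that is executing this call."""
    return f"{mp.current_process().name} (pid {os.getpid()})"


def square(x):
    """Trivial worker so the pool has something to do."""
    return x * x


# Module-level statement: runs in the parent, and again in every worker
# that re-imports this file (as happens under the spawn start method).
print("module level reached in", describe_process())

if __name__ == "__main__":
    # Guarded code: runs only in the process that was launched as the script.
    # To reproduce the Windows behaviour elsewhere, one could call
    # mp.set_start_method("spawn") as the first statement of this block.
    print("start method:", mp.get_start_method())
    with mp.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))
```

Run it as a script file rather than pasting it into an interactive session, since spawned workers need to be able to import the file.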
Solution:
- Keep the pool's creation code inside the if __name__ == '__main__': block, as it already is in your script. Worker processes do not run this block, so the pool is only ever created by the main process.
- Move both print() calls into that block as well. In your script they sit at module level, so print("I only want to print this statement once") and print("Only print this once as well") are executed by every process the pool starts, not just by the main one.
Corrected Code:
import time
from multiprocessing import Pool

def worker(i):
    """worker function"""
    now = time.time()
    time.sleep(i)
    then = time.time()
    print(now, then)

if __name__ == '__main__':
    print("I only want to print this statement once")

    # Create the Pool and map tasks
    with Pool(3) as p:
        p.map(worker, [1, 1, 1])

    print("Only print this once as well")
Key Changes:
- Moved the print("I only want to print this statement once") and print("Only print this once as well") statements into the if __name__ == '__main__': block so they execute once, in the main process, and not in the child processes.
- Pool(3) and p.map(worker, ...) stay inside the if __name__ == '__main__': block, so the pool is created once by the main process. p.map blocks until every worker call has finished, so the final print runs only after the parallel part is done. The redundant p.close() is dropped, because the with block cleans up the pool on exit.
Expected Output (your timestamps will differ):
I only want to print this statement once
1533511478.0619314 1533511479.0620182
1533511478.0789354 1533511479.0791905
1533511478.0979397 1533511479.098235
Only print this once as well
Explanation:
- The first print statement (print("I only want to print this statement once")) happens in the main process.
- Then the pool of worker processes is created, each of which executes the worker function, resulting in the printed times.
- After all the worker processes complete, the final print statement (print("Only print this once as well")) is executed in the main process.
By keeping everything that should run only once inside the if __name__ == '__main__': block, you ensure it executes only in the main process, and you avoid the redundant print statements triggered by each spawned process.
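If the serial code after the pool needs the workers' results, one option is to return values from the worker instead of printing inside it. The following is only a sketch of that variation, not part of the original script: timed_sleep and main are hypothetical names, and the sleep durations are shortened to 0.1 seconds.

```python
import time
from multiprocessing import Pool


def timed_sleep(seconds):
    """Sleep for the given time and return the (start, end) timestamps."""
    start = time.time()
    time.sleep(seconds)
    return start, time.time()


def main():
    # Serial section before the parallel work.
    print("I only want to print this statement once")

    # Parallel section: map() blocks until every call has returned.
    with Pool(3) as pool:
        results = pool.map(timed_sleep, [0.1, 0.1, 0.1])

    # Serial section after the parallel work, using the collected results.
    for start, end in results:
        print(start, end)
    print("Only print this once as well")


if __name__ == "__main__":
    main()
```

Because the timestamps are printed by the main process, they come out in the same order as the inputs, and output from different workers cannot interleave.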