Python Multiprocessing: A Complete Guide for Developers


Tags: Python

Introduction

Python is popular for data science, AI, and backend development, but CPU-heavy code often runs into a performance bottleneck: the Global Interpreter Lock (GIL). In CPython, the GIL allows only one thread to execute Python bytecode at a time.

When dealing with CPU-bound tasks like data processing, mathematical simulations, or image rendering, Python threads can’t fully leverage multiple CPU cores.

This is where Python's multiprocessing module comes in:

  • It allows you to bypass the GIL by running separate processes.

  • Each process has its own Python interpreter and memory space.

  • Perfect for tasks that require true parallel execution.
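
As a quick illustration of the points above, the sketch below asks a child process for its process ID and compares it with the parent's. The function names report_pid and child_pid_differs are made up for this example.

```python
import os
from multiprocessing import Process, Queue

def report_pid(queue):
    # Runs in the child: send this process's ID back to the parent.
    queue.put(os.getpid())

def child_pid_differs():
    queue = Queue()
    child = Process(target=report_pid, args=(queue,))
    child.start()
    child_pid = queue.get()   # read the result before join()
    child.join()
    # The parent's own PID is different because the child is a separate process.
    return child_pid != os.getpid()

if __name__ == "__main__":
    print("Child runs in a different process:", child_pid_differs())
```

The comparison should come out True, since the operating system gives the child its own process ID.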


Tutorial Section (Step-by-step Guide)

Step 1: Import the Module

import multiprocessing

Step 2: Creating a Simple Process

from multiprocessing import Process

def worker(name):
    print(f"Hello from process {name}")

if __name__ == "__main__":
    p1 = Process(target=worker, args=("A",))
    p1.start()      # Start the process
    p1.join()       # Wait until process completes

✅ Output:

Hello from process A
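
A Process object also exposes its state. The sketch below checks is_alive() before and after the run and reads exitcode, which is 0 when the target returns normally. The function name run_and_inspect and the process name "worker-A" are assumptions made for this example.

```python
from multiprocessing import Process

def worker(name):
    print(f"Hello from process {name}")

def run_and_inspect():
    p = Process(target=worker, args=("A",), name="worker-A")
    alive_before_start = p.is_alive()   # False: the process has not started
    p.start()
    p.join()                            # block until the child exits
    # After join(), the child is finished and exitcode holds its exit status.
    return alive_before_start, p.is_alive(), p.exitcode

if __name__ == "__main__":
    before, after, code = run_and_inspect()
    print(before, after, code)
```

A non-zero exitcode would indicate that the child ended with an error or was terminated.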

Step 3: Using Multiple Processes

from multiprocessing import Process

def square(n):
    print(f"{n} squared is {n*n}")

if __name__ == "__main__":
    numbers = [1, 2, 3, 4]
    processes = []

    for num in numbers:
        p = Process(target=square, args=(num,))
        processes.append(p)
        p.start()

    for p in processes:
        p.join()
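
Because the four processes run independently, their lines may print in any order. Process also discards the target's return value. One way to get results back, sketched here, is to have each worker put its result on a multiprocessing.Queue. The helper names square_to_queue and collect_squares are assumptions for this example.

```python
from multiprocessing import Process, Queue

def square_to_queue(n, queue):
    # Send (input, result) so the parent can match results to inputs.
    queue.put((n, n * n))

def collect_squares(numbers):
    queue = Queue()
    processes = [Process(target=square_to_queue, args=(n, queue)) for n in numbers]
    for p in processes:
        p.start()
    # Drain the queue before joining: exactly one item arrives per process.
    results = dict(queue.get() for _ in processes)
    for p in processes:
        p.join()
    return results

if __name__ == "__main__":
    print(collect_squares([1, 2, 3, 4]))
```

The dictionary maps each input to its square; the order in which entries arrive depends on which process finishes first.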

Step 4: Multiprocessing with Pool

from multiprocessing import Pool

def cube(x):
    return x**3

if __name__ == "__main__":
    with Pool(4) as pool:
        result = pool.map(cube, [1, 2, 3, 4, 5])
    print(result)

✅ Output:

[1, 8, 27, 64, 125]
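
Pool offers more than map(). The following sketch shows three other methods from the standard library: starmap() for functions with several arguments, imap_unordered() for results in completion order, and apply_async() for a single call. The functions power and pool_demo are hypothetical names used only here.

```python
from multiprocessing import Pool

def cube(x):
    return x ** 3

def power(base, exponent):
    return base ** exponent

def pool_demo():
    with Pool(2) as pool:
        # starmap unpacks each tuple into positional arguments.
        powers = pool.starmap(power, [(2, 3), (3, 2), (10, 2)])
        # imap_unordered yields results as workers finish, so sort them here.
        cubes = sorted(pool.imap_unordered(cube, range(5)))
        # apply_async submits one call; get() waits for its result.
        single = pool.apply_async(cube, (6,)).get(timeout=10)
    return powers, cubes, single

if __name__ == "__main__":
    print(pool_demo())   # ([8, 9, 100], [0, 1, 8, 27, 64], 216)
```

All results are consumed inside the with block, because leaving the block shuts the pool down.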

⚡ Comparison: Multithreading vs Multiprocessing

| Feature | Multithreading | Multiprocessing |
|---|---|---|
| Execution | Concurrent (not true parallel) | True parallel execution |
| Best for | I/O-bound tasks | CPU-bound tasks |
| Memory | Shared memory space | Separate memory for each process |
| Overhead | Low | Higher (process creation cost) |
| GIL Effect | Affected by GIL | Not affected by GIL |
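
To illustrate the "I/O-bound" row, here is a sketch using multiprocessing.pool.ThreadPool, which offers the same map() interface backed by threads. The function fake_io is a hypothetical stand-in for a blocking network or disk call. Sleeping releases the GIL, so the waits overlap and the total should be close to one 0.2-second wait rather than five.

```python
import time
from multiprocessing.pool import ThreadPool

def fake_io(n):
    time.sleep(0.2)   # stands in for a network or disk wait
    return n * 2

def run_with_threads(items):
    start = time.perf_counter()
    # One thread per item, so every wait can overlap with the others.
    with ThreadPool(len(items)) as pool:
        results = pool.map(fake_io, items)
    return results, time.perf_counter() - start

if __name__ == "__main__":
    results, elapsed = run_with_threads([1, 2, 3, 4, 5])
    print(results, f"{elapsed:.2f}s")
```

Threads are the cheaper choice here because no process has to be created and no data has to be copied between processes.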

 


✅ Complete Functional Example

from multiprocessing import Pool
import time

def compute_square(n):
    time.sleep(1)
    return n * n

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]

    start = time.time()

    with Pool(processes=5) as pool:
        results = pool.map(compute_square, numbers)

    end = time.time()

    print("Squares:", results)
    print("Time Taken:", round(end - start, 2), "seconds")

✅ Output:

Squares: [1, 4, 9, 16, 25]
Time Taken: ~1 second

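Run one after another, the five one-second calls would take about 5 seconds; with five worker processes they overlap and finish in roughly 1 second plus a little startup overhead. Keep in mind that time.sleep() only simulates work: a sleeping task uses no CPU, so threads would give the same speedup here.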
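
For a comparison with real computation, the following sketch times a deliberately naive prime counter sequentially and with a pool. The function count_primes, the helper timed, and the workload size are all assumptions chosen for illustration. Actual timings depend on core count and machine load, so none are shown.

```python
import time
from multiprocessing import Pool

def count_primes(limit):
    # Deliberately naive trial division: pure CPU work, no sleeping.
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def timed(label, func):
    # Run func once, print how long it took, and return its result.
    start = time.perf_counter()
    result = func()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

if __name__ == "__main__":
    limits = [60_000] * 4
    sequential = timed("Sequential", lambda: [count_primes(n) for n in limits])
    with Pool(4) as pool:
        parallel = timed("Pool(4)", lambda: pool.map(count_primes, limits))
    print("Same results:", sequential == parallel)
```

On a machine with several free cores the pooled run should be noticeably faster. With a much smaller workload, process startup can dominate and the pool may lose.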


Tips & Common Pitfalls

Best Practices

  • Use multiprocessing for CPU-bound tasks (e.g., computation-heavy).

  • Use multithreading for I/O-bound tasks (network, file I/O).

  • Use Pool.map() for simple parallelism.

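  • Always put process-creating code under if __name__ == "__main__":. It is required wherever child processes are started with the spawn method (the default on Windows and, since Python 3.8, on macOS) and is harmless elsewhere.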

Common Mistakes

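  • Forgetting if __name__ == "__main__" → under the spawn start method each child re-imports the main module and re-runs the process-creating code. Without the guard this can cascade into runaway process creation, though current Python versions usually stop it with a RuntimeError.

  • Using multiprocessing for small tasks → overhead outweighs performance gain.

  • Sharing state incorrectly → ordinary variables are not shared between processes, so a change made in a child is invisible to the parent. Use multiprocessing.Queue, Pipe, Value/Array, or a Manager.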
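
To make the shared-state pitfall concrete, this sketch increments two counters from several processes: an ordinary global and a multiprocessing.Value. The names plain_counter, increment, and run_counters are assumptions for illustration.

```python
from multiprocessing import Process, Value

plain_counter = 0   # ordinary global: each process works on its own copy

def increment(shared, times):
    global plain_counter
    for _ in range(times):
        plain_counter += 1            # changes only this child's copy
        with shared.get_lock():       # guard the read-modify-write
            shared.value += 1

def run_counters(workers=4, times=1000):
    shared = Value("i", 0)            # C int stored in shared memory
    procs = [Process(target=increment, args=(shared, times)) for _ in range(workers)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # The parent's global was never touched; the shared Value saw every update.
    return plain_counter, shared.value

if __name__ == "__main__":
    print(run_counters())   # (0, 4000)
```

The lock matters: shared.value += 1 is a read followed by a write, and without get_lock() two processes could overwrite each other's update.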


FAQ Section

Q1. Does multiprocessing bypass the GIL?
✅ Yes, because each process runs in its own interpreter.

Q2. When should I use multiprocessing instead of threading?

  • Multiprocessing → CPU-bound tasks

  • Threading → I/O-bound tasks

Q3. How do processes share data?
Use Queue, Pipe, or Manager.

from multiprocessing import Queue
q = Queue()
q.put("Hello")
print(q.get())
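
The snippet above puts and gets within a single process. For an exchange between two processes, a Pipe is one option. The sketch below sends a string to a child and reads back an upper-cased reply; echo_upper and ask_child are hypothetical names.

```python
from multiprocessing import Pipe, Process

def echo_upper(conn):
    # Child: receive one message, reply, then close its end of the pipe.
    message = conn.recv()
    conn.send(message.upper())
    conn.close()

def ask_child(text):
    parent_conn, child_conn = Pipe()   # two connected ends, duplex by default
    child = Process(target=echo_upper, args=(child_conn,))
    child.start()
    parent_conn.send(text)
    reply = parent_conn.recv()         # blocks until the child answers
    child.join()
    return reply

if __name__ == "__main__":
    print(ask_child("hello"))   # HELLO
```

A Pipe connects exactly two endpoints, while a Queue can be shared by many producers and consumers.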

Q4. Is multiprocessing always faster?
❌ No. For small tasks, the process creation overhead may make it slower.

Q5. Can I use multiprocessing in Jupyter Notebooks?
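⚠️ It's tricky. With the spawn start method (the default on Windows and macOS), child processes cannot import worker functions that were defined in a notebook cell, so Pool calls may fail or hang. Putting the functions in a .py file and importing them, or running the whole script from a .py file, avoids the problem.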


Cheat Sheet Section

| Feature | Syntax / Usage |
|---|---|
| Create process | Process(target=func, args=(arg,)) |
| Start process | p.start() |
| Wait for process | p.join() |
| Pool creation | with Pool(n) as p: |
| Map tasks | p.map(func, data_list) |
| Queue | q = multiprocessing.Queue() |
| Shared Value | Value('i', 0) |
| Shared Array | Array('i', [1,2,3]) |
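
As a usage sketch for the Shared Array entry, the code below lets each worker double one slot of an Array of C ints. The names double_slot and double_all are made up for this example.

```python
from multiprocessing import Array, Process

def double_slot(shared, index):
    # Each worker touches only its own index, so no two writes collide.
    shared[index] *= 2

def double_all(values):
    shared = Array("i", values)        # 'i' = signed C int
    procs = [Process(target=double_slot, args=(shared, i)) for i in range(len(values))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return shared[:]                   # slicing copies the contents into a list

if __name__ == "__main__":
    print(double_all([1, 2, 3]))   # [2, 4, 6]
```

One process per element is wasteful in practice; it is used here only to keep the example short.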

Interview Questions Section

Q1. What is the difference between threading and multiprocessing?

  • Threading → I/O-bound

  • Multiprocessing → CPU-bound, bypasses GIL

Q2. How does multiprocessing overcome the GIL?
Each process runs its own Python interpreter with its own memory space and its own GIL, so one process never blocks another.

Q3. Example: Use Pool to calculate factorials in parallel.

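from multiprocessing import Pool
import math

if __name__ == "__main__":   # guard needed so child processes can import this file safely
    nums = [5, 6, 7, 8]
    with Pool(4) as pool:
        # math.factorial is a built-in, so it can be sent to the workers as-is
        print(pool.map(math.factorial, nums))   # [120, 720, 5040, 40320]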

Q4. What are some inter-process communication methods?

  • Queue

  • Pipe

  • Manager

Q5. What are common pitfalls with multiprocessing?

  • Forgetting if __name__ == "__main__"

  • Using it for small tasks (slower).

Q6. How to handle shared state safely?
Use multiprocessing.Manager() for shared dictionaries/lists.
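
A minimal sketch of that approach follows, assuming each worker only adds its own key. The function names record_length and word_lengths are hypothetical.

```python
from multiprocessing import Manager, Process

def record_length(word, shared_dict):
    # The proxy forwards this assignment to the manager's server process.
    shared_dict[word] = len(word)

def word_lengths(words):
    with Manager() as manager:
        shared_dict = manager.dict()
        procs = [Process(target=record_length, args=(w, shared_dict)) for w in words]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return dict(shared_dict)   # copy out before the manager shuts down

if __name__ == "__main__":
    print(word_lengths(["pool", "queue", "manager"]))
```

Manager objects are flexible but slower than Value or Array, because every access goes through a separate server process.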

Q7. What’s the difference between Pool and Process?

  • Process → Fine-grained control.

  • Pool → Easier, automatic worker management.


Conclusion / Summary

  • Python’s multiprocessing is the go-to solution for CPU-bound tasks.

  • It bypasses the GIL and enables true parallel execution.

  • Use Pool for simplicity, Process for fine control.

  • Remember: Threads for I/O-bound, Processes for CPU-bound.

Best Practice Takeaway: Always benchmark. Multiprocessing shines with heavy CPU tasks, but can be overkill for lightweight jobs.