Multiprocessing vs. Threading in Python: What Every Data Scientist Needs to Know
Contents
My learnings from this article on parallel processing :
- Parallel processing in Python can be achieved via two libraries-
multiprocessing
andthreading
- A process is an instance of computer that is executed with its own memory and resources.
- Threads are components of a process which can be run in parallel. There can be multiple threads in a process that share the same memory space of the parent process. They typically have low overhead compared to processes. Rely on Inter-process-communication model provided by the OS
- Spawning process is slower than Spawning threads
- Pitfalls of || programming
- Race condition : Two threads try to modify a variable at the same time
- Starvation : A thread does not get access to the resources
- Deadlock - Threads are waiting for each other to complete execution
- Livelock - Threads are running in a loop without any progress
- Two threads do not write to the same location - This is achieved by GIL - Global Interpreter Lock
- Any function that needs to execute should acquire a global lock. Only a single thread can acquire the lock at a time. This means that interpreter ultimately runs the specification serially
- Usecases of threading - GUI programs, Web scraping
- If the program is CPU intensive, then there is a case for multiprocessing. It outshines threading.
- Threading cannot use multiple cores at all
- CPU bound tasks - Multiprocessing is better than Multithreading
- IP bound tasks - Multithreading is better Multiprocessing
- Analyzing the emails stored in a mail server
- Downloading email box in parallel
- Since it involves a lot of IO operations, this is best done via multi-threading
- Using a RF classifier is a computationally heavy task and hence it is better to use multiprocessor
- Factors to consider are
- Whether there is an IO operation
- Whether IO is the bottleneck
- Whether the task depends on large amount of computation by CPU
Till date I had never considered this aspect at all. In fact I have been totally ignorant about this. I am glad that I have atleast learnt it now.