A few months ago, I had the thought of practicing Python every day for 20
minutes. If you use Python in your daily work, you should not rely on that
work as a substitute for a deliberate practice session. This was also echoed by
Josh Kaufman in his book, The First Twenty Hours, where he could not rely on
daily work that involved typing as a substitute for a deliberate practice
session on touch typing. If you are trying to learn touch typing, you might
assume that since you are typing emails, reports, etc. anyway, you are in essence
doing deliberate practice. Not really. Once you are in a deliberate practice
session, your focus becomes the craft itself rather than the outcome of the
specific task. Unless you set aside some time for it on a regular basis, it is
difficult to improve at any skill, be it touch typing or coding in Python.
In any case, setting aside a 20-minute time slot for going through the book
“Effective Python” helped me read it slowly and digest all the wonderful
information present in it. This book cannot be consumed in a few sittings. It
takes quite an amount of time to read, to think about, and to understand the
various ways in which one could improve the craft of coding.
This blog post summarizes some of the main points from the book.
Pythonic Thinking
Python version
```python
import sys
sys.version_info
```

```
sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0)
```
Difference between str and bytes
There are two types that represent sequences of character data: bytes and str.
Instances of bytes contain raw, unsigned 8-bit values.
```python
a = b'h\x65llo'
a, list(a)
```

```
(b'hello', [104, 101, 108, 108, 111])
```
Instances of str contain Unicode code points that represent textual characters
from human languages.

```python
a = 'a\u002A sdfdf'
a, list(a)
```
str instances do not have an associated binary encoding, and bytes instances do
not have an associated text encoding. To convert Unicode data to binary data,
you must call the encode method of str. To convert binary data to Unicode data,
you must call the decode method of bytes.
```python
def to_str(bytes_or_str):
    if isinstance(bytes_or_str, bytes):
        value = bytes_or_str.decode("utf-8")
    else:
        value = bytes_or_str
    return value

to_str('hello'), to_str(b'hello')
```
```python
def to_bytes(bytes_or_str):
    if isinstance(bytes_or_str, str):
        value = bytes_or_str.encode('utf-8')
    else:
        value = bytes_or_str
    return value

to_bytes(b'foo'), to_bytes('bar')
```
- You can add two str instances or two bytes instances, but you cannot add a
  bytes instance to a str instance.
- If a file is opened in 'r' or 'w' mode, it operates in text mode: write
  operations expect str instances, and read operations use the system's default
  text encoding to interpret the data.
- If you want to read or write Unicode data to/from a file, be careful about the
  system's default text encoding. Explicitly pass the encoding parameter to open
  if you want to avoid surprises.
- If you want to read or write binary data to/from a file, always open the file
  using a binary mode (like 'rb' or 'wb'), as sketched after this list.
- bytes and str instances can't be used together with operators like >, ==, +
  and %.
- Use helper functions to ensure that the inputs you operate on are the type of
  character sequence you expect (8-bit values, UTF-8-encoded strings, Unicode
  code points).
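A minimal sketch of the file-mode guidance above; 'example.txt' is a hypothetical
file name used only for illustration:

```python
# Text mode with an explicit encoding avoids surprises from the system default
with open('example.txt', 'w', encoding='utf-8') as f:
    f.write('façade')

# Binary mode returns bytes; decode explicitly when you need text
with open('example.txt', 'rb') as f:
    raw = f.read()

assert raw.decode('utf-8') == 'façade'
```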
Python has four different ways of formatting strings that are built into the
language and the standard library.
- Use the formatting operator %. This style comes from C's printf function.
- One can also use the % operator with a dict.
```python
a = 0b10111011
b = 0xc5f
'Binary is %d, hex is %d' % (a, b)
```

```
Binary is 187, hex is 3167
```
Python 3 added support for advanced string formatting that is more expressive
than the old C-style format strings that use the % operator. For individual
Python values, this new functionality can be accessed through the format
built-in function.
```python
a = 1234.4
format(a, ',.2f')
```
```python
key = "rk"
value = "45"
'{}={}'.format(key, value)
```
- You can use the new functionality to format multiple values together by
  calling the new format method of the str type.
Python 3.6 added interpolated format strings – f-strings for short – to solve
most of the problems associated with displaying formatted strings.
- Python expressions may also appear within the format specifier options.
```python
key = "rk"
value = "45"
f'{key}={value}'
```
```python
key = "rk"
value = 45.12
f'{key:<10}={value:.1f}'
```
Takeaways
- C-style format strings that use the % operator suffer from a variety of
  gotchas and verbosity problems.
- The str.format method introduces some useful concepts in its formatting
  specifier mini-language, but it otherwise repeats the mistakes of C-style
  format strings and should be avoided.
- F-strings are a new syntax for formatting values into strings that solves the
  biggest problems with C-style format strings.
- F-strings are succinct yet powerful because they allow for arbitrary Python
  expressions to be directly embedded within the format specifiers.
Write Helper functions instead of Complex expressions
```python
from urllib.parse import parse_qs
my_values = parse_qs('red=5&blue=10')
my_values
```
Python's syntax makes it easy to write single-line expressions that are overly
complicated and difficult to read. It is better to move such complicated
expressions into helper functions.
Prefer Multiple Assignment Unpacking over Indexing
Unpacking has less visual noise than accessing the tuple’s indexes and it often
requires fewer lines.
```python
books_to_read = [
    ("R", "Resampling"),
    ("Python", "Effective Python"),
    ("Finance", "Factor Models in R"),
]
for i, (sub, book) in enumerate(books_to_read, 1):
    print(f"{i}: Subject {sub} Book {book}")
```
```
1: Subject R Book Resampling
2: Subject Python Book Effective Python
3: Subject Finance Book Factor Models in R
```
Unpacking is generalized in Python and can be applied to any iterable, including
many levels of iterables within iterables.
Prefer enumerate over range
- The << operator is the left shift operator
- The >> operator is the right shift operator
- The | operator is the bitwise OR operator
enumerate provides a concise syntax for looping over an iterator and getting the
index of each item from the iterator as you go.
- Prefer enumerate instead of looping over a range and indexing into a sequence.
- You can supply a second parameter to enumerate to specify the number from
  which to begin counting.
Use zip to process iterators in parallel
- Beware of the situation where the iterators are not of equal length: zip
  yields tuples until any one of the wrapped iterators is exhausted.
- One can also use zip_longest from itertools for the case where the iterators
  have varying lengths, as sketched below.
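A minimal sketch of the difference, assuming two lists of unequal length:

```python
import itertools

names = ['Cecilia', 'Lise', 'Marie']
counts = [7, 4]  # deliberately shorter than names

# zip stops as soon as the shortest iterator is exhausted
list(zip(names, counts))
# [('Cecilia', 7), ('Lise', 4)]

# zip_longest keeps going and fills in the missing values
list(itertools.zip_longest(names, counts, fillvalue=0))
# [('Cecilia', 7), ('Lise', 4), ('Marie', 0)]
```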
Avoid else Blocks After for and while loops
- The else block runs immediately after the loop finishes.
- The else block runs only if the loop body did not encounter a break statement.
- Avoid using else blocks after loops because their behavior isn't intuitive and
  can be confusing.
Prevent Repetition with Assignment Expressions
- An assignment expression - also known as the walrus operator - is a new syntax
  introduced in Python 3.8 to solve a long-standing problem with the language.
  It is used as follows:
```python
fresh_fruit = {
    'apple': 10, 'banana': 8, 'lemon': 5
}

if count := fresh_fruit.get('lemon', 0):
    print('Yes lemon')
else:
    print('No lemon')

if (count := fresh_fruit.get('lemon', 0)) > 4:
    print('Yes Cider')
else:
    print('No Cider')
```
- The walrus operator can also be used as a substitute for deeply nested
  if/elif/else statements.
- The walrus operator can also be used to eliminate the loop-and-a-half idiom,
  as sketched below.
- Although switch/case statements and do-while loops are not available in
  Python, their functionality can be emulated much more clearly by using
  assignment expressions.
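A minimal sketch of eliminating the loop-and-a-half idiom; make_batch is a
hypothetical helper introduced only for this example:

```python
def make_batch(source, size=3):
    """Remove and return up to `size` items from the front of `source`."""
    batch = source[:size]
    del source[:size]
    return batch

orders = list(range(10))

# Without the walrus operator this needs `while True: ... if not batch: break`.
# The assignment expression moves the exit condition into the while test.
while batch := make_batch(orders):
    print(f'processing {batch}')
```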
Lists and Dictionaries
Know How to Slice Sequences
- When slicing from the start of a list, you should leave out the zero index to
  reduce visual noise.
- When slicing to the end of a list, you should leave out the final index
  because it is redundant.
- The result of slicing a list is a whole new list.
- Assigning to a list slice replaces that range in the original sequence with
  what's referenced, even if the lengths are different.
Avoid Striding and Slicing in a Single Expression
- Specifying start, end and stride in a slice can be extremely confusing
- Prefer using positive stride values in slices without start or end indexes.
Avoid negative stride values if possible
- Avoid using start, end and stride together in a single slice. If you need all
three parameters, consider doing two assignments
Prefer Catch-All Unpacking over Slicing
```python
x = list(range(10))
a, b, *c = x
f"a:{a}, b:{b}, c:{c}"
```

```
a:0, b:1, c:[2, 3, 4, 5, 6, 7, 8, 9]
```
- Starred expressions may appear in any position, and they will always become a
  list containing the zero or more values they receive.
- When dividing a list into non-overlapping pieces, catch-all unpacking is much
  less error-prone than slicing and indexing.
Sort by Complex Criteria Using the key parameter
- Sorting arbitrary Python objects in a list works by invoking the relevant
  comparison methods on the objects. If the objects do not implement the
  comparison operators, sorted raises a TypeError.
```python
class Tool:
    def __init__(self, name, weight):
        self.name = name
        self.weight = weight

    def __repr__(self):
        return f"Tool({self.name!r}, {self.weight})"

tools = [Tool("level", 2), Tool("axe", 21)]
sorted(tools, key=lambda x: x.name), sorted(tools, key=lambda x: x.weight)
```
```
([Tool('axe', 21), Tool('level', 2)], [Tool('level', 2), Tool('axe', 21)])
```
- Tuples are comparable by default and have a natural ordering.
- Returning a tuple from the key function allows you to combine multiple sorting
  criteria together. The unary minus operator can be used to reverse individual
  sort orders for types that allow it.
- For types that can't be negated, you can combine many sorting criteria
  together by calling the sort method multiple times using different key
  functions and reverse values, in the order of lowest-rank sort call to
  highest-rank sort call, as sketched below.
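A minimal sketch of both approaches, using a namedtuple as a stand-in for the
Tool class above; the tool names and weights are made up for illustration:

```python
from collections import namedtuple

Tool = namedtuple('Tool', ['name', 'weight'])
power_tools = [Tool('drill', 4), Tool('circular saw', 5),
               Tool('jackhammer', 40), Tool('sander', 4)]

# One pass: heaviest first, then name ascending, via a tuple key with unary minus
by_weight_then_name = sorted(power_tools, key=lambda t: (-t.weight, t.name))

# Same result from two stable sorts, applied lowest-rank criterion first
power_tools.sort(key=lambda t: t.name)                  # secondary criterion
power_tools.sort(key=lambda t: t.weight, reverse=True)  # primary criterion
assert power_tools == by_weight_then_name
```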
Be Cautious When Relying on dict Insertion Ordering
- In Python 3.5 and before, iterating over a dict would return keys in an
  arbitrary order. This happened because the dictionary type previously
  implemented its hash table algorithm with a combination of the hash built-in
  function and a random seed that was assigned when the Python interpreter
  started.
- Starting with Python 3.6, and officially part of the Python spec in version
  3.7, dictionaries preserve insertion order.
Prefer get Over in and KeyError to Handle Missing Dictionary Keys
- There are four common ways to detect and handle missing keys in dictionaries:
  using in expressions, KeyError exceptions, the get method, and the setdefault
  method.
- The get method is best for dictionaries that contain basic types like
  counters, and it is preferable along with assignment expressions when creating
  dictionary values has a high cost or may raise exceptions (see the sketch
  after this list).
- setdefault tries to fetch the value of a key in the dictionary. If the key
  isn't present, the method assigns that key to the default value provided.
- When the setdefault method of dict seems like the best fit for your problem,
  you should consider using defaultdict instead.
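A minimal sketch of the get and setdefault approaches, using a hypothetical
votes dictionary:

```python
votes = {'baguette': ['Bob', 'Alice']}
key, who = 'brioche', 'Elmer'

# get plus an assignment expression: a single lookup and a clear flow
if (names := votes.get(key)) is None:
    votes[key] = names = []
names.append(who)

# setdefault does the fetch-or-insert in one call, but the default list
# is constructed on every call even when the key already exists
votes.setdefault('ciabatta', []).append('Hugh')
```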
Prefer defaultdict Over setdefault to Handle Missing Items in Internal State
- If you are creating a dictionary to manage an arbitrary set of potential keys,
  then you should prefer using a defaultdict instance from the collections
  built-in module if it suits your problem.
- If a dictionary of arbitrary keys is passed to you, and you don't control its
  creation, then you should prefer the get method to access its items. However,
  it's worth considering using the setdefault method for the few situations in
  which it leads to shorter code.
Know how to construct key-dependent default values using __missing__
- The setdefault method of dict is a bad fit when creating the default value has
  a high computational cost.
- The function passed to defaultdict must not require any arguments, which makes
  it impossible to have the default value depend on the key being accessed.
- You can define your own dict subclass with a __missing__ method in order to
  construct default values that must know which key was being accessed.
Functions
Never Unpack More than Three Variables When Functions Return Multiple Values
- You can have functions return multiple values by putting them in a tuple and
  having the caller take advantage of Python's unpacking syntax.
- Multiple return values from a function can also be unpacked by catch-all
  starred expressions.
- Unpacking into four or more variables is error-prone and should be avoided;
  use a namedtuple instance instead.
Prefer Raising Exceptions to Returning None
- Functions that return None to indicate special meaning are error-prone because
  None and other values all evaluate to False in conditional expressions.
- Raise exceptions to indicate special situations instead of returning None
  (a sketch follows this list).
- Type annotations can be used to make it clear that a function will never
  return the value None, even in special situations.
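A minimal sketch of raising instead of returning None, adapted from the book's
division example:

```python
def careful_divide(a, b):
    """Divides a by b.

    Raises:
        ValueError: When the inputs cannot be divided.
    """
    try:
        return a / b
    except ZeroDivisionError as e:
        raise ValueError('Invalid inputs') from e

try:
    result = careful_divide(1, 0)
except ValueError:
    print('Invalid inputs')   # the caller cannot mistake an error for 0 or False
```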
Know How Closures Interact with Variable Scope
- Python supports closures - that is, functions that refer to variables from the
  scope in which they were defined.
- Python has specific rules for comparing sequences. It first compares items at
  index zero; if they are equal, it compares items at index one, and so on.
- When you reference a variable in an expression, the Python interpreter
  traverses the scope to resolve the reference in this order:
  - the current function's scope
  - any enclosing scopes
  - the scope of the module that contains the code
  - the built-in scope
- Assigning a value to a variable works differently. If the variable is already
  defined in the current scope, it will just take on the new value. If the
  variable doesn't exist in the current scope, Python treats the assignment as a
  variable definition. Critically, the scope of the newly defined variable is
  the function that contains the assignment.
- There is special syntax for getting data out of a closure. The nonlocal
  statement is used to indicate that scope traversal should happen upon
  assignment for a specific variable name.
- Avoid using nonlocal statements for anything beyond simple functions.
- Use the nonlocal statement to indicate when a closure can modify a variable in
  its enclosing scope, as in the sketch below.
- By default, closures can't affect enclosing scopes by assigning variables.
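A minimal sketch of a closure assigning an enclosing variable via nonlocal,
along the lines of the book's sort_priority example:

```python
def sort_priority(numbers, group):
    found = False

    def helper(x):
        nonlocal found          # assign the variable in the enclosing scope
        if x in group:
            found = True
            return (0, x)       # group members sort first
        return (1, x)

    numbers.sort(key=helper)
    return found

numbers = [8, 3, 1, 2, 5, 4, 7, 6]
print(sort_priority(numbers, {2, 3, 5, 7}))  # True
print(numbers)                               # [2, 3, 5, 7, 1, 4, 6, 8]
```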
Reduce Visual Noise with variable positional arguments
- Optional positional arguments are always turned into a tuple before they are
  passed to a function.
- Functions that accept *args are best for situations where you know the number
  of inputs in the argument list will be reasonably small.
- Using the * operator with a generator may cause a program to run out of memory
  and crash.
Provide Optional Behavior with Keyword Arguments
- Positional arguments must be specified before keyword arguments
- Function arguments can be specified by position or by keyword
- Keywords make it clear what the purpose of each argument is when it would be
confusing with only positional arguments
- Keyword arguments with default values make it easy to add new behaviors to a
function without needing to migrate all existing callers
- Optional keyword arguments should always be passed by keyword instead of by
position
Use None and Docstrings to specify dynamic default arguments
- A default argument value is evaluated only once per module load, which usually
  happens when a program starts up. After the module containing this code is
  loaded, a datetime.now() default argument will never be evaluated again.
- Use None as the default value for any keyword argument that has a dynamic
  value, and document the actual default behavior in the function's docstring,
  as sketched below.
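A minimal sketch along the lines of the book's logging example:

```python
from datetime import datetime
from time import sleep

def log(message, when=None):
    """Log a message with a timestamp.

    Args:
        message: Message to print.
        when: datetime of when the message occurred.
            Defaults to the present time.
    """
    if when is None:
        when = datetime.now()
    print(f'{when}: {message}')

log('Hi there!')
sleep(0.1)
log('Hello again!')   # the two timestamps differ, unlike with when=datetime.now()
                      # as the default value
```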
Enforce clarity with Keyword-Only and Positional-Only Arguments
- Keyword-only arguments force callers to supply certain arguments by keyword,
  which makes the intention of the function call clearer. Keyword-only arguments
  are defined after a single * in the argument list.
- Positional-only arguments ensure that callers can't supply certain parameters
  using keywords, which helps reduce coupling. They are defined before a single
  / in the argument list.
- Parameters between the / and * characters in the argument list may be supplied
  by position or keyword.
- Decorators in Python are syntax that allows one function to modify another
  function at runtime.
- Using decorators can cause strange behaviors in tools that do introspection.
- Use the wraps decorator from the functools built-in module when you define
  your own decorators to avoid issues, as sketched after this list.
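A minimal sketch of a decorator that uses functools.wraps; the trace decorator
and fibonacci function are illustrative only:

```python
from functools import wraps

def trace(func):
    @wraps(func)                 # copy __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        print(f'{func.__name__}({args!r}, {kwargs!r}) -> {result!r}')
        return result
    return wrapper

@trace
def fibonacci(n):
    """Return the n-th Fibonacci number."""
    if n in (0, 1):
        return n
    return fibonacci(n - 2) + fibonacci(n - 1)

fibonacci(3)
print(fibonacci.__name__)   # 'fibonacci', not 'wrapper', thanks to wraps
```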
Comprehensions and Generators
Use comprehensions instead of map and filter
- List comprehensions are cleaner than the map and filter built-in functions
  because they don't require lambda expressions.
```python
data = list(range(10))
x1 = [x * 2 for x in data if x % 2 == 0]
x2 = list(map(lambda x: x * 2, filter(lambda x: x % 2 == 0, data)))
x1 == x2
```
- List comprehensions allow you to easily skip items from the input list, a
  behavior that map doesn't support without help from filter.
- Dictionaries and sets can also be created using comprehensions.
Avoid more than two control subexpressions in comprehensions
- Comprehensions support multiple if conditions; multiple conditions at the same
  loop level have an implicit and expression.
- Comprehensions support multiple levels of looping and multiple conditions per
  loop level.
Avoid repeated work in comprehensions by using Assignment expressions
- If a comprehension uses the walrus operator in the value part of the
  comprehension and doesn't have a condition, it'll leak the loop variable into
  the containing scope.
- Assignment expressions make it possible for comprehensions and generator
  expressions to reuse the value from one condition elsewhere in the same
  comprehension, which can improve readability and performance.
Consider Generators Instead of Returning Lists
- Using generators can be clearer than the alternative of having a function
  return a list of accumulated results.
- The iterator returned by a generator produces the set of values passed to
  yield expressions within the generator function's body.
- Generators can produce a sequence of outputs for arbitrarily large inputs
  because their working memory doesn't include all inputs and outputs.
Be Defensive when iterating over arguments
- The iterator protocol is how Python for loops and related expressions traverse
  the contents of a container type. When Python sees a statement like
  for x in foo, it actually calls iter(foo). The iter built-in function calls
  the foo.__iter__ special method in turn. The __iter__ method must return an
  iterator object. Then, the for loop repeatedly calls the next built-in
  function on the iterator object until it's exhausted.
- When an iterator is passed to the iter built-in function, iter returns the
  iterator itself.
- When a container type is passed to iter, a new iterator object is returned
  each time.
- Beware of functions and methods that iterate over input arguments multiple
  times. If these arguments are iterators, you may see strange behavior and
  missing values.
- Python's iterator protocol defines how containers and iterators interact with
  the iter and next built-in functions, for loops, and related expressions.
- You can easily define your own iterable container type by implementing the
  __iter__ method as a generator.
- You can detect that a value is an iterator (instead of a container) if calling
  iter on it produces the same value as what you passed in. A sketch of both
  points follows.
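A minimal sketch, simplified from the book's file-reading example to use an
in-memory list:

```python
class ReadVisits:
    def __init__(self, data):
        self.data = data

    def __iter__(self):            # implemented as a generator
        for value in self.data:
            yield value

visits = ReadVisits([15, 35, 80])
print(sum(visits), sum(visits))    # a fresh iterator each time, so both sums work

def normalize_defensive(numbers):
    if iter(numbers) is numbers:   # an iterator, not a container: would be exhausted
        raise TypeError('Must supply a container')
    total = sum(numbers)
    return [100 * x / total for x in numbers]

print(normalize_defensive(visits))
```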
Consider Generator Expressions for Large List Comprehensions
- List comprehensions can cause problems for large inputs by using too much
  memory.
- Generator expressions avoid memory issues by producing outputs one at a time
  as iterators.
- Generator expressions can be composed by passing the iterator from one
  generator expression into the for subexpression of another.
- Generator expressions execute very quickly when chained together and are
  memory efficient.
Compose Multiple Generators with yield from
- The yield from expression allows you to compose multiple nested generators
  together into a single combined generator.
- yield from provides better performance than manually iterating nested
  generators and yielding their outputs.
Avoid Injecting Data into Generators with send
- Python generators support the send method, which upgrades yield expressions
  into a two-way channel. The send method can be used to provide streaming
  inputs to a generator at the same time it's yielding outputs.
- The send method can be used to inject data into a generator by giving the
  yield expression a value that can be assigned to a variable.
- Using send with yield from expressions may cause surprising behavior, such as
  None values appearing at unexpected times in the generator output.
- Providing an input iterator to a set of composed generators is a better
  approach than using the send method.
Avoid Causing State Transitions in Generators with throw
- The way throw works is simple: when the method is called, the next occurrence
  of a yield expression re-raises the provided Exception instance after its
  output is received instead of continuing normally.
- The throw method can be used to re-raise exceptions within generators at the
  position of the most recently executed yield expression.
Consider itertools for Working with Iterators and Generators
- Use chain to combine multiple iterators into a single sequential iterator.
- Use repeat to output a single value forever.
- Use cycle to repeat an iterator's items forever.
- Use tee to split a single iterator into a number of parallel iterators.
- Use islice to slice an iterator by numerical indexes without copying.
- Use takewhile and dropwhile to filter iterator values.
- accumulate folds an item from an iterator into a running value by applying a
  function that takes two parameters.
- product returns the Cartesian product of items from one or more iterators.
- permutations returns the unique ordered permutations of length N with items
  from an iterator.
- The itertools functions fall into three main categories for working with
  iterators and generators: linking iterators together, filtering items they
  output, and producing combinations of items. A few of them are sketched below.
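A quick, illustrative tour of a few of these functions:

```python
import itertools as it

list(it.chain([1, 2], [3, 4]))                    # [1, 2, 3, 4]
list(it.islice(it.cycle([1, 2]), 5))              # [1, 2, 1, 2, 1]
list(it.repeat('hi', 3))                          # ['hi', 'hi', 'hi']
list(it.accumulate([1, 2, 3, 4]))                 # running totals: [1, 3, 6, 10]
list(it.takewhile(lambda x: x < 3, [1, 2, 3, 1])) # [1, 2]
list(it.product([1, 2], ['a', 'b']))              # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
list(it.permutations([1, 2, 3], 2))               # all ordered pairs of distinct items
```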
Classes and Interfaces
Compose classes instead of nesting many levels of built-in types
- Avoid making dictionaries with values that are dictionaries, long tuples, or
  complex nestings of other built-in types.
- Use namedtuple for lightweight, immutable data containers before you need the
  flexibility of a full class.
- Move your bookkeeping code to using multiple classes when your internal state
  dictionaries get complicated.
- Although a namedtuple is useful in many circumstances, it's important to
  understand when it can do more harm than good:
  - You can't specify default argument values for namedtuple classes. This makes
    them unwieldy when your data may have many optional properties.
  - The attribute values of namedtuple instances are still accessible using
    numerical indexes and iteration.
Accept Functions Instead of Classes for Simple Interfaces
- Instead of defining and instantiating classes, you can often simply use
  functions for simple interfaces between components in Python.
- References to functions and methods in Python are first class, meaning they
  can be used in expressions.
- The __call__ special method enables instances of a class to be called like
  plain Python functions.
- When you need a function to maintain state, consider defining a class that
  provides the __call__ method instead of defining a stateful closure.
Use @classmethod Polymorphism to Construct Objects Generically
- Polymorphism enables multiple classes in a hierarchy to implement their own
  unique versions of a method. This means that many classes can fulfill the same
  interface or abstract base class while providing different functionality.
- Use @classmethod to define alternative constructors for your classes.
- Use class method polymorphism to provide generic ways to build and connect
  many concrete subclasses.
- https://realpython.com/courses/threading-python/
Initialize Parent Classes with super
- Python's standard method resolution order (MRO) solves the problems of
  superclass initialization order and diamond inheritance.
- Use the super built-in function with zero arguments to initialize parent
  classes.
- A metaclass lets you intercept Python's class statement and provide special
  behavior each time a class is defined.
Use Plain attributes instead of Setter and Getter methods
- In Python, you never need to implement explicit setter or getter methods.
  property() is a built-in function that creates and returns a property object.
  Its signature is:

```python
property(fget=None, fset=None, fdel=None, doc=None)
```

where
- fget is the function to get the value of the attribute
- fset is the function to set the value of the attribute
- fdel is the function to delete the attribute
- doc is a string that becomes the property's docstring
- Define new class interfaces using simple public attributes and avoid defining
  setter and getter methods.
- Use @property to define special behavior when attributes are accessed on your
  objects, if necessary, as sketched below.
- Follow the rule of least surprise and avoid odd side effects in your @property
  methods.
- Ensure that @property methods are fast; for slow or complex work - especially
  involving I/O or causing side effects - use normal methods instead.
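A minimal sketch of a validating setter behind plain attribute syntax; the
Resistor class is illustrative:

```python
class Resistor:
    def __init__(self, ohms):
        self._ohms = ohms

    @property
    def ohms(self):
        return self._ohms

    @ohms.setter
    def ohms(self, value):
        if value <= 0:
            raise ValueError(f'ohms must be > 0; got {value}')
        self._ohms = value

r = Resistor(1e3)
r.ohms = 10          # plain attribute syntax, validated by the setter
# r.ohms = 0         # would raise ValueError
```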
Consider @property instead of refactoring attributes
- Use @property to give existing instance attributes new functionality.
- Make incremental progress towards better data models by using @property.
- Consider refactoring a class and all call sites when you find yourself using
  @property too heavily.
Use Descriptors for Reusable @property methods
- The big problem with the @property built-in is reuse: the methods it decorates
  can't be reused for multiple attributes of the same class.
- The descriptor protocol defines how attribute access is interpreted by the
  language. A descriptor class can provide __get__ and __set__ methods that let
  you reuse any validation logic without boilerplate.
- The weakref module provides a special class called WeakKeyDictionary that can
  take the place of a simple dictionary. Its unique behavior is that Python does
  the bookkeeping for you, and the dictionary will be empty when all of its keys
  are no longer in use.
- Reuse the behavior and validation of @property methods by defining your own
  descriptor classes.
- Use WeakKeyDictionary to ensure that your descriptor classes don't cause
  memory leaks.
- Don't get bogged down trying to understand exactly how __getattribute__ uses
  the descriptor protocol for getting and setting attributes.
Use __getattr__, __getattribute__, and __setattr__ for Lazy attributes
I learned that it is important to pay attention to whether your classes have an
implementation of __getattribute__.
- Use __getattr__ and __setattr__ to lazily load and save attributes for an
  object, as sketched below.
- Understand that __getattr__ only gets called when accessing a missing
  attribute, whereas __getattribute__ gets called every time any attribute is
  accessed.
- Avoid infinite recursion in __getattribute__ and __setattr__ by using methods
  from super() to access instance attributes.
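A minimal sketch of lazily constructing attributes with __getattr__, along the
lines of the book's LazyRecord example:

```python
class LazyRecord:
    def __init__(self):
        self.exists = 5

    def __getattr__(self, name):
        # Called only when `name` is not found through normal attribute lookup
        value = f'Value for {name}'
        setattr(self, name, value)   # cache it so __getattr__ isn't called again
        return value

data = LazyRecord()
print(data.foo)   # 'Value for foo', created lazily on first access
print(data.foo)   # now an ordinary instance attribute; __getattr__ is skipped
```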
Validate Subclasses with __init_subclass__
- A metaclass is defined by inheriting from type.
- A metaclass receives the contents of the associated class statement in its
  __new__ method.
- The metaclass has access to the name of the class, the parent classes it
  inherits from, and all the class attributes that are defined in the class
  body.
- Python 3.6 introduced the simplified syntax __init_subclass__ that can be used
  to validate subclasses when they are defined.
Register Class Existence with __init_subclass__
- Class registration is a helpful pattern for building modular Python programs
- Metaclasses let you run registration code automatically each time a base
class is subclassed in a program
- Using metaclasses for class registration helps you avoid errors by ensuring
that you never miss a registration call
- Prefer __init_subclass__ over standard metaclass machinery because it's
  clearer and easier for beginners to understand.
Concurrency and Parallelism
Use subprocess to manage child processes
- Python has many ways to run subprocesses, but the best choice for managing
  child processes is to use the subprocess built-in module.
- Child processes run in parallel with the Python interpreter, enabling you to
  maximize your usage of CPU cores.
- Use the run convenience function for simple usage, and the Popen class for
  advanced usage like UNIX-style pipelines (a run sketch follows this list).
- Use the timeout parameter of the communicate method to avoid deadlocks and
  hanging child processes.
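A minimal sketch of subprocess.run, assuming a UNIX-like system where the echo
command is available:

```python
import subprocess

result = subprocess.run(
    ['echo', 'Hello from the child!'],
    capture_output=True, encoding='utf-8', timeout=5)

result.check_returncode()   # raises CalledProcessError if the exit status was non-zero
print(result.stdout)
```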
Use threads for blocking I/O, Avoid for Parallelism
Because of the way CPython works, threading may not speed up all tasks. This is
due to interactions with the GIL that essentially limit execution to one Python
thread at a time.
- The standard implementation of Python is called CPython. CPython runs a Python
  program in two steps. First, it parses and compiles the source text into
  bytecode, which is a low-level representation of the program. Then, CPython
  runs the bytecode using a stack-based interpreter. The bytecode interpreter
  has state that must be maintained and coherent while the program executes.
  CPython enforces coherence with the GIL.
- The GIL is a mutex that prevents CPython from being affected by preemptive
  multithreading, where one thread takes control of a program by interrupting
  another thread.
- Why does Python support threads at all?
  - Multiple threads make it easy for a program to seem like it's doing multiple
    things at the same time. Managing the juggling act of simultaneous tasks is
    difficult to implement yourself. With threads, you can leave it to Python to
    run your functions concurrently.
  - They help in dealing with blocking I/O, which happens when Python does
    certain types of system calls.
- System calls will run in parallel from multiple Python threads even though
  they are limited by the GIL. The GIL prevents Python code from running in
  parallel, but it has no effect on system calls. This works because Python
  threads release the GIL just before they make system calls, and they reacquire
  the GIL as soon as the system calls are done.
- Use Python threads to make multiple system calls in parallel. This allows you
  to do blocking I/O at the same time as computation.
Use Lock to prevent data races in threads
- Although only one Python thread runs at a time, a thread's operations on data
  structures can be interrupted between any two bytecode instructions in the
  Python interpreter.
```python
from threading import Thread

class Counter:
    def __init__(self):
        self.count = 0

    def increment(self, offset):
        self.count += offset

def worker(sensor_index, how_many, counter):
    for _ in range(how_many):
        counter.increment(1)

how_many = 10 ** 5
counter = Counter()

threads = []
for i in range(5):
    thread = Thread(target=worker, args=(i, how_many, counter))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

expected = how_many * 5
found = counter.count
print(f"Counter should be {expected}, got {found}")
```
```
Counter should be 500000, got 374258
```
The Python interpreter enforces fairness between all of the threads that are
executing to ensure they get roughly equal processing time. To do this, Python
suspends a thread as it's running and resumes another thread in turn. The
problem is that you don't know exactly when Python will suspend your threads. A
thread can even be paused seemingly halfway through what looks like an atomic
operation. The above program can easily be modified with the help of Lock to get
the desired output:
```python
from threading import Thread
from threading import Lock

class Counter:
    def __init__(self):
        self.count = 0
        self.lock = Lock()

    def increment(self, offset):
        with self.lock:
            self.count += offset

def worker(sensor_index, how_many, counter):
    for _ in range(how_many):
        counter.increment(1)

how_many = 10 ** 5
counter = Counter()

threads = []
for i in range(5):
    thread = Thread(target=worker, args=(i, how_many, counter))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

expected = how_many * 5
found = counter.count
print(f"Counter should be {expected}, got {found}")
```
```
Counter should be 500000, got 500000
```
Use Queue to Coordinate Work Between Threads
- Pipelines are a great way to organize sequences of work - especially I/O-bound
  programs - that run concurrently using multiple Python threads.
- Be aware of the many problems in building concurrent pipelines: busy waiting,
  how to tell workers to stop, and potential memory explosion.
- The Queue class has all the facilities you need to build robust pipelines:
  blocking operations, buffer sizes, and joining.
Know How to Recognize When Concurrency is Necessary
- A program often grows to require multiple concurrent lines of execution as its
scope and complexity increases
- The most common types of concurrency coordination are fan-out (generating new
  units of concurrency) and fan-in (waiting for existing units of concurrency to
  complete).
- Python has many different ways of achieving fan-out and fan-in
Avoid Creating New Thread Instances for On-demand Fan-out
- Thread instances require special tools to coordinate with each other safely.
  This makes code that uses threads harder to reason about than procedural,
  single-threaded code. This complexity makes threaded code more difficult to
  extend and maintain over time.
- Threads require a lot of memory - about 8 MB per executing thread. On many
  computers, that amount of memory doesn't matter for, say, 100 threads. But if
  you spawn 10,000 threads, it becomes an issue, as you would need about 80 GB
  of memory.
- Starting a thread is costly, and threads have a negative performance impact
  when they run due to context switching between them. In the book's Game of
  Life example, all of the threads are started and stopped each generation of
  the game, which has high overhead and increases latency beyond the expected
  I/O time.
- The Thread class will independently catch any exceptions that are raised by
  the target function and then write their traceback to sys.stderr. Such
  exceptions are never re-raised to the caller that started the thread in the
  first place.
- Threads have many downsides: they're costly to start and run if you need a lot
  of them, they each require a significant amount of memory, and they require
  special tools like Lock instances for coordination.
- Threads do not provide a built-in way to raise exceptions back in the code
  that started a thread or that is waiting for one to finish, which makes them
  difficult to debug.
Understand How Using Queue for Concurrency Requires Refactoring
- Using Queue instances with a fixed number of worker threads improves the
  scalability of fan-out and fan-in using threads.
- It takes a significant amount of work to refactor existing code to use Queue,
  especially when multiple stages of a pipeline are required.
- Using Queue fundamentally limits the total amount of I/O parallelism a program
  can leverage compared to alternative approaches provided by other built-in
  Python features and modules.
Consider ThreadPoolExecutor when threads are necessary for concurrency
- Python includes the concurrent.futures built-in module, which provides the
  ThreadPoolExecutor class. It combines the best of Thread and Queue.
- The threads used for the executor can be allocated in advance, which means
  there is no startup cost for each execution.
- ThreadPoolExecutor automatically propagates exceptions back to the caller.
- The big problem with using ThreadPoolExecutor is that it won't be able to
  scale.
- Although ThreadPoolExecutor eliminates the potential memory blow-up issues of
  using threads, it also limits I/O parallelism by requiring max_workers to be
  specified upfront.
- ThreadPoolExecutor enables simple I/O parallelism with limited refactoring,
  easily avoiding the cost of thread startup each time fan-out concurrency is
  required, as sketched below.
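A minimal fan-out/fan-in sketch with ThreadPoolExecutor; the URLs are arbitrary
examples and the snippet assumes network access:

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

urls = ['https://www.python.org', 'https://pypi.org']

def fetch(url):
    with urllib.request.urlopen(url, timeout=10) as conn:
        return url, len(conn.read())

with ThreadPoolExecutor(max_workers=4) as pool:            # pool size fixed upfront
    futures = [pool.submit(fetch, url) for url in urls]    # fan-out
    for future in futures:                                  # fan-in
        print(future.result())   # re-raises any exception from the worker thread
```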
Achieve Highly Concurrent I/O with Coroutines
- Python addresses the need for highly concurrent I/O with coroutines.
  Coroutines let you have a very large number of seemingly simultaneous
  functions in your Python programs.
- The cost of starting a coroutine is a function call. Once a coroutine is
  active, it uses less than 1 KB of memory until it's exhausted.
- Like threads, coroutines are independent functions that can consume inputs
  from their environment and produce resulting outputs. The difference is that
  coroutines pause at each await expression and resume executing an async
  function after the pending awaitable is resolved.
- The magic mechanism powering coroutines is the event loop, which can do highly
  concurrent I/O efficiently, while rapidly interleaving execution between
  appropriately written functions.
- The beauty of coroutines is that they decouple your code's instructions for
  the external environment from the implementation that carries out your wishes.
- Coroutines can use fan-out and fan-in in order to parallelize I/O while also
  overcoming all the problems associated with doing I/O in threads.
Know how to port threaded I/O to asyncio
- Python's support for asynchronous execution is well integrated into the
  language.
- Python provides asynchronous versions of for loops, with statements,
  generators, comprehensions, and library helper functions that can be used as
  drop-in replacements in coroutines.
- The asyncio built-in module makes it straightforward to port existing code
  that uses threads and blocking I/O over to coroutines and asynchronous I/O.
Consider concurrent.futures for True Parallelism
- It enables Python to utilize multiple CPU cores in parallel by running
  additional interpreters as child processes. These child processes are separate
  from the main interpreter, so their global interpreter locks are also
  separate. Each child can fully utilize one CPU core. Each child has a link to
  the main process where it receives instructions to do computation and returns
  results.
- What does ProcessPoolExecutor do? (A sketch follows this list.)
  - It takes each item from the args list.
  - It serializes each item into binary data using the pickle module.
  - It copies the serialized data from the main interpreter process to a child
    interpreter process over a local socket.
  - It deserializes the data back into Python objects using pickle in the child
    process.
  - It imports the Python module containing the relevant function.
  - It runs the function on the input data in parallel with other child
    processes.
  - It serializes the results back into binary data.
  - It copies the binary data back through the socket.
  - It deserializes the binary data back into Python objects in the parent
    process.
  - It merges the results from multiple children.
- Moving CPU bottlenecks to C-extension modules can be an effective way to
  improve performance while maximizing your investment in Python code.
- The multiprocessing module provides powerful tools that can parallelize
  certain types of Python computation with minimal effort.
- The power of multiprocessing is best accessed through the concurrent.futures
  built-in module.
- Avoid the advanced parts of the multiprocessing module until you have
  exhausted all other options.
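A minimal sketch of ProcessPoolExecutor, along the lines of the book's gcd
example; the number pairs are arbitrary:

```python
from concurrent.futures import ProcessPoolExecutor

def gcd(pair):
    a, b = pair
    low = min(a, b)
    for i in range(low, 0, -1):
        if a % i == 0 and b % i == 0:
            return i
    assert False, 'Not reachable'

numbers = [(1963309, 2265973), (2030677, 3814172),
           (1551645, 2229620), (2039045, 2020802)]

if __name__ == '__main__':
    # Each pair is pickled, shipped to a child interpreter, computed there in
    # parallel, and the result is pickled back to the parent.
    with ProcessPoolExecutor(max_workers=2) as pool:
        results = list(pool.map(gcd, numbers))
    print(results)
```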
Robustness and Performance
Take Advantage of Each Block in try/except/else/finally
- Use try/finally when you want exceptions to propagate up but also want to run
  cleanup code even when exceptions occur.
- Use try/except/else to make it clear which exceptions will be handled by your
  code and which exceptions will propagate up.
- Use try/except/else/finally when you want to do it all in one compound
  statement. For example, say that I want to read a description of work to do
  from a file, process it, and then update the file in place. The try block is
  used to read the file and process it; the except block is used to handle
  exceptions from the try block that are expected; the else block is used to
  update the file in place and allow related exceptions to propagate up; and the
  finally block cleans up the file handle. A sketch follows this list.
- The else block helps you minimize the amount of code in try blocks and
  visually distinguish the success case from the try/except blocks.
- An else block can be used to perform additional actions after a successful try
  block but before common cleanup in a finally block.
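A sketch of the file-updating example described above, adapted from the book;
path is a hypothetical file containing a JSON object with numerator and
denominator fields:

```python
import json

UNDEFINED = object()

def divide_json(path):
    handle = open(path, 'r+')                 # may raise OSError
    try:
        data = handle.read()                  # may raise UnicodeDecodeError
        op = json.loads(data)                 # may raise ValueError
        value = op['numerator'] / op['denominator']  # may raise ZeroDivisionError
    except ZeroDivisionError:
        return UNDEFINED                      # the expected, handled failure
    else:
        op['result'] = value
        handle.seek(0)
        handle.write(json.dumps(op))          # unexpected errors here propagate up
        return value
    finally:
        handle.close()                        # always runs
```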
Consider contextlib and with Statements for Reusable try/finally Behavior
- The with statement in Python is used to indicate when code is running in a
  special context.
- It is easy to make your objects and functions work in with statements by using
  the contextlib built-in module. This module contains the contextmanager
  decorator, which lets a simple function be used in with statements. This is
  much easier than defining a new class with the special methods __enter__ and
  __exit__.
- The context manager passed to a with statement may also return an object. The
  object is assigned to a local variable in the as part of the compound
  statement.
- The value yielded by a context manager is supplied to the as part of the with
  statement. It is useful for letting your code directly access the cause of a
  special context.
- The contextlib built-in module provides a contextmanager decorator that makes
  it easy to use your own functions in with statements, as sketched below.
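A minimal sketch of a contextmanager-based function that also yields a value for
the as target; the logging setup is illustrative:

```python
from contextlib import contextmanager
import logging

@contextmanager
def debug_logging(level):
    logger = logging.getLogger()
    old_level = logger.getEffectiveLevel()
    logger.setLevel(level)
    try:
        yield logger                  # the value bound by `with ... as logger`
    finally:
        logger.setLevel(old_level)    # restored even if the body raises

with debug_logging(logging.DEBUG) as logger:
    logger.debug('Visible only inside the with block')
```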
Use datetime instead of time for Local clocks
- The time module fails to work properly for multiple local times. Thus, you
  should avoid using the time module for this purpose. If you must use time, use
  it only to convert between UTC and the host computer's local time.
- datetime only provides the machinery for time zone operations with its tzinfo
  class and related methods. The Python default installation is missing time
  zone definitions besides UTC.
- To use pytz effectively, you should always convert local times to UTC first.
  Perform any datetime operations you need on the UTC values. Then convert to
  local times as a final step.
- Always represent time in UTC and do conversions to local time as the very
  final step before presentation.
Make pickle reliable with copyreg
- The purpose of pickle is to let you pass Python objects between programs that
  you control over binary channels.
- If you serialize, deserialize, and then serialize again after making changes
  to the classes, there will be inconsistencies between previously serialized
  objects and the most recently serialized objects.
- Deserializing previously pickled objects may break if the classes involved
  have changed over time.
- The copyreg module lets you register the functions responsible for serializing
  and deserializing Python objects, allowing you to control the behavior of
  pickle and make it more reliable.
- Use the copyreg built-in module with pickle to ensure backward compatibility
  of serialized objects.
Use decimal when precision is paramount
- The Decimal class from the decimal built-in module provides fixed-point math
  of 28 decimal places by default.
- The Decimal class is ideal for situations that require high precision and
  control over rounding behavior, such as computations of monetary values.
- Pass str instances to the Decimal constructor instead of float instances if
  it's important to compute exact answers and not floating point approximations.
  A sketch follows.
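A minimal sketch of exact decimal arithmetic with explicit rounding; the rate
and duration are made-up billing numbers:

```python
from decimal import Decimal, ROUND_UP

rate = Decimal('1.45')                # pass a str, not a float, for exact values
seconds = Decimal(222)
cost = rate * seconds / Decimal(60)   # Decimal('5.365')

print(cost.quantize(Decimal('0.01'), rounding=ROUND_UP))   # 5.37
```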
Profile before optimizing
- Python provides a built-in profiler for determining which parts of a program
  are responsible for its execution time. This means you can focus your
  optimization efforts on the biggest sources of trouble and ignore parts of the
  program that don't impact speed.
- Python provides two built-in profilers: one that is pure Python and another
  that is a C-extension module. The cProfile built-in module is better because
  of its minimal impact on the performance of your program while it's being
  profiled.
- The Profile object's runcall method provides everything you need to profile a
  tree of function calls in isolation.
- The Stats object lets you select and print the subset of profiling information
  you need to see to understand your program's performance. A sketch follows.
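A minimal sketch of profiling a single call tree with cProfile and pstats;
insertion_sort is a deliberately slow toy function:

```python
from cProfile import Profile
from pstats import Stats
from random import randint

def insertion_sort(data):
    result = []
    for value in data:
        for i, existing in enumerate(result):
            if existing >= value:
                result.insert(i, value)
                break
        else:
            result.append(value)
    return result

data = [randint(0, 10 ** 4) for _ in range(1000)]

profiler = Profile()
profiler.runcall(lambda: insertion_sort(data))   # profile just this call tree

stats = Stats(profiler)
stats.strip_dirs().sort_stats('cumulative').print_stats()
```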
Prefer deque for Producer-Consumer Queues
- The list type can be used as a FIFO queue by having the producer call append
  to add items and the consumer call pop(0) to receive items. However, this may
  cause problems because the performance of pop(0) degrades superlinearly as the
  queue length increases.
- The deque class from the collections built-in module takes constant time -
  regardless of length - for append and popleft, making it ideal for FIFO
  queues.
Consider Searching Sorted Sequences with bisect
- Searching sorted data contained in a list takes linear time using the index
  method or a for loop with simple comparisons.
- The bisect built-in module's bisect_left function takes logarithmic time to
  search for values in sorted lists, which can be orders of magnitude faster
  than other approaches, as sketched below.
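A minimal sketch of bisect_left on a sorted list:

```python
from bisect import bisect_left

data = list(range(10 ** 5))            # already sorted

index = bisect_left(data, 91234)       # logarithmic-time search for an exact value
assert index == 91234

index = bisect_left(data, 91234.5)     # also finds the insertion point for a
assert data[index] == 91235            # value that isn't present
```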
Know How to Use heapq for Priority Queues
Testing and Debugging
Consider Interactive Debugging with pdb
- In most other programming languages, you use a debugger by specifying what
  line of a source file you would like to stop on, and then execute the program.
  In contrast, with Python, the easiest way to use the debugger is by modifying
  your program to directly initiate the debugger just before you think you'll
  have an issue worth investigating.
- Three very useful commands make inspecting the running program easier.
- When you are done inspecting the current state, you can use these five
  debugger commands to control the program's execution: step, next, return,
  continue, and quit.
- The Python debugger prompt is a full Python shell that lets you inspect and
  modify the state of a running program.
Use tracemalloc to understand memory usage and leaks
- Memory management in the default implementation of Python, CPython, uses
  reference counting. This ensures that as soon as all references to an object
  have expired, the referenced object is also cleared from memory, freeing up
  that space for other data. CPython also has a built-in cycle detector to
  ensure that self-referencing objects are eventually garbage collected. In
  theory, this means that most Python programmers don't have to worry about
  allocating or deallocating memory in their programs.
- One of the first ways to debug memory usage is to ask the gc built-in module
  to list every object currently known by the garbage collector.
- It can be difficult to understand how Python programs use and leak memory.
- The gc module can help you understand which objects exist, but it has no
  information about how they were allocated.
- The tracemalloc built-in module provides powerful tools for understanding the
  sources of memory usage, as sketched below.
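A minimal sketch of comparing tracemalloc snapshots around a suspect piece of
code; the list of objects stands in for whatever you are investigating:

```python
import tracemalloc

tracemalloc.start(10)                     # keep up to 10 stack frames per allocation
before = tracemalloc.take_snapshot()

x = [object() for _ in range(100_000)]   # stand-in for the code being investigated

after = tracemalloc.take_snapshot()
stats = after.compare_to(before, 'lineno')
for stat in stats[:3]:
    print(stat)                           # the biggest sources of new memory usage
```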
Collaboration
- The Python Package Index contains a wealth of common packages that are built
  and maintained by the Python community.
- pip is the command line tool you can use to install packages from PyPI.
- The majority of PyPI modules are free and open source software.
Use Virtual Environments for Isolated and Reproducible Environments
- Virtual environments allow you to use pip to install many different versions
  of the same package on the same machine without conflicts.
- Virtual environments are created with python -m venv, enabled with
  source bin/activate, and disabled with deactivate.
- You can dump all the requirements of an environment with
  python3 -m pip freeze.
- You can reproduce an environment by running
  python3 -m pip install -r requirements.txt.
Write Docstrings for Every Function, Class, and Module
- Documentation in Python is extremely important because of the dynamic nature
  of the language. Python provides built-in support for attaching documentation
  to blocks of code. Unlike with many other languages, the documentation from a
  program's source code is directly accessible as the program runs.
- You can use the built-in pydoc module from the command line to run a local web
  server that hosts all the Python documentation that's accessible to your
  interpreter.
- Each module should have a top-level docstring - a string literal that is the
  first statement in the source file. The goal of this docstring is to introduce
  the module and its contents.
- If you are using type annotations, omit the information that's already present
  in type annotations from docstrings, since it would be redundant to have it in
  both places.
- For functions and methods: document every argument, returned value, raised
  exception, and other behaviors in the docstring following the def statement.
- For classes: document behavior, important attributes, and subclass behavior in
  the docstring following the class statement.
Use Packages to Organize Modules and Provide Stable APIs
- Packages in Python are modules that contain other modules. Packages allow you
  to organize your code into separate, non-conflicting namespaces with unique
  absolute module names.
- Simple packages are defined by adding an __init__.py file to a directory that
  contains other source files. These files become the child modules of the
  directory's package. Package directories may also contain other packages.
- You can provide an explicit API for a module by listing its publicly visible
  names in its __all__ special attribute.
- You can hide a package's internal implementation by only importing public
  names in the package's __init__.py file or by naming internal-only members
  with a leading underscore.
- When collaborating within a single team or on a single codebase, using __all__
  for explicit APIs is probably unnecessary.
- Programs often need to run in multiple deployment environments that each have
  unique assumptions and configurations.
- You can tailor a module's contents to different deployment environments by
  using normal Python statements in module scope.
- Module contents can be the product of any external condition, including host
  introspection through the sys and os modules.
Define a Root Exception to Insulate Callers from APIs
- Root exceptions let callers understand when there's a problem with their usage
  of an API. If callers are using the API properly, they should catch the
  various exceptions that are deliberately raised.
- Root exceptions also help in finding bugs.
- Intermediate root exceptions let you add more specific types of exceptions in
  the future without breaking your API consumers.
- Catching the Python Exception base class can help you find bugs in API
  implementations. A sketch follows.
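A minimal sketch of a module-level root exception hierarchy; the module, class,
and function names are hypothetical:

```python
# my_module.py
class Error(Exception):
    """Base class for all exceptions raised by this module."""

class InvalidDensityError(Error):
    """There was a problem with a provided density value."""

def determine_weight(volume, density):
    if density <= 0:
        raise InvalidDensityError('Density must be positive')
    return volume * density

# Caller code: catching the root Error insulates callers from new, more
# specific subclasses added later; catching Exception would surface API bugs.
try:
    determine_weight(1, 0)
except Error as e:
    print(f'Unexpected invalid inputs: {e}')
```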
Know how to break circular dependencies
- When a module is imported, here's what Python actually does:
  - Searches for the module in locations from sys.path
  - Loads the code from the module and ensures that it compiles
  - Creates a corresponding empty module object
  - Inserts the module into sys.modules
  - Runs the code in the module object to define its contents
- The attributes of a module aren't defined until the code for those attributes
  has executed. But the module can be loaded with the import statement
  immediately after it's inserted into sys.modules.
- Dynamic imports are the simplest solution for breaking a circular dependency
  between modules while minimizing refactoring and complexity.
Consider warnings to Refactor and Migrate Usage
- Using warnings is a programmatic way to inform other programmers that their
  code needs to be modified due to a change to an underlying library that they
  depend on. While exceptions are primarily for automated error handling by
  machines, warnings are all about communication between humans about what to
  expect in their collaboration with each other.
- warnings.warn also supports the stacklevel parameter, which makes it possible
  to report the correct place in the stack as the cause of the warning.
  stacklevel also makes it easy to write functions that can issue warnings on
  behalf of other code, reducing boilerplate. A sketch follows.
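A minimal sketch of warning about a deprecated calling convention; require_int
is a hypothetical function used only for illustration:

```python
import warnings

def require_int(value):
    if not isinstance(value, int):
        warnings.warn('Passing non-int values is deprecated',
                      DeprecationWarning,
                      stacklevel=2)   # blame the caller's line, not this one
        value = int(value)
    return value

warnings.simplefilter('error')        # e.g. turn warnings into exceptions in tests
try:
    require_int(1.5)
except DeprecationWarning as e:
    print(f'Caught: {e}')
```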
Consider Static Analysis via typing to Obviate Bugs – WORK IN PROGRESS
- The benefit of adding type information to a Python program is that you can run
  static analysis tools to ingest a program's source code and identify where
  bugs are most likely to occur. The typing built-in module doesn't actually
  implement any of the type checking functionality itself. It merely provides a
  common library for defining types, including generics, that can be applied to
  Python code and consumed by separate tools.
- The most popular implementations of typing tools are mypy, pytype, pyright,
  and pyre.
- There are many new constructs in this chapter that I have never paid attention
  to. In fact, I have hardly written any code that uses the typing module for
  annotations. I should probably spend some time going over the typing module
  and incorporate it in my daily work.
- A wide variety of other options are available in the typing module. Notably,
  exceptions are not included. Exceptions are not considered part of an
  interface's definition. Thus, if you want to verify that you are raising and
  catching exceptions properly, you need to write tests.
- It's going to slow you down if you try to use type annotations from the start
  when writing a new piece of code. A general strategy is to write a first
  version without annotations, then write tests, and then add type information
  where it's most valuable.
- Type hints are most important at the boundaries of a codebase, such as an API
  you provide that many callers depend on. Type hints complement integration
  tests and warnings to ensure that your API callers aren't surprised or broken
  by your changes.
- It can be useful to apply type hints to the most complex and error-prone parts
  of your code that aren't part of an API.
- If possible, you should include static analysis as part of your automated
  build and test system to ensure that every commit to your codebase is vetted
  for errors. In addition, the configuration used for type checking should be
  maintained in the repository to ensure that all of the people you collaborate
  with are using the same rules.
- As you add type information to your code, it's important to run the type
  checker as you go. Otherwise, you may nearly finish sprinkling type hints
  everywhere and then be hit by a huge wall of errors from the type checking
  tool, which can be disheartening and make you want to abandon type hints
  altogether.
- It's important to note that in many situations, you may not need or want to
  use any type annotations at all. For small programs, ad hoc code, legacy
  codebases, and prototypes, type hints may require far more effort than they
  are worth.
- Python has special syntax and the typing built-in module for annotating
  variables, fields, functions, and methods with type information. A sketch
  follows.
- Static type checkers can leverage type information to help you avoid many
  common bugs that would otherwise happen at runtime.
- There are a variety of best practices for adopting types in your programs,
  using them in APIs, and making sure they don't get in the way of your
  productivity.
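A minimal sketch of the annotation syntax; the functions are illustrative, and a
separate tool such as mypy would perform the actual checking:

```python
from typing import Dict, Optional

def get_total(counts: Dict[str, int], key: str) -> int:
    # A checker like mypy can flag calls that pass the wrong types
    return counts.get(key, 0)

def find_user(name: str) -> Optional[str]:
    # Optional makes it explicit that None is a legitimate return value
    users = {'rk': 'Radha'}
    return users.get(name)
```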
Takeaway
This book is targeted towards intermediate-level Python developers and can be a
useful reference for writing beautiful code. If you write throwaway code most of
the time, then you can probably give this book a pass. However, if you are
writing, or intend to write, code that will be reused by you or others, now or
in the future, this book can be a valuable reference for writing effective code.