This blog post summarizes the book titled “Python Tricks - A Buffet of Awesome Python Features”
Covering your A** with Assertions
- An assertion error should never be raised unless there is a bug in your program
- In computer programming jargon, a heisenbug is a software bug that seems to disappear or alter its behavior when one attempts to study it. The term is a pun on the name of Werner Heisenberg, the physicist who first asserted the observer effect of quantum mechanics: the act of observing a system inevitably alters its state. In electronics the traditional term is probe effect, where attaching a test probe to a device changes its behavior.
- The `assert` statement can be globally disabled with the `-O` and `-OO` command-line switches, as well as the `PYTHONOPTIMIZE` environment variable
- Never use `assert` statements to validate data
- It is surprisingly easy to write asserts that never fail
- The `assert` statement is a debugging aid that tests a condition as an internal self-check in your program
- Asserts should only be used to help developers identify bugs. They are not a mechanism for handling run-time errors
- `pytest` tells you to write `assert` and the test condition on a single line
- `assert (1 == 2, 'This should fail')` will never fail, because `assert` sees a non-empty two-element tuple, which always evaluates to True
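A quick sketch of how the tuple pitfall bites in practice (the function names here are just illustrative):

```python
def always_passes():
    # BUG: wrapping condition and message in parentheses creates a
    # two-element tuple, and any non-empty tuple is truthy
    assert (1 == 2, 'This should fail')
    return True

def fails_correctly():
    # Correct form: condition and message are two operands of assert
    assert 1 == 2, 'This should fail'

always_passes()  # silently passes despite the impossible condition
```

Recent Python versions emit a `SyntaxWarning` for the buggy form, which is one more reason to keep the condition and message on a single line without parentheses.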
Complacent comma placement
- Multiple adjacent string literals, possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation
- In Python, you can place a comma after every item in a list, dict or set, including the last item
- Smart formatting and comma placement can make your lists easier to maintain
Context managers and the with statement
- The `with` statement makes acquiring and releasing resources a breeze
- The alternative to using a context manager is to write your own `try`/`finally` block
- A context manager is nothing but an object that happens to have the dunder enter and dunder exit methods implemented
- One can use `contextlib` and its `contextmanager` decorator to define a generator-based factory function for a resource that will then automatically support the `with` statement
```python
from contextlib import contextmanager

@contextmanager
def managed_file(name):
    try:
        f = open(name, 'w')
        yield f
    finally:
        f.close()

with managed_file('hello.txt') as f:
    f.write('hello, world!')
    f.write('bye now')
```
- Found an interesting implementation of indentation using the `contextmanager` decorator
```python
from contextlib import contextmanager

@contextmanager
def rkindentor():
    level = 0

    @contextmanager
    def _indenter():
        nonlocal level
        try:
            level += 1
            yield
        finally:
            level -= 1

    def _print(text):
        print('\t' * level + text)

    _indenter.print = _print
    yield _indenter

with rkindentor() as indent:
    print("\n")
    indent.print("radha")
    with indent():
        indent.print("krishna")
        with indent():
            indent.print("pendyala")
    indent.print("hey")
```
Underscores, Dunders and more
- There are five underscore patterns that one must be aware of in Python
- Single leading underscore
- Single trailing underscore
- Double leading underscore
- Double leading and trailing underscore
- Single underscore
- A single leading underscore is an agreed-upon convention that the variable is intended for internal, private use
- If you do a wildcard import, the leading underscore variable and function will not be imported
- If you do a regular import, the leading underscore variable and function will be imported
- Single trailing underscore
- Sometimes the most fitting name is already taken by Python, such as `class` or `print`. The convention is to append a single trailing underscore (e.g. `class_`) to keep using these names
- Double leading underscore
- These names are rewritten by the Python interpreter (e.g. `__var` inside class `Test` becomes `_Test__var`) to protect them from accidental clashes with names defined in subclasses. This rewriting is called name mangling
- Double leading and trailing underscore
- Variables with both a double leading and trailing underscore are not touched by the Python interpreter
- reserved for special use
- A single underscore is meant to convey that the variable is a temporary or throwaway variable
- In an interpreter session, `_` is a special variable that holds the result of the last expression
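The patterns above can be sketched on a small, hypothetical class:

```python
class Widget:
    def __init__(self):
        self._internal = 'single leading: internal use by convention'
        self.__hidden = 'double leading: name-mangled by the interpreter'
        self.class_ = 'single trailing: avoids clashing with a keyword'

w = Widget()
# Name mangling rewrites __hidden to _Widget__hidden, so the attribute
# is still reachable -- just under its mangled name:
print(w._Widget__hidden)
```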
- There are four ways to format strings in Python
- % operator
- There is a `%` operator on strings that can be used to do positional formatting
- If there are multiple substitutions that you need to make, it is better to bunch up all the variables into a dictionary and then use the `%` operator with named placeholders
- Using the `%` operator is called old-style string formatting
- format function - Python 3: the new style is to use the `format` function
- fstrings : Python 3.6 - Formatted string literals
- Behind the scenes the formatted string literals are a Python parser feature that converts f-strings in to a series of string constants and expressions
- Template: one needs to import it from the standard library
```python
from string import Template
Template('Hey $name').substitute(name='Bob')  # -> 'Hey Bob'
```
- Template strings are better from a safety perspective, as they reduce security vulnerabilities in your program
- Rule of thumb
- If the format strings are user-supplied, use `Template` strings
- If you are on Python 3.6+, use formatted string literals
- If you are on an older Python 3, use the `format` function
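The four styles side by side (the name and error number below are just example data):

```python
from string import Template

name, errno = 'Bob', 50159747054

old_style = 'Hey %s, there is a 0x%x error!' % (name, errno)        # % operator
new_style = 'Hey {}, there is a 0x{:x} error!'.format(name, errno)  # str.format
f_string = f'Hey {name}, there is a 0x{errno:x} error!'             # Python 3.6+
template = Template('Hey $name!').substitute(name=name)             # safe for user input

print(old_style == new_style == f_string)  # all three render identically
```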
Python Functions are first-class
- Python attaches a string identifier to every function at creation time that can be accessed by dunder name
- functions can be stored in data structures
- The ability to pass functions around is powerful as it allows to pass around behaviors in your program
- functions can also return functions, i.e. return behaviors
- A closure remembers the values from its enclosing lexical scope even when the program flow is no longer in that scope
- Functions can also preconfigure behaviors
- All functions are objects, but not all objects are functions. An object can be made callable by implementing the dunder call method
Lambdas are Single-Expression functions
- Lambda functions are restricted to single expression. They can’t use annotations or statements
- Executing a lambda executes the single expression and then returns the result of evaluating the expression
- It is better to use list comprehensions and generator expressions than `map` and `filter` operations
The Power of decorators
- Python's decorators allow you to extend and modify the behavior of a callable without permanently modifying the callable itself
- Some of the usecases of decorators
- logging
- enforcing access control and authentication
- instrumentation and timing functions
- rate-limiting
- caching, and more
- decorators are applied from bottom to top
- Decorating functions that take arguments is done with the following template:

```python
def trace(func):
    def wrapper(*args, **kwargs):
        # forward whatever positional and keyword arguments the caller passed
        original_result = func(*args, **kwargs)
        return original_result
    return wrapper
```
- Use `functools.wraps` to carry over the docstring and other metadata of the input function to the decorated wrapper
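A small sketch of why `functools.wraps` matters (the `trace` decorator below is a minimal illustrative example):

```python
import functools

def trace(func):
    @functools.wraps(func)  # copies __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@trace
def greet(name):
    """Return a friendly greeting."""
    return f'Hello, {name}!'

print(greet.__name__)  # -> greet  (without wraps it would be 'wrapper')
print(greet.__doc__)   # -> Return a friendly greeting.
```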
Fun with *args and **kwargs
- They are used to make functions flexible: a function can accept optional positional and keyword arguments
- They also give a function the opportunity to modify the keyword or positional arguments before passing them along to other functions
- `*args` collects extra positional arguments as a tuple; `**kwargs` collects extra keyword arguments as a dictionary
Function Argument Unpacking
- put a * before an iterable - Python will unpack it and pass the elements to the function
- put a ** before a dictionary - Python will unpack it as keyword arguments and pass it along
Nothing to Return here
- Every Python function returns `None` if you do not explicitly specify a return statement
- It is better to communicate the intent of your code by explicitly stating a return statement than avoiding one
- code is communication
Object comparisons
- The `==` operator is used to check equality, whereas the `is` operator is used to check identity
- An `is` expression evaluates to True if two variables point to the same object
- `==` evaluates to True if the objects referred to by the variables are equal
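A short demonstration of the difference:

```python
a = [1, 2, 3]
b = a        # b is just another name bound to the same list object
c = list(a)  # c is a new list with equal contents

print(a == b, a is b)  # True True  -- equal and the very same object
print(a == c, a is c)  # True False -- equal, but two distinct objects
```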
String conversions - Every class needs dunder repr
- The fact that str and such methods start and end with a double underscore is simply a naming convention to flag them as core Python features
- Inspecting an object in a Python interpreter session simply prints the result of its repr method
- When to use str vs repr? Make your repr strings unambiguous and helpful to developers; str is for readable, user-facing output
- If you don't add a str method, Python falls back on the repr method
- In Python 3, there's one data type to hold all kinds of text in the world: `str`
- In Python 2.x, there are two data types: `str`, which holds ASCII text, and `unicode`, which is equivalent to Python 3's `str`
- In Python 2.x, `str` holds bytes whereas `unicode` holds characters
- Always prefer Python 3's `str`
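A minimal class showing the division of labor between the two methods (the `Car` class and its values are illustrative):

```python
class Car:
    def __init__(self, color, mileage):
        self.color = color
        self.mileage = mileage

    def __repr__(self):
        # Unambiguous, developer-facing; ideally valid Python to recreate it
        return f'{self.__class__.__name__}({self.color!r}, {self.mileage!r})'

    def __str__(self):
        # Readable, user-facing
        return f'a {self.color} car'

car = Car('red', 37281)
print(repr(car))  # -> Car('red', 37281)
print(str(car))   # -> a red car
```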
Defining your own Exception classes
- Custom exception classes help downstream applications and developers make sense of errors without having to go through the source code
- One should have custom exception base class for a project and then derive all sorts of exceptions from this base class
- Defining custom exception classes makes it easier for your users to adopt the "easier to ask for forgiveness than permission" (EAFP) coding style that's considered more Pythonic
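A sketch of such a hierarchy (all class and function names here are hypothetical):

```python
class BaseValidationError(ValueError):
    """Project-wide base class that all validation errors derive from."""

class NameTooShortError(BaseValidationError):
    pass

def validate(name):
    if len(name) < 10:
        raise NameTooShortError(name)

try:
    validate('joe')
except BaseValidationError as err:  # catches any validation error in the project
    print(f'invalid input: {err}')
```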
Cloning objects for fun and profit
- Assignment statements in Python don't create copies of objects; they only bind names to an object. For immutable objects, this usually does not matter
- For mutable objects, the usual collection constructors do a shallow copy: they construct a new collection object and populate it with references to the child objects of the original
- One can use `copy.deepcopy` to do deep cloning
- The `copy` module gives you the power to do both shallow and deep copying
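A quick illustration of the difference:

```python
import copy

xs = [[1, 2], [3, 4]]
shallow = list(xs)        # new outer list, but shared inner lists
deep = copy.deepcopy(xs)  # fully independent clone

xs[0].append(99)
print(shallow[0])  # -> [1, 2, 99]  shallow copy sees the mutation
print(deep[0])     # -> [1, 2]      deep copy is unaffected
```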
Abstract base classes keep Inheritance in check
- The `abc` module is useful to enforce that derived classes implement the methods required by the base class
```python
from abc import ABCMeta, abstractmethod

class Base(metaclass=ABCMeta):
    @abstractmethod
    def foo(self):
        pass

    @abstractmethod
    def bar(self):
        pass

class Concrete(Base):
    def foo(self):
        pass
    # bar() is not overridden, so Concrete is still abstract

x = Concrete()  # TypeError: can't instantiate abstract class Concrete
Base()          # TypeError: can't instantiate abstract class Base
```
- Using ABCs can help avoid bugs and make class hierarchies easier to maintain
What are Named Tuples good for ?
- One cannot give names to the individual elements of a regular built-in tuple
- Namedtuples are useful to give names to the elements of a tuple
- They were added in Python 2.6 as part of the `collections` module
- Namedtuples can be thought of as a memory-efficient shortcut to defining an immutable class in Python
- Namedtuples can help clean up your code by enforcing an easier-to-understand structure on your data
- The `_fields` attribute is used to access the field names of a namedtuple
- Namedtuples provide a few useful helper methods that all start with a single underscore, but are part of the public interface. It’s okay to use them.
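A short example (the `Car` fields and values are example data):

```python
from collections import namedtuple

Car = namedtuple('Car', ['color', 'mileage'])
my_car = Car('red', 3812.4)

print(my_car.color)    # fields are accessible by name
print(my_car._fields)  # ('color', 'mileage') -- public despite the underscore
blue_car = my_car._replace(color='blue')  # helper that returns a modified copy
print(blue_car)
```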
Class vs Instance variable pitfalls
- There are two kinds of data attributes on Python objects: class variables and instance variables
- You can access class variables through either the instance or the class
- modifying a class variable on the class namespace affects all instances of the class
- class variable can be shadowed by instance variables
Instance, Class and Static Methods Demystified
- A class method takes a `cls` parameter and hence can modify class state but not object instance state
- A static method takes neither `self` nor `cls` and hence can modify neither object nor class state
- Python allows only one `__init__` method per class; by using `@classmethod` you can create as many alternative constructors as you want
- Put differently, using static methods and class methods are ways to communicate developer intent while enforcing that intent enough to avoid most “slip of the mind” mistakes and bugs that would break the design.
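A sketch along the lines of the factory-method idea (the ingredient lists are example data):

```python
class Pizza:
    def __init__(self, ingredients):
        self.ingredients = ingredients

    # Alternative constructors: __init__ can only exist once, so named
    # @classmethod factories express each variant clearly.
    @classmethod
    def margherita(cls):
        return cls(['mozzarella', 'tomatoes'])

    @classmethod
    def prosciutto(cls):
        return cls(['mozzarella', 'tomatoes', 'ham'])

    @staticmethod
    def circle_area(radius):
        # No access to cls or self -- a plain helper namespaced on the class
        return 3.14159 * radius ** 2

print(Pizza.margherita().ingredients)
```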
Dictionary, Maps and Hash tables
- A hashable object is one whose hash value never changes during its lifetime
- `collections.OrderedDict` preserves the order in which keys were inserted
- `collections.defaultdict` accepts a callable in its constructor whose return value will be used if a requested key cannot be found
- `collections.ChainMap` groups multiple dictionaries into a single mapping
- `types.MappingProxyType` is a wrapper around a standard dictionary that provides a read-only view
Array Data Structure
- Arrays are contiguous data structures
- A restricted parking lot corresponds to a TypedArray
- Python lists are implemented as DynamicArrays
- Python tuple sizes are decided at the time of initialization. They are immutable and hence the data is tightly packed
- Python's `array` module provides space-efficient storage of basic C-style data types like bytes, 32-bit integers, and floating-point numbers
- Arrays created with the `array` module are typed arrays
- Python 3.x uses `str` objects to store textual data as immutable sequences of Unicode characters
- Strings are recursive data structures: each character of a string is itself a `str` of length one
- `bytes` objects are immutable sequences of single bytes
- `bytearray` objects are mutable sequences of integers in the range 0 to 255; they are closely related to `bytes` objects
Records, Structs and Data Transfer Objects
- `dict` is an associative array; dicts are mutable and offer no protection against wrong field names
- `tuple` is immutable but offers no protection against missing fields or wrong ordering
- A custom class is another option
- `collections.namedtuple`
- `typing.NamedTuple` is similar to `collections.namedtuple` but with support for type hints
- The `struct.Struct` class converts between Python values and C structs
- `types.SimpleNamespace` is a glorified dictionary
- If you’re looking for a safe default choice, my general recommendation for implementing a plain record, struct, or data object in Python would be to use collections.namedtuple in Python 2.x and its younger sibling, typing.NamedTuple in Python 3.
Sets and Multiset
- `set`, `frozenset`, and `collections.Counter` are covered in this chapter
- `frozenset` objects are hashable and can act as dictionary keys
- `Counter` implements a multiset (bag) type
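A quick multiset sketch with `Counter` (the inventory items are example data):

```python
from collections import Counter

inventory = Counter()
inventory.update({'sword': 1, 'bread': 3})
inventory.update({'sword': 1, 'apple': 2})

print(inventory['sword'])       # -> 2  counts accumulate like a multiset
print(sum(inventory.values()))  # -> 7  total number of elements in the bag
```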
Stacks
- stack is LIFO
- queue is FIFO
- To get the amortized O(1) performance for inserts and deletes, new items must be added to the end of the list with the append() method and removed again from the end using pop(). For optimum performance, stacks based on Python lists should grow towards higher indexes and shrink towards lower ones.
- `list` can be used as a simple built-in stack
- `collections.deque` implements a double-ended queue that supports adding and removing elements from either end
- `queue.LifoQueue` provides a LIFO queue with locking semantics for parallel computing
- list is backed by a dynamic array which makes it great for fast random access, but requires occasional resizing when elements are added or removed. The list over-allocates its backing storage so that not every push or pop requires resizing, and you get an amortized O(1) time complexity for these operations. But you do need to be careful to only insert and remove items “from the right side” using append() and pop(). Otherwise, performance slows down to O(n).
- collections.deque is backed by a doubly-linked list which optimizes appends and deletes at both ends and provides consistent O(1) performance for these operations. Not only is its performance more stable, the deque class is also easier to use because you don’t have to worry about adding or removing items from “the wrong end.”
Queues
- `list` is a terribly slow queue, because removing an element from the front is O(n)
- `collections.deque` can act as a queue, since it gives O(1) performance for adding or removing elements at either end; random access, however, is O(n)
- `queue.Queue` provides locking semantics for parallel computing
- `multiprocessing.Queue` allows shared job queues across processes
- If you’re not looking for parallel processing support, the implementation offered by collections.deque is an excellent default choice for implementing a FIFO queue data structure in Python. It provides the performance characteristics you’d expect from a good queue implementation and can also be used as a stack (LIFO Queue).
Priority Queues
- One can use several alternatives in Python to get a Priority queue implementation.
- `list` can be used as a priority queue: one can add elements and sort the list manually so that elements stay in priority order
- The `heapq` module is another alternative; it provides a binary-heap implementation on top of a regular list
- `queue.PriorityQueue` is yet another alternative if you are looking for synchronization and locking semantics
- `queue.PriorityQueue` stands out from the pack with a nice object-oriented interface and a name that clearly states its intent; it should be your preferred choice
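A minimal `heapq` sketch; entries are `(priority, task)` tuples (example data) so the heap orders by priority:

```python
import heapq

q = []
heapq.heappush(q, (2, 'code'))
heapq.heappush(q, (1, 'eat'))
heapq.heappush(q, (3, 'sleep'))

ordered = []
while q:
    ordered.append(heapq.heappop(q))  # pops the lowest priority number first
print(ordered)  # -> [(1, 'eat'), (2, 'code'), (3, 'sleep')]
```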
Writing Pythonic Loops
- Avoid the `range(len(...))` pattern when iterating over a list, set, or other built-in structure
- If you want the index, use `enumerate`
- If you are iterating over a Python data structure, check whether the object itself has methods useful for iterating over it
- Avoid managing loop indexes and stop conditions manually if possible
- Python's for loops are actually "for-each" loops that directly iterate over the items of a container
Comprehending Comprehensions
- They are a key feature in Python
- They are just fancy syntactic sugar for a simple `for` loop
- Don't use list, dict, or set comprehensions nested more than one level deep
List Slicing tricks and the Sushi operator
- In Python 3, you can use `list.clear()` to empty a list
- One can use slice assignment (`lst[:] = ...`) to replace all elements of a list without creating a new list object
- `lst[:]` (or `lst[::]`) creates a shallow copy of a list
Beautiful Iterators
- The `for ... in` loop is syntactic sugar: it first calls the container's `__iter__` method, which returns an iterator object
- The loop then repeatedly calls the `__next__` method of that iterator
- If you’ve ever worked with database cursors, this mental model will seem familiar: We first initialize the cursor and prepare it for reading, and then we can fetch data from it into local variables as needed, one element at a time.
- `iter(x)` invokes the dunder iter method
- If you invoke the `__next__` method after you have exhausted the iterator, it raises a `StopIteration` exception
- To support iteration, an object needs to implement the dunder iter and dunder next methods
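The protocol can be sketched with a small illustrative class:

```python
class BoundedRepeater:
    """Repeats a value a fixed number of times -- a minimal iterator."""
    def __init__(self, value, max_repeats):
        self.value = value
        self.max_repeats = max_repeats
        self.count = 0

    def __iter__(self):
        # the object acts as its own iterator
        return self

    def __next__(self):
        if self.count >= self.max_repeats:
            raise StopIteration  # tells the for-loop to stop
        self.count += 1
        return self.value

print(list(BoundedRepeater('Hello', 3)))  # -> ['Hello', 'Hello', 'Hello']
```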
Generator Expressions
- Once a generator expression has been consumed it cannot be reused. Hence in that sense a class based or method based generators have the added flexibility
- They look similar to list comprehensions but do not construct list objects. Instead, they generate values "just in time"
- They are best for implementing simple ad-hoc iterators
Iterator Chains
- You can chain iterators so that the output of one iterator is fed into the next
- Data processing happens one element at a time
```python
integers = range(8)
squared = (i * i for i in integers)
negated = (-i for i in squared)
list(negated)  # [0, -1, -4, -9, -16, -25, -36, -49]
```
- One can keep extending the chain of generators to build out a processing pipeline with many steps. It would still perform efficiently and could easily be modified because each step in the chain is an individual generator function
- It can impact readability though
Dictionary Default Values
- Avoid explicit `key in dict` checks when testing for membership; `dict.get()` with a default value is usually cleaner
- `collections.defaultdict` could be a better alternative
Sorting Dictionaries for Fun and Profit
- One can use `operator.itemgetter` and `operator.attrgetter` for the `key` argument of `sorted()`
Emulating Switch-case with dicts
- One can use dictionary keys as conditions and push all the logic for each case as a lambda function or generic function as value for these keys
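A minimal dispatch sketch (the operator names and operands are example data):

```python
def dispatch_dict(operator, x, y):
    # Each case maps to a lambda; .get supplies the "default" branch
    return {
        'add': lambda: x + y,
        'sub': lambda: x - y,
        'mul': lambda: x * y,
    }.get(operator, lambda: None)()

print(dispatch_dict('mul', 2, 8))      # -> 16
print(dispatch_dict('unknown', 2, 8))  # -> None
```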
Craziest Dict expression in the West
```python
>>> {True: 'yes', 1: 'no', 1.0: 'maybe'}
{True: 'maybe'}
```

- The expression evaluates to `{True: 'maybe'}` because Python treats `bool` as a subclass of `int`: `True`, `1`, and `1.0` all compare equal and share the same hash, so each later value overwrites the previous one while the original key object (`True`) is kept
So many ways to merge dictionaries
- In Python 3.5 and above, one can use the `**` operator to merge multiple dictionaries
```python
>>> x = {'a': 121}
>>> y = {'b': 2121}
>>> z = {**x, **y}
>>> z
{'a': 121, 'b': 2121}
```
- To stay compatible with older versions, you can use the `update` method
Dictionary pretty printing
- A disadvantage of using `json.dumps` is that it cannot stringify complex objects
- An alternative to `json.dumps` is the `pprint.pprint` function
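A quick comparison (the mapping is example data):

```python
import json
import pprint

mapping = {'a': 23, 'b': 42, 'c': 0xc0ffee}

# json.dumps gives clean indentation, but only for JSON-serializable values
print(json.dumps(mapping, indent=4, sort_keys=True))

# pprint also copes with values json.dumps cannot stringify, e.g. sets
mapping['d'] = {1, 2, 3}
pprint.pprint(mapping)
```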
Exploring Python Modules and Objects
- Use `dir` and `help` to explore modules and objects
Isolated Project dependencies with virtual env
- Virtual environments keep your project dependencies separated. They help you avoid version conflicts between packages and different versions of the Python runtime.
- As a best practice, all of your Python projects should use virtual environments to store their dependencies. This will help avoid headaches.
Peeking beyond the Bytecode curtain
- CPython executes programs by first translating them into intermediate bytecode and then running that bytecode on a stack-based virtual machine
- You can use the built-in `dis` module to peek behind the scenes and inspect the bytecode
- CPython is a VM - Virtual Machine. VM’s are everywhere on the cloud. It pays to read up on them