The following post contains a summary of the book titled Treading on Python II by Matt Harrison

Programming Styles

  • Python supports three programming paradigms
    • Imperative/Procedural
    • Object Oriented
    • Declarative/Functional

Iterator Protocol

  • iter is a global built-in function that calls the object’s dunder method __iter__
  • Writing a for loop based on iterators
test = [1, 2, 4]
for i in test:
    print(i)

# Equivalent loop written using the iterator protocol
iterator = iter(test)
while True:
    try:
        x = next(iterator)  # Python 3: next() calls iterator.__next__()
        print(x)
    except StopIteration:
        break

  • Each for loop is compiled into bytecode, which the interpreter then executes
  • The actual iterator is not the object that is being iterated. list and string have separate iterator objects to iterate upon them
  • StringIO class implements the iterator protocol
  • Iterator protocol defines the process of iterating the objects in a container utilizing the methods __iter__ and __next__

Iterable vs. Iterator

  • What is an iterable? An iterable is any object that allows iteration
    • This object must implement the __iter__ method and return an iterator object. The iterator can be the same object or a completely different one
    • The iterator it returns, in turn, must implement the __next__ method
  • Iterators are good for one pass over the values. This means that iterators are stateful
  • iter(range(10)) returns a range_iterator object that implements the __iter__ and __next__ methods; range(10) itself is only an iterable
  • A class is called a self-iterator if its __iter__ method returns the same instance on which the dunder method has been invoked
  • Most iterable objects are not self-iterators. They return a different object when their __iter__ method is invoked
  • If the datatype is a self-iterator, then there could be problems in nested loops. Here is a nice example
class Counter(object):
    def __init__(self, size):
        self.size = size
        self.start = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.start < self.size:
            self.start +=1
            return self.start
        raise StopIteration

x = Counter(2)
y = Counter(3)
for i in x:
    for j in y:
        print(i, j)

The above code does not work as desired: since __iter__ returns the same instance, the inner loop is exhausted after the first pass of the outer loop and never repeats. The solution is to make __iter__ return a different object from the one the method was invoked on.
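A minimal sketch of that fix (my own illustrative code, not the book's): have __iter__ hand back a new, independent iterator on every call, so nested loops each get their own traversal.

```python
class Counter:
    """Iterable, but not an iterator: safe to reuse in nested loops."""
    def __init__(self, size):
        self.size = size

    def __iter__(self):
        # Return a fresh iterator each time, so every loop starts over
        return iter(range(1, self.size + 1))

x = Counter(2)
y = Counter(3)
pairs = [(i, j) for i in x for j in y]
# Every (i, j) combination is produced: 2 * 3 = 6 pairs in total
```

Delegating to iter(range(...)) is just one convenient way to produce a fresh iterator; a separate hand-written iterator class would work equally well.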

  • StringIO is a self-iterator and hence allows a single traversal through the data
  • One can modify a class to be an iterable and not an iterator by modifying its __iter__ method to return a fresh instance of the class
  • Self-iterators will exhaust. If that is an issue, make objects that are only iterable, but not iterators themselves
  • I had never thought about iterables and iterators in this much detail until now. This chapter has been super awesome, as it talks about the perils of turning a data instance into a self-iterator
  • One can easily create an object spitting out an infinite sequence using an iterator
  • What did I learn from this chapter?

Iterators are different from iterables. Any object that implements both __iter__ and __next__ is an iterator. An iterable only needs to implement __iter__, which must return an iterator each time it is called. Iterables are what we loop over in for loops, while loops and the like; because __iter__ can hand back a fresh iterator on every call, an iterable can be traversed repeatedly, whereas an iterator is good for only a single pass.
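The distinction can be checked directly with a quick illustrative snippet (my example, not from the book):

```python
nums = [1, 2, 3]

# A list is an iterable, not an iterator: it has __iter__ but no __next__
is_iterable = hasattr(nums, '__iter__')   # True
is_iterator = hasattr(nums, '__next__')   # False

# iter() hands back a distinct list_iterator each time it is called,
# so a list can be traversed repeatedly -- it is not a self-iterator
it1 = iter(nums)
it2 = iter(nums)
distinct = it1 is not it2                 # True

# The iterator has both methods, and its own __iter__ returns itself
self_iterating = iter(it1) is it1         # True
```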

Generators

  • Generators were introduced in Python 2.3
  • Iterators have two problems
    • They must track their state within the iterator instance in order to provide the correct elements across multiple calls
    • If a list is returned, it could potentially consume large amounts of memory
  • What is generator according to Python documentation ?
    • A function which returns an iterator is a generator
    • It looks like a normal function but uses yield statements to produce a series of values, usable in a for loop or retrievable one at a time with next()
    • Each yield suspends processing, remembering the location and execution state
  • The body of a generator is not executed when the generator is created; it runs only when the generator is iterated over
  • The differences between function and generator are
    • Calling a generator function does not execute its body; the body runs only when the resulting generator is iterated over
    • Generators can be iterated over whereas functions cannot be
    • Generators freeze their state after a yield statement
  • Any return statement inside a generator function is treated as raising a StopIteration exception
  • Because generators exhaust, they do not serve well for re-use in nested loops
  • Invoking __iter__ on a generator will return the same generator object instance
  • Generators are self-iterators
  • return in a generator causes the generator to stop and exit
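The points above can be seen in a small generator sketch (illustrative, not from the book):

```python
def count_up(size):
    """Generator function: its body runs only when iterated."""
    n = 0
    while n < size:
        n += 1
        yield n          # suspends here, remembering n and the location
    return               # treated like raising StopIteration

gen = count_up(3)        # nothing in the body executes yet
first = next(gen)        # body runs up to the first yield -> 1
rest = list(gen)         # resumes where it left off -> [2, 3]
exhausted = list(gen)    # generators exhaust -> []
same = iter(gen) is gen  # __iter__ returns the same generator -> True
```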

Object Generators

  • Not only functions, but methods in a class can be generator functions
  • Object generators are reusable if they do not attach state to the instance
  • Every time a generator function (or generator method) is called, Python creates a separate generator object to iterate over
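A sketch of a generator method on a class (my example): because state lives in the method's local variables rather than on the instance, each call produces an independent generator, and reuse in nested loops works.

```python
class Counter:
    def __init__(self, size):
        self.size = size

    def __iter__(self):
        # Generator method: each call creates a new generator object,
        # so the instance can be iterated repeatedly
        n = 0
        while n < self.size:
            n += 1
            yield n

c = Counter(2)
pairs = [(i, j) for i in c for j in c]   # reuse works
# pairs == [(1, 1), (1, 2), (2, 1), (2, 2)]
```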

Generators in Practice

  • The main pattern to look for is accumulation into a list during a loop
  • Generators exhaust; they are not good for reuse
  • A nice way of debugging generators is to convert them into a list
  • pdb will not step in to the generator unless it is actually iterated over
  • Generators do not support indexing or slicing
  • The itertools module contains functions (such as islice) for slicing generators
  • Generators have no inherent length
  • It is possible that Generators are slower than the normal list iteration for smaller chunks of data
  • Generators may consume less memory
  • Generators always evaluate to True. This is one area where generators are not a drop-in replacement for lists
  • The OrderedDict class in the collections.py module in the Standard Library uses a generator in its __iter__ method
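Several of these caveats fit in one sketch (my example): slicing an infinite generator with itertools.islice, and the truthiness and length surprises.

```python
import itertools

def squares():
    """Infinite generator: no inherent length, cannot be indexed."""
    n = 0
    while True:
        n += 1
        yield n * n

gen = squares()
# gen[0:3] would raise TypeError; islice does lazy slicing instead
first_three = list(itertools.islice(gen, 3))   # [1, 4, 9]

# Generators are always truthy, even when they yield nothing,
# and len() does not work on them
empty = (x for x in [])
truthy = bool(empty)     # True -- not a drop-in replacement for a list here
```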

List or Generator

  • If access to items is repeated, it is better to use a list
  • If the data fits in memory, use lists
  • If operations are performed on the whole sequence, a generator is not a good idea
  • A list can be turned into a generator cheaply; a finite generator can be materialized into a list with list(), though doing so exhausts it
  • More Python objects have become lazy since Python 3

Real World Uses

  • Database chunking
  • Recursive generators
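One common shape of a recursive generator (my example, not the book's): flattening arbitrarily nested lists by delegating to the recursive call with yield from.

```python
def flatten(items):
    """Recursively yield leaf values from arbitrarily nested lists."""
    for item in items:
        if isinstance(item, list):
            yield from flatten(item)   # delegate to the recursive generator
        else:
            yield item

flat = list(flatten([1, [2, [3, 4]], 5]))
# flat == [1, 2, 3, 4, 5]
```

Because each level is itself lazy, no intermediate flattened lists are built along the way.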

Functions

  • Functions are first-class citizens. This means that functions can be passed into other functions, returned from other functions and assigned to variables
  • Java (at least before Java 8 introduced lambdas) had no direct way of passing functions around
  • How to check whether an object is callable?
import collections.abc

def foo():
    return 1

callable(foo)                              # True
hasattr(foo, '__call__')                   # True
isinstance(foo, collections.abc.Callable)  # True; the bare collections.Callable alias was removed in Python 3.10
  • Instances of classes in Python have attributes. Functions, being instances of the function type, likewise have attributes
  • __name__ attribute of the function stores the function name
  • __doc__ attribute of the function stores the doc string
  • __defaults__ attribute of the function stores the defaults
  • Default parameter values should use only immutable types; a mutable default is created once and shared across all calls
  • locals and globals will return a mapping of names to objects that their respective namespaces contain
  • Any nested function has read access to globals and built-ins (rebinding a global requires the global keyword)
  • A free variable is any variable that is neither a local variable nor passed as an argument
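The mutable-default caveat above is easiest to see in a short sketch (my example):

```python
def append_bad(item, bucket=[]):
    """The default list is created once and shared across every call."""
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):
    """Idiomatic fix: use an immutable sentinel and build a fresh list."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

a = append_bad(1)
b = append_bad(2)       # surprise: b == [1, 2], the very same list as a
c = append_good(1)      # [1]
d = append_good(2)      # [2], a fresh list each call
```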

Function Parameters

  • Python supports four different types of parameters for functions
    • Normal parameters
    • Keyword parameters
    • Variable parameters(*args)
    • Variable keyword parameters (**kwargs)
  • Arguments captured by a *args parameter arrive inside the function as a tuple
  • The * splat operator can be used to unpack a sequence into the positional arguments of a function that expects normal arguments
  • The ** operator can be used to unpack a dictionary into the keyword arguments of a function
  • The order of parameters in any function should be normal, keyword, variable, variable keyword
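All four parameter kinds, in the required order, in one hypothetical function (my example):

```python
def report(name, sep=': ', *scores, **meta):
    """Normal, keyword (default), variable (*args) and variable
    keyword (**kwargs) parameters, declared in that order."""
    extras = ', '.join(f'{k}={v}' for k, v in sorted(meta.items()))
    return f'{name}{sep}{scores} [{extras}]'

# scores arrives as a tuple; ** unpacks the dict into keyword arguments
line = report('alice', ': ', 90, 95, **{'term': 'fall'})
# line == "alice: (90, 95) [term=fall]"
```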

Closures

  • They are used to keep a common interface, eliminate code duplication and to delay execution of a function
  • They enable generation of functions and conforming to an interface
  • When you define a function with def, Python creates a function object and binds it to a variable with the function's name
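A small sketch of a closure used to generate functions that conform to a common interface (my example):

```python
def make_adder(n):
    # n is a free variable of add: neither local to add nor an argument
    def add(x):
        return x + n
    return add

add3 = make_adder(3)     # each call generates a new function...
add10 = make_adder(10)   # ...that remembers its own n
result = (add3(4), add10(4))
# result == (7, 14)
```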

Decorators

  • A decorator is a method for altering a callable. Closure enables the creation of decorators
  • Python 2.4 introduced syntactic sugar to write decorators for a function
  • A parametrized decorator is useful for customizing its behavior for the specific function it wraps
  • A decorator commonly inspects or alters
    • the function's arguments
    • the function being wrapped
    • the result of the function
  • common instances where decorators are used
    • caching expensive computations
    • retrying a function that might fail
    • logging the amount of time spent in a function
    • timing out a function call
    • access control
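One of those common uses, timing a function, can be sketched as follows (my example; functools.wraps preserves the wrapped function's metadata):

```python
import functools
import time

def timed(func):
    """Decorator that reports how long func took to run."""
    @functools.wraps(func)           # keep func's __name__ and __doc__
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f'{func.__name__} took {elapsed:.6f}s')
        return result
    return wrapper

@timed                               # sugar for: total = timed(total)
def total(n):
    return sum(range(n))

value = total(1000)                  # prints the timing, returns 499500
```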

Alternative Decorator implementations

  • A lambda supports only a single expression in its body, so compound statements cannot appear inside one (a nested lambda, being an expression itself, is actually legal)
  • you can write classes that serve as decorator instances
  • Decorating a decorator looks complicated. I guess unless I use this in the code, I will not be able to internalize the lesson
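A sketch of a class serving as a decorator (my illustrative example): the class implements __call__, so its instances are callable wrappers, and state can live on the instance.

```python
class CountCalls:
    """Class-based decorator: instances wrap the decorated function."""
    def __init__(self, func):
        self.func = func
        self.calls = 0

    def __call__(self, *args, **kwargs):
        self.calls += 1
        return self.func(*args, **kwargs)

@CountCalls                 # greet is now a CountCalls instance
def greet(name):
    return f'hello {name}'

first = greet('ada')
second = greet('guido')
count = greet.calls         # state lives on the decorator instance -> 2
```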

Functional Constructs in Python

  • The biggest drawback to lambda expressions is that they support a single expression in their body
  • lambda expressions are considered pure
  • In Python 2, map returns a list and therefore cannot operate on infinite generators or iterators; in Python 3, map is lazy and can
  • In Python 3, reduce is a part of functools module
  • filter was converted to lazy class in Python 3
  • Tail Call Optimization refers to an optimization that applies to certain recursive functions where they call themselves as the last operation
  • The Python interpreter guards against runaway recursion consuming too much memory by limiting the depth of the call stack. By default the recursion limit is 1000 (see sys.getrecursionlimit)
  • Generator expressions return a generator object, which follows the iterator protocol
  • A generator expression can be iterated only once whereas a list comprehension can be run through multiple times
  • An object that allows for single iteration will have __iter__ and __next__ method
  • An iterable object will have __iter__ method that gives an iterator every time it is called
  • Dictionary comprehensions are not lazy and are evaluated upon creation
  • Set comprehensions are not lazy and are evaluated upon creation
  • operator module is super useful when programming in a functional style or using comprehension constructs
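A few of these points in one sketch (my example): reduce from functools, the operator module replacing a lambda, and the single-pass nature of a generator expression versus a list comprehension.

```python
import functools
import operator

nums = [1, 2, 3, 4]

# operator.add replaces lambda a, b: a + b in a functional style
total = functools.reduce(operator.add, nums)   # 10

# A generator expression is lazy and can be iterated only once...
squares = (n * n for n in nums)
first_pass = list(squares)                     # [1, 4, 9, 16]
second_pass = list(squares)                    # [] -- already exhausted

# ...whereas a list comprehension can be traversed repeatedly
squares_list = [n * n for n in nums]
```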

Takeaway

I had downloaded this book in June 2019 and somehow had never found the time to go through it. Thanks to my Python immersion during this lock-down period, I managed to read the entire book in a day. Had I not spent time working through Python Workout and Tiny Python Projects, I would not have been able to follow the book in its entirety in one day; since I had done the exercises in those two books, I could follow most of the material here. Needless to say, the book has valuable content that helps in understanding iterators, generators, decorators and list comprehensions. A super useful book, and I am glad I managed to read it. This was the HIGHLIGHT I had planned for the day, and I am happy I got it done.