Python - Generator Functions and Expressions
In computer science, a generator is a special routine that can be used to control the iteration behavior of a loop.
A generator is very similar to a function that returns an array, in that a generator has parameters, can be called, and generates a sequence of values. However, instead of building an array containing all the values and returning them all at once, a generator yields the values one at a time, which requires less memory and allows the caller to get started processing the first few values immediately. In short, a generator looks like a function but behaves like an iterator.
Python provides tools that produce results only when needed:
- Generator functions
They are coded as normal def statements but use yield to return results one at a time, suspending and resuming their state between each.
- Generator expressions
These are similar to list comprehensions, but they return an object that produces results on demand instead of building a result list.
Because neither of them constructs a result list all at once, they save memory space and allow computation time to be split by implementing the iteration protocol.
We can write functions that send back a value and can later be resumed, picking up where they left off. Such functions are called generator functions because they generate a sequence of values over time.
Generator functions are not much different from normal functions, and they are also coded with def statements. When created, however, they are automatically made to implement the iteration protocol so that they can appear in iteration contexts.
Normal functions return a value and then exit. But generator functions automatically suspend and resume their execution. Because of that, they are often a useful alternative to both computing an entire series of values up front and manually saving and restoring state in classes. Because the state that generator functions retain when they are suspended includes their local scope, their local variables retain information and make it available when the functions are resumed.
The primary difference between generator functions and normal functions is that a generator yields a value rather than returning one. The yield suspends the function and sends a value back to the caller while retaining enough state to let the function resume immediately after the last yield that ran. This allows the generator function to produce a series of values over time rather than computing them all at once and sending them back in a list.
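As a minimal sketch of how local state survives a suspension, consider a generator that keeps a running sum. The name running_total is a hypothetical example and does not appear elsewhere in this tutorial:

```python
def running_total(numbers):
    """Yield the cumulative sum after each number (hypothetical example)."""
    total = 0                 # local variable, kept alive between yields
    for number in numbers:
        total += number
        yield total           # suspend here; total is preserved


totals = running_total([10, 20, 30])
first = next(totals)     # runs the body up to the first yield
second = next(totals)    # resumes after the yield; total is still 10
third = next(totals)
print(first, second, third)   # 10 30 60
```

Each call to next() continues from where the previous one stopped, so total never has to be stored anywhere outside the function.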
Generators are closely bound up with the iteration protocol. Iterator objects define a __next__() method which either returns the next item in the iteration or raises the special StopIteration exception to end the iteration. An object's iterator is fetched with the iter built-in function.
The for loops use this iteration protocol to step through a sequence or value generator if the protocol is supported. Otherwise, iteration falls back on repeatedly indexing sequences.
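The protocol can be observed on an ordinary list. This is only a sketch, and the variable names are chosen for illustration: the list itself is iterable, while the object returned by iter() is the iterator that carries __next__():

```python
numbers = [1, 2, 3]

# The list is iterable (it has __iter__) but is not itself an iterator.
list_is_iterable = hasattr(numbers, '__iter__')
list_has_next = hasattr(numbers, '__next__')

# iter() fetches the iterator, which does provide __next__().
iterator = iter(numbers)
iterator_has_next = hasattr(iterator, '__next__')

first = next(iterator)
second = next(iterator)
third = next(iterator)

# One more call signals the end of the iteration with StopIteration.
try:
    next(iterator)
    ended = False
except StopIteration:
    ended = True

print(list_is_iterable, list_has_next, iterator_has_next, ended)
```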
To support this protocol, functions containing a yield statement are compiled specially as generators. They return a generator object when they are called. The returned object supports the iteration interface with an automatically created __next__() method to resume execution. Generator functions may also have a return statement, which simply terminates the generation of values by raising a StopIteration exception, just as any normal function exit does.
The net effect is that generator functions, coded as def statements containing a yield statement, are automatically made to support the iteration protocol and thus may be used in any iteration context to produce results over time and on demand.
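We can check this special treatment with the standard inspect module. The sketch below uses a hypothetical generator function named cubes:

```python
import inspect


def cubes(n):
    """Hypothetical generator function used only for this check."""
    for i in range(n):
        yield i ** 3


# The def is recognized as a generator function because it contains yield.
is_gen_function = inspect.isgeneratorfunction(cubes)

# Calling it runs none of the body; it just returns a generator object.
gen = cubes(3)
is_gen_object = inspect.isgenerator(gen)

# The generator object supports the iteration protocol.
has_protocol = hasattr(gen, '__iter__') and hasattr(gen, '__next__')

values = [gen.__next__(), gen.__next__(), gen.__next__()]
print(is_gen_function, is_gen_object, has_protocol, values)
```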
Let's look at the interactive example below:
>>> def create_counter(n):
        print('create_counter()')
        while True:
            yield n
            print('increment n')
            n += 1

>>> c = create_counter(2)
>>> c
<generator object create_counter at 0x03004B48>
>>> next(c)
create_counter()
2
>>> next(c)
increment n
3
>>> next(c)
increment n
4
Here are the things happening in the code:
- The presence of the yield keyword in create_counter() means that this is not a normal function. It is a special kind of function which generates values one at a time. We can think of it as a resumable function. Calling it will return a generator that can be used to generate successive values of n.
- To create an instance of the create_counter() generator, just call it like any other function. Note that this does not actually execute the function code. We can tell this because the first line of the create_counter() function calls print(), but nothing was printed from the line:
>>> c = create_counter(2)
- The create_counter() function returns a generator object.
- The next() function takes a generator object and returns its next value. The first time we call next() with the counter generator, it executes the code in create_counter() up to the first yield statement, then returns the value that was yielded. In this case, that will be 2, because we originally created the generator by calling create_counter(2).
- Repeatedly calling next() with the same generator object resumes exactly where it left off and continues until it hits the next yield statement. All variables, local state, etc. are saved on yield and restored on next(). The next line of code waiting to be executed calls print(), which prints increment n. After that, it executes the statement n += 1. Then it goes through the while loop again, and the first thing it hits is the statement yield n, which saves the state of everything and returns the current value of n (now 3).
- The third time we call next(c), we do all the same things again, but this time n is now 4.
- Since create_counter() sets up an infinite loop, we could theoretically do this forever, and it would just keep incrementing n and spitting out values.
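In practice we usually want only a finite slice of such an endless stream. One possible approach, sketched here with a quieter copy of create_counter() (the print() calls are left out), is itertools.islice from the standard library:

```python
from itertools import islice


def create_counter(n):
    """Same counter as above, without the print() calls."""
    while True:
        yield n
        n += 1


# islice stops asking for values after five items, so the
# infinite loop inside the generator is never a problem.
first_five = list(islice(create_counter(2), 5))
print(first_five)   # [2, 3, 4, 5, 6]
```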
The generator function in the following example generates the cubes of numbers over time:
>>> def cubic_generator(n):
        for i in range(n):
            yield i ** 3
The function yields a value and so returns to its caller each time through the loop. When it is resumed, its prior state is restored and control picks up again after the yield statement. When it's used in a for loop, control returns to the function after its yield statement each time through the loop:
>>> for i in cubic_generator(5):
        print(i, end=' : ')     # Python 3.0
        #print i,               # Python 2.x

0 : 1 : 8 : 27 : 64 :
If we use return instead of yield, the result is:
>>> def cubic_generator(n):
        for i in range(n):
            return i ** 3

>>> for i in cubic_generator(5):
        print(i, end=' : ')     # Python 3.0

Traceback (most recent call last):
  File "", line 1, in
    for i in cubic_generator(5):
TypeError: 'int' object is not iterable
Here is an example of using a generator and yield.
>>> # Fibonacci version 1
>>> def fibonacci():
        Limit = 10
        count = 0
        a, b = 0, 1
        while True:
            yield a
            a, b = b, a + b
            if (count == Limit):
                break
            count += 1

>>> for n in fibonacci():
        print(n, end=' ')

0 1 1 2 3 5 8 13 21 34 55
Because generators preserve their local state between invocations, they're particularly well suited for complicated, stateful iterators, such as one that produces the Fibonacci numbers.
Here is another version of Fibonacci:
>>> # Fibonacci version 2
>>> def fibonacci(max):
        a, b = 0, 1            # (1)
        while a < max:
            yield a            # (2)
            a, b = b, a + b    # (3)
Simple summary for this version:
- The Fibonacci sequence starts with 0 and 1, goes up slowly at first, then more and more rapidly. To start the sequence, we need two variables: a starts at 0, and b starts at 1.
- a is the current number in the sequence, so yield it.
- b is the next number in the sequence, so assign that to a, but also calculate the next value a + b and assign that to b for later use. Note that this happens in parallel; if a is 3 and b is 5, then a, b = b, a + b will set a to 5 (the previous value of b) and b to 8 (the sum of the previous values of a and b).
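The parallel assignment matters. As a small sketch using the same numbers as the note above, compare it with two separate assignment statements:

```python
# Parallel (tuple) assignment: the right-hand side is evaluated
# completely before either name is rebound.
a, b = 3, 5
a, b = b, a + b
parallel = (a, b)      # (5, 8)

# Two separate statements give a different result, because the
# second statement already sees the new value of a.
a, b = 3, 5
a = b                  # a is now 5
b = a + b              # 5 + 5, not 3 + 5
sequential = (a, b)    # (5, 10)

print(parallel, sequential)
```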
>>> for n in fibonacci(500):
        print(n, end=' ')

0 1 1 2 3 5 8 13 21 34 55 89 144 233 377
>>> list(fibonacci(500))
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377]
As we can see from the output, we can use a generator like fibonacci() in a for loop directly. The for loop automatically calls the next() function to get values from the fibonacci() generator and assigns them to the for loop index variable (n). Each time through the for loop, n gets a new value from the yield statement in fibonacci(), and all we have to do is print it out. Once fibonacci() runs out of numbers (a reaches or exceeds max, which in this case is 500), the for loop exits gracefully.
This is a useful idiom: pass a generator to the list() function, and it will iterate through the entire generator (just like the for loop in the previous example) and return a list of all the values.
To end the generation of values, functions use either a return with no value or simply allow control to fall off the end of the function body.
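A bare return is handy when the generator should stop early. The following is a minimal sketch; read_until_blank is a hypothetical helper, not something defined in this tutorial:

```python
def read_until_blank(lines):
    """Yield lines until the first empty one (hypothetical helper)."""
    for line in lines:
        if line == '':
            return            # bare return: ends the generation early
        yield line
    # Falling off the end of the body ends the generation as well.


result = list(read_until_blank(['alpha', 'beta', '', 'gamma']))
print(result)   # ['alpha', 'beta']
```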
To see what's happening inside the for, we can call the generator function directly:
>>> x = cubic_generator(5)
>>> x
<generator object cubic_generator at 0x000000000315F678>
We got back a generator object that supports the iteration protocol. The next(iterator) built-in calls an object's __next__() method:
>>> next(x)
0
>>> next(x)
1
>>> next(x)
8
>>> next(x)
27
>>> next(x)
64
>>> next(x)
Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    next(x)
StopIteration
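Put together, a for loop over a generator behaves roughly like the hand-written loop below. This is a sketch of the idea rather than the interpreter's actual implementation:

```python
def cubic_generator(n):
    for i in range(n):
        yield i ** 3


collected = []
iterator = iter(cubic_generator(5))   # what the for loop does first
while True:
    try:
        value = next(iterator)        # fetch the next yielded value
    except StopIteration:
        break                         # the generator is finished
    collected.append(value)           # stands in for the loop body

print(collected)   # [0, 1, 8, 27, 64]
```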
We could have built the list of yielded values all at once:
>>> def cubic_builder(n):
        result = []
        for i in range(n):
            result.append(i ** 3)
        return result

>>> for x in cubic_builder(5):
        print(x, end=' : ')

0 : 1 : 8 : 27 : 64 :
Or:
>>> for x in [n ** 3 for n in range(5)]:
        print(x, end=' : ')

0 : 1 : 8 : 27 : 64 :
>>> for x in map((lambda n: n ** 3), range(5)):
        print(x, end=' : ')

0 : 1 : 8 : 27 : 64 :
As we've seen, we could have had the same result using other approaches. However, generators can be better in terms of memory usage and performance. They allow functions to avoid doing all the work up front. This is especially useful when the resulting lists are huge or when it takes a lot of computation to produce each value. A generator distributes the time required to produce the series of values among loop iterations.
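To see that the work really is spread across iterations, we can record which inputs get processed. In this sketch, slow_cube and lazy_cubes are hypothetical names, and slow_cube stands in for an expensive computation:

```python
computed = []   # records which inputs were actually processed


def slow_cube(i):
    """Stand-in for an expensive computation (hypothetical)."""
    computed.append(i)
    return i ** 3


def lazy_cubes(n):
    for i in range(n):
        yield slow_cube(i)


first_big = None
for cube in lazy_cubes(1000000):
    if cube > 50:
        first_big = cube
        break            # stop early; remaining cubes are never computed

print(first_big, computed)   # 64 [0, 1, 2, 3, 4]
```

A list-building version would have computed all one million cubes before the loop could look at the first one.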
As a more advanced usage example, generators can provide a simpler alternative to manually saving the state between iterations in class objects. With generators, variables accessible in the function's scopes are saved and restored automatically.
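For comparison, here is a sketch of the same cube series written both ways. The class name CubesIterator and the function name cubes are hypothetical:

```python
class CubesIterator:
    """Class-based iterator: the state is saved by hand in attributes."""

    def __init__(self, n):
        self.n = n
        self.i = 0          # position must be stored explicitly

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        value = self.i ** 3
        self.i += 1
        return value


def cubes(n):
    """Generator version: the loop variable i is saved automatically."""
    for i in range(n):
        yield i ** 3


from_class = list(CubesIterator(5))
from_generator = list(cubes(5))
print(from_class == from_generator)   # True
```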
The notions of iterators and list comprehensions have been combined in a new feature, generator expressions. Generator expressions are similar to list comprehensions, but they are enclosed in parentheses instead of square brackets:
>>> # List comprehension makes a list
>>> [x ** 3 for x in range(5)]
[0, 1, 8, 27, 64]
>>> # Generator expression makes an iterable
>>> (x ** 3 for x in range(5))
<generator object <genexpr> at 0x000000000315F678>
Actually, coding a list comprehension is essentially the same as wrapping a generator expression in a list built-in call to force it to produce all its results in a list at once:
>>> list(x ** 3 for x in range(5))
[0, 1, 8, 27, 64]
But in terms of operation, generator expressions are very different. Instead of building the result list in memory, they return a generator object. The returned object supports the iteration protocol to yield one piece of the result list at a time in any iteration context:
>>> Generator = (x ** 3 for x in range(5))
>>> next(Generator)
0
>>> next(Generator)
1
>>> next(Generator)
8
>>> next(Generator)
27
>>> next(Generator)
64
>>> next(Generator)
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    next(Generator)
StopIteration
Typically, we don't see the next() iterator machinery under the hood of a generator expression like this because for loops trigger next() for us automatically:
>>> for n in (x ** 3 for x in range(5)):
        print('%s, %s' % (n, n * n))

0, 0
1, 1
8, 64
27, 729
64, 4096
Parentheses are not required around a generator expression if it is the sole item enclosed in other parentheses, such as those of a function call. However, there are cases when extra parentheses are required, as in the second sorted() call in the example below:
>>> sum(x ** 3 for x in range(5))
100
>>> sorted(x ** 3 for x in range(5))
[0, 1, 8, 27, 64]
>>> sorted((x ** 3 for x in range(5)), reverse=True)
[64, 27, 8, 1, 0]
>>> import math
>>> list( map(math.sqrt, (x ** 3 for x in range(5))) )
[0.0, 1.0, 2.8284271247461903, 5.196152422706632, 8.0]
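We can confirm the rule by asking Python to compile the version without the extra parentheses. This sketch uses the built-in compile() so that the syntax error can be caught at run time; the file name '<example>' is just a placeholder:

```python
# With the extra parentheses, the call is accepted.
accepted = sorted((x ** 3 for x in range(5)), reverse=True)

# Without them, the generator expression is no longer the sole
# argument, and Python refuses to compile the call.
source = "sorted(x ** 3 for x in range(5), reverse=True)"
try:
    compile(source, '<example>', 'eval')
    rejected = False
except SyntaxError:
    rejected = True

print(accepted, rejected)
```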
Generator expressions are a memory-space optimization. They do not require the entire result list to be constructed all at once while the square-bracketed list comprehension does. They may also run slightly slower in practice, so they are probably best used only for very large result sets.
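A rough way to see the memory difference is sys.getsizeof. The sketch below makes no claim about exact byte counts, which vary between Python versions and platforms, and getsizeof measures only the container object itself, not the integers it refers to:

```python
import sys

n = 100000
as_list = [x ** 3 for x in range(n)]    # all results exist right now
as_gen = (x ** 3 for x in range(n))     # nothing has been computed yet

list_size = sys.getsizeof(as_list)      # grows with n
gen_size = sys.getsizeof(as_gen)        # small and independent of n

print(list_size > gen_size)             # True

# Both still produce the same values when consumed.
total_matches = sum(as_list) == sum(as_gen)
print(total_matches)                    # True
```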
The same iteration can be coded with either a generator function or a generator expression. Let's look at the following example which repeats each character in a string five times:
>>> G = (c * 5 for c in 'Python')
>>> list(G)
['PPPPP', 'yyyyy', 'ttttt', 'hhhhh', 'ooooo', 'nnnnn']
The equivalent generator function requires a little more code, but as a multistatement function it can hold more logic and use more state information if needed:
>>> def repeat5times(x):
        for c in x:
            yield c * 5

>>> G = repeat5times('Python')
>>> list(G)
['PPPPP', 'yyyyy', 'ttttt', 'hhhhh', 'ooooo', 'nnnnn']
Both expressions and functions support automatic and manual iteration. The list() call in the above example iterated automatically. The following examples iterate manually:
>>> G = (c * 5 for c in 'Python')
>>> I = iter(G)
>>> next(I)
'PPPPP'
>>> next(I)
'yyyyy'
>>> G = repeat5times('Python')
>>> I = iter(G)
>>> next(I)
'PPPPP'
>>> next(I)
'yyyyy'
Note that we make new generators here to iterate again. Generators are one-shot iterators.
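The one-shot behavior is easy to demonstrate. In this sketch, make_generator is a hypothetical helper that simply builds a fresh generator each time it is called:

```python
G = (c * 5 for c in 'Python')
first_pass = list(G)     # consumes every item
second_pass = list(G)    # nothing is left, so this is empty


def make_generator():
    """Hypothetical helper: returns a brand-new generator each call."""
    return (c * 5 for c in 'Python')


fresh = list(make_generator())

print(first_pass)
print(second_pass)       # []
print(fresh == first_pass)
```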
Both generator functions and generator expressions are their own iterators, so they support just one active iteration. We can't have multiple independent iterators over the same generator. For a generator expression like the one in the previous example, the generator's iterator is the generator itself:
>>> G = (c * 5 for c in 'Python')
>>> # My iterator is myself: G has __next__() method
>>> iter(G) is G
True
If we iterate over the results stream manually with multiple iterators, they will all point to the same position:
>>> G = (c * 5 for c in 'Python')
>>> # Iterate manually
>>> I1 = iter(G)
>>> next(I1)
'PPPPP'
>>> next(I1)
'yyyyy'
>>> I2 = iter(G)
>>> next(I2)
'ttttt'
Once any iteration runs to completion, all are exhausted. We have to make a new generator to start again:
>>> # Collect the rest of I1's items
>>> list(I1)
['hhhhh', 'ooooo', 'nnnnn']
>>> # Other iterators exhausted too
>>> next(I2)
Traceback (most recent call last):
  File "<pyshell#45>", line 1, in <module>
    next(I2)
StopIteration
>>> # Same for new iterators
>>> I3 = iter(G)
>>> next(I3)
Traceback (most recent call last):
  File "<pyshell#47>", line 1, in <module>
    next(I3)
StopIteration
>>> # New generator to start over
>>> I3 = iter(c * 5 for c in 'Python')
>>> next(I3)
'PPPPP'
The same applies to generator functions:
>>> def repeat5times(x):
        for c in x:
            yield c * 5

>>> # Generator functions work the same way
>>> G = repeat5times('Python')
>>> iter(G) is G
True
>>> I1, I2 = iter(G), iter(G)
>>> next(I1)
'PPPPP'
>>> next(I1)
'yyyyy'
>>> # I2 at same position as I1
>>> next(I2)
'ttttt'
This is different from the behavior of some built-in types. Built-in types such as lists support multiple independent iterators and multiple passes, and they reflect in-place changes in their active iterators:
>>> L = [1, 2, 3, 4]
>>> I1, I2 = iter(L), iter(L)
>>> next(I1)
1
>>> next(I1)
2
>>> # Lists support multiple iterators
>>> next(I2)
1
>>> # Changes reflected in iterators
>>> del L[2:]
>>> next(I1)
Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    next(I1)
StopIteration
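If we need list-like behavior with independent iterators, one possible approach is to wrap the generator logic in a class whose __iter__() method is itself a generator function. This is only a sketch, and the class name Repeat5Times is hypothetical:

```python
class Repeat5Times:
    """Iterable wrapper: each iter() call starts a new generator."""

    def __init__(self, text):
        self.text = text

    def __iter__(self):
        # Because this method contains yield, every call returns a
        # fresh generator object with its own position.
        for c in self.text:
            yield c * 5


R = Repeat5Times('Python')
I1, I2 = iter(R), iter(R)
a = next(I1)      # 'PPPPP'
b = next(I1)      # 'yyyyy'
c = next(I2)      # 'PPPPP' again: I2 has its own position

# Multiple passes work too, since each pass gets a new generator.
same_twice = list(R) == list(R)
print(a, b, c, same_twice)
```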