Implementing generators with continuation-passing style, streams, and defunctionalization
Some languages define a variant of return
called yield
. When a function returns normally, it’s
finished, but when a function yields, the execution context of the function is saved, enabling us
to re-enter the function to resume its execution. Such resumable functions are especially
convenient for defining lazily generated sequences. Such functions that generate sequences are
called generators.
This article will first describe native generators in Python before translating the idea into OCaml using mutable variables and continuation-passing style (CPS). Then, we will see how to eliminate the mutable variable and how to eliminate the higher-order functions that arise from CPS, leading to an implementation that we translate into C.
Native generators in Python
For example, consider this recursive algorithm in Python that enumerates all truth assignments on
n
variables. Each truth assignment is represented as a list of booleans of length n
. The
sequence that arises from this enumeration has length 2^n
since there are two possible choices
for the value of each of the n
variables.
def enumerate_assignments(n):
= [True] * n
a
def go(i):
if i == 0:
yield a
else:
- i] = True
a[n yield from go(i - 1)
- i] = False
a[n yield from go(i - 1)
return go(n)
In the base case when i == 0
, the variable a
contains a truth assignment with some True
s and
some False
s put in it by the stack of recursive calls that leads to the base case. We yield
that truth assignment, suspending execution of the function. The function, upon being re-entered,
proceeds from that yield
, in this case returning back to its caller. The caller might be the
recursive call yield from go(i - 1)
, after which a[n - i]
is set to False
and another
recursive call is made.
Therefore, calling this function does not generate the whole sequence at once. In fact, nothing
happens just yet if we call enumerate_assignments(5)
except to ‘prime’ the generator to run. The
call returns a so-called iterator, which stores the state of the execution. Then, calling the
function next()
on the iterator will resume execution of the function up to the next yield
.
Also, next()
returns to us the yielded value. Repeatedly calling next()
until the generator
exits is exactly what a Python for
-loop does, so we can print all the truth assignments on five
variables like this:
for a in enumerate_assignments(5):
print(a)
What if the language we’re working in doesn’t have yield
though? How can we implement something
like this in, say, OCaml? In doing so, we will explore what happens under the hood of generators.
Evaluation contexts and continuations
Before diving into the implementation of generators in OCaml using continuations, some background on continuations is required. I’ve already written an article on continuations in general here, but I’ll give a somewhat different take on the core idea in this section.
To motivate the discussion of continuations in this section, let’s look ahead at what we will want
to accomplish in the next section: we will implement a generator as a stateful function next
.
Each time we call this function – next ()
– it returns the next item in the sequence.
The key idea in implementing this function is that just before returning an item in the sequence,
we store the current evaluation context in a reference. Then, when the user of our generator
calls next ()
again, we restore the saved evaluation context. Evaluation then proceeds from that
saved point up to the next item in the sequence.
I said ‘evaluation context’ several times in the last paragraph (and in the title of this section) so it’s about time I define what that is and how it’s related to continuations.
When a program is “in a call to a function”, for instance, there’s somewhere that call will return to. That location corresponds to how the return value of the function is used. Or, said differently, that location is the evaluation context.
For example, let’s imagine we have a function f : unit -> int
.
Maybe somewhere in our program there is let x = f () in E
.
The evaluation context of that call to f
– the way the result of f
is used – is that it is
associated to the variable x
in evaluating E
.
We can write this evaluation context as let x = _ in E
as this shows where the return value of
the call will be used. Notice that let x = _ in E
is not a program! Rather, it’s a program
with a hole in it.
Or as another example, perhaps the program contains f () + 5
.
The evaluation context inside the call to f
is that the result of the call will be added with
five.
We can write that again with this ‘hole’ syntax: _ + 5
.
Of course, evaluation can reach quite deep into a subexpression.
Consider this evaluation context let x = if _ > 17 then E1 else E2 in E3
.
The relevant insight about evaluation contexts, which are a concept outside our programming language, is that we can represent them in our programming language: we represent an evaluation context as a function. Let’s see how this applies to the examples seen so far.
let x = _ in E
is represented byfun r -> let x = r in E
- The second example
_ + 5
is represented byfun r -> r + 5
- The last example
let x = if _ > 5 then E1 else E2 in E3
is represented byfun r -> let x = if r > 5 then E1 else E2 in E3
This functional representation of an evaluation context is called a continuation.
Now that we have a way of representing evaluation contexts as continuations, we can write OCaml
programs that manipulate continuations. Since continuations are functions, we cannot inspect them:
we can only construct them and call them. Calling a continuation k
with an argument a
represents filling the hole in the evaluation context represented by k
with the value a
and
proceeding to evaluate the resulting expression.
For example, if k
is the continuation fun r -> let x = if r > 5 then E1 else E2 in E3
and we
apply this to 3
, then evaluation continues from the point where the value of the hole _
was
required in the represented evaluation context, namely in computing _ > 5
. The value of 3 > 5
is needed in the context if _ then E1 else E2
, and the value of that if-then-else expression is
needed in the context let x = _ in E3
. From this apparent nestedness of evaluation contexts, we
can observe that the contexts form a stack. This observation will be expanded on considerably in
the last section of this article when we translate these ideas into C.
(fun r -> let x = if r > 5 then E1 else E2 in E3) 3
==> let x = if 3 > 5 then E1 else E2 in E3 -- substitute 3 for r
==> let x = if true then E1 else E2 in E3 -- compute '>'
==> let x = E1 in E3 -- select then-branch
==> ...
Great, evaluation contexts can be represented as functions called continuations and our programs can use continuations by calling them. Calling a continuation corresponds to filling the hole in the represented evaluation context and continuing the evaluation from there.
The big idea of continuation-passing style (CPS) is to equip the functions we write with an
extra parameter. You guessed it, that extra parameter is the continuation of the function. So
instead of having a function f : A -> int
that we use like f a + 5
, we instead write f a (fun r -> r + 5)
. The upshot is that in the implementation of f
, we will have access to (a
representation of) the evaluation context in which the call to f
takes place!
In traditional CPS, functions no longer ‘return normally’. Instead, they return by calling the continuation with the value they want to return. In our current setting of implementing generators, this isn’t quite what we want. We would like our functions to return normally – this is how the generator will emit a value – but we nonetheless want access to the evaluation context so that we can store it in a reference just before returning. This will enable us to resume the function from the point where it emitted a value. In the next section, we’ll see how we can transform the Python code from the beginning of the article into OCaml using this form of CPS.
Implementing generators using state and CPS
In this section, we mimic Python’s approach to generators: generators in Python are stateful objects, so our implementation in this section will use a reference to store an evaluation context.
To come up with our OCaml implementation, let’s first translate the Python program from the first
section into pseudo-OCaml with yield
& yield from
. Rather than use a mutable array to construct
the truth assignment, we’ll build it up one value at a time in a parameter.
let enumerate_assignments n =
let rec go n a =
if n = 0 then
yield aelse begin
-1) (true :: a);
yield from go (n-1) (false :: a)
yield from go (nend
in
go n []
Let’s concentrate specifically on the inner function go
. We will need to find a way to represent
yield
and yield from
in genuine OCaml. First, let’s deal with yield
.
Operationally, yield
saves the current evaluation context and emits the given value.
We choose to represent ‘emitting a value’ as simply returning. To gain access to the current
evaluation context so that we can save it, we rewrite go
in CPS. We also introduce a ref
to
store the evaluation context.
let state = ref (* what to put here initially ? *) in
let rec go n a next =
if n = 0 then
(state := next; a)else begin
-1) (true :: a);
yield from go (n-1) (false :: a)
yield from go (nend
in ...
Next, let’s address yield from
. This keyword causes the current generator to invoke a
‘sub-generator’ and to yield all of its values.
Since we choose to represent yielding a value as simply returning it, we implement yield from go ...
as simply calling go
. In doing so however, we must pass go
a continuation. We can determine
what continuation to pass by looking at the evaluation context of yield from go (n-1) (true :: a)
:
-1) (false :: a) _; yield from go (n
We represent this evaluation context as a function: fun x -> x; yield from go (n-1) (false :: a)
.
Since yield from
does not compute anything – it performs an effect – we observe that
x : unit
. A value of type unit
does not convey any information, so we simplify to fun () -> yield from go (n-1) (false :: a)
.
Next we must translate the inner yield from go ...
which appears inside the continuation. Again,
we translate this simply to a call to go
, but in doing so we must decide what continuation to
pass in this call. We ask ourselves what evaluation context this call takes place in: what
happens next, after yield from go (n-1) (false :: a)
finishes generating its sequence of values?
The answer is that go
returns to its caller. In other words, the evaluation context of yield from go (n-1) (false :: a)
is the evaluation context of go
itself, which was passed to go
in
the continuation next
. Therefore, the continuation we pass to this second call to go
is simply
next
.
Let’s take stock of the translation so far:
let state = ref (* ? *) in
let rec go n a next =
if n = 0 then
(state := next; a)else
-1) (true :: a) (fun () -> go (n-1) (false :: a) next)
go (nin ...
Let’s zoom out, look at enumerate_assignments
again, and see how we can fit it together with
our adjusted go
.
let enumerate_assignments n =
let state = ref (* ? *) in
let rec go n a next =
if n = 0 then
(state := next; a)else
-1) (true :: a) (fun () -> go (n-1) (false :: a) next)
go (nin
(* ? *) go n []
Calling go
right away at the end of enumerate_assignments
can’t be correct anymore. This
will return the first item of the sequence, and then we’ll have no way to call the stored
continuation to get the next one!
Instead, we need to adjust the return type of enumerate_assignments
. Let’s look back to the
pseudocode using yield
and yield from
to inform how.
let enumerate_assignments n =
let rec go n a =
if n = 0 then
yield aelse begin
-1) (true :: a);
yield from go (n-1) (false :: a)
yield from go (nend
in
go n []
First of all, what is this enumerate_assignments
supposed to return? What is the return type of
go
in this pseudocode? If we think back to the original Python implementation, it was returning
an iterator. This is some kind of object that holds the suspended execution state. We call the
function next()
on this iterator to resume execution up to the next yield
.
We can imagine in our OCaml pseudocode that go
returns something of type bool list gen
and
therefore that enumerate_assignments n : bool list gen
. We can further imagine that there’s a
function next : 'a gen -> 'a option
which pumps the generator for one more item. This
hypothetical next
returns an option
because the sequence can end and we need a way to
signal this.
Now we need to turn the fantasy of the type 'a gen
and its associated function
next : 'a gen -> 'a option
into reality. Here there are several approaches available to us, but one particularly clean one
using higher-order functions is to represent 'a gen
as a function unit -> 'a option
. Then,
next
becomes trivial to implement.
let next f = f ()
Hence, we implement enumerate_assignments
to return a function unit -> bool list option
, such
that each time this function is called, it emits the next item in the sequence.
let enumerate_assignments n =
let state = ref (* ? *) in
let rec go n a next =
if n = 0 then
(state := next; a)else
-1) (true :: a) (fun () -> go (n-1) (false :: a) next)
go (nin
(* previously: go n [] *)
fun () -> ???
We’re almost finished now. What’s left is to make it so that the first time the returned function is called, it emits the first item in the sequence.
The missing insight has to do with the initial value of state
: it should be a function that makes
the initial call go n []
. This makes it so that the ‘driver function’ returned by
enumerate_assignments
can be implemented as fun () -> !state ()
. Recall that the state stores a
function that generates the next item in the sequence, so initially we store the function that
generates the first item of the sequence fun () -> go n []
. Moreover, we implemented go
to save
the current continuation back into the state
just before it returns. Therefore, the next time the
driver function is called, it emits the next item in the sequence!
But there’s a small wrinkle in fun () -> go n []
: there’s an argument missing! What continuation
do we pass in this initial call to go?
The continuation passed here ends up saved by go
in state
after the whole sequence of truth
assignments is generated. Therefore, we need to arrange that when this continuation ends up stored,
the driver function returns None
. The driver function has the form fun () -> !state ()
, so this
initial continuation passed to go
ought to be fun () -> None
.
But previously, the continuation had the type unit -> 'a
, so we need to adjust go
slightly to
accommodate this change. This leads to the finalized generator using CPS and state.
let enumerate_assignments n =
let rec go n a next =
if n = 0 then
(* we wrap the next item in Some *)
Some a)
(state := next; else
-1) (true :: a) (fun () -> go (n-1) (false :: a) next)
go (nand state = ref (fun () -> go n [] (fun () -> None))
(* and arrange that the last continuation to be stored in
`state` just returns `None`. *)
in
fun () -> !state ()
Now the state reference, which initially contains a call to go
, must be mutually recursive with
go
, which refers back to the state reference.
Let’s witness the fruits of our handiwork. Here’s an OCaml REPL demonstrating the generator.
> let next = enumerate_assignments 5 ;;
val next : unit -> bool list option = <fun>
> next () ;;
- : bool list option = Some [true; true; true; true; true]
> next () ;;
- : bool list option = Some [false; true; true; true; true]
> next () ;;
- : bool list option = Some [true; false; true; true; true]
> next () ;;
- : bool list option = Some [false; false; true; true; true]
In the next section, we explore how to eliminate the mutable variable from this implementation. This will give rise to an implementation suitable to purely functional languages, which lack (genuine) mutable variables.
Eliminating state
The implementation we have arrived at can be made stateless. Notice that there are two ways that
the driver function fun () -> !state ()
can return: it can return Some a
after storing the
continuation, or it can return None
which happens when the sequence ends and the initial
continuation fun () -> None
has been stored in the variable state
.
Rather than store the continuation in some hidden stateful variable, we can simply return both
the assignment and the continuation in the Some
case. That gives rise to the following
intuitive implementation.
let enumerate_assignments n =
let rec go n a next =
if n = 0 then
Some (a, next) (* return the value _and_ the continuation *)
else
-1) (true :: a) (fun () -> go (n-1) (false :: a) next)
go (nin
fun () -> None) go n [] (
Slight problem with this implementation: it doesn’t typecheck! And the error is not a simple “expect this type, got this other type,” but rather
Error: The expression `go (n-1) (false :: a) next`
has type (bool list * (unit -> 'a)) option
but an expression was expected of type 'a
The type variable 'a occurs inside (bool list * (unit -> 'a)) option
According to this error, the return type of go
, which is so far inferred as
(bool list * (unit -> 'a)) option
(due to the expression Some (a, next)
) has to equal the
return type of next
which is so far inferred as 'a
. This circularity is forbidden, so OCaml
rejects the program.
We can resolve this by introducing a recursive type, let’s say L
,
such that L = (bool list * (unit -> L)) option
. The base case of this recursive type arises from
the None
constructor of the option
type. Now we can fix the circular variable type variable
'a
to be L
and eliminate the forbidden circular instantiation.
type l = Next of (bool list * (unit -> l)) option
We can slightly refactor this type by introducing a second constructor and eliminating the
option
.
type l =
| Doneof bool list * (unit -> l) | More
And we can generalize the type by replacing bool list
with a type variable.
type 'a l =
| Doneof 'a * (unit -> 'a l) | More
And would you look at that! This is simply a list, but whose tail is computed by a function unit -> 'a l
rather than being already materialized.
Now we’re equipped to rewrite enumerate_assignments
but having the type
int -> unit -> bool list l
.
let enumerate_assignments n =
let rec go n a next =
if n = 0 then
(* return the value _and_ the continuation *)
More (a, next) else
-1) (true :: a) (fun () -> go (n-1) (false :: a) next)
go (nin
fun () -> go n [] (fun () -> Done)
Have we achieved our goal of making enumerate_assignments
stateless? Yes and no.
Indeed we have eliminated the mutable variable, so on the one hand we can say “mission
accomplished.” But on the other hand, the state of the walk through the space of truth assignments
is still very much present in our program. That state is captured in the continuation, which is
returned explicitly via the More
constructor of our lazy list type l
. The state of our
generator implementation is no longer mutable and hidden, but rather immutable and explicitly
passed around. We can therefore view lazy lists as purely functional generators.
Another consideration is that in order to make this approach work in a strongly and statically
typed setting as in OCaml, we did need to introduce the recursive type l
. In the setting of a
different type system, this might not be necessary. For instance, in a dynamically-typed setting,
e.g. in Python, it is unnecessary to introduce an extra type: we can simply return None
when the
sequence ends and (a, next)
when the sequence continues.
In the next section, we revisit our implementation using hidden, mutable state, and eliminate from it the higher-order functions. This will give rise to a first-order implementation suitable for translation into a language such as C, which is moreover well-equipped to handle mutable state.
Defunctionalizing the continuation of a generator
Recall from the first section the enumerate_assignments
implementation using mutable state.
let enumerate_assignments n =
let rec go n a next =
if n = 0 then
Some a)
(state := next; else
-1) (true :: a) (fun () -> go (n-1) (false :: a) next)
go (n
and state = ref (fun () -> go n [] (fun () -> None))
in
fun () -> !state ()
We can apply defunctionalization to eliminate the higher-order functions present in this program (see here). In short, we replace each function type \(T = T_1 \to T_2\) that occurs in the program with a new datatype \(D(T)\). Then, for each function \(\Gamma \vdash \text{fun}\, x \to e_i : T_1 \to T_2\), define a constructor \(C_i : P(\Gamma) \to D(T)\) where \(P(x_1 : S_1, \ldots, x_n : S_n) = (S_1, \ldots, S_n)\). Next, define the function \(\text{apply}\, : D(T) \to T1 \to T2\) as follows (in pseudo-OCaml)
let apply (f : D(T)) (x : T1) : T2 = match f with
| C_i (x1, ..., xN) -> e_i
In other words, the function apply
takes the representation \(f : D(T)\) of the original
function and recovers the original function: notice that \(\text{apply}\, f : T_1 \to T_2\) has the
original function’s type!
Concretely for enumerate_assignments
, we introduce a type stack
to represent the function type
unit -> bool list option
.
There are three functions of this type passed as arguments to other functions.
fun () -> go n [] (fun () -> None)
: this is the initial continuation and it contains a free variablen : int
. We generate from this a constructorStart : int -> stack
.fun () -> None
: this is the final continuation and it contains no free variables. We generate a constructorFinished : stack
fun () -> go (n-1) (false :: a) next
: this is the continuation passed when making a recursive call togo
. It has the free variablesn : int
,a : bool list
, andnext : unit -> bool list option
. Notice that the function we’re translating refers to a function of the type we’re defunctionalizing. This will make our typestack
into a recursive type. Moreover, since the new continuation defined by this anonymous function refers to exactly one other continuation, we are finally justified in calling our type “stack”: these continuations were implicitly forming a linked list that now is explicitly represented. From this analysis we generate the constructorContinue : int * bool list * stack -> stack
.
We will make a small refactoring: we will separate the “stack frames” from the stack by introducing
a type frame
and changing the constructor Continue
to instead have the type
Continue : frame * stack -> stack
.
type frame = { n : int; a : bool list }
type state =
| Startof frame * state
| Continue | Finished
Equipped with this representation of the functions of type unit -> bool list option
occurring in
the program, we can translate enumerate_assignments
to use these constructors instead of using
anonymous functions. Anywhere we call an anonymous function, we instead call the function apply
that we implement to recover the behaviour of the anonymous function.
The parameter next : unit -> bool list option
becomes s : stack
.
let enumerate_assignments n =
let state = ref (Start n) in
(* ^ previously: fun () -> go n [] (fun () -> None) *)
let rec go n a s =
if n = 0 then
Some a)
(state := s; else
-1) (true :: a) (Continue ({n; a}, s))
go (n(* ^ previously: fun () -> go (n-1) (false :: a) next *)
(* Apply pops a frame from the stack and runs until the next item is produced, if any.
When `go` runs, it will manipulate the stack, saving it into `state` in particular right
before returning `Some a`. *)
and apply s = match s with
| Start n -> go n [] Finished-1) (false :: a) s
| Continue ({n; a}, s) -> go (nNone
| Finished -> in
(* And now we have implemented a function `unit -> bool list option` without using any
higher-order functions internally! *)
fun () -> apply !state
This program is now completely first-order with the exception of the “higher-order interface”
provided at the very end in the form of the function fun () -> apply !state
.
This first-order nature will make it possible for us to (somewhat) straightforwardly translate this
into C. Of course, we can translate it “as is”, which would mean using a linked list structure for
the type stack
and for the type bool list
, but since we’re going lower-level, we may as well
choose more efficient representations for these types too.
Translating to C
The resulting C program is around 100 lines of code whereas the defunctionalized OCaml program is around 20 lines of code. (Both counts ignore blank lines and comments.) We need the following includes in this development.
#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
Our first choice for efficient data representation will be to represent bool list
as simply
uint64_t
. This limits the number of variables to 64, but it also has the nice property that on a
64-bit machine such as most, a truth assignment fits into a register.
We will need to define some operations for setting and clearing specific bits.
typedef uint64_t truth_assignment;
typedef uint8_t var_index;
(truth_assignment a, var_index i) {
truth_assignment set_truereturn a | (1UL << i);
}
(truth_assignment a, var_index i) {
truth_assignment set_falsereturn a & ~(1UL << i);
}
const EMPTY_TRUTH_ASSIGNMENT = 0; truth_assignment
Next, we will translate the type frame
from the OCaml implementation into a simple C struct.
struct frame {
;
var_index i;
truth_assignment a};
struct frame make_frame(var_index i, truth_assignment a) {
return (struct frame) {
.i = i,
.a = a,
};
}
Next, recall the stack
type from the OCaml implementation.
type stack =
of int
| Start of frame * stack
| Continue | Finished
We will represent the linked list structure as a simple array of struct frame
s. Notice that the
depth of recursion is bounded by the parameter n
given to enumerate_assignments
. The maximum
value of the parameter n
in the development here is 64
. The maximum recursion depth informs the
maximum stack size.
const MAX_VARS = 64;
var_index const STACK_LIMIT = MAX_VARS; var_index
To manage this array of frames, we will need to track a frame pointer: this is the index of the next unused frame in the stack. Seen differently, the frame pointer is the count of frames currently in the stack.
The frame pointer, being at least 8 bits wide, can accommodate values greater than the count of
frames we will ever store.
This means we can use the upper bits of the frame pointer to help identify what state the generator
is in.
When the generator is in the start state, the stack is empty, so we can use the value
1 << (WIDTH-1)
for the frame pointer to signify that the generator is in the start state.
In the Start
state, we need to know the count n
of variables we’re enumerating truth
assignments for, but afterwards we can forget this n
and just keep the array of frames.
This suggests using a union
to reuse space here.
struct generator {
uint8_t frame_pointer;
union {
;
var_index num_variablesstruct frame stack[STACK_LIMIT];
} data;
};
enum state {
= 1,
START = 0,
CONTINUE };
enum state generator_state(struct generator *gen) {
// extract the uppermost bit of the frame pointer
return gen->frame_pointer >> 7;
}
// constructs a generator's initial state
struct generator enumerate_assignments(var_index num_variables) {
return (struct generator) {
.data.num_variables = num_variables,
.frame_pointer = 1 << 7,
};
}
Next, we need operations to push to and pop from the stack held in a generator
.
/**
* Returns -1 if pushing fails: wrong generator state or stack is full.
* Otherwise copies the given frame into the top of the stack, increments the frame pointer,
* and returns 1. */
int gen_stack_push(struct generator * gen, struct frame const * frame) {
if (gen->frame_pointer >= STACK_LIMIT) {
return -1;
}
->data.stack[gen->frame_pointer++] = *frame;
genreturn 1;
}
/**
* Returns -1 if popping is forbidden: wrong generator state.
* Returns 0 if the stack is empty.
* Otherwise decrements the frame pointer, and copies the top frame into `out`,
* and returns 1 */
int gen_stack_pop(struct generator * gen, struct frame * out) {
if (START == generator_state(gen)) return -1;
if (0 == gen->frame_pointer) return 0;
*out = gen->data.stack[-- gen->frame_pointer];
return 1;
}
Now that we’ve translated all the type definitions, we can translate the programs go
and apply
from OCaml into C. Recall the OCaml implementation:
let enumerate_assignments n =
let state = ref (Start n) in
let rec go n a s =
if n = 0 then
Some a)
(state := s; else
-1) (true :: a) (Continue ({n; a}, s))
go (nand apply s = match s with
(* Finished comes from fun () -> None *)
| Start -> go n [] Finished -1) (false :: a) s
| Continue ({n; a}, s) -> go (nNone
| Finished -> in
fun () -> apply !state
Notice that go
refers to the variable state
that is not a parameter of go
. In other words,
the definition of go
constructs a closure. Sadly, C does not have closures, so we will
implement this by passing our translation of go
a pointer to the generator
. This way when go
makes a recursive call, it can simply use gen_stack_push
to implement the expression Continue ({n; a}, s)
at the same time as state := s
. In other words, rather than building up the stack in
a parameter to save it just before returning, our translation of go
will be mutating the stack
along the way.
Observe also that go
is tail-recursive: the recursive call is the last thing the function does.
The OCaml compiler transforms this into a while-loop during compilation. This transformation is
called tail-call optimization. We will also perform this transformation in our translation, to
avoid using the C call stack.
int go(struct generator * gen, var_index i, truth_assignment * ta) {
struct frame frame;
// due to the --> 'operator',
// `i` will have its value decremented by one inside the loop
while (i --> 0) {
*ta = set_true(*ta, n);
// in the OCaml program, the frame that gets pushed by the recursive call
// `go (n-1) (true :: a) (Continue ({n; a}, s))`
// stores the value `n`, but here we are storing n-1 as a consequence
// of the decrement that happens in the while loop condition.
= = make_frame(n, *ta);
frame if(-1 == gen_stack_push(gen, &frame)) return -1;
}
return 1;
}
Notice that go
returns a status code here, whereas the original OCaml program returned the truth
assignment. The C program ‘returns’ the truth assignment via the pointer parameter ta
, and rather
than construct a new truth assignment, it simply modifies the given one.
Finally, we can translate apply
. We will call it next
in the C program, since it will dispatch
on the current generator state to compute the next item in the sequence, updating the generator
state.
int next(struct generator * gen, truth_assignment * ta) {
int status;
struct frame frame;
switch (generator_state(gen)) {
case CONTINUE:
= gen_stack_pop(gen, &frame);
status
// handle generator exit
if (0 == status || -1 == status) return status;
*ta = set_false(frame.a, frame.i);
if (-1 == go(gen, frame.i, ta)) return -1;
return 1;
case START:
*ta = EMPTY_TRUTH_ASSIGNMENT;
->frame_pointer = 0;
genif(-1 == go(gen, gen->data.num_variables, ta)) return -1;
return 1;
}
}
And what good is all this code if we don’t try it out. Here’s a main
function to run the
generator until it exits, printing out the truth assignments along the way.
int main() {
struct generator gen = enumerate_assignments(5);
int status = 0;
;
truth_assignment a
for (; status = next(&gen, &a), 1 != status;) {
("truth assignment: %d\n", a);
printf}
if (-1 == status) {
("Generator encountered an error, sorry.\n");
printfreturn EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
Collecting all this C code into a file enumerate.c
, we can witness the fruits of our labour:
$ gcc enumerate.c && ./a.out
truth assignment: 31
truth assignment: 30
truth assignment: 29
truth assignment: 28
truth assignment: 27
truth assignment: 26
truth assignment: 25
truth assignment: 24
truth assignment: 23
truth assignment: 22
truth assignment: 21
truth assignment: 20
truth assignment: 19
truth assignment: 18
truth assignment: 17
truth assignment: 16
truth assignment: 15
truth assignment: 14
truth assignment: 13
truth assignment: 12
truth assignment: 11
truth assignment: 10
truth assignment: 9
truth assignment: 8
truth assignment: 7
truth assignment: 6
truth assignment: 5
truth assignment: 4
truth assignment: 3
truth assignment: 2
truth assignment: 1
truth assignment: 0
And there you have it: the most complicated program you’ve ever seen to count down from 2^n -1
.
Conclusion
This article covered a lot of ground.
At first, we were motivated by Python’s elegant generator syntax and curious about how these ideas are implemented. We implemented the idea of stateful generators using continuation-passing style to give us access to a representation of the evaluation context, so we could store that context in a mutable variable. That gave us the following implementation.
let enumerate_assignments n =
let rec go n a next =
if n = 0 then
(* we wrap the next item in Some *)
Some a)
(state := next; else
-1) (true :: a) (fun () -> go (n-1) (false :: a) next)
go (nand state = ref (fun () -> go n [] (fun () -> None))
(* and arrange that the last continuation to be stored in
`state` just returns `None`. *)
in
fun () -> !state ()
In the following section, we explored a purely functional take on this idea, motivated by the simple idea of returning the continuation together with the generated value. To make this typecheck, we needed to introduce the following recursive type.
type 'a l =
| Doneof 'a * (unit -> 'a l) | More
This recursive type (or some variant thereof) is often presented in programming languages courses
simply as “a lazy list”. The development in this article, on the other hand, derived this
representation by eliminating the mutable variable from the program in the previous section.
This demonstrates that lazy lists are purely functional generators, where the state of the
generator is captured in the continuation of type unit -> 'a l
that is explicitly returned.
In the following section, we applied defunctionalization to the stateful CPS generator to convert it into a state machine using an explicit stack.
let enumerate_assignments n =
let state = ref (Start n) in
let rec go n a s =
if n = 0 then
Some a)
(state := s; else
-1) (true :: a) (Continue ({n; a}, s))
go (nand apply s = match s with
| Start n -> go n [] Finished-1) (false :: a) s
| Continue ({n; a}, s) -> go (nNone
| Finished -> in
fun () -> apply !state
In the final section, we translated the defunctionalized program into C, using efficient data representations along the way where possible.
I hope that this sheds some light on the connection between CPS, generators, and state machines.