Goroutines at Baker's Day, a deep dive

Recently, I've interviewed a bunch of Gophers, as more and more of the engageSPARK platform is moving over to Go. One thing I noticed is that there is quite a bit of confusion around goroutines, and how they're different/better/faster than OS-threads. That's a bit worrisome, because as a software engineer you really should understand the fundamentals of your language. And goroutines are a big part of why Go is so useful for concurrency.

I've read around, and have not found the kind of write-up that I would like to point these applicants to. Also, by reading around, I've stumbled over some fun bits how preemptive scheduling in Go works, and maybe it's worth sharing.

Let's start with a metaphor: Baker's Day, and a new you.

Baker's Day, Kids & Goroutines

You're a tech entrepeneur, made your millions and find yourself bored in a neighborhood full of kids. Now, one of your fondest childhood memories is baking cookies with your mom. You think this is essential to growing up sanely next to an iPhone and flying cars. But when talking to parents around, they're all so busy. And if they bake, it feels more like putting yesterday's pizza into a micro-wave.

So you decide to organize a Baker's Day for the neighboorhood kids. All of them should get some quality baking time, where they can watch someone lovingly make cookies, decorate them, sneak away with some sweet dough. How hard can it be?

Same with threads: threads of execution

With your goal set, you ask around, the word spreads and indeed 100 kids sign up for Baker's Day on Sunday. They'll be eating dough, asking questions, cutting cookies and repeat.

Obviously, the 100 kids equate 100 threads of code execution. (Later, those will be goroutines.) By the way, the Go runtime, in the code and docs, denotes goroutines with the letter G.

The ovens / the CPU cores

If you're gonna bake anything, you'll need ovens. So, in frugal startup fashion, you go around town and manage to get 10 ovens together, courtesy of two bakeries and helpful housesolds. (Good job!)

How does that translate to threading? The ovens are the CPU cores: You've 10, so you'll only ever be able to bake 10 trays at a time. (We're not doing multiple trays at the same time, because the ovens are small and because that breaks the metaphor and I love my metaphors.)

The bakers / the OS-threads

The kids are not going to bake cookies themselves and their parents are working, and anyway ovens are hot, so somebody needs to be there. You advertise all over town for bakers to volunteer at Baker's Day. As luck would have it, you are overwhelmed with responses, and you end up assigning each kid their own baker.

With all those bakers ready to bake, you need to figure out a way that everyone gets a fair share of oven time. After all, it's 100 bakers and kids and only 10 ovens. You give the bakers simple instructions: everyone gets the oven with baking area for 15 minutes. When the time is up, they get their dough, tray and kid, and head over to the playground. When it's their turn again, they resume.

How do they know it's their turn? You have a list with their names, and the bakers just go through it. Whenever you reach the bottom of the list, you start at the top again.

And … threads? The bakers are the OS-threads. Same as the kids should not handle the ovens themselves, your program should not worry about processor scheduling and memory management themselves. The Go runtime denotes OS-threads (or their abstraction) with the letter M, for “machine”.

So, same with the kids and the bakers, each function gets their own M, their own OS-level thread. This means, we don't have goroutines yet. (Soon, soon!) Scheduling can actually happen in a variety of ways, often some round-robin variant, as described above.

With the bakers all set, let's bake some cookies! You eagerly await Sunday.

Switching bakers / OS-threads

Come Sunday, the baking starts and with it the problems, too. You walk around, and everyone's complaining, you see few happy faces and in fact few kids actually baking! What's going on?

It turns out, having 100 bakers complicates things. Whenever bakers are supposed to take over the oven, they clean the area thoroughly, check the oven and make sure all equipment is complete. It does not take long, but it does take a few of those 15 minutes.

And threads? Same thing! Whenever the scheduler decides to let another OS-thread execute on a CPU core, a context switch happens and that's expensive.

1 baker, 1 oven

Unhappy kids, not what you wanted! How to fix this?

You quickly decide: There's only going to be 10 bakers, end each stays responsible for their oven. That means, there's no need for equipment control and all those things. Whenever the time is up, the baker sends the kid to the playground, and calls on the next.

And that's where we are with threads in Go! The Go runtime makes sure your execution threads, the goroutines, are run on a fixed number of OS-threads. That's why you can have many more goroutines than OS-threads.

The number of OS-threads M that run your Go-code is fixed, but configurable . The default depends on your Go version: it's eitherb 1 or equal to the number of CPU cores on your computer, see this design document about changing the default in Go 1.5 and the Go 1.5 release notes on the topic. You can configure the number of OS-threads M with the environment variable GOMAXPROCS. (It's slightly more complicated, see below.)

Choosing the next kid / goroutine

When you reduced the number of bakers, you also had to change the list that decides who is next. On the list are no longer the name of the bakers, since they don't change anymore, but the name of the kids.

As you're watching this now, you realize that this does not work. Having finished a time slot with a kid, the baker would go up to the list and find the name of the next one. Now, the list has 100 names, it takes forever, the bakers pile up around the list … what did work with 10, does not work with 100.

What to do?! You decide to go back to lists of ten: you assign each baker a list of ten kids. Whenever bakers are done, they consult the list of their oven. And since it's always the same 10 kids, it's not that hard to figure out who is who.

And … goroutines? Turns out in Go 1.1, that's (almost) exactly what happened. Processors were introduced that retained each a list of goroutines to be run, the run queue. Each OS-thread is assigned a processor, so with 10 OS-threads running your Go code, there's ten processors. And you guessed it, the Go runtime calls processors by their letter: P.

Hm. If there is one processor for each OS-thread, why do we need those processors at all? Why not keep the run-list with the OS-thread? Excellent questions! Now that Baker's Day is running somewhat smoothly, let's look at some examples.

Baker's Break / syscall

“Hey boss—I'm gonna take a break. My partner called me and I want to make sure everything is alright.”

And like that, you've 10 ovens and only 9 bakers and, oh, a list of names—and the kids with those names are getting restless. Where is our Aunt Baker? Why did she go? I want to cut cookies!

You feel the panic rise, but your startup training kicks in: Before the baker disappears, you remember to ask her for the list. Then you quickly call up one of those 90 bakers you sent home earlier, and he's happy to jump in his car and come back. He arrives, you explain him the new system, he takes the list, calls the next name and, phew, everything is running smoothly again. Finally, you announce that each list of names should be left with the oven, not the baker.

Yeah, and OS-threads have significant others now? What? No. Baker's Day is a metaphor. But OS-threads do get stuck waiting for system calls. And when that happens, the Go runtime can't do a thing but wait … or it could start a new OS-thread and hand over that list of waiting goroutines.

And that's exactly what it does: Spawn another OS-thread, and assign it the processor of the one stuck in the syscall. And that's why we need processors, because they keep a run queue for each imagined CPU core, and if an OS-thread gets busy, oh well, let's get another one.

Here's another tidbit to chew on: You've now 10 oven, but 11 bakers. Same in Go. GOMAXPROCS controls the maximum number of OS-threads that run your Go code at the same time. But many more can be stuck in syscalls (and in fact more may be running Go runtime maintenance tasks).

Not just the bakers take time off: The kids disappear in the middle of baking, too!

Kids go to the bathroom / Goroutines block

Every once in a while in the middle of cutting cookies, a kid spontaneously decides it's time for the bathroom and runs off. The baker stares at the vacuum left behind by the kid, looks at you, shrugs and calls the next kid. You get worried though: What happens when the baker reaches the kid on the list that just disappeared to the bathroom? You don't want that baker to waste time searching!

So, you decide to remove the name from the list, and add it again later, when the kid comes back from the bathroom.

And that's where scheduling leaves the “perfect circle” of round-robin: If every thread were to use its alloted time, it would get rescheduled at the end of the run-queue. If we're waiting for it, it gets added to the end only when it is ready again.

That's not the only trouble the kids give you. And yes, the run-queues save you there, too!

Kids get bored / goroutines are done

As it turns out, kids get bored. After the second tray, or after the first or fifth, the kids decide: enough's enough, their stomachs are full with cookies anyway and it's time to sleep or to play with the other kids. They leave—and their name gets removed from the list. This means, at some point a baker will run out of names on their list. When that happens, they do the obvious: go to another baker, ask them if they can help by taking over some of the kids.

Same with threads, but not quite. What's considered helpful at Baker's Day, taking over goroutines from the run queue of another processor, they now call stealing. (Fun fact: It's not easy to steal from everyone equally.)

At this point, I'm going to leave the Baker's Day metaphor behind, because I want to tackle some related topics shortly. Unfortunately, I can't figure out (without inflicting hurt on my brain) how to incorporate them into the metaphor. So, it's time to say: Bye, Baker's Day!

Why are Goroutines “lightweight”?

Goroutines are lightweight, because…

  • The Go runtime does not have to worry about things like signal masks, CPU affinity, cgroups and tracking resources for each goroutines. The Linux kernel does keep track of all those things for every pthread. (That's in part why you can run more of them.
  • Goroutines use segmented stacks that start small. The idea is to reserve enough space to accomodate most goroutines. Stacks can be resized, if required, but often don't need to, because goroutines are typically small functions. For slightly more details see the second paragraph of this FAQ entry.

Are goroutines green threads?

That depends on your definition of green thread.

If you mean a user-space thread, in contrast to an OS-thread, then yes: that's exactly what goroutines are, because the Go runtime is managing them (in the Go process, which is run in user-space).

If you imply explicit cooperative scheduling, as for example Python's greenlet uses, where a thread has to choose to give up control, then no: The Go runtime chooses to schedule another goroutine at certain points, as I'll explain in a second.

And goroutines are coroutines, too?

That's a contested question. Some require coroutines to explicitly define entry points, and then goroutines are not, because you can't know when your goroutine's execution will be suspended in favor of another goroutine. The Go FAQ seems to have the viewpoint that goroutines are coroutines with the scheduling details hidden.

Preemptive or cooperative scheduling?

This is a fun topic in Go. There are basically two ways to schedule threads: preemptive or cooperative. The former means the runtime (OS or Go runtime or whatever) decides when to run what. With cooperative scheduling, as the word implies, the running threads themselves cooperate by choosing to give up control and letting other threads run.

Operating systems typically schedule threads preemptively. Why? Because otherwise your process could starve out other processes for CPU time. And that does not bode well for an OS that is meant to keep all applications happy.

As mentioned above, Python's greenlet is a library for green threading where scheduling happens cooperatively: Threads have to give up control themselves. On top of greenlet, gevent was implemented, and it makes greenlet's threads “cooperate automatically” whenever a socket function is called. So, that's cooperative … and preemptive in the sense that the green thread cannot avoid being preempted when calling a socket function.

With Go, the situation is similarly muddy. In July 2013, Robert Johnstone declared on the go-nuts mailing list: “The scheduler is currently cooperative.”. That is true in the sense that the Go runtime cannot at random times take control away from a Goroutine—but it can schedule another goroutine at certain times.

Why so complicated? It turns out, either extreme is undesirable.

You could do explicit cooperation. But then, as mentioned above, your code would be littered with statements about yielding control. That's ugly. Hiding all this complexity in the runtime is, as Go's FAQ puts it, “the point”.

Complete, time-based preemption is difficult, too: Go was designed to make concurrency easier, and for this concurrency-related operations need to leave memory conistent, like reading from a channel. You only really know when it's safe to schedule, at certain points, and that when rescheduling occurs: When reading or writing from a channel, yes. When doing system calls. When doing function calls, see also Go 1.2 release notes. If you enable the preemptibleloops experiment, Go adds preemption points to loops. (Generally, the thread on infinite loop preemption is worth reading.)

Inserting preemption points at calls to the runtime falls into cooperation. On the preemption side, I should note that time-based preemption has been added back in the day: When 10ms are up, the sysmon thread attempts to preempt the running goroutine. But, as the function comment to preemptone suggests, preemption is purely “best-effort”.

Fun fact: This sysmon thread is a goroutine. So, what does runtime.NumGoroutine() give? The documentation says: NumGoroutine returns the number of goroutines that currently exist.” Does that include the internal sysmon goroutine? You'd think, if your program starts ten goroutines, then runtime.NumGoroutine() should return 10, right? Well, I think it should. Turns out now it does, but only since 1.9.1.

Again on the cooperation side: Go has a function in the runtime package called Gosched to release control of a processor, allowing other goroutines to run.

More reading & watching

That's it on scheduling and goroutines.

To reinforce my learning, it usually helps me to digest the same information again, but from a different point of view. If you want to look at goroutines and scheduling from the perspective of channels, I recommend this excellent talk from GopherCon 2017 by Kavya Joshi: Understanding Channels. If you want to read about scheduling, consider this excellent article on scheduling by Daniel Morsing. For The Truth (because code does not lie), head over to the proc module.