Tier 3 Means Getting Your Hands Dirty
It should have been simple example code, but it was broken. Yes, it was for AVR, and it used unstable compiler features. But still, why did it break in such mysterious ways? To find out, I went down a rabbit hole that led me to submit a patch to LLVM.
Seemingly innocent changes made the bug appear and disappear. My code didn't panic, so why did mapping the error type matter? Switching #[inline] to #[inline(never)] made it go away, maybe it was an inlining issue?
To get to the bottom of things, I had to pore over the generated IR and assembler, and eventually dive into the codebase of LLVM itself.
Presented by
Andrew is a freelance software developer who also runs a small startup designing and building consumer electronics widgets. He pushes his clients hard to adopt Rust -- selfishly, since it makes for far fewer calls in the middle of the night. Since AVR support was mainlined in the Rust compiler, he's been migrating all his microcontroller code to Rust (and loving every minute of it, even the frustrating ones).
Recordings
Transcript
Tier 3 Means Getting Your Hands Dirty
Bard:
Andrew Dona-Couch will now go
to the farthest reach of Rust to show
if you're willing to get
your coding feet wet
Tier three has got some room to grow
Andrew:
Good afternoon. Welcome to RustFest Global 2020. My name's Andrew Dona-Couch. This talk is Tier 3 Means Getting Your Hands Dirty. I'm a freelance software developer, I'm a big fan of Rust, and I love playing around with hardware. And recently, hardware and Rust. On social media I'm generally @couchand. As previously mentioned, this talk is about my experiences finding and fixing a bug within the Rust compiler. Well, within a library used by Rust, called LLVM. And this bug only comes up on the AVR platform.
So, what's AVR? AVR is a series of 8 bit microcontrollers. These are tiny computers. They are an older series. They're cheap and very hacker friendly. And support for AVR was mainlined into the Rust compiler this past summer. Into Rust and LLVM, thanks to a significant amount of effort by a gentleman named Dylan McKay. And you can see the AVR line ranges here...
Andrew:
My apologies. I...
Stefan (@dns2utf8):
No problem.
Andrew:
My cat chewed through my power cord and so, it was not charging. All right.
Stefan (@dns2utf8):
The risks of feline.
Andrew:
Right.
Stefan (@dns2utf8):
Would you mind going back a couple of slides because we lost you quite some time ago.
Andrew:
Sure. Yep!
Stefan (@dns2utf8):
Yes. All right. We can't hear you at the moment.
Andrew:
How about now? My apologies for all of that. All right. So, I'm at a strange angle at the moment. Hopefully we can make this work. So, just to briefly go over these ground rules again: I'm going to try to show everybody respect today. Hopefully I succeed at that. We'll also be talking about things that I'm relatively an amateur at, so bear with me. And most importantly, I'm going to be oversimplifying, because I would like to cover a lot of material. And due to those technical issues, we're, of course, now a few minutes behind our plan.
All right. So, with that in mind, what are these simplified models that I'm going to be talking about? The first example is a black box model. We'll be talking about some complex system. Take this Rube Goldberg self-operating napkin, and we're going to put this system inside of a box. And the reason we do that is because we want to analyze this complex system, not in terms of the details of the inside, the implementation, but rather in terms of its inputs and its outputs. Its relationship to the outside world. That's a bit abstract. So, let's look at a couple examples.
We can think of a computer as being a box. And the input to our computer is some kind of program. And the output, we'll say... hand waving a bit here... but we have two outputs. Computation and effects on the world. All right. This is a technical conference. So, let's look inside the box, and we find that there are four additional boxes. We'll talk about each of these four in turn a little bit later, but right now I'll give you just a brief sense of what they are. So, on the left we see we have a set of registers and some memory. And on the right, our two boxes are called processor and peripherals.
So, just to get a vague idea of what each of these are, we'll use the metaphor of an old school paper pushing office. You have someone sitting at a desk moving paperwork around. And that person is the brains of the operation, and that's analogous to the processor. The desk in front of them, that's covered in the paperwork they're currently working on, is something like the registers. The memory is a bit like a file cabinet that holds the paperwork that they need to have access to, but don't need immediately at hand. And finally, the peripherals in this metaphor are something like an intercom sitting on top of the desk, where if we need to interface with the outside world, ask for a cup of coffee, we can push the button on the intercom and request support.
All right. So, let's connect our arrows from earlier. We said that the computation comes from... we said the process is the brains of the computer. The computation can be thought of as coming from there. The peripherals are the interface to the outside world. We can think of our effects on the world as coming on those peripherals. And then program will... we will load that into memory and then we will go ahead and start executing from there.
All right. Let's look at another example of a black box. This one is a compiler. And we can think of a compiler as a box. And the program is the input to the compiler. And since this is a Rust conference, our compiler presumably is rustc, the Rust compiler. And our program is the Rust source code for... the Rust source code for our program. All right.
And we can look at the output of our compiler. And now we've reached a really interesting point. What is the output of this compiler? So, again, let's not define it based on its intrinsic properties, but rather use another simplifying model. So, here we'll define this output by how we use it. And we'll see that we use it by passing it as the input to our computer box from before. And we already said that input is a program. So, we've now determined what this compiler box is.
The compiler is something that takes a program as input and produces a program as output. Now, this is a bit confusing, so, we'll be more specific about this. We have two different representations of the same program. And the input representation, we already called it source code. And the output representation is the one that's meant for the machine, the computer machine. And so, we call it machine code. All right.
So, again, this is a technical conference. Let's look inside this compiler. And we see two more boxes. We have a rustc frontend and an LLVM backend. Now, rustc we know is the Rust compiler. So, that term is not new. But LLVM might be new. So, what is this LLVM thing? LLVM is a library that contains all of the common parts of compilers. Things that are shared among compilers no matter what language they're for. Optimizations, code generation for the machine code, and so forth. As suggested by that previous slide, our backend for the Rust compiler is LLVM. But the LLVM library serves as the backend for many other languages, including Julia and Swift and many others.
And notably, the Clang compiler, which is part of the LLVM project, is the C compiler on the Macintosh. Okay. So, looking at our compiler model again. And let's once again connect our arrows for our input and output. And extend them inwards, and connect the source code input to the rustc frontend. And the source code is the input to our frontend. And the output of our LLVM backend is our machine code.
And that leaves one big gap. What is it that's between these two boxes? Well, it's another representation of a program. And because it is intermediate between the source and machine code, the LLVM authors have decided to call it intermediate representation. It's frequently abbreviated IR. So, we'll see frequently the term LLVM IR. That's this common currency between the rustc front end or the front end for any compiler that might be using LLVM, and the LLVM back end.
All right. It's about time that I tell you about my project. So, I was working on something that is more or less the "Hello, World" of embedded. The "Hello, World" of embedded is sometimes called Blinky. And the idea here is you take a microcontroller. So, here we have a breadboard with an ATtiny85 microcontroller. That's a particular AVR microcontroller. And we will go ahead and connect it to an LED. An LED is like a tiny light.
So, we have our tiny computer wired up to our tiny light and we write a program for the microcontroller. And that program will make the light turn on and off. And on and off. And on and off. And in embedded, you tend to write things as an infinite loop. So, we're going to turn on and off our light forever.
Well, I expect that for most of you, like me, the bulk of your software development experience is on the desktop or server. Perhaps modern mobile, or a web context. And there are a few important things to know about writing software for an embedded context that make it slightly different from those other environments.
So, the most important difference is that in an embedded context you have extremely limited resources. Now, we generally probably know what that means in terms of memory, processing power, and so forth. But let's look back at our computer model and see what this limitation in resources implies for each of these components. So, first we'll look at the processor. And inside our processor we have, again hand waving quite a bit here, some math-related stuff that's going to be doing arithmetic, perhaps. And some program control stuff that lets us do loops and conditional jumps and moving around in our program. And the big difference in embedded is that it's slower. And in some cases, it's less capable. For instance, the AVR devices that I'm generally developing for don't have floating point numbers.
So, all arithmetic has to be done on integers. All right? Let's look at another component of our computer model, the peripherals. So, I said previously that the peripherals are the interface to the outside world. And let's look at a couple examples. We can consider a video streaming application, where you might need to have access to a networking peripheral that you can use to fetch a video from some service. And then once you have the video, you want to show it to a user, and so you would use some sort of video display hardware, which would be provided by a peripheral. Now, in an embedded context, you certainly could have a networking peripheral, although it's not universal. You might have a video peripheral, although it's somewhat rare.
You tend to have access to peripherals that are much more limited. For instance, you could ask the computer to count to a number, and then let you know when it gets to that number. And this is a bit like when my kids are in the other room and I'm working on the dishes. And my kids want me to come in and play with them. But I need to finish up what I'm doing, so I ask them to count to 20 and let me know when they get to 20. And we can do the same thing with our microcontroller, our little computer: if we need to delay and perform some computation at a later point in time, we can ask the computer to count to a number for us and let us know when it gets to it.
All right. Let's take a look at the memory real briefly. So, we're going to be talking a bit more about how the stack will work. But let's get a vague idea of what is inside this memory. So, I have this little photo collage that I made up, and hopefully it is not completely misleading. But we have, broadly speaking, three parts of our memory. So, as we said previously, we're going to load our program into memory. And that's what we see at the bottom: the brick wall at the bottom is our program that's been loaded. And any static variables in the program would be there. And then we're going to have a heap, potentially. This is where the program stores things that it may need later, but doesn't need right now. It can throw them on the heap and then go back and look for them later.
And finally, we have a stack. And there are really two models that might be useful for thinking about the stack. So, the first model is a bit... is more of an operational model. And this is where you are... you're thinking of the stack as sort of nested context for the program. So, we have... perhaps a pile of file folders on our desk. Now, we're departing a bit from our desk metaphor earlier. Because here the desk is not the registers. But using a slightly different metaphor.
So, I'll have... perhaps I'm working on a client. I have my... the file folder open. And I have some pages. And then my boss comes in and is holding the file folder for a client that's more critical. And we open that up and set it down on top of the pile of papers on my desk. And we look at that one. And then the boss' boss comes in with another client's file folder and we open that up and put it at the very top of the stack. And then when we finish, the boss' boss' client, we close that file and take it away. And resume work on my boss' client. And then when we finish that up, do that work, and my boss leaves and then I can resume doing what I was doing before I got interrupted.
And that's sort of the nested context view of the stack. The somewhat more physical view of the stack I like to think of as a stalactite growing from the roof of a cavern. And the reason I think of it this way is that on most computers, the stack begins at the very top of memory, the very highest address, and grows downwards. Grows down into memory. And so, it's a bit like a stalactite hanging from the roof of a cavern.
And yeah. So, it's a bit like a stalactite hanging from the roof of a cavern, where as we add additional context, our stalactite grows down. And then as we resume previous contexts, we shrink it back up.
And the most important difference for the memory in an embedded context is that it's significantly restricted. You have much less memory in most embedded devices. For instance, the ATtiny10 that you saw in the very first picture on the AVR slides has 1 kilobyte of program memory. There are similar devices that have only 512 bytes of program memory. You must write your program to fit in 512 bytes. That's a significant limitation when developing.
For completeness, let's look inside the registers and talk about what the types of registers are. So, the main class of registers are these general purpose registers. And this is where we store the state that we're currently working on, as I mentioned previously in the desk metaphor. And we can sort of think of the general purpose registers as a pile of Etch A Sketches on a table. Any time that we need to ask the processor to do work for us, we have to write down the numbers on the Etch A Sketches and give them to the processor. So, if we want to provide some numbers for the processor to work on, we need to go to our table and pick up an Etch A Sketch, and ideally find a blank one.
But if we don't find a blank one, we pick up one that's in use. We write down somewhere whatever number is on it, and then erase it. And then we put the value of our number on it. And we need to write it down because we probably were using that previously. And so, when we're done with whatever we're doing immediately, we want to restore that Etch A Sketch to its previous state. And so, we want to know what was on it before we erased it, so that we can put it back.
And this is a process that's referred to as clobbering the registers. So, when you're using a register that's already in use, you clobber it. And if it's important for you to maintain what was in it previously, then you have to save and restore that register. All right. We also have a few special purpose registers. And there are lots of these, but there are only two that we will talk about today.
And the first one is called the status register. This tells you what the processor has been up to recently. For instance, if you do some math and the result is zero, the status register will tell you that the last math you did resulted in zero. Another important special purpose register we'll talk about later today is the frame pointer. The frame pointer always points at the current function's stack frame. It points at where on the stack we can find the local variables for the current function.
And like I said, there are many other special purpose registers that we won't be talking about today. All right. So, that is the first difference. We took that opportunity to also expand our computer model a bit. But remember, the first significant difference when developing in embedded is that we have these very tight resource constraints. All right. The second difference is that in an embedded context, we're working in a no_std environment. What this means is that you don't have access to the standard library. You do have access to the Core library. The Rust Core library. And these are the parts of the Standard library that don't require memory allocation and don't require platform support. So, with the Core library, we don't have access to collections like a vector or a HashMap. And we don't have access to operating system things like a file system.
But we still do have a lot of the basic functionality. In addition to the no_std context, we're frequently working in a panic-abort model. Because unwinding the stack in the case of a panic is very expensive in terms of memory use and processor time. So, in general, when developing for an embedded context, we abort on panic rather than unwinding the stack. This leads to the third difference, which is limitations on debugging and observability. If we can't unwind a panic, then when a panic occurs, we have a lot fewer clues about what has caused the panic to happen.
But there are other debugging and observability limitations. So, I'm vaguely aware that there are these professional in-circuit debugger systems. There's a standard called JTAG that I'm aware exists. But I've never used these myself. My impression, which might be a naive impression, is that they're probably expensive, hard to set up, and probably Windows-only systems. So, I never really looked into them.
So, what that means is in general, when I'm developing for an embedded context, I'm using a much more naive debugging methodology. Now, I tend to like to write my software using lots of unit tests, making sure that I have a really good mental model of everything that's going on. But when I'm developing for embedded, I often start working on a project like this meme shows: my code doesn't work, and I have no idea why. And I change something about it, and then it works, and I have no idea why. It takes much longer when developing embedded for me to really understand what it is that's making it work and not work. One third option that I've recently started looking at for debugging in an embedded context is simulating. Because these embedded devices are so limited, it's actually quite easy to run a program on your desktop computer that simulates the entirety of the embedded device. There's a wonderful open source one for AVR called simavr.
And it allows you to run a simulation of your AVR program. And it can produce these trace files as output, which you can then load into GTKWave or other trace visualizers. And here we can see a trace from a program I was simulating a few weeks ago, running on an AVR device.
Something else that's interesting about simavr: we can use the trace files as inputs. The trace files can represent memory inside the processor. But they can also represent the state of pins on our chips. So, for instance, my ATtiny85 that I love has 8 pins. Two of them are power. One of them is a reset pin. So, we have 5 general purpose pins to use. And so, you can see the state of those five pins. And if one of them is an input, you could provide a trace file that has those inputs. All right. So, that's the debugging and observability. Somewhat related to that is, in an embedded context, your compiler is a cross compiler. Normally, you're running the Rust compiler on the same device that the program you're compiling runs on. But in an embedded context, you compile on a host machine, like your desktop computer, and then you send that to the embedded device and run it there. And there's a whole host of nuance to this.
But it leads to things being a little trickier. Okay. So, let's get back to my project. It's a simple real world button handling library. Real world example for a button handling library. The details are not relevant, I'm not going to get into it. But this is the button handling library that I'm writing an example for. And the thing to note here is that it's... it uses embedded HAL, a hardware abstraction layer for the embedded context. We can write code for an embedded context that doesn't need to know details about the... about what hardware it will run on. Okay.
So, we'll go back to Blinky. We're wiring up a microcontroller and connecting it to an LED. But because I'm writing this example for a button handling library, I'll add a button. And the idea is that I can write an example where, when I push the button, it will blink the light on and off. All right. So, I take my existing Blinky example, which I've run, and I know that it works. And I add in a little bit of support for the button part. And I send that to my microcontroller, and I try it out. And... nothing. It doesn't work. Okay. It's a very simple example. It's not much code. But still, there's a 95% chance that it's my fault. A 95% chance the bug is in the code you just wrote. But in fact, I'm using unsafe for this code, because I'm using static mutable variables. So, it's actually more like a 99% chance that the bug's in my code. So, what are some other things that I'm thinking about? Well, AVR is Tier 3. Tier 3 means that it's supported by the Rust compiler, but not fully supported: it's not built and tested automatically by the continuous integration server. And critically, as you can see from the docs here, it may not work. So, everything's a bit up in the air when you're running on Tier 3. I'm also, as I mentioned previously, using static mut. Static mutable variables. And that's always unsafe. I don't know about you, but one reason I like writing Rust is that I can almost always ignore unsafe code and not have to think about it.
So, I'm not entirely confident that I'm using my unsafe code correctly here. One other thing to keep in mind is that AVR interrupt service routines are explicitly experimental. And the code that I've written uses an interrupt service routine. This is experimental. There's a tracking issue in Rust that has effectively no progress made on it. And so, that means I have to compile with Rust nightly and add a feature flag, and all of those sorts of things add confounding factors to the debugging process.
Okay. I said that AVR interrupts are experimental. I said I'm using interrupts. But what is an interrupt? An interrupt is a lot like a function call. But instead of one piece of code calling another function, it's a bit like the world is what's calling your function. And, of course, as we saw in the model previously, the world here is the peripherals. So, we have a function call. We have a function on the left, A, and a function on the right, B. And A does some things and calls into B. And when B is done, it returns back to where in A it was running. And what does this look like in terms of what we were talking about with the stack? Well, there's this thing called a calling convention. And the calling convention says some of the working registers need to be saved by the function that is doing the calling.
And so, we're gonna do that first. We're gonna save working registers that we need to save. And then we jump over into the function. And then the first thing inside the function is that we may need to save the other working registers. These are the callee saved registers. Which registers are saved in step 1 or step 3 are determined by the calling convention. But the important thing to note here is that some are done in 1 and some are done in 3. Okay.
So, let's look at the same thing for an interrupt. We have function A on the left and an interrupt service routine on the right. And function A is just doing some local things. But the world says, oh, wait. Let's call in to our interrupt service routine. Perhaps the computer finished counting to 20. Or perhaps finished counting to 256. And so, now we've jumped into our interrupt service routine. Or perhaps we've received a byte on a network interface. But something in the world... something in a peripheral has determined that we need to service this interrupt. So, we go into the interrupt service routine. And then when it's done, we return back to wherever it was that we were interrupted.
All right. So, what is different about our calling convention for an interrupt service routine, as opposed to a regular function? Well, the first thing is that we're not jumping into a function in step 2. We're jumping into an interrupt service routine. And the second thing is, because the function that's getting interrupted doesn't know that it's going to be interrupted, it can't perform step 1 first. The function that's being interrupted doesn't know that it needs to save the working registers. So, we need to move step 1 down into step 2. And so, now the first thing we do inside our interrupt service routine needs to be to save those working registers that would otherwise be saved by the caller.
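To make that concrete: a typical AVR interrupt service routine prologue and epilogue, in the style of avr-gcc output, looks roughly like the fragment below. This is an illustrative sketch, not the exact code from the talk; on the ATtiny85, vector 2 happens to be the pin-change interrupt, and I/O address 0x3f is SREG, the status register.

```
__vector_2:              ; PCINT0 on the ATtiny85 (illustrative)
    push r0              ; save a working register on the stack
    in   r0, 0x3f        ; 0x3f is SREG, the status register
    push r0              ; save SREG too: the interrupted code had
    push r24             ; no chance to save anything itself
    ; ... interrupt body runs here ...
    pop  r24             ; epilogue mirrors the prologue exactly,
    pop  r0              ; restoring registers in reverse order
    out  0x3f, r0
    pop  r0
    reti                 ; return from interrupt
```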
Okay. So, I have some code, and it's not working, and I think it should be. And so, I'm making a bunch of changes to try to get it to work or not work. And one of the things that makes the bug appear and disappear is moving my interrupt service routine code into main. Well, this maybe is a clue, but it doesn't provide a lot of info, because we know the ISR support is experimental. And also, interrupts are just a bit hard to reason about.
Also, if I change an #[inline] annotation to #[inline(never)], that makes the bug disappear. Now, this is curious. I've heard of there being compiler bugs related to inlining. But that means it's not my bug. It's not a bug in code I wrote. That implies it's a bug in the compiler. And it's never a bug in the compiler. So, that's very curious. Another thing that can make the bug disappear is adding a map_err to unit before I unwrap. So, this is basically: I built a conveyor belt, and any errors coming down it, I just throw into the trash. But it's important to note that this error handler never actually gets called. This map_err never actually gets called. So, I'm adding code that doesn't get called, and that changes the observable behavior of the program.
All right. Let's dig into that just a bit more. So, here I have an example of my broken code. It's an interrupt. We call it PCINT0 because it's an interrupt on pin change. If our pin changes value, it will jump into our interrupt. And so, on line 3 we toggle our LED. And then on line 4, we unwrap. Well, why is it that toggle returns a Result? This goes back to the embedded-hal. Embedded-hal has a number of traits that different pieces of hardware can implement. And in this case, we're talking about an output pin trait, and it returns a Result because on some platforms, attempting to set an output pin can fail. But not on AVR. On AVR, the error type used for this trait is Infallible; it's void. And so, this Result can never be the error case. We know statically the Result is never the error case. So, the unwrap... or the error case of the unwrap, never actually gets called.
All right. So, like I said earlier, we can make this work by mapping our error to unit. Now, it's not entirely clear why throwing away the original error and replacing it with an empty error would make this broken code work, particularly because I know statically that the error case never actually happens. We can also replace that with a panic. If we explicitly panic, rather than panicking when we unwrap an error case, then it works.
So, I don't, myself, see a semantic difference here between this panic version and the original broken version. So, I'm gonna say: I found a bug. Found a bug in the compiler. We need to minimize our reproduction of the bug. So, we're gonna remove all of our external references. We're gonna remove, of course, the crate that I was writing example code for. We're gonna remove any other crates that I make use of. I'm going to remove references to the core library where possible. Because I'm trying to eliminate anything non-essential.
And finally, I'm gonna remove the memory shenanigans around my unsafe static mutable, because I want to eliminate anything that could possibly distract from the bug. And I do this by copying my working version of the code to a file called a-working, and the broken version to a file called a-broken. And I make the same change to both of them, compile them, and send them, and make sure that the working one is still working and the broken one is still broken. And I make these incremental changes repeatedly until I have what I think is a minimal reproduction. And the minimal difference here looks a lot like my minimal difference before. But I've removed all of the core library code and the crates that were in use. And just by changing a reference to an error to a reference to unit, I can make the bug appear and disappear.
And so, I've confirmed a bug. I have a minimal repro. So, I file a Rust issue. And I sit for a little bit. And nobody comments on it. I guess people have other things to do. So, I say I'm gonna go ahead and dig into this. Now, I'm vaguely aware that Rust uses LLVM. And I've done some messing around with LLVM in the past. So, I think, let's take a look at the LLVM IR that's generated for this code.
And so, I use this incantation, RUSTFLAGS="--emit=llvm-ir", and the Rust compiler dutifully complies and emits LLVM IR. It's not important to understand this in detail. But we can see the broken code's difference. And the main difference is that we have this one alloca. We have an alloca, and that makes the code fail. What is this alloca that we see? We're reserving space on the stack for a local variable.
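For readers unfamiliar with the instruction, here is what alloca looks like in a minimal hand-written LLVM IR function. This is an illustrative fragment, not the IR from the talk:

```
define void @example() {
entry:
  ; alloca reserves space in the current function's stack frame
  ; for a local variable; here, a single byte.
  %slot = alloca i8, align 1
  store i8 42, ptr %slot
  ret void
}
```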
So, we said previously, our stack has a stack frame for each function. And if that function has a local variable, we need to reserve space in the stack frame for that local variable. And that's what an LLVM alloca instruction does. But why does reserving space break my code? Break my example? So, we need to keep digging. So, let's take a look at some assembler. Assembler is a textual representation of the machine code we were talking about earlier. It's as close as you can get to really understanding exactly what the machine sees, without reading the binary itself. And the Rust compiler has a flag to emit assembler as well; it looks similar to the LLVM flag. We ask it to emit assembler, the Rust compiler dutifully complies, and we get this code, where the broken version is significantly different from the working one. There are a lot more significant differences this time.
And the differences come in three main sections. So, there's this first section where we push some things onto the stack and do some in and out. We'll get into exactly what this is doing a little bit later. And then a second and third section with pops and out. Let's walk through this. But first we need a little bit more information on how to read this AVR assembler. So, the push instruction, like I alluded to a moment ago, pushes a register value onto the stack. We take the value in the register and put it on the stack to save it for later. And when we want to get it back, we pop it. That takes it off of the stack and puts it back in our register. We have an in operation which will take a value from one of the special registers and put it into a general purpose register. And we have an out operation that will take a value from a general register and put it into a special register. So, push and pop move register values to and from the stack. And in and out move register values to and from special registers, at least for the purposes of this talk. They do other things too.
We can also clear a register. And we can disable interrupts with the cli instruction. So, disabling interrupts tells the machine not to interrupt us, so that we can run some code without being interrupted. We also call that clearing the interrupt flag. And it's worth noting that on AVR that interrupt flag is in the status register we were talking about earlier. All right. One other important concept before we dig in here. We have a prologue and an epilogue for every function. These bookend the body of the function, and they provide that calling convention that I described earlier.
And it's important that these fragments mirror each other, because they tend to use the stack to implement the saving and restoring of registers. They need to mirror each other. Well, what does it mean exactly to mirror each other? Well, here we see the prologue and the epilogue of our working code. All right. So, let's read this from the outside in. We're gonna start with a push and pop on register zero. So, starting from the top of the function on line 2, we push register zero onto our stack.
And then we push register 1 onto the stack. And then we're going to perform this sequence. So, on line 4, the constant 63 refers to a special register, the status register. So, we're going to read in the status register, and we're gonna push it onto the stack. So, now our stack has register zero's prior value, register 1's prior value, and the status register's prior value. And then we push register 24 onto the stack. All right. The reason that 24 is down below and r0, r1, and the status register are up above is because register zero and 1 are the caller-saved registers. And the only reason we are saving them here is because we're in an interrupt. If we were in a regular function, this prologue would start at line 7.
Okay. And then we do lines 8 through 10, which are the body of the function. And then we perform the epilogue. We pop 24 off of the stack. We pop the status value off of the stack, and we use an out instruction to put it back. Now our status special register has its prior value. And then we pop register 1 and register zero, such that at the end of this interrupt we have restored all of our registers, special and general, and we can return. All right.
What's different about the broken code? So, in the broken code, we have the same sequence at the start for the prologue. We push register zero, 1, the status register, and 24. And then we have a few more callee-saved registers that we push onto the stack: 25, 28, 29. Because our broken version of the interrupt service routine clobbers three additional registers. Okay. And then towards the end, we pop registers 29, 28, and 25 off the stack.
And finally, we have the same epilogue as our working code. This epilogue pops 24, it pops our status value, and it pops registers 1 and zero. But note that sequence is interrupted by this other sequence. So, that's a bit mysterious. And we note that this is the sequence to adjust the frame pointer. So, before we walk through that, let's walk through the corresponding sequence from the prologue.
So, first we're going to read in from 61 and 62, and we're gonna store that in registers 28 and 29. We said previously we clobber 28 and 29; here is where we do that. 61 and 62 are the addresses of the special registers for the frame pointer. So, this is what we're supposed to do here: we read the frame pointer into registers 28 and 29. And then we subtract 1 from it on line 31. We subtract 1 from the frame pointer, and then we go ahead and send that updated version of the frame pointer back out. So, subtracting 1 is what's allocating space on the stack. And then on 16 and 18 we put our updated version of the frame pointer into the frame pointer special register.
All right. We note that little sequence is interrupted by this section: 14, 15, and 17. And this is a miniature version of saving and restoring the status register. So, on 14 we save the status register into register zero. On 15, we clear interrupts so that we can perform our update of the frame pointer without being interrupted. And then on 17, we restore the status register, restoring the value of the interrupt flag.
Something curious to note: 17 is before 18, because you get one extra instruction for free when you enable interrupts. All right. And then what is the epilogue supposed to do to restore the frame pointer? Well, we'll go ahead and add 1 to our updated frame pointer in registers 28 and 29, and that's going to restore that value to what it was prior to entering our interrupt service routine.
And then we'll output that value into our frame pointer. And at the end, our frame pointer will have its original value. And you can see, we have the same little status register save and interrupt clearing.
All right. That's what it's supposed to do. But we note that we break symmetry here. In the prologue, we push 28 and 29 and then we read in our frame pointer. In the epilogue, we pop 28 and 29 and then we send out to our frame pointer. So, we have a push and an in, followed by a pop and an out. But if this were symmetric, if this mirrored correctly, it should be push, in, out, pop. The epilogue is in the wrong order. Let's see what that actually means. So, what is it actually doing? Well, as we saw previously, first we push the 28 and 29 registers onto the stack. And we read in the frame pointer on lines 11 and 12, subtract 1 from it, and send that back out to the frame pointer.
So, our prologue is fine. But then in our epilogue, well, first on 22 and 23 we pop registers 28 and 29. And then on line 28 we add 1 to that value. And then on 31 and 32, we put that value into our frame pointer register. And we note that's not the prior value of the frame pointer register. That value is a completely unrelated value, based on the previous values of some unrelated registers. So, we have now confirmed we have a bug in LLVM. So, I file an issue. And here is a screenshot of the issue in LLVM's bug repository.
And I sit on it for a bit. And I wonder, who's gonna fix it? Well, in the words of Hermes Conrad, one of my favorite characters from Futurama: if you want it, do it.
I'm going to breeze through this; don't get overwhelmed. This is C++ code, but we're mostly concerned with the comments. So, we see that we have special epilogue code for registers 1, zero, and the status register. That sounds familiar. And then we see this early exit if there's no need to restore the frame pointer. And I recall: if we don't need to restore a frame pointer, the code works. If we do need to restore the frame pointer, the code doesn't work. So, this triggers something in my mind.
And then we see that we're gonna skip the callee-save pop instructions and then insert our frame pointer restore code. All right. Let's match this up quickly to what we saw in our assembler. We see where we emit the special epilogue code here, and we see that's the same as this special epilogue, restoring the status register and registers 1 and zero. And we see we restore the frame pointer by doing this arithmetic, and that matches up to the sequence we walked through a few minutes ago.
So, now that gets to this bit in the middle, this question of where do we insert the frame pointer restoration. Well, we're gonna do a loop. Here MBBI starts at the end of the function, and as long as we haven't reached the beginning of the function, we step backwards through the function. And we check: if the current instruction is a pop, then we continue. If it's not a pop, then we break out of our loop. Well, what does that look like in our broken code? Here's the broken code before we insert our frame pointer restoration. We start on 29. That's a pop, so we keep going. We see 28; that's still a pop, keep going. We see 27. 27 is not a pop. So, we insert our frame pointer restoration code there.
And that's what leads to our frame pointer being restored later than it should be. And we see lines 22 and 23 really need to be after the frame pointer restoration. So, I can go ahead and make a fix now that I've figured it out. It's actually quite straightforward once I worked through all of that. I pull out a function to restore the status register from this special epilogue code. And in the case of an early exit, we restore the status register then. Otherwise, we restore the status register at the very end.
And I contribute that to LLVM. First I write the fix. But, oh, I probably need to write a test to make sure that it works. And before that, I need to compile LLVM, which is itself the subject of perhaps a full talk. And then I submit the patch to LLVM. Here is a screenshot of the Phabricator interface that LLVM uses. And fortunately Dylan McKay had the time to review my patch and commit it. I appreciate that. Thanks again, Dylan.
So, it's fixed! The bug has been fixed in LLVM, and now I want to contribute it to Rust. Rust keeps a fork of LLVM, so we cherry-pick the fix into that fork and then need to update the Rust compiler. And after a couple of PRs get landed, finally the Rust bug has been fixed.
Hooray! All right. So, what are my next steps? There are several other outstanding AVR issues, including, as you can see, several that relate to AVR interrupts. And now that I've worked through stepping through the assembler that's generated, and working through the code that generates that assembler, I feel a little bit of a responsibility to take a look at these bugs. I haven't had time to yet, but I hope to soon. Well, that was a whirlwind. But thank you very much for listening, and for your patience with my technical difficulties at the beginning. Hopefully we have a couple minutes to take a few questions, if anyone would like to hear anything more. Thank you.
Inaki:
Andrew. Thank you for that incredible talk which can only be called epic.
Andrew:
Thanks.
Inaki:
That was an epic. Wow. So, AVR may be tier 3, but your patience, man, is like God tier. Not only all you've done, but also handling all the tech issues and talking... just, you know, I can't believe... so, thank you so much for your patience, actually. For us, this is just gravy. So, I do have a few questions.
First, AVR is quite a new target, right?
Andrew:
That's right.
Inaki:
How are you finding it, and have you tried things like STM32 targets?
Andrew:
I have. I've messed around a little bit with the other embedded targets. I haven't done the STM32. For anyone in the audience not familiar, that's the target that the Rust embedded intro book talks through, using a board called the discovery board. On my list of too many things to do, I have the goal of picking up one of those discovery boards and working through that, but I haven't had the opportunity yet. Yeah, I have done a little bit of ARM development; ARM is another embedded platform. I've done a little bit with Rust.
But honestly, I've done very little embedded Rust development at this point. Most of my embedded experience is with C. I've never liked programming in C; it's always been frustrating. I'm very grateful that the Rust embedded community, and the Rust compiler contributors, are working so hard to make Rust a viable option for embedded. Because there's a lot of potential there.
Inaki:
Absolutely. Do you know of a good reference for AVR assembly?
Andrew:
The AVR documentation generally is pretty good. It's often hard to find the right PDFs directly, but once you find them, they tend to be pretty solid. So, AVR is actually a very limited platform. It's much more limited than, for instance, ARM. And so the documentation... the assembler reference is quite complete. So, if you search for the AVR assembler reference guide... and I could drop a link to it in the slides when I release those... that guide is quite complete.
The other resource that I found to be incredibly helpful is a forum called AVR Freaks. This is a forum where a bunch of people who love programming for AVR answer all kinds of questions. So, almost any question that I have has already been answered on that forum, in one post or another. And so, you know, coming up with search results on AVR Freaks is fantastic. And then the third resource I would suggest is the AVR libc documentation, which also contains a lot of nuggets that are very useful for illuminating how things actually work on the AVR platform.
Inaki:
Cool. This is an interesting question. Do you think that making types of the standard library less dependent on the global allocator would make your job easier in any way?
Andrew:
Certainly, yeah. That's a great point. I have not yet experimented with using the allocator. So, previously in my talk I was talking about how you don't have access to the standard library, but you do have access to the core library. And then there's a middle ground, liballoc, that can give you access to collections like Vec and HashMap and so forth. And you can theoretically compile that for an embedded context, but I don't work with that. I work on the ATtinys, which are extremely limited, and it's almost always worth doing analysis ahead of time to make sure that you don't run out of memory. And that analysis is significantly harder to do if you're using the heap.
So, in my embedded programs I almost never think about reaching for the heap, because it seems like it's going to create a lot more problems than it would solve. I think on other embedded devices it probably is much more relevant. Particularly, for instance, ARM, obviously. But on other, more capable platforms, I think using an allocator makes a lot of sense. Also, not related to AVR, but I feel like the reliance on a global allocator in the standard library affects other contexts too. The other context I do a lot of my development in is high performance web development, and that's a place where being able to use, for instance, a slab allocator on a per-object basis would be incredibly valuable.
But, again, you know, I'm getting into the weeds on something not related to this talk. I do see that the user experience benefit of having the standard library based on a global allocator probably outweighs the technical benefits for these niche use cases.
Inaki:
Cool, cool. There are a few more questions, but we're really running out of time. So, maybe you could answer them in chat or later on. Once again, thank you so much, Andrew, for that epic talk.
Andrew:
Yeah, glad to. Thanks, everyone. Have a good day.