Booting and the ESP8266 Flash Format

By: Gil Pinheiro

So far with the ESP8266 I have found that both the official and unofficial documentation focus on the ‘what’ rather than the ‘why’. This has made reading the documentation tricky for me - I find myself continually wandering between wikis, datasheets and source code until I gather enough knowledge to figure out the ‘why’ part before I can really reason about the system.

A good example of this was figuring out the 8266’s boot up sequence. The documentation does a pretty good job of explaining the memory layout but leaves out enough detail about how the ESP8266 is bootstrapped to leave me confused.

Knowing that we wanted to implement over-the-air updates and didn’t want to rely on Espressif’s cloud services, I needed to gain a clear understanding of how all the pieces fit together at a low level. I started keeping notes about structures and logic being applied, and eventually succeeded in getting the boot process to work exactly the way that we wanted for our project.

This week I’ve decided to share those notes in hopes that I can save you time on your projects. This is as good a time as any to mention that everything I’m going to describe is based on my understanding, and there is a real possibility I’ve gotten something horribly wrong. If you have additional questions or corrections, please get in touch: I’m gil@pushrate.com.

ESP8266: Groundwork

The ESP has a quirky architecture. It might be oversimplifying things, but you can think about the ESP as having a single address space that maps all your RAM, ROM and devices. Some of this space is well-known: we understand exactly what it is doing and what happens when we read and write to the addresses. Unfortunately, there are proprietary bits that have big question marks beside them since they have not been officially documented. For now, there are only a few regions that you absolutely need to know, which I’ve outlined in the table below. If you are interested, you can find a pretty good map of the whole space on the ESP wiki.

Region        Address range             Size
DRAM          0x3ffe8000 - 0x3fffbfff   80k
IRAM1         0x40100000 - 0x40107fff   32k
Mapped Flash  0x40200000 - 0x402fffff   1024k
The important addresses

The ESP is a Harvard architecture device, which means it separates instruction and data memory (iram, dram). I haven’t found any source to confirm why they made that design choice, but typically it is done to allow the processor to fetch data and instructions at the same time without bus contention. I’m not sure that is a real concern on a microcontroller like this, but hey, I’ll take the additional instructions per cycle the optimisation brings. The compiler and the linker will sort out where each bit of your data and code will go, so in practice this isn’t something you need to worry about. Being aware of the difference will help save time when you run out of space - the compile process will error out and tell you that it can’t fit your program into a particular segment, which will correspond to dram, iram, or irom.
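As a quick way to work with the table above, here is a small sketch that classifies an address by region. The `REGIONS` list and the `region_of` helper are names made up for this illustration; the ranges are copied from the table.

```python
# Memory regions from the table above (start and end are inclusive).
REGIONS = [
    ("dram", 0x3FFE8000, 0x3FFFBFFF),
    ("iram1", 0x40100000, 0x40107FFF),
    ("mapped flash", 0x40200000, 0x402FFFFF),
]


def region_of(address):
    """Return the name of the region containing address, or None."""
    for name, start, end in REGIONS:
        if start <= address <= end:
            return name
    return None


def region_size_k(name):
    """Return the size of a named region in kilobytes."""
    for region, start, end in REGIONS:
        if region == name:
            return (end - start + 1) // 1024
    raise KeyError(name)


if __name__ == "__main__":
    for name, _, _ in REGIONS:
        print(name, region_size_k(name), "k")
```

This is handy later on when reading the load addresses that esptool prints for each segment.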

From the table, you can see that the mapped flash section is only 1 megabyte. Most ESP boards have multiple megabytes of flash… so what gives? Well, I can’t speak to why they’ve only mapped one megabyte (maybe shades of “640K ought to be enough for anybody”?), but in reality they’ve done us a great favor by mapping flash to the processor's memory space. If they hadn’t, all reads would have to be explicitly routed through the SPI interface (the bus that connects the processor and the flash) and you’d have to load your code or data into system memory before using it.

Those SPI calls are still being made whenever you access this memory region, but you don’t need to do anything. This gives us the ability to transparently read and execute code from the flash in exactly the same way that we read from dram and iram. I don’t know enough about the actual hardware implementation to explain how this works electronically, but from a software perspective all of the details of that communication are taken care of for you. Executing code or accessing data from flash is slower than from ram, but both happen so fast that it isn’t worth losing sleep over.

The 1MiB region is mapped to the first megabyte on the flash chip by default. You can’t alter the content of the flash by writing to the corresponding memory location. You can alter the flash, but you’ll need to use the SPI interface to do it. Writing flash is tricky: because of technology limitations you can’t simply update a single byte in place. Flash has to be erased in 4k sectors before it can be rewritten. To do a partial update you have to issue a read into a ram buffer, modify the buffer, erase the sector and write the buffer back out to flash (again via SPI commands). Flash writes can also fail, so you need to be prepared for that eventuality.
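The partial-update cycle can be sketched against an in-memory stand-in for the flash chip. `FakeFlash` and `update_bytes` are hypothetical names, not SDK calls. The model assumes typical NOR flash behaviour: erasing resets a whole 4k sector to 0xFF, and programming can only clear bits.

```python
SECTOR_SIZE = 4096  # 4k sector


class FakeFlash:
    """In-memory stand-in for a flash chip (illustration only)."""

    def __init__(self, size):
        self.data = bytearray(b"\xff" * size)

    def read(self, offset, length):
        return bytes(self.data[offset:offset + length])

    def erase_sector(self, sector):
        start = sector * SECTOR_SIZE
        self.data[start:start + SECTOR_SIZE] = b"\xff" * SECTOR_SIZE

    def write(self, offset, payload):
        # Assumed NOR behaviour: programming can only flip bits from 1 to 0.
        for i, value in enumerate(payload):
            self.data[offset + i] &= value


def update_bytes(flash, offset, payload):
    """Read-modify-write one sector; returns True if verification passes."""
    sector = offset // SECTOR_SIZE
    start = sector * SECTOR_SIZE
    if offset + len(payload) > start + SECTOR_SIZE:
        raise ValueError("payload must fit inside a single sector")
    buffer = bytearray(flash.read(start, SECTOR_SIZE))   # read into ram
    rel = offset - start
    buffer[rel:rel + len(payload)] = payload             # modify the buffer
    flash.erase_sector(sector)                           # erase the sector
    flash.write(start, bytes(buffer))                    # write it back
    return flash.read(start, SECTOR_SIZE) == bytes(buffer)  # verify
```

On real hardware each of those steps is an SPI operation that can fail, so the final verification is worth keeping.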

The Boot Process

First: I’m going to skip things that I don’t fully know about. Obviously there are tasks like peripheral initialization that occur during boot, but I can’t confirm which devices require setup, nor at what stage each initialization happens. Most of that stuff is hidden in proprietary code that we don’t have the source to. This walkthrough focuses on the aspects that are related to loading and running user code.

Power-on happens.

Execution starts in code on the device ROM. This is good. We can’t modify this code, so it is pretty much impossible to completely brick the ESP by loading bad software. Barring a mechanical defect, you'll always be able to fix your device.

The first decision point for the boot loader is what kind of boot to perform. At this level this is controlled by the state of a few pins.

MTDO  GPIO0  GPIO2  Mode
L     L      H      Boot from UART
L     H      H      Boot from Flash
PIN -> ESP boot modes. At least those that we care about (src: ESP8266 community wiki)

During boot it’ll emit some debug information to UART0, at the odd baud rate of 74880. If that seems like an odd number, it is. The datasheet explains that this rate is based on the external oscillator used: If they had used a 40MHz clock, the baud rate would be 115200 (normal standard rate), but with a clock @ 26MHz you’ll get the odd 74880 rate. So it is down to a component choice made by the module manufacturers.
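The arithmetic behind that odd rate is a simple ratio. A minimal sketch, assuming the boot-time baud rate scales linearly with the crystal frequency as the paragraph above describes (`boot_baud` is a name invented for the example):

```python
NOMINAL_CRYSTAL_HZ = 40_000_000  # crystal that would give the standard rate
NOMINAL_BAUD = 115_200


def boot_baud(crystal_hz):
    """Scale the nominal baud rate by the actual crystal frequency."""
    return NOMINAL_BAUD * crystal_hz // NOMINAL_CRYSTAL_HZ


if __name__ == "__main__":
    print(boot_baud(26_000_000))  # 74880
```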

Booting from UART is how we are able to load new firmware onto the device. The ESP is booted into this mode and the flash tool (usually esptool.py) uploads a short stub program and starts it running - the stub’s function is to read your program data from the uart, write it to the flash via SPI, and to verify the result. If you are interested in how that works, I can recommend reading through Espressif’s esptool repository: everything is open source, and you can see how each piece fits together.

Booting from Flash

So, let’s start with an overview of how things are laid out on the flash chip:

Let’s talk about the last bit first: the system parameter area. Espressif has reserved the last few 4k sectors of flash (not of the mapped region, but of the entire flash chip) as a space to store system configuration information like antenna tuning. I’m not sure if this region is used to conduct some initialization during firmware boot, or if it is exclusively used by user-space code in the Espressif proprietary libraries. Either way, you’ll want to make sure you’ve initialized it with the esp_init_data_default.bin (for a 4MiB flash, that means that these settings start at 0x3fc000, which is 16k below the end of the chip). Eventually I’d like to cobble together an article on that since the documentation I’ve found has been very scattered.
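The 0x3fc000 figure for a 4MiB chip sits 0x4000 bytes below the end of flash (0x400000). A small sketch of that arithmetic follows. Applying the same offset to other flash sizes is an assumption on my part, so check it against the SDK documentation for your chip. `init_data_address` is a hypothetical helper.

```python
MIB = 1024 * 1024
INIT_DATA_OFFSET_FROM_END = 0x4000  # inferred from 0x400000 - 0x3fc000


def init_data_address(flash_size_bytes):
    """Flash offset where esp_init_data_default.bin would be written."""
    return flash_size_bytes - INIT_DATA_OFFSET_FROM_END


if __name__ == "__main__":
    for size_mib in (1, 2, 4):
        print(size_mib, "MiB ->", hex(init_data_address(size_mib * MIB)))
```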

What happens if you’re using 1MiB flash? As you’ll see later on, we reserve the last 8k of all 1MiB partitions for Espressif configuration data. This means that when you are using larger chips there will be a few 4k sectors that go unused. It is a trade of efficiency for simplicity - each partition has exactly the same format whether it is the first one or the last one.

Let’s unpack some of Espressif’s jargon: OTA, short for Over-The-Air, is used to describe an updatable architecture. Espressif offers their own cloud service where you can upload your code to their website, and if you’ve compiled everything correctly, installed their bootloader and provided the right credentials you can issue an AT command to perform an update without writing additional code. For our purposes, I’m going to ignore their service - for us "OTA" format just indicates that you’re using a custom bootloader, and that your device setup can handle multiple bootable firmware programs.

Similarly, a non-OTA build is just a single program, and you rely on the rom bootloader to start your firmware. As we’ll see, there is a lot of commonality between this and the format used for the OTA builds.

To follow along, I recommend cloning our esp8266_template project on GitHub. Build it (“make”). Esptool will generate two firmware files, 0x00000.bin and 0x10000.bin. The file names correspond to the flash memory location each file’s data will reside in.

The built-in bootloader’s purpose is to get the ESP ready to run your code. Basically, it has to load up the volatile memory (iram, dram) with your program and turn over control to you. How does it know what to do? Let’s take a look at how the contents of the flash are laid out:

Up front there are a few parameters that the built-in boot code needs to read in order to configure the on-board flash interface properly. The bootloader extracts the flash size that is packed into the flash_size_frequency parameter to determine where some of the important data regions will be located (like that system parameter area, which is located at the end of flash memory). The meat and potatoes of the firmware format is the list of ‘segments’. Each segment is a description of how to initialize a range of memory. There will be a segment for each of the ram areas, containing its start location (Destination Address) and how many bytes we need to copy (Segment Length).

As well, you’ve got the entry point address - this is the address of the function that will be invoked once everything is in place. This is usually a function called call_user_start that lives in the libraries provided in the Espressif sdk, which will eventually invoke your own user_init function.
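Here is a minimal sketch of walking that header and segment list in Python. The exact byte layout is an assumption based on my reading of esptool's source rather than anything official: an 8-byte little-endian header (magic 0xE9, segment count, flash mode, flash_size_frequency with the size in the high nibble, then the 32-bit entry point), followed by each segment as a 32-bit destination address, a 32-bit length and the data. `parse_image` and `build_image` are names invented for the example.

```python
import struct

IMAGE_MAGIC = 0xE9   # assumption: magic byte esptool writes for this format
HEADER_SIZE = 8
SEGMENT_HEADER_SIZE = 8


def parse_image(blob):
    """Return the entry point, flash parameters and segment list of an image."""
    magic, count, flash_mode, size_freq, entry = struct.unpack_from(
        "<BBBBI", blob, 0)
    if magic != IMAGE_MAGIC:
        raise ValueError("not a firmware image")
    offset = HEADER_SIZE
    segments = []
    for _ in range(count):
        load, length = struct.unpack_from("<II", blob, offset)
        segments.append({"load": load, "len": length, "file_offs": offset})
        offset += SEGMENT_HEADER_SIZE + length  # skip descriptor and data
    return {
        "entry": entry,
        "flash_mode": flash_mode,
        "flash_size": size_freq >> 4,    # high nibble (assumed)
        "flash_freq": size_freq & 0x0F,  # low nibble (assumed)
        "segments": segments,
    }


def build_image(entry, segments, flash_mode=0, size_freq=0x40):
    """Build a synthetic image (no checksum) for experimenting."""
    blob = struct.pack("<BBBBI", IMAGE_MAGIC, len(segments),
                       flash_mode, size_freq, entry)
    for load, data in segments:
        blob += struct.pack("<II", load, len(data)) + data
    return blob
```

Feeding it a synthetic image with the same segment lengths as the esptool listing in this article reproduces the same file offsets, which is a decent sanity check on the layout.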

So when is it decided which parts of your program will go where? That is all handled during the compile and linking process. Eventually I’ll do a post about the linker file (usually called something like eagle.app.v6.ld) that is used by the compiler/linker when generating the final executable file. It defines the memory regions available and the types of content that are permissible in each, so that the linker can place them, replacing references to function and symbol names with the addresses they will live at when the code is running on real hardware.

Okay, so let’s take a look at the segments in our test program. You can take a peek at the segments that your program defines using esptool:

$ esptool.py image_info 0x00000.bin
esptool.py v2.0.1
Image version: 1
Entry point: 40100004
3 segments
Segment 1: len 0x0083c load 0x3ffe8000 file_offs 0x00000008
Segment 2: len 0x01d84 load 0x3ffe8840 file_offs 0x0000084c
Segment 3: len 0x079f4 load 0x40100000 file_offs 0x000025d8
Checksum: f3 (valid)
      
Using esptool to peek inside your firmware image file

Looking at the listing, you can compare it to the ESP’s memory map: 0x3ffe8000 and 0x3ffe8840 are dram (actually corresponding to the .data and .rodata segments in our compiled application) and 0x40100000 is iram (containing functions, with the majority of them internal to the sdk).

So what is missing? What was the second file, the 0x10000.bin?

The ESP is capable of running code in the mapped flash region. If you’ve read any ESP8266 code, you’ve come across ‘ICACHE_FLASH_ATTR’, which is a macro that basically tells the compiler that the function indicated can be placed in flash memory. The next block - irom0 - is where all these functions live. When it was compiled, the linker was told that irom0 would always begin at 0x40210000, which is 0x10000 in the flash’s address space.
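The relationship between the file name (0x10000.bin), the flash offset and the address the linker uses is just an addition on top of the mapped flash base. A small sketch, with helper names invented for the example:

```python
FLASH_MAP_BASE = 0x40200000  # start of the mapped flash region
FLASH_MAP_SIZE = 0x100000    # 1MiB window


def flash_offset_to_address(offset):
    """Address at which a byte inside the mapped 1MiB window is visible."""
    if not 0 <= offset < FLASH_MAP_SIZE:
        raise ValueError("offset is outside the mapped window")
    return FLASH_MAP_BASE + offset


def address_to_flash_offset(address):
    """Inverse mapping: offset inside the mapped window for an address."""
    offset = address - FLASH_MAP_BASE
    if not 0 <= offset < FLASH_MAP_SIZE:
        raise ValueError("address is outside the mapped flash region")
    return offset


if __name__ == "__main__":
    print(hex(flash_offset_to_address(0x10000)))  # 0x40210000
```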

On a side note, you may have found references to a 0x40000.bin instead of 0x10000.bin. Older versions of the sdk had slightly different linker files, and whether done intentionally or accidentally, they placed irom later in flash, which meant that there was significantly less space (192k less) for irom data. If your esptool.py generates 0x40000.bin, you are likely using an outdated SDK.

My best guess is that the 0x40000 was a mistake. Pushing irom later gains more initialization space for the loaded segments (ram), but 0x40000 is 256k, which is larger than both the iram and dram combined (160k). 0x10000 is only 64k, which technically means that with that configuration you would be unable to completely fill iram and dram with initial data, but for most applications that is unnecessary. 0x10000 strikes a practical balance.

rBoot

Okay, so that is pretty simple, right?

Say you want to support over-the-air updates. You can’t really just overwrite your program using SPI commands - well, you can, but if you lose power or get something wrong you’ll end up in a bad state: the device will be bricked until you can connect to it physically and reload your code. To do an upgrade safely you’ll want to have two ‘programs’, one that you know is good, usually the currently running version, and a second program area you can overwrite with the code you've fetched from the network. That way, even if something goes wrong while you are loading the new program, you’ll always have something to fall back on.

We need to be able to manage multiple executable programs - so the tool we need is a bootloader. A bootloader is itself a program, and its goals are to do the same things as the stock bootloader that we dealt with in the previous section: put everything into the right places in memory and start execution - it just adds enough flexibility to deal with multiple programs.

There are a few ways that this can all be arranged. Most of Espressif’s documentation talks about putting multiple ‘programs’ within the 1M mapped memory space. This forces you to compile your program multiple times, once for each mapped slot. Why? Remember that the irom segment is accessed directly via its 0x40210000+ addresses - so whenever one function calls another function it’ll need to know the absolute memory address that function will be present at. If you squeeze multiple programs into the same mapped 1MiB you'll have to custom compile the program for each slot so that the address ranges don't conflict. So for each update, you'll need to keep track of which slot is currently unused and use a custom linker file when compiling it.

The easiest way we’ve found to avoid this is to use a great open-source bootloader called rBoot. We rely on rBoot’s ability to select which 1M block of flash is mapped to the 0x40200000 memory region to avoid custom linker files. We’ve ended up using the first two 1M ‘windows’ to store bootable partitions, with one acting as the ‘active’ program (the currently booted slot) and the other acting as the OTA target which we overwrite with our update. Using the 1MiB method, we avoid having to worry about custom compiles for each slot since the code will always be exposed at a consistent address space location regardless of whether it is running from the first or second slot. rBoot manages the process of mapping the right 1MiB, and putting the right program’s segments into memory.

All we need to do is flash rBoot onto the first 4k of memory. It isn’t magic - rBoot is just another program. It needs to be loaded by the internal rom bootloader, so it is itself a non-OTA image and is loaded according to the rules we went over in the previous section. It is a little simpler and smaller than most program images - it loads only two segments (iram and dram), which it will later overwrite with our code, and it has nothing running or accessed from the mapped flash region (no irom0/0x10000.bin).

$ esptool.py image_info rboot.bin
esptool.py v2.0.1
Image version: 1
Entry point: 40100790
2 segments
Segment 1: len 0x007a8 load 0x40100000 file_offs 0x00000008
Segment 2: len 0x002b4 load 0x3ffe8000 file_offs 0x000007b8
Checksum: 9f (valid)
      
Inside the rBoot image file

Also, the rBoot process reserves the second sector of flash (flash address 0x1000-0x1fff), where it stores the rBoot configuration that defines which partition to boot, and how memory is laid out. Check out rBoot’s documentation (it is very good) for details on the rest of the options available.

So knowing that rboot has its own use for the first two sectors, what does the entire flash layout look like?

Because we are using the 1M per program model, we ‘reserve’ the first 8k of each subsequent 1M image so that all the program images start at the same relative address. Yes, this wastes some space, but the cost is minimal, and it reduces code complexity. You could potentially use that space to store user data, but for us, that is not worth the complexity of treating partitions differently depending on order.
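The resulting layout can be written down as a bit of arithmetic. This sketch assumes the arrangement described in this article (8k reserved at the head of every 1MiB window and 8k at its tail); `slot_layout` and its field names are invented for the example and are not part of rBoot.

```python
MIB = 0x100000
SECTOR = 0x1000
RESERVED_HEAD = 2 * SECTOR  # rBoot + its config in window 0, unused elsewhere
RESERVED_TAIL = 2 * SECTOR  # last 8k of every window, kept for Espressif data


def slot_layout(slot):
    """Flash offsets for the program image stored in a given 1MiB window."""
    base = slot * MIB
    return {
        "window_base": base,
        "image_start": base + RESERVED_HEAD,
        "image_limit": base + MIB - RESERVED_TAIL,  # exclusive upper bound
    }


if __name__ == "__main__":
    for slot in (0, 1):
        layout = slot_layout(slot)
        print(slot, hex(layout["image_start"]), hex(layout["image_limit"]))
```

Because every window has the same shape, the image in slot 1 sits at the same relative offset (0x2000) as the image in slot 0, which is what lets a single build run from either slot.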

Starting at 0x2000 we have our program image. rBoot follows Espressif’s format for OTA images. There are a number of oddities, and things worth highlighting:

  • The header is the same structure as the non-OTA one, but its use is quite different. Hard-coded constants are provided in what had been the ‘segment count’ (4) and ‘destination address’ (0) fields, and they seem to just serve to tell the bootloader to leave the segment alone, rather than change boot behaviour. The segments list actually only has one entry, and it contains the irom data.
  • Since the irom section is always the only element in the first header's segment list, we know exactly where it starts. The program header starts at 0x2000 and has exactly 16 (0x10) bytes before irom start, which means the irom will begin at 0x40202010. You’ll find that address hardcoded into the linker file for rBoot projects.
  • After the irom segment, the bootloader expects to find a SECOND HEADER. This header exactly matches the original non-OTA header and tells the bootloader what memory segments to initialize and where to find that information. I don't think any of the flash parameters, nor the entry point address that are stored in this header are used for anything. It makes me wonder why they decided to add a second header at all, since they could have stuck with the single header, always included irom as the first segment (with a 0 destination address as a sentinel value meaning copy nothing), followed by the other regular segment initializers and called it a day. My guess would be that since the second header is exactly what would be generated for a non-OTA image, this is just the result of stitching two tools together - their existing one to generate the non-OTA header and a new one to generate the primary header/irom data header, and then they just concatenated the two outputs.
  • This arrangement also fixes the 'wasted' space issue with the default non-OTA format's use of a static offset for the start of irom after the segment data (see the discussion about 0x40000 vs. 0x10000). The irom segment still gets a static offset, and the segment initializers can grow as large as they need to be, up to the bounds of the available space. Win-Win.
  • Continuing to the end of that, you’ll find a final checksum byte. They were a little tricky here: the image is padded so that this byte is the last one before the next 16-byte boundary, so you may find some padding here.
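To make the list above concrete, here is a sketch that walks an OTA-format image. It relies only on the facts given in the bullets (a 16-byte first header with 4 in the segment count field and 0 in the destination address field, irom data, a second non-OTA style header, the segments, then a trailing checksum byte). The field widths and little-endian packing are assumptions taken from my reading of esptool, as is the 16-byte alignment of the checksum. `parse_ota_image` is a made-up name.

```python
import struct

FIRST_HEADER_SIZE = 16  # 8-byte header + 8-byte irom segment descriptor
IROM_MARKER_COUNT = 4   # constant in the 'segment count' field
IROM_MARKER_DEST = 0    # constant in the 'destination address' field

# Image at flash 0x2000, irom data 16 bytes later, seen through the mapping.
IROM_MAPPED_ADDRESS = 0x40200000 + 0x2000 + FIRST_HEADER_SIZE


def parse_ota_image(blob):
    """Locate the irom data, the second header and the checksum byte."""
    _magic, marker, _mode, _size_freq, _entry = struct.unpack_from(
        "<BBBBI", blob, 0)
    dest, irom_len = struct.unpack_from("<II", blob, 8)
    if marker != IROM_MARKER_COUNT or dest != IROM_MARKER_DEST:
        raise ValueError("first header does not look like an OTA image")
    second = FIRST_HEADER_SIZE + irom_len
    _magic2, count, _m, _sf, entry = struct.unpack_from("<BBBBI", blob, second)
    offset = second + 8
    segments = []
    for _ in range(count):
        load, length = struct.unpack_from("<II", blob, offset)
        segments.append((load, length))
        offset += 8 + length
    return {
        "irom_length": irom_len,
        "second_header_offset": second,
        "entry": entry,
        "segments": segments,
        # Checksum is the last byte before the next 16-byte boundary (assumed).
        "checksum_offset": offset | 0x0F,
    }
```

Note that `IROM_MAPPED_ADDRESS` works out to 0x40202010, the value mentioned in the list.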

On each boot, rBoot re-runs the checksum on the segment data and verifies that the boot image has not been corrupted. If the checksum fails to match, the boot process moves on to the previously loaded program slot, ensuring a bad/interrupted OTA update won’t prevent boot.
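A rough sketch of that check-and-fall-back logic follows. It assumes the checksum is a running XOR over the segment data seeded with 0xEF, which is how I read the esptool and rBoot sources, so treat the constant as unverified. The slot selection is a simplification rather than rBoot's actual code, and `image_checksum` and `pick_slot` are made-up names.

```python
CHECKSUM_SEED = 0xEF  # assumption: initial value of the XOR checksum


def image_checksum(segment_data):
    """XOR every byte of every segment into a single checksum byte."""
    value = CHECKSUM_SEED
    for data in segment_data:
        for byte in data:
            value ^= byte
    return value


def pick_slot(slots, preferred):
    """Return the index of the first slot whose checksum verifies.

    slots is a list of (segment_data, stored_checksum) pairs. The preferred
    slot is tried first, then the remaining slots in order.
    """
    order = [preferred] + [i for i in range(len(slots)) if i != preferred]
    for index in order:
        segment_data, stored = slots[index]
        if image_checksum(segment_data) == stored:
            return index
    return None  # nothing bootable


if __name__ == "__main__":
    good = [b"\x01\x02"]
    slots = [(good, image_checksum(good)), (good, 0x00)]  # slot 1 is corrupt
    print(pick_slot(slots, preferred=1))  # falls back to slot 0
```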

An rBoot option that I recommend enabling is BOOT_IROM_CHKSUM. By default the Espressif tooling only checksums the data in the second header, and thus you can end up in a bad situation if you’ve somehow corrupted your irom data. The boot loader will not detect the corruption, and will just boot your program, which will likely crash when it hits that region of memory.

To use that option, you’ll need to change how your images are generated. You can use esptool2, which is available from the same source as rBoot, or you can use our own fork of esptool, which adds a command line flag that will generate the right combined checksum. (elf2image --version=2 --checksum-irom example.out -o ./firmware/0x02000.bin)

How/where/when is the 1MiB segment mapped?

The rBoot remapping relies on Espressif’s proprietary code to work. The trick it uses to customize this mapping is to replace one of the sdk’s initialization functions with a customized routine that invokes the mapping routine with parameters that match the currently booted program. The tricky bit here is that this code that needs to be replaced is actually in your program rather than rBoot itself (specifically in the ‘main’ library you link in from the Espressif sdk). Make sure to follow the installation instructions on the homepage, specifically with regard to:

  • Creating a new libmain that has a ‘weakened’ symbol for the Cache_Read_Enable_New function
  • Including the corresponding rBoot re-implementation (rboot-bigflash.c) which uses your boot configuration to determine which 1MiB segment to map.

Performing an OTA update

The process of actually fetching a new bootable program and loading it into flash is going to be totally dependent on how you'd like to set up your infrastructure. I'll eventually write a follow-up article on how we go about it.

In the meantime, I would recommend taking a look at the nice little API suite that ships with rBoot (rboot-api.c) that you can include in your project to query and control boot behaviour.

The End?

With all that done, you should hopefully have a better grasp of the ESP's boot process. Again, I'm happy to help with questions or to fix any mistakes I've made. I’m gil@pushrate.com.
