Turns out that things were more complicated than I thought. It was one of those bugs that seems impossible, until the sky parts and you have that moment of zen.
I thought I had everything up and running, with rBoot checksumming the entire image. I was running with three boot slots, and I could boot from each of them - and I could prove it: during the compile cycle I changed the greeting text so that boot 0 said "Hello from boot 0", boot 1 said "Hello from boot 1", and ... you get the idea. The terminal faithfully proved everything was working.
I wanted to make sure any corruption would be detected, so I set about testing the setup as follows:
- I wrote a blank sector over the code in block 0, and booted the ESP. Success, corruption detected!
- The bootloader jumped to boot 1 as expected, started that slot running, but quickly crashed!?
That is what it seemed like. I’ll save you a bunch of headscratching and just tell you what was actually happening:
- I was never really booting any slot other than 0.
- I hadn’t properly overridden the SDK function that handles flash-to-ESP mapping, so the mapped flash MiB never changed. I was always running the first program (well, at least its irom portion).
- The bootloader was correctly initializing the iram and dram memory from the intended boot slot, but I was still executing the irom from boot 0! I was getting the string table from boot 1, while still running most of the code from boot 0. So the "Hello" message was being loaded from the intended boot partition, and the boot 0 code would print the 'right' message. When I corrupted boot 0 and started boot 1, it would run until it hit the corrupted sector of boot 0.
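
For anyone hitting the same wall, here is a minimal sketch of the arithmetic the mapping override has to get right. It assumes each boot slot sits in its own 1 MiB of flash, and that the mapped window is selected by two values: which 2 MiB pair of flash, and whether to use the even or odd MiB within that pair (that is my understanding of how the cache mapping is selected, so treat it as an assumption). The names `flash_map_t` and `map_for_slot`, and the slot addresses used below, are made up for illustration; they are not taken from rBoot or the SDK.

```c
#include <stdint.h>

#define MIB 0x100000u /* size of one mappable flash window (1 MiB) */

/* Hypothetical helper type: the two selector values for the flash mapping. */
typedef struct {
    uint8_t odd_even; /* 0 = even MiB of the pair, 1 = odd MiB of the pair */
    uint8_t mb_pair;  /* index of the 2 MiB pair that contains the slot */
} flash_map_t;

/* Work out which 1 MiB window must be mapped for a slot that starts at
 * slot_flash_addr. If this result is never applied, the mapping stays on
 * MiB 0 and every slot ends up executing the irom code of boot 0. */
static flash_map_t map_for_slot(uint32_t slot_flash_addr)
{
    uint32_t mib_index = slot_flash_addr / MIB; /* which MiB holds the slot */
    flash_map_t m;
    m.odd_even = (uint8_t)(mib_index % 2u);
    m.mb_pair  = (uint8_t)(mib_index / 2u);
    return m;
}
```

With slots at the illustrative addresses 0x002000, 0x102000 and 0x202000, `map_for_slot` gives MiB 0 (pair 0, even), MiB 1 (pair 0, odd) and MiB 2 (pair 1, even). In my broken setup the equivalent of this result was never applied, so the window stayed on MiB 0 no matter which slot the bootloader picked.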