This is the mail archive of the ecos-devel@sources.redhat.com mailing list for the eCos project.



contributing a failsafe update mechanism for FIS from within ecos applications


Hi,

I already posted to ecos-discuss, but without much response.

We need to be able to perform firmware updates that survive a power loss at any point in time. Since redboot comes with FIS, we'd like to use fis.
In order to update the firmware, a new firmware image has to be placed on the flash and the fis directory has to be updated. When the fis directory is updated, it is first erased and then rewritten with the new contents.
Now if the power goes down directly after the directory has been erased, redboot can't start the firmware image anymore, since it can't read the directory.

In order to enable failsafe operation of redboot and fis under such circumstances, a backup of the fis directory has to be kept until the new directory has been written successfully.
Here comes my proposed strategy:
Currently the fis directory occupies one block of the flash. For safe operation it needs a second block. Both blocks contain the fis directory, but only one is valid (and current).
In the attached patch these blocks are called block 0 (this one existed until now) and block 1 (this one is the new additional one).
Redboot needs a way to determine which block contains the valid information.
For this, and to stay compatible with existing flash, I suggest using the first entry of the fis directory table as a valid marker, which can be used to decide which of the two blocks is valid.
It looks like this:

// 1st and 4th byte should be 0xff, the two middle bytes != 0xff (endianness)
#define EFIS_MAGIC (0xff1234ff)

#define EFIS_VALID       (0xa5a5a5a5)
#define EFIS_IN_PROGRESS (0xfdfdfdfd)
#define EFIS_EMPTY       (0xffffffff)

struct fis_valid_info
{
   unsigned int magic;          
   unsigned int valid_flag;
   unsigned int version_count;
};

The "magic" is used to check whether this fis directory contains a fis_valid_info. 0xff1234ff is constructed this way in order to be compatible with the rest of the algorithms in redboot, which use the first (two) byte(s) of the name to check if the entry is empty. Because the first byte is 0xff in either byte order, this entry can't be mistaken for an entry describing an image on the flash.
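To illustrate the byte-order argument, here is a minimal sketch. The helpers store_u32() and entry_looks_empty() are hypothetical names, and the "first name byte is 0xff" test is only an assumed stand-in for the empty-entry check in redboot, not code taken from it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EFIS_MAGIC (0xff1234ffu)

/* Hypothetical helper: store v into buf[0..3] in the requested byte order. */
void store_u32(uint8_t *buf, uint32_t v, int big_endian)
{
    int i;

    assert(buf != NULL);
    for (i = 0; i < 4; i++) {
        int shift = big_endian ? (24 - 8 * i) : (8 * i);
        buf[i] = (uint8_t)(v >> shift);
    }
}

/* Assumed stand-in for the empty-entry test: the first name byte is 0xff. */
int entry_looks_empty(const uint8_t *entry)
{
    assert(entry != NULL);
    return entry[0] == 0xff;
}
```

Stored either little-endian (ff 34 12 ff) or big-endian (ff 12 34 ff), the marker starts with 0xff, while the middle bytes keep it distinguishable from fully erased flash.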
If the magic matches, the valid_flag is evaluated.
If it is equal to EFIS_VALID then this directory is valid. If both fis directories (from both blocks) have the correct magic and are valid, the version_count comes into play.
The fis directory with the higher version_count will be considered as the most recent valid fis directory and thus be used.
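The selection rule can be sketched roughly as follows. This is not the code from fis.patch; efis_is_valid() and efis_select_block() are hypothetical names, the struct uses fixed-width types for illustration, and both the tie-break in favour of block 0 and the -1 return value are assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EFIS_MAGIC (0xff1234ffu)
#define EFIS_VALID (0xa5a5a5a5u)

/* Same layout as fis_valid_info above, with fixed-width fields. */
struct fis_valid_info {
    uint32_t magic;
    uint32_t valid_flag;
    uint32_t version_count;
};

/* Nonzero if the block carries the marker entry and is flagged valid. */
int efis_is_valid(const struct fis_valid_info *info)
{
    assert(info != NULL);
    return info->magic == EFIS_MAGIC && info->valid_flag == EFIS_VALID;
}

/*
 * Returns 0 or 1 for the block to use, or -1 if neither block has a
 * valid marker. What to do in the -1 case (for example, falling back
 * to block 0 on flash written before this scheme existed) is left to
 * the caller. Wrap-around of version_count is not handled here.
 */
int efis_select_block(const struct fis_valid_info *b0,
                      const struct fis_valid_info *b1)
{
    int v0 = efis_is_valid(b0);
    int v1 = efis_is_valid(b1);

    if (v0 && v1)
        return (b1->version_count > b0->version_count) ? 1 : 0;
    if (v0)
        return 0;
    if (v1)
        return 1;
    return -1;
}
```

A directory that is still marked EFIS_IN_PROGRESS never wins, regardless of its version_count, because it fails the valid_flag test first.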

When performing a safe update, the algorithm must do the following:
(the line after each * describes what happens if the power goes down at this point in time)

1. modify the fis directory (in RAM) so that it reflects the desired changes, set the valid_flag to EFIS_IN_PROGRESS and set version_count=version_count+1;
*nothing has changed yet, so redboot will work as before

2. erase the flash where the currently invalid fis directory is located
*the valid_flag of the fis directory which will become the new valid directory is 0xffffffff, and the valid flag of the currently still active directory is still 0xa5a5a5a5, and the images haven't been touched yet, so still everything ok for redboot

3. write the modified fis directory in this erased flash block.   
*as above, but the valid_flag of the directory which is intended to become valid is now 0xfdfdfdfd. The images still haven't been touched, so everything is ok.

4. modify the flash image (erase, program)
*now the image has been modified. If you erase the only runnable firmware image on the flash you are of course lost, so just avoid this. In all other cases, there is still a working fis directory and a working firmware image on the flash. The old current fis directory is still valid, and the currently running firmware image hasn't been touched. By checking the CRCs of the images later you can detect which images are broken.

5. after the image is written, set the valid_flag of the fis directory which will become active to 0xa5a5a5a5. In order to do this, the flash block doesn't have to be erased, since the transition from 0xfdfdfdfd to 0xa5a5a5a5 only sets some bits to 0. When this is done, the image has been written correctly and the new fis directory has the right magic, the right valid_flag and its version_count is higher than the version_count of the old fis directory.
*if the power goes down while writing the 4 bytes of the valid_flag, there are two cases: either the valid_flag has already reached 0xa5a5a5a5, in which case everything is ok, or it holds some value != 0xa5a5a5a5, in which case the new directory is not considered valid and the old one stays in use.
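Steps 2, 3 and 5 rely on the fact that programming flash without an erase can only change bits from 1 to 0. A small sketch of that property, using a simulated flash word; flash_can_program() and flash_program_word() are hypothetical helpers for illustration and not part of the patch or of any flash driver API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EFIS_VALID       (0xa5a5a5a5u)
#define EFIS_IN_PROGRESS (0xfdfdfdfdu)
#define EFIS_EMPTY       (0xffffffffu)

/* Nonzero if `to` can be written over `from` without an erase,
 * i.e. the write only needs 1 -> 0 bit transitions. */
int flash_can_program(uint32_t from, uint32_t to)
{
    return (from & to) == to;
}

/* Simulated program operation: bits can only be cleared, so the
 * resulting cell content is the AND of the old and new values. */
uint32_t flash_program_word(uint32_t *cell, uint32_t value)
{
    assert(cell != NULL);
    *cell &= value;
    return *cell;
}
```

With these definitions, EFIS_EMPTY -> EFIS_IN_PROGRESS -> EFIS_VALID is a legal sequence of in-place writes, while going back from EFIS_VALID to EFIS_IN_PROGRESS would require erasing the block.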

The attached patch implements support for this strategy in redboot. It basically reads the first entry of both fis blocks, checks them and sets one to be the valid one. The redboot fis commands still behave as always, they don't do the fancy algorithm as described. This is ok since no customer of a device will perform the update directly via fis commands, and if he does, he can simply rebuild the fis with the fis commands.

The other attached file, fisfs.tar.gz, contains the current working version of the fisfs implementation for ecos applications. I haven't found the time yet to turn this into a real ecos filesystem, but this will happen by the end of this year.
You can have a look at fisfs.cpp, e.g. the function eraseImage() to see an implementation of this strategy.

Since we would like to use this strategy, it would be nice if fis.patch could be applied to ecos cvs, so that we can rely on the binary format on the flash. I guess at least the creation of the fis_valid_info entry in fis_init should be #ifdef'ed with a config switch.

So, what do you think ?

Bye
Alex

Attachment: fis.patch
Description: fis.patch

Attachment: fisfs.tar.gz
Description: fisfs.tar.gz

