smaeul-u-boot/fs/yaffs2/README-linux

Welcome to YAFFS, the first file system developed specifically for NAND flash.

It is now YAFFS2 - original YAFFS (AYFFS1) only supports 512-byte page
NAND and is now deprectated. YAFFS2 supports 512b page in 'YAFFS1
compatibility' mode (CONFIG_YAFFS_YAFFS1) and 2K or larger page NAND
in YAFFS2 mode (CONFIG_YAFFS_YAFFS2).


A note on licencing
-------------------
YAFFS is available under the GPL and via alternative licensing
arrangements with Aleph One. If you're using YAFFS as a Linux kernel
file system then it will be under the GPL. For use in other situations
you should discuss licensing issues with Aleph One.


Terminology
-----------
Page -  NAND addressable unit (normally 512b or 2Kbyte size) - can
	be read, written, marked bad. Has associated OOB.
Block - Eraseable unit. 64 Pages. (128K on 2K NAND, 32K on 512b NAND)
OOB -   'spare area' of each page for ECC, bad block marked and YAFFS
	tags. 16 bytes per 512b - 64 bytes for 2K page size.
Chunk - Basic YAFFS addressable unit. Same size as Page.
Object - YAFFS Object: File, Directory, Link, Device etc.

YAFFS design
------------

YAFFS is a log-structured filesystem. It is designed particularly for
NAND (as opposed to NOR) flash, to be flash-friendly, robust due to
journalling, and to have low RAM and boot time overheads. File data is
stored in 'chunks'. Chunks are the same size as NAND pages. Each page
is marked with file id and chunk number. These marking 'tags' are
stored in the OOB (or 'spare') region of the flash. The chunk number
is determined by dividing the file position by the chunk size. Each
chunk has a number of valid bytes, which equals the page size for all
except the last chunk in a file.

File 'headers' are stored as the first page in a file, marked as a
different type to data pages. The same mechanism is used to store
directories, device files, links etc. The first page describes which
type of object it is.

YAFFS2 never re-writes a page, because the spec of NAND chips does not
allow it. (YAFFS1 used to mark a block 'deleted' in the OOB). Deletion
is managed by moving deleted objects to the special, hidden 'unlinked'
directory. These records are preserved until all the pages containing
the object have been erased (We know when this happen by keeping a
count of chunks remaining on the system for each object - when it
reaches zero the object really is gone).

When data in a file is overwritten, the relevant chunks are replaced
by writing new pages to flash containing the new data but the same
tags.

Pages are also marked with a short (2 bit) serial number that
increments each time the page at this position is incremented. The
reason for this is that if power loss/crash/other act of demonic
forces happens before the replaced page is marked as discarded, it is
possible to have two pages with the same tags. The serial number is
used to arbitrate.

A block containing only discarded pages (termed a dirty block) is an
obvious candidate for garbage collection. Otherwise valid pages can be
copied off a block thus rendering the whole block discarded and ready
for garbage collection.

In theory you don't need to hold the file structure in RAM... you
could just scan the whole flash looking for pages when you need them.
In practice though you'd want better file access times than that! The
mechanism proposed here is to have a list of __u16 page addresses
associated with each file. Since there are 2^18 pages in a 128MB NAND,
a __u16 is insufficient to uniquely identify a page but is does
identify a group of 4 pages - a small enough region to search
exhaustively. This mechanism is clearly expandable to larger NAND
devices - within reason. The RAM overhead with this approach is approx
2 bytes per page - 512kB of RAM for a whole 128MB NAND.

Boot-time scanning to build the file structure lists only requires
one pass reading NAND. If proper shutdowns happen the current RAM
summary of the filesystem status is saved to flash, called
'checkpointing'. This saves re-scanning the flash on startup, and gives
huge boot/mount time savings.

YAFFS regenerates its state by 'replaying the tape'  - i.e. by
scanning the chunks in their allocation order (i.e. block sequence ID
order), which is usually different form the media block order. Each
block is still only read once - starting from the end of the media and
working back.

YAFFS tags in YAFFS1 mode:

18-bit Object ID (2^18 files, i.e. > 260,000 files). File id 0- is not
       valid and indicates a deleted page. File od 0x3ffff is also not valid.
       Synonymous with inode.
2-bit  serial number
20-bit Chunk ID within file. Limit of 2^20 chunks/pages per file (i.e.
       > 500MB max file size). Chunk ID 0 is the file header for the file.
10-bit counter of the number of bytes used in the page.
12 bit ECC on tags

YAFFS tags in YAFFS2 mode:
  4 bytes 32-bit chunk ID
  4 bytes 32-bit object ID
  2 bytes Number of data bytes in this chunk
  4 bytes Sequence number for this block
  3 bytes ECC on tags
 12 bytes ECC on data (3 bytes per 256 bytes of data)


Page allocation and garbage collection

Pages are allocated sequentially from the currently selected block.
When all the pages in the block are filled, another clean block is
selected for allocation. At least two or three clean blocks are
reserved for garbage collection purposes. If there are insufficient
clean blocks available, then a dirty block ( ie one containing only
discarded pages) is erased to free it up as a clean block. If no dirty
blocks are available, then the dirtiest block is selected for garbage
collection.

Garbage collection is performed by copying the valid data pages into
new data pages thus rendering all the pages in this block dirty and
freeing it up for erasure. I also like the idea of selecting a block
at random some small percentage of the time - thus reducing the chance
of wear differences.

YAFFS is single-threaded. Garbage-collection is done as a parasitic
task of writing data. So each time some data is written, a bit of
pending garbage collection is done. More pages are garbage-collected
when free space is tight.


Flash writing

YAFFS only ever writes each page once, complying with the requirements
of the most restricitve NAND devices.

Wear levelling

This comes as a side-effect of the block-allocation strategy. Data is
always written on the next free block, so they are all used equally.
Blocks containing data that is written but never erased will not get
back into the free list, so wear is levelled over only blocks which
are free or become free, not blocks which never change.


Some helpful info
-----------------

Formatting a YAFFS device is simply done by erasing it.

Making an initial filesystem can be tricky because YAFFS uses the OOB
and thus the bytes that get written depend on the YAFFS data (tags),
and the ECC bytes and bad block markers which are dictated by the
hardware and/or the MTD subsystem. The data layout also depends on the
device page size (512b or 2K). Because YAFFS is only responsible for
some of the OOB data, generating a filesystem offline requires
detailed knowledge of what the other parts (MTD and NAND
driver/hardware) are going to do.

To make a YAFFS filesystem you have 3 options:

1) Boot the system with an empty NAND device mounted as YAFFS and copy
   stuff on.

2) Make a filesystem image offline, then boot the system and use
   MTDutils to write an image to flash.

3) Make a filesystem image offline and use some tool like a bootloader to
   write it to flash.

Option 1 avoids a lot of issues because all the parts
(YAFFS/MTD/hardware) all take care of their own bits and (if you have
put things together properly) it will 'just work'. YAFFS just needs to
know how many bytes of the OOB it can use. However sometimes it is not
practical.

Option 2 lets MTD/hardware take care of the ECC so the filesystem
image just had to know which bytes to use for YAFFS Tags.

Option 3 is hardest as the image creator needs to know exactly what
ECC bytes, endianness and algorithm to use as well as which bytes are
available to YAFFS.

mkyaffs2image creates an image suitable for option 3 for the
particular case of yaffs2 on 2K page NAND with default MTD layout.

mkyaffsimage creates an equivalent image for 512b page NAND (i.e.
yaffs1 format).

Bootloaders
-----------

A bootloader using YAFFS needs to know how MTD is laying out the OOB
so that it can skip bad blocks.

YAFFS Tracing
-------------