Squashed 'src/leveldb/' content from commit aca1ffc

git-subtree-dir: src/leveldb git-subtree-split: aca1ffc4b65be5e099b2088c6e6a308d69e1ad73
2013-05-03 19:06:59 -07:00
commit c25e98186d
136 changed files with 27582 additions and 0 deletions
--- a/doc/table_format.txt
+++ b/doc/table_format.txt
@@ -0,0 +1,104 @@
+File format
+===========
+
+  <beginning_of_file>
+  [data block 1]
+  [data block 2]
+  ...
+  [data block N]
+  [meta block 1]
+  ...
+  [meta block K]
+  [metaindex block]
+  [index block]
+  [Footer]        (fixed size; starts at file_size - sizeof(Footer))
+  <end_of_file>
+
+The file contains internal pointers.  Each such pointer is called
+a BlockHandle and contains the following information:
+  offset:	    varint64
+  size:		    varint64
+See https://developers.google.com/protocol-buffers/docs/encoding#varints
+for an explanation of varint64 format.
+
+(1) The sequence of key/value pairs in the file are stored in sorted
+order and partitioned into a sequence of data blocks.  These blocks
+come one after another at the beginning of the file.  Each data block
+is formatted according to the code in block_builder.cc, and then
+optionally compressed.
+
+(2) After the data blocks we store a bunch of meta blocks.  The
+supported meta block types are described below.  More meta block types
+may be added in the future.  Each meta block is again formatted using
+block_builder.cc and then optionally compressed.
+
+(3) A "metaindex" block.  It contains one entry for every other meta
+block where the key is the name of the meta block and the value is a
+BlockHandle pointing to that meta block.
+
+(4) An "index" block.  This block contains one entry per data block,
+where the key is a string >= last key in that data block and before
+the first key in the successive data block.  The value is the
+BlockHandle for the data block.
+
+(6) At the very end of the file is a fixed length footer that contains
+the BlockHandle of the metaindex and index blocks as well as a magic number.
+       metaindex_handle: char[p];    // Block handle for metaindex
+       index_handle:     char[q];    // Block handle for index
+       padding:          char[40-p-q]; // zeroed bytes to make fixed length
+                                       // (40==2*BlockHandle::kMaxEncodedLength)
+       magic:            fixed64;    // == 0xdb4775248b80fb57 (little-endian)
+
+"filter" Meta Block
+-------------------
+
+If a "FilterPolicy" was specified when the database was opened, a
+filter block is stored in each table.  The "metaindex" block contains
+an entry that maps from "filter.<N>" to the BlockHandle for the filter
+block where "<N>" is the string returned by the filter policy's
+"Name()" method.
+
+The filter block stores a sequence of filters, where filter i contains
+the output of FilterPolicy::CreateFilter() on all keys that are stored
+in a block whose file offset falls within the range
+
+    [ i*base ... (i+1)*base-1 ]
+
+Currently, "base" is 2KB.  So for example, if blocks X and Y start in
+the range [ 0KB .. 2KB-1 ], all of the keys in X and Y will be
+converted to a filter by calling FilterPolicy::CreateFilter(), and the
+resulting filter will be stored as the first filter in the filter
+block.
+
+The filter block is formatted as follows:
+
+     [filter 0]
+     [filter 1]
+     [filter 2]
+     ...
+     [filter N-1]
+
+     [offset of filter 0]                  : 4 bytes
+     [offset of filter 1]                  : 4 bytes
+     [offset of filter 2]                  : 4 bytes
+     ...
+     [offset of filter N-1]                : 4 bytes
+
+     [offset of beginning of offset array] : 4 bytes
+     lg(base)                              : 1 byte
+
+The offset array at the end of the filter block allows efficient
+mapping from a data block offset to the corresponding filter.
+
+"stats" Meta Block
+------------------
+
+This meta block contains a bunch of stats.  The key is the name
+of the statistic.  The value contains the statistic.
+TODO(postrelease): record following stats.
+  data size
+  index size
+  key size (uncompressed)
+  value size (uncompressed)
+  number of entries
+  number of data blocks