HBase row key range assignment

As I'm designing a row key for my HBase table, I have two questions:

  1. How are row key ranges assigned across HBase regions?
  2. Do row insertions affect the row key assignment?

(Assume we have only two regions.)

To elaborate:

  1. If I insert row keys starting with axx, bxx, ..., zxx, does the HBase Master assign a-m to one region and n-z to the other?

  2. In another case, if I insert row keys starting only with axx and bxx, does it assign axx to one region and bxx to the other?


  • answered 2018-05-16 10:49 Ben Watson

    Splitting does not occur in HBase until a region grows past the size threshold set by the table's split policy. A newly created table starts with a single region, so even if you set up an HBase cluster with 2 region servers, all data is initially added to that one region. When that region fills up, it is split into two regions at whatever key lies near the middle of the full region.

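    For your question 1, all keys would be added to one region initially. Assuming an even spread of keys, you should expect something close to a-m in one region and n-z in the other after the first split occurs. For question 2 the same mechanism applies: the split point is taken from the data actually stored, so with keys starting only with axx and bxx, the split falls wherever the middle of that data happens to be, which is not necessarily the boundary between axx and bxx.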

    To show this graphically, assume our two regions can only store four rows each. After entering four records, you'd see:

    REGION 1   REGION 2
    +-----+    +-----+
    | axx |    |     |
    | bxx |    |     |
    | cxx |    |     |
    | dxx |    |     |
    +-----+    +-----+
    

    Now if we want to add axy, it won't fit in REGION 1 and so splitting occurs across the middle of the region:

    REGION 1   REGION 2
    +-----+    +-----+
    | axx |    | cxx |
    | bxx |    | dxx |
    |     |    |     |
    |     |    |     |
    +-----+    +-----+
    

    and finally our new record is added:

    REGION 1   REGION 2
    +-----+    +-----+
    | axx |    | cxx |
    | axy |    | dxx |
    | bxx |    |     |
    |     |    |     |
    +-----+    +-----+
    
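    The walkthrough above can be reproduced with a toy Python model. This is a sketch under simplifying assumptions, not HBase code: a region counts as full once it holds a fixed number of rows (real HBase decides on size, via the table's split policy), and the split key is simply the middle row. `ToyTable` and `put` are hypothetical names used only for illustration.

```python
from bisect import bisect_right, insort


class ToyTable:
    """Toy model of HBase auto-splitting (hypothetical, for illustration).

    Assumption: a region is "full" once it holds `capacity` rows. Real HBase
    decides on splits by stored size according to the table's split policy.
    """

    def __init__(self, capacity):
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        self.capacity = capacity
        # Inclusive start key of each region; "" means the start of the
        # keyspace, so a new table begins with a single region.
        self.start_keys = [""]
        # Sorted row keys stored in each region, parallel to start_keys.
        self.regions = [[]]

    def _region_for(self, key):
        # The owning region is the one with the greatest start key <= key.
        return bisect_right(self.start_keys, key) - 1

    def put(self, key):
        i = self._region_for(key)
        if key in self.regions[i]:
            return  # same row key: an update, not a new row
        if len(self.regions[i]) >= self.capacity:
            rows = self.regions[i]
            mid = len(rows) // 2
            # Split at the middle row; its key becomes the new region's start key.
            self.regions[i:i + 1] = [rows[:mid], rows[mid:]]
            self.start_keys.insert(i + 1, rows[mid])
            i = self._region_for(key)
        insort(self.regions[i], key)


table = ToyTable(capacity=4)
for key in ["axx", "bxx", "cxx", "dxx", "axy"]:
    table.put(key)
print(table.regions)     # [['axx', 'axy', 'bxx'], ['cxx', 'dxx']]
print(table.start_keys)  # ['', 'cxx']
```

    Note that the new row axy ends up in the first region even though it arrived after the split: placement depends only on where the key sorts relative to the region start keys, not on insertion order.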

    PRE-SPLITTING

    If you know your likely key distribution in advance and wish to avoid expensive automatic splits, you can pre-split when you create the table:

    create 'animals', 'a', {SPLITS => ['e','m','r']}
    

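    This would create four regions: one for row keys sorting before 'e', one for keys from 'e' up to (but not including) 'm', one from 'm' up to 'r', and one for keys from 'r' onward. A region's start key is inclusive and its end key exclusive, and the first and last regions are open-ended, so keys such as '0abc' or 'zzz' still land in a region.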
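    In HBase a region's start key is inclusive, its end key is exclusive, and row keys compare as raw bytes. The Python sketch below models that: `region_for` maps a row key to a pre-split region index, and `split_keys_from_sample` is a hypothetical helper (not part of HBase) showing one way to turn a sample of expected keys into evenly spaced SPLITS values. Both are illustrations under those assumptions, not HBase client code.

```python
from bisect import bisect_right


def region_for(row_key, split_keys):
    """Index of the pre-split region holding row_key.

    Region i spans [split_keys[i-1], split_keys[i]): start key inclusive,
    end key exclusive. Region 0 has no start key and the last region has
    no end key, so every row key lands in exactly one region.
    split_keys must be sorted; bytes compare lexicographically, byte by byte.
    """
    return bisect_right(split_keys, row_key)


def split_keys_from_sample(sample_keys, num_regions):
    """Hypothetical helper: choose up to num_regions - 1 split keys at evenly
    spaced positions in the sorted, de-duplicated sample of expected keys."""
    keys = sorted(set(sample_keys))
    if not keys:
        return []
    splits = []
    for i in range(1, num_regions):
        candidate = keys[i * len(keys) // num_regions]
        if not splits or candidate > splits[-1]:
            splits.append(candidate)  # split keys must be strictly increasing
    return splits


splits = [b"e", b"m", b"r"]  # same boundaries as the 'animals' example
print(region_for(b"axx", splits))    # 0
print(region_for(b"e", splits))      # 1 (a start key belongs to the region it starts)
print(region_for(b"zzz", splits))    # 3
print(region_for(b"Zebra", splits))  # 0 (uppercase bytes sort before lowercase)

sample = [bytes([c]) + b"xx" for c in b"abcdefghijklmnopqrstuvwxyz"]
print(split_keys_from_sample(sample, 4))  # [b'gxx', b'nxx', b'txx']
```

    Keys computed this way could be passed to SPLITS when creating the table. For uniformly distributed hex-string keys, the HBase shell can instead compute split points itself with NUMREGIONS and SPLITALGO (for example 'HexStringSplit').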