Skip to main content

Hash tagging Redis keys in a clustered environment


Hello folks,

In this post, we'll talk a little bit about

  • Redis cluster.
  • Limitations of Redis cluster.
  • How we can overcome the limitations of redis cluster.

Redis cluster is a distributed implementation of Redis with three major goals:

  • High performance and linear scalability upto 1000 nodes.
  • Acceptable degree of write safety.
  • Availability: Redis cluster is able to survive partitions where the majority of the master nodes are reachable and there is at least one reachable slave for every master node that is no longer reachable.

As an example, let's take a look at the following cluster configuration with six nodes.

  1. Node A (master node)
  2. Node B (master node)
  3. Node C (master node)
  4. Node D (slave of master node A)
  5. Node E (slave of master node B)
  6. Node F (slave of master node C)
Now at this point, a natural question may arise, "When I write to a Redis key, how a node is picked up to store that key or what are the factors that decide which node to store the key in"?

To get answer to this question, let's have a look at Redis Key Distribution Model.
The key space is split into 16384 hash slots, effectively setting an upper limit for the cluster size 16384 master nodes (however, the suggested max size of nodes is in the order of ~1000 nodes).
Each master node in a cluster, handles a subset of 16384 slots.
So, possibly the key slot distribution for our above configuration is as below:
  • Master Node A contains hash slots from 0 to 5500.
  • Master Node B contains hash slots from 5501 to 11000.
  • Master Node C contains hash slots from 11001 to 16383.
Redis cluster does not use consistent hashing, but a different form of sharding where every key is conceptually part of a "Hash Slot". Redis uses CRC-16 to map a Key to a Hash Slot. The basic algorithm used to map keys to hash slots is the following:

HASH_SLOT=CRC16 (key) mod 16384

To summarize, a Hash Slot is decided from the key and from Hash Slot a Node is picked up to store the key.

At the same time, we also need to understand that, clustering implies some limitations on the way we use Redis keys (these limitations are very logical though).
  1. Transaction cannot be performed on keys which are part of different range of hash slots.
  2. Multi key operations cannot be performed on keys which are part of different range of hash slots.
For example, suppose there are two keys "key1" and "key2". key1 is mapped to hash slot 5500, thus, is stored in Node A. key2 is mapped to hash slot 5501, thus, is stored in Node B.
So, we cannot perform transaction on those keys. Nor we can perform multi key operations on key1 and key2. Multi key operation like "mget key1 key2" will throw exception.

However, in a practical scenario, we often find ourselves in a need to perform Multi Key operations.
How can we achieve this in a clustered environment?
A simple answer is, by ensuring that the keys on which we perform multi-key operation or transaction, are part of same hash slot range. And we ensure this by "Hash Tagging" Redis keys.
Hash Tags are a way to ensure that multiple keys are allocated in the same hash slot. There is an exception in the computation of hash slots which is used in implementing Hash Tags.

In order to implement hash tags, the hash slot for a key is computed in a slightly different way in certain conditions. If the key contains a "{...}" pattern only the substring between { and } is hashed in order to obtain the hash slot. However since it is possible that there are multiple occurrences of { or }, the algorithm is well specified by the following rules:
  • IF the key contains a { character.
  • AND IF there is a } character to the right of {
  • AND IF there are one or more characters between the first occurrence of { and the first occurrence of }.
Then instead of hashing the key, only what is between the first occurrence of { and the following first occurrence of } is hashed. Let's have a look at the following examples:
  1. The two keys {user1000}.following and {user1000}.followers will hash to the same hash slot since only the substring user1000 will be hashed in order to compute the hash slot.
  2. For the key foo{}{bar} the whole key will be hashed as usually since the first occurrence of { is followed by } on the right without characters in the middle.
  3. For the key foo{{bar}}zap the substring {bar will be hashed, because it is the substring between the first occurrence of { and the first occurrence of } on its right.
  4. For the key foo{bar}{zap} the substring bar will be hashed, since the algorithm stops at the first valid or invalid (without bytes inside) match of { and }.
  5. What follows from the algorithm is that if the key starts with {}, it is guaranteed to be hashed as a whole. This is useful when using binary data as key names.
So far we've seen that, how hash tagging can help us overcome the limitations implied by Redis cluster.

Hope this article comes handy for many of you while working with Redis cluster.

Thank you....





Comments

Post a Comment

Popular posts from this blog

Deploying a self contained .Net core application on Linux and run as a daemon process

CodeProject Recently I got a chance to play little bit with .NET core on Linux where I developed an application in .Net core which I deployed on an Ubuntu 16.04 machine and configured to run it as a daemon process. In this process I learnt quite a few things which might be helpful for beginners in .NET core or for the folks who would like to play around with .NET core. Choosing the deployment model: Two types of deployments can be created for .NET core applications: Framework-dependent deployment: As the name implies, framework-dependent deployment (FDD) relies on a shared system-wide version of .NET Core to be present on the target machine. Because, .NET Core is already present, it can be shared by multiple applications hosted on the target machine. Self-contained deployment: Unlike FDD, a self-contained deployment does not rely on any shared components to be present on the target machine. All components, including both .NET Core libraries and the .NET Core runtime are includ...

Increase performance by using Cache-Control header in Blob Storage

CodeProject Hello Folks, Caching has always been an important part of a good application design and performance optimization. Fetching something over the network is both slow and expensive; large responses require many round-trips between the client and server, which delays when they are available and can be processed by the browser, and also incurs data costs for the visitor. As a result, the ability to cache and reuse previously fetched resources is a critical aspect of optimizing for performance. In this post, we'll see how we can optimize performance by using "Cache-Control" header in Azure Blob Storage.For this, I assume you have an Azure subscription and have a Storage account. Let's see step by step how we can add "Cache-Control" header to a block blob. In this example we'll upload an image to Azure Blob Storage from a simple ASP.Net application. 1. Go to Azure Portal -> Storage->Containers and create a new container. You can selec...