Wednesday, September 18, 2013

What is the configuration cache

You might have seen the term configuration cache if you've played around with libcouchbase or the Couchbase PHP extension, but it's not very well documented anywhere. So what is the configuration cache? To answer that question I should probably start by describing how the client works.

libcouchbase is what we call a smart client, which means it reacts to changes in the topology of the cluster. So how does the client do this? When it is instantiated, the first thing it does is connect to the cluster (via a streaming REST call) to receive notifications about changes in the topology. This doesn't sound like a problem, but it might become one. Unfortunately these REST streaming connections are time consuming to set up (compared to the data connections) and relatively resource hungry on the server side, so they don't fit very well in a scenario where you have a large number of clients or short lived connections.

It is not uncommon for people deploying PHP applications to run thousands of PHP processes, which would mean thousands of connections to the REST interface. The data connections are really cheap and fast to set up, so they don't cause any problems with this kind of deployment. In older versions of Couchbase I have unfortunately seen clusters become unstable with such a high number of clients connecting to them.

When you think about it, most clusters are running in a steady state most of the time. You don't add/remove nodes very often, so I would guess that 99% of the time clients will NEVER receive an update from the cluster saying that its topology is changing. This makes the information extremely well suited for caching, and that's exactly what the configuration cache is. It is really simple, yet so effective:
  • When the client is instantiated it looks for a named file in the filesystem.
    • If it is there, it assumes that to be the current cluster configuration and starts using it.
    • If it isn't there, it runs the normal bootstrap logic to get the configuration from the cluster and writes it to the file.
  • Whenever we try to access an item on a node and the node tells us that we contacted the wrong node, we invalidate the cache and request a new copy of the configuration (see the sketch below).
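
To make the flow a bit more concrete, here is a small sketch of the idea. This is not the actual libcouchbase code; fetch_config_from_cluster() is just a hypothetical stand-in for the normal streaming REST bootstrap:

/* Illustrative sketch of the caching logic described above. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: does the expensive bootstrap and returns the config blob. */
static char *fetch_config_from_cluster(void)
{
    return strdup("{ \"vBucketServerMap\": { } }");
}

/* Try to read a previously cached configuration; NULL means cache miss. */
static char *load_cached_config(const char *path)
{
    FILE *fp = fopen(path, "rb");
    char *blob;
    long size;

    if (fp == NULL) {
        return NULL;
    }
    fseek(fp, 0, SEEK_END);
    size = ftell(fp);
    rewind(fp);
    blob = malloc(size + 1);
    if (blob != NULL && fread(blob, 1, size, fp) == (size_t)size) {
        blob[size] = '\0';
    } else {
        free(blob);
        blob = NULL;
    }
    fclose(fp);
    return blob;
}

static char *get_config(const char *cachefile)
{
    char *config = load_cached_config(cachefile);
    if (config == NULL) {
        /* Cache miss: do the normal bootstrap and write the result back. */
        FILE *fp;
        config = fetch_config_from_cluster();
        fp = fopen(cachefile, "wb");
        if (fp != NULL) {
            fwrite(config, 1, strlen(config), fp);
            fclose(fp);
        }
    }
    return config;
}

/* On a "not my vbucket" response the cached file is stale: drop it and
 * bootstrap again. */
static char *invalidate_and_refresh(const char *cachefile)
{
    remove(cachefile);
    return get_config(cachefile);
}

The expensive part (the streaming REST bootstrap) only happens on a cache miss or after an invalidation.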

So how do you go ahead and use the configuration cache? From PHP it is extremely easy; all you need to do is add the following to php.ini:

couchbase.config_cache = /tmp

And the Couchbase PHP extension will start storing configuration cache files in /tmp. From C you would use the "compat mode" when you create your instance:

lcb_t instance;
lcb_error_t err;
struct lcb_cached_config_st config;

/* Fill in the regular create options and tell the library where to
 * store (and look for) the cached configuration */
memset(&config, 0, sizeof(config));
config.createopt.version = 1;
config.createopt.v.v1.host = "host1";
config.createopt.v.v1.user = "mybucket";
config.createopt.v.v1.passwd = "secret";
config.createopt.v.v1.bucket = "mybucket";
config.cachefile = "/var/tmp/couchbase-config-cache";

err = lcb_create_compat(LCB_CACHED_CONFIG, &config,
                        &instance, NULL);
if (err != LCB_SUCCESS) {
     ... error ...
}
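
Just like with a regular instance you still have to schedule the connect and wait for it to complete before you start using it:

err = lcb_connect(instance);
if (err != LCB_SUCCESS) {
     ... error ...
}
lcb_wait(instance);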

Happy hacking!

Tuesday, September 17, 2013

What are vbuckets, and should I care?

The really short answer is: Not unless you really want to know the internals of the Couchbase Server. It is more than sufficient to know about buckets and how to add/remove nodes (and their impact on the
system).

The vbuckets exist so that the Couchbase cluster can move data around within the cluster. When you create a Couchbase Bucket, the cluster splits that bucket up into a fixed number of partitions. Each of these partitions is then assigned an id (the vbucket id) and assigned to a node in the cluster. The thing that maps the different partitions to their physical location is called the vbucket map. So why not call them partitions? There is no reason for not doing so, but we chose vbuckets for "virtual bucket". At the time, it was never intended to be visible outside "the internals" of the server.

Let's walk through an example and you might see what it is used for. Imagine that you would like to access a document stored under the id "foo". The first thing you would do is create a hash value for the key, and then use the hash value to look up which vbucket it belongs to. The number of vbuckets is predefined and will never change for a given cluster (it is currently set to 1024 on Linux/Windows and 256 on Mac OS). With the vbucket id in place we consult the vbucket map to see who is responsible for that vbucket. The client then connects to that server and requests the document "foo" from the given vbucket. In fact, the vbucket number to use is part of the request itself and is determined by the client, based on querying the cluster manager. If the client's copy of the vbucket map is obsolete and the vbucket is not located on that server, the server returns "not my vbucket" and the client should try to update its map. If the vbucket is located on the server, it will return the document if it exists.
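
If you want to picture the lookup in code, it boils down to something like this. This is only a simplified sketch; the exact hash function and map layout libcouchbase uses differ in the details:

#include <stddef.h>
#include <stdint.h>

#define NUM_VBUCKETS 1024

/* Toy hash for illustration only; libcouchbase actually uses a
 * CRC32-based hash. */
static uint32_t hash_key(const char *key, size_t nkey)
{
    uint32_t h = 5381;
    for (size_t ii = 0; ii < nkey; ++ii) {
        h = (h * 33) ^ (unsigned char)key[ii];
    }
    return h;
}

/* One entry per vbucket id: the index of the node currently holding the
 * active copy. In real life this is filled in from the cluster
 * configuration. */
static int vbucket_map[NUM_VBUCKETS];

int node_for_key(const char *key, size_t nkey)
{
    uint32_t vbid = hash_key(key, nkey) % NUM_VBUCKETS;
    return vbucket_map[vbid];
}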

By having such an indirection between a partition and where it is currently located, we can easily move data from one node in the cluster to another (this is what happens during rebalance) and then update the map when we're done copying all the data to the new node. When you set up the first node in your cluster, all of the vbuckets reside on that node. As you add nodes, the vbuckets (and the data) are spread out across all of the nodes. The cluster tries to keep the distribution of vbuckets even across all nodes, to avoid overloading some of them.

Since we already had a way to transfer all of the data from one node to another node, we could use the same logic to keep replicas on other nodes. The same vbucket id is used on the other server, so the vbucket map could look something like this:

+------------+---------+---------+
| vbucket id | active  | replica |
+------------+---------+---------+
|     0      | node A  | node B  |
|     1      | node B  | node C  |
|     2      | node C  | node D  |
|     3      | node D  | node A  |
+------------+---------+---------+

This means that node A has two vbuckets: 0 and 3. VBucket 0 is an active vbucket, which means that all get/set requests for it go to this node. VBucket 3, on the other hand, is only used to keep a replica (there is a special command you may use to read replicas).

Let's imagine that one of your coworkers accidentally spilled his coffee on node D, causing it to crash and never come up again. You as the administrator could now fail the node over, causing vbucket 3 on node A to be promoted to "active", and all read/write requests for it would go to that node instead.
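
If you think of the table above as a data structure, the failover is just an edit of the map. Again, this is only a sketch, not how the server actually stores it:

struct vbucket_entry {
    int active;   /* index of the node holding the active copy  */
    int replica;  /* index of the node holding the replica copy */
};

/* The table above, with nodes A-D represented as indices 0-3 */
static struct vbucket_entry vbucket_table[] = {
    { 0, 1 },  /* vbucket 0: active on A, replica on B */
    { 1, 2 },  /* vbucket 1: active on B, replica on C */
    { 2, 3 },  /* vbucket 2: active on C, replica on D */
    { 3, 0 },  /* vbucket 3: active on D, replica on A */
};

/* Failing over the node that holds the active copy promotes the replica. */
static void promote_replica(struct vbucket_entry *entry)
{
    entry->active = entry->replica;
    entry->replica = -1; /* no replica until the cluster creates a new one */
}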

As you can see, these are really the "internal guts" of the Couchbase server that you as a user of the cluster really don't need to care about. I would say you'd be better off spending the time focusing on your application and ensuring that you don't under/over provision your cluster. It is far more important to monitor that the IO path of your cluster is scaled according to your application's usage. If you don't have enough nodes to persist the data, you might end up in a situation where your cluster is constantly "out of memory" and returns a message telling the clients to back off. If you end up in this situation your cluster will be sluggish, and will only accept a small number of updates each time it has written documents to disk.

Monday, September 16, 2013

Using libcouchbase in a multithreaded environment

How to use libcouchbase in a multithreaded environment seems to be a recurring question, so I figured I should do a quick blog post about that. One of the design goals of libcouchbase is that it should work in a multithreaded environment, but there are two ways to implement that:
  • The library does all of the locking so that the client user doesn't have to care about it. This makes life easy for the client users, but might introduce unnecessary locking and limitations.
  • The client uses the library in a certain way, and introduces locking according to the way the library works.

For libcouchbase I chose the second approach. There are absolutely no locks in libcouchbase, so you as a client user have to ensure that you use libcouchbase in a safe way. Luckily for you, doing so isn't hard at all. As long as you don't access the same lcb_t from multiple threads at the same time you should be safe:

void *my_thread_routine(void *arg) {
    lcb_create_st *options = arg;
    lcb_t instance;

    /* Each thread creates, owns and uses its own instance */
    lcb_create(&instance, options);
    ....
}

....

lcb_create_st opt;
memset(&opt, 0, sizeof(opt));
opt.version = 0;
opt.v.v0.host = "mucluster";

for (int ii = 0; ii < 100; ++ii) {
   pthread_t tid;
   pthread_create(&tid, NULL, my_thread_routine, &opt);
}

The above would create 100 threads, each of which would create its own libcouchbase instance that would be safe to use within that thread. The "problem" with the above code is that it would use a lot of resources, not only on the client, but also on the server. Each lcb instance occupies one socket connection to the cluster management console (which is pretty resource heavy on the cluster), and one data connection to each of the nodes in the cluster. The code snippet would therefore use 300 sockets for a two-node cluster. The connection to the management node may be eliminated if we use the configuration cache (I'll blog about that at a later time), but if your application barely uses the libcouchbase instances it still requires too many resources.

One easy way to work around this is to create a pool of libcouchbase instances, grab an instance from the pool whenever you need to perform an operation against Couchbase, and release it back to the pool whenever you're done:

void *my_thread_routine(void *arg) {
    my_instance_pool *pool = arg;

    /* Borrow an instance; it is ours alone until we push it back */
    lcb_t instance = pool_pop(pool);
    ....

    pool_push(pool, instance);
}

....

lcb_create_st opt;
memset(&opt, 0, sizeof(opt));
opt.version = 0;
opt.v.v0.host = "mucluster";
my_instance_pool *pool = pool_create(&opt, 10);

for (int ii = 0; ii < 100; ++ii) {
   pthread_t tid;
   pthread_create(&tid, NULL, my_thread_routine, pool);
}

By using such a pool you can control the resources used (like the number of sockets); all you need to do is tune the size of the pool to match the concurrency you're aiming for. You could even set the pool size to 1 and end up with a "singleton".
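
Just to give you an idea, a minimal pool along the lines of the pool_create(), pool_pop() and pool_push() functions used above could look something like this. It is a sketch only; error handling and the connect/wait logic are left out:

#include <pthread.h>
#include <stdlib.h>
#include <libcouchbase/couchbase.h>

typedef struct my_instance_pool {
    lcb_t *instances;        /* the currently idle instances            */
    int size;                /* total number of instances in the pool   */
    int avail;               /* how many of them are idle right now     */
    pthread_mutex_t mutex;
    pthread_cond_t cond;     /* signalled when an instance is released  */
} my_instance_pool;

my_instance_pool *pool_create(const lcb_create_st *options, int size)
{
    my_instance_pool *pool = calloc(1, sizeof(*pool));
    pool->instances = calloc(size, sizeof(lcb_t));
    pool->size = pool->avail = size;
    pthread_mutex_init(&pool->mutex, NULL);
    pthread_cond_init(&pool->cond, NULL);
    for (int ii = 0; ii < size; ++ii) {
        lcb_create(&pool->instances[ii], options);
        /* error handling, lcb_connect() and lcb_wait() omitted */
    }
    return pool;
}

lcb_t pool_pop(my_instance_pool *pool)
{
    lcb_t instance;
    pthread_mutex_lock(&pool->mutex);
    while (pool->avail == 0) {
        /* every instance is busy; wait for someone to push one back */
        pthread_cond_wait(&pool->cond, &pool->mutex);
    }
    instance = pool->instances[--pool->avail];
    pthread_mutex_unlock(&pool->mutex);
    return instance;
}

void pool_push(my_instance_pool *pool, lcb_t instance)
{
    pthread_mutex_lock(&pool->mutex);
    pool->instances[pool->avail++] = instance;
    pthread_cond_signal(&pool->cond);
    pthread_mutex_unlock(&pool->mutex);
}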

One thing that is really important to note here is that you can't share the same IO instance between the threads. None of the default IO providers are multithread safe, so bad things will happen if you try to do so. That being said, there is nothing stopping you from making an MT-safe IO provider and using a dedicated IO thread that the clients utilize (but I'll leave it up to you to figure out if it is worth the extra work ;-))

So what does the code for such a resource pool look like? I added an extremely simple implementation to the example section of libcouchbase. Feel free to look at the example and cherry-pick some ideas :-)

Happy hacking!