I have been learning Linux by myself for about one week. For the very beginning, like my classmates and those who are totally fresh men in Linux, I spent up to two to three days with setting my Gentoo up. And my wireless net still fails to work at this moment and only command-line console is available. But this is not the point I write this article, the purpose is to talk about some intuitions on a further study of the Kernel’s Red-Black Tree. Later I will compose another issue to cover my tough way walking through with setting up my Gentoo and today is for the RBTree.
Before my surface anatomy of the rbtree.h. I believe all of you wanna a brief glance at how important and how comprehensively is RBTree used in the Kernel of Linux.
To quote Linux Weekly News:
There are a number of red-black trees in use in the kernel.
The anticipatory, deadline, and CFQ I/O schedulers all employ rbtrees to track requests; the packet CD/DVD driver does the same.
The high-resolution timer code uses an rbtree to organize outstanding timer requests. The ext3 filesystem tracks directory entries in a red-black tree. Virtual memory areas (VMAs) are tracked with red-black trees, as are epoll file descriptors, cryptographic keys, and network packets in the "hierarchical token bucket" scheduler.
May be there are still possibilities for RBTree in other aspects. My talk won’t cover how RBTree is implemented as a DataStructure but only as a ever existing handy tool for you to use, more explicitly for you to inherit from the original code. The kernel coder uses a marvelous coding tricks to implement inheritance in C. Yeah as you can see it is C. Actually not inheritance in a vigorous definition but It does shock us fresh coders because inheritance is always a well-known terminology for Objected-Orientated Programming. And that is one of my initiations to give such a article.
Anatomy of The Rbtree.h
Anatomy of The Rbtree.h
Now it is time to get started.
Some important files you should know:
/usr/src/linux/include/linux/rbtree.h
/usr/src/linux/lib/rbtree.c
As you can see those two files are prepared for kernel programming purpose. So it is not advisable to include the rbtree.h in your user space programs. For more information you should see http://kernelnewbies.org/KernelHeaders. A simple way to deal with it is to reserve a copy of combination of rbtree.c and rbtree.h in your own directory as a single file rbtree.h. I will tell you how to compile it independently from the kernel code.
The first step is to take a look at the rbtree.h:
There are two definitions of structures:
struct rb_node
{
unsigned long rb_parent_color;
#define RB_RED 0
#define RB_BLACK 1
struct rb_node *rb_right;
struct rb_node *rb_left;
} __attribute__((aligned(sizeof(long))));
struct rb_root
{
struct rb_node *rb_node;
};
_______________________________________________
The second one only contains a pointer to a rb_node. Rb_root is used to catch the track of the root of the RBTree. So we need only one rb_root for each RBTree.
It is the key structure for RBTree in Linux. As you can see there are three members in it. The last two variable are merely pointers to child nodes. There are something tricky for the rb_parent_color. This integer contains two major information of the node: the address of the parent node and the color of itself.
How could it be to store two items in one variable? Just use different bits. rb_parent_color is a variable with 4 bytes or say , 32 bits. The resolution is to use the two least significant bits to record the color( actually only one bit is required to tell red or black ) and the rest 30 bits to record the address.
Does it work? Sure, look at the last line of the structure “__attribute__ ((aligned(sizeof(long))))”. This is a specific feature for GNU C/C++, which tells the compiler to align all struct rb_node in memory with starting addresses divisible by sizeof(long) = 4. So the addresses of all struct rb_node do not use the two least significant bits.
To achieve the purpose of operating struct rb_node, the rbtree.h offers a lot of macros for us.
_______________________________________________________________
#define rb_parent(r) ((struct rb_node *)((r)->rb_parent_color & ~3))
#define rb_color(r) ((r)->rb_parent_color & 1)
#define rb_is_red(r) (!rb_color(r))
#define rb_is_black(r) rb_color(r)
#define rb_set_red(r) do { (r)->rb_parent_color &= ~1; } while (0)
#define rb_set_black(r) do { (r)->rb_parent_color |= 1; } while (0)
_______________________________________________________________
All of the macros above take a pointer to rb_node as parameter. The functions can be derived from the names, which won’t take too much time to handle.
Look at another batch of macros.
_______________________________________________________________
#define RB_ROOT (struct rb_root) { NULL, }
#define rb_entry(ptr, type, member) container_of(ptr, type, member)
#define RB_EMPTY_ROOT(root) ((root)->rb_node == NULL)
#define RB_EMPTY_NODE(node) (rb_parent(node) == node)
#define RB_CLEAR_NODE(node) (rb_set_parent(node, node))
_______________________________________________________________
The first one RB_ROOT is used to initialize a rb_root. Otherwise you can use rb_root->rb_node = NULL to initialize.
The only one need to be mention is rb_entry. It is declared as another macro “container_of” in kernel.h. To make stuff explicit you should take a glimpse at its definition:
_______________________________________________________________
#define container_of(ptr, type, member) ({ \
const typeof( ((type *)0)->member ) *__mptr = (ptr); \
(type *)( (char *)__mptr - offsetof(type,member) );})
(note that the macro offsetof is defined in stddef.h, which is to compute the offset from the beginning of the structure to the member )
_______________________________________________________________
Note that terminology “typeof” is another feature of GNU C/C++, which allows you squeeze out the type of a certain input variable. This is helpful if you are going to define new variables in a macro as the “container_of” does. This “container_of” macro is created to obtain the parent struct of a give struct member. For example, if a structure is defined as below:
_______________________________________________________________
Struct test{
Int a;
Char c;
Int b;
} ptest;
_______________________________________________________________
If you are given a pointer to char c, how to get the pointer to the structure ptest? The following equation accomplishes the goal:
_______________________________________________________________
Struct test * p = rb_entry( &c , struct test , c );
_______________________________________________________________
This trick enables us to wield inheritance over the original system of built-in RBTree. We can define our own structure including rb_node as a member, and use the built-in Red-Black Tree functions to operate on the rb_node’s and call rb_entry to obtain our data if necessary. Detailed introduction will be covered later.
Now come to some key function declaration in the rbtree.h:
_______________________________________________________________
extern void rb_insert_color(struct rb_node *, struct rb_root *);
extern void rb_erase(struct rb_node *, struct rb_root *);
static inline void rb_link_node(struct rb_node * node, struct rb_node * parent,
struct rb_node ** rb_link);
_______________________________________________________________
we don’t have to concern about how those functions are implemented, if you do you can refer to some books on data structure. The first one is to rebalance the tree after inserting a red node into the tree. The second one is to take a certain node apart from the tree, we have to free the memory by ourselves. And the third one is to make parent node the parent of given node. If you wanna see more introduction on anatomy of the rbtree.h file, refer to http://bbs.chinaunix.net/thread-2007670-1-1.html
Our exemplary C code for using rbtree.h
Our exemplary C code for using rbtree.h
We have introduced enough stuff to implement our simple practice of Red-Black Tree. It is time to see the code and admire the power of inheritance in C programming.
First is the structure derived from the rb_node:
_______________________________________________________________
struct unode{
struct rb_node __rb_node;
int __num; //this is to record our data.
};
_______________________________________________________________
We have to do our own functions for insert, search and delete.( Like inheritance in OOP )
_______________________________________________________________
struct unode * rb_search_unode( struct rb_root * root , int target ){
struct rb_node * n = root->rb_node;
struct unode * ans;
while( n ){
//Get the parent struct to obtain the data for comparison
ans = rb_entry( n , struct unode , __rb_node );
if( target < ans->__num )
n = n->rb_left;
else if( target > ans->__num )
n = n->rb_right;
else
return ans;
}
return NULL;
}
struct unode * rb_insert_unode( struct rb_root * root , int target , struct rb_node * source ){
struct rb_node **p = &root->rb_node;
struct rb_node *parent = NULL;
struct unode * ans;
while( *p ){
parent = *p;
ans = rb_entry( parent , struct unode , __rb_node );
if( target < ans->__num )
p = &(*p)->rb_left;
else if( target > ans->__num )
p = &(*p)->rb_right;
else
return ans;
}
rb_link_node( source , parent , p ); //Insert this new node as a red leaf.
rb_insert_color( source , root ); //Rebalance the tree, finish inserting
return NULL;
}
void rb_erase_unode( struct rb_node * source , struct rb_root * root ){
struct unode * target;
target = rb_entry( source , struct unode , __rb_node );
rb_erase( source , root ); //Erase the node
free( target ); //Free the memory
}
_______________________________________________________________
PS: I write those function calls on the premise that the reader should be familiar with basic C programming.
Thus we have done the main part of our program. And the rest is to compose a instance to use those functions. We will go through some key part of the instance, for details see the enclosed c files.
_______________________________________________________________
struct rb_root root = RB_ROOT;
struct unode * node;
_______________________________________________________________
To insert:
node = ( struct unode * )malloc( sizeof( struct unode ) );
rb_insert_unode( &root , op , &node->__rb_node );
if the number “op” already exists then free the memory:
free( node );
else record the “op”
node->__num = op;
To search:
rb_search_unode( &root , op );
To delete:
First find the node
node = rb_search_unode( &root , op );
if node is not null then erase the node
rb_erase_unode( &node->__rb_node , &root );
During the processing of the program, the variable root is always guaranteed pointing to the root of the tree. If you wanna check this out you should see the rotation code in the rbtree.c.
How to compile this program?
How to compile this program?
As told in the beginning of the article, It is not recommended to compile the file directly from the /include/linux/rbtree.h. So we have to duplicate the rbtree.h and combine necessary code of rbtree.c to rbtree.h. You should copy the code of the functions you used in your program from rbtree.c to rbtree.h. Besides you’d better copy the macro of “container_of” from kernel.h to rbtree.h. After all is done, place your C file and the new copy of rbtree.h together in your directory and compile it as you do with most of the C projects. Run it and try several commands, cool?
Downloading the C code and Makefile
Downloading The C Files:http://u.115.com/file/cliq5jqi
Downloading the C code and Makefile
Downloading The C Files:http://u.115.com/file/cliq5jqi
Looks like the link to the code does not work.
ReplyDeleteCan you update it. Thanks.
太感谢了,复旦大学的呀!!!
ReplyDelete