Assignment 6: B-tree

Deadline: Friday 15th May 11:59 PM

A B-tree is a generalization of the 2-3-4 and 3-4-5-6 concept. A 2-3-4 tree is a B-tree of order 4 while a 3-4-5-6 tree is a B-tree of order 6. In this assignment you are to build a B-tree of a given (even) order (the order should be specified as a variable on top of your code file (e.g. var order=20;). Every node in a B-tree of order 20 has 9 to 19 key-value pairs in it and has 10 to 20 children. A 20-node can be split into two 10-nodes and the middle key-value element. Here keys are words and values are lists of filenames where that word was found.

We will make a few simplifications from last assignment. We’ll keep every node in JSON format in a separate file. This allows you to read and write complete files. The root node is kept in a file called root.txt. You’ll need unique filenames for all other nodes. I suggest a format like n0001.txt and incrementing a counter for every new node used. You can count the number of files in current directory and use that count as the integer in the filename for the next node. You’ll use filenames (or just the integer in the name) instead of pointers to other nodes (or instead of offsets in last assignment). There is no copy-on-write and you’ll overwrite nodes.

Here is more explanation of the cache discussed in last assignment. You cache nodes only until they are updated on disk (i.e. synchronously add to cache when writing and remove from cache once write is done). To make the read work immediately, you’ll first lookup the cache and if found, take the node from there and only read the file otherwise. You’ll artifically delay all writes by 10 second as in last assignment. This makes it easy to see if reading is working correctly from cache.

You still need the HTTP server from last assignment with the same URLs. Note that, like last assignment, there is no need to use socket.io as no data has to be passed back and forth between client and server. Similarly you need to write the build-search-index program. However, in this assignment, you have to recursively go deep in the given directory and index all files ending in .txt.

Rest of the instructions are same as last assignment.

Submission is via your git repository as before.

Code Quality

Indentation, descriptive naming, good function decomposition, functional features (filter, map, reduce) and good use of asynchronous features (NO synchronous functions) are a necessary part of this assignment. A complete working assignment might be worth less than half the grade if done in a poor way. Performance is not a focus of this assignment.

Warnings

Taking any code from Internet, from each other, helping each other debug or having access to each other’s code is strictly prohibited. Any perpetrators will be forwarded for strict action. Any guidance on design or even examples you study from the Internet should be quoted in your README file.