Assignment 4: Search Index using 2-3-4 trees

Deadline: Tuesday 21st April 11:59 PM

In this assignment we take a step back from web pages and consider plain text files in a directory. You will write a function build-search-index that builds a 2-3-4 tree. Read the Wikipedia article on 2-3-4 trees before you start. In your tree, the keys are all words longer than 3 letters found in any of the text files, and the value stored with each key is the set of files containing that word. Treat anything separated by a space as a word. Recall slurp and split-lines, and also look up file-seq and split. When inserting into the 2-3-4 tree, assume that the recursive call correctly inserts into a tree whose height is one less than that of the tree you were given.

(def search-index (build-search-index "/directory/of/text/files"))
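
The file-reading part of the description above can be sketched as follows. This is only one possible decomposition and not a required design: the helper names text-files, long-words and word-file-pairs are hypothetical, and the sketch stops at producing [word path] pairs. Inserting those pairs into the 2-3-4 tree is left to you.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

;; Hypothetical helper: file-seq also yields directories, so keep regular files only.
(defn text-files
  [dir]
  (filter #(.isFile ^java.io.File %) (file-seq (io/file dir))))

;; Hypothetical helper: the distinct space-separated words longer than 3 letters.
(defn long-words
  [text]
  (->> (str/split-lines text)
       (mapcat #(str/split % #" "))
       (filter #(> (count %) 3))
       set))

;; Hypothetical helper: one [word path] pair per distinct long word per file.
(defn word-file-pairs
  [dir]
  (for [file (text-files dir)
        word (long-words (slurp file))]
    [word (.getPath ^java.io.File file)]))
```

Each pair produced by word-file-pairs is one insertion into the tree: the word is the key, and the path is added to the set of files stored with that key.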

Now write the function (lookup-search-index "search-term"), which returns the set of files containing that term, and (lookup-search-terms "search terms separated by spaces"), which returns the list of files sorted by the number of search terms each file contains. These lookups must use the in-memory tree and must not read any files.
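
The ranking step of the multi-term lookup can be sketched as below. The function name rank-files is hypothetical, and a plain map (sample-index) stands in for the tree lookup so that the sketch runs on its own; in your solution the lookup would go through your 2-3-4 tree instead. The sketch also assumes that files matching the most terms come first, which the description above does not state explicitly.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: lookup is any function from a term to a set of files
;; (or nil when the term is absent).
(defn rank-files
  [lookup terms]
  (->> (str/split terms #" ")
       (remove str/blank?)
       distinct
       (mapcat lookup)          ; one occurrence of a file per matching term
       frequencies              ; file -> number of matching terms
       (sort-by (comp - val))   ; assumption: most matching terms first
       (map key)))

;; Stand-in data for illustration only; a map is used as the lookup function.
(def sample-index
  {"tree"   #{"a.txt" "b.txt"}
   "index"  #{"a.txt"}
   "search" #{"a.txt" "b.txt" "c.txt"}})

(rank-files sample-index "tree index search")
;; => ("a.txt" "b.txt" "c.txt")
```

Files that match the same number of terms come out in no particular order in this sketch.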

Submission is via your git repository as before.

Code Quality

Indentation, descriptive naming, good function decomposition (no long functions), and good use of functional features (map, filter, reduce, every?, etc. and lambdas) are a necessary part of this assignment. A complete, working assignment might be worth less than half the grade if it is poorly written. Performance is not a focus of this assignment.

Warnings

Taking any code from the Internet or from each other, helping each other debug, or having access to each other’s code is strictly prohibited. Anyone found doing so will be reported for strict action. Any guidance on design, or any examples you study from the Internet, must be cited in your README file.