Assignment 3: Analyzing the link tree

Deadline: Friday 10th April 11:59 PM

The input to this assignment is the output from Assignment 2 (i.e. the link tree with two space indentation and optional error descriptions). If Assignment 2 is not working, you can craft such input by hand. Store the input in a vector of maps with keys :link and :out-links. Read the whole input from a file in the format of Assignment 2 output (see slurp and split-lines). Here is an example of how the data should be stored in Clojure.

(def example-input [
    {:link "link1" :out-links ["link2" "link3"]}
    {:link "link2" :out-links ["link1"]}
    {:link "link3" :out-links ["link1" "link2"]}])

Find the incoming links to each page for each of the links in here and add them as a new key :in-links. Since everything is immutable, you are really making a new vector of maps where each map has one more key than the corresponding map in the original array.

Write a function that sorts them (there is a builtin sort function) by rank where rank is the number of :in-links.

Write a function that sorts them by rank taking care of pages with mutual links. Two spam sites linking to each other to increase their rank. So you sort by number of :in-links excluding those pages who link back to the original site.

While search engines are fast to identify mutual links, larger circles are harder. A collection of 100 spam sites linking to each other in a cylce to increase rank of all of them is hard to detect. Write a third function that sorts considering only :out-links from where there is no way to reach the original page. So you will need a reachable? function that takes two links and tells if we can reach one from the other.

Remember that you can run Clojure programs like this:

java -cp /path/to/clojure.jar clojure.main -i file.to.run.clj

git history usage, and coding quality (good function decomposition and names, higher-order functions when required, functional programming principles, unit testing) are the prime focus. Performance is not a focus of this assignment.

Submission is via your git repository as for Assignment 1/2.

Unit tests

Write tests of any functions you are going to write before writing that function and check them in. So an earlier checkin contains the test and a stub implementation and a later checkin contains the actual implementation. Separate checkins are required so that the order can be verified when grading. For example:

(def test-square
    (and
        (= 4 (square 2))
        (= 49 (square 7))))
(defn square [x] "stub")

More git usage

In this assignment, at some point intentionally corrupt code in a file and commit it. Then do git log to find the commit hash and git diff <commit-hash> <filename> to confirm the old correct file and finally git checkout <commit-hash> <filename> to bring back the correct file. Also try git reset <commit-hash> which takes the whole branch to the older commit. The newer commit exists as an orphan unnamed offshoot if you ever want to get something from there. This will be visible in the history of your submitted repository as an off-shoot commit-hash that was never merged and is not a named branch.

The other thing to do is make a new clone of the repository (either clone command or a local copy). Make some small changes and push in one directory and some different changes in the other directory and push. The second push will fail. At this point run git pull to pull the changes from the server which were pushed from the first copy. Once you pull and merge, you will be able to push.

You also need to remove all binary files from your repository (git rm) and list them in .gitignore in root directory so they are not added back when you try to commit all changed files. For example the following .gitignore excludes all class files and all files in tmp subdirectory. .gitignore should be checked in.

*.class
tmp/

Warnings

Taking any code from Internet, from each other, helping each other debug or having access to each other’s code is strictly prohibited. Any perpetrators will be forwarded for strict action. Any guidance on design or even examples you study from the Internet should be quoted in your README file.