Rationals in Clojurescript

Last night, I uploaded some code to compute the Voronoi Diagram of a set of points to github. This post is not directly about that code -- I'll write about that when it is completely working – but rather about something that I noticed when I tried to convert the code to Clojurescript.

There are a few things that work in Clojure but are known not to work in Clojurescript: I had to convert calls from lazy-cat to concat, for example. However, the one difference that really broke things for me was the lack of support for rational arithmetic in Clojurescript.

One major element of my algorithm is a test to determine whether a given point is on a given line segment. With integer coordinates for the sites that determine the Voronoi Diagram and the bounding box, everything about this test can be represented as a rational number. Since Clojure gives you rational numbers by default, while Clojurescript simply uses Javascript's arithmetic facilities, the test works in Clojure but not in Clojurescript.

So what can be done about this? Here are some options:

  • Use floats, but more smartly

    Probably the simplest option: here we would test not for equality but for almost-equality. That is, if the distance between two points is a very small number (e.g. a millionth), and all of the input is integer, then the two points are probably the same. This option is obviously not too satisfying from a theoretical standpoint, especially if the small number is hardcoded. I am fairly sure that there has been research to determine what the exact value of the small number should be, but using it would take the code out of the realm of simplicity.

  • Implement rationals in Clojurescript

    I am not sure how possible this option is. However, it is currently the most appealing to me. Rational arithmetic is not so difficult to implement, especially for the limited use case that I need it for: I am pretty sure that I could get away with /, *, +, -, and the comparison operators. These are currently implemented as macros in Clojurescript, which means that they take Clojurescript code and turn it into Javascript code. This allows the Clojurescript compiler to change the prefix syntax of Clojurescript into the infix syntax of Javascript.

    If we were to redefine the operators to be functions, we might have a chance to run different code given different types of input. The defmulti and defmethod macros seem perfect for this. The major question is whether it is possible to shadow the names of the operators but use them nevertheless. That is, will I be able to add (:require-macros [clojure.core :as core]) to my namespace and then use, for example, core/*?

    The other downside, beyond the potential difficulties, is the fact that the generated Javascript will no longer be quite as pretty. What used to be infix notation will be converted to function calls. Additionally, there will be a fairly high penalty for doing arithmetic – three or four function calls per arithmetic operation (assuming the polymorphic solution I have in mind works) rather than the current single operation.
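The first option above can be sketched in a few lines. The helper names and the hardcoded epsilon here are my own inventions, not anything from the Voronoi code:

```clojure
;; Compare numbers (and points) for "almost equality". The names
;; approx= and point-approx=, and the epsilon of one millionth, are
;; illustrative only.
(defn approx=
  ([x y] (approx= x y 1e-6))
  ([x y eps] (< (Math/abs (double (- x y))) eps)))

(defn point-approx= [[x1 y1] [x2 y2]]
  (and (approx= x1 x2) (approx= y1 y2)))
```

This makes the theoretical weakness visible: the choice of eps is arbitrary, and a wrong choice can declare distinct points equal or equal points distinct.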
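The second option – the one the multimethods would enable – could look something like this sketch, where a rational is represented as a plain map. Every name below is invented for illustration and exists in no library:

```clojure
;; A sketch of polymorphic arithmetic via defmulti/defmethod.
;; A rational is a map {:n numerator, :d denominator}; gcd assumes
;; positive integers, which is enough for a sketch.
(defn- gcd [a b]
  (if (zero? b) a (recur b (mod a b))))

(defn rational [n d]
  (let [g (gcd n d)]
    {:n (quot n g) :d (quot d g)}))

(defn- kind [x]
  (if (map? x) :rational :number))

;; Dispatch on the kinds of both arguments, as a redefined * would.
(defmulti mul (fn [a b] [(kind a) (kind b)]))

(defmethod mul [:number :number] [a b]
  (* a b))

(defmethod mul [:rational :rational] [a b]
  (rational (* (:n a) (:n b)) (* (:d a) (:d b))))
```

A complete version would also need methods for mixed :number/:rational arguments, plus the same treatment for /, +, -, and the comparison operators.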

Conclusion

I am going to try to implement the rationals in Clojurescript first. At the very least, I will learn a bit more about how multimethods work in Clojure.

Posted Wed 09 Nov 2011 09:13:06 AM PST Tags:

A small followup

My previous post said that I would be using the macros defined in Clojurescript's core.clj. It turns out that doesn't make a lot of sense. I am using the functions defined in core.cljs instead.

Posted Wed 09 Nov 2011 02:03:16 PM PST Tags:

More about rationals

While attempting to implement support for rationals yesterday, I stumbled on what is (or at least seems to me to be) a bug in the Clojure compiler itself: one cannot refer to the function / from namespaces other than clojure.core. Now, this is probably the first time that anyone has wanted to do this, so it's not surprising that no one had seen the bug before. But what was surprising to me was that I could fix it fairly easily. I submitted my patch, and I should be a real-life contributor to Clojure pretty soon. For someone who doesn't see himself as a compiler guy, that's pretty exciting.

The next problem I'm having is that redefining the basic arithmetic operations does not seem to work properly in Clojurescript. The compiled Javascript is trying (at least as far as I can tell) to redefine cljs.core/+ rather than rationals/+, for example. This is the problem that I am going to try to solve today.

Posted Thu 10 Nov 2011 08:55:06 AM PST Tags:

A new system

This is the obligatory post about how the new blogging system works. Isn't that exciting?

By my standards, this is a fairly simple system. It is based on ikiwiki, which allows me to use org-mode to edit my posts and git as the version control system. Since I already know those tools, I don't have to think about them and can just focus on writing.

I'm less comfortable with making web pages look nice, but I took the CSS file from the nice people at antportal.com and added my own panoramic photo that I took while hiking. I think the output is simple and looks pretty good.

Finally, I am currently hosting the pages on github.

What I intend to write about

The blog is currently called "Hiking and Coding". So far I've only written about the latter, but I'm planning some posts about the former. My current objective is to write once a day as a way to solidify what I'm thinking about for the day, so my posts will likely be about what I'm working on. As such, expect them to be really boring. I'm okay with that. I'm also okay with you not subscribing.

On days when I'm not really working on anything, I will try to write about adventures I had while hiking. Those posts will be tagged with the hiking tag. They might be more exciting (though they might not).

Posted Thu 10 Nov 2011 06:13:21 PM PST Tags:

Resolving names as they are compiled in Clojurescript

I think I found the bug in the Clojurescript compiler that I was looking for yesterday. Just to refresh the memory, when defining a function that has a function of the same name in cljs.core, the compiler will assume that you are trying to redefine the function from cljs.core, rather than the function in the namespace that you are actually in. Since I am trying to redefine functions like / and *, this is a problem.

Let's look at the evidence. First, there's this:

(defmethod parse 'def
  [op env form name]
  (let [pfn (fn ([_ sym] {:sym sym})
              ([_ sym init] {:sym sym :init init})
              ([_ sym doc init] {:sym sym :doc doc :init init}))
        args (apply pfn form)
        sym (:sym args)]
    (assert (not (namespace sym)) "Can't def ns-qualified name")
    (let [name (munge (:name (resolve-var (dissoc env :locals) sym)))])))

The important line there is the last: when defining a new name, resolve-var is called on the symbol. Let's have a look at that function. There is a cond, and when none of the tests return true, the following is done:

(munge (symbol (str
                (if (core-name? env sym)
                  'cljs.core
                  (-> env :ns :name))
                "." (munge (name sym)))))

This checks whether the symbol name is in cljs.core and, if it is, sets the namespace of the symbol to cljs.core. Normally that would be correct – one doesn't want to need to use cljs.core in every file – but it doesn't allow for redefinition of functions that are in cljs.core (at least without completely shadowing them).

What to do

So what can we do about this? The first thing that seems odd to me is that the symbol being defined is being resolved first. Every symbol that is defined is defined within its own namespace, so there should be no need to resolve it. That suggests that we should be able to take the part of resolve-var that doesn't have a special case for cljs.core and put it into the parse method. Something like

(let [name (munge (:name (symbol (str (-> env :ns :name) "." (munge (name sym))))))])

might work in the parse method.

Posted Fri 11 Nov 2011 06:23:38 AM PST Tags:

Done!

I was close with my guess in the last post. What ended up working was

(let [name (munge (symbol (str (-> env :ns :name) "." (munge (str sym)))))])

For some reason, (name sym) would crash the compiler from the parse method, but it wouldn't from other functions called from the parse method. I finally gave up worrying about that and just used (str sym) instead, since that is guaranteed to be the same in this case – there is an (assert (not (namespace sym))) just before the let line. Someone who knows more about the compiler than I do might want to try to figure out why that is.

I also needed to allow names defined in cljs.core but redefined in another namespace to be called or referred to. This involved changing the resolve-existing-var function. Again, cljs.core is hardcoded:

(let [full-ns (if (and (core-name? env sym) (nil? (get-in @namespaces [(-> env :ns :name) :defs sym])))
                'cljs.core
                (-> env :ns :name))])

I added the second test: it simply asks whether the variable has been defined in the namespace. If it hasn't, and the variable is defined in cljs.core, only then is the namespace set to cljs.core.

Remove the macros

It was also necessary to remove many of the macros from core.clj in order to redefine the functions that I needed to in my project. As I noted earlier, my project is to implement rationals in Clojurescript, so I need to redefine most of the functions that work with numbers. (At least, that is the way I am implementing it -- there may be a better way that I don't know about.) Many of these functions were implemented twice: once as a macro in core.clj and once as a function in core.cljs. I am obviously biased towards being able to redefine these functions, so I think the macros should be removed, but either way, at least one of the implementations is redundant.

Posted Fri 11 Nov 2011 05:07:52 PM PST Tags:

Programming challenges

One of the interesting aspects of looking for a job this time around is that most of the companies I have applied to have asked me to do a programming challenge as a kind of weed-out step. It has been fun and a good way of staying in practice. In fact, I have resolved to learn at least one new feature of the programming language that I am using for each one.

Time series framework

My latest programming challenge is to build a framework for time series generation. The framework will take a function $f$ that looks at the $n$ previous timesteps and returns the next timestep. The other goals are:

  • Efficiency

    Adding a new timestep should be an $O(1)$ operation.

  • Ease of debugging

    If an error occurs in computing the next timestep, the stack trace should be easily comprehensible and show the current state of the time series.

  • Traceability

    It should be possible to see the differences between two different functions.

Ideas so far

This seems like the perfect time to learn about monads. Yikes! I've tried before, but have never seen a practical use for them. As far as I understand so far, they should be very good for storing the state (i.e. the previous $n$ timesteps).

A simpler solution might be just to use lazy sequences. There's not much to such a solution, so I might just implement it first and then try to replicate it with monads. In this solution, the previous $n$ timesteps are simply the last $n$ elements of the lazy sequence. Getting those elements might be tough to do in constant time, but I have a feeling it's possible.
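The lazy-sequence idea can be sketched directly; the names here are invented, not from any library. The window of previous values is kept as a small vector, so each step costs time proportional to \(n\) rather than to the length of the series:

```clojure
;; A sketch of the lazy-sequence approach. `step` sees a vector of the
;; previous (up to) n values, most recent last, and returns the next
;; value in the series.
(defn series [step init n]
  (letfn [(go [window]
            (lazy-seq
             (let [x (step window)
                   window' (vec (take-last n (conj window x)))]
               (cons x (go window')))))]
    (go (vec init))))

;; Fibonacci as a time series: the next value is the sum of the window.
(take 5 (series #(reduce + %) [1 1] 2))
;; → (2 3 5 8 13)
```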

On the debugging front, there shouldn't be too much complexity: it should just be a combination of try, catch, and throw. There's also clojure.stacktrace if I want to get even fancier.

As far as tracing goes, I might have a look at tools.trace to see if there's anything interesting there that I can use. I think that monads might provide another way of doing the same thing. We'll see which is simpler. It might also be nice to use incanter to display the differences between two (or more) different functions.

On to the coding!

Posted Sat 12 Nov 2011 08:27:15 AM PST Tags:

What we have so far

Well, yesterday was either a very productive day or a very lucky day for me. I read up on monads – I found that khinsen's tutorials and Jim Duey's tutorials complemented each other nicely. Then I went for a long run (about 18 miles I think) and let the ideas sink in. When I got home, I found state-m and m-seq and the job was almost done.

Just as a reminder, yesterday's task was to build a time-series framework where a function \(f\) accepts the previous \(n\) outputs of \(f\) as input to generate a new output. Here is my solution, edited for clarity (the actual solution is in a github repository):

(defn time-series [f init-state n num-iterations]
  (let [call-f (fn [state]
                 (let [retval (f state)
                       num (count state)
                       new-state (vec (if (= num n)
                                        (drop 1 (conj state retval))
                                        (conj state retval)))]
                   [retval new-state]))]
    ((with-monad state-m
       (m-seq (repeat num-iterations call-f)))
     (vector init-state))))

As you can see, the heavy lifting is done by the state monad and by m-seq. What do they do? Well, m-seq can be thought of as taking a list of functions and calling them in order, returning a list of their return values. The list of functions in this case is the same function (call-f) over and over. All call-f does is call \(f\) and change the state.

It might appear that call-f is an \(O(n)\) function. After all, we call count on the state, which is a vector of at most \(n\) elements. However, almost all the behind-the-scenes data structures that Clojure uses (including the vector data structure) are guaranteed to have \(O(1)\) performance for calling count. So that's nice, and call-f has the performance characteristics that I desired.

Monads

They're still a slightly confusing and magical-seeming thing to me, but monads have some obvious utility. I guess the way to think about them is that if you put your input and output in a common format, then they aid you in stringing functions together very easily. I used them in this case to encapsulate state, and that is probably the most common use of them in Clojure (judging by the number of state functions in the Clojure monads library). However, I think I should probably try to understand all the monads in the library and know how to combine them.

I could have written the function above without monads, but the code would not have been nearly as concise, and there probably would have been many more edge-cases to consider.

The rest

I also wrote some stuff to make debugging easier, and used incanter to compare multiple time series. However, neither required me to bend my brain as much as monads, so I won't talk about them very much. In fact, not at all, because that is the end of my post.

Posted Sun 13 Nov 2011 06:12:16 AM PST Tags:

Starting on the Appalachian Trail

This is my first post about hiking, so I think it should be about my first days really hiking. That would be July 1 and 2, 2010. It already seems like an eternity ago, so much has happened since then.

July 1

I had been at a conference in Quebec City in the days leading up to July 1. It had gone well and I had given a talk that I thought was pretty well-received. But more important for me on that day was that I had gone to MEC and bought a bunch of backpacking gear. In fact, I had mailed all my possessions home from Germany to Calgary the week before, so I was really planning to live on the trail. I had no alternative.

I shaved one last time before my trip, took my new gear, and walked to the rental-car agency. I had booked a car the day before and knew it would be open, even though it was Canada Day. The drive from Quebec to Maine was long and pretty boring. The only slightly-eventful thing that happened was that the border guards really gave me a hard time, eventually saying something like "Well, we have to let him in, he's a citizen." I guess they don't have much to do at that particular crossing, so they make the most of it when they get someone with even a slightly strange story.

I arrived in Bangor, Maine and needed to do some food shopping. I knew that I had the dreaded 100-mile wilderness ahead of me, meaning that I might not get to a store for 10 days. The stuff I bought then seems rather comical now. I had a pound of rice and four or five Indian meals that were not dehydrated. Also a full jar of peanut butter and a full jar of jelly. And two different kinds of non-dehydrated hummus. I'm pretty sure it all fit into my food bag eventually, but it took some doing.

I returned the car at the airport and then took the bus from the airport to the bus terminal. It turned out that the bus terminal was within walking distance of the rental-car agency, but I didn't find that out until I was already on the bus.

I got to the bus terminal and asked for a ticket to Millinocket. It turned out that I wasn't the first to do that on that day: two other people who were headed to the trailhead were already there. They would become known to me as Picker and Grinner, and I would walk with them for about 1000 miles. They obviously knew what they were doing. They had sent their food ahead to the Inn at Millinocket and had booked a ride there. Grinner had also already hiked the trail, so I was sure that I was safe tagging along with them.

At the inn, I met a couple more people that were starting the same day as me: Oblio and his dad. We had a beer at the bar down the street and discussed the trail ahead.

July 2

After a night of pretty poor sleep (I was excited and there were snorers), we got up bright and early at 5:30 the next morning. The inn's rate included breakfast at the diner, so I met up with Oblio and his dad, and we came up with trail names for ourselves. Since I had been doing my postdoc in Computational Geometry just before the trail, I became known as Dr. Geo. Oblio's dad was in marketing, and assured me that this was a fine name.

We piled into the innkeeper's truck and headed for the trail. The first task of a southbound AT thru-hike is to climb Mount Katahdin. In fact, the climb doesn't even count, because the trail starts at the top. Since we would be coming back, we left our full packs at the Baxter Park ranger station, and took the loaner day-packs that they provide. That would be the only time that I hiked without my full pack over the next 4500 miles.

It was a good thing that we had light packs, because Katahdin is a monster. It is probably the most technical climbing on the trail, requiring you to pull yourself up with your hands at many points. However, the views are great. At some points, you feel like you are in an airplane, you can see so far. There are even great waterfalls near the base. I know some people bathed in the pools beneath the falls, but the water was far too cold for me. However, I did get my picture taken at the summit so that I could prove that I had started the trail.

We started our hike from the summit back down the mountain. It was even harder going down than going up. When we got to the bottom, everyone was pretty tired. Especially for the first day of hiking, Katahdin is really hard. We made our camp at Baxter Park. Of course, I was very poorly organized and hadn't realized that you must book a campsite there, so I shared with Oblio and his dad.

It was a night of firsts for me. I hadn't lit my stove before. It didn't want to work with my flint, so I had to borrow a lighter. I hadn't used a water filter before, so I needed to do that for the first time as well. Cooking outside was also a new experience, and I undercooked the rice pretty woefully. But after a hard day of hiking, any warm food is pretty good. After that, I set up my brand-new hammock for the first time and climbed into my brand-new sleeping bag. The hammock sagged so much that my bum almost touched the ground, but I didn't care – I was in bed and my journey had started.

Posted Sun 13 Nov 2011 07:57:32 AM PST Tags:

Scattered

I am doing a bunch of things today, all fairly small. I think it's because I finished working on the time series framework yesterday, so I'm kind of at loose ends.

New programming project

One thing that I would like to do is to take all the messages that I sent with my SPOT device while hiking and make a small animation of them on Google Maps. This would be another use of Clojurescript. My previous project – a Clojurescript program for computing the Voronoi Diagram of a set of points input by a user – is a bit stalled at the moment, so it's good to have something else to work on. Also, I feel like adding points to a Google Map is something that Clojurescript is better suited to, rather than the fiddly numeric stuff that I was trying to make it do. I will come back to the Voronoi Diagram at some point (hopefully fairly soon), but I am putting it on the back burner for the moment.

So far, I see two major tasks:

Getting the coordinates and times of the SPOT messages

Every SPOT message is an email that contains the coordinates of where and when the message was sent. Parsing the email can be done offline, so I will probably just use Clojure for that.
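The parsing step might look something like this sketch in Clojure. The line format of a SPOT email is purely an assumption on my part – I'm pretending the body contains lines like "Latitude: 47.6097" and "Longitude: -122.3331":

```clojure
;; Pull a labelled decimal coordinate out of an email body. The
;; "Label: value" format is a hypothetical stand-in for whatever the
;; real SPOT emails contain.
(defn parse-coord [body label]
  (when-let [[_ v] (re-find (re-pattern (str label ":\\s*(-?\\d+\\.?\\d*)"))
                            body)]
    (Double/parseDouble v)))

(defn parse-spot [body]
  {:lat (parse-coord body "Latitude")
   :lng (parse-coord body "Longitude")})

(parse-spot "Latitude: 47.6097\nLongitude: -122.3331")
;; → {:lat 47.6097, :lng -122.3331}
```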

Putting them on the Google Map

The API for using Google Maps appears to be easy to use and well-thought-out. There's even a way to drop the markers at different intervals – just what I want to do.

Notmuch

I am also playing with the notmuch email client. So far I quite like it. It's a heck of a lot faster than my previous mail client, mainly because it doesn't do as much. Like gmail, it is optimized towards two operations: searching and tagging. I never really used tags in gmail, because doing so involves clicking around, but it's much faster in notmuch. So far, this seems like the big advantage for notmuch.

Git annex

In an effort to use all the software written by Joey Hess (not really, but it seems like it), I am also trying out git annex.

So far, I have come up with one really compelling use case -- syncing podcasts to my mp3 player. My mp3 player has the problem that it will delete all the files on it at random times. Thus, I need to keep a backup of all the files on it on my computer. However, when I am done listening to a podcast, I like to remove the file. Hopefully, I will be able to sync the files that I remove myself (that is, remove them from the laptop), but not the files that get removed by the stupid thing crashing.

Another thing that I would like to do is put my music and video collections in git annex. I'm not sure that anything special would come from doing that, but it seems nice to have these things in version control.

Posted Mon 14 Nov 2011 09:11:23 AM PST Tags:

Another programming challenge

I woke up with another programming challenge in my inbox this morning. This one is from Square. They are a company that gives you a little device so that you can accept credit cards using a smart phone. The challenge is here – basically it's about removing credit card numbers from logs. There are some details about what constitutes a credit card number, and how such numbers are distinct from other numbers. I won't repeat that here, but you should read it to understand what's coming up.

As usual, I implemented my solution in Clojure. I particularly liked the function I called luhn-check. Here it is:

(defn luhn-check
  "Takes a sequence of digits and determines whether they pass the Luhn test.
   The sequence must be in order from right to left."
  [digits]
  (->> digits
       (partition 2 2 (repeat 0))
       (mapcat #(vector (first %) (* 2 (second %))))
       (reduce #(+ %1 (int (/ %2 10)) (int (mod %2 10))))
       (#(mod % 10))
       (= 0)))

I think this is a good illustration of Clojure's ->> ("thread-last") operator, sometimes called the "thrush" operator. Basically, this operator takes the result of one expression and inserts it as the last argument of the next expression.

In this case, we start out with a list of digits. We partition them into pairs, padding with a zero if necessary. We then double the second number in each pair and concatenate the pairs back into a single list. Next we sum up all the digits in the list, using the reduce function, which leaves us with a number. We take that number modulo 10 and test whether the result is zero. If it is, we return true; otherwise we return false.
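To make the stages concrete, here is the same pipeline traced by hand on the digits of 79927398713, a standard Luhn-valid test number, fed in right-to-left as the docstring requires:

```clojure
;; The digits of 79927398713, rightmost first.
(def ds [3 1 7 8 9 3 7 2 9 9 7])

(partition 2 2 (repeat 0) ds)
;; → ((3 1) (7 8) (9 3) (7 2) (9 9) (7 0))

(mapcat #(vector (first %) (* 2 (second %)))
        (partition 2 2 (repeat 0) ds))
;; → (3 2 7 16 9 6 7 4 9 18 7 0)

;; Summing the digits of those values gives 70, and 70 mod 10 is 0,
;; so the sequence passes the check.
```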

All of this could be written with nested expressions (and the thrush operator is simply a macro that rewrites what you see as nested expressions), but I think the way it is written nicely shows what the data is doing in the function.

Anyway, I don't know if this solution will get me a job, but it was a bit of fun coding and didn't take too long.

Posted Tue 15 Nov 2011 12:22:19 PM PST Tags:

Where I was

I finally have a Clojurescript application that works basically the way I want it to. It shows a Google Map of the US onto which is added a series of markers. The markers are placed at the positions that I sent a SPOT message on one of my hikes. They are dropped at intervals that correspond to the intervals between me sending them out. The whole thing takes a minute to watch. There are a couple of parts where it seems like nothing is happening – those are winter when I wasn't hiking and a period of time between when the SPOT stopped working and when I got the replacement. The app is here. The source code is on my github page, but it is mostly unedifying.

Posted Wed 16 Nov 2011 08:20:05 AM PST Tags:

Introduction

I'm sitting in an airport, breathing the recycled air and jet exhaust, so I thought I'd write a bit more about my hiking experiences. I don't have my trail guide with me, so I won't be able to include as many details as I would like, but hopefully I can recall the things that made the biggest impression on me.

The 100-Mile Wilderness

Right after the huge challenge of climbing Mount Katahdin, southbound hikers on the Appalachian Trail must deal with about 100 miles without any stores or conveniences. As I mentioned in my last hiking post, this means big and heavy packs for the hikers. Also, my lack of experience meant that I was carrying things that I didn't really need, so when I set out on the first morning I was really loaded down.

I got out of camp a bit later than Oblio and his dad, and walked out of Baxter Park. I don't remember too much about the first day. There was a decent-sized river that we walked beside for a while and I thought I saw a beaver or some other water animal, but I couldn't get close enough to get a picture.

We officially entered the 100-mile wilderness when we passed this sign, and didn't walk more than 100 yards before we saw some moose dung. That seemed like a good sign to me. Where there's dung, there's animals. A couple of miles later, it was time to call it a day at the first shelter on the AT. The log in the shelter warned of rodents, so I immediately decided not to sleep in it. We also met some northbounders (nobos in AT parlance). At that time, I viewed them as something like heroes. These were people who were just 10 or 12 miles from finishing the same hike that I was on. Wow! They were so close to the finish line that they were pushing on that evening, and in it they would probably cover half the miles that we had done in the entire day.

The next day, I got up fairly early. There was a small hill, and then the trail goes around a lake. The trail was mucky and the boards (called "bog boards") that were placed for people to walk on were disintegrating. Finally we got away from the lake and the trail conditions became a bit better. We crossed this bridge and got to the Rainbow Stream shelter (I see that Picker and Grinner got there before me). I took a small nap on the pine needles behind the shelter and cleaned off my legs in the pool below this waterfall. Finally, I started walking again. About two miles later, I stumbled and decided that I must be getting tired and that I should look for a place to camp. Luckily, I found that Oblio and his dad camped just a few hundred yards later, so I stopped there for the night.

I don't really remember much about the next couple of days. On one of them, I got bitten by a mosquito on the lip, and my lip swelled to twice its normal size. There was also a river that was reputed to have good swimming, the water being quite warm because it is lake-fed. I hiked a couple of miles past it and camped at the lake that was its source. A float plane landed on the lake while I was there, but it was all the way on the other side, so it wasn't too bothersome. I had a quiet night, but Picker and Grinner didn't; there was a group of Quebecois Girl Scouts at the campsite next to the river and they stayed up late chattering.

Picker and Grinner passed me early the next morning in what would become a pattern. The two of them were much more efficient than me in the morning, especially while I was still cooking oatmeal for breakfast. That day was the first where I had a big choice to make. Would I hike fast and get over White Cap or would I stop just before it? I didn't have a data book at that point, so I had no real way of knowing how hard White Cap would be, but people had been hyping it for a while as a pretty hard climb.

White Cap

When I got to the shelter just before White Cap, I made the choice to go on. The next shelter was just two or three miles away and it wasn't yet late. This turned out to be a slight mistake. The climb up White Cap wasn't too bad (though it would turn out to be too much for Oblio's dad – he had to be helped down the mountain and he and Oblio called it quits there), but there were about three more mountains to go over before I got to the campsite. It was in that moment that I first learned the value of having a data book, though I don't think it quite sunk in until I got through with Maine.

Another reason that hiking over White Cap that evening was a mistake was that there was a large group of Quebecois boys at the campsite and they kept making noise until about one the next morning. However, I did get woken up by a moose, so maybe that's a wash.

Chairback

The next day was probably my shortest full day of hiking on the trail. I was clearly tired from my adventure (and the noise made by the boys) the night before and I just walked pretty slow in the morning. I also had not counted on the climb up the next mountain – Chairback Mountain – being straight up. People in Maine have clearly not heard of switchbacks, because there were none. When I was getting very close to the shelter at Chairback Mountain, there was a large climb on talus (big rocks that are just sitting on each other with nothing to keep them from falling). That's also not my favorite thing.

I had told myself that if I didn't get to the shelter by 3, then I would stay there. I met an older man who was already staying at the shelter and asked him what time it was – he said that it was after three, and I was quite relieved. We talked for a while -- his trail name was Frosty and he had been on the trail for a day longer than me. I was a bit proud of myself for passing someone already, but Frosty had taken a bit of a side trip.

I went to set up my hammock, and while I was doing that, I heard Picker and Grinner arrive. They had stayed at the shelter before White Cap and we had not seen each other that day. Grinner asked Frosty if he'd seen me, and when he pointed me out, she came running over and gave me a big hug. It was a huge help to have someone glad to see me at the end of what was my first really hard day on the trail. It was just one of the reasons that I give a lot of credit for my success on the trail to Picker and Grinner.

The rest

I stayed with Picker, Grinner, and Frosty for the rest of the 100-mile wilderness. There were more mountains and campsites, but nothing that stands out too vividly in my memory (though this waterfall is pretty nice). Frosty turned out to be a great hiking companion as well. His pack was lighter than everyone else's, so he had no problems keeping up, and he had a great sense of humor. He had planned a lot better than I had, and finished all the food in his pack by breakfast on the final day before we got to Monson. I think he had to eat turkey stuffing because he had eaten some breakfasts for supper on an earlier night. Luckily, we were only a few miles from the road, so no one starved. (That being said, everyone was talking about food at that point – we were getting pretty hungry and all of us lost quite a bit of weight.)

Monson

We called Shaw's in Monson from the road and they came and picked us up. It was great to get out of the wilderness. The people at Shaw's were so nice and especially great at dealing with tired hikers. They quickly got us into the showers and gave us a nice all-you-can-eat pancake breakfast.

I think I won't get into all that happened in Monson today, but leave that until the next installment of the saga, where we'll meet Wounded Knee. Stay tuned.

Posted Sat 19 Nov 2011 12:34:35 PM PST Tags:

Functional programming

I've gotten a bit obsessed by functional programming over the last couple of years. I have even seriously thought about writing a book that looks at computational geometry (which is the subject in which I was trained) in the light of functional programming. Most of the standard computational-geometry texts these days approach the writing of code from an imperative standpoint.

I think functional programming is a good way to think about computational geometry for a couple of reasons. First, most problems in computational geometry can be expressed in a purely functional manner. That is, the answers to the problems are usually the same given the same inputs. Second, I have bought into the idea that parallel and distributed computing are much easier when state is not being modified. Functional programming forces you to think this way. While many problems in computational geometry are inherently sequential, not all of them are. When a problem that is easily made parallel does come along, using a programming method that allows more cores to be added easily seems like good practice.

One of the things I have been thinking about as a final chapter for the book that I would like to write is the introduction of monads as a way to "parse" polygons. Monads are commonly used (well, commonly in the functional programming world) to parse strings. They generally scan through with the aid of a stack. There are many algorithms that scan through the points of a polygon with a stack -- the example that I have thought about the most is the visibility polygon.

Visibility polygons

A visibility polygon is the subset of a polygon that can be "seen" from a point inside the polygon. Here, we regard the segments of the polygon as opaque walls.

The standard algorithm for computing the visibility polygon is not too difficult to implement in an imperative language. In fact, I implemented it already in C++ for the drawing editor ipe. (Though I think that code is now obsolete since the release of ipe 7.)

I've already tipped my hand at how I would think about implementing such an algorithm in a functional language. I would use a variant of a monadic parser to go through the polygon and find the parts that are visible. (That last sentence doesn't give much of a hint about how it's done, but the whole algorithm would take too much space to describe. Suffice it to say that a stack is kept with the "currently visible" portion of the polygon always on the stack.)

Monads

I used monads earlier to make a framework for creating and comparing time-series. However, I only used a monad that was already defined by someone else. I think I might need to define a monad on my own this time – a slightly daunting idea. I definitely need to read the rest of Jim Duey's tutorial to make sense of how to do it.

Posted Sun 20 Nov 2011 04:58:28 PM PST Tags:

Visibility Polygon

I've put up some preliminary code for finding visibility polygons on my github repo. I have only tested it with one polygon, but so far things are looking good. My idea from yesterday of using monads to "parse" the polygon seems to be paying off. Here is the relevant code:

(defn- add-new-pt [poly]
  (fn [pt stack]
    (when (or (empty? stack) ; First two points are guaranteed to be visible
              (empty? (rest stack))
              (visible? pt (first poly) (first stack)))
      [pt (rest poly) (cons (first poly) stack)])))

(defn- pop-stack [poly]
  (fn [pt stack]
    (let [the-seg (seg/new-seg pt (first poly))
          top-seg (seg/new-seg (first stack) (second stack))]
      (when (pt/left-turn? (second stack) (first stack) (first poly))
        (if (seg/intersection-on-seg? the-seg top-seg)
          [pt poly (cons (seg/intersection the-seg top-seg) stack)]
          [pt poly (rest stack)])))))

(defn- skip-pt [poly]
  (fn [pt stack]
   (let [the-seg (seg/new-seg pt (first stack))
         poly-seg (seg/new-seg (first poly) (second poly))]
     (when (not (pt/left-turn? (second stack) (first stack) (first poly)))
       (if (seg/intersection-on-seg? the-seg poly-seg)
         [pt (cons (seg/intersection the-seg poly-seg) (rest poly)) stack]
         [pt (rest poly) stack])))))

(defn- all-conditions [poly]
  (with-monad polygon-parser-m
    (m-plus (add-new-pt poly) (pop-stack poly) (skip-pt poly))))

This defines three actions to be performed when the polygon and stack are in certain configurations. The first executes when the next point on the polygon doesn't obscure any of the stack. In that case, it removes the point from poly and puts it on stack. The second executes by popping points off the stack until the next point on the polygon no longer obscures the stack. The final condition activates when the next point on the polygon is hidden by the stack. It skips the points of poly until it finds one that would be visible. We combine the three conditions using m-plus, which is defined to try the conditions in order until one returns something that is not nil.
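The polygon-parser monad itself is not shown here, but the m-plus behavior described above – try each condition in order and keep the first non-nil result – can be sketched without any monad machinery. This is just an illustration of the idea, not code from the repo; `try-conditions` and the toy conditions are hypothetical names:

```clojure
;; A sketch of the m-plus behavior described above: apply each
;; condition in order and return the first non-nil result.
;; `try-conditions` is a hypothetical helper, not part of the
;; actual visibility-polygon code.
(defn try-conditions [& conditions]
  (fn [& args]
    (some #(apply % args) conditions)))

;; Toy conditions in the same shape as add-new-pt and friends:
;; each takes pt and stack and returns a result or nil.
(def demo
  (try-conditions
    (fn [pt stack] (when (empty? stack) [:push pt]))
    (fn [pt stack] [:pop (first stack)])))
```

Calling `(demo :p [])` falls into the first condition, while `(demo :p [:a])` falls through to the second – exactly the "first condition that returns non-nil wins" behavior that m-plus gives the real code.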

Is this solution any better than a functional programming solution without monads? I think it is. First, the conditions are easy to see and very explicitly laid out. Using a cond to accomplish the same thing is certainly possible, but gets more complicated as the number of conditions grows. Also, we are explicitly managing state in this solution. Doing so in an ad-hoc manner would be much more difficult. In fact, I am not really sure how I would loop until there are no more points in poly in a solution without monads (especially since not all the conditions consume a point from poly). With monads, it is quite simple:

(defn- visibility-polygon-helper [pt poly]
  ((with-monad polygon-parser-m
     (m-until empty? all-conditions poly)) pt []))

Doing something like that with a reduce seems like it would veer off into unreadability fairly quickly.

Are there downsides to this solution? I think there is at least one. That is, the conditions must be functions that return functions. This makes them a bit more confusing than they really should be. I needed to do this so that I could have a test in the m-until function above. On the whole, though, this is a fairly cosmetic gripe, and can be hidden by using helper functions like the one above.

On the horizon

I would like to test the code quite a bit more. However, doing so by drawing out polygons by hand and then figuring out their visibility polygons is quite tedious. Therefore, I need some sort of GUI to be able to draw polygons and then I will be able to see whether the computed visibility polygon makes sense.

I am thinking that it might be nice to use the clj-processing library to do this. Unfortunately, that library is currently only using Clojure 1.2. Since I am using Clojure 1.3, that might be a problem. So I might need to do some porting from 1.2 to 1.3. However, doing so should give me a good idea of how processing works, and could give a nice tool for more interactive geometry programs.

Posted Mon 21 Nov 2011 11:44:43 AM PST Tags:

Viewing visibility polygons

I mentioned yesterday that it would be nice to see the output of my visibility polygon solution. To that end, I created an extremely simple drawing editor using Processing. I have to say, I loved it.

The most important feature to me is the extreme interactivity. I can change functions extremely quickly with my tools – redefining a Clojure function is either one or two keystrokes in emacs, depending on how I choose to do it – so I like a graphics environment that changes just as quickly. This is what Processing, and in particular clj-processing, offers. I was able to define a function that draws the various objects (in this case, the polygon, the point where the mouse is, and the visibility polygon of the mouse point inside the polygon). If I wanted to make any changes to this function – for example, changing the color of the point where the mouse is – I can simply redefine the function using my emacs tools and the change shows up immediately on the drawing.

I was able to use this to find a couple of minor bugs in my visibility-polygon-finding code, but in general, it worked really well on the first try.

Shortcomings

There were a couple of things that slightly bothered me about clj-processing. First, it used quite a lot of CPU just to display a simple polygon without too many points. I am probably using it rather naïvely, so it is possible that this is my problem rather than clj-processing's. The second problem is just that clj-processing is showing its age. I think it was probably one of the first Clojure libraries out there, and much of its coding style hasn't kept up with current Clojure best practices. For example, some of the features do not work if you only require the library – you must use it. I try to only require libraries, to avoid my namespace becoming overly populated, so it is frustrating when that doesn't work.

Going forward

I need to clean up the code a bit before I can put it up on github, but it should be there soon. It currently needs me to explicitly call the function in order to find the visibility polygon. I would really like it to find the visibility polygon of any point where the mouse is inside the polygon. However, determining if a point is inside a non-convex polygon tends to be a bit harder than it sounds. You can shoot a ray in one direction from a point, and count the number of polygon edges that it crosses – if it's even you are outside and if it's odd you are inside – but what happens if the ray crosses a vertex? There was a good experimental paper on this problem at a recent EuroCG.
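Ignoring the vertex subtlety for a moment, the even-odd test itself only takes a few lines. This is the classic crossing-number sketch, not code from my repo; note that the half-open comparison on the y coordinates is what keeps a shared vertex from being counted twice, and with integer coordinates the division stays in Clojure's exact rational arithmetic:

```clojure
(defn inside?
  "Even-odd rule: cast a ray in the +x direction from [px py] and
   count how many polygon edges it crosses. The half-open test on
   the y coordinates ((> y1 py) vs (> y2 py)) counts each vertex
   exactly once, so most ray-through-vertex cases come out right."
  [[px py] vertices]
  (->> (map vector vertices (concat (rest vertices) [(first vertices)]))
       (filter (fn [[[x1 y1] [x2 y2]]]
                 (and (not= (> y1 py) (> y2 py))      ; edge straddles the ray
                      ;; x coordinate where the edge meets the ray's line
                      (< px (+ x1 (* (- x2 x1)
                                     (/ (- py y1) (- y2 y1))))))))
       count
       odd?))
```

For example, `(inside? [2 2] [[0 0] [4 0] [4 4] [0 4]])` is true and `(inside? [5 2] ...)` is false. A point exactly on the boundary can still land on either side, which is the degenerate case the experimental paper is about.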

So that's a slightly non-trivial problem. I also coded an implementation of the Voronoi Diagram problem recently. I should add a Processing UI to that as well. I have a feeling that would be the easier task to do next, and I would surely discover some bugs in it while I did.

Posted Tue 22 Nov 2011 11:29:29 AM PST Tags:

Using git annex

I remember being a young pup using Debian and reading the mailing lists. I was always seeing the name Joey Hess answering the tough questions. I had great respect for him then, and it has remained to this day. I have probably used his unclutter program for ten years and it has never crashed. I recently installed ikiwiki to run this blog and etckeeper to track changes in my config files, so I figured that I should try his other recent software: git-annex.

I came at it with two objectives: to be able to sync my podcasts and to be able to manage my media files better. The second has worked really well, and the first is hopefully getting there soon. I will concentrate on the second for the moment, since that is what I have working the best. There are multiple use-cases described on git-annex's website – I will just go through mine.

All of my media files are on a terabyte external drive. When I am at home, I have it mounted as an NFS drive on /media/mybook. I have videos there, split into tv and movies folders, as well as music in the music folder. All of the folders are treated the same, so I will just go through the procedure I used in the music folder.

First, I set up the annex:

~ $ cd /media/mybook/music
/media/mybook/music $ git init .
/media/mybook/music $ git annex init mybook-music
/media/mybook/music $ emacs .git/config

at this point, I set the backends keyword in the [annex] section to either SHA1E or SHA256E. There is a tradeoff there between speed and safety – it is a bit more likely that two files hash to the same value with SHA1E, but it is faster than SHA256E. The E at the end of the hash means that the filename extension is preserved in the hashing. This is important for some mp3 players and other programs that cannot tell what type a file is without an extension. The next command takes a long time with a lot of files:

/media/mybook/music $ git annex add .
[All the files are hashed and their contents are put into .git/annex]
[You will see that all your files are now symlinks to files in .git/annex]
/media/mybook/music $ git commit -m "Added my files"

Now it is possible to clone this repository back on the laptop:

~ $ git clone /media/mybook/music
~ $ cd music
~/music $ git annex init laptop-music
~/music $ git remote add mybook-music /media/mybook/music
~/music $ cd /media/mybook/music
/media/mybook/music $ git remote add laptop-music ~/music

And now all of the cool stuff can begin. You can look inside the ~/music directory and see that it appears that all your files are there. However, they are not taking up any space. What is happening is that they are broken symlinks to objects in ~/music/.git/annex. If I want to listen to some of the music, I move it over to the laptop with the following command:

~/music $ git annex get --from mybook-music Beatles/

I can now take my laptop on the plane and listen to the Beatles. When I'm back from my trip, I can free up the space by doing

~/music $ git annex drop Beatles/

Suppose I obtain some new music, getting into the Rolling Stones. Then I can use git annex add to add the Stones into my collection, git commit them and git pull in the correct repository to duplicate the information. I can then use git annex copy or git annex move to put the files where I want them.
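Assuming the remotes set up above, that sequence might look something like the following (the Stones directory name is just for illustration):

```
~/music $ git annex add Stones/
~/music $ git commit -m "Added the Rolling Stones"
~/music $ git annex copy --to mybook-music Stones/
~/music $ cd /media/mybook/music
/media/mybook/music $ git pull laptop-music master
```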

For now, I have a queue of unseen videos and music that I want to hear on my laptop, while the bulk of my media collection sits on my external drive. I can take what I want with me and not worry about my whole collection using up all of my hard drive. I can see what I have in a particular repository with the git annex find command.

As I said, there are many other uses for git-annex and features that I have not yet learned. One thing that I am interested in trying out is using the web as a repository.

Posted Wed 23 Nov 2011 07:02:00 PM PST Tags:

Penance

In a job interview last week I said a couple of things related to sorting in Clojure that turned out to be incorrect when I checked them. This is my way of making up for them – if I can make them true, then no one will notice. Though to be fair, I'm sure that I'm the only one that thinks that way.

The first thing I said was that heap-sort is my favorite sorting algorithm, but then I declined to implement it, preferring to implement merge-sort instead. I think this is actually pretty reasonable in an interview situation – heap-sort is a conceptually more difficult sorting algorithm, so being able to remember it on the spot is more difficult. The second thing I said was that Clojure itself uses heap-sort and that given its love affair with laziness that it would not be unreasonable to assume that (first (sort lst)) was a linear-time operation. I might have read something like this on a mailing list or IRC, but it is not correct. Clojure currently uses Java's sort function, which is a slightly-modified merge-sort. There is not much point in making that algorithm lazy, because getting the first element from the sorted list requires \(O(n \log n)\) time anyway.

Heap Sort

For those that are not familiar with it, heap-sort is something that is usually taught in a second-year undergraduate Computer Science class. So it's not that difficult an algorithm, but it does require some thinking, and there is some fancy analysis that goes into part of it. For a full discussion, see the Introduction to Algorithms book by Cormen, Leiserson, Rivest, and Stein.

Heap

To start with, a heap is conceptually a tree in which values are stored at the nodes. The value at the root is the largest of all the values stored in the tree, and the two subtrees of the root are themselves heaps. Heaps are usually also required to be as close to balanced as possible – every level of the tree is full except possibly the deepest, which is filled in from one side.

Such a tree is usually implemented as an array, where the child nodes of a node can be obtained by some simple arithmetic on the index of the node in the array.
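For a 0-indexed array, that arithmetic is just the following (the helper names are mine, not from any particular implementation):

```clojure
;; In a 0-indexed array representation of a heap, node i's children
;; and parent are found by index arithmetic alone -- no pointers.
(defn left-child [i] (inc (* 2 i)))
(defn right-child [i] (+ 2 (* 2 i)))
(defn parent [i] (quot (dec i) 2))
```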

Building a Heap

Given the definition, there is an intuitive algorithm for building such a heap on a set of numbers: first find the largest number in your set and have it be the root of the heap, then split the rest of the numbers in half, and recursively make heaps on those sets.

Such an algorithm is clearly correct, but it is also clearly \(\Theta(n \log n)\). We can do better with a bottom-up algorithm. If we continue imagining the heap as a tree, we start by putting the input numbers into the tree willy-nilly. This clearly does not satisfy the heap properties laid out above. However, some of it does satisfy the heap properties – the leaves of the tree are trivially heaps. If we go up one level from the leaves, we can fix the trees rooted there by exchanging the root of the tree with its largest child (or not exchanging it if it's already the largest of the three). Higher levels are a bit more difficult, because if the root of a tree is exchanged, then we must make sure to fix the tree that it ends up being the root of. You can imagine the root of a tree traveling down the heap until it is larger than both of its children.

The correctness of this algorithm is a bit harder to see and it also appears to take \(O(n \log n)\) time. It does, but there is a slightly more sophisticated analysis that shows that it is really \(\Theta(n)\). I won't go into the analysis, but a hint is that most of the values don't actually travel very far in the tree.

Using a heap to sort

With the heap properties in mind, we can easily see how to get the largest value of a set of numbers – just take the top element from the heap. How can we use the heap properties to sort? Well, we want the largest number, then the largest number that remains, then the largest number that remains after that, and so on. So if we can take the largest number from the heap and then fix the heap so that it retains the heap properties, then we'd be done.

We just devised a way to fix heaps when building the heap, so we use that. What we do is to take the very last node in the heap (which is not necessarily the smallest, but it doesn't hurt to think about it as the smallest) and put that at the top of the heap. The resulting tree is clearly not a heap, but if we call the algorithm to fix heaps on the root of the tree, then we end up with a heap again. The node that we put on top of the heap might end up traveling all the way to the bottom, so this update takes \(\Theta(\log n)\) time. Thus if we sort the entire input set, we have a \(\Theta(n \log n)\) algorithm.

Advantages

That's the best we can do theoretically, which is great, but the Java sort algorithm is also \(\Theta(n \log n)\), so why is there any advantage to using heap-sort? In a language where laziness is embraced (such as Clojure), heap-sort can be made lazy. That is, the next element of the sorted list can be computed only when it is needed. Since the build-heap procedure described above takes linear time, getting the first element from the sorted list takes \(O(n)\) time. Each subsequent element then takes \(O(\log n)\) time. Thus, if only a small number of elements from the sorted list are needed, then this lazy version of heap-sort is theoretically faster than other sorts.
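To make that concrete, here is a minimal lazy heap-sort sketch using Clojure vectors and a functional style – nothing like the optimized array version in my repo, but it shows where the laziness comes from. The helper names are mine:

```clojure
(defn- sift-down
  "Let the value at index i travel down until it is at least as
   large as both of its children (a max-heap)."
  [v i]
  (let [n (count v)
        l (inc (* 2 i))
        r (inc l)
        largest (as-> i m
                  (if (and (< l n) (> (v l) (v m))) l m)
                  (if (and (< r n) (> (v r) (v m))) r m))]
    (if (= largest i)
      v
      (recur (assoc v i (v largest) largest (v i)) largest))))

(defn- build-heap
  "Bottom-up heap construction: fix each subtree from the last
   internal node up to the root. This is the linear-time step."
  [v]
  (reduce sift-down v (range (dec (quot (count v) 2)) -1 -1)))

(defn heap-sort
  "Lazily emit the elements of coll in descending order. Building
   the heap is O(n); each subsequent element costs O(log n)."
  [coll]
  (letfn [(extract [v]
            (lazy-seq
              (when (seq v)
                ;; take the root, move the last node on top, re-fix
                (cons (v 0)
                      (extract (sift-down (pop (assoc v 0 (peek v))) 0))))))]
    (extract (build-heap (vec coll)))))
```

Only the heap construction is paid for up front, so `(take k (heap-sort coll))` does \(O(n + k \log n)\) work in total.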

I can think of situations where this would actually have practical advantages. For example, what if you were writing a search engine and wanted to obtain the \(k\) best results? You could write an ad-hoc function that found the best result, removed it and recursed \(k - 1\) times. Or you could just (take k (heap-sort input)). The first would take \(O(kn)\) time, whereas the second would take \(O(k \log n + n)\) time. In many practical situations, \(k\) is at least \(\log n\) but no more than \(n / \log n\), which means that the first takes \(\Omega(n \log n)\) time, whereas the second takes only \(O(n)\) time. Essentially, the first is no better than the second solution with a non-lazy sorting algorithm.

Disadvantages

Heap-sort has some disadvantages compared to other sorting algorithms. The least theoretically significant is that the constants hidden in the big-O notation are higher than those of other sorting algorithms (tuned versions of quicksort can have extremely small constants).

Another disadvantage can be seen when dealing with data sets so large that they no longer fit in the computer's main memory. Something like merge-sort can be modified fairly easily so that the number of I/O operations is minimized. I haven't thought about it too deeply, but this doesn't seem quite so easy with heap-sort. However, I think that people dealing with such large datasets should probably be using specialized libraries anyway, so perhaps that isn't too bad.

Implementation

This whole discussion is a bit useless if it only remains at the theoretical level. I have an implementation in my github repo that implements most of the ideas that are given above. The code is highly optimized so that it is somewhat competitive with the native Java implementation. This makes the code on the HEAD of the master branch somewhat less than readable. However, the first commit to the repository used Clojure vectors and a functional style, so if you would like to understand the code, you might start there.

I (unfortunately) needed to use Java arrays and mutation rather than the more functional style that I have gotten used to, but the results speak for themselves. Finding the first few elements of a sorted array is significantly faster than the Java version. Finding the entire sorted list is somewhat slower than the Java version, but not too much. This is not surprising for a couple of reasons. First, heap-sort tends to have larger constants than other sorting methods. Secondly, this code is one day old. The Java sort method has had years to be optimized.

Conclusion

I may be dreaming, but I would love to see this idea (if not this implementation) put into Clojure proper. I think the advantages from laziness outweigh the small constant slowdown versus using Java's sort.

Posted Sun 27 Nov 2011 06:23:34 AM PST Tags:

Leaving Monson

When I last left my hiking tale, Picker, Grinner, Frosty and I had just gotten to Monson in time for breakfast. We were hustled into showers and given clean clothes to put on while ours were in the washer. We decided to stay in Monson for the rest of the day. A bit of hiking terminology: a zero is a day in which no walking is done. What we were doing is called a nearo (sometimes written near-0), short for near-zero. It's one of the most enjoyable ways to spend a day. You only need to pay for one night at the hostel or hotel, but you have all the fun of a town day.

Anyway, breakfast was great – unlimited pancakes, eggs, and some of the best home fries I had on the trail. The big event of the day was the World Cup final. The town of Monson is extremely small. There was one bar and one convenience store as well as the hostel we were staying at, named Shaw's. We watched the final at the bar. I was disappointed that the Netherlands lost, but I remember it being a really dirty game, so I guess the better team won in the end.

Frosty got a six-pack of beer for us to split, and it was through this beer that we met Wounded Knee. The four of us had taken a seat to chat and drink the beer, when Wounded Knee showed up with one of the beers in his hand. He began regaling us with hiking stories and bad jokes. Later, we figured that he probably thought he was paying for the beer by telling us his great stories. I think we would have been happier if he had just taken the beer and left us alone. (Cruel, but true – at least he did provide an excellent topic of conversation over the next month.)

After a rather fitful sleep, Picker, Grinner, and I left Frosty in Monson. He wanted to stay another day to rest after the 100-mile wilderness. Unfortunately, it was the last time we would see him on our hike – he got a stress fracture in his foot and stopped about 50 miles later.

As we were being driven back to the trail, we were given a warning about a "bahd hikeah" (a bad hiker, in the heavy Northeastern accent that is common in Maine). After heaping abuse on the ferry operator, he had apparently chosen to swim a dangerous river for which a ferry is provided. He would not be served in Monson – my first hint that there is a little network of hostel operators up and down the trail. We actually did cross paths with the guy, and he looked more like a motorcycle enthusiast than a hiker, but other than that, there was no indication that he was really a bad guy.

The next day, Grinner was not feeling great. I think it was for this reason that we did not go very far. On the subsequent day, it became clear that Grinner was really sick. Luckily, they had a cell phone with them, and were able to call Shaw's. Grinner was picked up, and as she left, she told me to take care of Picker. Which showed that she hadn't lost her sense of humor, because Picker is a bad-ass Army veteran who had been in Iraq.

Without Grinner

Picker was obviously concerned about his wife, so we made a plan to get to the next town as quickly as possible. We got to the Kennebec river (the one with the ferry) unfortunately after the ferry had stopped for the night. Partly to give myself something to do, and partly to cheer Picker up, I decided to find us some sodas. The small town of Caratunk is near the trail at that point, but it didn't even have a convenience store, so I had to hitchhike to a small resort that had vending machines. It was my first time hitching a ride, and I was glad that it worked out well.

The next day, we caught the first ferry across the river. When one imagines a ferry, it is usually a fairly large boat. This was not. It was a small canoe, operated by a local hippie. I helped row us across the river, and we set out as fast as possible. That was our first twenty-mile day. The scenery was great, with a nice waterfall near the trail, but we were in a rush to see Grinner again. Near the end of the day, the trail was rerouted, adding an extra couple of miles to our day. We were quite glad to get to the campsite on the North side of Little Bigelow Mountain.

This put us only about 15 miles from Stratton, where we assumed that Grinner would be. However, there were two big climbs between here and there. We must have gotten to the top of the first fairly early in the morning. We crossed a road on which someone had painted "2000 mi.", meaning that we still had two thousand miles to go -- even after almost two weeks of hiking.

When we got to the camp site five miles from Stratton, there was a decision to make. We had done 10 miles, but it had been 10 hard miles. The rest of the way to the road would be all downhill, and there would be no more camping possibilities. Picker left the decision up to me, but I could tell that he really wanted to go on. After a rest for my back (which still hadn't gotten as strong as it would get), I decided that we should try to get to the road. We also talked to a northbounder who said that the trail was pretty easy and that we could make it. At that point, I had not learned to mistrust everything northbounders say.

I would have done well to mistrust the northbounder. The descent was really difficult. However, I did see a beaver, so that made it kind of worth it. When we got within two miles of the road, Picker started running. I trotted along behind him, and we got to the road within thirty minutes. We called the Stratton motel and were heartened to find that Grinner was already there. She was feeling better, which was a huge relief.

Stratton

Since we had gone faster than expected between Monson and Stratton, and since Grinner still wasn't 100%, I decided to take an actual zero in Stratton. I got a ride to Rangeley to get some better shoes and hiking poles at the outfitter there. I still have those poles after more than 4000 miles, so I feel comfortable recommending them.

The other big event from our zero day in Stratton is that we found a lady who was selling pies from her house. She didn't have the proper food license to sell slices, so we were forced to buy a whole cheesecake. We had no choice in the matter. I think we showed great restraint by not eating the entire cheesecake in one sitting – we saved some for breakfast the next day.

Oh, and one final story from Stratton. We had caught up to Wounded Knee by getting there a day earlier than scheduled. During conversation, he mentioned that he had an injury that he needed some help putting a bandage on. Somehow, I was volunteered to help him out. It turned out that he had a really bad chafe on his bum and couldn't quite see it to put the bandage on himself. So I got to help out with that. Blech.

Next time

In the next installment, we go through Southern Maine. This includes some of the most scenic and dangerous parts of the trail, including the Mahoosuc Notch.

Posted Mon 28 Nov 2011 08:52:35 AM PST Tags:

Convex Hulls Three Ways

Whenever I watch cooking competition shows, they always have chefs presenting a foodstuff cooked in multiple different ways. Today, I'm doing that with algorithms.

The algorithm in question today is the convex hull algorithm. In ascending order of implementation complexity, and descending order of theoretical running time, they are the Jarvis March, Graham's Scan, and Chan's Algorithm. All three are implemented in Clojure in my github repository.

Jarvis March

The simplest of the algorithms, the Jarvis March was also one of the first output-sensitive computational geometry algorithms. In a nutshell, you first find a point that you know to be on the convex hull, and then you find the next point by looking at all the remaining points and choosing the one whose segment from the current point has all the other points on one side of it. You repeatedly find the next point using this procedure until you get back to the first point. There is basically no way to implement this algorithm that does not have a running time of \(O(hn)\), where \(h\) is the number of points on the convex hull and \(n\) is the number of input points.

The one implementation detail of the Jarvis March that is interesting is that whenever you see the concept of "finding the next thing" given some previous information, the Clojure implementation should almost always be lazy. It turns out that implementing Jarvis March lazily will help in implementing Chan's Algorithm, so keep that in mind.
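A lazy Jarvis March might look like the following sketch. This is not the version in my repo (which takes pre-computed hulls for Chan's Algorithm); it also assumes points in general position, i.e. no three collinear:

```clojure
(defn- ccw
  "Twice the signed area of triangle abc: positive when c is to the
   left of the directed line from a to b."
  [[ax ay] [bx by] [cx cy]]
  (- (* (- bx ax) (- cy ay))
     (* (- by ay) (- cx ax))))

(defn- next-hull-pt
  "The point q such that every other point lies to the left of p->q."
  [pts p]
  (reduce (fn [cand q]
            (if (or (= cand p) (neg? (ccw p cand q))) q cand))
          p
          (remove #{p} pts)))

(defn jarvis-march
  "Lazily walk the convex hull counter-clockwise, starting from the
   bottom-most point. Each element costs O(n) to produce, giving
   O(hn) in total -- but only for the elements actually consumed."
  [pts]
  (let [start (apply min-key second pts)]
    (cons start
          (take-while #(not= % start)
                      (rest (iterate #(next-hull-pt pts %) start))))))
```

Because iterate is lazy, asking for only the first few hull points does proportionally less work, which is exactly what Chan's Algorithm exploits.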

Graham's Scan

Graham's Scan is one of the algorithms I remember most vividly from the undergraduate computational geometry course that I took. The professor, Godfried Toussaint, always referred to it as the "three coins" algorithm, so I have kept up that tradition in some of my function names in my implementation.

The algorithm first makes a polygon of the input points by sorting them by angle about the bottom-most point. Then it goes around the polygon with a stack, pushing and popping points as it goes. If that sounds familiar, it should – it's the same idea as what I was talking about when I brought up the idea of parsing polygons a week and a half ago.

Thus, I used the same polygon-parsing monad in my implementation as when I computed the visibility polygons last week. It still works just as well.
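For comparison, here is what a plain, non-monadic Graham's Scan looks like. This is a sketch rather than the repository's version: it assumes `[x y]` vector points, a unique bottom-most point, and no collinear triples, and the `cross` helper is just the usual signed-area turn test.

```clojure
(defn cross
  "Cross product of vectors o->a and o->b; positive for a left turn."
  [[ox oy] [ax ay] [bx by]]
  (- (* (- ax ox) (- by oy))
     (* (- ay oy) (- bx ox))))

(defn grahams-scan
  "Stack-based Graham's Scan: sort the points by angle about the
   bottom-most point, then walk them, popping any point that would
   make a non-left turn."
  [pts]
  (let [[px py :as pivot] (apply min-key second pts)
        angle  (fn [[x y]] (Math/atan2 (- y py) (- x px)))
        sorted (cons pivot (sort-by angle (remove #{pivot} pts)))]
    (reduce (fn [stack r]
              (loop [s stack]
                (if (and (>= (count s) 2)
                         (<= (cross (second s) (first s) r) 0))
                  (recur (rest s))  ; pop: the top of the stack is not on the hull
                  (cons r s))))     ; push r
            ()
            sorted)))
```

The stack is an ordinary Clojure list, so "push" and "pop" are just `cons` and `rest`; each point is pushed and popped at most once, which is why the scan itself is linear after the sort.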

Since the points must be sorted, Graham's Scan takes \(\Theta(n \log n)\). Sorting can be reduced to computing convex hulls, so computing convex hulls inherits sorting's \(\Omega(n \log n)\) lower bound in the comparison model, meaning that this algorithm is optimal.

But Chan's Algorithm is better. Weird, huh?

Chan's Algorithm

I must confess that I had always been a little intimidated by Chan's Algorithm. It was invented by Timothy Chan, who has a well-earned reputation for being a genius, so I thought it would be really complicated. It's not. There is a decent description of it on Wikipedia, so I won't go into the details. The gist is that you combine the two previous algorithms that I discussed. The Jarvis March needs to be modified so that the points can be input as a list of smaller convex hulls, and the next point on the convex hull is found by doing a binary search on the smaller convex hulls. But that is really the hardest part about the algorithm. I have put the whole thing below, because I think it's pretty beautiful.

(defn chans-algorithm
  "Finds the convex hull of a set of points by
   the algorithm known as 'Chan's Algorithm'."
  [pts]
  (let [bottom-pt (apply min-key :y pts)]
    (loop [m 3] ; start out with triangles
      (let [partitioned-pts (partition m m [] pts) ; pad with [] so the last, short group is kept
            c-hs (map grahams-scan partitioned-pts)
            potential-ch (take m (apply jarvis-march pts c-hs))]
        (if (= bottom-pt (last potential-ch)) ; assumes jarvis-march returns bottom-pt last
          potential-ch
          (recur (min (* m m) (count pts)))))))) ; square the guess and try again
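
One small idiom in the code above is worth calling out: `(partition m m [] pts)` passes an empty pad collection so that the final, possibly-short group of points is kept, where plain `(partition m pts)` would silently drop it:

```clojure
;; With no pad, partition drops a trailing group shorter than m:
(partition 3 [1 2 3 4 5 6 7])
;; => ((1 2 3) (4 5 6))

;; With an empty pad collection, the short group survives:
(partition 3 3 [] [1 2 3 4 5 6 7])
;; => ((1 2 3) (4 5 6) (7))
```

Without the pad, any points in the last short group would never make it into a mini-hull, and the algorithm could return a wrong answer.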

The great thing about Chan's Algorithm is that it is also output-sensitive. But instead of being \(O(nh)\) (which is \(O(n^2)\) in the worst case), it is \(O(n \log h)\), which is at least as good as Graham's Scan, but often better. It is also quite simple to implement, given decent implementations of Jarvis March and Graham's Scan.

Conclusion

Convex hull algorithms are great. If I were ever to teach a computational geometry course (admittedly that's looking like a long shot now), I might start and finish the course with them. The progression from the ultra-simple Jarvis March to the more-complicated Chan's Algorithm is really nice, and there are interesting new things to talk about the whole way. They also show that computational geometry is not so hard to do in a functional style. In fact, using laziness is what makes the implementation of Chan's Algorithm so simple. So this might make a nice talk to give people who are already into functional programming as well.

The next thing I have in mind for this project is to animate the algorithms. Viewing the output of algorithms is already pretty easy using Processing, but I would like to be able to see them as they are operating. It would be great if I could do that without changing the code too much. I have a couple of ideas, but I'm not sure if they'll work yet.

Also, it is slightly embarrassing to admit, but my blogging system seems to not support putting images in posts. So I am going to have to figure out how to work around (or even fix) that before I can show any results.

Posted Wed 30 Nov 2011 10:06:31 AM PST Tags: