Summer project
Aug. 3rd, 2005 10:48 pm
It turns out that implementing SIMPL requires me to write two pieces of code: the "hill-climbing" routine that finds good axes of projection, and Ross Quinlan's C4.5 algorithm for generating decision trees. I've finished transliterating a very limited version of C4.5 into Python, and that's here. Its licensing status is dubious, because the code I started from is neither GPL nor public domain. Mu~
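For anyone curious what the core of C4.5 looks like, here is a minimal sketch of its split-selection step. It is not the transliteration mentioned above, just an illustration under simplifying assumptions: it handles a single continuous attribute, scores binary threshold splits by gain ratio (information gain divided by split information), uses midpoints between observed values as candidate thresholds, and leaves out pruning, missing values, and the rest of the tree-building loop. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of splitting a continuous attribute at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that sends everything one way is useless
    n = len(labels)
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    gain = entropy(labels) - remainder
    # Split information: entropy of the branch sizes themselves.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    return gain / split_info


def best_threshold(values, labels):
    """Try the midpoint between each pair of adjacent distinct values;
    return (threshold, gain_ratio) for the best one."""
    distinct = sorted(set(values))
    best = (None, 0.0)
    for a, b in zip(distinct, distinct[1:]):
        t = (a + b) / 2
        r = gain_ratio(values, labels, t)
        if r > best[1]:
            best = (t, r)
    return best
```

For example, `best_threshold([1, 2, 3, 4], ["a", "a", "b", "b"])` picks the threshold 2.5, which separates the two classes perfectly. Dividing by split information is what distinguishes C4.5's criterion from plain information gain: it penalizes splits that shatter the data into many small branches.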
The next thing I need to do should be easier; I just need to write some vector stuff. Also, I can't find the Bayes Motel code, gar. And then SIMPL will be loose upon the world!
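The "vector stuff" and the hill-climbing search for a projection axis might fit together roughly like this. It is only a sketch: the post doesn't say what SIMPL's objective is, so `separation` (the gap between the two classes' mean projections, divided by their pooled standard deviation) is a hypothetical stand-in, and the random-perturbation search, step size, and step count are likewise assumptions. Documents are plain Python lists of numbers, and labels are booleans.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def separation(w, docs, labels):
    """Hypothetical objective: distance between the class means of the
    projections onto w, divided by the pooled standard deviation."""
    pos = [dot(w, d) for d, y in zip(docs, labels) if y]
    neg = [dot(w, d) for d, y in zip(docs, labels) if not y]
    mp, mn = sum(pos) / len(pos), sum(neg) / len(neg)
    var = (sum((p - mp) ** 2 for p in pos)
           + sum((q - mn) ** 2 for q in neg)) / len(docs)
    return abs(mp - mn) / math.sqrt(var + 1e-12)  # epsilon avoids divide-by-zero


def hill_climb(docs, labels, steps=2000, step_size=0.1, seed=0):
    """Start from a random unit vector and keep any random nudge
    that improves the objective; return (axis, score)."""
    rng = random.Random(seed)
    w = normalize([rng.gauss(0, 1) for _ in range(len(docs[0]))])
    best = separation(w, docs, labels)
    for _ in range(steps):
        candidate = normalize([x + rng.gauss(0, step_size) for x in w])
        score = separation(candidate, docs, labels)
        if score > best:
            w, best = candidate, score
    return w, best
```

Plain hill-climbing like this only promises a local optimum. For this particular objective with two classes, though, the score depends only on the axis's direction and has a single peak (up to sign), so the search settles near the best axis.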
SIMPL is a lot slower than Bayes, unfortunately: adding n documents and updating the tree and axes takes Θ(N + n) time, where N is the number of documents already in the corpus. That's a major disadvantage. ^_^; On the other hand, it won't be a limitation for problems of this size.