Aug. 4th, 2005

synchcola
Okay, now I need to debug it and find less inefficient ways of doing things. ^_^

I'd like to store documents as numpy arrays, with doc[j, 0] the word and doc[j, 1] the frequency, instead of as dictionaries.
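A minimal sketch of that layout, assuming words are already mapped to integer ids (numpy arrays don't mix strings and counts comfortably). The helper names `dict_to_array` and `array_to_dict` are made up for illustration and are not from the existing code.

```python
import numpy as np

def dict_to_array(doc):
    """Turn {word_id: frequency} into an (n, 2) array sorted by word id.

    Column 0 holds the word id, column 1 the frequency, so doc[j, 0] is
    the word and doc[j, 1] its count, as described above.
    """
    if not doc:
        return np.empty((0, 2), dtype=np.int64)
    return np.array(sorted(doc.items()), dtype=np.int64)

def array_to_dict(arr):
    """Inverse conversion, handy while the old dict-based code still exists."""
    return {int(word): int(freq) for word, freq in arr}

doc = dict_to_array({7: 2, 3: 5, 11: 1})
total_words = int(doc[:, 1].sum())  # column operations replace dict loops
```

Keeping the rows sorted by word id is a choice made here, not a requirement: it makes lookups with `np.searchsorted` possible later if they turn out to matter.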

I want some kind of speedup when adding only a few documents. One thing I can do is store the initial Fisher's discriminant and begin the new hill-climbing phase with that. Or I can leave it alone! For large enough documents, I could cache the first two discriminants.
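Roughly what the warm start could look like, as a sketch only. `fisher_direction` is the textbook two-class Fisher discriminant (within-class scatter inverse times the difference of means); the `ridge` term, the random-perturbation `hill_climb`, and the `score` callable are all assumptions standing in for whatever the real objective and search are.

```python
import numpy as np

def fisher_direction(X0, X1, ridge=1e-6):
    """Unit vector proportional to Sw^-1 (m1 - m0) for two classes of rows."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    Sw = Sw + ridge * np.eye(Sw.shape[0])  # assumed: keeps Sw invertible
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

def hill_climb(score, w_start, step=0.1, iters=200, seed=0):
    """Random-perturbation hill climbing over unit vectors.

    Only accepts moves that raise `score`, so the result is never worse
    than the starting point. `score` is a hypothetical objective.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(w_start, dtype=float)
    best = score(w)
    for _ in range(iters):
        cand = w + step * rng.standard_normal(w.shape)
        cand = cand / np.linalg.norm(cand)
        s = score(cand)
        if s > best:
            w, best = cand, s
    return w, best

# Cache the discriminant once, then reuse it as the starting point
# the next time a few documents are added.
rng = np.random.default_rng(1)
X0 = rng.standard_normal((50, 3))
X1 = rng.standard_normal((50, 3)) + np.array([2.0, 0.0, 0.0])
cached_w = fisher_direction(X0, X1)
```

Since the climb can only improve on its start, beginning from `cached_w` costs nothing when the new documents barely change the data, and still recovers when they do.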

I want to add C4.5's capability for determining decision trees from a random subset. (With those two improvements I could really speed up the algorithm when I only add a couple things, but in that case there wouldn't be much difference from ignoring the new stuff completely! And I would like to behave nicely if a mailing list suddenly appears -nyo. :P)
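As far as I know, C4.5 calls this windowing: build the tree from a random subset (the window), test it on everything left out, add the misclassified cases to the window, and repeat. A sketch of that loop follows; `fit_stump` is only a toy stand-in for the real tree learner, and both function names and the `max_rounds` cap are assumptions.

```python
import random

def fit_stump(rows):
    """Toy learner (hypothetical): best single threshold on a 1-D feature.

    rows is a list of (x, label) pairs with boolean labels.
    """
    best = None
    for t in sorted({x for x, _ in rows}):
        for sign in (1, -1):
            errs = sum((sign * (x - t) >= 0) != y for x, y in rows)
            if best is None or errs < best[0]:
                best = (errs, t, sign)
    _, t, sign = best
    return lambda x: sign * (x - t) >= 0

def windowed_fit(rows, fit, window_size, max_rounds=10, seed=0):
    """Train on a random window, then grow it with misclassified cases."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    window, rest = idx[:window_size], idx[window_size:]
    for _ in range(max_rounds):
        model = fit([rows[i] for i in window])
        wrong = {i for i in rest if model(rows[i][0]) != rows[i][1]}
        if not wrong:
            break  # the window already explains every held-out case
        window.extend(sorted(wrong))
        rest = [i for i in rest if i not in wrong]
    return model, len(window)

rows = [(x, x >= 50) for x in range(100)]
model, used = windowed_fit(rows, fit_stump, window_size=10)
```

The learner only ever sees the window, which is where the speedup would come from when most of the data is easy.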
