synchcola | Aug. 4th, 2005

Okay, now I need to debug it and find less inefficient ways of doing things. ^_^

I'd like to store documents as numpy arrays, with doc[j, 0] the word and doc[j, 1] the frequency, instead of as dictionaries.
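A minimal sketch of that array layout, assuming words are first mapped to integer ids so they fit in a numeric array. The helper name `dict_to_array` and the `vocab` mapping are hypothetical, not something from the existing code.

```python
import numpy as np

def dict_to_array(counts, vocab):
    """Convert a {word: frequency} dict into an (n, 2) integer array.

    Column 0 holds the word id, column 1 the frequency, so doc[j, 0] is the
    word and doc[j, 1] is how often it occurs. `vocab` (hypothetical) maps
    words to ids and is extended in place when a new word shows up.
    """
    rows = []
    for word, freq in counts.items():
        # setdefault hands out the next unused id the first time a word is seen
        word_id = vocab.setdefault(word, len(vocab))
        rows.append((word_id, freq))
    # Sort by word id so two documents can be compared with a merge-style scan
    return np.array(sorted(rows), dtype=np.int64).reshape(-1, 2)
```

Calling `dict_to_array({"spam": 3, "ham": 1}, {})` gives a 2x2 array with one row per distinct word. The `reshape(-1, 2)` keeps an empty document as a well-formed (0, 2) array.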

I want some kind of speedup when adding only a few documents. One thing I can do is store the initial Fisher's discriminant and begin the new hill-climbing phase with that. Or I can leave it alone! For large enough documents, I could cache the first two discriminants.
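The warm-start idea can be sketched roughly as follows. This is only an illustration under assumptions: the real objective and search moves are not described here, so `score` is a hypothetical stand-in for whatever the hill-climbing phase maximizes, and the climber is a plain coordinate-wise one.

```python
import numpy as np

def hill_climb(score, start, step=0.1, max_iters=1000):
    """Greedy coordinate-wise hill climbing from a given starting vector.

    `score` is a hypothetical objective to maximize. Passing a cached
    discriminant as `start` is the warm start; passing zeros is a cold start.
    """
    w = np.asarray(start, dtype=float).copy()
    best = score(w)
    for _ in range(max_iters):
        improved = False
        for i in range(w.size):
            for delta in (step, -step):
                candidate = w.copy()
                candidate[i] += delta
                s = score(candidate)
                if s > best:
                    w, best, improved = candidate, s, True
        if not improved:
            break  # local optimum: no single-coordinate move helps
    return w, best
```

If the cached vector is already near the optimum for the slightly larger document set, the loop stops after very few evaluations, which is where any speedup would come from.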

I want to add C4.5's capability for determining decision trees from a random subset. (With those two improvements I could really speed up the algorithm when I only add a couple of things, but in that case there wouldn't be much difference from ignoring the new stuff completely! And I would like it to behave nicely if a mailing list suddenly appears -nyo. :P)
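A simplified sketch of the random-subset ("windowing") idea: train on a random window, add the examples outside it that the model gets wrong, and repeat. This is not C4.5's exact procedure, and every name here (`window_train`, `fit_lookup`, `predict_lookup`) is hypothetical. The lookup-table classifier is a toy stand-in for a real decision tree learner.

```python
import random
from collections import Counter

def window_train(examples, labels, fit, predict, initial_size, max_rounds=10, seed=0):
    """Train on a growing random window instead of the full data set.

    `initial_size` should be at least 1. `fit` and `predict` are supplied
    by the caller.
    """
    rng = random.Random(seed)
    indices = list(range(len(examples)))
    window = set(rng.sample(indices, min(initial_size, len(indices))))
    model = None
    for _ in range(max_rounds):
        chosen = sorted(window)
        model = fit([examples[i] for i in chosen], [labels[i] for i in chosen])
        # Examples outside the window that the current model misclassifies
        missed = [i for i in indices
                  if i not in window and predict(model, examples[i]) != labels[i]]
        if not missed:
            break  # the window already explains everything outside it
        window.update(missed)
    return model, window

def fit_lookup(xs, ys):
    """Toy classifier: memorize the window, fall back to its majority label."""
    return dict(zip(xs, ys)), Counter(ys).most_common(1)[0][0]

def predict_lookup(model, x):
    table, fallback = model
    return table.get(x, fallback)
```

With a learner that generalizes well, the final window can stay much smaller than the full data set, so each refit is cheaper than training on everything.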


Posts Found in a Bathtub
