This is one of the results that comes up when I type my name into Google.
Very cool!
I mean this is what I want to be associated with ;)
Let’s have an update
We are waaaay past June, and no LDA code has been written. Some papers have been read. Some new results out on the list. I spoke with my supervisor and she said my project was a good idea. Let’s see if I can make it…
I kind of stopped thinking about a whole py-lda, because I learned about the Apache Mahout project, which has a perfectly good LDA implementation that can even take advantage of clusters. I want to investigate that further and maybe write Mahout plugins instead of making my own complete ML library…
On the other hand, playing with NumPy arrays from C and Python will probably be a good exercise in efficiency (and pointer counting).
On the theory side, I have managed to word the problem scientifically, but I am not sure how it can be solved…. wait. I just realized I was complicating the problem unnecessarily. Here is the simpler version:
Let p(x) be a discrete probability distribution over x \in {list of words}

Let
be a set of probability distributions and let

Can you find the optimal
which minimizes the Kullback-Leibler divergence between p(x) and q(x), i.e.
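To make the objective concrete, here is a minimal sketch of the Kullback-Leibler divergence for two discrete distributions over words. It uses the standard definition D_KL(p || q) = sum over x of p(x) log(p(x) / q(x)). The direction (p first, q second), the dictionary representation, and the toy words and probabilities are all assumptions for illustration. The sketch only evaluates the divergence for a given q and does not attempt the minimization.

```python
import math


def kl_divergence(p, q):
    """Return D_KL(p || q) in nats for two word -> probability dicts.

    Assumption: p and q are each normalized to sum to 1.
    """
    total = 0.0
    for word, p_x in p.items():
        if p_x == 0.0:
            continue  # a term with p(x) = 0 contributes 0 by convention
        q_x = q.get(word, 0.0)
        if q_x == 0.0:
            return math.inf  # p puts mass on a word that q rules out
        total += p_x * math.log(p_x / q_x)
    return total


# Toy distributions over a two-word vocabulary (made-up numbers).
p = {"topic": 0.5, "model": 0.5}
q = {"topic": 0.25, "model": 0.75}

print(kl_divergence(p, q))  # positive, since q differs from p
print(kl_divergence(p, p))  # zero: a distribution has no divergence from itself
```

Note that the divergence is not symmetric, so `kl_divergence(p, q)` and `kl_divergence(q, p)` generally differ, and which one is minimized changes the answer.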
