Latent variable models have attracted considerable interest from industry and academia for their versatility in a wide range of applications. Much effort has gone into developing systems that scale these models up, in the hope of applying them to industry-scale data. In this paper, we describe a system that operates at a scale orders of magnitude larger than previous work and runs an order of magnitude faster than the state-of-the-art system at the same scale, while being more robust and producing more accurate results. Our system uses a number of advances in distributed inference: high-performance synchronization of sufficient statistics under a relaxed consistency model; fast sampling, using the Metropolis-Hastings-Walker method to overcome dense generative models; statistical modeling, moving beyond Latent Dirichlet Allocation (LDA) to Pitman-Yor distributions (PDP) and Hierarchical Dirichlet Process (HDP) models; sophist...
The bigger the corpus, the more topics it can potentially support. To make full use of massive text corpora, a topic model inference algorithm must therefore scale efficiently in 1) documents and 2) topics, while 3) achieving accurate inference. Previous methods have achieved two of these three criteria simultaneously, but never all three at once. In this paper, we develop an online inference algorithm for topic models that leverages stochasticity to scale well in the number of documents, exploits sparsity to scale well in the number of topics, and operates in the collapsed representation of the topic model for improved accuracy and run-time performance. We use a Monte Carlo inner loop in the online setting to approximate the collapsed variational Bayes updates in a sparse and efficient way, which we accomplish via the Metropolis-Hastings-Walker method. We showcase our algorithm on LDA and the recently proposed mixed membership skip-gram topic model. To converge to a high-quality solution, our method requires only amortized O(k_d) computation per word token instead of O(K) operations, where k_d, the number of topics occurring in a particular document, is much smaller than K, the total number of topics in the corpus.
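The Metropolis-Hastings-Walker idea mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (build_alias, alias_draw, mh_step) and the toy distributions are assumptions made for illustration. The sketch builds a Walker alias table from a possibly stale proposal distribution in O(K) time, draws from it in O(1), and then applies an independence Metropolis-Hastings correction so that the chain targets the current distribution. It assumes the current state has nonzero target probability.

```python
import random


def build_alias(probs):
    """Build Walker alias tables for a discrete distribution in O(K) time."""
    k = len(probs)
    total = sum(probs)
    scaled = [p * k / total for p in probs]  # mean of scaled weights is 1
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    accept = [1.0] * k        # probability of keeping the drawn bucket
    alias = list(range(k))    # fallback outcome for each bucket
    while small and large:
        s = small.pop()
        l = large.pop()
        accept[s] = scaled[s]
        alias[s] = l
        # The large bucket donates the mass needed to fill bucket s.
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return accept, alias


def alias_draw(accept, alias, rng):
    """Draw one index in O(1): pick a bucket, then keep it or take its alias."""
    i = rng.randrange(len(accept))
    return i if rng.random() < accept[i] else alias[i]


def mh_step(current, target, proposal, accept, alias, rng):
    """One independence Metropolis-Hastings step with a (stale) alias proposal.

    `target` and `proposal` hold unnormalized or normalized weights; only
    ratios matter. Assumes target[current] > 0.
    """
    cand = alias_draw(accept, alias, rng)
    ratio = (target[cand] * proposal[current]) / (target[current] * proposal[cand])
    return cand if rng.random() < min(1.0, ratio) else current


if __name__ == "__main__":
    rng = random.Random(0)
    stale = [0.4, 0.4, 0.2]    # hypothetical outdated topic distribution
    fresh = [0.7, 0.2, 0.1]    # hypothetical current topic distribution
    accept, alias = build_alias(stale)
    state, counts = 0, [0, 0, 0]
    for _ in range(50000):
        state = mh_step(state, fresh, stale, accept, alias, rng)
        counts[state] += 1
    print([c / 50000 for c in counts])
```

The design point is that the O(K) table construction is paid once and reused for many draws, while the acceptance test corrects for the table having gone stale. In a sparse sampler of the kind described above, such a table would typically cover only the dense part of the distribution, which this sketch does not attempt to model.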
We introduce building blocks from which a large variety of latent variable models can be built. The blocks include continuous and discrete variables, summation, addition, nonlinearity and switching. Ensemble learning provides a cost function which can be used for updating the variables as well as optimising the model structure. The blocks are designed to fit together and to yield efficient update rules. Emphasis is on local computation, which results in linear computational complexity. We propose and test a structure with a hierarchical nonlinear model for variances and means.
We describe anytime search procedures that (1) find disjoint subsets of recorded variables for which the members of each subset are d-separated by a single common unrecorded cause, if such exists;
Kuhn’s Structure of Scientific Revolutions is widely but belatedly recognized as the centerpiece of a philosophical shift from logical empiricism to a post-empiricist philosophy of science. The book was also an influential precursor to constructivist sociology of scientific knowledge despite Kuhn’s own ambivalence about that legacy. In several earlier papers (Rouse 1981, 1998, 2003, 2013), I introduced an alternative interpretation of Kuhn’s opening promise of “a decisive transformation in the image of science by which we are now possessed,” and noted that this second Kuhnian revolution in the philosophy of science had not (yet) occurred. Kuhn was implicitly proposing a shift in philosophical focus from scientific knowledge to sciences as research practices rather than just a post-empiricist conception of scientific knowledge. Ronald Giere (1985) offered a different reading of Kuhn’s significance for philosophy of science as initiating a naturalized philosophy of science. Giere also recognized that Kuhn’s naturalistic turn was at that time “generally rejected” by his successors in the discipline. In the 21st Century, however, both a naturalistic turn and a philosophical focus on scientific practice have become prominent within the discipline. My paper highlights both the continuities and discontinuities between Kuhn’s naturalistic philosophy of scientific practice and the development of those themes in the first two decades of this century.
Keywords: Thomas Kuhn; scientific practice; naturalism; models; normativity
