
## Deterministic Annealing (Indiana University CS Theory Group, January 23, 2012, Geoffrey Fox)

Geoffrey Fox (gcf@indiana.edu), http://www.infomall.org, http://www.futuregrid.org
Director, Digital Science Center, Pervasive Technology Institute; Associate Dean for Research and Graduate Studies, School of Informatics and Computing, Indiana University Bloomington

## Abstract

We discuss the general theory behind deterministic annealing and illustrate it with applications to mixture models (including GTM and PLSA), clustering, and dimension reduction. We cover cases where the analyzed space has a metric and cases where it does not. We discuss the many open issues and possible further work for methods that appear to outperform the standard approaches but are in practice not used.

## References

- Ken Rose, "Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems," Proceedings of the IEEE, 1998, 86: pp. 2210-2239. (References earlier papers, including his Caltech Elec. Eng. PhD, 1990.)
- T. Hofmann and J. M. Buhmann, "Pairwise data clustering by deterministic annealing," IEEE Transactions on Pattern Analysis and Machine Intelligence, 19, pp. 1-13, 1997.
- Hansjörg Klock and Joachim M. Buhmann, "Data visualization by multidimensional scaling: a deterministic annealing approach," Pattern Recognition, Volume 33, Issue 4, April 2000, pp. 651-669.
- R. Frühwirth and W. Waltenberger, "Redescending M-estimators and Deterministic Annealing, with Applications to Robust Regression and Tail Index Estimation," Austrian Journal of Statistics, 2008, 37(3&4): 301-317. http://www.stat.tugraz.at/AJS/ausg083+4/08306Fruehwirth.pdf
- Review: http://grids.ucs.indiana.edu/ptliupages/publications/pdac24g-fox.pdf
- Recent algorithm work by Seung-Hee Bae and Jong Youl Choi (Indiana CS PhDs): http://grids.ucs.indiana.edu/ptliupages/publications/CetraroWriteupJune11-09.pdf and http://grids.ucs.indiana.edu/ptliupages/publications/hpdc2010-submission-57.pdf


## Some Goals

- We are building a library of parallel data mining tools that have the best known (to me) robustness and performance characteristics.
- Big data needs super algorithms. A lot of statistics tools (e.g. in R) are not the best algorithm and are not always well parallelized.
- Deterministic annealing (DA) is one of the better approaches to optimization: it tends to remove local optima, addresses overfitting, and is faster than simulated annealing.
- Return to my heritage (physics) with an approach I called Physical Computation (cf. also genetic algorithms): methods based on analogies to nature. Physics systems find the true lowest energy state if you anneal, i.e. you equilibrate at each temperature as you cool.

## Some Ideas I

- Deterministic annealing is better than many well-used optimization methods. It started as the Elastic Net by Durbin for the Travelling Salesman Problem (TSP).
- The basic idea behind deterministic annealing is the mean field approximation, which is also used in Variational Bayes and many neural network approaches.
- Markov chain Monte Carlo (MCMC) methods are roughly single-temperature simulated annealing.
- DA is less sensitive to initial conditions, avoids local optima, and is not equivalent to trying random initial starts.

## Some non-DA Ideas II

- Dimension reduction gives low-dimensional mappings of data, both to visualize and to apply geometric hashing.
- No-vector (can't define a metric space) problems are O(N²). For the no-vector case, one can develop O(N) or O(N log N) methods, as in Fast Multipole and Oct-Tree methods.
- Map high-dimensional data to 3D and use classic methods originally developed to speed up O(N²) 3D particle dynamics problems.

## Uses of Deterministic Annealing

- Clustering
  - Vectors: Rose (Gurewitz and Fox)
  - Clusters with fixed sizes and no tails (Proteomics team at Broad)
  - No vectors: Hofmann and Buhmann (just use pairwise distances)
- Dimension reduction for visualization and analysis
  - Vectors: GTM
  - No vectors: MDS (just use pairwise distances)
- Can apply to general mixture models (but less studied)
  - Gaussian Mixture Models
  - Probabilistic Latent Semantic Analysis with Deterministic Annealing (DA-PLSA) as an alternative to Latent Dirichlet Allocation (the typical informational retrieval / global inference topic model)

## Deterministic Annealing I

- Gibbs distribution at temperature T:
  P(φ) = exp(-H(φ)/T) / ∫ dφ exp(-H(φ)/T), or equivalently P(φ) = exp(-H(φ)/T + F/T).
- Minimize the free energy, which combines the objective function and the entropy:
  F = <H - T S(P)> = ∫ dφ {P(φ) H(φ) + T P(φ) ln P(φ)},
  where φ are (a subset of) the parameters to be minimized.
- Simulated annealing corresponds to doing these integrals by Monte Carlo; deterministic annealing corresponds to doing the integrals analytically (by mean field approximation) and is naturally much faster than Monte Carlo.
- In each case the temperature is lowered slowly, say by a factor of 0.95 to 0.99 at each iteration.

## Deterministic Annealing: Minimum Evolving as Temperature Decreases

- Movement at fixed temperature goes to local minima if not initialized correctly.
- Solve linear equations for each temperature; nonlinear effects are mitigated by initializing with the solution at the previous, higher temperature.
- (Figure: free energy F({y}, T) plotted against the configuration {y}.)
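To make the role of temperature in the Gibbs distribution concrete, here is a minimal Python sketch (an illustration of mine, not code from the talk; the energy values are invented): for a few candidate states with energies H_k, the distribution P(k) ∝ exp(-H_k/T) is nearly uniform at high T (entropy dominates) and concentrates on the minimum-energy state at low T (energy dominates).

```python
import math

def gibbs_probs(energies, T):
    """Gibbs distribution P(k) proportional to exp(-H_k / T) at temperature T."""
    m = min(energies)                    # shift by the minimum energy for numerical stability
    w = [math.exp(-(h - m) / T) for h in energies]
    z = sum(w)                           # partition function (up to the constant shift)
    return [x / z for x in w]

energies = [1.0, 2.0, 4.0]               # toy energy levels
hot = gibbs_probs(energies, T=100.0)     # nearly uniform: entropy dominates
cold = gibbs_probs(energies, T=0.01)     # concentrated on the lowest-energy state
```

Annealing lowers T slowly (e.g. by a factor of 0.95 per step), moving smoothly from the entropy-dominated regime to the energy-dominated one instead of jumping straight to a possibly local optimum.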

## Deterministic Annealing II

- For some cases, such as vector clustering and mixture models, one can do the integrals by hand, but usually they will be impossible.
- So introduce a Hamiltonian H₀(ε, φ) which, by choice of ε, can be made similar to the real Hamiltonian H_R(φ) and which has tractable integrals.
- P₀(φ) = exp(-H₀(φ)/T + F₀/T) approximates the Gibbs distribution for H_R.
- F_R(P₀) = <H_R - T S₀(P₀)>₀ = <H_R - H₀>₀ + F₀(P₀), where <…>₀ denotes ∫ dφ P₀(φ).
- It is easy to show (the Gibbs inequality, equivalently the Kullback-Leibler divergence) that the real free energy satisfies F_R(P_R) ≤ F_R(P₀).
- The Expectation step (E) finds the ε minimizing F_R(P₀); one follows with the M step (of EM), setting φ = <ε>₀ (mean field), and then a traditional minimization of the remaining parameters.
- Note the three types of variables: ε and φ, used to approximate the real Hamiltonian and subject to annealing, and the rest, optimized by traditional methods.

## Implementation of DA Central Clustering

- The clustering variables are M_i(k) (these are annealed in the general approach), where M_i(k) is the probability that point i belongs to cluster k and Σ_{k=1}^K M_i(k) = 1.
- In central or pairwise (PW) clustering, take H₀ = Σ_{i=1}^N Σ_{k=1}^K M_i(k) ε_i(k). This linear form allows the DA integrals to be done analytically.
- Central clustering has ε_i(k) = (X(i) - Y(k))², with M_i(k) determined by the Expectation step:
  H_central = Σ_{i=1}^N Σ_{k=1}^K M_i(k) (X(i) - Y(k))².
- H_central and H₀ are identical.
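The E and M steps for central clustering can be sketched in a few lines of Python. This is an illustrative 1-D toy of mine (the data, schedule, and starting perturbation are invented), not the parallel library code described in the talk: the E-step computes soft memberships M_i(k), the M-step moves each center to its membership-weighted mean, and the temperature is lowered slowly.

```python
import math
import random

def da_cluster(points, K, T0=10.0, Tmin=0.01, cool=0.95, iters=20):
    """Deterministic-annealing central clustering in 1-D.
    E-step: M_i(k) proportional to exp(-(x_i - y_k)^2 / T), with sum_k M_i(k) = 1.
    M-step: y_k = sum_i M_i(k) x_i / sum_i M_i(k)."""
    rng = random.Random(0)
    mean = sum(points) / len(points)
    # all centers start at the data mean; tiny perturbations break the symmetry
    centers = [mean + 1e-3 * rng.random() for _ in range(K)]
    T = T0
    while T > Tmin:
        for _ in range(iters):
            M = []
            for x in points:
                d2 = [(x - y) ** 2 for y in centers]
                m = min(d2)                         # shift for numerical stability
                w = [math.exp(-(d - m) / T) for d in d2]
                z = sum(w)
                M.append([wk / z for wk in w])
            for k in range(K):
                den = sum(M[i][k] for i in range(len(points)))
                num = sum(M[i][k] * points[i] for i in range(len(points)))
                centers[k] = num / den
        T *= cool                                   # slow cooling, factor 0.95 per step
    return sorted(centers)

data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
centers = da_cluster(data, K=2)                     # two centers, near 0.1 and 5.1
```

At high T the memberships are nearly uniform and both centers sit at the data mean; as T falls through the instability, the centers separate and finally harden into the two group means.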

## General Features of DA

- Deterministic annealing (DA) is related to Variational Inference or Variational Bayes methods.
- In many problems, decreasing the temperature is classic multiscale refinement to finer resolution (T is just a distance scale): we have factors like (X(i) - Y(k))² / T.
- In clustering, one then looks at the second derivative matrix of F_R(P₀) with respect to the cluster centers; as the temperature is lowered, this develops a negative eigenvalue corresponding to an instability. Alternatively, place multiple clusters at each center and perturb.
- This is a phase transition: one splits the cluster into two and continues the EM iteration. One can start with just one cluster.
- Rose, K., Gurewitz, E., and Fox, G. C., "Statistical mechanics and phase transitions in clustering," Physical Review Letters, 65(8):945-948, August 1990. My 6th most cited article (402 cites, including 15 in 2011).
- Start at T = ∞ with one cluster; decrease T, and clusters emerge at the instabilities.
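The "multiple clusters at each center and perturb" test can be shown directly: duplicate a center into two slightly perturbed twins and iterate EM at a fixed temperature. Above the critical temperature the twins re-merge (the symmetric minimum is stable); below it the perturbation grows and the cluster genuinely splits. The data and temperatures below are invented for illustration; this is a sketch of mine, not the talk's code.

```python
import math

def em_steps(points, centers, T, iters=50):
    """Fixed-temperature EM iterations for 1-D DA central clustering."""
    for _ in range(iters):
        resp = []
        for x in points:
            d2 = [(x - y) ** 2 for y in centers]
            m = min(d2)                             # shift for numerical stability
            w = [math.exp(-(d - m) / T) for d in d2]
            z = sum(w)
            resp.append([wk / z for wk in w])
        centers = [
            sum(resp[i][k] * points[i] for i in range(len(points)))
            / sum(resp[i][k] for i in range(len(points)))
            for k in range(len(centers))
        ]
    return centers

data = [-1.0, -0.9, 0.9, 1.0]          # variance 0.905, so T_c = 2 * 0.905 = 1.81
twins = [-1e-3, 1e-3]                  # one cluster, duplicated and slightly perturbed
hot = em_steps(data, twins, T=5.0)     # T > T_c: the perturbation decays, twins re-merge
cold = em_steps(data, twins, T=0.5)    # T < T_c: instability, the cluster splits in two
```

The comparison of `hot` and `cold` is exactly the sign of the eigenvalue discussed above: at T = 5 the twins collapse back together, while at T = 0.5 they separate toward the two data groups.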

(Pairwise clustering slides; surviving equation fragment: A(k) = -0.5 Σ_{i=1}^N Σ_{j=1}^N …(i, j).)

## Continuous Clustering I

- This is a subtlety introduced by Ken Rose but not clearly known in the community.
- Let's consider the dynamic appearance of clusters a little more carefully. Suppose we take a cluster k and split it into two, with centers Y(k)_A and Y(k)_B and initial values Y(k)_A = Y(k)_B at the original center Y(k).
- Then, typically, if you make this change and perturb Y(k)_A and Y(k)_B, they will return to the starting position, as F is at a stable minimum; but an instability can develop in the direction Y(k)_A - Y(k)_B.

## Continuous Clustering II

- At the phase transition, when the eigenvalue corresponding to Y(k)_A - Y(k)_B goes negative, F is a minimum if the two split clusters move together but a maximum if they separate, i.e. two genuine clusters are formed at the instability points.
- When you split, A(k), B_i(k), and ε_i(k) are unchanged, and you would hope that the cluster counts C(k) and probabilities …
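A short worked derivation (my own sketch, for the 1-D vector case with the data centered so that ⟨x⟩ = 0) shows why the instability sets in at a definite temperature, consistent with the Rose-Gurewitz-Fox phase-transition picture. Split one center into Y(k)_A = +ε and Y(k)_B = -ε; to first order in ε the memberships and the M-step update are:

```latex
M_i(A) \;\approx\; \tfrac{1}{2}\left(1 + \frac{2\,x_i\,\varepsilon}{T}\right),
\qquad
\varepsilon' \;=\; \frac{\sum_i M_i(A)\, x_i}{\sum_i M_i(A)}
\;\approx\; \frac{2\,\langle x^2 \rangle}{T}\,\varepsilon .
```

So the perturbation grows, and the cluster splits, exactly when T < T_c = 2⟨x²⟩ = 2σ²: the relevant eigenvalue passes through 1 (the second derivative of F changes sign) when the temperature falls to twice the variance of the cluster.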


## High Performance Dimension Reduction and Visualization

- The need is pervasive: large, high-dimensional data are everywhere (biology, physics, the Internet), and visualization can help data analysis.
- Visualization of large datasets with high performance: map high-dimensional data into low dimensions (2D or 3D). Parallel programming is needed to process large data sets.
- We are developing high-performance dimension reduction algorithms: MDS (Multidimensional Scaling), GTM (Generative Topographic Mapping), DA-MDS (Deterministic Annealing MDS), and DA-GTM (Deterministic Annealing GTM), plus the interactive visualization tool PlotViz.

## Multidimensional Scaling (MDS)

- Map points in high dimension to lower dimensions.
- There are many such dimension reduction algorithms (PCA, Principal Component Analysis, is the easiest); the simplest, but perhaps the best at times, is MDS.
- Minimize the stress
  σ(X) = Σ_{i<j} weight(i, j) (δ(i, j) - d(X(i), X(j)))²,
  where δ(i, j) is the given dissimilarity between points i and j and d(X(i), X(j)) is the distance between their low-dimensional images.
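As an illustration of the stress function, here is a toy sketch of mine with unit weights, invented data, and plain gradient descent (DA-MDS itself anneals the temperature and runs in parallel; none of that is shown here): four points whose target dissimilarities are the corner distances of a unit square are laid out in 2-D by minimizing the stress directly.

```python
import math
import random

def stress(X, delta):
    """MDS stress with unit weights: sum over i<j of (d(X_i, X_j) - delta_ij)^2."""
    s = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            s += (math.dist(X[i], X[j]) - delta[i][j]) ** 2
    return s

def mds_gradient_descent(delta, dim=2, steps=2000, lr=0.01, seed=0):
    """Minimize the stress by plain gradient descent from a random layout."""
    rng = random.Random(seed)
    n = len(delta)
    X = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    for _ in range(steps):
        grad = [[0.0] * dim for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                d = math.dist(X[i], X[j]) or 1e-12   # guard against coincident points
                coef = 2.0 * (d - delta[i][j]) / d   # d/dX_i of (d - delta)^2
                for a in range(dim):
                    grad[i][a] += coef * (X[i][a] - X[j][a])
        for i in range(n):
            for a in range(dim):
                X[i][a] -= lr * grad[i][a]
    return X

# target dissimilarities: the four corners of a unit square (exactly embeddable in 2D)
s2 = math.sqrt(2.0)
delta = [[0.0, 1.0, s2, 1.0],
         [1.0, 0.0, 1.0, s2],
         [s2, 1.0, 0.0, 1.0],
         [1.0, s2, 1.0, 0.0]]
# a few random restarts guard against the local minima that plain MDS is prone to,
# which is precisely the problem deterministic annealing is designed to avoid
final_stress = min(stress(mds_gradient_descent(delta, seed=s), delta) for s in range(3))
```

Because the square embeds exactly in 2-D, the global minimum of the stress is zero; the restarts are the crude substitute for the annealing that DA-MDS uses to reach it reliably.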
