Deterministic Annealing
Networks and Complex Systems Talk, 6pm, Wells Library 001, Indiana University


Indiana University, November 21 2011
Geoffrey Fox, Director, Digital Science Center, Pervasive Technology Institute
Associate Dean for Research and Graduate Studies, School of Informatics and Computing, Indiana University Bloomington

References
Ken Rose, "Deterministic Annealing for Clustering, Compression, Classification, Regression, and Related Optimization Problems," Proceedings of the IEEE, 1998, 86: pp. 2210-2239. References earlier papers, including his Caltech Elec. Eng. PhD (1990).
T. Hofmann and J. M. Buhmann, "Pairwise data clustering by deterministic annealing," IEEE Transactions on Pattern Analysis and Machine Intelligence 19, pp. 1-13, 1997.
Hansjörg Klock and Joachim M. Buhmann, "Data visualization by multidimensional scaling: a deterministic annealing approach," Pattern Recognition, Volume 33, Issue 4, April 2000, pp. 651-669.
Recent algorithm work by Seung-Hee Bae and Jong Youl Choi (Indiana CS PhDs).

Goals
We are building a library of parallel data mining tools with the best known (to me) robustness and performance characteristics.
Big data needs super algorithms: many statistics tools (e.g. in R) do not use the best algorithm and are not always well parallelized.
Deterministic annealing (DA) is one of the better approaches to optimization: it tends to remove local optima, addresses overfitting, and is faster than simulated annealing.
This returns to my heritage (physics) with an approach I called Physical Computation (23 years ago): methods based on analogies to nature. Physical systems find the true lowest energy state if you anneal, i.e. equilibrate at each temperature as you cool.


Some Ideas I
Deterministic annealing is better than many well-used optimization methods.
It started as the "Elastic Net" by Durbin for the Travelling Salesman Problem (TSP).
The basic idea behind deterministic annealing is the mean field approximation, which is also used in "Variational Bayes" and many "neural network approaches."
Markov chain Monte Carlo (MCMC) methods are roughly single-temperature simulated annealing.
DA is less sensitive to initial conditions, avoids local optima, and is not equivalent to trying random initial starts.

Some non-DA Ideas II
Dimension reduction gives low-dimension mappings of data, both to visualize and to apply geometric hashing.
No-vector problems (where one cannot define a metric space) are O(N²). For the no-vector case, one can develop O(N) or O(N log N) methods as in "Fast Multipole and OctTree methods": map the high dimensional data to 3D and use the classic methods developed originally to speed up O(N²) 3D particle dynamics problems.

Uses of Deterministic Annealing
Clustering
- Vectors: Rose (Gurewitz and Fox)
- Clusters with fixed sizes and no tails (Proteomics team at Broad)
- No vectors: Hofmann and Buhmann (just use pairwise distances)
Dimension reduction for visualization and analysis
- Vectors: GTM
- No vectors: MDS (just use pairwise distances)
Can apply to general mixture models (but less studied)
- Gaussian Mixture Models
- Probabilistic Latent Semantic Analysis with Deterministic Annealing (DA-PLSA) as an alternative to Latent Dirichlet Allocation (the typical information retrieval/global inference topic model)

Deterministic Annealing I
Gibbs Distribution at Temperature T:
P(ξ) = exp(−H(ξ)/T) / ∫ dξ exp(−H(ξ)/T)
or P(ξ) = exp(−H(ξ)/T + F/T)
Minimize the Free Energy, combining Objective Function and Entropy:
F = <H − T S(P)> = ∫ dξ {P(ξ) H(ξ) + T P(ξ) ln P(ξ)}
where ξ are (a subset of) the parameters to be minimized.
Simulated annealing corresponds to doing these integrals by Monte Carlo.
Deterministic annealing corresponds to doing the integrals analytically (by mean field approximation) and is naturally much faster than Monte Carlo.
In each case the temperature is lowered slowly, say by a factor 0.95 to 0.99 at each iteration.

Deterministic Annealing
(Figure: free energy F({y}, T) versus configuration {y}; the minimum evolves as the temperature decreases, and movement at fixed temperature goes to a local minimum if not initialized "correctly.")
Solve linear equations for each temperature; nonlinear effects are mitigated by initializing with the solution at the previous higher temperature.

Deterministic Annealing II
For some cases, such as vector clustering and mixture models, one can do the integrals by hand, but usually this will be impossible.
So introduce a Hamiltonian H0(ξ, θ) which, by choice of θ, can be made similar to the real Hamiltonian HR(ξ) and which has tractable integrals.
P0(ξ) = exp(−H0(ξ)/T + F0/T) approximates the Gibbs distribution for HR.
FR(P0) = <HR − T S0(P0)>0 = <HR − H0>0 + F0(P0)
where <…>0 denotes ∫ dξ P0(ξ) ….
It is easy to show (Gibbs' inequality, related to the Kullback-Leibler divergence) that the real free energy satisfies FR(PR) ≤ FR(P0).
The Expectation step E finds the θ minimizing FR(P0); the M step (of EM) then sets ξ = <ξ>0 = ∫ dξ ξ P0(ξ) (mean field), and one follows with a traditional minimization of the remaining parameters.
Note 3 types of variables:
- ξ, θ: used to approximate the real Hamiltonian, subject to annealing
- the rest: optimized by traditional methods
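The two expressions for the free energy above (F = −T ln Z from the normalized Gibbs form, and F = <H> − T S(P)) can be checked numerically on a toy discrete system. This is an illustrative sketch, not code from the talk:

```python
import numpy as np

# Toy discrete configuration space with energies H(xi)
H = np.array([0.0, 1.0, 2.5, 4.0])
T = 0.7  # temperature

# Gibbs distribution: P(xi) = exp(-H(xi)/T) / Z
Z = np.sum(np.exp(-H / T))
P = np.exp(-H / T) / Z

# Free energy two ways:
# (1) from normalization, P = exp(-H/T + F/T) implies F = -T ln Z
F_direct = -T * np.log(Z)
# (2) F = <H> - T S(P) = sum P H + T sum P ln P
F_entropy = np.sum(P * H) + T * np.sum(P * np.log(P))

print(F_direct, F_entropy)  # the two values agree
```

As T → 0 the distribution concentrates on the minimum-energy configuration, which is exactly the annealing limit the talk exploits.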

Implementation of DA Central Clustering
The clustering variables are Mi(k) (these appear in the general approach), where Mi(k) is the probability that point i belongs to cluster k.
In Central or PW Clustering, take H0 = Σi=1..N Σk=1..K Mi(k) εi(k).
This linear form allows the DA integrals to be done analytically.
Central clustering has εi(k) = (X(i) − Y(k))², with Mi(k) determined by the Expectation step:
HCentral = Σi=1..N Σk=1..K Mi(k) (X(i) − Y(k))²
Here HCentral and H0 are identical.
<Mi(k)> = exp(−εi(k)/T) / Σk=1..K exp(−εi(k)/T)
The centers Y(k) are determined in the M step.

Implementation of DA-PWC
The clustering variables are again Mi(k) (as in the general approach), the probability that point i belongs to cluster k.
The Pairwise Clustering Hamiltonian is given by the nonlinear form
HPWC = 0.5 Σi=1..N Σj=1..N δ(i, j) Σk=1..K Mi(k) Mj(k) / C(k)
where δ(i, j) is the pairwise distance between points i and j, and C(k) = Σi=1..N Mi(k) is the number of points in cluster k.
Take the same form H0 = Σi=1..N Σk=1..K Mi(k) εi(k) as for central clustering.
εi(k) is determined to minimize FPWC(P0) = <HPWC − T S0(P0)>0, where the integrals can be easily done.
Now the linear (in Mi(k)) H0 and the quadratic HPWC are different.
Again <Mi(k)> = exp(−εi(k)/T) / Σk=1..K exp(−εi(k)/T).

General Features of DA
Deterministic annealing is related to Variational Inference or Variational Bayes methods.
In many problems, decreasing the temperature is classic multiscale: finer resolution (T is "just" a distance scale). We have factors like (X(i) − Y(k))² / T.
In clustering, one then looks at the second derivative matrix of FR(P0) with respect to the cluster positions; as the temperature is lowered, this develops a negative eigenvalue corresponding to an instability. (Alternatively, have multiple clusters at each center and perturb.)
This is a phase transition: one splits the cluster into two and continues the EM iteration.
One can start with just one cluster.
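The E and M steps above can be sketched in a few lines. This is a minimal illustrative NumPy implementation (function and parameter names are my own, not from the talk's production MPI codes), with the cluster-splitting symmetry broken by a tiny perturbation of the initial centers:

```python
import numpy as np

def da_central_clustering(X, K, T0=10.0, Tmin=0.01, cool=0.95, inner=50):
    """Deterministic annealing for vector (central) clustering (sketch).

    X: (N, d) data points; K: number of cluster centers.
    Anneals from T0 down to Tmin, cooling by the factor `cool` each step.
    """
    rng = np.random.default_rng(0)
    # At high T all centers coincide at the data mean; a tiny perturbation
    # lets them separate at the phase transitions as T is lowered.
    Y = X.mean(axis=0) + 1e-3 * rng.standard_normal((K, X.shape[1]))
    T = T0
    while T > Tmin:
        for _ in range(inner):  # equilibrate at this temperature
            # E step: M_i(k) proportional to exp(-(X_i - Y_k)^2 / T)
            eps = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
            M = np.exp(-(eps - eps.min(axis=1, keepdims=True)) / T)
            M /= M.sum(axis=1, keepdims=True)
            # M step: centers are the probability-weighted means
            Y = (M[:, :, None] * X[:, None, :]).sum(axis=0) / M.sum(axis=0)[:, None]
        T *= cool
    return Y, M
```

On two well-separated 1D blobs, the two centers start together near the data mean and separate as T drops through the critical temperature, ending near the blob means.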

(Figures: start at T = ∞ with 1 cluster; as T decreases, clusters emerge at the instabilities.)

Rose, K., Gurewitz, E., and Fox, G. C., "Statistical mechanics and phase transitions in clustering," Physical Review Letters, 65(8):945-948, August 1990. My 5th most cited article (387 cites).

DA-PWC EM Steps (on the slide, E steps are red, M steps black); k runs over clusters, i and j over points:
1. A(k) = −0.5 Σi=1..N Σj=1..N δ(i, j) <Mi(k)> <Mj(k)> / C(k)²
2. Bj(k) = Σi=1..N δ(i, j) <Mi(k)> / C(k)
3. εi(k) = Bi(k) + A(k)
4. <Mi(k)> = p(k) exp(−εi(k)/T) / Σk'=1..K p(k') exp(−εi(k')/T)
5. C(k) = Σi=1..N <Mi(k)>; p(k) = C(k) / N
Loop to converge the variables; decrease T from ∞; split centers by halving p(k).
Step 1 needs a global sum (reduction); steps 1, 2, 5 are local sums if the <Mi(k)> are broadcast.

Trimmed Clustering
Clustering with position-specific constraints on variance: applying redescending M-estimators to label-free LC-MS data analysis (Rudolf Frühwirth, D. R. Mani, and Saumyadipta Pyne), BMC Bioinformatics 2011, 12:358.
HTCC = Σk=0..K Σi=1..N Mi(k) f(i,k)
f(i,k) = (X(i) − Y(k))² / 2σ(k)² for k > 0
f(i,0) = c² / 2 for k = 0
The 0th cluster captures (at zero temperature) all points outside clusters (the background); clusters are trimmed: (X(i) − Y(k))² / 2σ(k)² < c² / 2.
This is another case where H0 is the same as the target Hamiltonian.
Application: proteomics mass spectrometry.

High Performance Dimension Reduction and Visualization
The need is pervasive: large and high dimensional data are everywhere (biology, physics, the Internet), and visualization can help data analysis.
Visualization of large datasets with high performance: map high-dimensional data into low dimensions (2D or 3D); parallel programming is needed for processing large data sets.
We are developing high performance dimension reduction algorithms: MDS (Multi-dimensional Scaling), GTM (Generative Topographic Mapping), DA-MDS (Deterministic Annealing MDS), and DA-GTM (Deterministic Annealing GTM), plus the interactive visualization tool PlotViz.

Multidimensional Scaling (MDS)
Map points in high dimension to lower dimensions. Many such dimension reduction algorithms exist (PCA, Principal Component Analysis, is the easiest); the simplest, but perhaps best at times, is MDS.
Minimize the Stress
σ(X) = Σi<j weight(i, j) (δ(i, j) − d(i, j))²
where δ(i, j) is the given dissimilarity between points i and j, and d(i, j) is their distance in the mapped space.
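The weighted stress can be minimized in several ways (the talk uses SMACOF and deterministic annealing); as a hypothetical minimal sketch, here is plain gradient descent on the stress in NumPy (function names are mine):

```python
import numpy as np

def stress(X, delta, w):
    """Weighted MDS stress: sum over i<j of w_ij (delta_ij - d_ij(X))^2."""
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    iu = np.triu_indices(len(X), k=1)
    return float(np.sum(w[iu] * (delta[iu] - d[iu]) ** 2))

def mds_descent(delta, w, dim=2, lr=0.02, steps=2000, seed=0):
    """Embed N points in `dim` dimensions by gradient descent on the stress."""
    N = delta.shape[0]
    X = np.random.default_rng(seed).standard_normal((N, dim))
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]               # X_i - X_j
        d = np.sqrt((diff ** 2).sum(axis=2)) + np.eye(N)   # guard the diagonal
        coef = w * (d - delta) / d
        np.fill_diagonal(coef, 0.0)
        # gradient of the stress with respect to each X_i
        X -= lr * 2.0 * (coef[:, :, None] * diff).sum(axis=1)
    return X
```

For four points at the corners of a unit square, with delta set to the true pairwise distances and unit weights, the recovered 2D layout drives the stress to near zero.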

(Figures: 440K interpolated points; a large cluster in Region 0; 26 clusters in Region 4.)

(Figure: 13 clusters in Region 6.)

Understanding the Octopi
The "octopi" are globular clusters distorted by the length dependence of the dissimilarity measure. The sequences are 200 to 500 base pairs long. We restarted the project using local (SWG) rather than global (NW) alignment.

May Need New Algorithms
DA-PWC (Deterministically Annealed Pairwise Clustering) splits clusters automatically as the temperature lowers and reveals clusters of size O(√T).
Two approaches to splitting:
1. Look at the correlation matrix and see when it becomes singular; this is a separate parallel step.
2. Formulate the problem with multiple centers for each cluster and perturb every so often, splitting the centers into 2 groups; unstable clusters separate.
The current MPI code uses the first method, which will run on Twister, as the matrix singularity analysis is the usual "power eigenvalue method" (as is PageRank). However, it does not have a super good compute/communicate ratio.
We will experiment with the second method, which is "just" EM with a better compute/communicate ratio (and simpler code as well).

Next Steps
Finalize MPI and Twister versions of Deterministically Annealed Expectation Maximization for:
- Vector clustering
- Vector clustering with trimmed clusters
- Pairwise non-vector clustering
- MDS SMACOF
Extend O(N log N) Barnes-Hut methods to all codes.
Allow missing distances in MDS (Blast gives this) and allow arbitrary weightings (Sammon's method); this has been done for the χ² approach to MDS.
Explore DA-PLSA as an alternative to LDA.
Exploit the better Twister and Twister4Azure runtimes.
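The "power eigenvalue method" mentioned for the singularity analysis is standard power iteration for the dominant eigenpair of a symmetric matrix; a minimal sketch (not the talk's MPI code):

```python
import numpy as np

def power_iteration(A, iters=200, seed=0):
    """Dominant eigenvalue and eigenvector of a symmetric matrix A.

    Each step is one matrix-vector product, i.e. a parallel reduction,
    which is why it maps naturally onto MPI or iterative MapReduce.
    """
    v = np.random.default_rng(seed).standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = A @ v
        v = w / np.linalg.norm(w)
    return v @ A @ v, v  # Rayleigh quotient and eigenvector
```

To find the smallest (possibly negative) eigenvalue that signals an instability, one can apply the same iteration to the shifted matrix c·I − A for a suitably large c.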
