Jason Eisner, ICML “Inferning” Workshop, June 2012: Learning Approximate Inference Policies for Fast Prediction


Beware: Bayesians in roadway. “A Bayesian is the person who writes down the function you wish you could optimize.” Language data is full of correlated hidden structure: entailment, correlation, inflection, cognates, transliteration, abbreviation, neologism, language evolution, translation, alignment, editing, quotation, speech, misspellings, typos, formatting, entanglement, annotation, N tokens. To recover variables, model and exploit their correlations.


Motivating Tasks. Structured prediction (e.g., for NLP problems): parsing (trees), machine translation (word strings), word variants (letter strings, phylogenies, grids). Unsupervised learning via Bayesian generative models: given a few verb conjugation tables and a lot of text, find/organize/impute all verb conjugation tables of the language; given some facts and a lot of text, discover more facts through information extraction and reasoning.

Current Methods. Dynamic programming: exact but slow. Approximate inference in graphical models: are the approximations any good? May use dynamic programming as a subroutine (structured BP). Sequential classification.

Speed-Accuracy Tradeoffs. Inference requires lots of computation. Is some computation going to waste? Sometimes the best prediction is overdetermined, and quick ad hoc methods sometimes work; how should we respond to that? Is some computation actively harmful? In approximate inference, passing a message can hurt. It is frustrating to simplify the model just to fix this; we want to keep improving our models! But we need good, fast approximate inference: choose approximations automatically, tuned to the data distribution and loss function. “Trainable hacks” are more robust.

This talk is about “trainable hacks”: a prediction device (suitable for the domain) plus training data plus likelihood feedback.

More precisely, the feedback is loss + runtime rather than raw likelihood, and the training data may be complete or incomplete. Bayesian Decision Theory organizes the choices: What prediction rule (approximate inference + beyond)? What loss function (it can include runtime)? How to optimize (backprop, RL, …)? What data distribution (may have to impute)?
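In symbols, a minimal sketch of that decision-theoretic objective (the additive runtime penalty with weight $\lambda$ is our illustrative assumption; the talk only says the loss can include runtime):

$$ \theta^{*} \;=\; \arg\min_{\theta}\; \mathbb{E}_{(x,y)\sim D}\!\left[ L\!\left(y,\ \hat{y}_{\theta}(x)\right) \;+\; \lambda\,\mathrm{time}\!\left(\hat{y}_{\theta}(x)\right) \right] $$

Here $\hat{y}_{\theta}(x)$ is whatever the trained prediction device outputs (approximate inference plus decoding), $L$ is the task loss, and $D$ is the data distribution, imputed where the training data are incomplete.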

Part 1: Your favorite approximate inference algorithm is a trainable hack.

General CRFs: unrestricted model structure. Add edges to model the conditional distribution well. But exact inference is intractable, so use loopy sum-product or max-product BP. Inference: compute properties of the posterior distribution, e.g., approximate tag marginals for “The cat sat on the mat .”:

  The  DT .9    NN .05
  cat  NN .8    JJ .1
  sat  VBD .7   VB .1
  on   IN .9    NN .01
  the  DT .9    NN .05
  mat  NN .4    JJ .3
  .    . .99    , .001
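For concreteness, here is a minimal sketch of loopy sum-product BP on a pairwise model, in plain Python. The flooding (all-messages-each-sweep) schedule, the tiny tag set, and all potential values below are our own illustrative assumptions, not the talk's actual system:

```python
def loopy_bp(nodes, domains, unary, pairwise, edges, iters=20):
    """Sum-product belief propagation on a pairwise model.

    nodes:    list of variable names
    domains:  dict node -> list of possible values
    unary:    dict node -> dict value -> potential (>= 0)
    pairwise: dict (u, v) -> function (val_u, val_v) -> potential
    edges:    list of (u, v) pairs; messages flow in both directions
    Returns approximate marginals ("beliefs") for every node.
    """
    msgs = {}
    for u, v in edges:  # initialize all messages to uniform
        msgs[(u, v)] = {val: 1.0 for val in domains[v]}
        msgs[(v, u)] = {val: 1.0 for val in domains[u]}

    def neighbors(n):
        return [v for u, v in edges if u == n] + [u for u, v in edges if v == n]

    def pot(u, v, a, b):  # look up the pairwise potential in either direction
        return pairwise[(u, v)](a, b) if (u, v) in pairwise else pairwise[(v, u)](b, a)

    for _ in range(iters):  # synchronous "flooding" schedule
        new = {}
        for (src, dst) in msgs:
            out = {}
            for b in domains[dst]:
                total = 0.0
                for a in domains[src]:
                    incoming = 1.0  # product of messages into src, except from dst
                    for w in neighbors(src):
                        if w != dst:
                            incoming *= msgs[(w, src)][a]
                    total += unary[src][a] * incoming * pot(src, dst, a, b)
                out[b] = total
            z = sum(out.values())
            new[(src, dst)] = {b: x / z for b, x in out.items()}  # normalize
        msgs = new

    beliefs = {}
    for n in nodes:  # belief = unary potential times all incoming messages
        bel = {a: unary[n][a] for a in domains[n]}
        for w in neighbors(n):
            for a in domains[n]:
                bel[a] *= msgs[(w, n)][a]
        z = sum(bel.values())
        beliefs[n] = {a: x / z for a, x in bel.items()}
    return beliefs

# Toy three-word "chain" (a tree, so BP happens to be exact here).
nodes = ["w0:The", "w1:cat", "w2:sat"]
domains = {n: ["DT", "NN", "VBD"] for n in nodes}
unary = {"w0:The": {"DT": 0.9, "NN": 0.05, "VBD": 0.05},
         "w1:cat": {"DT": 0.1, "NN": 0.8, "VBD": 0.1},
         "w2:sat": {"DT": 0.1, "NN": 0.2, "VBD": 0.7}}
agree = lambda a, b: {("DT", "NN"): 2.0, ("NN", "VBD"): 2.0}.get((a, b), 0.5)
edges = [("w0:The", "w1:cat"), ("w1:cat", "w2:sat")]
pairwise = {e: agree for e in edges}
print(loopy_bp(nodes, domains, unary, pairwise, edges)["w1:cat"])
```

On graphs with cycles, the same code runs unchanged, but the returned beliefs become exactly the loopy approximations whose quality the talk is questioning.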

Decoding: coming up with predictions from the results of inference, e.g., “The cat sat on the mat .” → DT NN VBD IN DT NN .

In practice, one uses general CRFs with several approximations: approximate inference, approximate decoding, a mis-specified model structure, and MAP training (vs. Bayesian). So why are we still maximizing data likelihood? Our system is really more like a Bayes-inspired neural network that makes predictions. (The same issue is present in linear-chain CRFs as well.) Instead, adjust the parameters to (locally) minimize training loss, e.g., via back-propagation (+ annealing): “Empirical Risk Minimization under Approximations” (ERMA) trains directly to minimize task loss (Stoyanov, Ropson, & Eisner 2011; Stoyanov & Eisner 2012).
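A toy sketch of the ERMA idea, reusing loopy_bp from the sketch above: treat the whole approximate-inference pipeline as a function of the parameters and descend on the task loss it actually incurs. The published ERMA work back-propagates through the BP message updates analytically; to keep this sketch short it cheats with finite differences, and the two-parameter triangle model is invented for illustration:

```python
import math

def inference(theta, x):
    """Approximate inference for a 3-variable 'triangle' model (so BP is
    genuinely loopy), reusing loopy_bp from the sketch above.
    theta[0] scales the unary features, theta[1] is the agreement strength.
    Returns the approximate marginal P(y0 = 1)."""
    nodes = [0, 1, 2]
    domains = {n: [0, 1] for n in nodes}
    unary = {n: {1: math.exp(theta[0] * x[n]), 0: 1.0} for n in nodes}
    couple = lambda a, b: math.exp(theta[1]) if a == b else 1.0
    edges = [(0, 1), (1, 2), (0, 2)]
    return loopy_bp(nodes, domains, unary, {e: couple for e in edges},
                    edges, iters=10)[0][1]

def risk(theta, data):
    """Empirical risk: MSE between the BP marginal and the gold label."""
    return sum((inference(theta, x) - y) ** 2 for x, y in data) / len(data)

def erma_train(data, steps=100, lr=1.0, eps=1e-5):
    """Descend on task loss measured *through* approximate inference.
    ERMA differentiates the BP updates analytically; finite differences
    are a slow stand-in here to keep the sketch self-contained."""
    theta = [0.0, 0.0]
    for _ in range(steps):
        base = risk(theta, data)
        grad = []
        for i in range(len(theta)):
            bumped = list(theta)
            bumped[i] += eps
            grad.append((risk(bumped, data) - base) / eps)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

# Toy data: feature triples and the gold label for variable 0.
data = [((+1.0, +0.5, +0.5), 1), ((-1.0, -0.5, -0.5), 0),
        ((+0.8, +0.2, -0.1), 1), ((-0.8, -0.2, +0.1), 0)]
theta = erma_train(data)
print("theta:", theta, "final risk:", risk(theta, data))
```

The point of the exercise: whatever biases the loopy approximation introduces, the parameters are tuned to compensate for them, because the loss is measured on the approximate pipeline's actual outputs.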

Optimization Criteria: MLE.
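In symbols (a hedged reconstruction in our own notation, consistent with the ERMA papers cited above):

$$ \text{MLE:}\qquad \theta^{*} = \arg\max_{\theta} \sum_{(x,y)} \log p_{\theta}(y \mid x) $$

$$ \text{minimum risk:}\qquad \theta^{*} = \arg\min_{\theta} \sum_{(x,y)} L\!\big(y,\ \mathrm{decode}(\mathrm{infer}_{\theta}(x))\big) $$

MLE trains the model as if inference and decoding were exact; minimum-risk training (ERMA) trains the actual approximate pipeline that will run at test time.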

Experimental Results. 3 NLP problems; also synthetic data. We show that general CRFs work better when they match dependencies in the data, and that minimum-risk training results in more accurate models.

ERMA software package available at http://www.clsp.jhu.edu/~ves/software. Includes syntax for describing general CRFs. Supports sum-product and max-product BP. Can optimize several commonly used loss functions: MSE, accuracy, F-score. The package is generic: little effort to model new problems; about 1-3 days to express each problem in our formalism.
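For reference, minimal plain-Python versions of those three losses (our own definitions; the training described above anneals toward such losses through differentiable approximations rather than using these hard forms directly):

```python
def mse(marginals, gold):
    """Mean squared error between predicted marginals and 0/1 gold labels."""
    return sum((p - y) ** 2 for p, y in zip(marginals, gold)) / len(gold)

def accuracy_loss(preds, gold):
    """1 - accuracy of hard predictions."""
    return sum(p != y for p, y in zip(preds, gold)) / len(gold)

def f_score_loss(preds, gold, beta=1.0):
    """1 - F_beta for binary predictions (0 when the F-score is perfect)."""
    tp = sum(1 for p, y in zip(preds, gold) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, gold) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, gold) if p == 0 and y == 1)
    denom = (1 + beta ** 2) * tp + beta ** 2 * fn + fp
    return 1.0 - ((1 + beta ** 2) * tp / denom if denom else 1.0)
```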


Modeling Congressional Votes. An example from the ConVote corpus [Thomas et al., 2006]: predict each representative's vote (Yea/Nay) from the floor-debate text.

“First, I want to commend the gentleman from Wisconsin (Mr. Sensenbrenner), the chairman of the committee on the judiciary, not just for the underlying bill …” → Yea

“Had it not been for the heroic actions of the passengers of United flight 93 who forced the plane down over Pennsylvania, congress’s ability to serve …” → Yea (Mr. Sensenbrenner)

Note that the first speaker praises Mr. Sensenbrenner by name: evidence that the two speakers vote the same way.
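A sketch of how such a vote-prediction model can be wired up as a general CRF, reusing loopy_bp from the sketch above. The structure (a text-based unary potential per speaker, plus pairwise agreement potentials between speakers who mention each other) follows the ConVote modeling idea; the speakers, scores, and link strength below are invented for illustration:

```python
import math

# Hypothetical per-speaker text-classifier scores: log-odds of voting Yea
# based on each speaker's own speeches (invented numbers).
text_score = {"A": 1.2, "B": 0.4, "C": -0.9}

# Speakers who commend each other by name tend to vote the same way,
# so add an agreement edge between them (invented link strength).
mentions = [("A", "B")]  # e.g., A commends B on the floor
AGREE = 1.5

nodes = list(text_score)
domains = {n: ["Yea", "Nay"] for n in nodes}
unary = {n: {"Yea": math.exp(s), "Nay": 1.0} for n, s in text_score.items()}
agree = lambda a, b: math.exp(AGREE) if a == b else 1.0
pairwise = {e: agree for e in mentions}

beliefs = loopy_bp(nodes, domains, unary, pairwise, mentions)
for n in nodes:
    print(n, beliefs[n])  # B's Yea probability is pulled up by A's
```

With more mention edges the graph acquires cycles, inference becomes loopy, and this is exactly the setting where the minimum-risk training above pays off.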

Common architecture. There's not a single best way, but all of the methods share the same needs: store data and permit it to be queried; fuse data (compute derived data using rules); propagate updates to data, parameters, or hypotheses; encapsulate data sources, both input data and analytics; sensitivity analysis (e.g., back-propagation for training); visualization. And they benefit from the same optimizations: decide what is worth the time to compute (next); decide where to compute it (parallelism); decide what is worth the space to store (data, memos, indices); decide how to store it.

Dyna is not a probabilistic database, a graphical model inference package, FACTORIE, BLOG, Watson, or a homebrew evidence combination system. It provides the common infrastructure for these; that's where “all” the implementation effort lies. But it does not commit to any specific data model, probabilistic semantics, or inference strategy. (A toy sketch of this agenda-driven style follows the summary below.)

Summary (again). Principled Bayesian models of various interesting NLP domains: discover underlying structure with little supervision; requires new learning and inference algorithms. Learn fast, accurate policies for structured prediction and large-scale relational reasoning. Unified computational infrastructure for NLP and AI: a declarative programming language that supports modularity, backed by a searchable space of strategies and data structures. Machine learning + linguistic structure: fashion statistical models that capture good intuitions about various kinds of linguistic structure; develop efficient algorithms to apply these models to data; be generic.
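To give the flavor of that infrastructure, here is a toy agenda-based forward-chaining engine in Python. It is emphatically not Dyna (Dyna is a full declarative language with its own syntax, solver, and storage strategies); it only illustrates the "fuse data with rules, propagate updates" loop that such a system automates, on a shortest-path example with invented data:

```python
import heapq

def shortest_paths(edges, source):
    """Agenda-based computation of single-source shortest-path costs.
    The rule, in a Dyna-like reading:  dist(V) min= dist(U) + edge(U, V).
    The engine's job: store derived items, combine them by the rule, and
    propagate updates until nothing on the agenda can improve anything."""
    dist = {source: 0.0}          # the "chart": best value derived so far
    agenda = [(0.0, source)]      # updates waiting to be propagated
    while agenda:
        d, u = heapq.heappop(agenda)
        if d > dist.get(u, float("inf")):
            continue              # stale update; a better one already fired
        for v, w in edges.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd      # better derivation found: update the chart
                heapq.heappush(agenda, (nd, v))   # and propagate further
    return dist

# Invented example graph.
edges = {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)], "c": [("d", 1.0)]}
print(shortest_paths(edges, "a"))  # {'a': 0.0, 'b': 1.0, 'c': 3.0, 'd': 4.0}
```

Swapping the chart, the aggregation operator, or the agenda priority gives different inference strategies for the same rule: that separation of declarative specification from execution strategy is the point of the infrastructure.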
