
## Weighted Deduction as a Programming Language

Jason Eisner, CMU and Google, May 2008. Co-authors on various parts of this work: Eric Goldlust, Noah A. Smith, John Blatz, Wes Filardo, Wren Thornton.

An Anecdote from ACL05: Michael Jordan's talk


Conclusions to draw from that talk: Mike & his students are great. Graphical models are great (because they're flexible). Gibbs sampling is great (because it works with nearly any graphical model). Matlab is great (because it frees up Mike and his students to doodle all day and then execute their doodles).

Systems are big! Large-scale noisy data, complex models, search approximations, software engineering. Maybe a bit smaller outside NLP, but still big and carefully engineered. And they will get bigger, e.g., as machine vision systems do more scene analysis and compositional object modeling.

Consequences: Barriers to entry: a small number of players; significant investment needed to be taken seriously; you need to know and implement the standard tricks. Barriers to experimentation: it's too painful to tear up and reengineer your old system to try a cute idea of unknown payoff. Barriers to education and sharing: it's hard to study or combine systems, and potentially general techniques are described and implemented in only one context at a time.

How to spend one's life: Didn't I just implement something like this last month? Chart management / indexing; cache-conscious data structures; memory layout, file formats, integerization; prioritization of partial solutions (best-first, A*); lazy k-best, forest reranking; parameter management; inside-outside formulas, gradients; different algorithms for training and decoding; conjugate gradient, annealing, parallelization. I thought computers were supposed to automate drudgery!

Solution: Presumably, we ought to add another layer of abstraction. After all, this is CS. I hope to convince you that a substantive new layer exists. But what would it look like? What's shared by many programs?

Can toolkits help? Hmm, there are a lot of toolkits. And they're big too. Plus, they don't always cover what you want, which is why people keep writing them. E.g., I love & use OpenFST and have learned lots from its implementation! But sometimes I also want: automata with > 2 tapes; infinite alphabets; parameter training; A* decoding; automatic integerization; automata defined by policy; mixed sparse/dense implementation (per state); parallel execution; hybrid models (90% finite-state). So what is common across toolkits?

The Dyna language: A toolkit's job is to abstract away the semantics, operations, and algorithms for a particular domain. In contrast, Dyna is domain-independent (like MapReduce, Bigtable, etc.). It manages data & computations that you specify. Toolkits or applications can be built on top.

Warning: there's lots more beyond this talk. See http://dyna.org to read our papers, download an earlier prototype, sign up for updates by email, and wait for the totally revamped next version.

A Quick Sketch of Dyna

How you build a system (big picture slide): a cool model (e.g., a PCFG) leads to practical equations, then to pseudocode that fixes an execution order, and finally to a tuned C++ implementation (data structures, etc.). For PCFG parsing, the pseudocode is the familiar CKY loop nest: for width from 2 to n, for i from 0 to n-width, let k = i+width, and for j from i+1 to k-1, combine the chart entries for spans (i,j) and (j,k).

The Dyna language specifies the equations. Most programs just need to compute some values from other values, and any order is ok: feed-forward! Dynamic programming! Message passing (including Gibbs)! The system must quickly figure out what influences what: compute the Markov blanket, or compute transitions in a state machine.

Some programs also need to update the outputs if the inputs change: spreadsheets, makefiles, email readers; dynamic graph algorithms; EM and other iterative optimization; the energy of a proposed configuration for MCMC; leave-one-out training of smoothing params.
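As a rough illustration of keeping outputs up to date when inputs change, here is a minimal C++ sketch of spreadsheet-style change propagation. It is not how Dyna's runtime works: the `Sheet` type and its `define`, `set`, and `get` members are hypothetical names, and the sketch assumes an acyclic dependency graph and simply recomputes every item downstream of a changed input.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical spreadsheet-style store (not Dyna's runtime): each derived item
// has a rule that computes it from other items, and we remember which items
// read which, so that a change can be pushed downstream.
struct Sheet {
    std::map<std::string, double> value;
    std::map<std::string, std::function<double(const Sheet&)>> rule;
    std::map<std::string, std::vector<std::string>> readers;  // item -> items whose rules read it

    double get(const std::string& item) const { return value.at(item); }

    // Define a derived item from its inputs and compute its initial value.
    void define(const std::string& item, const std::vector<std::string>& inputs,
                std::function<double(const Sheet&)> f) {
        assert(rule.count(item) == 0 && "each derived item is defined once");
        rule[item] = std::move(f);
        for (const auto& in : inputs) readers[in].push_back(item);
        value[item] = rule[item](*this);
    }

    // Change an input, then recompute everything downstream of it.
    // Assumes the dependency graph is acyclic.
    void set(const std::string& item, double v) {
        value[item] = v;
        propagate(item);
    }

private:
    void propagate(const std::string& item) {
        auto it = readers.find(item);
        if (it == readers.end()) return;
        for (const auto& r : it->second) {
            value[r] = rule.at(r)(*this);  // recompute the reader from current inputs
            propagate(r);                  // then push the change further downstream
        }
    }
};
```

For example, defining b from x and y (the Dyna rules b += x. b += y.) and a from b and c (a = b * c.), and then calling set("x", 5.0), recomputes b and then a. Recomputing whole rules is the simplest design; a more efficient one could pass only the change (the delta) into each += sum and visit shared descendants once, in topological order.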

How you build a system (big picture slide), continued: between the practical equations and the tuned C++ implementation sit the compilation strategies (we'll come back to this).

Writing equations in Dyna:

int a.
a = b * c.
a will be kept up to date if b or c changes.

b += x. b += y.
This is equivalent to b = x+y: b is a sum of two variables, also kept up to date.

c += z(1). c += z(2). c += z(3). c += z(four). c += z(foo(bar,5)).
Or, with a pattern: c += z(N).
c is a sum of all nonzero z(...) values. At compile time, we don't know how many!

More interesting uses of patterns:
a = b * c. (scalar multiplication)
a(I) = b(I) * c(I). (pointwise multiplication)
a += b(I) * c(I). means $a = \sum_I b(I)\,c(I)$ (dot product; could be sparse)
a(I,K) += b(I,J) * c(J,K). means $a(I,K) = \sum_J b(I,J)\,c(J,K)$ (matrix multiplication; could be sparse). J is free on the right-hand side, so we sum over it.
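To make the sparse readings of the dot-product and matrix-multiplication rules concrete, here is a hand-written C++ sketch of what such rules compute when only nonzero entries are stored. The `Vec` and `Mat` types and the `dot` and `matmul` functions are illustrative names of our own, not part of Dyna. The point is that a variable appearing only on the right-hand side (I in the dot product, J in the matrix product) is summed over, and absent entries contribute nothing.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Sparse containers: only nonzero entries are stored (an illustrative choice,
// not Dyna's actual representation).
using Vec = std::map<int, double>;
using Mat = std::map<std::pair<int, int>, double>;

// a += b(I) * c(I).
// I is shared by both factors, so we sum over every I present in both b and c.
double dot(const Vec& b, const Vec& c) {
    double a = 0.0;
    for (const auto& [i, bi] : b) {
        auto it = c.find(i);
        if (it != c.end()) a += bi * it->second;
    }
    return a;
}

// a(I,K) += b(I,J) * c(J,K).
// J appears only on the right-hand side, so each a(I,K) sums over every J
// that joins a nonzero b(I,J) with a nonzero c(J,K).
Mat matmul(const Mat& b, const Mat& c) {
    // Index c by its first coordinate J so the join is a lookup, not a scan.
    std::map<int, std::map<int, double>> c_by_row;
    for (const auto& [jk, v] : c) c_by_row[jk.first][jk.second] = v;

    Mat a;
    for (const auto& [ij, bv] : b) {
        auto row = c_by_row.find(ij.second);
        if (row == c_by_row.end()) continue;  // no c(J,K) for this J
        for (const auto& [k, cv] : row->second) a[{ij.first, k}] += bv * cv;
    }
    return a;
}
```

Indexing c by its first coordinate turns the sum over J into a join, so the work grows with the number of matching nonzero pairs rather than with the full matrix dimensions.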

Dyna vs. Prolog: By now you may see what we're up to! Prolog has Horn clauses: a(I,K) :- b(I,J), c(J,K). Dyna has Horn equations: a(I,K) += b(I,J) * c(J,K).

Like Prolog, Dyna allows nested terms, has syntactic sugar for lists, etc., and is Turing-complete. Unlike Prolog, it uses charts, not backtracking; it compiles to efficient C++ classes; and terms have values.

Some connections and intellectual debts: deductive parsing schemata (preferably weighted): Goodman, Nederhof, Pereira, McAllester, Warren, Shieber, Schabes, Sikkel. Deductive databases (preferably with aggregation): Ramakrishnan, Zukowski, Freitag, Specht, Ross, Sagiv, ... Query optimization: usually limited to decidable fragments, e.g., Datalog. Theorem proving: theorem provers, term rewriting, etc.; nonmonotonic reasoning. Programming languages: efficient Prologs (Mercury, XSB, ...); probabilistic programming languages (PRISM, IBAL, ...); declarative networking (P2); XML processing languages (XTatic, CDuce); functional logic programming (Curry, ...); self-adjusting computation and adaptive memoization (Acar et al.). There is increasing interest in resurrecting declarative and logic-based system specifications.

Example: CKY and Variations
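One way to picture "charts, not backtracking" is forward chaining: derived items are stored in a chart, and an agenda holds items whose values have changed and still need to be propagated. The C++ sketch below does this for a hypothetical shortest-path program, path(start) = 0 and path(Y) min= path(X) + edge(X,Y); `Graph` and `shortest_paths` are illustrative names, and this is not Dyna's generated code. Using a priority queue as the agenda, popping the smallest value first, makes it Dijkstra's algorithm, which assumes nonnegative edge weights.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical edge facts: edge(X,Y) with a nonnegative weight.
using Graph = std::map<int, std::vector<std::pair<int, double>>>;

// Forward chaining for the rules
//   path(start) = 0.
//   path(Y) min= path(X) + edge(X,Y).
// The chart holds the best value derived so far for each item; the agenda
// holds (value, item) pairs that still need to be propagated.
std::map<int, double> shortest_paths(const Graph& edges, int start) {
    std::map<int, double> chart;
    using Entry = std::pair<double, int>;
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> agenda;

    chart[start] = 0.0;
    agenda.push({0.0, start});
    while (!agenda.empty()) {
        auto [d, x] = agenda.top();
        agenda.pop();
        if (d > chart.at(x)) continue;  // stale entry: a better value was found later
        auto it = edges.find(x);
        if (it == edges.end()) continue;
        for (const auto& [y, w] : it->second) {
            assert(w >= 0.0 && "a best-first agenda needs nonnegative weights");
            const double cand = d + w;
            auto found = chart.find(y);
            if (found == chart.end() || cand < found->second) {
                chart[y] = cand;         // improved value: update the chart...
                agenda.push({cand, y});  // ...and schedule it for propagation
            }
        }
    }
    return chart;
}
```

Nothing is ever retracted or re-derived by search: each item's value only improves, and only items whose values changed are pushed back onto the agenda.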

The CKY inside algorithm in Dyna:

phrase(X,I,J) += rewrite(X,W) * word(W,I,J).
phrase(X,I,J) += rewrite(X,Y,Z) * phrase(Y,I,Mid) * phrase(Z,Mid,J).
goal += phrase(s,0,sentence-length).

The compiled program is then used from C++:

using namespace cky;
chart c;
c[rewrite(s,np,vp)] = 0.7;
c[word(Pierre,0,1)] = 1;
c[sentence-length] = 30;
cin >> c; // get more axioms from stdin
cout << c[goal]; // print total weight of all parses

Visual debugger: browse the proof forest.
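For comparison, here is a hand-written C++ sketch of roughly what the three CKY rules compute: a grammar in Chomsky normal form, a chart indexed by span and nonterminal, and loops ordered from narrow spans to wide ones so every antecedent is finished before it is used. The `Grammar` struct and the `inside` function are hypothetical names for this illustration, not the code the Dyna compiler emits, and each input word is assumed to have weight 1, as in c[word(Pierre,0,1)] = 1.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical grammar tables in Chomsky normal form.
struct Grammar {
    std::map<std::pair<std::string, std::string>, double> lexical;              // rewrite(X,W)
    std::map<std::tuple<std::string, std::string, std::string>, double> binary;  // rewrite(X,Y,Z)
};

// Inside algorithm as an explicit loop nest for
//   phrase(X,I,J) += rewrite(X,W) * word(W,I,J).
//   phrase(X,I,J) += rewrite(X,Y,Z) * phrase(Y,I,Mid) * phrase(Z,Mid,J).
//   goal += phrase(root,0,n).
// Returns the total weight of all parses of `words` rooted in `root`.
double inside(const Grammar& g, const std::vector<std::string>& words, const std::string& root) {
    const int n = static_cast<int>(words.size());
    assert(n > 0);
    // chart[i][j][X] holds phrase(X,i,j).
    std::vector<std::vector<std::map<std::string, double>>> chart(
        n + 1, std::vector<std::map<std::string, double>>(n + 1));

    // Width-1 spans come from the lexical rule; each word(W,i,i+1) has weight 1.
    for (int i = 0; i < n; ++i)
        for (const auto& [xw, p] : g.lexical)
            if (xw.second == words[i]) chart[i][i + 1][xw.first] += p;

    // Wider spans, narrowest first, so both halves of each split are complete.
    for (int width = 2; width <= n; ++width)
        for (int i = 0; i + width <= n; ++i) {
            const int k = i + width;
            for (int j = i + 1; j < k; ++j)  // j plays the role of Mid
                for (const auto& [xyz, p] : g.binary) {
                    const auto& [x, y, z] = xyz;
                    auto left = chart[i][j].find(y);
                    if (left == chart[i][j].end()) continue;
                    auto right = chart[j][k].find(z);
                    if (right == chart[j][k].end()) continue;
                    chart[i][k][x] += p * left->second * right->second;
                }
        }

    auto it = chart[0][n].find(root);
    return it == chart[0][n].end() ? 0.0 : it->second;
}
```

For example, with rewrite(s,np,vp) = 0.7 and lexical rules rewrite(np,Pierre) = 1 and rewrite(vp,sleeps) = 1, calling inside on the words Pierre sleeps with root s returns 0.7.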

Parameterization: rewrite(X,Y,Z) doesn't have to be an atomic parameter:

urewrite(X,Y,Z) *= weight1(X,Y).
urewrite(X,Y,Z) *= weight2(X,Z).
urewrite(X,Y,Z) *= weight3(Y,Z).
urewrite(X,Same,Same) *= weight4.
urewrite(X) += urewrite(X,Y,Z). % normalizing constant
rewrite(X,Y,Z) = urewrite(X,Y,Z) / urewrite(X). % normalize
phrase(X,I,J) += rewrite(X,W) * word(W,I,J).
phrase(X,I,J) += rewrite(X,Y,Z) * phrase(Y,I,Mid) * phrase(Z,Mid,J).
goal += phrase(s,0,sentence-length).

Related algorithms in Dyna, starting from the same three parsing rules: Viterbi parsing (replace each += with max=); logarithmic domain; lattice parsing; incremental (left-to-right) parsing; log-linear parsing; lexicalized or synchronous parsing; binarized CKY; Earley's algorithm.
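Here urewrite(X,Y,Z) is the product of the weight factors that apply to it, and rewrite(X,Y,Z) divides by the per-parent total urewrite(X). Below is a minimal C++ sketch of that computation, assuming a fixed list of nonterminals, that every (X,Y,Z) triple is enumerated, and that a missing weight leaves the product unchanged (treated as 1). `Factors`, `lookup`, and `normalized_rewrite` are hypothetical names, and this is an illustration rather than the compiled Dyna program.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

using Pair = std::pair<std::string, std::string>;
using Triple = std::tuple<std::string, std::string, std::string>;

// Hypothetical factor tables; a missing entry is treated as weight 1.
struct Factors {
    std::map<Pair, double> weight1;  // weight1(X,Y)
    std::map<Pair, double> weight2;  // weight2(X,Z)
    std::map<Pair, double> weight3;  // weight3(Y,Z)
    double weight4 = 1.0;            // extra factor when Y == Z
};

static double lookup(const std::map<Pair, double>& m, const std::string& a, const std::string& b) {
    auto it = m.find({a, b});
    return it == m.end() ? 1.0 : it->second;
}

// urewrite(X,Y,Z) *= weight1(X,Y) ... *= weight3(Y,Z);  urewrite(X,Same,Same) *= weight4.
// urewrite(X) += urewrite(X,Y,Z).                   % normalizing constant
// rewrite(X,Y,Z) = urewrite(X,Y,Z) / urewrite(X).   % normalize
std::map<Triple, double> normalized_rewrite(const Factors& f, const std::vector<std::string>& nts) {
    std::map<Triple, double> urewrite;
    std::map<std::string, double> total;  // urewrite(X)
    for (const auto& x : nts)
        for (const auto& y : nts)
            for (const auto& z : nts) {
                double u = lookup(f.weight1, x, y) * lookup(f.weight2, x, z) * lookup(f.weight3, y, z);
                if (y == z) u *= f.weight4;
                urewrite[{x, y, z}] = u;
                total[x] += u;
            }

    std::map<Triple, double> rewrite;
    for (const auto& [xyz, u] : urewrite) {
        const double norm = total.at(std::get<0>(xyz));
        assert(norm > 0.0 && "normalizing constant must be positive");
        rewrite[xyz] = u / norm;
    }
    return rewrite;
}
```

For each parent X, the resulting rewrite(X,Y,Z) values sum to 1, which is exactly what dividing by the normalizing constant urewrite(X) guarantees.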

To Sum Up: Contrastive Estimation means picking your own denominator for tractability or for accuracy (or, as in our case, for both). Now we can use the task to guide the unsupervised learner (like discriminative techniques do for supervised learners). It's a particularly good fit for log-linear models: with max ent features; unsupervised sequence models; all in time for ACL 2006.
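One common way to write the contrastive estimation objective for a log-linear model makes "picking your own denominator" concrete. In this sketch, $u_\theta(x,y) = \exp(\theta \cdot f(x,y))$ is the unnormalized score of an observed input $x$ with hidden structure $y$, and $\mathcal{N}(x_i)$ is a chosen neighborhood of each training example $x_i$ that contains $x_i$ itself; these symbols are our notation, not the talk's.

$$
\hat{\theta} = \arg\max_{\theta} \prod_{i} \frac{\sum_{y} u_\theta(x_i, y)}{\sum_{x' \in \mathcal{N}(x_i)} \sum_{y} u_\theta(x', y)}
$$

Taking $\mathcal{N}(x_i)$ to be every possible input recovers ordinary maximum likelihood, whose denominator is the usually intractable partition function; a small neighborhood, such as perturbations of $x_i$, keeps the denominator tractable and lets the choice of neighborhood encode what the learner should distinguish.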
