Compactly Representing Parallel Program Executions Ankit Goel Abhik Roychou


Compactly Representing Parallel Program Executions
Ankit Goel, Abhik Roychoudhury, Tulika Mitra
National University of Singapore

Path profiles
Profiling a program's execution can be count based or path based.
Count-based profiles are more aggregate: the number of executions of the program's basic blocks, or the number of accesses to various memory locations.
Path-based profiles are more accurate: the sequence of basic blocks executed, or the sequence of memory locations accessed.
Use online compression to generate compact path profiles.

Organization
Compressed Path Profiles in Sequential Programs
Parallel Program Path Profiles
Compression Efficiency and Overheads
Data race detection over path profiles


Compressed Path: Example
[Figure: control flow graph with basic blocks 1, 2, 3.]
Uncompressed path: 123123
Compressed representation: S -> AA, A -> 123

Online Path Compression
A program path is a string over a finite alphabet.
The alphabet is decided by what we instrument: control flow (basic blocks executed) or data flow (memory locations accessed).
A string s is represented by a context-free grammar Gs whose language is {s}.
Construction of Gs is online, not post-mortem: start with a trivial grammar and modify it for each symbol.
No recursive rules (DAG representation).
Compression scheme: Nevill-Manning & Witten 97.
Application to program paths: Larus 99.

Online Compression in action
Path executed    Compressed representation
1                S -> 1
12               S -> 12
123              S -> 123
1231             S -> 1231
12312            S -> 12312
                 then S -> A3A, A -> 12

Online Compression in action (continued)
Path executed    Compressed representation
123123           S -> A3A3, A -> 12
                 then S -> BB, B -> A3, A -> 12
                 then S -> BB, B -> 123

Parallel Program Path Profiles

What to represent?
Control/data flow in each program thread.
Communication among threads: synchronization (locks, barriers) and unsynchronized shared variable accesses.
It is too costly to observe and record the order of all shared variable accesses.
We will represent the compressed flow in each thread (via a grammar) and communication via synchronizations (how?).
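The online construction traced above for the path 123123 can be sketched in a few lines. This is a naive illustration, not the authors' implementation: it re-scans the whole grammar after every symbol instead of using the linear-time bookkeeping of the Nevill-Manning & Witten scheme, and it assumes terminals are integers while rule names are strings. The names find_repeat, substitute, enforce, compress and expand, and the rule names R1, R2, ..., are hypothetical.

```python
from collections import Counter
from itertools import count

def find_repeat(rules):
    """Return a digram that occurs twice without overlap, or None."""
    seen = {}
    for name, body in rules.items():
        for i in range(len(body) - 1):
            digram = (body[i], body[i + 1])
            if digram not in seen:
                seen[digram] = (name, i)
            elif seen[digram] != (name, i - 1):  # skip overlaps such as "aaa"
                return digram
    return None

def substitute(body, digram, name):
    """Replace non-overlapping occurrences of digram in body by name."""
    out, i = [], 0
    while i < len(body):
        if tuple(body[i:i + 2]) == digram:
            out.append(name)
            i += 2
        else:
            out.append(body[i])
            i += 1
    return out

def enforce(rules, fresh):
    """Restore 'no repeated digram' and 'every rule used at least twice'."""
    while True:
        digram = find_repeat(rules)
        if digram is not None:
            # Reuse a rule whose body is exactly this digram, else create one.
            owner = next((n for n, b in rules.items()
                          if n != "S" and tuple(b) == digram), None)
            if owner is None:
                owner = f"R{next(fresh)}"
                rules[owner] = list(digram)
            for n in list(rules):
                if n != owner:
                    rules[n] = substitute(rules[n], digram, owner)
            continue
        uses = Counter(s for b in rules.values() for s in b if s in rules)
        once = next((n for n in rules if n != "S" and uses[n] == 1), None)
        if once is None:
            return
        body = rules.pop(once)  # inline a rule that is referenced only once
        for n, b in list(rules.items()):
            if once in b:
                i = b.index(once)
                rules[n] = b[:i] + body + b[i + 1:]

def compress(path):
    """Build the grammar online, one executed symbol at a time."""
    rules, fresh = {"S": []}, count(1)
    for symbol in path:
        rules["S"].append(symbol)
        enforce(rules, fresh)
    return rules

def expand(rules, symbol="S"):
    """Decompress: the grammar generates exactly one string."""
    if symbol not in rules:
        return [symbol]
    return [t for s in rules[symbol] for t in expand(rules, s)]

grammar = compress([1, 2, 3, 1, 2, 3])
print(grammar)
```

For the path 123123 this sketch ends with S -> R2 R2 and R2 -> 1 2 3, which is the slide's S -> BB, B -> 123 up to rule naming, and expand returns the original path.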


Synchronization Pattern (Locks)
[Figure: Message Sequence Chart (MSC) for Pgm = P1 || P2 with processes P1, P2 and Memory; message labels: lock, unlock, Compute.]

Synchronization Pattern (Barrier)
[Figure: MSC for Pgm = P1 || P2 with processes P1, P2 and Memory; labels: Compute, ready, Blocked, go.]

Connection to MSCs
[Figure: Th. 1, Th. 2 and Shared Mem. exchanging lock and unlock messages.]
The partial order of the MSC matches the observed ordering:
Total order in each thread.
Ordering across threads is visible via synchronization (message exchange).
All synchronization operations form a total order.

A first cut
Instrument each thread to observe local control/data flow and global synchronization.
Represent the path profile of P1 || P2 as follows:
Each thread's flow as a grammar, giving (G1, G2); the grammars contain the synchronization operations as well.
All synchronization operations as a list.
Associate entries in this list with the occurrences of synchronization operations in (G1, G2).

How to navigate the path profile?
Zoom in to a specific lock-unlock segment of P1.

Edge annotations
Trace of one thread: a b (lock) c (unlock) x b (lock) c (unlock) y
[Figure: grammar for one thread drawn as a DAG with nodes S, A, a, b, c, x, y and edge annotations 0, 2, 0, 1, 2, 4.]

Locating synch. operations
[Figure: the same annotated grammar, locating the 3rd synchronization operation.]
Synchronization segments can be found by looking up the global list.
[Figure: node X with child Y; a brace marks n synch ops., and the edge carries the annotation n.]
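The lookup of the 3rd synchronization operation can be sketched as follows. This is one reading of the slide, not the authors' code: it assumes the thread's grammar is S -> a A x A y, A -> b c, and that each edge annotation is the number of synch ops. that precede that child within its rule. The names SYNC, RULES, sync_count, annotate and locate are hypothetical.

```python
SYNC = {"b", "c"}  # b = lock, c = unlock in the slide's example
RULES = {"S": ["a", "A", "x", "A", "y"], "A": ["b", "c"]}  # assumed grammar

def sync_count(symbol, rules, memo):
    """Number of synch ops. in the expansion of symbol."""
    if symbol not in rules:
        return 1 if symbol in SYNC else 0
    if symbol not in memo:
        memo[symbol] = sum(sync_count(s, rules, memo) for s in rules[symbol])
    return memo[symbol]

def annotate(rules):
    """Per rule: how many synch ops. precede each child (edge annotation)."""
    memo, ann = {}, {}
    for name, body in rules.items():
        before, total = [], 0
        for s in body:
            before.append(total)
            total += sync_count(s, rules, memo)
        ann[name] = before
    return ann

def locate(k, rules, ann, start="S"):
    """Root-to-leaf path, as (rule, child index) pairs, to the k-th synch op.

    Only the rules on that path are visited; nothing is decompressed.
    """
    memo, path, rule = {}, [], start
    while True:
        for i, s in enumerate(rules[rule]):
            n = sync_count(s, rules, memo)
            if ann[rule][i] < k <= ann[rule][i] + n:
                path.append((rule, i))
                k -= ann[rule][i]  # rank of the op. inside this child
                break
        else:
            raise IndexError("fewer synch ops. than requested")
        if s not in rules:  # reached a terminal: this is the synch op.
            return path
        rule = s

ANN = annotate(RULES)
print(ANN)
print(locate(3, RULES, ANN))
```

Under these assumptions the 3rd synch op. is the second lock: the path goes through the second occurrence of A in S and then to b. A production version would binary-search the annotations instead of scanning the children.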

So far
Control flow of each thread is stored as a grammar.
Synchronization operations form a global list.
The grammar of each thread is annotated with counts.
Searching for synchronization operations is easy.

What about shared data accesses?
The sequence of memory locations accessed by a single LD/ST instruction can be compressed.
Use a grammar representation for this sequence as well.

Further compression
Locations accessed by a memory operation: 10, 14, 18, 22, 26, 54, 58, 62, 66, 70, 98
Difference representation + run-length encoding: 10(1), 4(4), 28(1), 4(4), 28(1)
This string is then compressed online as a grammar.
Useful for detecting regularity of array accesses: a sweep through an array gives a run of constant differences, as does accessing a sub-grid of a multidimensional array.
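The difference representation with run-length encoding can be reproduced on the slide's address sequence with a short sketch. The function names diff_rle and decode are hypothetical, and taking the first difference relative to 0 is an assumption that matches the leading 10(1) on the slide.

```python
def diff_rle(addresses):
    """Encode addresses as (difference, run length) pairs."""
    runs, prev = [], 0  # first difference is taken relative to 0
    for addr in addresses:
        diff = addr - prev
        prev = addr
        if runs and runs[-1][0] == diff:
            runs[-1][1] += 1  # extend the current run of equal differences
        else:
            runs.append([diff, 1])
    return [tuple(r) for r in runs]

def decode(runs):
    """Invert diff_rle by accumulating the differences."""
    out, current = [], 0
    for diff, length in runs:
        for _ in range(length):
            current += diff
            out.append(current)
    return out

trace = [10, 14, 18, 22, 26, 54, 58, 62, 66, 70, 98]
runs = diff_rle(trace)
print(runs)  # [(10, 1), (4, 4), (28, 1), (4, 4), (28, 1)]
```

The repeated pair 4(4), 28(1) is what a subsequent grammar-based compressor can capture as a single rule.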

Compression Efficiency and Overheads

Any better than gzip?
[Figure: Compression % (2 processors).]

Scalability of Compression
[Figure: Compression % for our scheme.]

Concerns about Timing Overheads
Our scheme does not add substantial time overhead over grammar-based string compression.
Our experiments were conducted using RSIM.
Tracing overheads can be higher in a real multiprocessor.
Can tracing distort program behavior?
Possible solution: trace a minimal number of operations in a parallel program execution (Netzer 1993) to ensure deterministic replay, then collect the compressed path profile during replay.

Data race detection over path profiles

Apparent Data races
[Figure: Th. 1, Th. 2, Th. 3 and Mem. exchanging lock and unlock messages.]
Last unlock in Th. 1 (first unlock); next lock in Th. 1 (second lock).
Locate the root-to-leaf paths of these operations.
Work on the tree rooted at the least common ancestor of these operations.
No decompression of the grammar of Th. 1.

Data race artifacts
Sub := 1
A[1] := 0
X := Sub; Y := A[X] (artifact)
X decides which address is accessed in Y := A[X].
X is set by Sub := 1, which is also in a data race.
Detecting artifacts requires data flow, which is not captured by rd/wr sets in synch. segments but is captured in our compact path profiles.
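To see why rd/wr sets alone miss artifacts, consider a minimal sketch of the set-based conflict check on the example above. It assumes, beyond what the slide states, that the two writes run in one thread and the two reads in another, and that the two synch. segments are already known to be unordered. The function conflicts and the segment dictionaries are hypothetical.

```python
def conflicts(seg_a, seg_b):
    """Locations touched by both segments where at least one access is a write."""
    return ((seg_a["wr"] & (seg_b["rd"] | seg_b["wr"]))
            | (seg_b["wr"] & seg_a["rd"]))

# Assumed split of the slide's statements across two unordered segments.
writer = {"rd": set(), "wr": {"Sub", "A[1]"}}          # Sub := 1; A[1] := 0
reader = {"rd": {"Sub", "A[1]"}, "wr": {"X", "Y"}}     # X := Sub; Y := A[X]

print(sorted(conflicts(writer, reader)))  # ['A[1]', 'Sub']
```

The check reports both Sub and A[1], but nothing in the sets records that the address read by Y := A[X] depends on the racy value of Sub. Separating the artifact from the underlying race needs the data flow kept in the path profile.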

Summary
Compressed representation of the execution profile of shared memory parallel programs: control and shared data flow per thread, and synchronization patterns across threads.
Overall compression efficiency: 0.25% to 9.81%.
Compression efficiency is scalable with an increasing number of processors.
Application: post-mortem debugging, such as detecting data races.

Other Applications
We do not capture the actual order of unsynchronized shared memory accesses across processors.
This order can be useful in making architectural decisions such as the choice of cache coherence protocol.
It is sufficient to maintain [Netzer 1993] the transitive reduction of: program order on each processor; shared variable conflict orders.
Can we capture the transitive reduction relation via annotations of WPP edges?
