
## Parallel and Distributed Algorithms

Eric Vidal

Reference: R. Johnsonbaugh and M. Schaefer, *Algorithms* (International Edition). Pearson Education, 2004.

Outline:

- Introduction (case study: maximum element)
  - Work-optimality
  - The Parallel Random Access Machine
  - Shared memory modes
  - Accelerated cascading
- Other parallel architectures (case study: sorting)
  - Circuits
  - Linear processor networks
  - (Mesh processor networks)
- Distributed algorithms
  - Message-optimality
  - Broadcast and echo
  - (Leader election)

### Introduction


**Why use parallelism?** A job of p steps on 1 printer takes 1 step on p printers; p is the speed-up factor (best case). Given a sequential algorithm, how can we parallelize it? Some problems are inherently sequential (P-complete).

**Case study: maximum element.** In: a[]. Out: the maximum element in a.

`sequential-maximum(a) { n = a.length; max = a[0]; for i = 1 to n − 1 { if (a[i] > max) max = a[i] }; return max }`

Example: for a = 21 11 23 17 48 33 22 41, the running maximum after each comparison is 21 23 23 48 48 48 48. This takes O(n) time.

**Parallel maximum.** Idea: use n / 2 processors, each comparing one pair, and repeat on the winners. Note the idle processors after the first step! On the same input the rounds produce 21 23 48 41, then 23 48, then 48. This takes O(lg n) time.
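A minimal Python sketch of both ideas, for checking the traces above. Python and the function names are our choices (the slides use pseudocode), and each parallel round is simulated with an ordinary loop rather than real processors.

```python
from typing import List

def sequential_maximum(a: List[int]) -> int:
    """Scan left to right, keeping the largest element seen so far (n - 1 comparisons)."""
    best = a[0]
    for x in a[1:]:
        if x > best:
            best = x
    return best

def tournament_rounds(a: List[int]) -> List[List[int]]:
    """Simulate the pairwise parallel maximum: in each round, one (simulated)
    processor per pair keeps the larger element. Returns the list of winners
    after every round; there are ceil(lg n) rounds."""
    rounds = []
    current = list(a)
    while len(current) > 1:
        # Pair up neighbours; an unpaired last element passes through unchanged.
        current = [max(current[i:i + 2]) for i in range(0, len(current), 2)]
        rounds.append(current)
    return rounds

a = [21, 11, 23, 17, 48, 33, 22, 41]
print(sequential_maximum(a))  # 48
for r in tournament_rounds(a):
    print(r)
# [21, 23, 48, 41]
# [23, 48]
# [48]
```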

### Work-Optimality

Work = number of algorithmic steps × number of processors. The work of the parallel maximum algorithm is O(lg n) × (n / 2) = O(n lg n). This is not work-optimal: the sequential algorithm's work is O(n). Workaround: accelerated cascading.

Next comes the formal algorithm for parallel maximum. But first, the machine model: the Parallel Random Access Machine.
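A worked count for the 8-element example, charging all n / 2 processors in every round as the work formula does:

$$
\begin{aligned}
W_{\text{seq}} &= n - 1 = 7 \\
W_{\text{par}} &= \lceil \lg n \rceil \cdot \frac{n}{2} = 3 \cdot 4 = 12 \\
n = 1024:\quad W_{\text{seq}} &= 1023, \qquad W_{\text{par}} = 10 \cdot 512 = 5120
\end{aligned}
$$

The ratio W_par / W_seq grows like (lg n) / 2, which is exactly the gap that accelerated cascading removes.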

### The Parallel Random Access Machine (PRAM)

New construct, the parallel loop: `for i = 1 to n in parallel { }`

- Assumption 1: n processors execute this loop, one per iteration (the processors are synchronized).
- Assumption 2: memory is shared across all processors.

**Example: parallel search.** In: a[], x. Out: true if x is in a, false otherwise.

`parallel-search(a, x) { n = a.length; found = false; for i = 0 to n − 1 in parallel { if (a[i] == x) found = true }; return found }`

Is this work-optimal?

Shared memory modes: Exclusive Read (ER), Concurrent Read (CR), Exclusive Write (EW), Concurrent Write (CW). Real-world systems are most commonly CREW. Which type does parallel-search run on?

### Formal Algorithm for Parallel Maximum

In: a[]. Out: the maximum element in a.

`parallel-maximum(a) { n = a.length; for i = 0 to ⌈lg n⌉ − 1 { for j = 0 to ⌈n / 2^(i+1)⌉ − 1 in parallel { if (j × 2^(i+1) + 2^i < n) /* boundary check */ a[j × 2^(i+1)] = max(a[j × 2^(i+1)], a[j × 2^(i+1) + 2^i]) } }; return a[0] }`

Theorem: parallel-maximum is CREW and finds the maximum element in parallel time O(lg n) and work O(n lg n).

### Accelerated Cascading

- Phase 1: use sequential-maximum on blocks of lg n elements. We use n / lg n processors, each performing O(lg n) sequential steps. Total work = O(lg n) steps × (n / lg n) processors = O(n).
- Phase 2: use parallel-maximum on the resulting n / lg n elements. This takes lg(n / lg n) = lg n − lg lg n = O(lg n) parallel steps. Total work = O(lg n) steps × ((n / lg n) / 2) processors = O(n).

### Formal Algorithm for Optimal Maximum

In: a[]. Out: the maximum element in a.

`optimal-maximum(a) { n = a.length; block-size = ⌈lg n⌉; block-count = ⌈n / block-size⌉; create array block-results[block-count]; for i = 0 to block-count − 1 in parallel { start = i × block-size; end = min(n − 1, start + block-size − 1); block-results[i] = sequential-maximum(a[start..end]) }; return parallel-maximum(block-results) }`

**Some notes.** All CR algorithms can be converted to ER algorithms!
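A sketch that simulates parallel-maximum and optimal-maximum sequentially in Python. Each synchronized parallel step is emulated by a plain loop over j; that is safe here because the iterations of one step read and write disjoint cells. The names, the `op` parameter (any associative operation, not only max), and the use of `bit_length` to compute ⌈lg n⌉ are our own choices, not part of the original pseudocode.

```python
import functools
import operator
from typing import Callable, List, Tuple

def parallel_maximum(a: List[int], op: Callable[[int, int], int] = max) -> Tuple[int, int]:
    """Simulate the PRAM parallel-maximum on a copy of `a`.
    Returns (result, number of parallel rounds)."""
    a = list(a)
    n = len(a)
    rounds = (n - 1).bit_length()            # ceil(lg n) for n >= 1
    for i in range(rounds):
        step = 2 ** i
        # left = j * 2^(i+1) for j = 0 .. ceil(n / 2^(i+1)) - 1: one parallel step.
        for left in range(0, n, 2 ** (i + 1)):
            if left + step < n:              # boundary check
                a[left] = op(a[left], a[left + step])
    return a[0], rounds

def optimal_maximum(a: List[int], op: Callable[[int, int], int] = max) -> int:
    """Accelerated cascading: phase 1 reduces blocks of about lg n elements
    sequentially (one simulated processor per block); phase 2 runs
    parallel_maximum on the block results."""
    n = len(a)
    block_size = max(1, (n - 1).bit_length())   # ceil(lg n), at least 1
    block_results = []
    for start in range(0, n, block_size):        # conceptually in parallel
        end = min(n - 1, start + block_size - 1)
        block_results.append(functools.reduce(op, a[start:end + 1]))
    return parallel_maximum(block_results, op)[0]

a = [21, 11, 23, 17, 48, 33, 22, 41]
print(parallel_maximum(a))                    # (48, 3)
print(optimal_maximum(a))                     # 48
print(optimal_maximum(a, op=min))             # 11
print(optimal_maximum(a, op=operator.add))    # 216
```

Passing `min` or `operator.add` shows that nothing in the two phases depends on max itself, only on the operation being associative.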
Converting CR to ER has a cost: broadcasting an ER variable to all processors for concurrent access takes O(lg n) parallel time.

Maximum is a semigroup algorithm. A semigroup is a set of elements together with an associative binary operation (max, min, +, ×, etc.). The same accelerated-cascading method can be applied to minimum element, summation, product of n numbers, etc.!

### Other Parallel Architectures

The PRAM may not be the best model:

- Shared memory is expensive!
- Some algorithms require communication between processors (memory-locking issues). Better to use channels!
- Extreme case: very simple processors with no shared memory, just communication channels.

### Circuits

Each processor is a gate with a specialized function, e.g., a comparator gate, which takes inputs x and y and outputs min(x, y) and max(x, y). A circuit is a layout of gates that performs a full task (e.g., sorting).

[Figure: sorting circuit for 4 elements, depth 3 (Steps 1, 2, 3). The example input 17, 42, 23, 7 comes out as 7, 17, 23, 42.]

**Sorting circuit for n elements.** Start with a simpler problem: the maximum element. Idea: add as many diagonals of comparators as needed.

**Odd-even transposition network.** Theorem: the odd-even transposition network sorts n numbers in n steps with O(n²) processors.

**Zero-one principle of sorting networks.** Lemma: if a sorting network works correctly on all inputs consisting only of 0s and 1s, it works for any arbitrary input.

Proof: Assume there is a network that sorts all 0-1 sequences but not some arbitrary input $a_0, \dots, a_{n-1}$. Let $b_0, \dots, b_{n-1}$ be the output of that network. There must exist $s < t$ such that $b_s > b_t$. Label every $a_i < b_s$ with 0 and all others with 1. This labeling is monotone, and a comparator commutes with monotone functions (the label of min(x, y) is the min of the labels, likewise for max), so if we run the labels through the network, $b_s$'s label will be 1 and $b_t$'s label will be 0. Contradiction: the network is assumed to sort 0-1 sequences properly but did not do so here!
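A small comparator-network sketch. The 4-input, depth-3 layout `FOUR` is one standard layout that reproduces the example 17, 42, 23, 7 → 7, 17, 23, 42; we assume, but cannot confirm, that it is the one in the slide's figure. All helper names are hypothetical. The zero-one principle is what makes the exhaustive check over the 2^n binary inputs a proof that the network sorts every input of length n.

```python
from itertools import product
from typing import List, Sequence, Tuple

Comparator = Tuple[int, int]   # (i, j) with i < j: min goes to wire i, max to wire j

def run_network(layers: Sequence[Sequence[Comparator]], values: Sequence[int]) -> List[int]:
    """Apply each layer of comparator gates in turn. Gates within a layer
    touch disjoint wires, so they could all fire in the same step."""
    v = list(values)
    for layer in layers:
        for i, j in layer:
            if v[i] > v[j]:
                v[i], v[j] = v[j], v[i]
    return v

def odd_even_transposition(n: int) -> List[List[Comparator]]:
    """n layers alternating between pairs (0,1),(2,3),... and (1,2),(3,4),..."""
    return [[(i, i + 1) for i in range(step % 2, n - 1, 2)] for step in range(n)]

def sorts_all_zero_one_inputs(layers, n: int) -> bool:
    """By the zero-one principle, passing all 2**n binary inputs is enough."""
    return all(run_network(layers, bits) == sorted(bits)
               for bits in product((0, 1), repeat=n))

# Depth-3 network for 4 inputs: sort pairs, route min/max outward, fix the middle.
FOUR = [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(1, 2)]]

print(run_network(FOUR, [17, 42, 23, 7]))                        # [7, 17, 23, 42]
print(sorts_all_zero_one_inputs(FOUR, 4))                        # True
print(sorts_all_zero_one_inputs(FOUR[:2], 4))                    # False
print(sorts_all_zero_one_inputs(odd_even_transposition(6), 6))   # True
```

Dropping the last layer of `FOUR` makes the check fail (e.g. on 0, 1, 0, 1), which shows the binary test is not vacuous.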
### Correctness of the Odd-Even Transposition Network

By the zero-one principle, it suffices to consider a binary sequence $a_0, \dots, a_{n-1}$. Let $a_i$ be the first 0 in the sequence. There are two cases: i is odd or i is even. To sort $a_0, \dots, a_i$ we need i steps (worst case). Induction: given that $a_0, \dots, a_k$ (where $k \ge i$) will sort in k steps, will $a_0, \dots, a_{k+1}$ get sorted in k + 1 steps?

### Better Sorting Networks

- Batcher's bitonic sorter (1968): depth O(lg² n), size O(n lg² n). Idea: sort 2 groups (recursively), then merge them using a network that can sort bitonic sequences.
- AKS network (1983), by Ajtai, Komlós and Szemerédi: depth O(lg n), size O(n lg n). Not practical! It hides a very large constant c in its c · n lg n size bound.
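A sketch of Batcher's bitonic sorter as a generator of comparators, assuming n is a power of two. Each comparator carries a direction flag so that one half can be sorted ascending and the other descending, which together form a bitonic sequence for the merge. Function names are ours; the comparator count (24 for n = 8) matches the (n/2) · lg n · (lg n + 1) / 2 size of this construction, i.e. O(n lg² n).

```python
from itertools import product
from typing import List, Tuple

Gate = Tuple[int, int, bool]   # (i, j, ascending): ascending puts the min on wire i

def bitonic_merge(lo: int, n: int, ascending: bool, out: List[Gate]) -> None:
    """Append comparators that sort the bitonic sequence on wires lo .. lo+n-1."""
    if n == 1:
        return
    half = n // 2
    for i in range(lo, lo + half):            # half-cleaner: compare i with i + half
        out.append((i, i + half, ascending))
    bitonic_merge(lo, half, ascending, out)
    bitonic_merge(lo + half, half, ascending, out)

def bitonic_sorter(lo: int, n: int, ascending: bool, out: List[Gate]) -> None:
    """Sort the first half ascending and the second half descending, then merge."""
    if n == 1:
        return
    half = n // 2
    bitonic_sorter(lo, half, True, out)
    bitonic_sorter(lo + half, half, False, out)
    bitonic_merge(lo, n, ascending, out)

def apply_network(gates: List[Gate], values) -> list:
    v = list(values)
    for i, j, ascending in gates:
        if (v[i] > v[j]) == ascending:        # out of order for this gate's direction
            v[i], v[j] = v[j], v[i]
    return v

net: List[Gate] = []
bitonic_sorter(0, 8, True, net)
print(len(net))                                        # 24
print(apply_network(net, [5, 1, 7, 3, 8, 2, 6, 4]))    # [1, 2, 3, 4, 5, 6, 7, 8]
print(all(apply_network(net, bits) == sorted(bits)
          for bits in product((0, 1), repeat=8)))      # True
```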

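### More Intelligent Processors: Processor Networks

Topologies: star (diameter 2), linear/ring (diameter n − 1, or ⌊n/2⌋ for a ring), completely connected (diameter 1), and mesh.

### Sorting on Linear Networks

Emulate an odd-even transposition network! This takes O(n) steps, and the work is O(n²). We can't expect better on a linear network: its diameter is n − 1, so an element may need n − 1 hops to reach its place.

| Step | Contents |
|---|---|
| 0 (input) | 18 42 31 56 12 11 19 |
| 1 | 18 42 31 56 11 12 19 |
| 2 | 18 31 42 11 56 12 19 |
| 3 | 18 31 11 42 12 56 19 |
| 4 | 18 11 31 12 42 19 56 |
| 5 | 11 18 12 31 19 42 56 |
| 6 | 11 12 18 19 31 42 56 |
| 7 | 11 12 18 19 31 42 56 |

### Sorting on Mesh Networks: Shearsort

Goal: arrange the numbers in boustrophedon (snake) order, with rows read alternately left to right and right to left. Example: a = { 15, 4, 10, 6, 1, 5, 7, 11, 12, 14, 13, 8, 9, 16, 2, 3 }. Algorithm: sort rows, sort columns, repeat, alternating a row phase and a column phase.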
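The linear-network trace can be reproduced with this sketch, which simulates n processors in a line holding one value each. It assumes the first step pairs processors (0,1), (2,3), ... and the next step (1,2), (3,4), ..., which is the order that matches the table; the function name is ours.

```python
from typing import List

def linear_array_sort(values: List[int]) -> List[List[int]]:
    """Odd-even transposition sort on a linear array of processors.
    In step s, disjoint neighbour pairs (starting at position s % 2)
    swap their values if out of order. Returns the contents after
    every step, starting with the input."""
    v = list(values)
    trace = [list(v)]
    n = len(v)
    for step in range(n):
        for i in range(step % 2, n - 1, 2):   # all pairs act in one parallel step
            if v[i] > v[i + 1]:
                v[i], v[i + 1] = v[i + 1], v[i]
        trace.append(list(v))
    return trace

for row in linear_array_sort([18, 42, 31, 56, 12, 11, 19]):
    print(*row)
```

Each processor talks only to its two neighbours, so every step costs one message per pair, and the n steps times n processors give the O(n²) work.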

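A sketch of shearsort on the 4 × 4 example, assuming the 16 numbers are first placed in the mesh row by row (that placement is our assumption). Each row or column sort stands in for what the mesh would do with odd-even transposition along that line. The number of row phases, ⌈lg r⌉ + 1 for r rows with a column phase between consecutive row phases, is the standard bound for shearsort rather than something stated on the slides.

```python
from typing import List

def shearsort(grid: List[List[int]]) -> List[List[int]]:
    """Row phases sort even rows ascending and odd rows descending (snake order);
    column phases sort every column top to bottom."""
    g = [list(row) for row in grid]
    r, c = len(g), len(g[0])
    row_phases = (r - 1).bit_length() + 1      # ceil(lg r) + 1
    for phase in range(row_phases):
        for i in range(r):                     # row phase
            g[i].sort(reverse=(i % 2 == 1))
        if phase < row_phases - 1:             # column phase between row phases
            for j in range(c):
                col = sorted(g[i][j] for i in range(r))
                for i in range(r):
                    g[i][j] = col[i]
    return g

def snake_order(grid: List[List[int]]) -> List[int]:
    """Read the grid in boustrophedon order."""
    return [x for i, row in enumerate(grid) for x in (row if i % 2 == 0 else row[::-1])]

a = [15, 4, 10, 6, 1, 5, 7, 11, 12, 14, 13, 8, 9, 16, 2, 3]
grid = [a[k:k + 4] for k in range(0, 16, 4)]   # row-major placement (assumed)
result = shearsort(grid)
for row in result:
    print(row)
print(snake_order(result))
```

On this input, three row phases and two column phases leave the rows as [1, 2, 3, 4], [8, 7, 6, 5], [9, 10, 11, 12], [16, 15, 14, 13], i.e. 1 to 16 in snake order.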

### Echo

Echo creates a spanning tree out of any connected network. Here p denotes the process running the code; the initiator runs init-echo and every other process runs echo.

`init-echo() { N = { q | q is a neighbor of p }; for each q ∈ N send token to q; counter = 0; while (counter < |N|) { receive token; counter = counter + 1 }; terminate }`

`echo() { receive token from parent; N = { q | q is a neighbor of p } − { parent }; for each q ∈ N send token to q; counter = 0; while (counter < |N|) { receive token; counter = counter + 1 }; send token to parent; terminate }`

[Figure: example run of echo on a network of six nodes, each marked "fin" when it terminates.]

Theorem: init-echo + echo has time complexity O(diameter) and message complexity O(edges).

### Leader Election (for Ring Networks)

`init-election() { send ⟨token, p.ID⟩ to successor; min = p.ID; receive ⟨token, token-id⟩; while (p.ID != token-id) { if (token-id < min) min = token-id; send ⟨token, token-id⟩ to successor; receive ⟨token, token-id⟩ }; if (p.ID == min) i-am-the-leader = true else i-am-the-leader = false; terminate }`

`election() { i-am-the-leader = false; do { receive ⟨token, token-id⟩; send ⟨token, token-id⟩ to successor } while (true) }`

Theorem: init-election + election runs in n steps with message complexity O(n²).
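A sequential, event-driven simulation of init-echo and echo, assuming tokens are delivered one at a time in FIFO order (real networks may interleave differently, which changes the tree but not the message count). The first token a process receives determines its parent. The six-node graph `ADJ` is a made-up example, since the figure's edges are not recoverable, and `run_echo` is our name.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

def run_echo(adj: Dict[int, List[int]], initiator: int) -> Tuple[Dict[int, int], int]:
    """Returns (parent of each non-initiator, total tokens sent)."""
    parent: Dict[int, int] = {}
    pending: Dict[int, int] = {}            # tokens each process still waits for
    queue: Deque[Tuple[int, int]] = deque() # in-flight tokens as (sender, receiver)
    sent = 0

    def send(p: int, q: int) -> None:
        nonlocal sent
        queue.append((p, q))
        sent += 1

    # init-echo: the initiator sends a token to every neighbour.
    for q in adj[initiator]:
        send(initiator, q)
    pending[initiator] = len(adj[initiator])

    while queue:
        p, q = queue.popleft()              # q receives a token from p
        if q != initiator and q not in parent:
            # echo: first token, so p is the parent; forward to all other neighbours.
            parent[q] = p
            others = [r for r in adj[q] if r != p]
            for r in others:
                send(q, r)
            pending[q] = len(others)
        else:
            pending[q] -= 1
        if q != initiator and pending[q] == 0:
            send(q, parent[q])              # all answers in: echo to parent, terminate
    return parent, sent

ADJ = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 5], 4: [2, 5, 6], 5: [3, 4, 6], 6: [4, 5]}
parent, sent = run_echo(ADJ, 1)
print(parent)  # {2: 1, 3: 1, 4: 2, 5: 3, 6: 4}
print(sent)    # 16, i.e. 2 tokens per edge, 8 edges
```

Every edge carries exactly one token in each direction, which is where the O(edges) message bound comes from.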

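A synchronous simulation of the ring election, assuming distinct IDs and that every process is an initiator running init-election (the worst case for messages). Process k's successor is process (k + 1) mod n, and in each round every token moves one hop. The function name and the round-based timing model are our assumptions.

```python
from typing import List, Optional, Tuple

def ring_election(ids: List[int]) -> Tuple[List[bool], int, int]:
    """Returns (i-am-the-leader flag per process, rounds, messages sent)."""
    n = len(ids)
    minimum = list(ids)                       # each process's `min` variable
    leader = [False] * n
    done = [False] * n
    in_transit: List[Optional[int]] = list(ids)   # everyone first sends its own ID
    messages = n
    rounds = 0
    while not all(done):
        rounds += 1
        # Process k receives the token its predecessor k - 1 sent last round.
        received = [in_transit[(k - 1) % n] for k in range(n)]
        in_transit = [None] * n
        for k, token_id in enumerate(received):
            if token_id is None or done[k]:
                continue
            if token_id == ids[k]:            # own ID came back around: decide
                leader[k] = ids[k] == minimum[k]
                done[k] = True
            else:
                minimum[k] = min(minimum[k], token_id)
                in_transit[k] = token_id      # forward to successor
                messages += 1
    return leader, rounds, messages

print(ring_election([7, 3, 9, 1, 5]))  # ([False, False, False, True, False], 5, 25)
```

Each of the n tokens travels all n hops, so this run uses exactly n rounds and n² messages, matching the theorem.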