FPGA Soft Core Processors FPGA Soft Core Processors Conjoinment Overview “Conjoining” Conjoinment Background


Conjoining Soft-Core FPGA Processors

David Sheldon (a), Rakesh Kumar (b), Frank Vahid (a), Dean Tullsen (b), Roman Lysecky (c)
(a) Department of Computer Science and Engineering, University of California, Riverside. Also with the Center for Embedded Computer Systems at UC Irvine.
(b) Department of Computer Science and Engineering, University of California, San Diego
(c) Department of Electrical and Computer Engineering, University of Arizona

This work was supported in part by the National Science Foundation, the Semiconductor Research Corporation, and by hardware and software donations from Xilinx.

FPGA Soft Core Processors
A soft-core processor is an HDL description of a processor. Its implementation is flexible (FPGA or ASIC) and technology independent.
(Diagram: an HDL description targeting an FPGA, such as Spartan 3, Virtex 2, or Virtex 4, or an ASIC.)

FPGA Soft Core Processors
Soft-core processors can have configurable options: datapath units, cache, and bus architecture.
Current commercial FPGA soft-core processors: Xilinx MicroBlaze, Altera Nios.
(Diagram labels: µP, Cache, FPU, MAC.)


Conjoinment Overview
Add necessary units to both processors. (Diagram: two base microprocessors, each with its own FPU, running Application 1 and Application 2.)
"Conjoining": conjoin the FPU unit, so that the two processors share one FPU.

Conjoinment Background
Conjoinment was proposed for multicore desktop processing (Kumar 2004).
It reduces size with reasonable performance overhead, e.g., cache conjoinment overhead: 1%-13%.
(Diagram labels: ICache sharing, DCache sharing.)

Outline: conjoinment for soft-core FPGA processors; area savings; performance overhead; tuning heuristic for two configurable soft-cores with conjoin option.

Area Savings
Conjoinment offers significant potential area savings.
Limitations: the estimate does not consider multiplexing costs, due to the absence of FPGA synthesis tools supporting conjoinment. The good potential nevertheless justifies further investigation.
Unit sizes: Multiplier 1331, Barrel Shifter 228, Divider 122, FPU 2738.
(Figure labels: Base MicroBlaze, Multiplier, FPU. Percentages shown: 6%, 4%, 23%, 32%.)

Performance Overhead
No simulator exists for conjoined processors, so we developed our own trace-based conjoined processor simulator.
The simulation uses pessimistic performance assumptions; Kumar's techniques can improve on them.
The simulator outputs contention information. Final cycle counts can be compared to the unconjoined case to determine performance overhead.
(Diagram labels: app1 and app2 (brev, bitmnp), Xilinx simulator, conjoined simulator.)
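The comparison of conjoined and unconjoined cycle counts can be illustrated with a toy model. The sketch below is not the authors' simulator: the trace encoding ("C" for a one-cycle compute step, "U" for a use of the shared unit), the fixed priority of processor 0, and the two-cycle unit latency are all assumptions made only for illustration.

```python
def simulate(traces, unit_latency, shared):
    """Return the finishing cycle of each processor for the given op traces.

    Hypothetical trace format: a string of ops, "C" = 1-cycle compute,
    "U" = use of the hardware unit for `unit_latency` cycles.
    If `shared` is True, both processors contend for a single unit.
    """
    n = len(traces)
    pc = [0] * n            # index of the next op per processor
    busy_until = [0] * n    # cycle at which each processor is free again
    finish = [0] * n
    unit_free_at = 0        # only meaningful when the unit is shared
    cycle = 0
    while any(pc[i] < len(traces[i]) for i in range(n)):
        for i in range(n):  # fixed priority: lower index wins a tie (assumption)
            if pc[i] >= len(traces[i]) or busy_until[i] > cycle:
                continue
            if traces[i][pc[i]] == "U":
                if shared and unit_free_at > cycle:
                    continue            # stall: the shared unit is occupied
                if shared:
                    unit_free_at = cycle + unit_latency
                busy_until[i] = cycle + unit_latency
            else:
                busy_until[i] = cycle + 1
            pc[i] += 1
            finish[i] = busy_until[i]
        cycle += 1
    return finish


def overhead(unconjoined_cycles, conjoined_cycles):
    """Relative slowdown of the conjoined run versus the unconjoined run."""
    return (conjoined_cycles - unconjoined_cycles) / unconjoined_cycles


traces = ["CUC", "UCC"]                       # made-up traces, not benchmark data
base = simulate(traces, unit_latency=2, shared=False)
conj = simulate(traces, unit_latency=2, shared=True)
print(base, conj)                             # [4, 4] [5, 4]
print([overhead(b, c) for b, c in zip(base, conj)])   # [0.25, 0.0]
```

In this toy run, processor 0 stalls for one cycle while processor 1 holds the unit, which is the kind of contention delay the overhead figure captures.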

Performance Overhead
Speedup is reported as application time on the optimally configured processor relative to average application time on the base processor.
Configurations with conjoinment were compared against configurations without.
Performance overhead was usually small, averaging just 4.2%. (Chart values shown: 17% and 2.4%.)
Overhead is caused by access delays and contention for the hardware units.

Tuning Heuristic
There are 5 choices per unit, e.g., for the FPU: no unit, processor 1 only, processor 2 only, both 1 and 2, and conjoined.
With 4 units, that gives 5^4 = 625 possible configurations.
Simulation takes ~30 minutes per configuration, so a search heuristic is needed for tuning.
(Diagram: Base MicroBlaze 1 and Base MicroBlaze 2 with the options no FPU, FPU 1, FPU 2, and conjoined FPU.)
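The size of the search space follows directly from the counts above. The short sketch below enumerates it; the unit and choice labels are illustrative names, and the total assumes every configuration is simulated once at the quoted ~30 minutes.

```python
from itertools import product

# Labels are illustrative; the slides name four configurable units
# and five placement choices per unit.
UNITS = ["multiplier", "barrel_shifter", "divider", "fpu"]
CHOICES = ["none", "proc1_only", "proc2_only", "both", "conjoined"]

def all_configurations():
    """Enumerate every assignment of one choice to each unit."""
    return [dict(zip(UNITS, combo)) for combo in product(CHOICES, repeat=len(UNITS))]

configs = all_configurations()
MINUTES_PER_SIMULATION = 30          # approximate figure from the slides
total_hours = len(configs) * MINUTES_PER_SIMULATION / 60

print(len(configs))    # 625
print(total_hours)     # 312.5
```

At roughly 312 hours (about 13 days) for one exhaustive sweep, simulating every configuration for every application pair is impractical, which motivates the heuristic.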

Map to 0-1 Knapsack Problem
Creating the model. (Table columns: unit (BS, FPU, MUL, DIV), perf increment, size increment, perf/size. Values as shown: 1.1, 0.9, 1.2, 1.0; 1.4, 2.7, 1.8, 1.1; 0.96, 0.34, 0.63, 0.93.)

Map to 0-1 Knapsack Problem
First consider tuning without conjoinment. The problem of instantiating units within a limited FPGA size can be mapped to the 0-1 knapsack problem: add items, each with a weight and a benefit, to a weight-constrained knapsack such that profit is maximized.
Knapsack: the FPGA area available beyond the two base MicroBlaze processors. Items: the optional units of each processor (e.g., MUL 1, FPU 1, MUL 2, FPU 2). Weights: unit sizes. Benefits: performance gains.
Values shown: weights 1331, 228, 121, 2738 for each processor's four units; benefits 0.08, 0.62, 0.00, 0.00 for processor 1 and 0.22, 0.76, 0.00, 0.00 for processor 2.
Note: the mapping is inexact, because weights and benefits are not strictly additive.

Disjunctively Constrained Knapsack
Problem: if a conjoined unit is included, the corresponding standalone unit can't also be included.
Solution: map to the disjunctively constrained 0-1 knapsack problem (Yamada T., "Heuristic and Exact Algorithms for the Disjunctively Constrained Knapsack Problem", 2002), which prohibits specific item pairs from being in the knapsack together. An ILP solution exists; its running time is pseudo-polynomial.
Items now include conjoined units (MUL C, FPU C, and so on) alongside the per-processor units (MUL 1, FPU 1, MUL 2, FPU 2).
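To make the disjunctive constraint concrete, here is a minimal sketch of one possible heuristic: greedy selection by benefit-to-size ratio that skips any item conflicting with one already chosen. The slides do not specify the heuristic actually used, so the greedy rule is an assumption, and the item names, sizes, benefits, and capacity below are illustrative placeholders rather than measured data. A conjoined item's benefit is assumed to be the sum of its (slightly reduced) benefits to both processors.

```python
from typing import NamedTuple

class Item(NamedTuple):
    name: str
    size: int        # knapsack weight (FPGA area)
    benefit: float   # knapsack profit (performance gain)

def greedy_disjunctive_knapsack(items, conflicts, capacity):
    """Pick items by benefit/size ratio, honoring capacity and conflict pairs.

    `conflicts` is a list of (name_a, name_b) pairs that may not both be chosen.
    Returns (chosen_items, used_capacity). A heuristic: not guaranteed optimal.
    """
    forbidden = {}
    for a, b in conflicts:
        forbidden.setdefault(a, set()).add(b)
        forbidden.setdefault(b, set()).add(a)

    chosen, used = [], 0
    ranked = sorted(items, key=lambda it: it.benefit / it.size, reverse=True)
    for it in ranked:
        if it.benefit <= 0:
            continue                      # a useless unit only costs area
        if used + it.size > capacity:
            continue                      # does not fit in the remaining area
        if any(c.name in forbidden.get(it.name, ()) for c in chosen):
            continue                      # conflicts with an already chosen item
        chosen.append(it)
        used += it.size
    return chosen, used

# Illustrative placeholder values (hypothetical, not the measured data).
items = [
    Item("FPU_1", 2738, 0.62), Item("FPU_2", 2738, 0.76), Item("FPU_C", 2738, 1.25),
    Item("MUL_1", 1331, 0.08), Item("MUL_2", 1331, 0.22), Item("MUL_C", 1331, 0.27),
]
conflicts = [("FPU_C", "FPU_1"), ("FPU_C", "FPU_2"),
             ("MUL_C", "MUL_1"), ("MUL_C", "MUL_2")]

chosen, used = greedy_disjunctive_knapsack(items, conflicts, capacity=4500)
print([it.name for it in chosen], used)   # ['FPU_C', 'MUL_C'] 4069
```

Because a conjoined unit serves both processors for roughly the area of one unit, it tends to rank first by ratio, and the conflict pairs then exclude the standalone copies even when spare capacity remains.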

Disjunctively Constrained Knapsack
Conjoined benefits show a small decrease from the benefits of the unconjoined units.
Unconjoined units: weights 1331, 228, 121, 2738 per processor; benefits 0.08, 0.62, 0, 0 (processor 1) and 0.22, 0.76, 0, 0 (processor 2).
Conjoined units: weights 1331, 228, 121, 2738; benefits 1: 0.06, 0.54, 0, 0; benefits 2: 0.21, 0.71, 0, 0.
Conjoined units provide benefits to both processors.

Disjunctively Constrained Knapsack Running Time
Modeling: 5 synthesis runs for each processor, and at most 4 runs of the conjoined simulator.
The disjunctively constrained 0-1 knapsack is an NP-complete problem, so it is solved with a heuristic. The heuristic takes < 1 min.

Results
Data were gathered for the Xilinx MicroBlaze soft-core processor.
10 EEMBC and Powerstone benchmarks: aifir, BaseFP01, bitmnp, brev, canrdr, g3fax, g721-ps, idct, matmul, tblook, ttsprk.
Results were obtained for all possible pairwise conjoinments.
We only show conjoinment data when both applications use the unit, to avoid making conjoinment appear better than it is.

Results
The knapsack heuristic finds near-optimal configurations in most cases (versus exhaustive search with conjoinment) and runs in seconds.
One example had sub-optimal results (2.9 times slower).
Performance overhead due to conjoinment is just a few percent on average.

Results
On average, the knapsack approach yields the same size as the exhaustive search with conjoinment.
Average size savings are 16%.

Conclusions
Conjoining two soft-core FPGA processors reduces average size by 16%.
Performance overhead is just a few percent in most cases.
The disjunctively constrained 0-1 knapsack approach finds near-optimal configurations in most cases, but could be improved for some examples.
Future work: consider multiplexing size and delay overheads, and apply Kumar's advanced conjoining techniques to reduce overheads.
