OCV-Aware Top-Level Clock Tree Optimization
Tuck-Boon Chan, Kwangsoo Han, Andrew B. Kahng, Jae-Gon Lee and Siddhartha Nath
VLSI CAD Laboratory, UC San Diego

Outline
- Motivation and Previous Work
- Our Approach
- Experimental Setup
- Results and Conclusions

Clock Tree Synthesis Is Challenging!
- Complex timing constraints across process, voltage, temperature (PVT) and operating scenarios
- On-chip variation (OCV) -> more design margin
- Clock tree consumes up to 40% of power -> aggressive power reduction -> complex clock tree with clock logic cells (CLCs) such as clock-gating cells, dividers and MUXes
Top-Level Clock Tree Problems
- The top-level clock tree comprises all transitive fanins to CLCs, starting from a clock root pin
- Trees below the CLCs are the bottom-level trees
- Industry tools do not always optimize the top-level clock trees
- This results in large skews under multi-corner multi-mode (MCMM) scenarios
[Figure: CTS with long non-common paths]

Top-Level Clock Tree Optimization
- Optimizing the top-level clock tree involves handling of complex clock logic cells
- The optimization involves
  - CLC placement
  - Buffer insertion
  - Minimizing non-common paths
  - Balancing the tree based on timing information (WNS, TNS across setup and hold corners)
[Figures: CTS with long non-common paths; CTS with reduced non-common paths]

Previous Works
- Rajaram and Pan (2011)
  - Reduce non-common path delay by reallocating clock pin locations of soft-IP blocks
  - Insert buffers to minimize the difference in clock latency among subtrees across PVT corners
  - Do not consider CLCs, timing between sink groups, or wirelength
- Tsai (2005), Velenis et al. (2003)
  - Minimize the effect of OCV during CTS but do not handle CLCs or MCMM scenarios
- Lung et al. (2010)
  - Optimize clock skew using LP and account for delay variation across PVT corners
  - Ignore non-common paths and CLC placement
Our Work
- Current CTS tools
  - Balance bottom-level clock trees
  - Optimize CLC placement
  - Multi-corner multi-mode (MCMM) optimization
- Our method
  - Focus on the top-level clock tree
  - Simultaneously optimize CLC placement and balance the clock tree across multiple corners and modes
  - Extract timing constraints from bottom-level clock trees -> capture accurate MCMM constraints

LP-Based Optimization
- Objective: a weighted sum of
  - worst negative slack (WNS)
  - total negative slack (TNS)
  - non-common paths
  - wirelength of the clock tree
- Variables: CLC locations and net delays
- Model the delay from pin i to pin j as a linear function of Manhattan distance
  - Captures the impact of CLC placement
- Extract insertion and timing constraints from bottom-level clock trees to estimate slacks of critical paths
- Delays across different PVT corners are normalized to a reference corner for MCMM optimization
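The modeling ingredients listed under LP-Based Optimization can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' formulation: the names `Pin`, `net_delay`, `normalize_to_reference` and `objective`, the per-micron delay coefficient, the per-corner scale factor, the weights, and the sign convention (negative slack means a violation, and the weighted sum is to be minimized).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pin:
    """A pin location in microns (hypothetical units)."""
    x: float
    y: float


def manhattan(a: Pin, b: Pin) -> float:
    """Manhattan distance between two pins."""
    return abs(a.x - b.x) + abs(a.y - b.y)


def net_delay(a: Pin, b: Pin, delay_per_um: float, intercept: float = 0.0) -> float:
    """Linear delay model: delay from pin a to pin b grows linearly with
    Manhattan distance. delay_per_um and intercept are assumed fit parameters."""
    return intercept + delay_per_um * manhattan(a, b)


def normalize_to_reference(delay: float, corner_scale: float) -> float:
    """Map a delay observed at some PVT corner to the reference corner.
    Assumes corner_scale = (delay at this corner) / (delay at the reference)."""
    return delay / corner_scale


def objective(wns: float, tns: float, non_common: float, wirelength: float,
              weights: tuple) -> float:
    """Weighted sum of the four terms named on the slide. Slack terms are
    negated (an assumption) so that a smaller value is better."""
    w_wns, w_tns, w_ncp, w_wl = weights
    return w_wns * (-wns) + w_tns * (-tns) + w_ncp * non_common + w_wl * wirelength
```

For instance, `net_delay(Pin(0, 0), Pin(100, 50), 0.002)` evaluates a 150 um Manhattan distance at an assumed 0.002 ns/um. Because the delay is linear in the distance, moving a CLC changes the net delays in a way an LP can express directly.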
Example
- tp are the terminal pins
- d(i,j): delay from pin i to pin j
- d(1,2) = 2 ns, d(1,3) = 0.5 ns, d(3,4) = 0.5 ns, d(4,5) = 1 ns
- Critical path delay = 3 ns
- Example: making d(1,2) = 4 ns improves timing
[Figure: root, CLC and terminal pins t1 to t5 across the top level and bottom level, with sink groups 1, 2 and 3; other labels in the figure: 1 ns, 3 ns]

Our Heuristics
- To implement our optimization in an industrial CTS flow, we implement three heuristics
  - Algorithm 1: Extract top-level clock tree
  - Algorithm 2: Create Steiner points
  - Algorithm 3: Insert buffers

Extract Top-Level Clock Tree
- Inputs
  - Initial clock tree; cells in the tree are vertices and connections between them are edges
  - List of vertices that belong to CLCs
- Algorithm description
  - Obtain transitive fanins of all CLCs
  - Remove clock routes to the fanin cells
  - Remove buffers and reconnect nets accordingly
- Output
  - List of top-level clock cells and connections between them
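The steps of Algorithm 1 can be sketched as a small graph routine. This is a sketch under stated assumptions, not the authors' Tcl implementation: the clock tree is given as a hypothetical `fanin` dictionary mapping each cell to its driver cells, buffers have exactly one driver, CLCs and buffers are disjoint sets, and the removal of clock routes is not modeled because only connectivity is tracked.

```python
def extract_top_level(fanin, clcs, buffers):
    """Return (cells, edges) of the top-level clock tree.

    fanin:   dict mapping a cell name to the list of cells driving it
    clcs:    set of cell names that are clock logic cells
    buffers: set of cell names that are buffers (assumed single-input)
    """
    # Step 1: collect the transitive fanin of every CLC (plus the CLCs).
    top = set(clcs)
    stack = list(clcs)
    while stack:
        cell = stack.pop()
        for driver in fanin.get(cell, []):
            if driver not in top:
                top.add(driver)
                stack.append(driver)

    def upstream_non_buffer(cell):
        """Walk upstream through buffers to the first non-buffer driver."""
        while cell in buffers:
            drivers = fanin.get(cell, [])
            if not drivers:
                return None  # dangling buffer, nothing to reconnect to
            cell = drivers[0]
        return cell

    # Step 2: drop buffers and reconnect each cell to its real driver.
    cells = {c for c in top if c not in buffers}
    edges = set()
    for cell in cells:
        for driver in fanin.get(cell, []):
            source = upstream_non_buffer(driver)
            if source is not None:
                edges.add((source, cell))
    return cells, edges


# Hypothetical example: root drives a MUX through b1 and a divider through b2;
# the divider also feeds the MUX, and the MUX drives a flip-flop ff1.
example_fanin = {
    "b1": ["root"], "b2": ["root"],
    "div": ["b2"], "mux": ["b1", "div"],
    "ff1": ["mux"],
}
cells, edges = extract_top_level(example_fanin, {"mux", "div"}, {"b1", "b2"})
```

In this example `ff1` is left out because it lies below the CLCs, and the two buffers are replaced by direct connections from `root`.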
[Figure: Output of Algorithm 1, showing CLCs driving FF group 1 and FF group 2]

Create Steiner Points
- Inputs
  - Top-level clock tree
  - List of vertices that belong to CLCs
- Algorithm description
  - Find the pin pair that minimizes the sum of the difference in sink latency and the delay due to Manhattan distance
  - Merge the pin pair with the minimum sum by inserting a new Steiner point
  - Repeat until all driving pins have a single connection
- Output
  - A binary top-level clock tree and the connections within it
[Figure: Output of Algorithm 2, labeled j1.L = j2.L = j3.L j4.L]
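The greedy pairing in Algorithm 2 can be sketched as follows for the fanout pins of one driving pin. Several details are assumptions for illustration, since the slides do not state them: the Steiner point is placed at the midpoint of the merged pair, it inherits the larger of the two latencies, Steiner points are named `s1`, `s2`, ... (so input pins must not use those names), and `delay_per_um` converts Manhattan distance to delay.

```python
from itertools import combinations


def pair_cost(nodes, a, b, delay_per_um):
    """Latency difference plus the delay implied by Manhattan distance."""
    xa, ya, la = nodes[a]
    xb, yb, lb = nodes[b]
    return abs(la - lb) + delay_per_um * (abs(xa - xb) + abs(ya - yb))


def build_binary_tree(pins, delay_per_um):
    """Greedily merge fanout pins into a binary tree.

    pins: dict mapping pin name -> (x, y, sink latency)
    Returns (root, children) where children maps each Steiner point
    to the pair of nodes merged under it.
    """
    nodes = dict(pins)
    children = {}
    count = 0
    while len(nodes) > 1:
        # Pick the pair with the minimum combined cost.
        a, b = min(combinations(sorted(nodes), 2),
                   key=lambda p: pair_cost(nodes, p[0], p[1], delay_per_um))
        xa, ya, la = nodes.pop(a)
        xb, yb, lb = nodes.pop(b)
        count += 1
        steiner = f"s{count}"
        # Assumed placement and latency of the new Steiner point.
        nodes[steiner] = ((xa + xb) / 2, (ya + yb) / 2, max(la, lb))
        children[steiner] = (a, b)
    (root,) = nodes  # exactly one node remains
    return root, children


# Hypothetical pins: a and b are close and have equal latency, c is far away.
root, children = build_binary_tree(
    {"a": (0, 0, 1.0), "b": (10, 0, 1.0), "c": (100, 0, 3.0)},
    delay_per_um=0.01,
)
```

Here `a` and `b` are merged first because their latencies match and they are close together; `c` joins at the second Steiner point, so the driver ends with a single connection to the root of a binary tree.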
Insert Buffers
- Inputs
  - Two-pin nets of the top-level clock tree
  - Required delay of each net
- Algorithm
  - Calculate the number of buffers required to meet the delay target as a function of net and buffer delays
  - Calculate the minimum wirelength required to insert that number of buffers
  - Determine whether to insert in an L-shape or U-shape manner
- Output
  - Two-pin nets of the top-level clock tree with buffers inserted

CTS Testcase Requirements
- Realistic, resembling clock trees typically seen in SoC blocks
- Include CLCs and top-level hierarchies
- Combinational logic and critical paths across sink groups
- Multiple clock roots and generated clocks
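Returning to the Insert Buffers heuristic (Algorithm 3), its three calculations can be sketched for a single two-pin net. The slides do not give the formulas, so this is one plausible reading with loudly assumed parameters: each buffer adds a fixed `buffer_delay`, wire delay is linear in length, consecutive buffers need at least `min_spacing` of wire, and a U-shape detour is chosen only when the direct Manhattan route is too short to hold the buffers. The extra wire delay of a detour is ignored here.

```python
import math


def plan_buffers(required_delay, distance, buffer_delay,
                 wire_delay_per_um, min_spacing):
    """Plan buffering for one two-pin net.

    required_delay:    delay target for the net (ns)
    distance:          Manhattan distance between the two pins (um)
    buffer_delay:      assumed fixed delay of one buffer (ns)
    wire_delay_per_um: assumed linear wire delay coefficient (ns/um)
    min_spacing:       assumed minimum wire per inserted buffer (um)
    Returns (number of buffers, minimum wirelength, "L" or "U").
    """
    # Delay the wire contributes on its own along the direct route.
    wire_delay = wire_delay_per_um * distance
    # Buffers must cover whatever delay the wire does not provide.
    num_buffers = max(0, math.ceil((required_delay - wire_delay) / buffer_delay))
    # Minimum wirelength needed to place that many buffers.
    min_wirelength = num_buffers * min_spacing
    # Direct (L-shape) route if it is long enough, otherwise detour (U-shape).
    shape = "L" if min_wirelength <= distance else "U"
    return num_buffers, min_wirelength, shape


# Hypothetical numbers: 1.0 ns target, 0.25 ns per buffer, 0.001 ns/um wire.
long_net = plan_buffers(1.0, 200, 0.25, 0.001, 40)
short_net = plan_buffers(1.0, 100, 0.25, 0.001, 40)
```

With these numbers both nets need four buffers and 160 um of wire; the 200 um net fits them on an L-shape route, while the 100 um net needs a U-shape detour.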
Our CTS Testcases
- We develop generators for high-speed CTS testcases typically found in CPU/GPU blocks in modern SoCs
- Implement clock roots that are outputs of PLLs as well as crystal oscillators
- Implement different types of CLCs
  - Glitch-free clock MUX
  - Dividers
  - Clock-gating cells
- Multiple generated clocks for debug, tracing, IO, peripherals

Examples of CTS Testcases
- Clocks to all sink groups are generated clocks
- Top-level has up to two levels of hierarchy
- Reconvergent paths
- Top-level has up to two levels of hierarchy

Experimental Setup
- Six high-speed testcases
- P&R tool is an industry tool
- CTS uses MCMM scenarios
- Timing analysis tool is Synopsys PrimeTime
- LP solver is CPLEX
- Flow implemented in Tcl
Operating Conditions

Our Optimization Flow
[Figure: flowchart comparing the reference CTS flow with our optimization flow. Boxes: Placed design; CTS; Initial clock tree; Remove buffers from top-level tree; CLC placement & buffer insertion; Placement legalization; Route top-level clock; Post-CTS opt; Routing + optimization; DRC & timing fix; Compare post-route metrics]
Results: Improved Timing
- Our formulation focuses on minimizing setup WNS
- Improved setup WNS by up to 320 ps
- Hold WNS is worsened but < 70 ps

Results: Improved WL, Power

Conclusions
- Industry tools do not always optimize the top-level clock tree
- We develop an optimization formulation for the top-level tree and solve it using three heuristics
- We develop realistic high-speed CTS testcases typical of clock trees in CPU/GPU blocks
- Our optimization flow improves setup WNS by up to 320 ps, wirelength by up to 51% and dynamic power by up to 28%
- Ongoing works include
  - Handling obstacles
  - Accounting for optimal buffering solutions
  - Creating testcases for other important SoC elements
  - Joint optimization of the top- and bottom-level trees

Thank You