RTL Synthesis and Other Backend Interview Questions (with Answers)


Q1: How would you speed up an ASIC design project using parallel computing? Which design stages can be distributed for parallel computing, which cannot, and what procedures are needed to support parallel computing?
Ans: It is essential to mention the following steps in parallel computing:
1. Partitioning the design
2. Distributing partitioned tasks among multiple CPUs
3. Integrating the results


WHAT STAGES: The following answers are acceptable; others may be accepted with a reasonable explanation of why parallel computing can or cannot be used in a particular stage of the flow.
Can use parallel computing:
- Synthesis after partitioning
- Placement (hierarchical design)
- Detailed routing
- DRC
- Functional verification
- Timing Analysis (partition the timing graph)
Cannot use parallel computing:
- Synthesis before partitioning
- Floorplanning
- Flat Placement
- Global Routing
CONSTRAINTS: Mentioning that care must be taken to make sure that partition boundaries are consistent when integrating the results back together.
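For illustration, here is a minimal Python sketch of the partition / distribute / integrate pattern for a stage that parallelizes well (per-block DRC in this example). The run_drc_on_block function and the block names are hypothetical placeholders, not any real tool API:

# Sketch of the partition -> distribute -> integrate pattern for a
# stage that parallelizes well (e.g., per-block DRC). The checker
# function and block names are hypothetical placeholders.
from concurrent.futures import ProcessPoolExecutor

def run_drc_on_block(block_name):
    """Placeholder: run DRC on one partition and return its violations."""
    # In a real flow this would invoke the signoff tool on the block.
    return [block_name + ": example_violation"]

def parallel_drc(blocks, max_workers=4):
    # Distribute the partitioned tasks among multiple CPUs ...
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        per_block = pool.map(run_drc_on_block, blocks)
    # ... then integrate the results. Checks that cross partition
    # boundaries must still be run on the interfaces (not shown here).
    return [v for block_result in per_block for v in block_result]

if __name__ == "__main__":
    print(parallel_drc(["cpu_core", "cache", "io_ring"]))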

Q2: What kinds of timing violations are in a typical timing analysis report? Explain!
Ans: Acceptable answers...
- Setup time violations
- Hold time violations
- Minimum delay
- Maximum delay
- Slack
- External delay
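To make the first two items concrete, a back-of-the-envelope setup/hold slack calculation in Python (all numbers are invented for illustration):

# Toy setup/hold slack calculation for a flop-to-flop path.
# Every number here is illustrative, not from any real library.
T_clk      = 2.0   # clock period (ns)
t_clk2q    = 0.20  # launch flop clock-to-Q delay
t_comb_max = 1.50  # longest combinational path delay
t_comb_min = 0.30  # shortest combinational path delay
t_setup    = 0.15  # capture flop setup requirement
t_hold     = 0.10  # capture flop hold requirement
skew       = 0.05  # capture clock arrives this much later than launch

# Setup check: data launched now must arrive before the next capture edge.
setup_slack = (T_clk + skew - t_setup) - (t_clk2q + t_comb_max)

# Hold check: data launched now must not race past the capture of the same edge.
hold_slack = (t_clk2q + t_comb_min) - (skew + t_hold)

print("setup slack = %+.2f ns" % setup_slack)  # negative => setup violation
print("hold  slack = %+.2f ns" % hold_slack)   # negative => hold violation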

Q3: List the possible techniques to fix a timing violation.
Ans: Acceptable answers...
- Buffering
Buffers are inserted in the design to drive a load that is too large for a logic cell to drive efficiently. If a net is too long, it is broken and buffers are inserted to improve the transition time, which improves the timing on the data path and reduces setup violations. To reduce hold violations, buffers are inserted to add delay on data paths.
- Mapping - Mapping converts primitive logic cells found in a netlist to technology-specific logic gates found in the library on the timing-critical paths.
- Unmapping - Unmapping converts the technology-specific logic gates in the netlist to primitive logic gates on the timing critical paths.
- Pin swapping - Pin swapping optimization examines the slacks on the input pins of gates on the worst timing paths and optimizes timing by swapping the nets attached to those input pins, so that the net with the least slack is put on the fastest path through the gate, without changing the logic function.
- Wire sizing
- Transistor (cell) sizing - Cell sizing is the process of assigning a drive strength available in the library to a specific cell instance in the design. If a low-drive-strength cell sits on a timing-critical path, it is replaced by a higher-drive-strength cell to reduce the timing violation.
- Re-routing
- Placement updates
- Re-synthesis (logic transformations)

- Cloning - Cell cloning is an optimization that reduces the load on a very heavily loaded cell by replicating it: an identical cell is connected to the same inputs as the original, and the fanout load is divided between the two copies to improve timing.
- Taking advantage of useful skew
- Logic re-structuring/transformation (with re-synthesis) - Rearrange logic on the critical paths of the design to meet timing constraints
- Making sure we don't have false violations (false path, etc.)
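As a rough illustration of the buffering and cell-sizing items above, here is a toy Python model of one stage on a critical path. The delay model (intrinsic delay plus drive resistance times load) and the X1/X2 cell parameters are invented for the sketch; real flows use library timing tables:

# Toy linear delay model: stage delay ~ intrinsic + drive_resistance * load_cap.
# Upsizing halves the drive resistance but raises the cell's own input
# capacitance, which loads the previous stage -- the classic sizing trade-off.
def stage_delay(intrinsic, r_drive, c_load):
    return intrinsic + r_drive * c_load

# Hypothetical X1 and X2 drive strengths of the same logic function.
X1 = dict(intrinsic=0.05, r_drive=2.0, c_in=0.010)
X2 = dict(intrinsic=0.05, r_drive=1.0, c_in=0.020)

c_wire_plus_fanout = 0.10  # load seen by the cell being sized
r_prev_stage       = 1.5   # drive resistance of the previous stage

def path_delay(cell):
    d_prev = stage_delay(0.05, r_prev_stage, cell["c_in"])  # previous stage drives this cell's input
    d_cell = stage_delay(cell["intrinsic"], cell["r_drive"], c_wire_plus_fanout)
    return d_prev + d_cell

print("X1 path delay:", path_delay(X1))  # 0.315
print("X2 path delay:", path_delay(X2))  # 0.230 -- upsizing wins because the load dominates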

Q4: Give the linear time computation scheme for Elmore delay in an RC interconnect tree.
Ans: The following is acceptable...
- Elmore delay formula:
T(s,t) = sum over all nodes i on the path from source s to sink t of R_i * C_i, where C_i is the total capacitance in the subtree rooted at node i; equivalently, the sum over all nodes of the node capacitance times the resistance shared between the path of interest and the path to that node.
- Explaining the terms in the formula
- Mentioning something that shows that it can be done in linear time ("lumped" or "shared" resistances, "recursive" calculations, etc.)

Q5: Given a unit wire resistance "r" and a unit wire capacitance "c", a wire segment of length "l" and width "w" has resistance "rl/w" and capacitance "cwl". Can we reduce the Elmore delay by changing the width of a wire segment? Explain your answer.
Ans: You needed to mention that the delay of a segment driving its own capacitance is independent of a uniform change in width, because the "w" terms cancel (R*C = (rl/w)(cwl) = rcl^2). However, by scaling different segments by different amounts (e.g., wider segments near the root and narrower segments near the leaves), the Elmore delay can be reduced, since a segment's resistance also multiplies all of the capacitance downstream of it.
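A quick numerical check of both points, for a two-segment wire with unit r = c = 1, segment length 1, and no load at the leaf (all numbers chosen only for illustration):

# Two-segment wire driven at the root, no load at the leaf.
# Segment i: R_i = r*l/w_i, C_i = c*w_i*l; Elmore delay to the leaf:
#   T = R1*(C1 + C2) + R2*C2
r = c = l = 1.0

def elmore_two_segments(w1, w2):
    R1, C1 = r * l / w1, c * w1 * l
    R2, C2 = r * l / w2, c * w2 * l
    return R1 * (C1 + C2) + R2 * C2

print(elmore_two_segments(1.0, 1.0))  # 3.0
print(elmore_two_segments(2.0, 2.0))  # 3.0 -- uniform widening: the "w" terms cancel
print(elmore_two_segments(2.0, 1.0))  # 2.5 -- wider near the root, narrower near the leaf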

Q6: Extend the ZST-DME algorithm to embed a binary tree such that the Elmore delay from the root to each leaf of the tree is identical.
Ans: You needed to mention that a new procedure is needed for calculating the Elmore delay once candidate merging points are chosen, instead of using just the total downstream wire length. The merging segment becomes a set of points with equal Elmore delay rather than just equal path length. You could refer to the paper "Low-Cost Single-Layer Clock Trees With Exact Zero Elmore Delay Skew" by Andrew B. Kahng and Chung-Wen Albert Tsao.
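As a rough sketch of the modified merging step, the snippet below balances the Elmore delays of two subtrees along a single straight merging edge by bisection (the real algorithm keeps an entire merging segment of feasible tap points and can use the closed-form quadratic; the numbers in the example call are invented):

# Extended-DME merging step: pick the tap point x along a merging edge so
# that the Elmore delays to the two subtrees' sinks are equal, instead of
# balancing path length. r, c are per-unit wire resistance/capacitance.
def elmore_merge_point(L, r, c, t_a, C_a, t_b, C_b, iters=60):
    """Return x in [0, L], the distance from subtree a's root to the tap
    point, or None if the skew cannot be balanced on this edge (wire
    elongation/snaking would then be needed)."""
    def imbalance(x):
        d_a = r * x * (c * x / 2 + C_a)              # added delay to a's sinks
        d_b = r * (L - x) * (c * (L - x) / 2 + C_b)  # added delay to b's sinks
        return (t_a + d_a) - (t_b + d_b)

    lo, hi = 0.0, L
    if imbalance(lo) > 0 or imbalance(hi) < 0:
        return None
    for _ in range(iters):                           # imbalance() is monotone in x
        mid = (lo + hi) / 2
        if imbalance(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Two subtrees with unequal downstream load and delay, merging edge of length 10:
print(elmore_merge_point(L=10.0, r=1.0, c=0.2, t_a=1.0, C_a=2.0, t_b=3.0, C_b=1.0))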

Q7: IPO (sometimes also referred to as "In-Place Optimization") tries to optimize the design timing by buffering long wires, resizing cells, restructuring logic etc.
Explain how these IPO steps affect the quality of the design in terms of area, congestion, timing slack.
(a) Why is this called "In-Place Optimization" ?
(b) Why are the two IPO steps different ?
(c) Why are both used ?

Ans: IPO optimizes timing by buffer insertion and cell resizing. Important steps performed in IPO include fixing setup time, hold time, and maximum transition time violations. The timing slack along all arcs is optimized by these operations. The increase in area and the change in timing slack depend upon the timing and IPO constraints.
(a) This step is called "In-Place Optimization" because IPO performs resizing and buffer insertion in place (between the existing cells in the rows). It does not perform placement optimization in this step.
(b) The first IPO step (IPO1) is performed after placement. It performs trial route --> extraction --> timing analysis to determine the quality of the placement, and setup and hold fixing is done according to the results of that timing analysis. The second IPO step (IPO2) is performed after clock tree synthesis (CTS). CTS inserts clock buffers to balance the skew among all flip-flops; IPO2 then optimizes the timing paths between flip-flops taking the actual clock skew into account.
(c) If the IPO2 step is not performed after CTS, the timing paths between flip-flops are not tuned for the actual clock skew. Even though NanoRoute performs timing optimization, that is mostly buffer insertion on long interconnect to fix setup and hold violations.
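For intuition on the long-wire buffering part of IPO, a toy calculation (all numbers invented) showing why splitting a long net with a buffer reduces its quadratic RC delay:

# A distributed RC wire has Elmore delay ~ r*c*L^2/2, i.e. quadratic in length.
# Splitting it with one buffer roughly halves the wire term, at the cost of
# the buffer's own delay, so it pays off once the wire is long enough.
r, c  = 0.5, 0.2   # per-unit wire resistance / capacitance (made up)
L     = 10.0       # wire length
t_buf = 0.8        # delay of the inserted buffer (made up)

unbuffered = r * c * L**2 / 2
buffered   = 2 * (r * c * (L / 2)**2 / 2) + t_buf  # two half-length wires + buffer

print("unbuffered: %.2f" % unbuffered)  # 5.00
print("buffered:   %.2f" % buffered)    # 3.30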

Q8: Clocking and Place-Route Flow. Consider the following steps:
- Clock sink placement
- Standard-cell global placement
- Standard-cell detailed placement
- Standard-cell ECO placement
- Clock buffer tree construction
- Global signal routing
- Detailed signal routing
- Bounded-skew (balanced) clock (sub)net routing
- Steiner clock (sub)net routing
- Clock sink useful skew scheduling (i.e., solving the linear program, etc.)
- Post-placement (global routing based) static timing analysis
- Post-detailed routing static timing analysis
(a) As a designer of a clock distribution flow for high-performance standard-cell-based ASICs, how would you order these steps? It is possible to use some steps more than once and others not at all (e.g., if they are subsumed by other steps).
(b) List the criteria used for assessing possible flows.
(c) What were the 3 next-best flows that you considered (describe as variants of your flow), and explain why you prefer your given answer.

Ans: (a) My basic flow:
(1) SC global placement
(2) post-placement STA
(3) clock sink useful-skew scheduling
(4) clock buffer tree construction that is useful-skew aware (cf. associative skew.)
(5) standard-cell ECO placement (to put the buffers into the layout)
(6) Steiner clock subnet routing at lower levels of the clock tree (following CTGen type paradigm)
(7) bounded-skew clock subnet routing at all higher levels of the clock tree, and as necessary even at lower levels, to enforce useful skews
(8) global signal routing
(9) detailed signal routing
(10) post-detailed routing STA
(b) Criteria:
(1) likelihood of convergence with maximum clock frequency
(2) minimization of CPU time (by maximizing incremental steps, minimizing "detailed" steps, and minimizing iterations)
(3) make a good trade-off between wiring-based skew control and wire cost (this suggests Steiner routing at lower levels, bounded-skew routing at higher levels).
[Comment 1. Criteria NOT addressed: power, insertion delay, variant flow for hierarchical clocking or gated clocking.
Comment 2: I do not know of any technology for clock sink placement that can separate this from placement of remaining standard cells. So, my flow does not invoke this step. I also don't want post-route ECOs.]
(c) Variants:
(1) introduce Step 11: loop over Steps 3-10 (not adopted because cost benefit ratio was not attractive, and because there is a trial placement + global routing to drive useful-skew scheduling, buffer tree construction and ECO placement);
(2) after Steps 1-4, re-place the entire netlist (global, detailed placement) and then skip Step 5 (not adopted because the benefits of avoiding ECO placement and leveraging a good clock skeleton were felt to be small: the buffer tree will largely reflect the netlist structure, and re-placing can destroy assumptions made in Steps 3-4);
(3) can iterate the first 5 steps essentially by iterating: clock sink placement, (ECO placement for legalization), (incremental) standard-cell (global + detailed) placement (not adopted because I feel that any objective for standalone clock sink placement would be very "fuzzy", e.g., based on sizes of intersections of fan-in/fan-out cones of sequentially adjacent FFs)
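As a sketch of the useful-skew scheduling in step (3) of the flow in part (a), the LP below assigns one skew per flop and maximizes the worst setup/hold margin at a fixed clock period. The flop count, path delays, and skew bound are invented placeholders; scipy's linprog is used as the solver:

# Useful-skew scheduling as a small LP (sketch). Variables: one skew s_i per
# flop plus a common margin m, maximized subject to setup and hold constraints
# on every flop-to-flop path.
import numpy as np
from scipy.optimize import linprog

T, t_setup, t_hold = 2.0, 0.15, 0.10
# (launch_ff, capture_ff, d_min, d_max) for each timing path -- made-up values
paths = [(0, 1, 0.3, 1.8), (1, 2, 0.2, 0.9), (2, 0, 0.4, 1.6)]
n = 3               # number of flops
skew_bound = 0.5    # |s_i| <= 0.5 ns

A_ub, b_ub = [], []
for i, j, d_min, d_max in paths:
    # setup on i -> j:  s_i - s_j + m <= T - d_max - t_setup
    row = np.zeros(n + 1); row[i], row[j], row[n] = 1.0, -1.0, 1.0
    A_ub.append(row); b_ub.append(T - d_max - t_setup)
    # hold on i -> j:   s_j - s_i + m <= d_min - t_hold
    row = np.zeros(n + 1); row[j], row[i], row[n] = 1.0, -1.0, 1.0
    A_ub.append(row); b_ub.append(d_min - t_hold)

c = np.zeros(n + 1); c[n] = -1.0        # maximize m == minimize -m
bounds = [(-skew_bound, skew_bound)] * n + [(None, None)]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
print("skews:", res.x[:n], "worst margin:", res.x[n])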

Q9: If we migrate to the next technology node and double the gate count of a design, how would you expect the size of the LEF and routed DEF files to change? Explain your reasoning.
Ans: The LEF file will remain roughly the same size (same richness of cell library, say, between 500 and 1200 masters), modulo possible changes in conventions (e.g., CTLF used to be a part of LEF) and possible additional library model semantics (e.g., adding power modeling into LEF). The DEF file should at least double: the components and nets will double, and if there is extra routing complexity (more complex geometries, or more segments per connection due to antenna rules or badly scaling router heuristics), the DEF could grow significantly faster.
