Showing posts with label Interview Questions. Show all posts

## Interview Question on Power Analysis

Your task is to do a power analysis for a circuit that sends out a one-clock-cycle pulse on the done signal once every 16 clock cycles (done is ’0’ for 15 clock cycles, then ’1’ for one cycle, then repeats with 15 cycles of ’0’ followed by a ’1’, etc.). You have been asked to consider three different types of counters: 1. Binary counter, 2. Gray-code counter, and 3. One-hot counter. (The table below shows the values from 0 to 15 for the different encoding schemes.) What is the relative amount of power consumption for the different options?

Your implementation technology is an FPGA where each cell has a programmable combinational circuit and a flip-flop. The combinational circuit has 4 inputs and 1 output. The capacitive load of the combinational circuit is twice that of the flip-flop.

1. You may neglect power associated with clocks.
2. You may assume that all counters:
(a) are implemented on the same fabrication process
(b) run at the same clock speed
(c) have negligible leakage and short-circuit currents

Encoding:

| Decimal | Gray | One-Hot | Binary |
|---|---|---|---|
| 0 | 0000 | 0000000000000001 | 0000 |
| 1 | 0001 | 0000000000000010 | 0001 |
| 2 | 0011 | 0000000000000100 | 0010 |
| 3 | 0010 | 0000000000001000 | 0011 |
| 4 | 0110 | 0000000000010000 | 0100 |
| 5 | 0111 | 0000000000100000 | 0101 |
| 6 | 0101 | 0000000001000000 | 0110 |
| 7 | 0100 | 0000000010000000 | 0111 |
| 8 | 1100 | 0000000100000000 | 1000 |
| 9 | 1101 | 0000001000000000 | 1001 |
| 10 | 1111 | 0000010000000000 | 1010 |
| 11 | 1110 | 0000100000000000 | 1011 |
| 12 | 1010 | 0001000000000000 | 1100 |
| 13 | 1011 | 0010000000000000 | 1101 |
| 14 | 1001 | 0100000000000000 | 1110 |
| 15 | 1000 | 1000000000000000 | 1111 |

This question is asked widely in interviews worldwide with varying levels of difficulty. Please start discussing.

Tip: Capacitance is dependent upon the number of signals, and whether a signal is combinational or a flop.
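
As a starting point for the discussion, one could count how many flip-flop outputs toggle over one 16-cycle period under each encoding. The sketch below does only that. The function names are illustrative, and it deliberately ignores the combinational next-state and done logic, which the question says carries twice the load of a flip-flop.

```python
# Count flip-flop output toggles over one full 16-state period.
# Assumption: only state-register outputs are counted; the combinational
# signals (next-state logic, done) would need to be weighted separately.

def binary(i):
    return i

def gray(i):
    # Reflected binary Gray code, matching the table above.
    return i ^ (i >> 1)

def one_hot(i):
    return 1 << i

def toggles_per_period(encode, states=16):
    total = 0
    for i in range(states):
        cur = encode(i)
        nxt = encode((i + 1) % states)  # include the wrap-around 15 -> 0
        total += bin(cur ^ nxt).count("1")
    return total

for name, enc in (("binary", binary), ("gray", gray), ("one-hot", one_hot)):
    print(name, toggles_per_period(enc))
```

Under these assumptions the register outputs toggle 30, 16 and 32 times per period for binary, Gray and one-hot respectively. A complete answer would still have to add the switched capacitance of the combinational cells for each encoding.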

## Clock network design

A clock network is usually formed by a top-level mesh/network and bottom-level Steiner minimum trees. The objectives of clock network design are (1) minimum or bounded skew, (2) minimum delay, and (3) bounded process variation. Can we compare different clock topologies? How can we evaluate the effectiveness of clock boosters and feedback loops?

## Clock skew variation estimation

Clock meshes are used in state-of-the-art designs to construct clock routing, in contrast to the clock trees of older designs. This shift in clock network construction methodology is motivated by the fact that meshes cope better with variability effects. How do you use SPICE simulations to measure the skew of tree routing versus grid clock routing while taking variability effects into consideration? Can you extend your study to non-tree routings, i.e., clock trees with added shortcuts? Can you also include a delay comparison between tree and non-tree structures?

## Impact of dummy fill on timing

How can you quantify the impact of dummy fill on post-layout timing? Dummy fill can be inserted into a layout using SOC Encounter or post-tape-out tools like Calibre/Assura. You should then extract the dummy fill using the Fire-n-Ice extractor and compare pre-fill and post-fill timing. Can you compare the impact of the filling approaches (grounded vs. floating)?

## The effect of whitespace and aspect ratio on wirelength and timing

Whitespace (empty space) is inserted in layouts in order to increase the routing resources of the chip. Have you ever studied the impact of whitespace (and aspect ratio) on timing and wirelength, say by increasing the whitespace from 0% to 100% and evaluating the impact on both? Can you predict what this will look like? For a 300 mm wafer, can you parameterize the relationship between the number of dies produced, timing, die aspect ratio, wirelength and whitespace?

## Investigation on timing analysis inaccuracies

Timing analysis inaccuracies due to crosstalk, multiple gate input switching, supply voltage variation, temperature, manufacturing variation, etc. are very common. How do you tackle them in real-life designs? How do you do it using PrimeTime (PT)?

## Distributions in statistical timing

How do you observe and highlight the impact of assumptions on gate-length variability distributions (if any) on final design timing distributions? How do you implement a simple statistical timer by (say) 500 Monte Carlo runs of STA (e.g., PrimeTime)? Assume independent gate delays. Assume gate-delay distributions and generate circuit delay distributions. Delays can be changed in the SDF file. Interconnect may be ignored. Can we try them for a few probability distributions (e.g., Gaussian, asymmetric Gamma, triangular)?
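
To make the Monte Carlo idea concrete without an STA tool, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the five-gate netlist, the 100 ps mean gate delay, the spread of each distribution and the function names. In a real flow each run would instead rewrite the delays in the SDF file and re-run the timer.

```python
import random
import statistics

# Hypothetical toy netlist: gate -> fan-in gates, listed in topological order.
NETLIST = {
    "g1": [],
    "g2": [],
    "g3": ["g1", "g2"],
    "g4": ["g3"],
    "g5": ["g2", "g3"],
}
OUTPUTS = ["g4", "g5"]

# Gate-delay samplers; each has (approximately) the requested mean.
SAMPLERS = {
    "gaussian": lambda rng, mean: max(0.0, rng.gauss(mean, 0.1 * mean)),
    "gamma": lambda rng, mean: rng.gammavariate(4.0, mean / 4.0),
    "triangular": lambda rng, mean: rng.triangular(0.8 * mean, 1.2 * mean, mean),
}

def circuit_delay(gate_delay):
    """Longest-path arrival time at any output (interconnect ignored)."""
    arrival = {}
    for gate, fanin in NETLIST.items():
        latest_input = max((arrival[f] for f in fanin), default=0.0)
        arrival[gate] = latest_input + gate_delay[gate]
    return max(arrival[o] for o in OUTPUTS)

def monte_carlo(dist, runs=500, mean=100.0, seed=0):
    """Draw independent gate delays `runs` times; return the circuit delays."""
    rng = random.Random(seed)
    sample = SAMPLERS[dist]
    return [circuit_delay({g: sample(rng, mean) for g in NETLIST})
            for _ in range(runs)]

for dist in SAMPLERS:
    delays = sorted(monte_carlo(dist))
    p95 = delays[int(0.95 * len(delays))]
    print(f"{dist:10s} mean={statistics.mean(delays):7.1f} ps  p95={p95:7.1f} ps")
```

Comparing the printed mean and 95th-percentile values across the three samplers is one simple way to see how the assumed gate-delay distribution shapes the circuit-delay distribution.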

## Effect of WLM and target frequency on performance

How do you quantify the effect of Wire Load Models (WLM) and target frequency on the post-routing timing results?

## Dynamic power supply

Power gating adds enabling signals to a power supply network; dynamic power supply management adjusts supply voltage according to data path criticality. You are asked to take a testcase and upgrade its power supply network to dynamic power supply. How can you verify the power reduction of your technique?

## Clock tree theory

Constructing a zero-skew clock tree can be formulated as constructing a path-length balanced tree (assuming path delay is proportional to path length), i.e., a tree with identical path length between the root and every leaf. The problem can be posed in a Euclidean plane, a rectilinear plane, or with other distance metrics. The computational complexity of this problem is open. Can you find an approximation algorithm for the problem which guarantees a given error bound?

## Statistical clock tree design

Clock skew is a function of process variation, i.e., the delay from the clock source to a leaf of the clock tree is a statistical function. A rule of thumb for minimum-process-variation clock tree design is to have balanced branches, i.e., identical buffers at identical distances from the clock source, and symmetric clock routing branches with identical capacitive loads. Can you have a more flexible clock tree design scheme, while maintaining a minimized/bounded clock skew from a statistical point of view?

## Randomized algorithm/approximation scheme for statistical timing analysis

Statistical timing analysis gives a distribution for signal delay at each node in a netlist. A Monte Carlo simulation can give discrete distribution functions. Can there be a randomized algorithm or approximation scheme for statistical timing analysis with guaranteed error bound?

## Clock driver input alignment

Modern clock networks include several drivers in which delays are affected by the timing of their input signal transitions. How do you find out the input alignment of clock network drivers which leads to worst case driver gate delays?

## Re-timing tackles long combinational logic paths

Re-timing reduces longest combinational logic paths by relocating some of the flip-flops, both logically and physically. How do you evaluate the effectiveness of re-timing, with existing tools and/or some of your own scripts? Comparing with useful clock skew is a plus.

##### 19 May 2010

I have listed below a set of common interview questions asked mainly in interviews related to physical design or backend activities in the ASIC or VLSI chip design process. Typically these interviews start with questions on the physical design (PD) flow and go on to deeper details.
* What is signal integrity? How does it affect timing?
* What is IR drop? How can it be avoided, and how does it affect timing?
* What is EM and what are its effects?
* What is floor plan and power plan?
* What are types of routing?
* What is a grid? Why do we need it, and what are the different types of grids?
* What is the core, and how will you decide the w/h ratio for the core?

* What is effective utilization and chip utilization?
* What is latency? What are its types?
* What is LEF?
* What is DEF?
* What are the steps involved in designing an optimal pad ring?
* What are the steps that you have done in the design flow?
* What are the issues in floor plan?
* How can you estimate the area of a block?
* What aspect ratio should be kept (or have you kept), and what is the utilization?
* How do you calculate core ring and stripe widths?
* What if a hot spot is found in some area of a block? How do you tackle it?
* If a hot spot remains even after adding stripes, what do you do?
* What is threshold voltage? How does it affect timing?
* What are the contents of lib, lef, and sdc files?
* What is meant by 9-track and 12-track standard cells?
* What is a scan chain? What if the scan chain is not detached and reordered? Is it compulsory?
* What are setup and hold? Why do they exist? What if setup and hold are violated?
* In a circuit, for a reg-to-reg path, Tclktoq is 50 ps, Tcombo is 50 ps, Tsetup is 50 ps, and Tskew is 100 ps. What is the maximum operating frequency?
* How do R and C values affect timing?
* How are ohm (R) and farad (C) related to second (T)?
* What is transition? What if the transition time is large?
* What is the difference between a normal buffer and a clock buffer?
* What is the antenna effect? How is it avoided?
* What is ESD?
* What is crosstalk? How can you avoid it?
* How does double spacing avoid crosstalk?
* What is the difference between HFN synthesis and CTS?
* What is a hold problem? How can you avoid it?
* For one iteration we have 0.5 ns of insertion delay and 0.1 skew, and for another iteration 0.29 ns of insertion delay and 0.25 skew, for the same circuit. Which one will you select? Why?
* What is a partial floor plan?
* What parameters (or aspects) differentiate chip-level design from block-level design?
* How do you place macros in a full chip design?
* Differentiate between a hierarchical design and a flat design.
* Which is more complicated when you have a 48 MHz and a 500 MHz clock design?
* Name a few tools which you have used for physical verification.
* What input files will you give for PrimeTime correlation?
* What algorithms are used while routing? Do they optimize wire length?
* How will you decide the pin locations in block-level design?
* If routing congestion exists between two macros, what will you do?
* How will you place the macros?
* How will you decide the die size?

* If a long metal line is connected to diffusion and poly, which one will be affected by the antenna problem?
* If the full chip design is routed with 7 metal layers, why are macros designed using 5LM instead of 7LM?
* In your project, what are the die size, number of metal layers, technology, foundry, and number of clocks?
* How many macros are in your design?
* What is the size of each macro and the standard cell count?
* How did you handle the clock in your design?
* What inputs are needed for your design?
* What does the SDC constraint file contain?
* How did you do power planning?
* How do you find total chip power?
* How do you calculate core ring width, macro ring width and strap or trunk width?
* How do you find the number of power pads and IO power pads?
* What timing-related problems did you face?
* How did you resolve the setup and hold problems?
* If 10000 or more problems come up in your design, what will you do?
* Which layer do you prefer for clock routing and why?
* If your design has a reset pin, will it affect the input pins, the output pins, or both?
* During power analysis, if you face an IR drop problem, how do you avoid it?
* Define the antenna problem. How did you resolve it?
* How do delays vary with different PVT conditions? Show the graph.
* Explain the flow of physical design and the inputs and outputs for each step in the flow.
* What is cell delay and net delay?
* What are delay models and what is the difference between them?
* What is a wire load model?
* What do SDC constraints contain?
* Why are higher metal layers preferred for Vdd and Vss?
* What is logic optimization? Give some methods of logic optimization.
* What is the significance of negative slack?
* How are the metal width and the number of straps calculated for power and ground?
* What is negative slack? How does it affect timing?
* What is track assignment?
* What is gridded and gridless routing?
* What is a macro and a standard cell?
* What is congestion?
* Is congestion related to placement or routing?
* What are clock trees?
* What are clock tree types?
* Which layer is used for clock routing and why?
* What is cloning and buffering?
* What are placement blockages?
* How do slow and fast transitions at inputs affect timing for gates?
* What is the antenna effect?
* What are DFM issues?
* What are .lib, LEF, DEF, and .tf?
* What is the difference between synthesis and simulation?
* What is metal density? What is the metal slotting rule?
* What are OPC and PSM?
* Why is the clock not synthesized in DC?
* What are high-Vt and low-Vt cells?
* What do corner cells contain?
* What is the difference between core filler cells and metal fillers?
* How do you decide the number of pads in chip-level design?
* What are tie-high and tie-low cells and where are they used?

## RTL synthesis and other backend Interview Questions (with answers)

Q1: How would you speed up an ASIC design project by parallel computing? Which design stages can be distributed for parallel computing, which cannot, and what procedures are needed for maintaining parallel computing?
Ans: Mentioning the following important steps in parallel computing is essential:
1. Partitioning the design
2. Distributing partitioned tasks among multiple CPUs
3. Integrating the results

WHAT STAGES: The following answers are acceptable. Others may be accepted if you gave a reasonable explanation of why you can or cannot use parallel computing in a particular stage of the flow.
Can use parallel computing:
- Synthesis after partitioning
- Placement (hierarchical design)
- Detailed routing
- DRC
- Functional verification
- Timing Analysis (partition the timing graph)
Cannot use parallel computing:
- Synthesis before partitioning
- Floorplanning
- Flat Placement
- Global Routing
CONSTRAINTS: Mentioning that care must be taken to make sure that partition boundaries are consistent when integrating the results back together.

Q2: What kinds of timing violations are in a typical timing analysis report? Explain!
- Setup time violations
- Hold time violations
- Minimum delay
- Maximum delay
- Slack
- External delay

Q3: List the possible techniques to fix a timing violation.
- Buffering - Buffers are inserted in the design to drive a load that is too large for a logic cell to drive efficiently. If a net is too long, the net is broken and buffers are inserted to improve the transition, which ultimately improves the timing on the data path and reduces the setup violation. To reduce hold violations, buffers are inserted to add delay on data paths.
- Mapping - Mapping converts primitive logic cells found in a netlist to technology-specific logic gates found in the library on the timing critical paths.
- Unmapping - Unmapping converts the technology-specific logic gates in the netlist to primitive logic gates on the timing critical paths.
- Pin swapping - Pin swapping optimization examines the slacks on the inputs of the gates on the worst timing paths and optimizes the timing by swapping the nets attached to the input pins, so that the net with the least amount of slack is put on the fastest path through the gate without changing the function of the logic.
- Wire sizing
- Transistor (cell) sizing - Cell sizing is the process of assigning a drive strength for a specific cell in the library to a cell instance in the design. If there is a low-drive-strength cell in the timing critical path, this cell is replaced by a higher-drive-strength cell to reduce the timing violation.
- Re-routing
- Re-synthesis (logic transformations)
- Cloning - Cell cloning is a method of optimization that decreases the load of a very heavily loaded cell by replicating the cell. Replication is done by connecting an identical cell to the same inputs as the original cell. Cloning divides the fanout load between the copies to improve the timing.
- Taking advantage of useful skew
- Logic re-structuring/transformation (with re-synthesis) - Rearrange logic to meet timing constraints on the critical paths of the design.
- Making sure we don't have false violations (false path, etc.)

Q4: Give the linear time computation scheme for Elmore delay in an RC interconnect tree.
Ans: The following is acceptable...
- Elmore delay formula
T = Sum over all nodes i in path (s,t) of Ri*Ci where Ci is the total capacitance in the sub tree rooted at node i, or alternatively, the sum over the capacitances at the nodes times the shared resistance between the path of interest and the path to the node.
- Explaining terms in formula
- Mentioning something that shows that it can be done in linear time ("lumped" or "shared" resistances, "recursive" calculations, etc.)
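
One way to sketch the linear-time scheme is with two passes over the tree: a bottom-up pass that accumulates the subtree capacitance Ci at each node, then a top-down pass that adds Ri*Ci along each root-to-node path. The array-based tree representation and the function name below are assumptions chosen for brevity.

```python
def elmore_delays(parent, R, C):
    """Elmore delay from the source to every node of an RC tree.

    parent[i] -- index of node i's parent; node 0 is the root (parent -1)
                 and nodes are numbered so that parent[i] < i
    R[i]      -- resistance of the edge parent[i] -> i
                 (R[0] is the driver resistance at the root)
    C[i]      -- capacitance lumped at node i
    """
    n = len(parent)

    # Pass 1 (bottom-up): total capacitance in the subtree rooted at each node.
    down = list(C)
    for i in range(n - 1, 0, -1):
        down[parent[i]] += down[i]

    # Pass 2 (top-down): delay(i) = delay(parent) + R[i] * down[i].
    delay = [0.0] * n
    delay[0] = R[0] * down[0]
    for i in range(1, n):
        delay[i] = delay[parent[i]] + R[i] * down[i]
    return delay

# Three-node chain: driver R=1, then edges of R=2 and R=3, 1 unit of C per node.
print(elmore_delays([-1, 0, 1], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
```

Each node is visited once per pass, so the total work is proportional to the number of nodes.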

Q5: Given a unit wire resistance "r" and a unit wire capacitance "c", a wire segment of length "l" and width "w" has resistance "rl/w" and capacitance "cwl". Can we reduce the Elmore delay by changing the width of a wire segment? Explain your answer.
Ans: You needed to mention that by scaling different segments by different amounts, you can reduce the delay (e.g., wider segments near the root and narrower segments near the leaves). If all segments are scaled by the same factor and only the wire's own resistance and capacitance are counted, the delay is independent of width because the "w" term cancels out.
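
A small numeric check can illustrate both halves of that answer. The sketch assumes a wire made of two equal-length segments in series, each lumped as a resistance followed by its capacitance, with no driver resistance or load. The function name and unit values are made up for illustration.

```python
def two_segment_delay(w1, w2, r=1.0, c=1.0, length=1.0):
    """Elmore delay to the far end of two series segments of widths w1, w2.

    Each segment is lumped as R = r*length/w followed by C = c*w*length.
    """
    r1, c1 = r * length / w1, c * w1 * length   # segment nearest the root
    r2, c2 = r * length / w2, c * w2 * length   # segment nearest the leaf
    # r1 charges both capacitances; r2 charges only the second one.
    return r1 * (c1 + c2) + r2 * c2

print(two_segment_delay(1.0, 1.0))  # uniform width
print(two_segment_delay(2.0, 2.0))  # uniformly doubled width: same delay
print(two_segment_delay(2.0, 1.0))  # wider near the root: smaller delay
```

In this model the delay works out to r*c*length^2 * (2 + w2/w1), so only the ratio of the widths matters: uniform scaling changes nothing, while tapering from wide to narrow reduces the delay.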

Q6: Extend the ZST-DME algorithm to embed a binary tree such that the Elmore delay from the root to each leaf of the tree is identical.
Ans: You needed to mention that a new procedure is needed for calculating the Elmore delay assuming that certain merging points are chosen, instead of just the total downstream wire-length. The merging segment becomes a set of points with equal Elmore delay instead of just equal path length. You could refer to the paper "Low-Cost Single-Layer Clock Trees With Exact Zero Elmore Delay Skew", Andrew B. Kahng and Chung-Wen Albert Tsao.
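
One ingredient of such an extension is finding the point on a connecting wire where the Elmore delays to the two subtrees are equal. The sketch below derives that balance point by equating the two delays for a distributed RC wire. The function names are illustrative, and it does not handle the case where the result falls outside [0, 1], which would call for wire elongation (detouring).

```python
def wire_delay(length, load, r, c):
    """Elmore delay of a distributed RC wire of `length` driving `load`."""
    return r * length * (c * length / 2.0 + load)

def tapping_point(t1, c1, t2, c2, length, r, c):
    """Fraction x of `length`, measured from subtree 1, that balances delay.

    t1, t2 -- existing Elmore delays inside subtree 1 and subtree 2
    c1, c2 -- total capacitances of subtree 1 and subtree 2
    r, c   -- wire resistance and capacitance per unit length
    """
    num = t2 - t1 + r * length * (c2 + c * length / 2.0)
    den = r * length * (c * length + c1 + c2)
    return num / den

x = tapping_point(t1=1.0, c1=2.0, t2=3.0, c2=1.0, length=4.0, r=0.5, c=0.25)
d1 = wire_delay(x * 4.0, 2.0, 0.5, 0.25) + 1.0
d2 = wire_delay((1.0 - x) * 4.0, 1.0, 0.5, 0.25) + 3.0
print(x, d1, d2)  # the two delays should agree
```

Replacing the equal-path-length rule with this equal-delay rule at every merge is the kind of change the answer above is asking for.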

Q7: IPO (sometimes also referred to as "In-Place Optimization") tries to optimize the design timing by buffering long wires, resizing cells, restructuring logic etc.
Explain how these IPO steps affect the quality of the design in terms of area, congestion, timing slack.
(a) Why is this called "In-Place Optimization" ?
(b) Why are the two IPO steps different ?
(c) Why are both used ?

Ans: IPO optimizes timing by buffer insertion and cell resizing. Important steps that are performed in IPO include fixing {setup,hold} time, max. transition time violation. Timing slack along all arcs is optimized by these operations. Increase in area and reduction in timing slack depend upon timing and IPO constraints.
(a) This step is referred to as "In-Place Optimization" because IPO performs resizing and buffering in place (between cells in the row). It does not perform placement optimization in this step.
(b) The first IPO step (IPO1) is performed after placement. It performs trial route --> extraction --> timing analysis to determine the quality of placement. Setup and hold time fixing is done according to the result of timing analysis. The second IPO step (IPO2) is performed after clock tree synthesis. CTS performs clock buffer insertion to balance skews among all flip-flops. The IPO2 step optimizes timing paths between flip-flops taking the actual clock skew into account.
(c) If IPO2 step is not performed after CTS, then timing paths between flip-flops are not tuned for clock skew variation. Even though NanoRoute performs timing optimization, it is more of buffer insertion in long interconnect to fix setup and hold times.

Q8: Clocking and Place-Route Flow. Consider the following steps:
- Clock sink placement
- Standard-cell global placement
- Standard-cell detailed placement
- Standard-cell ECO placement
- Clock buffer tree construction
- Global signal routing
- Detailed signal routing
- Bounded-skew (balanced) clock (sub)net routing
- Steiner clock (sub)net routing
- Clock sink useful skew scheduling (i.e., solving the linear program, etc.)
- Post-placement (global routing based) static timing analysis
- Post-detailed routing static timing analysis
(a) As a designer of a clock distribution flow for high-performance standard-cell based ASICs, how would you order these steps? Is it possible to use some steps more than once, others not at all (e.g., if subsumed by other steps).
(b) List the criteria used for assessing possible flows.
(c) What were the 3 next-best flows that you considered (describe as variants of your flow), and explain why you prefer your given answer.

Ans: (a) My basic flow:
(1) SC global placement
(2) post-placement STA
(3) clock sink useful-skew scheduling
(4) clock buffer tree construction that is useful-skew aware (cf. associative skew.)
(5) standard-cell ECO placement (to put the buffers into the layout)
(6) Steiner clock subnet routing at lower levels of the clock tree (following CTGen type paradigm)
(7) bounded-skew clock subnet routing at all higher levels of the clock tree, and as necessary even at lower levels, to enforce useful skews
(8) global signal routing
(9) detailed signal routing,
(10) post-detailed routing STA
(b) Criteria:
(1) likelihood of convergence with maximum clock frequency
(2) minimization of CPU time (by maximizing incremental steps, minimizing "detailed" steps, and minimizing iterations)
(3) make a good trade-off between wiring-based skew control and wire cost (this suggests Steiner routing at lower levels, bounded-skew routing at higher levels).
[Comment 1. Criteria NOT addressed: power, insertion delay, variant flow for hierarchical clocking or gated clocking.
Comment 2: I do not know of any technology for clock sink placement that can separate this from placement of remaining standard cells. So, my flow does not invoke this step. I also don't want post-route ECOs.]
(c) Variants:
(1) introduce Step 11: loop over Steps 3-10 (not adopted because cost benefit ratio was not attractive, and because there is a trial placement + global routing to drive useful-skew scheduling, buffer tree construction and ECO placement);
(2) after Steps 1-4, re-place the entire netlist (global, detailed placement) and then skip Step 5 (not adopted because the benefits of avoiding ECO placement and leveraging a good clock skeleton were felt to be small: the buffer tree will largely reflect the netlist structure, and re-placing can destroy assumptions made in Steps 3-4);
(3) can iterate the first 5 steps essentially by iterating: clock sink placement, (ECO placement for legalization), (incremental) standard-cell (global + detailed) placement (not adopted because I feel that any objective for standalone clock sink placement would be very "fuzzy", e.g., based on sizes of intersections of fan-in/fan-out cones of sequentially adjacent FFs)

Q9: If we migrate to the next technology node and double the gate count of a design, how would you expect the size of the LEF and routed DEF files to change? Explain your reasoning.
Ans: The LEF file will remain roughly the same size (same richness of cell library, say, between 500-1200 masters), modulo possible changes in conventions (e.g., CTLF used to be a part of LEF) and modulo possible additional library model semantics (e.g., adding power modeling into LEF). The DEF file should at least double: the components and nets will double, but if there is extra routing complexity (more complex geometries, and more segments per connection due to antenna rules or badly scaling router heuristics), the DEF could grow significantly faster.

## Ways to improve your Interview Skills

With so much useful advice and so many talented career experts out there, often with differing opinions, you will most likely end up in a 'black hole', wasting your precious time and money in the process. This blog was created when I was interviewing way back in 2004, mainly as a placeholder for collecting all the interview questions I faced and documenting all the stupidities, oddities and irrelevance I had to go through in that process. More often than not I came across head hunters and career search firms who failed to pay attention to detail or were not experienced enough to understand the requirements of the candidate or the job. The outcome is one wasted interview opportunity and the likelihood that you may not interview with that firm again. But these facts should not demotivate you, as this world is full of surprises; be ready to take on hurdles as they come along. With that in mind, I promise that if you take some 30-odd minutes to read these posts, you'll already be ahead of the game. Apply some of the techniques, and I guarantee you'll see some results.
Best of luck.

## Win great prizes by answering the Interview Questions**

One of the most popular topics on this blog is the series on Interview Questions. With over 105 Interview Questions, many of which are original and unique, we continue the trend by posting a new set very often. It does not stop there. The person who answers all questions correctly in the comments section stands a chance to win a Home Burglar Alarm with integrated motion sensor, 105 dB loud buzzer, wall mount and remote activation. (** The person needs to answer all questions correctly in a given set, starting with the questions posted from May 2010. The product will be shipped for free only to addressees in India. Shipping charges of \$10 apply for addressees outside India. Last but not least, you have to support this blog by first registering as a follower in the Join this Blog section (Google sign-up), be part of the Facebook fan page, and be subscribed to our RSS feeds or email!)

## VLSI/ASIC/VHDL Interview Questions

One of the most popular topics on this blog is the series on Interview Questions. With over 105 Interview Questions, many of which are original and unique, we continue the trend by posting a new set today. It does not stop there. The person who answers all questions correctly in the comments section stands a chance to win a Home Burglar Alarm device with integrated motion sensor** (details will be posted later, and the product will only be shipped to addressees in India).

Now for the Questions!
1. For a combinational process in VHDL, the sensitivity list should contain all of the signals that are read in the process. Please give a detailed reason and an exception to this statement.
2. For a combinational process, every signal that is assigned to, must be assigned to in every branch of If-Then-Else statement and Case statement. Why?
3. Each signal should be assigned to in only one process. Please give a detailed reason and an exception to this statement.
4. Separate unrelated signals into different processes. Give at least two reasons!
5. In a state-machine, illegal and unreachable states should transition to the reset state. Explain.
6. If your state-machine has less than 16 states, use a one-hot encoding. Explain.
7. Include a reset signal in all clocked circuits. Explain.
8. For implicit state-machines, check for reset after every wait statement.
9. Connect reset to the important control signals in the design, such as the state signal. Do not reset every flip-flop. Explain.
10. Use synchronous, not asynchronous reset. Explain.

**Only original answers will be eligible for the lucky draw. Google searched and copied answers will be disqualified.
Last but not least, you have to support this blog by first being a follower in the Join this Blog section (Google sign-up), being part of the Facebook fan page, and being subscribed to the RSS feeds!
Good luck.

## Infineon India Walk-in Interviews (17 April 2010, Saturday)

Infineon provides semiconductor and system solutions, focusing on three central needs of our modern society: energy efficiency, communications and security. With some 25,000 employees worldwide (as of Jan 2010), Infineon achieved 3.027 billion euros in sales in the 2009 fiscal year. It has a strong technology portfolio with about 22,900 patents and patent applications, and more than 30 major R&D locations.

The Walk-in Interviews will be held between 10:00 AM and 4:00 PM @ Hotel Royal Orchid, Old Airport Road, Bangalore - 8. Please see attached flyer for more details!!