Showing posts with label Low Power. Show all posts
Showing posts with label Low Power. Show all posts

Interview Question on Power Analysis

Your task is to do power analysis for a circuit that sends out a one-clock-cycle pulse on the done signal once every 16 clock cycles(done is ’0’ for 15 clock cycles, then ’1’ for one cycle, then repeat with 15 cycles of ’0’ followed by a ’1’, etc). You have been asked to consider three different types of counters: 1. Binary counter, 2. Gray-code counter, and 3. One-hot counter. (The table below
shows the values from 0 to 15 for the different encoding schemes) What is the relative amount of power consumption for the different options?

Additional Info:
Your implementation technology is an FPGA where each cell has a programmable combinational circuit and a flip-flop. The combinational circuit has 4 inputs and 1 output. The capacitive load of the combinational circuit is twice that of the flip-flop.

1. You may neglect power associated with clocks.
2. You may assume that all counters:
(a) are implemented on the same fabrication process
(b) run at the same clock speed
(c) have negligible leakage and short-circuit currents

The columns below represent, Decimal Gray One-Hot Binary in order
0 0000 0000000000000001 0000
1 0001 0000000000000010 0001
2 0011 0000000000000100 0010
3 0010 0000000000001000 0011
4 0110 0000000000010000 0100
5 0111 0000000000100000 0101
6 0101 0000000001000000 0110
7 0100 0000000010000000 0111
8 1100 0000000100000000 1000
9 1101 0000001000000000 1001
10 1111 0000010000000000 1010
11 1110 0000100000000000 1011
12 1010 0001000000000000 1100
13 1011 0010000000000000 1101
14 1001 0100000000000000 1110
15 1000 1000000000000000 1111

This question is asked widely in interviews worldwide with varying levels of difficulty. Please start discussing.

Tip: Capacitance is dependent upon the number of signals, and whether a signal is combinational or a flop.

Glitch-Free Frequency Shifting

Download this white paper from Silicon Labs to learn how to simplify your timing design using glitch-free frequency shifting. This solution addresses low-power design challenges and the complexity of generating a wide range of frequencies in consumer electronics applications including audio, video, computing or any application that requires multiple frequencies.

Understanding system-level energy-management techniques and test

Gina Bonini is the worldwide embedded-system technical-marketing manager for Tektronix. In this article she talks about Power dissipation, Bus energy dissipation, PCI-e low power mode, some power saving modes and low power DDR DRAM.

High-speed and low-power electronic circuits on carbon material

Using a heated atomic force microscope tip, researchers have drawn nanoscale conductive patterns on insulating graphene oxide. This simple trick to control graphene oxide's conductivity could pave the way for etching electronic circuits into the carbon material, an important advance toward high-speed, low-power, and potentially cheaper electronics. For more info please hear this podcast from MIT.

Dynamic power supply

Power gating adds enabling signals to a power supply network; dynamic power supply management adjusts supply voltage according to data path criticality. You are asked to take a testcase and upgrade its power supply network to dynamic power supply. How can you verify the power reduction of your technique?

Reduce Power, Area and Routing Congestion

This paper, using an example design, demonstrates how to meet challenging performance, latency and bandwidth goals by using the DesignWare Interconnect Fabric for the ARM AMBA 3 AXI while minimizing the total area, power consumption and number of top-level wires. The paper also studies the design requirements and examines the optimization features of the DesignWare Interconnect Fabric used to meet the stringent timing requirements. Detailed technical analysis is provided for the selected architecture, pipelining mode, arbitration scheme and the slave visibility feature employed to reach timing closure for the links with demanding performance requirements. Final results are presented based on the hybrid architecture of the DesignWare Interconnect Fabric used to optimize the infrastructure resulting in a reduction in area, power and routing congestion.

Designing Low Power, Multi-Primary Technology for Mobile Phone Displays

Displays have always been a major power consumer in mobile devices. In the past, one easy way to keep power demands in check was to limit the display performance in terms of luminance, color gamut, resolution and refresh rate. This way, users could get away without recharging their devices for several days. By contrast, today's power-hungry devices must be charged every day. Typical color displays require 400-500 mW, which is ~50% of the average power consumption of the device when using a multimedia application. Reducing the power consumption demand of a mobile device display is a certain way to increase the time required of a user between recharges, but today it is unrealistic to limit display performance in order to achieve that. This article offers technical information for designers about a multi-primary display solution that will reduce power consumption of the display without limiting its performance.

Fast, Easy, and Flexible Power for System Designers

Systems designers are having a difficult time developing power subsystems that supply all of their system's power needs due to varied and changing power requirements. A new type of power subsystem—the field-programmable power subsystem or FPPS—squarely addresses this issue by providing a flexible approach that costs no more than conventional switching power subsystems. This white paper discusses the advantages and benefits of field-programmable power subsystems and discusses the many ways they reduce system-design risks.

The Top 10 power management 'How To' design articles of 2008

EETimes has published an article on the Top 10 Power Management articles of 2008. {Follow Here}

Hierarchy and Power Gating

A scalable approach to chip architecture is essential and valuable since a SOC design today often becomes a component in an even larger chip in subsequent product generations.

To support this portability, module boundaries must be enforced at the power domains level. That is, a given module should belong to a single power domain, not split across several domains. Some tools and flows support RTL process by RTL process assignment to power domains, but this leads to much more complicated implementation and analysis. Clean visibility of the boundaries of a power gated block is key to having a clean, top-down implementation and verification flow.

Although one can in theory nest power gated modules arbitrarily within power gated subsystems which are in turn nested on a shared switched power rail, there are considered benefits in not creating multiple levels of power switching fabric. Power gating is intrusive and ass in some voltage drop and degradation of performance. Cascading multiple voltage drops can lead to unacceptable increases in delay. Even if the design is representeted as hierarchical at the architectural level, the implementation is improved if this is mapped on to a single level of power gating at implementation.

  • Map power gated regions to explicit module boundaries.
  • When partitioning a hierarchical power gating design ensure that the power gating control terms can be mapped back to a flat switching fabric.
  • Avoid control signals passing through power gated or power down regions to other power regions that not hierarchically switched with the first region.
  • Avoid excessively fine power gating granularity unless absolutely required for aggressive leakage power management. Every interface adds implementation and verification challenges and complicates the system level production test challenges.
  • Avoid a power gating system of more that one or two levels.

Architecture and partitioning for low power

A good working definition of architecture ( in the context of IP design, atleast) is the partitioning and interface design of the IP. In supporting various low power strategies, power gating presents the most significant architectural challenge in the architecture of IP.

To support power gating, we need to:
  1. Decide when and how the IP will be powered down and powered up.
  2. Decide which blocks will be power gated and which blocks will be always on.
  3. Design a power controller that controls the power up and power down sequence.
  4. Determine which signals need to be isolated during power down.
  5. Develop an initial strategy for clocks reset and the power control signals.

Retention mechanisms in power gated designs

How can we retain state of some of the registers in the design? How to deal with memory state?

Let me try to explain each one of them based on my recent design experience.
For regular logic blocks, there are multiple ways to wake-up faster without losing much of information.

  • Use retention flops to save state of some important registers. For example state of control block, which forms the heart of the whole system.
  • If the chip is aimed for At-Speed testing, scan chains of the design can be used to scan out the data to an external memory and scan in after wake-up. This may not be as fast as using retention flops.
  • ….. there are many more possible methods.

Again w.r.t to retention flops there were questions about, How many type of retention flops are available.
I have seen 3 types.

  1. Single save/restore pin retention latch (Slave latch being always on)
  2. Single pin balloon Latch
  3. Dual Pin balloon Latch
Pro's and Con's of Single Pin Vs Dual Pin retention flops:

Advantages of Single Pin:

  • Minimal area impact
  • Single signal controls retention

Disadvantages of Single Pin:

  • Performance Impact on the register
  • Hold Time requirements for the input data

Advantage of Dual Pin:

  • Minimal leakage power
  • Minimal performance impact compared to the Single Pin design
  • Minimal dependency on the clock for the control signals.

Disadvantages of Dual Pin:

  • Area Impact
  • More Complex System Design
  • More Buffer Network and AON network required.

Timing closure impacted by DVFS!!

While designing systems with DVFS techniques, we need to look at the impact of temperature inversion on the performance of the design. An important criteria while selecting voltages and frequencies for a design, one must consider a range such that delay/voltage consistently increases or decreases.

What does this means?
We must always operate above the temperature inversion point.

Especially in low power UDSM process, combined use of reduced VDD and High Threshold voltage may greatly modify the temperature sensitiveness of the design. Due to this, worst case timing is no longer guaranteed at higher temperatures. So in order to guarantee correct behavior of the design, one has to verify the design at various PVT corners. This leads to a significant increase in the total turn around time of the design.

In a nutshell, delay increases with increase in temperature, but below a certain voltage, this relationship inverts and delay starts to decrease with increase in temperature. This is a function of threshold voltage (Threshold voltage and carrier mobility are temperature dependent). Due to this threshold voltage dependency, we have observed that non-critical paths suddenly become critical.

Having said this, as soon as Voltage/Delay relate randomly Voltage Scaling becomes a nightmare to implement and verify.

Note: If both threshold voltage and carrier mobility monotonically decrease with increase in temperature, Operating Voltages(range) defines the performance of the design.

Voltage and Frequency scaling mechanisms

There are various voltage scaling approaches that are in use today,

Static Voltage Scaling: Different blocks in the design will be operating at different fixed supply voltages
Multi-level Voltage Scaling: An extension to static voltage scaling where in different blocks are switched between two or more voltage levels.
Dynamic Voltage and Frequency Scaling : An extension to Multi-Level Voltage Scaling Voltage levels are dynamically varied as per the work-load of the block
Adaptive Voltage Scaling : An extension to DVFS and its a closed loop representation of the above method. Power Controller block within the design adopts itself dynamically to varying work-loads.
DVFS example: Here is an outline of tasks that will be executed within a design to scale voltage and frequency dynamically, controller first decides the minimum clock speed that meets the workload requirements. It then determines the lowest supply voltage that will support that clock speed. Given below is an example of a sequence thats followed if the target frequency is higher than the current frequency

– Controller monitors the variance in work-load
– Controller detects variation in work-load and programs the device to operate at different voltage
– Block under question continues operating at the current clock frequency until the voltage settles to the new value
– Controller then programs the desired pre-determined clock frequency

Varying clocks and voltages during operation is a new methodology in the design and leads to many challenges in the design process

– Identifying the optimal combination of Voltage/Frequency
– How to model the timing behavior
– Clock and Power Supply locking times.

Interview Question

Due to a miscommunication during design, you thought your circuit was supposed to have a supply voltage of 2.1 volts (threshold voltage is 0.7 volts) and a 25 ns cycle time, and you designed it to meet those specifications. Now your boss tells you you were supposed to have a 20 ns cycle time. To avoid redesigning the whole circuit, a co-worker suggests increasing the voltage of the circuit to decrease the delay to 20 ns. The same co-worker suggests picking some arbitrary number like 3.5 volts.
  1. Determine the new cycle time of your circuit with a 3.5 volt input voltage.
  2. Your boss is worried about the additional power consumption - calculate the increase in power consumption of your circuit at 3.5 volts, assuming activity factor and capacitance remain the same and neglecting short circuit and leakage power.
  3. To satisfy your boss, calculate the minimum voltage you would increase the supply voltage to, in order to allow your circuit to run at 20 ns. You may leave your answer in non-simplified numeric terms, but not in the form of an equation to solve.

RTL considerations and Functional verification of low power designs

This article is about RTL in a Multi-Voltage environment and it's implication on verification.

In the earlier posts i discussed Multi Voltage design infrastructure. Today let's look at 'Power Gating', the most common design style to reduce Leakage Power.

Typical characteristics of this design style are:-

  1. Some of blocks in the design will be shut-down, when not functional.
  2. There will be blocks, which are always on.
  3. These blocks could be of same voltage or different voltage.
  4. The power structure to shut-down a block could be either completely external or internal. Most commonly used is internal power structure to shut-down blocks.
  5. Either VDD or Ground can be cut-off.

Consider a classical scenario, wherein implementation/verification becomes a real challenge.

"We have a chip taped-out, working fine in 65nm. We want to add more functionality to the same chip and want to accommodate the logic within the same die-area as before. To accommodate this silicon real estate requirement, we decided to move to 45nm. Since the application as well as the technology node demands extremely low leakage, we want to shut-down some blocks in the design."

Given this, it's very very challenging to accommodate Power Gating, since this chip is not architected to accommodate Power Gating

Now, given the characteristics of Power Gating Design Style, here are some facts I think, we need to consider while Micro-Architecting the design.

RTL/Micro-Architecture requirements:-

  1. Some of the blocks will be shut-down. Does your design have control logic that generates signals locally to shut-down the block ?
    • I think, if the design was architected from the beginning with power gating design style in mind, it will have a control block, which probably might make decisions on which blocks to shut-down and when and how long this has to be shut-down etc. Now the other question that comes to mind is, Is this sufficient? Do I need a separate power control block, which takes inputs from the control logic and generates the power down signals in the desired sequence ? I think it is a good practice to introduce such logic to control the complete power-down/power-on sequence.
  2. Lets look at the control signals required:
    • Control Signal for the Power Switch (Switch_enable)
    • Control Signal for the Isolation Cell Enable (Isolate_enable)
    • Control Signal for the retention flops (Save_Restore)
      • Now, I think ideally all these control signals are derivative of each other!. Its just that these signals need to be generated in the right order for the circuit to behave as desired.
      • sequence could be:
        • Inactivity generate
          1. Generate Save_Restore: This will indicate that the retention flops needs to transfer the contents from Master Latch to Slave and go into sleep mode.
          2. Generate Isolate_enable: This will enable isolation cells to be active and clamp the output to a known voltage and state.
          3. Since all the basic elements are informed of the shut-down operation, we can now generate Switch_enable, to turn off the power rails, that control specific blocks.
          4. There could be other actions such as reduce the frequency/reduce the voltage….etc as a part of this sequencing.
          5. As a part of this sequence definition, we should define the right Assertions too, so that if the right sequence is violated, this can be flagged up-front.
        • Sequence power_sequence
          1. Save_Restore && Isolate_enable && Switch_enable ==0
          2. ##1 Save_restore ==1
          3. ##1 Save_restore && Isolate_enable == 1;
          4. ##1 Save_restore && Isolate_enable && Switch_enable ==1;
        • endsequence
  3. If the above control signals exist in RTL, these are driven by power management logic but are not connected to anything!
    • Even though as said in bullet 2 these control signals are generated by Control Logic, these are not connected to anything outside this control logic. The reasons are:-
      • Power Switch that's used to cut-off power does not exist in RTL. These get added probably during Power Planning. Floor-planning/Power Planning Engineer will add these based on the specification from the Architect of the chip. Till switches are in place Switch_enable is floating.
      • In RTL there is nothing specifically done for Retention Flop, these are coded like any other register and Synthesis tool will infer them based on some commands. Save_restore end up floating.
      • Isolation cells does not exist in the RTL and hence Isolate_enable is floating.
    • Now the Question arises. How do we simulate them? We see them in order...
      1. Power Switch Behaviour
      2. Isolation Behaviour
      3. Level Shifter Behaviour
      4. Retention flop.
Remember, we don't have any representation of the above cells in the RTL. Firstly do we need to simulate the behavior of all of them? Typically in good old days, PLI's were written by verification teams to simulate all of them.

For example say,
Power Switch Behaviour: We can write a function/pli with following specification
$power(block_to_be_pd, type_of_pd(aon/shut-down)signal_used_for_shut_down(switch_enable), acknowledge signal(acknowledge))

Now this PLI should look at the "type_of_pd", which is either always_on or shut-down and act accordingly. In case the block under consideration is of type shut-down, then whenever it detects an activity on the "switch_enable" signal, it should corrupt all the signals of the block. Once all the signals are corrupted, it should generate an "acknowledge" signal after a user specified delta delay.

In my humble opinion, this should also include something like:
#0 $power(block1)
#20 $power(block2)

This enables us to simulate the behaviour of power sequencing. There is lot more that can be added to this PLI routine such as:
  1. Trace through the fanout of all the outputs of this block. Flag an Error if corrupted signals are propagated till the reciever.
  2. When switch_enable goes inactive, either reset all the logic in the block to "X" or to some random pattern.
  3. During power up, stagger the power up of different blocks randomly!!!
  4. Emulate Impact of IR-drop using staggering principle!
Isolation Behaviour: This is again pretty straight-forward. All we need is a PLI or a simple function in Verilog, which will be something like:

If "early_switch_enable" is active, maintain the output at "output_sense(1/0)" value, irrespective of the state of input.

Retention Behaviour:
This is again pretty straight-forward from a simulation perspective. The PLI or a simple function, which will be something like:
Whenever "early_early_switch_enable" is active, copy the contents of "register_names " onto a local shadow_register or a local memory, and whenever "wake_up" goes active, reload the "register_names" with contents of the shadow_register.

Now there are various complex flavors of all the above depending the circuit behaviour of these speciall cells. The major question to be answered is, are we looking at 2 different RTL? One for synthesis without any PLIs and one for simulation with PLIs. Can these PLIs be synthesized into H/W automatically by all the EDA tools available? Is this the right approach? Can these be solved using a different approach?

More Questions that need answers..
  1. Is a proper sequence for all the control signals defined ? Examples of this could be:
    • Switch_enable @ 5ns
    • Isolate_enable @ Switch_enable "+" 10ns
    • Save_Restore @ Isolate_enable "+" 20ns
  2. Now the block, which we are trying to shut-down needs to generate an Acknowledgment signal, indicating power-up or power-down. This signal is again a floating output not driven by any logic,but is processed by the power management logic!!!
  3. Is there a requirement, such as : Block needs to be powered-up within "n" clock cycles? What if you don't receive an Acknowledge within "n" clock cycles?
  4. If all the above are taken care of during micro-architecting, there are still few more questions that need to be answered for Logic Synthesis and Functional Simulation:
    – Is Isolation Cell/Level Shifter part of your RTL ? How are you coding this ? Are you instantiating it in the RTL?
    – Are Retention Flops part of RTL ? How are you coding this ? Are you instantiating it in RTL ?
    – How will the control signal get interpreted by the implementation tool, as they (the control signals) are floating?
    – How will Acknowledge signal get generated? Since it's required by power management logic, but is not generated by any hardware?
    – How will functionality of all these get verified, given that some of them are either floating or not generated ?
    – How will the shut-down get simulated ? Nothing special is done in RTL to simulate this behaviour.
    – How do we model Shut-Down to verify the functionality?
    – How will the retention flop behaviour get simulated ? In RTL it's coded like any other register.
    – When a block wakes up from shut-down, what should be the status of all the logic? Is random better or using "X" better ? Wouldn't "X" be very pessimistic ?
    – How to simulate the behaviour of "n" clock cycle requirement of the Acknowledge Signal from power-down block ?
    – If there are some always on logic residing in a shut-down block, how do we implement them? How do we verify them ?

Todays Low Power Techniques

Lets take a look at the various low power techniques in use today.
I would classify them into 2 categories

  • Structural Techniques
    • Voltage Islands
    • Multi-threshold devices
    • Multi-oxide devices
    • Minimize capacitance by custom design
    • Power efficient circuits
    • Parallelism in micro-architecture
  • Traditional Techniques
    • Clock gating
    • Power gating
    • Variable frequency
    • Variable voltage supply
    • Variable device threshold
Which one of the above techniques are aimed at reducing Dynamic Power and Leakage Power?

Dynamic Power Reduction
  • Clock Gating
  • Power efficient circuits
  • Variable frequency
  • Variable voltage supply
Leakage Power Reduction
  • Minimize usage of Low Vt Cells
  • Power Gating
  • Back Biasing
  • Reducing Dynamic Power
  • Reduce Oxide Thickness
  • Use FINFET's

Design Elements of Low Power Design

Special cells are required for implementing a Multi-Voltage design.

  1. Level Shifter
  2. Isolation Cell
  3. Enable Level Shifter
  4. Retention Flops
  5. Always ON cells
  6. Power Gating Switches/MTCMOS switch
Level Shifter: Purpose of this cell is to shift the voltage from low to high as well as high to low. Generally buffer type and Latch type level shifters are available. In general H2L LS's are very simple whereas L2H LS's are little complex and are in general larger in size(double height) and have 2 power pins. There are some placement restrictions for L2H level shifter to handle noise levels in the design. Level shifters are typically used to convert signal levels and protect against sneak leakage paths. With great care, level shifters can be avoided in some cases, but this will become less practicable on a wider scale.

Isolation Cell:
These are special cells required at the interface between blocks which are shut-down and always on. They clamp the output node to a known voltage. These cells needs to be placed in an 'always on' region only and the enable signal of the isolation cell needs to be 'always_on'. In a nut-shell, an isolation cell is necessary to isolate floating inputs.
There are 2 types of isolation cells (a) Retain "0″ (b) Retain "1″

Enable Level Shifter: This cell is a combination of a Level Shifter and a Isolation cell.

Retention Flops: These cells are special flops with multiple power supply. They are typically used as a shadow register to retain its value even if the block in which its residing is shut-down. All the paths leading to this register need to be 'always_on' and hence special care must be taken to synthesize/place/route them. In a nut-shell, "When design blocks are switched off for sleep mode, data in all flip-flops contained within the block will be lost. If the designer desires to retain state, retention flip-flops must be used".

The retention flop has the same structure as a standard master-slave flop. However, the retention flop has a balloon latch that is connected to true-Vdd. With the proper series of control signals before sleep, the data in the flop can be written into the balloon latch. Similarly, when the block comes out of sleep, the data can be written back into the flip-flop.

Always ON cells: Generally these are buffers, that remain always powered irrespective of where they are placed. They can be either special cells or regular buffers. If special cells are used, they have thier own secondary power supply and hence can be placed any where in the design. Using regular buffers as Always ON cells restricts the placement of these cells in a specific region.

In a nut-shell, "If data needs to be routed through or from sleep blocks to active blocks and If the routing distance is excessively long or the driving load is excessively large, then buffers might be needed to drive the nets. In these cases, the always-on buffers can be used."

Power Gating Switches/MTCMOS Switch: MTCMOS stands for multi-threshold CMOS, where low-Vt gates are used for speed, and high-Vt gates are used for low leakage. By using high-Vt transistors as header switches, blocks of cells can be switched off to sleep-mode, such that leakage power is greatly reduced. MTCMOS switches can be implemented in various different ways. First, they can be implemented as PMOS (header) or NMOS (footer) switches. Secondly, their granularity can be implemented on a cell-level (fine-grain) or on a block-level (coarse-grain). That is, the switches can be either built into every standard cell, or they can be used to switch off a large design block of standard cells.

Infrastructure Needs for Multi-Voltage Designs

Before we start looking at implementing a Multi-Voltage design there are certain questions that need to be answered to find out from process/library perspective such as

  1. Available Operating Voltages (PVT)
  2. Do we have special cells such as Level Shifters/Isolation cells/Power Gating Switches?
  3. If Level Shifter exists, what kinds of level shifters are available? (Ex: Enable Level Shifter etc)
  4. What are the different corners that need to be used for sign-off?
  5. How should we handle OCV?
  6. How accurate are these timing models? Is NLDM good enough or do we need CCS/ECSM models?
  7. Do we have special cells with Dual Rails(ex: Retention Flops)? If yes How is the timing captured for each rails?
  8. Are these cells characterized for Power, do they have State Dependent Path Dependent Information?
  9. If special cells exist, is it modeled according to EDA tools requirement?
  10. For Feed through implementation, do we have special Always On Buffers? What's the impact of routing the secondary power pins of these buffers on routing resources?
  11. Given range of Operating Voltages, is there an easy way at early stage of implementation cycle to judge on right Voltage selection? (power/performance product)
  12. Am I getting required power savings by implementing the design in Multi-Voltage style? For ex. If number of special cells required to implement this are too many, is it worth the effort? Can we look at an alternate way of saving power?

Multi Voltage magic

In the last few weeks i have been quite busy with a lot of research on low power design. There are so many tutorials on Low Power and everyone's concerns/questions seems to be dancing around Multi-Voltage design. What I could sense was there were lot of designers Implementing Multi-Supply(Power-Gating) as opposed to real Multi-Voltage design. There are 3 classics style of new design style i found

  1. Multi-Supply(Power Gating)
  2. Static Multi-Voltage
  3. Dynamic Voltage and Frequency Scaling

Lets us discuss about Multi-Voltage design in the forthcoming posts. Like any other design implementation, MV design has its own challenges and much more difficult to sign-off.

I would like to classify the different stages of the design into small segments so that we can discuss one by one.

  • Design Infrastructure
  • Architectural level consideration
  • Microarchitecture
  • RTL Design
  • RTL functional verification
  • Implementation
  • Functional Sign-Off
  • Silicon Signoff
  • Resources For Help

Next post is going to be a in-depth technical article i guess!