17 April 2008

Issues with Multi Height Cell Placement in Multi Vt Flow

Creating the reference libraries

There are two reference libraries required. One is low Vt cell library and another is high Vt cell library. These libraries have two different height cells. Reference libraries are created as per the standard synopsys flow. Library creation flow is given in Figure 1. Read_lib command is used for this purpose. As TF and LEF files are available TF+LEF option is chosen for library creation. After the completion of the physical library preparation steps, logical libraries are prepared.





Figure 1 Library preparation command window


Different Unit Tile Creation

The unit tile height of lvt cells is 2.52 µ and hvt cells are 1.96 µ. Hence two separate unit tiles have to be created and should be added in the technology file. Hvt reference library is created with the unit tile name “unit” and lvt reference library is created with unit tile name “lvt_unit”. By default “unit” tile is defined in technology file and the other unit tile “lvt_unit” is also added to the technology file.



Figure 2. Tile height specifications in library preparation


Floor Planning

70% of the core utilization is provided. Aspect ratio is kept at 1. Rows are flipped, double backed and made channel less. No Top Design Format (TDF) file is selected as default placement of the IO pins are considered. Since we have multi height cells in the reference library separate placement rows have to be provided for two different unit tiles. The core area is divided into two separate unit tile section providing larger area for Hvt unit tile as shown in the Figure 3.



Figure 3. Different unit tile placement

First as per the default floor planning flow rows are constructed with unit tile. Later rows are deleted from the part of the core area and new rows are inserted with the tile “lvt_unit”. Improper allotment of area can give rise to congestion. Some iteration of trial and error experiments were conducted to find best suitable area for two different unit tiles. The “unit” tile covers 44.36% of core area while “lvt_unit” 65.53% of the core area. PR summary report of the design after the floor planning stage is provided below.


PR Summary:

Number of Module Cells: 70449

Number of Pins: 368936

Number of IO Pins: 298

Number of Nets: 70858

Average Pins Per Net (Signal): 3.20281

Chip Utilization:

Total Standard Cell Area: 559367.77

Core Size: width 949.76, height 947.80; area 900182.53

Chip Size: width 999.76, height 998.64; area 998400.33

Cell/Core Ratio: 62.1394%

Cell/Chip Ratio: 56.0264%

Number of Cell Rows: 392


Placement Issues with Different Tile Rows


Legal placement of the standard cells is automatically taken care by Astro tool as two separate placement area is defined for multi heighten cells. Corresponding tile utilization summary is provided below.


PR Summary:

[Tile Utilization]

============================================================

unit 257792 114353 44.36%

lvt_unit 1071872 702425 65.53%

============================================================

But this method of placement generates unacceptable congestion around the junction area of two separate unit tile sections. The congestion map is shown in Figure 4.




Figure 4. Congestion

There are two congestion maps. One is related to the floor planning with aspect ratio 1 and core utilization of 70%. This shows horizontal congestion over the limited value of one all over the core area meaning that design can’t be routed at all. Hence core area has to be increased by specifying height and width. The other congestion map is generated with the floor plan wherein core area is set to 950 µm. Here we can observe although congestion has reduced over the core area it is still a concern over the area wherein two different unit tiles merge as marked by the circle. But design can be routable and can be carried to next stages of place and route flow provided timing is met in subsequent implementation steps.


Tighter timing constraints and more interrelated connections of standard cells around the junction area of different unit tiles have lead to more congestion. It is observed that increasing the area isn't a solution to congestion. In addition to congestion, situation verses with the timing optimization effort by the tool. Timing target is not able to meet. Optimization process inserts several buffers around the junction area and some of them are placed illegally due to the lack of placement area.


Corresponding timing summary is provided below:


Timing/Optimization Information:

[TIMING]

Setup Hold Num Num

Type Slack Num Total Target Slack Num Trans MaxCap Time

========================================================

A.PRE -3.491 3293 -3353.9 0.100 10000.000 0 8461 426 00:02:26

A.IPO -0.487 928 -271.5 0.100 10000.000 0 1301 29 00:01:02

A.IPO -0.454 1383 -312.8 0.100 10000.000 0 1765 36 00:01:57

A.PPO -1.405 1607 -590.9 0.100 10000.000 0 2325 32 00:00:58

A.SETUP -1.405 1517 -466.4 0.100 -0.168 6550 2221 31 00:04:10

========================================================


Since the timing is not possible to meet design has to be abandoned from subsequent steps. Hence in a multi vt design flow cell library with multi heights are not preferred.


References

[1] Astro, User Guide, Version X-2005.09, September 2005



Power Gating

2 Power Gating is effective for reducing leakage power [3]. Power gating is the technique wherein circuit blocks that are not in use are temporarily turned off to reduce the overall leakage power of the chip. This temporary shutdown time can also call as "low power mode" or "inactive mode". When circuit blocks are required for operation once again they are activated to "active mode". These two modes are switched at the appropriate time and in the suitable manner to maximize power performance while minimizing impact to performance. Thus goal of power gating is to minimize leakage power by temporarily cutting power off to selective blocks that are not required in that mode.

Power gating affects design architecture more compared to the clock gating. It increases time delays as power gated modes have to be safely entered and exited. The possible amount of leakage power saving in such low power mode and the energy dissipation to enter and exit such mode introduces some architectural trade-offs. Shutting down the blocks can be accomplished either by software or hardware. Driver software can schedule the power down operations. Hardware timers can be utilized. A dedicated power management controller is the other option.


An externally switched power supply is very basic form of power gating to achieve long term leakage power reduction. To shutoff the block for small interval of time internal power gating is suitable. CMOS switches that provide power to the circuitry are controlled by power gating controllers. Output of the power gated block discharge slowly. Hence output voltage levels spend more time in threshold voltage level. This can lead to larger short circuit current.


Power gating uses low-leakage PMOS transistors as header switches to shut off power supplies to parts of a design in standby or sleep mode. NMOS footer switches can also be used as sleep transistors. Inserting the sleep transistors splits the chip's power network into a permanent power network connected to the power supply and a virtual power network that drives the cells and can be turned off.


The quality of this complex power network is critical to the success of a power-gating design. Two of the most critical parameters are the IR-drop and the penalties in silicon area and routing resources. Power gating can be implemented using cell- or cluster-based (or fine grain) approaches or a distributed coarse-grained approach.


Power-gating parameters


Power gating implementation has additional considerations than the normal timing closure implementation. The following parameters need to be considered and their values carefully chosen for a successful implementation of this methodology [1] [2].


  • Power gate size: The power gate size must be selected to handle the amount of switching current at any given time. The gate must be bigger such that there is no measurable voltage (IR) drop due to the gate. Generally we use 3X the switching capacitance for the gate size as a rule of thumb. Designers can also choose between header (P-MOS) or footer (N-MOS) gate. Usually footer gates tend to be smaller in area for the same switching current. Dynamic power analysis tools can accurately measure the switching current and also predict the size for the power gate.


  • Gate control slew rate: In power gating, this is an important parameter that determines the power gating efficiency. When the slew rate is large, it takes more time to switch off and switch-on the circuit and hence can affect the power gating efficiency. Slew rate is controlled through buffering the gate control signal.


  • Simultaneous switching capacitance: This important constraint refers to the amount of circuit that can be switched simultaneously without affecting the power network integrity. If a large amount of the circuit is switched simultaneously, the resulting "rush current" can compromise the power network integrity. The circuit needs to be switched in stages in order to prevent this.


  • Power gate leakage: Since power gates are made of active transistors, leakage is an important consideration to maximize power savings.



Fine-grain power gating


Adding a sleep transistor to every cell that is to be turned off imposes a large area penalty, and individually gating the power of every cluster of cells creates timing issues introduced by inter-cluster voltage variation that are difficult to resolve. Fine-grain power gating encapsulates the switching transistor as a part of the standard cell logic. Switching transistors are designed by either library IP vendor or standard cell designer. Usually these cell designs conform to the normal standard cell rules and can easily be handled by EDA tools for implementation.


The size of the gate control is designed with the worst case consideration that this circuit will switch during every clock cycle resulting in a huge area impact. Some of the recent designs implement the fine-grain power gating selectively, but only for the low Vt cells. If the technology allows multiple Vt libraries, the use of low Vt devices is minimum in the design (20%), so that the area impact can be reduced. When using power gates on the low Vt cells the output must be isolated if the next stage is a high Vt cell. Otherwise it can cause the neighboring high Vt cell to have leakage when output goes to an unknown state due to power gating.


Gate control slew rate constraint is achieved by having a buffer distribution tree for the control signals. The buffers must be chosen from a set of always on buffers (buffers without the gate control signal) designed with high Vt cells. The inherent difference between when a cell switches off with respect to another, minimizes the rush current during switch-on and switch-off.


Usually the gating transistor is designed as a high vt device. Coarse-grain power gating offers further flexibility by optimizing the power gating cells where there is low switching activity. Leakage optimization has to be done at the coarse grain level, swapping the low leakage cell for the high leakage one. Fine-grain power gating is an elegant methodology resulting in up to 10X leakage reduction. This type of power reduction makes it an appealing technique if the power reduction requirement is not satisfied by multiple Vt optimization alone.


Coarse-grain power gating


The coarse-grained approach implements the grid style sleep transistors which drives cells locally through shared virtual power networks. This approach is less sensitive to PVT variation, introduces less IR-drop variation, and imposes a smaller area overhead than the cell- or cluster-based implementations. In coarse-grain power gating, the power-gating transistor is a part of the power distribution network rather than the standard cell.


There are two ways of implementing a coarse-grain structure:

1) Ring-based

2) column-based


  • Ring-based methodology: The power gates are placed around the perimeter of the module that is being switched-off as a ring. Special corner cells are used to turn the power signals around the corners.


  • Column-based methodology: The power gates are inserted within the module with the cells abutted to each other in the form of columns. The global power is the higher layers of metal, while the switched power is in the lower layers.


Gate sizing depends on the overall switching current of the module at any given time. Since only a fraction of circuits switch at any point of time, power gate sizes are smaller as compared to the fine-grain switches. Dynamic power simulation using worst case vectors can determine the worst case switching for the module and hence the size. IR drop can also be factored into the analysis. Simultaneous switching capacitance is a major consideration in coarse-grain power gating implementation. In order to limit simultaneous switching daisy chaining the gate control buffers, special counters are used to selectively turn on blocks of switches.


Isolation Cells


Isolation cells are used to prevent short circuit current. As the name indicates these cells isolate power gated block from the normally on block. Isolation cells are specially designed for low short circuit current when input is at threshold voltage level. Isolation control signals are provided by power gating controller. Isolation of the signals of a switchable module is essential to preserve design integrity. Usually a simple OR or AND logic can function as an output isolation device. Multiple state retention schemes are available in practice to preserve the state before a module shuts down. The simplest technique is to scan out the register values into a memory before shutting down a module. When the module wakes up, the values are scanned back from the memory.


Retention Registers


When power gating is used, the system needs some form of state retention, such as scanning out data to a RAM, then scanning it back in when the system is reawakened. For critical applications, the memory states must be maintained within the cell, a condition that requires a retention flop to store bits in a table. That makes it possible to restore the bits very quickly during wakeup. Retention registers are special low leakage flip-flops used to hold the data of main register of the power gated block. Thus internal state of the block during power down mode can be retained and loaded back to it when the block is reactivated. Retention registers are always powered up. The retention strategy is design dependent. During the power gating data can be retained and transferred back to block when power gating is withdrawn. Power gating controller controls the retention mechanism such as when to save the current contents of the power gating block and when to restore it back.


References

[1] Practical Power Network Synthesis For Power-Gating Designs, http://www.eetimes.com/news/design/showArticle.jhtml?articleID=199903073&pgno=1, 11/01/2008

[2] Anand Iyer, “Demystify power gating and stop leakage cold”, Cadence Design Systems, Inc. http://www.powermanagementdesignline.com/howto/181500691;jsessionid=NNNDVN1KQOFCUQSNDLPCKHSCJUNN2JVN?pgno=1, 11/01/2008

[3] De-Shiuan Chiou, Shih-Hsin Chen, Chingwei Yeh, "Timing driven power gating", Proceedings of the 43rd annual conference on Design automation,ACM Special Interest Group on Design Automation, pp.121 - 124, 2006




Voltage Scaling and DVFS

2

Voltage Scaling

Reducing the power supply voltage is the effective technique to reduce dynamic power with the speed penalty. Keeping all others factors constant if power scaling is scaled down propagation delay will increase. This can be compensated by scaling down the threshold voltage to the same extent as the supply voltage. This allows the circuit to produce the same speed performance at a lower Vdd. At the same time smaller threshold voltages lead to smaller noise margin and increased leakage current.


Dynamic Voltage and Frequency Scaling (DVFS)

We know that supply voltage can be reduced if frequency of operation is reduced. If reduction in supply voltage is quadratic then approximately cubic reduction of power consumption can be achieved. However, it should be noted that frequency reduction slows the operation.


The above mentioned relation between energy and voltage is not always true. The authors in [1] showed that quadratic relationship between energy and Vdd deviates as Vdd is scaled down into the sub threshold voltage level. Sub threshold leakage current increases exponentially with the supply voltage. Since in sub threshold operation the on current takes the form of sub threshold current delay increases exponentially with voltage scaling. At very low voltages dynamic power reduces quadratically. But the leakage energy increases with supply voltage reduction since leakage energy is linear with the circuit delay. Hence dynamic and leakage power becomes comparable in sub threshold voltage region.


According to Bo Zhai et al. [1] dynamic voltage and frequency scaling is very popular low power technique. But larger voltage ranges does not improve power efficiency. They showed that for sub threshold supply voltages, leakage energy becomes dominant, making "just in time completion" energy inefficient. They also showed that extending voltage range below half Vdd will improve the energy efficiency for most processor designs while extending this range to sub threshold operations is beneficial only for specific applications. One of the important points to be noted from their study is DVFS in sub threshold voltage range is never energy efficient.


References

[1] Bo Zhai, David Blaauw, Dennis Sylvester and Krisztian Flaunter, "Theoretical and Practical Limits of Dynamic Voltage Scaling", DAC , San Diago, California, USA, pp.868-873, June 7-11, 2004


Multiple Threshold CMOS (MTCMOS) Circuits

James T. Kao et al. [2] showed MTCMOS logic is effective standby leakage control technique, but difficult to implement since sleep transistor sizing is highly dependent on discharge pattern within the circuit block. They showed dual Vt domino logic avoids the sizing difficulties and inherent performance associated with MTCMOS. High Vt cells are used where leakage has to be prevented whereas low Vt cells are employed where speed is of concern. Both cells are effectively used in MTCMOS technique.



MTCMOS technique [1]

In active mode of operation the high Vt transistors are turned off and the logic gates consisting of low Vt transistors can operate with low switching power dissipation and smaller propagation delay. In standby mode the high Vt transistors are turned off thereby cutting off the internal low Vt circuitry.


Variable Threshold CMOS (VTCMOS)

One of the efficient methods to reduce power consumption is to use low supply voltage and low threshold voltage without loosing speed performance. But increase in the lower threshold voltage devices leads to increased sub threshold leakage and hence more standby power consumption. Variable Threshold CMOS (VTCMOS) devices are one solution to this problem. In VTCMOS technique threshold voltage of the low threshold devices are varied by applying variable substrate bias voltage from a control circuitry.


VTCMOS technique is very effective technique to reduce the power consumption with some drawbacks with respect to manufacturing of these devices. VTCMOS requires either twin well or triple well technology to achieve different substrate bias voltage levels at different parts of the IC. The area overhead of the substrate bias control circuitry is negligible. [1]



References

[1] Sung Mo Kang and Yusuf Leblebici, “CMOS Digital Integrated Circuits-Analysis and Design”, Tata McGraw Hill, Third Edition, New Delhi, 2003

[2] James T. Kao and Anantha P. Chandrakasan, “Dual-Threshold Voltage Techniques for Low-Power Digital Circuits”, IEEE Journal Of Solid-state Circuits, Vol. 35, No. 7, pp.1009-1018, July 2000



Multi Threshold (MVT) Voltage Technique

Multiple threshold voltage techniques use both Low Vt and High Vt cells. Use lower threshold gates on critical path while higher threshold gates off the critical path. This methodology improves performance without an increase in power. Flip side of this technique is that Multi Vt cells increase fabrication complexity. It also lengthens the design time. Improper optimization of the design may utilize more Low Vt cells and hence could end up with increased power!

Ruchir Puri et al. [2] have discussed the design issues related with multiple supply voltages and multiple threshold voltages in the optimization of dynamic and static power. They noted several advantages of Multi Vt optimization. Multi Vt optimization is placement non disturbing transformation. Footprint and area of low Vt and high Vt cells are same as that of nominal Vt cells. This enables time critical paths to be swapped by low Vt cells easily.


Frank Sill et al. [3] have proposed a new method for assignment of devices with different Vth in a double Vth process. They developed mixed Vth gates. They showed leakage reduction of 25%. They created a library of LVT, mixed Vt, HVT and Multi Vt. They compared simulation results with a LVT version of each design. Leakage power dissipation decreased by average 65% with mixed Vth technique compared to the LVT implementation.


Meeta Srivatsav et al. [4] have explored various ways of reducing leakage power and recommended Multi Vt approach. They have carried out analysis using 130 nm and 90 nm technology. They synthesized design with different combination of target library. The combinations were Low Vt cells only, High Vt cells only, High Vt cells with incremental compile using Low Vt library, nominal (or regular) Vt cell and Multi Vt targeting Hvt and Lvt in one go. With only Low Vt highest leakage power of 469 µw was obtained. With only High Vt cells leakage power consumption was minimum but timing was not met (-1.13 of slack). With nominal Vt moderate leakage power value of 263 µw was obtained. Best results (54 µw with timing met) obtained for synthesis targeting Hvt library and incremental compile using Lvt library.


Different low leakage synthesis flows are carried out by Xiaodong Zhang [1] using Synopsys EDA tools are listed below:


  • Low-Vt --> Multi-Vt flow: This produces least cell count and least dynamic power. But produce highest leakage power. It takes very low runtime. Good for a design with very tight timing constraints


  • Multi-Vt one pass flow: It takes longest runtime and can be used in most of designs.


  • High-Vt --> Multi-Vt flow: Produce least leakage power consumption but has high cell count and dynamic power. This methodology is good for leakage power critical design.


  • High-Vt --> Multi-Vt with different timing constraints flow: This is a well balanced flow and produces second least leakage power. This has smaller cell count, area and dynamic power and shorter runtime. This design is also good for most of designs.


Optimization Strategies


The tradeoffs between the different Vt cells to achieve optimal performance are especially beneficial during synthesis technology gate mapping and placement optimization. The logic synthesis, or gate mapping phase of the optimization process is implemented by synthesis tool, and placement optimization is handled physical implementation tool.


Synthesis

During logic synthesis, the design is mapped to technology gates. At this point in the process optimal logic architectures are selected, mapped to technology cells, and optimized for specific design goals. Since a range of Vt libraries are now available and choices have to be made across architectures with different Vt cells, logic synthesis is the ideal place to start deploying a mix of different Vt cells into the design.


Single-Pass vs. Two-Pass Synthesis –with multiple threshold libraries


Multiple libraries are currently available with different performance, area and power utilization characteristics, and synthesis optimization can be achieved using either one or more libraries concurrently. In a single-pass flow, multiple libraries can be loaded into synthesis tool prior to synthesis optimization. In a two-pass flow, the design is initially optimized using one library, and then an incremental optimization is carried out using additional libraries.


About multi vt optimization in his paper Ruchir Puri[2] says: “The multi-threshold optimization algorithm implemented in physical synthesis is capable of optimizing several Vt levels at the same time. Initially, the design is optimized using the higher threshold voltage library only. Then, the Multi-Vt optimization computes the power-performance tradeoff curve up to the maximum allowable leakage power limit for the next lower threshold voltage library. Subsequently, the optimization starts from the most critical slack end of this power-performance curve and switches the most critical gate to next equivalent lower-Vt version. This will increase the leakage in the design beyond the maximum permissible leakage power. To compensate for this, the algorithm picks the least critical gate from the other end of the power-performance curve and substitutes it back with its higher-Vt version. If this does not bring the leakage power below the allowed limit, it traverses further from the curve (from least critical towards more critical) substituting gates with higher-Vt gates, until the leakage limit is satisfied. Then we jump back to the second most critical cell and switch it to the lower-Vt version. This iteration continues until we can no longer switch any gate with the lower vt version without violating the leakage power limit.”


But Amit Agarwal et al. [5] have warned about the yield loss possibilities due to dual Vt flows. They showed that in nano-scale regime, conventional dual Vt design suffers from yield loss due to process variation and vastly overestimates leakage savings since it does not consider junction BTBT (Band To Band Tunneling) leakage into account. Their analysis showed the importance of considering device based analysis while designing low power schemes like dual Vt. Their research also showed that in scaled technology, statistical information of both leakage and delay helps in minimizing total leakage while ensuring yield with respect to target delay in dual Vt designs. However, nonscalability of the present way of realizing high Vt, requires the use of different process options such as metal gate work function engineering in future technologies.


References

[1] Xiaodong Zhang, “High Performance Low Leakage Design Using Power Compiler and Multi-Vt Libraries”, Synopsys, SNUG, Europe, 2003, www.synopsys.com, 10/9/2007

[2] Ruchir Puri, “Minimizing Power Under Performance Constraint”, International Conference on Integrated Circuit Design and technology, IEEE, pp.159-163, May 17-20 2004

[3] Frank Sill, Frank Grassert and Dirk Timmermann, “Reducing Leakage with Mixed-Vth (MVT), 18th International Conference on VLSI Design, IEEE, pp.874-877, January 2005

[4] Meeta Srivatsav, S.S.S.P. Rao and Himanshu Bhatnagar, “Power Reduction Technique Using Multi-vt Libraries, Fifth International Workshop on System-on-Chip for Real Time Applications, IEEE, pp. 363-367, 2005

[5] Amit Agannral, Kunhyuk Kang, Swarup K. Bhunia, James D. Gallagher, and Kaushik Roy, “Effectiveness of Low Power Dual-Vt Designs in Nano-Scale Technologies Under Process Parameter Variations”, ACM, ISLPED’O5, August 8-10,2005, San Diego, California, USA. 2005.


Multi Vdd (Voltage)

222Dynamic power is directly proportional to power supply. Hence naturally reducing power significantly improves the power performance. At the same time gate delay increases due to the decreased threshold voltage. High voltage can be applied to the timing critical path and rest of the chip runs in lower voltage. Overall system performance is maintained. Different blocks having different voltage supplies can be integrated in SoC. This increases power planning complexity in terms of laying down the power rails and power grid structure. Level shifters are necessary to interface between different blocks.



Multiple Voltage ASIC/SoC Design: Classification


Multi voltage design strategies can be broadly classified as follows [1]:


  • Static Voltage Scaling (SVS): Different but fixed voltage is applied to different blocks or subsystems of the SoC design.


  • Multi-level Voltage Scaling (MVS): The block or subsystem of the ASIC or SoC design is switched between two or more voltage levels. But for different operating modes limited numbers of discrete voltage levels are supported.


  • Dynamic Voltage and Frequency Scaling (DVFS): Voltage as well as frequency is dynamically varied as per the different working modes of the design so as to achieve power efficiency. When high speed of operation is required voltage is increased to attain higher speed of operation with the penalty of increased power consumption.


  • Adaptive voltage Scaling (AVS): Here voltage is controlled using a control loop. This is an extension of DVFS.


Multi Voltage Design Challenges


Level Shifters


Signals crossing from one voltage domain to another voltage domain have to be interfaced through the level shifter buffers which appropriately shift the signal levels. Design of suitable level shifter is a challenging job.


Timing Analysis


Timing analysis of the given design becomes simpler with the single voltage as it can be performed for single performance point based on the characterized libraries. Tools can optimize the design for worst case PVT (Process, Voltage, temperature) conditions. This is not the case with multi voltage designs. Libraries should be characterized for different voltage levels that are used in the design. EDA tool has to optimize individual blocks or subsystems and also multiple voltage domains. This analysis becomes complex for larger ASIC/SoC.


Floor planning and Power Planning


Multiple power domain demands multiple power grid structure and a suitable power distribution among them. For a larger ASIC/SoC more careful floor planning and power planning is essential. The speed in which different power domains switch on or off is also important. A low voltage power domain may activate early compared to the high voltage domain. Multi voltage designs pose additional board level complexities. Separate power supply may necessary to provide different power levels.


Multi Voltage Designs: Timing Issues


Clock


Clock Tree Synthesis (CTS) tools should be aware of different power domains and understand the level shifters to insert them in appropriate places. Clock tree is routed through level shifters to reach different power domains. Simultaneous timing analysis and optimization is necessary for multiple voltage domains. Thus CTS becomes more complex in multi voltage designs.




2

Timing Issues with multi voltage design


Static Timing Analysis (STA)


Timing analysis for single voltage design is easy. When it comes to static voltage scaling it becomes little tougher job as analysis has to be carried out for different voltages. This methodology requires libraries which are characterized for different voltages used. Multi level and dynamic voltage scaling pose a greater challenge. For each supply voltage level or operating point constraints are specified. There can be different operating modes for different voltages. Constraints need not be same for all modes and voltages. The performance target for each mode can vary. EDA tool should be capable of handling all these situations simultaneously to carry out timing analysis. Different constraints at different modes and voltages have to be satisfied.


Multi Voltage Designs: Power Planning Issues


Efficient power planning is one of the key concerns of modern SoC designs. In multi voltage designs providing power to the different power domains is challenging. Every power domain requires independent local power supply and grid structure and some designs may even have a separate power pad. Separate power pad is possible in flip-chip designs and power pad can be taken out near from the power domain. Other chips have to take out the power pads from the periphery which can put limit to the number of power domains.


Local on chip voltage regulation is good idea to provide multiple voltages to different circuits. Unfortunately most of the digital CMOS technologies are not suitable for the implementation of either switched mode of operation or linear voltage regulations. Separate power rail structure is required for each power domain. These additional power rails introduce different levels of IR drop putting limit to the achievable power efficiency.


References

[1] Michael Keating, David Flynn, Robert Aitken, Alan Gibsons and Kaijian Shi, “Low Power Methodology Manual for System on Chip Design”, Springer Publications, NewYork, 2007, www.lpmm-book.org, 4/9/2007

Clock Gating

Clock tree consume more than 50 % of dynamic power. The components of this power are:

1) Power consumed by combinatorial logic whose values are changing on each clock edge
2) Power consumed by flip-flops and

3) The power consumed by the clock buffer tree in the design.

It is good design idea to turn off the clock when it is not needed. Automatic clock gating is supported by modern EDA tools. They identify the circuits where clock gating can be inserted.


RTL clock gating works by identifying groups of flip-flops which share a common enable control signal. Traditional methodologies use this enable term to control the select on a multiplexer connected to the D port of the flip-flop or to control the clock enable pin on a flip-flop with clock enable capabilities. RTL clock gating uses this enable signal to control a clock gating circuit which is connected to the clock ports of all of the flip-flops with the common enable term. Therefore, if a bank of flip-flops which share a common enable term have RTL clock gating implemented, the flip-flops will consume zero dynamic power as long as this enable signal is false.

There are two types of clock gating styles available. They are:

1) Latch-based clock gating
2) Latch-free clock gating.


Latch free clock gating

The latch-free clock gating style uses a simple AND or OR gate (depending on the edge on which flip-flops are triggered). Here if enable signal goes inactive in between the clock pulse or if it multiple times then gated clock output either can terminate prematurely or generate multiple clock pulses. This restriction makes the latch-free clock gating style inappropriate for our single-clock flip-flop based design.



2

Latch free clock gating


Latch based clock gating

The latch-based clock gating style adds a level-sensitive latch to the design to hold the enable signal from the active edge of the clock until the inactive edge of the clock. Since the latch captures the state of the enable signal and holds it until the complete clock pulse has been generated, the enable signal need only be stable around the rising edge of the clock, just as in the traditional ungated design style.



Latch based clock gating

2


Specific clock gating cells are required in library to be utilized by the synthesis tools. Availability of clock gating cells and automatic insertion by the EDA tools makes it simpler method of low power technique. Advantage of this method is that clock gating does not require modifications to RTL description.


References

[1] Frank Emnett and Mark Biegel, “Power Reduction Through RTL Clock Gating”, SNUG, San Jose, 2000

[2] PrimeTime User Guide

15 April 2008

Low Power Design Techniques

Michael Keating et al. [1] lists several low power techniques to tackle the dynamic and static power consumption in modern SoC designs. Dynamic power control techniques include clock gating, multi voltage, variable frequency, and efficient circuits. Leakage power control techniques include power gating, multi Vt cells. Common methods supported by EDA tools include clock gating, gate sizing, low power placement, register clustering, low power CTS, multi Vt optimization.

Some of the low power techniques in use today are listed in below table.



Different Low Power Techniques [3]




2

Trade-offs associated with the various power management techniques [2]


Above table summarizes trade-offs associated with different power management techniques. Power gating and DVFS demand large methodology change whereas multi vt and clock gating affect least. Unless large leakage optimization is not necessary it is always beneficial to go with either multi vt or clock gating techniques. Based on the design complexity and requirements combination of any low power techniques can be adopted. Multi vt optimization along with the power gating is found to be efficient in some of the complex designs. Advanced improvements in the implementation (i.e. fabrication) technology has allowed substrate biasing techniques to be used heavily as it does not pose any architectural and design verification challenges and also provides high leakage reduction.

References

[1] Michael Keating, David Flynn, Robert Aitken, Alan Gibsons and Kaijian Shi, “Low Power Methodology Manual for System on Chip Design”, Springer Publications, NewYork, 2007, www.lpmm-book.org, 4/9/2007

[2] Creating Low-Power Digital Integrated Circuits – The Implementation Phase, Cadence, 2007

[3]Godwin Maben, "Low Power Techniques in Use Today"

Reverse Biased Diode Current (Junction Leakage)-Gate Induced Drain Leakage (GIDL)- Gate Oxide Tunneling

2

Reverse Biased Diode Current (Junction Leakage)

Parasitic diodes formed between the diffusion region of the transistor and substrate consumes power in the form of reverse bias current which is drawn from the power supply. Junction leakage results from minority carrier diffusion and drift near the edge of depletion regions, and also from generation of electron hole pairs in the depletion regions of reverse-bias junctions. When both n regions and p regions are heavily doped, as is the case for some advanced MOSFETs, there will also be junction leakage due to band-to-band tunneling (BTBT), i.e., electron tunneling from valence band of the p-side to the conduction band of the n-side.


In inverter when input is high NMOS transistor is ON and output voltage is discharged to zero. Now between drain and the n-well a reverse potential difference of Vdd is established which causes diode leakage through the drain junction. The n-well region of the PMOS transistor w.r.to p-type substrate is also reverse biased. This also leads to leakage current at the N-well junction.


The reverse current can be mathematically expressed [2] as,

Ireverse=A.Js.(e(q.Vbias/kT)-1)

where,
Vbias-->reverse bias voltage across the junction
Js-->reverse satuartion current density
A-->junction area



Gate Induced Drain Leakage (GIDL)

Gate-induced drain leakage (GIDL) is caused by high field effect in the drain junction of MOS transistors. In an NMOS transistor, when the gate is biased to form accumulation layer in the silicon surface under the gate, the silicon surface has almost the same potential as the p-type substrate, and the surface acts like a p region more heavily doped than the substrate. When the gate is at zero or negative voltage and the drain are at the supply voltage level, there can be a dramatic increase of effects like avalanche multiplication and band-to-band tunneling. Minority carriers underneath the gate are swept to the substrate, completing the GIDL path. Higher supply voltage and thinner oxide increase GIDL.





2

Response of GIDL with varying drain to bulk and gate voltage [1]


Pedram [1] has studied GIDL and has plotted response curve for GIDL with varying drain to bulk and drain to gate voltages as shown in the above figure. From the plot it can be clearly observed that GIDL increases with the increase in Vdb and Vdg.


Gate Oxide Tunneling

When there is a high electric field across a thin gate oxide layer gate oxide tunneling of electrons can result in leakage. Electrons may tunnel into the conduction band of the oxide layer; this is called Fowler-Nordheim tunneling. There can also be direct tunneling through the silicon oxide layer if it is less than 3–4 nm thick. Mechanisms for direct tunneling include electron tunneling in the conduction band (ECB), electron tunneling in the valence band (EVB), and hole tunneling in the valence band (HVB). The dominant source of leakage here is direct tunneling of electrons through gate oxide. This current depends exponentially on the oxide thickness and the VDD [3]



2

Gate current components flowing between NMOS terminals [3]


References

[1] Massoud Pedram, Leakage Power Modeling and Minimization”, University of Southern California, Dept. of EE-Systems, Los Angeles, CA 90089, ICCAD 2004 Tutorial, www.ceng.usc.edu, 10/10/2007

[2] Jan M Rabaey, Anantha Chandrakasan and Borivoje Nikolic, "Digital Integrated Circuits A Design Perspective", 2nd Edition, 2005, Prentice Hall

[3] BSIM4.2.1 MOSFET Model, Department of Electrical Engineering and Computer Sciences University of California, Berkeley, 2001


Sub Threshold Current

2

The sub threshold current always flows from source to drain even if the gate to source voltage is lesser than the threshold voltage of the device. This happens due to the carrier diffusion between the source and drain regions of the CMOS transistor in weak inversion. When gate to source voltage is smaller than but very close to threshold voltage of the device then sub threshold current becomes significant.


As observed by [4] currently, sub threshold leakage is still playing the main part in the three mechanisms. However, researchers believe that gate leakage and reverse-biased junction Band To Band Tunneling (BTBT) will be as important as sub threshold from 45 nm process downwards. In addition, with technology scaling, the gate oxide thickness will be reduced and the substrate doping densities will be increased. As a result other factors such as gate-induced drain leakage (GIDL) and drain-induced barrier lowering (DIBL) will also become more and more evident. Therefore, future effective low leakage design will need to target at several components since all of them play an important role in the total leakage consumption. Various techniques at process and circuit level exist to reduce leakage consumption, including modifying doping profile, oxide thickness and channel length. Forward or inverse body biasing is also one of them, which is a technique resulting in variable threshold CMOS.


Sub threshold current Isub, which occurs when gate voltage is below threshold voltage Vth, is a main part of leakage current [2]. Isub depends on different effects and voltages, which are formulated in following equations [1]:





2

Where

q is the electrical charge.

T is the temperature,

n is the sub threshold swing coefficient,

kB is the Boltzmann constant,

η is the drain induced barrier lowering (DIBL) coefficient,

γ is the body effect coefficient,

μ is the mobility,

Vth0 is the zero-bias threshold voltage,

Vgs is the gate-source voltage,

Vbs is the bulk-source voltage,

Vds is the drain-source voltage,

εox and εSi are the gate dielectric constants of gate oxide and silicium,

NSUB is the uniform substrate doping concentration and

NDEP the channel doping concentration,

Tox is the thickness of the oxide layer,

ФS is the surface potential,

DSUB and ETA0 are technology dependent DIBL coefficients, and

ETAB is a body-bias coefficient of the BSIM4-Modell.


The delay Td of a CMOS device can be approximated by equation (5).

Where

k’ is a technology constant,

CL is the load, and

α models the short channel effects [3].


Variation of Vth is a common technique to reduce leakage because Isub exponentially scales with Vth (see Equation 1). Thus, higher Vth results in lower leakage. However, from equation (5) follows higher Vth additionally results in longer delay [2]. Hence, optimize the design with the balance application of low Vth (LVT) and high Vth devices (HVT).


Transfer characteristics of MOSFET for VGS near Vth are shown in below figure.




Transfer characteristics of MOSFET VGS near Vth [2]

2

From the above figure it can be observed that ID increases exponentially with reduction in Vth.



2

As noted by [4] key dependencies of the sub threshold slope can be summarized as follows:

- Tox ↓ =>Cox ↑=> n ↓ =>sharper sub threshold

- NA ↑ =>Csth ↑ =>n ↑ =>softer sub threshold

- VSB ↑ =>Csth ↓ =>n ↓ =>sharper sub threshold

- T ↑ =>softer sub threshold


How to minimize sub threshold leakage?

A increase in the threshold voltage of the device keeps the Vgs of the NMOS transistor safely below the Vt,n. This is the case for logic zero input. For the logic one input increase in the threshold voltage of the device keeps the |Vgs| of the PMOS transistor safely below the |Vt,p|.


References

[1] Anantha P. Chandrakasan, Samuel Sheng and Robert W.Broadersen, “Low Power CMOS Digital Design”, IEEE Journal of Solid State Circuits, vol. 27, no. 4, pp. 472-484, April 1992

[2] Massoud Pedram, Leakage Power Modeling and Minimization”, University of Southern California, Dept. of EE-Systems, Los Angeles, CA 90089, ICCAD 2004 Tutorial, www.ceng.usc.edu, 10/10/2007

[3] Frank Sill, Frank Grassert and Dirk Timmermann, “Reducing Leakage with Mixed-Vth (MVT), 18th International Conference on VLSI Design, IEEE, pp.874-877, January 2005

[4] Wei Liu ,Techniques for Leakage Power Reduction in Nanoscale Circuits: A Survey1, Department of Informatics and Mathematical Modeling ,Technical University of Denmark , IMM Technical Report 2007

Short Circuit Power

2

When dynamic power is analyzed the switching component of power consumption, an instantaneous rise time was assumed, which insures that only one of the transistors is ON. In practice, finite rise and fall times results in a direct current path between the supply and ground, GND, this exists for a short period of time during switching.



2

Short circuit power [3]

2Consider an example of inverter. During switching both NMOS and PMOS transistors in the circuit conduct simultaneously for a short amount of time. Specifically, when the condition, VTn (lesser than) Vin (lesser than) Vdd - |VTp| holds for the input voltage, where VTn and VTp are NMOS and PMOS thresholds, there will be a conductive path open between Vdd and GND because both the NMOS and PMOS devices will be simultaneously on. This forms direct current path between the power supply and the ground. This current has no contribution towards charging of the output capacitance of the logic gate.

2

2


When the input rising voltage exceeds the threshold voltage of NMOS transistor, it starts conducting. Similarly until input voltage reaches Vdd-|Vt,p| PMOS transistor remains ON. Thus for some time both transistors are ON. Similar event causes short circuit current to flow when signal is falling. Short circuit current terminates when transition is completed.


Assuming symmetric inverter with Kn=Kp=K and Vt,n=|Vt,p|=Vt and very small capacitive load and both rise and fall times are same we can write,

Pavg(short circuit) = 1/12.k.τ.Fclk.(Vdd-2Vt)3 [1]


Thus short circuit power is directly proportional to rise time, fall time and k. Therefore reducing the input transition times will decrease the short circuit current component. But propagation delay requirements have to be considered while doing so.


Short circuit currents are significant when the rise/fall time at the input of a gate is much larger than the output rise/ fall time. This is because the short-circuit path will be active for a longer period of time. To minimize the total average short-circuits current, it is desirable to have equal input and output edge times [2]. In this case, the power consumed by the short-circuit currents is typically less than 10% of the total dynamic power. An important point to note is that if the supply is lowered to be below 2the sum of the thresholds of the transistors, Vdd (lesser than) VTn + |VTp|, the short-circuit currents can be eliminated because both devices will not be on at the same time for any value of input voltage.


References

[1] Sung Mo Kang and Yusuf Leblebici, “CMOS Digital Integrated Circuits-Analysis and Design”, Tata McGraw Hill, Third Edition, New Delhi, 2003

[2] Anantha P. Chandrakasan, Samuel Sheng and Robert W.Broadersen, “Low Power CMOS Digital Design”, IEEE Journal of Solid State Circuits, vol. 27, no. 4, pp. 472-484, April 1992

[3] Michael Keating, David Flynn, Robert Aitken, Alan Gibsons and Kaijian Shi, “Low Power Methodology Manual for System on Chip Design”, Springer Publications, NewYork, 2007, www.lpmm-book.org, 4/9/2007