What is the difference between soft macro and hard macro?

  • What is the difference between hard macro, firm macro and soft macro?

or

  • What are IPs?


  • Hard macro, firm macro and soft macro are all known as IP (Intellectual Property). They are optimized for power, area and performance, and can be purchased and used in your ASIC or FPGA design implementation flow. A soft macro is flexible enough for any type of ASIC implementation, whereas a hard macro can be used only in a pure ASIC design flow, not in an FPGA flow. Before buying any IP it is very important to evaluate its advantages and disadvantages over the alternatives, its hardware compatibility (such as I/O standards) with your design blocks, and its reusability in other designs.

Soft macros

  • Soft macros are in synthesizable RTL.
  • Soft macros are more flexible than firm or hard macros.
  • Soft macros are not specific to any manufacturing process.
  • Soft macros have the disadvantage of being somewhat unpredictable in terms of performance, timing, area, or power.
  • Soft macros carry greater IP protection risks because RTL source code is more portable and therefore less easily protected than either a netlist or physical layout data.
  • From the physical design perspective, a soft macro is any cell that has been placed and routed in a place-and-route tool such as Astro. (This is the definition given in the Astro Rail user manual!)
  • Soft macros are editable and can contain standard cells, hard macros, or other soft macros.


Firm macros

  • Firm macros are in netlist format.
  • Firm macros are optimized for performance/area/power using a specific fabrication technology.
  • Firm macros are more flexible and portable than hard macros.
  • Firm macros are more predictable in performance and area than soft macros.


Hard macro

  • Hard macros are generally in the form of hardware IPs (hence they are sometimes simply termed "hardware IPs"!).
  • Hard macros are targeted at a specific IC manufacturing technology.
  • Hard macros are block-level designs which are silicon tested and proven.
  • Hard macros have been optimized for power, area or timing.
  • In physical design you can only access the pins of hard macros, unlike soft macros, which can be manipulated in different ways.
  • You have the freedom to move, rotate and flip a hard macro, but you can't touch anything inside it.
  • A very common example of a hard macro is a memory. In general it can be any design that carries a single, dedicated functionality; for example, an MP4 decoder.
  • Be aware of the features and characteristics of a hard macro before you use it in your design: besides power, timing and area you should also know pin properties such as sync pins, I/O standards, etc.
  • The LEF and GDSII file formats allow easy usage of macros in different tools.


From the physical design (backend) perspective:

  • A hard macro is a block that is generated with a methodology other than place and route (i.e. using a full-custom design methodology) and is brought into the physical design database (e.g. Milkyway in Synopsys, Volcano in Magma) as a GDSII file.


Synthesis and placement of macros in modern SoC designs are challenging. EDA tools employ different algorithms to accomplish this task while targeting power and area goals. There are several research papers on these subjects; some of them are listed below.


IEEE/University research papers

  • "Local Search for Final Placement in VLSI Design"
  • "Consistent Placement of Macro-Blocks Using Floorplanning and Standard Cell Placement"
  • "A Timing-Driven Soft-Macro Placement and Resynthesis Method in Interaction with Chip Floorplanning"

FIFO Pointers

Let us continue the discussion on FIFO....


Counters as FIFO Pointers

Two types of counters are used as FIFO pointers: binary counters and Gray counters. Each has merits and demerits. The synchronization advantages and pitfalls between the read and write clock domains are the decisive factor in choosing the right counter design for the pointers.

Asynchronous FIFO Pointers Using Binary Counters

A binary counter is a natural counter and hence easy to design and implement, and it works very well for addressing the FIFO. In our case we have a total of 16 memory locations in the FIFO. Hence, to address these 16 = 2^4 locations we need a 4-bit counter (actually we need to design a 5-bit counter; why, we will see later!).

Binary code patterns, however, are not unit-distance: the number of bits changing from one count to the next can be more than one. For example, two bits change from 0001 to 0010, and in the worst case all 4 bits can change simultaneously, as from 0111 to 1000. These changing bits (which are nothing but the pointers) have to be synchronized with the other clock domain to generate the empty and full conditions. If the synchronizing clock edge becomes active during the transition of the binary bits, say from 0111 to 1000, then metastability can occur on any of the four bits, or on all of them. This metastable state can resolve to any 4-bit count value, prediction of which is almost impossible, so the pointer value synchronized into the other clock domain may be entirely different from the intended one. This is the biggest drawback of using binary counters as FIFO pointers.

One way to counter this problem is to use a holding register for synchronization, with handshaking signals to communicate between the synchronizer and the clock domain. The binary count value from the originating clock domain is sampled and held in the holding register, and a "ready" signal is passed to the other clock domain. Upon receipt of the "ready" signal the other clock domain receives the count value and sends back a synchronized acknowledgement, after which the originating clock domain samples another count value.
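To make the multi-bit transition hazard concrete, here is a minimal Python sketch (illustrative only; the function name is my own) counting how many bits flip between two consecutive binary counts:

```python
def bits_changed(a, b, width=4):
    # Number of bit positions that differ between two counter values.
    return bin((a ^ b) & ((1 << width) - 1)).count("1")

print(bits_changed(0b0001, 0b0010))  # 2 bits flip
print(bits_changed(0b0111, 0b1000))  # 4 bits flip - the worst case
```

Any of those simultaneously changing bits can be caught mid-transition by the other clock domain's sampling edge.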

Asynchronous FIFO Pointers Using Gray Counters

Gray numbers are unit-distance numbers; that is to say, unlike binary numbers, only one bit changes from one count to the next. Gray pointers too have the problem of metastability while synchronizing with other clock domains, but it is minimized by the fact that only one bit changes. A metastability condition on one bit causes a +/-1 count error, which is far better than the +/-8 count error possible with binary pointers. Because of this minimized error, Gray counters are generally used as FIFO pointers.

Generation of Empty and Full Pointers

Careful observation of the 4-bit Gray counter values reveals that the first half of the sequence (counts 0 to 7) is a mirror image of the second half (counts 8 to 15) except for the MSB. Recall the wrap-around of the write pointer after completely writing the FIFO: since the read pointer is still at the first location, after the write pointer wraps around the two pointers become equal, asserting a false empty condition. To avoid this, one extra bit is added to both pointers; thus, instead of a 4-bit counter, a 5-bit counter is used. Of these 5 bits only 4 are used to address the 16 memory locations, while the MSB is used to detect wrap-around conditions during pointer comparison. When the write pointer increments past the final FIFO address, the unused MSB toggles and the remaining bits reset to zero; the read pointer is given the same treatment. Thus if the MSBs of both pointers are the same, both pointers have wrapped the same number of times; otherwise the write pointer has wrapped one more time than the read pointer. With this technique, if both pointers are equal including the MSB, it is a FIFO empty condition; if the MSBs differ and all remaining bits are equal, it is a FIFO full condition.

Consider the case wherein all FIFO locations except the last one are written and then the same number of locations is read. Now both pointers point to the 15th location, i.e. the last location of the FIFO. On the next write operation the 5-bit Gray counter is incremented and the count becomes 1_1000 (=16); remember that the earlier count was 0_1000 (=15). Hence when the counter is incremented, only the MSB (the extra bit) changed and the remaining address pointer bits stayed the same. The write pointer and read pointer are thus pointing to the same address location, because the MSB is not used for addressing the FIFO, only for testing the full and empty conditions. Since the write pointer's extra bit is set and the read pointer's extra bit is clear, this condition would be taken as FIFO full, but in reality it is NOT a FIFO full condition at all. In addition to this problem, data would be overwritten at location 1_1000. Hence the necessary condition to generate the full flag is that both the MSB (extra bit) and the next-to-MSB bit of the write pointer and the synchronized read pointer must differ, while all remaining bits must be equal.
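The empty and full rules above can be sketched in a few lines of Python (an illustrative model only; function names are my own, and the full rule is the "top two bits differ, rest equal" condition just derived):

```python
WIDTH = 5  # 4 address bits plus one extra wrap bit

def bin_to_gray(b):
    # Standard binary-to-Gray conversion.
    return b ^ (b >> 1)

def empty(wptr_gray, rptr_gray):
    # Empty: pointers identical, including the extra MSB.
    return wptr_gray == rptr_gray

def full(wptr_gray, rptr_gray):
    # Full: MSB and next-to-MSB differ, all remaining bits equal.
    return (wptr_gray ^ rptr_gray) == (0b11 << (WIDTH - 2))

# The false-full scenario from the text: 16 writes, 15 reads.
print(full(bin_to_gray(16), bin_to_gray(15)))  # False - only the MSB differs

# A genuinely full FIFO: 16 writes, no reads.
print(full(bin_to_gray(16), bin_to_gray(0)))   # True
```

Note how the refined rule correctly rejects the 1_1000 vs 0_1000 case, which a naive "MSBs differ, rest equal" comparison would have flagged as full.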

Dual n-Bit Gray Code Counter - First Architecture

This architecture solves the problem identified in the previous section. A dual n-bit Gray code counter generates both an (n-1)-bit Gray code (used to address the memory locations) and the n-bit Gray code sequence, the nth bit being used in pointer comparison to detect the empty and full conditions. The block diagram of the dual n-bit Gray counter is shown in Figure (4a).

Instead of a customized Gray counter solution, a general method of Gray code generation is presented here. Gray and binary codes are related by the equations:

  g_n = b_n;  g_i = b_i XOR b_(i+1), for all i < n    ...eq(1)

  b_n = g_n;  b_i = g_i XOR b_(i+1), for all i < n    ...eq(2)

where g refers to the Gray code bits and b refers to the binary code bits.

The same equations are implemented in the block diagram of Figure (4a): convert the Gray code value to binary using eq(2), add one to this binary value (i.e. increment the count by one), and convert it back to Gray using eq(1).
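As a software sketch of this Gray-binary-increment-Gray path (an illustration of eq(1) and eq(2), not the hardware implementation), the conversions can be written as:

```python
def bin_to_gray(b):
    # eq(1): g_n = b_n, g_i = b_i XOR b_(i+1)
    return b ^ (b >> 1)

def gray_to_bin(g):
    # eq(2): b_n = g_n, b_i = g_i XOR b_(i+1), unrolled as a prefix XOR
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

def gray_increment(g, width=5):
    # Gray -> binary -> add one -> back to Gray, as in the block diagram.
    mask = (1 << width) - 1
    return bin_to_gray((gray_to_bin(g) + 1) & mask)

print(bin(gray_increment(0b01000)))  # 0b11000: count 15 -> count 16
```

In hardware the same structure appears as an XOR chain on each side of a binary incrementer.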

To avoid underflow or overflow the design should have a precautionary circuit such that when the full or empty flag is asserted, the corresponding counter is not incremented any further. This is accomplished with OR and AND gates using "not empty" and "not full" inputs. When the full flag is asserted there should be no write operation, to avoid overflow; hence a circuit is needed that disables the write_enable signal, which a NAND or AND gate can provide. In the same way we can add status flags for "overflow" and "underflow" error indication, which can only be cleared by a reset signal.




Figure (4a and 4b): Dual n-Bit Gray Counter, Architectures 1 and 2 [2]

Dual n-Bit Gray Code Counter - Second Architecture

In this architecture, shown in Figure (4b), both binary and Gray counters are used: the binary code addresses the FIFO memory locations, and the Gray code is used for synchronization with the opposite clock domain. Two registers hold the binary and Gray count values. The ease of pointer arithmetic and comparison in binary is the key factor in choosing binary pointers for FIFO memory addressing. Binary code, however, is not suitable for synchronization into the opposite clock domain, as it has multiple bit changes from one count to the next; for that purpose the Gray code is used, as it is less prone to synchronization errors.

Almost Full and Almost Empty

When the write pointer gets very close to the full condition, an almost-full flag can be generated; when the read pointer approaches the empty condition (i.e. the read pointer is about to catch the write pointer), an almost-empty flag can be generated. These conditions can be generated for a programmed difference between the write and read pointers: when the programmed difference value is reached, the corresponding flag becomes active. The almost-full and almost-empty flags help the pointer comparison logic get "ready" for full and empty condition detection. For a fixed difference value the design of almost-full and almost-empty is easy: a combination of NOT, AND or OR gates can do the job.
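A behavioral sketch of the programmed-threshold idea, using plain binary pointer values for simplicity (the names, the threshold default, and the modular-occupancy formulation are my own illustration, not the hardware gate implementation described above):

```python
DEPTH = 16           # FIFO locations
PTR_MOD = 2 * DEPTH  # pointers carry one extra wrap bit

def occupancy(wptr, rptr):
    # Number of words currently held, from binary pointer values.
    return (wptr - rptr) % PTR_MOD

def almost_full(wptr, rptr, threshold=2):
    # Asserted once the programmed margin to full is reached.
    return occupancy(wptr, rptr) >= DEPTH - threshold

def almost_empty(wptr, rptr, threshold=2):
    # Asserted once occupancy drops to the programmed margin.
    return occupancy(wptr, rptr) <= threshold
```

The modular subtraction is why binary pointers make these comparisons easy, as noted in the binary-vs-Gray trade-offs below.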

Going to Full and Going to Empty

In this architecture the FIFO memory is logically divided into four quadrants, decoded from the two MSBs of the two pointers. The possible MSB combinations are 00, 01, 10 and 11. If the write pointer is one quadrant behind the read pointer, the FIFO is 'going to full'; in the reverse situation the FIFO is 'going to empty'.
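The quadrant decode can be sketched as follows (an illustration using plain binary pointer values and binary quadrant ordering; a real design would decode the two MSBs of the synchronized Gray pointers):

```python
def quadrant(ptr, width=4):
    # The two MSBs of a pointer select one of four quadrants: 00, 01, 10, 11.
    return (ptr >> (width - 2)) & 0b11

def going_to_full(wptr, rptr, width=4):
    # Write pointer one quadrant behind the read pointer.
    return quadrant(wptr, width) == (quadrant(rptr, width) - 1) % 4

def going_to_empty(wptr, rptr, width=4):
    # The reverse: read pointer one quadrant behind the write pointer.
    return quadrant(rptr, width) == (quadrant(wptr, width) - 1) % 4
```

These coarse "direction" flags are typically used to set or clear a status bit that disambiguates pointer equality into full versus empty.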

Pessimistic Full and Empty

Delay in synchronizing the pointers may cause the full and empty conditions to be reported wrongly: the full flag may become active even though the FIFO is not full, or the empty flag may become active when the FIFO is not yet empty. Such conditions are called 'pessimistic reporting'. We know the full flag is generated when the write pointer catches up with the synchronized read pointer. When the read pointer increments, one datum has been read from the FIFO, but detection of this by the write-pointer logic takes a minimum of two clock cycles due to the presence of the synchronizers. Hence, even if a memory location is free and could be written, the write-pointer logic does not allow writing to the FIFO until the two-cycle delay has elapsed and it detects the new read-pointer status. The same is true of the read-pointer logic. This pessimistic reporting does not harm the FIFO data.

Binary Counter Vs Gray Counter

Here is the trade-off between binary counters and Gray counters as pointers:

  • Binary pointers pose multibit synchronization problems. Gray counters minimize this problem.
  • A Gray counter designed for any modulus other than 2^n (n being the number of bits) does not remain a Gray code. Hence we must design a mod-2^n Gray counter, which implies that the number of FIFO memory locations must also be 2^n. Binary counters, however, can be designed for any modulus, so the number of FIFO memory locations can be arbitrary.
  • Since binary arithmetic is natural, it is easy to calculate and implement almost-empty and almost-full with binary numbers.
  • The sampling technique using a holding register and handshaking control has the advantage of being able to pass arbitrary multibit values or pointers. With Gray counters an arbitrary value is not possible; they can only increment or decrement.
  • Since a Gray counter has to be designed for mod 2^n, the (maximum) FIFO depth must also be a power of 2, whereas with binary any depth is permitted.
  • Use of binary pointers introduces a latency of at least 2 clock cycles in synchronization.

FIFO Depth

The size of the FIFO basically refers to the amount of data it can hold at a given time. In an asynchronous FIFO this depends on both read and write clock frequencies and on the number of data items written and read (the data rates). The data rate can vary depending on the operation and requirements of the two clock domains (and, of course, the frequencies!). The worst case is the maximum data-rate difference between write and read: this happens when the write data rate is at its maximum and the read data rate is at its minimum.

Let

  fwrite --> frequency of the write clock domain
  fread --> frequency of the read clock domain
  Bmax --> burst of data written, i.e. the maximum number of data bytes that can be written
  Bwrite --> number of bytes written per clock cycle
  Bread --> number of bytes read per clock cycle

Then the FIFO size is given by

  Fsize = Bmax - [Bmax * fread * Bread / (fwrite * Bwrite)]

If the number of bytes read or written per clock cycle is one, this reduces to

  Fsize = Bmax - [Bmax * fread / fwrite]

Taking an example, let fwrite = 10 MHz and fread = 2.5 MHz.

If Bmax = 2, then Fsize = 2 - [(2 * 2.5)/10] = 2 - 0.5 = 1.5 ~ 2.

If Bmax = 5, then Fsize = 5 - [(5 * 2.5)/10] = 5 - 1.25 = 3.75 ~ 4.
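The sizing formula above is easy to script; here is a small Python helper (the function name and the round-up choice, matching the hand calculation above, are my own):

```python
import math

def fifo_depth(b_max, f_write, f_read, b_write=1, b_read=1):
    # Fsize = Bmax - [Bmax * fread * Bread / (fwrite * Bwrite)], rounded up
    size = b_max - (b_max * f_read * b_read) / (f_write * b_write)
    return math.ceil(size)

print(fifo_depth(2, 10e6, 2.5e6))  # 2  (2 - 0.5 = 1.5, rounded up)
print(fifo_depth(5, 10e6, 2.5e6))  # 4  (5 - 1.25 = 3.75, rounded up)
```

Rounding up is the safe choice, since a fractional location cannot be built and under-sizing risks overflow during the worst-case burst.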


References

[1] Clifford E. Cummings, "Simulation and Synthesis Techniques for Asynchronous FIFO Design", http://www.sunburst-design.com/papers/CummingsSNUG2002SJ_FIFO1.pdf (14/11/07)

[2] Clifford E. Cummings, "Synthesis and Scripting Techniques for Designing Multi-Asynchronous Clock Designs", http://www.sunburst-design.com/papers/ (14/11/07)

[3] Vijay A. Nebhrajani, "Asynchronous FIFO Architectures, Parts 1 & 2", http://www.geocities.com/deepakgeorge2000/vlsi_book/async_fifo2.pdf (14/11/07)

[4] Mike Stein, "Crossing the abyss: asynchronous signals in a synchronous world", http://www.edn.com/contents/images/310388.pdf (14/11/07)


Related Articles



What is the difference between FPGA and CPLD?

FPGA (Field Programmable Gate Array) and CPLD (Complex Programmable Logic Device) are both programmable logic devices, made by the same companies but with different characteristics.

  • "A Complex Programmable Logic Device (CPLD) is a Programmable Logic Device with complexity between that of PALs (Programmable Array Logic) and FPGAs, and architectural features of both. The building block of a CPLD is the macro cell, which contains logic implementing disjunctive normal form expressions and more specialized logic operations".

  • That is how Wikipedia defines it.



Architecture

  • Granularity is the biggest difference between CPLD and FPGA.

  • FPGAs are "fine-grain" devices: they contain hundreds (up to 100,000) of tiny blocks of logic (called LUTs, CLBs, etc.) with flip-flops, combinational logic and memories. FPGAs offer much higher complexity, up to 150,000 flip-flops and a large number of available gates.
  • CPLDs typically have the equivalent of thousands of logic gates, allowing implementation of moderately complicated data-processing devices. PALs typically have a few hundred gate equivalents at most, while FPGAs typically range from tens of thousands to several million.
  • CPLDs are "coarse-grain" devices: they contain relatively few (a few hundred at most) large blocks of logic with flip-flops and combinational logic, based on an AND-OR structure.
  • CPLDs have registers with associated logic (an AND/OR matrix). CPLDs are mostly used in control applications and FPGAs in datapath applications. Because of this coarse-grained architecture, timing in CPLDs is very predictable.

  • FPGAs are RAM-based: they need to be "downloaded" (configured) at each power-up. CPLDs are EEPROM-based: they are active at power-up, as long as they have been programmed at least once.
  • An FPGA needs a boot ROM but a CPLD does not. In some systems you might not have enough time to boot up the FPGA, in which case you need a CPLD plus the FPGA.
  • Generally, CPLD devices are non-volatile, because in all cases they contain flash or erasable ROM. FPGAs are volatile in many cases and hence need a configuration memory to work, though some FPGAs are now non-volatile. This distinction is rapidly becoming less relevant, as several of the latest FPGA products also offer models with embedded configuration memory.
  • The characteristic of non-volatility makes the CPLD the device of choice in modern digital designs to perform 'boot loader' functions before handing over control to other devices not having this capability. A good example is where a CPLD is used to load configuration data for an FPGA from non-volatile memory.

  • Because of the coarse-grain architecture, one block of logic can hold a big equation, so CPLDs have faster input-to-output timing than FPGAs.



Features

  • FPGAs have special routing resources to implement binary counters, arithmetic functions like adders, comparators, and RAM. CPLDs don't have special features like this.
  • FPGAs can contain very large digital designs, while CPLDs can contain only small designs, owing to their limited complexity.

  • Speed: CPLDs offer a single-chip solution with fast pin-to-pin delays, even for wide input functions. Use CPLDs for small designs, where "instant-on", fast and wide decoding, ultra-low idle power consumption, and design security are important (e.g., in battery-operated equipment).

  • Security: once programmed, a CPLD design can be locked and thus made secure. Since an FPGA's configuration bitstream must be reloaded every time power is re-applied, design security is an issue for FPGAs.

  • Power: the high static (idle) power consumption of many CPLD families prohibits their use in battery-operated equipment. FPGA idle power consumption is reasonably low, although it is sharply increasing in the newest families.

  • Design flexibility: FPGAs offer more logic flexibility and more sophisticated system features than CPLDs: clock management, on-chip RAM, DSP functions (multipliers), and even on-chip microprocessors and multi-gigabit transceivers. These benefits, plus the opportunity of dynamic reconfiguration even in the end-user system, are an important advantage.

  • Use FPGAs for larger and more complex designs.


  • FPGAs suit timing circuits because they have more registers, while CPLDs suit control circuits because they have more combinational logic. At the same time, if you synthesize the same code for an FPGA many times, you will find that each timing report is different; with CPLD synthesis it is different, and you get the same result each time.


As CPLDs and FPGAs become more advanced the differences between the two device types will continue to blur. While this trend may appear to make the two types more difficult to keep apart, the architectural advantage of CPLDs combining low cost, non-volatile configuration, and macro cells with predictable timing characteristics will likely be sufficient to maintain a product differentiation for the foreseeable future.

  • Finally, here is one downloadable PDF document: "Architecture of FPGAs and CPLDs: A Tutorial".

Hoping this information and these references help you... comments and further references are welcome!

Asynchronous FIFO Design

Asynchronous FIFOs are used as buffers between two asynchronous clock domains to exchange data safely. Data is written into the FIFO from one clock domain and read from another. This requires a memory architecture with two ports: one for the input (write, or push) operation and another for the output (read, or pop) operation. Generally FIFOs are used where the write operation is faster than the read operation; however, even with different speeds and access types, the average rate of data transfer remains constant. FIFO pointers keep track of the number of memory locations read and written, and the corresponding control logic prevents the FIFO from either underflowing or overflowing. FIFO architectures inherently face the challenge of synchronizing with the pointer logic of the other clock domain so that the read and write operations on the FIFO memory are controlled safely. A detailed and careful analysis of the synchronizer circuits together with the pointer logic is required to understand how the two pointer-logic circuits, each controlled by a different clock, can independently and safely access the FIFO's read and write ports.


Why Synchronization?


It is very important to understand signal stability in multi-clock domains, since to a travelling signal the new clock domain appears asynchronous. If the signal is not synchronized to the new clock, the first storage element of the new clock domain may go metastable, and in the worst case the resolution time cannot be predicted; the metastable value can traverse the new clock domain, resulting in functional failure. To prevent such failures the setup- and hold-time specifications have to be obeyed in the design. Manufacturers provide statistics on the probability of flip-flop failure due to metastability characteristics in terms of MTBF (Mean Time Between Failures). Synchronizers are used to prevent the downstream logic from entering a metastable state in multi-clock domains with multibit data values.


Issues in Designing Asynchronous FIFO


It has been mentioned that the design of the FIFO pointers is the key issue in designing an efficient FIFO architecture. Let us go deeper into the FIFO read and write pointers. On reset, both read and write pointers point to the starting location of the FIFO. This is the first location where data will be written, and at the same time it happens to be the first read location. In general, therefore, the read pointer always points to the word to be read, and the write pointer always points to the next location to be written.


Now let us examine the data write operation. When both read and write pointers point to the first location of the FIFO, the empty flag is asserted, indicating that the FIFO is empty. Now data can be written: it is written to the location the write pointer points to, and after the write operation the pointer increments to the next location to be written. At the same time the empty flag is deasserted, indicating that the FIFO is not empty and some data is available. One notable point regarding the read pointer: while the empty flag is active, the data pointed to by the read pointer is always invalid. When the first data word has been written and the empty status cleared (i.e. the empty flag is inactive), the read-pointer logic immediately drives the data from the location it was pointing to onto the read port of the dual-port RAM, ready to be read by the read logic. The biggest advantage of this implementation of the read logic is that only one clock cycle is required to read from the read port, since the previous cycle has already incremented the read pointer and driven the data to the read port; this helps reduce the latency in detecting the empty and full flag status. The empty flag can be asserted in one more situation: if, after some n write operations, the same number n of reads is performed, the two pointers are again equal. Hence whenever the pointers "catch up" with each other, the empty flag is asserted.

Now let us examine the FIFO full status. When the write pointer reaches the top of the FIFO it points to the last location that can be written; no read operation has yet been performed, so the read pointer still points to the first location. This is one situation from which a FIFO full condition can be generated. If the full flag is asserted when the write pointer merely reaches the top of the FIFO, it is not the actual full condition but only 'almost full', as there is still one location that can be written (similarly, an almost-empty condition can exist in a FIFO). A write operation now causes that last location to be written and the write pointer to increment; since it was the last location, the write pointer wraps around to the first location. Now both read and write pointers are equal, and hence the empty flag would be asserted instead of the full flag, which is a fatal mistake. Hence the wrap-around of the write pointer may indicate a FIFO full condition.

Now suppose that after writing the FIFO (with the write pointer at the top), some data has been read and the read pointer is somewhere in the middle of the FIFO. One more write operation causes the write pointer to wrap. Note that even though the write pointer now points to the first location of the FIFO, this is NOT a FIFO full condition, since the read pointer has moved up from the first location. Further writes push the write pointer up, and after some more read operations the read pointer wraps around as well. Now both pointers have wrapped, but the FIFO is neither full nor empty: data can still be written to it or read from it. This being the situation, how do we identify and generate the full and empty conditions? How do we synchronize and compare the two pointers to generate the full and empty status? And while synchronizing, how do we avoid possible metastable states and 'pessimistic reporting' (i.e. harmless wrong reports, discussed later)? These are some of the key issues in designing an asynchronous FIFO.


Coming articles will discuss these issues. An asynchronous FIFO designed using Verilog on a Spartan-3 will also be published; the interesting point of this design is that it works as both a synchronous and an asynchronous FIFO!





Related Articles


Advanced Tools in Reconfigurable Computing


HW/SW co-design languages and tools can be broadly classified into high-level languages and extensions of HDLs.


High level languages-
C-FPGA environment

Direct mapping of C code to the configuration level is possible. The software supports emulation and simulation of the compiled code for debugging, and is also capable of handling multiprocessor and multi-FPGA computational definitions. Generally this environment allows explicit data-flow control within the memory hierarchy. It can produce "unified executables" for hardware or software processor execution, with runtime libraries handling the required interfacing and management. Some of the important C-based mappers used for SoC designs are described below. [5]

  • SpecC: an extension of ANSI C. Both behavioral and structural hierarchical embedded-system designs are supported. The language was developed with synthesis and verification in mind, and is targeted as a system-level design language intended for specification and architectural modeling.

  • HardwareC: This modified C language uniformly incorporates both functionality and design constraints.

  • SystemC from the Open SystemC Initiative (OSCI): this C-based language is very popular in the system-level verification industry. It uses C++ class libraries and a simulation kernel for creating behavioral and RTL designs, and is an open-source extension of C++ for HW/SW modelling. It includes modules and ports for defining structure, plus interfaces and channels. The language supports functional modelling of a design and its hierarchical decomposition into modules, provides structural connectivity between modules using ports, and uses events for scheduling and synchronization of concurrent processes.

  • Catapult C from Mentor Graphics: this is an algorithmic synthesis tool for RTL generation. It can generate RTL from pure C++. It does not require any extensions, pragmas, etc. The compiler uses “wrappers” around algorithmic code. This gives capability to manage external I/O interface. Internally it can constrain synthesis to optimize for chosen interface. Architectural constraints and optimization are explicitly mentioned. The tool can generate output RTL netlists in VHDL, Verilog, and SystemC.
  • DIME-C from Nallatech: This is a FPGA prototyping tool. The designs are not cycle-accurate. But allows application synthesis for a higher clock speed. The compiler includes IEEE-754 FP cores. It has dedicated integer multipliers. The compiler supports pipeline or parallel optimization. Output produced is synthesizable VHDL and DIMEtalk components.
  • Handel C from Celoxica: The tool provides good environment for cycle-accurate application development. All operations occur in one deterministic clock cycle. This feature makes it cycle-accurate. On the flip side of it clock frequency is reduced to slowest operation. Explicitly defined parallelism is achieved in language by the help of pragmas. The compiler can efficiently analyze, optimize, and rewrite code. The output generated from the compiler is VHDL or Verilog, SystemC, or targeted EDIFs.
  • Impulse C from Impulse Accelerated Technologies: Sequential applications can be modelled using this language. It can process independent, potentially concurrent, computing blocks. Utilizing Streams-C methodology it can communicate and synchronize processes. It also focuses on compatibility with C development environments. Compilation is carried out by considering each process as separate state machine. Output can be generated in two ways-either generic or FPGA specific VHDL.
  • Mitrion C from Mitrion: The language uses the concept of a “soft-core” processor, which creates an abstraction layer between the C code and the FPGA. Compilation of C code is achieved by mapping it to a generic “API” of possible functions, and an application-specific processor is instantiated on the FPGA. The soft-core processor supports custom instruction bit-widths and specific cache and buffer sizes. The tool can produce output as a VHDL IP core for the selected target FPGA architectures.
  • Napa C from National Semiconductor: This language/compiler is intended for RISC/FPGA hybrid processors. It capitalizes on single-cycle interconnect instead of an I/O bus and uses a datapath synthesis technique: from C loops the compiler generates hardware pipelines. The language basically targets the National Semiconductor NAPA1000 hybrid processor, whose Fixed-Instruction Processor (FIP) and Adaptive Logic Processor (ALP) allow the programmer to specify whether a section of code is to be executed in software or hardware [5]. Outputs generated from the compiler are RTL VHDL, structural VHDL and structural Verilog.
  • SA-C from Colorado State University: This is a high-level, expression-oriented, machine-independent, single-assignment language, designed to implicitly express data-parallel operations such as image and signal processing. The compiler can perform loop optimizations, structural transforms and execution block placement.
  • Streams-C from Los Alamos National Laboratory: This is a stream-oriented sequential-process modelling language in which data elements are moved through discrete functional blocks. The compiler can generate multi-threaded processor executables and multiple FPGA bitstreams, and the language allows translation of a parallel C program into a parallel design. An attractive feature of this language is that it includes a functional-level simulation environment. The compiler produces synthesizable RTL output.
  • Java-based approaches - JHDL: JHDL (Java HDL) is capable of converting Java into synthesizable HDL code. The JBits Application Programming Interface (API) is implemented in the Java programming language and permits programmatic access to all of the configurable elements in Xilinx Virtex-II FPGAs. JBits 3.0 complements Xilinx's industry-leading ISE software tools and enables the design and generation of partial bitstreams for reconfigurable applications.
  • UML (Unified Modeling Language): This approach is used for system-level modeling. UML is extended so that it can provide high-level models working with other languages. The UML profile for SystemC is another such extension, which enables specifying, analyzing, designing, constructing and viewing software and hardware artifacts in a SoC design flow.

Extending HDL languages:
  • SystemVerilog: This language blends both Verilog and C. It is an extension to IEEE 1364-2001 Verilog. The language supports interfaces that allow module connections at a high level of abstraction.
  • SuperLog: This language is a Verilog superset that includes constructs from C; Verilog 2001 and SuperLog sit at two ends of the spectrum. For a very productive design process the language combines the power of C with the simplicity of Verilog.

MATLAB to FPGA design and IP core instantiation: Xilinx has a DSP system generation tool.
  • Khoros: This is a special data-flow tool for image processing.
  • Ptolemy: This is a graphical entry tool for system-level design.
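To make the family of tools above concrete, the following is a hedged sketch (not taken from any particular vendor's documentation) of the kind of loop-level C kernel these mappers typically accept: a 4-tap FIR-style multiply-accumulate. The fixed trip count and fixed-width types make the loop easy for such a compiler to unroll into parallel multipliers or to pipeline one tap per clock.

```c
#include <stdint.h>

#define TAPS 4

/* Illustrative C-mapper-friendly kernel: bounded loop, static bit-widths,
 * no pointers escaping, no dynamic memory. Real tools layer their own
 * pragmas and I/O wrappers on top of code shaped like this. */
int32_t fir_step(const int16_t coeff[TAPS], const int16_t window[TAPS])
{
    int32_t acc = 0;
    for (int i = 0; i < TAPS; i++)   /* candidate for unrolling/pipelining */
        acc += (int32_t)coeff[i] * window[i];
    return acc;
}
```

In software this is an ordinary function; a C-to-hardware flow would instead synthesize the loop body as a datapath.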


The spectrum of C-based application mappers is shown in Figure (1) and Figure (2).




Figure (1) C-based Application Mappers 1 [2]





Figure (2) C-based Application Mappers 2 [2]


Advantages of C-based application mappers

High-level languages have been popular for many years, and skilled engineers are readily available in this segment. This opens reconfigurable computing to potential users of high-level languages, and the required HDL knowledge is significantly reduced or eliminated. Time to preliminary results is much less than with a manual HDL design methodology. Software-to-hardware porting is considerably easier due to the partitioning of the design. For the engineering community, visualizing hardware described in C is far easier than in HDLs, and understanding any C-based mapper is straightforward.


Disadvantages of C-based application mappers

Mapper instructions are several times more powerful than CPU instructions, but FPGA clocks are many times slower. Due to their hardware nature, mappers parallelize and pipeline C code; however, they generally cannot automatically instantiate multiple functional units. Manual parallelization of existing code, using techniques pertinent to the algorithm's structure, is necessary to optimize C-mapper code. Performance may degrade as the price of reduced development time. Available tools are not capable of fully automatic translation; programs still require manual assistance for hardware compilation. Optimized software C is not as efficient as optimized hardware C.
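As a hedged illustration of the manual parallelization described above (the code and names are our own, not from any particular mapper): a long accumulation carries a loop-to-loop dependency, so a tool cannot replicate the adder; rewriting it as independent partial sums exposes functional units that hardware can run in parallel.

```c
#include <stdint.h>
#include <stddef.h>

/* Sequential form: each iteration depends on the previous sum, so a
 * C-mapper can pipeline it but cannot instantiate multiple adders. */
int32_t sum_seq(const int16_t *x, size_t n)
{
    int32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Manually parallelized form: four independent accumulators with no
 * cross-iteration dependency, which a mapper can bind to four parallel
 * adders. (Assumes n is a multiple of 4, for brevity.) */
int32_t sum_par4(const int16_t *x, size_t n)
{
    int32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (size_t i = 0; i < n; i += 4) {
        s0 += x[i];     s1 += x[i + 1];
        s2 += x[i + 2]; s3 += x[i + 3];
    }
    return s0 + s1 + s2 + s3;
}
```

Both functions compute the same result; the restructuring only changes what parallelism a hardware compiler can see.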

At least two major challenges remain to be addressed. Input/output interfaces become a limiting factor; once a generic I/O wrapper is generated, it should be reusable. True hardware debugging remains a challenge: since there is no visibility into the internal HDL signals, the higher level of abstraction increases debugging complexity. The C and C++ family of languages mainly deals with 32-bit precision, which is an overhead for most embedded applications. [5]
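The 32-bit-precision overhead mentioned above can be sidestepped at the C source level by using the fixed-width types from <stdint.h>, which let a bit-width-aware mapper infer narrow datapaths. A small sketch (the function and its names are illustrative, not from any tool's documentation):

```c
#include <stdint.h>

/* Averaging two 8-bit pixels. Declaring the operands as uint8_t (rather
 * than plain int) tells a bit-width-aware C mapper that an 8-bit datapath
 * plus one carry bit suffices, instead of synthesizing 32-bit adders. */
uint8_t avg_pixel(uint8_t a, uint8_t b)
{
    uint16_t sum = (uint16_t)a + (uint16_t)b;  /* 9 bits are enough */
    return (uint8_t)(sum >> 1);
}
```

In software the narrow types change little; in hardware they can cut the synthesized adder width by three quarters.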


Advanced Hardware

There are several hardware platforms available for reconfigurable computing applications. They provide advanced features which outperform standard processors. Some of them are mentioned below.

Xilinx

The Xilinx XC6200 FPGA-based reconfigurable co-processor provides an open-architecture FPGA (XC6200). It provides advanced dynamic reconfiguration capabilities such as high-speed reconfiguration via a parallel CPU interface, full or partial reconfiguration or context switching, and unlimited re-programmability.




Figure (3) Xilinx-based application acceleration module [9]


The Xilinx Virtex series of devices can have PowerPC processors embedded within, and they support the MicroBlaze soft-core processor. Reconfigurable computing took a major step with the Cray XD1 high-performance computer (see Figure (4)), which breaks down performance barriers at substantially lower cost by using off-the-shelf components from Xilinx to solve difficult computational problems. [9]

Altera

Altera has a coprocessor-based approach; example FPGA boards are ARC-PCI and Compaq Pamette. They have system-on-chip solutions such as the Excalibur device and ARC-PCI. Excalibur devices provide embedded processors like ARM, MIPS or NIOS.


Future of Reconfigurable Computing

Several advancements in reconfigurable computing hardware and software promise greater design flexibility and reduced cost and time to market in embedded system development. But the lack of efficient software tools, simpler design methodologies and standards is hindering the full development of SoC designs. The engineering community is still not very familiar with developments in the reconfigurable computing domain. High-level-language-based C mappers make software development easier, but ignorance of HDL and hardware design methodologies may result in inefficient designs, which can undermine the basic power of SoC designs. Advanced FPGAs like the Virtex-4 (from Xilinx) and Excalibur (from Altera) are at the forefront of providing reconfigurable solutions with third-party support.


References

[1] http://www.netrino.com/Articles/RCPrimer/index.php, 10/11/2007

[2] Brian Holland, Mauricio Vacas, Vikas Aggarwal, Ryan DeVille, Ian Troxel, and Alan D. George, Survey of C-based Application Mapping Tools for Reconfigurable Computing, University of Florida, #215 MAPLD 2005

[3] Tirumale K Ramesh, Reconfigurable Computing After a Decade: A New Perspective and Challenges for Hardware-Software Co-Design and Development, Northern Virginia Chapter IEEE Computer Society Meeting, April 14, 2005

[4] T.J. Todman, G.A. Constantinides, S.J.E. Wilton, O. Mencer, W. Luk and P.Y.K. Cheung, Reconfigurable computing: architectures and design Methods, IEE Proc.-Comput. Digit. Tech., Vol. 152, No. 2, pp.193-207, March 2005

[5] Katherine Compton, Scott Hauck, Reconfigurable Computing: A Survey of Systems and Software, ACM Computing Surveys, Vol. 34, No. 2, June 2002, pp. 171–210.

[6] Dr. Steve A. Guccione, Reconfigurable Computing at Xilinx, Proceedings of the Euromicro Symposium on Digital Systems Design (DSD’01), IEEE, 0-7695-1239-9/01,2001

[7] Steven A. Guccione, Delon Levi and Prasanna Sundararajan. JBits: A Java-based Interface for Reconfigurable Computing, 2nd Annual Military and Aerospace Applications of Programmable Devices and Technologies Conference (MAPLD), 1999. http://www.io.com/~guccione/Papers/MAPPLD/JBitsMAPPLD.pdf , 10/11/2007

[8] Joao M. P. Cardoso and Mario P. Vestias, Architectures and Compilers to Support Reconfigurable Computing, www.acm.org/crossroads/xrds5-3/rcconcept.html, 10/11/2007

[9] Geert C, Wenes Steve Margerm, Sriram R. Chelluri, Designing Reconfigurable Computing Solutions, Xcell Journal, First Quarter 2006.

Advantages and Disadvantages of Reconfigurable Computing


Reconfigurable computing has several advantages and drawbacks. They are listed below.

Advantages


  • Greater Functionality

It is possible to achieve greater functionality with a simpler hardware design. The required logic can be stored in memory, and hence the cost of supporting additional features is reduced to the cost of the memory required to store the logic design. This is very useful in the mobile communication domain, where a protocol can easily be upgraded to a newer protocol, stored in memory, and the hardware then reconfigured to achieve the required functionality. Compelling advantages include increased speed and reduced energy and power consumption. A study reports that, depending on the particular device used, moving critical software loops to reconfigurable hardware results in average energy savings of 35% to 70% with an average speedup of 3 to 7 times. [4]


  • Embedded Characteristics

In general-purpose computing, a common piece of silicon can be configured, after fabrication, to solve any computing task. This means many applications can share commodity economics for the production of a single IC, and the same IC can be used to solve different problems at different points in time. General-purpose computing means engineers can program the component to do things which the original IC manufacturers never conceived. Embedded systems developers benefit greatly from reconfigurable computing systems, especially with the introduction of soft cores which can contain one or more instruction processors. [4]

All of these "general-purpose" characteristics are shared by reconfigurable computing. Instead of computing a function by sequencing through a set of operations in time (like a processor), reconfigurable computers compute a function by configuring functional units and wiring them up in space. This allows parallel computation of specific, configured operations, like a custom ASIC, yet the hardware can also be reconfigured. The reconfigurable fabric can be easily and quickly modified from a remote location to upgrade its performance, or modified to perform a completely different function. Hence, the non-recurring engineering (NRE) costs of reconfigurable computing are lower than those of a custom ASIC.
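The "in time" versus "in space" contrast can be mirrored in C, as a rough illustration (our own sketch, under the assumption of a fixed 4-element dot product): a processor steps through one multiply-accumulate per iteration, while configured hardware lays out all four multipliers and an adder tree at once.

```c
#include <stdint.h>

/* Computation "in time": one multiply-accumulate per loop step, the way a
 * processor sequences operations. */
int32_t dot4_in_time(const int16_t a[4], const int16_t b[4])
{
    int32_t acc = 0;
    for (int i = 0; i < 4; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
}

/* Computation "in space": the unrolled form mirrors a spatial circuit in
 * which all four products exist at once and are summed by an adder tree. */
int32_t dot4_in_space(const int16_t a[4], const int16_t b[4])
{
    return ((int32_t)a[0] * b[0] + (int32_t)a[1] * b[1])
         + ((int32_t)a[2] * b[2] + (int32_t)a[3] * b[3]);
}
```

The two forms are functionally identical; only the second exposes the parallel structure that a reconfigurable fabric exploits.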


  • Lower System Cost

By eliminating the ASIC design, lower system cost is achieved on a low-volume product. For higher-volume products, the production cost of fixed hardware is actually much lower. In the case of ASIC and general-purpose hardware designs, technical obsolescence drives up the cost of systems. Reconfigurable computing systems are upgradeable, which extends the useful life of the system and reduces lifetime costs.


  • Reduced Time to Market

Reduced time-to-market is the final advantage of reconfigurable computing. Since an ASIC is no longer used, a large amount of development effort is removed. The logic design remains flexible even after the product is shipped: a design can be sent to market with minimum requirements, and additional features can later be added without any change to the physical device (or system). Thus reconfigurable computing allows an incremental design flow.

These advantages allow reconfigurable computers to serve as powerful tools for many applications, including research and development tools for sophisticated electronic systems such as ASICs and printed circuit boards (PCBs). Simulation tools for such systems often do not exist, and prototype fabrication is expensive and time consuming. A reconfigurable computer can serve as an affordable, fast, and accurate tool for verifying electronic designs.


Disadvantages

Two severe disadvantages of reconfigurable computing can be observed: the time that the chip takes to reconfigure itself for a given task, and the difficulty of programming such chips. Dynamic reconfigurable computing raises several complex issues: design space, placement, routing, timing, consistency and development tools. Each of these is discussed below.

  • Placement Issues

Reconfiguring new hardware requires ample space in which to place it. The component placement issue becomes complex if the component needs to be placed near special resources like built-in memory, I/O pins or DLLs on the FPGA.

  • Routing Issues

Existing components have to be connected to the newly reconfigured components. Ports must be available to interface the new components, and the same ports may also have been used under the old configuration. To accomplish this, the components must be oriented in a workable fashion.

  • Timing Issues

Newly configured hardware must meet the timing requirements for efficient operation of the circuit. Longer wires between components may affect timing, and optimal speed should still be attainable after dynamically reconfiguring the device. Over-constraining or under-constraining the timing of the newly added design may yield erroneous results.

  • Consistency Issues

Static or dynamic reconfiguration of the device should not degrade the computational consistency of the design. This issue becomes critical when the FPGA is partially reconfigured and interfaced with an existing design. Adding new components to the device through the reconfigurable fabric should not erase or alter the existing design in the device (or memory). There should be safe methods for storing the bitstream to memory.

  • Development Tools

Commercial development tools for dynamic reconfigurable computing are still at the development stage. The lack of commercially available tools covering the specification-to-implementation stages of digital design is still a bottleneck. The available tools require enormous human intervention to implement a complete system.




Reconfigurable Computing

Processing power is the main concern of today's computationally intensive applications such as streaming video, image recognition and image processing. In the embedded market, power consumption targets, packaging and manufacturing costs, and time-to-market requirements are shrinking rapidly. Meeting these constraints is more challenging than ever before.


Such processing requirements can be fulfilled in 3 ways [3] [5]:

  • High-performance microprocessors
  • Application-specific integrated circuits (ASIC)
  • Reconfigurable computing (RC) systems


  • High-performance microprocessors

High-performance microprocessors provide an off-the-shelf means of addressing processing requirements. Unfortunately, for many applications a single processor is not fast enough. In addition, the power consumption (of the order of 100 W or more) and cost (thousands of dollars) of these processors limit their use in the embedded field.

  • Application-specific integrated circuits (ASIC)

An ASIC implementation provides a means of implementing the design with a large amount of parallelism. This custom hardware is faster and more compact than general-purpose hardware; an ASIC largely avoids instruction fetch, decode and execution overheads, consumes less power than reconfigurable devices, and can contain just the right mix of functional units for a particular application. But ASICs are uneconomical for many embedded systems due to the production (mask and device) cost and the time to market (which can be 6 months). Only the very highest-volume applications, with a low per-unit price, warrant the high non-recurring engineering (NRE) cost of designing an ASIC.

  • Reconfigurable computing (RC) systems

A reconfigurable computing system typically contains one or more processors and a reconfigurable fabric upon which custom functional units can be built.

The organization of RC systems with respect to the coupling of the RPU to the host computer is shown in Figure (1). The processor(s) execute sequential and non-critical code, while HDL-described logic is mapped to the reconfigurable fabric, which provides the advantage of parallelism. RCs based on Field Programmable Gate Arrays (FPGAs) are an attractive alternative. The resulting system combines the best of both general-purpose and custom ICs: it is faster and smaller than general-purpose hardware, yet compared to a custom IC it has smaller NRE and transition costs. FPGAs can easily be re-customized without modifying the hardware, by designing and loading a different configuration, and a reconfigurable computer can be upgraded, or even reconfigured for a completely different function, from a remote location.
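The split described above — sequential control code on the processor, the compute-critical kernel in the fabric — is often expressed in host code roughly as follows. This is a hedged sketch: `rc_offload_scale` and `process_frame` are hypothetical names standing in for a vendor driver API, and the offload is emulated in software so the sketch is self-contained.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for a vendor driver call. On real hardware this
 * would DMA 'in' to the fabric, run the configured datapath, and DMA the
 * results back; here it is emulated in software. */
static void rc_offload_scale(const int16_t *in, int16_t *out,
                             size_t n, int16_t gain)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)(in[i] * gain);
}

/* Host-side partitioning: sequential / non-critical work stays on the
 * processor; the critical inner loop is handed to the fabric. */
void process_frame(const int16_t *samples, int16_t *result, size_t n)
{
    int16_t gain = 3;                 /* e.g. chosen by control logic */
    rc_offload_scale(samples, result, n, gain);
}
```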




Figure (1) Organization of RC systems with respect to the coupling of the RPU to the host computer [8]




What is the difference between FPGA and ASIC?

  • This question is very popular in VLSI fresher interviews. It looks simple, but a deeper insight into the subject reveals that there are a lot of things to be understood !! So here is the answer.


FPGA vs. ASIC


  • The difference between ASICs and FPGAs mainly depends on costs, tool availability, performance and design flexibility. Each has its own pros and cons, but it is the designer's responsibility to weigh the advantages of each and use either an FPGA or an ASIC for the product. However, recent developments in the FPGA domain are narrowing down the benefits of ASICs.


FPGA

  • Field Programmable Gate Arrays

FPGA Design Advantages

  • Faster time-to-market: No layout, masks or other manufacturing steps are needed for an FPGA design. A ready-made FPGA is available; burn your HDL code into the FPGA and you are done !!
  • No NRE (Non-Recurring Expenses): This cost is typically associated with an ASIC design; for an FPGA it does not exist. FPGA tools are cheap (sometimes even free ! You just need to buy the FPGA.... thats all !). For an ASIC you pay a huge NRE and the tools are expensive. I would say "very expensive"... it runs into crores....!!
  • Simpler design cycle: This is due to software that handles much of the routing, placement, and timing, so manual intervention is less. The FPGA design flow eliminates the complex and time-consuming floorplanning, place and route, and timing analysis of the ASIC flow.
  • More predictable project cycle: The FPGA design flow eliminates potential re-spins, wafer capacity issues, etc., since the design logic is already synthesized and verified in the FPGA device.
  • Field reprogrammability: A new bitstream (i.e. your program) can be uploaded remotely and instantly. An FPGA can be reprogrammed in a snap, while an ASIC can take $50,000 and more than 4-6 weeks to make the same changes. FPGA costs range from a couple of dollars to several hundred or more depending on the hardware features.
  • Reusability: Reusability is a main advantage of FPGAs. A prototype of the design can be implemented on an FPGA and verified for nearly accurate results before it is implemented on an ASIC. If the design has faults, change the HDL code, generate the bitstream, program the FPGA and test again. Modern FPGAs are reconfigurable both partially and dynamically.
  • FPGAs are good for prototyping and limited production. If you are going to make only 100-200 boards, it isn't worth making an ASIC.
  • Generally FPGAs are used for lower-speed, lower-complexity and lower-volume designs. But today's FPGAs even run at 500 MHz with superior performance. With unprecedented logic density increases and a host of other features, such as embedded processors, DSP blocks, clocking, and high-speed serial I/O at ever lower prices, FPGAs are suitable for almost any type of design.
  • Unlike ASICs, FPGAs have special hardware such as Block RAM, DCM modules, MACs, memories, high-speed I/O, embedded CPUs etc. built in, which can be used to get better performance. Modern FPGAs are packed with features. Advanced FPGAs usually come with phase-locked loops, low-voltage differential signaling, clock data recovery, more internal routing, high speed, hardware multipliers for DSP, memory, programmable I/O, IP cores and microprocessor cores. Remember PowerPC (hard core) and MicroBlaze (soft core) in Xilinx, and ARM (hard core) and Nios (soft core) in Altera. There are FPGAs available now with built-in ADCs ! Using all these features, designers can build a system on a chip. Now, do you really need an ASIC ?
  • FPGA synthesis is much easier than ASIC synthesis.
  • In an FPGA you need not do floorplanning; the tool can do it efficiently. In an ASIC you have to do it.


FPGA Design Disadvantages

  • Power consumption in an FPGA is higher, and you don't have much control over power optimization. This is where ASIC wins the race !
  • You have to use the resources available in the FPGA, so the FPGA limits the design size.
  • Good for low-quantity production only. As quantity increases, cost per product rises compared to an ASIC implementation.

ASIC

  • Application Specific Integrated Circuit


ASIC Design Advantages


  • Cost....cost....cost....Lower unit costs: For very high-volume designs the cost comes out to be very low. At large volumes an ASIC design proves to be cheaper than implementing the design using FPGAs.
  • Speed...speed...speed....ASICs are faster than FPGAs: ASIC gives design flexibility, which gives enormous opportunity for speed optimizations.
  • Low power....Low power....Low power: An ASIC can be optimized to meet a low-power requirement. Several low-power techniques, such as power gating, clock gating, multi-Vt cell libraries, pipelining etc., are available to achieve the power target. This is where FPGA fails badly !!! Can you think of a cell phone which has to be charged after every call.....never.....low-power ASICs help the battery live a longer life !!
  • In an ASIC you can implement analog circuits and mixed-signal designs. This is generally not possible in an FPGA.
  • In an ASIC, DFT (Design For Test) is inserted. In an FPGA, DFT is not carried out (rather, for an FPGA there is no need of DFT !).

ASIC Design Disadvantages

  • Time-to-market: Some large ASICs can take a year or more to design. A good way to shorten development time is to make prototypes using FPGAs and then switch to an ASIC.
  • Design issues: In an ASIC you must take care of DFM issues, signal integrity issues and many more. In an FPGA you don't have all these, because the ASIC designer has taken care of them. (Don't forget an FPGA is an IC, designed by ASIC design engineers !!)
  • Expensive tools: ASIC design tools are very expensive. You spend a huge amount on NRE.

Structured ASICs

  • Structured ASICs have the bottom metal layers fixed and only the top layers can be designed by the customer.

  • Structured ASICs are custom devices that approach the performance of today's Standard Cell ASIC while dramatically simplifying the design complexity.

  • Structured ASICs offer designers a set of devices with specific, customizable metal layers along with predefined metal layers, which can contain the underlying pattern of logic cells, memory, and I/O.


FPGA vs. ASIC Design Flow Comparison



