Asynchronous in a synchronous world - Part 2


Static timing analysis is the process of verifying that every signal path in a design meets required clock-cycle timing, whether or not all of the signal paths are even possible. Static timing analysis is not used to verify the functionality of the design, only that the design meets timing goals. In theory, timing verification could be accomplished by running exhaustive gate-level simulations with SDF back-annotation of actual timing values after a design is placed and routed. This is often referred to as dynamic timing verification.

Static timing analysis has three principal advantages over dynamic timing verification:
  • static timing analysis tools verify every single path between any two sequential elements,
  • static timing analysis does not require the generation of any test vectors, and
  • static timing analysis tools are orders of magnitude faster than timing verification by exhaustive gate-level simulation.
Timing analysis using Synopsys tools on a completely synchronous design is relatively easy to perform using either DesignTime within the Synopsys Design Compiler or Design Analyzer environments, or by using PrimeTime.
  • Timing analysis on modules with two or more asynchronous clocks is error prone, more difficult and can be time consuming.
  • Static timing analysis on signals generated from one clock domain and latched into sequential elements within a second, asynchronous clock domain is inaccurate and for the most part worthless.
  • The timing information for a signal latched by a clock that is asynchronous to the latched signal is inaccurate because the phase relationship between the signal and the asynchronous clock is always changing; therefore, the static timing analysis tool would have to check an infinite number of phase relationships between the signal and asynchronous clock.
The fact is, one must assume that signals that pass from one clock domain to another at some point will violate either setup or hold times on the destination sequential element. There is no good reason to perform timing analysis on signals that are generated in one clock domain and registered in another asynchronous clock domain. It is a given that these signals DO violate setup and hold times on the destination register. This is why synchronizers (see section 3.0) are needed, to alleviate the problems that can occur when a signal is passed from one clock domain to another.

For RTL modules that have two or more asynchronous clocks as inputs, a designer will be required to indicate to the static timing analysis tool which signal paths should be ignored. This is accomplished by "setting false paths" on signals that cross from one clock domain to another. This can be a tedious and error prone job unless the guidelines in the next two sections are followed.

Clock Naming Conventions
Guideline: Use a clock naming convention to identify the clock source of every signal in a design.
Reason: A naming convention helps all team members to identify the clock domain for every signal in a design and also makes grouping of signals for timing analysis easier to do using regular expression "wild-carding" from within a synthesis script. A number of useful clock naming conventions have been used by various design teams. Examples included: uClk for the microprocessor clock, vClk for the video clock and dClk for the display clock. Each signal was synchronized to one of the clock domains in the design and each signal-name had to include a prefix character identifying the clock domain for that signal. Any signal that was clocked by the uClk would have a u-prefix in the signal name, such as uaddr, udata, uwrite, etc. Any signal that was clocked by the vClk would similarly have a v-prefix in the signal name, such as vdata, vhsync, vframe, etc. The same signal naming convention was used for all signals generated by any of the other clocks in the design.

Using this technique, any engineer on the ASIC design team could easily identify the clock-domain source of any signal in the design and either use the signals directly or pass the signals through a synchronizer so that they could be used within a new clock domain. The naming convention alone contributed significantly to the productivity of the design team. How do we know there was a productivity gain? One of the design engineers started his part of the ASIC design using his own naming convention, ignoring the convention in use by the other design team members. After much confusion about the signals entering and leaving his design partition, a team meeting was called and the non-compliant designer was "strongly encouraged" to rename the signals in his part of the design to conform to the team naming convention. After the signal names were changed, it became easier to interface to the partition in question. There were fewer questions and less confusion after the change.

Design Partitioning
Guideline: Only allow one clock per module.
Reason: Static timing analysis and synthesis script creation are more easily accomplished on single-clock modules or groups of single-clock modules.

Guideline: Create a synchronizer module for each set of signals that pass from just one clock domain into another clock domain.
Reason: It is given that any signal passing from one clock domain to another clock domain is going to have setup and hold time problems. No worst-case (max time) timing analysis is required for synchronizer modules. Only best case (min time) timing analysis is required between first and second stage flip-flops to ensure that all hold times are met. Also, gate-level simulations can more easily be configured to ignore setup and hold time violations on the first stage of each synchronizer.

By partitioning a design to permit only one clock per module, static timing analysis becomes a significantly easier task. The next logical step was to partition the design so that every input module signal was already synchronized to the same clock domain before entering the module. Why is this significant? If all signals entering and leaving the module are synchronous to the clock used in the module, the design is now completely synchronous! Now the entire module can be static timing analyzed without any "false paths" and Design Compiler can be used to "group" all of the same-clock synchronous modules to perform complete, sequential static timing analysis within each clock domain.

There is one exception to the above recommendation. Multi-clock designs require at least some RTL modules to pass signals from one clock domain to modules that are clocked within a different clock domain.

For example, one design used separate synchronizer modules that permitted signals from one and only one clock domain to be passed into a module that synchronized the signals into a new clock domain. Using the naming convention described earlier, all processor-clock generated signals (u-signals) would be used as inputs to a module that might be clocked by the video clock. This module was called the "sync_u2v" module and the RTL code did nothing more than take each u-signal input and run it through a pair of flip-flops clocked by vClk. Aside from the vClk and reset inputs, every other input signal to the "sync_u2v" module had a "u" prefix and every output signal from that same module had a "v" prefix.
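As a rough illustration, a synchronizer module of this kind might look like the sketch below. Only the module name, vClk, and the u/v prefix convention come from the text above; the signal names (udata_valid, vdata_valid), the single-bit width, and the active-low asynchronous reset (rst_n) are assumptions made for the example.

```verilog
// Minimal sketch of a "sync_u2v" style module (assumed port names).
// Every non-clock, non-reset input has a "u" prefix and every output a "v" prefix.
module sync_u2v (
    input  wire vClk,         // destination (video) clock
    input  wire rst_n,        // assumed: active-low asynchronous reset
    input  wire udata_valid,  // hypothetical signal generated in the uClk domain
    output reg  vdata_valid   // same signal, synchronized to vClk
);
    reg vdata_valid_meta;     // first stage; the only flop that may go metastable

    always @(posedge vClk or negedge rst_n) begin
        if (!rst_n) begin
            vdata_valid_meta <= 1'b0;
            vdata_valid      <= 1'b0;
        end else begin
            vdata_valid_meta <= udata_valid;       // no combinational logic in front
            vdata_valid      <= vdata_valid_meta;  // second stage drives the v-domain
        end
    end
endmodule
```

With this structure, a wild-card such as u* matches exactly the asynchronous inputs of the module.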

No worst-case timing analysis is required on the "sync" modules because we know that every input signal to these modules will have timing problems; otherwise, we would not have to pass the signals through synchronizers. The only timing analysis that we need to perform within synchronizer modules is min-time (hold time) analysis between the first and second flip-flop stages for each signal. In general, if there are n asynchronous clock domains, the design will require n(n-1) synchronizer modules, two for each pair of clock signals (example: using the uClk and vClk signals: the two synchronizer modules required would be sync_u2v and sync_v2u). Only if there are no signals that pass between two specific clock domains will a pair of synchronizer modules not be required.

After modifying all of the RTL files to create either completely synchronous modules or synchronizer modules, the task of generating synthesis scripts becomes trivial. All of the script files which previously included "set_false_path" commands were either deleted or significantly simplified. All timing problems were easily identified and fixed (because they were all within single-clock domain groupings) and the final synthesis runs completed two weeks earlier than anticipated, putting the project back on schedule and completely justifying the decision to repartition the design.

Synthesis Scripts & Timing Analysis
Following the guidelines of the previous section, to only permit one clock per module, to require that all signals entering non-synchronizer modules are also in the same clock domain that is used to clock that module and to require that synchronizer modules only permit input signals from one other clock domain, helps to simplify the timing analysis and synthesis scripting tasks associated with a multi-clock design.

Synthesis script commands used to address multiple clock domain issues now become a matter of grouping, identifying false paths and performing min-max timing analysis.

Grouping
Group together all non-synchronizer modules that are clocked within each clock domain. One group should be formed for each clock domain in the design. These groups will be timing verified as if each were a separate, completely synchronous design.

Identifying False Paths
In general, only the inputs to the synchronizer modules require "set_false_path" commands. If a clock-prefix naming scheme is used, then wild-cards can be used to easily identify all asynchronous inputs. For example, the sync_u2v module should have inputs that all start with the letter "u". The following dc_shell command should be sufficient to eliminate all asynchronous inputs from timing analysis:
set_false_path -from { u* }

Performing Min-Max Timing Analysis
Each grouped set of modules for each clock domain is now a completely synchronous sub-design and tools such as DesignTime or PrimeTime can be used to verify worst case timing (including setup time checks) and best case timing (including hold time checks). The synchronizer blocks are timing verified separately. Worst case timing checks are not required because these modules are just composed of flip-flops to synchronize asynchronous input signals; therefore, there are no long path delays and the outputs are fully registered. After setting false paths on all of the asynchronous inputs, best case (minimum) timing verification is conducted to ensure that hold times are met on all signals that are passed from the first to second stage synchronizing flip-flops.

Synchronizing Fast Signals Into Slow Clock Domains
A general problem associated with synchronizers is the problem that a signal from a sending clock domain might change values twice before it can be sampled into a slower clock domain. This problem must be considered any time signals are sent from one clock domain to another. Synchronizing slower control signals into a faster clock domain is generally not a problem since the faster clock signal will sample the slower control signal one or more times. Recognizing that sampling slower signals into faster clock domains causes fewer potential problems than sampling faster signals into slower clock domains, a designer might want to take advantage of this fact and try to steer control signals towards faster clock domains.
This has been explained in Part 1 of this article.

Asynchronous in a synchronous world - Part 1


The purpose of synchronizing signals is to protect downstream logic from the metastable state of the first flip-flop in a new clock domain.

A simple synchronizer comprises two flip-flops in series without any combinational circuitry between them. This design ensures that the first flip-flop exits its metastable state and its output settles before the second flip-flop samples it. You also need to place the flip-flops close to each other to ensure the smallest possible clock skew between them.

Foundries help with signal synchronization by providing synchronizer cells. These cells usually comprise a flip-flop with a very high gain that uses more power and is larger than a standard flip-flop. Such a flip-flop has reduced setup- and hold-time requirements for the input signal and is resistant to oscillation when the input signal causes a metastable condition.

Another type of synchronizer cell contains two flip-flops, thus easing your job by placing the flip-flops close to each other and preventing you from placing any combinational logic between them. For synchronization to work properly, the signal crossing a clock domain should pass from a flip-flop in the original clock domain to the first flip-flop of the synchronizer without passing through any combinational logic between the two (see Fig below).

This requirement is important because the first stage of a synchronizer is sensitive to glitches that combinational logic produces. A long enough glitch that occurs at the correct time could meet the setup-and-hold requirements of the first flip-flop in the synchronizer, leading the synchronizer to pass a false-valid indication to the rest of the logic in the new clock domain.

A synchronized signal is valid in the new clock domain after two clock edges. The signal delay is between one and two clock periods in the new clock domain. A rule of thumb is that a synchronizer circuit causes two clock cycles of delay in the new clock domain, and a designer needs to consider how synchronization delay impacts timing of signals crossing clock domains.

Synchronizers fall into one of three basic categories:
level, edge-detecting, and pulse.

Level Synchronizer:
In a level synchronizer, the signal crossing a clock domain stays high and stays low for more than two clock cycles in the new clock domain. A requirement of this circuit is that the signal needs to change to its invalid state before it can become valid again. Each time the signal goes valid, the receiving logic considers it a single event, no matter how long the signal remains valid. This circuit is the heart of all other synchronizers.


Edge Synchronizer:
The edge-detecting synchronizer circuit adds a flip-flop to the output of the level synchronizer (see Fig below). The output of the additional flip-flop is inverted and ANDed with the output of the level synchronizer. This circuit detects the rising edge of the input to the synchronizer and generates a clock-wide, active-high pulse. Switching the inverter on the AND gate inputs creates a synchronizer that detects the falling edge of the input signal. Changing the AND gate to a NAND gate results in a circuit that generates an active-low pulse.

The edge-detecting synchronizer works well at synchronizing a pulse going to a faster clock domain. This circuit produces a pulse that indicates the rising or falling edge of the input signal. One restriction of this synchronizer is that the width of the input pulse must be greater than the period of the synchronizer clock plus the required hold time of the first synchronizer flip-flop. The safest pulse width is twice the synchronizer clock period. This synchronizer does not work if the input is a single clock-wide pulse entering a slower clock domain; however, the pulse synchronizer solves this problem.
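The rising-edge version of this circuit can be sketched in Verilog roughly as follows. The module and signal names (sync_edge, async_in, rise_pulse) and the active-low asynchronous reset are assumptions for illustration, not names from the original article.

```verilog
// Sketch of a rising-edge-detecting synchronizer (assumed names).
module sync_edge (
    input  wire clk,        // synchronizer (destination) clock
    input  wire rst_n,      // assumed: active-low asynchronous reset
    input  wire async_in,   // signal arriving from another clock domain
    output wire rise_pulse  // one-clock-wide, active-high pulse
);
    reg s1, s2;  // the two flip-flops of the level synchronizer
    reg s3;      // additional flip-flop on the level synchronizer output

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            {s1, s2, s3} <= 3'b000;
        else
            {s1, s2, s3} <= {async_in, s1, s2};
    end

    // Level synchronizer output ANDed with the inverted, delayed copy.
    assign rise_pulse = s2 & ~s3;
endmodule
```

Moving the inversion to s2 (that is, ~s2 & s3) gives the falling-edge detector described above.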

Pulse Synchronizer:
The input signal of a pulse synchronizer is a single clock-wide pulse that triggers a toggle circuit in the originating clock domain (see Fig below). The output of the toggle circuit switches from high to low and vice versa each time it receives a pulse. This output passes through the level synchronizer to arrive at one input of the XOR gate, while a one-clock-cycle-delayed version goes to the other input of the XOR. Each time the toggle circuit changes state, the output of this synchronizer generates a single clock-wide pulse.

The basic function of a pulse synchronizer is to take a single clock-wide pulse from one clock domain and create a single clock-wide pulse in the new domain. One restriction of a pulse synchronizer is that input pulses must have a minimum spacing between pulses equal to two synchronizer clock periods. If the input pulses are closer, the output pulses in the new clock domain are adjacent to each other, resulting in an output pulse that is wider than one clock cycle. This problem is more severe when the clock period of the input pulse is greater than twice the synchronizer clock period. In this case, if the input pulses are too close, the synchronizer does not detect every one.
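A minimal Verilog sketch of such a pulse synchronizer is shown below. All module, port, and signal names are illustrative assumptions, as is the use of active-low asynchronous resets in both domains.

```verilog
// Sketch of a toggle-based pulse synchronizer (assumed names).
module sync_pulse (
    input  wire src_clk,    // originating clock domain
    input  wire src_rst_n,  // assumed: active-low reset, source domain
    input  wire src_pulse,  // single clock-wide pulse in the source domain
    input  wire dst_clk,    // new clock domain
    input  wire dst_rst_n,  // assumed: active-low reset, destination domain
    output wire dst_pulse   // single clock-wide pulse in the new domain
);
    // Toggle circuit: changes state once per input pulse.
    reg toggle;
    always @(posedge src_clk or negedge src_rst_n) begin
        if (!src_rst_n)
            toggle <= 1'b0;
        else if (src_pulse)
            toggle <= ~toggle;
    end

    // Level synchronizer (s1, s2) plus a one-cycle-delayed copy (s3).
    reg s1, s2, s3;
    always @(posedge dst_clk or negedge dst_rst_n) begin
        if (!dst_rst_n)
            {s1, s2, s3} <= 3'b000;
        else
            {s1, s2, s3} <= {toggle, s1, s2};
    end

    // The XOR is high for one dst_clk cycle after each toggle change.
    assign dst_pulse = s2 ^ s3;
endmodule
```

The toggle register is driven directly by a flip-flop, so no combinational logic sits between the two domains.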

Asynchronous in a synchronous world - Introduction


I am composing this article to explore various aspects of clock and data synchronization.

The first part of the article talks about Level Synchronizers, Edge Synchronizers and Pulse Synchronizers. The second part deals with Synthesis and Scripting Techniques for Designing Multi-Asynchronous Clock Designs.

Applications, including disk-drive controllers, CDROM/DVD controllers, modems, network interfaces, and network processors, bear inherent challenges moving data across multiple clock domains. When signals travel from one clock domain to another, the signal appears to be asynchronous in the new clock domain.

In today's design flows, designers have many software programs to help them create million-gate circuits, but these programs do not solve the problem of signal synchronization. It is up to the designer to know reliable design techniques that reduce the risk of failure for circuits communicating across clock domains.

The first step in managing multiclock designs is to understand the problem of signal stability. When a signal crosses a clock domain, it appears to the circuitry in the new clock domain as an asynchronous signal. The circuit that receives this signal needs to synchronize it. Synchronization prevents the metastable state of the first storage element (flip-flop) in the new clock domain from propagating through the circuit.

Metastability is the inability of a flip-flop to arrive at a known state in a specific amount of time. When a flip-flop enters a metastable state, you can predict neither the element's output voltage level nor when the output will settle to a correct voltage level. During this settling time, the flip-flop's output is at some intermediate voltage level or may oscillate and can cascade the invalid output level to flip-flops farther down the signal path. The input must be stable during a small window of time around the active edge of the clock for any flip-flop. This window of time is a function of the design of the flip-flop, the implementation technology, operating conditions, and the load on the output for outputs that are not buffered. Sharp edge rates on the input signal minimize the window. More windows of vulnerability arise as the clock frequency increases, and the probability of hitting the window increases as the data frequency increases.

Verilog code to detect if a 32-bit pattern can be expressed as a power of 2


module pat_det ( data_in, patDetected );

input [31:0] data_in;
output patDetected;

// Count the set bits. Six bits are needed because the count can reach 32.
wire [5:0] patSum = data_in[0] + data_in[1] + data_in[2] +
data_in[3] + data_in[4] + data_in[5] +
data_in[6] + data_in[7] + data_in[8] +
data_in[9] + data_in[10] + data_in[11] +
data_in[12] + data_in[13] + data_in[14] +
data_in[15] + data_in[16] + data_in[17] +
data_in[18] + data_in[19] + data_in[20] +
data_in[21] + data_in[22] + data_in[23] +
data_in[24] + data_in[25] + data_in[26] +
data_in[27] + data_in[28] + data_in[29] +
data_in[30] + data_in[31];

// A power of 2 has exactly one bit set.
assign patDetected = (patSum == 6'd1);

endmodule
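An alternative that avoids the long adder chain is the well-known x & (x - 1) idiom, which clears the lowest set bit of x. The sketch below is offered as one possible variant; the module name pat_det_alt is made up for this example.

```verilog
// Alternative power-of-2 detector (hypothetical module name).
module pat_det_alt (data_in, patDetected);

input  [31:0] data_in;
output        patDetected;

// data_in & (data_in - 1) clears the lowest set bit, so the result is
// zero only when at most one bit is set. The extra non-zero check
// rejects the all-zeros pattern, which is not a power of 2.
assign patDetected = (data_in != 32'b0) &&
                     ((data_in & (data_in - 32'd1)) == 32'b0);

endmodule
```

Both versions assert patDetected for exactly the 32 one-hot input patterns.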

Verilog Shift Register with Test Bench


module shifter (result, value_in, direction, type, length);
  output [7:0] result;
  input  [7:0] value_in;
  input        direction;  // 0 = right, 1 = left
  input  [1:0] type;       // 0 = logical, 1 = arithmetic, 2 = rotate
  input  [2:0] length;     // shift amount, 0 to 7

  reg [7:0] value_out;

  always @(value_in or direction or type or length)
  begin
    case ({direction, type})
      // logical shift right
      3'b0_00: value_out = value_in >> length;
      // arithmetic shift right (sign bit is replicated)
      3'b0_01: case (length)
        3'b000: value_out = value_in;
        3'b001: value_out = {value_in[7], value_in[7:1]};
        3'b010: value_out = {{2{value_in[7]}}, value_in[7:2]};
        3'b011: value_out = {{3{value_in[7]}}, value_in[7:3]};
        3'b100: value_out = {{4{value_in[7]}}, value_in[7:4]};
        3'b101: value_out = {{5{value_in[7]}}, value_in[7:5]};
        3'b110: value_out = {{6{value_in[7]}}, value_in[7:6]};
        3'b111: value_out = {{7{value_in[7]}}, value_in[7]};
      endcase
      // rotate right
      3'b0_10: case (length)
        3'b000: value_out = value_in;
        3'b001: value_out = {value_in[0], value_in[7:1]};
        3'b010: value_out = {value_in[1:0], value_in[7:2]};
        3'b011: value_out = {value_in[2:0], value_in[7:3]};
        3'b100: value_out = {value_in[3:0], value_in[7:4]};
        3'b101: value_out = {value_in[4:0], value_in[7:5]};
        3'b110: value_out = {value_in[5:0], value_in[7:6]};
        3'b111: value_out = {value_in[6:0], value_in[7]};
      endcase
      // logical shift left
      3'b1_00: value_out = value_in << length;
      // arithmetic shift left (sign bit is preserved)
      3'b1_01: value_out = {value_in[7], value_in[6:0] << length};
      // rotate left
      3'b1_10: case (length)
        3'b000: value_out = value_in;
        3'b001: value_out = {value_in[6:0], value_in[7]};
        3'b010: value_out = {value_in[5:0], value_in[7:6]};
        3'b011: value_out = {value_in[4:0], value_in[7:5]};
        3'b100: value_out = {value_in[3:0], value_in[7:4]};
        3'b101: value_out = {value_in[2:0], value_in[7:3]};
        3'b110: value_out = {value_in[1:0], value_in[7:2]};
        3'b111: value_out = {value_in[0], value_in[7:1]};
      endcase
      // unused type encoding: pass the input through
      default: value_out = value_in;
    endcase
  end

  assign result = value_out;
endmodule

--


module testbench;

  reg clk, direction;
  reg [1:0] type;
  reg [2:0] length;
  reg [7:0] value_in;

  wire [7:0] result;

  shifter shifter1(result, value_in, direction, type, length);

  initial
  begin
    clk = 0;
    direction = 1;
    type = 1;
    length = 3;
    value_in = 'b11110110;

    $display("direction = %d type = %d length = %d value_in = %b", direction, type, length, value_in);

    #10 $display("done");

    $finish;
  end

  always #5 clk = !clk;

  // Print the shifter output at each rising clock edge.
  always @(posedge clk)
    $strobe("result: %b", result);
endmodule

Verilog Awareness


Differentiate between Inter assignment Delay and Inertial Delay ?

What are the different state machine styles? Which is better? Why and when do we use one over the other? Explain the advantages and disadvantages.

What is the difference between the following lines of code?
  1. reg1 <= #10 reg2;
  2. reg3 = #10 reg4;
What is the value of Var1 after the following assignment?
reg Var1;
initial begin
  Var1 <= "-";
end
What is the output of the below code?
module quest_for_out();
  integer i;
  reg clk;
  initial begin
    clk = 0;
    #4 $finish;
  end
  always #1 clk = !clk;
  always @ (posedge clk)
  begin : FOR_OUT
    for (i = 0; i < 8; i = i + 1) begin
      if (i == 5) begin
        disable FOR_OUT;
      end
      $display ("Current i : %g", i);
    end
  end
endmodule

What is the output of the below code?

module quest_for_in();
  integer i;
  reg clk;
  initial begin
    clk = 0;
    #4 $finish;
  end
  always #1 clk = !clk;
  always @ (posedge clk)
  begin
    for (i = 0; i < 8; i = i + 1) begin : FOR_IN
      if (i == 5) begin
        disable FOR_IN;
      end
      $display ("Current i : %g", i);
    end
  end
endmodule

Verilog Awareness


  • Consider a 2:1 mux. What will the output F be if the select (sel) is "X"?
  • What is the difference between blocking and nonblocking assignment? Explain with a simple example?
  • What is the difference between wire and a reg data type?
  • Write code for async reset D-Flip-Flop, Shift Register.
  • Write code for 2:1 MUX using different coding styles.
  • Write code for parallel encoder and priority encoder.
  • Explain the different "case" usage styles.
  • What is the difference between === and == ?
  • What is defparam used for?
  • What is the difference between unary operator and logical operator ?
  • What is the difference between task and function ?
  • What is the difference between transport and inertial delays?
  • What is the difference between casex and case statements ?
  • What is the difference between $monitor and $display ?
  • What is the difference between compiled, interpreted, event based and cycle based simulator ?

Clock Jitter


Clock jitter is the deviation from the ideal timing of clock transition events. Because such deviation can be detrimental to high-speed data transfer and can degrade performance, jitter must be kept to a minimum in a high-speed system.

High-speed signaling is very sensitive to jitter. As signals toggle faster and faster, tighter restrictions fall on the signal transmitter and receiver. In many high-speed data applications, the clock edge must fall within a tight margin of time to capture data correctly. The more jitter in a system, the more often the clock edge will fall outside the margin. The frequency of clock edge deviations from the acceptable margin translates to the system's bit error rate (BER).

More information can be found with a web search.

Verilog Blocking Vs Non Blocking, Myths and Facts


I know people who swear by blocking and some who swear by non-blocking.

So here are some thoughts.
There is very little difference between non-blocking and blocking in speed and no errors if done correctly either way.

The main differences are:
Some people like non-blocking because you can tell that a reg on the left hand side of <= is going to be a flip flop after synthesis.

/* example 1a */
reg a,b,c;
always @(posedge clock)
begin
b <= a;
c <= b;
/* b and c will be flip flops */
end

/* example 1b */
reg a,b,c;
always @(posedge clock)
begin
c <= b;
b <= a;
/* b and c will be flip flops */
end

/* example 2a */
reg a,b,c;
always @(posedge clock)
begin
b = a;
c = b;
/* Only c will be a flip flop,b will go away after synthesis. */
/* We could delete the 2 above assignments and replace it with c=a;b=a; In fact, b is the same as c and can be eliminated.*/
end

/* example 2b */
reg a,b,c;
always @(posedge clock)
begin
c = b;
b = a;
/* Both b and c will be flip flops, because these 2 lines are reversed.*/
end

Examples 1a, 1b and 2b are functionally the same.
Example 2a is functionally different from 2b just because of the order of the statements.

Some people like blocking because it takes less memory in the simulator.

/* example NON-BLOCKING_MEMORY */
reg a,b,c;
always @(posedge clock)
begin
/* b will require 2 memory locations*/
b <= a;
/*<---because this b memory location will hold value of a */
c <= b;
/*<---and this b memory location will hold value of b before the posedge*/
end

/* example BLOCKING_MEMORY */
reg a,b,c;
always @(posedge clock)
begin
// b will require ONLY 1 memory location
c = b;
b = a;
end

Note that I am talking about SIMULATOR memory, not flip-flop count after synthesis. In most cases, the simulator has to remember the value before and after posedge clock if a reg goes between modules in order to "execute modules in parallel", so there may be no savings.

Some people like blocking because you can see sharing of resources more readily.
// example 5

reg [15:0] a,b,c,d,e,f,j,k,g,h;
reg [16:0] x,y,z;
always @(posedge clock)
begin
x = a + b + c + d + e + f + j + k;
y = x + g;
z = x + h;
end

// example 6
reg [15:0] a,b,c,d,e,f,j,k,g,h;
reg [16:0] y,z;
always @(posedge clock)
begin
y <= (a + b + c + d + e + f + j + k) + g;
z <= (a + b + c + d + e + f + j + k) + h;
end

Even the cheapest synthesizer should share adder logic in example 5, but a slightly smarter synthesizer is required in example 6.

You will have fewer problems with race conditions in SIMULATION if you always use non-blocking assignments inside always @(posedge clock) blocks where you want to have flip-flops.

file xyz.v :
module xyz(a,b,clk);
input b,clk;
output a;
reg a;
always @(posedge clk)
a = b;
endmodule

file abc.v :
module abc(b,c,clk);
input c, clk;
output b;
reg b;
always @(posedge clk)
b = c;
endmodule

Some of the simulators out there will execute module abc first and then module xyz.
This effectively transfers contents of c to a in ONE clk cycle. This is what some people refer to as a simulator race condition. Other simulators will execute module xyz and then module abc, giving a different simulation result. In some simulators, order of execution cannot be controlled by users.
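For comparison, here is a sketch of the same two modules written with non-blocking assignments, plus a small wrapper (the name chain_top is made up for this example) that connects them. With non-blocking assignments, both right-hand sides are sampled before either register is updated, so the result does not depend on which module the simulator evaluates first.

```verilog
// Same modules as above, but with non-blocking assignments.
module xyz (a, b, clk);
  input  b, clk;
  output a;
  reg    a;
  always @(posedge clk)
    a <= b;   // samples the value b had before this clock edge
endmodule

module abc (b, c, clk);
  input  c, clk;
  output b;
  reg    b;
  always @(posedge clk)
    b <= c;   // b is updated only after all right-hand sides are sampled
endmodule

// Hypothetical wrapper: c takes two clock edges to reach a,
// regardless of the order in which the simulator runs the modules.
module chain_top (a, c, clk);
  input  c, clk;
  output a;
  wire   b;
  abc u_abc (b, c, clk);
  xyz u_xyz (a, b, clk);
endmodule
```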

Clock tree synthesis


Basics of Clock Tree Synthesis:
The main idea is to balance the skew between clock endpoints. Clock trees are built under the following constraints:

  1. Clock Skew: Difference between the clock arrival times at the endpoints.
  2. Clock Latency: Max delay between the clock root and a clock leaf.
  3. Transition Time
  • Clock buffers are usually bigger in size and have a shorter transition time as well as more even rise and fall times.
  • Clock nets are generally routed first and on higher metal layers with minimum detouring to give them the highest priority in routing.
A few other clock-tree-related topics that will be covered subsequently are:
  1. Signal integrity issues, clock nets as aggressors, and clock shielding: Because of their importance, clock nets have to be protected from becoming either aggressors or victims in SI closure. They are generally shielded with vdd or gnd to prevent that.
  2. Effective skew: Worst skew between two flops that are talking to each other. This is either equal to or lower than the worst skew.
  3. Useful skew: This is a concept where the skew (the difference in arrival of the clock at the flops) is used to fix setup violations.
  4. OCV: On-chip variation

Design Guidelines and Criteria for Digital Electronics


Apparently these pages on guidelines and criteria are from NASA. I think this is a very nice article with a good amount of discussion on critical aspects of digital design.

http://klabs.org/DEI/References/design_guidelines/nasa_guidelines/clocks/clocks.htm

"Safe" and "Unsafe" state machines


"Safe" State Machines:
If the number of states (N) is a power of 2 and you use a binary or gray-code encoding algorithm, then the state machine is "safe." This ensures that you have M registers where N = 2^M. Because all of the possible state values (or register statuses) are reachable, the design is "safe."

"Unsafe" State Machines:
If the number of states is not a power of 2, or if you do not use binary or gray-code encoding algorithm with fully defined states (e.g., one-hot), then the state machine is "unsafe" as it can stray into an undefined state.

FSM types and significance in detail:

Binary Encoding:
1. States are numbered starting from binary '0' and above.
2. '1' flip flop for very bit of the encoded binary number.
3. States are assigned in binary sequence.

Adv:

1. Lesser number of flip flops - log(n) for n states.
2. Less area, so good for area constrained circuits.

Dis-Adv:

1. More that '1' bit can flip anytime.
2. Getting into a stale state is possible.
3. Complex decoding logic is necessary to find the state that you are currently in.
4. More number of ff toggling at the same time causes more power to be consumed.

Gray Encoding:
1. States are numbered starting from binary '0' and above in gray style.
2. One flip flop for very bit of the encoded gray code.
3. Assign adjacent gray codes to adjacent states.

Adv:

1. Same number of ff's as binary.
2. Only '1' bit is different for adjacent states, so less chance of getting in to a stale state.
3. Only '1' ff changes at any given time so less power consumed.
4. Less area so good for area constarined circuits.

Dis-Adv:

1. Decoding logic is complex.


One Hot Encoding
1. Only '1' flip flop for every state rather than '1' flip flop for every bit..
2. Only '1' flip flop can be '1' at any time, all others must be '0'.

Adv:

1. Very simple decoding logic, so checking for a particular state is as easy as reading the corresponding flip-flop.
2. Exactly two flip-flops change state on every transition, so less power is consumed.

Dis-Adv:
1. More flip-flops: n for n states.

Suited for FPGAs
1. Uses the flip-flops in the CLBs for state decoding.
2. Fewer routing hops are required for decoding.
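
The two one-hot properties listed above (single-bit decoding, two flip-flops changing per transition) can be confirmed with a short model. This Python sketch assumes a hypothetical 5-state machine; one_hot, is_state and bits_changed are illustrative names.

```python
def one_hot(index: int) -> int:
    """Register value for state number `index`: a single '1' in that position."""
    return 1 << index


def is_state(register: int, index: int) -> bool:
    """Decoding a state means reading exactly one flip-flop."""
    return bool((register >> index) & 1)


def bits_changed(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


num_states = 5
codes = [one_hot(i) for i in range(num_states)]

# Every transition between two different states toggles the same number of bits.
changes = {bits_changed(a, b) for a in codes for b in codes if a != b}
print("codes:", [format(c, "05b") for c in codes])
print("bits changed per transition:", changes)
```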

Metastability


Metastability is the ability of a non-equilibrium electronic state to persist for a long period of time. Usually the term is used to describe a state that doesn't settle into equilibrium within the time required for proper operation.

The flip-flop is a device that is susceptible to metastability. It has two well-defined stable states, traditionally designated 0 and 1, but under certain conditions it can hover between them for longer than a clock cycle. This condition is known as metastability. In most cases it is considered a failure mode of the logic design and timing philosophy or implementation.

The most common cause of metastability is violating the flip-flop's setup and hold times. During the time from the setup to the hold time, the input of the flip-flop should remain stable; a change in the input in that time will have a probability of setting the flip-flop to a metastable state.

In a typical scenario where data travels from the output of a source flip-flop to the input of target flip-flop, metastability is caused by either:

(1) the target clock having a different frequency than the source clock, in which case the setup and hold time of the target flip-flop will be violated eventually, or

(2) the target and source clock having the same frequency, but a phase alignment that causes the data to arrive at the target flip-flop during its setup and hold time. This can be caused by fixed overhead or variations in logic delay times on the worst case path between the two flip flops, variations in clock arrival times (clock skew), or other causes.

[Source: Wikipedia]
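
Case (1) can be made concrete with a little arithmetic. The Python sketch below is a simplified model, not taken from the quoted source: it assumes both clocks start at time 0, that data changes exactly on each source clock edge, and it uses made-up periods and setup/hold values in picoseconds. The function first_violation is a hypothetical helper.

```python
def first_violation(src_period, dst_period, setup, hold, max_edges=10_000):
    """Return (data_change_time, destination_edge_time) for the first data change
    that lands inside a destination setup/hold window, or None if none is found.

    All times are integers (e.g. picoseconds) to avoid rounding problems.
    """
    for k in range(1, max_edges + 1):
        t_data = k * src_period              # data changes on every source edge
        n = t_data // dst_period
        # Check the destination edges just before and just after the data change.
        for t_clk in (n * dst_period, (n + 1) * dst_period):
            if t_clk - setup < t_data < t_clk + hold:
                return t_data, t_clk
    return None


result = first_violation(src_period=10_000, dst_period=7_300, setup=200, hold=100)
print("first violation (data time, clock edge):", result)
```

Because the phase between the two clocks keeps drifting, some data change ends up inside the window sooner or later, which is why a synchronizer is needed on such a path.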

Synthesizable Verilog from behavioral constructs - 5


Event controls used as delays, e.g. @(posedge clock), require careful attention if there are several in a row. If they all wait on the positive edge of the clock, you can implement them with a state machine:

Behavioural:

forever
begin
    command1;
    @(posedge clock);
    command2;
    @(posedge clock);
    command3;
    @(posedge clock);
end

Synthesizable:

always @(posedge clock or posedge reset)
    if (reset) // reset the state machine when reset is high
    begin
        state <= 0;
    end
    else
    begin
        case (state)
            0 : begin
                command1;
                state <= 1;
            end

            1 : begin
                command2;
                state <= 2;
            end

            2 : begin
                command3;
                state <= 0;
            end
        endcase
    end

If both clock edges are present, then you could implement it in synthesizable Verilog with a state machine changing values on both clock edges:

Behavioural:

forever
begin
    command1;
    @(posedge clock);
    command2;
    @(negedge clock);
    command3;
    @(posedge clock);
    @(posedge clock);
    command4;
    @(negedge clock);
end

Synthesizable:

always @(posedge clock or negedge clock or posedge reset)
    if (reset) // reset the state machine when reset is high
    begin
        state <= 0;
    end
    else
    begin
        case (state)
            0 : begin
                command1;
                state <= 1;
            end

            1 : if (clock == 1) // wait for the positive edge
            begin
                command2;
                state <= 2;
            end

            2 : begin // this will definitely begin at the negative edge as state 1 precedes it
                command3;
                state <= 3;
            end

            3 : state <= 4; // we arrive at the positive edge of the clock, but need to wait a clock cycle

            4 : if (clock == 1) // wait for the positive edge
            begin
                command4;
                state <= 0;
            end // we'll get back to state 0 at the negative clock edge, the right time for command1
        endcase
    end

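As you can see, multiple clock edges require care to implement in synthesizable Verilog. Many synthesis tools do not accept a sensitivity list containing both edges of the same clock, so check what your tool and target technology support before relying on this style.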


Note: in general, each commandN (command1, command2, ...) refers to a block of commands. It is assumed there is an appropriate clock for the case statement state machines. Care is required in setting appropriate reset states, initialization, and completion of use of a state machine:

* Is there a signal to tell the state machine to begin?
* Does a done signal go high, signalling the state machine has finished?
* When it is not in operation, does the state machine idle correctly? Does it change signal values shared with other code? Does it set outputs from it to appropriate idling values?
* Is the state machine reset to the idle state by a reset signal?
* Ensure that you initialize all registers.
* Ensure that your state register has the correct bit width - if it is too small, assigning a larger state value will just return it to an earlier state.
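
The last point in the list is easy to demonstrate. The Python sketch below models what happens when a value is assigned to a register that is too narrow for it: the upper bits are simply dropped. The function name assign_to_register and the 2-bit width are assumptions chosen for illustration.

```python
def assign_to_register(value: int, width: int) -> int:
    """Model assigning `value` to a `width`-bit reg: bits above `width` are lost."""
    return value & ((1 << width) - 1)


# A 2-bit state register can only hold states 0..3.
for next_state in range(6):
    print("assign", next_state, "-> register holds", assign_to_register(next_state, 2))
```

With a 2-bit register, an intended jump to state 4 silently lands in state 0, which is exactly the "return to an earlier state" behaviour described above.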

Synthesizable Verilog from behavioral constructs - 4


When implementing Verilog tasks in modules, the best approach is to group tasks that have the same output signals into separate modules. If different modules control the same signal, then explicit arbitration logic is required to specify which module is controlling the signal at a given time. To put tasks in a separate module, you will require start and completion handshaking signals.

Behavioural:

task update_a_and_b;
begin
    a = y;
    wait (z != 0) b = z;
end
endtask

...

y = x + w;
update_a_and_b; // calls the task
x = a + b;
@(posedge clock);
x = b;
if (y == x) update_a_and_b;
@(posedge clock);
command1;

Care must be taken when translating this into synthesizable Verilog, to preserve correct timing:

module update_a_and_b(do_update_a_and_b, clock, reset, a, z, done_update_a_and_b, b_temp, x_temp);
    input do_update_a_and_b; // this signal should go high for only one clock cycle
    input clock;
    input reset; // reset this module when reset goes high
    input [7:0] a;
    input [7:0] z;

    output done_update_a_and_b;
    output [7:0] b_temp;
    output [7:0] x_temp;

    reg done_update_a_and_b;
    reg [7:0] b_temp;
    reg [7:0] x_temp;

    reg state;

    always @(posedge clock or posedge reset)
        if (reset)
        begin
            state <= 0;
            done_update_a_and_b <= 0;
            b_temp <= 0;
            x_temp <= 0;
        end
        else if (do_update_a_and_b && (state == 0))
        begin
            done_update_a_and_b <= 0;
            if (z != 0)
            begin
                b_temp <= z;
                x_temp <= a + z; // x is updated inside this module, as it must change as soon as z != 0 (b_temp still holds its old value here, so z is used)
                done_update_a_and_b <= 1;
                // stay in state 0, we've finished
            end
            else state <= 1;
        end
        else
        begin
            case (state)
                0 : done_update_a_and_b <= 0; // do nothing, this is the idle state

                1 : if (z != 0)
                begin
                    b_temp <= z;
                    x_temp <= a + z; // x is updated inside this module, as it must change as soon as z != 0
                    done_update_a_and_b <= 1;
                    state <= 0; // return to the idle state, we've finished
                end
            endcase
        end
endmodule

...

reg [2:0] top_state;

...

case (top_state)
    0 : begin
        y = x + w;
        a = y; // this must happen immediately
        if (z != 0) // we have to update x and b immediately if z != 0
        begin
            b = z;
            x = a + b;
            top_state <= 2;
        end
        else
        begin
            do_update_a_and_b = 1; // call the task
            top_state <= 1;
        end
    end

    1 : begin
        do_update_a_and_b = 0; // stays high for only one cycle
        if (done_update_a_and_b)
        begin
            // we assume the values of x and b weren't needed on the previous cycle, otherwise additional circuitry is needed
            // or x_temp and b_temp values need to be used on that cycle - it's very difficult to coordinate this correctly
            // in the general case
            x = x_temp;
            b = b_temp;
            top_state <= 2;
        end
    end

    2 : begin
        x = b;
        if (y == x)
        begin
            a = y; // this must happen immediately
            if (z != 0) // we have to update x and b immediately if z != 0
            begin
                b = z;
                x = a + b;
                top_state <= 4;
            end
            else
            begin
                do_update_a_and_b = 1; // call the task
                top_state <= 3;
            end
        end
        else top_state <= 4;
    end

    3 : begin
        do_update_a_and_b = 0; // stays high for only one cycle
        if (done_update_a_and_b)
        begin
            // we assume the values of x and b weren't needed on the previous cycle, otherwise additional circuitry is needed
            // or x_temp and b_temp values need to be used on that cycle - it's very difficult to coordinate this correctly
            // in the general case
            x = x_temp;
            b = b_temp;
            top_state <= 4;
        end
    end

    4 : command1;
endcase

Now if we didn't care about having a couple of additional cycle delays between updates (i.e. assuming nothing depends on the variable values immediately, and nothing else is changing variable values), we could implement this in a far simpler fashion:

module update_a_and_b(do_update_a_and_b, clock, reset, a, z, done_update_and_b, b_temp, x_temp);
    input do_update_a_and_b; // this signal should go high for only one clock cycle
    input clock;
    input reset; // reset this module when reset goes high
    input [7:0] a;
    input [7:0] z;

    output done_update_and_b;
    output [7:0] b_temp;
    output [7:0] x_temp;

    reg done_update_and_b;
    reg [7:0] b_temp;
    reg [7:0] x_temp;

    reg state;

    always @(posedge clock or posedge reset)
        if (reset)
        begin
            state <= 0;
            done_update_and_b <= 0;
            b_temp <= 0;
            x_temp <= 0;
        end
        else if (do_update_a_and_b && (state == 0))
        begin
            state <= 1;
            done_update_and_b <= 0;
        end
        else
        begin
            case (state)
                0 : done_update_and_b <= 0; // do nothing, this is the idle state

                1 : if (z != 0)
                begin
                    b_temp <= z;
                    x_temp <= a + z; // x is updated inside this module (b_temp still holds its old value here, so z is used)
                    done_update_and_b <= 1;
                    state <= 0; // return to the idle state, we've finished
                end
            endcase
        end
endmodule

...

reg [2:0] top_state;

...

case (top_state)
    0 : begin
        y = x + w;
        do_update_a_and_b = 1; // call the task
        top_state <= 1;
    end

    1 : begin
        do_update_a_and_b = 0; // stays high for only one cycle
        if (done_update_and_b)
        begin
            x = x_temp;
            b = b_temp;
            top_state <= 2;
        end
    end

    2 : begin
        x = b;
        if (y == x)
        begin
            do_update_a_and_b = 1; // call the task
            top_state <= 3;
        end
        else top_state <= 4;
    end

    3 : begin
        do_update_a_and_b = 0; // stays high for only one cycle
        if (done_update_and_b)
        begin
            x = x_temp;
            b = b_temp;
            top_state <= 4;
        end
    end

    4 : command1;
endcase




Synthesizable Verilog from behavioral constructs - 3


Modifying a behavioural Verilog fork and join statement to make it synthesizable.


Behavioural:

command1;
fork
    // start of fork block 1
    begin
        wait (y != 0);
        a = y;
    end

    // start of fork block 2
    begin
        wait (z != 0);
        b = z;
    end
join
command2;

command2 will execute only after fork blocks 1 and 2 have finished.

Synthesizable:

case (state)
    0 : begin
        command1;
        done_fork_block_1 = 0;
        done_fork_block_2 = 0;

        if (y != 0)
        begin
            a = y;
            done_fork_block_1 = 1;
        end

        if (z != 0)
        begin
            b = z;
            done_fork_block_2 = 1;
        end

        if (done_fork_block_1 & done_fork_block_2) command2;
        else state <= 1;
    end

    1 : begin
        if ((y != 0) && !done_fork_block_1)
        begin
            a = y;
            done_fork_block_1 = 1;
        end

        if ((z != 0) && !done_fork_block_2)
        begin
            b = z;
            done_fork_block_2 = 1;
        end

        if (done_fork_block_1 & done_fork_block_2) command2;
        // else state <= 1;
    end
endcase

In some special cases, it may not be necessary to have done signals, but in general the blocks of commands being executed in parallel by fork may finish at different times. Again, if a cycle delay between command1 and the other commands executing is acceptable, then this code is simpler:

case (state)
    0 : begin
        command1;
        done_fork_block_1 = 0;
        done_fork_block_2 = 0;
        state <= 1;
    end

    1 : begin
        if ((y != 0) && !done_fork_block_1)
        begin
            a = y;
            done_fork_block_1 = 1;
        end

        if ((z != 0) && !done_fork_block_2)
        begin
            b = z;
            done_fork_block_2 = 1;
        end

        if (done_fork_block_1 & done_fork_block_2) command2;
        // else state <= 1;
    end
endcase

As y or z may have different values after a clock cycle passes, take care when choosing the simpler alternative, which does not exactly implement the behavioural code.
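
The role of the done flags can be checked with a small cycle-by-cycle model. The Python sketch below is not part of the original design: run_fork_join and the sample values are hypothetical, and it models the first (exact) state machine, where the checks start on the same cycle as command1.

```python
def run_fork_join(samples):
    """samples: list of (y, z) values seen on successive clock cycles.

    Returns (cycle at which command2 runs, a, b), or None if both fork
    blocks never complete within the given samples.
    """
    a = b = None
    done_fork_block_1 = done_fork_block_2 = False
    for cycle, (y, z) in enumerate(samples):
        if y != 0 and not done_fork_block_1:
            a, done_fork_block_1 = y, True       # fork block 1 finishes
        if z != 0 and not done_fork_block_2:
            b, done_fork_block_2 = z, True       # fork block 2 finishes
        if done_fork_block_1 and done_fork_block_2:
            return cycle, a, b                   # join: command2 may run now
    return None


# y becomes non-zero on cycle 1 and later changes; z becomes non-zero on cycle 3.
print(run_fork_join([(0, 0), (3, 0), (7, 0), (7, 5)]))
```

In this run a keeps the value y had when its wait completed, even though y changes afterwards. Without the done flags, a would be overwritten while the second block is still waiting.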



Synthesizable Verilog from behavioral constructs - 2


Modifying a behavioural Verilog while statement to make it synthesizable.


Behavioural:

command1;
while (x != 0)
begin
    command2;
end
command3;

Synthesizable:

case (state)
    0 : begin
        command1;
        if (x != 0)
        begin
            command2;
            state <= 1;
        end
        else command3;
    end

    1 : if (x != 0)
    begin
        command2;
    end
    else command3;
endcase

Again, if a cycle delay between command1 and the other commands executing is acceptable, simpler code is the following:

case (state)
    0 : begin
        command1;
        state <= 1;
    end

    1 : if (x != 0)
    begin
        command2;
    end
    else command3;
endcase




Synthesizable Verilog from behavioral constructs - 1


Modifying a behavioural Verilog wait statement to make it synthesizable.


Behavioural:

command1;
wait (x != 0);
command3;

Synthesizable:

case (state)
    0 : begin
        command1;
        if (x != 0) command3;
        else state <= 1;
    end

    1 : if (x != 0) // wait until this is true
        command3;
endcase

You also need to declare the state variable (reg state;). If a cycle delay between command1 and command3 does not matter, then the following is simpler, but not identical to the original:

case (state)
    0 : begin
        command1;
        state <= 1;
    end

    1 : if (x != 0) // wait until this is true
        command3;
endcase

The latter approach is preferred in many cases for coding simplicity.
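
The difference between the two versions shows up only in what happens on the first cycle. The Python sketch below is a rough software model, not the Verilog itself: cycle_of_command3 is a hypothetical helper that takes the value of x on each clock cycle and reports the cycle on which command3 would run.

```python
def cycle_of_command3(x_samples, simple):
    """Return the clock cycle on which command3 executes, or None.

    simple=False models the exact version (x is checked on the same cycle as
    command1); simple=True models the simpler version (checking starts one
    cycle later).
    """
    start = 1 if simple else 0
    for cycle in range(start, len(x_samples)):
        if x_samples[cycle] != 0:
            return cycle
    return None


for samples in ([1, 1, 1], [0, 0, 4], [5, 0, 0, 2]):
    print(samples,
          "exact:", cycle_of_command3(samples, simple=False),
          "simple:", cycle_of_command3(samples, simple=True))
```

The last sample sequence shows the risk: if x is non-zero only on the first cycle, the simpler version misses it and waits until x becomes non-zero again.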


Matters


I have been getting lots of emails asking for interview questions, solutions, and other questions pertaining to Digital Electronics.

I'm maintaining this blog as a hobby. If you want more than I am willing or able to post because of time constraints, I will do so for a fee.

$1 - $5 flat is what I am asking.

You may contact me at "onenanometer@yahoo.com".

Thanks for visiting!

A nice site for basics on Digital Logic Design


http://www.allaboutcircuits.com/ (look up Volume IV - Digital)

Summary of topics covered...

Chapter 1: NUMERATION SYSTEMS
Chapter 2: BINARY ARITHMETIC
Chapter 3: LOGIC GATES
Chapter 4: SWITCHES
Chapter 5: ELECTROMECHANICAL RELAYS
Chapter 6: LADDER LOGIC
Chapter 7: BOOLEAN ALGEBRA
Chapter 8: KARNAUGH MAPPING
Chapter 9: COMBINATIONAL LOGIC FUNCTIONS
Chapter 10: MULTIVIBRATORS
Chapter 11: COUNTERS
Chapter 12: SHIFT REGISTERS
Chapter 13: DIGITAL-ANALOG CONVERSION
Chapter 14: DIGITAL COMMUNICATION
Chapter 15: DIGITAL STORAGE (MEMORY)
Chapter 16: PRINCIPLES OF DIGITAL COMPUTING

1's complement and 2's complement


In general, we (human beings) express negative numbers by placing a minus (-) sign at the left end of the number. Similarly, when representing integers in binary format, we can let the left-most bit be the sign bit. If the left-most bit is a zero, the integer is positive; if it is a one, it is negative.

To make it easy to design computers which do integer arithmetic, integers should obey the following rules:

(1) Zero is positive and -0 = 0.
(2) The top-most bit should tell us the sign of the integer.
(3) The negative of a negative integer is the original integer, i.e., -(-55) is 55.
(4) x - y should give the same result as x + (-y). That is, 8 - 3 should give us the same result as 8 + (-3).
(5) Negative and positive numbers shouldn't be treated in different ways when we do multiplication and division with them.


2s complement has become the standard method of storing signed binary integers. For an n-bit word it allows the representation of numbers in the range -(2^(n-1)) to 2^(n-1) - 1, and has the major advantage of only having one encoding for 0.

A simple and elegant way to represent integers that obeys these rules is called 2s complement. The 2s complement of an integer is calculated by inverting every bit (1 to 0 and 0 to 1), then adding 1 to the result.
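
The invert-and-add-one rule and the rules listed earlier can be tried out directly. The Python sketch below assumes an 8-bit word; twos_complement and to_signed are illustrative helper names, not a standard library API.

```python
def twos_complement(value: int, bits: int) -> int:
    """Negate `value` within a `bits`-wide word: invert every bit, then add 1."""
    mask = (1 << bits) - 1
    return ((value ^ mask) + 1) & mask


def to_signed(pattern: int, bits: int) -> int:
    """Interpret a `bits`-wide bit pattern as a signed 2s complement integer."""
    sign_bit = 1 << (bits - 1)
    return pattern - (1 << bits) if pattern & sign_bit else pattern


BITS = 8
minus_three = twos_complement(3, BITS)
print(format(minus_three, "08b"), to_signed(minus_three, BITS))

# Rule (4): 8 - 3 gives the same result as 8 + (-3), discarding the carry out.
print(to_signed((8 + minus_three) & 0xFF, BITS))
```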

1's complement addition is distinguished from the 2's complement addition typically encountered in (unsigned) computer arithmetic by how overflow bits are handled. 1's complement overflow bits are carried around back into the sum, while 2's complement overflow bits are discarded. In general, the inverse of a number under a given mathematical operation is the value which, when operated on with that number, returns the identity element. The 1's complement additive inverse of a number is its bitwise complement (replace 0s with 1s and 1s with 0s). This relies on a number and its complement summing to zero (the additive identity element). Actually they sum to negative zero: 1's complement addition has two identity elements. Recall that an identity element under a given operation is a value which leaves any other number unchanged when the operation is applied. Under 1's complement arithmetic, the addition of either zero (all 0's) or negative zero (all 1's) to a number will generate a sum equal to the original number.

1's complement addition is both associative and commutative (it forms an Abelian group over the unsigned integers), so it is immaterial whether an identity element is added to a number or the number is added to an identity element, or whether the number operates on its inverse or the inverse operates on the number--both arrangements have the same result. Also note that the operation of subtraction is equivalent to adding the inverse (complement) of the number.
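
The end-around carry and the two identity elements can be seen in a few lines of code. The Python sketch below assumes an 8-bit word; ones_complement_add is an illustrative helper name.

```python
def ones_complement_add(a: int, b: int, bits: int) -> int:
    """Add two `bits`-wide values, folding any carry out back into the sum."""
    mask = (1 << bits) - 1
    total = a + b
    while total >> bits:                       # end-around carry
        total = (total & mask) + (total >> bits)
    return total


BITS = 8
MASK = 0xFF
x = 5
print(ones_complement_add(x, 0x00, BITS))        # adding zero
print(ones_complement_add(x, MASK, BITS))        # adding negative zero (all 1's)
print(ones_complement_add(x, x ^ MASK, BITS))    # number plus its complement
print(ones_complement_add(5, 3 ^ MASK, BITS))    # 5 - 3 as 5 + complement(3)
```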

negative setup and hold time


A negative setup and hold condition is a very interesting proposition in static timing analysis. Support for this type of condition was added to Verilog only in the late 90's; negative limits are specified with the $setuphold timing check.

The basic idea is something like this:
Consider a module with an ideal flop in it. There is a data path (from the primary inputs of the module to D of the flop) and a clock path (from the primary inputs to CLK of the flop). Suppose the data path delay is DD and the clock path delay is 0. If we take the clock pulse reaching the primary input of the module as the reference time, the clock pulse reaches the CLK pin of the flop at 0, and the data pulse reaches the D pin DD after it arrives at the module. Therefore, for the setup check to be met, the data pulse must reach the primary inputs of the module at -DD, which means the setup requirement is DD.

Now consider a clock path delay of CD. The clock pulse now reaches the flop only after time CD. This means the data pulse need not begin so early; it has to begin at -DD+CD (the pulse is simply shifted right by CD). The setup requirement is now DD-CD. In this case, if CD>DD, the setup requirement becomes negative, which means the data pulse can reach the primary input of the module after the clock pulse has reached there.

Similarly for hold:
Consider that the data delay is 0 and the clock delay is CD. The data must not change for at least CD for the flop to be able to latch it, so the hold requirement is CD. Now consider a data delay of DD. The data now has to stay unchanged only for CD-DD, which is the new hold requirement. If DD>CD, the hold requirement is negative. If we analyse these results mathematically, we can see that setup relation + hold relation = 0.
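
The two relations derived above are simple enough to tabulate. The Python sketch below only restates that arithmetic for an ideal flop inside a module; boundary_requirements is a hypothetical helper, and the delay values (in picoseconds) are made up for illustration.

```python
def boundary_requirements(dd: int, cd: int):
    """Setup and hold requirements seen at the module inputs for an ideal flop.

    dd: data path delay, cd: clock path delay (same time unit).
    """
    setup_req = dd - cd
    hold_req = cd - dd
    return setup_req, hold_req


for dd, cd in [(300, 0), (300, 100), (300, 500)]:
    setup_req, hold_req = boundary_requirements(dd, cd)
    print(f"DD={dd} CD={cd} -> setup={setup_req} hold={hold_req} sum={setup_req + hold_req}")
```

The last case has CD>DD, so the setup requirement is negative while the hold requirement is positive, and in every case the two add up to zero.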

Physically, this implies that an infinitesimally short pulse (a delta pulse) can be captured, which is of course not possible. A more accurate model would be:

setup_val < DD-CD (for setup to be met, the time at which data begins should be at least DD-CD before 0)
hold_val < CD-DD (for hold to be met, the time for which the data should be stable should always be greater than hold_val)

The module with an ideal flop that we described is actually a model of a real-world flop. In an actual flop there is more than one data path and more than one clock path. Therefore the more accurate description would be:

DDmax-CDmin >= setup_val (for setup to be met)
CDmax-DDmin >= hold_val (for hold to be met)

These kinds of relationships, especially the ones where a negative relation can hold, cause problems in simulators. Take for example a data pulse which rises at 0.0 and falls at 2.0. The clock pulse rises at 3.0, and the data delay is 1.0. Assume the origin is at the clock pulse (3.0). The data rise is then at -3.0 and the fall is at -1.0.

The setup relationship may be specified as 2.0, which means data should be present at 0.0-2.0=-2.0. Data will arrive at -3.0+DD-CD=-3.0+1.0-0.0=-2.0 (setup OK). The hold relationship may be specified as -1.0, which means data must not change till 0.0+(-1.0)=-1.0. According to our relationship, data will not change till 0.0+CD-DD=0.0-1.0=-1.0.

All looks hunky dory... but there is a catch. There is no problem with the timing checks; however, in software the simulator would capture the falling edge at 2.0 rather than the high level. So the simulator will get functionally incorrect results, though the timing is accurate. If both setup and hold relationships were positive, this would never have happened.

So now what? Very simple, actually: instead of taking an ideal clock, the simulator takes a delayed clock. All calculations are then done with respect to this delayed clock (in the above example the clock is delayed by -1 w.r.t. the data), so the simulator will not latch the falling edge.

whew...