Advanced Chip Design Verilog Book

By

Kishore Mishra

Synchronization Techniques

6.14        Synchronization Techniques

Synchronizing data and transferring it seamlessly across different clock domains is a common requirement in chip design. It must be done correctly to avoid glitches, system hangs, and data corruption. Such failures are hard to detect and occur irregularly, which is why the design has to be correct by construction. Several methods can be used for data synchronization, and it is important to understand them so that the right method is chosen for a particular situation. Consider a real chip: the South Bridge in a computer, which connects to the outside world through interfaces such as PCI Express, USB, SATA, Gigabit Ethernet, and many more. These interfaces each work in their own clock domain, different from one another. Data from the peripheral devices comes into the chip and ultimately goes to the North Bridge through a common interface, so proper data synchronization has to take place.

The various techniques that can be employed are broadly classified as follows.

  • Using FIFO for asynchronous domains
  • Using full handshake for asynchronous domains
  • Deterministic delay for fixed phase, synchronous domains
  • Plesiochronous domains

 

6.14.1           Data Synchronization Using FIFO

When two asynchronous clock domains are involved and data changes in almost every clock period (bursts or data packets), dual-port FIFOs are best suited. A FIFO has two ports: one to write incoming data into it and one to read data out of it. Each port operates in its own clock domain and has its own pointer (address) for writing or reading data. Because each port operates in its own clock domain, write and read operations can happen independently of each other without any glitch. When the FIFO becomes full, writing is held off until room becomes available again. Similarly, reading stops when the FIFO becomes empty and resumes once more data has been written. The two main flags involved in FIFO operation are the full flag and the empty flag. A more detailed description of FIFO operation is provided later in chapter eight.

 

6.14.2           Data Synchronization with Full Handshake

FIFOs are good for synchronizing burst traffic, but in some applications data has to be transferred across clock domains only occasionally. FIFOs also require more area and fairly involved control logic, and they add some latency. Therefore, for applications where data is not changing continuously, a handshake approach is better suited.

These are the steps involved in a full-handshake data transfer scheme.

  • Let us denote the transmit side with the prefix t_ and the receive side with the prefix r_. Let us call the transmit clock ‘tclk’ and the receive clock ‘rclk’. Data has to be transferred from the tclk domain to the rclk domain.
  • When the data to be transferred is ready, the transmit side asserts t_rdy signal. This signal must come directly from a tclk flop.
  • When t_rdy is asserted, t_data is stable and needs to be kept unchanged.
  • The receiving side synchronizes the t_rdy control signal into the rclk domain through a double synchronizer. Let us call the synchronized signal t_rdy_rclk.
  • The receiving side can now use t_rdy_rclk to capture t_data into its rclk domain safely. Let us call the captured data t_data_rclk. This data is safe to use in the rclk domain because t_data was held stable while it was sampled.
  • Then, the receive side asserts the r_ack signal and keeps it asserted. The r_ack signal must come directly from an rclk flop.
  • The transmit side synchronizes the r_ack signal into the tclk domain through a double synchronizer. Let us call this signal r_ack_tclk.
  • The steps up to this point are called a “Half Handshake”, because the transmit side does not wait for r_ack_tclk to go low before it drives the next data.
  • The advantage of a half handshake is that it is faster than a full handshake. However, it has to be employed carefully; used improperly, it can result in incorrect operation.
  • A half handshake works well when transferring data from a slower clock domain to a faster clock domain, because the receiving side completes its part of the operation sooner. If the transfer has to take place from a faster clock domain to a slower one, a full handshake is required.
  • When r_ack_tclk is detected high, the transmit side drives the t_rdy signal low.
  • When t_rdy_rclk goes low, the receive side drives r_ack low.
  • When the transmit side sees r_ack_tclk low, the full handshake is complete, and the transmit side can initiate another data transfer.
  • As we can see, the full handshake takes longer and may not be fast enough for burst transfers. On the other hand, a full handshake is foolproof and can safely transfer data between two clock domains of any frequency.
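
The full-handshake steps above can be sketched in Verilog as follows. This is a minimal illustration under stated assumptions, not code from the book: the module name full_handshake_sync, the WIDTH parameter, and the t_send, t_data_in, t_busy, and r_valid signals are hypothetical additions that make the example self-contained. The names t_rdy, t_data, r_ack, t_rdy_rclk, r_ack_tclk, and t_data_rclk follow the steps.

```verilog
// Hedged sketch of a full handshake (tclk -> rclk); not the author's code.
// t_send, t_data_in, t_busy, r_valid and WIDTH are hypothetical names.
module full_handshake_sync
            (tclk, resetb_tclk, rclk, resetb_rclk,
             t_send, t_data_in, t_busy,
             t_data_rclk, r_valid);

parameter WIDTH = 8;

input               tclk;
input               resetb_tclk;
input               rclk;
input               resetb_rclk;
input               t_send;        // one-tclk request, ignored while t_busy
input  [WIDTH-1:0]  t_data_in;
output              t_busy;        // high until the full handshake completes
output [WIDTH-1:0]  t_data_rclk;   // data captured in the rclk domain
output              r_valid;       // one-rclk pulse: t_data_rclk was updated

reg                 t_rdy;
reg    [WIDTH-1:0]  t_data;
reg                 r_ack;
reg                 t_rdy_sync1, t_rdy_rclk;
reg                 r_ack_sync1, r_ack_tclk;
reg    [WIDTH-1:0]  t_data_rclk;
reg                 r_valid;

// A new transfer may start only after r_ack_tclk has returned low
assign t_busy = t_rdy | r_ack_tclk;

// Transmit side: t_rdy and t_data come directly from tclk flops
always @(posedge tclk or negedge resetb_tclk)
  begin
    if (!resetb_tclk)
      begin
        t_rdy       <= 1'b0;
        t_data      <= {WIDTH{1'b0}};
        r_ack_sync1 <= 1'b0;
        r_ack_tclk  <= 1'b0;
      end
    else
      begin
        r_ack_sync1 <= r_ack;          // double synchronizer for r_ack
        r_ack_tclk  <= r_ack_sync1;
        if (!t_busy && t_send)
          begin
            t_data <= t_data_in;       // held stable while t_rdy is high
            t_rdy  <= 1'b1;
          end
        else if (t_rdy && r_ack_tclk)
          t_rdy <= 1'b0;               // acknowledge seen: drop t_rdy
      end
  end

// Receive side: r_ack comes directly from an rclk flop
always @(posedge rclk or negedge resetb_rclk)
  begin
    if (!resetb_rclk)
      begin
        t_rdy_sync1 <= 1'b0;
        t_rdy_rclk  <= 1'b0;
        r_ack       <= 1'b0;
        r_valid     <= 1'b0;
        t_data_rclk <= {WIDTH{1'b0}};
      end
    else
      begin
        t_rdy_sync1 <= t_rdy;          // double synchronizer for t_rdy
        t_rdy_rclk  <= t_rdy_sync1;
        r_ack       <= t_rdy_rclk;     // assert with t_rdy_rclk, drop with it
        r_valid     <= t_rdy_rclk & !r_ack;
        if (t_rdy_rclk && !r_ack)
          t_data_rclk <= t_data;       // t_data is stable at this point
      end
  end

endmodule
```

In this sketch the transmit logic would pulse t_send only while t_busy is low, and r_valid marks the rclk cycle in which t_data_rclk holds the newly transferred value. Only the single-bit t_rdy and r_ack signals pass through synchronizers; the data bus is sampled directly because it is held stable for the whole time t_rdy is asserted.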

 

6.14.3           Pulse Synchronizer

A pulse synchronizer takes a single pulse in the source clock domain and generates a single pulse in the destination clock domain. Internally, it uses the full handshake mechanism to generate the output pulse. Before we discuss how it works, let us look at where it can be used. Suppose a state machine wants to update a value or set a status bit in a different clock domain. It could drive and hold a signal, as is done in a full handshake. However, the state machine would then be tied up until the handshake is complete.

Alternatively, the state machine can generate an update pulse in the source clock domain and move on to other work. The pulse synchronizer takes the pulse and finishes the rest of the job. Note, however, that there must be a sufficient gap between two consecutive pulses: the pulse synchronizer needs time to complete the full handshake before another pulse comes in. Otherwise, the logic will not work correctly. Here are the steps and the Verilog code for the pulse synchronizer logic.

module  pulse_synchronizer
            (clksrc,
             resetb_clksrc,
             clkdest,
             resetb_clkdest,
             pulse_src,
             pulse_dest);

// ********************************************

input   clksrc;
input   resetb_clksrc;
input   clkdest;
input   resetb_clkdest;
input   pulse_src;      // pulse in source clock domain
output  pulse_dest;     // pulse in destination clock domain

 

Steps

  • When the source pulse (pulse_src) is asserted, set a signal in the source clock domain and hold it asserted (let us call it sig_stretched)
  • Double-synchronize sig_stretched into the destination clock domain (let us call it sig_stretched_dest)
  • Double-synchronize sig_stretched_dest back into the source clock domain (let us call it sig_stretched_ack)
  • Generate a pulse on the rising edge of sig_stretched_ack, and use this feedback pulse to drive sig_stretched to zero (this completes the handshake)
  • Generate a pulse in the destination clock domain on the rising edge of sig_stretched_dest (call it pulse_dest)

reg     sig_stretched;
wire    sig_stretched_nxt;
reg     sig_stretched_sync1, sig_stretched_dest;
reg     sig_stretched_dest_d1;
reg     sig_stretched_ack_pre, sig_stretched_ack;
reg     sig_stretched_ack_d1;
wire    sig_stretched_ack_edge;
wire    pulse_dest;

// Set on pulse_src, hold, and clear on the acknowledge edge
assign  sig_stretched_nxt = sig_stretched_ack_edge ? 1'b0 :
                            (pulse_src ? 1'b1 : sig_stretched);

always @(posedge clksrc or negedge resetb_clksrc)
  begin
    if (!resetb_clksrc)
      sig_stretched <= 1'b0;
    else
      sig_stretched <= sig_stretched_nxt;
  end

// First two flops are for synchronizing into the destination clock domain.
// Third flop is for pulse generation
always @(posedge clkdest or negedge resetb_clkdest)
  begin
    if (!resetb_clkdest)
      begin
        sig_stretched_sync1   <= 1'b0;
        sig_stretched_dest    <= 1'b0;
        sig_stretched_dest_d1 <= 1'b0;
      end
    else
      begin
        sig_stretched_sync1   <= sig_stretched;
        sig_stretched_dest    <= sig_stretched_sync1;
        sig_stretched_dest_d1 <= sig_stretched_dest;
      end
  end

// First two flops are for synchronizing back to source clock domain.
// Third flop is for edge detection
always @(posedge clksrc or negedge resetb_clksrc)
  begin
    if (!resetb_clksrc)
      begin
        sig_stretched_ack_pre <= 1'b0;
        sig_stretched_ack     <= 1'b0;
        sig_stretched_ack_d1  <= 1'b0;
      end
    else
      begin
        sig_stretched_ack_pre <= sig_stretched_dest;
        sig_stretched_ack     <= sig_stretched_ack_pre;
        sig_stretched_ack_d1  <= sig_stretched_ack;
      end
  end

// Rising edge of the acknowledge in the source clock domain
assign  sig_stretched_ack_edge = sig_stretched_ack &
                                 !sig_stretched_ack_d1;

// Pulse generation in destination clock domain
assign  pulse_dest = sig_stretched_dest &
                     !sig_stretched_dest_d1;

endmodule
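
To see the pulse-spacing requirement in practice, the module can be exercised with a small testbench. The one below is a hypothetical sketch, not part of the book: it assumes the pulse_synchronizer module above is compiled together with it, and the clock periods (10 ns source, 36 ns destination), the shared reset, and the 30-cycle gap are arbitrary choices for illustration.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench; assumes pulse_synchronizer (above) is compiled with it.
module pulse_synchronizer_tb;

reg     clksrc, clkdest, resetb, pulse_src;
wire    pulse_dest;
integer pulse_count;

pulse_synchronizer dut
            (.clksrc(clksrc),
             .resetb_clksrc(resetb),
             .clkdest(clkdest),
             .resetb_clkdest(resetb),
             .pulse_src(pulse_src),
             .pulse_dest(pulse_dest));

always #5  clksrc  = ~clksrc;    // 10 ns source period (assumed)
always #18 clkdest = ~clkdest;   // 36 ns destination period (assumed)

// Count the pulses seen in the destination clock domain
always @(posedge clkdest)
  if (pulse_dest)
    pulse_count = pulse_count + 1;

// Drive one pulse that is exactly one source clock wide
task send_pulse;
  begin
    @(negedge clksrc) pulse_src = 1'b1;
    @(negedge clksrc) pulse_src = 1'b0;
  end
endtask

initial
  begin
    clksrc = 0; clkdest = 0; resetb = 0; pulse_src = 0; pulse_count = 0;
    #40 resetb = 1;
    send_pulse;
    repeat (30) @(posedge clksrc);   // gap so the handshake can complete
    send_pulse;
    repeat (30) @(posedge clksrc);
    $display("pulses seen in destination domain: %0d", pulse_count);
    $finish;
  end

endmodule
```

The gap has to cover the whole round trip: two destination clocks for sig_stretched to arrive, two to three source clocks for the acknowledge, and then the same again for both signals to return low. Shrinking the repeat count until the second pulse arrives while sig_stretched_dest is still high is a simple way to observe the failure mode the text warns about.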

 

6.14.4           Fixed Phase, Synchronous Domain

When the two clocks have the same frequency, or one is an integer multiple of the other, and there is a fixed, known phase between their rising edges, data can be transferred without a FIFO or a handshake protocol. Insert a deterministic delay in the data path (one that holds across PVT corners) that is large enough to push the data outside the setup and hold window of the receiving clock edge. At the beginning (after reset# deassertion), sample the data and select the amount of delay from a variable delay chain accordingly, so that the data is sampled correctly. Once the delay value has been determined, it holds good throughout.

This avoids the latency incurred when using a FIFO or a handshake. One example is DDR data transfer, where 1x data from the controller is transferred as 2x data. The wide-to-narrow and narrow-to-wide data transfer example presented earlier also works on the synchronous-frequency, fixed-phase principle.

6.14.5           Plesiochronous Clock Domains

In this scheme, the two clocks are very close in frequency but not exactly the same. This is because the transmit and receive clocks are generated separately; depending on the quality of the crystals used for clock generation, they can be very close. PCIe requires that the transmit and receive clocks be accurate to within 300 ppm. This means that over a relatively long period (1300+ clock cycles in PCIe, for example) the two clocks will be off by one cycle. Next, we discuss how to synchronize data in this type of clocking scheme.
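
As a rough worked check of how quickly two such clocks drift apart (a back-of-the-envelope sketch, not a figure taken from any specification): if the relative frequency offset between the clocks is δ, they slip by one full cycle after about 1/δ cycles. The symbols δ and N_slip are introduced here only for illustration, and the worst case assumes the two clocks are each off by 300 ppm in opposite directions.

```latex
\[
N_{\text{slip}} \approx \frac{1}{\delta}
\]
\[
\delta = 300\ \text{ppm} = 300 \times 10^{-6}
\;\Rightarrow\;
N_{\text{slip}} \approx \frac{1}{300 \times 10^{-6}} \approx 3333\ \text{cycles}
\]
\[
\delta = 600\ \text{ppm} \ (\text{worst case: } +300 \text{ and } -300)
\;\Rightarrow\;
N_{\text{slip}} \approx \frac{1}{600 \times 10^{-6}} \approx 1667\ \text{cycles}
\]
```

Either way, a one-cycle slip accumulates over a thousand or more clock cycles, which is the same order of magnitude as the period quoted above.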

This scheme is generally used in serial protocols (PCIe, SATA), where an elasticity FIFO is used to synchronize the two clock domains. The elasticity FIFO not only takes care of synchronizing the different frequencies but also has a mechanism to absorb the data rate mismatch over longer periods. The PCIe and SATA specifications require the transmitter to insert null data periodically into the transmit data stream. Depending on the frequency of the receiving side, these null symbols are either dropped from or added to the elasticity FIFO.



Copyright 2013 Advanced Chip Design Book. All rights reserved.
