Data synchronization and seamless data transfer across different clock domains are common requirements in chip design. They must be done correctly to avoid glitches, system hangs, and data corruption. Such problems are hard to detect and occur irregularly, which is why the synchronization logic must be correct by design. Several methods can be employed for data synchronization, and it is important to understand them so that the right method is used in each situation. Consider a real chip: the South Bridge in a computer, which connects to the outside world through interfaces such as PCI Express, USB, SATA, Gigabit Ethernet, and many more. Each of these interfaces works in its own clock domain, different from the others. Data from the peripheral devices comes into the chip and ultimately goes to the North Bridge through a common interface, so proper data synchronization must take place along the way.
The various techniques that can be employed are broadly classified as follows.
When two asynchronous clock domains are involved and data changes in almost every clock period (bursts or data packets), dual-port FIFOs are best suited. A FIFO has two ports: one to write incoming data into it and one to read data out of it. Each port operates in a separate clock domain and has its own pointer (address) for writing or reading data. Since each port operates in its own clock domain, write and read operations can happen independently of each other without any glitch. When the FIFO becomes full, writing is held off until room becomes available. Similarly, reading stops when the FIFO becomes empty and resumes when more data is written into the FIFO. The two main flags involved in FIFO operation are the full flag and the empty flag. A more detailed description of FIFO operation is provided later in chapter eight.
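As a rough illustration of the two-port arrangement described above, the following is a minimal sketch of a dual-clock FIFO. It assumes Gray-coded pointers passed through two-flop synchronizers to generate the full and empty flags, which is one common approach and not necessarily the one developed later in the book. The module name, parameters (DW, AW), and all signal names are hypothetical, and the sketch uses Verilog-2001 port syntax.

```verilog
// Sketch only: dual-clock FIFO with Gray-coded pointers (Verilog-2001).
// DW, AW and all names are illustrative assumptions; requires AW >= 2.
module async_fifo_sketch #(
    parameter DW = 8,   // data width
    parameter AW = 4    // address width, depth = 2**AW
) (
    input           wclk, wresetb, wr_en,
    input  [DW-1:0] wdata,
    output          full,
    input           rclk, rresetb, rd_en,
    output [DW-1:0] rdata,
    output          empty
);
    reg [DW-1:0] mem [0:(1<<AW)-1];

    // Pointers are one bit wider than the address so that
    // full and empty can be told apart.
    reg [AW:0] wptr_bin, wptr_gray;        // write pointer (wclk domain)
    reg [AW:0] rptr_bin, rptr_gray;        // read pointer  (rclk domain)
    reg [AW:0] rptr_gray_s1, rptr_gray_w;  // read pointer synchronized to wclk
    reg [AW:0] wptr_gray_s1, wptr_gray_r;  // write pointer synchronized to rclk

    wire wr_ok = wr_en & ~full;            // write accepted this cycle
    wire rd_ok = rd_en & ~empty;           // read accepted this cycle

    wire [AW:0] wptr_bin_nxt = wptr_bin + wr_ok;
    wire [AW:0] rptr_bin_nxt = rptr_bin + rd_ok;

    // Full: Gray write pointer equals the synchronized read pointer
    // with its two MSBs inverted. Empty: the two Gray pointers match.
    assign full  = (wptr_gray == {~rptr_gray_w[AW:AW-1], rptr_gray_w[AW-2:0]});
    assign empty = (rptr_gray == wptr_gray_r);

    // Asynchronous read of the current read location.
    assign rdata = mem[rptr_bin[AW-1:0]];

    always @(posedge wclk)
        if (wr_ok)
            mem[wptr_bin[AW-1:0]] <= wdata;

    // Write clock domain: advance the write pointer, synchronize the read pointer.
    always @(posedge wclk or negedge wresetb)
        if (!wresetb) begin
            wptr_bin     <= 0;
            wptr_gray    <= 0;
            rptr_gray_s1 <= 0;
            rptr_gray_w  <= 0;
        end else begin
            wptr_bin     <= wptr_bin_nxt;
            wptr_gray    <= (wptr_bin_nxt >> 1) ^ wptr_bin_nxt;  // binary to Gray
            rptr_gray_s1 <= rptr_gray;
            rptr_gray_w  <= rptr_gray_s1;
        end

    // Read clock domain: advance the read pointer, synchronize the write pointer.
    always @(posedge rclk or negedge rresetb)
        if (!rresetb) begin
            rptr_bin     <= 0;
            rptr_gray    <= 0;
            wptr_gray_s1 <= 0;
            wptr_gray_r  <= 0;
        end else begin
            rptr_bin     <= rptr_bin_nxt;
            rptr_gray    <= (rptr_bin_nxt >> 1) ^ rptr_bin_nxt;  // binary to Gray
            wptr_gray_s1 <= wptr_gray;
            wptr_gray_r  <= wptr_gray_s1;
        end
endmodule
```

Gray coding is used here because only one pointer bit changes per increment, so a pointer sampled mid-transition in the other clock domain resolves to either the old or the new value. The synchronizer delay makes the flags conservative: full and empty may stay asserted slightly longer than strictly necessary, but the FIFO does not overflow or underflow because of it.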
FIFOs are good for synchronizing burst traffic, but there are applications where data has to be transferred across clock domains only occasionally. FIFOs also cost more area, need fairly involved control logic, and add some latency. Therefore, for applications where data is not changing continuously, a handshake approach is better suited.
These are the steps involved in a full-handshake data transfer scheme.
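A minimal sketch of one way to build such a full (four-phase) handshake is given here. The source raises a request and holds the data stable, the destination captures the data once it sees the synchronized request, and the destination's synchronized copy of the request is returned to the source as the acknowledge. The module name, the ports (send, busy, valid_dest), and the data width parameter are hypothetical choices made for illustration, written in Verilog-2001 port syntax.

```verilog
// Sketch only: four-phase (full) handshake for occasional data transfer.
// Module, port and signal names are illustrative assumptions.
module handshake_xfer_sketch #(
    parameter DW = 8                   // data width (assumed)
) (
    input               clksrc, resetb_clksrc,
    input               send,         // request to transfer data_src; ignored while busy
    input  [DW-1:0]     data_src,
    output              busy,         // high until the handshake has fully completed
    input               clkdest, resetb_clkdest,
    output reg [DW-1:0] data_dest,
    output reg          valid_dest    // one-cycle pulse: data_dest was just updated
);
    reg          req;                              // request, held by the source
    reg [DW-1:0] data_hold;                        // kept stable while req is high
    reg          req_sync1, req_dest, req_dest_d1; // req synchronized into clkdest
    reg          ack_sync1, ack_src;               // ack synchronized into clksrc

    assign busy = req | ack_src;

    // Source domain: raise req and hold the data, drop req once ack is seen.
    always @(posedge clksrc or negedge resetb_clksrc)
        if (!resetb_clksrc) begin
            req       <= 1'b0;
            data_hold <= {DW{1'b0}};
            ack_sync1 <= 1'b0;
            ack_src   <= 1'b0;
        end else begin
            ack_sync1 <= req_dest;     // destination's copy of req doubles as ack
            ack_src   <= ack_sync1;
            if (send && !busy) begin
                req       <= 1'b1;
                data_hold <= data_src;
            end else if (ack_src)
                req <= 1'b0;
        end

    // Destination domain: synchronize req, capture data on its rising edge.
    always @(posedge clkdest or negedge resetb_clkdest)
        if (!resetb_clkdest) begin
            req_sync1   <= 1'b0;
            req_dest    <= 1'b0;
            req_dest_d1 <= 1'b0;
            data_dest   <= {DW{1'b0}};
            valid_dest  <= 1'b0;
        end else begin
            req_sync1   <= req;
            req_dest    <= req_sync1;
            req_dest_d1 <= req_dest;
            valid_dest  <= req_dest & ~req_dest_d1;
            if (req_dest & ~req_dest_d1)
                data_dest <= data_hold;   // stable: source holds it until ack
        end
endmodule
```

Only the single-bit req and ack signals pass through synchronizers. The data bus itself is not synchronized; it is safe to sample because the source keeps data_hold unchanged from the time req rises until the acknowledge has returned.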
A pulse synchronizer takes a single pulse in the source clock domain and generates a single pulse in the destination clock domain. It uses the full handshake mechanism internally to generate the output pulse. Before we discuss how it works, let us talk about where it can be used. Suppose a state machine wants to update a value or set a status bit in a different clock domain. It could drive and hold a signal, as is done in a full handshake interaction. However, the state machine would then be tied up until the handshake is complete.
One way for the state machine to handle this is to generate an update pulse in the source clock domain and move on to other work. The pulse synchronizer takes the pulse and finishes the rest of the work. Note, however, that there must be a sufficient gap between two consecutive pulses: the pulse synchronizer still needs time to complete the full handshake before another pulse comes in. Otherwise, the logic will not work correctly. Here are the steps and the Verilog code for the pulse synchronizer logic.
module pulse_synchronizer
  (clksrc,
   resetb_clksrc,
   clkdest,
   resetb_clkdest,
   pulse_src,
   pulse_dest);
// ********************************************
input  clksrc;
input  resetb_clksrc;
input  clkdest;
input  resetb_clkdest;
input  pulse_src;   // pulse in source clock domain
output pulse_dest;  // pulse in destination clock domain
reg  sig_stretched;
wire sig_stretched_nxt;
reg  sig_stretched_sync1, sig_stretched_dest;
reg  sig_stretched_dest_d1;
reg  sig_stretched_ack_pre, sig_stretched_ack;
reg  sig_stretched_ack_d1;
wire sig_stretched_ack_edge;
wire pulse_dest;
assign sig_stretched_nxt = sig_stretched_ack_edge ? 1'b0 :
                           (pulse_src ? 1'b1 : sig_stretched);
always @(posedge clksrc or negedge resetb_clksrc)
  begin
    if (!resetb_clksrc)
      sig_stretched <= 1'b0;
    else
      sig_stretched <= sig_stretched_nxt;
  end
// First two flops for synchronizing and the third one for pulse generation
always @(posedge clkdest or negedge resetb_clkdest)
  begin
    if (!resetb_clkdest)
      begin
        sig_stretched_sync1   <= 1'b0;
        sig_stretched_dest    <= 1'b0;
        sig_stretched_dest_d1 <= 1'b0;
      end
    else
      begin
        sig_stretched_sync1   <= sig_stretched;
        sig_stretched_dest    <= sig_stretched_sync1;
        sig_stretched_dest_d1 <= sig_stretched_dest;
      end
  end
// First two flops are for synchronizing back to source clock domain.
// Third flop is for edge detection
always @(posedge clksrc or negedge resetb_clksrc)
  begin
    if (!resetb_clksrc)
      begin
        sig_stretched_ack_pre <= 1'b0;
        sig_stretched_ack     <= 1'b0;
        sig_stretched_ack_d1  <= 1'b0;
      end
    else
      begin
        sig_stretched_ack_pre <= sig_stretched_dest;
        sig_stretched_ack     <= sig_stretched_ack_pre;
        sig_stretched_ack_d1  <= sig_stretched_ack;
      end
  end
assign sig_stretched_ack_edge = sig_stretched_ack &
                                !sig_stretched_ack_d1;
// Pulse generation in destination clock domain
assign pulse_dest = sig_stretched_dest &
                    !sig_stretched_dest_d1;
endmodule
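To exercise the pulse synchronizer, a small simulation testbench along the following lines could be used. It is only a sketch: the clock periods, the shared reset, the 40-cycle gap between pulses, and the testbench name are assumptions chosen for illustration, and it must be compiled together with the pulse_synchronizer module above.

```verilog
// Sketch only: simple testbench for pulse_synchronizer (Verilog-2001).
// Clock periods and the gap between pulses are assumed values.
`timescale 1ns/1ps
module pulse_synchronizer_tb;
    reg  clksrc    = 1'b0;
    reg  clkdest   = 1'b0;
    reg  resetb    = 1'b0;     // one reset shared by both domains, for simplicity
    reg  pulse_src = 1'b0;
    wire pulse_dest;
    integer pulses_seen = 0;

    pulse_synchronizer dut (
        .clksrc         (clksrc),
        .resetb_clksrc  (resetb),
        .clkdest        (clkdest),
        .resetb_clkdest (resetb),
        .pulse_src      (pulse_src),
        .pulse_dest     (pulse_dest));

    always #5  clksrc  = ~clksrc;    // 10 ns source clock period (assumed)
    always #13 clkdest = ~clkdest;   // 26 ns destination clock period (assumed)

    // Count destination pulses as the destination logic would sample them.
    always @(posedge clkdest)
        if (pulse_dest)
            pulses_seen = pulses_seen + 1;

    // Emit a one-cycle source pulse, then wait long enough for the
    // internal handshake to finish before the next pulse.
    task send_pulse;
        begin
            @(posedge clksrc); pulse_src <= 1'b1;
            @(posedge clksrc); pulse_src <= 1'b0;
            repeat (40) @(posedge clksrc);
        end
    endtask

    initial begin
        repeat (3) @(posedge clksrc);
        resetb = 1'b1;
        send_pulse;
        send_pulse;
        send_pulse;
        $display("sent 3 pulses, saw %0d at destination", pulses_seen);
        $finish;
    end
endmodule
```

Shrinking the repeat count in send_pulse to a few cycles is a simple way to observe the failure mode noted earlier: a source pulse that arrives before the handshake has completed can be lost.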
When the two clocks have the same frequency, or one is an integral multiple of the other, and there is a fixed, known phase between their rising edges, data can be transferred without a FIFO or a handshake protocol. Insert a deterministic delay in the data path (one that holds across PVT corners), large enough to push data transitions outside the setup and hold window of the receiving clock edge. At the beginning (after reset# deassertion), sample the data and, based on the result, select the amount of delay from a variable delay chain so that the data is sampled correctly. Once determined, the delay value holds good from then on.
This avoids the latency incurred when using a FIFO or a handshake. One example is DDR data transfer, where 1x data from the controller is transferred as 2x data. The wide-to-narrow and narrow-to-wide data transfer example presented earlier also works on the synchronous-frequency, fixed-phase principle.
In this scheme, the two clocks are very close in frequency but not exactly the same. This is because the transmit and receive clocks are generated separately, and how closely they match depends on the quality of the crystals used for clock generation. PCIe, for example, requires the transmit and receive clocks to be accurate to within 300 ppm. This means that over a relatively long period (1300+ clock cycles in PCIe, for example) the two clocks drift apart by one cycle. Next, we discuss how to synchronize data in this type of clocking scheme.
This scheme is generally used in serial protocols (PCIe, SATA), where an elasticity FIFO synchronizes the two clock domains. The elasticity FIFO not only handles the crossing between the two slightly different frequencies but also has a mechanism to absorb the data rate mismatch that accumulates over longer periods. The PCIe and SATA specifications require the transmitter to insert null data periodically into the transmit data stream. Depending on the frequency of the receiving side, these null symbols are either dropped from or added to the elasticity FIFO.
Copyright 2013 Advanced Chip Design Book. All rights reserved.