> REPLACE THIS LINE WITH YOUR PAPER IDENTIFICATION NUMBER (DOUBLE-CLICK HERE TO EDIT) <

# A 64Gb/s Low-Power Transceiver for Short-Reach PAM-4 Electrical Links in 28nm FDSOI CMOS

E. Depaoli, H. Zhang, M. Mazzini, W. Audoglio, A. A. Rossi, G. Albasini, M. Pozzoni, S. Erba, E. Temporiti, A. Mazzanti, *Senior Member, IEEE* 

Abstract—A PAM-4 transceiver operating up to 64Gb/s in 28nm CMOS FDSOI for short-reach electrical links is presented. Receiver equalization relies on a flexible CTLE, providing a very accurate channel inversion through a transfer function that can be optimally adapted at low, mid and high frequency independently. The CTLE meets the performance requirements of CEI-56G-VSR without requiring DFE implementation. As a result, timing constraints for comparators in data and edge sampling paths may be relaxed by using track-and-hold stages, saving power consumption. At the maximum speed, the receiver draws 180mA from a 1V supply, corresponding to only 2.8mW/Gb/s. The transmitter embeds a flexible FFE which can be reconfigured to comply with legacy standards. A comparison between current- and voltage-mode TX drivers is presented, proving through experiments that the latter yields larger PAM-4 eye openings thanks to its intrinsically higher speed. The full transceiver (TX, RX and clock generation) operates from 16Gb/s to 64Gb/s in PAM-4, 8Gb/s to 32Gb/s in NRZ, and supports 2x and 4x oversampling to reduce the data rate down to 2Gb/s. A TX-to-RX link at 64Gb/s, across a 16.8dB-loss channel, reaches 10<sup>-12</sup> minimum BER and 0.19UI horizontal eye opening at BER=10<sup>-6</sup>, with 5.02mW/Gb/s power dissipation.

*Index Terms*—wireline transceiver, CMOS, FDSOI, analog transceiver, PAM-4, CTLE, Feed Forward Equalizer, serializer, 56Gb/s, CEI-56G-VSR.

### I. INTRODUCTION

THE constant growth of digitally intensive services, such as Internet of Things (IoT), multimedia on demand, cloud storage and cloud computing, is driving the continuous upgrade of telecommunication infrastructures and data-centers to support an exponential network traffic increase. New standards for electrical interconnects, addressing the need for higher communication speed, introduced 4-level Pulse Amplitude Modulation (PAM-4) in the migration path from 28Gb/s per lane to 56Gb/s and beyond (112Gb/s projects are currently in progress) [1]. An intense industry effort is presently underway toward the development of complete 56Gb/s PAM-4 transceivers [2-6] and building blocks at 112Gb/s are being




Fig. 1. Application space of 56Gb/s links.

investigated [7][8].

Compared to Non-Return-to-Zero (NRZ), each symbol in PAM-4 carries twice the information, thus theoretically limiting the spectral occupation to 50%. The more efficient use of the available link bandwidth, paired with a reduced clocking frequency and the continuous evolution of CMOS technologies, should enable an increase in link speed while limiting the overall system cost and the power dissipation normalized to bit rate. However, compared to NRZ, the design of PAM-4 transceivers entails many new challenges and trade-offs. The intrinsic 1/3 eye amplitude of PAM-4 leads to an SNR penalty, and transitions between non-adjacent levels with finite rise and fall times reduce the horizontal eye openings [9]. As a result, PAM-4 transmitters are required to deliver maximum signal swing with very wide equivalent bandwidth [10-13]. In addition, the

Manuscript Submitted in May 2018.

E. Depaoli, M. Mazzini, W. Audoglio, A. A. Rossi, G. Albasini, M. Pozzoni, S. Erba, E. Temporiti are with STMicroelectronics-Studio di Microelettronica Pavia – Italy (e-mail: <u>emanuele.depaoli@st.com</u>)

H. Zhang, A. Mazzanti are with the Department of Electrical, Computer and Biomedical Engineering, Università degli Studi di Pavia, 27100 Pavia, Italy (email: hongyang.zhang01@universitadipavia.it)

The final version of record is available at



Fig. 2. Block diagram of the RX (a) and clock generation (b).

linearity of the transceiver building blocks is extremely critical to avoid distortion of the four PAM levels and preserve signal integrity. Equalization also becomes more demanding. In fact, the multilevel signal suffers from increased sensitivity to channel loss and reflections because the smallest transitions (i.e. between adjacent levels) are impaired by inter-symbol interference (ISI) generated by 3-times larger pk-to-pk transitions [14]. This corresponds to a 3-times larger impact of ISI, compared to NRZ, thus mandating much finer channel equalization before symbol detection. Finally, transceivers must comply with legacy components, supporting a wide interval of data-rates and NRZ signaling at reduced speed, while still maintaining power efficiency.

The application space envisioned for 56Gb/s electrical interfaces is depicted in Fig. 1. Very different scenarios are considered, from ultra-short links between chips mounted within the same package, with negligible interconnect loss and reflections, up to long-reach links where the electrical interface must cope with channels up to 1m long (either backplane or cable) and the severe reflections generated by multiple packages and connectors. The harsh operating conditions of long-reach links are driving the migration toward ADC-based receivers, where complex and flexible equalization and symbol detection are performed by digital signal processing (DSP) [15-17]. 56Gb/s PAM-4 receivers, implemented in state-of-the-art 16nm FinFET technology, demonstrated operation over 30-35dB channel loss at 14GHz Nyquist frequency, with a normalized power consumption (excluding the DSP power) in the range 4.4-6.6 mW/Gb/s [3][5][6]. Considering the DSP power, the total RX power consumption reported in [6] is above 8 mW/Gb/s. To improve the energy efficiency over links with reduced channel loss and reflections, e.g. in medium- or very-short-reach links for chip-to-chip or chip-to-module interconnects (Fig. 1), power-scalable ADC-based receivers have been proposed [5][6]. For such applications, however, analog PAM-4 receivers may offer higher power and area savings [4][18]. A main limiting factor to the efficiency of analog receivers is the implementation of the decision feedback equalizer (DFE). PAM-4 requires hardware triplication and improved resolution, compared to NRZ, making it challenging to satisfy the critical DFE feedback timing at low power even with advanced CMOS technology nodes [4].


This paper describes a low-power analog PAM-4 transceiver (TX+RX) in 28nm FDSOI CMOS technology [19]. The receiver, clocked at quarter-rate, comprises a flexible continuous-time linear equalizer (CTLE), CDR, eye monitor and digital adaptation logic. The CTLE features a transfer function optimally adapted at low, mid, and high frequency, allowing it to meet the performance requirements of the CEI-56G-VSR scenario with margin. Since no DFE is required, the relaxed timing constraints allow data, edge and eye monitor detection to be implemented with track-and-hold stages, drastically reducing the power requirements. At 64Gb/s the full receiver requires 180mW from a single 1V supply, corresponding to only 2.8mW/Gb/s. The transmitter, leveraging the same quarter-rate clocks as the receiver, embeds a flexible Feed Forward Equalizer (FFE) which can be reconfigured to meet multiple standard requirements. A comparison between current-mode and voltage-mode drivers is presented, showing with experiments that the latter yields larger eye openings thanks to the higher equivalent bandwidth. The full transceiver operates from 16Gb/s to 64Gb/s in PAM-4 and from 8Gb/s to 32Gb/s in NRZ. It supports data oversampling to reduce the data-rate down to 2Gb/s. A TX-to-RX link at 64Gb/s across a 16.8dB-loss channel achieves 10<sup>-12</sup> minimum BER and 0.19UI horizontal eye opening at BER=10<sup>-6</sup>, with 5.02mW/Gb/s power dissipation (comprising RX, TX and clock generation).

The remainder of the manuscript is organized as follows. Section II presents the receiver architecture and building blocks, while the transmitter is described in Sec. III. Extensive experimental results are provided in Sec. IV, followed by the Conclusions.

#### II. RECEIVER ARCHITECTURE AND DESIGN

The receiver architecture is shown in Fig. 2a. The analog front-end comprises an input T-coil peaking network, two variable gain amplifiers (VGA1 and VGA2) and a 3-stage continuous time linear equalizer (CTLE). The input network provides wide-band input impedance matching, compensating



Fig.3. VGA realized with resistive source degeneration (a) and frequency response (b).



Fig.4. VGA based on cross-coupled differential pairs (a) and frequency response (b).

pad, ESD protection and input capacitance, and sets the correct common-mode voltage for the analog front-end. VGA1 adjusts the signal swing to keep the CTLE in the linear range, while VGA2 is used for fine amplitude control at the sampler inputs. The output of the analog front-end feeds the RX sampling stage for data recovery and PAM-4 to binary decoding. Three parallel sampling paths have been adopted for data, edge and monitor respectively. The receiver operates at quarter-rate, leveraging two differential clocks in quadrature. The clock generator (Fig. 2b) is shared among eight transceivers and consists of a 4-to-8GHz integer-N PLL, based on LC-VCOs, and an optional divide-by-two path to halve the clock frequency. The two LC-VCOs have 4-to-6 GHz and 5.6-to-8 GHz tuning ranges respectively. The PLL can operate with 66-to-420 MHz reference clock frequencies and features a 1MHz closed-loop bandwidth. The rms clock jitter, integrating phase noise from 500kHz to Nyquist frequency offset, is 290fs. The total power consumption is 60mW (of which 15mW is dissipated by the LC-VCO). A self-calibrated injection-locked ring oscillator in each RX and TX slice provides eight quarter-rate phases [20]. The generated clock signals feed independent phase interpolators (PIs) with 7-bit resolution and circuits for Duty-Cycle Distortion correction, allowing precise control of the sampling phase in the three paths.

After data sampling, thermometer-to-binary decoders provide 4MSB+4LSB NRZ streams, further parallelized by 4:40 demultiplexers. Data path outputs are also used by the clock recovery unit, in combination with the outputs of the edge path, to set the optimal clock sampling phase. Early-late information for the second-order clock recovery, driving the PIs, is derived after the demultiplexers, allowing selective removal of undesired PAM-4 transitions in the digital domain. Considering the PI resolution and sampling rate, the clock recovery operates with




Fig.5. Circuit topology of the realized VGA (a), gain vs frequency (b), gain vs input amplitude (c).

up to approximately 1000ppm static error between TX and RX frequency references. Furthermore, data path outputs are used by the adaptation controller, in combination with measurements performed through the eye monitor path, to implement the digital calibration engine. Specifically, the integrated eye monitor builds PAM-4 signal statistics for adaptation of the samplers' thresholds, VGA gains and CTLE frequency response. Finally, data path outputs are used by the integrated PRBS BER checker. Offsets in the analog front-end and in each comparator are calibrated with dedicated autozeroing routines at start-up.

Details of the most critical stages in the analog front-end, i.e. VGAs and CTLE, and of the sampling stages are provided in the next subsections.

### A. Variable Gain Amplifiers

The most popular VGA circuit configuration, depicted in Fig. 3a, consists of a differential pair with programmable resistive source degeneration [16][18]. Besides its simplicity, this circuit configuration has the advantage of low input capacitance and good linearity, particularly when the gain is decreased to accommodate a large input signal. However, at the minimum gain the stray capacitance at the source terminals of the two transistors introduces unwanted high-frequency boost, given by $1+g_mR_s/2$ ($g_m$ being the transistor transconductance and $R_s$ the degeneration resistance). As a result, the circuit suffers from significant bandwidth and group-delay variation across the gain settings, as shown by the simulation plot in Fig. 3b. Furthermore, achieving fine and linear gain control steps is difficult. The alternative VGA implementation in Fig. 4a makes use of a thermometric array of differential pairs with cross-coupled outputs [21][22]. In each element of the array, only one of the two differential pairs is turned on, according to the SEL control bit. Different gain values are then achieved by properly programming the bus of SEL controls. As shown by the simulations in Fig. 4b, this solution overcomes the main limitations of the VGA in Fig. 3a, yielding a flat frequency response with constant bandwidth and accurate gain control.

However, such improvements are achieved at the expense of a poorer gain compression, reduced bandwidth due to higher input and output capacitance, and slightly increased power consumption.
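
As a rough numerical illustration of the high-frequency boost expression $1+g_mR_s/2$ quoted above for the resistively degenerated VGA of Fig. 3a, the following Python sketch converts it to dB. The function name and the $g_m$, $R_s$ values are hypothetical, chosen only to show the trend, and are not taken from the design.

```python
import math

def degeneration_boost_db(gm_siemens, rs_ohm):
    """High-frequency boost 1 + gm*Rs/2 of a source-degenerated pair, in dB."""
    return 20.0 * math.log10(1.0 + gm_siemens * rs_ohm / 2.0)

# Hypothetical values (not from the paper): gm = 10 mS, Rs swept for gain control.
for rs in (0.0, 50.0, 100.0, 200.0):
    boost = degeneration_boost_db(10e-3, rs)
    print(f"Rs = {rs:5.0f} ohm -> boost = {boost:.2f} dB")
```

The sweep makes the trade-off explicit: the larger the degeneration used to lower the gain, the larger the unwanted peaking, which is why a smaller $g_mR_s$ product is desirable at the minimum gain setting.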

The implemented VGAs combine the two above circuit topologies, as shown in Fig. 5a, to exploit the respective advantages while mitigating the drawbacks. The differential pair with resistive degeneration provides a coarse gain control, while fine gain tuning is implemented using the thermometric array of cross-coupled differential pairs. The frequency response is plotted in Fig. 5b. Compared to a design based only on a resistively degenerated stage, the addition of the array of differential pairs in parallel reduces the required transconductance and degeneration resistance, limiting the unwanted high-frequency peaking at the minimum gain setting from 3.8dB to 2dB. At the same time, the compression point improves when the gain is reduced, as shown by the simulations in Fig. 5c, allowing the stage to withstand a wide variation of input amplitude with negligible non-linear distortion. The input 1dB gain compression point is 600mV<sub>pp</sub> at the maximum gain of 2dB and it rises above $1.2V_{pp}$ when the gain is decreased to -6dB.

The gain of the two VGAs is controlled with the following approach. During initial calibrations, a signal with the optimal driving level for the CTLE is first generated by the TX, and injected through a loop-back path bypassing VGA1 (shown in Fig. 2a). From eye monitor measurements, VGA2 is regulated to reach the desired amplitude at the input of the sampling stages. Then, VGA1 is calibrated in the actual operating condition, to restore the same amplitude at the eye monitor input. Finally, during normal operation, VGA2 is jointly adapted with the CTLE, compensating its gain variations across different settings of the transfer function, while VGA1 maintains the optimal CTLE driving level independently of the actual RX input amplitude. The VGAs are designed with sufficient overlap between the coarse and fine gain settings. In this way, during normal operation only the fine control code needs to be changed, avoiding potential issues arising from switching between the two different gain control techniques.

### B. Continuous-Time Linear Equalizer

The proposed CTLE consists of three stages, independently controlled, to match precisely the inverse of the channel response through a flexible shaping of the overall transfer function at low, mid and high frequency respectively.

The circuit schematic is drawn in Fig. 6. The RC-degenerated differential pair introduces a zero in the transfer function, shifted across frequency through the programmable degeneration capacitance $C_s$. Its purpose is to compensate the dielectric losses of the channel in the ~1-10GHz frequency range by introducing up to 12dB peaking. The feed-forward path in the first stage, consisting of an RC network in series with a transconductance stage (R<sub>2</sub>-C<sub>2</sub> and g<sub>m2</sub>), adds a mild ~1.5-2dB peaking at low frequency, with a zero-pole pair that can be shifted from 0.2GHz to 1GHz by tuning R<sub>2</sub>. This stage refines the CTLE transfer function at low frequency, where the skin-effect loss determines a mild roll-off in the channel frequency




Fig.6. Circuit schematic of the continuous-time linear equalizer.

response [23][24]. Both $R_2$ and $C_s$ are tuned with an iterative algorithm, leveraging measurements performed by the eye monitor, to maximize the vertical and horizontal eye openings simultaneously.

The last stage of the CTLE, implemented with a feedback topology, introduces additional high-frequency boost (up to 6dB) to finely recover the steep roll-off of the channel profile close to Nyquist frequency. Neglecting the shunt peaking inductor  $L_3$ , circuit analysis yields the following transfer function, from the input of  $g_{m3}$  to the output of the CTLE:

$$H_{HF}(\omega) = \frac{g_{m3}R_3}{1+G_{LOOP}} \frac{\left(1+j\frac{\omega}{\omega_f}\right)}{\left(1+j\frac{\omega}{\omega_1}\right)\left(1+j\frac{\omega}{\omega_2}\right)} \tag{1}$$

where $G_{LOOP} = g_{m5}R_3g_{m4}R_F$ is the static loop gain and $\omega_f = 1/R_FC_F$. The two poles $\omega_1, \omega_2$ in (1) are given by:

$$\omega_{1,2} = \frac{\omega_f + \omega_p \pm \sqrt{\omega_f^2 + \omega_p^2 - 2\omega_f \omega_p (1 + 2G_{LOOP})}}{2} \tag{2}$$

with $\omega_p = 1/R_3C_P$ the angular frequency of the parasitic pole at the output nodes of g<sub>m3</sub>. To ensure stability with wide margin, and to minimize distortion due to excessive group-delay variation, the stage is designed to operate with a loop gain low enough that, from (2), $\omega_{1,2}$ are real. In this condition, the dependence of (2) on $G_{LOOP}$ can be linearly approximated, yielding:

$$\begin{aligned}
\omega_{1} &\approx \omega_{f} \left( 1 + \frac{\omega_{p}}{\omega_{p} - \omega_{f}} G_{LOOP} \right) \\
\omega_{2} &\approx \omega_{p} \left( 1 + \frac{\omega_{p}}{\omega_{p} - \omega_{f}} G_{LOOP} \right)
\end{aligned} \tag{3}$$

From (1) and (3), the peaking of the stage, i.e. $\max(|H_{HF}(\omega)|/|H_{HF}(\omega \to 0)|)$, is controlled by $G_{LOOP}$ reducing the low-frequency gain and by pushing the two poles to higher frequency. Assuming $\omega_p$ is sufficiently high, such that $\omega_2 \gg \omega_1$, $\max(|H_{HF}(\omega)|/|H_{HF}(\omega \to 0)|) = 1+G_{LOOP}$. The main advantage of the feedback topology is that the position of the zero in $H_{HF}(\omega)$ is constant. As a result, a very selective control of the CTLE transfer function can be achieved through $G_{LOOP}$ at high frequency with negligible impact at low and mid frequency, greatly simplifying the CTLE adaptation. The variation of low-frequency gain, leading to variation of the output eye amplitude, is compensated by VGA2 after the CTLE. Moreover, the feedback topology allows $H_{HF}(\omega)$ to be



Fig.7. Frequency response and eye diagrams at different steps of the CTLE adaptation. Mid-frequency stage (a,b), low-frequency stage (c,d) and high-frequency stage (e,f).

continuously adapted with a Least Mean Squares algorithm, during normal RX operation, by taking the CTLE output signal as gradient information and using the eye monitor for error slicing [25].
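
The relations in (1)-(3) can be checked numerically. The Python sketch below evaluates the exact poles from (2), the linearized lower pole from (3), and the peaking of (1) on a frequency grid, to compare it with $1+G_{LOOP}$. All numerical values are hypothetical and normalized ($\omega_f = 1$, $\omega_p = 1000$, $G_{LOOP} = 1$); the function names are illustrative only.

```python
import math

def hf_stage_poles(w_f, w_p, g_loop):
    """Exact pole frequencies from Eq. (2); raises if they are not real."""
    disc = w_f**2 + w_p**2 - 2.0 * w_f * w_p * (1.0 + 2.0 * g_loop)
    if disc < 0.0:
        raise ValueError("loop gain too high: poles are complex")
    root = math.sqrt(disc)
    return (w_f + w_p - root) / 2.0, (w_f + w_p + root) / 2.0

def w1_linearized(w_f, w_p, g_loop):
    """First-order approximation of the lower pole, as in Eq. (3)."""
    return w_f * (1.0 + w_p / (w_p - w_f) * g_loop)

def peaking(w_f, w_p, g_loop, points=4000):
    """max |H_HF(w)| / |H_HF(0)| from Eq. (1), on a log frequency grid."""
    w1, w2 = hf_stage_poles(w_f, w_p, g_loop)
    best = 1.0
    for k in range(points + 1):
        w = w_f * 10.0 ** (-2.0 + 6.0 * k / points)  # 0.01*w_f .. 1e4*w_f
        mag = abs(complex(1.0, w / w_f)) / (
            abs(complex(1.0, w / w1)) * abs(complex(1.0, w / w2)))
        best = max(best, mag)
    return best

# Hypothetical, normalized values (not from the paper).
w_f, w_p, g_loop = 1.0, 1000.0, 1.0
w1, w2 = hf_stage_poles(w_f, w_p, g_loop)
print("exact poles:", w1, w2)
print("linearized w1:", w1_linearized(w_f, w_p, g_loop))
print("peaking:", peaking(w_f, w_p, g_loop), "vs 1 + G_LOOP =", 1.0 + g_loop)
```

With $\omega_p$ three decades above $\omega_f$, the computed peaking stays close to $1+G_{LOOP}$, and `hf_stage_poles` raises an error as soon as the loop gain is large enough to make the poles complex, which is the condition the design avoids.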

The equalization accuracy of the proposed CTLE is demonstrated in Fig. 7 through simulations on an openly available reference channel for 56Gb/s short-reach links [26]. Simulations follow the adaptation sequence implemented on the RX. First, the optimal CTLE response at mid frequency is found, by programming the degeneration capacitance C<sub>s</sub>. The resulting transfer functions are reported in Fig. 7a, together with the inverse of the channel response. The eye diagram corresponding to the optimal configuration is shown in Fig. 7b. Then, the transfer function is tuned at low frequency, by acting on R<sub>2</sub>. The corresponding CTLE transfer functions are reported in Fig. 7c and the eye diagram after optimization in Fig. 7d. Finally, the high-frequency boost is adapted (Fig. 7e), finely inverting the channel profile near Nyquist frequency. The resulting eye diagram is reported in Fig. 7f. After completing the CTLE adaptation, the horizontal and vertical eye openings are 0.42UI and 39mV (over 200mV<sub>pp</sub> eye amplitude). The CEI-56G-PAM4 standard targets a raw BER of 10<sup>-6</sup>, corresponding to ~9.5$\sigma$ for a Gaussian distribution. The boxes in the middle of the eye diagrams in Fig. 7 represent the sampling uncertainty due to the noise of the analog front-end ($\sigma_n \approx 2.4\,\text{mV}_{\text{rms}}$) and random jitter introduced by the clock-generation circuits ($\sigma_J \approx 290$ fs). The eye opening after CTLE adaptation (Fig. 7f) meets the target BER with sufficient margin




Fig. 8. Temperature sensitivity of the CTLE transfer function.



Fig. 9. Block diagram and circuit schematics of the data sampling stage (a) and timing diagram (b).

left to other transceiver impairments.
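
The margin argument above can be reproduced with a few lines of arithmetic. The Python sketch below computes the two-sided Gaussian width for a 10<sup>-6</sup> tail probability and compares the resulting noise and jitter boxes with the simulated eye openings. Reading the BER target as a one-sided Gaussian tail and taking a 28GBaud symbol rate (56Gb/s PAM-4) for the UI are assumptions made here for illustration.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pk_pk_sigma_for_ber(ber, lo=0.0, hi=20.0):
    """Two-sided width, in sigma, whose one-sided tail equals `ber` (bisection)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if q_function(mid) > ber:
            lo = mid          # tail still too large: move outwards
        else:
            hi = mid
    return 2.0 * hi

n_sigma = pk_pk_sigma_for_ber(1e-6)      # close to the ~9.5 quoted in the text
noise_box_mv = n_sigma * 2.4             # sigma_n = 2.4 mV rms (from the text)
jitter_box_ps = n_sigma * 0.290          # sigma_J = 290 fs (from the text)

ui_ps = 1e3 / 28.0                       # ASSUMPTION: 28 GBaud symbol rate
eye_height_mv = 39.0                     # vertical opening (from the text)
eye_width_ps = 0.42 * ui_ps              # horizontal opening of 0.42 UI

print(f"pk-pk width: {n_sigma:.2f} sigma")
print(f"noise box  : {noise_box_mv:.1f} mV vs eye height {eye_height_mv:.1f} mV")
print(f"jitter box : {jitter_box_ps:.2f} ps vs eye width {eye_width_ps:.1f} ps")
```

Under these assumptions both boxes fit inside the simulated eye, which is consistent with the margin statement; the remaining budget is what is left for impairments not modeled here.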

Fig. 8 shows the impact of temperature variations on the CTLE transfer function. After adaptation at room temperature,  $T=27^{\circ}C$ , the CTLE maintains good channel inversion in the 0°C-120°C range at low and mid frequency. The temperature increase has a remarkable impact only at high frequency, where the CTLE can be adapted in background with LMS, during normal operation. The black curve, matching closely the nominal transfer function at room temperature, represents the CTLE response when adaptation is performed at 120°C.

### C. Samplers

Two paths, driven by differential clock signals in quadrature (*Phase I, Phase Q*), are used for data sampling and decision. The block diagram, with transistor-level schematics, is drawn in Fig. 9 together with the timing diagram for the path driven by the *Phase I* clock. The same architecture is employed for edge sampling, but with clocking signals shifted by 0.5UI. Since no DFE is implemented, the power consumption required by the comparators is drastically reduced by adding a track-and-hold stage (T&H) in front of the sampling path. In fact, looking at


the bottom waveforms in Fig.9b, when Ck<sub>T&H</sub> is low, the T&H keeps a stable voltage level for subsequent processing over a time window lasting two UIs, greatly relaxing the speed requirement of the cascaded stages and, consequently, power dissipation. The T&H stage drives three slicers, realized with differential pairs as shown in Fig.9a, each one comparing the input differential signal with one of the three PAM4 thresholds. The thresholds are set independently, through on-chip DACs, to compensate gain variations, offsets and mismatches between the different paths with a dedicated calibration routine running at start-up. Finally, regenerative comparators, realized with a modified two-stage strong-arm circuit topology, suitable for low-supply voltage operation, take binary decisions for the subsequent thermometric to NRZ conversion. The sampling path is designed to reach a resolution of 2mV in the worst corner condition.
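
The slicing and thermometric-to-NRZ conversion described above can be sketched behaviorally as follows. This is only an illustrative Python model: the threshold values are hypothetical, and a plain binary level mapping is assumed for the (MSB, LSB) pair, since the text does not specify whether Gray or binary coding is used.

```python
def slice_pam4(sample, thresholds):
    """Compare one held sample against three thresholds -> thermometer code."""
    low, mid, high = thresholds
    return (int(sample > low), int(sample > mid), int(sample > high))

def thermometer_to_bits(code):
    """Map a 3-bit thermometer code to (MSB, LSB), ASSUMING binary weighting."""
    level = sum(code)              # number of thresholds exceeded: 0..3
    return level >> 1, level & 1

# Hypothetical thresholds for ideal levels at -3, -1, +1, +3 (arbitrary units).
thresholds = (-2.0, 0.0, 2.0)
for v in (-3.1, -0.8, 1.2, 2.9):
    code = slice_pam4(v, thresholds)
    print(v, code, thermometer_to_bits(code))
```

In the actual receiver each threshold is set by its own DAC, so the three entries of `thresholds` would be calibrated independently rather than placed symmetrically as in this example.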

The receiver can be reconfigured to operate with NRZ signals, turning off unused blocks to save power. In normal mode, the data rate ranges from 8 to 32Gb/s, with the clock frequency programmable from 2 to 8GHz. By performing 2x or 4x data oversampling, the RX operates in half-speed or quarter-speed modes, supporting 4-16Gb/s and 2-8Gb/s respectively. In NRZ, maximum power saving is achieved by turning off two of the three slicers and comparators in each path, and properly setting the threshold of the active slicer. Alternatively, similarly to [9], the high flexibility in hardware reconfigurability allows the implementation of a 1-tap look-ahead DFE by switching off only one out of three paths and adaptively setting equal and opposite thresholds in the two remaining paths, $C_1$ and $-C_1$ ($C_1$ being the post-cursor magnitude to be corrected by the DFE). The selection between the two outputs is then performed by the RX digital processing circuits, based on the previous bit decision.
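
The look-ahead DFE option can be illustrated with a short behavioral model. In the Python sketch below, two speculative decisions are taken with thresholds $+C_1$ and $-C_1$ and one is selected from the previous decision. The symbol alphabet {-1, +1}, the initial state and the post-cursor value are assumptions for illustration; the post-cursor is deliberately exaggerated so that a single zero-threshold slicer fails.

```python
def lookahead_dfe(samples, c1, first_prev=-1):
    """1-tap look-ahead (loop-unrolled) DFE for NRZ symbols in {-1, +1}."""
    decisions = []
    prev = first_prev
    for y in samples:
        d_if_prev_high = 1 if y > +c1 else -1   # slicer with threshold +C1
        d_if_prev_low = 1 if y > -c1 else -1    # slicer with threshold -C1
        # Digital selection based on the previous bit decision.
        prev = d_if_prev_high if prev == 1 else d_if_prev_low
        decisions.append(prev)
    return decisions

# Hypothetical test pattern and post-cursor (not from the paper).
bits = [1, 1, -1, 1, -1, -1, 1, -1]
c1 = 1.2
rx = [b + c1 * p for b, p in zip(bits, [-1] + bits[:-1])]

print("look-ahead DFE :", lookahead_dfe(rx, c1))
print("plain slicer   :", [1 if y > 0 else -1 for y in rx])
```

Because both candidate decisions are available before the previous bit is resolved, the feedback loop contains only a digital multiplexer, which is why the selection can be deferred to the RX digital processing.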

#### III. TRANSMITTER ARCHITECTURE AND DESIGN

The block diagram of the TX is shown in Fig. 10. An on-chip pattern generator feeds 40:8 serializers with MSB and LSB streams. The 8-bit parallel data are then passed through a shift register delivering 5 delayed bundles across a 4-UI window (D<sub>-2</sub>, D<sub>-1</sub>, D<sub>0</sub>, D<sub>1</sub>, D<sub>2</sub>), to be used by the FFE. Multiplexers (MUXs) M<sub>M</sub> and M<sub>L</sub> enable the FFE reconfigurability by selecting the data to be propagated and, after 8:1 serialization, feed the 4-MSB+4-LSB segments of the output driver (realized with 48 MSB and 24 LSB equal elements). The FFE coefficients are determined by the number of active elements in each segment.

By selecting the appropriate data through  $M_M$ ,  $M_L$ , the FFE is made compliant with different standard requirements. Table I summarizes the operating modes and the range of FFE coefficients. In PAM-4, the FFE features 4 taps, with 1 precursor and 2 post-cursors. In NRZ mode, the 40-bit MSB and LSB input streams are the same, but MUXs M<sub>M</sub>, M<sub>L</sub> and the driver segments can be operated independently. In this way, at full clock speed the FFE window can be extended to 5 taps (2 pre- and 2 post-cursors) satisfying the KP4 standard at 28Gb/s.
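
To make the segmented-FFE idea concrete, the following Python sketch models each tap as a signed count of driver elements and applies the resulting FIR filter to a symbol stream. The function, the tap-offset convention, and the check that the allocated elements do not exceed the available total are assumptions for illustration; the example allocation (-3, 12, -9 out of 24) is just one combination within the PAM-4 ranges of Table I.

```python
def ffe_output(symbols, tap_elements, total_elements):
    """FIR equalizer whose tap weights are signed counts of driver elements.

    tap_elements maps a tap offset (-1 = pre-cursor, 0 = cursor,
    +1 = first post-cursor, ...) to a signed element count; each weight is
    count / total_elements. Symbols outside the sequence are taken as zero.
    """
    # ASSUMPTION: elements are shared, so the allocation cannot exceed the total.
    if sum(abs(n) for n in tap_elements.values()) > total_elements:
        raise ValueError("more elements allocated than available")
    out = []
    for k in range(len(symbols)):
        acc = 0.0
        for offset, count in tap_elements.items():
            idx = k - offset
            if 0 <= idx < len(symbols):
                acc += count / total_elements * symbols[idx]
        out.append(acc)
    return out

# Hypothetical allocation: 1 pre-cursor, cursor, 1 post-cursor out of 24 elements.
taps = {-1: -3, 0: 12, 1: -9}
impulse_response = ffe_output([0, 0, 1, 0, 0], taps, 24)
print(impulse_response)
```

Feeding a single non-zero symbol returns the tap weights in time order, which is a quick way to confirm that the pre-cursor appears one UI before the main cursor.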

As for the RX, the TX supports data oversampling, allowing operation at reduced speed. The data-rate can be




Fig. 10. TX block diagram.

TABLE I. TX FFE OPERATION MODES

| Operation mode,<br>Standard                          | 2 <sup>nd</sup><br>pre | 1 <sup>st</sup><br>pre | cursor | 1 <sup>st</sup><br>post | 2 <sup>nd</sup><br>post |
|------------------------------------------------------|------------------------|------------------------|--------|-------------------------|-------------------------|
| Full speed, PAM-4<br>CEI-56G-PAM4                    |                        | ±3/24                  | 12/24  | ±9/24                   | ±9/24                   |
| Full speed, NRZ<br>28Gb/s KP4                        | ±21/72                 | ±21/72                 | 36/72  | ±36/72                  | ±36/72                  |
| Half speed, NRZ<br>10Gb/s KR10,<br>8.5Gb/s PCI Exp-3 |                        | ±21/72                 | 36/72  | ±36/72                  |                         |
| Quarter speed, NRZ<br>2.5Gb/s PCI Exp-1,2            |                        |                        | 45/72  | ±27/72                  |                         |

halved (half-speed mode) with 2x oversampling. In this case the FFE allows 3 taps, making the TX compliant with 10Gb/s KR10 and 8.5Gb/s PCI-Express Gen3. With 4x oversampling the FFE still allows 1 post-cursor tap, satisfying the PCI-Express Gen1-2, at 2.5Gb/s.
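
The relation between quarter-rate clock frequency, modulation and oversampling factor can be summarized with a small helper. The Python sketch below assumes four symbols per clock period (quarter-rate operation) and the 2-to-8GHz clock range quoted for the receiver, taken here to apply to the TX as well since it uses the same clocks; the function names are illustrative.

```python
def data_rate_gbps(f_clk_ghz, bits_per_symbol=1, oversampling=1):
    """Line rate of a quarter-rate link: 4 symbols per clock period."""
    return 4.0 * f_clk_ghz * bits_per_symbol / oversampling

def clock_for_rate_ghz(rate_gbps, bits_per_symbol=1, oversampling=1):
    """Quarter-rate clock frequency needed for a given line rate."""
    return rate_gbps * oversampling / (4.0 * bits_per_symbol)

print(data_rate_gbps(8.0, bits_per_symbol=2))     # PAM-4 at the maximum clock
print(data_rate_gbps(2.0, oversampling=4))        # NRZ, quarter speed, minimum clock
print(clock_for_rate_ghz(28.0))                   # NRZ full speed at 28Gb/s
print(clock_for_rate_ghz(2.5, oversampling=4))    # NRZ quarter speed at 2.5Gb/s
```

Each of the rates listed in Table I maps to a clock frequency inside the assumed 2-to-8GHz range, which is what makes the oversampling modes sufficient to cover the legacy standards.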

A double T-coil peaking network, after the output driver, extends the bandwidth and provides good impedance matching up to ~20GHz frequency by splitting the large parasitic capacitances introduced by ESD protections and output pads. The ESD protections are sized to ensure >2kV HBM protection.

The last stage of the high-speed 8:1 serializer is the most critical for signal integrity. Since the TX is clocked at quarter rate, a 4:1 MUX is the natural choice [10][27]. A conventional pass-gate 4:1 MUX, operating with selection signals featuring 25% duty-cycle, is drawn in Fig. 11a. This topology is sensitive to errors in the selection signals [28] and the large output capacitance, C<sub>par</sub>, limits the rise/fall time and maximum data rate. To circumvent these issues, a 4:1 serializer comprising the cascade of 2:1 MUXs with a local clock multiplier is proposed, as shown in Fig. 11b. The 4 data streams are retimed and appropriately shifted at quarter rate (b<sub>0</sub>, b<sub>1</sub>, b<sub>2</sub>, b<sub>3</sub>). Then, a pair of 2:1 MUXs, driven by quarter-rate clocks in quadrature, deliver B<sub>0</sub>, B<sub>1</sub> at half rate. The quadrature clocks also feed the clock multiplier, yielding the half-rate differential clocks for the last 2:1 MUX. The first 2:1 MUXs are based on pass-gates, to save power and ensure a propagation time, t<sub>MUX</sub>, higher than the delay introduced by the clock multiplier, t<sub>MULT</sub>. The high-speed 2:1 MUX is realized with speed-optimized NAND gates and directly drives the segments of the TX output driver, without buffers. The timing diagram of the 4:1 serializer is



Fig. 11. 4:1 MUX based on pass-gates (a) and 4:1 serializer with local 2x clock multiplier (b).



Fig. 12. Timing diagram for the proposed 4:1 mux.



Fig. 13. Simulated pk-pk jitter introduced by the two 4:1 MUXs in Fig. 11.

sketched in Fig. 12. Taking the falling edge of CKI as a reference, the edges of B<sub>0</sub> and of CK2 (the double-frequency clock) are delayed by $t_{MUX}$ and $t_{MULT}$ respectively. As a result, the edges of B<sub>0</sub> are delayed by $t_D=t_{MUX}-t_{MULT}$ with respect to the edges of CK2. To avoid horizontal eye closure, the bits of the B<sub>0</sub> stream (b<sub>2</sub> in the example of Fig. 12) must be stable before selection by the last 2:1 MUX. Similar considerations apply to B<sub>1</sub> and CK2, considering the CKQ falling edge. Given a finite rise/fall time, $t_{R/F}$, for B<sub>0</sub>-B<sub>1</sub>, this requires a CK2 period satisfying $T_{ck}/2 > (t_D+t_{R/F})$ or, equivalently, a maximum data rate $R_{max}=1/(t_{R/F}+t_D)$. With $t_D\sim10$ps and $t_{R/F}\sim15$ps in the adopted technology, $R_{max}\sim40$Gb/s, allowing the serializer to work at the target data-rate without power-hungry data retiming before the last 2:1 MUX. Fig. 13 compares the data-dependent pk-pk jitter, J<sub>pp</sub>,




Fig. 14. Clock frequency doubler (a) and effect of duty-cycle error (b) and quadrature error (c) on the input clocks.

from the simulated eye diagrams at the output of the two 4:1 serializers in Fig. 11, designed for the same fan-out and power consumption. The adopted solution supports a remarkably higher speed and, at 28Gb/s,  $J_{pp}$  is reduced by 30% compared to the pass-gate 4:1 MUX.
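
The timing constraint of the proposed serializer, $R_{max}=1/(t_{R/F}+t_D)$, is easy to evaluate numerically. The Python sketch below uses the delay and rise/fall values quoted in the text and checks the condition at a given symbol rate, taking $T_{ck}/2$ equal to one UI; the function names and the example symbol rates are illustrative assumptions.

```python
def max_serializer_rate_gbps(t_d_ps, t_rf_ps):
    """R_max = 1 / (t_R/F + t_D), with times in ps and the result in Gb/s."""
    return 1e3 / (t_rf_ps + t_d_ps)

def timing_ok(symbol_rate_gbaud, t_d_ps, t_rf_ps):
    """Check T_ck/2 > t_D + t_R/F, ASSUMING T_ck/2 equals one UI."""
    ui_ps = 1e3 / symbol_rate_gbaud
    return ui_ps > t_d_ps + t_rf_ps

t_d_ps, t_rf_ps = 10.0, 15.0          # values quoted in the text
print(max_serializer_rate_gbps(t_d_ps, t_rf_ps))   # about 40 Gb/s
print(timing_ok(32.0, t_d_ps, t_rf_ps))            # 64Gb/s PAM-4 -> 32 GBaud
print(timing_ok(45.0, t_d_ps, t_rf_ps))            # hypothetical faster rate
```

At 32GBaud the UI is 31.25ps, leaving a few picoseconds of margin over the 25ps required, which is consistent with operation without retiming before the last 2:1 MUX.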

The frequency doubler for the last 2:1 MUX is realized by XORing the quarter-rate clocks with the NAND-based network in Fig. 14a. The accuracy of the input clocks is key not to impair the horizontal eye opening because, as shown by the timing waveforms sketched in Fig. 14b,c, duty-cycle and quadrature errors generate duty-cycle distortion (DCD) on the double frequency clock. But while the impact of quadrature error is on two consecutive UIs (Fig.14c), the repetition period of the impairment generated by a duty-cycle error is 4 UIs (Fig.14b). The quadrature accuracy and duty-cycle of the quarter-rate clocks are regulated by the phase interpolators and duty-cycle correction circuits shown in the TX block diagram (Fig. 10). The two impairments are detected by the eye monitor and calibrated at start-up, with the TX closed internally in loop-back with the RX. Differently from [3], where clocks I-Q error and DCD are detected and corrected within the clock distribution chain, the adopted technique allows compensation of impairments introduced within the full TX chain, comprising MUXes and output driver.
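
The different repetition periods of the two impairments can be reproduced with an idealized edge model of the XOR doubler. In the Python sketch below, CK2 toggles at every edge of CKI and CKQ, so the widths of its consecutive half-periods are the UI widths seen by the last 2:1 MUX. Applying the duty-cycle error to the falling edge of CKI only and the quadrature error as a shift of CKQ is a modeling assumption, and gate delays are ignored.

```python
def doubled_clock_intervals(duty_error_ui=0.0, quad_error_ui=0.0, periods=2):
    """Widths (in UI) of consecutive half-periods of CK2 = CKI xor CKQ.

    CKI and CKQ are quarter-rate clocks (period 4 UI). Ideally CKI toggles at
    0 and 2 UI and CKQ at 1 and 3 UI within each period.
    """
    edges = []
    for p in range(periods + 1):
        base = 4.0 * p
        edges += [base, base + 2.0 + duty_error_ui]        # CKI edges
        edges += [base + 1.0 + quad_error_ui,
                  base + 3.0 + quad_error_ui]              # CKQ edges
    edges.sort()
    widths = [b - a for a, b in zip(edges, edges[1:])]
    return widths[: 4 * periods]

quad = doubled_clock_intervals(quad_error_ui=0.1)
duty = doubled_clock_intervals(duty_error_ui=0.1)
print("quadrature error:", [round(w, 3) for w in quad])
print("duty-cycle error:", [round(w, 3) for w in duty])
```

With a quadrature error the wide/narrow pattern repeats every 2 UIs, whereas with a duty-cycle error the sequence repeats only every 4 UIs, matching the behavior described for Fig. 14b and Fig. 14c.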

Two TX versions, with current-mode (CM) and voltage-mode (VM) drivers, have been designed and fabricated to allow an experimental performance comparison. The advantage of the CM driver is the larger output voltage swing, desirable with PAM-4 to ensure high SNR [29]. Fig. 15 shows the implemented CM circuit topology, employed in each of the 72 elements of the TX segments. The output current is set by transistors $M_{C1-2}$ switched on/off by the inverters driven by the



Fig. 15. Current-mode TX driver.



Fig. 16. Voltage-mode TX driver.



Fig. 17. Simulated step response of the current-mode and voltage mode TX drivers.

last stage of the serializer, according to  $In_P$  and  $In_N$  levels. A replica bias circuit regulates the current for a desired output amplitude. The termination resistors ( $R_T$ ) are made programmable to achieve good impedance matching against process variation. Compared to a simple differential pair driver [29], higher voltage swing can be delivered without compromising reliability. In fact, the drain and gate terminals of each transistor in a differential-pair CM driver experience large and opposite voltage excursions, while in the proposed circuit the gate voltage of  $M_{C1-2}$  is kept constant. As a result,  $V_{GD}$ , the maximum voltage stress across the oxide of  $M_{C1,2}$  is remarkably reduced, allowing a higher supply voltage to be used ( $V_{dd,DR}$ ) to deliver larger swing.

The implemented VM driver is shown in Fig. 16. In this case, the termination resistors are in series with inverters driven by the serializer. Compared to the CM topology, the VM driver delivers an output swing limited to $V_0 = V_{dd,CMOS}$, but features higher power efficiency and linearity [29]. Moreover, this topology yields shorter rise and fall times, performing better at high speed. In fact, comparing the schematics in Fig. 15 and Fig. 16, the charge/discharge time constant of the large parasitic capacitance $C_P$ of the output devices, arising from the many parallel elements and interconnections in the full TX, is $\tau \sim R_T C_P$ in the CM driver, while it is drastically reduced to $\tau \sim r_{on} C_P$ in the VM topology ($r_{on} \ll R_T$ being the channel resistance of the transistors in the inverters of Fig. 16). The small programmable resistors R<sub>C</sub> in series with the supply and ground of the inverters in Fig. 16 are used for trimming the TX output resistance, to achieve good impedance matching against process variations. C<sub>C</sub> shorts R<sub>C</sub> at high frequency, increasing speed. Fig. 17 compares the simulated step responses of the implemented drivers, accounting for the ESD protection, package parasitics and the T-coil peaking network. The 10%-to-90% rise time is 12.5ps and 9.2ps for the CM and VM drivers, respectively.
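The time-constant argument can be made concrete with a first-order sketch. For a single-pole node, the 10%-to-90% rise time is $\ln(9)\,\tau \approx 2.2\,\tau$. The component values below ($R_T$ = 50Ω, $r_{on}$ = 10Ω, $C_P$ = 100fF) are purely hypothetical and are not taken from the design. The simulated rise times quoted above also include ESD, package and T-coil effects, so this model only illustrates the trend for the internal node and is not expected to reproduce them.

```python
import math


def rise_time_10_90(resistance_ohm, capacitance_f):
    """10%-to-90% rise time of a single-pole RC node: ln(9) * R * C."""
    tau = resistance_ohm * capacitance_f
    return math.log(9.0) * tau


# Hypothetical values, chosen only to illustrate the trend.
R_T = 50.0      # termination resistance seen by C_P in the CM driver [ohm]
R_ON = 10.0     # inverter channel resistance in the VM driver [ohm]
C_P = 100e-15   # parasitic capacitance of the output devices [F]

t_cm = rise_time_10_90(R_T, C_P)
t_vm = rise_time_10_90(R_ON, C_P)
print(f"CM node: {t_cm * 1e12:.2f} ps, VM node: {t_vm * 1e12:.2f} ps")
print(f"ratio CM/VM = {t_cm / t_vm:.1f}")
```

Under these assumptions the ratio of the two rise times is simply $R_T / r_{on}$, which is why the VM topology charges $C_P$ faster for the same parasitic load.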


### IV. EXPERIMENTAL RESULTS

The photomicrograph of the fabricated test chip is shown in Fig. 18, together with the breakdown of power consumption estimated from simulations. Several transceivers are stepped on the same die, interleaved by shared PLLs for clock generation. For testing, chips are assembled in standard BGA packages and mounted within a socket on a PCB. First, transmitter measurements are presented. Fig. 19 shows the return loss at the output of the two TX versions, VM (blue curve) and CM (red curve). In both cases, good impedance termination is achieved, meeting the CEI-56G-PAM-4 mask with some margin. TX eye diagram measurements have been performed by connecting the outputs to a high-speed sampling oscilloscope. The operation of the calibration routines for the clock impairments is demonstrated in Fig. 20. Fig. 20a shows the PAM-4 TX eye diagram when a systematic quadrature phase error between the quarter-rate clocks is manually forced through the TX PIs. The horizontal asymmetry between consecutive eyes, expected from the discussion in Sec. III, is clearly visible. The measured eye diagram after automatic calibration, closing the TX in internal loop-back with the RX, is reported in Fig. 20b. The horizontal asymmetry is reduced to 0.015UI. A similar test has been performed by forcing a duty-cycle clock error through the control code of the TX duty-cycle correction circuits. The eye diagrams in Fig. 20c highlight a horizontal opening asymmetry among nearby eyes with a repetition period of 4 UIs, almost completely removed after loop-back calibration (Fig. 20d).

The TX is employed to estimate the clock jitter, by transmitting an alternating NRZ pattern and integrating the measured phase noise up to 8GHz offset. The resulting jitter is 290fs. From simulations, the PLL contributes 65% and the rest is introduced by the clock distribution chain.
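The jitter estimate above relies on integrating phase noise. A minimal sketch of the standard conversion, $\sigma_t = \sqrt{2\int \mathcal{L}(f)\,df}\,/\,(2\pi f_c)$, is given below. The function name, the flat -140dBc/Hz profile, the 10kHz lower integration limit and the 16GHz fundamental are all assumptions for illustration; the measured phase-noise data and the exact integration limits are not reported in the text.

```python
import math


def rms_jitter_seconds(offsets_hz, phase_noise_dbc_hz, carrier_hz):
    """RMS jitter from single-sideband phase noise L(f) given in dBc/Hz.

    Uses sigma_t = sqrt(2 * integral(10^(L/10) df)) / (2*pi*f_carrier),
    with trapezoidal integration between the supplied offset points.
    """
    linear = [10.0 ** (level / 10.0) for level in phase_noise_dbc_hz]
    area = 0.0
    for k in range(len(offsets_hz) - 1):
        df = offsets_hz[k + 1] - offsets_hz[k]
        area += 0.5 * (linear[k] + linear[k + 1]) * df
    return math.sqrt(2.0 * area) / (2.0 * math.pi * carrier_hz)


# Hypothetical example: flat -140 dBc/Hz profile from 10 kHz to 8 GHz offset,
# on an assumed 16 GHz fundamental (alternating pattern at an assumed 32 Gb/s).
offsets = [1e4, 8e9]
profile = [-140.0, -140.0]
sigma = rms_jitter_seconds(offsets, profile, 16e9)
print(f"rms jitter = {sigma * 1e15:.1f} fs")
```

With real measurements, the offset points are typically log-spaced and the profile is not flat, so a finer grid (or integration on a logarithmic axis) would be needed for an accurate result.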

Fig. 18. Chip photographs and breakdown of TX/RX power dissipation.

Fig. 19. Reflection coefficient at the VM and CM TX outputs.

Fig. 20. Calibration of TX clock impairments. Eye with quadrature phase error (a) and after calibration (b). Eye diagram with duty-cycle error (c) and after calibration (d).

Fig. 21. 32Gb/s NRZ eye diagrams at the output of the CM driver (a) and VM driver (b). 64Gb/s PAM-4 eye diagrams at the output of the CM driver (c) and VM driver (d).

Fig. 22. Experimental setup for the link tests (a) and TX-to-RX pulse response (b).

The CM and VM transmitters are compared at the maximum data rate in Fig. 21. The supply voltage for the VM driver is 1V, while for the CM driver it is raised to 1.2V. For all the measurements the TX FFE is enabled, with the same coefficients ([C<sub>-1</sub>, C<sub>0</sub>, C<sub>1</sub>, C<sub>2</sub>] = [-1/24, 18/24, -3/24, 0]), to recover an estimated loss of ~3.5dB due to the test board and the cables connecting to the oscilloscope. The eye diagrams at 32Gb/s in NRZ are reported in Fig. 21a,b. In this case, the two TXs perform similarly, with comparable measured vertical and horizontal eye openings. The advantage of the VM TX becomes more evident when delivering the PAM-4 signal at 64Gb/s, shown in Fig. 21c,d. The CM driver delivers a larger pk-pk eye amplitude than the VM driver (850mV vs 710mV), but the higher speed of the latter yields a remarkable improvement in vertical and horizontal eye openings. Looking carefully at the two measurements in Fig. 21c,d, the VM TX suffers from noisier transitions than the CM TX, likely induced by supply noise. While this is to be expected from the intrinsically poor supply rejection of simple inverters, it should be noted that the VM driver shares the supply with the rest of the TX, while a separate supply is employed for the output driver in the CM TX. For both drivers, an RLM >96% is measured without any calibration.
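As a rough cross-check of the FFE setting quoted above, the high-frequency boost of a UI-spaced FIR filter can be evaluated from its coefficients. The sketch below treats the FFE as an ideal FIR with taps at indices -1 to 2 and compares its gain at Nyquist to its gain at DC. The function name `ffe_gain` and the ideal-FIR model are assumptions, and the actual driver response is not included.

```python
import cmath
import math


def ffe_gain(taps, first_index, f_norm):
    """Magnitude response of a UI-spaced FIR at f_norm = f / f_baud."""
    response = sum(
        coeff * cmath.exp(-2j * math.pi * f_norm * (first_index + k))
        for k, coeff in enumerate(taps)
    )
    return abs(response)


# Coefficients [C-1, C0, C1, C2] quoted in the text.
taps = [-1 / 24, 18 / 24, -3 / 24, 0.0]

dc_gain = ffe_gain(taps, first_index=-1, f_norm=0.0)
nyquist_gain = ffe_gain(taps, first_index=-1, f_norm=0.5)
boost_db = 20.0 * math.log10(nyquist_gain / dc_gain)
print(f"DC gain = {dc_gain:.3f}, Nyquist gain = {nyquist_gain:.3f}")
print(f"boost = {boost_db:.2f} dB")
```

Under this idealized model the boost at Nyquist comes out slightly below 4dB, the same order as the ~3.5dB loss the coefficients are meant to recover.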

Link tests have been performed using the VM TX and the RX with the experimental setup shown in Fig. 22a. The TX-to-RX pulse response at 32Gb/s is shown in Fig. 22b. The TX feeds a 10.6cm-long PCB channel, test-board traces, connectors and cables, with a BGA-to-BGA loss of 16.8dB at 16GHz. The TX FFE is statically configured for 2.8dB precursor pre-emphasis. After running RX calibrations and


TABLE II. TX SUMMARY AND COMPARISON.

|                       | [12] Dickson,<br>ISSCC 2017 | [3] Frans,<br>JSSC 2017 | [5] Wang,<br>ISSCC 2018 | [6] Upadhyaya,<br>ISSCC 2018 | This work<br>(VM-TX) |
|-----------------------|-----------------------------|-------------------------|-------------------------|------------------------------|----------------------|
| Technology            | 14nm FinFET                 | 16nm FinFET             | 16nm FinFET             | 16nm FinFET                  | 28nm FDSOI-CMOS      |
| Data Rate [Gb/s]      | 56                          | 56                      | 64                      | 56                           | 64                   |
| Amplitude [V<sub>pp</sub>] | 0.9                    | 1.2                     | 1                       | 1                            | 1                    |
| FFE                   | 3-tap                       | 3-tap                   | 3-tap                   | 4-tap                        | 4-tap                |
| RLM [%]               | N.A.                        | 97                      | 99                      | 98                           | > 96                 |
| Area [mm<sup>2</sup>] | 0.035                       | 1.4 (TX+RX)             | 0.09                    |                              | 0.12                 |
| Supply [V]            | 0.95                        | 0.9/1.2                 | 1.2                     | 0.9/1.2                      | 1                    |
| Power [mW]            | 101                         | 140                     | 89.7                    | 77                           | 135                  |
| Efficiency [mW/Gb/s]  | 1.8                         | 2.18                    | 1.4                     | 1.38                         | 2.1                  |
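The RLM entries in Table II quantify how evenly the four PAM-4 levels are spaced. The paper does not spell out the formula. The sketch below assumes the level separation mismatch ratio commonly used in IEEE 802.3 PAM-4 electrical specifications, computed from the four mean signal levels. The level values in the example are hypothetical.

```python
def rlm(v0, v1, v2, v3):
    """Level separation mismatch ratio from the four mean PAM-4 levels.

    v0 is the lowest level and v3 the highest. Ideal, equally spaced
    levels give RLM = 1 (i.e. 100%).
    """
    v_mid = (v0 + v3) / 2.0
    es1 = (v1 - v_mid) / (v0 - v_mid)
    es2 = (v2 - v_mid) / (v3 - v_mid)
    return min(3.0 * es1, 3.0 * es2, 2.0 - 3.0 * es1, 2.0 - 3.0 * es2)


# Hypothetical levels: ideal spacing, then one compressed inner level.
print(f"ideal:      RLM = {rlm(-3.0, -1.0, 1.0, 3.0):.3f}")
print(f"compressed: RLM = {rlm(-3.0, -0.9, 1.0, 3.0):.3f}")
```

In this definition a single misplaced inner level is enough to lower the figure, since the minimum over the four terms is taken.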

TABLE III. RX SUMMARY AND COMPARISON.

|                                                   | [3] Frans,<br>JSSC 2017               | [4] Im,<br>JSSC 2017        | [6] Upadhyaya,<br>ISSCC 2018          | [6] Upadhyaya,<br>ISSCC 2018 | [5] Wang,<br>ISSCC 2018     | [5] Wang,<br>ISSCC 2018 | This work       |
|---------------------------------------------------|---------------------------------------|-----------------------------|---------------------------------------|------------------------------|-----------------------------|-------------------------|-----------------|
| Technology                                        | 16nm FinFET                           | 16nm FinFET                 | 16nm FinFET                           | 16nm FinFET                  | 16nm FinFET                 | 16nm FinFET             | 28nm FDSOI-CMOS |
| Data Rate [Gb/s]                                  | 56                                    | 56                          | 56                                    | 56                           | 64                          | 64                      | 64              |
| Link Loss [dB]                                    | 31                                    | 10                          | 32                                    | 7.5                          | 29.5                        | 8.6                     | 16.8            |
| Equalization                                      | TX-FFE, CTLE,<br>1-tap DFE, 24-tap FFE | TX-FFE, CTLE,<br>10-tap DFE | TX-FFE, CTLE,<br>1-tap DFE, 14-tap FFE | TX-FFE, CTLE                 | TX-FFE, CTLE,<br>16-tap FFE | TX-FFE, CTLE            | TX-FFE, CTLE    |
| Min. BER                                          | ~10<sup>-8</sup>                      | ~10<sup>-12</sup>           | < 10<sup>-12</sup>                    | < 10<sup>-12</sup>           | ~10<sup>-6</sup>            | ~10<sup>-4</sup>        | ~10<sup>-12</sup> |
| Horizontal eye opening @ BER=10<sup>-6</sup> [UI] | 0.15                                  | 0.2                         | 0.15                                  | 0.18                         | N.A.                        | N.A.                    | 0.19            |
| Area [mm<sup>2</sup>]                             | 1.4 (TX+RX)                           | 0.36                        | 2.2 (TX+RX)                           | 2.2 (TX+RX)                  | 0.163*                      | 0.163*                  | 0.32            |
| Supply [V]                                        | 0.9/1.2                               | 0.9/1.2                     | 0.85/0.9/1.2/1.8                      | 0.85/0.9/1.2/1.8             | 0.9/1.2                     | 0.9/1.2                 | 1               |
| Power [mW]                                        | 370*                                  | 230                         | 450                                   | 270                          | 283.9*                      | 100*                    | 180             |
| Efficiency [mW/Gb/s]                              | 6.6*                                  | 4.1                         | 8                                     | 4.82                         | 4.43*                       | 1.56*                   | 2.8             |



Fig. 23. BER contour (a) and bathtub (b) at 32Gb/s NRZ. BER contour (c) and bathtub (d) at 64Gb/s PAM-4.



Fig. 24. Jitter tolerance test at 64Gb/s PAM-4.

adaptation, the signal quality at the samplers has been estimated through measurements performed with the on-chip RX eye monitor and BER checker. Fig. 23a,b show the extracted BER contours and bathtub curves when a 32Gb/s NRZ signal is transmitted over the link. In this case, only the CTLE is employed for equalization at the RX side, and the horizontal eye opening at BER= $10^{-12}$  is 0.35UI. By using two of the data samplers to implement a 1-tap look-ahead DFE, as previously described in Sec. II, a higher channel loss can be tolerated: a comparable eye opening at BER= $10^{-12}$  is achievable across a 23dB-loss channel.

Fig. 23c,d plot the BER contours and the bathtub curve with PAM-4, at 64Gb/s. The horizontal opening at BER= $10^{-6}$  is 0.19UI and the bathtub is still minimally open at BER= $10^{-12}$ . The same measurements have been repeated with two adjacent transceivers operating simultaneously, to assess the robustness against crosstalk, mostly originating within the package. From the available package model, the estimated crosstalk magnitude at the RX input is  $2.1 \text{mV}_{pk-pk}$ . The bathtub in Fig. 23d shows only a marginal penalty on the horizontal eye opening at BER= $10^{-6}$ .

Jitter tolerance tests have been performed by feeding the RX with the signal generated by a laboratory PAM-4 pattern generator, which can add sinusoidal jitter of different amplitudes. The RX reference frequency is provided by a 100ppm crystal. The measured results, plotted in Fig. 24, prove that the RX meets the CEI-56G-VSR mask with robustness against CDR loop gain variation.

Finally, measurements are summarized and compared to PAM-4 transmitters and receivers at similar data rates in Table II and Table III, respectively. The performance of the presented TX is in line with the state of the art, while the RX shows a remarkable improvement in power efficiency compared to other implementations operating at comparable channel loss, despite being realized in a less scaled technology node. The TX and RX need 2.1 and 2.8 mW/Gb/s, respectively. Accounting for the 60mW power for the clock generation circuits, shared between eight transceivers, the overall power dissipation is 5.02mW/Gb/s.
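The overall efficiency figure follows from the per-block numbers quoted above. The short sketch below reproduces the arithmetic, assuming that the clock-generation power is split evenly among the eight transceivers and that the rounded TX and RX efficiencies are used. The function name is hypothetical.

```python
def link_efficiency(tx_mw_per_gbps, rx_mw_per_gbps, clock_mw,
                    lanes_sharing_clock, data_rate_gbps):
    """Total mW/Gb/s per lane, including an equal share of the clock power."""
    clock_share = clock_mw / lanes_sharing_clock / data_rate_gbps
    return tx_mw_per_gbps + rx_mw_per_gbps + clock_share


total = link_efficiency(
    tx_mw_per_gbps=2.1,      # TX efficiency from Table II
    rx_mw_per_gbps=2.8,      # RX efficiency from Table III
    clock_mw=60.0,           # clock generation, shared by eight transceivers
    lanes_sharing_clock=8,
    data_rate_gbps=64.0,
)
print(f"overall efficiency = {total:.2f} mW/Gb/s")
```

The clock share amounts to 7.5mW per lane, i.e. about 0.12mW/Gb/s at 64Gb/s, which added to 2.1 and 2.8 mW/Gb/s gives the 5.02mW/Gb/s reported.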

### V. CONCLUSION

A PAM-4 transceiver in 28nm FDSOI-CMOS supporting operation up to 64Gb/s has been presented. Current- and voltage-mode drivers in the TX have been experimentally compared, proving that the latter provides better eye opening thanks to its intrinsically higher speed. A new CTLE circuit topology, featuring high flexibility and accuracy in matching the channel response, has been proposed. The CTLE meets the equalization requirements of CEI-56G-VSR links with margin, allowing the implementation of a mostly-analog RX, without DFE, thus saving significant power consumption. The full transceiver includes digital calibration and adaptation algorithms for TX and RX. At 64Gb/s, a TX-to-RX link over a 16.8dB-loss channel reaches 10<sup>-6</sup> BER with 0.19UI timing margin, requiring only 5.02mW/Gb/s.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/JSSC.2018.2873602

### REFERENCES

- [1] Available online: https://www.oiforum.com
- [2] K. Gopalakrishnan et al., "3.4 A 40/50/100Gb/s PAM-4 Ethernet transceiver in 28nm CMOS," IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2016, pp. 62-63.
- [3] Y. Frans et al., "A 56-Gb/s PAM4 Wireline Transceiver Using a 32-Way Time-Interleaved SAR ADC in 16-nm FinFET," in IEEE Journal of Solid-State Circuits, vol. 52, no. 4, pp. 1101-1110, April 2017.
- [4] J. Im et al., "A 40-to-56 Gb/s PAM-4 Receiver With Ten-Tap Direct Decision-Feedback Equalization in 16-nm FinFET," in IEEE Journal of Solid-State Circuits, vol. 52, no. 12, pp. 3486-3502, Dec. 2017.
- [5] L. Wang, Y. Fu, M. LaCroix, E. Chong and A. C. Carusone, "A 64Gb/s PAM-4 transceiver utilizing an adaptive threshold ADC in 16nm FinFET," IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2018, pp. 110-112.
- [6] P. Upadhyaya et al., "A fully adaptive 19-to-56Gb/s PAM-4 wireline transceiver with a configurable ADC in 16nm FinFET," IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2018, pp. 108-110.
- [7] J. Kim et al., "A 112Gb/s PAM-4 transmitter with 3-Tap FFE in 10nm CMOS," IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2018, pp. 102-104.
- [8] C. Menolfi et al., "A 112Gb/s 2.6pJ/b 8-Tap FFE PAM-4 SST TX in 14nm CMOS," IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2018, pp. 104-106.
- [9] J. L. Zerbe et al., "Equalization and clock recovery for a 2.5-10-Gb/s 2-PAM/4-PAM backplane transceiver cell," in IEEE Journal of Solid-State Circuits, vol. 38, no. 12, pp. 2121-2130, Dec. 2003.
- [10] J. Kim et al., "A 16-to-40Gb/s quarter-rate NRZ/PAM4 dual-mode transmitter in 14nm CMOS," IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2015, pp. 60-61.
- [11] M. Bassi, F. Radice, M. Bruccoleri, S. Erba and A. Mazzanti, "A 45Gb/s PAM-4 transmitter delivering 1.3Vppd output swing with 1V supply in 28nm CMOS FDSOI," IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2016, pp. 66-67.
- [12] T. O. Dickson et al., "A 1.8pJ/b 56Gb/s PAM-4 Transmitter with Fractionally Spaced FFE in 14nm CMOS," IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2017, pp. 118-119.
- [13] G. Steffan et al., "A 64Gb/s PAM-4 Transmitter with 4-Tap FFE and 2.26pJ/b Energy Efficiency in 28nm CMOS FDSOI," ISSCC Dig., Feb. 2017, pp. 116-117.
- [14] A. Roshan-Zamir, O. Elhadidy, H. W. Yang and S. Palermo, "A Reconfigurable 16/32 Gb/s Dual-Mode NRZ/PAM4 SerDes in 65-nm CMOS," in IEEE Journal of Solid-State Circuits, vol. 52, no. 9, pp. 2430-2447, Sept. 2017.
- [15] S. Palermo et al., "CMOS ADC-based receivers for high-speed electrical and optical links," in IEEE Communications Magazine, vol. 54, no. 10, pp. 168-175, October 2016.
- [16] D. Cui et al., "3.2 A 320mW 32Gb/s 8b ADC-based PAM-4 analog frontend with programmable gain control and analog peaking in 28nm CMOS," IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2016, pp. 58-59.
- [17] Aurangozeb, A. D. Hossain, M. Mohammad and M. Hossain, "Channel-Adaptive ADC and TDC for 28 Gb/s PAM-4 Digital Receiver," in IEEE Journal of Solid-State Circuits, vol. 53, no. 3, pp. 772-788, March 2018.
- [18] P.-J. Peng et al., "A 56Gb/s PAM-4/NRZ Transceiver in 40nm CMOS," IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2017, pp. 110-111.
- [19] E. Depaoli et al., "A 4.9pJ/b 16-to-64Gb/s PAM-4 VSR transceiver in 28nm FDSOI CMOS," 2018 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, 2018, pp. 112-114.
- [20] E. Monaco, G. Anzalone, G. Albasini, S. Erba, M. Bassi and A. Mazzanti, "A 2–11 GHz 7-Bit High-Linearity Phase Rotator Based on Wideband Injection-Locking Multi-Phase Generation for High-Speed Serial Links in 28-nm CMOS FDSOI," in IEEE Journal of Solid-State Circuits, vol. 52, no. 7, pp. 1739-1752, July 2017.
- [21] E. Mammei et al., "Analysis and Design of a Power-Scalable Continuous-Time FIR Equalizer for 10 Gb/s to 25 Gb/s Multi-Mode Fiber EDC in 28 nm LP CMOS," in IEEE Journal of Solid-State Circuits, vol. 49, no. 12, pp. 3130-3140, Dec. 2014.
- [22] Chih-Fan Liao and Shen-Iuan Liu, "A 10Gb/s CMOS AGC Amplifier with 35dB Dynamic Range for 10Gb Ethernet," 2006 IEEE International Solid State Circuits Conference - Digest of Technical Papers, San Francisco, CA, 2006, pp. 2092-2101.
- [23] S. Parikh et al., "A 32Gb/s wireline receiver with a low-frequency equalizer, CTLE and 2-tap DFE in 28nm CMOS," 2013 IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, 2013, pp. 28-29.
- [24] B. Zhang et al., "A 28 Gb/s Multistandard Serial Link Transceiver for Backplane Applications in 28 nm CMOS," in IEEE Journal of Solid-State Circuits, vol. 50, no. 12, pp. 3089-3100, Dec. 2015.
- [25] A. Carusone and D. A. Johns, "Analogue adaptive filters: past and present," in IEE Proceedings - Circuits, Devices and Systems, vol. 147, no. 1, pp. 82-90, Feb 2000.
- [26] Available online: http://www.ieee802.org/3/bs/public/14\_11/dudek\_3bs\_01\_1114.pdf
- [27] Y. Frans et al., "A 40-to-64 Gb/s NRZ Transmitter With Supply-Regulated Front-End in 16 nm FinFET," in IEEE Journal of Solid-State Circuits, vol. 51, no. 12, pp. 3167-3177, Dec. 2016.
- [28] A. A. Hafez, M. S. Chen and C. K. K. Yang, "A 32–48 Gb/s Serializing Transmitter Using Multiphase Serialization in 65 nm CMOS Technology," in IEEE Journal of Solid-State Circuits, vol. 50, no. 3, pp. 763-775, March 2015.
- [29] M. Bassi, F. Radice, M. Bruccoleri, S. Erba and A. Mazzanti, "A High-Swing 45 Gb/s Hybrid Voltage and Current-Mode PAM-4 Transmitter in 28 nm CMOS FDSOI," in IEEE Journal of Solid-State Circuits, vol. 51, no. 11, pp. 2702-2715, Nov. 2016.



**E. Depaoli** was born in Pavia, Italy, in 1980. He received the Laurea degree in electronics engineering from the University of Pavia, Italy, in 2004. His Laurea thesis focused on the study of an RF Front-End for Multistandard Simultaneous or Alternative Receiver Based on LNA with Positive Feedback. From 2004 to 2005 he was involved in the FIRB project, working on the design of a multistandard receiver for WLAN. In December 2005 he joined STMicroelectronics in Pavia, in the "Studio di Microelettronica", as an analog design engineer in the New IP & Design Support Group. He is currently the analog project leader of the High Speed interface design team. His current research interests include the development of IPs for high-speed serial links.



**H. Zhang** was born in Zibo, China, in 1990. He received the bachelor's degree in Electronic Science and Technology from Hefei University of Technology, Hefei, China, in 2013, and the Master's degree in Microelectronics from the University of Pavia, Pavia, Italy, in 2016. During his thesis period, he worked on an Adaptive FIR Equalizer for high-speed serial communication.

In November 2016, he started working towards the Ph.D. degree at the University of Pavia in collaboration with STMicroelectronics (Studio di Microelettronica). His current research focuses on high-speed wireline communication IC design, with particular emphasis on PAM-4 equalization techniques.



**M. Mazzini** was born in Voghera, Italy, in 1979. He received the Laurea degree (summa cum laude) in electronics engineering from the University of Pavia, Pavia, Italy, in 2004. During his Laurea thesis, he studied RF applications of MEMS sensors in cooperation with STMicroelectronics.

In May 2005, he started working for STMicroelectronics of Castelletto, Italy, joining several product groups including the Printer Division (2005-2006) and Read/Write channel for hard-disk drives (2006-2010). Since September 2010, he has been involved as an analog designer in the High-Speed Interface IPs team located in Pavia to develop equalizers and phase generators for Ser-Des applications.



**W. Audoglio** was born in Abano Terme, Italy, in 1978. He received the Laurea degree in electronics engineering from the University of Pavia, Italy, in 2003. In February 2003 he joined the "Studio di Microelettronica", where he was involved in the design of a multi-standard reconfigurable ADC for GSM, UMTS and WLAN. In 2005 he joined STMicroelectronics, Pavia. His design activities have included wide-band high-resolution analog-to-digital converters and digital calibration techniques, UWB frequency synthesis, and analog-to-digital converters for HDD. His current research interests are focused on high-speed serial communication, particularly BaseT Ethernet and multi-gigabit serial interfaces for I/O backplanes.



**A.A. Rossi** was born in Pavia, Italy, in 1967. He received the Laurea degree in electronics engineering from the University of Pavia, Italy, in 1993. In 1994 he joined STMicroelectronics at the Castelletto site in Milan, Italy. From 1995 to 2005 he was with the Data Storage Division as a system architect, where he was involved in many projects for hard-disk drives based on PRML detection; from 2000 to 2001 he was also a visiting scholar at CMRR, University of California San Diego. Since 2006 he has been with the STMicroelectronics High-Speed Interface IPs team as a system architect. From 2006 to 2011 he was located in Grenoble, France, where he was involved in consumer IPs (SATA, PCI-Ex). Since 2011 he has been with the "Studio di Microelettronica" in Pavia, where he is involved in leading-edge networking Ser-Des.



**G. Albasini** was born in Voghera, Italy, in 1974. He received the Laurea degree in electronics engineering from the University of Pavia, Pavia, Italy, in 1999, and the Executive MBA degree from Stogea, Bologna, Italy, in 2004.

In 2000, he joined STMicroelectronics, Pavia, where he was involved in RF analog design for radio communications. In 2002, he joined IMEC, Leuven, Belgium, where he was involved in system analysis for WLAN. Since 2007, he has been a Design Manager within STMicroelectronics, leading many innovative projects in the fields of millimeter-wave, UWB, probe storage, and HDD, and achieving many silicon successes. In recent years his research interests have included high-speed serial interfaces, particularly for BaseT Ethernet and SerDes applications.




**M. Pozzoni** was born in Pavia, Italy, in 1969. He graduated summa cum laude in electronics engineering from the University of Pavia, Italy, in 1994 and in the same year he joined STMicroelectronics, Castelletto site (Milan). In 1998 he joined the Telecom Wireline division inside STMicroelectronics as analog design lead and, from 2000, as high-speed design lead. Since then he has led multiple high-speed interface and SerDes designs, with multiple projects successfully brought to production in multiple technology nodes. He is an expert in electronic analog and digital design and system architecture for high-speed interconnections.



**S. Erba** was born in Como, Italy, in 1976. He received the Laurea degree in Electrical Engineering from the University of Pavia, Italy, in March 2001. In 2001 he joined STMicroelectronics, working on high-speed interface analog design and system architecture, and became High Speed Interface R&D manager in January 2015.

During his career, Mr. Erba led the SerDes development of 10+ projects successfully brought to production in the most advanced CMOS technology nodes. His most important scientific contributions are related to innovative receiver and transmitter architectures and to design and layout techniques for High Speed Serial Links.

Dr. Erba has been a member of the Technical Program Committee of the IEEE International Solid State Circuits Conference, since 2015.



**E. Temporiti** received the Laurea degree in Electronic Engineering from the University of Pavia, Italy, in 1999, working in conjunction with Alcatel Italia.

Since 2000, he has been with STMicroelectronics in the "Studio di Microelettronica" in Pavia, Italy, focusing on CMOS analog and mixed-signal high-speed integrated circuits for wireless and wireline communications. He is currently responsible for the advanced generation IP development in the CMOS ASIC R&D team.

He holds U.S. and European patents, mainly in the fields of optical communications and frequency synthesis.



**A. Mazzanti** (S'02–M'09–SM'13) received the Laurea and Ph.D. degrees in electrical engineering from the Università di Modena and Reggio Emilia, Modena, Italy, in 2001 and 2005, respectively. In 2003, he joined Agere Systems, Allentown, PA, USA, as an Intern. From 2006 to 2009, he was an Assistant Professor with the Università di Modena and Reggio Emilia. In 2010, he joined the Università di Pavia, Pavia, Italy, where he is currently an Associate Professor. He has authored over 100 technical papers. His current research interests include device modeling and IC design for high-speed communications, RF, and millimeter-wave systems. Dr. Mazzanti was a member of the Technical Program Committee of the IEEE Custom Integrated Circuit Conference from 2008 to 2014. He has been a member of the Technical Program Committees of the IEEE European Solid State Circuits Conference and the IEEE International Solid State Circuits Conference since 2014. He was an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-I from 2012 to 2015, and the Guest Editor of the special issues of the IEEE JOURNAL OF SOLID STATE CIRCUITS dedicated to CICC 2013–14 and ESSCIRC-2015. Since 2017 he has been serving as Associate Editor for the IEEE SOLID STATE CIRCUITS LETTERS.