A SHORTENED IMPULSE RESPONSE FILTER (SIRF) SCHEME FOR COST-EFFECTIVE ECHO CANCELLER DESIGN OF 10GBASE-T ETHERNET SYSTEM

Ming-Feng Hsu, Yen-Liang Chen, Kai-Yuan Jheng, and An-Yeu (Andy) Wu

Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 10617, Taiwan, ROC

Abstract—The IEEE 802.3an task force is developing a new Ethernet standard, 10GBase-T. Currently, most 10-Gb/s Ethernet links use fiber because of its high bandwidth. However, fiber and optical devices remain too expensive for widespread adoption. Our goal is therefore to lower the cost of the 10GBase-T transceiver. We propose two new low-cost 10GBase-T transceiver architectures. The first, echo shortening, uses a Shortened Impulse Response Filter (SIRF) to shorten the echo impulse response and thereby reduce the cost of the echo canceller. The second, joint shortening, uses the SIRF to shorten the echo and NEXT impulse responses jointly. Compared with the conventional architecture, the proposed echo-shortening and joint-shortening architectures reduce the hardware cost by 12% and 35%, respectively.

I. Introduction

In this paper, we take 10GBase-T as our target system. Although the IEEE 802.3an task force [1] has not yet finalized the new 10 Gigabit Ethernet standard, we adopt the most promising proposed specifications as our design parameters. Fig. 1 shows the baseband DSP block diagram of the 10GBase-T transceiver: the upper path is the transmitter and the lower path is the receiver.

To preserve the low cost that characterizes Ethernet, the goal of this paper is to lower the cost of the 10GBase-T baseband DSP. In 10GBase-T, however, data are transmitted over four wire pairs, all operating in full duplex, so the transceiver must cancel interference such as echo, near-end crosstalk (NEXT), and far-end crosstalk (FEXT).

In the conventional echo cancellation scheme [2][3], the echo canceller is an adaptive FIR filter that produces a replica of the echo, and this replica is subtracted from the received signal to remove the echo. The NEXT canceller has the same structure, except that its input comes from the neighboring transmitter that causes the NEXT interference instead of from the local transmitter. To produce the echo replica, the impulse response of the echo canceller must adapt to the echo impulse response, so the complexity and cost of the echo canceller are proportional to the length of the echo impulse response. In the 10GBase-T environment the echo response is very long, and the echo canceller therefore requires an adaptive FIR filter with hundreds of taps.
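The conventional scheme described above can be sketched in a few lines: an adaptive FIR filter, trained with LMS, learns an echo replica from the transmitted symbols, and the replica is subtracted from the received signal. This is a minimal illustration under stated assumptions, not the transceiver's actual design; the echo path `echo_path`, the step size `mu`, and the noise-free training are hypothetical choices that keep the example easy to check.

```python
import random


def fir(h, x):
    """Causal FIR filtering: y[n] = sum_k h[k] * x[n - k], zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def lms_echo_canceller(tx, rx, num_taps, mu):
    """Conventional echo canceller: an adaptive FIR filter trained by LMS.

    tx: transmitted symbols (the canceller input)
    rx: received samples containing the echo of tx
    Returns the final taps and the residual e(n) = rx(n) - replica(n).
    """
    taps = [0.0] * num_taps
    delay_line = [0.0] * num_taps              # delay_line[k] holds tx(n - k)
    residual = []
    for x, r in zip(tx, rx):
        delay_line = [x] + delay_line[:-1]
        replica = sum(c * d for c, d in zip(taps, delay_line))     # echo replica
        e = r - replica                                            # received minus replica
        taps = [c + mu * e * d for c, d in zip(taps, delay_line)]  # LMS update
        residual.append(e)
    return taps, residual


if __name__ == "__main__":
    rng = random.Random(0)
    echo_path = [0.5 * 0.8 ** k for k in range(16)]      # hypothetical echo response
    tx = [rng.choice((-1.0, 1.0)) for _ in range(4000)]  # PAM2 training symbols
    rx = fir(echo_path, tx)                               # echo only, no noise
    taps, residual = lms_echo_canceller(tx, rx, num_taps=16, mu=0.01)
    print("largest tap error:", max(abs(a - b) for a, b in zip(taps, echo_path)))
```

Every tap costs one multiplier for the replica and one for the coefficient update, which is why the canceller cost grows with the length of the echo response.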

We therefore propose a cost-efficient echo canceller architecture in which a Shortened Impulse Response Filter (SIRF) shortens the echo impulse response. This reduces the number of taps the echo canceller requires and hence its cost. The same channel-shortening technique can also be applied to the NEXT cancellers to reduce their cost. Based on this principle, we propose new cost-effective architectures for the 10GBase-T baseband.

![Fig. 1. Block diagram of 10GBase-T transceiver.](image)

II. Proposed SIRF Scheme

Melsa et al. proposed an algorithm that finds the optimal channel-shortening filter by maximizing the shortening SNR (SSNR) [4]. The optimal shortening algorithm requires knowledge of the channel, so the channel response must be estimated before the algorithm is run, and the accuracy of this estimate affects the shortening performance. To maximize the SSNR, one must try different window locations, compute the SIRF coefficients and the resulting SSNR for each, and choose the window location and SIRF coefficients that yield the best SSNR. This search is computationally intensive, so the optimal shortening algorithm is usually performed off-line rather than on-line.
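The window search described above can be sketched as follows, assuming the usual definition of SSNR: the energy of the effective response \( c = h * w \) inside a \( v \)-tap window, divided by its energy outside the window. For simplicity the SIRF `w` is held fixed here; the per-window computation of the optimal SIRF coefficients in [4] is not reproduced. The channel `h` and the SIRF `[1.0, -0.5]` are hypothetical toy values.

```python
import math


def convolve(a, b):
    """Full linear convolution, used for the effective response c = h * w."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def ssnr_db(c, delay, v):
    """Energy of c inside the v-tap window starting at `delay` over the energy outside it."""
    inside = sum(x * x for x in c[delay:delay + v])
    outside = sum(x * x for x in c[:delay]) + sum(x * x for x in c[delay + v:])
    if outside == 0.0:
        return math.inf
    if inside == 0.0:
        return -math.inf
    return 10.0 * math.log10(inside / outside)


def best_window(h, w, v):
    """Try every window position for a fixed SIRF w; return (best SSNR in dB, delay)."""
    c = convolve(h, w)
    return max((ssnr_db(c, d, v), d) for d in range(len(c) - v + 1))


if __name__ == "__main__":
    h = [1.0, 0.5, 0.25, 0.125]            # hypothetical channel
    print(best_window(h, [1.0], v=1))       # no shortening
    print(best_window(h, [1.0, -0.5], v=1))  # toy SIRF cancels the tail of h
```

With the toy SIRF the best 1-tap window reaches about 24 dB, versus under 5 dB without shortening. In the optimal method, the outer loop over window positions wraps a full coefficient optimization, which is what makes it expensive.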

A. Proposed Channel Shortening Echo Cancellation

In [5], an LMS approach is proposed that computes the coefficients iteratively, based on the steepest-descent algorithm. The LMS channel-shortening architecture is shown in Fig. 2. The error is fed back to the SIRF \( w(n) \) and to the target channel \( b(n) \) to adjust their coefficients. The step sizes \( \mu_b \) and \( \mu_w \) are important parameters of the algorithm: they affect the convergence speed, the stability, and the shortening performance. In addition, the energy constraint \( \mathbf{b}^T \mathbf{b} = 1 \) is imposed to prevent the trivial solution \( \mathbf{w} = \mathbf{b} = [0 \ 0 \ldots 0] \). If \( \mu_b \), \( \mu_w \), and the initial values of \( w(n) \) and \( b(n) \) are chosen properly, the mean square error \( E[|e(n)|^2] \) converges. The cascade of \( h(n) \) and \( w(n) \) then approaches \( b(n) \), which has only \( v \) taps. In other words, the impulse response \( h(n) \) is shortened to \( b(n) \).
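The LMS shortening loop of Fig. 2 can be sketched as below. This is a minimal sketch, not the implementation of [5]: it assumes white PAM2 training, a noise-free channel, spike initial values for \( w \) and \( b \), the error \( e(n) = \mathbf{b}^T \mathbf{x}_n - \mathbf{w}^T \mathbf{y}_n \), and enforcement of \( \mathbf{b}^T \mathbf{b} = 1 \) by rescaling \( b \) after every update. The channel `h` is hypothetical, built as a 2-tap head followed by a decaying tail so that a 2-tap SIRF can shorten it almost exactly.

```python
import math
import random


def convolve(a, b):
    """Full linear convolution."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def lms_shorten(x, y, num_w, v, delay, mu_w, mu_b):
    """LMS channel shortening with a unit-energy target b.

    x: training symbols; y: channel output (h * x).
    w (num_w taps) filters y; b (v taps) filters x delayed by `delay`.
    e(n) = b^T x_n - w^T y_n; b -= mu_b e x_n; w += mu_w e y_n;
    then b is rescaled so that b^T b = 1 (assumed way to enforce the constraint).
    """
    w = [1.0] + [0.0] * (num_w - 1)      # assumed initial values
    b = [1.0] + [0.0] * (v - 1)
    errors = []
    for n in range(len(x)):
        y_vec = [y[n - k] if n >= k else 0.0 for k in range(num_w)]
        x_vec = [x[n - delay - k] if n >= delay + k else 0.0 for k in range(v)]
        e = (sum(bk * xk for bk, xk in zip(b, x_vec))
             - sum(wk * yk for wk, yk in zip(w, y_vec)))
        b = [bk - mu_b * e * xk for bk, xk in zip(b, x_vec)]
        w = [wk + mu_w * e * yk for wk, yk in zip(w, y_vec)]
        norm = math.sqrt(sum(bk * bk for bk in b))
        b = [bk / norm for bk in b]      # energy constraint b^T b = 1
        errors.append(e)
    return w, b, errors


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical long channel: head [1, 0.5] followed by a 0.7^k tail.
    h = convolve([1.0, 0.5], [0.7 ** k for k in range(25)])
    x = [rng.choice((-1.0, 1.0)) for _ in range(20000)]
    y = convolve(h, x)[:len(x)]
    w, b, errors = lms_shorten(x, y, num_w=2, v=2, delay=0, mu_w=0.01, mu_b=0.01)
    c = convolve(h, w)                   # effective (shortened) response
    print("w =", w, "b =", b)
    print("energy outside the first 2 taps:", sum(ck * ck for ck in c[2:]))
```

In this toy case the SIRF converges near a scaled \( [1, -0.7] \), which cancels the \( 0.7^k \) tail, and \( b \) converges near \( [1, 0.5] \) rescaled to unit energy, so almost all of the energy of \( h * w \) falls in its first two taps.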

We now propose a new low-cost echo cancellation scheme based on this channel-shortening approach; its architecture is shown in Fig. 3. Compared with the conventional architecture, a SIRF is added at the receiver to shorten the echo impulse response. In addition, the initial values and the weight-update mechanism of the echo canceller differ from those of the conventional algorithm, because the energy constraint must also be taken into account.

B. Proposed Joint Shortening Echo and NEXT Cancellation

In [4][6][7], a joint shortening approach is applied to DMT systems, in which the SIRF jointly shortens the channel and the echo. However, these algorithms require large matrix operations and are therefore not suitable for hardware implementation. In this section, we generalize the concept in [4] to multi-channel shortening, and we derive the corresponding LMS algorithm and hardware architecture. Suppose we want to jointly shorten \( N \) channels; the architecture is shown in Fig. 4.
The weight-update equations for joint multi-channel shortening follow from the LMS algorithm:

\[
\begin{aligned}
e(n) &= \sum_{i=1}^{N} b_{i,n}^T x_{i,n} - w_n^T y_n, \\
b_{i,n+1} &= b_{i,n} - \mu_b \, e(n) \, x_{i,n}, \qquad i = 1, \ldots, N, \\
w_{n+1} &= w_n + \mu_w \, e(n) \, y_n.
\end{aligned}
\]

To avoid the trivial all-zero solution, each target filter is constrained to unit energy:

\[
b_{i,n}^T b_{i,n} = 1, \qquad i = 1, \ldots, N.
\]
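One iteration of the joint update above, including the per-filter energy constraint, can be sketched as a single function. As in the single-channel case, we assume the constraint is enforced by rescaling each \( b_i \) after its update; the function and argument names are hypothetical.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def joint_lms_step(b_list, w, x_list, y_vec, mu_b, mu_w):
    """One iteration of joint N-channel shortening.

    b_list: target filters b_{i,n}, one per shortened channel (i = 1..N)
    w:      SIRF taps w_n
    x_list: input vectors x_{i,n}, one per channel
    y_vec:  received-signal vector y_n seen by the SIRF
    Returns (updated b_list, updated w, error e(n)).
    """
    e = sum(dot(b, x) for b, x in zip(b_list, x_list)) - dot(w, y_vec)
    new_b = []
    for b, x in zip(b_list, x_list):
        b = [bk - mu_b * e * xk for bk, xk in zip(b, x)]
        norm = math.sqrt(dot(b, b))
        new_b.append([bk / norm for bk in b])        # enforce b_i^T b_i = 1
    new_w = [wk + mu_w * e * yk for wk, yk in zip(w, y_vec)]
    return new_b, new_w, e
```

For the configuration with the echo and three NEXT responses, `b_list` and `x_list` would each hold four entries, one per shortened impulse response, while a single SIRF `w` operates on the received signal.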

To perform on-line joint echo and NEXT cancellation, we adopt this LMS approach with \( N = 4 \): the SIRF jointly shortens the echo impulse response and the three NEXT impulse responses. The architecture is shown in Fig. 5, and the corresponding weight-update equations are

\[
\begin{aligned}
e(n) &= b_{1,n}^T x_{1,n} + b_{2,n}^T x_{2,n} + b_{3,n}^T x_{3,n} + b_{4,n}^T x_{4,n} - w_n^T y_n, \\
b_{1,n+1} &= b_{1,n} - \mu_b \, e(n) \, x_{1,n}, \\
b_{2,n+1} &= b_{2,n} - \mu_b \, e(n) \, x_{2,n}, \\
b_{3,n+1} &= b_{3,n} - \mu_b \, e(n) \, x_{3,n}, \\
b_{4,n+1} &= b_{4,n} - \mu_b \, e(n) \, x_{4,n}, \\
w_{n+1} &= w_n + \mu_w \, e(n) \, y_n.
\end{aligned}
\]

The energy constraints

\[
\begin{aligned}
b_{1,n}^T b_{1,n} &= 1, \\
b_{2,n}^T b_{2,n} &= 1, \\
b_{3,n}^T b_{3,n} &= 1, \\
b_{4,n}^T b_{4,n} &= 1
\end{aligned}
\]

are imposed to prevent the trivial solution

\[
w = b_1 = b_2 = b_3 = b_4 = [0 \ldots 0].
\]

For comparison, in the echo-shortening architecture shown in Fig. 3, the SIRF shortens only the echo impulse response, and the three NEXT cancellers run the conventional channel estimation algorithm. However, each NEXT canceller must then estimate the cascade of the SIRF and the NEXT channel rather than the NEXT channel alone, because the SIRF also filters the NEXT interference. Fortunately, in the 10GBase-T system the echo and NEXT frequency responses are both high-pass. Hence, the SIRF does not make the effective NEXT response, i.e., the cascade of the SIRF and the NEXT channel, longer than the original NEXT response.
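The claim above concerns the effective NEXT response, i.e., the convolution of the SIRF with the NEXT response. The sketch below computes that cascade and compares lengths, assuming (our choice for illustration, not a definition from the paper) that effective length means the shortest window holding 99% of the energy. The responses are toy values chosen so the result is easy to verify; they are not 10GBase-T NEXT models.

```python
def convolve(a, b):
    """Full linear convolution: the cascade of two FIR responses."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def effective_length(c, fraction=0.99):
    """Length of the shortest contiguous window of c holding `fraction` of its energy."""
    energy = [ci * ci for ci in c]
    target = fraction * sum(energy)
    best = len(c)
    for start in range(len(c)):
        acc = 0.0
        for end in range(start, len(c)):
            acc += energy[end]
            if acc >= target:
                best = min(best, end - start + 1)
                break
    return best


if __name__ == "__main__":
    next_path = [0.5 ** k for k in range(20)]    # hypothetical NEXT response
    sirf = [1.0, -0.5]                           # hypothetical SIRF
    effective_next = convolve(sirf, next_path)   # cascade of SIRF and NEXT
    print(effective_length(next_path), effective_length(effective_next))
```

Here the cascade has one more tap than the NEXT response but an effective length of 1 instead of 4; with measured echo and NEXT responses, the same two functions can be used to check whether a given SIRF lengthens the effective NEXT response.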

III. Performance Analysis and Comparison

A. Floating-point System Simulation

The proposed joint echo and NEXT cancellation scheme based on channel shortening inserts a SIRF at the receiver. The proposed transceiver architecture is shown in Fig. 6.

The simulation environment is based on the 10GBase-T system specifications. The training symbols are PAM2 and the data symbols are PAM12. The channel models for insertion loss, return loss, and NEXT are available from the IEEE 802.3an website [1]. AWGN at an SNR of 30 dB is added to model the interference and noise other than echo and NEXT. The equalizer learning curves, averaged over 200 independent random runs, are shown in Fig. 7. All three architectures meet the 23.8-dB SNR requirement specified for 10GBase-T, and the performance gap between the proposed and conventional architectures is less than 1 dB.
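The signal model above (PAM2 training, PAM12 data, AWGN at 30 dB) and the SNR check against the 23.8-dB requirement can be sketched as below. The odd-integer PAM alphabets and the definition of SNR as reference power over error power are assumptions for illustration; the IEEE 802.3an channel models are not reproduced.

```python
import math
import random


def pam_alphabet(m):
    """M-PAM levels {-(M-1), ..., -1, 1, ..., M-1} (unnormalized odd integers)."""
    return [float(2 * k - m + 1) for k in range(m)]


def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise at the given SNR, relative to the measured signal power."""
    p_sig = sum(s * s for s in signal) / len(signal)
    sigma = math.sqrt(p_sig / 10.0 ** (snr_db / 10.0))
    return [s + rng.gauss(0.0, sigma) for s in signal]


def snr_db(reference, observed):
    """SNR of `observed` against the ideal `reference`, e.g. at the equalizer output."""
    sig = sum(r * r for r in reference)
    err = sum((o - r) ** 2 for r, o in zip(reference, observed))
    return 10.0 * math.log10(sig / err)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.choice(pam_alphabet(12)) for _ in range(10000)]   # PAM12 data
    noisy = add_awgn(data, 30.0, rng)
    snr = snr_db(data, noisy)
    print(round(snr, 1), snr >= 23.8)    # compare with the 23.8-dB requirement
```

In the full simulation, `observed` would be the equalizer output rather than the noisy transmit symbols, so echo and NEXT residuals would also lower the measured SNR.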

B. Implementation Cost Comparison

We divide the hardware cost into two parts: storage and arithmetic units. For an $N$-tap adaptive FIR filter, the number of registers required for storage is

$$\text{storage} = (N-1) \times W_d + N \times W_c \quad \text{(registers)}. \quad (5)$$

where $W_d$ is the input data wordlength and $W_c$ is the coefficient wordlength.
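Equation (5) can be evaluated directly. The sketch reads the first term as the $N-1$ data registers of the tapped delay line and the second as the $N$ coefficient registers; that reading, the function name, and the wordlengths in the example are our assumptions.

```python
def adaptive_fir_storage(n_taps, w_d, w_c):
    """Eq. (5): registers needed by an N-tap adaptive FIR filter.

    (N - 1) * W_d: data delay line; N * W_c: coefficient registers.
    """
    return (n_taps - 1) * w_d + n_taps * w_c


if __name__ == "__main__":
    # Hypothetical example: 100 taps, 10-bit data, 12-bit coefficients.
    print(adaptive_fir_storage(100, w_d=10, w_c=12))
```

Because the cost is linear in $N$, every tap removed from a canceller saves $W_d + W_c$ registers.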

For the arithmetic part, we use the full adder (FA) as the basic unit of hardware cost. We choose the Baugh-Wooley multiplier [8] and carry-propagation adders as the arithmetic blocks of the adaptive FIR filter. The cost of the arithmetic part is

$$
\begin{aligned}
\text{arithmetic} = {} & N \times [W_o \times (W_c - 1) + 1] + (N-1) \times W_o \\
& + N \times [W_o \times (W_c - 1) + 1] + N \times W_c \quad \text{(FAs)}. \quad (6)
\end{aligned}
$$

where $W_o$ is the output data wordlength.

In (6), the first line is the filter cost and the second line is the weight-update cost. The results are listed in Table 1: the proposed echo-shortening and joint-shortening architectures save about 12% and 35% of the hardware cost, respectively.
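Equation (6) and the Saving column can likewise be checked numerically. The sketch assumes the bracketed term in (6) is the FA count of one Baugh-Wooley multiplier, and it computes the saving as the relative reduction of the Total column with respect to the conventional architecture. With the totals in Table 1 this gives about 12.6% and 37.1% for storage and 13.7% and 35.1% for arithmetic; small last-digit differences from the table remain.

```python
def adaptive_fir_arithmetic(n_taps, w_o, w_c):
    """Eq. (6): full-adder (FA) count of an N-tap adaptive FIR filter.

    Returns (filter FAs, weight-update FAs), i.e. the two lines of (6).
    """
    multiplier = w_o * (w_c - 1) + 1                        # FAs per multiplier (assumed reading)
    filter_fas = n_taps * multiplier + (n_taps - 1) * w_o   # first line of (6)
    update_fas = n_taps * multiplier + n_taps * w_c         # second line of (6)
    return filter_fas, update_fas


def saving_percent(conventional_total, proposed_total):
    """Relative cost reduction with respect to the conventional architecture."""
    return 100.0 * (conventional_total - proposed_total) / conventional_total


if __name__ == "__main__":
    # Totals taken from Table 1.
    storage = {"echo": 20464, "joint": 14719}
    arithmetic = {"echo": 167195, "joint": 125620}
    for name in ("echo", "joint"):
        print(name,
              round(saving_percent(23407, storage[name]), 1),
              round(saving_percent(193708, arithmetic[name]), 1))
```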

**Table 1. Hardware cost comparison between the conventional and proposed architectures: (a) storage (registers); (b) arithmetic (FAs).**

(a) Storage (registers)

<table>
<thead>
<tr>
<th>Architecture</th>
<th>FFE+FBE+THP</th>
<th>SIRF</th>
<th>EC+NC</th>
<th>Total</th>
<th>Saving</th>
</tr>
</thead>
<tbody>
<tr>
<td>Conventional</td>
<td>983</td>
<td>0</td>
<td>22,484</td>
<td>23,407</td>
<td>-</td>
</tr>
<tr>
<td>The Proposed Echo Shortening</td>
<td>1,208</td>
<td>372</td>
<td>18,884</td>
<td>20,464</td>
<td>12.6%</td>
</tr>
<tr>
<td>The Proposed Joint Shortening</td>
<td>1,313</td>
<td>372</td>
<td>13,034</td>
<td>14,719</td>
<td>37.2%</td>
</tr>
</tbody>
</table>

(b) Arithmetic (FAs)

<table>
<thead>
<tr>
<th>Architecture</th>
<th>FFE+FBE+THP</th>
<th>SIRF</th>
<th>EC+NC</th>
<th>Total</th>
<th>Saving</th>
</tr>
</thead>
<tbody>
<tr>
<td>Conventional</td>
<td>10,033</td>
<td>0</td>
<td>183,684</td>
<td>193,708</td>
<td>-</td>
</tr>
<tr>
<td>The Proposed Echo Shortening</td>
<td>12,699</td>
<td>3,612</td>
<td>150,884</td>
<td>167,195</td>
<td>13.7%</td>
</tr>
<tr>
<td>The Proposed Joint Shortening</td>
<td>13,774</td>
<td>3,612</td>
<td>108,234</td>
<td>125,620</td>
<td>35.2%</td>
</tr>
</tbody>
</table>

IV. Conclusion

In this paper, we presented system simulations of the conventional and proposed architectures, using the equalizer output SNR as the performance index. Floating-point simulation results show that the proposed architectures meet the 10GBase-T specifications, with a performance degradation of less than 1 dB relative to the conventional architecture. Based on a fixed-point analysis, we compared the cost of the conventional and proposed architectures, dividing it into arithmetic and storage parts. The proposed echo-shortening and joint-shortening architectures save 12% and 35% of the cost, respectively.

**References**


