Digital Signal Processing Engine Design for Polar Transmitter in Wireless Communication Systems

Hung-Yang Ko, Yi-Chiuan Wang and An-Yeu (Andy) Wu

Abstract—Polar modulation techniques support multimode wireless systems and enable the use of a high-efficiency Power Amplifier (PA). This paper describes a new design of a Digital Signal Processing (DSP) engine for the polar transmitter. The digital part comprises a rectangular-to-polar converter and a digital phase modulator, and the engine is designed for the EDGE (2.5G) system. We employ the Coordinate Rotation Digital Computer (CORDIC) and Direct Digital Frequency Synthesizer (DDFS) techniques in our design. A prototype chip has been designed and fabricated in the UMC 0.18 um CMOS process with 1P6M technology.

1. INTRODUCTION

Polar modulation makes it possible to achieve high linearity and high efficiency simultaneously in a wireless transmitter. Improved efficiency is achieved by operating a highly efficient, non-linear PA at its peak efficiency. Linear transmission is achieved by modulating the envelope of the signal through the voltage supply of the PA.

Polar transmission uses envelope and phase components to represent the digital symbols instead of the conventional I/Q format [1]. The baseband signal $V(t)$ is split into the phase signal $\theta(t)$ and the envelope signal $A(t)$.

$$V(t) = x(t) + j \cdot y(t). \qquad (1)$$

$$A(t) = \sqrt{x(t)^2 + y(t)^2},$$

$$\theta(t) = \tan^{-1}\left(\frac{y(t)}{x(t)}\right). \qquad (2)$$
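As a minimal numerical sketch of the split into envelope and phase, and of the recombination into the complex baseband sample, the two mappings can be written in Python. The function names are illustrative and do not come from the paper. `math.atan2` is used in place of a bare $\tan^{-1}(y/x)$ so that the phase is resolved over all four quadrants.

```python
import cmath
import math


def rect_to_polar(x, y):
    """Return (envelope A, phase theta in radians) of the sample x + j*y."""
    envelope = math.hypot(x, y)   # sqrt(x^2 + y^2)
    phase = math.atan2(y, x)      # four-quadrant arctangent
    return envelope, phase


def polar_to_rect(envelope, phase):
    """Recombine envelope and phase into the rectangular pair (x, y)."""
    v = cmath.rect(envelope, phase)  # envelope * exp(j * phase)
    return v.real, v.imag


a, theta = rect_to_polar(-1.0, 1.0)
x, y = polar_to_rect(a, theta)
```

A sample in the second quadrant is used here because it is exactly the case where $\tan^{-1}(y/x)$ alone would return the wrong angle.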

It is clear from Eq. (2) that a phase-only signal can be generated by a phase modulator and then multiplied by its envelope at the PA to recreate the original complex signal $V(t)$. This polar modulation process resembles the Envelope Elimination and Restoration (EER) architecture [2]. In the conventional design, one path goes through a limiter, which removes the envelope and keeps only the phase information, while the other path goes through an envelope detector, which extracts the envelope information.

This work is supported by the MediaTek Inc., under NTU-MTK wireless research project.

However, both circuits suffer from the non-linearity and distortion of the analog devices, which causes mismatch between the two paths. In this paper we propose a DSP engine that includes a rectangular-to-polar converter and a digital Phase Modulator (PM). The design avoids the distortion caused by analog components, and the phase modulation process can be precisely controlled by the digital phase modulator. The baseband phase signal is modulated by the digital phase modulator onto a carrier in the specified frequency range. The phase-modulated signal is denoted $s_{IF-PM}(t)$.

$$s_{IF-PM}(t) = \cos(\omega_c t + \theta(t)). \qquad (3)$$

The PA stage of the amplitude modulator (AM) operates in principle as a multiplier in our design model. This gives the output signal in the specified frequency band as follows:

$$s_{IF}(t) = A(t) \cdot s_{IF-PM}(t),$$

$$= A(t) \cdot \text{Re}\left[e^{j \theta(t)} \cdot e^{j \omega_c t}\right],$$

$$= x(t) \cos(\omega_c t) - y(t) \sin(\omega_c t). \qquad (4)$$

For convenience of the simulation model [2], the gain of the PA is set to one. Eq. (4) then equals the EDGE signal up-converted to the Intermediate Frequency (IF) band. The non-linearity of the PA and the analysis of the up-conversion to the Radio Frequency (RF) stage are beyond the scope of this paper.
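The equivalence between the polar form (envelope times phase-modulated carrier) and the quadrature form of the IF signal can be checked numerically. The sketch below assumes unit PA gain, as in the text, and uses an 8 MHz carrier sampled at 26 MHz purely as example values. The function names are hypothetical.

```python
import math


def s_if_polar(x, y, omega_c, t):
    """IF sample built the polar way: A(t) * cos(omega_c * t + theta(t))."""
    envelope = math.hypot(x, y)
    phase = math.atan2(y, x)
    return envelope * math.cos(omega_c * t + phase)


def s_if_quadrature(x, y, omega_c, t):
    """IF sample built the I/Q way: x*cos(omega_c*t) - y*sin(omega_c*t)."""
    return x * math.cos(omega_c * t) - y * math.sin(omega_c * t)


OMEGA_C = 2.0 * math.pi * 8e6   # example carrier (assumed value)
T_S = 1.0 / 26e6                # example sample period (assumed value)

# Largest difference between the two forms over a few samples and symbols.
worst = max(
    abs(s_if_polar(x, y, OMEGA_C, k * T_S) - s_if_quadrature(x, y, OMEGA_C, k * T_S))
    for (x, y) in [(1.0, 0.0), (-0.7, 0.7), (0.2, -0.9)]
    for k in range(64)
)
```

The difference stays at floating-point rounding level, which follows from the angle-sum identity for the cosine.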

2. POLAR TRANSMITTER ARCHITECTURE

The architecture of the polar transmitter is shown in Fig. 1. The rectangular-to-polar converter extracts the symbol phase and envelope information in the digital domain. The phase information is then modulated by the digital phase modulator to create a constant-envelope, phase-modulated signal. The phase modulation precision and the channel selection can thus be controlled accurately in the digital part. In this paper we use the concept from [3] to realize the digital phase modulator. The digital fine-tune frequencies are generated by the DDFS, which interpolates the carrier frequencies between the coarse frequencies generated by the integer-N PLL. The main design considerations of the DSP engine are: (a) the bandwidth of the envelope and phase signals; (b) the number of fine-tune frequencies generated by the DDFS, which affects the clock rate of the DDFS and of the rectangular-to-polar converter; (c) the quantization effect in the digital domain, which causes phase noise and frequency spurs and also influences the Error Vector Magnitude (EVM) performance and the signal spectrum. Typically the bandwidths of the envelope and phase signals are 1 to 2 MHz, which is larger than the 200 kHz EDGE signal bandwidth. The clock rate of the DDFS can be derived [3] as follows:

\[ f_{clk} = S \cdot f_{sym} > \frac{1}{0.4} \cdot (f_{cs} \times (N + 1) + \frac{f_{tb}}{2}) \]  \hspace{1cm} (5)

where \( f_{clk} \) is the clock rate of the DDFS, \( S \) is the number of samples per symbol and \( f_{sym} \) is the symbol rate of the EDGE signal. The maximum output frequency of the DDFS is limited to 0.4 times the clock frequency. The parameter \( f_{cs} \) is the carrier spacing (200 kHz) in the EDGE system, \( N \) is the number of digital fine-tune frequencies and \( f_{tb} \) is the transition bandwidth of the filter located after the up-converter stage. In our design, we choose \( N=25 \), \( f_{cs}=200 \text{ kHz} \), \( f_{tb}=10 \text{ MHz} \) and \( S=96 \). The right-hand side of Eq. (5) then evaluates to 25.5 MHz, so the DDFS is operated at 26 MHz. The digital fine-tune frequencies generated by the DDFS lie between 5 MHz and 10.4 MHz. Each interpolated frequency (channel) is stored in the FCW table.
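The clock-rate bound of Eq. (5) can be evaluated directly as a quick sanity check, taking the carrier spacing as 200 kHz and the filter transition bandwidth as 10 MHz. The sketch is illustrative only: the function name is hypothetical, and the EDGE symbol rate of 13 MHz / 48 (about 270.833 ksym/s) is the standard GSM/EDGE value rather than a figure quoted in this paper.

```python
def min_ddfs_clock_hz(f_cs, n_fine, f_tb, max_out_ratio=0.4):
    """Right-hand side of Eq. (5): lowest admissible DDFS clock in Hz."""
    return (f_cs * (n_fine + 1) + f_tb / 2.0) / max_out_ratio


F_SYM = 13e6 / 48.0        # EDGE symbol rate in Hz (assumed standard value)
S = 96                     # samples per symbol, from the text
f_clk = S * F_SYM          # chosen DDFS clock, S * f_sym

bound = min_ddfs_clock_hz(f_cs=200e3, n_fine=25, f_tb=10e6)
meets_bound = f_clk > bound
```

With these numbers the chosen clock of 96 samples per symbol sits just above the bound, which is why 26 MHz is sufficient.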

### 3. RECTANGULAR-TO-POLAR CONVERTER

For the coordinate converter, we adopt the CORDIC algorithm because it is simple and has a low hardware cost. To further reduce the complexity, we also apply the technique in [4] to our rectangular-to-polar converter. First, in the first iteration we move the input vector into the 1st or 4th quadrant by simple sign inversion and data exchange. Second, we replace \( y_i \) by \( y_i \cdot 2^{i-2} \) relative to the conventional CORDIC algorithm. This modification saves one iteration and one barrel shifter in the rectangular-to-polar converter, which reduces the area of our design. For \( i=1 \), with the input vector \((x_1, y_1)\) taken from the EDGE signal:

\[ x_2 = d_1 \cdot y_1, \]
\[ y_2 = -d_1 \cdot x_1, \]
\[ z_2 = 0.5 \cdot d_1, \]  \hspace{1cm} (6)

\[ d_1 = \text{sign}(y_1) = \begin{cases} -1, & y_1 < 0 \\ 1, & y_1 \geq 0 \end{cases} \]

The remaining iterations (for \( i=2 \) to \( n \)) are given in Eq. (7):

\[ x_{i+1} = x_i + d_i \cdot 2^{-2(i-2)} \cdot y_i, \]
\[ y_{i+1} = 2 \cdot \left[ y_i - d_i x_i \right], \]
\[ z_{i+1} = z_i + d_i \cdot p_i, \]  \hspace{1cm} (7)

The desired phase is \( z_n \), and the desired envelope is \( x_n \) multiplied by a constant scaling factor \( K \). Because the CORDIC algorithm is iterative, a folded implementation of this module would have to run at a clock rate of \( n \cdot f_{clk} \), where \( n \) is the number of iterations. It is hard for the module to operate at such a high clock rate. As a compromise, we use the unfolded technique; the architecture is shown in Fig. 2.
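The recurrences of Eqs. (6) and (7) can be sketched in floating-point Python to see how they behave. This is not the authors' fixed-point hardware. Several details are assumptions that the text does not state: \( d_i = \text{sign}(y_i) \) is applied in every iteration, \( p_i = \tan^{-1}(2^{-(i-2)})/\pi \) with the phase \( z \) kept in units of \( \pi \), and the constant scaling is applied by dividing by the accumulated CORDIC gain. The function name and the default iteration count are hypothetical.

```python
import math


def cordic_rect_to_polar(x1, y1, n=16):
    """Vectoring CORDIC with the scaled-y recurrence; returns (envelope, phase in rad)."""
    # Eq. (6): rotate by -/+ 90 degrees so the vector lands in quadrant 1 or 4.
    d = 1 if y1 >= 0 else -1
    x, y, z = d * y1, -d * x1, 0.5 * d       # z is in units of pi
    gain = 1.0
    # Eq. (7): y carries an extra factor 2^(i-2), so only x needs a shift.
    for i in range(2, n + 1):
        k = i - 2
        d = 1 if y >= 0 else -1              # assumed: d_i = sign(y_i)
        p = math.atan(2.0 ** -k) / math.pi   # assumed: elementary angle in units of pi
        x, y = x + d * (2.0 ** (-2 * k)) * y, 2.0 * (y - d * x)
        z += d * p
        gain *= math.sqrt(1.0 + 4.0 ** -k)   # growth of |x| per micro-rotation
    return x / gain, z * math.pi


envelope, phase = cordic_rect_to_polar(-1.0, 1.0)
```

Because the scaled \( y \) doubles in every step, a fixed-point implementation would need to budget word-length growth for it; that trade-off is outside this sketch.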

### 4. DIGITAL PHASE MODULATOR

The DDFS architecture is shown in Fig. 3. The DDFS has three basic blocks: the FCW table, the phase accumulator and the phase-to-amplitude converter. The FCW table stores the desired fine-tune frequency control words, which can be derived from Eq. (8).

\[ f_c = \frac{FCW \cdot f_{clk}}{2^L}, \quad \forall \quad FCW < 2^{L-1} \]  \hspace{1cm} (8)
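To illustrate Eq. (8), the sketch below computes a frequency control word and steps a phase accumulator. It assumes that \( L \) is the word-length of the phase accumulator, which the text does not state explicitly, and it uses \( L = 24 \) and an ideal `math.cos` in place of the phase-to-amplitude converter purely as example choices. All function names are hypothetical.

```python
import math


def fcw_for(f_c, f_clk, acc_bits):
    """Nearest frequency control word for carrier f_c, per Eq. (8)."""
    fcw = round(f_c * (1 << acc_bits) / f_clk)
    if not 0 < fcw < (1 << (acc_bits - 1)):
        raise ValueError("FCW must satisfy 0 < FCW < 2^(L-1)")
    return fcw


def ddfs_samples(fcw, acc_bits, count):
    """Step an L-bit phase accumulator and map its phase to an ideal cosine."""
    modulus = 1 << acc_bits
    acc = 0
    out = []
    for _ in range(count):
        out.append(math.cos(2.0 * math.pi * acc / modulus))
        acc = (acc + fcw) % modulus   # accumulator wraps modulo 2^L
    return out


ACC_BITS = 24                          # assumed accumulator width
F_CLK = 26e6
fcw = fcw_for(8e6, F_CLK, ACC_BITS)
f_out = fcw * F_CLK / (1 << ACC_BITS)  # frequency actually synthesized
samples = ddfs_samples(fcw, ACC_BITS, 4)
```

The synthesized frequency differs from the target by at most one frequency step, \( f_{clk}/2^L \), which is what sets the accumulator width in practice.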

In our design we focus on the phase-to-amplitude converter and propose an architecture based on the Least Squares (LS) algorithm [5] and the merged Multiply-Accumulator (MAC) technique [6]. The input phase is first truncated by 3 bits according to the \( \pi/4 \) symmetry, and the amplitude of the sine function is then expressed by a polynomial. The approximating polynomial is generated by the LS algorithm. In this paper we compare its Spurious Free Dynamic Range (SFDR) performance with that of other approximation algorithms such as Taylor and Chebyshev [9]. For the comparison, the input phase is swept from 0 to \( \pi/2 \), the phase word-length is 15 bits and the amplitude output is 15 bits. From the simulation result in Fig. 4, the LS-based polynomial achieves better performance than the Taylor and Chebyshev approximations with a lower polynomial order. A lower polynomial order also means lower hardware complexity.
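The advantage of a least-squares fit over a Taylor expansion can be illustrated on a single segment. The sketch below fits a second-order polynomial to the sine on \( [0, \pi/4] \) by solving the normal equations and compares its worst-case amplitude error with the second-order Taylor expansion about 0. It is a simplification under stated assumptions: it uses one segment rather than the paper's divided regions, floating point rather than quantized coefficients, and maximum absolute error rather than SFDR. All names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ls_quadratic(func, lo, hi, samples=512):
    """Least-squares coefficients [c0, c1, c2] of c2*x^2 + c1*x + c0 on [lo, hi]."""
    xs = [lo + (hi - lo) * k / (samples - 1) for k in range(samples)]
    moments = [sum(x ** p for x in xs) for p in range(5)]
    rhs = [sum((x ** p) * func(x) for x in xs) for p in range(3)]
    normal = [[moments[i + j] for j in range(3)] for i in range(3)]
    return solve_linear(normal, rhs)


def max_abs_error(coeffs, func, lo, hi, samples=2001):
    """Worst-case |polynomial - func| on a dense grid over [lo, hi]."""
    worst = 0.0
    for k in range(samples):
        x = lo + (hi - lo) * k / (samples - 1)
        approx = coeffs[0] + coeffs[1] * x + coeffs[2] * x * x
        worst = max(worst, abs(approx - func(x)))
    return worst


LO, HI = 0.0, math.pi / 4.0
ls_err = max_abs_error(ls_quadratic(math.sin, LO, HI), math.sin, LO, HI)
# Second-order Taylor expansion of sin about 0 is just x (the x^2 term vanishes).
taylor_err = max_abs_error([0.0, 1.0, 0.0], math.sin, LO, HI)
```

The Taylor polynomial is exact only at the expansion point and degrades toward the far end of the segment, while the LS fit spreads the error over the whole interval.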

\[ p(X) = c_2 \cdot X^2 + c_1 \cdot X + c_0 \]

\[ = \sum_{i=0}^{n-1} R_i \cdot 2^i \cdot [c_1 \cdot X]^n_3 + \sum_{k=0}^{n-1} C_k \cdot 2^k \]

\[ = MAC(\text{rom}_1) + \sum_{j=0}^{n/2} Q_j \cdot [X]^n_3 \cdot 4^j + \text{rom}_2 \]  \hspace{1cm} (9)

\[ Q_j = -2c_{1,2j+1} + c_{1,2j} + c_{1,2j-1}, \quad c_{1,0} = 0, \quad \text{and} \quad c_{1,-1} = 1 \]  \hspace{1cm} (10)

where \( c_i \) denotes the polynomial coefficients and \( X \) is the phase within each divided region. In Eq. (9) the first and third terms are stored in look-up tables. The sizes of \( \text{rom}_1 \) and \( \text{rom}_2 \) are 1,536 bits and 232 bits, respectively. The operations in Eqs. (9) and (10) thus reduce to one Booth multiplication and two constant additions, which can be merged into a modified MAC (Fig. 4).
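The radix-4 Booth recoding behind the \( Q_j \cdot 4^j \) terms can be sketched on plain integers. The example below uses the textbook convention that the bit below the least significant bit is zero, which may differ from the boundary values used in Eq. (10), and it models only the arithmetic, not the merged MAC hardware. The function names and the 8-bit width are hypothetical.

```python
def booth_radix4_digits(c, n_bits):
    """Radix-4 Booth digits (each in -2..2) of an n_bits two's-complement integer."""
    if n_bits % 2 != 0:
        raise ValueError("n_bits must be even for radix-4 recoding")
    u = c & ((1 << n_bits) - 1)   # two's-complement bit pattern of c

    def bit(i):
        # Textbook convention: the bit below the LSB is taken as 0.
        return 0 if i < 0 else (u >> i) & 1

    return [-2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1)
            for j in range(n_bits // 2)]


def booth_multiply(c, x, n_bits):
    """Multiply as a sum of n_bits/2 partial products Q_j * x * 4^j."""
    digits = booth_radix4_digits(c, n_bits)
    return sum(q * x * (4 ** j) for j, q in enumerate(digits))


digits_of_127 = booth_radix4_digits(127, 8)
product = booth_multiply(-93, 57, 8)
```

Each digit selects one of 0, ±x or ±2x, so an n-bit multiplier needs only n/2 partial-product rows, which is what keeps the adder tree small.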

First, the binary phase \( X \) is fed to the Booth decoder circuit, and a partial product term is generated in each row of the MAC. The partial product terms are summed through a Carry-Save Adder (CSA) tree. Compared with a direct implementation of the second-order polynomial, the CSA tree avoids carry ripple in the early stages, so carry ripple occurs only at the final stage. Because of the EDGE spectral requirement, we target an SFDR above 80 dBc. Based on Matlab simulation, we set the truncated accumulated phase word-length to \( W = 15 \) bits and the amplitude word-length to \( P = 14 \) bits. These hardware parameters achieve an SFDR of 86 dBc. The other parameter is the word-length of the phase of the EDGE signal, which also introduces phase noise and spurs in the output spectrum and is discussed in Section 5. The proposed DDFS circuit is simulated with the NanoSim tool and compared with the state of the art in Table 1. The proposed DDFS achieves high SFDR performance, and its power efficiency is also superior to that of the other designs.

![SFDR comparison between LS, Taylor and Chebyshev.](image1.png)

![Architecture of Modified-MAC.](image2.png)

**Table 1. Comparison with the existing DDFS designs.**

<table>
<thead>
<tr>
<th>DDFS</th>
<th>CMOS tech. (um)</th>
<th>SFDR (dBc)</th>
<th>Latency</th>
<th>Power efficiency (mW/MHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Ours</td>
<td>0.18</td>
<td>86</td>
<td>5</td>
<td>0.15</td>
</tr>
<tr>
<td>Ref [7]</td>
<td>0.18</td>
<td>84</td>
<td>-</td>
<td>0.22</td>
</tr>
<tr>
<td>Ref [8]</td>
<td>0.25</td>
<td>90.3</td>
<td>13</td>
<td>0.66</td>
</tr>
<tr>
<td>Ref [9] (Taylor)</td>
<td>0.35</td>
<td>82.5</td>
<td>9</td>
<td>0.26</td>
</tr>
<tr>
<td>Ref [9] (Chebyshev)</td>
<td>0.35</td>
<td>73</td>
<td>7</td>
<td>0.35</td>
</tr>
<tr>
<td>Ref [10]</td>
<td>0.35</td>
<td>80</td>
<td>2</td>
<td>0.44</td>
</tr>
</tbody>
</table>

**5. SIMULATION RESULT**

For a Mobile Station (MS), the EVM-rms and EVM-peak requirements are below 9% and 30%, respectively. For a Base Transceiver Station (BTS), they are below 7% and 22%. The SFDR performance of the digital frequency synthesizer is adequate for the up-link and down-link spectral requirements. However, the phase signal word-length also contributes spurs and phase noise, and it affects the EVM and the signal spectrum. In this paper we simulate the effect of the finite phase word-length (J) against the EVM and spectral mask requirements. The performance is summarized in Table 2.

**Table 2. Simulation result and EVM measurement.**

<table>
<thead>
<tr>
<th>J-bits</th>
<th>EVM-rms</th>
<th>EVM-peak</th>
<th>Spectral requirement</th>
</tr>
</thead>
<tbody>
<tr>
<td>9-bits</td>
<td>0.028%</td>
<td>0.094%</td>
<td>No (Spurs at -66dBc)</td>
</tr>
<tr>
<td>10-bits</td>
<td>0.014%</td>
<td>0.046%</td>
<td>No (Spurs at -74dBc)</td>
</tr>
<tr>
<td>11-bits</td>
<td>0.007%</td>
<td>0.018%</td>
<td>No (Spurs at -79dBc)</td>
</tr>
</tbody>
</table>
From Table 2, the errors produced by phase quantization are very small for word-lengths of 9 bits or more, so the EVM error introduced by the entire digital phase modulator is negligible. However, the spectrum of the $s_{IF-PM}(t)$ signal does not stay entirely below the spectral mask. This is especially true for the BTS mask, which is more stringent than the MS mask, because the phase quantization error degrades the SFDR of the synthesizer. We therefore make the conservative choice of $J=12$ bits in our design. The signal spectrum with $J=12$ bits at a carrier frequency of 8 MHz is shown in Fig. 5. The digital phase-modulated signal generated by the DDFS meets the spectral requirements of both the BTS mask and the MS mask.
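The trend in the EVM columns of Table 2, roughly a halving for every added phase bit, can be reproduced with a much simpler model than the one used for the table. The sketch below quantizes the phase of unit-envelope samples to \( J \) bits and measures the resulting error vector. It omits EDGE pulse shaping, the measurement filter and the DDFS itself, so its absolute values are not comparable with Table 2. The test phases, sample count and function names are all assumptions made for illustration.

```python
import cmath
import math


def quantize_phase(theta, j_bits):
    """Round a phase in radians to the nearest of 2^J uniform levels."""
    step = 2.0 * math.pi / (1 << j_bits)
    return round(theta / step) * step


def evm_percent(ref, meas):
    """Return (EVM-rms, EVM-peak) in percent, normalized to the rms reference."""
    errors = [abs(m - r) for r, m in zip(ref, meas)]
    ref_rms = math.sqrt(sum(abs(r) ** 2 for r in ref) / len(ref))
    rms = math.sqrt(sum(e * e for e in errors) / len(errors)) / ref_rms
    peak = max(errors) / ref_rms
    return 100.0 * rms, 100.0 * peak


def phase_only_evm(j_bits, count=4096):
    """EVM caused by J-bit phase quantization on unit-envelope test samples."""
    golden = (math.sqrt(5.0) - 1.0) / 2.0   # spreads the test phases quasi-uniformly
    thetas = [2.0 * math.pi * ((k * golden) % 1.0) for k in range(count)]
    ref = [cmath.exp(1j * t) for t in thetas]
    meas = [cmath.exp(1j * quantize_phase(t, j_bits)) for t in thetas]
    return evm_percent(ref, meas)


rms_9, peak_9 = phase_only_evm(9)
rms_10, peak_10 = phase_only_evm(10)
```

In this simplified model each extra bit halves the quantization step, so the rms error vector shrinks by about a factor of two per bit.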

![Fig. 5. The spectrum of the EDGE signal through DSP engine.](image)

**6. IMPLEMENTATION RESULT**

The proposed DSP engine was implemented in the UMC 0.18 um CMOS process with 1P6M technology. The layout of the DSP engine is shown in Fig. 6. The circuit is summarized in Table 3.

![Fig. 6. Layout of the proposed DSP engine.](image)

**Table 3. Implementation summary of the DSP engine.**

<table>
<thead>
<tr>
<th>Technology</th>
<th>UMC 0.18 um 1P6M CMOS</th>
</tr>
</thead>
<tbody>
<tr>
<td>Voltage</td>
<td>1.8 V</td>
</tr>
<tr>
<td>Core layout area</td>
<td>0.51x0.51 mm$^2$</td>
</tr>
<tr>
<td>Chip layout area</td>
<td>1.114x1.114 mm$^2$</td>
</tr>
<tr>
<td>System clock frequency</td>
<td>26 MHz</td>
</tr>
</tbody>
</table>

**7. CONCLUSION**

In this paper, we propose a DSP engine for the polar transmitter. The engine is realized with the CORDIC and DDFS techniques. In the digital phase modulator we adopt the LS algorithm, and we apply the MAC technique in our DDFS architecture to reduce the hardware complexity and to limit the carry ripple problem of the direct polynomial implementation. The chip implementation in the UMC 0.18 um CMOS process with 1P6M technology is also presented.

**8. REFERENCES**


