# Power Amplifier (PA) design

Copyright© Dr. Osama Shana'a

# Power Amplifier

- Function
- The challenge of delivering high power
- Load pull
- Class A, AB, B, C, F, E
- Memory effects
- Efficiency enhancement techniques
- References

### Does a power amplifier really amplify power?



• A power amplifier converts DC power (from the power supply) to RF power, as efficiently as possible. This is quite different from "small-signal" blocks such as an LNA

# The challenge of delivering high-power in CMOS:



- From $P_{out} = V_{DD}^2/(2R_{load})$, to deliver +30dBm (1Watt) output power to the 50 $\Omega$ load, $V_{\text{DD}}$ needs to be 10V !!
- If a CMOS technology of, say, 55nm is used with an HCI reliability limit of only 1.3V, the device will break down and so cannot deliver this power. The max power that can reliably be delivered to such a load is (1.3)<sup>2</sup>/100 = 17mW (~12dBm)!
- $\rightarrow$  What to do?
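A quick numeric check of the two bullets above, as a minimal sketch assuming an ideal single-ended stage whose peak drain swing equals V_DD into a purely resistive load (knee voltage and matching loss ignored). The function names are illustrative only:

```python
import math

def dbm_to_watt(p_dbm):
    """Convert dBm to watts."""
    return 10 ** ((p_dbm - 30) / 10)

def watt_to_dbm(p_w):
    """Convert watts to dBm."""
    return 10 * math.log10(p_w) + 30

def vdd_required(p_out_w, r_load_ohm):
    """Peak swing (= V_DD for the ideal stage) needed for P_out = V^2 / (2 R_load)."""
    return math.sqrt(2 * r_load_ohm * p_out_w)

def p_max(vdd_v, r_load_ohm):
    """Largest sine-wave power into R_load when the peak swing is limited to V_DD."""
    return vdd_v ** 2 / (2 * r_load_ohm)

print(vdd_required(dbm_to_watt(30), 50))    # 10.0 V
p = p_max(1.3, 50)                          # V_DD capped by the 1.3 V HCI limit
print(f"{p * 1e3:.1f} mW = {watt_to_dbm(p):.1f} dBm")  # 16.9 mW = 12.3 dBm
```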

# Reliability of CMOS device and stress mechanisms: HCI



- Hot Carrier Injection (HCI) happens when the device is ON (channel is inverted) and a large horizontal electric field is present due to large Vds.
- Electron carriers reach high velocities and get injected into the oxide, shifting the device threshold voltage over time (it is a lifetime degradation issue)
- The issue is alleviated by reducing Vds/Vgs and/or increasing device channel length

# **Reliability of CMOS device and stress mechanisms: NCS**



- Non-Conductive Stress, also known as Punch Through, happens when the device is not conducting any current (off) while there is a large Vds across it
- The drain depletion region extends towards the source, increasing leakage current and hence lowering the device off-state Rds.
- The issue is alleviated by reducing Vds and/or increasing device channel length
- The NCS voltage rating is usually much higher than the HCI rating for a given technology
- NCS is not destructive unless large currents cause thermal or electromigration damage

### Reliability of CMOS device and stress mechanisms: Oxide BD



- Oxide breakdown (BD) happens when large Vgs or Vdg exceed the rating
- Oxide breakdown is a catastrophic damage (instant, cannot recover)
- There is usually no difference between AC and DC signals (the moment the voltage crosses the breakdown threshold, damage occurs)
- Oxide BD can only be alleviated by lowering voltage (or, of course, by using a thicker-oxide device)

### **Reliability of CMOS device and stress mechanisms: TDDB**



- Time-Dependent Dielectric Breakdown (TDDB) is a "wear-out" mechanism of the device oxide due to defects that degrade (or destroy) the device over time.
- It is a statistical process (two devices with the same terminal voltages may fail at different times)
- Defects in the oxide can be due to manufacturing or due to gate tunneling in thin oxide
- The TDDB rating is inversely proportional to device oxide area. It can only be alleviated by lowering voltage (or, of course, by using a thicker-oxide device)

#### Important issues about device reliability metrics: HCI, TDDB, NCS, ...



- The pass/fail voltage rating is done at the device level (for example when Vth shifts by 10mV or when Idss changes by 10%). These numbers have nothing to do with PA performance (in many cases PA is just fine if Vth shifts by 10mV)
- In some of these reliability mechanisms, there is a difference between AC and DC signals (the device can tolerate a larger AC swing than a constant DC voltage). The nature of the AC signal (CW, OFDM, etc.) is another factor (the device can tolerate an OFDM signal better than a CW tone with the same peak value).
- PA design is all about pushing these limits and designing a circuit that is reliable for the target application without being overly conservative (the foundry usually gives very conservative device reliability numbers, typically related to DC voltages)

→ Understanding device reliability and its impact on PA performance is KEY. PA burn-in in the lab is the best way to design PAs while pushing these limits

#### **SEM photos of actual PA failures:**

• Failure analysis (FA) can show you which device failed and which terminal (gate, drain) was damaged, so you can analyze the failure mechanism and adjust the PA design to fix the issue

# **R**<sub>load</sub> of PA to deliver high Pout:



- With V<sub>DD</sub> set to **1.3V** to match the max reliable V<sub>ds</sub> of the thin-gate 55nm device, we can calculate the R<sub>load</sub> needed to deliver **30dBm**, $R_{load} = V_{DD}^2/(2P_{out})$:
- →  $R_{load} = 0.85\Omega!$  This is a very small load resistance to realize practically with on-chip components (it still needs to be matched to the 50 $\Omega$ antenna)
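The same relation solved for the load, as a minimal sketch under the same ideal-swing assumption; the function name is illustrative only:

```python
def r_load_single_ended(vdd_v, p_out_w):
    """R_load = V_DD^2 / (2 P_out): load that delivers P_out at a peak swing of V_DD."""
    return vdd_v ** 2 / (2 * p_out_w)

print(round(r_load_single_ended(1.3, 1.0), 3))   # 0.845 ohm for 30 dBm at 1.3 V
```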

Let us find out why

### Matching PA load to $50\Omega$ :



- In our example, with  $R=0.85\Omega$ , the network Q is ~7.6.
- → This may not sound too bad. However, the issue is the finite Q of the on-chip matching network, as explained next

### Finite Q of on-chip matching passives:



- The finite Q of the matching components degrades the efficiency of the matching network (power loss)
- → For, say, 80% matching-network efficiency ( $\eta_{match}$ ), in our example (R = 0.85 $\Omega$ ) Qp needs to be around ~31. This is hard to realize with on-chip inductors
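A first-order way to reproduce these numbers, as a sketch assuming a single L-section from R up to 50Ω with transformation ratio r = 50/R, network Q = sqrt(r - 1), and loss only in the inductor of quality factor Qp, so that η_match ≈ 1/(1 + Q/Qp). This is a simplified model in the spirit of [5], not necessarily the exact curve used on the slides:

```python
import math

def lmatch_q(r_low_ohm, r_high_ohm=50.0):
    """Network Q of a single L-section matching r_low up to r_high."""
    r = r_high_ohm / r_low_ohm            # impedance transformation ratio
    return math.sqrt(r - 1)

def lmatch_efficiency(q_network, q_inductor):
    """Efficiency when only the inductor is lossy: eta = 1 / (1 + Q / Qp)."""
    return 1.0 / (1.0 + q_network / q_inductor)

def qp_for_efficiency(q_network, eta_target):
    """Inductor Q needed to reach a target matching efficiency."""
    return q_network / (1.0 / eta_target - 1.0)

q = lmatch_q(0.85)
print(round(q, 2))                           # 7.6
print(round(qp_for_efficiency(q, 0.8), 1))   # 30.4, i.e. the ~31 quoted above
```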

### Impact of finite Q of on-chip matching components:



- We need to operate within the region of realizable on-chip inductor Q (<13)
- → For matching-network efficiency η>70%, the only way to do this with on-chip matching is by reducing the transformation ratio ↓*r* (= 50Ω/*R*), which means increasing ↑*R*. For the same output power, this means we need to increase ↑swing without violating reliability limits!
- $\rightarrow$  How?

#### Increase swing without affecting reliability: use differential circuit



- For the same VDD and target Pout as before, the differential R can now be 4x larger (the differential swing doubles and power scales with swing squared); in our example, 4 x 0.85 = $3.4\Omega$. This brings r down to ~15.
- Although this is better than before and matching can be realized with on-chip components, the loss is still quite large and efficiency is <70% for inductor Q of 8 or 10
- → We still need to reduce r further. The only thing left is to increase VDD without affecting reliability.

How?

#### Increase VDD using thick-gate cascode device:



- A thick-gate cascode device can be used to be able to increase V<sub>DD</sub>. The cascode device can handle the large output swing while protecting the bottom CS thin-gate device which provides the PA gm and gain.
- → Using an I/O device (thick-gate, L=0.28µm) in 55nm, V<sub>DD</sub> can be increased to 3.3V. For the differential topology and the same Pout, this brings *R* to ~22Ω, which in turn brings *r* to ~2.3. Now the matching network can comfortably be realized with on-chip inductors of Q=8, with efficiency close to 90%.
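Putting the three design steps side by side, as a sketch under the same simplified L-section, inductor-loss-only model (ideal swing of V_DD per side, 1W output, 50Ω antenna; Qp = 8 is just the example value from the text):

```python
import math

def r_opt(vdd_v, p_out_w, differential):
    """Load giving P_out at a peak swing of V_DD per side (differential R if requested)."""
    v_peak = 2 * vdd_v if differential else vdd_v   # antiphase halves double the swing
    return v_peak ** 2 / (2 * p_out_w)

def match_eff(r_pa_ohm, q_inductor, r_ant_ohm=50.0):
    """Return (r, eta): r = R_ant / R, eta = 1 / (1 + sqrt(r - 1) / Qp)."""
    r = r_ant_ohm / r_pa_ohm
    return r, 1.0 / (1.0 + math.sqrt(r - 1) / q_inductor)

designs = [("single-ended, 1.3 V", 1.3, False),
           ("differential, 1.3 V", 1.3, True),
           ("differential cascode, 3.3 V", 3.3, True)]
for name, vdd, diff in designs:
    r_pa = r_opt(vdd, 1.0, diff)                 # 30 dBm target
    r, eta = match_eff(r_pa, q_inductor=8)
    print(f"{name:28s} R = {r_pa:5.2f} ohm, r = {r:4.1f}, eta(Qp=8) = {eta:.2f}")
```

The single-ended case lands near 50%, the differential 1.3V case just under 70%, and the 3.3V cascode case near 88%, following the trend described above.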


#### Increase VDD using "special" High-voltage cascode device:



- Drain extension using a lightly doped n-well region increases the punch-through Vds to 7~10V. The gate oxide also does not overlap the drain n+ region, improving the Vgd oxide breakdown voltage.
- The device usually comes for "free!" (no extra mask is needed)
- The f<sub>T</sub> of such a device is decent (25~35GHz in 55nm technology). Its large drain parasitic cap can impact PA distortion (AM-PM)
- The "knee voltage" of the device may be a bit larger than that of a normal I/O device

• $V_{DD}$ in many cases is not a design option; it is usually set by the application (for example, a 5V USB supply). To make V<sub>DD</sub> a design parameter, you need to build a buck-boost DC-DC converter just for the PA, which may not be cost-effective in some applications. A cheaper way is to build an LDO, but this is not good for efficiency. In ultra-low-cost applications, the PA needs to interface directly with the supply

# Matching using balun:



- For a given *R* that the PA should see, the balun turns ratio N can be calculated for output matching. For example, if R=12Ω, a 1:2 balun can be used to match this PA impedance to 50Ω (the impedance transforms by N² = 4).
- Large N results in lower balun Q (multiple turns), which increases balun loss and so reduces PA efficiency
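For an ideal (lossless, perfectly coupled) transformer the impedance scales as N², so the required turns ratio is N = sqrt(50Ω/R). A minimal sketch under that ideal-transformer assumption:

```python
import math

def balun_turns_ratio(r_pa_ohm, r_ant_ohm=50.0):
    """Ideal transformer: impedance scales as N^2, so N = sqrt(R_ant / R_PA)."""
    return math.sqrt(r_ant_ohm / r_pa_ohm)

print(round(balun_turns_ratio(12.0), 2))   # 2.04 -> a 1:2 balun
```

Real baluns have finite coupling and winding loss, which is why a large N hurts efficiency.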

• A single-ended PA is quite sensitive to ground inductance. Low ground inductance is hard to realize on-chip given the finite package inductance. This inductance degrades PA gain and power-added efficiency. A differential PA does not have this issue, since its common node is a virtual ground for the differential signal.

• Adding a cascode device increases the "knee voltage" of the PA. Using an HV-MOS cascode increases the knee voltage even further. This reduces PA swing, Pout and efficiency

### What if there is no HVMOS device in a process? Power combining:

$$V_{out} = nV_{unit}, \quad I_{out} = I_{unit} \;\rightarrow\; P_{out} = nP_{unit}$$

- When there is no high-voltage device to use as a cascode (for example, in a FinFET process), power combining is perhaps your best choice of PA architecture
- Power combining uses smaller PAs (each delivering relatively small power within its reliability capability) and combines them using a passive combiner
- As the number of PAs increases, the power combiner design gets more complicated and its loss increases.
   Therefore, there is an optimum number of PAs in a given process.
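A sketch of how many units an ideal, lossless series combiner needs, using the relations above (V_out = nV_unit, I_out = I_unit, so each unit sees R_L/n and delivers P_out/n). It assumes differential units whose peak differential swing is limited to 2V_DD (knee voltage ignored); combiner loss, which grows with n, is deliberately left out:

```python
import math

def units_needed(p_out_w, r_load_ohm, vdd_v, n_max=64):
    """Smallest n whose per-unit differential swing fits in 2*V_DD (ideal series combiner)."""
    for n in range(1, n_max + 1):
        r_unit = r_load_ohm / n                  # V_out = n*V_unit, I_out = I_unit
        p_unit = p_out_w / n                     # P_out = n*P_unit
        v_unit = math.sqrt(2 * r_unit * p_unit)  # peak differential amplitude per unit
        if v_unit <= 2 * vdd_v:
            return n, r_unit, v_unit
    return None  # no feasible n within n_max

print(units_needed(1.0, 50.0, 1.3))   # (4, 12.5, 2.5)
```

For 1W into 50Ω with 1.3V units, this gives four units, each seeing 12.5Ω with a 2.5V differential swing.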


# **Important metrics for PA:**



• Drain efficiency, $\eta_{\text{drain}}$, is the efficiency at the drain of the PA: the power delivered by the PA drain node relative to the DC power from the supply. It does not include any matching-network losses. This metric tells you how good your PA core is
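Drain efficiency can be contrasted with the overall efficiency (P_out/P_DC) and with power-added efficiency, PAE = (P_out - P_in)/P_DC, which also charges the PA for its RF drive. A minimal sketch with hypothetical numbers:

```python
def drain_efficiency(p_drain_w, p_dc_w):
    """PA-core efficiency: RF power at the drain node / DC supply power (no matching loss)."""
    return p_drain_w / p_dc_w

def power_added_efficiency(p_out_w, p_in_w, p_dc_w):
    """PAE: RF power added by the PA (output minus input) / DC supply power."""
    return (p_out_w - p_in_w) / p_dc_w

# Hypothetical operating point: 1.25 W at the drain, 80% matching efficiency,
# 2.0 W drawn from the supply, 50 mW of RF drive.
p_dc, p_drain, eta_match, p_in = 2.0, 1.25, 0.8, 0.05
p_out = p_drain * eta_match                        # ~1.0 W delivered to the antenna
print(drain_efficiency(p_drain, p_dc))             # 0.625
print(power_added_efficiency(p_out, p_in, p_dc))   # ~0.475
```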

### PA distortion: Static AM-AM and AM-PM



- AM-AM distortion is mainly due to 3<sup>rd</sup>-order distortion of the PA, as discussed in a previous lecture (and sometimes higher odd-order distortion such as 5<sup>th</sup>).
   AM-PM distortion is due to nonlinear capacitors in the PA circuit
- Plots are generated by sweeping the input CW tone amplitude and recording the gain and phase of the output vs Pout
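The static AM-AM/AM-PM sweep can be mimicked with a memoryless third-order model y = a1*x + a3*x³: for a CW input A·cos(ωt), the fundamental gain is a1 + (3/4)·a3·A². The coefficients below are hypothetical, and making a3 complex is only a stand-in for the phase shift that nonlinear capacitances introduce:

```python
import cmath
import math

def fundamental_gain(amp, a1=10.0, a3=-2.0 + 0.5j):
    """Complex fundamental gain of y = a1*x + a3*x^3 for x = amp*cos(wt).

    a1 and a3 are hypothetical; Re(a3) < 0 gives gain compression (AM-AM) and
    Im(a3) != 0 stands in for nonlinear-capacitance phase shift (AM-PM).
    """
    return a1 + 0.75 * a3 * amp ** 2

for amp in (0.1, 0.5, 1.0, 1.5):     # swept CW input amplitude
    g = fundamental_gain(amp)
    pout_db = 20 * math.log10(abs(g) * amp)   # relative output level
    print(f"Pout {pout_db:6.2f} dB | gain {20 * math.log10(abs(g)):6.2f} dB"
          f" | phase {math.degrees(cmath.phase(g)):5.2f} deg")
```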

# PA distortion: dynamic, memory effects



- A memory effect is when the PA output at a given time depends not only on the instantaneous input but also on inputs that happened earlier
- The problem gets worse with wider signal bandwidth
- The "thicker" the AM-AM/PM traces are, the worse the memory effects

# PA distortion: dynamic, memory effects' signature



• It is not the asymmetric IM3 itself, but the fact that the asymmetry <u>changes</u> as a function of tone spacing (signal bandwidth)

# PA distortion: memory effects, cause



- Memory effects are caused by the impedance at any node of the PA being frequency dependent (not purely resistive; it has a time constant or delay) over the signal envelope bandwidth (tone-spacing baseband frequency)
- Similar behavior occurs if the impedance phase changes over the HD2 band
- This results in unbalanced upper and lower IM3 that become tone-spacing (bandwidth) dependent
- We need to address <u>both</u> the source of memory effects and the transfer function by which memory effects get in-band

### Memory effects: IM3 asymmetry due to RF HD2



• Terminating the second-order harmonics helps mitigate this mixing effect, but the impedance should be low and flat (resistive) over the signal BW


### Memory effects: IM3 asymmetry due to envelope



• Designing bias circuits with a low, resistive output impedance and a bandwidth that is high compared to the signal envelope helps mitigate the difference this mixing effect makes between the upper and lower sides

# Memory effects: IM3 asymmetry due to envelope

Let us represent the two tone currents as:

$$i_1 = \cos(\omega_1 t)\,; \qquad i_2 = \cos(\omega_2 t)$$

The low-frequency current component due to second-order distortion comes from the cross term $2i_1i_2 = \cos(\Delta\omega t) + \cos[(\omega_1+\omega_2)t]$, so keeping only the component at $\Delta\omega$:

$$\alpha_2(i_1+i_2)^2\Big|_{\Delta\omega} = \alpha_2\cos(\Delta\omega t)$$

where $\Delta\omega = \omega_2 - \omega_1$.

If the impedance at $\Delta\omega$ has an imaginary part, it can be represented as $|Z|e^{j\theta}$.

Therefore, the voltage at such a node is $I \times Z$, and so can be represented as:

$$v_{IM2} = \alpha_2 |Z| \cos(\Delta\omega t + \theta)$$

When $v_{IM2}$ mixes with the two tones at that node due to 3<sup>rd</sup>-order nonlinearity, keeping only the products at $2\omega_1-\omega_2$ and $2\omega_2-\omega_1$, it results in:

$$IM3_L = \alpha_3[v_{IM2} \times \cos(\omega_1 t)] = \alpha_2\alpha_3\frac{|Z|}{2}\cos[(2\omega_1 - \omega_2)t - \theta]$$

$$IM3_H = \alpha_3[v_{IM2} \times \cos(\omega_2 t)] = \alpha_2\alpha_3\frac{|Z|}{2}\cos[(2\omega_2 - \omega_1)t + \theta]$$

• It can be seen that the phase $\theta$ of the termination impedance at $\Delta\omega$, due to its imaginary part, shifts $IM3_L$ and $IM3_H$ in opposite directions, so the two differ: one adds constructively and the other destructively to the PA's own IM3

UC Berkeley, EECS 290C

# Memory effects: IM3 asymmetry due to envelope, cont

From the previous slide:

$$IM3_L = \alpha_3[v_{IM2} \times \cos(\omega_1 t)] = \alpha_2\alpha_3\frac{|Z|}{2}\cos[(2\omega_1 - \omega_2)t - \theta]$$

$$IM3_H = \alpha_3[v_{IM2} \times \cos(\omega_2 t)] = \alpha_2\alpha_3\frac{|Z|}{2}\cos[(2\omega_2 - \omega_1)t + \theta]$$

- Therefore, to reduce memory effects and the asymmetry between $IM3_L$ and $IM3_H$ as a function of $\Delta\omega$, you need to do the following:
  1. Reduce $\theta$ to zero (the termination impedance at the envelope frequency $\Delta\omega$ should be real, i.e. resistive, with no imaginary part)
  2. If $\theta$ cannot be reduced to zero, reduce the termination impedance at the envelope frequency, $|Z|$, to a very small value (for example, use an inductor to bias the PA, which provides low impedance at envelope frequencies)
  3. Reduce the transfer-function parameters $\alpha_2\alpha_3$ (the PA nonlinearity at each node) by which memory effects make it to RF.
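A phasor sketch of the three remedies above. Assumptions (not from the slides): the envelope termination is a hypothetical parallel RC bias node, Z(Δω) = R/(1 + jΔωRC); the envelope-mixed term is k·|Z|, with k lumping α2α3 and the numeric factor of the derivation; and the PA's own IM3 is a hypothetical phasor with some phase φ. In this simplified model the ±θ shift produces a magnitude difference only when φ ≠ 0, and the difference grows with tone spacing as θ grows, which is the memory signature described earlier:

```python
import cmath
import math

def envelope_impedance(delta_f_hz, r_ohm, c_farad):
    """Hypothetical bias-node termination at the envelope frequency: R in parallel with C."""
    w = 2 * math.pi * delta_f_hz
    return r_ohm / (1 + 1j * w * r_ohm * c_farad)

def im3_magnitudes(delta_f_hz, r_ohm, c_farad, k=0.05,
                   own_im3=0.5 * cmath.exp(1j * math.radians(30))):
    """Return (|IM3_L|, |IM3_H|) for the PA's own IM3 plus the envelope-mixed term.

    k lumps alpha2*alpha3 and the numeric factor (assumption);
    own_im3 is a hypothetical phasor for the PA's own IM3.
    """
    z = envelope_impedance(delta_f_hz, r_ohm, c_farad)
    mem = k * abs(z)                                  # envelope-mixed term magnitude
    theta = cmath.phase(z)                            # termination phase at delta_f
    im3_l = own_im3 + mem * cmath.exp(-1j * theta)    # lower IM3 sees -theta
    im3_h = own_im3 + mem * cmath.exp(1j * theta)     # upper IM3 sees +theta
    return abs(im3_l), abs(im3_h)

for df in (1e6, 5e6, 20e6, 50e6):                     # tone-spacing sweep
    low, high = im3_magnitudes(df, r_ohm=10.0, c_farad=1e-9)
    print(f"spacing {df / 1e6:5.1f} MHz: IM3_L {20 * math.log10(low):6.2f} dB,"
          f" IM3_H {20 * math.log10(high):6.2f} dB")
```

Setting `c_farad=0` (resistive termination, remedy 1) or shrinking `r_ohm` (low |Z|, remedy 2) collapses the asymmetry; scaling `k` down corresponds to remedy 3.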


# **Choosing the optimum PA load: load-pull**



- Load-pull is sweeping the PA drain load impedance (green dots on the Smith chart above) and plotting contours of constant performance parameters
- w, x, y, z above can be Pout, PAE, gain, AM-AM/PM, Vswing, etc.
- A multi-dimensional sweep adds a Vgs DC bias sweep, device size sweep, Rs sweep, etc.
- → Select the optimum load that meets all requirements


# Load pull: what it looks like, example







A simulator like Spectre or ADS is used to sweep circuit parameters (Vgs bias, device size, Pin, etc.) and plot contours of Pout, efficiency, output swing, Psat, AM-AM/PM, etc.
A MATLAB script is used to discard load impedances that do not meet the PA specs (for example, swing exceeds the reliability limit, efficiency is too low, Psat is too low, etc.).

• The optimum impedance is then selected
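The filtering and selection step can be sketched in Python instead of MATLAB. The data points, field names and spec limits below are hypothetical placeholders for a real load-pull sweep exported from the simulator:

```python
from dataclasses import dataclass

@dataclass
class LoadPullPoint:
    z_load: complex    # drain load impedance, ohm
    pout_dbm: float
    pae: float         # fraction, 0..1
    vswing_v: float    # peak drain voltage, V

def select_optimum(points, vswing_max_v, pae_min, pout_min_dbm):
    """Drop loads that violate any spec, then return the highest-Pout survivor (or None)."""
    ok = [p for p in points
          if p.vswing_v <= vswing_max_v and p.pae >= pae_min and p.pout_dbm >= pout_min_dbm]
    return max(ok, key=lambda p: (p.pout_dbm, p.pae)) if ok else None

# Hypothetical sweep results (placeholders, not simulation data)
points = [LoadPullPoint(20 + 5j, 29.5, 0.42, 6.8),
          LoadPullPoint(15 + 8j, 30.5, 0.38, 7.9),   # swing too high
          LoadPullPoint(25 + 0j, 28.7, 0.47, 6.2),
          LoadPullPoint(10 + 3j, 31.0, 0.30, 6.5)]   # PAE too low
print(select_optimum(points, vswing_max_v=7.0, pae_min=0.35, pout_min_dbm=28.0))
```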


### PA DC bias sets its class: Class A, B, AB, C



- Vbias sets the class of PA.
- Class A is the most linear but least efficient. Class C is the most efficient of the four modes above but the most nonlinear


### PA DC bias sets its class: Class A, B, AB, C



- more overlap between device voltage and current results in lower efficiency
- A more efficient PA (class C) is generally more nonlinear and provides lower gain
- Power dissipated in the device generates heat (thermal)
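The efficiency trend across classes A, AB, B and C can be reproduced with the classic reduced-conduction-angle analysis, as a sketch assuming an ideal device with zero knee voltage, linear transconductance above threshold, a full V_DD swing and harmonics shorted at the drain: η(α) = (α - sin α)/(4 sin(α/2) - 2α cos(α/2)), with α the conduction angle:

```python
import math

def conduction_angle_efficiency(alpha_rad):
    """Ideal drain efficiency vs conduction angle (reduced-conduction-angle model)."""
    num = alpha_rad - math.sin(alpha_rad)
    den = 4 * math.sin(alpha_rad / 2) - 2 * alpha_rad * math.cos(alpha_rad / 2)
    return num / den

for name, deg in [("A", 360), ("AB", 270), ("B", 180), ("C", 120)]:
    eta = conduction_angle_efficiency(math.radians(deg))
    print(f"class {name:2s} ({deg:3d} deg): eta = {eta:.3f}")
```

Class A gives 50% and class B gives π/4 (~78.5%); shrinking the conduction angle further (class C) raises efficiency at the cost of gain and linearity.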


### PA DC bias sets its class: Class E, F




## PA In-band distortion: IM3, IM5, IM7 interaction

Two tones: S1 at f1, S2 at f2

3rd-order distortion: 2f1 - f2, 2f2 - f1

5th-order distortion: (3f1 - 2f2), (3f2 - 2f1), (2f1 - f2), (2f2 - f1)

7th-order distortion: (4f1 - 3f2), (4f2 - 3f1), (3f1 - 2f2), (3f2 - 2f1), (2f1 - f2), (2f2 - f1)



• At small Pin/Pout, IMn has a slope of n (dB/dB) vs Pin

At large Pin/Pout, 3<sup>rd</sup>- and 5<sup>th</sup>- (7<sup>th</sup>-, etc.) order distortion generate components at exactly the same frequencies but with different phases. In some cases, the phases may result in partial cancellation, which shows up as an improvement in IM3 at high Pout for some PAs.
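The frequency lists above follow one pattern: an nth-order (odd) nonlinearity produces the in-band products (j+1)f1 - j·f2 and (j+1)f2 - j·f1 for j = 1 ... (n-1)/2, which is why higher orders also land on the IM3 frequencies. A small sketch; the tone frequencies are hypothetical:

```python
def inband_products(order):
    """Coefficient pairs (k1, k2), f = k1*f1 + k2*f2, of odd-order products near f1 and f2."""
    if order < 3 or order % 2 == 0:
        raise ValueError("order must be an odd integer >= 3")
    pairs = []
    for j in range(1, (order - 1) // 2 + 1):
        pairs.append((j + 1, -j))    # (j+1)*f1 - j*f2, below f1
        pairs.append((-j, j + 1))    # (j+1)*f2 - j*f1, above f2
    return pairs

f1, f2 = 2400, 2401   # hypothetical tones in MHz (1 MHz spacing)
for n in (3, 5, 7):
    print(n, sorted(k1 * f1 + k2 * f2 for k1, k2 in inband_products(n)))
# 3 [2399, 2402]
# 5 [2398, 2399, 2402, 2403]
# 7 [2397, 2398, 2399, 2402, 2403, 2404]
```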


## **Two-stage PA:**



• The PA stage (sometimes called the power stage) that delivers power to the load may not have sufficient power gain (especially if it is class C or class AB)

- As a result, a PA-driver stage is added to boost the overall PA gain
- A matching network (usually a balun) is used to transform up the low input impedance of the large PA power stage seen by the PA-driver stage
- The overall PA efficiency must include the PA-driver stage
- A typical PA power gain is 25~30dB

• The driver stage is usually made very linear (class A) to avoid feeding distorted signals into the power stage, where they would generate higher-order distortion
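A sketch of why the driver must be counted, with hypothetical numbers (15dB per stage for 30dB total, a power stage drawing 2W from the supply, and a class-A driver at an assumed 15% drain efficiency):

```python
def db_to_lin(db):
    """Power ratio in dB to linear."""
    return 10 ** (db / 10)

def two_stage_pae(p_out_w, g_pa_db, g_drv_db, pdc_pa_w, eta_drv):
    """Overall PAE of driver + power stage; driver DC power follows from its drain efficiency."""
    p_mid_w = p_out_w / db_to_lin(g_pa_db)     # RF power the driver must deliver
    p_in_w = p_mid_w / db_to_lin(g_drv_db)     # RF input to the whole PA
    pdc_drv_w = p_mid_w / eta_drv              # driver DC power
    return (p_out_w - p_in_w) / (pdc_pa_w + pdc_drv_w)

# Hypothetical: 1 W out, 15 dB + 15 dB gain, power stage draws 2 W, class-A driver at 15%
pa_stage_only = (1.0 - 1.0 / db_to_lin(15)) / 2.0
print(f"power stage alone: {pa_stage_only:.3f}")                          # 0.484
print(f"with driver:       {two_stage_pae(1.0, 15, 15, 2.0, 0.15):.3f}")  # 0.452
```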

# Efficiency degradation due to PAPR:



• As we discussed in previous lectures, PA efficiency degrades quite rapidly vs backoff from Psat

• It would be "nice" if the PA did not need to back off and could still support large PAPR  $\rightarrow$  how?

# Efficiency enhancer: PAPR (crest factor) reduction



• If PAPR is reduced, less backoff is needed  $\rightarrow$  efficiency improves

• PAPR reduction degrades in-band EVM. Therefore, for it to work, the receiver needs to apply the inverse reduction function to recover the signal. Not all standards support this → apply PAPR reduction only to relaxed-EVM modulations, for example MCS0 in WiFi.
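How much a lower PAPR buys can be estimated from the ideal CW efficiency at a backoff equal to the PAPR. This is a sketch assuming ideal class A (η = 50% × P_out/P_sat, since P_DC is constant) and ideal class B (η = π/4 × V_out/V_max); it ignores knee voltage and the averaging over the real envelope distribution:

```python
import math

def eta_class_a(backoff_db):
    """Ideal class A: eta = 50% * (Pout/Psat), because P_DC is constant."""
    return 0.5 * 10 ** (-backoff_db / 10)

def eta_class_b(backoff_db):
    """Ideal class B: eta = (pi/4) * (Vout/Vmax), i.e. it falls as the square root of power."""
    return (math.pi / 4) * 10 ** (-backoff_db / 20)

for papr_db in (10, 7):   # e.g. before and after a hypothetical 3 dB PAPR reduction
    print(papr_db, round(eta_class_a(papr_db), 3), round(eta_class_b(papr_db), 3))
```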

## Efficiency enhancer: active load modulation, Doherty PA



- A Doherty PA has two power amplifiers: Main, to deliver the average power, and Aux, to deliver the peak power. They are biased at different points. The Aux PA starts to turn on once Pout crosses the average power level
- The impedance inverter results in dynamic load modulation. As a result, two peaks show up in the efficiency plot
- The problem with the Doherty PA is its nonlinearity, especially strong memory effects

#### **Concept of Doherty PA:**




#### **Efficiency enhancement: Doherty PA**



- At low power, only the Main PA is on (half the size of the equivalent class-AB PA). When power reaches Pave (PAPR below Psat), the Main PA reaches Psat-3dB, the Aux PA turns on, and efficiency reaches its first peak. As Pout increases, Aux turns on more and efficiency reaches its second peak at Psat.
- Doherty improves efficiency over a class-AB PA by almost 2x for the output stage (ignoring the driver)
- As seen from AM-AM/PM, the Doherty PA has severe distortion, so DPD is a must (memory DPD most of the time)
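The two efficiency peaks can be reproduced with the textbook ideal symmetric Doherty, as a sketch assuming class-B Main and Aux devices, zero knee voltage and a lossless impedance inverter, with v = V_out/V_out,max. The comparison here is against ideal class B rather than the class-AB baseline mentioned above:

```python
import math

def eta_b(v):
    """Ideal class-B efficiency at normalized output amplitude v (0..1)."""
    return (math.pi / 4) * v

def eta_doherty(v):
    """Ideal symmetric Doherty (class-B main and aux, zero knee), v = Vout / Vout_max."""
    if v <= 0.5:
        return (math.pi / 4) * 2 * v               # main alone sees 2*Ropt, saturates at v = 0.5
    return (math.pi / 2) * v ** 2 / (3 * v - 1)    # aux conducting, load modulation

for backoff_db in (0, 3, 6, 9, 12):
    v = 10 ** (-backoff_db / 20)
    print(f"{backoff_db:2d} dB backoff: class B {eta_b(v):.2f}, Doherty {eta_doherty(v):.2f}")
```

At 6dB backoff the ideal Doherty sits at its first peak (~78%) while class B is at ~39%, the ~2x quoted above; between the peaks efficiency dips to ~70% around v = 2/3.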


#### Example Doherty PA: WiFi 11ac Front-end Module (FEM)



• Two-stage impedance inverter for 1GHz RF bandwidth (WiFi 6GHz 11ac/ax)

• The input coupler/power splitter introduces a 90-degree phase shift. The impedance inverter also introduces a 90-degree phase shift, so the Main and Aux currents combine in phase

• When the Main and Aux PAs are the same size, the PA is called a "symmetric Doherty PA". At Psat, Main and Aux deliver equal power to the load. If Aux is to deliver most of the PAPR power, the Aux PA has to be larger than the Main PA (asymmetric design)

# Efficiency enhancement: Envelope tracking (ET)



## **Fixed Supply**

- Simple, but poor efficiency
- Efficiency is a strong function of waveform PAPR.

## Average Power Tracking (APT)

- Per-slot tracking based on TX power level
- Improves efficiency at low power.
- Doesn't help at high power.

## **Envelope Tracking (ET)**

- Dynamic, high speed supply tracking of signal amplitude
- Improves efficiency
- Waveform-independent
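A rough comparison of a fixed supply with ideal ET, as a sketch under loud assumptions: an ideal class-B PA, a hypothetical OFDM-like envelope (magnitudes of complex Gaussian samples), a fixed supply sized for the envelope peak, and an ET modulator of assumed efficiency that keeps the PA at its peak efficiency at every instant:

```python
import math
import random

def envelope_samples(n=20000, seed=1):
    """Hypothetical OFDM-like envelope: magnitudes of complex Gaussian samples."""
    rng = random.Random(seed)
    return [abs(complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))) for _ in range(n)]

def papr_db(env):
    """Peak-to-average power ratio of an envelope, in dB."""
    mean_power = sum(e * e for e in env) / len(env)
    return 10 * math.log10(max(env) ** 2 / mean_power)

def avg_eff_fixed_supply(env):
    """Ideal class B, fixed V_DD sized for the envelope peak: eta_inst = (pi/4) * x."""
    peak = max(env)
    x = [e / peak for e in env]                  # normalized drain swing, 0..1
    p_rf = sum(xi * xi for xi in x)              # RF power ~ x^2 (both sums share one scale)
    p_dc = sum((4 / math.pi) * xi for xi in x)   # DC power ~ (4/pi) * x at fixed supply
    return p_rf / p_dc

def avg_eff_ideal_et(eta_modulator, eta_pa_peak=math.pi / 4):
    """Ideal ET: PA always at peak efficiency; the supply modulator adds its own loss."""
    return eta_pa_peak * eta_modulator

env = envelope_samples()
print(f"PAPR {papr_db(env):.1f} dB")
print(f"fixed supply {avg_eff_fixed_supply(env):.2f}")
print(f"ideal ET (80% modulator) {avg_eff_ideal_et(0.8):.2f}")
```

With these idealized assumptions ET comes out well ahead for a ~10dB PAPR waveform; practical limits such as envelope bandwidth, alignment and modulator overhead eat into this gain.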


#### Efficiency enhancement: continuous ET scheme



- LF path in parallel with HF path to track envelope.
  - LF path: high-efficiency to provide DC current (APT)
  - HF path: linear stage to provide AC current (ET)


#### Efficiency enhancement: continuous ET scheme



- Amplifier distortion needs to be low over the envelope bandwidth (3~5x the signal BW)  $\rightarrow$  needs current
- System efficiency drops fast as signal BW increases
- Requires stringent alignment between the envelope signal (VPA) and the PA input (<1ns range)  $\rightarrow$  extensive calibration needed
- Does NOT work well for high-BW (>200MHz) and low-EVM (-33dBc) systems
- Overhead is large and efficiency improvement is low for low-PAPR systems (<5dB) like 3G cellular (overall system efficiency improvement is <5%, too small to justify the cost)

• You need one of these for each PA (dual-band, MIMO, etc.)


# Efficiency enhancement: Discrete ET scheme



Philip Godoy, et al., JSSC 2012


#### **References:**

[1] Yuen Hui Chee, Fatih Golcuk, Toru Matsuura, Christopher Beale, James F. Wang, and Osama Shanaa, "A Digitally Assisted CMOS WiFi 802.11ac/11ax Front-End Module Achieving 12% PA Efficiency at 20dBm Output Power with 160MHz 256QAM OFDM Signal," *ISSCC Dig. Tech. Papers*, pp. 292-293, Feb. 2017

[2] C.P. Huang, et al., "A Highly Integrated Single Chip 5-6 GHz Front-end IC Based on SiGe BiCMOS that enhances 802.11ac WLAN Radio Front-End Designs," *RFIC Dig. Tech. Papers*, pp. 227-230, May 2015

[3] Alihossein Sepahvand, Parisa Momenroodaki, Yuanzhe Zhang, Zoya Popović, and Dragan Maksimović, "Monolithic Multilevel GaN Converter for Envelope Tracking in RF Power Amplifiers," *IEEE Energy Conversion Congress and Exposition*, pp. 1-7, Sept. 2016

[4] Philip A. Godoy, SungWon Chung, Taylor W. Barton, David J. Perreault, and Joel L. Dawson, "A 2.4-GHz, 27-dBm Asymmetric Multilevel Outphasing Power Amplifier in 65-nm CMOS," *IEEE Journal of Solid-State Circuits*, Vol. 47, No. 10, pp. 2372-2384, Oct. 2012

[5] Yehui Han and David J. Perreault, "Analysis and Design of High Efficiency Matching Networks," *IEEE Transactions on Power Electronics*, Vol. 21, No. 5, pp. 1484-1491, Sept. 2006

[6] Domine Leenaerts, J. van der Tang, and Cicero S. Vaucher, *Circuit Design for RF Transceivers*, Springer, 2007

[7] S.C. Cripps, *Advanced Techniques in RF Power Amplifier Design*, Norwood, Artech House, 2002

[8] Joel H. K. Vuolevi, et al., "Measurement Technique for Characterizing Memory Effects in RF Power Amplifiers," *IEEE TMTT*, Vol. 49, No. 8, pp. 1383-1389, Aug. 2001

[9] Joao Paulo Martins, et al., "A Metric for the Quantification of Memory Effects in Power Amplifiers," *IEEE TMTT*, Vol. 54, No. 12, pp. 4432-4439, Dec. 2006

[10] N. Wongkomet, L. Tee, and P. R. Gray, "A +31.5 dBm CMOS RF Doherty Power Amplifier for Wireless Communications," *IEEE JSSC*, Vol. 41, pp. 2852-2859, Dec. 2006