Abstract
To improve bandwidth efficiency and error performance, a new training scheme is proposed for bit-interleaved coded modulation in multiple-input multiple-output (BICM-MIMO) systems. Typically, in a block-fading channel, the training overhead needed to obtain channel knowledge is proportional to the square of the number of transmit antennas. However, this overhead can be reduced by embedding pilot symbols within data symbols before precoding. The values, positions, and number of pilot symbols are found by minimizing the Cramer-Rao bound on the channel estimation error. Computer simulations demonstrate the advantage of the proposed scheme over other training methods, in terms of both the mean-square error of the channel estimation and the system's frame-error rate.
Keywords:
BICM-MIMO; block fading; channel estimation; training design; pilot symbols; Cramer-Rao bound; iterative receiver
1 Introduction
The pioneering work on multiple-input multiple-output (MIMO) systems [1] shows that a MIMO system can provide a multiplexing gain and accordingly high spectral efficiency over slow fading channels. On the other hand, to achieve a high diversity order, space-time transmission techniques can be implemented at the transmitter [2,3]. To achieve both high diversity order and coding gain in coded modulation systems, the concept of space-time transmission has also been applied [4,5]. In such systems, space-time transmission is typically implemented using a linear space-time matrix, or equivalently a linear precoder, so that a single modulation symbol is efficiently transmitted across multiple transmit antennas. Among the many works on precoder design for coded modulation systems with multiple antennas, designs that consider all the relevant components of the transmitter, namely the precoder, the modulator, and the interleaver, can be found in [5-7]. Specifically, a full-rate precoder of any size and for any number of transmit antennas is designed in [6] to maximize the achievable diversity order and coding gain in MIMO block-fading channels.
It is shown in [6] that the maximum achievable diversity order can be realized by an iterative receiver that employs a soft-input soft-output detector [5], under the assumption of perfect channel state information (CSI) at the receiver. In practice, however, the CSI has to be estimated by a channel estimator and is never perfect. Two types of channel estimators have been used for MIMO block-fading channels in coded modulation systems, namely training-based and semi-blind channel estimators [8,9]. In both types, known signals are used to estimate the CSI at the first iteration of the iterative receiver.
Conventionally, for block-fading channels, known signals (the training sequence) are included at the beginning of each data block, which is called the time-multiplexed training or pilot symbol-assisted modulation (PSAM) scheme [10]. This scheme, however, reduces the bandwidth efficiency of MIMO systems, since the amount of training overhead needed to ensure the identifiability of the MIMO channel is at least the square of the number of transmit antennas [11]. A straightforward application of the PSAM scheme to a BICM-MIMO system would be to time-multiplex the data with the training information after the precoder.
As an alternative to the above conventional PSAM scheme, a potential benefit can be sought by time-multiplexing the data with the training information before the precoder in the transmitter. This new approach is expected to reduce the required training overhead compared to the conventional PSAM, since, thanks to the precoder, the transmitted training symbols are spread over more time periods. This approach shall be referred to as precoded PSAM (PPSAM). Investigating the power and time allocations of the training symbols in the PPSAM scheme is the main objective of this article.
Moreover, by multiplexing the training sequence before the precoder, training symbols can be exploited in both the initialization and iteration phases of the iterative channel estimation process. This is different from a conventional iterative channel estimator using the PSAM scheme, in which the training sequence is only used at the initialization phase. A natural question is whether the optimal training design for the initialization phase using the PPSAM scheme is still optimal for subsequent iterations of an iterative channel estimator. On the one hand, the channel estimation error at the initialization phase translates to an SNR shift in the BER performance [8]. On the other hand, the channel estimation error from the last iteration of the iterative estimator has a strong impact on the error floor of the BER performance [12]. Therefore, the training sequence should be designed with both the initialization and iteration phases in mind.
One of the criteria that have been used to design training sequences is the minimization of the Cramer-Rao bound (CRB) on the channel estimation error [10]. This criterion is used in this article for two main reasons. First, it is directly related to the channel estimation error. Second, since the CRB is a lower bound on the mean-squared error (MSE) of any unbiased estimator, training sequences designed using this criterion are applicable to many estimation algorithms. Other design criteria, such as maximizing the channel capacity [8] and minimizing the outage probability [13], are based on specific channel estimation algorithms.
The article is organized as follows. The system model of BICM-MIMO is presented in Section 2. In Section 3, a lower bound on the MSE of the channel estimator is obtained and the training sequence is designed by minimizing this bound. Section 4 provides numerical results and comparisons. Section 5 concludes the article.
2 System model
Figure 1 shows the block diagram of the BICM-MIMO system under consideration. At the transmitter, a channel encoder with a rate-r error-correcting code converts the vector of information bits b into a codeword c. The coded bits are then interleaved by a random interleaver as described in [6] to produce the interleaved codeword
Figure 1. Block diagram of a BICM-MIMO system with a linear precoder and proposed training insertion.
Every group of N supersymbols is then spread over N time periods using a linear precoder G. The Nn_{t} × Nn_{t} matrix G multiplies a vector of Nn_{t} QAM symbols at the precoder input, and generates Nn_{t} symbols to be transmitted over n_{t} antennas, over N time periods.
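The dimensions involved in this spreading step can be checked with a minimal sketch. It assumes a column-vector convention, a row-major mapping of the output onto (time period, antenna), and a random unitary matrix as a stand-in for G; the helper names `random_unitary` and `precode` are hypothetical, and the stand-in is not the DNA-cyclo precoder of [6].

```python
import numpy as np

def random_unitary(size, rng):
    """Stand-in for the precoder G: a random unitary matrix (NOT the DNA-cyclo design)."""
    z = rng.standard_normal((size, size)) + 1j * rng.standard_normal((size, size))
    q, _ = np.linalg.qr(z)  # the Q factor of a complex matrix is unitary
    return q

def precode(symbols, G, N, n_t):
    """Multiply N*n_t symbols by G and arrange the result as N time periods x n_t antennas."""
    assert symbols.shape == (N * n_t,)
    assert G.shape == (N * n_t, N * n_t)
    # Assumed layout: row t of the result is sent during time period t, one entry per antenna.
    return (G @ symbols).reshape(N, n_t)

rng = np.random.default_rng(0)
N, n_t = 2, 4

# Unit-energy QPSK (4-QAM) symbols at the precoder input.
bits = rng.integers(0, 2, size=(N * n_t, 2))
symbols = ((1 - 2 * bits[:, 0]) + 1j * (1 - 2 * bits[:, 1])) / np.sqrt(2)

G = random_unitary(N * n_t, rng)
X = precode(symbols, G, N, n_t)
print(X.shape)  # N rows (time periods), n_t columns (antennas)
```

Because the stand-in is unitary, the total transmitted energy equals the energy of the input symbols; each input symbol is spread over all Nn_{t} outputs.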
This is illustrated in Figure 2. Let
Figure 2. Spreading a precoded symbol over n_{t} antennas and N time periods.
denoted by
With n_{t} transmit antennas and n_{r} receive antennas, the channel is modeled by an n_{t} × n_{r} matrix. For frequency-flat Rayleigh fading, the coefficients of the channel matrix are i.i.d. zero-mean circularly symmetric complex Gaussian random variables with variance
where n_{s} is the number of distinct channel realizations during N time periods of each codeword. To simplify the notation, it is also assumed^{(a)} that n_{s} divides N. For example, if the length of a codeword is 64 and n_{c} = 32, then choosing N = 2 would make n_{s} = 1, whereas choosing N = 4 gives n_{s} = 2. Notation
where
In general, the properties of the precoder in [6] are established by maximum-likelihood decoding analysis under an assumption of ideal channel interleaving. Specifically, this linear precoder, which achieves full diversity order and maximum coding gain, satisfies the following two conditions:
• A genie condition, which guarantees orthogonal and equal-norm subrows in the linear precoding matrix. Each subrow has size n_{t} in a precoding matrix of size Nn_{t} × Nn_{t}.
• A dispersive nucleo algebraic (DNA) condition, which is based on Proposition 2 in [6] and forces null and orthogonal nucleotides of size s' = N/n_{s}. Nucleotides refer to subparts of subrows with size s'.
A linear precoder that satisfies the above two conditions is called a DNA-cyclo precoder and has the best performance in terms of achieving diversity and coding gains with a low-complexity receiver when N ≤ n_{t}. It is suggested in [6] that, to generate one class of such precoders, an Ns' × Ns' cyclotomic rotator, denoted by Φ, that satisfies the genie condition is first selected. Then the orthogonal nucleotides are placed inside an Nn_{t} × Nn_{t} matrix, separated by null nucleotides. Therefore, the DNA-cyclo precoder matrix can be expressed by subparts of a cyclotomic rotator as follows:
where Φ^{[i][j]} is the ith subrow of the jth row of Φ with size 1 × s', I_{n} is an identity matrix of size n × n, and ⊗ denotes the Kronecker product.
The properties that shall be useful for the problem considered in this article, which are implied directly by the genie and DNA conditions, are ΦΦ^{H} = I_{Ns'} and
The iterative receiver is also shown in Figure 1. The channel estimator produces an estimate of the channel using the minimum MSE (MMSE) criterion based on the training sequence. Details about channel estimation with the proposed method of inserting the training sequence are given in Section 3. After channel estimation is performed using the training signal, the soft-input soft-output demodulator uses the MMSE criterion to demodulate the data. The soft-output MMSE demodulator computes the extrinsic information for the interleaved bits,
3 Training design and channel estimator
As discussed before, the criterion used for training design in this article is the CRB on the channel estimation error. The bound states that the MSE of any unbiased estimator is lower bounded by the trace of the inverse of the complex Fisher information matrix (FIM) [14]. To derive the FIM, the relation between the channel input and channel output during one block length, i.e., N/n_{s} time periods, whose corresponding channel matrix is
where y^{[i,t]} = y^{[(t-1)s'+i]} represents the ((t - 1)s' + i)th received symbol during N time periods, with size n_{r} × 1. Moreover, h^{[t]} is the column vector formed by vertically stacking the columns of an n_{t} × n_{r} channel realization matrix H^{[t]}, and the x^{[τ]}'s are constructed by splitting x into Ns' subvectors of size 1 × n_{t}/s'. In the following, we call these subvectors x^{[τ]} nucleo symbols.
It is clear from (4) that, for all the received supersymbols y^{[i,t]} to contain training information, there should be at least one pilot nucleo (i.e., n_{t}/s' pilot symbols) in each group of Ns' nucleos to be precoded.
With the above structure of the proposed training sequence, the number of pilot symbols in Nn_{t} transmitted symbols is N_{p} = n_{p} × n_{t}/s', where n_{p} is the number of nucleo symbols in a symbol to be precoded that are assigned to the training sequence. Therefore, (4) can be rewritten as
where
The derivation of FIM is given in the next section. Pilot symbols are exploited at the initialization phase and in subsequent iterations considering the special structure of the training sequence. In general, training design can be investigated for these two phases separately. However, for the precoder adopted in this article, the optimal training design obtained for the initialization phase turns out to also be optimal for the iteration phase. Nevertheless, the optimal numbers of pilot nucleos in these two phases of channel estimation are not the same.
3.1 Fisher information matrix
The key steps in deriving the FIM in the initialization phase are now given. Without loss of generality we drop superscript t in (5) and perform all the derivations for the first block period (i.e., t = 1). Collecting all the observations during the first block period of length s' in a vector φ, the FIM for the channel estimation problem at the initialization phase is defined and computed as
where
where
The i.i.d. assumptions on noise and data make the FIM additive. Specifically,
We know that
and
where ∑_{l }is an n_{r }× n_{t }null matrix with only a single element of 1 at position
where
where e_{l }is an n_{t}n_{r }× 1 null vector with a single element 1 at position l.
Using all the above equations and after some manipulations, one has
where
Using the fact that tr (ABC) = tr (CAB) and summing over s' quantities
where
To design the training sequence, (10) can be simplified further. Numerical calculation shows that, for a Rayleigh-distributed channel, the matrix
Moreover, using the property of the Kronecker product (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD), it follows that
In general, the second term in (11) depends on
For the iteration phase, specifically the last iteration, estimation and detection are implemented using information about the data symbols as well as the pilot symbols. Thus, the parameter of interest in deriving the FIM is
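The two matrix identities invoked in this derivation, tr(ABC) = tr(CAB) and (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD), can be verified numerically. The sketch below uses arbitrary random complex matrices with compatible sizes; the matrices and the helper `crandn` are illustrative assumptions and are not the quantities appearing in (10) and (11).

```python
import numpy as np

rng = np.random.default_rng(1)

def crandn(*shape):
    """Random complex Gaussian matrix of the given shape (illustrative data only)."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

# Cyclic property of the trace: tr(ABC) = tr(CAB).
A, B, C = crandn(3, 4), crandn(4, 5), crandn(5, 3)
cyclic_ok = bool(np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B)))

# Mixed-product property of the Kronecker product: (P ⊗ Q)(R ⊗ S) = (PR) ⊗ (QS).
P, Q = crandn(2, 3), crandn(4, 2)
R, S = crandn(3, 2), crandn(2, 5)
kron_ok = bool(np.allclose(np.kron(P, Q) @ np.kron(R, S), np.kron(P @ R, Q @ S)))

print(cyclic_ok, kron_ok)
```

The mixed-product property requires only that the inner dimensions of PR and QS match, which is why it applies to the Kronecker-structured precoder blocks.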
3.2 Optimization of training symbols and their positions
This section is first concerned with minimizing the CRB expression for the initialization phase. The minimization is under a constraint on the power budget for the training sequence. Such a constraint is expressed as
Using the properties of the precoder employed in this study, the above constraint can be simplified to
where
To proceed, let us consider two separate cases for problem (14): n_{p} = 1 and n_{p} ≥ 2.
Case 1 (n_{p} = 1): In this case the FIM is simplified to
Because of the shift-invariant property of (15) with respect to τ, τ can be any value in the set {1, 2, . . . , Ns'}. For simplicity, set τ = 1 and omit the superscript τ. Using the fact that if X > 0 then tr(X^{-1}) ≥ ∑_{i} 1/(X)_{i,i}, the original optimization problem is simplified by minimizing the lower bound of the objective function.
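The trace inequality used here can be illustrated numerically. The sketch below builds an arbitrary Hermitian positive-definite matrix X (an illustrative assumption, not the FIM of the paper) and compares tr(X^{-1}) with the sum of the reciprocals of its diagonal entries; it also checks that the two sides coincide when X is diagonal, which is the case in which the bound is tight.

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary Hermitian positive-definite matrix (illustrative data only).
M = rng.standard_normal((6, 4)) + 1j * rng.standard_normal((6, 4))
X = M.conj().T @ M + 0.1 * np.eye(4)

lhs = float(np.trace(np.linalg.inv(X)).real)   # tr(X^{-1})
rhs = float(np.sum(1.0 / np.diag(X).real))     # sum_i 1 / (X)_{i,i}

# With the off-diagonal entries removed, both sides are equal.
D = np.diag(np.diag(X).real)
lhs_diag = float(np.trace(np.linalg.inv(D)))

print(lhs >= rhs, np.isclose(lhs_diag, rhs))
```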
On the other hand,
Case 2 (n_{p} ≥ 2): In this case there are two options for the placement of pilot nucleos. The first option is to group all pilot nucleos in one single cluster and the second option is to spread the pilot nucleos. It can be shown that the CRB is invariant with respect to a shift of the placement of pilot nucleos in both options. Therefore, it suffices to select one cluster or one spread placement. However, the precoder has been designed such that the soft-output demodulator works with uncorrelated inputs, and putting pilot nucleos between data nucleos may violate this condition. That condition is satisfied when A^{[i]} has a diagonal form. The implication of this property is to place pilot nucleos equispaced in x_{k} and
Then the FIM in (11) can be represented by
To obtain the above expression of the objective function, the following property has been used:
Moreover, the only term that depends on the training symbols is
the solution is given by
Now consider the training design for the iteration phase. Observe that all the terms in (12) have diagonal forms with equal diagonal elements, except
In summary, by selecting pilot nucleos such that the sum of the powers of their corresponding pilot symbols with the same indexes are equal, the bound on CRB is minimized. The above condition can give different selections for pilot symbols from a twodimensional constellation. It should be pointed out, however, that not all selections guarantee that pilot symbols belong to standard QAM constellations.
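The equal-power conclusion can be illustrated on a deliberately simplified model, which is an assumption of this sketch and not the FIM in (11): a single receive antenna observes y = Th + n, where the whole block T is known training and the noise has variance σ². In that model the FIM is T^{H}T/σ² and the CRB is σ² tr((T^{H}T)^{-1}). The function name `crb_known_training` and the numerical values are hypothetical.

```python
import numpy as np

def crb_known_training(T, noise_var):
    """CRB tr(FIM^{-1}) for y = T h + n with fully known training T (simplified model)."""
    fim = T.conj().T @ T / noise_var
    return float(np.trace(np.linalg.inv(fim)).real)

n_t, P, noise_var = 4, 8.0, 0.5

# Orthogonal training with equal power P / n_t on every transmit antenna.
equal = np.sqrt(P / n_t) * np.eye(n_t)

# Orthogonal training with the same total power P, split unequally.
powers = np.array([4.0, 2.0, 1.0, 1.0])
unequal = np.diag(np.sqrt(powers))

crb_equal = crb_known_training(equal, noise_var)
crb_unequal = crb_known_training(unequal, noise_var)
print(crb_equal, crb_unequal)  # the equal-power design gives the smaller bound
```

For the same total training power, the equal-power design attains n_{t}²σ²/P, while any unequal split gives a larger value. The FIM in (11) contains an additional data-dependent term that this toy model ignores.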
3.3 Determination of the number of the training symbols
For block-fading channels, the number of pilot nucleos, n_{p}, should be the smallest value that meets the power constraint. Using a larger value for n_{p} wastes bandwidth and does not change the system performance.
The optimum numbers of training symbols in the initialization phase and the iteration phase are not the same. This is explained as follows. At the initialization, by looking at (7), it is observed that the first term in (11) is an increasing function of n_{p}. However, the second term is a decreasing function of n_{p} that is multiplied by n_{r}. Therefore, the n_{p} that minimizes the CRB is determined by the sum of these two terms, and hence also depends on the value of n_{r}. Table 1 gives several examples of the optimal n_{p} for different sets of n_{t}, n_{r} and N. For the iteration phase, the expression in (12) means that the CRB always increases with n_{p}. Since it is assumed that there is perfect information about the data symbols in the iteration phase, which is not the case in reality, it is most appropriate to select n_{p} considering only the initialization phase.
Table 1. Optimum n_{p }for several sets of parameters {n_{t}, n_{r}, N}
To demonstrate the optimal training design, Figure 3 shows a graphical structure for a simple example, where
Figure 3. Structure of the proposed scheme for the training sequence when N = 2, n_{t} = 4, n_{r} = 2 and n_{p} = 2.
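The placement of pilot nucleos for the parameters of Figure 3 can be sketched as follows. The sketch assumes n_{s} = 1, so that s' = N/n_{s} = 2, giving Ns' = 4 nucleo positions of n_{t}/s' = 2 symbols each. The function `build_precoder_input`, the 0-based position indices, and the pilot values are hypothetical choices for illustration; the optimal pilot values follow from the lost equations and are not reproduced here.

```python
import numpy as np

def build_precoder_input(data_nucleos, pilot_nucleos, pilot_positions, num_slots):
    """Place pilot and data nucleos into num_slots positions (0-based, sorted positions)."""
    assert list(pilot_positions) == sorted(pilot_positions)
    assert len(pilot_positions) == len(pilot_nucleos)
    assert len(data_nucleos) + len(pilot_nucleos) == num_slots
    nucleo_len = pilot_nucleos.shape[1]
    is_pilot = np.zeros(num_slots, dtype=bool)
    is_pilot[list(pilot_positions)] = True
    x = np.zeros((num_slots, nucleo_len), dtype=complex)
    x[is_pilot] = pilot_nucleos    # pilot nucleos, in position order
    x[~is_pilot] = data_nucleos    # data nucleos fill the remaining positions
    return x.reshape(-1), is_pilot

rng = np.random.default_rng(4)
N, n_t, n_p = 2, 4, 2
s_prime = 2                      # assumes n_s = 1, so s' = N / n_s
num_slots = N * s_prime          # Ns' = 4 nucleo positions
nucleo_len = n_t // s_prime      # n_t / s' = 2 symbols per nucleo

# Hypothetical unit-energy pilots and random QPSK data.
pilots = np.full((n_p, nucleo_len), (1 + 1j) / np.sqrt(2))
data = (rng.choice([-1, 1], size=(num_slots - n_p, nucleo_len))
        + 1j * rng.choice([-1, 1], size=(num_slots - n_p, nucleo_len))) / np.sqrt(2)

# Equispaced placement (first and third positions) versus one cluster at the front.
x_opt, mask_opt = build_precoder_input(data, pilots, (0, 2), num_slots)
x_sub, mask_sub = build_precoder_input(data, pilots, (0, 1), num_slots)
print(mask_opt, mask_sub)
```

Both vectors have length Nn_{t} = 8 and carry n_{p} × n_{t}/s' = 4 pilot symbols; only the positions of the pilot nucleos differ.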
3.4 Channel estimation
For the channel estimation task, one can view the received vector during one block length as
At the initialization, the mean and covariance matrix of this vector are given in Section 3.1. By treating the data symbols as nuisance parameters, the MMSE channel estimate can be found as [14]
where T = [(T^{[1]})^{T}, . . . , (T^{[s']})^{T}]^{T}.
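The general form of an MMSE channel estimate can be illustrated on a simplified linear Gaussian model, which is an assumption of this sketch: y = Th + n with a fully known training matrix T, channel coefficients of variance σ_h² and noise of variance σ². The linear MMSE estimate is then (T^{H}T + (σ²/σ_h²)I)^{-1}T^{H}y. Unlike the estimator of this section, the sketch does not treat data symbols as nuisance parameters; the function name and the DFT-based training are hypothetical.

```python
import numpy as np

def lmmse_channel_estimate(y, T, noise_var, channel_var=1.0):
    """Linear MMSE estimate of h from y = T h + n (known T, i.i.d. Gaussian h and n)."""
    n = T.shape[1]
    gram = T.conj().T @ T
    return np.linalg.solve(gram + (noise_var / channel_var) * np.eye(n), T.conj().T @ y)

rng = np.random.default_rng(3)
n_t, L = 4, 8

# Hypothetical training: the first n_t columns of an L x L DFT matrix (mutually orthogonal).
T = np.exp(-2j * np.pi * np.outer(np.arange(L), np.arange(n_t)) / L)

# Unit-variance Rayleigh-fading channel coefficients and low-power noise.
h = (rng.standard_normal(n_t) + 1j * rng.standard_normal(n_t)) / np.sqrt(2)
noise_var = 1e-6
noise = np.sqrt(noise_var / 2) * (rng.standard_normal(L) + 1j * rng.standard_normal(L))

y = T @ h + noise
h_hat = lmmse_channel_estimate(y, T, noise_var)
mse = float(np.mean(np.abs(h_hat - h) ** 2))
print(mse)
```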
In the subsequent iterations, soft information from the decoder is used to improve the performance of the channel estimator. The channel estimator uses such information to compute new estimates of the channel coefficients using expected values of the data symbols. Therefore, the interleaved
To verify the results obtained in this section, Section 4 compares numerically the MSE performance of the above channel estimator obtained with the optimal and suboptimal training sequences.
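The expected values of the data symbols used in the iteration phase can be computed from decoder log-likelihood ratios (LLRs). The sketch below does this for Gray-mapped QPSK under assumed conventions: LLR = log P(bit = 0)/P(bit = 1), bit 0 mapped to +1/√2, the first bit on the real axis and the second on the imaginary axis. The function name `qpsk_soft_symbols` is hypothetical.

```python
import numpy as np

def qpsk_soft_symbols(llrs):
    """Mean and variance of Gray-mapped QPSK symbols given per-bit LLRs.

    llrs has shape (num_symbols, 2); LLR = log P(bit=0) / P(bit=1) (assumed convention).
    """
    llrs = np.asarray(llrs, dtype=float)
    soft_bits = np.tanh(llrs / 2.0)                 # E[1 - 2b] for each bit
    mean = (soft_bits[:, 0] + 1j * soft_bits[:, 1]) / np.sqrt(2)
    var = np.maximum(1.0 - np.abs(mean) ** 2, 0.0)  # E|x|^2 = 1 for unit-energy QPSK
    return mean, var

# No prior knowledge (LLR = 0) versus near-certain bits (large |LLR|).
mean, var = qpsk_soft_symbols([[0.0, 0.0], [50.0, -50.0]])
print(mean, var)
```

With zero LLRs the expected symbol is zero and carries no information; with reliable LLRs it approaches a constellation point, so the data symbols behave almost like additional pilots.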
4 Illustrative results
In this section, the frame-error-rate (FER) and MSE performances of BICM-MIMO systems using an MMSE iterative channel estimator are presented. The space-time precoder is the DNA-cyclo precoder that satisfies the properties outlined in Section 2. We consider quadrature phase-shift keying (QPSK) modulation with Gray mapping.
The MSE performance of a BICM-MIMO system for a codeword length of 4 × 1024 bits is shown for a 4 × 2 block-fading MIMO channel in Figure 4, when n_{c} = 2. In this figure, E_{b} is the energy per information bit. The code used is the 16-state convolutional code with generator polynomials (23, 35) in octal form. In Figure 4, the MSE curves are obtained after 1 and 5 iterations of the iterative channel estimation/demodulation/decoding, with the following cyclotomic rotator [16]:
Figure 4. Comparison of MSE performance obtained with the optimal PPSAM and the suboptimal PPSAM over a 4 × 2 block-fading channel with n_{c} = 2, when N = 2 and n_{p} = 2, after 1 and 5 iterations of iterative channel estimation/demodulation/decoding.
and when the settings for N, n_{s}, n_{p} and P_{t} in Figure 3 are used. The channel is generated randomly and is assumed to be Rayleigh distributed. For the purpose of comparison, the MSE performances of the optimal PPSAM, denoted by OPPSAM, and the suboptimal PPSAM, denoted by SOPPSAM, as well as the CRB are shown in Figure 4. For SOPPSAM, two pilot nucleos are inserted as one cluster in front of the data nucleos in a symbol to be precoded. In contrast, in the case of OPPSAM, the optimized training sequence embeds the pilot nucleos at the first and third of the Ns' = 4 nucleo positions. The MSE curves show that the optimal scheme performs better than the suboptimal scheme for the first iteration (i.e., initialization). In fact, the MSE performance of the proposed scheme closely approaches the CRB at high E_{b}/N_{0} after 5 iterations.
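The 16-state convolutional code with octal generators (23, 35) used for these results can be sketched as a feedforward encoder. The sketch assumes rate 1/2 (one output bit per generator), constraint length 5, a zero initial state, no trellis termination, and a tap convention in which the most significant generator bit multiplies the current input bit; the function name `conv_encode` is hypothetical.

```python
def conv_encode(bits, generators=(0o23, 0o35), constraint_length=5):
    """Feedforward convolutional encoder: zero initial state, no termination."""
    state = 0
    mask = (1 << constraint_length) - 1
    out = []
    for b in bits:
        # Shift the register and place the new bit in the most significant position.
        state = ((state >> 1) | (b << (constraint_length - 1))) & mask
        for g in generators:
            # Each output bit is the parity of the register bits selected by the generator.
            out.append(bin(state & g).count("1") % 2)
    return out

# Impulse response: the two output streams read 10011 and 11101, i.e., 23 and 35 in octal.
impulse = conv_encode([1, 0, 0, 0, 0])
print(impulse[0::2], impulse[1::2])
```

A constraint length of 5 gives 2^4 = 16 encoder states, matching the 16-state description.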
In Figure 5, the FER performance of the system with the PPSAM schemes is compared with the conventional PSAM training scheme for the same system parameters as in Figure 4. The top curve is the FER performance of the system with the conventional PSAM training scheme. Note that for a fair comparison, the training scheme in PSAM also meets the training power constraint as trace
Figure 5. Comparison of FER performance obtained with the optimal PPSAM, suboptimal PPSAM and PSAM schemes over a 4 × 2 block-fading channel with n_{c} = 2, when N = 2 and n_{p} = 2, after 1 and 5 iterations of iterative channel estimation/demodulation/decoding.
As can be seen from Figure 5, the OPPSAM scheme offers a 0.5 dB performance gain over the SOPPSAM scheme at FER = 10^{-2}. The PSAM scheme performs about 0.5-1.5 dB worse than the proposed scheme, depending on E_{b}/N_{0}, after 5 iterations. This is expected because the pilot information is embedded in the precoded symbols for the proposed scheme and not for the PSAM scheme, so the demodulator can also make use of this information. Note, however, that for the first iteration, since there is no information about the data, PSAM works the best. More importantly, while the proposed scheme uses little bandwidth for training information (for the system considered in this figure the training overhead is n_{p} × n_{t}/s' = 4), the training overhead of the PSAM scheme is n_{t} × n_{t} = 16, which is four times as large.
To investigate the effect of the number of transmit antennas, two different systems, one with a 2 × 2 channel and one with a 4 × 2 MIMO channel, are compared in Figures 6 and 7 in terms of MSE and FER, respectively. For both channels, n_{p} = 2 and the optimum scheme are used with N = 2, while the other system parameters are the same as those used for Figure 4. As can be seen from Figure 6, the MSE of the channel estimation increases with the number of transmit antennas. This is expected because there are more channel coefficients to be estimated with the same amount of training information and power. Nevertheless, the gain in diversity from using more antennas can still improve the overall FER performance, as seen in Figure 7.
Figure 6. Comparison of MSE performance obtained with the optimal PPSAM for 2 × 2 and 4 × 2 block-fading channels with n_{c} = 2, when N = 2 and n_{p} = 2, after 5 iterations of iterative channel estimation/demodulation/decoding.
Figure 7. Comparison of FER performance obtained with the optimal PPSAM for 2 × 2 and 4 × 2 block-fading channels with n_{c} = 2, when N = 2 and n_{p} = 2, after 5 iterations of iterative channel estimation/demodulation/decoding.
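The training-overhead figures quoted for the system of Figure 5 can be reproduced with a short calculation. The sketch uses the two expressions stated in the text, n_{p} × n_{t}/s' for PPSAM and n_{t} × n_{t} for PSAM, with s' = 2 as implied by the quoted overhead of 4; the function names are hypothetical.

```python
def ppsam_overhead(n_p, n_t, s_prime):
    """Number of pilot symbols for precoded PSAM: n_p * n_t / s'."""
    if n_t % s_prime != 0:
        raise ValueError("s' must divide n_t so that nucleos have an integer size")
    return n_p * n_t // s_prime

def psam_overhead(n_t):
    """Number of training symbols for conventional PSAM: n_t * n_t."""
    return n_t * n_t

n_t, n_p, s_prime = 4, 2, 2
print(ppsam_overhead(n_p, n_t, s_prime), psam_overhead(n_t))  # 4 versus 16
```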
5 Conclusion
In this article, a new training design for a BICM-MIMO system over a block-fading channel has been proposed. The design inserts pilot symbols into the data symbols before precoding. The new training sequence improves bandwidth efficiency compared to the conventional PSAM scheme and can also be used by the demodulator in the receiver. To design the optimal training symbols and their positions, the CRBs on the channel estimation error at the initialization and iteration phases are minimized. Compared to PSAM, the performance improvement achieved with the proposed training is about 1.5 dB at a FER level of 10^{-2}.
Endnotes
^{a}In practice, since n_{s} is typically an approximated value over some range and since N can be selected, such an assumption can be fulfilled.
^{b}Using the matrix inversion lemma, one has
Competing interests
Zohreh Andalibi has received funding from TRLabs of Saskatchewan. This organization is partially financing this manuscript.
References

1. G Caire, S Shamai, On the achievable throughput of a multiantenna Gaussian broadcast channel. IEEE Trans Inf Theory 49(7), 1691–1706 (2003)
2. SM Alamouti, A simple transmit diversity technique for wireless communications. IEEE J Sel Areas Commun 16(8), 1451–1458 (1998)
3. V Tarokh, N Seshadri, AR Calderbank, Space-time codes for high data rate wireless communication: performance criterion and code construction. IEEE Trans Inf Theory 44(2), 744–765 (1998)
4. J Boutros, E Viterbo, Signal space diversity: a power and bandwidth efficient diversity technique for the Rayleigh fading channel. IEEE Trans Inf Theory 44(4), 1453–1467 (1998)
5. J Boutros, N Gresset, L Brunel, Turbo coding and decoding for multiple antenna channels. International Symposium on Turbo Codes and Related Topics (Brest, France, 2003), pp. 1–8
6. N Gresset, L Brunel, J Boutros, Space-time coding techniques with bit-interleaved coded modulations for MIMO block-fading channels. IEEE Trans Inf Theory 54(5), 2156–2178 (2008)
7. N Gresset, JJ Boutros, L Brunel, Optimal linear precoding for BICM over MIMO channels. ISIT, 66 (Chicago, IL, 2004)
8. M Coldrey, P Bohlin, Training-based MIMO systems, Part I: performance comparison. IEEE Trans Signal Process 55(11), 5464–5476 (2007)
9. M Nicoli, S Ferrara, U Spagnolini, Soft-iterative channel estimation: methods and performance analysis. IEEE Trans Signal Process 55(6), 2993–3006 (2007)
10. M Dong, L Tong, BM Sadler, Optimal insertion of pilot symbols for transmissions over time-varying flat fading channels. IEEE Trans Signal Process 52(5), 1403–1418 (2004)
11. G Taricco, E Biglieri, Space-time decoding with imperfect channel estimation. IEEE Trans Wirel Commun 4(4), 1874–1888 (2005)
12. Y Huang, JA Ritcey, Joint iterative channel estimation and decoding for bit-interleaved coded modulation over correlated fading channels. IEEE Trans Wirel Commun 4(5), 2549–2558 (2005)
13. P Piantanida, SM Sadough, On the outage capacity of a practical decoder accounting for channel estimation inaccuracies. IEEE Trans Commun 57(5), 1341–1350 (2009)
14. SM Kay, Fundamentals of Statistical Signal Processing: Estimation Theory (Prentice-Hall PTR, New Jersey, 1993)
15. MA Khalighi, JJ Boutros, Semi-blind channel estimation using the EM algorithm in iterative MIMO APP detectors. IEEE Trans Wirel Commun 5(11), 3165–3173 (2006)
16. GM Kraidy, P Rossi, Full-diversity iterative MMSE receivers with space-time precoders over block-fading MIMO channels. Proc IEEE Int Conf Wireless Commun and Signal Processing (Suzhou, 2010), pp. 1–5