FPGA-Based Manchester Decoder for IEEE 802.15.7

1. Introduction

Visible Light Communication (VLC) is a novel and promising wireless technique for sending and receiving information over short-range links [1,2,3]. VLC works by modulating the light intensity of a transmitting source, typically a Light Emitting Diode (LED). The technology has gained momentum with the recent market diffusion of white LED lamps designed for ambient lighting, which have proven effective as VLC transmitters [4].

The performance of a VLC link is limited by the Signal-to-Noise Ratio (SNR) at the receiver, which imposes a tradeoff between the achievable data rate and the distance between the transmitter (TX) and the receiver (RX). Research has shown that links in the Gb/s range are possible (for example, Cossu et al. demonstrated a 3.4 Gb/s link [5]), while in a vehicle-to-vehicle (V2V) communication link the maximum transmission distance is approximately 72 m in clear weather and 26 m in fog [6].

The information needs to be modulated before being transmitted through light. The highest bandwidth efficiency is obtained by applying complex modulations such as Orthogonal Frequency-Division Multiplexing (OFDM) combined with Quadrature Amplitude Modulation (QAM) [7]. However, these techniques require electronic systems with relatively high computing power and TX/RX front-ends with high analog performance. On the other hand, in On-Off Keying (OOK) modulation [8], the bits are coded in only two different light intensities. This approach is very simple, since in TX the LED can be directly driven by digital devices [9], and in RX even a simple threshold receiver can work [10]. Despite its simplicity, OOK modulation can achieve relatively high performance [11].

An important variation of OOK modulation is Manchester coding. In Manchester OOK coding, the bits ‘0’ and ‘1’ are transmitted through the symbol pairs ‘01’ and ‘10’, respectively (or the opposite) [12]. This choice guarantees a transition for every bit, which is very useful for recovering the clock at the receiver, and keeps the channel direct-current (DC) balanced. The merits of this approach are confirmed by its inclusion in important standards such as IEEE 802.15.7 [13] for wireless communications, which applies, for example, to automotive VLC applications [14,15].
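The mapping can be illustrated with a minimal Python sketch. It assumes the convention in which bit ‘0’ maps to the symbols ‘01’ and bit ‘1’ to ‘10’; the function names `manchester_encode` and `manchester_decode` are illustrative and are not taken from the standard.

```python
def manchester_encode(bits):
    """Map each bit to a pair of OOK symbols: 0 -> (0, 1), 1 -> (1, 0)."""
    symbols = []
    for b in bits:
        symbols.extend((0, 1) if b == 0 else (1, 0))
    return symbols


def manchester_decode(symbols):
    """Inverse mapping for an ideal, already aligned symbol stream."""
    bits = []
    for first, second in zip(symbols[0::2], symbols[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(0 if (first, second) == (0, 1) else 1)
    return bits


encoded = manchester_encode([0, 1, 1, 0])
# Every bit carries one transition, and exactly half of the symbols are 1,
# which is the DC balance mentioned in the text.
```

The decoder in this sketch only works on a clean, aligned stream; recovering the alignment from a noisy signal is the actual problem addressed in the rest of the paper.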

1.1. Manchester Receivers in the Literature

While the Manchester OOK transmitter is easy to implement, the receiver is less straightforward and requires more electronics. The Manchester receivers present in the literature (see Table 1) typically share the same front-end based on a two-level analog-to-digital (AD) converter, realized by comparing the input signal to a low-pass filtered replica [16]. The works in the literature mainly focus on the Clock and Data Recovery (CDR) circuit that follows the two-level AD converter. The most used approaches are based on a Phase-Locked Loop (PLL) or, alternatively, exploit the so-called “blind oversampling”.

In the first approach, a PLL is synchronized to the incoming signal and used to generate a clock with a phase suitable to sample the data [16,17]. This approach can achieve very high frequencies, and it is used in Application-Specific Integrated Circuits (ASICs) (for example, Ethernet receivers). However, it is not suited to implementation in the general fabric of an FPGA.

The “blind oversampling” method, in contrast, is often the preferred approach for FPGA implementation. An unsynchronized clock, whose frequency is at least two times the bit rate [18], samples the data with different phases. The digital logic that follows locates the edges and reconstructs the bits [19,20,21,22,23,24,25]. The performance of this method depends on the oversampling ratio and/or the number of clock phases, and is quite sensitive to jitter [26].

The aforementioned methods can be used for rates in the order of Gb/s. For lower rates, such as those within the reach of microcontrollers (µC), the Manchester code is typically decoded by measuring the time intervals between edges of the incoming signal by means of the µC timer peripherals [10]. This approach is quite simple and works for rates up to some hundreds of kb/s, but has limited immunity to noise.

Table 1. Main Manchester receiver methods.

Method	AD Conv.	Rates	Device	Resources	BER	Ref.
PLL	1-bit	>1 Gb/s	ASIC	Low	Medium	[16,17]
Blind oversampling	1-bit	>1 Gb/s	FPGA, ASIC	Medium	Medium	[19,20,21,22,23,24,25]
µC-based	1-bit	<200 kb/s	µC	Low	Medium	[10,27]
Proposed	12-bit	<1 Gb/s	FPGA, ASIC	High	Low	This study

1.2. Our Contribution

The weak point of the Manchester decoders described above is the very simple two-level front-end, which results in limited noise immunity. Input noise can easily cross the threshold, leading to false commutations and data errors. Additionally, implementing delays through digital gates is not advisable, and the overall tolerance to frequency errors is limited. At Gb/s rates, the two-level front-end is chosen because a high-resolution, high-speed AD converter, together with the digital data processing that follows it, would be excessively complex and costly.

For lower data rates, on the other hand, improving the front-end results in a performance gain, although at the expense of complexity. Data produced by high-resolution AD converters, even at a relatively low rate, cannot be processed by a µC [27], and a more complex FPGA system is required. However, applications exist where the performance of a VLC link is more important than simplicity or cost. These include, for example, the aerospace and automotive fields. It is easy to imagine a situation in the near future where a reliable, low-latency data exchange between two cars equipped for VLC communication, one blocked and the other fast approaching in the same lane, automatically triggers the brakes of the second car and is crucial in saving lives [28].

In this work we propose a high-performance Manchester decoder implemented in a Field Programmable Gate Array (FPGA), specifically dedicated to demanding VLC applications, that implements the IEEE 802.15.7 standard in real time. It accepts input data from a 12-bit analog-to-digital converter at 10 Msps, and it processes these data by performing a phase analysis over the samples acquired in a bit-time. In other words, the Manchester code is decoded as a special case of Binary Phase-Shift Keying (BPSK) modulation [29]. The decoder calculates the phase by applying the discrete Fourier transform (DFT) to the single frequency component corresponding to the bit period. As will be clarified in the following, it compares to the existing methods as reported in the last row of Table 1: it achieves high noise immunity at the cost of a more demanding architecture.

The architecture of the proposed decoder is detailed in Section 2. In Section 3 the proposed architecture is validated and shown to be more robust to noise than the simple Manchester decoder typically employed. Then, in Section 4, the decoder is implemented in the FPGA of a system designed for VLC research [30], and tailored for the 100 kb/s rate provided in the IEEE 802.15.7 standard.

In the experiments, reported in Section 5, the FPGA-based VLC system was connected to a 16 W white LED headlamp certified for automotive lighting. The tests showed that the decoder receives with no error when the signal amplitude is higher than −30 dB with respect to the input dynamics, with a bit error rate (BER) [31] lower than 10⁻³ as long as the signal is higher than −47 dB.

2. Architecture of the Receiver

2.1. The Method

The signal modulated according to the Manchester code is a 2-level signal like that represented in the example of Figure 1. In the example, white noise is added to the signal (SNR of 30 dB) to give a more realistic representation. Each bit is transmitted for a time $T_{b}$. Formally, the signal can be expressed as:

(1) $s(t) = \sum_{n_{b} = 0}^{N_{b} - 1} \left[ b_{n_{b}} \cdot \mathrm{one}(t - n_{b} T_{b}) + (1 - b_{n_{b}}) \cdot \mathrm{zero}(t - n_{b} T_{b}) \right]$

where $N_{b}$ is the number of transmitted bits, $b_{n_{b}}$ is the value (0 or 1) of the bit in position $n_{b}$, and the functions one(t) and zero(t) represent the single bits of amplitude $A$:

(2) $\mathrm{one}(t) = A \cdot \mathrm{rect}\left(\frac{t - 3 T_{b} / 4}{T_{b} / 2}\right); \quad \mathrm{zero}(t) = A \cdot \mathrm{rect}\left(\frac{t - T_{b} / 4}{T_{b} / 2}\right)$

The signal is AD converted at a rate $f_{c}$, generating the sequence $s_{i} = s(i / f_{c})$. Each bit is composed of N = $T_{b}$ · $f_{c}$ samples. It is convenient, but not mandatory, that the sampling frequency $f_{c}$ is a multiple of the transmission rate $f_{s} = 1 / T_{b}$. The input signal is subdivided into slices $sl$ of N samples each, so that each slice of index h lasts for the duration of a bit:

(3) ${sl}_{h} = [s_{h \cdot N + \mathit{of}}, s_{h \cdot N + 1 + \mathit{of}}, \dots, s_{N (h + 1) - 1 + \mathit{of}}] .$

Please note that the bit boundaries (i.e., the first sample of each bit) are located at the samples $s_{i \cdot N}$ for i = 0, 1, 2, 3, etc., while the slices ${sl}_{h}$ in (3) start at the sample $s_{h \cdot N + \mathit{of}}$; that is, the slice is in general not aligned to the bit boundaries, but is affected by an offset of $\mathit{of}$ samples.

In the representation of Figure 1, the subdivision of the sequence $s_{i}$ into the slices ${sl}_{h}$ is performed by the “Sync” block. The “Sync” block has no a priori knowledge of where the bit boundaries are located along the signal, and cutting the slices so that they are aligned to the true bit boundaries ( $\mathit{of} = 0$ ) is not a trivial task. This is a general problem that most receivers must face, and it is known as “synchronization”. An extensive literature investigates possible solutions for different modulations [32]. We will return to this later; for now, let us imagine that the slices are perfectly aligned to the bits, i.e., the receiver is synchronized and $\mathit{of} = 0$ .
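The slicing of Equation (3) can be sketched in a few lines of Python. This is only an illustration: the helper name `cut_slices` and the toy sequence are assumptions, and the sample offset is passed explicitly instead of being estimated by a “Sync” block.

```python
def cut_slices(samples, n_per_bit, offset=0):
    """Cut slices of N samples each; slice h starts at sample h*N + offset."""
    slices = []
    start = offset
    while start + n_per_bit <= len(samples):
        slices.append(samples[start:start + n_per_bit])
        start += n_per_bit
    return slices


samples = list(range(20))                     # stand-in for the ADC sequence s_i
aligned = cut_slices(samples, 5)              # offset 0: slices start on bit boundaries
shifted = cut_slices(samples, 5, offset=2)    # offset 2: every slice straddles two bits
```

With a non-zero offset each slice contains the tail of one bit and the head of the next, which is the situation the synchronization mechanism has to detect and correct.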

2.2. The Bit Decision

The slices are processed to detect the bit value and to extract the information necessary to maintain the synchronization. In the proposed approach, the signal is processed as if it were BPSK modulated; Manchester code is indeed a specific case of BPSK. Thus, the phase φ of the signal actually present in each slice ${sl}_{h}$ is calculated.

The slice represents a square wave (see Figure 1) at frequency $f_{s}$ when the communication rate is $f_{s}$ bit/s. In the frequency domain, the square wave is characterized by the fundamental frequency at $f_{s}$ with harmonics at its multiples. The amplitude of the harmonics decreases with increasing frequency, so the component at $f_{s}$ has the highest amplitude. Thus, it is reasonable to suppose that the component at $f_{s}$ features the highest SNR. We will extract the phase φ from this component (see Equation (7)).

Let us start by calculating the complex spectrum of the slice ${sl}_{h}$, composed of the N samples given in (3), by exploiting the discrete Fourier transform (DFT) [33]:

(4) $I_{k} = \sum_{i = 0}^{N - 1} s_{h \cdot N + \mathit{of} + i} \cdot \cos\left(2 π \frac{k}{N} i\right); \quad Q_{k} = - \sum_{i = 0}^{N - 1} s_{h \cdot N + \mathit{of} + i} \cdot \sin\left(2 π \frac{k}{N} i\right); \quad 0 \leq k \leq N - 1$

In (4), $I_{k}$ and $Q_{k}$ represent the In-phase and Quadrature-phase (IQ) components of the spectrum for the generic frequency $f_{k} = k \cdot f_{c} / N$ , where $f_{c}$ is the sampling frequency. Since N = $T_{b}$ · $f_{c}$ = $f_{c}$ / $f_{s}$ , the component of our interest at frequency $f_{s}$ has the index:

(5) $k = \frac{f_{s}}{f_{c}} N = \frac{f_{s}}{f_{c}} T_{b} f_{c} = \frac{f_{s}}{f_{c}} \frac{f_{c}}{f_{s}} = 1 .$

Equation (4), rewritten for $k = 1$ , is:

(6) $I_{1} = \sum_{i = 0}^{N - 1} s_{h \cdot N + \mathit{of} + i} \cdot \cos\left(2 π \frac{i}{N}\right); \quad Q_{1} = - \sum_{i = 0}^{N - 1} s_{h \cdot N + \mathit{of} + i} \cdot \sin\left(2 π \frac{i}{N}\right)$

From here on $I_{1}$ and $Q_{1}$ are named I and Q for brevity. In the next step, the phase φ is calculated through the four-quadrant arctangent function atan2(Q, I) that can be defined as:

(7) $φ = \mathrm{atan2}(Q, I) = \begin{cases} \mathrm{atan}\left(\frac{Q}{I}\right) & I > 0 \\ π - \mathrm{atan}\left(- \frac{Q}{I}\right) & I < 0 \\ \frac{π}{2} & I = 0;\ Q > 0 \\ - \frac{π}{2} & I = 0;\ Q < 0 \end{cases}$

where $\mathrm{atan}(x)$ is the inverse tangent of $x$. In case of an ideal signal coded according to Manchester modulation of amplitude ±A, the slice holding bit ‘0’ (transition from symbol 0 to symbol 1) has I = 0, Q = 2A, and $φ = \frac{π}{2}$; the slice holding bit ‘1’ (transition from symbol 1 to symbol 0) has I = 0, Q = −2A, and $φ = - \frac{π}{2}$, as summarized in Table 2.

The decision on the value of the bit can be easily taken by considering the sign of the phase φ. The same decision can clearly be obtained by looking at the sign of Q alone, without calculating I or the phase φ. The phase is nevertheless required for the synchronization of the receiver, as will be shown in the next section.
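As an illustration of Equations (6) and (7), the following Python sketch computes I and Q for a single bit-long slice and takes the bit decision from the sign of Q. The function names and the ideal ±1 test slices are assumptions made for the example; the exact `math.atan2` is used here in place of any hardware-friendly approximation.

```python
import math


def iq_fundamental(slice_samples):
    """I and Q of Equation (6): the DFT bin k = 1 of one bit-long slice."""
    n = len(slice_samples)
    i_comp = sum(s * math.cos(2 * math.pi * i / n) for i, s in enumerate(slice_samples))
    q_comp = -sum(s * math.sin(2 * math.pi * i / n) for i, s in enumerate(slice_samples))
    return i_comp, q_comp


def decide_bit(slice_samples):
    """Bit from the sign of Q; the phase (degrees) is returned for synchronization."""
    i_comp, q_comp = iq_fundamental(slice_samples)
    phase_deg = math.degrees(math.atan2(q_comp, i_comp))
    bit = 0 if q_comp > 0 else 1
    return bit, phase_deg


N = 100
zero_slice = [-1.0] * (N // 2) + [1.0] * (N // 2)   # low -> high, bit '0'
one_slice = [1.0] * (N // 2) + [-1.0] * (N // 2)    # high -> low, bit '1'
bit0, phase0 = decide_bit(zero_slice)
bit1, phase1 = decide_bit(one_slice)
```

In this sketch the sums are not normalized, so Q is proportional to, rather than equal to, the ideal ±2A. Because the samples are taken at the start of each sampling interval, the phase of an ideal slice lands within one sample step (360°/N) of ±90° rather than exactly on it.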

2.3. The Synchronization of the Receiver

The problem of transmitter–receiver synchronization is well known in the field, and several techniques have been proposed in the literature [32]. As anticipated, the receiver has no a priori knowledge of where the borders of the bits are located along the incoming data sequence. It is necessary to implement some mechanism that finds the positions of the borders and keeps the sequence dynamically synchronized. In fact, even if the transmitter and receiver share the same nominal communication rate, unavoidable differences between their local oscillators make the timings calculated in the two apparatuses not perfectly equal. The result is that, without correction, the time-markers of the receiver are likely to accumulate a phase error with respect to the data produced by the transmitter. Moreover, this phase error increases in time and eventually prevents correct communication [15].

In this application, we use the phase $φ$ , calculated by (7), to locate the bit boundaries at the start of the transmission, and then to dynamically maintain the right position during the remaining part of the communication. To better understand the process, let us imagine that the receiver places the tentative borders of the bit as depicted in Figure 2 (red-dashed lines). In this example the transmitted bit has a ‘0’ value and the tentative bit-slice is placed with a phase error $φ_{e}$ . The error can be due to noise on the signal or simply because this is the first transmitted bit, and thus the receiver has no history to help in locating the right position.

The data-slice delimited by the red interval is processed for phase calculation as detailed in the previous sections. In this condition, from (7), the resulting phase $φ$ is not the ideal 90° expected for a bit of value ‘0’ (see Table 2), but is around $φ$ = −150°. This information is valuable to the receiver, which can calculate the phase error: $φ_{e}$ = −150° − 90° + 180° = −60°. Now the receiver knows that the borders were placed with a delay of 60°. Thanks to this knowledge, the error is corrected, and the next bit can be located with much higher accuracy, as shown in Figure 2. From now on the receiver is synchronized and dynamically maintains the lock by correcting the error $φ_{e}$ at every step. At this point the error is due only to the noise on the signal, and the receiver maintains the synchronization as long as the SNR is sufficiently high.

It should be noted that in the example of Figure 2, the last samples of the red slice (1st tentative bit) are used in the calculation of the phase (7) for the 1st bit, and, after the correction, are used again in the calculation of the phase for the 2nd bit. Similarly, it happens that when the error is positive (advance tentative position), some samples are not used at all.

The example of Figure 2 can be extended to analyze the general condition. Figure 3 shows how the phase $φ$ calculated by (7) changes according to the phase error $φ_{e}$ . The graph reports the cases for bit of value ‘0’ and ‘1’.

Given a phase $φ$ , Table 3 summarizes the decision about the bit value and the estimation of the error $φ_{e}$ to be used in the positioning of the next bit-slice. Given that $s_{i}$ is the first sample of the current slice, affected by the phase error $φ_{e}$ , the next positioning is achieved by starting the reading of the slice from the sample $s_{i + l}$ , where l is obtained as:

(8) $l = \mathrm{int}\left[T_{b} f_{c} \left(1 + \frac{φ_{e}}{360 °}\right)\right] = N + \mathrm{int}\left(\frac{φ_{e}}{360 °} N\right) = N + z$

In (8) $φ_{e}$ is expressed in degrees (°), $\mathrm{int}(x)$ is the nearest integer to $x$ , and $N$ represents the number of samples per bit. The increment $l$ is composed of $N$ , i.e., the bit length, plus $z$ , which can be positive or negative depending on the sign of $φ_{e}$ and ranges within ± $N / 2$ .
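A small numerical instance of Equation (8) may help. The sketch below is written in Python with a hypothetical helper `next_slice_increment`; it uses the built-in `round` as the nearest-integer operator, which differs from other rounding rules only for exact half-integer values.

```python
def next_slice_increment(phase_error_deg, n_per_bit):
    """Equation (8): l = N + z, with z the nearest integer to phi_e / 360 * N."""
    z = round(phase_error_deg / 360.0 * n_per_bit)
    return n_per_bit + z, z


# Phase error of -60 degrees with N = 100 samples per bit.
l, z = next_slice_increment(-60.0, 100)   # z = -17, l = 83
```

With a phase error of −60° the next slice therefore starts 83 samples after the current one instead of 100.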

In a practical implementation, it can be beneficial to reduce the variability of $z$ among subsequent bits. This precaution can avoid, for example, that a sudden noise in a single bit could compromise the synchronization. A simple approach, whose efficacy will be discussed in Section 3, consists in allowing for the next bit a phase correction $z$ not above a given percentage of $N$ .

From Figure 3 we see that, as long as the phase error $φ_{e}$ is between −90° and +90°, the decision about the bit value is correct; on the other hand, a higher phase error results in a wrong bit value, an incorrect calculation of the phase error $φ_{e}$ , and thus in an incorrect positioning of the next bit. In practice, a phase error higher than ±90° causes the receiver to lock over the second commutation edge of the bit, which is an unwanted condition. However, once the correct synchronization is achieved, it is easily maintained as long as the SNR is sufficiently high. Unfortunately, at the beginning of the sequence the synchronization has still to be achieved: the receiver has to “guess” the position of the first bit without any other info (see Figure 2), and a phase error higher than ±90° can happen.

The issue is solved by transmitting a known “synchronization” sequence of some bits at the beginning of the data packet (for example, eight zeroes). With the knowledge of the bit value, the receiver can correctly calculate the phase error $φ_{e}$ (see Figure 3) over the full ±180° range, thus achieving the correct synchronization.

2.4. Approximation of the Inverse Tangent

The calculation of the phase (7) requires the inverse tangent. In general, the real-time calculation of the inverse tangent is not a trivial task. Several algorithms have been reported in the literature that produce approximations with different accuracies and calculation efforts. Interested readers can find some implementation techniques in, for example, [34,35,36,37], but several others are present in the literature.

In order to optimize the effort, let us evaluate the accuracy required for the calculation of the inverse tangent in this application. As detailed in the previous section, the phase error $φ_{e}$ is used by the receiver to correct the slice position on the next bit by applying (8). Since $l$ in (8) must be an integer, the resolution required in the calculation of $φ_{e}$ is limited to the minimum variation $∆ φ_{e}$ of the phase $φ_{e}$ that produces a unit variation of $l$ . The same applies to $∆ φ$ in the calculation of $φ$ . In other words, there is no benefit in calculating the inverse tangent with a resolution finer than the minimum that changes the result of the rounding in (8), namely $\mathrm{int}(\frac{φ_{e}}{360 °} N)$ . This reasoning leads to the following reference for evaluating the resolution $∆ φ$ :

(9) $\frac{∆ φ_{e}}{360 °} N = 1 \Rightarrow ∆ φ_{e} = \frac{360 °}{N} \Rightarrow ∆ φ = \frac{360 °}{N}$

Another useful constraint is that the approximation produces no error when the system is synchronized, so the lock condition is effectively maintained.

Following the reasons discussed so far, we approximated the four-quadrant inverse tangent through a very simple linear relation that crosses the origin. As will be clearer in the following, the “origin crossing” grants the condition of no error in lock condition. Formally we have:

(10) $φ = \mathrm{itanpp}(x) = m \cdot x$

where itanpp(x) is the approximate inverse tangent function, and $m$ is the value that minimizes the maximum error in the octant:

(11) $\min_{m} \left\{ \max_{x} \left| m \cdot x - \frac{180 °}{π} \mathrm{atan}(x) \right| \right\}, \quad 0 \leq x < 1$

After some basic steps of mathematical analysis, here omitted for brevity, we obtain the following equation, whose solution is the $m$ that minimizes (11):

(12) $\frac{180 °}{π} \mathrm{atan}\left(\sqrt{\frac{\frac{180 °}{π} - m}{m}}\right) - m \sqrt{\frac{\frac{180 °}{π} - m}{m}} - m + 45 ° = 0$

The numerical solution of Equation (12) is $m \approx 47.7434 °$.
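The numerical value can be checked with a short root search. The Python sketch below applies plain bisection to the left-hand side of Equation (12); the bracketing interval [45°, 50°] and the function names are assumptions chosen for the example, not part of the original derivation.

```python
import math

K = 180.0 / math.pi  # degrees per radian


def minimax_residual(m):
    """Left-hand side of Equation (12) for a slope m expressed in degrees."""
    x_star = math.sqrt((K - m) / m)   # interior extremum of m*x - atan(x)
    return K * math.atan(x_star) - m * x_star - m + 45.0


def solve_slope(lo=45.0, hi=50.0, tol=1e-9):
    """Bisection on Equation (12); lo and hi must bracket the root."""
    f_lo = minimax_residual(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = minimax_residual(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


m_opt = solve_slope()
```

The root balances the error at the interior extremum against the error at the edge of the octant (x = 1), which is what makes the maximum error as small as possible.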

In addition to the approximation of Equation (10) with $m$ given by Equation (12), it is also worth considering the coarser approximation obtained with $m = 45 °$ . In fact, since 45° = 360°/2³, the hardware implementation of this second approximation is particularly efficient, as will be clarified in Section 4.3.

Once $\mathrm{atan}(x)$ is approximated inside an octant, it is trivial to extend the result to the quadrant by applying $\mathrm{atan}(1 / x) = 90 ° - \mathrm{atan}(x)$ , and then to the full four-quadrant co-domain by evaluating the signs of the I, Q components. The steps are the following:

(1). The octant is determined from the absolute values of I and Q (see Table 4, where octants are numbered in anti-clockwise order starting from 0°);
(2). We calculate (10) with x = min(|I|, |Q|)/max(|I|, |Q|). This way we approximate the inverse tangent or cotangent, depending on whether the angle belongs to an even or odd octant, and refer the result to the zero octant (0 ≤ φ < $45 °$ );
(3). The result is rotated back to the octant determined in step 1 with the simple operation reported in the last column of Table 4. Like the ideal four-quadrant extension of $\mathrm{atan}(x)$ , the approximation of Table 4 is undefined when both I and Q are null.
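The three steps can be sketched in Python as follows. The rotation rules used here (reflection about 45°, then about 90°, then sign change for negative Q) are one standard way of unfolding the octant and are an assumption of this example; they are not copied from the paper's octant table. The helper `max_error_deg` is likewise illustrative.

```python
import math


def approx_atan2_deg(q, i, m=45.0):
    """Four-quadrant phase in degrees from the linear octant approximation m*x."""
    abs_i, abs_q = abs(i), abs(q)
    if abs_i == 0 and abs_q == 0:
        raise ValueError("phase undefined when I and Q are both zero")
    # Steps 1-2: fold the angle to the zero octant, x in [0, 1].
    x = min(abs_i, abs_q) / max(abs_i, abs_q)
    angle = m * x
    # Step 3: rotate back. Odd octants use atan(1/x) = 90 - atan(x).
    if abs_q > abs_i:
        angle = 90.0 - angle
    if i < 0:
        angle = 180.0 - angle
    if q < 0:
        angle = -angle
    return angle


def max_error_deg(m, steps=3600):
    """Largest deviation from math.atan2 over a full turn, sampled every 0.1 degree."""
    worst = 0.0
    for k in range(-steps // 2 + 1, steps // 2):
        theta = math.radians(k * 360.0 / steps)
        i, q = math.cos(theta), math.sin(theta)
        err = approx_atan2_deg(q, i, m) - math.degrees(math.atan2(q, i))
        worst = max(worst, abs(err))
    return worst
```

Under these assumptions the approximation is exact at ±90°, the phase of a synchronized receiver, because that angle folds onto x = 0 where the line m·x crosses the origin.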

Figure 4 compares the atan(x) approximations with m = 47.7434° (blue curve) and m = 45° (magenta curve), as calculated in Equation (10), and the reference function (red-dashed curve). The top panel reports the three trends; the bottom panel highlights the phase errors $φ_{e}$ , calculated with respect to the reference inverse tangent. The graph reports the data for the first octant (0 ≤ x < 1) only, since they repeat unchanged in the remaining octants.

We can see that with m = 47.74° the error ranges between ±2.74°, while with m = 45° the error is between −4.07° and 0°. In addition to the maximum error, it should be noted that once the receiver is synchronized, the phase will be around ±90°, which corresponds to the origin of the octant in both cases (see Table 4). In the origin, the error is null, and remains reasonably low even near the origin. As discussed before, this behavior is important for the system to correctly maintain the lock condition.

As we anticipated, the error calculated here should be related through Equation (9) to the number of samples per bit N. From Equation (9), we deduce that the accuracy granted by the proposed approximations, considering the maximum errors, corresponds to values of N ≈ 130 and N ≈ 90 for m = 47.74° and m = 45°, respectively. However, this is a quite conservative estimation, and from a practical point of view, N ≈ 100 with m = 45° represents a good compromise.
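As a quick check of the arithmetic, assuming that the resolution $∆ φ$ of Equation (9) is simply equated to the maximum approximation error quoted above:

```latex
N \approx \frac{360^\circ}{2.74^\circ} \approx 131 \quad (m = 47.74^\circ), \qquad
N \approx \frac{360^\circ}{4.07^\circ} \approx 88 \quad (m = 45^\circ)
```

These values round to the N ≈ 130 and N ≈ 90 given in the text. Conversely, N = 100 corresponds to a resolution of 360°/100 = 3.6°.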

3. Evaluation of the Receiver Performance in Simulations

3.1. Evaluation of How the Saturation of z Affects the Receiver Performance

In Section 2.3, when we discussed Equation (8), we mentioned the convenience of limiting the value of $z$ in order to reduce the possible jitter in phase jumps among subsequent bits. The value $z$ in Equation (8) ranges within $\pm N / 2$ ; thus we can limit the ‘jumps’ to a percentage $Z_{\%}$ of the maximum range $N / 2$ . Formally:

(13) $z = \begin{cases} \mathrm{sign}(z) \cdot Z_{\%} N / 2 & |z| > Z_{\%} N / 2 \\ z & |z| \leq Z_{\%} N / 2 \end{cases}$
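The clamp of Equation (13) can be sketched in Python as below. The function name is hypothetical, and truncating the limit to a whole number of samples is an assumption of this example, made because the correction must be an integer sample shift.

```python
import math


def saturate_correction(z, n_per_bit, z_percent):
    """Equation (13): clamp z to +/- Z% of the maximum range N/2."""
    limit = z_percent * n_per_bit / 200.0          # Z% of N/2, in samples
    if abs(z) > limit:
        return int(math.copysign(limit, z))        # assumption: truncate to whole samples
    return z


clamped = saturate_correction(-17, 100, 2)   # limit is 1 sample, so the result is -1
```

For N = 100, a saturation of 2% allows at most one sample of correction per bit, while 100% leaves the correction of Equation (8) untouched.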

In this test we investigate how $Z_{\%}$ affects the receiver performance. The algorithm described so far was implemented in MATLAB R2024a (The Mathworks, Natick, MA, USA). The input to the receiver was a signal that included a synchronization sequence of 100 zero bits (01 symbols), followed by a payload of 10⁶ randomly generated bits. At the very beginning of the sequence, a period of noise was added, lasting for a random length between 0 and $T_{b}$ . The aim was to generate an initial phase difference between the transmission and the reception timings. According to the IEEE 802.15.7 standard, we simulated $T_{b}$ = 10 μs, corresponding to 100 kbit/s. The signal was sampled at $f_{c}$ = 10 Msps; thus we had N = 100.
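A test signal of this kind could be generated along the following lines. This is a Python sketch, not the MATLAB code used in the study. It assumes a ±1 signal amplitude, Gaussian white noise, an SNR defined as the ratio of signal power to noise variance, and the low-to-high convention for bit ‘0’; all function names are illustrative.

```python
import random


def manchester_waveform(bits, n_per_bit, amplitude=1.0):
    """Two-level waveform: bit 0 is low then high, bit 1 is high then low."""
    half = n_per_bit // 2
    wave = []
    for b in bits:
        first = -amplitude if b == 0 else amplitude
        wave.extend([first] * half + [-first] * (n_per_bit - half))
    return wave


def build_test_signal(n_sync, n_payload, n_per_bit, snr_db, seed=0):
    """Noise-only lead-in of random length, sync zeros, random payload, added noise."""
    rng = random.Random(seed)
    payload = [rng.randint(0, 1) for _ in range(n_payload)]
    clean = manchester_waveform([0] * n_sync + payload, n_per_bit)
    lead = rng.randrange(n_per_bit)            # 0 .. N-1 samples before the first bit
    sigma = 10.0 ** (-snr_db / 20.0)           # unit signal power assumed
    samples = [0.0] * lead + clean
    noisy = [s + rng.gauss(0.0, sigma) for s in samples]
    return noisy, payload, lead


noisy, payload, lead = build_test_signal(
    n_sync=100, n_payload=1000, n_per_bit=100, snr_db=-3.0, seed=1
)
```

The payload is kept short here so that the example runs quickly; the simulations described in the text use 10⁶ payload bits.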

The relatively long synchronization sequence used here was necessary to test low values of $Z_{\%}$ : in fact, with $Z_{\%} = 2\%$ the receiver can adjust by a maximum of one sample per bit, and thus, in the worst case, needs 100 bits to synchronize.

White noise was added to the signal to achieve an SNR ranging from 0 dB to −25 dB, with a 1 dB step. The values of $Z_{\%}$ evaluated were: 2%, 4%, 6%, 8%, 10%, 20%, and 100%. Parameters are summarized in Table 5. We performed 4 simulations for each pair of $Z_{\%}$ and SNR values, for a total of 4 × 7 × 25 = 700 simulations. The performance of the receiver was evaluated through the BER, calculated as the ratio between the number of wrongly received bits and the number of bits in the payload (10⁶ in these experiments). In case of no error, the BER was considered 1/10⁶.
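The BER metric, including the floor applied to error-free runs, can be written as a short Python helper. The function name is an assumption; the floor of one error per payload follows the convention stated above and keeps the value plottable on a logarithmic axis.

```python
def bit_error_rate(sent_bits, received_bits):
    """BER = wrong bits / payload bits; an error-free run is reported as 1/len."""
    if len(sent_bits) != len(received_bits):
        raise ValueError("sequences must have the same length")
    errors = sum(1 for s, r in zip(sent_bits, received_bits) if s != r)
    return max(errors, 1) / len(sent_bits)


ber_clean = bit_error_rate([0, 1, 1, 0], [0, 1, 1, 0])   # no errors: floor of 1/4
ber_noisy = bit_error_rate([0, 1, 1, 0], [1, 1, 1, 1])   # two errors out of four
```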

The BERs resulting from the 4 simulations for each pair of $Z_{\%}$ and SNR values were averaged and reported in Figure 5. The picture shows 7 curves in different colors, one for each value of $Z_{\%}$ tested (see the legend). The curves represent the BERs over the simulated range of SNR. It is interesting to note that the receiver makes no error as long as the SNR is higher than about −3 dB, and the output is only noise when SNR < −20 dB, regardless of $Z_{\%}$ . For SNR between −3 and −20 dB, $Z_{\%}$ plays an important role, and in general the lower $Z_{\%}$ , the better. However, the results show a limit in the improvement of the BER with $Z_{\%}$ . For example, given an acceptable error rate of BER < 10⁻⁴, we observe a gain in performance as $Z_{\%}$ decreases down to 10%, but no gain for $Z_{\%}$ < 10%. Similarly, when we require BER < 10⁻², the lower limit of $Z_{\%}$ that still improves the performance is about $Z_{\%}$ = 6%; and so on.

3.2. Evaluation of the Synchronization on Sequences Affected by Timing Errors

The timings of the transmitter and the receiver are based on independent local oscillators, which, although they share the same nominal frequency, generate clocks affected by slight differences. Typical oscillators have an accuracy in the order of some tens of parts per million (ppm), which translates into relative frequency differences between transmitter and receiver below 0.1%. Another possible source of frequency shift is the Doppler effect generated when the transmitter and the receiver move with respect to each other. In addition, the oscillators are affected by jitter noise [38], i.e., rapid variation of the clock phase due to electrical noise, and frame jitter noise [39], i.e., temporal variation of the start of the bit frames.

Here, we tested how the proposed algorithm tolerates frequency shift and jitter noise, starting with the former. We repeated the experiment of the previous section, changing the bit length $T_{b}$ by 1%, which, being a tenfold variation with respect to what is expected in a real application, represents a quite unfavorable condition.

The results are presented in Figure 6a. Comparing the curves to those of Figure 5, we note a slight worsening of the performance. In particular, the curve for $Z_{\%}$ = 2% stands out, indicating that in this condition reception is never possible regardless of the SNR, while in the previous test $Z_{\%}$ = 2% gave the best performance (see Figure 5). This result is explained by considering that, with the experimental parameters of Table 5, the limitation $Z_{\%}$ = 2% corresponds to a correction of 1 sample per bit, which is exactly the frequency offset we imposed on the transmission sequence. In other words, the maximum correction rate is not enough to keep up with the bit timing drift produced by the simulated frequency error. Apart from $Z_{\%}$ = 2%, as in the previous test, the performance improves with decreasing $Z_{\%}$ , but there is no significant gain for $Z_{\%}$ < 10%. No reception is achieved for SNR < −10 dB.

The last experiment was designed to simulate the presence of jitter: the bits were generated with varying duration. If $T_{b}$ and $T_{bn}$ are the actual and the nominal bit durations, we varied $T_{b}$ randomly in the range $T_{bn} (1 \pm 0.01)$ , i.e., a 1% variation. Once again, the other parameters were the same as reported in Table 5. The measurements, reported in Figure 6b, are quite similar to those obtained without jitter, shown in Figure 5. A slight worsening is present for all the saturation levels $Z_{\%}$ tested, but the receiver easily tolerates this level of jitter.

3.3. Comparison of the Proposed Decoder to Reference

The proposed Manchester decoder was compared in performance to a standard decoder, similar to those described in some papers [24,25] and also found in several books and application notes. The standard decoder considered here (see Figure 7a) was based on a 2-level analog-to-digital converter realized by a tracking comparator, followed by a digital filter and logic for the decoding. A Numerical Oscillator (NO) reconstructed the data clock. As in the tests of the proposed decoder, Manchester-coded signals with added noise of increasing power were generated and processed through the decoder. The BER was obtained by comparing the output with the ground-truth data.

Figure 7b reports the BER resulting from the comparison over the range 5 to 25 dB. The proposed decoder was set to $Z_{\%}$ = 2% to better match the low bandwidth of the NO of the reference decoder. For equal BER, the proposed decoder gains about 10 dB in SNR. For BER higher than 10⁻² the advantage reduces, but this error range is of little interest in practical applications.

4. FPGA Implementation

This section describes in detail how the decoder, whose general architecture was elaborated and characterized in the previous sections of the paper, is implemented in the FPGA. The input data flow consists of 10 M words per second at 12 bits, as sourced by the ADC. The tasks necessary to carry out the calculations on such a data flow in real time are subdivided, as visible in Figure 8, between the NIOS II soft-processor and other blocks that act as efficient specialized mathematical co-processors. This subdivision, quite common in FPGA projects, exploits the advantages of both the microcontroller (μC) and the dedicated logic. The former is programmed in ‘C’ language, which makes it easy to apply changes, make corrections, and investigate new approaches; the latter provides the calculation power required to process the data flow, which the μC alone could never provide. Nevertheless, as will become clear in the following, the μC must complete its tasks inside timing windows strictly correlated to the bit length.

It should be noted that the architecture spans two separate clock domains: data are sourced from the ADC at 10 Msps, synchronized to a 10 MHz clock, while the μC, the I/Q demodulator, and the phase calculation run at the higher 100 MHz clock. This choice leaves more execution power for the μC and the processing logic. The circular buffer, which as clarified later is based on a dual-port memory, is the ideal bridge between the two clock domains. A description of the functional blocks visible in Figure 8 follows, organized block by block.

4.1. Circular Buffer

The acquired samples reach the circular buffer, where they are temporarily stored (see Figure 9). The heart of the buffer is a dual-port memory capable of holding 512 samples. The memory also serves as a clock-domain bridge: it is written from the 10 MHz side and read from the 100 MHz side. The write side is managed as a typical circular buffer: the write address pointer (Wrp) is generated by a modulo-512 counter that increments at every input sample. The position of the write pointer is also needed in the read domain. A clock-domain-crossing logic protects against possible metastable events that can occur when the pointer crosses the clock-domain boundary.

In the read side of the buffer, the read pointer (Rdp) is generated by a modulo-512 counter managed by a specific logic, detailed later in this section. The values of the write and read pointers are monitored to detect a possible overflow of the buffer, which occurs when input samples are written in positions where previous data have not been read yet. In addition, the values of the pointers are used to calculate the number of samples available in the buffer, i.e., the number of samples not yet read but ready to be read. The number of available samples and the overflow information are mapped in the address space of the µC, which reaches them through its bus.

Let us go back to the logic that manages the read pointer. The µC, again through its bus, writes a Read Offset. When this operation occurs, the Rdp is updated as Rdp = Rdp − ROff, where ROff is the Read Offset set by the µC. In addition, the streaming of a 100-sample burst towards the next stage is automatically triggered. At the sampling period of 1/(10 MHz) = 0.1 µs, 100 samples correspond to $T_{b}$ = 10 µs. In other words, the data corresponding to one bit, with the starting point set by the µC, are moved towards the demodulator at the maximum rate of one sample per clock cycle. During the streaming, Rdp is incremented, and the data-valid flag for the next block is generated as needed.

It should be noted that, although the logic that governs this circular buffer resembles that of a First-In-First-Out (FIFO) memory, it is not exactly the same. In particular, while the write side corresponds to the logic of a FIFO, the read side does not. In a FIFO memory, each sample must be read, and read only once. In the proposed architecture, depending on the Read Offset set by the µC, some samples may not be read at all, while others may be read multiple times, as required by the decoder and detailed in Section 2.3.

The memory holds 512 samples, and thus it can buffer about 5 · $T_{b}$ of data.
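The read and write behaviour described in this subsection can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the VHDL implementation: the class and attribute names are hypothetical, absolute counters stand in for the modulo-512 hardware pointers (Wrp and Rdp are their values modulo 512), and the clock-domain crossing is not modelled.

```python
class CircularBuffer:
    """Behavioural model: FIFO-like write side, offset-controlled read side."""
    SIZE = 512   # samples held by the dual-port memory
    BURST = 100  # samples streamed per bit (one T_b at 10 Msps)

    def __init__(self):
        self.mem = [0] * self.SIZE
        self.written = 0      # absolute write count; Wrp = written % SIZE
        self.read_pos = 0     # absolute read index; Rdp = read_pos % SIZE
        self.overflow = False

    @property
    def available(self):
        """Samples written but not yet read."""
        return self.written - self.read_pos

    def write(self, sample):
        self.mem[self.written % self.SIZE] = sample
        self.written += 1
        if self.available > self.SIZE:
            self.overflow = True  # unread data has been overwritten

    def read_burst(self, roff=0):
        """Apply Rdp = Rdp - ROff, then stream BURST consecutive samples."""
        start = self.read_pos - roff
        if start < 0 or self.written - start > self.SIZE:
            raise ValueError("offset points to data no longer in the buffer")
        if self.written - start < self.BURST:
            raise ValueError("not enough samples for a full burst yet")
        burst = [self.mem[(start + i) % self.SIZE] for i in range(self.BURST)]
        self.read_pos = start + self.BURST
        return burst
```

With this sign convention, a positive `roff` re-reads the last samples of the previous burst and a negative one skips samples, which is the non-FIFO behaviour noted above.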

4.2. I/Q Demodulator

The data bursts coming from the circular buffer, each composed of 100 samples, reach two parallel multipliers in the I/Q demodulator, whose design is based on [40] (see Figure 10). The bursts are accompanied by the data-valid flag. The aim of this block is to calculate the I and Q values according to Equation (6). A dual-port look-up memory holds 100 samples of one period of a cosine signal, with 12-bit resolution. The address logic generates two parallel read addresses so that a cosine and a sine signal feed the corresponding multipliers. The cosine signal is obtained by reading the look-up table from address 0, while the sine signal is obtained by starting the reading from address 75, i.e., with a 270° phase offset (75/100 of a period). The full 24-bit dynamics of the multiplier outputs is maintained. These signals feed two parallel 31-bit accumulators. The data-valid signal is used to reset the address logic and to zero the I/Q accumulators before the beginning of each data burst.
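The multiply-and-accumulate scheme with a single cosine table can be sketched as follows. This is an illustrative model only: the names `LUT` and `iq_demodulate` and the table scaling (2047 full scale) are assumptions, and since Equation (6) is not reproduced here, the sign and normalization of Q with respect to Table 2 are not fixed by this sketch.

```python
import math

N = 100  # samples per bit period
# One period of a cosine with 12-bit signed resolution (scaling assumed here).
LUT = [round(2047 * math.cos(2 * math.pi * k / N)) for k in range(N)]

def iq_demodulate(burst):
    """Accumulate burst*cos and burst*sin over one bit period."""
    if len(burst) != N:
        raise ValueError("a burst must contain exactly N samples")
    i_acc = 0
    q_acc = 0
    for n, s in enumerate(burst):
        i_acc += s * LUT[n]             # cosine: table read from address 0
        q_acc += s * LUT[(n + 75) % N]  # sine: same table read from address 75
    return i_acc, q_acc

if __name__ == "__main__":
    symbol_01 = [-1000] * 50 + [1000] * 50  # low then high
    symbol_10 = [1000] * 50 + [-1000] * 50  # high then low
    print(iq_demodulate(symbol_01), iq_demodulate(symbol_10))
```

For an ideal, correctly aligned Manchester symbol the energy falls almost entirely on Q, and the two symbols give Q values of opposite sign, which is what the bit decision relies on.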

4.3. Phase Calculation and Approximation of the Inverse Tangent

This part of the FPGA implements the algorithm for phase calculation through the inverse tangent approximation described in Section 2.4, as reported in Figure 11. The I and Q data present in the accumulators are the starting point that feeds the “Octant” and “Norm” blocks. The former block, as the name suggests, compares I and Q to find the octant as described in Table 4. The “Norm” block prepares the numerator, A, and the denominator, B, for the 24-bit fixed-point divider that follows. I and Q are left-shifted by j positions so that the larger of them in absolute value fills the 24-bit dynamics. Then, the lower value is cut to its 12 MSBs, further scaled by 2¹², and placed in the numerator A. The higher value is cut to its 12 MSBs, filled with 12 zeros on the left, and placed in the denominator B. In this way, the inverse tangent is evaluated when |I| > |Q| and the inverse cotangent otherwise, which always results in an angle not exceeding 45°. The shift has no effect on A/B but allows the exploitation of the full dynamics of the divider; the 2¹² scaling applied to the numerator A is necessary to obtain an integer division. Table 6 summarizes the procedure.

The output of the 24-bit divider is processed by the “Ph” block, which applies the simple operations reported in the last column of Table 4. Finally, the phase is ready to be read by the μC.
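The normalization, integer division, and octant mapping can be modelled in a few lines. The sketch below is illustrative only and rests on several assumptions: the slope `M_DEG` = 45° (i.e., atan(x) ≈ 45°·x on [0, 1]) is a placeholder for the coefficient m defined in Section 2.4, the function names are hypothetical, a zero input is treated as positive, and a right shift is used when an input already exceeds 24 bits. The octant rules are written out from the geometry of the four-quadrant arctangent, in the spirit of Table 4.

```python
import math

M_DEG = 45.0  # assumed slope m of the linear approximation, in degrees

def normalized_ratio(i_acc, q_acc):
    """Return min(|I|,|Q|) / max(|I|,|Q|) as an integer scaled by 2**12."""
    lo, hi = sorted((abs(i_acc), abs(q_acc)))
    if hi == 0:
        return 0
    j = 24 - hi.bit_length()  # shift that makes the larger value fill 24 bits
    if j >= 0:
        lo, hi = lo << j, hi << j
    else:
        lo, hi = lo >> -j, hi >> -j
    a = (lo >> 12) << 12      # numerator: 12 MSBs of the smaller value, times 2**12
    b = hi >> 12              # denominator: 12 MSBs of the larger value
    return a // b             # integer quotient in 0..4096

def phase_deg(i_acc, q_acc):
    """Approximate four-quadrant phase in degrees, selected octant by octant."""
    x = M_DEG * normalized_ratio(i_acc, q_acc) / 4096.0
    i_larger = abs(i_acc) > abs(q_acc)
    if i_acc >= 0 and q_acc >= 0:
        return x if i_larger else 90.0 - x
    if i_acc < 0 and q_acc >= 0:
        return 90.0 + x if abs(i_acc) < abs(q_acc) else 180.0 - x
    if i_acc < 0 and q_acc < 0:
        return -180.0 + x if abs(i_acc) >= abs(q_acc) else -90.0 - x
    return -90.0 + x if abs(i_acc) <= abs(q_acc) else -x

if __name__ == "__main__":
    for deg in (10, 80, 100, 170, -170, -100, -80, -10):
        i = round(1e6 * math.cos(math.radians(deg)))
        q = round(1e6 * math.sin(math.radians(deg)))
        print(deg, round(phase_deg(i, q), 2))
```

With this placeholder slope the error with respect to the exact arctangent stays within roughly 4°; the approximations actually used in the paper (Figure 4) may behave differently.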

4.4. Bit Decision, Synchronization, and Managing

The μC reads the phase. From its value, it decides the value of the bit (see Table 3) and calculates the offset correction l according to (8) to maintain the synchronization. The μC finally writes the offset l in the circular buffer, which triggers the evaluation of the next bit. The detected bit is stored in memory and is available for further processing or passed directly to the final application. The μC can use the remaining time (see Section 5.1) for other tasks, such as searching for data preambles, data unpacking, CRC checks, etc.
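The decision step performed by the μC can be sketched as below. The first function transcribes Table 3 directly. The second is only a hypothetical stand-in for Equation (8), which is not reproduced in this section: the conversion from degrees to samples (N samples per 360°), the clamp to a percentage of the bit length, and the sign convention are assumptions made for illustration, and the function names are invented here.

```python
def decide_bit(phase_deg):
    """Table 3: return (bit, estimated phase error in degrees)."""
    if -180.0 <= phase_deg < 0.0:
        return 1, phase_deg + 90.0
    return 0, 90.0 - phase_deg

def offset_samples(phase_err_deg, n=100, z_percent=10.0):
    """Hypothetical stand-in for Equation (8): phase error -> clamped sample offset."""
    limit = n * z_percent / 100.0          # assumed saturation, in samples
    raw = n * phase_err_deg / 360.0        # assumed degrees-to-samples conversion
    return int(round(max(-limit, min(limit, raw))))

if __name__ == "__main__":
    bit, err = decide_bit(-80.0)
    print(bit, err, offset_samples(err))
```

How the resulting offset maps onto the Read Offset written to the circular buffer depends on the sign convention of Equation (8), which this sketch does not attempt to reproduce.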

4.5. Mathematical Noise

The mathematical processing in the receiver is implemented in fixed-point representation. This representation, very convenient in FPGA, inevitably involves approximations. In addition, the inverse tangent function was further approximated through linear functions. In this section we describe the tests planned to investigate if these approximations impact the performance of the receiver.

The real mathematics (RM), as implemented in the FPGA, was duplicated in MATLAB® R2024a. The same test reported in Section 3 was repeated with RM and compared to the reference results obtained with ideal mathematics (IM), i.e., the 64-bit floating-point format.

BERs are reported for $Z_{%}$ of 6%, 10%, and 20% in Figure 12, where the dashed and continuous curves describe the results calculated with IM and RM, respectively. In all cases, the curves are very similar. In some parts of the graph, the results obtained with RM seem even better, but this is attributed to statistical fluctuations related to the random nature of the payloads and of the noise added in the generation of every signal.

5. Experiments and Results

5.1. Resources and Timings

The algorithm described so far was coded in VHDL [41] and implemented in a VLC research system realized in-house [30]. The system includes a 10M50DAF486C6 FPGA from Intel (Santa Clara, CA, USA) together with all the electronics for transmitting and receiving VLC data. The system is managed through a MATLAB® user interface that runs on a host PC. A framework present in the FPGA supports the user in the implementation of new applications, such as the one proposed here. More details about the VLC system and its use can be found in [30,42].

The resources employed by the proposed decoder once integrated in the FPGA are listed in Table 7. The receiver requires about 7% of the Advanced Logic Modules (ALMs), 3.5% of the Digital Signal Processors (DSPs), and 1.3% of the memory available in the target FPGA. The logic that requires most of the resources is the NIOS II soft processor. However, it should be noted that the processor can be used for other tasks as well.

The receiver was compiled with a target clock of 100 MHz, and the compilation software confirmed the timing closure with a large margin.

Before proceeding to the field experiments, we checked with the on-board emulator that the timings of the system tasks were consistent. Every bit on air lasts for $T_{b}$, which corresponds to 1000 clock cycles at the 100 MHz clock. Thus, every bit must be processed in at most 1000 clock cycles on average. Thanks to the 5-bit capacity of the input circular buffer, an occasional bit can take more time, as long as the 1000-cycle limit is maintained on average.
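As a quick check of this budget, using only the figures quoted in the text ($T_{b}$ = 10 µs, a 100 MHz processing clock, a 512-sample buffer, and 100 samples per bit), the cycle count and the buffer depth work out as follows. The symbols $N_{\mathrm{cycles}}$, $f_{\mathrm{clk}}$, and $N_{\mathrm{buf}}$ are introduced here only for illustration.

```latex
\[
\begin{aligned}
N_{\mathrm{cycles}} &= T_{b} \cdot f_{\mathrm{clk}} = 10\,\mu\mathrm{s} \times 100\,\mathrm{MHz} = 1000 \ \text{cycles per bit},\\
N_{\mathrm{buf}} &= \frac{512\ \text{samples}}{100\ \text{samples/bit}} \approx 5.1\ \text{bits}.
\end{aligned}
\]
```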

Figure 13 shows the timing of the processing, checked in the FPGA working in real time. Whenever a bit is present in the circular buffer, the NIOS II commands the start of the I/Q demodulation, which terminates in 100 clock cycles: one cycle per sample. At this point, I and Q are present in the accumulators and ready for the phase calculation, which takes 20 cycles. The NIOS II gets the phase, makes the decision on the bit, and calculates the phase correction for reading the next bit. This information is written to the circular buffer, which in the meantime has continued to store input samples. The NIOS II then waits for the next 100 samples from the new offset to be ready in the buffer, and the processing starts again. On average, the processor has about 650 free cycles out of every 1000, which can be used for managing the data packaging, searching for the data preambles, CRC checking, etc. The µC may thus appear lightly loaded; however, its calculation power would never be enough to support the calculation of the phase in real time, which, for this reason, is performed by the specialized hardware described in the previous parts of this work.

Finally, we note that the latency of the processing is very low: less than $T_{b}$.

5.2. Experimental Set-Up

The experiments were performed by employing two of the VLC systems described in [30], each composed of an in-house frontend connected to a commercial FPGA development board: one system acted as the transmitter, the other as the receiver. The transmitter implemented in the FPGA was a Manchester coder designed according to the IEEE 802.15.7 standard. It was connected to the lamp model 17508 produced by Philips N.V. (Amsterdam, The Netherlands): a 16 W, 9-LED lamp certified for automotive applications. The receiver, whose FPGA was completed with the described Manchester decoder, was connected to the photodetector PDAPC2 produced by Thorlabs Inc. (Newton, NJ, USA). The two systems ran on independent clock oscillators, which produced an unavoidable frequency shift between the TX and the RX. A photo of the two VLC systems connected to the lamp and the photodetector is reported in Figure 14.

The lamp and the receiver were mounted on tripods and placed 4 m apart. The transmitter was loaded with data packets of 2 Mb of random bits. The transmission of a sequence of ‘0’ bits preceded the transmission of each 2 Mb data packet. This sequence was used by the receiver to achieve synchronization. The receiver saved in its memory the decoded bits and the IQ data, used later for debugging and analysis. These data were downloaded to the PC after the end of the communication.

Several tests were conducted by varying the SNR at the receiver. At the receiver, the main noise source is the electronic noise of the photodetector and of the first amplification stage, which is independent of the input signal and constant over the experiments. Given this constant input noise, the SNR was varied by reducing the level of the signal captured from the transmitter.

5.3. Experimental Results

Twelve transmissions were carried out with different SNRs at the receiver. The received bits and IQ data were saved in every test. The BER was calculated by comparing the received packet to the transmitted data. Results are shown in Figure 15. On the abscissa, the received signal power S is reported in dB after being normalized with respect to the maximum dynamics of the analog-to-digital converter of the system. We measured no errors as long as the input signal was higher than −30 dB. For lower intensities, the BER rose and reached the 10⁻³ threshold for S of −47 dB.

For a better understanding of the results, Figure 15 also reports the constellations measured in four selected cases. The constellations are shown in the IQ plane by overlapping 200 IQ values. Please note that the axis scale changes among the four panels. The unity on the axes again represents the maximum dynamics of the receiver. As noted before, the noise is mainly due to the electrical noise of the analog section of the receiver, and remains constant. Thus, the diameter of the circular clouds, formed by the distribution of the received IQ points, remains basically unchanged. When the power of the signal, which in the constellation is related to the distance of each point from the origin, falls below −30 dB, the decision given by the sign of Q starts to generate errors.
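The BER figure used throughout these experiments is simply the fraction of decoded bits that differ from the transmitted ones. A minimal sketch of that comparison follows; the function name is an assumption made here, and the packets are taken to be already aligned bit by bit.

```python
def bit_error_rate(received, transmitted):
    """Fraction of positions where the received bit differs from the transmitted one."""
    if len(received) != len(transmitted):
        raise ValueError("packets must have the same length")
    if not transmitted:
        raise ValueError("packets must not be empty")
    errors = sum(r != t for r, t in zip(received, transmitted))
    return errors / len(transmitted)

if __name__ == "__main__":
    print(bit_error_rate([0, 1, 1, 0], [0, 1, 0, 0]))
```

With a packet of 2 × 10⁶ bits, a single error corresponds to a BER of 5 × 10⁻⁷, which is the smallest non-zero value one such transmission can resolve.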

6. Discussion and Conclusions

In this work, the architecture of a high-performance Manchester receiver, specifically designed for FPGA implementation, was described and tested. The receiver was implemented in a research VLC system, and the expected results were verified through simulations and experiments. Thanks to the 12-bit ADC in the front-end and the IQ demodulation, the receiver achieves a high noise immunity, boosting the performance of the application it is embedded in.

This approach is relatively calculation-demanding: the demodulation for the IQ calculation alone requires 200 multiplications and as many additions in every $T_{b}$, i.e., 400 operations every 10 µs, or 40 M operations per second. A real-life deployment of the algorithm, for example in the automotive context, requires additional safety features for the implementation of the Automotive Safety Integrity Level (ASIL), which entail even further calculations and hardware complexity. Even if the required calculation intensity is hardly achievable through a processor and far from the reach of a μC, it can be easily supported by only two DSP blocks of the FPGA (see Table 7), leaving ample room for the implementation of security protocols or other algorithms.

On the other hand, the calculation effort of the proposed approach is similar to that of other complex algorithms successfully employed in widespread commercial applications, where the use of an FPGA can be problematic. A notable example is the Global Positioning System (GPS) receiver [43]. The challenge is solved by the design of Application-Specific Integrated Circuits (ASICs), which include all the dedicated processing and, in high-volume applications, are produced for a few cents (see Table 1). In other high-end fields of application, FPGAs are already present, for example where Manchester-coded avionics buses such as MIL-STD-1553 [44] or ARINC 659 [45] are used. These buses are already managed by FPGAs, and the transition to VLC is being considered [46].

Author Contributions

Conceptualization and methodology, S.R. and S.C.; firmware, S.R.; validation, S.C.; formal analysis, L.M.; investigation, S.R. and S.C.; resources, L.M.; writing—original draft preparation, S.R.; writing—review and editing, L.M. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figures and Tables

Figure 1. The “Sync.” block cuts the signal in slices of N samples; the “Proc.” block processes the slices for detecting the bits, and produces a feedback to Sync. which maintains the correct synchronism with respect to the bit boundaries.

Figure 2. The receiver tentatively places the borders of the first bit in the position delimited by the red-dashed segments. The phase error with respect to the true bit is calculated and used to correct the border positions (magenta dashed lines) for the 2nd bit.

Figure 3. Relation between the position of the rising edge with respect to the tentative bit start (phase error $φ_{e}$) and the phase $φ$ calculated through (7). Blue and magenta curves refer to the bit values of ‘0’ and ‘1’, respectively. A selection of waveforms is reported on top (bit ‘1’) and bottom (bit ‘0’) of the graph in the position of the phase error they represent.

Figure 4. Top: Comparison between inverse tangent (red-dashed curve) and proposed approximations (blue and magenta curves) in the first octant, where the atan argument x ranges from 0 to 1. Bottom: Error of proposed approximations.

Figure 5. BER measured when a data packet of 1 Mbit affected by white noise from 0 to −25 dB is received by applying different $Z_{%}$.

Figure 6. BER measured by receiving a data packet of 1 Mbit when a frequency error of 1% (a) and noise jitter (b) are present. SNR is reported on the x-axis; different curves refer to receptions obtained with different saturation levels $Z_{%}$.

Figure 7. (a) Manchester reference decoder. (b) BER comparison between the proposed (blue continuous curve) and the reference decoder (red-dashed curve).

Figure 8. “Bird’s eye” view of the decoder architecture implemented in FPGA.

Figure 9. Acquired data are temporarily stored in the circular buffer. They are written from the 10 MHz side, and read from the 100 MHz domain in blocks of 100 samples that represent the data covering one $T_{b}$.

Figure 10. I/Q demodulator.

Figure 11. Phase calculation in FPGA.

Figure 12. Comparison of BERs measured by a receiver working with ideal mathematics (IM, dashed curves) and real mathematics (RM, continuous curves). Tests are performed with SNR from 0 to −25 dB and $Z_{%}$ of 6%, 10%, and 20%.

Figure 13. Temporal budget of the processing in the receiver. Every bit must be processed in 1000 clock cycles on average. Small fluctuations are tolerated thanks to the circular buffer in input.

Figure 14. In the experiments, we employed two VLC systems: one acts as TX (left) and is connected to the automotive lamp, the other as RX (right) and is connected to the PDAPC2 photodetector.

Figure 15. BER measured at different levels of input signal S. Input signal is normalized with respect to receiver dynamics. Red circles report experimental measurements. Received constellations are shown in four selected cases.

Table 2

Values of I, Q, and phase for an ideal Manchester signal of ±A levels.

BIT	Symbols	I	Q	Phase
0	01	0	+2A	$+ 90 °$
1	10	0	−2A	$- 90 °$

Table 3

Bit decision and phase error derived from calculated phase.

Calculated Phase $φ$	Bit Decision	Error Estimation $φ_{e}$
−180° ≤ $φ$ < 0°	1	$φ + 90 °$
0° ≤ $φ$ < 180°	0	$90 ° - φ$

Table 4

Calculation of the approximated 4-quadrant atan.

Sign(I)	Sign(Q)	\|I\| vs. \|Q\|	Octant	Angle	$φ$
+	+	\|I\| > \|Q\|	0	$0 °$ ≤ φ < $45 °$	$0 ° + m$ · \|Q\|/\|I\|
+	+	\|I\| ≤ \|Q\|	1	$45 °$ ≤ φ < $90 °$	$90 ° - m$ · \|I\|/\|Q\|
-	+	\|I\| < \|Q\|	2	$90 °$ ≤ φ < $135 °$	$90 ° + m$ · \|I\|/\|Q\|
-	+	\|I\| ≥ \|Q\|	3	$135 °$ ≤ φ < $180 °$	$180 ° - m$ · \|Q\|/\|I\|
-	-	\|I\| ≥ \|Q\|	4	$- 180 °$ ≤ φ < $- 135 °$	$- 180 ° + m$ · \|Q\|/\|I\|
-	-	\|I\| < \|Q\|	5	$- 135 °$ ≤ φ < $- 90 °$	$- 90 ° - m$ · \|I\|/\|Q\|
+	-	\|I\| ≤ \|Q\|	6	$- 90 °$ ≤ φ < $- 45 °$	$- 90 ° + m$ · \|I\|/\|Q\|
+	-	\|I\| > \|Q\|	7	$- 45 °$ ≤ φ < $0 °$	$0 ° - m$ · \|Q\|/\|I\|

Table 5

Parameters employed in simulations.

Parameter	Value
Sync seq.	100 bits
Payload	10⁶ bits
$T_{b}$	10 μs
$f_{c}$	10 Msps
N	100
SNR	0 to −25 dB, step 1 dB
$Z_{%}$	2%, 4%, 6%, 8%, 10%, 20%, 100%

Table 6

Preparation of numerator and denominator of division.

Sign (I)	Sign (Q)	$\|I\|$ vs. $\|Q\|$	Numerator (A)	Denominator (B)
+	+	$\|I\|$ ≤ $\|Q\|$	$2^{j} \cdot \|I\|$ · 2¹²	$2^{j} \cdot \|Q\|$
-	+	$\|I\|$ < $\|Q\|$	$2^{j} \cdot \|I\|$ · 2¹²	$2^{j} \cdot \|Q\|$
-	+	$\|I\|$ ≥ $\|Q\|$	$2^{j} \cdot \|Q\|$ · 2¹²	$2^{j} \cdot \|I\|$
-	-	$\|I\|$ < $\|Q\|$	$2^{j} \cdot \|I\|$ · 2¹²	$2^{j} \cdot \|Q\|$
-	-	$\|I\|$ ≥ $\|Q\|$	$2^{j} \cdot \|Q\|$ · 2¹²	$2^{j} \cdot \|I\|$
+	-	$\|I\|$ ≤ $\|Q\|$	$2^{j} \cdot \|I\|$ · 2¹²	$2^{j} \cdot \|Q\|$
+	-	$\|I\|$ > $\|Q\|$	$2^{j} \cdot \|Q\|$ · 2¹²	$2^{j} \cdot \|I\|$
+	+	$\|I\|$ > $\|Q\|$	$2^{j} \cdot \|Q\|$ · 2¹²	$2^{j} \cdot \|I\|$

Table 7

FPGA resources required by receiver.

Entity	ALM	DSP (18 × 18)	Memory (bit)
Circular Buffer	143	0	8192
I/Q Demodulator	172	2	3072
Phase Calc.	686	0	0
NIOSII Processor	2133	3	10,762
Interconnect	321	0	0
TOT	3455	5	22,026

References

1. Khan, L.U. Visible Light Communication: Applications, Architecture, Standardization and Research Challenges. Digit. Commun. Netw.; 2017; 3, pp. 78-88. [DOI: https://dx.doi.org/10.1016/j.dcan.2016.07.004]

2. Rehman, S.; Ullah, S.; Chong, P.; Yongchareon, S.; Komosny, D. Visible Light Communication: A System Perspective—Overview and Challenges. Sensors; 2019; 19, 1153. [DOI: https://dx.doi.org/10.3390/s19051153]

3. Shen, W.H.; Tsai, H.M. Testing vehicle-to-vehicle visible light communications in real-world driving scenarios. Proceedings of the 2017 IEEE Vehicular Networking Conference (VNC); Torino, Italy, 27–29 November 2017; [DOI: https://dx.doi.org/10.1109/VNC.2017.8275596]

4. Wang, Y.; Chen, X.; Xu, Y. Transmitter for 1.9 Gbps phosphor white light visible light communication without a blue filter based on OOK-NRZ modulation. Opt. Express; 2023; 31, pp. 7933-7946. [DOI: https://dx.doi.org/10.1364/OE.476911] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36859914]

5. Cossu, G.; Khalid, A.M.; Choudhury, P.; Corsini, R.; Ciaramella, E. 3.4 Gbit/s Visible Optical Wireless Transmission Based on RGB LED. Opt. Express; 2012; 20, B501. [DOI: https://dx.doi.org/10.1364/OE.20.00B501] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/23262894]

6. Elamassie, M.; Karbalayghareh, M.; Miramirkhani, F.; Kizilirmak, R.C.; Uysal, M. Effect of fog and rain on the performance of vehicular visible light communications. Proceedings of the 2018 IEEE 87th Vehicular Technology Conference (VTC Spring); Porto, Portugal, 3–6 June 2018; [DOI: https://dx.doi.org/10.1109/VTCSpring.2018.8417738]

7. Fuada, S.; Pradana, A.; Adiono, T.; Popoola, W.O. Demonstrating a real–time QAM–16 visible light communications utilizing off-the-shelf hardware. Results Opt.; 2023; 10, 100348. [DOI: https://dx.doi.org/10.1016/j.rio.2022.100348]

8. Gagliardi, R.M.; Karp, S. Optical Communications. Wiley Series in Telecommunications and Signal Processing; 2nd ed. Wiley: New York, NY, USA, 1995; ISBN 978-0-471-54287-2

9. Ricci, S.; Caputo, S. Transmitter for Visible Light Communications Based on FPGA’s Output Buffers. IEEE Commun. Lett.; 2024; 28, pp. 2116-2120. [DOI: https://dx.doi.org/10.1109/LCOMM.2024.3430393]

10. Nawaz, T.; Seminara, M.; Caputo, S.; Mucchi, L.; Cataliotti, F.S.; Catani, J. IEEE 802.15.7-Compliant Ultra-Low Latency Relaying VLC System for Safety-Critical ITS. IEEE Trans. Veh. Technol.; 2019; 68, pp. 12040-12051. [DOI: https://dx.doi.org/10.1109/TVT.2019.2948041]

11. Li, H.; Chen, X.; Guo, J.; Chen, H. A 550 Mbit/s Real-Time Visible Light Communication System Based on Phosphorescent White Light LED for Practical High-Speed Low-Complexity Application. Opt. Express; 2014; 22, 27203. [DOI: https://dx.doi.org/10.1364/OE.22.027203] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/25401871]

12. Forster, R. Manchester encoding: Opposing definitions resolved. Eng. Sci. Educ. J.; 2000; 9, pp. 278-280. [DOI: https://dx.doi.org/10.1049/esej:20000609]

13. IEEE 802.15.7-2018 IEEE Standard for Local and Metropolitan Area Networks—Part 15.7: Short-Range Optical Wireless Communications; IEEE Standards Association: Piscataway, NJ, USA, 2019; Available online: https://standards.ieee.org/ieee/802.15.7/6820/ (accessed on 6 December 2024).

14. Cui, Z.; Yue, P.; Yi, X.; Li, J. Research on non-uniform dynamic vehicle-mounted VLC with receiver spatial and angular diversity. Proceedings of the ICC 2019—2019 IEEE International Conference on Communications (ICC); Shanghai, China, 20–24 May 2019; [DOI: https://dx.doi.org/10.1109/ICC.2019.8761686]

15. Caputo, S.; Ricci, S.; Mucchi, L. IEEE 802.15.7-Compliant Full Duplex Visible Light Communication: Interference Analysis and Experimentation. IEEE Open J. Veh. Technol.; 2024; 5, pp. 1242-1255. [DOI: https://dx.doi.org/10.1109/OJVT.2024.3449144]

16. Razavi, B. Challenges in the design of high-speed clock and data recovery circuits. IEEE Commun. Mag.; 2002; 40, pp. 94-101. [DOI: https://dx.doi.org/10.1109/MCOM.2002.1024421]

17. Ahmed, S.I.; Kwasniewski, T.A. Overview of oversampling clock and data recovery circuits. Proceedings of the Canadian Conference on Electrical and Computer Engineering; Saskatoon, SK, Canada, 1–4 May 2005; pp. 1876-1881. [DOI: https://dx.doi.org/10.1109/CCECE.2005.1557348]

18. Moon, Y.H.; Kang, J.K. 2× oversampling 2.5 Gbps clock and data recovery with phase picking method. Curr. Appl. Phys.; 2004; 4, pp. 75-81. [DOI: https://dx.doi.org/10.1016/j.cap.2003.09.016]

19. Bartley, T.; Tanaka, S.; Nonomura, Y.; Nakayama, T.; Muroyama, M. Delay window blind oversampling clock and data recovery algorithm with wide tracking range. Proceedings of the IEEE International Symposium on Circuits and Systems (ISCAS); Lisbon, Portugal, 24–27 May 2015; pp. 1598-1601. [DOI: https://dx.doi.org/10.1109/ISCAS.2015.7168954]

20. Kubicek, M.; Kolka, Z. Blind Oversampling Data Recovery with Low Hardware Complexity. Radioengineering; 2010; 19, pp. 74-78.

21. Kolka, Z.; Kubicek, M.; Biolek, D.; Biolkova, V. Optimization of oversampling Data Recovery. Proceedings of the 52nd IEEE International Midwest Symposium on Circuits and Systems; Cancun, Mexico, 2–5 August 2009; pp. 467-470. [DOI: https://dx.doi.org/10.1109/MWSCAS.2009.5236055]

22. Wang, C.C.; Lee, C.L.; Hsiao, C.Y.; Huang, J.F. Clock-and-Data Recovery Design for LVDS Transceiver Used in LCD Panels. IEEE Trans. Circuits Syst. II Express Briefs; 2006; 53, pp. 1318-1322. [DOI: https://dx.doi.org/10.1109/TCSII.2006.881812]

23. Vijayalakshmi, S.; Paramasivam, A.; Nagarajan, V.; Kudiyarasan, S.; Kamatchi, S.; Hasheetha, J. Design and performance analysis of manchester coder-based body channel communication using FPGA. E-Prime—Adv. Electr. Eng. Electron. Energy; 2024; 9, 100660. [DOI: https://dx.doi.org/10.1016/j.prime.2024.100660]

24. Shi, J.; Xu, Y.; Shi, J. Manchester encoder and decoder based on CPLD. Proceedings of the IEEE International Conference on Industrial Technology; Chengdu, China, 21–24 April 2008; pp. 1-3. [DOI: https://dx.doi.org/10.1109/ICIT.2008.4608523]

25. Zuo, Y.; Yang, J.; Cheng, X. Design and implementation of Manchester CODEC based on FPGA. Appl. Mech. Mater.; 2013; 273, pp. 805-809. [DOI: https://dx.doi.org/10.4028/www.scientific.net/AMM.273.805]

26. Kim, J.; Jeong, D.K. Multi-gigabit-rate clock and data recovery based on blind oversampling. IEEE Commun. Mag.; 2003; 41, pp. 68-74. [DOI: https://dx.doi.org/10.1109/MCOM.2003.1252801]

27. Yoo, J.H.; Jang, J.S.; Kwon, J.K.; Kim, H.C.; Song, D.W.; Jung, S.Y. Demonstration of vehicular visible light communication based on LED headlamp. Int. J. Automot. Technol.; 2016; 17, pp. 347-352. [DOI: https://dx.doi.org/10.1007/s12239-016-0035-8]

28. Eldeeb, H.B.; Elamassie, M.; Sait, S.M.; Uysal, M. Infrastructure-to-Vehicle Visible Light Communications: Channel Modelling and Performance Analysis. IEEE Trans. Veh. Technol.; 2022; 71, pp. 2240-2250. [DOI: https://dx.doi.org/10.1109/TVT.2022.3142991]

29. Berber, S. Discrete Bandpass Modulation Methods. Discrete Communication Systems; Oxford University Press: Oxford, UK, 2021; pp. 305-385. [DOI: https://dx.doi.org/10.1093/oso/9780198860792.003.0007]

30. Ricci, S.; Caputo, S.; Mucchi, L. FPGA-based visible light communications instrument for implementation and testing of ultralow latency applications. IEEE Trans. Instrum. Meas.; 2023; 72, 2004811. [DOI: https://dx.doi.org/10.1109/TIM.2023.3280520]

31. Almeida, A.J.; Silva, N.A.; Muga, N.J.; André, P.S.; Pinto, A.N. Calculation of the number of bits required for the estimation of the bit error ratio. Proceedings of the Second International Conference on Applications of Optics And Photonics; Aveiro, Portugal, 26–30 May 2014; [DOI: https://dx.doi.org/10.1117/12.2063640]

32. Mengali, U.; D’Andrea, A.N. Synchronization Techniques for Digital Receivers; Springer Science + Business Media: New York, NY, USA, 1997; [DOI: https://dx.doi.org/10.1007/978-1-4899-1807-9]

33. Cooley, J.; Lewis, P.; Welch, P. The finite Fourier transform. IEEE Trans. Audio Electroacoust.; 1969; 17, pp. 77-85. [DOI: https://dx.doi.org/10.1109/TAU.1969.1162036]

34. Rajan, S.; Wang, S.; Inkol, R.; Joyal, A. Efficient approximations for the arctangent function. IEEE Signal Process. Mag.; 2006; 23, pp. 108-111. [DOI: https://dx.doi.org/10.1109/MSP.2006.1628884]

35. Pilato, L.; Fanucci, L.; Saponara, S. Real-Time and High-Accuracy Arctangent Computation Using CORDIC and Fast Magnitude Estimation. Electronics; 2017; 6, 22. [DOI: https://dx.doi.org/10.3390/electronics6010022]

36. Benammar, M.; Alassi, A.; Gastli, A.; Ben-Brahim, L.; Touati, F. New Fast Arctangent Approximation Algorithm for Generic Real-Time Embedded Applications. Sensors; 2019; 19, 5148. [DOI: https://dx.doi.org/10.3390/s19235148] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31775303]

37. Gutierrez, R.; Torres, V.; Valls, J. FPGA-implementation of atan(Y/X) based on logarithmic transformation and LUT-based techniques. J. Syst. Archit.; 2010; 56, pp. 588-596. [DOI: https://dx.doi.org/10.1016/j.sysarc.2010.07.013]

38. Zeng, Z.; Zhang, L.; Gong, L.; Zhang, N. A Fast Lock-In Time, Capacitive FIR-Filter-Based Clock Multiplier with Input Clock Jitter Reduction. Electronics; 2023; 12, 1439. [DOI: https://dx.doi.org/10.3390/electronics12061439]

39. Russo, D.; Ricci, S. FPGA Implementation of a Synchronization Circuit for Arbitrary Trigger Sequences. IEEE Trans. Instrum. Meas.; 2020; 69, pp. 5251-5259. [DOI: https://dx.doi.org/10.1109/TIM.2019.2952478]

40. Ricci, S.; Meacci, V. Data-Adaptive Coherent Demodulator for High Dynamics Pulse-Wave Ultrasound Applications. Electronics; 2018; 7, 434. [DOI: https://dx.doi.org/10.3390/electronics7120434]

41. Dally, W.J.; Harting, R.C.; Aamodt, T.M. Digital Design Using VHDL: A Systems Approach; Cambridge University Press: Cambridge, UK, 2015; ISBN 978-1107098862

42. Ricci, S.; Caputo, S.; Mucchi, L. FPGA-Based Pulse Compressor for Ultra Low Latency Visible Light Communications. Electronics; 2023; 12, 364. [DOI: https://dx.doi.org/10.3390/electronics12020364]

43. Kaplan, E.; Hegarty, C. Understanding GPS/GNSS: Principles and Applications; 3rd ed.; Artech House Publishers: Norwood, MA, USA, 2017; ISBN 978-1630810580

44. Jiang, S.; Liu, S.; Guo, C.; Fan, X.; Ma, T.; Tiwari, P. Implementation of ARINC 659 bus controller for space-borne computers. Electronics; 2019; 8, 435. [DOI: https://dx.doi.org/10.3390/electronics8040435]

45. Pendyala, P.; Pasupureddi, V.S.R. 100-Mb/s enhanced data rate MIL-STD-1553B controller in 65-nm CMOS technology. IEEE Trans. Aerosp. Electron. Syst.; 2016; 52, pp. 2917-2929. [DOI: https://dx.doi.org/10.1109/TAES.2016.150564]

46. Karafolas, N.; Armengol, J.M.P.; Mckenzie, I. Introducing photonics in spacecraft engineering: ESA’s strategic approach. Proceedings of the IEEE Aerospace Conference 2009; Big Sky, MT, USA, 7–14 March 2009; pp. 1-15. [DOI: https://dx.doi.org/10.1109/AERO.2009.4839438]


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Visible Light Communication (VLC) is a transmission technique in which data are sent by modulating light intensity. Manchester On–Off Keying (OOK) is among the most widely used modulation techniques in VLC and is specified by the IEEE 802.15.7 standard for wireless networks. Various Manchester decoder schemes are documented in the literature, often relying on minimal two-level analog-to-digital converters followed by simple digital logic. These methods often trade performance for simplicity. However, VLC applications in fields such as automotive and aerospace require the best possible bit error rate (BER) for a given Signal-to-Noise Ratio (SNR), together with a real-time, low-latency implementation. In this work, we introduce a high-performance Manchester decoder and detail its implementation in a Field Programmable Gate Array (FPGA). The decoder acquires a fully resolved signal (12-bit resolution) and calculates the phase of the transmitted bit. Additionally, the proposed decoder achieves and maintains synchronization with the incoming signal, tolerating frequency shifts and jitter up to 1%. The Manchester decoder was tested in a VLC system with automotive-certified headlamps, realizing an IEEE 802.15.7-compliant link at 100 kb/s. The proposed decoder ensures a BER below 10⁻² for SNR > −12 dB and, compared to a standard decoder, achieves the same BER with an input SNR that is 10 dB lower.

Details

Title

FPGA-Based Manchester Decoder for IEEE 802.15.7 Visible Light Communications

Author

Ricci, Stefano; Caputo, Stefano; Mucchi, Lorenzo

Publication year

2025

Publication date

2025

Publisher

MDPI AG

e-ISSN

20799292

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3390/electronics14010096

ProQuest document ID

3153798663

Sign(I)	Sign(Q)	$|I|$ vs. $|Q|$	Octant	Angle range	Approximation of $\varphi$
+	+	$|I| > |Q|$	0	$0^\circ \le \varphi < 45^\circ$	$0^\circ + m \cdot |Q|/|I|$
+	+	$|I| \le |Q|$	1	$45^\circ \le \varphi < 90^\circ$	$90^\circ - m \cdot |I|/|Q|$
-	+	$|I| < |Q|$	2	$90^\circ \le \varphi < 135^\circ$	$90^\circ + m \cdot |I|/|Q|$
-	+	$|I| \ge |Q|$	3	$135^\circ \le \varphi < 180^\circ$	$180^\circ - m \cdot |Q|/|I|$
-	-	$|I| \ge |Q|$	4	$-180^\circ \le \varphi < -135^\circ$	$-180^\circ + m \cdot |Q|/|I|$
-	-	$|I| < |Q|$	5	$-135^\circ \le \varphi < -90^\circ$	$-90^\circ - m \cdot |I|/|Q|$
+	-	$|I| \le |Q|$	6	$-90^\circ \le \varphi < -45^\circ$	$-90^\circ + m \cdot |I|/|Q|$
+	-	$|I| > |Q|$	7	$-45^\circ \le \varphi < 0^\circ$	$0^\circ - m \cdot |Q|/|I|$
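
As an illustration of how an octant table of this kind can be evaluated, the following Python sketch selects the octant from the signs and magnitudes of I and Q and applies the corresponding linear formula. It is a floating-point software model, not the FPGA implementation. The function name `approx_phase_deg` and the default slope m = 45° are assumptions made for this example, since the table leaves m symbolic. In the third quadrant (I < 0, Q < 0) each branch is paired with the angle range that the four-quadrant arctangent actually yields there.

```python
import math

M_DEG = 45.0  # assumed slope m in degrees; the table leaves m symbolic


def approx_phase_deg(i, q, m=M_DEG):
    """Octant-based linear approximation of atan2(q, i), in degrees."""
    ai, aq = abs(i), abs(q)
    if ai == 0 and aq == 0:
        return 0.0  # phase is undefined at the origin; 0 is an arbitrary choice

    i_pos, q_pos = i >= 0, q >= 0
    # True when the vector lies closer to the I axis than to the Q axis.
    # The strict/non-strict comparison follows the table's boundary rules.
    near_i_axis = ai > aq if i_pos else ai >= aq
    # The ratio always has the smaller magnitude on top, so 0 <= ratio <= 1.
    ratio = aq / ai if near_i_axis else ai / aq

    if i_pos and q_pos:            # octants 0 and 1
        return m * ratio if near_i_axis else 90.0 - m * ratio
    if not i_pos and q_pos:        # octants 3 and 2
        return 180.0 - m * ratio if near_i_axis else 90.0 + m * ratio
    if not i_pos and not q_pos:    # octants 4 and 5
        return -180.0 + m * ratio if near_i_axis else -90.0 - m * ratio
    # i >= 0, q < 0: octants 7 and 6
    return -m * ratio if near_i_axis else -90.0 + m * ratio


if __name__ == "__main__":
    for i, q in [(1.0, 0.5), (-0.5, 1.0), (-1.0, -0.5), (0.5, -1.0)]:
        exact = math.degrees(math.atan2(q, i))
        print(f"I={i:+.1f} Q={q:+.1f} approx={approx_phase_deg(i, q):8.2f} exact={exact:8.2f}")
```

With the assumed m = 45°, the approximation is exact at the octant boundaries and deviates from the true arctangent by roughly 4° at worst inside an octant. A different slope or a higher-order correction would change that figure.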

Sign(I)	Sign(Q)	$|I|$ vs. $|Q|$	Numerator (A)	Denominator (B)
+	+	$|I| \le |Q|$	$2^{j} \cdot |I| \cdot 2^{12}$	$2^{j} \cdot |Q|$
-	+	$|I| < |Q|$	$2^{j} \cdot |I| \cdot 2^{12}$	$2^{j} \cdot |Q|$
-	+	$|I| \ge |Q|$	$2^{j} \cdot |Q| \cdot 2^{12}$	$2^{j} \cdot |I|$
-	-	$|I| < |Q|$	$2^{j} \cdot |I| \cdot 2^{12}$	$2^{j} \cdot |Q|$
-	-	$|I| \ge |Q|$	$2^{j} \cdot |Q| \cdot 2^{12}$	$2^{j} \cdot |I|$
+	-	$|I| \le |Q|$	$2^{j} \cdot |I| \cdot 2^{12}$	$2^{j} \cdot |Q|$
+	-	$|I| > |Q|$	$2^{j} \cdot |Q| \cdot 2^{12}$	$2^{j} \cdot |I|$
+	+	$|I| > |Q|$	$2^{j} \cdot |Q| \cdot 2^{12}$	$2^{j} \cdot |I|$
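
The operand selection in the table above can be modelled in integer arithmetic as in the Python sketch below. The smaller magnitude always goes to the numerator and is scaled by 2¹², so the integer quotient A/B is the ratio expressed with 12 fractional bits. The names `division_operands` and `ratio_q12` are hypothetical, and treating j as a plain left shift applied to both operands is an assumption of this example. Under that reading the factor 2^j cancels in the quotient.

```python
FRAC_BITS = 12  # the 2^12 factor in the numerator column


def division_operands(i, q, j=0):
    """Return (A, B) chosen according to the signs and magnitudes of I and Q.

    j is modelled as a common left shift (an assumption of this sketch).
    """
    ai, aq = abs(i), abs(q)
    # Boundary rule from the table: strict '>' for I >= 0, '>=' for I < 0.
    larger_is_i = ai > aq if i >= 0 else ai >= aq
    small, large = (aq, ai) if larger_is_i else (ai, aq)
    a = (small << j) << FRAC_BITS
    b = large << j
    return a, b


def ratio_q12(i, q, j=0):
    """Integer quotient A // B: the ratio min/max with 12 fractional bits."""
    a, b = division_operands(i, q, j)
    if b == 0:
        return 0  # I = Q = 0: ratio undefined, 0 chosen arbitrarily
    return a // b


if __name__ == "__main__":
    print(division_operands(100, 50, j=2))  # (819200, 400)
    print(ratio_q12(100, 50))               # 2048, i.e. 0.5 in Q12
    print(ratio_q12(-3, 4, j=5))            # 3072, i.e. 0.75 in Q12
```

Because the quotient never exceeds 2¹² (the numerator holds the smaller magnitude), the result fits in 13 bits regardless of the input amplitude.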