Xilinx TDC


Master of Science Thesis in Electrical Engineering

Department of Electrical Engineering, Linköping University, 2020

A 1.8 ps Time-to-Digital
Converter (TDC)
Implemented in a 20 nm
Field-Programmable Gate
Array (FPGA) Using a
Ones-Counter Encoding
Scheme with Embedded
Bin-Width Calibrations and
Temperature Correction

Sven Engström

LiTH-ISY-EX--20/5343--SE

Supervisor: Docent Oscar Gustafsson


isy, Linköpings universitet
Patrik Thalin
Teledyne SP Devices

Examiner: Kent Palmkvist


isy, Linköpings universitet

Division of Computer Engineering


Department of Electrical Engineering
Linköping University
SE-581 83 Linköping, Sweden

Copyright © 2020 Sven Engström


Abstract
This thesis investigates the use of field-programmable gate arrays (fpgas) to
implement a time-to-digital converter (tdc) with on-chip calibration and
temperature correction. Using carry-chains on the Xilinx Kintex UltraScale architecture
to create a tapped delay line (tdl) has previously been proven to give good time
resolution. This project improves the resolution further by using a bit counter to
handle bubbles in the tdl without removing any taps. The bit counter also adds
the possibility of using a wave-union approach previously dismissed as unusable
on this architecture. The final implementation achieves an RMS resolution of
1.8 ps.

Acknowledgments
I would first like to thank my colleagues at Teledyne SP Devices who gave me the
opportunity to write this master thesis. A special thanks to Per Magnusson, both
for the helpful discussions during the project and comments on the report. His
feedback has been crucial for the quality of my report.
I would also like to thank my supervisor Oscar Gustafsson and my examiner Kent
Palmkvist for their help with this thesis and the great courses they have provided
during my studies.

Linköping, October 2020


Sven Engström

Contents

Notation

1 Introduction
1.1 Aim
1.2 Research questions
1.3 Delimitations

2 Theory
2.1 TDC architectures
2.2 Tapped delay line
2.2.1 Bubbles and non-linearity
2.2.2 Rise and fall time differences
2.2.3 Temperature dependency
2.2.4 Voltage dependency
2.2.5 Individual differences
2.3 Multi-measurement
2.3.1 Multiple instances
2.3.2 Wave union
2.4 Design flexibility
2.4.1 Ones-counter
2.4.2 Histogram
2.4.3 Oscillator

3 Method
3.1 Architecture
3.1.1 Tapped delay line
3.1.2 Bit counter
3.1.3 Edge detection
3.1.4 Histogram engine
3.1.5 Signal handler
3.1.6 Temperature correction
3.1.7 Frequency estimator
3.2 Computations
3.2.1 Timestamp
3.2.2 Temperature correction
3.3 Test platform
3.3.1 Evaluation kit
3.3.2 Digitizer

4 Evaluation
4.1 Erratic behavior
4.2 Single edge
4.3 Pulsed edge
4.4 Results
4.4.1 Bin width
4.4.2 Error
4.5 Trigger
4.6 Resource usage

5 Conclusions
5.1 Resolution
5.2 Trade-offs
5.3 Applications
5.3.1 Teledyne SP Devices Digitizer
5.4 Future work

A Evaluation plots

Bibliography
Notation

Abbreviations

Abbreviation Meaning
cdc Clock domain crossing
clb Configurable logic block
fpga Field-programmable gate array
lut Lookup table
mcu Micro controller unit
mux Multiplexer
pvt Process, Voltage, Temperature
tdc Time-to-digital converter
tdl Tapped delay line
uart Universal asynchronous receiver-transmitter

1 Introduction
Teledyne SP Devices has developed a range of digitizers used to collect analog
signals digitally. One essential part of data collection is knowing when to start,
a.k.a. triggering. In this case, the trigger is a level trigger, i.e. a trigger that fires
when the input signal reaches a configurable level. A simple level trigger may be
built using an analog comparator and a micro controller unit (mcu) sampling the
signal repeatedly. This limits the trigger pulse width and resolution to the clock
period of the mcu, which is not always fast enough.

1.1 Aim
The purpose of this master thesis project is to develop a high resolution time-
to-digital converter (tdc) using a tapped delay line (tdl) contained in a Xilinx
Kintex UltraScale Field-programmable gate array (fpga).

1.2 Research questions


The following questions will be in focus to achieve the aim of the thesis:
1. What resolution is possible to achieve?
2. What trade-offs are there between different calibration methods, trigger res-
olution and logic usage?
3. How does this tdc, implemented on the Xilinx UltraScale architecture,
compare to previous research?
4. What is the real world accuracy and precision of this fpga-based tdc?


5. How does this tdc perform when used as a level trigger?

1.3 Delimitations
There are some delimitations in place to make the project manageable.
• The time invested in the project is limited to 800 hours.
• The tdc will be constructed using a Xilinx Kintex UltraScale fpga.
2 Theory

This chapter introduces concepts used in this project.

2.1 TDC architectures


Given the limitations of digital and clocked electronics, there are two common
ways to create a tdc with sub-cycle resolution: create delayed versions of the
input and sample these using a synchronous clock, or, the other way around, use
phase-shifted copies of the clock to sample the signal at different moments in time
[1, 6].
The first approach is preferable for several reasons. One reason is that the chosen
architecture is designed neither to create nor to route the large number of clocks
needed for the second approach. Another reason is that the second approach
adds one extra clock domain crossing (cdc) and source of metastability.

2.2 Tapped delay line


The principle of a tdl is shown in figure 2.1. The input signal is routed through
a chain of delay elements. Along the chain there are outputs (taps) allowing the
signal to be sampled with different amounts of delay.
When creating a tdc, delay elements with a short time delay are preferred since
the delay between each consecutive pair of taps, called bin width, directly affects
the resolution.
Since fpgas generally do not have tdl-specific hardware the delay elements
might be routing or logic.

Figure 2.1: Principles of a tdl (labels: Input, Delay elements T, Flip flops,
Thermometer code)

Each configurable logic block (clb) in the chosen architecture has an 8-input (16-
bit) carry adder element, CARRY8, with fast interconnect between carry out and
carry in on the next clb [15], which makes them a prime candidate for a delay
chain. Each carry block has eight carry outputs and also supplies an inverted
version of each bit for a total of 16 taps. All outputs have an associated flip-flop
within the clb to store the output.
Using all 16 outputs is sometimes referred to as dual-sampling [5, 12].

2.2.1 Bubbles and non-linearity


The logical function of the carry chain is known, but the internal layout, routing
and timing are unknown. Previous research has shown that the outputs are not
equidistant in time and may even arrive out-of-order [3–5, 12, 13]. Figure 2.2
shows an example of non-linearity and bubbles in a CARRY8 element where output
C4 transitions before the four preceding outputs, C2 to CO3.
Signals that arrive out-of-order and disrupt what could have been a perfect
thermometer code are called bubbles and have to be dealt with when finding an
edge on the incoming signal. One way to remove bubbles is to determine the
order in which the signals arrive at the output and create a static realignment
based on this knowledge.
Especially large gaps in a tdl are known as ultra wide bins and reduce the local
accuracy significantly.

2.2.2 Rise and fall time differences


Previous research has shown that the propagation times for rising and falling
edges through the UltraScale carry elements differ [5, 12]. If the difference is not
linear throughout the tdl it will not be possible to create a single realignment of
the taps that works for both rising and falling edges.

2.2.3 Temperature dependency


Both ambient temperature and heat generated on chip affect resistance and signal
propagation delay. Previous research has shown a small, positive, linear
dependence between temperature and propagation delay, where the propagation delay
increased by 4-5% as the operating temperature increased from 10 to 64 °C [7].

Figure 2.2: Example of bubbles and non-linearity in a CARRY8 carry element
(input CI, outputs CO0 to CO7 and C0 to C7, plotted against time). Red lines
start at the time when the corresponding output changed state from low to high.

2.2.4 Voltage dependency


Voltage fluctuations on the power supply due to e.g. varying switching activity
may result in core voltage variations which in turn affects propagation delay.

2.2.5 Individual differences


Small individual variance between chips caused by the process might make per-
chip calibration necessary.

2.3 Multi-measurement
Averaging multiple measurements might reduce the effect of non-linearity, tem-
perature and voltage variations.

2.3.1 Multiple instances


A simple approach is to instantiate multiple copies of the tdc and average the
results. Disadvantages are higher logic utilization and an extra calibration step
to determine the different input delays to all instances. These delays may also be
affected by temperature and voltage variations.

This approach is not investigated further because of the high logic utilization.

2.3.2 Wave union


By creating a pulse of fixed length from the incoming edge, it is possible to mea-
sure the same input twice using one tdc. This method is known as wave union
[8, 14].
With a short pulse, it is possible to measure the same input twice using a tdl
only slightly larger than in the case of a single edge tdc. This helps eliminate
ultra wide bins and improve accuracy. Errors caused by temperature and voltage
are not reduced since both edges are clocked during the same clock cycle.
With a longer pulse it is possible to measure both edges, one at a time, at different
clock cycles without modifying the tdc. Measuring with a slight variation in
time may improve on errors caused by jitter on core voltage and system clock but
implies longer dead-time.

2.4 Design flexibility


Bubbles and non-linearity could be handled at design time by selecting a specific
tdl and streaming all outputs from the fpga to a connected PC for further anal-
ysis. The outputs could then be reordered and the sizes of all bins statistically
determined. This raises three questions:
• Is it possible to keep the properties of the tdl when rerunning synthesis
with changes to other parts of the design?
• Is it possible to create a static remapping of the outputs that removes bub-
bles for both rising and falling edges?
• Is it possible to create one static remapping that works on all chips or is the
individual variance too large?
By constraining place and route during the synthesis process, it is possible to get
the same layout every time the fpga firmware is rebuilt. This does come with a
penalty in terms of design flexibility; when new modules are added to the design
these need to be placed and routed around the locked parts which may lead to
timing or routing issues.
Previous research shows that it is not possible to create a remapping that elimi-
nates bubbles for both rising and falling edges [5, 12].
Whether the part-to-part variance is too large to allow for one static remapping
when moving between devices is covered by neither previous research nor this
thesis. The concern is not ignored but the design described in the following para-
graphs does solve this problem if present.

2.4.1 Ones-counter
One way to deal with bubbles in the thermometer code without remapping the
outputs is to count all high outputs from one end of the tdl to the first larger
gap. The number of high outputs corresponds to the bin number of the last high
output if the outputs had been remapped to a perfect thermometer code. The
size of a large gap has to be selected such that it is larger than the largest possible
bubble and smaller than the shortest pulse expected to be measured.
This imposes constraints on the properties of the input signal that a remapping
does not. It does however work on both rising and falling edges (counting low
and high outputs respectively), requires no calibration and is insusceptible to
errors caused by individual variance.
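
The counting rule can be illustrated with a minimal Python sketch. It is a software model only, not the hardware implementation described in this thesis; the function name `ones_count_bin` and the `max_bubble` threshold of 4 are assumptions chosen for the example.

```python
def ones_count_bin(taps, max_bubble=4):
    """Count high taps from the input end up to the first large gap.

    taps: sequence of 0/1 values, index 0 nearest the tdl input.
    max_bubble: longest run of zeros still treated as a bubble
                (illustrative value, not taken from the thesis).
    """
    count = 0      # high taps seen so far
    zero_run = 0   # length of the current run of low taps
    for tap in taps:
        if tap:
            count += 1
            zero_run = 0
        else:
            zero_run += 1
            if zero_run > max_bubble:
                break  # first large gap: stop counting here
    return count


# A perfect thermometer code and a bubbled version of the same edge position.
clean = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
bubbled = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0]
# A short pulse followed by a large gap and an unrelated later edge.
pulse = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1]
print(ones_count_bin(clean), ones_count_bin(bubbled), ones_count_bin(pulse))
```

Both the clean and the bubbled vector give the same count, which is the point of the method: the order of the taps does not matter as long as the bubble is shorter than the gap threshold.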

2.4.2 Histogram
Bin sizes are generally determined by generating and collecting a large number
of edges. Given a source uncorrelated to the system clock, larger bins will receive
more hits and by creating a histogram from the resulting set of bin numbers it is
possible to determine the relative sizes of the bins.
Previous research and works have done this calibration off chip by streaming the
collected bin numbers to a PC but that is not a viable solution in the case of this
self-contained tdc.
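
The statistical idea behind the histogram calibration can be sketched in Python as a small simulation. The bin widths below are made-up values and `simulate_histogram` is a hypothetical helper; the sketch only shows that uniformly distributed hits fill each bin in proportion to its width.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def simulate_histogram(bin_widths, n_hits, seed=0):
    """Drop n_hits uniformly distributed edges into bins of the given widths."""
    rng = random.Random(seed)
    upper_edges = list(accumulate(bin_widths))  # upper boundary of each bin
    period = upper_edges[-1]
    hist = [0] * len(bin_widths)
    for _ in range(n_hits):
        t = rng.random() * period  # arrival time, uncorrelated with the clock
        k = min(bisect_right(upper_edges, t), len(hist) - 1)
        hist[k] += 1
    return hist


true_widths = [1.0, 2.0, 5.0, 2.0]  # made-up bin widths, arbitrary units
hist = simulate_histogram(true_widths, n_hits=100_000)
estimated = [h / sum(hist) for h in hist]
print(estimated)  # approaches [0.1, 0.2, 0.5, 0.2] as n_hits grows
```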

2.4.3 Oscillator
An edge generator uncorrelated to the system clock is mandatory to get good
results from the histogram calibration. One way to create this source without
adding additional physical components is to create a ring-oscillator within the
fpga.
3 Method

To answer the research questions a tdc had to be implemented on a Xilinx Kintex


UltraScale fpga. This chapter describes the implemented design.

3.1 Architecture
A system overview of the tdc architecture can be seen in figure 3.1. All parts run
at 625 MHz, except where otherwise specified.

3.1.1 Tapped delay line


The CARRY8 was chosen as the delay element in the tdl for the reasons men-
tioned in section 2.2. The carry elements are structurally and functionally de-
scribed in [15] and have 16 outputs, all connected to one flip-flop each, all within
the same clb.
Half of the outputs are hard wired through XOR gates together with the corre-
sponding data input which in this case results in an inversion since the data in-
put is always one. The inverted signals are inverted again after the first flip-flops
before entering the bit counter. Adding pipelining at this stage proved necessary
to meet timing requirements since the tdl stretches over a complete clock region
and all outputs need to be routed together to one single point to create the final
output.
The length of the tdl was selected so that the propagation time was a bit longer
than one clock cycle at room temperature. This allows for changes in propagation
time due to process, voltage and/or temperature (pvt) without overflowing the
tdl. The extra length causes some edges to be detected twice, once at the start
and once at the end. These are handled by the edge detection module described
in section 3.1.3.

Figure 3.1: System architecture (blocks: ring oscillator, signal handler, tdl,
freq. estimator, bit counter, edge detect, two histogram engines, reg)

Ring oscillator

One use of the tdl is to connect the last carry output to the input through an
inverter, thereby creating a ring oscillator with temperature characteristics
similar to the tdl used for measurements. The use of this is further explained in
sections 3.1.5, 3.1.6 and 3.1.7.

3.1.2 Bit counter


The bit counter is essentially a large population counter in a tree structure, see
figure 3.2. The counter counts from the end of the tdl to make sure that the first
edge is found. This means that the output sum is the number of bins after the
edge. The bin number is then computed as the number of taps minus the output
sum.
The first levels are optimized for the UltraScale architecture and the 6-input
lookup tables (luts) that are available. At the input, six bits are summed up
into a three-bit word using the C63 module as shown in figure 3.3(a). These are
then combined to sum up 36 bits in the POP36 module shown in figure 3.3(b).
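
A behavioral Python sketch of the two modules may help to see how the counts combine. It models arithmetic only, not LUT mapping or pipelining. The second stage, which adds the three bit positions of the partial counts with weights 1, 2 and 4, is one plausible reading of figure 3.3(b) and should be treated as an assumption; the function names are illustrative.

```python
def c63(bits):
    """Sum six 1-bit inputs into a 3-bit count (0..6)."""
    assert len(bits) == 6
    return sum(bits)


def pop36(bits):
    """Population count of 36 bits built from C63 stages."""
    assert len(bits) == 36
    # First stage: six C63 blocks, one per group of six input bits.
    partial = [c63(bits[i:i + 6]) for i in range(0, 36, 6)]
    # Second stage: one C63 per bit position of the six 3-bit partial counts.
    a = c63([(p >> 0) & 1 for p in partial])  # weight 1
    b = c63([(p >> 1) & 1 for p in partial])  # weight 2
    c = c63([(p >> 2) & 1 for p in partial])  # weight 4
    return a + (b << 1) + (c << 2)


def bin_number(n_taps, ones_after_edge):
    """Bin number as the number of taps minus the counter output sum."""
    return n_taps - ones_after_edge


pattern = [1 if (i * 7) % 3 == 0 else 0 for i in range(36)]
print(pop36(pattern), bin_number(1152, 500))
```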

Figure 3.2: Bit counter. (a) Part of bit counter structure. Dashed lines mark
pipeline stages. (b) Complete 1152 bit counter.


Figure 3.3: Pop-counter modules used in the bit counter. (a) C63 module.
(b) POP36 module.

An early exit is implemented by adding some control signals, also shown in
figure 3.2. Using those it is possible to detect an edge and exclude the following
bits and thereby allow pulses shorter than the tdl to be detected correctly.
At the same level as the control signals are introduced the complete structure
is also forked into a rising and falling edge sum structure, counting ones and
zeros respectively. Both have their own set of control signals. This can be seen in
figure 3.2.

3.1.3 Edge detection


By looking at the first outputs of the tdl, the edge detection module decides
whether there has been an edge passing the tdl input and whether it is a rising
or a falling edge. This module can be configured to forward only rising edges,
falling edges, pulsed edges or all.
Since the tdl has a propagation time a bit longer than one clock cycle, some edges
could be detected twice. This module contains logic to remove those duplicates.
Whether the first or second trigger should be removed can be selected through a
write to a register.
For pulsed inputs, the sum of both edges' bin numbers is used as a virtual bin
number. Since both edges of the pulse need to be contained within the tdl, the
first active bin will not be bin number zero; it will instead be the bin
corresponding to the width of the pulse.

The edge detection module is also responsible for starting and stopping histogram
creation.

3.1.4 Histogram engine


As stated in section 2.2.2, the propagation speeds of rising and falling edges differ
and therefore two histogram engines are instantiated. This allows the system to
determine and remember the characteristics of both types of edges at the same
time.
The histogram engine uses a dual-port block RAM with one read-write and one
read-only port which allows for efficient histogram creation. The block RAMs
are not capable of running at 625 MHz and are therefore clocked at half that
frequency. The cdc is handled through a first in, first out data buffer.

3.1.5 Signal handler


A schematic drawing of the signal handler is shown in figure 3.4.
At the input of the signal handler is a multiplexer (mux), implemented by a
three-input lut, selecting either the external input or the internal oscillator. The
output is then optionally routed through a small delay element, a CARRY8 with
routing, which together with an XOR function allows incoming edges to be trans-
formed into pulses.
The output signal is then created by a 6-input lut with three signals and three
option bits.

3.1.6 Temperature correction


Since the frequency of the ring oscillator is temperature dependent, a frequency
estimator could be used to get an indirect temperature measurement.

3.1.7 Frequency estimator


The frequency estimator, used to determine the frequency of the ring oscillator,
consists mainly of two counters: the reference counter, running on a stable 100-MHz
clock, and another counter counting positive edges on the oscillator output.
Early testing showed that an internal ring oscillator of the same length as the tdl
ran at about one quarter the speed of the system clock, ∼ 150 MHz. This allowed
for sampling the oscillator using the system clock and a short shift register. By
looking at two subsequent samples it was easy to determine if a positive edge had
occurred.
Each measurement starts with initializing both counters to zero. When the reference
counter reaches one million, the value of the other counter is saved in a
register. The value of the register multiplied by $10^8 / 10^6 = 100$ is the current
approximation of the oscillator frequency.
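
The two steps of the estimator, edge detection from consecutive samples and scaling of the edge count, can be sketched in Python as follows. This is a software model under the stated 100-MHz reference and one-million-count gate; the function and constant names are illustrative and do not come from the design.

```python
REF_CLOCK_HZ = 100_000_000   # reference counter clock (100 MHz)
REF_COUNT_LIMIT = 1_000_000  # reference counts per measurement (10 ms gate)


def count_rising_edges(samples):
    """Count 0 -> 1 transitions between consecutive samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if not prev and cur)


def estimate_frequency_hz(edge_count):
    """Scale the edge count of one gate interval to hertz (factor 100)."""
    return edge_count * REF_CLOCK_HZ // REF_COUNT_LIMIT


# Oscillator output as sampled by the faster system clock (made-up data).
samples = [0, 0, 1, 1, 0, 1, 0, 0, 1]
print(count_rising_edges(samples))
print(estimate_frequency_hz(1_370_000))
```

The sampling approach only works when the oscillator is slow enough that every high and low phase is seen by at least one sample, which is why the ratio to the system clock matters.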

Figure 3.4: Signal handler (labels: EXT, OSC, TDL output, LUT3, CARRY8, LUT6)

3.2 Computations
With the hardware architecture described in section 3.1 all data needed for pre-
cise measurements are available, but a few computational steps are needed to
transform the outputs to precise timestamps.

3.2.1 Timestamp
The collected histograms are used to calculate the width of each bin and the cor-
responding timestamp:
\[ w[k] = \frac{h[k]}{\sum_n h[n]} \, T_{\mathrm{clk}} \tag{3.1} \]

\[ t[k] = \sum_{n=0}^{k-1} w[n] + \frac{w[k]}{2} \tag{3.2} \]

where $h[k]$ is the value of bin $k$ in the histogram and $T_{\mathrm{clk}}$ the clock period.

After calibration all t[k] can be computed and stored for use in a lookup table
for timestamps, either on the fpga or the computer, and requires therefore no
computation at run time.
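
As a worked example of equations 3.1 and 3.2, the following Python sketch builds the timestamp lookup table from a histogram. The histogram values are made up for illustration, and the function names are not taken from the design.

```python
def bin_widths(hist, t_clk):
    """Equation 3.1: width of each bin from its share of the histogram."""
    total = sum(hist)
    return [h * t_clk / total for h in hist]


def bin_timestamps(widths):
    """Equation 3.2: timestamp at the center of each bin."""
    stamps = []
    start = 0.0  # sum of the widths of all preceding bins
    for w in widths:
        stamps.append(start + w / 2)
        start += w
    return stamps


hist = [10, 30, 20, 40]   # made-up hit counts
t_clk_ps = 1600.0         # clock period of a 625 MHz clock in ps
widths = bin_widths(hist, t_clk_ps)
lut = bin_timestamps(widths)
print(widths, lut)
```

Placing the timestamp at the center of the bin halves the worst-case error within a bin compared to using either bin boundary.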

3.2.2 Temperature correction


Since the propagation time through the tdl is inversely proportional to the
frequency of the ring oscillator, the temperature correction is applied as

\[ t_{\mathrm{tc}}[k] = t[k] \, \frac{f_{\mathrm{calibration}}}{f_{\mathrm{current}}} \tag{3.3} \]

where $f_{\mathrm{calibration}}$ and $f_{\mathrm{current}}$ are the ring oscillator frequencies at calibration time and
at the time of the hit respectively.
This computation can be done either on chip or during post processing.
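
A minimal Python sketch of equation 3.3 as it might look in post processing is given below. The frequencies and the timestamp are made-up values and the function name is an assumption.

```python
def temperature_corrected(t_k, f_calibration, f_current):
    """Equation 3.3: scale a calibrated timestamp by the frequency ratio."""
    return t_k * f_calibration / f_current


# Made-up example: the oscillator ran at 137 MHz during calibration and has
# slowed to 135 MHz, so propagation is slower and timestamps are stretched.
t_ps = 800.0
corrected = temperature_corrected(t_ps, 137e6, 135e6)
print(corrected)
```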

3.3 Test platform


Testing was done using two platforms:
• Xilinx Kintex UltraScale FPGA KCU105 Evaluation Kit [16]
• Teledyne SP Devices Digitizer

3.3.1 Evaluation kit


The Xilinx evaluation kit was used for iterative development and testing. Since
the design only contained one or two tdcs and a small uart module (to allow
read and write of registers from a PC), the synthesis time stayed well below 15
minutes.
This platform had two purposes; firstly implementing one tdl to determine gen-
eral limits, characteristics and possible resolution of the platform, and secondly
to determine the performance of the final tdc.
The performance evaluation was achieved by instantiating two tdcs and feeding
them one signal with different input delays. The delay was created by different
length cables corresponding to a delay of about half a clock cycle.
The main performance metric used was the standard deviation of the differences
between both measurements divided by $\sqrt{2}$, as the measured values of standard
deviation contain two tdc channels.
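
The metric can be sketched in Python as follows. The sketch assumes that both channels contribute equal and independent errors, which is what justifies the division by $\sqrt{2}$; the use of the sample standard deviation (`statistics.stdev`) is also an assumption, since the text does not say which estimator was used.

```python
import math
import statistics


def single_channel_std(t_a, t_b):
    """Standard deviation of the pairwise differences divided by sqrt(2)."""
    diffs = [a - b for a, b in zip(t_a, t_b)]
    return statistics.stdev(diffs) / math.sqrt(2)


# Made-up timestamps in ps from two channels measuring the same edges.
channel_a = [951.0, 953.0]
channel_b = [0.0, 0.0]
print(single_channel_std(channel_a, channel_b))
```

A constant offset between the channels, such as the cable delay, does not affect the result because the standard deviation removes the mean of the differences.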

3.3.2 Digitizer
The digitizer contains hardware for analog-to-digital conversion and was used
to evaluate the performance of the tdc as a trigger for data collection. To allow
data collection, this design contained a lot of logic in addition to the tdc which
caused the synthesis time to be counted in hours instead of minutes.
The existing trigger was compared to the tdc by doing triggered data collection
using the existing trigger and running the tdc at the same time. The input signal
was bandwidth-limited to have a rise time that allowed the digitizer to collect
multiple samples on the edge, which allowed the data to be linearly interpolated
around the trigger point. The time when the interpolated signal crossed the trigger
level was compared to the values given by the existing trigger and the tdc.
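
The interpolation step can be sketched in Python as shown below. It handles rising edges only and returns the first crossing; the function name and the sample values are assumptions made for the example, not part of the test setup.

```python
def crossing_time(samples, level, sample_period):
    """Time of the first rising crossing of level, by linear interpolation."""
    for n in range(len(samples) - 1):
        x1, x2 = samples[n], samples[n + 1]
        if x1 < level <= x2:
            alpha = (level - x1) / (x2 - x1)  # fraction of a sample period
            return (n + alpha) * sample_period
    return None  # the record never crosses the level


# Made-up record in volts, sampled every 0.5 ns (2 GSample/s).
record = [0.0, 0.1, 0.2, 0.3]
print(crossing_time(record, level=0.15, sample_period=0.5))
```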

Noise
Any noise introduced by the ADC during sampling will be included in the calcu-
lated trigger precision. Hence it is important to get a feeling for the magnitude
of these errors. The digitizer has an RMS error of 0.4 mV when terminated.

Figure 3.5: Interpolation of samples (samples X1 and X2 at times n and n+1,
interpolated at time t with weights α and 1−α)

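Since the tdc error is calculated at a time t between two samples, the error has
to be modeled accordingly. If the two samples closest to the time t are called $X_1$ and
$X_2$, the interpolated value, $Y_\alpha$, can be described by figure 3.5 and the following
equations:

\[
\begin{cases}
Y_\alpha = (1 - \alpha) X_1 + \alpha X_2 \\
X_1 \sim N(\mu_1, \sigma_X) \\
X_2 \sim N(\mu_2, \sigma_X) \\
\alpha \sim U(0, 1)
\end{cases}
\tag{3.4}
\]

The variance of $Y_\alpha$, assuming that $X_1$ and $X_2$ are independent, is

\[
\begin{aligned}
\sigma_{Y_\alpha}^2 &= \operatorname{var}((1 - \alpha) X_1) + \operatorname{var}(\alpha X_2) \\
&= (1 - \alpha)^2 \operatorname{var}(X_1) + \alpha^2 \operatorname{var}(X_2) \\
&= (1 - 2\alpha + 2\alpha^2) \sigma_X^2 .
\end{aligned}
\tag{3.5}
\]

Since $\alpha$ is uniformly distributed between 0 and 1, the variance of the $\alpha$-independent
$Y$ is

\[
\begin{aligned}
\sigma_Y^2 &= \int_0^1 \sigma_{Y_\alpha}^2 \, d\alpha = \int_0^1 (1 - 2\alpha + 2\alpha^2) \sigma_X^2 \, d\alpha \\
&= \sigma_X^2 \left[ \alpha - \alpha^2 + \tfrac{2}{3} \alpha^3 \right]_0^1 = \tfrac{2}{3} \sigma_X^2 .
\end{aligned}
\tag{3.6}
\]

The noise introduced by the ADC at time t would therefore have an RMS value of

\[ \sigma_{\mathrm{ADC}} = \frac{1}{k} \sqrt{\frac{2}{3}} \, \sigma_X \tag{3.7} \]

where $k$ is the slope of the edge and $\sigma_X$ is 0.4 mV.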
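
The factor 2/3 in equation 3.6 and the conversion in equation 3.7 can be checked numerically with a short Python sketch. The midpoint-rule integration and the slope of 100 mV/ns are choices made for this illustration only; the function names are hypothetical.

```python
import math


def mean_variance_factor(steps=1000):
    """Midpoint-rule integral of (1 - 2a + 2a^2) over [0, 1]."""
    h = 1.0 / steps
    midpoints = ((i + 0.5) * h for i in range(steps))
    return sum((1 - 2 * a + 2 * a * a) * h for a in midpoints)


def adc_time_noise(sigma_x, slope):
    """Equation 3.7: voltage noise converted to time noise on an edge."""
    return math.sqrt(2.0 / 3.0) * sigma_x / slope


print(mean_variance_factor())        # close to 2/3
# sigma_X = 0.4 mV and an illustrative slope of 100 mV/ns give a result in ns.
print(adc_time_noise(0.4, 100.0))
```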
4 Evaluation

This chapter contains general observations and results from measurements done
using the two evaluation platforms listed in section 3.3.

4.1 Erratic behavior


The outputs closest to the input of the tdl received an unreasonably large
number of edges, as can be seen in figure 4.1. The cause was given a cursory
investigation but no reason was found. Instead the first 60 outputs were discarded
which resulted in a more reasonable edge distribution, see figure A.1.

4.2 Single edge


Calibration was done for 3 minutes using an external 10-kHz oscillator and 5
seconds using the internal oscillator and the tdl. This translates to about
$1.8 \times 10^6$ and $4 \times 10^8$ measurements respectively. The resulting bin widths of one of
the tdcs can be seen in figures A.1 and A.3.
A second internal oscillator with the same length as the tdl was also tested for
comparison. This one generated $7 \times 10^8$ edges during the 5 second calibration.
The distribution of bin widths is directly related to the resolution of the tdc
and histograms showing these can be seen in figures A.5 and A.7.
The evaluation plots are based on 327680 edges from an external 10-kHz oscillator
with two in-phase outputs connected with different length cables. The difference
in cable length introduced a delay of 950 ps which means that half of the
samples hit both tdcs the same clock cycle and the others on two consecutive
cycles.

Figure 4.1: Erratic behavior of first taps (axes: Taps, Count)

The error distribution varies depending on the position in the tdl that was hit
which is visible in figures A.9 and A.11. The error distribution of the complete
tdl can be seen in figures A.13 and A.15.

4.3 Pulsed edge


Tests of the pulsed edge mode were done in the same way as the single edge
mode except for the shorter internal oscillator which did not manage to create a
reasonable calibration.

Resulting bin widths from the calibration are shown in figures A.2 and A.4. Dis-
tribution of the same can be seen in figures A.6 and A.8.

Position dependent error variation is visible in figures A.10 and A.12. The total
error distribution is summarized in figures A.14 and A.16.

4.4 Results
Statistics from the evaluation are summarized in table 4.1.

4.4.1 Bin width


When running in the single edge mode the tdc has about 670 active bins and an
RMS bin width of 3.5 ps for rising edges. Falling edges propagate faster and an
average of 695 bins are needed to cover one clock cycle. The RMS bin width is
therefore a bit lower, 3.2 ps, compared to rising edges.
When using the pulsed edge mode the number of active bins nearly doubles to
1360 which in turn decreases the RMS bin width to 1.6 ps. Since both rising and
falling edges on the input trigger a positive pulse through the tdl, the
characteristics are similar.

4.4.2 Error
In single edge mode the resolution is 2.4 ps for rising edges and 2.2 ps for falling
edges in the best case. When using internal calibration the resolution degrades to
2.5 ps for both rising and falling edges.
When running in the pulsed edge mode with external calibration the resolution is
1.8 ps. With internal calibration this degrades to 2.8 and 3.0 ps for rising and falling
edges respectively.

                                           Bin width                     Error
Calibration          Pulsed  Rising  Active  RMS    Largest bin   STD    Max error  99th percentile
(method, frequency)  input   edge    bins    [ps]   [ps]          [ps]   [ps]       [ps]
External, 10 kHz     No      Yes      669    3.50   15.89         2.37   17.46       8.87
Internal, 79 MHz     No      Yes      672    3.49   15.82         2.48   19.42       9.56
Internal, 137 MHz    No      Yes      671    3.50   15.98         3.82   27.86      12.35
TDL, 137 MHz         No      Yes      670    3.51   15.80         3.33   20.04      11.39
External, 10 kHz     No      No       694    3.19   13.89         2.24   14.43       8.20
Internal, 79 MHz     No      No       696    3.19   13.45         2.54   17.03       9.28
Internal, 137 MHz    No      No       696    3.19   14.09         3.57   27.21      12.69
TDL, 137 MHz         No      No       694    3.20   14.07         3.76   19.78      12.48
External, 10 kHz     Yes     Yes     1356    1.63    9.14         1.81   12.92       6.78
Internal, 79 MHz     Yes     Yes     1358    1.59    9.53         2.82   16.79       9.77
External, 10 kHz     Yes     No      1360    1.61    9.63         1.78   12.21       6.52
Internal, 79 MHz     Yes     No      1360    1.63   10.36         2.97   17.43      10.60
Table 4.1: Characteristics determined from calibration and testing

Figure 4.2: Data records aligned to cross the trigger level at time zero using
the existing trigger and the tdc. (Panels: Existing trigger, TDC. Axes: Voltage [V]
versus Time [ns].)

4.5 Trigger
Testing on the digitizer platform was performed according to the method de-
scribed in section 3.3.2 with the digitizer set to 2 GSample/s and a trigger level
of 0.15 V.
The tdc was calibrated using the internal oscillator and configured to collect
values in the single-edge mode.
Figure 4.2 shows 124 data records out of the 3949 collected. The error distribu-
tion can be seen in figure 4.3.
The standard deviations of the errors were 35.4 and 18.7 ps and the largest errors
were 104 and 69 ps for the existing trigger and the tdc respectively.
The mean slope at the trigger level was 163 mV/ns, which means that the ADC
contributes 2.1 ps of RMS noise according to equation 3.7. Without this noise
the standard deviation would have been $\sqrt{18.7^2 - 2.1^2}$ ps = 18.6 ps.
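
The subtraction above treats the ADC noise as independent of the other error sources, so that variances add and a known contribution can be removed in quadrature. A minimal Python sketch of that arithmetic, with an illustrative function name:

```python
import math


def remove_noise_rms(total_ps, noise_ps):
    """Remove an independent RMS noise term from a total RMS value (in quadrature)."""
    return math.sqrt(total_ps**2 - noise_ps**2)


# Measured trigger error of the tdc (18.7 ps) minus the ADC contribution (2.1 ps).
tdc_only_ps = remove_noise_rms(18.7, 2.1)
print(f"{tdc_only_ps:.1f} ps")  # prints "18.6 ps"
```

Because the terms combine as squares, a 2.1 ps contribution changes an 18.7 ps total by only about 0.1 ps.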

4.6 Resource usage


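Resource utilization for the tdc, without the uart module, is below 1% for each
primitive type (luts, flip-flops, block RAM and carry logic) on the Xilinx Kintex
UltraScale XCKU040 fpga, while about 2% of the CLBs are occupied. Detailed
utilization can be seen in table 4.2.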
Figure 4.3: Error distribution of the different triggers. (Series: TDC, Existing
trigger. Axes: Count versus Error [ps].)

Resource    Used   Available   Utilization
LUT         1972      242400         0.81%
Flip Flop   4237      484800         0.87%
BRAM36         4         600         0.67%
CARRY8       238       30300         0.79%
CLB          627       30300         2.07%
Table 4.2: fpga resource utilization
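
The utilization column is simply the used count divided by the available count. The short Python sketch below recomputes it from the values in table 4.2; the dictionary layout and function name are illustrative only.

```python
# (used, available) pairs copied from table 4.2.
RESOURCES = {
    "LUT": (1972, 242400),
    "Flip Flop": (4237, 484800),
    "BRAM36": (4, 600),
    "CARRY8": (238, 30300),
    "CLB": (627, 30300),
}


def utilization_percent(used, available):
    """Share of the available resources that is in use, in percent."""
    return 100.0 * used / available


for name, (used, available) in RESOURCES.items():
    print(f"{name:10s} {utilization_percent(used, available):.2f}%")
```

The CLB figure is higher than the others because a CLB counts as used even when only some of its luts and flip-flops are occupied.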
5 Conclusions
This chapter contains observations and discussions about the implementation
and results.

5.1 Resolution
The best resolution of 1.8 ps was achieved using external calibration in combina-
tion with the pulsed edge mode. This is better than the resolution of previously
published fpga-based tdcs [6]. A small comparison to previous works can be seen in
table 5.1.
Using an external signal for calibration gave consistently better performance than
all variations of internal calibration. Among the internal calibration
methods, the 79 MHz ring oscillator gave the best results. The reasons for this
cannot be determined without further investigation.

Ref.-Year   Device              Method                                              Precision   Resources
[11]-14     Spartan-6           Wave union across multiple tdls                     6 ps        144 SLICEs
[10]-15     Virtex-6            Average multiple tdcs                               4.2 ps      –
[9]-17      Virtex-7            Average multiple tdcs                               3.5 ps      12758 LUTs
[2]-16      Kintex-7            Average multiple tdcs, wave union                   3.1 ps      –
[13]-17     Kintex-7            Multiple tdls, ones-counter                         3.9 ps      2433 LUTs
[5]-16      Kintex UltraScale   Dual sampling                                       3.9 ps      –
This work   Kintex UltraScale   Internal calibration, dual sampling, ones-counter   2.5 ps      1972 LUTs
This work   Kintex UltraScale   Wave union, dual sampling, ones-counter             1.8 ps      1972 LUTs
Table 5.1: Comparison of recent, high resolution, fpga-based tdcs


The slower internal oscillator gave better calibration performance than the
faster one, and the even slower external calibration signal performed better
still, so a connection between trigger frequency and calibration performance is
plausible.
Using internal calibration, the single-edge mode provides a lower standard devi-
ation than the pulsed mode, while the maximum error and 99th percentile are sim-
ilar for both modes. Since the pulsed mode eliminates most ultra-wide bins, this
indicates that the largest errors are not caused by non-uniform bin widths. Two
possible error sources are voltage variations and clock jitter, but other sources
may exist.

5.2 Trade-offs
The use of a bit counter has several advantages compared to tap realignment. It
works equally well for removing bubbles for both rising and falling edges. The
bit counter therefore enables a wave union approach on the UltraScale
architecture, an approach which has previously been dismissed [5].
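
The effect of a bubble on the two decoding strategies can be illustrated with a small software model. This is only a sketch, not the hardware implementation: the tap vectors are invented, and `leading_ones` stands in for a simple thermometer code decoder that stops at the first zero.

```python
def leading_ones(taps):
    """Simple thermometer decoding: count ones until the first zero is seen."""
    count = 0
    for bit in taps:
        if bit == 0:
            break
        count += 1
    return count


def ones_count(taps):
    """Ones-counter decoding: count every set tap regardless of position."""
    return sum(taps)


# Hypothetical 10-tap snapshots of a rising edge that has passed six taps.
clean = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
bubbly = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]  # two taps sampled out of order

print(leading_ones(clean), ones_count(clean))    # 6 6
print(leading_ones(bubbly), ones_count(bubbly))  # 5 6
```

In this model the out-of-order tap shortens the result of the simple decoder, while the number of ones is unchanged, which is why no taps have to be removed or realigned.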
Having a tdl with a delay longer than one clock cycle allows for both larger
temperature variations and the addition of the pulsed mode. One disadvantage
is that some edges will be seen twice; the dead-time of this implementation
is therefore two clock cycles, corresponding to 3.2 ns. This may be mitigated by
a higher clock frequency or more clever logic, neither of which was investigated
in this project.
The bit counter is suspected to have higher resource requirements compared to a
thermometer code decoder. This was not thoroughly investigated but previous re-
search with a thermometer code decoder and 7.8 ps resolution on the UltraScale
architecture required 706 luts compared to 1972 luts in this work [3]. The re-
source requirement of the tdc implemented in this work is 2.8 times higher but
also provides more than 4 times higher resolution.
Internal calibration gives larger errors but allows for calibration without any ex-
tra hardware, which is a large advantage in some applications.
Since temperature variations also affect the length of the pulse when using the
pulsed mode, the tdc may be more sensitive to temperature changes when using
this mode.

5.3 Applications
The mode to use depends largely on the application. If external calibration is
possible and the highest resolution is needed, the pulsed mode is the best choice.
If used with internal calibration, the single edge mode delivers performance
comparable to, or slightly better than, that of the pulsed mode (table 4.1). The
single edge mode also works with a shorter tdl and a smaller bit counter and
therefore allows for the creation of a tdc with lower resource requirements.

5.3.1 Teledyne SP Devices Digitizer


When the tdc is used as a trigger for data collection in the Teledyne SP Devices
Digitizer, external calibration is not an option. Therefore the single edge mode
is preferable. According to the tests this gives an RMS trigger error of 18.7 ps
compared to 35.4 ps for the previous trigger.
The resolution achieved when the tdc is used as a trigger is significantly worse
than can be explained by the tdc resolution and the noise introduced by the ADC.
Possible reasons are voltage variations and clock jitter caused by the higher
switching activity in the fpga compared to the stand-alone implementation of the tdc.
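
The size of the gap can be estimated with a simple error budget. The sketch below assumes that the tdc resolution with internal calibration (2.5 ps, section 4.4.2) and the ADC noise (2.1 ps, section 4.5) are independent and therefore add in quadrature; this independence and the function name are assumptions made for illustration.

```python
import math


def combine_rms(*terms_ps):
    """Combine independent RMS error terms in quadrature."""
    return math.sqrt(sum(term**2 for term in terms_ps))


expected_ps = combine_rms(2.5, 2.1)  # tdc resolution and ADC noise
measured_ps = 18.7                   # measured RMS trigger error
unexplained_ps = math.sqrt(measured_ps**2 - expected_ps**2)

print(f"expected {expected_ps:.1f} ps, unexplained {unexplained_ps:.1f} ps")
```

Under these assumptions the two known sources account for roughly 3.3 ps, leaving about 18.4 ps of the measured error unexplained.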

5.4 Future work


Possible areas for future work are:
• Quantifying the temperature dependence and evaluating the performance
of the temperature correction scheme proposed in this work.
• Investigating methods to shorten the dead-time.
• Evaluating the performance when calibrating and triggering at different fre-
quencies.
• Investigating different methods for internal calibration and evaluating their
performance.
• Analyzing the effect of increased switching activity on fpga-based tdcs.
• Investigating the erratic first taps of the tdl.
Appendix

A Evaluation plots

Figure A.1: Bin width with a single rising edge. (Panels: (a) External
calibration, (b) Internal calibration, (c) TDL calibration. Axes: Width [ps]
versus Bin.)

Figure A.2: Bin width with a pulsed rising edge. (Panels: (a) External
calibration, (b) Internal calibration. Axes: Width [ps] versus Bin.)

Figure A.3: Bin width with a single falling edge. (Panels: (a) External
calibration, (b) Internal calibration, (c) TDL calibration. Axes: Width [ps]
versus Bin.)

Figure A.4: Bin width with a pulsed falling edge. (Panels: (a) External
calibration, (b) Internal calibration. Axes: Width [ps] versus Bin.)

Figure A.5: Bin width histogram with a single rising edge. (Panels: (a) External
calibration, (b) Internal calibration, (c) TDL calibration. Axes: Count versus
Bin width [ps].)

Figure A.6: Bin width histogram with a pulsed rising edge. (Panels: (a) External
calibration, (b) Internal calibration. Axes: Count versus Bin width [ps].)

Figure A.7: Bin width histogram with a single falling edge. (Panels: (a) External
calibration, (b) Internal calibration, (c) TDL calibration. Axes: Count versus
Bin width [ps].)

Figure A.8: Bin width histogram with a pulsed falling edge. (Panels: (a) External
calibration, (b) Internal calibration. Axes: Count versus Bin width [ps].)

Figure A.9: Error distribution with a single rising edge. (Panels: (a) External
calibration, (b) Internal calibration, (c) TDL calibration. Axes: Error [ps]
versus Bin.)

Figure A.10: Error distribution with a pulsed rising edge. (Panels: (a) External
calibration, (b) Internal calibration. Axes: Error [ps] versus Bin.)

Figure A.11: Error distribution with a single falling edge. (Panels: (a) External
calibration, (b) Internal calibration, (c) TDL calibration. Axes: Error [ps]
versus Bin.)

Figure A.12: Error distribution with a pulsed falling edge. (Panels: (a) External
calibration, (b) Internal calibration. Axes: Error [ps] versus Bin.)

Figure A.13: Error distribution with a single rising edge. (Panels: (a) External
calibration, (b) Internal calibration, (c) TDL calibration. Axes: Count versus
Error [ps].)

Figure A.14: Error distribution with a pulsed rising edge. (Panels: (a) External
calibration, (b) Internal calibration. Axes: Count versus Error [ps].)

Figure A.15: Error distribution with a single falling edge. (Panels: (a) External
calibration, (b) Internal calibration, (c) TDL calibration. Axes: Count versus
Error [ps].)

Figure A.16: Error distribution with a pulsed falling edge. (Panels: (a) External
calibration, (b) Internal calibration. Axes: Count versus Error [ps].)


Bibliography

[1] A. Balla, M. Beretta, P. Ciambrone, M. Gatta, F. Gonnella, L. Iafolla,
M. Mascolo, R. Messi, D. Moricciani, and D. Riondino. Low resource FPGA-based
time to digital converter. 2012. URL https://arxiv.org/abs/1206.0679v3.
[2] Q. Cao, Y. Wang, and C. Liu. A combination of multiple channels of
FPGA based time-to-digital converter for high time precision. In 2016
IEEE Nuclear Science Symposium, Medical Imaging Conference and Room-
Temperature Semiconductor Detector Workshop (NSS/MIC/RTSD), pages
1–3, October 2016. doi: 10.1109/NSSMIC.2016.8069649.
[3] H. Chen and D. D. Li. Multichannel, low nonlinearity time-to-digital
converters based on 20 and 28 nm FPGAs. IEEE Transactions on In-
dustrial Electronics, 66(4):3265–3274, April 2019. ISSN 1557-9948. doi:
10.1109/TIE.2018.2842787.
[4] H. Chen, Y. Zhang, and D. D. Li. A low nonlinearity, missing-code free time-
to-digital converter based on 28-nm FPGAs with embedded bin-width cal-
ibrations. IEEE Transactions on Instrumentation and Measurement, 66(7):
1912–1921, July 2017. ISSN 1557-9662. doi: 10.1109/TIM.2017.2663498.
[5] C. Liu, Y. Wang, P. Kuang, D. Li, and X. Cheng. A 3.9 ps RMS resolution
time-to-digital converter using dual-sampling method on Kintex UltraScale
FPGA. In 2016 IEEE-NPSS Real Time Conference (RT), pages 1–3, June
2016. doi: 10.1109/RTC.2016.7543081.
[6] R. Machado, J. Cabral, and F. S. Alves. Recent developments and challenges
in FPGA-based time-to-digital converters. IEEE Transactions on Instrumen-
tation and Measurement, 68(11):4205–4221, November 2019. ISSN 1557-
9662. doi: 10.1109/TIM.2019.2938436.
[7] W. Pan, G. Gong, and J. Li. A 20-ps time-to-digital converter (TDC) imple-
mented in field-programmable gate array (FPGA) with automatic temper-
ature correction. IEEE Transactions on Nuclear Science, 61(3):1468–1473,
June 2014. ISSN 1558-1578. doi: 10.1109/TNS.2014.2320325.


[8] J. Qi, Z. Deng, H. Gong, and Y. Liu. A 20ps resolution wave union FPGA
TDC with on-chip real time correction. In IEEE Nuclear Science Symposuim
Medical Imaging Conference, pages 396–399, October 2010. doi: 10.1109/
NSSMIC.2010.5873788.
[9] X. Qin, L. Wang, D. Liu, Y. Zhao, X. Rong, and J. Du. A 1.15-ps bin size and
3.5-ps single-shot precision time-to-digital converter with on-board offset
correction in an FPGA. IEEE Transactions on Nuclear Science, 64(12):2951–
2957, December 2017. ISSN 1558-1578. doi: 10.1109/TNS.2017.2768082.
[10] Q. Shen, S. Liu, B. Qi, Q. An, S. Liao, P. Shang, C. Peng, and W. Liu. A
1.7 ps equivalent bin size and 4.2 ps RMS FPGA TDC based on multichain
measurements averaging method. IEEE Transactions on Nuclear Science, 62
(3):947–954, 2015. ISSN 1558-1578. doi: 10.1109/TNS.2015.2426214.
[11] R. Szplet, D. Sondej, and G. Grzeda. Subpicosecond-resolution time-to-
digital converter with multi-edge coding in independent coding lines. In
2014 IEEE International Instrumentation and Measurement Technology
Conference (I2MTC) Proceedings, pages 747–751, May 2014. doi: 10.1109/
I2MTC.2014.6860842.
[12] Y. Wang and C. Liu. A 3.9 ps time-interval RMS precision time-to-digital
converter using a dual-sampling method in an UltraScale FPGA. IEEE
Transactions on Nuclear Science, 63(5):2617–2621, October 2016. ISSN
1558-1578. doi: 10.1109/TNS.2016.2596305.
[13] Y. Wang, J. Kuang, C. Liu, and Q. Cao. A 3.9-ps RMS precision time-to-
digital converter using ones-counter encoding scheme in a Kintex-7 FPGA.
IEEE Transactions on Nuclear Science, 64(10):2713–2718, October 2017.
ISSN 1558-1578. doi: 10.1109/TNS.2017.2746626.
[14] J. Wu and Z. Shi. The 10-ps wave union TDC: Improving FPGA TDC res-
olution beyond its cell delay. In 2008 IEEE Nuclear Science Symposium
Conference Record, pages 3440–3446, 2008.
[15] UltraScale Architecture Configurable Logic Block - User Guide. Xilinx Inc,
https://www.xilinx.com/support/documentation/user_guides/ug574-ultrascale-clb.pdf,
v1.5 edition, February 2017.
[16] KCU105 Board - User Guide. Xilinx Inc,
https://www.xilinx.com/support/documentation/boards_and_kits/kcu105/ug917-kcu105-eval-bd.pdf,
v1.10 edition, 2019.
