SMRT Link User Guide v11.0
User guide: Sequel® II and IIe systems
Research use only. Not for use in diagnostic procedures.
Information in this document is subject to change without notice. PacBio assumes no responsibility for any errors or
omissions in this document.
PACBIO DISCLAIMS ALL WARRANTIES WITH RESPECT TO THIS DOCUMENT, EXPRESS, STATUTORY, IMPLIED OR
OTHERWISE, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY
QUALITY, NONINFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL PACBIO BE
LIABLE, WHETHER IN CONTRACT, TORT, WARRANTY, PURSUANT TO ANY STATUTE, OR ON ANY OTHER BASIS FOR
SPECIAL, CONSEQUENTIAL, INCIDENTAL, EXEMPLARY OR INDIRECT DAMAGES IN CONNECTION WITH (OR
ARISING FROM) THIS DOCUMENT, WHETHER OR NOT FORESEEABLE AND WHETHER OR NOT PACBIO IS ADVISED
OF THE POSSIBILITY OF SUCH DAMAGES.
Certain notices, terms, conditions and/or use restrictions may pertain to your use of PacBio products and/or third
party products. Refer to the applicable PacBio terms and conditions of sale and to the applicable license terms at
http://www.pacificbiosciences.com/licenses.html.
Trademarks:
Pacific Biosciences, the PacBio logo, PacBio, Circulomics, Omnione, SMRT, SMRTbell, Iso-Seq, Sequel, Nanobind,
and SBB are trademarks of Pacific Biosciences of California Inc. (PacBio). All other trademarks are the sole property
of their respective owners.
PacBio
1305 O’Brien Drive
Menlo Park, CA 94025
www.pacb.com
SMRT® Link user guide (v11.0)
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Contact information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Module menu commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6
Gear menu commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Sending information to Technical Support . . . . . . . . . . . . . . . . . . . . . . . .7
Sample Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Application-based calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .9
Custom calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Classic mode . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .10
Editing or printing calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12
Deleting calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12
Importing/exporting calculations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .12
Run Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
Creating a new Run Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .16
Custom Run Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .18
Advanced options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .19
Editing or deleting Run Designs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .20
Creating a Run Design by importing a CSV file. . . . . . . . . . . . . . . . . . . . 20
Run QC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Table fields . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .29
Run settings and metrics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .31
Data Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
What is a Data Set? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .34
Creating a Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .35
Viewing Data Set information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Copying a Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .36
Deleting a Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .37
Starting a job from a Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Data Set QC reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .37
What is a Project? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .39
Data Sets and Projects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Creating a Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .39
Editing a Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .40
Deleting a Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .40
Viewing/deleting sequence, reference and barcode data . . . . . . . . . . . 41
Importing sequence, reference and barcode data . . . . . . . . . . . . . . . . .41
Exporting sequence, reference and barcode data . . . . . . . . . . . . . . . . . .42
SMRT® Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Creating and starting a job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .44
Starting a job after viewing sequence data. . . . . . . . . . . . . . . . . . . . . . . 49
Canceling a running job. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .50
Restarting a failed job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .50
Viewing job results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .51
Copying and running an existing job . . . . . . . . . . . . . . . . . . . . . . . . . . . . .52
Exporting a job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .52
Importing a job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .52
PacBio® secondary analysis applications . . . . . . . . . . . . . . . . . . . . . . . . . 54
Genome Assembly. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
HiFi Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .58
HiFiViral SARS-CoV-2 Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .62
Iso-Seq® Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .67
Microbial Genome Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .74
Minor Variants Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Structural Variant Calling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
PacBio® data utilities. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
5mC CpG Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .91
Demultiplex Barcodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .92
Export Reads. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
Mark PCR Duplicates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
Trim Ultra-Low Adapters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Circular Consensus Sequencing (CCS) . . . . . . . . . . . . . . . . . . . . . . . . 104
Working with barcoded data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Step 1: Specify the barcode setup & sample names in a Run Design 107
Step 2: Perform the sequencing run . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Step 3: (Optional) Run the Demultiplex Barcodes data utility . . . . . . . 110
Step 4: Run applications using the demultiplexed data as input . . . . 111
Demultiplex Barcodes details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Automated analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Creating an Auto Analysis job from SMRT Analysis . . . . . . . . . . . . . . 115
Creating Auto Analysis from a Run Design . . . . . . . . . . . . . . . . . . . . . . 116
HiFiViral SARS-CoV-2: Creating Auto Analysis in Run Design. . . . . . . 116
Getting information about analyses created by Auto Analysis . . . . . 116
Getting information about Pre Analysis from SMRT Analysis . . . . . . 117
Getting information about Pre Analysis from Run Design . . . . . . . . . 117
Visualizing data using IGV. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Using the PacBio® self-signed SSL certificate. . . . . . . . . . . . . . . . . . . . . 120
Sequel® II and Sequel IIe systems output files . . . . . . . . . . . . . . . . . . . . 121
Sequel IIe system output files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Sequel II system output files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Secondary analysis output files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
Configuration and user management . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
LDAP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
SSL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Adding and deleting SMRT Link users . . . . . . . . . . . . . . . . . . . . . . . . . 130
Assigning user roles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
Hardware/software requirements. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
Appendix A - PacBio terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
Appendix B - Data search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
Appendix C - BED file format for Target Regions report . . . . . . . . . . . . . 138
Appendix D - Additional information in the CCS Data Set Export report 140
Introduction
This document describes how to use PacBio’s SMRT Link software.
SMRT Link is the web-based end-to-end workflow manager for Sequel II
and Sequel IIe systems. SMRT Link includes the following modules:
• Sample Setup
• Run Design
• Run QC
• Data Management
• SMRT Analysis
Note: SMRT Link v11.0 is for use with Sequel II systems and Sequel IIe
systems only. If you are using a Sequel system, use an earlier version of
SMRT Link.
This document also describes:
• The data files generated by the Sequel II and Sequel IIe systems for
each SMRT Cell and transferred to network storage. (See “Sequel® II
and Sequel IIe systems output files” on page 121 for details.)
• The data files generated by secondary analysis. (See “Secondary
analysis output files” on page 126 for details.)
• Configuration and user management. (See “Configuration and user
management” on page 129 for details.)
• SMRT Link client hardware/software requirements. (See “Hardware/
software requirements” on page 132 for details.)
New features, fixed issues and known issues are listed in the document
SMRT Link release notes (v11.0).
When you first start SMRT Link, you must specify which system you are
using: Sequel II or Sequel IIe. This choice affects some of the initial
values used in the Sample Setup and Run Design modules. In those
modules, you can switch between the two Sequel systems as needed.
Users with administrator access can configure SMRT Link to support all
instrument types.
Contact information
For additional technical support, contact PacBio at [email protected] or
1-877-920-PACB (7222).
If an SSL certificate is not installed with SMRT Link, the application
uses the PacBio self-signed SSL certificate with the HTTPS protocol.
In this case, each user must accept the browser security warnings
described in “Using the PacBio® self-signed SSL certificate” on page 120.
• Click the PacBio logo at the top left to navigate back to the SMRT Link
home page from within the application.
• Click the Gear menu to sign out, configure for the Sequel II system or
Sequel IIe system, view version information, or perform administrative
functions (Admins only).
• Click a module name to access that module. Sample Setup, Run
Design, Data Management and SMRT Analysis include links to create
new Calculations, Run Designs, Data Sets, and jobs. (A Module menu
displays next to the PacBio logo, allowing you to move between
modules.)
• Click ? to view the SMRT Link online help.
• Select Sign Out from the Gear menu to log out of SMRT Link.
Module menu commands
• Sample Setup: Displays the Sample Setup module.
• Run Design: Displays the Run Design module.
• Run QC: Displays the Run QC module.
• Data Management: Displays the Data Management module.
• SMRT Analysis: Displays the SMRT Analysis module.
Sending information to Technical Support
To open a case with PacBio Technical Support, send an email to
[email protected]. You can also send troubleshooting information to
PacBio Technical Support from within SMRT Link:
• From the SMRT Link menu: About > Troubleshooting Information >
Send.
• From a SMRT Link “Failed” analysis Results page: Click Send Log
Files.
Sample Setup
To prepare your samples for sequencing, use SMRT Link's Sample Setup
module to generate a customized protocol for primer annealing and
polymerase binding to SMRTbell® templates, with subsequent sample
cleanup. You can then print the instructions for use in the lab.
Application-based calculations
12. If necessary, edit the Cleanup anticipated yield. Adjust this
percentage based on previous experience. (Cleanup removes excess
primers/polymerase from bound complexes, which results in higher
quality data.)
13. Specify the on-plate loading concentration (OPLC), in pM.
14. Specify the Minimum Pipetting Volume, in uL. This allows you to set a
lower limit on pipetting volumes to use in certain protocol steps, such
as sample annealing and binding. We recommend setting this to 1 uL,
though in some cases, for example if sample availability is very
limited, it may be appropriate to set a value below 1 uL. Some protocol
steps include fixed values of 1 uL that will not be affected by this
setting.
15. Optionally, do one of the following:
– Click Copy to start a new sample group using the information
entered. Then, edit specific fields for each sample group.
– Click Automate to generate a CSV file. This exports the calculated
values to a CSV file for lab automation.
16. To print the calculation(s) and instructions, use the browser's Print
command (Ctrl-P).
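The following Python sketch shows what the Minimum Pipetting Volume in step 14 guards against. It flags any computed volume below the limit. It also reports how much the whole reaction would need to be scaled up for every flagged component to reach the limit. This is a hypothetical illustration, not the calculation SMRT Link performs. The helper name `check_min_volume` and the component volumes are made up.

```python
def check_min_volume(volumes_ul, min_pipetting_ul=1.0):
    """Flag volumes below the minimum and return the scale-up factor needed.

    Hypothetical helper. Zero-volume entries (components not used) are ignored.
    Returns (flagged, factor): flagged maps component -> volume in uL, and
    factor is the multiplier that lifts the smallest flagged volume to the limit.
    """
    flagged = {name: v for name, v in volumes_ul.items() if 0 < v < min_pipetting_ul}
    if not flagged:
        return flagged, 1.0
    return flagged, min_pipetting_ul / min(flagged.values())


# Made-up volumes for one annealing reaction, in uL.
volumes = {"sample": 8.0, "primer": 0.5, "buffer": 1.5}
flagged, factor = check_min_volume(volumes, min_pipetting_ul=1.0)
print(flagged, factor)  # {'primer': 0.5} 2.0
```

Scaling every component by the same factor keeps the reaction's ratios unchanged. That is why the sketch returns a single factor rather than adjusting the small volumes individually.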
Custom calculations
1. To accommodate new or unique sample types, choose Application >
Custom and enter all settings manually.
2. Click Set Custom Preset Values to save any custom application
settings you may have specified. The next time you select Application
> Custom, those settings are retrieved.
Classic mode
Note: Classic mode is provided for legacy support purposes only. We
highly recommend using High-Throughput mode even for single samples.
15. Optionally, specify an alternative number of cells or on-plate loading
concentration (OPLC) for the final sample dilution step. Use this
feature, for example, to initially set up a single-SMRT Cell run to test a
specific loading concentration prior to conducting a multi-SMRT Cell
sequencing run, or to set up a loading titration experiment to optimize
the OPLC for your particular sample.
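A loading titration means preparing several final dilutions of the same bound complex at different on-plate concentrations. The sketch below computes the complex and diluent volumes for each target OPLC. It assumes the standard dilution relation C1 × V1 = C2 × V2 and a complex concentration already expressed in pM. Converting from ng/uL to pM requires the insert size and is not shown. The stock concentration, final volume, and OPLC values are hypothetical, not PacBio recommendations. SMRT Link's own calculation may include terms this sketch omits, such as the Internal Control.

```python
def titration_volumes(stock_pm, final_volume_ul, target_oplcs_pm):
    """For each target OPLC, return (complex volume, diluent volume) in uL.

    Uses C1 * V1 = C2 * V2, solved for V1 = C2 * V2 / C1.
    """
    plan = {}
    for oplc in target_oplcs_pm:
        if oplc > stock_pm:
            raise ValueError(f"target {oplc} pM exceeds stock {stock_pm} pM")
        complex_ul = oplc * final_volume_ul / stock_pm
        plan[oplc] = (round(complex_ul, 2), round(final_volume_ul - complex_ul, 2))
    return plan


# Hypothetical: 500 pM bound complex, 100 uL final volume per well.
for oplc, (cx, dil) in titration_volumes(500, 100, [60, 80, 100]).items():
    print(f"{oplc} pM: {cx} uL complex + {dil} uL diluent")
```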
Advanced options
• Specify the Minimum Pipetting Volume, in uL. This allows you to set a
lower limit on pipetting volumes to use in certain protocol steps, such
as sample annealing and binding. We recommend setting this to 1 uL,
though in some cases, for example if sample availability is very
limited, it may be appropriate to set a value below 1 uL. Some protocol
steps include fixed values of 1 uL that will not be affected by this
setting.
• Specify the % of Annealing Reaction to Use in Binding. This
compensates for pipetting shortfalls: actual volumes may not add up to
the nominal values, so a value below 100% helps ensure there is enough
annealed sample for binding.
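One way to read the % of Annealing Reaction to Use in Binding setting is as follows. Binding needs a fixed volume of annealed sample, and only that percentage of the annealing reaction is drawn into binding. The annealing reaction is therefore sized up by the inverse of that fraction, and the remainder serves as a buffer against pipetting losses. The sketch below shows that arithmetic. This interpretation is an assumption, not a formula published for SMRT Link. The function name, the 90% default, and the volumes are hypothetical.

```python
def annealing_volume_needed(binding_input_ul, percent_used=90.0):
    """Size the annealing reaction so that `percent_used` % of it covers binding.

    Assumption for illustration: annealing volume = binding input / (percent / 100).
    Returns (total annealing volume, spare volume left over), both in uL.
    """
    if not 0 < percent_used <= 100:
        raise ValueError("percent_used must be in (0, 100]")
    total = binding_input_ul / (percent_used / 100.0)
    spare = total - binding_input_ul
    return round(total, 2), round(spare, 2)


# Binding needs 9 uL of annealed sample; only 90% of the annealing reaction is used.
print(annealing_volume_needed(9.0, 90.0))
```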
Deleting calculations
1. On the Sample Setup screen, select one or more calculation names to
delete.
2. Click Delete.
Importing/exporting calculations
Sample Setup supports importing and exporting calculations in CSV
format.
Note: The content of the CSV file generated using the Export button in the
Sample Setup home screen is different from the content of the CSV file
generated using the High-Throughput mode’s Automate button used for
lab automation.
7. In Sample Setup, click Import.
8. Click Browse, then select the CSV file you previously modified in Step
6 and click Open. If everything is correct, click Continue. The imported
calculation displays.
Note:
• You can select multiple calculations to export to the same CSV file.
• You can also import multiple calculations by adding rows to the CSV
file.
Field name Required Description
Binding Kit Yes For Sequel II/IIe Binding Kits 2.0, 2.1, 2.2, 3.1 and 3.2:
• Lxxxxx101780500123199 (2.0)
• Lxxxxx101820500123199 (2.1)
• Lxxxxx101894200123199 (2.2)
• Lxxxxx102194200123199 (3.1)
• Lxxxxx102194100123199 (3.2)
Target Annealing Concentration (nM) No Enter a positive integer. Units are in nanomolar.
Note: If Application is set to Custom, this field is required.
Target Binding Concentration (nM) No Enter a positive integer. Units are in nanomolar.
Note: If Application is set to Custom, this field is required.
Target Polymerase Concentration (X) No Enter a positive integer.
Note: If Application is set to Custom, this field is required.
Binding Time (hours) No Enter a positive integer.
Note: If Application is set to Custom, this field is required.
Cleanup Bead Type No Must be AMPure or ProNex.
Note: If Application is set to Custom, this field is required.
Cleanup Bead Concentration (X) No Enter a positive integer.
Note: If Application is set to Custom, this field is required.
Minimum Pipetting Volume (uL) No Enter a positive integer. Units are in microliters.
Percent of Annealing Reaction To Use In Binding (%) No Enter a positive integer.
Note: If Application is set to Custom, this field is required.
AMPure Diluted Bound Complex Volume (uL) No Enter a positive integer. Units are in microliters.
AMPure Diluted Bound Complex Concentration (ng/uL) No Enter a positive integer. Units are in nanograms per microliter.
AMPure Purified Complex Volume (uL) No Enter a positive integer. Units are in microliters.
AMPure Purified Complex Concentration (ng/uL) No Enter a positive integer. Units are in nanograms per microliter.
ProNex Diluted Bound Complex Volume (uL) No Enter a positive integer. Units are in microliters.
ProNex Diluted Bound Complex Concentration (ng/uL) No Enter a positive integer. Units are in nanograms per microliter.
ProNex Purified Complex Volume (uL) No Enter a positive integer. Units are in microliters.
ProNex Purified Complex Concentration (ng/uL) No Enter a positive integer. Units are in nanograms per microliter.
Requested Cells Alternate (cells) No Enter a positive integer.
Requested OPLC Alternate (pM) No Enter a positive integer. Units are in picomolar.
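Several fields in the table above are constrained: a fixed list of Binding Kit barcodes, two allowed bead types, positive integers, and extra fields required when Application is Custom. Because of this, it can help to check a hand-edited CSV before clicking Import. The following sketch applies only the rules stated in this table to rows read with Python's `csv` module, under these assumptions:
• Column headers match the Field name column exactly.
• The Application column is named `Application`.
• The `Lxxxxx` prefix of a Binding Kit barcode stands for "L" plus five lot-specific characters.

SMRT Link may enforce further rules that this sketch does not cover.

```python
import csv
import io

# Part-number portion of each Binding Kit barcode listed in the table. The
# leading "Lxxxxx" is assumed to be "L" plus five lot-specific characters.
BINDING_KIT_SUFFIXES = {
    "101780500123199": "2.0",
    "101820500123199": "2.1",
    "101894200123199": "2.2",
    "102194200123199": "3.1",
    "102194100123199": "3.2",
}
# Fields the table marks as required when Application is Custom.
CUSTOM_REQUIRED = [
    "Target Annealing Concentration (nM)",
    "Target Binding Concentration (nM)",
    "Target Polymerase Concentration (X)",
    "Binding Time (hours)",
    "Cleanup Bead Type",
    "Cleanup Bead Concentration (X)",
    "Percent of Annealing Reaction To Use In Binding (%)",
]
# Fields the table says must be positive integers (when present).
POSITIVE_INT_FIELDS = [f for f in CUSTOM_REQUIRED if f != "Cleanup Bead Type"] + [
    "Minimum Pipetting Volume (uL)",
    "Requested Cells Alternate (cells)",
    "Requested OPLC Alternate (pM)",
] + [
    f"{bead} {field}"
    for bead in ("AMPure", "ProNex")
    for field in (
        "Diluted Bound Complex Volume (uL)",
        "Diluted Bound Complex Concentration (ng/uL)",
        "Purified Complex Volume (uL)",
        "Purified Complex Concentration (ng/uL)",
    )
]


def validate_row(row):
    """Return a list of problems found in one Sample Setup import row."""
    errors = []
    kit = (row.get("Binding Kit") or "").strip()
    if not (len(kit) == 21 and kit.startswith("L") and kit[6:] in BINDING_KIT_SUFFIXES):
        errors.append(f"Binding Kit: unrecognized barcode {kit!r}")
    bead = (row.get("Cleanup Bead Type") or "").strip()
    if bead and bead not in ("AMPure", "ProNex"):
        errors.append(f"Cleanup Bead Type: {bead!r} must be AMPure or ProNex")
    for field in POSITIVE_INT_FIELDS:
        value = (row.get(field) or "").strip()
        if value and not (value.isdigit() and int(value) > 0):
            errors.append(f"{field}: {value!r} is not a positive integer")
    if (row.get("Application") or "").strip() == "Custom":
        for field in CUSTOM_REQUIRED:
            if not (row.get(field) or "").strip():
                errors.append(f"{field}: required when Application is Custom")
    return errors


sample_csv = (
    "Application,Binding Kit,Cleanup Bead Type,Binding Time (hours)\n"
    "Custom,LABCDE101894200123199,ProNex,4\n"
)
for line_no, row in enumerate(csv.DictReader(io.StringIO(sample_csv)), start=2):
    for problem in validate_row(row):
        print(f"line {line_no}: {problem}")
```

In this example, the Custom row is reported for the five Custom-required fields it omits.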
Following are the fields contained in the CSV file generated by the
Automate button in High-Throughput mode. This includes all the fields
that display in the Sample Setup page, with the volumes listed in each
table easily accessible for liquid handling automation purposes.
1 Export Version
Version number of the file format specification. Allows scripts to check
the version number to ensure compatibility across subsequent software
releases.
2 Instructions Version
Version number of SMRT Link, chemistry bundle, and parameters.
3 Sample Group Name
4 Annealing Number of Samples
5 Annealing Sample Volume
6 Annealing Master Mix Volume
7 Annealing Incubation Temperature (C)
8 Annealing Incubation Time (minutes)
9 Polymerase Stock Volume
10 Sequel II Polymerase Dilution Buffer Volume
11 Binding Number of Samples
12 Binding Annealed Sample Volume
13 Binding Master Mix Volume
14 Binding Diluted Polymerase Volume
15 Binding Incubation Temperature (C)
16 Binding Incubation Time (minutes)
17 ICD1 Sequel Complex Dilution Buffer Volume
18 ICD1 Internal Control Stock Volume
19 ICD2 Sequel Complex Dilution Buffer Volume
20 ICD2 Diluted Internal Control (ICD1) Volume
21 ICD3 Sequel Complex Dilution Buffer Volume
22 ICD3 Diluted Internal Control (ICD2) Volume
23 Cleanup S2 Sample Input Volume
24 Cleanup S2 Diluent Volume
25 Cleanup S2 Binding Buffer
26 Cleanup S3 Bead Solution Volume
27 Cleanup S5 Elution Volume
28 Cleanup S5 Elution Buffer
29 Final Loading Number of Samples
30 Final Loading Prepared Sample Volume
31 Final Loading Diluted Internal Control (ICD3) Volume
32 Final Loading Volume (micro-liter)
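The Export Version column exists so that automation scripts can detect format changes. A minimal sketch of that check follows. It assumes the file has a header row whose column names match the field names listed above, with one row per sample group. The supported version string `"1"` and the example volume are placeholders, because this guide does not state the actual values.

```python
import csv
import io

# Placeholder: set to the Export Version value(s) your script was written against.
SUPPORTED_EXPORT_VERSIONS = {"1"}


def load_automation_rows(text):
    """Parse an Automate CSV and refuse rows with an unknown Export Version."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        version = (row.get("Export Version") or "").strip()
        if version not in SUPPORTED_EXPORT_VERSIONS:
            raise ValueError(f"unsupported Export Version {version!r}")
    return rows


def final_loading_volumes(rows):
    """Map each sample group to its final loading volume (uL) for a liquid handler."""
    return {r["Sample Group Name"]: float(r["Final Loading Volume (micro-liter)"]) for r in rows}


example = (
    "Export Version,Sample Group Name,Final Loading Volume (micro-liter)\n"
    "1,Group A,110\n"
)
print(final_loading_volumes(load_automation_rows(example)))
```

Failing loudly on an unknown version is safer than guessing, because a column added or renamed in a later release would otherwise be silently misread by the liquid handler script.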
Run Design
Use SMRT Link's Run Design module to create, edit, or import Run
Designs. A Run Design specifies:
The Run Design then becomes available from the Sequel Instrument
Control Software (ICS), which is the instrument touchscreen software
used to select a Run Design, load the instrument, and then start the run.
Run Designs created in SMRT Link are accessible from all Sequel II
systems linked to the same SMRT Link server.
You can create a Run Design in either of two ways:
• Use SMRT Link’s Run Design module to create a new Run Design.
• Create a CSV file, then import it using SMRT Link’s Run Design
module.
Note: To create a Run Design, use either the Run Design screen or CSV
import. Do not mix the two methods.
5. Specify whether this Run Design is to be used with a Sequel II system or a
Sequel IIe system. This affects the initial default values.
6. Enter a Run Name. (The software creates a new run name based on
the current date and time; edit the name as needed.)
7. (Optional) Enter Run Comments, Experiment Name, and
Experiment ID as needed. (Note: Experiment ID must be
alphanumeric.)
8. (Optional) Click Select Sample to import information from a
previously created Sample Setup entry. The following fields are
auto-populated as appropriate:
– Sample Name
– Binding Kit
– DNA Control Complex
– Insert Size
– On-Plate Loading Concentration
9. Select a sequencing application from the list. The following fields are
auto-populated, and display in green:
– Template Prep Kit
– Binding Kit
– Sequencing Kit
– DNA Control Complex
– Movie Time per SMRT Cell (hours)
– Pre-Extension Time (hours)
10. Enter a Well Sample Name. (This is the name of the sequencing
library loaded into one well. Example: HG002_2019_11_02_10K)
11. Enter a Bio Sample Name. (This is the name of the biological sample
contained in the sequencing library, such as HG002. See “Working
with barcoded data” on page 107 for details.)
12. (Optional) Enter Sample Comments.
13. Specify the well position used for this sample: Click the icon to the
right of the entry field and choose a plate position.
14. Specify an insert size (500 base pairs minimum). The insert size is
the length of the double-stranded nucleic acid fragment in a SMRTbell
template, excluding the hairpin adapters. This value should match the
average insert size for the sample; the size range boundaries are
described in the library preparation protocol. Note: The default insert
size is 30,000 bp for Subreads and 10,000 bp for CCS reads.
15. Specify the On-Plate Loading Concentration (OPLC), in pM (picomolar).
16. (Optional) If you are using barcoded samples, see “Step 1: Specify the
barcode setup and sample names in a Run Design” on page 107 for
instructions. For details on secondary analysis of barcoded samples,
see “Demultiplex Barcodes” on page 92.
17. Sample options:
– Click Copy. This starts a new sample, using the values entered in
the first sample.
– Click Delete. This deletes the current sample.
– Click Add Sample. This starts a new, empty sample.
18. After filling in all the samples, click Save to save the entire Run
Design. The new Run Design displays on the main Run Design page.
19. Click View Summary to view a table summarizing the entire Run
Design. The Run Design is now available for selection in Sequel ICS
on the instrument.
20. (Optional) Auto Analysis allows a specific analysis to run
automatically after a sequencing run has finished and the data has been
transferred to the SMRT Link server. See “Automated analysis” on
page 115 for details.
– Template Prep Kit, Binding Kit, or Sequencing Kit: Select one from
the list, or type in a kit part number. If the barcode is invalid, "Invalid
barcode" displays.
Note: If the Sequencing or Binding kit selected is incompatible, an
error message displays indicating the obsolete chemistry, and the
run is prevented from proceeding.
– DNA Control Complex: PacBio highly recommends using the
Internal Control to help distinguish between sample quality and
instrument issues in the event of suboptimal sequencing
performance. (Note: PacBio requires the use of the Internal Control
for consumables to be eligible for reimbursement consideration.)
– Movie time per SMRT Cell (hours): Enter a time between 0.5 and
30. Note: The SMRT Cell 8M part supports all movie times up to 30
hours.
– Use Pre-Extension: If selected, optionally specify the length of pre-
extension time in hours. This initiates the sequencing reaction prior
to data acquisition. After the specified time, the sequencing
reagents are removed from the SMRT Cell and replenished with
fresh reagents, and data acquisition starts. This feature is useful for
short inserts (such as ≤15 kb) and provides a significant increase in
read length.
– Include 5mC Calls in CpG Motifs: If selected, analyzes the kinetic
signatures of cytosine bases in CpG motifs to identify the presence
of 5mC.
– Detect and Resolve Heteroduplex Reads: Heteroduplexes are DNA
molecules where the forward and reverse strands are not perfect
reverse-complements. If the option is selected and heteroduplexes
are detected, a consensus is called for each strand separately, and
the sequence of both strands is output.
Note: This option displays only if Adeno-Associated Virus, Full-
Length 16S rRNA Sequencing, <3kb Amplicons, or >=3kb
Amplicons are selected as the application.
Advanced options
• Specify whether to use Adaptive Loading. Adaptive Loading uses
active monitoring of the ZMW loading process to predict a favorable
loading end point. Certain steps (Cleanup and Sample Dilution)
require a different buffer (Adaptive Loading Buffer) if this feature is
used. Note: Adaptive Loading requires the use of Sequel® II binding
kit 2.2. If you select Yes, fill in the following fields:
– Loading Target (P1 + P2): The fraction of ZMWs that the Adaptive
Loading routine will aim to load with at least one sequencing
complex. The default target for CCS applications is higher to
accommodate loss of complexes during pre-extension, which is
generally recommended for all CCS applications.
– Maximum Loading Time (hours): This defines the maximum time
the system will allow loading to progress before proceeding to
sequencing. (Loading time in Adaptive Loading is flexible.)
• Specify the length of time (1, 2 or 4 hours) for immobilization of
SMRTbell templates. This is the length of time the SMRT Cell is at the
Cell Prep Station to allow diffusion of SMRTbell templates into the
ZMWs. This option is not available if Adaptive Loading is selected.
– PacBio highly recommends using the default immobilization time
of 2 hours.
• (Sequel IIe systems only) Specify, for this Run Design only, whether to
include kinetics information (used for epigenetics analysis) in the CCS
analysis output. This setting overrides the global setting in Gear >
Configure > CCS Analysis Output. Note: Adding kinetics information
can increase the storage used by the output BAM files by up to 5
times.
• Specify, for this Run Design only, whether to include low quality reads
(non-HiFi reads) in the CCS analysis output. Note that this option
disables automatic demultiplexing, 5mC detection, and heteroduplex
insert detection, if applicable.
• Add Data to Project: Specify that Data Sets generated by SMRT
Cell(s) using this Run Design be associated with the selected Project.
(This also applies to any Data Sets generated using Auto Analysis. By
default, all Data Sets are assigned to General Project, which is
accessible to all users.)
1. Update the appropriate CSV file as necessary for the Run Design. (See
the definitions of the Run Design attributes in the table below.)
2. Save the edited CSV file.
3. Import the file into Sequel ICS using SMRT Link. To do so, first access
SMRT Link using the Chrome web browser.
4. Select Run Design.
5. Click Import Run Design.
6. Select the saved CSV file designed for the run and click Open. The file
is now imported and available for selection in Sequel ICS on the
instrument.
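Because the import is driven entirely by the CSV file, it can be convenient to generate the file from a script. The sketch below does three things:
• Writes one collection row, using column names from the Run Design attribute table in this guide.
• Checks the Sample Well and Well Sample Name values against the rules given there.
• Builds the Automation Parameters string for Pre-Extension in the documented ExtensionTime=double:2|ExtendFirst=boolean:TRUE form.

It covers only a subset of columns, and the kit barcodes are the table's working examples. The overall file layout (one header row followed by one row per collection) is an assumption. Check it against a known-good file for your installation before importing.

```python
import csv
import io
import re

# Patterns taken from the Run Design attribute table.
SAMPLE_WELL_RE = re.compile(r"^[A-H](?:0[1-9]|1[0-2])$")
NAME_RE = re.compile(r"[A-Za-z0-9 _:.\-]+")  # alphanumerics, spaces, - _ : .


def pre_extension_params(hours):
    """Build the Automation Parameters value that enables Pre-Extension."""
    return f"ExtensionTime=double:{hours}|ExtendFirst=boolean:TRUE"


def collection_row(well, name, movie_hours, insert_bp, pre_ext_hours=None):
    """Return one collection row (a subset of columns) after basic checks."""
    if not SAMPLE_WELL_RE.fullmatch(well):
        raise ValueError(f"Sample Well {well!r} must be A01 through H12")
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"Well Sample Name {name!r} has disallowed characters")
    if not 0.1 <= movie_hours <= 30:
        raise ValueError("Movie Time per SMRT Cell must be 0.1 to 30 hours")
    if insert_bp < 10:
        raise ValueError("Insert Size (bp) must be an integer of at least 10")
    return {
        "Is Collection": "TRUE",
        "Sample Well": well,
        "Well Sample Name": name,
        "Movie Time per SMRT Cell (hours)": movie_hours,
        "Insert Size (bp)": insert_bp,
        # Working-example barcodes from the table; replace with your kits.
        "Template Prep Kit Box Barcode": "DM1117100259100111716",
        "Binding Kit Box Barcode": "DM1117100862200111716",
        "Sequencing Kit Box Barcode": "DM0001100861800123120",
        # Leave blank when not using Pre-Extension, as the table describes.
        "Automation Parameters": (
            pre_extension_params(pre_ext_hours) if pre_ext_hours else ""
        ),
    }


rows = [collection_row("A01", "HG002_2019_11_02_10K", 30, 10000, pre_ext_hours=2)]
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```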
The Sequel IIe system can be configured to output Subreads data in BAM
format by using the Run Design CSV import mechanism. In addition to the
other required columns, users can add the column Emitted Subreads
Percent to the CSV file, with a value of 0-100 for a given collection. This
results in the inclusion of Subreads from 0-100% of ZMWs in the Data Set
transferred from the instrument, in a BAM file separate from the HiFi
reads. Note that this will not result in the inclusion of associated scraps
data for each ZMW.
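For example, to request Subreads from a fraction of ZMWs, a script can append the Emitted Subreads Percent column to an existing Run Design CSV before import. The sketch below sets the same value on every collection row. Two details are assumptions made for illustration: any row whose Is Collection value is not FALSE is treated as a collection row, and TRUE/FALSE are compared case-insensitively. The example rows are hypothetical.

```python
import csv
import io

COLUMN = "Emitted Subreads Percent"


def add_emitted_subreads(csv_text, percent):
    """Return CSV text with the Emitted Subreads Percent column set on collection rows."""
    if not 0 <= percent <= 100:
        raise ValueError(f"{COLUMN} must be between 0 and 100")
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames)
    if COLUMN not in fieldnames:
        fieldnames.append(COLUMN)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Assumption: any row not explicitly marked FALSE is a collection row.
        is_collection = (row.get("Is Collection") or "").strip().upper() != "FALSE"
        row[COLUMN] = percent if is_collection else ""
        writer.writerow(row)
    return out.getvalue()


original = "Is Collection,Sample Well,Well Sample Name\nTRUE,A01,Lib1\nFALSE,A01,Lib1\n"
print(add_emitted_subreads(original, 25), end="")
```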
Run Design attribute Required Description
Is Collection No Enter a Boolean value. (See Boolean details below.) Specifies whether
the row designates a Collection (TRUE) or a barcoded sample
(FALSE).
• Collection lines should have the Barcode Name and Bio Sample
Name fields blank.
• Barcoded Sample lines only need to include the Is Collection,
Sample Name, Barcode Name, and Bio Sample Name fields.
Sample Well Yes Must be specified in every row. Well number must start with a letter A
through H, and end in a number 01 through 12, i.e. A01 through H12.
It must satisfy the regular expression ``/^[A-H](?:0[1-9]|1[0-2])$/``
Example: A01
Well Sample Name Yes Enter alphanumeric characters, spaces, hyphens, underscores, colons,
or periods only.
Example: A6_3230046_A01_SB_ChemKitv2_8rxnKit
Note: The Sample Name must be unique within a run.
Movie Time per SMRT Cell (hours) Yes Enter a floating point number between 0.1 and 30. Time is in hours.
Example: 5
Use Adaptive Loading No Enter a Boolean value. (See Boolean details below.)
Loading Target (P1 + P2) No Enter a floating point number between 0.01 and 1. Example: 0.4
Maximum Loading Time (hours) No Enter a floating point number between 1 and 2. Time is in hours.
Example: 1.2
Sample Comment No Enter alphanumeric characters, spaces, hyphens, underscores, colons,
or periods only.
Example: A6_3230046_A01_SB_BindKit_ChemKit
Insert Size (bp) Yes Enter an integer ≥10. Units are in base pairs. Example: 2000
On Plate Loading Concentration (pM) No Enter a floating point number. Units are in picomolar.
Example: 5
Size Selection No Enter a Boolean value. (See Boolean details below.) Default is FALSE.
Template Prep Kit Box Barcode Yes Enter or scan a valid kit barcode. (See Kit Barcode Requirements
details below.)
Working example: DM1117100259100111716
DNA Control Complex Box Barcode No Enter or scan a valid kit barcode. (See Kit Barcode Requirements
details below.)
Working example: DM1234101084300123120
Binding Kit Box Barcode Yes Enter or scan a valid kit barcode. (See Kit Barcode Requirements
details below.)
Working example: DM1117100862200111716
Sequencing Kit Box Barcode Yes Enter or scan a valid kit barcode. (See Kit Barcode Requirements
details below.)
Working example: DM0001100861800123120
Automation Name No Enter diffusion (not case-sensitive) or a custom script. (Sequel II
systems do not support magbead loading.)
A path can also be used, such as
/path/to/my/script/my_script.py. The path will not be
processed further, so if the full URI is required, it must be included in
the CSV, such as
chemistry://path/to/my/script/my_script.py.
Automation Parameters No To enable Pre-Extension time, enter the number of hours and set the
boolean value to TRUE. Example (2 hours):
ExtensionTime=double:2|ExtendFirst=boolean:TRUE
(Note: Leave blank when not using Pre-Extension time, or set the
boolean value to FALSE.)
Detect and Resolve Heteroduplex Reads No Enter a boolean value. (See Boolean details below.) Set to TRUE to
allow for detection of heteroduplex reads.
Note: Only applicable if Application is set to one of the following:
• Adeno-Associated Virus
• <3kb Amplicons
• >=3kb Amplicons
• Custom
Include 5mC Calls in CpG Motifs No Enter a boolean value. (See Boolean details below.) Set to TRUE to
allow for 5mC calls in CpG motifs.
Note: Only applicable if Application is set to HiFi Reads or Custom.
Sample is Barcoded No Enter a boolean value. (See Boolean details below.) Set to TRUE for a
barcoded run.
Demultiplex Barcodes No Enter one of the following values: Do Not Generate, In SMRT Link, or
On Instrument.
If left blank, the default is Do Not Generate for all systems.
Note: This is available for all applications. The following values are
recommended based upon your system:
• Sequel II system: Enter one of the following values: Do Not
Generate or In SMRT Link.
• Sequel IIe system: Enter one of the following values: Do Not
Generate, In SMRT Link, or On Instrument.
CCS Analysis Output - Include Low Quality Reads No Enter a boolean value. (See Boolean details below.)
• Set to TRUE to allow for CCS analysis with --all mode activated
and produce a reads.bam file.
• Set to FALSE to exclude all reads with rq < 0.99.
Barcode Set No Must be a UUID for a Barcode Set present in the database.
To find the UUID: Click Data Management > View Data > Barcodes.
Click the Barcode file of interest, then view the UUID.
Example: dad4949d-f637-0979-b5d1-9777eff62008
Note: This field is used for demultiplexed data.
Same Barcodes on Both Ends of Sequence No Enter a boolean value. (See Boolean details below.) Set to TRUE if
symmetric, FALSE if asymmetric.
Barcode Name No Enter Barcode Names one per line.
Example: bc1001--bc1001
• Use double hyphens (--) to separate the 2 barcodes of each pair.
• The barcode names must be contained within the specified
Barcode Set.
• A given barcode name cannot appear more than once in the
spreadsheet.
• A maximum of 15,000 barcodes is permitted per sample.
Bio Sample Name Yes Enter Bio Sample Names in the same row as their associated Barcode
Names. Use alphanumeric characters, spaces (allowed but not
recommended for compatibility with downstream software), hyphens,
underscores, colons, or periods only. Bio Sample Names cannot be
longer than 40 characters.
Example: sample1
Note: This field is used for collections for non-multiplexed data, and
for barcoded samples in multiplexed data.
Example: PacBio.DataSet.BarcodeSet;eid_barcode;afe89e3f-17ca-e9b8-eae9-b701dbb1f02d|PacBio.DataSet.ReferenceSet;eid_ref_dataset;6b8db144-a601-4577-ab04-ba64cadc0548
Task Options No Enter an ASCII string containing the options for the application
referred to in the Pipeline ID field, with parameters separated by “;”
characters: task_id;value_type;value.
Example: pbmm2_align.task_options.minalnlength;integer;50
Note: This field is optional for Auto Analysis - any task options not
specified will use pipeline defaults.
Boolean values
• Valid boolean values for true are: true, t, yes, or y.
• Valid boolean values for false are: false, f, no, or n.
• Boolean values are not case-sensitive.
For the above example, the full kit barcode would be:
DM1234100619300123120.
Each kit must have a valid Part Number and cannot be obsolete. The list
of kits can be found through a services endpoint such as:
This services endpoint will list, for each kit, the part numbers
(PartNumber) and whether it is obsolete (IsObsolete).
Dates must also be valid, meaning they must exist in the Gregorian
calendar.
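The per-field rules in this table can be checked before a CSV is uploaded. The Python sketch below covers the Sample Well pattern, Boolean values, Bio Sample Name limits, Barcode Name pairs, and the Task Options format. It is an illustration, not SMRT Link's own validator: the function names are hypothetical, "alphanumeric" is read as ASCII letters and digits, every Barcode Name is assumed to be a first--second pair, and Task Options value types other than integer are placeholders.

```python
import re

# Sample Well: a letter A-H followed by 01-12 (A01 through H12).
WELL_PATTERN = re.compile(r"[A-H](?:0[1-9]|1[0-2])")

# Bio Sample Name: alphanumerics, spaces, hyphens, underscores, colons, periods.
# Assumption: "alphanumeric" means ASCII letters and digits only.
NAME_PATTERN = re.compile(r"[A-Za-z0-9 _:.\-]+")
MAX_BIO_SAMPLE_NAME_LENGTH = 40
MAX_BARCODES_PER_SAMPLE = 15_000

TRUE_VALUES = {"true", "t", "yes", "y"}
FALSE_VALUES = {"false", "f", "no", "n"}


def is_valid_well(well: str) -> bool:
    """True if the Sample Well value is A01 through H12."""
    return WELL_PATTERN.fullmatch(well) is not None


def parse_boolean(value: str) -> bool:
    """Parse a Run Design boolean; the accepted spellings are not case-sensitive."""
    lowered = value.lower()
    if lowered in TRUE_VALUES:
        return True
    if lowered in FALSE_VALUES:
        return False
    raise ValueError(f"not a valid boolean value: {value!r}")


def is_valid_bio_sample_name(name: str) -> bool:
    """Allowed characters only, and no longer than 40 characters."""
    return (len(name) <= MAX_BIO_SAMPLE_NAME_LENGTH
            and NAME_PATTERN.fullmatch(name) is not None)


def check_barcode_names(names: list[str], barcode_set: set[str]) -> None:
    """Validate the Barcode Name values of one sample against its Barcode Set.

    Assumption: every value is a pair written as first--second.
    """
    if len(names) > MAX_BARCODES_PER_SAMPLE:
        raise ValueError("more than 15,000 barcodes in one sample")
    if len(names) != len(set(names)):
        raise ValueError("a barcode name appears more than once")
    for pair in names:
        parts = pair.split("--")
        if len(parts) != 2 or not all(part in barcode_set for part in parts):
            raise ValueError(f"invalid barcode pair: {pair!r}")


# Converters keyed by value_type. Only "integer" appears in the guide's
# example; the other type names are hypothetical placeholders.
VALUE_CONVERTERS = {
    "integer": int,
    "float": float,
    "boolean": parse_boolean,
    "string": str,
}


def parse_task_option(option: str) -> tuple[str, object]:
    """Split task_id;value_type;value and convert value according to value_type."""
    task_id, value_type, value = option.split(";", 2)
    return task_id, VALUE_CONVERTERS[value_type](value)
```

For example, parse_task_option("pbmm2_align.task_options.minalnlength;integer;50") returns the task ID paired with the integer 50.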
Run QC
Use SMRT Link’s Run QC module to monitor performance trends and
perform run QC remotely.
• Run Completion: Displays the estimated time remaining to complete the
sequencing run or the time elapsed since the sequencing run
completed. Also displays the date (in YYYY-MM-DD format) when the
last sequencing run was completed.
• Sequencing ZMWs: Displays a plot of how many ZMWs on a SMRT
Cell are actively sequencing during a movie collection. For
sequencing runs conducted with Binding kit 2.2 and 3.2, only the
number of actively sequencing singly-loaded ZMWs (P1) displays.
For sequencing runs conducted with Binding kit 2.1 and 3.1, the total
number of actively sequencing ZMWs (P1 + P2) displays.
Note: Due to terminations, not all ZMWs are singly-loaded at the same
time. Some ZMWs are singly-loaded only at or near the end of a movie
collection, whereas others are singly-loaded only at the beginning.
(Singly-loaded means that the ZMW contains only one active polymerase
instead of two or more simultaneously active polymerases.) For runs
conducted with Binding kit 2.2 and 3.2, the peak concurrent Sequencing
ZMWs value shown in the plot will always be less than the final %P1 ZMW
yield reported in the Run QC metrics table at the end of a movie collection.
(For sequencing runs conducted with Binding kit 2.1 and 3.1, the peak
concurrent Sequencing ZMWs value shown in the plot will always be
higher than the final %P1 ZMW yield reported in Run QC.)
For a SMRT Cell that achieves ≥50% P1 loading and ≥10% P0, the Sequencing
ZMWs plot should typically display a peak value above 2,000,000.
See the figure below for an example comparison between the Instrument
Status report (top) and Run QC report (bottom) for a WGS sample
sequenced using Binding kit 3.2 with a 30-hour movie collection time.
The Sequencing ZMWs plot in the Instrument Status report shows that
the peak concurrent Sequencing ZMWs value for the last SMRT Cell in the
run (Well D01) is approximately 3,000,000 ZMWs, whereas the final %P1
ZMW yield reported in the corresponding Run QC metrics table for Well
D01 is 76.8% (or 6,144,000 P1 ZMWs).
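The Well D01 numbers can be checked with simple arithmetic. This sketch assumes a SMRT Cell 8M with 8,000,000 ZMWs, which matches 76.8% corresponding to 6,144,000 P1 ZMWs; the function name is hypothetical.

```python
# Assumption: SMRT Cell 8M, 8,000,000 ZMWs per cell.
ZMWS_PER_SMRT_CELL = 8_000_000


def p1_zmw_count(p1_percent: float, total_zmws: int = ZMWS_PER_SMRT_CELL) -> int:
    """Convert a %P1 yield from the Run QC metrics table into a ZMW count."""
    return round(total_zmws * p1_percent / 100)


final_p1 = p1_zmw_count(76.8)      # final %P1 ZMW yield for Well D01
peak_sequencing = 3_000_000        # approximate peak in the Sequencing ZMWs plot
print(final_p1)                    # 6144000
print(peak_sequencing < final_p1)  # True
```

With Binding kit 3.2, the approximately 3,000,000 peak in the plot is below the final P1 count, as the note above describes.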
Accessing run information
Table fields
Note: Not all table fields are shown by default. To see additional table
fields, click the > symbol next to a column title.
• Name: A list of all runs for the instruments connected to SMRT Link.
Click a run name to view more detailed information on the individual
run page.
• Summary: A description of the run.
• Dates
– Run Date: The date and time when the run was started.
– Completion Date: The date and time the run was completed.
– Transferred Date: The date and time the run results were
transferred to the network.
• Created By: The name of the user who created the run.
• Status: The current status of the run. Can be one of the following:
Running, Complete, Failed, Terminated, or Unknown.
• Instrument Details
– Instrument Name: The name of the instrument.
– Instrument SN: The serial number of the instrument.
– Instrument SW: The versions of Sequel Instrument Control
Software (ICS) installed on the instrument.
• Cells
– Total: The total number of SMRT Cells used in the run.
– Completed: The number of SMRT Cells that generated data for the
run.
– Failed: The number of SMRT Cells that failed to generate data
during the run.
• Run ID: An internally generated ID number identifying the run.
• Primary Analysis SW: The version of Primary Analysis software
installed on the instrument.
• UUID: A second internally generated identifier for the run, in
universally unique identifier (UUID) format.
6. Click the name of the run of interest. The following fields and metrics
are displayed:
• Run Start: The date and time when the run was started.
• Run Complete: The date and time the run was completed.
• Transfer Complete: The date and time that the run data was
successfully transferred to the network.
• Run ID: An internally generated ID number identifying the run.
• Description: The description, as defined when creating the run.
• Instrument: The name of the instrument.
• Instrument SN: The serial number of the instrument.
• Instrument Control SW Version: The versions of Sequel Instrument
Control Software (ICS) installed on the instrument.
• Instrument Chemistry Bundle: The version of the Chemistry Bundle
installed on the instrument when the run was initiated.
• Primary SW Version: The versions of Primary Analysis software
installed on the instrument.
7. Click the > arrow at the top of the Consumables table to see the
sample wells used, consumable type, lot number, expiration date, and
other information.
Run settings and metrics
Note: Click Expand All to expand all of the table columns. Click Collapse
All to collapse the table columns.
– Polymerase Read Length N50: 50% of all read bases came from
polymerase reads longer than this value.
– Longest Subread Mean: The mean subread length, considering only
the longest subread from each ZMW.
– Longest Subread N50: 50% of all read bases came from subreads
longer than this value when considering only the longest subread
from each ZMW.
• Control
– Poly RL Mean (bp): The mean polymerase read length of the control
reads.
– Total Reads: The number of control reads obtained.
– Concordance Mean: The average concordance (agreement)
between the control raw reads and the control reference sequence.
– Concordance Mode: The most frequent (modal) concordance (agreement)
between the control raw reads and the control reference sequence.
• Local Base Rate: The average base incorporation rate, excluding
polymerase pausing events.
• Template
– Missing Adapter (%): The percent of pre-filter ZMWs that are
missing adapters.
– Adapter Dimer: The percent of pre-filter ZMWs which have
observed inserts of 0-10 bp. These are likely adapter dimers.
– Short Insert: The percent of pre-filter ZMWs which have observed
inserts of 11-100 bp. These are likely short fragment
contamination.
8. View plots for each SMRT Cell whose data was successfully transferred.
Click an individual plot to display an expanded view.
These plots include:
• Polymerase Read Length: Plots the number of reads against the
polymerase read length.
• Control Polymerase RL: Displays the polymerase read length
distribution of the control, if used.
• Control Concordance: Maps control reads against the known control
reference and reports the concordance.
• Base Yield Density: Displays the number of bases sequenced in the
collection, according to the length of the read in which they were
observed. Values displayed are per unit of read length (i.e. the base
yield density) and are averaged over 2000 bp windows to gently
smooth the data. Regions of the graph corresponding to bases found
in reads longer than the N50 and N95 values are shaded in medium
and dark blue, respectively.
• Read Length Density: Displays a density plot of reads, hexagonally
binned according to their high-quality read length and median subread
length. For very large insert libraries, most reads consist of a single
subread and will fall along the diagonal. For shorter inserts, subreads
will be shorter than the HQ read length, and will appear as horizontal
features. This plot is useful for quickly visualizing aspects of library
quality, including insert size distributions, reads terminating at
adapters, and missing adapters.
• HiFi Read Length Distribution: Displays a histogram distribution of
HiFi reads (QV ≥20), other CCS reads (three or more passes, but QV
<20), and other reads, by read length.
• Read Quality Distribution: Displays a histogram distribution of HiFi
reads (QV ≥20) and other CCS reads by read quality.
• Read Length vs Predicted Accuracy: Displays a heat map of CCS read
lengths and predicted accuracies. The boundary between HiFi reads
and other CCS reads is shown as a dashed line at QV 20.
• 5mC Detections: If 5mC calling in CpG motifs was performed, this
plot displays a reverse cumulative distribution of all detected CpG
motifs according to their predicted probability of methylation.
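The N50 and Longest Subread metrics listed under Run settings and metrics, and the read categories used by the HiFi Read Length Distribution plot, can be sketched in Python as follows. This is an illustrative sketch, not the code behind Run QC: it uses the conventional N50 definition (reads at least this long), assumes HiFi reads also meet the three-pass CCS requirement, and the 1,000 bp bin width and all function names are hypothetical.

```python
from collections import Counter
from statistics import mean

HIFI_MIN_QV = 20     # HiFi reads: QV >= 20
CCS_MIN_PASSES = 3   # CCS reads: three or more passes


def n50(lengths: list[int]) -> int:
    """Largest length L such that reads of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("no reads")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            break
    return length


def longest_subread_stats(subreads_by_zmw: dict[str, list[int]]) -> tuple[float, int]:
    """Longest Subread Mean and N50, keeping only the longest subread of each ZMW."""
    longest = [max(lengths) for lengths in subreads_by_zmw.values() if lengths]
    return mean(longest), n50(longest)


def classify_read(num_passes: int, read_qv: float) -> str:
    """Assign a read to the categories used by the HiFi Read Length Distribution plot."""
    if num_passes < CCS_MIN_PASSES:
        return "other"
    return "HiFi" if read_qv >= HIFI_MIN_QV else "other CCS"


def length_histogram(reads, bin_width: int = 1000) -> Counter:
    """Count (category, length bin) pairs; each read is (length, num_passes, read_qv)."""
    counts = Counter()
    for length, num_passes, read_qv in reads:
        category = classify_read(num_passes, read_qv)
        counts[(category, length // bin_width * bin_width)] += 1
    return counts
```

For reads of 2, 3, 4, 5 and 6 bp, n50 returns 5: the 6 bp and 5 bp reads together hold 11 of the 20 bases.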
Data Management
Use the Data Management module to:
A Data Set can contain sequencing data from one or multiple SMRT Cells
or collections from different runs, or a portion of a collection with
multiplexed samples.
Some Data Sets can contain basecalled data, while others can contain
analyzed data:
Elements within a Data Set are of the same data type, typically subreads
or consensus reads, in aligned or unaligned format.
the Data Management page.) General Project: This Data Set will be
visible to all SMRT Link users. All My Projects: This Data Set will be
visible only to users who have access to Projects that you are a
member of.
Note: Selecting a Project also filters the Data Sets that you can use
when creating the new Data Set.
8. In the Data Sets table, select one or more sets of sequence data.
9. (Optional) Choose how to view the Data Set table: 1) Tree Mode - A
barcoded Data Set displays as one row. 2) Flat Mode - A barcoded
Data Set and its demultiplexed subsets display as separate rows.
10. (Optional) Use the Search function to search for specific Data Sets.
See “Appendix B - Data search” on page 136 for details.
11. (Optional) If you selected one Data Set only, click the Filter Reads by
Length box above the Data Set list. Enter the minimum and/or maximum
length to retain in the new Data Set.
12. (Optional) If you selected one Data Set only, click the Filter Reads by
QV≥ box above the Data Set list. Enter the minimum quality value to
retain in the new Data Set.
13. Click Save Data Set. The new Data Set becomes available for
starting analyses, viewing, or generating reports.
14. After the Data Set is created, click its name in the main Data
Management screen to see reports, metrics, and charts describing the
data included in the Data Set. See “Data Set QC reports” on page 37
for details.
– Subreads: Reads containing the sequence from one or more single
passes of a polymerase on a single strand of an insert within a
SMRTbell template.
The Data Sets table displays the appropriate Data Sets available.
3. (Optional) Use the Search function to search for Data Sets. See
“Appendix B - Data search” on page 136 for details.
4. Click the name of the Data Set to copy. The Data Set Reports page
displays.
5. Click Copy. The main Data Management page displays; the new Data
Set has (copy) appended to the name.
The following reports are generated by default:
• The Data Set Name, ID, description, and when it was created and
updated.
• The number of reads and their total length in base pairs.
• The names of the run and instrument that generated the data.
• The biological sample name and well sample names of the sample
used to generate the data.
• Path to the location on your cluster where the data is stored, which
can be used for command-line navigation. For information on
command-line usage, see SMRT® Tools reference guide (v11.0).
Completed Analyses
Lists all completed analyses that used the Data Set as input. To view
details about a specific analysis, click its name.
What is a Project?
• Projects are collections of Data Sets, and can be used to restrict
access to Data Sets to a subset of SMRT Link users.
• By default, all Data Sets and data belong to the General Project and
are accessible to all users of SMRT Link.
• Any SMRT Link user can create a Project and be the owner. Projects
must have an owner, and can have multiple owners.
• Unless a Project is shared with other SMRT Link users, it is accessible
only to the owner.
• Only owner(s) can delete a Project; deleting a Project deletes all Data
Sets and analyses that are part of the Project.
Projects include:
Creating a Project
2. Select Data Management.
3. Click + Create Project.
4. Enter a name for the new project.
5. (Optional) Enter a description for the project.
6. Click Select Data Sets and select one or more sets of sequence data
to associate with the project.
– (Optional) Use the Search function to search for Data Sets. See
“Appendix B - Data search” on page 136 for details.
7. (Optional) Share the Project with other SMRT Link users. (Note:
Unless a Project is shared, it is only visible to the owner.) There are
two ways to specify who can access the new Project, using the
controls in the Members section:
– Access for all SMRT Link Users: None - No one can access the
Project other than the user who created it; View - Everyone can view
the Project; View/Edit - Everyone can view and edit the Project.
– Access for Individual SMRT Link Users: Enter a user name and
click Search By Name. Choose Owner, View, or View/Edit, then click
Add Selected User.
– Notes: A) Projects can have multiple owners. B) If you enable all
SMRT Link users to have View/Edit access, you cannot change an
individual member's access to View.
8. Click Save. The new project becomes available for SMRT Link users
who now have access.
Editing a Project
1. On the home page, select Data Management.
2. Click View > Projects.
3. Sort or search the Projects as needed:
– To sort Projects: Click a column title.
– To search for a Project, use the Search function. See “Appendix B -
Data search” on page 136 for details.
4. Click the name of the project to edit.
– (Optional) Edit the Project name or description.
– (Optional) Delete a Data Set associated with the Project: Click X.
– (Optional) Add one or more sets of sequence data to the Project:
Click Select Data Sets and select one or more Data Sets to add.
– (Optional) Delete members: Click X next to a Project member's
name to delete that user from access to the Project.
– (Optional) Add members to the Project: See Step 7 in Creating a
Project.
5. Click Save. The modified Project is saved.
Deleting a Project
1. On the home page, select Data Management.
2. Click View > Projects.
3. Click the name of the Project to delete.
4. Click Delete. (This deletes all Data Sets and analyses that are part of
the Project from SMRT Link, but not from the server.)
Note: The Copy button is available for Subreads and HiFi reads, but not
for Reference and Barcode data.
4. Select the data type to import:
– Subreads: XML file (.subreadset.xml) or ZIP file containing
information about subreads from Sequel II systems, such as paths
to the BAM files.
Use only ZIP files created by SMRT Link.
– HiFi reads: XML file (.consensusreadset.xml) or ZIP file
containing information about HiFi reads (reads generated with CCS
analysis whose quality value is equal to or greater than 20.)
Use only ZIP files created by SMRT Link.
– Barcodes: FASTA (.fa or .fasta), XML (.barcodeset.xml), or ZIP
files containing barcodes.
– References: FASTA (.fa or .fasta), XML (.referenceSet.xml), or
ZIP files containing a reference sequence for use in starting
analyses. (Note: If importing from a local system, Reference files
must be smaller than 15 MB.)
– Note: FASTA files imported into SMRT Link must not contain empty
lines or non-alphanumeric characters. The file name must not start
with a number.
5. Navigate to the appropriate file and click Import. The sequence data,
reference, or barcodes are imported and become available in SMRT
Link.
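Before importing, the FASTA rules in step 4 can be checked locally. This Python sketch assumes the character rule applies to sequence lines (header lines necessarily start with >), reads 15 MB as 15,000,000 bytes, and uses a hypothetical function name; SMRT Link's own checks may differ.

```python
import os

# Assumption: "smaller than 15 MB" is read as fewer than 15,000,000 bytes.
MAX_LOCAL_REFERENCE_BYTES = 15_000_000


def fasta_problems(path: str, is_reference: bool = False) -> list[str]:
    """Return human-readable reasons why a FASTA file may be rejected on import."""
    problems = []
    if os.path.basename(path)[:1].isdigit():
        problems.append("file name starts with a number")
    if is_reference and os.path.getsize(path) >= MAX_LOCAL_REFERENCE_BYTES:
        problems.append("reference file is not smaller than 15 MB")
    # Non-ASCII bytes are replaced with U+FFFD, which isalnum() rejects.
    with open(path, encoding="ascii", errors="replace") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                problems.append(f"line {line_number}: empty line")
            elif not line.startswith(">") and not line.isalnum():
                problems.append(f"line {line_number}: non-alphanumeric characters")
    return problems
```

An empty list means none of these rules was violated; it does not guarantee that the import succeeds.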
– Barcodes: Files containing barcodes.
– References: Files containing a reference sequence for use in
starting analyses.
4. (Optional) Use the Search function to search for Data Sets, barcode
files, or reference files. See “Appendix B - Data search” on page 136
for details.
5. Select one or more sets of data to export. (Multiple data files are
combined as one ZIP file for export.)
6. Click Export Selected.
SMRT® Analysis
After a run has completed, use SMRT Link’s SMRT Analysis module to
perform secondary analysis of the data.
• To filter the list of jobs based on the Project(s) that they are
associated with: Click the Projects menu (located at the top-right of
the main SMRT Analysis page) and select a Project. See “What is a
Project?” on page 39 for details.
4. Click + Create New Job.
5. (Optional) Click Copy From..., choose a job whose settings you wish
to reuse, then click Select. The job name and the Data Type are filled
in. Go to Step 10 to select Data Set(s).
6. Enter a name for the job.
7. Specify the type of job to create:
– Analysis - Uses applications designed to produce biologically
meaningful results. These applications only accept HiFi reads.
– Auto Analysis - For information on the Auto Analysis feature, see
“Automated analysis” on page 115 for details.
– Data Utility - Data processing utilities used as intermediate steps
toward producing biologically meaningful results.
8. If you selected Data Utility, select the type of data to use for the job:
– HiFi reads: Reads generated with CCS analysis whose quality value
is equal to or greater than 20.
– Subreads: Reads containing the sequence from one or more single
passes of a polymerase on a single strand of an insert within a
SMRTbell template.
9. (Optional) Specify the Project that this job will be associated with
using the Projects menu (located at the top-right of the SMRT
Analysis page.) General Project: This job will be visible to all SMRT
Link users. All My Projects: This job will be visible only to users who
have access to Projects that you are a member of. To restrict access
to a job, make sure to select a Project limited to the appropriate users
before starting the job.
Note: Selecting a Project also filters the Data Sets that you can use
when creating the job.
10. In the Data Sets table, select one or more sets of data to be analyzed.
– (Optional) Use the Search function to search for Data Sets. See
“Appendix B - Data search” on page 136 for details.
– (Optional) Choose how to view the Data Set table: 1) Tree Mode - A
barcoded Data Set displays as one row. 2) Flat Mode - A barcoded
Data Set and its demultiplexed subsets display as separate rows.
– (Optional) For Data Sets that include demultiplexed subsets, you
can also select individual subsets as part of your selection. To do
so:
B) Select one or more subsets, then click Back:
C) Click the list image to view or edit the full Data Set selection.
(The small blue number specifies how many Data Sets and/or
subsets were selected):
11. If you selected multiple Data Sets as input for the job, additional
options become available:
– One Analysis for All Data Sets: Runs one job using all the selected
Data Sets as input, for a maximum of 30 Data Sets.
– One Analysis per Data Set - Identical Parameters: Runs one
separate job for each of the selected Data Sets, using the same
parameters, for a maximum of 10,000 Data Sets. Later in the
process, optionally click Advanced Parameters and modify
parameters.
– One Analysis per Data Set - Custom Parameters: Runs one
separate job for each of the selected Data Sets, using different
parameters for each Data Set, for a maximum of 16 Data Sets. Later
in the process, click Advanced Parameters and modify parameters.
Then click Start and Create Next. You can then specify parameters
for each of the included Data Sets.
– Note: The number of Data Sets listed is based on testing using
PacBio's suggested compute configuration, listed in SMRT Link
software installation guide (v11.0).
12. Click Next.
13. Select a secondary analysis application or data utility from the drop-
down list. (Different choices display based on your initial choice of
Analysis or Data Utility in Step 7. See “PacBio® secondary analysis
applications” on page 54 or “PacBio® data utilities” on page 90 for
details.)
– Secondary analysis applications/data utilities also have advanced
parameters. These are set to default values, and need only be
changed when analyzing data generated in non-standard
experimental conditions.
14. (Optional) Click Import Analysis Settings and select a previously-
saved CSV file containing the desired settings (including Advanced
Parameters) for the selected application or data utility. The imported
settings are set.
16. (Optional) Click Advanced Parameters and specify the values of the
parameters you would like to change. Click OK when finished.
(Different applications/data utilities have different advanced
parameters.)
– To see information about parameters for all secondary analysis
applications and data utilities provided by PacBio, see “PacBio®
secondary analysis applications” on page 54 and “PacBio® data
utilities” on page 90.
17. (Optional) Click Export to create a CSV file containing all the settings
you specified for the application/data utility. You can then import this
file when creating future jobs using the same application/data utility.
You can also use this exported file as a template for use with later
jobs.
18. (Optional) Click Back if you need to change any of the analysis
attributes selected in Step 7.
19. Click Start to submit the job. (If you selected multiple Data Sets as
input, click Start Multiple Jobs or Start and Create Next.)
20. Select SMRT Analysis from the Module Menu to navigate to the main
SMRT Analysis screen, where the status of the job displays. When the
job has completed, click its name to view the reports for the
completed job.
21. (Optional) To delete the completed job: Click Delete, then click Yes in
the confirmation dialog. The job is deleted from both the SMRT Link
interface and from the server.
4. In the Name column, click the name of the sequence data of
interest. Details for the selected sequence data display.
Note: As the restarted job uses information from the original failed job, do
not delete the original job results.
If viewing the results page for the failed job: Click Restart.
If not viewing the results page for the failed job:
the appropriate file. The file is downloaded according to your browser
settings.
9. (Optional) Specify the prefix(es) used in the names of files generated by
the job. Example: Run Name can be included in the name of every file
generated by the job. Click Edit Output File Name Prefix, check the
type(s) of information to add to the start of the file names, then click Save.
10. To view job log details: Click Data > SMRT Link Log.
11. To visualize the secondary analysis results: See “Visualizing data
using IGV” on page 118 for details.
1. On the home page, select SMRT Analysis. You see a list of all jobs.
2. (Optional) Click the funnel in the State column header, then click
Successful. This displays only successfully-completed jobs.
3. (Optional) Use the Search function to search for specific jobs. See
“Appendix B - Data search” on page 136 for details.
4. Click the job link of interest.
5. Click Copy - this creates a copy of the job, named Copy of <job
name>, using the same parameters.
6. Edit the name of the job.
7. Click Next.
8. (Optional) Edit any other parameters. See “PacBio® secondary analysis
applications” on page 54 or “PacBio® data utilities” on page 90 for
further details.
9. Click Start.
Exporting a job
You can export the entire contents of a job directory, including the input
sequence files, as a ZIP file. Afterwards, deleting the job saves room on
the SMRT Link server; you can also later reimport the exported job into
SMRT Link if necessary.
Importing a job
Note: You can only import a job that was created in SMRT Link, then
exported.
2. Click Import Job.
3. Select a ZIP file containing the job to import.
4. Click Import. The job is imported and is available on the main SMRT
Analysis page.
PacBio® secondary analysis applications
Following are the secondary analysis applications provided with SMRT
Analysis v11.0. These applications are designed to produce biologically
meaningful results. Each application is described later in this document,
including all analysis parameters, reports and output files generated by the
application.
Genome Assembly
• Generate de novo assemblies of genomes, using HiFi reads.
• See “Genome Assembly” on page 55 for details.
HiFi Mapping (was Mapping)
• Align (or map) reads to a user-provided reference sequence.
• See “HiFi Mapping” on page 58 for details.
HiFiViral SARS-CoV-2 Analysis
• Analyze multiplexed viral surveillance samples for SARS-CoV-2, using
HiFi reads.
• See “HiFiViral SARS-CoV-2 Analysis” on page 62 for details.
Iso-Seq® Analysis
• Characterize full-length transcript isoforms, using HiFi reads.
• See “Iso-Seq® Analysis” on page 67 for details.
Microbial Genome Analysis
• Note: This combines and replaces the Microbial Assembly and Base
Modification Analysis applications in the previous release.
• Generate de novo assemblies of small prokaryotic genomes between
1.9 and 10 Mb and companion plasmids between 2 and 220 kb, and identify
methylated bases and associated nucleotide motifs.
• Optionally include identification of 6mA and 4mC modified bases and
associated DNA sequence motifs.
• See “Microbial Genome Analysis” on page 74 for details.
Minor Variants Analysis
• Identify and phase minor single nucleotide substitution variants in
complex populations.
• See “Minor Variants Analysis” on page 80 for details.
Structural Variant Calling
• Identify structural variants (Default: ≥20 bp) in a sample or set of
samples relative to a reference.
• See “Structural Variant Calling” on page 86 for details.
Genome Assembly
Use this application to generate high-quality de novo assemblies of
genomes, using HiFi reads.
• The application accepts HiFi reads (BAM format) as input. HiFi reads
are reads generated with CCS analysis whose quality value is equal to
or greater than 20.
Genome Length 0 The approximate number of base pairs expected in the genome.
This is used only for downsampling; if the value is ≤ 0,
downsampling is disabled. Enter an integer, optionally followed
by one of the metric suffixes: k, M or G. Example: 4500k means
“4,500 kilobases” or “4,500,000”. M stands for Mega and G
stands for Giga.
Downsampled coverage 0 The input Data Set can be downsampled to a desired coverage,
provided that both the Downsampled Coverage and Genome
Length parameters are specified and > 0.
Downsampling applies to the entire assembly process,
including polishing.
This parameter selects reads randomly, using a fixed random
seed for reproducibility.
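The interaction between Genome Length and Downsampled coverage can be sketched as follows. The parser follows the suffix rules above (k, M or G; a value ≤ 0 disables downsampling). The read-selection loop is an assumption about how random, seeded downsampling might work, not SMRT Link's actual procedure; the seed value and function names are hypothetical.

```python
import random

METRIC_SUFFIXES = {"k": 10**3, "M": 10**6, "G": 10**9}


def parse_genome_length(text: str) -> int:
    """Parse values such as '4500k', '3G' or '0' into a base-pair count."""
    text = text.strip()
    if text and text[-1] in METRIC_SUFFIXES:
        return int(text[:-1]) * METRIC_SUFFIXES[text[-1]]
    return int(text)


def downsample_indices(read_lengths: list[int], coverage: float,
                       genome_length: int, seed: int = 0) -> list[int]:
    """Pick reads at random until coverage * genome_length bases are reached.

    Downsampling is skipped unless both values are > 0. The fixed seed makes
    the selection reproducible; its value here is arbitrary.
    """
    if coverage <= 0 or genome_length <= 0:
        return list(range(len(read_lengths)))
    target_bases = coverage * genome_length
    order = list(range(len(read_lengths)))
    random.Random(seed).shuffle(order)
    chosen, total = [], 0
    for index in order:
        if total >= target_bases:
            break
        chosen.append(index)
        total += read_lengths[index]
    return sorted(chosen)
```

For example, parse_genome_length("4500k") returns 4500000, matching the guide's “4,500,000” reading.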
Advanced parameters Default value Description
• Number of Polishing Reads: The number of reads used to perform polishing
on this contig.
Data > File Downloads
The following files are available on the analysis results page. Additional
files are available on the SMRT Link server, in the analysis output
directory.
HiFi Mapping
Use this application to align (or map) data to a user-provided reference
sequence. The HiFi Mapping application:
• Accepts HiFi reads (BAM format) as input. HiFi reads are reads
generated with CCS analysis whose quality value is equal to or greater
than 20.
• Maps data to a provided reference sequence, and then identifies
consensus and variants against this reference.
• Haploid variants and small indels, but not diploid variants, are called as
a result of alignment to the reference sequence.
Importing/exporting analysis settings
• Click Import Analysis Settings and select a previously-saved CSV file
containing the desired settings (including Advanced Parameters) for
the selected application. The imported settings are applied.
• Click Export to create a CSV file containing all the settings you
specified for the application. You can then import this file when
creating future analyses using the same application. You can also use
this exported file as a template for use with later analyses.
Reference Set (Required)
• Specify the reference sequence to which the SMRT Cell reads are
aligned.
Consolidate Mapped BAMs for IGV (Default = OFF)
• By default, SMRT Link consolidates chunked BAM files for viewing in
IGV if the combined size is not more than 10 GB. Setting this option to
ON ignores the file size cutoff and consolidates the BAM files.
• Note: This setting can double the amount of storage used by the BAM
files, which can be considerable. Make sure to have enough disk space
available. This setting may also result in longer run times.
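The consolidation rule above can be written as a small decision function. This is a minimal sketch that assumes the 10 GB cutoff applies to the summed size of the chunked BAM files and is measured in binary gigabytes (the guide does not say which). The function and constant names are hypothetical.

```python
# Assumption: "10 GB" is read as 10 * 1024**3 bytes; the guide does not say
# whether decimal or binary gigabytes are meant.
SIZE_CUTOFF_BYTES = 10 * 1024**3


def will_consolidate(chunk_sizes_bytes, consolidate_option=False):
    """Return True if the chunked BAMs would be merged for IGV."""
    if consolidate_option:  # ON: the size cutoff is ignored
        return True
    # OFF (default): merge only if the combined size is not more than 10 GB
    return sum(chunk_sizes_bytes) <= SIZE_CUTOFF_BYTES
```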
Parameters
Filters to Add to the Data NONE A semicolon-separated (not comma-separated) list of other
Set filters to add to the Data Set.
Minimum Mapped Length 50 The minimum required mapped read length, in base pairs.
(bp)
Bio Sample Name of NONE Populates the Bio Sample Name (Read Group SM tag) in the
Aligned Dataset aligned BAM file. If blank, uses the Bio Sample Name of the
input file. Note: Avoid using spaces in Bio Sample Names as this
may lead to third-party compatibility issues.
Minimum Gap-Compressed 70 The minimum required gap-compressed alignment identity, in
Identity (%) percent. Gap-compressed identity counts consecutive insertion
or deletion gaps as one difference.
Min. CCS Predicted 20 Phred-scale integer QV cutoff for filtering HiFi reads. The default
Accuracy (Phred Scale) for all applications is 20 (QV 20), or 99% predicted accuracy.
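Two quantities in this table can be made concrete with a short Python sketch. A Phred QV maps to predicted accuracy as 1 - 10^(-QV/10), so QV 20 corresponds to 99%. Gap-compressed identity is sketched here from an extended CIGAR (=, X, I, D operations). The sketch assumes identity is matches divided by matches + mismatches + gap events, with each run of inserted or deleted bases counted once. pbmm2's exact implementation may differ, and the function names are illustrative.

```python
def phred_to_accuracy(qv):
    """Predicted accuracy for a Phred-scaled QV: 1 - 10^(-QV/10)."""
    return 1.0 - 10 ** (-qv / 10)


def gap_compressed_identity(cigar_ops):
    """Percent identity from (op, length) pairs using '=', 'X', 'I', 'D'.

    A run of inserted or deleted bases counts as one difference,
    regardless of its length.
    """
    matches = mismatches = gap_events = 0
    for op, length in cigar_ops:
        if op == "=":
            matches += length
        elif op == "X":
            mismatches += length
        elif op in ("I", "D"):
            gap_events += 1  # the whole gap counts once
    return 100.0 * matches / (matches + mismatches + gap_events)
```

With this definition, a 5 bp insertion lowers identity by the same amount as a single mismatch, which is why gap-compressed identity is more tolerant of indels than per-base identity.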
Advanced parameters Default value Description
Advanced pbmm2 Options NONE Space-separated list of custom pbmm2 options. Not all
supported command-line options can be used, and HPC settings
cannot be modified. See SMRT® Tools reference guide v11.0
for details.
Target Regions (BED file) NONE (Optional) Specifies a BED file that defines regions for a Target
Regions report showing coverage over those regions.
See “Appendix C - BED file format for Target Regions report” on
page 138 for details.
Compute Settings Select (Optional) Specify the distributed computing cluster settings
configuration, if made available by the site SMRT Link
administrator.
• CCS Read Length Mean (mapped): The mean read length of CCS reads that
mapped to the reference sequence, starting from the first mapped base of the
first mapped CCS read, and ending at the last mapped base of the last
mapped CCS read.
• CCS Read N50 (mapped): The read length at which 50% of the mapped bases
are in CCS reads longer than, or equal to, this value.
• CCS Read Length 95% (mapped): The 95th percentile of read length of CCS
reads that mapped to the reference sequence.
• CCS Read Length Max (mapped): The maximum length of CCS reads that
mapped to the reference sequence.
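The N50 definition above can be computed directly. The following sketch shows the standard calculation: sort lengths from longest to shortest and return the length at which the running total first reaches half of all bases. The function name is illustrative.

```python
def n50(lengths):
    """Length L such that reads of length >= L hold at least 50% of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input
```

For lengths 2, 3, 4, 5, 6, 8 and 10 (38 bases in total), the running sum first reaches 19 at the read of length 6, so the N50 is 6.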
Mapping Report > CCS Mapping Statistics Summary
Displays mapping statistics per movie.
• Sample: The sample name for which the following metrics apply.
• Movie: The movie name for which the following metrics apply.
• Number of CCS Reads (mapped): The number of CCS reads that mapped to
the reference sequence. This includes adapters.
• CCS Read Length Mean (mapped): The mean read length of CCS reads that
mapped to the reference sequence, starting from the first mapped base of the
first mapped CCS read, and ending at the last mapped base of the last
mapped CCS read.
• CCS Read Length N50 (mapped): The read length at which 50% of the
mapped bases are in CCS reads longer than, or equal to, this value.
• Number of CCS Bases (mapped): The number of CCS bases that mapped to
the reference sequence.
• Mean Concordance (mapped): The mean concordance of subreads that
mapped to the reference sequence. Concordance for alignment is defined as
the number of matching bases over the number of alignment columns (match
columns + mismatch columns + insertion columns + deletion columns).
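The concordance definition above counts every inserted or deleted base as its own alignment column. The following minimal sketch computes per-read concordance from column counts and then averages over reads. Treating "mean concordance" as an unweighted mean over reads is an assumption, and the function names are illustrative.

```python
def concordance(matches, mismatches, insertions, deletions):
    """Matching bases divided by all alignment columns (one column per base)."""
    columns = matches + mismatches + insertions + deletions
    return matches / columns if columns else 0.0


def mean_concordance(per_read_counts):
    """Unweighted mean over reads of (matches, mismatches, ins, del) tuples."""
    values = [concordance(*counts) for counts in per_read_counts]
    return sum(values) / len(values)
```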
Mapping Report > Mapped CCS Read Length
• Histogram distribution of the number of mapped CCS reads by read
length.
Mapping Report > Mapped CCS Reads Concordance
• Histogram distribution of the number of CCS reads by the percent
concordance with the reference sequence. Concordance for CCS
reads is defined as the number of matching bases over the number of
alignment columns (match columns + mismatch columns + insertion
columns + deletion columns).
Mapping Report > Mapped Concordance vs Alignment Length
• Maps the percent concordance with the reference sequence against
the alignment length, in base pairs.
Coverage > Summary Metrics
• Mean Coverage: The mean depth of coverage across the reference sequence.
• Missing Bases: The percentage of the reference sequence without coverage.
Coverage > Coverage Across Reference
• Maps coverage across the reference.
Coverage > Depth of Coverage
• Maps the reference regions against the percent coverage.
Coverage > Coverage vs. [GC] Content
• Maps coverage against GC content, expressed as the percentage of Gs
and Cs in each 100 bp window. The number of genomic windows with each
GC percentage is displayed on top. Use this plot to check that no
coverage is lost over regions with extremely biased base composition.
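The coverage summary metrics and the 100 bp GC windows described above can be sketched from a per-base depth array and the reference sequence. Using non-overlapping windows (and dropping a final partial window) is an assumption about how the plot bins the reference. The function names are illustrative.

```python
def coverage_summary(depths):
    """Return (mean depth, percent of reference positions with zero depth)."""
    n = len(depths)
    mean_depth = sum(depths) / n
    missing_pct = 100.0 * sum(1 for d in depths if d == 0) / n
    return mean_depth, missing_pct


def gc_percent_windows(seq, window=100):
    """GC percentage of each non-overlapping, full-length window."""
    out = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        out.append(100.0 * (chunk.count("G") + chunk.count("C")) / window)
    return out
```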
Data > File Downloads
The following files are available on the analysis results page. Additional
files are available on the SMRT Link server, in the analysis output
directory.
• Mapped BAM: The BAM file of subread alignments to the draft contigs used
for polishing.
• Mapped BAM Index: The BAI index file for the corresponding Mapped BAM
file.
HiFiViral SARS-CoV-2 Analysis
Use this application to analyze multiplexed samples sequenced with the
HiFiViral SARS-CoV-2 kit. For each sample, this analysis provides:
• Consensus sequence (FASTA).
• Variant calls (VCF).
• HiFi reads aligned to the reference (BAM).
• Plot of HiFi read coverage depth across the SARS-CoV-2 genome.
Notes:
• The application accepts HiFi reads (BAM format) as input. HiFi reads
are reads generated with CCS analysis that have a quality value equal
to or greater than Phred-scaled Q20.
• This application is for SARS-CoV-2 analysis only and is not
recommended for other viral studies. The Wuhan reference genome is
provided by default to run the application, but advanced users may
specify other reference genomes. We have not tested the application
with reference genomes other than the Wuhan reference genome.
• The application is intended to identify variable sites and call a single
consensus sequence per sample. The output consensus sequence is
produced based on the dominant variant observed. Minor variant
information that passes a default threshold may be encoded in the
raw VCF, but is not propagated into the consensus sequence FASTA.
• The HiFiViral SARS-CoV-2 Analysis application can be run using the
Auto Analysis feature available in Run Design. This feature allows
users to complete all necessary analysis steps immediately after
sequencing without manual intervention. The Auto Analysis workflow
includes CCS, Demultiplex Barcodes, and HiFiViral SARS-CoV-2
Analysis.
Auto Analysis in Run Design
Users may set the analysis to begin automatically after sequencing
completes using Auto Analysis in Run Design. See “HiFiViral SARS-CoV-2:
Creating Auto Analysis in Run Design” on page 116 for details.
HiFiViral SARS-CoV-2 application workflow
1. Process the reads using the mimux tool to trim the probe arm
sequences.
2. Align the reads to the reference genome using pbmm2.
3. Call and filter variants using bcftools, generating the raw variant calls
in VCF file format. Filtering in this step removes low-quality calls (below
Q20) and normalizes indels.
4. Filter low-frequency variants using vcfcons and generate a consensus
sequence by injecting variants into the reference genome. At each
position, a variant is called only if both the base coverage exceeds the
minimum base coverage threshold (Default = 4) and the fraction of
reads that support this variant is above the minimum variant frequency
threshold (Default = 0.5). See here for details.
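The per-position rule in step 4 can be sketched as follows. This is a simplified illustration, not vcfcons itself. It handles substitutions only (no indels) and takes a hypothetical pileup of (depth, alternative base, alternative count) per reference position. Following the Minimum Base Coverage description, positions with depth below the threshold become N. An alternative base replaces the reference base only when its read fraction is strictly greater than the minimum variant frequency.

```python
def build_consensus(reference, pileup, min_coverage=4, min_freq=0.5):
    """Simplified consensus sketch (substitutions only).

    reference: reference sequence string.
    pileup: one (depth, alt_base, alt_count) tuple per reference position;
            alt_base is None where no alternative allele was observed.
    """
    out = []
    for ref_base, (depth, alt_base, alt_count) in zip(reference, pileup):
        if depth < min_coverage:
            out.append("N")  # too little coverage to report any base
        elif alt_base is not None and alt_count / depth > min_freq:
            out.append(alt_base)  # dominant variant is injected
        else:
            out.append(ref_base)  # reference base is kept
    return "".join(out)
```

With the defaults, a position at depth 10 where 5 reads support the alternative base keeps the reference base, because 0.5 is not greater than 0.5.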
• Click Export to create a CSV file containing all the settings you
specified for the application. You can then import this file when
creating future analyses using the same application. You can also use
this exported file as a template for use with later analyses.
Reference Genome (Required)
• Specify the full viral genome against which to align the reads and call
variants. (The default is the Wuhan Reference genome.)
Parameters
Plate QC CSV NONE (Optional) Specify a CSV file to generate the Plate QC report,
which displays analysis results for each sample in the assay
plate. The CSV file must contain barcode (asymmetric pairs),
Bio Sample Name, assay plate IDs (can include 1-4 plates with
unique names; avoid special characters), and assay plate well
IDs in the format A01, A02,…H12. (To create a new file, click
Download Template, edit, and then save the CSV file.) The plate
and well information corresponds to the location of samples
during the SARS-CoV-2 enrichment assay.
Probes FASTA NONE Specify probe sequences in FASTA format if using probes other
than the standard probes shipped in the HiFiViral SARS-CoV-2
Kit.
Minimum Base Coverage 4 Specify the minimum read depth at each position to report
either a variant or a reference base. Positions with less than this
specified coverage will have an N base output in the consensus
sequence FASTA file. Increasing the minimum base coverage
may result in more Ns and loss of variant detection. We do not
recommend making this value lower than the default threshold
of 4, as it may increase the number of false positive variants
called.
Minimum Variant 0.5 Specify that only variants whose frequency is greater than this
Frequency value are reported. This frequency is determined based on the
read depth (DP) and allele read count (AD) information in the
VCF output file. We recommend using the default value to
properly call the dominant alternative variant while also filtering
out potential artifacts.
Advanced Processing NONE Additional options to pass to the mimux preprocessing tool for
Options trimming and filtering reads by probe sequences. Options
should be entered in space-separated format. See the HiFiViral
SARS-CoV-2 Analysis section of SMRT Tools reference guide
(v11.0) for details.
Minimum Barcode Score 80 A barcode score measures the alignment between a barcode
attached to a read and an ideal barcode sequence, and is an
indicator of how well the chosen barcode pair matches. It
ranges between 0 (no match) and 100 (a perfect match). This
parameter specifies that reads with barcode scores below this
minimum value are not included in analysis.
Compute Settings Select (Optional) Specify the distributed computing cluster settings
configuration, if made available by the site SMRT Link
administrator.
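Before uploading a Plate QC CSV, it can help to check that each well ID follows the A01 to H12 format described above. The following sketch does that for a 96-well layout and also expresses the Minimum Barcode Score cutoff, under which reads scoring below the minimum are excluded. The helper names are hypothetical and are not part of SMRT Link.

```python
import re

# 96-well plate: rows A-H, columns 01-12, always written with two digits.
_WELL_PATTERN = re.compile(r"[A-H](0[1-9]|1[0-2])")


def all_wells():
    """Every well ID in row order: A01, A02, ..., H12."""
    return [f"{row}{col:02d}" for row in "ABCDEFGH" for col in range(1, 13)]


def is_valid_well(well_id):
    return _WELL_PATTERN.fullmatch(well_id) is not None


def passes_barcode_filter(score, min_score=80):
    """Barcode scores range from 0 to 100; scores below min_score are dropped."""
    return score >= min_score
```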
Summary Report > Summary Metrics
• Samples: The count of all input samples, whether or not they passed analysis.
• Samples with Genome Coverage > 90%: The number of samples where at
least 90% of bases have at least four mapped reads overlapping their
position.
• Samples with Genome Coverage > 95%: The number of samples where at
least 95% of bases have at least four mapped reads overlapping their
position.
• Samples Failing Workflow: The number of samples for which the analysis
was unable to generate a per-sample report due to an absence of usable data.
Summary Report > Sample Summary
• Bio Sample Name: The name of the biological sample associated with the
variants. (Note: Any spaces in the name are substituted by new line
characters for consistency with output file names.)
• Substitutions: The count of all called substitutions in the consensus
sequence for the sample.
• Insertions: The count of all called insertions in the consensus sequence for
the sample.
• Deletions: The count of all called deletions in the consensus sequence for the
sample.
• Reads: The total number of HiFi reads for the sample.
• Read Coverage: The mean number of mapped reads overlapping with each
position in the reference genome.
• On-Target Rate: The mapping yield of reads; the number of unique mapped
reads divided by the total number of reads.
• Multiple Strains (Probability): Samples are flagged as having multiple strains
if the probability is at least 0.95. Samples may contain multiple strains due to
sample contamination or presence of multiple strains in the RNA extract. To
classify a sample as multi-strain, we tolerate error by using the binomial
cumulative distribution function (with a fixed probability of 0.2). This feature
is supported for samples with Ct < 26 with minor frequencies > 20%. Samples
must have > 70% genome coverage to be called Multiple Strains.
• Ns: The number of bases in the consensus sequence that are Ns.
• Genome Coverage: The percentage of bases with at least four mapped reads
overlapping their position by default. See the Advanced Parameters dialog to
adjust minimum base coverage.
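Several per-sample metrics above reduce to simple counts. This sketch computes Genome Coverage (the percentage of positions with at least four mapped reads by default), On-Target Rate, the number of Ns, and the count of samples at or above a coverage threshold, following the definitions in this section. The function names are illustrative.

```python
def genome_coverage_percent(depths, min_coverage=4):
    """Percent of reference positions covered by at least min_coverage reads."""
    return 100.0 * sum(1 for d in depths if d >= min_coverage) / len(depths)


def on_target_rate(unique_mapped_reads, total_reads):
    """Unique mapped reads divided by all reads for the sample."""
    return unique_mapped_reads / total_reads if total_reads else 0.0


def count_ns(consensus):
    """Number of N bases in a consensus sequence."""
    return consensus.upper().count("N")


def samples_above(coverage_percents, threshold):
    """How many samples reach at least the given genome coverage percent."""
    return sum(1 for pct in coverage_percents if pct >= threshold)
```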
Summary Report > Genome Coverage
• Coverage plot showing the per-sample mean read coverage within a
window of 100 bp. The shaded region displays the 25th to 75th
percentile in the range of coverage across all samples, and the darker
solid line displays the median coverage across all samples.
Summary Report > Plate QC
Plot showing analysis results for each plate well used. This plot is
generated only if the user supplies a Plate QC CSV file mapping Bio
Sample Names to Well IDs in Advanced Parameters.
• White wells do not include a sample.
Data > File Downloads
The following files are available on the analysis results page. Additional
files are available on the SMRT Link server, in the analysis output
directory.
Iso-Seq® Analysis
The Iso-Seq application enables analysis and functional characterization
of full-length transcript isoforms for sequencing data generated on PacBio
instruments.
• The application accepts HiFi reads (BAM format) as input. HiFi reads
are reads generated with CCS analysis that have a quality value equal
to or greater than Q20.
Notes on Multiplexed Data
There are two ways in which an Iso-Seq library can be multiplexed:
Multiplexed method Demultiplexed before Iso-Seq? Primer set selection
3. Collapse (Optional): When a reference genome is selected, map HQ
isoforms to the genome, then collapse redundant isoforms into unique
isoforms.
Iso-Seq determines two FLNC reads to be the same isoform, and will place
them in the same cluster, if the two reads:
Iso-Seq will only output clusters that have at least two FLNC reads.
Example 1: The Iso-Seq cDNA Primer primer set, included with the SMRT
Link installation.
>IsoSeq_5p
GCAATGAAGTCGCAGGGTTGGG
>IsoSeq_3p
GTACTCTGCGTTGATACCACTGCTT
Example 2: The Iso-Seq 12 Barcoded cDNA Primers set, included with the
SMRT Link installation.
If you use the barcoded cDNA primers listed in the “Appendix 3 -
Recommended barcoded NEBNext single cell cDNA PCR primer and Iso-Seq
Express cDNA PCR primer sequences” section of Procedure & checklist -
Preparing Iso-Seq® libraries using SMRTbell prep kit 3.0, select this
option.
>bc1001_5p
CACATATCAGAGTGCGGCAATGAAGTCGCAGGGTTGGGG
>bc1002_5p
ACACACAGACTGTGAGGCAATGAAGTCGCAGGGTTGGGG
…
>IsoSeq_5p
GCAATGAAGTCGCAGGGTTGGG
>dT_BC1001_3p
AAGCAGTGGTATCAACGCAGAGTACCACATATCAGAGTGCG
>dT_BC1002_3p
AAGCAGTGGTATCAACGCAGAGTACACACACAGACTGTGAG
>dT_BC1003_3p
AAGCAGTGGTATCAACGCAGAGTACACACATCTCGTGAGAG
>dT_BC1004_3p
AAGCAGTGGTATCAACGCAGAGTACCACGCACACACGCGCG
The Lexogen TeloPrime cDNA kit contains As in the 3’ primer that cannot
be differentiated from the polyA tail. For best results, remove the As from
the 3’ end as shown below:
>TeloPrimeModified_5p
TGGATTGATATGTAATACGACTCACTATAG
>TeloPrimeModified_3p
CGCCTGAGA
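Every primer in the example files above has a name ending in _5p or _3p. The hypothetical helper below sketches how a custom primer FASTA could be checked against that naming pattern and grouped by primer end before it is used. It is not part of SMRT Link.

```python
def parse_primers(fasta_text):
    """Group primer sequences by the _5p / _3p suffix of their FASTA names."""
    primers = {"5p": {}, "3p": {}}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]  # first word of the header is the name
        elif name is not None:
            side = name.rsplit("_", 1)[-1]
            if side not in primers:
                raise ValueError(f"primer name {name!r} does not end in _5p or _3p")
            # Join sequence lines, in case a primer is wrapped over several lines
            primers[side][name] = primers[side].get(name, "") + line.upper()
    return primers
```

Applied to Example 1, the function returns IsoSeq_5p under "5p" and IsoSeq_3p under "3p".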
Reference Set (Optional)
• Optionally specify a reference genome to align High Quality isoforms
to, and to collapse isoforms mapped to the same genomic loci.
Run Clustering (Default = ON)
• Specify ON to generate consensus isoforms.
• Specify OFF to classify reads only and not generate consensus
isoforms. The Reference Set will also be ignored.
Cluster Barcoded Samples Separately (Default = OFF)
• Specify OFF if barcoded samples are from the same species, but
different tissues, or samples of the same genes but different
individuals. The samples are clustered with all barcodes pooled.
• Specify ON if barcoded samples are from different species. The
samples are clustered separately by barcode.
• In either case, the samples on the results page are automatically
named BioSample_1 through BioSample_N.
Parameters
Require and trim Poly(A) ON ON means that polyA tails are required for a sequence to be
Tail considered full length. OFF means sequences do not need polyA
tails to be considered full length.
Minimum Mapped Length 50 The minimum required mapped HQ isoform sequence length (in
(bp) base pairs) for the Iso-Seq mapping-collapse step.
Note: This is applicable only if a reference genome is provided.
Minimum Gap-Compressed 95 The minimum required gap-compressed alignment identity, in
Identity (%) percent. Gap-compressed identity counts consecutive insertion
or deletion gaps as one difference.
Note: This is applicable only if a reference genome is provided.
Minimum Mapped 99 The minimum required HQ transcript isoform sequence
Coverage (%) alignment coverage (in percent) for the Iso-Seq mapping-
collapse step.
Note: This is applicable only if a reference genome is provided.
Maximum Fuzzy Junction 5 The maximum junction difference between two mapped
Difference (bp) isoforms to be collapsed into a single isoform. If the junction
differences are all less than or equal to the provided value, they will all be
collapsed. Setting to 0 requires all junctions to be exact to be
collapsed into a single isoform. Applicable only if a reference
genome is provided.
Min. CCS Predicted 10 Phred-scale integer QV cutoff for filtering HiFi reads. The default
Accuracy (Phred Scale) for Iso-Seq Analysis is 20 (QV 20), or 99% predicted accuracy.
Filters to Add to the Data NONE A semicolon-separated (not comma-separated) list of other
Set filters to add to the Data Set.
Advanced pbmm2 Options NONE Space-separated list of custom pbmm2 options. (pbmm2 is
already running with --preset ISOSEQ.) Not all supported
command-line options can be used, and HPC settings cannot be
modified. See SMRT® Tools reference guide v11.0 for details.
Compute Settings Select (Optional) Specify the distributed computing cluster settings
configuration, if made available by the site SMRT Link
administrator.
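The Maximum Fuzzy Junction Difference rule can be illustrated with a simplified comparison of two mapped isoforms. The sketch assumes both isoforms are already on the same chromosome and strand and are described by ordered (donor, acceptor) intron coordinates. It checks only junction positions, not the other collapse criteria, and the function name is hypothetical.

```python
def junctions_match(junctions_a, junctions_b, max_fuzzy=5):
    """True if two isoforms have the same number of junctions and every
    donor and acceptor position differs by at most max_fuzzy bp."""
    if len(junctions_a) != len(junctions_b):
        return False
    return all(
        abs(donor_a - donor_b) <= max_fuzzy and abs(acc_a - acc_b) <= max_fuzzy
        for (donor_a, acc_a), (donor_b, acc_b) in zip(junctions_a, junctions_b)
    )
```

With max_fuzzy=0, only isoforms with identical junction coordinates match, which corresponds to the exact-junction behavior described in the table.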
• Non-Concatemer Reads with 5’ and 3’ Primers and Poly-A Tail: The number
of non-concatemer CCS reads with 5’ and 3’ primers and polyA tails detected.
This is usually the number for full-length, non-concatemer (FLNC) reads,
unless polyA tails are not present in the sample.
• Mean Length of Full-Length Non-Concatemer Reads: The mean length of the
non-concatemer CCS reads with 5' and 3' primers and polyA tails detected.
• Unique Primers: The number of unique primers in the sequence.
• Mean Reads per Primer: The mean number of CCS reads per primer.
• Max. Reads per Primer: The maximum number of CCS reads per primer.
• Min. Reads per Primer: The minimum number of CCS reads per primer.
• Reads without Primers: The number of CCS reads without a primer.
• Percent Bases in Reads with Primers: The percentage of bases in CCS reads
in the sequence data that contain primers.
• Percent Reads with Primers: The percentage of CCS reads in the sequence
data that contain primers.
CCS Analysis Read Classification > Primer Data
• Bio Sample Name: The name of the biological sample associated with the
primer.
• Primer Name: A string containing the pair of primer indices associated with
this biological sample.
• CCS Reads: The number of CCS reads associated with the primer.
• Mean Primer Quality: The mean primer quality associated with the primer.
• Reads with 5’ and 3’ Primers: The number of CCS reads with 5’ and 3’ cDNA
primers detected.
• Non-Concatemer Reads with 5’ and 3’ Primers: The number of non-
concatemer CCS reads with 5’ and 3’ primers detected.
• Non-Concatemer Reads with 5’ and 3’ Primers and Poly-A Tail: The number
of non-concatemer CCS reads with 5’ and 3’ primers and polyA tails detected.
This is usually the number for full-length, non-concatemer (FLNC) reads,
unless polyA tails are not present in the sample.
CCS Analysis Read Classification > Primer Read Statistics
• Number Of Reads Per Primer: Maps the number of reads per primer, sorted
by primer ranking.
• Primer Frequency Distribution: Maps the number of samples with primers by
the number of reads with primers.
• Mean Read Length Distribution: Maps the read mean length against the
number of samples with primers.
CCS Analysis Read Classification > Primer Quality Scores
• Histogram of primer scores.
CCS Analysis Read Classification > Length of Full-Length Non-Concatemer Reads
• Histogram of the read length distribution of non-concatemer CCS
reads with 5' and 3' primers and polyA tails detected.
Transcript Clustering > Summary Metrics
• Sample Name: The sample name for which the following metrics apply.
• Number of High-Quality Isoforms: The number of consensus isoforms that
have an estimated accuracy above the specified threshold.
• Number of Low-Quality Isoforms: The number of consensus isoforms that
have an estimated accuracy below the specified threshold.
Transcript Clustering > Length of Consensus Isoforms
• Histogram of the consensus isoform lengths and the distribution of
isoforms exceeding a read length cutoff.
Transcript Mapping > Summary Metrics
• Sample Name: Sample name for which the following metrics apply.
• Number of mapped unique isoforms: The number of unique isoforms, where
each unique isoform is generated by collapsing redundant HQ isoforms (such
as those that have very minor differences from one another) into one
isoform. Each unique isoform may be generated from one or multiple HQ
isoforms.
• Number of mapped unique loci: The number of unique mapped genomic loci
among all unique isoforms. Multiple unique isoforms may map to the same
genomic location, indicating these unique isoforms are transcribed from the
same gene family, but spliced differently.
Transcript Mapping > Length of Mapped Isoforms
• Histogram of mapped isoforms binned by read length and the
distribution of mapped isoforms exceeding a read length cutoff.
Data > File Downloads
The following files are available on the analysis results page. Additional
files are available on the SMRT Link server, in the analysis output
directory.
• Collapsed Filtered Isoforms: Mapped, unique isoforms, in FASTQ format.
This is the recommended Mapping step output file to work with.
When the input Data Set is a ConsensusReadSet, only a FASTA file is
generated.
• Collapsed Filtered Isoforms Groups: Report of isoforms mapped into
collapsed filtered isoforms.
• Full-length Non-Concatemer Read Assignments: Report of full-length read
association with collapsed filtered isoforms, in text format.
• Collapsed Filtered Isoform Counts: Report of read count information for each
collapsed filtered isoform.
Data > IGV Visualization Files
The following files are used for visualization using IGV; see “Visualizing
data using IGV” on page 118 for details.
Note: For details on custom PacBio tags added to output BAM files by the
Iso-Seq Application, see page 54 of SMRT Tools reference guide (v11.0),
or see here for details.
Microbial Genome Analysis
Use this application to generate de novo assemblies of small prokaryotic
genomes between 1.9 and 10 Mb and companion plasmids between 2 and 220
kb. This application can optionally include analysis of 6mA and 4mC
modified bases and associated DNA sequence motifs. (This requires
kinetic information.)
Note: This combines and replaces the Microbial Assembly and Base
Modification Analysis applications in the previous release.
• Accepts HiFi reads (BAM format) as input. HiFi reads are reads
generated with CCS analysis whose quality value is equal to or greater
than 20.
• Includes chromosomal- and plasmid-level de novo genome assembly,
circularization, polishing, and rotation of the origin of replication for
each circular contig.
• Performs base modification detection to identify 4mC and 6mA and
associated DNA sequence motifs. (This requires kinetic information.)
• Also supports assembly of larger genomes, such as yeast.
Importing/exporting analysis settings
• Click Import Analysis Settings and select a previously-saved CSV file
containing the desired settings (including Advanced Parameters) for
the selected application. The imported application settings are set.
• Click Export to create a CSV file containing all the settings you
specified for the application. You can then import this file when
creating future analyses using the same application. You can also use
this exported file as a template for use with later analyses.
Run Base Modification Analysis (Default = ON)
• Run Base Modification analysis on the final assembly. This only
applies if the assembly is not empty, and the input data contains the
correct kinetic tags.
Find Modified Base Motifs (Default = ON)
• Perform motif detection on the results of base modification analysis.
Parameters
Advanced parameters Default value Description
Run secondary polish ON Specify that an additional polishing stage be run at the end of
the workflow.
Base modifications to m4C,m6A Specify the base modifications to identify, in a comma-
identify separated list.
Min. CCS Predicted 20 Phred-scale integer QV cutoff for filtering HiFi reads. The default
Accuracy (Phred Scale) for all applications is 20 (QV 20), or 99% predicted accuracy.
Filters to Add to the Data NONE A semicolon-separated (not comma-separated) list of other
Set filters to add to the Data Set.
Cleanup intermediate files ON Removes intermediate files from the run directory to save
space.
Minimum Qmod Score 35 Specify the minimum Qmod score to use in motif-finding.
Compute Settings Select (Optional) Specify the distributed computing cluster settings
configuration, if made available by the site SMRT Link
administrator.
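This sketch shows what the Minimum Qmod Score cutoff means in probability terms. It assumes that Qmod, like the modification QV reported by PacBio kinetics tools, is a Phred-transformed p-value (Qmod = -10 log10 p). Under that assumption, the default of 35 keeps sites with p at or below about 3.2 × 10^-4. The function names and the site representation are hypothetical.

```python
import math


def pvalue_to_qmod(p):
    """Phred-transform a p-value (assumed definition of Qmod)."""
    return -10.0 * math.log10(p)


def qmod_to_pvalue(q):
    return 10 ** (-q / 10)


def sites_for_motif_finding(sites, min_qmod=35):
    """sites: list of (position, qmod) tuples; keep those at or above the cutoff."""
    return [pos for pos, q in sites if q >= min_qmod]
```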
Mapping Report > CCS Mapping Statistics Summary
Displays mapping statistics per movie.
• Sample: The sample name for which the following metrics apply.
• Movie: The movie name for which the following metrics apply.
• Number of CCS Reads (mapped): The number of CCS reads that mapped to
the reference sequence. This includes adapters.
• CCS Read Length Mean (mapped): The mean read length of CCS reads that
mapped to the reference sequence, starting from the first mapped base of the
first mapped CCS read, and ending at the last mapped base of the last
mapped CCS read.
• CCS Read Length N50 (mapped): The read length at which 50% of the
mapped bases are in CCS reads longer than, or equal to, this value.
• Number of CCS Bases (mapped): The number of CCS bases that mapped to
the reference sequence.
• Mean Concordance (mapped): The mean concordance of subreads that
mapped to the reference sequence. Concordance for alignment is defined as
the number of matching bases over the number of alignment columns
(match columns + mismatch columns + insertion columns + deletion
columns).
Mapping Report > Mapped CCS Read Length
• Histogram distribution of the number of mapped CCS reads by read
length.
Mapping Report > Mapped CCS Reads Concordance
• Histogram distribution of the number of CCS reads by the percent
concordance with the reference sequence. Concordance for CCS
reads is defined as the number of matching bases over the number of
alignment columns (match columns + mismatch columns + insertion
columns + deletion columns).
Mapping Report > Mapped Concordance vs Read Length
• Maps the percent concordance with the reference sequence against
the CCS read length, in base pairs.
Polished Assembly > Summary Metrics
Displays statistics on the contigs from the de novo assembly that were
corrected by Arrow.
• Circular: Marks whether circularity of the contig was detected. Output values
are yes and no.
• Coverage: The average coverage across the contig, calculated as the sum of
the coverage of all bases in the contig divided by the number of bases.
Coverage > Summary Metrics
Displays depth of coverage across the de novo-assembled genome, as
well as depth of coverage distribution.
• Mean Coverage: The mean depth of coverage across the assembled genome
sequence.
• Missing Bases: The percentage of the genome’s sequence that has zero
depth of coverage.
Coverage > Coverage across Reference
• Displays coverage at each position of the draft genome assembly.
Coverage > Depth of Coverage
• Histogram distribution of the draft assembly regions by the coverage.
Coverage > Coverage vs. [GC] Content
• Maps coverage against GC content, expressed as the percentage of Gs
and Cs in each 100 bp window. The number of genomic windows with each
GC percentage is displayed on top. Use this plot to check that no
coverage is lost over regions with extremely biased base composition.
Base Modifications > Kinetic Detections
• Per-Base Kinetic Detections: Maps the modification QV against per-
strand coverage.
• Kinetic Detections Histogram: Histogram distribution of the number of
bases by modification QV.
Modified Base Motifs > Modified Base Motifs
Displays statistics for the methyltransferase recognition motifs detected.
• Partner Motif: For motifs that are not self-palindromic, this is the
complementary sequence.
• Mean IPD Ratio: The mean inter-pulse duration (IPD) ratio. An IPD ratio greater than 1
means that the sequencing polymerase slowed down at this base position,
relative to the control. An IPD ratio less than 1 indicates speeding up.
• Group Tag: The motif group of which the motif is a member. Motifs are
grouped if they are mutually or self reverse-complementary. If the motif isn’t
complementary to itself or another motif, the motif is given its own group.
• Objective Score: For a given motif, the objective score is defined as
(fraction methylated)*(sum of log-p values of matches).
• Mapped BAM: The BAM file of subread alignments to the draft contigs used
for polishing.
• Mapped BAM Index: The BAI index file for the corresponding Mapped BAM
file.
• Final Polished Assembly: The polished assembly before oriC rotation is
applied, in FASTA format.
• Final Polished Assembly Index: The FASTA index (FAI) file for the polished
assembly before oriC rotation is applied.
• Per-Base IPDs for IGV: BigWig file containing encoded per-base IPD ratios.
Minor Variants Analysis
Use this application to identify and phase minor single nucleotide
substitution variants in complex populations. This application is powered
by the juliet algorithm:
• Accepts HiFi reads (BAM format) as input. HiFi reads are reads
generated with CCS analysis whose quality value is equal to or greater
than 20.
• Includes reference-based codon and amino acid calling (indel variants are
not called) in amplicons ≤4 kb that are fully spanned by long reads.
• Includes extensive application reports for the HIV pol coding region,
including drug resistance annotation from publicly-available
databases.
• Includes reliable detection of minor variants at 1% frequency, given 6,000
high-quality CCS reads with predicted accuracy of ≥0.99 per sample.
• The current version of this application provides additional reports for
the HIV pol coding region, but it can be configured for any target
organism or gene.
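Simple binomial arithmetic helps explain the 1% / 6,000-read figure above. At 1% frequency, about 60 of 6,000 reads are expected to carry the variant. The sketch computes that expectation and the probability of observing at least k supporting reads. The cutoff k is purely hypothetical, and this is not the statistical model juliet uses.

```python
from math import comb


def expected_support(n_reads, frequency):
    """Expected number of reads carrying a variant at the given frequency."""
    return n_reads * frequency


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
```

For example, `prob_at_least(30, 6000, 0.01)` is very close to 1, so a 1% variant would almost never be supported by fewer than 30 of 6,000 reads.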
Importing/exporting analysis settings
• Click Import Analysis Settings and select a previously-saved CSV file
containing the desired settings (including Advanced Parameters) for
the selected application. The imported application settings are set.
• Click Export to create a CSV file containing all the settings you
specified for the application. You can then import this file when
creating future analyses using the same application. You can also use
this exported file as a template for use with later analyses.
Reference Set (Required)
• Specify a reference sequence to align the SMRT Cell reads to and to
produce alignments.
Target Config (Required)
• Defines genes of interest within the reference and, optionally, drug
resistance mutations for specific variants. Minor Variants Analysis
contains one predefined target configuration for HIV HXB2. To specify
this target configuration, enter HIV_HXB2 into the Target Config field.
To specify a custom target configuration for any organism or gene
other than HIV HXB2, enter either the path to the target configuration
JSON file on the SMRT Link server or the entire content of the JSON
file.
Parameters
Maximum Variant Frequency to Report (%) (Required, Default = 100)
• Specify that only variants whose percentage of the population is less
than this value be reported. Lowering this value helps to phase
low-frequency variants when the highest-frequency variant is different
from the reference.
Minimum Variant Frequency to Report (%) (Required, Default = 0.1)
• Specify that only variants whose percentage of the population is
greater than this value be reported. Increasing this value helps to
reduce PCR noise.
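Taken together, the two thresholds define a reporting window on variant frequency. The minimal Python sketch below reads both descriptions literally, as strict inequalities on percentages, so under this reading a variant at exactly 100% would not be reported. The function name and example calls are hypothetical, and juliet's exact handling of the boundary values is not stated in this guide.

```python
from typing import List, Tuple


def in_reporting_window(freq_pct: float, min_pct: float = 0.1, max_pct: float = 100.0) -> bool:
    """True if a variant's population frequency (%) lies strictly between the two thresholds."""
    return min_pct < freq_pct < max_pct


# Hypothetical (variant, frequency %) calls for one gene.
calls: List[Tuple[str, float]] = [("K103N", 0.05), ("M184V", 2.3), ("Y181C", 97.0)]

print([v for v, pct in calls if in_reporting_window(pct)])                # ['M184V', 'Y181C']
# Lowering the maximum drops the majority variant, leaving only the minor one.
print([v for v, pct in calls if in_reporting_window(pct, max_pct=90.0)])  # ['M184V']
```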
Advanced parameters
Min. CCS Predicted Accuracy (Phred Scale) (Default = 20)
• Phred-scale integer QV cutoff for filtering HiFi reads. The default
for all applications is 20 (QV 20), or 99% predicted accuracy.
Phase Variants (Default = ON)
• Specify whether to phase variants and cluster haplotypes.
Only Report Variants in Target Config (Default = OFF)
• Specify whether to report only variants that confer drug resistance,
as listed in the target configuration file.
Region of Interest (Default = NONE)
• Specify genomic regions of interest; reads are clipped to that region.
If not specified, all reads are used.
Target Config Override (Default = NONE)
• If defined (and the main Target Config option is set to NONE), this
string is interpreted as either a file system path to a JSON file, or
the actual JSON content.
Filters to Add to the Data Set (Default = NONE)
• A semicolon-separated (not comma-separated) list of other filters to
add to the Data Set.
Compute Settings (Optional)
• Select the distributed computing cluster settings configuration, if
made available by the site SMRT Link administrator.
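The Min. CCS Predicted Accuracy cutoff uses the standard Phred scale, QV = -10 × log10(1 - accuracy), which is why QV 20 corresponds to 99% predicted accuracy. A minimal Python sketch of the conversion in both directions (function names are illustrative only):

```python
import math


def qv_to_accuracy(qv: float) -> float:
    """Phred-scale QV to predicted accuracy: p = 1 - 10^(-QV/10)."""
    return 1.0 - 10.0 ** (-qv / 10.0)


def accuracy_to_qv(accuracy: float) -> float:
    """Predicted accuracy in [0, 1) to Phred-scale QV: QV = -10 * log10(1 - p)."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


for qv in (20, 30, 40):
    print(qv, round(qv_to_accuracy(qv), 6))
# 20 0.99
# 30 0.999
# 40 0.9999
```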
Data > File Downloads
The following files are available on the analysis results page. Additional
files are available on the SMRT Link server, in the analysis output
directory.
1. Input Data
Summarizes the data provided, the exact call for juliet, and juliet
version for traceability purposes.
2. Target Config
3. Variant Discovery
the gene, mutated codon, percentage, mutated amino acid, coverage, and
possible affected drugs.
4. Drug Summaries
Phasing
Without phasing, juliet calls amino acid/codon variants independently.
When the Phase Variants parameter is set to ON (the default in this
application), variant calls from distinct haplotypes are clustered and
visualized in the HTML output.
There are two types of tooltips in the haplotype section of the table.
The first tooltip is for the Haplotypes % and shows the number of reads
that count towards (a) actually reported haplotypes, (b) haplotypes that
have less than 10 reads and are not being reported, and (c) haplotypes
that are not suitable for phasing. Those first three categories are mutually
exclusive and their sum is the total number of reads going into juliet.
For (c), the three different marginals provide insights into the sample
quality; as they are marginals, they are not exclusive and can overlap. The
following image shows a sample with bad PCR conditions:
The second type of tooltip is for each haplotype percentage and shows the
number of reads contributing to that haplotype.
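The three read categories in the first tooltip partition all reads going into juliet. The following Python sketch illustrates only that bookkeeping. It assumes a hypothetical mapping from haplotype to read count and uses the 10-read reporting threshold described above. It does not reproduce how juliet clusters haplotypes.

```python
from typing import Dict, Tuple

MIN_READS_TO_REPORT = 10  # haplotypes with fewer reads are not reported


def haplotype_read_categories(
    haplotype_reads: Dict[str, int], unsuitable_reads: int
) -> Tuple[int, int, int]:
    """Split reads into (reported, too few to report, not suitable for phasing)."""
    reported = sum(n for n in haplotype_reads.values() if n >= MIN_READS_TO_REPORT)
    too_small = sum(n for n in haplotype_reads.values() if n < MIN_READS_TO_REPORT)
    return reported, too_small, unsuitable_reads


# Hypothetical sample: three haplotypes plus 40 reads unsuitable for phasing.
counts = {"hap_A": 900, "hap_B": 55, "hap_C": 5}
reported, too_small, unsuitable = haplotype_read_categories(counts, unsuitable_reads=40)

# The categories are mutually exclusive, so they sum to the total input reads.
total_reads = reported + too_small + unsuitable
print(reported, too_small, unsuitable, total_reads)  # 955 5 40 1000
```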
Structural Variant Calling
Use this application to identify structural variants (Default: ≥20 bp) in a
sample or set of samples relative to a reference. Variant types identified
are insertions, deletions, duplications, copy number variants (CNVs),
inversions, and translocations.
• The application accepts HiFi reads (BAM format) as input. HiFi reads
are reads generated with CCS analysis whose quality value is equal to
or greater than 20.
Importing/exporting analysis settings
• Click Import Analysis Settings and select a previously-saved CSV file
containing the desired settings (including Advanced Parameters) for
the selected application. The imported settings are then applied.
• Click Export to create a CSV file containing all the settings you
specified for the application. You can then import this file when
creating future analyses using the same application. You can also use
this exported file as a template for use with later analyses.
Reference Set (Required)
• Specify a reference genome against which to align the reads and call
variants.
Parameters
Advanced parameters
Advanced pbmm2 Options (Default = NONE)
• Space-separated list of custom pbmm2 options. Not all supported
command-line options can be used, and HPC settings cannot be modified.
See SMRT® Tools reference guide v11.0 for details.
Advanced pbsv Options (Default = NONE)
• Additional pbsv command-line arguments. See SMRT® Tools reference
guide v11.0 for details.
Compute Settings (Optional)
• Select the distributed computing cluster settings configuration, if
made available by the site SMRT Link administrator.
Note: The Data Set field Bio Sample Name identifies which Data Sets
belong to which biological samples.
• If multiple Data Sets with the same Bio Sample Name are selected and
submitted, the Structural Variant Calling application merges those
Data Sets as belonging to the same sample.
• If any input Data Sets do not have a Bio Sample Name specified, they
are merged (if there are multiple such Data Sets) and their Bio Sample
Name is set to UnnamedSample in the analysis results.
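The merging rule can be sketched as a simple grouping step. In this hypothetical Python example, each Data Set is a (Data Set ID, Bio Sample Name) pair. Treating an empty string the same as a missing name is an assumption, not documented behavior.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

UNNAMED = "UnnamedSample"


def group_by_bio_sample(datasets: List[Tuple[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Group Data Set IDs by Bio Sample Name; Data Sets without a name share UnnamedSample."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for dataset_id, bio_sample in datasets:
        groups[bio_sample or UNNAMED].append(dataset_id)
    return dict(groups)


inputs = [("cell_1", "HG002"), ("cell_2", "HG002"), ("cell_3", None), ("cell_4", "")]
print(group_by_bio_sample(inputs))
# {'HG002': ['cell_1', 'cell_2'], 'UnnamedSample': ['cell_3', 'cell_4']}
```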
Reports and data files
The Structural Variant Calling application generates the following reports:
• Insertions (total bp): The count and total length (in base pairs) of all called
insertions in the sample.
• Deletions (total bp): The count and total length (in base pairs) of all called
deletions in the sample.
• Inversions (total bp): The count and total length (in base pairs) of all called
inversions in the sample.
• Translocations: The count of all called translocations in the sample.
• Duplications (total bp): The count and total length (in base pairs) of all called
duplications in the sample.
• Total Variants (total bp): The count and total length (in base pairs) of all
variants in the sample.
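Each report row is a count plus a summed length per variant type. As a rough cross-check against the Structural Variants VCF, the Python sketch below tallies records by the standard VCF SVTYPE INFO key and sums the absolute SVLEN values. It assumes the VCF carries these standard keys; check the SMRT Tools reference guide for pbsv's exact fields. The records are hypothetical, and this is not how SMRT Link computes the report.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_svs(vcf_lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Return {SVTYPE: (count, total_bp)} using abs(SVLEN) as each variant's length."""
    counts: Dict[str, int] = defaultdict(int)
    total_bp: Dict[str, int] = defaultdict(int)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip header and blank lines
        info: Dict[str, str] = {}
        for item in line.rstrip("\n").split("\t")[7].split(";"):
            key, _, value = item.partition("=")
            info[key] = value
        svtype = info.get("SVTYPE", "UNKNOWN")
        counts[svtype] += 1
        if info.get("SVLEN"):  # breakend (translocation) records typically lack SVLEN
            total_bp[svtype] += abs(int(info["SVLEN"].split(",")[0]))
    return {t: (counts[t], total_bp[t]) for t in counts}


def vcf_record(chrom: str, pos: int, vid: str, ref: str, alt: str, info: str) -> str:
    # Hypothetical minimal VCF data line without sample columns.
    return "\t".join([chrom, str(pos), vid, ref, alt, ".", "PASS", info])


records = [
    vcf_record("chr1", 1000, "sv1", "A", "<DEL>", "SVTYPE=DEL;END=1500;SVLEN=-500"),
    vcf_record("chr1", 5000, "sv2", "T", "<INS>", "SVTYPE=INS;END=5000;SVLEN=120"),
    vcf_record("chr2", 200, "sv3", "G", "<INV>", "SVTYPE=INV;END=2200;SVLEN=2000"),
    vcf_record("chr3", 100, "sv4", "N", "N]chr5:300]", "SVTYPE=BND"),
]
summary = summarize_svs(records)
print(summary)  # {'DEL': (1, 500), 'INS': (1, 120), 'INV': (1, 2000), 'BND': (1, 0)}
total_count = sum(c for c, _ in summary.values())
total_len = sum(bp for _, bp in summary.values())
print(total_count, total_len)  # 4 2620
```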
Report > Count by Sample (Genotype)
This table describes the genotype of called variants broken down by
individual sample. For each sample, only variants for which the sample
has a heterozygous (“0/1”) or homozygous alternative (“1/1”) genotype
are considered.
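The genotype rule above can be expressed as a per-sample filter on the VCF GT field. This Python sketch assumes a multi-sample VCF whose FORMAT column contains GT. Only the two unphased genotypes named above are counted, and the sample names and records are hypothetical.

```python
from collections import Counter
from typing import Iterable, List

CARRIER_GENOTYPES = {"0/1", "1/1"}  # heterozygous and homozygous alternative


def count_by_sample(vcf_lines: Iterable[str], sample_names: List[str]) -> Counter:
    """Count, per sample, the variants whose GT is 0/1 or 1/1."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
        for name, sample_field in zip(sample_names, fields[9:]):
            if sample_field.split(":")[gt_index] in CARRIER_GENOTYPES:
                counts[name] += 1
    return counts


def row(vid: str, gt1: str, gt2: str) -> str:
    # Hypothetical two-sample VCF data line with FORMAT GT:DP.
    return "\t".join(
        ["chr1", "100", vid, "A", "<DEL>", ".", "PASS", "SVTYPE=DEL", "GT:DP", gt1, gt2]
    )


lines = [row("v1", "0/1:20", "0/0:18"), row("v2", "1/1:25", "0/1:30"), row("v3", "0/0:22", "./.:0")]
print(dict(count_by_sample(lines, ["sampleA", "sampleB"])))  # {'sampleA': 2, 'sampleB': 1}
```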
• Aligned Reads (per sample): Aligned reads, in BAM format, separated by
individual.
• Index of Aligned Reads (per sample): BAM index files associated with the
Aligned Reads BAM files.
• Structural Variants: All the structural variants, in VCF format. (See here for
details.)
PacBio® data utilities
Following are the data processing utilities provided with SMRT Analysis
v11.0. These utilities are used as intermediate steps toward producing
biologically meaningful results. Each utility is described later in this
document, including all parameters, reports, and output files generated by
the utility.
Note: Except for Circular Consensus Sequencing (CCS), which accepts
subreads, the following data utilities accept only HiFi reads as input.
5mC CpG Detection
Use this utility to analyze the kinetic signatures of cytosine bases in CpG
motifs to identify the presence of 5mC.
• The utility accepts HiFi reads (BAM format) as input. HiFi reads are
reads generated with CCS analysis whose quality value is equal to or
greater than 20. The utility also requires kinetics information.
Importing/exporting analysis settings
• Click Import Analysis Settings and select a previously-saved CSV file
containing the desired settings (including Advanced Parameters) for
the selected utility. The imported settings are then applied.
• Click Export to create a CSV file containing all the settings you
specified for the utility. You can then import this file when creating
future jobs using the same utility. You can also use this exported file
as a template for use with later jobs.
Parameters
Keep Kinetics in Output (Default = OFF)
• If ON, specifies that the IPD and PulseWidth records are included in the
output BAM file.
Filters to Add to the Data Set (Default = NONE)
• A semicolon-separated (not comma-separated) list of other filters to
add to the Data Set.
Min. CCS Predicted Accuracy (Phred Scale) (Default = 20)
• Phred-scale integer QV cutoff for filtering HiFi reads. The default
for all applications is 20 (QV 20), or 99% predicted accuracy.
Compute Settings (Optional)
• Select the distributed computing cluster settings configuration, if
made available by the site SMRT Link administrator.
Demultiplex Barcodes
Use this utility to separate sequence reads by barcode. (See “Working
with barcoded data” on page 107 for more details.)
• The utility accepts HiFi reads (BAM format) as input. HiFi reads are
reads generated with CCS analysis whose quality value is equal to or
greater than 20.
• Barcoded SMRTbell templates are SMRTbell templates with adapters
flanked by barcode sequences, located on both ends of an insert.
• For symmetric and tailed library designs, the same barcode is
attached to both sides of the insert sequence of interest. The only
difference is the orientation of the trailing barcode. For asymmetric
designs, different barcodes are attached to the sides of the insert
sequence of interest.
• Barcode names and sequences, independent of orientation, must be
unique.
• Most-likely barcode sequences per SMRTbell template are identified
using a FASTA-format file of the known barcode sequences.
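The requirement that names and sequences be unique independent of orientation can be checked before a run by comparing each sequence with the reverse complements of the others. The following Python sketch parses a barcode FASTA and reports duplicate names or sequences. The barcode names and sequences in the example are hypothetical, and this is not the validation SMRT Link itself performs.

```python
from typing import Dict, List, Optional, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Minimal FASTA parser returning (name, sequence); name is the first header token."""
    records: List[Tuple[str, str]] = []
    name: Optional[str] = None
    parts: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(parts)))
            name, parts = line[1:].split()[0], []
        else:
            parts.append(line)
    if name is not None:
        records.append((name, "".join(parts)))
    return records


def find_barcode_conflicts(fasta_text: str) -> List[str]:
    """Report duplicate names and sequences that match in either orientation."""
    problems: List[str] = []
    names_seen = set()
    first_by_key: Dict[str, str] = {}  # orientation-independent key -> first barcode name
    for name, seq in parse_fasta(fasta_text):
        if name in names_seen:
            problems.append(f"duplicate name: {name}")
        names_seen.add(name)
        seq = seq.upper()
        key = min(seq, reverse_complement(seq))  # same key for a sequence and its reverse complement
        if key in first_by_key:
            problems.append(f"{name} matches {first_by_key[key]} (either orientation)")
        else:
            first_by_key[key] = name
    return problems


example = """>bcA
CACATATCAGAGTGCG
>bcB
ACACACAGACTGTGAG
>bcC
CGCACTCTGATATGTG
"""
print(find_barcode_conflicts(example))  # ['bcC matches bcA (either orientation)']
```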
Given an input set of barcodes and a BAM Data Set, the Demultiplex
Barcodes utility produces:
• A set of BAM files whose reads are annotated with the barcodes;
• A ConsensusReadSet file that contains the file paths of that
collection of barcode-tagged BAM files and their related files.
Notes on Iso-Seq Multiplexed Data
There are two ways in which an Iso-Seq library can be multiplexed:
Whether to run the Demultiplex Barcodes utility depends on the
multiplexing method:
• Not multiplexed: NO
• Barcoded adapters: YES
• Barcoded cDNA primer: NO
Interactively:
1. Click Interactively, then drag barcodes from the Available Barcodes
column to the Included Barcodes column. (Use the checkboxes to
select multiple barcodes.)
2. (Optional) Click a Bio Sample field to edit the Bio Sample Name
associated with a barcode. Note: Avoid spaces in Bio Sample Names
as they may lead to third-party compatibility issues.
3. (Optional) Click Download as a file for later use.
4. Click Save to save the edited barcodes/Bio Sample names. You see
Success on the line below, assuming the file is formatted correctly.
From a file:
1. Click From a File, then click Download File. Edit the file and enter the
biological sample names associated with the barcodes in the second
column, then save the file. Use only alphanumeric characters, spaces
(allowed but not recommended for compatibility with third-party
downstream software), hyphens, underscores, colons, or periods; other
characters are removed automatically. Names can be at most 40
characters long. If you did not use all barcodes in the Autofilled
Barcode Name file in the sequencing run, delete those rows.
– Note: Open the CSV file in a text editor and check that the columns
are separated by commas, not semicolons or tabs.
2. Select the Barcoded Sample File you just edited. You see Success on
the line below, assuming the file is formatted correctly.
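These naming rules can be checked before you select the file. The following Python sketch assumes "alphanumeric" means ASCII letters and digits. It removes disallowed characters, flags spaces and names longer than 40 characters, and rejects a header line that uses tabs or semicolons instead of commas. The function names are hypothetical. This guide does not say whether SMRT Link truncates over-long names, so the sketch only flags them.

```python
import re
from typing import List

_DISALLOWED = re.compile(r"[^A-Za-z0-9 _:.\-]")  # assumes ASCII letters and digits
MAX_NAME_LENGTH = 40


def sanitize_bio_sample_name(name: str) -> str:
    """Remove characters other than letters, digits, space, underscore, colon, period, hyphen."""
    return _DISALLOWED.sub("", name)


def check_bio_sample_name(name: str) -> List[str]:
    """Return warnings for one Bio Sample Name (an empty list if it looks fine)."""
    warnings: List[str] = []
    cleaned = sanitize_bio_sample_name(name)
    if cleaned != name:
        warnings.append(f"characters would be removed: {name!r} -> {cleaned!r}")
    if " " in cleaned:
        warnings.append("contains spaces (allowed, but may break third-party tools)")
    if len(cleaned) > MAX_NAME_LENGTH:
        warnings.append(f"longer than {MAX_NAME_LENGTH} characters")
    return warnings


def check_csv_header(header_line: str) -> None:
    """Raise if the header looks tab- or semicolon-delimited rather than comma-delimited."""
    if "," not in header_line and ("\t" in header_line or ";" in header_line):
        raise ValueError("columns must be separated by commas, not tabs or semicolons")


print(check_bio_sample_name("Liver #2 (rep1)"))
print(check_bio_sample_name("HG002_rep1"))  # []
```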
Min. CCS Predicted Accuracy (Phred Scale) (Default = 20)
• Phred-scale integer QV cutoff for filtering HiFi reads. The default
for all applications is 20 (QV 20), or 99% predicted accuracy.
Minimum Barcode Score (Default = 80)
• A barcode score measures the alignment between a barcode attached to
a read and an ideal barcode sequence, and indicates how well the chosen
barcode pair matches. It ranges from 0 (no match) to 100 (a perfect
match). Reads with barcode scores below this minimum value are not
included in the analysis. This affects the output BAM file and the
output demultiplexed Data Set XML file.
Advanced lima Options (Default = NONE)
• Space-separated list of custom lima options. Not all supported
command-line options can be used, and HPC settings cannot be modified.
See the Demultiplex Barcodes section of the document SMRT® Tools
reference guide v11.0 for information on lima.
Compute Settings (Optional)
• Select the distributed computing cluster settings configuration, if
made available by the site SMRT Link administrator.
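Because only reads scoring below the minimum are excluded, a read scoring exactly 80 is kept at the default setting. A minimal Python sketch over hypothetical (read ID, score) pairs:

```python
from typing import List, Tuple


def filter_by_barcode_score(reads: List[Tuple[str, int]], min_score: int = 80) -> List[str]:
    """Return IDs of reads whose barcode score (0 = no match, 100 = perfect) is >= min_score."""
    for read_id, score in reads:
        if not 0 <= score <= 100:
            raise ValueError(f"barcode score out of range for {read_id}: {score}")
    return [read_id for read_id, score in reads if score >= min_score]


reads = [("read_1", 95), ("read_2", 80), ("read_3", 42)]
print(filter_by_barcode_score(reads))                # ['read_1', 'read_2']
print(filter_by_barcode_score(reads, min_score=90))  # ['read_1']
```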
• Unbarcoded Reads: The number of reads without barcodes in the sequence
data.
• Percent Bases in Barcoded Reads: The percentage of bases in the sequence
data that are in reads containing barcodes.
• Percent Barcoded Reads: The percentage of reads in the sequence data that
contain barcodes.
Barcodes > Barcode Data
• Bio Sample Name: The name of the biological sample associated with the
barcode combination.
• Barcode Name: A string containing the pair of barcode indices for which the
following metrics apply.
• Polymerase Reads: The number of polymerase reads associated with the
barcode combination.
• Bases: The number of bases associated with the barcode combination.
• Mean Read Length: The mean read length of reads associated with the
barcode combination.
• Mean Barcode Quality: The mean barcode quality associated with the
barcode combination.
Barcodes > Inferred Barcodes
• Barcode Name: The barcode name.
• Number of ZMWs: The number of ZMWs out of the first 50,000 that are
inferred to be assigned to the barcode combination.
• Mean Barcode Score: The mean barcode score associated with the reads
inferred to be associated with the barcode combination.
• Selected: Yes if the number of ZMWs is at least 10, No otherwise.
Barcodes > Barcoded Read Statistics
• Number of Reads per Barcode: Line graph displays the number of sorted
reads per barcode.
– Good performance: The Number of Reads per Barcode line (blue) should
be mostly linear. Note that this depends on the choice of Y-axis scale. The
mean Number of Reads per Barcode line (red) should be near the middle
of the graph and should not be skewed by samples with too many or too
few barcodes.
– Questionable performance: A sharp discontinuity in the blue line, followed
by no yield, with the red line far from the center. Check the output file
Inferred Barcodes, note the correct barcodes used, and consider
reanalyzing the multiplexed samples with the correct Bio Sample names
for the barcodes actually used. If you reanalyze the data, ensure that the
Barcode Name file includes only the correct barcodes used.
• Barcode Frequency Distribution: Histogram distribution of read counts per
barcode.
– Good performance: Roughly uniform read counts across barcodes, which
most often appear as a fairly tight, symmetric normal distribution with
few barcodes in the tails.
– Questionable performance: A large peak at zero. This can indicate use of
incorrect barcodes. Check the output file Inferred Barcodes, note the
correct barcodes used, and consider reanalyzing the multiplexed samples
with the correct Bio Sample names for the barcodes actually used. If you
reanalyze the data, ensure that the Barcode Name file includes only the
correct barcodes used.
• Mean Read Length Distribution: Histogram distribution of the mean
polymerase read length for all samples.
– Good performance: The distribution should be normal with a relatively
tight range.
– Questionable performance: A spread out distribution, with a mode
towards the low end.
• Barcode Files: Barcoded subread Data Sets; one file per barcode.
• Barcoding Summary CSV: Data displayed in the reports, in CSV format. This
includes Bio Sample Name.
• Barcode Summary: Text file listing how many ZMWs were filtered, how many
ZMWs are the same or different, and how many reads were filtered.
• Inferred Barcodes: Inferred barcodes used in the analysis. The barcoding
algorithm looks at the first 35,000 ZMWs, then selects barcodes with ≥10
counts and mean scores ≥45.
• Unbarcoded Reads: BAM file containing reads not associated with a
barcode.
• demultiplex.<barcode>.hifi.reads.fastq.gz: Gzipped HiFi reads in FASTQ
format, one file per barcode.
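The Inferred Barcodes selection rule can be sketched as counting barcode calls in the first ZMWs and keeping pairs with at least 10 calls and a mean score of at least 45. This guide gives two different window sizes for the step: the first 50,000 ZMWs in the Inferred Barcodes report description and the first 35,000 ZMWs in the file description above. The sketch therefore takes the window as a required parameter. The input tuples and barcode names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def infer_barcodes(
    zmw_calls: Iterable[Tuple[str, float]],
    peek: int,
    min_count: int = 10,
    min_mean_score: float = 45.0,
) -> List[str]:
    """Select barcode pairs from the first `peek` ZMW calls by count and mean score."""
    counts: Dict[str, int] = defaultdict(int)
    score_sums: Dict[str, float] = defaultdict(float)
    for i, (barcode_pair, score) in enumerate(zmw_calls):
        if i >= peek:
            break  # only the first `peek` ZMWs are considered
        counts[barcode_pair] += 1
        score_sums[barcode_pair] += score
    return sorted(
        pair
        for pair, n in counts.items()
        if n >= min_count and score_sums[pair] / n >= min_mean_score
    )


calls = (
    [("bcA--bcA", 90)] * 30    # frequent, high score: selected
    + [("bcB--bcB", 88)] * 12  # just above the count threshold: selected
    + [("bcC--bcC", 30)] * 15  # frequent but low mean score: rejected
    + [("bcD--bcD", 95)] * 3   # high score but too few ZMWs: rejected
)
print(infer_barcodes(calls, peek=50_000))  # ['bcA--bcA', 'bcB--bcB']
print(infer_barcodes(calls, peek=35))      # ['bcA--bcA']
```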
Export Reads
Use this utility to export HiFi reads that pass filtering criteria as FASTA,
FASTQ, and BAM files.
• The utility accepts HiFi reads (BAM format) as input. HiFi reads are
reads generated with CCS analysis whose quality value is equal to or
greater than 20.
• For barcoded runs, you must first run the Demultiplex Barcodes utility
to create BAM files before using this utility.
• This utility does not generate any reports.
Importing/exporting analysis settings
• Click Import Analysis Settings and select a previously-saved CSV file
containing the desired settings (including Advanced Parameters) for
the selected utility. The imported settings are then applied.
• Click Export to create a CSV file containing all the settings you
specified for the utility. You can then import this file when creating
future jobs using the same utility. You can also use this exported file
as a template for use with later jobs.
Output FASTA File (Default = ON)
• Outputs a single FASTA/FASTQ file containing all the reads that
passed the filtering criteria.
Output BAM File (Default = OFF)
• Outputs a single BAM file containing all the reads that passed the
filtering criteria.
Min. CCS Predicted Accuracy (Phred Scale) Default = 20
• Phred-scale integer QV cutoff for filtering HiFi reads. The default for
all applications is 20 (QV 20), or 99% predicted accuracy.
Parameters
Filters to Add to the Data Set (Default = NONE)
• A semicolon-separated (not comma-separated) list of other filters to
add to the Data Set.
Compute Settings (Optional)
• Select the distributed computing cluster settings configuration, if
made available by the site SMRT Link administrator.
• hifi_reads.fastq.gz: Sequence data that passed filtering criteria, converted to
Gzipped FASTQ format.
• <Reads>.bam: Sequence data that passed filtering criteria.
Mark PCR Duplicates
Use this utility to remove duplicate reads from a HiFi reads Data Set
created using an ultra-low DNA sequencing protocol.
• The utility accepts HiFi reads (BAM format) as input. HiFi reads are
reads generated with CCS analysis whose quality value is equal to or
greater than 20.
Note: If starting with a very low-input DNA sample using the SMRTbell
gDNA sample amplification kit, you must run this utility (preceded by the
Trim Ultra-Low Adapters utility) on the resulting Data Set prior to running
any secondary analysis application.
Identify Duplicates Across Sequencing Libraries (Default = ON)
• Duplicate reads are identified per sequencing library. The library is
specified in the BAM read group LB tag, which is set using the Well
Sample Name field in Run Design. By convention, different LB tags
correspond to different library preparations. When the LB tag does not
follow this convention, use this option to treat all reads as coming
from the same sequencing library.
Min. CCS Predicted Accuracy (Phred Scale) (Default = 20)
• Phred-scale integer QV cutoff for filtering HiFi reads. The default
for all applications is 20 (QV 20), or 99% predicted accuracy.
Compute Settings (Optional)
• Select the distributed computing cluster settings configuration, if
made available by the site SMRT Link administrator.
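Identify Duplicates Across Sequencing Libraries changes how reads are grouped before they are compared. The Python sketch below illustrates only that grouping. It treats exact sequence identity as "duplicate", which is a deliberate simplification and not the utility's actual duplicate-detection method. The read tuples are hypothetical.

```python
from typing import Dict, List, Tuple


def mark_duplicates(
    reads: List[Tuple[str, str, str]], across_libraries: bool = True
) -> List[str]:
    """Return IDs of reads that duplicate an earlier read in the same group.

    reads: (read_id, lb_tag, sequence). Exact sequence identity stands in for real
    duplicate detection; the point of this sketch is the grouping by library.
    """
    first_seen: Dict[Tuple[str, str], str] = {}
    duplicates: List[str] = []
    for read_id, lb_tag, sequence in reads:
        group = "ALL_LIBRARIES" if across_libraries else lb_tag
        key = (group, sequence)
        if key in first_seen:
            duplicates.append(read_id)
        else:
            first_seen[key] = read_id
    return duplicates


reads = [
    ("r1", "libA", "ACGTACGT"),
    ("r2", "libB", "ACGTACGT"),
    ("r3", "libA", "ACGTACGT"),
    ("r4", "libA", "TTTTCCCC"),
]
print(mark_duplicates(reads, across_libraries=True))   # ['r2', 'r3']
print(mark_duplicates(reads, across_libraries=False))  # ['r3']
```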
Data > File Downloads
The following files are available on the analysis results page. Additional
files are available on the SMRT Link server, in the analysis output
directory.
Trim Ultra-Low Adapters
Use this utility to trim PCR adapters from a HiFi reads Data Set created
using an ultra-low DNA sequencing library.
• The utility accepts HiFi reads (BAM format) as input. HiFi reads are
reads generated with CCS analysis whose quality value is equal to or
greater than 20.
Note: If starting with a very low-input DNA sample using the SMRTbell
gDNA sample amplification kit, you must run this utility (followed by the
Mark PCR Duplicates utility) on the resulting Data Set prior to running any
secondary analysis application.
Min. CCS Predicted Accuracy (Phred Scale) (Default = 20)
• Phred-scale integer QV cutoff for filtering HiFi reads. The default
for all applications is 20 (QV 20), or 99% predicted accuracy.
Compute Settings (Optional)
• Select the distributed computing cluster settings configuration, if
made available by the site SMRT Link administrator.
• Reads Without PCR Adapters: The number of reads without PCR adapters in
the sequence data.
• Percent Bases in Reads with Adapters: The percentage of bases in reads in
the sequence data that contain PCR adapters.
• Percent Reads with Adapters: The percentage of reads in the sequence data
that contain PCR adapters.
PCR Adapters > PCR Adapter Data
• Bio Sample Name: The name of the biological sample associated with the
PCR adapters.
• PCR Adapter Name: A string containing the pair of PCR adapter indices for
which the following metrics apply.
• Polymerase Reads: The number of polymerase reads associated with the
PCR adapter.
• Bases: The number of bases associated with the PCR adapter.
• Mean Read Length: The mean read length of reads associated with the PCR
adapter.
• Mean PCR Adapter Quality: The mean PCR adapter quality associated with
the PCR adapter.
PCR Adapters > PCR Adapter Read Statistics
• Number of Reads Per PCR Adapter: Histogram distribution of the mean
number of reads per PCR adapter.
• PCR Adapter Frequency Distribution: Histogram distribution of reads with
PCR adapter mapped to the number of barcoded samples.
• Mean Read Length Distribution: Maps the mean read length against the
number of barcoded samples.
PCR Adapters > PCR Adapter Quality Scores
• Histogram distribution of PCR adapter quality scores. The scores
range from 0-100, with 100 being a perfect match.
PCR Adapters > PCR Adapter Read Binned Histograms
• Read Length Distribution By PCR Adapter: Histogram distribution of the read
length by PCR adapter. Each column of rectangles is similar to a read length
histogram rotated vertically, seen from the top.
• PCR Adapter Quality Distribution By Barcode: Histogram distribution of the
per-barcode version of the Read Length Distribution by PCR Adapter
histogram.
Data > File Downloads
The following files are available on the analysis results page. Additional
files are available on the SMRT Link server, in the analysis output
directory.
Circular Consensus Sequencing (CCS)
Use this utility to identify consensus sequences for single molecules.
• The utility accepts Subreads (BAM format) as input.
Importing/exporting analysis settings
• Click Import Analysis Settings and select a previously-saved CSV file
containing the desired settings (including Advanced Parameters) for
the selected utility. The imported settings are then applied.
• Click Export to create a CSV file containing all the settings you
specified for the utility. You can then import this file when creating
future jobs using the same utility. You can also use this exported file
as a template for use with later jobs.
Detect 5mC Sites (Default = OFF)
• If set to ON, kinetics analysis to identify 5mC CpG sites will be
performed.
Parameters
Minimum CCS Read Length (Default = 10)
• The minimum median length of the insert reads required to generate a
consensus sequence. If the targeted template is known to fall within a
particular size range, this setting can filter out alternative DNA
templates.
Maximum CCS Read Length (Default = 50,000)
• The maximum median length of the insert reads allowed to generate a
consensus sequence. If the targeted template is known to fall within a
particular size range, this setting can filter out alternative DNA
templates.
Generate a Consensus for Each Strand (Default = OFF)
• Generates a consensus for each strand. Warning: This is an experimental
option for the CCS algorithm and may not be compatible with all
downstream applications. We recommend using command-line analysis for
this feature.
Process All Reads (Default = OFF)
• Specifies behavior identical to on-instrument CCS read generation,
overriding all other cutoffs. This setting writes a CCS read for every
ZMW in the input Data Set. Set to OFF to apply more restrictive settings.
Include Kinetics Information with CCS Analysis Output (Default = OFF)
• If ON, includes the per-base kinetics data required for DNA methylation
analysis. Note: This results in a BAM file that is 3 to 4 times larger.
This option applies only when Process All Reads is set to ON.
Advanced CCS Options (Default = NONE)
• Space-separated list of additional command-line options to CCS
analysis. Not all supported command-line options can be used, and HPC
settings cannot be modified. See SMRT® Tools reference guide v11.0 for
details.
Minimum Predicted Accuracy (Deprecated) (Default = 0.99)
• The minimum predicted accuracy of a read, ranging from 0 to 1. (0.99
indicates that only reads expected to be 99% accurate are emitted.)
Note: This setting is ignored if the Process All Reads advanced
parameter is set to ON.
Minimum Number of Passes (Deprecated) (Default = 3)
• The minimum number of full passes for a ZMW to be used. Full passes
must have an adapter hit before and after the insert sequence, and so
exclude any partial passes at the start and end of the sequencing
reaction. Note: This setting is ignored if the Process All Reads
advanced parameter is set to ON.
Detect and Split Heteroduplex Reads (Default = OFF)
• Specifies that any detected heteroduplexes are split into separate
reads.
Advanced parameters
Compute Settings (Optional)
• Select the distributed computing cluster settings configuration, if
made available by the site SMRT Link administrator.
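The CCS cutoffs above combine into a per-ZMW decision. The Python sketch below is a simplified model that rests on three assumptions: the "median size of insert reads" is the median subread length, all bounds are inclusive, and Process All Reads bypasses every other cutoff. The Zmw record type is hypothetical, and the CCS tool's actual logic is more involved.

```python
from dataclasses import dataclass
from statistics import median
from typing import List


@dataclass
class Zmw:
    """Hypothetical per-ZMW summary used only for this sketch."""
    subread_lengths: List[int]  # insert lengths of the individual passes
    full_passes: int            # passes with an adapter hit on both sides
    predicted_accuracy: float   # predicted consensus accuracy, 0 to 1


def emits_ccs_read(
    zmw: Zmw,
    min_length: int = 10,
    max_length: int = 50_000,
    min_passes: int = 3,
    min_accuracy: float = 0.99,
    process_all_reads: bool = False,
) -> bool:
    """Simplified model of whether a ZMW yields a CCS read under the cutoffs above."""
    if process_all_reads:
        return True  # a CCS read is written for every ZMW; other cutoffs are ignored
    if not zmw.subread_lengths:
        return False
    median_length = median(zmw.subread_lengths)
    return (
        min_length <= median_length <= max_length
        and zmw.full_passes >= min_passes
        and zmw.predicted_accuracy >= min_accuracy
    )


good = Zmw([12_000, 11_800, 12_100, 11_950], full_passes=4, predicted_accuracy=0.995)
few_passes = Zmw([15_000, 14_900], full_passes=2, predicted_accuracy=0.999)
print(emits_ccs_read(good), emits_ccs_read(few_passes))    # True False
print(emits_ccs_read(few_passes, process_all_reads=True))  # True
```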
• HiFi Reads: The total number of CCS reads whose quality value is equal to or
greater than 20.
• HiFi Yield (bp): The total yield (in base pairs) of the CCS reads whose quality
value is equal to or greater than 20.
• HiFi Read Length (mean, bp): The mean read length of the CCS reads whose
quality value is equal to or greater than 20.
• HiFi Read Quality (median): The median read quality (Phred scale) of the
CCS reads whose quality value is equal to or greater than 20.
• HiFi Number of Passes (mean): The mean number of passes used to
generate CCS reads whose quality value is equal to or greater than 20.
CCS Analysis Report > HiFi Read Length Summary
• Read Length (bp): The HiFi read length, ranging from ≥ 0 to ≥ 40,000 base
pairs.
• Reads: The number of HiFi reads with the specified read length.
• Reads (%): The percentage of HiFi reads with the specified read length.
• Yield (bp): The number of base pairs in the HiFi reads with the specified read
length.
• Yield (%): The percentage of base pairs in the HiFi reads with the specified
read length.
CCS Analysis Report > HiFi Read Quality Summary
• Read Quality (Phred): Phred-scale quality values, ranging from QV ≥20 to QV
≥50.
• Reads: The number of HiFi reads with the specified read quality.
• Reads (%): The percentage of HiFi reads with the specified read quality.
• Yield (bp): The number of base pairs in the HiFi reads with the specified read
quality.
• Yield (%): The percentage of base pairs in the HiFi reads with the specified
read quality.
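The ≥ labels in the HiFi Read Length Summary and HiFi Read Quality Summary suggest cumulative rows, where each row counts every read at or above its threshold. Under that interpretation, the following Python sketch builds such rows from hypothetical per-read values. Yield is the summed read length in both tables.

```python
from typing import List, Sequence, Tuple

Row = Tuple[float, int, float, int, float]  # (threshold, reads, reads %, yield bp, yield %)


def cumulative_summary(
    metric: Sequence[float], read_lengths: Sequence[int], thresholds: Sequence[float]
) -> List[Row]:
    """Rows for reads whose metric is >= each threshold.

    metric: per-read value being binned (read length in bp, or read QV).
    read_lengths: per-read length in bp, used to compute yield.
    """
    if len(metric) != len(read_lengths) or not metric:
        raise ValueError("need one metric value and one length per read")
    total_reads = len(metric)
    total_yield = sum(read_lengths)
    rows: List[Row] = []
    for t in thresholds:
        selected = [length for value, length in zip(metric, read_lengths) if value >= t]
        n, y = len(selected), sum(selected)
        rows.append((t, n, 100.0 * n / total_reads, y, 100.0 * y / total_yield))
    return rows


lengths = [5_000, 12_000, 18_000, 25_000]
qvs = [22, 31, 35, 48]
for row in cumulative_summary(lengths, lengths, [0, 10_000, 20_000]):
    print(row)  # read length summary rows
for row in cumulative_summary(qvs, lengths, [20, 30, 40]):
    print(row)  # read quality summary rows
```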
CCS Analysis Report > Read Length Distribution
• HiFi Read Length Distribution: Histogram distribution of HiFi reads by read
length.
• Yield by HiFi Read Length: Histogram distribution of the cumulative yields of
CCS reads by read length.
• Read Length Distribution: Histogram distribution of all reads by read length.
CCS Analysis Report > Number of Passes
• Histogram of the number of complete subreads in CCS reads, broken
down by number of reads.
CCS Analysis Report > Read Quality Distribution
• Histogram distribution of the CCS reads by the Phred-scale read
quality.
CCS Analysis Report > Predicted Accuracy vs. Read Length
• Heat map of CCS read lengths and predicted accuracies.
Data > File Downloads
The following files are available on the analysis results page. Additional
files are available on the SMRT Link server, in the analysis output
directory.
Working with barcoded data
This section describes how to use SMRT Link to work with barcoded data.
Demultiplex Barcodes analysis is powered by the lima SMRT Analysis
tool.
The canned data provided with SMRT Link v11.0 includes 7 barcode sets:
Run Design in SMRT Link v11.0 contains a required Bio Sample Name
field for both single and multiplexed samples.
Well Sample Name and Bio Sample Name entered in Sequel II systems,
and in Run Designs for multiplexed runs:
Step 1: Specify the barcode setup and sample names in a Run Design
Note: If you specified the barcode setup in Run Design, demultiplexing
is performed automatically after the data is transferred to the SMRT Link
server. You can also specify the barcode setup manually by selecting
SMRT Analysis > Create New Job and then selecting the Demultiplex
Barcodes data utility.
1. In SMRT Link, create a new Run Design as described in “Creating a
new Run Design” on page 16. Before you finish the new Run Design,
perform the following steps.
2. Click Barcoded Sample Options and then click Yes for Sample is
Barcoded. Additional fields related to barcoding display.
3. Specify a Barcode Set using the dropdown list.
Note: You can specify up to 10,000 samples. Specifying more than
10,000 samples may cause a delay of several minutes in analysis
submission.
4. Specify if the same barcodes are used on both ends of the
sequences.
– Selecting Yes specifies symmetric and tailed designs where all the
reads have the same barcodes on both ends of the insert sequence.
Barcode analysis of such experiments retains only data with the
same barcode identified on both ends.
– Selecting No specifies asymmetric designs where the barcodes are
different on each end of the insert. Barcode analysis of such
experiments retains any barcode pair combination identified in the
Data Set.
5. SMRT Link automatically creates a CSV-format Autofilled Barcode
Name file. The barcode name is populated based on your choice of
barcode set and on whether the barcodes are the same at both ends of
the sequence. The file includes a column of automatically generated Bio
Sample Names 1 through N, corresponding to barcodes 1 through N,
for the biological sample names. There are two different ways to
specify which barcodes to use, and assign biological sample names
to barcodes. (Note: Bio Sample Names are hardcoded and can be
traced through secondary analysis using SMRT Analysis.)
Interactively:
• Click Interactively, then drag barcodes from the Available Barcodes
column to the Included Barcodes column. (Use the check boxes to
select multiple barcodes.)
• (Optional) Click a Bio Sample field to edit the Bio Sample Name
associated with a barcode. Note: Avoid using spaces in Bio Sample
Names as they may lead to third-party compatibility issues.
• (Optional) Click Download as a file for later use.
• Click Save to save the edited barcodes/Bio Sample names. You see
Success on the line below, assuming the file is formatted correctly.
From a File:
• Click From a File, then click Download File. Edit the file and enter the
biological sample names associated with the barcodes in the second
column, then save the file. Use only alphanumeric characters, spaces
(allowed but not recommended for compatibility with third-party
downstream software), hyphens, underscores, colons, or periods; other
characters are removed automatically. Names can be at most 40
characters long. If you did not use all barcodes in the Autofilled
Barcode Name file in the sequencing run, delete those rows.
• Note: Open the CSV file in a text editor and check that the columns are
separated by commas, not semicolons or tabs.
• Select the Barcoded Sample file you just edited. You see Success on
the line below, assuming the file is formatted correctly.
6. Specify if and where to automatically generate HiFi reads (reads
generated with CCS analysis whose quality value is equal to or greater
than 20):
– On Instrument (available only for the Sequel IIe system): HiFi reads
are automatically generated on the instrument, before transfer to
the compute cluster where SMRT Link is installed.
– In SMRT Link: HiFi reads are automatically generated after transfer
to the compute cluster where SMRT Link is installed.
– Do Not Generate: HiFi reads are not generated for this run. Only
subread data are transferred to the local compute cluster where
SMRT Link is installed.
7. Click Save.
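You can check the Bio Sample Name rules above before uploading an edited Barcoded Sample file. The following Python sketch applies them to (barcode, Bio Sample Name) rows. It drops characters outside the allowed set, optionally replaces spaces with underscores (a choice made here because the guide recommends avoiding spaces), and cuts names to 40 characters. It assumes that only ASCII letters and digits count as alphanumeric and that truncation is acceptable. SMRT Link's own handling of over-long names may differ, and the function names are hypothetical.

```python
import re

# Allowed per the guide: letters, digits, spaces, hyphens, underscores,
# colons, periods. Treating only ASCII as "alphanumeric" is an assumption.
_DISALLOWED = re.compile(r"[^A-Za-z0-9 _:.\-]")
MAX_NAME_LENGTH = 40


def sanitize_bio_sample_name(name: str, replace_spaces: bool = True) -> str:
    """Return a Bio Sample Name that follows the guide's character rules."""
    cleaned = _DISALLOWED.sub("", name).strip()
    if replace_spaces:
        # Spaces are allowed but discouraged for third-party compatibility.
        cleaned = cleaned.replace(" ", "_")
    # Assumption: truncating is acceptable; the guide only states the maximum.
    return cleaned[:MAX_NAME_LENGTH]


def sanitize_rows(rows):
    """Apply sanitize_bio_sample_name to the second column of each row."""
    return [(barcode, sanitize_bio_sample_name(name)) for barcode, name in rows]


if __name__ == "__main__":
    rows = [("bc1001--bc1001", "Patient #7 (liver)"), ("bc1002--bc1002", "Pt-08:run.2")]
    for barcode, name in sanitize_rows(rows):
        print(f"{barcode},{name}")
```

Here the first name becomes Patient_7_liver, and the second name is already valid, so it is unchanged.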
Step 2: Perform the sequencing run
Load the samples and perform the sequencing run, using the Run Design
you created in Step 1. The demultiplexing analysis is performed
automatically on the SMRT Link server once the data is transferred from
the Sequel II systems. This creates an analysis of type Demultiplex
Barcodes (Auto) in the SMRT Analysis module. You can click to select
this analysis and review the reports and data created. If everything looks
fine, you can continue to Step 4 and use the demultiplexed Data Set(s)
created by the run as input to further analysis.
Step 3: (Optional) Run the Demultiplex Barcodes data utility
If instead you did not specify the barcode setup in the Run Design, or if
you need to change any of the parameters used in the Demultiplex
Barcodes analysis automatically launched from Run Design, run the
Demultiplex Barcodes data utility. This separates reads by barcode and
creates a new demultiplexed Data Set that you can then use as input to
other secondary analysis applications.
Interactively:
• Click Interactively, then drag barcodes from the Available Barcodes
column to the Included Barcodes column. (Use the check boxes to
select multiple barcodes.)
• (Optional) Click a Bio Sample field to edit the Bio Sample Name
associated with a barcode. Note: Avoid using spaces in Bio Sample
Names as they may lead to third-party compatibility issues.
• (Optional) Click Download as a file for later use.
• Click Submit to save the edited barcodes/Bio Sample Names. You see
Success on the line below, assuming the file is formatted correctly.
From a File:
• Click From a File, then click Download File. Edit the file and enter the
biological sample names associated with the barcodes in the second
column, then save the file. Use only alphanumeric characters, spaces
(allowed, but not recommended because they can cause compatibility
problems with third-party downstream software), hyphens, underscores,
colons, or periods; other characters are removed automatically. Names
can be at most 40 characters long. If you did not use all the barcodes
in the Autofilled Barcode Name file in the sequencing run, delete those rows.
• Note: Open the CSV file in a text editor and check that the columns are
separated by commas, not semicolons or tabs.
• Select the Barcoded Sample file you just edited. You see Success on
the line below, assuming the file is formatted correctly.
11. Specify the name for the new demultiplexed Data Set that will display
in SMRT Link. The application creates a copy of the input Data Set,
renames it to the name specified, and creates demultiplexed child
Data Sets linked to it. The input Data Set remains separate and
unmodified.
12. (Optional) Specify any advanced parameters.
13. Click Start. After the analysis is finished, a new demultiplexed Data
Set is available.
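A spreadsheet program may save the edited Barcoded Sample file with semicolons or tabs instead of commas. A quick check of the first line can catch this before you select the file. The following Python sketch counts the candidate separators in that line and assumes it is a header row. The function names and the example header text are hypothetical.

```python
CANDIDATE_DELIMITERS = (",", ";", "\t")


def detect_delimiter(line: str) -> str:
    """Return the candidate delimiter that occurs most often in a line."""
    counts = {d: line.count(d) for d in CANDIDATE_DELIMITERS}
    best = max(counts, key=counts.get)  # ties resolve to the first candidate (comma)
    if counts[best] == 0:
        raise ValueError("no comma, semicolon, or tab found in the first line")
    return best


def check_barcoded_sample_csv(text: str) -> None:
    """Raise ValueError unless the file's columns are comma-separated."""
    lines = text.splitlines()
    if not lines:
        raise ValueError("file is empty")
    delimiter = detect_delimiter(lines[0])
    if delimiter != ",":
        raise ValueError(f"columns are separated by {delimiter!r}, not commas")


if __name__ == "__main__":
    # Example content only; not necessarily SMRT Link's exact header.
    check_barcoded_sample_csv("Barcode,Bio Sample\nbc1001--bc1001,Sample_1\n")
    print("comma-separated: OK")
```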
Step 4: Run applications using the demultiplexed data as input
All secondary analysis applications except Demultiplex Barcodes can use
demultiplexed Data Sets as input.
Note: For Iso-Seq analysis with barcoded samples, use the Iso-Seq
application instead of the Demultiplex Barcodes data utility, as the Iso-Seq
application already includes the demultiplexing step as part of the
pipeline. When performing multiplexed Iso-Seq analysis, ensure that the
Run Design Sample Is Barcoded option is set to No (the default setting).
Then, in SMRT Analysis, go straight to the Iso-Seq application and, in the
parameters section, select a Primer Set containing multiple primers, such
as IsoSeq_Primers_12_Barcodes_v1.
– You can select the entire Data Set as input, or one or more specific
outputs from selected barcodes, up to a maximum of 16 sub-Data
Sets (12 for Iso-Seq).
– One Analysis for All Data Sets: Runs one job using all the selected
barcode Data Sets as input, for a maximum of 30 Data Sets.
– One Analysis per Data Set - Identical Parameters: Runs one
separate job for each of the selected barcode Data Sets, using the
same parameters, for a maximum of 10,000 Data Sets. Optionally
click Advanced Parameters and modify parameters.
– One Analysis per Data Set - Custom Parameters: Runs one
separate job for each of the selected barcode Data Sets, using
different parameters for each Data Set, for a maximum of 16 Data
Sets. Click Advanced Parameters and modify parameters. Then
click Start and Create Next. You can then specify parameters for
each of the included barcode Data Sets.
– Note: The number of Data Sets listed is based on testing using
PacBio's suggested compute configuration, listed in the SMRT Link
software installation guide (v11.0).
4. Click Start to submit the job.
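The tested per-mode maximums listed above can be summarized as a lookup table for planning. The following Python sketch encodes only those numbers and returns the submission modes that accept a given number of barcode Data Sets. Other compute configurations may support different limits, and the function name is hypothetical.

```python
# Maximum number of barcode Data Sets per submission mode, as listed in the
# guide (tested on PacBio's suggested compute configuration).
MODE_LIMITS = {
    "One Analysis for All Data Sets": 30,
    "One Analysis per Data Set - Identical Parameters": 10_000,
    "One Analysis per Data Set - Custom Parameters": 16,
}


def modes_within_limits(n_data_sets: int) -> list:
    """Return the submission modes whose tested maximum covers n_data_sets."""
    if n_data_sets < 1:
        raise ValueError("select at least one Data Set")
    return [mode for mode, limit in MODE_LIMITS.items() if n_data_sets <= limit]


if __name__ == "__main__":
    for n in (12, 24, 96):
        print(n, modes_within_limits(n))
```

For example, 24 Data Sets exceed the 16-Data-Set limit for custom parameters but fit the other two modes.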
Demultiplex Barcodes can demultiplex samples that have a unique per-
sample barcode pair and were pooled and sequenced on the same SMRT
Cell. There are four different methods for barcoding samples with PacBio
technology:
Symmetric mode
For symmetric and tailed library designs, the same barcode is attached to
both sides of the insert sequence of interest. The only difference is the
orientation of the trailing barcode. For barcode identification, one read
with a single barcode region is sufficient. Symmetric barcoding is used
for samples constructed using Barcoded overhang adapters, Barcoded
universal primers, and target enrichment (linear). This is also the default
scoring mode in SMRT Link v10.2 and later.
Asymmetric mode
Barcode sequences are different on the ends of the SMRTbell template.
Asymmetric mode is used with the M13 barcoding procedure. (See the
document Procedure & checklist - Preparing SMRTbell libraries using
PacBio barcoded M13 primers for multiplex SMRT sequencing for
details.) PacBio recommends using this mode only for small inserts (up
to 5 kb), where both ends of the insert are expected to be sequenced.
Both barcodes must be detected.
Note: For both Symmetric and Asymmetric modes, the limit for unique
individual barcode sequences is 768, and the limit for the number of
different barcode pairs is 10,000.
When running the Demultiplex Barcodes data utility in SMRT Link, set the
Same Barcodes on Both Ends of the Sequence option to Off.
Mixed mode
Libraries with combined symmetric and asymmetric barcoding are not
supported.
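The mode rules above can be expressed as a check over the barcode pairs in a sample sheet. The following Python sketch assumes pairs are written as `<first>--<second>`, which is the naming used in the output file names shown later in this guide (for example, `bc1001--bc1001`). It also applies the 768-barcode and 10,000-pair limits. The function names are hypothetical.

```python
MAX_UNIQUE_BARCODES = 768
MAX_BARCODE_PAIRS = 10_000


def parse_pair(pair_name: str) -> tuple:
    """Split a '<first>--<second>' barcode pair name into its two barcodes."""
    first, sep, second = pair_name.partition("--")
    if not sep or not first or not second:
        raise ValueError(f"expected '<barcode>--<barcode>', got {pair_name!r}")
    return first, second


def barcoding_mode(pair_names) -> str:
    """Return 'symmetric' or 'asymmetric'; reject mixed designs and oversize sets."""
    pairs = {parse_pair(name) for name in pair_names}
    if not pairs:
        raise ValueError("no barcode pairs given")
    if len(pairs) > MAX_BARCODE_PAIRS:
        raise ValueError(f"more than {MAX_BARCODE_PAIRS} barcode pairs")
    unique_barcodes = {bc for pair in pairs for bc in pair}
    if len(unique_barcodes) > MAX_UNIQUE_BARCODES:
        raise ValueError(f"more than {MAX_UNIQUE_BARCODES} unique barcodes")
    same_ends = [first == second for first, second in pairs]
    if all(same_ends):
        return "symmetric"
    if not any(same_ends):
        # Per the guide: set Same Barcodes on Both Ends of the Sequence to Off.
        return "asymmetric"
    raise ValueError("mixed symmetric/asymmetric barcoding is not supported")


if __name__ == "__main__":
    print(barcoding_mode(["bc1001--bc1001", "bc1002--bc1002"]))
```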
Automated analysis
Auto Analysis and Pre Analysis allow a specific analysis to be
automatically run after a sequencing run has finished and the data is
transferred to the SMRT Link server. The analysis can include
demultiplexed output.
• Auto Analysis can be set up in Run Design or SMRT Analysis after the
Run Design is saved and before the run is loaded on the instrument.
• Auto Analysis can be run on HiFi reads and supports all available
analysis applications.
• Auto Analysis works with all Sequel II systems.
– To see information about parameters for all secondary analysis
applications provided by PacBio, see “PacBio® secondary analysis
applications” on page 54.
11. Click Start to submit the Auto Analysis job.
From SMRT Analysis:
1. On the home page, select SMRT Analysis. You see a list of all jobs.
2. To filter the jobs, click the funnel in the State column header, then
click Created. This displays only jobs in the Created state.
3. Click the job of interest.
4. Click the From Multi-Job link.
5. Click Analysis Overview > Status of Individual Analyses. This
displays information about the analysis, including the application
used.
Visualizing data using IGV
Once an analysis has successfully completed, you can visualize the results
using the Integrative Genomics Viewer (IGV). IGV visualization is
supported for the following applications:
• Iso-Seq Analysis
• HiFi Mapping
• Microbial Genome Analysis
• Structural Variant Calling
• When creating an analysis, you can specify that SMRT Link combines
alignment BAM files for IGV visualization by setting the Consolidate
Mapped BAMs for IGV option to ON.
Note: This setting doubles the amount of storage used by the BAM
files, which can be considerable. Make sure to have enough disk
space available. This setting may also result in longer run times.
Note: If you are performing de novo assembly, you must use links to
the draft assembly BAM files, which are clearly labeled.
5. In IGV, choose File > Load from URL… and paste the link into the File
URL input field. Click OK.
6. Repeat for the remaining links.
If you ran an analysis and there are no Data > IGV Visualization Files links,
the analysis generated multiple alignment BAM files over 10 GB, but did
not consolidate the files. Click the Launch BAM Consolidation button to
consolidate them.
Using the PacBio® self-signed SSL certificate
SMRT Link v11.0 ships with a PacBio self-signed SSL certificate. If this
certificate is used at your site, security messages display when you first
try to log in to SMRT Link using the Chrome browser. These messages
may also display at other times when you access SMRT Link.
1. The first time you start SMRT Link after installation, you see the
following. Click the Advanced link.
The Login dialog displays, where you enter the User Name and Password.
The next time you access SMRT Link, the Login dialog displays directly.
Sequel® II system and Sequel IIe system output files
This section describes the data generated by the Sequel IIe system and
Sequel II system for each SMRT Cell transferred to network storage.
<your_specified_output_directory>/r64012_211206ee_183753/1_A01/
|--m64012ee_211206_183753.baz2bam_1.log
|--m64012ee_211206_183753.ccs.log
|--m64012ee_211206_183753.ccs_reports.json
|--m64012ee_211206_183753.ccs_reports.txt
|--m64012ee_211206_183753.consensusreadset.xml
|--m64012ee_211206_183753.hifi.reads.bam
|--m64012ee_211206_183753.hifi.reads.bam.pbi
|--m64012ee_211206_183753.sts.xml
|--m64012ee_211206_183753.zmw_metrics.json.gz
|--m64012ee_211206_183753.transferdone
|-- m64012ee_211206_183753.5mc_report.json
|-- m64012ee_211206_183753.primrose.log
|-- bc1001--bc1001/m64012e_211206_183753.bc1001--bc1001.consensusreadset.xml
|-- bc1001--bc1001/m64012e_211206_183753.hifi_reads.bc1001--bc1001.bam
|-- bc1001--bc1001/m64012e_211206_183753.hifi_reads.bc1001--bc1001.bam.pbi
|-- m64012e_211206_183753.barcodes.fasta
|-- m64012e_211206_183753.lima.log
|-- m64012e_211206_183753.lima_counts.txt
|-- m64012e_211206_183753.lima_guess.json
|-- m64012e_211206_183753.lima_guess.txt
|-- m64012e_211206_183753.lima_reports.txt
|-- m64012e_211206_183753.lima_summary.txt
|-- m64012e_211206_183753.unbarcoded.consensusreadset.xml
|-- m64012e_211206_183753.unbarcoded.hifi_reads.bam
|-- m64012e_211206_183753.unbarcoded.hifi_reads.bam.pbi
In these examples:
The run directory includes a subdirectory for each collection/cell
associated with a sample well (in this case, 1_A01). The collection/cell
subdirectory can include the following output files:
• ccs.log: Log file from the CCS analysis. Informative for debugging
and performance tracking by PacBio.
• ccs_reports.json, ccs_reports.txt: Contains processing metrics
summarizing how many ZMWs generated HiFi reads, and how many
ZMWs failed to generate CCS reads. These files contain the same
information, and are used internally by PacBio Technical Support.
• hifi.reads.bam: Contains the HiFi reads in BAM format.
Note: If low-quality reads are included in the Run Design, the Sequel
IIe system will output a reads.bam file, which contains HiFi reads and
non-HiFi reads:
– HiFi reads (QV 20 or higher)
– Lower-quality but still polished consensus reads (QV 1 to QV 20)
– Unpolished consensus reads (RQ=-1)
– Unaltered 0- or 1-pass subreads (RQ=-1)
The reads.bam file should not be used by itself as input for non-SMRT
Link tools that expect ≥QV 20. The BAM format is a binary,
compressed, record-oriented container format for raw or aligned
sequence reads. The associated SAM format is a text representation
of the same data. The BAM specifications are maintained by the
SAM/BAM Format Specification Working Group. BAM files produced
by all Sequel II systems are fully compatible with the BAM
specification. For more information, see the SAM/BAM format
specifications.
• hifi.reads.bam.pbi: Index file that allows for random access of
HiFi reads in the BAM file.
• sts.xml: Contains summary statistics about the collection/cell and
its post-processing.
• zmw_metrics.json.gz: Contains processing information used to
generate RunQC plots.
• 5mc_report.json, primrose.log: Contains information about 5mC
CpG Detection analysis (using the primrose tool), if performed.
• <Barcode Name>.consensusreadset.xml: Contains reads
associated with a specific barcode.
• <Barcode Name>.bam: Contains HiFi reads associated with a specific
barcode, in .bam format.
• <Barcode Name>.bam.pbi: Index file that allows for random access
of HiFi reads in the BAM file.
• <Barcode Name>.fasta: Contains reads associated with the specific
barcode, in FASTA format.
• lima.log: Log file from the demultiplexing analysis, if performed.
Informative for debugging and performance tracking by PacBio.
• lima_counts.txt: Contains the counts of each observed barcode
pair. Only passing ZMWs are counted.
• lima_guess.json, lima_guess.txt: Describe the barcode
subsetting process activated using the --peek and --guess options.
These files contain the same information, and are used internally by
PacBio Technical Support.
• lima_reports.txt: A tab-separated file describing each ZMW,
unfiltered. This file is useful for investigating the demultiplexing
process and the underlying data. Each row contains all reads from a
single ZMW.
• lima_summary.txt: Lists how many ZMWs were filtered, how many
ZMWs have the same or different barcodes on both ends, and how many
reads were filtered.
• unbarcoded.consensusreadset.xml, unbarcoded.hifi_reads.bam,
unbarcoded.hifi_reads.bam.pbi: Contain information on HiFi reads
that are not associated with any barcode.
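The read categories that can appear in reads.bam differ by read quality. PacBio BAM files carry this in the read-level `rq` tag as a predicted accuracy between 0 and 1, with -1 for unpolished reads. Assuming the usual Phred relation QV = -10·log10(1 - rq), QV 20 corresponds to rq ≥ 0.99. The following Python sketch classifies rq values on that basis. It does not read BAM files itself, so extracting the tag (for example, with a BAM library such as pysam) is left out. The function names and the QV cap of 60 are illustrative choices.

```python
import math

# QV 20 on the Phred scale corresponds to a predicted accuracy of
# 1 - 10**(-20 / 10) = 0.99. Comparing rq directly avoids floating-point
# rounding at the boundary.
HIFI_MIN_RQ = 0.99


def rq_to_qv(rq: float, cap: float = 60.0) -> float:
    """Convert a predicted accuracy (rq) to a Phred-scaled QV, capped at `cap`."""
    if rq < 0:
        raise ValueError("rq = -1 marks an unpolished read; no QV is defined")
    if rq >= 1.0:
        return cap
    return min(cap, -10.0 * math.log10(1.0 - rq))


def classify_read(rq: float) -> str:
    """Assign a read to one of the reads.bam categories described in the guide."""
    if rq < 0:
        # Unpolished consensus reads and 0- or 1-pass subreads both use rq = -1,
        # so rq alone cannot tell them apart.
        return "unpolished"
    if rq >= HIFI_MIN_RQ:
        return "hifi"
    return "polished, below QV 20"


if __name__ == "__main__":
    for rq in (0.999, 0.99, 0.95, -1.0):
        print(rq, classify_read(rq))
```

Only reads classified as hifi meet the QV 20 threshold that non-SMRT Link tools expect.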
Note: The Sequel IIe system runs CCS on-instrument by default, so the
subreads.bam, subreads.bam.pbi, scraps.bam, and scraps.bam.pbi
files are not generated. Output of the subreads.bam and
subreads.bam.pbi files can be enabled, however. For detailed
instructions on how to enable the output of these files, contact your
Field Applications Support team members.
If not using SMRT Link for subsequent analysis, use these three files as
input with any third-party analysis tools.
1. In Run QC, click the desired run, then click the sample name to view
the CCS Data Set.
2. Click Analyses in the left-side panel.
3. Click the Export Reads analysis.
4. To locate the directory containing the three hifi_reads files, append
/outputs to the path shown.
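You can also gather the three files with a script before handing them to a third-party tool. The following Python sketch walks a directory, such as the outputs directory located above or a collection directory like 1_A01. It pairs each BAM file with its .pbi index, reports BAM files that lack an index, and lists any consensusreadset.xml Data Set files. It matches files only by extension, not by a particular run's naming scheme, and the function name is hypothetical.

```python
from pathlib import Path


def collect_read_files(directory):
    """Pair each BAM under `directory` with its .pbi index and list Data Set XMLs.

    Returns (pairs, missing_index, xml_files), where pairs is a list of
    (bam_path, pbi_path) tuples and missing_index lists BAMs without a .pbi.
    """
    root = Path(directory)
    pairs, missing_index = [], []
    for bam in sorted(root.rglob("*.bam")):
        pbi = bam.with_name(bam.name + ".pbi")  # e.g. reads.bam -> reads.bam.pbi
        if pbi.exists():
            pairs.append((bam, pbi))
        else:
            missing_index.append(bam)
    xml_files = sorted(root.rglob("*.consensusreadset.xml"))
    return pairs, missing_index, xml_files
```

An entry in the second returned list indicates an incomplete copy or export, because each BAM file should be accompanied by its .pbi index.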
<your_specified_output_directory>/r64008_20160116_003347/1_A01
|-- m64008_160116_003634.baz2bam_1.log
|-- m64008_160116_003634.scraps.bam
|-- m64008_160116_003634.scraps.bam.pbi
|-- m64008_160116_003634.subreads.bam
|-- m64008_160116_003634.subreads.bam.pbi
|-- m64008_160116_003634.subreadset.xml
|-- m64008_160116_003634.sts.xml
|-- m64008_160116_003634.transferdone
• subreads.bam.pbi: Provides backwards-compatibility with the APIs
enabled for accessing the cmp.h5 file.
• subreadset.xml: This file is needed to import data into SMRT Link.
• sts.xml: Contains summary statistics about the collection/cell and
its post-processing.
• transferdone: Contains a list of files successfully transferred.
• .bam file
• bam.pbi file
• subreadset.xml or consensusreadset.xml file
What is the average size of the file bundle for a 30-hour movie of HiFi reads?
Approximately 50 GB.
What is the difference between a regular .bam file and an aligned.bam file?
The subreads.bam file contains all the subread sequences, while the aligned.bam file additionally
contains the genomic coordinates of the reads mapped to a reference sequence.
The subreads.bam file is created by the Sequel II systems, while the aligned.bam file is created by
SMRT Link after running mapping analysis applications.
Secondary analysis output files
This section describes the data produced by secondary analysis, which is
performed on the primary analysis data generated by the instrument.
• All files for a specific job reside in one directory named according to
the job ID number.
• Every job result has the following file structure. Example:
$SMRT_ROOT/userdata/jobs_root/0000/0000000/0000000002/
├── cromwell-job -> $SMRT_ROOT/userdata/jobs-root/cromwell-executions/pb_demux_subreads_auto/24e691c8-8d0d-4670-9db3-c7cb1126e8f8
├── entry-points
│ └── ae6f1c2c-b4a2-41cc-8e44-98b494f12a57.subreadset.xml
├── logs
│ ├── pb_simple_mapping
│ │ └── 24e691c8-8d0d-4670-9db3-c7cb1126e8f8
│ │ ├── call-mapping
│ │ │ └── execution
│ │ │ ├── stderr
│ │ │ └── stdout
│ └── workflow.24e691c8-8d0d-4670-9db3-c7cb1126e8f8.log
├── outputs
│ ├── mapping.report.json -> $SMRT_ROOT/userdata/jobs-root/cromwell-executions/pb_simple_mapping/24e691c8-8d0d-4670-9db3-c7cb1126e8f8/call-mapping/execution/mapping.report.json
│ └── mapped.bam -> $SMRT_ROOT/userdata/jobs-root/cromwell-executions/pb_simple_mapping/24e691c8-8d0d-4670-9db3-c7cb1126e8f8/call-mapping/execution/mapped.bam
├── pbscala-job.stderr
├── pbscala-job.stdout
└── workflow
├── analysis-options.json
├── datastore.json
├── engine-options.json
├── inputs.json
├── metadata.json
├── metadata-summary.json
├── task-timings.metadata.json
└── timing-diagram.html
– <task_id>/script: The SMRT Tools command for the given
analysis task.
– <task_id>/script.submit: The JMS submission script wrapping
run.sh.
– <task_id>/stdout.submit: The stdout collection for the
script.submit script.
– <task_id>/stderr.submit: The stderr collection for the
script.submit script.
• workflow/: Contains JSON files for job settings and workflow
diagrams.
– datastore.json: JSON file representing all output files imported
by SMRT Link.
• outputs/: A directory containing symbolic links to all datastore files,
which reside in the Cromwell execution directory. This is provided as
a convenience and is not intended as a stable API. Note that external
resources referenced by Data Set XML and report JSON files are not
included here. Demultiplexing outputs are nested in additional subdirectories.
• pbscala-job.stderr: Log collection of stderr output from the
SMRT Link job manager.
• pbscala-job.stdout: Log collection of stdout output from the
SMRT Link job manager. (Note: This is the file displayed as Data >
SMRT Link Log on the analysis results page.)
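Because outputs/ holds only symbolic links, a script that copies or archives results may need the real file locations in the Cromwell execution directory. The following Python sketch maps each link under a job's outputs/ directory to its resolved target, using only the standard library. The function name is hypothetical. A link whose target has been removed is still reported, because os.path.realpath does not check that the target exists.

```python
import os


def resolve_output_links(job_dir: str) -> dict:
    """Map each symbolic link under <job_dir>/outputs to the file it points to."""
    outputs = os.path.join(job_dir, "outputs")
    resolved = {}
    # os.walk does not descend into symlinked directories by default; nested
    # demultiplexing subdirectories are walked when they are real directories.
    for dirpath, dirnames, filenames in os.walk(outputs):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                rel = os.path.relpath(path, outputs)
                resolved[rel] = os.path.realpath(path)
    return resolved
```

Pass the job directory path with $SMRT_ROOT already expanded, because Python does not expand shell variables in paths.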
A SMRT Link job generates several types of output files. You can use
these data files as input for further processing, pass on to collaborators,
or upload to public genome sites. Depending on the analysis application
being used, the output directory contains files in the following formats:
To download data files created by SMRT Link:
1. On the home page, select SMRT Analysis. You see a list of all jobs.
2. Click the job link of interest.
3. Click Data > File Downloads, then click the appropriate file. The file is
downloaded according to your browser settings.
• (Optional) Click the small icon to the right of the file name to copy the
file’s path to the Clipboard.
Configuration and user management
LDAP
SMRT Link supports the use of LDAP for user login and authentication.
Without LDAP integration with SMRT Link, only one user (with the login
admin/admin) is enabled. You can add new users after SMRT Link is
integrated and configured to work with LDAP; you can also add new users
using WSO2 API Manager or Keycloak without LDAP integration.
• For details on integrating LDAP and SMRT Link, see the document
SMRT Link software installation guide (v11.0).
SSL
SMRT Link requires the use of Secure Sockets Layer (SSL) to enable
access via HTTP over SSL (HTTPS), so that SMRT Link logins and data
are encrypted during transport to and from SMRT Link. SMRT Link
includes an Identity Server (WSO2 API Manager or Keycloak), which can
be configured to integrate with your LDAP/AD servers and enable user
authentication using your organization’s user names and passwords. To
ensure a secure connection between the SMRT Link server and your
browser, the SSL certificate can be installed after completing SMRT Link
installation.
SMRT Link ships with a PacBio self-signed SSL certificate. If used, each
user will need to accept the browser warnings related to access in an
insecure environment. Otherwise, your IT administrator can configure
desktops to always trust the provided self-signed certificate. Note that
SMRT Link is installed within your organization’s secure network, behind
your organization’s firewall.
• For details on updating SMRT Link to use an SSL certificate, see the
document SMRT Link software installation guide (v11.0).
The following procedures are available only for SMRT Link users whose
role is Admin.
• Note: There can be multiple users with the Admin role, but there must
always be at least one Admin user.
5. Click Save.
Hardware/software requirements
Note: SMRT Link server hardware and software requirements are listed in
the document SMRT Link software installation guide (v11.0).
Appendix A - PacBio terminology
General terminology
• SMRT® Cell: Consumable substrates comprising arrays of zero-mode
waveguide nanostructures. SMRT Cells are used in conjunction with
the DNA sequencing kit for on-instrument DNA sequencing.
• SMRTbell® template: A double-stranded DNA template capped by
hairpin adapters (i.e., SMRTbell adapters) at both ends. A SMRTbell
template is topologically circular and structurally linear, and is the
library format created by the DNA template prep kit.
• collection: The set of data collected during real-time observation of
the SMRT Cell, including spectral information and temporal
information used to determine a read.
• Zero-mode waveguide (ZMW): A nanophotonic device for confining
light to a small observation volume. This can be, for example, a small
hole in a conductive layer whose diameter is too small to permit the
propagation of light in the wavelength range used for detection.
Physically part of a SMRT Cell.
• Run Design: Specifies
– The samples, reagents, and SMRT Cells to include in the
sequencing run.
– The run parameters such as movie time and loading to use for the
sample.
• adaptive loading: Uses active monitoring of the ZMW loading process
to predict a favorable loading end point.
• unique molecular yield: The sum total length of unique single
molecules that were sequenced. It is calculated as the sum of per-
ZMW median subread lengths.
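To illustrate the unique molecular yield definition, the following Python sketch computes it from per-ZMW subread lengths. The input dictionary and ZMW names are hypothetical; in practice, the lengths would come from the subread records of a run. The sketch skips ZMWs with no subreads.

```python
from statistics import median


def unique_molecular_yield(subread_lengths_by_zmw: dict) -> float:
    """Sum of per-ZMW median subread lengths."""
    return sum(median(lengths) for lengths in subread_lengths_by_zmw.values() if lengths)


if __name__ == "__main__":
    # Hypothetical subread lengths (bases) for two ZMWs.
    example = {"zmw_1": [1000, 1200, 1100], "zmw_2": [5000, 4800]}
    print(unique_molecular_yield(example))  # 1100 + 4900.0 = 6000.0
```

Using the median rather than the sum of subread lengths counts each molecule roughly once, however many passes the polymerase made around it.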
Read terminology
• polymerase read: A sequence of nucleotides incorporated by the DNA
polymerase while reading a template, such as a circular SMRTbell
template. They can include sequences from adapters and from one or
multiple passes around a circular template, which includes the insert
of interest. Polymerase reads are most useful for quality control of
the instrument run. Polymerase read metrics primarily reflect movie
length and other run parameters rather than insert size distribution.
Polymerase reads are trimmed to include only the high-quality region.
Note: Sample quality is a major factor in polymerase read metrics.
• subreads: Each polymerase read is partitioned to form one or more
subreads, which contain sequence from a single pass of a
polymerase on a single strand of an insert within a SMRTbell template
and no adapter sequences. The subreads contain the full set of
quality values and kinetic measurements. Subreads are useful for
applications such as de novo assembly, base modification analysis,
and so on.
• longest subread length: The mean of the maximum subread length
per ZMW.
• insert length: The length of the double-stranded nucleic acid fragment
in a SMRTbell template, excluding the hairpin adapters.
• circular consensus (CCS) reads: The consensus sequence resulting
from alignment between subreads taken from a single ZMW.
Generating CCS reads does not include or require alignment against a
reference sequence but does require at least two full-pass subreads
from the insert. CCS reads are generated with CCS analysis. CCS
reads with quality value equal to or greater than 20 are called HiFi
reads.
• HiFi reads: Reads generated with CCS analysis whose quality value is
equal to or greater than 20.
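The guide does not state the quality-value formula. Assuming the usual Phred scale, where a is the predicted read accuracy, the HiFi threshold works out as follows:

```latex
\mathrm{QV} = -10\,\log_{10}(1 - a)
\qquad\Longleftrightarrow\qquad
a = 1 - 10^{-\mathrm{QV}/10}

\mathrm{QV} \ge 20 \iff a \ge 1 - 10^{-20/10} = 1 - 0.01 = 0.99
```

Under this assumption, a HiFi read has a predicted accuracy of at least 99%, or at most about one error per 100 bases. QV 30 would correspond to 99.9%.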
– Filtering/selection of data that meets desired criteria, such as
quality, read length, and so on.
– Comparison of reads to a reference or between each other for
mapping and variant calling, consensus sequence determination,
alignment and assembly (de novo or reference-based), variant
identification, and so on.
– Quality evaluations for a sequencing run, consensus sequence,
assembly, and so on.
– PacBio’s SMRT Analysis contains a variety of secondary analysis
applications including RNA and Epigenomics analysis tools.
• secondary analysis application: A secondary analysis workflow that
may include multiple analysis steps. Examples include de novo
assembly, RNA and epigenomics analysis.
• consensus: Generation of a consensus sequence from multiple-
sequence alignment.
• filtering: Removes reads that do not meet the Read Length criteria set
by the user.
• mapping: Local alignment of a read or subread to a reference
sequence.
• Auto Analysis: Allows a specific analysis to be automatically run after
a sequencing run has finished and the data is transferred to the SMRT
Link server. The analysis can include demultiplexed outputs.
– Auto Analysis works with all Sequel II systems.
• Pre Analysis: The process of CCS analysis and/or demultiplexing of
Sequel basecalled data. Pre Analysis occurs before Auto Analysis.
– Pre Analysis works with all Sequel II systems.
Accuracy terminology
• circular consensus accuracy: Accuracy based on consensus
sequence from multiple sequencing passes around a single circular
template molecule.
• consensus accuracy: Accuracy based on aligning multiple
sequencing reads or subreads together.
• polymerase read quality: A trained prediction of a read’s mapped
accuracy based on its pulse and base file characteristics (peak signal-
to-noise ratio, inter-pulse distance, and so on).
Appendix B - Data search
Use this function to search for jobs, Data Sets, barcode files, or reference
files.
• For the Analysis State column only, click one or more of the job states
of interest: Select All, Created, Running, Submitted, Terminated,
Successful, Failed, or Aborted.
• For Date fields only, click the small calendar and select a date.
– Less than, Less than or equals
– In range
Appendix C - BED file format for Target Regions report
With the HiFi Mapping application, an optional Target Regions report can
be generated that displays the number (and percentage) of reads and
subreads that hit specified target regions.
The BED file required to generate the Target Regions report includes the
following fields, with one entry per line:
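The following Python sketch shows how target hits can be counted. It assumes the standard UCSC BED convention: tab- or whitespace-separated lines whose first three fields are the reference name, a 0-based start, and an end position that is not included. It counts how many aligned reads overlap at least one target region and reports the percentage. The alignment tuples and function names are hypothetical, and SMRT Link's own report may count hits differently (for example, separately for subreads).

```python
def parse_bed(text: str) -> list:
    """Return (chrom, start, end) tuples from BED text; extra columns are ignored."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def overlaps(start_a: int, end_a: int, start_b: int, end_b: int) -> bool:
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return start_a < end_b and start_b < end_a


def target_hits(alignments, regions):
    """Count alignments overlapping any region; return (hits, total, percent)."""
    hits = 0
    for chrom, start, end in alignments:
        if any(chrom == r_chrom and overlaps(start, end, r_start, r_end)
               for r_chrom, r_start, r_end in regions):
            hits += 1
    total = len(alignments)
    percent = 100.0 * hits / total if total else 0.0
    return hits, total, percent


if __name__ == "__main__":
    bed = "chr1\t100\t200\ttarget_1\nchr2\t0\t50\ttarget_2\n"
    reads = [("chr1", 150, 300), ("chr1", 200, 250), ("chr2", 10, 20), ("chr3", 0, 10)]
    print(target_hits(reads, parse_bed(bed)))  # (2, 4, 50.0)
```

The second read starts exactly at the end coordinate of the first region, so it does not count as a hit under the half-open convention.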
Appendix D - Additional information included in the CCS Data Set Export
report
When you export a Data Set and select Export PDF Reports, a report is
produced which includes additional fields, listed below.
• Empty coverage windows: The number of ZMWs that did not generate
CCS reads because at least one window had no coverage.
• CCS did not converge: The number of ZMWs that did not generate
CCS reads because the draft sequence had too many errors that
could not be polished in time.
• CCS below minimum RQ: The number of ZMWs that did not generate
CCS reads because the predicted accuracy is below
--min-rq.
• Unknown error: The number of ZMWs that did not generate CCS
reads due to rare implementation errors.