March 23, 2023
Digital Biology: Implications of Genetic Sequencing
Deoxyribonucleic acid (DNA) is the molecule that carries the genetic information of an organism. This genetic code is composed of nucleotide bases (A, T, C, and G). The sequence of these bases encodes information that can, for example, make a protein. A genome is the complete set of DNA in an organism. A gene sequencer reads DNA. Gene synthesis technologies can write DNA. It is this ability to both read and write DNA that researchers in the field of engineering biology use to reprogram cellular systems at the genetic level for a specific functional output. To do so, researchers require data about what gene sequences to code for, what functions those genes impact, and how those genes are expressed in living organisms.

Many emergent technologies, such as artificial intelligence (AI), require large datasets, often referred to as “big data.” Theoretically, as more data becomes available, the capabilities of those technologies increase. This includes applications that require the use of genetic sequence data.

Sequencing technologies have evolved rapidly, making it possible to sequence entire genomes more efficiently and at lower cost. Sequences are collected and stored in databases, many of which are publicly funded and freely accessible, while others are privately held. The volume of genetic sequence information in databases has grown as sequencing technology has evolved. See Figure 1 and Figure 2.

Figure 1. Cost of DNA Sequencing over Time
Source: CRS analysis of data from Kris Wetterstrand, “DNA Sequencing Costs: Data from the National Human Genome Research Institute Genome Sequencing Program,” National Institutes of Health, at http://www.genome.gov/sequencingcostsdata.
Notes: A megabase (Mb) is a unit of measurement for DNA. One megabase = 1 million bases. For generating the “Cost per Genome,” the assumed genome size was 3,000 Mb.

Figure 2. Growth of Sequences in the International Nucleotide Sequence Database Collaboration
Source: CRS analysis of data from the International Nucleotide Sequence Database Collaboration (INSDC).
Notes: INSDC includes sequence data from the DNA Data Bank of Japan; the European Nucleotide Archive; and GenBank, the National Institutes of Health genetic sequence database.

Sequencing Life on Earth
Private companies and public research groups produce large amounts of genetic sequence data. For example, the Broad Institute of MIT and Harvard claims to produce roughly 500 terabases (500 trillion bases) of genomic data per month. There is great potential value in the aggregate volume of genetic datasets that can be collectively mined to discover and characterize relationships among genes.

In 2018, the National Institutes of Health launched the All of Us precision medicine research program, which aims to collect clinical, lifestyle, electronic health record, and genomic data from at least 1 million people to advance the development of precision medicine. Since its launch, the program has made available about 100,000 whole genome sequences. Genomic data, along with other information, including data about the communities where participants live, is available via a cloud-based platform. All direct identifiers are removed from the data, and other privacy requirements have been put in place for researchers seeking access, in order to protect participants’ privacy. This combination of data may help researchers better understand how genes can cause or influence diseases in the context of other health determinants.

The Earth Microbiome Project (EMP) is a global research project to sequence global microbial life funded by public and private entities. Its goal was to sequence 200,000 samples from different biomes to produce a global Gene
Atlas. The project is currently at capacity and not accepting new samples until additional funding is identified.

Estimates suggest less than 0.1% of the identified plant and animal species have been sequenced. The Earth Biogenome Project, an international network of public, private, and nonprofit institutions, is attempting to sequence, catalogue, and characterize all known animal, plant, and fungal species within 10 years. Launched in 2020, the project suggests that accomplishing this goal could have numerous scientific and societal impacts. These include better understanding of evolutionary relationships among organisms; better understanding of ecosystem composition and functions; the discovery of new species; the study of the effects of climate change on biodiversity; better understanding and management of future pandemics; and identification of genetic variations for improving agriculture and developing new biomaterials.

White House Initiative
In March 2023, the White House Office of Science and Technology Policy released Bold Goals for U.S. Biotechnology and Biomanufacturing. Among other issues associated with genetic sequencing, it announced a goal of sequencing 1 million microbial species’ genomes within five years and stated, “Storing and analyzing huge amounts of genome and phenotype data will require innovations in computing, including artificial intelligence.”

Societal Concerns
The declining cost of sequencing has expanded the collection of genetic data, including by testing companies that give consumers access to their own genetic information. Sequencing and related capabilities have raised concerns over who is collecting the data, where it is being stored, what it can be used for (e.g., forensics), and who “owns” the data. For example, when an individual submits a sample to a genetic testing company, depending on the user agreement, the genetic data can be accessed by or sold to other users. The publication of and access to other types of genetic sequences (e.g., viruses) have raised additional biosafety and biosecurity concerns.

National Security Concerns
The Intelligence Community’s 2023 Annual Threat Assessment stated that the fields of AI and biotechnology are “being developed and are proliferating faster than companies and governments can shape norms, protect privacy, and prevent dangerous outcomes.” The report identified genomic sequence data as a particular area of interest, pointing toward efforts by countries, universities, and private companies that have created, or are creating, centralized databases to collect, store, process, and analyze genetic data. The report further identified China’s efforts to collect U.S. health and genomic data through its acquisitions of and investments in U.S. companies, as well as through cyberattacks. This analysis followed a 2021 assessment by the National Counterintelligence and Security Center suggesting China understands that the collection and analysis of large genomic data sets from diverse populations can help foster new medical discoveries and cures with substantial commercial value and can advance its AI and precision medicine industries.

On March 2, 2023, the Bureau of Industry and Security (BIS) in the Department of Commerce amended the Export Administration Regulations (EAR) by adding BGI Research, BGI Tech Solutions Co., Ltd., and Forensic Genomics International to the Entity List pursuant to § 744.11 of the EAR. BIS states that the addition of these entities is based on information indicating their collection and analysis of genetic data poses a significant risk of contributing to monitoring and surveillance by the government of China, which has been utilized in the repression of ethnic minorities in China. BIS also indicates that the actions of these entities concerning the collection and analysis of genetic data present a significant risk of diversion to China’s military programs.

International Governance
Digital sequence information (DSI) has been the focus of recent debate in multiple international forums. Debate has focused on how publication of and access to genetic sequences may affect international access and benefit-sharing (ABS) agreements around genetic material. While the United States is not a party to all of the agreements discussed below, outcomes of these negotiations may affect the strategic competitiveness of U.S. researchers and companies.

DSI issues have been raised in the context of the Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization. Parties to the protocol have been negotiating whether DSI falls within its scope, and, if so, whether current ABS mechanisms are sufficient or a new mechanism is needed. While the United States has not ratified this protocol, it participates in the discussions. A decision adopted in December 2022 establishes a process to develop and operationalize a multilateral mechanism for ABS associated with DSI.

A March 2023 draft agreement under the United Nations Convention on the Law of the Sea protocol would establish a series of ABS requirements for DSI related to marine genetic resources collected in international waters. Although the United States has not ratified this protocol, it participates in the discussions. One draft requirement stipulates that covered DSI be entered into publicly accessible repositories and databases that are maintained nationally or internationally. The agreement, if adopted, also would establish a multilateral benefit-sharing mechanism, including monetary payments derived from any utilization of covered DSI.

The Plant Treaty, ratified by the United States in 2016, aims to guarantee food security through the conservation, exchange, and sustainable use of plant genetic resources. Parties are discussing how DSI may affect its ABS mechanisms, but no formal decisions have been made.

Considerations for Congress
Policymakers may consider how current federal efforts related to research, collection, use, and retention of genomic sequence data impact U.S. competitiveness and national security concerns. Another issue for Congress may be whether the federal government should facilitate or regulate access to certain genomic data or regulate certain uses.
IF12356
Todd Kuiken, Analyst in Science and Technology Policy
Disclaimer
This document was prepared by the Congressional Research Service (CRS). CRS serves as nonpartisan shared staff to
congressional committees and Members of Congress. It operates solely at the behest of and under the direction of Congress.
Information in a CRS Report should not be relied upon for purposes other than public understanding of information that has
been provided by CRS to Members of Congress in connection with CRS’s institutional role. CRS Reports, as a work of the
United States Government, are not subject to copyright protection in the United States. Any CRS Report may be
reproduced and distributed in its entirety without permission from CRS. However, as a CRS Report may include
copyrighted images or material from a third party, you may need to obtain the permission of the copyright holder if you
wish to copy or otherwise use copyrighted material.
https://crsreports.congress.gov | IF12356 · VERSION 1 · NEW