A Strategy for Sequence-Variant Control: Leveraging Next-Generation Sequencing of Plasmid DNA Used for Stable Cell-Line Transfection


Sequence variants are amino-acid substitutions that can influence biopharmaceutical efficacy and immunogenicity (1). Sequence variants could be considered as product-related impurities — molecular variants arising during manufacture and/or storage that have properties incompatible with those of a desired product with respect to activity, efficacy, and safety — as outlined in ICH Q6B and related regulatory guidances. Genetic mutation and amino-acid misincorporation are two major sources of sequence variation during protein production (2). In a survey of experts from the biopharmaceutical and biotechnological industries coordinated by the International Consortium for Innovation & Quality in Pharmaceutical Development (IQ), 67% of respondents considered genetic mutation to be the larger of those two concerns. It cannot be mitigated by culture process optimization, and mutation levels are likely to change over cell passaging because of Chinese hamster ovary (CHO) cells’ genomic plasticity (3).

Per regulatory requirements for biologics development, sequence variants should be monitored closely and characterized well during chemistry, manufacturing, and controls (CMC), especially for biomanufacturing cell lines and drug substances (3, 4). Broadly applied methods for sequence-variant detection include peptide mapping based on mass spectrometry (MS) of drug-substance samples and next-generation sequencing (NGS) of genomic DNA and RNA from cell banks (3). Another useful approach is to leverage NGS-aided RNA sequencing during screening of stable cell lines. That strategy enables rejection of clones that exhibit even low levels of genetic mutation (5). Using similar approaches, our company has accelerated CMC timelines while ensuring sequence integrity for manufacturing cell lines (6, 7).

Specifically, we have applied NGS to screen complementary DNA (cDNA) from samples of clonal cells during cell-line development (CLD), rejecting those with noticeable genetic mutations above a detection limit of 0.5%. Using that approach, we can eliminate clones that carry genetic-sequence variants at the CLD stage. Often, such clones account for <5% of the entire population (data not shown), but our approach generally ensures that problematic clones are not selected for further evaluation and process development.

However, in an atypical case, we observed that 43% of clones from one CLD program carried the same genetic point mutation at different percentages. After investigation, we concluded that those cells inherited their variations from the plasmid DNA (pDNA) used for transfection — despite the fact that our scientists had performed two rounds of single-colony picking and Sanger-sequencing confirmation during preparation. Herein, we describe how introducing NGS into the quality-control (QC) workflow for pDNA can serve as an effective remedy for such problems, helping prevent the rare but possible occurrence of introducing sequence variants at the beginning of CLD and CMC.

Initial Observations

A High Percentage of Stable Clones with a Single-Nucleotide Mutation at the mRNA and Genomic-DNA Levels: The atypical program involved development of a stable CHO-K1 manufacturing cell line for a therapeutic fusion protein. After running fed-batch cultures in spin tubes, our team selected 14 single clones with high expression levels (3.4–7.8 g/L) derived from two distinct transfection pools. Harvested material underwent clarification and purification by protein A affinity chromatography. Then, we analyzed purified samples for purity, charge variants, glycan profiles, and cDNA sequences, using NGS to select clones that exhibited the requisite product quality and that were free from sequence variants at the mRNA level.

NGS results suggested that six (~43%) out of the 14 clones carried a cytosine (C) to guanine (G) mutation at the same nucleotide position (nucleotide 1090) of an exogenous gene, resulting in an amino-acid change from histidine (His) to aspartic acid (Asp) (Table 1). The mutation level ranged from 0.9% to 30.6%. We selected four clones with titers of >5 g/L for NGS cDNA sequencing confirmation. The cDNA mutation level was consistent in the confirmation run. The percentage of mutated clones (~43%) was significantly higher than the natural mutation rate of <5% that we often observe during CLD (based on NGS cDNA analysis with a detection limit of 0.5% mutation).

Sequence variance at the genetic level can originate from a number of sources during CLD, including heterogeneity in the exogenous gene used for cell transfection, genome plasticity, and natural and stress-induced mutation during clone selection and cell passage. On the other hand, sequence variance at the protein level can be generated during transcription and translation. Because the C-to-G mutation in question was detected at the transcriptional level by cDNA sequencing, we tried to dissect the variant’s root cause by extracting genomic DNA from four of the mutated clones with titers of >5 g/L, then amplifying that material using polymerase chain reaction (PCR) and performing NGS for analysis. The same mutation was detected in all four clones.

It is worth noting that results from cDNA and DNA sequencing showed different mutation levels, indicating that inserted copies at different sites in the genome could have been transcribed at different levels (Table 1). Thus, cDNA rather than genomic-DNA sequencing is recommended for mutation detection to evaluate risks at the transcription level. Our observation of high mutation frequency at an identical site in all four clones suggested that variance could have come from the DNA of an exogenous gene before or during its integration and amplification in the host-cell genome.


Detection of an Identical Mutation in pDNA Used for Transfection: Because the same point mutation was detected among all mutated clones, and because it occurred at an unusually high rate, we determined that it was unlikely to result from natural, random mutation. We examined Sanger-sequencing data from pretransfection samples of the pWX039-Pr-Z plasmid and confirmed that mutations were undetected at that stage (Table 2). To control the risk of DNA contamination and externally introduced heterogeneity, plasmids were derived by maxipreparation from an Escherichia coli colony isolated by two rounds of plate streaking and single-colony picking. Nevertheless, we could not rule out the possibility that low levels of DNA mutation slipped through the sequencing control because of the Sanger method’s limited detection resolution.

Thus, we subjected the same batch of plasmids used for CLD transfection to NGS with and without PCR amplification of the exogenous gene so that we could account for potential PCR errors and sensitivity limits. NGS analysis with PCR amplification showed that the C-to-G mutation was present in the pDNA preparation at a level of 2.1%, at a position identical to that detected in the CLD clones (Table 2, Figure 1). Direct NGS without PCR confirmed those results, showing a 2.2% level of the mutation at the expected position in the pDNA (Table 2). Our findings indicated that the plasmids, despite being prepared under well-accepted molecular-biology practices and strict laboratory protocols, failed to maintain their genetic integrity. Results from our study also suggested that Sanger sequencing would be an ineffective method for scrutinizing DNA identity of plasmids with similarly low levels of mutation and/or heterogeneity, leaving loopholes through which low-level gene-sequence variants could pass during CLD.

Gene synthesis is an error-prone process (8). The common approach of using controlled-pore glass (CPG) media for oligonucleotide synthesis results in an error rate of about 1 in every 100–1,000 bases (9). Mutations also can occur spontaneously in wild-type E. coli strains (10, 11). For CLD, DNA often is synthesized, cloned into expression-vector backbones, and then transformed into competent cells to isolate a single colony with the correct plasmid sequence. The sequence usually is verified by Sanger sequencing, which has a detection limit of around 15–20% (12–14). The atypical case of pDNA-inherited mutation in stable CHO-K1 clones suggests that a method with greater sensitivity than Sanger sequencing, such as NGS, might be necessary for plasmid QC during cell-line engineering.


Meeting Increasing Expectations for Sequence-Variant Analysis

NGS of cDNA for CLD Clone Screening: Regulatory agencies have yet to give clear guidance on controlling sequence variants during biologics production. With the development of more — and more complex — products and with advances in analytical technologies, the biopharmaceutical industry is paying increasing attention to acceptance criteria for sequence variants. A limit of 0.1% variance at the protein level was suggested in 2020 (4). A recent report presented a case in which the level of sequence variant was kept below 0.05% in final drug products (15). Such criteria are set for fully optimized manufacturing processes in late development stages, and they depend greatly on quantification limits and variability of the analytical methods applied. However, regulators are increasing expectations for control of bioproduction processes.

Identification and mitigation strategies must differ across categories of sequence variance. For instance, because process optimization cannot mitigate gene-level mutations, the level of mutation in stable clones should be monitored carefully during clone screening. Sanger sequencing of cDNA in banked cells has been the traditional method for confirming genetic sequences. In recent years, a growing number of pharmaceutical companies have adopted NGS during CLD to identify low-level sequence variants expressed by stable lines. That method provides high sensitivity while enabling mutation detection in early stages of clone screening. Leveraging the improved sensitivity of such methods for sequence-variant analysis can help to minimize both safety concerns for final products and problems during regulatory filing.

Scientists from AbbVie proposed using NGS for clone-screening applications in 2015, emphasizing the importance of RNA sequencing (5). Biogen scientists later developed amplicon sequencing by NGS to improve assay throughput and reduce costs (16). Other groups including teams from Merck (17) and Pfizer (2) established clone-screening processes with NGS of cDNA incorporated to measure gene of interest (GoI) integrity. At WuXi Biologics, we have applied NGS cDNA sequencing with a detection limit of 0.5% during top-clone screening to exclude cell lines with mutations at the genetic level.


Sources of Mutation in Stable Clones: In our case, we found an atypically high mutation rate (six out of 14 clones) during CLD. Sequence variants were introduced by mutations in the pDNA used for cell transfection. Even when the DNA was prepared from a single colony after two rounds of plate streaking and colony picking, followed by Sanger-sequencing confirmation, the material produced still contained ~2% of the mutant form.

DNA and cDNA variants in stable clones can come from multiple sources, including mutated pDNA, errors in DNA replication after plasmid transfection and integration, responses to stress (e.g., antibiotic selection and nutrient depletion during cell passaging), and changes in DNA replication during cell doubling (4, 18). In a 1984 report, Lebkowski et al. demonstrated that, when transfected DNA enters a cell’s nucleus, mutation occurs early during replication (18). The mutation frequency of a helper-dependent vector can be ~2.5% in nucleus-replicating DNA, as shown with transformed African green monkey kidney fibroblast (COS-7) cells. However, pDNA preparation represents the first and one of the most critical steps in controlling mutation levels during CLD.

Applying NGS During Plasmid Preparation: QC of pDNA is critical during development of a stable production cell line. Because peptide mapping is a time-consuming and low-throughput analytical method, cDNA NGS is being adopted broadly for sequence-variant analysis. Clones with genetic mutations generally are rejected because regulatory agencies require extensive studies for risk assessment and process control if a sequence variant can be detected at the protein level.

The detection limit of traditional Sanger sequencing is 15–20% (12–14). With rising demand for “mutation-free” plasmids and clones, incorporation of methods with increased sensitivity is inevitable. In addition to proposing application of cDNA NGS during clone screening, the AbbVie group discussed potential assessment of pretransfection plasmids using DNA NGS with a detection limit of about 0.2–0.5%. The mutation described in that article did not come from pDNA, as it did in our case (5). The Biogen group optimized its own NGS method and applied amplicon sequencing for clone screening (16). The authors mention that they had been investigating use of NGS for screening of expression plasmids before transfection (16).

Recognizing the importance of having a high-sensitivity analytical method for plasmid sequencing, we incorporated NGS for pDNA QC during CLD. Using that approach, we recently identified a pWX040-PR-B pDNA sample with a G-to-adenine (A) point mutation at a level of 8.4%, which is below the detection limit of Sanger sequencing (Figure 2). NGS also revealed fragment deletions in samples for another plasmid (pWX051-HC-N) (Figure 3, sequence depth). The ratio of DNA deletions was around 40–50%, as estimated by agarose gel electrophoresis. Sanger sequencing confirmed the deletion positions to be 1–1708 bp, 3671–5727 bp, and 8269–9060 bp (data not shown).


Setting a Reasonable Acceptance Criterion for NGS Plasmid Sequencing

Herein, we have demonstrated that a mutation level of 2% in pDNA can lead to genetic variations in 43% of the resulting stable clonal cell lines. Clearly, that level is unacceptable for plasmid preparations used during CLD. To set a reasonable acceptance criterion for mutation detection and to develop guidelines for NGS-data interpretation, we performed statistical calculations based on transgene copy number and pDNA mutation level. For a clone with up to 200 copies of transgene inserted into its genome, assuming that all insertion sites occur in open-chromatin regions that are transcribed at the same level, cDNA NGS with a detection limit of 0.5% can identify even one copy of a mutated GoI. If a plasmid carries a mutation rate of M%, then the probability of generating one copy without mutation equals 1 – M%. Successful clone screening with no mutation from plasmids would mean that all transgene copies were inserted into a host genome as wild-type genes. Based on statistical calculation, the probability of having all copies of wild-type gene is S = (1 – M%)^CN, in which S represents the success rate and CN is the copy number. Table 3 shows calculated success rates for clone screening based on copy number and pDNA mutation rate.
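The calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the function names are hypothetical, the mutation rate is passed as a fraction (0.005 for 0.5%), and the model keeps the same assumption that every integrated copy is drawn independently from the plasmid pool and transcribed at the same level.

```python
def success_rate(mutation_rate: float, copy_number: int) -> float:
    """Probability that every integrated copy is wild type: S = (1 - M)^CN.

    mutation_rate is a fraction (e.g., 0.02 for a 2% pDNA mutation level).
    Assumes each copy is an independent draw from the plasmid preparation.
    """
    if not 0.0 <= mutation_rate <= 1.0:
        raise ValueError("mutation_rate must be a fraction between 0 and 1")
    if copy_number < 0:
        raise ValueError("copy_number must be non-negative")
    return (1.0 - mutation_rate) ** copy_number


def single_copy_fraction(copy_number: int) -> float:
    """Fraction of transcripts from one mutated copy among copy_number
    equally transcribed copies (hypothetical helper for illustration)."""
    if copy_number <= 0:
        raise ValueError("copy_number must be positive")
    return 1.0 / copy_number


# Success rates at the pDNA mutation levels and copy numbers discussed in the text.
for m in (0.005, 0.01, 0.02):
    cells = ", ".join(f"CN={cn}: {success_rate(m, cn):.1%}" for cn in (10, 20))
    print(f"M={m:.1%} -> {cells}")

# One mutated copy among 200 equally transcribed copies sits at the 0.5% limit.
print(f"1 of 200 copies = {single_copy_fraction(200):.1%}")
```

Under this model, one mutated copy in a 200-copy clone contributes 0.5% of transcripts, which is why a cDNA NGS detection limit of 0.5% is sufficient to flag it.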

During CLD, copy numbers generally are <100. We have analyzed (inserted) copy numbers for 363 stable clones expressing 31 different types of recombinant proteins. Figure 4 plots the distribution of copy numbers. The accompanying table summarizes the frequency (FR) of copy numbers lying in the ranges of 1–5, 5–10, 10–20, 20–30, 30–40, 40–50, and 50–100. The success rate within each copy-number range (SR) is estimated using the lower and upper copy numbers (Table 4). For example, in the copy-number range of 10–20, SR values are estimated as 95.1–90.5% (0.5% DNA mutation), 90.4–81.8% (1% DNA mutation), and 81.7–66.8% (2% DNA mutation) using copy numbers of 10 and 20. We also calculated a total success rate (TS) based on the frequency of copy numbers considered. The TS value from a plasmid preparation with mutation is calculated as shown in Table 4. We estimate the TS of clone screening to be 95.1–91.3%, 90.6–83.4%, and 82.2–69.8% when the rate of DNA mutation is 0.5%, 1%, and 2%, respectively. To expect a success rate (from the source of pDNA) >90%, the DNA mutation level should be kept below 0.5%.
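One plausible way to combine the per-range success rates into a total success rate is a frequency-weighted sum, sketched below in Python. The weighting formula is our reading of the description above, not a reproduction of Table 4. The bin frequencies in `PLACEHOLDER_BINS` are invented placeholders that should be replaced with the measured copy-number distribution of the platform in question, so the printed values will not match the TS ranges reported in the text.

```python
def success_rate(mutation_rate: float, copy_number: int) -> float:
    """S = (1 - M)^CN, with mutation_rate given as a fraction."""
    return (1.0 - mutation_rate) ** copy_number


def total_success_range(bins, mutation_rate):
    """Frequency-weighted total success rate over copy-number bins.

    bins: iterable of (low_cn, high_cn, frequency), frequencies summing to 1.
    Returns (best_case, worst_case), evaluated at the lower and upper copy
    number of each bin, respectively. The weighting scheme is an assumption.
    """
    bins = list(bins)
    if abs(sum(freq for _, _, freq in bins) - 1.0) > 1e-9:
        raise ValueError("bin frequencies must sum to 1")
    best = sum(freq * success_rate(mutation_rate, low) for low, _, freq in bins)
    worst = sum(freq * success_rate(mutation_rate, high) for _, high, freq in bins)
    return best, worst


# HYPOTHETICAL frequencies for illustration only; not the data behind Table 4.
PLACEHOLDER_BINS = [
    (1, 5, 0.40), (5, 10, 0.25), (10, 20, 0.15), (20, 30, 0.08),
    (30, 40, 0.05), (40, 50, 0.04), (50, 100, 0.03),
]

for m in (0.005, 0.01, 0.02):
    best, worst = total_success_range(PLACEHOLDER_BINS, m)
    print(f"pDNA mutation {m:.1%}: TS between {worst:.1%} and {best:.1%}")
```

Because the success rate falls as copy number rises, evaluating each bin at its lower copy number gives the optimistic bound and at its upper copy number the pessimistic one.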

Note that TS values are calculated based on the copy-number frequency of a specific CLD platform. It is always helpful to consider the copy-number distribution of your cell-line engineering platform. Additionally, our statistical analysis assumes that all copies are actively transcribed at the same level. Because integration sites are not all transcribed actively, the number of active copies should be lower than the number of inserted ones. At the same time, some integration sites (“hot spots”) are far more active than are other locations in a cell line’s genome. Integration of multiple copies in tandem on hot spots could complicate the situation further and may not be covered by statistical calculation. However, it is clear that when pDNA mutation levels increase, success rates drop significantly.

In our established analytical pipeline, the sequencing-quality acceptance limit is set to 35, which for our NGS system is equivalent to a sequencing error rate of ~0.03%. However, it is worth noting that steps involved in library preparation can introduce artificial mutations. To minimize the risk of discarding high-performing clones — and, worse, of delaying CMC timelines — we suggest rejecting pDNA when mutation levels >0.5% are detected. As we discussed, the analytical method applied should have a detection limit <0.5%, which is clearly beyond the capability of Sanger-sequencing instruments. Thus, we recommend leveraging DNA NGS for QC of pDNA before transfection.
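The relationship between a quality limit of 35 and an error rate of ~0.03% follows from the standard Phred definition, Q = -10 log10(P), assuming that the limit cited above is a Phred-scaled score. The Python sketch below shows that conversion together with a simple acceptance check against the 0.5% threshold. The function names and the list-of-fractions input are hypothetical and do not describe any particular variant-calling software.

```python
def phred_to_error_rate(q: float) -> float:
    """Convert a Phred-scaled quality score to a per-base error probability."""
    return 10 ** (-q / 10)


def passes_pdna_qc(variant_fractions, limit: float = 0.005) -> bool:
    """Return True if no variant exceeds the acceptance limit.

    variant_fractions: variant frequencies as fractions (0.022 for 2.2%).
    limit: rejection threshold; 0.005 mirrors the 0.5% criterion in the text.
    """
    return all(v <= limit for v in variant_fractions)


print(f"Q35 error rate: {phred_to_error_rate(35):.4%}")

# Variant levels reported in the text for two plasmid preparations.
print(passes_pdna_qc([0.022]))   # 2.2% C-to-G variant
print(passes_pdna_qc([0.084]))   # 8.4% G-to-A variant
```

Both plasmid preparations described in this article would fail such a check, although neither variant would be visible at the 15–20% detection limit of Sanger sequencing.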

As far as we know, ours is the first report to present a case of stable CLD in which mutation levels that fall below the detection limit of Sanger sequencing significantly impacted clone selection. We also have proposed an acceptance criterion for pDNA mutation levels so as to achieve a given success rate. To target a higher success rate in clone screening, plasmid mutation should be kept <0.5% as calculated statistically based on the copy-number frequency of the CLD platform under consideration. It is reasonable to conclude that plasmid preparations showing mutation rates exceeding 0.5% should be rejected for transfection during stable CLD.

Many biopharmaceutical companies continue to use plasmid sequencing by the Sanger method because it is less expensive and more accessible than many other available techniques. The associated risk of pDNA mutation also is generally low when performing two rounds of single-colony isolation. However, considering the potential for significant impacts on CMC timelines and CLD costs, we wish to bring attention to the importance of pDNA quality in transfection processes for CLD. We also call upon the biopharmaceutical industry to replace traditional Sanger sequencing with another method, such as NGS, that provides detection capabilities matching increasingly high expectations for control of sequence variants in final production clones.



1 Zeck A, et al. Low Level Sequence Variant Analysis of Recombinant Proteins: An Optimized Approach. PLoS One 7(7) 2012: e40328; https://doi.org/10.1371/journal.pone.0040328.

2 Lin TJ, et al. Evolution of a Comprehensive, Orthogonal Approach to Sequence Variant Analysis for Biotherapeutics. mAbs 11(1) 2019: 1–12; https://doi.org/10.1080/19420862.2018.1531965.

3 Valliere-Douglass J, et al. Biopharmaceutical Industry Practices for Sequence Variant Analyses of Recombinant Protein Therapeutics. PDA J. Pharm. Sci. Technol. 73(6) 2019: 622–634; https://doi.org/10.5731/pdajpst.2019.010009.

4 Zhang A, et al. A General Evidence-Based Sequence Variant Control Limit for Recombinant Therapeutic Protein Development. mAbs 12(1) 2020: 1791399; https://doi.org/10.1080/19420862.2020.1791399.

5 Zhang S, et al. Identifying Low-Level Sequence Variants via Next Generation Sequencing To Aid Stable CHO Cell Line Screening. Biotechnol. Prog. 31(4) 2015: 1077–1085; https://doi.org/10.1002/btpr.2119.

6 Tan KW, et al. Rapidly Accelerated Development of Neutralizing COVID-19 Antibodies by Reducing Cell Line and CMC Development Timelines. Biotechnol. Bioeng. 8 December 2022 (in press); https://doi.org/10.1002/bit.28302.

7 Zhang Z, et al. Reshaping Cell Line Development and CMC Strategy for Fast Responses to Pandemic Outbreak. Biotechnol. Prog. 37(5) 2021; https://doi.org/10.1002/btpr.3186.

8 Lubock NB, et al. A Systematic Comparison of Error Correction Enzymes by Next-Generation Sequencing. Nucleic Acids Res. 45(15) 2017: 9206–9217; https://doi.org/10.1093/nar/gkx691.

9 Schwartz JJ, Lee C, Shendure J. Accurate Gene Synthesis with Tag-Directed Retrieval of Sequence-Verified DNA Molecules. Nature Meth. 9(9) 2012: 913–915; https://doi.org/10.1038/nmeth.2137.

10 Greener A, Callahan M, Jerpseth B. An Efficient Random Mutagenesis Technique Using an E. coli Mutator Strain. Mol. Biotechnol. 7(2) 1997: 189–195; https://doi.org/10.1007/bf02761755.

11 Lee H, et al. Rate and Molecular Spectrum of Spontaneous Mutations in the Bacterium Escherichia coli As Determined by Whole-Genome Sequencing. Proc. Nat. Acad. Sci. USA 109(41) 2012: e2774–e2783; https://doi.org/10.1073/pnas.1210309109.

12 Davidson CJ, et al. Improving the Limit of Detection for Sanger Sequencing: A Comparison of Methodologies for KRAS Variant Detection. BioTechniques 53(3) 2012: 182–188; https://doi.org/10.2144/000113913.

13 Dong L, et al. Evaluation of Droplet Digital PCR and Next Generation Sequencing for Characterizing DNA Reference Material for KRAS Mutation Detection. Sci. Rep. 8, 2018: 9650; https://doi.org/10.1038/s41598-018-27368-3.

14 Tsiatis AC, et al. Comparison of Sanger Sequencing, Pyrosequencing, and Melting Curve Analysis for the Detection of KRAS Mutations: Diagnostic and Clinical Implications. J. Mol. Diagnost. 12(4) 2010: 425–432; https://doi.org/10.2353/jmoldx.2010.090188.

15 Thakur A, et al. Author Correction: Identification, Characterization, and Control of a Sequence Variant in Monoclonal Antibody Drug Product: A Case Study. Sci. Rep. 11, 2021: 17641; https://doi.org/10.1038/s41598-021-96916-1.

16 Wright C, et al. Genetic Mutation Analysis at Early Stages of Cell Line Development Using Next Generation Sequencing. Biotechnol. Prog. 32(3) 2016: 813–817; https://doi.org/10.1002/btpr.2263.

17 Zhang S, et al. Mutation Detection in an Antibody-Producing Chinese Hamster Ovary Cell Line by Targeted RNA Sequencing. BioMed. Res. Int. 20 March 2016: 8356435; https://doi.org/10.1155/2016/8356435.

18 Lebkowski JS, et al. Transfected DNA is Mutated in Monkey, Mouse, and Human Cells. Mol. Cell Biol. 4(10) 1984: 1951–1960; https://doi.org/10.1128/mcb.4.10.1951-1960.1980.

Shuai Wang is a director of Cell Line Development, Dujuan Lian is a principal scientist of Cell Line Development, Yan Zhi is a director of Cell Line Development, Yang Huang is an associate director of Cell Line Development, Kee Wee Tan is an associate director of Cell Line Development, Hang Zhou is a senior vice president and head of Bioprocess Research and Development, Weichang Zhou is the honorary president of WuXi Biologics and senior advisor to its CEO, corresponding author Xiaoyue Chen is an executive director and head of Cell Line Development and Synthetic Biology ([email protected]), and corresponding author Sam Zhang is a vice president and head of Microbial and Viral Platforms ([email protected]), all at WuXi Biologics in Shanghai, Wuxi, and Hangzhou, China.

