• Nicholas Holmes#, Patrick Muller*, Rishi Patel%, Anisha Tehim&, Atharva Imamdar&, Saachi Yadav&, Sharon Alex&, Vibha Narasayya& and Vinayak Mathur#

    # Department of Science, Cabrini University, Radnor, PA 19087
    *Eurofins, Chester Springs, PA 19425

    %University of Pennsylvania, Philadelphia, PA 19104

    &Penn Summer Prep Program, Philadelphia, PA 19104

    Abstract

    Horizontal gene transfer (HGT) plays a beneficial role in the evolution and survival of bacteriophages and bacteria. The extent of HGT between Streptococcus bacteria and associated bacteriophages, focusing on viral major capsid proteins, was studied utilizing a bioinformatics approach. Evidence of HGT was identified via the community science analysis pipeline and the BLAST database. Evolutionary relationships were assessed using MEGA software to construct phylogenetic trees. Overall relationships were then represented as networks via the Gephi application. Literature has shown that the major capsid protein in bacteria works analogously to bacterial microcompartments, protecting genetic materials and organelles. These observations, as well as genomic locations of genes coding for major capsid proteins, DNA polymerases, DNA topoisomerases, and other associated molecules, have led to their uses as biomarkers of potential HGT cases. The results provide evidence of extensive HGT between bacteria and bacteriophages, which helps in understanding their evolution and potential therapeutic uses.

    Introduction

    Antibiotic resistance within bacterial populations is rising to dangerously high levels, and new resistance mechanisms are emerging and spreading globally. One process that contributes to this spread is horizontal gene transfer (HGT). HGT occurs when genetic material is exchanged between organisms in a non-genealogical manner (Goldenfeld and Woese, 2007). Unlike vertical inheritance from parent to offspring, HGT usually occurs between organisms that are not directly related. Through HGT, a bacterium can acquire many new functions, including antibiotic resistance and virulence factors (Deng et al., 2019).

    Bacteriophages have been shown to play a role in the transfer of genetic material between bacteria (Borodovich et al., 2018). Bacteriophages infect bacteria through either the lytic cycle (lytic phages) or the lysogenic cycle (temperate phages) (Rehman et al., 2019). In the lysogenic cycle, the bacteriophage integrates its genome into that of the host cell and can remain dormant until it is activated (Labonté et al., 2019). These temperate phages can serve as vectors for HGT between bacterial species via transduction (Labonté et al., 2019). Phage transduction can be studied by examining a bacterial genome and locating pockets of viral DNA. Phages influence the control of bacterial populations and the spread of virulence factors and antibiotic resistance genes, resulting from unique combinations of genetic diversity (Cumby et al., 2015). The mechanisms by which these viruses infect bacteria, and how these mechanisms drive their evolution, are poorly understood; understanding them is crucial to determining where they originated (Cumby et al., 2015). Bacteriophages have distinct host ranges, and their specificity is determined by the structures they use to attach to particular host cell receptors and infect the cells.

    The protein that we focus on in our study is the major capsid protein. Capsids are the structures that contain the condensed genetic material of a bacteriophage and protect it from outside physical and chemical damage. Recent research has shown that mutations in the genes that code for these proteins are necessary for certain interactions with different host cell receptors and appear to contribute to the stability of a given capsid. This suggests that these mutations aid in broadening bacteriophages’ host ranges (Labrie et al., 2014).

    Interestingly, there are similarities between encapsulin proteins, which form shell-like bacterial structures, and the major capsid protein of the HK97 bacteriophage (Freire et al., 2015). Structural similarities are also seen between capsid proteins and the S-layer lattice protein components of the prokaryotic cell envelope (Freire et al., 2015). Fusogenic proteins of enveloped viruses, which enable fusion between viral and host cell membranes, have also been shown to function analogously to the SNARE family of proteins of Caenorhabditis elegans, which mediate fusion of intracellular vesicles with cell membranes and allow for cell-to-cell communication (Freire et al., 2015). Moreover, the structures and functions of major capsid proteins and bacterial microcompartments are very similar. Bacterial microcompartments are protein shells that encase enzymes, molecules essential for the microorganism, and perhaps even genetic material, protecting their contents from degradation and from physical or chemical damage inside or outside the cell (Krupovic & Koonin, 2017). These observations suggest that protective, capsid-like protein shells are not unique to bacteriophages but are found in other microorganisms as well.

    Bioinformatics analyses have demonstrated that bacteria that have undergone HGT with bacteriophages possess conserved genomic regions comprising not just single genes but multiple different genes located relatively close to one another. According to Sabath et al. (2012), overlapping genes and sequences are very common in viral genomes. Expression of these genes has been confirmed, while their functionality requires further investigation. Interestingly, the resulting translated proteins lack the stable tertiary structure characteristic of most wild-type proteins (Sabath et al., 2012). In addition to genes encoding major capsid proteins, genes that encode DNA polymerases, DNA topoisomerases, and other molecules associated with genome replication are also acquired through HGT. These genes have been shown to be located close together within viral genomes. Such clusters are termed genomic islands: groups of open reading frames that encode given traits or carry out specific functions (Villa & Viñas, 2019). Genomic islands that have been investigated carry out functions pertaining to virulence and pathogenicity, symbiosis, metabolism, fitness, and antibiotic resistance (Finke et al., 2017). The specific mechanism by which these genomic islands are integrated into the genomes of host organisms remains unknown (Villa & Viñas, 2019). This clustering provides an opportunity to use major capsid proteins as biomarkers when analyzing genomic sequences for evidence of HGT between bacteriophages and bacteria (Born et al., 2019).

    In this study we focused on the major capsid protein in the Streptococcus genus of bacteria. Studying how bacteria acquire resistance to antibiotics is necessary because the list of effective antibiotics is shrinking. The objective of this study is to assess the extent of HGT between bacterial species and associated bacteriophages, using the major capsid proteins as biomarkers. Our results indicate that overlapping open reading frames of varying lengths are located close to the major capsid protein in the bacterial genomes. Whether or not these regions encode functional proteins is not entirely known. Current annotations in databases indicate that these genomic regions encode several different proteins that contribute to the structure and morphology of the major capsid protein (Rosenwald et al., 2014).

    Methods

    HGT & Community Science Project Pipeline

    Positive cases of HGT between bacteriophages and bacteria were determined using the Community Science Project Pipeline (Mathur et al., 2019). A list of accession numbers of bacteriophage major capsid proteins was generated from the NCBI database. Each phage accession number was searched against the bacteria database on NCBI using BLASTp to generate positive hits (referred to as Forward BLAST) (Johnson et al., 2008). The top 10 hits based on the cut-off criteria of e-values of 1e-50 or lower, and a query coverage of 70% or higher, were recorded. The top bacterial hit accession number was then searched against the Virus database on NCBI using BLASTp (referred to as Reverse BLAST) (Johnson et al., 2008). Again, the top 10 hits which satisfied the cut-off parameters were recorded. If the top virus hit accession number in the Reverse BLAST matched the original virus accession number query, that bacteria-virus pair was recorded as a potential positive case of HGT. In total, 75 phage accession numbers were tested to give 21 positive HGT bacteria-virus pairs (Table 1).
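
    The reciprocal search logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: it assumes the BLASTp hits have already been parsed into records, sorted best-first, with hypothetical fields `accession`, `evalue`, and `query_coverage`. The function names and the placeholder accessions in the usage note are assumptions made for the example.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    """One parsed BLASTp hit (hypothetical record layout, not an NCBI format)."""
    accession: str
    evalue: float
    query_coverage: float  # percent of the query covered by the hit


def passing_hits(hits, max_evalue=1e-50, min_coverage=70.0, top_n=10):
    """Keep the first top_n hits that satisfy both cut-off criteria."""
    kept = [h for h in hits
            if h.evalue <= max_evalue and h.query_coverage >= min_coverage]
    return kept[:top_n]


def find_reciprocal_pair(phage_accession, forward_hits, reverse_hits_by_bacterium):
    """Return the bacterial accession if the pair is reciprocal, else None.

    forward_hits: hits of the phage query against the bacteria database.
    reverse_hits_by_bacterium: dict mapping a bacterial accession to its
        hits against the virus database.
    """
    forward = passing_hits(forward_hits)
    if not forward:
        return None
    top_bacterium = forward[0].accession
    reverse = passing_hits(reverse_hits_by_bacterium.get(top_bacterium, []))
    # A potential HGT case requires the top reverse hit to be the original query.
    if reverse and reverse[0].accession == phage_accession:
        return top_bacterium
    return None
```

    With placeholder data, `find_reciprocal_pair("PHAGE_1", [Hit("BACT_A", 1e-80, 95.0)], {"BACT_A": [Hit("PHAGE_1", 1e-90, 96.0)]})` returns `"BACT_A"`, while a query whose top bacterial hit points back to a different phage returns `None`.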

    Evolutionary History of Bacteria and Bacteriophages – Comparative Genomics

    The evolutionary history of bacteria and bacteriophages was assessed via comparative genomics. FASTA sequences of the major capsid protein from all positive cases of HGT were uploaded to MUSCLE and aligned (Edgar, 2004; Supplementary Figure 1). The aligned sequences were then loaded into MEGA7 to generate phylogenetic trees (Kumar et al., 2015). The phylogenetic tree was constructed using the maximum likelihood method with 100 bootstrap replicates (Figure 1). Based on the results, the Streptococcus clade of bacteria was selected for further analyses.
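
    Bootstrapping a phylogeny resamples the alignment columns with replacement and rebuilds the tree from each pseudo-alignment; the support for a clade is the fraction of replicates that recover it. The sketch below shows only the resampling step on a toy alignment. It is not MEGA7's implementation, the tree-building step is omitted, and the sequences and function name are made up for illustration.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Returns a new dict with the same names and the same alignment length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # Draw n_cols column indices; the same column may be drawn more than once.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


# Toy alignment (invented sequences, not data from this study).
toy = {"phage": "MKT-LLA", "bacterium": "MKTALLA", "outgroup": "MRT-LIA"}
rng = random.Random(0)  # fixed seed so the replicates are reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]
```

    Each replicate would then be passed to a tree-building method, and the resulting trees compared with the original to obtain support values.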

    Synteny & Evolutionary Relationships

    The Streptococcus clade of bacteria and bacteriophages was selected for the synteny analysis, which was performed using the software Mauve (Darling et al., 2010). Major capsid gene sequences were downloaded from NCBI for both bacteria and bacteriophages. Mauve synteny outputs were generated for the phages alone, for the bacteria, and for the bacteria and phage sequences together.

    Gephi Network Analysis

    The top six results from the Forward and Reverse BLAST searches were collected based on the Community Science Pipeline and were each organized as a node into the Gephi software for network analysis (Bastian et al., 2009). Connections between bacteria and bacteriophages based on the generated phylogenetic trees were input into Gephi as edges. A node in the center of the network with the most edges connected to it was indicative of the ancestral sequence that was shared by most bacteria and bacteriophages through HGT.
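
    The notion of a central node with the most edges corresponds to the node of highest degree. The sketch below computes this in plain Python rather than in Gephi, purely to make the idea concrete; the node labels are placeholders and the function names are assumptions for the example.

```python
from collections import defaultdict


def node_degrees(edges):
    """Count distinct neighbours per node in an undirected edge list."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(nbrs) for node, nbrs in neighbours.items()}


def hub_nodes(edges):
    """Return the node(s) with the highest degree, sorted by label."""
    degrees = node_degrees(edges)
    if not degrees:
        return []
    top = max(degrees.values())
    return sorted(node for node, d in degrees.items() if d == top)


# Placeholder network: one bacterial protein linked to three phage proteins.
example_edges = [
    ("bact_hub", "phage_1"),
    ("bact_hub", "phage_2"),
    ("bact_hub", "phage_3"),
    ("phage_3", "bact_other"),
]
hubs = hub_nodes(example_edges)
```

    In this toy network `hubs` contains only `"bact_hub"`, which has three neighbours. Ties are returned together, since a network can have more than one node of maximal degree.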

    Results

    Comparative Genomics

    Based on the arrangement of the phylogenetic tree and its bootstrap support values, there is a high likelihood that Streptococcus bacteriophages and bacteria were involved in HGT with respect to the major capsid protein. Although the major capsid protein is present in the five bacteriophages and bacteria, its location in the genome varies among species. This suggests that mutations such as translocations and insertions have occurred over time (Kyrillos et al., 2016). This could explain the divergence of pairs of bacteriophages and bacteria in the phylogenetic tree. The most divergent pair, which forms its own clade apart from the other pairs, is Streptococcus phage VS-2018a and the major capsid protein E in Streptococcus thermophilus. This is also reflected in their MUSCLE alignments, which differ from those of the other bacteria and bacteriophage pairs.

    Synteny & Evolutionary Relationships

    The Mauve software was used to create a multiple sequence alignment and predict synteny of the Javan Streptococcus bacteriophage and bacteria pairs using the progressiveMauve algorithm. In the synteny map of the four Javan-prefixed bacteriophages, the major capsid protein lies in the range of approximately 500-2000 base pairs (Supplementary Figure 3). The alignment is consistent in peak height and coloration patterns, with phage VS-2018a having the most distinct genome arrangement. S. thermophilus is missing a 400 base pair region upstream of the major capsid protein gene, as indicated by a shift in the sequence alignment (Supplementary Figure 4). The five phage sequences of interest are in reverse orientation in the genome, as indicated by the peaks falling below the main sequence line in Figure 2. The area between 850-970 base pairs is a unique region found only in the S. thermophilus and phage VS-2018a pair. This is expected, as this pair lies on a separate clade in the phylogenetic tree generated previously. The alignment of the genomes indicates that the regions upstream and downstream of the major capsid gene are also shared between these bacteria and bacteriophage pairs. This pattern indicates that not just the major capsid gene but a larger segment of the genome is shared between bacteria and phages.
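
    A missing region such as the 400 base pair gap noted above can be expressed as a run of gap characters in an aligned sequence. The sketch below is not how Mauve works internally (Mauve aligns locally collinear blocks); it only scans an already aligned sequence for gap runs of a minimum length. The function name and the toy sequence are illustrative assumptions.

```python
import re


def gap_runs(aligned_seq, min_len=1):
    """Return (start, end) positions of gap runs at least min_len long.

    Positions are 0-based alignment columns; end is exclusive.
    """
    return [(m.start(), m.end())
            for m in re.finditer(r"-+", aligned_seq)
            if m.end() - m.start() >= min_len]


# Toy aligned sequence with one long and one short gap run.
toy_seq = "ACG----TT--A"
long_gaps = gap_runs(toy_seq, min_len=3)
```

    Here `long_gaps` holds the single run covering columns 3 to 6, and the two-column run is filtered out. For real data the threshold would be chosen to match the size of the region of interest.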

    Gephi Network Analysis

    The central node in the network corresponds to a hypothetical protein in Streptococcus pyogenes. As seen in the top six results of the Reverse BLAST, this node shares edges with major capsid proteins in Streptococcus bacteriophages Javan 146, 454, 464, 474, 459, 484, and 166 (Figure 3). The generated network shows the same connections as the phylogenetic tree. The central node, the gene encoding a hypothetical protein in S. pyogenes, connects closely to different strains of S. pyogenes and to Javan bacteriophage 464. This relationship suggests that these bacteria and bacteriophage pairs could be where the initial transfer of genomic material occurred.

    Discussion

    HGT of the major capsid protein has allowed Streptococcus bacteriophages and bacteria to adapt and survive in constantly changing environments. Through HGT, greater genetic diversity is achieved, potentially speeding up adaptation and overall evolution.

    Based on our review of the scientific literature, major capsid proteins acquired through HGT could potentially serve as a bacterial microcompartment, protecting the bacterial genome from physical and chemical damage in a manner akin to viral capsids (Krupovic & Koonin, 2017). Interestingly, there are also morphological similarities between capsid proteins and the S-layer lattice proteins present mostly in archaea. Based on this, perhaps these proteins function to protect the genetic material of archaea from physical or chemical damage. Perhaps archaea even evolved to possess such a structure through HGT as a means to survive in native environments such as hot springs or other areas of high temperature (Freire et al., 2015).

    It has been suggested that bacterial genes acquired through HGT are usually quickly deleted from the genome unless they serve some specific purpose later on (Rosenwald et al., 2014). For example, genes acquired through HGT that improve metabolism in bacteria can be expressed under given circumstances. Changes in the environment or medium that render these genes functionless can then result in their deletion. This is understandable, as bacterial genomes tend to be very compact (Moran, 2002). Deletion most likely serves to conserve energy by not expressing genes that are not necessary. Whether such genes are expressed can be determined via RNA sequencing, in which the messenger RNA corresponding to the gene of interest is extracted and purified from other RNA molecules for study (Rosenwald et al., 2014).

    By identifying HGT within the major capsid protein sequences of Streptococcus bacteria and the bacteriophages that infect them, we can begin to understand the extent of HGT within bacterial populations. We propose that the major capsid protein can be used as a biomarker to identify HGT in other bacterial species as well. There is evidence to suggest that when genes are transferred horizontally, it is not just a single gene but a whole genomic region consisting of multiple genes (Szöllősi et al., 2015). A future direction of this research would be to identify the gene regions flanking the major capsid protein in the bacterial genome and to understand the functionality of those genes and the role they play within the bacteria.

    In this study, we focused on the Streptococcus genus, but it can easily be expanded to include a larger dataset of bacteria and bacteriophage pairs based on data availability in the NCBI database. It is imperative to study the extent and rate of HGT in bacterial populations as it is a key mechanism for bacteria to acquire antibiotic resistance genes, and thus has implications for human health worldwide.

    References

    Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: an open-source software for exploring and manipulating networks. Print. N. P. Retrieved from https://gephi.org/users/publications/

    Born, Y., Knecht, L. E., Eigenmann, M., Bolliger, M., Klumpp, J., & Fieseler, L. (2019). A major-capsid-protein-based multiplex PCR assay for rapid identification of selected virulent bacteriophage types. Archives of Virology, 164(3), 819–830. doi: 10.1007/s00705-019-04148-6.

    Borodovich, T., Shkoporov, A. N., Ross, R. P., & Hill, C. (2018). Phage-mediated horizontal gene transfer and its implications for the human gut microbiome. Research in Microbiology, 169(7-8), 366-373. https://doi.org/10.1016/j.resmic.2018.04.005

    Cumby, N., Reimer, K., Mengin‐Lecreulx, D., Davidson, A. R., & Maxwell, K. L. (2015). The phage tail tape measure protein, an inner membrane protein and a periplasmic chaperone play connected roles in the genome injection process of E. coli phage HK97. Molecular Microbiology, 96(3), 437-447. doi:10.1111/mmi.12918.

    Darling, A. E., Mau, B., & Perna, N. T. (2010). progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE, 5(6), 1-17. doi:10.1371/journal.pone.0011147

    Deng, Y., Xu, H., Su, Y., Liu, S., Xu, L., Guo, Z., Wu, J., Cheng, C., & Feng, J. (2019). Horizontal gene transfer contributes to virulence and antibiotic resistance of Vibrio harveyi 345 based on complete genome sequence analysis. BMC Genomics, 20(1), 761. https://doi.org/10.1186/s12864-019-6137-8

    Edgar R. C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research, 32(5), 1792–1797. doi:10.1093/nar/gkh340

    Finke, J. F., Winget, D. M., Chan, A. M., Suttle, C. A. (2017). Variation in the genetic repertoire of viruses infecting Micromonas pusilla reflects horizontal gene transfer and links to their environmental distribution. Viruses, 9, 116. 1-18. doi: 10.3390/v9050116.

    Freire, J. M., Santos, N. C., Veiga, A. S., Da Poian, A. T., & Castanho, M. A. R. B. (2015). Rethinking the capsid proteins of enveloped viruses: multifunctionality from genome packaging to genome transfection. The FEBS Journal, 282(2015), 2267–2278. doi:10.1111/febs.13274

    Goldenfeld, N., & Woese, C. (2007). Biology’s next revolution. Nature, 445(7126), 369. https://doi.org/10.1038/445369a

    Johnson, M., Zaretskaya, I., Raytselis, Y., Merezhuk, Y., McGinnis, S., & Madden, T. L. (2008). NCBI blast: a better web interface. Nucleic Acids Research, 36, 1-5. doi: 10.1093/nar/gkn201.

    Karp, P. D., Billington, R., Caspi, R., Fulcher, C. A., Latendresse, M., Kothari, A., … Subhraveti, P. (2017). The BioCyc collection of microbial genomes and metabolic pathways. Briefings in Bioinformatics, 20(4), 1085–1093. doi:10.1093/bib/bbx085

    Krupovic, M., & Koonin, E. V. (2017). Cellular origin of the viral capsid-like bacterial microcompartments. Biology Direct, 12(25), 1-6. doi:10.1186/s13062-017-0197-y

    Kumar, S., Stecher, G., & Tamura, K. (2015). MEGA7: molecular evolutionary genetics analysis version 7.0. Print. N. P. Retrieved from https://www.megasoftware.net/web_help_7/hc_citing_mega_in_publications.htm

    Kyrillos, A., Arora, G., Murray, B., & Rosenwald, A. G. (2016). The presence of phage orthologous genes in Helicobacter pylori correlates with the presence of the virulence factors CagA and VacA. Helicobacter, 21(3). doi: 10.1111/hel.12282

    Labonté, J. M., Pachiadaki, M., Fergusson, E., McNichol, J., Grosche, A., Gulmann, L. K., … Stepanauskas, R. (2019). Single cell genomics-based analysis of gene content and expression of prophages in a diffuse-flow deep-sea hydrothermal system. Frontiers in Microbiology, 10, 1-12. doi: 10.3389/fmicb.2019.01262.

    Labrie, S. J., Dupuis, M., Tremblay, D. M., Plante, P., Corbeil, J., & Moineau, S. (2014). A new microviridae phage isolated from a failed biotechnological process driven by Escherichia coli. Applied and Environmental Microbiology, 80(22), 6992-7000. doi: 10.1128/AEM.01365-14.

    Lerner, A., Matthias, T., & Aminov, R. (2017). Potential effects of horizontal gene exchange in the human gut. Frontiers in Immunology, 8, 1-14. doi: 10.3389/fimmu.2017.01630.

    Mathur, V., Arora, G. S., McWilliams, M., Russell, J., & Rosenwald, A. G. (2019). The genome solver project: faculty training and student performance gains in bioinformatics. Journal of Microbiology & Biology Education, 20(1), 1-12. doi:10.1128/jmbe.v20i1.1607

    Moran, N. A. (2002). Microbial minimalism: genome reduction in bacterial pathogens. Cell, 108(5), 583-586. doi:10.1016/S0092-8674(02)00665-7

    Rehman, S., Ali, Z., Khan, M., Bostan, N., & Naseem, S. (2019). The dawn of phage therapy. Reviews in Medical Virology, 1-16. doi: 10.1002/rmv.2041.

    Rosenwald, A. G., Murray, B., Toth, T., Madupu, R., Kyrillos, A., & Arora, G. (2014). Evidence for horizontal gene transfer between Chlamydophila pneumoniae and Chlamydia phage. Bacteriophage, 4(4). doi: 10.4161/21597073.2014.965076.

    Sabath, N., Wagner, A. & Karlin, A. (2012). Evolution of viral proteins originated de novo by overprinting. Molecular Biology and Evolution, 29(12). doi:10.1093/molbev/mss179.

    Szöllősi, G. J., Davín, A. A., Tannier, E., Daubin, V., & Boussau, B. (2015). Genome-scale phylogenetic analysis finds extensive gene transfer among fungi. Philosophical Transactions of the Royal Society of London B: Biological Sciences, 370. https://doi.org/10.1098/rstb.2014.0328

    Villa, T. G. & Viñas, M. (2019). Horizontal gene transfer: breaking borders between living kingdoms. Cham, Switzerland: Springer Nature Switzerland AG.

    Yang, Z., Zhang, Y., Wafula, E. K., Honaas, L. A., Ralph, P. E., Jones, S., … dePamphilis, C. W. (2016). Horizontal gene transfer is more frequent with increased heterotrophy and contributes to parasite adaptation. Proceedings of the National Academy of Sciences of the United States of America, 113(45), 7010-7019. doi: 10.1073/pnas.1608765113.

    Table 1: Table of the positive cases of HGT amongst pairs of bacteriophages and bacteria.

    | Bacteriophage | Bacteriophage Accession Number | Bacteria | Bacteria Accession Number |
    |---|---|---|---|
    | putative head protein [Riemerella phage RAP44] | YP_007003622.1 | hypothetical protein [Riemerella anatipestifer] | WP_014938289.1 |
    | putative head protein [Brevibacillus phage Osiris] | YP_009215022.1 | hypothetical protein [Brevibacillus laterosporus] | WP_022583694.1 |
    | major capsid protein [Streptococcus phage Javan464] | QBX28740.1 | major capsid protein E [Streptococcus pyogenes] | WP_136111941.1 |
    | major capsid protein [Arthrobacter phage Isolde] (not top hit) | AYR00888.1 | hypothetical protein [Arthrobacter sp. cf158] | WP_091323596.1 |
    | phage major capsid protein [Cellulophaga phage phi18:1] | YP_008240963.1 | phage major capsid protein [Elizabethkingia anophelis] | WP_059330774.1 |
    | phage major head protein [Oenococcus phage phiS13] | YP_009005240.1 | hypothetical protein [Oenococcus oeni] | WP_032811398.1 |
    | major capsid protein [Streptococcus phage Javan446] | QBX28239.1 | hypothetical protein [Streptococcus pyogenes] | WP_014635509.1 |
    | major capsid protein [Streptococcus phage Javan146] (not top hit) | QBX23717.1 | major capsid protein E [Streptococcus pyogenes] | WP_136022800.1 |
    | major capsid protein [Brevibacillus phage Jimmer1] | YP_009226318.1 | phage capsid protein [Brevibacillus laterosporus] | WP_119733365.1 |
    | major capsid protein [Streptococcus phage Javan166] | QBX23895.1 | hypothetical protein [Streptococcus dysgalactiae] | WP_046177708.1 |
    | hypothetical protein [uncultured Mediterranean phage uvMED] | BAQ84158.1 | hypothetical protein [Elizabethkingia anophelis] | WP_151449511.1 |
    | major capsid protein [Streptococcus phage VS-2018a] | AZA24404.1 | major capsid protein E [Streptococcus thermophilus] | AZA18259.1 |
    | major capsid protein [Streptococcus phage Dp-1] | YP_004306931.1 | hypothetical protein D8H99_54145 [Streptococcus sp.] | RKV76237.1 |
    | major capsid protein [Mycobacterium phage Renaud18] (not top hit) | AXQ64918.1 | MULTISPECIES: major capsid protein E [Mycobacteroides] | WP_057970215.1 |
    | prophage major head protein [Oenococcus phage phiS11] | YP_009006573.1 | hypothetical protein [Oenococcus oeni] | WP_032811892.1 |
    | hypothetical protein [Oenococcus phage phi9805] | YP_009005184.1 | hypothetical protein [Oenococcus oeni] | WP_032820248.1 |
    | capsid protein [Mycobacterium phage TChen] (not top hit) | AWH14408.1 | MULTISPECIES: major capsid protein E [Mycobacteroides] | WP_057970215.1 |
    | capsid protein [Arthrobacter phage KellEzio] | YP_009301281.1 | hypothetical protein DRJ50_09715 [Actinobacteria bacterium] | RLE21106.1 |
    | major capsid protein [Microviridae sp.] | AXH73898.1 | hypothetical protein [Elizabethkingia anophelis] | WP_080670996.1 |
    | major capsid protein [Streptococcus phage Javan464] | QBX28740.1 | major capsid protein E [Streptococcus pyogenes] | WP_136111941.1 |
    | major capsid protein [Microviridae sp.] | AXH77365.1 | | |

    Figure 1: Phylogenetic tree of all of the positive cases of HGT. There are multiple, unique clades observed. The center Streptococcus clade was chosen for further analysis based on the high bootstrap values.


    Figure 2: The synteny of the phage and bacteria sequences of interest generated via Mauve. The five phage sequences are in reverse orientation in the genome, as indicated by the peaks falling below the line. The area between 850-970 base pairs is a unique region found only in the S. thermophilus and phage VS-2018a pair. This is expected, as this pair lies on a separate clade in the phylogenetic tree generated with MEGA7.

    Figure 3: The Gephi network of all positive HGT cases within the Streptococcus clade. Notice that all bacteria and bacteriophages display evolutionary relationships through a mechanism of HGT.

    Supplementary Figure 1: MUSCLE alignment of the Streptococcus bacteria and associated bacteriophage pairs. The accession numbers AZA24404.1, QBX28740.1, QBX28239.1, QBX23895.1, and QBX23717.1 are bacteriophage major capsid proteins. The rest of the sequences are bacterial proteins. All of the sequences are highly conserved here.


    Supplementary Figure 2: Maximum likelihood phylogenetic tree using MEGA7 showing the relationships amongst Streptococcus bacteriophages and bacteria.


    Supplementary Figure 3: The synteny of the four Javan-prefixed phages generated via Mauve. The solid red line connecting each sequence shows the location of the matching section of the genome. The major capsid protein lies in the range of approximately 500-2000 base pairs in this alignment. The alignment is mostly consistent in peak height and coloration patterns. Phage VS-2018a has certain unique regions, as can be seen with the sliding black box feature.


    Supplementary Figure 4: The synteny of S. thermophilus and S. dysgalactiae generated via Mauve. S. thermophilus is missing a 400 base pair region upstream of the major capsid protein gene, as indicated by the shift in sequence alignment.

  • Hemophilia is a classic example of a monogenic disorder, making it a cornerstone for the application and advancement of molecular genetic testing. It is an X-linked recessive bleeding disorder primarily affecting males, defined at the molecular level by mutations in one of two genes on the X chromosome. Mutations in the F8 gene cause Hemophilia A, leading to a deficiency of clotting Factor VIII, while mutations in the F9 gene cause Hemophilia B, resulting in a Factor IX deficiency. Because its cause is a defect in a single gene, molecular testing can move beyond observing symptoms to identify the precise, causative genetic variant in an affected individual or carrier. This direct link between genotype and phenotype has made hemophilia a model disease for developing diagnostic strategies and pioneering novel genetic therapies. Studying hemophilia also helps clarify how the spectrum of mutations in a single gene relates to disease severity.

    The use of genetic testing for hemophilia offers comprehensive benefits but also presents particular challenges. The primary benefit is achieving a definitive diagnosis, which not only confirms the condition but also accurately distinguishes between Hemophilia A and B. This distinction matters because the replacement therapies differ. Also, the type of mutation can often predict the severity of the disease. For example, large deletions in the F8 gene are typically associated with a severe phenotype and a higher risk of developing antibodies against treatment, giving the result both diagnostic and prognostic power (Bardi & Astermark, 2015). For families, genetic testing is invaluable for carrier detection. Female relatives can learn their carrier status, which informs their own health monitoring and allows for informed reproductive decisions. The psychosocial challenges can be especially complex for female carriers. A carrier diagnosis can create anxiety regarding personal health, as some carriers experience bleeding symptoms, especially during surgery or childbirth. It also forces difficult reproductive decisions, introducing options like prenatal diagnosis or preimplantation genetic testing, each with its own ethical and emotional weight. Feelings of guilt or responsibility for passing on the condition can also be a significant burden, emphasizing the necessity for sensitive and comprehensive genetic counseling (Cassis et al., 2012).

    Several clear indications warrant genetic testing for hemophilia. The most common is in a male presenting with symptoms of a bleeding diathesis, such as spontaneous bleeding into joints and muscles, prolonged bleeding after minor injury, or excessive bleeding post-surgery. A known family history of hemophilia is another primary indication, prompting testing for at-risk male infants and carrier testing for female relatives. The diagnostic process begins with coagulation screening tests and specific factor activity assays. While these biochemical tests can diagnose a factor deficiency, molecular genetic testing is required to identify the causative mutation. The testing strategy itself is often tiered. In cases of severe Hemophilia A, laboratories may first screen for the common intron 22 inversion. If this is negative, full gene sequencing via Next-Generation Sequencing is then performed. A persistent challenge in sequencing is the identification of novel missense variants, which can be classified as variants of unknown significance, creating diagnostic uncertainty until their functional impact can be determined. As an X-linked recessive condition, hemophilia has complete penetrance in males who inherit the mutation, though symptoms in female carriers can vary due to random X-inactivation (Antonarakis et al., 1995).
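
    The X-linked recessive inheritance pattern that underlies carrier testing can be made concrete with a small enumeration. The sketch below assumes a carrier mother and an unaffected father and lists the four equally likely offspring genotypes. The allele labels `XH` (functional copy) and `Xh` (pathogenic variant) are notation introduced here for illustration, and the model ignores de novo mutations and the variable symptoms caused by X-inactivation.

```python
from collections import Counter
from itertools import product


def classify(genotype):
    """Label one offspring genotype, given as a pair of sex chromosomes."""
    first, second = genotype
    if "Y" in genotype:
        x_allele = first if second == "Y" else second
        return "affected male" if x_allele == "Xh" else "unaffected male"
    variant_copies = genotype.count("Xh")
    return {0: "non-carrier female",
            1: "carrier female",
            2: "affected female"}[variant_copies]


def offspring_outcomes(mother, father):
    """Count outcomes over every maternal x paternal chromosome pairing."""
    return Counter(classify(pair) for pair in product(mother, father))


# Carrier mother (one functional, one variant X) and unaffected father.
outcomes = offspring_outcomes(("XH", "Xh"), ("XH", "Y"))
```

    Each of the four outcomes occurs once, so on average half of the sons are affected and half of the daughters are carriers. This is the reasoning behind offering testing to at-risk male infants and female relatives.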

    The overall utility of genetic testing in hemophilia is extremely high across clinical and personal domains. Clinically, it provides a precise diagnosis, informs prognosis regarding severity and inhibitor risk, and ensures the correct factor concentrate is used for treatment. It transforms a diagnosis based on symptoms into one based on a defined molecular cause, allowing for more personalized risk stratification. For families, its utility is immense for carrier identification and reproductive planning, empowering individuals with the information needed to make personal decisions. It has become an indispensable part of comprehensive hemophilia care, shifting the paradigm from reactive treatment of bleeds to proactive management based on an individual’s unique genetic profile.

    Looking forward, the future of hemophilia treatment is inextricably linked to its genetics, moving beyond testing and into direct intervention. The ultimate form of pharmacogenetics for hemophilia is gene therapy. Because it is a single-gene disorder, hemophilia is an ideal candidate for this revolutionary approach. Current gene therapies, several of which have recently been approved, use a viral vector, typically an adeno-associated virus (AAV), to deliver a functional copy of the F8 or F9 gene to the patient’s liver cells. The liver then begins to produce the missing clotting factor, transforming a condition requiring lifelong infusions into one that can be managed with a single treatment. Despite its promise, significant challenges for gene therapy remain. Questions about the long-term durability of factor expression and the high cost of treatment are active areas of investigation. Many potential candidates are ineligible for current AAV-based therapies due to pre-existing antibodies against the viral vector. Future research is focused on overcoming these hurdles and ensuring equitable access to these transformative treatments, which represent a paradigm shift from managing a disease to offering a potential functional cure (Doshi & Arruda, 2018).

    References

    Antonarakis, S. E., Rossiter, J. P., Young, M., Horst, J., de Moerloose, P., Sommer, S. S., Ketterling, R. P., Kazazian, H. H., Jr, Négrier, C., Vinciguerra, C., Gitschier, J., Goossens, M., Girodon, E., Ghanem, N., Plassa, F., Lavergne, J. M., Vidaud, M., Costa, J. M., Laurian, Y., Lin, S. W., … Inaba, H. (1995). Factor VIII gene inversions in severe hemophilia A: results of an international consortium study. Blood, 86(6), 2206–2212.

    Cassis, F. R., Querol, F., Forsyth, A., Iorio, A., & HERO International Advisory Board (2012). Psychosocial aspects of haemophilia: a systematic review of methodologies and findings. Haemophilia: the official journal of the World Federation of Hemophilia, 18(3), e101–e114. https://doi.org/10.1111/j.1365-2516.2011.02683.x

    Bardi, E., & Astermark, J. (2015). Genetic risk factors for inhibitors in haemophilia A. European journal of haematology, 94 Suppl 77, 7–10. https://doi.org/10.1111/ejh.12495

    Miesbach

    Doshi, B. S., & Arruda, V. R. (2018). Gene therapy for hemophilia: what does the future hold? Therapeutic advances in hematology, 9(9), 273–293. https://doi.org/10.1177/2040620718791933

  • Modern genomics employs a diverse toolkit to untangle the complex relationships among genes, environment, and phenotype. Two recent studies, focused on different organisms and traits, demonstrate the complementary nature of modern research strategies. One study represents a bottom-up functional genomics approach, in which a specific gene is identified and its mechanism is validated through direct genetic manipulation in soybeans (Wu et al., 2025). The other is a top-down epigenome-wide association study (EWAS), which searches for statistical correlations between epigenetic patterns, environmental factors, and a complex human disease (Lee et al., 2025). Together, these papers illustrate the distinct methods of mechanistic and associative research in the broader field of genomics.

    The two studies used different types of data and analytical tools. Wu et al. (2025) integrated multiple data types to build a case for the role of the GmERF205 gene in soybeans. The researchers began with functional genomic data, using RNA-sequencing to identify genes highly expressed during drought. This was followed by biochemical evidence: they measured the activity of antioxidant enzymes to understand the cellular effects of the gene’s overexpression. The core of their work involved direct genetic manipulation, using CRISPR/Cas9 gene editing and Agrobacterium-mediated transformation to create transgenic plants.

    Lee et al. (2025), by contrast, conducted an observational analysis of a human population. The primary dataset consisted of epigenetic data from the Infinium Methylation 850k array, which measures DNA methylation levels. This was combined with detailed phenotypic data and with environmental data from a food frequency questionnaire. Their primary tools were statistical: they used R packages to perform linear modeling and identify differentially methylated positions associated with obesity and diet.
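
    To make the idea of per-site linear modeling concrete, the sketch below fits one regression per CpG site, with methylation as the outcome and a trait plus one covariate as predictors, and reports the trait coefficient. It is a simplified Python stand-in, not the authors' R pipeline: the function names, the toy data, and the use of a single covariate are all assumptions, and a real analysis would also compute p-values and correct for multiple testing.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(design, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def trait_effects(methylation, trait, covariate):
    """Per-CpG coefficient of the trait, adjusted for one covariate."""
    design = [[1.0, t, c] for t, c in zip(trait, covariate)]
    return {cpg: fit_ols(design, values)[1]
            for cpg, values in methylation.items()}


# Toy data (hypothetical): six individuals, two CpG sites.
bmi = [20.0, 25.0, 30.0, 22.0, 28.0, 35.0]
age = [40.0, 35.0, 50.0, 60.0, 45.0, 55.0]
methylation = {
    "cg_bmi_linked": [0.5 + 0.01 * b + 0.002 * a for b, a in zip(bmi, age)],
    "cg_age_only": [0.3 + 0.001 * a for a in age],
}
effects = trait_effects(methylation, bmi, age)
for cpg, slope in effects.items():
    print(cpg, round(slope, 4))
```

    Sites whose adjusted trait coefficient is reliably different from zero are the candidates for differentially methylated positions; the covariate term is what keeps an age-driven site from being mistaken for a BMI-driven one.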

    The experimental designs reflected the different goals of each study. The soybean investigation did not require considerations like allelic diversity or population stratification because it was not a population study (Wu et al., 2025). Instead of sampling a diverse population, the researchers used a single soybean variety and created genetically modified versions. This controlled genetic background allowed them to isolate the specific effect of the GmERF205 gene. Their phenotyping was experimental and highly detailed, involving the measurement of plant growth and physiological responses under controlled drought conditions. Conversely, the human study’s design was centered on population-level analysis (Lee et al., 2025). Sample size was a critical parameter, and the researchers utilized a large cohort of 1,526 individuals. Because their study involved humans, they had to address the potential for population stratification, which they managed by using a relatively homogeneous Korean cohort and by statistically adjusting for potential confounding variables, including estimated blood cell-type proportions.

    Both research teams faced significant challenges. For the soybean study, a primary challenge was identifying a single, impactful gene from the large ERF transcription factor family. They addressed this by using RNA-sequencing data to prioritize candidates that were most responsive to drought stress, effectively narrowing the field (Wu et al., 2025). The human study explicitly detailed its limitations. A major challenge was the temporal discrepancy between the dietary data and the methylation data, which were collected four years apart. The authors acknowledged that this prevents definitive causal claims and proposed future longitudinal studies with concurrent data collection. Another key challenge was correcting for the confounding effect of different blood cell types in their samples, which they addressed using a reference-based statistical deconvolution algorithm (Lee et al., 2025).

    Despite their different approaches, both studies have enriched the field of population genomics. The soybean study provides a powerful example of functional validation. While population genomics can identify a genetic region associated with a trait, it cannot prove which gene in that region is responsible. Functional studies provide that crucial mechanistic link, demonstrating a specific gene’s causal role and providing a validated target that can now be screened for in diverse populations (Wu et al., 2025). The EWAS of the Korean cohort pushes the boundaries of population genomics into the realm of epigenomics. It demonstrates that different aspects of a complex disease have distinct epigenetic signatures tied to environmental factors like diet. This highlights the importance of studying gene-environment interactions and moves the field beyond the static DNA sequence to understand the dynamic regulatory layers that connect lifestyle to health outcomes (Lee et al., 2025).

    These two distinct approaches are complementary. Association studies like the EWAS generate hypotheses about which pathways are important, while functional studies like the soybean research provide the definitive proof of a gene’s role, ultimately creating a more complete picture of how complex traits are controlled.

    References

    Lee, J., Choi, HK., Park, SH. et al. Epigenome-wide association study of BMI and waist-to-hip ratio and their associations with dietary patterns in Korean adults. Sci Rep 15, 28681 (2025). https://doi.org/10.1038/s41598-025-13868-6

    Wu, N., Feng, Y., Jiang, T. et al. Genome-wide study and expression analysis of soybean ERF transcription factors and overexpression of GmERF205 enhances drought resistance in soybean. BMC Genomics 26, 726 (2025). https://doi.org/10.1186/s12864-025-11829-x

  • Rapid advancements in human genetics and genomics have introduced an era of unprecedented potential for understanding disease and improving diagnostics. However, this progress brings ethical challenges, particularly concerning the privacy of genetic information and the potential for discrimination.

    The completion of the Human Genome Project marked a turning point, transforming biology and medicine by providing a comprehensive blueprint of human genetic information. This knowledge has enabled remarkable advancements in our ability to diagnose and prevent various diseases with genetic components. At the same time, the increasing accessibility and use of personal genetic and genomic information raise profound societal questions. The convergence of genetics and public health requires a strong foundation of laws and protocols to ensure that scientific progress serves the common good without infringing on individual rights (Mikail, 2008). Central to this are concerns about genetic discrimination and the privacy of deeply personal genetic data.

    The legal framework protecting genetic information in the United States has evolved over several decades. Early civil rights legislation, while not explicitly designed for genetic information, laid some groundwork. Title VII of the Civil Rights Act of 1964 prohibits employment discrimination based on race, color, religion, sex, or national origin. While it does not explicitly mention genetics, arguments could be made if genetic traits were disproportionately associated with a protected class (Mikail, 2008). The Rehabilitation Act of 1973 and the Americans with Disabilities Act (ADA) of 1990 provided protections against discrimination based on disability. These acts are relevant to genetics because they could cover individuals with manifested genetic conditions that resulted in impairment. However, their applicability to asymptomatic individuals with a genetic predisposition to a future illness was less clear (Mikail, 2008).

    The Health Insurance Portability and Accountability Act (HIPAA) of 1996 and its Privacy Rule were a big step forward in protecting the privacy of health information held by healthcare providers (Mikail, 2008). HIPAA established national standards to safeguard individually identifiable health information, termed Protected Health Information, setting limits and conditions on the uses and disclosures that may be made without patient authorization. However, HIPAA’s protections did not comprehensively address genetic discrimination by employers or insurers in all contexts (Prince & Roche, 2014).

    As these gaps were recognized, specific protections began to emerge. Executive Order 13145 of 2000 prohibited genetic discrimination in federal employment, which served as an important precedent (Mikail, 2008). The landmark legislation in this area is the Genetic Information Nondiscrimination Act (GINA) of 2008. GINA has two main components. Title I prohibits health insurers from using genetic information to deny coverage, adjust premiums, or impose pre-existing condition exclusions. Title II prohibits employers from using genetic information in hiring, firing, job assignments, or promotion decisions (Prince & Roche, 2014). GINA broadly defines genetic information as an individual’s genetic tests, the genetic tests of family members, and the manifestation of a disease or disorder in family members. GINA’s protections are not absolute, however. It does not cover life insurance, disability insurance, or long-term care insurance, nor does it apply once a genetic condition has manifested as a disease (Prince & Roche, 2014). Many state-level anti-discrimination laws also exist, some offering broader protection than GINA, creating a complex patchwork of regulations (Mikail, 2008).

    Without robust legal protections for genetic information, there would be severe ethical problems. If not for GINA, knowledge of a genetic predisposition to a future illness could lead to employment discrimination, regardless of a person’s current ability to perform the job. Similarly, health insurance discrimination could resurface, with insurers denying coverage or charging prohibitive premiums based on genetic risk profiles. This would make healthcare inaccessible for those deemed genetically “less desirable” (Prince & Roche, 2014).

    Beyond these tangible economic harms, the lack of protection could spread social stigmatization. Individuals with known genetic predispositions to certain conditions, especially those with societal stigma like mental illness or certain hereditary disorders, could face prejudice in personal relationships. The fundamental erosion of privacy concerning one’s genetic makeup would be a considerable ethical breach, undermining individual autonomy and dignity.

    Further effects on research participation are likely. If people feared that genetic information could be used against them by employers or insurers, they would be far less willing to participate in genomic studies (Prince & Roche, 2014). This would impede scientific progress and the development of new treatments and preventive strategies, ultimately harming public health. Also, decisions around reproductive health could be unduly influenced or coerced if genetic information about potential offspring carried risks of discrimination or social penalty. The very fabric of trust between individuals, healthcare providers, employers, and insurers would be damaged.

    The proliferation of large-scale genomics databases offers the potential to advance our understanding of human health and disease (Mikail, 2008). These resources allow researchers to study genetic and environmental contributions to disease at an unprecedented level. However, they also introduce significant risks. One major risk is re-identification. Even when data is de-identified by removing direct identifiers like names and addresses, the uniqueness of an individual’s genomic sequence, combined with other available datasets, can create pathways for re-linking anonymized data to specific individuals (Shabani & Borry, 2018). Data breaches and hacking are a constant threat, and the compromise of a large genomic database could expose highly sensitive information for millions. There is also the risk of misuse by third parties. Data collected for research under specific consent could be sought by law enforcement, used by commercial entities for purposes not envisioned initially, or accessed by unauthorized entities.

    Similarly, findings from these databases can lead to group harm or stigmatization. If research links certain genetic variants more prevalent in specific ancestral or ethnic groups to particular diseases or traits, this could fuel discrimination or prejudice against entire communities, regardless of individual genetic makeup (Shabani & Borry, 2018).

    Managing incidental findings poses ethical and logistical challenges as well. Mitigating these risks requires several layers of protection. Robust de-identification and anonymization techniques are a first step but are often insufficient alone, a concern heightened by the looming advent of quantum computing. Strong data security measures are critical, including encryption, stringent access controls, and secure computing environments (Shabani & Borry, 2018). Tiered access models can allow different levels of data access based on researcher credentials and project justification, with stricter controls for more sensitive or identifiable data.

    Strict governance and oversight through Institutional Review Boards, dedicated data access committees, and security policies are important foundational aspects (Shabani & Borry, 2018). Informed consent should always be transparent and comprehensive, moving towards dynamic consent models that allow participants to control how their data is used for future research. Legal and regulatory frameworks, such as the EU’s General Data Protection Regulation (GDPR) and GINA, provide important baselines but may need further adaptation for the genomic era (Shabani & Borry, 2018). Data Use Agreements between institutions and researchers create contractual obligations for responsible data handling. Finally, transparency with the public about data governance and security practices, alongside ongoing public engagement, is important for building and maintaining trust. Emerging privacy-enhancing technologies, such as differential privacy and homomorphic encryption, also hold promise for future mitigation efforts.
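
    As one illustration of a privacy-enhancing technology, the sketch below applies the Laplace mechanism, a standard building block of differential privacy, to a simple count query (for example, how many participants carry a given variant). The function names, the toy cohort, and the chosen epsilon are assumptions for the example; a production system would also track the cumulative privacy budget across queries.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(flags, epsilon, rng):
    """Release a noisy count of True entries with epsilon-differential privacy.

    Adding or removing one participant changes a count by at most 1,
    so the sensitivity is 1 and the Laplace scale is 1 / epsilon.
    """
    true_count = sum(1 for flag in flags if flag)
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Toy cohort (hypothetical): 37 carriers among 1,000 participants.
rng = random.Random(0)
carrier_flags = [True] * 37 + [False] * 963
print(private_count(carrier_flags, epsilon=0.5, rng=rng))
```

    A smaller epsilon adds more noise and gives stronger privacy at the cost of accuracy, which is the central trade-off a data access committee would have to set.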

    The journey of human genetics from basic science to impactful public health applications has been remarkable, but it is intrinsically linked with complex ethical considerations. Anti-discrimination laws like GINA and privacy regulations like HIPAA provide an essential, though not exhaustive, shield against the misuse of genetic information. Without these protections, individuals would face risks of discrimination and violations of privacy, potentially undermining both personal well-being and the progress of beneficial research. As we increasingly rely on large-scale genomic databases, the challenges of ensuring data security and ethical use intensify. To navigate these risks, a multi-layered approach involving robust technical safeguards, strong governance, transparent consent processes, and ongoing public dialogue is necessary. Ultimately, the responsible integration of genomics into public health and medicine depends on our collective commitment to upholding individual rights while harnessing the immense potential of genetic knowledge to improve human health for all.

    References

    Mikail, C. N. (2008). Public Health Genomics. San Francisco: Wiley.

    Prince, A. E., & Roche, M. I. (2014). Genetic information, non-discrimination, and privacy protections in genetic counseling practice. Journal of genetic counseling, 23(6), 891–902. https://doi.org/10.1007/s10897-014-9743-2.

    Shabani, M., & Borry, P. (2018). Rules for processing genetic data for research purposes in view of the new EU General Data Protection Regulation. European journal of human genetics : EJHG, 26(2), 149–156. https://doi.org/10.1038/s41431-017-0045-7.

  • The field of neurodevelopmental genetics is tasked with unraveling the biological underpinnings of conditions that are defined by behavior but have their roots in the genome. Among these, Autism Spectrum Disorder (ASD) presents one of the most formidable challenges due to its profound clinical and etiological heterogeneity. The diagnostic journey for a family with a child with ASD often involves a broad, multi-step genomic investigation that can be costly, lengthy, and frequently inconclusive. It is interesting, then, to examine the unique relationship between ASD and Fragile X Syndrome (FXS). FXS stands as the most common single-gene cause of ASD. This well-defined molecular link has established the targeted genetic test for FXS as a standard, cost-effective, and high-yield component of the initial diagnostic workup for individuals with ASD. The practice of using this specific, targeted test provides a fascinating point of diagnostic clarity within the otherwise vast and often uncertain genetic landscape of ASD.

    ASD is characterized by a core set of symptoms, including deficits in social communication and the presence of restricted and repetitive behaviors (Pyeritz, Korf, & Grody, 2019). FXS, while having its own distinct set of physical and cognitive features, frequently presents with behaviors that fall squarely within the autism spectrum. Research has shown that approximately 60% of males with FXS also meet the full diagnostic criteria for ASD, making the behavioral presentation of the two conditions often indistinguishable in a clinical setting (Kaufmann et al., 2017). The phenotypic overlap is rooted in a shared underlying neurobiology. FXS is caused by the silencing of the FMR1 gene, resulting in the absence of its protein product, FMRP, an RNA-binding protein that regulates protein synthesis at synapses.

    This shared biology masks a fundamental difference in their genetic architectures. FXS is a monogenic disorder, caused almost exclusively by the expansion of a CGG trinucleotide repeat in the FMR1 gene (Hunter, Berry-Kravis, Hipp, & Todd, 1998). ASD, on the other hand, is polygenic and genetically heterogeneous. There is no single “autism gene.” Instead, hundreds of genes have been implicated, and a large number of cases are attributed to rare, de novo mutations that are not inherited from the parents. The genetic complexity of ASD necessitated the creation of large-scale research initiatives, such as the Simons Simplex Collection, which gathered data from thousands of families to begin identifying these rare genetic risk factors (Fischbach & Lord, 2010). The fact that such a massive undertaking was required for ASD, while the cause of FXS was pinpointed to a single gene, illustrates the difference in their genetic landscapes.

    It is this difference that makes the FXS test such a valuable tool in the ASD diagnostic process. Given that a significant percentage of individuals with an ASD diagnosis (between 2% and 6%) will test positive for the FMR1 mutation, FXS represents the most common currently identifiable single-gene cause of autism (Kaufmann et al., 2017). From a clinical and economic perspective, it is far more efficient to first test for this single, relatively common cause with a targeted and inexpensive molecular test. A positive result provides a definitive etiological diagnosis, which can eliminate the need for a more extensive and costly “diagnostic odyssey” involving chromosomal microarrays and whole-exome sequencing (Shen et al., 2010).
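
    The 2% to 6% yield quoted above can be turned into two quick planning figures: the expected number of FXS diagnoses in a tested cohort and the number of children who must be tested to find one case. The sketch below is simple arithmetic; the function names and the cohort size of 1,000 are assumptions chosen for illustration.

```python
def expected_positives(n_tested, yield_rate):
    """Expected number of positive FMR1 results among n_tested individuals."""
    return n_tested * yield_rate


def number_needed_to_test(yield_rate):
    """Average number of individuals tested per positive result."""
    return 1.0 / yield_rate


# Yield bounds taken from the text; cohort size is hypothetical.
for yield_rate in (0.02, 0.06):
    print(
        f"yield {yield_rate:.0%}:",
        f"{expected_positives(1000, yield_rate):.0f} expected positives per 1,000,",
        f"about {number_needed_to_test(yield_rate):.0f} tested per diagnosis",
    )
```

    Even at the low end of the range, a single inexpensive test resolves the etiology for a meaningful fraction of the cohort before any broader genomic workup is ordered.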

    For a family, receiving a definitive diagnosis of FXS as the cause of their child’s autism provides immediate clarity. It gives a specific prognosis, enables accurate genetic counseling regarding recurrence risk for future children, and connects them to a well-established community of families and researchers focused on a single condition (Finucane et al., 2012). This stands in contrast to the experience of most families with idiopathic ASD, whose extensive genetic testing often yields no clear answer. The standard practice of including an FXS test in the initial workup for autism is a model of how clinical genetics can use population data to create an efficient, logical, and cost-effective diagnostic pathway.

    References

    Biancalana, V., Glaeser, D., McQuaid, S., & Steinbach, P. (2015). EMQN best practice guidelines for the molecular genetic testing and reporting of fragile X syndrome and other fragile X-associated disorders. European journal of human genetics : EJHG, 23(4), 417–425. https://doi.org/10.1038/ejhg.2014.185.

    Finucane, B., Abrams, L., Cronister, A., Archibald, A. D., Bennett, R. L., & McConkie-Rosell, A. (2012). Genetic counseling and testing for FMR1 gene mutations: practice guidelines of the national society of genetic counselors. Journal of genetic counseling, 21(6), 752–760. https://doi.org/10.1007/s10897-012-9524-8.

    Fischbach, G. D., & Lord, C. (2010). The Simons Simplex Collection: A Resource for Identification of Autism Genetic Risk Factors. Neuron, 68(2), 192–195. https://www.cell.com/action/showPdf?pii=S0896-6273%2810%2900830-5.

    Hunter, J. E., Berry-Kravis, E., Hipp, H., & Todd, P. K. (1998, 06 16). FMR1 Disorders. Retrieved from National Library of Medicine: https://www.ncbi.nlm.nih.gov/books/NBK1384/

    Kaufmann, W. E., Kidd, S. A., Andrews, H. F., Budimirovic, D. B., Esler, A., Haas-Givler, B., . . . Berry-Kravis, E. (2017). Autism Spectrum Disorder in Fragile X Syndrome: Cooccurring Conditions and Current Treatment. Pediatrics, 139(Suppl. 3), S194–S206. https://doi.org/10.1542/peds.2016-1159F.

    NORD. (2022, 04 18). Fragile X Syndrome. Retrieved from National Organization of Rare Disorders: https://rarediseases.org/rare-diseases/fragile-x-syndrome/

    Pyeritz, R. E., Korf, B. R., & Grody, W. W. (2019). Principles and Practice of medical genetics and genomics, 7th ed. Oxford: Elsevier.

    Quartier, A., Poquet, H., B. G.-D., & … (2017). Intragenic FMR1 disease-causing variants: a significant mutational mechanism leading to Fragile-X syndrome. European journal of human genetics : EJHG, 25(4), 423–431. https://doi.org/10.1038/ejhg.2016.204.

    Shen, Y., Dies, K. A., Holm, I. A., Bridgemohan, C., Sobeih, M. M., Caronna, E. B., et al. (2010). Clinical Genetic Testing for Patients With Autism Spectrum Disorders. Pediatrics, 125(4), e727–e735. https://doi.org/10.1542/peds.2009-1684.

    Tick, B., Bolton, P., Happé, F., Rutter, M., & Rijsdijk, F. (2016). Heritability of autism spectrum disorders: a meta-analysis of twin studies. Journal of child psychology and psychiatry, and allied disciplines, 57(5), 585–595. https://doi.org/10.1111/jcpp.12499.

  • Genetic testing has expanded from clinical settings into the commercial sphere, offering insights into ancestry, health risks, and diagnoses. The rapid proliferation raises questions about the accuracy of these tests and their practical value (Wakefield et al., 2019). Determining whether a genetic test is valid and useful requires careful evaluation of its analytical performance, ability to predict clinical outcomes, and potential impact on an individual’s health and well-being (NIH, 2020).

    Evaluating a genetic test involves assessing its validity and utility (NIH, 2020). Analytical validity addresses the technical performance of the test in the laboratory, including its accuracy and reliability in detecting the specific genetic target (NIH, 2020). In the United States, clinical laboratories generally adhere to CLIA standards to help ensure analytical validity (Katsanis & Katsanis, 2013). Clinical validity measures the test’s ability to consistently and accurately predict the presence, absence, or risk of a specific disease or health condition (NIH, 2020). This requires robust evidence linking genetic findings to the health outcome (Katsanis & Katsanis, 2013). Beyond validity, clinical utility assesses the practical usefulness of the test – whether using the result leads to tangible improvements in patient care or health outcomes, weighing benefits against potential harms (NIH, 2020). Achieving utility can require careful test selection and interpretation, emphasizing the value of expert guidance like genetic counseling (Resta, 2020; Solomon, 2024).

    Applying these concepts helps analyze the diverse genetic tests available. Direct-to-consumer (DTC) Ancestry Testing analyzes SNPs to estimate biogeographical origins and connect users with relatives (Kirkpatrick & Rashkin, 2017). While SNP genotyping accuracy (analytical validity) is generally high, interpreting ancestry percentages depends heavily on proprietary databases, impacting interpretive validity. The tests possess very low clinical utility, serving primarily personal discovery goals. Results provide ancestry estimates and relative matching; genetic counseling can help users understand limitations and potential unexpected findings (Kirkpatrick & Rashkin, 2017). Another commercial category is DTC Health Risk and Wellness Testing, reporting on predispositions or carrier status, often based on GWAS findings (Wakefield et al., 2019). Analytical validity for tested SNPs is typically high, but clinical validity for complex disease prediction is often limited due to assessing a few variants with minor effects and the potential lack of generalizability across diverse ancestries. Clinical utility is frequently questionable, as results are not diagnostic and can cause anxiety or false reassurance without proper context (Wakefield et al., 2019). Results indicate statistical risk or carrier status, underscoring the need for consultation with healthcare professionals (NIH, 2020).

    Clinical Diagnostic Genetic Testing aims to identify the genetic cause of individual symptoms, using methods from single-gene tests to whole exome sequencing (WES) (Katsanis & Katsanis, 2013). Analytical validity in regulated labs is high, and clinical validity can also be high if a clear pathogenic variant explains the symptoms. The clinical utility may be immense, potentially ending a diagnostic odyssey and informing management (Katsanis & Katsanis, 2013). However, interpreting vast data from WES remains challenging, often yielding Variants of Uncertain Significance (VUS) that require expert assessment (Resta, 2020). Clinical Predictive Testing is offered to asymptomatic individuals with a known family history of a condition like hereditary cancer (Katsanis & Katsanis, 2013). Clinical validity is strong for well-established tests, and clinical utility comes from enabling risk-reducing interventions (NIH, 2020). Clinical Carrier Screening identifies carriers for recessive or X-linked conditions to inform reproductive planning (Katsanis & Katsanis, 2013). Analytical validity is high for targeted variants, while clinical validity depends on panel relevance to ancestry. Clinical utility involves informing reproductive decisions (Katsanis & Katsanis, 2013). Counseling helps interpret results and options (Resta, 2020).

    Genetic testing offers important benefits, including diagnosis, risk clarification, personalized treatment guidance, and reproductive information (NIH, 2020). However, it also carries risks. Tests can be inaccurate or poorly predictive, and results, especially from DTC sources, may be misinterpreted without guidance, leading to psychological harm (Resta, 2020). Testing can reveal sensitive or unexpected information (Kirkpatrick & Rashkin, 2017). Privacy concerns and potential discrimination persist, and the lack of diversity in genomic research limits test validity in many populations, potentially worsening health disparities (Wakefield et al., 2019). Interpretive challenges, like VUS, remain common (Katsanis & Katsanis, 2013). Genetic counseling is an indispensable resource for navigating these complexities, helping individuals make informed choices, understand results, manage uncertainty, and ensure testing aligns with their values (Solomon, 2024). Evaluating validity and utility, understanding limitations, and seeking expert guidance are crucial for the responsible use of genetic testing.

    References

    Katsanis, S. H., & Katsanis, N. (2013). Molecular genetic testing and the future of clinical genomics. Nature Reviews Genetics, 14(6), 415–426. https://doi.org/10.1038/nrg3493  

    Kirkpatrick, B. E., & Rashkin, M. D. (2017). Ancestry testing and the practice of genetic counseling. Journal of Genetic Counseling, 26(1), 47–55. https://doi.org/10.1007/s10897-016-0014-2  

    National Institutes of Health. (2020). How can consumers be sure a genetic test is valid and useful? MedlinePlus. Retrieved April 26, 2025, from https://medlineplus.gov/genetictesting.html

    Resta, R. (2020). Birds of a feather? Genetic counseling, genetic testing, and humanism. Journal of Genetic Counseling, 29(6), 931–938. https://doi.org/10.1101/cshperspect.a036673

    Solomon, I. (2024). Reduction of Health Care Costs and Improved Appropriateness of Incoming Test Orders- the Impact of Genetic Counselor Review in an Academic Genetic Testing Laboratory. Journal of the American College of Cardiology, 83(13S), 1102. https://doi.org/10.1007/s10897-018-0226-8

    Wakefield, E., et al. (2019). The future of commercial genetic testing: Workshop recommendations on commercial genetic testing related reporting and counseling. Journal of Genetic Counseling, 28(1), 3–12. https://doi.org/10.1097/MOP.0000000000001260

  • The advent of non-invasive prenatal testing (NIPT) has revolutionized prenatal care, offering pregnant individuals a highly accurate method for screening for common fetal chromosomal abnormalities. The results help expectant parents decide whether to pursue diagnostic testing and plan care for conditions such as trisomy 21 (Down syndrome).

    Non-invasive prenatal testing is a screening method that analyzes cell-free DNA (cfDNA) circulating in a pregnant person’s blood. The cfDNA is a mixture of maternal and fetal DNA, the latter originating from the placenta. By sequencing the DNA, NIPT can detect an aneuploidy in the fetus. The primary indication for NIPT is to screen for the most common trisomies: Trisomy 21 (Down syndrome), Trisomy 18 (Edwards syndrome), and Trisomy 13 (Patau syndrome). It can also screen for sex chromosome aneuploidies. The test is performed via a simple blood draw from the pregnant person and can be done as early as 10 weeks of gestation.

    NIPT has demonstrated outstanding performance as a screening tool, particularly for Trisomy 21. Sensitivity refers to the test’s ability to correctly identify those with the condition, while specificity refers to its ability to correctly identify those without the condition. A large-scale meta-analysis of NIPT performance found that for Trisomy 21 in singleton pregnancies, the pooled sensitivity (detection rate) was 99.7% and the pooled false-positive rate was 0.04%, corresponding to a specificity of 99.96% (Gil et al., 2017). These figures indicate that NIPT is exceptionally accurate in detecting and ruling out Down syndrome, far surpassing the accuracy of older screening methods, such as maternal serum screening.

    Despite its high accuracy, NIPT still has limitations that must be understood. The most important limitation is that it is a screening test, not a diagnostic test. This means it provides a risk assessment, not a definitive diagnosis. A positive NIPT result must be confirmed with a diagnostic test, such as amniocentesis or chorionic villus sampling (CVS), which analyzes fetal cells directly but carries a small risk of miscarriage.

    Another limitation is related to the test’s Positive Predictive Value (PPV). The PPV is the probability that a positive screening result is a true positive. This value is highly dependent on the mother’s age and the prevalence of the condition in the population. For a young, low-risk individual, the PPV for Trisomy 21 can be lower, meaning a positive result has a higher chance of being a false positive compared to the same result in a high-risk individual (ACOG, 2020). Other limitations include the possibility of a test failure due to insufficient fetal DNA in the sample and the fact that NIPT does not screen for all genetic conditions, such as single-gene disorders, microdeletions, or structural abnormalities like neural tube defects.
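
    The dependence of PPV on prevalence follows from Bayes' theorem and can be sketched in a few lines of Python. The sensitivity below matches the pooled figure quoted above; the specificity and the two prevalence values are round, hypothetical numbers chosen only for illustration and are not taken from any cited study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive screen is a true positive (Bayes' theorem)."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Assumed, illustrative inputs (hypothetical specificity and prevalences).
SENSITIVITY = 0.997
SPECIFICITY = 0.999

scenarios = [
    ("lower-prevalence group (1 in 1,000)", 0.001),
    ("higher-prevalence group (1 in 100)", 0.01),
]
for label, prevalence in scenarios:
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"{label}: PPV = {ppv:.1%}")
```

    Under these assumptions the same test yields a much lower PPV in the lower-prevalence group, which is why a positive screen needs diagnostic confirmation regardless of how accurate the assay is.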

    Given these factors, counseling for patients considering NIPT is essential. The following recommendations should be provided:

    1. Patients should understand that NIPT is a highly accurate screening tool, but it is not a definitive diagnosis. Life-altering decisions should never be made based solely on an NIPT result.
    2. Pre-test counseling is important for setting realistic expectations. Patients should be informed about what the test screens for, its limitations, and the meaning of a positive, negative, or inconclusive result.
    3. Following a positive result, patients should be offered comprehensive genetic counseling and confirmatory diagnostic testing to receive a definitive diagnosis.
    4. Patients should be counseled on the concept of Positive Predictive Value and understand that their personal risk profile affects the interpretation of a positive result.

    References

    American College of Obstetricians and Gynecologists’ Committee on Practice Bulletins—Obstetrics; Committee on Genetics; Society for Maternal-Fetal Medicine. (2020). Screening for fetal chromosomal abnormalities: ACOG Practice Bulletin, Number 226. Obstetrics and Gynecology, 136(4), e48–e69. https://doi.org/10.1097/AOG.0000000000004084

    Gil, M. M., Accurti, V., Santacruz, B., Plana, M. N., & Nicolaides, K. H. (2017). Analysis of cell-free DNA in maternal blood in screening for aneuploidies: updated meta-analysis. Ultrasound in Obstetrics & Gynecology, 50(3), 302–314. https://doi.org/10.1002/uog.17484

  • Epigenetics offers a framework for understanding how gene expression can be modified without altering the underlying DNA sequence (Gibney & Nolan, 2010). This concept is fundamental because nearly all cells within an organism contain the same genetic blueprint, yet specialized cells activate only the subset of genes necessary for their specific identity and function. While some epigenetic changes might be transient within a single cell, many such modifications are stable through mitosis, ensuring that cellular identity is maintained (Allis & Jenuwein, 2016). Even with a complete understanding of how DNA sequence variations affect gene function, comprehending epigenetics is necessary to explain the patterns of gene activity observed in organisms. Various mechanisms contribute to this layer of gene regulation.

    Several types of epigenetic mechanisms influence gene function, including DNA methylation (Gibney & Nolan, 2010), histone modification (Allis & Jenuwein, 2016), chromatin remodeling, and non-coding RNAs. Non-coding RNAs such as microRNAs (miRNAs) and long non-coding RNAs (lncRNAs) can influence gene expression without being translated into proteins themselves (Allis & Jenuwein, 2016). LncRNAs can have various roles, such as guiding chromatin-modifying complexes to specific locations.

    The persistence of such epigenetic effects across generations is known as transgenerational epigenetic inheritance, and its extent varies considerably among the mechanisms described above. For DNA methylation, patterns are generally maintained through mitosis within an individual, contributing to stable cell lineages (Allis & Jenuwein, 2016). However, during germline development and early embryogenesis in mammals, extensive demethylation and subsequent remethylation reprogram the epigenome, erasing most parental methylation patterns (Heard & Martienssen, 2014). While some loci might escape reprogramming, allowing for potential inheritance, this is not considered a widespread phenomenon for most genes (Heard & Martienssen, 2014). Histone modifications are also largely reset during gametogenesis. Although some histone marks might persist in sperm or egg cells, conclusive evidence for their stable transmission and functional impact across multiple generations in mammals is limited compared to the maintenance observed within an individual’s somatic cells (Heard & Martienssen, 2014). Non-coding RNAs can be packaged into gametes and might influence the development of the immediate offspring, but their concentration typically diminishes with subsequent cell divisions, making stable inheritance over many generations improbable for most ncRNAs (Heard & Martienssen, 2014). Chromatin remodeling complexes act dynamically, and their specific configurations are unlikely to be directly inherited through gametes; the chromatin states they establish are generally reset during germline reprogramming. Therefore, while all these mechanisms contribute significantly to gene regulation within an individual and maintain cell identity through mitosis, their stable inheritance across generations through meiosis is generally limited in mammals (Heard & Martienssen, 2014).

    Each of the above mechanisms can change gene function, often by influencing the rate of transcription and the amount of gene product produced. DNA methylation, which occurs mainly at CpG dinucleotides, is commonly associated with gene silencing when it accumulates at promoter CpG islands (Gibney & Nolan, 2010). The methyl groups can directly interfere with the binding of transcription factors or recruit methyl-binding proteins that in turn recruit repressive complexes, leading to chromatin condensation and a decrease in the gene product. The lack of methylation in these regions permits transcription, leading to an increase in gene product as long as the appropriate activators are present (Gibney & Nolan, 2010). Histone modifications, in turn, alter chromatin structure. Histone acetylation neutralizes the positive charge of lysine residues on histone tails, which weakens their interaction with negatively charged DNA. This increases accessibility for transcription factors and RNA polymerase, causing an increase in the gene product (Gibney & Nolan, 2010). Deacetylation reverses the effect, compacting chromatin and leading to a decrease in the gene product. Histone methylation can have opposing effects depending on the specific amino acid residue methylated and the degree of methylation; for instance, trimethylation of histone H3 at lysine 9 (H3K9me3) is typically a repressive mark associated with heterochromatin and a decrease in gene product, whereas trimethylation of H3 at lysine 4 (H3K4me3) is often found near active promoters and associated with an increase in gene product (Allis & Jenuwein, 2016). Among non-coding RNAs, miRNAs mainly function post-transcriptionally: a specific miRNA binds to complementary sequences in the 3′ untranslated region of a target mRNA, leading either to the degradation of the mRNA or to the inhibition of its translation into protein, both resulting in a decrease in gene product (Allis & Jenuwein, 2016). Chromatin remodeling complexes directly alter nucleosome positioning. By shifting or removing nucleosomes from promoter or enhancer regions, they can expose regulatory DNA sequences, which facilitates transcription factor binding and causes an increase in the gene product. They can also decrease gene expression by positioning nucleosomes to obscure these sites (Allis & Jenuwein, 2016).

    Variations in the expression of the TPMT gene can increase bone marrow toxicity in patients treated with thiopurine immunosuppressant drugs. Thiopurines are commonly used in treating autoimmune diseases, inflammatory bowel disease, and certain cancers. The TPMT enzyme provides an inactivation pathway that converts the drugs into inactive methylated metabolites. This pathway is impaired if the amount of TPMT gene product is decreased, which occurs in individuals with specific genetic variants that lead to lower-than-average enzyme activity. Drug metabolism is then shunted towards the production of higher levels of the active, cytotoxic thioguanine nucleotides (TGNs). The TGNs accumulate in hematopoietic progenitor cells within the bone marrow, increasing cytotoxicity. This manifests as severe bone marrow suppression, characterized by leukopenia, thrombocytopenia, and anemia, which can increase the risk of life-threatening infections and bleeding (Relling et al., 2019).

    Conversely, if the amount of TPMT gene product were greatly increased, the resulting high enzyme activity would inactivate the thiopurine drugs too quickly. This would lower intracellular concentrations of active TGNs and diminish the drug’s therapeutic effectiveness at normal doses. The safe and effective dosing of thiopurine drugs therefore depends heavily on knowing the patient’s level of TPMT function (Relling et al., 2019).

    References

    Allis, C. D., & Jenuwein, T. (2016). The molecular hallmarks of epigenetic control. Nature Reviews Genetics, 17(8), 487–500. https://doi.org/10.1038/nrg.2016.59  

    Gibney, E. R., & Nolan, C. M. (2010). Epigenetics and gene expression. Heredity, 105(1), 4–13. https://doi.org/10.1038/hdy.2010.54

    Heard, E., & Martienssen, R. A. (2014). Transgenerational epigenetic inheritance: myths and mechanisms. Cell, 157(1), 95–109. https://doi.org/10.1016/j.cell.2014.02.045

    Relling, M. V., Schwab, M., Whirl-Carrillo, M., Suarez-Kurtz, G., Pui, C. H., Stein, C. M., Moyer, A. M., Evans, W. E., Klein, T. E., Antillon-Klussmann, F. G., Caudle, K. E., Kato, M., Yeoh, A. E. J., Schmiegelow, K., & Yang, J. J. (2019). Clinical Pharmacogenetics Implementation Consortium Guideline for Thiopurine Dosing Based on TPMT and NUDT15 Genotypes. Clinical Pharmacology & Therapeutics, 105(5), 1095–1105. https://doi.org/10.1002/cpt.1304

  • The legal regulation of marriage between close relatives, known as consanguinity laws, has deep historical roots in the United States. These laws draw from a mixture of English common law, religious doctrine, and social customs. While prohibitions on incest have ancient origins, the specific state laws banning cousin marriage in the U.S. began to appear in the mid-19th century. This movement was part of a broader trend to increase state authority over marriage alongside regulations concerning age and medical fitness (Paul & Spencer, 2008). The initial motivations were often related to social and moral standards and a desire to ensure clear lines of inheritance.

    As scientific understanding evolved, particularly with the development of genetics in the late 19th and early 20th centuries, a new rationale for these laws emerged. It became understood that children of closely related parents had a higher risk of inheriting recessive genetic disorders. This is because close relatives are more likely to carry identical harmful recessive alleles, and their offspring have an increased chance of inheriting two copies, which can result in serious health conditions (Hamamy, 2012). This scientific evidence provided a strong public health justification for states to enact and maintain laws prohibiting consanguineous marriages, shifting the basis from purely moral grounds to include medical and genetic concerns.

    North Carolina’s laws on consanguinity reflect the combination of historical tradition and modern scientific reasoning. The specific regulations are outlined in North Carolina General Statute § 51-3, which declares void all marriages “between any two persons nearer of kin than first cousins.” The statute also includes a specific and less common prohibition against marriages “between double first cousins.” Thus, while North Carolina generally permits marriage between first cousins, it carves out an exception for this relationship (North Carolina General Assembly, 2025). Double first cousins arise when two siblings from one family have children with two siblings from another, so the cousins share all four grandparents. A child of double first cousins has an inbreeding coefficient of 0.125, twice the 0.0625 for a child of ordinary first cousins and equal to that for a child of half-siblings (Hamamy, 2012).
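
    The inbreeding coefficient quoted above can be checked with Wright's path-counting formula, F = Σ (1/2)^(n1 + n2 + 1), summed over each ancestor shared by the two parents, where n1 and n2 are the numbers of generations from each parent back to that ancestor. The Python sketch below assumes the shared ancestors are themselves not inbred; the function and variable names are illustrative.

```python
from fractions import Fraction


def inbreeding_coefficient(paths):
    """Wright's path-counting formula, assuming non-inbred common ancestors.

    paths: one (n1, n2) pair per shared ancestor, giving the number of
    generations from each parent back to that ancestor.
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths)


# Parents who are first cousins share two grandparents.
first_cousins = [(2, 2)] * 2
# Parents who are double first cousins share all four grandparents.
double_first_cousins = [(2, 2)] * 4
# Parents who are half-siblings share one parent.
half_siblings = [(1, 1)]

for label, paths in [
    ("first cousins", first_cousins),
    ("double first cousins", double_first_cousins),
    ("half-siblings", half_siblings),
]:
    f = inbreeding_coefficient(paths)
    print(f"Child of {label}: F = {f} = {float(f)}")
```

    Each shared grandparent contributes 1/32, so two of them give 1/16 and four give 1/8, the same value that a single shared parent gives for half-siblings.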

    North Carolina’s legal framework fits squarely within the broader history of consanguinity laws in the United States. It upholds the widespread prohibition against marriages between very close relatives while allowing first-cousin marriage, a practice permitted in about 20 states. The ban on double first cousins demonstrates a particularly nuanced approach, acknowledging a scenario with elevated genetic risk that many other state laws do not specifically address (Paul & Spencer, 2008). This provision suggests that North Carolina’s legislature has considered the degree of relationship and the specific genetic implications of certain unions.

    References

    Hamamy, H. (2012). Consanguineous marriages: Preconception consultation in primary health care settings. Journal of Community Genetics, 3(3), 185–192. https://doi.org/10.1007/s12687-011-0072-y

    North Carolina General Assembly. (2025, June 15). Want of capacity; void and voidable marriages. https://www.ncleg.net/enactedlegislation/statutes/html/bysection/chapter_51/gs_51-3.html

    Paul, D. B., & Spencer, H. G. (2008). “It’s ok, we’re not cousins by blood”: The cousin marriage controversy in historical perspective. PLoS Biology, 6(12), e320. https://doi.org/10.1371/journal.pbio.0060320

  • The study by Teng et al. (2017) provides a detailed population genomic analysis of the Brown Norway rat and its sibling species, Rattus nitidus, to reconstruct their evolutionary history. The main findings reveal that the speciation event separating the two rat species likely occurred during the drastic climatic changes of the Middle Pleistocene. Following this divergence, the researchers uncovered evidence of widespread and geographically significant gene flow, or introgression, from R. nitidus into Brown Norway rat populations. Some of these introgressed genes, particularly those related to chemical communication, appear to have been adaptive and may have contributed to the Brown Norway rat’s remarkable success as a global colonizer. The study also identified signatures of positive selection in genes related to metabolism and immune response, further explaining the rat’s adaptability.

    To arrive at these conclusions, the authors employed a comprehensive suite of computational methods appropriate for whole-genome data. After sequencing 51 Brown Norway rats, they used standard bioinformatics pipelines, including the Genome Analysis Toolkit (GATK), to identify millions of genetic variants. To investigate population relationships, they used a combination of Principal Component Analysis (PCA) and the model-based clustering program Admixture. The demographic history and divergence timing were inferred using the Pairwise Sequentially Markovian Coalescent (PSMC) model and the Bayesian coalescent-based tool G-PhoCS. To specifically test for and localize gene flow, they used Patterson’s D-statistics and the modified f-statistic (ƒd), which are designed to detect imbalances in shared genetic ancestry. In order to identify regions of the genome under positive selection, they used a cross-population composite likelihood ratio (XP-CLR) test.
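
    The logic behind Patterson’s D can be illustrated with a simplified count-based sketch in Python. It assumes one haploid sequence from each of four groups ordered (P1, P2, P3, outgroup), with each biallelic site already encoded as a pattern string in which A is the outgroup (ancestral) allele and B the derived allele. The site counts are hypothetical, and this is not the frequency-based implementation used in the paper.

```python
def patterson_d(site_patterns):
    """Count-based Patterson's D = (ABBA - BABA) / (ABBA + BABA).

    site_patterns: iterable of 4-letter strings giving the allele carried by
    (P1, P2, P3, outgroup) at each biallelic site.
    """
    abba = sum(1 for pattern in site_patterns if pattern == "ABBA")
    baba = sum(1 for pattern in site_patterns if pattern == "BABA")
    if abba + baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return (abba - baba) / (abba + baba)


# Hypothetical counts: BBAA sites match the species tree and are ignored.
balanced = ["ABBA"] * 20 + ["BABA"] * 20 + ["BBAA"] * 60
excess_p2_p3_sharing = ["ABBA"] * 30 + ["BABA"] * 10 + ["BBAA"] * 60

print("No imbalance:", patterson_d(balanced))
print("Excess P2-P3 sharing:", patterson_d(excess_p2_p3_sharing))
```

    Without gene flow, the two discordant patterns are expected to be equally frequent, so D is near zero; a clear excess of one pattern suggests gene flow between P3 and one of the sister groups. In practice, significance is assessed with a resampling procedure such as a block jackknife, which this sketch omits.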

    I would have used the same analytical framework as the authors. The chosen methods represent a logical and robust progression for addressing the study’s core questions. Using whole-genome sequencing was necessary to capture a comprehensive picture of variation, and the combination of PCA and Admixture is a standard and powerful approach for defining population structure without strong prior assumptions. The use of D-statistics and ƒd was an appropriate strategy, since these are standard methods for detecting and quantifying introgression, a central finding of the paper. The decision to integrate multiple lines of evidence from demographic modeling, introgression tests, and selection scans creates a more compelling and well-supported evolutionary narrative than any single analysis could provide.

    One of the main strengths of the study was its use of simulations to provide a null hypothesis for comparison. To determine which genomic regions showed statistically significant evidence of selection, the researchers needed to know what patterns of genetic variation would be expected by chance under the species’ specific demographic history. They used a whole-genome simulation program called ARGON to generate genomic data under a neutral model, with parameters informed by their own demographic inferences from PSMC and G-PhoCS. By running their selection scan (XP-CLR) on this simulated neutral data, they could establish a reliable significance threshold. This allowed them to confidently distinguish true selective sweeps in their real rat data from patterns that could have arisen simply due to random genetic drift within their inferred population history.
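
    The idea of deriving a significance threshold from neutral simulations can be sketched generically in Python. This is not the authors' pipeline: the scores below are random stand-ins for XP-CLR values from simulated neutral windows, and the function names and the value of alpha are hypothetical.

```python
import random


def empirical_threshold(null_scores, alpha=0.01):
    """Return a cutoff exceeded by at most a fraction alpha of neutral scores."""
    if not null_scores:
        raise ValueError("need at least one simulated score")
    if not 0 < alpha < 1:
        raise ValueError("alpha must be strictly between 0 and 1")
    ordered = sorted(null_scores)
    allowed_above = int(alpha * len(ordered))  # rounded down, so conservative
    return ordered[len(ordered) - 1 - allowed_above]


def flag_windows(observed_scores, threshold):
    """Indices of observed windows whose score exceeds the neutral cutoff."""
    return [i for i, score in enumerate(observed_scores) if score > threshold]


# Stand-in data: random scores in place of neutral simulation output.
random.seed(1)
neutral_scores = [random.expovariate(1.0) for _ in range(10_000)]
observed_scores = [0.4, 1.2, 15.0, 0.9, 11.3]

cutoff = empirical_threshold(neutral_scores, alpha=0.001)
print("Cutoff from neutral simulations:", round(cutoff, 2))
print("Candidate windows:", flag_windows(observed_scores, cutoff))
```

    Because the cutoff comes from data simulated under the inferred demography, windows that exceed it are unlikely to reflect drift alone under that model.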

    References

    Salojärvi, J. (2019). Computational tools for population genomics. In O. P. Rajora (Ed.), Population genomics: Concepts, approaches and applications (pp. 127–152). Cham, Switzerland: Springer Nature Switzerland AG.

    Teng, H., Zhang, Y., Shi, C., Mao, F., Cai, W., Lu, L., Zhao, F., Sun, Z., & Zhang, J. (2017). Population genomics reveals speciation and introgression between Brown Norway rats and their sibling species. Molecular Biology and Evolution, 34(9), 2214–2228. https://doi.org/10.1093/molbev/msx157