Based on our whole genome comparisons of EC958 hairpins in assemblies and the generation of small spurious contigs

Users should also be aware that small plasmids are not necessarily assembled from PacBio reads when the seed read length cut-off exceeds the total plasmid size, as illustrated in this study with the 4.1 kb pEC958B plasmid. In this case we assembled pEC958B by utilising prior knowledge of the plasmid from the original 454 assembly; however, de novo assembly of the entire genome would be possible by iteratively reducing the seed read length cut-off within HGAP.

We previously generated a high-quality draft sequence of E. coli EC958; however, using only PacBio reads we were able to assemble a high-quality complete genome sequence. A comparison of the complete PacBio and draft 454 assemblies revealed a small number of discrepancies, the majority of which were due to homopolymeric tracts in the 454 assembly or collapsed repeats that were resolved in favour of the PacBio consensus after closer inspection. Although contig order and orientation in the original draft assembly were consistent with the PacBio assembly, only the latter was able to resolve repetitive regions of the genome such as rRNA operons, extended tracts of tRNAs, prophage loci and insertion sequences within the GI-pheV, GI-selC and GI-leuX genomic islands. The long, multi-kilobase reads produced in SMRT sequencing can be unambiguously anchored to unique sequences flanking these repeats, allowing for their accurate and uninterrupted assembly. Given the rapid improvements in PacBio technology and the HGAP assembly software, this technology may become the platform of choice for generating high-quality reference sequences for bacterial genomes.

Comparisons of the complete E. coli EC958 genome against other published ST131 genomes revealed the extensive nucleotide identity that exists between the core genomes of E. coli ST131 clade C strains EC958, NA114 and JJ1886. Although E. coli NA114 possesses many of the genes associated with genomic islands and prophages of EC958 and JJ1886, it lacks insertions at recognised E. coli integration hotspots, including the pheV tRNA gene. Furthermore, it contains a highly atypical insertion of ~160 kb at a location that is consistent with the artefactual concatenation of contigs that could not be ordered against the SE15 reference genome and were “junked” at the end of the assembly. Our recent comparative genomic analysis has shown that, with the exception of GI-selC and Phi6, the genomic islands and prophages previously defined in EC958 are prevalent in nearly all other ST131 clade C strains.
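The earlier point about seed read length cut-offs and small plasmids can be illustrated with a minimal sketch. It is not HGAP code and does not use HGAP parameter names: the function names, the read lengths and the step size are all hypothetical. It also assumes that reads derived from a 4.1 kb plasmid are no longer than the plasmid itself. Under those assumptions, it shows why no plasmid read qualifies as a seed at a high cut-off, and how lowering the cut-off stepwise eventually admits one.

```python
def seed_reads(read_lengths, cutoff):
    """Return the read lengths that meet or exceed the seed cut-off."""
    return [n for n in read_lengths if n >= cutoff]


def lower_cutoff_until_seeded(read_lengths, start_cutoff, step, min_cutoff):
    """Reduce the cut-off in fixed steps until at least one read qualifies.

    Returns the first cut-off at which a seed read exists, or None if
    no read qualifies before the cut-off drops below min_cutoff.
    """
    cutoff = start_cutoff
    while cutoff >= min_cutoff:
        if seed_reads(read_lengths, cutoff):
            return cutoff
        cutoff -= step
    return None


# Hypothetical read lengths (bp) from a 4.1 kb plasmid; illustrative only.
plasmid_reads = [3900, 4100, 2500, 3200]

# At a 6 kb cut-off no plasmid read can act as a seed.
print(seed_reads(plasmid_reads, 6000))

# Stepping down by 1 kb, the first cut-off that admits a plasmid read.
print(lower_cutoff_until_seeded(plasmid_reads, 6000, 1000, 1000))
```

In a real assembly the cut-off would be lowered for the whole read set and each assembly re-evaluated; the sketch only captures the length filter itself.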
