Abstract: The advent of third-generation long-range DNA sequencing and mapping techniques has made near-perfect or very high-quality de novo genome assemblies possible. However, most overlap-graph-based de novo assemblers still require large amounts of computer memory to resolve large genome graphs. Here, we apply string graph reduction algorithms to genome assembly using Apache Spark on a distributed cloud computing platform.