On December 24th, the Genome Reference Consortium (GRC) submitted a new assembly for the human genome (GRCh38) to GenBank. These data are now available in the Assembly database with accession GCA_000001405.15 and are also available on the FTP site. Please note that the GRC provides these assemblies as unannotated sequences.
Now that the GRC sequences are available in GenBank, our Reference Sequence (RefSeq) Genome Annotation Group has downloaded them and has begun processing them with our eukaryotic annotation pipeline. The resulting human chromosome sequences will keep the RefSeq accessions NC_000001-NC_000024, but their version numbers will increment because the update to GRCh38 includes a sequence change for every chromosome. Annotating the human genome generally takes about two weeks. When annotation is complete, we will incorporate these sequences into various analysis and display tools, such as human genome BLAST, the NCBI Remapping Service, and various genome viewers. Thus, at the end of this process, each chromosome will be represented by both an unannotated sequence in GenBank (the original GRC data) and an annotated sequence in the RefSeq collection.
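For readers who track these records in scripts, the accession.version convention described above (the accession stays the same, the version number goes up when the sequence changes) can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `split_accession` and `same_record_newer` are hypothetical, the regular expression is an assumption that covers the accession shapes mentioned in this post rather than every NCBI accession format, and the version numbers in the usage note are placeholders, not the actual GRCh38 versions.

```python
import re

# Assumed pattern for "ACCESSION.VERSION" strings such as "GCA_000001405.15"
# or "NC_000001.<version>"; it is not a complete NCBI accession grammar.
ACCESSION_RE = re.compile(r"^([A-Z]{2,3}_\d+)\.(\d+)$")


def split_accession(acc_ver):
    """Split 'ACCESSION.VERSION' into (accession, version as int)."""
    match = ACCESSION_RE.match(acc_ver)
    if match is None:
        raise ValueError(f"not an accession.version string: {acc_ver!r}")
    return match.group(1), int(match.group(2))


def same_record_newer(old, new):
    """Return True if `new` is a later version of the same accession as `old`."""
    old_acc, old_ver = split_accession(old)
    new_acc, new_ver = split_accession(new)
    return old_acc == new_acc and new_ver > old_ver


# The 24 unversioned chromosome accessions named in the text:
# NC_000001 through NC_000024.
CHROMOSOME_ACCESSIONS = [f"NC_{n:06d}" for n in range(1, 25)]
```

For example, `split_accession("GCA_000001405.15")` separates the assembly accession from its version, and `same_record_newer("NC_000001.1", "NC_000001.2")` checks that two strings refer to the same chromosome record at different versions (the `.1` and `.2` here are placeholder versions).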
Please check back frequently for updates on the NCBI News and our social media sites (NCBI Twitter Channel, NCBI Facebook Page, NCBI Announce RSS Feed, NCBI Announce Email ListServ) as this process unfolds.
In addition, we have a series of posts on the NCBI Insights Blog covering topics such as how NCBI processes genome annotations, a tip for remapping annotations from older assemblies to GRCh38, and some loci that have changed significantly in the new assembly.