Human Reference RNA

The reference sequence has been a guiding principle for the development of a vast array of reagents, arrays, genotyping assays, computational tools, and clinical resources. Moreover, the reference sequence is the foundation for databases and bioinformatics algorithms that are used to define target regions for resequencing, perform genome-wide association studies, or measure inter-species conservation. Thus, the reference sequence has become essential for clinical applications, and is used to determine alleles for risk, protection, or treatment-specific response in human disease.

For the creation of the first human reference genome to be assembled from a single individual, we chose HG002, an Ashkenazi individual who is part of the Personal Genome Project (PGP). The PGP uses the Open Consent Model, the first truly open-access platform for sharing individual human genome, phenotype, and medical data. The consent process educates potential participants on the implications and risks of sharing genomic data, and on what they can expect from their participation.

This freedom led to different conventions being adopted by different teams. The Toronto and Boston teams had previously carried out two smaller studies that together mapped ~14,000 protein interactions. HuRI has now interrogated proteins encoded by nearly all human protein-coding genes and expanded the map four-fold.

Building on this draft, scientists have carried out many sequencing projects over the past 20 years to identify and catalog genetic differences between an individual and the reference genome. For the first time, current technologies are beginning to detect and characterize larger differences, called structural variants, such as insertions of several hundred letters. Ongoing improvements in sequencing technology and diminishing costs now make it possible to generate high-quality genome assemblies from diverse populations, something that could only have been imagined during the Human Genome Project. These new data are likely to form the basis for a new pangenome representation for the reference assembly that includes a graph, but they also raise many as-yet unanswered questions.

The human reference genome, largely completed in 2001, has achieved near-mythic status. “We are discovering remarkable differences in genomic organization which have been missed until now. Understanding these differences improves our ability to make genetic discoveries related to health and disease, especially in groups that have been traditionally underserved by genomics research.”

In the special case where you are interested in a locus for which you know there are ALT sequences, the alternate loci can be used. In most cases this is not manageable, as there are no mainstream variant-calling tools that are aware of the alternate loci. Unless we have a very good reason to do otherwise, we want to use only the sequences of the primary assembly and the mitochondrial genome for alignment. The National Institutes of Health is placing a multimillion-dollar bet that he’s right. NHGRI is now evaluating proposals to do that, offering up to $6 million per year to produce high-quality sequences of about 350 genomes.
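The advice above, to align only against the primary assembly and the mitochondrial genome, can be sketched as a small filter over a reference FASTA file. This is a minimal illustration rather than a recommended pipeline. It assumes UCSC-style sequence names, in which alternate loci end in `_alt` and fix patches end in `_fix`; other naming schemes would need a different rule. The function names (`read_fasta`, `is_alignment_target`, `primary_only`) are hypothetical and introduced only for this example.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Assumption: UCSC-style names, where non-primary sequences carry these suffixes.
NON_PRIMARY_SUFFIXES = ("_alt", "_fix")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # The name is the first whitespace-delimited token after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def is_alignment_target(name: str) -> bool:
    """Keep everything except alternate loci and patches (assumed suffixes)."""
    return not name.endswith(NON_PRIMARY_SUFFIXES)


def primary_only(records: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Pass through only the records that should be used for alignment."""
    for name, seq in records:
        if is_alignment_target(name):
            yield name, seq


# Tiny in-memory example; a real reference would be read from a file.
demo = [
    ">chr1", "ACGT", "ACGT",
    ">chr1_KI270762v1_alt", "TTTT",
    ">chrM", "GATC",
]
kept_names = [name for name, _ in primary_only(read_fasta(demo))]
```

Filtering by name suffix keeps the chromosomes, the mitochondrial sequence, and any unlocalized or unplaced scaffolds, while dropping the alternate loci that most variant callers cannot interpret.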


