Closes #134: Restructure spec and address several significant PRC review issues #137

Merged
merged 17 commits into from
Jul 25, 2019
Closes #129: Write computed identifier proposal in appendix
reece committed Jul 24, 2019
commit eba3e1dd7526d5283e078a71007b24b28deaa4f7
79 changes: 54 additions & 25 deletions docs/source/appendices/ga4gh_identifiers.rst
@@ -3,28 +3,57 @@
Proposal for GA4GH-wide Computed Identifier Standard
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Independence from VR proposal

The Variation Representation team created the computed
identifier scheme for VR objects. However, this scheme is
applicable and useful to the entire GA4GH ecosystem. As a
result, we are proposing that the computed identifier scheme
described here be considered for adoption as a GA4GH-wide
standard. For this reason, we have adopted the use of the
`ga4gh` prefix above.



Advantage of adopting now: sets a standard and enables VR adopters to
use future-proof identifiers

Rationale
- coherence
- shared objects (e.g., sequence)
- shared tools

Specific proposal
- truncated digest called GA4GH Digest
- identifier GA4GH Identifier
- namespace
- type prefix administration
This appendix describes a proposal for creating a GA4GH-wide standard
for serializing data, computing digests on serialized data, and
constructing CURIE identifiers from the digests. Essentially, it is a
generalization of the :ref:`computed-identifier` section.

This standard is proposed now because the VR Specification needs a
well-defined mechanism for generating identifiers. Changing the
identifier mechanism later will create significant issues for VR
adopters.


Background
@@@@@@@@@@

The GA4GH mission entails structuring, connecting, and sharing data
reliably. A key component of this effort is to be able to *identify*
entities, that is, to associate identifiers with entities. Ideally,
there will be exactly one identifier for each entity, and one entity
for each identifier. Traditionally, identifiers are assigned to
entities, which means that disconnected groups must coordinate on
identifier assignment.

The computed identifier scheme proposed in the VR Specification
computes identifiers from the data itself. Because identifiers depend
on the data, groups that independently generate the same variation
will generate the same computed identifier for that entity, thereby
obviating centralized identifier systems and enabling identifiers to
be used in isolated settings such as clinical labs.
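
The property described in this paragraph can be illustrated with a
minimal sketch. It assumes canonical JSON (sorted keys, compact
separators) as the serialization and plain SHA-512 as the digest;
neither choice is stated in this section. The function name
``computed_digest`` and the example record are hypothetical.

```python
import hashlib
import json


def computed_digest(obj: dict) -> str:
    """Return a hex digest that depends only on the content of obj."""
    # Assumption: canonical JSON with sorted keys and compact separators,
    # so that logically equal objects serialize to identical bytes.
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha512(blob).hexdigest()


# Two groups describe the same (hypothetical) entity, with keys in a
# different order and without any coordination between them.
lab_a = {"type": "Allele", "start": 100, "end": 101, "state": "T"}
lab_b = {"state": "T", "end": 101, "type": "Allele", "start": 100}

# Same data, same digest.
assert computed_digest(lab_a) == computed_digest(lab_b)
```

Because the digest is a function of the serialized data alone, neither
group needs to consult a central registry before using it.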

The computed identifier mechanism is broadly applicable and useful to
the entire GA4GH ecosystem. As a result, we are proposing that the
computed identifier scheme described in the VR specification be
considered for adoption as a GA4GH-wide standard.

Adopting a common identifier scheme will make interoperability of
GA4GH entities more obvious to consumers, will enable the entire
organization to share common entity definitions (such as sequence
identifiers), and will enable all GA4GH products to share tooling that
manipulates identified data. In short, it provides an important
consistency within the GA4GH ecosystem.


Proposal
@@@@@@@@

The GA4GH computed identifier proposal consists of a serialization
method, a truncated digest, a syntax for GA4GH identifiers, a declared
namespace, and a system for administering entity type prefixes.

* serialization
* truncated digest
* GA4GH identifier syntax
* namespace
* type prefix administration
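
The components listed above could fit together roughly as follows. This
is a sketch under stated assumptions, not the proposal's definition: it
assumes canonical JSON for serialization, SHA-512 truncated to 24 bytes
and base64url-encoded for the digest, and a ``namespace:prefix.digest``
CURIE layout. The ``ga4gh`` namespace comes from the text; the
``TYPE_PREFIXES`` table, its entries, and all function names are
hypothetical.

```python
import base64
import hashlib
import json

NAMESPACE = "ga4gh"

# Hypothetical type prefix registry; the proposal leaves administration
# of these prefixes to be defined.
TYPE_PREFIXES = {"Allele": "VA", "Sequence": "SQ"}


def serialize(obj: dict) -> bytes:
    """Serialize obj to bytes (assumed: canonical JSON, sorted keys)."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def truncated_digest(blob: bytes, length: int = 24) -> str:
    """Assumed scheme: SHA-512, keep the first `length` bytes, base64url."""
    digest = hashlib.sha512(blob).digest()
    return base64.urlsafe_b64encode(digest[:length]).decode("ascii")


def ga4gh_identifier(obj: dict) -> str:
    """Build a CURIE of the assumed form namespace:prefix.digest."""
    prefix = TYPE_PREFIXES[obj["type"]]
    return f"{NAMESPACE}:{prefix}.{truncated_digest(serialize(obj))}"


# Hypothetical object used only for illustration.
allele = {"type": "Allele", "start": 100, "end": 101, "state": "T"}
identifier = ga4gh_identifier(allele)
```

With a 24-byte truncation the base64url digest is always 32 characters
and needs no padding, which keeps the identifier length fixed.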
2 changes: 1 addition & 1 deletion docs/source/appendices/index.rst
@@ -5,7 +5,7 @@ Appendices
:maxdepth: 2

design_decisions
ga4gh_digest
ga4gh_identifiers
vr-python/index
truncated_digest_collision_analysis
development_process
4 changes: 3 additions & 1 deletion docs/source/impl-guide/computed_identifier.rst
@@ -6,7 +6,9 @@
approved and have therefore assumed acceptance when writing
this section. In the event that the proposal is not
accepted, this section will be modified as described in
:ref:`plan-b`.
:ref:`plan-b`. Because identifiers created through this
proposal are expected to be durable, it is critical for
GA4GH to make a long-term decision regarding identifiers.


Computed Identifiers