Property talk:P594
Documentation
identifier for a gene as per the Ensembl (European Bioinformatics Institute and the Wellcome Trust Sanger Institute) database
- Single value: this property generally contains a single value. List of violations of this constraint: Database reports/Constraint violations/P594#Single value
- Format: value must be formatted using this pattern (PCRE syntax):
(ENS(|MUS|RNO|DAR)G\d{11})|(Y[A-P][LR]\d{3}[CW](-[A-G])?)|(WBGene\d{8})|(FBgn\d{7})|(Q\d{4})
List of violations of this constraint: Database reports/Constraint violations/P594#Format
- Type gene (Q7187): items using this property should be of type gene (Q7187). List of violations of this constraint: Database reports/Constraint violations/P594#Type Q7187
- Item found in taxon (P703): items using this property should also have found in taxon (P703). List of violations of this constraint: Database reports/Constraint violations/P594#Item P703
- Unique value: each value should be used on only one item. List of violations of this constraint: Database reports/Constraint violations/P594#Unique value
- Entity types: this property may only be used on the allowed entity types. List of violations of this constraint: Database reports/Constraint violations/P594#Entity types
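To check candidate values against the format constraint before adding them, one could run the pattern locally. This is a minimal Python sketch, assuming the constraint requires the whole value to match (hence re.fullmatch rather than re.search); the function name is_valid_p594 is hypothetical.

```python
import re

# Format pattern copied from the P594 format constraint.
P594_PATTERN = re.compile(
    r"(ENS(|MUS|RNO|DAR)G\d{11})"
    r"|(Y[A-P][LR]\d{3}[CW](-[A-G])?)"
    r"|(WBGene\d{8})"
    r"|(FBgn\d{7})"
    r"|(Q\d{4})"
)


def is_valid_p594(value: str) -> bool:
    """Return True if the entire value matches the P594 format pattern."""
    # fullmatch assumes the whole value must match, not just a substring.
    return P594_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    samples = [
        "ENSG00000003096",     # Ensembl, 11 digits after ENSG
        "ENSMUSG00000000001",  # Ensembl with MUS infix
        "YAL001C",             # yeast systematic name
        "WBGene00000001",      # WormBase gene ID
        "FBgn0000001",         # FlyBase gene ID
        "ENSG0000000309",      # one digit short, so rejected
    ]
    for s in samples:
        print(s, is_valid_p594(s))
```

Note that a pattern match only confirms the shape of an ID; it says nothing about whether the ID exists in Ensembl.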
This property is being used by:
Please notify projects that use this property before big changes (renaming, deletion, merge with another property, etc.)
Maybe need to be removed:
Not quite unique but close
In some instances, the same Ensembl ID can point to what may in reality be, and what other databases such as Entrez consider to be, distinct genes. See, for example, Q18048211 and Q18033903, which both might legitimately link to ENSG00000003096 but appear to be different entities. This can result in multiple items with the same Ensembl ID. In the vast majority of cases, however, these IDs are in fact unique. For more information, please see the thread: https://www.biostars.org/p/16505/#16604
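To see which IDs trip the unique-value constraint, one could group item/ID pairs by Ensembl ID and keep those used on more than one item. This sketch assumes the pairs have already been exported (for example from a SPARQL query) into Python tuples; the function name shared_ids, the input format and the placeholder pair are hypothetical.

```python
from collections import defaultdict


def shared_ids(claims):
    """Return {ensembl_id: sorted item IDs} for IDs used on 2+ distinct items.

    `claims` is an iterable of (item_qid, ensembl_gene_id) tuples
    (hypothetical input format).
    """
    items_by_id = defaultdict(set)
    for qid, ensembl_id in claims:
        items_by_id[ensembl_id].add(qid)
    # Only IDs that appear on two or more distinct items break uniqueness.
    return {eid: sorted(qids) for eid, qids in items_by_id.items() if len(qids) > 1}


claims = [
    ("Q18048211", "ENSG00000003096"),
    ("Q18033903", "ENSG00000003096"),
    ("Q_EXAMPLE", "ENSG_EXAMPLE"),  # hypothetical placeholder, used once
]
print(shared_ids(claims))
# {'ENSG00000003096': ['Q18033903', 'Q18048211']}
```

Whether a shared ID is an error or a legitimate one-to-many mapping still has to be judged case by case.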
Added format ENSRNOG
There are about 19000 format violations due to IDs starting with "ENSRNOG". Assuming these are correct, I am adding the RNO case to the format constraint. -- LaddΩ chat ;) 13:57, 13 June 2016 (UTC)
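The effect of adding the RNO case can be checked on sample values. The earlier pattern is not quoted on this page, so BEFORE below is a hypothetical reconstruction of the Ensembl alternative without RNO, and the sample IDs are illustrative values chosen only to exercise the patterns.

```python
import re

# Hypothetical earlier Ensembl alternative without RNO (assumption: the
# previous pattern is not recorded on this page).
BEFORE = re.compile(r"ENS(|MUS|DAR)G\d{11}")
# Ensembl alternative of the current format constraint, with RNO added.
AFTER = re.compile(r"ENS(|MUS|RNO|DAR)G\d{11}")


def newly_accepted(values):
    """Return the values rejected by BEFORE but accepted by AFTER."""
    return [v for v in values
            if BEFORE.fullmatch(v) is None and AFTER.fullmatch(v) is not None]


# Illustrative values only; the last one has 10 digits and fails both patterns.
samples = ["ENSG00000003096", "ENSRNOG00000000001", "ENSRNOG0000000001"]
print(newly_accepted(samples))  # ['ENSRNOG00000000001']
```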
How to deal with "legitimate" constraint violations
There is currently a relatively high number of constraint violations reported. Most, if not all, are due to ProteinBoxBot, which maintains genes on Wikidata.
The issue is that ProteinBoxBot, which maintains gene annotations on Wikidata, uses NCBI Gene as its key. The bot works in a straightforward way: on every run, all NCBI Gene annotations are extracted and updated in Wikidata, together with all known external mappings. The Ensembl Gene ID (P594) is part of this set of mappings. Although efforts are in place to harmonize Ensembl and NCBI, there are still cases where one Ensembl ID maps to multiple NCBI Gene IDs. I checked the reported constraint violations, and they appeared to be "legitimate". There are even cases where a single Ensembl ID maps to 14 or 19 NCBI Gene IDs. Counts (NCBI Gene IDs per Ensembl ID: number of Ensembl IDs): {2: 486, 3: 48, 4: 12, 5: 3, 6: 3, 7: 1, 8: 1, 10: 1, 14: 1, 19: 1}. This is a known issue. The question is how to proceed here. Could we remove this constraint from the property? Or do we allow the number of constraint violations to approach 1000? As more species get coverage in Wikidata, the number of constraint violations of this type will likely increase. --Andrawaag (talk) 22:20, 4 July 2016 (UTC)
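The counts above can be read as a histogram: for each k, how many Ensembl IDs are attached to k NCBI Gene IDs. A minimal Python sketch, assuming the bot's mappings are available as (NCBI Gene ID, Ensembl ID) pairs (a hypothetical input format, not the bot's actual data structure), shows how such a histogram could be computed and what the reported figures add up to.

```python
from collections import Counter


def multiplicity_histogram(mappings):
    """Map k -> number of Ensembl IDs shared by exactly k NCBI Gene IDs (k >= 2).

    `mappings` is an iterable of (ncbi_gene_id, ensembl_gene_id) pairs
    (hypothetical input format).
    """
    # Deduplicate pairs, then count distinct NCBI genes per Ensembl ID.
    genes_per_ensembl = Counter(ensembl for _, ensembl in set(mappings))
    # Keep only shared IDs (k >= 2) and tally how many IDs have each k.
    shared = Counter(k for k in genes_per_ensembl.values() if k > 1)
    return dict(sorted(shared.items()))


# Counts reported above: {NCBI Gene IDs per Ensembl ID: number of Ensembl IDs}.
reported = {2: 486, 3: 48, 4: 12, 5: 3, 6: 3, 7: 1, 8: 1, 10: 1, 14: 1, 19: 1}

# Ensembl IDs shared by two or more NCBI genes.
distinct_shared_ids = sum(reported.values())  # 557
# Gene items carrying one of those IDs, assuming one item per NCBI Gene ID.
items_involved = sum(k * n for k, n in reported.items())  # 1255
print(distinct_shared_ids, items_involved)
```

Under the one-item-per-NCBI-Gene-ID assumption, the reported figures correspond to 557 shared Ensembl IDs spread over 1255 gene items.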
- All Properties
- Properties with external-id-datatype
- Properties used on 100000+ items
- Properties with single value constraints
- Properties with format constraints
- Properties with constraints on type
- Properties with constraints on items using them
- Properties with unique value constraints
- Properties with scope constraints
- Properties with entity type constraints
- Genetics properties
- Medical properties