Shortcut: WD:PP/SCI

Wikidata:Property proposal/Natural science


This page is for the proposal of new properties.

Before proposing a property

  1. Check whether the property already exists.
  2. Check whether the property has already been proposed.
  3. Check whether you can give the property a label and definition similar to an existing Wikipedia infobox parameter, or whether it can be matched to an infobox to or from which data can be transferred automatically.
  4. Select the right datatype for the property.
  5. Read Wikidata:Creating a property proposal for guidelines you should follow when proposing a new property.
  6. Start writing the documentation based on the preload form below by editing the two templates at the top of the page to add proposal details.

Creating the property

  1. Once consensus is reached, set status=ready on the template to attract the attention of a property creator.
  2. Creation can be done 1 week after the creation of the proposal, by a property creator or an administrator.
  3. See property creation policy.



Physics/astronomy


SIMBAD catalog properties (used more than 1 million times)


Gaia Data Release 2 ID

Status: Under discussion
Description: identifier for an astronomical object in Gaia Data Release 2
Data type: External identifier
Domain: astronomical objects
Allowed values: [0-9]{18}
Example 1: BS Cnc (Q2889194) → 661284024235415808
Example 2: Gliese 450 (Q5880899) → 4031586157514097024
Example 3: TYC 3645-2080-1 (Q75838267) → 1943381923013901440
Source: Gaia Data Release 2 (Q51905050)
Planned use: migrate all P528 values qualified with P972 Q51905050 to this property
Formatter URL: https://simbad.u-strasbg.fr/simbad/sim-id?Ident=Gaia%20DR2%20$1

2MASS ID

Status: Under discussion
Description: identifier for an astronomical object in the Two Micron All Sky Survey
Data type: External identifier
Domain: astronomical objects
Allowed values: J[0-9]{8}[+-][0-9]{7}
Example 1: BS Cnc (Q2889194) → J08390909+1935327
Example 2: Gliese 450 (Q5880899) → J11510737+3516188
Example 3: TYC 3645-2080-1 (Q75838267) → J23350993+4851114
Source: 2MASS (Q1454942)
Planned use: migrate all P528 values qualified with P972 Q1454942 to this property
Formatter URL: https://simbad.u-strasbg.fr/simbad/sim-id?Ident=2MASS%20$1
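
A formatter URL turns an identifier into a link by substituting it for the "$1" placeholder. The following is a minimal sketch of that substitution, not Wikidata's own implementation: the helper name expand_formatter is hypothetical, and percent-encoding the identifier is an assumption made here because a literal "+" inside a query string is commonly decoded as a space. Whether SIMBAD itself requires the encoded form has not been checked.

```python
from urllib.parse import quote

# Formatter URL copied from the 2MASS proposal; "$1" marks the identifier slot.
FORMATTER_2MASS = "https://simbad.u-strasbg.fr/simbad/sim-id?Ident=2MASS%20$1"


def expand_formatter(formatter_url: str, identifier: str) -> str:
    """Substitute a percent-encoded identifier for the "$1" placeholder.

    Hypothetical helper for illustration only.
    """
    # safe="" forces characters such as "+" to be encoded (as %2B).
    return formatter_url.replace("$1", quote(identifier, safe=""))


url = expand_formatter(FORMATTER_2MASS, "J08390909+1935327")
print(url)
```

The same helper works for the other formatter URLs on this page, since they all use a single "$1" placeholder.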

Tycho-2 Catalogue ID

Status: Under discussion
Description: identifier for an astronomical object in the Tycho-2 Catalogue
Data type: External identifier
Domain: astronomical objects
Allowed values: [0-9]{1,4}-[0-9]{1,4}-1
Example 1: BS Cnc (Q2889194) → 1395-2445-1
Example 2: Gliese 450 (Q5880899) → 2526-2357-1
Example 3: TYC 3645-2080-1 (Q75838267) → 3645-2080-1
Source: The Tycho-2 catalogue of the 2.5 million brightest stars (Q2725928)
Planned use: migrate all P528 values qualified with P972 Q2725928 to this property
Formatter URL: https://simbad.u-strasbg.fr/simbad/sim-id?Ident=TYC%20$1

Gaia Data Release 1 ID

Status: Under discussion
Description: identifier for an astronomical object in Gaia Data Release 1
Data type: External identifier
Domain: astronomical objects
Allowed values: [0-9]{18}
Example 1: BS Cnc (Q2889194) → 661284019938140032
Example 2: Gliese 450 (Q5880899) → 4031586157514097024
Example 3: TYC 3645-2080-1 (Q75838267) → 1943381923012780160
Source: Gaia Data Release 1 (Q37859523)
Planned use: migrate all P528 values qualified with P972 Q37859523 to this property
Formatter URL: https://simbad.u-strasbg.fr/simbad/sim-id?Ident=Gaia%20DR1%20$1

SDSS object ID

Status: Under discussion
Description: identifier for an astronomical object in the Sloan Digital Sky Survey
Data type: External identifier
Domain: astronomical objects
Allowed values: J[0-9]{6}\.[0-9]{2}[+-][0-9]{7}\.[0-9]
Example 1: BS Cnc (Q2889194) → J083909.03+193532.4
Example 2: Gliese 450 (Q5880899) → J115106.57+351627.2
Example 3: TYC 3645-2080-1 (Q75838267) → J233509.93+485111.4
Source: Sloan Digital Sky Survey (Q840332)
Planned use: migrate all P528 values qualified with P972 Q840332 to this property
Formatter URL: https://simbad.u-strasbg.fr/simbad/sim-id?Ident=SDSS%20$1

OGLE-III object ID

Status: Under discussion
Description: identifier for an astronomical object in the Optical Gravitational Lensing Experiment
Data type: External identifier
Domain: astronomical objects
Example 1: R99 (Q22087000) → BRIGHT-LMC-MISC-429
Example 2: R85 (Q28406638) → BRIGHT-LMC-MISC-9
Example 3: SV* HV 2827 (Q74703824) → LMC-CEP-4689
Source: The Optical Gravitational Lensing Experiment. The OGLE-III catalog of variable stars. I. Classical Cepheids in the Large Magellanic Cloud (Q67054966)
Planned use: migrate all P528 values qualified with P972 Q67054966 to this property
Formatter URL: https://simbad.u-strasbg.fr/simbad/sim-id?Ident=OGLE%20$1
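
Before the properties are created, it may be worth checking each "Allowed values" pattern against the examples listed with it. The sketch below does that for the five proposals that give a pattern, assuming a value is valid only if the whole string matches (hence re.fullmatch). The patterns and example IDs are copied from the proposals above; the function name check_examples is hypothetical.

```python
import re

# label -> (pattern, example IDs), all copied from the proposals above.
PROPOSALS = {
    "Gaia Data Release 2 ID": (
        r"[0-9]{18}",
        ["661284024235415808", "4031586157514097024", "1943381923013901440"],
    ),
    "2MASS ID": (
        r"J[0-9]{8}[+-][0-9]{7}",
        ["J08390909+1935327", "J11510737+3516188", "J23350993+4851114"],
    ),
    "Tycho-2 Catalogue ID": (
        r"[0-9]{1,4}-[0-9]{1,4}-1",
        ["1395-2445-1", "2526-2357-1", "3645-2080-1"],
    ),
    "Gaia Data Release 1 ID": (
        r"[0-9]{18}",
        ["661284019938140032", "4031586157514097024", "1943381923012780160"],
    ),
    "SDSS object ID": (
        r"J[0-9]{6}\.[0-9]{2}[+-][0-9]{7}\.[0-9]",
        ["J083909.03+193532.4", "J115106.57+351627.2", "J233509.93+485111.4"],
    ),
}


def check_examples(proposals):
    """Return {label: [examples that do not fully match the label's pattern]}."""
    failures = {}
    for label, (pattern, examples) in proposals.items():
        bad = [ex for ex in examples if re.fullmatch(pattern, ex) is None]
        if bad:
            failures[label] = bad
    return failures


failures = check_examples(PROPOSALS)
for label, bad in failures.items():
    print(label, bad)
```

Under the full-match assumption, the check flags two of the three examples for each Gaia property (19-digit values against an 18-digit pattern) and all three SDSS examples (six digits before the final decimal point where the pattern asks for seven). The 2MASS and Tycho-2 examples pass. This suggests the Gaia and SDSS patterns, or their examples, need another look.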

Motivation


The specific combination of catalog code (P528) qualified by catalog (P972) is used in 24 million statements, the vast majority of which are for astronomical objects. About 14 million of these statements come from six catalogues, so migrating those statements to the properties proposed here would remove the 14 million triples taken up by the P972 qualifiers. (Another 18 catalogues each have more statements than the most frequent inventory number (P217) source, collection (P195) The Palace Museum (Q2047427), which had 127,545 statements as of 6 August 2024.)

(This migration would be similar to the one that took place after the properties proposed at Wikidata:Property proposal/proper motion components were created. While this page is intended to handle only the six largest catalogues, if you believe there are other large catalogues whose catalog codes would do well to be migrated to properties, please say so in a comment.) Mahir256 (talk) 21:56, 6 August 2024 (UTC)
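
To see how many statements follow the P528-plus-P972 pattern for a given catalogue, a count query can be built per catalogue item. The sketch below only builds the SPARQL text and does not send it anywhere. The function name count_query is hypothetical, the query assumes the standard p:, pq: and wd: prefixes of the Wikidata Query Service, and counts this large may well time out on the public endpoint.

```python
# Catalogue items named as "Source" in the six proposals above.
CATALOGUES = ["Q51905050", "Q1454942", "Q2725928", "Q37859523", "Q840332", "Q67054966"]


def count_query(catalog_qid: str) -> str:
    """Build a SPARQL query counting P528 statements qualified with P972 = catalog."""
    if not (catalog_qid.startswith("Q") and catalog_qid[1:].isdigit()):
        raise ValueError(f"not a QID: {catalog_qid!r}")
    return (
        "SELECT (COUNT(?statement) AS ?count) WHERE {\n"
        "  ?item p:P528 ?statement .\n"
        f"  ?statement pq:P972 wd:{catalog_qid} .\n"
        "}"
    )


queries = {qid: count_query(qid) for qid in CATALOGUES}
print(queries["Q51905050"])
```

Each matched statement carries one pq:P972 qualifier triple, which is the triple a dedicated property would make unnecessary.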

Discussion

@Mahir256 Is there any specific reason why we want to reduce the number of P528 statements? Ghuron (talk) 00:03, 7 August 2024 (UTC)
@Ghuron: We have dedicated external identifier properties rather than lumping them all in a single property and qualifying them, just as we have dedicated website account properties rather than always using website account on (P553) qualified with website username or ID (P554). This proposal is intended as a logical parallel of both of those decisions. Mahir256 (talk) 17:18, 12 August 2024 (UTC)[reply]
@Mahir256: Let me rephrase how I understood your rationale: if p:P528/pq:P972 wd:Q51905050 occurs more than a million times, that is both a necessary and a sufficient condition for creating a new property, since doing so reduces the number of triples and thus the risk of Blazegraph crashing. Is that a correct summary? Ghuron (talk) 22:44, 12 August 2024 (UTC)
@Ghuron: I would not phrase it quite so absolutely, but I do want to see the number of triples reduced and believe this is a way to do it; an extremely high number of identically structured uses of a generic identification property like catalog code (P528) with the same qualifiers suggests that a more specialized identifier property is worth introducing to streamline things, just as has been done multiple times before. Mahir256 (talk) 16:50, 13 August 2024 (UTC)[reply]
As stated by Ghuron, is there any reason why we need to reduce the number of P528 statements? In the first place there are millions of Gaia IDs because of the import of the Simbad database (I am NOT against this import btw).
Also, I wonder why only some catalogues would have their own properties. This will create a weird in-between state, with some catalogues in P528 and others having their own properties. This makes no sense imo.
Romuald 2 (talk) 15:31, 8 August 2024 (UTC)[reply]
  • There is nothing wrong with having separate external-ID properties for the most-used identifiers, each with the correct formatter URL.
    But I have two major objections:
  1. I don't see any reason to use https://simbad.u-strasbg.fr/simbad/sim-id?Ident= as the URL. For items that are in SIMBAD, we already have Property:P3083 with the link to SIMBAD. For the rare items that are not in SIMBAD, this link will result in a 404.
  2. With (1) in mind, it would make sense to link to genuinely useful external sources that are only partially synchronized with SIMBAD (like HyperLEDA or the Gaia Archive). That leads to questions about the proposed set of properties:
    1. Why did we choose Gaia DR2, whose IDs are only temporary, when the permanent ones are Gaia DR3?
    2. Why did we choose Tycho-2, which is pretty much 100% imported into SIMBAD?
Ghuron (talk) 12:52, 9 August 2024 (UTC)[reply]
  • @Romuald 2: Reducing the number of RDF triples that Wikidata consists of is generally a good thing: there is a lot of ongoing discussion about the health of the Query Service and about reducing the number of triples that a single running Blazegraph instance holds. Also, I had noted that there were 18 other catalogs with more entries than the most frequent inventory number source; I left them off this page only because it would have become too long. If these six go through, then I will promptly propose properties for those 18 (and as I stated in the motivation above, if you believe there are other large catalogues whose catalog codes would do well to be migrated to properties, please say so in a comment). Mahir256 (talk) 17:18, 12 August 2024 (UTC)
    @Ghuron: The reason I selected the SIMBAD formatter URL is that the external IDs I tried with that URL all seemed to resolve to the right objects; if there are in fact objects for which this resolution doesn't work, it would be great if you could name some. The caveat "(used more than 1 million times)" in the title of this property proposal page is important; because your imports did not yield more than 1 million Gaia DR3 identifiers, I did not think to propose a property for it here, though I'd gladly support one for Gaia DR3 if you think it would be useful. I don't know who "we" is as regards either Gaia DR2 or Tycho-2; you're the one who mass-imported the objects, so I'm working with the catalog codes I see on those objects. Mahir256 (talk) 17:18, 12 August 2024 (UTC)[reply]
    @Ghuron and @GZWDer, would you like to give your opinions? Regards, ZI Jony (Talk) 18:37, 16 September 2024 (UTC)[reply]
    I view external identifiers somewhat differently than @Mahir256. In my understanding, a new external identifier is needed when it provides a link to a new, previously unrelated external data source. In the proposed cases, we are getting a connection to the same SIMBAD that we are already connected to via Property:P3083. Personally, the proposed identifiers will not bring me any value (nor will they cause any harm).
    I understand the idea that this will reduce the number of triples, but I think that the measly few million that we are discussing here are a drop in the ocean. Our goal is to upload data to Wikidata, not to optimize it in a way that makes life easier for the foundation's engineers. Let them do their job and we will do ours. Ghuron (talk) 19:00, 16 September 2024 (UTC)

Biology

Please visit Wikidata:WikiProject Taxonomy for more information. To notify participants use {{Ping project|Taxonomy}}
Please visit Wikidata:WikiProject Biology for more information. To notify participants use {{Ping project|Biology}}

mode of reproduction

Status: Ready
Description: ways for living organisms to propagate or produce their offspring
Data type: Item
Domain: taxon (Q16521) or organisms known by a particular common name (Q55983715)
Allowed values: item
Example 1: mammal (Q7377) → sexual reproduction (Q182353)
Example 2: bacteria (Q10876) → cell division (Q188909)
Example 3: plant (Q756) → asexual reproduction (Q173432)
Example 4: plant (Q756) → sexual reproduction (Q182353)
Planned use: Would like to enable specifying mode(s) of reproduction for any organism or taxon via this property, preferably with references.
Expected completeness: always incomplete (Q21873886)

Motivation


Currently, for the hundreds of thousands of Wikidata records related to taxa or organisms, there is no easy way to specify the mode of reproduction. This proposed property is intended to fill a gap. --Zhenqinli (talk) 04:37, 30 August 2024 (UTC)[reply]

Discussion

Tobias1984 (talk) Andy Mabbett (Pigsonthewing); Talk to Andy; * *Andy's edits TypingAway (talk) Daniel Mietchen (talk) Tinm (talk) Tubezlob Vincnet41 Netha Hussain Fractaler Tris T7 TT me Photocyte GoEThe (talk) Egon Willighagen

Notified participants of WikiProject Biology. –Samoasambia 09:33, 30 August 2024 (UTC)[reply]

Agreed that there is no need to specify this property for every species. For some, specification at the highest taxon level would suffice. However, there is a great deal of diversity and variability in the biological world. Even just for vertebrates, the mode of reproduction could be oviparity (Q212306), viviparity (Q120446), or ovoviviparity (Q192805). In short, this property would provide an option for clarification when a more explicit explanation is needed. --Zhenqinli (talk) 13:28, 30 August 2024 (UTC)
Thanks for the feedback. Indeed, having has characteristic (P1552) with any subclass of mode of biological reproduction (Q130077803) is better than having no information regarding an organism's mode(s) of reproduction in Wikidata. Currently there are almost 300 taxon-related properties. Many of them could have been implemented in similar ways as suggested. In my personal opinion, though, having a roundabout way to state a key feature of an organism is not ideal. --Zhenqinli (talk) 21:46, 1 September 2024 (UTC)
P.S. The description of has characteristic (P1552) does mention: "Use a more specific property when possible". This property is currently used in more than 200,000 statements, without constraints on subject (organism or taxon) or value (mode of reproduction) as this proposal would prefer. These facts will likely discourage systematic input of useful data and eventual WDQS query of mode of reproduction information using this property in Wikidata. --Zhenqinli (talk) 02:25, 2 September 2024 (UTC)[reply]
  •  Support; Zhenqinli makes a strong case against using has characteristic (P1552). However, the proposal should be revised to reflect Andy's note – it's standard practice to apply statements only at the highest class (or taxon) at which they are universally true (and sometimes even higher, with qualification like nature of statement (P5102)=often (Q28962312)), a principle that Example 1 (at least) violates. [Edit: fixed 18:17, 12 September 2024 (UTC)] It doesn't seem like this property carries any special encouragement to violate that principle, but if it does, that could be addressed in a property usage note. Swpb (talk) 17:56, 9 September 2024 (UTC)[reply]
Agree that in the first example, Homo sapiens (Q15978631) should probably be replaced by mammal (Q7377). As parent taxon (P171) is a subproperty of subclass of (P279), statements describing organisms at higher taxon ranks do not need to be re-stated at lower ranks of the class, so there will be no redundancy issue. --Zhenqinli (talk) 18:49, 9 September 2024 (UTC)[reply]
  • I hope anyone who still has reservations about this proposal can help clarify whether there are remaining open issues or alternatives to be discussed further. While diel cycle (P9566) does have more than 284,000 statements for animals, I believe this proposed property for all living organisms should require far fewer statements, since mode of reproduction is typically better defined biologically and more commonly stated at higher taxon ranks than diel cycle (which can also be modified by domestication). --Zhenqinli (talk) 18:09, 12 September 2024 (UTC)
  •  Weak support Infoboxes on Wikipedia might want to include the mode of reproduction, so it is good to have it in its own property, separate from has characteristic (P1552).
Currently, the problem is that the examples of the property are bad. It is not true that all plants have both sexual and asexual reproduction, so it would be bad to make that statement for plants. ChristianKl 12:35, 1 October 2024 (UTC)
Such a statement for plants could be qualified by nature of statement (P5102)=often (Q28962312), but I agree that an unqualified always-true statement would make a better example. Anything wrong with examples 1 and 2? Swpb (talk) 14:01, 1 October 2024 (UTC)[reply]
Thanks for supporting the proposal. I, too, would like to see better examples. But I also think more examples could be introduced, improved or updated later. I believe the mode of reproduction is well-documented scientifically and systematically. Once introduced to Wikidata, this property can have comparable or better data quality and utilization compared with similar taxon-related properties such as is pollinated by (P1703), seed dispersal (P3741), longest observed lifespan (P4214), and diel cycle (P9566). --Zhenqinli (talk) 07:00, 18 October 2024 (UTC)[reply]
This is a relatively complex field. Human (and mouse) parthenogenesis has been achieved, on an embryonic level. Gynogenesis is present in vertebrates, as is hybridogenesis. I imagine the viral reproduction we are familiar with is called lysogenesis, but I also imagine that there's more to viruses than they are letting on, and certainly there can be gene mixing (indeed there can be inter-species and even inter-kingdom gene mixing). So I suppose we would want a list with custom values allowed. Would we also allow the use of this property on things that reproduce but are not normally considered living? All the best: Rich Farmbrough 13:28, 19 November 2024 (UTC).
Thanks for the informative comments. Indeed, this is an important and broad concept that is currently missing among existing Wikidata properties. Personally, I hope to see a new simple property to serve as a common denominator applicable to all taxa and organisms. The complexity of reproduction in the biological world could still be captured with combinations of value items and qualifiers, on an as-needed basis. For example, the fact that sheep can be reproduced via cloning can be expressed in the following statement: sheep (Q7368) → cloning (Q120877), with qualifiers observed in (P6531)=cloned mammal (Q57813806) and model item (P5869)=Dolly the Sheep (Q171433). --Zhenqinli (talk) 00:20, 20 November 2024 (UTC)
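
The point raised earlier in this discussion, that a statement made at a higher taxon need not be restated at lower ranks, can be sketched as a lookup that walks up the parent chain. This is a toy illustration: the dictionaries stand in for parent taxon (P171) and the proposed property, labels replace QIDs, the chain skips intermediate ranks, and the function name inherited_modes is hypothetical.

```python
# Toy data for illustration only; labels stand in for Wikidata items.
PARENT_TAXON = {  # child -> parent, standing in for parent taxon (P171)
    "Homo sapiens": "Homo",
    "Homo": "Hominidae",
    "Hominidae": "mammal",  # intermediate ranks omitted in this toy chain
}
MODE_OF_REPRODUCTION = {  # stated only at the level where it holds universally
    "mammal": {"sexual reproduction"},
}


def inherited_modes(taxon):
    """Walk up the parent chain and return the first modes found, else an empty set."""
    seen = set()
    while taxon is not None and taxon not in seen:  # guard against cycles
        seen.add(taxon)
        if taxon in MODE_OF_REPRODUCTION:
            return MODE_OF_REPRODUCTION[taxon]
        taxon = PARENT_TAXON.get(taxon)
    return set()


print(inherited_modes("Homo sapiens"))
```

Returning the nearest statement lets a lower taxon override its ancestors when its own mode differs, which is one possible design and not something the proposal prescribes.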

Duocet Wiki of Plants ID

Status: Under discussion
Description: ID of a topic in Duocet Wiki of Plants
Data type: External identifier
Example 1: Orchidaceae (Q25308) => 兰科
Example 2: Fendlera (Q144481) => 岩爪梅属
Example 3: Lun Kai Dai (Q18984067) => 戴伦凯
Example 4: Asteraceae (L1365348) => Asteraceae (note: this website has dedicated entries for taxonomic names)
Number of IDs in source: 18757
Formatter URL: https://duocet.ibiodiversity.net/index.php?title=$1

Motivation


An online encyclopedia of plants.--GZWDer (talk) 18:06, 28 September 2024 (UTC)[reply]

Discussion

M.A.Miron Vincnet41 Tubezlob Prime Lemur Tris T7 TT me Infomuse TED Wkee4ager Haymillefolium Bluerasberry (talk)

Notified participants of WikiProject Botany Regards, ZI Jony (Talk) 07:25, 18 October 2024 (UTC)[reply]

homonymous taxon

Status: Ready
Description: taxon item of which the taxon name is an exact homonym
Represents: homonym (Q902085)
Data type: Item
Example 1: Nectria (Q18616886) ←homonym of→ Nectria (Q2708290)
Example 2: Lactarius (Q1900906) ←homonym of→ Lactarius (Q748899)
Example 3: Leptosomus (Q2623737) ←homonym of→ Leptosomus (Q67015908)
Type constraint (instance of): taxon (Q16521)
Wikidata project: WikiProject Taxonomy (Q8503033)

Property constraints


I'm not sure about the following. Until now I have used it to make clear that two homonyms are different and should not be confused. With the new property there will be a symmetric constraint, so I don't know whether it is still necessary. I would add that constraint myself, but if someone has an argument against it, fine:

Christian Ferrer (talk) 15:55, 3 December 2024 (UTC)[reply]

Motivation


This proposal comes after this discussion [1] (see also this query). The first planned use is to harmonize and regulate the current ways of modeling taxonomic homonymy in Wikidata. In addition, I will add it every time I come across homonyms in the course of my contributions, which happens regularly. Note that this new property should have a symmetric constraint, and maybe an item-requires-statement constraint with different from (P1889). Christian Ferrer (talk) 21:03, 2 December 2024 (UTC)
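
A symmetric constraint means that whenever item A links to item B with this property, B should link back to A. A minimal sketch of such a check follows; the QIDs are taken from the examples above, but which statements exist is made up for illustration, and the function name missing_inverses is hypothetical.

```python
def missing_inverses(statements):
    """Return the (subject, value) pairs whose inverse (value, subject) is absent."""
    present = set(statements)
    return sorted((s, v) for (s, v) in present if (v, s) not in present)


# Hypothetical statement set: the first pair is symmetric, the second is not.
statements = [
    ("Q18616886", "Q2708290"),  # Nectria -> Nectria
    ("Q2708290", "Q18616886"),  # inverse present
    ("Q1900906", "Q748899"),    # Lactarius -> Lactarius, inverse missing
]

violations = missing_inverses(statements)
print(violations)
```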

Discussion


taxon known by this common name

Status: Under discussion
Description: taxon item to which this common name refers
Represents: organisms known by a particular common name (Q55983715)
Data type: Item
Example 1: bird of prey (Q48428) → Accipitriformes (Q21736)
Example 2: bat (Q115690288) → Chiroptera (Q28425)
Example 3: mouse (Q2751034) → Muridae (Q25916); Gliridae (Q108235)
Planned use: replacement of of (P642) stored values in the results of this query
Wikidata project: WikiProject Taxonomy (Q8503033)

    Property constraints


    Motivation


    Proposal following this discussion. The new property will make it possible to store the values currently stored with of (P642) on all instances of organisms known by a particular common name (Q55983715). Christian Ferrer (talk) 13:33, 4 December 2024 (UTC)

    Discussion


    Biochemistry/molecular biology

    Please visit Wikidata:WikiProject Molecular biology for more information. To notify participants use {{Ping project|Molecular biology}}

    Chemistry

    Please visit Wikidata:WikiProject Chemistry for more information. To notify participants use {{Ping project|Chemistry}}

    molecular formula

    Saehrimnir
    Leyo
    Snipre
    Dcirovic
    Walkerma
    Egon Willighagen
    Denise Slenter
    Daniel Mietchen
    Kopiersperre
    Emily Temple-Wood
    Pablo Busatto (Almondega)
    Antony Williams (EPA)
    TomT0m
    Wostr
    Devon Fyson
    User:DePiep
    User:DavRosen
    Benjaminabel
    99of9
    Kubaello
    Fractaler
    Sebotic
    Netha
    Hugo
    Samuel Clark
    Tris T7
    Leiem
    Christianhauck
    SCIdude
    Binter
    Photocyte
    Robert Giessmann
    Cord Wiljes
    Adriano Rutz
    Jonathan Bisson
    GrndStt
    Ameisenigel
    Charles Tapley Hoyt
    ChemHobby
    Peter Murray-Rust
    Erfurth
    TiagoLubiana
    NadirSH
    Matthias M.
    S8321414
    Peter F. Patel-Schneider

    Notified participants of WikiProject Chemistry

    Motivation


    This proposal addresses the need for improved data structure and maintenance within Wikidata’s chemical compound data. Currently, the Wikidata:WikiProject Chemistry manages approximately 1 million chemical items, with many of them linked to chemical formula (P274) and mass (P2067). The main issues are:

    Redundancy in Data: With about 300,000 unique chemical formula strings in use, redundancy is a significant problem. Some strings are associated with over 1,000 items, which complicates data management (see https://w.wiki/B2ax).

    Efficiency and Maintenance: Transitioning from string-based formulas to item-based ones will simplify maintenance, reduce redundancy, and optimize query performance, especially for SPARQL queries involving formulas or masses.

    Data Optimization: Moving mass (P2067) statements to the newly created formula items will reduce the number of triples and make data management more efficient. Additionally, this change will facilitate the use of different units for masses and allow for better structured data.

    Improved Modeling: Switching to item-based formulas could eliminate the need for overly complex has part(s) (P527) statements on chemicals, allowing cleaner, more precise data models (e.g., identifying all chemical formulas containing more than five oxygen atoms).
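
The oxygen-count example above can be illustrated with a small string-based stand-in. This is only a sketch of the idea: with formula items, the same question would presumably be asked through statements on those items instead of by parsing strings. The parser assumes flat formulas with ASCII digits (no brackets, charges, hydrate dots, or Unicode subscripts), and the name parse_formula is hypothetical.

```python
import re
from collections import Counter

TOKEN = re.compile(r"([A-Z][a-z]?)([0-9]*)")    # element symbol + optional count
FLAT = re.compile(r"(?:[A-Z][a-z]?[0-9]*)+")    # a whole flat formula


def parse_formula(formula: str) -> Counter:
    """Count atoms in a flat formula such as "C15H20O4"."""
    if not FLAT.fullmatch(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = Counter()
    for element, digits in TOKEN.findall(formula):
        counts[element] += int(digits) if digits else 1
    return counts


# Formula strings that appear elsewhere on this page, written with ASCII digits.
formulas = ["C15H20O4", "C30H40F2N8O9", "CH4N2O", "C4H62O23"]
more_than_five_oxygens = [f for f in formulas if parse_formula(f)["O"] > 5]
print(more_than_five_oxygens)
```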

    This change is expected to bring numerous benefits, including reduced redundancy, improved query efficiency, and better data maintenance. The potential downside of increased label editing can be managed, and the overall gain for Wikidata’s chemical data justifies this proposal. If approved, I am prepared to create the necessary items and migrate existing data.

    Any further input to refine this proposal is more than welcome!

    P.S.: I have no strong opinion on whether the current chemical formula (P274) should be deleted or used on the new items as "Chemical Formula String". (The preceding unsigned comment was added by AdrianoRutz (talk • contribs) at 15:00, August 28, 2024 (UTC).)

    Discussion

    •  Support sounds great! Egon Willighagen (talk) 15:25, 28 August 2024 (UTC)[reply]
       Comment Last night on the boat between Finland and Sweden I thought of another aspect where this would help model the chemistry in Wikidata better. If chemical formulas are items (and thanks to GZWDer for showing that various Wikipedias decided this was useful too), then they can also subclass each other. We can have an isotope-agnostic chemical formula (the common case) and subclasses for chemical formulas with isotopes. As such it does much more than being something technical (e.g. just about scalability): it actually improves how we talk about the chemistry. Egon Willighagen (talk) 07:07, 29 August 2024 (UTC)
    • Some comments:
    1. I will oppose "Additionally, this change will facilitate the use of different units for masses and allow for better structured data." - For consistency and machine-readability we should stick to one unit. I instead propose Wikidata:Property proposal/formula weight.
    2. Many wikis have pages like C15H20O4 (Q1250089). Some wikis treat them as disambiguation pages; some as set indices; we need to discuss how to handle such existing items. GZWDer (talk) 21:10, 28 August 2024 (UTC)
    • I looked at the English Wikipedia sitelinked page, and that actually looks exactly like a page about a chemical formula. To be honest, this sounds like an argument in favor of this proposal, and for C15H20O4 (Q1250089) being of type chemical formula (Q83147). The same goes for the French WP page. Neither says it is a disambiguation page; both are far more like a category of things with the same property. Just like this proposal, no? Egon Willighagen (talk) 06:58, 29 August 2024 (UTC)
    I was only partially able to follow your reasoning here. In your proposal, you mention this property if created; does that mean you would support it? I believe the discussion about mass (P2067) (and units) or other properties is an interesting one that this proposal would allow us to better discuss and implement. What I mentioned about these, or what is currently on the example item, are just ideas; if this new property allows these things to improve as well, even better! AdrianoRutz (talk) 08:51, 30 August 2024 (UTC)
    •  Weak oppose I cannot question arguments raised here about efficiency, but I don't see this as a proper way forward. This proposal completely fails to take into account the fact that for a given chemical entity there may be many – equally correct – chemical formulae (simple example in Q27260276#P274). Moving chemical formulae to another item will not help at all with the most important purpose for which WD exists – using this data. I would see the new property as being created only to assist with specific activities – but not to replace existing properties – and with appropriate disclaimers in the name and constraints that it is a strictly technical property only. Wostr (talk) 22:21, 28 August 2024 (UTC)[reply]
      I think this proposal has no problems with alternative formula notations, e.g. like CHAgO₃ (Q130044611). Or? Egon Willighagen (talk) 06:51, 29 August 2024 (UTC)[reply]
      CHAgO₃ and AgHCO₃ are not the same chemical formula. The same goes for e.g. XeF4O and XeOF4, which would require two different items for the same compound. In fact, for some compounds several new items would need to be created. For some chemical species we would have formulae with different numbers of atoms of each element: C30H40F2N8O9, C15H17FN4O3·1.5H2O and C30H34F2N8O6·3H2O are correct formulae for the same compound, but I don't see a way for this to be reflected correctly by the current proposal. Everything looks fine if you consider only simple organic compounds and their formulae in Hill notation, but it's not that simple, especially if we consider some inorganic compounds which are not molecules. Wostr (talk) 12:34, 29 August 2024 (UTC)
      Thank you for this important point! I removed the single value constraint, thus allowing for what you mention. AdrianoRutz (talk) 08:47, 30 August 2024 (UTC)[reply]
      Good point about non-molecular substances. I think the chemical concept we are trying to capture is that of isomerism: chemical entities are isomers when they have the same molecular formula (Q188009) or (non-structural) formula unit (Q1437643), enabling one molecule/ion/unit of the first chemical entity to be rearranged into one molecule/ion/unit of the second chemical entity by moving atoms/bonds around.
      • For example, the ionic compounds with structural formulas [CrCl(H₂O)₅]Cl₂•H₂O and [Cr(H₂O)₆]Cl₃ are (hydration) isomers, which we can recognise by assigning them the same formula H₁₂Cl₃CrO₆. This shows that all species in the crystal lattice of a compound should be combined together into a single entity when determining the formula. In the example you give above, the correct formula would be C₃₀H₄₀F₂N₈O₉, derived from combining together 2C₁₅H₁₇FN₄O₃•3H₂O, the smallest formula unit with integer multiples of all species.
      • Likewise, the molecular substance CO(NH₂)₂ and ionic compound NH₄OCN are considered isomers, which we can recognise by assigning them the same formula CH₄N₂O. This is the molecular formula of urea and the formula unit of ammonium cyanate, showing how molecular and non-molecular substances can be isomeric.
      • For ions, fulminate(1−) (Q27110286) (with structural formula CNO-) and cyanate anion (Q55503523) (with structural formula OCN-) are isomers, which we can recognise by assigning them the same formula CNO-.
      • Clathrates are similar to coordination compounds. E.g. methane clathrate (Q389036) has structural formula 4CH₄•23H₂O, yielding the formula C₄H₆₂O₂₃. Likewise, the endohedral fullerene CH₄@C₆₀ should have formula C₆₁H₄.
      • Compounds should not usually map to multiple formulas: if C links to two different formulas, one the same as A (from reference 1) and one the same as B (from reference 2), this implies C is isomeric with A, and C is isomeric with B, but A is not isomeric with B. This only makes sense if 1 and 2 disagree as to what the correct formula of C ought to be.
      • When references disagree, we may need to support multiple formulas. Historically, w:en:copper monosulfide was thought to have structure [Cu2+][S2-], corresponding to the formula CuS. It has now been assigned the structure [Cu+]₃[S2-][S₂-], which would correspond to Cu₃S₃. However, PubChem still has the old formula. We might want to update Wikidata to the new formula while also keeping the PubChem-referenced formula (with a note that it's not the correct formula).
      • Non-stoichiometric compounds, alloys, and mixtures of indeterminate composition are more complicated to support. E.g. pyrrhotite (Q421944) has formula Fe1-xS (x = 0 to 0.125). Rather than trying to support formula units with atom counts that are algebraic expressions (e.g. 1 - x), I think it would be easier if we could list the formulas of the endpoints: Fe₇S₈ and FeS. Similarly, superconducting yttrium barium copper oxide (Q414015) has formula YBa2Cu3O7−x (x = 0 to 0.65), with endpoint formulas YBa2Cu3O6.35 (i.e. Y20Ba40Cu60O127) and YBa2Cu3O7. I think it's hard to come up with a perfect solution though. InChI (P234) has similar issues for non-stoichiometric compounds: https://doi.org/10.1186/s13321-015-0068-4#Sec45.
      Preimage (talk) 17:47, 31 August 2024 (UTC)[reply]
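      The atom-count arithmetic in the bullets above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the helper names `hill`, `combine` and `to_integers` are hypothetical, element counts are passed in by hand rather than parsed, and the output uses plain ASCII digits instead of Unicode subscripts.

```python
from collections import Counter
from fractions import Fraction
from math import lcm  # Python 3.9+


def hill(counts):
    """Render element counts in Hill order: C first, then H, then the
    remaining elements alphabetically (all alphabetical if no carbon)."""
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] != 1 else "") for e in order)


def combine(parts):
    """Sum element counts over (multiplier, counts) pairs,
    e.g. the 4 CH4 and 23 H2O units of methane clathrate."""
    total = Counter()
    for mult, counts in parts:
        for el, n in counts.items():
            total[el] += mult * n
    return total


def to_integers(counts):
    """Scale fractional atom counts (given as decimal strings) to the
    smallest whole-number formula, as for non-stoichiometric endpoints."""
    fracs = {el: Fraction(str(n)) for el, n in counts.items()}
    scale = lcm(*(f.denominator for f in fracs.values()))
    return {el: int(f * scale) for el, f in fracs.items()}


CH4 = {"C": 1, "H": 4}
H2O = {"H": 2, "O": 1}

print(hill(combine([(4, CH4), (23, H2O)])))        # C4H62O23
print(hill(combine([(1, CH4), (1, {"C": 60})])))   # C61H4
print(hill(to_integers({"Fe": "0.875", "S": "1"})))  # Fe7S8
print(to_integers({"Y": "1", "Ba": "2", "Cu": "3", "O": "6.35"}))
# {'Y': 20, 'Ba': 40, 'Cu': 60, 'O': 127}
```

      Using exact fractions rather than floats keeps the endpoint scaling (e.g. 6.35 × 20 = 127) free of rounding error.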
    •  Support I also see more benefits than downsides. Wostr, I am not sure I understand how this would be a problem, even for entities that could be described using different MF atom sequences, like Q27260276#P274. Indeed, the has part(s) (P527) and quantity (P1114) statements of the MF entity (see C₁₅H₂₀O₄ (Q129998552)) would allow efficient retrieval of such compounds, whatever MF notation system they are represented in. What exactly would the drawback be in this particular case? GrndStt (talk) 06:22, 29 August 2024 (UTC)[reply]
    •  Support, conditional on change of representation to molecular formula (Q188009). As noted in w:en:chemical formula#Types, chemical formula (Q83147) has four separate meanings: empirical formula (e.g. formaldehyde and glucose both have empirical formula CH₂O), molecular formula (e.g. urea and ammonium cyanate both have molecular formula CH₄N₂O in Hill notation, indicating they are isomers), structural formula (a graphical representation of the structure, not so relevant here), and condensed (or semi-structural) formula (e.g. urea has condensed formula CO(NH₂)₂ whereas ammonium cyanate has condensed formula [NH₄][OCN]). Molecular formulas "indicate the simple numbers of each type of atom in a molecule, with no information on structure", which is what we need for mass calculations. They also avoid the issue raised by Wostr regarding non-uniqueness of chemical formulas (e.g. NH₄NO₃ and H₄N₂O₃ are both valid formulas for ammonium nitrate), as each chemical should have a single canonical molecular formula in Hill notation (with the exception of rare cases where there is disagreement regarding structure, e.g. w:en:copper monosulfide). One last potential issue: molecular formulas are often defined as not including isotopes, e.g. PubChem lists both deuterated chloroform and chloroform as having molecular formula CHCl₃. Egon Willighagen's suggestion to have a subclass of [molecular] formulas with isotopic information would resolve this issue though, I think. Preimage (talk) 12:22, 29 August 2024 (UTC)[reply]
      Just revised the naming to change to molecular formula (Q188009), as suggested. 👍🏼 AdrianoRutz (talk) 07:16, 24 September 2024 (UTC)[reply]
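      To illustrate the point about canonical molecular formulas, here is a minimal Python sketch that reduces condensed formulas to Hill notation, so that different spellings of the same composition compare equal. The function names `count_atoms` and `hill` are hypothetical, and the parser is deliberately limited: it assumes ASCII input with optional integer multipliers after elements and bracketed groups, and does not handle charges, isotopes, hydrate dots or leading coefficients.

```python
import re
from collections import Counter


def count_atoms(formula):
    """Count atoms in a condensed formula such as 'CO(NH2)2' or '[NH4][OCN]'."""
    tokens = re.findall(r"[A-Z][a-z]?|\d+|[(\[]|[)\]]", formula)
    stack = [Counter()]  # one counter per currently open bracket level
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.isdigit():
            raise ValueError("unexpected number in " + formula)
        if tok in ("(", "["):
            stack.append(Counter())
            i += 1
            continue
        # An integer directly after an element or closing bracket multiplies it.
        mult, step = 1, 1
        if i + 1 < len(tokens) and tokens[i + 1].isdigit():
            mult, step = int(tokens[i + 1]), 2
        if tok in (")", "]"):
            group = stack.pop()
            for el, n in group.items():
                stack[-1][el] += n * mult
        else:
            stack[-1][tok] += mult
        i += step
    return stack[0]


def hill(counts):
    """Hill order: C, then H, then alphabetical (all alphabetical if no carbon)."""
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] != 1 else "") for e in order)


print(hill(count_atoms("CO(NH2)2")))    # CH4N2O  (urea)
print(hill(count_atoms("[NH4][OCN]")))  # CH4N2O  (ammonium cyanate)
print(hill(count_atoms("NH4NO3")))      # H4N2O3  (ammonium nitrate)
```

      Both urea and ammonium cyanate reduce to the same string, which is how isomers would end up linked to a single molecular formula item.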
    •  Oppose A chemical formula is an abstract entity and not one that has a mass.
    It's worth noting that Unicode can't capture all chemical formulas, and Mathematical expression could express more. ChristianKl 16:29, 25 September 2024 (UTC)[reply]
    You're wrong about that. Each chemical formula has a defined number of atoms of each of a defined set of elements. Although each element has multiple isotopes, every element with stable isotopes has a standard atomic weight, which is the atomic weight found in a typical sample. So the molecular weight of a particular chemical formula very much can be expressed. David Newton (talk) 09:58, 27 September 2024 (UTC)[reply]
    Currently, in Wikidata a chemical formula is a notation. Notations don't have inherent mass. NCI describes a chemical formula as a "representation of a substance using symbols for its constituent elements". It's not the object that it's describing. While the object that a formula describes can have mass, the formula itself doesn't. It's a Document in NCI's ontology. In PROCO it's a quality, and also not something that has mass. Instances of material entity (Q53617407) have mass, and molecular formula (Q188009) isn't one. ChristianKl 12:47, 9 October 2024 (UTC)[reply]
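    For reference, the calculation David Newton describes above (summing standard atomic weights over the atom counts of a formula) can be sketched as follows. This is an illustration only: the function name `molar_mass` and the small lookup table are assumptions of this sketch, and the weights are the commonly quoted abridged standard atomic weights for a handful of elements.

```python
# Abridged standard atomic weights (g/mol) for a few elements only.
STANDARD_ATOMIC_WEIGHT = {
    "H": 1.008,
    "C": 12.011,
    "N": 14.007,
    "O": 15.999,
    "Cl": 35.45,
}


def molar_mass(counts):
    """Sum standard atomic weights over a mapping of element -> atom count."""
    return sum(STANDARD_ATOMIC_WEIGHT[el] * n for el, n in counts.items())


urea = {"C": 1, "H": 4, "N": 2, "O": 1}   # CH4N2O
chloroform = {"C": 1, "H": 1, "Cl": 3}    # CHCl3

print(round(molar_mass(urea), 3))        # 60.056
print(round(molar_mass(chloroform), 3))  # 119.369
```

    Because the input carries no isotope information, chloroform and deuterated chloroform would get the same value here, which is the isotope issue raised earlier in this discussion.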

    Medicine

    Please visit Wikidata:WikiProject Medicine for more information. To notify participants use {{Ping project|Medicine}}

    Mineralogy

    Please visit Wikidata:WikiProject Mineralogy for more information. To notify participants use {{Ping project|Mineralogy}}

    Computer science

    Please visit Wikidata:WikiProject Informatics for more information. To notify participants use {{Ping project|Informatics}}

    Geology

    Please visit Wikidata:WikiProject Geology for more information.

    Geography

    Linguistics

    Please visit Wikidata:WikiProject Linguistics for more information. To notify participants use {{Ping project|Linguistics}}

    Mathematics

    Please visit Wikidata:WikiProject Mathematics for more information. To notify participants use {{Ping project|Mathematics}}

    Material

    Please visit Wikidata:WikiProject Materials for more information. To notify participants use {{Ping project|Materials}}

    Meteorology

    Glaciology

    All

    Nutrition
