User:The-erinaceous-one/types of collections
There are many different types of collections on Wikidata, and the distinctions between them are not always well-defined. This is my attempt to clarify the current state of affairs.
Wikidata Types of Collections
Collections of similar objects
- set (Q28813620): group of items regarded as one
- group (Q16887380): well-defined, enumerable collection of discrete entities that form a collective whole
- By translating all the descriptions of Q16887380 to English, we can find that common themes in the descriptions are
- similar characteristics,
- set of things or people,
- two or more,
- coexistence.
- My assessment, then, is that a "group" is an exhaustive collection of two or more concurrent physical things (real or fictional) that have a mutual association or defining characteristic. So examples of groups are: The Beatles (Q1299), the stars in our galaxy, and Bonnie and Clyde (Q219937). Examples that are not groups are John Lennon (only one item), the set {John Lennon and Paul McCartney} (not exhaustive), the set of real numbers (not physical), presidents of the United States (not concurrent), and the set { New York City (Q60) and Julius Caesar (Q1048) } (no association). So every group is a class (Q16889133), but not every class (Q16889133) is a group. Examining the list of instances of group (Q16887380), we find that groups can also contain events and abstract objects, and there are subclasses which have fewer than 2 items by definition: set of 0 (Q39604693) and monad (Q39604065). We might want to just move those to a new class, however (along with all instances of type of fixed-size set or group (Q99469810), e.g. monads and dyads, etc). This would leave most (all?) groups with at least 2 items and without abstract objects.
- I agree with most of this except I don't feel like stars in our galaxy should be a group as it's too big. I guess my feeling about group is that it should be easily specified by using an inverse relation, so it should tend to have fewer than about 20 members on Wikidata (though it may have more that are not on Wikidata, like 'bunch of grapes'). I still don't think monad, dyad, etc. is important enough to be P31 and would prefer some kind of indication that all members are stored on Wikidata and exist as 'has member' or a similar relation, to say that it's a 'completely specified group'. I'm still uncertain about whether they have to be physical objects. - cdo256
- I don't think we should put an upper (finite) limit on the size of groups. The New York Philharmonic Orchestra (Q471154) has over a hundred members and I don't think we want to list each inverse relation. It makes sense to have <some violinist> member of New York Philharmonic Orchestra (Q471154), but we wouldn't want 100+ "has member" statements on the New York Philharmonic Orchestra (Q471154). In the definition of a group, I think we should only specify whether or not groups with infinitely many members are allowed (and I can only imagine that happening if we allow abstract objects in groups).
- I agree regarding monad, dyad, etc., and I think we should move them to a different class. I created a new item collective entity (Q99527517) to serve as the root of all types of collections (including lists, multisets, groups, mathematical sets, etc.), but I think monad, dyad, etc. should not be direct subclasses of Q99527517; we need to figure out a more specific way to classify them.
- Regarding whether a group is only physical objects, there actually is an item specifically for that: group of physical objects (Q61961344), so I think we should use group (Q16887380) more broadly. Specifically, based on current usage, groups should be allowed to contain (at least) physical objects, events, and works of art. Probably not abstract objects, though. – The Erinaceous One 🦔 10:53, 26 September 2020 (UTC)
- Here are some current nonphysical instances/subclasses of group (Q16887380) that serve as good test cases of what should and shouldn't be in group (Q16887380). Our definition should, at least, clearly specify whether each of these is an instance/subclass of Q16887380.
- – The Erinaceous One 🦔 22:14, 26 September 2020 (UTC)
- @The-erinaceous-one: Rule set and skill set are interesting examples. Rules and skills appear to be very slippery and abstract, in that it's difficult to pin down what individual skills and rules are. For example, is castling in chess a single rule or is it two rules (one for kingside and another for queenside)? Does not being able to castle across check represent part of a single castling rule or is it a separate rule? Similarly for skills. My feeling is that, if these are clearly defined, they should all be groups. Adjusting your definition above, I'd define group as an exhaustive collection of two or more entities (abstract, real or fictional) that have a mutual association or defining characteristic. - cdo256 17:00, 4 October 2020 (UTC)
- @cdo256: We're getting closer to a good definition but after thinking about this more, I think that group (Q16887380) currently consists of three distinct types of collections:
- A collection of people (or entities controlled by people) who choose to be associated with each other, such as people in a musical band or nations in the United Nations.
- A loosely-defined collection of entities that are clustered together, either physically or conceptually, such as a tornado outbreak or group of works (Q17489659).
- A well-defined collection of entities that is analogous to a mathematical set, such as a musical chord.
- I propose making group (Q16887380) only contain the first case, creating a new item called "cluster" for the second case, and moving any items in the third case to set (Q28813620). – The Erinaceous One 🦔 05:39, 5 October 2020 (UTC)
- I created cluster (Q99963080) for the second case. – The Erinaceous One 🦔 07:34, 5 October 2020 (UTC)
- @The-erinaceous-one: I think this makes sense as a distinction. How should communities, ethnic groups/nationalities, and religious groups be classified? I think: community P279 group of humans; ethnic group P279 collection of humans; nationality P279 group of humans; religious group P279 group of humans. - cdo256 09:52, 6 October 2020 (UTC)
- @cdo256: Those are good test cases and I generally agree with your assessments, although we might want to discuss ethnic group subclass of (P279) collection further. Now, I think we can improve the definition of "group" that I stated above by changing it to "A collection of discrete entities who choose or are assigned to be part of a collective whole." This would include musical bands and the United Nations, as I mentioned above, as well as a group of soldiers assigned to a unit, a group of animals assigned to a particular experimental test group, and a group of buildings designated to be a "campus" (but not a group of buildings that are merely clustered together). This would make the case of nationality clearer, because people can either be born into it (so they are assigned nationality by the birth nation) or they can choose it by naturalization. The reason for "discrete" in the definition is to exclude something like a geographic region. – The Erinaceous One 🦔 10:20, 10 October 2020 (UTC)
- @The-erinaceous-one: I meant cluster rather than collection. I'm at my limit of caring about Wikidata for the time being but I'll probably be back in a few months. - cdo256 17:15, 12 October 2020 (UTC)
Intentionally designed collections
The following two classes are distinct in that their instances consist of items that were intentionally assembled by a person. The items can be physical, digital, or ...?
- set of physical objects (Q63341786): group of physical objects, sometimes matching, intended to be used or seen together
- A collection of items that are produced with the intent of them being together.
- collection (Q2668072): set of purposefully gathered physical or digital objects with some common characteristics
- A collection of items that are intentionally brought together after their inception, such as a rock collection or a museum's collection.
Abstract collections
- class (Q16889133): collection of items defined by common characteristics
- Based on the linked Wikipedia page, class (Q16889133) is specifically a class in a "knowledge representation" (i.e. an ontology). Thus Q16889133 is a good item to use when talking about Wikidata classes. You can be more specific, however, by using metaclass (Q19361238) for classes of classes and first-order class (Q21522908) for classes of instances.
- class (Q5127848): group of things derived from extensional or intensional definition (philosophy)
- The item class (Q5127848) refers to the philosophical concept of a "class," and is broader than class (Q16889133). A class has to actually be included in an ontology to be an instance of class (Q16889133) (which presumably means its instances have some shared characteristic), but every collection of things is a class (Q5127848), so the set "{1, apple, every left shoe}" is a class (Q5127848) but not a class (Q16889133).
Mathematical collections
The following two items are clearly defined, so I don't think there is ambiguity about when to use them. We might want, at some point, to distinguish between various mathematical definitions of sets (ZFC, John von Neumann, fuzzy sets), but that can happen later.
- set (Q36161): well-defined mathematical collection of distinct objects
- class (Q217594): mathematical collection of sets that can be defined based on a property of its members (set theory)
Interestingly, there is an item for an element of a set, element (Q379825), which appears to be used as an indirect relation; I think this would be better described with an actual 'element of' property. E.g. rational function (Q41237) subclass of (P279) element (Q379825).
Computer Science Data-structures
- collection (Q17008256): data type in computer science
- set (Q1514741): abstract data type in computer science
- list (Q27948): abstract data type used in computer science
Collection Properties
Property | Meaning | Transitive | A | B |
---|---|---|---|---|
part of (P361) | A is a (possibly singular) subset of B | Yes | any entity | group, structure or object |
subclass of (P279) | every instance of A is also an instance of B | Yes | class | class |
member of (P463) | A is a member of B | No | person (or physical object?) | group |
instance of (P31) | A is in the class B | No | entity | class |
has part(s) of the class (P2670) | A must only have members in B | No(?) | collection or class of collections | class |
For part of (P361), the subject can be a class (for instance, engine part of motorcycle) but it needn't be. The same holds for member of (P463).
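To illustrate what the "Transitive" column means in practice, here is a minimal Python sketch. It is not how Wikidata evaluates statements; the triples, the labels (piston, violinist, orchestra league) and the `related` helper are all hypothetical and chosen only for illustration. The set of transitive properties is taken from the table above.

```python
from collections import defaultdict

# Hypothetical statements written as (subject, property, object) triples.
STATEMENTS = [
    ("piston", "part of", "engine"),
    ("engine", "part of", "motorcycle"),
    ("violinist", "member of", "orchestra"),
    ("orchestra", "member of", "orchestra league"),
]

# Properties marked "Yes" in the Transitive column of the table.
TRANSITIVE = {"part of", "subclass of"}


def related(subject, prop, statements=STATEMENTS):
    """Return every object linked to `subject` through `prop`.

    For transitive properties the chain of statements is followed to
    the end; for all other properties only directly stated objects
    are returned.
    """
    edges = defaultdict(set)
    for s, p, o in statements:
        if p == prop:
            edges[s].add(o)

    if prop not in TRANSITIVE:
        return set(edges[subject])

    seen, stack = set(), [subject]
    while stack:
        node = stack.pop()
        for nxt in edges[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# The piston is (indirectly) part of the motorcycle ...
print(related("piston", "part of") == {"engine", "motorcycle"})  # True
# ... but the violinist is not a member of the league.
print(related("violinist", "member of") == {"orchestra"})  # True
```

Under these assumptions, choosing member of (P463) over part of (P361) is what keeps membership from propagating up a chain of collections.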
Qualities that need to be determined for each collection type
- Allowed cardinalities (i.e. how many items can a collection have?) Examples: finite, finite greater than 1, countably infinite, any.
- What can be contained in the collection? Physical objects, people, abstract objects, other classes, etc.
- Also for consideration are the sub-collections that people often make, like group of humans (Q16334295). Which raises the question: if y subclass collection-type and x part of (P361) w instance of (P31) y, then should there be a z such that {x instance of (P31) z part of (P361) y} also holds? E.g. x=Gebhard Fürst (Q69064), y=group of humans (Q16334295), w=German Ethics Council (Q1204738), and z=human (Q5). On second thoughts, I think I got the implication the wrong way round. --Cdo256 (talk)
- What criteria are allowed for a collection to define its members? Do the items need to be enumerated, concurrent, etc.
- What Wikidata properties should be used to express membership and sub-collection?
- Obviously there are "instance of," "subclass," and "part of." Are there any other options? – The Erinaceous One 🦔
- I also found position held (P39) (when there are many jobs per position) and member of (P463) (which doesn't seem to be that widely used compared to part of (P361)). I was thinking there is possibly a need for an element-of property, since the transitivity of part-of isn't always desired. E.g. john-lennons-kidney is part of John Lennon (Q1203), who is part of (P361) The Beatles (Q1299), so john-lennons-kidney is part-of the-beatles, but not in the same way that john-lennon himself is. There's also the possibility of dropping part-of and mandating instance-of, so e.g. john-lennon instance-of the-beatles. I'm still undecided as to whether we should go for more granularity or less. --Cdo256 (talk)
- For the case of the Beatles, instead of using part of (P361), the relationship should be John Lennon (Q1203) member of (P463) The Beatles (Q1299) because member of (P463) is not transitive. It doesn't seem right to say john-lennon instance-of the-beatles, but I'm now struggling to state why. I'll have to give this more thought. – The Erinaceous One 🦔
- Can collections have properties other than membership? E.g. a corporation is a group of people but also has its own properties, like incorporation type.
- This might be getting too much into the weeds. I think we should get a solid core of collection classes; then the subclasses can have any sorts of extra property structures they need. – The Erinaceous One 🦔
- My thinking when I wrote it was a difference between mathematical sets and real-world sets and intrinsic vs extrinsic properties. But my feeling is that Wikidata, like most ontologies, doesn't distinguish intrinsic and extrinsic properties (which is probably for the best since they're a slippery concept and lots of modern philosophers don't consider the distinction important). I think we should allow all collections to have whatever additional properties they need, since that's how the rest of Wikidata works. --Cdo256 (talk)
- What are the criteria for two collection instances to be the same? Is the collection of Anthony McPartlin (Q573612) and Declan Donnelly (Q3182552) 'the same as' Ant & Dec (Q3596068) (for various definitions of 'collection' and 'same as')?
- Can we clarify this more? Is there a type of collection for which two instances with the same elements are not equal? Ordered collections come to mind. – The Erinaceous One 🦔
- I was thinking of something like, say, a corporation that is set up as both a PLC and a LTD (e.g. for tax reasons) but every employee is considered to be employed by both companies. Another example could be the complexity classes P and NP. Or classes that accidentally happen to have equivalent members, e.g. suppose a country developed and rolled out compulsory chip implantation in the brains of all its citizens; then, once they'd completed the rollout, population-of-x and people-with-brain-chips would be (practically) identical (although that example doesn't really work). Also, is a group of humans the same as the class of humans?
- Can membership vary over time? (time-variant vs. time-invariant)
- I would expect each type of collection to be agnostic to whether or not it is time-invariant. If necessary, we could make, for example, "set" and then "time-invariant set" and "time-varying set" as subclasses. – The Erinaceous One 🦔
- Now that I think about it, even something seemingly invariant like prime number (Q49008) I could imagine being changed if enough mathematicians decided that 1 is a prime. So no collection seems immune from being time varying and as you suggested, we can ignore this quality for this discussion. --Cdo256 (talk)
Do all members need to be contained in Wikidata (e.g. should all members of a group be specified)? Or even any members at all?
Relations between collection types
- Is one collection type a subclass of another?
- I think we should have a root "collection" class that is the superclass of all other types of collections. The label is open for discussion, but I was thinking "collection entity." – The Erinaceous One 🦔
- Are membership-properties of two collection-types orthogonal to one another, eg. instance-of and member-of work orthogonally in the following example {rock part-of rock-collection; rock-collection subclass-of collection; johns-blue-rock instance-of rock; johns-rock-collection instance-of collection; johns-blue-rock member-of johns-rock-collection}
- Should there be enough collection-types of membership such that no membership-property has to work orthogonal to itself?
- What would be an example of being orthogonal to itself? – The Erinaceous One 🦔
- chlorine (Q688) is part of period 3 and group 17. -Cdo256 (talk)
Open questions
- Where should we put instances of type of fixed-size set or group (Q99469810)? (We might need to rename Q99469810.)
- I don't think this is essential to model. I think a property for cardinality would be sufficient for creating dyad, triad etc. -Cdo256 (talk)
- Perhaps, but there already are tons of items that are modeled using dyad, triad, etc., so I don't think it is worth it to go back and change all of them right now. – The Erinaceous One 🦔
- What about language differences? Can all languages tell the difference between different collection types? Are there languages that make a more granular distinction than English?
- Getting all the translations will be difficult, but fundamentally we are trying to model something that is language-independent. We are using particular words for particular concepts, but even in English there aren't enough words, so "set," "group," and "collection" are used in different ways. – The Erinaceous One 🦔
- Is making these distinctions between types of collections important and useful?
- This is an important question to ask, thanks for raising it. The way things are grouped together, and the relations between those groups, are fundamental to the structure of data, so if we are rigorous about how we structure collections, we can make the data easier to use. – The Erinaceous One 🦔
- How do we express things that aren't simply members but have a special relationship to a group (e.g. the root of a chord)? - cdo256
- We can use other, specialized properties for special relationships. I don't think we need to figure that out while defining the ontology of collections. – The Erinaceous One 🦔
Related discussions/pages
- Some additions to help:basic membership properties but doesn't distinguish collection types. - cdo256
- "Classes are those items that conceptually group together similar items, as human (Q5) groups together humans."
- Type-token distinction: 'The sentence "they drive the same car" is ambiguous. Do they drive the same type of car (the same model) or the same instance of a car type (a single vehicle)?'
- Two drawbacks of this are that it doesn't specify what to do about abstract objects, and that its advice isn't currently obeyed and has an overwhelming number of counterexamples (e.g. The Everett Herald (Q26877), books, molecules and most things where there are many copies and little to no difference between the copies). A large number of classes are using instance-of when this page states that subclass-of should be used instead. - cdo256
- This seems like the most comprehensive and well thought out page so far. Still reading through it.
- Wikidata_talk:WikiProject_Ontology#Problem_with_has_part_(P527)_and_part_of_(P361)
- Wikidata:Requests_for_comment/Refining_"part_of"
- Wikidata_talk:WikiProject_Ontology#X_is_instance_of_"type_of_X"...!?
- Wikidata_talk:WikiProject_Ontology#Primary_top-level_ontology
- Wikidata:Project_chat/Archive/2017/04#Can_a_class_be_part_of_a_non-class?
- Wikidata:Project_chat/Archive/2017/11#Inverse_for_"member_of"_(P463)
- Wikidata:Project_chat/Archive/2018/07#Hundreds_of_'part_of'_statements
- Wikidata:Project_chat/Archive/2019/11#Basic_membership_properties_/_part_of
- Wikidata:Project_chat/Archive/2015/05#Level_of_granularity_of_properties
Discussion
Please feel free to chime in with any contributions! – The Erinaceous One 🦔 09:25, 19 September 2020 (UTC)
Hiya thanks a lot for making this. I'm going to go ahead and make some additions without changing what you've already written. If you'd prefer I can move my changes to their own section. --Cdo256 (talk) 08:04, 21 September 2020 (UTC)
- I appreciate your additions! This is very much a working document, so feel free to modify the text I wrote, too. If we start to step on each other's toes, we can bring the point of disagreement down to the discussion section. You'll notice I also started signing comments (without the date) so that we can respond to each other more easily. – The Erinaceous One 🦔 12:27, 21 September 2020 (UTC)
@Cdo256: Help:Basic membership properties, Wikidata:WikiProject Ontology/Classes and User:TomT0m/Classification have relevant information. We might even be replicating some of what has already been documented there. – The Erinaceous One 🦔 10:40, 22 September 2020 (UTC)
- Okay I'm reading through these now. - cdo256 05:52, 23 September 2020 (UTC)
My feeling now is that the distinction between group and class lies in which direction the relation is naturally expressed. A class may contain many members but each member should only be directly stored as instances of just a couple of classes. On the other hand, an object may be part of many groups but each group should contain just a few members. Then there's possibly need for a third kind which defines members using an external list or some kind of criteria. Still got quite a lot more thinking and reading to do. - cdo256 08:53, 25 September 2020 (UTC)
Wrap up: Phase 1
@cdo256: I think we should try to start implementing some changes so that we can move toward wrapping this up. Here are the initial changes that I propose.
- Change the label of Q28813620 from "collection" to "set" to communicate that it is analogous to a mathematical set (its only difference is that it can contain non-mathematical objects). Then we will have the mathematical set set (Q36161), which corresponds to the non-mathematical set (Q28813620).
- Move monad, dyad, etc. to be subclasses of (the newly renamed) set (Q28813620). This classification is reasonable because monad, dyad, etc. are always well-defined sets due to the explicit enumeration of their members.
Once we have done that, we will be in a better place to discuss the remaining open questions, namely the scope of group (Q16887380) and the properties that are used to model relationships. – The Erinaceous One 🦔 10:03, 27 September 2020 (UTC)
- Agreed on both points. I've been away for a couple of days but I agree with this. Should we do a RfC or ping wikiproject ontology before making larger changes? Another couple of thoughts/questions I've been pondering: Is there an analogous item for multi-set like Q28813620 is for set? Should collective entity (Q99527517) be a subclass of structure (Q6671777)? Should we distinguish collections of items where each item is considered primarily part of a single collection (eg parts of a particular brand of airplane), from collections of items where each item may be part of many collections (eg. teams someone competes in)? Should we distinguish the 'primary' group from the other groups an item may be part of? My thinking is no to both distinguishing-collections questions, since we can just use the preferred rank on 'part of' to represent the same thing. - cdo256 20:52, 29 September 2020 (UTC)
- My responses to your questions:
- Yes please ping ontology, but no, I don't think we need to make an RFC (AFAIK, RFC's are mainly for resolving disputes after other channels have been exhausted).
- I am not aware of an analogous item for a multi-set, but if there is a need for it, then I'm fine with it being created.
- I don't think that collective entity (Q99527517) is a subclass of structure (Q6671777) because a collection doesn't need to have any organization or "structure" (I would say, however, that structure (Q6671777) is a subclass of collective entity (Q99527517)).
- Let me make sure I understand your question correctly. There are some types of collections where we expect items to never belong to more than one collection of that type; in other words, we expect the collections to be disjoint. So, if A and B are collections and x, y, and z are items, then we could have x in A and y in B, but we would never have z in A and B. To model this, we could create something along the lines of "type of collection with disjoint instances" (which would be a second-order metaclass). Then we could have "type of airline parts for particular brand" (first-order metaclass), "Boeing airplane parts" and "Airbus airplane parts" (disjoint classes).
- What would make a particular group the "primary" one? It seems like that would be subjective in most cases.
- – The Erinaceous One 🦔 11:22, 4 October 2020 (UTC)
- @The-erinaceous-one: Sorry I didn't see this. Yes this all makes sense, regarding the penultimate question, you understood correctly; and your answer makes a lot of sense, thanks for thinking about this so deeply. Regarding the last, I don't think it matters. - cdo256 00:43, 17 October 2020 (UTC)
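The disjointness expectation discussed in this thread (x in A, y in B, but never z in both A and B) can be checked mechanically. The following is a minimal Python sketch, assuming membership is available as plain sets of labels; the function name `overlapping_members` and the sample data are hypothetical, not part of any Wikidata tooling.

```python
from itertools import combinations


def overlapping_members(collections):
    """Map each pair of collection names to the members they share.

    `collections` maps a collection name to a set of member names.
    An empty result means the collections are pairwise disjoint.
    """
    overlaps = {}
    # Sort by name so that each pair is reported in a stable order.
    named = sorted(collections.items(), key=lambda item: item[0])
    for (name_a, members_a), (name_b, members_b) in combinations(named, 2):
        shared = members_a & members_b
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Disjoint case: x is only in A, y is only in B.
print(overlapping_members({"A": {"x"}, "B": {"y"}}))  # {}

# Violation: z belongs to both A and B.
print(overlapping_members({"A": {"x", "z"}, "B": {"y", "z"}}))
# {('A', 'B'): {'z'}}
```

A check of this kind would only flag candidate violations; whether a shared member is an error or a sign that the collection type should not be treated as disjoint is still a modelling decision.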
Wrap up: Phase 2
In order to have more precisely defined types of groups, we propose dividing Q16887380 among three classes:
- set (Q28813620): A well-defined collection of entities that is analogous to a mathematical set, such as a musical chord.
- group (Q16887380): A well-defined collection of multiple discrete entities who choose or are assigned to be part of a collective whole. Examples include people in a musical band, nations in the United Nations, buildings in a campus, and mice assigned to an experimental group. A group can have zero, one, or more members.
- cluster (Q21157127): A fuzzily-defined collection of multiple discrete entities that are clustered together, either physically or conceptually. Examples include a tornado outbreak (Q2696963) or group of works (Q17489659).
These three classes would have the following subclass hierarchy:
Additionally, member of (P463) would be used to indicate membership of individuals in Q28813620, Q16887380, and Q21157127. – The Erinaceous One 🦔
Discussion
@The-erinaceous-one: Should we also propose an inverse relation to member of (P463) so we can express groups defined by their members, e.g. Bonnie and Clyde? - cdo256 00:37, 17 October 2020 (UTC)
- @cdo256: I don't think that is necessary. We can represent the members of a group with disjoint union of (P2738). – The Erinaceous One 🦔 06:51, 27 October 2020 (UTC)
Wrap up: Phase 3
Document the distinctions and integrate with the existing documentation.
WikiProject Ontology has more than 50 participants and couldn't be pinged. Please post on the WikiProject's talk page instead. – The preceding unsigned comment was added by Cdo256 (talk • contribs) at 00:47, 17 October 2020 (UTC).
- @Cdo256: The above ping did not go out, as it was not followed by a proper signature.
- @The-erinaceous-one: Sorry for arriving very late to this conversation, but a lot of the above has been very contrary to how the items have been used. Groups are not classes, and classes are not groups. Groups do not have instances. Classes do not have parts or members, though its instances may have parts. Many properties, when used in items for classes, treat the subject as each individual instance of the class, while for groups, the subject is the sum total of the parts. It also tends to be the case that labelling is done differently between them, with classes in the indefinite singular (as a designation for any of its instances) and groups in the plural or collective-singular. (Example: There are two 1kg boxes which are called "foos" (1 foo, both foos). A class of both of these could be labelled "foo" and have mass: 1kg, while a group representing both could be labelled [the] "foos" and have mass: 2kg.) "Set" remained largely unused, and I really think it should return to that status. --Yair rand (talk) 07:57, 14 December 2020 (UTC)
- @Yair rand: Looking back at what I wrote above, I believe I meant that for every group, a corresponding class can be constructed. That is, given group A, you can define a class that is "the class of all things in group A", not that every group is a class or vice versa. Anyway, regardless of what I wrote way-back-when, we didn't change groups to be classes and your analysis of the difference between them is correct.
- On the other hand, when it comes to set (Q28813620), I don't agree with your assessment. I think sets are a distinct and important concept that should be separate from groups. Can you explain your objection more? Thanks, – The Erinaceous One 🦔 01:09, 23 December 2020 (UTC)
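The "foo" example from this thread (a class describes each instance, a group describes the sum of its parts) can be made concrete with a short Python sketch. The `Box` type and the two helper functions are hypothetical names introduced only for illustration; they are one reading of that example, not a description of how Wikidata stores mass statements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    label: str
    mass_kg: float


# The two 1 kg boxes from the example.
boxes = [Box("foo #1", 1.0), Box("foo #2", 1.0)]


def class_mass(instances):
    """Mass as stated on a class item: the value each instance has.

    Raises ValueError when the instances do not share one mass,
    because then no single per-instance value can be stated.
    """
    masses = {box.mass_kg for box in instances}
    if len(masses) != 1:
        raise ValueError("instances do not share a single mass")
    return masses.pop()


def group_mass(members):
    """Mass as stated on a group item: the total over all members."""
    return sum(box.mass_kg for box in members)


print(class_mass(boxes))  # 1.0  (the class "foo")
print(group_mass(boxes))  # 2.0  (the group "foos")
```

The same two boxes yield different values depending on whether the item is read as a class or as a group, which is the reason given above for keeping the two kinds of item apart.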