Wikidata:Property proposal/text features
Text features
number of words
Originally proposed at Wikidata:Property proposal/Sister projects
Description | number of words in text |
---|---|
Data type | Quantity |
Domain | Wikisource texts |
Allowed values | >0 |
Allowed units | none |
Example 1 | À M. Paul Foucher (Q55867126) → 1000 (replace with actual number) |
Example 2 | À M. des Herbiers (Q55867160) → 800 (replace with actual number) |
Example 3 | À son frère (Q55867161) → 700 (replace with actual number) |
Planned use | add to some Wikisource text |
Discussion
- Comment It could be interesting, but the rules for computing the number of words should be fixed, because the exact same text can have different word counts in different systems... --Hsarrazin (talk) 08:11, 14 November 2018 (UTC)
- Support @Hsarrazin: determination method or standard (P459) can be used to denote which method was used to count words. Dhx1 (talk) 10:28, 14 November 2018 (UTC)
- Comment Also suggest the domain be expanded to cover all texts described by Wikidata--not just Wikisource items, where a reliable source exists for the word count. Dhx1 (talk) 10:33, 14 November 2018 (UTC)
- Yes, P459 would generally be added with a value that links to a fairly detailed explanation on how it's being done. Personally I'd start out with Wikisource and see how it goes. Eventually it could be expanded. --- Jura 12:23, 15 November 2018 (UTC)
- Tend to Oppose, seems like a specific version of number of parts of this work (P2635), or redundant with the scheme:
Also see type–token distinction (Q175928): there is a difference in the number of word-types used (if you use « dog » twice in your text, this counts as one word-type but two « occurrences » of the word « dog »). Actually, now that I think of it, we may be able to solve this with the pair of properties has part / has part(s) of the class (P2670): ⟨ the text ⟩ has part(s) of the class (P2670) ⟨ word-type ⟩ and quantity (P1114) ⟨ the number of different word-types ⟩. Indeed, « word-type » can be thought of as a metaclass of words, and « has part of the type » can cross the boundary between the class level and the metaclass level (that’s what it is for, actually), while « text » and « words » can be thought of as classes of the same level: you use words to build a text, and each time you copy a text you copy all of its words along with it. author TomT0m / talk page 13:16, 20 November 2018 (UTC)
- Thanks for your input. number of parts of this work (P2635) could work if we were just interested in one aspect, but using units to differentiate between types of parts seems complicated, as we would need to retrieve the detailed SPARQL node each time. has part(s) of the class (P2670) seems a good alternative, but as we will likely have several values for the statements (depending on the calculation method), selecting the correct one is slightly easier with a separate property. Furthermore, as this property will apply to many items, I think a dedicated property is preferable. --- Jura 06:42, 23 November 2018 (UTC)
- @Jura1: The counting method is actually a case that can be discriminated using « has part » / « has part of the type »: « has part of the type » is appropriate, for example, if you count the number of « word-types », per the type-token distinction, and for « word-tokens » we can even use « has part ». I also note that you don’t detail at all how to model the different counting methods. I think it may be far more appropriate not to use arbitrary items for obscure, undescribed methods if we can use generic concepts to model them (an item for « word type », for example, through metaclassification). author TomT0m / talk page 13:49, 16 December 2018 (UTC)
- Support Good idea. I wonder if the domain could indeed be stretched beyond wikisource-entries. Lymantria (talk) 11:14, 16 December 2018 (UTC)
@ديفيد عادل وهبة خليل 2, Hsarrazin, Lymantria, TomT0m, Dhx1, Jura1: Done: number of words (P6570). − Pintoch (talk) 20:28, 6 March 2019 (UTC)
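To illustrate the type-token distinction raised in the discussion above, here is a minimal Python sketch. The function name, the whitespace-based splitting, the stripped punctuation set and the case folding are all assumptions made for this illustration; none of them is a counting method agreed in the proposal.

```python
def count_tokens_and_types(text: str) -> tuple[int, int]:
    """Return (word-token count, word-type count) for a text.

    Assumptions (not from the proposal): words are separated by
    whitespace, surrounding punctuation is ignored, and case is folded.
    """
    punctuation = ".,;:!?\"'()«»"
    tokens = [word.strip(punctuation).casefold() for word in text.split()]
    tokens = [token for token in tokens if token]  # drop punctuation-only pieces
    return len(tokens), len(set(tokens))


tokens, types = count_tokens_and_types("The dog saw the dog.")
print(tokens, types)  # 5 tokens, 3 types
```

The two numbers differ as soon as a word is repeated, which is why a stated counting method (for example via determination method or standard (P459)) matters when comparing values.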
number of sentences
Originally proposed at Wikidata:Property proposal/Sister projects
Description | number of sentences in text |
---|---|
Data type | Quantity |
Domain | Wikisource texts |
Allowed values | >0 |
Allowed units | none |
Example 1 | À M. Paul Foucher (Q55867126) → 50 (replace with actual number) |
Example 2 | À M. des Herbiers (Q55867160) → 40 (replace with actual number) |
Example 3 | À son frère (Q55867161) → 30 (replace with actual number) |
Planned use | add to some Wikisource text |
Motivation (both proposals)
I think it would be good to add such metadata to Wikisource texts. Maybe additional properties can be useful.
@Hsarrazin: who edits there frequently. @Dhx1: who mentioned related readability scores on Project chat --- Jura 05:40, 14 November 2018 (UTC)
Discussion
- Support Both David (talk) 08:02, 14 November 2018 (UTC)
- Support determination method or standard (P459) should be allowed and encouraged to be used as a qualifier to denote the method used to count the number of sentences.
- Comment Also suggest the domain be expanded to cover all texts described by Wikidata--not just Wikisource items, where a reliable source exists for the sentence count. Dhx1 (talk) 10:34, 14 November 2018 (UTC)
- See also comments for first proposal only above. --- Jura 12:23, 15 November 2018 (UTC)
- @Hsarrazin, Dhx1, ديفيد عادل وهبة خليل 2: thanks for your input. For both counts, the method should probably list the separators used to identify words (e.g. " ") and sentences ("." or "?" or "!", etc.) which may vary by language. Maybe an existing property can work for that, maybe we need a new one too. --- Jura 05:09, 16 November 2018 (UTC)
- Tend to Oppose with the same ideas as in « number of words » : use « has part ». author TomT0m / talk page 13:25, 20 November 2018 (UTC)
- see comment above. --- Jura 06:42, 23 November 2018 (UTC)
- Comment I wonder how poems or song lyrics, which often do not have a clear sentence structure, should be dealt with. Is the combination of these two properties meant to indicate text complexity (longer sentences mean harder to read)? Lymantria (talk) 11:14, 16 December 2018 (UTC)
- I think the "criterion used"-item needs to enumerate separators. --- Jura 14:29, 16 December 2018 (UTC)
- This seems a rather naive approach. Is this backed by references, with a known segmentation algorithm in mind, or is this original work? author TomT0m / talk page 14:36, 16 December 2018 (UTC)
- @Hsarrazin, Dhx1, TomT0m, Lymantria, ديفيد عادل وهبة خليل 2, Jura1: Property number of sentences (P6695) Done. I also added on both properties the determination method or standard (P459) constraint. Good contributions, Ederporto (talk) 03:44, 21 April 2019 (UTC)
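The separator-based counting suggested in the discussion above could be sketched as follows in Python. This is only an illustration under stated assumptions: the function names and default separator sets are hypothetical, and real texts (abbreviations, poems, languages without spaces) would need a properly documented segmentation method.

```python
import re

# Hypothetical defaults, following the examples given in the discussion.
WORD_SEPARATORS = " "
SENTENCE_TERMINATORS = ".?!"


def count_words(text: str, separators: str = WORD_SEPARATORS) -> int:
    """Count non-empty pieces after splitting on runs of separator characters."""
    pattern = "[" + re.escape(separators) + "]+"
    return len([piece for piece in re.split(pattern, text) if piece])


def count_sentences(text: str, terminators: str = SENTENCE_TERMINATORS) -> int:
    """Count non-blank pieces after splitting on runs of terminator characters."""
    pattern = "[" + re.escape(terminators) + "]+"
    return len([piece for piece in re.split(pattern, text) if piece.strip()])


poem = "roses are red\nviolets are blue"
print(count_words(poem))         # newline is not a separator here
print(count_words(poem, " \n"))  # newline added as a separator
print(count_sentences(poem))     # no terminator, so the whole text is one unit
```

The poem example shows both concerns raised in the thread: the word count changes with the chosen separators, and a text without terminators collapses into a single "sentence".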