Wikidata:Requests for comment/References and sources
An editor has requested the community to provide input on "References and sources" via the Requests for comment (RFC) process. This is the discussion page regarding the issue.
If you have an opinion regarding this issue, feel free to comment below. Thank you!
THIS RFC IS CLOSED. Please do NOT vote nor add comments.
- The following discussion is closed. Please do not modify it. Subsequent comments should be made in a new section. A summary of the conclusions reached follows.
- A bit late, but...
- Help:Sources is an official guideline, but editors should remember that using common sense is important.
- The main concerns raised by opposition are being addressed in Wikidata:Requests for comment/Source items and supporting Wikipedia sources, which should reach an acceptable conclusion.
Legoktm (talk) 02:13, 8 August 2013 (UTC)[reply]
References are an important mechanism to point to a specific part of a source that provides information about an item or a statement. References might provide additional information about which edition or which page of a source is being referenced to support a claim. This RFC is intended to gather feedback from the Wikidata community, to analyze the different options, and to evaluate which one is preferred.
Before commenting, it is suggested to take a look at the Open Annotation Data Model, which specifies an interoperable framework for creating associations between related resources, annotations, references, etc. using a methodology that conforms to the architecture of the World Wide Web.
The words "work" and "manifestation" are used in the FRBR model sense.
Contents
- 1 References implementation
- 2 Information to be held about sources
- 3 Dismissed options
- 4 References to web pages
- 5 How to store edition data
- 6 Simple guidelines for sources
- 7 Pyfischs comments
- 8 Final vote
- 9 Continuation of the RfC: On Source items and supporting Wikipedia sources
Regardless of the data model used to represent references, its implementation on Wikidata may take different forms depending on how many items are used to spread the information about body, annotation, and target.
Only references for statement sources (Q-S model, or 2Q model)
One possible option is to support embedded references only for Wikidata's stored statements, with an item representing the source. The specifics of the reference (edition, page) would be stored as qualifiers (additional properties).
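As a rough illustration of this option, the sketch below models a statement that carries an embedded reference pointing to a source item, with edition and page kept on the reference itself. It is only a sketch: the class names, field names, and item IDs are hypothetical and do not reflect Wikidata's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Reference:
    """Embedded reference: lives inside a statement, not as its own item."""
    source_item: str               # hypothetical ID of the item describing the source
    edition: Optional[str] = None  # reference-specific detail
    page: Optional[str] = None     # reference-specific detail


@dataclass
class Statement:
    prop: str
    value: str
    references: List[Reference] = field(default_factory=list)


def statements_citing(statements, source_item):
    """Query by source, which is possible because the source is an item."""
    return [s for s in statements
            if any(r.source_item == source_item for r in s.references)]


# Placeholder IDs and values, for illustration only.
s1 = Statement("population", "1000", [Reference("Q_SOURCE_A", edition="2", page="14")])
s2 = Statement("area", "50", [Reference("Q_SOURCE_B", page="3")])
s3 = Statement("elevation", "120", [Reference("Q_SOURCE_A", edition="2", page="15")])

cited = statements_citing([s1, s2, s3], "Q_SOURCE_A")
```

In this sketch the source item is shared between `s1` and `s3`, while the edition and page details are repeated on each reference, which is the trade-off the pros and cons for this option describe.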
Pros
- The number of items is kept lower than in the 3Q or Q-R-S model
- It allows sources to be queried
Cons
- It allows source info to be reused, but not reference info.
- Doesn't address the problem of references used to support claims in Wikipedia that have no equivalent in Wikidata
Comments
- Comment This option doesn't handle Wikipedia article references that have no equivalent in Wikidata.--Micru (talk) 22:31, 16 April 2013 (UTC)
- Can you please explain your opposition because this opposition depends on the choice we will do in the next section: if we allow creation of items even if they have no wikipedia article and if we create the necessary properties there is no problem. Snipre (talk) 08:26, 17 April 2013 (UTC)[reply]
- No, my opposition stems from the limitations when anchoring references. Imagine that you want to have a reference to this source represented in Wikidata. With this model the only way to anchor the reference somewhere is to have a statement in an item which comes from this source. With other models you don't need a statement, you can reference the item as a whole. It is not exactly what Wikipedia does (there are anchors for each sentence), but it comes closer to representing the relationships and it could be fine-tuned in the future.--Micru (talk) 11:39, 17 April 2013 (UTC)
- Why would you want to represent a wikipedia reference in Wikidata? Sources ok but why references? References are attached to statements, whether in Wikipedia or wikidata, and don't have much meaning when separated from that statement. Filceolaire (talk) 12:58, 18 April 2013 (UTC)[reply]
- You are right, references are quite useless without an anchor to the statement. Maybe in the future, but not now. I remove my opposition.--Micru (talk) 19:35, 18 April 2013 (UTC)[reply]
- Support References should be attached to statements. They should not be separate items. Filceolaire (talk) 12:58, 18 April 2013 (UTC)[reply]
- Support If I understand this right, this is the way to go. Regarding Micru's comment: as said before, Wikidata should not store Wikipedia references but sources. Wikipedia references can then again access data about sources from Wikidata. --Denny (talk) 14:11, 18 April 2013 (UTC)
- Support This seems like a reasonable solution. However, I would like to add that the source item should be a particular manifestation, and not just the work. Different editions of the same book can be vastly different in number of pages, chapters, authors etc. It's not even inconceivable that different editions of the same book hold contradictory statements, due to new discoveries or the authors simply changing their minds. See for example Programming Windows: first edition and fifth edition. They're actually different books, just on the same topic and with similar content that overlaps to a certain degree. Silver hr (talk) 21:28, 22 April 2013 (UTC)
Some concerns about items being used both as items and as sources
After the discussions of these last days I have a clearer idea about what is wanted and how to do it; however, I still have concerns about regular items used as "source items" (see also my comments on the terminology). These concerns are:
- The self-referencing problem: If there is no difference between items used as "regular items" and items used as "source items", wouldn't it be paradoxical that the same item is used both as a statement holder and as item source for those claims?
- I do not see what is wrong with this. Often, the best available source of metadata for a book is the book itself. --Zolo (talk) 05:35, 21 April 2013 (UTC)[reply]
- The notability problem: if the existence of a source justifies the existence of a regular item, it might be difficult to create useful notability rules (WD:N).
- The splitting problem: an item starts as a work-manifestation mix, and contains information about only one of several manifestations. This item is used as a "source item". Later on, information held by the "source item" is split and now there is an item holding the work information, and several items for each manifestation. The "claim source" is still pointing to the original "source item", but the manifestation information is no longer there.
The last one might be quite rare, and it can be solved with a clever frontend or a splitting tool, or such instances can be queried. The second is a bit tricky... maybe requiring "instance of: source" with a proof-of-existence link for items that have no Wikipedia links? And I wouldn't allow an item to be used as a self-reference.--Micru (talk) 22:41, 20 April 2013 (UTC)
- But then again you might need to introduce a book twice, once as a source item and once as a normal item. And they have the same metadata. Where would authors be? Places of publication? etc. I don't see the advantage of having distinct itemspaces. --Denny (talk) 10:56, 22 April 2013 (UTC)[reply]
- I see your point, Denny. Ideally all version/edition info should be in the same item, but then we have no way of selecting one version or another (there are no sub-item identifiers) and the list could grow too long. Anyway, it is something that needs to be seen working. Maybe the possible problems are not that important, there will be not that many items that need "versioning". --Micru (talk) 13:31, 22 April 2013 (UTC)[reply]
- Another possible option could be that an item represents both the work and the first edition/version. Subsequent editions/versions would have their own item. --Micru (talk) 20:25, 22 April 2013 (UTC)[reply]
- or we could use additional qualifier properties, in the 'reference' snak, to identify the edition i.e. qualifiers for source item, edition, page, quotation with the source item page giving details of title, authors, publisher, etc. Filceolaire (talk) 13:43, 8 May 2013 (UTC)[reply]
The data we have is used, the data we don't have is created
In this case we would create new items as needed to support the reference claim, even if the item has no page in Wikipedia.
Pros
- Keeps machine readability
Cons
- More items needed
- More effort to add sources
- More effort to add supporting items to the source
Comments
- Support Even if it might sound "too much" to create author items just because they appear one time in a reference, I think it is the option that could best keep Wikidata's machine readability, while allowing users the flexibility to cite new authors. Maybe it would make sense to introduce the concept of motivation, but just the number of Wikipedia links may already give an idea of the importance of the ad-hoc created items.--Micru (talk) 22:31, 16 April 2013 (UTC)
- Support We need the name of authors in different scripts etc. And almost all authors should have some external Authority Control ID (or else the relevance of the source could also be debated). --Denny (talk) 14:11, 18 April 2013 (UTC)[reply]
- It might be hard to find an Authority Control ID for authors of scientific papers, unless the same paper is accepted as a valid relevance proof for both, the work and the author. Besides when adding a source it would be a good sanity measure for Wikidators to have item creation for work authors semi-automated. I just thought about that Higgs boson paper with 3,171 authors and a chill was sent down my spine... --Micru (talk) 19:35, 18 April 2013 (UTC)[reply]
- Support This seems reasonable. Silver hr (talk) 21:33, 22 April 2013 (UTC)[reply]
- Comment: how is this different from the "all in" model? Sj (talk) 14:49, 24 April 2013 (UTC)[reply]
- From my understanding, the "all-in" model allows creating source items even if they're not used in references (or otherwise). This model allows creation of source items only if there's a reference that needs them. Silver hr (talk) 19:31, 24 April 2013 (UTC)[reply]
- Comment more effort may be needed to add sources and supporting items but this effort only needs to be done once. While we will need to give the book/periodical title and publisher I'm not clear why we need to give the details of the authors. That info is not needed to find the source information supporting our claim so we can do without it. Filceolaire (talk) 14:48, 8 May 2013 (UTC)[reply]
- Support Reasonable for consistency and usability. We need to resolve how to correctly link the items existing only in Wikidata to corresponding newly created articles in Wikipedia. I think it would be sensible to automatically show information about such Wikidata items in Wikipedia. At least at the beginning, on the Wikipedia new article creation screen (and maybe the redlink screen), we should suggest links to existing Wikidata items with the same or a similar label or alias. --Pabouk (talk) 11:31, 10 June 2013 (UTC)
Dismissed options about how references are stored as items
References as items (3Q model)
Using this model we would make a distinction using properties according to the functionality of the item. The three possible functions of an item using this model would be:
References as items (Q-R-S model)
This model is similar to the 3Q model described above, but instead of relying on properties to differentiate among the uses, 3 different namespaces are used:
Only references for statement sources (1Q model)
The "source" field under the claim would store ALL information needed about the source.
Dismissed options about data stored as sources/references
A different matter is which data can be imported as sources to be used in references.
All-in model
In the "all-in" model, ALL bibliographic data could be imported as sources even if that information is not used by current references.
Nothing-in model
In the "nothing-in" model, all bibliographic data would be linked from external sources (OCLC, Open Library).
The data we have is used, the data we don't have is linked
In this case we would use items that we have as a basis for building the reference, and when Wikidata doesn't have information about that (e.g. an author without a Wikipedia page) we would link external sources representing that author.
The discussion above is relevant to references to books (and maybe videos) but doesn't help with references to web pages.
In my opinion the qualifiers needed to make a reference to a web page as the source of a statement would be:
- Property:URL
- Property:date_retrieved
Property:quotation, Property:Author and Property:Publisher might be added in some cases. Filceolaire (talk) 14:38, 8 May 2013 (UTC)[reply]
- Yes, though these properties are also useful for lots of other things that are published online (i.e. almost all articles, modern reference books, newspaper articles, etc.) -- Phoebe (talk) 08:24, 16 May 2013 (UTC)[reply]
- IIRC, when we had the office hour about sources, we already discussed that all online resources should have a date_retrieved qualifier attached.--Micru (talk) 12:27, 16 May 2013 (UTC)
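The web-page proposal above can be sketched as a small helper that requires the URL and retrieval date and treats quotation, author, and publisher as optional. This is a minimal sketch under stated assumptions: the function name and the dictionary layout are hypothetical, and the keys simply mirror the property names mentioned in the thread.

```python
from datetime import date


def web_reference(url, date_retrieved, quotation=None, author=None, publisher=None):
    """Build a reference dict; URL and retrieval date are required."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("url must be an absolute http(s) URL")
    if not isinstance(date_retrieved, date):
        raise TypeError("date_retrieved must be a datetime.date")
    ref = {"URL": url, "date_retrieved": date_retrieved.isoformat()}
    # Optional properties are only stored when they were actually supplied.
    optional = {"quotation": quotation, "Author": author, "Publisher": publisher}
    ref.update({k: v for k, v in optional.items() if v is not None})
    return ref


# Placeholder URL and author, for illustration only.
ref = web_reference("https://example.org/page", date(2013, 5, 8), author="A. Writer")
```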
Database IDs
We already have quite a few properties that link to online database entries via an ID-string (e.g. MeSH descriptor ID (P486), COSPAR ID (P247), Entrez Gene ID (P351), UniProt protein ID (P352), ...). We need these properties, because many infoboxes include these strings as links to those databases. But what about citing? Should these properties also be used in sources if a claim is made in that database? How many additional fields have to be filled in? Would it be sufficient to say "disease is chronic & Source: [stated in = "Mesh Database" + MESH ID = "123"]"? --Tobias1984 (talk) 18:24, 30 May 2013 (UTC)
- Is it an accepted practice in scientific publishing to cite databases? If so, I think we can do so as well. And I don't see what other properties we'd need other than the ID. Silver hr (talk) 22:34, 30 May 2013 (UTC)[reply]
- Only cite the database if it is a reliable source. This will have to be decided on a case-by-case basis. Remember that some of these databases get their info from Wikipedia or from the subject of the article. Filceolaire (talk) 23:49, 31 May 2013 (UTC)
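The database-citation idea from this thread might look roughly like the following: a reference consisting of a "stated in" value plus the database ID, accepted only for databases judged reliable. The allow-list, the function name, and the dictionary layout are assumptions made for illustration; the "123" ID is the placeholder used in the question above.

```python
# Hypothetical allow-list: which databases count as reliable would be
# decided case by case, as suggested in the discussion.
RELIABLE_DATABASES = {"MeSH Database"}


def database_reference(database, id_property, id_value):
    """Build a 'stated in' + ID reference for an accepted database."""
    if database not in RELIABLE_DATABASES:
        raise ValueError(f"{database!r} has not been accepted as a reliable source")
    return {"stated in": database, id_property: id_value}


claim = {
    "property": "disease is chronic",
    "references": [database_reference("MeSH Database", "MeSH ID", "123")],
}
```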
Edition data is also related to sources/references, especially for books, but other materials might require similar versioning. For instance, a book might have several ISBNs depending on the language or edition number, or it might even have two first editions published in different places.
Option: as qualifiers
With this option a code would be given to each edition to distinguish it from the others. The suggested components are: edition number (where present), language (for works published in more than one language) and publishing place code (in case there are two editions with the same edition number and same language). Then the qualifiers would be added to that edition.
Works with a huge number of editions might be split in other items grouped by language. They would contain the property "edition of".
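To make the proposed edition code concrete, here is a minimal sketch that joins the available components and groups codes by language. The hyphen separator, the function names, and the sample data are all assumptions; the proposal itself does not fix a format.

```python
def edition_code(edition_number=None, language=None, place_code=None):
    """Join the components that are present into one edition code."""
    parts = [str(p) for p in (edition_number, language, place_code) if p is not None]
    if not parts:
        raise ValueError("at least one distinguishing component is required")
    return "-".join(parts)


def group_by_language(editions):
    """Group edition codes by language, as suggested for works with many editions."""
    groups = {}
    for ed in editions:
        code = edition_code(ed.get("number"), ed["language"], ed.get("place"))
        groups.setdefault(ed["language"], []).append(code)
    return groups


# Illustrative data only: two first editions in English from different places.
editions = [
    {"number": 1, "language": "en", "place": "GB"},
    {"number": 1, "language": "en", "place": "US"},
    {"number": 1, "language": "fr"},
]
grouped = group_by_language(editions)
```

The place code is only needed in this example to tell the two English first editions apart.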
Example of edition data as qualifiers
Comments
- Support This might work for most of the cases. --Micru (talk) 13:19, 16 May 2013 (UTC)
- Oppose (originally Support) but we need to figure out how a source on another page can reference an edition. Filceolaire (talk) 15:27, 16 May 2013 (UTC)
- It shouldn't be a problem using the value and qualifier templates from the inclusion syntax.--Micru (talk) 17:28, 16 May 2013 (UTC)[reply]
- See comment below from the hackathon. There doesn't seem to be a simple way to add a reference to an edition unless the edition is a separate item but improvements to the user interface could be created so edition properties can be edited on the original book page. This seems to be the way to go. Filceolaire (talk) 00:39, 1 June 2013 (UTC)[reply]
- Support Much more natural. Pichpich (talk) 02:04, 17 May 2013 (UTC)[reply]
- Support --Aubrey (talk) 09:36, 17 May 2013 (UTC)[reply]
- Oppose That will make it very complex to retrieve data on the client side. 141.6.11.15 15:37, 17 May 2013 (UTC) Snipre (talk) 04:15, 18 May 2013 (UTC)
- Oppose This is a bit of a hack. (1) When viewing the semantic triple on a page under this option you'd get {This page} {edition} {number - language code}. The main problem here is that the object {number - language code} is a string of our own design, trying to be a unique identifier of an edition. The whole point of Wikidata was to move away from this type of semi-structured data. (2) Qualifiers are meant to qualify a value. Storing the metadata of an edition in the qualifier space does not qualify the value (in this proposal the value is {number - language code}). This isn't what the qualifier construct was built for, and we'd be shoehorning and hacking a special data format to fit it. We have better options which I'll outline in the other options. Maximilianklein (talk) 18:22, 17 May 2013 (UTC)
- I admit that using qualifiers is not the "cleanest" solution, but at least serves our purposes until we have a better one. Then it should be possible to migrate to a new method/system. Ideally there should be an "item creation interface" when adding a new edition, then we'd have the best of both worlds, separate items for editions and grouped edition information. Please note that in previous discussions there was some consensus reached about the need of storing manifestation data in a way that can be reused for citation purposes. So all suggestions should be outlined with this goal in mind. --Micru (talk) 11:42, 18 May 2013 (UTC)[reply]
- I don't understand why the solution we determine now should be temporary. It seems to me that your motivation is to group editions in one place, which is why you support this option. However, you should keep in mind that Wikidata's current interface is probably not going to be the final word. Also, the data itself and its semantic structure should be considered separately from its presentation. It seems to me that you want a particular structure of data for reasons of presentation, but that's not a good thing. Presentation can be made in whatever way we want, regardless of data structure--see for example the Resonator. Silver hr (talk) 00:59, 25 May 2013 (UTC)[reply]
- Oppose I agree with Max. This should only be done if every other element of the data is identical, say between two printings of the same book. If there are substantive reasons why one might want to distinguish between two editions, it makes sense to have separate entries for them. Sj (talk) 15:30, 18 May 2013 (UTC)[reply]
- What I think we need, instead, is a way to capture "narrower"/"broader" entries which are related to other entries. So that you could easily identify the 50 editions of a publication, or from a single edition identify its group of sibling editions.
- One of the reasons not to make a special case out of "editions" is that this narrowing/broadening problem happens at many levels of abstraction. So at the work level, you similarly want to be able to see all expressions of that work. And you don't want them to all show up as 'qualifiers' of the entry for that work. Sj (talk) 15:30, 18 May 2013 (UTC)[reply]
- Oppose per Maximilianklein, and also because we should be consistent, and not split works according to some arbitrary number of editions. I frankly don't see the purpose of that. Silver hr (talk) 00:59, 25 May 2013 (UTC)[reply]
Option: each edition as a different item
In this case each edition would have its own item and all properties would be stored as they are now, as base properties. The problems with this system might be the large number of generated items, the difficulty of finding the right edition item later on, and the repetition of some data (author/title might appear in each one).
Comments
- Oppose I would leave this option only for special cases.--Micru (talk) 13:19, 16 May 2013 (UTC)
- Oppose Wikipedias have one article for a book which covers all the editions. It is preferable that wikidata follow the same pattern rather than adding loads of extra items. The problem is how a source statement on another page references an edition statement. Filceolaire (talk) 15:10, 16 May 2013 (UTC)[reply]
- I don't see why we should mirror Wikipedia. Wikipedia, being an encyclopedia, likes to have articles with a decent amount of text in them, even if the articles are about several different but related things. That's understandable because it makes it easier to read. But Wikidata is a separate project, and its basic units are items, not articles, and items are about concepts, no matter how small or large. We don't need to satisfy a quota of text, therefore we don't need to mash together several things into one item. Silver hr (talk) 01:43, 25 May 2013 (UTC)[reply]
- Oppose but as pointed out by Micru, this will always be possible for special cases. Pichpich (talk) 02:03, 17 May 2013 (UTC)[reply]
- Oppose as Micru. --Aubrey (talk) 09:36, 17 May 2013 (UTC)[reply]
- Support A good trade-off for data use on the client side. 141.6.11.15 15:41, 17 May 2013 (UTC) Snipre (talk) 04:16, 18 May 2013 (UTC)
- Oppose This option doesn't reflect the status quo on Wikipedia. Maximilianklein (talk) 16:42, 20 May 2013 (UTC)
- Support I don't see why so many people oppose having "a large amount of items", as long as they're about something notable. The world is a big place with many things, and we aim to record them. It's not like we're going to run out of hard drive space. And as for potential clutter and search issues, that's a user interface problem and should be solved as such. Silver hr (talk) 01:43, 25 May 2013 (UTC)[reply]
- Support It's precise, clean and homogeneous; it will help automation and bots. Data is usually easy to import, and I see no objections. It is needed in a number of cases. Interfaces can relieve the pain of editing; in any case Wikidata's operations are too low-level, so good and efficient editing interfaces are needed anyway. Having just one structure and no special cases to consider will help to build them, so let's choose the most expressive model as a reference. TomT0m (talk) 17:32, 18 June 2013 (UTC)
Option: A separate entry for each expression of a work, and for distinct editions
- Minor editions – multiple editions with the same ISBN (for instance, multiple printings, with tiny corrections) – should have a single entry. Information on the various printings and their date range could be included as a field.
- Each major edition – with separate ISBN, or from separate publishers – should have its own entry. (These are truly different works; they often have different introductions and indexes. See for instance On The Origin of Species.)
- Each expression – the collection of all reprintings and translations by different publishers – should have its own entry. (This is the level of abstraction that usually gets its own Wikipedia article)
- Each work – the collection of all expressions of that work in different formats and contexts – should have its own entry.
This will improve the precision with which people can refer to different abstraction-levels related to a single concept or work.
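The rule for minor versus major editions in the list above can be sketched as a grouping step: printings that share an ISBN and publisher collapse into one entry with a printing count and date range, while a different ISBN or publisher yields a separate entry. The record layout, function name, and sample data are hypothetical.

```python
from collections import defaultdict


def group_into_editions(printings):
    """Merge printings that share ISBN and publisher into one edition entry."""
    editions = defaultdict(list)
    for p in printings:
        editions[(p["isbn"], p["publisher"])].append(p["year"])
    return {
        key: {"printings": len(years), "date_range": (min(years), max(years))}
        for key, years in editions.items()
    }


# Placeholder ISBNs and publishers, for illustration only.
printings = [
    {"isbn": "ISBN-A", "publisher": "Publisher X", "year": 1990},
    {"isbn": "ISBN-A", "publisher": "Publisher X", "year": 1992},
    {"isbn": "ISBN-B", "publisher": "Publisher Y", "year": 2001},
]
entries = group_into_editions(printings)
```

The expression and work levels would sit above these entries as further items; they are left out here to keep the sketch short.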
Comments
- Support as cleanly structured option. Sj (talk) 15:13, 18 May 2013 (UTC)
- I do not think this will produce "too many" entries - the vast majority of works are only published once. We do need a clean way to visualize this FRBR-style tree for a single work, but listing all editions in a single entry doesn't solve that problem. Sj (talk) 15:10, 18 May 2013 (UTC)[reply]
Option: All data in the source section of the statement
All data are directly added in the source section of the statement. This gives no possibility for reuse, but it will be the simplest case for data recovery from the client side. And this will be the most common way to source data with sources which are not a book (scientific or newspaper article, website, report,...)
Comments ...
Conversations about edition data during the Hackathon
While I was there, I brought to the attention of the Wikidata team the problem of how to handle edition data (as outlined by Sj). None of the proposed options seems satisfactory, for the reasons set out above. The option "with qualifiers" doesn't represent the exact semantic nature of a FRBR relationship. On the other hand, having separate items in the current form of Wikidata would mean extra effort for users in item creation and maintenance and, while the data would be machine readable, the interface wouldn't efficiently display the information to potential human readers. Some brainstorming with Daniel Kinzler et al. led to the conclusion that supporting editions as separate items while maintaining the visual clarity of the option "with qualifiers" would require two developments that seem to be easily reachable and that would benefit the Wikidata community at large:
- Expandable information for items: meaning that item A used as property inside item B would have an arrow/option to display some properties of item A without visiting its page. The number of displayed properties could be around 5, with a "show more" option.
- Item creation from another item: meaning that the user wouldn't need to interrupt the process of modification of an item to create another one. It will probably be an applet for item creation combined with an entity suggester (for instance when adding an edition it would suggest "is edition of" the item currently being displayed).
While the time frame for these features is still unclear, I hope we can move towards it.--Micru (talk) 21:23, 28 May 2013 (UTC)[reply]
- Good ideas, and in line with what I've been saying: we should recognize user interface problems as such and solve them as such, instead of modifying the semantic structure of our data just so that it would display a certain way. Silver hr (talk) 22:19, 30 May 2013 (UTC)[reply]
- Let me see if I understand this correctly. Let us assume a Wikidata Item I, about an Expression E. Then you propose that I has Claims {C_0 ... C_n}, where the Property P_j of C_j is edition. The value of C_j is a Wikidata Item I_j. I_j has its own list of Claims, let's call them {D_0 ... D_m}. When viewing the item I, one will be able to see and edit {C_0 ... C_n}, the editions, as well as {D_0 ... D_m}, the properties of an edition that are part of a different item. If that's true, then I would tentatively support this proposal. Maximilianklein (talk) 00:21, 31 May 2013 (UTC)
- That is correct.--Micru (talk) 01:17, 31 May 2013 (UTC)[reply]
- I like this solution. Thank you for thinking so clearly about it, Micru. Sj (talk) 01:48, 1 June 2013 (UTC)
- Would it be possible for someone who really understands this RfC to make a figure of the remaining options. I have read over the options and comments a few times and I can't say I understand all the implications. --Tobias1984 (talk) 16:36, 31 May 2013 (UTC)[reply]
- As I understand it, each edition would get its own item. The user interface would be improved at some time in the future so you can edit a referenced item without leaving the page you are on.
- This will make it easy to refer to a particular edition in a source but will complicate the wikilinks. Each language wiki with an article about the book will have to decide if they should link to the main wikidata page for the book or to the wikidata page for the translation of the book. If they choose the first option then they get lots of langlinks but the syntax for importing statements from the translated edition(s) wikidata pages is more complicated as the infobox template needs the item number for that page.
- If they link to the translation then they get very few links (other languages link to the original or to the edition in their language) but the template can probably get the data from the page for the original book automatically via the link to that wikidata item via a 'translated from' property. Filceolaire (talk) 01:00, 1 June 2013 (UTC)[reply]
- This is a valid concern, but it is an interface problem and should not govern the semantic structure (as Silver hr put it above). With well structured data, the template can be set to do the "commonly desired" result unless an editor specifically requests something different. And on some occasions you do want to do something different: for instance, you might create a wikisource page on "The Origin of Species, First Edition" in which case you want to link to only that edition (and its translations). Sj (talk) 01:48, 1 June 2013 (UTC)
- I agree with Sj that while these are valid concerns, it is more a problem of interface than anything else. In my opinion "edition items" shouldn't be used for holding language links unless the Wikipedia article refers to a particular edition (rare). However we should rethink the book infobox and be able to link to the main work item and optionally to a "featured edition" of that work. All wikipedias will link the same item, but some will feature a different edition.--Micru (talk) 03:20, 1 June 2013 (UTC)
- While it is rare for english wikipedia to refer to a specific edition, this is more common in other languages where infoboxes will have data from the local language edition (title, translator) item as well as from the original book. If the foreign language wikipedia page has a site link to the wikidata page for the original book (to get all those other site links) then there is no easy way for the infobox on the foreign language wikipedia page to automatically find the wikidata page for the translation which has this information about the translation. Editors may have to enter the Q number for the local language edition wikidata page by hand in this template only unless the infobox code is clever enough to figure out the language of the wikipedia and then find the edition number (P393) for that language (it better have a qualifier with the Wikimedia language code (P424)) and follow the link to the wikidata page for that edition to get the extra info.
- Citations will also link to a page in a particular edition. Filceolaire (talk) 09:09, 1 June 2013 (UTC)[reply]
- Yes, the infobox template code should be clever enough to select the right data for the corresponding language version. The standard language property might be enough for that. Otherwise use two item codes for the template.--Micru (talk) 21:07, 2 June 2013 (UTC)[reply]
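The lookup Filceolaire and Micru describe above (match the wiki's language against a language-code qualifier, otherwise take a hand-entered item) can be sketched roughly as follows. This is only an illustration: the dictionary layout, the placeholder item IDs and the function name `pick_edition` are assumptions made for the example, not an actual Wikidata or template API.

```python
from typing import Optional

# Hypothetical, simplified stand-in for a work item. Each edition link
# carries a language-code qualifier, as the thread suggests (P424).
WORK = {
    "id": "Q_WORK",
    "editions": [
        {"item": "Q_EDITION_FR", "language_code": "fr"},
        {"item": "Q_EDITION_DE", "language_code": "de"},
    ],
}


def pick_edition(work: dict, wiki_lang: str, override: Optional[str] = None) -> str:
    """Return the item an infobox should read edition data from."""
    # An editor-supplied Q number always wins (the "two item codes" case).
    if override:
        return override
    # Otherwise look for an edition whose qualifier matches the wiki language.
    for link in work.get("editions", []):
        if link.get("language_code") == wiki_lang:
            return link["item"]
    # No matching edition: fall back to the work item itself.
    return work["id"]
```

With this layout a French wiki would resolve to the French edition item, while a wiki with no matching edition would keep using the work item.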
Discussion
After all these discussions, I think we should summarize how wikidata editors should proceed when sourcing statements. I wrote this draft which might need some discussion. My comments are in red.--Micru (talk) 04:29, 1 June 2013 (UTC)[reply]
- My changes in green on silver Filceolaire (talk) 07:38, 1 June 2013 (UTC)[reply]
- Some comments: (a) even if the item itself is a source, it might need to be verifiable, (b) when in media I wrote "from time-to time" I was referring to the location in minutes/seconds inside the program. Not sure if there is a property that can be used for that, (c) on a second thought, I think it is better to have both book item and edition item on the statement.--Micru (talk) 13:43, 1 June 2013 (UTC)[reply]
Can scientific databases be a separate point on this list? They don't really fall into any of the categories below. If you look at Huntington's disease, you see that it has an OMIM string "143100" which links to this entry "http://omim.org/entry/143100#contributors-shutter" in the OMIM database. The entry has 391 peer-reviewed sources. I think that for certain information it is more accurate to put databases like these into the source field, because a database represents the combined work of many generations of scientists, instead of citing the paper that first came up with a concept and citing the newest one for the most recent numbers/opinions. Would it be a good solution to say "Huntington's disease" & "causes = cognitive decline" & "Source: stated in: OMIM database, OMIM ID 143100"? --Tobias1984 (talk) 14:03, 1 June 2013 (UTC)[reply]
- How did I not see that it is already there :) --Tobias1984 (talk) 14:11, 1 June 2013 (UTC)[reply]
- You mix source and bibliography: a source relates to a specific piece of information, while a list of documents about a topic is a bibliography. We don't need bibliographies in Wikidata. It is the role of Wikipedia to provide documents on a topic. Wikidata works at a more specific level. Snipre (talk) 09:46, 3 June 2013 (UTC)[reply]
- But what is your opinion on citing the database? For me it seems more natural that certain coherent collections of information are the product of many scientific publications. Textbooks distill hundreds of peer-reviewed papers into a coherent form. Citing them is also sometimes more sensible than citing a hundred papers written over decades that each contributed just a part of the total picture. I think what I would like to ask is: does Wikidata aim for primary citations (meaning the first person that ever published the thought in a peer-reviewed journal)? --Tobias1984 (talk) 11:13, 3 June 2013 (UTC)[reply]
- Wikidata aims to provide data and their sources. If you have several sources for the same data (that doesn't mean the same value) you can add them without any problem. But don't forget that you will have to screen them later in Wikipedia in order to retrieve one value (most infoboxes will deal with one value per entry).
- And about citing the database you cited, I am opposed, because the list of documents does not define a value but a topic: causes of the disease, mechanisms, observations, treatments,... That is typically an overview of the knowledge about the disease and not one accurate datum. It is the goal of Wikipedia to provide an overview of the topic. Wikidata can't do that task in the same way: you have to decompose the topic into characteristic parameters and then find in the literature which document speaks about each parameter. Snipre (talk) 11:29, 3 June 2013 (UTC)[reply]
- Let me give you another example. If somebody adds the formulas of Newton's laws, should he cite Newton or just a book about mechanics? What about those cases where the primary citation is unknown? Wikipedia says somewhere that it is a tertiary source built out of information from secondary sources. What about Wikidata? Will only peer-reviewed journals be accepted as sources? --Tobias1984 (talk) 12:18, 3 June 2013 (UTC)[reply]
- Often it is better to source with the first publication, but not always: in many cases theories were developed by other persons or in several documents over time. But there are no rules for that, and you can add the first and the last publication for a topic, or different documents in different languages in order to make it possible to read the source in different languages. This is not the most important thing (it will matter for ranking sources when you have several). What is important is to source at least once with a correct source; later you can add better sources. It will be a task for task forces to define the best sources for each field: for scientific subjects scientific journals will be necessary or at least well-known handbooks, for video games I don't expect to find scientific articles, for catch or soccer if we have sport magazines it will be a dream,.... Snipre (talk) 17:24, 3 June 2013 (UTC)[reply]
I think it is better to have items for scientific, newspaper or magazine articles, these can be reused in wikipedia. Otherwise it is impossible to move the data stored in {{cite doi}} to wikidata. --凡其Fanchy 19:06, 1 June 2013 (UTC)[reply]
- I agree, I added that if the article is supposed to be reused, then an item should be created.--Micru (talk) 20:27, 1 June 2013 (UTC)[reply]
- What is the criterion for reuse ? Because without clear definition it is better to directly create an item. Snipre (talk) 09:46, 3 June 2013 (UTC)[reply]
- I created an example Chenqiao Coup (Q8012976), the source in which can be reused in wikipedia articles. And the descriptions in Wikidata:List_of_properties/Others need to be changed accordingly. page(s) (P304), volume (P478) can also be used in combination with part of (P361) as qualifiers. --凡其Fanchy 15:07, 2 June 2013 (UTC)[reply]
I have updated the guidelines with the suggestions made so far. The older version is at the bottom of the page, marked as archived.--Micru (talk) 03:22, 5 June 2013 (UTC)[reply]
- Good work, I think the concept becomes clearer. In my opinion we have to indicate in the description you updated all relevant properties: people have to have all information in the same page and we have to avoid link to tables located in a different page. Then I proposed to avoid to link the work item with all its edition items: if each edition item is connected to the work item we don't need the double links. And this will keep the work item as clean as possible. Finally I am wondering if we have to classify these edition items in a specific way in order to distinguish them: this can be helpful for some kind of query to look only in edition items instead in work items. 08:24, 5 June 2013 (UTC)
- @凡其. In your example On "Chenqiao Coup" (Q13414893) you define the page property, the date of publication and the issue as qualifiers of the scientific journal in which the article is published. I prefer to move these information as property of the article and not as qualifier of the journal because 1) this will be more simple to extract in a template (less sublevels in data query) 2) these properties refer to the article, they are characteristics of the article and not of the journal. Do you have any concern if we propose to list this kind of data in that way ? Snipre (talk) 08:24, 5 June 2013 (UTC)[reply]
- On second thoughts, maybe the "date of publication" should be a statement rather than a qualifier, because it has its meaning independent of the journal. The volume, issue, and page describe the relation between the journal and the article. I am not sure about the precise meaning of qualifier. If qualifiers can be used to describe the relation, I think it's better to store this info as qualifiers, but if qualifiers can only be used to describe some properties of the journal, they should be moved to properties of the article. Data query seems not a big problem.--凡其Fanchy 12:05, 5 June 2013 (UTC)[reply]
- publication date (P577) has to be put as an article property because scientific articles have a specific date of publication, which is the date when the paper was accepted for publication. Then page is a property of the article and not a qualifier of the journal, because if it is put as a qualifier you can understand that property as the pages of the whole journal or the pages of the whole volume/issue of the journal. Snipre (talk) 00:05, 7 June 2013 (UTC)[reply]
- But an article could be republished by another magazine or book. This is rare for scientific journals but common for literary magazines (see en:Stephen_King_short_fiction_bibliography, and I am sure it is true for many Chinese writers). The "date of publication" can be stored as both qualifier and statement to represent "specific date of publication for that magazine" and "original date of publication for that article". It is better to treat magazine articles in the same way, although literary articles are rarely used as sources. I changed On "Chenqiao Coup" (Q13414893) for comparison with Tectonic map and overall architecture of the Alpine orogen (Q13416617). And for different kinds of pages, just record the number printed on the page. How the magazine uses "page number" should be a property of the magazine. --凡其Fanchy 20:27, 7 June 2013 (UTC)[reply]
- But in that case you have to follow the same guideline as for books: create a work item for the article and 2 edition articles for the publication and republication. If you want to work correctly you have to follow the same procedure as for book editions: if you mix different "article editions" in the same item, how do you define in the statement which edition you used as reference? So your idea is wrong, and it adds complexity where simplicity is highly desirable for data extraction. Snipre (talk) 21:45, 7 June 2013 (UTC)[reply]
Updated the guidelines with base properties for books/editions and part of (P361) as an optional way of indicating that a web page belongs to an existing item representing the web site. The proposed pair edition / edition of to link work-edition needs comments. I recently discovered subclass of (P279) which could be used instead of "edition of", however the conveyed meaning wouldn't be as clear.--Micru (talk) 14:29, 6 June 2013 (UTC)[reply]
Do we recommend translating the item label of a source? This is just to be sure whether we always have to add the properties P357 (P357) and original language of film or TV show (P364). Snipre (talk) 11:56, 7 June 2013 (UTC)[reply]
Web pages
What about web pages that already have an item, like the New York Times? Wouldn't it be better to say "stated in" = "New York Times" and let the URL point to the article? --Tobias1984 (talk) 09:05, 5 June 2013 (UTC)[reply]
- If the item already exists why not but I prefer to avoid to add this kind of data in order to avoid the creation of items on website. The title is better to define this kind of information. Snipre (talk) 09:37, 5 June 2013 (UTC)[reply]
- I agree partially with Snipre, but I also think that if the website being linked already has an item, then it should be possible to link it too. Maybe with part of (P361)?--Micru (talk) 15:47, 5 June 2013 (UTC)[reply]
- Isn't the website of New York Times the online version of the printed New York Times? If so, then it is a newspaper article and better archived in another item. The web pages in the guidelines may need some clarification. Webpages of an organization (Q43229) (or a person (Q215627)) and webpages that show a work (Q386724) published by reliable media (printed or online, like NYTimes, bbc.com, cnn.com) should be distinguished. In the latter case, the articles could appear on many webpages and I think they should be recorded in another item no matter whether they are online or printed. And for webpages of an organization or a person, we can use a new property "stated by", only for info published in some trivial things which cannot be recorded in Wikidata, like personal pages or official sites of an organization. Then it is clear to use "stated in" for sources that are a work (Q386724) which can be (or has been) created as an item of Wikidata, and "stated by" with an optional (mandatory?) URL for sources which cannot be created as an item in Wikidata but are claimed by a person (Q215627) or organization (Q43229). --凡其Fanchy 19:47, 6 June 2013 (UTC)[reply]
- Ok about clarifying that if the article has a printed version, then it should follow the procedure for articles and link the web page from the article item.
- About your reputation argument, I think that if the web site is reputable, then it must fulfill notability criteria (Wikipedia page), which means that there is already an item representing the website.
- I don't think "stated by" the website (organization) is stating the information, the webpage (creative work) does, and that is already represented by the (future) property web page. The web page belongs to a web site, hence the "part of" property.
- About re-using webpages, are there really that many web pages that deserve an item? --Micru (talk) 20:42, 6 June 2013 (UTC)[reply]
- By "stated in" and "stated by", all kinds of sources can be (and could be made as a must) covered and linked by an item( work (Q386724), organization (Q43229) or person (Q215627)). If you find a source in a trivial work and you even don't know who claimed it, it is definitely an unreliable source and shouldn't be used.
- All web sites are controlled by person (Q215627) or organization (Q43229). Their reputation are actually from the person (Q215627) or organization (Q43229). So the web pages should be either representing a work (Q386724) or be a trivial work but known to be controlled by reputable organization (Q43229) or person (Q215627), or they are just unreliable. And "stated by" can also be used for some other trivial work like a pdf format file published only on a website of an organization, for which a Commons media can be used instead of a URL . For instance, this file Administrative_codes_for_Jiangsu can be used as source for China administrative division code (P442) with <stated by> National Bureau of Statistics of China (Q1509446)
- News by Reuters could possibly be forwarded to many places. --凡其Fanchy 13:09, 7 June 2013 (UTC)[reply]
- Can't we simplify ? if the contributor is using the online version of a newspaper, better define it as a webpage and if he is using the hard copy use the article format. The main reason is that online version even is published later in a printed version can be easily modified. For the same reason even if an online version exists for a printed article avoid to put the url: there are some possibilities to see differences between the different format and this can lead to confusion. Snipre (talk) 00:14, 7 June 2013 (UTC)[reply]
- I mean centralizing all URLs and hard copy info in another item ,not avoiding URLs. You can always see the differences. I just think only using a URL as a source is too simple and lacks connection with other items. --凡其Fanchy 13:09, 7 June 2013 (UTC)[reply]
Archived version of the guidelines |
---|
{{box|header=Guidelines for sourcing statements| When to source a statement Wikidata is a collection of sourced data, which means that most statements should indicate the data provenance. In some cases the source requirement can be skipped:
Book
Scientific, newspaper or magazine article
Web page
Trusted database
Media (TV/radio/video)
|
My comments on:
- Books:
- We don't really need an extra item for each edition of a book. Take for example Pschyrembel Klinisches Wörterbuch (Q2115648), a German standard medical reference book. It currently has 264 editions. Should I file a bot request to create them all because I think that this will often be used as a reference? The information about the edition can be better stored in the sources section as part of the source.
- Or maybe we will have a Harry Potter task force. The task force creates items for all persons and places and wants to source them well. Because its members are multilingual, they will source all statements with the book editions they own themselves; there will be items for books in different languages and also several items in the same language. But all the books have (normally) the same content.
- Maybe we can also add informations like pictures of the cover about different editions to the main item and use qualifiers to distinguish them.
- Scientific, newspaper or magazine article:
- Mainly the same things as in the books section.
- Trusted database:
- Very many trusted databases are online, like LCCN or GND, but they are different from a webpage, which is why I propose to use a different sourcing layout. I agree with the authors of this proposed guideline that the point in time (P585) is important for databases. But instead of using a single "database ID property" I propose to use the existing special properties for databases like Library of Congress authority ID (P244) or VIAF ID (P214) and extend them to the list of used, trusted databases. The stated in (P248) is then no longer needed.
- Some general comments:
- If we have so many different source items, it makes search queries more complex. If we want to get all places where a book is used as a reference, we first need to search for all editions of the book and then search for all uses of those editions as a source. That means we need at minimum two more complex queries; otherwise we would only need one easy query.
- Today most of the bots don't add "reliable" sources to their edits. We can't just stop them all, this would kill Wikidata.
- Because Wikidata is a multilingual project we need to translate labels to multiple languages. If we have less items they have often Wikipedia articles, so the Wikipedia article title can be automatically used for the label in the articles language. Descriptions can't be autocreated so often, so they need to be updated manually, this means it makes a big difference between 1 description for each language or many descriptions for each language to describe a book or an article.
- Please don't forget that Wikipedia also started with stubs and not with featured articles. Wikidata should work the same way: first start with not so well sourced "stubs" and then extend them to "featured items". The other way around doesn't work, as the Nupedia project showed us.
Waiting for your ideas and comments --Pyfisch (talk) 14:19, 10 June 2013 (UTC)[reply]
- Note: The system of creating an item for single editions was rejected by a 2/3 majority of the votes; this point has to be changed in the proposed guideline. (Books and article sections) --Pyfisch (talk) 14:32, 10 June 2013 (UTC)[reply]
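The two-step lookup raised in the general comments above (first collect a book's editions, then find every reference that cites one of them) might look like the sketch below. The tables, placeholder item and property IDs, and the function name `statements_citing` are hypothetical and exist only to make the shape of the query concrete; they are not a real Wikidata query interface.

```python
# Hypothetical toy data: edition item -> the work it is an edition of.
EDITION_OF = {
    "Q_ED1": "Q_BOOK",
    "Q_ED2": "Q_BOOK",
    "Q_ED9": "Q_OTHER_BOOK",
}

# Hypothetical references: (item, property, source named in "stated in").
REFERENCES = [
    ("Q_A", "P_X", "Q_ED1"),
    ("Q_B", "P_Y", "Q_ED2"),
    ("Q_C", "P_X", "Q_ED9"),
    ("Q_D", "P_Z", "Q_BOOK"),
]


def statements_citing(work: str) -> list:
    """List (item, property) pairs sourced to a work or any of its editions."""
    # Step 1: find every edition item that points back to the work.
    editions = {ed for ed, parent in EDITION_OF.items() if parent == work}
    # Step 2: scan references for the work itself or one of those editions.
    sources = editions | {work}
    return [(item, prop) for item, prop, src in REFERENCES if src in sources]
```

Under this toy model the extra cost of separate edition items is the first step only; whether that matters in practice depends on the query tooling, which the discussion leaves open.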
- The guidelines are more the result of the consensus than the vote above. Then about your comment an edition item is created only for a source which used in wikidata. So for your 264 editions if only 2-5 are used as sources, only 2-5 items will be created. Snipre (talk) 14:46, 10 June 2013 (UTC)[reply]
- Ok, only the needed items are created, but also 2-5 item per book can create very many items, if many books are used as sources. I am a bit wondering that you just say that the guidelines are "result of the consensus" if there was clearly no consensus before for that. Please don't forget that all user who are interested in this topic want to read this really long discussion. I hope there will be a final vote over the result of the before discussion to get wide consensus for this.--Pyfisch (talk) 15:18, 10 June 2013 (UTC)[reply]
- I agree for the need of a vote but I think the main problem is that this page is unreadable due to so many comments and other discussions which have their place in the talk page. Usually the main is the place for the discussion results not the place where the discussion take place. Snipre (talk) 15:29, 10 June 2013 (UTC)[reply]
- Except this is an RFC page and the main page of an RFC is the place for discussion. Filceolaire (talk) 17:40, 10 June 2013 (UTC)[reply]
- The initial discussion on sources was against creating a separate page for each edition of a book. During the discussion, however, it became apparent that it is not practical to add a source which refers to an edition which is described in a statement. Each edition you want to refer to needs to be in a separate item. And so that is what the recommendations are based on.
- The change proposed by Pyfisch to use the database-specific properties for each database instead of using 'stated in' sounds good to me. Any objections to making this change to the guidelines?
- The Simple Guidelines for Sources was written as a summary of the discussion and was agreed here. It was then posted on Wikidata Chat and there were no objections there either (apart from one minor edit). I think that is as close to consensus as we are likely to achieve.
- Pyfisch's comment on whether or not bots should be allowed to add statements without sources is unrelated to the guidelines for sources. Could this be a separate section please? Filceolaire (talk) 17:40, 10 June 2013 (UTC)[reply]
- I would really like to separate the bots part from this, but Pichpich mostly wants to stop the bots with exactly this proposed guideline, so I can't separate it. See Wikidata talk:Bots#More precise requirements for statement-adding bots and Wikidata:Requests for permissions/Bot/SamoaBot 26. --19:09, 10 June 2013 (UTC)
A few comments on Pyfisch's message:
- Vote: these guidelines need to be voted once we have a definitive version on June 15th.
- Bots: the guidelines have nothing to do with bots. It is recommended to source statements, not forced. Bot imports will need a separate RFC since it is totally out of scope of this one.
- Separate items: The initial thought was that the option "with qualifiers" could be used to reference individual editions. That was later on discouraged, but an alternative offered that would be visually similar to the option "with qualifiers" and which would allow to use individual editions. Some people that voted against using separate items, are in favor if the UI/workflow is improved (including myself).
- Database: agree removing the "stated in" redundancy for databases.--Micru (talk) 21:10, 10 June 2013 (UTC)[reply]
- Against the removal of "stated in" from the guidelines for databases: this information identifies the database more explicitly than a database ID property. Then think of the data extraction in Wikipedia: how do you want to label the source with only an ID and a date? There is no possibility to extract from the database ID property the corresponding database in which it is used. Just have a look at a Wikipedia page using database references: you need the name of the database and if possible a link to the webpage; if the data is online, a link to the webpage containing the data and the access date. Snipre (talk) 21:31, 10 June 2013 (UTC)[reply]
- If it is a structural necessity to display information in Wikipedia, it also could be automatically added without user intervention (a bot adding "stated in" depending on the database or automated by the property itself). I don't think it is a good idea to ask the users to enter redundant information when that can be automatic.--Micru (talk) 22:35, 10 June 2013 (UTC)[reply]
- Ok, if a bot can automatically do the addition of the statement in wikidata. But this has to be done in wikidata and not in wikipedia: to do this in wikipedia you will need a conversion table database ID property -> database name. Snipre (talk) 23:52, 10 June 2013 (UTC)[reply]
- I'm in favor of keeping the "stated in = database" statement per Snipre's comment. We could actually get rid of all the database specific properties if we use "stated in" = "name of database" with another property "ID in database" = "1234" and all we need to create a link would be the basic url that could be stored with the item. I once proposed that idea, but it couldn't gather any supporters. --Tobias1984 (talk) 09:59, 11 June 2013 (UTC)[reply]
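Tobias1984's idea of a generic "stated in" plus "ID in database" pair, with the base URL stored on the database item, can be illustrated with a small sketch. The `DATABASES` table, its keys and the function `reference_link` are hypothetical; the OMIM base URL is inferred from the entry link quoted earlier in this discussion and is an assumption, not a stored Wikidata value.

```python
# Hypothetical per-database records: display label plus base URL.
# The base URL is an assumption inferred from http://omim.org/entry/143100.
DATABASES = {
    "OMIM": {"label": "OMIM database", "base_url": "http://omim.org/entry/"},
}


def reference_link(stated_in: str, id_in_database: str) -> tuple:
    """Return (database label, entry URL) for a database reference."""
    record = DATABASES[stated_in]
    # The entry URL is simply the stored base URL followed by the ID.
    return record["label"], record["base_url"] + id_in_database
```

This also addresses Snipre's point about labelling: a client wiki would read the database name from the same record instead of needing its own conversion table from ID property to database name.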
About bots vs humans: if we decide to systematically waive the sourcing requirements when the statements are added by bots, then within a month or two we will get quite literally millions if not tens of millions of bot-added unsourced statements and the guideline's recommendation to use proper sources will be a farce. It will be impossible to tell human newbies that although 99.999% of the statements they see are unsourced, we do recommend that ordinary humans use sources properly (and probably also recommend that they source the bots' statements!). We'll also get blasted by local Wikipedias who have expressed concerns about using Wikidata's data if it's improperly sourced. I get the feeling that bot proponents believe that bots shouldn't be bothered with sources because it will prevent Wikidata from adding tons of statements. But on the contrary, because bots are able to flood Wikidata with millions of statements in a very short time, we need them to do things right. And it's not impossible. It's slower and it's less convenient for bot operators because you need a separate bot for every reliable database that can be harvested for data. But it's doable and we have time: nine months ago, this project had something like 0 items. Pichpich (talk) 05:33, 12 June 2013 (UTC)[reply]
The final vote is open from June 17 to June 24.
Guidelines
Guidelines for sourcing statements
When to source a statement
Wikidata is a collection of sourced data, which means that most statements should indicate the data provenance. In some cases the source requirement can be skipped:
Book
Scientific, newspaper or magazine article
Report, technical documentation
Web page
Trusted database
If the database is online, see Web page.
Media (TV/radio/video)
Notes
Examples of use
Support
- Support--Micru (talk) 13:24, 17 June 2013 (UTC)[reply]
- Support --Tobias1984 (talk) 13:31, 17 June 2013 (UTC)[reply]
- Perfect. Alexander Doria (talk) 14:23, 17 June 2013 (UTC)[reply]
- Support --Paperoastro (talk) 14:54, 17 June 2013 (UTC)[reply]
- Support--Quico (talk) 16:10, 17 June 2013 (UTC)[reply]
- Support Pichpich (talk) 16:40, 17 June 2013 (UTC)[reply]
- Support--Andrew Su (talk) 16:52, 17 June 2013 (UTC)[reply]
- Support -- Ypnypn (talk) 14:54, 18 June 2013 (UTC)[reply]
- Support --Odejea (talk) 17:15, 18 June 2013 (UTC)[reply]
- Support--Filceolaire (talk) 00:14, 19 June 2013 (UTC)[reply]
- Support --DixonD (talk) 11:48, 20 June 2013 (UTC)[reply]
- Support --Aubrey (talk) 13:24, 20 June 2013 (UTC)[reply]
- Support --Stevenliuyi (talk) 19:03, 20 June 2013 (UTC)[reply]
- Support I see benefits in creating items for books, a bit of a pain, but allows multiple items to use the same source easily. Courcelles (talk) 22:06, 21 June 2013 (UTC)[reply]
- Support Having editions will not add much overhead once the tools are adapted; a regular and clean structure will help bots, and the work will only have to be done once and can easily be automated. TomT0m (talk) 13:00, 22 June 2013 (UTC)[reply]
- Support Snipre (talk) 11:21, 24 June 2013 (UTC)[reply]
- Support That seems workable to me, but we need good bots to convert source indications to the right format rather than just slapping the heads of users who do not follow the guidelines! --Zolo (talk) 16:56, 24 June 2013 (UTC)[reply]
- Support Having a unique identifier per reference (down to the article level) seems the way to go to me. This is consistent with how other bibliographic databases work (e.g. most scientific publishers using digital object identifiers). Managing the large number of items then boils down to designing an appropriate user-interface that can e.g. automatically embed sub-items into their parent(s). This is not a trivial task, but subdividing "atomic" items into smaller parts, seems much more problematic: e.g. I don't see how I would easily indicate a particular printing of a book comes with a specific set of OCLC, publisher, publication data, etcetera, by using qualifiers instead of actually splitting the item. —Ruud 23:42, 24 June 2013 (UTC)[reply]
Oppose
- From a pure usability standpoint, I am extremely concerned about adding an item for any and every source used (at least, books anyway), and I'm not sure those items would fill the need set out in WD:N for "structural" purposes (which itself is meant for items which connect items to other items, as in an "unnotable" [in the en.Wikipedia sense] father linking a son and a grandfather). --Izno (talk) 22:32, 17 June 2013 (UTC)[reply]
- I would tend to support a new namespace for sourcing data (S-namespace), one which is not necessarily confined by many of the rules of types and such that the Q-namespace has about it. --Izno (talk) 02:26, 22 June 2013 (UTC)[reply]
- Weak oppose I agree with Izno: if the above guidelines are adopted, making an entity for every book plus its editions, articles, etc. would clutter Wikidata. I know that this is the only available technology, but couldn't a Wikimedia sister project like Wikisource host the repository of books/articles/web pages instead? Although Wikisource can only use out-of-copyright and public domain sources, how about creating a similar sister project like wiki-spiplet? I just want to separate the Wikidata claims from the source data to avoid cluttering Wikidata. --Napoleon.tan (talk) 23:30, 17 June 2013 (UTC)[reply]
- We already had the discussion about using a different namespace for sources and there was some opposition. The reason is that book data is used by infoboxes too, so it would be redundant or confusing. About using a different repository for storing source information I give you my Support (see also my message about a possible ecosystem of wikidata repositories), however given the small size of the community and the amount of work to be done, I don't see it happening at this point. I must say that by having the sources as separate items it should be easy to do it in the future if there is the need.--Micru (talk) 02:00, 18 June 2013 (UTC)[reply]
- Values can't be separated from their sources; that's a key point for a reliable database, so using another database to store sources is not a good idea. The only good solution would be the creation of a specific namespace S (after Q for items and P for properties) in order to differentiate the type of data. This was not supported, and the above solution is the only one offering good reusability of a source. If we want to distinguish the data item from the source item in a better way, we can define a new property or a new GND type to identify the sources. The item solution is then not an issue if contributors mainly use reference books and not web or newspaper references: by selecting documents with a high data density, the number of items is reduced. Task forces are an important factor in organizing the data import and in selecting and ranking the best sources. Snipre (talk) 07:42, 18 June 2013 (UTC)[reply]
- Weak oppose I think we don't need an item for each journal article. It would be good if sources could also have qualifiers; then it would be sufficient to use the journal as the source and all the other information, like author and DOI, as qualifiers for that.--Saehrimnir (talk) 16:08, 18 June 2013 (UTC)[reply]
- Oppose Absolutely against having an item for every journal article, against having one item per edition. These two points should be simplified further. --Sannita - not just another it.wiki sysop 15:12, 20 June 2013 (UTC)[reply]
- More simplification means no reusability of sources: without items we have to add all source parameters under the claim manually. This is the only possibility if the item option is not chosen. Snipre (talk) 16:35, 20 June 2013 (UTC)[reply]
- To make sources reusable I propose to add all information about editions to the main item using qualifiers. Example: place of publication (P291) → Paris (Q90) (Edition 1) --Pyfisch (talk) 15:14, 21 June 2013 (UTC)[reply]
- But you will need to add the publisher as a qualifier, because the edition is not enough for some books which were published several times by different publishers. Then you have to do that for different properties: place of publication, date of publication, editor, ISBN, publisher... And this will save a certain number of items, but what do you propose for articles, reports, media? Just to give you an idea, I am using right now one scientific article to source more than 300 statements, so I am really interested in your solution for that case. Snipre (talk) 19:19, 21 June 2013 (UTC)[reply]
- and when you have done all that and entered all the edition data as qualifiers you will find there is no way to link to that edition from a source because you can only link to items. This is why I switched from supporting 'each edition a statement' to 'editions used in sources are items; other editions are statements in the original book page' - as is shown in the discussion at the top of the page. --Filceolaire (talk) 22:17, 21 June 2013 (UTC)[reply]
- @Snipre: First of all, I'm really sorry for pointing out these problems only now, but in the last month I literally worked my *** out - just think that in the last week I've worked up to 14 hours a day - so I really haven't had the time to say anything in this RfC. I don't like to be the guy who doesn't participate in a discussion and later comes and makes disruptive comments, but this time I had no choice.
- That said, I still don't think that having one item for every edition of a book is a good idea. Pyfisch gave an example of this book, which currently has two hundred and sixty-four editions... do you really think we can afford to make two hundred and sixty-four items, one per edition? I think this is just barely manageable. In all my years, I've also quoted some information from very well known novels, like The Count of Monte Cristo. Just imagine how many editions and translations of this novel are out there... And for the sake of sanity, I'm not even going to start talking about single articles.
- I don't see the point in having a separate item, and I don't see a clear consensus about that - but probably I'm missing something of this very long discussion. So if someone, please, may explain to me why we should adopt this solution, I'd be more than delighted to hear, and possibly we can discuss a new solution.--Sannita - not just another it.wiki sysop 12:07, 22 June 2013 (UTC)[reply]
- Hmm… Even though some books have up to several hundred editions, only a few of them are likely to be used. The main idea behind this system of references would be to create new editions as they are needed: statistically, the community is not large enough to make regular use of more than a few editions per work. Alexander Doria (talk) 12:30, 22 June 2013 (UTC)[reply]
- @Sannita. I don't want to redo the discussion of the last 2 months, but there are only 3 possibilities for sourcing: 1) all source parameters in the source section of the statement, but this means no reusability; 2) items for work and editions, full reusability but medium complexity to manage the system; 3) 1 item with all editions data, simple for Wikidata but a very complex system to extract source data in Wikipedia when displaying source information. Then, as said, the idea for editions and articles is to allow only references used as sources in statements. Snipre (talk) 01:59, 23 June 2013 (UTC)[reply]
- I missed the part about sources only for Wikidata. I think it would be a mistake to strictly enforce this policy, as it would not allow Wikidata to be used to source Wikipedia articles, which could be a great improvement and a chance for sources on Wikipedia. TomT0m (talk) 11:22, 23 June 2013 (UTC)[reply]
- @Snipre: You are aware of big items?! If only the editions that are needed for sources are added, the items will not grow so big, so we will not normally reach 300 statements per item. And if we have more statements, this is a smaller problem than the problems with extra items: how to translate all edition labels and descriptions without more human work? And how to find the correct item among many edition items? Note: I miss rules about edition naming. --Pyfisch (talk) 16:27, 24 June 2013 (UTC)[reply]
- Strong oppose per Sannita and Izno. I proposed a different solution above. --Pyfisch (talk) 15:14, 21 June 2013 (UTC)[reply]
- Strong oppose per Sannita and Izno. I think that in the long term this solution would be completely unworkable. Sven Manguard Wha? 23:26, 21 June 2013 (UTC)[reply]
- Also Weak oppose per Sannita and Izno, this would just turn into a big mess :/ ·addshore· talk to me! 00:00, 22 June 2013 (UTC)[reply]
- The Open Library has been using that system for years, and OCLC is moving in that direction too. This is not a new solution, just following the steps that other organizations with much more experience in the field of book cataloguing have taken before. The main challenge is going to be to create some interfaces to make the process easier.--Micru (talk) 19:25, 22 June 2013 (UTC)[reply]
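To make the options debated in this section concrete, the following Python sketch contrasts inline source parameters (option 1 in Snipre's list) with separate work and edition items that references link to (option 2). It is only an illustration under stated assumptions: the identifiers `Q_WORK` and `Q_EDITION`, the property names, and the labels are hypothetical placeholders, not real Wikidata items or properties.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical, simplified item: an ID, a label, and property -> value claims."""
    qid: str
    label: str
    claims: dict = field(default_factory=dict)


# Option 2: one item for the work, one item for the edition actually cited.
work = Item("Q_WORK", "Example work", {"author": "Example Author"})
edition = Item("Q_EDITION", "Example work (2nd edition)",
               {"edition_of": work.qid,
                "publisher": "Example Press",
                "place_of_publication": "Paris"})
items = {item.qid: item for item in (work, edition)}

pages = ("10", "11", "12")

# Option 1: every reference repeats all source parameters (no reuse).
inline_refs = [{"title": "Example work", "author": "Example Author",
                "publisher": "Example Press", "page": p} for p in pages]

# Option 2: every reference only links to the edition item and adds the page.
linked_refs = [{"stated_in": edition.qid, "page": p} for p in pages]


def resolve(ref: dict, items: dict) -> dict:
    """Expand a linked reference by following edition -> work."""
    ed = items[ref["stated_in"]]
    wk = items[ed.claims["edition_of"]]
    return {"title": wk.label, "author": wk.claims["author"],
            "publisher": ed.claims["publisher"], "page": ref["page"]}


print(resolve(linked_refs[0], items))
```

In this sketch a correction to the publisher is made once on the edition item under option 2, whereas under option 1 it would have to be repeated in every reference. The cost, as the opposing comments note, is the extra items to create and manage.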
Comments
- Tend to support, but I am wary that creating two items (one for the work and one for the edition) for the many, many books that only have one edition will confuse many users. One simple piece of advice to make the distinction a bit clearer would be to add the edition number in the label, like here.
- The edition info for the first edition is included on the page for the original book. Only subsequent editions (if they are used as sources) need additional pages. --Filceolaire (talk) 22:25, 21 June 2013 (UTC)[reply]
- So this means Q13416128 is incorrect ? This is about the first (and, I think, only) edition of q13416074. --Zolo (talk) 06:17, 22 June 2013 (UTC)[reply]
- If we want a coherent model then the division should be for all editions. In case of books only with a Wikipedia article, some other Wikipedia might want to feature an edition in their language. In the case of books with only one edition, it is hard to know if there are or there will be more editions.--Micru (talk) 19:18, 22 June 2013 (UTC)[reply]
- I think depicts (P180) should be a recommended property to describe the topic (keywords on semantic steroids)
- category's main topic (P301) seems more appropriate ("depicts" is for pictures).--Micru (talk) 19:18, 22 June 2013 (UTC)[reply]
- (Sorry, I should have added these comments earlier.) --Zolo (talk) 17:40, 18 June 2013 (UTC)[reply]
- P301 is currently for categories (the word "category" was dropped from the label, but I think it is an oversight, as it was not discussed on the talk page). I do not think it makes sense to have a different property for "depicted in the image" and "depicted in the text", even though that may make the label a bit awkward. And sometimes a document has both text and images. So I really think depicts (P180) is appropriate. That said, it may make sense to merge it with P301, though keeping them separate avoids mixing up Wikipedia maintenance with normal content. --Zolo (talk) 07:21, 23 June 2013 (UTC)[reply]
- You were right, the word "category" was removed from P301 (fixed), that is why I was confused about its potential use. In that case we might need a property "topic of creative work" (as you suggest in the talk of P301). --Micru (talk) 15:07, 23 June 2013 (UTC)[reply]
- Question How would you cite working papers and articles that are not yet published ? Given the long time that publishing often takes, it may sometimes make sense to use them. For instance, en:Age of the universe makes use of the third draft of a paper, and I suppose that they do so because it is more up to date and accurate than anything officially published. In this case, one solution could be to just provide the Arxiv number, but it would not provide much info. --Zolo (talk) 18:54, 18 June 2013 (UTC)[reply]
- An unpublished article in a peer-reviewed journal or an unpublished book can't be used as a source: the information and data can change until the final approval, leading to inconsistent data. Snipre (talk) 13:44, 20 June 2013 (UTC)[reply]
- In this case, there is also a permalink to this version of the draft. --Zolo (talk) 07:21, 23 June 2013 (UTC)[reply]
- Where will the guidelines be published? On the Help:Sources page? Snipre (talk) 13:44, 20 June 2013 (UTC)[reply]
- It looks like a good place if there are no drawbacks to replacing that page.--Micru (talk) 19:18, 22 June 2013 (UTC)[reply]
- For scientific journal articles, what about those that do not have page numbers but an article ID (e.g. at PLOS)? --Daniel Mietchen (talk) 20:27, 20 June 2013 (UTC)[reply]
- You put the parameters you have: if no pages are used in the "journal", just use the property page with the "no value" option. Snipre (talk) 19:07, 21 June 2013 (UTC)[reply]
- How will labels and descriptions be distributed between all editions?--Pyfisch (talk) 15:15, 21 June 2013 (UTC)[reply]
- Put the same name for the label and give the edition number in the description. You can add the edition number in brackets in the label if you want. See the example for the book. Snipre (talk) 19:07, 21 June 2013 (UTC)[reply]
- Comment about authors: Books are not actually reliable sources for their authors. Pseudonymous books are but one example of the possible complexities. The source for an author being associated with the book is the cataloging record by a national authority, not the book itself. (The most serious difficulty is the same one that it has been since the 19th century: the real or nominal authorship of a book by an organization or, more recently, by a collective. There remains no simple solution for this; there are multiple levels of "authorship". To make clear that the term authorship is ambiguous, the term used in modern cataloging codes is the very clearly multi-meaning term "Responsibility".) DGG (talk) 03:30, 22 June 2013 (UTC)[reply]
- This kind of discussion is better in the talk page of the property author. Snipre (talk) 10:24, 22 June 2013 (UTC)[reply]
- Books are reliable sources for the text that is printed on them (which doesn't mean that it is right). A "responsibility" property could be useful for those cases where claimed authorship and real authorship differ.--Micru (talk) 19:18, 22 June 2013 (UTC)[reply]
- Hi, can we get this translated? This RfC will have wide implications, so at least the final guidelines should be translated to make sure the final vote is accessible to all. If you need help, please post at Wikidata:Translators' noticeboard. Thanks, Legoktm (talk) 05:22, 24 June 2013 (UTC)[reply]
- The voting for this version finishes today; in any case it should be possible to modify it later on if there is consensus or the need to do so. I think what is needed is to explain this source model. The best would be to move the guidelines to Help:Sources, translate them there, and also write a blog post explaining the reasons for this model, what you can do with it, addressing concerns, etc. Most of the opposition I am seeing is because of interface issues, which can be solved by having a gadget to create/link all necessary items at once, and "too many items", which could be solved by having "S" entities... I can start writing the draft, but I will need help to make sure that it is clear and then also later on to translate it. --Micru (talk) 16:14, 24 June 2013 (UTC)[reply]
- Since there are important disagreements on how to use journals and books, could we at least adopt a guideline that reflects what we agree about and what is still undecided? There doesn't seem to be disagreement around the idea that most statements should be sourced and no disagreement on how to handle websites, technical documentation and trusted databases. I would be happy with a guideline that includes all of that, explains the remaining disagreements on books and journals, and points to the current discussion on the topic. Pichpich (talk) 04:14, 25 June 2013 (UTC)[reply]
- We all want a guideline that provides a solution for every source format. But let's be realistic and pragmatic. It could take a long time to resolve the issues on books and journals. Until that happens, we can choose to have no guideline at all or we can choose to have a partial guideline. You are correct in saying that a guideline is critical but this is precisely why a temporary guideline is better than no guideline. Pichpich (talk) 14:03, 25 June 2013 (UTC)[reply]
I have opened a new RfC to address the issues that were brought up during the final vote. The new discussion is here: Source items and supporting Wikipedia sources. As for the "Guidelines for sourcing statements", Legoktm considers that this RfC has to be left open just in case there are more comments before moving the guidelines to Help:Sources. --Micru (talk) 23:09, 25 June 2013 (UTC)[reply]
- Personally I would prefer to move the Guidelines to Help:sources and the discussion to Help Talk:Sources now since the vote has clearly approved this. Filceolaire (talk) 15:59, 26 June 2013 (UTC)[reply]