Wikidata:Property proposal/most populous settlement
most populous settlement
| Field | Value |
|---|---|
| Description | city, town, or other settlement with the largest population in this area (country, state, county, continent, etc.) |
| Represents | largest city (Q51929311) |
| Data type | Item |
| Domain | geographic region (Q82794) |
| Allowed values | instances of human settlement (Q486972) |
| Example 1 | United Kingdom (Q145) → most populous settlement → London (Q84) |
| Example 2 | Maryland (Q1391) → most populous settlement → Baltimore (Q5092) |
| Example 3 | South America (Q18) → most populous settlement → São Paulo (Q174) |
| Planned use | replacement of statements modeling the inverse relation with instance of (P31)/of (P642) |
| See also | capital (P36) |
Motivation
The most populous settlement in a territory is a very commonly cited piece of information that is not well modeled at present. With no specific property, the information can only be expressed using a qualifier, e.g. London (Q84) → instance of (P31) → largest city (Q51929311), qualified with of (P642) → United Kingdom (Q145).
Once the need for a specific property is accepted, the only question is the direction of the relation, and I think the answer is clear: as with capital (P36), this information is best modeled by linking the parent territory to the settlement, rather than the other way around, since this enables use of a single-best-value constraint (Q52060874) when there are multiple values corresponding to different dates or measurement methods.
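To make the direction argument concrete: when the statements live on the territory, all "most populous settlement" values for one territory sit together on one item, so a per-item check like the single-best-value constraint can be applied to them. The Python sketch below is a simplified model, not Wikidata's actual constraint checker. It assumes the constraint means "one value, or several values of which exactly one has preferred rank", reduces ranks to "preferred"/"normal", stands in plain years for start time (P580) and end time (P582), and uses hypothetical item IDs (`Q_A`, `Q_B`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    """One hypothetical 'most populous settlement' statement on a territory item."""
    value: str                        # settlement item ID (hypothetical here)
    rank: str = "normal"              # simplified: "preferred" or "normal"
    start_year: Optional[int] = None  # simplified stand-in for start time (P580)
    end_year: Optional[int] = None    # simplified stand-in for end time (P582)


def satisfies_single_best_value(statements: list[Statement]) -> bool:
    """Simplified single-best-value check: a single value always passes;
    several values pass only if exactly one of them is marked preferred."""
    if len(statements) <= 1:
        return True
    return sum(1 for s in statements if s.rank == "preferred") == 1


def best_value(statements: list[Statement]) -> Optional[str]:
    """Value a consumer such as an infobox would display, or None if ambiguous."""
    preferred = [s for s in statements if s.rank == "preferred"]
    if len(preferred) == 1:
        return preferred[0].value
    return statements[0].value if len(statements) == 1 else None


# Hypothetical territory whose most populous settlement changed in 1950.
history = [
    Statement("Q_A", rank="normal", end_year=1950),
    Statement("Q_B", rank="preferred", start_year=1950),
]
print(satisfies_single_best_value(history), best_value(history))
```

With the inverse modeling (settlement → instance of → largest city, qualified by of), the competing values would be scattered across different settlement items, so no single item carries the set that such a check needs.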
A similar proposal was rejected 11 years ago, on the basis of 1) the vagueness of the word "largest" (which the present proposal does away with), and 2) the idea that the most populous settlement of a territory should be inferable from population (P1082) statements, and thus doesn't warrant a property. However, that naively assumes complete information and consistent measurement methodology, and does not allow for any qualification or referencing regarding the "most populous" determination. – The preceding unsigned comment was added by Swpb (talk • contribs) at 15:41, 7 October 2024 (UTC).
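The inference the earlier rejection relied on amounts to taking, among the settlements in a territory, the one with the largest population (P1082) value. This is what a query "sorted by population, limit 1" does. The minimal Python sketch below assumes the populations have already been fetched into a dict, with `None` meaning "no population statement". The function name and all place names and numbers are hypothetical. It shows why incomplete data gives a confidently wrong answer instead of no answer.

```python
from typing import Optional


def naive_most_populous(populations: dict[str, Optional[int]]) -> Optional[str]:
    """Return the settlement with the largest recorded population.

    Settlements without a population statement (None) are silently skipped,
    just as a required population triple pattern in a query would drop them.
    """
    known = {name: pop for name, pop in populations.items() if pop is not None}
    if not known:
        return None
    return max(known, key=known.get)


# Hypothetical area: the large village simply lacks a population statement.
area = {"hamlet_a": 4, "village_b": None}
print(naive_most_populous(area))  # picks hamlet_a, because village_b is unknown
```

Nothing in the result signals that the answer rests on missing data. A stored statement, in contrast, can carry a reference and qualifiers explaining how the determination was made.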
Discussion
Notified participants of WikiProject Cities and Towns
- Support This is useful for infoboxes and makes it possible to keep track of which city is the largest over time. Dexxor (talk) 15:58, 7 October 2024 (UTC)
- Support, good idea! Pallor (talk) 21:35, 7 October 2024 (UTC)
- Support While this information could be derived from a query, with WDQS struggling it seems better to provide a shortcut. Vicarage (talk) 19:31, 14 October 2024 (UTC)
- Conditional support Only when a Wikipedia infobox must use this property (please name the template and wiki). Otherwise why not use WDQS? Midleading (talk) 03:56, 15 October 2024 (UTC)
- You can't have a shortcut designed to reduce WDQS stresses and then clutter it with use qualifiers that burden the system more. Vicarage (talk) 05:18, 15 October 2024 (UTC)
- There are only 94 items like London (Q84) → instance of (P31) → largest city (Q51929311), qualified with of (P642) → United Kingdom (Q145). I can't see how these 94 items clutter WDQS compared to 100,000,000+ other items. Also, only 2 of them have references, so this property could be 80% unreferenced if created. Also note that the value can be inferred from population (P1082) statements via WDQS, which are more than 95% referenced and have more than 3 million uses. So what I see contradicts what is claimed: there is complete information at population (P1082), and about 95% of it has references, but this proposed property can't provide comparable quantity and quality at all, and can't even provide coverage of every country in the world (unless it is filled by a bot that infers the information from WDQS). I would just suggest removing these instance of (P31) statements altogether, if no wiki and nobody is using them. Midleading (talk) 07:55, 15 October 2024 (UTC)
- Of course it could be derived by a WDQS query for all settlements in an area sorted by population, limit 1. And indeed it should be populated that way, without needing references, to save everyone running such an expensive query again. of (P642) is being removed, of course. WD has many shortcuts to data that could be derived in other ways; this seems a sensible addition. Vicarage (talk) 11:13, 15 October 2024 (UTC)
- I prefer running a query every time. It ensures users get the latest data rather than unmaintained, outdated and incomplete data added by P642 users. Why trust a bot running the same WDQS query only once a year when I can do it myself? Midleading (talk) 05:44, 19 October 2024 (UTC)
- Speed and ease of use. It's not as if settlements are jostling for position, and the vagaries of boundaries mean it's meaningless to say Leeds overtakes Bradford. You are very keen on the mul project to improve performance, so it seems odd not to be concerned about it here. I guess the real arbiters are the Wikipedia infobox maintainers, who will decide whether a slower WD-derived answer is better than a static hard-coded one. And offhand, I'd struggle to write a query to do all the selecting, counting and limiting, when using the property would be trivial. Vicarage (talk) 06:31, 19 October 2024 (UTC)
- Wikipedia can't use WDQS at all. Data must be directly or indirectly linked from the item of the article subject; inverse links are not supported. So it is acceptable to create a property for use on Wikipedia even if WDQS can also be used. But in this case the proposed property is hard to use on Wikipedia, because the Wikipedia article on United Kingdom (Q145) can't find the inverse link to London (Q84), and London (Q84) doesn't need to link to United Kingdom (Q145) in its infobox in this way either. Another problem is that Wikipedians want to keep data on the wiki page rather than on Wikidata, for various reasons. A lot of articles are still written as if COVID-19 never happened and Russia never started a war in Ukraine, and Wikipedians think it's better that way. So, if no Wikipedia really wants to use it, there is no need to create a property that nobody uses and that again requires WDQS to find the inverse link. Midleading (talk) 17:36, 19 October 2024 (UTC)
- Oppose The main reason for rejection last time was that this is queryable, and that is still the case. In essence, this is redundant information. The argument that the data has not been added to Wikidata applies just as much to this property, and instead of adding this statement, the energy could be spent on adding good population statements. Ainali (talk) 12:08, 15 October 2024 (UTC)
- In practice, I would want this property to be usually used with start time (P580) and have that as a required qualifier. That information is a bit more complex to get with WDQS and thus more useful to actually store.
- In general, usage in Wikipedia templates does matter. Especially for smaller wikis, it might be a way to get useful data from Wikidata. ChristianKl ❪✉❫ 12:14, 15 October 2024 (UTC)
- Thinking about it more, Oppose in the present version. London (Q84) is not the most populous settlement in the UK; Greater London (Q23306) is. Any property whose proposer's own examples are wrong is likely to be misunderstood in practice. urban agglomeration (Q159313) is a subclass of urban area (Q702492), which is a subclass of human settlement (Q486972). The term settlement therefore includes entities that aren't cities, such as Rhine-Ruhr Metropolitan Region (Q164903). I don't think our items for urban agglomeration (Q159313) are currently reliable enough that simply running a query sorted by population with limit 1 would do the job. ChristianKl ❪✉❫ 12:27, 15 October 2024 (UTC)
- Yeah, WDQS doesn't always work in such a complex world (for example, why is London the settlement, rather than Greater London or Great Britain?). The only reliable way is to find a good source as a reference, and that way it can also be used on Wikipedias. But I doubt anyone is going to do that after this property is created. Midleading (talk) 17:22, 15 October 2024 (UTC)
- A more reliable way would be to create a "most populous city" or "most populous urban area" property instead of one about settlements. ChristianKl ❪✉❫ 22:12, 15 October 2024 (UTC)
- Greater London is not a settlement; it's an urban agglomeration, a type of region, which would not be a valid value for this proposed property. And "most populous city" won't work for regions nested in regions, as they all are. Vicarage (talk) 23:13, 15 October 2024 (UTC)
- By the same logic, London wouldn't work because it's a city and not a settlement. In this context I would take settlement to mean an instance of (P31) of a subclass of human settlement (Q486972). If you try to identify settlements by looking at the earth in a satellite image, you see an urban area (or urban agglomeration), not a city. A city is an entity with boundaries that are set by law.
- The term settlement is the superclass that covers both entities whose boundaries are defined administratively and entities that you would identify by looking at a satellite image.
- If you go to https://en.wikipedia.org/wiki/List_of_largest_cities you find a list where the item we have on Wikidata for a city usually corresponds to the "city proper" column. You can see this from the fact that the item for that city will have the population given in that column. On Wikidata, we have separate items for the urban area (and, in cases where a single city with its surroundings forms one, for the urban agglomeration). For good fun, there's also metropolitan area (Q1907114). ChristianKl ❪✉❫ 17:42, 16 October 2024 (UTC)
- Good point that needs to be addressed. Perhaps we can solve the ambiguity of the term "city" by enforcing the use of qualifiers like criterion used (P1013)city proper (Q5124027) and criterion used (P1013)metropolitan area (Q1907114) if it makes a difference. Dexxor (talk) 06:56, 17 October 2024 (UTC)
- As far as I know, within Wikidata we don't have ambiguity about how the term city is used. It means an administrative entity. Its boundaries are the boundaries of the administrative entity, and its population is the number of people who live within those boundaries. We then have separate items for the urban area (Q702492) and metropolitan area (Q1907114), and someone who wants to know their population (P1082) can look at those items.
- I tried to run a few queries and, interestingly, they often time out. However, I found Maryland (Q1391) → most populous settlement → Washington metropolitan area (Q2367175). ChristianKl ❪✉❫ 12:23, 20 October 2024 (UTC)
- Support Valuable information, useful to make readily accessible. I would really like to see this for eg all civil parishes in the UK, to identify the most significant population centre in the parish. The objections above that "this information is queryable" don't cut it for me, for at least ~~two~~ four reasons:
- (i) in many environments where this would be useful, eg the infobox of a Commons category, queryability is not available;
- (ii) there are queries where one might want to return this as convenience information for a large number of rows: if such a query had to compute the largest population centres, as opposed to just looking them up, that would not only require writing a query that is appreciably non-trivial, it would also be a query that may not complete;
- (iii) the information can only be obtained by query if we have complete population information to query over. But as eg query https://w.wiki/Bczb for UK administrative areas reveals, this is not so. Bardsey Island (Q1991483) (population: 4) is not the most populated place in the civil parish Aberdaron (Q2556821) -- the most populated place is the village of Aberdaron itself, but unlike the island we don't have a population statement for the village. Worthwhile, therefore, if we can import this information from an appropriate reliable source (eg the full UK census) that does have complete coverage, or otherwise from a source that we can depend on;
- (iv) historic information is also of value -- what used to be the most significant population centre (but has since been surpassed) -- with a property, and an 'end time' qualifier, we can record this.
- IMO, if well-sourced, the information is likely to be pretty stable over the long term, so the data denormalization involved seems acceptable to me. -- Jheald (talk) 17:06, 21 October 2024 (UTC)
- I was already thinking of writing about other aspects, but this post came in handy. The negative comments are on a rather narrow range of issues, whereas the proposed property has a much wider range of uses. It can apply to any administrative unit in the world, however small and insignificant, within which the seat and the most populous ~~municipality~~ settlement may differ. It can also be used to record historical data for administrative units where the most populous ~~municipality~~ settlement has changed over the ages. So I can see this property being useful in many more places than London. Pallor (talk) 17:23, 21 October 2024 (UTC)
- @Pallor the proposed property is not named "most populous municipality", and frequently the most populous settlement is not a municipality. If you actually want a property to store the "most populous municipality", creating a property called "most populous municipality" would be better than the proposed approach.
- If one person expects the property to contain the "most populous municipality" and another expects it to actually contain the "most populous settlement", users of infoboxes are going to be unhappy. I don't understand why you would want that mess, if we could instead create a "most populous municipality" property where most users likely have a good intuition of what values it will contain. ChristianKl ❪✉❫ 09:56, 23 October 2024 (UTC)
- It was a bad translation that I didn't notice. Thanks for telling me, I fixed it. Pallor (talk) 10:26, 23 October 2024 (UTC)
- If you prefer using "most populous settlement" over "most populous municipality", why do you dislike having the property as "most populous municipality"? ChristianKl ❪✉❫ 12:04, 23 October 2024 (UTC)