Wikipedia talk:Wikipedia Signpost/2016-04-24/Op-ed: Difference between revisions

Revision as of 01:22, 27 April 2016

Discuss this story

This is the best-grounded look at the whole Heilman affair since it began, aided of course by the digging you folks at the Signpost have done and by the addition of the actual email chain between Wales and Heilman.

What a tale of technical overreach, fiduciary irresponsibility, behind-the-scenes machinations, treachery and duplicity!

Magnificent wordsmithing by Andreas Kolbe. →StaniStani 00:10, 25 April 2016 (UTC)

My compliments on another excellent piece of work, Andreas. You should really try to get these articles more widely distributed. -- Seth Finkelstein (talk) 01:28, 25 April 2016 (UTC)

Wow, usually when someone says that the other party "took things out of context", I assume that they mean that the discussion before and after the quoted section would lead to a different interpretation. I didn't think that Wales had literally cherry-picked sentences out of a long discussion to make both sides of the discussion look radically different from what they were. I really don't understand why Wales and the WMF have been so ridiculous about this whole thing: they had an obvious problem, came up with an ambitious solution, it turned out that they couldn't really do it, and... they now feel the need to lie, cast aspersions and throw people under buses for it. Guys, if you want to be a big-shot "tech company in the field of education/charity", then you need to take tech company 101: not every neat idea you have works out, and the takeaway is to learn from it, not fire everyone who disagreed. --Pres N 01:35, 25 April 2016 (UTC)

Nice work, Andreas. Carrite (talk) 01:50, 25 April 2016 (UTC)

Sorry, but this is just a bunch of misconceptions. A query dialog engine is not a Google competitor; it is not even close. (Why do I waste time reading this? ;/) Jeblad (talk) 06:21, 25 April 2016 (UTC)

Excellent work, Andreas. It is clear that ambitions went far too high. Chiswick Chap (talk) 07:04, 25 April 2016 (UTC)

These are Wikimedia movement resources, and the WMF is simply a steward of them. What I am currently requesting is disclosure, in plain English, of our strategy and goals, rather than a full-scale consultation. Also, those most involved in a conversation are typically some of the most informed; I agree with DocJames on this 100%. In my view we are not painted on the wall (we put in hours of work; logic dictates we should have a voice). While Jimbo Wales's general idea is great, it's a matter of "whether the end justifies the means" (lack of transparency). NO. --Ozzie10aaaa (talk) 11:44, 25 April 2016 (UTC)

Of course. WMF prepares 35 million dollars for a "knowledge engine" but can't spare a couple thousand for digitizing public domain materials in "the global south" or "developing communities" or whatever their term is now. Priorities, priorities... and the people who speak out get shafted. — Chris Woodrich (talk) 15:51, 25 April 2016 (UTC)
- Chris, more to the point, one might weigh up some of the spending on physical meetups, trinkets, and carbon-intensive travel and accommodation, against clearly high-impact tasks such as digitizing. Just my 2 cents' worth. Tony (talk) 15:58, 25 April 2016 (UTC)

Since everyone has (rightfully) praised Andreas for this penetrating article, I'd like to ask a possibly stupid question: what would this proposed search engine offer a user that isn't already available? Besides the usual search function, there are hyperlinks between articles, similar articles are grouped into categories, & similar materials on different projects (viz. Commons or Wikisource besides Wikipedia, or even other-language Wikipedias) have links in the article. And when Wikidata matures sufficiently, that will provide a means to search for material between projects. And while improvements to the search function could be made, it will help a user to mine Wikipedia for all related information. So if I want to know what the Wikimedia projects have about Tom Cruise or Queen Elizabeth II, it's not that hard to find it all at present. Far easier than the library card catalog (or Reader's Guide to Periodical Literature) I had to rely on as a student decades ago. So what would a search engine offer that a user doesn't currently have -- or is likely to have in the not so distant future? -- llywrch (talk) 17:09, 25 April 2016 (UTC)
- The idea that simply duplicating, on Wikipedia's home page, the ability to answer a simple question like "How old is Cruise?" will pull readers off Google's home page and onto ours (it seems we do want to compete with them for "home-page market share") seems silly. How many readers are so helpless that they can't search for Cruise himself and easily find his age in the infobox? The only reason to leave Google's engine is for a specialized search that it can't handle. We recently had a discussion about Semantic MediaWiki, which tries to answer more sophisticated questions. From that you'll see that we have a long way to go to catch up to the Wolfram Alpha knowledge engine. It might be less expensive to just buy that. wbm1058 (talk) 23:43, 25 April 2016 (UTC)
  - +1. The future would be an intelligent personal assistant (IPA, e.g. Siri, Cortana), not Wolfram, and some are even free software. --Molarus (talk) 01:02, 26 April 2016 (UTC)
  - Platypus (backed by Wikidata) can answer it, FWIW. --Ori Livneh (talk) 03:57, 26 April 2016 (UTC)
  - Which just shows that spending resources developing a search engine is wasted effort. If the future is something similar to the proposed semantic web, Wikidata is a strong first step towards that -- & already supports a few proof of concept examples. Further, IIRC those examples were developed without Foundation backing. All that having the WMF create another search engine accomplishes is to add another line to someone's resume. (And by saying "someone" I'm not trying to say Lila Tretikov in a cute way; as more information comes out, the more obvious it is that there are other people who are likely to be the real person behind the Knowledge Engine. Tretikov might have been only a scapegoat.) -- llywrch (talk) 16:08, 26 April 2016 (UTC)
- The {{Orphan}} template features a nifty "find link" tool that is very helpful for creating links to orphan articles. This is part of the work of crowdsourcing for relevance. Doesn't Google's algorithm give priority to pages with a lot of incoming links pointing to them? So whatever "knowledge engine" we build will be more powerful if it has a stronger web of interwiki links to build off of. Just a little thing like a bot that ran Edward Betts' tool against our entire database of orphans and pointed out the most linkable ones would be helpful. Maybe I'll ask for it in the next round of the community wishlist survey, but that seems like a waste of time when the bulk of resources are directed elsewhere. wbm1058 (talk) 00:53, 26 April 2016 (UTC)
Another excellent piece by Andreas Kolbe, the best writer in the field of Wikipedia-focused journalism. So we see again that Jimmy Wales has some honesty problems. How soon is he leaving the Board? Chris Troutman (talk) 17:47, 25 April 2016 (UTC)
Gotta pile on and agree that Andreas rocks. Nice to see Signpost hitting its stride again and putting April Fools' Fortnight behind us (but for another ArbCom melodrama). wbm1058 (talk) 01:25, 26 April 2016 (UTC)

Some thoughts

Hi. Sorry if this is a daft question, but this piece is marked as an op-ed. What opinion is being expressed?

Does anyone disagree that our internal search needs improvement? I would think that Andreas and others would be supportive of efforts to have free, open, and independent search functionality. Below other mission-critical services such as providing SQL and XML data dumps, search is pretty important infrastructure, especially as the Wikimedia projects grow.

If we took an input string such as "How old is Tom Cruise?" and broke it up into pieces, I think we could, with some effort, program this and similar queries to return specific data points. We could look at the most relevant Wikidata item (d:Q37079) to extract the "date of birth" field's value ("3 July 1962") and then do a simple date calculation to show that Tom Cruise is currently 53 years old. Or, if we can get the search results to be better, we can pull out and highlight specific data points alongside the search results.

After we solve "How old is [famous person]?"-type queries, we can add support for alternate phrases such as "What age is [famous person]?" Once we solve that, we can move on to programmatically answering other "easy" queries. I don't think what's being described here requires artificial intelligence or IBM's Watson.

You want a concrete opinion? The search results at Special:Search/How old is Tom Cruise? are currently terrible. Tom Cruise bafflingly doesn't appear in the top 100 results. If Tom Cruise did appear in these results, we could look at the search input, see that it uses a known keyword ("age" or "old"), and then extract that information programmatically to serve our reader/researcher more quickly. Who opposes doing this?

Let's talk about how we can improve search and what that will require. Does an organization similar to the Wikimedia Foundation (or the Knight Foundation, for that matter) need to be involved? What value do these organizations provide? I think there's plenty of room for intelligent and thoughtful discussion about priorities and functionality and serving our readers. Can we start now? --MZMcBride (talk) 03:23, 26 April 2016 (UTC)
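A minimal Python sketch of the approach described above, under stated assumptions: a hard-coded dictionary stands in for a real Wikidata lookup of the "date of birth" value, and the `BIRTH_DATES` table, the regex patterns and the function names are all hypothetical, for illustration only. It matches a known phrasing, looks up the date of birth, and computes the age in whole years.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical stand-in for a Wikidata lookup: label -> "date of birth" value.
# A real version would query the item (e.g. d:Q37079) instead of this dict.
BIRTH_DATES = {
    "tom cruise": date(1962, 7, 3),
}

# Supported phrasings of the "easy" age question; new ones are appended here.
AGE_PATTERNS = [
    re.compile(r"^\s*how old is (?P<name>.+?)\s*\??\s*$", re.IGNORECASE),
    re.compile(r"^\s*what age is (?P<name>.+?)\s*\??\s*$", re.IGNORECASE),
]


def age_on(born: date, today: date) -> int:
    """Whole years elapsed from born to today."""
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


def answer_age_question(query: str, today: date) -> Optional[str]:
    """Answer a recognized age question, or return None to fall back to normal search."""
    for pattern in AGE_PATTERNS:
        match = pattern.match(query)
        if match is None:
            continue
        name = match.group("name").strip()
        born = BIRTH_DATES.get(name.lower())
        if born is None:
            return None  # phrasing recognized, but no date of birth on record
        return f"{name} is {age_on(born, today)} years old."
    return None  # not an age question at all


print(answer_age_question("How old is Tom Cruise?", date(2016, 4, 27)))
# prints "Tom Cruise is 53 years old."
```

A query that matches no pattern, or names someone missing from the table, returns None, so it would simply fall through to ordinary full-text search; supporting another phrasing is one more entry in `AGE_PATTERNS`.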

Hi! I think that many people are aware of imperfections in our current search functionality. But I don't think that it is a good idea to try to build a search engine that uses natural language processing to get answers from a semantic wiki. That seems far too ambitious to me. And let's face it, they are not Google; they simply do not have the people and skills required. To reach such a goal you need to split it into smaller, more manageable tasks, and I think it starts with improving or even rewriting the current search functionality.

Wikipedians don't really need a search engine that tells them how old Tom Cruise is, because we have a template for that (in this case {{birth date and age|1962|7|3}}, which renders as: July 3, 1962 (age 53)). Internet users in general may need such a search engine, but creating it is difficult and making it popular is even more difficult, and I believe that big companies like Google and Apple (and even Microsoft), which have been doing research into (and experiments with) this kind of thing for a long time now, are far more likely to create something that actually works. The WMF is not a software company, and I don't think they can compete with the big players in this field (Google, Siri), so I think they should focus on their niche.

Personally I wish they would be far less ambitious. I do want them to improve the search engine, maybe even to rewrite it from scratch if they believe that that is the best solution, but please keep offering roughly the same functionality as before, with some improvements and additions, instead of trying to create something superambitious that is going to be a waste of time and money in the long run. There are many smaller improvements possible; for example, the MediaWiki software does offer the ability to search for links only in a specific namespace, but this functionality is disabled on Wikimedia projects due to efficiency issues.

Imagine if they would successfully create a search engine that gives correct answers to questions in plain English. Imagine if people (who are currently using Google for this type of task) would switch to using this new search engine, built on open standards with open data. Then Google will immediately embrace, extend and extinguish it. The Quixotic Potato (talk) 15:33, 26 April 2016 (UTC)

MZMcBride, I agree, the op-ed designation seems odd; this strikes me as simply good reporting that, in some areas and transparently, draws conclusions that could be construed as opinions. In most publications, this is simply referred to as "news reporting." But the Wikipedia world can be highly sensitive around the issue of neutrality, and this particular topic is highly sensitive. My guess is that's why it was presented as an op-ed. That designation signals that others might be welcome to submit competing interpretations. In that sense, I like the choice; Jayen466 (Andreas) is a Signpost editor, so it's good to be extra cautious about any impression that his own views and the editorial position or policies of the publication are getting blurred.

On the substance of the piece: Yes, I think everyone can agree that there is room for substantial improvement in Wikipedia/Wikimedia search. I think that has been broadly agreed by many people over the recent months. But I don't see that as a central question in this piece. A very important, unanswered question remains: was the board justified in dismissing a recently-(s)elected Trustee? Or was Docjames actually the only Trustee trying to do the right thing, in the face of a board apparently deeply tied to going about things in a bad way (standing by its Executive Director despite massive staff opposition and attrition, and neglecting to clearly communicate its ambitions to important stakeholder groups like volunteers and staff)?

That question is an important one, and this piece advances the effort to unravel it. -Pete (talk) 15:38, 26 April 2016 (UTC)

I don't think we need question-answering software, but rather an assistant that could do some tasks for editors and readers. We could start by writing into the search box something like: WD, show me the article about Tom Cruise, or WD, read aloud the introduction of that article, or WD, tell me who wrote most of this article, or WD, show me all media files Commons has about Tom Cruise. --Molarus (talk) 23:48, 26 April 2016 (UTC)
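A minimal Python sketch of how such prefixed commands might be routed, assuming the "WD," prefix from the post and two hypothetical handlers (`show_article`, `list_commons_media`) that only return descriptive strings rather than calling any real MediaWiki or Commons API. Only two of the example commands are wired up; the rest would be further entries in the hypothetical `COMMANDS` table.

```python
import re
from typing import Optional


def show_article(topic: str) -> str:
    # Hypothetical handler: a real one would open the article for `topic`.
    return f"open article: {topic}"


def list_commons_media(topic: str) -> str:
    # Hypothetical handler: a real one would search Commons for `topic`.
    return f"list Commons media: {topic}"


# Assistant commands must start with "WD,"; anything else is a normal search.
PREFIX = re.compile(r"^\s*WD\s*,\s*(?P<command>.+?)\s*\.?\s*$", re.IGNORECASE)

# Each rule pairs a command phrasing with the handler that serves it.
COMMANDS = [
    (
        re.compile(r"show me the article about (?P<topic>.+)", re.IGNORECASE),
        show_article,
    ),
    (
        re.compile(r"show me all media files commons has about (?P<topic>.+)", re.IGNORECASE),
        list_commons_media,
    ),
]


def dispatch(text: str) -> Optional[str]:
    """Route a "WD, ..." command to its handler; return None if nothing matches."""
    prefixed = PREFIX.match(text)
    if prefixed is None:
        return None  # no "WD," prefix: treat as an ordinary search query
    command = prefixed.group("command")
    for pattern, handler in COMMANDS:
        match = pattern.fullmatch(command)
        if match:
            return handler(match.group("topic"))
    return None  # recognized prefix, but unsupported command


print(dispatch("WD, show me the article about Tom Cruise"))
```

Keeping the prefix check separate from the command table means ordinary search queries never reach the command matcher, and unsupported commands fail quietly instead of being misread as searches.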

In the October emails, James said, "The Sept 18, 2015 grant agreement states 'the Knowledge Engine by Wikipedia, a system for discovery of reliable and trustworthy public information on the Internet' as the purpose.

"The June 24th 2015 document show images of a Google like setup. While the June 30th document states 'how is WMF going to build a unique search experience that will go beyond what Google and Bing are already providing their users?'

"The plan appears to be for this search engine to go at www.wikipedia.org What else would you call what is being described? This is not a search tool for Wikimedia properties. It also appears to include Watson / Google graph type functionality."

To which Jimmy replied, "Yes, that sounds exactly like what Lila presented to the board for approval, and what was approved by the board."

And in November he said to James, in response to the claim that there had been an attempt to fund a massive project to build a search engine that was then scoped down to a $250k exploration for a fully developed plan:

"In my opinion: There was and there is and there will be. I strongly support the effort, and I'm writing up a public blog post on that topic today. Our entire fundraising future is at stake."

But in his gaslighting email, Jimmy says about James's Facebook post, "you said publicly that you wrote to me in October that we were building a Google-competing search engine and that I more or less said that I'm fine with it. Go back and read our exchange. There's just no way to get that from what I said – Indeed, I specifically said that we are NOT building a Google-competing search engine, and explained the much lower and much less complex ambition of improving search and discovery."

Effectively calling James a liar again, in front of Pete. But look at what James did say on Facebook: "I asked individuals on the board in Oct if they understand that we were building a 'search engine' as before Oct I did not realize we were. JW said that he understood this all along and it was something we needed to do."

The big issue for me here is Jimmy's lying and defaming. It is clear now that James's Facebook comment exactly, accurately reflects Jimmy's statements to James about the Knowledge engine. In his gaslighting email, Jimmy, again, defames James, accusing him of misrepresenting his (Jimmy's) position.

We have a serial liar strutting about posing as our spokesperson, squatting on a board seat, defaming a hard-working popularly-elected volunteer.

Another thing: A new board member discovers there's been a plan to develop an internet search engine that could cost tens of millions of dollars and takes it to the other board members. And Jimmy goes, Eh? What's your problem? No big deal, here, James. Nothing to see. Move on. Gaslighting.

And another: It's clear now that the WMF was waiting for the right moment to let the community in on this scheme. Jimmy: "A commitment to explore a concept through an external grant doesn't strike me as the right point necessarily to engage in a full-scale consultation." So, James's concerns that this was being kept from the community were well-founded. Jimmy didn't trust the community with this information. James did. --Anthonyhcole (talk · contribs · email) 01:22, 27 April 2016 (UTC)