Wednesday, July 26, 2017

#Wikidata - in #defence of Erika Herzog

On Facebook, Erika made a few comments that were not well received. A few really positive things did come out as a result but there is a need to defend Erika and her central argument. She asked if there had been a process of consulting the English Wikipedia community because the user interface of Wikidata is so poor. She said:
"... But I am pretty sure a lot of En Wikipedia editors are going to be sort of upset about this shift that requires them to actually edit Wikidata without a form input method (on WikiMarkup). Is there a form input on Visual Editor for this?"
On Facebook she is attacked for all the wrong reasons. A Wikimedia functionary asks: "How is this a Wikidata matter? English Wikipedia is where you want to discuss this." Erika's answer is spot on: "Actually no it's not. I'm tired of this response. It's not helpful or realistic. This is a Wikidata item in terms of buy-in and outreach to incorporate more Wikipedia editors. It's disingenuous to posit otherwise. This needs to be a discussion on both sides, and I think the onus is more on the Wikidata side as the usability and UX is poor at best."

One positive outcome of the Facebook thread is that it is mentioned that there is a method under development to edit Wikidata from Wikipedia templates. However welcome, it is going to introduce its own problems because the primacy of the data remains at Wikidata. The user interface of Wikidata is indeed awful. As one of the more prolific Wikidata editors I only use it for editing. For displaying the data I use Reasonator exclusively. Compare this with this for instance and you will see why.

The reason for this is a matter of priorities. The WMF has too many concurrent ambitions for Wikidata and the staff is overextended. When the question is if Wikidata is sufficiently user friendly for an average Wikipedian, the answer is no. At this time Wikidata cannot cope with all the changes committed to it as it is; the wise words of Johan Cruyff apply: every disadvantage has its advantage.

Sunday, July 23, 2017

#Wikidata - Franziska Michor and #notability

Because of Facebook I read something about Franziska Michor. What triggered me was that she received an award. Her occupation, biomathematician, does not even exist (yet) on Wikidata.

To understand what a biomathematician does, it is great to watch the TEDMED presentation by Mrs Michor. It gets me to the question of notability; I was amazed that Mrs Michor did not have a presence on Wikidata. I do not know if TEDMED is part of the TED project we had and I have no clue how to add this presentation.

With the ever increasing scope of Wikidata, the challenge becomes less one of introducing data and more one of maintaining data. This is particularly true when you look at Wikidata from a mathematical point of view. With Mrs Michor there are several datasets that gained notability and can do with some tender loving care: biomathematicians, TEDMED talks and the Vilcek Prize for Creative Promise.

Saturday, July 22, 2017

#Wikidata - Prix de Coincy and Raymond Benoist

The Prix de Coincy is an award conferred by the French Botanical Society. According to the French article it was first awarded in 1904, but the first botanist who is known to have received it got it in 1906. He was Edmond Gustave Camus, a red link in the French article, but he has articles in several Wikipedias.

Botany is one of those subjects that have appeal; people care about plants, how they are named and consequently many botanists have articles in multiple Wikipedias. This became obvious when all the red links and black links in the article were entered in Wikidata. Like Mr Camus most already existed and just had to be associated with this award.

There are a few items that are not that obvious; Raymond Benoist is one. The French article has it that he received the award but there is no source; in fact, the only source for the award is the French article. Another issue is with the 1949 award; the winners are likely three people: one is Louis Quentin, the others Henri and Madeleine Stehlé. Nothing wrong with being bold I suppose.

Sunday, July 16, 2017

#Wikidata Tool - The #Awarder

The Awarder is a tool I use every day to add people known to have received an award to Wikidata. Its use is straightforward:
  • find a list of award winners, a list that includes the person and the year it was conferred
  • copy the source text into the Awarder
  • identify the wiki the data is from
  • identify the award by its Wikidata identifier
  • open the results in "quick statements" for processing
Easy. When done properly the result is as good as the information from the Wikipedia it came from.

There are a few points. Some lists, like the one on the John Wesley Powell award, have the year on a line of its own and the date is implied for the text that follows. The result is ten people identified. There are a few red links in there, for instance for "George M. Hornberger", and Awarder has identified him so that I can click on a button to find him in Wikidata. As I did not find him, I added him in Wikidata for later processing. Awarder does not identify organisations as award winners so I had to add the identifier for, for instance, the "California Department of Transportation". John Galetzka is the award winner for 2016. He is a "black link" so I identified him in the tool with brackets and as a result I could add him as well.
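To make the "year on a line" case concrete, here is a minimal sketch in Python. It is not how Awarder itself works; the function names, the name-to-identifier lookup and the Q-numbers used with it are hypothetical. It assumes the tab-separated command format of QuickStatements as I understand it, with P166 ("award received") and P585 ("point in time") as the properties and a date precision of 9 for a year.

```python
import re

AWARD_RECEIVED = "P166"  # Wikidata property "award received"
POINT_IN_TIME = "P585"   # Wikidata property "point in time"

# A line that holds nothing but a four digit year.
YEAR_LINE = re.compile(r"^(\d{4})$")


def parse_winners(text):
    """Return (year, name) pairs; a year line applies to the names after it."""
    year = None
    winners = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = YEAR_LINE.match(line)
        if match:
            year = int(match.group(1))
        elif year is not None:
            winners.append((year, line))
    return winners


def to_quickstatements(winners, name_to_qid, award_qid):
    """Build one tab-separated row per winner that has a known item.

    name_to_qid is a hypothetical lookup from name to Wikidata identifier;
    names without an identifier are skipped so they can be handled by hand.
    """
    rows = []
    for year, name in winners:
        qid = name_to_qid.get(name)
        if qid is None:
            continue
        date = f"+{year}-01-01T00:00:00Z/9"  # /9 is assumed to mean year precision
        rows.append("\t".join([qid, AWARD_RECEIVED, award_qid, POINT_IN_TIME, date]))
    return rows
```

Names that are not found are skipped on purpose; they are the red and black links that still need an item before they can be processed.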

For fifteen award winners it is now known that they won the award. Slowly but surely it adds to the relevance of these people in Wikidata and the missing award winners become easier to identify for the implied notability.

PS thank you Magnus for a great tool

Friday, July 14, 2017

#Wikidata VS #Wikipedia - the issue with input, output

I was told that I should not talk about quality because "on the basis of my work I did not give a good example". Basically I was told to stop what I am doing. As I have written a lot about quality and argued how we can achieve greater quality it is not funny nor is it appreciated but the guy has a point.

With 2,304,191 edits there must be a lot that is wrong in what I have done. No matter how careful I am, some percentage of errors is to be expected; at 6% there must be some 138,252 errors that I introduced. Depending on your outlook this is acceptable or it is not. If instead of me 100 people had done the same work, the result would have been the same; together they would have introduced around 138,252 errors as well.
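The arithmetic can be checked with a few lines of Python. This is only a sketch under the assumptions in the text: a flat 6% error rate, and 100 editors who each make an equal share of the edits at that same rate. The function names are mine.

```python
import math

TOTAL_EDITS = 2_304_191  # edit count quoted in the text
ERROR_RATE = 0.06        # the 6% error rate assumed in the text


def expected_errors(edits, rate=ERROR_RATE):
    """Expected number of erroneous edits at a fixed error rate."""
    return edits * rate


def expected_errors_split(edits, editors, rate=ERROR_RATE):
    """The same edits divided evenly over several editors with the same rate."""
    per_editor = edits / editors
    return sum(expected_errors(per_editor, rate) for _ in range(editors))


one_editor = expected_errors(TOTAL_EDITS)
hundred_editors = expected_errors_split(TOTAL_EDITS, 100)

# Rounded up, this is the 138,252 mentioned in the text.
rounded_up = math.ceil(one_editor)
```

Apart from floating point noise both numbers are the same; at a fixed rate the number of errors depends on the volume of work, not on how many people share it.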

I totally agree that we need to bring our errors down. There are three steps where errors have their origin; input, process and output.
  • My input is based on the Wikipedias; the content of each has its own issues. They all operate on their own little islands; there is little or no coordinated effort to make the quality of the information we provide a collective ambition.
  • My process is based on identifying what I want to work on; typically awards, often the enrichment of data around one person. For tools I mainly use what Magnus provides; they provide superior usability. Reasonator makes Wikidata statements intelligible; it provides superior disambiguation and automated descriptions. Awarder adds both the year and the person who received an award. It allows me to effectively cover a lot of ground. They are the tools I use most; others like PetScan are also invaluable.
  • I generate too much output to care for individual edits. I justify them all by the process, the routines I follow. I added "Claudia Wills" based on the information in the article of the eponymous award. Like other notable birdwatchers, Mrs Wills does not have her own article and I added her to complement the information on the award.
We share in the sum of knowledge and when the quality of what we provide is to improve, our movement has to become dedicated to the quality of all our information. The typical Wikipedian mostly cares about his or her own project and that is fine; we do not need all of them in an effort to improve our overall quality. The effort I propose can be hidden from view.

A Wikipedia article contains many links; they are blue, red or black. All the blue links are implicitly linked to Wikidata items. Many issues become evident when they can be compared with the links in articles in other Wikipedias or Wikidata. Some Wikis have additional links and they can be mapped to red links and black links. This prevents problems when articles are written with the name suggested in this link.

Once articles on the same subject in many Wikipedias are linked, all kinds of additional functionality become easier; one that is close to my heart is when a new award winner becomes known.

Saturday, July 08, 2017

#Wikimedia project - #PlantsAndPeople

#Wikidata is a great way to encourage collaboration and reporting for Wiki projects. The results of projects like the Black Lunch Table have been encouraging so far; reports on articles in multiple languages and on gender ratios were possible because of the Wikidata link.

A new initiative is PlantsAndPeople. There have been editathons in the past and more are planned. It is about both people and plants so the kind of questions that may be asked will be quite interesting. For instance, how many taxa were described by the people in the project and how many people were honoured in taxon names?

At this moment the people who are the subject of editathons are added. This list will grow slowly but surely and only once it is done can it replace lists in Wikipedia. It will take quite some time to get there because it makes sense to add additional data as well. This is the best way to quickly improve the quality of the data involved. So far quite a number of mycologists and ethnobotanists have been added. A question has been raised in Wikidata about people named in taxa and a picture that should be in Commons is waiting for someone else to transfer it.

If you are interested, join in the fun.

Wednesday, July 05, 2017

#Wikipedia - there once was a lady from #Estonia

Once upon a time there was a Wikipedian from Estonia. He decided to write about a compatriot, Kersti Kaljulaid. When your Estonian is as good as mine, it is not a name you remember or a person you are likely to have come across.

At the time this was the same for the English Wikipedians; she could not be notable because there were not enough sources in English. So for all the good reasons the article was in danger. Our Estonian Wikipedian said: "wait a week". A week later Mrs Kaljulaid was the president of Estonia.

I have taken the liberty to add additional data in Wikidata. Mrs Kaljulaid received two awards and other award winners have been added. No sources for them in English either. To be brutally honest, incidents like this prove why English Wikipedia is only a subset of the sum of all knowledge. Because of this insistence on English sources, English Wikipedia cannot cover the sum of all knowledge. People who seek reputable information on foreign subjects will not find it.

Sunday, July 02, 2017

Comparing #Wikipedia using blue, red and black links

There are reasons to compare Wikipedia articles on the same subject in multiple languages. When you just want to read, you may find additional information in another language but as you can imagine, the content should be largely the same. Consequently, the links in an article should go to articles that are about the same topic.

One problem with "blue" links is homonymy. You write the names in the same way but the subjects are not the same; John Doe is one example. Finding these issues, issues that are surprisingly common, can be done by a bot using the Wikidata identifiers for the linked articles.
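What such a bot would check can be sketched in a few lines of Python. This is an illustration only, not an existing bot; the function name, the shape of the input and the Q-numbers in the usage note are hypothetical. It assumes that for every language version of an article the blue links have already been resolved to Wikidata identifiers.

```python
def find_homonym_conflicts(links_by_wiki):
    """Find link texts that resolve to more than one Wikidata item.

    links_by_wiki maps a wiki code to a dict of {link text: Wikidata id}.
    The result maps each conflicting link text to {Wikidata id: [wikis]}.
    """
    seen = {}
    for wiki, links in links_by_wiki.items():
        for text, qid in links.items():
            seen.setdefault(text, {}).setdefault(qid, []).append(wiki)
    # Only texts that point at two or more different items are suspicious.
    return {text: qids for text, qids in seen.items() if len(qids) > 1}
```

With made up identifiers, `find_homonym_conflicts({"en": {"John Doe": "Q1"}, "fr": {"John Doe": "Q2"}})` reports that "John Doe" points to Q1 in one wiki and to Q2 in the other, which is a reason for a human to have a look.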

When there is no article to link to, there is no implicit link to Wikidata. There are two options; we can fake a link by accepting the red or a "black" link as synonymous or we can link a red or a "black" link to Wikidata. The latter is precise and has additional benefits.

When all links are associated with Wikidata items, it is obvious what links in what language are missing or are additional. They are of interest because they may imply potential information to be added to articles or they may point to errors, even vandalism. Another benefit is that it helps establish a baseline for a NPOV or neutral point of view without a need to understand the language.
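The comparison itself is simple set arithmetic once every link has a Wikidata identifier. The Python sketch below is my own illustration; the function name and the input format are assumptions. "Missing" means linked in some other language but not in this one, "additional" means linked only in this language.

```python
def compare_links(items_by_wiki):
    """Compare the sets of Wikidata ids linked from each language version.

    items_by_wiki maps a wiki code to the set of Wikidata ids its article
    links to. Returns, per wiki, the ids it lacks and the ids only it has.
    """
    all_items = set().union(*items_by_wiki.values())
    report = {}
    for wiki, items in items_by_wiki.items():
        # Everything linked from the other language versions.
        others = set().union(
            *(v for k, v in items_by_wiki.items() if k != wiki)
        )
        report[wiki] = {
            "missing": sorted(all_items - items),
            "additional": sorted(items - others),
        }
    return report
```

No knowledge of the languages involved is needed for this; the identifiers do the work.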

Saturday, July 01, 2017

#Wikipedia - Blue, red and black links

Lists in Wikipedia, like this list of award winners of the Tony Kent Strix award on the right, consist of blue, red and "black" links. At the moment only an article in English exists about the award and based on past experiences it is likely that other award winners are known in other Wikipedias.

Based on the information in the article, it was easy enough to add the missing information in Wikidata for all the "black links". When you now compare the information in Wikidata with the Wikipedia article, it is feasible to link fixed text to a Wikidata item. This makes it feasible to trigger a warning once a blue link is possible based on new Wikidata information. In this way a link to Jack Mills is already likely.

When we can compare the information in an article with data in Wikidata, there is an additional way to compare the information and prevent errors and vandalism. Wikidata is after all superior in its use as a tool for disambiguation.