Tuesday, May 31, 2016

Facilitating the use of Wikidata in Wikimedia projects with a user-centered design approach

A lot of students learn the ropes as interns working on Wikidata. They do their thing, write a thesis and the good news is that their work is appreciated and used.

Charlene Kritschmar is the latest to write a thesis and it is on an approach to the use of Wikidata in other Wikimedia projects. The central question is how to manage Wikidata data from a Wikipedia. I like what I have read, but there are a few things that need to be considered.

Wikidata notability and the thesis have it that "everything that has an entry in Wikipedia can also be an entry in Wikidata". Technically it is the other way around; every item in Wikidata may have an article in a Wikipedia. The difference is profound because a new article in a Wikipedia may have a pre-existing item on the topic in Wikidata.  It follows that statements may already be present. It makes it less cumbersome to write an article and fill an info box with data. 

Wikidata includes 17,677,925 items and the biggest Wikipedia knows about 5,164,030 articles. This makes Wikipedia-centric thinking problematic. What any Wikipedia offers Wikidata is a big community that may improve the data quality on Wikidata and by inference improve the quality of all Wikipedias. The flip side of this coin is that there is no Wikipedia leading on what Wikidata has to say on any given subject.

Sunday, May 29, 2016

#Facebook - Nataliya Kobrynska

For whatever reason a fellow Wikimedian elected to give attention to Mrs Kobrynska. He posted about her on Facebook, and when I have the time I may add some statements in Wikidata. It was easy enough to improve the quality of the data and I read in the article that she was the daughter of a parliamentarian, a Mr Ivan Ozarkevych.

I could not add him as her father because there was no label for him in English. I mentioned on Facebook that I could not find him, and a label and the relation were added. The father had an article on the Polish Wikipedia; it referred to him as a member of the Galician parliament and it was easy and obvious to add the fact that he was a parliamentarian and a politician. Not only for him but also for his fellow parliamentarians.

#Wikidata - Debunking #controversy in #science

I really wonder what an organisation that hands out "one of the scientific world's most respected environmental prizes" does when one of its luminaries becomes controversial.

The Volvo Environment Prize was awarded to Mr Ray Hilborn in 2006. Mr Hilborn and his science have become controversial because of the conflict of interest he has with the fishing industry. Greenpeace has documented this quite publicly.

Together with Mr Hilborn, Mr Pauly and Mr Walters were awarded the Volvo Prize. They are all known for their work on fisheries. The obvious question is now whether the work of Mr Pauly and Mr Walters is tainted in the same way. This is one reason why controversies like this are so important.

When a specific line of work in science has been debunked, it becomes important to undo the damage and reevaluate the work in a field. One of the more obvious ways to make this point is for the Beijer Institute to address this issue in one way or another. When the science of Mr Hilborn is unsound, it follows that his work does not point to a sustainable future and that he does not deserve the Volvo Environment Prize.

#Wikipedia #citations - LibraryBase

The general idea is that if Wikipedia articles are to be believed, citations ensure the quality of the statements made. The quality of the sources is therefore important. When a specific publication has a problem, a problem like reproducibility or a known conflict of interest of the author or the organisation he stands for, it follows that the publication as a source becomes problematic.

The problem with sources in Wikipedia is that, like all the rest, they are buried in the articles. As sources are typically marked up in the text through templates, it becomes possible to harvest all this and put it in a database. When things get into a database it becomes possible to analyse the data and find the authors that are problematic, refer back to the articles and remedy the inherent conflict in the article.
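To make the idea of harvesting concrete, here is a minimal sketch in Python. It is not how LibraryBase works; it only assumes that sources appear as simple {{cite ...}} templates without nested templates or piped links, and the table layout and the function names (parse_citations, harvest, articles_citing) are made up for illustration.

```python
import re
import sqlite3

# Matches simple {{cite <kind> | ...}} templates. Nested templates and
# piped links inside parameters are NOT handled (assumption of this sketch).
CITE_RE = re.compile(r"\{\{\s*cite\s+(\w+)\s*\|([^{}]*)\}\}", re.IGNORECASE)


def parse_citations(wikitext):
    """Yield (kind, params) for each simple cite template in the wikitext."""
    for match in CITE_RE.finditer(wikitext):
        kind = match.group(1).lower()
        params = {}
        for part in match.group(2).split("|"):
            if "=" in part:
                key, value = part.split("=", 1)
                params[key.strip().lower()] = value.strip()
        yield kind, params


def harvest(conn, article, wikitext):
    """Store the citations found in one article in a hypothetical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS citation "
        "(article TEXT, kind TEXT, author TEXT, title TEXT)"
    )
    for kind, params in parse_citations(wikitext):
        author = params.get("author") or params.get("last")
        conn.execute(
            "INSERT INTO citation VALUES (?, ?, ?, ?)",
            (article, kind, author, params.get("title")),
        )
    conn.commit()


def articles_citing(conn, author):
    """Return the articles that cite the given author, sorted by title."""
    rows = conn.execute(
        "SELECT DISTINCT article FROM citation WHERE author = ? ORDER BY article",
        (author,),
    )
    return [row[0] for row in rows]


# Placeholder text, not taken from any real article.
conn = sqlite3.connect(":memory:")
harvest(
    conn,
    "Example article",
    "Text.<ref>{{cite journal |author=Example Author |title=Example title}}</ref>",
)
print(articles_citing(conn, "Example Author"))
```

Once the citations sit in a table like this, going from a problematic author back to the articles that rely on him is a single query.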

Take Mr Ray Hilborn for instance. He is under attack for his conflict of interest by Greenpeace. Consequently his POV needs to be corroborated by independent sources and all his science is suspect. It is wonderful to harvest all the data about sources from all the Wikipedias but there is no point to it when it does not lead to something useful.

There is a lot of money going around to confuse issues and serve specific interests. When sources are available to us all, it becomes possible to mark publications for the quality that they have. When sources are not reproducible, it follows that you cannot build arguments on top of those. It then becomes possible to consider basic stuff and no longer confuse a Neutral Point of View with what is patently false.

Wednesday, May 25, 2016

#Wikidata - Kerala MLA constituencies

Kerala is one of the states of India and like all the others has its own legislative assembly. As in Great Britain, politicians are elected from constituencies. There are many, as you can see on the map.

When there are elections, things change. New people become representatives, some remain representatives and others no longer have relevance in that way. At Wikidata, the current list of people who are "Member of the Kerala Legislative Assembly" is a bit of a mess. There are many items without a name in English, there are people who are only known in English and there are probably a lot of duplicates.

There are even representatives who are known to have an article on the English Wikipedia but do not (yet) have an item. This is all because of this big push to write articles on Indian representatives.

As more work is done for this big push to get the data complete, the data will become more informative. What we hope to achieve is:
  • associate MLAs with constituencies
  • have labels in both English and Malayalam for all of them
  • merge all the possible duplicates
Obviously there is more that might be done. We could add the dates when people became an MLA. This will allow us to create queries that show who was an MLA at what time. When all this is done for Kerala, there are 28 other Indian states and there are many other countries that could do with a little bit of TLC.
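What such a query amounts to can be sketched in a few lines of Python. This is only an illustration: the Term record and the members_on function are made up, and the names and dates are placeholders, not real MLAs. On Wikidata the dates would be stored with the statement about the position; here they are plain fields.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Term:
    member: str
    constituency: str
    start: date
    end: Optional[date] = None  # None means the term has no known end


def members_on(terms, day):
    """Return sorted (member, constituency) pairs for terms covering the day."""
    return sorted(
        (t.member, t.constituency)
        for t in terms
        if t.start <= day and (t.end is None or day <= t.end)
    )


# Placeholder data, invented for this example.
terms = [
    Term("Member A", "Constituency X", date(2011, 5, 1), date(2016, 5, 1)),
    Term("Member B", "Constituency X", date(2016, 6, 1)),
    Term("Member C", "Constituency Y", date(2011, 5, 1)),
]
print(members_on(terms, date(2014, 1, 1)))
```

Without a start date and, where applicable, an end date for each term, the question "who was an MLA at what time" cannot be answered, which is why adding the dates matters.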

Wednesday, May 04, 2016

#Wikimedia - [[citation needed]]

Our articles in any #Wikipedia can be trusted when an effort has gone into providing sources. Sources or citations are very much needed because they help us distinguish fact from fiction. Finding sources exposes an origin and it helps us debunk fiction. The result of this continued effort is content that can be trusted as a sincere attempt to achieve a neutral point of view.

There are very practical problems. Sources are not always easy to find and they do not exist in every language. Sources are often behind a "pay wall", which makes access to the body of knowledge very much restricted. Sources, particularly sources on the web, do not exist forever. The consequence is that sources are problematic and not everybody is equally able to help us with sources for the content we have.

If we are to improve the current, unsatisfactory situation we have to address multiple problems.
  • Once sources are lost we rely on the Internet Archive for an historic view. It has policies that allow for the removal of content and this is often the content that is controversial and removal is often intended to rewrite history. What to do?
  • Access to restricted sources is provided to the privileged few who have access to libraries. The WMF has a program that enables some of our editors access to a few pay-walled sources.
  • When this proves insufficient, it is great to know that Sci-hub among others provides "illegal" access to any and all sources.
Open access to sources is very much what we as a community care for. One of our own died in the struggle for this access so I do not think we should be deferential to an industry that is despicable. We should teach people how to find sources and ignore licensing as much as possible.

#Wikipedia / #Commons - Brigadier General Loree K. Sutton

Mrs Sutton is a psychiatrist who is a specialist on PTSD. When you read her CV, it is impressive. She no longer works for the US Army; she works for the City of New York.

When you read the article on Wikipedia, you find her picture. It is marked as Public Domain and it is not on Commons. Given that Wikidata is working towards the point where copyright and license information is available, one can only hope that images like this can be easily shared based on the license.

When Commons started, it was intended as a repository that prevented the same file from being uploaded to all the Wikipedias. As such it served its purpose remarkably well. With Wikidata it becomes trivial to share images like the one of Mrs Sutton.

I fear that for some this reads as frightening. It undermines the one thing they love. It actually does not remove the need for Commons as a platform. Quite the opposite; it will bring new tools to finally leverage all the data on images. It may bring this image of Mrs Sutton to Wikidata for starters.

Saturday, April 23, 2016

#Wikidata - its sex ratio II

In April 2014 I blogged about the sex ratio at Wikidata. At the time there were 1,332,383 "humans", 760,616 were male and 154,455 were female. Now in April 2016, the numbers are different: there are 3,135,792 humans, 2,442,444 are male and 466,748 are female.

The percentages were: 57% males, 12% females and 31% unknowns. This time they are 78% male, 15% female and 7% unknown.

Based on these Wikidata numbers, the gap between men and women has substantially increased. On the other hand, the number of humans that were not identified as male or female has substantially decreased.
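For those who want to check the arithmetic, a small Python sketch reproduces the percentages from the counts above. The function name shares is made up, and "unknown" is assumed to mean every human not identified as male or female.

```python
def shares(humans, male, female):
    """Return whole-number percentages for male, female and unknown."""
    unknown = humans - male - female  # assumption: everyone else is "unknown"
    counts = (("male", male), ("female", female), ("unknown", unknown))
    return {label: round(100 * count / humans) for label, count in counts}


april_2014 = shares(1_332_383, 760_616, 154_455)
april_2016 = shares(3_135_792, 2_442_444, 466_748)

# Gap between men and women in percentage points.
gap_2014 = april_2014["male"] - april_2014["female"]
gap_2016 = april_2016["male"] - april_2016["female"]

print(april_2014, gap_2014)
print(april_2016, gap_2016)
```

In percentage points the gap goes from 45 to 63, while the share of unknowns drops from 31% to 7%.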

This does not mean at all that the movement to chip away at the gender gap is a bust. Far from it. Numbers only expose realities. What can easily be achieved in Wikidata is more focus on the females in any group. The subject I focus on is mental health and I concentrate on female psychiatrists or psychologists. I add statements for them and add where possible the data from categories to Wikidata. In this way they become better connected and more information becomes available. In this way the subject I care for gains quality and relevance and it is women who benefit most.

Numbers provide an indicator, but when numbers are this big they should not have our focus. At best they move glacially. More relevant is to know whether they, as a group, gain more readers over time. Such numbers reflect an increase in the quality of articles and data. That is an approach that has potential.

Thursday, April 21, 2016

#Wikidata - #YLE, the #Goldman environmental Prize and the #ArticlePlaceHolder

#YLE is a Finnish public broadcaster that announced that it will use Wikidata to label its articles and news items. This is really cool because it means that they have an interest in supplying missing labels in Finnish and as a consequence we actually benefit from them.

So let us consider what we can do to make both their and our life more pleasant.

When something happens that is "notable", for instance the latest announcement of the Goldman environmental Prize awardees, we can add the winners. One of the winners is from Cambodia; this can trigger a request for Mr Leng Ouch's article to be written in Cambodian. We can update the lists of winners of the award. We can link to articles in the Finnish press for each and all of them.

Once more news organisations use Wikidata, new use of labels indicates breaking news or renewed interest. This may help journalists worldwide to stay on top of what is current. This may all happen but the most important benefit is that it ensures that Wikidata remains up to date.

Wednesday, April 13, 2016

#Wikimedia - Jimmy Wales is not a constitutional monarch

Thank you Durova
Having known Jimmy / Jimbo Wales for a long time, I appreciate him for the many things he does. Particularly the many things we do not really hear about. Jimbo either has the ultimate conflict of interest, or is best positioned to do well for the Wikimedia Foundation and its projects.

At the start Jimmy was the founder and financier of Wikipedia and as it became a bigger success, he could no longer afford his hobby. He was apprehensive to let go and slowly but surely handed over more power to what is the board of the Wikimedia Foundation.

His role changed and he became more of an ambassador at large. Jimmy is not a constitutional monarch with an entourage that prevents him from being "political" or personal. I have personally experienced several occasions where Jimbo was instrumental in bringing people together. It is why I am more than happy to express my happiness that he is who he is and does what he does.

The only question I have for his detractors is: if not Jimmy who else can perform the role that is uniquely his?