Friday, April 30, 2010

World Press Freedom Day

Every year, the third of May is World Press Freedom Day. It is difficult to find the appropriate verb for this day. You want to celebrate press freedom, except that in many countries there is nothing to celebrate. In those countries there is everything to hope for, remembering the days when the press was more free or envying the countries where the press is more free.

As I am preparing this blog post, I find myself considering what is worst: the 99 journalists killed in 2009, the 136 who were in jail on December 1, or the 573 journalists arrested. I could illustrate all three. Reading the interviews with journalists who have to deal with a lack of press freedom is also sobering.


It is a mistake to think that press freedom is only an issue elsewhere; journalists in the USA have been victimised by law officials who flout the law. In a country where the press is truly free, such officials would have to face the law.

I blog and, in a way, I am like a journalist: I report, I editorialise and I am happy to have my worldwide public. My subject matter is a niche, but I report about what I consider relevant. In order to keep this freedom, it is important to raise the issue of a free press. A free press is essential to be free, to form your own opinion and to be heard.
Thanks,
     GerardM

Lucid Lynx

I installed Lucid Lynx, the latest iteration of #Ubuntu. It installed itself nicely and everything still works. The little light shines blue like it should on my Compaq.


It took its time to install, but hey, I installed a wee bit more than just an operating system.
Thanks,
      GerardM

Thursday, April 29, 2010

No politics please ...

At #translatewiki.net we take care of two things: internationalisation and localisation. Our developers work hard to make sure that the localisations can be presented in a grammatically correct way for the languages we support, and our community works hard to localise for their language.

The objective of all this is that people will find it easy to use the software that is localised. The optimal situation is when all messages are localised in the target language. In order to deal with a situation that is all too often suboptimal, we have fall back languages. For some languages we have chains of such fall backs that all ultimately end with English.

Some people are annoyed with the choice we have made in such a chain. There are many ways we can accommodate people; when people want to seriously work on a specific type of localisation, e.g. "formal German", they are welcome to work on it. The choice of a fall back in such situations is whatever makes sense; here the German localisation is the most reasonable choice.
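How such a chain of fall backs behaves can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the language codes, the message keys and the tiny message store are hypothetical and are not the actual MediaWiki or translatewiki.net data structures.

```python
# Hypothetical fall back chains: each language points at the next one to try.
FALLBACKS = {
    "de-formal": "de",
    "de": "en",
}

# Hypothetical message store; real localisations hold thousands of messages.
MESSAGES = {
    "en": {"search": "Search", "edit": "Edit", "history": "View history"},
    "de": {"search": "Suchen", "edit": "Bearbeiten"},
    "de-formal": {"login-prompt": "Bitte melden Sie sich an"},
}


def fallback_chain(lang):
    """Return the language followed by its fall backs, always ending with English."""
    chain = [lang]
    while chain[-1] in FALLBACKS and FALLBACKS[chain[-1]] not in chain:
        chain.append(FALLBACKS[chain[-1]])
    if chain[-1] != "en":
        chain.append("en")
    return chain


def lookup(key, lang):
    """Return (text, language used) from the first language in the chain that has the message."""
    for code in fallback_chain(lang):
        text = MESSAGES.get(code, {}).get(key)
        if text is not None:
            return text, code
    raise KeyError(key)


print(fallback_chain("de-formal"))
print(lookup("edit", "de-formal"))
print(lookup("history", "de-formal"))
```

In this sketch a message missing in "formal German" is shown in German, and only when German lacks it as well does the reader get English. Localising the missing message removes the fall back text altogether.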

When people complain about the visibility of fall back texts, our standard answer is: localise the missing messages and the next day everything will be fixed at the WMF projects. Sometimes people complain about the fall back language itself. When the motivation is political, we are not impressed; what is important is that the readers understand the fall back language and, when they do, it serves its purpose.

You can promote your language by providing the best localisation possible. Localising at translatewiki.net ensures that your work is available the next day at all the WMF projects, at all the MediaWiki installations that run LocalisationUpdate and ultimately at all the MediaWiki installations that don't.
Thanks,
     GerardM

Wednesday, April 28, 2010

I care but I POSITIVELY hate this crap

English #Wikipedia's featured pictures are a truly valuable asset in the struggle to create more free content. Each featured picture is among the best in its class, and only one of them every day gets the attention of millions of people.

Its value is recognised in the attention it gets from GLAM. It is a celebration of the effort in getting an image just right and the recognition thereof by fellow Wikipedians. This recognition is a motivator for some and consequently I think little of the proposal to remove the credits on these featured pictures.

It is a petty move that suggests that there is an appearance of "content ownership" and that our practices should be all the same. This is utter folly; our practices should reflect what is best in a specific situation, and the suggestion of content ownership is negated by the free license of the material.

Sedum rubrotinctum by Noodle snacks

Pictures are typically the work of one person and consequently it is ok to celebrate their efforts. We should be proud of those who prove to be among our best.
Thanks,
     GerardM

Erik Möller is my friend

With dismay I learned about the horrible nonsense spouted about Erik. I read his composed reply on his blog, and I am proud to call him my friend.



I am proud to call him my friend not only because we have known each other for so long, because of the many projects we worked on together, or because of us drinking together (me a beer, Erik a soft drink), but because I know Erik to be a "Mensch", a person of integrity.

We have had our fair share of disagreements and I know I have irritated Erik at times, but these disagreements were about details and about priorities, never about our core values. I testify from my own observations that few people are as dedicated as Erik in promoting and working towards our goal of sharing knowledge with all people everywhere.
Thanks,
      GerardM

CD downloaded 4390 times in 10 days

Talk about pent-up demand: 500 #Wikipedia articles have been put on a CD and within 10 days there have already been 4390 downloads. Guess what would happen if all of the Malayalam Wikipedia were available in this way.

That would become possible if the Malayalam content were available for the Wikireader. The Wikireader people are testing with content in other languages. This does include support for the Greek and the Cyrillic script.

It would be really great if the Wikireader were also able to support the Indic languages; it is a big market and Internet access is not a given everywhere.
Thanks,
      GerardM

Tuesday, April 27, 2010

Investment 101 .. put money in growth

Conventional wisdom has it that the English #Wikipedia has plateaued. The same is true for the German and the Japanese Wikipedia. From an investment point of view, there are two reasons to put money in stagnant properties: to maintain the value of the property and to grow a property even further.

English is going down even in the USA
The English language Wikipedia is currently the most valuable resource for the WMF, but when you look at the trends, its traffic as a percentage is going down in most countries. At the same time the endemic languages are growing.

One of the truisms of investment is that you invest where the biggest growth is; this would benefit languages like Russian, Spanish and Portuguese. The problem is that this does not help any of the other languages, including English. So there is a need for a different approach.

Considering the formulated approach of leaving local things to chapters, the WMF can invest in technology and approaches that are neutral to languages but promote growth: a truly multi-lingual Commons, improved usability for MediaWiki and Commons, tools that identify demand ...

There is only one aspect where language targeted investment makes sense; this is where a level playing field is established. This is where we fix issues that prevent a language from being properly expressed on the Internet.
Thanks,
      GerardM

Monday, April 26, 2010

The anonymous conversion rate

For #Wikipedia, editors are its most valuable resource. Theory has it that there is a conversion rate whereby anonymous editors become registered and a more visible part of the community. This is the theory and, when it works like it does for the Russian Wikipedia and the English Wikipedia, it works beautifully.


When you compare these Russian statistics with those for the Malayalam or the Hindi Wikipedia, the absence of anonymous edits is striking. The question is why there are so few anonymous edits.


With so few anonymous edits, there will be no conversion to registered editor. This means that other strategies are needed to grow the community. Having people available to spread the word about "the encyclopaedia that everyone can edit" is one. Making it as usable as humanly possible is another; the usability project is doing what it can. A third is what the Indonesian chapter hopes to achieve by training Wikipedians and making it sustainable by making it a "train the trainer" project as well.

These new statistics from Erik Zachte help to appreciate such diversity. They truly enrich what is already an important resource.
Thanks,
     GerardM

Sunday, April 25, 2010

#Translatewiki.net is working on a new main page

As #MediaWiki is no longer the only application localised at translatewiki.net, many in its community feel that the presentation and the structure should reflect this. Several people have been working on a draft for a new main page and we think it looks nice.


There are several things that need to be resolved:
  • information that used to be on the main page needs to go elsewhere
  • the new main page must be localised and it has to be readied for this
  • how do we target developers new to translatewiki
  • does this look good in all browsers (we had IE issues)
  • does this provide the best usability
Thanks,
     GerardM

#Wardriving, #Google vs the Wire

Wardriving is: "the act of searching for Wi-Fi wireless networks by a person in a moving vehicle, using a portable computer or PDA." The best known objective is to find an available Wi-Fi network to connect to the Internet.

Another objective is to geo-tag Wi-Fi networks. This has been done on numerous occasions and it resulted in maps that include overlays to Google maps like this map of Seattle.

There are use cases for such maps; for the purpose of law enforcement, it shows where a person is connected to the Internet and for advertisement, it shows where a person is connected to the Internet. On its own, it is a little thing. When you look at it in the context of what has been so brilliantly documented in the Wire, it is part of a technological fight against crime and when you look at it in the context of Google, it is part of the increasingly sophisticated way of serving locally relevant adverts.

Slowly but surely personal privacy becomes less private. People and organisations can address you because they know where you are. The weather is fine, I may take my bike and, without a mobile, you may think I am at home because my laptop will be left turned on... I hope nobody will miss me as a result and get annoyed ...
Thanks,
      GerardM

Saturday, April 24, 2010

ፊደል means #script or #alphabet

The Amharic and Tigrinya #Wikipedia use the Ge'ez or ግዕዝ script. I asked Merhawie at some stage about font support for the script. The existing open font for the Ge'ez alphabet is incomplete and needs to either be replaced or added to.
My Linux support for the Ge'ez script

Some 34 million people use this script for their language, and the incomplete font support is an obvious inhibitor for their use of the Internet. There are already two Wikipedias that use this script. When we enable our Wikipedias by supporting the missing characters, we are likely to get attention from people that we need to grow our community.

To put the potential in perspective, the Dutch language is spoken by some 22 million people and the Dutch language Wikipedia is currently the 10th biggest Wikipedia.
Thanks,
        GerardM

Friday, April 23, 2010

The whole #wikimedia domain is blocked in #Iran

Not only #Commons but the whole Wikimedia domain is blocked in Iran. Consequently websites like Meta, Strategy and Usability are no longer available. Our Iranian community can no longer influence the road that is ahead of us.

This saddens us because they are part of us, they are part of our world. We could propose a Geograph project for Iran. In this way we would learn more about Iran in the same way we learn about the United Kingdom and Germany. In the end you can only demonise another people and another culture when you do not know them.
Thanks,
      GerardM

£3.99 a month for #Facebook ... what a #scam

A friend asked me to "join a group" protesting a monthly fee for Facebook. I googled and found that there is a scam trying to get you to an outside website in order to get some malware on your PC. There have been several such scams and currently they ask for £3.99, which is funny as the pound is a British currency and Facebook is American; $3.99 is what I would have expected.

So why are people afraid of Facebook asking for a monthly fee? Maybe this is because its "value" of multiple billions of dollars cannot be explained. Much of it lies in its ability to sell adverts, to sell information about people and to take your data prisoner.

The high valuation of Facebook is tolerated because of its growing audience and its growing traffic. It is about to subsume the Wikipedia content and what is of no monetary value to the Wikimedia Foundation will become valued and monetised traffic sustaining Facebook. There are similarities with a pyramid scheme and when this proves to be the case, the high hat will not have money popping out but a customary bunny.
Thanks,
       GerardM

"Results from the past offer no guarantee for the future ..."

This is part of a mandatory quote when products of the financial sector are advertised in the Netherlands. In life insurance, it is calculated what average result can be expected. As an individual you can improve your odds with a proper diet, exercise and staying out of any firing lines.

When you plan a strategy for a multi-faceted organisation like the Wikimedia Foundation, numbers give a baseline, something to measure against. However, as we decide to focus on what has the biggest impact, what is known to work best, we ossify. We expect all our smaller Wikipedias to grow in a similar way; some do and some don't.

We do know that we can influence results, but in order to do this we first need to be convinced of what is to be done. Numbers will not help us because at best they show the impact after the fact. A good example is the effect of the LocalisationUpdate extension. It has had a measurable impact on the quantity of MediaWiki localisations at translatewiki.net and there are indications that this had an impact on the traffic on some Wikipedias, but sadly there are no hard numbers that establish a relation between traffic and localisation.


The aim of the Wikimedia Foundation is to increase the reach of our projects. This means that we want to have more traffic. When you look at the official WMF documents, you find this list that is incomplete. The reason given is that the source for these numbers is not that great outside of the first world. It is however not the only source; Alexa is more informative. So with numbers from elsewhere, we have more of a clue about our impact in a given country and consequently we are better able to "measure" the evolution of our impact.

When numbers are available, we can look for anomalies. Why does the Indonesian Wikipedia outperform all the Indian Wikipedias combined? According to Alexa the impact of Wikipedia is higher in India, and it is not that they do not know their English in Indonesia. The support for the scripts of the Indian languages is problematic. This showed itself when we tried to support the mobile Wikipedia interface and it showed in the problems realising the 500 article Malayalam Wikipedia CD. The Indonesians have fewer editors but many more readers ...

There may be other factors as well, but is it not simpler to fix the known issues first? We know that many of our languages are badly supported; the fact that Wikipedia supports more languages than the CLDR makes that sufficiently obvious. What does it take to support our languages? What does it take to overcome the bias that has us only support the Wikipedias that are already big? What does it take to make Wikipedia a platform that will truly bring the sum of all knowledge to everyone?
Thanks,
      GerardM

This does not completely answer the question, what does it take to invalidate a strategy. More is needed to do that.

#Commons is blocked in #Iran

Commons, the biggest resource of freely licensed educational media files, has been blocked in Iran. Commons, as you may know, is the shared media repository used by all the projects of the Wikimedia Foundation, most notably Wikipedia.

With 6,489,509 freely usable media files, Commons is the premier source of illustrations for the projects of students of all ages worldwide. The pictures at Commons have been contributed by volunteers from all over the world and they give an unequalled coverage of every country and every culture.

The flag of the Islamic Republic of Iran as seen in Iran
In order to improve on the educational qualities of Commons, representatives of the Wikimedia Foundation are in ongoing discussions with museums to open up the cultural heritage of our world. We have welcomed several important collections in order to strengthen the free and educational nature of Commons.

The flag of the Islamic Republic of Iran as seen in the rest of the world
We are saddened that the Iranian students are deprived of such an important educational resource and we hope that the decision to block Wikimedia Commons will soon be reverted.
Thanks,
      GerardM

Thursday, April 22, 2010

Mündung der Lauchert in den Lauchertsee

I was happily surprised when I learned that Geograph has a German sister project. The British project covers some 76.3%; the new German project already covers 1%.

Both projects advertise their CC-by-sa credentials and both projects inform the world that their material is available for use in Wikipedia. The British images can be found on Commons, and the German images as well.

© Copyright Hansjörg Lipp and licensed for reuse under this Creative Commons Licence.

All the Geograph pictures come with geo-tagging. They are another large collection that provides provenance to this type of data. It is another example that shows that the existing absolute lack of discussion with OpenStreetMap is plain silly.
Thanks,
       GerardM

The gospel according to John

#SignWriting enables the writing of sign languages, all sign languages. The only way to learn if it is for real or just an academic exercise is to ask: do people write and do people read?

The bible has always been one of the first books that is translated into a language. The gospel according to John is now available both as a book and for reading on the web.

That is the writing part.





This video shows Ed. Ed has been deaf from birth; he does not know SignWriting in depth, but he got interested and studied it a little. He then reads chapter 3, verse 16 of the "gospel according to John" from the book.

Have literature, can read!!
Thanks,
      GerardM

Wednesday, April 21, 2010

#Feedjit Recent visitors

Recently Google translate was added as a function to this blog. Since then I have seen a growth in the number of visitors to my blog.

The Feedjit recent visitors widget shows a sliding window of visitors and, as more Asian visitors show up, I see fewer American visitors. It is obvious from this map that Europe is "my" heartland.

The business model of Feedjit is one where they inform you about the traffic on your website. Lately, they ask people to identify themselves as a guest of my blog with a Facebook, Twitter or other account.


I am not sure what to think of it. On the one hand it is nice to know when my ma or my sister has visited, but it is also the kind of information I do not need to know, and I do not want it to be aggregated for my blog and up for sale.
Thanks,
      GerardM

The #location of the Ruïnekerk is 52°40'10"N, 4°42'1"O

The Ruïnekerk is one of the entries on a list of monuments in Bergen (NH). This list is available on the Dutch Wikipedia, and it was provided to the Dutch chapter by what can be considered the "copyright holder" for this data.

This is a well defined subset of clean data that exists in a Wikipedia. The problem that we face is that we do not know for many data-items who the copyright holder is. This is not only a problem in the free content world, it is no less a problem in the commercial world.

In TechCrunch there was an article proposing an open database with the kind of data we want to express in OpenStreetMap format. The funny thing is that almost all these commercial organisations have much of their data because of the cooperation of the public. That is in and of itself not enough of a reason to make such an open database freely licensed. What should be convincing is the ability to mash data, particularly with Wikipedia.
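Mashing location data usually starts with getting coordinates into one format. The coordinates in the title of this post are written in degrees, minutes and seconds, with the Dutch "O" (oost) for east. The sketch below converts them to decimal degrees; the function name and the accepted hemisphere letters are my own choices, and reading "O" as east is an assumption that holds for Dutch but not for every language.

```python
import re

# Hemisphere letters: English N/S/E/W plus the Dutch Z (zuid) and O (oost).
# Assumption: "O" means east, as in Dutch; in some other languages it means west.
NEGATIVE = {"S", "Z", "W"}

PATTERN = re.compile(r"""(\d+)°(\d+)'(\d+(?:\.\d+)?)"\s*([NSZEOW])""")


def dms_to_decimal(text):
    """Convert a coordinate such as 52°40'10"N to signed decimal degrees."""
    match = PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a recognised coordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # Southern and western hemispheres are negative by convention.
    return -value if hemisphere in NEGATIVE else value


latitude = dms_to_decimal('52°40\'10"N')
longitude = dms_to_decimal('4°42\'1"O')
print(f"{latitude:.6f}, {longitude:.6f}")
```

For the Ruïnekerk this gives roughly 52.6694 north and 4.7003 east, a form in which the location can be compared with other geo-tagged data.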
Thanks,
     GerardM

Tuesday, April 20, 2010

A #keyboard to enter "foreign" characters ..

#Google announced new functionality for people who search for text in a different script. "ISO 9995" is the ISO standard defining layouts of computer keyboards. It is a great and honest way of enabling people to enter search items.

Russian keyboard layout

The idea is lovely, and it would be even better if this "virtual" keyboard were available not only in their search bar but also as an entry method wherever the cursor is.

I would love for Google to go overboard and support scripts like the ones for Javanese, Malayalam, Kannada and Nepali. As the list goes on and on, I will not go overboard, but you get the idea.
Thanks,
      GerardM

Monday, April 19, 2010

Of tiny acorns mighty oaks grow

There is no Ganda text yet
The Ganda Wikipedia is a tiny #Wikipedia. Last month its number of articles grew by 6%, or two articles; its traffic is expected to grow to 42K, up from 26K; and today we welcome localisations at translatewiki.net.
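A quick check of the arithmetic behind these figures, as a minimal sketch. The function names are mine and the inputs are the rounded numbers quoted in this post, so the results are only approximate.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


def base_from_increase(increase, percent):
    """Estimate the starting value when an absolute increase equals a given percentage."""
    return increase / (percent / 100)


# Traffic expected to grow from 26K to 42K (rounded figures from the post).
traffic_growth = percent_change(26_000, 42_000)

# Two new articles being roughly 6% growth implies a starting count in the low thirties.
articles_before = base_from_increase(2, 6)

print(f"traffic growth: {traffic_growth:.1f}%")
print(f"articles before: about {articles_before:.0f}")
```

With so few articles, every single addition moves the percentage by several points, which is why the relative numbers of a tiny Wikipedia look so dramatic.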

If the Ganda language were in the incubator, few people would notice because it has quite some way to go before it reaches the requirements for creation. However, this is an existing Wikipedia and as such its current activity is really welcome.

When you consider that some 10 million people speak this language, roughly 10 times more than Estonian, you can appreciate that this tiny acorn has potential. As Hans Rosling explained in one of his talks, there are people in Africa who can, who have the potential and who have the possibility to do whatever we do in the west.

Our strategy should be to enable the people who can, to provide the tools to make a difference. A difference can be made when we know what people want to read. The WMF wants to grow its traffic and the Ganda Wikipedia grew 53% on a yearly basis. To compare, last month's traffic for the Estonian Wikipedia was 6.3 M and decreased on a yearly basis by 5%.
Thanks,
       GerardM

Travelling to #Holland

The #Tropenmuseum is preparing an exhibition of pictures by Leonard Freed (1929-2006) that will be called "One way trip Holland". Freed made pictures in the period 1958-1962 of a diverse group of people who immigrated from Indonesia. Among them were Dutch Indonesians who belonged to a large group of people who made the trip by boat after the independence of Indonesia.

Freed took pictures when they got off the boat, when they arrived in temporary camps, when they met their family, on the street, working and at school. He also took pictures of Moluccan families who had already been living in the Netherlands for seven years.

Who is this man, who is this kid.. Are they related?

These pictures are exhibited for the first time. Many pictures show unknown people. The Tropenmuseum wants to know if you can identify these people.
Thanks,
        GerardM

#StatusNet is nice, it is getting awesome

When you consider running your own microblogging service, you want to know what StatusNet already provides and what it is working on. When you want to experience the functionality first hand, identi.ca is the place to be.

So what is on offer:
  • your server does not suffer from the Twitter fail whale
  • the software is available under the GNU Affero General Public License
  • consequently interoperability will not be revoked because of corporate interests
  • #sntrans is the identi.ca group with status updates for the localisation
  • StatusNet has excellent localisation in 14 languages and decent localisation in another 15 languages
  • the core software is localised at translatewiki.net and you are welcome to make a difference for your language ...
  • Bi-directional text support is being tested !!
  • preparations are made to localise extensions as well
All this makes it a really nice package: open source software with professional support available, localisation in many languages and extensions to add the functionality that is not provided in the core functionality.
Thanks,
     GerardM

The Getty does not get it; it is the digital age !!

I received an invitation from the Museum Computer Network (MCN), Gallery Systems, and the J. Paul Getty Trust for a *free* webinar on a new vocabulary under development, the Cultural Objects Name Authority™ (CONA).

The idea is that people who know GLAM terminology will contribute to a new vocabulary. To use the vocabulary they have to pay for the privilege and will then get a five year license. In my terminology this is called "adding insult to injury".

I am sure that there are enough reputable organisations in the GLAM world who can organise such a cooperation and consequently build a vocabulary that is equally useful and equally reputable. The technology for building such a vocabulary can be found in OmegaWiki; I am interested to learn if the Getty is able to provide a user interface in as many languages.
Thanks,
      GerardM

English #Wikipedia picture of the day

Today's picture of the day shows an Ottoman machine gun corps before the Second Battle of Gaza, which took place on 19 April 1917.

POTD/2010-04-19
It is a picture from the archives of the Library of Congress. Sadly there is no link on the description page that points back to this image in their digital archive. Consequently there is no proper provenance and we cannot add the template that this is a LoC picture.


With this template in place, we will find at the end of the month that the LoC has generated even more traffic. Such numbers are important to our GLAM partners.
Thanks,
      GerardM

Sunday, April 18, 2010

The best of the #Malayalam #Wikipedia

Promoting Wikipedia on the Indian subcontinent is important. All the Wikipedias can do with more readers, more editors and more attention.

The best 500 articles of the Malayalam Wikipedia have now been published on a CD. This project was presented at the Malayalam Wikipedia Meetup 2010 and it was announced on the blog of Santhosh Thottingal.

When you read the announcement, you will learn about the technical issues they faced, including issues with Unicode. It is really nice that the software, written in Python, is available for use with other languages. My question is: is the software in English or does it need localisation?
Thanks,
      GerardM

Friday, April 16, 2010

What does it take to invalidate a strategy

The #Wikipedia strategy presented by the WMF director is significantly at odds with my perception of the Wiki world. I have argued in the past against an over-reliance on numbers.

It is easy to find fault with some of the statistics. Those faults can be remedied, but in my opinion it is the acceptance of those faults, and the arguments that allow them to persist, that invalidate the statistics and the conclusions based on them.

screenshot at 18:48 Amsterdam time

It has already been announced that there are a billion edits. This is at odds with the counter that I blogged about in the past and, this counter is generally known. I discussed the higher number with Avatar and he had to admit that even his numbers were wrong.

Numbers and, by inference, statistics support a point of view. When you use statistical analysis properly, you infer from the numbers the reality of a situation and this allows for the formulation of a strategy. All too often the result has been predetermined, and consequently the approach to the numbers will not only confirm held convictions, it will prevent optimal results.

I wonder if the WMF is willing to consider the bias in its numbers and the impact on its strategy.
Thanks,
      GerardM

#WMCON if it is about traffic why do we look at article numbers

The #Wikimedia Foundation chapter conference can be followed on Twitter, and it is interesting to learn how certain ideas go wrong. Take for instance this statistic.


It combines the growth of our number of contributors and the number of articles. A central theme of the conference is that we want more people to read/use our content. This suggests that it is more important to learn the number of contributors and the traffic that they generate.

The traffic numbers can be found here and they suggest that the Indonesian Wikipedia generates more traffic than all the Wikipedias from the Indian subcontinent combined.

The current fascination with big numbers ignores that a few editors can have an exceptional impact. Writing about all the motorcycle brands and types generates many articles, but who reads them? Background information to the news is harder to write but is more likely to provide what people are looking for.

There are lies, damned lies and statistics; when you are not careful, statistics confirm what you expect. When our aim is having more people use our data and read our articles, then that is what we should stimulate. At this time there is a lot of technology we do not have because it does not fit well with the fascination with high numbers. Such technology would benefit the growth of ALL our projects and will therefore provide the most bang for the buck.
Thanks,
     GerardM

I am leaving on a jet-plane ...

Eyjafjallajökull ash
... don't know when I'll be back again...

The #Wikimedia tech meet and the chapter meeting are under a dark cloud; people cannot come or go. In Denver there is another conference with Wiki people who will have to return home.

I hope everybody who gets delayed will find interesting, fun things to do. It is one of those unavoidable things, so why not make the best of it?
Thanks,
       GerardM

Housing "crazy junks"

In a truly civilised society, it is expected that we look after the sick and the destitute. This is often motivated by faith: faith in a deity, in humanity or faith that help will be there when lady luck deals you a bad hand.

For people who suffer from long term psychiatric conditions, just living like everyone else cannot be taken for granted. They need support in order to maintain their place in a society that all too often is averse and anxious when they are around. When such people are both "crazies and junks", they are treated like toxic waste.

Such people are often housed separately; when you consider them criminals, they go to jail, when you consider them sick, they go to a hospital, when you consider them addicts, they go for a detox. The sad truth is that many "crazy junks" do not fit in such a standard framework; they often end up living rough or in jail. They are feared by society and by their family, and are left without much of a social framework.

Statistics have it that everyone can end up in such a state. It is the worst nightmare for an individual, for his or her family and for society. In Utrecht, experience has taught that such people are not beyond help; they do not need to live rough, as they get to a stage where they accept shelter: a roof over their head, a place that is theirs.

In my town Almere, in my backyard, they are planning such a hostel, a hostel for people "with a double indication". It will be the first of the many needed to provide care for all such people that live in my town. I have two options: I can get involved and help the hostel be safely embedded in my neighbourhood, or I can be opposed and try to prevent my fellow Almerians from finding a safe haven in what is their town as well.

I prefer to be part of a solution. I became part of a group of people who oversee the preparations, the future operation and the procedures. This group consists of local government, police, experts and neighbours. You can imagine that my choice is not appreciated by everyone. I hope and expect that when the hostel has existed for a few years, most people will have forgotten the current controversy.
Thanks,
      GerardM

Thursday, April 15, 2010

#DBpedia 3.5 has been released

#Wikipedia aims for its content to be used: reading it, mashing it, analysing it, republishing it, whatever. DBpedia is an amazing example of what can be done with all the data gathered. To quote from the 3.5 release announcement:

The new DBpedia knowledge base describes more than 3.4 million things, out of which 1.47 million are classified in a consistent ontology, including 312,000 persons, 413,000 places, 94,000 music albums, 49,000 films, 15,000 video games, 140,000 organizations, 146,000 species and 4,600 diseases. The DBpedia data set features labels and abstracts for these 3.2 million things in up to 92 different languages; 1,460,000 links to images and 5,543,000 links to external web pages; 4,887,000 external links into other RDF datasets, 565,000 Wikipedia categories, and 75,000 YAGO categories. The DBpedia knowledge base altogether consists of over 1 billion pieces of information (RDF triples) out of which 257 million were extracted from the English edition of Wikipedia and 766 million were extracted from other language editions.

All this data is there to be used. That is the whole point to DBpedia. It makes all this data available in a format that makes it easy to mash, to analyse, to verify. The key thing is that through DBpedia Wikipedia is linked to many data sources.
  • we can project geo data on maps
  • we can compare / verify data with other sources
  • we can improve the consistency of our data
  • we can compare the data between the different language versions
  • we can link our illustrations to the GLAM that hold the original
  • we can link our sources to public resources where you can read the original text
  • we can link our sources to the libraries where you can find a source
DBpedia is a tool that exists today, a tool that wants to be used. It is the kind of tool that helps us out of our isolation and provides us with a niche in the wider data world.
Thanks,
      GerardM

Wednesday, April 14, 2010

#mw2010 you provide provenance, #wikimedia provides traffic

Many #Wikipedia illustrations are of material that is clearly in the public domain due to age. The material that we have in our Commons repository can be a scan from a book, it can be a picture taken in a museum.

The current practice is that the source of an object does not need to be more than "painting by Rubens". Typically the original of many of our illustrations can be found in a GLAM and, to appreciate the illustration for its validity, it is important to know where the original can be found.

The provenance provided in this way allows people to learn about the original. In this way they can visit the GLAM and study the original. When a GLAM shares its collection with us, we ask for annotations and for a way to refer back to the website of the GLAM where the object can be found.

In this way we provide provenance to the readers of Wikimedia information; we also find that people learn to appreciate the GLAM for preserving our heritage. For some organisations, like the Library of Congress, we have no direct connections, but they have become relevant to us and our best practices have us refer to their website.

the Library of Congress template
It is in the interest of both of us when the objects that are preserved by a GLAM refer to that GLAM. I do not see any problem with appropriate templates that inform about the location of originals and provide the provenance that we expect of our sources.
Thanks,
     GerardM