From: Peter Steward <peetmate@gmail.com>
Date: 2019-05-07 18:54
Subject: Re: Kenya Bird Map data

Hi David,

thanks for this! This is a great start. If we can find a mechanism to recover the metadata from BirdLasser, this will be a wonderful research resource. I am guessing there must be a no-reuse or no-resharing clause on data retrieved from BirdLasser, so that we can't just bulk download all the data once a year, store it, and then share it from our own database? That would be a suboptimal solution, as we'd then have to build our own server and API, but in this day and age that can't be particularly difficult, and surely the dataset is relatively small compared to the gigs of data we usually work with?

It would also be good to have a forum/document for the needs of stakeholders/users (use cases) and for where the KBM and the ornithological community in Kenya stand on this. We might need to source further resources to solve the issues mentioned in your email and to fund the exciting outcomes that can be generated from the data, so it would be good to have a clear statement of needs, objectives and pathways to impact.

There is also a whole suite of open-source resources for sophisticated analyses of biodiversity and species data that I only superficially understand. These new approaches could be efficiently applied to atlas data to explore things like encounter rates, environmental associations, SDMs, occupancy modelling and relative abundance modelling. One of my bosses is a Professor of Spatial Ecology who loves exploring drivers of diversity and patterns of change (although mostly with pollinators). He often works with gappy data for species less studied than birds, so I can easily get his input on the technical aspects of analysis.

Best,

Pete



On Tue, 7 May 2019 at 11:15, David Clarance <dclarance@gmail.com> wrote:
Dear all,

Over the last 1.5 years I've been somewhat involved in trying to access and analyze data from the Kenya Bird Map. While the methodology is sound and the efforts have galvanized a community around it, I can't help but see issues that go against the very ethos of open and citizen-led science. I hope this note is taken in a positive manner, as a way to push for the rights of Kenyan citizen scientists and to strengthen our approach to conservation.

First, a note on how the data is collected and stored. Data is collected via BirdLasser, an app owned by an independent private firm based in South Africa. Some of the data is transferred, in a very specific format, to the University of Cape Town (UCT) servers, which host the data and provide the back-end for the various bird maps (Kenya Bird Map, Nigeria Bird Map etc).

Here are a few points I would love for the more senior and influential birders and researchers to think about:

1. The data is owned by a private firm. 

This to me is the biggest issue. The folks at BirdLasser are wonderful and no doubt committed to conservation and science as their constitution declares. However, their Terms and Conditions are a bit more nuanced and they are not exactly committed to sharing data with researchers. Let me start with an example: A few researchers and I were interested in using the GPS coordinates to build species distribution maps. We approached the KBM, only to learn that the servers at UCT do not actually have that data. We went to BirdLasser and were told that the user agreement states that they cannot share data without the explicit consent of the user. This means that it is virtually impossible to get all the data, since you would need every user to give consent. There are, however, two ways to request data:

(a) Register a cause: Users sign in and give consent. Again, a really difficult thing to do because every user needs to sign up manually (go to causes > sign up, etc.).
(b) Ask for raw data: Only the GPS points, date and time, and species are available, and only as a one-time data request. After that you would need to pay ~KSH 5000 for every future data request. Note that this applies to EVERY request after the first one. I get that it's expensive to maintain servers and the app, but there are two major red flags for me here:

(i) Citizen scientists collect data for free using their own resources and time. Surely they should not be charged for using the data they generated?
(ii) Can a Masters student in Kenya actually afford that amount? I certainly would not be able to if I were a student.


2. Getting data has been incredibly difficult

I work as a data scientist with a background in experiment design. When I learned about this incredible data source, I was REALLY excited and was very surprised by how little it has been used. Colin Jackson and I tried very, very hard to get data for over a year and only last month were able to get access to one API call. The researchers at UCT (who I'm sure are busy) have been very unresponsive to requests for data. Further, the data we got was in a very specific format that was designed to produce reporting rate curves. I used the API call to create a library in R for researchers to use. You can find the repository here

As you can imagine, a process that takes a year just to get summarized data would virtually derail a PhD student's thesis. I was fortunate enough to know Colin and others who pushed strongly for it, but most students do not have that kind of support. I think it's extremely unfair that researchers at UCT have access to data collected in Kenya but Kenyan researchers do not.

3. Valuable data is missing

The various bird atlas projects were designed before BirdLasser came into play. This means that when the atlas shifted over to using BirdLasser as an input source, they chose to stick with the old tables and not update them to include valuable information such as breeding or the various other options that BirdLasser provides.

What does this mean? Most of the additional information that you input into the app, such as breeding information, counts and all the other species info, is not actually captured by the Atlas. This makes sense to a degree, because the atlas is meant to capture records in a square, but pause for a bit and really think about it. We are losing an incredible amount of information, especially breeding records, that is captured by users but not used in the map. If you refer to point 1, this means you would potentially have to start a cause or pay to get all this additional information.
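To make the loss concrete, here is a minimal Python sketch of what collapsing raw sightings into "species recorded per square" does. It is an illustration under stated assumptions, not the actual KBM or BirdLasser pipeline: the record fields (`lat`, `lon`, `species`, `count`, `breeding`), the function names and the (row, col) cell index are all hypothetical; the only fixed idea is that an atlas square (pentad) is a 5-minute by 5-minute grid cell.

```python
import math
from collections import defaultdict

CELLS_PER_DEGREE = 12  # 60 minutes / 5 minutes: a pentad is a 5' x 5' cell


def pentad_cell(lat, lon):
    """Return a hypothetical (row, col) index of the 5-minute cell holding a point."""
    return (math.floor(lat * CELLS_PER_DEGREE), math.floor(lon * CELLS_PER_DEGREE))


def to_atlas_presence(records):
    """Collapse raw sightings to the set of species seen in each cell.

    Only 'lat', 'lon' and 'species' are read; every other field on the
    record (counts, breeding notes, exact position) is discarded.
    """
    presence = defaultdict(set)
    for record in records:
        cell = pentad_cell(record["lat"], record["lon"])
        presence[cell].add(record["species"])
    return dict(presence)


# Made-up example records, for illustration only.
raw_records = [
    {"lat": -1.30, "lon": 36.80, "species": "Superb Starling", "count": 12, "breeding": None},
    {"lat": -1.32, "lon": 36.82, "species": "Superb Starling", "count": 3, "breeding": "nest with eggs"},
    {"lat": -0.05, "lon": 37.03, "species": "Hartlaub's Turaco", "count": 1, "breeding": None},
]

atlas_view = to_atlas_presence(raw_records)
```

The two Superb Starling sightings fall in the same cell and become a single presence entry, so the counts and the breeding note never reach the atlas view.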

I think BirdLasser/BirdAtlas is a valuable tool to maintain a bird map, but if we want to go deeper and think past presence/absence it gets murky. I think most citizen scientists in KE are under the impression (as was I) that all of this data is available to researchers but it really is not (at least not easily).


So here's my challenge: 

(a) Is the process of data collection and storage through BirdLasser really citizen science? Citizen science comes with free availability of all the data produced by citizen scientists. 

(b) Is Kenya's bird data in its rawest form safe? As someone who has experience in data engineering, I'm really not sure. In my opinion, public data like this should always live with public institutions, be it an open repository like GBIF or a university like UCT. We need raw data to be safely stored.


Where do we go from here?

These are my suggestions on the way forward:

1. I think the BirdLasser agreement needs to be rethought: It makes me uncomfortable to have a private firm with no institutional or university backing own all the data that birders collect. BirdLasser is wonderful and has provided a brilliant service but perhaps we could set up an institution that gets all of the BirdLasser data and keeps a backup -- perhaps in GBIF or at the museum? I know some of the pentad data already exists in GBIF. It would be great to get raw data in there too.

2. Raise funds for the proper development of the KBM website and backend: From what I gathered, the project seems to be understaffed at UCT and is in desperate need of funds to build out a team. 

3. Have some sort of accountability mechanism: This is standard practice in any professional work environment where there are deadlines and targets that are agreed upon and then evaluated. The atlas currently does not have that. We have very little visibility on any of the upcoming features or plans. Perhaps the Bird Committee at Nature Kenya can demand accountability and transparency on what's happening with our data and why it's so hard to get raw data from UCT?

As I wrote earlier, I really hope this email will spark a productive conversation towards ensuring better quality and open data. I hope some day Masters or PhD students in Kenya and all over Africa and the world will be able to access complete data easily and enable conservation efforts.

Best wishes,
David