Transcript


Brad Cuthbert

Well, we can kick off here.
So just confirming up front.
This session's gonna be recorded and uploaded online for information sharing; we can cover off on some of those details a little later. My name is Brad Cuthbert. I'm from the private sector partnerships team within Advance Queensland. We manage the Private Sector Pathways programme, and today we're focusing on our current partnership with the Science Division, which is outlined on our Advance Queensland website.
Joined also by our colleague Melinda, who's looking after questions for us today, and we've got Daniel Ferguson, and we're trying to get Dan Brough on as well.
So good timing. Dan's joining us as well.


Daniel Brough

Yep, sorry.


Brad Cuthbert

Do you guys want to maybe both provide a quick intro to who you are and where you're from? It'll probably be useful at this time.


Daniel Ferguson

Do you wanna go, Dan? Are you here somewhere?


Daniel Brough

Yeah, you can go first then.


Daniel Ferguson

Alright, I'm Dan Ferguson from the Queensland Herbarium and Biodiversity Science group. I'm an ecologist here within the Department of Environment, Science and Innovation.
My expertise lies in both the ecological space and in, you know, big data management, managing a really diverse set of ecological data.


Daniel Brough

Yeah, thanks. And yeah, I'm Dan Brough. I'm the science leader in the Information and Digital Science delivery group, and so my role is to provide scientific computing support services to all of our scientists in the division. That includes providing our data platforms, but also advice around AI and advanced technologies, which is where I'm helping support Dan with this process.


Brad Cuthbert

Wonderful. Thank you both for joining us.
Just a quick indulgence before we get into it: acknowledging the traditional owners of the lands on which we all meet today. For our office in Brisbane, that's the Yuggera and Turrbal people, recognising their continued connection to land, waters and communities.
We'd like to pay our respects to Aboriginal and Torres Strait Islander cultures and to Elders past, present and emerging, and acknowledge that they were our very first innovators.
Again, privacy notices: this webinar is being recorded and will be uploaded to the Advance Queensland website to help with information sharing for those who are unable to join us today and hear and be part of this conversation with the challenge owners.
The other consideration is obviously that we're entering into caretaker mode with the Queensland Government election coming up on the 26th of October, so those details are included as part of the online application process. So just be mindful of those parameters.
So again, we manage the Private Sector Pathways programme. The programme's all about working with corporates to define challenges that we put out to market to find innovative solutions from Queensland businesses. To date we've done over half a dozen challenges with different corporates. This one relates to the work of the Queensland Herbarium and Biodiversity Science group.
And this is just a quick snapshot of the process for the challenge.
We've had this challenge open for almost two weeks now, with the closing date set for the 26th of September at 2:00 PM.
We're now in the information exchange stage, which is the stage that could be most affected by the caretaker period, which commences from the 1st of October. We anticipate undertaking the preliminary assessment during this time, with a shortlist of solutions presented to the panel.
And following the establishment of the new government in Queensland, we look to formally engage with the winner, develop the contract for up to $100,000 for this challenge and kick off the pilot project for up to six to eight months.
This challenge is in partnership with DESI and the Queensland Herbarium and Biodiversity Science group. So we've partnered with the Science Division in DESI for the co-funding, and the co-funding and mentoring as part of the programme will be done through Science and Advance Queensland.
So we're looking for a wide range of innovative solutions to this challenge, and there are some eligibility criteria as set out in the online documentation.
We're looking for established SMEs based in Queensland, and that refers to the ABN registration details for the business.
With regards to the development stage, we consider a minimum viable product to be a product or service that is beyond the initial proof of concept or prototype stage and has been tested with potential customers.
So just a bit of clarification there: we're not looking for very early stage solutions, and we're not looking for businesses that have already been, or are currently, in contract with the Science Division.
There's a diagram there which may help you determine what stage of development your product or service is at in relation to achieving MVP stage.
We've set up some key dates for 2024 for this challenge. As mentioned, we've had the challenge open for almost two weeks now; the 29th of August is when it opened, and today we're doing the information webinar. Applications are set to close at 2:00 PM on the 26th of September.
At this stage dates are to be confirmed, but we're looking to shortlist applicants and have them notified the week of the 14th of October, and schedule the pitch presentations the week of the 21st of October. So keep an eye out and have those dates in mind for the next stages of this challenge.
This is kind of a snapshot from the Advance Queensland website. Applications can be opened from the Private Sector Pathways page; it's the same page that you visited to register for this information webinar. We've addressed some of the key things we're looking for from your application here as well.
Links from the web page will take you into a SmartyGrants application process where you can fill in the respective information for your solution.
As far as evaluation goes, we have some criteria we'll be assessing against: specifically, how well the solution addresses the challenge, whether your team has the capability to deliver the solution, and whether the solution is viable in terms of the technology, that is, is it at least at an MVP stage, as identified earlier? You'll also be asked to provide an activity budget to outline how the funds will be used so the panel can assess if it provides fair market value.
It will be a closed pitch event.
So shortlisted applicants will be invited to pitch their solution to an expert panel. We anticipate around 15 minutes for the pitch, with about 15 minutes of questions and answers following. We will advise those successful of the venue.
I think we're talking a couple of months from now, so we have some time to figure that out. The winner will be notified subject to caretaker provisions and, as identified, the successful applicant will receive grant funding of up to $100,000 to pilot their solution for up to six to eight months with the Science Division.
So we can hand over here to our partners to provide some more details specific to the challenge. I'm not sure which Dan wants to drive, but just give me an indication of where you are in the slides and we can progress them for you.


Daniel Ferguson
Yeah, thanks Brad.
And thanks everyone for coming along today. I guess my role here is to provide some background on what the problem is and what our current work processes are for processing really vast amounts of remote camera trapping data that we generate from field-deployed, remotely operating cameras that take photos of wildlife. We then need to bring that data back into the office and process it to firstly find, and then identify, animals to species or species groups. It really depends on the quality of the image how far we can identify those species or species groups, and it also depends on what our research or monitoring question is.
So the cameras are deployed to answer a whole range of ecological questions, from, you know, species presence/absence, which is pretty basic, right through to species abundance, population trends and movement patterns, and lots of things that we probably haven't thought of as well.
These deployments can be in the more traditional manner; ecologists out there will understand that camera traps were designed for medium to large animals, usually mammals. We push the boundaries of how camera traps can be used and modify them quite significantly to try and detect smaller cryptic species like geckos and skinks, some of which you can see photographed further on in the presentation.
And when we shift into focusing on reptiles and frogs in particular, camera traps aren't designed for that, so we often have to use time-lapse triggers, and that generates huge quantities of data to process and often a lot of empty images that we need to quickly get rid of.
So our process is basically to download the field-collected data off the cameras. We do have some live cameras that feed back through networks and the like, but we pretty much treat those the same and batch process them back in the office.
For us at the moment it's pretty basic, and it involves running models like MegaDetector as a first cut over the top of those datasets to identify animals versus empty images. It removes false triggers and boxes up potential animals that we can then go through and start identifying.
MegaDetector and lots of the other open source models just don't really work that well for a lot of the species that we work on. Some of the geckos and skinks just don't get picked up at all, even if they're quite obvious in the images, so we'd like to be able to improve that. We do use programs like Timelapse to display the MegaDetector results and use their filtering processes to filter down and help speed up the process of identification, and to quickly check what has been flagged as empty images and that they are definitely empty.
Once we've identified things manually, we'll then bulk upload that data into our SQL database.
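[To illustrate that first-pass step, here is a minimal sketch that splits a MegaDetector batch-output JSON into "review" and "likely empty" lists, assuming the standard batch format of per-image detections with a category code and a confidence score; the threshold, category code and file name below are illustrative assumptions, not the team's actual pipeline.]

```python
import json

# Illustrative sketch: split MegaDetector batch output into images worth
# reviewing (animal detected) and likely-empty images. Assumes the standard
# batch JSON layout: {"images": [{"file": ..., "detections": [{"category":
# "1", "conf": 0.93, "bbox": [...]}, ...]}, ...]}, where category "1" = animal.

CONF_THRESHOLD = 0.2  # deliberately low: cheaper to review false positives than to miss animals


def split_detections(results_path):
    with open(results_path) as f:
        results = json.load(f)

    animal_images, empty_images = [], []
    for image in results.get("images", []):
        detections = image.get("detections") or []
        has_animal = any(
            d.get("category") == "1" and d.get("conf", 0) >= CONF_THRESHOLD
            for d in detections
        )
        (animal_images if has_animal else empty_images).append(image["file"])
    return animal_images, empty_images


if __name__ == "__main__":
    animals, empties = split_detections("megadetector_output.json")  # hypothetical file name
    print(f"{len(animals)} images flagged for review, {len(empties)} likely empty")
```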
So as you can probably tell, our current focus is pretty much mostly on vertebrate groups: your frogs, reptiles, mammals and birds. But I think the ideal solution should be capable of being trained to identify pretty much anything that shows up in a photograph, so even invertebrates, butterflies for example, should be pretty straightforward to do.
It's not part of this project, but you've probably noticed in the documentation that this project is looking into the future, and it would be really good to be able to incorporate a solution that can then be expanded or adapted to include acoustic analysis as well. But that's not actually part of this challenge statement at the moment.
Next slide please.
So what does success look like?
Well, we're really chasing, you know, a user-friendly end-to-end workflow rather than models built for us. Rather than actual AI or machine learning models to individually identify species or species groups, we're really chasing that kind of end-to-end workflow, and by that I mean the ability to create our own models, or use existing open source models as starting points, to pull things out of these images.
We'd like to be able to generate kind of transparent models that give us performance statistics, because sometimes all we might be after is presence detection from a really large sample, so we can adapt our sampling strategy, and we're not chasing a really well performing model to give us, you know, great results. We might be happy with a poorly performing model that just gives us some presence records.
Other times we might prefer a well performing model, because the question is around abundance or population trends, and accurate identification then becomes much more important.
So having control over building that model, what goes into it and how well it performs is quite important to us, rather than kind of black box models that churn out a result which is often well performing. Sometimes we might want to be able to build a really rapid model that doesn't perform well but actually gives us the answer that we're chasing.
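[To illustrate the kind of transparent performance reporting being described, a minimal sketch assuming a held-out set of expert-verified labels and a model's predictions for it; the class names and numbers are hypothetical, and scikit-learn is simply one common choice of metrics library.]

```python
# Illustrative sketch: per-class precision/recall and a confusion matrix on a
# held-out, expert-verified set, so a quick model can be judged "good enough
# for presence-only questions" or "not good enough for abundance work".
from sklearn.metrics import classification_report, confusion_matrix

labels = ["empty", "gecko", "skink"]  # hypothetical classes

# Hypothetical held-out truth vs. model predictions.
y_true = ["empty", "gecko", "empty", "skink", "gecko", "empty", "skink", "gecko"]
y_pred = ["empty", "gecko", "gecko", "skink", "empty", "empty", "skink", "gecko"]

print(classification_report(y_true, y_pred, labels=labels, zero_division=0))
print(confusion_matrix(y_true, y_pred, labels=labels))
```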
As we build these models, it should be easy to remove certain training data from the training data sets as well. This is really important if models are refined enough to detect down to perhaps individuals of a species. So when, for example, we remove a cat from an area during management, we might want to be able to remove that individual from our training data set so that we no longer have that particular ginger cat.
It really does need to be an intuitive, easy-to-use interface that allows users to curate the data sets, easily upload data, choose models to run, and extract relevant metadata from the images themselves (there's a small sketch of that step below). And then step on to reviewing and tagging images after a model has been run, using some nifty data filters to speed up that data discovery, and display basic data analytics as well, maybe things like heat maps and graphs of abundance or something like that, but that's an added bonus I think.
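[A minimal sketch of the metadata-extraction step mentioned above, assuming the cameras write standard EXIF tags; it uses Pillow, and the folder name and tag choices are illustrative only, since real camera traps vary and a production tool would need per-model fallbacks.]

```python
# Illustrative sketch: pull capture time and camera model from image EXIF so
# each detection can be tied back to its deployment. Uses Pillow; EXIF tag IDs
# 306 (DateTime) and 272 (Model) are standard, but availability varies by camera.
from pathlib import Path

from PIL import Image


def image_metadata(path):
    with Image.open(path) as img:
        exif = img.getexif()
        return {
            "file": path.name,
            "datetime": exif.get(306),  # EXIF DateTime
            "camera": exif.get(272),    # EXIF Model
            "width": img.width,
            "height": img.height,
        }


if __name__ == "__main__":
    for jpg in Path("camera_trap_batch").glob("*.JPG"):  # hypothetical folder
        print(image_metadata(jpg))
```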
We'd also like, you know, the capability to take all these images that we're tagging as experts and verifying we've identified correctly, and be able to start building a bit of a data library, I guess, of tagged and verified images that we can then use for training purposes and testing models and the like across the department.
It would be great if that library could allow for a kind of taxonomic hierarchy where, you know, in the future, more complex multi-species models can provide a prediction at the lowest taxonomic level that they're confident at. So by that I mean you might be confident it's a bandicoot, but not exactly which species of bandicoot it is, so it would return you the answer of bandicoot.
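[A minimal sketch of that taxonomic roll-up idea: sum per-species confidences up the hierarchy and report the most specific level that clears a threshold. The taxonomy table, scores and threshold are illustrative assumptions, not the team's data.]

```python
# Illustrative sketch: report a prediction at the lowest taxonomic level the
# model is confident at, e.g. "Peramelidae" (bandicoots) when no single
# species clears the threshold. Lineages and scores below are illustrative.

TAXONOMY = {
    "Isoodon macrourus": ["Mammalia", "Peramelidae", "Isoodon", "Isoodon macrourus"],
    "Isoodon obesulus": ["Mammalia", "Peramelidae", "Isoodon", "Isoodon obesulus"],
    "Perameles nasuta": ["Mammalia", "Peramelidae", "Perameles", "Perameles nasuta"],
}


def rollup(species_scores, threshold=0.8):
    best = "Unknown"
    max_depth = max(len(lineage) for lineage in TAXONOMY.values())
    for depth in range(max_depth):
        # Sum the species scores that share the same ancestor at this depth.
        level_scores = {}
        for species, score in species_scores.items():
            name = TAXONOMY[species][depth]
            level_scores[name] = level_scores.get(name, 0.0) + score
        top_name, top_score = max(level_scores.items(), key=lambda kv: kv[1])
        if top_score >= threshold:
            best = top_name  # confident at this level, so try to go deeper
        else:
            break
    return best


# Confident it's a bandicoot (family Peramelidae), but not which species:
print(rollup({"Isoodon macrourus": 0.45, "Isoodon obesulus": 0.30, "Perameles nasuta": 0.15}))
```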
And of course, you know, having the capability to share these models both internally across the department (maybe the lesser performing models might be shared across the department's platform), but also to be able to push these particular models more broadly to the public for use outside of DESI as well.
Slide 3 please.
So this is a really rapidly developing area, we're well aware of that, and there are lots of AI and machine learning models being developed across the globe. In Australia there are lots of big projects as well looking to develop lots of models, but they're often for the more commonly encountered or iconic species. As researchers interested in monitoring and protecting all of Queensland's biodiversity, we'd really like to build the capability to develop bespoke models for all of these highly cryptic, often threatened species, as well, of course, as the common stuff. So we kind of want a bit of everything really.
And by speeding up these data processes, it allows for much more timely work and a greater focus on prioritising and managing Queensland's species and ecosystems. So that's what we're really looking for: to speed up that whole process, which currently takes a lot of personnel. We're hoping to cut that back and speed it up quite significantly, and that just makes the data available much more readily and faster, and for decision-making processes that's really important.
I think I'll just pass over to Dan now to talk about the technical considerations.


Daniel Brough
Yep, thanks Dan. So Brad, can we just jump to the next slide please? Yep. So just a couple of technical considerations here for this one.
Because we are dealing with species that may be rare and threatened, we need to be able to handle data at the sensitive security classification level. Obviously we hold data around those rare and threatened species quite closely, because you don't want to publish where they are, for management reasons, so we do need the data to be handled at that level.
We also need data sovereignty and data ownership to be respected in terms of how the data goes and feeds into models.
We'd really like this solution, whatever it is, to integrate with our existing data platform. Dan has quite a lot of data in that platform and his systems are there, so having a solution that integrates is important. Our platform is modelled on the Azure sort of end-to-end analytics workflow, if you're familiar with that. It doesn't mean that this solution needs to be in that platform; that's not a requirement here. It may be, but it just needs to be able to integrate with it.
And we're looking for the solution to overcome some of the local or sort of more desktop processing constraints that Dan and the team are currently facing.
And I suppose importantly for this one, the initial scope is to replace the desktop processing and not to provide any in-field solutions. So for this solution, we're not looking for something to do, you know, AI at the camera trap level, or to help us have cameras that are feeding data back automatically and being monitored automatically and remotely. We're not looking for that as part of this initial scope; it is really just to replace that desktop processing as part of the end-to-end workflow that Dan was discussing.
And I suppose the big one, I'll just reiterate it again, is data at the sensitive classification level. That's the big one for us: to make sure that the data is secure and only a limited number of people actually get access to it.
The models that predict could be treated differently, but it's the data feeding into the models that matters, and that's why Dan said we might be able to share models in the future; that would be fine. But it's around the data that will go into training the models, and the data that may feed through the model itself on our end.
So that's pretty much it on the technical considerations. Dan, did you want to take this one on the context?


Daniel Ferguson
Yeah, I thought you were gonna do this one, Dan.


Daniel Brough
Oh, OK. So yeah, the context for this one is that we are seeking creative solutions to the problem. As Dan said, there are bits and pieces out there to solve components of this for the more iconic species, but we're looking for a creative solution to help with those rare, threatened and cryptic species.
We're really looking for a capability that is enduring but also easily expanded: starting off with the work that Dan and his team do at the Queensland Herbarium, but then looking at how that can be broadened out across the department and to additional species.
We're looking for something that's easy and intuitive to use for a range of stakeholders. That's pretty important because we've got a whole range of skills that we deal with, even within our team and within Dan's teams, before we even talk about broadening it out.
And again, as Dan mentioned before, one of the key things for us around the context for this is the process for easily refining models, adding new species and expanding it out. So that's some of the key context for why we're doing it. It's good to be able to identify a koala in a photo, but there are lots of models that can do that; we're looking for, you know, the rare and threatened and the cryptic species, which are much less common.
So I think that pretty well covers the context there, Brad. I'll hand back to you.


Brad Cuthbert

That's wonderful. Thanks for that, guys. So we have time now to open up for some quick questions to the team. I'll switch over here and see if I can actually see the Q&A; people can use the Q&A or the chat as well. The Q&A is loading, so going through these.
Not sure if you Dans can see it, but one from Sebastian: does the data need to be saved in Australia, or can it be stored securely in storage overseas?


Daniel Brough

Yep, obviously our preference is Australia. If it is going to be overseas, there are some additional security assessments that need to be done, but we could discuss the specifics of that around the actual, you know, security and everything else there. We're not prevented from going overseas, but there are obviously increased concerns when data does go overseas.


Brad Cuthbert

Looks like three questions from Sebastian: what's the name of the existing data platform? Is it private? And what information is being saved in the data platform, detections from the AI or species occurrences?


Daniel Brough

So our data platform is the Queensland Environmental Science Data Platform. It's hosted in public cloud, but it is a private platform; it's something that we've built, and it's secure to our agency.
And yeah, the data is being saved into that platform, and that platform handles information up to the sensitive security level.
As for predictions from the AI, I think what we'd be looking for in this one is for detections from the AI, the species occurrences, to be able to be sent back to that platform. That's what we'd be looking for in this solution, so an integration; the solution doesn't actually need to be within that platform to be suitable for what we're looking for.
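[Purely to illustrate what such an integration could look like, rather than living inside the platform, here is a minimal sketch that posts verified occurrence records to a platform API. The endpoint URL, authentication scheme and payload fields are entirely hypothetical; the real interface to the Queensland Environmental Science Data Platform would be agreed with the department.]

```python
# Illustrative sketch only: push expert-verified detections back to the
# department's data platform as species-occurrence records. The endpoint,
# bearer-token auth and record schema are hypothetical placeholders.
import requests

PLATFORM_URL = "https://example-platform.qld.gov.au/api/occurrences"  # hypothetical


def push_occurrences(records, api_token):
    response = requests.post(
        PLATFORM_URL,
        json={"records": records},
        headers={"Authorization": f"Bearer {api_token}"},
        timeout=30,
    )
    response.raise_for_status()


push_occurrences(
    [
        {
            "taxon": "Peramelidae",  # lowest confident taxonomic level
            "site_id": "CT-042",  # camera deployment identifier (hypothetical)
            "captured_at": "2024-08-12T03:41:00+10:00",
            "confidence": 0.9,
            "sensitivity": "sensitive",  # drives downstream access controls
        }
    ],
    api_token="***",
)
```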


Brad Cuthbert

And the last one: are DESI and QHBS open to paying a user-based slash yearly subscription to access the software services you're looking for?


Daniel Brough

Yep.
Dan, do you want to take the first crack at that one?


Daniel Ferguson

Sure. I think the answer to that is definitely yes, something that we can consider depending on costs and what our budget looks like at the moment. So yeah, there's definitely capacity to pay for software services and/or subscriptions, but for the right solution and for the right costing.


Brad Cuthbert

So we have another one from Cassie: how many types of species or classes are you intending to build, and is there a sufficient amount of labelled training data for all species categories?


Daniel Ferguson

Oh, like every ecologist out there, I think we'd like to be able to build identifiers for every species that occurs in Queensland, but we know that's a very nebulous goal that is probably not attainable anytime soon.
For the cryptic stuff that we're interested in, we have variable amounts of training data. Some of the species will have quite a lot of training data; for others we will have very limited training data, and we'd need to iteratively build these models as we go. So as we detect more of these individuals, we would go back and retrain models and improve performance, and that's some of the reasoning behind looking for a model-building process, if you like, where we have a little bit more control over building that model and accepting that, you know, it may not perform particularly well on the ones in the background of an image or something. But as long as we keep getting detections, we can keep building upon that model and using our knowledge of that species to keep progressing those models a little bit further along.


Brad Cuthbert

I think that's all we have for online questions for the moment.
So obviously the requirements documents and challenge statement have been set out, and thanks to both Dans for your efforts in building those up for us.
We do have a slide here with key links, so again there's the application page. And again it's the same page where you would have gone to register for this, but for anybody watching subsequent to the webinar, that's the location for the application and the recorded video for today, and an email address if there are any subsequent questions that we can hopefully answer.
So just looking through, I don't see any more questions at all. Did you guys have anything else you wanted to add, or any other information to provide?


Daniel Brough

Sorry, I'm just having a quick look at some of the questions we got beforehand. I think we'd answered most of them. There was one question around the level of technical expertise. So we've got people that are comfortable sort of running notebooks and, you know, writing bits of code; we've also got people that are very much just on the point-and-click interface. So we've got that range of expertise. But if the solution was intuitive enough, then people having to do a little bit of notebook work, you know, cloud-based notebooks or something like that to help with training models, I think that's fine.


Daniel Ferguson

Yeah, I agree. Yeah.


Daniel Brough

Yep.


Brad Cuthbert

Looks like we've had one more come in from Sebastian: how often would the camera traps be processed?


Daniel Ferguson

Uh, probably looking at something like once a month, something like, you know, 250 to 300,000-odd images, so, you know, maybe half a terabyte, but it's really project dependent. That's probably the baseline, and then if we have a big project come on board we might end up kicking that up to 1.5 terabytes or something as well. So yeah, it probably evens out to something around a couple of hundred thousand images a month, and just batch processing those through is the ideal process.
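[A back-of-envelope check of those figures, with the processing rate a purely assumed number for illustration.]

```python
# Illustrative arithmetic only: the monthly volumes quoted above.
images_per_month = 300_000
batch_size_tb = 0.5

avg_image_mb = batch_size_tb * 1_000_000 / images_per_month  # ~1.7 MB per image

# If an overnight batch run sustained, say, 10 images per second (an assumed
# rate, not a benchmark), a month's deployment clears in well under a day:
assumed_rate_per_sec = 10
hours = images_per_month / assumed_rate_per_sec / 3600

print(f"~{avg_image_mb:.1f} MB/image, ~{hours:.1f} h at {assumed_rate_per_sec} images/s")
```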


Daniel Brough

I suppose with that batch processing we don't necessarily need the results immediately; it can take a bit of time, but it'd still be quicker than people manually looking through images.


Daniel Ferguson

No, that's right. So at the moment it can take months before we actually get around to processing a lot of these things and having the staff resources to process them. And there are, you know, quite large data sets that probably don't get completely processed either, and those contain quite a lot of useful information. If we can answer our research question by only processing a sub-sample, then that's what we do. Whereas if we can get a nice workflow in place that processes all of our images much more automatically, then we get a much, much bigger data set to work with and it makes our science much more rigorous.


Brad Cuthbert

Another one's come in: is the current AI data and species occurrence being reported to the Atlas of Living Australia?


Daniel Ferguson

So our data flows on, depending on security levels, via the DESI departmental database called WildNet, and then from WildNet, again depending on the sensitivity levels of the data, it will flow on to the ALA.
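[A minimal sketch of that sensitivity-gated flow: everything is captured in the departmental WildNet database, and only records below a sensitivity cut-off flow on to the Atlas of Living Australia (ALA). The category names and rule are illustrative assumptions, not the actual WildNet business rules.]

```python
# Illustrative sketch: route occurrence records based on sensitivity level.
# Everything lands in WildNet; only non-sensitive records flow on to the ALA.
SENSITIVITY_ORDER = ["public", "restricted", "sensitive"]  # hypothetical categories


def destinations(record):
    targets = ["WildNet"]  # all records are captured internally first
    if SENSITIVITY_ORDER.index(record["sensitivity"]) == 0:
        targets.append("ALA")  # only openly shareable occurrences are published
    return targets


print(destinations({"taxon": "Isoodon macrourus", "sensitivity": "public"}))         # ['WildNet', 'ALA']
print(destinations({"taxon": "Pezoporus occidentalis", "sensitivity": "sensitive"}))  # ['WildNet']
```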


Brad Cuthbert

Right, well, I think that's it for questions, and thanks guys for addressing the ones that had come in offline as well.
So yeah, again, if anybody has any additional questions, feel free to send them through to the inbox and we'll endeavour to answer those and share the information as relevant.
So, unless there's anything else, just thanks to the team. We're very excited about the project, and all the best to the businesses out there putting in your solutions, whether you have one already or are looking to set up some sort of consortium to solve the challenge as a collective.
Thanks to everyone, and again, any more questions, feel free to send them through to us and we'll address those for the team.
Thanks very much.


Daniel Brough
Thanks all.


Daniel Ferguson
Thank you everyone. Bye.


Melinda Shinkel
stopped transcription

Last updated: 13 Sep 2024