
*Guest: Kerrie Mengersen*

Kerrie Mengersen (@KerrieMengersen) is Distinguished Professor at the Queensland University of Technology in Brisbane, Queensland, Australia, and past-President of the International Society for Bayesian Analysis (ISBA). Her research spans Bayesian statistics, computational statistics, and environmental, genetic and health statistics.

**Rosemary Pennington**
: When it comes to addressing environmental or public health related
challenges, the conversation often centers on what we need to do to fix the
problem, whether that be a warming planet or dying bees. Less focus is
spent on understanding how we know what we know. How do we know the planet
is warming? How do we know that certain insecticides are killing bees? The
statistics that help us better understand such challenges are the focus of
this episode of Stats+Stories. Stats+Stories is a partnership between Miami
University's Departments of Statistics and Media, Journalism and Film, and
The American Statistical Association. I'm Rosemary Pennington. Our regular
panelists are Department of Statistics Chair John Bailer and Department of
Media, Journalism and Film Chair, Richard Campbell. Today's guest is Kerrie
Mengersen. Mengersen is a Distinguished Professor in Statistics at
Queensland University of Technology, Brisbane, Australia. She works on
statistical methods and computational tools, seeking to apply them to
real-world problems in health, environment and industry. Mengersen is also the
President of the International Society for Bayesian Analysis. Thank you so
much for being here today, Kerrie.

**Kerrie Mengersen**
: You're welcome. It's exciting to be on.

**Pennington**
: Just to get started, since we have a lot of different listeners, if you could
just take a moment to explain what Bayesian Analysis is, that would be
great.

**Mengersen**
: Bayesian Analysis is a type of statistics. It's both a way of thinking
about statistics and a way of applying it or analyzing data. There are a
couple of things that happen in Bayesian Statistics. First, we try to
understand what the underlying parameters or drivers of a model might be.
So, if we're trying to describe the system then we're after good estimates
and good understanding of the underlying dynamics or features or the
factors that drive that system. And the way that we describe those factors
and dynamics of the system is through a statistical model, and we're
interested in being able to estimate those parameters that underlie that
model. So, what we want to do is not just get point estimates for those
parameters but we want to understand the whole distribution of those
parameters. And the way that we think about that distribution is that it
characterizes the uncertainty or our state of knowledge of that parameter
that we're interested in. The other thing that we think about is that in
doing this we can actually add different types of information to our
models. And we do this through prior distributions. So those prior
distributions might be uninformative, which means that the data tell the
whole story, but they might also be informative because what we find is
that very often when we're looking at a problem we have the data but we
also have a lot of other information to make our estimates and our
understandings much more rich.

**John Bailer**
: So, I'll ask Richard a question. So, Richard what was the most confusing
part in thinking about this for you?

**Richard Campbell**
: I think I would like an example. If she could illustrate this with an
example.

**Bailer**
: Okay, so pulling this into the realm of environmental applications, could
you flesh this out with a simple example? Maybe a simple example with maybe
one parameter that's driving the bus here.

**Mengersen**
: So, suppose that we have a diagnostic test, and if you have cancer then
the test can be very good at diagnosing that you have cancer. So that's the
probability that the test is positive, given that you have cancer. But what
happens if you go in to the doctor and they say well the test has come back
and the test is positive? So now what you really want to know is what is
the probability that you have cancer given that the test is positive? Now
that's a very different question and so the first one was the diagnostic
capacity of the test. Which is - what's the probability that the test is
positive given that you have cancer and the other one is - what's the
probability of cancer given that the test is positive? So we're turning the
question around now and we really want to know about that probability of
cancer. And so, in order to get to that we need to use Bayes rule to turn
that probability statement around. And we need prior information about how
rare that cancer is. And that's what we do in Bayesian Statistics. So, if
we take an environmental problem then we could think about what's the
probability of a species being present in an area, given the data that I've
got? So, another way of thinking about that is in a classical or
frequentist statistical problem what we would say is, what's the
probability of our observations or our data set given that the species is
present? But what we really want to know is what's the probability of a
species being in this area or being present given the data that we've got.
So, we're turning the problem around, and that probability of presence
we can then model through Bayesian Statistics.
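
The "turning the question around" described above is Bayes' rule, and a small calculation may make it concrete. The sketch below is illustrative only: the prevalence, sensitivity and false-positive rate are assumed numbers, not figures from the interview, and `posterior_probability` is a hypothetical helper name.

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' rule.

    prior               -- P(condition), e.g. how rare the cancer is
    sensitivity         -- P(positive test | condition)
    false_positive_rate -- P(positive test | no condition)
    """
    true_positives = sensitivity * prior
    false_positives = false_positive_rate * (1.0 - prior)
    return true_positives / (true_positives + false_positives)


# Assumed, purely illustrative numbers: a rare cancer (1% prevalence),
# a test that detects 90% of true cases, and a 5% false-positive rate.
p = posterior_probability(prior=0.01, sensitivity=0.90, false_positive_rate=0.05)
print(f"P(cancer | positive test) = {p:.3f}")
```

With these assumed inputs, a test that is right 90% of the time for people who have the cancer still leaves the probability of cancer given a positive result at roughly 15%, because the prior information says the cancer is rare. The same function applies to the species example if "condition" is read as "species present" and "positive test" as "the data we observed".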

**Campbell**
: I understood that! That's good. Well done.

**Bailer**
: Yeah, I knew we needed to push the pause button there, because I saw
Richard going a little green earlier.

**Mengersen**
: How much statistics can we talk about? Like can we talk about theta given
x and so on, presuming [inaudible]

**Bailer**
: Probably not. I mean you and I can but for this group what we're trying
to do is, we're trying to pitch this in a way, if you were going to
describe this to the general public. We want to tell the statistics behind
the stories and the stories behind the statistics. That's our catch phrase
in describing the purpose of the program. So, in some sense the question
is, if you've done this complicated model that involves prior
specification, where you're specifying the uncertainty associated with
parameters that are important for making the model, for making decisions,
how ultimately do you take something that's very complex like that and then
make it accessible to a larger audience? So, one thing that might be a fun
way to begin this is, what's one of the most interesting environmental
applications that you've worked on?

**Mengersen**
: Sure. So, well, there's been a number of them and I'm proud of many of
them. If we take, for example, a recent one we've been working on, which is
hunting jaguars in the Amazon. What we wanted to do there was create a
jaguar corridor across the Amazon. Jaguars are a threatened species and
we're interested in being able to create this corridor where they'll have
safe movement across the Amazon. The problem is that we don't have a lot of
data to tell us about where these jaguars are or how many there are. So,
what we're interested in the model then is what is the probability of a
jaguar living or hunting or moving through a particular area given the
data? But the problem is we don't have much data. So, then we have to say,
"well, what other information could we incorporate into this model to help
us?" So, the Bayesian framework allows us to incorporate that information through
prior distributions and through other means of combining data or
information in a very principled manner. And so, the kind of information we
can use is expert information from local people and also information from
experts around the world.

**Bailer**
: That sounds really cool. So, did you define that corridor?

**Mengersen**
: So, what we did was we had to work out how to get expert information from
around the world and it's very difficult to take the experts to the Amazon,
so we were thinking about ways and we've been working on this for quite a
while how we actually get good information from experts, and this expert
elicitation from people is a statistical problem all of its own, so how do
we ask questions in a way that people can answer them and then we can add
them to our models? So if we think about a little thought experiment, if I
had a map in front of me and I asked you at different places how likely is
a jaguar to be here, then you might look at that area, and given that you
are an expert on jaguars, you might say, "well, there's a 70% here, +/-",
because you're not really sure about that number. And if you said that in a
number of different locations and I know the characteristics of those
locations then I can build a statistical model that will enable me to
represent your understanding of where jaguars live based on the features of
that landscape. Now that gives me a statistical model of your expert
information, and I can add that to my statistical model based on the very
little data that I've got and that makes a very rich model. So, we then had
to work out, well, we could use a map and that would be fine but what if we
could actually put the experts into the jungle then that would be a lot
richer. And probably give us more information. But we cannot take all the
experts to the jungle, not all of them, so we took the jungle to the
experts by creating some virtual reality. So, we went into the jungle, we
took 360-degree cameras and worked with the local people. They loved the cameras,
they took the cameras to places we couldn't get to deep in the jungle, and
then we created virtual reality scenes from those 360-degree photos and films and
then we were able to present those to experts. And from that we could
create better models. We've been able to use that then to identify areas
that are more likely for jaguars to live and then work with the governments
in Peru and this is work that is still going on to connect those areas. And
that creates the corridor. Part of that work is still in train for the
research that we're doing, but it's been completed there and it's been
handed over to local conservation organizations to continue the discussions
with the governments. But they're very excited about it. They love the
virtual reality and they're all on board in the project. So that's very
exciting for us.
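
The thought experiment above, in which an expert says "70%, plus or minus" and that opinion is then combined with a small amount of field data, can be sketched with a Beta prior and a binomial update. This is a deliberately simplified stand-in for the landscape-feature model described in the interview: the standard deviation of 0.10, the survey counts, and the helper names `beta_from_mean_sd` and `update` are all assumptions made for illustration.

```python
def beta_from_mean_sd(mean, sd):
    """Moment-match a Beta(a, b) distribution to an elicited mean and sd."""
    nu = mean * (1.0 - mean) / sd**2 - 1.0  # implied prior "sample size"
    if nu <= 0:
        raise ValueError("sd is too large for a Beta distribution with this mean")
    return mean * nu, (1.0 - mean) * nu


def update(a, b, detections, surveys):
    """Conjugate Beta-Binomial update with new presence/absence data."""
    return a + detections, b + (surveys - detections)


# Expert opinion: "about 70%, give or take" (sd of 0.10 is an assumption).
a0, b0 = beta_from_mean_sd(mean=0.70, sd=0.10)

# Hypothetical sparse field data: jaguar sign found in 1 of 5 surveys.
a1, b1 = update(a0, b0, detections=1, surveys=5)

prior_mean = a0 / (a0 + b0)
post_mean = a1 / (a1 + b1)
print(f"prior mean = {prior_mean:.2f}, posterior mean = {post_mean:.2f}")
```

The posterior mean sits between the expert's 70% and the raw survey rate of 20%, with each source weighted by how much information it carries. In a fuller analysis the expert's answers at many locations would be regressed on landscape features, as the interview describes, rather than summarized by a single Beta distribution.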

**Pennington**
: You're listening to Stats+Stories where we discuss the statistics behind
the stories and the stories behind the statistics. The topic today is
understanding the environment through stats. Our special guest is
Queensland University of Technology Distinguished Professor of Statistics,
Kerrie Mengersen. You were just talking about some very technologically
rich and savvy ways of creating this statistical model. How often are you,
and I know you have a research group, using these techniques to create
statistical models?

**Mengersen**
: Well, we've been using these different kinds of technologies to improve
our statistical models in a number of applications so as well as creating
the jaguar corridor we've been using these techniques to better monitor the
Great Barrier Reef here in Australia. So, the Great Barrier Reef is one of
the world's natural treasures. It's 2100 kilometers long, so you would cover
a lot of Europe going from one end of the Reef to the other, north
to south. We've done a lot of monitoring of the Reef over the last 20 years, but
because it's so big that monitoring has only happened in a small number of
areas and there's a lot of Reef that we just don't have monitoring on. So
how can we get information from those areas to help improve our models of
the Reef and Reef health, coral cover, fish bio-diversity, the impact of
cyclones and crown of thorns and so on? Well, there's any number of divers
out there diving on the Reef. And, if we could use their photos that
they're taking perhaps we could get experts to go into those photos and
then tell us about the state of the Reef. And we could use that information
to improve our models. So, we've been doing that. We've created a virtual
Reef, which allows people then to go onto that site and geotag their photos
to different areas they've been diving and then we can get experts and
local people to access those photos in 2-D and 3-D and virtual reality
photos. We can go into those and extract information about what they see
about coral cover and fish bio-diversity and then improve our statistical
models. So that's really exciting. We're working with different groups that
have underwater vehicles that take photos and film underwater, we can use
that information as well.

**Bailer**
: So, it sounds like that would be establishing a baseline for current Reef
health, is that right? Is that what it's doing?

**Mengersen**
: That's right and also it's a really dynamic way of modeling because
there's people that will be uploading photos all the time and then that
model keeps changing. So, if you imagine that you have a map of the Reef
and the Reef health and then as people add information or add photos to the
virtual Reef then that map changes so you're getting this dynamic updating
of the health of the Reef, or of the underlying statistical model as people
add that information.
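
The dynamic updating described above, where the map changes as each new set of photos arrives, follows from a basic property of Bayesian models: yesterday's posterior becomes today's prior. The sketch below shows this for a single reef site using a Beta-Binomial model. The photo counts, the flat Beta(1, 1) starting prior, and the "healthy coral" classification are all hypothetical.

```python
def update(a, b, healthy, total):
    """Conjugate update of a Beta(a, b) belief about the share of healthy coral."""
    return a + healthy, b + (total - healthy)


# Hypothetical batches of classified photos for one site: (healthy, total).
batches = [(3, 5), (7, 10), (1, 4)]

# Start from a flat Beta(1, 1) prior and update as each batch is uploaded.
a, b = 1.0, 1.0
history = []
for healthy, total in batches:
    a, b = update(a, b, healthy, total)
    history.append(a / (a + b))  # running posterior mean after each upload

# Updating once with all photos pooled gives the same posterior.
a_all, b_all = update(1.0, 1.0,
                      sum(h for h, _ in batches),
                      sum(t for _, t in batches))

print([round(m, 3) for m in history])
print((a, b) == (a_all, b_all))
```

Because the sequential and pooled updates agree, the model never needs to be refitted from scratch in this simple setting; each upload just nudges the current estimate.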

**Bailer**
: So how long has this model been running?

**Mengersen**
: So, we commenced the project in 2016. We developed the virtual Reef at
the end of 2016 and then last year we've been finishing the statistical
modeling that's underneath it.

**Campbell**
: Kerrie, one of the things we ask a lot of our guests about is the way that your
work, or scientists' work, or statisticians' work, gets translated to the
general public through journalism, through news reports. And you've had to
do some of this work, I suppose, where your work gets represented to the
general public. Can you talk a little bit about what a journalist might do
to improve how they report on the work of statisticians based on your own
experience?

**Mengersen**
: I think it comes both ways. The statistician learns through working with
journalists of the best ways to be able to tell their story. The journalist
then also gets to learn how best to guide the statistician in telling that
story. I think that there's a balance between being able to tell the story
and being able to still be true to the science or the statistics
underpinning that. It's a difficult thing that we have to learn to do.
Sometimes the story is about the story and it's conveying that statistics
can work in a range of areas and that can be exciting to bring
statisticians into that area. For example, we have a lot of young people
here just at the moment, we have our vacation research projects where young
statisticians in their first year come to work on a project a lot like our
jaguar project but on koalas. So, they're, we want to know how many koalas
are in the areas that they're looking for conservation or development, and
they can be tricky to see, so we're creating virtual environments where we
can put experts into those environments and then say how many koalas would
live in these areas and then we can build models that would be able to
predict the number of koalas in target areas for councils. So, students in
statistics can come in from first year to third year, they not only learn
about statistical modeling but then they go out and they do sampling in
the field, they take 360-degree photos, they build virtual reality and then they
get to interview people about what they see in those photos and then they
get to add that to their models. That's exciting for us to be able to do
that. When it comes to telling that story as a media story it could just be
about that we're coming up with better ways to be able to monitor koalas to
help councils, but it could also be about how young people can get involved
in problems that are really important in our world through statistics. Or
it could be about the statistical models themselves, depending on the
audience.

**Bailer**
: I'm sure that your jaguar and Reef analyses have gotten attention in the
mass media, what was it that was the focus of these reports? Did they dive
in at all to any of the modeling that you did?

**Mengersen**
: In those stories it was more about the stories. The Reef Project. Well,
both of them picked up on the different kinds of information we can use to
improve our models and our understanding of environmental systems. So, we
can use data but we can also use expert information and citizen science.
And so, there's a big interest in how we use Citizen Science and there's a
lot of problems with those kinds of data but there's also a lot of
potential. So, if we can learn as statisticians to better use citizen
science data, then we have a really rich resource with which to develop our
models and better understand our world. One of the projects that we have
that has attracted media attention and where the modeling has been
important is in the development of a national cancer atlas. So, in this
we've also been using Bayesian Statistical models, because we want to be
able to develop good probabilistic estimates of cancer incidence and
survival across Australia and we want to be able to do that at the small
area level. So, then we have to be careful that we preserve privacy and
have robust estimates. And that requires us to build careful statistical
models, and a Bayesian Framework is best for being able to borrow strength
from neighboring areas to improve estimates at a small area level because,
for each particular level we don't have a lot of data but we can borrow
information from neighboring areas to improve the estimates of each area,
and we can also then have estimates not only of incidence and survival but
also the uncertainty around those estimates and that's important then for
managers and also for ranking those areas. Understanding the differences
between rural and urban areas which in Australia is a big issue in disease
and medicine and in particular cancer outcomes.
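
"Borrowing strength from neighboring areas" can be illustrated with a simple Gamma-Poisson smoother, in which the prior for one small area is centred on the pooled rate of its neighbours. This is only a sketch of the idea and not the model used for the cancer atlas; disease-mapping work typically relies on richer spatial priors such as conditional autoregressive models. The counts, the `prior_weight` setting and the function name `smoothed_rate` are assumptions.

```python
def smoothed_rate(observed, expected, neighbour_observed, neighbour_expected,
                  prior_weight=10.0):
    """Posterior mean relative rate for one small area.

    The area's count is modelled as Poisson(rate * expected), with a
    Gamma(shape, rate) prior whose mean is the neighbours' pooled rate.
    prior_weight (in units of expected cases) controls how strongly the
    estimate is pulled toward the neighbours.
    """
    neighbour_rate = sum(neighbour_observed) / sum(neighbour_expected)
    shape = neighbour_rate * prior_weight
    rate = prior_weight
    return (shape + observed) / (rate + expected)


# Hypothetical small area: 3 cases observed where 1.5 were expected.
raw = 3 / 1.5

# Hypothetical neighbouring areas with much more data.
smoothed = smoothed_rate(observed=3, expected=1.5,
                         neighbour_observed=[40, 55, 35],
                         neighbour_expected=[38.0, 50.0, 42.0])

print(f"raw relative rate = {raw:.2f}, smoothed = {smoothed:.2f}")
```

The raw estimate for the small area is twice the expected rate, but it rests on only three cases. The smoothed estimate is pulled most of the way back toward the neighbours' rate, and it would move closer to the raw value as the area's own expected count grows.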

**Pennington**
: You're listening to Stats+Stories and our discussion today focuses on
some real-world application of Bayesian analysis. Kerrie, what advice would
you give to a young person who is at university and is interested in doing
some of the work that you've been talking about. Whether it's the cancer
atlas, the koala work, the jaguar work, someone who is interested in
Bayesian Analysis, what should they be thinking about as they move through
university?

**Mengersen**
: I find that the people who have a quantitative background particularly a
stats background have a real advantage in whatever area they want to work
in. There's such a demand for people with good quantitative skills and that
creates the foundation for going into different areas. So, if they want to
work in applied areas, then they have a strong statistical background. We
have students coming in from first year to third year, they're amazing.
They're picking up new statistical methods that are required for problems
that they haven't seen before and they access information from the web to
learn about those methods and then they work out the coding and they work
out how to apply them but it's because they have this underlying
statistical foundation in their training. And then they have the
adventurous spirit in the statistical sense in that they're willing to push
the boundaries of what they know. So, having that openness in learning new
methods and then just going and finding teams. None of this happens in
isolation. The projects that I'm talking about require people in computer
science, and conservation and public health and visualization and it's very
exciting to work in these teams and with the industry people as well. So,
people from the Great Barrier Reef Foundation, people from the Australian
Institute for Marine Sciences… one of the ways that I got to work on
interesting projects in the Antarctic was going along to a meeting where
there was some research being presented on the Antarctic and saying "I do
stats, is that of interest?", and thinking nobody would pick that up, but
next thing I know I'm doing some really cool work in the Antarctic. The
way that fuel is delivered in the Antarctic is by helicopter, and drums of
fuel are being dropped at different sites, and, "what if a drum explodes?"
and then you have some area that's being very toxic in the soil, the soil's
very thin, how does that affect soil bio-diversity? I thought soil, I
thought how can soil be interesting? But it turns out it's hugely
interesting and there's a lot of dynamics that happen with all of the
little bugs and species in the soil and I never knew that until I went to
that meeting and put up my hand and was willing to work on the project. So,
I think having the underlying skill set but then finding an area of
interest and then going and putting up your hand and saying, "Hey, can I
work with this team".

**Bailer**
: It's neat to hear you say that. I think one of the joys about being in
statistics is being able to play in other people's space and to learn, to
continue to learn about the problems that they're working on.

**Mengersen**
: Certainly, but I also think it's important to respect that we're a
profession of our own and so we absolutely have the ability to develop, we
need to develop the methods that we're using so it goes hand in hand. By
working on an application, you see that the models and the tools that we
have are not sufficient for many real-world applications and we need then
to further develop the methods and the theory and the computational tools
that we have and when we develop them then we can answer more questions
which raises more questions which means more development of the theory and
the method. So, they go hand in hand, and there's a real pipeline behind
the real need for good theory and then the translation of that to methods
and computational tools and back again.

**Bailer**
: No disagreement here. I'm really intrigued about some of the stuff that
you mentioned earlier, the idea of "Citizen Science", and that to me is
really awesome to consider how that might play out in an analysis. I guess
one example was your Barrier Reef example where you were having divers take
photographs at certain locations and then geocoded in terms of where they
were taken and then having a deeper exploration of what was going on at
those different sites. Can you give me a couple other examples of citizen
science that you've been using in analyses?

**Mengersen**
: Well, one example is if you think about birds and mapping of bird
species, or even understanding the dynamics of birds' movements for
example. So, there are a lot of people who are interested in bird watching
and there's a lot of data that's been collected in terms of records of
where people have seen birds. And we know that there's problems with that
so statisticians say "you could never believe those records. People only
record birds where they are, so we're only ever going to see birds in
areas, if we use those records the birds are only ever going to be in the
areas where the people are", and also they might misreport or want to
embellish what they've seen, there's a lot of things that could go wrong
with those data, and so we could throw them out but then you think, "well,
there's a lot of data there so maybe there is a real signal in all of that
noise". And as statisticians that's our job, to understand the signal in
all the noise and also to be able to pull out the stories that the data is
telling us. And so, if we can do that with citizen science and come up with
ways to address the problems that are in those data then we've got a really
rich source of information that we can use.

**Pennington**
: Well, Kerrie Mengersen, Distinguished Professor of Statistics at
Queensland University of Technology, thank you so much for being here
today.

**Mengersen**
: You're very welcome.

**Pennington**
: That's all the time we have for this episode of Stats+Stories.
Stats+Stories is a partnership between Miami University's Departments of
Statistics and Media, Journalism and Film, and The American Statistical
Association. You can follow us on Twitter or iTunes. If you'd like to share
your thoughts on our program send your emails to
Statsandstories@miamioh.edu, and be sure to listen for future editions of
Stats+Stories where we
discuss the statistics behind the stories and the stories behind the
statistics.
