Episode 33: Sifting through noisy data to find stories
Release Date: 05/19/2017
Mona Chalabi (@MonaChalabi) is the Data Editor of The Guardian US and a columnist at New York Magazine. As well as co-producing a four-part documentary series about vaginas, Mona has written for TV shows on National Geographic, the BBC and VICE. Mona draws. Her illustrations, which are designed to make numbers more relatable, can be viewed on her Instagram account and were recently commended by the Royal Statistical Society. Before getting into journalism, Mona worked in the nonprofit sector, first at the Bank of England, then Transparency International and the International Organization for Migration.
Program note: During this episode, Mona described a recent story she had written about the departure of John Thompson, Director of the Census Bureau and recent guest on S+S (Episode #32). Given the connection between the episodes, we opted to expedite the release of this latest episode. We hope that you will enjoy these conversations.
Rosemary Pennington: It seems that now, more than ever, we're awash in data. News stories abound as reporters try to make sense of polling data or government statistics or the findings of a research article...all this while the very nature of facts is up for debate. The ins and outs of data journalism are the focus of this episode of Stats and Stories, where we explore the statistics behind the stories and the stories behind the statistics. I'm Rosemary Pennington. Stats and Stories is a production of Miami University's Departments of Statistics and Media, Journalism and Film and the American Statistical Association. Joining me in the studio are regular panelists John Bailer, chair of Miami's Statistics Department, and Richard Campbell, chair of Media, Journalism and Film. Today's guest is Mona Chalabi, the Data Editor of Guardian U.S. Her job is to sift through the noise to find the stories in all the data that's out there. She's also a columnist for New York Magazine. Now, Mona, you've worked for a number of non-journalistic outlets, including the Bank of England and the International Organization for Migration. How do you find yourself at the Guardian now?
Mona Chalabi: Well, I guess it was an interesting career move. I really wanted to get into journalism because I felt quite frustrated in my previous roles. A lot of the statistics I was taking a long time gathering and collecting weren't being shared with a wide audience, and I see that wide audience as very important, not just for my ego but also for verifying that information. So, for example, when I was working at the International Organization for Migration, we were collecting statistics on refugees and internally displaced people, and kind of producing reports about what these individuals needed. But those individuals rarely had the chance to actually see those reports or see those statistics to tell us what we were and weren't getting correct.
Chalabi: So I think that having a wide audience is very important for statistical accuracy.
John Bailer: Very good! So what are you working on now?
Chalabi: Well, literally two minutes before I jumped on the phone with you guys, I filed a story about the departure of John H. Thompson as Director of the Census Bureau, which I'm sure all of you guys are well aware of as well.
Bailer: He was one of our recent guests.
Chalabi: Oh wow! No way...
Richard Campbell: Just like two weeks ago.
Bailer: We just did the recordings; we haven't released the second episode.
Chalabi: Oh my gosh! And how did he sound when he was speaking? I read…as I was researching the piece, that apparently, up until quite recently, he sounded kind of optimistic about finding a way to make the Census Bureau's budget work.
Campbell: Yes with us too… although he couldn't talk about specific numbers.
Chalabi: Yes, well, my understanding is it was only last week that the budget bill was passed, which means that the Bureau effectively definitely doesn't have the money that it needs to be able to do its work. I assume that's a big reason for his resignation, but you know, obviously that's just conjecture.
Bailer: Yeah, that sounds like a pretty big story.
Chalabi: I mean, it's a really important one to talk about, I guess, because it's a really big part of my role. It's kind of relatively easy to write a fun piece on sex or dating or how food habits in America are changing, but trying to get the public to really, really care about something like this is kind of a different set of challenges, and I think this is a really big story. I think it's a really big deal. I spoke to a lot of government statisticians in early January and I was asking them what they thought the impact of the Trump administration might be on statistics in this country, and I was kind of reassured and concerned by their responses. It was kind of across the board that people were saying, we don't think that the administration is going to actually manipulate the numbers. What we're concerned about is them aggressively defunding statistical agencies. That's the point where they actually can't do their jobs, and you're starting to see this happening. I think it's worrying, it is really worrying.
Campbell: So you are talking about this bigger audience and then what are the challenges for you because you have to talk to that audience in a way that's much different than how you would talk to other statisticians. So what are some of your techniques? How do you tell those stories?
Chalabi: So I mentioned sex and food. I think that choosing topics that readers are already interested in and engage with, and that they feel are relevant to their personal lives, is a really important way to make sure that they see the value and importance of data. And so I'll give an example. I recently wrote a piece on, again, another Trump appointment, and I believe it was the deputy assistant secretary in the Department of Family Planning. And I actually forget the woman's name, but she has publicly said on the record before that contraception doesn't work. So I wrote a piece. Sorry, go ahead, you were going to say something.
Campbell: No, we were just sort of aghast.
Chalabi: OK. And so the piece I wrote was about the efficacy of contraception and yes, again, I think that many people, like journalists, don't concede that the numbers…there is a degree of imprecision about the numbers, and part of our job is to acknowledge that imprecision and communicate it to people in a way that they can understand. So I wrote a piece that was explaining the differences between typical use of contraception and perfect use, and that...and that basically is a story about probabilities. So again, to come back to your initial question about ways to make these subjects engaging, it is probably to take subject matter which many people, including myself, care about, which is the odds of basically falling pregnant. And making that subject, A) kind of communicating, and B) using, literally, language, so the sentence structure I'm using to write this stuff, but also a big part of what I do is using visual tools. So in this particular scenario what I did was, I used the analogy of throwing two dice, and again this was in the visual which I made to accompany the story and also in the story itself. So I could say to readers, imagine throwing two dice. The probability that you get two ones is about the probability that, after three years of typical use, you will, if you're a woman…see, that changes our thoughts…as a woman, you will become pregnant after three years of typical use of the hormonal IUD. And now imagine that you're using the withdrawal method or the pull-out method. After three years of typical use, the chances that you will get pregnant are more likely than the odds that you will roll the two dice, I'm sorry, I realize that I am doing a terrible job of explaining this now…the odds that you will roll two dice that will add up to a two, a three, a four, a five or even a six.
Again it's all mathematical concepts but they're very much related to people's everyday lives and hopefully in the process of explaining that, I can help people understand probabilities a little bit better as well.
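The dice comparison in this answer can be checked with a few lines of arithmetic. The following is a minimal sketch, not anything taken from the published piece: the function names are hypothetical, the annual rate is a made-up illustrative value rather than a figure quoted in the episode, and the multi-year calculation assumes each year is independent with the same rate.

```python
from fractions import Fraction
from itertools import product


def two_dice_probability(event):
    """Exact probability that a roll of two fair six-sided dice satisfies `event`."""
    outcomes = list(product(range(1, 7), repeat=2))  # all 36 equally likely rolls
    favourable = [roll for roll in outcomes if event(roll)]
    return Fraction(len(favourable), len(outcomes))


def cumulative_risk(annual_rate, years):
    """Chance of at least one occurrence over `years`.

    Assumes independent years with the same rate (a simplification).
    """
    return 1 - (1 - annual_rate) ** years


# Rolling two ones.
p_two_ones = two_dice_probability(lambda roll: roll == (1, 1))

# Rolling a total of two, three, four, five or six.
p_sum_at_most_six = two_dice_probability(lambda roll: sum(roll) <= 6)

print("two ones:", p_two_ones, float(p_two_ones))
print("total of six or less:", p_sum_at_most_six, float(p_sum_at_most_six))

# Hypothetical annual rate, for illustration only (not a figure from the episode).
example_annual_rate = 0.10
print("three-year risk:", cumulative_risk(example_annual_rate, 3))
```

Swapping in published typical-use rates for `example_annual_rate` would let a reader set the three-year figure for any method beside the two dice probabilities.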
Bailer: So how did you do this with the story about Thompson resigning?
Chalabi: No peaceful visuals of probability, unfortunately, in that one. But I did make a choice, again, about language. So rather than writing this up as a news story, which I know a lot of other organizations have chosen to do, I decided to write this as a comment piece because I felt like that would kind of liberate me to use the language that I thought was appropriate for the story, which is, you know, concerned language, possibly indignant language. And...so I can quote you a little bit of it. This is the second paragraph of the piece: "If you're not particularly riled up by the words 'Census Bureau resignation,' that's understandable. Normally, the federal government quietly putts along measuring things like poverty, racial inequality, oh, and determining congressional representation. That last one is a biggie." So again, it's about contextualizing it for the readers, like, this is why you need to care about these things, you know.
Pennington: You're listening to Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics. The topic today...data journalism. I'm Rosemary Pennington. Joining me are our panelists, Miami University Statistics Department chair John Bailer and Media, Journalism and Film Department chair Richard Campbell. Our special guest is Mona Chalabi, the Data Editor of Guardian U.S. Now, Mona, you have transitioned over the course of your career from being someone who, it sounds like, was collecting data and analyzing data to someone who writes about data, so I was wondering if you might talk a little bit about what that transition was like for you, because you're talking a lot about the language, and the way you think about language when you're writing about stats. Was it difficult for you to transition from being more of a researcher to more of a sort of translator?
Chalabi: No, I don't think it was super difficult. I'm not saying there was a whole bunch of hurdles. I think very often when you are so focused on methodology, it can be quite inhibitive in terms of creativity sometimes. Because I like to really focus on ways to visualize a story and communicate information, I think that part of it was relatively straightforward. I do sometimes worry, though, that I haven't kept enough of a foot in that field of collecting data, and I think that's really, really important to be able to do data journalism, to explain to readers not only the results but how you got there.
Bailer: So, you know, one thing with this transition, and the focus on methodology and the focus on visualization: do you find certain media formats easier to do this with? I mean, I know that you sometimes will use Twitter feeds to send out some of the graphics that you construct. Can you easily build these out, and which are some of the ones that you like to use most in your Guardian pieces?
Chalabi: Yes, so for me, using pen and paper was a really, really important decision. You know, there have been huge advances in the way that we kind of create data visualizations, particularly interactive data visualizations. But I wanted to take a step back from that for a couple of reasons. One is just time. Often when you're in the newsroom, you know, you might have an hour or two to quickly throw some things together, so obviously using paper and pen is very efficient. And I have also thought that it has been a great way to bring in different audiences who don't necessarily see themselves as nerdy or geeky or any of those words that people actually quite proudly use sometimes, and that can be quite alienating to other people. So I wanted to use hand-drawn illustrations. The other massive advantage, I think, is that they very clearly communicate imprecision to readers. And I think sometimes when you're looking at computer-generated graphics, people see them as some kind of objective truth or higher reality, which is really fascinating, because sometimes when it's embedded in a graphic, it takes on this higher level of truth, whereas people might feel a quite healthy degree of cynicism when they are hearing something in a sentence. So yeah, that was really important and it's great. Anyone can do it...like, part of the purpose is that you look at these graphics and you think, hmmm…I could definitely go home and use my iPhone and piece of pencil, piece of paper, sorry, and a pencil to do it myself…
Bailer: Yeah. I've…I've liked the ones that I've seen…
Chalabi: Thank you.
Campbell: They are artistic. They really draw you in, and that's what I like about them. Something I've heard John say is that when you look at a graph or a chart, it should tell a story, and yours do, and I think the combination of the art skills you bring to your work is really interesting.
Chalabi: Oh, thank you. I really do hope to incorporate some of the subject into the visualization itself, because I think it's quite problematic that, with so many charts that we look at today, if you were to sort of leave off the labels, you would have no idea what it is that we are presenting, and I think that's quite deliberate; it's part of the idea of objectivity, right?
And then again, if you want to bring people into caring about a subject, it's important that the tone of the subject is somehow communicated in the visualization itself.
Campbell: That's a really great point. Can you talk about some of the stories that you feel aren't being covered? We talk about big data all the time, particularly in the kind of mainstream newspapers, you know, not just the Guardian but the Times...what stories are out there that need to be covered that we're not paying enough attention to?
Chalabi: That's a great question. I think there are kind of two different types of stories that fall into my answer, I guess. One of them is stories about imprecision. I think, again, those stories seem very wonkish, they feel a bit…unimportant. But they're actually crucial…the public really is starting to change their relationship with statistics. Well, maybe the voices that are questioning them are just becoming louder. But some of that is absolutely true: people are wary of things like the unemployment rate because they think, how on earth is it possible that we know to a decimal place how many people in this country are unemployed, to give just one kind of example. And I think people like me need to get far better at understanding exactly how those numbers were constructed and exactly how to communicate that to the audience. You know what? You are right, it isn't this precise, but we can say that it's somewhere between this and this. And I think we need to get much better at kind of communicating the range of truth, if that makes sense to people. And I've completely forgotten what the second category of stories is going to be! How wonderful!
Campbell: Those were great examples.
Bailer: So I'd like to follow up on that idea of communicating imprecision. You know, you talked about the idea of using your dice example, that people might have a sense of that kind of gameplay, but I often wonder about trying to communicate imprecision at times when the audience, particular audiences, just want a single number. There's almost pushback when imprecision is going to be integrated into a story.
Chalabi: You know, I would agree with that, but I only think that's partly as a result of bad writing and bad storytelling, right? If it's going to be a three thousand word…Again, so take the example of the US election and the way all of the data journalism was done around that. Nate Silver published his methodology for his forecast and honestly not many people necessarily wanted to look into that because the language of it was kind of alienating. It kind of conveyed, I think, this idea that you are smart enough or you're not smart enough, and I think that part of the trick of doing this stuff really, really well is to attach human stories to imprecision, right? So let me perhaps give an example. I used to write a column called "Dear Mona" where people could write to me with kind of questions about their everyday lives. I would try to answer them using statistics. And someone writes to me and says, do you know anything about the faith of U.S. prisoners? So I found some statistics and I kind of presented the caveats to readers, saying, you know, we don't actually know for sure whether the U.S. government imprisoned these Muslims at a higher rate or whether this is down to conversion in prison. All I can tell you is that there's a higher ratio of Muslims in prison than outside of it, right? And so that's a story kind of about the blank spots, if you like, the things that we don't know. Now, former inmates wrote to me afterwards explaining that conversion is a big part of it, and that part of the reason for that kind of huge gap is because people convert partly because, you know, they find a new faith that they identify with and partly because you get better prison meals, if you're allowed kosher food, and you get more time out of your cell to pray.
Now if I had had the time to pursue the story of one person who had converted, that's a fascinating human story anyway and it nicely dovetails us into a story about how…like who was it that came into the prison and conducted the census of prisoners and like…do we know anything about what happens to the faith of these prisoners after they have been imprisoned. All of this is a very rich human story that just so happens to be a statistical story too.
Campbell: So how do you try to balance that? Because what you're saying here is, OK, here's the data, here are the big statistics, but to put a face on it, I need to go out and tell a story about one person who may be representative of that. And then, as a statistician, how hard is that to do? Do we even think about that? Ooh, maybe this is the wrong person I'm telling the story about, and that person, he or she, is not representative of the larger picture I'm trying to get across to an audience here.
Chalabi: Yeah. I mean, no one is a perfect representation of anything; I think that people understand that quite intuitively. I think part of the thing of marrying the two together is about how you pick your stories and about transparency to readers, right? So rather than just saying, we went to this town in Utah to tell the story, why don't we use statistics to determine exactly where we're going to do our reporting…use statistics in society. We're going to this town because it is, I don't know, the most polluted county in all of America; we've used statistics to kind of determine that, and you know, again, beautiful visualizations that convey levels of pollution in each county. And then, going and doing the reporting there, people automatically understand, of course this place isn't representative, because you've already told me it is an outlier. But it allows them to kind of contextualize the rest of the stories they are hearing.
Pennington: You're listening to Stats and Stories, and our discussion today focuses on data journalism. Our guest is Guardian U.S. Data Editor Mona Chalabi. Now, Mona, you mentioned earlier that some of the stories that you work on have been focused on sex, they've been focused on food, which fall into, I would say, kind of the overarching general category of health stories, which, you know, within journalism, and I say this having been a former journalist who did a lot of health and science stories, can be…notoriously bad…badly reported, badly written. And so, sort of moving into maybe a different kind of lane just for a moment, I wonder, as someone who does a lot of work on those sorts of health-related stories, what are maybe some of your frustrations about the way that those kinds of stories get covered?
Chalabi: I would say that one of my biggest frustrations is that people communicate the likelihood of side effects or the likelihood of getting sick without really explaining to readers the risk factors that affect those averages. This is actually a complaint that I have about a lot of journalism that uses numbers. Very often reporters will just take kind of the top-line numbers, the average or the median, without drilling down into the demographic patterns or age patterns that affect those numbers, and honestly, that's what readers really, really care about. They want to know whether they're in a high-risk group or a low-risk group, and it often doesn't take too much extra work to give them that information.
Bailer: So why do you think that that's not present? What's the barrier for telling that part of the story?
Chalabi: If I'm really honest, I think a lot of journalists…some of it is laziness and some of it is about being uncomfortable with the numbers, right?
So it's one thing to look at the press release, and it's another thing to dig inside the study itself and understand what it means when you see those numbers in brackets. That does not necessarily explain...this is the confidence, this is the margin of error. Again, it takes a certain statistical literacy to understand, but rather than e-mailing me all this and saying, what does this mean? These numbers seem important...they kind of get left by the wayside, and again, instead the top-line numbers are reported.
Bailer: That really begs a follow-up question...so how should we be training the journalists of tomorrow? Or of today?
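For readers wondering what the bracketed numbers in a study usually represent, here is a minimal sketch of one common case: a margin of error and an approximate 95% interval for a survey proportion. It uses the simple normal-approximation (Wald) formula, which is only one of several methods a study might use, and the survey counts are hypothetical, chosen purely for illustration.

```python
import math


def proportion_interval(successes, sample_size, z=1.96):
    """Normal-approximation (Wald) interval for a sample proportion.

    z = 1.96 corresponds to roughly 95% confidence. Returns the point
    estimate, the margin of error, and the (low, high) interval.
    """
    p_hat = successes / sample_size
    margin = z * math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, margin, (p_hat - margin, p_hat + margin)


# Hypothetical survey: 120 of 1,000 respondents report the outcome.
estimate, margin, (low, high) = proportion_interval(120, 1000)

# A study would typically print the interval in brackets after the estimate.
print(f"{estimate:.1%} ({low:.1%} to {high:.1%}), margin of error +/- {margin:.1%}")
```

Reporting the bracketed range alongside the top-line figure is one way to say "somewhere between this and this" rather than implying a single exact number.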
Chalabi: I guess I'm quite nervous about the job description, data journalists, because I feel like everything shouldn't be siloed, it shouldn't be that you know you just get one piece that kind of summarizes all of the numbers and then you get the reported piece. I think all journalism graduates today should have taken a course on statistics and on the numbers so that they feel that kind of degree of comfort and statistical literacy.
Campbell: Yes, I think the challenge sometimes is, when we have our journalism students here, there is a phobia about numbers, right? You know, some students are attracted to careers and courses and majors that don't really challenge them or test them; they're going to go to their kind of comfort zone. So I guess the question is, you want to come here and teach?
Chalabi: I completely hear you about the difficulties of students who don't necessarily feel like that's something that's for them, but I actually did a workshop at Columbia this weekend and I feel like data visualization offers so many opportunities to get those students engaged. Because, and this is obviously a massive characterization, I think people either tend to be quite concept driven and think in quite abstract ways, and obviously maths is a very, very abstract subject, or they are quite visual people, and data visualization can give people that in, to feel more excited about numbers and statistics, I think.
Bailer: I've got to ask the complementary question then too…what is it that we need to get the people that are interested in statistics and math to be better communicators and better contribute to journalism?
Chalabi: Oh good, that's a great question. I honestly think a lot of them need to get more humility, and I know that sounds awful.
Bailer: Well, there's two people in this room that probably agree with you!
Chalabi: Yeah, I do think sometimes there's a bit of an attitude of either you are smart enough to get it, or you are not. And making that level of subject expertise accessible to a wider audience, I think sometimes people see that as dumbing down. And you know, that's something that…well, like, I have an ego, and I felt like when I first started doing all these illustrations, they felt silly, they felt childlike, and I thought, oh god, like, I'm going to be less respected in my field for doing something that isn't super impressive. But yeah, I think it is important, it's really important, and I don't know if that really answers your question.
Bailer: I think that's really important, being able to communicate technical information in non-technical and interesting ways is a really hard thing to do and that's a skill that has to be developed.
Chalabi: And again, if people understand that, that actually will affect the accuracy of something that they probably assume is perfectly accurate. Again, giving an example, even these very, very simplified data illustrations, I publish them very often on Instagram, and one of the nice things about using Instagram is that there's a comment function, so very often people comment to me. And what they are saying to me is, oh, so, like, I don't know, let's go over this...I've written 53 out of a hundred probability, so people comment saying, oh, you are saying to me that if I do this thing a hundred times, 53 times out of that one hundred it will happen. And so it's a really great way for people to communicate what it is that they've understood from something you've published, and honestly, my belief is that if they have misunderstood something, I have failed as a journalist. It's not because they're stupid and they are getting the wrong takeaway from it. And again I come back to this recent US election, which honestly had, like, a really long-term impact, I think, on data journalism. Again, very often the response of Nate Silver following the election was that people who criticized his forecast had misunderstood the nature of probability, and I just don't think that's fair. I think that the readers were misinformed because the way that that data was communicated was confusing to them.
Pennington: Since you bring up the election, one of the outcomes of the election, as I'm sure you are aware, is this debate over fake news and fake facts. And I wonder, given this particular climate that we are in, if people who are doing work like yourself, and I won't call you a data journalist, but who are using a lot of data in their work, feel an extra amount of pressure to make sure that they have gotten everything right. Because, you know, whenever you are reporting on numbers, for me, I was always paranoid I was screwing something up, right? So I wonder, given the climate we're in, if you feel an extra level of pressure to really make sure that you're getting it right.
Chalabi: I won't say an extra level of pressure. I actually have a lot of pressure to be transparent about my calculation process because, again, it comes back to the idea of humility. I hope that readers know…if they call me out because I got something wrong, and I do, I get things wrong, I would respond to them saying, hey, you are absolutely right, you know, with a very, very clear correction on the piece and a thank you to the person who might have called it out. And again, part of establishing that tone is about being very, very transparent. So, like, just a couple of months ago I started a fact-checking column, of which there are dozens and dozens now. So I was a little bit reluctant to do it because it felt like it was quite a saturated market, if you like. But the way that I tried to make it different is by doing it as a step by step, where readers can do the exact same research alongside me and see if they come to the same results. And again, I have seen that that brings readers in, to say, I don't think I'm smarter than you. Anyone can do this, and this is how I did it.
Pennington: What's the response been to that column?
Chalabi: Quite positive! Yes, it's been nice! Again, like, I really rely on my inbox a lot for figuring out how I'm doing, and I've got some really, really nice emails from readers either suggesting future topics or saying that it was helpful to them to have it very, very clearly spelled out, exactly how I kind of got the data I did.
Bailer: You know that sounds like good teaching strategy. You sound like you're being just a really outstanding educator. You're trying to show the steps...the way you break down a problem and you show a way that you come to understanding. How it is either correct or incorrect.
Chalabi: Oh, I hope so, but it's also me wanting to be a pupil, because all the time people point out a different way that I could have done it, and that's really, really helpful to me too. If I just publish the result and someone else arrives at the same result, there isn't that kind of exchange of knowledge about different ways of doing things.
Campbell: We were sort of starting with the general public, with this notion...I think you even say in your TED talk that about four out of ten Americans distrust the economic data they get reported by the government. They're distrustful of government data, and it's even higher among Trump supporters, we know. So you're starting from this position where it's almost like people believe that...how do you get past that belief? You know, I think you focused on the concept of imprecision and letting people know exactly how that works. But I guess the bigger question is, how can both statisticians and journalists do a better job of sort of meeting that resistance head on, the resistance that the general public has, particularly today, against data, numbers and, you know, trust in government statistics?
Chalabi: I think this podcast is really important, and I think it is exactly that marriage!
Campbell: Thank you very much.
Bailer: You can come back any time.
Chalabi: Seriously, it is that marriage of statistics and stories. Again, it would be so powerful for the public if there was a really beautifully made, engaging short video or short podcast that explained how on earth the unemployment rate is calculated. Not just…you know, we speak to some businesses and we come up with the numbers. There's a real pressure to go beyond the published PDF of the methodology, and also not to go to the other alternative, which is a journalist like me simply publishing kind of the top-line results of the latest numbers. There has to be something in between those two extremes that people can latch onto and build trust on top of.
Bailer: So you mentioned your column…can you tell us the name of it?
Chalabi: Oh it's called "Just the Facts," which I actually think is probably a terrible name.
Pennington: Well, that's all the time we have for this episode of Stats and Stories. Our guest today has been Guardian U.S.'s Data Editor, Mona Chalabi. Thank you so much for being here today, Mona.
Chalabi: Thanks for having me.
Bailer: Thank you.
Pennington: Stats and Stories is a partnership between Miami University's Departments of Statistics and Media, Journalism and Film and the American Statistical Association. You can follow us on Twitter or iTunes. If you'd like to share your thoughts on our program, send your e-mail to firstname.lastname@example.org and be sure to listen for future editions of Stats and Stories where we discuss the statistics behind the stories and the stories behind the statistics.