Episode 30: Do's and Don'ts of Data Journalism
Release Date: 04/17/2017
Rosemary Pennington: 2015 was a watershed year for many reasons, but one that might have been overlooked was the leak of the Panama Papers. The leak exposed the secret business dealings of a number of governments, political figures and global corporations. It was also the biggest data leak in history, and it took more than 100 news outlets working with the International Consortium of Investigative Journalists months to find the stories in the data. Things like the Panama Papers, Edward Snowden's NSA leaks and WikiLeaks have helped bring attention to the work being produced by data journalists both in the United States and abroad. Data journalism is the focus of this episode of Stats and Stories, where we look at the statistics behind the stories and the stories behind the statistics. Stats and Stories is a partnership between Miami University's Departments of Statistics and Media, Journalism and Film, as well as the American Statistical Association. Our regular panelists are Department of Statistics Chair John Bailer and Department of Media, Journalism and Film Chair Richard Campbell. I'm Rosemary Pennington. Our guest today is freelance journalist Andrew Flowers. Flowers served as an Economic Research Analyst at the Federal Reserve Bank of Atlanta before spending almost three years as a Quantitative Editor for FiveThirtyEight. Thanks for being here, Andrew.
Andrew Flowers: Thank you for having me.
Pennington: So how does someone go from being an analyst at the Fed to working in journalism?
Flowers: Well, it's a rather strange, circuitous path, but essentially during the financial crisis I was working at the Federal Reserve Bank of Atlanta, and in addition to doing policy work and capital-E Economics, so to speak, I really got the news bug and wanted to write for a popular audience. I was lucky enough to have colleagues at the Atlanta Fed who were open-minded enough, relative to other Fed banks, to have a blog and a magazine. Writing for those outlets within the Fed led me to conclude I should really get into news and journalism, and that was my bridge to eventually moving to FiveThirtyEight.
John Bailer: So how difficult was that transition, going from writing very technically to writing for a more popular audience?
Flowers: It was very challenging, honestly. It was very challenging. I never did classic journalism, whether through a university newspaper or anything like that, and I obviously hadn't worked as a journalist after school. So I had to learn how to craft news stories, how to report, how to do fact-checking, how to write a lead, how to interview characters and get insights that are both accurate and engaging. And you have to engage your readers, which is something that, frankly, as someone from an academic background at the Fed, you don't really think much about. Your readers are assumed to be engaged because they're your colleagues; they're other academics. So writing for a popular audience through a news website or a newspaper requires thinking about hooks that are accurate and fair but also tie the reader to something relevant to their life or their interests. Learning those classical journalism writing principles took years, and I'm still learning them.
Richard Campbell: So did you do anything specific, or can you point to anything that really helped you in that transition to think more in terms of story?
Flowers: Honestly, it's a combination of two things: one, reading a lot of news, but two, and more important than anything, the editors and colleagues I had in that newsroom. Being edited repeatedly by experienced journalists and having them tear apart my copy, in a good way, to say: hey, this part where all the meat of the methodological explanation sits is boring, you can condense it. We want the story to be accurate, so we want to keep it in there, but you can move it further down in the story, lead with something that illustrates your broader point, and tease your conclusions earlier in the story. And I could go on; these are just a few of the many types of advice I absorbed through dozens, hundreds really, of experiences of writing copy and getting it edited. Those colleagues at FiveThirtyEight who were experienced journalists, those editors: that was the great hands-on work experience that really helped me make that transition as best I could.
Bailer: So what was your favorite story that you've worked on, or the one with the most interesting outcome, when you did a deep dive into a set of data to address a question of interest to you?
Flowers: Oh, that's a great question. It's a really hard question. I guess my first instinct is to respond with the longest story I wrote, which is not always a good gauge; length and time invested in a project don't always correlate well with what we perceive as quality. But in this case, yes: I wrote a story in April of 2016, April of last year, that was originally going to be tied to the referendum in Switzerland over a universal basic income. This is the economic idea of giving every citizen a set amount of money, whether they're rich or poor, whether they're employed or not. The story evolved beyond Switzerland into a longer piece about the different activists, economists, researchers and even historians who trace this idea of basic income, and about the pros and cons of the idea. To me, what was so interesting was that it was a very character-driven, classical journalism story on the one hand, but also very data-driven, very economics-y on the other. On the character side, you had a really strange motley crew of people interested in this policy. You had your classical socialists and progressives, but then, strangely, you had libertarians and conservatives too, and thrown in the mix were Silicon Valley techies who thought the idea was revolutionary and compatible with future advances in automation. That was the table setting that really piqued my interest: wow, look at all these different interest groups that are so varied but all in this together. Then on the technical side, the data side, it was about trying to make the math work, because it's an appealing idea in some ways on the pro side, in terms of how it might be a better replacement for welfare programs.
But on the con side, in terms of actually affording it, looking hard at the numbers across countries and at what governments spend their social safety net money on was really interesting. So as a whole, I would say my universal basic income piece was the most interesting story to report. I probably spoke to 25 people, and at the same time, in parallel, I was looking into the data and trying to weave it all together. That was probably the single most interesting experience, but there were others too.
Campbell: Andrew, characters were the first thing you talked about. In your role, how do you think about balancing the idea that you have to tell a story that has characters? You're thinking about that, but you also don't want to misrepresent the complexity of the numbers and the data. So how do you strike that balance?
Flowers: That is, I think, the question for data journalists, so you're cutting right through it. Characters are crucial in any type of storytelling, particularly in journalism, and they're important in data journalism too. But you also want to be accurate and rigorous. So there's a tension when you point to a character who may be an outlier. It could be an athlete, if you're writing a sports story. It could be a town or a political party or whatever that you think is above and beyond, the best, the worst, whatever; it's an outlier, and you want to pigeonhole it with your prior thinking that this is exceptional, this character is exceptional. And you may be tempted to force the data to prove that, oh no, this really is an outlier, or this time is different. So there's that temptation to see a character as exceptional and then find the data to back it up, which can lead you astray. Of course, sometimes they really are outliers, and for a data journalist that's the bread-and-butter story to write. It's the easiest, most engaging story to say: hey, who is this person, this character in this group, that is the most unusual? So it's important as a data journalist to use characters. Your work is going to be too dry and unrelatable, and it won't be engaging, if you totally eschew characters and, as an academic might, just focus on the methodology, the data, the results. You have to have both, and that's what makes this task of not just storytelling but data storytelling so challenging. There are specific techniques for weaving characters into a data story to make it effective, but as a whole, that is the tension and the greatest challenge.
Campbell: Thank you.
[music] Pennington: You're listening to Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics. Today our focus is data journalism. I'm Rosemary Pennington. Joining me are our panelists, Miami University Statistics Department Chair John Bailer and Media, Journalism and Film Department Chair Richard Campbell. Our special guest is freelance data journalist Andrew Flowers, who served as the Quantitative Editor at FiveThirtyEight. So, Andrew, what exactly does a Quantitative Editor do?
Flowers: My experience in two and a half years as the Quantitative Editor at FiveThirtyEight was really incredible. It's kind of a weird title, I know, right? Quantitative Editor is not a common job description, but when I was hired, as we began to prepare for the relaunch of FiveThirtyEight under ESPN and ABC News (this was exactly three years ago, actually, in March of 2014), the role outlined for me was twofold. On the one hand, I would be a writer of sorts. I wrote about economics topics primarily, but also sports and politics occasionally. But the main thrust of my role, about two-thirds of my work, was the Quant Editor role. Essentially, that meant I edited copy, not for prose but for methodology, to make sure that the statistics used and the data analysis presented in a story were accurate and fair, that they used modern technical tools in their presentation, and that the data was rigorous in all FiveThirtyEight stories. That was my mandate. What that involved was really two things: working with writers on the back end and on the front end. On the back end, when writers submitted copy with data in their stories, I read those stories as Quant Editor with a critical eye, asking tough questions, stress-testing their assumptions and really fact-checking their work; in many cases I would review their code, if they had any, or their data calculations to make sure they were correct. But the front-end Quant Editor role was in many cases even more interesting. It was working with writers either before their story really took shape or in the midst of it, and they would come to me with technical requests: hey, I'm struggling to scrape this data from a web page, or how do I munge and join together all these messy data formats I'm unfamiliar with, or how can I better visualize and better model...
what's the best statistical analysis approach to take with this story? Helping writers from the start of their stories, almost as a staff data scientist, was a great experience, because I got a lot of reps both through my own work and through working almost as an assistant or co-worker with other writers, seeing how their storytelling took shape. They had a question in mind, they looked for characters, and in many cases they did reporting, but when it came to the data, they brought me in to help them. That was the role of Quantitative Editor, and I don't know whether other journalism outlets have that role. I hear the title data editor at other news outlets, but it was a role I really relished, and I think it's increasingly important in many newsrooms as they move toward more data-driven storytelling.
Bailer: That ties nicely to a point you made in earlier comments: the idea of avoiding forcing data. When I think about someone having a question in mind, I worry that they'll just go out and search and cherry-pick for data that's going to be consistent with their beliefs. So what are some of the ways you would advise people, when they were working on their stories, to avoid that temptation?
Flowers: One suggestion that many editors would give writers, and it wasn't just me as the Quant Editor, is to say: even if you consider yourself a data journalist and you just want to dive into the weeds, take some time to actually report this like a traditional journalist would, because the experts, whether they're policymakers, activists or academics, are going to have a broader or deeper understanding of the subject matter, of course, but also a broader purview of what the weaknesses in the data are, if there are weaknesses. That said, after writers did some reporting and had the data in their hands, I would ask them critical questions and advise them with principles for how to approach it so that they wouldn't force the data into a narrative. What I would often recommend is lots of iterative data exploration in the early stages, because so often, if you have a preconceived notion of what you want to write, you can find it in the data. So take some time at the beginning to say, let's just graph the data (I think we're all visual learners), let's graph the data set in different ways to learn its contours: say, the distribution of basketball players if it's a sports data set, or, if it's political donations, how they're clustered within different interest groups. Just visualize the data to get a 30,000-foot understanding of it at the beginning; that's number one. Number two: once you do have your thesis (and any good data storytelling endeavor, any good journalism story really, data or not, has a point, right? You have a thesis), once you have that data thesis in mind, stress-test it in very rigorous ways to ensure that, for example, you're not p-hacking or doing multi...
in the statistical jargon, you would be doing multiple hypothesis testing, running many regressions to find something that's significant and saying, oh, I found something, when really you're just data mining, in the bad sense of the term. Advising them against that meant me coming in as they did their analysis and asking: well, how many different variables did you look at, and did you run the appropriate statistical checks to make sure your result is not spurious, not just a random result that you're going to run off and write a story with? So: iterative data exploration, rigorous statistical checks, reporting, and then finally, collaborating with other writers and editors, like myself and others, to check that the work is even correct. You would be shocked how often this happens, with traditional news outlets all the time but also with data journalists: you just get the numbers wrong. I mean, people make mistakes. The error rates when dealing with messy files or government-issued data, and when you're rushing to a deadline, can be high. We worked very hard over my nearly three years at FiveThirtyEight to get those error rates, those corrections, down. To do that, I think what worked best was, first, having a collaborative mentality within the newsroom, so that you would share your work, and second, documenting it through code or other documentation so that a second person could take it and fully fact-check it. Those are the checks and advice I would tend to give on any story, to make sure the writer wasn't pigeonholing a narrative they had in mind and then using the data to force out “evidence” for it.
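To make the multiple-testing trap Flowers describes concrete, here is a minimal Python sketch, offered as an illustration rather than anything FiveThirtyEight used. It tests many "variables" that are pure noise, counts how many clear the usual 0.05 bar by chance, and then applies a Bonferroni correction (dividing the threshold by the number of tests), which is one common check among several. The function names, sample sizes, seed and the known-variance z-test are all simplifying assumptions chosen for clarity.

```python
import math
import random
from statistics import NormalDist, fmean


def two_sample_z_pvalue(a, b):
    """Two-sided p-value for a difference in means, assuming (for simplicity)
    that both samples come from populations with known variance 1."""
    se = math.sqrt(1 / len(a) + 1 / len(b))
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def spurious_findings(m=100, n=50, alpha=0.05, seed=1):
    """Test m pure-noise variables; return how many look 'significant'
    without correction and after a Bonferroni correction (alpha / m)."""
    rng = random.Random(seed)
    pvalues = []
    for _ in range(m):
        # Both groups come from the same distribution, so every null is true
        # and any "significant" result is a false positive.
        group_a = [rng.gauss(0, 1) for _ in range(n)]
        group_b = [rng.gauss(0, 1) for _ in range(n)]
        pvalues.append(two_sample_z_pvalue(group_a, group_b))
    naive = sum(p < alpha for p in pvalues)
    bonferroni = sum(p < alpha / m for p in pvalues)
    return naive, bonferroni


if __name__ == "__main__":
    naive, corrected = spurious_findings()
    print(f"'Significant' without correction: {naive} of 100")
    print(f"'Significant' after Bonferroni:   {corrected} of 100")
```

With 100 noise variables, roughly five are expected to look "significant" before correction, and usually none survive the Bonferroni adjustment. Bonferroni is deliberately conservative; asking "how many variables did you look at?" is the same idea in editorial form.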
Campbell: So to follow up on that and all of the hard work that goes into getting this right: we seem to be living in a world of fake news and alternative facts, where neither news nor numbers are fully trusted. The question, I guess, is more a political one: what can statisticians and journalists do about that?
Flowers: That is the question... this issue of fake news and how to address it. That is the question I struggle with most, because frankly I don't have a good answer. I feel like over the last few years FiveThirtyEight has modeled a level of transparency in national journalism, at least on the data side, that is frankly unprecedented. It's not that other news organizations weren't striving to be transparent; it's just that because FiveThirtyEight's niche was to be data-driven, we took our transparency a step further by, in many cases, posting the data and the code behind our stories online so they could be reproduced and checked by others, and of course sometimes corrected by others. Along with others on our data visualization team, I managed a GitHub repository, a way to post data and code and share it with others, so if a story merited it, if it had significant data usage and code, we would put it out there publicly to say: here's our work, we're going to be transparent about it. Now, what does that do? And by the way, I don't think that was often done before by other news outlets. But what does that do? As a data journalist, it tells your readers, your wider audience, and really your potential readers who may be skeptical of your approach: we're not just going to say we're credible, and we're not just going to try to report and get a balanced view of things; we're actually going to show our work, and if you don't believe us, you can go and check it out yourself. Now, many readers aren't going to do that; that's a small sliver of readers. So is that the silver bullet for the fake news problem? Of course not. But you can take very concrete, digitally oriented steps toward being more transparent, including in your reporting. I'd love to see news organizations, for example, post: okay, here are the calls I made, I talked to these groups.
Here's who made the cut and who didn't. It doesn't have to go with the story or clutter the news the reader is assumed to be looking for, but if readers want to see who you talked to, or, in FiveThirtyEight's case, what the data is, what the code is, how you made that chart, they can go and find it. These small, digitally oriented steps toward a new level of robust transparency can, I think, make a small dent in this bigger... I mean, gorilla of a problem that is fake news. I don't think posting your work online in a transparent way is going to stop viral memes coming out of some Facebook “news site” that is really just one or two yellow journalists trying to take an erroneous point and get some clicks out of it. That is a much bigger problem with the internet, how we consume news, and echo chambers, and frankly I don't have an answer to it. But I do believe news organizations can take some small, concrete steps by being even more transparent, and hopefully that will increase public trust.
[music] Pennington: You're listening to Stats and Stories and today we're exploring data journalism with our guest freelance data journalist Andrew Flowers. So from the reader’s perspective, we live in this world where things aren’t quite as transparent as we'd like them to be. How would you suggest readers navigate stories that are based on data? How can they figure out what's worth trusting and what maybe they can sort of avoid or just close out altogether?
Flowers: Yeah, that's really hard. To be clear, I think there's a lot of great data journalism done, not just at FiveThirtyEight. I think the Upshot at the New York Times does great work, and international news sites like the Guardian, or smaller websites (I think Priceonomics is a good example), often work with data and do a good job with it. That said, as a news consumer, adjudicating between a bad use of data journalism and a good one is difficult, but the baseline question, I think, is: what's the writer's or the news outlet's attitude toward me, the reader, and what I need to know? You can pick up on that through the tone of the piece and how much information is conveyed in it through subtle things like footnotes, links and source lines on charts, for example. These little details tell you, the reader, that the writer and the designer who made this story cared enough to include hyperlinks, footnotes and source lines on the charts, to “cite” their entire story as an academic would. That tells you this news organization treats you with a level of respect that says: I'm not going to, as the all-knowing writer, tell you, the reader, what to believe. I'm going to lay out my work for you and cite my sources. Here are the links, so you can follow them and pull the data yourself or read the stories yourself; here's the organization, here's a link to it, so you can check out its credibility and draw your own conclusions from its website if you want. It's providing the reader with a rich level of information so they can find more information themselves and make up their own minds.
That tells the reader: I have respect for your intelligence, and I'm not here to lord over you with a narrative that says this is how it is, but to show my work and make a case for it. You can pick up on that tone and that level of respect, I think, through repeated, intelligent reading of news sources, and again, a lot of news sources do this. But if a news source doesn't, if they're not citing their numbers and they're just throwing information out with no links or citations, or even verbal, in-text citations of where it comes from, and they just assume you'll run with it, that's the red flag, I think, and it tells you this is sloppy data journalism.
Bailer: You know, one challenge with this is the implicit long form associated with really good reporting and good stories, and I wonder how that plays out in a world where 140 characters is the bite-sized chunk that many want to consume.
Flowers: It's a challenge, and again, it's a challenge on multiple fronts. Making a story engaging when you're using numbers is a challenge. Making a story engaging when it's complex, when the answer, if there is an answer, is nuanced and therefore takes more time to digest and present, is a challenge. So you are swimming upstream. But you can rise to the occasion by taking cues that frankly come not from statisticians but from the classic principles of storytelling: using characters, using visuals, crafting scenes and backdrops for the conversations that take place, connecting the data to real people, actually finding who the outliers are and telling their stories. What's the most unequal town in America? Who's the politician who won with the least amount of external fundraising? Whatever it is, contextualizing the data through storytelling is, I think, really powerful. I'll give you one example: a fabulous story my colleague Anna Maria Barry-Jester wrote last summer for FiveThirtyEight. It was part of a guns project in which FiveThirtyEight presented a data visualization on gun deaths in America, showing, with the best available data we could get, the composition of gun deaths in America. How many are suicides? I think it surprised a lot of people that it's a lot more than you would expect. How many are homicides, accidents and so on? That was part of an interactive that was very data-heavy, but among the stories that accompanied it, the one Anna wrote still sticks with me as a great template for data storytelling. It's a story about a man in Wyoming who had committed...
who attempted to commit suicide with a gun, and she told his story of getting help through his family and through counselors. But the broader context of the story was: gun ownership rates are extremely high in the Mountain West, and suicide rates, particularly suicides using firearms, are extremely high in this area too. Who tends to commit suicide with a firearm? It's overwhelmingly men, overwhelmingly white men compared to other races, and overwhelmingly middle-aged men compared to other age groups. So it was a story about a person in a very emotional and tragic experience, and the narrative arc was how he got help and was able to move beyond that, but it was all rooted in data. It used charts and maps to show you: okay, all the CDC data we just threw at you, if your eyes glazed over, that's okay; here's one way to interpret it. That's just one case, and there are many other examples I could give from my colleagues' work and my own where you have to rise to the occasion. Whether it's fake news or the 140-character Twitter and social media landscape the news world lives and operates in, all these challenges can be met, I believe, with a combination of two things. One, good storytelling: finding the right characters, showing a narrative arc, finding the tension. That is never going to go away. And two, data: accuracy in your work, the transparency we provide, and presenting the data in an aesthetically pleasing way, through visuals and through video and podcasts when necessary, so that potential readers' eyes don't glaze over and they don't lose interest. These are the tactics we have used, and I would recommend anyone use them to combat those big challenges.
Pennington: Well, Andrew, that's all the time we have for our conversation today. Thank you so much for joining us.
Flowers: Thank you so much for having me. It’s a real pleasure.
Pennington: Stats and Stories is a partnership between Miami University's Departments of Statistics and Media, Journalism and Film and the American Statistical Association. Stay tuned and keep following us on Twitter or iTunes. If you'd like to share your thoughts on our program, send your email to email@example.com, and be sure to listen for future editions of Stats and Stories, where we discuss the statistics behind the stories and the stories behind the statistics.