By Deanna Trejo
Imagine that you want to tell the stories of housing insecurity brought on by the COVID pandemic and you’re trying to track down sketchy landlords who don’t want to be found.
Not only is there no app for that, until recently there wasn’t any hard data to turn to, either. So, what were enterprising reporters and housing activists to do? Create their own databases, of course.
Saturday morning at FOIAFest 2021, Justin Agrelo, a reporting resident with City Bureau, Ivy Abid, a co-developer of the website FindMyLandlord, and Haru Coryne, a data reporter with ProPublica Illinois, offered insights into the messy world of Web scraping, data cleaning and building databases from the ground up.
Agrelo, who co-reported for City Bureau on evictions in Chicago during the pandemic eviction moratorium, described how his team had to map out the city’s housing landscape to identify who might have useful information. This included public agencies, such as the Chicago Housing Authority, as well as private groups, such as the DePaul Institute of Housing Studies.
Another primary source of data came from the Cook County eviction court, which is not subject to FOIA. Rather, Agrelo had to request records directly from Chief Judge Timothy Evans, which turned out to be difficult during the early part of the pandemic.
Abid gathered data for FindMyLandlord, a tool she and others created to help tenants learn more about their landlords, who often own a lot of property in Chicago and hide behind multiple LLCs. She scraped her data from the Cook County Assessor’s website until the office asked her to stop and promised to give her the information she wanted.
For a story on senior citizens who died alone at home in public housing during the pandemic, Coryne said he and ProPublica colleagues checked CHA records against medical examiner records. The medical examiner information proved more useful, which led them to file FOIA requests for the related case files.
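For readers curious what checking one set of records against another can look like in practice, here is a minimal Python sketch. It is not ProPublica’s method, which the panel did not describe in detail. The addresses, the `record_id` field and the `normalize_address` helper are all invented for illustration, and the sketch assumes the two sources can be matched on street address alone.

```python
def normalize_address(raw):
    """Uppercase, drop punctuation, collapse spaces and abbreviate a few common words."""
    # Hypothetical abbreviation table; a real one would need many more entries.
    replacements = {
        "STREET": "ST", "AVENUE": "AVE",
        "NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W",
    }
    cleaned = "".join(ch for ch in raw.upper() if ch.isalnum() or ch.isspace())
    return " ".join(replacements.get(word, word) for word in cleaned.split())


# Made-up sample data standing in for the two record sets.
housing_addresses = ["100 North Example Street", "200 S. Sample Ave."]
death_records = [
    {"record_id": "A1", "address": "100 N EXAMPLE ST"},
    {"record_id": "A2", "address": "999 W Elsewhere Ave"},
    {"record_id": "A3", "address": "200 south sample avenue"},
]

# Build a set of normalized housing addresses, then keep records that match one.
housing_keys = {normalize_address(address) for address in housing_addresses}
matches = [
    record for record in death_records
    if normalize_address(record["address"]) in housing_keys
]

for record in matches:
    print(record["record_id"], record["address"])
```

A match like this is only a lead. Each hit would still need to be confirmed against the underlying case file, which is where records requests come in.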
Obtaining the data turned out to be only half the battle, because cleaning it was the labor-intensive next step. “This had to be done by person power,” Abid said, adding that not all of it could be automated.
For Agrelo, the information requested from eviction court in April 2020 didn’t come back until August. It was messy, listing over 2,000 residential and commercial eviction filings. He said their next move was to “narrow the data down to 15 or so ZIP codes that we serve,” and then manually clean that data for only residential evictions. After mailing letters, they were flooded with phone calls.
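The narrowing step Agrelo described, filtering by ZIP code first and then sorting out residential cases, could be sketched in Python as follows. This is an illustration under stated assumptions, not City Bureau’s actual workflow: the column names (`case_number`, `zip`, `case_type`), the sample rows and the two ZIP codes in `SERVED_ZIPS` are placeholders, since the article does not describe the court export’s layout.

```python
import csv
import io

# Invented sample rows; the real court export's columns are not known.
SAMPLE_CSV = """case_number,zip,case_type
2020-M1-000001,60623,Residential
2020-M1-000002,60601,Commercial
2020-M1-000003,60623-1234,residential
2020-M1-000004,60629,
2020-M1-000005,60614,Residential
"""

# Placeholder ZIP codes, not the newsroom's actual coverage list.
SERVED_ZIPS = {"60623", "60629"}


def normalize_zip(raw):
    """Reduce a ZIP or ZIP+4 string to its first five digits."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    return digits[:5]


def triage(rows, served_zips):
    """Keep rows in served ZIPs, split into residential and needs-a-human."""
    residential, needs_review = [], []
    for row in rows:
        if normalize_zip(row["zip"]) not in served_zips:
            continue  # outside the coverage area
        case_type = row["case_type"].strip().lower()
        if case_type == "residential":
            residential.append(row)
        elif case_type == "":
            needs_review.append(row)  # blank type: someone has to check by hand
        # anything else (e.g. commercial) is dropped
    return residential, needs_review


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
residential, needs_review = triage(rows, SERVED_ZIPS)
print(len(residential), "residential;", len(needs_review), "to review by hand")
```

The `needs_review` list reflects the point the panelists made: a script can shrink the pile, but ambiguous rows still end up in front of a person.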
Agrelo and his co-reporters then built their own database based on information they logged from the phone calls. “When we started to build that database…that’s when it was easier for us to say, OK, what stories here do we feel reflect this crisis?”