News

Internet crawling tool tracks disease outbreaks worldwide

29 July 2008

An online project called HealthMap mines the Internet to look for disease outbreaks worldwide and plots the results on an online map, available for anyone to view and interact with. As reported in the July issue of PLoS Medicine, it extracts, categorizes, filters and integrates a variety of web-based data sources, even penetrating blogs, listservs, chatrooms, and online news reports.

“It’s a disease-mining system that uses the Internet to look for outbreaks going on around the world, bringing all this information together in one view,” explains John Brownstein, PhD, co-founder of HealthMap and an assistant professor in the Children’s Hospital Informatics Program (CHIP) at Children’s Hospital, Boston, USA.

Launched in September 2006 as an experimental project by Brownstein, an epidemiologist by training, and his CHIP colleague Clark Freifeld, a software developer, HealthMap currently serves as a direct information source for approximately 20,000 unique visitors per month. Many regular users come from the WHO, the US Centers for Disease Control and Prevention (CDC), and the European Centre for Disease Prevention and Control.

HealthMap is funded by grants from the US National Library of Medicine, the National Institutes of Health and the Canadian Institutes of Health Research. A $450,000 research grant from Google enabled HealthMap to expand its surveillance reach, and it now mines the Internet in English, Chinese, Spanish, Russian and French. Additional languages such as Hindi, Portuguese and Arabic are under development.

“Many developing regions in the world still lack essential public health information infrastructure, and these areas are often most vulnerable to the threat of emerging disease,” notes Freifeld.

While the Internet contains plenty of information about infectious diseases, the myriad sources are often not structured or organized and, until now, have not been synthesized.

HealthMap also ignores international boundaries, facilitating early disease warnings even when governments want to keep things under wraps. For example, public health agencies in China were aware of and working to combat SARS well before the deadly virus made global headlines.

Screenshot of the HealthMap website

“We’ve traced the earliest reports of SARS back to Internet chat rooms where people were talking about this problem going on in Guangdong Province,” says Brownstein, who is also affiliated with Harvard Medical School. “The only information coming out to the rest of the world was through such informal channels, but nobody paid much attention at that point.”

The program’s main information sources include online news wires, RSS feeds, expert-curated accounts such as ProMED Mail, and validated official alerts from the WHO.

HealthMap classifies the collected data by location and disease, generating interactive geographic maps and colour-coding alerts based on how 'hot' they are — in other words, red means that there has been a lot of recent news in one particular area.
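The grouping and colour-coding described above can be pictured with a small sketch. This is not HealthMap's code: the article does not say how "heat" is scored, so the seven-day window, the thresholds, the colour names and the sample alerts below are all assumptions made up for illustration.

```python
from collections import Counter
from datetime import date, timedelta


def heat_colour(count):
    """Map a count of recent alerts to a colour (hypothetical thresholds)."""
    if count >= 5:
        return "red"
    if count >= 2:
        return "orange"
    return "yellow"


def colour_alerts(alerts, today, window_days=7):
    """Count recent alerts per (location, disease) pair and colour each pair."""
    cutoff = today - timedelta(days=window_days)
    recent = Counter(
        (a["location"], a["disease"]) for a in alerts if a["date"] >= cutoff
    )
    return {key: heat_colour(n) for key, n in recent.items()}


# Made-up sample data: five fresh alerts in one area, and one fresh plus
# one month-old alert in another.
today = date(2008, 7, 29)
alerts = (
    [{"location": "Region A", "disease": "disease X", "date": today}] * 5
    + [{"location": "Region B", "disease": "disease Y", "date": today}]
    + [{"location": "Region B", "disease": "disease Y",
        "date": today - timedelta(days=30)}]
)
colours = colour_alerts(alerts, today)
```

With this sample, the area with five recent alerts comes out red, while the area with a single recent alert stays yellow because the month-old report falls outside the window.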

A few clicks can provide a crash course on a disease of interest, via sites such as Wikipedia, Google Trends and PubMed. “Situational awareness windows” — pop-ups that appear when a particular state or city on the interactive map is highlighted — provide links to all the news reports on an outbreak in the area.

Brownstein and Freifeld are continuing to tinker with machine learning tools to help HealthMap avoid false alarms. The aim is that the program doesn’t mistake information on a herpes-infected horse named Antarctica for an actual herpes outbreak on Earth’s southernmost continent, and that it understands that the word 'fever' in the phrase “football fever in the UK” isn’t related to a disease.

“Think of it as creating our version of a spam filter that remembers bad emails. It’s a continuous process,” Freifeld says. “There are many thousands of health-related reports on the Web — publications of scientific results, changes in health policy, among others. In generating alerts, HealthMap needs to be able to separate these from reports of actual outbreaks.”
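To make the spam-filter analogy concrete, here is a minimal sketch of a naive Bayes text classifier, a technique commonly used in spam filters. The article does not say which machine learning method HealthMap uses, so this is a stand-in, and the class name, labels and training sentences are all invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of words seen
        self.doc_counts = Counter()  # label -> number of training texts
        self.vocab = set()

    def train(self, text, label):
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocab)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training examples: two outbreak-like reports, two that are noise.
nb = TinyNaiveBayes()
nb.train("health officials confirm outbreak of fever cases in village", "outbreak")
nb.train("hospital reports new cases of infection", "outbreak")
nb.train("football fever grips fans ahead of final", "noise")
nb.train("racing horse wins final at the track", "noise")

football = nb.classify("football fever in the UK")
report = nb.classify("officials confirm new cases of infection")
```

Each newly labelled report adds to the counts, which is one way a filter can "remember" past mistakes. A real system would need far more training data and better text processing than this toy version.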

The researchers are also working to validate news reports as a legitimate index of disease. “News feeds pick up stuff we’re not getting from other sources, and these reports tend to be true,” Brownstein explains. “We know, for instance, that 60% of outbreaks investigated by the WHO come from news sources. So these are critical sources, but we need to quantify how good news feeds are at early reporting of diseases, and their reliability.”

Ultimately, HealthMap demonstrates that low-cost, real-time Internet data-mining can be combined with openly available, user-friendly technologies, so that everyone, not just the public health community, can participate in global disease surveillance. Best of all, it’s free, an aspect Brownstein and Freifeld intend to preserve.

HealthMap can be found at: www.healthmap.org

 