News
Internet crawling tool tracks disease outbreaks worldwide
29 July 2008
An online project called HealthMap mines the Internet to look for
disease outbreaks worldwide and plots the results on an online map,
available for anyone to view and interact with. As reported in the July
issue of PLoS Medicine, it extracts, categorizes, filters and
integrates a variety of web-based data sources, even penetrating blogs,
listservs, chatrooms, and online news reports.
“It’s a disease-mining system that uses the Internet to look for
outbreaks going on around the world, bringing all this information
together in one view,” explains John Brownstein, PhD, co-founder of
HealthMap and an assistant professor at the Children’s Hospital
Informatics Program (CHIP) at Children’s Hospital, Boston, USA.
Launched in September 2006 as an experimental project by Brownstein,
an epidemiologist by training, and his CHIP colleague Clark Freifeld, a
software developer, HealthMap currently serves as a direct information
source for approximately 20,000 unique visitors per month. In fact, many
regular users come from the WHO, the US Centers for Disease Control and
Prevention (CDC), and the European Centre for Disease Prevention and Control.
HealthMap is funded by grants from the US National Library of
Medicine, the National Institutes of Health and the Canadian Institutes
of Health Research. A $450,000 research grant from Google enabled
HealthMap to expand its surveillance reach, and it now mines the
Internet in English, Chinese, Spanish, Russian and French. Additional
languages such as Hindi, Portuguese and Arabic are under development.
“Many developing regions in the world still lack essential public
health information infrastructure, and these areas are often most
vulnerable to the threat of emerging disease,” notes Freifeld.
While the Internet contains plenty of information about infectious
diseases, the myriad sources are often not structured or organized and,
until now, have not been synthesized.
HealthMap also ignores international boundaries, facilitating early
disease warnings even when governments want to keep things under wraps.
For example, public health agencies in China were aware of and working
to combat SARS well before the deadly virus made global headlines.

Screenshot of the HealthMap website
“We’ve traced the earliest reports of SARS back to Internet chat
rooms where people were talking about this problem going on in Guangdong
Province,” says Brownstein, who is also affiliated with Harvard Medical
School. “The only information coming out to the rest of the world was
through such informal channels, but nobody paid much attention at that
point.”
The program’s main information sources include online news wires, RSS
feeds, expert-curated accounts such as ProMED Mail, and validated
official alerts from the WHO.
HealthMap classifies the collected data by location and disease,
generating interactive geographic maps and colour-coding alerts based on
how 'hot' they are — in other words, red means that there has been a lot
of recent news in one particular area.
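The colour-coding idea can be illustrated with a minimal sketch. The article does not describe how HealthMap actually scores how 'hot' an alert is, so the thresholds, the seven-day window, the function names and the sample reports below are all assumptions made for illustration only.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical thresholds: not taken from HealthMap itself.
def heat_colour(report_count):
    if report_count >= 5:
        return "red"
    if report_count >= 2:
        return "orange"
    return "yellow"

def colour_alerts(reports, today, window_days=7):
    """Count recent reports per (location, disease) and assign a colour."""
    cutoff = today - timedelta(days=window_days)
    counts = Counter(
        (r["location"], r["disease"])
        for r in reports
        if r["date"] >= cutoff  # ignore reports older than the window
    )
    return {key: heat_colour(n) for key, n in counts.items()}

# Made-up sample reports, for illustration only.
reports = (
    [{"location": "Region A", "disease": "cholera", "date": date(2008, 7, 28)}] * 5
    + [{"location": "Region B", "disease": "measles", "date": date(2008, 7, 27)}]
    + [{"location": "Region B", "disease": "measles", "date": date(2008, 6, 1)}]
)

print(colour_alerts(reports, today=date(2008, 7, 29)))
```

In this toy data, five recent cholera reports make Region A red, while Region B has only one measles report inside the window and stays yellow.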
A few clicks can provide a crash course on a disease of interest, via
sites such as Wikipedia, Google Trends and PubMed. “Situational
awareness windows” — pop-ups that appear when a particular state or city
on the interactive map is highlighted — provide links to all the news
reports on an outbreak in the area.
Brownstein and Freifeld are continuing to tinker with machine
learning tools to help HealthMap avoid false alarms, so that, for
instance, the program doesn’t mistake information on a herpes-infected
horse named Antarctica for an actual herpes outbreak on Earth’s
southernmost continent; it also understands that the word 'fever' in the
phrase “football fever in the UK” isn’t related to a disease.
“Think of it as creating our version of a spam filter that remembers
bad emails. It’s a continuous process,” Freifeld says. “There are many
thousands of health-related reports on the Web — publications of
scientific results, changes in health policy, among others. In
generating alerts, HealthMap needs to be able to separate these from
reports of actual outbreaks.”
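Freifeld’s spam-filter comparison can be made concrete with a small sketch. The article does not say which machine learning technique HealthMap uses; the example below swaps in naive Bayes, a method long used in spam filters, and every class name, label and training sentence in it is a hypothetical stand-in.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    """Tiny word-count classifier; an illustration, not HealthMap's method."""

    def __init__(self, labels):
        self.word_counts = {label: Counter() for label in labels}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def classify(self, text):
        # Assumes every label has at least one training example.
        vocab = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                # Add-one smoothing so unseen words do not zero out a label.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

# Made-up training examples, loosely echoing the cases in the article.
clf = NaiveBayes(["outbreak", "noise"])
clf.train("cholera outbreak reported in the region", "outbreak")
clf.train("officials confirm new cases of dengue fever", "outbreak")
clf.train("hospital reports cluster of measles cases", "outbreak")
clf.train("football fever grips fans ahead of the final", "noise")
clf.train("horse named Antarctica wins the race", "noise")
clf.train("new health policy announced by ministry", "noise")

print(clf.classify("football fever ahead of the match"))  # noise
print(clf.classify("new cases of cholera reported"))      # outbreak
```

Like a spam filter, a classifier of this kind improves as more labelled examples are added, which fits Freifeld’s description of a continuous process.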
The researchers are also working to validate news reports as a
legitimate index of disease. “News feeds pick up stuff we’re not getting
from other sources, and these reports tend to be true,” Brownstein
explains. “We know, for instance, that 60% of outbreaks investigated by
the WHO come from news sources. So these are critical sources, but we
need to quantify how good news feeds are at early reporting of diseases,
and their reliability.”
Ultimately, HealthMap demonstrates that low-cost, real-time Internet
data-mining can be combined with openly available, user-friendly
technologies, ensuring that everyone, not just the public health
community, can participate in global disease surveillance. Best of all,
it’s free, an aspect Brownstein and Freifeld intend to preserve.
HealthMap can be found at:
www.healthmap.org