In December 2006 the BMJ published a paper describing a small-scale quantitative study into the use of a general-purpose web search engine (Google) as a diagnostic aid.
The basis for the study was a set of diagnostic problem cases posed in the New England Journal of Medicine (NEJM). For each of the 26 cases, the investigators ran a web search query using a small number of terms representing symptoms of the case, chosen blind to the diagnosis, then inspected the returned documents to see if they contained the diagnosis term.
The paper provoked a huge response in professional discussion fora, blogs and newspapers. Respondents addressed various questions, most of them unproductive in my opinion.
Some professionals questioned whether PubMed, or search engines other than Google, might give better results. However, PubMed and web search engines are utterly different tools, in terms of purpose, content and processes, and rapid evolution is in the nature of search engines, so a comparative study performed today would have little value in six months’ time.
Many respondents focused on whether patients should be protected in some way from the perils of self-diagnosis using web search engines. Significant resources have been allocated to attempts to develop ‘trustmark’ labels for health websites.
However, patients have been practising self-diagnosis for millennia without any assistance from the web and it seems unlikely that any censorship of web information could keep pace with its expansion.
Future of diagnosis
I believe a more fertile question is this: given that GPs have easy access to computers and the web, why is searching the web for diagnostic help not already common practice in GP consultations? Indeed, failure to consult computer diagnostic aids might in the future be deemed perverse, if not negligent.
Web search engines automatically index documents that are placed on websites. In response to a user’s information need, typically expressed as a few words typed into a ‘query’ field, the engine derives a measure of the relevance of each document to the query and returns a subset of the most relevant documents.
Essentially there are three approaches to deriving relevance, based on meaning, word counting and recommendation.
Retrieval based on meaning is the most powerful approach.
If we asked a colleague to find documents about the side-effects of NSAIDs, we would expect them to recognise that a document containing the sentence: ‘Three of the patients receiving indomethacin developed gastritis’ is of interest to us, even though the document does not contain any of the words ‘side’, ‘effect’ or ‘NSAID’.
The emerging semantic web is designed to support this approach using specially constructed ontologies.
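As a rough illustration of the idea, the sketch below uses a tiny hand-built ontology to recognise that the indomethacin sentence is about NSAID side-effects. The ontology contents and the function names (`mentions_concept`, `is_relevant`) are hypothetical, invented for this example. A real semantic web ontology would be far larger, and would also encode relationships between concepts rather than simple term lists.

```python
# Hypothetical hand-built ontology: each concept maps to terms that express it.
# A real ontology would be much larger and curated by domain experts.
ONTOLOGY = {
    "nsaid": {"nsaid", "indomethacin", "ibuprofen", "naproxen"},
    "side effect": {"side effect", "adverse effect", "gastritis", "dyspepsia"},
}


def mentions_concept(document, concept):
    """Return True if the document contains any term that expresses the concept."""
    text = document.lower()
    return any(term in text for term in ONTOLOGY[concept])


def is_relevant(document, concepts):
    """Treat a document as relevant if it mentions every requested concept."""
    return all(mentions_concept(document, c) for c in concepts)


sentence = "Three of the patients receiving indomethacin developed gastritis"
print(is_relevant(sentence, ["nsaid", "side effect"]))  # True
```

The sentence matches even though it contains none of the words 'side', 'effect' or 'NSAID', because the match is made on concepts rather than on the literal query words. Treating any mention of gastritis as a side-effect is a simplification made for this sketch.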
Web search engines estimate relevance using statistical measures based on counting words or phrases.
This can work well for diagnostic challenges where the patient presents with a rare combination of features that are all expressible by unambiguous terms, but will be unhelpful where the patient presents with a common, non-specific symptom such as headache.
The diagnostic problems posed in the NEJM fall into the former category, but daily clinical practice is largely concerned with the latter, and this is why current web search engines are of limited use as a diagnostic aid.
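A minimal sketch may help show why word counting favours rare terms. It uses TF-IDF weighting, a textbook measure chosen here purely for illustration. Commercial search engines use more elaborate methods that are not described in this article. The three sample documents and the function name `score_documents` are invented for the example.

```python
import math
from collections import Counter


def score_documents(query, documents):
    """Score each document against the query using simple TF-IDF weighting."""
    tokenised = [doc.lower().split() for doc in documents]
    n = len(documents)

    # Document frequency: the number of documents in which each word appears.
    doc_freq = Counter()
    for tokens in tokenised:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if counts[term]:
                # Words found in few documents get a high weight;
                # a word found in every document gets log(1) = 0.
                score += counts[term] * math.log(n / doc_freq[term])
        scores.append(score)
    return scores


docs = [
    "headache with fever and neck stiffness",
    "tension headache advice",
    "headache and photophobia in migraine",
]

print(score_documents("headache", docs))
print(score_documents("neck stiffness photophobia", docs))
```

In this toy collection 'headache' appears in every document, so it carries no weight and the first query cannot separate the documents at all. The second query uses rarer terms and scores the first document highest.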
A third technique involves recommendation based on feedback. Suppose all GPs used a website that could accept and submit queries to a web search engine, but also store the queries and any feedback on the relevance of returned documents.
For example, Dr Smith submits the query ‘teenage headache male’, scans 10 returned documents and marks two as being useful.
Two months later Dr Jones submits the query ‘acute headache in teenager’. The system judges Dr Jones’ query to be similar to Dr Smith’s query and returns the two documents Dr Smith recommended, as well as searching for new documents based on content in the usual way.
Recommender systems can offer surprisingly good performance and be self-updating once a sufficient number of users access the system regularly.
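The Dr Smith and Dr Jones scenario can be sketched as follows. Everything in the sketch is an assumption made for illustration: the class name `FeedbackStore`, the document identifiers, the crude six-letter truncation used to match 'teenage' with 'teenager', the use of Jaccard overlap to compare queries, and the similarity threshold of 0.3. A production system would need proper stemming and a better-tested similarity measure.

```python
def stems(query):
    """Crude normalisation: lower-case each word and keep its first six letters."""
    return {word[:6] for word in query.lower().split()}


def jaccard(a, b):
    """Overlap between two sets: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class FeedbackStore:
    """Stores past queries with the documents their authors marked as useful."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.records = []  # list of (query stems, set of useful document ids)

    def record(self, query, useful_docs):
        """Save a query together with the documents marked as useful."""
        self.records.append((stems(query), set(useful_docs)))

    def recommend(self, query):
        """Return documents recommended for sufficiently similar past queries."""
        current = stems(query)
        recommended = set()
        for past, useful_docs in self.records:
            if jaccard(current, past) >= self.threshold:
                recommended |= useful_docs
        return sorted(recommended)


store = FeedbackStore()
# Dr Smith marks two of the returned documents as useful.
store.record("teenage headache male", ["doc3", "doc7"])
# Two months later, Dr Jones submits a similar query.
print(store.recommend("acute headache in teenager"))  # ['doc3', 'doc7']
```

The two queries share two of five distinct stems, giving a similarity of 0.4, which is above the threshold. In a full system these recommendations would be merged with the results of an ordinary content-based search.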
Web search engines may currently be useful as a diagnostic aid in some GP consultations, but only infrequently. However, web technology is changing very rapidly.
The emergence of the semantic web and of large-scale resources based on relevance feedback may mean that scanning reference information will come to be expected in most consultations, and possibly logged on record.
Mr Gardner is a research fellow at the department of computing science at the University of Glasgow