GPs question Babylon test that found AI 'on par with practising doctors'

GPs and other doctors have taken to Twitter to voice their concerns about health technology company Babylon's claims that its AI system can provide clinical advice to patients 'on par with practising clinicians'.

GPs have reacted online to AI claims (Photo: iStock)

On Wednesday Babylon presented claims that its AI system performs better than the average candidate in MRCGP exams, although it assessed the technology only on the diagnostic elements of the AKT and CSA exams.

On Thursday morning RCGP chair Professor Helen Stokes-Lampard tweeted:

Talking to BBC Radio 4’s Today programme, Professor Stokes-Lampard later said the college was worried about the hype around Babylon’s claims and called for them to be robustly and independently verified.

GPC prescribing lead Dr Andrew Green also questioned the evidence behind the claims, tweeting:

Meanwhile, Dr Zoe Norris, chair of the GPC’s sessional subcommittee, said: ‘I’d have liked to see a live test and not prepared scenarios to show if it really works’.

Glasgow GP Dr Margaret McCartney said: ‘[Babylon is] very good at getting publicity. This is not a good test of safety, accuracy, assessing impact on other areas in NHS. There is a huge need for better tech in the NHS. But NHS having to mop up costs of supply led demand - and this isn't sorting out basic issues.’

GP Dr Andrew Forder tweeted that the NHS itself should be developing and testing AI:

Many GPs pointed their followers to a detailed analysis of Babylon’s claims by Professor Enrico Coiera of the Australian Institute of Health Innovation, who is director of Australia’s Centre of Health Informatics.

He questioned the basis of the test, which used 100 clinical scenarios, or ‘vignettes’, developed by doctors, tweeting that the ‘encounter is not naturalistic. It doesn’t test Babylon in front of real patients’.

He also pointed out that some of the doctors playing the parts of patients in the vignettes were Babylon employees ‘so they might know how Babylon liked information to be presented and unintentionally advantaged it. Using independent actors, or ideally real patients, would have had more ecological validity.’

He concluded: 'The results are confounded by artificial conditions and use of few and non-independent assessors. So, it is fantastic that Babylon has undertaken this evaluation, and has sought to present it in public via this conference paper. They are to be applauded for that. One of the benefits of going public is that we can now provide feedback on the study's strength and weaknesses.’

You can read his full analysis of Babylon’s paper in this Twitter thread:

Babylon founder and CEO Dr Ali Parsa said the test had produced ‘landmark results’.

‘The results clearly illustrate how AI-augmented health services can reduce the burden on healthcare systems around the world. These landmark results take humanity a significant step closer to achieving a world where no-one is denied safe and accurate health advice,’ he said.

One Derbyshire GP highlighted another reason why AI could never match GPs' relationship with patients:
