Babylon's AI 'outperforms average doctor' in MRCGP exam

Artificial intelligence (AI) developed by technology company Babylon performs better than the average candidate in an MRCGP assessment, the company has said.


Additional tests had also shown that the AI system was able to provide health advice to patients ‘on par with practising clinicians’, Babylon claimed.

Babylon assessed its AI technology using elements from both the AKT and CSA MRCGP exams that related to diagnostics. The company took a representative sample of questions from publicly available sources, some of which had been published by the RCGP.

Presenting the results at the Royal College of Physicians in London on Wednesday evening, Babylon’s medical director Dr Mobasher Butt said that the average pass mark for both exams was 72% over a five-year period. Babylon’s AI system achieved a score of 81% on its first attempt, he said.

‘It takes most doctors around 12 years before they have the knowledge and skills to sit this exam and even then some fail on their first attempt and even second, or third attempt,’ Dr Butt said. ‘It’s taken our scientists two years to create AI that can achieve 81% on its first attempt.’

However, the RCGP said that the results of the experiment were ‘dubious’ and that no app or algorithm was ‘able to do what a GP does’.

RCGP vice chair Professor Martin Marshall said: ‘The exam-preparation materials, used by Babylon in this research, will have been compiled for revision purposes and are not necessarily representative of the full-range of questions and standard used in the actual MRCGP exam, so to say that Babylon’s algorithm has performed better than the average MRCGP candidate is dubious.’

Babylon also conducted a series of tests to assess how its AI performed when dealing with 'real-life clinical scenarios'.

Dr Megan Mahoney, chief of general primary care at Stanford University’s division of primary care and population health, and doctors from the Royal College of Physicians devised 100 scenarios or ‘vignettes’, which were then tested with the AI system and 12 experienced UK GPs who had no connection with Babylon. In the tests, patients were played by practising GPs, some of whom were Babylon employees.

The doctors’ correct diagnosis rate ranged from 64% to 94%, with an average of 80%. Babylon’s AI system also correctly diagnosed 80% of cases, Dr Butt said. When the results were assessed against conditions seen most often in primary care, the GPs' diagnostic accuracy ranged from 52% to 99% and Babylon's AI accuracy was 98%.

Dr Butt said the results showed that AI had the potential to 'significantly improve as it learns more about conditions it sees less frequently'.

The appropriateness of triage decisions was also independently assessed, although only by one doctor. GPs made the correct decision in 90.5% of cases, while the AI system did so in 90% of cases.

Babylon founder and chief executive Dr Ali Parsa said: ‘Even in the richest nations, primary care is becoming increasingly unaffordable and inconvenient, often with waiting times that make it not readily accessible. Babylon’s latest artificial intelligence capabilities show that it is possible for anyone, irrespective of their geography, wealth or circumstances, to have free access to health advice that is on par with top-rated practising clinicians. Tonight’s results clearly illustrate how AI-augmented health services can reduce the burden on healthcare systems around the world.’

Professor Marshall said that no app or algorithm could replace a GP. ‘An app might be able to pass an automated clinical knowledge test but the answer to a clinical scenario isn’t always cut and dried,’ he said.

‘The potential of technology to support doctors to deliver the best possible patient care is fantastic, but at the end of the day, computers are computers, and GPs are highly-trained medical professionals: the two can’t be compared and the former may support but will never replace the latter.’
