Additional tests had also shown that the AI system was able to provide health advice to patients ‘on par with practising clinicians’, Babylon claimed.
Babylon assessed its AI technology using elements from both the AKT and CSA MRCGP exams that related to diagnostics. The company took a representative sample of questions from publicly available sources, some of which had been published by the RCGP.
Presenting the results at the Royal College of Physicians in London on Wednesday evening, Babylon’s medical director Dr Mobasher Butt said that the average pass mark for both exams was 72% over a five-year period. Babylon’s AI system achieved a score of 81% on its first attempt, he said.
‘It takes most doctors around 12 years before they have the knowledge and skills to sit this exam and even then some fail on their first attempt and even second, or third attempt,’ Dr Butt said. ‘It’s taken our scientists two years to create AI that can achieve 81% on its first attempt.’
However, the RCGP said that the results of the experiment were ‘dubious’ and that no app or algorithm was ‘able to do what a GP does’.
RCGP vice chair Professor Martin Marshall said: ‘The exam-preparation materials, used by Babylon in this research, will have been compiled for revision purposes and are not necessarily representative of the full range of questions and standard used in the actual MRCGP exam, so to say that Babylon’s algorithm has performed better than the average MRCGP candidate is dubious.’
Babylon also conducted a series of tests to assess how its AI performed when dealing with ‘real-life clinical scenarios’.
Dr Megan Mahoney, chief of general primary care at Stanford University’s division of primary care and population health, and doctors from the Royal College of Physicians devised 100 scenarios or ‘vignettes’, which were then tested with the AI system and 12 experienced UK GPs who had no connection with Babylon. In the tests patients were played by practising GPs, some of whom were Babylon employees.
The doctors’ correct diagnosis rate ranged from 64% to 94%, with an average of 80%. Babylon’s AI system also correctly diagnosed 80% of cases, Dr Butt said. When the results were assessed against conditions seen most often in primary care, the GPs’ diagnostic accuracy ranged from 52% to 99% and Babylon’s AI accuracy was 98%.
Dr Butt said the results showed that AI had the potential to ‘significantly improve as it learns more about conditions it sees less frequently’.
The appropriateness of triage decisions was also independently assessed, although by only one doctor. GPs made the correct decision in 90.5% of cases, while the AI system did so in 90% of cases.
Babylon founder and chief executive Dr Ali Parsa said: ‘Even in the richest nations, primary care is becoming increasingly unaffordable and inconvenient, often with waiting times that make it not readily accessible. Babylon’s latest artificial intelligence capabilities show that it is possible for anyone, irrespective of their geography, wealth or circumstances, to have free access to health advice that is on par with top-rated practising clinicians. Tonight’s results clearly illustrate how AI-augmented health services can reduce the burden on healthcare systems around the world.’
Professor Marshall said that no app or algorithm could replace a GP. ‘An app might be able to pass an automated clinical knowledge test but the answer to a clinical scenario isn’t always cut and dried,’ he said.
‘The potential of technology to support doctors to deliver the best possible patient care is fantastic, but at the end of the day, computers are computers, and GPs are highly trained medical professionals: the two can’t be compared and the former may support but will never replace the latter.’