A performance evaluation of the expert system ANEMIA.
Quaglini, Silvana; Stefanelli, Mario
1988-01-01
Abstract
This paper reports the results of a study evaluating the current performance of ANEMIA, a knowledge-based consultation system that addresses the clinical problem of managing anemic patients. ANEMIA was developed on a mainframe using the AI programming scheme EXPERT and then translated into a version that runs on a personal computer. At present, the system can assist in the diagnosis and management of 65 disease entities. After extensive local testing of the accuracy, completeness, and consistency of ANEMIA's knowledge base, we designed a study to evaluate whether the system can also appropriately mirror the reasoning of well-known hematologists other than those who provided the knowledge. We were also interested in testing whether hematologists held conflicting opinions. To this end, the validation study compared ANEMIA's performance with that of six hematologists and evaluated the interexpert consensus. ANEMIA's overall performance was judged acceptable in 87% (26/30) of the cases, while the expert evaluators agreed with their colleagues in 90% (27/30) of them. Interexpert consensus was low: considering the ratings given by different hematologists to the same ANEMIA performance, complete agreement occurred only 47% of the time.
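The figures above are simple proportions (for example, 26/30 is about 86.7%, reported as 87%). The Python sketch below illustrates how an acceptability rate and a "complete agreement" rate could be computed. It is an illustration only, not the authors' procedure. The three-level rating scale, the number of raters per case, and the example ratings are hypothetical, because the abstract does not report the rating scale or the per-case ratings.

```python
from fractions import Fraction

def acceptance_rate(judgments):
    """Fraction of cases judged acceptable; `judgments` is a list of bools."""
    return Fraction(sum(judgments), len(judgments))

def complete_agreement_rate(ratings_per_case):
    """Fraction of cases in which all raters gave exactly the same rating."""
    agreed = sum(1 for ratings in ratings_per_case if len(set(ratings)) == 1)
    return Fraction(agreed, len(ratings_per_case))

def as_percent(fraction):
    """Express a Fraction as a whole-number percentage."""
    return round(100 * fraction)

# Counts reported in the abstract: 26 of 30 and 27 of 30 cases.
anemia_judgments = [True] * 26 + [False] * 4
print(as_percent(acceptance_rate(anemia_judgments)))  # 87
print(as_percent(Fraction(27, 30)))  # 90

# HYPOTHETICAL ratings (three raters, three-level scale), for illustration only;
# the study's actual rating scale and per-case ratings are not given in the abstract.
example_ratings = [
    ("acceptable", "acceptable", "acceptable"),
    ("acceptable", "marginal", "acceptable"),
    ("unacceptable", "unacceptable", "unacceptable"),
    ("marginal", "acceptable", "unacceptable"),
]
print(as_percent(complete_agreement_rate(example_ratings)))  # 50
```

"Complete agreement" is taken here to mean that every rater assigned the identical rating to a case. This is the strictest possible criterion, which helps explain why it can be low even when most individual judgments are favorable.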