Abstract
This review explores the advancements in artificial intelligence for radiograph fracture diagnosis, emphasizing technological developments and inherent limitations. Artificial intelligence improves diagnostic accuracy and workflow efficiency. The review categorizes artificial intelligence applications in fracture diagnosis into four primary tasks: recognition, classification, detection, and localization. The most popular performance metrics, such as diagnostic accuracy, precision, sensitivity, specificity, and area under the curve analysis, are explained and used as a guide for comparing artificial intelligence systems with traditional radiological methods. Each task and performance metric is illustrated with practical examples and success stories from recent literature, offering insights into the strengths and weaknesses of various artificial intelligence approaches, such as support vector machines, convolutional neural networks, and generative adversarial networks. We also incorporate case analyses, underscoring the potential and limitations of artificial intelligence in fracture detection. In particular, external factors such as casts and anatomical complexities posed challenges. Future directions are explored, emphasizing human-artificial intelligence collaboration and the development of more advanced, transparent artificial intelligence systems alongside evolving ethical considerations and regulatory frameworks. This review aims to equip clinicians with the knowledge to understand and utilize artificial intelligence technologies effectively in their practice.
Introduction
The accurate and timely diagnosis of fractures is a critical component of trauma care, yet it remains a significant challenge in clinical practice. Each year, millions of patients worldwide present with bone fractures, imposing a substantial burden on healthcare systems. In 2019 alone, the Global Burden of Disease Study estimated that 178 million new fractures occurred globally, and the prevalence of fractures was 455 million [1]. Traditional methods of fracture diagnosis, predominantly reliant on radiographic imaging and the expertise of radiologists, are prone to errors and inconsistencies. Misinterpretation of plain radiographs is a frequent cause of diagnostic errors and is common in emergency departments, where fractures represent up to 80% of diagnostic errors [2]. Studies indicate that approximately 3.1% of all fractures are not diagnosed at the initial visit and 86% of such errors have consequences for treatment [3]. Factors contributing to these errors include physician visual or decision fatigue and satisfaction of search bias [4]. Errors in fracture diagnosis peak between 8 pm and 2 am, highlighting the impact of diurnal variations on diagnostic accuracy [3]. The implications of these diagnostic errors are far-reaching, often leading to delayed treatment, suboptimal patient outcomes, and increased healthcare costs. For instance, misdiagnosed fractures can result in improper immobilization, delayed surgical interventions, and prolonged rehabilitation periods, thereby exacerbating patient morbidity [5]. Furthermore, the high incidence of diagnostic errors is compounded by the increasing workload among radiologists. Physician burnout is a significant issue in radiology, with one study indicating that 54–72% of radiologists reported symptoms of burnout [6]. Factors contributing to burnout include prolonged working hours (more than 80 h a week), working too hard, too fast, and too long, and emotional exhaustion with depersonalization [6].
The primary objective of this review is to provide a comprehensive overview of the current state of artificial intelligence (AI) applications in fracture diagnosis on radiographs, examining both the technological advancements and their limitations. Additionally, the review will present case reports demonstrating AI's effectiveness and limitations.
Artificial intelligence in radiology
Introduction to artificial intelligence
Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn [7]. The development of AI technologies can be traced back to the 1950s, with the term 'artificial intelligence' being coined by John McCarthy in 1956 [8]. AI encompasses a variety of techniques, including machine learning and deep learning, each with its unique approach to mimicking human cognitive functions [9]. In healthcare, AI has demonstrated remarkable capabilities in interpreting medical images, predicting disease outcomes, and personalizing treatment plans [10]. These technologies can analyze vast amounts of data from various sources to identify patterns, make predictions, and support clinical decision-making [11]. The potential of AI in healthcare is vast, offering tools for general health, chronic disease self-management, mental health, and diagnostic procedures [12]. In radiology, AI applications have shown significant promise. AI systems can automate routine tasks, such as image reconstruction and preliminary analysis, allowing radiologists to focus on more complex diagnostic challenges [13]. AI can enhance diagnostic accuracy by identifying subtle patterns in medical images that may be overlooked by the human eye [14]. Additionally, AI-driven tools can assist in triaging cases, prioritizing those that require immediate attention, thus improving workflow efficiency and patient outcomes [15]. Beyond fracture detection, AI has been applied in various radiological tasks, including tumor detection, organ segmentation, and disease classification [13]. For instance, AI algorithms have been developed to detect lung nodules in thoracic imaging, characterize microcalcifications in mammograms, and predict brain abnormalities [13]. Fazekas et al. (2022) presented a significant advancement in the field by demonstrating a three-dimensional segmentation of the liver from CT scans using AI.
This approach offers precise delineation of tumor boundaries, a critical factor in clinical decision-making and surgical planning. By challenging the traditional gold standard of manual segmentation, this method exemplifies how AI can enhance accuracy and efficiency in radiological practices [7].
These applications demonstrate the versatility and potential of AI to transform radiological practices, making them more accurate, efficient and accessible [13]. However, the integration of AI in healthcare is not without challenges. Issues related to data privacy, the need for robust regulatory frameworks, and the integration of AI systems into existing clinical workflows must be addressed to harness AI's full potential [16]. Continuous learning and adaptation of AI systems are essential to keep pace with evolving medical knowledge and practices [17]. Training AI models with diverse and representative datasets is crucial to ensure their accuracy and generalizability across different patient populations [17]. The successful implementation of AI in healthcare requires interdisciplinary collaboration among clinicians, data scientists, and regulatory bodies. Such collaboration ensures that AI technologies are developed, validated, and integrated in ways that enhance patient care and uphold ethical standards [18].
Machine learning
Machine learning (ML) is a subset of artificial intelligence that focuses on developing algorithms that allow computers to learn from and make decisions based on data. The primary goal of machine learning is to create models that can generalize from specific examples to make accurate predictions or decisions on new, unseen data [19]. Machine learning algorithms can be broadly categorized into three types: supervised learning, unsupervised learning, and reinforcement learning. In supervised learning, the algorithm is trained on a labeled dataset, where the input data is paired with the correct output [19]. For example, an AI model was trained on the Surveillance, Epidemiology, and End Results (SEER) program database to estimate the survival time of lung cancer patients with accuracy on par with that of classical methods [20]. Unsupervised learning deals with unlabeled data, aiming to identify hidden patterns or intrinsic structures within the data without explicit guidance [19]. Reinforcement learning involves training an agent to make a sequence of decisions by rewarding it for desirable actions and penalizing it for undesirable ones [21].
In radiology, machine learning has shown considerable promise. ML algorithms have been developed to detect abnormalities in medical images, such as fractures, tumors, and other pathologies, with high accuracy [22, 23]. These algorithms can assist radiologists by prioritizing cases that require immediate attention and prompt communication to referring physicians, thereby reducing the risk of communication failures [24]. Despite its potential, the application of machine learning in healthcare faces several challenges, including the need for large, high-quality datasets to train the models, the risk of overfitting, and the difficulty in interpreting the model's decisions [17, 25, 26]. Overfitting occurs when a machine learning model learns the details and noise in the training data to the extent that it negatively impacts the performance of the model on new data. This means that the model performs exceptionally well on the training data but fails to generalize to new, unseen data, leading to poor performance in practical applications. Addressing overfitting requires techniques such as regularization, early stopping, and using larger, more diverse training datasets [25].
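Early stopping, one of the remedies for overfitting mentioned above, can be made concrete with a small sketch. The snippet below (an illustration with hypothetical one-dimensional data, not taken from any cited study) fits a single-weight linear model by gradient descent on the training set and returns the weight that performed best on a held-out validation set:

```python
def early_stopping_fit(train, val, lr=0.01, max_epochs=1000, patience=10):
    """Fit y = w * x by gradient descent on the training set,
    halting when the validation loss stops improving (early stopping)."""
    w = 0.0
    best_w, best_val, wait = w, float("inf"), 0
    for _ in range(max_epochs):
        # one gradient step on the mean squared training error
        grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        w -= lr * grad
        val_loss = sum((w * x - y) ** 2 for x, y in val) / len(val)
        if val_loss < best_val:
            best_w, best_val, wait = w, val_loss, 0
        else:
            wait += 1
            if wait >= patience:  # no improvement for `patience` epochs: stop
                break
    return best_w, best_val
```

The training loss keeps shrinking as the weight converges toward the training optimum, but the returned weight is the one that generalized best to the validation data, which is exactly how early stopping limits overfitting in practice.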
Deep learning
Deep learning (DL) is a specialized subset of machine learning that utilizes artificial neural networks with multiple layers (hence “deep” learning) to model complex patterns in data [27]. Deep learning algorithms have been particularly successful in image and speech recognition tasks, making them highly relevant for applications in radiology [27]. Artificial neural networks are inspired by the structure and function of the human brain. These networks consist of interconnected nodes (neurons) organized into layers. Each connection between neurons has an associated weight that is adjusted during training to minimize the error in predictions [28]. The multilayer architecture of deep learning models allows them to learn hierarchical representations of data, where higher layers capture increasingly abstract features [29]. Convolutional neural networks (CNNs) are a type of deep learning model particularly well-suited for image analysis [30]. CNNs use convolutional layers to automatically detect and learn spatial hierarchies in images, making them effective for tasks such as identifying fractures in X-ray images [30]. For example, CNNs have been used to detect fractures in wrist X-rays with performance comparable to that of expert radiologists [31]. Another type of deep learning model, recurrent neural networks (RNNs), is designed for sequential data and has been used in natural language processing and time-series analysis [32]. While RNNs are less commonly applied in radiology, they hold the potential for integrating sequential medical data, such as imaging studies over time, to improve diagnostic accuracy [33].
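The convolution operation at the heart of a CNN can be sketched in a few lines. The example below is a deliberately minimal, hypothetical illustration (a single hand-written kernel, not a trained network): it slides a small kernel over a tiny image, and with an edge-detecting kernel the output responds strongly where intensity changes abruptly between columns, loosely analogous to how early convolutional layers respond to fracture lines:

```python
def conv2d(image, kernel):
    """Valid-mode 2-D cross-correlation: the core operation of a convolutional layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

# A vertical-edge kernel: responds strongly where intensity jumps between columns
edge_kernel = [[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]]
```

In a real CNN the kernel weights are not hand-chosen but learned during training, and many such layers are stacked so that higher layers capture the increasingly abstract features described above.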
The success of deep learning in radiology is attributed to its ability to automatically learn relevant features from raw data, reducing the need for manual feature engineering [34]. This capability is particularly valuable in radiology, where subtle patterns in medical images can be critical for accurate diagnosis [34]. For instance, deep learning models have been used to identify malignant lesions in mammograms, detect diabetic retinopathy in retinal images, and classify lung nodules in CT scans with high accuracy [35–37]. Despite its promise, deep learning in radiology faces several challenges. Training deep learning models requires large annotated datasets, which can be difficult to obtain in medical imaging due to privacy concerns and the need for expert annotations [38]. Additionally, deep learning models are often considered “black boxes” because their decision-making processes are not easily interpretable, raising concerns about trust and accountability in clinical settings [39]. Addressing these challenges involves developing methods for model interpretability, ensuring data privacy, and creating standardized protocols for training and validating deep learning models in radiology [26, 39].
AI technologies used in fracture diagnosis
AI technologies overview: different AI approaches and their foundations
The application of AI in fracture diagnosis on radiographs encompasses four primary tasks: recognition, classification, detection, and localization. Various AI approaches, including ML and DL techniques, are utilized in these tasks, each with distinct strengths and weaknesses.
Recognition involves deciding whether an object is the target or not, i.e., identifying whether an X-ray image is fractured or non-fractured [40]. Machine learning models like decision trees are effective for binary classification tasks, providing clear decision paths beneficial for medical diagnostics. Studies have shown that decision trees can effectively recognize fractures by analyzing patterns in radiographic data, as Wint Wah et al. demonstrated by fully automatically recognizing tibia bone fractures [41]. In deep learning, Fully-Connected Neural Networks (FCNNs) leverage large datasets to learn and generalize, demonstrating high accuracy and reliability in differentiating between fractured and non-fractured images [42]. Artificial Neural Networks (ANNs), including FCNNs, are composed of an input layer, one or more hidden layers, and an output layer [42]. Yang & Cheng demonstrated that such a network can require less training data than CNNs by using the detected bone contours of long bones as network inputs, while the surrounding flesh contours were removed by automatically classifying them as non-fractures [42].
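As a toy illustration of recognition as a binary decision, the snippet below (with hypothetical feature values, not data from the cited studies) fits the simplest possible decision tree, a one-node "decision stump", choosing the feature threshold that best separates fractured from non-fractured training images:

```python
def fit_stump(values, labels):
    """One-node decision tree ("decision stump"): choose the feature threshold
    that misclassifies the fewest training images. Label 1 = fracture."""
    best_errors, best_t = float("inf"), None
    for t in sorted(set(values)):
        # predict "fracture" when the feature value exceeds the threshold t
        errors = sum((v > t) != bool(y) for v, y in zip(values, labels))
        if errors < best_errors:
            best_errors, best_t = errors, t
    return best_t
```

A full decision tree recursively applies this threshold search to split the data further, which is what yields the clear, traceable decision paths valued in medical diagnostics.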
Classification involves assigning the object to a specific category, i.e., not only recognizing fractures but also categorizing the fracture types, for example, "normal", "transverse", "oblique", and "comminuted" [40, 41]. So-called "random forests", an ensemble learning method, improve classification accuracy and robustness by combining multiple decision trees. Ngan et al. applied these models to classify the treatment diagnosis of Colles' fractures of the wrist by indications of swelling, osteoporosis, and texture at the radial bone, with an average prediction accuracy of 91.0% [43]. Support Vector Machines (SVMs) are a type of supervised machine learning algorithm that works by finding the hyperplane that best separates the data into different classes, making them well-suited for classification tasks, particularly in complex cases like spine fractures [44]. Mehta et al. used an SVM classifier to identify incidental lumbar spine fractures that were missed by radiologists on routine Dual-Energy X-Ray Absorptiometry (DEXA) studies [45]. In deep learning, Generative Adversarial Networks (GANs) enhance training datasets through data augmentation, generating synthetic images to improve classification robustness and accuracy, especially in scenarios with limited annotated medical images [46]. Mutasa et al. successfully utilized such networks to classify femoral neck fractures as Garden I/II, Garden III/IV, or no fracture, and dataset augmentation proved to increase accuracy [46].
Detection involves finding the target position using bounding boxes, i.e., surrounding the fracture with a box and identifying the fractured bone parts [40]. Deep learning models based on convolutional neural networks (CNNs) have shown remarkable success in detecting fractures. Notably, models such as Region Convolutional Neural Network (R-CNN), You Only Look Once (YOLO), and RetinaNet excel in these tasks. R-CNN is a well-known classical deep learning detector, proposed in 2014 by Girshick et al., that consists of two stages [47]. In its first stage, CNNs generate region proposals. The second stage uses a Support Vector Machine (SVM) for classification and refines the bounding boxes [47]. While R-CNN is highly accurate, it is computationally intensive and relatively slow, making it less suitable for real-time applications. YOLO is a series of one-stage detection algorithms first proposed by Redmon et al. in 2016 [48]. Unlike traditional two-stage detectors, which solve object detection through a proposal stage followed by classification, YOLO reframes the detection task as a single regression problem. It directly predicts the bounding boxes and class probabilities for objects within an image in one evaluation [48]. This innovative approach allows YOLO to achieve real-time detection speeds while maintaining good accuracy, making it particularly suitable for the rapid identification of fractures. While YOLO is faster, it is generally less accurate than two-stage detectors like R-CNN, especially on smaller objects [49]. RetinaNet, also a one-stage detection algorithm, addresses the class imbalance often encountered in object detection tasks [50]. Class imbalance occurs when there are significantly more instances of some classes than others in the training data, which can cause the model to perform poorly on the less frequent classes. Lin et al. introduced in 2017 a focal loss function that prioritizes hard-to-classify examples, thereby improving the model's performance on datasets with a high variance in object sizes and frequencies [50]. This makes RetinaNet more effective at detecting fractures that are less common or harder to identify. RetinaNet achieves performance approaching that of two-stage detectors like R-CNN while maintaining the speed advantage of a one-stage approach [50]. Krogue et al. achieved expert-level accuracy compared with human observers by utilizing RetinaNet for fully automated detection of hip fractures on pelvic radiographs [51]. This makes RetinaNet a strong contender when both accuracy and speed are important considerations.
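The focal loss used by RetinaNet can be written as FL(p_t) = −α_t (1 − p_t)^γ log(p_t), where p_t is the probability the model assigns to the true class. A minimal single-prediction sketch, using the defaults α = 0.25 and γ = 2 reported by Lin et al.:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Focal loss for a single prediction.
    p: predicted probability of a fracture, y: true label (1 = fracture)."""
    p_t = p if y == 1 else 1 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1 - alpha  # class-balancing weight
    return -alpha_t * (1 - p_t) ** gamma * math.log(p_t)
```

Because (1 − p_t)^γ is near zero for well-classified examples, the abundant easy negatives contribute almost nothing, and training focuses on the rare, hard cases; with γ = 0 the expression reduces to ordinary weighted cross-entropy.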
Localization involves specifying the location information of the target, i.e., directly identifying the fracture position with key points, lines, or heat maps instead of bounding boxes [40]. U-Net is a method for semantic segmentation employing a symmetric, U-shaped convolutional network structure, which classifies each pixel of the input image as either object or background to predict the border pixels [52]. Lindsey et al. developed an extension of U-Net to localize wrist fractures on radiographs, displayed as heat maps, with a diagnostic accuracy similar to that of senior subspecialized orthopedic surgeons [53]. Another deep learning method, Gradient-weighted Class Activation Mapping (Grad-CAM), visualizes deep learning algorithms, enhancing CNN interpretability by highlighting the image regions most relevant for a prediction. However, Grad-CAM first needs to obtain feature maps through an initial detection with a CNN before visualizing, for example, bone fractures on images [54]. Yoon et al. were able to use Grad-CAM to localize occult scaphoid fractures on radiographs that might not be visible to human observers [55].
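The per-pixel classification idea behind semantic segmentation can be sketched in drastically simplified form. The snippet below is a toy stand-in for U-Net's learned decision (a fixed intensity threshold on hypothetical data, not a real segmentation network), together with the pixel accuracy commonly used to score such masks:

```python
def segment(image, threshold=0.5):
    """Toy per-pixel classifier: mark a pixel as object (1) when its
    intensity exceeds the threshold, otherwise background (0)."""
    return [[int(px > threshold) for px in row] for row in image]

def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the ground-truth mask."""
    pairs = [(p, t) for pr, tr in zip(pred, truth) for p, t in zip(pr, tr)]
    return sum(p == t for p, t in pairs) / len(pairs)
```

U-Net replaces the fixed threshold with a deep, learned decision per pixel, but the output has the same shape: a mask over the image from which heat maps and border predictions are derived.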
Studies on AI performance metrics
The performance of AI systems in fracture detection has been extensively studied and compared to traditional radiological methods. This comparative analysis evaluates the most common performance metrics, including diagnostic accuracy, precision, sensitivity, specificity, area under the curve (AUC), and pixel accuracy. These metrics provide a comprehensive understanding of AI systems' effectiveness in fracture diagnosis relative to human radiologists.
Diagnostic accuracy refers to the proportion of true results (both true positives and true negatives) among the total number of cases examined. It measures the overall correctness of the AI system in identifying fractures [56]. Studies have shown that AI algorithms can achieve diagnostic accuracy comparable to, and sometimes exceeding, that of expert radiologists. For instance, a study by Beyaz et al. demonstrated that a deep-learning ensemble method could detect hip fractures on plain anteroposterior (AP) pelvic radiographs with an accuracy of 97.1% [57].
Precision, also known as a positive predictive value, is the proportion of true positive results among all positive results predicted by the AI system. High precision indicates that the AI system has a low false positive rate [56]. A study by Fukuda et al. reported that convolutional neural networks achieved a high precision of 93% in detecting vertical root fractures on panoramic radiography [58].
Sensitivity, or true positive rate, is the proportion of actual positives that are correctly identified by the AI system. It measures the AI's ability to correctly identify fractures [56]. Duron et al. showed that AI assistance improved physicians' sensitivity by 8.7% in detecting appendicular skeletal fractures on radiographs, indicating AI systems' effectiveness in correctly identifying most fractures [2].
Specificity, or true negative rate, is the proportion of actual negatives that are correctly identified by the AI system. It measures the AI's ability to correctly identify non-fractured cases [56]. Lindsey et al. reported a specificity for emergency medicine clinicians of 87.5% unaided and 93.9% with the assistance of the deep learning model in detecting fractures in wrist radiographs [53]. The Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve is a measure of the AI system's ability to distinguish between fractured and non-fractured cases. The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1-specificity) at various threshold settings. A higher AUC indicates better overall performance of the model, with an AUC of 1.0 representing perfect accuracy [59]. For example, Cheng et al. were able to identify hip fractures on plain frontal pelvic radiographs using a deep convolutional neural network with an AUC of 0.98, indicating high diagnostic performance [60].
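All four of these metrics follow directly from the confusion matrix, and the AUC has an equivalent rank interpretation: the probability that a randomly chosen fractured case receives a higher score than a randomly chosen non-fractured one. The snippet below computes them for hypothetical counts and scores (not taken from any cited study):

```python
def confusion_metrics(tp, fp, tn, fn):
    """The four metrics defined above, computed from confusion-matrix counts."""
    return {
        "accuracy":    (tp + tn) / (tp + fp + tn + fn),
        "precision":   tp / (tp + fp),  # positive predictive value
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }

def auc(scores, labels):
    """AUC via its rank interpretation: the probability that a randomly chosen
    positive case is scored higher than a randomly chosen negative case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For example, with 90 true positives, 5 false positives, 85 true negatives, and 10 false negatives, sensitivity is 0.90 and specificity is about 0.94, illustrating how the same model can score differently on each metric.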
Comparative analysis: AI systems vs. traditional radiological methods
This review includes case analyses from the Medical Imaging Centre, Semmelweis University, Budapest, utilizing a commercially available deep convolutional neural network based on the object detection framework "Detectron 2", which works as a two-stage object detector [61] (Figs 1–4).
Simple Fracture: Fracture detection on X-ray images of a 39-year-old male patient with a simple fracture of the 2nd distal phalanx (right hand). The patient presented with pain and limited motion in the right index finger after an incident at work where his finger was caught between iron rods, reporting significant swelling and tenderness at the proximal interphalangeal joint. The radiographic examination included anteroposterior and lateral views of the right index finger. The X-rays revealed a simple fracture of the distal phalanx of the right index finger along the ulnar side, with no significant displacement. The finger was immobilized with a splint for three weeks, with follow-up recommended to assess healing. a Anteroposterior X-ray image showing the fracture line at the distal phalanx of the index finger (arrow). b Lateral X-ray image confirming the fracture on the distal phalanx (arrow). c AI analysis correctly identifies the fracture, matching the radiologist's diagnosis. AP: anteroposterior
Source: Medical Imaging Centre, Semmelweis University, Budapest.
Citation: Imaging 2025; 10.1556/1647.2025.00277
Complex Fracture: Fracture detection on X-ray images of a 28-year-old female patient with complex fractures of the right tibia and fibula. The patient fell from a height of approximately 3 m at her workplace, resulting in trauma to the right lower leg and chest. Clinical examination revealed deformity and significant swelling in the right lower leg. The radiographic examination included two views of the right lower leg. The X-rays revealed a complex fracture of the tibia and fibula with noticeable displacement. The tibial fracture was characterized by a spiral pattern in the mid-diaphysis, while the fibular fracture, located in the proximal third, also displayed significant displacement. The patient underwent intramedullary nailing of the tibia to stabilize the fracture, followed by external splinting. Post-operative care included pain management and physiotherapy. a1, a2 Anteroposterior X-ray images showing the complex fractures of the tibia and fibula. b1, b2 Lateral X-ray images confirming the fracture complexity and displacement. c1–4 AI analysis accurately identifies the fractures, matching the radiologist's diagnosis
Source: Medical Imaging Centre, Semmelweis University, Budapest.
Citation: Imaging 2025; 10.1556/1647.2025.00277
Hairline Fracture: Fracture detection on X-ray images of a 26-year-old male patient with a non-displaced hairline fracture of the left scaphoid bone, detected through a cast. The patient presented with pain and difficulty moving the left wrist after a bicycle accident. The radiographic examination included multiple views of the left wrist. The X-rays revealed a hairline fracture of the scaphoid bone without displacement. The wrist was immobilized with a splint, and follow-up was recommended to monitor healing. a Posteroanterior X-ray image showing the hairline fracture between the middle and distal third of the scaphoid bone (arrow). b Lateral X-ray image confirming the fracture of the scaphoid bone (arrow). c Oblique X-ray image providing an additional perspective on the scaphoid fracture (arrow). d AI analysis accurately identified the hairline fracture, matching the radiologist's diagnosis
Source: Medical Imaging Centre, Semmelweis University, Budapest.
Citation: Imaging 2025; 10.1556/1647.2025.00277
Post-operation: Fracture detection on post-operative X-ray images of an 84-year-old male patient with a comminuted pertrochanteric femoral fracture (right side) treated with internal fixation. The patient sustained a comminuted femoral fracture, which was stabilized using internal fixation devices. Post-operative care included monitoring the alignment and stabilization of the fracture through X-rays. The comminuted nature of the fracture indicated multiple fragments, necessitating precise hardware placement to ensure proper healing. The images demonstrate the state of the fracture and the positioning of the internal fixation devices. The X-rays reveal that the fracture fragments are in good alignment and the fixation hardware is correctly positioned, confirming successful stabilization. a Anteroposterior X-ray image showing the pertrochanteric femoral fracture with internal fixation (arrow). b The AI system correctly identified the comminuted femoral fracture despite the presence of internal fixation hardware, matching the radiologist's diagnosis
Source: Medical Imaging Centre, Semmelweis University, Budapest.
Citation: Imaging 2025; 10.1556/1647.2025.00277
These analyses serve as illustrative examples demonstrating the effectiveness of AI systems in fracture diagnosis. While these cases highlight the potential of AI systems, they are not part of new research conducted by the authors but rather examples of AI's application. Integrating AI into clinical practice promises to improve fracture diagnosis and overall patient care.
Limitations and challenges
AI-aided fracture detection has many limitations. Sometimes it is necessary to combine data from several imaging modalities, such as MRI, CT scan, and X-ray, to diagnose bone fractures. To provide a thorough diagnosis, radiologists are educated to examine and correlate results from several imaging modalities. Single-modality-focused AI algorithms might not be able to efficiently combine data from many imaging sources.
Concerns about liability also exist over who would bear responsibility in the event that an algorithm malfunctions and causes harm. The General Data Protection Regulation (GDPR), which the European Union adopted in response to such concerns, stipulates that AI algorithmic judgments regarding humans must be comprehensible and explicable. The absence of trustworthy ground truth labels for AI algorithm training is another drawback. The majority of research employed datasets with ground truth labels derived from official radiologist reports obtained from the patient's medical file, which are prone to mistakes and misinterpretations. Improved ground truth labeling, such as surgical findings or more advanced imaging, may lead to more accurate AI algorithms. In addition, if an algorithm is trained on such data, it may generate less accurate predictions for racial minorities or any other group that is underrepresented in the dataset. This is similar to the bias present in human judgment, which results from a person's prior experiences and may lead to inappropriate healthcare decisions. However, the repercussions for patient care and safety can be significantly worse when a biased AI system is extensively adopted and utilized concurrently by many doctors. The AI models currently in use to diagnose bone fractures might not be able to recognize other associated abnormalities, such as cancers, infections, metabolic disorders, or inflammations, which could coexist with or explain a patient's symptoms [62].
Consequently, if another doctor does not examine the X-rays, the AI system can produce a "fracture not found" result, leading to missed findings. Such problems can have a major adverse impact on patients' healthcare outcomes [63]. Therefore, while utilizing AI for fracture identification, medical supervision is always required. Furthermore, AI systems typically do not recognize the concept of "I do not know" or estimate uncertainty, which might lead to erroneous clinical choices and a false sense of assurance. When making healthcare decisions, it is essential to be aware of this limitation, carefully evaluate the outcomes generated by AI systems, take uncertainty into account, and consult expert medical judgment [62]. It is remarkable how well AI can now detect bone fractures, especially in areas with minimal resources and high patient traffic [64]. However, because of its limitations, it cannot fully replace orthopedic doctors.
Because of the interpretation skills, handling of ambiguity, legal and ethical issues, patient-clinician contact, adaptability, bias, and trust involved, AI can improve radiology but cannot completely replace radiologists. Rather, a cooperative future is envisaged in which AI complements radiologists' knowledge and helps with radiographic analysis. Preprocessing and segmenting images, identifying anomalies, offering automatic second opinions, ranking cases, streamlining processes, and assisting with resource distribution are all possible with AI. Despite its promise, AI is as yet unable to match radiologists' sophisticated decision-making. There is also a chance that a cybersecurity breach might undermine the system [65].
Various factors can potentially disrupt AI analysis and lead to errors in fracture detection. In our preliminary research, several instances highlighted how external factors might influence AI interpretation, resulting in false positives and false negatives.
One notable issue is the presence of external stabilizers like casts and splints. For example, a 54-year-old female patient (Fig. 5) with confirmed fractures of the right radius and ulna had an AI analysis that mistakenly identified the cast material as a suspected third fracture, demonstrating how external materials can lead to diagnostic errors.
False positive case: The 54-year-old female patient sustained fractures to the distal ends of the right radius and ulna after falling from a height of approximately 1.5 m. She presented with a deformed lower third of the right forearm and wrist, which were swollen and tender. Initial treatment involved immobilization with a cast. However, due to the complex nature of the fractures, which included significant displacement, the patient was admitted for surgical intervention involving plate osteosynthesis. The radiographic examination included two views of the right wrist and forearm. The X-rays revealed a transverse fracture of both the distal radius and ulna with slight displacement. Post-operative care included monitoring for proper alignment and stabilization of the fracture using X-rays. a Anteroposterior X-ray image showing the fractures of the distal radius and ulna, along with the cast (arrow). b Lateral X-ray image confirming the alignment of the fractures post-operatively, with the cast in place (arrow). c AI analysis incorrectly identifies the cast as a suspected third fracture (arrow), illustrating a false positive result. The actual fractures of the radius and ulna are correctly identified, but the AI system misinterprets the cast material
Source: Medical Imaging Centre, Semmelweis University, Budapest.
Citation: Imaging 2025; 10.1556/1647.2025.00277
Additionally, AI systems can sometimes misinterpret anatomical features, leading to false negatives. For instance, in the case of a 23-year-old female patient (Fig. 6), the AI system failed to detect a fracture in the foot and instead implausibly identified pleural effusion, highlighting the inability of AI systems to recognize when they cannot provide an accurate diagnosis and the need for human oversight.
False negative case: Fracture detection on X-ray images of a 23-year-old female patient with a fracture of the 5th metatarsus (left foot). The patient presented with pain and reduced motion in the left foot after an injury during a vacation where she hit the outer part of her foot. The patient reported no other pain or complaints. The radiographic examination included two views of the left foot. The X-rays revealed a transverse fracture of the base of the 5th metatarsus without displacement. The foot was immobilized with a cast for three weeks, with follow-up recommended for cast removal and a control X-ray. a Anteroposterior X-ray image showing the fracture at the base of the 5th metatarsus (arrow). b Oblique X-ray image confirming the transverse fracture at the base of the 5th metatarsus (arrow). c AI analysis erroneously identified pleural effusion in the foot and failed to detect the fracture
Source: Medical Imaging Centre, Semmelweis University, Budapest.
Citation: Imaging 2025; 10.1556/1647.2025.00277
Moreover, complex anatomical presentations can also pose challenges for AI. A case involving an 83-year-old female patient (Fig. 7) with multiple vertebral fractures showed that the AI system missed some fractures, emphasizing the importance of a thorough human evaluation to ensure no critical findings are overlooked.
False negative case: Fracture detection on X-ray images of an 83-year-old female patient with multiple vertebral fractures. The patient presented with severe back pain following several falls at home over the past few weeks, reporting no head injuries or loss of consciousness. The radiographic examination included several views of the thoracolumbar spine and pelvis. The X-rays revealed compression fractures at the T12, L1, L3, and L4 vertebrae, with the L1 and L3 fractures initially missed by AI detection. The fractures were treated conservatively with pain management and the use of a spinal brace to aid in stabilization and healing. Follow-up is recommended, including regular imaging to monitor healing and further assessments to prevent future falls and fractures. a Anteroposterior X-ray image of the chest showing mediastinal enlargement, which is not directly related to vertebral fractures. b Anteroposterior X-ray image of the pelvis showing degenerative changes and compression fractures in the lumbar spine (arrows). The L1 and L3 fractures were not detected by AI. c Anteroposterior X-ray image showing compression fractures at T12 and L3 vertebrae (arrows). The L3 fracture was not detected by AI. d Lateral X-ray image of the lumbar spine confirming the presence of fractures at L1 and L3 vertebrae (arrows), with the L3 fracture being a fresh injury missed by AI. e, f, g Retrospective AI analysis missing the L1 and L3 fractures but identifying other vertebral changes. The AI detected fractures at T12 and L4, which matched part of the radiologist's diagnosis
Source: Medical Imaging Centre, Semmelweis University, Budapest.
Citation: Imaging 2025; 10.1556/1647.2025.00277
In terms of quantifying fractures, AI systems may also struggle. A case involving a 70-year-old female patient (Fig. 8) demonstrated this issue when the AI identified two fractures as one due to overlapping bounding boxes, illustrating the difficulty in accurately quantifying fractures in complex scenarios.
Misinterpretation of fracture numbers: The patient, a 70-year-old female, presented with pain and swelling in the left leg following an incident where dogs collided with her left knee. She reported significant swelling and tenderness around the knee area. The radiographic examination included two views of the left leg, revealing fractures of the proximal tibia and fibula. The patient's fractures were fixed with plates and screws for stabilization. Post-operative care included physiotherapy to enhance circulation, muscle strength, and range of motion. The patient was instructed to avoid weight-bearing on the affected leg and was provided with a walker for ambulation. a Anteroposterior X-ray image showing the fracture lines in the proximal tibia and fibula. b Lateral X-ray image confirming the fractures in the proximal tibia and fibula. c, d AI analysis indicates the fracture sites on the tibia and fibula, matching the radiologist's diagnosis. However, a single bounding box covers both fracture sites instead of a separate box for each fracture; therefore, the AI identified the two fractures as one fracture
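The overlapping-box failure in this case can be sketched with a minimal intersection-over-union (IoU) and non-maximum suppression (NMS) example. The coordinates, confidence scores, and overlap threshold below are hypothetical illustrations, not values from the AI system discussed here.

```python
def iou(a, b):
    # Boxes as (x1, y1, x2, y2); intersection area over union area.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thr=0.5):
    # Keep boxes in descending score order; drop any box whose IoU with
    # an already-kept box exceeds the threshold.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= thr for k in kept):
            kept.append(i)
    return kept

# Hypothetical detections: tibial and fibular fracture sites lying close together.
boxes = [(10, 10, 60, 60), (20, 20, 70, 70)]
scores = [0.9, 0.8]
print(nms(boxes, scores, thr=0.3))  # -> [0]: the second box is suppressed
```

With a strict threshold, two genuinely distinct fracture detections that happen to overlap are collapsed into a single box, mirroring the counting error seen in this case.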
Source: Medical Imaging Centre, Semmelweis University, Budapest.
Citation: Imaging 2025; 10.1556/1647.2025.00277
The abovementioned cases emphasize the need for continuous improvement in AI systems to reduce false positives and false negatives. Potential artifacts, such as atherosclerotic plaques, could also pose challenges for AI interpretation in the future. By understanding and addressing these limitations, AI can be better integrated into clinical workflows, enhancing its utility; integrating AI with expert human oversight remains essential to maintain the highest standards of patient care.
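The false positive and false negative errors illustrated by these cases feed directly into the standard diagnostic metrics used throughout this review (sensitivity, specificity, precision, accuracy). The confusion-matrix counts below are hypothetical, chosen only to show the arithmetic.

```python
def metrics(tp, fp, fn, tn):
    # Standard diagnostic metrics from a 2x2 confusion matrix.
    sensitivity = tp / (tp + fn)            # fractures found among true fractures
    specificity = tn / (tn + fp)            # normal studies correctly called normal
    precision = tp / (tp + fp)              # flagged findings that are real fractures
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return sensitivity, specificity, precision, accuracy

# Hypothetical counts echoing the case mix above:
# misses (FN, e.g. overlooked vertebral fractures) and artifacts (FP, e.g. casts).
sens, spec, prec, acc = metrics(tp=90, fp=5, fn=10, tn=95)
print(round(sens, 3), round(spec, 3), round(prec, 3), round(acc, 3))
```

A system that hallucinates findings on casts loses precision and specificity, while one that misses subtle compression fractures loses sensitivity; both error types must fall for overall accuracy to improve.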
Future directions
Reported diagnostic performance in fracture identification is comparable between artificial intelligence and physicians, indicating that AI has potential as a diagnostic adjunct in future clinical practice. The best course of action is therefore to use AI as a tool that supports human decision-making and improves healthcare delivery, while doctors remain essential to patient care and clinical decision-making.
Future advancements in AI for fracture diagnosis will likely include integration with multimodal imaging techniques, enhancing diagnostic accuracy by combining data from X-rays, CT scans, and MRI. Developing sophisticated AI algorithms that learn from smaller, more diverse datasets will mitigate limitations related to data availability and bias. Another important direction is improving AI interpretability and transparency: ensuring that AI systems can provide clear, understandable explanations for their diagnoses will help build trust among clinicians and patients. Advances in techniques such as Grad-CAM, which enhance the interpretability of deep learning models, are steps in the right direction. Ethical considerations and regulatory frameworks will also need to evolve in parallel with technological advancements. Ensuring patient data privacy, establishing robust validation protocols for AI systems, and addressing liability issues are all essential for the responsible integration of AI in healthcare. Policymakers, clinicians, and AI developers must collaborate to create guidelines that protect patient interests while fostering innovation.
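To illustrate the Grad-CAM idea referenced above: the heat map is a ReLU-rectified, gradient-weighted sum of a convolutional layer's feature maps. The sketch below uses small synthetic arrays in place of real network activations and gradients; a real system would obtain both from a backward pass through a trained model.

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations, gradients: arrays of shape (K, H, W) for one image.
    Channel weights (alpha_k) are the gradients global-average-pooled over
    space; the map is ReLU(sum_k alpha_k * A_k), normalized to [0, 1]."""
    alphas = gradients.mean(axis=(1, 2))             # (K,) per-channel weights
    cam = np.tensordot(alphas, activations, axes=1)  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0)                         # ReLU: keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                        # normalize for display
    return cam

# Synthetic 2-channel, 4x4 example standing in for a conv layer's output.
rng = np.random.default_rng(0)
A = rng.random((2, 4, 4))
G = rng.random((2, 4, 4))
heat = grad_cam(A, G)
print(heat.shape, float(heat.max()))  # (4, 4) 1.0
```

Overlaying such a map on the radiograph shows which regions drove the prediction, which is exactly the kind of explanation that could help a clinician spot a cast being mistaken for a fracture.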
Conclusion
The future of AI in fracture diagnosis is bright, with the potential to significantly enhance diagnostic accuracy, efficiency, and accessibility. By addressing current limitations and ensuring ethical implementation, AI can become a valuable tool in the radiologist's arsenal, ultimately improving patient outcomes and transforming healthcare delivery.
Author contributions
(1) conception and design, or acquisition of data, or analysis and interpretation of data: SSHB, AG, PH, NM.
(2) drafting the article or revising it critically for important intellectual content: SSHB, AG, PH, NM.
(3) final approval of the version to be published: SSHB, AG, PH, NM.
(4) agree to be accountable for all aspects of the work if questions arise related to its accuracy or integrity: SSHB, AG, PH, NM.
Conflict of interest
The authors declare no conflicts of interest.
Funding
Through Nikolett Marton's grants, the project was supported by the UNKP-23-5 New National Excellence Program of the Ministry for Culture and Innovation, from the source of the National Research, Development and Innovation Fund, and by the Bolyai Research Scholarship (Hungary).
Ethics
This work was carried out in accordance with the World Medical Association Declaration of Helsinki: ethical principles for medical research involving human subjects (JAMA 2000; 284(23): 3043–3045. PMID: 11122593). Approval from the Institutional Review Board was obtained and, in keeping with the policies for a retrospective review, informed consent was not required.
Acknowledgments
We are grateful to Viktor Gaál MD, PhD for his advice on improving the statistical analyses, and to Pál Maurovich-Horvat MD, PhD, MPH and Panna Szőllősi MD for their help with organization.
Abbreviations
AI | artificial intelligence |
ANN | artificial neural networks |
AP | anteroposterior |
AUC | area under the curve |
CNN | convolutional neural networks |
CT | computed tomography |
DEXA | dual-energy X-ray absorptiometry |
FDG-PET/CT | fluorodeoxyglucose positron emission tomography/computed tomography |
FCNN | fully-connected neural networks |
GAN | generative adversarial network |
GDPR | General Data Protection Regulation |
Grad-CAM | Gradient-weighted Class Activation Mapping |
ML | machine learning |
MRI | magnetic resonance imaging |
R-CNN | Region Convolutional Neural Network |
RNN | recurrent neural networks |
ROC | Receiver Operating Characteristic |
SEER | surveillance, epidemiology and end results |
SVM | Support Vector Machine |
YOLO | You Only Look Once |
References
- [1]↑
Global, regional, and national burden of bone fractures in 204 countries and territories, 1990–2019: a systematic analysis from the global burden of disease study 2019. Lancet Healthy Longev 2021; 2(9): e580–e592.
- [2]↑
Duron L, Ducarouge A, Gillibert A, Lainé J, Allouche C, Cherel N, et al.: Assessment of an AI aid in detection of adult appendicular skeletal fractures by emergency physicians and radiologists: a multicenter cross-sectional diagnostic study. Radiology 2021; 300(1): 120–129.
- [3]↑
Hallas P, Ellingsen T: Errors in fracture diagnoses in the emergency department--characteristics of patients and diurnal variation. BMC Emerg Med 2006; 6: 4.
- [4]↑
Guermazi A, Tannoury C, Kompel AJ, Murakami AM, Ducarouge A, Gillibert A, et al.: Improving radiographic fracture recognition performance and efficiency using artificial intelligence. Radiology 2022; 302(3): 627–636.
- [5]↑
Liu PR, Zhang JY, Xue MD, Duan YY, Hu JL, Liu SX, et al.: Artificial intelligence to diagnose tibial plateau fractures: an intelligent assistant for orthopedic physicians. Curr Med Sci 2021; 41(6): 1158–1164.
- [6]↑
Canon CL, Chick JFB, DeQuesada I, Gunderman RB, Hoven N, Prosper AE: Physician burnout in radiology: perspectives from the field. AJR Am J Roentgenol 2022; 218(2): 370–374.
- [7]↑
Fazekas S, Budai BK, Stollmayer R, Kaposi PN, Bérczi V: Artificial intelligence and neural networks in radiology – Basics that all radiology residents should know. Imaging 2022; 14(2): 73–81.
- [8]↑
Amisha, Malik P, Pathania M, Rathaur VK: Overview of artificial intelligence in medicine. J Family Med Primary Care 2019; 8(7): 2328–2331.
- [9]↑
Beam AL, Kohane IS: Big data and machine learning in health care. JAMA 2018; 319(13): 1317–1318.
- [10]↑
Bi WL, Hosny A, Schabath MB, Giger ML, Birkbak NJ, Mehrtash A, et al.: Artificial intelligence in cancer imaging: clinical challenges and applications. CA Cancer J Clin 2019; 69(2): 127–157.
- [11]↑
McKee M, Wouters OJ: The challenges of regulating artificial intelligence in healthcare comment on “clinical decision support and new regulatory frameworks for medical devices: are we ready for it? – A viewpoint paper”. Int J Health Policy Manag 2023; 12: 7261.
- [12]↑
Chew HSJ, Achananuparp P: Perceptions and needs of artificial intelligence in health care to increase adoption: scoping review. J Med Internet Res 2022; 24(1): e32939.
- [13]↑
Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL: Artificial intelligence in radiology. Nat Rev Cancer 2018; 18(8): 500–510.
- [14]↑
Goldenberg SL, Nir G, Salcudean SE: A new era: Artificial intelligence and machine learning in prostate cancer. Nat Rev Urol 2019; 16(7): 391–403.
- [15]↑
Annarumma M, Withey SJ, Bakewell RJ, Pesce E, Goh V, Montana G: Automated triaging of adult chest radiographs with deep artificial neural networks. Radiology 2019; 291(1): 196–202.
- [16]↑
Saraswat D, Bhattacharya P, Verma A, Prasad VK, Tanwar S, Sharma G, et al.: Explainable AI for healthcare 5.0: opportunities and challenges. IEEE Access 2022; 10: 84486–84517.
- [17]↑
Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D: Key challenges for delivering clinical impact with artificial intelligence. BMC Med 2019; 17(1): 195.
- [18]↑
Bobak CA, Svoboda M, Giffin KA, Wall DP, Moore J: Raising the stakeholders: improving patient outcomes through interprofessional collaborations in AI for healthcare. Biocomputing 2021. World Scientific, 2020, pp. 351–355.
- [19]↑
Jordan MI, Mitchell TM: Machine learning: trends, perspectives, and prospects. Science 2015; 349(6245): 255–260.
- [20]↑
Lynch CM, Abdollahi B, Fuqua JD, de Carlo AR, Bartholomai JA, Balgemann RN, et al.: Prediction of lung cancer patient survival via supervised machine learning classification techniques. Int J Med Inform 2017; 108: 1–8.
- [21]↑
Hua C: Reinforcement learning and feedback control. In: Reinforcement learning aided performance optimization of feedback control systems (ed: Hua C). Springer Fachmedien Wiesbaden, Wiesbaden, 2021, pp. 27–57.
- [22]↑
Bahya TM, Hussein N: Machine learning techniques to classify brain tumor. 2023 6th international conference on engineering technology and its applications (IICETA); 15–16 July 2023, pp. 609–614.
- [23]↑
Mall PK, Singh PK, Yadav D: GLCM based feature extraction and medical X-RAY image classification using machine learning techniques. 2019 IEEE conference on information and communication technology; 6–8 Dec. 2019, pp. 1–6.
- [24]↑
Meng X, Ganoe CH, Sieberg RT, Cheung YY, Hassanpour S: Assisting radiologists with reporting urgent findings to referring physicians: a machine learning approach to identify cases for prompt communication. J Biomed Inform 2019; 93: 103169.
- [25]↑
Aburass S: Quantifying overfitting: introducing the overfitting index. arXiv preprint arXiv:230808682. 2023.
- [26]↑
ElShawi R, Sherif Y, Al-Mallah M, Sakr S: Interpretability in healthcare: a comparative study of local machine learning interpretability techniques. Computational Intell 2021; 37(4): 1633–1650.
- [27]↑
Kim M, Yun J, Cho Y, Shin K, Jang R, Bae H-j, et al.: Deep learning in medical imaging. Neurospine 2019; 16(4): 657–668.
- [28]↑
Agatonovic-Kustrin S, Beresford R: Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research. J Pharm Biomed Anal 2000; 22(5): 717–727.
- [29]↑
Yu D, Deng L: Feature representation learning in deep neural networks. In: Automatic speech recognition: a deep learning approach (eds: Yu D, Deng L). Springer London, London, 2015, pp. 157–175.
- [30]↑
Yamashita R, Nishio M, Do RKG, Togashi K: Convolutional neural networks: An overview and application in radiology. Insights Into Imaging 2018; 9(4): 611–629.
- [31]↑
Thian YL, Li Y, Jagmohan P, Sia D, Chan VEY, Tan RT: Convolutional neural networks for automated fracture detection and localization on wrist radiographs. Radiology: Artificial Intelligence 2019; 1(1): e180001.
- [32]↑
Lipton ZC: A critical review of recurrent neural networks for sequence learning. ArXiv. 2015; abs/1506.00019.
- [33]↑
Vallathan G, Yanamadni VR, Vidhya RG, Ravuri A, Ambhika C, Sasank VVS: An analysis and study of brain cancer with RNN algorithm based AI technique. 2023 7th international conference on I-SMAC (IoT in social, mobile, analytics and cloud) (I-SMAC); 11–13 Oct. 2023, pp. 637–642.
- [34]↑
Lee SM, Seo JB, Yun J, Cho Y-H, Vogel-Claussen J, Schiebler ML, et al.: Deep learning applications in chest radiography and computed tomography: current state of the art. J Thoracic Imaging 2019; 34(2): 75–85.
- [35]↑
Ribli D, Horváth A, Unger Z, Pollner P, Csabai I: Detecting and classifying lesions in mammograms with Deep Learning. Sci Rep 2018; 8(1): 4165.
- [36]
Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, et al.: Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016; 316(22): 2402–2410.
- [37]
Tran GS, Nghiem TP, Nguyen VT, Luong CM, Burie J-C: Improving accuracy of lung nodule classification using deep learning with focal loss. J Healthcare Eng 2019; 2019(1): 5156416.
- [38]↑
Montagnon E, Cerny M, Cadrin-Chênevert A, Hamilton V, Derennes T, Ilinca A, et al.: Deep learning workflow in radiology: A primer. Insights Into Imaging 2020; 11(1): 22.
- [39]↑
Rudin C: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Machine Intell 2019; 1(5): 206–215.
- [40]↑
Hassaballah M, Awad AI (eds): Deep learning in computer vision: principles and applications (1st ed.). CRC Press, Boca Raton, 2020.
- [41]↑
Wint Wah M, Khin Sandar T, Hla Myo T: Analysis on leg bone fracture detection and classification using X-ray images. Machine Learn Res 2018; 3(3): 49–59.
- [42]↑
Yang AY, Cheng L: Long-bone fracture detection using artificial neural networks based on contour features of X-ray images. ArXiv. 2019; abs/1902.07897.
- [43]↑
Ngan KH, d’Avila Garcez A, Knapp KM, Appelboam A, Reyes-Aldasoro CC: A machine learning approach for Colles’ fracture treatment diagnosis. bioRxiv. 2020:2020.2002.2028.970574.
- [44]↑
Vishnu A, Narasimhan J, Holder LB, Kerbyson DJ, Hoisie A: Fast and accurate support vector machines on large scale systems. 2015 IEEE international conference on cluster computing, 2015, pp. 110–119.
- [45]↑
Mehta SD, Sebro R: Computer-aided detection of incidental lumbar spine fractures from routine dual-energy X-ray absorptiometry (DEXA) studies using a support vector machine (SVM) classifier. J Digit Imaging 2020; 33(1): 204–210.
- [46]↑
Mutasa S, Varada S, Goel A, Wong TT, Rasiej MJ: Advanced deep learning techniques applied to automated femoral neck fracture detection and classification. J Digit Imaging 2020; 33(5): 1209–1217.
- [47]↑
Girshick R, Donahue J, Darrell T, Malik J: Rich feature hierarchies for accurate object detection and semantic segmentation. 2014 IEEE conference on computer vision and pattern recognition; 23–28 June 2014, pp. 580–587.
- [48]↑
Redmon J, Divvala S, Girshick R, Farhadi A: You only look once: unified, real-time object detection. 2016 IEEE conference on computer vision and pattern recognition (CVPR); 27–30 June 2016, pp. 779–788.
- [49]↑
Jia Y, Wang H, Chen W, Wang Y, Yang B: An attention-based cascade R-CNN model for sternum fracture detection in X-ray images. CAAI Trans Intell Technol 2022; 7(4): 658–670.
- [50]↑
Lin TY, Goyal P, Girshick R, He K, Dollár P: Focal loss for dense object detection. 2017 IEEE international conference on computer vision (ICCV); 22–29 Oct. 2017, pp. 2999–3007.
- [51]↑
Krogue JD, Cheng KV, Hwang KM, Toogood P, Meinberg EG, Geiger EJ, et al.: Automatic hip fracture identification and functional subclassification with deep learning. Radiol Artif Intell 2020; 2(2): e190023.
- [52]↑
Ronneberger O, Fischer P, Brox T: U-net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF (eds): Medical image computing and computer-assisted intervention – MICCAI 2015. Springer International Publishing, Cham, 2015, pp. 234–241.
- [53]↑
Lindsey R, Daluiski A, Chopra S, Lachapelle A, Mozer M, Sicular S, et al.: Deep neural network improves fracture detection by clinicians. Proc Natl Acad Sci U S A 2018; 115(45): 11591–11596.
- [54]↑
Selvaraju RR, Das A, Vedantam R, Cogswell M, Parikh D, Batra D: Grad-CAM: why did you say that? Visual explanations from deep networks via gradient-based localization. ArXiv. 2016.
- [55]↑
Yoon AP, Lee Y-L, Kane RL, Kuo C-F, Lin C, Chung KC: Development and validation of a deep learning model using convolutional neural networks to identify scaphoid fractures in radiographs. JAMA Network Open 2021; 4(5): e216096–e216096.
- [56]↑
Ghaderzadeh M, Aria M, Hosseini A, Asadi F, Bashash D, Abolghasemi H: A fast and efficient CNN model for B-ALL diagnosis and its subtypes classification using peripheral blood smear images. Int J Intell Systems 2022; 37(8): 5113–5133.
- [57]↑
Beyaz S, Yaylı ŞB, Kılıç E, Doktur U: The ensemble artificial intelligence (AI) method: detection of hip fractures in AP pelvis plain radiographs by majority voting using a multi-center dataset. Digital Health 2023; 9.
- [58]↑
Fukuda M, Inamoto K, Shibata N, Ariji Y, Yanashita Y, Kutsuna S, et al.: Evaluation of an artificial intelligence system for detecting vertical root fracture on panoramic radiography. Oral Radiol 2020; 36(4): 337–343.
- [59]↑
Florkowski CM: Sensitivity, specificity, receiver-operating characteristic (ROC) curves and likelihood ratios: Communicating the performance of diagnostic tests. Clin Biochem Rev 2008; 29 Suppl 1(Suppl 1): S83–S87.
- [60]↑
Cheng C-T, Ho T-Y, Lee T-Y, Chang C-C, Chou C-C, Chen C-C, et al.: Application of a deep learning algorithm for detection and visualization of hip fractures on plain pelvic radiographs. European Radiol 2019; 29(10): 5469–5477.
- [61]↑
Altmann-Schneider I, Kellenberger CJ, Pistorius SM, Saladin C, Schäfer D, Arslan N, et al.: Artificial intelligence-based detection of paediatric appendicular skeletal fractures: Performance and limitations for common fracture types and locations. Pediatr Radiol 2024; 54(1): 136–145.
- [62]↑
Link TM, Pedoia V: Using AI to improve radiographic fracture detection. Radiology 2022; 302(3): 637–638.
- [63]↑
Depypere M, Morgenstern M, Kuehl R, Senneville E, Moriarty TF, Obremskey WT, et al.: Pathogenesis and management of fracture-related infection. Clin Microbiol Infect 2020; 26(5): 572–578.
- [64]↑
Wahl B, Cossy-Gantner A, Germann S, Schwalbe NR: Artificial intelligence (AI) and global health: How can AI contribute to health in resource-poor settings? BMJ Glob Health 2018; 3(4): e000798.
- [65]↑
Korot E, Wagner S, Faes L, Liu X, Huemer J, Ferraz D, et al.: Will AI replace ophthalmologists? Translational Vision Sci Technol 2020; 9: 2.