The rise of Transformer models has taken the machine learning world by storm, overshadowing many other techniques that have proven their worth over time.
While transformers are powerful, it’s important to recognize that they are not the one-size-fits-all solution for every problem.
Convolutional Neural Networks (CNNs), for instance, remain highly effective for tasks involving image and spatial data.
Consider a scenario where we need to classify medical X-ray images to determine whether a bone is broken. In such tasks, convolutional neural networks shine thanks to their ability to learn spatial hierarchies from images.
TensorFlow Hub provides access to a wide range of pre-trained models that can be easily integrated into your projects. Trained on large datasets, these models are optimized for various tasks, such as image classification, object detection, and natural language processing.
Using pre-trained models from TensorFlow Hub has several advantages:
- No cost: you can access and fine-tune powerful machine learning models without investing in the extensive computational resources needed to train them from scratch.
- Time savings: training deep learning models from scratch can take days or even weeks. With pre-trained models, you can skip this lengthy process and get started on your specific task immediately.
- Reliability: pre-trained models are built and fine-tuned by experts, so they offer high performance out of the box.
Here’s a detailed walkthrough for training a classification model using TensorFlow Hub.
Our goal is to classify X-ray images into two categories: broken and not broken. We’ll use the MURA (musculoskeletal radiographs) dataset, specifically focusing on wrist bone X-rays.
MURA is one of the largest public radiographic image datasets, with images manually labeled as normal or abnormal by board-certified radiologists. This dataset provides a robust foundation for training our model.
Link to dataset:
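Once the dataset is downloaded, a labeled `tf.data` pipeline can be built straight from a directory tree. The snippet below is a self-contained sketch: it fabricates a tiny dummy folder layout just to demonstrate the call (the `broken` / `not_broken` folder names are hypothetical; MURA ships with its own directory structure, which you would point `root` at instead):

```python
import pathlib
import tempfile

import numpy as np
import tensorflow as tf

# Fabricate a tiny stand-in dataset of random "images" so the sketch runs
# anywhere; with real MURA data, `root` is the downloaded image folder.
root = pathlib.Path(tempfile.mkdtemp())
for label in ("broken", "not_broken"):  # hypothetical class folders
    (root / label).mkdir(parents=True, exist_ok=True)
    for i in range(4):
        img = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
        tf.keras.utils.save_img(root / label / f"{i}.png", img)

# One call gives a batched, labeled dataset; label_mode="binary" yields 0/1
# labels inferred from the subdirectory names.
train_ds = tf.keras.utils.image_dataset_from_directory(
    root, image_size=(224, 224), batch_size=2, label_mode="binary",
)
```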
The transfer learning method cuts down on the computational resources and time required to train a high-performing model. Instead of training a model from scratch, we take a pre-trained CNN and adapt it to our specific task by freezing the convolutional layers (which have already learned to detect edges, textures, and shapes) and adding new, task-specific layers on top.
This way, the model retains its powerful feature extraction capabilities while being fine-tuned to recognize the specific patterns associated with broken bones.
For this task, we will use the VGG-16
model, a classic among pre-trained models. It was trained on ImageNet (the ILSVRC subset of roughly 1.2 million labeled images across 1,000 classes), making it a robust feature extractor that we can further adapt.
The VGG-16 model consists of 16 weight layers: 13 convolutional layers followed by 3 fully connected (FC) layers.
Initial Training
In the initial phase, all of the pre-trained layers are frozen, meaning their weights won't be updated. Training focuses only on the new layers added on top of the pre-trained base.
The layers added:
The flatten layer
converts the 3D feature maps to a 1D vector, followed by a dense, fully connected layer for classification.
The dropout layer
helps with regularization by randomly setting input units to 0 during training, which prevents overfitting.
Early stopping
monitors the validation loss and stops training if the loss doesn't improve for 10 epochs (the patience), so the model doesn't train for too long and start overfitting the data.
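The early-stopping setup described above maps to a single Keras callback; `restore_best_weights` is an extra option worth enabling so the final model is the best one seen, not the last one:

```python
import tensorflow as tf

# Stop training once val_loss has not improved for 10 consecutive epochs,
# and roll the model back to the best weights observed during training.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=10,
    restore_best_weights=True,
)
# Later: model.fit(train_ds, validation_data=val_ds, callbacks=[early_stop])
```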
Since our task involves only two categories (broken and not broken bones), making it a binary classification problem, we'll replace the original output layer with a new one that has a single node activated by the sigmoid function:
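Putting the pieces together, here is a sketch of the full model. The dense layer size and dropout rate are illustrative choices, not values from the original notebook, and `weights=None` keeps the sketch download-free; in practice you would pass `weights="imagenet"` to get the pre-trained filters:

```python
import tensorflow as tf
from tensorflow.keras import layers

# VGG-16 convolutional base without its ImageNet classification head.
# Use weights="imagenet" in practice; weights=None avoids the download here.
base = tf.keras.applications.VGG16(
    include_top=False, weights=None, input_shape=(224, 224, 3))
base.trainable = False  # freeze all convolutional layers

model = tf.keras.Sequential([
    base,
    layers.Flatten(),                       # 3D feature maps -> 1D vector
    layers.Dense(256, activation="relu"),   # new task-specific dense layer
    layers.Dropout(0.5),                    # regularization against overfitting
    layers.Dense(1, activation="sigmoid"),  # single-node binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```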
- TensorFlow Hub provides free access to powerful models and datasets.
- Using pre-trained models reduces the computational resources and time required for training.
- Integrating with Gradio unlocks the ability to present and explain the model’s performance interactively.
Despite the rise of Transformer models, Convolutional Neural Networks remain highly effective for tasks involving image and spatial data. Our example of identifying broken bones in X-ray images showcases the strengths of CNNs.
CNNs are designed for image processing, using convolutional layers to capture spatial hierarchies. They also map efficiently onto hardware accelerators, which often makes them faster and more cost-effective than Transformers for high-resolution images.
Our use of the VGG-16 model with transfer learning, improved with hyper-parameter tuning and data augmentation, demonstrates how CNNs can achieve high performance with limited data.
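The data augmentation mentioned above can be sketched with Keras preprocessing layers; the specific transforms and ranges here are illustrative, not the exact ones used in the notebook:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Random, label-preserving transforms applied on the fly during training.
augment = tf.keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),  # up to +/- 10% of a full turn
    layers.RandomZoom(0.1),
])

batch = tf.random.uniform((2, 224, 224, 3))
augmented = augment(batch, training=True)  # only active in training mode
```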
Incorporating Gradio as an interactive interface for end-users to upload X-ray images and get instant feedback augments the model’s usability and accessibility.
THE CODE NOTEBOOK: