# New to computer vision and medical imaging? Start with these 10 projects

Computer vision (CV) is a field of artificial intelligence (AI) and computer science that enables automated systems to see, i.e., to process images and video in a human-like manner to detect and identify objects or regions of importance, predict an outcome, or even alter an image into a desired format [1]. The most popular use cases in the CV domain include automated perception for autonomous driving, augmented and virtual reality (AR, VR) for simulations, games, glasses, and real estate, and fashion- or beauty-oriented e-commerce.

Medical image (MI) processing, on the other hand, involves a much more detailed analysis of medical images, typically grayscale modalities such as MRI, CT, or X-ray, for automated pathology detection, a task that otherwise requires a trained specialist's eye. The most popular use cases in the MI domain include automated pathology labeling, localization, association with treatment or prognosis, and personalized medicine.

Before the advent of deep learning, 2D signal processing techniques such as image filtering, wavelet transforms, and image registration, followed by classification models [2–3], were the standard building blocks of solution frameworks. Signal processing solutions remain a top choice for model baselining owing to their low latency and high generalizability across data sets.
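To make the classical "filter, then extract features, then classify" pipeline concrete, here is a minimal sketch, assuming NumPy and a single grayscale image stored as a 2D array. It smooths the image with a 3×3 mean filter and then computes a Sobel gradient-magnitude map; summary statistics of maps like this are the kind of handcrafted features that were fed to a downstream classifier. The function names `mean_filter_3x3`, `sobel_magnitude`, and `handcrafted_features` are illustrative assumptions, not part of any specific library.

```python
import numpy as np

def _filter2d(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Apply a 3x3 kernel by cross-correlation with edge padding (output keeps the input shape)."""
    padded = np.pad(img, 1, mode="edge")
    out = np.zeros(img.shape, dtype=float)
    h, w = img.shape
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def mean_filter_3x3(img):
    # Smoothing: each pixel becomes the average of its 3x3 neighbourhood.
    return _filter2d(img, np.full((3, 3), 1.0 / 9.0))

def sobel_magnitude(img):
    # Horizontal (kx) and vertical (ky) Sobel kernels respond to intensity edges.
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    return np.hypot(_filter2d(img, kx), _filter2d(img, ky))

def handcrafted_features(img):
    # Hypothetical 3-value feature vector: mean intensity, mean and max edge strength.
    smooth = mean_filter_3x3(img.astype(float))
    edges = sobel_magnitude(smooth)
    return np.array([smooth.mean(), edges.mean(), edges.max()])
```

Stacking `handcrafted_features(img)` for every image yields a small feature matrix that any conventional classifier can consume, which is why such pipelines are cheap to run as baselines.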

However, deep learning solutions and frameworks have emerged as the new favorite owing to their end-to-end nature, which eliminates the need for feature engineering, feature selection, and output thresholding altogether. In this tutorial, we review the “Top 10” project choices for beginners in the CV and MI fields and provide examples with data and starter code to aid self-paced learning.

CV and MI solution frameworks can be analyzed in three segments: Data, Process, and Outcomes [4]. It helps to always picture the data for such frameworks in the format “{X, Y}”, where X represents the image/video data and Y represents the targets or labels. While naturally occurring unlabeled images and video sequences (X) can be plentiful, acquiring accurate labels (Y) can be expensive. Data annotation platforms such as [5–7] make it possible to label images and videos for each use case.
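A minimal sketch of the {X, Y} convention, assuming NumPy and equally sized grayscale images: the images are stacked into an array X of shape (N, H, W) and the labels into a vector Y of length N, so that X[i] and Y[i] always describe the same sample. The helpers `make_xy` and `train_test_split_xy` are hypothetical names; libraries such as scikit-learn provide their own splitting utilities.

```python
import numpy as np

def make_xy(images, labels):
    """Stack equally sized grayscale images and their labels into (X, Y)."""
    X = np.stack([np.asarray(img, dtype=np.float32) for img in images])  # shape (N, H, W)
    Y = np.asarray(labels)                                                # shape (N,)
    if X.shape[0] != Y.shape[0]:
        raise ValueError(f"{X.shape[0]} images but {Y.shape[0]} labels")
    return X, Y

def train_test_split_xy(X, Y, test_fraction=0.2, seed=0):
    """Shuffle sample indices once and split X and Y with the same permutation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(Y))
    n_test = int(round(len(Y) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], Y[train_idx], X[test_idx], Y[test_idx]
```

Applying one permutation to both arrays is the key design choice: shuffling X and Y independently would silently break the image-label pairing.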

Since deep learning models typically rely on large volumes of annotated data to learn features automatically for subsequent detection tasks, the CV and MI domains often suffer from the “small data challenge”, in which the number of samples available for training a machine learning model is several orders of magnitude smaller than the number of model parameters.

If left unaddressed, the “small data challenge” can lead to overfit (or underfit) models that do not generalize to new, unseen test data sets. Thus, the process of designing a solution framework for the CV and MI domains must always include model complexity constraints, where models with fewer parameters are typically preferred to prevent overfitting.
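To see how quickly parameters outnumber samples, the sketch below counts trainable weights with the standard formulas for fully connected and convolutional layers (weights plus biases). Both model layouts and the data set size of 500 labeled scans are hypothetical, chosen only for illustration.

```python
def dense_params(n_in: int, n_out: int) -> int:
    # One weight per input-output pair plus one bias per output unit.
    return n_in * n_out + n_out

def conv2d_params(in_ch: int, out_ch: int, k: int) -> int:
    # Each of the out_ch filters has k*k*in_ch weights and one bias.
    return (k * k * in_ch + 1) * out_ch

# Hypothetical models for a 28x28 grayscale input and 10 classes.
# MLP: flatten -> 128 hidden units -> 10 outputs.
mlp = dense_params(28 * 28, 128) + dense_params(128, 10)
# Small CNN: two 3x3 conv layers, global average pooling (no parameters), 10-way dense layer.
cnn = conv2d_params(1, 8, 3) + conv2d_params(8, 16, 3) + dense_params(16, 10)

n_samples = 500  # hypothetical size of a small labeled medical imaging data set
print(f"MLP: {mlp} params, {mlp / n_samples:.0f} params per sample")
print(f"CNN: {cnn} params, {cnn / n_samples:.1f} params per sample")
```

The first dense layer alone has 784 × 128 + 128 = 100,480 parameters. Because the convolutional variant shares its filter weights across all image positions, it needs roughly 70× fewer parameters here, which is one concrete way to honor a model complexity constraint.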

Finally, the solution framework outcomes are analyzed both qualitatively through visualization and quantitatively in terms of well-known metrics such as precision, recall, accuracy, and F1 or Dice coefficients [8–9].
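For binary labels (for example, a per-pixel pathology mask), all of the quantitative metrics listed above follow from the four confusion-matrix counts. A minimal NumPy sketch follows; for binary data the Dice coefficient, 2TP / (2TP + FP + FN), is numerically identical to F1. The function name `binary_metrics` and the zero-division conventions are assumptions for illustration.

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Precision, recall, accuracy, and F1/Dice for binary arrays of any shape."""
    t = np.asarray(y_true).astype(bool).ravel()
    p = np.asarray(y_pred).astype(bool).ravel()
    tp = int(np.sum(t & p))    # predicted positive, truly positive
    fp = int(np.sum(~t & p))   # predicted positive, truly negative
    fn = int(np.sum(t & ~p))   # predicted negative, truly positive
    tn = int(np.sum(~t & ~p))  # predicted negative, truly negative
    # Convention (assumed): 0.0 when the denominator is empty for precision/recall.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / t.size
    # Convention (assumed): two empty masks agree perfectly, so Dice = 1.0.
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1_dice": dice}
```

Because the inputs are flattened, the same call works for 1D classification labels and 2D segmentation masks, e.g. `binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])` gives precision, recall, and Dice of 2/3 and accuracy of 0.6.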

The projects listed below span a range of difficulty levels (Easy, Medium, Hard) with respect to data pre-processing and model building. They also represent a variety of use cases currently prevalent in the research and engineering communities. Each project is described in terms of its Goal, Methods, and Results.

Project 1: MNIST and Fashion MNIST for Image Classification (Level: Easy)

Goal: To process images (X) of size [28×28] pixels and classify them into one of 10 output categories (Y). For the MNIST data set, the input images are handwritten digits from 0 to 9 [10]. The training and test data sets contain 60,000 and 10,000 labeled images, respectively. Inspired by the handwritten digit recognition problem, the Fashion MNIST data set was released [11], in which the goal is to classify images of the same [28×28] size into 10 clothing categories, as shown in Fig. 1.
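Before training a deep model on either data set, a quick reference accuracy is useful. Below is a sketch of a nearest-centroid classifier, assuming NumPy, images shaped (N, 28, 28), and integer labels 0 to 9 with every class present in the training set: it averages each class's training images into a template and assigns a test image to the closest template in Euclidean distance. This is a deliberately simple baseline rather than a deep learning model, and the function names are hypothetical.

```python
import numpy as np

def fit_centroids(X, Y, n_classes=10):
    """Average the flattened training images of each class into one template.

    Assumes every class 0..n_classes-1 has at least one training image.
    """
    flat = X.reshape(len(X), -1).astype(np.float32)
    return np.stack([flat[Y == c].mean(axis=0) for c in range(n_classes)])

def predict_centroids(centroids, X):
    """Assign each image to the class with the nearest template (squared Euclidean distance)."""
    flat = X.reshape(len(X), -1).astype(np.float32)
    # ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, avoiding a large (N, K, D) intermediate array.
    d2 = (
        (flat ** 2).sum(axis=1)[:, None]
        - 2.0 * flat @ centroids.T
        + (centroids ** 2).sum(axis=1)[None, :]
    )
    return d2.argmin(axis=1)
```

With TensorFlow installed, `tf.keras.datasets.mnist.load_data()` (or `tf.keras.datasets.fashion_mnist.load_data()`) returns `(x_train, y_train), (x_test, y_test)` arrays in this shape, so `(predict_centroids(fit_centroids(x_train, y_train), x_test) == y_test).mean()` gives a baseline accuracy for a CNN to beat.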