Object Detection and Scene Description in a Supermarket
This is a course project for the postgraduate-level course Computer Vision and Cognitive Systems, taught at DIEF, UniMoRe.
Datasets
Training and Experiments
To train the Faster RCNN model for object detection:
To train the DenseNet 121 model for product classification and to produce the embeddings used for product retrieval:
Implementation and Inference
Object Detection and Scene Description
- For the implementation of the complete pipeline:
- Classical Scene Image Preprocessing (Histogram Equalization)
- Inference of both models: Faster RCNN and DenseNet 121 (commented out)
- Shelf numbering: K-Means with silhouette analysis
- Dominant colour recognition (commented out)
- Zero-shot product detection using the CLIP (Contrastive Language-Image Pre-training) model
- Spatial Description through geometrical templating
- Concise scene description using GPT-3.5 Turbo through the OpenAI API
export OPENAI_API_KEY=entergeneratedAPIKey
sbatch inference.slurm
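
The shelf numbering step (K-Means with silhouette analysis) can be illustrated with a small dependency-free sketch. It is not the project's code: it assumes that shelves are found by clustering only the vertical centres of the detected boxes, and the names `kmeans_1d`, `silhouette`, `number_shelves`, `y_centres` and `max_shelves` are hypothetical. In practice, scikit-learn's `KMeans` and `silhouette_score` would do the same job.

```python
def kmeans_1d(values, k, iters=50):
    """Lloyd's k-means on scalars; returns one cluster label per value."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic initialisation: centres at evenly spaced quantiles.
    centres = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        new_centres = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # An empty cluster keeps its previous centre.
            new_centres.append(sum(members) / len(members) if members else centres[c])
        if new_centres == centres:
            break
        centres = new_centres
    return labels


def silhouette(values, labels):
    """Mean silhouette coefficient for a 1-D clustering."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(abs(v - u) for u in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def number_shelves(y_centres, max_shelves=6):
    """Pick k by silhouette score, then number shelves from top (1) to bottom."""
    best_score, best_labels = -2.0, None
    upper = min(max_shelves, len(set(y_centres)) - 1)
    for k in range(2, upper + 1):
        labels = kmeans_1d(y_centres, k)
        score = silhouette(y_centres, labels)
        if score > best_score:
            best_score, best_labels = score, labels
    if best_labels is None:
        return [1] * len(y_centres)  # too few boxes to cluster
    means = {}
    for c in set(best_labels):
        members = [y for y, lab in zip(y_centres, best_labels) if lab == c]
        means[c] = sum(members) / len(members)
    # A smaller y value means higher up in the image.
    rank = {c: i + 1 for i, c in enumerate(sorted(means, key=means.get))}
    return [rank[lab] for lab in best_labels]


# Toy vertical box centres (in pixels) for products on three shelves.
toy_y_centres = [52, 48, 55, 50, 201, 198, 205, 349, 352, 355, 347]
print(number_shelves(toy_y_centres))
```

With the toy values above, the three groups of centres are numbered 1, 2 and 3 from top to bottom. Selecting k by silhouette score avoids hard-coding the number of shelves per image.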

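The histogram equalization preprocessing step can be sketched without any dependencies as follows. This is only an illustration of the technique on an 8-bit grayscale image stored as a list of rows; the function name `equalize_histogram` is hypothetical, and a library routine such as OpenCV's `cv2.equalizeHist` would normally be used instead.

```python
def equalize_histogram(image, levels=256):
    """Equalize a grayscale image given as a list of rows of ints in [0, levels)."""
    # Count how many pixels take each intensity value.
    hist = [0] * levels
    for row in image:
        for px in row:
            hist[px] += 1

    # Cumulative distribution of the intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = cdf[-1]
    if total == 0:
        return []  # empty image
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: there is no contrast to stretch.
        return [list(row) for row in image]

    # Lookup table that maps the CDF onto the full intensity range.
    lut = [max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
           for c in cdf]
    return [[lut[px] for px in row] for row in image]


# A low-contrast toy image: values are squeezed between 50 and 200.
toy_image = [[50, 50, 100],
             [100, 150, 200]]
for row in equalize_histogram(toy_image):
    print(row)
```

After equalization, the darkest input value maps to 0 and the brightest to 255, so the detectors see an image that uses the full intensity range.
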
Retrieval Mechanism
Retrieval was initially prototyped in Google Colab: https://colab.research.google.com/drive/1HXn3XRod3_6CHOes7aB0bJltz-IJagRP?usp=sharing
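
As a rough illustration of embedding-based retrieval, the sketch below ranks gallery products by cosine similarity to a query embedding. The similarity measure is an assumption (this README does not say which one the project uses), and the names `cosine_similarity`, `retrieve`, `toy_gallery` and the product labels are hypothetical. Real embeddings would come from the trained DenseNet 121 model and have far more than three dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def retrieve(query, gallery, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [(cosine_similarity(query, emb), name) for name, emb in gallery.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k]]


# Toy 3-dimensional embeddings standing in for DenseNet 121 features.
toy_gallery = {
    "pasta": [1.0, 0.0, 0.1],
    "olive_oil": [0.0, 1.0, 0.0],
    "coffee": [0.1, 0.2, 1.0],
}
toy_query = [0.9, 0.1, 0.0]

for name, score in retrieve(toy_query, toy_gallery, top_k=2):
    print(f"{name}: {score:.3f}")
```

For a large gallery, the embeddings would typically be stacked into a matrix and normalised once, so that retrieval reduces to a single matrix-vector product.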

(Additional modifications can be made by editing the Python scripts mentioned in the corresponding slurm files.)