ObjectDescriptionSupermarket-CVCS-UniMoRe

Object Detection and Scene Description in a Supermarket

This is a course project for the postgraduate-level course Computer Vision and Cognitive Systems, taught at DIEF, UniMoRe.

Project Presentation: https://docs.google.com/presentation/d/1oa5Y8bHkKgFodULnyd5vInzbxH9hIJZVeqW5iYaaxxA/edit?usp=sharing
Project Report: https://drive.google.com/file/d/1YyxH2Q-KSBpQzCXXfHmBY9ajPjs3Un1R/view
Model Weights: https://drive.google.com/drive/folders/1QY-fAs8u-BdupzJeAMYeGz4cKXeUWnnR?usp=drive_link

Datasets

For object detection, we use the SKU110K dataset.
For product classification and for the embeddings used in product retrieval, we use the GroceryStoreDataset.

Training and Experimentation

To train the Faster R-CNN model for object detection:

sbatch frcnn.slurm

To train the DenseNet-121 model for product classification and for the embeddings used in product retrieval:

sbatch clf.slurm

Implementation and Inference

Object Detection and Scene Description

The complete pipeline consists of the following steps:
- Classical scene image preprocessing (histogram equalization)
- Inference with both models: Faster R-CNN and DenseNet-121 (commented out)
- Shelf numbering: K-Means with silhouette analysis
- Dominant colour recognition (commented out)
- Zero-shot product detection using the CLIP (Contrastive Language-Image Pre-training) model
- Spatial description through geometrical templating
- Concise scene description using GPT-3.5 Turbo through the OpenAI API
  - Set up an OpenAI API account and export the generated API key before running inference:

export OPENAI_API_KEY=entergeneratedAPIKey

sbatch inference.slurm
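The histogram equalization step listed in the pipeline can be illustrated with a minimal sketch. This is not the project's code: it is a dependency-free version of the textbook CDF-based method, and the function name `equalize_histogram`, the list-of-rows image format, and the `levels` parameter are assumptions made for illustration only.

```python
def equalize_histogram(image, levels=256):
    """Equalize a grayscale image given as a list of rows of ints in [0, levels).

    Hypothetical helper for illustration; not taken from the project scripts.
    """
    pixels = [p for row in image for p in row]
    total = len(pixels)
    if total == 0:
        return []

    # Histogram of intensity values.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Unnormalised cumulative distribution function.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # Smallest non-zero CDF value, so the darkest level present maps to 0.
    cdf_min = next(c for c in cdf if c > 0)
    if cdf_min == total:
        # Constant image: there is no contrast to stretch.
        return [list(row) for row in image]

    # Lookup table mapping each old intensity to its equalized intensity.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[p] for p in row] for row in image]
```

For example, `equalize_histogram([[50, 50, 60], [60, 70, 80]])` stretches the narrow 50 to 80 range across the full 0 to 255 range while preserving the ordering of intensities.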

[Figure: pipeline]
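The shelf numbering step (K-Means with silhouette analysis) can be sketched as follows. The README does not say which features are clustered, so this sketch assumes that the vertical centres of the detected bounding boxes are clustered and that the number of shelves is the `k` with the highest mean silhouette. The names `kmeans_1d`, `mean_silhouette`, `number_shelves`, and `max_shelves` are hypothetical, and the pure-Python implementation stands in for whatever library the project actually uses.

```python
def kmeans_1d(values, k, iterations=50):
    """Lloyd's algorithm on scalars; returns one cluster label per value."""
    ordered = sorted(values)
    # Deterministic initialisation: spread the centres over the sorted values.
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:
                centres[c] = sum(members) / len(members)
    return labels


def mean_silhouette(values, labels):
    """Mean silhouette coefficient using absolute difference as the distance."""
    clusters = {}
    for v, l in zip(values, labels):
        clusters.setdefault(l, []).append(v)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for v, l in zip(values, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(abs(v - o) for o in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(sum(abs(v - o) for o in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def number_shelves(y_centres, max_shelves=8):
    """Assign a shelf number (1 = topmost) to each box centre (assumed input)."""
    best_labels, best_score = [0] * len(y_centres), -1.0
    for k in range(2, min(max_shelves, len(y_centres) - 1) + 1):
        labels = kmeans_1d(y_centres, k)
        score = mean_silhouette(y_centres, labels)
        if score > best_score:
            best_labels, best_score = labels, score
    # Renumber clusters from top to bottom (smallest mean y first).
    groups = {}
    for y, l in zip(y_centres, best_labels):
        groups.setdefault(l, []).append(y)
    order = sorted(groups, key=lambda l: sum(groups[l]) / len(groups[l]))
    rank = {l: i + 1 for i, l in enumerate(order)}
    return [rank[l] for l in best_labels]
```

Silhouette analysis is used here only to choose `k`; with fewer than three boxes no valid `k` exists, so the sketch falls back to a single shelf.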

Retrieval Mechanism

Retrieval was initially prototyped in Google Colab: https://colab.research.google.com/drive/1HXn3XRod3_6CHOes7aB0bJltz-IJagRP?usp=sharing

sbatch retrival.slurm

[Figure: retrieval]
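Embedding-based retrieval can be sketched as a nearest-neighbour search over the gallery embeddings. The README does not state which similarity measure the project uses, so cosine similarity is an assumption here, and `cosine_similarity`, `retrieve`, and the dictionary-based gallery are hypothetical names chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def retrieve(query, gallery, top_k=3):
    """Return the top_k (name, score) pairs most similar to the query.

    gallery maps a product name to its embedding (assumed format).
    """
    scored = [(cosine_similarity(query, emb), name) for name, emb in gallery.items()]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [(name, score) for score, name in scored[:top_k]]
```

In practice the query and gallery vectors would come from the trained DenseNet-121 embedding layer; the sketch only covers the ranking step.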

(Further modifications can be made by editing the Python scripts referenced in the corresponding Slurm files.)