Crowd Counting and Density Estimation with CSRNet

The goal of this project was to accurately estimate the number of people in images of varying crowd density and capture conditions. Traditional methods often struggle with occlusions, scale variations, and computational demands, so I set out to build a robust deep learning solution that predicts a density map for each image and derives the headcount from it.

My Approach
I structured the project systematically, starting with data preparation. All image paths and labels were organized into a pandas DataFrame, and I planned a 5-fold cross-validation scheme. For efficiency, final inference used only the best model from Fold 1.
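A minimal sketch of the fold assignment, using only the standard library so it runs anywhere. The file names, the seed, and the helper names `assign_folds` and `split_fold` are hypothetical; in the project itself the fold number would simply be one more column in the DataFrame.

```python
import random


def assign_folds(image_ids, n_folds=5, seed=42):
    """Map each image id to a fold in 1..n_folds; fold sizes differ by at most one."""
    ids = sorted(image_ids)           # sort first so the result ignores input order
    random.Random(seed).shuffle(ids)  # seeded shuffle keeps the split reproducible
    return {img_id: (i % n_folds) + 1 for i, img_id in enumerate(ids)}


def split_fold(folds, val_fold=1):
    """Return (train_ids, val_ids) when `val_fold` is held out for validation."""
    train = [img_id for img_id, fold in folds.items() if fold != val_fold]
    val = [img_id for img_id, fold in folds.items() if fold == val_fold]
    return train, val


# Hypothetical file names, for illustration only.
image_ids = [f"img_{i:04d}.jpg" for i in range(103)]
folds = assign_folds(image_ids)
train_ids, val_ids = split_fold(folds, val_fold=1)
print(len(train_ids), len(val_ids))
```

Fixing the seed means the Fold 1 validation set stays the same between runs, so validation scores from different experiments remain comparable.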
Next, I created a custom PyTorch Dataset class to load images and generate ground-truth density maps. To make the model more robust, I applied data augmentations such as horizontal flips, color adjustments, and CoarseDropout, the last of which simulates real-world occlusions.
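One common way to build a ground-truth density map is to place a normalized Gaussian at every annotated head position, so that the map sums to the number of people. The sketch below assumes point annotations in (x, y) pixel coordinates and a fixed kernel; the kernel size, sigma, and function names are illustrative choices, not necessarily the ones used in the project.

```python
import numpy as np


def gaussian_kernel(size=15, sigma=4.0):
    """Square 2D Gaussian kernel that sums to 1."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def density_map(points, height, width, size=15, sigma=4.0):
    """Build a density map whose sum equals the number of in-image points."""
    kernel = gaussian_kernel(size, sigma)
    r = size // 2
    dmap = np.zeros((height, width), dtype=np.float64)
    for x, y in points:
        cx, cy = int(round(x)), int(round(y))
        if not (0 <= cx < width and 0 <= cy < height):
            continue  # skip annotations that fall outside the image
        # Clip the kernel window to the image borders.
        x0, x1 = max(cx - r, 0), min(cx + r + 1, width)
        y0, y1 = max(cy - r, 0), min(cy + r + 1, height)
        patch = kernel[y0 - (cy - r):y1 - (cy - r), x0 - (cx - r):x1 - (cx - r)]
        # Renormalize the clipped patch so each head still contributes exactly 1.
        dmap[y0:y1, x0:x1] += patch / patch.sum()
    return dmap


def hflip(image, dmap):
    """Horizontal flip applied to the image and its density map together."""
    return image[:, ::-1].copy(), dmap[:, ::-1].copy()


heads = [(10.2, 12.7), (3.0, 1.0), (60.0, 45.5)]  # made-up annotations
dmap = density_map(heads, height=48, width=64)
image = np.zeros((48, 64, 3), dtype=np.uint8)      # placeholder image
flipped_image, flipped_dmap = hflip(image, dmap)
print(abs(dmap.sum() - len(heads)) < 1e-9)
```

Geometric augmentations such as the flip have to be applied to the image and the density map together, otherwise the target no longer lines up with the input. Photometric changes and dropout on the image leave the target untouched.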
The core of the solution was CSRNet, a convolutional neural network with a VGG-16 frontend for feature extraction and a dilated-convolution backend, which enlarges the receptive field without further downsampling and so keeps the density map at a comparatively high resolution.
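The effect of this design on resolution can be checked with a little shape arithmetic. The sketch below assumes the layer layout described in the CSRNet paper (the first ten VGG-16 convolutions with three max-pooling layers, then six 3x3 convolutions with dilation 2 and a final 1x1 convolution); the project's exact configuration may differ. The output-size formula is the one given in the PyTorch `Conv2d` documentation.

```python
def conv_out(size, kernel=3, stride=1, padding=0, dilation=1):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Output length of a max-pooling layer along one spatial axis."""
    return (size - kernel) // stride + 1


def effective_kernel(kernel=3, dilation=1):
    """Span of input pixels covered by one dilated kernel."""
    return dilation * (kernel - 1) + 1


# Assumed layout; numbers are output channels, "M" is a 2x2 max pool.
FRONTEND = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M", 512, 512, 512]
BACKEND = [512, 512, 512, 256, 128, 64]  # 3x3 convolutions, dilation 2


def csrnet_output_size(height, width):
    """Spatial size of the predicted density map for a given input size."""
    h, w = height, width
    for layer in FRONTEND:
        if layer == "M":
            h, w = pool_out(h), pool_out(w)
        else:
            h, w = conv_out(h, padding=1), conv_out(w, padding=1)
    for _ in BACKEND:
        h = conv_out(h, padding=2, dilation=2)
        w = conv_out(w, padding=2, dilation=2)
    # Final 1x1 convolution maps 64 channels to a single density channel.
    return conv_out(h, kernel=1), conv_out(w, kernel=1)


print(csrnet_output_size(768, 1024))  # (96, 128)
print(effective_kernel(3, dilation=2))  # 5
```

Under these assumptions only the three pooling layers reduce resolution, so the output is 1/8 of the input size in each dimension. A dilated 3x3 kernel covers a 5x5 window with the same nine weights, which widens the context the backend sees without shrinking the map any further.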
Training minimized the mean squared error (MSE) between predicted and ground-truth density maps, with mean absolute error (MAE) monitored for evaluation. I used the Adam optimizer, with early stopping to prevent overfitting.
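The loss, the metric, and the stopping rule can be sketched independently of the training loop. This example assumes MAE is computed on per-image counts (the sum of each density map), which is the usual convention in crowd counting; the patience value and the validation scores are made up for illustration.

```python
import numpy as np


def density_mse(pred, target):
    """Pixel-wise mean squared error between two density maps."""
    return float(np.mean((pred - target) ** 2))


def count_mae(pred_maps, target_maps):
    """Mean absolute error between predicted and true per-image counts."""
    errors = [abs(p.sum() - t.sum()) for p, t in zip(pred_maps, target_maps)]
    return float(np.mean(errors))


class EarlyStopping:
    """Signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, value):
        """Record one validation score; return True when training should stop."""
        if value < self.best - self.min_delta:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
history = [30.0, 25.0, 26.0, 24.0, 24.5, 24.2, 23.0]  # made-up validation MAE
stopped_at = None
for epoch, val_mae in enumerate(history, start=1):
    if stopper.step(val_mae):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)  # 6 24.0
```

In a real loop the model weights would be saved whenever `best` improves, so that the checkpoint used for inference is the best one seen rather than the last one trained.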
For inference, the best model predicted a density map for each test image. Summing the pixel values of each map gave the estimated crowd count, and the counts were written to a submission CSV.
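The counting and export step can be sketched as follows. The column names `image_id` and `count`, the rounding to whole people, and the sample density maps are all assumptions for illustration; the actual submission format is not specified here.

```python
import csv
import io

import numpy as np


def counts_from_density(maps_by_id):
    """Estimate a crowd count per image by summing its density map."""
    return {img_id: float(dmap.sum()) for img_id, dmap in maps_by_id.items()}


def write_submission(counts, handle, round_counts=True):
    """Write one row per image, sorted by id, to an open text handle."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["image_id", "count"])  # hypothetical column names
    for img_id in sorted(counts):
        value = counts[img_id]
        writer.writerow([img_id, int(round(value)) if round_counts else value])


# Stand-ins for model outputs: constant maps with known sums.
predictions = {
    "img_0002.jpg": np.full((4, 4), 0.5),  # sums to 8.0
    "img_0001.jpg": np.full((2, 8), 0.2),  # sums to about 3.2
}
counts = counts_from_density(predictions)
buffer = io.StringIO()
write_submission(counts, buffer)
print(buffer.getvalue())
```

Writing to a file instead only requires passing a handle opened with `open(path, "w", newline="")` in place of the `StringIO` buffer.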

Results and Insights
The approach yielded reliable predictions, and the Fold 1 model was selected for its strong validation performance. The project addressed two recurring computer vision challenges: preserving spatial context, through the dilated-convolution backend, and coping with occlusion, through CoarseDropout augmentation. Training also remained efficient on limited resources.

Conclusion
This crowd counting project highlights my skills in deep learning for image analysis, from data preprocessing to model deployment. It underscores the power of CNNs in real-world applications like public safety and event management.


