Proof-of-concept for scene category classification #144

Open
opened 2026-02-19 23:02:57 -05:00 by deekerman · 8 comments

Originally created by @lastzero on GitHub (Dec 30, 2019).

As a PhotoPrism user, I want my photos classified by scene category so that I can filter search results by scene and get better image titles.

While we already label certain scenes based on the objects we find, we don't have a specialized TensorFlow model for this yet. Candidates include AlexNet-places365, GoogLeNet-places365 and ResNet152-places365. See also http://places2.csail.mit.edu/download.html.

Ideally, we can reuse our existing Go TensorFlow code for this, but each model differs in how it must be used. Our NSFW detector, for example, needs different input values than our NASNet model for object classification, so we ended up using different code and different packages. For scene detection, it might be good to create a new scene package unless merging it with NASNet gives us much better performance.

Acceptance Criteria:

  • Identify and document freely available TensorFlow models and compare them by capability, performance and file size (you can use our developer wiki at https://github.com/photoprism/photoprism/wiki for this)
  • Identify and document working example implementations and helpful GitHub repos, e.g. https://github.com/CSAILVision/places365
  • If you can code in Go, provide a working proof-of-concept as a unit test (no UI required)

@lastzero commented on GitHub (Jan 8, 2020):

Moved our image classification code to the new classify package: https://github.com/photoprism/photoprism/commit/e9874d6e0cfb81d2fce4e5be34455848254685d7

This should make it easier to test: simply go to the directory and run go test -v.

We'll see if that's a good name... it was the best I could come up with today.


@lastzero commented on GitHub (Jan 8, 2020):

Updated our docs: https://github.com/photoprism/photoprism/wiki/Image-Classification


@lastzero commented on GitHub (Jan 16, 2020):

FYI: https://github.com/nic25 is working on this 🚀


@lastzero commented on GitHub (Jan 16, 2020):

Didn't find related models on TensorFlow Hub (https://tfhub.dev/s?q=places365).


@tam-wh commented on GitHub (May 23, 2020):

Seems like there's a way to convert the places365 caffemodel to TensorFlow?
https://ndres.me/post/convert-caffe-to-tensorflow/


@tam-wh commented on GitHub (May 23, 2020):

I successfully converted vgg16_hybrid1365 into a .pb file (it took me half a day to get the converter working), and it works well in TensorFlow.NET. I'm now working on the non-hybrid model, as it is more suitable for scene classification. Current status:

ResNet152-places365 (does not convert)
VGG16-hybrid1365 (converted successfully)
VGG16-places365 (working on it)


@tam-wh commented on GitHub (May 23, 2020):

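The file is quite large, ~500MB. It's now shared on my Nextcloud: VGG16-places365 (https://cloud.oqozi.com/index.php/s/CAkbYKfcL92EAQa). The download is quite slow, as my VPS download speed is limited to 150KB. Label files are available in the places365 GitHub repo (https://github.com/CSAILVision/places365/blob/master/categories_places365.txt).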

Some information on the model

Input operation name = "data";
Output operation name = "prob"

I took a couple of images from the demo (http://places2.csail.mit.edu/demo.html) and ran them through the converted .pb file in TensorFlow.NET, with image width and height set to 224px.

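Image IMG_7308 (https://user-images.githubusercontent.com/3193896/82736077-675cb300-9d59-11ea-8919-755cbded341a.JPG):
48 /b/beach 48, 0.86291456

Image 13 (https://user-images.githubusercontent.com/3193896/82736104-98d57e80-9d59-11ea-9b0c-a2681e4d863f.jpg):
66 /b/bridge 66, 0.9400765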

Seems like the conversion works really well.


@Extarys commented on GitHub (Sep 10, 2020):

Not sure if it's possible, but could the label "water" be added? Maybe "boat" for the second picture?

I don't really know how those things work though 😕
