AI: Generate captions with Clip Interrogator #1785

Open
opened 2026-02-20 01:00:00 -05:00 by deekerman · 19 comments
Owner

Originally created by @lastzero on GitHub (May 30, 2023).

Discussed in https://github.com/photoprism/photoprism/discussions/3419

Originally posted by sfxworks May 20, 2023
![image](https://github.com/photoprism/photoprism/assets/5921035/4ca182e5-fe3f-476f-9162-931e190b1e5d)
I looked around a lot for a way to handle and auto-tag images using something better than the tensor model. I made a Stable Diffusion interrogator wrapper/endpoint at https://github.com/sfxworks/interrogator-http and a script that (roughly) calls the API to add a description and labels to the images it processes. You'll need your session ID, as well as an additional token I found but couldn't locate in the documentation, as noted by my `t=small-token-here` example.

You can use multiple containers, though this could use some fine-tuning, since the iteration doesn't handle errors yet.

I am currently using the ViT-H-14/laion2b_s32b_b79k model, which runs against Stable Diffusion v2. More information here: https://github.com/pharmapsychotic/clip-interrogator

This is a rough script. A lot more work needs to be done, such as handling labels better (maybe joining them into phrases instead of splitting by word), getting thumbnails for videos instead of skipping them, and other improvements. Note that it takes about two minutes to interrogate an image and get these sorts of results, and I currently run it on a GTX 1070 with some performance args to reduce memory usage (which also reduce quality). So your mileage may vary.
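For the label-joining idea mentioned above, here is a minimal sketch that keeps consecutive keywords together as phrases instead of splitting on every word. It is not part of the original script; the helper name and the small stop-word set are made up for illustration (the script itself uses NLTK's full English stop-word list):

```python
# Small illustrative stop-word set; swap in NLTK's full English list in practice.
STOP_WORDS = {'a', 'an', 'the', 'of', 'in', 'on', 'with', 'and', 'is'}

def phrase_keywords(sentence):
    """Group consecutive non-stop-word tokens into multi-word phrases."""
    phrases = []
    # Treat commas and exclamation marks as phrase boundaries.
    for chunk in sentence.lower().replace('!', ',').split(','):
        current = []
        for token in chunk.split():
            if token not in STOP_WORDS:
                current.append(token)
            elif current:
                # A stop word ends the current phrase.
                phrases.append(' '.join(current))
                current = []
        if current:
            phrases.append(' '.join(current))
    return phrases

# e.g. "a photo of a golden retriever in the park"
# -> ['photo', 'golden retriever', 'park']
```

This would let "golden retriever" land in PhotoPrism as one label instead of two unrelated words.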

import argparse
import base64
import itertools
from concurrent.futures import ThreadPoolExecutor

import nltk
import requests
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Set up argument parser
parser = argparse.ArgumentParser(description='Add CLIP Interrogator descriptions and labels to PhotoPrism photos')
parser.add_argument('offset', type=int, help='The starting index of the photos to process')
parser.add_argument('count', type=int, help='The number of photos to process')

# List of interrogator IPs
server_ips = ['192.168.6.16'] #, '10.0.8.88', '10.0.8.120', '10.0.8.115', '10.0.8.46']


args = parser.parse_args()

# Download the set of English stop words
nltk.download('punkt')
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

# Set up PhotoPrism API details
photoprism_url = "https://photoprism.url.here/api/v1"
headers = {"X-Session-ID": "session-id-here"}

# Fetch the list of photo UUIDs from PhotoPrism
response = requests.get(f"{photoprism_url}/photos?count={args.count}&offset={args.offset}", headers=headers)
photo_list = response.json()

# Define function to process a single photo
def process_photo(photo, ip):
    file_name = photo.get('FileName')
    if file_name and file_name.endswith('.mp4'):
        print("Skipping MP4 file: " + file_name)
        return
    
    print("Processing photo against IP: " + ip)
    print("Photo: " + photo['Hash'])
    print("Photo: " + photo['UID'])

    # Use the provided IP for this request
    url = f"http://{ip}/interrogate"

    # Fetch the image data
    image_url = f"{photoprism_url}/dl/{photo['Hash']}?t=small-token-here"
    print("Using: " + image_url)
    response = requests.get(image_url, headers=headers)
    image_data = base64.b64encode(response.content).decode('utf-8')
    print("Got image data response.")
    
    # Send the image data to the interrogator endpoint
    print("Sending to interrogator")
    response = requests.post(url, json={'image': image_data}, timeout=600)  # Wait up to 600 seconds (10 minutes)
    label_sentence = response.json()['label']

    print("Labels before:")
    print(label_sentence)

    description_url = f"{photoprism_url}/photos/{photo['UID']}"

    payload = {
      "Description": label_sentence,
      "DescriptionSrc": "manual",
      "Notes": "Description and tags added by ViT-H-14/laion2b_s32b_b79k",
      "NotesSrc": "manual",
    }

    response = requests.put(description_url, headers=headers, json=payload)
    if response.status_code != 200:
        print(f"Failed to add description '{label_sentence}' to photo {photo['UID']}: {response.content}")

    # Tokenize the label and remove stop words
    label_words = word_tokenize(label_sentence)
    label_keywords = [word for word in label_words if word not in stop_words and word != ',' and word != '!']
    print("Labels after:")
    print(label_keywords)

    # Add each keyword as a label to the photo in PhotoPrism
    for keyword in label_keywords:
        # Prepare the JSON payload for the PhotoPrism API
        payload = {
          "Name": keyword,
          "Priority": 10,  # Adjust this as needed
        }

        # Add the label to the photo
        label_url = f"{photoprism_url}/photos/{photo['UID']}/label"
        response = requests.post(label_url, headers=headers, json=payload)
        
        if response.status_code != 200:
            print(f"Failed to add label '{keyword}' to photo {photo['UID']}: {response.content}")

# Create a cyclic iterator for the server IPs
cyclic_ips = itertools.cycle(server_ips)

# In the ThreadPoolExecutor block:
with ThreadPoolExecutor(max_workers=1) as executor:
    # Map the process_photo function to the photo_list and cyclic_ips
    executor.map(process_photo, photo_list, cyclic_ips)
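Since the iteration above does not handle errors yet, here is a hedged sketch of a failover loop that retries a failed photo on the next server in the cycle. The wrapper name and retry policy are illustrative, not part of the script:

```python
import itertools

def process_with_failover(process_photo, photos, server_ips, max_attempts=3):
    """Process each photo, retrying on the next server after a failure."""
    ips = itertools.cycle(server_ips)
    for photo in photos:
        for attempt in range(max_attempts):
            try:
                process_photo(photo, next(ips))
                break  # success: move on to the next photo
            except Exception as exc:  # e.g. requests.RequestException, timeouts
                print(f"Attempt {attempt + 1} for {photo.get('UID')} failed: {exc}")
```

This could replace the `executor.map(...)` call, or be submitted per-photo to the executor if parallelism is still wanted.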

@felipemeres commented on GitHub (Jun 9, 2023):

The recently released RAM might be more efficient for the task:
https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text


@ahdsr commented on GitHub (Aug 24, 2023):

Can something like this be implemented with Photoprism?


@hrdwdmrbl commented on GitHub (Jan 20, 2024):

This keeps coming up again and again! I think you'll have to keep updating the model every 6 months or so. It would be for the best!


@sfxworks commented on GitHub (Apr 22, 2024):

Heya, back in action and going the GPT vision route with LocalAI: https://localai.io/features/gpt-vision/

That way, if someone wants to, they can also send things to OpenAI, or use a local model.

I believe the model @felipemeres referenced can be used here as well, as a lot is configurable.
Give me a bit to write something up.


@sfxworks commented on GitHub (Apr 23, 2024):

Needs some cleanup, but I've added descriptions to about 1,500 images so far overnight. Mine hallucinates a little (this was 455, not 500, and I was wearing a back brace, not a knee brace).
![image](https://github.com/photoprism/photoprism/assets/5921035/1cd1beec-265a-4068-a556-da6ec12a5b9d)

I might try to add some grammar constraints to make sure it returns results in a certain format, so I can also ask for tags and a better title, and maybe even album organization. For now, though, it works well with https://localai.io/, a locally deployable OpenAI-compatible API, so you can choose between sending your images upstream and analyzing them locally.

import argparse
import logging
import http.client as http_client

import requests

http_client.HTTPConnection.debuglevel = 1

# You must initialize logging, otherwise you'll not see debug output.
logging.basicConfig()
logging.getLogger().setLevel(logging.DEBUG)
requests_log = logging.getLogger("requests.packages.urllib3")
requests_log.setLevel(logging.DEBUG)
requests_log.propagate = True


# Set up argument parser
parser = argparse.ArgumentParser(description='Automated photo processing and tagging')
parser.add_argument('offset', type=int, help='The starting index of the photos to process')
parser.add_argument('count', type=int, help='The number of photos to process')

args = parser.parse_args()

# API Details
photoprism_url = "https://REDACTED/api/v1"
headers = {"X-Session-ID": "REDACTED"}
gpt4_vision_url = "http://REDACTED/v1/chat/completions"
small_token = "REDACTED"

# Fetch the list of photo UUIDs from PhotoPrism
response = requests.get(f"{photoprism_url}/photos?count={args.count}&offset={args.offset}", headers=headers)
photo_list = response.json()

def process_photo(photo):
    if photo.get('FileName', '').endswith('.mp4'):
        print(f"Skipping MP4 file: {photo['FileName']}")
        return

    print(f"Processing photo: {photo}")

    # The image is passed by URL below, so the vision endpoint must be able to reach it
    image_url = f"{photoprism_url}/t/{photo['Hash']}/{small_token}/fit_1280"

    # Prepare the request for GPT-4 Vision
    json_data = {
        "model": "gpt-4-vision-preview",
        "temperature": 0.9,  # sampling temperature is a top-level request parameter
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {"type": "image_url", "image_url": {"url": image_url}}
                ]
            }
        ]
    }

    print(json_data)

    # Send request to GPT-4 Vision
    response = requests.post(gpt4_vision_url, json=json_data, headers={"Content-Type": "application/json"}, timeout=500)

    if response.status_code == 200:
        results = response.json()
        print(f"Results are {results}")
        choices = results.get('choices', [])
        description = ""

        if choices:
            # Accessing the first item in 'choices' list
            first_choice = choices[0]
            message = first_choice.get('message', {})
            # Retrieve content, ensuring that special characters are handled correctly
            description = message.get('content', 'No description provided.')

        else:
            description = 'No description provided.'

        # Update the photo description in PhotoPrism
        description_url = f"{photoprism_url}/photos/{photo['UID']}"
        payload = {"Description": description, "DescriptionSrc": "llava"}
        update_response = requests.put(description_url, headers=headers, json=payload)

        if update_response.status_code == 200:
            print(f"Successfully updated photo {photo['UID']}")
        else:
            print(f"Failed to update photo {photo['UID']}: {update_response.text}")
    else:
        print(f"Error processing photo {photo['UID']} with GPT-4 Vision: {response.text}")

# Loop through each photo and process it
for photo in photo_list:
    process_photo(photo)
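Note that the script above passes `image_url` straight to the vision endpoint, so that server must be able to reach PhotoPrism over the network. If it can't, the OpenAI-style API also accepts images inline as a base64 data URL; a sketch (the helper name is made up for illustration):

```python
import base64

def inline_image_content(image_bytes, mime_type="image/jpeg"):
    """Build an OpenAI-style image_url content part from raw image bytes."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{b64}"}}

# In the request body above,
#   {"type": "image_url", "image_url": {"url": image_url}}
# would become:
#   inline_image_content(requests.get(image_url, headers=headers).content)
```

This also avoids exposing the thumbnail token to the vision server.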

@sfxworks commented on GitHub (Apr 23, 2024):

(Also, if `DescriptionSrc` contains any spaces, the PUT appears successful, even in the PhotoPrism logs, but the description is not saved. That stumped me for a bit.)


@hrdwdmrbl commented on GitHub (Apr 24, 2024):

@sfxworks Awesome work! If I sponsor photoprism, that wouldn't include you, would it?


@lastzero commented on GitHub (Apr 24, 2024):

We give free memberships (and whatever else we can afford) for larger and/or regular contributions, provided they can be merged and released.


@sfxworks commented on GitHub (Apr 30, 2024):

Hey, I am a donor (though between some chaos during the holidays I am behind on my membership, but I don't mind resuming it if needed). Either way, thanks for the acknowledgement! It just feels good using this vs. something like a cloud service, and it's how open source should be.

I've made this alternative script that uses Ollama. They also have an OpenAI-compatible service, but it doesn't work with vision/llava (Ollama does use my Radeon card, though, so I wanted to set it up).

I still have to make tweaks to handle scenarios where it may not return proper JSON and try again, but here's what I have so far.

The goal was to work on the other properties, like title, keywords, and the privacy flag.

import argparse
import base64
import json

import requests

# Set up argument parser
parser = argparse.ArgumentParser(description='Automated photo processing and tagging')
parser.add_argument('offset', type=int, help='The starting index of the photos to process')
parser.add_argument('count', type=int, help='The number of photos to process')

args = parser.parse_args()

# API Details
photoprism_url = "https://REDACTED/api/v1"
headers = {"X-Session-ID": "REDACTED"}
gpt4_vision_url = "http://REDACTED:11434/api/generate"
small_token = "REDACTED"

# Fetch the list of photo UUIDs from PhotoPrism
response = requests.get(f"{photoprism_url}/photos?count={args.count}&offset={args.offset}", headers=headers)
photo_list = response.json()

def process_photo(photo):
    if photo.get('FileName', '').endswith('.mp4'):
        print(f"Skipping MP4 file: {photo['FileName']}")
        return

    print(f"Processing photo: {photo}")

    # Fetch the image data
    image_url = f"{photoprism_url}/t/{photo['Hash']}/{small_token}/fit_4096"
    response = requests.get(image_url, headers=headers)

    # Convert the response content into a base64 string
    encoded_string = base64.b64encode(response.content).decode('utf-8')


    # Prepare the request for OLLAMA
    json_data = {
        "model": "llava:34b",
        "prompt": "You are a detailed photo forensic specialist that states whats in an image without assuming anything extra. Being as literal as possible, give this image a title, a description, a string of keywords in json. Use keys Title, Description, and Keywords (as a string of comma separated values rather than an array), and Private (setting that to true or false based on whether or not its pornographic)",
        "stream": False,
        "format": "json",
        "images": [
            encoded_string
        ]
    }

    # Send request to OLLAMA
    response = requests.post(gpt4_vision_url, json=json_data, headers={"Content-Type": "application/json"}, timeout=500)

    if response.status_code == 200:
        json_data = response.json()
        inner_response = json.loads(json_data['response'])
        print(inner_response)

        # Assuming the original payload already exists and has keys 'Description', 'Keywords', and 'Subject'
        payload = {
            'Title': inner_response['Title'],
            'Description': inner_response['Description'],  # Get existing values from response
            'Keywords': inner_response['Keywords'],
            'Private': inner_response['Private'],
        }
        
        # Add new keys to the payload
        payload['DescriptionSrc'] = 'Ollama'
        payload['KeywordSrc'] = 'Ollama'
        payload['TitleSrc'] = 'Ollama'

        photo_url = f"{photoprism_url}/photos/{photo['UID']}"
        update_response = requests.put(photo_url, headers=headers, json=payload)
        if update_response.status_code == 200:
            print(f"Successfully updated photo {photo['UID']}")
        else:
            print(f"Failed to update photo {photo['UID']}: {update_response.text}")
    else:
        print(f"Error processing photo {photo['UID']} with Ollama: {response.text}")

# Loop through each photo and process it
for photo in photo_list:
    process_photo(photo)
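For the "may not return proper JSON, try again" case mentioned above, a minimal retry wrapper could look like this (hypothetical helper, not part of the script):

```python
import json

def parse_json_with_retry(request_fn, max_attempts=3):
    """Call request_fn() until its string result parses as JSON; None if it never does."""
    for attempt in range(max_attempts):
        raw = request_fn()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as exc:
            print(f"Attempt {attempt + 1}: model returned invalid JSON ({exc})")
    return None
```

In the script above, `request_fn` would re-send the Ollama request and return `json_data['response']`, and the caller would skip the photo when `None` comes back.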

@lastzero commented on GitHub (May 1, 2024):

Note that with our latest release, you can use standard Bearer Authorization (or X-Auth-Token) headers with script/app-specific access tokens:
https://docs.photoprism.app/developer-guide/api/#client-authentication
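Applied to the scripts in this thread, switching from the session header to an access token would look roughly like this (the token value is a placeholder; see the linked docs for how tokens are created):

```python
def bearer_headers(access_token):
    """Standard Bearer Authorization header, replacing {"X-Session-ID": ...}."""
    return {"Authorization": f"Bearer {access_token}"}

# Swap into any of the scripts above, e.g.:
#   headers = bearer_headers("your-access-token")
#   requests.get(f"{photoprism_url}/photos?count=10", headers=headers)
```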


@sfxworks commented on GitHub (Sep 7, 2024):

Hey all,

Sorry for the delay. Had a very bad family emergency that ended with grieving.

As it stands today, there are now models that can understand video:
https://medium.com/@manish.thota1999/an-experiment-to-unlock-ollamas-potential-video-question-answering-e2b4d1bfb5ba
https://github.com/ollama/ollama/issues/3184

On top of this, I have been trying to get subtitle generation working via Whisper, which can also be run locally, through a local endpoint, or via a service.

The current plan is to utilize a grammar for proper formatting of the returned photo metadata as a suggestion/default.

I'll give this some work this week.


@sfxworks commented on GitHub (Oct 30, 2024):

Resuming work on this while leaving room for #1090.

Also, I now have a separate addition that generates subtitles for videos, based on how the VLC media player searches for them. Once LlavaNext is runnable locally (the LocalAI work needs to be completed), videos will also be able to be titled/tagged/described.


@seeschloss commented on GitHub (Nov 25, 2024):

> Note that with our latest release, you can use standard Bearer Authorization (or X-Auth-Token) headers with script/app specific access tokens: https://docs.photoprism.app/developer-guide/api/#client-authentication

Maybe I should just open an issue, but I've been scripting using `X-Auth-Token` and just tried to use `Authorization: Bearer` instead, according to the doc you linked here, and it doesn't seem to work as I expected.

The same query, even a simple `/api/v1/photos?count=1`, returns results (one result in this case) using `X-Auth-Token` with a token I retrieved from my browser. But with a token generated from the command line and given the `*` scope, I can't get anything more than `{"code":400,"error":"Unable to do that"}`. I couldn't find any way to get more information than that; the debug log doesn't give any other detail.

I think it might be due to the fact that this token isn't linked to a user, but then I have no idea how to do anything with it.


@lastzero commented on GitHub (Nov 25, 2024):

Note that you can only use `X-Auth-Token: <access_token>` or `Authorization: Bearer <access_token>`, not both at the same time.

Does it work if you use an app password or a token associated with a user? Are all API endpoints failing, or just this particular one? (API reference: https://docs.photoprism.dev/)

Do you see any errors or other hints in the debug/trace logs? (See https://docs.photoprism.app/getting-started/troubleshooting/logs/#__tabbed_1_3)


@seeschloss commented on GitHub (Nov 25, 2024):

The logs (with trace enabled) only say:

Nov 25 15:15:31 clio photoprism[629080]: time="2024-11-25T15:15:31+01:00" level=debug msg="api-v1: abort /api/v1/photos with code 400 (unable to do that)"
Nov 25 15:15:31 clio photoprism[629080]: time="2024-11-25T15:15:31+01:00" level=debug msg="server: GET /api/v1/photos?count=1 (400) [374.565µs]"

While the same request using an `X-Auth-Token` logs:

Nov 25 15:16:13 clio photoprism[629080]: time="2024-11-25T15:16:13+01:00" level=debug msg="photos: found 1 result for dist:2.000000 count:1 [61.207108ms]"
Nov 25 15:16:13 clio photoprism[629080]: time="2024-11-25T15:16:13+01:00" level=debug msg="server: GET /api/v1/photos/view?count=1 (200) [61.455744ms]"

I'm indeed using just one method or the other, not both at the same time.

As for the other methods, I hadn't tried them because they seemed more complex for my simple curl-based script, but looking again I see that app passwords can be used as a Bearer token, and they do work fine! Which I think confirms my suspicion that the error with the access token is due to it not being linked to a specific user.

*Edit:* I forgot to answer the other question: a few endpoints do work with an access token, such as `/api/v1/config` and `/api/v1/status`, but most of them answer either "Unable to do that" (400) or "Permission denied" (403).


@sfxworks commented on GitHub (Feb 4, 2025):

I had another go at it today. The new auth system is either confusing me or not working as expected.

|--------------|-------|-----------------------|------------------|-------|--------------|--------------|---------------------|---------------------|---------------------|
|  Session ID  | User  | Authentication Method |      Client      | Scope |   Login IP   |  Current IP  |     Last Active     |     Created At      |     Expires At      |
|--------------|-------|-----------------------|------------------|-------|--------------|--------------|---------------------|---------------------|---------------------|
| sess037g05sb |       | Access Token (OAuth2) | Exciting Fowl    | *     |              |              | 2025-02-04 09:18:39 | 2025-02-04 09:18:28 | 1969-12-31 23:59:59 |
| sess73r7b615 |       | Access Token (OAuth2) | Dominant Piranha | *     |              |              | 2025-02-04 09:15:09 | 2025-02-04 08:54:06 | 2026-02-04 08:54:06 |
| sessrvz4caxm | admin | Application           | photo-analyser   | *     |              |              | 2025-02-04 09:03:01 | 2025-02-04 09:02:56 |                     |
| sessfmgh8vkb | admin | Local                 | n/a              |       | 192.168.0.34 | 192.168.0.34 | 2025-02-04 09:02:34 | 2025-02-01 08:41:26 | 2025-02-15 08:41:26 |
| sessj2oj6a6m | admin | Local                 | n/a              |       | 192.168.0.34 | 192.168.0.34 | 2025-02-03 10:21:51 | 2025-02-03 10:01:57 | 2025-02-17 10:01:57 |
| sessr8lauu31 | admin | Local                 | n/a              |       | 192.168.0.34 | 192.168.0.34 | 2025-02-01 01:24:52 | 2025-01-31 03:07:18 | 2025-02-14 03:07:18 |
|--------------|-------|-----------------------|------------------|-------|--------------|--------------|---------------------|---------------------|---------------------|

I made quite a few, both through the web UI and through the CLI.

Of the combinations I've tried, only Bearer token auth with a new "Apps and Devices" entry can get the photo list, though it cannot download the photos directly:

time="2025-02-04T09:21:06Z" level=debug msg="server: GET /api/v1/users/usnjnapnhi0tzoyj/sessions?provider=application&method=default&count=10000&offset=0&order=client_name (200) [565.494µs]"
time="2025-02-04T09:21:12Z" level=debug msg="server: POST /api/v1/oauth/token (200) [212.496774ms]"
time="2025-02-04T09:21:23Z" level=debug msg="photos: found 10 results for dist:2.000000 count:10 merged:true [367.736041ms]"
time="2025-02-04T09:21:23Z" level=debug msg="server: GET /api/v1/photos?offset=0&count=10&merged=True (200) [382.667994ms]"
time="2025-02-04T09:21:23Z" level=debug msg="server: GET /api/v1/dl/6cbe5b19e01f9eb5c3e8b96134919499de62280c (403) [34.09µs]"
time="2025-02-04T09:21:23Z" level=debug msg="server: GET /api/v1/dl/87f215c40d68dc65e56b378283461a6c4f733910 (403) [17.71µs]"
time="2025-02-04T09:21:23Z" level=debug msg="server: GET /api/v1/dl/e3f2ee866f3020a3dfc1d2b20a2923a9bbe24ed9 (403) [24.04µs]"
time="2025-02-04T09:21:23Z" level=debug msg="server: GET /api/v1/dl/de1ca4a1e68b3c8dc052a0d5e25d183a3abc2f59 (403) [8.12µs]"
time="2025-02-04T09:21:23Z" level=debug msg="server: GET /api/v1/dl/375ce8dfa4b363dff276c55dc5b24e6c225510f8 (403) [9.22µs]"
time="2025-02-04T09:21:23Z" level=debug msg="server: GET /api/v1/dl/1465f695fe4350bf110a1e625686600436686bf6 (403) [16.91µs]"
time="2025-02-04T09:21:23Z" level=debug msg="server: GET /api/v1/dl/968b25071158d6afdf3b215eef6f8e964864a8be (403) [15.95µs]"
time="2025-02-04T09:21:23Z" level=debug msg="server: GET /api/v1/dl/93376eb10839a34734b0701523e2844ff002b1ed (403) [15.92µs]"
time="2025-02-04T09:21:23Z" level=debug msg="server: GET /api/v1/dl/a3e336df83c36894adc385df53684ad780c73684 (403) [23.13µs]"
time="2025-02-04T09:21:23Z" level=debug msg="server: GET /api/v1/dl/53b7563f5c16b4f5002e92e1b9006896f8ac1ff7 (403) [17.67µs]"

Creating a token with no username, to just get an `X-Auth-Token`, doesn't allow me to use the search endpoint at all:

time="2025-02-04T09:00:18Z" level=error msg="audit: 192.168.0.34 › session sess73r7b615 › search photos as  › denied"
time="2025-02-04T09:00:18Z" level=warning msg="audit: 192.168.0.34 › session sess73r7b615 › photos › search › Permission denied"
time="2025-02-04T09:00:18Z" level=debug msg="api-v1: abort /api/v1/photos with code 400 (unable to do that)"

So at this point, I'm not sure how to get photos anymore. Can anyone explain? Neither https://docs.photoprism.app/developer-guide/api/oauth2/#access-tokens nor https://docs.photoprism.app/developer-guide/api/search/#get-apiv1photos reveals any rule I've failed to follow.


@lastzero commented on GitHub (Feb 17, 2025):

@sfxworks Indeed, you might not be able to access pictures with a client-only access token, only other API endpoints, e.g. for monitoring. If you want to find and update pictures, you should bind access to a user account, e.g., by using a user app password or specifying a username when creating a token:

photoprism auth add --help

NAME:
   photoprism auth add - Adds a new authentication secret for client applications

USAGE:
   photoprism auth add [command options] [username]

DESCRIPTION:
   If you specify a username as argument, an app password will be created for this user account. It can be used as a password replacement to grant limited access to client applications.

OPTIONS:
   --name CLIENT, -n CLIENT         CLIENT name to help identify the application
   --scope SCOPES, -s SCOPES        authorization SCOPES e.g. "metrics" or "photos albums" ("*" to allow all)
   --expires LIFETIME, -e LIFETIME  authentication LIFETIME in seconds, after which access expires (-1 to disable the limit) (default: 31536000)
   --help, -h                       show help
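
A user-bound app password created this way can then be sent as a Bearer token from a script. The sketch below only builds the request; the host, the secret value, and the `photo-analyser`/`admin` names in the comment are placeholder assumptions, not a definitive setup:

```python
# Assumes an app password created for a user account, e.g.:
#   photoprism auth add --name photo-analyser --scope "photos albums" admin
import requests

BASE_URL = "https://photoprism.example.com"  # placeholder host
APP_PASSWORD = "app-password-here"           # secret printed by the CLI

session = requests.Session()
session.headers["Authorization"] = f"Bearer {APP_PASSWORD}"

# Session headers are merged into every request prepared through it.
prepared = session.prepare_request(
    requests.Request("GET", f"{BASE_URL}/api/v1/photos",
                     params={"count": 10, "offset": 0})
)
# response = session.send(prepared)  # uncomment against a real server
```

Because the token is tied to a user account, search and download endpoints that reject a client-only access token should be reachable within the granted scopes.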

@sfxworks commented on GitHub (Feb 18, 2025):

Ah gotcha. I'll give that a try tonight


@hiimhoanganan commented on GitHub (May 26, 2025):

The only reason I chose PhotoPrism to manage my media is because it has built-in local AI and can be expanded in the future. I just want to say that your work is amazing, and I’m really looking forward to seeing this project completed.
