starred/photoprism

mirror of https://github.com/photoprism/photoprism.git synced 2026-03-02 22:57:18 -05:00

People: Find all faces in videos #1806

Open

opened 2026-02-20 01:00:28 -05:00 by deekerman · 4 comments

deekerman commented

2026-02-20 01:00:28 -05:00

Owner

Originally created by @jNullj on GitHub (Jun 17, 2023).

Describe what problem this solves and why this would be valuable to many users

When searching for faces in videos, I noticed that only faces from the thumbnail seem to show up.
Videos can include multiple people in frames that do not appear in the thumbnail.

Describe the solution you'd like

Search for faces across multiple frames of a video, ideally with minimal impact on performance while still recognizing all faces in the video.

Describe alternatives you've considered

I am brainstorming here:

1. We might search every frame, but I guess that could lead to a huge performance hit.
2. We might find a way to recognize key frames where new faces appear (maybe a TensorFlow net for counting faces is less demanding than face recognition?). We might even detect which of several similar frames has a better chance of yielding a correct recognition (maybe we could score this somehow?).
3. We might smartly search random frames, and the admin could choose how many frames to scan, trading coverage against performance. This would be simple to implement relative to 2, with less of a performance hit than 1.
4. There might be a smart way to detect large changes between frames, so we could search each time we see a drastic change from the last baseline frame.
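
Option 4 could be sketched roughly as follows. This is only an illustration in Python, not PhotoPrism code: it assumes the frames are already decoded into equally sized flat lists of grayscale values (0-255), and the function names and the threshold of 30 are made up for the example.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[int], b: Sequence[int]) -> float:
    """Average absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_changed_frames(frames: Sequence[Sequence[int]], threshold: float = 30.0) -> List[int]:
    """Return indices of frames that differ strongly from the last baseline frame.

    The first frame is always selected and becomes the baseline. Whenever a
    later frame differs from the baseline by more than `threshold` (a
    hypothetical value), it is selected and becomes the new baseline.
    """
    selected: List[int] = []
    baseline = None
    for index, frame in enumerate(frames):
        if baseline is None or mean_abs_diff(frame, baseline) > threshold:
            selected.append(index)
            baseline = frame
    return selected


# Tiny synthetic "video": four frames of four pixels each.
frames = [
    [10, 10, 10, 10],      # first frame, always scanned
    [12, 11, 10, 9],       # nearly identical to the baseline, skipped
    [200, 200, 200, 200],  # drastic change, scanned and used as new baseline
    [198, 201, 200, 199],  # close to the new baseline, skipped
]
print(select_changed_frames(frames))  # [0, 2]
```

Only the selected frames would then be handed to the (much more expensive) face detection, so the threshold is effectively the knob for frames vs. performance.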

Additional context

That might be relevant to labels as well. But I think this would be trickier to implement, as we can't really look for a specific change in frames like we can with faces. This might prove to be costly.

deekerman added the idea, video, faces, labels labels 2026-02-20 01:00:28 -05:00

deekerman commented

2026-02-20 01:00:28 -05:00

@ItsMeRitch commented on GitHub (Nov 28, 2023):

Would also love to see this implemented.

Even simply creating a set number of thumbnails per minute of video, instead of just the one thumbnail, and scanning them all would do. It would be a first of its kind, as I don't believe other similar software can do this, including Apple Photos.
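
To make the "set number of thumbnails per minute" idea concrete, here is a small Python sketch of how the timestamps could be picked. The function name `thumbnail_timestamps`, the default of 2 per minute, and the choice to sample the middle of each segment are all assumptions for illustration, not anything PhotoPrism does today.

```python
import math
from typing import List


def thumbnail_timestamps(duration_s: float, per_minute: float = 2.0, minimum: int = 1) -> List[float]:
    """Return evenly spaced timestamps (in seconds) at which to grab a thumbnail.

    The number of thumbnails grows with the video length. Each timestamp sits
    in the middle of its segment, so the first one is not the very first frame
    and the last one stays inside the video.
    """
    if duration_s <= 0:
        return []
    count = max(minimum, math.ceil(duration_s / 60.0 * per_minute))
    segment = duration_s / count
    return [round(segment * (i + 0.5), 3) for i in range(count)]


print(thumbnail_timestamps(90, per_minute=2))  # [15.0, 45.0, 75.0]
```

A 90-second clip at 2 thumbnails per minute gives three frames to scan instead of one; very short clips still get at least one.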

deekerman commented

2026-02-20 01:00:29 -05:00

@kamilzierke commented on GitHub (Dec 27, 2023):

Definitely bumping this one. It could be done in phases, or with the options already mentioned:

- easier one: create a set of thumbs per video (x thumbs per minute or x thumbs per video), stack them for face recognition, collage them for the main thumbnail
- harder one: key frames for face recognition
- godlike: analysis of all frames, building an independent model for each face
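
One thing the "easier" option would need: if several thumbs are scanned, the same person will usually be found more than once, so the results have to be merged per video. A rough Python sketch of that step, assuming each detected face is already represented by an embedding vector; the function names, the greedy strategy and the distance threshold of 0.6 are illustrative guesses, not PhotoPrism's actual matching logic.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def merge_faces(frames: Sequence[Sequence[Sequence[float]]], max_distance: float = 0.6) -> List[List[float]]:
    """Collect the distinct faces seen across all sampled frames.

    `frames` holds, per sampled frame, the embeddings of the faces found in it.
    A face is treated as new only if it is farther than `max_distance`
    (a hypothetical threshold) from every face kept so far; the first
    occurrence of each face wins.
    """
    unique: List[List[float]] = []
    for faces in frames:
        for embedding in faces:
            if all(euclidean(embedding, known) > max_distance for known in unique):
                unique.append(list(embedding))
    return unique


# Toy 2-D embeddings: person A appears in both frames, person B only in the second.
sampled = [
    [[0.0, 0.0]],
    [[0.1, 0.0], [1.0, 1.0]],
]
print(len(merge_faces(sampled)))  # 2
```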

deekerman commented

2026-02-20 01:00:29 -05:00

@ItsMeRitch commented on GitHub (Dec 27, 2023):

I also think part of video face recognition is already there. When importing a video, a thumbnail is created and face recognition is carried out on that (correct me if I'm wrong), so the infrastructure should be there for it.

deekerman commented

2026-02-20 01:00:29 -05:00

@kamilzierke commented on GitHub (Dec 27, 2023):

So I've read some of the PhotoPrism documentation, and it seems that it uses a pre-trained FaceNet model and the Go version of TensorFlow to work on the datasets. That means face recognition is only as good as the pre-trained model: the model is not trained locally "per se", and no model updates are done locally.

Creating frames for a video using ffmpeg can be done as easily as this:

PowerShell (in my case the video files are in the "sorted" subdirectory)

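$ErrorActionPreference = "Stop"
$desiredFrames = 10 # Number of frames to generate
$idLength = 2 # Length of the zero-padded frame index in the file name
$imageFormat = "png" # Image format (e.g., png, jpg)

# Video file extensions to look for
$videoExtensions = @("*.mp4", "*.avi", "*.mov", "*.mkv", "*.flv", "*.wmv", "*.webm")

# Search the "sorted" folder and its subfolders
Get-ChildItem "sorted" -Recurse -Include $videoExtensions | ForEach-Object {
    $baseName = $_.BaseName
    # ffprobe prints the duration in seconds as text, so convert it to a number
    $durationText = & ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 $_.FullName
    $duration = [double]$durationText
    # Divide by $desiredFrames (not $desiredFrames - 1) so the last timestamp stays inside the video.
    # No rounding down, so videos shorter than a few seconds do not end up with an interval of 0.
    $interval = $duration / $desiredFrames

    for ($i = 0; $i -lt $desiredFrames; $i++) {
        # Format with "." as the decimal separator regardless of the system locale
        $timestamp = ($interval * $i).ToString("0.###", [cultureinfo]::InvariantCulture)
        $frameNum = "{0:D$idLength}" -f $i
        $outputFileName = "$($baseName)_$frameNum.$imageFormat"
        & ffmpeg -ss $timestamp -i $_.FullName -frames:v 1 $outputFileName
    }
}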
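
For anyone not on Windows, the same idea might look roughly like this in Python. The sketch only builds the ffmpeg argument lists (with the same `-ss`, `-i` and `-frames:v 1` options as the script above) and leaves running them to the caller. The function name `frame_commands` and writing the images next to the video are assumptions; the duration is expected to come from the ffprobe call shown above.

```python
from pathlib import Path
from typing import List


def frame_commands(video: str, duration_s: float, frames: int = 10,
                   image_format: str = "png") -> List[List[str]]:
    """Build one ffmpeg command per frame, evenly spread over the video.

    `duration_s` is the video length in seconds, e.g. as printed by ffprobe.
    The interval is duration / frames, so every timestamp lies inside the video.
    """
    if frames < 1 or duration_s <= 0:
        return []
    path = Path(video)
    interval = duration_s / frames
    commands = []
    for i in range(frames):
        # Output goes next to the video, e.g. holiday_00.png, holiday_01.png, ...
        output = path.with_name(f"{path.stem}_{i:02d}.{image_format}")
        commands.append([
            "ffmpeg", "-ss", f"{interval * i:.3f}", "-i", str(path),
            "-frames:v", "1", str(output),
        ])
    return commands


for cmd in frame_commands("sorted/holiday.mp4", duration_s=20.0, frames=4):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to actually extract the frame.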

So, although I may not know what I'm talking about, it seems that we won't get any major face recognition upgrades unless we get a better pre-trained model for the purpose.
