Found 75-126 stale `occ system:cron` processes on 172.16.3.22 piling up since 2026-04-27 due to a bad oc_filecache LIKE query against pavon's ~257K filecache rows of camera footage. Killed stale procs (load 80 -> 5), wrapped apache crontab with `flock -n /tmp/oc-cron.lock` to prevent restacking. Per-user versioning disable rejected by OwnCloud Community (`files_versions` can't be enabled for groups); workaround `occ versions:cleanup pavon` identified and deferred. Migration/retention cron deferred per user. NVR architecture clarified: GeoVision NVRs sync via OC Desktop client with virtual file placeholders; no direct SMB access to Jupiter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Session Log — Pavon — 2026-04-29
User
- User: Mike Swanson (mike)
- Machine: GURU-BEAST-ROG
- Role: admin
Mode
infra — OwnCloud VM diagnostics on Jupiter, multi-host SSH, scheduled cron edits.
Session Summary
A performance issue was reported on a Windows Server 2016 VM running the SharePoint Migration Tool (SPMT) for BirthBiologic, hosted on Jupiter (Unraid, 172.16.3.20). Initial analysis wrongly attributed the high CPU usage to CPU pinning; further investigation showed the load actually came from the OwnCloud VM (172.16.3.22). The OwnCloud VM, running Rocky Linux 9.6, was severely degraded by a pile-up of stale `occ system:cron` processes. Each one ran inefficient MariaDB queries that repeatedly scanned the `oc_filecache` table, sustaining high CPU usage and a high load average. The query pattern combined a mid-string wildcard in the LIKE clause, a `COLLATE utf8mb4_general_ci` override on a `utf8mb4_bin` column, and the absence of a `(storage, name)` index, so every query became a full scan of pavon's storage.
After the stale processes were killed and the cron job was wrapped in `flock`, CPU usage dropped sharply. Further investigation of the storage configuration and data retention policy confirmed that OwnCloud is the source of truth for this data (3-year retention); no migration or cleanup was performed, at the user's request. Partway through the session, credential lookups were switched from 1Password to SOPS per user preference, which uncovered drift between the two sources for the OwnCloud VM root password.
Key Decisions
- Kill stalled cron processes immediately rather than waiting them out — system was in deep distress (load avg 80, ~70 tasks queued waiting on CPU) and there was no risk to user data
- Wrap cron with `flock -n` instead of disabling cron outright: preserves OwnCloud's normal background-job processing while making the stacking spiral structurally impossible
- Defer per-user versioning disable: OwnCloud Community rejected the `--groups` flag for `files_versions`. Workaround identified (`occ versions:cleanup pavon` on a daily cron) but deferred per user request
- Did not bypass OwnCloud for ingest: user clarified that the GeoVision NVRs at the Curves and Raiders sites use the OwnCloud Desktop sync client with virtual file placeholders to save NVR-local disk; the NVR units lack direct SMB access to Jupiter/Saturn, so the existing ingest path stays
- Defer the migration/retention scripts — current state (cron-flock + status quo on versioning) is stable enough; user wants to revisit the architectural cleanup with more time
Problems Encountered
- Incorrect initial diagnosis of CPU pinning: the first hypothesis was that the Windows VM's 16 vCPUs were oversubscribed onto 8 host threads, but `virsh dumpxml` showed no `<cputune>` block at all on the Windows VM. Re-pivoted by checking which qemu process actually owned the load, and found OwnCloud
- Stale `occ system:cron` processes: 75-126 piled up since 2026-04-27. With no flock wrapper, every 15-minute tick fired a new run while previous ones were still grinding. Resolved with `pkill -f 'occ system:cron'`, then `pkill -9` for stragglers
- Inefficient MariaDB query pattern in OwnCloud's filecache scanner: `name COLLATE utf8mb4_general_ci LIKE 'pattern%middle%suffix'` overrides the column's `utf8mb4_bin` collation and puts the wildcard mid-string, so no index on `name` can serve it. The only index containing `name` is `(parent, name)`; there is no `(storage, name)`. Each query becomes a 257K-row scan on storage 78
- Failed attempt to disable versioning for one user: `app:enable files_versions --groups <group>` was rejected with "files_versions can't be enabled for groups." Reverted by re-enabling globally
- Credential drift between 1Password and SOPS: SOPS had `r3tr0gradE99!!` for OwnCloud VM root, 1Password had `Paper123!@#-unifi!`. The SOPS value worked; the 1Password value did not. Worth reconciling
- PowerShell mangling shell metacharacters: repeatedly tripped on `$()`, `*`, `(`, and backticks when invoking SSH inline. Worked around by writing scripts to local files, scp-ing them, then executing remotely
Credentials Used / Discovered
Jupiter (Unraid Primary)
- Host: 172.16.3.20
- User: root
- Password (1Password and SOPS agree): `Th1nk3r^99##`
- SOPS path: `infrastructure/jupiter-unraid-primary.sops.yaml`
- 1Password item id: `5ji4rsgvn6feare6fahxsqauui`
OwnCloud VM (Rocky Linux 9.6)
- Host: 172.16.3.22
- Hostname: cloud.acghosting.com
- User: root
- Password (SOPS, works): `r3tr0gradE99!!`
- Password (1Password, STALE, does NOT work): `Paper123!@#-unifi!` ← needs reconciliation
- SOPS path: `infrastructure/owncloud-vm.sops.yaml`
- 1Password item id: `h6usgzxxn26kvckxz5dhssxdai`
- SSH host key (ed25519): `SHA256:Yy4oFv5HudmKjNJ4IZgHcuSSmeBvUg+ZJta6iLasdqU`
MariaDB on OwnCloud VM
- Local socket auth as root from the VM (no password prompt for the `mysql` CLI when run as root)
- OwnCloud's app DB: `owncloud` schema, user `owncloud@localhost`
Infrastructure Touched
| Host | IP | Role | Action |
|---|---|---|---|
| Jupiter (Unraid Primary) | 172.16.3.20 | Hypervisor | Read-only diagnostics: virsh list, virsh dumpxml, virsh vcpuinfo, mpstat, ps |
| OwnCloud VM | 172.16.3.22 | Rocky Linux 9.6, runs OwnCloud + MariaDB 10.5.29 | pkill cron processes, edited apache crontab, group/app changes (reverted) |
| Uranus (Unraid Secondary) | 172.16.3.21 | SMB share host (Storage share) | None this session; referenced as future archive target |
VM cputune findings on Jupiter
- Windows Server 2016 VM: no CPU pinning. Topology `sockets=1, cores=8, threads=2` (16 vCPUs). Memory 16 GB current / 32 GB max with ballooning.
- OwnCloud VM: vCPUs 0-7 pinned 1:1 to host CPUs 0,28,1,29,2,30,3,31, exactly the eight logical processors (LPs) that were saturated. Topology `cores=4, threads=2` (8 vCPUs).
OwnCloud users and groups
- 10 users total: Martell, anaise, bst, jburger, mara, minrec, pavon, rohrbach, sysadmin, themarcgroup
- Existing groups: ACG, Clients, PST, QMS, Stamback, Stoltz, WCP, admin
- Only sysadmin is in any group (ACG, admin); all other users are unaffiliated
Storage layout (OwnCloud VM)
- OwnCloud data dir: `/owncloud`, NFS-mounted from `172.16.3.20:/mnt/user/OwnCloud`
- Filesystem state: 932 GB total, 677 GB used, 248 GB free, 74% full
- Pavon's storage (numeric_id 78, `home::pavon`):
  - `/owncloud/pavon/files/Curves/`: 188,920 files (all sub-paths `Curves/Data-F/CamNN/YYYYMMDD/Event*.Avi` or `.Wav`)
  - `/owncloud/pavon/files/Raiders/`: 48,978 files (`Raiders/Cameras{,2}/CamNN/YYYYMMDD/Event*.Avi`)
  - Two NVR log files at root (`NVR-18019140.out`, `NVR-18082322.out`, ~16K each)
  - Total ~237K files
- 30 GB of accumulated junk versions in `/owncloud/pavon/files_versions/` (1,326 version files, 1,383 filecache rows)
- External storage (numeric_id 6, mount_id 6):
  - Mount point in pavon's view: `/Archive`
  - Backend: SMB Personal (unique file IDs)
  - Host: 172.16.3.21 (Uranus)
  - Share: `Storage`
  - SMB user: `owncloud`
  - `filesystem_check_changes` already set to 0: OwnCloud doesn't auto-rescan this storage on cron
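The 74% figure looks inconsistent with 677/932 (about 72.6%), but it matches how GNU `df` computes its Use% column, assuming that is where these numbers came from: used divided by (used + available), rounded up. Space the filesystem holds back (here 932 - 677 - 248 = 7 GB) is left out of the denominator. A small sketch of that arithmetic; `df_use_pct` is a hypothetical helper name:

```bash
# Hypothetical helper: reproduce GNU df's Use% from used/available sizes.
# df divides by (used + available), not by the total size, and rounds up.
df_use_pct() {
  local used=$1 avail=$2 total
  total=$((used + avail))
  # Integer division, plus 1 if there was any remainder (i.e. a ceiling).
  echo $(( used * 100 / total + (used * 100 % total != 0) ))
}

df_use_pct 677 248   # 67700 / 925 = 73.19..., rounded up: prints 74
```

The same rounding is why a nearly full filesystem can show 100% before it is literally out of blocks.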
File age distribution (pavon)
2024: 1 file (oldest from 2024-12-21)
2025: 162,898 files
2026: 74,719 files
Older than 365 days: 256 files (as of 2026-04-29)
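The per-year counts above can be reproduced from the on-disk tree with GNU `find`'s `-printf '%TY'`, which prints each file's four-digit modification year. A minimal sketch, assuming GNU findutils on the OwnCloud VM and counting files on disk rather than `oc_filecache` rows; `files_by_year` is a hypothetical name:

```bash
# Hypothetical helper: count regular files under a directory by mtime year.
# %TY is GNU find's four-digit modification year; output lines are "YEAR COUNT".
files_by_year() {
  find "$1" -type f -printf '%TY\n' | sort | uniq -c | awk '{ print $2, $1 }'
}
```

For example, `files_by_year /owncloud/pavon/files`. An "older than N days" count can come from `find /owncloud/pavon/files -type f -mtime +365 | wc -l`; note that `-mtime +365` means more than 365 whole days, so a file 365.5 days old is not counted.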
Architecture (NEW context from this session)
- GeoVision NVR units at Curves and Raiders client sites
- Each NVR runs OwnCloud Desktop sync client with the sync folder pointed at the NVR's data drop directory
- After upload, OwnCloud client converts the local file to a virtual file placeholder to conserve NVR-local disk; if the NVR ever needs the file, the placeholder triggers a re-download
- NVR units have NO direct SMB access to Jupiter or Saturn — they reach OwnCloud only via the WebDAV interface used by the desktop client
- OwnCloud is the source of truth for this footage, not a backup of it
- Retention policy: 3 years. Older may be deleted
- Pavon never uses OwnCloud directly — only the NVR interface for footage retrieval
- GeoVision has no built-in age-based file routing — can't move old files to a different folder on the NVR side
Commands & Outputs (Critical)
Identifying the runaway VM (on Jupiter)
```bash
# head -4 = one header line plus the top three processes
ssh root@172.16.3.20 'ps -eo pid,pcpu,pmem,comm,args --sort=-pcpu | head -4'
# PID 15343 486% CPU qemu-system-x86_64 ... guest=OwnCloud
# PID 2349755 118% CPU qemu-system-x86_64 ... guest=Windows Server 2016
# PID 13887 25% CPU qemu-system-x86_64 ... guest=Unifi
mpstat -P 0,1,2,3,28,29,30,31 1 1
# All eight LPs showed %guest near 100, %idle near 0 — load was guest VM, not host
```
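The re-pivot that found the real culprit came down to reading `guest=` out of each qemu process's arguments. A small filter that does the same over `ps` output, assuming libvirt's usual `-name guest=<name>,debug-threads=on` argument form and `pcpu` as the second column; `qemu_guest_cpu` is a hypothetical name:

```bash
# Hypothetical filter: read "pid pcpu args..." lines (e.g. from ps) and print
# "pcpu guest-name" for every process whose arguments contain guest=<name>.
# The name runs up to the next comma, so names containing spaces
# ("Windows Server 2016") survive intact.
qemu_guest_cpu() {
  awk '/guest=/ {
    pcpu = $2
    name = $0
    sub(/.*guest=/, "", name)   # drop everything up to and including guest=
    sub(/,.*/, "", name)        # drop ",debug-threads=on" and the rest
    print pcpu, name
  }'
}
```

On Jupiter, `ps -eo pid,pcpu,args --sort=-pcpu | qemu_guest_cpu | head -n 3` would list the three busiest guests by name; the `ps` header line has no `guest=` and is filtered out.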
OwnCloud cputune confirming pin (on Jupiter)
```bash
virsh dumpxml OwnCloud | grep -A 10 cputune
# <vcpupin vcpu='0' cpuset='0'/>
# <vcpupin vcpu='1' cpuset='28'/>
# ...
# <vcpupin vcpu='7' cpuset='31'/>
```
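To compare pinning against the saturated LPs without reading the XML by eye, the `<vcpupin>` lines can be reduced to a host CPU list and passed straight to `mpstat -P`. A sketch assuming libvirt's `vcpu='N' cpuset='M'` attribute order as shown above; `vcpu_pins` and `pinned_cpus` are hypothetical helpers:

```bash
# Hypothetical helpers. vcpu_pins: read libvirt domain XML (virsh dumpxml) on
# stdin and print one "vcpu=N cpuset=M" line per <vcpupin> element.
vcpu_pins() {
  sed -n "s/.*<vcpupin vcpu=[\"']\([0-9]*\)[\"'] cpuset=[\"']\([^\"']*\)[\"'].*/vcpu=\1 cpuset=\2/p"
}

# pinned_cpus: collapse those pins into a comma-separated host CPU list,
# in vCPU order, which is a form mpstat -P accepts.
pinned_cpus() {
  vcpu_pins | sed 's/.*cpuset=//' | paste -s -d, -
}
```

For the pinning recorded above, `virsh dumpxml OwnCloud | pinned_cpus` should yield `0,28,1,29,2,30,3,31`, so `mpstat -P "$(virsh dumpxml OwnCloud | pinned_cpus)" 1 1` watches exactly the OwnCloud VM's host CPUs. A domain with no `<cputune>` block, like the Windows VM, yields no CPU numbers at all.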
Stale cron count and load (on OwnCloud VM)
```bash
ps -ef | grep 'occ system:cron' | grep -v grep | wc -l
# 126
uptime
# load average: 80.38, 77.60, 76.98
```
The bad query pattern (sample)
```sql
SELECT `fileid`, `storage`, `path`, `parent`, `name`, `mimetype`, `mimepart`,
`size`, `mtime`, `encrypted`, `etag`, `permissions`, `checksum`
FROM `oc_filecache`
WHERE `storage` = '78'
AND `name` COLLATE utf8mb4_general_ci LIKE 'Event20260412190705025.Wav.v%.d1776373045'
```
oc_filecache schema confirming the index situation
```sql
CREATE TABLE oc_filecache (
...
`name` varchar(250) DEFAULT NULL,
PRIMARY KEY (`fileid`),
UNIQUE KEY `fs_storage_path_hash` (`storage`, `path_hash`),
KEY `fs_parent_name_hash` (`parent`, `name`),
KEY `fs_storage_mimetype` (`storage`, `mimetype`),
KEY `fs_storage_mimepart` (`storage`, `mimepart`),
KEY `fs_storage_size` (`storage`, `size`, `fileid`),
KEY `fs_parent_storage_size` (`parent`, `storage`, `size`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
```
No (storage, name) index exists. Combined with the COLLATE utf8mb4_general_ci override and mid-string wildcard, every query becomes a full scan of the 257K rows in storage 78.
Killing the cron stack
```bash
pkill -f 'occ system:cron'
sleep 3
pkill -9 -f 'occ system:cron'
# Connection dropped briefly during the kill due to system shock — expected
# Reconnected: 0 stale processes; load avg dropped 80 -> 27 -> 5 within ~3 minutes
```
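The kill sequence above (SIGTERM, short pause, SIGKILL for stragglers) can be packaged as a reusable function. A sketch that scans `/proc/*/cmdline` directly instead of depending on `pkill`, assuming a Linux `/proc`; `find_pids` and `term_then_kill` are hypothetical names. Like `pkill -f`, it matches any process whose full command line contains the pattern, so it skips the calling shell.

```bash
# Hypothetical helpers; Linux /proc assumed.
# find_pids PATTERN: print PIDs whose full command line contains PATTERN.
find_pids() {
  local needle=$1 f pid args
  for f in /proc/[0-9]*/cmdline; do
    pid=${f#/proc/}
    pid=${pid%/cmdline}
    [ "$pid" = "$$" ] && continue              # never match the calling shell
    # cmdline is NUL-separated; processes that exit mid-scan are skipped
    args=$(tr '\0' ' ' 2>/dev/null < "$f") || continue
    case $args in
      *"$needle"*) echo "$pid" ;;
    esac
  done
}

# term_then_kill PATTERN [GRACE]: SIGTERM every match, wait GRACE seconds
# (default 3, as in the session), then SIGKILL whatever still matches.
term_then_kill() {
  local needle=$1 grace=${2:-3} pids
  pids=$(find_pids "$needle")
  [ -z "$pids" ] && return 0
  kill -TERM $pids 2>/dev/null || true         # unquoted on purpose: one arg per PID
  sleep "$grace"
  pids=$(find_pids "$needle")
  [ -n "$pids" ] && { kill -KILL $pids 2>/dev/null || true; }
  return 0
}
```

On the OwnCloud VM this would be `term_then_kill 'occ system:cron' 3`, followed by `find_pids 'occ system:cron' | wc -l` to confirm nothing is left.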
Active queries after cleanup
```sql
SELECT COUNT(*) FROM information_schema.PROCESSLIST WHERE COMMAND != 'Sleep';
-- 1 (just my own query)
```
Configuration Changes (PERSISTED)
OwnCloud VM — apache crontab (/var/spool/cron/apache)
Before (caused stacking):
```
*/15 * * * * /usr/bin/php -f /var/www/owncloud/occ system:cron
```
After (current production):
```
*/15 * * * * /usr/bin/flock -n /tmp/oc-cron.lock /usr/bin/php -f /var/www/owncloud/occ system:cron
```
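`flock -n` makes a new tick bail out immediately if the previous run still holds the lock, so at most one cron-launched `occ system:cron` runs at a time. A manual `occ system:cron` run outside the wrapper does not take the lock and is not covered.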
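The `flock -n` behaviour is easy to confirm before relying on it: while one process holds the lock, a second `flock -n` on the same file exits with status 1 immediately instead of waiting. A sketch of a quick check, assuming util-linux `flock`; `lock_is_free` is a hypothetical helper. Because it briefly takes the lock itself, a cron tick firing at that exact instant would simply skip, which is harmless here.

```bash
# Hypothetical helper: succeed if nobody currently holds the lock file.
# flock -n tries the lock without waiting; if it is already held, flock exits
# with status 1 and never runs the command. "true" is just a no-op command.
lock_is_free() {
  flock -n "$1" true
}
```

On the VM, `lock_is_free /tmp/oc-cron.lock || echo "previous cron run still active"` shows whether a protected run is in progress.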
Backup: /root/apache-crontab.backup-20260428-pre-flock (67 bytes, contains the original line)
Group changes (NOT REVERTED)
A group named versioning_users was created on the OwnCloud VM during the failed per-user-versioning attempt, with all 9 non-pavon users added. The intent was to scope files_versions to that group, but app:enable --groups was rejected. The group still exists with those memberships. It's harmless (no app uses it) but worth knowing about for next session — could be deleted with:
```bash
sudo -u apache php /var/www/owncloud/occ group:delete versioning_users
```
files_versions app
Was DISABLED briefly during the failed attempt; re-ENABLED globally to restore status quo. Currently enabled for all users as before.
Pending / Incomplete Tasks
All deferred per user request. The system is stable now thanks to the flock fix; the items below are improvements, not emergencies.
| # | Task | Status | Notes |
|---|---|---|---|
| 1 | Investigate pavon storage layout | DONE | Findings captured above |
| 2 | Disable versioning for pavon | DEFERRED | Approach: daily cron occ versions:cleanup pavon + occ trashbin:cleanup pavon (also clean existing 30 GB of pavon's accumulated versions) |
| 3 | Set up external storage mount for archive | DONE (already existed) | Storage 6, /Archive, SMB to Uranus, filesystem_check_changes already 0 |
| 4 | Disable trash on external archive storage | DEFERRED | Mooted by retention design — find -mtime +1095 -delete bypasses OwnCloud trash anyway |
| 5 | Build monthly migration cron (internal → /Archive) | DEFERRED | NVR architecture forces files to land internal first. Cutoff target: 90 days. Caveat: /Archive uses "SMB Personal (unique file IDs)" backend — host-level CIFS moves may break file-ID invariant |
| 6 | Build 3-year retention pruning cron | DEFERRED | Weekly `find /Archive -type f -mtime +1095 -delete`, then `occ files:scan --path="/pavon/files/Archive"` |
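Tasks 5 and 6 both hinge on selecting files by age with `find -mtime`. A sketch that defaults to a dry run and only deletes when explicitly asked, assuming GNU findutils (`-delete`); `prune_older_than` is a hypothetical name, and the paths in the usage note are illustrative, not a decided layout.

```bash
# Hypothetical helper: list (default) or delete regular files older than DAYS.
# find -mtime +N matches files whose age, in whole 24-hour periods, exceeds N,
# so +1095 means "more than 1095 full days" (about 3 years).
prune_older_than() {
  local dir=$1 days=$2 mode=${3:-dry-run}
  if [ "$mode" = "apply" ]; then
    find "$dir" -type f -mtime +"$days" -print -delete
  else
    find "$dir" -type f -mtime +"$days" -print
  fi
}
```

`prune_older_than /owncloud/pavon/files 90 | wc -l` would size a 90-day migration batch without touching anything; the weekly retention job would be `prune_older_than <archive path> 1095 apply` followed by an `occ files:scan` of pavon's `/Archive` so the filecache drops the deleted rows.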
Concrete next session checklist
- Decide approach for pavon versioning: A (disable globally), C (aggressive 30-day migration), or D (`versions:cleanup pavon` daily cron). D is what the user proposed
- If D: run a one-time `occ versions:cleanup pavon` (reclaims 30 GB) plus `occ trashbin:cleanup pavon`, then schedule a `0 3 * * *` cron
- Decide the migration cutoff (90 days is the current target; could go shorter for capacity reasons since `/owncloud` is 74% full)
- Build the migration script. Open question: use the OwnCloud API, or a host-level CIFS mount + `mv`
- Build the retention pruning cron, either on Uranus or via a CIFS mount on the OwnCloud VM
- Reconcile the 1Password OwnCloud password (currently has stale value `Paper123!@#-unifi!`; should be `r3tr0gradE99!!` per SOPS)
- Optionally clean up: `occ group:delete versioning_users` (created but unused this session)
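If option D is chosen, the daily job has to be added without disturbing the flock-wrapped line already in apache's crontab. A sketch of an idempotent edit that filters the crontab as plain text, assuming the standard `crontab -u apache -l` / `crontab -u apache -` round trip; `ensure_line` is a hypothetical helper, and the 03:00 cleanup line is the proposal from this checklist, not something already deployed.

```bash
# Hypothetical filter: copy stdin to stdout, then append LINE only if no
# existing line matches it exactly, so re-running never duplicates the job.
ensure_line() {
  local line=$1 found=0 l
  while IFS= read -r l || [ -n "$l" ]; do
    printf '%s\n' "$l"
    [ "$l" = "$line" ] && found=1
  done
  [ "$found" -eq 1 ] || printf '%s\n' "$line"
}
```

Usage on the VM: `crontab -u apache -l | ensure_line '0 3 * * * /usr/bin/php -f /var/www/owncloud/occ versions:cleanup pavon' | crontab -u apache -`. Running it a second time leaves the crontab unchanged.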
Reference
File paths
- OwnCloud install root: `/var/www/owncloud`
- OwnCloud occ command: `sudo -u apache php /var/www/owncloud/occ ...`
- OwnCloud data dir: `/owncloud` (NFS to Jupiter)
- Pavon's home files: `/owncloud/pavon/files/`
- Pavon's versions junk: `/owncloud/pavon/files_versions/` (30 GB, 1,326 files)
- Apache crontab: `/var/spool/cron/apache`
- Crontab backup: `/root/apache-crontab.backup-20260428-pre-flock`
- Cron lock file: `/tmp/oc-cron.lock` (used by the flock wrapper)
Useful occ commands for next session
```bash
# Per-user version cleanup (deletes BOTH disk files and filecache rows)
sudo -u apache php /var/www/owncloud/occ versions:cleanup pavon
sudo -u apache php /var/www/owncloud/occ trashbin:cleanup pavon
# Trigger a scan after filesystem-level changes
sudo -u apache php /var/www/owncloud/occ files:scan pavon
# Scan only the /Archive mount; --path takes /<user>/files/<folder>
sudo -u apache php /var/www/owncloud/occ files:scan --path="/pavon/files/Archive"
# Per-user expire (uses retention policy)
sudo -u apache php /var/www/owncloud/occ versions:expire pavon
sudo -u apache php /var/www/owncloud/occ trashbin:expire pavon
# External storage management
sudo -u apache php /var/www/owncloud/occ files_external:list
sudo -u apache php /var/www/owncloud/occ files_external:option <id> filesystem_check_changes 0
```
Local files created this session
All under c:/Users/guru/ClaudeTools/temp/ (small bash scripts uploaded via pscp to /tmp on OwnCloud VM):
- `owncloud-investigate.sh`: initial pavon storage walk
- `owncloud-versioning-check.sh`: schema/state check
- `owncloud-groups-check.sh`: group enumeration
- `owncloud-pavon-groups.sh`: per-user group mapping
- `owncloud-versioning-restrict.sh`: failed group-restrict attempt
- `occ-versions-help.sh`: discover available occ subcommands
These are scratch scripts; no need to preserve.
MariaDB on OwnCloud VM (cheat sheet)
- Version: MariaDB 10.5.29
- Local CLI: `mysql owncloud --skip-column-names <<<'SQL...'` works as root via socket auth
- `mysql -e 'SQL'` does NOT work when sent through plink from PowerShell, which mangles the quoting. Pass the SQL on stdin with a here-string (`<<<`) or heredoc (`<<EOF`) instead
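Feeding SQL on stdin also avoids shell expansion problems on the Linux side: a quoted heredoc delimiter (`<<'SQL'`) stops the shell from expanding `$`, backticks, or backslashes, so MariaDB's backtick-quoted identifiers arrive untouched. A sketch; `storage_row_count_sql` is a hypothetical helper, and storage 78 is pavon's `home::pavon` storage from above.

```bash
# Hypothetical helper: emit the SQL text only; the caller decides where it runs.
# The quoted 'SQL' delimiter disables all expansion inside the heredoc body,
# so the backticks below are passed through literally.
storage_row_count_sql() {
  cat <<'SQL'
SELECT COUNT(*) FROM `oc_filecache` WHERE `storage` = 78;
SQL
}
```

On the VM, `storage_row_count_sql | mysql owncloud --skip-column-names` would return the row count behind the 257K-row scans.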
Notes for Howard / Future Mike
- Don't kill the OwnCloud VM while this OwnCloud + GeoVision NVR architecture is in place — pavon's NVRs are dependent on it being reachable to either upload new footage OR rehydrate placeholders if footage is requested
- The flock wrapper is the only thing keeping the cron stable right now. Any future change to `/var/spool/cron/apache` should preserve `flock -n /tmp/oc-cron.lock`
- The bad query is a property of OwnCloud's `files_versions` background job + the camera filename pattern + the filecache schema, not of anything we did. If we ever upgrade OwnCloud, retest under load before declaring victory
- Storage 6 (`/Archive`) is already configured correctly; don't recreate it, just use it
- 30 GB cleanup waiting: `occ versions:cleanup pavon` will reclaim it in a single run when you're ready
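Since the flock wrapper is load-bearing, a quick pre-flight check can catch an edit that drops it. A sketch that reads a crontab on stdin and fails if any active line runs the background job without the guard; `check_flock_guard` is a hypothetical name, and it matches the literal lock path used in production.

```bash
# Hypothetical pre-flight check. Reads a crontab on stdin; returns 1 (naming
# the offending line on stderr) if an active line runs the OwnCloud background
# job without "flock -n /tmp/oc-cron.lock". Blank and comment lines are ignored.
check_flock_guard() {
  local l bad=0
  while IFS= read -r l || [ -n "$l" ]; do
    case $l in
      '' | '#'*) continue ;;
      *'occ system:cron'*)
        case $l in
          *'flock -n /tmp/oc-cron.lock'*) ;;
          *) echo "unguarded: $l" >&2; bad=1 ;;
        esac ;;
    esac
  done
  return "$bad"
}
```

Before and after any crontab edit: `crontab -u apache -l | check_flock_guard && echo "flock guard present"`.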
Status at session end
- OwnCloud VM load: 5.95 (1-min) and still falling from a peak of 80, recovering toward normal
- MariaDB CPU: oscillating 0-140% (normal background)
- Stale cron processes: 0
- New cron via flock: armed; first protected tick will fire within 15 minutes of the change
- BirthBiologic SPMT (the original ticket): unchanged on the Windows VM side; the user stopped SPMT manually during diagnosis. Decision pending whether to resume SPMT or switch to OneDrive client (separate question discussed mid-session)