
Develop #160

Merged
solderzzc merged 50 commits into master from develop
Mar 18, 2026
Conversation

@solderzzc
Member

No description provided.

solderzzc and others added 30 commits March 15, 2026 14:27
- Move skill: skills/annotation/ → skills/segmentation/
- Add deploy scripts (deploy.sh, deploy.bat)
- Update README: new Segmentation category row, mark as ✅ Ready
- Update skills.json with sam2-segmentation entry
…ls.json

- New skill: skills/annotation/dataset-management/ with deploy scripts
- Update sam2-segmentation SKILL.md
- Update skills.json with new entries
feat: TensorRT FP16 backend for depth estimation
…, and benchmark

- deploy.bat: Windows bootstrapper with Python discovery, venv creation,
  env_config hardware detection, CUDA/CPU pip install, and TensorRT pre-build
- requirements_cpu.txt: CPU-only PyTorch + depth-anything-v2 dependencies
- requirements_cuda.txt: CUDA 12.4 PyTorch + TensorRT dependencies
- benchmark.py: cross-platform benchmark supporting CoreML and PyTorch/TRT
- deploy.sh: updated existing shell script
…enchmark scripts

- transform.py: add ONNX model download and inference path
- deploy.bat: refine Python discovery and dependency flow
- requirements: update dependency specs
- benchmark.py: unify inference routing for CoreML/ONNX/PyTorch
…ploy script

- transform.py: refine available EP detection for ONNX inference
- deploy.bat: update dependency installation flow
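The "available EP detection" step above can be sketched as a small provider-priority helper. This is an illustrative sketch, not the actual transform.py code: the helper name `pick_providers` and the CUDA > CoreML > CPU priority order are assumptions.

```python
def pick_providers(available):
    """Order ONNX Runtime execution providers by preference.

    `available` is the list returned at runtime by
    onnxruntime.get_available_providers(); the priority order below
    (CUDA, then CoreML, then CPU) is an assumption for illustration.
    """
    preferred = ["CUDAExecutionProvider", "CoreMLExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    chosen.append("CPUExecutionProvider")  # CPU is always the final fallback
    return chosen
```

A session would then be created with something like `onnxruntime.InferenceSession(model_path, providers=pick_providers(onnxruntime.get_available_providers()))`, so the same code path runs on CUDA, Apple Silicon, or plain CPU hosts.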
- Remove assistant prefill injection that caused 400 errors with Qwen3.5
  when enable_thinking is active
- Remove presence_penalty from JSON-expected requests
- Fix VLM/LLM split to only count image analysis suites as VLM
…ntent

- Append JSON-only instruction to last user message for local models
- Replace null content with empty string for llama-server compatibility
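A minimal sketch of that message sanitization, assuming OpenAI-style chat messages; the function name and the exact wording of the appended instruction are hypothetical, and the real logic lives in the benchmark runner:

```python
def sanitize_messages(messages):
    """Prepare chat messages for llama-server.

    - Replace null/None content with "" (llama-server rejects null content).
    - Append a JSON-only instruction to the last user message.
    The instruction wording below is illustrative, not the shipped prompt.
    """
    out = [dict(m, content=m.get("content") or "") for m in messages]
    for msg in reversed(out):
        if msg.get("role") == "user":
            msg["content"] += "\nRespond with valid JSON only."
            break
    return out
```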
- transform.py: improve ONNX runtime execution provider handling
- requirements_cuda.txt: add onnxruntime-gpu dependency
- Auto-detect mode: default to 'llm' when no VLM URL is set
- Convert tool_calls/tool messages to plain text for llama-server compat
- Smart max_tokens: use max_completion_tokens for cloud, max_tokens for local
- Expand stripThink to handle Qwen3.5 plain-text reasoning blocks
- Harden JSON parser: clean ellipsis, placeholder tags, trailing commas
- Always exit 0 in skill mode (results reported via JSON events)
- Add stream_options: { include_usage: true } for OpenAI API token reporting
- Fall back to chunk-counted completion tokens for local llama-server
- Track per-test tokens via _currentTestTokens accumulator
- Include token data in test results, log output, and emitted events
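The usage-block-with-chunk-fallback counting above can be sketched as follows. The chunk dictionaries follow the OpenAI streaming response shape; the helper name is made up, and "one token per content-bearing chunk" is the stated approximation for local llama-server streams:

```python
def completion_tokens(chunks):
    """Count completion tokens from a streamed chat response.

    Prefer the final usage block emitted when stream_options
    include_usage is set; otherwise fall back to counting
    content-bearing chunks (roughly one token per chunk for
    llama-server streams -- an approximation, not an exact count).
    """
    counted = 0
    for chunk in chunks:
        usage = chunk.get("usage")
        if usage and usage.get("completion_tokens") is not None:
            return usage["completion_tokens"]  # authoritative count
        choices = chunk.get("choices") or []
        if choices and (choices[0].get("delta") or {}).get("content"):
            counted += 1
    return counted
```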
- Auto-detect LLM-only mode when no VLM URL is provided
- Use max_completion_tokens for cloud APIs (GPT-5.4+), max_tokens for local
- Always exit 0 in Aegis skill mode regardless of test failures
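The cloud/local token-limit split reduces to a one-line parameter builder. The two field names come from the notes above; the helper name and the 2000 default are illustrative:

```python
def token_limit(is_cloud, limit=2000):
    """Build the token-cap request parameter.

    Cloud chat APIs expect max_completion_tokens; local llama-server
    still takes max_tokens. The 2000 default mirrors the streaming
    safety-net cap mentioned above.
    """
    return {"max_completion_tokens" if is_cloud else "max_tokens": limit}
```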
- Remove stream_options for local llama-server (causes crashes)
- Drop max_tokens — streaming 2000-token cap is safety net
- Enhance parseJSON for multi-word <placeholder> tags
- Add JSON extraction fallback from reasoning_content
- Simplify prompt template to avoid template echoing
- Fix process.exit(1) in skill mode for clean status
fix(benchmark): disable thinking mode & improve JSON parsing
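The parser hardening described in these commits lives in JavaScript (the benchmark runner); the Python sketch below mirrors the same cleanup steps (strip reasoning blocks, extract the first `{...}` span, drop placeholder tags, ellipses, and trailing commas) and is not the actual implementation:

```python
import json
import re

def parse_model_json(text):
    """Best-effort JSON extraction from a model response.

    Mirrors the hardening steps above: strip <think> reasoning blocks,
    take the first {...} span, remove <placeholder> tags and ellipsis
    debris, and drop trailing commas. Removing "..." could in principle
    clip a legitimate string value -- acceptable for a lenient parser.
    """
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.S)
    m = re.search(r"\{.*\}", text, flags=re.S)
    if not m:
        return None
    s = m.group(0)
    s = re.sub(r"<[a-zA-Z][\w -]*>", "", s)        # multi-word <placeholder> tags
    s = s.replace("\u2026", "").replace("...", "") # ellipsis debris
    s = re.sub(r",\s*([}\]])", r"\1", s)           # trailing commas
    try:
        return json.loads(s)
    except json.JSONDecodeError:
        return None
```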
…ll auto-start

- Update benchmark paper: 131→143 tests (VLM Scene 35→47, 3 new dedup scenarios, 4 new tool-use scenarios)
- Add performance metrics to run-benchmark.cjs (TTFT, decode throughput tracking)
- Fix tool_call argument serialization for non-string arguments
- Enable auto_start for yolo-detection-2026 and depth-estimation skills
- Add LaTeX build artifacts .gitignore
feat: expand HomeSec-Bench to 143 tests, add perf metrics, enable ski…
- Renamed 'YOLO 2026 Object Detection' → 'YOLO 2026'
- Renamed 'Depth Estimation (Privacy)' → 'Depth Anything V2'
- Added disabled: true to Model Training, SAM2 Segmentation, Annotation Data
feat: rename skills for sidebar clarity and disable unstable skills
- Ship pre-built yolo26n.onnx (9.5MB) and yolo26n_names.json
- Add _OnnxCoreMLModel wrapper using onnxruntime + CoreMLExecutionProvider
- Bypasses macOS 26.x MPSGraph MLIR crash (SIGABRT in MPSGraphExecutable.mm)
- Inference: 11ms/frame (~91 FPS) on Apple M5 Pro
- Strip requirements_mps.txt: remove torch/torchvision/ultralytics (~120 MB → ~17 MB)
- Class names loaded from JSON instead of .pt (no torch dependency at runtime)
Add pre-exported ONNX models for all four detection sizes:
- yolo26n.onnx (9.5 MB) — nano (already shipped)
- yolo26s.onnx (37 MB) — small
- yolo26m.onnx (78 MB) — medium
- yolo26l.onnx (95 MB) — large

Each includes a companion _names.json with COCO 80 class labels.
Eliminates torch/ultralytics dependency for all model sizes.
- Revert shipping s/m/l ONNX models in repo (~210MB saved)
- Keep only yolo26n.onnx (9.5MB) shipped for zero-config default
- Add _download_onnx_from_hf() for s/m/l: downloads from
  onnx-community/yolo26{s,m,l}-ONNX on first use
- Uses stdlib urllib (no extra dependencies)
- Auto-copies class names from shipped yolo26n_names.json
solderzzc and others added 20 commits March 18, 2026 11:49
- Replace ultralytics-exported yolo26n.onnx with onnx-community version
- Update _OnnxCoreMLModel to parse HF format: logits [1,300,80] + pred_boxes [1,300,4]
- All YOLO26 sizes (n/s/m/l) now use the same onnx-community format
- Verified: 15-25ms/frame on M5 Pro with CoreML EP, correct detections
refactor: standardize on onnx-community HuggingFace ONNX format
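Decoding the onnx-community output format described above (logits [1,300,80] + pred_boxes [1,300,4]) might look like the sketch below, after squeezing the batch dimension. The DETR-style sigmoid scoring and normalized cxcywh box layout are assumptions to verify against the model card; the function name is made up:

```python
import math

def decode_hf_detections(logits, pred_boxes, class_names, conf=0.5):
    """Decode HF-format detector outputs into labeled detections.

    logits: [N, num_classes] raw class logits (sigmoid-scored per
    DETR convention -- an assumption). pred_boxes: [N, 4], assumed
    normalized center-x/center-y/width/height. Queries whose best
    class score falls below `conf` are dropped.
    """
    detections = []
    for row, box in zip(logits, pred_boxes):
        scores = [1.0 / (1.0 + math.exp(-x)) for x in row]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= conf:
            detections.append(
                {"label": class_names[best], "score": scores[best], "box": box}
            )
    return detections
```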
- Redesign generate-report.cjs as a multi-view Operations Center
  - Three tabs: Performance, Quality, Vision
  - Run picker sidebar with model-grouped history + multi-select
  - Comparison tables across selected runs
  - Export to Markdown for community sharing
- Add live progress mode (auto-refresh + LIVE banner)
  - Intermediate saves after each suite completes
  - Browser auto-opens with pulsing progress indicator
  - Auto-refreshes every 5s during benchmark run
- Save VLM fixture metadata (filename, response, prompt) per test
- Embed all data inline for fully self-contained HTML
feat: benchmark Operations Center with live progress dashboard
- saveLiveProgress() called after each test, not just each suite
- Include in-progress suite in live data for Quality/Vision tabs
- Skip fixture image embedding in live mode (~43MB savings per regeneration)
- Enhanced live banner with test name and test count
- Use HTML entities (&#39;) for quotes in onclick to avoid multi-level escaping
- Replace <meta http-equiv=refresh> with JS setTimeout for stateful reload
- Preserve active tab + scroll position across refreshes via sessionStorage
- Compute live perfSummary from accumulated TTFT/decode arrays
- TTFT, Decode Speed, Server Prefill/Decode now update in real-time
- Fix SyntaxError: use HTML entities for collapsed toggle onclick
- Replace meta refresh with JS setTimeout + sessionStorage state
- sampleResourceMetrics() parses ioreg for Apple Silicon MPS stats
- GPU utilization, renderer %, GPU memory, system memory tracked
- Sampled after each suite, included in live perfSummary
- 3 new hero cards: GPU Utilization, GPU Memory, System Memory
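sampleResourceMetrics() itself is JavaScript in the report generator; as a Python sketch, GPU utilization on Apple Silicon can be scraped from ioreg's accelerator statistics. The `"Device Utilization %"` key name is an assumption to verify against real `ioreg -l` output on your machine:

```python
import re
import subprocess

def gpu_utilization(ioreg_text=None):
    """Parse GPU utilization (percent) from `ioreg -l` output.

    The "Device Utilization %" key is an assumption about the
    AGXAccelerator PerformanceStatistics dictionary; returns None
    when the key is absent. Pass `ioreg_text` to parse captured
    output instead of shelling out.
    """
    if ioreg_text is None:
        ioreg_text = subprocess.run(
            ["ioreg", "-l"], capture_output=True, text=True
        ).stdout
    m = re.search(r'"Device Utilization %"\s*=\s*(\d+)', ioreg_text)
    return int(m.group(1)) if m else None
```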
solderzzc merged commit d51176f into master on Mar 18, 2026
1 check passed
2 participants