* feat: refactor DeepSeekReasonerAgent as a drop-in replacement for pydantic_ai Agent
This refactor improves the `DeepSeekReasonerAgent` adapter to be a robust, drop-in replacement for native PydanticAI Agents. It brings several enhancements:
1. Re-implemented `run()` to manually inject historical messages (`message_history`) and dependencies, preserving state across workflows.
2. Replaced the simplistic crash loop with an explicit, manual multi-turn retry mechanism. If the Markdown JSON parser fails, it correctly injects the `ValidationError` back into the conversation history and prompts the model to correct its structure up to `retries` times.
3. Designed an elegant proxy `AgentRunResultProxy` to seamlessly wrap `AgentRunResult` outputs. This cleanly passes through downstream calls (e.g., `result.data`, `result.usage()`, `result.all_messages()`) avoiding `AttributeError`s and Monkey-patching.
4. Integrated fallback tool descriptions parsing, dynamically instructing the model on available tools.
5. Adapted `AgentFactory` to correctly propagate `tools` and `retries`.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* feat: add deepseek provider option to frontend
- Added 'deepseek' option to `ProvidersSettings.tsx` `<select>`
- Updated frontend Typescript interfaces in `index.ts` to allow 'deepseek' as `provider_type`
- Validated frontend build successful
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
This refactor improves the `DeepSeekReasonerAgent` adapter to be a robust, drop-in replacement for native PydanticAI Agents. It brings several enhancements:
1. Re-implemented `run()` to manually inject historical messages (`message_history`) and dependencies, preserving state across workflows.
2. Replaced the simplistic crash loop with an explicit, manual multi-turn retry mechanism. If the Markdown JSON parser fails, it correctly injects the `ValidationError` back into the conversation history and prompts the model to correct its structure up to `retries` times.
3. Designed an elegant proxy `AgentRunResultProxy` to seamlessly wrap `AgentRunResult` outputs. This cleanly passes through downstream calls (e.g., `result.data`, `result.usage()`, `result.all_messages()`) avoiding `AttributeError`s and Monkey-patching.
4. Integrated fallback tool descriptions parsing, dynamically instructing the model on available tools.
5. Adapted `AgentFactory` to correctly propagate `tools` and `retries`.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* Refactor tool loading to use static database assignment
- Add `tools` field to `WorkerIndividual` and `SystemNodeConfig` models
- Update frontend WorkerIndividualSettings to support multi-select tools and select `bound_skill` from available backend API
- Deprecate dynamic `get_tool` method during agent execution, replacing it with `load_tools_from_list` via absolute paths during agent initialization
- Fix deprecated tool loading usage across `SupervisoryNode`, `ConsciousnessNode`, `ControlNode`, and `BaseIndividual`
- Provide `reload` API route to clear `GlobalIndividualManager` state to rebuild nodes cleanly upon next request
- Fix `user_test.py` related to unhashed password assignments in testing mock object
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* Refactor tool loading to use static database assignment part 2
- Restore BaseIndividual's `_init_agent` method to inject parsed `tools_list` via absolute paths dynamically using `load_tools_from_list`
- Restore React component multi-select support for tools dropdown via `/api/v1/resource/tool` backend response integration
- Restore `user_test.py` compatibility related to access hasher assignments
- Passed 75 tests with no errors
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* Refactor tool loading to use static database assignment
- Add `tools` field to `WorkerIndividual` and `SystemNodeConfig` models
- Update frontend WorkerIndividualSettings to support multi-select tools and select `bound_skill` from available backend API
- Deprecate dynamic `get_tool` method during agent execution, replacing it with `load_tools_from_list` via absolute paths during agent initialization
- Fix deprecated tool loading usage across `SupervisoryNode`, `ConsciousnessNode`, `ControlNode`, and `BaseIndividual`
- Provide `reload` API route to clear `GlobalIndividualManager` state to rebuild nodes cleanly upon next request
- Fix `user_test.py` related to unhashed password assignments in testing mock object
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* Finalize all tool loading refactoring with complete changes
- Merged database schema enhancements for `tools` multi-select list
- Completed the `load_tools_from_list` via absolute paths utility update
- Refactored `AgentFactory` and corresponding initialization calls in `BaseIndividual`, `SupervisoryNode`, `ConsciousnessNode`, and `ControlNode` to inject statically via constructor
- Updated frontend UI `WorkerIndividualSettings` with UI for `bound_skill` selection and dynamic multi-tool selector array
- Removed all legacy usages of dynamic `tools=tool` invocation
- Provided backend `/reload` API and automatic clearing of instance cache upon individual configuration update
- Repaired mock password assignment logic in testing suite
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* Restore complete tool loading refactor logic
- Re-apply BaseIndividual tools resolution logic
- Correctly patch API, UI files, system nodes and testing scripts avoiding checkout loss
- Passed all 75 integration and unit tests successfully
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* feat: fix individual page configuration to properly display system nodes
- Added `SystemNodeConfig` database table to persist system node config (provider and model).
- Updated `AgentLayout` and `WorkerIndividualSettings` to restrict config modifications to only provider and model for system nodes.
- Hid delete button for system nodes and removed them from the creation modal dropdown.
- Updated Agent API to support querying configured system nodes and persisting setups.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* feat: fix individual page configuration to properly display system nodes
- Added `SystemNodeConfig` database table to persist system node config (provider and model).
- Updated `AgentLayout` and `WorkerIndividualSettings` to restrict config modifications to only provider and model for system nodes.
- Hid delete button for system nodes and removed them from the creation modal dropdown.
- Updated Agent API to support querying configured system nodes and persisting setups.
- Fixed an issue where the table would not render if only system nodes were present (and no custom workers existed).
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: correct individual worker agent_type and system node model display
- Updated the `agent_type` options in `WorkerIndividualSettings.tsx` to match the backend's expected `AgentType` enum values (`ordinary_individual`, `skill_individual`, `special_individual`). This fixes the 422 error when creating a new worker.
- Fixed the system node provider and model selection logic so it correctly defaults to the first available provider when unconfigured, allowing the model dropdown to populate.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* feat: fix individual page configuration to properly display system nodes
- Added `SystemNodeConfig` database table to persist system node config (provider and model).
- Updated `AgentLayout` and `WorkerIndividualSettings` to restrict config modifications to only provider and model for system nodes.
- Hid delete button for system nodes and removed them from the creation modal dropdown.
- Updated Agent API to support querying configured system nodes and persisting setups.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* feat: fix individual page configuration to properly display system nodes
- Added `SystemNodeConfig` database table to persist system node config (provider and model).
- Updated `AgentLayout` and `WorkerIndividualSettings` to restrict config modifications to only provider and model for system nodes.
- Hid delete button for system nodes and removed them from the creation modal dropdown.
- Updated Agent API to support querying configured system nodes and persisting setups.
- Fixed an issue where the table would not render if only system nodes were present (and no custom workers existed).
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
- Added `SystemNodeConfig` database table to persist system node config (provider and model).
- Updated `AgentLayout` and `WorkerIndividualSettings` to restrict config modifications to only provider and model for system nodes.
- Hid delete button for system nodes and removed them from the creation modal dropdown.
- Updated Agent API to support querying configured system nodes and persisting setups.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* feat: Clean up dashboard UI, shift workflow WS to SSE, and add file upload support
- Removed Monitoring view and associated `/ws/state` cluster websocket route.
- Modified workflow tracing from WebSocket (`/api/v1/workflow/ws/{trace_id}`) to Server-Sent Events (`/api/v1/workflow/sse/{trace_id}`) for unidirectional pushes, introducing a new `/api/v1/workflow/reply/{trace_id}` POST route to handle incoming client replies.
- Cleaned up dummy data and unneeded links in the chat layout (LeftPanel, ChatPanel).
- Implemented file upload functionality: added a `/api/v1/adapter/client/upload` endpoint to the backend which saves files to a local `uploads` directory, and added an integrated file input triggered via the `+` button in the frontend chat interface to facilitate uploading with an automated chat message confirmation.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: prevent global_state_machine actor from being garbage collected
Added `lifetime="detached"` and kept a local reference to the `GlobalStateMachine`
actor in `main.py` so that it doesn't get cleaned up by Ray due to going out
of scope, which was causing `ray.get_actor('global_state_machine')` calls to fail
in API route handlers (resulting in 500 errors).
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: resolve named actor addressing failure across Ray processes via explicit namespace
The `ray.get_actor` calls in API routes executing within a Ray Serve worker were failing to
resolve the actors created by the main process because the implicit random namespace of
`ray.init()` did not match the namespace of the Ray Serve application scope.
Instead of overriding garbage collection via `lifetime="detached"` (which can lead to actor
leakage), this assigns an explicit `namespace="pretor"` when initializing Ray in the main process,
and uses the identical namespace in `ray_hook.py` when looking up named actors. Also retains the
local variable assignments in `main.py` to prevent them from being eliminated as unused variables.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: defer actor lookup in WorkflowRunningEngine to avoid startup race conditions
The `WorkflowRunningEngine` was trying to fetch the `global_state_machine` actor
during its `__init__` method via `ray_actor_hook()`. Since actor creation requests are
dispatched asynchronously, the `global_state_machine` might not be completely
registered and discoverable via `ray.get_actor()` by the time the `WorkflowRunningEngine`'s
`__init__` is evaluated.
Moved the actor lookup to the async `run()` method, which gets executed after the engine
itself is fully up, allowing time for other components to become available in the global
Ray namespace.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* feat: Clean up dashboard UI, shift workflow WS to SSE, and add file upload support
- Removed Monitoring view and associated `/ws/state` cluster websocket route.
- Modified workflow tracing from WebSocket (`/api/v1/workflow/ws/{trace_id}`) to Server-Sent Events (`/api/v1/workflow/sse/{trace_id}`) for unidirectional pushes, introducing a new `/api/v1/workflow/reply/{trace_id}` POST route to handle incoming client replies.
- Cleaned up dummy data and unneeded links in the chat layout (LeftPanel, ChatPanel).
- Implemented file upload functionality: added a `/api/v1/adapter/client/upload` endpoint to the backend which saves files to a local `uploads` directory, and added an integrated file input triggered via the `+` button in the frontend chat interface to facilitate uploading with an automated chat message confirmation.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: prevent global_state_machine actor from being garbage collected
Added `lifetime="detached"` and kept a local reference to the `GlobalStateMachine`
actor in `main.py` so that it doesn't get cleaned up by Ray due to going out
of scope, which was causing `ray.get_actor('global_state_machine')` calls to fail
in API route handlers (resulting in 500 errors).
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: resolve named actor addressing failure across Ray processes via explicit namespace
The `ray.get_actor` calls in API routes executing within a Ray Serve worker were failing to
resolve the actors created by the main process because the implicit random namespace of
`ray.init()` did not match the namespace of the Ray Serve application scope.
Instead of overriding garbage collection via `lifetime="detached"` (which can lead to actor
leakage), this assigns an explicit `namespace="pretor"` when initializing Ray in the main process,
and uses the identical namespace in `ray_hook.py` when looking up named actors. Also retains the
local variable assignments in `main.py` to prevent them from being eliminated as unused variables.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* feat: Clean up dashboard UI, shift workflow WS to SSE, and add file upload support
- Removed Monitoring view and associated `/ws/state` cluster websocket route.
- Modified workflow tracing from WebSocket (`/api/v1/workflow/ws/{trace_id}`) to Server-Sent Events (`/api/v1/workflow/sse/{trace_id}`) for unidirectional pushes, introducing a new `/api/v1/workflow/reply/{trace_id}` POST route to handle incoming client replies.
- Cleaned up dummy data and unneeded links in the chat layout (LeftPanel, ChatPanel).
- Implemented file upload functionality: added a `/api/v1/adapter/client/upload` endpoint to the backend which saves files to a local `uploads` directory, and added an integrated file input triggered via the `+` button in the frontend chat interface to facilitate uploading with an automated chat message confirmation.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: prevent global_state_machine actor from being garbage collected
Added `lifetime="detached"` and kept a local reference to the `GlobalStateMachine`
actor in `main.py` so that it doesn't get cleaned up by Ray due to going out
of scope, which was causing `ray.get_actor('global_state_machine')` calls to fail
in API route handlers (resulting in 500 errors).
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: resolve named actor addressing failure across Ray processes via explicit namespace
The `ray.get_actor` calls in API routes executing within a Ray Serve worker were failing to
resolve the actors created by the main process because the implicit random namespace of
`ray.init()` did not match the namespace of the Ray Serve application scope.
Instead of overriding garbage collection via `lifetime="detached"` (which can lead to actor
leakage), this assigns an explicit `namespace="pretor"` when initializing Ray in the main process,
and uses the identical namespace in `ray_hook.py` when looking up named actors. Also retains the
local variable assignments in `main.py` to prevent them from being eliminated as unused variables.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
- Removed Monitoring view and associated `/ws/state` cluster websocket route.
- Modified workflow tracing from WebSocket (`/api/v1/workflow/ws/{trace_id}`) to Server-Sent Events (`/api/v1/workflow/sse/{trace_id}`) for unidirectional pushes, introducing a new `/api/v1/workflow/reply/{trace_id}` POST route to handle incoming client replies.
- Cleaned up dummy data and unneeded links in the chat layout (LeftPanel, ChatPanel).
- Implemented file upload functionality: added a `/api/v1/adapter/client/upload` endpoint to the backend which saves files to a local `uploads` directory, and added an integrated file input triggered via the `+` button in the frontend chat interface to facilitate uploading with an automated chat message confirmation.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: correct actorlist handle in supervisory node and ui form reset (#30)
- Fixed `AttributeError` for `workflow_template_manager` in `SupervisoryNode` by properly unpacking the `.global_state_machine` handle from `ray_actor_hook`.
- Removed overly broad blanket `Exception` swallowing for WebSocket cancellation that caused closed loops in Uvicorn handlers to leak and dump HTTP errors.
- UI: Reset `model_id` to blank whenever a user alters the `Provider Title` to prevent stale incompatible models from breaking submission.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* Fix provider manager and skill settings 17493544742337088454 (#31)
* fix: correct actorlist handle in supervisory node and ui form reset
- Fixed `AttributeError` for `workflow_template_manager` in `SupervisoryNode` by properly unpacking the `.global_state_machine` handle from `ray_actor_hook`.
- Removed overly broad blanket `Exception` swallowing for WebSocket cancellation that caused closed loops in Uvicorn handlers to leak and dump HTTP errors.
- UI: Reset `model_id` to blank whenever a user alters the `Provider Title` to prevent stale incompatible models from breaking submission.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: dynamically resolve backend urls based on browser window location
- Updated `apiClient.ts` to use a relative base URL (`''`) if `VITE_API_BASE_URL` is omitted, allowing axios to infer the current domain in reverse-proxied environments.
- Updated WebSocket URL generation in `RightPanel.tsx` and `useClusterState.ts` to dynamically calculate protocol (`ws:` vs `wss:`) and host from `window.location`.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* Refactor GlobalStateMachine/PostgresDatabase reflection, error retry mechanisms, and frontend worker individual UI.
- Replaced dynamic getattr reflection in GlobalStateMachine and PostgresDatabase with explicit wrapper methods to improve stability and avoid Missing Method AttributeErrors.
- Add `get_tool_list` explicit wrapper method resolving runtime crashes.
- Implemented `RetryableError` and `NonRetryableError` base exceptions, wrapping network errors and utilizing custom `@retry_on_retryable_error` decorator on Provider requests instead of Ray actor's unsupported `retry_exceptions`.
- Added exponential backoff algorithms for WebSocket reconnections in the frontend.
- Added strict TypeScript-based schema validation for WorkflowTemplate creation payloads.
- Redesigned the Worker Individual configuration UI into a unified list containing both System Nodes and Custom Workers, supporting Add, Edit, and Delete workflows, and resolving the provider-switching bug.
- Updated unit tests to align with architectural changes.
- Cleaned up temp scripts.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* Suppress GeneratorExit RuntimeError in WebSocket endpoints
- Adds `GeneratorExit` check to the `RuntimeError` exception handling block in FastAPI WebSocket routes (`pretor/api/cluster.py` and `pretor/api/workflow.py`). This prevents unhandled exception crashes in the Ray proxy actor when a client disconnects unexpectedly or closes the generator prematurely.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* Suppress GeneratorExit RuntimeError in WebSocket endpoints
- Adds `GeneratorExit` check to the `RuntimeError` exception handling block in FastAPI WebSocket routes (`pretor/api/cluster.py` and `pretor/api/workflow.py`). This prevents unhandled exception crashes in the Ray proxy actor when a client disconnects unexpectedly or closes the generator prematurely.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* Enhance backend token validation to trigger frontend re-login properly.
- Modified `RoleChecker` and `get_authority` in `pretor/utils/check_user/role_check.py` to catch `UserNotExistError`. If the database cannot find the user corresponding to the token's ID (e.g. the user was deleted), the backend now raises a standard `401 Unauthorized` exception instead of passing the error up.
- This ensures the frontend's `axios` interceptor in `apiClient.ts` will catch the 401, clear the stale token from localStorage, and seamlessly bounce the user back to the login screen.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: correct actorlist handle in supervisory node and ui form reset (#30)
- Fixed `AttributeError` for `workflow_template_manager` in `SupervisoryNode` by properly unpacking the `.global_state_machine` handle from `ray_actor_hook`.
- Removed overly broad blanket `Exception` swallowing for WebSocket cancellation that caused closed loops in Uvicorn handlers to leak and dump HTTP errors.
- UI: Reset `model_id` to blank whenever a user alters the `Provider Title` to prevent stale incompatible models from breaking submission.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* Fix provider manager and skill settings 17493544742337088454 (#31)
* fix: correct actorlist handle in supervisory node and ui form reset
- Fixed `AttributeError` for `workflow_template_manager` in `SupervisoryNode` by properly unpacking the `.global_state_machine` handle from `ray_actor_hook`.
- Removed overly broad blanket `Exception` swallowing for WebSocket cancellation that caused closed loops in Uvicorn handlers to leak and dump HTTP errors.
- UI: Reset `model_id` to blank whenever a user alters the `Provider Title` to prevent stale incompatible models from breaking submission.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: dynamically resolve backend urls based on browser window location
- Updated `apiClient.ts` to use a relative base URL (`''`) if `VITE_API_BASE_URL` is omitted, allowing axios to infer the current domain in reverse-proxied environments.
- Updated WebSocket URL generation in `RightPanel.tsx` and `useClusterState.ts` to dynamically calculate protocol (`ws:` vs `wss:`) and host from `window.location`.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* Refactor GlobalStateMachine/PostgresDatabase reflection, error retry mechanisms, and frontend worker individual UI.
- Replaced dynamic getattr reflection in GlobalStateMachine and PostgresDatabase with explicit wrapper methods to improve stability and avoid Missing Method AttributeErrors.
- Add `get_tool_list` explicit wrapper method resolving runtime crashes.
- Implemented `RetryableError` and `NonRetryableError` base exceptions, wrapping network errors and utilizing custom `@retry_on_retryable_error` decorator on Provider requests instead of Ray actor's unsupported `retry_exceptions`.
- Added exponential backoff algorithms for WebSocket reconnections in the frontend.
- Added strict TypeScript-based schema validation for WorkflowTemplate creation payloads.
- Redesigned the Worker Individual configuration UI into a unified list containing both System Nodes and Custom Workers, supporting Add, Edit, and Delete workflows, and resolving the provider-switching bug.
- Updated unit tests to align with architectural changes.
- Cleaned up temp scripts.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* Suppress GeneratorExit RuntimeError in WebSocket endpoints
- Adds `GeneratorExit` check to the `RuntimeError` exception handling block in FastAPI WebSocket routes (`pretor/api/cluster.py` and `pretor/api/workflow.py`). This prevents unhandled exception crashes in the Ray proxy actor when a client disconnects unexpectedly or closes the generator prematurely.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* Suppress GeneratorExit RuntimeError in WebSocket endpoints
- Adds `GeneratorExit` check to the `RuntimeError` exception handling block in FastAPI WebSocket routes (`pretor/api/cluster.py` and `pretor/api/workflow.py`). This prevents unhandled exception crashes in the Ray proxy actor when a client disconnects unexpectedly or closes the generator prematurely.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: correct actorlist handle in supervisory node and ui form reset (#30)
- Fixed `AttributeError` for `workflow_template_manager` in `SupervisoryNode` by properly unpacking the `.global_state_machine` handle from `ray_actor_hook`.
- Removed overly broad blanket `Exception` swallowing for WebSocket cancellation that caused closed loops in Uvicorn handlers to leak and dump HTTP errors.
- UI: Reset `model_id` to blank whenever a user alters the `Provider Title` to prevent stale incompatible models from breaking submission.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* Fix provider manager and skill settings 17493544742337088454 (#31)
* fix: correct actorlist handle in supervisory node and ui form reset
- Fixed `AttributeError` for `workflow_template_manager` in `SupervisoryNode` by properly unpacking the `.global_state_machine` handle from `ray_actor_hook`.
- Removed overly broad blanket `Exception` swallowing for WebSocket cancellation that caused closed loops in Uvicorn handlers to leak and dump HTTP errors.
- UI: Reset `model_id` to blank whenever a user alters the `Provider Title` to prevent stale incompatible models from breaking submission.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: dynamically resolve backend urls based on browser window location
- Updated `apiClient.ts` to use a relative base URL (`''`) if `VITE_API_BASE_URL` is omitted, allowing axios to infer the current domain in reverse-proxied environments.
- Updated WebSocket URL generation in `RightPanel.tsx` and `useClusterState.ts` to dynamically calculate protocol (`ws:` vs `wss:`) and host from `window.location`.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* Refactor GlobalStateMachine/PostgresDatabase reflection, error retry mechanisms, and frontend worker individual UI.
- Replaced dynamic getattr reflection in GlobalStateMachine and PostgresDatabase with explicit wrapper methods to improve stability and avoid Missing Method AttributeErrors.
- Add `get_tool_list` explicit wrapper method resolving runtime crashes.
- Implemented `RetryableError` and `NonRetryableError` base exceptions, wrapping network errors and utilizing custom `@retry_on_retryable_error` decorator on Provider requests instead of Ray actor's unsupported `retry_exceptions`.
- Added exponential backoff algorithms for WebSocket reconnections in the frontend.
- Added strict TypeScript-based schema validation for WorkflowTemplate creation payloads.
- Redesigned the Worker Individual configuration UI into a unified list containing both System Nodes and Custom Workers, supporting Add, Edit, and Delete workflows, and resolving the provider-switching bug.
- Updated unit tests to align with architectural changes.
- Cleaned up temp scripts.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* ✨ [Feature] Add frontend authentication page and 401 error interceptor (#21)
* feat: add frontend authentication page and 401 interceptor
Adds a new AuthPage component for user login and registration, integrates it into App.tsx to protect routes, and sets up an Axios interceptor to handle 401 Unauthorized responses by clearing local storage and reloading. Also fixes a missing logger attribute in WorkflowEngine for backend tests.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: gracefully handle closed websockets
Updates the websocket endpoints in `pretor/api/cluster.py` and `pretor/api/workflow.py` to catch `RuntimeError` alongside `WebSocketDisconnect`. This prevents the application from crashing and spamming error logs when the frontend client unexpectedly closes the connection and the underlying TCP transport is closed.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* feat: add worker form and update global settings
Adds a new form in the Worker Individual Settings page to create custom worker individuals via the `/api/v1/agent/worker` endpoint. Also updates the System Settings page to remove the obsolete "Max Concurrent Workflows" setting and makes the system language and theme toggles functional by persisting to local storage and updating the document root class.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: resolve provider owner type, resource UI, and db ready check
This commit fixes the following issues:
1. `provider_owner` type bug: Changed type from `int` to `str` in DB models and Pydantic schemas.
2. Frontend Provider Dropdown: `WorkerIndividualSettings.tsx` now uses a dropdown to select a created provider instead of a free-form input field.
3. Database Initialization Sync: Added an `asyncio.Event()` to `postgres.py` to prevent any DB actions from executing before `init_db()` is complete.
4. Resource Management UI: Added new pages `SkillSettings.tsx` and `WorkflowTemplateSettings.tsx` to handle frontend requests to manage skills and workflow templates.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* feat: add frontend authentication page and 401 interceptor
Adds a new AuthPage component for user login and registration, integrates it into App.tsx to protect routes, and sets up an Axios interceptor to handle 401 Unauthorized responses by clearing local storage and reloading. Also fixes a missing logger attribute in WorkflowEngine for backend tests.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix: gracefully handle closed websockets
Updates the websocket endpoints in `pretor/api/cluster.py` and `pretor/api/workflow.py` to catch `RuntimeError` alongside `WebSocketDisconnect`. This prevents the application from crashing and spamming error logs when the frontend client unexpectedly closes the connection and the underlying TCP transport is closed.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* feat: add worker form and update global settings
Adds a new form in the Worker Individual Settings page to create custom worker individuals via the `/api/v1/agent/worker` endpoint. Also updates the System Settings page to remove the obsolete "Max Concurrent Workflows" setting and makes the system language and theme toggles functional by persisting to local storage and updating the document root class.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix(backend): initialize async queue properly and fix auth login error handling (#18)
- Moved `self.workflow_queue = asyncio.Queue()` to the top of `WorkflowRunningEngine.run` to ensure the queue exists before coroutines start polling it, resolving initialization race conditions.
- Handled `user` object nullability check correctly in `/api/v1/auth/login` to raise `UserNotExistError` instead of crashing on attribute access.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* feat: Integrate frontend dashboard and wire up settings endpoints
- Imported and moved the pretor_dashboard dev branch into `frontend/`.
- Configured FastAPI `PretorGateway` to mount `frontend/dist` out of the box and serve it effectively.
- Fixed `global_state_machine` Ray Actor hook references in `pretor/api/resource.py`.
- Added missing GET `/api/v1/auth/list` endpoint to list all users.
- Added missing DELETE `/api/v1/auth/{user_id}` endpoint to remove users.
- Plumbed API calls in the frontend's `UsersSettings.tsx` to get, delete, and alter the authority roles.
- Wired up provider deletion API endpoints within `ProvidersSettings.tsx`.
- Ran `npm run build` so `frontend/dist` is current.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* feat: Integrate frontend dashboard and wire up settings endpoints
- Imported and moved the pretor_dashboard dev branch into `frontend/`.
- Configured FastAPI `PretorGateway` to mount `frontend/dist` out of the box and serve it effectively.
- Fixed `global_state_machine` Ray Actor hook references in `pretor/api/resource.py`.
- Added missing GET `/api/v1/auth/list` endpoint to list all users.
- Added missing DELETE `/api/v1/auth/{user_id}` endpoint to remove users.
- Plumbed API calls in the frontend's `UsersSettings.tsx` to get, delete, and alter the authority roles.
- Wired up provider deletion API endpoints within `ProvidersSettings.tsx`.
- Ran `npm run build` so `frontend/dist` is current.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix(backend): Remove __call__ from PretorGateway and assign first user as SUPER_ADMINISTRATOR
- Removed `__call__` from `PretorGateway` in `pretor/core/api/__init__.py` to fix Ray Serve `ValueError` during initialization.
- Modified `AuthDatabase.add_user` in `pretor/core/database/module/user.py` to check for existing users. The first registered user now receives `UserAuthority.SUPER_ADMINISTRATOR` access while subsequent users get `USER` access.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix(backend): Remove __call__ from PretorGateway and assign first user as SUPER_ADMINISTRATOR
- Removed `__call__` from `PretorGateway` in `pretor/core/api/__init__.py` to fix Ray Serve `ValueError` during initialization.
- Added connection error handling in `PostgresDatabase.init_db()` to prevent startup crashes when PostgreSQL is unavailable.
- Updated `AuthDatabase.add_user` to automatically grant `SUPER_ADMINISTRATOR` privileges to the first registered user.
- Fixed unit tests in `user_test.py` that were improperly mocking `session.execute`, removing confusing stack traces during testing.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix(backend): Remove __call__ from PretorGateway and assign first user as SUPER_ADMINISTRATOR
- Removed `__call__` from `PretorGateway` in `pretor/core/api/__init__.py` to fix Ray Serve `ValueError` during initialization.
- Added connection error handling in `PostgresDatabase.init_db()` to prevent startup crashes when PostgreSQL is unavailable.
- Updated `AuthDatabase.add_user` to automatically grant `SUPER_ADMINISTRATOR` privileges to the first registered user.
- Fixed unit tests in `user_test.py` that were improperly mocking `session.execute`, removing confusing stack traces during testing.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* fix(backend): initialize async queue properly and fix auth login error handling (#18)
- Moved `self.workflow_queue = asyncio.Queue()` to the top of `WorkflowRunningEngine.run` to ensure the queue exists before coroutines start polling it, resolving initialization race conditions.
- Handled `user` object nullability check correctly in `/api/v1/auth/login` to raise `UserNotExistError` instead of crashing on attribute access.
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
* feat: Integrate frontend dashboard and wire up settings endpoints
- Imported and moved the pretor_dashboard dev branch into `frontend/`.
- Configured FastAPI `PretorGateway` to mount `frontend/dist` out of the box and serve it effectively.
- Fixed `global_state_machine` Ray Actor hook references in `pretor/api/resource.py`.
- Added missing GET `/api/v1/auth/list` endpoint to list all users.
- Added missing DELETE `/api/v1/auth/{user_id}` endpoint to remove users.
- Plumbed API calls in the frontend's `UsersSettings.tsx` to get, delete, and alter the authority roles.
- Wired up provider deletion API endpoints within `ProvidersSettings.tsx`.
- Ran `npm run build` so `frontend/dist` is current.
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>