Compare commits

...

5 Commits

Author SHA1 Message Date
朝夕 7c841b9424 fix: 修复了严重bug 2026-04-27 14:13:34 +08:00
朝夕 d17f6384fc
Feat/dashboard clean monitor upload 1334918525438687015 (#40)
* feat: Clean up dashboard UI, shift workflow WS to SSE, and add file upload support

- Removed Monitoring view and associated `/ws/state` cluster websocket route.
- Modified workflow tracing from WebSocket (`/api/v1/workflow/ws/{trace_id}`) to Server-Sent Events (`/api/v1/workflow/sse/{trace_id}`) for unidirectional pushes, introducing a new `/api/v1/workflow/reply/{trace_id}` POST route to handle incoming client replies.
- Cleaned up dummy data and unneeded links in the chat layout (LeftPanel, ChatPanel).
- Implemented file upload functionality: added a `/api/v1/adapter/client/upload` endpoint to the backend which saves files to a local `uploads` directory, and added an integrated file input triggered via the `+` button in the frontend chat interface to facilitate uploading with an automated chat message confirmation.

Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>

* fix: prevent global_state_machine actor from being garbage collected

Added `lifetime="detached"` and kept a local reference to the `GlobalStateMachine`
actor in `main.py` so that it doesn't get cleaned up by Ray due to going out
of scope, which was causing `ray.get_actor('global_state_machine')` calls to fail
in API route handlers (resulting in 500 errors).

Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>

* fix: resolve named actor addressing failure across Ray processes via explicit namespace

The `ray.get_actor` calls in API routes executing within a Ray Serve worker were failing to
resolve the actors created by the main process because the implicit random namespace of
`ray.init()` did not match the namespace of the Ray Serve application scope.

Instead of overriding garbage collection via `lifetime="detached"` (which can lead to actor
leakage), this assigns an explicit `namespace="pretor"` when initializing Ray in the main process,
and uses the identical namespace in `ray_hook.py` when looking up named actors. Also retains the
local variable assignments in `main.py` to prevent them from being eliminated as unused variables.

Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>

* fix: defer actor lookup in WorkflowRunningEngine to avoid startup race conditions

The `WorkflowRunningEngine` was trying to fetch the `global_state_machine` actor
during its `__init__` method via `ray_actor_hook()`. Since actor creation requests are
dispatched asynchronously, the `global_state_machine` might not be completely
registered and discoverable via `ray.get_actor()` by the time the `WorkflowRunningEngine`'s
`__init__` is evaluated.

Moved the actor lookup to the async `run()` method, which gets executed after the engine
itself is fully up, allowing time for other components to become available in the global
Ray namespace.

Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
2026-04-27 13:06:15 +08:00
朝夕 ccecd1d59c
Feat/dashboard clean monitor upload 1334918525438687015 (#39)
* feat: Clean up dashboard UI, shift workflow WS to SSE, and add file upload support

- Removed Monitoring view and associated `/ws/state` cluster websocket route.
- Modified workflow tracing from WebSocket (`/api/v1/workflow/ws/{trace_id}`) to Server-Sent Events (`/api/v1/workflow/sse/{trace_id}`) for unidirectional pushes, introducing a new `/api/v1/workflow/reply/{trace_id}` POST route to handle incoming client replies.
- Cleaned up dummy data and unneeded links in the chat layout (LeftPanel, ChatPanel).
- Implemented file upload functionality: added a `/api/v1/adapter/client/upload` endpoint to the backend which saves files to a local `uploads` directory, and added an integrated file input triggered via the `+` button in the frontend chat interface to facilitate uploading with an automated chat message confirmation.

Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>

* fix: prevent global_state_machine actor from being garbage collected

Added `lifetime="detached"` and kept a local reference to the `GlobalStateMachine`
actor in `main.py` so that it doesn't get cleaned up by Ray due to going out
of scope, which was causing `ray.get_actor('global_state_machine')` calls to fail
in API route handlers (resulting in 500 errors).

Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>

* fix: resolve named actor addressing failure across Ray processes via explicit namespace

The `ray.get_actor` calls in API routes executing within a Ray Serve worker were failing to
resolve the actors created by the main process because the implicit random namespace of
`ray.init()` did not match the namespace of the Ray Serve application scope.

Instead of overriding garbage collection via `lifetime="detached"` (which can lead to actor
leakage), this assigns an explicit `namespace="pretor"` when initializing Ray in the main process,
and uses the identical namespace in `ray_hook.py` when looking up named actors. Also retains the
local variable assignments in `main.py` to prevent them from being eliminated as unused variables.

Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
2026-04-27 10:39:33 +08:00
朝夕 525d251010
Feat/dashboard clean monitor upload 1334918525438687015 (#38)
* feat: Clean up dashboard UI, shift workflow WS to SSE, and add file upload support

- Removed Monitoring view and associated `/ws/state` cluster websocket route.
- Modified workflow tracing from WebSocket (`/api/v1/workflow/ws/{trace_id}`) to Server-Sent Events (`/api/v1/workflow/sse/{trace_id}`) for unidirectional pushes, introducing a new `/api/v1/workflow/reply/{trace_id}` POST route to handle incoming client replies.
- Cleaned up dummy data and unneeded links in the chat layout (LeftPanel, ChatPanel).
- Implemented file upload functionality: added a `/api/v1/adapter/client/upload` endpoint to the backend which saves files to a local `uploads` directory, and added an integrated file input triggered via the `+` button in the frontend chat interface to facilitate uploading with an automated chat message confirmation.

Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>

* fix: prevent global_state_machine actor from being garbage collected

Added `lifetime="detached"` and kept a local reference to the `GlobalStateMachine`
actor in `main.py` so that it doesn't get cleaned up by Ray due to going out
of scope, which was causing `ray.get_actor('global_state_machine')` calls to fail
in API route handlers (resulting in 500 errors).

Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>

* fix: resolve named actor addressing failure across Ray processes via explicit namespace

The `ray.get_actor` calls in API routes executing within a Ray Serve worker were failing to
resolve the actors created by the main process because the implicit random namespace of
`ray.init()` did not match the namespace of the Ray Serve application scope.

Instead of overriding garbage collection via `lifetime="detached"` (which can lead to actor
leakage), this assigns an explicit `namespace="pretor"` when initializing Ray in the main process,
and uses the identical namespace in `ray_hook.py` when looking up named actors. Also retains the
local variable assignments in `main.py` to prevent them from being eliminated as unused variables.

Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
2026-04-27 10:27:44 +08:00
朝夕 71963ac4a4
feat: Clean up dashboard UI, shift workflow WS to SSE, and add file upload support (#37)
- Removed Monitoring view and associated `/ws/state` cluster websocket route.
- Modified workflow tracing from WebSocket (`/api/v1/workflow/ws/{trace_id}`) to Server-Sent Events (`/api/v1/workflow/sse/{trace_id}`) for unidirectional pushes, introducing a new `/api/v1/workflow/reply/{trace_id}` POST route to handle incoming client replies.
- Cleaned up dummy data and unneeded links in the chat layout (LeftPanel, ChatPanel).
- Implemented file upload functionality: added a `/api/v1/adapter/client/upload` endpoint to the backend which saves files to a local `uploads` directory, and added an integrated file input triggered via the `+` button in the frontend chat interface to facilitate uploading with an automated chat message confirmation.

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
Co-authored-by: zhaoxi826 <198742034+zhaoxi826@users.noreply.github.com>
2026-04-27 09:45:45 +08:00
21 changed files with 142 additions and 302 deletions

View File

@ -1,6 +1,5 @@
import { useState, useEffect } from 'react'; import { useState, useEffect } from 'react';
import { Sidebar } from './components/Layout/Sidebar'; import { Sidebar } from './components/Layout/Sidebar';
import { MonitoringLayout } from './components/Monitoring/MonitoringLayout';
import { SettingsLayout } from './components/Settings/SettingsLayout'; import { SettingsLayout } from './components/Settings/SettingsLayout';
import { AgentLayout } from './components/Agent/AgentLayout'; import { AgentLayout } from './components/Agent/AgentLayout';
import { ResourceLayout } from './components/Resource/ResourceLayout'; import { ResourceLayout } from './components/Resource/ResourceLayout';
@ -12,7 +11,7 @@ import { AuthPage } from './components/Auth/AuthPage';
function App() { function App() {
const [isAuthenticated, setIsAuthenticated] = useState(false); const [isAuthenticated, setIsAuthenticated] = useState(false);
const [activeTab, setActiveTab] = useState('chats'); // For LeftPanel const [activeTab, setActiveTab] = useState('chats'); // For LeftPanel
const [currentView, setCurrentView] = useState('dashboard'); // 'dashboard', 'settings', 'monitoring', 'agent', 'resource' const [currentView, setCurrentView] = useState('dashboard'); // 'dashboard', 'settings', 'agent', 'resource'
const [settingsTab, setSettingsTab] = useState('users'); // For SettingsLayout const [settingsTab, setSettingsTab] = useState('users'); // For SettingsLayout
const [agentTab, setAgentTab] = useState('worker'); // For AgentLayout const [agentTab, setAgentTab] = useState('worker'); // For AgentLayout
const [resourceTab, setResourceTab] = useState('skill'); // For ResourceLayout const [resourceTab, setResourceTab] = useState('skill'); // For ResourceLayout
@ -38,9 +37,7 @@ function App() {
<Sidebar currentView={currentView} setCurrentView={setCurrentView} /> <Sidebar currentView={currentView} setCurrentView={setCurrentView} />
{/* Main Content Area depending on view */} {/* Main Content Area depending on view */}
{currentView === 'monitoring' ? ( {currentView === 'agent' ? (
<MonitoringLayout />
) : currentView === 'agent' ? (
<AgentLayout agentTab={agentTab} setAgentTab={setAgentTab} /> <AgentLayout agentTab={agentTab} setAgentTab={setAgentTab} />
) : currentView === 'resource' ? ( ) : currentView === 'resource' ? (
<ResourceLayout resourceTab={resourceTab} setResourceTab={setResourceTab} /> <ResourceLayout resourceTab={resourceTab} setResourceTab={setResourceTab} />

View File

@ -1,4 +1,4 @@
import { useState } from 'react'; import React, { useState } from 'react';
import { MessageSquare, Activity, Terminal, ChevronRight, Plus } from 'lucide-react'; import { MessageSquare, Activity, Terminal, ChevronRight, Plus } from 'lucide-react';
import apiClient from '../../api/client'; import apiClient from '../../api/client';
@ -21,6 +21,45 @@ export function ChatPanel() {
]); ]);
const [input, setInput] = useState(''); const [input, setInput] = useState('');
const [loading, setLoading] = useState(false); const [loading, setLoading] = useState(false);
const fileInputRef = React.useRef<HTMLInputElement>(null);
const handleFileUpload = async (e: React.ChangeEvent<HTMLInputElement>) => {
const file = e.target.files?.[0];
if (!file) return;
const formData = new FormData();
formData.append('file', file);
setLoading(true);
try {
const response = await apiClient.post('/api/v1/adapter/client/upload', formData, {
headers: {
'Content-Type': 'multipart/form-data'
}
});
const aiMessage: ChatMessage = {
id: Date.now().toString(),
sender: 'ai',
text: `已上传文件: ${response.data.filename}`,
timestamp: new Date()
};
setMessages(prev => [...prev, aiMessage]);
} catch (error) {
console.error("Error uploading file", error);
const errorMessage: ChatMessage = {
id: Date.now().toString(),
sender: 'ai',
text: "文件上传失败。",
timestamp: new Date()
};
setMessages(prev => [...prev, errorMessage]);
} finally {
setLoading(false);
if (fileInputRef.current) {
fileInputRef.current.value = '';
}
}
};
const handleSendMessage = async () => { const handleSendMessage = async () => {
if (!input.trim()) return; if (!input.trim()) return;
@ -124,7 +163,14 @@ export function ChatPanel() {
{/* Chat Input */} {/* Chat Input */}
<div className="p-4 bg-white border-t border-slate-200"> <div className="p-4 bg-white border-t border-slate-200">
<div className="relative flex items-center"> <div className="relative flex items-center">
<input
type="file"
ref={fileInputRef}
onChange={handleFileUpload}
className="hidden"
/>
<button <button
onClick={() => fileInputRef.current?.click()}
className="absolute left-2 p-1.5 text-slate-400 hover:text-blue-600 hover:bg-blue-50 rounded-lg transition-colors z-10 cursor-pointer" className="absolute left-2 p-1.5 text-slate-400 hover:text-blue-600 hover:bg-blue-50 rounded-lg transition-colors z-10 cursor-pointer"
title="Add attachment" title="Add attachment"
> >
@ -146,10 +192,6 @@ export function ChatPanel() {
<ChevronRight size={18} /> <ChevronRight size={18} />
</button> </button>
</div> </div>
<div className="flex mt-2 space-x-3 px-2">
<span className="text-[10px] text-slate-400 cursor-pointer hover:text-blue-500">Run diagnostics</span>
<span className="text-[10px] text-slate-400 cursor-pointer hover:text-blue-500">View recent errors</span>
</div>
</div> </div>
</div> </div>
); );

View File

@ -154,14 +154,6 @@ export function LeftPanel({ activeTab, setActiveTab, selectedWorkflow, setSelect
)} )}
{activeTab === 'chats' && ( {activeTab === 'chats' && (
<div className="space-y-2"> <div className="space-y-2">
<div className="p-3 rounded-lg border border-slate-100 hover:border-blue-200 hover:bg-slate-50 cursor-pointer transition-all">
<div className="font-medium text-sm text-slate-700 mb-1">System Architecture</div>
<p className="text-xs text-slate-400 line-clamp-1">Can you explain the MoE model...</p>
</div>
<div className="p-3 rounded-lg border border-slate-100 hover:border-blue-200 hover:bg-slate-50 cursor-pointer transition-all">
<div className="font-medium text-sm text-slate-700 mb-1">Log Analysis Helper</div>
<p className="text-xs text-slate-400 line-clamp-1">Show me the errors from yesterday.</p>
</div>
</div> </div>
)} )}
</div> </div>

View File

@ -16,56 +16,43 @@ export function RightPanel({ selectedWorkflow }: RightPanelProps) {
return; return;
} }
let ws: WebSocket | null = null; let eventSource: EventSource | null = null;
let reconnectTimeout: ReturnType<typeof setTimeout>;
let retryCount = 0;
const maxRetryCount = 10;
const baseDelay = 1000;
const connect = () => { const connect = () => {
const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:'; const protocol = window.location.protocol;
const host = window.location.host; const host = window.location.host;
const wsBase = import.meta.env.VITE_API_BASE_URL const apiBase = import.meta.env.VITE_API_BASE_URL || `${protocol}//${host}`;
? import.meta.env.VITE_API_BASE_URL.replace(/^http/, 'ws')
: `${protocol}//${host}`;
// Using the workflow router WS endpoint // Using the workflow router SSE endpoint
ws = new WebSocket(`${wsBase}/api/v1/workflow/ws/${selectedWorkflow}`); eventSource = new EventSource(`${apiBase}/api/v1/workflow/sse/${selectedWorkflow}`);
ws.onopen = () => { eventSource.onopen = () => {
setIsConnected(true); setIsConnected(true);
retryCount = 0; // reset on successful connection
setMessages([]); // clear previous traces setMessages([]); // clear previous traces
}; };
ws.onmessage = (event) => { eventSource.onmessage = (event) => {
try { try {
setMessages(prev => [...prev, event.data]); setMessages(prev => [...prev, event.data]);
} catch (e) { } catch (e) {
console.error("Error receiving workflow websocket message", e); console.error("Error receiving workflow SSE message", e);
} }
}; };
ws.onclose = () => { eventSource.onerror = (error) => {
console.error("EventSource failed.", error);
setIsConnected(false); setIsConnected(false);
if (retryCount < maxRetryCount) { // EventSource automatically attempts to reconnect, so we can just let it be,
const delay = baseDelay * Math.pow(2, retryCount); // or we could close it if we wanted to handle retries manually.
retryCount++;
console.log(`WebSocket closed. Reconnecting in ${delay}ms... (Attempt ${retryCount})`);
reconnectTimeout = setTimeout(connect, delay);
} else {
console.error("Max WebSocket reconnect attempts reached.");
}
}; };
}; };
connect(); connect();
return () => { return () => {
clearTimeout(reconnectTimeout); if (eventSource) {
if (ws) { eventSource.close();
ws.close();
} }
}; };
}, [selectedWorkflow]); }, [selectedWorkflow]);

View File

@ -1,5 +1,5 @@
import { Activity, MessageSquare, MonitorPlay, Settings, Bot, Box } from 'lucide-react'; import { Activity, MessageSquare, Settings, Bot, Box } from 'lucide-react';
interface SidebarProps { interface SidebarProps {
currentView: string; currentView: string;
@ -24,13 +24,6 @@ export function Sidebar({ currentView, setCurrentView }: SidebarProps) {
> >
<MessageSquare size={18} /> <MessageSquare size={18} />
</button> </button>
<button
onClick={() => setCurrentView('monitoring')}
className={`p-1.5 rounded-lg transition-colors ${currentView === 'monitoring' ? 'text-blue-600 bg-blue-50' : 'text-slate-400 hover:text-blue-500 hover:bg-blue-50'}`}
title="Monitoring"
>
<MonitorPlay size={18} />
</button>
<button <button
onClick={() => setCurrentView('agent')} onClick={() => setCurrentView('agent')}
className={`p-1.5 rounded-lg transition-colors ${currentView === 'agent' ? 'text-blue-600 bg-blue-50' : 'text-slate-400 hover:text-blue-500 hover:bg-blue-50'}`} className={`p-1.5 rounded-lg transition-colors ${currentView === 'agent' ? 'text-blue-600 bg-blue-50' : 'text-slate-400 hover:text-blue-500 hover:bg-blue-50'}`}

View File

@ -1,154 +0,0 @@
import { Server, Cpu, HardDrive, Box } from 'lucide-react';
import type { ClusterNode } from '../../types';
import { useClusterState } from '../../hooks/useClusterState';
export function MonitoringDashboard() {
const { nodes, isConnected } = useClusterState();
const totalNodes = nodes.length;
let totalCpu = 0;
let totalMemory = 0;
let totalGpu = 0;
nodes.forEach((node: ClusterNode) => {
totalCpu += node.resources?.CPU || 0;
totalMemory += node.resources?.memory || 0;
totalGpu += node.resources?.GPU || 0;
});
return (
<div className="flex-1 flex flex-col bg-slate-50 overflow-hidden">
<div className="p-6 border-b border-slate-200 bg-white shadow-sm z-10 flex items-center justify-between">
<div>
<h2 className="text-xl font-semibold text-slate-800">Ray Cluster Monitoring</h2>
<p className="text-sm text-slate-500 mt-1">Real-time resource utilization across all nodes.</p>
</div>
<div className="flex items-center space-x-2">
<span className={`flex h-2.5 w-2.5 rounded-full ${isConnected ? 'bg-green-500 animate-pulse' : 'bg-red-500'}`}></span>
<span className="text-sm font-medium text-slate-600">{isConnected ? 'Connected' : 'Disconnected'}</span>
</div>
</div>
<div className="flex-1 overflow-y-auto p-6 space-y-6">
{/* Cluster Global Metrics */}
<div className="grid grid-cols-4 gap-4">
<div className="bg-white p-4 rounded-xl border border-slate-200 shadow-sm flex items-center">
<div className="w-12 h-12 rounded-lg bg-blue-50 flex items-center justify-center mr-4">
<Server size={24} className="text-blue-600" />
</div>
<div>
<p className="text-xs text-slate-500 font-medium">TOTAL NODES</p>
<p className="text-2xl font-bold text-slate-800">{totalNodes}</p>
</div>
</div>
<div className="bg-white p-4 rounded-xl border border-slate-200 shadow-sm flex items-center">
<div className="w-12 h-12 rounded-lg bg-indigo-50 flex items-center justify-center mr-4">
<Cpu size={24} className="text-indigo-600" />
</div>
<div>
<p className="text-xs text-slate-500 font-medium">TOTAL CPU CORES</p>
<p className="text-2xl font-bold text-slate-800">{totalCpu}</p>
</div>
</div>
<div className="bg-white p-4 rounded-xl border border-slate-200 shadow-sm flex items-center">
<div className="w-12 h-12 rounded-lg bg-green-50 flex items-center justify-center mr-4">
<HardDrive size={24} className="text-green-600" />
</div>
<div>
<p className="text-xs text-slate-500 font-medium">TOTAL RAM</p>
<p className="text-2xl font-bold text-slate-800">
{totalMemory > 0 ? `${(totalMemory / (1024 * 1024 * 1024)).toFixed(2)} GB` : '0 GB'}
</p>
</div>
</div>
<div className="bg-white p-4 rounded-xl border border-slate-200 shadow-sm flex items-center">
<div className="w-12 h-12 rounded-lg bg-purple-50 flex items-center justify-center mr-4">
<Box size={24} className="text-purple-600" />
</div>
<div>
<p className="text-xs text-slate-500 font-medium">TOTAL GPUS</p>
<p className="text-2xl font-bold text-slate-800">{totalGpu}</p>
</div>
</div>
</div>
{/* Node List */}
<div className="bg-white border border-slate-200 rounded-xl shadow-sm overflow-hidden">
<table className="w-full text-left text-sm">
<thead className="bg-slate-50 border-b border-slate-200 text-slate-500">
<tr>
<th className="px-6 py-4 font-medium">Node ID / Name</th>
<th className="px-6 py-4 font-medium">Status</th>
<th className="px-6 py-4 font-medium">CPU (Used / Total)</th>
<th className="px-6 py-4 font-medium">RAM (Used / Total)</th>
<th className="px-6 py-4 font-medium">GPU (Used / Total)</th>
</tr>
</thead>
<tbody className="divide-y divide-slate-100">
{nodes.map((node: ClusterNode, i: number) => {
const totalCpu = node.resources?.CPU || 0;
const remainingCpu = node.remaining?.CPU || 0;
const usedCpu = totalCpu - remainingCpu;
const cpuPercent = totalCpu > 0 ? (usedCpu / totalCpu) * 100 : 0;
const totalRam = node.resources?.memory || 0;
const remainingRam = node.remaining?.memory || 0;
const usedRam = totalRam - remainingRam;
const ramPercent = totalRam > 0 ? (usedRam / totalRam) * 100 : 0;
const totalGpu = node.resources?.GPU || 0;
const remainingGpu = node.remaining?.GPU || 0;
const usedGpu = totalGpu - remainingGpu;
const gpuPercent = totalGpu > 0 ? (usedGpu / totalGpu) * 100 : 0;
return (
<tr key={i} className="hover:bg-slate-50 transition-colors">
<td className="px-6 py-4 font-medium text-slate-800 flex flex-col">
<span>{node.node_name || 'Unknown'}</span>
<span className="text-xs text-slate-400 font-mono">{node.node_id}</span>
</td>
<td className="px-6 py-4">
<span className={`flex items-center text-xs font-medium ${node.alive ? 'text-green-600' : 'text-red-600'}`}>
<span className={`w-2 h-2 rounded-full mr-2 ${node.alive ? 'bg-green-500' : 'bg-red-500'}`}></span>
{node.alive ? 'Alive' : 'Dead'}
</span>
</td>
<td className="px-6 py-4">
<div className="flex items-center">
<span className="w-16 text-right mr-2 text-xs">{usedCpu.toFixed(1)} / {totalCpu}</span>
<div className="w-16 bg-slate-100 rounded-full h-1.5"><div className={`h-1.5 rounded-full ${cpuPercent > 80 ? 'bg-red-500' : 'bg-indigo-500'}`} style={{ width: `${cpuPercent}%` }}></div></div>
</div>
</td>
<td className="px-6 py-4 text-slate-600 text-xs">
<div className="flex items-center">
<span className="w-24 text-right mr-2 text-xs">{(usedRam / (1024**3)).toFixed(1)}G / {(totalRam / (1024**3)).toFixed(1)}G</span>
<div className="w-16 bg-slate-100 rounded-full h-1.5"><div className={`h-1.5 rounded-full ${ramPercent > 80 ? 'bg-red-500' : 'bg-green-500'}`} style={{ width: `${ramPercent}%` }}></div></div>
</div>
</td>
<td className="px-6 py-4">
{totalGpu > 0 ? (
<div className="flex items-center">
<span className="w-16 text-right mr-2 text-xs">{usedGpu} / {totalGpu}</span>
<div className="w-16 bg-slate-100 rounded-full h-1.5"><div className={`h-1.5 rounded-full ${gpuPercent > 80 ? 'bg-red-500' : 'bg-purple-500'}`} style={{ width: `${gpuPercent}%` }}></div></div>
</div>
) : (
<span className="text-slate-400 italic text-xs">No GPU</span>
)}
</td>
</tr>
);
})}
{nodes.length === 0 && (
<tr>
<td colSpan={5} className="px-6 py-8 text-center text-slate-500 text-sm">
No node data available. {isConnected ? 'Waiting for cluster state...' : 'Check connection.'}
</td>
</tr>
)}
</tbody>
</table>
</div>
</div>
</div>
);
}

View File

@ -1,33 +0,0 @@
import { useState } from 'react';
import { Server } from 'lucide-react';
import { MonitoringDashboard } from './MonitoringDashboard';
export function MonitoringLayout() {
const [activeTab, setActiveTab] = useState('cluster');
return (
<div className="flex-1 flex bg-slate-50 overflow-hidden">
{/* Monitoring Inner Sidebar */}
<div className="w-64 bg-white border-r border-slate-200 flex flex-col z-0">
<div className="p-6 border-b border-slate-100">
<h2 className="text-lg font-semibold text-slate-800">Monitoring</h2>
</div>
<div className="flex-1 p-4 space-y-2 overflow-y-auto">
<button
onClick={() => setActiveTab('cluster')}
className={`w-full flex items-center px-4 py-3 text-sm font-medium rounded-xl transition-all ${activeTab === 'cluster' ? 'bg-blue-50 text-blue-600' : 'text-slate-600 hover:bg-slate-50 hover:text-slate-900'}`}
>
<Server size={18} className="mr-3" />
Cluster Monitor
</button>
{/* Future monitoring tabs (e.g., Application Logs, Agent Metrics) can go here */}
</div>
</div>
{/* Monitoring Main Content */}
<div className="flex-1 overflow-y-auto">
{activeTab === 'cluster' && <MonitoringDashboard />}
</div>
</div>
);
}

18
main.py
View File

@ -25,6 +25,7 @@ async def start_system():
} }
ray.init(ignore_reinit_error=True, ray.init(ignore_reinit_error=True,
namespace="pretor",
dashboard_host="0.0.0.0", dashboard_host="0.0.0.0",
dashboard_port=8265, dashboard_port=8265,
runtime_env={"env_vars": env_vars}) runtime_env={"env_vars": env_vars})
@ -34,8 +35,21 @@ async def start_system():
postgres_database = PostgresDatabase.options(name='postgres_database').remote() postgres_database = PostgresDatabase.options(name='postgres_database').remote()
await postgres_database.init_db.remote() await postgres_database.init_db.remote()
# 3. 启动全局状态机 global_state_machine = GlobalStateMachine.options(
global_state_machine = GlobalStateMachine.options(name='global_state_machine').remote(postgres_database) name='global_state_machine',
namespace='pretor',
lifetime='detached'
).remote(postgres_database)
print("正在等待 GlobalStateMachine 初始化并加载注册表...")
try:
# 强制执行初始化方法并阻塞等待结果。
# 如果 __init__ 或 init_state_machine 中有任何报错,会立刻在这里抛出!
await global_state_machine.init_state_machine.remote()
print("GlobalStateMachine 初始化成功!")
except Exception as e:
print(f"\n[致命错误] GlobalStateMachine 启动失败!真实报错如下:\n{e}\n")
return
# 4. 启动核心节点 # 4. 启动核心节点
supervisory_node = SupervisoryNode.options(name='supervisory_node').remote() supervisory_node = SupervisoryNode.options(name='supervisory_node').remote()

View File

@ -12,34 +12,8 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
from fastapi import APIRouter, WebSocket, WebSocketDisconnect from fastapi import APIRouter
import ray
import asyncio
cluster_router = APIRouter(prefix="/api/v1/cluster", tags=["cluster"]) cluster_router = APIRouter(prefix="/api/v1/cluster", tags=["cluster"])
@cluster_router.websocket("/ws/state") # Monitor websocket API temporarily removed
async def update_cluster_state(websocket: WebSocket):
await websocket.accept()
try:
while True:
nodes = ray.nodes()
payload = [
{
"node_id": node.get("NodeID"),
"node_name": node.get("NodeName"),
"alive": node.get("Alive"),
"resources": node.get("Resources", {}),
"remaining": node.get("RemainingResources", {})
}
for node in nodes
]
await websocket.send_json(payload)
await asyncio.sleep(10)
except WebSocketDisconnect:
pass
except RuntimeError as e:
if "closed" not in str(e) and "GeneratorExit" not in str(e):
raise
except Exception:
pass

View File

@ -12,11 +12,13 @@
# See the License for the specific language governing permissions and # See the License for the specific language governing permissions and
# limitations under the License. # limitations under the License.
from fastapi import APIRouter, Depends, HTTPException, status from fastapi import APIRouter, Depends, HTTPException, status, UploadFile, File
from pydantic import BaseModel from pydantic import BaseModel
from pretor.utils.access import Accessor, TokenData from pretor.utils.access import Accessor, TokenData
from pretor.api.platform.event import PretorEvent from pretor.api.platform.event import PretorEvent
from pretor.utils.ray_hook import ray_actor_hook from pretor.utils.ray_hook import ray_actor_hook
import os
import shutil
from pretor.utils.logger import get_logger from pretor.utils.logger import get_logger
logger = get_logger('frontend') logger = get_logger('frontend')
@ -45,3 +47,17 @@ async def create_message(message: Message,
else: else:
return {"message": message} return {"message": message}
@client_router.post("/upload")
async def upload_file(file: UploadFile = File(...),
token_data: TokenData = Depends(Accessor.get_current_user)):
try:
upload_dir = "uploads"
os.makedirs(upload_dir, exist_ok=True)
file_path = os.path.join(upload_dir, file.filename)
with open(file_path, "wb") as buffer:
shutil.copyfileobj(file.file, buffer)
logger.info(f"用户 {token_data.username} 上传了文件: {file.filename}")
return {"filename": file.filename, "message": f"File {file.filename} uploaded successfully"}
except Exception as e:
logger.error(f"文件上传失败: {e}")
raise HTTPException(status_code=500, detail="文件上传失败")

View File

@ -14,26 +14,37 @@
from pretor.utils.ray_hook import ray_actor_hook from pretor.utils.ray_hook import ray_actor_hook
from fastapi import APIRouter, WebSocket, WebSocketDisconnect from fastapi import APIRouter, Request
from fastapi.responses import StreamingResponse
import asyncio
workflow_router = APIRouter(prefix="/api/v1/workflow", tags=["workflow"]) workflow_router = APIRouter(prefix="/api/v1/workflow", tags=["workflow"])
@workflow_router.websocket("/ws/{trace_id}") @workflow_router.get("/sse/{trace_id}")
async def get_workflow(websocket: WebSocket, trace_id: str): async def get_workflow_sse(trace_id: str, request: Request):
await websocket.accept() global_state_machine = ray_actor_hook("global_state_machine").global_state_machine
global_state_machine = ray_actor_hook("global_state_machine")
try: async def event_generator():
while True: try:
await websocket.send(await global_state_machine.get_workflow.remote(trace_id)) while True:
await websocket.send_text(await global_state_machine.get_pending.remote(trace_id)) if await request.is_disconnected():
response = await websocket.receive_text() break
await global_state_machine.put_received(trace_id, response)
except WebSocketDisconnect: # You might also want to send the workflow state periodically or when updated
pass # Here we just wait for pending messages and send them
except RuntimeError as e: message = await global_state_machine.get_pending.remote(trace_id)
if "closed" not in str(e) and "GeneratorExit" not in str(e): # Ensure the message is formatted as SSE
raise yield f"data: {message}\n\n"
except Exception: except asyncio.CancelledError:
pass pass
return StreamingResponse(event_generator(), media_type="text/event-stream")
@workflow_router.post("/reply/{trace_id}")
async def post_workflow_reply(trace_id: str, request: Request):
data = await request.json()
reply_msg = data.get("message", "")
global_state_machine = ray_actor_hook("global_state_machine").global_state_machine
await global_state_machine.put_received.remote(trace_id, reply_msg)
return {"status": "ok"}

View File

@ -28,16 +28,22 @@ from pretor.core.global_state_machine.individual_manager import GlobalIndividual
@ray.remote @ray.remote
class GlobalStateMachine: class GlobalStateMachine:
def __init__(self, postgres_database: PostgresDatabase): def __init__(self, postgres_database: PostgresDatabase):
import sys
print("GSM __init__ START", file=sys.stderr, flush=True)
self.event_dict: Dict[str, PretorEvent] = {} self.event_dict: Dict[str, PretorEvent] = {}
print(" event_dict done", file=sys.stderr, flush=True)
self._global_provider_manager = ProviderManager(postgres_database) self._global_provider_manager = ProviderManager(postgres_database)
print(" provider_manager done", file=sys.stderr, flush=True)
self._global_tool_manager = GlobalToolManager() self._global_tool_manager = GlobalToolManager()
print(" tool_manager done", file=sys.stderr, flush=True)
self._global_workflow_template_manager = WorkflowManager() self._global_workflow_template_manager = WorkflowManager()
print(" workflow_template_manager done", file=sys.stderr, flush=True)
self._global_skill_manager = GlobalSkillManager() self._global_skill_manager = GlobalSkillManager()
print(" skill_manager done", file=sys.stderr, flush=True)
self._global_individual_manager = GlobalIndividualManager() self._global_individual_manager = GlobalIndividualManager()
print(" individual_manager done", file=sys.stderr, flush=True)
self.postgres_database = postgres_database self.postgres_database = postgres_database
print("GSM __init__ DONE", file=sys.stderr, flush=True)
async def init_state_machine(self): async def init_state_machine(self):
await self._global_provider_manager.init_provider_register(self.postgres_database) await self._global_provider_manager.init_provider_register(self.postgres_database)

View File

@ -268,9 +268,11 @@ class WorkflowRunningEngine:
self.consciousness_node = consciousness_node self.consciousness_node = consciousness_node
self.control_node = control_node self.control_node = control_node
self.supervisory_node = supervisory_node self.supervisory_node = supervisory_node
self.global_state_machine = ray_actor_hook("global_state_machine").global_state_machine self.global_state_machine = None
async def run(self): async def run(self):
# Move actor hook to async start so we don't race during __init__ across cluster
self.global_state_machine = ray_actor_hook("global_state_machine").global_state_machine
self.workflow_queue = asyncio.Queue() self.workflow_queue = asyncio.Queue()
self.runner_engine = { self.runner_engine = {
f"runner_{i}": asyncio.create_task(self.runner(i)) f"runner_{i}": asyncio.create_task(self.runner(i))

View File

@ -35,7 +35,7 @@ class ActorList:
@lru_cache(maxsize=128) @lru_cache(maxsize=128)
def _get_cached_actor_handle(actor_name: str): def _get_cached_actor_handle(actor_name: str):
"""缓存接口""" """缓存接口"""
return ray.get_actor(actor_name) return ray.get_actor(actor_name, namespace="pretor")
def clear_actor_cache(): def clear_actor_cache():
"""清理接口""" """清理接口"""

View File

@ -1,4 +1,3 @@
import pytest
from pretor.core.database.table.provider import Provider from pretor.core.database.table.provider import Provider
def test_provider_table(): def test_provider_table():

View File

@ -1,4 +1,3 @@
import pytest
from pretor.core.database.table.user import User from pretor.core.database.table.user import User
def test_user_table(): def test_user_table():

View File

@ -1,4 +1,3 @@
import pytest
from pretor.core.global_state_machine.model_provider.base_provider import Provider, ProviderArgs, ProviderStatus from pretor.core.global_state_machine.model_provider.base_provider import Provider, ProviderArgs, ProviderStatus
def test_provider_status(): def test_provider_status():

View File

@ -5,7 +5,6 @@ from pretor.core.global_state_machine.provider_manager import ProviderManager
@pytest.mark.asyncio @pytest.mark.asyncio
async def test_provider_manager_init(): async def test_provider_manager_init():
from pretor.core.global_state_machine.provider_manager import ProviderManager
mock_postgres = MagicMock() mock_postgres = MagicMock()
mock_provider1 = MagicMock() mock_provider1 = MagicMock()
mock_provider1.provider_title = "title1" mock_provider1.provider_title = "title1"

View File

@ -1,4 +1,3 @@
import pytest
from unittest.mock import patch, MagicMock from unittest.mock import patch, MagicMock
from pretor.core.workflow.workflow_template_generator.workflow_template_generator import WorkflowTemplateGenerator from pretor.core.workflow.workflow_template_generator.workflow_template_generator import WorkflowTemplateGenerator

View File

@ -1,4 +1,3 @@
import pytest
import json import json
from unittest.mock import MagicMock, patch, mock_open from unittest.mock import MagicMock, patch, mock_open
from pathlib import Path from pathlib import Path

View File

@ -83,7 +83,6 @@ def test_decode_token_validation_error():
token = "valid.jwt.invalid.payload" token = "valid.jwt.invalid.payload"
payload = {"wrong": "payload"} payload = {"wrong": "payload"}
import pydantic
from fastapi import HTTPException from fastapi import HTTPException
with patch("jwt.decode", return_value=payload): with patch("jwt.decode", return_value=payload):