smolagents documentation
Agents
Smolagents is an experimental API that is subject to change at any time. Results returned by the agents can vary as the API or the underlying models change.
To learn more about agents and tools, make sure to read the introductory guide. This page contains the API docs for the underlying classes.
Agents
Our agents inherit from MultiStepAgent, which means they can act in multiple steps, each step consisting of one thought, then one tool call and execution. Read the conceptual guide to learn more.
We provide two types of agents, based on the main MultiStepAgent class:
- CodeAgent is the default agent; it writes its tool calls in Python code.
- ToolCallingAgent writes its tool calls in JSON.
Both require the arguments model and tools (a list of tools) at initialization, as shown in the sketch below.
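For instance, a minimal setup might look like the following (a sketch: the model choice is illustrative and tools is left empty for brevity):

from smolagents import CodeAgent, ToolCallingAgent, InferenceClientModel

model = InferenceClientModel(model_id="meta-llama/Meta-Llama-3.1-8B-Instruct")
code_agent = CodeAgent(tools=[], model=model)        # writes tool calls as Python code
json_agent = ToolCallingAgent(tools=[], model=model) # writes tool calls as JSON
code_agent.run("What is the result of 2 to the power of 3.7384?")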
Agent classes
class smolagents.MultiStepAgent
< source >( tools: list model: Model prompt_templates: smolagents.agents.PromptTemplates | None = None instructions: str | None = None max_steps: int = 20 add_base_tools: bool = False verbosity_level: LogLevel = <LogLevel.INFO: 1> managed_agents: list | None = None step_callbacks: list[collections.abc.Callable] | dict[typing.Type[smolagents.memory.MemoryStep], collections.abc.Callable | list[collections.abc.Callable]] | None = None planning_interval: int | None = None name: str | None = None description: str | None = None provide_run_summary: bool = False final_answer_checks: list[collections.abc.Callable] | None = None return_full_result: bool = False logger: smolagents.monitoring.AgentLogger | None = None )
Parameters
- tools (list[Tool]) — Tools that the agent can use.
- model (Callable[[list[dict[str, str]]], ChatMessage]) — Model that will generate the agent's actions.
- prompt_templates (PromptTemplates, optional) — Prompt templates.
- instructions (str, optional) — Custom instructions for the agent; will be inserted in the system prompt.
- max_steps (int, default 20) — Maximum number of steps the agent can take to solve the task.
- add_base_tools (bool, default False) — Whether to add the base tools to the agent's tools.
- verbosity_level (LogLevel, default LogLevel.INFO) — Level of verbosity of the agent's logs.
- managed_agents (list, optional) — Managed agents that the agent can call.
- step_callbacks (list[Callable] | dict[Type[MemoryStep], Callable | list[Callable]], optional) — Callbacks that will be called at each step.
- planning_interval (int, optional) — Interval at which the agent will run a planning step.
- name (str, optional) — Necessary for a managed agent only: the name by which this agent can be called.
- description (str, optional) — Necessary for a managed agent only: the description of this agent.
- provide_run_summary (bool, optional) — Whether to provide a run summary when called as a managed agent.
- final_answer_checks (list[Callable], optional) — List of validation functions to run before accepting a final answer. Each function should:
  - Take the final answer and the agent's memory as arguments.
  - Return a boolean indicating whether the final answer is valid.
Agent class that solves the given task step by step, using the ReAct framework: While the objective is not reached, the agent will perform a cycle of action (given by the LLM) and observation (obtained from the environment).
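To make the cycle concrete, here is a sketch of running an agent and then inspecting the recorded steps (the task and model are illustrative; agent.memory.steps holds the per-step records):

from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(tools=[], model=InferenceClientModel(), max_steps=5)
agent.run("How many seconds are there in a leap year?")
# Each action/observation cycle is recorded as a step in the agent's memory:
for step in agent.memory.steps:
    print(type(step).__name__)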
extract_action
< source >( model_output: str split_token: str )
Parse action from the LLM output
from_dict
< source >( agent_dict: dict **kwargs ) → MultiStepAgent
Create agent from a dictionary representation.
from_folder
< source >( folder: str | pathlib.Path **kwargs )
Loads an agent from a local folder.
from_hub
< source >( repo_id: str token: str | None = None trust_remote_code: bool = False **kwargs )
Parameters
- repo_id (str) — The name of the repo on the Hub where your tool is defined.
- token (str, optional) — The token to identify you on hf.co. If unset, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- trust_remote_code (bool, optional, defaults to False) — This flag marks that you understand the risk of running remote code and that you trust this tool. If not set to True, loading the tool from the Hub will fail.
- kwargs (additional keyword arguments, optional) — Additional keyword arguments that will be split in two: all arguments relevant to the Hub (such as cache_dir, revision, subfolder) will be used when downloading the files for your agent, and the others will be passed along to its init.
Loads an agent defined on the Hub.
Loading a tool from the Hub means that you’ll download the tool and execute it locally. ALWAYS inspect the tool you’re downloading before loading it within your runtime, as you would do when installing a package using pip/npm/apt.
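As a sketch, loading and running such an agent could look like this ("username/my-agent" is a placeholder repo id, and the saved agent is assumed to be a CodeAgent):

from smolagents import CodeAgent

# Placeholder repo id; inspect the repo before trusting it.
agent = CodeAgent.from_hub("username/my-agent", trust_remote_code=True)
agent.run("Give me a one-line summary of your capabilities.")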
initialize_system_prompt
To be implemented in child classes.
interrupt
Interrupts the agent execution.
provide_final_answer
< source >( task: str images: list['PIL.Image.Image'] | None = None ) → str
Provide the final answer to the task, based on the logs of the agent’s interactions.
push_to_hub
< source >( repo_id: str commit_message: str = 'Upload agent' private: bool | None = None token: bool | str | None = None create_pr: bool = False )
Parameters
- repo_id (str) — The name of the repository you want to push to. It should contain your organization name when pushing to a given organization.
- commit_message (str, optional, defaults to "Upload agent") — Message to commit while pushing.
- private (bool, optional, defaults to None) — Whether to make the repo private. If None, the repo will be public unless the organization's default is private. This value is ignored if the repo already exists.
- token (bool or str, optional) — The token to use as HTTP bearer authorization for remote files. If unset, will use the token generated when running huggingface-cli login (stored in ~/.huggingface).
- create_pr (bool, optional, defaults to False) — Whether to create a PR with the uploaded files or directly commit.
Upload the agent to the Hub.
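A sketch of pushing an agent (the repo id is a placeholder):

from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(tools=[], model=InferenceClientModel())
# Assumes you are logged in via huggingface-cli login.
agent.push_to_hub("username/my-agent", commit_message="Upload agent", private=True)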
replay
< source >( detailed: bool = False )
Prints a pretty replay of the agent’s steps.
run
< source >( task: str stream: bool = False reset: bool = True images: list['PIL.Image.Image'] | None = None additional_args: dict | None = None max_steps: int | None = None )
Parameters
- task (str) — Task to perform.
- stream (bool) — Whether to run in streaming mode. If True, returns a generator that yields each step as it is executed. You must iterate over this generator to process the individual steps (e.g., using a for loop or next()). If False, executes all steps internally and returns only the final answer after completion.
- reset (bool) — Whether to reset the conversation or keep it going from a previous run.
- images (list[PIL.Image.Image], optional) — Image objects.
- additional_args (dict, optional) — Any other variables that you want to pass to the agent run, for instance images or dataframes. Give them clear names!
- max_steps (int, optional) — Maximum number of steps the agent can take to solve the task. If not provided, will use the agent's default value.
Run the agent for the given task.
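For example, a sketch of both modes (the task is illustrative):

from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(tools=[], model=InferenceClientModel())

# Blocking mode: executes all steps internally and returns only the final answer.
answer = agent.run("What is the capital of Australia?")

# Streaming mode: iterate over the generator to process each step as it executes.
for step in agent.run("What is the capital of Australia?", stream=True):
    print(step)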
save
< source >( output_dir: str | pathlib.Path relative_path: str | None = None )
Saves the relevant code files for your agent. This will copy the code of your agent in output_dir as well as autogenerate:
- a tools folder containing the logic for each of the tools under tools/{tool_name}.py.
- a managed_agents folder containing the logic for each of the managed agents.
- an agent.json file containing a dictionary representing your agent.
- a prompt.yaml file containing the prompt templates used by your agent.
- an app.py file providing a UI for your agent when it is exported to a Space with agent.push_to_hub().
- a requirements.txt containing the names of the modules used by your tool (as detected when inspecting its code).
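A sketch of saving an agent and loading it back (the directory path is illustrative):

from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(tools=[], model=InferenceClientModel())
agent.save("./my_agent")  # writes tools/, agent.json, prompt.yaml, app.py, requirements.txt
restored = CodeAgent.from_folder("./my_agent")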
step
Perform one step in the ReAct framework: the agent thinks, acts, and observes the result. Returns either None if the step is not final, or the final answer.
to_dict
Convert the agent to a dictionary representation.
visualize
Creates a rich tree visualization of the agent's structure.
write_memory_to_messages
Reads past llm_outputs, actions, and observations or errors from the memory into a series of messages that can be used as input to the LLM. Adds a number of keywords (such as PLAN, error, etc.) to help the LLM.
class smolagents.CodeAgent
< source >( tools: list model: Model prompt_templates: smolagents.agents.PromptTemplates | None = None additional_authorized_imports: list[str] | None = None planning_interval: int | None = None executor_type: typing.Literal['local', 'e2b', 'docker', 'wasm'] = 'local' executor_kwargs: dict[str, typing.Any] | None = None max_print_outputs_length: int | None = None stream_outputs: bool = False use_structured_outputs_internally: bool = False code_block_tags: str | tuple[str, str] | None = None **kwargs )
Parameters
- tools (list[Tool]) — Tools that the agent can use.
- model (Model) — Model that will generate the agent's actions.
- prompt_templates (PromptTemplates, optional) — Prompt templates.
- additional_authorized_imports (list[str], optional) — Additional authorized imports for the agent.
- planning_interval (int, optional) — Interval at which the agent will run a planning step.
- executor_type (Literal["local", "e2b", "docker", "wasm"], default "local") — Type of code executor.
- executor_kwargs (dict, optional) — Additional arguments to pass to initialize the executor.
- max_print_outputs_length (int, optional) — Maximum length of the print outputs.
- stream_outputs (bool, optional, default False) — Whether to stream outputs during execution.
- use_structured_outputs_internally (bool, default False) — Whether to use structured generation at each action step: improves performance for many models. Added in 1.17.0.
- code_block_tags (tuple[str, str] | Literal["markdown"], optional) — Opening and closing tags for code blocks (regex strings). Pass a custom tuple, pass "markdown" to use ("```(?:python|py)", "\n```"), or leave empty to use the default ("<code>", "</code>").
- **kwargs — Additional keyword arguments.
In this agent, the tool calls will be formulated by the LLM in code format, then parsed and executed.
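A sketch of a CodeAgent with extra authorized imports and an explicit executor (the parameter values are illustrative; the sandboxed executor types require the corresponding setup):

from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(
    tools=[],
    model=InferenceClientModel(),
    additional_authorized_imports=["numpy", "pandas"],
    executor_type="local",  # or "e2b", "docker", "wasm" for sandboxed execution
)
agent.run("Compute the mean of [2, 4, 6] using numpy.")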
cleanup
Clean up resources used by the agent, such as the remote Python executor.
from_dict
< source >( agent_dict: dict **kwargs ) → CodeAgent
Create CodeAgent from a dictionary representation.
class smolagents.ToolCallingAgent
< source >( tools: list model: Model prompt_templates: smolagents.agents.PromptTemplates | None = None planning_interval: int | None = None stream_outputs: bool = False max_tool_threads: int | None = None **kwargs )
Parameters
- tools (list[Tool]) — Tools that the agent can use.
- model (Model) — Model that will generate the agent's actions.
- prompt_templates (PromptTemplates, optional) — Prompt templates.
- planning_interval (int, optional) — Interval at which the agent will run a planning step.
- stream_outputs (bool, optional, default False) — Whether to stream outputs during execution.
- max_tool_threads (int, optional) — Maximum number of threads for parallel tool calls. Higher values increase concurrency but also resource usage. Defaults to ThreadPoolExecutor's default.
- **kwargs — Additional keyword arguments.
This agent uses JSON-like tool calls, using method model.get_tool_call to leverage the LLM engine's tool calling capabilities.
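A sketch of instantiating a ToolCallingAgent (the WebSearchTool here is illustrative; any Tool instance works):

from smolagents import ToolCallingAgent, WebSearchTool, InferenceClientModel

agent = ToolCallingAgent(tools=[WebSearchTool()], model=InferenceClientModel())
agent.run("How many people live in the capital of Japan?")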
execute_tool_call
< source >( tool_name: str arguments: dict[str, str] | str )
Execute a tool or managed agent with the provided arguments.
The arguments are replaced with the actual values from the state if they refer to state variables.
process_tool_calls
< source >( chat_message: ChatMessage memory_step: ActionStep ) → ToolCall | ToolOutput
Process tool calls from the model output and update agent memory.
stream_to_gradio
smolagents.stream_to_gradio
< source >( agent task: str task_images: list | None = None reset_agent_memory: bool = False additional_args: dict | None = None )
Runs an agent with the given task and streams the messages from the agent as gradio ChatMessages.
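A sketch of consuming the stream (simplified; a real app would append each item to a gr.Chatbot history, and the task is illustrative):

from smolagents import CodeAgent, InferenceClientModel, stream_to_gradio

agent = CodeAgent(tools=[], model=InferenceClientModel())
# Each yielded item is a gradio ChatMessage representing one piece of agent output.
for message in stream_to_gradio(agent, task="Draft a haiku about agents.", reset_agent_memory=True):
    print(message)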
GradioUI
You must install gradio to use the UI. If it is not installed yet, run pip install smolagents[gradio].
class smolagents.GradioUI
< source >( agent: MultiStepAgent file_upload_folder: str | None = None reset_agent_memory: bool = False )
Parameters
- agent (MultiStepAgent) — The agent to interact with.
- file_upload_folder (str, optional) — The folder where uploaded files will be saved. If not provided, file uploads are disabled.
- reset_agent_memory (bool, optional, defaults to False) — Whether to reset the agent's memory at the start of each interaction. If True, the agent will not remember previous interactions.
Raises
- ModuleNotFoundError — If the gradio extra is not installed.
Gradio interface for interacting with a MultiStepAgent.
This class provides a web interface to interact with the agent in real-time, allowing users to submit prompts, upload files, and receive responses in a chat-like format.
It can reset the agent’s memory at the start of each interaction if desired.
It supports file uploads, which are saved to a specified folder.
It uses the gradio.Chatbot component to display the conversation history.
This class requires the gradio extra to be installed: smolagents[gradio].
Example:
from smolagents import CodeAgent, GradioUI, InferenceClientModel
model = InferenceClientModel(model_id="meta-llama/Meta-Llama-3.1-8B-Instruct")
agent = CodeAgent(tools=[], model=model)
gradio_ui = GradioUI(agent, file_upload_folder="uploads", reset_agent_memory=True)
gradio_ui.launch()
launch
< source >( share: bool = True **kwargs )
Launch the Gradio app with the agent interface.
upload_file
< source >( file file_uploads_log allowed_file_types = None )
Upload a file and add it to the list of uploaded files in the session state.
The file is saved to the self.file_upload_folder folder.
If the file type is not allowed, it returns a message indicating the disallowed file type.
Prompts
class smolagents.PromptTemplates
< source >( )
Parameters
- system_prompt (str) — System prompt.
- planning (PlanningPromptTemplate) — Planning prompt templates.
- managed_agent (ManagedAgentPromptTemplate) — Managed agent prompt templates.
- final_answer (FinalAnswerPromptTemplate) — Final answer prompt templates.
Prompt templates for the agent.
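A sketch of tweaking a template on an existing agent (PromptTemplates behaves like a dictionary; the appended instruction is illustrative):

from smolagents import CodeAgent, InferenceClientModel

agent = CodeAgent(tools=[], model=InferenceClientModel())
# The agent's prompt templates can be inspected and adjusted in place:
agent.prompt_templates["system_prompt"] += "\nAlways answer in one concise sentence."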
class smolagents.PlanningPromptTemplate
< source >( )
Prompt templates for the planning step.
class smolagents.ManagedAgentPromptTemplate
< source >( )
Prompt templates for the managed agent.
class smolagents.FinalAnswerPromptTemplate
< source >( )
Prompt templates for the final answer.