Orchestrating LLMs and Agents Meetup
Had a chance to attend a meetup on Orchestrating LLMs and Agents. The following are some of the points I took note of across the three sessions that made up the two hours of presentations and demos:
What Agentic Frameworks should you use to get results fast? - Sam Witteveen
- One year ago, agents hit the scene
- Autonomous Agent:
- Something that produces a result by itself
- Acting on the world - via tools, search
- Uses a LLM for decision making and reasoning
- Modular
- Supposedly act like humans
- More than just a linear chain
- Agents vs chains
- Put a loop onto chains
- Agents
- Do I have enough information to do what I want to do?
- Chains have a clear starting point and end point and probably do not have loops
- Agents equation
- Output = function( context + instruction/query)
- Can be a call to a tool
- Can be a reasoning step
- Can be a final answer
- How do we deal with the different outputs from the function? (see the sketch after this list)
- output -> controller -> tools
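A minimal sketch of that loop in Python. This is my own reconstruction, not the speaker's code; `call_llm` and `run_tool` are hypothetical stubs:

```python
def call_llm(context: str, query: str) -> dict:
    """Hypothetical stub for one LLM call: output = f(context + instruction/query).
    Returns e.g. {"type": "final_answer", "content": "..."},
    {"type": "tool_call", "name": "search", "args": "..."}, or
    {"type": "reasoning", "content": "..."}."""
    raise NotImplementedError

def run_tool(name: str, args: str) -> str:
    """Hypothetical stub for a tool invocation (search, finance API, etc.)."""
    raise NotImplementedError

def run_agent(query: str, max_steps: int = 10) -> str:
    context = ""
    for _ in range(max_steps):
        output = call_llm(context, query)     # output = f(context + query)
        if output["type"] == "final_answer":  # controller: done
            return output["content"]
        if output["type"] == "tool_call":     # controller: act on the world
            result = run_tool(output["name"], output["args"])
            context += f"\nTool result: {result}"
        else:                                 # controller: reasoning step
            context += f"\nThought: {output['content']}"
    return "Stopped: step limit reached"      # fail-safe: never loop forever
```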
- What do you need for agents to function?
- You need a good LLM to create a good agent
- Last year this time, LLMs were not good enough for agents
- Mistral AI, Gemini, Cohere, OpenAI, Groq and many more LLMs can act as reasoning machines for agents
- Ask a model 10 times and then average the answers out
- Ask multiple models and average their answers out (a majority-vote sketch follows)
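A rough sketch of the "ask the model N times" idea as self-consistency by majority vote. `call_llm` is a hypothetical stub, not from the talk:

```python
from collections import Counter

def call_llm(prompt: str, temperature: float = 0.7) -> str:
    """Hypothetical stub for a single sampled LLM call."""
    raise NotImplementedError

def self_consistent_answer(prompt: str, n: int = 10) -> str:
    """Sample the model n times and return the most common answer."""
    answers = [call_llm(prompt).strip().lower() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```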
- Need a bunch of good tools
- Often need to make custom tools for specific use case
- Finance tools
- Social media tools
- Drug/Chemical look ups
- Other NN as tools
- The agent calls a finetuned BERT rather than using the LLM for sentiment analysis
- Tool Quality
- It is all about data quality from the tools
- How does the tool return the data it gets in a way that helps the LLM?
- How does it handle failures
- The tool should be able to highlight what is important
- Most tools do not fail safe properly; this needs massive improvement (see the sketch after this list)
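One way to make a tool fail safe is to always hand the LLM something it can reason about instead of crashing the run. A sketch of my own, assuming a hypothetical `stock_price` finance tool:

```python
def safe_tool(fn):
    """Wrap a tool so failures come back as structured text the LLM can
    act on, instead of an exception that kills the agent run."""
    def wrapper(*args, **kwargs):
        try:
            result = fn(*args, **kwargs)
            # A good tool highlights what matters rather than
            # dumping a raw payload on the LLM.
            return {"status": "ok", "data": result}
        except Exception as exc:
            return {
                "status": "error",
                "message": f"{fn.__name__} failed: {exc}. "
                           "Consider retrying or using another tool.",
            }
    return wrapper

@safe_tool
def stock_price(ticker: str) -> float:
    """Hypothetical finance tool; replace with a real data source."""
    raise NotImplementedError
```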
- A good agent framework
- Handles low-level calls for you
- Has good prompts that match the LLM: the way you prompt OpenAI and the way you prompt Gemini are different
- How do we deal with this? Maybe translate prompts across LLMs, e.g. rewrite a DALL-E prompt for another LLM
- Do a translation - text to text
- Meta prompt that will write a prompt based on your requirements
- Everyone is used to OpenAI, and hence most LLMs accept prompts similar to OpenAI's
- Can use great tools
- Easy to use: some frameworks are powerful but a pain to use
- Flexibility
- Raptor paper agent framework
- Tracing
- Most agent frameworks suck in this area
- Version control
- Almost no one gives this out of the box
- Agent Components
- Task decomposition
- Plan and execute
- Self Critique
- Sequential vs Hierarchy vs Graph
- Manager and workers
- LLM programs
- ReAct
- Function Calling
- Task Decomposition
- How will you and the LLM break down the desired result?
- How do you turn it into a series of steps that can be executed?
- Function Calling
- Structured responses
- Tool use
- Heavily uses things like Pydantic, JSON and XML (Anthropic) (see the sketch after this list)
- Mostly for proprietary models, but that is changing; it is coming to open-source models too
- Can finetune some open models
- Gemma is built for responsible AI development from the same research and technology used to create Gemini models
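A hedged sketch of structured output via function calling, using Pydantic to generate the JSON schema for the OpenAI tools API. The model name and fields are illustrative, not from the talk:

```python
from openai import OpenAI
from pydantic import BaseModel

class Sentiment(BaseModel):
    label: str          # e.g. "positive" / "negative" / "neutral"
    confidence: float   # 0.0 - 1.0

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Classify: 'Great product!'"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "report_sentiment",
            "description": "Report the sentiment of a piece of text",
            # Pydantic generates the JSON schema for us
            "parameters": Sentiment.model_json_schema(),
        },
    }],
    tool_choice={"type": "function", "function": {"name": "report_sentiment"}},
)
args = response.choices[0].message.tool_calls[0].function.arguments
result = Sentiment.model_validate_json(args)  # parse back into the model
```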
- Planning an agent
- Paper, Pen/Pencil are your friends
- Map out the steps of what you will want
- Map out the decisions that need to be made
- How could you constrain the decision
- Constrain the decision as much as possible
- Prompting for Agents
- Break things down
- Make it simple
- Ask them to check
- Smart Prompts - Prompts that you parse out
- Prompt post processing
- 3 types of memory (a toy layout sketch follows this list)
- Conversational memory
- Persona memory - self-consistency
- Long term memory - Look ups
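A toy sketch (my own, purely illustrative; Python 3.10+) of how the three memory types could be laid out; in practice long-term memory would be a vector store rather than a dict:

```python
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    # Conversational memory: the rolling message history
    conversation: list = field(default_factory=list)
    # Persona memory: fixed facts that keep the agent self-consistent
    persona: dict = field(default_factory=lambda: {
        "name": "HelperBot", "tone": "concise"})  # illustrative values
    # Long-term memory: a store you do look-ups against
    long_term: dict = field(default_factory=dict)

    def remember(self, key: str, fact: str) -> None:
        self.long_term[key] = fact

    def recall(self, key: str) -> str | None:
        return self.long_term.get(key)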
- Release dates
- Auto-GPT, BabyAGI, Camel, Generative Agents, Voyager
- Auto-GPT
- Does not have flexibility
- BabyAGI
- Task Creation
- Task Prioritization
- Minecraft Voyager
- Mini programs
- GPT-Engineer
- Breaking things down
- Preprompts
- Logical steps
- AutoGen
- Made by Microsoft
- Sept 2023 release
- Conversable agent
- Multi agent
- Conversation driven
- User Proxy agents - Can be a human or not a human
- Able to use more tools
- AutoGen Studio
- No-code version of AutoGen (a minimal AutoGen sketch follows this list)
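A minimal two-agent AutoGen sketch, based on the pyautogen API as of early 2024; the LLM config values are illustrative:

```python
from autogen import AssistantAgent, UserProxyAgent

# Illustrative config; fill in a real model name and API key
llm_config = {"config_list": [{"model": "gpt-4", "api_key": "YOUR_KEY"}]}

assistant = AssistantAgent("assistant", llm_config=llm_config)
# The user proxy can stand in for a human, or run fully automated
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",     # "ALWAYS" puts a human in the loop
    code_execution_config=False,  # disable local code execution for safety
)

# Conversation-driven: the agents exchange messages until done
user_proxy.initiate_chat(assistant, message="Compare two agent frameworks.")
```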
- Instructor
- Generating structure
- Not an agent framework
- Guides the LLM response back into the structure you want
- Function calling on steroids
- Pydantic and schema generation
- JSON (a short Instructor sketch follows this list)
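A short Instructor sketch, assuming the instructor library's OpenAI wrapper; the model name and fields are illustrative:

```python
import instructor
from openai import OpenAI
from pydantic import BaseModel

class UserInfo(BaseModel):
    name: str
    age: int

# instructor wraps the client so responses are parsed and validated
# against the Pydantic model ("function calling on steroids")
client = instructor.from_openai(OpenAI())

user = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    response_model=UserInfo,
    messages=[{"role": "user", "content": "John Doe is 30 years old."}],
)
print(user.name, user.age)  # a validated UserInfo instance, not raw JSON
```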
- Crew AI
- Built on LangChain
- Multi Agent
- Mimics an org chart
- Can be sequential or hierarchical
- Tools from Langchain + CrewAI tools
- Heavily anthropomorphic
- role, goal and backstory are the key inputs to a CrewAI agent (see the sketch after this list)
- Each agent has a set of tools that it can call, and it can delegate
- Define the agents, Define the tasks
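A minimal CrewAI sketch showing the role/goal/backstory inputs and the define-agents-then-define-tasks flow; values are illustrative, and an LLM API key is assumed to be configured in the environment:

```python
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Research Analyst",                       # heavily anthropomorphic
    goal="Find recent news on agent frameworks",
    backstory="A meticulous analyst who cites sources.",
    allow_delegation=True,  # agents can delegate to each other
)

task = Task(
    description="Summarize the latest agent framework releases.",
    expected_output="A short bulleted summary.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task])  # sequential by default
result = crew.kickoff()
print(result)
```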
- LangGraph
- Inverse of CrewAI
- Crash course and examples on Sam's YouTube channel
- Built using LCEL
- Stateful - Build a state machine
- Multi actor
- Persist that state and pass it around the graph
- Nodes - building blocks each one represents a function or a computation step
- Edges - Join the steps
- https://python.langchain.com/docs/langgraph
- Work on CrewAI first and then move to LangGraph
- CrewAI is more rigid (a LangGraph sketch follows)
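A small LangGraph sketch of nodes and edges over shared state, assuming the StateGraph API from the LangGraph docs; the node logic is a stub:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    question: str
    answer: str

def answer_node(state: AgentState) -> AgentState:
    # Stub: in practice this step would call an LLM or a tool
    return {"question": state["question"], "answer": "stub answer"}

graph = StateGraph(AgentState)         # stateful: build a state machine
graph.add_node("answer", answer_node)  # nodes = computation steps
graph.set_entry_point("answer")
graph.add_edge("answer", END)          # edges join the steps

app = graph.compile()
print(app.invoke({"question": "What is LangGraph?", "answer": ""}))
```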
- TaskWeaver
- CrewAI, LangGraph and AutoGen
- Think about patterns of agents
- At every step, check with the oracle and then move forward
- Is there a possibility that, if a tool is missing, the LLM writes code that mimics the tool?
LLM Agents - B2C use cases & evaluation - Praveen Govindaraj
- B2C world
- L1 technical support
- L1 Auditor that audits the prompts
- Singtel team using LangGraph to create an agent in the context of customer service
- L2 Technical support
- Using Mistral AI's Mixtral 8x7B-Instruct
- L2 Auditor
- Q&A
- Is the system in production?
- No. It is at a POC stage
- What vector database and what graph database are you using?
- chromadb
- What is the $ value savings or time savings per agent?
- 200k to 300k SGD per month
- How many people are working on it?
- 3 member team
- LLM Data Agents
- Evaluation criteria for agents
- Query translation
- Context
- Groundedness
- Use TruLens to track metrics and evaluate the agents
- Evaluate the costs
- Track all the prompts (a plain-Python tracking sketch follows this list)
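A plain-Python sketch of tracking prompts and cost per call. TruLens would do this for you; this just shows the idea, with made-up pricing:

```python
import time

class PromptTracker:
    """Log every prompt, response, and an estimated cost per call."""

    def __init__(self, usd_per_1k_tokens: float = 0.002):  # made-up rate
        self.rate = usd_per_1k_tokens
        self.records = []

    def log(self, agent: str, prompt: str, response: str, tokens: int) -> None:
        self.records.append({
            "ts": time.time(),
            "agent": agent,
            "prompt": prompt,
            "response": response,
            "tokens": tokens,
            "cost_usd": tokens / 1000 * self.rate,
        })

    def total_cost(self) -> float:
        return sum(r["cost_usd"] for r in self.records)
```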
Using DSPy with Gemini and Gemma - Martin Andrews
- https://github.com/stanfordnlp/dspy
- https://dspy-docs.vercel.app/
- 2B and 7B model sizes
- text and instruct versions
- No details on prompts
- Gemma's code is released under Apache 2.0
- The weights have their own terms of use
- Ollama == local model inference
- Gemini Pro - 1 million token context
- https://www.together.ai/
- https://jupytext.readthedocs.io/en/latest/
- Gemini Prompting guide
- Gemini prompts are different from OpenAI prompts
- DSPy
- Different LLMs want different prompts
- Data generation
- More work needs to be done to tame smaller models
- DSPy is mainly for orchestration (a short sketch follows)
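A tiny DSPy sketch, assuming the dspy-ai API circa early 2024; the LM client and model name are illustrative, and a Gemini or Gemma client can be swapped in:

```python
import dspy

# Configure a language model; the exact client class depends on your provider
lm = dspy.OpenAI(model="gpt-3.5-turbo")  # illustrative; use a Gemini/Gemma client instead if you like
dspy.settings.configure(lm=lm)

# Declare *what* you want ("question -> answer"); DSPy compiles the prompt,
# which is how it papers over "different LLMs want different prompts"
qa = dspy.ChainOfThought("question -> answer")

pred = qa(question="Why do smaller models need more prompt engineering?")
print(pred.answer)
```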