Orchestrating LLMs and Agents Meetup
Had a chance to attend a meetup on Orchestrating LLMs and Agents. The following are some of the points that I took note in the three sessions that were part of 2 hour long presentation and demos:
What Agentic Frameworks should you use to get results fast? - Sam Witteveen
- One year ago, the agents hit the scene
- Autonomous Agent:
- Something that produces a result using by itself
- Acting on the world - via tools, search
- Uses a LLM for decision making and reasoning
- Modular
- Supposedly act like humans
- More than just a linear chain
 
- Agents vs chains
- Put a loop on to chains
- Agents
- Do I have enough information to do what I want to do ?
 
- Chains have a clear starting point and end point and probably do not have loops
 
- Agents equation
- Output = function( context + instruction/query)
- Can be a call to a tool
- Can be a reasoning step
- Can be a final answer
 
- How do deal with different outputs from the function
- output -> controller -> tools
 
- Output = function( context + instruction/query)
- What do you need for agents to function ?
- You need a good LLM to create a good agent
- Last year this time, LLMs were not good enough for agents
- Mistral AI, Gemini, Cohere, OpenAI, groq and many more LLM models that can act
as reasoning machines for agents
- Ask a model 10 times and then average it out
- Ask multiple models and average it out
 
- Need a bunch of good tools
- Often need to make custom tools for specific use case
- Finance tools
- Social media tools
- Drug/Chemical look ups
- Other NN as tools
- Agent calls a finetuned BERT rather than use the LLM for sentiment analysis
 
 
- Tool Quality
- It is all about data quality from the tools
- How does the tool return the data it gets to help the LLM
- How does it handle failures
- The tool should be able to highlight what is important
- Most of the tools do not fail safe properly. This needs massive improvement
 
- A good agent framework
- Handle low level calls for you
- Have a good prompts that match the LLM: The way you prompt OpenAI and the
way you prompt Gemini is different
- How do we deal with this ? May be translate prompts across LLMs. Rewrite a Dalle prompts to another LLM
- Do a translation - text to text
- Meta prompt that will write a prompt based on your requirements
- Everyone is used to OpenAI and hence most of the LLMs use similar prompts like OpenAI
 
- Can use great tools
- Easy to use: Some frameworks are powerful but pain to use
- Flexibility
- Raptor paper agent framework
 
- Tracing
- Most of the agents suck in this area
 
- Version control
- Almost no one give this out of box
 
 
- Agent Components
- Task decomposition
- Plan and execute
- Self Critique
- Sequential vs Hierarchy vs Graph
- Manager and workers
- LLM programs
- ReACT
- Function Calling
 
- Task Decomposition
- How will you and LLM breakdown the result
- How do yo turn into a series of steps that can be executed
 
- Function Calling
- Structured responses
- Tool use
- Heavily use things like Pydantic, JSON and XML(Anthropic)
- Mostly for Prop models but it is changing. But it is coming on open source models
- Can finetune some open models
- Gemma is built for responsible AI development from the same research and technology used to create Gemini models
 
- Planning an agent
- Paper, Pen/Pencil are your friends
- Map out the steps of what you will want
- Map out the decisions that need to be made
- How could you constrain the decision
- Constrain the decision as much as possible
 
- Prompting for Agents
- Break things down
- Make it simple
- Ask them to check
- Smart Prompts - Prompts that you parse out
- Prompt post processing
 
- 3 types of memory
- Conversational memory
- Persona memory - self consistency
- Long term memory - Look ups
 
- Release dates
- Auto-GPT, BabyAGI, Camel, Generative Agents, Voyager
- Auto-GPT
- Does not have flexibility
 
- BabyAGI
- Task Creation
- Task Prioritization
 
- Minecraft Voyager
- Mini programs
 
- Engineer-GPT
- Breaking things down
- Preprompts
- Logical steps
 
 
- Auto Gen
- Made by MSFT
- Sept 2023 release
- Conversable agent
- Multi agent
- Conversation driven
- User Proxy agents - Can be a human or not a human
- Able to use more tools
 
- Auto Gen Studio
- No code version of Auto Gen
 
- Instructor
- Generating structure
- Not an agent framework
- Guide the LLM response back the way
- Function calling on steroids
- Pydantic and Schema generation
- JSON
 
- Crew AI
- Built on LangChain
- Multi Agent
- Mimics a org chart
- Can be sequential or hierarchical
- Tools from Langchain + CrewAI tools
- Heavily anthropomorphic
- role, goal and backstory as key input to CrewAI agent
- Each agent has a set of tools that it can call. It can delegate
- Define the agents, Define the tasks
 
- Lang Graph
- Inverse of CrewAI
- Crash course and example - Sam’s youtube channel
- Built using LCEL
- Stateful - Build a state machine
- Multi actor
- Persist that state and pass it around the graph
- Nodes - building blocks each one represents a function or a computation step
- Edges - Join the steps
- https://python.langchain.com/docs/langgraph
 
- Work on CrewAI and then work on Lang Graph
- CrewAI is rigid
- Task Weaver
- CrewAI, Lang Graph and Auto Gen
- Think about patterns of agents
- Every step we check the oracle and then move forward
- Is there a possibility if a tool is missing, LLM writes the code that mimics the tool
LLM Agents - B2C use cases & evaluation - Praveen Govindaraj
- B2C world
- L1 technical support
- L1 Auditor that audits the prompts
- Singtel team using Lang Graph to create an agent in the context of customer service
- L2 Technical support
- Using Mistral AI 8x7b-instruct
- L2 Auditor
- Q&A
- Is the system in production ?
- No. It is at a POC stage
 
- What vector database are using and what graph database are you using ?
- chromadb
 
- What is the $ value savings or time savings per agent ?
- 200k to 300k SGD per month
 
- How many people are working ?
- 3 member team
 
 
- Is the system in production ?
- LLM Data Agents
- Evaluation criteria for agents
- Query translation
- Context
- Groundedness
 
- Use trulensto track metrics- evaluate the agents
- evaluate the costs
 
- Track all the prompts
Using DSPy with Gemini and Gemma - Martin Andrews
- https://github.com/stanfordnlp/dspy
- https://dspy-docs.vercel.app/
- 2B and 7B model sizes
- text and instruct version
- No details on prompts
- Gemma is released under Apache 2.0
- Weights have terms of use
- Ollama == local model inference
- Gemini Pro - 1 million token context
- https://www.together.ai/
- https://jupytext.readthedocs.io/en/latest/
- Gemini Prompting guide
- Prompts are different from OpenAI prompting
 
- DSPy
- Different LLMs want different prompts
 
- Data generation
- More work needs to be done to tame smaller models
- DSPy is mainly for orchestration