Large Language Models - Meetup
This blog post is a brief summary of the points discussed at the Large Language Models meetup hosted at the Google Developer Space.
News You Can Use - Data, Inference and Training
- Google’s FLAN-T5 2022
- Fully open source
- Instruction fine-tuned
- Available in multiple sizes
- Strangely under-rated
- SAT reading scores - Pre-trained models
- Google Blog post
- Available on Hugging Face
- Googlers say the 11B one is unfairly good
- Surprisingly powerful models available
- Data Selection
- Large scale pretraining
- Corpus fine-tuning
- Task fine-tuning
- Two easy fixes:
- N-gram filter
- Data Selection for Language Models via Importance Resampling (2023)
- Basic idea
- Train on data that is similar to the target data
- DSIR helps create a good training sample from random text on the internet (sketch below)
- Reported numbers: 82.2 for the RoBERTa baseline versus 83 with DSIR
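A rough sketch of the importance-resampling idea, for intuition only: the paper uses hashed n-gram features, while the snippet below simplifies to hashed unigrams, and the function names are made up for illustration.

```python
# Illustrative sketch of DSIR-style data selection: score raw documents by how much
# more likely they are under the target distribution than under the raw distribution,
# then resample. Hashed-unigram features are a simplification of the paper's n-grams.
import numpy as np

def hashed_unigram_logprobs(texts, vocab_size=10_000):
    """Estimate a smoothed hashed bag-of-words distribution for a corpus."""
    counts = np.ones(vocab_size)  # add-one smoothing
    for text in texts:
        for tok in text.lower().split():
            counts[hash(tok) % vocab_size] += 1
    return np.log(counts / counts.sum())

def dsir_select(raw_texts, target_texts, k, vocab_size=10_000, seed=0):
    """Resample k raw documents with probability proportional to p_target / p_raw."""
    lp_target = hashed_unigram_logprobs(target_texts, vocab_size)
    lp_raw = hashed_unigram_logprobs(raw_texts, vocab_size)
    log_w = np.array([
        sum(lp_target[hash(t) % vocab_size] - lp_raw[hash(t) % vocab_size]
            for t in text.lower().split())
        for text in raw_texts
    ])
    probs = np.exp(log_w - log_w.max())
    probs /= probs.sum()
    rng = np.random.default_rng(seed)
    picked = rng.choice(len(raw_texts), size=k, replace=False, p=probs)
    return [raw_texts[i] for i in picked]
```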
- Prefix detox
- Pretraining Language Models with Human Preferences
- Simple idea
- Instead of training on plain text as-is
- Rate the pretraining text
- Training - prepend one of two control tokens to every document
- Inference - condition only on the "good" control token (sketch below)
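A minimal sketch of the control-token idea. The token names, the quality-scoring scheme, and the 0.5 threshold are my own illustration, not the paper's exact setup.

```python
# Conditional pretraining sketch: tag each training document with a control token
# based on a (hypothetical) quality/toxicity score, then condition on the "good"
# token at inference time.
GOOD, BAD = "<|good|>", "<|bad|>"

def tag_document(text: str, quality_score: float, threshold: float = 0.5) -> str:
    """Prepend a control token chosen from the rater/classifier score."""
    return f"{GOOD if quality_score >= threshold else BAD} {text}"

scored_documents = [
    ("The museum reopens next week with a new exhibit.", 0.92),
    ("an unpleasant rant scraped from a forum ...", 0.08),
]

# Training: the model sees both tokens, so it learns what each kind of text looks like.
train_corpus = [tag_document(doc, score) for doc, score in scored_documents]

# Inference: prompt only with the "good" token.
prompt = f"{GOOD} Once upon a time"
```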
- Running Large Models
- Large models can be difficult to run
- Low precision works surprisingly well
- Use float16 (see the loading sketch below)
- Most practical to least practical
- 8-bit quantisation
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
- Metric: mean zero-shot accuracy
- Metrics are within the standard error of the original models
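As a concrete example of the float16 and 8-bit options above, here is a loading sketch with Hugging Face transformers. It assumes accelerate and bitsandbytes are installed, and the exact keyword arguments may differ between library versions.

```python
# Sketch: loading FLAN-T5 in float16 vs. 8-bit (LLM.int8) with transformers.
# Requires `accelerate` for device_map="auto" and `bitsandbytes` for 8-bit;
# newer transformers versions may prefer a BitsAndBytesConfig over load_in_8bit.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

name = "google/flan-t5-xl"
tokenizer = AutoTokenizer.from_pretrained(name)

# float16: roughly halves memory relative to float32
model_fp16 = AutoModelForSeq2SeqLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto"
)

# LLM.int8(): 8-bit weights, zero-shot metrics stay within standard error of the original
model_int8 = AutoModelForSeq2SeqLM.from_pretrained(
    name, load_in_8bit=True, device_map="auto"
)

inputs = tokenizer("Translate to German: Hello, world!", return_tensors="pt").to(model_int8.device)
out = model_int8.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```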
- FlexGen (4-bit)
- High-throughput Generative Inference of Large Language Models with a Single GPU
- Lowers the requirements for LLM inference
- Metric is tokens per second - why is this metric important?
- 1-bit quantisation
- Binarized Neural Machine Translation
- Training Large Models
- Gradient Checkpointing
- Saves about 50 percent of GPU memory (sketch below)
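A quick sketch of turning gradient checkpointing on for a Hugging Face model; plain PyTorch code can use torch.utils.checkpoint directly. The model choice is illustrative.

```python
# Gradient checkpointing: drop intermediate activations and recompute them in the
# backward pass, trading extra compute for a large cut in activation memory.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")
model.gradient_checkpointing_enable()
model.train()
# ...train as usual; expect lower memory use at the cost of a slower backward pass
```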
- Key idea
- Don't train the large model itself
- Train a small "parasite" model (an adapter) attached to the frozen large model
- LoRA - Low-Rank Adaptation of Large Language Models (PEFT sketch after this block)
- Reduces trainable parameters by 10,000x
- Reduces GPU memory requirements
- minLoRA
- LoRA and Stable Diffusion
- PEFT library
- PEFT for Whisper
- ControlNet
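A minimal LoRA sketch with the PEFT library; the base model and hyperparameters are illustrative choices, not the ones shown at the meetup.

```python
# LoRA via PEFT: freeze the base model and train small low-rank adapter matrices.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,              # rank of the low-rank update
    lora_alpha=16,    # scaling factor
    lora_dropout=0.05,
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```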
- Privacy
- Offsite-Tuning: Transfer Learning without Full Model
- How to make use of big-iron models without having to invest in big iron?
Supercharge ML experiments with PyTorch Lightning - Vivek Kalyan
- Handshakes - a startup from Singapore
- https://slides.com/vivekkalyan/pytorch-lightning
LangChain
- open source
- Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. But using these LLMs in isolation is often not enough to create a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge.
- LLMs work by conditional generation: they generate text conditioned on a prompt
- RLHF
- LLMs can’t access the traditional software stack
- LLM alone is not enough
- Each call to an LLM is independent
- Conversation state needs to be passed back in each time (handled by the Memory component; sketch further below)
- LLMs have a finite context window we can pass input into
- LangChain is an open-source project that builds structure around language models
- Allows fully featured apps that interact with the software stack
- Manage LLMs and prompts
- Integrate with APIs, databases and data sources
- Supports Python and TypeScript
- Seven components
- LLMs
- Prompt templates
- Tools
- Chains
- Memory
- Agents
- Index
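Since each LLM call is stateless, the Memory component is what carries conversation state back into every call. A small sketch follows; module paths may have moved between LangChain versions, and an OPENAI_API_KEY is assumed to be set.

```python
# ConversationChain + ConversationBufferMemory: the chain replays prior turns into
# every new prompt, so the stateless LLM appears to "remember" the conversation.
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm = OpenAI(temperature=0)  # assumes OPENAI_API_KEY is set in the environment
conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

conversation.predict(input="Hi, my name is Priya.")
print(conversation.predict(input="What is my name?"))  # earlier turn supplied by memory
```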
- Everything starts with a prompt
- Everything in LangChain is based on prompts
- Need to use prompts
- Prompt engineering
- Getting the model to generate text conditioned on some text
- The prompt that we input has a massive influence on the output
- InstructGPT - Training Language Models to Follow Instructions with Human Feedback
- Few-shot learning (prompt-template sketch below)
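A minimal prompt-template-plus-LLM chain, roughly following the LangChain quickstart of the time; class paths may differ in newer releases.

```python
# The simplest LangChain building block: a PromptTemplate fills variables into a
# prompt, and an LLMChain sends the rendered prompt to the model.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)
chain = LLMChain(llm=OpenAI(temperature=0.7), prompt=prompt)
print(chain.run("colorful socks"))
```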
- Tools are individual components that LangChain can use in Chains
- A chain is made up of linked tools (see the sequential-chain sketch below)
- Chains
- Generic
- Utility
- Async chains
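A sketch of composing links into a chain with SimpleSequentialChain, where each step's output becomes the next step's input; the prompts here are illustrative.

```python
# Two LLM steps chained: the generated company name from step one becomes the
# input of the slogan prompt in step two.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain, SimpleSequentialChain

llm = OpenAI(temperature=0.7)
name_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template(
        "Suggest one company name for a business that makes {product}."
    ),
)
slogan_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate.from_template(
        "Write a one-line slogan for the company {company}."
    ),
)
overall = SimpleSequentialChain(chains=[name_chain, slogan_chain], verbose=True)
print(overall.run("colorful socks"))
```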
- Fact extraction on a TechCrunch article
- Generate knowledge-graph triples from the extracted facts
- PALChain (program-aided language models)
- Tell me the answer to a natural-language question
- Turn it into code
- The code, when run, returns the answer (sketch below)
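A PALChain sketch along the lines described above; `from_math_prompt` matches the LangChain API at the time of the meetup, though the class has since moved between packages.

```python
# PAL (program-aided language models): the LLM writes a short Python program for the
# word problem, and executing that program produces the final answer.
from langchain.llms import OpenAI
from langchain.chains import PALChain

llm = OpenAI(temperature=0, max_tokens=512)
pal_chain = PALChain.from_math_prompt(llm, verbose=True)

question = (
    "Jan has three times the number of pets as Marcia. Marcia has two more pets "
    "than Cindy. If Cindy has four pets, how many pets do the three have in total?"
)
print(pal_chain.run(question))
```

Since the chain executes model-generated Python, it should only be run in a trusted or sandboxed environment.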
- Model generates URL