This blog post is a brief summary of the points covered at the Large Language Models meetup hosted at the Google Developer Space.

News You Can Use - Data, Inference and Training

  • Google’s FLAN-T5 2022
    • Fully open source
    • Instruction fine-tuned
    • Available in multiple sizes
    • Strangely underrated
    • SAT reading scores for the pre-trained models
    • Google Blog post
    • Available on Hugging Face (loading sketch below)
    • Googlers say the 11B model is unfairly good
    • Surprisingly powerful models available
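As a quick illustration, here is a minimal sketch of loading one of the FLAN-T5 checkpoints from Hugging Face with the transformers library (the flan-t5-large size here is an arbitrary choice, not from the talk):

```python
# Minimal sketch: load an instruction-tuned FLAN-T5 checkpoint from Hugging Face.
# Swap in google/flan-t5-xxl for the 11B model if you have the memory.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "google/flan-t5-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

prompt = "Answer the following question. Who wrote The Old Man and the Sea?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```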
  • Data Selection
    • Large scale pretraining
    • Corpus fine-tuning
    • Task fine-tuning
  • Two easy fixes:
    • N-gram filter
      • Data Selection for Language Models via Importance Resampling (2023)
      • Basic idea
        • Train on data that is similar to your target data
      • DSIR helps in creating good training samples from random text on the internet (sketch below)
      • 82.2 for the RoBERTa baseline, whereas DSIR reaches 83
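A toy sketch of the DSIR idea: score raw web documents by how much they look like the target corpus under hashed n-gram bag models, then resample. The bucket count, n-gram order and smoothing are illustrative assumptions rather than the paper's exact configuration:

```python
# Toy sketch of DSIR-style data selection with hashed n-gram features.
import hashlib, math, random
from collections import Counter

BUCKETS = 10_000  # illustrative assumption

def buckets(text: str, n: int = 2) -> list[int]:
    toks = text.lower().split()
    grams = toks + [" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    return [int(hashlib.md5(g.encode()).hexdigest(), 16) % BUCKETS for g in grams]

def log_probs(docs: list[str]) -> list[float]:
    counts = Counter(b for d in docs for b in buckets(d))
    total = sum(counts.values())
    # add-one smoothing so unseen buckets don't zero out a document
    return [math.log((counts[b] + 1) / (total + BUCKETS)) for b in range(BUCKETS)]

def dsir_select(raw_docs: list[str], target_docs: list[str], k: int) -> list[str]:
    lp_target, lp_raw = log_probs(target_docs), log_probs(raw_docs)

    def log_weight(doc: str) -> float:
        # log p_target(x) - log p_raw(x) under the bag-of-hashed-ngrams models
        return sum(lp_target[b] - lp_raw[b] for b in buckets(doc))

    def gumbel() -> float:
        return -math.log(-math.log(random.random()))

    # Gumbel-top-k: sample k documents without replacement, proportional to the weights
    keyed = sorted(raw_docs, key=lambda d: log_weight(d) + gumbel(), reverse=True)
    return keyed[:k]
```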
    • Prefix detox
      • Pretraining Language Models with Human Preferences
      • Simple idea
        • Instead of training on plain text
        • Rate the pretraining text
      • Training - use two control tokens during training (one for good text, one for bad)
      • Inference - condition only on the "good" control token (sketch below)
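A minimal sketch of the control-token idea; the token names and the scoring heuristic are illustrative assumptions, not the paper's exact setup:

```python
# Minimal sketch of conditional pretraining with control tokens.
GOOD, BAD = "<|good|>", "<|bad|>"  # illustrative token names

def score_text(text: str) -> float:
    """Toy stand-in for a real preference / toxicity scorer (assumption)."""
    flagged = {"badword1", "badword2"}
    return 0.0 if flagged & set(text.lower().split()) else 1.0

def to_training_example(text: str, threshold: float = 0.5) -> str:
    # Training: rate the pretraining text and prepend the matching control token.
    tag = GOOD if score_text(text) >= threshold else BAD
    return f"{tag} {text}"

def to_inference_prompt(prompt: str) -> str:
    # Inference: always condition on the "good" control token.
    return f"{GOOD} {prompt}"

print(to_training_example("some scraped text with badword1 in it"))   # tagged <|bad|>
print(to_inference_prompt("Write a friendly reply to this email."))   # tagged <|good|>
```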
  • Running Large Models
    • Large models can be difficult to run
    • Low precision works surprisingly well
      • use float16
    • Most practical to least practical
      • 8-bit quantisation
      • LLM.int8() - 8-bit Matrix Multiplication for Transformers at Scale
        • Metric is mean zero-shot accuracy
        • Metrics are within the standard error of the original models
      • FlexGen (4-bit)
        • High-throughput Generative Inference of Large Language Models with a Single GPU
        • Lowers the resource requirements of LLM inference
        • Metric is tokens per second - why is this metric important?
      • 1-bit quantization
        • Binarized Neural Machine Translation
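For the 8-bit case, a minimal sketch of loading a model through the transformers + bitsandbytes integration (the model name is an arbitrary choice, and it assumes a CUDA GPU plus the accelerate and bitsandbytes packages):

```python
# Minimal sketch: load a causal LM with 8-bit weights via bitsandbytes.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-6.7b"  # illustrative choice, not from the talk
tokenizer = AutoTokenizer.from_pretrained(model_name)

# float16 already halves memory; load_in_8bit quantises the weights further.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    load_in_8bit=True,
)

inputs = tokenizer("Large models can be difficult to run because", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```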
  • Training Large Models
    • Gradient Checkpointing
      • Saves around 50 percent of GPU memory (see the combined sketch below)
    • Key idea
      • Don’t train the large model
      • Train a parasite model
    • LoRA - Low-Rank Adaptation of Large Language Models
      • Reduce trainable parameters by 10,000x
      • Reduce GPU memory
      • minLoRA
      • LoRA and Stable Diffusion
      • PEFT library
    • PEFT for Whisper
    • ControlNet
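A minimal sketch that combines gradient checkpointing with LoRA fine-tuning via the PEFT library; the base model and LoRA hyperparameters are illustrative assumptions:

```python
# Minimal sketch: gradient checkpointing + LoRA fine-tuning via PEFT.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")  # illustrative base model

# Gradient checkpointing: recompute activations in the backward pass,
# trading compute for a large reduction in GPU memory.
model.gradient_checkpointing_enable()

# LoRA: freeze the base model and train small low-rank adapter matrices instead.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The last call typically reports well under one percent of the weights as trainable, which is where the big reduction in trainable parameters and optimizer memory comes from.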
  • Privacy
    • Offsite-Tuning - Transfer Learning without Full Model
  • How to make use of big-iron models without having to invest in big iron?

Supercharge ML experiments with PyTorch Lightning - Vivek Kalyan

LangChain

  • open source
  • Large language models (LLMs) are emerging as a transformative technology, enabling developers to build applications that they previously could not. But using these LLMs in isolation is often not enough to create a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge
  • LLMs generate text conditioned on a prompt
  • RLHF
  • LLMs can’t access the traditional software stack
  • An LLM alone is not enough
  • Each call to an LLM is independent
  • Conversation state needs to be passed back in each time
  • LLMs have a finite context window we can pass text into (see the sketch below)
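A minimal sketch of what that means in practice: with a bare LLM you have to pack the whole conversation back into the prompt on every call (the `complete` function below is a hypothetical stand-in for any completion API):

```python
# Minimal sketch: each LLM call is stateless, so the running conversation
# has to be packed back into the prompt every time.
def complete(prompt: str) -> str:
    # Hypothetical stand-in for a real completion endpoint.
    return "(model reply)"

history: list[tuple[str, str]] = []  # (user, assistant) turns

def chat(user_message: str) -> str:
    # Rebuild the full prompt from the conversation so far; the context window
    # is finite, so long conversations eventually need truncation or summarisation.
    transcript = "".join(f"User: {u}\nAssistant: {a}\n" for u, a in history)
    prompt = f"{transcript}User: {user_message}\nAssistant:"
    reply = complete(prompt)
    history.append((user_message, reply))
    return reply

print(chat("What is LoRA?"))
print(chat("And how does it relate to PEFT?"))  # the first turn is re-sent here
```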
  • LangChain is an open source project that builds structure around language models
  • Allows fully featured apps that interact with the software stack
  • Manage LLMs and prompts
  • Integrate with APIs, databases and data sources
  • Supports Python and TypeScript
  • Seven components
    • LLMs
    • Prompt templates
    • Tools
    • Chains
    • Memory
    • Agents
    • Index
  • Everything starts with a prompt
  • Everything in LangChain is based on prompts
  • Need to use prompts
  • Prompt engineering
    • Getting the model to generate text conditioned on some text
    • The prompt that we input has a massive influence on the output
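A minimal sketch of a LangChain prompt template (the template text itself is just an example):

```python
# Minimal sketch: a reusable prompt template in LangChain.
from langchain.prompts import PromptTemplate

template = PromptTemplate(
    input_variables=["product"],
    template="Suggest a good name for a company that makes {product}.",
)

print(template.format(product="eco-friendly water bottles"))
# -> "Suggest a good name for a company that makes eco-friendly water bottles."
```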
  • InstructGPT - Training Language Models to Follow Instructions with Human Feedback
  • Few-shot learning
  • Tools are individual components that LangChain can use in Chains
  • A chain is made up of linked tools
  • Chains
    • Generic
    • Utility
    • Async chains
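A minimal sketch of a generic chain, wrapping an LLM and a prompt template in an LLMChain (it assumes an OpenAI API key in the environment; any LangChain LLM wrapper would work):

```python
# Minimal sketch: the simplest generic chain - prompt template + LLM.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

llm = OpenAI(temperature=0.7)
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Write a one-sentence summary of {topic}.",
)

chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(topic="the LoRA fine-tuning technique"))
```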
  • Fact extraction on a TechCrunch article
  • Generate knowledge graph triples from the extracted facts
  • PALChain
    • Give it a natural-language question
    • It turns the question into code
    • The code, when run, returns the answer
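A minimal PALChain (program-aided language model) sketch; the example question is made up, and depending on the LangChain version PALChain may live in langchain_experimental instead:

```python
# Minimal sketch: PAL - the LLM writes a small Python program
# whose execution produces the final answer.
from langchain.llms import OpenAI
from langchain.chains import PALChain

llm = OpenAI(temperature=0)
pal_chain = PALChain.from_math_prompt(llm, verbose=True)

question = (
    "Jan has three times the number of pets as Marcia. "
    "Marcia has two more pets than Cindy. "
    "If Cindy has four pets, how many pets do the three have in total?"
)
print(pal_chain.run(question))
```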
  • Model generates URL