Pakistan's First Oracle Blog

Blog By Fahd Mirza Chughtai

AWS PartyRock - Amazon Bedrock AI Playground

Thu, 2023-11-16 16:02

With PartyRock, you can easily build AI apps in seconds, for free, using the latest LLMs and without writing any code.

Categories: DBA Blogs

Beginner Tutorial to Fine-Tune an AI Model

Thu, 2023-10-26 01:29

This video walks through an easy tutorial to fine-tune a model on a custom dataset from scratch using LlamaIndex and Gradient.

Dataset Used:

{"inputs": "<s>### Instruction:\nWho is Fahd Mirza?\n\n### Response:\nFahd Mirza is an AI Cloud Engineer based in Sydney Australia. He has also got a background in databases and devops plus infrastructure.</s>"}

{"inputs": "<s>### Instruction:\nWhat are hobbies of Fahd Mirza?\n\n### Response:\nFahd Mirza loves to spend time on his youtube channel and reading about technology.</s>"}

{"inputs": "<s>### Instruction:\nWhat is Fahd Mirza's favorite Color?\n\n### Response:\nFahd Mirza's favorite color varies from time to time. These days it's blue.</s>"}

{"inputs": "<s>### Instruction:\nWhat does Fahd Mirza look like?\n\n### Response:\nFahd Mirza looks like a human.</s>"}
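The samples above are in JSON Lines format: one JSON object per line, with the whole prompt/response pair stored in an "inputs" field. A small sketch of generating such a file from question/answer pairs follows; the helper names and the data.jsonl file name are hypothetical, not part of LlamaIndex or Gradient.

```python
import json


def to_record(question: str, answer: str) -> dict:
    """Format one Q/A pair in the <s>### Instruction ... ### Response ...</s> layout."""
    text = f"<s>### Instruction:\n{question}\n\n### Response:\n{answer}</s>"
    return {"inputs": text}


def to_jsonl_lines(pairs):
    """Serialize each record on its own line; json.dumps escapes newlines as \\n."""
    return [json.dumps(to_record(q, a)) for q, a in pairs]


pairs = [
    ("What does Fahd Mirza look like?", "Fahd Mirza looks like a human."),
]
lines = to_jsonl_lines(pairs)

# Hypothetical output path; point your fine-tuning code at this file.
with open("data.jsonl", "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")

print(lines[0])
```

Building the text with one f-string keeps every sample in exactly the same instruction/response layout, which matters because the prompts used at inference time should match it.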

.env File:



Commands Used:

!pip install llama-index gradientai -q

!pip install python-dotenv 

import os

from dotenv import load_dotenv, find_dotenv

_= load_dotenv(find_dotenv())

questions = [

    "Who is Fahd Mirza?",

    "What is Fahd Mirza's favorite Color?",

    "What are hobbies of Fahd Mirza?",

]

prompts = list(

    # same layout as the training samples
    f"<s>### Instruction:\n{q}\n\n### Response:\n" for q in questions

)


from llama_index.llms import GradientBaseModelLLM

from llama_index.finetuning.gradient.base import GradientFinetuneEngine

base_model_slug = "nous-hermes2"

base_model_llm = GradientBaseModelLLM(

    base_model_slug=base_model_slug, max_tokens=100

)

base_model_responses = list(base_model_llm.complete(p).text for p in prompts)

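finetune_engine = GradientFinetuneEngine(

    base_model_slug=base_model_slug,

    name="my test finetune engine model adapter",

    data_path="<path to your dataset .jsonl file>",  # placeholder for the file holding the samples above

)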

epochs = 2

for i in range(epochs):

    finetune_engine.finetune()  # each call fine-tunes the adapter on the dataset file

fine_tuned_model = finetune_engine.get_finetuned_model(max_tokens=100)

fine_tuned_model_responses = list(

    fine_tuned_model.complete(p).text for p in prompts

)


for i, q in enumerate(questions):

    print(f"Question: {q}")

    print(f"Base: {base_model_responses[i]}")

    print(f"Fine tuned: {fine_tuned_model_responses[i]}")


Categories: DBA Blogs

Setting Environment Variable in Google Colab

Wed, 2023-10-25 23:29

This video shows how to set environment variables and load them in a Google Colab notebook, AWS SageMaker notebook, or Jupyter notebook.

Commands Used:

import os
from dotenv import load_dotenv, find_dotenv
_= load_dotenv(find_dotenv())
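Under the hood, load_dotenv reads KEY=VALUE lines from a .env file into os.environ. The sketch below is a simplified stand-in written for illustration only (it is not python-dotenv's implementation and ignores features such as variable expansion and multi-line values); like load_dotenv's default, it does not override variables that are already set. The variable names are hypothetical.

```python
import os
import tempfile


def load_env_file(path: str, override: bool = False) -> dict:
    """Simplified .env loader: KEY=VALUE lines, '#' comments, optional quotes."""
    loaded = {}
    with open(path, encoding="utf-8") as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments and malformed lines
            if line.startswith("export "):
                line = line[len("export "):]
            key, value = (part.strip() for part in line.split("=", 1))
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]  # drop matching surrounding quotes
            loaded[key] = value
            if override or key not in os.environ:
                os.environ[key] = value  # keep existing values unless override=True
    return loaded


with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as f:
        f.write('# hypothetical example values\n'
                'EXAMPLE_SKETCH_KEY="abc123"\n'
                'export EXAMPLE_SKETCH_REGION=us-east-1\n')
    values = load_env_file(env_path)

print(values)
```

Keeping secrets in a .env file that is loaded at runtime means the notebook itself can be shared without exposing keys.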

Categories: DBA Blogs

Step by Step Mistral 7B Installation Local on Linux Windows or in Cloud

Thu, 2023-10-19 22:25

This is a detailed tutorial on how to install the Mistral 7B model locally, in AWS, on Linux, on Windows, or anywhere you like.

Commands Used:

pip3 install optimum

pip3 install git+

git clone

cd AutoGPTQ

git checkout v0.4.2

pip3 install .

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/SlimOpenOrca-Mistral-7B-GPTQ"

# To use a different branch, change revision

# For example: revision="gptq-4bit-32g-actorder_True"

model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

system_message = "You are an expert at bathroom renovations."

prompt = """

Renovate the following old bathroom:

I have a 25 year old house with an old bathroom. I want to renovate it completely. 

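Think about it step by step, and give me steps to renovate the bathroom. Also give me cost of every step in Australian dollars.

"""

# Assumption: this model expects the ChatML prompt format
prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''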

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()

output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)

print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

print("*** Pipeline:")

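pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40
)

print(pipe(prompt_template)[0]['generated_text'])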

Categories: DBA Blogs

Step by Step Fine-Tuning Mistral 7B with Custom Dataset

Sun, 2023-10-15 23:18

Large language models are trained on huge amounts of data. The Falcon 40B model, for example, has 40 billion parameters and was trained on 1 trillion tokens. This training took around two months on 384 GPUs on AWS.
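A back-of-the-envelope check of those figures, using the common rule of thumb that training compute is roughly 6 × parameters × tokens FLOPs. Treating "around two months" as 60 days is an assumption, and the per-GPU throughput it yields is an implied average, not a reported number.

```python
params = 40e9              # 40 billion parameters
tokens = 1e12              # 1 trillion training tokens
gpus = 384
seconds = 60 * 24 * 3600   # assumption: "around two months" taken as 60 days

# Rule of thumb: ~6 FLOPs per parameter per training token (forward + backward pass)
total_flops = 6 * params * tokens
flops_per_gpu_per_s = total_flops / (gpus * seconds)

print(f"Total training compute: ~{total_flops:.1e} FLOPs")
print(f"Implied average throughput: ~{flops_per_gpu_per_s / 1e12:.0f} TFLOP/s per GPU")
```

Numbers of this size are why full pre-training or full fine-tuning is out of reach for most individuals, and why parameter-efficient methods are attractive.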

If you want to use these LLMs with your own data, you need to adapt, or fine-tune, them. Fine-tuning a model larger than 10B parameters is an expensive and time-consuming task.

This is where Hugging Face's PEFT library comes in handy. PEFT stands for Parameter-Efficient Fine-Tuning. We can use a fine-tuning technique called QLoRA to train LLMs on our own dataset in far less time and with far fewer resources. QLoRA stands for Quantized Low-Rank Adaptation; it lets us train only a small number of added parameters without losing much performance. After training is completed, there is no need to save the entire model: the base model remains frozen, so only the small adapter weights have to be saved.
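To see why training only the adapters is so much cheaper: LoRA represents the update to a d_out × d_in weight matrix with two small matrices, A (r × d_in) and B (d_out × r). The sketch counts parameters for a square 4096 × 4096 projection (the size of Mistral 7B's q_proj); the rank r = 16 is a hypothetical choice, not a value from this post.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight: A (r x d_in) + B (d_out x r)."""
    return r * d_in + d_out * r


d = 4096   # hidden size; Mistral 7B's q_proj is 4096 x 4096
r = 16     # hypothetical rank, not taken from this post

full = d * d
lora = lora_param_count(d, d, r)
print(f"Full matrix: {full:,} parameters")
print(f"LoRA adapter: {lora:,} parameters ({100 * lora / full:.2f}% of the matrix)")
```

The adapter grows linearly with r while the full matrix grows with d², so even generous ranks stay a small fraction of the original weights.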

Python Package Installation:


We begin by installing all the required dependencies. 

- The Hugging Face Transformer Reinforcement Learning (TRL) library simplifies Reinforcement Learning from Human Feedback (RLHF) setups; it also provides the SFTTrainer used later in this post.

- Transformers is a Python library that makes downloading and training state-of-the-art ML models easy.

- Accelerate is a library that enables the same PyTorch code to be run across any distributed configuration by adding just four lines of code.

- Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters.

- Datasets is a library for easily accessing and sharing datasets for Audio, Computer Vision, and Natural Language Processing (NLP) tasks.

- Bitsandbytes is a lightweight wrapper around CUDA custom functions, in particular 8-bit optimizers and quantization functions.

- einops (Einstein-inspired notation for operations) is an open-source Python library for writing tensor operations in a more readable and reliable way.

- Tiktoken is an open-source tokenizer developed by OpenAI. Tokenization splits a text string into a list of tokens, which can be characters, subwords, or whole words.

- wandb (Weights & Biases) lets you track, compare, explain, and reproduce machine learning experiments.

- xFormers is a PyTorch-based library of flexible, optimized Transformer building blocks.

- SentencePiece is an unsupervised text tokenizer and detokenizer, mainly for neural-network-based text generation systems where the vocabulary size is predetermined before the neural model is trained.

!pip install -q trl transformers accelerate peft datasets bitsandbytes einops tiktoken wandb xformers sentencepiece

Prepare Dataset:


I will be using the Gath_baize dataset, comprising approximately 210k prompts, to train Mistral-7B. The dataset consists of a mixture of data from Alpaca, Stack Overflow, medical, and Quora datasets. In the load_dataset call we load the full train split, since we are going to use this dataset for training; if we were only testing, we would use split="test" instead. The shuffle/select step then keeps a random sample of 50 examples for a quick run.

from datasets import load_dataset

gathbaize = load_dataset("gathnex/Gath_baize",split="train")



gathbaize_sampled = gathbaize.shuffle(seed=42).select(range(50))


Check for GPU:


The NVIDIA System Management Interface (nvidia-smi) is a command line utility, based on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices.

!nvidia-smi

Create LLM Model:


- Torch (PyTorch) is an open-source ML library used for creating deep neural networks.

- AutoModelForCausalLM is used for auto-regressive (causal) language models, which predict the next token based on the previous tokens.

- A tokenizer is responsible for preprocessing text into an array of numbers as inputs to a model.

- The bitsandbytes library simplifies the process of model quantization, making it more accessible and user-friendly.

import torch

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

from peft import prepare_model_for_kbit_training   # prepares the quantized model for fine-tuning

model_name = "ybelkada/Mistral-7B-v0.1-bf16-sharded"

- BitsAndBytesConfig holds the quantization configuration for QLoRA. QLoRA reduces the memory usage of LLM fine-tuning without sacrificing performance compared to standard 16-bit fine-tuning. It uses 4-bit quantization to compress a pretrained language model; the LM parameters are then frozen, and a relatively small number of trainable parameters are added to the model in the form of Low-Rank Adapters. During fine-tuning, QLoRA backpropagates gradients through the frozen 4-bit quantized model into the Low-Rank Adapters, so the LoRA layers are the only parameters updated during training.

- The basic way to load a model in 4-bit is to pass the argument load_in_4bit=True.

- There are different variants of 4-bit quantization, such as NF4 (4-bit NormalFloat) or pure FP4. NF4 generally gives better results, which is why it is used here.

- You can change the compute dtype of the quantized model by changing the bnb_4bit_compute_dtype argument. A dtype (data type) object describes how the bytes in the fixed-size block of memory corresponding to an array item should be interpreted.

- bnb_4bit_use_double_quant quantizes the quantization constants from the first quantization a second time, saving an additional ~0.4 bits per parameter. It is left disabled in this configuration.
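The core idea of 4-bit quantization is to store each weight as a small integer code plus one shared scale per block. The sketch below uses plain absmax rounding to signed 4-bit integers; it is a simplification and not NF4, which places its 16 levels at quantiles of a normal distribution. The weights are made-up values, and the memory figures cover weights only.

```python
def quantize_block(values, bits=4):
    """Absmax quantization of one block to signed integers in [-qmax, qmax]."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for signed 4-bit
    absmax = max(abs(v) for v in values) or 1.0
    scale = absmax / qmax                      # one scale stored per block
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize_block(codes, scale):
    """Recover approximate weights from codes and the block scale."""
    return [c * scale for c in codes]


weights = [0.12, -0.5, 0.33, 0.07, -0.21, 0.49, -0.02, 0.0]   # made-up block
codes, scale = quantize_block(weights)
restored = dequantize_block(codes, scale)
print(codes)
print([round(v, 3) for v in restored])

# Memory arithmetic for a 7B-parameter model (weights only, scales excluded).
params = 7e9
print(f"~{params * 4 / 8 / 1e9:.1f} GB at 4-bit vs ~{params * 16 / 8 / 1e9:.1f} GB at 16-bit")
```

The rounding error per weight is at most half a quantization step, which is why smaller blocks (and therefore more scales) give better accuracy at the cost of a little extra memory.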

bnb_config = BitsAndBytesConfig(

    load_in_4bit= True,

    bnb_4bit_quant_type= "nf4",

    bnb_4bit_compute_dtype= torch.bfloat16,

    bnb_4bit_use_double_quant= False,

)

- trust_remote_code controls whether to allow custom models defined on the Hub in their own modeling files.

model = AutoModelForCausalLM.from_pretrained(

    model_name,

    quantization_config=bnb_config,

    device_map="auto",

    trust_remote_code=True,

)

- The key/value cache (use_cache) only speeds up text generation; it is not needed during training and is incompatible with gradient checkpointing, so it is disabled for fine-tuning.

- Setting config.pretraining_tp to a value different than 1 activates a more accurate but slower computation of the linear layers; 1 keeps the standard computation.

- Gradient checkpointing trades extra computation for lower memory use. It is mainly needed when training would otherwise hit out-of-memory (OOM) errors, which makes enabling it a common best practice; prepare_model_for_kbit_training enables it by default.

model.config.use_cache = False

model.config.pretraining_tp = 1


model = prepare_model_for_kbit_training(model)

Create LLM Tokenizer:


- pad_token is a special token used to make arrays of tokens the same size for batching purposes.

- eos_token is a special token marking the end of a sequence.

- bos_token is a special token marking the beginning of a sequence.

tokenizer = AutoTokenizer.from_pretrained(model_name,trust_remote_code=True)

tokenizer.pad_token = tokenizer.eos_token

tokenizer.add_eos_token = True

tokenizer.add_bos_token, tokenizer.add_eos_token

from peft import LoraConfig, TaskType

- LoraConfig allows you to control how LoRA is applied to the base model through the following parameters:

lora_alpha: LoRA scaling factor; the low-rank update is scaled by lora_alpha / r.

r: the rank of the update matrices, expressed as an int. Lower rank results in smaller update matrices with fewer trainable parameters.

bias: specifies whether the bias parameters should be trained. Can be 'none', 'all', or 'lora_only'.

target_modules: the modules (for example, attention blocks) to which the LoRA update matrices are applied.

lora_dropout: the dropout probability applied in the LoRA layers during training, used to help prevent overfitting.
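A minimal sketch of the forward pass LoRA adds to a frozen layer: h = W x + (lora_alpha / r) · B (A x). It uses plain Python lists and hypothetical sizes rather than PEFT's code. PEFT initializes B to zero by default, so the adapted layer initially returns exactly the base layer's output, and training only has to learn A and B.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x, lora_alpha, r):
    """h = W x + (lora_alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(W, x)                  # frozen pretrained path
    update = matvec(B, matvec(A, x))     # low-rank path: (d_out x r) @ (r x d_in) @ x
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, update)]


d_in, d_out, r, lora_alpha = 6, 4, 2, 16      # hypothetical sizes
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]  # small random init
B = [[0.0] * r for _ in range(d_out)]                               # zero init

x = [rng.gauss(0, 1) for _ in range(d_in)]
# With B all zeros the output equals the base layer's output.
print(lora_forward(W, A, B, x, lora_alpha, r) == matvec(W, x))
```

Because the scale is lora_alpha / r, changing r while keeping lora_alpha fixed also changes how strongly the update is applied.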

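peft_config = LoraConfig(

    # set lora_alpha, lora_dropout, r and bias here as needed (PEFT defaults apply otherwise)

    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj"],

    task_type=TaskType.CAUSAL_LM,

)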

from peft import get_peft_model

model = get_peft_model(model,peft_config)

from transformers import TrainingArguments

- num_train_epochs (`float`, *optional*, defaults to 3.0): total number of training epochs to perform.

- per_device_train_batch_size: the batch size per GPU/TPU core/CPU for training.

- gradient_accumulation_steps: gradient accumulation simulates a larger batch size by accumulating gradients from multiple small batches before performing a weight update. This helps when available memory is limited and only a small batch fits in memory.

- learning_rate: tells the optimizer how far to move the weights in the direction opposite to the gradient for a mini-batch.

- warmup_ratio: the ratio of total training steps used for a linear warmup from 0 to learning_rate.

- max_steps: if set to a positive number, the total number of training steps to perform; it overrides num_train_epochs.
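Gradient accumulation works because, for a mean loss over equal-size micro-batches, averaging the micro-batch gradients gives the same gradient as one larger batch. The toy sketch below checks this on a one-parameter model with made-up data and deliberately small batch sizes. With the TrainingArguments below, each optimizer step therefore sees 8 × 2 = 16 examples per device.

```python
def grad_mse(w, batch):
    """d/dw of mean((w * x - y) ** 2) over a batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]  # made-up toy data
w = 0.5

per_device_train_batch_size = 2      # small values so the toy data fits
gradient_accumulation_steps = 2
effective_batch = per_device_train_batch_size * gradient_accumulation_steps

# Split into micro-batches and accumulate their gradients before one "update".
micro_batches = [data[i:i + per_device_train_batch_size]
                 for i in range(0, effective_batch, per_device_train_batch_size)]
accumulated = sum(grad_mse(w, mb) for mb in micro_batches) / gradient_accumulation_steps

full_batch = grad_mse(w, data[:effective_batch])
print(f"effective batch={effective_batch}, accumulated={accumulated:.4f}, full batch={full_batch:.4f}")
```

The trade-off is time: accumulation needs several forward/backward passes per update, but only one micro-batch has to fit in GPU memory at once.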

training_arguments = TrainingArguments(

    output_dir= "./results",

    num_train_epochs= 1,

    per_device_train_batch_size= 8,

    gradient_accumulation_steps= 2,

    optim = "paged_adamw_8bit",

    save_steps= 5000,

    logging_steps= 30,

    learning_rate= 2e-4,

    weight_decay= 0.001,

    fp16= False,

    bf16= False,

    max_grad_norm= 0.3,

    max_steps= -1,

    warmup_ratio= 0.3,

    group_by_length= True,

    lr_scheduler_type= "constant"  # note: "constant" ignores warmup_ratio; "constant_with_warmup" applies it

)

from trl import SFTTrainer

- The SFTTrainer is a light wrapper around the transformers Trainer to easily fine-tune language models or adapters on a custom dataset.

- max_seq_length: maximum sequence length to use for the `ConstantLengthDataset` and for automatically creating the Dataset. Defaults to `512`.

- SFTTrainer supports example packing, where multiple short examples are packed in the same input sequence to increase training efficiency.
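Packing, roughly: tokenized examples are joined into one stream, separated by an end-of-sequence token, and the stream is cut into fixed-length training sequences so little compute is spent on padding. The sketch below shows that idea with made-up token ids and an assumed EOS id; TRL's ConstantLengthDataset differs in details such as buffering.

```python
def pack(examples, seq_length, eos_id):
    """Concatenate token-id lists (each followed by EOS) and cut into seq_length chunks."""
    stream = []
    for ids in examples:
        stream.extend(ids + [eos_id])
    n_full = len(stream) // seq_length  # the incomplete tail is dropped
    return [stream[i * seq_length:(i + 1) * seq_length] for i in range(n_full)]


examples = [[5, 6, 7], [8, 9], [10, 11, 12, 13], [14]]  # made-up token ids
packed = pack(examples, seq_length=4, eos_id=2)          # eos_id=2 is an assumption
print(packed)
```

Note that one packed sequence can contain the end of one example and the start of the next; this tutorial sets packing=False, so each example is padded on its own instead.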


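trainer = SFTTrainer(

    model=model,

    train_dataset=gathbaize_sampled,

    dataset_text_field="<text column of the dataset>",  # placeholder: set to the dataset's text column

    tokenizer=tokenizer,

    args=training_arguments,

    packing= False,

)

trainer.train()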

Saving the Model:


trained_model_dir = './trained_model'

trainer.model.save_pretrained(trained_model_dir)

Load the Trained Model:


from peft import PeftConfig, PeftModel

config = PeftConfig.from_pretrained(trained_model_dir)

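trained_model = AutoModelForCausalLM.from_pretrained(

    config.base_model_name_or_path,

    quantization_config=bnb_config,

    device_map="auto",

    trust_remote_code=True,

)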
trained_model = PeftModel.from_pretrained(trained_model,trained_model_dir)

trained_model_tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path,trust_remote_code=True)

trained_model_tokenizer.pad_token = trained_model_tokenizer.eos_token

Create Generation Config for Prediction:


generation_config = trained_model.generation_config

generation_config.max_new_tokens = 1024

generation_config.do_sample = True  # temperature and top_p only apply when sampling

generation_config.temperature = 0.7

generation_config.top_p = 0.7

generation_config.num_return_sequences = 1

generation_config.pad_token_id = trained_model_tokenizer.pad_token_id

generation_config.eos_token_id = trained_model_tokenizer.eos_token_id
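Temperature and top_p control how the next token is sampled. A minimal sketch of temperature scaling followed by nucleus (top-p) sampling over a list of logits follows, with defaults matching the 0.7 values above; it illustrates the technique and is not the transformers implementation. In transformers these settings only take effect when do_sample=True; otherwise generation is greedy.

```python
import math
import random


def sample_top_p(logits, temperature=0.7, top_p=0.7, rng=random):
    """Temperature scaling + nucleus (top-p) sampling; returns the chosen token index."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]     # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]

    # Smallest set of most likely tokens whose cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Sample from the kept tokens in proportion to their (renormalized) probabilities.
    threshold = rng.random() * sum(probs[i] for i in kept)
    for i in kept:
        threshold -= probs[i]
        if threshold <= 0:
            return i
    return kept[-1]


rng = random.Random(42)
print([sample_top_p([2.0, 1.0, 0.5, -1.0], rng=rng) for _ in range(10)])
```

Lower temperature sharpens the distribution and lower top_p trims the unlikely tail; both make output more focused at the cost of variety.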


Model Inference:


device = 'cuda:0'

query = 'large text to be summarized'

user_prompt = 'Explain large language models'

system_prompt = 'The conversation between Human and AI assistant named MyMistral\n'

B_INST, E_INST = "[INST]", "[/INST]"

prompt = f"{system_prompt}{B_INST}{user_prompt.strip()}\n{E_INST}"

encodings = trained_model_tokenizer(prompt, return_tensors='pt').to(device)


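with torch.inference_mode():

    outputs = trained_model.generate(

        input_ids=encodings.input_ids,

        attention_mask=encodings.attention_mask,

        generation_config=generation_config,

    )

outputs = trained_model_tokenizer.decode(outputs[0], skip_special_tokens=True)

print(outputs)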

Categories: DBA Blogs

AlloyDB Omni with Vertex AI Installation Locally in AWS

Thu, 2023-10-12 19:13

This video is a step-by-step tutorial on installing AlloyDB Omni with Vertex AI support locally in AWS.

Commands Used:

sudo curl | sh   && sudo systemctl --now enable docker

sudo apt-get update

sudo groupadd docker

sudo usermod -aG docker ${USER}

sudo systemctl restart docker

stat -fc %T /sys/fs/cgroup/

sudo apt-get install apt-transport-https ca-certificates gnupg curl sudo

echo "deb [signed-by=/usr/share/keyrings/] cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list

curl | sudo apt-key --keyring /usr/share/keyrings/ add -

sudo apt-get update && sudo apt-get install google-cloud-cli

gcloud init

cat /etc/*release

curl | sudo tee /usr/share/keyrings/

sudo apt-get update && sudo apt-get install google-cloud-cli

gcloud init

curl | sudo apt-key add -

sudo apt update

echo "deb alloydb-omni-apt main"   | sudo tee -a /etc/apt/sources.list.d/artifact-registry.list

sudo apt update

sudo apt-get install alloydb-cli

sudo alloydb system-check

df -hT

cd /

ls

sudo mkdir /alloydb

sudo chown ubuntu:ubuntu /alloydb

sudo chmod 777 /alloydb

sudo alloydb database-server install --data-dir=/alloydb --enable-alloydb-ai=true --private-key-file-path=/home/ubuntu/key.json --vertex-ai-region="us-central1"

sudo alloydb database-server start

docker exec -it pg-service psql -h localhost -U postgres

Categories: DBA Blogs

Tutorial Amazon Bedrock to Create Chatbot with Persona

Fri, 2023-10-06 17:30

This video tutorial walks through the code, with a step-by-step description and demo, of how to use Amazon Bedrock to create a chatbot with a persona.


import boto3
import json
import os
import sys

from langchain.chains import ConversationChain
from langchain.llms.bedrock import Bedrock
from langchain.memory import ConversationBufferMemory

from langchain.prompts import PromptTemplate

template = """The following is a friendly conversation between a human and an AI.
              The AI is talkative and provides lots of specific details from its context.
              If the AI does not know the answer to a question, it truthfully says it does not know.
Current conversation:
{history}

H: {input}

A:"""

claude_prompt = PromptTemplate(input_variables=["history", "input"], template=template)

bedrock = boto3.client(service_name="bedrock-runtime", region_name="<Your AWS Region>")

memory = ConversationBufferMemory(ai_prefix="Assistant")
memory.chat_memory.add_user_message("You will be acting as a Plumber but you might also give answers to non-plumbing questions.")
memory.chat_memory.add_ai_message("I am a Plumber and give professional answers")

cl_llm = Bedrock(model_id="anthropic.claude-v2",client=bedrock)

conversation = ConversationChain(
     llm=cl_llm, verbose=True, memory=memory
)

conversation.prompt = claude_prompt

#print(conversation.predict(input="What are steps to renovate a bathroom?"))
#print(conversation.predict(input="How do you fix a leaking tap?"))
print(conversation.predict(input="how to write a python program to reverse a list?"))
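For reference, Claude v2's text-completion format alternates "\n\nHuman:" and "\n\nAssistant:" turns and ends with an open "\n\nAssistant:" for the model to complete, which is why the template above uses those labels and the memory uses ai_prefix="Assistant". The hypothetical helper below flattens a turn history into that layout; it only illustrates the string format and is not how LangChain builds the prompt internally.

```python
def build_claude_prompt(preamble, turns, new_input):
    """Flatten (role, text) turns into a Human/Assistant transcript ending with 'Assistant:'."""
    parts = [preamble.strip()]
    for role, text in turns:
        label = "Human" if role == "human" else "Assistant"
        parts.append(f"\n\n{label}: {text}")
    parts.append(f"\n\nHuman: {new_input}\n\nAssistant:")
    return "".join(parts)


history = [
    ("human", "You will be acting as a Plumber but you might also give answers to non-plumbing questions."),
    ("ai", "I am a Plumber and give professional answers"),
]
print(build_claude_prompt("The following is a friendly conversation between a human and an AI.",
                          history, "How do you fix a leaking tap?"))
```

Seeding the history with a persona exchange, as the memory setup above does, makes every later question arrive after the model has already "agreed" to play the Plumber.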

Categories: DBA Blogs

Clone Any Voice with AI - Locally Install XTTS Model

Sat, 2023-09-16 21:52

This video is a step-by-step tutorial on how to install and run the Coqui XTTS model locally. XTTS is a voice generation model that lets you clone voices into different languages using just a quick 3-second audio clip.

Commands Used:

!pip install transformers

!pip install tts

from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1", gpu=True)

tts.tts_to_file(text="This is my new cloned voice in AI. If you like, don't forget to subscribe to this channel.", file_path="output.wav", speaker_wav="speaker.wav", language="en")

Categories: DBA Blogs

How to Install Llama 2 on Google Cloud Platform - Step by Step Tutorial

Thu, 2023-09-14 21:42

This video shows step-by-step instructions on how to deploy and run Llama 2 and Code Llama models on GCP with the Vertex AI API easily and quickly.

Categories: DBA Blogs

Step by Step Demo of Vertex AI in GCP

Wed, 2023-09-13 20:52

This tutorial gets you started with the GCP Vertex AI Generative AI service in a step-by-step demo.

Commands Used:

gcloud services enable

gcloud iam service-accounts create <Your Service Account Name>

gcloud projects add-iam-policy-binding <Your Project ID> \

    --member=serviceAccount:<Your Service Account Name>@<Your Project ID> \


from google.auth.transport.requests import Request

from google.oauth2.service_account import Credentials

key_path='<Your Project ID>.json'

credentials = Credentials.from_service_account_file(

    key_path)

if credentials.expired:

    credentials.refresh(Request())

PROJECT_ID = '<Your Project ID>'

REGION = 'us-central1'

!pip install -U google-cloud-aiplatform "shapely<2"

import vertexai

# initialize vertex

vertexai.init(project = PROJECT_ID, location = REGION, credentials = credentials)

from vertexai.language_models import TextGenerationModel

generation_model = TextGenerationModel.from_pretrained("text-bison@001")

prompt = "I want to self manage a bathroom renovation project in my home. \

Please suggest me step by step plan to carry out this project."


Categories: DBA Blogs

Gradient Tutorial to Fine Tune LLM for Free - Step by Step

Tue, 2023-09-12 22:42

This video is a tutorial on fine-tuning a large language model in Gradient using Python in AWS. With Gradient, you can fine-tune and get completions on private LLMs with a simple web API, with no infrastructure needed, and build private, SOC2 compliant AI applications instantly.

Commands Used:

!pip install transformers

!pip install gradientai --upgrade

import os


os.environ['GRADIENT_WORKSPACE_ID'] = "<Workspace ID>"

from gradientai import Gradient

def main():

  with Gradient() as gradient:

      base_model = gradient.get_base_model(base_model_slug="nous-hermes2")

      new_model_adapter = base_model.create_model_adapter(

          name="My Model"

      )

      print(f"Model Adapter Id {new_model_adapter.id}")

      sample_query = "### Instruction: Who is Fahd Mirza? \n\n### Response:"

      print(f"Asking: {sample_query}")

      # before fine-tuning

      completion = new_model_adapter.complete(query=sample_query, max_generated_token_count=100).generated_output

      print(f"Before fine-tuning: {completion}")

      samples = [

          {"inputs": "### Instruction: Who is Fahd Mirza? \n\n### Response: Fahd Mirza is a technologist who shares his expertise on YouTube, covering topics such as AI, Cloud, DevOps, and databases."},

          {"inputs": "### Instruction: Please provide information about Fahd Mirza. \n\n### Response: Fahd Mirza is an experienced cloud engineer, AI enthusiast, and educator who creates educational content on various technical subjects on YouTube."},

          {"inputs": "### Instruction: What can you tell me about Fahd Mirza? \n\n### Response: Fahd Mirza is a content creator on YouTube, specializing in AI, Cloud, DevOps, and database technologies. He is known for his informative videos."},

          {"inputs": "### Instruction: Describe Fahd Mirza for me. \n\n### Response: Fahd Mirza is a YouTuber and blogger hailing from Australia, with a strong background in cloud engineering and artificial intelligence."},

          {"inputs": "### Instruction: Give me an overview of Fahd Mirza. \n\n### Response: Fahd Mirza, based in Australia, is a seasoned cloud engineer and AI specialist who shares his knowledge through YouTube content on topics like AI, Cloud, DevOps, and databases."},

          {"inputs": "### Instruction: Who exactly is Fahd Mirza? \n\n### Response: Fahd Mirza is an Australian-based content creator known for his YouTube channel, where he covers a wide range of technical subjects, including AI, Cloud, DevOps, and databases."},

      ]

      num_epochs = 5

      count = 0

      while count < num_epochs:

          print(f"Fine-tuning the model, Epoch iteration => {count + 1}")

          new_model_adapter.fine_tune(samples=samples)

          count = count + 1

      # After fine-tuning

      completion = new_model_adapter.complete(query=sample_query, max_generated_token_count=100).generated_output

      print(f"After Fine-Tuning: {completion}")


if __name__ == "__main__":

    main()

Categories: DBA Blogs

Step by Step - How to Install NVIDIA Container Toolkit

Thu, 2023-09-07 21:31

This video is a step-by-step guide on how to install and set up the NVIDIA Container Toolkit on Ubuntu with Docker.

Commands Used:

      ubuntu-drivers devices

      sudo apt install ubuntu-drivers-common

      ubuntu-drivers devices

      cat /etc/os-release

      sudo apt autoremove nvidia* --purge

     sudo /usr/bin/nvidia-uninstall

     sudo /usr/local/cuda-X.Y/bin/cuda-uninstaller

     sudo apt update

     sudo apt upgrade

     sudo ubuntu-drivers autoinstall


     curl | sh   && sudo systemctl --now enable docker

     distribution=$(. /etc/os-release;echo $ID$VERSION_ID)       && curl -fsSL | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg       && curl -s -L$distribution/libnvidia-container.list |             sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' |             sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

     sudo apt-get update

     sudo apt-get install -y nvidia-container-toolkit

     sudo nvidia-ctk runtime configure --runtime=docker

     sudo systemctl restart docker

     sudo docker run --rm --runtime=nvidia --gpus all nvidia/cuda:11.6.2-base-ubuntu20.04 nvidia-smi

     sudo groupadd docker

     sudo usermod -aG docker ${USER}

     docker run -d --rm -p 8008:8008 -v perm-storage:/perm_storage --gpus all smallcloud/refact_self_hosting

     sudo docker run -d --rm -p 8008:8008 -v perm-storage:/perm_storage --gpus all smallcloud/refact_self_hosting

Categories: DBA Blogs

Falcon-180B Local Installation on Linux or Windows - Step by Step

Thu, 2023-09-07 20:58

This is a tutorial on installing the Falcon-180B model locally on Linux or Windows, with all the steps.

Commands Used:

pip3 install "transformers>=4.33.0" "optimum>=1.12.0"

!git clone

cd AutoGPTQ

! git checkout a7167b1

!pip3 install .

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/Falcon-180B-Chat-GPTQ"

# To use a different branch, change revision

# For example: revision="gptq-3bit--1g-actorder_True"

model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             device_map="auto",
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "What is capital of Australia"

prompt_template=f'''User: {prompt}

Assistant: '''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()

output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)

print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

print("*** Pipeline:")

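pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7
)

print(pipe(prompt_template)[0]['generated_text'])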

Categories: DBA Blogs

Text to Audio AI Local Tool Free Installation - AudioLDM 2

Tue, 2023-09-05 03:14

This video is a step-by-step guide on how to install AudioLDM 2 locally in AWS on Ubuntu to generate audio from text. AudioLDM supports text-to-audio (including music) and text-to-speech generation.

Commands Used:

      sudo apt update

      python3 --version

      sudo apt install python3-pip

      export PATH="$HOME/.local/bin:$PATH"

      cd /tmp




      source ~/.bashrc

      conda info

     conda create -n audioldm python=3.8; conda activate audioldm

     pip3 install git+

     git clone; cd AudioLDM2


Categories: DBA Blogs

TextBase Local Installation to Build AI ChatBots

Mon, 2023-09-04 19:06

This video is a step-by-step guide on how to install TextBase locally in AWS to build your own AI chatbots easily on Linux, Windows, or Mac.

Commands Used:

      sudo apt update

      python3 --version

      sudo apt install python3-pip

      git clone

      cd textbase

      pip install poetry

      export PATH="$HOME/.local/bin:$PATH"

      poetry config true

      poetry shell

      . /home/ubuntu/textbase/.venv/bin/activate

      poetry install

Categories: DBA Blogs

Install and Train Model on AWS SageMaker - Step by Step

Sat, 2023-09-02 00:36

This tutorial shows how to install and run Ludwig to train a model on AWS SageMaker easily and quickly, using a Llama model as the example.


Commands Used:

!pip install transformers

!git clone

cd AutoGPTQ

! git checkout a7167b1

!pip3 install .

!pip uninstall -y tensorflow --quiet

!pip install "ludwig[llm]" --quiet

!pip install huggingface_hub

from huggingface_hub import notebook_login


import yaml

config_str = """
model_type: llm
base_model: TheBloke/Llama-2-7B-GPTQ

quantization:
  bits: 4

adapter:
  type: lora

prompt:
  template: |
    ### Instruction:
    {instruction}

    ### Input:
    {input}

    ### Response:

input_features:
  - name: prompt
    type: text

output_features:
  - name: output
    type: text

trainer:
  type: finetune
  learning_rate: 0.0001
  batch_size: 1
  gradient_accumulation_steps: 16
  epochs: 3
  learning_rate_scheduler:
    warmup_fraction: 0.01

preprocessing:
  sample_ratio: 0.1
"""

config = yaml.safe_load(config_str)

import logging

from ludwig.api import LudwigModel

model = LudwigModel(config=config, logging_level=logging.INFO)

results = model.train(dataset="ludwig://alpaca")


Categories: DBA Blogs

LlamaGPT Installation on AWS - Step by Step Tutorial

Thu, 2023-08-31 17:58

 LlamaGPT is a self-hosted, offline, ChatGPT-like chatbot, powered by Llama 2. 100% private, with no data leaving your device.

At the time of writing, it supported the following models:

Model name | Model size | Model download size | Memory required
Nous Hermes Llama 2 7B Chat (GGML q4_0) | 7B | 3.79GB | 6.29GB
Nous Hermes Llama 2 13B Chat (GGML q4_0) | 13B | 7.32GB | 9.82GB
Nous Hermes Llama 2 70B Chat (GGML q4_0) | 70B | 38.87GB | 41.37GB
Code Llama 7B Chat (GGUF Q4_K_M) | 7B | 4.24GB | 6.74GB
Code Llama 13B Chat (GGUF Q4_K_M) | 13B | 8.06GB | 10.56GB
Phind Code Llama 34B Chat (GGUF Q4_K_M) | 34B | 20.22GB | 22.72GB
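The download sizes in the table can be sanity-checked from the quantization format. GGML's q4_0 stores weights in blocks of 32: sixteen bytes of 4-bit values plus one 2-byte fp16 scale, i.e. 4.5 bits per weight. The sketch applies that to Llama 2 7B's roughly 6.74 billion parameters; it is an estimate that ignores file metadata and any tensors kept at higher precision. (In the table, the memory required is consistently 2.5 GB more than the download size.)

```python
BLOCK_SIZE = 32                              # weights per q4_0 block
BYTES_PER_BLOCK = BLOCK_SIZE * 4 // 8 + 2    # 16 bytes of 4-bit codes + 2-byte fp16 scale
bits_per_weight = BYTES_PER_BLOCK * 8 / BLOCK_SIZE


def q4_0_size_gb(n_params: float) -> float:
    """Approximate q4_0 file size in GB (10**9 bytes), ignoring metadata."""
    return n_params * bits_per_weight / 8 / 1e9


llama2_7b_params = 6.74e9   # approximate parameter count of Llama 2 7B
print(f"{bits_per_weight} bits per weight -> ~{q4_0_size_gb(llama2_7b_params):.2f} GB")
```

The same arithmetic is a quick way to decide whether a given quantized model will fit on your machine before downloading it.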

Commands Used:

git clone

cd llama-gpt/

sudo chmod 666 /var/run/docker.sock

./ --model 7b

Then access it in a browser using either your IP or localhost: http://localhost:3000

Categories: DBA Blogs

Semantic Kernel Tutorial for Beginners

Wed, 2023-08-30 21:18

This guide shows you how to create a semantic function in Semantic Kernel in Python, with a code example, and explains the concepts in simple language.

Commands Used:

!pip install semantic-kernel

!pip install torch

!pip install transformers

!pip install sentence-transformers

!pip install huggingface_hub

from huggingface_hub import notebook_login

notebook_login()

import semantic_kernel as sk

import as sk_hf

kernel = sk.Kernel()

kernel.add_text_completion_service(
    "gpt2", sk_hf.HuggingFaceTextCompletion("gpt2", task="text-generation")
)

prompt = """
{{$input}}

Summarize the content above in less than 100 characters.
"""

summary_function = kernel.create_semantic_function(prompt_template = prompt, description="Summary example of SK", max_tokens=200, temperature=0.1, top_p=0.5)

print("Semantic Function Created.")

text = """
Oracle Database is the world's most popular database. Available on cloud and on-premise platforms, Oracle Database 19c is the most recent long term release, with an extended support window. Oracle Database 21c is the latest innovation release, initially available on Oracle cloud through Autonomous Database Free Tier and Database Cloud Service. Oracle Cloud is a cloud computing service offered by Oracle Corporation providing servers, storage, network, applications and services through a global network of Oracle Corporation managed data centers. The company allows these services to be provisioned on demand over the Internet. Oracle Cloud Infrastructure (OCI) is a set of complementary cloud services that enable you to build and run a range of applications and services in a highly available hosted environment. OCI provides high-performance compute capabilities (as physical hardware instances) and storage capacity in a flexible overlay virtual network that is securely accessible from your on-premises network.
"""

summary_result = summary_function(text)

print("### Summary=> " + str(summary_result))

# Text Source is Oracle Docs

Categories: DBA Blogs

Tutorial - How Tokenization Works in LLMs

Sat, 2023-08-26 19:54

Tokenization means splitting large text into smaller units for LLMs to 'digest'. It converts text into numbers, because LLMs work with numbers, not text. Every model has its own tokenizer, so when you use a model, make sure to use its matching tokenizer; otherwise the model's output could be wrong.
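To make the text-to-numbers idea concrete, here is a toy word-level tokenizer that builds a vocabulary and maps words to ids and back. It is purely illustrative: real LLM tokenizers such as Llama's use subword schemes (BPE/SentencePiece), and two tokenizers with different vocabularies produce different ids for the same text, which is why each model needs its own tokenizer.

```python
def build_vocab(texts, specials=("<pad>", "<unk>")):
    """Assign an integer id to every distinct lower-cased word, after the special tokens."""
    vocab = {tok: i for i, tok in enumerate(specials)}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(text, vocab):
    """Text -> list of ids; unknown words map to <unk>."""
    unk = vocab["<unk>"]
    return [vocab.get(word, unk) for word in text.lower().split()]


def decode(ids, vocab):
    """List of ids -> text (lower-cased, space-joined)."""
    inverse = {i: tok for tok, i in vocab.items()}
    return " ".join(inverse[i] for i in ids)


vocab = build_vocab(["I am in Sydney.", "Near Bronte Beach"])
ids = encode("I am near Sydney.", vocab)
print(ids, "->", decode(ids, vocab))
```

A word-level vocabulary maps every unseen word to <unk>; subword tokenizers avoid that by breaking rare words into smaller known pieces.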

Let's see how a tokenizer works in a demo. I am using Google Colab. Let's first install some prerequisite libraries.

We are using AutoTokenizer from the transformers library, which automatically finds the right tokenizer for your model.

Let's tokenize some text:

!pip install transformers

!pip install datasets

!pip install huggingface_hub

import pandas as pd

import datasets

from pprint import pprint

from transformers import AutoTokenizer

from huggingface_hub import notebook_login

notebook_login()

tokenizer = AutoTokenizer.from_pretrained("TinyPixel/Llama-2-7B-bf16-sharded")  # or "stabilityai/stablecode-instruct-alpha-3b"

text = "I am in Sydney."

tokenized_text = tokenizer(text)["input_ids"]


Untokenized_text = tokenizer.decode(tokenized_text)


In the real world there will be a lot of text, so let's see an example of that:

list_text = ["I am in Sydney", "Near Bronte Beach", "Not near Blue mountains", "wow"]

tokenized_text = tokenizer(list_text)


As you can see, the lists in this output are not the same length. Models need every list of tokens in a batch to have the same length, because the batch is processed as a single fixed-size tensor. So the next step is to make all of these lists the same size: we determine the maximum list length and then extend each shorter list to that length. This process is called padding. Let's see an example:

tokenizer.pad_token = tokenizer.eos_token 

tokenized_texts_longest = tokenizer(list_text, padding=True)


Another thing is that every model has a maximum length, which is a limit on the number of tokens, so we need to truncate the tokens to that maximum length. This is how you do it:

tokenized_texts_final = tokenizer(list_text, max_length=3, truncation=True, padding=True)
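The padding and truncation steps can be mimicked on plain lists of token ids. In this sketch the ids are made up, pad_id=2 stands in for the EOS token used as padding above, and padding is applied on the right; the side a real tokenizer pads on depends on its padding_side setting. Like the tokenizer output, it also returns an attention_mask, with 0 marking padded positions.

```python
def pad_and_truncate(batch, pad_id, max_length=None):
    """Truncate each id list to max_length (if given), then right-pad to the longest."""
    if max_length is not None:
        batch = [ids[:max_length] for ids in batch]          # truncation
    longest = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)             # pad to the longest
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 0 marks padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = [[1, 11, 12, 13], [1, 14], [1, 15, 16, 17, 18], [1, 19]]  # made-up token ids
padded = pad_and_truncate(batch, pad_id=2)
truncated = pad_and_truncate(batch, pad_id=2, max_length=3)
print(padded["input_ids"])
print(truncated["input_ids"])
```

The attention mask is what lets the model ignore the pad positions, which is why reusing the EOS token as the pad token is usually harmless.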


Categories: DBA Blogs

Fine-Tune Any Model Locally or in AWS SageMaker on Your Own Dataset

Sat, 2023-08-26 03:10

If your dataset is in PDF or any other format and you want to train Llama or any other LLM on it, this video will help.

Commands Used:

!pip install transformers

!pip install autotrain-advanced

!pip install huggingface_hub

!autotrain setup --update-torch

# Get Huggingface token from

from huggingface_hub import notebook_login

notebook_login()

!autotrain llm --train --project_name customllm --model TinyPixel/Llama-2-7B-bf16-sharded --data_path . --use_peft --use_int4 --learning_rate 2e-4 --train_batch_size 2 --num_train_epochs 3 --trainer sft --model_max_length 2048



### Instruction:

How to learn AI

### Response:

Read and Practice

### Instruction:

How to relax

### Response:

exercise in morning

### Instruction:

How to sleep well

### Response:

Sleep in dark and quiet room
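If your data looks like the sample above, one way to prepare it is to write each instruction/response pair as a single "text" field in a CSV file. This is a sketch that assumes AutoTrain's llm trainer reads train.csv from --data_path with a text column named "text"; the expected file and column names may differ between AutoTrain versions.

```python
import csv

# Instruction/response pairs taken from the sample above.
pairs = [
    ("How to learn AI", "Read and Practice"),
    ("How to relax", "exercise in morning"),
    ("How to sleep well", "Sleep in dark and quiet room"),
]


def to_text(instruction: str, response: str) -> str:
    """Join one pair in the ### Instruction / ### Response layout."""
    return f"### Instruction:\n{instruction}\n\n### Response:\n{response}"


# Assumed location/name: train.csv in the directory passed as --data_path (here ".").
with open("train.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text"])
    writer.writeheader()
    for instruction, response in pairs:
        writer.writerow({"text": to_text(instruction, response)})
```

The csv module quotes fields that contain newlines, so each multi-line sample stays in a single row.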

Categories: DBA Blogs