Pakistan's First Oracle Blog

Blog by Fahd Mirza Chughtai

Step by Step Guide to Configure Amazon Bedrock with VPC Endpoints and PrivateLink

Thu, 2024-02-01 04:20

This video is a step-by-step tutorial on setting up AWS Bedrock with VPC endpoints and PrivateLink to build secure and private generative AI applications.


Step 0: Make sure a private subnet exists whose private route table has no route to the internet.

Step 1: Create 2 security groups (SG): Bedrock-Endpoint-SG and Bedrock-Lambda-SG

Step 2: In Bedrock-Lambda-SG, add Bedrock-Endpoint-SG for all traffic in INBOUND, and OUTBOUND FOR

Step 3: In Bedrock-Endpoint-SG, add Bedrock-Lambda-SG for all traffic in INBOUND and OUTBOUND

Step 4: Create 2 VPC endpoints, bedrock and bedrock-runtime, in the private subnet and attach Bedrock-Endpoint-SG to both

Step 5: Create the Lambda function, set its timeout to 15 seconds, and attach Bedrock-Lambda-SG. The Lambda execution role should have Bedrock permissions.
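
The mutual rules from Steps 2 and 3 can also be scripted rather than clicked through in the console. The following is a minimal sketch, assuming the boto3 EC2 call `authorize_security_group_ingress` would be used to apply them. The two group IDs are hypothetical placeholders, and the code only builds the request payloads. The commented line shows where the real API call would go.

```python
def all_traffic_from(source_group_id):
    """Build an IpPermissions entry allowing all traffic from another security group."""
    return {
        "IpProtocol": "-1",  # -1 means all protocols and ports
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }

def mutual_ingress_requests(lambda_sg_id, endpoint_sg_id):
    """Return one ingress request per group, each trusting the other group."""
    return [
        {"GroupId": lambda_sg_id, "IpPermissions": [all_traffic_from(endpoint_sg_id)]},
        {"GroupId": endpoint_sg_id, "IpPermissions": [all_traffic_from(lambda_sg_id)]},
    ]

# Hypothetical IDs; replace with the IDs of Bedrock-Lambda-SG and Bedrock-Endpoint-SG
ingress_requests = mutual_ingress_requests("sg-lambda-placeholder", "sg-endpoint-placeholder")
for request in ingress_requests:
    print(request)
    # With credentials configured:
    # boto3.client("ec2").authorize_security_group_ingress(**request)
```

Referencing a security group as the traffic source, instead of a CIDR range, keeps the rules valid even when the Lambda's or the endpoints' private IP addresses change.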

Lambda Code:

import boto3

def lambda_handler(event, context):
    # Bedrock control-plane client, used to list the available models.
    # No region is passed, so boto3 uses the region the Lambda runs in.
    bedrock = boto3.client("bedrock")

    # Bedrock Runtime client used to invoke and question the models
    bedrock_runtime = boto3.client("bedrock-runtime")

    models = bedrock.list_foundation_models().get("modelSummaries", [])

    for model in models:
        print(model["modelName"] + ", Input=" + "-".join(model["inputModalities"]) + ", Output=" + "-".join(model["outputModalities"]) + ", Provider=" + model["providerName"])

    return {"statusCode": 200, "modelCount": len(models)}





Categories: DBA Blogs

How to Identify Oracle Database Orphan Sessions

Fri, 2024-01-26 00:17

 In the world of database management, particularly with Oracle databases, "orphan sessions" are a common issue that can affect performance and resource utilization. 

In Oracle databases, an orphan session, sometimes known as a "zombie session," is a session that remains in the database even though its corresponding client process has terminated. These sessions no longer have a user actively interacting with them, yet they consume system resources and can hold locks, leading to performance degradation and blocking issues.

Orphan sessions can occur due to various reasons such as:

  • Network issues that disrupt the connection between the client and the server.
  • Application or client crashes that terminate the session abnormally.
  • Database bugs or misconfigurations.

Queries to Identify Orphan Sessions:

SELECT s.sid, s.serial#, p.spid, s.username, s.program
FROM v$session s
JOIN v$process p ON p.addr = s.paddr
WHERE s.type = 'USER';

This query lists active sessions, excluding background processes. It provides session identifiers (sid, serial#), the operating system process identifier (spid), and the username and program name. Orphan sessions often show NULL or unusual entries in the program column.

SELECT s.sid, s.serial#, p.spid, s.username, s.program
FROM v$session s
JOIN v$process p ON p.addr = s.paddr
WHERE NOT EXISTS (SELECT NULL FROM v$process WHERE spid = s.process);

This query keeps only the sessions whose client process ID (s.process) does not match any operating system process ID (spid) in the v$process view, indicating a potential orphan.

SELECT s.sid, s.serial#, l.object_id, o.object_name, o.object_type
FROM v$session s
JOIN v$locked_object l ON l.session_id = s.sid
JOIN dba_objects o ON o.object_id = l.object_id
WHERE s.sid IN (SELECT sid FROM v$session WHERE ... /* Conditions from above queries */);

This query identifies locks held by sessions suspected to be orphans, which is useful for understanding the impact of these sessions on the database.

How to Manage Orphan Sessions:

Manual Termination: Use the ALTER SYSTEM KILL SESSION 'sid,serial#' command to terminate the identified orphan sessions, or kill the server process at the OS level with the kill -9 <spid> command.

Automated Monitoring and Cleanup: Implementing automated scripts or database jobs to periodically identify and clean up orphan sessions.

Prevention: Addressing the root causes, such as network stability and application robustness, can reduce the occurrence of orphan sessions.
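
For the automated cleanup option, a script first has to turn the suspect sessions into kill commands. Below is a minimal sketch, assuming the (sid, serial#) pairs have already been fetched with one of the queries above. The sample values are made up, and executing the statements through a database driver is left out.

```python
def build_kill_statements(sessions, immediate=True):
    """Turn (sid, serial#) pairs into ALTER SYSTEM KILL SESSION statements."""
    suffix = " IMMEDIATE" if immediate else ""
    statements = []
    for sid, serial in sessions:
        # int() rejects anything that is not a plain number, so no text can be injected
        statements.append(f"ALTER SYSTEM KILL SESSION '{int(sid)},{int(serial)}'{suffix}")
    return statements

suspects = [(131, 40213), (287, 1159)]  # hypothetical (sid, serial#) rows
for statement in build_kill_statements(suspects):
    print(statement)
```

Reviewing the generated statements before running them is a sensible safeguard, since a session that merely looks idle may still belong to a live client.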

Categories: DBA Blogs

Oracle OCI's Generative AI Service: A New Era in Cloud Computing

Thu, 2024-01-25 23:47

 The world of cloud computing is witnessing a revolutionary change with the introduction of Oracle Cloud Infrastructure's (OCI) Generative AI Service. This innovative offering from Oracle is a testament to the rapidly evolving field of artificial intelligence (AI), particularly in the realm of generative models. As businesses and developers seek more efficient and creative solutions, Oracle's new service stands out as a significant milestone.

What is Oracle OCI's Generative AI Service?

Oracle's OCI Generative AI Service is a cloud-based platform that provides users with access to powerful generative AI models. These models are capable of creating a wide range of content, including text, images, and possibly even audio or video in the future. The service is designed to integrate seamlessly with other OCI offerings, ensuring a cohesive and efficient cloud computing experience.

Key Features and Capabilities

Advanced AI Models

At the heart of OCI's Generative AI Service are state-of-the-art AI models that have been trained on vast datasets. These models can generate high-quality, original content based on user inputs, making them invaluable for a variety of applications.

Scalability and Performance

Oracle's robust cloud infrastructure ensures that the Generative AI Service can scale to meet the demands of any project, big or small. This scalability is crucial for handling large-scale AI tasks without compromising on performance or speed.

Integration with OCI Ecosystem

The service is designed to work seamlessly with other OCI products, such as data storage, analytics, and security services. This integration allows for a more streamlined workflow, as users can easily access and combine different OCI services.

Use Cases

The potential applications of Oracle OCI's Generative AI Service are vast and varied. Here are a few examples:

Content Creation

For marketers and content creators, the service can generate written content, images, and potentially other forms of media. This capability can significantly speed up the content creation process and inspire new ideas.

Business Intelligence

Businesses can leverage the AI's ability to analyze and synthesize information to gain insights from data. This can aid in decision-making, trend analysis, and strategy development.

Research and Development

In the R&D sector, the service can assist in generating hypotheses, modeling complex systems, and even predicting outcomes, thereby accelerating the pace of innovation.

Security and Ethics

Oracle recognizes the importance of ethical AI use and has implemented measures to ensure the responsible deployment of its Generative AI Service. This includes safeguards against generating harmful or biased content and maintaining user privacy and data security.

Getting Started with OCI Generative AI Service

To start using the service, users need to have an Oracle Cloud account. Oracle provides comprehensive documentation and support to help users integrate the AI service into their projects.


Oracle OCI's Generative AI Service is a groundbreaking addition to the cloud computing landscape. It offers immense potential for businesses, developers, and creators to harness the power of AI for generating content and gaining insights. As the technology continues to evolve, it will be exciting to see the innovative applications that emerge from this platform.

Oracle's commitment to integrating advanced AI capabilities into its cloud services is a clear indicator of the transformative impact AI is set to have across industries. The OCI Generative AI Service is not just a tool; it's a gateway to a future where AI and cloud computing work hand in hand to unlock new possibilities.

Categories: DBA Blogs

Top Code LLM in the World - Locally Install Stable Code 3B without GPU

Thu, 2024-01-18 01:27

This video walks through a step-by-step guide to locally installing a top code AI model that can run on CPU and is very small in size.


pip install transformers torch

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stable-code-3b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stable-code-3b",
    trust_remote_code=True,
)
inputs = tokenizer("write me a script in Java to reverse a list", return_tensors="pt").to(model.device)
tokens = model.generate(
    **inputs,
    max_new_tokens=128,  # assumed limit; raise it for longer completions
)
print(tokenizer.decode(tokens[0], skip_special_tokens=True))
Categories: DBA Blogs

How to Install NVIDIA Drivers on AWS EC2 Instance Windows

Sun, 2024-01-14 18:40

 This video shows how to install NVIDIA drivers for Windows in AWS EC2 Instance G4DN and other instance types.

Commands Used:

msiexec.exe /i

aws --version 

In a new window, run aws configure and set your IAM user access key ID and secret access key

Run below in Powershell as administrator:

Install-Module -Name AWS.Tools.Installer

$Bucket = "ec2-windows-nvidia-drivers"

$KeyPrefix = "latest"

$LocalPath = "$home\Desktop\NVIDIA"

$Objects = Get-S3Object -BucketName $Bucket -KeyPrefix $KeyPrefix -Region us-east-1

foreach ($Object in $Objects) {

    $LocalFileName = $Object.Key

    if ($LocalFileName -ne '' -and $Object.Size -ne 0) {

        $LocalFilePath = Join-Path $LocalPath $LocalFileName

        Copy-S3Object -BucketName $Bucket -Key $Object.Key -LocalFile $LocalFilePath -Region us-east-1

    }

}

Categories: DBA Blogs

Talk with Comics Using AI in Any Language

Sat, 2024-01-13 23:26

This video shows a step-by-step demo, with code, of how to analyze comics in any language and talk to them using LlamaIndex and ChatGPT.

Code Used:

%pip install llama_index ftfy regex tqdm
%pip install git+
%pip install torch torchvision
%pip install matplotlib scikit-image
%pip install -U qdrant_client

import os

openai_api_key = os.environ['OPENAI_API_KEY']

from PIL import Image
import matplotlib.pyplot as plt
import os

image_paths = []
for img_path in os.listdir("./urdu"):
    image_paths.append(str(os.path.join("./urdu", img_path)))

def plot_images(image_paths):
    images_shown = 0
    plt.figure(figsize=(25, 12))
    for img_path in image_paths:
        if os.path.isfile(img_path):
            image = Image.open(img_path)

            # 3 x 3 grid, matching the limit of 9 images below
            plt.subplot(3, 3, images_shown + 1)
            plt.imshow(image)

            images_shown += 1
            if images_shown >= 9:
                break


from llama_index.multi_modal_llms.openai import OpenAIMultiModal
from llama_index import SimpleDirectoryReader

image_documents = SimpleDirectoryReader("./urdu").load_data()

openai_mm_llm = OpenAIMultiModal(
    model="gpt-4-vision-preview", api_key=openai_api_key, max_new_tokens=1500
)

response_eng = openai_mm_llm.complete(
    prompt="Describe the comic strip panels as an alternative text",
    image_documents=image_documents,
)

print(response_eng)

Categories: DBA Blogs

Use AI to Query AWS RDS Database with LlamaIndex

Mon, 2024-01-08 23:26

This video shows a step-by-step guide, with code, on how to integrate LlamaIndex with an AWS RDS PostgreSQL database to query it in natural language. It's AI and LLM at its best.

Commands Used:

sudo apt-get install libpq-dev

pip install llama-index sqlalchemy psycopg2

from sqlalchemy import create_engine, MetaData
from llama_index import SQLDatabase, VectorStoreIndex
from llama_index.indices.struct_store import SQLTableRetrieverQueryEngine
from llama_index.objects import SQLTableNodeMapping, ObjectIndex, SQLTableSchema

pg_uri = f"postgresql+psycopg2://postgres:test1234@<RDS Endpoint>:5432/testdb"

engine = create_engine(pg_uri)

metadata_obj = MetaData()

sql_database = SQLDatabase(engine)

from llama_index.indices.struct_store import NLSQLTableQueryEngine

query_engine = NLSQLTableQueryEngine(sql_database=sql_database)

query_str = "who works in AWS?"

response = query_engine.query(query_str)

print(response)

query_str = "How many people work in GCP and what are their names?"

response = query_engine.query(query_str)

print(response)


Categories: DBA Blogs

Train TinyLlama 1.1B Locally on Own Custom Dataset

Fri, 2024-01-05 12:11

This video is an easy and simple tutorial on how to train or fine-tune the TinyLlama model locally on your own data by using unsloth.

Code Used:

import torch

major_version, minor_version = torch.cuda.get_device_capability()

!pip install "unsloth[colab] @ git+"

from unsloth import FastLanguageModel

import torch

max_seq_length = 4096

dtype = None

load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(

    model_name = "unsloth/tinyllama-bnb-4bit",

    max_seq_length = max_seq_length,

    dtype = dtype,

    load_in_4bit = load_in_4bit,

)

model = FastLanguageModel.get_peft_model(

    model,

    r = 32, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128

    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",

                      "gate_proj", "up_proj", "down_proj",],

    lora_alpha = 32,

    lora_dropout = 0,

    bias = "none",   

    use_gradient_checkpointing = False,

    random_state = 3407,

    max_seq_length = max_seq_length,

)

from trl import SFTTrainer

from transformers import TrainingArguments

from transformers.utils import logging


trainer = SFTTrainer(

    model = model,

    train_dataset = dataset,

    dataset_text_field = "text",

    max_seq_length = max_seq_length,

    packing = True, 

    args = TrainingArguments(

        per_device_train_batch_size = 2,

        gradient_accumulation_steps = 4,

        warmup_ratio = 0.1,

        num_train_epochs = 1,

        learning_rate = 2e-5,

        fp16 = not torch.cuda.is_bf16_supported(),

        bf16 = torch.cuda.is_bf16_supported(),

        logging_steps = 1,

        optim = "adamw_8bit",

        weight_decay = 0.1,

        lr_scheduler_type = "linear",

        seed = 3407,

        output_dir = "outputs",

    ),

)

trainer_stats = trainer.train()

Categories: DBA Blogs

How to Build RAG Pipeline with Mixtral 8x7B to Talk to Your Own Documents

Wed, 2023-12-13 18:26

This video shows the step-by-step process of locally building a RAG pipeline with Mixtral 8x7B to talk to local documents such as PDFs.

Commands Used:


!pip install farm-haystack[colab]

from getpass import getpass

HF_TOKEN = getpass("Hugging Face Token")

from haystack.nodes import PreProcessor,PromptModel, PromptTemplate, PromptNode

from google.colab import files



!pip install PyPDF2

import PyPDF2

from haystack import Document

pdf_file_path = "e10897.pdf"  # Replace with the path to your PDF file

def extract_text_from_pdf(pdf_path):

    text = ""

    with open(pdf_path, "rb") as pdf_file:

        pdf_reader = PyPDF2.PdfReader(pdf_file)

        for page_num in range(len(pdf_reader.pages)):

            page = pdf_reader.pages[page_num]

            text += page.extract_text()

    return text

pdf_text = extract_text_from_pdf(pdf_file_path)

# Create the Haystack document

doc = Document(

    content=pdf_text,

    meta={"pdf_path": pdf_file_path}

)

docs = [doc]

processor = PreProcessor()  # default cleaning and splitting settings


preprocessed_docs = processor.process(docs)

from haystack.document_stores import InMemoryDocumentStore

document_store = InMemoryDocumentStore(use_bm25=True)

document_store.write_documents(preprocessed_docs)

from haystack import Pipeline

from haystack.nodes import BM25Retriever

retriever = BM25Retriever(document_store, top_k=2)

qa_template = PromptTemplate(prompt=

  """ Using only the information contained in the context,

  answer only the question asked without adding suggestions of possible questions and answer exclusively in Italian.

  If the answer cannot be deduced from the context, reply: "I don't know because it is not relevant to the Context."

  Context: {join(documents)};

  Question: {query}

  """)

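prompt_node = PromptNode(

    model_name_or_path="mistralai/Mixtral-8x7B-Instruct-v0.1",  # assumed from the post title

    api_key=HF_TOKEN,

    default_prompt_template=qa_template,

    model_kwargs={"model_max_length": 5000}

)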

rag_pipeline = Pipeline()

rag_pipeline.add_node(component=retriever, name="retriever", inputs=["Query"])

rag_pipeline.add_node(component=prompt_node, name="prompt_node", inputs=["retriever"])

from pprint import pprint

print_answer = lambda out: pprint(out["results"][0].strip())

print_answer(rag_pipeline.run(query="What is Oracle DBA?"))

print_answer(rag_pipeline.run(query="Why Lion is king of jungle?"))

Categories: DBA Blogs

Mixtral 8X7B Local Installation - Step by Step

Mon, 2023-12-11 22:39
This is a simple tutorial to locally install Mixtral 8x7B.

pip3 install --upgrade transformers optimum
pip3 uninstall -y auto-gptq
git clone
cd AutoGPTQ
git checkout v0.5.1
pip3 install .
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, GPTQConfig
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        use_safetensors=True,  # assumed settings; adjust for your setup
        device="cuda:0")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True, trust_remote_code=False)

prompt = "Why Lion is King of Jungle?"
prompt_template=f'''<s>[INST] {prompt} [/INST]
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))
Categories: DBA Blogs

AWS PartyRock - Amazon Bedrock AI Playground

Thu, 2023-11-16 16:02

With PartyRock, you can easily build AI apps in seconds, for free, using the latest LLMs and without writing any code.

Categories: DBA Blogs

Beginner Tutorial to Fine-Tune an AI Model

Thu, 2023-10-26 01:29

 This video steps through an easy tutorial to fine-tune a model on custom dataset from scratch by using LlamaIndex and Gradient.

Dataset Used:

{"inputs": "<s>### Instruction:\nWho is Fahd Mirza?\n\n### Response:\nFahd Mirza is an AI Cloud Engineer based in Sydney Australia. He has also got a background in databases and devops plus infrastructure.</s>"}

{"inputs": "<s>### Instruction:\nWhat are hobbies of Fahd Mirza?\n\n### Response:\nFahd Mirza loves to spend time on his youtube channel and reading about technology.</s>"}

{"inputs": "<s>### Instruction:\nWhat is Fahd Mirza's favorite Color?\n\n### Response:\nFahd Mirza's favorite color varies from time to time. These days it's blue.</s>"}

{"inputs": "<s>### Instruction:\nWhat does Fahd Mirza look like?\n\n### Response:\nFahd Mirza looks like a human.</s>"}
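
Records in this shape are easy to get subtly wrong by hand, for example by dropping the colon after "Response". A small sketch of generating them instead, assuming the same instruction template as the records above. The helper name `make_record` is made up for illustration.

```python
import json

def make_record(instruction, response):
    """Wrap one question/answer pair in the instruction template used above."""
    text = f"<s>### Instruction:\n{instruction}\n\n### Response:\n{response}</s>"
    return json.dumps({"inputs": text})

pairs = [
    ("Who is Fahd Mirza?", "Fahd Mirza is an AI Cloud Engineer based in Sydney Australia."),
    ("What does Fahd Mirza look like?", "Fahd Mirza looks like a human."),
]
lines = [make_record(question, answer) for question, answer in pairs]
print("\n".join(lines))
```

Writing `lines` to a file, one record per line, gives a JSONL dataset in which every record follows exactly the same template.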

.env File:



Commands Used:

!pip install llama-index gradientai -q

!pip install python-dotenv 

import os

from dotenv import load_dotenv, find_dotenv

_= load_dotenv(find_dotenv())

questions = [

    "Who is Fahd Mirza?",

    "What is Fahd Mirza's favorite Color?",

    "What are hobbies of Fahd Mirza?",

]

prompts = list(

    f"<s>### Instruction:\n{q}\n\n### Response:\n" for q in questions

)

import os

from llama_index.llms import GradientBaseModelLLM

from llama_index.finetuning.gradient.base import GradientFinetuneEngine

base_model_slug = "nous-hermes2"

base_model_llm = GradientBaseModelLLM(

    base_model_slug=base_model_slug, max_tokens=100

)

base_model_responses = list(base_model_llm.complete(p).text for p in prompts)

finetune_engine = GradientFinetuneEngine(

    base_model_slug=base_model_slug,

    name="my test finetune engine model adapter",

    data_path="<path to the JSONL dataset above>",

)

epochs = 2

for i in range(epochs):

    finetune_engine.finetune()

fine_tuned_model = finetune_engine.get_finetuned_model(max_tokens=100)

fine_tuned_model_responses = list(

    fine_tuned_model.complete(p).text for p in prompts

)

for i, q in enumerate(questions):

    print(f"Question: {q}")

    print(f"Base: {base_model_responses[i]}")

    print(f"Fine tuned: {fine_tuned_model_responses[i]}")


Categories: DBA Blogs

Setting Environment Variable in Google Colab

Wed, 2023-10-25 23:29

This video shows how to set environment variables and load them in a Google Colab notebook, AWS SageMaker notebook, or Jupyter notebook.

Commands Used:

import os
from dotenv import load_dotenv, find_dotenv
_= load_dotenv(find_dotenv())
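
The call to `load_dotenv(find_dotenv())` reads `KEY=VALUE` pairs from a `.env` file into the process environment. The following is a rough, standard-library-only sketch of that idea. It is not the python-dotenv implementation, and `EXAMPLE_API_KEY` is a made-up variable name.

```python
import os

def load_env_text(text, environ=os.environ):
    """Copy KEY=VALUE lines into environ, skipping blanks and # comments."""
    loaded = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("'\"")
        # Do not override variables that are already set
        environ.setdefault(key, value)
        loaded[key] = environ[key]
    return loaded

sample = "# secrets for the notebook\nEXAMPLE_API_KEY='abc123'\n"
load_env_text(sample)
print(os.environ.get("EXAMPLE_API_KEY"))
```

After loading, the rest of the notebook reads values with `os.environ[...]` or `os.environ.get(...)`, so secrets stay out of the notebook cells.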

Categories: DBA Blogs

Step by Step Mistral 7B Installation Local on Linux Windows or in Cloud

Thu, 2023-10-19 22:25

This is a detailed tutorial on how to locally install the Mistral 7B model in AWS, Linux, Windows, or anywhere you like.

Commands Used:

pip3 install optimum

pip3 install git+

git clone

cd AutoGPTQ

git checkout v0.4.2

pip3 install .

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/SlimOpenOrca-Mistral-7B-GPTQ"

# To use a different branch, change revision

# For example: revision="gptq-4bit-32g-actorder_True"

model = AutoModelForCausalLM.from_pretrained(model_name_or_path,

                                             device_map="auto",

                                             trust_remote_code=False,

                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

system_message = "You are an expert at bathroom renovations."

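prompt = """

Renovate the following old bathroom:

I have a 25 year old house with an old bathroom. I want to renovate it completely. 

Think about it step by step, and give me steps to renovate the bathroom. Also give me cost of every step in Australian dollars.

"""

# ChatML-style template; assumed to be the format this checkpoint expects

prompt_template=f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''
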
print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()

output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)


# Inference can also be done using transformers' pipeline

print("*** Pipeline:")

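pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    top_k=40,
)

print(pipe(prompt_template)[0]['generated_text'])
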

Categories: DBA Blogs

Step by Step Fine-Tuning Mistral 7B with Custom Dataset

Sun, 2023-10-15 23:18

Large Language Models are trained on huge amounts of data. The Falcon 40B model, for example, has 40 billion parameters and was trained on 1 trillion tokens. This training took around 2 months on 384 GPUs on AWS.

If you want to use these LLMs on your own data, you need to adapt or fine-tune them. Fine-tuning a model larger than 10B parameters is an expensive and time-consuming task.

This is where HuggingFace's PEFT library comes in handy. PEFT stands for Parameter-Efficient Fine-Tuning. We can use a fine-tuning technique called QLoRA to train LLMs on our own dataset in far less time using far fewer resources. QLoRA stands for Quantized Low-Rank Adaptation and allows us to train a small portion of the model without losing much efficiency. After the training is completed, there is no need to save the entire model, as the base model remains frozen.
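
To get a feel for how small that trainable portion is, the arithmetic can be worked through for a single weight matrix. This is a sketch under stated assumptions: the layer width of 4096 and the rank of 16 are illustrative values, not settings taken from this tutorial. A low-rank adapter replaces the update to a `d_out x d_in` matrix with two thin matrices of shapes `rank x d_in` and `d_out x rank`.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Parameters in the two low-rank matrices A (rank x d_in) and B (d_out x rank)."""
    return rank * d_in + d_out * rank

d_in = d_out = 4096  # illustrative layer width
rank = 16            # illustrative LoRA rank

full = d_in * d_out
lora = lora_trainable_params(d_in, d_out, rank)
print(f"full matrix: {full:,} parameters")
print(f"LoRA adapter: {lora:,} parameters")
print(f"share trained: {lora / full:.2%}")
```

With these numbers the adapter holds under one percent of the parameters of the matrix it adapts, which is why only the adapter needs to be saved after training.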

Python Package Installation:


We begin by installing all the required dependencies. 

- The Huggingface Transformer Reinforcement Learning (TRL) library simplifies Reinforcement Learning from Human Feedback (RLHF) settings. 

- Transformers is a Python library that makes downloading and training state-of-the-art ML models easy.

- Accelerate is a library that enables the same PyTorch code to be run across any distributed configuration by adding just four lines of code

- Parameter-Efficient Fine-Tuning (PEFT) methods enable efficient adaptation of pre-trained language models (PLMs) to various downstream applications without fine-tuning all the model's parameters.

- Datasets is a library for easily accessing and sharing datasets for Audio, Computer Vision, and Natural Language Processing (NLP) tasks. 

- Bitsandbytes is a lightweight wrapper around CUDA custom functions, in particular 8-bit optimizers and quantization functions.

- einops stands for Einstein-Inspired Notation for operations. It is an open-source python framework for writing deep learning code in a new and better way.

- Tiktoken is an open-source tool developed by OpenAI that is utilized for tokenizing text. Tokenization is when you split a text string to a list of tokens. Tokens can be letters, words or grouping of words

- By using wandb, you can track, compare, explain and reproduce machine learning experiments.

- xFormers is a PyTorch based library which hosts flexible Transformers parts.

- SentencePiece is an unsupervised text tokenizer and detokenizer mainly for Neural Network-based text generation systems where the vocabulary size is predetermined prior to the neural model training.

!pip install -q trl transformers accelerate peft datasets bitsandbytes einops tiktoken wandb xformers sentencepiece

Prepare Dataset:


I will be using the Gath_baize dataset, comprising approximately 210k prompts, to train Mistral-7b. The dataset consists of a mixture of data from Alpaca, Stack Overflow, medical, and Quora datasets. In the load_dataset call we load the full train split, since we are going to use this dataset for training. If we were only testing, we would use split="test".

from datasets import load_dataset

gathbaize = load_dataset("gathnex/Gath_baize",split="train")



gathbaize_sampled = gathbaize.shuffle(seed=42).select(range(50))


Check for GPU:


The NVIDIA System Management Interface (nvidia-smi) is a command line utility, based on top of the NVIDIA Management Library (NVML), intended to aid in the management and monitoring of NVIDIA GPU devices.

!nvidia-smi


Create LLM Model:


- Torch is an open source ML library used for creating deep neural networks.

- AutoModelForCausalLM is used for auto-regressive models. Auto-regressive models predict the next value (here, the next token) based on the previous ones.

- A tokenizer is responsible for preprocessing text into an array of numbers as inputs to a model.

- The bitsandbytes library simplifies the process of model quantization, making it more accessible and user-friendly.

import torch

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

from peft import prepare_model_for_kbit_training  # prepares the quantized model for fine-tuning

model_name = "ybelkada/Mistral-7B-v0.1-bf16-sharded"

- BitsAndBytesConfig holds the quantization configuration used for QLoRA. QLoRA reduces the memory usage of LLM finetuning without performance tradeoffs compared to standard 16-bit model finetuning. QLoRA uses 4-bit quantization to compress a pretrained language model. The LM parameters are then frozen and a relatively small number of trainable parameters are added to the model in the form of Low-Rank Adapters. During finetuning, QLoRA backpropagates gradients through the frozen 4-bit quantized pretrained language model into the Low-Rank Adapters. The LoRA layers are the only parameters being updated during training.

- The basic way to load a model in 4bit is to pass the argument load_in_4bit=True

- There are different variants of 4bit quantization such as NF4 (normalized float 4 (default)) or pure FP4 quantization. NF4 is better for performance.

- You can change the compute dtype of the quantized model by just changing the bnb_4bit_compute_dtype argument. A dtype (data type) object describes how the bytes in the fixed-size block of memory corresponding to an array item should be interpreted.

- bnb_4bit_use_double_quant uses a second quantization after the first one to save an additional 0.4 bits per parameter. 
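
The effect of 4-bit loading and of the extra 0.4 bits per parameter is easiest to see as arithmetic. This is a back-of-the-envelope sketch: the parameter count of 7 billion is a round figure assumed for a 7B model, and the estimate covers the weights only, not activations, optimizer state, or quantization constants.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params = 7e9  # round figure assumed for a 7B model

print(f"16-bit weights: {weight_memory_gb(params, 16):.1f} GB")
print(f"4-bit weights:  {weight_memory_gb(params, 4):.1f} GB")
# Double quantization saves roughly another 0.4 bits per parameter
print(f"double quantization saving: {weight_memory_gb(params, 0.4):.2f} GB")
```

The drop from 16 to 4 bits is what brings a 7B model within reach of a single consumer GPU, while double quantization is a smaller, second-order saving.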

bnb_config = BitsAndBytesConfig(

    load_in_4bit= True,

    bnb_4bit_quant_type= "nf4",

    bnb_4bit_compute_dtype= torch.bfloat16,

    bnb_4bit_use_double_quant= False,

)

- trust_remote_code: whether or not to allow custom models defined on the Hub in their own modeling files.

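model = AutoModelForCausalLM.from_pretrained(

    model_name,

    quantization_config=bnb_config,

    device_map="auto",  # assumed; places layers on the available GPU(s)

    trust_remote_code=True,

)
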

- use_cache: when fine-tuning, you want to use the updated model parameters. Reusing old (cached) key/value states defeats the purpose of fine-tuning, so the cache is disabled for the model being trained.

- pretraining_tp: setting config.pretraining_tp to a value other than 1 activates the more accurate but slower computation of the linear layers.

- Gradient checkpointing trades extra compute for lower memory use. It is strictly needed only if training runs into out-of-memory (OOM) errors, but enabling it is a common practice.

model.config.use_cache = False

model.config.pretraining_tp = 1

model.gradient_checkpointing_enable()

model = prepare_model_for_kbit_training(model)

Create LLM Tokenizer:


- pad_token is a special token used to make arrays of tokens the same size for batching purposes.

- eos_token is a special token used as an end of sentence token

- bos_token is a special token representing the beginning of a sentence.
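
How padding works on a batch can be shown without a real tokenizer. This is a toy sketch: the token IDs and the `pad_batch` helper are made up for illustration, and the EOS ID doubles as the pad ID, mirroring `tokenizer.pad_token = tokenizer.eos_token`.

```python
def pad_batch(sequences, pad_id):
    """Right-pad token-id lists to a common length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = max_len - len(seq)
        input_ids.append(seq + [pad_id] * padding)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(seq) + [0] * padding)
    return input_ids, attention_mask

BOS, EOS = 1, 2  # hypothetical token ids
batch = [[BOS, 17, 42, EOS], [BOS, 99, EOS]]
ids, mask = pad_batch(batch, pad_id=EOS)
print(ids)
print(mask)
```

The attention mask is what lets the model tell a genuine end-of-sentence token apart from the padding copies that follow it.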

tokenizer = AutoTokenizer.from_pretrained(model_name,trust_remote_code=True)

tokenizer.pad_token = tokenizer.eos_token

tokenizer.add_eos_token = True

tokenizer.add_bos_token, tokenizer.add_eos_token

from peft import LoraConfig, TaskType

- LoraConfig allows you to control how LoRA is applied to the base model through the following parameters:

lora_alpha: LoRA scaling factor.

r: the rank of the update matrices, expressed in int. Lower rank results in smaller update matrices with fewer trainable parameters.

bias: Specifies if the bias parameters should be trained. Can be 'none', 'all' or 'lora_only'.

target_modules: The modules (for example, attention blocks) to apply the LoRA update matrices.

lora_dropout: the dropout probability applied in the LoRA layers during training, that is, the chance that each neuron's output is set to zero. It is used to prevent overfitting.
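
How `r` and `lora_alpha` interact is clearer with a tiny worked example. This is a sketch in plain Python lists, assuming the standard LoRA formulation in which the low-rank update is scaled by `lora_alpha / r`. The matrices and numbers are made up, and dropout is left out.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, lora_alpha):
    """Compute W x + (lora_alpha / r) * B (A x), where r is the number of rows in A."""
    r = len(A)
    scale = lora_alpha / r
    base = matvec(W, x)               # frozen path
    update = matvec(B, matvec(A, x))  # trainable low-rank path
    return [b + scale * u for b, u in zip(base, update)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2 x 2)
A = [[1.0, 1.0]]              # trainable, r x d_in with r = 1
B = [[0.5], [0.0]]            # trainable, d_out x r

print(lora_forward(W, A, B, [2.0, 3.0], lora_alpha=2))
```

Only `A` and `B` would receive gradient updates. `W` stays untouched, which is what allows the adapter to be saved and loaded separately from the base model.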

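peft_config = LoraConfig(

    # r, lora_alpha and lora_dropout below are illustrative values

    r=16,

    lora_alpha=16,

    lora_dropout=0.05,

    bias="none",

    task_type=TaskType.CAUSAL_LM,

    target_modules=["q_proj", "k_proj", "v_proj", "o_proj","gate_proj"]

)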

from peft import get_peft_model

model = get_peft_model(model,peft_config)

from transformers import TrainingArguments

- num_train_epochs(`float`, *optional*, defaults to 3.0): Total number of training epochs to perform

- per_device_train_batch_size is the batch size per GPU/TPU core/CPU for training. 

- Gradient accumulation is a technique that simulates a larger batch size by accumulating gradients from multiple small batches before performing a weight update. This technique can be helpful in scenarios where the available memory is limited, and the batch size that can fit in memory is small.

- learning_rate tells the optimizer how far to move the weights in the direction opposite to the gradient for a mini-batch.

- warmup_ratio is the ratio of total training steps used for a linear warmup from 0 to learning_rate.

- max_steps: if set to a positive number, the total number of training steps to perform. It overrides num_train_epochs.
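
Two of these settings reduce to simple arithmetic: the effective batch size under gradient accumulation, and the learning rate during a linear warmup. This is an illustrative sketch with hypothetical helper names. It shows a warmup followed by a constant rate and does not reproduce any particular scheduler in the transformers library.

```python
import math

def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Samples contributing to each optimizer step."""
    return per_device_batch * accumulation_steps * num_devices

def warmup_lr(step, total_steps, warmup_ratio, peak_lr):
    """Linear warmup from 0 to peak_lr, then hold the rate constant."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if warmup_steps and step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr

print(effective_batch_size(per_device_batch=8, accumulation_steps=2))
for step in range(5):
    print(step, warmup_lr(step, total_steps=10, warmup_ratio=0.3, peak_lr=2e-4))
```

Accumulating over 2 batches of 8 gives the same gradient statistics as one batch of 16 while only ever holding 8 samples in memory at once.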

training_arguments = TrainingArguments(

    output_dir= "./results",

    num_train_epochs= 1,

    per_device_train_batch_size= 8,

    gradient_accumulation_steps= 2,

    optim = "paged_adamw_8bit",

    save_steps= 5000,

    logging_steps= 30,

    learning_rate= 2e-4,

    weight_decay= 0.001,

    fp16= False,

    bf16= False,

    max_grad_norm= 0.3,

    max_steps= -1,

    warmup_ratio= 0.3,

    group_by_length= True,

    lr_scheduler_type= "constant"

)

from trl import SFTTrainer

- The SFTTrainer is a light wrapper around the transformers Trainer to easily fine-tune language models or adapters on a custom dataset.

- max_seq_length: maximum sequence length to use for the `ConstantLengthDataset` and for automatically creating the Dataset. Defaults to `512`.

- SFTTrainer supports example packing, where multiple short examples are packed in the same input sequence to increase training efficiency.
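
The idea behind example packing can be sketched in a few lines. This is a simplified illustration, not the trainer's actual implementation: the token IDs and the `pack_examples` helper are made up, examples are joined with an EOS separator, and any trailing remainder shorter than a full block is dropped.

```python
def pack_examples(tokenized_examples, max_seq_length, eos_id):
    """Concatenate examples separated by EOS, then cut into fixed-length blocks."""
    stream = []
    for tokens in tokenized_examples:
        stream.extend(tokens)
        stream.append(eos_id)  # marks the boundary between examples
    # Keep only full blocks; the trailing remainder is dropped
    full = len(stream) - len(stream) % max_seq_length
    return [stream[i:i + max_seq_length] for i in range(0, full, max_seq_length)]

examples = [[5, 6], [7, 8, 9], [10]]  # hypothetical token ids
print(pack_examples(examples, max_seq_length=4, eos_id=2))
```

Every packed block is completely filled with real tokens, so no compute is spent on padding. The trade-off is that a single example may be split across two blocks.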


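trainer = SFTTrainer(

    model=model,

    train_dataset=gathbaize_sampled,

    dataset_text_field="chat_sample",  # assumed name of the text column in Gath_baize

    tokenizer=tokenizer,

    args=training_arguments,

    packing= False,

)

trainer.train()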

Saving the Model:


trained_model_dir = './trained_model'

trainer.model.save_pretrained(trained_model_dir)


Load the Trained Model:


from peft import PeftConfig, PeftModel

config = PeftConfig.from_pretrained(trained_model_dir)

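trained_model = AutoModelForCausalLM.from_pretrained(

    config.base_model_name_or_path,

    quantization_config=bnb_config,

    device_map="auto",  # assumed; places layers on the available GPU(s)

    trust_remote_code=True,

)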

trained_model = PeftModel.from_pretrained(trained_model,trained_model_dir)

trained_model_tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path,trust_remote_code=True)

trained_model_tokenizer.pad_token = trained_model_tokenizer.eos_token

Create Generation Config for Prediction:


generation_config = trained_model.generation_config

generation_config.max_new_tokens = 1024

generation_config.temperature = 0.7

generation_config.top_p = 0.7

generation_config.num_return_sequences = 1

generation_config.pad_token_id = trained_model_tokenizer.pad_token_id

generation_config.eos_token_id = trained_model_tokenizer.eos_token_id


Model Inference:


device = 'cuda:0'

query = 'large text to be summarized'

user_prompt = 'Explain large language models'

system_prompt = 'The conversation between Human and AI assistant named MyMistral\n'

B_INST, E_INST = "[INST]", "[/INST]"

prompt = f"{system_prompt}{B_INST}{user_prompt.strip()}\n{E_INST}"

encodings = trained_model_tokenizer(prompt, return_tensors='pt').to(device)


with torch.inference_mode():

outputs = trained_model.generate(







outputs = trained_model_tokenizer.decode(outputs[0],skip_special_tokens=True)
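The decoded text normally contains the prompt followed by the model's reply. A small sketch of how the prompt could be built and the reply separated from it is shown below. The helper names `build_prompt` and `extract_answer` are hypothetical, and the sketch assumes the `[INST]` and `[/INST]` markers survive decoding as ordinary text.

```python
B_INST, E_INST = "[INST]", "[/INST]"


def build_prompt(system_prompt: str, user_prompt: str) -> str:
    """Wrap the user prompt in instruction markers, same layout as the f-string above."""
    return f"{system_prompt}{B_INST}{user_prompt.strip()}\n{E_INST}"


def extract_answer(decoded: str) -> str:
    """Return the text after the last [/INST] marker, or the whole text if none is found."""
    _, marker, tail = decoded.rpartition(E_INST)
    return tail.strip() if marker else decoded.strip()


prompt = build_prompt("The conversation between Human and AI assistant named MyMistral\n",
                      " Explain large language models ")
# Stand-in for a decoded model output: the prompt followed by a made-up reply
fake_decoded = prompt + " Large language models are neural networks trained on text."
print(extract_answer(fake_decoded))
```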


Categories: DBA Blogs

AlloyDB Omni with Vertex AI Installation Locally in AWS

Thu, 2023-10-12 19:13

This video is a step-by-step tutorial on installing AlloyDB Omni with Vertex AI support locally in AWS.

Commands Used:

sudo curl | sh   && sudo systemctl --now enable docker

sudo apt-get update

sudo groupadd docker

sudo usermod -aG docker ${USER}

sudo systemctl restart docker

stat -fc %T /sys/fs/cgroup/

sudo apt-get install apt-transport-https ca-certificates gnupg curl sudo

echo "deb [signed-by=/usr/share/keyrings/] cloud-sdk main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list

curl | sudo apt-key --keyring /usr/share/keyrings/ add -

sudo apt-get update && sudo apt-get install google-cloud-cli

gcloud init

cat /etc/*release

curl | sudo tee /usr/share/keyrings/

sudo apt-get update && sudo apt-get install google-cloud-cli

gcloud init

curl | sudo apt-key add -

sudo apt update

echo "deb alloydb-omni-apt main"   | sudo tee -a /etc/apt/sources.list.d/artifact-registry.list

sudo apt update

sudo apt-get install alloydb-cli

sudo alloydb system-check

df -hT

cd /

ls

sudo mkdir /alloydb

sudo chown ubuntu:ubuntu /alloydb

sudo chmod 777 /alloydb

sudo alloydb database-server install     --data-dir=/alloydb     --enable-alloydb-ai=true     --private-key-file-path=/home/ubuntu/key.json     --vertex-ai-region="us-central1"

sudo alloydb database-server start

docker exec -it pg-service psql -h localhost -U postgres

Categories: DBA Blogs

Tutorial Amazon Bedrock to Create Chatbot with Persona

Fri, 2023-10-06 17:30

This video tutorial provides the code and a step-by-step description, with a demo, of how to use AWS Bedrock to create a chatbot with a persona.


import boto3
import json
import os
import sys

from langchain.chains import ConversationChain
from langchain.llms.bedrock import Bedrock
from langchain.memory import ConversationBufferMemory

from langchain.prompts import PromptTemplate

# {history} is filled in from the conversation memory; {input} is the new user message
template = """The following is a friendly conversation between a human and an AI.
              The AI is talkative and provides lots of specific details from its context.
              If the AI does not know the answer to a question, it truthfully says it does not know.
Current conversation:
{history}
Human: {input}
Assistant:"""

claude_prompt = PromptTemplate(input_variables=["history", "input"], template=template)

bedrock = boto3.client(

memory = ConversationBufferMemory(ai_prefix="Assistant")
memory.chat_memory.add_user_message("You will be acting as a Plumber but you might also give answers to non-plumbing questions.")
memory.chat_memory.add_ai_message("I am a Plumber and give professional answers")

cl_llm = Bedrock(model_id="anthropic.claude-v2",client=bedrock)

conversation = ConversationChain(
     llm=cl_llm, verbose=True, memory=memory
)

conversation.prompt = claude_prompt

#print(conversation.predict(input="What are steps to renovate a bathroom?"))
#print(conversation.predict(input="How do you fix a leaking tap?"))
print(conversation.predict(input="how to write a python program to reverse a list?"))
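To show how the persona ends up in front of the model, here is a minimal sketch in plain Python of the seeded messages being rendered into the prompt. It is a simplified stand-in for what a buffer memory does, not LangChain code; `render_history`, `TEMPLATE` and the shortened persona text are assumptions for illustration.

```python
from typing import List, Tuple


def render_history(messages: List[Tuple[str, str]],
                   human_prefix: str = "Human",
                   ai_prefix: str = "Assistant") -> str:
    """Turn (role, text) pairs into 'Prefix: text' lines, one per message."""
    lines = []
    for role, text in messages:
        prefix = human_prefix if role == "human" else ai_prefix
        lines.append(f"{prefix}: {text}")
    return "\n".join(lines)


# Hypothetical, shortened template with the same two placeholders as the prompt above
TEMPLATE = "Current conversation:\n{history}\nHuman: {input}\nAssistant:"

# The persona is just the first exchange stored in memory
persona = [
    ("human", "You will be acting as a Plumber."),
    ("ai", "I am a Plumber and give professional answers"),
]

prompt_text = TEMPLATE.format(history=render_history(persona),
                              input="How do you fix a leaking tap?")
print(prompt_text)
```

Because the persona exchange is part of the history on every call, each new question is answered with that role already established.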

Categories: DBA Blogs

Clone Any Voice with AI - Locally Install XTTS Model

Sat, 2023-09-16 21:52

This video is a step-by-step tutorial on how to install and run the Coqui XTTS model locally. XTTS is a voice generation model that lets you clone voices into different languages using just a quick 3-second audio clip.

Commands Used:

!pip install transformers
!pip install tts
from TTS.api import TTS
tts = TTS("tts_models/multilingual/multi-dataset/xtts_v1", gpu=True)
tts.tts_to_file(text="This is my new cloned voice in AI. If you like, don't forget to subscribe to this channel.", file_path="output.wav", speaker_wav="speaker.wav", language="en")

Categories: DBA Blogs

How to Install Llama 2 on Google Cloud Platform - Step by Step Tutorial

Thu, 2023-09-14 21:42

This video gives step-by-step instructions on how to deploy and run Llama 2 and Code Llama models on GCP through the Vertex AI API easily and quickly.

Categories: DBA Blogs

Step by Step Demo of Vertex AI in GCP

Wed, 2023-09-13 20:52

This tutorial gets you started with the GCP Vertex AI Generative AI service in a step-by-step demo.

Commands Used:

gcloud services enable

gcloud iam service-accounts create <Your Service Account Name>

gcloud projects add-iam-policy-binding <Your Project ID> \

    --member=serviceAccount:<Your Service Account Name>@<Your Project ID> \


from google.auth.transport.requests import Request

from google.oauth2.service_account import Credentials

key_path='<Your Project ID>.json'

credentials = Credentials.from_service_account_file(



if credentials.expired:


PROJECT_ID = '<Your Project ID>'

REGION = 'us-central1'

!pip install -U google-cloud-aiplatform "shapely<2"

import vertexai

# initialize vertex

vertexai.init(project = PROJECT_ID, location = REGION, credentials = credentials)

from vertexai.language_models import TextGenerationModel

generation_model = TextGenerationModel.from_pretrained("text-bison@001")

prompt = "I want to self manage a bathroom renovation project in my home. \
Please suggest me step by step plan to carry out this project."


Categories: DBA Blogs