NVIDIA Garak is an open-source command-line vulnerability scanner for Large Language Models (LLMs). It runs against a wide range of models and environments, testing for issues such as prompt injection, training-data leakage, hallucination, and more.
Garak streamlines LLM red-teaming by pairing probes (adversarial inputs) with detectors (checks on the model's responses), giving detailed insight into model behavior. It supports models from Hugging Face, OpenAI, Replicate, Cohere, NVIDIA, and many others.
Command Reference:
# Install Garak
python -m pip install -U garak
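Garak's CLI prints standard help output, which doubles as a quick sanity check that the install succeeded and as a browsable index of every flag:
# Confirm the install and show all CLI options
python3 -m garak --help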
# Example: probe Hugging Face GPT-2 with the DAN 11.0 jailbreak
python3 -m garak --model_type huggingface --model_name gpt2 --probes dan.Dan_11_0
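Probe names and module layouts can change between garak releases, so it is worth enumerating what your installed version actually ships before choosing probes:
# List all probes available in the installed version
python3 -m garak --list_probes
# List the detectors that score model responses
python3 -m garak --list_detectors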
# Example: check an OpenAI model for training-data leakage via replay probes
# (requires an OpenAI API key; see the export below)
python3 -m garak --model_type openai --model_name gpt-3.5-turbo --probes leakreplay
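The OpenAI generator reads its credentials from the environment, so set the key before the run (a placeholder value is shown here):
# Provide OpenAI credentials via the environment
export OPENAI_API_KEY="sk-..."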
# Example: run snowballed-hallucination probes against a Replicate-hosted model
# (requires a Replicate API token; see the export below)
python3 -m garak --model_type replicate --model_name stability-ai/stablelm-tuned-alpha-7b --probes snowball
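Similarly, the Replicate generator expects an API token in the environment; export it before invoking the probe (placeholder token shown). Each run also writes a JSONL report, and a run can be tagged with a recognizable filename prefix, sketched here with an assumed prefix of stablelm_halluc:
# Provide Replicate credentials via the environment
export REPLICATE_API_TOKEN="r8_..."
# Optionally tag the run's JSONL report with a custom prefix
python3 -m garak --model_type replicate --model_name stability-ai/stablelm-tuned-alpha-7b --probes snowball --report_prefix stablelm_halluc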