ReadMe with correct example usage.

#4
Files changed (1)
  1. context_relevancy_lora/README.md +115 -34
context_relevancy_lora/README.md CHANGED
@@ -6,7 +6,6 @@ base_model: ibm-granite/granite-3.3-8b-instruct
  library_name: peft
  library_name: transformers
  ---
-
  # LoRA Adapter for Context Relevancy
  Welcome to Granite Experiments!

@@ -20,7 +19,6 @@ Just a heads-up: Experiments are forever evolving, so we can't commit to ongoing
  This is a LoRA adapter for [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct) that is fine-tuned for the context relevancy task:

  Given (1) a document and (2) a multi-turn conversation between a user and an AI assistant, identify whether the document is relevant (including partially relevant) and useful for answering the last user question.
-
  While this adapter is general purpose, it is especially effective in RAG settings right after the retrieval step, where the adapter can be used to identify documents or passages that may mislead or harm the downstream generator model's response generation.

  - **Developer:** IBM Research
@@ -37,18 +35,27 @@ The classification output from the context relevancy model can be used in severa
  - Signal to human annotators working in RAG settings which documents are irrelevant/relevant to the current turn of the conversation they are reviewing. Identifying such documents helps reduce the [human annotator's high cognitive load](https://dl.acm.org/doi/10.1145/3706599.3719962) involved in manually reading and reviewing several documents, especially in long multi-turn conversations.


- **Model input**: The input to the model is a list of conversational turns and a list of documents, where each document is a dict containing the fields `title` and `text`. The turns in the conversation can alternate between the `user` and `assistant` roles, and the last turn is assumed to be from the `user`. For every document in the list of documents, the model converts that document and the conversation into a string using the `apply_chat_template` function.

- To prompt the LoRA adapter to determine context relevancy, a special context relevancy role is used to trigger this capability of the model. The role includes the keyword "context_relevance": `<|start_of_role|>context_relevance<|end_of_role|>`

- ~~~
  <|start_of_role|>context_relevance: Analyze the provided document in relation to the final user query from the conversation. Determine if the document contains information that could help answer the final user query. Output 'relevant' if the document contains substantial information directly useful for answering the final user query. Output 'partially relevant' if the document contains some related information that could partially help answer the query, or if you are uncertain about the relevance - err on the side of 'partially relevant' when in doubt. Output 'irrelevant' only if the document clearly contains no information that could help answer the final user query. When uncertain, choose 'partially relevant' rather than 'irrelevant'. Your output should be a JSON structure with the context relevance classification:
  ```json
  {
  "context_relevance": "YOUR_CONTEXT_RELEVANCE_CLASSIFICATION_HERE"
  }
  ```<|end_of_role|>
- ~~~

  **Model output**: When prompted with the above input, the model generates a json structure containing the context relevance output (irrelevant, partially relevant, relevant), e.g.

@@ -63,58 +70,132 @@ To prompt the LoRA adapter to determine context relevancy, a special context rel
  Use the code below to get started with the model. Before running the script, set the `LORA_NAME` parameter to the path of the directory where you downloaded the LoRA adapter. The download process is explained [here](https://huggingface.co/ibm-granite/granite-3.3-8b-rag-agent-lib#quickstart-example).
 
  ```python
- import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM
- from peft import PeftModel
- from peft import PeftModelForCausalLM as lora_model

- device=torch.device('cuda' if torch.cuda.is_available() else 'cpu')

- CONTEXT_RELEVANCY_PROMPT = "<|start_of_role|>context_relevance<|end_of_role|>"
  BASE_NAME = "ibm-granite/granite-3.3-8b-instruct"
  LORA_NAME = "PATH_TO_DOWNLOADED_DIRECTORY"

- tokenizer = AutoTokenizer.from_pretrained(BASE_NAME, padding_side='left',trust_remote_code=True)
- model_base = AutoModelForCausalLM.from_pretrained(BASE_NAME,device_map="auto")
  model_context_relevancy = PeftModel.from_pretrained(model_base, LORA_NAME)

  convo = [
  {
- "role": "user",
- "content": "Am I better off going to work for a FAANG?"
  },
  {
- "role": "assistant",
- "content": "I can't tell you much about working for a FAANG (Facebook, Amazon, Apple, Netflix, Google) company, but large companies offer resources such as the opportunity to learn from experienced people, and teams dedicated to support you. There isn't a single \"\"right\"\" place to work. FAANG companies tend to offer 6-figure salaries."
  },
  {
- "role": "user",
- "content": "which FAANG pays the most?"
  }
- ]
-

  documents = [
  {
  "title": "",
- "text": "\nEh, you hear this argument all the time, but it doesn't actually work out that way in the real world, though, because corporate pay structure is extremely malleable over time. If a company makes $100M one year, it will pay the investors what they're expecting, the low-level employees what they're willing to tolerate (which is often the minimum wage), and then the upper management whatever is left (i.e. whatever the company can afford to attract the best management). By bumping up the minimum wage, the main effect is that it forces companies to change their pay structures (which are currently ridiculous - the U.S. CEO-to-avg-worker pay is around 200:1; Japan and Germany are around 15:1, IIRC). Feel free to dig deeper into the numbers and the studies if you want further evidence, but even a cursory glance at our history (or the current situation in Australia) shows that the effects of a high minimum wage on both inflation and unemployment are largely overstated."
  },
  {
  "title": "",
- "text": "\nThe highest paid finance role is a hedge fund manager at a top fund - but that's like winning the lotto so here's the most pragmatic way to make a lot of money: * First 2-3 years out of college: Investment Banking Analyst * Next 2-3 years: Switch to the buyside (Private Equity) You'll easily top $400k by the time you're 26-27. If you're promoted to VP you are golden. Most get forced out after their associate stint and go to a top MBA program, after which you'd go back into PE or do the CFO route. Not sure w/o a degree, to be honest."
  }
  ]

- for document in documents:
- string = tokenizer.apply_chat_template(convo, documents=[document], tokenize=False,add_generation_prompt=False)
- inputs = string + CONTEXT_RELEVANCY_PROMPT
-
- inputT = tokenizer(inputs, return_tensors="pt")
-
- output = model_context_relevancy.generate(inputT["input_ids"].to(device), attention_mask=inputT["attention_mask"].to(device), max_new_tokens=3)
- output_text = tokenizer.decode(output[0])
- answer = output_text.split(CONTEXT_RELEVANCY_PROMPT)[1]
- print(answer)
  ```

  ## Training Details
 
context_relevancy_lora/README.md (after changes)

  library_name: peft
  library_name: transformers
  ---
  # LoRA Adapter for Context Relevancy
  Welcome to Granite Experiments!

  This is a LoRA adapter for [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct) that is fine-tuned for the context relevancy task:

  Given (1) a document and (2) a multi-turn conversation between a user and an AI assistant, identify whether the document is relevant (including partially relevant) and useful for answering the last user question.
  While this adapter is general purpose, it is especially effective in RAG settings right after the retrieval step, where the adapter can be used to identify documents or passages that may mislead or harm the downstream generator model's response generation.

  - **Developer:** IBM Research
 
  - Signal to human annotators working in RAG settings which documents are irrelevant/relevant to the current turn of the conversation they are reviewing. Identifying such documents helps reduce the [human annotator's high cognitive load](https://dl.acm.org/doi/10.1145/3706599.3719962) involved in manually reading and reviewing several documents, especially in long multi-turn conversations.


+ **Model input**: The input to the model consists of:
+ 1. A conversation formatted using the chat template
+ 2. The final user query extracted from the conversation
+ 3. A document to evaluate for relevance
+ 4. A special context relevancy invocation prompt

+ The model uses a specific format with separate roles for each component:
+ - Conversation: Applied via `tokenizer.apply_chat_template()`
+ - Final user query: `<|start_of_role|>final_user_query<|end_of_role|>{query}<|end_of_text|>`
+ - Document: `<|start_of_role|>document {"document_id": "1"}<|end_of_role|>{document_content}<|end_of_text|>`
+ - Context relevance prompt: See below

+ **Context Relevance Invocation Prompt**:
+ ```
  <|start_of_role|>context_relevance: Analyze the provided document in relation to the final user query from the conversation. Determine if the document contains information that could help answer the final user query. Output 'relevant' if the document contains substantial information directly useful for answering the final user query. Output 'partially relevant' if the document contains some related information that could partially help answer the query, or if you are uncertain about the relevance - err on the side of 'partially relevant' when in doubt. Output 'irrelevant' only if the document clearly contains no information that could help answer the final user query. When uncertain, choose 'partially relevant' rather than 'irrelevant'. Your output should be a JSON structure with the context relevance classification:
  ```json
  {
  "context_relevance": "YOUR_CONTEXT_RELEVANCE_CLASSIFICATION_HERE"
  }
  ```<|end_of_role|>
+ ```
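Concretely, the final model input is these pieces concatenated in order: the rendered conversation, the `final_user_query` role, the `document` role, and the invocation prompt. The snippet below is a minimal sketch of that concatenation using placeholder strings; `conversation_string` and `document_content` are illustrative stand-ins, and the quickstart example further down shows how to produce them with the tokenizer.

```python
# Minimal sketch of the assembled input (placeholder values; see the quickstart below).
CR_INVOCATION_PROMPT = "<|start_of_role|>context_relevance: ...<|end_of_role|>"  # full prompt text shown above

conversation_string = "<|start_of_role|>user<|end_of_role|>which FAANG pays the most?<|end_of_text|>\n"  # illustrative
final_user_query = "which FAANG pays the most?"
document_content = "Example document text."  # illustrative

final_query_role = f"<|start_of_role|>final_user_query<|end_of_role|>{final_user_query}<|end_of_text|>\n"
document_role = f"<|start_of_role|>document {{\"document_id\": \"1\"}}<|end_of_role|>\n{document_content}<|end_of_text|>\n"

input_text = conversation_string + final_query_role + document_role + CR_INVOCATION_PROMPT
print(input_text)
```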

  **Model output**: When prompted with the above input, the model generates a json structure containing the context relevance output (irrelevant, partially relevant, relevant), e.g.

  Use the code below to get started with the model. Before running the script, set the `LORA_NAME` parameter to the path of the directory where you downloaded the LoRA adapter. The download process is explained [here](https://huggingface.co/ibm-granite/granite-3.3-8b-rag-agent-lib#quickstart-example).
 
  ```python
+ import torch
+ import json
+ import re
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from peft import PeftModel
+
+ device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

+ # Define the context relevance prompt
+ CR_INSTRUCTION_TEXT = "Analyze the provided document in relation to the final user query from the conversation. Determine if the document contains information that could help answer the final user query. Output 'relevant' if the document contains substantial information directly useful for answering the final user query. Output 'partially relevant' if the document contains some related information that could partially help answer the query, or if you are uncertain about the relevance - err on the side of 'partially relevant' when in doubt. Output 'irrelevant' only if the document clearly contains no information that could help answer the final user query. When uncertain, choose 'partially relevant' rather than 'irrelevant'."
+ cr_json_object = {
+     "context_relevance": "YOUR_CONTEXT_RELEVANCE_CLASSIFICATION_HERE"
+ }
+ cr_json_str = json.dumps(cr_json_object, indent=4)
+ CR_JSON = "Your output should be a JSON structure with the context relevance classification:\n" + "```json\n" + cr_json_str + "\n```"
+ CR_INVOCATION_PROMPT = "<|start_of_role|>context_relevance: " + CR_INSTRUCTION_TEXT + " " + CR_JSON + "<|end_of_role|>"

  BASE_NAME = "ibm-granite/granite-3.3-8b-instruct"
  LORA_NAME = "PATH_TO_DOWNLOADED_DIRECTORY"

+ # Load tokenizer and models
+ tokenizer = AutoTokenizer.from_pretrained(BASE_NAME, padding_side='left', trust_remote_code=True)
+ model_base = AutoModelForCausalLM.from_pretrained(BASE_NAME, device_map="auto")
  model_context_relevancy = PeftModel.from_pretrained(model_base, LORA_NAME)

+ # Example conversation and documents
  convo = [
  {
+     "role": "user",
+     "content": "Am I better off going to work for a FAANG?"
  },
  {
+     "role": "assistant",
+     "content": "I can't tell you much about working for a FAANG (Facebook, Amazon, Apple, Netflix, Google) company, but large companies offer resources such as the opportunity to learn from experienced people, and teams dedicated to support you. There isn't a single \"\"right\"\" place to work. FAANG companies tend to offer 6-figure salaries."
  },
  {
+     "role": "user",
+     "content": "which FAANG pays the most?"
  }
+ ]

  documents = [
  {
  "title": "",
+     "text": "\nEh, you hear this argument all the time, but it doesn't actually work out that way in the real world, though, because corporate pay structure is extremely malleable over time. If a company makes $100M one year, it will pay the investors what they're expecting, the low-level employees what they're willing to tolerate (which is often the minimum wage), and then the upper management whatever is left (i.e. whatever the company can afford to attract the best management). By bumping up the minimum wage, the main effect is that it forces companies to change their pay structures (which are currently ridiculous - the U.S. CEO-to-avg-worker pay is around 200:1; Japan and Germany are around 15:1, IIRC). Feel free to dig deeper into the numbers and the studies if you want further evidence, but even a cursory glance at our history (or the current situation in Australia) shows that the effects of a high minimum wage on both inflation and unemployment are largely overstated."
  },
  {
  "title": "",
+     "text": "\nThe highest paid finance role is a hedge fund manager at a top fund - but that's like winning the lotto so here's the most pragmatic way to make a lot of money: * First 2-3 years out of college: Investment Banking Analyst * Next 2-3 years: Switch to the buyside (Private Equity) You'll easily top $400k by the time you're 26-27. If you're promoted to VP you are golden. Most get forced out after their associate stint and go to a top MBA program, after which you'd go back into PE or do the CFO route. Not sure w/o a degree, to be honest."
  }
  ]

+ def extract_and_format_json(raw_text):
+     """Extract JSON content from raw_text"""
+     match = re.search(r"```json\s*(.*?)\s*```", raw_text, re.DOTALL)
+     if not match:
+         raise ValueError("No valid JSON fenced by ```json ...``` was found.")
+
+     json_string = match.group(1)
+     # Remove invalid escape sequences
+     cleaned_json_string = re.sub(r'\\(?![\"\\/bfnrt]|u[0-9a-fA-F]{4})', '', json_string)
+
+     try:
+         parsed = json.loads(cleaned_json_string)
+     except json.JSONDecodeError as e:
+         raise ValueError(f"Invalid JSON format after cleaning: {e}")
+
+     return parsed
+
+ # Process each document
+ for i, document in enumerate(documents):
+     # Extract document content
+     document_content = f"{document['title']}\n\n{document['text']}".strip() if document['title'] else document['text']
+
+     # Extract final user query
+     final_user_query = None
+     for msg in reversed(convo):
+         if msg["role"] == "user":
+             final_user_query = msg["content"]
+             break
+
+     # Render the conversation without a system prompt: add an empty system message,
+     # apply the chat template, then strip the rendered system portion
+     conversation_with_system = [{"role": "system", "content": ""}] + convo
+     conversation_string = tokenizer.apply_chat_template(conversation_with_system, tokenize=False, add_generation_prompt=False)
+     # Remove the system prompt part
+     string_to_remove = tokenizer.apply_chat_template([conversation_with_system[0]], tokenize=False, add_generation_prompt=False)
+     conversation_string = conversation_string[len(string_to_remove):]
+
+     # Build the input format
+     final_query_role = f"<|start_of_role|>final_user_query<|end_of_role|>{final_user_query}<|end_of_text|>\n"
+     document_role = f"<|start_of_role|>document {{\"document_id\": \"1\"}}<|end_of_role|>\n{document_content}<|end_of_text|>\n"
+
+     # Construct the full input
+     input_text = conversation_string + final_query_role + document_role + CR_INVOCATION_PROMPT
+
+     # Tokenize and generate
+     inputs = tokenizer(input_text, return_tensors="pt")
+     model_device = next(model_context_relevancy.parameters()).device
+
+     output = model_context_relevancy.generate(
+         inputs["input_ids"].to(model_device),
+         attention_mask=inputs["attention_mask"].to(model_device),
+         max_new_tokens=50,
+         pad_token_id=tokenizer.eos_token_id,
+         do_sample=False  # Deterministic greedy decoding
+     )
+
+     # Decode and extract the generated part
+     raw_output_text = tokenizer.decode(output[0])
+     generated_part = raw_output_text.rsplit("<|end_of_role|>", 1)[-1]
+
+     # Extract the classification from JSON
+     try:
+         parsed_json = extract_and_format_json(generated_part)
+         classification = parsed_json["context_relevance"]
+         print(f"Document {i+1}: {classification}")
+     except ValueError as e:
+         print(f"Document {i+1}: Error parsing output - {e}")
+         # Fallback to text-based extraction
+         output_lower = generated_part.lower()
+         if "irrelevant" in output_lower and "partially" not in output_lower:
+             print(f"Document {i+1}: irrelevant (fallback)")
+         elif "partially relevant" in output_lower:
+             print(f"Document {i+1}: partially relevant (fallback)")
+         elif "relevant" in output_lower:
+             print(f"Document {i+1}: relevant (fallback)")
  ```
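Since the adapter is intended to flag documents that could mislead the downstream generator, one natural use of the printed labels in a RAG pipeline is to drop clearly irrelevant passages before generation. The helper below is a hypothetical sketch, not part of the adapter or any library API; it assumes you have collected `(document, classification)` pairs from the loop above.

```python
# Hypothetical helper (illustration only): keep documents whose classification
# suggests they can contribute to answering the final user query.
def filter_for_generation(scored_documents):
    """scored_documents: list of (document, classification) pairs."""
    keep = {"relevant", "partially relevant"}
    return [doc for doc, label in scored_documents if label in keep]

# Example with made-up classifications:
scored = [
    ({"title": "", "text": "minimum wage discussion"}, "irrelevant"),
    ({"title": "", "text": "finance compensation discussion"}, "relevant"),
]
print(len(filter_for_generation(scored)))  # -> 1
```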
  ## Training Details