
 
 
By using this file, you are agreeing to this product's EULA

This product can be obtained at https://anonymous.4open.science/r/SAFE-ICLR

Copyright ©2024-2025 XXXX-1



# SAFE Framework: Sentence-Level In-Generation Attribution System

This system comprises two steps: pre-attribution (optional, but recommended) and attribution.

# Pre-attribution

The pre-attribution step classifies each sentence as requiring zero, one, or multiple references.

This lets us select an attribution method suited to the expected number of references, which increases attribution accuracy and reduces the computational overhead of the system.

The classifier is trained on the HAGRID-Clean dataset, which contains over 7,000 sentences labeled as requiring zero, one, or multiple references.

To train the classifier, we extract 24 features from each sample (described at the end of this file).
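
This README does not say which model the classifier is, so the following is only a sketch of the train-then-predict flow under an assumption: a nearest-centroid rule stands in for the real classifier. The helper names `fit_centroids` and `predict` are hypothetical, and the two toy features and their values are made up for illustration (they are not HAGRID-Clean data).

```python
import math
from collections import defaultdict


def fit_centroids(features, labels):
    """Average the feature vectors of each class ("zero", "one", "multiple")."""
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(features, labels):
        total = sums.setdefault(label, [0.0] * len(vec))
        for k, value in enumerate(vec):
            total[k] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vec))


# Toy data: two made-up features per sentence (e.g. word count, entity density).
X = [[5, 0.0], [6, 0.1], [20, 0.3], [22, 0.35], [45, 0.5], [50, 0.6]]
y = ["zero", "zero", "one", "one", "multiple", "multiple"]
centroids = fit_centroids(X, y)
print(predict(centroids, [21, 0.3]))  # → one
```

With all 24 features the raw values live on very different scales (word counts versus ratios), so a distance-based rule like this one would normally need each feature standardized first.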


# Attribution

By default, a sentence is assigned to an attribution method that can match it to multiple quotes. Since this is a computationally expensive approach, we recommend using pre-attribution to assign a sentence to one of the following categories:

Zero references: The attribution step is skipped.

One reference: The default attribution method matches the sentence to the closest quote in the source document, measured by cosine distance in embedding space. With N candidate sentences in the search space, finding the closest one takes O(N) comparisons.
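
To make the single-reference search concrete, here is a minimal sketch of a linear scan that returns the quote with the smallest cosine distance. It assumes sentence and quote embeddings are already computed (in SAFE they come from all-MiniLM-L6-v2); the toy 3-dimensional vectors and the helper names `cosine_distance` and `closest_quote` are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def closest_quote(sentence_vec, quote_vecs):
    """Linear O(N) scan: index and distance of the nearest quote."""
    best_idx, best_dist = -1, math.inf
    for idx, quote_vec in enumerate(quote_vecs):
        dist = cosine_distance(sentence_vec, quote_vec)
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx, best_dist


# Toy 3-d vectors; real all-MiniLM-L6-v2 embeddings have 384 dimensions.
quotes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
idx, dist = closest_quote([0.1, 0.9, 0.0], quotes)
print(idx)  # → 1
```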

Multiple references: The default attribution method matches the sentence to the closest quote or quote pair in the source document, again using cosine distance in embedding space. The distance to a quote pair is the distance between the sentence and the average of the two quotes' embeddings. With N candidate sentences, there are N single quotes and N(N-1)/2 pairs to compare, so the search takes O(N^2) comparisons.
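
A sketch of the multiple-reference search and of how a pre-attribution label could route a sentence, again assuming precomputed embeddings. Each quote pair is scored through the average of its two embeddings, and the search enumerates all N single quotes plus all N(N-1)/2 unordered pairs. The dispatcher `attribute` and its labels `"zero"`, `"one"` and `"multiple"` are hypothetical; any other label, including no label at all, falls back to the pair search, mirroring the default behaviour described above.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def mean_vec(a, b):
    """Element-wise average of two embeddings (the quote-pair representation)."""
    return [(x + y) / 2 for x, y in zip(a, b)]


def best_match(sentence_vec, quote_vecs, allow_pairs):
    """Return (quote indices, distance) of the closest single quote or pair.

    Candidates: N singles, plus N(N-1)/2 unordered pairs when allow_pairs is set.
    """
    candidates = [((i,), v) for i, v in enumerate(quote_vecs)]
    if allow_pairs:
        candidates += [
            ((i, j), mean_vec(quote_vecs[i], quote_vecs[j]))
            for i, j in combinations(range(len(quote_vecs)), 2)
        ]
    scored = [(idx, cosine_distance(sentence_vec, v)) for idx, v in candidates]
    return min(scored, key=lambda item: item[1])


def attribute(sentence_vec, quote_vecs, category=None):
    """Hypothetical dispatcher keyed on the pre-attribution label."""
    if category == "zero":
        return None  # attribution step skipped
    return best_match(sentence_vec, quote_vecs, allow_pairs=(category != "one"))


quotes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
sentence = [1.0, 0.9, 0.05]
for category in ("zero", "one", "multiple"):
    result = attribute(sentence, quotes, category)
    print(category, None if result is None else result[0])
# zero None
# one (0,)
# multiple (0, 1)
```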

# How to use

To use this system without the pre-attribution step, the user must:
- Add two files to the `files` folder: the generated text and the source document (plain text or PDF only);
- Run the script: `python main_attribution_example.py [source_file] [generated_text_file]`
  
To use this system with the pre-attribution step, the user must:
- Download the HAGRID-Clean dataset (link provided when the paper is published) and place it in the dataset directory;
- Add two files to the `files` folder: the generated text and the source document (plain text or PDF only);
- Run the script: `python main_preatt_attribution_example.py [source_file] [generated_text_file]`



Note: Sentences can currently be extracted from files with two methods, `extractSentencesFromPDF` and `extractSentencesFromTXT`. Support for other file types may be added in the future.
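
The signatures of `extractSentencesFromPDF` and `extractSentencesFromTXT` are not documented here, so the following is a hypothetical stand-in rather than their implementation. It uses a naive regular-expression splitter to stay dependency-free; since nltk is already a dependency, `nltk.tokenize.sent_tokenize` would be a more robust choice, because the regex also splits after abbreviations such as "e.g.". PDF text is read with PyPDF2's `PdfReader`, which is imported only when the PDF function is called.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    text = re.sub(r"\s+", " ", text).strip()  # collapse line breaks and runs of spaces
    if not text:
        return []
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def extract_sentences_from_txt(path):
    """Hypothetical TXT extractor: read the whole file, then split it."""
    with open(path, encoding="utf-8") as f:
        return split_sentences(f.read())


def extract_sentences_from_pdf(path):
    """Hypothetical PDF extractor built on PyPDF2."""
    from PyPDF2 import PdfReader  # lazy import: only needed for PDF input

    reader = PdfReader(path)
    text = " ".join(page.extract_text() or "" for page in reader.pages)
    return split_sentences(text)


print(split_sentences("Do not exceed 4 g.\nTake with water! Is it safe?"))
# ['Do not exceed 4 g.', 'Take with water!', 'Is it safe?']
```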


# Real-world Application Examples

Note: The LLMs' answers are stored in the "files" directory.


- Example #1: Summarizing a scientific paper: 

We provide a paper (https://arxiv.org/abs/2310.05634) and the query "Make a summary of the paper attached to this message using simple sentences." to ChatGPT, Perplexity, and DeepSeek, and collect the answers.
Each answer is then given as input to our system together with the paper in PDF format, and the system attempts to attribute each sentence of the answer to a quote from the paper.

We observed that the pre-attribution model rarely classifies sentences as not requiring attribution; instead, it usually recommends searching for a single quote. This may be because the training data consists mostly of single-reference sentences.

In the attribution step, the system gives satisfactory results on the text generated by Perplexity and DeepSeek, correctly attributing more than half of the generated text. Although we detected issues when attributing ChatGPT's answer, we later dismissed them: after manually searching the paper for attributable sentences, we found that part of the answer was hallucinated. Because the system assigns each sentence to the most similar quote, hallucinated content is often attributed to seemingly random sections of the text, leading to unexpected results. In future work, we plan to explore attribution algorithms that discard quotes unrelated to the input.
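
One possible way to discard unrelated quotes is to reject the best match when its cosine distance exceeds a threshold. This is sketched purely as an illustration of the future-work idea, not as part of SAFE: `attribute_or_reject` and `max_distance` are hypothetical, the toy 2-dimensional vectors are made up, and a usable threshold would have to be tuned on real data.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def attribute_or_reject(sentence_vec, quote_vecs, max_distance):
    """Nearest quote index, or None if even the best match exceeds max_distance.

    max_distance is a hypothetical threshold, not a SAFE parameter.
    """
    distances = [cosine_distance(sentence_vec, q) for q in quote_vecs]
    best = min(range(len(distances)), key=distances.__getitem__)
    return None if distances[best] > max_distance else best


quotes = [[1.0, 0.0], [0.0, 1.0]]
print(attribute_or_reject([0.9, 0.1], quotes, max_distance=0.3))    # → 0
print(attribute_or_reject([-1.0, -1.0], quotes, max_distance=0.3))  # → None
```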

Below is an excerpt of the attribution log produced when attributing the text generated by Perplexity back to the paper:

> INPUT Sentence:

Direct model-driven attribution: The model itself tries to provide references for
its answers, but these are often incomplete or inaccurate.

> OUTPUT Quotes:

In direct model-driven attribution way, the reference document is derived from model
itself and is used to cite generated answer.

> INPUT Sentence:

Post-retrieval answering: The model retrieves information from external sources and
then answers, but it can be hard to tell if the answer really matches the sources.


> OUTPUT Quotes:

2.Post-retrieval answering: This approach is rooted in the idea of explicitly
retrieving information and then letting the model answer based on this retrieved data.

> INPUT Sentence:

Post-generation attribution: The model first generates an answer, then searches for
references to support it, and edits the answer if needed.


> OUTPUT Quotes:

3.Post-generation attribution: The system first provides an answer, then conducts a
search using both the question and answer for attribution.


- Example #2: Query about medicine consumption: 

Without providing any documents, we ask GPT whether it is safe to increase the dosage of paracetamol. We then give the answer to our system together with two leaflets, one written in Portuguese and one in English. From the Portuguese leaflet, the system retrieves a table listing the maximum daily dosage by age and body weight; from the English leaflet, it retrieves a quote stating the same limit as GPT's answer. This suggests that the system may be able to match information to sources even when they are written in a different language from the answer.

Below are the sentences retrieved from the Ben-u-ron and NHS Fife leaflets on how to take paracetamol:

INPUT: ChatGPT's answer

- Do not exceed 4000 mg (4 grams) in 24 hours.

OUTPUT (Ben-u-ron's Leaflet):

- The usual dosage of paracetamol is (translated from Portuguese):

| Body weight | Age | Single dose | Maximum daily dose |
|---|---|---|---|
| Up to 50 kg | Adolescents aged 12 to 15 | 1 capsule | Up to 4 capsules (equivalent to 2000 mg of paracetamol) |
| Over 50 kg | Adolescents aged 16 to 18 and adults | 1-2 capsules | Up to 6 capsules (equivalent to 3000 mg of paracetamol) |

- Maximum daily dose: the maximum daily dose of paracetamol must not exceed 3 g/day.
- Translation note: The system retrieved a table describing the daily dosage limits by age and body weight. It states that adults should take up to 6 capsules (equivalent to 3000 mg of paracetamol) per day, which contradicts GPT's answer.

OUTPUT (NHS Fife's Leaflet):

- A maximum of 8  tablets (4000mg or 4g) should be taken in a 24 hour period.




# Dependencies
- PyPDF2
- nltk
- transformers
- sentence_transformers

# Model used for embedding generation
all-MiniLM-L6-v2.pt


# Features extracted from sentences for pre-attribution

| # | Feature |
|---|---|
| 0 | Fraction of unique words (lexical diversity). |
| 1 | Named entity density: fraction of tokens that are named entities. |
| 2 | Syntactic parse tree depth. |
| 3 | Flesch reading ease score (higher = easier to read). |
| 4 | Shannon entropy of character or word distribution. |
| 5 | Average number of WordNet synsets per word (semantic ambiguity). |
| 6 | Ratio of nouns to verbs. |
| 7 | Proportion of stopwords in the sentence. |
| 8 | Ratio of punctuation marks to total words or characters. |
| 9 | Average number of characters per word. |
| 10 | Total number of syllables in the sentence. |
| 11 | Total number of words. |
| 12 | Number of unique (distinct) words. |
| 13 | Average bigram probability (lower = less expected). |
| 14 | Average trigram probability (lower = less expected). |
| 15 | Ratio of pronouns to total words. |
| 16 | Ratio of verbs in passive voice. |
| 17 | Binary indicator of whether the sentence is a named entity. |
| 18 | SMOG index (grade level). |
| 19 | Coleman–Liau index (readability score). |
| 20 | Automated Readability Index. |
| 21 | Dale–Chall readability score. |
| 22 | Linsear Write readability formula. |
| 23 | Gunning Fog index. |
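
A minimal sketch of how a few of the simpler surface features (0, 4, 7, 9, 11 and 12) could be computed with the standard library alone. Assumptions: tokens are lowercase alphanumeric runs, entropy is taken over the word distribution (the table leaves character versus word open), and the stopword list is a small illustrative one; the real extractor may differ on all three, and `simple_features` is a hypothetical name. The remaining features (parse depth, named entities, readability indices, n-gram probabilities) need NLP tooling and are not shown.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (assumption); the real system may use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "is", "are", "it", "for", "on", "with"}


def simple_features(sentence):
    """Compute six of the surface features from the table (indices in comments)."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    n = len(words)
    if n == 0:
        raise ValueError("sentence contains no words")
    counts = Counter(words)
    probs = [c / n for c in counts.values()]
    return {
        "lexical_diversity": len(counts) / n,                      # feature 0
        "word_entropy": -sum(p * math.log2(p) for p in probs),     # feature 4 (word-level)
        "stopword_ratio": sum(w in STOPWORDS for w in words) / n,  # feature 7
        "avg_word_length": sum(len(w) for w in words) / n,         # feature 9
        "word_count": n,                                           # feature 11
        "unique_words": len(counts),                               # feature 12
    }


print(simple_features("The model cites the paper."))
```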
