# sweetness

llama3-8b: meta-llama/Llama-3.1-8B
mistral7b: mistralai/Mistral-7B-v0.1
gemma7b: google/gemma-7b

Write a Python script to extract the attribute directions from LLaMA-3-8B using the dataset in ./data/animal-habit.txt

In the file, each line is an animal-attribute pair, meaning that the animal has that attribute, e.g., Koala, eat plants.

The direction of the attribute "eat plants" can be approximated as the average embedding of all animals that have the attribute minus the average embedding of all animals that do not have it.

Extract all attribute directions, and plot them.
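
A minimal sketch of the mean-difference estimate described above, in plain Python. The toy embeddings, the pair list, and the helper names (mean_vector, attribute_directions) are assumptions made for illustration; real embeddings would come from the LLM, and plotting is left out.

```python
from collections import defaultdict


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def attribute_directions(pairs, embeddings):
    """Return {attribute: mean(positive embeddings) - mean(negative embeddings)}.

    pairs: iterable of (animal, attribute) tuples, one per line of the data file.
    embeddings: dict mapping animal name -> embedding vector (list of floats).
    """
    holders = defaultdict(set)
    for animal, attribute in pairs:
        holders[attribute].add(animal)

    directions = {}
    for attribute, positive in holders.items():
        pos_vecs = [embeddings[a] for a in embeddings if a in positive]
        neg_vecs = [embeddings[a] for a in embeddings if a not in positive]
        if not pos_vecs or not neg_vecs:
            continue  # the difference of means is undefined without both groups
        pos_mean = mean_vector(pos_vecs)
        neg_mean = mean_vector(neg_vecs)
        directions[attribute] = [p - q for p, q in zip(pos_mean, neg_mean)]
    return directions


# Toy 2-dimensional embeddings (hypothetical values, not model outputs).
toy_embeddings = {"koala": [1.0, 0.0], "cow": [0.8, 0.2], "lion": [0.0, 1.0]}
toy_pairs = [("koala", "eat plants"), ("cow", "eat plants"), ("lion", "eat meat")]
directions = attribute_directions(toy_pairs, toy_embeddings)
```

Here every animal that is not listed with an attribute is treated as a negative example for it, which is how the description above reads.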


Write a Python script to ask the DeepSeek R1 model about the relationships between animals and habits, using the prompt "Does the animal "{animal}" have the habit "{habit}"? Answer True or False only".

Write the results one per line, in the format: animal, habit, 1 (for True) or 0 (for False).
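
A hedged sketch of the prompt formatting and answer parsing. The model call itself is deliberately left as a caller-supplied function (ask_model is a hypothetical name), since the actual client for DeepSeek R1 is not specified here. Taking the last "true" or "false" in the reply is an assumption, chosen because a reasoning model may emit text before its final answer.

```python
import re

PROMPT = 'Does the animal "{animal}" have the habit "{habit}"? Answer True or False only.'


def parse_answer(reply):
    """Map a model reply to 1 (True), 0 (False), or None if neither word appears."""
    matches = re.findall(r"\b(true|false)\b", reply.lower())
    if not matches:
        return None
    return 1 if matches[-1] == "true" else 0


def label_pairs(animals, habits, ask_model):
    """Query every (animal, habit) pair and return 'animal, habit, label' lines."""
    lines = []
    for animal in animals:
        for habit in habits:
            reply = ask_model(PROMPT.format(animal=animal, habit=habit))
            label = parse_answer(reply)
            if label is not None:  # skip replies that cannot be parsed
                lines.append(f"{animal}, {habit}, {label}")
    return lines


def fake_model(prompt):
    """Stand-in for the real model so the sketch runs offline."""
    return "True" if '"Koala"' in prompt and '"eat plants"' in prompt else "False"


lines = label_pairs(["Koala", "Lion"], ["eat plants"], fake_model)
```

Writing the file is then a matter of joining lines with newlines and saving them.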


Given the set of animal-habit relationships in animal_habit_relationship.txt.

Create a Python script to
1) Extract all animals and attributes from animal_habit_relationship.txt and save them as the variables animals and attributes, respectively.
2) Extract the pairwise animal-attribute relationships from animal_habit_relationship.txt and save them as a binary matrix formal_context, where 1 means the animal has the attribute and 0 means it does not.
3) Extract all animal embeddings using LLaMA 2-7B, i.e., the "meta-llama/Llama-2-7b-hf" model used in LLM4FCA.py. Save them as the variable animal_embeddings.
4) Calculate the attribute embeddings such that each attribute is encoded as the average embedding of the animals that have that attribute. Save them as the variable attribute_embedding.
5) Calculate the pairwise animal-attribute cosine similarity with values in [0,1], and save it as another matrix called embedding_context.
6) Calculate the Pearson correlation coefficient between formal_context and embedding_context to measure whether their values are linearly correlated.

For steps 3) to 5), you can refer to LLM4FCA.py.
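
The steps above can be sketched in plain Python as follows. Several things here are assumptions: the file is taken to use the "animal, habit, label" line format, the embeddings are toy 2-dimensional vectors rather than LLaMA outputs, matrices are nested lists rather than whatever LLM4FCA.py uses, and cosine similarity is mapped into [0,1] with (cos + 1) / 2, which is only one possible rescaling.

```python
import math


def parse_relationships(lines):
    """Parse 'animal, attribute, label' lines into names and a 0/1 matrix."""
    animals, attributes, labels = [], [], {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        head, label = line.rsplit(",", 1)
        animal, attribute = (part.strip() for part in head.split(",", 1))
        if animal not in animals:
            animals.append(animal)
        if attribute not in attributes:
            attributes.append(attribute)
        labels[(animal, attribute)] = int(label)
    formal_context = [[labels.get((a, m), 0) for m in attributes] for a in animals]
    return animals, attributes, formal_context


def mean_vector(vectors):
    """Element-wise mean; assumes at least one vector is given."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine01(u, v):
    """Cosine similarity rescaled from [-1, 1] to [0, 1]."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return (dot / norm + 1.0) / 2.0


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


lines = ["eagle, can fly, 1", "eagle, has fur, 0", "bat, can fly, 1",
         "bat, has fur, 1", "dog, can fly, 0", "dog, has fur, 1"]
animals, attributes, formal_context = parse_relationships(lines)            # steps 1-2
animal_embeddings = {"eagle": [1.0, 0.0], "bat": [0.7, 0.7], "dog": [0.0, 1.0]}  # step 3 (toy)
attribute_embedding = {                                                     # step 4
    m: mean_vector([animal_embeddings[a] for i, a in enumerate(animals)
                    if formal_context[i][j] == 1])
    for j, m in enumerate(attributes)
}
embedding_context = [[cosine01(animal_embeddings[a], attribute_embedding[m])  # step 5
                      for m in attributes] for a in animals]
r = pearson([v for row in formal_context for v in row],                    # step 6
            [v for row in embedding_context for v in row])
```

The correlation is computed over the flattened matrices, so every animal-attribute cell counts as one observation.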



Given an attribute, e.g., "Capable of flight", plot the embeddings of the positive animals (those that have the attribute) and of the negative animals (those that do not).

attribute = "Capable of flight"

positive_animals = [animal for animal in animals if formal_context.loc[animal, attribute] == 1]
negative_animals = [animal for animal in animals if formal_context.loc[animal, attribute] == 0]


We estimate the covariance matrix Cov(gw) using the Ledoit-Wolf shrinkage estimator (Ledoit & Wolf, 2004), because the dimension of the representation space is much higher than the number of samples.


Write Python code to train a model that predicts animal hypernym relationships from the LLM animal embeddings. Split animal_hypernyms.txt into training and test sets, 80% for training and 20% for testing. For each hypernym pair (A, B), create one negative pair by randomly replacing A or B with another animal. Hence, the numbers of training and testing samples are doubled.
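
A small sketch of the split and the negative sampling, using only the standard library. The toy pairs and function names are hypothetical. Rejecting a corrupted pair that happens to be a known true pair is an added assumption; the description above only asks for a random replacement.

```python
import random


def split_pairs(pairs, train_fraction=0.8, seed=0):
    """Shuffle the positive pairs and split them into train and test lists."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def with_negatives(positives, animals, known_true, rng, max_tries=100):
    """Return (A, B, label) samples: each positive plus one corrupted negative."""
    samples = []
    for a, b in positives:
        samples.append((a, b, 1))
        for _ in range(max_tries):
            other = rng.choice(animals)
            # Replace A or B with equal probability.
            corrupted = (other, b) if rng.random() < 0.5 else (a, other)
            if corrupted[0] != corrupted[1] and corrupted not in known_true:
                samples.append((corrupted[0], corrupted[1], 0))
                break
        else:
            raise RuntimeError(f"no negative found for {(a, b)}")
    return samples


pairs = [("dog", "mammal"), ("cat", "mammal"), ("eagle", "bird"), ("sparrow", "bird"),
         ("salmon", "fish"), ("shark", "fish"), ("mammal", "animal"), ("bird", "animal"),
         ("fish", "animal"), ("dog", "animal")]
known_true = set(pairs)
animals = sorted({name for pair in pairs for name in pair})
train_pos, test_pos = split_pairs(pairs)
rng = random.Random(1)
train_samples = with_negatives(train_pos, animals, known_true, rng)
test_samples = with_negatives(test_pos, animals, known_true, rng)
```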

The feature vector of an animal consists of the projection lengths of its embedding onto all attribute directions. Hence, the input for each animal is a vector of length len(attributes).
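
As an illustration, the projection feature could be computed like this. The function name projection_profile is hypothetical, and using the signed scalar projection (dot product divided by the direction's norm) is an assumption about what "projection length" means here.

```python
import math


def projection_profile(embedding, directions):
    """Signed scalar projection of one embedding onto each attribute direction."""
    profile = []
    for direction in directions:
        norm = math.sqrt(sum(x * x for x in direction))
        dot = sum(e * x for e, x in zip(embedding, direction))
        profile.append(dot / norm)
    return profile


# One embedding and two attribute directions (toy values).
features = projection_profile([3.0, 4.0], [[1.0, 0.0], [0.0, 2.0]])
```

The result has one entry per attribute direction, so its length equals len(attributes).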

Given two animals A and B, the model concatenates the feature vectors of A and B and feeds the result into a multi-layer perceptron (MLP) to predict the hypernym relationship. The MLP has one hidden layer with 1000 units and ReLU activation. The output layer has one unit with sigmoid activation. The model is trained with binary cross-entropy loss and the Adam optimizer, and is evaluated using F1 score and filtered MRR.
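
The two evaluation metrics can be sketched without any ML library. The filtered MRR below follows the usual link-prediction convention, assumed here rather than taken from these notes: for each test pair (A, B), B is ranked against all candidate hypernyms by model score, and other candidates already known to be true for A are skipped. The score function and toy scores are hypothetical stand-ins for the trained MLP.

```python
def f1_score(y_true, y_pred):
    """F1 of the positive class for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def filtered_mrr(test_pairs, candidates, known_true, score):
    """Mean reciprocal rank of the true hypernym, ignoring other known-true ones."""
    total = 0.0
    for a, b in test_pairs:
        target = score(a, b)
        rank = 1
        for c in candidates:
            if c == b or (a, c) in known_true:
                continue  # filtered setting: other true answers do not count
            if score(a, c) > target:
                rank += 1
        total += 1.0 / rank
    return total / len(test_pairs)


# Toy scores standing in for MLP outputs.
toy_scores = {("dog", "mammal"): 0.6, ("dog", "animal"): 0.9, ("dog", "bird"): 0.7,
              ("eagle", "bird"): 0.8, ("eagle", "mammal"): 0.1, ("eagle", "animal"): 0.3}
known_true = {("dog", "mammal"), ("dog", "animal"), ("eagle", "bird")}
mrr = filtered_mrr([("dog", "mammal"), ("eagle", "bird")],
                   ["mammal", "bird", "animal"], known_true,
                   lambda a, b: toy_scores[(a, b)])
f1 = f1_score([1, 1, 0, 0], [1, 0, 1, 0])
```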

Write another Python script, FCA_soft_subsumptiom.py, which predicts whether one concept is subsumed by another without training a model, directly from the formal context, where each row is the embedding vector of a concept.

Use either of the following two scoring functions:

\begin{definition}[Projection-profile-based soft subsumption score]
\label{def:projection-subsumption}
Let \( \boldsymbol{p}_1, \boldsymbol{p}_2 \in \mathbb{R}^k \) be the projection profiles of two concepts \( C_1 \) and \( C_2 \) over attribute directions \( \bar{\ell}_1, \dots, \bar{\ell}_k \), with thresholds \( \tau_1, \dots, \tau_k \). Then the soft subsumption score of \( C_1 \) subsumed by \( C_2 \) is defined as:
\begin{equation}
S_{\text{sub}}(C_1 \rightarrow C_2) := \frac{1}{k} \sum_{m=1}^{k} \mathbb{I} \left[ \sigma(\alpha(p_1(m) - \tau_m)) \le \sigma(\alpha(p_2(m) - \tau_m)) + \delta \right],
\end{equation}
where \( \delta \in [0,1] \) is a margin of tolerance.
\end{definition}

or 

\begin{equation}
S_{\text{soft-sub}}(C_1 \rightarrow C_2) := \frac{1}{k} \sum_{m=1}^{k} \sigma\left( \beta \cdot \left[ \sigma(\alpha(p_2(m) - \tau_m)) - \sigma(\alpha(p_1(m) - \tau_m)) + \delta \right] \right),
\end{equation}

Evaluate it on all subsumption pairs.
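
Both scores above translate directly into code. In this sketch \( \sigma \) is assumed to be the logistic sigmoid, and the default values of alpha, beta, and delta are placeholders rather than recommended settings.

```python
import math


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hard_subsumption_score(p1, p2, tau, alpha=1.0, delta=0.0):
    """S_sub(C1 -> C2): fraction of attributes where C1's activation does not
    exceed C2's activation by more than delta."""
    hits = 0
    for a, b, t in zip(p1, p2, tau):
        if sigmoid(alpha * (a - t)) <= sigmoid(alpha * (b - t)) + delta:
            hits += 1
    return hits / len(tau)


def soft_subsumption_score(p1, p2, tau, alpha=1.0, beta=1.0, delta=0.0):
    """S_soft-sub(C1 -> C2): the indicator is replaced by a sigmoid of the gap."""
    total = 0.0
    for a, b, t in zip(p1, p2, tau):
        gap = sigmoid(alpha * (b - t)) - sigmoid(alpha * (a - t)) + delta
        total += sigmoid(beta * gap)
    return total / len(tau)


# Toy projection profiles over k = 2 attribute directions.
p1, p2, tau = [0.2, 0.5], [0.9, 0.4], [0.5, 0.5]
strict = hard_subsumption_score(p1, p2, tau)              # delta = 0
tolerant = hard_subsumption_score(p1, p2, tau, delta=0.1)
soft_forward = soft_subsumption_score(p1, p2, tau)
soft_backward = soft_subsumption_score(p2, p1, tau)
```

In the toy case the second attribute violates the strict comparison, so the hard score is 0.5 with delta = 0 and rises to 1.0 once a tolerance of 0.1 is allowed.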

Add an argument training. If it is True, use the current method for predicting subsumption; if it is False, predict the subsumption relation directly from the embedding profiles of the two concepts, using the two scoring functions defined above, \( S_{\text{sub}} \) and \( S_{\text{soft-sub}} \). The core idea is to compare each attribute projection of the two concepts.
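
One way the training switch could be exposed on the command line, as a sketch only. The flag spelling and the returned method names are hypothetical. argparse.BooleanOptionalAction (Python 3.9+) provides both --training and --no-training.

```python
import argparse


def build_parser():
    """Command-line parser with a boolean --training / --no-training switch."""
    parser = argparse.ArgumentParser(description="Subsumption prediction (sketch)")
    parser.add_argument(
        "--training",
        action=argparse.BooleanOptionalAction,
        default=True,
        help="train the MLP (default) or score projection profiles directly",
    )
    return parser


def choose_method(training):
    """Return the name of the prediction path selected by the flag."""
    return "trained_mlp" if training else "projection_profile_score"


args = build_parser().parse_args(["--no-training"])
method = choose_method(args.training)
```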




python FCA_embedding.py --dataset animal --model_key llama3-8b --embedding_method random
python FCA_embedding.py --dataset plant --model_key llama3-8b --embedding_method random
python FCA_embedding.py --dataset food --model_key llama3-8b --embedding_method random

python FCA_embedding.py --dataset animal --model_key gemma7b --embedding_method random
python FCA_embedding.py --dataset plant --model_key gemma7b --embedding_method random
python FCA_embedding.py --dataset food --model_key gemma7b --embedding_method random

python FCA_embedding.py --dataset animal --model_key mistral7b --embedding_method random
python FCA_embedding.py --dataset plant --model_key mistral7b --embedding_method random
python FCA_embedding.py --dataset food --model_key mistral7b --embedding_method random

python FCA_embedding.py --dataset animal --model_key llama3-8b --embedding_method mean
python FCA_embedding.py --dataset plant --model_key llama3-8b --embedding_method mean
python FCA_embedding.py --dataset food --model_key llama3-8b --embedding_method mean

python FCA_embedding.py --dataset animal --model_key gemma7b --embedding_method mean
python FCA_embedding.py --dataset plant --model_key gemma7b --embedding_method mean
python FCA_embedding.py --dataset food --model_key gemma7b --embedding_method mean

python FCA_embedding.py --dataset animal --model_key mistral7b --embedding_method mean
python FCA_embedding.py --dataset plant --model_key mistral7b --embedding_method mean
python FCA_embedding.py --dataset food --model_key mistral7b --embedding_method mean

python FCA_embedding.py --dataset animal --model_key llama3-8b --embedding_method lda
python FCA_embedding.py --dataset plant --model_key llama3-8b --embedding_method lda
python FCA_embedding.py --dataset food --model_key llama3-8b --embedding_method lda

python FCA_embedding.py --dataset animal --model_key gemma7b --embedding_method lda
python FCA_embedding.py --dataset plant --model_key gemma7b --embedding_method lda
python FCA_embedding.py --dataset food --model_key gemma7b --embedding_method lda

python FCA_embedding.py --dataset animal --model_key mistral7b --embedding_method lda
python FCA_embedding.py --dataset plant --model_key mistral7b --embedding_method lda
python FCA_embedding.py --dataset food --model_key mistral7b --embedding_method lda
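
The run grid above is the full product of three embedding methods, three model keys, and three datasets, so it could also be generated rather than typed out. A sketch, using only the flags and values that appear in the commands; the helper name build_commands is hypothetical, and nothing is executed here.

```python
from itertools import product

DATASETS = ["animal", "plant", "food"]
MODEL_KEYS = ["llama3-8b", "gemma7b", "mistral7b"]
METHODS = ["random", "mean", "lda"]


def build_commands():
    """Return one argument list per run, in the same order as the list above."""
    commands = []
    for method, model_key, dataset in product(METHODS, MODEL_KEYS, DATASETS):
        commands.append([
            "python", "FCA_embedding.py",
            "--dataset", dataset,
            "--model_key", model_key,
            "--embedding_method", method,
        ])
    return commands


commands = build_commands()
for command in commands:
    print(" ".join(command))
```

Each argument list can be passed to subprocess.run(command, check=True) to launch the runs one after another.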














