{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "57490a97",
   "metadata": {},
   "source": [
    "## Sparsity-Agnostic Linear Bandits with Adaptive Adversaries\n",
    "\n",
    "This code is the official implementation of 'Sparsity-Agnostic Linear Bandits with Adaptive Adversaries.'\n",
    "\n",
    "To run the notebook, press Shift+Enter in each cell, from top to bottom."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "656c81f4",
   "metadata": {},
   "source": [
    "### Requirements\n",
    "The following cell imports every package this code requires."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "9c53a5c3-9750-40ed-8226-0d38986d5fcc",
   "metadata": {},
   "outputs": [],
   "source": [
    "from functools import wraps\n",
    "from typing import List, Tuple\n",
    "import numpy as np\n",
    "import scipy.sparse as sp\n",
    "from scipy.special import softmax\n",
    "from numpy import linalg as LA\n",
    "import argparse\n",
    "from sklearn.linear_model import Lasso\n",
    "import matplotlib.pyplot as plt\n",
    "import time\n",
    "import sys\n",
    "sys.argv = ['']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7e298696",
   "metadata": {},
   "source": [
    "### Algorithm class\n",
    "**Bandit_Algo** is a parent class for all bandit algorithms.\n",
    "\n",
    "All algorithms will have the following methods:\n",
    "\n",
    "- *\\_\\_init\\_\\_*: the constructor, which initializes the data members when an algorithm instance is created.\n",
    "- *sample*: called when the algorithm selects ('samples') an action based on the history.\n",
    "- *update*: called right after *sample* to update the algorithm's internal state.\n",
    "\n",
    "All algorithms take the following parameters by default:\n",
    "- $\\theta$: hidden reward parameter. It is used to compute the regret and, in the \\_Known suffix experiment, to read off the sparsity level of $\\theta_*$, which the algorithm is assumed to know in advance.\n",
    "- $d$: dimensionality of the linear bandit problem.\n",
    "- $T$: total number of iterations.\n",
    "- $\\delta$: error probability.\n",
    "- $\\lambda$: regularization parameter for $\\hat{\\theta}_t$. Throughout our paper and this experiment, it is always 1.\n",
    "- $\\sigma$: half-width of the uniform noise distribution. The noise follows $\\mathrm{Unif}(-\\sigma, \\sigma)$.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "3781ccbe-4f30-46cf-aa16-9f5154c0d59e",
   "metadata": {},
   "outputs": [],
   "source": [
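    "class Bandit_Algo():\n",
    "    \"\"\"Parent class for all bandit algorithms.\"\"\"\n",
    "    def __init__(self, theta, d, T, delta, lamb, sigma):\n",
    "        self.theta=theta                 # hidden reward parameter\n",
    "        self.d=d\n",
    "        self.T=T\n",
    "        self.sigma=sigma\n",
    "        self.cum_regret=np.zeros(T)      # cumulative regret after each round\n",
    "        self.delta=delta\n",
    "        self.Vt=lamb*np.identity(d)      # regularized Gram matrix\n",
    "        self.hat_theta=np.zeros(d)       # ridge estimate of theta\n",
    "\n",
    "    def sample(self, A):\n",
    "        # Placeholder: subclasses return the chosen row of the action set A.\n",
    "        return 0\n",
    "\n",
    "    def update(self, A, t, a_t, r_t):\n",
    "        # Placeholder: subclasses update their statistics and the regret.\n",
    "        return 0"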
   ]
  },
  {
   "cell_type": "markdown",
   "id": "78876c0e",
   "metadata": {},
   "source": [
    "**SparseLinUCB** is Algorithm 1 in our main paper.\n",
    "\n",
    "The class follows the pseudocode in the paper. In addition to the parent-class parameters, it takes:\n",
    "\n",
    " - *coin_dist*: the model selection distribution.\n",
    "   - When coin_dist is a list, that list is used as the model selection distribution.\n",
    "   - coin_dist==-1 corresponds to the *\\_Theory* suffix, meaning $q_s = \\Theta(2^{-s})$ for $s=0,\\ldots,n-1$.\n",
    "   - coin_dist==-2 corresponds to the *\\_Unif* suffix, meaning $q_s = \\frac{1}{n}$ for $s=0, \\ldots, n-1$, where $n$ is the number of candidate models (self.n in the code).\n",
    "   - coin_dist==-3 corresponds to the *\\_Known* suffix, meaning $q_s = \\mathbb{1}\\{s=o\\}$ when the optimal index $o$ (for the given $S$) is known in advance.\n",
    " - *c*: multiplier applied to $\\log_2 d$ when setting the number of candidate models $n$."
   ]
  },
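  {
   "cell_type": "markdown",
   "id": "c1a0f3e7",
   "metadata": {},
   "source": [
    "As a reading aid, the selection rule and the rank-one update in the code cell below can be written out as follows. This is a sketch transcribed from the code rather than a restatement of the paper; the symbols $i_t$ (the model index drawn at round $t$) and $\\alpha_i$ (the entries of `self.alpha`) are notation assumed here for illustration.\n",
    "\n",
    "$$i_t \\sim q, \\qquad a_t \\in \\arg\\max_{a \\in \\mathcal{A}} \\; \\langle \\hat{\\theta}_t, a \\rangle + \\sqrt{\\alpha_{i_t} \\log(t+1)} \\, \\|a\\|_{V_t^{-1}}, \\qquad \\alpha_0 = 0, \\quad \\alpha_i = 2^{i-1} \\text{ for } i \\ge 1,$$\n",
    "\n",
    "where $t$ is the round counter stored in `self.t`. After playing $a_t$, the inverse Gram matrix is refreshed with the Sherman-Morrison identity instead of a full inversion:\n",
    "\n",
    "$$(V + a a^\\top)^{-1} = V^{-1} - \\frac{V^{-1} a a^\\top V^{-1}}{1 + a^\\top V^{-1} a}.$$\n",
    "\n",
    "Index $0$ carries no exploration bonus, so it acts greedily on $\\hat{\\theta}_t$."
   ]
  },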
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "28161e3d-6ff7-454f-9d66-957f1fb277dc",
   "metadata": {},
   "outputs": [],
   "source": [
    "class SparseLinUCB(Bandit_Algo):\n",
    "    def __init__(self,theta,d,T,delta=0.01, lamb=1, coin_dist=-1, c=1):        \n",
    "        self.theta=theta\n",
    "        self.s=np.sum(theta!=0)\n",
    "        self.o=int(np.ceil(np.log2(self.s))+1)\n",
    "        self.T=T\n",
    "        self.cum_regret=np.zeros(T)\n",
    "        self.delta=delta\n",
    "        self.Vt=np.identity(d)*lamb\n",
    "        self.VtInv=LA.inv(self.Vt)\n",
    "        self.hat_theta=np.zeros(d)\n",
    "        self.t=0\n",
    "        self.d=d\n",
    "        self.n=int(np.ceil(np.log2(d)*c))+2 # For example, d=16=2^4, we need greedy, s=1=2^0, s=2=2^1, s=4=2^2, s=8=2^3, s=16=2^4\n",
    "        self.alpha=np.zeros(self.n)\n",
    "        self.coin_dist=np.ones(self.n)\n",
    "        self.Xr=np.zeros(d)\n",
    "        for i in range(self.n):\n",
    "            self.alpha[i]=2**(i-1)\n",
    "            if coin_dist==-1:\n",
    "                self.coin_dist[i]=1/self.alpha[i]\n",
    "            elif coin_dist==-2:\n",
    "                self.coin_dist[i]=1/self.n\n",
    "            elif coin_dist==-3:\n",
    "                self.coin_dist[i]=(i==self.o)\n",
    "            else:\n",
    "                self.coin_dist=np.array(coin_dist)\n",
    "        self.coin_dist=self.coin_dist/np.sum(self.coin_dist)\n",
    "        self.alpha[0]=0\n",
    "\n",
    "    def sample(self,A):\n",
    "        alpha_i=self.alpha[np.random.choice(self.n, p=self.coin_dist)]\n",
    "        N_a, _ = np.shape(A)\n",
    "        UCB=np.zeros(N_a)\n",
    "        norms=np.sqrt(np.diag(A@self.VtInv@A.T))\n",
    "        for i in range(N_a):\n",
    "            UCB[i]=np.dot(self.hat_theta,A[i])+np.sqrt(alpha_i*np.log(self.t+1))*norms[i]\n",
    "        return A[np.argmax(UCB)]\n",
    "\n",
    "    def update(self, A, t, a_t, r_t):\n",
    "        self.Vt=self.Vt+np.outer(a_t, a_t)\n",
    "        self.VtInv=self.VtInv - (np.outer(self.VtInv@a_t, self.VtInv@a_t))/(1+a_t@self.VtInv@a_t) #Sherman–Morrison formula\n",
    "        self.Xr=self.Xr+a_t *r_t\n",
    "        self.hat_theta = self.VtInv@self.Xr\n",
    "        best_reward=np.max(A@self.theta)\n",
    "        instant_reg=best_reward-np.dot(self.theta, a_t)\n",
    "        self.t=t\n",
    "        if t==0:\n",
    "            self.cum_regret[t]=instant_reg\n",
    "        else:\n",
    "            self.cum_regret[t]=self.cum_regret[t-1]+instant_reg\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1935017c",
   "metadata": {},
   "source": [
    "**AdaLinUCB** is Algorithm 2 in our main paper. \n",
    "\n",
    "In addition to the parent-class parameters, it takes two noteworthy parameters:\n",
    "\n",
    " - *coin_dist*: the probability that the first coin, corresponding to $Z_t$ in our pseudocode, equals 1. When it does, the algorithm plays with the largest confidence radius; otherwise it draws a model from Exp3.\n",
    " - *prior*: the initial model selection distribution of Exp3, corresponding to $q$ (see the second paragraph of Section 5 for its explanation).\n",
    "    - 'U': corresponds to the *\\_Unif* suffix, meaning $q_s = \\frac{1}{n}$ for $s=0, \\ldots, n-1$.\n",
    "    - 'T': corresponds to the *\\_Theory* suffix, meaning $q_s = \\Theta(2^{-s})$ for $s=0,\\ldots,n-1$."
   ]
  },
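  {
   "cell_type": "markdown",
   "id": "d2b4a6c8",
   "metadata": {},
   "source": [
    "The Exp3 layer in the code cell below can be summarized as follows. This is a sketch read off the code, not a derivation from the paper; $p_i$ (the entries of `self.exp_dist`), $S_i$ (the entries of `self.S`), and $i_t$ (the model drawn at round $t$) are notation assumed here.\n",
    "\n",
    "$$p_i = \\frac{q_i \\exp(\\eta S_i)}{\\sum_{j} q_j \\exp(\\eta S_j)}, \\qquad \\eta \\leftarrow 2\\sqrt{\\frac{\\log n}{(t+1)\\, n}} \\text{ at the end of round } t \\text{ (0-indexed)}.$$\n",
    "\n",
    "The learning rate starts at $\\eta = 2\\sqrt{\\log n / n}$. When the first coin is 0 and model $i_t$ is drawn from $p$, only its score changes after observing the reward $r_t$:\n",
    "\n",
    "$$S_{i_t} \\leftarrow S_{i_t} - \\frac{2 - r_t}{4\\, p_{i_t}}.$$\n",
    "\n",
    "Dividing by $p_{i_t}$ is the usual importance weighting in Exp3, so rarely drawn models receive proportionally larger corrections."
   ]
  },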
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "0857c46b-5515-47fb-8165-4533f3760e2e",
   "metadata": {},
   "outputs": [],
   "source": [
    "class AdaLinUCB(Bandit_Algo):\n",
    "    def __init__(self,theta,d,T,delta=0.01, lamb=1, coin_dist=0, c=1, prior='U'):        \n",
    "        self.theta=theta\n",
    "        self.T=T\n",
    "        self.cum_regret=np.zeros(T)\n",
    "        self.delta=delta\n",
    "        self.Vt=np.identity(d)*lamb\n",
    "        self.VtInv=LA.inv(self.Vt)\n",
    "        self.hat_theta=np.zeros(d)\n",
    "        self.t=0\n",
    "        self.d=d\n",
    "        self.n=int(np.ceil(np.log2(d)*c))+2\n",
    "        self.eta=2*np.sqrt(np.log(self.n)/(1*self.n))\n",
    "        self.alpha=np.zeros(self.n)\n",
    "        self.coin_dist=np.ones(self.n)\n",
    "        self.Xr=np.zeros(d)\n",
    "        \n",
    "        self.prior=np.ones(self.n)\n",
    "        for i in range(self.n):\n",
    "            if prior=='T':\n",
    "                self.prior[i]=2**(-i)\n",
    "            self.alpha[i]=2**(i-1)\n",
    "        self.alpha[0]=0 # index 0 is the greedy model (no exploration bonus)\n",
    "        self.coin_dist=[1-coin_dist, coin_dist]\n",
    "        self.S=np.zeros(self.n)\n",
    "        self.prior=self.prior/np.sum(self.prior)\n",
    "        self.exp_dist=self.prior*softmax(self.eta*self.S)\n",
    "        self.exp_dist=self.exp_dist/np.sum(self.exp_dist)\n",
    "        self.first_coin=0\n",
    "        self.second_coin=0\n",
    "\n",
    "    def sample(self,A):\n",
    "        self.first_coin=np.random.choice(2, p=self.coin_dist)\n",
    "        N_a, _ = np.shape(A)\n",
    "        UCB=np.zeros(N_a)\n",
    "        norms=np.sqrt(np.diag(A@self.VtInv@A.T))\n",
    "\n",
    "        if self.first_coin==1:\n",
    "            for i in range(N_a):\n",
    "                UCB[i]=np.dot(self.hat_theta,A[i])+np.sqrt(self.alpha[-1]*np.log(self.t+1))*norms[i]\n",
    "        else:\n",
    "            self.exp_dist=self.prior*softmax(self.eta*self.S)\n",
    "            self.exp_dist=self.exp_dist/np.sum(self.exp_dist)\n",
    "            self.second_coin=np.random.choice(self.n, p=self.exp_dist)\n",
    "            self.alpha_i=self.alpha[self.second_coin]\n",
    "            for i in range(N_a):\n",
    "                UCB[i]=np.dot(self.hat_theta,A[i])+np.sqrt(self.alpha_i*np.log(self.t+1))*norms[i]\n",
    "\n",
    "        return A[np.argmax(UCB)]\n",
    "\n",
    "    def update(self, A, t, a_t, r_t):\n",
    "        if self.first_coin==0:     \n",
    "            self.S[self.second_coin]=self.S[self.second_coin]-(2-r_t)/(4*(self.exp_dist[self.second_coin])) #Omitted +1 summation for everyone           \n",
    "            self.exp_dist=self.prior*softmax(self.eta*self.S)\n",
    "            self.exp_dist=self.exp_dist/np.sum(self.exp_dist)\n",
    "        self.Vt=self.Vt+np.outer(a_t, a_t)\n",
    "        self.t=t\n",
    "        self.VtInv=self.VtInv - (np.outer(self.VtInv@a_t, self.VtInv@a_t))/(1+a_t@self.VtInv@a_t) # Sherman–Morrison formula\n",
    "        self.Xr=self.Xr+a_t *r_t\n",
    "        self.hat_theta = self.VtInv@self.Xr\n",
    "        best_reward=np.max(A@self.theta)\n",
    "        self.eta=2*np.sqrt(np.log(self.n)/((t+1)*self.n))\n",
    "        instant_reg=best_reward-np.dot(self.theta, a_t)\n",
    "        if t==0:\n",
    "            self.cum_regret[t]=instant_reg\n",
    "        else:\n",
    "            self.cum_regret[t]=self.cum_regret[t-1]+instant_reg\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d0c59f8",
   "metadata": {},
   "source": [
    "**OFUL** implements the algorithm from the paper ['Improved Algorithms for Linear Stochastic Bandits'](https://papers.nips.cc/paper_files/paper/2011/file/e1d5be1c7f2f456670de3d53c7b54f4a-Paper.pdf).\n",
    "\n",
    "- This code is based on the log-determinant version of the confidence set. See Appendix E.3 for details. "
   ]
  },
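  {
   "cell_type": "markdown",
   "id": "e3c5b7d9",
   "metadata": {},
   "source": [
    "For reference, the confidence radius set in the constructor below can be written out as follows. This is a sketch transcribed from the code, which assumes $\\lambda = 1$ in the log-determinant branch (so that $\\det(\\lambda I) = 1$); consult the cited paper for the general statement.\n",
    "\n",
    "$$\\beta_t = \\sigma \\sqrt{\\log \\frac{\\det V_t}{\\delta^2}} + \\sqrt{\\lambda}\\, \\|\\theta_*\\|_2 \\quad \\text{(radi=False)}, \\qquad \\beta_t = \\sigma \\sqrt{d \\log\\Big(1 + \\frac{t}{d\\lambda}\\Big)} + \\sqrt{\\lambda}\\, \\|\\theta_*\\|_2 \\quad \\text{(radi=True)}.$$\n",
    "\n",
    "The action played maximizes $\\langle \\hat{\\theta}_t, a \\rangle + \\beta_t \\|a\\|_{V_t^{-1}}$. For a rank-one update, the determinant can be tracked without recomputation through the matrix determinant lemma:\n",
    "\n",
    "$$\\det(V + a a^\\top) = \\det(V)\\,(1 + a^\\top V^{-1} a).$$"
   ]
  },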
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "3ef6ff66-298f-49ef-a99f-1d538b11db0c",
   "metadata": {},
   "outputs": [],
   "source": [
    "class OFUL(Bandit_Algo):\n",
    "    def __init__(self,theta,d,T,delta=0.01, sigma=1, lamb=1, radi=False):        \n",
    "        self.theta=theta\n",
    "        self.T=T\n",
    "        self.t=1\n",
    "        self.radi=radi\n",
    "        self.sigma=sigma\n",
    "        self.cum_regret=np.zeros(T)\n",
    "        self.delta=delta\n",
    "        self.Vt=np.identity(d)*lamb\n",
    "        self.lamb=lamb\n",
    "        self.VtInv=LA.inv(self.Vt)\n",
    "        self.hat_theta=np.zeros(d)\n",
    "        self.detV=1\n",
    "        self.d=d\n",
    "        if radi:\n",
    "            self.beta=self.sigma*np.sqrt(self.d*np.log(1+(self.t)/(self.d*self.lamb)))+np.sqrt(self.lamb)*LA.norm(self.theta)\n",
    "        else:\n",
    "            self.beta=self.sigma*np.sqrt(np.log(self.detV/(self.delta**2)))+np.sqrt(self.lamb)*LA.norm(self.theta)\n",
    "\n",
    "        self.Xr=np.zeros(d)\n",
    "\n",
    "    def sample(self,A):\n",
    "        N_a, _ = np.shape(A)\n",
    "        UCB=np.zeros(N_a)\n",
    "        norms=np.sqrt(np.diag(A@self.VtInv@A.T))\n",
    "        for i in range(N_a):\n",
    "            UCB[i]=np.dot(self.hat_theta,A[i])+self.beta*norms[i]\n",
    "        return A[np.argmax(UCB)]\n",
    "\n",
    "    def update(self, A, t, a_t, r_t):\n",
    "        self.detV=self.detV*(1+a_t@self.VtInv@a_t) # matrix determinant lemma: det(V+aa^T)=det(V)(1+a^T V^{-1} a)\n",
    "        self.t=t+1\n",
    "        if self.radi:\n",
    "            self.beta=self.sigma*np.sqrt(self.d*np.log(1+(self.t)/(self.d*self.lamb)))+np.sqrt(self.lamb)*LA.norm(self.theta)\n",
    "        else:\n",
    "            self.beta=self.sigma*np.sqrt(np.log(self.detV/(self.delta**2)))+np.sqrt(self.lamb)*LA.norm(self.theta)\n",
    "        self.Vt=self.Vt+np.outer(a_t, a_t)\n",
    "        self.VtInv=self.VtInv - (np.outer(self.VtInv@a_t, self.VtInv@a_t))/(1+a_t@self.VtInv@a_t)\n",
    "        self.Xr=self.Xr+a_t *r_t\n",
    "        self.hat_theta = self.VtInv@self.Xr\n",
    "        best_reward=np.max(A@self.theta)\n",
    "        instant_reg=best_reward-np.dot(self.theta, a_t)\n",
    "        if t==0:\n",
    "            self.cum_regret[t]=instant_reg\n",
    "        else:\n",
    "            self.cum_regret[t]=self.cum_regret[t-1]+instant_reg\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "534ca189",
   "metadata": {},
   "source": [
    "## Experiment class\n",
    "\n",
    "A class that manages the experimental environment. It takes the following major inputs:\n",
    "- $s$: sparsity level. Each experiment instance has a fixed sparsity level, so running an experiment with a different sparsity level requires a new instance.\n",
    "- $d$: dimensionality of the linear bandit problem.\n",
    "- $T$: total number of iterations.\n",
    "- $\\delta$: error probability.\n",
    "- $\\sigma$: half-width of the uniform noise distribution. The noise follows $\\mathrm{Unif}(-\\sigma, \\sigma)$.\n",
    "- $N_a$: number of arms.\n",
    "- *alg_list*: list of algorithm names for this experiment.\n",
    "- *repeat*: number of repetitions.\n",
    "\n",
    "Methods:\n",
    "- single\\_exp(alg\\_name, repeat\\_ind): runs a single experiment for *alg\\_name*.\n",
    "- multi\\_exp(): runs every algorithm for *repeat* repetitions. Within each repetition, all algorithms share the same $\\theta_*$ and $\\mathcal{A}$.\n",
    "- plot\\_result(): plots the results and saves the figure as 'Fixed\\_action\\_set\\_s(sparsity level)'.\n",
    "- generate\\_theta(sparsity): generates a $\\theta_*$ that satisfies $\\|\\theta_*\\|_0 = $(input sparsity).\n",
    "- generate\\_action\\_set(): generates an action set of $N_a$ unit-norm actions."
   ]
  },
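  {
   "cell_type": "markdown",
   "id": "f4d6c8ea",
   "metadata": {},
   "source": [
    "The interaction simulated by *single\\_exp* can be written out as follows. This is a sketch read off the code cell below; the noise symbol $\\varepsilon_t$ and the regret symbol $R_T$ are notation assumed here.\n",
    "\n",
    "$$r_t = \\langle a_t, \\theta_* \\rangle + \\varepsilon_t, \\quad \\varepsilon_t \\sim \\mathrm{Unif}(-\\sigma, \\sigma), \\qquad R_T = \\sum_{t=1}^{T} \\Big( \\max_{a \\in \\mathcal{A}} \\langle a, \\theta_* \\rangle - \\langle a_t, \\theta_* \\rangle \\Big).$$\n",
    "\n",
    "The regret is computed from the noiseless rewards, so each cumulative regret curve is non-decreasing. Every action is normalized to unit Euclidean norm, and $\\theta_*$ is normalized to unit norm and then multiplied by *theta\\_scale*."
   ]
  },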
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "a40762ff-b429-4a4a-b865-3db8fd9c6639",
   "metadata": {},
   "outputs": [],
   "source": [
    "class Experiment():\n",
    "    def __init__(self, s, d, T, delta, sigma, N_a, alg_list, theta_scale=1,  repeat=1, action_set='F'):\n",
    "        self.N_alg=len(alg_list)\n",
    "        self.action_set=action_set\n",
    "        self.d=d\n",
    "        self.s=s\n",
    "        self.T=T\n",
    "        self.sigma=sigma\n",
    "        self.N_a = N_a\n",
    "        self.delta=delta\n",
    "        self.repeat=repeat\n",
    "        self.alg_list=alg_list\n",
    "        self.theta_scale=theta_scale\n",
    "        self.cum_hist=np.zeros((len(alg_list),repeat, T))\n",
    "        self.theta=self.generate_theta(self.s)\n",
    "        self.action_seq=self.generate_action_set()\n",
    "\n",
    "    \n",
    "    def single_exp(self, alg_name, repeat_ind):\n",
    "        print('Alg Name: '+alg_name+', Repeat ind:'+str(repeat_ind))\n",
    "        theta=self.theta\n",
    "        alg=self.create_alg_inst(theta, alg_name)\n",
    "        for t in range(self.T):\n",
    "            A=self.action_seq\n",
    "            a_t = alg.sample(A)\n",
    "            r_t = np.dot(a_t, theta)+np.random.uniform(-self.sigma,self.sigma)\n",
    "            alg.update(A,t,a_t,r_t)\n",
    "        self.cum_hist[self.alg_list.index(alg_name)][repeat_ind]=alg.cum_regret\n",
    "\n",
    "    \n",
    "    def multi_exp(self):\n",
    "        for rep in range(self.repeat):\n",
    "            self.theta=self.generate_theta(self.s)\n",
    "            self.action_seq=self.generate_action_set()\n",
    "            for alg_name in self.alg_list:\n",
    "                self.single_exp(alg_name,rep)\n",
    "\n",
    "    def plot_result(self):\n",
    "        fonty=14\n",
    "        timesteps=np.arange(self.T)\n",
    "        for idx, alg_name in enumerate(self.alg_list):\n",
    "            # Mean and standard deviation over repetitions\n",
    "            mean=np.mean(self.cum_hist[idx], axis=0)\n",
    "            std=np.std(self.cum_hist[idx], axis=0)\n",
    "            plt.plot(timesteps, mean, label=alg_name)\n",
    "            plt.fill_between(timesteps, mean-std, mean+std, alpha=0.1)\n",
    "        title='s='+str(self.s)\n",
    "        filename='Fixed_action_set_s'+str(self.s)\n",
    "        plt.title(title, weight='bold', fontsize=fonty)\n",
    "        plt.xlabel('Iterations', weight='bold', fontsize=fonty)\n",
    "        plt.ylabel('Cumulative Regret', weight='bold', fontsize=fonty)\n",
    "        plt.legend(frameon=False, fontsize=fonty)\n",
    "        plt.savefig(filename, bbox_inches='tight')\n",
    "        plt.close()\n",
    "    \n",
    "    \n",
    "    def generate_theta(self,sparsity):\n",
    "        theta=np.zeros(self.d)\n",
    "        signal=np.random.standard_normal(sparsity)\n",
    "        #signal[:sparsity//2]=signal[:sparsity//2]*4\n",
    "        signal=signal/LA.norm(signal)\n",
    "        theta[:sparsity]=signal\n",
    "        return theta*self.theta_scale\n",
    "\n",
    "    \n",
    "    def create_alg_inst(self, theta, alg_name):\n",
    "        if alg_name=='SL_Unif':\n",
    "            return SparseLinUCB(theta, self.d,self.T,self.delta, coin_dist=-2)\n",
    "        elif alg_name=='SL_Known':\n",
    "            return SparseLinUCB(theta, self.d,self.T,self.delta, coin_dist=-3)\n",
    "        elif alg_name=='SL_Theory':\n",
    "            return SparseLinUCB(theta, self.d,self.T,self.delta, coin_dist=-1)\n",
    "        elif alg_name=='AL_Unif':\n",
    "            return AdaLinUCB(theta, self.d,self.T,self.delta, coin_dist=0)\n",
    "        elif alg_name=='AL_Theory':\n",
    "            return AdaLinUCB(theta, self.d,self.T,self.delta, coin_dist=0, prior='T')\n",
    "        elif alg_name=='OFUL':\n",
    "            return OFUL(theta,self.d,self.T,self.delta,(self.sigma))\n",
    "        else:\n",
    "            raise ValueError('Unknown algorithm name: '+alg_name)\n",
    "    \n",
    "    def generate_action_set(self):\n",
    "        A=np.random.standard_normal(size=(self.N_a, self.d))\n",
    "        for i in range(self.N_a):\n",
    "            A[i]=A[i]/LA.norm(A[i])\n",
    "        return A"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d0779508",
   "metadata": {},
   "source": [
    "## Main code execution\n",
    "\n",
    "Running the cell below saves the resulting plots as 'Fixed\\_action\\_set\\_s(sparsity level).png' files. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a7bc39d8-80c5-469f-8651-65c88efb7250",
   "metadata": {},
   "outputs": [],
   "source": [
    "sparsity_snr=1\n",
    "exp_snr=[0]*5\n",
    "for j in range(5):\n",
    "    exp_snr[j]=Experiment(sparsity_snr,16,1000,0.01, 1, 30, ['AL_Unif', 'AL_Theory', 'OFUL', 'SL_Unif', 'SL_Known', 'SL_Theory'], repeat=3, action_set='F')\n",
    "    tic_snr=time.time()\n",
    "    exp_snr[j].multi_exp()\n",
    "    toc_snr=time.time()\n",
    "    print(toc_snr-tic_snr)\n",
    "    print('----------------------------------Sparsity: '+str(sparsity_snr)+' Done--------------------------------------')\n",
    "    sparsity_snr=sparsity_snr*2\n",
    "    exp_snr[j].plot_result()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
