{"cells":[{"cell_type":"markdown","metadata":{"id":"5DYTYEG-sFl9"},"source":["# Connect Colab to a Chameleon server\n","\n","This notebook describes how to connect Colab to a server running on Chameleon. This allows you to run experiments requiring bare metal access, storage, memory, GPU, and compute that exceed the abilities of Colab's hosted runtime, but with Colab's familiar interface (and notebooks stored in your Google Drive). It also allows you to easily go back and forth between the convenience of Colab's hosted runtime and Chameleon's greater capabilities, depending on the needs of your experiment.\n","\n","More details about Chameleon can be found here: https://teaching-on-testbeds.github.io/hello-chameleon/"]},{"cell_type":"markdown","metadata":{"id":"xz9uTFYjsFl-"},"source":["## Provision the resource\n"]},{"cell_type":"markdown","metadata":{"id":"WRt9yqpmsFl-"},"source":["### Check resource availability"]},{"cell_type":"markdown","metadata":{"id":"CWMbtu0isFl-"},"source":["This notebook will try to reserve a bare metal Ubuntu server with an RTX 6000 GPU on CHI@UC, pending availability. Before you begin, you should check the host calendar at [https://chi.uc.chameleoncloud.org/project/leases/calendar/host/](https://chi.uc.chameleoncloud.org/project/leases/calendar/host/). 
In the \"Node Type\" dropdown, filter on `gpu_rtx_6000` and make sure some hosts are available."]},{"cell_type":"markdown","metadata":{"id":"myRxLqgJsFl_"},"source":["### Chameleon configuration\n","\n","You can change your Chameleon project name (if not using the one that is automatically configured in the JupyterHub environment) and the site on which to reserve resources (depending on availability) in the following cell."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"7oS1DF8jsFl_"},"outputs":[],"source":["import chi, os\n","\n","PROJECT_NAME = os.getenv('OS_PROJECT_NAME')\n","chi.use_site(\"CHI@UC\")\n","chi.set(\"project_name\", PROJECT_NAME)\n"]},{"cell_type":"markdown","metadata":{"id":"sHfK8X7DsFmA"},"source":["If you need to change the details of the Chameleon server, e.g. use a different OS image, or a different node type depending on availability, you can do that in the following cell."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"OXRHSuB8sFmA"},"outputs":[],"source":["chi.set(\"image\", \"CC-Ubuntu20.04\")\n","# note: we use base Ubuntu because we want a newer CUDA than is in Chameleon's Ubuntu+CUDA image\n","NODE_TYPE = \"gpu_rtx_6000\""]},{"cell_type":"markdown","metadata":{"id":"Wnsfy8SHsFmA"},"source":["### Reservation\n","\n","The following cell will create a reservation that begins now and ends in 6 days and 8 hours, matching the `days=6, hours=8` arguments passed to `lease.lease_duration`. 
You can modify the start and end date as needed."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"p3GWaubgsFmA"},"outputs":[],"source":["from chi import lease\n","\n","\n","res = []\n","lease.add_node_reservation(res, node_type=NODE_TYPE, count=1)\n","lease.add_fip_reservation(res, count=1)\n","start_date, end_date = lease.lease_duration(days=6, hours=8)\n","\n","l = lease.create_lease(f\"{os.getenv('USER')}-{NODE_TYPE}-ubuntu_2\", res, start_date=start_date, end_date=end_date)\n","l = lease.wait_for_active(l[\"id\"])"]},{"cell_type":"markdown","metadata":{"id":"KQ-vtPSBsFmB"},"source":["### Provisioning resources\n","\n","This cell provisions resources. It will take approximately 10 minutes. You can check on its status in the Chameleon web-based UI: [https://chi.uc.chameleoncloud.org/project/instances/](https://chi.uc.chameleoncloud.org/project/instances/), then come back here when it is in the READY state."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"DxY6jBZCsFmB"},"outputs":[],"source":["from chi import server\n","\n","reservation_id = lease.get_node_reservation(l[\"id\"])\n","server.create_server(\n","    f\"{os.getenv('USER')}-{NODE_TYPE}-ubuntu_2\", \n","    reservation_id=reservation_id,\n","    image_name=chi.get(\"image\")\n",")\n","server_id = server.get_server_id(f\"{os.getenv('USER')}-{NODE_TYPE}-ubuntu_2\")\n","server.wait_for_active(server_id)"]},{"cell_type":"markdown","metadata":{"id":"ONyDhWU5sFmB"},"source":["Associate an IP address with this server:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"YAVGslktsFmB"},"outputs":[],"source":["reserved_fip = lease.get_reserved_floating_ips(l[\"id\"])[0]\n","server.associate_floating_ip(server_id,reserved_fip)"]},{"cell_type":"markdown","metadata":{"id":"oqkhpyUHsFmB"},"source":["and wait for it to come up:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"XWUsJmlUsFmB"},"outputs":[],"source":["server.wait_for_tcp(reserved_fip, 
port=22)"]},{"cell_type":"markdown","metadata":{"id":"fBRAsxJLsFmB"},"source":["## Install stuff"]},{"cell_type":"markdown","metadata":{"id":"dhuj6zQDsFmC"},"source":["The following cells will install some basic packages in order to connect your Colab frontend to your Chameleon server. However, you may want to log in to your Chameleon server in order to access its terminal and install or configure packages outside of Colab.\n","\n","To log in to the resource, use File > New > Terminal in the Chameleon JupyterHub environment, or your local terminal, and run the command printed by the following cell:\n"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"JPl-NMVRsFmC"},"outputs":[],"source":["print(\"ssh cc@\" + reserved_fip)"]},{"cell_type":"markdown","metadata":{"id":"Ik6hW0o-sFmC"},"source":["Meanwhile, install Python tooling (`pip` and the Python development headers) and an updated CUDA on your resource. Jupyter itself is installed in a later step:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"eUYlRUiLsFmC"},"outputs":[],"source":["from chi import ssh\n","\n","node = ssh.Remote(reserved_fip)"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"KaKuMU5psFmC"},"outputs":[],"source":["node.run('sudo apt update')\n","node.run('sudo apt -y install python3-pip python3-dev')\n","node.run('sudo pip3 install --upgrade pip')"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"oRyPUp0-sFmC"},"outputs":[],"source":["node.run('wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb')\n","node.run('sudo dpkg -i cuda-keyring_1.0-1_all.deb')\n","node.run('sudo apt update')\n","node.run('sudo apt -y install linux-headers-$(uname -r)')\n","node.run('sudo apt -y install cuda-11-8')\n","node.run('sudo apt -y install nvidia-gds') # install instructions say to do this separately!\n","node.run('sudo apt -y install libcudnn8')"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"w0TJN9etsFmC"},"outputs":[],"source":["node.run(\"echo 
'PATH=\\\"/usr/local/cuda-11.8/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin\\\"' | sudo tee /etc/environment\")"]},{"cell_type":"markdown","metadata":{"id":"84XvbXoAsFmC"},"source":["Now we have to reboot, and make sure we have the updated CUDA (11.8):"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Bl4xbkAosFmC"},"outputs":[],"source":["try:\n","    node.run('sudo reboot')\n","except Exception:\n","    pass  # the SSH connection is expected to drop when the server reboots\n","server.wait_for_tcp(reserved_fip, port=22)\n","node = ssh.Remote(reserved_fip) # note: need a new SSH session to get new PATH\n","node.run('nvidia-smi')\n","node.run('nvcc --version')"]},{"cell_type":"markdown","metadata":{"id":"nOpzCnY5sFmD"},"source":["#### Optional: Install Python packages"]},{"cell_type":"markdown","metadata":{"id":"sH4diWh4sFmD"},"source":["Before starting your Jupyter instance, you may want to install some Python packages.\n","\n","The following cell will install the *same* versions of some key deep learning packages as are installed on Colab (as of November 2022), for maximum cross-compatibility. 
"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"3TC7pu6bsFmD"},"outputs":[],"source":["node.run('python3 -m pip install --user Cython==0.29.32')\n","node.run('wget https://raw.githubusercontent.com/teaching-on-testbeds/colab/main/requirements_chameleon_dl.txt -O requirements_chameleon_dl.txt')\n","node.run('wget https://raw.githubusercontent.com/teaching-on-testbeds/colab/main/requirements_chameleon.txt -O requirements_chameleon.txt')\n","node.run('python3 -m pip install --user -r requirements_chameleon_dl.txt --extra-index-url https://download.pytorch.org/whl/cu113 -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html')"]},{"cell_type":"markdown","metadata":{"id":"bcLmvlsisFmD"},"source":["If you need additional packages, you can install them in a similar manner."]},{"cell_type":"markdown","metadata":{"id":"bYbucfC-sFmD"},"source":["Test your installation - make sure TensorFlow, PyTorch, and JAX can all see the GPU:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"zyZdfcLJsFmD"},"outputs":[],"source":["node.run('python3 -c \\'import tensorflow as tf; print(tf.config.list_physical_devices(\"GPU\"))\\'')\n","# should say: PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"e9vFEHNxsFmD"},"outputs":[],"source":["node.run('python3 -c \\'import torch; print(torch.cuda.get_device_name(0))\\'')\n","# should say: Quadro RTX 6000"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"Upuue7WnsFmD"},"outputs":[],"source":["node.run('python3 -c \\'import jax; print(jax.devices())\\'')\n","# should say: StreamExecutorGpuDevice(id=0, process_index=0, slice_index=0)"]},{"cell_type":"markdown","metadata":{"id":"GKctKfDxsFmD"},"source":["### Set up Jupyter on server"]},{"cell_type":"markdown","metadata":{"id":"hL-WK5dlsFmD"},"source":["Install `jupyter_http_over_ws`, which is required in order to connect Colab to this Jupyter 
instance:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"qOaIlbfIsFmD"},"outputs":[],"source":["node.run('python3 -m pip install --user  jupyter-core jupyter-client jupyter_http_over_ws traitlets -U --force-reinstall')"]},{"cell_type":"markdown","metadata":{"id":"BHFeN3TnsFmE"},"source":["Then, activate `jupyter_http_over_ws`:"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"iUey7YrWsFmE"},"outputs":[],"source":["node.run('/home/cc/.local/bin/jupyter serverextension enable --py jupyter_http_over_ws')"]},{"cell_type":"markdown","metadata":{"id":"D7MzjU5DsFmE"},"source":["## Connect Colab to the server"]},{"cell_type":"markdown","metadata":{"id":"QU7pJnuvsFmE"},"source":["In a **local terminal on your own laptop**, run the command printed by the following cell"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"JIgY390CsFmE"},"outputs":[],"source":["print('ssh -L 127.0.0.1:8888:127.0.0.1:8888 cc@' + reserved_fip) "]},{"cell_type":"markdown","metadata":{"id":"6mHZ21_XsFmE"},"source":["to set up a tunnel to the Jupyter server. If your Chameleon key is not in the default location, you should also specify the path to your key as an argument, using `-i`. 
Leave this SSH session open."]},{"cell_type":"markdown","metadata":{"id":"lJ2pEer9sFmE"},"source":["Then, run the following cell, which will run a command that does not terminate: "]},{"cell_type":"code","execution_count":null,"metadata":{"id":"1fi4m3-osFmE"},"outputs":[],"source":["node.run(\"/home/cc/.local/bin/jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' --port=8888 --NotebookApp.port_retries=0\")"]},{"cell_type":"markdown","metadata":{"id":"vTLaRD0SsFmE"},"source":["In the output of the cell above, look for a URL in this format:\n","    \n","```\n","http://localhost:8888/?token=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\n","```"]},{"cell_type":"markdown","metadata":{"id":"5YKRbbwhsFmE"},"source":["Copy this URL - you will need it in the next step."]},{"cell_type":"markdown","metadata":{"id":"pJ_t19FtsFmE"},"source":["Now, you can open Colab in a browser. Click on the drop-down menu for \"Connect\" in the top right and select \"Connect to a local runtime\". Paste the URL you copied earlier into the space and click \"Connect\". 
Your notebook should now be running on your Chameleon server (you can put `!hostname` in a cell and run it to verify!)\n"]},{"cell_type":"markdown","metadata":{"id":"1mVzNZb-sFmE"},"source":["## Release resources\n","\n","If you finish with your experimentation before your lease expires, release your resources and tear down your environment by running the following (deletion is disabled by default to prevent accidental deletions; set `DELETE = True` to enable it).\n","\n","This section is designed to work as a \"standalone\" portion - you can come back to this notebook, ignore the top part, and just run this section to delete your resources."]},{"cell_type":"code","execution_count":null,"metadata":{"id":"R056jSbjsFmE"},"outputs":[],"source":["# setup environment - if you made any changes in the top part, make the same changes here\n","import chi, os\n","from chi import lease, server\n","\n","PROJECT_NAME = os.getenv('OS_PROJECT_NAME')\n","chi.use_site(\"CHI@UC\")\n","chi.set(\"project_name\", PROJECT_NAME)\n","\n","NODE_TYPE = \"gpu_rtx_6000\"\n","# the name must match the lease created in the top part (note the _2 suffix)\n","lease = chi.lease.get_lease(f\"{os.getenv('USER')}-{NODE_TYPE}-ubuntu_2\")"]},{"cell_type":"code","execution_count":null,"metadata":{"id":"gYvBD3P2sFmF"},"outputs":[],"source":["DELETE = False\n","# DELETE = True \n","\n","if DELETE:\n","    # delete server (the name must match the server created in the top part)\n","    server_id = chi.server.get_server_id(f\"{os.getenv('USER')}-{NODE_TYPE}-ubuntu_2\")\n","    chi.server.delete_server(server_id)\n","\n","    # release floating IP\n","    reserved_fip =  chi.lease.get_reserved_floating_ips(lease[\"id\"])[0]\n","    ip_info = chi.network.get_floating_ip(reserved_fip)\n","    chi.neutron().delete_floatingip(ip_info[\"id\"])\n","\n","    # delete lease\n","    chi.lease.delete_lease(lease[\"id\"])\n"]}],"metadata":{"kernelspec":{"display_name":"Python 3 
(ipykernel)","language":"python","name":"python3"},"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.10.6"},"colab":{"provenance":[]}},"nbformat":4,"nbformat_minor":0}