{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 32,
   "id": "0a1884c3-daa5-49e3-bddb-b55cb1a3f43a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>document</th>\n",
       "      <th>language</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>\\t  The U.S. Mint:\\t Collectors' Corner\\t  Collecting U.S. Coins \\t\\t  United States coins are made at four Mint facilities:\\t\\tPhiladelphia, Denver, San Francisco, and West Point, NY.\\t One easy way to start your collection is with the circulating coins\\t\\tyou use daily - pennies, nickels, dimes, quarters and dollars.\\t In addition, the U.S. Mint also issues annual proof and uncirculated\\t\\tsets, national medals, and commemorative coins. \\t\\t  Here are categories of U.S. numismatic items yo...</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>\\t  Paternity Guide for Unmarried Parents\\t\\t  A child born to unmarried parents does not automatically have a\\t\\tlegal father.\\t Legal fatherhood can mean a lot to your child and to you as parents.\\t Children with two parents involved in their lives are more likely to\\t\\tsucceed in a variety of ways.\\t This pamphlet will tell you how you can make this important commitment\\t\\tto your child and where to get help doing it.\\t\\t  Why is it important to establish paternity?\\t\\t  * Identity: When ...</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>\\t\\t  General - See inside front cover and pages 8 and 9 of the 1994\\t\\tannual report to stockholders for information about Union Carbide 's business.\\t\\t\\t\\t  On April 27, 1994, stockholders voted to approve the merger of\\t\\tUnion Carbide Corporation ( UCC) into Union Carbide Chemicals and Plastics\\t\\tCompany Inc. ( UCC&amp;P).\\t The merger was effective May 1, 1994.\\t Immediately after the merger, UCC&amp;P had the same consolidated assets,\\t\\tliabilities and stockholders' equity as UCC.\\t UCC&amp;P h...</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>\\t  February 19, 1990\\t  Dear Mike\\t Dr.  Pickles pointed out that John argues in this passage that\\t\\tthe evidence that we are indeed followers of Christ is that we love our\\t\\tfellow-Christians.\\t ( Read the passage for yourselves &amp; see if you agree with him.\\t I think he 's right.)\\t Also, if we live as a true family, loving one another, caring for one\\t\\tanother, and supporting one another, that is the most powerful testimony\\t\\tpossible to the transforming power of the love of Christ in...</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>\\t  2/15/09\\t  Dear Kelly,\\t\\t  Thank you again for offering to help me with this project for Dr.\\t\\tMeyer.\\t I know you're busy - - are 're we all? - - and it 's difficult to find\\t\\tsomeone who is willing to take the time to write, which is somewhat of a lost\\t\\tart form.\\t I enjoy writing, and I wish I had more opportunities to do so.\\t I especially enjoy corresponding with people through letters while\\t\\ttraveling or living abroad.\\t Call me old school, but I think it 's terrific to send...</td>\n",
       "      <td>USA</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2283</th>\n",
       "      <td>\\t  Chapter 2: \\t   'The Canadian Mosaic: A Melting Pot on Ice'  : The\\t\\tIronies of Ethnicity and Race \\t\\t  I enter here upon the general area of ethnicity and race \\t\\t 1   with much interest and not a little trepidation:\\t\\tinterest, because of my own Italian Canadian ethnic background ( I was a\\t\\tBortolotti before becoming a Hutcheon); trepidation, because of the manifest\\t\\tand multiple risks any such discussion entails.\\t Some of these have been suggested in an article by Robert Harn...</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2284</th>\n",
       "      <td>\\t  Introduction \\t\\t  In 1984 Shirley Ittas, a faculty secretary in the film department\\t\\tat York University in Toronto, worked a 35-hour week for\\t\\t$15,745 per year.\\t Shirley's job requirements were many and complex:   \\t she uses her judgement as well as her clerical skills.\\t There are many hectic days when students need quick responses.\\t In fielding in  quiries, providing information about the\\t\\tdepartment's regulations and courses, sorting out hundreds of portfolios and\\t\\tgrades ...</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2285</th>\n",
       "      <td>\\t Dear Sir or Madam: \\t\\t  I would like to apply for the position of English Language Program\\t\\tDirector which was advertised in the Toronto Globe and Mail this October,\\t\\t1995.\\t I have been working in the area of adult ESL/EFL and ESP for 10 years,\\t\\tboth in Winnipeg and in Zaire, Africa.\\t In Zaire I worked as Teacher Training Coordinator training new\\t\\tteachers, observing and evaluating teaching staff, ordering and developing\\t\\tprogram-specific materials, organizing a reference li...</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2286</th>\n",
       "      <td>\\t\\t  Would you please confirm what you said in our telephone\\t\\tconversation last Friday?\\t If so, please initial and return the attached copy of this memo. \\t\\t  1. Any non-credit course given in a department must be obviously\\t\\tlabelled NON-CREDIT. \\t\\t  2. Students in such courses are covered for physical injury in the\\t\\tsame way as is anyone who comes into university premises for any purpose.\\t No further liability insurance is needed.  \\t\\t\\t\\t\\t  From my experience with students in ...</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2287</th>\n",
       "      <td>\\t Dear  Shirley  , \\t\\t  I'm delighted that you're putting together a supplement to\\t\\tcelebrate the anniversary of the MBA programme and draw attention to its\\t\\tachievements. \\t\\t  March 10 isn't very far away, especially with a couple  days'\\t\\t holiday in the middle, and February being a short month!\\t I thought it would be useful to set some time-lines to avoid\\t\\tconfusion. \\t\\t  I understand  Dani Di Franco  is doing the design, and I\\t\\tknow she'll do a good job for you. \\t\\t  She ...</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2288 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 document  \\\n",
       "0     \\t  The U.S. Mint:\\t Collectors' Corner\\t  Collecting U.S. Coins \\t\\t  United States coins are made at four Mint facilities:\\t\\tPhiladelphia, Denver, San Francisco, and West Point, NY.\\t One easy way to start your collection is with the circulating coins\\t\\tyou use daily - pennies, nickels, dimes, quarters and dollars.\\t In addition, the U.S. Mint also issues annual proof and uncirculated\\t\\tsets, national medals, and commemorative coins. \\t\\t  Here are categories of U.S. numismatic items yo...   \n",
       "1     \\t  Paternity Guide for Unmarried Parents\\t\\t  A child born to unmarried parents does not automatically have a\\t\\tlegal father.\\t Legal fatherhood can mean a lot to your child and to you as parents.\\t Children with two parents involved in their lives are more likely to\\t\\tsucceed in a variety of ways.\\t This pamphlet will tell you how you can make this important commitment\\t\\tto your child and where to get help doing it.\\t\\t  Why is it important to establish paternity?\\t\\t  * Identity: When ...   \n",
       "2     \\t\\t  General - See inside front cover and pages 8 and 9 of the 1994\\t\\tannual report to stockholders for information about Union Carbide 's business.\\t\\t\\t\\t  On April 27, 1994, stockholders voted to approve the merger of\\t\\tUnion Carbide Corporation ( UCC) into Union Carbide Chemicals and Plastics\\t\\tCompany Inc. ( UCC&P).\\t The merger was effective May 1, 1994.\\t Immediately after the merger, UCC&P had the same consolidated assets,\\t\\tliabilities and stockholders' equity as UCC.\\t UCC&P h...   \n",
       "3     \\t  February 19, 1990\\t  Dear Mike\\t Dr.  Pickles pointed out that John argues in this passage that\\t\\tthe evidence that we are indeed followers of Christ is that we love our\\t\\tfellow-Christians.\\t ( Read the passage for yourselves & see if you agree with him.\\t I think he 's right.)\\t Also, if we live as a true family, loving one another, caring for one\\t\\tanother, and supporting one another, that is the most powerful testimony\\t\\tpossible to the transforming power of the love of Christ in...   \n",
       "4     \\t  2/15/09\\t  Dear Kelly,\\t\\t  Thank you again for offering to help me with this project for Dr.\\t\\tMeyer.\\t I know you're busy - - are 're we all? - - and it 's difficult to find\\t\\tsomeone who is willing to take the time to write, which is somewhat of a lost\\t\\tart form.\\t I enjoy writing, and I wish I had more opportunities to do so.\\t I especially enjoy corresponding with people through letters while\\t\\ttraveling or living abroad.\\t Call me old school, but I think it 's terrific to send...   \n",
       "...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   ...   \n",
       "2283  \\t  Chapter 2: \\t   'The Canadian Mosaic: A Melting Pot on Ice'  : The\\t\\tIronies of Ethnicity and Race \\t\\t  I enter here upon the general area of ethnicity and race \\t\\t 1   with much interest and not a little trepidation:\\t\\tinterest, because of my own Italian Canadian ethnic background ( I was a\\t\\tBortolotti before becoming a Hutcheon); trepidation, because of the manifest\\t\\tand multiple risks any such discussion entails.\\t Some of these have been suggested in an article by Robert Harn...   \n",
       "2284  \\t  Introduction \\t\\t  In 1984 Shirley Ittas, a faculty secretary in the film department\\t\\tat York University in Toronto, worked a 35-hour week for\\t\\t$15,745 per year.\\t Shirley's job requirements were many and complex:   \\t she uses her judgement as well as her clerical skills.\\t There are many hectic days when students need quick responses.\\t In fielding in  quiries, providing information about the\\t\\tdepartment's regulations and courses, sorting out hundreds of portfolios and\\t\\tgrades ...   \n",
       "2285   \\t Dear Sir or Madam: \\t\\t  I would like to apply for the position of English Language Program\\t\\tDirector which was advertised in the Toronto Globe and Mail this October,\\t\\t1995.\\t I have been working in the area of adult ESL/EFL and ESP for 10 years,\\t\\tboth in Winnipeg and in Zaire, Africa.\\t In Zaire I worked as Teacher Training Coordinator training new\\t\\tteachers, observing and evaluating teaching staff, ordering and developing\\t\\tprogram-specific materials, organizing a reference li...   \n",
       "2286  \\t\\t  Would you please confirm what you said in our telephone\\t\\tconversation last Friday?\\t If so, please initial and return the attached copy of this memo. \\t\\t  1. Any non-credit course given in a department must be obviously\\t\\tlabelled NON-CREDIT. \\t\\t  2. Students in such courses are covered for physical injury in the\\t\\tsame way as is anyone who comes into university premises for any purpose.\\t No further liability insurance is needed.  \\t\\t\\t\\t\\t  From my experience with students in ...   \n",
       "2287   \\t Dear  Shirley  , \\t\\t  I'm delighted that you're putting together a supplement to\\t\\tcelebrate the anniversary of the MBA programme and draw attention to its\\t\\tachievements. \\t\\t  March 10 isn't very far away, especially with a couple  days'\\t\\t holiday in the middle, and February being a short month!\\t I thought it would be useful to set some time-lines to avoid\\t\\tconfusion. \\t\\t  I understand  Dani Di Franco  is doing the design, and I\\t\\tknow she'll do a good job for you. \\t\\t  She ...   \n",
       "\n",
       "     language  \n",
       "0         USA  \n",
       "1         USA  \n",
       "2         USA  \n",
       "3         USA  \n",
       "4         USA  \n",
       "...       ...  \n",
       "2283   Canada  \n",
       "2284   Canada  \n",
       "2285   Canada  \n",
       "2286   Canada  \n",
       "2287   Canada  \n",
       "\n",
       "[2288 rows x 2 columns]"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "pd.set_option('display.max_colwidth', 500)\n",
    "df = pd.read_csv('merged_ice.csv')\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "id": "2c382326-fd90-419b-8aa9-98534744683b",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>document</th>\n",
       "      <th>language</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2088</th>\n",
       "      <td>\\t   MAKING THE GRADE  \\t  by Terry Davis \\t\\t  Thirty years from now, when I have a grandchild on each knee, both\\t\\teager to hear tall tales, I'll probably tell them about the three days I\\t\\tspent on Quebec's Ste. Marguerite River in 1989. \\t\\t  My tale will begin innocuously enough.\\t I'll tell them that I had just flown back from an unsuccessful trip to\\t\\tthe Kegaska, a pret  ty-as-a-picture river on Quebec's lower North Shore,\\t\\tand that I was a little tired when I climbed into my ca...</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2089</th>\n",
       "      <td>\\t   FACULTY OF ENGINEERING &amp; COMPUTER SCIENCE  \\t   Graduate Schedule &amp; Registration Information  \\t   INTRODUCTION  \\t\\t  The graduate schedule and registration information booklet has\\t\\tbeen produced to assist you in registering for your courses and also to\\t\\tfamiliarize you with important dates, procedures and regulations.\\t Please take the time to read the introductory information as it may\\t\\tease the registration and course change process. \\t\\t  The Faculty of Engineering and Comput...</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2090</th>\n",
       "      <td>\\t Dear Ms.  Hood  ; \\t\\t  I am writing to apply for a position as publisher's sales\\t\\trepresentative with your company.\\t I am currently employed at the University of  Langford  Book\\t\\tStore where I have worked for a number of years in the trade and reference\\t\\tdepartment.\\t My responsibilities include ordering, inventory control, pricing,\\t\\tprocessing returns, corresponding with publishers and of course, customer\\t\\tservice.\\t As a member of the reference department, I have worked to ...</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2091</th>\n",
       "      <td>\\t  7 Feb 94 \\t  Dear Mum - \\t\\t  Well, now we are having colder weather - I hope it doesn't snow.\\t All those nice little flowers are going to suffer, not to mention me!\\t\\t\\t\\t  Your long letter arrived on Monday - thanks - and here it is\\t\\tMonday again.\\t It was a busy week - and  Erhard    &amp;   and \\t\\t  Hubert  went to a hockey game at  UCAN  on Friday, we\\t\\twent out to dinner on Saturday -   &amp;   and   yesterday I\\t\\twent out and got a rhubarb root from  Becky Hansen  , the woman I met...</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2092</th>\n",
       "      <td>\\t Dear Kids ( as David Lettermen says), \\t\\t  A happy and prosperous new year to one and all!!!\\t What did you do over the holidays?\\t Trundled  Ramos  and  Rohan  off to Mongolio, I suppose!\\t I'm happy things seem to be going well: nice house, good jobs with no\\t\\tincompetent  Giovannis  or whatever his name was), and lots of travel.\\t\\tBravo! \\t\\t  I finally received some of your mail, the last two pieces sent\\t\\tafter you discovered my plight: NO MAIL FROM  JOHANNE  AND  GREG\\t\\t ! ! !...</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2283</th>\n",
       "      <td>\\t  Chapter 2: \\t   'The Canadian Mosaic: A Melting Pot on Ice'  : The\\t\\tIronies of Ethnicity and Race \\t\\t  I enter here upon the general area of ethnicity and race \\t\\t 1   with much interest and not a little trepidation:\\t\\tinterest, because of my own Italian Canadian ethnic background ( I was a\\t\\tBortolotti before becoming a Hutcheon); trepidation, because of the manifest\\t\\tand multiple risks any such discussion entails.\\t Some of these have been suggested in an article by Robert Harn...</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2284</th>\n",
       "      <td>\\t  Introduction \\t\\t  In 1984 Shirley Ittas, a faculty secretary in the film department\\t\\tat York University in Toronto, worked a 35-hour week for\\t\\t$15,745 per year.\\t Shirley's job requirements were many and complex:   \\t she uses her judgement as well as her clerical skills.\\t There are many hectic days when students need quick responses.\\t In fielding in  quiries, providing information about the\\t\\tdepartment's regulations and courses, sorting out hundreds of portfolios and\\t\\tgrades ...</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2285</th>\n",
       "      <td>\\t Dear Sir or Madam: \\t\\t  I would like to apply for the position of English Language Program\\t\\tDirector which was advertised in the Toronto Globe and Mail this October,\\t\\t1995.\\t I have been working in the area of adult ESL/EFL and ESP for 10 years,\\t\\tboth in Winnipeg and in Zaire, Africa.\\t In Zaire I worked as Teacher Training Coordinator training new\\t\\tteachers, observing and evaluating teaching staff, ordering and developing\\t\\tprogram-specific materials, organizing a reference li...</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2286</th>\n",
       "      <td>\\t\\t  Would you please confirm what you said in our telephone\\t\\tconversation last Friday?\\t If so, please initial and return the attached copy of this memo. \\t\\t  1. Any non-credit course given in a department must be obviously\\t\\tlabelled NON-CREDIT. \\t\\t  2. Students in such courses are covered for physical injury in the\\t\\tsame way as is anyone who comes into university premises for any purpose.\\t No further liability insurance is needed.  \\t\\t\\t\\t\\t  From my experience with students in ...</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2287</th>\n",
       "      <td>\\t Dear  Shirley  , \\t\\t  I'm delighted that you're putting together a supplement to\\t\\tcelebrate the anniversary of the MBA programme and draw attention to its\\t\\tachievements. \\t\\t  March 10 isn't very far away, especially with a couple  days'\\t\\t holiday in the middle, and February being a short month!\\t I thought it would be useful to set some time-lines to avoid\\t\\tconfusion. \\t\\t  I understand  Dani Di Franco  is doing the design, and I\\t\\tknow she'll do a good job for you. \\t\\t  She ...</td>\n",
       "      <td>Canada</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>200 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 document  \\\n",
       "2088  \\t   MAKING THE GRADE  \\t  by Terry Davis \\t\\t  Thirty years from now, when I have a grandchild on each knee, both\\t\\teager to hear tall tales, I'll probably tell them about the three days I\\t\\tspent on Quebec's Ste. Marguerite River in 1989. \\t\\t  My tale will begin innocuously enough.\\t I'll tell them that I had just flown back from an unsuccessful trip to\\t\\tthe Kegaska, a pret  ty-as-a-picture river on Quebec's lower North Shore,\\t\\tand that I was a little tired when I climbed into my ca...   \n",
       "2089  \\t   FACULTY OF ENGINEERING & COMPUTER SCIENCE  \\t   Graduate Schedule & Registration Information  \\t   INTRODUCTION  \\t\\t  The graduate schedule and registration information booklet has\\t\\tbeen produced to assist you in registering for your courses and also to\\t\\tfamiliarize you with important dates, procedures and regulations.\\t Please take the time to read the introductory information as it may\\t\\tease the registration and course change process. \\t\\t  The Faculty of Engineering and Comput...   \n",
       "2090   \\t Dear Ms.  Hood  ; \\t\\t  I am writing to apply for a position as publisher's sales\\t\\trepresentative with your company.\\t I am currently employed at the University of  Langford  Book\\t\\tStore where I have worked for a number of years in the trade and reference\\t\\tdepartment.\\t My responsibilities include ordering, inventory control, pricing,\\t\\tprocessing returns, corresponding with publishers and of course, customer\\t\\tservice.\\t As a member of the reference department, I have worked to ...   \n",
       "2091  \\t  7 Feb 94 \\t  Dear Mum - \\t\\t  Well, now we are having colder weather - I hope it doesn't snow.\\t All those nice little flowers are going to suffer, not to mention me!\\t\\t\\t\\t  Your long letter arrived on Monday - thanks - and here it is\\t\\tMonday again.\\t It was a busy week - and  Erhard    &   and \\t\\t  Hubert  went to a hockey game at  UCAN  on Friday, we\\t\\twent out to dinner on Saturday -   &   and   yesterday I\\t\\twent out and got a rhubarb root from  Becky Hansen  , the woman I met...   \n",
       "2092   \\t Dear Kids ( as David Lettermen says), \\t\\t  A happy and prosperous new year to one and all!!!\\t What did you do over the holidays?\\t Trundled  Ramos  and  Rohan  off to Mongolio, I suppose!\\t I'm happy things seem to be going well: nice house, good jobs with no\\t\\tincompetent  Giovannis  or whatever his name was), and lots of travel.\\t\\tBravo! \\t\\t  I finally received some of your mail, the last two pieces sent\\t\\tafter you discovered my plight: NO MAIL FROM  JOHANNE  AND  GREG\\t\\t ! ! !...   \n",
       "...                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   ...   \n",
       "2283  \\t  Chapter 2: \\t   'The Canadian Mosaic: A Melting Pot on Ice'  : The\\t\\tIronies of Ethnicity and Race \\t\\t  I enter here upon the general area of ethnicity and race \\t\\t 1   with much interest and not a little trepidation:\\t\\tinterest, because of my own Italian Canadian ethnic background ( I was a\\t\\tBortolotti before becoming a Hutcheon); trepidation, because of the manifest\\t\\tand multiple risks any such discussion entails.\\t Some of these have been suggested in an article by Robert Harn...   \n",
       "2284  \\t  Introduction \\t\\t  In 1984 Shirley Ittas, a faculty secretary in the film department\\t\\tat York University in Toronto, worked a 35-hour week for\\t\\t$15,745 per year.\\t Shirley's job requirements were many and complex:   \\t she uses her judgement as well as her clerical skills.\\t There are many hectic days when students need quick responses.\\t In fielding in  quiries, providing information about the\\t\\tdepartment's regulations and courses, sorting out hundreds of portfolios and\\t\\tgrades ...   \n",
       "2285   \\t Dear Sir or Madam: \\t\\t  I would like to apply for the position of English Language Program\\t\\tDirector which was advertised in the Toronto Globe and Mail this October,\\t\\t1995.\\t I have been working in the area of adult ESL/EFL and ESP for 10 years,\\t\\tboth in Winnipeg and in Zaire, Africa.\\t In Zaire I worked as Teacher Training Coordinator training new\\t\\tteachers, observing and evaluating teaching staff, ordering and developing\\t\\tprogram-specific materials, organizing a reference li...   \n",
       "2286  \\t\\t  Would you please confirm what you said in our telephone\\t\\tconversation last Friday?\\t If so, please initial and return the attached copy of this memo. \\t\\t  1. Any non-credit course given in a department must be obviously\\t\\tlabelled NON-CREDIT. \\t\\t  2. Students in such courses are covered for physical injury in the\\t\\tsame way as is anyone who comes into university premises for any purpose.\\t No further liability insurance is needed.  \\t\\t\\t\\t\\t  From my experience with students in ...   \n",
       "2287   \\t Dear  Shirley  , \\t\\t  I'm delighted that you're putting together a supplement to\\t\\tcelebrate the anniversary of the MBA programme and draw attention to its\\t\\tachievements. \\t\\t  March 10 isn't very far away, especially with a couple  days'\\t\\t holiday in the middle, and February being a short month!\\t I thought it would be useful to set some time-lines to avoid\\t\\tconfusion. \\t\\t  I understand  Dani Di Franco  is doing the design, and I\\t\\tknow she'll do a good job for you. \\t\\t  She ...   \n",
       "\n",
       "     language  \n",
       "2088   Canada  \n",
       "2089   Canada  \n",
       "2090   Canada  \n",
       "2091   Canada  \n",
       "2092   Canada  \n",
       "...       ...  \n",
       "2283   Canada  \n",
       "2284   Canada  \n",
       "2285   Canada  \n",
       "2286   Canada  \n",
       "2287   Canada  \n",
       "\n",
       "[200 rows x 2 columns]"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = df[df.language == \"Canada\"]\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "id": "5d437fdb-e91a-4313-9bf1-20b66f09dc7f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['\\t\\t  The first sector to capitalize on genetic engineering was health\\t\\tcare, particularly in the US.\\t In 1983, 10 years after Boyer and Cohen performed the first\\t\\trecombinant DNA experiment, scientists cultivated large quanti  ties of\\t\\tthe human gene responsible for synthesizing insulin and marketed it as a\\t\\ttreatment for diabetics.\\t Among the most commercially significant products to emerge since the\\t\\tearly 1980s, in addition to insulin, are human growth hormone ( which combats\\t\\tdwarfism), alpha interferon ( an immune-system stimulator which helps ward\\t\\toff disease), tissue plasmino  gen activator ( which can dissolve blood\\t\\tclots in heart-attack patients), a hepatitis-B vaccine, Ortho \\t\\tPharmaceuticals\\'  OKT-3 ( which helps prevent kidney transplant\\t\\trejection), and a drug called Epogen ( which fights ane  mia).\\t Annual worldwide sales of just these seven products have reached\\t\\t&dollar;1.25 billion US ( Gianturco 208).\\t Worldwide sales of all biotechnology products last year reached\\t\\t&dollar;6 billion.\\t By the turn of the century sales are expected to soar to more than\\t\\t&dollar;100 billion (  Canadian Biotech  12). \\t\\t  That remarkable growth is now being fuelled by rapidly evolving\\t\\tpatent legislation.\\t In perhaps the most sensational event in the brief history of modern\\t\\tbiotechnology, the US Patent and Trademark Office in 1987 passed a policy\\t\\tstating that it  \" considers non-naturally occurring nonhu  man\\t\\tmulticellular living organisms, including animals, to be patentable subject\\t\\tmatter\"  ( Wallis 78).\\t That declaration prompted fierce opposition from animal-rights\\t\\tactivists who feared the policy would promote the exploratory production of\\t\\tmisfit animals incapable of fighting pain or dis  ease.\\t In Canada, after 10 years of controversy, Parliament this summer\\t\\tpassed Bill C-15, the plant  breeders\\'  rights act.\\t Though not as sweeping as the US policy, the legislation for the\\t\\tfirst time puts plant breeders on a par with inventors of machinery and\\t\\tmanufacturing processes, permitting them to protect their new plant varieties\\t\\twith 18-year patents.\\t As a result, plant breeders will not only be able to collect\\t\\troyalties for the seeds they sell but also for the seed adult plants\\t\\tsubsequently produce.\\t Since the bill was introduced in its original form by the liberal\\t\\tgovernment in 1980, some farm groups, led by the National  Farmers\\' \\t\\tUnion, have been insist  ing it will hand over control of the food supply\\t\\tto a handful of large multi-national corporations. \\t\\t  Even in the absence of incentive legislation, however, the\\t\\tCanadian biotechnology industry\\'s growth to date has been impressive.\\t Sales of Canadian biotechnology products reached &dollar;660 million\\t\\tin 1988, while the industry as a whole posted a net after-tax loss of only\\t\\t&dollar;3 million, a remarkably strong performance for an infant industry\\t\\tthat is having to invest considerably in manufacturing facilities and\\t\\tmarketing efforts.\\t Dur  ing that same year, Canada\\'s biotechnology companies spent a\\t\\tsubstantial &dollar;275 million on research and development.\\t While many of Canada\\'s biotechnology enterprises are newly created\\t\\tdepartments of large indus  trial firms, most are privately owned\\t\\tstart-ups employing fewer than 50 people. \\t\\t  Building on Canada\\'s historical strength in agricultural,\\t\\tmetallurgical, and forestry research, the industry is distinguished by its\\t\\tprimary focus on resource-based products and by its diversity.\\t A recent survey of the indus  try sponsored by the Department of\\t\\tIndustry, Science and Technology and the National Research Council ( NRC )\\t\\treveals that:   \\t The products and processes invented and sold by Canadian\\t\\tbiotechnology compa  nies involve nearly every industrial sector.\\t They include cloned varieties of orna  mental plants, bioleaching\\t\\tin the mining of uranium and gold, quick tip-of-the-tongue tests to measure\\t\\tblood alcohol, anaerobic digestion systems for the treatment of pulp mill\\t\\teffluents, the world\\'s first conjugate vaccine, cattle improvement through\\t\\tnuclear transplantation and embryo cloning, monoclonal antibodies for blood\\t\\ttyping, soil microbes to improve plant growth, diagnostic kits for AIDS, the\\t\\tbrewing of beer, biological pesticides, and mass production of bio \\t\\tlogical reagents from eggs and plants. (  Canadian Biotech  1)\\t\\t  \\t\\t  According to that survey, more than 10,000 products are currently\\t\\tunder development, 76% of which are being developed by seed compa  nies.\\t\\t\\t\\t  The Canadian industry\\'s plans for future growth are equally\\t\\timpres  sive.\\t Companies plan to spend almost &dollar;7 million each on new manufac\\t\\t turing facilities by 1992.\\t By the same year, the industry expects to hire 5,000 new employees,\\t\\tnearly doubling its 1989 work-force.\\t Sales are expected to grow at an average annual rate of 46% between\\t\\t1988 and 1992, reaching &dollar;5 billion.\\t The expectations seem realistic, too.\\t About half of the 220 companies involved in biotechnology in Canada\\t\\treported net profits in 1988.\\t Overall, the firms are less indebted than those of many established\\t\\tindustrial sectors, with an average debt-to-equity ratio in 1988 of 0.47.\\t The average assets-to-liability ratio for the industry was a\\t\\trelatively healthy 2.3 (  Canadian Biotech  6). \\t\\t  Such statistics reveal a vibrant and remarkably strong industry.\\t Yet the overall picture points to more than an increase in efficiency\\t\\tand quantity of research.\\t It also reflects an impressive shift in demographics.\\t Private-sector involvement is rapidly growing against public-sector\\t\\tactivity.\\t Where once research proceeded almost exclusively according to\\t\\tpublic-policy objectives and at the discretion of government funding bodies,\\t\\tprivate capital is fuelling an expanding proportion of the new growth.\\t The dynamics of the marketplace will inevitably play an increasingly\\t\\tvital role in the guidance of that growth.\\t Clearly, the power and priorities of Cana  dian biotechnology\\t\\tresearch are changing.\\t What will some of those early priorities be, and what are the\\t\\timplications for public policy? \\t  II \\t\\t  The risks associated with the progress of biotechnology can be\\t\\tdivided into two categories, those that arise from the research activities\\t\\tthemselves and those that arise from the application of its products.\\t With the Canadian biotechnology industry still in its infancy, and\\t\\tthe bulk of its products still in development, the most imminent risks to\\t\\tconsider stem from the pro  cess, as opposed to the application, of\\t\\tresearch.\\t Ominous signs of the level of ignorance among biotechnology\\t\\tresearchers of the regulations that apply to their industry are already\\t\\tevident.\\t In the survey of 84 companies conducted by the federal government and\\t\\tthe NRC from November 1988 to February 1989, only 29 respondents were\\t\\tfamiliar with the Canadian Environmental Protection Act and its implications\\t\\tfor industrial biotech  nology.\\t Although ignorance of regulations will not be tolerated as a plea by\\t\\tthe courts, that can only be cold comfort when irresponsible practices have a\\t\\tvast potential to alter - or even destroy - parts of   the   the\\t\\t  environ  ment.\\t If biotechnology has the capacity to do irrevocable damage to humans\\t\\tor their surroundings, retroactive enforcement through punish  ment could\\t\\tbe irrelevant.\\t Measures to ensure compliance will be - and clearly already are -\\t\\tessential.\\t Regulatory authorities will thus have to work alongside the\\t\\tindustrial biotechnology community to assist in mak  ing every member\\t\\taware of the laws and guidelines that apply to their practices. \\t\\t  Enforcement presents another challenge.\\t As in nuclear research, the containment of potentially hazardous\\t\\tmaterial is a genuine problem, albeit only in a very small minority of cases.\\t While established firms are likely to conform voluntarily with public\\t\\tpolicies associated with research methods, smaller firms, which prevail in\\t\\tbiotechnology, are likely to take more risks.\\t This situation stands in stark contrast to the nuclear industry\\t\\twhere, because of the scale required by the research and its complexity,\\t\\tsmall companies are effectively precluded from participating.\\t A sound regulatory policy should contain provisions for random\\t\\tinvestigation of private laboratories - not unlike the scrutiny to which the\\t\\tfood industry is subject - to ensure that proper measures are taken to\\t\\tminimize the acci  dental release of potentially hazardous organisms. \\t\\t  The risks associated with the testing or application of products\\t\\tin the envi  ronment, especially micro-organisms, represent perhaps the\\t\\tmost critical area for regulation ( Doyle 50).\\t After all, the goal of biotechnology  is  to alter the\\t\\tenvironment.\\t Concerns for the long-term, broad ecological impact of biotechnology\\t\\twill have special relevance to Canadian research and development, where the\\t\\tvast majority of products are emerging from the natural-resource sectors,\\t\\tparticularly agriculture.\\t Monitoring and con  trolling the effects of that alteration will\\t\\tbe the key issues. \\t  III \\t\\t  When considering ramifications of the products of biotechnology,\\t\\tthe vast majority of which have yet to enter the market, an immediate dilemma\\t\\tpresents itself.\\t Is it possible to regulate effectively an enterprise in which\\t\\tpotential risks have yet to be uncovered?\\t The constant pressure to antici  pate risks means regulatory\\t\\tagencies must place strong emphasis on the evolution of policy itself.\\t The explosive growth of the industry underscores the urgency of that\\t\\tneed.\\t Co-operation from private industry would help identify areas of\\t\\tpotential public concern, but it is insufficient.\\t Neutral expertise is critical.\\t In an era when so many publicly-funded research institutions have\\t\\tseen their direct government support diminish against the growth of even\\t\\tmoderately expanding industries, and have seen their historical neutrality\\t\\tcompromised by the growing tendency to make federal funding for research\\t\\tcontingent upon industrial invest  ment, the capacity of the vibrant\\t\\tprivate biotechnology industry seems poised to outstrip the resources of\\t\\tthose very scientists whose indepen  dent expertise will be critical for\\t\\tinforming public policy.\\t Even if funding for public research were to be maintained at a\\t\\trelatively high level of 4%-6% of real growth, it is hardly a match for the\\t\\t46% annual rate at which the industry plans to expand.\\t Government and university research efforts in biotechnology cannot be\\t\\tallowed to languish in their present state of underfunding.\\t Among other things, public and university laboratories will have to\\t\\tstep up research into mathematical modelling and controlled-field testing in\\t\\torder to establish a predictive ecology before widespread release takes\\t\\tplace. \\t\\t  Another systematic trend that may compromise the goal of a sound\\t\\tand publicly supported regulatory system is evident in the the manner in\\t\\twhich the newly formed National Biotechnology Advisory Committee operates -\\t\\tnamely, under the auspices of the federal Department of Industry, Science and\\t\\tTechnology.\\t A founding priority of this department is to promote the  \"\\t\\tdevelopment, exploitation and applications of strategic technologies to\\t\\timprove Canada\\'s international competitiveness\"  (  \" Regu \\t\\tlatory Concerns\"  ) .\\t Environment Canada, on the other hand, has no such mandate, and its\\t\\tadvice ought to take a prime role in the formulation of regulations.\\t Can the Department of Industry, Science and Technology and the\\t\\tregulatory advisory body it oversees continue to be regarded as neutral\\t\\tsources of scrutiny by the biotechnology industry\\'s present critics, while\\t\\tthat department offers direct or indirect subsidies to industry and publicly\\t\\tchampions industry\\'s attempts to become globally competitive? \\t\\t  Even more important, because more than three-quarters of the prod\\t\\t ucts under development in Canada fall into the category of seed research,\\t\\tCanadian regulations will require specific attention to the impact of this\\t\\tresearch on accidental transmission and genetic diversity.\\t Plant research, unlike chemical, pharmaceutical, and animalian\\t\\tresearch, will in many cases require widespread dispersal of experimental\\t\\tgenetic material into the environment.\\t Geographically dispersed material is generally more sus  ceptible\\t\\tto the influences of a host of uncontrollable environmental factors, such as\\t\\tbioleaching into soil and transport by wind, micro-organisms, and animals.\\t Measures for assessing the potential of accidental transport, as well\\t\\tas the eventual fate of experimental material in the environment, will be\\t\\tcrucial. \\t\\t  Because agricultural applications of biotechnology can be\\t\\texpected to produce key crops with higher commercial value, it can be\\t\\texpected that biotechnology companies will find willing customers in the\\t\\tCanadian farming industry.\\t Measures should be taken to ensure that any reduction in genetic\\t\\tdiversity through the widespread planting of these key crops will not make\\t\\tthe food supply more susceptible to genetic diseases.\\t This may entail preserving a greater number of existing varieties in\\t\\tseed banks.\\t The potential for lost diversity also raises an interesting question\\t\\tfor the issue of farm support.\\t Will farmers who attempt to remain competitive by planting only those\\t\\tcrops with the highest commercial value be required to bear the burden of\\t\\tsudden losses to their incomes if their crops are wiped out by disease? \\t\\t  Insurance against such catastrophes might be obtained in another,\\t\\tpotentially more controversial way.\\t Even if consumers show a strong demand for new  \" super\"\\t\\t varieties, it is possible to  promote  the desirable goal\\t\\tof genetic diversity by taxing the consumption of these crops.\\t While the idea of limiting consumer choice through disincentives\\t\\tmight seem antithetical to a market-driven economy, it is nevertheless\\t\\tconsistent with a society that places high value on environmental security\\t\\tand human health.\\t Many western societies have already set precedents for\\t\\tmarket-intervention where public health and safety have been concerned.\\t Tobacco taxes, despite their threat to some  farmers\\' \\t\\tinterests, have enjoyed widespread public support in many countries,\\t\\tincluding Canada.\\t Would it be unthinkable, then, to impose taxes on crop varieties\\t\\tthat, because of their popularity, limit diversity and thus threaten the food\\t\\tsupply?\\t It has been suggested in the case of tobacco, and not without\\t\\treasonable justifi  cation, that a portion of the proceeds from taxes be\\t\\tdiverted to hospitals for the treatment of cancer and heart disease.\\t Would it be unreasonable to divert the proceeds from taxes on popular\\t\\tcrops toward maintaining seed banks? \\t\\t  As important as any specific regulatory consideration, finally,\\t\\tis the active participation of the public throughout the advisory process.\\t\\t \\t']\n"
     ]
    }
   ],
   "source": [
    "df_n = df[df[\"document\"].str.contains(\" dian \", na=False)]\n",
    "print(df_n[[\"document\"]].values[0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28d6bccf-54b9-4dca-b361-fa4199f4bdf6",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "90b13463-d3b8-4372-a42b-0ed0a6a94ff9",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
