# USER:
╭────────────────────────────────────────────────────────────────────╮
│  > Write a scraper for me to search for article titles and abstracts on Google Scholar with specified keywords │
╰────────────────────────────────────────────────────────────────────╯

# AGENT:
 
 ╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
 │ ✔ GoogleSearch Searching the web for: "google scholar search url structure"                                                                             │
 │                                                                                                                                                          │
 │    Search results for "google scholar search url structure" returned.                                                                                    │
 ╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
✦ Okay, I can help you write a Python scraper to search for article titles and abstracts on Google Scholar.

  Please note that Google Scholar uses anti-scraping measures (CAPTCHAs, IP blocking, and so on), so the scraper may fail if you run it frequently. To reduce that risk, the code sends a browser-like User-Agent header, but this does not guarantee success. Getting past stronger defenses may require proxy IP pools, CAPTCHA recognition services, or similar tools.

  Here is the Python code to implement this functionality:

```
import requests
from bs4 import BeautifulSoup

def search_google_scholar(keyword):
    """
    Search Google Scholar for article titles and abstracts with specified keywords.

    Args:
        keyword (str): The keyword to search for.

    Returns:
        list: A list of dictionaries containing article titles and abstracts.
              Each dictionary contains 'title' and 'abstract' keys.
              If abstract cannot be retrieved, 'abstract' will be None.
    """
    base_url = "https://scholar.google.com/scholar?q="
    search_url = base_url + requests.utils.quote(keyword)  # URL-encode the keyword

    # Send a browser-like User-Agent to reduce the chance of being identified as a scraper
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
    }

    articles = []

    try:
        response = requests.get(search_url, headers=headers, timeout=10)
        response.raise_for_status()  # Raise an exception on an HTTP error status

        soup = BeautifulSoup(response.text, 'html.parser')

        # Find all article results
        # Google Scholar article results are usually in divs with class 'gs_r gs_or gs_scl'
        for article_div in soup.find_all('div', class_='gs_r gs_or gs_scl'):
            title_tag = article_div.find('h3', class_='gs_rt')
            abstract_tag = article_div.find('div', class_='gs_rs')

            title = title_tag.a.get_text(strip=True) if title_tag and title_tag.a else "N/A"
            abstract = abstract_tag.get_text(strip=True) if abstract_tag else None

            # Clean up the "..." prefix from the abstract if it exists
            if abstract and abstract.startswith("... "):
                abstract = abstract[4:]

            articles.append({
                'title': title,
                'abstract': abstract
            })

    except requests.exceptions.RequestException as e:
        print(f"Request error: {e}")
    except Exception as e:
        print(f"Parsing error: {e}")

    return articles

if __name__ == "__main__":
    search_keyword = input("Please enter the keyword you want to search for: ")
    results = search_google_scholar(search_keyword)

    if results:
        print(f"\nArticles found on Google Scholar about '{search_keyword}':")
        for i, article in enumerate(results):
            print(f"\n--- Article {i+1} ---")
            print(f"Title: {article['title']}")
            if article['abstract']:
                print(f"Abstract: {article['abstract']}")
            else:
                print("Abstract: No abstract found.")
    else:
        print(f"No articles found about '{search_keyword}' or an error occurred.")
```
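  To see what the URL-encoding step produces without sending any request, the same construction can be reproduced with the standard library alone. This is a minimal sketch: `build_search_url` is a hypothetical helper name that is not part of the scraper, and it uses `urllib.parse.quote`, which, as far as I know, is the function that `requests.utils.quote` re-exports.

```python
from urllib.parse import quote

BASE_URL = "https://scholar.google.com/scholar?q="

def build_search_url(keyword: str) -> str:
    """Return the search URL for a keyword, percent-encoding unsafe characters."""
    return BASE_URL + quote(keyword)

print(build_search_url("graph neural networks"))
# https://scholar.google.com/scholar?q=graph%20neural%20networks
```

  Spaces become `%20` and characters such as `+` or `&` are escaped, so a keyword cannot accidentally add extra query parameters to the URL.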
  How to use this code:

   1. Save the code: Save the code above as a .py file, for example google_scholar_scraper.py.
   2. Install dependencies: If you haven't installed the requests and beautifulsoup4 libraries yet, open a terminal or command prompt and run:
          pip install requests beautifulsoup4
   3. Run the code: In a terminal or command prompt, navigate to the directory where you saved the file, then run:
          python google_scholar_scraper.py
   4. Enter a keyword: When prompted, type the keyword you want to search for and press Enter.
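
  If you would rather keep the results than read them in the terminal, the list returned by `search_google_scholar` can be written to a CSV file. The following is a minimal sketch, assuming the list-of-dictionaries shape described in the docstring; `save_results_to_csv` is a hypothetical helper, not part of the script above.

```python
import csv

def save_results_to_csv(results, path):
    """Write a list of {'title', 'abstract'} dicts to a CSV file; return the row count."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "abstract"])
        writer.writeheader()
        for row in results:
            # A missing abstract (None) is stored as an empty cell.
            writer.writerow({
                "title": row["title"],
                "abstract": row["abstract"] or "",
            })
    return len(results)
```

  For example, `save_results_to_csv(search_google_scholar("your keyword"), "scholar_results.csv")` would save one row per result; the file name here is only a placeholder.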

  Code explanation:

   * `requests`: Used to send HTTP requests and get the HTML content of Google Scholar pages.
   * `BeautifulSoup`: Used to parse HTML content, making it easy to extract information like titles and abstracts.
   * `search_google_scholar(keyword)` function:
       * Constructs the Google Scholar search URL and uses requests.utils.quote to URL-encode the keyword to handle special characters.
       * Sets headers, especially the User-Agent, which is key to simulating browser behavior.
       * Uses requests.get() to send a GET request with a timeout to prevent long waits.
       * response.raise_for_status() raises an exception if the HTTP response has an error status code (4xx or 5xx).
       * Parses HTML using BeautifulSoup(response.text, 'html.parser').
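       * Finds all article result blocks with soup.find_all('div', class_='gs_r gs_or gs_scl'). Google Scholar's article results typically carry these CSS classes. Because the classes are passed as a single string, BeautifulSoup matches only elements whose class attribute is exactly that string, so this lookup stops working if Google reorders the classes or adds one.
       * In each article block, finds the title with find('h3', class_='gs_rt') and the abstract with find('div', class_='gs_rs'). Note that gs_rs holds the short snippet shown on the results page, which is not necessarily the full abstract.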
       * Extracts text content and performs some cleanup (e.g., removing the "..." from the beginning of the abstract).
       * Stores the extracted title and abstract in a dictionary and adds it to the articles list.
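   * Error handling: Uses try-except blocks to catch network request errors and HTML parsing errors. Both are printed rather than re-raised, so on failure the function returns whatever it has collected so far (usually an empty list).
   * `if __name__ == "__main__":`: This guard ensures that the prompt and result-printing logic runs only when the script is executed directly, not when it is imported as a module.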
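
  Because request errors are caught inside the function, a blocked request and a search with no hits look the same to the caller. One way to soften the anti-scraping problem mentioned earlier is to pause and retry a few times before giving up. The sketch below is an illustration only: `fetch_with_backoff`, its parameters, and the injected `fetch` callable are hypothetical names that do not appear in the script, and retrying will not get you past a CAPTCHA.

```python
import time

def fetch_with_backoff(fetch, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure.

    fetch: zero-argument callable that returns a result or raises an exception.
    sleep: injectable so the delays can be skipped or recorded in tests.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the last error
            # Wait base_delay, then 2x, then 4x, ... before the next try
            sleep(base_delay * (2 ** attempt))
```

  To use it with the scraper, the `requests.get(...)` call and `raise_for_status()` would need to be wrapped in a small function passed as `fetch`, since the helper only retries when an exception actually reaches it.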

  Hope this scraper helps you!