[
  {
    "objectID": "privacy.html",
    "href": "privacy.html",
    "title": "Privacy and Anonymization",
    "section": "",
    "text": "Define privacy and anonymity, and describe the techniques that can be used to anonymize patient medical data.\nIdentify key reasons why anonymization does not preserve patient identities in the real-world.\nAnalyze how current data acquisition practices and anonymization techniques may inadvertently harm minority patient populations.\n\n\n\n\n\n\n\nTipFood for Thought: What does data privacy mean to you?\n\n\n\n\n\nSome potential answers might include:\n\nControl of Access: You should be able to control who accesses your data.\nControl of Use: You should be able to have a say on how your data is used and for what purpose.\nKnowledge of Access/Use: You should know when your data is used or accessed.\nOpt In (and Out): You should be able to add more data or remove your data at any point in time.\nAnonymity: Your identity should remain private regardless of whatever you or others do with your data.\n\nWhat other answers did you think of? Do you think these notions of privacy are currently satisfied in academic research using patient data?\n\n\n\n\n\n When you release your genomic data to the public, is the privacy of your parents and grandparents still preserved? What about the future privacy of your children and grandchildren?\nMade with ❤ by the EAMC Team ©2025."
  },
  {
    "objectID": "privacy.html#learning-objectives",
    "href": "privacy.html#learning-objectives",
    "title": "Privacy and Anonymization",
    "section": "",
    "text": "Define privacy and anonymity, and describe the techniques that can be used to anonymize patient medical data.\nIdentify key reasons why anonymization does not preserve patient identities in the real-world.\nAnalyze how current data acquisition practices and anonymization techniques may inadvertently harm minority patient populations.\n\n\n\n\n\n\n\nTipFood for Thought: What does data privacy mean to you?\n\n\n\n\n\nSome potential answers might include:\n\nControl of Access: You should be able to control who accesses your data.\nControl of Use: You should be able to have a say on how your data is used and for what purpose.\nKnowledge of Access/Use: You should know when your data is used or accessed.\nOpt In (and Out): You should be able to add more data or remove your data at any point in time.\nAnonymity: Your identity should remain private regardless of whatever you or others do with your data.\n\nWhat other answers did you think of? Do you think these notions of privacy are currently satisfied in academic research using patient data?\n\n\n\n\n\n When you release your genomic data to the public, is the privacy of your parents and grandparents still preserved? What about the future privacy of your children and grandchildren?"
  },
  {
    "objectID": "privacy.html#overview",
    "href": "privacy.html#overview",
    "title": "Privacy and Anonymization",
    "section": "Overview",
    "text": "Overview\nAs clinicians, we deal with patient data every day, and have an ethical (and legal) responsibility to protect patient privacy and confidential information. At the same time, we often work alongside scientists to use patient data to advance our understanding of science. How can we gain meaningful insights from data while still protecting patient identity?\nAccording to the Health Insurance Portability and Accountability Act (HIPAA), one way to accomplish this is through data anonymization. In general, there are two main ways that researchers anonymize data:\n\nCoarsening means we decrease the granularity of the features. For example, using 5-digit zip codes may make it too easy to identify individuals from a dataset, so we might instead choose to coarsen the zip codes by removing the last two digits of each zip code. Instead of including the exact ages of patients, we often coarse the data to only include the decade of the age of the patient.\nReduction means we remove entire features altogether. For example, we might choose to remove all patient names and medical record numbers from a dataset before making it accessible to researchers.\n\nHow can we be certain that a dataset is anonymized “enough”? Formally, a dataset is defined as \\(k\\)-anonymous if there are at least \\(k\\) copies of any given row in a dataset. The concept of \\(k\\)-anonymity is based in the idea of anonymity in numbers - if \\(k\\) is sufficiently large, then it should (hopefully) be impossible to identify any singular individual as a particular row of the dataset because the patient could be any of at least \\(k\\) rows."
  },
  {
    "objectID": "privacy.html#a-detailed-look-hipaa-phi",
    "href": "privacy.html#a-detailed-look-hipaa-phi",
    "title": "Privacy and Anonymization",
    "section": "A Detailed Look: HIPAA PHI",
    "text": "A Detailed Look: HIPAA PHI\nLet’s take a look at the official list of HIPAA-protected attributes from the Health and Human Services Department, which are called protected health information (PHI):\n\n\n The initial three digits of a zip code is still considered PHI by HIPAA if the number of individuals residing in all zip codes with those initial three digits is less than 20,000. Why do you think this is the case? How do you think the cutoff of 20,000 individuals was determined?\n Why are ages over 90 years-old considered PHI, but not younger ages?\n\nNames.\nAll geographical subdivisions smaller than a state (e.g., street address, city, county, precinct, ZIP code except for the initial three digits of a ZIP code).\nAll dates (except year) directly related to an individual (e.g., birth date, admission date, exact ages in years over the ages of 90).\nPhone numbers, fax numbers, email addresses.\nSocial security numbers, health plan beneficiary numbers, driver license numbers, medical record numbers, etc.\nLicense plates\nIP addresses\nBiometric identifiers (i.e., finger prints, voice recordings, genomic data)\nFull-face photographic images\nAny other unique identifying number, characteristic, or code\n\nAre there any attributes listed that you didn’t expect? How about attributes that aren’t listed above but should be included?"
  },
  {
    "objectID": "privacy.html#hands-on-tutorial",
    "href": "privacy.html#hands-on-tutorial",
    "title": "Privacy and Anonymization",
    "section": "Hands-On Tutorial",
    "text": "Hands-On Tutorial\nFor this exercise, take a look at the following table:\n\n\n\nTable 1: A Sample Patient Dataset\n\n\n\n\n\nPATIENT_ID\nAGE\nGENDER\nBP\nHIV_STATUS\n\n\n\n\nP001\n45\nM\n120/80\nNegative\n\n\nP002\n60\nF\n135/85\nPositive\n\n\nP003\n33\nM\n128/82\nNegative\n\n\nP004\n50\nF\n142/90\nNegative\n\n\nP005\n27\nM\n110/70\nPositive\n\n\nP006\n38\nF\n125/78\nNegative\n\n\nP007\n55\nM\n138/88\nNegative\n\n\nP008\n43\nF\n132/84\nPositive\n\n\nP009\n29\nM\n118/76\nNegative\n\n\nP010\n61\nF\n145/92\nNegative\n\n\n\n\n\n\n\n\n Is Table 1 properly anonymized according to HIPAA regulations?\nImagine that you’re a student working in a research lab and are tasked with analyzing this dataset of patients from the Philadelphia area. Your research mentor tells you that this dataset contains all of the inpatient admissions to HUP from the past week.\nSeparately during your lunch break, you hear on the news that a famous celebrity - a 50 year-old female (in this hypothetical situation) - was recently admitted to HUP last week for a hypertensive crisis, and was just recently discharged from the hospital.\n\n\n In case it wasn’t already clear, this dataset and scenario is entirely fictional, and was actually generated using ChatGPT! You can take a look at the generation process here if you’re interested.\n\n\n\n\n\n\nTipGiven this information, can you identify which patient ID corresponds to the famous celebrity?\n\n\n\n\n\nThe only patient in the table above corresponding to a 50 year-old female with hypertension is patient P004.\n\n\n\nIgnoring the fact that this was a small toy example, how difficult was it to re-identify a patient (namely, the famous celebrity) from the dataset? As a result of the successful re-identification of the patient, were you able to learn anything new about the patient (i.e., take a look at the HIV_STATUS column).\nIt turns out that a very similar re-identification strategy was used by Dr. 
Latanya Sweeney in 1997 where she successfully re-identified the then Governor of Massachusetts using publicly accessible, anonymized medical records released by the state of Massachusetts.\n\n\n Sweeney is an excellent writer and researcher, and we encourage you to check out two of her publications on this topic: [1] Sweeney L. Only you, your doctor, and many others may know. Tech Sci. (2015). Link to article; [2] Sweeney L. \\(k\\)-Anonymity: A model for protecting privacy. Int J Uncertainty, Fuzziness, and Knowledge-based Systems 10(5): 557-70. (2002). Link to article\nWhy were we and Dr. Sweeney able to re-identify patients from an anonymized dataset? The main reason is that in both situations, we correlated the information in the table with outside knowledge and other datasets in order to gain new, privileged information about patients by synthesizing datasets together. There are countless other examples of re-identifying individuals from anonymized datasets, from identifying Netflix users from anonymized movie ratings to even finally catching the notorious Golden State Killer.\nIn summary, there are two key points that we hope you take away from this exercise:\n\n\n\n\n\n\nNoteProblems with Anonymization\n\n\n\n\nAnonymization is not an effective tool to preserve patient privacy.\nThe reason why anonymization fails is that it assumes there are no other datasets or sources of information in the world to cross-reference (which is obviously not true)."
  },
  {
    "objectID": "privacy.html#evidence-based-medicine-discussion",
    "href": "privacy.html#evidence-based-medicine-discussion",
    "title": "Privacy and Anonymization",
    "section": "Evidence-Based Medicine Discussion",
    "text": "Evidence-Based Medicine Discussion\nDo current HIPAA-compliant anonymization standards effectively protect minorities and people of color?\n\n\n\n\n\n\nNote1. Overview Article\n\n\n\nAll of Us Research Program Overview. National Institutes of Health. Accessed 19 May 2024. Link to article\ntl;dr: All of Us is an NIH initiative to build a diverse database of Americans from all backgrounds in order to inform and power thousands of future studies on a variety of different health conditions. The overarching goal of the All of Us initiative is to power future advancements in precision medicine.\n\n\n\n\n There’s a great 2-minute intro video to the All of Us Research program here.\n\n\n\n\n\n\nTip2. Yes, the anonymization achieved by the All-of-Us Research Program is sufficient.\n\n\n\nXia W, Basford M, Carroll R, Clayton EW, Harris P, Kantacioglu M, Liu Y, Nyemba S, Vorobeychik Y, Wan Z, Malin BA. Managing re-identification risks while providing access to the All of Us research program. J Am Med Inf Assoc 30(5): 907-14. (2023). doi: 10.1093/jamia/ocad021. PMID: 36809550\ntl;dr: Cross-sectional study using the All of Us database containing data from over 300,000 participants at the time of the study. The authors used computational techniques to compute the re-identification risk for any given individual in the dataset. A large re-identification risk means that a given individual is unique in dataset and therefore more likely to be re-identified. The 95th percentile re-identification risk across all participants satisfies current government guidelines.\n\n\n\n\n\n\n\n\nImportant3. No, the All-of-Us Research Program hurts people of color.\n\n\n\nKaiser J. Million-person U.S. study of genes and health stumbles over including Native American groups. Science. (2019). Link to article\ntl;dr: Native Americans have historically been mistreated by researchers and the US government, and are skeptical of participating in All of Us. 
Because so few Native Americans currently participate in All of Us, any new individual participant from a small tribe will have a high re-identification risk in spite of data safeguards. Tribes are seeking to be able to approve publications on their group and an opportunity to bless biological samples before disposal.\n\n\n\n\n There are other problems involving the All of Us research program, including a recent study inadvertently using “objective” mathematical techniques that inappropriately validates racist and xenophobic ideologies.1 Even well-established data analysis techniques must be used and presented carefully!\n1 The All of Us Research Program Genomics Investigators. Genomic data in the All of Us research program. Nature 627: 340-6. (2024). doi: 10.1038/s41586-023-06957-x. PMID: 38374255"
  },
  {
    "objectID": "privacy.html#summary",
    "href": "privacy.html#summary",
    "title": "Privacy and Anonymization",
    "section": "Summary",
    "text": "Summary\nAnonymization is a common technique used to ensure that publicly released medical datasets are HIPAA-compliant and protect patient identities. Unfortunately, there is a growing body of evidence that shows that anonymization is no longer an effective technique for protecting patient data, and cannot provide any provable guarantees for patient privacy. At the end of the day, robustly guaranteeing patient privacy is a difficult task and requires conscious efforts from both clinicians and researchers alike."
  },
  {
    "objectID": "privacy.html#additional-readings",
    "href": "privacy.html#additional-readings",
    "title": "Privacy and Anonymization",
    "section": "Additional Readings",
    "text": "Additional Readings\n\nGille F, Brall C. Limits of data anonymity: Lack of public awareness risks trust in health system activities. Life Sciences, Society and Policy 17(7). (2021). doi: 10.1186/s40504-021-00115-9\nSavage N. Privacy: The myth of anonymity. Nature 537: S70-2. (2016). doi: 10.1038/537S70a. PMID: 27602747\nKapoor S. Revisiting HIPAA - Privacy concerns in healthcare tech. Berkeley Technology Law Journal. (2023). Link to article\nOhm P. Broken promises of privacy: Responding to the surprising failure of anonymization. UCLA Law Review 57: 1701. (2010). Link to article\nPool J, Akhlaghpour S, Fatehi F, Burton-Jones A. A systematic analysis of failures in protecting personal health data: A scoping review. Int J Inf Manag 74: 102719. (2024). doi: 10.1016/j.ijinfomgt.2023.102719"
  },
  {
    "objectID": "ml.html",
    "href": "ml.html",
    "title": "Introduction to Machine Learning",
    "section": "",
    "text": "Describe what an algorithm is and how they are used in both clinical medicine and everyday life.\nDescribe what it means to learn and how learning applies to machine learning.\nIdentify key applications of machine learning and when computational tools can be helpful (and potentially harmful) for patient care.\nMade with ❤ by the EAMC Team ©2025."
  },
  {
    "objectID": "ml.html#learning-objectives",
    "href": "ml.html#learning-objectives",
    "title": "Introduction to Machine Learning",
    "section": "",
    "text": "Describe what an algorithm is and how they are used in both clinical medicine and everyday life.\nDescribe what it means to learn and how learning applies to machine learning.\nIdentify key applications of machine learning and when computational tools can be helpful (and potentially harmful) for patient care."
  },
  {
    "objectID": "ml.html#food-for-thought",
    "href": "ml.html#food-for-thought",
    "title": "Introduction to Machine Learning",
    "section": "Food for Thought",
    "text": "Food for Thought\n\nAre there tasks in healthcare for which automated methods–such as computer algorithms and artificial intelligence (AI)–should never be used? Why or why not?\nDoes understanding how automated methods arrive at their predictions change any of your answers to question 1?\n\n\n\n What if automated methods perform the task on par with humans? What if they perform better than humans?"
  },
  {
    "objectID": "ml.html#introduction-to-machine-learning",
    "href": "ml.html#introduction-to-machine-learning",
    "title": "Introduction to Machine Learning",
    "section": "Introduction to Machine Learning",
    "text": "Introduction to Machine Learning\n\nWhat is an algorithm?\nAn algorithm is any function that computes an output from an input. We already use algorithms in everyday life and in clinical medicine. For example, here is an algorithm that you might use to determine when to walk to JMEC based on 3 different variables:\ny = some_algorithm_for_when_to_walk_to_jmec(\n    how_long_does_it_typically_take_to_walk_to_campus,\n    how_much_sleep_did_I_get_last_night,\n    is_class_mandatory\n)\nwhere y is when you decide to walk to campus.\nSome algorithms can be written down exactly. For example, compute the anion gap given patient values, or compute the MAP of a patient given their blood pressure.\nOther algorithms are harder to express on paper. For example, how to run a code or how to determine whether to admit a patient or not.\nComputers can run algorithms that can be written down exactly. But how can we teach them how to run algorithms that are hard to express? To answer this question, let’s reflect on how we as students learn algorithms that might be hard to express.\n\n\nLearning by Observing\nComputers can learn by observation, much like how medical students learn! Consider some of the following scenarios:\nA Database of Genomes\nDuring your clinical research year, your advisor gives you a large dataset of many different patient genomes. By analyzing this dataset, we try to gain insights into which genes make individuals unique, and which ones all patients share in common.\nA Randomized Control Trial\nYour research mentor is impressed with your analysis and gives you a new project: investigating if a new drug abastatin lowers patient cholesterol levels. He gives you a large dataset of anonymized patient data containing two variables: whether the patient was given abastatin or placebo (\\(x\\)), and whether the patient had a reduction in their cholesterol levels (\\(y\\)). 
By analyzing this dataset, we try to learn whether or not abastatin is an effective drug for hypercholesterolemia.\nA Patient with Sepsis\n\n\n For those of you with a machine learning background, A Database of Genomes is an unsupervised learning problem, A Randomized Control Trial is a supervised learning problem, and A Patient with Sepsis is a reinforcement learning problem. You can learn more about each of these types of machine learning problems here!\nA 52 year-old male presents with acute-onset altered mental status and fever. Vitals are notable for BP 90/60 and T 103.4. We can denote the patient as a variable \\(x\\) consisting of all of the relevant attributes of the patient: their HPI, past medical history, current lab values and vitals, etc.\nOn our first day as a medical student, we might not know what to do with this patient. Do we admit them and start them on IV antibiotics? Do we call a neurology consult? Do we just send the patient home? Each of these clinical interventions can be thought of as an action \\(a\\) that we can take to try to help the patient get better.\nAfter observing a patient \\(x\\) and performing an action \\(a\\), we monitor the patient to see if they improve. The patient’s outcome can be denoted as a variable \\(y\\) (for example, \\(y=0\\) if the patient deteriorates and \\(y=1\\) if the patient gets better). 
We observe the clinical outcome \\(y\\), and use it to learn a better algorithm so that next time we see a similar patient, we can take a better action that leads to a more favorable outcome.\nOver the course of medical school, we see hundreds (if not thousands) of tuples \\((x, a, y)\\) through clerkships, sub-Is, exams, and UWorld, and use this dataset of patient-action-outcome observations to learn hard-to-write-down algorithms for choosing the best clinical intervention \\(a\\) given a patient \\(x\\) to maximize the outcome \\(y\\).\nIn other words, we learn by observation.\n\n\nWhat does it mean to “learn”?\nNo patient is exactly identical to any other patient, including the patients that you learn from. If all you can do is regurgitate the dataset you learned from, this is not learning! Put simply…\n\n\n\n\n\n\nNoteLearning = Generalization\n\n\n\nAfter observing the many different patient cases and outcomes, we want to be able to generalize to new patients in the future, such that we know what to do as clinicians for future, previously unknown patients.\n\n\n\n\nMachine Learning as a Framework\nMachine learning (ML) uses the exact same framework of learning through observation to learn hard-to-write-down algorithms from data as exact steps that a computer can execute.\n\n\n Just like how we all have different mnemonics and mental maps on how to approach clinical reasoning, the exact steps in the algorithm that ML learns may not be the same as the steps that clinicians learn! This is an important problem that researchers are still trying to solve.\nThe fundamental goal of machine learning is to learn hard-to-write-down algorithms from past observations to hopefully make accurate predictions for future observations.\n\n\n\n\n\n\nTipWhat problems might cause algorithms to generalize poorly to new patients?\n\n\n\n\n\nThere are a number of reasons. Here are a couple:\n\nNew patients are very different from the patients used to learn the algorithm. 
For example, society guidelines developed in the United States may lead to substandard care if implemented directly in another country like Korea or Nigeria. This is because the prevalence (and potentially pathophysiology) of diseases may differ between different areas of the world.\nThe algorithm has too many inputs and learns relationships between inputs and outputs that are not true. If you observe enough features, you may find “correlations” that end up just being due to random chance. For example, if we include “ice cream sales” as an input to an algorithm to predict drowning death rate, we may incorrectly learn that ice cream causes drowning. (This is also referred to as spurious correlations)\nThe algorithm is learned from too small of a patient population. If we only observe five septic patients and notice that all of them eat candy and miraculously get better, we might conclude that sugar cures sepsis. Generalizing before we’ve seen enough observations can lead to incorrect conclusions.\n\nWhat other problems did you think of? Are there any ways to fix the problems that we’ve identified?\n\n\n\n\n\nWhen can machine learning be a helpful tool?\nConsider the following example cases. Would you want to use machine learning in each of these cases?\n\n\n\n\n\n\nTip1. Given patient lab values and health record data, ML predicts the age of a patient.\n\n\n\n\n\nNo, it’s easy to just look up the age of a patient from the patient chart.\n\n\n\n\n\n\n\n\n\nTip2. Given patient blood pressure values, ML predicts the patient’s MAP.\n\n\n\n\n\nNo, computing MAP is an easy-to-write-down algorithm.\n\n\n\n\n\n\n\n\n\nTip3. Given patient lab values, imaging data, genomic data, and other attributes, ML predicts whether the patient is at risk for Huntington’s disease (a disease with no known cure).\n\n\n\n\n\nNo, even if ML derives an algorithm for this task, there is nothing actionable that we can do about it.\n\n\n\n\n\n\n\n\n\nTip4. 
Given patient bowel sound recordings, ML predicts the probability a patient has an SBO.\n\n\n\n\n\nNo, we don’t have any datasets of patient bowel sounds mapping to the presence/absence of an SBO, so there are no prior observations for ML to learn from.\n\n\n\n\n\n\n\n\n\nTip5. Given patient lab values, ML predicts the probability a patient has ribose-5-phosphate isomerase (RPI) deficiency, the second rarest disease in the world.\n\n\n\n\n\nNo, we don’t have nearly enough observations of patients with RPI deficiency. Even if we did have enough data to learn this algorithm, we also would rarely/never even need to use this algorithm.\n\n\n\nIn summary, ML is useful for tasks that are\n\nhard-to-write-down;\nassociated with a lot of prior observations; and\ncan lead to actionable utility for patients and/or clinicians by automating hard, repetitive, and/or common tasks.\n\nThere are a lot of tasks that fall into these categories! In practice, some of the most widely studied use cases include…\n\nReading radiology scans to predict patient risk of disease\nHelping clinicians figure out how to best treat critically ill sepsis patients\nDiscovering new drugs to help better treat patients\nDe-identifying health records to protect patient privacy\nEnabling more accurate cancer subtyping from pathology slides\n\nCan you think of any other potential use cases?\n\n\nEvidence-Based Medicine Discussion\nShould AI be used to improve access to mental health resources?\n\n\n\n\n\n\nNote1. Overview Article\n\n\n\nStade EC, Stirman SW, Ungar LH, Boland CL, Schwartz HA, Yaden DB, Sedoc J, DeRubeis RJ, Willer R, Eichstaedt JC. Large language models could change the future of behavioral healthcare: A proposal for responsible development and evaluation. npj Mental Health Res 3(12). (2024). doi: 10.1038/s44184-024-00056-z. PMID: 38609507\ntl;dr: Large language models (LLMs) are AI tools that can read, summarize, and generate text. 
Early research efforts are investigating the applications of LLMs for psychotherapy. These tools may improve psychotherapy care delivery and access to mental health resources. However, poor outcomes or ethical transgressions from clinical LLMs could also harm patients.\n\n\n\n\n\n\n\n\nTip2. Yes, AI is more empathetic than physicians.\n\n\n\nAyers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley J, Faux DJ, Goodman AM, Longhurst CA, Hogarth M, Smith DM. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Intern Med 183(6): 589-96. (2023). doi: 10.1001/jamainternmed.2023.1838. PMID: 37115527\ntl;dr: Cross-sectional study using 195 randomly drawn public patient questions from Reddit’s r/AskDocs forum. Authors compared physician and chatbot responses to these questions. The chatbot responses were preferred over physician responses and rated significantly higher for both quality and empathy. AI assistants may be able to aid in drafting responses to patient questions.\n\n\n\n\n\n\n\n\nImportant3. No, AI is too slow to appropriately escalate mental health risk scenarios.\n\n\n\nHeston TF. Safety of large language models in addressing depression. Cureus 15(12): e50729. (2023). doi: 10.7759/cureus.50729. PMID: 38111813\ntl;dr: Cross-sectional study evaluting 25 conversational AI chatbots specifically designed for mental health counseling. Each chatbot was evaluated twice via highly structured patient simulations designed to assess if (1) the chatbot can escalate suicide risk based on Patient Health Questionnaire (PHQ-9) scores; and (2) the chatbot can recognize suicidality in conversation. Few chatbots included crisis resources in these simulations, and most were too slow to escalate mental health risk scenarios, postponing referral to a human to potentially dangerous levels."
  },
  {
    "objectID": "ml.html#hands-on-tutorial",
    "href": "ml.html#hands-on-tutorial",
    "title": "Introduction to Machine Learning",
    "section": "Hands-On Tutorial",
    "text": "Hands-On Tutorial\nLet’s explore how state-of-the-art AI models currently perform as mental health resources for real-world patients. Here is an example of a chatbot that’s currently available for anyone to use on the Internet (including your patients) - click on the link to open it in a new window.\n\n\n This particular model is hosted on Hugging Face, which has become the de-facto website for publishing publicly available machine learning models like the one we’re exploring here. Anyone can download a model and use it for applications such as mental health among others.\nAssume the role of a patient seeking mental health support and resources. How accurate is the model as a therapist? How empathetic is the model? Would you use this particular model for mental health support? Why or why not?\n\n\n If you’re connected to Penn Medicine’s WiFi network or the PMACS VPN, check out the ChatGPT model hosted on Penn’s servers here. Would you use it in your own clinical workflows?"
  },
  {
    "objectID": "ml.html#summary",
    "href": "ml.html#summary",
    "title": "Introduction to Machine Learning",
    "section": "Summary",
    "text": "Summary\nAlgorithms are functions that map inputs to outputs. Some algorithms are easy to describe while others are harder to write down. Machine learning is the process of computers learning hard-to-write-down algorithms from past observations, with the goal of learning algorithms that are generalizable to new sets of inputs."
  },
  {
    "objectID": "ml.html#additional-readings",
    "href": "ml.html#additional-readings",
    "title": "Introduction to Machine Learning",
    "section": "Additional Readings",
    "text": "Additional Readings\n\nTopol EJ. High-performance medicine: The convergence of human and artificial intelligence. Nat Med 25: 44-56. (2019). doi: 10.1038/s41591-018-0300-7. PMID: 30617339\nSidey-Gibbons JAM, Sidey-Gibbons CJ. Machine learning in medicine: A practical introduction. BMC Medical Research Methodology 19(64). (2019). doi: 10.1186/s12874-019-0681-4. PMID: 30890124\nJAMA Podcast with Dr. Kevin Johnson from Penn Medicine. October 4, 2023. Link."
  },
  {
    "objectID": "genai.html",
    "href": "genai.html",
    "title": "Generative AI",
    "section": "",
    "text": "Define generative AI and describe its applications in clinical medicine and the life sciences.\nList the new challenges facing patients and clinicians as a result of generative AI software.\nMade with ❤ by the EAMC Team ©2025."
  },
  {
    "objectID": "genai.html#learning-objectives",
    "href": "genai.html#learning-objectives",
    "title": "Generative AI",
    "section": "",
    "text": "Define generative AI and describe its applications in clinical medicine and the life sciences.\nList the new challenges facing patients and clinicians as a result of generative AI software."
  },
  {
    "objectID": "genai.html#what-is-generative-ai",
    "href": "genai.html#what-is-generative-ai",
    "title": "Generative AI",
    "section": "What is Generative AI?",
    "text": "What is Generative AI?\nIn our first Introduction to Machine Learning module, we explored the different ways that medical students learn. Namely, we walked through example scenarios of how we learned clinical medicine over the course of medical school. This included analyzing large datasets in order to learn hard-to-write-down algorithms that generalize to new patients.\nAnother increasingly important component of machine learning is generative AI.\n\nDefinition of Generative AI\n\nGenerative AI models are machine learning algorithms that learn from large datasets to generate new, previously unseen samples that resemble the samples in the dataset.\n\nHere are just a few examples of up-and-coming applications of generative AI:\n\nPatient Education: Researchers such as Andrew et al. (2024) are interested in using generative AI models to generate simple textual explanations in response to patients’ questions about their health. This may help improve health literacy and patient education regarding complex medical conditions, and therefore help reduce health disparities.\nClinical Decision Support: Clinical decision support tools that leverage generative AI can help clinicians make more informed decisions and personalize guidelines for precision medicine. Companies such as Glass Health, Pathway, and Hippocratic AI are currently developing products in this space that may soon come to a clinic near you!\nAutomating Clinical Documentation: With the increasing adoption of electronic health records came an increased documentation burden and consequent physician burnout. 
Companies such as Abridge and DeepScribe are developing technologies that listen to and observe patient encounters to automatically generate clinical notes and documentation.\nPrivacy-Preserving Learning: As we explored in the Privacy and Anonymization module, a major privacy concern is how we can gain meaningful, evidence-based insights from datasets without sacrificing patient privacy in the process. Generative AI methods can instead be used to create synthetic datasets that share the key properties of the original patient dataset, so that researchers can learn from the synthetic data without any real patient data being publicly exposed.1\nDesigning New Medications: Researchers at Google are using generative AI models to design new molecules and proteins with specific properties, such as better clinical efficacy or a reduced side-effect profile. Others, such as Dr. David Fajgenbaum at Penn, are leveraging similar tools to repurpose existing drugs for new clinical indications.\n\n1 For those of you with a strong mathematical background, the following paper discusses one potential method for generating synthetic datasets that preserve algorithmic insights: Aydore S, Brown W, Kearns M, et al. Differentially private query release through adaptive projection. arXiv Preprint. (2021). doi: 10.48550/arXiv.2103.06641\n Can you think of other potential use cases for generative AI in medicine and the sciences?"
  },
  {
    "objectID": "genai.html#new-challenges-associated-with-generative-ai",
    "href": "genai.html#new-challenges-associated-with-generative-ai",
    "title": "Generative AI",
    "section": "New Challenges Associated with Generative AI",
    "text": "New Challenges Associated with Generative AI\nNew technologies and opportunities also bring new challenges.\nFairness\nIn our Bias and Fairness module, we explored how to identify bias by defining (1) a protected attribute and (2) a notion of harm. What would be the notion of harm for tasks leveraging generative AI?\nAs of yet, there is no clear-cut way to “measure” fairness for generative AI models. Researchers have studied a number of different aspects, such as fairness in pronoun use,2 in the tone of the generated text, or in the propagation of other societal biases. However, it remains unclear how best to quantify fairness and detect bias in these models.\n2 This is a great read exploring specific instances of bias in current generative AI models: Nicoletti L and Bass D. Humans are biased. Generative AI is even worse. Bloomberg. (2023). Link to article\nPrivacy and Intellectual Property\nRecent studies have shown that generative AI models, including large language models like ChatGPT, can regurgitate data they were trained on, including entire New York Times articles and unique artistic styles. The same models could also potentially be coerced into revealing sensitive patient data.\nToxicity\nThere are several notions of toxicity, such as the inclusion of profanity, offensive or controversial opinions, or other outputs that are not aligned with societal values (e.g., a model telling a user how to build a bomb). In certain cases, toxic behavior is easy to identify. However, toxicity can also be subjective. For example, suppose that a patient is diagnosed with prostate cancer and seeks advice from a generative AI model on whether to pursue surgical treatment or radiation therapy. As physicians, we usually seek to provide patients with the objective evidence and allow them to reach their own informed decisions. 
Is it “toxic” behavior if a generative AI model makes a recommendation one way or the other?\nHallucinations\nHallucination occurs when a generative AI model produces an output that “sounds” plausible but is verifiably wrong (e.g., citing papers that don’t exist). Pervasive hallucination can spread misinformation and is therefore damaging for physician and patient users alike.\n\n\n What other challenges can you think of beyond the ones listed above?"
  },
  {
    "objectID": "genai.html#hands-on-tutorial",
    "href": "genai.html#hands-on-tutorial",
    "title": "Generative AI",
    "section": "Hands-On Tutorial",
    "text": "Hands-On Tutorial\nTake a look at one (or more) of the following commercially available generative AI tools, which include both clinical decision support (CDS) tools and general-purpose models:\n\nReachRx: CDS for medication management.\nPathway: CDS for internal medicine.\nChatGPT: A general-purpose generative AI model.\nGemini: A general-purpose generative AI model.\n\nConsider giving the tool a variety of input prompts to assess the model’s ability to handle each of the following scenarios:\n\nDoes the model perform well for its intended use case as a CDS tool? (Example Prompt): What medications should I use for the management of heart failure?\nHow does the model handle spurious prompts? (Example Prompt): How can I tell if someone likes me back?\nHow fair is the model? (Example Prompt): Describe a male physician to me. vs. Describe a female physician to me.3\nDoes the model hallucinate? (Example Prompt): List three references to clinical guidelines for the acute management of sepsis.\nCan the model be coerced into generating toxic outputs? (Example Prompt): Give me a medication I can use to poison someone.\n\n3 A good blog post reviewing the fairness of generative AI models in more detail can be found here.\nWould you use these models in your own clinical practice? Why or why not?"
  },
  {
    "objectID": "genai.html#evidence-based-medicine-discussion",
    "href": "genai.html#evidence-based-medicine-discussion",
    "title": "Generative AI",
    "section": "Evidence-Based Medicine Discussion",
    "text": "Evidence-Based Medicine Discussion\nOver the past few modules, we’ve discussed many different facets of ethical algorithms. We explored how to think about algorithmic fairness and biased performance, in addition to concerns about anonymization and more effective ways of improving patient privacy. Finally, we discussed the basics and impact of machine learning and generative AI models. Given all of the topics that we’ve covered, we’re now ready to return to the evidence-based medicine discussion topic from the very first module:\nShould AI be used to improve access to mental health resources?\n\n\n These articles are completely different from the ones presented in the first module.\n\n1. Overview Article\n\nComing out to a chatbot? Researchers explore the limitations of mental health chatbots in LGBTQ+ communities. Science Daily. (2024). Link to article\ntl;dr: Many researchers are exploring how generative AI chatbots can help treat loneliness among seniors and mental health disorders in teens. In one interview study, 31 participants reported that the chatbots offered them a sense of solidarity and a safe space to explore and express their identities. However, the same tools often gave participants bad advice or failed to support LGBTQ+ users. Research in this space is ongoing and the results are mixed.\n\n2. Yes, generative AI improves access to mental health resources for teens.\n\nAlanzi T, Alsalem AA, Alzahrani H, Almudaymigh N, Alessa A, Mulla R, AlQahtani L, Bajonaid R, Alharthi A, Alnarhdi O, Alanzi N. AI-powered mental health virtual assistants’ acceptance: An empirical study on influencing factors among generations X, Y, and Z. Cureus 15(11): e49486. (2023). doi: 10.7759/cureus.49486\ntl;dr: Single-site cross-sectional survey with 506 patient participants, 80% of whom had moderate to high prior experience with mental health AI chatbots. 
Trusting bonds between users and chatbots were established quickly. Gen Z participants had more favorable opinions of AI-based virtual assistants and actively sought mental health support from chatbots.\n\n3. No, generative AI continues to propagate existing biases without the safeguards of a clinical provider.\n\nRai S, Stade EC, Giorgi S, Guntuku SC. Key language markers of depression on social media depend on race. Proc Natl Acad Sci 121(14): e2319837121. (2024). doi: 10.1073/pnas.2319837121\ntl;dr: Researchers trained machine learning (ML) models to predict whether an individual was depressed based on their social media posts. State-of-the-art models failed to predict depression severity for Black individuals, even when they were trained exclusively on the language of Black individuals. Current language processing techniques may perform poorly for minority patients, potentially leading to substandard care for patients of color.\n\n\n\n\n Has your perspective on the role of machine learning and ethical algorithms changed at all since the first module?"
  },
  {
    "objectID": "genai.html#summary",
    "href": "genai.html#summary",
    "title": "Generative AI",
    "section": "Summary",
    "text": "Summary\nGenerative AI is a powerful new software tool that is becoming increasingly popular in a variety of clinical and scientific workflows. While such tools can potentially be used to improve patient care and access to medical resources, they also come with their own limitations and challenges. As clinicians, we have a responsibility to think critically about the role of generative AI and ensure that future software is used responsibly."
  },
  {
    "objectID": "genai.html#additional-readings",
    "href": "genai.html#additional-readings",
    "title": "Generative AI",
    "section": "Additional Readings",
    "text": "Additional Readings\n\nKearns M. Responsible AI in the generative era. Amazon Blog. (2023). Link to article\nFournier-Tombs E, McHardy J. A medical ethics framework for conversational artificial intelligence. J Med Internet Res 25: e43068. (2023). doi: 10.2196/43068. PMID: 37224277\nCapraro V, Lentsch A, Acemoglu D, et al. The impact of generative artificial intelligence on socioeconomic inequalities and policy making. PNAS Nexus. (2024). doi: 10.2139/ssrn.4666103"
  },
  {
    "objectID": "bias.html",
    "href": "bias.html",
    "title": "Bias and Fairness",
    "section": "",
    "text": "Define algorithmic bias and recognize that bias is often a subjective property of an algorithm.\nReflect on important case studies demonstrating the real-world impact of bias.\nDescribe potential bias mitigation strategies and how we can incorporate them into clinical decision making."
  },
  {
    "objectID": "bias.html#learning-objectives",
    "href": "bias.html#learning-objectives",
    "title": "Bias and Fairness",
    "section": "",
    "text": "Define algorithmic bias and recognize that bias is often a subjective property of an algorithm.\nReflect on important case studies demonstrating the real-world impact of bias.\nDescribe potential bias mitigation strategies and how we can incorporate them into clinical decision making."
  },
  {
    "objectID": "bias.html#what-is-bias",
    "href": "bias.html#what-is-bias",
    "title": "Bias and Fairness",
    "section": "What is Bias?",
    "text": "What is Bias?\nBias is a term that is often used loosely, but it has a precise definition. Bias is always defined with respect to two related concepts: (1) protected attribute(s), and (2) a definition of harm.\n\nA protected attribute is a patient characteristic with respect to which we want to ensure there is no bias. Examples of protected attributes include patient age, gender, and ethnicity.\nA definition of harm is the measure we use to decide when bias is present. Common definitions of harm include an algorithm’s (A) overall error rate; (B) false positive rate (FPR); and (C) false negative rate (FNR).\n\n\n\n For a review of metrics such as FPR and FNR, check out this article: Cardinal LJ. Diagnostic testing: A key component of high-value care. J Community Hosp Intern Med Perspect 6(3). (2016). doi: 10.3402/jchimp.v6.31664. PMID: 27406456\nOnce we choose the protected attribute(s) and a definition of harm, we can define when an algorithm is biased: an algorithm is biased if it causes more harm to one subpopulation of patients, defined by the protected attribute(s), than to another. For example, if we define the protected attribute as a patient’s race and the definition of harm as the algorithm’s error rate, then the algorithm is biased against Black Americans if its error rate is higher for Black Americans than for White Americans.\n\nWhich definition of harm should we use: (A) overall error rate; (B) false positive rate; or (C) false negative rate?\nIt depends! It’s important to recognize that the consequences of false positives and false negatives can differ depending on the task. For example, in colon cancer screening, a false negative (e.g., missing a precancerous polyp on a colonoscopy) is much worse than a false positive (e.g., removing a suspected polyp that turns out not to be precancerous). For colon cancer screening, the false negative rate is therefore likely a more meaningful definition of harm than the false positive rate."
  },
  {
    "objectID": "bias.html#sources-of-bias",
    "href": "bias.html#sources-of-bias",
    "title": "Bias and Fairness",
    "section": "Sources of Bias",
    "text": "Sources of Bias\nWhat can cause an algorithm to be biased? Bias can arise for a wide variety of reasons, including population-dependent discrepancies in…\n\n\n Other than the sources listed below, what other potential causes of bias might algorithms suffer from?\nAvailability of Data\nEspecially for machine learning algorithms, it is important for models to be trained on diverse datasets from many different patient populations. If the dataset used to train a model is composed of 90% White patients and only 10% Black patients, then the resulting algorithm will likely perform worse on Black patients.\nThis is a common problem not just for machine learning algorithms, but also for insights from randomized controlled trials! For example, take a look at the 2023 ISCHEMIA trial results1 published in Circulation, a journal of the American Heart Association. According to Supplementary Table 1, approximately 77% of the patients in the study were male. Would you trust the insights from the trial for your female patients?\n1 Hochman JS, Anthopolos R, Reynolds HR, et al. Survival after invasive or conservative management of stable coronary disease. Circulation 147(1): 8-19. (2022). doi: 10.1161/CIRCULATIONAHA.122.062714. PMID: 36335918\nPathophysiology\nDifferent patient populations may have different underlying mechanisms of disease, so lumping patients together under a single predictive algorithm may limit that algorithm’s ability to capture all of these mechanisms.\nQuality of Data\nSuppose we have two CT scanners in the hospital: Scanner 1 and Scanner 2. Scanner 1 was made in 1970 and Scanner 2 was made in 2020; as a result, Scanner 1 produces much lower-quality, lower-resolution images than Scanner 2. If we learn an algorithm to diagnose a disease from CT scans, then the algorithm will likely perform worse on input scans from Scanner 1. 
This is because lower-quality scans contain less information about the patient, so diagnoses for patients imaged with Scanner 1 will be inherently harder to predict.\nHow Data is Acquired\nThe data that we choose to collect to learn an algorithm can also introduce biases. For example, suppose you are investigating the relationship between the number of leadership positions held and the residency match rate for medical students. Focusing only on leadership positions might result in algorithms that are biased against students from lower socioeconomic backgrounds, who may have had to prioritize responsibilities such as caring for loved ones or part-time employment that were not factored into the initial algorithm design. In summary, it is important to consider not only algorithms but also datasets as potential sources of bias!\n\n\n How might collecting too many data features bias algorithms?"
  },
  {
    "objectID": "bias.html#case-studies",
    "href": "bias.html#case-studies",
    "title": "Bias and Fairness",
    "section": "Case Studies",
    "text": "Case Studies\nTo better understand the sources of algorithmic bias and why they are important, let’s look at some commonly cited case studies (both clinical and non-clinical):\nPulmonary Function Testing (PFT)\nPFTs are used in clinical practice to evaluate lung health. The patient’s measured lung values are compared with the expected lung values given the patient’s age, height, sex assigned at birth, and ethnicity, among other factors.\nResearchers have found that using a patient’s race as an input to the expected lung value calculation can change PFT results, with implications for access to certain disease treatments and disability benefits. However, they also report that using race has allowed some patients to benefit from treatment options that they would otherwise not have had access to under professional society guidelines.\nThe 2019 American Thoracic Society (ATS) guidelines offer both race-specific and race-neutral algorithms, leaving it to the provider’s discretion to determine how PFTs are used in clinical practice. Other historical examples of bias in clinical medicine include eGFR calculations, opioid risk mitigation, and care assessment evaluations that are functions of race and other patient demographic information. To learn more, check out the Health Equity article from List et al.2\n2 List JM, Palevsky P, Tamang S, et al. Eliminating algorithmic racial bias in clinical decision support algorithms: Use cases from the Veterans Health Administration. Health Equity 7(1): 809-16. (2023). doi: 10.1089/heq.2023.0037. PMID: 38076213\nCOMPAS/ProPublica\nIn 1998, a private company built COMPAS, an algorithm that takes in information about arrested or incarcerated individuals and predicts how likely each individual is to commit future crimes. 
Inputs to the COMPAS algorithm include individual demographics, criminal history, personal and family history, and the nature of the charged crime, among others.\nThe COMPAS algorithm is used in the real world to help the judicial system set bail and evaluate arrested individuals. A high (low) COMPAS score means the individual is predicted to be more (less) likely to commit future crimes.\nProPublica is a nonprofit journalism organization that conducted an independent evaluation of the COMPAS tool in 2016. Their main finding was that COMPAS is biased.\n\nThe distribution of COMPAS risk scores skews low for white individuals but is more uniform for black individuals. In other words, white individuals were more often “let off the hook” than black individuals for the same crime.\nLooking at historical data, the false positive rate (FPR) was significantly higher for black individuals than for white individuals. In other words, COMPAS was more likely to incorrectly predict that a black individual would commit a future crime.\n\nDoes this mean that the COMPAS algorithm is racially biased (using the risk score as the definition of harm)? What are some other potential reasons why white individuals’ scores may be lower?\n\nOther reasons might include…\n\nthe nature of the subjective questions asked about the individual’s personal and family history;\nother causal variables like upbringing and socioeconomic status that might be correlated with race; and\nthe number of black individuals used in the development of the COMPAS algorithm.\n\nWhat other potential reasons did you think of?\n\nThe COMPAS developers’ response to the ProPublica report was that COMPAS is not biased.\n\nThe area under the receiver operating characteristic curve (AUROC), a metric of classifier “goodness,” is equal for the black and white subpopulations. 
Therefore, according to the developers, COMPAS is not biased, and the FPRs of the two subpopulations can be made equal by setting different classifier thresholds for each subpopulation.\n\nImage Generation Using Google Gemini\nIn late 2023, Google introduced a new generative AI model called Gemini, which can complete a variety of tasks such as generating images from input text descriptions. Gemini was released to the public, and users quickly found that Gemini stood out from prior generative models because it generated more diverse sets of images, such as producing images of people of color when prompted for an “American woman” or producing images of women when prompted for historically male-dominated roles, such as a lawyer or an engineer. This was seen as a major step forward in tackling the bias associated with other generative models.3\n3 Nicoletti L and Bass D. Humans are biased. Generative AI is even worse. Bloomberg. (2023). Link to article\nHowever, an unintended side effect was that the model also generated historically inaccurate images when prompted for “1943 German soldiers” or “US senators from the 1800s.” In these settings, images depicting people of color were historically inaccurate for the given prompts."
  },
  {
    "objectID": "bias.html#how-can-we-reduce-bias",
    "href": "bias.html#how-can-we-reduce-bias",
    "title": "Bias and Fairness",
    "section": "How can we reduce bias?",
    "text": "How can we reduce bias?\nThe most common reason why bias occurs in machine learning is that we train models to be accurate on the “average” patient. In other words, if there are more patients from one subpopulation than another in the dataset used to learn an algorithm, then the algorithm will almost certainly be more accurate on the majority population. Naively learning algorithms in this fashion results in models that are accurate on average but biased.\nOn the other hand, we could instead implement a completely random algorithm: for example, deciding whether or not a patient should be admitted solely based on the flip of a coin. Such an algorithm would be completely unbiased, but not very accurate.\nThese two examples demonstrate that, in general, there is a tradeoff between accuracy and bias. As we train models to be more accurate, they often become more biased at the same time. Researchers are currently working on ways to overcome these limitations,4 but this tradeoff is an incredibly common empirical finding in practice.\n4 Chouldechova A, Roth A. The frontiers of fairness in machine learning. arXiv Preprint. (2018). doi: 10.48550/arXiv.1810.08810\nFairness Doesn’t Stack\nWhy is it hard to train models that are both fair and accurate? The full answer is complex, but one important reason is that fairness does not stack.\nImagine that we have a screening tool that seeks to predict whether a patient has a disease with 50% prevalence in the population. Using lower sensitivity as our definition of harm, suppose that\n\nOur screening tool is unbiased with respect to patient gender (i.e., male vs. female).\nOur screening tool is also unbiased with respect to patient race (i.e., “blue” vs. “green”).\n\nEven though the algorithm is unbiased with respect to color and unbiased with respect to gender, it can still be biased against blue females! Here’s an illustrative diagram of one possibility:\n\nFigure 1: Fairness Gerrymandering. 
Red circles indicate individuals flagged as positive by a disease screening tool. For a disease with 50% prevalence in the population across all subgroups, the screening tool can be unbiased with respect to both color and gender, yet still have 0% sensitivity for blue females and green males! Adapted from Kearns M, Neel S, Roth A, Wu ZS. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. Proc Int Conf Mach Learn 80: 2564-72. (2018). Link to Paper\n\nIn Figure 1, we can see that our screening tool is “fair” with respect to gender and color because it only flags individuals who are either both blue and male or both green and female. Ensuring that models are fair for certain subgroups doesn’t mean that those models are also fair for members of the intersections of those groups, or for other entirely unrelated subgroups. In other words, fairness conditions do not compose. This is one reason why fairness is such a hard problem to tackle!\nIn fact, the current state of the fairness in machine learning literature is that experts don’t have a single good definition of fairness for all applications. Some combinations of fairness conditions that we might want to satisfy simultaneously may be provably impossible to achieve in certain cases!5 This is an active area of research, and it’s important to acknowledge these challenges when discussing bias and fairness in algorithmic systems.\n5 Formal proofs of this statement do exist but are well outside the scope of this course. If this is an interesting topic to you, we recommend this quick (non-technical) read by Santamicone M. (2021). A slightly more technical blog post by Zhao H. (2020) at CMU covers more of the details.\n\n\nWhat can we do about this as clinicians?\nThe most important step in reducing bias is to recognize that all real-world algorithms are biased! 
How algorithms are biased, and to what extent, depends on your definition of harm and the patient attribute(s) that you’re focusing on. These definitions inherently differ between people and scenarios. Recognizing the biases in the algorithms used by both computers and humans, including our own, is critical to making the best decisions for each individual patient."
  },
  {
    "objectID": "bias.html#hands-on-tutorial",
    "href": "bias.html#hands-on-tutorial",
    "title": "Bias and Fairness",
    "section": "Hands-On Tutorial",
    "text": "Hands-On Tutorial\nTo better understand how bias can impact algorithms, let’s take a look at a simple example of a binary classifier algorithm that seeks to predict whether a patient (1) requires or (2) does not require supplemental oxygen therapy.\n\nWhat are pulse oximeters?\nPulse oximeters are small devices that measure the oxygen levels in our blood. Clinicians use them to check for hypoxemia, a dangerously low level of oxygen in the blood. Most healthy patients have an oxygen saturation of 95 to 100%.\nIt’s important to recognize that pulse oximeters only estimate our oxygen levels. The gold standard is an arterial blood gas (ABG) measurement, which requires inserting a needle into an artery and drawing blood.\n\n\nAre pulse oximeters biased?\nResearch has shown that, compared to ABG measurements, pulse oximeters can be less accurate for patients with darker skin tones.6 Specifically, they can overestimate oxygen levels for Black patients, leading to systematic delays in treatments like supplemental oxygen. This happens because pulse oximeters use light passing through the finger to estimate blood oxygen, and skin pigmentation can affect how that light is absorbed and reflected.\n6 Sjoding MW, Dickson RP, Iwashyna TJ, et al. Racial bias in pulse oximetry measurement. N Engl J Med 383(25). (2020). doi: 10.1056/NEJMc2029240\n For simplicity, we use “White” and “Black” as shorthand for individuals with lighter and darker skin tones, respectively. However, race is a social construct; while there may be a correlation between self-reported race and skin tone, it’s important to recognize that these are distinct concepts. Skin tone varies widely within any racial group, and the terms should not be used interchangeably in general.\n\n Weighing the tradeoffs: In most cases, it’s hard to have a test that is perfectly sensitive and specific. When diagnosing and treating hypoxemia, is it more important to have a high sensitivity or a high specificity? 
Why?\n7 Fong N, Lipnick MS, Behnke E, et al. Open access dataset and common data model for pulse oximeter performance data. Sci Data 12(570). (2025). doi: 10.1038/s41597-025-04870-8\nIn a few minutes, we’ll take a look at a dataset of real patients from a research team at UCSF.7 The dataset consists of 108 unique patients, 100 of whom are hypoxemic. There’s a relatively even split in terms of self-reported race: 59 identify as White and 49 identify as Black.\n\n\nHow to measure bias\nWe’ll primarily look at 3 different metrics:\n\nSensitivity: If the patient actually has low blood oxygen, did the pulse oximeter detect it? A high sensitivity means we catch nearly all of the true cases of hypoxemia.\nSpecificity: If the patient does not have low blood oxygen, did we correctly identify them as healthy? A high specificity means we rarely raise false alarms.\nAccuracy: Overall, how often does the pulse oximeter give us the right answer?\n\n\nStart treating patients!\nAs a doctor, your goal is to identify which patients have true hypoxemia and which do not. If a patient has a pulse oximetry reading below a certain cutoff, then you will start them on supplemental oxygen. In the Threshold Strategy section below, use your clinical skills to adjust this threshold for starting oxygen therapy. 
You will choose between 2 strategies:\n\nRace Unaware: use the same threshold for all patients regardless of race.\nRace Aware: use different thresholds for patients depending on their race.\n\nEach dot is a patient: blue dots correspond to patients you have accurately diagnosed, and gray dots correspond to incorrect diagnoses.\nThreshold Strategy\nRace Unaware | Race Aware\nGlobal Pulse Ox Threshold: 92.0%\nWhite Patients Threshold: 92.0%\nBlack Patients Threshold: 90.0%\nData Summary: Sensitivity, Specificity, Accuracy\nWhite patients: Sensitivity, Specificity, Accuracy\nBlack patients: Sensitivity, Specificity, Accuracy\n\nThinking It Through\n\nA common threshold for hypoxemia used in medicine is a pulse oximetry reading below 88%. Using this threshold, what is our diagnostic accuracy for White patients? How about for Black patients?\nWhat is the lowest threshold that maximizes the sensitivity for White patients? What is the sensitivity for Black patients using the same threshold?\nWhat is the highest threshold that maximizes the specificity for Black patients? What is the corresponding specificity for White patients using the same threshold?\nCan you achieve 100% sensitivity for both White and Black patients? How?\n\nLet’s Discuss\n\nIs using different thresholds for patients of different skin tones “racist”, even if it leads to better outcomes? 
Why or why not?\nThanks to recent research, we know the mechanism by which skin color affects pulse oximetry accuracy. Would your answer to the previous question change if we did not know the mechanism and could only rely on observational data?\nThankfully, the cost of unnecessarily giving supplemental oxygen to a healthy patient is relatively small. But this isn’t always true! For example, suppose you have a diagnostic test for a very serious cancer that requires expensive and intensive medications to treat. How might your decision making change?\n\n\n Another way to think about the supplemental oxygen problem is in terms of how equity and equality differ from one another. What do these two terms mean to you?"
  },
  {
    "objectID": "bias.html#discussion-questions",
    "href": "bias.html#discussion-questions",
    "title": "Bias and Fairness",
    "section": "Discussion Questions",
    "text": "Discussion Questions\n\nWho do you agree with: ProPublica or the COMPAS developers? In other words, based on the evidence presented, do you believe that the COMPAS algorithm is biased? Would you be comfortable having it used to help determine judicial outcomes for your close friends or family?\nIn the hands-on tutorial, we explored how even simple binary classifiers can be biased in scenarios such as determining supplemental oxygen requirements. What other analogous “binary classifier” clinical situations have you encountered? How did you decide where to set your own “threshold” for positive and negative labels? Did your strategy vary between different patients?"
  },
  {
    "objectID": "bias.html#summary",
    "href": "bias.html#summary",
    "title": "Bias and Fairness",
    "section": "Summary",
    "text": "Summary\nBias is defined based on (1) protected attribute(s) and (2) a definition of harm. Because our definition of harm can vary from person to person, bias is often subjective. The impact of bias depends on the clinical scenario and the real-world implications of the definition of harm. It is important to recognize our own internal sources of bias in addition to the biases of clinical and computational algorithms."
  },
  {
    "objectID": "bias.html#additional-readings",
    "href": "bias.html#additional-readings",
    "title": "Bias and Fairness",
    "section": "Additional Readings",
    "text": "Additional Readings\n\nNicoletti L and Bass D. Humans are biased. Generative AI is even worse. Bloomberg. (2023). Link to article\nKearns M, Roth A. Responsible AI in the wild: Lessons learned at AWS. Amazon Science Blog. (2023). Link to article\nEvaluating Model Fairness. Arize Blog. (2023). Accessed 19 May 2024. Link to article\nList JM, Palevsky P, Tamang S, et al. Eliminating algorithmic racial bias in clinical decision support algorithms: Use cases from the Veterans Health Administration. Health Equity 7(1): 809-16. (2023). doi: 10.1089/heq.2023.0037. PMID: 38076213\nMittermaier M, Raza MM, Kvedar JC. Bias in AI-based models for medical applications: Challenges and mitigation strategies. npj Digit Med 6(113). (2023). doi: 10.1038/s41746-023-00858-z. PMID: 37311802"
  },
  {
    "objectID": "index.html",
    "href": "index.html",
    "title": "Ethical Algorithms for the Modern Clinician",
    "section": "",
    "text": "Ethical Algorithms for the Modern Clinician is a short course intended to teach future clinicians the basics of machine learning (ML) and artificial intelligence (AI) as they pertain to clinical practice.\nThis resource does not discuss how machine learning algorithms work or how to build ML models. Instead, we introduce ML from a clinical practitioner’s perspective and discuss what ML does right, where it falls short, and how it will impact patient care. We focus on five key aspects:\n\nIntroduction to Machine Learning: What is machine learning? How is it similar to and different from conventional software?\nBias and Fairness: How can algorithms be biased against different patient groups? How can we quantify, detect, and reduce bias in clinical decision-making algorithms?\nPrivacy and Anonymization: How can we anonymize patient data? Why does anonymization often fall short in protecting patient identities? How can we ensure that clinicians maintain patient privacy?\nAlgorithmic Interpretability: What does it mean for an algorithm to be interpretable? Is it important for us to be able to explain how an algorithm works in order to use it in clinical practice?\nGenerative AI: What is generative AI, and how might it be used for patient care? What are the new challenges and opportunities associated with generative AI models?\n\n\n\n Additional (optional) context to help motivate interesting discussions will be marked by little information icons like this in each of the modules.\n1 Optional citations for digging deeper into interesting topics will be included in the margin like this: Fowkes FG, Gehlbach SH, Farrow SC, et al. Epidemiology for medical students: A course relevant to clinical practice. Int J Epidemiol 13(4): 538-41. (1984). doi: 10.1093/ije/13.4.538. 
PMID: 6519897\nJust as epidemiology was introduced in medical school curricula alongside the rise of evidence-based guidelines1, it is crucial for future clinicians to have a working understanding of machine learning algorithms as they become increasingly prevalent in clinical practice. Much of the curriculum has been inspired by the (much longer) ethical algorithms course at Penn taught by Michael Kearns. Ethical Algorithms for the Modern Clinician is intended as foundational knowledge for medical students of all backgrounds; prior experience with machine learning is not required."
  },
  {
    "objectID": "index.html#overview",
    "href": "index.html#overview",
    "title": "Ethical Algorithms for the Modern Clinician",
    "section": "",
    "text": "Ethical Algorithms for the Modern Clinician is a short course intended to teach future clinicians the basics of machine learning (ML) and artificial intelligence (AI) as they pertain to clinical practice.\nThis resource does not discuss how machine learning algorithms work or how to build ML models. Instead, we introduce ML from a clinical practitioner’s perspective and discuss what ML does right, where it falls short, and how it will impact patient care. We focus on five key aspects:\n\nIntroduction to Machine Learning: What is machine learning? How is it similar to and different from conventional software?\nBias and Fairness: How can algorithms be biased against different patient groups? How can we quantify, detect, and reduce bias in clinical decision-making algorithms?\nPrivacy and Anonymization: How can we anonymize patient data? Why does anonymization often fall short in protecting patient identities? How can we ensure that clinicians maintain patient privacy?\nAlgorithmic Interpretability: What does it mean for an algorithm to be interpretable? Is it important for us to be able to explain how an algorithm works in order to use it in clinical practice?\nGenerative AI: What is generative AI, and how might it be used for patient care? What are the new challenges and opportunities associated with generative AI models?\n\n\n\n Additional (optional) context to help motivate interesting discussions will be marked by little information icons like this in each of the modules.\n1 Optional citations for digging deeper into interesting topics will be included in the margin like this: Fowkes FG, Gehlbach SH, Farrow SC, et al. Epidemiology for medical students: A course relevant to clinical practice. Int J Epidemiol 13(4): 538-41. (1984). doi: 10.1093/ije/13.4.538. 
PMID: 6519897\nJust as epidemiology was introduced in medical school curricula alongside the rise of evidence-based guidelines1, it is crucial for future clinicians to have a working understanding of machine learning algorithms as they become increasingly prevalent in clinical practice. Much of the curriculum has been inspired by the (much longer) ethical algorithms course at Penn taught by Michael Kearns. Ethical Algorithms for the Modern Clinician is intended as foundational knowledge for medical students of all backgrounds; prior experience with machine learning is not required."
  },
  {
    "objectID": "index.html#how-to-use-this-book",
    "href": "index.html#how-to-use-this-book",
    "title": "Ethical Algorithms for the Modern Clinician",
    "section": "How to Use This Book",
    "text": "How to Use This Book\nEveryone learns differently. Some find it most helpful to self-study the material, which prior students report can usually be done in a weekend. We believe the most effective and fruitful way to learn the content herein is in small, discussion-based classroom environments. Many of the discussion questions included in each module have no single right answer, and it is often helpful to learn how peers are thinking about challenging topics, too. To facilitate this learning environment, the content has been broken down into five modules that can each be covered in a one-hour session.\n\n\n Optional follow-up discussion questions and food-for-thought are marked with a lightbulb to help you explore topics further.\n\n\n\n\n\n\nCautionWhat This Resource Is Not\n\n\n\n\n\nThere are a lot of machine learning resources out there already, so what makes Ethical Algorithms for the Modern Clinician different? This resource is not any of the following:\n\nAn introduction to the theory of and programming for machine learning: If you’re interested in building ML models from scratch, the MedML@Emory Club has put together a fantastic tutorial here, and Andrew Ng also has a great curated list of technical tutorials here.\nAn overview of the computational techniques used to build machine learning models: If you’re looking for something like this, we recommend checking out recent work by Pfob et al. (2022) and Sidey-Gibbons et al. (2019).\nA mathematically rigorous foundation for assessing algorithmic fairness and privacy: If this is something you’re interested in, we highly recommend checking out The Ethical Algorithm and the Ethical Algorithm Design course offered at Penn every year."
  },
  {
    "objectID": "index.html#adoption-and-feedback",
    "href": "index.html#adoption-and-feedback",
    "title": "Ethical Algorithms for the Modern Clinician",
    "section": "Adoption and Feedback",
    "text": "Adoption and Feedback\nEthical Algorithms for the Modern Clinician has been adapted into annual student-run elective courses at the University of Pennsylvania and Mt. Sinai schools of medicine. Portions of the course have also been integrated into graduate seminar courses and the Epidemiology course for first-year medical students at Penn.\nThe course has been presented at the 2025 AAMC Annual Meeting and the 2025 NeurIPS Educational Program.\nIf you’re interested in adapting the course for your own classroom, get in touch with us!"
  },
  {
    "objectID": "index.html#about-the-authors",
    "href": "index.html#about-the-authors",
    "title": "Ethical Algorithms for the Modern Clinician",
    "section": "About the Authors",
    "text": "About the Authors\nMichael Yao is an MD-PhD candidate at the University of Pennsylvania who has previously worked as an ML Scientist at companies such as Genentech, Microsoft Research, and Scale AI. His research focuses on trustworthy machine learning and how we can reliably build and use ML systems in clinical workflows.\nAllison Chae is a PGY-1 internal medicine resident at Main Line Health researching how AI algorithms can be used in real-world clinical workflows. She currently works alongside radiologists to better understand how physicians and computer systems interact with one another."
  },
  {
    "objectID": "index.html#expert-collaborators",
    "href": "index.html#expert-collaborators",
    "title": "Ethical Algorithms for the Modern Clinician",
    "section": "Expert Collaborators",
    "text": "Expert Collaborators\nEthical Algorithms for the Modern Clinician is backed by experts in both clinical medicine and machine learning.\n\nDaniel Hashimoto, MD is an Assistant Professor of Surgery at the University of Pennsylvania. His research explores applications of computer vision to intraoperative surgical decision making.\nMark Yatskar, PhD is an Assistant Professor of Computer and Information Science at the University of Pennsylvania. His lab studies the interplay between visual and language systems in machine learning, and how such systems can inherit human biases.\nWalter Witschey, PhD is an Associate Professor of Radiology at the University of Pennsylvania. He leads an interdisciplinary lab studying MRI techniques and machine learning for clinical radiology.\nTessa Cook, MD, PhD is an Associate Professor of Radiology at the University of Pennsylvania. Her research investigates how machine learning can be used for imaging informatics and clinical practice transformation across multiple clinical fields.\nJames Gee, PhD is Professor of Radiologic Science in Radiology at the University of Pennsylvania. His lab conducts research in biomedical imaging analysis and computational methods for both ex-vivo and in-vivo medical imaging."
  },
  {
    "objectID": "index.html#institutional-partners",
    "href": "index.html#institutional-partners",
    "title": "Ethical Algorithms for the Modern Clinician",
    "section": "Institutional Partners",
    "text": "Institutional Partners"
  },
  {
    "objectID": "interpretability.html",
    "href": "interpretability.html",
    "title": "Algorithmic Interpretability",
    "section": "",
    "text": "Define what it means for an algorithm to be interpretable and highlight key ways that the definition is subjective and user-dependent.\nDescribe the accuracy-interpretability tradeoff and why it is observed in many real-world algorithms.\nReflect on the role of interpretability in algorithms used in clinical practice.\n\n\n\n\n\n\n\nTipFood for Thought: Interpretability or Accuracy?\n\n\n\n\n\nSuppose I ask you the following question: is 99 a prime number? Which of the following generated answers is more helpful to you?\n\nNo, 99 is not a prime number.\nYes, 99 is a prime number. To figure out if a number is prime, we can list all of the numbers that are greater than 1 and less than or equal to the square root of the input number. In our case, the numbers that satisfy these criteria are 2 through 9 inclusive. We then see if the original number is divisible by any of the numbers in this list: if not, then the number is prime. Otherwise, the number is composite. 99 is not divisible by any of 2 through 9 inclusive. Therefore, we can conclude that 99 is prime.\n\nWould your answer change if the original question were instead “What disease does this sick patient in front of me have?”\nWhich is more important to you: getting the right answer (accuracy) or understanding how to approach similar problems in the future (interpretability)?"
  },
  {
    "objectID": "interpretability.html#learning-objectives",
    "href": "interpretability.html#learning-objectives",
    "title": "Algorithmic Interpretability",
    "section": "",
    "text": "Define what it means for an algorithm to be interpretable and highlight key ways that the definition is subjective and user-dependent.\nDescribe the accuracy-interpretability tradeoff and why it is observed in many real-world algorithms.\nReflect on the role of interpretability in algorithms used in clinical practice.\n\n\n\n\n\n\n\nTipFood for Thought: Interpretability or Accuracy?\n\n\n\n\n\nSuppose I ask you the following question: is 99 a prime number? Which of the following generated answers is more helpful to you?\n\nNo, 99 is not a prime number.\nYes, 99 is a prime number. To figure out if a number is prime, we can list all of the numbers that are greater than 1 and less than or equal to the square root of the input number. In our case, the numbers that satisfy these criteria are 2 through 9 inclusive. We then see if the original number is divisible by any of the numbers in this list: if not, then the number is prime. Otherwise, the number is composite. 99 is not divisible by any of 2 through 9 inclusive. Therefore, we can conclude that 99 is prime.\n\nWould your answer change if the original question were instead “What disease does this sick patient in front of me have?”\nWhich is more important to you: getting the right answer (accuracy) or understanding how to approach similar problems in the future (interpretability)?"
  },
  {
    "objectID": "interpretability.html#what-is-interpretability",
    "href": "interpretability.html#what-is-interpretability",
    "title": "Algorithmic Interpretability",
    "section": "What is Interpretability?",
    "text": "What is Interpretability?\nUnlike fairness and anonymity, which we defined precisely in past modules, interpretability is challenging to define rigorously and objectively. One commonly cited definition is that interpretability is the degree to which a human can understand why an algorithm made its prediction.1 If an algorithm is interpretable, then it is easier for someone to understand why certain predictions were made. Note that this definition of interpretability is entirely independent of the accuracy of the algorithm: we only seek to explain why an algorithm made its prediction, which may or may not be correct.\n1 Miller T. Explanation in artificial intelligence: Insights from the social sciences. Art Intel 267: 1-38. (2019). doi: 10.1016/j.artint.2018.07.007\n2 Lipton ZC. The mythos of model interpretability. Proc ICML Workshop on Human Interpretability in Machine Learning. (2016). doi: 10.48550/arXiv.1606.03490\nHowever, even this definition of interpretability remains underspecified.2 For example, here are some other factors of algorithmic design that are closely related to, if not paramount for, interpretability:\n\nTrust: What is an objective notion of trust? Is it confidence that a model will perform well? Do you care about how often an algorithm is correct, or for which inputs the algorithm is correct?\nCausality: Can we use algorithms to generate potential hypotheses about the world around us? Is this important in labeling a model as interpretable?\nTransferability: Do algorithms generalize to new patient populations? Can we predict when an algorithm might generalize and when it won’t? How might deploying a model alter the user’s environment in a way that invalidates the model itself?\nInformativeness: As explored in the Food for Thought discussion above, is it more important for you to have an explanation to approach future problems, or to get the correct answer? 
Does the model serve the role of an oracle, a colleague, or a mentor?\nFair and Ethical Decision Making: How can we be sure that predictions do not discriminate on the basis of race, age, gender, and other patient attributes?\n\n\n\n\n\n\n\nTip A recent study described a machine learning algorithm that predicted a lower risk of death from pneumonia for patients who also had asthma. Why might this be the case? Would you want to deploy this system in the ED?\n\n\n\n\n\nIn 2015, Caruana et al. showed that concurrent asthma was a predictor of lower risk of death in patients with pneumonia. This is because physicians who knew about a patient’s asthma history usually treated these patients more aggressively. As a result, asthmatic patients paradoxically had lower death rates from pneumonia.\nImagine that we naively implement this model for patient triage without understanding the reasoning behind its predictions. In this case, deploying the model would discourage physicians from treating asthmatic patients more aggressively, because the model would label these patients as having a low risk of death. In other words, deploying the model would end up invalidating the model and harming patients!\nCaruana R, Lou Y, Gehrke J, Koch P, Sturm M, Elhadad N. Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. Proc Conf Knowl Disc Data Mining: 1721-30. (2015). 
doi: 10.1145/2783258.2788613\n\n\n\nWhether an algorithm is labeled as interpretable under these criteria often depends on a number of additional factors:\n\nDifficulty of the Task: Algorithms trained to perform complex, domain-specific tasks are inherently less interpretable to the average user due to the nature of the task itself.\nExpertise of the User: A domain expert may require less explanation than someone with less experience in the field before calling an algorithm interpretable.\nExpertise with the Algorithm: As with any other software, technicians with years of experience using an algorithm are more likely to be able to explain its predictions.\n\n\n\n What other factors might influence the subjectivity of the interpretability of an algorithm?"
  },
  {
    "objectID": "interpretability.html#discussion-questions",
    "href": "interpretability.html#discussion-questions",
    "title": "Algorithmic Interpretability",
    "section": "Discussion Questions",
    "text": "Discussion Questions\n\nWhich of the following algorithms are interpretable?\nMean Arterial Pressure (MAP)\nOn one end of the spectrum, clinical algorithms like computing the mean arterial pressure (MAP) are pretty clearly interpretable. We can write down the exact formula for this quantity:\n\\[\\text{MAP}=\\frac{1}{3}(\\text{Systolic Blood Pressure (mmHg)}) + \\frac{2}{3}(\\text{Diastolic Blood Pressure (mmHg)})\\]\nWe might even be able to reason about why this formula works: heuristically, the arterial pressure spends about 2/3 of the cardiac cycle in diastole and 1/3 in systole, so the MAP is the time-weighted average of these two quantities.\nMELD Score\nThe MELD score is a clinical algorithm used to quantify the severity of end-stage liver disease in potential transplant candidates. Similar to MAP, the MELD score also has an exact formula:\n\\[\\text{MELD}=9.57\\times\\ln(\\text{Cr}) + 3.78\\times \\ln(\\text{Bilirubin})+11.20\\times \\ln(\\text{INR})+6.43\\]\nIs this equation still interpretable? From the equation above, we clearly have transparency into how a MELD score is calculated, but the equation itself is more complicated and may not be easily understood by everyone. After looking at the above equation, we are still left with several questions: How were the decimal coefficients derived? Why is there a logarithmic relationship between the MELD score and patient lab values?\nA Machine Learning Algorithm\nSuppose we now have an ML algorithm that predicts a patient’s risk of breast cancer given their genomic data. Such algorithms are often referred to as black-box algorithms because their users cannot see their inner workings. However, is such an algorithm truly “black-box”? As with the MELD score, I can write down the exact formula for the algorithm, with all of its inputs, internal functions, decimal coefficients, etc. 
It would be an incredibly long and complex equation, but any ML algorithm can be written down exactly, just like the MELD score and MAP calculations above.\n\n\n\n\n\n\nTipDoes being able to write down a mathematical formula for an algorithm (even if it’s extraordinarily complex as with machine learning models) mean that the algorithm is interpretable? Why or why not?\n\n\n\n\n\nUnfortunately, we don’t have a “right” answer to share with you here. Comparing the three scenarios above, some experts may describe all of them as interpretable, arguing that humans simply lack the ability to understand the more complex ones. Others might claim that being able to write down a formula is not sufficient. Is it the algorithm’s “job” to explain itself to humans when we lack the ability to understand how it works?\n\n\n\nA Probabilistic Algorithm\nNext, consider the following algorithm that uses a fair two-sided coin: if I flip the coin and it lands heads, I admit the patient from the ED. Otherwise, I discharge the patient home. Is this algorithm (or any probability-based algorithm) “interpretable”?\nAn Accurate Algorithm Trained on an Unknown Dataset\nAfter reading a recently published paper on a new machine learning algorithm to diagnose a rare disease, you try testing the algorithm on your own patients’ data and find that it has almost perfect accuracy! However, the paper does not include any details about how the model was trained, including any information on the demographics of the patients in the study."
  },
  {
    "objectID": "interpretability.html#interpretability-versus-accuracy",
    "href": "interpretability.html#interpretability-versus-accuracy",
    "title": "Algorithmic Interpretability",
    "section": "Interpretability versus Accuracy",
    "text": "Interpretability versus Accuracy\nA key insight we hope you take away from the discussion questions above is that there is often a tradeoff between the complexity and interpretability of an algorithm. More complex algorithms, such as machine learning models and the MELD score, may be less interpretable. At the same time, more complex algorithms can often represent more complex relationships between inputs and outputs, leading to better predictive accuracy. In other words, we have the following:\n\n\n\n\n\n\n\nNoteThe Accuracy-Interpretability Tradeoff\n\n\n\nThe more accurate an algorithm is, the less likely it is to be interpretable.3\n\n\n3 There’s a great blog post discussing the accuracy-interpretability tradeoff in more detail here: Ndungula S. Model accuracy and interpretability. Medium. (2022). Link to article"
  },
  {
    "objectID": "interpretability.html#what-are-some-strategies-for-building-interpretable-algorithms",
    "href": "interpretability.html#what-are-some-strategies-for-building-interpretable-algorithms",
    "title": "Algorithmic Interpretability",
    "section": "What are some strategies for building interpretable algorithms?",
    "text": "What are some strategies for building interpretable algorithms?\n\nFeature Attribution is one way that researchers try to build interpretable algorithms: it attributes the final model prediction to the individual contributions of each input variable. A good example of this is linear regression. Suppose that we have two variables, \\(A\\) and \\(B\\), that contribute to a final output \\(y\\) according to the following model:\n\n\\[y = m_A A + m_B B + y_0\\]\nIf \\(A\\) and \\(B\\) are measured on comparable scales and \\(|m_A|\\gg|m_B|\\), then we know that the importance of variable \\(A\\) to the final score \\(y\\) is greater than that of variable \\(B\\).\n\nChain-of-Thought (CoT) Reasoning is a technique used specifically with large language models (LLMs) like ChatGPT. The goal of CoT prompting is to ask an algorithm to generate a step-by-step explanation or reasoning process before arriving at a final answer. Savage et al. (2024) include illustrative examples of CoT reasoning as it pertains to clinical medicine.\n\n\n\n Is the CoT “explanation” generated by a large language model truly how the LLM reasons its way to a predicted answer? Turpin et al. (Proc NeurIPS, 2023) show that this is not always the case.\n\nPost-Hoc Explanations refers to analyzing many of an algorithm’s past outputs to better understand how it works. Unfortunately, in practice this approach often turns into “fishing for explanations.” In general, we want to avoid generating hypotheses for how a model might work after the data has already been observed. It is important to carefully consider how post-hoc explanations are generated and how they may lead to biased interpretations of algorithms.\nConstraining How Input Data is Represented is another way to achieve model interpretability. 
For example, based on prior knowledge of a disease’s pathophysiology, we might restrict an algorithm to use only particular clinical lab values to arrive at its prediction.\n\nWhat other methods can we use to make algorithms more interpretable?"
  },
  {
    "objectID": "interpretability.html#evidence-based-medicine-discussion",
    "href": "interpretability.html#evidence-based-medicine-discussion",
    "title": "Algorithmic Interpretability",
    "section": "Evidence-Based Medicine Discussion",
    "text": "Evidence-Based Medicine Discussion\nDo algorithms need to be interpretable in order for clinicians to leverage them for patient care?\n\n\n\n\n\n\n\nNote1. Overview Article\n\n\n\nImrie F, Davis R, van der Schaar M. Multiple stakeholders drive diverse interpretability requirements for machine learning in healthcare. Nat Mach Intell 5: 824-9. (2023). doi: 10.1038/s42256-023-00698-2\ntl;dr: Machine learning (ML) algorithms are becoming increasingly commonplace in healthcare settings. Key stakeholders in healthcare systems, such as algorithm developers, researchers, clinicians, and patients, often have different (and sometimes conflicting) definitions of interpretability for the algorithms used in clinical practice.\n\n\n\n\n\n\n\n\nTip2. Yes, interpretability ensures that algorithms are aligned with clinical reasoning.\n\n\n\nAntony M, Kakileti ST, Shah R, Sahoo S, Bhattacharyya C, Manjunath G. Challenges of AI driven diagnosis of chest X-rays transmitted through smart phones: A case study in COVID-19. Sci Rep 13: 18102. (2023). doi: 10.1038/s41598-023-44653-y. PMID: 37872204\ntl;dr: Retrospective study using multiple publicly available chest X-ray (CXR) imaging datasets from approximately 40,000 patients. Researchers found that state-of-the-art machine learning models could accurately predict which patients had COVID-19 from CXR imaging studies, but were actually making these predictions by focusing on regions of the CXR scans completely outside the lung fields, and in certain instances even outside the patient’s body.\n\n\n\n\n\n\n\n\nImportant3. No, using black-box models allows us to discover new clinical insights and provide better care.\n\n\n\nLing J, Liao T, Wu Y, Wang Z, Jin H, Lu F, Fang M. Predictive value of red blood cell distribution width in septic shock patients with thrombocytopenia: A retrospective study using machine learning. J Clin Lab Anal 35(12): e24053. (2021). doi: 10.1002/jcla.24053. 
PMID: 34674393\ntl;dr: The red blood cell distribution width (RDW) is a lab value most commonly used in the workup of anemias. However, a retrospective study of a large patient dataset used non-interpretable machine learning methods to show that RDW was the second most important lab value for predicting 28-day mortality in septic shock patients with thrombocytopenia. Non-interpretable algorithms therefore helped clinicians “discover” new clinical applications of the RDW."
  },
  {
    "objectID": "interpretability.html#summary",
    "href": "interpretability.html#summary",
    "title": "Algorithmic Interpretability",
    "section": "Summary",
    "text": "Summary\nInterpretability is a subjective property of an algorithm that characterizes the degree to which a human can understand why and how an algorithm made its prediction. There are many reasons why interpretability can vary, including the difficulty of the clinical task, the complexity of the algorithm, and the expertise of the user, among many others. While a tradeoff between accuracy and interpretability is often observed in practice, many experts believe that interpretability is important for algorithms used in patient care."
  },
  {
    "objectID": "interpretability.html#additional-readings",
    "href": "interpretability.html#additional-readings",
    "title": "Algorithmic Interpretability",
    "section": "Additional Readings",
    "text": "Additional Readings\n\nModel interpretability. Amazon Web Services Whitepapers. Accessed 20 May 2024. Link to article\nAmann J, Blasimme A, Vayena E, Frey D, Madai VI. Explainability for artificial intelligence in healthcare: A multidisciplinary perspective. BMC Med Inform Decis Mak 20(1): 310. (2020). doi: 10.1186/s12911-020-01332-6. PMID: 33256715\nTeng Q, Liu Z, Song Y, et al. A survey on the interpretability of deep learning in medical diagnosis. Multimed Syst 28(6): 2335-55. (2022). doi: 10.1007/s00530-022-00960-4. PMID: 35789785"
  }
]