,entity name,entity type,timestamp,description
0,MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning,paper,,"This paper proposes a novel Multi-disciplinary Collaboration framework to enhance LLM proficiency and reasoning capabilities in the medical domain through collaborative discussion, focusing on zero-shot medical reasoning."
1,Large language models (LLMs),technology,,"Despite their remarkable progress across various general domains, they encounter significant barriers in medicine and healthcare."
2,medicine and healthcare,domain,,
3,domain-specific terminologies,problem,,Unique challenge in medicine and healthcare that LLMs face.
4,reasoning over specialized knowledge,problem,,Unique challenge in medicine and healthcare that LLMs face.
5,Multi-disciplinary Collaboration (MC) framework,method,,A framework for the medical domain that leverages LLM-based agents in a role-playing setting to enhance LLM proficiency and reasoning capabilities.
6,role-playing setting,method,,Setting in which LLM-based agents participate in a collaborative multi-round discussion.
7,collaborative multi-round discussion,method,,Part of the MC framework to enhance LLM proficiency and reasoning capabilities.
8,zero-shot setting,idea,,Concept where models are applied to scenarios without prior specific training.
9,MedQA,technology,,Dataset used to support the MC framework.
10,MedMCQA,technology,,Dataset used to support the MC framework.
11,PubMedQA,technology,,Dataset used to support the MC framework.
12,MMLU subtasks,technology,,Dataset used to support the MC framework.
13,MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning,paper,,
14,LLMs,technology,,Large Language Models
15,medical field,domain,,
16,diagnostics,domain,,a medical application of Large Language Models
17,genetics,domain,,a medical application of Large Language Models
18,pharmacist,domain,,a medical application of Large Language Models
19,medical evidence summarization,domain,,a medical application of Large Language Models
20,clinical inquiries,problem,,"require intricate medical expertise and decent reasoning abilities, posing challenges to Large Language Models"
21,enhanced clinical reasoning capabilities,method,,
22,MedAgents,technology,,Large Language Models for zero-shot medical reasoning
23,tool-augmented methods,method,,one major line of research on Large Language Models in medical domains
24,instruction-tuning methods,method,,one major line of research on Large Language Models in medical domains
25,Precedent-enhanced legal judgment prediction with llm and domain-model collaboration,paper,2023,
26,Large language models encode clinical knowledge,paper,2023,
27,Medchatzh: a better medical adviser learns from better instructions,paper,2023,
28,Disc-medllm: Bridging general large language models and real-world medical consultation,paper,2023,
29,Capabilities of gpt-4 on medical challenge problems,paper,2023,
30,Evaluation of the performance of gpt-3.5 and gpt-4 on the medical final examination,paper,2023,
31,Medalpaca - an open-source collection of medical conversational ai models and training data,paper,2023,
32,Analysis of large-language model versus human performance for genetics questions,paper,2023,
33,Genegpt: Augmenting large language models with domain tools for improved access to biomedical information,paper,2023,
34,Pharmacygpt: The ai pharmacist,paper,2023,
35,Aligning factual consistency for clinical studies summarization through reinforcement learning,paper,2023,
36,Evaluating large language models on medical evidence summarization,paper,2023,
37,"Summarizing, simplifying, and synthesizing medical evidence using gpt-3 (with varying success)",paper,2023,
38,Med-halt: Medical domain hallucination test for large language models,paper,2023,
39,GeneGPT,method,,guided LLMs to leverage the Web APIs of the National Center for Biotechnology Information (NCBI) to meet various biomedical information needs
40,Almanac,method,,a framework that is augmented with retrieval capabilities for medical guidelines and treatment recommendations
41,KARD,method,,a method to improve small LMs on specific domain knowledge by finetuning small LMs on the rationales generated from LLMs and augmenting small LMs with external knowledge from a non-parametric memory
42,NCBI,technology,,
43,instruction tuning,method,,
44,fine-tuning,method,,
45,self-prompted data,method,,
46,clinical knowledge bases,technology,,
47,biomedical literature,domain,,
48,traditional Chinese medicine,domain,,
49,medical instruction data,technology,,
50,latent medical knowledge,technology,,
51,reasoning in a training-free setting,problem,,
52,medical reasoning tasks,problem,,
53,LLM-based multi-agent collaboration,method,,
54,LLM-based agents,technology,,The development has made significant progress in the community by endowing LLMs with the ability to perceive surroundings and make decisions individually.
55,multi-agent pattern,method,,A method that explores the potential of LLM-based agents by learning from multi-turn feedback and cooperation.
56,role-playing,method,,A simulation of human activities for LLM-based multi-agent collaboration.
57,communication,method,,A simulation of human activities for LLM-based multi-agent collaboration.
58,performance,problem,,"Performance is mentioned in the context of enhancing the effectiveness of large language model-based multi-agent collaboration. It is specifically noted in the phrase 'to improve performance by dynamically identifying and engaging multiple personas throughout task-solving,' indicating challenges in optimizing the performance of these models."
59,Camel,method,,A method that leverages role-playing to enable chat agents to communicate with each other for task completion.
60,adversarial collaboration,method,,A method that includes debates and negotiation among multiple agents to further boost performance.
61,multi-agent debate framework,method,,A framework in which various agents put forward their statements in a tit for tat pattern.
62,multi-disciplinary consultation mechanism,method,,"A mechanism common and effective in hospitals, adopted for medical reasoning tasks through LLM-based multi-agent collaboration."
63,Large Language Models (LLMs),technology,,
64,Detecting Semantic Relations between Terms in Definitions,paper,2004-08-29,"Terminology structuring aims to elicit semantic relations between the terms of a domain. We propose here to exploit definitions found in corpora to obtain such semantic relations. Definition typologies show that definitions can be introduced by different semantic relations, some of these relations being likely to structure terminologies. Our aim is therefore to mine “defining expressions” in domain-specific corpora, and to detect the semantic relations they involve between their main terms. We use lexico-syntactic markers and patterns to detect at the same time both a definition and its main semantic relation. 46 markers and 74 patterns have been designed and tuned on a first corpus in the field of anthropology. We report on their evaluation on a second corpus in the field of dietetics, where they obtained 4% to 36% recall and from 61 to 66% precision, and discuss the relative accuracy of different subclasses of markers for this task."
65,Automated Alignment and Extraction of Bilingual Domain Ontology for Cross-Language Domain-Specific Applications,paper,2004-08-23,"In this paper we propose a novel approach for ontology alignment and domain ontology extraction from the existing knowledge bases, WordNet and HowNet. These two knowledge bases are aligned to construct a bilingual ontology based on the co-occurrence of the words in the sentence pairs of a parallel corpus. The bilingual ontology has the merit that it contains more structural and semantic information coverage from these two complementary knowledge bases. For domain-specific applications, the domain-specific ontology is further extracted from the bilingual ontology by the island-driven algorithm and the domain-specific corpus. Finally, the domain-dependent terminologies and some axioms between domain terminologies are integrated into the ontology. For ontology evaluation, experiments were conducted by comparing the benchmark constructed by the ontology engineers or experts. The experimental results show that the proposed approach can extract an aligned bilingual domain-specific ontology."
66,Bridging software languages and ontology technologies: tutorial summary,paper,2010-10-17,"Current model-driven development approaches allow for a more productive way of developing software systems. However, building tools and languages for software development still suffer a neglect of semantics in modeling and metamodeling.
 An interest to strengthen semantics in modeling and metamodeling that gained scientific and commercial attention is the integration of ontology technology and software development. Ontology formalisms for consistency validation and dynamic classification as well as semantic web technologies for enabling shared terminologies and automated reasoning provide means for leveraging metamodeling and language engineering.
 This tutorial summary (1) highlights the potential of ontology and semantic web technology for modeling and metamodeling in software development, positioning it among modeling standards like UML and MOF; and (2) illustrates ontology-enabled software development with real application scenarios in areas like software design patterns, domain-specific languages and variability management."
67,"Toward a theorizing strategy with components of terminologies, classifications, and nursing theories.",paper,2022-10-07,"PURPOSE
This article describes a theorizing strategy that integrates the components of classifications or terminologies with elements of grand or middle-range theories.


METHODS
The source of metatheoretical data to support the strategy was the levels of theories by Dickoff et al. (1968). Terminological data sources were professional classifications and terminologies.


FINDINGS
The authors synthesized data and philosophical, metatheoretical, theoretical, and terminological knowledge from primary sources on the subject to construct arguments and demonstrate suitable links.


CONCLUSIONS
The proposal presented in this article of a strategy for building theories integrates theories and classifications or standardized nomenclatures. It applies levels of theorization: scrutiny of phenomena, description, conceptualization, naming, relationship, modeling, and operationalization to achieve higher levels of explanatory, predictive, and prescriptive properties on generated theory.


IMPLICATIONS FOR NURSING PRACTICE
The implications for practice and research are connected to the theorizing strategy proposed in this article. We assume that using professional language at all levels of theorization can ensure that the concepts generated are closer to clinical practice."
68,Effectiveness of Standardized Nursing Terminologies for Nursing Practice and Healthcare Outcomes: A Systematic Review.,paper,2021-02-12,"PURPOSE
This review evaluates the effectiveness of using standardized terminologies in nursing.


METHODS
A systematic literature review was performed via PubMed, Web of Science, CINAHL, and OVID databases for articles published between January 1973 and September 2020. The Effective Public Health Practice Project's Quality Assessment Tool for Quantitative Studies was used to assess the quality of all included studies.


RESULTS
Fourteen studies were selected for data extraction and analysis, which included a total of 24,243 patients and 99 nurses. Of the studies that met the inclusion criteria, five were of high quality, one was of moderate quality, and eight were of weak quality. All articles were summarized according to two themes: the identification of common outcomes or interventions, and the validation or evaluation of the effectiveness of standard nursing terminology sets.


CONCLUSION
Standardized terminologies in nursing help nurses to implement care plans according to nursing procedures, supervise changes in patients' sensitive indicators, improve patients' health outcomes, and contribute to evidence-based nursing practices and global data resource sharing.


IMPLICATIONS FOR NURSING PRACTICE
Standardized nursing terminologies have positive effects on clinical practice, are essential for enriching nurses' knowledge, and alter nurses' attitudes regarding education and guidance, which promotes the clinical application of these terminologies."
69,Exploring the nexus between the standardized nursing terminologies and the unfinished nursing care phenomenon: An empty systematic review.,paper,2024-04-02,"PURPOSE
To identify and synthesize evidence regarding the documented relationship between the standardized nursing terminologies and the unfinished nursing care phenomenon.


DATA SOURCES
A systematic review according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. PubMed, Scopus, and Cumulative Index to Nursing and Allied Health Literature Complete databases were last consulted on November 27, 2023. The review included primary quantitative studies that reported an association between recognized standardized nursing terminologies and unfinished nursing care. Two researchers completed title and abstract and full-text screening.


DATA SYNTHESIS
Our search identified 149 citations. A full-text review of one paper was undertaken. No studies met our inclusion criteria. We report an empty review.


CONCLUSIONS
Standardized nursing terminologies and Unfinished Care are two sides of the same coin: despite their potential commonalities, no studies have documented their potential links. Digital systems, such as electronic health records and decision support systems, could foster this linkage.


IMPLICATIONS FOR NURSING PRACTICE
This review suggests that linking the conceptual frameworks can promote the diffusion of standardized nursing terminologies in clinical practice and increase accuracy in the measurement of Unfinished Care. This synergy could promote the contribution of nursing knowledge to patient care and nursing visibility, and benefit clinical nurses, managers, and healthcare systems up to the international level."
70,Terminologies for Reproducible Research,paper,2018-02-09,"Reproducible research---by its many names---has come to be regarded as a key concern across disciplines and stakeholder groups. Funding agencies and journals, professional societies and even mass media are paying attention, often focusing on the so-called ""crisis"" of reproducibility. One big problem keeps coming up among those seeking to tackle the issue: different groups are using terminologies in utter contradiction with each other. Looking at a broad sample of publications in different fields, we can classify their terminology via decision tree: they either, A---make no distinction between the words reproduce and replicate, or B---use them distinctly. If B, then they are commonly divided in two camps. In a spectrum of concerns that starts at a minimum standard of ""same data+same methods=same results,"" to ""new data and/or new methods in an independent study=same findings,"" group 1 calls the minimum standard reproduce, while group 2 calls it replicate. This direct swap of the two terms aggravates an already weighty issue. By attempting to inventory the terminologies across disciplines, I hope that some patterns will emerge to help us resolve the contradictions."
71,"Recent Developments in Clinical Terminologies — SNOMED CT, LOINC, and RxNorm",paper,2018-08-01,"Summary Objective: To discuss recent developments in clinical terminologies. SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) is the world's largest clinical terminology, developed by an international consortium. LOINC (Logical Observation Identifiers, Names, and Codes) is an international terminology widely used for clinical and laboratory observations. RxNorm is the standard drug terminology in the U.S. Methods and results: We present a brief review of the history, current state, and future development of SNOMED CT, LOINC and RxNorm. We also analyze their similarities and differences, and outline areas for greater interoperability among them. Conclusions: With different starting points, representation formalisms, funding sources, and evolutionary paths, SNOMED CT, LOINC, and RxNorm have evolved over the past few decades into three major clinical terminologies supporting key use cases in clinical practice. Despite their differences, partnerships have been created among their development teams to facilitate interoperability and minimize duplication of effort."
72,"Interface Terminologies, Reference Terminologies and Aggregation Terminologies: A Strategy for Better Integration",paper,2017,"The time has come to end unproductive competitions among different types of biomedical terminology artefacts. Tools and strategies to create the foundation of a seamless environment covering clinical jargon, clinical terminologies, and classifications are necessary. Whereas language processing relies on human interface terminologies, which represent clinical jargon, their link to reference terminologies such as SNOMED CT is essential to guarantee semantic interoperability. There is also a need for interoperation between reference and aggregation terminologies. Simple mappings between nodes are not enough, because the three kinds of terminology systems represent different things: reference terminologies focus on context-free descriptions of classes of entities of a domain; aggregation terminologies contain rules that enforce the principle of single hierarchies and disjoint classes; interface terminologies represent the language used in a domain. We propose a model that aims at providing a better flow of standardized information, addressing multiple use cases in health care including clinical research, epidemiology, care management, and reimbursement."
73,TempoQR: Temporal Question Reasoning over Knowledge Graphs,paper,2021-12-10,"Knowledge Graph Question Answering (KGQA) involves retrieving facts from a Knowledge Graph (KG) using natural language queries. A KG is a curated set of facts consisting of entities linked by relations. Certain facts include also temporal information forming a Temporal KG (TKG). Although many natural questions involve explicit or implicit time constraints, question answering (QA) over TKGs has been a relatively unexplored area. Existing solutions are mainly designed for simple temporal questions that can be answered directly by a single TKG fact.
 This paper puts forth a comprehensive embedding-based framework for answering complex questions over TKGs. Our method termed temporal question reasoning (TempoQR) exploits TKG embeddings to ground the question to the specific entities and time scope it refers to. It does so by augmenting the question embeddings with context, entity and time-aware information by employing three specialized modules. The first computes a textual representation of a given question, the second combines it with the entity embeddings for entities involved in the question, and the third generates question-specific time embeddings. Finally, a transformer-based encoder learns to fuse the generated temporal information with the question representation, which is used for answer predictions. Extensive experiments show that TempoQR improves accuracy by 25--45 percentage points on complex temporal questions over state-of-the-art approaches and it generalizes better to unseen question types."
74,Clinical Knowledge and Reasoning Abilities of AI Large Language Models in Anesthesiology: A Comparative Study on the ABA Exam,paper,2023-05-16,"Over the past decade, Artificial Intelligence (AI) has expanded significantly with increased adoption across various industries, including medicine. Recently, AI's large language models such as GPT-3, Bard, and GPT-4 have demonstrated remarkable language capabilities. While previous studies have explored their potential in general medical knowledge tasks, here we assess their clinical knowledge and reasoning abilities in a specialized medical context. We study and compare their performances on both the written and oral portions of the comprehensive and challenging American Board of Anesthesiology (ABA) exam, which evaluates candidates' knowledge and competence in anesthesia practice. In addition, we invited two board examiners to evaluate AI's answers without disclosing to them the origin of those responses. Our results reveal that only GPT-4 successfully passed the written exam, achieving an accuracy of 78% on the basic section and 80% on the advanced section. In comparison, the less recent or smaller GPT-3 and Bard models scored 58% and 47% on the basic exam, and 50% and 46% on the advanced exam, respectively. Consequently, only GPT-4 was evaluated in the oral exam, with examiners concluding that it had a high likelihood of passing the actual ABA exam. Additionally, we observe that these models exhibit varying degrees of proficiency across distinct topics, which could serve as an indicator of the relative quality of information contained in the corresponding training datasets. This may also act as a predictor for determining which anesthesiology subspecialty is most likely to witness the earliest integration with AI."
75,MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning,paper,2023-11-16,"Large language models (LLMs), despite their remarkable progress across various general domains, encounter significant barriers in medicine and healthcare. This field faces unique challenges such as domain-specific terminologies and reasoning over specialized knowledge. To address these issues, we propose MedAgents, a novel multi-disciplinary collaboration framework for the medical domain. MedAgents leverages LLM-based agents in a role-playing setting that participate in a collaborative multi-round discussion, thereby enhancing LLM proficiency and reasoning capabilities. This training-free framework encompasses five critical steps: gathering domain experts, proposing individual analyses, summarising these analyses into a report, iterating over discussions until a consensus is reached, and ultimately making a decision. Our work focuses on the zero-shot setting, which is applicable in real-world scenarios. Experimental results on nine datasets (MedQA, MedMCQA, PubMedQA, and six subtasks from MMLU) establish that our proposed MedAgents framework excels at mining and harnessing the medical expertise within LLMs, as well as extending its reasoning abilities. Our code can be found at https://github.com/gersteinlab/MedAgents."
76,CLR-Fact: Evaluating the Complex Logical Reasoning Capability of Large Language Models over Factual Knowledge,paper,2024-07-30,"While large language models (LLMs) have demonstrated impressive capabilities across various natural language processing tasks by acquiring rich factual knowledge from their broad training data, their ability to synthesize and logically reason with this knowledge in complex ways remains underexplored. In this work, we present a systematic evaluation of state-of-the-art LLMs' complex logical reasoning abilities through a novel benchmark of automatically generated complex reasoning questions over general domain and biomedical knowledge graphs. Our extensive experiments, employing diverse in-context learning techniques, reveal that LLMs excel at reasoning over general world knowledge but face significant challenges with specialized domain-specific knowledge. We find that prompting with explicit Chain-of-Thought demonstrations can substantially improve LLM performance on complex logical reasoning tasks with diverse logical operations. Interestingly, our controlled evaluations uncover an asymmetry where LLMs display proficiency at set union operations, but struggle considerably with set intersections - a key building block of logical reasoning. To foster further work, we will publicly release our evaluation benchmark and code."
77,AI Knowledge and Reasoning: Emulating Expert Creativity in Scientific Research,paper,2024-04-05,"We investigate whether modern AI can emulate expert creativity in complex scientific endeavors. We introduce a novel methodology that utilizes original research articles published after the AI's training cutoff, ensuring no prior exposure, mitigating concerns of rote memorization and prior training. The AI are tasked with redacting findings, predicting outcomes from redacted research, and assessing prediction accuracy against reported results. Analysis on 589 published studies in four leading psychology journals over a 28-month period showcases the AI's proficiency in understanding specialized research, deductive reasoning, and evaluating evidentiary alignment--cognitive hallmarks of human subject matter expertise and creativity. These findings suggest the potential of general-purpose AI to transform academia, with roles requiring knowledge-based creativity becoming increasingly susceptible to technological substitution."
78,Clinical Knowledge and Reasoning Abilities of AI Large Language Models in Anesthesiology: A Comparative Study on the American Board of Anesthesiology Examination.,paper,2024-04-19,"BACKGROUND
Over the past decade, artificial intelligence (AI) has expanded significantly with increased adoption across various industries, including medicine. Recently, AI-based large language models such as Generative Pretrained Transformer-3 (GPT-3), Bard, and Generative Pretrained Transformer-4 (GPT-4) have demonstrated remarkable language capabilities. While previous studies have explored their potential in general medical knowledge tasks, here we assess their clinical knowledge and reasoning abilities in a specialized medical context.


METHODS
We studied and compared the performance of all 3 models on both the written and oral portions of the comprehensive and challenging American Board of Anesthesiology (ABA) examination, which evaluates candidates' knowledge and competence in anesthesia practice.


RESULTS
Our results reveal that only GPT-4 successfully passed the written examination, achieving an accuracy of 78% on the basic section and 80% on the advanced section. In comparison, the less recent or smaller GPT-3 and Bard models scored 58% and 47% on the basic examination, and 50% and 46% on the advanced examination, respectively. Consequently, only GPT-4 was evaluated in the oral examination, with examiners concluding that it had a reasonable possibility of passing the structured oral examination. Additionally, we observe that these models exhibit varying degrees of proficiency across distinct topics, which could serve as an indicator of the relative quality of information contained in the corresponding training datasets. This may also act as a predictor for determining which anesthesiology subspecialty is most likely to witness the earliest integration with AI.


CONCLUSIONS
GPT-4 outperformed GPT-3 and Bard on both basic and advanced sections of the written ABA examination, and actual board examiners considered GPT-4 to have a reasonable possibility of passing the real oral examination; these models also exhibit varying degrees of proficiency across distinct topics."
79,General and specialized brain correlates for analogical reasoning: A meta‐analysis of functional imaging studies,paper,2016-03-25,"Reasoning by analogy allows us to link distinct domains of knowledge and to transfer solutions from one domain to another. Analogical reasoning has been studied using various tasks that have generally required the consideration of the relationships between objects and their integration to infer an analogy schema. However, these tasks varied in terms of the level and the nature of the relationships to consider (e.g., semantic, visuospatial). The aim of this study was to identify the cerebral network involved in analogical reasoning and its specialization based on the domains of information and task specificity. We conducted a coordinate‐based meta‐analysis of 27 experiments that used analogical reasoning tasks. The left rostrolateral prefrontal cortex was one of the regions most consistently activated across the studies. A comparison between semantic and visuospatial analogy tasks showed both domain‐oriented regions in the inferior and middle frontal gyri and a domain‐general region, the left rostrolateral prefrontal cortex, which was specialized for analogy tasks. A comparison of visuospatial analogy to matrix problem tasks revealed that these two relational reasoning tasks engage, at least in part, distinct right and left cerebral networks, particularly separate areas within the left rostrolateral prefrontal cortex. These findings highlight several cognitive and cerebral differences between relational reasoning tasks that can allow us to make predictions about the respective roles of distinct brain regions or networks. These results also provide new, testable anatomical hypotheses about reasoning disorders that are induced by brain damage. Hum Brain Mapp 37:1953–1969, 2016. © 2016 Wiley Periodicals, Inc."
80,Enhancing Knowledge Graph Construction Using Large Language Models,paper,2023-05-08,"The growing trend of Large Language Models (LLM) development has attracted significant attention, with models for various applications emerging consistently. However, the combined application of Large Language Models with semantic technologies for reasoning and inference is still a challenging task. This paper analyzes how the current advances in foundational LLM, like ChatGPT, can be compared with the specialized pretrained models, like REBEL, for joint entity and relation extraction. To evaluate this approach, we conducted several experiments using sustainability-related text as our use case. We created pipelines for the automatic creation of Knowledge Graphs from raw texts, and our findings indicate that using advanced LLM models can improve the accuracy of the process of creating these graphs from unstructured text. Furthermore, we explored the potential of automatic ontology creation using foundation LLM models, which resulted in even more relevant and accurate knowledge graphs."
81,Automata-Based Quantitative Reasoning,paper,2023-07-01,"Existing solution approaches for problems in formal quantitative analysis suffer from two challenges that adversely impact their theoretical understanding and large-scale applicability. These are the lack of generalizability, and separation-of-techniques. Lack of generalizability refers to the issue that solution approaches are often specialized to the underlying cost model that evaluates the quantitative property. Different cost models deploy such disparate algorithms that there is no transfer of knowledge from one cost model to another. Separation-of-techniques refers to the inherent dichotomy in solving problems in quantitative analysis. Most algorithms comprise of two phases: A structural phase, which reasons about the structure of the quantitative system(s) using techniques from automata or graphs; and a numerical phase, which reasons about the quantitative dimension/cost model using numerical methods. These techniques are incompatible with one another, forcing the phases to be performed sequentially, thereby impacting scalability. The article presents a novel framework that addresses the aforementioned challenges. The introduced framework, called comparator automata or comparators in short, builds on automata-theoretic foundations to generalize across a variety of cost models. The crux of comparators is that they enable automata-based methods in the numerical phase, hence eradicating the dependence on numerical methods. In doing so, comparators are able to integrate the structural and numerical phases. On the theoretical front, we demonstrate that comparator-based solutions have the advantage of generalizable results, and yield complexity-theoretic improvements over a range of problems in quantitative analysis. 
On the practical front, we demonstrate through empirical analysis that comparator-based solutions render more efficient, scalable, and robust performance, and demonstrate broader applicability than traditional methods for quantitative reasoning."
82,Characterizing the Development of Specialized Mathematical Content Knowledge for Teaching in Algebraic Reasoning and Number Theory,paper,2011-10-01,"This article characterizes the development of a deep and connected body of mathematical knowledge categorized by Ball and Bass' (2003b) model of Mathematical Knowledge for Teaching (MKT), as Specialized Content Knowledge for Teaching (SCK) in algebraic reasoning and number sense. The research employed multiple cases across three years from two content courses designed for elementary and middle-level mathematics specialists. Qualitative data were collected and a grounded theory approach to data analysis was employed. The resulting framework characterizes developmental levels of deep and connected mathematical content knowledge for teaching algebraic reasoning and number theory content. The framework consists of four intertwined components related to a teacher's ability to (1) solve problems and justify his/her reasoning, (2) use multiple representations, (3) recognize, use, and generalize conceptually similar tasks, and (4) pose problems. Implications for mathematics teacher education programs are discussed as well as directions for further research."
83,Introduction to Clinical Inquiries: New series by the Family Physicians Inquiries Network.,paper,2020-03-01,We are thrilled to be collaborating with Canadian Family Physician (CFP) on the publication of our Clinical Inquiries (CIs) series. Clinical Inquiries are author-formulated questions that are answered with the best available current evidence. Family medicine residency faculty and their residents
84,Clinical Inquiries: How effective and safe is fecal microbial transplant in preventing C difficile recurrence?,paper,2018-06-01,"Fecal microbial transplant (FMT) is reasonably safe and effective. In patients who have had multiple Clostridium difficile infections (CDIs), FMT results in a 65% to 80% cure rate with one treatment and 90% to 95% cure rate with repeated treatments compared with a 25% to 27% cure rate for antibiotics (strength of recommendation [SOR]: B, small open-label randomized controlled trials [RCTs]). Fresh and frozen donor feces, administered by either nasogastric tube or colonoscope, produce equal results (SOR: B, RCTs). FMT has an overall adverse event rate of 30%, primarily involving abdominal discomfort, but also, rarely, severe infections (0.7%) and death (0.1%) (SOR: B, systematic review not limited to RCTs)."
85,"Clinical Inquiries Regarding Ebola Virus Disease Received by CDC — United States, July 9–November 15, 2014",paper,2014-12-12,"Since early 2014, there have been more than 6,000 reported deaths from Ebola virus disease (Ebola), mostly in Guinea, Liberia, and Sierra Leone. On July 9, 2014, CDC activated its Emergency Operations Center for the Ebola outbreak response and formalized the consultation service it had been providing to assist state and local public health officials and health care providers evaluate persons in the United States thought to be at risk for Ebola. During July 9-November 15, CDC responded to clinical inquiries from public health officials and health care providers from 49 states and the District of Columbia regarding 650 persons thought to be at risk. Among these, 118 (18%) had initial signs or symptoms consistent with Ebola and epidemiologic risk factors placing them at risk for infection, thereby meeting the definition of persons under investigation (PUIs). Testing was not always performed for PUIs because alternative diagnoses were made or symptoms resolved. In total, 61 (9%) persons were tested for Ebola virus, and four, all of whom met PUI criteria, had laboratory-confirmed Ebola. Overall, 490 (75%) inquiries concerned persons who had neither traveled to an Ebola-affected country nor had contact with an Ebola patient. Appropriate medical evaluation and treatment for other conditions were noted in some instances to have been delayed while a person was undergoing evaluation for Ebola. Evaluating and managing persons who might have Ebola is one component of the overall approach to domestic surveillance, the goal of which is to rapidly identify and isolate Ebola patients so that they receive appropriate medical care and secondary transmission is prevented. Health care providers should remain vigilant and consult their local and state health departments and CDC when assessing ill travelers from Ebola-affected countries. 
Most of these persons do not have Ebola; prompt diagnostic assessments, laboratory testing, and provision of appropriate care for other conditions are essential for appropriate patient care and reflect hospital preparedness."
86,Clinical Inquiries: Which interventions are effective in managing parental vaccine refusal?,paper,2017-12-01,"It's unclear whether educational initiatives alone alter vaccine refusal. Although about a third of parents cite herd immunity as motivation for vaccination, its efficacy in addressing vaccine hesitancy isn't clear. Multifaceted interventions (encompassing improved access to vaccines, immunization mandates, and patient education) may produce a ≥25% increase in vaccine uptake in groups with vaccine hesitancy and low utilization. Correcting false information about influenza vaccination improves perceptions about the vaccine, but may decrease intention to vaccinate in parents who already have strong concerns about safety. Discussions about vaccines that are more paternalistic (presumptive rather than participatory) are associated with higher vaccination rates, but lower visit satisfaction. Providers should thoroughly address patient concerns about safety and encourage vaccine use."
87,"Clinical Inquiries Received by CDC Regarding Suspected Ebola Virus Disease in Children--United States, July 9, 2014-January 4, 2015.",paper,2015-09-18,"The 2014–2015 Ebola virus disease (Ebola) epidemic is the largest in history and represents the first time Ebola has been diagnosed in the United States. On July 9, 2014, CDC activated its Emergency Operations Center and established an Ebola clinical consultation service to assist U.S. state and local public health officials and health care providers with the evaluation of suspected cases. CDC reviewed all 89 inquiries received by the consultation service during July 9, 2014–January 4, 2015, about children (persons aged ≤18 years). Most (56 [63%]) children had no identifiable epidemiologic risk factors for Ebola; among the 33 (37%) who did have an epidemiologic risk factor, in every case this was travel from an Ebola-affected country. Thirty-two of these children met criteria for a person under investigation (PUI) because of clinical signs or symptoms. Fifteen PUIs had blood samples tested for Ebola virus RNA by reverse transcription–polymerase chain reaction; all tested negative. Febrile children who have recently traveled from an Ebola-affected country can be expected to have other common diagnoses, such as malaria and influenza, and in the absence of epidemiologic risk factors for Ebola, the likelihood of Ebola is extremely low. Delaying evaluation and treatment for these other more common illnesses might lead to poorer clinical outcomes. Additionally, many health care providers expressed concerns about whether and how parents should be allowed in the isolation room. While maintaining an appropriate level of vigilance for Ebola, public health officials and health care providers should ensure that pediatric PUIs receive timely triage, diagnosis, and treatment of other more common illnesses, and care reflecting best practices in supporting children’s psychosocial needs."
88,FPIN's clinical inquiries. Dipstick urinalysis for the diagnosis of acute UTI.,paper,2013-05-15,Approximately two-thirds of women who present with classic symptoms of acute UTI have bacterial infection of the bladder. Dipstick urinalysis moderately improves the accuracy of clinical symptoms in establishing or excluding acute UTI in women.
89,Evidence‐based practice for the busy nurse practitioner: Part two: Searching for the best evidence to clinical inquiries,paper,2012-11-01,"Purpose: The purpose of this four‐part evidence‐based practice (EBP) series is to enhance the nurse practitioner's (NP's) EBP skills by reviewing the process of developing a clinical question, searching for the best evidence, and critically appraising and applying the findings. Part two of the series focuses on how to search the published scientific literature for the most relevant studies that will answer a specific clinical question of importance to the NP. Data sources: Scientific literature review, gray searching, PubMed and other online literature databases and resources, and online EBP websites. Conclusions: Technology has allowed multiple healthcare resources to be available at one's fingertips enabling both NPs and their patients to find answers to clinical questions. EBP databases can be categorized as synthesized/filtered, unfiltered, and background information/expert opinion resources. Learning which database can best answer the clinical inquiry can streamline the search process. Implications for practice: For the busy NP, EBP has emerged as an important strategy to maintain valid, accurate, and relevant clinical knowledge. It is expected that this part of the series will enable NPs to identify appropriate databases to answer clinical inquiries while refining their search strategy skills, which takes both time and practice."
90,FPIN's Clinical Inquiries,paper,2012-08-15,"The evaluation of hip pain in patients 65 years and older should include a history and physical examination, followed by pertinent imaging studies."
91,MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning,paper,2023-11-16,"Large language models (LLMs), despite their remarkable progress across various general domains, encounter significant barriers in medicine and healthcare. This field faces unique challenges such as domain-specific terminologies and reasoning over specialized knowledge. To address these issues, we propose MedAgents, a novel multi-disciplinary collaboration framework for the medical domain. MedAgents leverages LLM-based agents in a role-playing setting that participate in a collaborative multi-round discussion, thereby enhancing LLM proficiency and reasoning capabilities. This training-free framework encompasses five critical steps: gathering domain experts, proposing individual analyses, summarising these analyses into a report, iterating over discussions until a consensus is reached, and ultimately making a decision. Our work focuses on the zero-shot setting, which is applicable in real-world scenarios. Experimental results on nine datasets (MedQA, MedMCQA, PubMedQA, and six subtasks from MMLU) establish that our proposed MedAgents framework excels at mining and harnessing the medical expertise within LLMs, as well as extending its reasoning abilities. Our code can be found at https://github.com/gersteinlab/MedAgents."
92,Large Class Separation is not what you need for Relational Reasoning-based OOD Detection,paper,2023-07-12,"Standard recognition approaches are unable to deal with novel categories at test time. Their overconfidence on the known classes makes the predictions unreliable for safety-critical applications such as healthcare or autonomous driving. Out-Of-Distribution (OOD) detection methods provide a solution by identifying semantic novelty. Most of these methods leverage a learning stage on the known data, which means training (or fine-tuning) a model to capture the concept of normality. This process is clearly sensitive to the amount of available samples and might be computationally expensive for on-board systems. A viable alternative is that of evaluating similarities in the embedding space produced by large pre-trained models without any further learning effort. We focus exactly on such a fine-tuning-free OOD detection setting. This work presents an in-depth analysis of the recently introduced relational reasoning pre-training and investigates the properties of the learned embedding, highlighting the existence of a correlation between the inter-class feature distance and the OOD detection accuracy. As the class separation depends on the chosen pre-training objective, we propose an alternative loss function to control the inter-class margin, and we show its advantage with thorough experiments."
93,"When in Doubt, Think Slow: Iterative Reasoning with Latent Imagination",paper,2024-02-23,"In an unfamiliar setting, a model-based reinforcement learning agent can be limited by the accuracy of its world model. In this work, we present a novel, training-free approach to improving the performance of such agents separately from planning and learning. We do so by applying iterative inference at decision-time, to fine-tune the inferred agent states based on the coherence of future state representations. Our approach achieves a consistent improvement in both reconstruction accuracy and task performance when applied to visual 3D navigation tasks. We go on to show that considering more future states further improves the performance of the agent in partially-observable environments, but not in a fully-observable one. Finally, we demonstrate that agents with less training pre-evaluation benefit most from our approach."
94,Piloting of Virtual Patient-Based Online Self-Study Quizzes for Developing Undergraduate Medical Students’ Clinical Reasoning Skills,paper,2021-11-08,"Clinical reasoning, the application of medical knowledge to a patient’s problem, requires training in a safe environment. Learning tasks based on Virtual Patients (VP-tasks) simulate the clinical setting in a safe way and integrate well into blended-learning environments, as synchronous tasks (face-to-face or online) or as asynchronous online tasks. The article presents the editorial process for developing VP-based self-study quizzes (SSQ) and field-study results on students’ learning experiences and study habits.
The editorial process initially involved only experienced clinical, educational and technical experts. To better match the tasks’ difficulty to students’ knowledge, junior doctors and advanced medical students joined at a later stage. Students (n = 351) rated the SSQs (n = 10) produced by the expanded team as matching their knowledge better than the SSQs (n = 13) developed by the initial expert editorial team. Students rated the online SSQs as more helpful than similar face-to-face VP-tasks. Students’ free comments indicate their high acceptance of the SSQ-format.
The SSQ-format is feasible for providing systematic online training in clinical reasoning, especially when working with a multi-level-educational editorial team and when a systematically structured blueprint of topics and learning goals drives the editorial work."
95,"Rephrase, Augment, Reason: Visual Grounding of Questions for Vision-Language Models",paper,2023-10-09,"An increasing number of vision-language tasks can be handled with little to no training, i.e., in a zero and few-shot manner, by marrying large language models (LLMs) to vision encoders, resulting in large vision-language models (LVLMs). While this has huge upsides, such as not requiring training data or custom architectures, how an input is presented to an LVLM can have a major impact on zero-shot model performance. In particular, inputs phrased in an underspecified way can result in incorrect answers due to factors like missing visual information, complex implicit reasoning, or linguistic ambiguity. Therefore, adding visually-grounded information to the input as a preemptive clarification should improve model performance by reducing underspecification, e.g., by localizing objects and disambiguating references. Similarly, in the VQA setting, changing the way questions are framed can make them easier for models to answer. To this end, we present Rephrase, Augment and Reason (RepARe), a gradient-free framework that extracts salient details about the image using the underlying LVLM as a captioner and reasoner, in order to propose modifications to the original question. We then use the LVLM's confidence over a generated answer as an unsupervised scoring function to select the rephrased question most likely to improve zero-shot performance. Focusing on three visual question answering tasks, we show that RepARe can result in a 3.85% (absolute) increase in zero-shot accuracy on VQAv2, and increases of 6.41 and 7.94 percentage points on A-OKVQA and VizWiz, respectively. Additionally, we find that using gold answers for oracle question candidate selection achieves a substantial gain in VQA accuracy by up to 14.41%. Through extensive analysis, we demonstrate that outputs from RepARe increase syntactic complexity, and effectively utilize vision-language interaction and the frozen LLM."
96,Blocks Assemble! Learning to Assemble with Large-Scale Structured Reinforcement Learning,paper,2022-03-15,"Assembly of multi-part physical structures is both a valuable end product for autonomous robotics, as well as a valuable diagnostic task for open-ended training of embodied intelligent agents. We introduce a naturalistic physics-based environment with a set of connectable magnet blocks inspired by children's toy kits. The objective is to assemble blocks into a succession of target blueprints. Despite the simplicity of this objective, the compositional nature of building diverse blueprints from a set of blocks leads to an explosion of complexity in structures that agents encounter. Furthermore, assembly stresses agents' multi-step planning, physical reasoning, and bimanual coordination. We find that the combination of large-scale reinforcement learning and graph-based policies -- surprisingly without any additional complexity -- is an effective recipe for training agents that not only generalize to complex unseen blueprints in a zero-shot manner, but even operate in a reset-free setting without being trained to do so. Through extensive experiments, we highlight the importance of large-scale training, structured representations, contributions of multi-task vs. single-task learning, as well as the effects of curriculums, and discuss qualitative behaviors of trained agents."
97,Analogy-preserving functions: A way to extend Boolean samples,paper,2017-08-19,"Training set extension is an important issue in machine learning. Indeed when the examples at hand are in a limited quantity, the performances of standard classifiers may significantly decrease and it can be helpful to build additional examples. In this paper, we consider the use of analogical reasoning, and more particularly of analogical proportions for extending training sets. Here the ground truth labels are considered to be given by a (partially known) function. We examine the conditions that are required for such functions to ensure an error-free extension in a Boolean setting. To this end, we introduce the notion of Analogy Preserving (AP) functions, and we prove that their class is the class of affine Boolean functions. This noteworthy theoretical result is complemented with an empirical investigation of approximate AP functions, which suggests that they remain suitable for training set extension."
98,A Rigorous Investigation of “Evidence” and “Occam Factors” in Bayesian Reasoning,paper,1992,"This paper first reviews the reasoning behind the Bayesian ""evidence"" procedure for setting parameters in the probability distributions involved in inductive inference. This paper then proves that the evidence procedure is incorrect. More precisely, this paper proves that the assumptions going into the evidence procedure do not, as claimed, ""let the data determine the distributions"". Instead, those assumptions simply amount to an implicit replacement of the original distributions, containing free parameters, with new distributions, none of whose parameters are free. For example, as used by MacKay [1991] in the context of neural nets, the evidence procedure is a means for using the training set to determine the free parameter $\alpha$ in the distribution $P(\{w_i\}) \propto \exp(\alpha \sum_{i=1}^{N} w_i^2)$, where the $N$ $w_i$ are the $N$ weights in the network. As this paper proves, in actuality the assumptions going into MacKay's use of the evidence procedure do not result in a distribution $P(\{w_i\}) \propto \exp(\alpha \sum_{i=1}^{N} w_i^2)$ for some $\alpha$, but rather result in a parameter-less distribution, $P(\{w_i\}) \propto (\sum_{i=1}^{N} w_i^2)^{-(N/2 + 1)}$. This paper goes on to prove that if one makes the assumption of an ""entropic prior"" with unknown parameter value, in addition to the assumptions used in the evidence procedure, then the prior is completely fixed, but in a form which cannot be entropic. (This calls into question the self-consistency of the numerous arguments purporting to derive an entropic prior ""from first principles"".) Finally, this paper goes on to investigate the Bayesian first-principles ""proof"" of Occam's razor involving Occam factors. This paper proves that that ""proof"" is flawed."
99,Simulation Awareness : Assessing Performance with Limited Simulation Instrumentation,paper,2016,"Experts in troubleshooting are skilled at identifying important diagnostic cues and making justified inferences about problems and their causes. In a training setting, students can be assessed for the same troubleshooting skills, as long as there is clarity about the cues students use as a basis for their decisions. In a simulation-based intelligent tutoring system (ITS) where assessment is automated, this means the simulation must be transparent enough to afford an accurate picture of the cues that have been revealed to the student in the environment, in order to validate or invalidate the student’s decisions. But not all simulations can be affordably instrumented to provide the desired level of transparency. This paper describes a domain modeling and assessment approach designed to accommodate reasoning with partial knowledge in cases where there are practical limits on simulation instrumentation. The assessment approach is applied in an ITS for training information technology troubleshooting, with a free-play simulation using virtual machines to reproduce a realistic network of computers. From an experiential point of view, this is ideal for giving trainees the opportunity to perform in a realistic operational environment. However, only a subset of simulation events in the virtual machines can be feasibly collected by instrumentation, and in many cases it is only practical to monitor either student actions, or their results, but not both. The paper describes the modeling and assessment approach in this context, with examples where reductions in simulation instrumentation were achievable. We discuss the applicability of this approach for other domains and its limitations, as well as the methods used for model authoring."
100,Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning,paper,2022-09-29,"Mathematical reasoning, a core ability of human intelligence, presents unique challenges for machines in abstract thinking and logical reasoning. Recent large pre-trained language models such as GPT-3 have achieved remarkable progress on mathematical reasoning tasks written in text form, such as math word problems (MWP). However, it is unknown if the models can handle more complex problems that involve math reasoning over heterogeneous information, such as tabular data. To fill the gap, we present Tabular Math Word Problems (TabMWP), a new dataset containing 38,431 open-domain grade-level problems that require mathematical reasoning on both textual and tabular data. Each question in TabMWP is aligned with a tabular context, which is presented as an image, semi-structured text, and a structured table. There are two types of questions: free-text and multi-choice, and each problem is annotated with gold solutions to reveal the multi-step reasoning process. We evaluate different pre-trained models on TabMWP, including the GPT-3 model in a few-shot setting. As earlier studies suggest, since few-shot GPT-3 relies on the selection of in-context examples, its performance is unstable and can degrade to near chance. The unstable issue is more severe when handling complex problems like TabMWP. To mitigate this, we further propose a novel approach, PromptPG, which utilizes policy gradient to learn to select in-context examples from a small amount of training data and then constructs the corresponding prompt for the test example. Experimental results show that our method outperforms the best baseline by 5.31% on the accuracy metric and reduces the prediction variance significantly compared to random selection, which verifies its effectiveness in selecting in-context examples."
101,Use of clinical reasoning tasks by medical students,paper,2019-03-09,"Background: A framework of clinical reasoning tasks used by physicians during clinical encounters was previously developed proposing that clinical reasoning is a complex process composed of 26 possible tasks. The aim of this paper was to analyze the verbalized clinical reasoning processes of medical students utilizing commonly encountered internal medicine cases. Methods: In this mixed-methods study, participants viewed three video recorded clinical encounters. After each encounter, participants completed a think-aloud protocol. The qualitative data from the transcribed think-aloud transcripts were analyzed by two investigators using a constant comparative approach. The type, frequency, and pattern of codes used were analyzed. Results: Seventeen third and fourth year medical students participated. They used 15 reasoning tasks across all cases. The average number of tasks used in cases 1, 2, and 3 was (respectively) 5.6 (range 3–8), 5.9 (range 4–8), and 5.3 (range 3–10). The order in which medical students verbalized reasoning tasks varied and appeared purposeful but non-sequential. Conclusions: Consistent with prior research in residents, participants progressed through the encounter in a purposeful but non-sequential fashion. Reasoning tasks related to framing the encounter and diagnosis were not used in succession but interchangeably. This suggests that teaching successful clinical reasoning may involve encouraging or demonstrating multiple pathways through a problem. Further research exploring the association between use of clinical reasoning tasks and clinical reasoning accuracy could enhance the medical community’s understanding of variance in clinical reasoning."
102,Lingdan: enhancing encoding of traditional Chinese medicine knowledge for clinical reasoning tasks with large language models.,paper,2024-07-22,"OBJECTIVE
The recent surge in large language models (LLMs) across various fields has yet to be fully realized in traditional Chinese medicine (TCM). This study aims to bridge this gap by developing a large language model tailored to TCM knowledge, enhancing its performance and accuracy in clinical reasoning tasks such as diagnosis, treatment, and prescription recommendations.


MATERIALS AND METHODS
This study harnessed a wide array of TCM data resources, including TCM ancient books, textbooks, and clinical data, to create 3 key datasets: the TCM Pre-trained Dataset, the Traditional Chinese Patent Medicine (TCPM) Question Answering Dataset, and the Spleen and Stomach Herbal Prescription Recommendation Dataset. These datasets underpinned the development of the Lingdan Pre-trained LLM and 2 specialized models: the Lingdan-TCPM-Chat Model, which uses a Chain-of-Thought process for symptom analysis and TCPM recommendation, and a Lingdan Prescription Recommendation model (Lingdan-PR) that proposes herbal prescriptions based on electronic medical records.


RESULTS
The Lingdan-TCPM-Chat and the Lingdan-PR Model, fine-tuned on the Lingdan Pre-trained LLM, demonstrated state-of-the-art performance for the tasks of TCM clinical knowledge answering and herbal prescription recommendation. Notably, Lingdan-PR outperformed all state-of-the-art baseline models, achieving an improvement of 18.39% in the Top@20 F1-score compared with the best baseline.


CONCLUSION
This study marks a pivotal step in merging advanced LLMs with TCM, showcasing the potential of artificial intelligence to help improve clinical decision-making of medical diagnostics and treatment strategies. The success of the Lingdan Pre-trained LLM and its derivative models, Lingdan-TCPM-Chat and Lingdan-PR, not only revolutionizes TCM practices but also opens new avenues for the application of artificial intelligence in other specialized medical fields. Our project is available at https://github.com/TCMAI-BJTU/LingdanLLM."
103,Clinical Reasoning Tasks and Resident Physicians: What Do They Reason About?,paper,2016-07-01,"Purpose: A framework of clinical reasoning tasks thought to occur in a clinical encounter was recently developed. It proposes that diagnostic and therapeutic reasoning comprise 24 tasks. The authors of this current study used this framework to investigate what internal medicine residents reason about when they approach straightforward clinical cases. Method: Participants viewed three video-recorded clinical encounters portraying common diagnoses. After each video, participants completed a post encounter form and think-aloud protocol. Two authors analyzed transcripts from the think-aloud protocols using a constant comparative approach. They conducted iterative coding of the utterances, classifying each according to the framework of clinical reasoning tasks. They evaluated the type, number, and sequence of tasks the residents used. Results: Ten residents participated in the study in 2013–2014. Across all three cases, the residents employed 14 clinical reasoning tasks. Nearly all coded tasks were associated with framing the encounter or diagnosis. The order in which residents used specific tasks varied. The average number of tasks used per case was as follows: Case 1, 4.4 (range 1–10); Case 2, 4.6 (range 1–6); and Case 3, 4.7 (range 1–7). The residents used some tasks repeatedly; the average number of task utterances was 11.6, 13.2, and 14.7 for, respectively, Case 1, 2, and 3. Conclusions: Results suggest that the use of clinical reasoning tasks occurs in a varied, not sequential, process. The authors provide suggestions for strengthening the framework to more fully encompass the spectrum of reasoning tasks that occur in residents’ clinical encounters."
104,Medical Visual Question Answering via Conditional Reasoning and Contrastive Learning,paper,2022-12-26,"Medical visual question answering (Med-VQA) aims to accurately answer a clinical question presented with a medical image. Despite its enormous potential in healthcare services, the development of this technology is still in the initial stage. On the one hand, Med-VQA tasks are highly challenging due to the massive diversity of clinical questions that require different visual reasoning skills for different types of questions. On the other hand, medical images are complex in nature and very different from natural images, while current Med-VQA datasets are small-scale with a few hundred radiology images, making it difficult to train a well-performing visual feature extractor. This paper addresses the above two critical issues. We propose a novel conditional reasoning mechanism with a question-conditioned reasoning component and a type-conditioned reasoning strategy to learn effective reasoning skills for different Med-VQA tasks adaptively. Further, we propose to pre-train a visual feature extractor for Med-VQA via contrastive learning on large amounts of unlabeled radiology images. The effectiveness of our proposals is validated by extensive experiments on existing Med-VQA benchmarks, which show significant improvement of our model in prediction accuracy over state-of-the-art methods. The source code and pre-training dataset are provided at https://github.com/Awenbocc/CPCR."
105,Microanalytic Assessment of Self-Regulated Learning During Clinical Reasoning Tasks: Recent Developments and Next Steps,paper,2016-11-01,"Helping medical educators obtain and use assessment data to assist medical students, residents, and physicians in reducing diagnostic errors and other forms of ineffective clinical practice is of critical importance. Self-Regulated Learning–Microanalytic Assessment and Training is an assessment-to-intervention framework designed to address this need by generating data about trainees’ strategic processes (e.g., focusing on clinical task procedures), regulatory processes (e.g., planning how to do a task), and motivational processes (e.g., increasing confidence for performing a task) as they perform clinical activities. In this article, the authors review several studies that have used an innovative assessment approach, called self-regulated learning (SRL) microanalysis, to generate data about how trainees regulate their thinking and actions during clinical reasoning tasks. Across the studies, initial findings revealed that medical students often do not exhibit strategic thinking and action during clinical reasoning practice tasks even though some regulatory processes (e.g., planning) are predictive of important medical education outcomes. Further, trainees’ motivation beliefs, strategic thinking, and self-evaluative judgments tend to shift rapidly during clinical skills practice and may also vary across different parts of a patient encounter. Collectively, these findings underscore the value of dynamically assessing trainees’ SRL as they complete clinical tasks. The findings also set the stage for exploring how medical educators can best use SRL microanalytic assessment data to guide remedial practices and the provision of feedback to trainees. Implications and future research directions for connecting assessments to intervention in medical education are discussed."
106,A Survey of Reasoning with Foundation Models,paper,2023-12-17,"Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-world settings such as negotiation, medical diagnosis, and criminal investigation. It serves as a fundamental methodology in the field of Artificial General Intelligence (AGI). With the ongoing development of foundation models, e.g., Large Language Models (LLMs), there is a growing interest in exploring their abilities in reasoning tasks. In this paper, we introduce seminal foundation models proposed or adaptable for reasoning, highlighting the latest advancements in various reasoning tasks, methods, and benchmarks. We then delve into the potential future directions behind the emergence of reasoning abilities within foundation models. We also discuss the relevance of multimodal learning, autonomous agents, and super alignment in the context of reasoning. By discussing these future research directions, we hope to inspire researchers in their exploration of this field, stimulate further advancements in reasoning with foundation models, and contribute to the development of AGI."
107,Diagnose Like a Radiologist: Hybrid Neuro-Probabilistic Reasoning for Attribute-Based Medical Image Diagnosis,paper,2021-11-25,"During clinical practice, radiologists often use attributes, e.g., morphological and appearance characteristics of a lesion, to aid disease diagnosis. Effectively modeling attributes as well as all relationships involving attributes could boost the generalization ability and verifiability of medical image diagnosis algorithms. In this paper, we introduce a hybrid neuro-probabilistic reasoning algorithm for verifiable attribute-based medical image diagnosis. There are two parallel branches in our hybrid algorithm, a Bayesian network branch performing probabilistic causal relationship reasoning and a graph convolutional network branch performing more generic relational modeling and reasoning using a feature representation. Tight coupling between these two branches is achieved via a cross-network attention mechanism and the fusion of their classification results. We have successfully applied our hybrid reasoning algorithm to two challenging medical image diagnosis tasks. On the LIDC-IDRI benchmark dataset for benign-malignant classification of pulmonary nodules in CT images, our method achieves a new state-of-the-art accuracy of 95.36% and an AUC of 96.54%. Our method also achieves a 3.24% accuracy improvement on an in-house chest X-ray image dataset for tuberculosis diagnosis. Our ablation study indicates that our hybrid algorithm achieves a much better generalization performance than a pure neural network architecture under very limited training data."
108,Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks,paper,2024-03-30,"While recent advancements in commercial large language models (LM) have shown promising results in medical tasks, their closed-source nature poses significant privacy and security concerns, hindering their widespread use in the medical field. Despite efforts to create open-source models, their limited parameters often result in insufficient multi-step reasoning capabilities required for solving complex medical problems. To address this, we introduce Meerkat, a new family of medical AI systems ranging from 7 to 70 billion parameters. The models were trained using our new synthetic dataset consisting of high-quality chain-of-thought reasoning paths sourced from 18 medical textbooks, along with diverse instruction-following datasets. Our systems achieved remarkable accuracy across six medical benchmarks, surpassing the previous best models such as MediTron and BioMistral, and GPT-3.5 by a large margin. Notably, Meerkat-7B surpassed the passing threshold of the United States Medical Licensing Examination (USMLE) for the first time for a 7B-parameter model, while Meerkat-70B outperformed GPT-4 by an average of 1.3%. Additionally, Meerkat-70B correctly diagnosed 21 out of 38 complex clinical cases, outperforming humans' 13.8 and closely matching GPT-4's 21.8. Our systems offered more detailed free-form responses to clinical queries compared to existing small models, approaching the performance level of large commercial models. This significantly narrows the performance gap with large LMs, showcasing its effectiveness in addressing complex medical challenges."
109,Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering,paper,2024-03-07,"Large Language models (LLMs) have demonstrated significant potential in transforming healthcare by automating tasks such as clinical documentation, information retrieval, and decision support. In this aspect, carefully engineered prompts have emerged as a powerful tool for using LLMs for medical scenarios, e.g., patient clinical scenarios. In this paper, we propose a modified version of the MedQA-USMLE dataset, which is subjective, to mimic real-life clinical scenarios. We explore the Chain of Thought (CoT) reasoning based on subjective response generation for the modified MedQA-USMLE dataset with appropriate LM-driven forward reasoning for correct responses to the medical questions. Keeping in mind the importance of response verification in the medical setting, we utilize a reward training mechanism whereby the language model also provides an appropriate verified response for a particular response to a clinical question. In this regard, we also include human-in-the-loop for different evaluation aspects. We develop better in-contrast learning strategies by modifying the 5-shot-codex-CoT-prompt from arXiv:2207.08143 for the subjective MedQA dataset and developing our incremental-reasoning prompt. Our evaluations show that the incremental reasoning prompt performs better than the modified codex prompt in certain scenarios. We also show that greedy decoding with the incremental reasoning method performs better than other strategies, such as prompt chaining and eliminative reasoning."
110,"PyTorch: An Imperative Style, High-Performance Deep Learning Library",paper,2019-12-03,"Deep learning frameworks have often focused on either usability or speed, but not both. PyTorch is a machine learning library that shows that these two goals are in fact compatible: it was designed from first principles to support an imperative and Pythonic programming style that supports code as a model, makes debugging easy and is consistent with other popular scientific computing libraries, while remaining efficient and supporting hardware accelerators such as GPUs. In this paper, we detail the principles that drove the implementation of PyTorch and how they are reflected in its architecture. We emphasize that every aspect of PyTorch is a regular Python program under the full control of its user. We also explain how the careful and pragmatic implementation of the key components of its runtime enables them to work together to achieve compelling performance. We demonstrate the efficiency of individual subsystems, as well as the overall speed of PyTorch on several commonly used benchmarks."
111,Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification,paper,2015-02-06,"Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on the learnable activation and advanced initialization, we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66% [33]). To our knowledge, our result is the first to surpass the reported human-level performance (5.1%, [26]) on this dataset."
112,Performance standards for antimicrobial susceptibility testing,paper,2019,"The data in the interpretive tables in this supplement are valid only if the methodologies in the following Clinical and Laboratory Standards Institute (CLSI)–approved standards are followed: M02-A12—Performance Standards for Antimicrobial Disk Susceptibility Tests; Approved Standard—Twelfth Edition; M07-A10—Methods for Dilution Antimicrobial Susceptibility Tests for Bacteria That Grow Aerobically; Approved Standard—Tenth Edition; and M11-A8—Methods for Antimicrobial Susceptibility Testing of Anaerobic Bacteria; Approved Standard—Eighth Edition. The standards contain information about both disk (M02) and dilution (M07 and M11) test procedures for aerobic and anaerobic bacteria. Clinicians depend heavily on information from the microbiology laboratory for treatment of their seriously ill patients. The clinical importance of antimicrobial susceptibility test results demands that these tests be performed under optimal conditions and that laboratories have the capability to provide results for the newest antimicrobial agents. The tabular information presented here represents the most current information for drug selection, interpretation, and QC using the procedures standardized in the most current editions of M02, M07, and M11. Users should replace the tables published earlier with these new tables. (Changes in the tables since the previous edition appear in boldface type.) Clinical and Laboratory Standards Institute (CLSI). Performance Standards for Antimicrobial Susceptibility Testing. 27th ed. CLSI supplement M100 (ISBN 1-56238-804-5 [Print]; ISBN 1-56238-805-3 [Electronic]). Clinical and Laboratory Standards Institute, 950 West Valley Road, Suite 2500, Wayne, Pennsylvania 19087 USA, 2017. The Clinical and Laboratory Standards Institute consensus process, which is the mechanism for moving a document through two or more levels of review by the health care community, is an ongoing process. 
Users should expect revised editions of any given document. Because rapid changes in technology may affect the procedures, methods, and protocols in a standard or guideline, users should replace outdated editions with the current editions of CLSI documents. Current editions are listed in the CLSI catalog and posted on our website at www.clsi.org. If you or your organization is not a member and would like to become one, and to request a copy of the catalog, contact us at: Telephone: +1.610.688.0100; Fax: +1.610.688.0700; E-Mail: customerservice@clsi.org; Website: www.clsi.org. M100S, 26th ed. January 2016. Replaces M100-S25. Performance Standards for Antimicrobial Susceptibility Testing. Jean B. Patel, PhD, D(ABMM); Franklin R. Cockerill III, MD; George M. Eliopoulos, MD; Stephen G. Jenkins, PhD, D(ABMM), F(AAM); James S. Lewis II, PharmD; Brandi Limbago, PhD; David P. Nicolau, PharmD, FCCP, FIDSA; Robin Patel, MD; Mair Powell, MD, FRCP, FRCPath; Sandra S. Richter, MD, D(ABMM); Jana M. Swenson, MMSc; Maria M. Traczewski, BS, MT(ASCP); John D. Turnidge, MD; Melvin P. Weinstein, MD; Barbara L. Zimmer, PhD"
113,In-datacenter performance analysis of a tensor processing unit,paper,2017-04-16,"Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X–30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X–80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU."
114,"Active learning increases student performance in science, engineering, and mathematics",paper,2014-05-12,"Significance The President’s Council of Advisors on Science and Technology has called for a 33% increase in the number of science, technology, engineering, and mathematics (STEM) bachelor’s degrees completed per year and recommended adoption of empirically validated teaching practices as critical to achieving that goal. The studies analyzed here document that active learning leads to increases in examination performance that would raise average grades by a half a letter, and that failure rates under traditional lecturing increase by 55% over the rates observed under active learning. The analysis supports theory claiming that calls to increase the number of students receiving STEM degrees could be answered, at least in part, by abandoning traditional lecturing in favor of active learning. To test the hypothesis that lecturing maximizes learning and course performance, we metaanalyzed 225 studies that reported data on examination scores or failure rates when comparing student performance in undergraduate science, technology, engineering, and mathematics (STEM) courses under traditional lecturing versus active learning. The effect sizes indicate that on average, student performance on examinations and concept inventories increased by 0.47 SDs under active learning (n = 158 studies), and that the odds ratio for failing was 1.95 under traditional lecturing (n = 67 studies). These results indicate that average examination scores improved by about 6% in active learning sections, and that students in classes with traditional lecturing were 1.5 times more likely to fail than were students in classes with active learning. 
Heterogeneity analyses indicated that both results hold across the STEM disciplines, that active learning increases scores on concept inventories more than on course examinations, and that active learning appears effective across all class sizes—although the greatest effects are in small (n ≤ 50) classes. Trim and fill analyses and fail-safe n calculations suggest that the results are not due to publication bias. The results also appear robust to variation in the methodological rigor of the included studies, based on the quality of controls over student quality and instructor identity. This is the largest and most comprehensive metaanalysis of undergraduate STEM education published to date. The results raise questions about the continued use of traditional lecturing as a control in research studies, and support active learning as the preferred, empirically validated teaching practice in regular classrooms."
115,High-performance photovoltaic perovskite layers fabricated through intramolecular exchange,paper,2015-06-12,"Taking in more sun: Most efforts to grow superior films of organic-inorganic perovskites for solar cells have focused on methylammonium lead iodide (MAPbI3). However, formamidinium lead iodide (FAPbI3) has a broader solar absorption spectrum that could ultimately lead to better performance. Yang et al. grew high-quality FAPbI3 films by starting with a film of lead iodide and dimethylsulfoxide (DMSO) and then exchanging the DMSO with formamidinium iodide. Their best devices achieved power conversion efficiencies exceeding 20%. Science, this issue p. 1234. An intramolecular exchange process enables growth of high-quality organic perovskite films with greater solar spectral range. The band gap of formamidinium lead iodide (FAPbI3) perovskites allows broader absorption of the solar spectrum relative to conventional methylammonium lead iodide (MAPbI3). Because the optoelectronic properties of perovskite films are closely related to film quality, deposition of dense and uniform films is crucial for fabricating high-performance perovskite solar cells (PSCs). We report an approach for depositing high-quality FAPbI3 films, involving FAPbI3 crystallization by the direct intramolecular exchange of dimethylsulfoxide (DMSO) molecules intercalated in PbI2 with formamidinium iodide. This process produces FAPbI3 films with (111)-preferred crystallographic orientation, large-grained dense microstructures, and flat surfaces without residual PbI2. Using films prepared by this technique, we fabricated FAPbI3-based PSCs with maximum power conversion efficiency greater than 20%."
116,DeepFace: Closing the Gap to Human-Level Performance in Face Verification,paper,2014-06-01,"In modern face recognition, the conventional pipeline consists of four stages: detect => align => represent => classify. We revisit both the alignment step and the representation step by employing explicit 3D face modeling in order to apply a piecewise affine transformation, and derive a face representation from a nine-layer deep neural network. This deep network involves more than 120 million parameters using several locally connected layers without weight sharing, rather than the standard convolutional layers. Thus we trained it on the largest facial dataset to date, an identity labeled dataset of four million facial images belonging to more than 4,000 identities. The learned representations coupling the accurate model-based alignment with the large facial database generalize remarkably well to faces in unconstrained environments, even with a simple classifier. Our method reaches an accuracy of 97.35% on the Labeled Faces in the Wild (LFW) dataset, reducing the error of the current state of the art by more than 27%, closely approaching human-level performance."
117,"Institutions, Institutional Change and Economic Performance: Economic performance",paper,1990-10-01,"Examines the role that institutions, defined as the humanly devised constraints that shape human interaction, play in economic performance and how those institutions change and how a model of dynamic institutions explains the differential performance of economies through time. Institutions are separate from organizations, which are assemblages of people directed to strategically operating within institutional constraints. Institutions affect the economy by influencing, together with technology, transaction and production costs. They do this by reducing uncertainty in human interaction, albeit not always efficiently. Entrepreneurs accomplish incremental changes in institutions by perceiving opportunities to do better through altering the institutional framework of political and economic organizations. Importantly, the ability to perceive these opportunities depends on both the completeness of information and the mental constructs used to process that information. Thus, institutions and entrepreneurs stand in a symbiotic relationship where each gives feedback to the other. Neoclassical economics suggests that inefficient institutions ought to be rapidly replaced. This symbiotic relationship helps explain why this theoretical consequence is often not observed: while this relationship allows growth, it also allows inefficient institutions to persist. The author identifies changes in relative prices and prevailing ideas as the source of institutional alterations. Transaction costs, however, may keep relative price changes from being fully exploited. Transaction costs are influenced by institutions and institutional development is accordingly path-dependent. (CAR)"
118,MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability,paper,2013-01-16,"We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations."
119,A model for multi-disciplinary collaboration in child protection,paper,1997,"Working from a background in child protective intervention and staff training and development, the authors sought to address two commonly reported deficits in child protection – the lack of a risk management framework and failures in interagency collaboration. This paper reports their approach to locating a risk and safety factor matrix and their evaluation of its use in a series of interagency workshops designed to improve collaboration."
120,Multi-Disciplinary Integrated Framework and Architecture for a Human Centered Collaborative Commerce System,paper,2006-12-04,"In recent years, collaborative commerce (or c-commerce) and human-centric systems have been research subjects of keen interest in Web-based technologies and have become a major focus for many organizations. C-commerce provides and supports dynamic collaborative environments over the Internet and other potential public information networks (PINs). It offers many levels of collaborative interactions and enables organizations and individuals to work simultaneously, cooperatively, as well as independently. Of late, it has become increasingly important to integrate human-centered designs in collaborative systems to achieve user-oriented environments. This paper presents human-centered c-commerce system (HCCS), an architectural framework for collaborative commerce with integration of some human aspects. We discuss its framework and describe each component of the proposed architecture. We contend that using an intelligent c-commerce system, such as HCCS, will enhance collaboration between individual users and among organizations, and will provide effective infrastructure to achieve the intended objectives of collaborative work."
121,Abstract C115: Framework for a community-based multiple myeloma screening program,paper,2023-12-01,"Purpose: We develop a framework for a community-based screening program for multiple myeloma (MM). MM is known to be twice as common in Black/African American (B/AA) persons compared to white persons and B/AA patients have a higher risk of delay in initial treatment compared to white patients. Therefore, we developed a community-wide screening program for B/AA persons 50 years and older. As the first of these programs, we reveal an organizational framework that prioritizes engagement with key community partners who address barriers to cancer screening. Experimental Procedures: We expand organizational development knowledge about how to implement community-based MM screening and address barriers. The study uses an iterative and qualitative continuous improvement process. The study draws on literature review and multi-disciplinary team collaboration among clinicians, researchers, administrators, and the Cleveland Clinic Community Outreach Program (Cleveland Outreach). Data Summary: Our framework includes three interrelated partners, each of which addresses barriers to MM screening: 1) Cleveland Clinic Cancer Center. The multi-disciplinary team designs an educational brochure, secures pilot grant funding, plans for insurance/financial assistance as needed, develops care algorithms, and evaluates the impact of the program. Cleveland Outreach, a department within the Cancer Center, uses the Harold Freeman model as its foundational approach, which includes patient navigators and community outreach managers who are culturally/linguistically matched to medically underserved communities. The Cancer Center addresses barriers including program implementation despite a lack of national screening guidelines, insurance/financial barriers, and patient navigation to address fears and health literacy. 2) Faith-based organizations. Church leaders committed to cancer prevention participate in Cleveland Outreach’s faith-based outreach model. 
Faith-based partners address barriers of trust and fear. Also, by providing space at churches for on-site screening blood draws, they address transportation barriers. 3) Clinicians at Federally Qualified Health Centers (FQHC’s). Providers and nurses at FQHC’s receive a Continuing Medical Education program about MM screening designed/funded by the Cancer Center. From within the community, providers notify medically underserved patients about the screening. With their primary care patient population, FQHC partners address barriers such as trust, health literacy, and fear of blood draws. They facilitate on-site screening at FQHC’s and health fairs, which addresses transportation barriers. Conclusions: Given that there are no universal guidelines for MM screening, we develop a novel approach to early disease detection in B/AA patients. Our framework is a partnership between the Cancer Center, faith-based organizations, and FQHC’s. Collaboratively, we address barriers to screening including fear, trust, insurance/financial, health literacy, transportation, and patient navigation if abnormal results suggest the need for further testing.
 Citation Format: Heather McKee Hurwitz, Kimberly Bell, Diana Basali, Raymond D. Jackson II, Jason Valent. Framework for a community-based multiple myeloma screening program [abstract]. In: Proceedings of the 16th AACR Conference on the Science of Cancer Health Disparities in Racial/Ethnic Minorities and the Medically Underserved; 2023 Sep 29-Oct 2;Orlando, FL. Philadelphia (PA): AACR; Cancer Epidemiol Biomarkers Prev 2023;32(12 Suppl):Abstract nr C115."
122,Fit for Purpose: A Framework for Developing and Assessing Complex Graduate Attributes in a Changing Higher Education Environment,paper,2013-12-01,"Abstract This paper provides an assessment framework for shared collaboration among accounting educators. Key developments in higher education more broadly and challenges specific to accounting education are synthesised to identify their combined effects on the accounting curriculum and on accounting academics in fulfilling the teaching component of their academic role. The issue of assessment validity for complex graduate attributes that invariably encompass skills, disposition and values – beyond technical knowledge – is highlighted. This paper draws on multi-disciplinary research and principles of best practice to provide a framework to design innovative assessments. The framework is then used to illustrate the support and design of a set of assessment learning activities targeting graduate attributes. The authors also intend to share the framework with peers and to promote debate and further development to counter some of the challenges presently confronting accounting education in the already resource-scarce and high-demand higher education sector."
123,Embedding collective leadership to foster collaborative inter-professional working in the care of older people (ECLECTIC): Study protocol,paper,2020-03-03,"Background: The National Integrated Care Programme for Older People (NICPOP), formerly NCPOP, aims to support older people to live well in their homes by developing primary and secondary care services for older people, especially those with complex needs. The programme develops integrated intermediate care which traverses both hospital and community settings through multidisciplinary and interagency teams. This team-based approach to the integration of health services is a novel innovation in Irish health service delivery and will require, over time, a shift in cultures of care to allow for the development of competencies for inter-professional collaboration across the care continuum. The ECLECTIC project will develop an implementation framework for achieving, maintaining and monitoring competencies for interprofessional collaboration among multi-disciplinary teams charged with delivering care for older people across the continuum from acute to community settings. Design: The ECLECTIC research design has been developed in collaboration with the NICPOP. In phase one of the project, a co-design team will collaborate to define and shape competencies for interprofessional collaboration. Phase two will involve the delivery of a collective leadership intervention over a 10-month period with multidisciplinary professionals working with older people across two geographical regions (Mullingar/Midlands and Beaumont/Dublin North). Each group will comprise members of two multidisciplinary teams charged with coordinating and delivering care to older people across the continuum of acute to community care. Observations of collaborative inter-professional working will take place before, during, and after intervention. 
In phase three of the study, analysis of the interview and observation data will be presented to the co-design team in order to develop an implementation framework for future teams. Discussion: The co-design process will develop core competencies and performance indicators for collaborative interprofessional working. The resulting implementation framework will be implemented nationally as part of the NICPOP."
124,An integrated design framework for mass customisation in the consumer electronics industry,paper,2011-02-11,"This paper investigates a framework for Mass Customisation (MC) in the consumer electronics industry. While personalisation is motivating the adoption of MC, providing adequate options for matching consumer needs remains a major challenge. This challenge can be conquered by applying scenario planning, product family architecture and a product styling platform. Simultaneously, by introducing Web3D-based consumer co-design, enterprises can understand user preferences and then improve design concepts. An integrated framework and Agile Design Process (ADP) are proposed for achieving multi-disciplinary collaboration by these methods. Pilot surveys completed by design personnel indicate that the framework and ADP have high potential for harnessing MC."
125,"Our Ways, Your Ways, Both Ways – a multi-disciplinary collaboration to develop, embed and evaluate a model of social and emotional wellbeing care for Aboriginal and Torres Strait Islander young people who experience detention – Phase 1",paper,2023-10-20,"The National Strategic Framework for Aboriginal and Torres Strait Islander Peoples’ Mental Health and Social and Emotional Wellbeing identifies building a strong Aboriginal and Torres Strait Islander led evidence-base to inform care as a key priority. Aboriginal and/or Torres Strait Islander adolescents in contact with the criminal justice system are a highly vulnerable group of Australians, with substantial unmet needs. There is limited evidence to inform culturally appropriate models of care that meet the social and emotional wellbeing needs of justice-involved Aboriginal and/or Torres Strait Islander adolescents. This project aims to develop, implement and evaluate an in-reach and community transitional model of social and emotional wellbeing care for Aboriginal and/or Torres Strait Islander adolescents (10–17 years old) who experience detention through close engagement with Aboriginal and/or Torres Strait Islander youth, Elders, researchers, practitioners and community members, and by drawing on culturally informed practice and knowledge systems. The project is based on a multi-level mixed methods design, with a strong focus on ongoing project evaluation (based on the Ngaa-bi-nya framework) and co-design. Co-design is facilitated through culturally safe and trauma informed participatory processes based on development of strong partnerships from project initiative, design, implementation and evaluation. Application of the landscape domain of the Ngaa-bi-nya framework for Aboriginal and Torres Strait Islander program evaluation will be explored in Phase one. 
Aboriginal and Torres Strait Islander adolescents with experience in detention will be engaged through one-on-one interviews with data collection through the Growth and Empowerment Measure (GEM) Youth (which will be adapted from the adult version and validated as part of this study), the Kessler Psychological Distress Scale (K-10), questions around alcohol and drug use, and narrative interviews exploring experience. Qualitative data will be analyzed using an inductive thematic approach, structured within the framework of the Ngaa-bi-nya landscape prompts. Quantitative data will be analyzed using descriptive statistics to provide a profile of the cohort. Findings from Phase one will be used to inform the development of a model of social and emotional wellbeing care that will be implemented and evaluated in Phase two."
126,A Dynamic Network Analysis approach for evaluating knowledge dissemination in a multi-disciplinary collaboration network in obesity research,paper,2015-12-06,"Effective knowledge dissemination is important to promote the adoption of new concepts and tools. This study aims to provide a framework that assesses strategies for successful knowledge dissemination in a research collaboration network. We propose a Markov-chain Monte Carlo (MCMC) approach along with Dynamic Network Analysis (DNA) to model a social network and understand how different knowledge dissemination strategies can be used in a research collaboration network. The proposed method was demonstrated through a case study that uses a multi-disciplinary collaboration network in obesity research at an academic medical center. To assess the impact of initial disseminators on knowledge dissemination, four different strategies were considered. The simulation results indicated that the best strategy to disseminate knowledge within this obesity research network may be to use central agents in clusters when considering the coverage and speed of knowledge dissemination."
127,The Educational Role-Playing Game Design Matrix: Mapping Design Components onto Types of Education,paper,2023-05-15,"This article offers categories for understanding different facets of learning and role-playing games, including setting, purpose, framing, type of processing, and learning objectives. Types of games categorized include leisure, stand-alone educational RPGs, RPGs in education, and Educational RPGs."
128,Table-top role-playing games as a therapeutic intervention with adults to increase social connectedness,paper,2021-06-21,"Research shows that social connectedness is decreasing and loneliness increasing in the United States, subsequently resulting in a health crisis due to the anxiety and depression these attributes can cause. There is evidence that clinicians have difficulty treating individuals experiencing social anxiety and there is need for intervention strategies that lower treatment barriers. There has been scant research recognizing the use of table-top role-playing games to incorporate when treating social anxiety. The current manuscript describes a year-long group using Dungeons and Dragons in a therapeutic setting and explores perceptions from participants who experienced this group. Core concepts of the model and lessons learned from the developers are described for clinicians who hope to incorporate such a model. Participants described increased confidence in social situations, particularly with boundaries or making mistakes. Secondly, the skills practiced in the game were transferred into real-world experiences. Implications for future research and limitations were described."
129,Setting Sight on Role Playing: To Accommodate or to Repudiate?,paper,2016-11-30,"To set sight on role play by means to look at EFL teacher’s experience and students’ perspectives of role play (RP) technique enactment in teaching speaking by using qualitative design. This research was a qualitative study. It was discharged at a Senior high school in Banda Aceh, Indonesia. It provided work for the instrument of observation sheet, field notes and interview guide, and also questionnaire. The methodology designated the combination of four mountainsides to expose in-depth the urgency of role play in which applied since 1936. The result of interview was exposed that the English teacher claimed that role play was a technique applied to promote speaking and it was corroborated by the result of field note. Likewise, regarding students’ perspective depicted that the students indeed agreed on themselves of the usefulness of role play to enhance their speaking skill and motivation. Thus, Students asserted that the learning was more fun and enjoyable through role play itself. It is merely found in this research study that role playing can accommodate students’ need and teacher’s side in English language teaching. Nevertheless, this article applies a small subject as the participant. Therefore, the researchers recommended to have a deep look at reasoning students’ point of view in terms of role play technique implementation in non-English class. And see ascertains how beneficial it is in terms of role play (RP) in a large classroom."
130,Setting the Stage: Role-Playing in the Group Work Classroom,paper,2014-06-06,"The literature on role-playing in educational settings gives little attention to the preparation of group atmosphere in the classroom. This article addresses the importance of preparation as well as strategies and techniques for transforming a traditional classroom structure into a cohesive group environment that supports individual and collective risk taking. Role-plays have the potential to provide students with an “almost real” experience that can enhance their interest in group work practice, an interest the profession of social work should make every effort to inspire and sustain."
131,Exploring First-Person Perspectives in Designing a Role-Playing VR Simulation for Bullying Prevention: A Focus Group Study,paper,2021-09-28,"Bullying is a complex and abusive form of violence that has potentially serious social and mental health consequences for children and adolescents. With reference to the Olweus Bullying Circle, this project involves the development of a simulation that will allow students to view themselves in different roles played in bullying situations using a virtual reality setting. Interventions need to explore the perspective of the student who bullies and the student being bullied, as well as the bystander in order to model desirable intervention behavior. The expectation is that through role-playing, the students will explore different perspectives and learn how to respond to bullying situations. Two focus groups were conducted to allow experts to contribute to the design of the bullying prevention simulation and gather suggestions for improvements. Findings from the focus group studies suggest that to create effective bullying prevention, Virtual Reality simulations should consider focusing on role-playing, customization of the characters, environments, scenarios and a scoring/reward system."
132,Penerapan Metode Bermain Peran (Role Playing) dalam Mengembangkan Kognitif Anak Usia 5-6 Tahun,paper,2019-12-28,"This study aims to find out how the application of the role playing method in developing cognitive skills of children aged 5-6 years. The problem in this study is the low cognitive development of children aged 5-6 years and the hope that the role playing method still uses limited media in RA Az-Zahra Natar, South Lampung. The method used in this research is a descriptive qualitative method that describes how the role playing method is used at RA Az-Zahra Natar, South Lampung. Data collection techniques used were observation, interviews and documentation. Analysis of the data used is the analysis of Miles and Huberman model data, namely data reduction, data collection and conclusion drawing. Based on data collection and analysis conducted there are several findings of research results about playing a role in developing cognitive children aged 5-6 years. The results of this research are the application of the steps used in role playing, i.e., warming up, participants (choosing players), setting the stage, observers (preparing observers), holding (ensuring roles), discussion and evaluation, reusing (playing back roles), second discussion and evaluation, various conclusions and experiences."
133,The effectiveness of applied learning: an empirical evaluation using role playing in the classroom,paper,2019-12-02,"Purpose: The purpose of this paper is to evaluate the effectiveness of role playing as an applied learning technique for enhanced classroom experiences as compared to traditional lecture methods. Design/methodology/approach: This study uses the pre-test/post-test design to conduct experiments with several control and experimental groups. Subjects are graduate students in an MBA program at a private, non-profit university in a traditional classroom setting. Findings: Students in the experimental group gained significantly more knowledge (post-test minus pre-test scores) – 45 percent higher – through participation in the role playing exercise as compared to the control group. Research limitations/implications: This study represents only a single educational discipline explored using a single role playing learning activity. Impacts on the long-term retention of the knowledge should be studied further. Practical implications: Educators should enhance their classroom experience with more applied learning activities such as role playing in order to increase knowledge gain and potentially longer knowledge retention. Originality/value: This study uses a customized role playing activity within a business curriculum as one of many applied learning techniques. The value to students was shown by significantly higher gain in knowledge while simultaneously enhancing their enjoyment of the classroom experience to potentially encourage further lifelong learning."
134,Better Zero-Shot Reasoning with Role-Play Prompting,paper,2023-08-15,"Modern large language models (LLMs) exhibit a remarkable capacity for role-playing, enabling them to embody not only human characters but also non-human entities. This versatility allows them to simulate complex human-like interactions and behaviors within various contexts, as well as to emulate specific objects or systems. While these capabilities have enhanced user engagement and introduced novel modes of interaction, the influence of role-playing on LLMs’ reasoning abilities remains underexplored. In this study, we introduce a strategically designed role-play prompting methodology and assess its performance under the zero-shot setting across twelve diverse reasoning benchmarks. Our empirical results illustrate that role-play prompting consistently surpasses the standard zero-shot approach across most datasets. Notably, in experiments conducted using ChatGPT, accuracy on AQuA rises from 53.5% to 63.8%, and on Last Letter from 23.8% to 84.2%. Upon further comparison with the Zero-Shot-CoT technique, which prompts the model to “think step by step”, our study demonstrates that role-play prompting acts as a more effective trigger for the CoT process. This highlights its potential to augment the reasoning capabilities of LLMs. We release our code at https://github.com/NKU-HLT/Role-Play-Prompting."
135,A BDI Game Master Agent for Computer Role-Playing Games,paper,2017-03-01,"In this paper we describe an approach for developing an intelligent game master (GM) for computer role-playing games. The role of the GM is to set up the game environment, manage the narrative flow and enforce the game rules whilst keeping the players engaged. Our approach is to use the popular Belief-Desire-Intention (BDI) model of agents to develop a GM. We describe the process for creating such a GM and how we implemented a prototype of it for a scenario in the Neverwinter Nights (NWN) game. We describe the evaluation of our prototype with human participants who played the chosen NWN scenario both with and without the BDI GM. The comparison survey completed by the participants shows that the system with the BDI GM was the clear winner with respect to game replayability, flexibility, objective setting and overall interest; thus, validating our hypothesis that a BDI GM will provide game players with a better gaming experience."
136,ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs,paper,2023-09-22,"Large Language Models (LLMs) still struggle with natural language reasoning tasks. Motivated by the society of minds (Minsky, 1988), we propose ReConcile, a multi-model multi-agent framework designed as a round table conference among diverse LLM agents. ReConcile enhances collaborative reasoning between LLM agents via multiple rounds of discussion, learning to convince other agents to improve their answers, and employing a confidence-weighted voting mechanism that leads to a better consensus. In each round, ReConcile initiates discussion between agents via a 'discussion prompt' that consists of (a) grouped answers and explanations generated by each agent in the previous round, (b) their confidence scores, and (c) demonstrations of answer-rectifying human explanations, used for convincing other agents. Experiments on seven benchmarks demonstrate that ReConcile significantly improves LLMs' reasoning -- both individually and as a team -- surpassing prior single-agent and multi-agent baselines by up to 11.4% and even outperforming GPT-4 on three datasets. ReConcile also flexibly incorporates different combinations of agents, including API-based, open-source, and domain-specific models, leading to an 8% improvement on MATH. Finally, we analyze the individual components of ReConcile, demonstrating that the diversity originating from different models is critical to its superior performance. Code: https://github.com/dinobby/ReConcile"
137,Creating a visual-collaborative learning environment for real estate education – an experiment on a cross-city and inter-disciplinary interactive discussion platform for real estate students in Hong Kong,paper,2018,"Real estate education is a multi-disciplinary subject that requires students to have an all-round training and knowledge in such core areas as economic and finance, urban planning and urbanization studies, public policy, law as well as construction technology. These subject areas are dynamic in the sense that they keep evolving with changes in socio-economic as well as technological variables in the society. In addition, while real estate development itself is a geographically fixed commodity which implies local knowledge and regulations dominate the outcome, real estate analysis can be a much more global issue when local developers and investors desire to make an oversea investment decision to diversify investment risk. In such case, the ability for students to understand the intertwined relationship between the general principles of real estate development as well as actual market structures in other places becomes imperative. This problem-based learning ability can be enhanced with a pedagogical approach that injects training in self-initiated research, skillful communication technique and the ability to understand and analyse different socio-economic environments. In this paper, we will illustrate how an online discussion platform for students helps create a visual collaborative learning environment that enhances the learning outcomes of real estate education in a studio-discussion format, which transcends the physical constraints of arranging face-to-face meetings among students from different but related programmes. 
More importantly, real estate education that can prepare students with a more inter-disciplinary and internationalized basis of knowledge will be able to attract good students who aim at increasing their competitiveness in the global job market. By using the discussion forum based on the interactive environment created by Realtimeboard.com, this paper shows that students from different disciplines and cities can carry out real-time discussions on issues pertaining to urban development without the need to physically meet. The outcomes of our experiment shows that online collaborative learning platform can serve to underpin the pedagogical outcomes of real estate education in a cost-effective manner that both students and teaching staff benefit."
138,MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning,paper,2023-11-16,"Large language models (LLMs), despite their remarkable progress across various general domains, encounter significant barriers in medicine and healthcare. This field faces unique challenges such as domain-specific terminologies and reasoning over specialized knowledge. To address these issues, we propose MedAgents, a novel multi-disciplinary collaboration framework for the medical domain. MedAgents leverages LLM-based agents in a role-playing setting that participate in a collaborative multi-round discussion, thereby enhancing LLM proficiency and reasoning capabilities. This training-free framework encompasses five critical steps: gathering domain experts, proposing individual analyses, summarising these analyses into a report, iterating over discussions until a consensus is reached, and ultimately making a decision. Our work focuses on the zero-shot setting, which is applicable in real-world scenarios. Experimental results on nine datasets (MedQA, MedMCQA, PubMedQA, and six subtasks from MMLU) establish that our proposed MedAgents framework excels at mining and harnessing the medical expertise within LLMs, as well as extending its reasoning abilities. Our code can be found at https://github.com/gersteinlab/MedAgents."
139,User studies in cartography: A collaborative research agenda,paper,2017-07-07,"The possibility of digital interactivity requires us to reenvision the map reader as the map user, and to address the new perceptual, cognitive, cultural, and practical considerations that now influence the user's experience with interactive maps and visualizations. Here, we present an agenda for empirical research on these users and the interactive designs they employ. This is one of several research agendas resulting from a multi-stage discussion among international scholars facilitated by the International Cartographic Association, which included an early round of position papers and two subsequent workshops to narrow into pressing themes and important research opportunities. The focus of this agenda is epistemological and reflects the wide interdisciplinary influences on user studies in cartography. The opportunities are presented as imperatives that cross basic research and user-centered design studies, and include practical impediments to empirical research, emerging interdisciplinary recommendations to improve user studies, and key research needs regarding the specific study of interactive maps and visualizations. This presentation is based on the article available at http://dx.doi.org/10.1080/23729333.2017.1288534."
140,Multi-UAV Collaborative Absolute Vision Positioning and Navigation: A Survey and Discussion,paper,2023-04-11,"The employment of unmanned aerial vehicles (UAVs) has greatly facilitated the lives of humans. Due to the mass manufacturing of consumer unmanned aerial vehicles and the support of related scientific research, it can now be used in lighting shows, jungle search-and-rescues, topographical mapping, disaster monitoring, and sports event broadcasting, among many other disciplines. Some applications have stricter requirements for the autonomous positioning capability of UAV clusters, requiring its positioning precision to be within the cognitive range of a human or machine. Global Navigation Satellite System (GNSS) is currently the only method that can be applied directly and consistently to UAV positioning. Even with dependable GNSS, large-scale clustering of drones might fail, resulting in drone cluster bombardment. As a type of passive sensor, the visual sensor has a compact size, a low cost, a wealth of information, strong positional autonomy and reliability, and high positioning accuracy. This automated navigation technology is ideal for drone swarms. The application of vision sensors in the collaborative task of multiple UAVs can effectively avoid navigation interruption or precision deficiency caused by factors such as field-of-view obstruction or flight height limitation of a single UAV sensor and achieve large-area group positioning and navigation in complex environments. This paper examines collaborative visual positioning among multiple UAVs (UAV autonomous positioning and navigation, distributed collaborative measurement fusion under cluster dynamic topology, and group navigation based on active behavior control and distributed fusion of multi-source dynamic sensing information). Current research constraints are compared and appraised, and the most pressing issues to be addressed in the future are anticipated and researched. 
Through analysis and discussion, it has been concluded that the integrated employment of the aforementioned methodologies aids in enhancing the cooperative positioning and navigation capabilities of multiple UAVs during GNSS denial."
141,SC-Safety: A Multi-round Open-ended Question Adversarial Safety Benchmark for Large Language Models in Chinese,paper,2023-10-09,"Large language models (LLMs), like ChatGPT and GPT-4, have demonstrated remarkable abilities in natural language understanding and generation. However, alongside their positive impact on our daily tasks, they can also produce harmful content that negatively affects societal perceptions. To systematically assess the safety of Chinese LLMs, we introduce SuperCLUE-Safety (SC-Safety) - a multi-round adversarial benchmark with 4912 open-ended questions covering more than 20 safety sub-dimensions. Adversarial human-model interactions and conversations significantly increase the challenges compared to existing methods. Experiments on 13 major LLMs supporting Chinese yield the following insights: 1) Closed-source models outperform open-sourced ones in terms of safety; 2) Models released from China demonstrate comparable safety levels to LLMs like GPT-3.5-turbo; 3) Some smaller models with 6B-13B parameters can compete effectively in terms of safety. By introducing SC-Safety, we aim to promote collaborative efforts to create safer and more trustworthy LLMs. The benchmark and findings provide guidance on model selection. Our benchmark can be found at https://www.CLUEbenchmarks.com"
142,Metaproteomics profiling of the microbial communities in fermentation starters (Daqu) during multi-round production of Chinese liquor,paper,2023-06-01,"Introduction The special flavor and fragrance of Chinese liquor are closely related to microorganisms in the fermentation starter Daqu. The changes of microbial community can affect the stability of liquor yield and quality. Methods In this study, we used data-independent acquisition mass spectrometry (DIA-MS) for cohort study of the microbial communities of a total of 42 Daqu samples in six production cycles at different times of a year. The DIA MS data were searched against a protein database constructed by metagenomic sequencing. Results The microbial composition and its changes across production cycles were revealed. Functional analysis of the differential proteins was carried out and the metabolic pathways related to the differential proteins were explored. These metabolic pathways were related to the saccharification process in liquor fermentation and the synthesis of secondary metabolites to form the unique flavor and aroma in the Chinese liquor. Discussion We expect that the metaproteome profiling of Daqu from different production cycles will serve as a guide for the control of fermentation process of Chinese liquor in the future."
143,Enhancing clinical reasoning with Chat Generative Pre-trained Transformer: a practical guide,paper,2023-10-03,"Objectives This study aimed to elucidate effective methodologies for utilizing the generative artificial intelligence (AI) system, namely the Chat Generative Pre-trained Transformer (ChatGPT), in improving clinical reasoning abilities among clinicians. Methods We conducted a comprehensive exploration of the capabilities of ChatGPT, emphasizing two main areas: (1) efficient utilization of ChatGPT, with a focus on application and language selection, input methodology, and output verification; and (2) specific strategies to bolster clinical reasoning using ChatGPT, including self-learning via simulated clinical case creation and engagement with published case reports. Results Effective AI-based clinical reasoning development requires a clear delineation of both system roles and user needs. All outputs from the system necessitate rigorous verification against credible medical resources. When used in self-learning scenarios, capabilities of ChatGPT in clinical case creation notably enhanced disease comprehension. Conclusions The efficient use of generative AIs, as exemplified by ChatGPT, can impressively enhance clinical reasoning among medical professionals. Adopting these cutting-edge tools promises a bright future for continuous advancements in clinicians’ diagnostic skills, heralding a transformative era in digital healthcare."
144,medIKAL: Integrating Knowledge Graphs as Assistants of LLMs for Enhanced Clinical Diagnosis on EMRs,paper,2024-06-20,"Electronic Medical Records (EMRs), while integral to modern healthcare, present challenges for clinical reasoning and diagnosis due to their complexity and information redundancy. To address this, we proposed medIKAL (Integrating Knowledge Graphs as Assistants of LLMs), a framework that combines Large Language Models (LLMs) with knowledge graphs (KGs) to enhance diagnostic capabilities. medIKAL assigns weighted importance to entities in medical records based on their type, enabling precise localization of candidate diseases within KGs. It innovatively employs a residual network-like approach, allowing initial diagnosis by the LLM to be merged into KG search results. Through a path-based reranking algorithm and a fill-in-the-blank style prompt template, it further refined the diagnostic process. We validated medIKAL's effectiveness through extensive experiments on a newly introduced open-sourced Chinese EMR dataset, demonstrating its potential to improve clinical diagnosis in real-world settings."
145,Image to Label to Answer: An Efficient Framework for Enhanced Clinical Applications in Medical Visual Question Answering,paper,2024-06-10,"Medical Visual Question Answering (Med-VQA) faces significant limitations in application development due to sparse and challenging data acquisition. Existing approaches focus on multi-modal learning to equip models with medical image inference and natural language understanding, but this worsens data scarcity in Med-VQA, hindering clinical application and advancement. This paper proposes the ITLTA framework for Med-VQA, designed based on field requirements. ITLTA combines multi-label learning of medical images with the language understanding and reasoning capabilities of large language models (LLMs) to achieve zero-shot learning, meeting natural language module needs without end-to-end training. This approach reduces deployment costs and training data requirements, allowing LLMs to function as flexible, plug-and-play modules. To enhance multi-label classification accuracy, the framework uses external medical image data for pretraining, integrated with a joint feature and label attention mechanism. This configuration ensures robust performance and applicability, even with limited data. Additionally, the framework clarifies the decision-making process for visual labels and question prompts, enhancing the interpretability of Med-VQA. Validated on the VQA-Med 2019 dataset, our method demonstrates superior effectiveness compared to existing methods, confirming its outstanding performance for enhanced clinical applications."
146,"ChatGPT, Enhanced with Clinical Practice Guidelines, is a Superior Decision Support Tool",paper,2023-08-13,"ChatGPT has gained remarkable traction since its inception in November 2022. However, it faces limitations in generating inaccurate responses, ignoring existing guidelines, and lacking reasoning when applied in clinical settings. This study introduces ChatGPT-CARE, a tool that integrates clinical practice guidelines with ChatGPT, focusing on COVID-19 outpatient treatment decisions. By employing in-context learning, chain-of-thought prompting, and few-shots learning, ChatGPT-CARE enhances original ChatGPT's clinical decision support and reasoning capabilities. The tool was evaluated using three categories of various descriptions of patients seeking COVID-19 treatment, and two physicians specialized in pulmonary disease and critical care assessed the responses for accuracy, hallucination, and clarity. The results indicate that ChatGPT-CARE, particularly the GPT-4 version, offers higher accuracy and clarity compared to the original ChatGPT. Despite some limitations, such as occasional hallucinations, ChatGPT-CARE represents a significant advancement in AI-driven clinical decision support, with potential applications beyond COVID-19 treatment."
147,Aligning Large Language Models for Clinical Tasks,paper,2023-09-06,"Large Language Models (LLMs) have demonstrated remarkable adaptability, showcasing their capacity to excel in tasks for which they were not explicitly trained. However, despite their impressive natural language processing (NLP) capabilities, effective alignment of LLMs remains a crucial challenge when deploying them for specific clinical applications. The ability to generate responses with factually accurate content and to engage in non-trivial reasoning steps are crucial for the LLMs to be eligible for applications in clinical medicine. Employing a combination of techniques including instruction-tuning and in-prompt strategies like few-shot and chain-of-thought prompting has significantly enhanced the performance of LLMs. Our proposed alignment strategy for medical question-answering, known as 'expand-guess-refine', offers a parameter and data-efficient solution. A preliminary analysis of this method demonstrated outstanding performance, achieving a score of 70.63% on a subset of questions sourced from the USMLE dataset."
148,The select and test algorithm for inference in medical diagnostic reasoning: Implementation and evaluation in clinical psychiatry,paper,2016-06-01,"Clinical diagnostic reasoning involves an informed search for clinical information driven by diagnostic hypotheses. This is then followed by matching the elicited clinical information with diagnostic criteria for each differential diagnosis, resulting in diagnostic conclusions. The existing approaches to clinical reasoning were limited in their capabilities in adequately covering this process, particularly in arriving at diagnostic conclusions. As a solution, this paper presents the previously published Select and Test (ST) algorithm, enhanced with a technique known as the orthogonal vector projection method, which is used to arrive at diagnostic conclusions more efficiently and effectively. The implementation of the algorithm along with a knowledge base in psychiatry is described, and the accuracy of the algorithm is demonstrated by evaluating it using actual patient data."
149,"Graph Convolutional Networks for Multi-modality Medical Imaging: Methods, Architectures, and Clinical Applications",paper,2022-02-17,"Image-based characterization and disease understanding involve integrative analysis of morphological, spatial, and topological information across biological scales. The development of graph convolutional networks (GCNs) has created the opportunity to address this information complexity via graph-driven architectures, since GCNs can perform feature aggregation, interaction, and reasoning with remarkable flexibility and efficiency. These GCNs capabilities have spawned a new wave of research in medical imaging analysis with the overarching goal of improving quantitative disease understanding, monitoring, and diagnosis. Yet daunting challenges remain for designing the important image-to-graph transformation for multi-modality medical imaging and gaining insights into model interpretation and enhanced clinical decision support. In this review, we present recent GCNs developments in the context of medical image analysis including imaging data from radiology and histopathology. We discuss the fast-growing use of graph network architectures in medical image analysis to improve disease diagnosis and patient outcomes in clinical practice. To foster cross-disciplinary research, we present GCNs technical advancements, emerging medical applications, identify common challenges in the use of image-based GCNs and their extensions in model interpretation, large-scale benchmarks that promise to transform the scope of medical image studies and related graph-driven medical research."
150,"Graph Convolutional Networks for Multi-modality Medical Imaging: Methods, Architectures, and Clinical Applications",paper,2022,"Image-based characterization and disease understanding involve integrative analysis of morphological, spatial, and topological information across biological scales. The development of graph convolutional networks (GCNs) has created the opportunity to address this information complexity via graph-driven architectures, since GCNs can perform feature aggregation, interaction, and reasoning with remarkable flexibility and efficiency. These GCNs capabilities have spawned a new wave of research in medical imaging analysis with the overarching goal of improving quantitative disease understanding, monitoring, and diagnosis. Yet daunting challenges remain for designing the important image-to-graph transformation for multi-modality medical imaging and gaining insights into model interpretation and enhanced clinical decision support. In this review, we present recent GCNs developments in the context of medical image analysis including imaging data from radiology and histopathology. We discuss the fast-growing use of graph network architectures in medical image analysis to improve disease diagnosis and patient outcomes in clinical practice. To foster cross-disciplinary research, we present GCNs technical advancements, emerging medical applications, identify common challenges in the use of image-based GCNs and their extensions in model interpretation, large-scale benchmarks that promise to transform the scope of medical image studies and related graph-driven medical research."
151,"A Survey on Medical Large Language Models: Technology, Application, Trustworthiness, and Future Directions",paper,2024-06-06,"Large language models (LLMs), such as GPT series models, have received substantial attention due to their impressive capabilities for generating and understanding human-level language. More recently, LLMs have emerged as an innovative and powerful adjunct in the medical field, transforming traditional practices and heralding a new era of enhanced healthcare services. This survey provides a comprehensive overview of Medical Large Language Models (Med-LLMs), outlining their evolution from general to the medical-specific domain (i.e., Technology and Application), as well as their transformative impact on healthcare (e.g., Trustworthiness and Safety). Concretely, starting from the fundamental history and technology of LLMs, we first delve into the progressive adaptation and refinements of general LLM models in the medical domain, especially emphasizing the advanced algorithms that boost the LLMs' performance in handling complicated medical environments, including clinical reasoning, knowledge graph, retrieval-augmented generation, human alignment, and multi-modal learning. Secondly, we explore the extensive applications of Med-LLMs across domains such as clinical decision support, report generation, and medical education, illustrating their potential to streamline healthcare services and augment patient outcomes. Thirdly, recognizing the imperative and responsible innovation, we discuss the challenges of ensuring fairness, accountability, privacy, and robustness in Med-LLMs applications. Finally, we conduct a concise discussion for anticipating possible future trajectories of Med-LLMs, identifying avenues for the prudent expansion of Med-LLMs. By consolidating the above-mentioned insights, this review seeks to provide a comprehensive investigation of the potential strengths and limitations of Med-LLMs for professionals and researchers, ensuring a responsible landscape in the healthcare setting."
152,Education of clinical reasoning in patients with multimorbidity: a scoping review and perspectives for technology-enhanced learning,paper,2023-06-09,"Multimorbidity is defined as the co-existence of two or more chronic diseases in a patient, and it is increasing in prevalence. This condition poses new problems for clinical reasoning. Few studies inquire regarding the construct of reasoning in multimorbidity and the teaching/learning methods. The objectives of this scoping review were searching for a definition of the construct of clinical reasoning in multimorbidity and the related learning methods, and special ways in which information technology can help. We searched PubMed, Scopus, ERIC and CORE databases. After an iterative process of selection and thematic analysis, we selected 30 articles, which were thematized in three classes: the multimorbid patient as a teacher (8 articles), defining a framework of competence (11 articles), representing multimorbidity and related clinical reasoning (11 articles). In this last theme were also grouped studies using technology to enhance learning. The construct of clinical reasoning in multimorbidity expands over three domains: clinical (including managing uncertainty, anticipating, and detecting evolutions and conflicting guidelines, and setting priorities); relational (concerning communicating uncertainty and developing a feasible, shared plan of care with the patient); organizational (managing the wide system of resources needed to take care of a multimorbid patient). The preferred teaching methods are based on the encounter with real or expert patients, technology enhanced case-based learning and graphical representations of clinical cases. Perspectives of research should be addressed to permit the learner to experience a patient’s life-long experience by moving forward and back over time while exploring interactions among diseases and social determinants with respect to possibly conflicting treatments. Perspectives on rich, technology-enhanced simulations should be researched."
153,API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs,paper,2023-04-14,"Recent research has demonstrated that Large Language Models (LLMs) can enhance their capabilities by utilizing external tools. However, three pivotal questions remain unanswered: (1) How effective are current LLMs in utilizing tools? (2) How can we enhance LLMs' ability to utilize tools? (3) What obstacles need to be overcome to leverage tools? To address these questions, we introduce API-Bank, a groundbreaking benchmark, specifically designed for tool-augmented LLMs. For the first question, we develop a runnable evaluation system consisting of 73 API tools. We annotate 314 tool-use dialogues with 753 API calls to assess the existing LLMs' capabilities in planning, retrieving, and calling APIs. For the second question, we construct a comprehensive training set containing 1,888 tool-use dialogues from 2,138 APIs spanning 1,000 distinct domains. Using this dataset, we train Lynx, a tool-augmented LLM initialized from Alpaca. Experimental results demonstrate that GPT-3.5 exhibits improved tool utilization compared to GPT-3, while GPT-4 excels in planning. However, there is still significant potential for further improvement. Moreover, Lynx surpasses Alpaca's tool utilization performance by more than 26 pts and approaches the effectiveness of GPT-3.5. Through error analysis, we highlight the key challenges for future research in this field to answer the third question."
154,ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models,paper,2023-05-23,"Although large language models (LLMs) have achieved excellent performance in a variety of evaluation benchmarks, they still struggle in complex reasoning tasks which require specific knowledge and multi-hop reasoning. To improve the reasoning abilities, we propose ChatCoT, a tool-augmented chain-of-thought reasoning framework for chat-based LLMs. In ChatCoT, we model the chain-of-thought (CoT) reasoning as multi-turn conversations, to utilize tools in a more natural way through chatting. At each turn, LLMs can either interact with tools or perform the reasoning. Our approach can effectively leverage the multi-turn conversation ability of chat-based LLMs, and integrate the thought chain following and tools manipulation in a unified way. Specifically, we initialize the early turns of the conversation by the tools, tasks and reasoning format, and propose an iterative tool-augmented reasoning step to perform step-by-step tool-augmented reasoning. The experiment results on two complex reasoning datasets (MATH and HotpotQA) have shown the effectiveness of ChatCoT on complex reasoning tasks, achieving a 6.8% relative improvement over the state-of-the-art baseline. Our code and data are available at: https://github.com/RUCAIBOX/ChatCoT."
155,Gentopia: A Collaborative Platform for Tool-Augmented LLMs,paper,2023-08-08,"Augmented Language Models (ALMs) empower large language models with the ability to use tools, transforming them into intelligent agents for real-world interactions. However, most existing frameworks for ALMs, to varying degrees, are deficient in the following critical features: flexible customization, collaborative democratization, and holistic evaluation. We present gentopia, an ALM framework enabling flexible customization of agents through simple configurations, seamlessly integrating various language models, task formats, prompting modules, and plugins into a unified paradigm. Furthermore, we establish gentpool, a public platform enabling the registration and sharing of user-customized agents. Agents registered in gentpool are composable such that they can be assembled together for agent collaboration, advancing the democratization of artificial intelligence. To ensure high-quality agents, gentbench, an integral component of gentpool, is designed to thoroughly evaluate user-customized agents across diverse aspects such as safety, robustness, efficiency, etc. We release gentopia on Github and will continuously move forward."
156,RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models,paper,2023-10-25,"Large language model (LLM) applications in cloud root cause analysis (RCA) have been actively explored recently. However, current methods are still reliant on manual workflow settings and do not unleash LLMs' decision-making and environment interaction capabilities. We present RCAgent, a tool-augmented LLM autonomous agent framework for practical and privacy-aware industrial RCA usage. Running on an internally deployed model rather than GPT families, RCAgent is capable of free-form data collection and comprehensive analysis with tools. Our framework combines a variety of enhancements, including a unique Self-Consistency for action trajectories, and a suite of methods for context management, stabilization, and importing domain knowledge. Our experiments show RCAgent's evident and consistent superiority over ReAct across all aspects of RCA -- predicting root causes, solutions, evidence, and responsibilities -- and tasks covered or uncovered by current rules, as validated by both automated metrics and human evaluations. Furthermore, RCAgent has already been integrated into the diagnosis and issue discovery workflow of the Real-time Compute Platform for Apache Flink of Alibaba Cloud."
157,Proposal for the Deployment of an Augmented Reality Tool for Construction Safety Inspection,paper,2022-04-17,"The construction site is a hazardous place. The dynamic, complex interaction between workers, machinery, and the environment leads to dangerous risks. In response to such risks, the goal is to fulfill the zero accidents philosophy, which requires the development of safety skills among workers and the provision of tools for risk prevention. In pursuit of that vision, this work studies collective protective equipment (CPE). Traditional methodologies propose visual inspections using checklists, the effectiveness of which depends on the quality of the inspection by the safety advisor (SA). This paper analyses the traditional process of safety inspections in building projects: the traditional methods, main pain points, and bottlenecks are identified, along with the key performance indicators (KPIs) needed to complete these processes correctly. Because of this, a methodology that digitises the CPE inspection process is proposed. Augmented reality (AR) is used as a 3D viewer with an intuitive interface for the SA, and, accordingly, functional requirements are detailed and different information layers and user interfaces for AR applications are proposed. In addition, the workflow and KPIs are shown. To demonstrate the feasibility of the proposal, a proof of concept is developed and evaluated. The relevance of this work lies in providing background for the use of AR in safety inspection processes on construction sites and in offering methodological recommendations for the development and evaluation of these applications."
158,School of the Future: A Comprehensive Study on the Effectiveness of Augmented Reality as a Tool for Primary School Children’s Education,paper,2021-06-07,"With the emerging technologies of augmented reality (AR) and virtual reality (VR), the learning process in today’s classroom is much more effective and motivational. Overlaying virtual content into the real world makes learning methods attractive and entertaining for students while performing activities. AR techniques make the learning process easy, and fun as compared to traditional methods. These methods lack focused learning and interactivity between the educational content. To make learning effective, we propose to use handheld marker-based AR technology for primary school students. We developed a set of four applications based on students’ academic course of primary school level for learning purposes of the English alphabet, decimal numbers, animals and birds, and an AR Globe for knowing about different countries around the world. These applications can be played wherever and whenever a user wants without Internet connectivity, subject to the availability of a tablet or mobile device and the required target images. These applications have performance evaluation quizzes (PEQs) for testing students’ learning progress. Our study investigates the effectiveness of AR-based learning materials in terms of learning performance, motivation, attitude, and behavior towards different methods of learning. Our activity results favor AR-based learning techniques where students’ learning motivation and performance are enhanced compared to the non-AR learning methods."
159,Developing a Simple and Cost-Effective Markerless Augmented Reality Tool for Chemistry Education,paper,2021-04-20,Traditional visualization methods have a limited capacity to enhance students’ understanding of 3D molecular structure and reactivity. Studies have shown that 3D visualization tools can play an ess...
160,Evaluation of Augmented Reality Application for Learning Dental Anatomy as a Novel Educational Tool.,paper,2020-01-03,"AIMS
To investigate dental students' perceptions of the augmented reality (AR) head and neck anatomy application and to determine whether the learning environment was beneficial for students compared with traditional cadaver learning.


METHODS
A total of 88 students participated in a self-administered questionnaire prior to and after the use of AR. This was conducted during anatomy classes for second year dentistry students. Descriptive data analysis was performed to determine the perceptions of experience gained through AR.


RESULTS
The study revealed that over two-thirds of participants perceived that it would assist in their learning with 52.3% of participants who agreed and 35.2% of participants who strongly agreed. After the use of HoloHuman, it was found that 43.5% of participants agreed that the 3D anatomical structures improved their understanding of anatomy and 36.5% agreed that they felt more confident about their anatomy skills. The results also demonstrated that only 34.1% agreed that it added value in training compared to relying solely on traditional methods. Overall, 75.3% of participants agreed that HoloHuman teaching should not replace traditional cadaver training.


CONCLUSION
This study suggested that the use of AR offers an additional means of dental anatomy training; however, it cannot be used as a replacement for traditional modes of cadaver anatomy training. AR has the potential to be used as an adjunct tool in the learning of dental head and neck anatomy, as it has demonstrated increased student engagement and enjoyment; however, limitations with the device still remain."
161,The Flan Collection: Designing Data and Methods for Effective Instruction Tuning,paper,2023-01-31,"We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022). Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions which enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning, and in particular, training with mixed prompt settings (zero-shot, few-shot, and chain-of-thought) actually yields stronger (2%+) performance in all settings. In further experiments, we show Flan-T5 requires less finetuning to converge higher and faster than T5 on single downstream tasks, motivating instruction-tuned models as more computationally-efficient starting checkpoints for new tasks. Finally, to accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available at https://github.com/google-research/FLAN/tree/main/flan/v2."
162,Donkii: Can Annotation Error Detection Methods Find Errors in Instruction-Tuning Datasets?,paper,2023-09-04,"Instruction tuning has become an integral part of training pipelines for Large Language Models (LLMs) and has been shown to yield strong performance gains. In an orthogonal line of research, Annotation Error Detection (AED) has emerged as a tool for detecting quality problems in gold standard labels. So far, however, the application of AED methods has been limited to classification tasks. It is an open question how well AED methods generalize to language generation settings, which are becoming more widespread via LLMs. In this paper, we present a first and novel benchmark for AED on instruction tuning data: DONKII. It comprises three instruction-tuning datasets enriched with error annotations by experts and semi-automatic methods. We also provide a novel taxonomy of error types for instruction-tuning data. We find that all three datasets contain clear errors, which sometimes propagate directly into instruction-tuned LLMs. We propose four AED baselines for the generative setting and evaluate them extensively on the newly introduced dataset. Our results show that the choice of the right AED method and model size is indeed crucial and derive practical recommendations for how to use AED methods to clean instruction-tuning data."
163,"LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark",paper,2023-06-11,"Large language models have become a potential pathway toward achieving artificial general intelligence. Recent works on multi-modal large language models have demonstrated their effectiveness in handling visual modalities. In this work, we extend the research of MLLMs to point clouds and present the LAMM-Dataset and LAMM-Benchmark for 2D image and 3D point cloud understanding. We also establish an extensible framework to facilitate the extension of MLLMs to additional modalities. Our main contribution is three-fold: 1) We present the LAMM-Dataset and LAMM-Benchmark, which cover almost all high-level vision tasks for 2D and 3D vision. Extensive experiments validate the effectiveness of our dataset and benchmark. 2) We demonstrate the detailed methods of constructing instruction-tuning datasets and benchmarks for MLLMs, which will enable future research on MLLMs to scale up and extend to other domains, tasks, and modalities faster. 3) We provide a primary but potential MLLM training framework optimized for modalities' extension. We also provide baseline models, comprehensive experimental observations, and analysis to accelerate future research. Codes and datasets are now available at https://github.com/OpenLAMM/LAMM."
164,What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning,paper,2023-12-25,"Instruction tuning is a standard technique employed to align large language models to end tasks and user preferences after the initial pretraining phase. Recent research indicates the critical role of data engineering in instruction tuning -- when appropriately selected, only limited data is necessary to achieve superior performance. However, we still lack a principled understanding of what makes good instruction tuning data for alignment, and how we should select data automatically and effectively. In this work, we delve deeply into automatic data selection strategies for alignment. We start with controlled studies to measure data across three dimensions: complexity, quality, and diversity, along which we examine existing methods and introduce novel techniques for enhanced data measurement. Subsequently, we propose a simple strategy to select data samples based on the measurement. We present deita (short for Data-Efficient Instruction Tuning for Alignment), a series of models fine-tuned from LLaMA and Mistral models using data samples automatically selected with our proposed approach. Empirically, deita performs better or on par with the state-of-the-art open-source alignment models with only 6K SFT training data samples -- over 10x less than the data used in the baselines. When further trained with direct preference optimization (DPO), deita-Mistral-7B + DPO trained with 6K SFT and 10K DPO samples achieves 7.55 MT-Bench and 90.06% AlpacaEval scores. We anticipate this work to provide tools on automatic data selection, facilitating data-efficient alignment. We release our models as well as the selected datasets for future research to align models more efficiently."
165,To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning,paper,2023-11-13,"Existing visual instruction tuning methods typically prompt large language models with textual descriptions to generate instruction-following data. Despite the promising performance achieved, these descriptions are derived from image annotations, which are oftentimes coarse-grained. Furthermore, the instructions might even contradict the visual content without observing the entire visual context. To address this challenge, we introduce a fine-grained visual instruction dataset, LVIS-Instruct4V, which contains 220K visually aligned and context-aware instructions produced by prompting the powerful GPT-4V with images from LVIS. Through experimental validation and case studies, we demonstrate that high-quality visual instructional data could improve the performance of LLaVA-1.5, a state-of-the-art large multimodal model, across a wide spectrum of benchmarks by clear margins. Notably, by simply replacing the LLaVA-Instruct with our LVIS-Instruct4V, we achieve better results than LLaVA on most challenging LMM benchmarks, e.g., LLaVA$^w$ (76.7 vs. 70.7) and MM-Vet (40.2 vs. 35.4). We release our data and model at https://github.com/X2FD/LVIS-INSTRUCT4V."
166,GraphGPT: Graph Instruction Tuning for Large Language Models,paper,2023-10-19,"Graph Neural Networks (GNNs) have evolved to understand graph structures through recursive exchanges and aggregations among nodes. To enhance robustness, self-supervised learning (SSL) has become a vital tool for data augmentation. Traditional methods often depend on fine-tuning with task-specific labels, limiting their effectiveness when labeled data is scarce. Our research tackles this by advancing graph model generalization in zero-shot learning environments. Inspired by the success of large language models (LLMs), we aim to create a graph-oriented LLM capable of exceptional generalization across various datasets and tasks without relying on downstream graph data. We introduce the GraphGPT framework, which integrates LLMs with graph structural knowledge through graph instruction tuning. This framework includes a text-graph grounding component to link textual and graph structures and a dual-stage instruction tuning approach with a lightweight graph-text alignment projector. These innovations allow LLMs to comprehend complex graph structures and enhance adaptability across diverse datasets and tasks. Our framework demonstrates superior generalization in both supervised and zero-shot graph learning tasks, surpassing existing benchmarks. The open-sourced model implementation of our GraphGPT is available at https://github.com/HKUDS/GraphGPT."
167,Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks,paper,2023-11-01,"Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions. However, how to select new tasks to improve the performance and generalizability of IT models remains an open question. Training on all existing tasks is impractical due to prohibitive computation requirements, and randomly selecting tasks can lead to suboptimal performance. In this work, we propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks. We represent the informativeness of new tasks with the disagreement of the current model outputs over perturbed prompts. Our experiments on NIV2 and Self-Instruct datasets demonstrate that our method consistently outperforms other baseline strategies for task selection, achieving better out-of-distribution generalization with fewer training tasks. Additionally, we introduce a task map that categorizes and diagnoses tasks based on prompt uncertainty and prediction probability. We discover that training on ambiguous (prompt-uncertain) tasks improves generalization while training on difficult (prompt-certain and low-probability) tasks offers no benefit, underscoring the importance of task selection for instruction tuning."
168,Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models,paper,2023-08-25,"Recently, Multimodal Large Language Models (MLLMs) that enable Large Language Models (LLMs) to interpret images through visual instruction tuning have achieved significant success. However, existing visual instruction tuning methods only utilize image-language instruction data to align the language and image modalities, lacking a more fine-grained cross-modal alignment. In this paper, we propose Position-enhanced Visual Instruction Tuning (PVIT), which extends the functionality of MLLMs by integrating an additional region-level vision encoder. This integration promotes a more detailed comprehension of images for the MLLM. In addition, to efficiently achieve a fine-grained alignment between the vision modules and the LLM, we design multiple data generation strategies to construct an image-region-language instruction dataset. Finally, we present both quantitative experiments and qualitative analysis that demonstrate the superiority of the proposed model. Code and data will be released at https://github.com/PVIT-official/PVIT."
169,CoEdIT: Text Editing by Task-Specific Instruction Tuning,paper,2023-05-17,"We introduce CoEdIT, a state-of-the-art text editing system for writing assistance. CoEdIT takes instructions from the user specifying the attributes of the desired text, such as ""Make the sentence simpler"" or ""Write it in a more neutral style,"" and outputs the edited text. We present a large language model fine-tuned on a diverse collection of task-specific instructions for text editing (a total of 82K instructions). Our model (1) achieves state-of-the-art performance on various text editing benchmarks, (2) is competitive with publicly available largest-sized LLMs trained on instructions while being nearly 60x smaller, (3) is capable of generalizing to unseen edit instructions, and (4) exhibits abilities to generalize to composite instructions containing different combinations of edit actions. Through extensive qualitative and quantitative analysis, we show that writers prefer the edits suggested by CoEdIT relative to other state-of-the-art text editing models. Our code, data, and models are publicly available at https://github.com/vipulraheja/coedit."
170,Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning,paper,2023-09-11,"The Mixture of Experts (MoE) is a widely known neural architecture where an ensemble of specialized sub-models optimizes overall performance with a constant computational cost. However, conventional MoEs pose challenges at scale due to the need to store all experts in memory. In this paper, we push MoE to the limit. We propose extremely parameter-efficient MoE by uniquely combining MoE architecture with lightweight experts. Our MoE architecture outperforms standard parameter-efficient fine-tuning (PEFT) methods and is on par with full fine-tuning by only updating the lightweight experts -- less than 1% of an 11B parameters model. Furthermore, our method generalizes to unseen tasks as it does not depend on any prior task knowledge. Our research underscores the versatility of the mixture of experts architecture, showcasing its ability to deliver robust performance even when subjected to rigorous parameter constraints. Our code used in all the experiments is publicly available here: https://github.com/for-ai/parameter-efficient-moe."
171,GeneGPT: Augmenting Large Language Models with Domain Tools for Improved Access to Biomedical Information,paper,2023-04-19,"While large language models (LLMs) have been successfully applied to various tasks, they still face challenges with hallucinations. Augmenting LLMs with domain-specific tools such as database utilities can facilitate easier and more precise access to specialized knowledge. In this paper, we present GeneGPT, a novel method for teaching LLMs to use the Web APIs of the National Center for Biotechnology Information (NCBI) for answering genomics questions. Specifically, we prompt Codex to solve the GeneTuring tests with NCBI Web APIs by in-context learning and an augmented decoding algorithm that can detect and execute API calls. Experimental results show that GeneGPT achieves state-of-the-art performance on eight tasks in the GeneTuring benchmark with an average score of 0.83, largely surpassing retrieval-augmented LLMs such as the new Bing (0.44), biomedical LLMs such as BioMedLM (0.08) and BioGPT (0.04), as well as GPT-3 (0.16) and ChatGPT (0.12). Our further analyses suggest that: (1) API demonstrations have good cross-task generalizability and are more useful than documentations for in-context learning; (2) GeneGPT can generalize to longer chains of API calls and answer multi-hop questions in GeneHop, a novel dataset introduced in this work; (3) Different types of errors are enriched in different tasks, providing valuable insights for future improvements."
172,GeneGPT: Teaching Large Language Models to Use NCBI Web APIs,paper,2023,"In this paper, we present GeneGPT, a novel method for teaching large language models (LLMs) to use the Web Application Programming Interfaces (APIs) of the National Center for Biotechnology Information (NCBI) and answer genomics questions. Specifically, we prompt Codex (code-davinci-002) to solve the GeneTuring tests with few-shot URL requests of NCBI API calls as demonstrations for in-context learning. During inference, we stop the decoding once a call request is detected and make the API call with the generated URL. We then append the raw execution results returned by NCBI APIs to the generated texts and continue the generation until the answer is found or another API call is detected. Our preliminary results show that GeneGPT achieves state-of-the-art results on three out of four one-shot tasks and four out of five zero-shot tasks in the GeneTuring dataset. Overall, GeneGPT achieves a macro-average score of 0.76, which is much higher than retrieval-augmented LLMs such as the New Bing (0.44), biomedical LLMs such as BioMedLM (0.08) and BioGPT (0.04), as well as other LLMs such as GPT-3 (0.16) and ChatGPT (0.12)."
173,An Endogenous Inhibitor of Human Immunodeficiency Virus in Human Lymphocytes Is Overcome by the Viral Vif Protein,paper,1998-12-01,"The vif gene of human immunodeficiency virus type 1 (HIV-1) encodes a basic M_r 23,000 protein that is necessary for production of infectious virions by nonpermissive cells (human lymphocytes and macrophages) but not by permissive cells such as HeLa-CD4. It had been proposed that permissive cells may contain an unidentified factor that functions like the viral Vif protein. To test this hypothesis, we produced pseudotyped wild-type and vif-deleted HIV gpt virions (which contain the HIV-1 genome with the bacterial mycophenolic acid resistance gene gpt in place of the viral env gene) in permissive cells, and we used them to generate nonpermissive H9 leukemic T cells that express these proviruses. We then fused these H9 cells with permissive HeLa cells that express the HIV-1 envelope glycoprotein gp120-gp41, and we asked whether the heterokaryons would release infectious HIV gpt virions. The results clearly showed that the vif-deleted virions released by the heterokaryons were noninfectious whereas the wild-type virions were highly infectious. This strongly suggests that nonpermissive cells, the natural targets of HIV-1, contain a potent endogenous inhibitor of HIV-1 replication that is overcome by Vif."
174,A Sand County Almanac and Sketches Here and There,paper,2020-05-01,"First published in 1949 and praised in The New York Times Book Review as ""a trenchant book, full of vigor and bite,"" A Sand County Almanac combines some of the finest nature writing since Thoreau with an outspoken and highly ethical regard for America's relationship to the land. Written with an unparalleled understanding of the ways of nature, the book includes a section on the monthly changes of the Wisconsin countryside; another part that gathers informal pieces written by Leopold over a forty-year period as he traveled through the woodlands of Wisconsin, Iowa, Arizona, Sonora, Oregon, Manitoba, and elsewhere; and a final section in which Leopold addresses the philosophical issues involved in wildlife conservation. As the forerunner of such important books as Annie Dillard's Pilgrim at Tinker Creek, Edward Abbey's Desert Solitaire, and Robert Finch's The Primal Place, this classic work remains as relevant today as it was forty years ago."
175,Almanac: Retrieval-Augmented Language Models for Clinical Medicine,paper,2023-03-01,"Large-language models have recently demonstrated impressive zero-shot capabilities in a variety of natural language tasks such as summarization, dialogue generation, and question-answering. Despite many promising applications in clinical medicine, adoption of these models in real-world settings has been largely limited by their tendency to generate incorrect and sometimes even toxic statements. In this study, we develop Almanac, a large language model framework augmented with retrieval capabilities for medical guideline and treatment recommendations. Performance on a novel dataset of clinical scenarios (n = 130) evaluated by a panel of 5 board-certified and resident physicians demonstrates significant increases in factuality (mean of 18% at p-value < 0.05) across all specialties, with improvements in completeness and safety. Our results demonstrate the potential for large language models to be effective tools in the clinical decision-making process, while also emphasizing the importance of careful testing and deployment to mitigate their shortcomings."
176,Almanac: Knowledge-Grounded Language Models for Clinical Medicine,paper,2023,"Large-language models have recently demonstrated impressive zero-shot capabilities in a variety of natural language tasks such as summarization, dialogue generation, and question-answering. Despite many promising applications in clinical medicine (e.g. medical record documentation, treatment guideline-lookup), adoption of these models in real-world settings has been largely limited by their tendency to generate factually incorrect and sometimes even toxic statements. In this paper we explore the ability of large-language models to facilitate and streamline medical guidelines and recommendation referencing: by enabling these models to access external point-of-care tools in response to physician queries, we demonstrate significantly improved factual grounding, helpfulness, and safety in a variety of clinical scenarios."
177,Almanac: Weak Lensing power spectra and map inference on the masked sphere,paper,2022-10-24,"We present a field-based signal extraction of weak lensing from noisy observations on the curved and masked sky. We test the analysis on a simulated Euclid-like survey, using a Euclid-like mask and noise level. To make optimal use of the information available in such a galaxy survey, we present a Bayesian method for inferring the angular power spectra of the weak lensing fields, together with an inference of the noise-cleaned tomographic weak lensing shear and convergence (projected mass) maps. The latter can be used for field-level inference with the aim of extracting cosmological parameter information including non-gaussianity of cosmic fields. We jointly infer all-sky $E$-mode and $B$-mode tomographic auto- and cross-power spectra from the masked sky, and potentially parity-violating $EB$-mode power spectra, up to a maximum multipole of $\ell_{\rm max}=2048$. We use Hamiltonian Monte Carlo sampling, inferring simultaneously the power spectra and denoised maps with a total of $\sim 16.8$ million free parameters. The main output and natural outcome is the set of samples of the posterior, which does not suffer from leakage of power from $E$ to $B$ unless reduced to point estimates. However, such point estimates of the power spectra, the mean and most likely maps, and their variances and covariances, can be computed if desired."
178,Almanac - Retrieval-Augmented Language Models for Clinical Medicine.,paper,2024-01-25,"BACKGROUND
Large language models (LLMs) have recently shown impressive zero-shot capabilities, whereby they can use auxiliary data, without the availability of task-specific training examples, to complete a variety of natural language tasks, such as summarization, dialogue generation, and question answering. However, despite many promising applications of LLMs in clinical medicine, adoption of these models has been limited by their tendency to generate incorrect and sometimes even harmful statements.


METHODS
We tasked a panel of eight board-certified clinicians and two health care practitioners with evaluating Almanac, an LLM framework augmented with retrieval capabilities from curated medical resources for medical guideline and treatment recommendations. The panel compared responses from Almanac and standard LLMs (ChatGPT-4, Bing, and Bard) versus a novel data set of 314 clinical questions spanning nine medical specialties.


RESULTS
Almanac showed a significant improvement in performance compared with the standard LLMs across axes of factuality, completeness, user preference, and adversarial safety.


CONCLUSIONS
Our results show the potential for LLMs with access to domain-specific corpora to be effective in clinical decision-making. The findings also underscore the importance of carefully testing LLMs before deployment to mitigate their shortcomings. (Funded by the National Institutes of Health, National Heart, Lung, and Blood Institute.)"
179,A Sand County Almanac,paper,1949,
180,The National Cancer Institute ALMANAC: A Comprehensive Screening Resource for the Detection of Anticancer Drug Pairs with Enhanced Therapeutic Activity.,paper,2017-07-01,"To date, over 100 small-molecule oncology drugs have been approved by the FDA. Because of the inherent heterogeneity of tumors, these small molecules are often administered in combination to prevent emergence of resistant cell subpopulations. Therefore, new combination strategies to overcome drug resistance in patients with advanced cancer are needed. In this study, we performed a systematic evaluation of the therapeutic activity of over 5,000 pairs of FDA-approved cancer drugs against a panel of 60 well-characterized human tumor cell lines (NCI-60) to uncover combinations with greater than additive growth-inhibitory activity. Screening results were compiled into a database, termed the NCI-ALMANAC (A Large Matrix of Anti-Neoplastic Agent Combinations), publicly available at https://dtp.cancer.gov/ncialmanac. Subsequent in vivo experiments in mouse xenograft models of human cancer confirmed combinations with greater than single-agent efficacy. Concomitant detection of mechanistic biomarkers for these combinations in vivo supported the initiation of two phase I clinical trials at the NCI to evaluate clofarabine with bortezomib and nilotinib with paclitaxel in patients with advanced cancer. Consequently, the hypothesis-generating NCI-ALMANAC web-based resource has demonstrated value in identifying promising combinations of approved drugs with potent anticancer activity for further mechanistic study and translation to clinical trials. Cancer Res; 77(13); 3564-76. ©2017 AACR."
181,Almanac: Diffraction & Reading Diffractively,paper,2021-02-18,
182,Predicting Synergism of Cancer Drug Combinations Using NCI-ALMANAC Data,paper,2018-12-21,"Background: Drug combinations are of great interest for cancer treatment. Unfortunately, the discovery of synergistic combinations by purely experimental means is only feasible on small sets of drugs. In silico modeling methods can substantially widen this search by providing tools able to predict which of all possible combinations in a large compound library are synergistic. Here we investigate to what extent drug combination synergy can be predicted by exploiting the largest available dataset to date (NCI-ALMANAC, with over 290,000 synergy determinations). Methods: Each cell line is modeled using primarily two machine learning techniques, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), on the datasets provided by NCI-ALMANAC. This large-scale predictive modeling study comprises more than 5000 pair-wise drug combinations, 60 cell lines, 4 types of models and 5 types of chemical features. The application of a powerful, yet uncommonly used, RF-specific technique for reliability prediction is also investigated. Results: The evaluation of these models shows that it is possible to predict the synergy of unseen drug combinations with high accuracy (Pearson correlations between 0.43 and 0.86 depending on the considered cell line, with XGBoost providing slightly better predictions than RF). We have also found that restricting to the most reliable synergy predictions results in at least a two-fold error decrease with respect to employing the best learning algorithm without any reliability estimation. Alkylating agents, tyrosine kinase inhibitors and topoisomerase inhibitors are the drugs whose synergy with other partner drugs is better predicted by the models. Conclusions: Despite its leading size, NCI-ALMANAC comprises an extremely small part of all conceivable combinations. Given their accuracy and reliability estimation, the developed models should drastically reduce the number of required in vitro tests by predicting in silico which of the considered combinations are likely to be synergistic."
183,Egy avar kori kard mint információforrás és restaurált tárgy,paper,2023-03-06,"The study discusses the manufacturing-technique and materials-science information uncovered during the restoration of the 7th-century, early Avar period sword from Dör. The microscopic observations were in several cases supplemented by sampling and material analyses, and the disassembly and restoration process was preceded by X-ray imaging. As a result of the various examinations, manufacturing-technique information came to light that can enrich our knowledge of Avar material culture."
184,Komentarze w zagranicznych środkach przekazu po zapowiedzi beatyfikacji kard. Stefana Wyszyńskiego,paper,2023-04-14,"Although the number of new items in foreign media three months after the announcement of the beatification of Cardinal Stefan Wyszyński is not yet significant, it is notable that he is now seen there not only as the leader of the Catholic Church in communist Poland but also as a spiritual man of deep social thought. This is evidenced by the content of publications (1), of comments, and of comments on those comments (2). Surprising are the familiarity with the person of the Primate of the Millennium, to whom the Madrid journalist José Luis Restán Martínez added the title “the Great”; the fascination with his figure expressed by Prof. Bernardino Montejano of Argentina; the emotions accompanying the announced event, expressed on web portals, especially among the Polish diaspora, and pointing to the role of ancestors in passing on knowledge and wisdom; and the way Archbishop José H. Gomez of Los Angeles draws on the thought of the Servant of God in his pastoral ministry. For the University of which he was a student, doctor, and Grand Chancellor, too, the beatification will be a great task (3)."
185,Wkład Prymasa Polski Stefana kard. Wyszyńskiego i papieża Jana Pawła II w normalizację stosunków między Państwem a Kościołem,paper,2023-04-14,"The aim of this study is to show the contribution of the Primate of Poland, Cardinal Stefan Wyszyński, and Pope John Paul II to the process of normalizing relations between the State and the Catholic Church in Poland after World War II. The discussion covers three issues. The first concerns the changes introduced by the communist authorities. Of key importance was the resolution of the Provisional Government of National Unity declaring that “the Polish Concordat of 1925 has ceased to be in force”. This meant a shift from regulating State-Church relations through a bilateral international agreement to regulating them through acts issued unilaterally by the state authorities, which drastically restricted the Church's freedom to carry out its mission. The second issue concerns the principles and methods that Primate Stefan Wyszyński and Pope John Paul II applied, and the demands they put forward, in order to normalize diplomatic relations between Poland and the Holy See and to regulate State-Church relations in Poland through a bilateral international agreement. The third issue concerns the successive stages in realizing these demands, from the breaking of the 1925 Concordat to the conclusion of the new Concordat in 1993-1998."
186,„Zwycięstwo Maryi”. Próba zdefiniowania znaczenia „proroctwa Augusta kard. Hlonda o zwycięstwie Maryi w kontekście posługi apostolskiej prymasa tysiąclecia Stefana kard. Wyszyńskiego,paper,2023-03-09,"In 2006 the Polish Catholic Church commemorated two historical events of enormous importance for the life of the faithful in postwar Poland, which had been liberated from Nazism but was then under the rule of communists irrevocably determined to impose a Marxist vision of society there and thereby destroy the Christian presence. These were the Act of Entrustment of the Polish Nation to the Immaculate Heart of Mary (8 September 1946), made by the Primate, the Servant of God Cardinal August Hlond, and the perpetual vows to the Black Madonna of Jasna Góra (26 August 1956), made by the Servant of God Cardinal Stefan Wyszyński. These two anniversaries are an occasion to examine the significance of the vision of the Victory of Mary over the atheistic communist political system, which Cardinal Hlond witnessed before his death (22 October 1948). To understand it better, we examine the kind of Marian piety that Hlond had; the socio-political circumstances in which this vision arose; and its extraordinary influence on the pastoral activity of Cardinal Wyszyński and the Polish hierarchy. Wyszyński recognized the extraordinary creative power of the vision of Mary Victorious in planning and carrying out a highly effective pastoral program of moral renewal and evangelization of Polish society, especially with a view to preparing the Polish Nation for the Millennium of its Baptism (1966) through the Great Novena to the Madonna. This Marian style of pastoral activity, which despite criticism was entirely centered on Christ, contributed decisively, even in the opinion of non-Catholics, to the defense of the freedom of Polish citizens, which met with a positive response abroad. The article also addresses how this Marian vision was perceived in the Petrine ministry of John Paul II, who often spoke of Cardinal Hlond's vision of Mary's victory. Although it is not stated explicitly, one can discern a thread connecting Hlond, Wyszyński, and John Paul II: the distinctly Marian dimension of their pastoral activity."
187,"Laudacja z okazji wręczenia kard. Zenonowi Grocholewskiemu Nagrody imienia ks. Idziego Radziszewskiego. Lublin, 27 maja 2013 roku",paper,2023-08-16,"On 27 May 2013 a ceremony was held to present the Rev. Idzi Radziszewski Award to Cardinal Zenon Grocholewski for achievements in the spirit of Christian humanism, awarded by the Scientific Society of the John Paul II Catholic University of Lublin. In his laudation, Prof. Józef Krukowski presented Cardinal Grocholewski's biography; his organizational work in the Roman Curia, especially as Prefect of the Supreme Tribunal of the Apostolic Signatura and Prefect of the Congregation for Catholic Education; and his scholarly output in canon law, philosophy of law, and the role of universities in the contemporary world."
188,Starania o powrót Wydziału Teologicznego na Uniwersytet Jagielloński w raportach członków Wydziału do prymasa Polski ks. kard. Stefana Wyszyńskiego (1956–1958),paper,2023-07-25,"By a decision of the Stalinist Council of Ministers of the Polish People's Republic on 11 August 1954, after more than 550 years, the Faculty of Theology was detached from the Jagiellonian University and incorporated into the Academy of Catholic Theology in Warsaw. This meant that Kraków lost its academic rights in theology. After the political upheaval of October 1956, the political authorities distanced themselves from the policy of recent years and from its methods and decisions. In view of this, the professors of the Jagiellonian University Faculty of Theology working at the Academy of Catholic Theology in Warsaw sought the restoration of the Faculty of Theology at the Jagiellonian University. They were supported by the Primate of Poland, Cardinal Stefan Wyszyński.
The author discusses the topic on the basis of more than a dozen reports sent by Kraków priest-professors to the Primate of Poland, Cardinal S. Wyszyński, in 1956–1958, which record their efforts to have the Faculty of Theology restored at the Jagiellonian University. As it turned out, the period of the so-called political thaw ended quickly, and the leadership of the ruling communist party did not for a moment entertain the idea of the Faculty of Theology returning to the Jagiellonian University. The reports of the members of the former Faculty of Theology of the Jagiellonian University are a testimony to their aspirations and their fully determined work. The author has sought to show this by presenting the successive reports chronologically as stages in a struggle with a system in which a Faculty of Theology could not exist at any state university."
189,"Az alsó egyenes szemizom kard általi súlyos sérülése, klinikai képe, műtéti kezelése és posztoperatív eredményei",paper,2023,"Objective: To describe the presentation and management of an isolated inferior rectus muscle injury through a case report. Case report: A young male patient noticed double vision immediately after a cut injury, without loss of vision. During surgery performed the day after the injury, a partial tear of the inferior rectus muscle was observed. After surgical reconstruction, the eyes were aligned in parallel and the patient became symptom-free. Conclusions: Traumatic extraocular muscle injuries can be highly varied. Their management is often a challenging task. Isolated extraocular muscle injury is rare. Performing surgery as soon as possible is important for a favorable prognosis."
190,Cztery fale ewangelizacji. Refleksja o. Raniera kard. Cantalamessy na temat głoszenia kerygmatu w historii Kościoła,paper,2023-03-31,"Father Raniero Cardinal Cantalamessa presents four periods of evangelization that have taken place in the history of the Church. He refers to them as “waves of evangelization”, indicating that these extraordinary periods of evangelization are reminiscences of the beginnings of the Church. Their appearance is related to the emergence of a new group of recipients unfamiliar with the message of the Gospel. Cantalamessa’s analysis indicates that the actual content conveyed to them is the kerygma. Appropriate pastoral reflection is also required by the choice of a keryx who is to effectively reach the recipients with the message of salvation. The four “waves of evangelization” are, according to the Italian theologian, a return to kerygmatic evangelization."
191,Pobyt i nauczanie prymasa Polski kard. Augusta Hlonda na terenie późniejszej diecezji koszalińsko-kołobrzeskiej,paper,2023-09-15,"The Primate of Poland, Cardinal August Hlond, has gone down in history as an outstanding figure. It fell to him to organize church structures in postwar Poland on the strength of special papal privileges of a kind never before granted to anyone in the Church. He left his mark on the history of the later Diocese of Koszalin-Kołobrzeg not only through his decisions but also by visiting its territory. This article describes those privileges, the course of the Primate's meetings with the “German” administrators of the existing church structures, information about the new heads of the apostolic administrations established in place of the former units, and the course of the Cardinal's visits to Pokrzywnica, Kołobrzeg, and Koszalin."
192,Laudacja z okazji wręczenia kard. Zenonowi Grocholewskiemu Nagrody imienia ks. Idziego Radziszewskiego wygłoszona dnia 27 maja 2013 r.,paper,2023-08-28,"On 27 May 2013, a ceremony took place to present the Rev. Idzi Radziszewski Award to Card. Zenon Grocholewski for achievements in the spirit of Christian humanism, awarded by the Scientific Society of the John Paul II Catholic University of Lublin. In the laudation, Professor Józef Krukowski presented a biography of Cardinal Grocholewski; his organizational activity in the Roman Curia, especially as Prefect of the Supreme Tribunal of the Apostolic Signatura and Prefect of the Congregation for Catholic Education; and his scholarly achievements in canon law, philosophy of law, and the role of universities in the modern world."
193,Visual Instruction Tuning,paper,2023-04-17,"Instruction tuning large language models (LLMs) using machine-generated instruction-following data has improved zero-shot capabilities on new tasks, but the idea is less explored in the multimodal field. In this paper, we present the first attempt to use language-only GPT-4 to generate multimodal language-image instruction-following data. By instruction tuning on such generated data, we introduce LLaVA: Large Language and Vision Assistant, an end-to-end trained large multimodal model that connects a vision encoder and LLM for general-purpose visual and language understanding. Our early experiments show that LLaVA demonstrates impressive multimodal chat abilities, sometimes exhibiting the behaviors of multimodal GPT-4 on unseen images/instructions, and yields an 85.1% relative score compared with GPT-4 on a synthetic multimodal instruction-following dataset. When fine-tuned on Science QA, the synergy of LLaVA and GPT-4 achieves a new state-of-the-art accuracy of 92.53%. We make GPT-4 generated visual instruction tuning data, our model and code base publicly available."
194,InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning,paper,2023-05-11,"Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence. However, building general-purpose vision-language models is challenging due to the rich input distributions and task diversity resulting from the additional visual input. Although vision-language pretraining has been widely studied, vision-language instruction tuning remains under-explored. In this paper, we conduct a systematic and comprehensive study on vision-language instruction tuning based on the pretrained BLIP-2 models. We gather 26 publicly available datasets, covering a wide variety of tasks and capabilities, and transform them into instruction tuning format. Additionally, we introduce an instruction-aware Query Transformer, which extracts informative features tailored to the given instruction. Trained on 13 held-in datasets, InstructBLIP attains state-of-the-art zero-shot performance across all 13 held-out datasets, substantially outperforming BLIP-2 and larger Flamingo models. Our models also lead to state-of-the-art performance when finetuned on individual downstream tasks (e.g., 90.7% accuracy on ScienceQA questions with image contexts). Furthermore, we qualitatively demonstrate the advantages of InstructBLIP over concurrent multimodal models. All InstructBLIP models are open-sourced at https://github.com/salesforce/LAVIS/tree/main/projects/instructblip."
195,Improved Baselines with Visual Instruction Tuning,paper,2023-10-05,"Large multimodal models (LMM) have recently shown encouraging progress with visual instruction tuning. In this note, we show that the fully-connected vision-language cross-modal connector in LLaVA is surprisingly powerful and data-efficient. With simple modifications to LLaVA, namely, using CLIP-ViT-L-336px with an MLP projection and adding academic-task-oriented VQA data with simple response formatting prompts, we establish stronger baselines that achieve state-of-the-art across 11 benchmarks. Our final 13B checkpoint uses merely 1.2M publicly available data, and finishes full training in ~1 day on a single 8-A100 node. We hope this can make state-of-the-art LMM research more accessible. Code and model will be publicly available."
196,Otter: A Multi-Modal Model with In-Context Instruction Tuning,paper,2023-05-05,"Large language models (LLMs) have demonstrated significant universal capabilities as few/zero-shot learners in various tasks due to their pre-training on vast amounts of text data, as exemplified by GPT-3, which boosted to InstructGPT and ChatGPT, effectively following natural language instructions to accomplish real-world tasks. In this paper, we propose to introduce instruction tuning into multi-modal models, motivated by the Flamingo model's upstream interleaved format pretraining dataset. We adopt a similar approach to construct our MultI-Modal In-Context Instruction Tuning (MIMIC-IT) dataset. We then introduce Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following ability and in-context learning. We also optimize OpenFlamingo's implementation for researchers, democratizing the required training resources from 1$\times$ A100 GPU to 4$\times$ RTX-3090 GPUs, and integrate both OpenFlamingo and Otter into Huggingface Transformers for more researchers to incorporate the models into their customized training and inference pipelines."
197,The Flan Collection: Designing Data and Methods for Effective Instruction Tuning,paper,2023-01-31,"We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022). Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions which enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning, and in particular, training with mixed prompt settings (zero-shot, few-shot, and chain-of-thought) actually yields stronger (2%+) performance in all settings. In further experiments, we show Flan-T5 requires less finetuning to converge higher and faster than T5 on single downstream tasks, motivating instruction-tuned models as more computationally-efficient starting checkpoints for new tasks. Finally, to accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available at https://github.com/google-research/FLAN/tree/main/flan/v2."
198,Instruction Tuning with GPT-4,paper,2023-04-06,"Prior work has shown that finetuning large language models (LLMs) using machine-generated instruction-following data enables such models to achieve remarkable zero-shot capabilities on new tasks, and no human-written instructions are needed. In this paper, we present the first attempt to use GPT-4 to generate instruction-following data for LLM finetuning. Our early experiments on instruction-tuned LLaMA models show that the 52K English and Chinese instruction-following data generated by GPT-4 leads to superior zero-shot performance on new tasks to the instruction-following data generated by previous state-of-the-art models. We also collect feedback and comparison data from GPT-4 to enable a comprehensive evaluation and reward model training. We make our data generated using GPT-4 as well as our codebase publicly available."
199,MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning,paper,2023-09-11,"We introduce MAmmoTH, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving. The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset. MathInstruct is compiled from 13 math datasets with intermediate rationales, six of which have rationales newly curated by us. It presents a unique hybrid of chain-of-thought (CoT) and program-of-thought (PoT) rationales, and also ensures extensive coverage of diverse fields in math. The hybrid of CoT and PoT not only unleashes the potential of tool use but also allows different thought processes for different math problems. As a result, the MAmmoTH series substantially outperform existing open-source models on nine mathematical reasoning datasets across all scales with an average accuracy gain between 16% and 32%. Remarkably, our MAmmoTH-7B model reaches 33% on MATH (a competition-level dataset), which exceeds the best open-source 7B model (WizardMath) by 23%, and the MAmmoTH-34B model achieves 44% accuracy on MATH, even surpassing GPT-4's CoT result. Our work underscores the importance of diverse problem coverage and the use of hybrid rationales in developing superior math generalist models."
200,Aligning Large Multi-Modal Model with Robust Instruction Tuning,paper,2023,"Despite the promising progress in multi-modal tasks, current large multi-modal models (LMM) are prone to hallucinating inconsistent descriptions with respect to the associated image and human instructions. This paper addresses this issue by introducing the first large and diverse visual instruction tuning dataset, named Large-scale Robust Visual (LRV)-Instruction. Our dataset consists of 120k visual instructions generated by GPT4, covering 16 vision-and-language tasks with open-ended instructions and answers. Unlike existing studies that primarily focus on positive instruction samples, we design LRV-Instruction to include both positive and negative instructions for more robust visual instruction tuning. Our negative instructions are designed at two semantic levels: (i) Nonexistent Element Manipulation and (ii) Existent Element Manipulation. To efficiently measure the hallucination generated by LMMs, we propose GPT4-Assisted Visual Instruction Evaluation (GAVIE), a novel approach to evaluate visual instruction tuning without the need for human-annotated groundtruth answers and can adapt to diverse instruction formats. We conduct comprehensive experiments to investigate the hallucination of LMMs. Our results demonstrate that existing LMMs exhibit significant hallucination when presented with our negative instructions, particularly with Existent Element Manipulation instructions. Moreover, by finetuning MiniGPT4 on LRV-Instruction, we successfully mitigate hallucination while improving performance on public datasets using less training data compared to state-of-the-art methods. Additionally, we observed that a balanced ratio of positive and negative instances in the training data leads to a more robust model."
201,MIMIC-IT: Multi-Modal In-Context Instruction Tuning,paper,2023-06-08,"High-quality instructions and responses are essential for the zero-shot performance of large language models on interactive natural language tasks. For interactive vision-language tasks involving intricate visual scenes, a large quantity of diverse and creative instruction-response pairs should be imperative to tune vision-language models (VLMs). Nevertheless, the current availability of vision-language instruction-response pairs in terms of quantity, diversity, and creativity remains limited, posing challenges to the generalization of interactive VLMs. Here we present MultI-Modal In-Context Instruction Tuning (MIMIC-IT), a dataset comprising 2.8 million multimodal instruction-response pairs, with 2.2 million unique instructions derived from images and videos. Each pair is accompanied by multi-modal in-context information, forming conversational contexts aimed at empowering VLMs in perception, reasoning, and planning. The instruction-response collection process, dubbed as Syphus, is scaled using an automatic annotation pipeline that combines human expertise with GPT's capabilities. Using the MIMIC-IT dataset, we train a large VLM named Otter. Based on extensive evaluations conducted on vision-language benchmarks, it has been observed that Otter demonstrates remarkable proficiency in multi-modal perception, reasoning, and in-context learning. Human evaluation reveals it effectively aligns with the user's intentions. We release the MIMIC-IT dataset, instruction-response collection pipeline, benchmarks, and the Otter model."
202,PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization,paper,2023-06-08,"Instruction tuning large language models (LLMs) remains a challenging task, owing to the complexity of hyperparameter selection and the difficulty involved in evaluating the tuned models. To determine the optimal hyperparameters, an automatic, robust, and reliable evaluation benchmark is essential. However, establishing such a benchmark is not a trivial task due to the challenges associated with evaluation accuracy and privacy protection. In response to these challenges, we introduce a judge large language model, named PandaLM, which is trained to distinguish the superior model given several LLMs. PandaLM's focus extends beyond just the objective correctness of responses, which is the main focus of traditional evaluation datasets. It addresses vital subjective factors such as relative conciseness, clarity, adherence to instructions, comprehensiveness, and formality. To ensure the reliability of PandaLM, we collect a diverse human-annotated test dataset, where all contexts are generated by humans and labels are aligned with human preferences. Our results indicate that PandaLM-7B achieves 93.75% of GPT-3.5's evaluation ability and 88.28% of GPT-4's in terms of F1-score on our test dataset. PandaLM enables the evaluation of LLM to be fairer but with less cost, evidenced by significant improvements achieved by models tuned through PandaLM compared to their counterparts trained with default Alpaca's hyperparameters. In addition, PandaLM does not depend on API-based evaluations, thus avoiding potential data leakage. All resources of PandaLM are released at https://github.com/WeOpenML/PandaLM."
203,DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation,paper,2022-08-25,"Large text-to-image models achieved a remarkable leap in the evolution of AI, enabling high-quality and diverse synthesis of images from a given text prompt. However, these models lack the ability to mimic the appearance of subjects in a given reference set and synthesize novel renditions of them in different contexts. In this work, we present a new approach for “personalization” of text-to-image diffusion models. Given as input just a few images of a subject, we fine-tune a pretrained text-to-image model such that it learns to bind a unique identifier with that specific subject. Once the subject is embedded in the output domain of the model, the unique identifier can be used to synthesize novel photorealistic images of the subject contextualized in different scenes. By leveraging the semantic prior embedded in the model with a new autogenous class-specific prior preservation loss, our technique enables synthesizing the subject in diverse scenes, poses, views and lighting conditions that do not appear in the reference images. We apply our technique to several previously-unassailable tasks, including subject recontextualization, text-guided view synthesis, and artistic rendering, all while preserving the subject's key features. We also provide a new dataset and evaluation protocol for this new task of subject-driven generation. Project page: https://dreambooth.github.io/"
204,Universal Language Model Fine-tuning for Text Classification,paper,2018-01-18,"Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model. Our method significantly outperforms the state-of-the-art on six text classification tasks, reducing the error by 18-24% on the majority of datasets. Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100 times more data. We open-source our pretrained models and code."
205,LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention,paper,2023-03-28,"We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter only introduces 1.2M learnable parameters upon the frozen LLaMA 7B model, and costs less than one hour for fine-tuning on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts, and prepend them to the word tokens at higher transformer layers. Then, a zero-initialized attention mechanism with zero gating is proposed, which adaptively injects the new instructional cues into LLaMA, while effectively preserving its pre-trained knowledge. With our efficient training, LLaMA-Adapter can generate high-quality responses, comparable to Alpaca with fully fine-tuned 7B parameters. Besides language commands, our approach can be simply extended to multi-modal instructions for learning image-conditioned LLaMA model, which achieves superior reasoning performance on ScienceQA and COCO Caption benchmarks. Furthermore, we also evaluate the zero-initialized attention mechanism for fine-tuning other pre-trained models (ViT, RoBERTa) on traditional vision and language tasks, demonstrating the superior generalization capacity of our approach. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter."
206,"Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!",paper,2023-10-05,"Optimizing large language models (LLMs) for downstream use cases often involves the customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama models and OpenAI's APIs for fine-tuning GPT-3.5 Turbo on custom datasets also encourage this practice. But what are the safety costs associated with such custom fine-tuning? We note that while existing safety alignment infrastructures can restrict harmful behaviors of LLMs at inference time, they do not cover safety risks when fine-tuning privileges are extended to end-users. Our red teaming studies find that the safety alignment of LLMs can be compromised by fine-tuning with only a few adversarially designed training examples. For instance, we jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 such examples at a cost of less than $0.20 via OpenAI's APIs, making the model responsive to nearly any harmful instructions. Disconcertingly, our research also reveals that, even without malicious intent, simply fine-tuning with benign and commonly used datasets can also inadvertently degrade the safety alignment of LLMs, though to a lesser extent. These findings suggest that fine-tuning aligned LLMs introduces new safety risks that current safety infrastructures fall short of addressing -- even if a model's initial safety alignment is impeccable, it is not necessarily maintained after custom fine-tuning. We outline and critically analyze potential mitigations and advocate for further research efforts toward reinforcing safety protocols for the custom fine-tuning of aligned LLMs."
207,Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning,paper,2023,", question answering and natural language generation tasks. Results show that AdaLoRA outperforms existing approaches."
208,Llama 2: Open Foundation and Fine-Tuned Chat Models,paper,2023-07-18,"In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs."
209,SVDiff: Compact Parameter Space for Diffusion Fine-Tuning,paper,2023-03-20,"Diffusion models have achieved remarkable success in text-to-image generation, enabling the creation of high-quality images from text prompts or other modalities. However, existing methods for customizing these models are limited by handling multiple personalized subjects and the risk of overfitting. Moreover, their large number of parameters is inefficient for model storage. In this paper, we propose a novel approach to address these limitations in existing text-to-image diffusion models for personalization. Our method involves fine-tuning the singular values of the weight matrices, leading to a compact and efficient parameter space that reduces the risk of overfitting and language-drifting. We also propose a Cut-Mix-Unmix data-augmentation technique to enhance the quality of multi-subject image generation and a simple text-based image editing framework. Our proposed SVDiff method has a significantly smaller model size compared to existing methods (≈2,200 times fewer parameters compared with vanilla DreamBooth), making it more practical for real-world applications."
210,An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning,paper,2023-08-17,"Catastrophic forgetting (CF) is a phenomenon that occurs in machine learning when a model forgets previously learned information while acquiring new knowledge. As large language models (LLMs) have demonstrated remarkable performance, it is intriguing to investigate whether CF exists during the continual instruction tuning of LLMs. This study empirically evaluates the forgetting phenomenon in LLMs' knowledge during continual instruction tuning from the perspectives of domain knowledge, reasoning, and reading comprehension. The experiments reveal that catastrophic forgetting is generally observed in LLMs ranging from 1b to 7b parameters. Moreover, as the model scale increases, the severity of forgetting intensifies. Comparing the decoder-only model BLOOMZ with the encoder-decoder model mT0, BLOOMZ exhibits less forgetting and retains more knowledge. Interestingly, we also observe that LLMs can mitigate language biases, such as gender bias, during continual fine-tuning. Furthermore, our findings indicate that ALPACA maintains more knowledge and capacity compared to LLAMA during continual fine-tuning, suggesting that general instruction tuning can help alleviate the forgetting phenomenon in LLMs during subsequent fine-tuning processes."
211,LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models,paper,2023-04-04,"The success of large language models (LLMs), like GPT-4 and ChatGPT, has led to the development of numerous cost-effective and accessible alternatives that are created by finetuning open-access LLMs with task-specific data (e.g., ChatDoctor) or instruction data (e.g., Alpaca). Among the various fine-tuning methods, adapter-based parameter-efficient fine-tuning (PEFT) is undoubtedly one of the most attractive topics, as it only requires fine-tuning a few external parameters instead of the entire LLMs while achieving comparable or even better performance. To enable further research on PEFT methods of LLMs, this paper presents LLM-Adapters, an easy-to-use framework that integrates various adapters into LLMs and can execute these adapter-based PEFT methods of LLMs for different tasks. The framework includes state-of-the-art open-access LLMs such as LLaMA, BLOOM, and GPT-J, as well as widely used adapters such as Series adapters, Parallel adapter, Prompt-based learning and Reparametrization-based methods. Moreover, we conduct extensive empirical studies on the impact of adapter types, placement locations, and hyper-parameters to identify the best design for each adapter-based method. We evaluate the effectiveness of the adapters on fourteen datasets from two different reasoning tasks, Arithmetic Reasoning and Commonsense Reasoning. The results demonstrate that using adapter-based PEFT in smaller-scale LLMs (7B) with few extra trainable parameters yields comparable, and in some cases superior, performance to powerful LLMs (175B) in zero-shot inference on both reasoning tasks."
212,LiST: Lite Prompted Self-training Makes Parameter-efficient Few-shot Learners,paper,2021-10-12,"We present a new method, LiST (short for Lite Prompted Self-Training), for parameter-efficient fine-tuning of large pre-trained language models (PLMs) for few-shot learning. LiST improves over recent methods that adopt prompt-based fine-tuning (FN) using two key techniques. The first is the use of self-training to leverage large amounts of unlabeled data for prompt-based FN in few-shot settings. We use self-training in conjunction with meta-learning for re-weighting noisy pseudo-prompt labels. Self-training is expensive as it requires updating all the model parameters repetitively. Therefore, we use a second technique for light-weight fine-tuning where we introduce a small number of task-specific parameters that are fine-tuned during self-training while keeping the PLM encoder frozen. Our experiments show that LiST can effectively leverage unlabeled data to improve the model performance for few-shot learning. Additionally, the fine-tuning is efficient as it only updates a small percentage of parameters and the overall model footprint is reduced since several tasks can share a common PLM encoder as backbone. A comprehensive study on six NLU tasks demonstrates that LiST improves by 35% over classic fine-tuning and 6% over prompt-based FN with 96% reduction in number of trainable parameters when fine-tuned with no more than 30 labeled examples from each task. With only 14M tunable parameters, LiST outperforms GPT-3 in-context learning by 33% on few-shot NLU tasks."
213,GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation,paper,2023-06-07,"Diffusion models have attracted significant attention due to the remarkable ability to create content and generate data for tasks like image classification. However, the usage of diffusion models to generate high-quality object detection data remains an underexplored area, where not only image-level perceptual quality but also geometric conditions such as bounding boxes and camera views are essential. Previous studies have utilized either copy-paste synthesis or layout-to-image (L2I) generation with specifically designed modules to encode the semantic layouts. In this paper, we propose GeoDiffusion, a simple framework that can flexibly translate various geometric conditions into text prompts and empower pre-trained text-to-image (T2I) diffusion models for high-quality detection data generation. Unlike previous L2I methods, our GeoDiffusion is able to encode not only the bounding boxes but also extra geometric conditions such as camera views in self-driving scenes. Extensive experiments demonstrate GeoDiffusion outperforms previous L2I methods while training 4x faster. To the best of our knowledge, this is the first work to adopt diffusion models for layout-to-image generation with geometric conditions and demonstrate that L2I-generated images can be beneficial for improving the performance of object detectors."
214,Self-regulating Prompts: Foundational Model Adaptation without Forgetting,paper,2023-07-13,"Prompt learning has emerged as an efficient alternative for fine-tuning foundational models, such as CLIP, for various downstream tasks. Conventionally trained using the task-specific objective, i.e., cross-entropy loss, prompts tend to overfit downstream data distributions and find it challenging to capture task-agnostic general features from the frozen CLIP. This leads to the loss of the model’s original generalization capability. To address this issue, our work introduces a self-regularization framework for prompting called PromptSRC (Prompting with Self-regulating Constraints). PromptSRC guides the prompts to optimize for both task-specific and task-agnostic general representations using a three-pronged approach by: (a) regulating prompted representations via mutual agreement maximization with the frozen model, (b) regulating with self-ensemble of prompts over the training trajectory to encode their complementary strengths, and (c) regulating with textual diversity to mitigate sample diversity imbalance with the visual branch. To the best of our knowledge, this is the first regularization framework for prompt learning that avoids overfitting by jointly attending to pre-trained model features, the training trajectory during prompting, and the textual diversity. PromptSRC explicitly steers the prompts to learn a representation space that maximizes performance on downstream tasks without compromising CLIP generalization. We perform extensive experiments on 4 benchmarks where PromptSRC overall performs favorably well compared to the existing methods. Our code and pre-trained models are publicly available at: https://github.com/muzairkhattak/PromptSRC."
215,"‘It’s not a diet, it’s a lifestyle’: a longitudinal, data-prompted interview study of weight loss maintenance",paper,2019-03-23,"Objective: To advance understanding of the individual and environmental factors underpinning weight loss maintenance. Design: Semi-structured, data-prompted interviews were conducted with twelve overweight adult participants (three men, nine women) who had lost over 5% of their body weight in the year before baseline. Participants gathered daily data through wireless scales, activity monitors (Fitbit™), ecological momentary assessment and experience sampling (taking photographs, writing notes). They were interviewed at 3- and 6-months post baseline. Interview stimuli included personal data of weight and activity graphs, correlations of psychological factors, and self-generated notes and photographs. Interview data were analysed using the Framework Method, applying pre-specified maintenance-relevant theoretical themes. Results: The theoretical Framework provided a good fit for the narratives, with five main themes underpinning successful weight loss maintenance: sustained motivation, effective self-regulation, plentiful resources, habit formation and a supportive environment. Additionally, participants reported an identity shift from being a dieter to accepting a new healthy lifestyle. Goal prioritising and allowing for occasional controlled lapses enhanced weight loss maintenance. Conclusions: This study successfully used the novel method of data-prompted interviews to explore weight loss maintenance experiences with new explanations emerging from the data. Future research should further develop behaviour change maintenance theory and data-prompted interview method."
216,Alfred: A System for Prompted Weak Supervision,paper,2023-05-29,"Alfred is the first system for programmatic weak supervision (PWS) that creates training data for machine learning by prompting. In contrast to typical PWS systems where weak supervision sources are programs coded by experts, Alfred enables users to encode their subject matter expertise via natural language prompts for language and vision-language models. Alfred provides a simple Python interface for the key steps of this emerging paradigm, with a high-throughput backend for large-scale data labeling. Users can quickly create, evaluate, and refine their prompt-based weak supervision sources; map the results to weak labels; and resolve their disagreements with a label model. Alfred enables a seamless local development experience backed by models served from self-managed computing clusters. It automatically optimizes the execution of prompts with optimized batching mechanisms. We find that this optimization improves query throughput by 2.9x versus a naive approach. We present two example use cases demonstrating Alfred on YouTube comment spam detection and pet breeds classification. Alfred is open source, available at https://github.com/BatsResearch/alfred."
217,On the evolution of the human self: A data-driven review and reconsideration,paper,2019-01-02,"We revisit the thesis we first offered in 1997, namely, that the human capacity called “the self” is the product of evolutionary pressures. A review of the literature accumulated in the intervening 20 years prompted three changes to the original thesis. First, we expanded our 1997 conception of the self. We argue that the self consists of a multiplicity of cognitions, each of which may reflect the action of a different neural system. Second, we revised the timeline for the evolution of the human self. At least some components of the human self were present in hominids earlier than the 100,000 years-old date that we speculated served as the oldest-age boundary for the emergence of the self. Third, we supplemented the evidentiary basis by relying on advances in brain structure, brain function, and the genetic underpinnings of the brain. In comparison to the state of knowledge in 1997, there is more reason to assert in 2017 that humans have the capacity to experience a self because this trait was selected via evolution."
218,Small Language Model Can Self-correct,paper,2024-01-14,"Generative Language Models (LMs) such as ChatGPT have exhibited remarkable performance across various downstream tasks. Nevertheless, one of their most prominent drawbacks is generating inaccurate or false information with a confident tone. Previous studies have devised sophisticated pipelines and prompts to induce large LMs to exhibit the capability for self-correction. However, large LMs are explicitly prompted to verify and modify their answers separately rather than completing all steps spontaneously like humans. Moreover, these complex prompts are extremely challenging for small LMs to follow. In this paper, we introduce the Intrinsic Self-Correction (ISC) in generative language models, aiming to correct the initial output of LMs in a self-triggered manner, even for those small LMs with 6 billion parameters. Specifically, we devise a pipeline for constructing self-correction data and propose Partial Answer Masking (PAM), aiming to endow the model with the capability for intrinsic self-correction through fine-tuning. We conduct experiments using LMs with parameter sizes ranging from 6 billion to 13 billion in two tasks, including commonsense reasoning and factual knowledge reasoning. Our experiments demonstrate that the outputs generated using ISC outperform those generated without self-correction. We believe that the output quality of even small LMs can be further improved by empowering them with the ability to intrinsically self-correct."
219,Domain-Adversarial Training of Self-Attention Based Networks for Land Cover Classification using Multi-temporal Sentinel-2 Satellite Imagery,paper,2021-04-01,"The increasing availability of large-scale remote sensing labeled data has prompted researchers to develop increasingly precise and accurate data-driven models for land cover and crop classification (LC&CC). Moreover, with the introduction of self-attention and introspection mechanisms, deep learning approaches have shown promising results in processing long temporal sequences in the multi-spectral domain with a contained computational request. Nevertheless, most practical applications cannot rely on labeled data, and in the field, surveys are a time-consuming solution that pose strict limitations to the number of collected samples. Moreover, atmospheric conditions and specific geographical region characteristics constitute a relevant domain gap that does not allow direct applicability of a trained model on the available dataset to the area of interest. In this paper, we investigate adversarial training of deep neural networks to bridge the domain discrepancy between distinct geographical zones. In particular, we perform a thorough analysis of domain adaptation applied to challenging multi-spectral, multi-temporal data, accurately highlighting the advantages of adapting state-of-the-art self-attention-based models for LC&CC to different target zones where labeled data are not available. Extensive experimentation demonstrated significant performance and generalization gain in applying domain-adversarial training to source and target regions with marked dissimilarities between the distribution of extracted features."
220,"Changing Patterns of Substance Use During the Coronavirus Pandemic: Self-Reported Use of Tobacco, Alcohol, Cannabis, and Other Drugs",paper,2021-05-26,"As in many other countries worldwide, the coronavirus pandemic prompted the implementation of an “intelligent lockdown” in the spring of 2020 in the Netherlands, including the closure of nightlife venues and cancellation of festivals. Such restrictions and social distancing could particularly affect people who use alcohol or other drugs in recreational settings and give rise to new challenges and additional needs in the field of addiction prevention and care. To monitor changes in substance use and provide services with practical directions for tailored prevention, an anonymous web survey was set up, targeting a convenience sample aged 16 years or older through various social media and other online channels. Between May and October 2020, a total of 6,070 participants completed the survey, mainly adolescents and young adults (16–24 years old). These data were used to explore and describe changing patterns in substance use. Overall results showed declined current use compared to “pre-corona,” but mask underlying variation in changing patterns, including discontinued (tobacco 10.4%, alcohol 11.3%, cannabis 16.3%, other drugs 30.4%), decreased (tobacco 23.0%, alcohol 29.1%, cannabis 17.4%, other drugs 20.7%), unchanged (tobacco 30.3%, alcohol 21.2%, cannabis 22.3%, other drugs 17.3%), increased (tobacco 29.6%, alcohol 32.1%, cannabis 32.9%, other drugs 25.3%), and (re)commenced use (tobacco 6.7%, alcohol 6.3%, cannabis 11.1%, other drugs 6.2%). Especially the use of drugs like ecstasy and nitrous oxide was discontinued or decreased due to the lack of social occasions for use. Increased use was associated with coping motives for all substance types. 
As measures combatting the coronavirus may need to be practiced for some time to come, possibly leading to prolonged changes in substance use with lingering “post-corona” consequences, timely and ongoing monitoring of changing patterns of substance use is vital for informing prevention services within this field."
221,AgentCoord: Visually Exploring Coordination Strategy for LLM-based Multi-Agent Collaboration,paper,2024-04-18,"The potential of automatic task-solving through Large Language Model (LLM)-based multi-agent collaboration has recently garnered widespread attention from both the research community and industry. While utilizing natural language to coordinate multiple agents presents a promising avenue for democratizing agent technology for general users, designing coordination strategies remains challenging with existing coordination frameworks. This difficulty stems from the inherent ambiguity of natural language for specifying the collaboration process and the significant cognitive effort required to extract crucial information (e.g. agent relationship, task dependency, result correspondence) from a vast amount of text-form content during exploration. In this work, we present a visual exploration framework to facilitate the design of coordination strategies in multi-agent collaboration. We first establish a structured representation for LLM-based multi-agent coordination strategy to regularize the ambiguity of natural language. Based on this structure, we devise a three-stage generation method that leverages LLMs to convert a user's general goal into an executable initial coordination strategy. Users can further intervene at any stage of the generation process, utilizing LLMs and a set of interactions to explore alternative strategies. Whenever a satisfactory strategy is identified, users can commence the collaboration and examine the visually enhanced execution result. We develop AgentCoord, a prototype interactive system, and conduct a formal user study to demonstrate the feasibility and effectiveness of our approach."
222,MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue Resolution,paper,2024-03-26,"In software development, resolving the emergent issues within GitHub repositories is a complex challenge that involves not only the incorporation of new code but also the maintenance of existing code. Large Language Models (LLMs) have shown promise in code generation but face difficulties in resolving GitHub issues, particularly at the repository level. To overcome this challenge, we empirically study the reason why LLMs fail to resolve GitHub issues and analyze the major factors. Motivated by the empirical findings, we propose a novel LLM-based Multi-Agent framework for GitHub Issue reSolution, MAGIS, consisting of four agents customized for software evolution: Manager, Repository Custodian, Developer, and Quality Assurance Engineer agents. This framework leverages the collaboration of various agents in the planning and coding process to unlock the potential of LLMs to resolve GitHub issues. In experiments, we employ the SWE-bench benchmark to compare MAGIS with popular LLMs, including GPT-3.5, GPT-4, and Claude-2. MAGIS can resolve 13.94% of GitHub issues, significantly outperforming the baselines. Specifically, MAGIS achieves an eight-fold increase in resolved ratio over the direct application of GPT-4, the advanced LLM."
223,LLM-Based Agent Society Investigation: Collaboration and Confrontation in Avalon Gameplay,paper,2023-10-23,"This paper aims to investigate the open research problem of uncovering the social behaviors of LLM-based agents. To achieve this goal, we adopt Avalon, a representative communication game, as the environment and use system prompts to guide LLM agents to play the game. While previous studies have conducted preliminary investigations into gameplay with LLM agents, research on their social behaviors is lacking. In this paper, we present a novel framework designed to seamlessly adapt to Avalon gameplay. The core of our proposed framework is a multi-agent system that enables efficient communication and interaction among agents. We evaluate the performance of our framework based on metrics from two perspectives: winning the game and analyzing the social behaviors of LLM agents. Our results demonstrate the effectiveness of our framework in generating adaptive and intelligent agents and highlight the potential of LLM-based agents in addressing the challenges associated with dynamic social environment interaction. By analyzing the social behaviors of LLM agents from the aspects of both collaboration and confrontation, we provide insights into the research and applications of this domain. Our code is publicly available at https://github.com/3DAgentWorld/LLM-Game-Agent"
224,Theory of Mind for Multi-Agent Collaboration via Large Language Models,paper,2023-10-16,"While Large Language Models (LLMs) have demonstrated impressive accomplishments in both reasoning and planning, their abilities in multi-agent collaborations remain largely unexplored. This study evaluates LLM-based agents in a multi-agent cooperative text game with Theory of Mind (ToM) inference tasks, comparing their performance with Multi-Agent Reinforcement Learning (MARL) and planning-based baselines. We observed evidence of emergent collaborative behaviors and high-order Theory of Mind capabilities among LLM-based agents. Our results reveal limitations in LLM-based agents' planning optimization due to systematic failures in managing long-horizon contexts and hallucination about the task state. We explore the use of explicit belief state representations to mitigate these issues, finding that it enhances task performance and the accuracy of ToM inferences for LLM-based agents."
225,Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection,paper,2024-08-12,"Log-based insider threat detection (ITD) detects malicious user activities by auditing log entries. Recently, large language models (LLMs) with strong common sense knowledge have emerged in the domain of ITD. Nevertheless, diverse activity types and overlong log files pose a significant challenge for LLMs in directly discerning malicious ones within myriads of normal activities. Furthermore, the faithfulness hallucination issue from LLMs aggravates its application difficulty in ITD, as the generated conclusion may not align with user commands and activity context. In response to these challenges, we introduce Audit-LLM, a multi-agent log-based insider threat detection framework comprising three collaborative agents: (i) the Decomposer agent, breaking down the complex ITD task into manageable sub-tasks using Chain-of-Thought (COT) reasoning; (ii) the Tool Builder agent, creating reusable tools for sub-tasks to overcome context length limitations in LLMs; and (iii) the Executor agent, generating the final detection conclusion by invoking constructed tools. To enhance conclusion accuracy, we propose a pair-wise Evidence-based Multi-agent Debate (EMAD) mechanism, where two independent Executors iteratively refine their conclusions through reasoning exchange to reach a consensus. Comprehensive experiments conducted on three publicly available ITD datasets (CERT r4.2, CERT r5.2, and PicoDomain) demonstrate the superiority of our method over existing baselines and show that the proposed EMAD significantly improves the faithfulness of explanations generated by LLMs."
226,(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts,paper,2024-05-20,"Recent advancements in machine translation (MT) have significantly enhanced translation quality across various domains. However, the translation of literary texts remains a formidable challenge due to their complex language, figurative expressions, and cultural nuances. In this work, we introduce a novel multi-agent framework based on large language models (LLMs) for literary translation, implemented as a company called TransAgents, which mirrors the traditional translation publication process by leveraging the collective capabilities of multiple agents to address the intricate demands of translating literary works. To evaluate the effectiveness of our system, we propose two innovative evaluation strategies: Monolingual Human Preference (MHP) and Bilingual LLM Preference (BLP). MHP assesses translations from the perspective of monolingual readers of the target language, while BLP uses advanced LLMs to compare translations directly with the original texts. Empirical findings indicate that despite lower d-BLEU scores, translations from TransAgents are preferred by both human evaluators and LLMs over human-written references, particularly in genres requiring domain-specific knowledge. We also highlight the strengths and limitations of TransAgents through case studies and suggest directions for future research."
227,Multi-Agent Collaboration Framework for Recommender Systems,paper,2024-02-23,"LLM-based agents have gained considerable attention for their decision-making skills and ability to handle complex tasks. Recognizing the current gap in leveraging agent capabilities for multi-agent collaboration in recommendation systems, we introduce MACRec, a novel framework designed to enhance recommendation systems through multi-agent collaboration. Unlike existing work on using agents for user/item simulation, we aim to deploy multi-agents to tackle recommendation tasks directly. In our framework, recommendation tasks are addressed through the collaborative efforts of various specialized agents, including Manager, User/Item Analyst, Reflector, Searcher, and Task Interpreter, with different working flows. Furthermore, we provide application examples of how developers can easily use MACRec on various recommendation tasks, including rating prediction, sequential recommendation, conversational recommendation, and explanation generation of recommendation results. The framework and demonstration video are publicly available at https://github.com/wzf2000/MACRec."
228,Triad: A Framework Leveraging a Multi-Role LLM-based Agent to Solve Knowledge Base Question Answering,paper,2024-02-22,"Recent progress with LLM-based agents has shown promising results across various tasks. However, their use in answering questions from knowledge bases remains largely unexplored. Implementing a KBQA system using traditional methods is challenging due to the shortage of task-specific training data and the complexity of creating task-focused model structures. In this paper, we present Triad, a unified framework that utilizes an LLM-based agent with three roles for KBQA tasks. The agent is assigned three roles to tackle different KBQA subtasks: agent as a generalist for mastering various subtasks, as a decision maker for the selection of candidates, and as an advisor for answering questions with knowledge. Our KBQA framework is executed in four phases, involving the collaboration of the agent's multiple roles. We evaluated the performance of our framework using three benchmark datasets, and the results show that our framework outperforms state-of-the-art systems on the LC-QuAD and YAGO-QA benchmarks, yielding F1 scores of 11.8% and 20.7%, respectively."
229,MetaGPT: Meta Programming for Multi-Agent Collaborative Framework,paper,2023-08-01,"Recently, remarkable progress has been made in automated task-solving through the use of multi-agent driven by large language models (LLMs). However, existing LLM-based multi-agent works primarily focus on solving simple dialogue tasks, and complex tasks are rarely studied, mainly due to the LLM hallucination problem. This type of hallucination becomes cascading when naively chaining multiple intelligent agents, resulting in a failure to effectively address complex problems. Therefore, we introduce MetaGPT, an innovative framework that incorporates efficient human workflows as a meta programming approach into LLM-based multi-agent collaboration. Specifically, MetaGPT encodes Standardized Operating Procedures (SOPs) into prompts to enhance structured coordination. Subsequently, it mandates modular outputs, empowering agents with domain expertise comparable to human professionals, to validate outputs and minimize compounded errors. In this way, MetaGPT leverages the assembly line paradigm to assign diverse roles to various agents, thereby establishing a framework that can effectively and cohesively deconstruct complex multi-agent collaborative problems. Our experiments on collaborative software engineering benchmarks demonstrate that MetaGPT generates more coherent and correct solutions compared to existing chat-based multi-agent systems. This highlights the potential of integrating human domain knowledge into multi-agent systems, thereby creating new opportunities to tackle complex real-world challenges. The GitHub repository of this project is publicly available at: https://github.com/geekan/MetaGPT."
230,Unleashing the Emergent Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration,paper,2023-07-11,"Human intelligence thrives on cognitive synergy, where collaboration among different minds yields superior outcomes compared to isolated individuals. In this work, we propose Solo Performance Prompting (SPP), which transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist is an intelligent agent that collaboratively combines multiple minds’ strengths and knowledge to enhance problem-solving in complex tasks. By dynamically identifying and simulating different personas based on task inputs, SPP unleashes the potential of cognitive synergy in LLMs. Our in-depth analysis shows that assigning multiple fine-grained personas in LLMs improves problem-solving abilities compared to using a single or fixed number of personas. We evaluate SPP on three challenging tasks: Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle, encompassing both knowledge-intensive and reasoning-intensive types. Unlike previous works, such as Chain-of-Thought, that solely enhance the reasoning abilities in LLMs, experimental results demonstrate that SPP effectively reduces factual hallucination, and maintains strong reasoning capabilities. Additionally, comparative experiments show that cognitive synergy only emerges in GPT-4 and does not appear in less capable models, such as GPT-3.5-turbo and Llama2-13b-chat, which draws an interesting analogy to human development. Code, data, and prompts can be found at: https://github.com/MikeWangWZHL/Solo-Performance-Prompting.git."
231,Vein-Based Coalitions for Multi-Agent Pattern Formation Tasks,paper,2022-10-01,"This letter explores the vein trait of the structural formation task and extracts four typical parallel-straight vein structures for generating coalitions. Then, the Vein-Based Multi-Agent Pattern Formation (VB-MAPF) method is proposed to resolve task and path planning on coalitions. Besides, the presented rectifying operation guarantees the optimal coalition structure, and the vein-based pruned conflict-based search reduces the searching space for collision-free path planning. Compared with typical solvers, the inter-performance and the intra-performance orders show the efficiency and scalability of the VB-MAPF method in terms of makespan, average distance, and maximum distance on dozens of coalitions and thousands of agents."
232,Multi-Agent Pattern Formation: a Distributed Model-Free Deep Reinforcement Learning Approach,paper,2020-07-01,"In this paper, we investigate how a large-scale system of independently learning agents can collectively form acceptable two-dimensional patterns (pattern formation) from any initial configuration. We propose a decentralized multi-agent deep reinforcement learning architecture MAPF-DQN (Multi-Agent Pattern Formation DQN) in which a set of independent and distributed agents capture their local visual field and learn how to act so as to collectively form target shapes. Agents exploit their individual networks with a central replay memory and target networks that are used to store and update the representation of the environment as well as learning the dynamics of the other agents. We then show that agents trained on random patterns using MAPF-DQN can organize themselves into very complex shapes in large-scale environments. Our results suggest that the proposed framework achieves zero-shot generalization on most of the environments independently of the depth of view of agents."
233,Multi-Agent Pattern Formation with Deep Reinforcement Learning (Student Abstract),paper,2020-04-03,"We propose a decentralized multi-agent deep reinforcement learning architecture to investigate pattern formation under the local information provided by the agents' sensors. It consists of tasking a large number of homogeneous agents to move to a set of specified goal locations, addressing both the assignment and trajectory planning sub-problems concurrently. We then show that agents trained on random patterns can organize themselves into very complex shapes."
234,Multi-Agent pattern recognition mechanism for detecting distributed denial of service attacks,paper,2010-12-20,"Distributed denial of service (DDoS) attacks pose a significant threat to the smooth operations of today's online critical services and applications. Existing mechanisms to detect these attacks have had limited success. With the rapid growth in size and bandwidth of contemporary computer networks, an efficient and effective distributed solution is needed for detecting DDoS attacks. In this study, the authors propose a multiagent pattern recognition mechanism for detecting DDoS attacks in a distributed fashion. Our proposed solution is very effective in detecting such attacks launched against victim servers residing inside a production network which has multiple gateways to the Internet. Using simulation, the authors show that our proposed mechanism achieves a high degree of accuracy in detecting DDoS attacks, with low false alarm rates, using a reasonable number of attack detection agents collaboratively operating in a typical production network. The authors also study the relationship of the number of agents participating in the attack detection process and the false alarm rate of the detection scheme."
235,A Multi-agent Pattern Based Timetabling System,paper,2011,"The “Academic Timetabling System” is a research project that has been funded by the deanship of scientific research of Tabuk University. The project aims at developing and implementing an academic time tabling system for Tabuk University that is based on MultiAgent technology. In this paper we introduce the “Academic Time Tabling System” that has been designed and implemented in Tabuk University. The introduced system has two research points: The first is the introduction of a new solution model for the time tabling problem that uses predefined class hours patterns according to the features and requirements of each scheduled course, and successively filters these possible patterns as the timetable is being built. The second is introducing the Multi-Agent technology as an alternative framework for solving timetabling problems. The proposed model has been implemented and applied in a real environment at Tabuk University, and the results were satisfactory in terms of compactness and preference satisfaction. Regarding preference satisfaction, the results show that about 95 percent of courses are scheduled into the preferred time periods proposed by the instructors; regarding compactness, the results show that about 97 percent of the studying hours are scheduled without gaps."
236,Dynamic Beam Pattern and Bandwidth Allocation Based on Multi-Agent Deep Reinforcement Learning for Beam Hopping Satellite Systems,paper,2022-04-01,"Due to the non-uniform geographic distribution and time-varying characteristics of the ground traffic request, how to make full use of the limited beam resources to serve users flexibly and efficiently is a brand-new challenge for beam hopping satellite systems. The conventional greedy-based beam hopping methods do not consider the long-term reward, which makes it difficult for them to deal with the time-varying traffic demand. Meanwhile, heuristic algorithms such as the genetic algorithm converge slowly and cannot achieve real-time scheduling. Furthermore, existing methods based on deep reinforcement learning (DRL) only make decisions on beam patterns and lack the freedom to allocate bandwidth. This paper proposes a dynamic beam pattern and bandwidth allocation scheme based on DRL, which flexibly uses three degrees of freedom: time, space and frequency. Considering that the joint allocation of bandwidth and beam pattern will lead to an explosion of action space, a cooperative multi-agent deep reinforcement learning (MADRL) framework is presented in this paper, where each agent is only responsible for the illumination allocation or bandwidth allocation of one beam. The agents can learn to collaborate by sharing the same reward to achieve the common goal, namely to maximize the throughput and minimize the delay fairness between cells. Simulation results demonstrate that the offline trained MADRL model can achieve real-time beam pattern and bandwidth allocation to match the non-uniform and time-varying traffic request. Furthermore, when the traffic demand increases, our model has a good generalization ability."
237,HiVT: Hierarchical Vector Transformer for Multi-Agent Motion Prediction,paper,2022-06-01,"Accurately predicting the future motions of surrounding traffic agents is critical for the safety of autonomous vehicles. Recently, vectorized approaches have dominated the motion prediction community due to their capability of capturing complex interactions in traffic scenes. However, existing methods neglect the symmetries of the problem and suffer from the expensive computational cost, facing the challenge of making real-time multi-agent motion prediction without sacrificing the prediction performance. To tackle this challenge, we propose Hierarchical Vector Transformer (HiVT) for fast and accurate multi-agent motion prediction. By decomposing the problem into local context extraction and global interaction modeling, our method can effectively and efficiently model a large number of agents in the scene. Meanwhile, we propose a translation-invariant scene representation and rotation-invariant spatial learning modules, which extract features robust to the geometric transformations of the scene and enable the model to make accurate predictions for multiple agents in a single forward pass. Experiments show that HiVT achieves the state-of-the-art performance on the Argoverse motion forecasting benchmark with a small model size and can make fast multi-agent motion prediction."
238,MotionDiffuser: Controllable Multi-Agent Motion Prediction Using Diffusion,paper,2023-06-01,"We present MotionDiffuser, a diffusion based representation for the joint distribution of future trajectories over multiple agents. Such representation has several key advantages: first, our model learns a highly multimodal distribution that captures diverse future outcomes. Second, the simple predictor design requires only a single L2 loss training objective, and does not depend on trajectory anchors. Third, our model is capable of learning the joint distribution for the motion of multiple agents in a permutation-invariant manner. Furthermore, we utilize a compressed trajectory representation via PCA, which improves model performance and allows for efficient computation of the exact sample log probability. Subsequently, we propose a general constrained sampling framework that enables controlled trajectory sampling based on differentiable cost functions. This strategy enables a host of applications such as enforcing rules and physical priors, or creating tailored simulation scenarios. MotionDiffuser can be combined with existing backbone architectures to achieve top motion forecasting results. We obtain state-of-the-art results for multi-agent motion prediction on the Waymo Open Motion Dataset."
239,EqMotion: Equivariant Multi-Agent Motion Prediction with Invariant Interaction Reasoning,paper,2023-03-20,"Learning to predict agent motions with relationship reasoning is important for many applications. In motion prediction tasks, maintaining motion equivariance under Euclidean geometric transformations and invariance of agent interaction is a critical and fundamental principle. However, such equivariance and invariance properties are overlooked by most existing methods. To fill this gap, we propose EqMotion, an efficient equivariant motion prediction model with invariant interaction reasoning. To achieve motion equivariance, we propose an equivariant geometric feature learning module to learn a Euclidean transformable feature through dedicated designs of equivariant operations. To reason agent's interactions, we propose an invariant interaction reasoning module to achieve a more stable interaction modeling. To further promote more comprehensive motion features, we propose an invariant pattern feature learning module to learn an invariant pattern feature, which cooperates with the equivariant geometric feature to enhance network expressiveness. We conduct experiments for the proposed model on four distinct scenarios: particle dynamics, molecule dynamics, human skeleton motion prediction and pedestrian trajectory prediction. Experimental results show that our method is not only generally applicable, but also achieves state-of-the-art prediction performances on all the four tasks, improving by 24.0/30.1/8.6/9.2%. Code is available at https://github.com/MediaBrain-SJTU/EqMotion."
240,"RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models",paper,2023-10-01,"The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters. However, the closed-source nature of state-of-the-art LLMs and their general-purpose training limit role-playing optimization. In this paper, we introduce RoleLLM, a framework to benchmark, elicit, and enhance role-playing abilities in LLMs. RoleLLM comprises four stages: (1) Role Profile Construction for 100 roles; (2) Context-Based Instruction Generation (Context-Instruct) for role-specific knowledge extraction; (3) Role Prompting using GPT (RoleGPT) for speaking style imitation; and (4) Role-Conditioned Instruction Tuning (RoCIT) for fine-tuning open-source models along with role customization. By Context-Instruct and RoleGPT, we create RoleBench, the first systematic and fine-grained character-level benchmark dataset for role-playing with 168,093 samples. Moreover, RoCIT on RoleBench yields RoleLLaMA (English) and RoleGLM (Chinese), significantly enhancing role-playing abilities and even achieving comparable results with RoleGPT (using GPT-4)."
241,Character-LLM: A Trainable Agent for Role-Playing,paper,2023-10-16,"Large language models (LLMs) can be used to serve as agents to simulate human behaviors, given the powerful ability to understand human instructions and provide high-quality generated texts. Such ability stimulates us to wonder whether LLMs can simulate a person in a higher form than simple human behaviors. Therefore, we aim to train an agent with the profile, experience, and emotional states of a specific person instead of using limited prompts to instruct ChatGPT API. In this work, we introduce Character-LLM that teaches LLMs to act as specific people such as Beethoven, Queen Cleopatra, Julius Caesar, etc. Our method focuses on editing profiles as experiences of a certain character and training models to be personal simulacra with these experiences. To assess the effectiveness of our approach, we build a test playground that interviews trained agents and evaluates whether the agents memorize their characters and experiences. Experimental results show interesting observations that help build future simulacra of humankind."
242,Social Conflict in Role-Playing Communities: An Exploratory Qualitative Study,paper,2023-03-16,"Much of the current research in the field of role-playing studies focuses upon the positive impact that games can have on the lives of participants. Analysis of the more negative social interactions within role-playing communities becomes necessary in order to establish a more complete picture of the psychosocial effects of these games. This research describes potential problems within role-playing communities in order to aid groups experiencing cohesion difficulties.
This thematic, qualitative ethnography describes the types of social conflict occurring within role-playing groups and examines possible sources for their exacerbation. The study includes several types of role-playing from a phenomenological perspective, including tabletop, larp, and virtual gaming. Semi-structured interviews were collected from a selective sample of 30 international participants gathered from vastly different play cultures. While the types of games and methods of play contributed to conflict in some instances, striking similarities between the experiences of players across modes, cultures, and genres were observed.
Emergent themes for sources of conflict included general problems inherent to group behavior, such as schisms, Internet communication, and intimate relationships. Other sources of conflict unique to the role-playing experience included creative agenda differences, the game master/player power differential, and the phenomenon of bleed, both in- and out-of-game. Potentially conflict-inducing play styles included long-term immersion into character, campaign-style, and competitive play."
243,Implementation of Role-Playing Games in Overcoming Introverted Children,paper,2021,"This study aims to analyze and examine the application of role-playing games in overcoming introverted children in early childhood at RA Uswatun Hasanah, Maron, Probolinggo. This research uses a qualitative approach, while the type of research uses case studies. The data analysis technique uses data reduction, data display, and drawing conclusions or verification. The results showed that the teacher's steps in implementing role-playing games in overcoming the problems of introverted children through; Preparation and Planning Analysis, Role-Playing Engineering, Activity Documentation, Activity Evaluation. This research has implications for the use of role-playing, especially at RA Uswatun Hasanah. Introverted children begin to be able to mingle and even adapt to their friends, albeit slowly."
244,Educational Innovation in Higher Education: Use of Role Playing and Educational Video in Future Teachers’ Training,paper,2020-03-24,"Information and communication technologies (ICTs) have led to the emergence of a variety of active and innovative teaching methods. This is the case in role-playing, which consists of simulating a real-life situation, in this case the school context, in which the student takes on a certain role and interacts with other students in a fictitious situation. Framed in this way, the present study aims to show if the application of the role-playing method promotes the improvement of attitude variables and practical skills. To this end, we advocated the use of a quasi-experimental methodology, with a control and experimental group and the application of a post-test. The sample is composed of 138 students from the Master of Teachers of Compulsory Secondary Education in Ceuta (Spain). The results showed that the students positively valued the application of the method, obtaining better scores in the set of variables studied, especially in motivation, creativity and collaboration. Therefore, it continues to be observed that the application of innovative methodologies through technology promotes the increase of multiple skills in the student body. This study aimed to prove that the use of active methods provides an increase in students’ skills, and that, therefore, we must bet on the use of sustainable pedagogies in order to promote a real innovation in the classrooms."
245,What.Hack: Engaging Anti-Phishing Training Through a Role-playing Phishing Simulation Game,paper,2019-05-02,"Phishing attacks are a major problem, as evidenced by the DNC hackings during the 2016 US presidential election, in which staff were tricked into sharing passwords by fake Google security emails, granting access to confidential information. Vulnerabilities such as these are due in part to insufficient and tiresome user training in cybersecurity. Ideally, we would have more engaging training methods that teach cybersecurity in an active and entertaining way. To address this need, we introduce the game What.Hack, which not only teaches phishing concepts but also simulates actual phishing attacks in a role-playing game to encourage the player to practice defending themselves. Our user study shows that our game design is more engaging and effective in improving performance than a standard form of training and a competing training game design (which does not simulate phishing attempts through role-playing)."
246,Computer-Generated Music for Tabletop Role-Playing Games,paper,2020-08-16,"In this paper we present Bardo Composer, a system to generate background music for tabletop role-playing games. Bardo Composer uses a speech recognition system to translate player speech into text, which is classified according to a model of emotion. Bardo Composer then uses Stochastic Bi-Objective Beam Search, a variant of Stochastic Beam Search that we introduce in this paper, with a neural model to generate musical pieces conveying the desired emotion. We performed a user study with 116 participants to evaluate whether people are able to correctly identify the emotion conveyed in the pieces generated by the system. In our study we used pieces generated for Call of the Wild, a Dungeons and Dragons campaign available on YouTube. Our results show that human subjects could correctly identify the emotion of the generated music pieces as accurately as they were able to identify the emotion of pieces written by humans."
247,A virtual reality role-playing serious game for experiential learning,paper,2019-12-17,"Educational systems can benefit from Virtual Reality’s (VR) ability to support experiential learning. In particular, VR based games, especially role-playing serious games (RPGs), can promote learning through the simulation of various educational scenarios. This study proposes an immersive VR-RPG to educate players about the behavior of honeybees. The player adopts the role of a honeybee and experiences a virtual world mimicking the real one from the honeybee’s perspective. Unlike most studies in educational VR, we assess the impact of immersion on knowledge gain by testing the players’ knowledge on the subject before, immediately after, and one week following the use of the system. We also compare the proposed system with both a conventional and a desktop VR-RPG approach. The results indicate that students significantly gained knowledge in all methods compared to the pre-test. We found that the immersion level for both tested VR-RPGs did not have a significant effect on learning. However, the study showed an improvement in knowledge retention for the desktop VR-RPG users compared to those of the conventional method. Moreover, the results revealed that users of the immersive and desktop VR-RPGs were more motivated and engaged compared to those of the conventional method."
248,CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation,paper,2024-01-02,"Recently, the advent of large language models (LLMs) has revolutionized generative agents. Among them, Role-Playing Conversational Agents (RPCAs) attract considerable attention due to their ability to emotionally engage users. However, the absence of a comprehensive benchmark impedes progress in this field. To bridge this gap, we introduce CharacterEval, a Chinese benchmark for comprehensive RPCA assessment, complemented by a tailored high-quality dataset. The dataset comprises 1,785 multi-turn role-playing dialogues, encompassing 23,020 examples and featuring 77 characters derived from Chinese novels and scripts. It was carefully constructed, beginning with initial dialogue extraction via GPT-4, followed by rigorous human-led quality control, and enhanced with in-depth character profiles sourced from Baidu Baike. CharacterEval employs a multifaceted evaluation approach, encompassing thirteen targeted metrics on four dimensions. Comprehensive experiments on CharacterEval demonstrate that Chinese LLMs exhibit more promising capabilities than GPT-4 in Chinese role-playing conversation. Source code, data source and reward model will be publicly accessible at https://github.com/morecry/CharacterEval."
249,Book Review: The Sage Handbook of Qualitative Research in Organizational Communication,paper,2024-04-10,"QH304.K59 2005. Lindsay, D. (2011). Scientific writing = thinking in words. Victoria Thousand Oaks, CA: Sage. The Sage handbook of qualitative research. Denzin, N.K. and Lincoln, Y.S., Eds. (2013-2017). THE SAGE HANDBOOK OF QUALITATIVE RESEARCH, 5th Edition. Thousand Oaks, CA: SAGE Publications. (2005) Poetics for a planet: discourse on some problems of being-in-place. In: Denzin NK, Lincoln YS (eds) The Sage Handbook of Qualitative Research (3rd ed."
250,Communication-Efficient Learning of Deep Networks from Decentralized Data,paper,2016-02-17,"Modern mobile devices have access to a wealth of data suitable for learning models, which in turn can greatly improve the user experience on the device. For example, language models can improve speech recognition and text entry, and image models can automatically select good photos. However, this rich data is often privacy sensitive, large in quantity, or both, which may preclude logging to the data center and training there using conventional approaches. We advocate an alternative that leaves the training data distributed on the mobile devices, and learns a shared model by aggregating locally-computed updates. We term this decentralized approach Federated Learning. 
We present a practical method for the federated learning of deep networks based on iterative model averaging, and conduct an extensive empirical evaluation, considering five different model architectures and four datasets. These experiments demonstrate the approach is robust to the unbalanced and non-IID data distributions that are a defining characteristic of this setting. Communication costs are the principal constraint, and we show a reduction in required communication rounds by 10-100x as compared to synchronized stochastic gradient descent."
251,Federated Learning: Strategies for Improving Communication Efficiency,paper,2016-10-18,"Federated Learning is a machine learning setting where the goal is to train a high-quality centralized model while training data remains distributed over a large number of clients each with unreliable and relatively slow network connections. We consider learning algorithms for this setting where on each round, each client independently computes an update to the current model based on its local data, and communicates this update to a central server, where the client-side updates are aggregated to compute a new global model. The typical clients in this setting are mobile phones, and communication efficiency is of the utmost importance. In this paper, we propose two ways to reduce the uplink communication costs: structured updates, where we directly learn an update from a restricted space parametrized using a smaller number of variables, e.g. either low-rank or a random mask; and sketched updates, where we learn a full model update and then compress it using a combination of quantization, random rotations, and subsampling before sending it to the server. Experiments on both convolutional and recurrent networks show that the proposed methods can reduce the communication cost by two orders of magnitude."
252,Fiber‐Optic Communication Systems,paper,2021-06-04,Preface. 1 Introduction. 1.1 Historical Perspective. 1.2 Basic Concepts. 1.3 Optical Communication Systems. 1.4 Lightwave System Components. Problems. References. 2 Optical Fibers. 2.1 Geometrical-Optics Description. 2.2 Wave Propagation. 2.3 Dispersion in Single-Mode Fibers. 2.4 Dispersion-Induced Limitations. 2.5 Fiber Losses. 2.6 Nonlinear Optical Effects. 2.7 Fiber Design and Fabrication. Problems. References. 3 Optical Transmitters. 3.1 Semiconductor Laser Physics. 3.2 Single-Mode Semiconductor Lasers. 3.3 Laser Characteristics. 3.4 Optical Signal Generation. 3.5 Light-Emitting Diodes. 3.6 Transmitter Design. Problems. References. 4 Optical Receivers. 4.1 Basic Concepts. 4.2 Common Photodetectors. 4.3 Receiver Design. 4.4 Receiver Noise. 4.5 Coherent Detection. 4.6 Receiver Sensitivity. 4.7 Sensitivity Degradation. 4.8 Receiver Performance. Problems. References. 5 Lightwave Systems. 5.1 System Architectures. 5.2 Design Guidelines. 5.3 Long-Haul Systems. 5.4 Sources of Power Penalty. 5.5 Forward Error Correction. 5.6 Computer-Aided Design. Problems. References. 6 Multichannel Systems. 6.1 WDM Lightwave Systems. 6.2 WDM Components. 6.3 System Performance Issues. 6.4 Time-Division Multiplexing. 6.5 Subcarrier Multiplexing. 6.6 Code-Division Multiplexing. Problems. References. 7 Loss Management. 7.1 Compensation of Fiber Losses. 7.2 Erbium-Doped Fiber Amplifiers. 7.3 Raman Amplifiers. 7.4 Optical Signal-To-Noise Ratio. 7.5 Electrical Signal-To-Noise Ratio. 7.6 Receiver Sensitivity and Q Factor. 7.7 Role of Dispersive and Nonlinear Effects. 7.8 Periodically Amplified Lightwave Systems. Problems. References. 8 Dispersion Management. 8.1 Dispersion Problem and Its Solution. 8.2 Dispersion-Compensating Fibers. 8.3 Fiber Bragg Gratings. 8.4 Dispersion-Equalizing Filters. 8.5 Optical Phase Conjugation. 8.6 Channels at High Bit Rates. 8.7 Electronic Dispersion Compensation. Problems. References. 9 Control of Nonlinear Effects. 9.1 Impact of Fiber Nonlinearity. 9.2 Solitons in Optical Fibers. 9.3 Dispersion-Managed Solitons. 9.4 Pseudo-linear Lightwave Systems. 9.5 Control of Intrachannel Nonlinear Effects. Problems. References. 10 Advanced Lightwave Systems. 10.1 Advanced Modulation Formats. 10.2 Demodulation Schemes. 10.3 Shot Noise and Bit-Error Rate. 10.4 Sensitivity Degradation Mechanisms. 10.5 Impact of Nonlinear Effects. 10.6 Recent Progress. 10.7 Ultimate Channel Capacity. Problems. References. 11 Optical Signal Processing. 11.1 Nonlinear Techniques and Devices. 11.2 All-Optical Flip-Flops. 11.3 Wavelength Converters. 11.4 Ultrafast Optical Switching. 11.5 Optical Regenerators. Problems. References. A System of Units. B Acronyms. C General Formula for Pulse Broadening. D Software Package.
253,A Mathematical Theory of Communication,paper,2006,"This paper opened the new field of information theory. Before this paper, most people believed that the only way to make the error probability of transmission as small as desired was to reduce the data rate (for example, with a long repetition scheme). Surprisingly, this paper showed that the data rate need not be reduced to achieve such small error probabilities. It proved that a positive data rate can be attained with an arbitrarily small error probability, and that there is an upper bound on that rate: no encoding scheme can achieve a small enough error probability at data rates above the bound."
254,A mathematical theory of communication,paper,1948-07-01,"In this final installment of the paper we consider the case where the signals or the messages or both are continuously variable, in contrast with the discrete nature assumed until now. To a considerable extent the continuous case can be obtained through a limiting process from the discrete case by dividing the continuum of messages and signals into a large but finite number of small regions and calculating the various parameters involved on a discrete basis. As the size of the regions is decreased these parameters in general approach as limits the proper values for the continuous case. There are, however, a few new effects that appear and also a general change of emphasis in the direction of specialization of the general results to particular cases."
255,The mathematical theory of communication,paper,1950-09-01,"Scientific knowledge grows at a phenomenal pace--but few books have had as lasting an impact or played as important a role in our modern world as The Mathematical Theory of Communication, published originally as a paper on communication theory more than fifty years ago. Republished in book form shortly thereafter, it has since gone through four hardcover and sixteen paperback printings. It is a revolutionary work, astounding in its foresight and contemporaneity. The University of Illinois Press is pleased and honored to issue this commemorative reprinting of a classic."
256,An adversarial collaboration protocol for testing contrasting predictions of global neuronal workspace and integrated information theory,paper,2023-02-10,"The relationship between conscious experience and brain activity has intrigued scientists and philosophers for centuries. In the last decades, several theories have suggested different accounts for these relationships. These theories have developed in parallel, with little to no cross-talk among them. To advance research on consciousness, we established an adversarial collaboration between proponents of two of the major theories in the field, Global Neuronal Workspace and Integrated Information Theory. Together, we devised and preregistered two experiments that test contrasting predictions of these theories concerning the location and timing of correlates of visual consciousness, which have been endorsed by the theories’ proponents. Predicted outcomes should either support, refute, or challenge these theories. Six theory-impartial laboratories will follow the study protocol specified here, using three complementary methods: Functional Magnetic Resonance Imaging (fMRI), Magneto-Electroencephalography (M-EEG), and intracranial electroencephalography (iEEG). The study protocol will include built-in replications, both between labs and within datasets. Through this ambitious undertaking, we hope to provide decisive evidence in favor or against the two theories and clarify the footprints of conscious visual perception in the human brain, while also providing an innovative model of large-scale, collaborative, and open science practice."
257,An adversarial collaboration to critically evaluate theories of consciousness,paper,2023-06-29,"Different theories explain how subjective experience arises from brain activity. These theories have independently accrued evidence, yet, confirmation bias and dependence on design choices hamper progress in the field. Here, we present an open science adversarial collaboration which directly juxtaposes Integrated Information Theory (IIT) and Global Neuronal Workspace Theory (GNWT), employing a theory-neutral consortium approach. We investigate neural correlates of the content and duration of visual experience. The theory proponents and the consortium developed and preregistered the experimental design, divergent predictions, expected outcomes, and their interpretation. 256 human subjects viewed suprathreshold stimuli for variable durations while neural activity was measured with functional magnetic resonance imaging, magnetoencephalography, and electrocorticography. We find information about conscious content in visual, ventro-temporal and inferior frontal cortex, with sustained responses in occipital and lateral temporal cortex reflecting stimulus duration, and content-specific synchronization between frontal and early visual areas. These results confirm some predictions of IIT and GNWT, while substantially challenging both theories: for IIT, a lack of sustained synchronization within posterior cortex contradicts the claim that network connectivity specifies consciousness. GNWT is challenged by the general lack of ignition at stimulus offset and limited representation of certain conscious dimensions in prefrontal cortex. Beyond challenging the theories themselves, we present an alternative approach to advance cognitive neuroscience through a principled, theory-driven, collaborative effort. We highlight the challenges to change people’s mind and the need for a quantitative framework integrating evidence for systematic theory testing and building."
258,Exploring Gender Bias in Six Key Domains of Academic Science: An Adversarial Collaboration,paper,2023-04-26,"We synthesized the vast, contradictory scholarly literature on gender bias in academic science from 2000 to 2020. In the most prestigious journals and media outlets, which influence many people’s opinions about sexism, bias is frequently portrayed as an omnipresent factor limiting women’s progress in the tenure-track academy. Claims and counterclaims regarding the presence or absence of sexism span a range of evaluation contexts. Our approach relied on a combination of meta-analysis and analytic dissection. We evaluated the empirical evidence for gender bias in six key contexts in the tenure-track academy: (a) tenure-track hiring, (b) grant funding, (c) teaching ratings, (d) journal acceptances, (e) salaries, and (f) recommendation letters. We also explored the gender gap in a seventh area, journal productivity, because it can moderate bias in other contexts. We focused on these specific domains, in which sexism has most often been alleged to be pervasive, because they represent important types of evaluation, and the extensive research corpus within these domains provides sufficient quantitative data for comprehensive analysis. Contrary to the omnipresent claims of sexism in these domains appearing in top journals and the media, our findings show that tenure-track women are at parity with tenure-track men in three domains (grant funding, journal acceptances, and recommendation letters) and are advantaged over men in a fourth domain (hiring). For teaching ratings and salaries, we found evidence of bias against women; although gender gaps in salary were much smaller than often claimed, they were nevertheless concerning. Even in the four domains in which we failed to find evidence of sexism disadvantaging women, we nevertheless acknowledge that broad societal structural factors may still impede women’s advancement in academic science. 
Given the substantial resources directed toward reducing gender bias in academic science, it is imperative to develop a clear understanding of when and where such efforts are justified and of how resources can best be directed to mitigate sexism when and where it exists."
259,Accelerating scientific progress through Bayesian adversarial collaboration,paper,2023-09-01,"Adversarial collaboration has been championed as the gold standard for resolving scientific disputes but has gained relatively limited traction in neuroscience and allied fields. In this perspective, we argue that adversarial collaborative research has been stymied by an overly restrictive concern with the falsification of scientific theories. We advocate instead for a more expansive view that frames adversarial collaboration in terms of Bayesian belief updating, model comparison, and evidence accumulation. This framework broadens the scope of adversarial collaboration to accommodate a wide range of informative (but not necessarily definitive) studies while affording the requisite formal tools to guide experimental design and data analysis in the adversarial setting. We provide worked examples that demonstrate how these tools can be deployed to score theoretical models in terms of a common metric of evidence, thereby furnishing a means of tracking the amount of empirical support garnered by competing theories over time."
260,"Strategies, debates, and adversarial collaboration in working memory: The 51st Bartlett Lecture",paper,2023-08-01,"Frederic Bartlett championed the importance of individual strategy differences when remembering details of events. I will describe how long-running theoretical debates in the area of working memory may be resolved by considering differences across participants in the strategies that they use when performing cognitive tasks, and through adversarial collaboration between rival laboratories. In common with the established view within experimental cognitive psychology, I assume that adults have a range of cognitive functions, evolved for everyday life. However, I will present evidence showing that these functions can be engaged selectively for laboratory tasks, and that how they are deployed may differ between and within individuals for the same task. Reliance on aggregate data, while treating inter- and intra-participant variability in data patterns as statistical noise, may lead to misleading conclusions about theoretical principles of cognition, and of working memory in particular. Moreover, different theoretical perspectives may be focused on different levels of explanation and different theoretical goals rather than being mutually incompatible. Yet researchers from contrasting theoretical frameworks pursue science as a competition, rarely do researchers from competing labs work in collaboration, and debates self-perpetuate. These approaches to research can stall debate resolution and generate ever-increasing scientific diversity rather than scientific progress. The article concludes by describing a recent extended adversarial collaboration (the WoMAAC project) focused on theoretical contrasts in working memory, and illustrates how this approach to conducting research may help resolve scientific debate and facilitate scientific advance."
261,Review of How Philosophers Argue: An Adversarial Collaboration on the Russell-Copleston Debate,paper,2023-03-28,This article reviews Fernando Leal and Hubert Marraud’s How Philosophers Argue: An Adversarial Collaboration on the Russell-Copleston Debate (Springer 2022).
262,Task-Agnostic Graph Neural Network Evaluation via Adversarial Collaboration,paper,2023-01-27,"There is increasing demand for reliable methods to evaluate the progress of Graph Neural Network (GNN) research for molecular representation learning. Existing GNN benchmarking methods for molecular representation learning focus on comparing the GNNs' performances on some node/graph classification/regression tasks on certain datasets. However, a principled, task-agnostic method to directly compare two GNNs is lacking. Additionally, most existing self-supervised learning works incorporate handcrafted augmentations to the data, which are difficult to apply to graphs due to their unique characteristics. To address these issues, we propose GraphAC (Graph Adversarial Collaboration) -- a conceptually novel, principled, task-agnostic, and stable framework for evaluating GNNs through contrastive self-supervision. We introduce a novel objective function, the Competitive Barlow Twins, that allows two GNNs to jointly update themselves through direct competition against each other. GraphAC succeeds in distinguishing GNNs of different expressiveness across various aspects, and has been demonstrated to be a principled and reliable GNN evaluation method, without necessitating any augmentations."
263,Fake news detection using machine learning: an adversarial collaboration approach,paper,2023-10-11,"Purpose: Purveyors of fake news perpetuate information that can harm society, including businesses. Social media's reach quickly amplifies distortions of fake news. Research has not yet fully explored the mechanisms of such adversarial behavior or the adversarial techniques of machine learning that might be deployed to detect fake news. Debiasing techniques are also explored to combat against the generation of fake news using adversarial data. The purpose of this paper is to present the challenges and opportunities in fake news detection. Design/methodology/approach: First, this paper provides an overview of adversarial behaviors and current machine learning techniques. Next, it describes the use of long short-term memory (LSTM) to identify fake news in a corpus of articles. Finally, it presents the novel adversarial behavior approach to protect targeted business datasets from attacks. Findings: This research highlights the need for a corpus of fake news that can be used to evaluate classification methods. Adversarial debiasing using IBM's Artificial Intelligence Fairness 360 (AIF360) toolkit can improve the disparate impact of unfavorable characteristics of a dataset. Debiasing also demonstrates significant potential to reduce fake news generation based on the inherent bias in the data. These findings provide avenues for further research on adversarial collaboration and robust information systems. Originality/value: Adversarial debiasing of datasets demonstrates that by reducing bias related to protected attributes, such as sex, race and age, businesses can reduce the potential of exploitation to generate fake news through adversarial data."
264,Towards Detecting LLMs Hallucination via Markov Chain-based Multi-agent Debate Framework,paper,2024-06-05,"The advent of large language models (LLMs) has facilitated the development of natural language text generation. It also poses unprecedented challenges, with content hallucination emerging as a significant concern. Existing solutions often involve expensive and complex interventions during the training process. Moreover, some approaches emphasize problem disassembly while neglecting the crucial validation process, leading to performance degradation or limited applications. To overcome these limitations, we propose a Markov Chain-based multi-agent debate verification framework to enhance hallucination detection accuracy in concise claims. Our method integrates the fact-checking process, including claim detection, evidence retrieval, and multi-agent verification. In the verification stage, we deploy multiple agents through flexible Markov Chain-based debates to validate individual claims, ensuring meticulous verification outcomes. Experimental results across three generative tasks demonstrate that our approach achieves significant improvements over baselines."
265,Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate,paper,2023-05-30,"Modern large language models (LLMs) like ChatGPT have shown remarkable performance on general language tasks but still struggle on complex reasoning tasks, which drives the research on cognitive behaviors of LLMs to explore human-like problem-solving strategies. Along this direction, one representative strategy is self-reflection, which asks an LLM to refine the solution with the feedback generated by itself iteratively. However, our study shows that such reflection-style methods suffer from the Degeneration-of-Thought (DoT) problem: once the LLM has established confidence in its solutions, it is unable to generate novel thoughts later through reflection even if its initial stance is incorrect. To address the DoT problem, we propose a Multi-Agent Debate (MAD) framework, in which multiple agents express their arguments in the state of ""tit for tat"" and a judge manages the debate process to obtain a final solution. Clearly, our MAD framework encourages divergent thinking in LLMs which would be helpful for tasks that require deep levels of contemplation. Experiment results on two challenging datasets, commonsense machine translation and counter-intuitive arithmetic reasoning, demonstrate the effectiveness of our MAD framework. Extensive analyses suggest that the adaptive break of debate and the modest level of ""tit for tat"" state are required for MAD to obtain good performance. Moreover, we find that LLMs might not be a fair judge if different LLMs are used for agents. Code is available at https://github.com/Skytliang/Multi-Agents-Debate."
266,ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate,paper,2023-08-14,"Text evaluation has historically posed significant challenges, often demanding substantial labor and time cost. With the emergence of large language models (LLMs), researchers have explored LLMs' potential as alternatives for human evaluation. While these single-agent-based approaches show promise, experimental results suggest that further advancements are needed to bridge the gap between their current effectiveness and human-level evaluation quality. Recognizing that best practices of human evaluation processes often involve multiple human annotators collaborating in the evaluation, we resort to a multi-agent debate framework, moving beyond single-agent prompting strategies. The multi-agent-based approach enables a group of LLMs to synergize with an array of intelligent counterparts, harnessing their distinct capabilities and expertise to enhance efficiency and effectiveness in handling intricate tasks. In this paper, we construct a multi-agent referee team called ChatEval to autonomously discuss and evaluate the quality of generated responses from different models on open-ended questions and traditional natural language generation (NLG) tasks. Our analysis shows that ChatEval transcends mere textual scoring, offering a human-mimicking evaluation process for reliable assessments. Our code is available at https://github.com/chanchimin/ChatEval."
267,Learning to Break: Knowledge-Enhanced Reasoning in Multi-Agent Debate System,paper,2023-12-08,"Multi-agent debate system (MAD), imitating the process of human discussion in pursuit of truth, aims to align the correct cognition of different agents for the optimal solution. It is challenging to make various agents perform right and highly consistent cognition due to their limited and different knowledge backgrounds (i.e., cognitive islands), which hinders the search for the optimal solution. To address the challenge, we propose a novel Multi-Agent Debate with Knowledge-Enhanced framework (MADKE) to promote the system to find the solution. First, we involve a shared retrieval knowledge pool in the debate process to solve the problem of limited and different knowledge backgrounds. Then, we propose an adaptive knowledge selection method to guarantee the accuracy and personalization of knowledge. This method allows agents to choose whether to use external knowledge in each conversation round according to their own needs. Our experimental results on six datasets show that our method achieves state-of-the-art results compared to existing single-agent and multi-agent methods. Further analysis reveals that the introduction of retrieval knowledge can help the agent to break cognitive islands in the debate process and effectively improve the consistency and correctness of the model. Moreover, MADKE using Qwen1.5-72B-Chat surpasses GPT-4 by +1.26% on average in six datasets, which validates that our method can help open-source LLMs achieve or even surpass the performance of GPT-4. Our code is available at https://github.com/FutureForMe/MADKE."
268,Can LLMs Produce Faithful Explanations For Fact-checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate,paper,2024-02-12,"Fact-checking research has extensively explored verification but less so the generation of natural-language explanations, crucial for user trust. While Large Language Models (LLMs) excel in text generation, their capability for producing faithful explanations in fact-checking remains underexamined. Our study investigates LLMs' ability to generate such explanations, finding that zero-shot prompts often result in unfaithfulness. To address these challenges, we propose the Multi-Agent Debate Refinement (MADR) framework, leveraging multiple LLMs as agents with diverse roles in an iterative refining process aimed at enhancing faithfulness in generated explanations. MADR ensures that the final explanation undergoes rigorous validation, significantly reducing the likelihood of unfaithful elements and aligning closely with the provided evidence. Experimental results demonstrate that MADR significantly improves the faithfulness of LLM-generated explanations to the evidence, advancing the credibility and trustworthiness of these explanations."
269,Improving Multi-Agent Debate with Sparse Communication Topology,paper,2024-06-17,"Multi-agent debate has proven effective in improving large language models quality for reasoning and factuality tasks. While various role-playing strategies in multi-agent debates have been explored, in terms of the communication among agents, existing approaches adopt a brute force algorithm -- each agent can communicate with all other agents. In this paper, we systematically investigate the effect of communication connectivity in multi-agent systems. Our experiments on GPT and Mistral models reveal that multi-agent debates leveraging sparse communication topology can achieve comparable or superior performance while significantly reducing computational costs. Furthermore, we extend the multi-agent debate framework to multimodal reasoning and alignment labeling tasks, showcasing its broad applicability and effectiveness. Our findings underscore the importance of communication connectivity on enhancing the efficiency and effectiveness of the ""society of minds"" approach."
270,Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate,paper,2024-08-08,"Competitive debate is a complex task of computational argumentation. Large Language Models (LLMs) suffer from hallucinations and lack competitiveness in this field. To address these challenges, we introduce Agent for Debate (Agent4Debate), a dynamic multi-agent framework based on LLMs designed to enhance their capabilities in competitive debate. Drawing inspiration from human behavior in debate preparation and execution, Agent4Debate employs a collaborative architecture where four specialized agents, involving Searcher, Analyzer, Writer, and Reviewer, dynamically interact and cooperate. These agents work throughout the debate process, covering multiple stages from initial research and argument formulation to rebuttal and summary. To comprehensively evaluate framework performance, we construct the Competitive Debate Arena, comprising 66 carefully selected Chinese debate motions. We recruit ten experienced human debaters and collect records of 200 debates involving Agent4Debate, baseline models, and humans. The evaluation employs the Debatrix automatic scoring system and professional human reviewers based on the established Debatrix-Elo and Human-Elo ranking. Experimental results indicate that the state-of-the-art Agent4Debate exhibits capabilities comparable to those of humans. Furthermore, ablation studies demonstrate the effectiveness of each component in the agent structure."
271,Unlocking Varied Perspectives: A Persona-Based Multi-Agent Framework with Debate-Driven Text Planning for Argument Generation,paper,2024-06-28,"Writing persuasive arguments is a challenging task for both humans and machines. It entails incorporating high-level beliefs from various perspectives on the topic, along with deliberate reasoning and planning to construct a coherent narrative. Current language models often generate surface tokens autoregressively, lacking explicit integration of these underlying controls, resulting in limited output diversity and coherence. In this work, we propose a persona-based multi-agent framework for argument writing. Inspired by the human debate, we first assign each agent a persona representing its high-level beliefs from a unique perspective, and then design an agent interaction process so that the agents can collaboratively debate and discuss the idea to form an overall plan for argument writing. Such debate process enables fluid and nonlinear development of ideas. We evaluate our framework on argumentative essay writing. The results show that our framework can generate more diverse and persuasive arguments through both automatic and human evaluations."
272,Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate,paper,2024-01-30,"Despite the utility of Large Language Models (LLMs) across a wide range of tasks and scenarios, developing a method for reliably evaluating LLMs across varied contexts continues to be challenging. Modern evaluation approaches often use LLMs to assess responses generated by LLMs. However, the meta-evaluation conducted to assess the effectiveness of these LLMs as evaluators is typically constrained by the coverage of existing benchmarks or requires extensive human annotation. This underscores the urgency of methods for scalable meta-evaluation that can effectively, reliably, and efficiently evaluate the performance of LLMs as evaluators across diverse tasks and scenarios, particularly in potentially new, user-defined scenarios. To fill this gap, we propose ScaleEval, an agent-debate-assisted meta-evaluation framework that leverages the capabilities of multiple communicative LLM agents. This framework supports multi-round discussions to assist human annotators in discerning the most capable LLMs as evaluators, which significantly eases their workload in cases that used to require large-scale annotations during meta-evaluation. We release the code for our framework, which is publicly available at: https://github.com/GAIR-NLP/scaleeval."
273,Prioritization of Vaccines for Inclusion into China’s Expanded Program on Immunization: Evidence from Experts’ Knowledge and Opinions,paper,2022-06-24,"Background: Vaccine developers in China have made an increasing number of infectious diseases preventable through vaccination. An appropriate decision-making procedure is necessary for making wise decisions on whether to introduce new vaccines into the Expanded Program on Immunization (EPI). When there are several vaccines that could potentially be considered, a scientifically justifiable mechanism is needed for prioritizing and sequencing vaccines for consideration. Methods: We used a modified Delphi technique (MDT) to develop and refine an indicator system to prioritize vaccines and make policy recommendations concerning their introduction into China’s EPI system. From January through May 2021, thirty-nine experts were recruited and participated in a two-round Delphi survey that was based on a set of candidate indicators obtained through a literature review and reference to the WHO vaccine introduction recommendations. Using the resulting indicator system, we conducted a third consultation with a multi-disciplinary group of experts who scored five program-eligible candidate vaccines to determine prioritization and sequencing for consideration of inclusion into the EPI. Results: Response rates of the thirty-nine experts were 100% and 97.4% across the two rounds. Authority coefficients from rounds one to three were over 0.70, reflecting the high accuracy and reliability of the consultation. Coordination coefficients of importance scores for primary, secondary, and tertiary indicators were 0.486, 0.356, 0.275 in round one, and 0.405, 0.340, and 0.236 in round two. 
According to the scores from 30 experts using our indicator system, the sequence and scores (1–10 scale, 10 highest) of 5 candidate vaccines were varicella (6.91), meningococcal conjugate AC (6.83), Hib (6.74), influenza (6.56), and EV71 (6.17) vaccines. Conclusions: A modified Delphi technique effectively built a scientific, rational, comprehensive, and systematic indicator system for prioritizing vaccine candidates for consideration of inclusion into the EPI. The rank order will be used by the technical working groups of China’s National Immunization Advisory Committee to sequentially develop and present Evidence-to-Recommendation tables for making policy recommendations."
274,Treatment and reflection of a case of complete rupture in an implanted intravenous infusion port under multidisciplinary cooperation,paper,2019-09-11,"Objective: To explore the safety management of implantable venous infusion ports, to prevent and reduce catheter rupture and other related complications, and to implement effective treatment measures when they occur. Methods: A patient with an implantable venous infusion port suffered complete rupture of the catheter outside the hospital. Under multidisciplinary consultation, the condition of the catheter inside the port was clarified, and a safe treatment plan was worked out. The multidisciplinary venous infusion treatment team cooperated to correctly carry out the in vivo capture and retrieval of the catheter and the related nursing care. Results: With the cooperation of the multidisciplinary team, the broken port and catheter were successfully and safely removed without any discomfort. Conclusions: Establishing a multi-disciplinary cooperation mechanism, standardizing the quality control of infusion port implantation, popularizing knowledge of post-implantation maintenance, and implementing safety management of infusion ports can ensure the safe and long-term use of implanted intravenous infusion ports. Key words: Multidisciplinary; Infusion port; Catheter; Fracture; Treatment"
275,"P157 Sexual Health & Contraception: developing a one stop shop service using a collaborative approach between a Local Authority, Acute and Community Trusts",paper,2016-06-01,"Background/introduction This city on the South East coast has a high proportion of young people/LGBT with some of the highest STI/HIV rates in England (2013 gonorrhoea 162.1/100,000; HIV prevalence 8/1,000). GUM & contraception services were historically provided by two separate NHS Trusts. Transfer of public health responsibility to the Local Authority (LA) in 2013 led to service review. Aim(s)/objectives To deliver an efficient and accessible multi-disciplinary sexual health and contraception service. Methods City-wide public consultation favoured a one-stop-shop integrated service. Pathway Analytics© sexual health tariff was accepted by LA/providers as a transparent & fair payment mechanism. Following legal advice LA gave the commissioner permission to negotiate a new contract with existing providers, moving to a competitive tender process if unsuccessful. Results The contract was awarded to existing providers in April 2015. The local Sexual Health Programme Board ensured all stakeholders were engaged in service review. A staged approach was followed to deliver an integrated service. The tariff was introduced allowing fair remuneration for combined services at diverse sites across the city. Trusts have established a steering group to ensure safe governance across legal, financial & clinical frameworks & robust risk management processes across both organisations. Discussion/conclusion Innovative thinking by the LA allowed service re-design by negotiation with existing providers avoiding a competitive tender process. Good working relationships within the sexual health network allowed a collaborative approach to service improvement. 
Despite the challenges of two Trusts working together with different organisational accountabilities, a ‘one-stop-shop model’ has been successfully introduced without destabilising HIV services."
276,Adapting patients' oncological treatment through remote participation of general practitioners in multi-disciplinary consultation meetings: A feasibility study,paper,2022-02-18,"Background The general practitioner (GP) is central to managing patients with cancer, whose numbers are increasing worldwide. The GP’s involvement requires better coordination between involved partners, in particular oncologists and GPs. Objectives To conduct a feasibility study of remote participation of GPs in multi-disciplinary consultation meetings (MCMs). We analysed participation, participants’ satisfaction, and their impact on therapeutic decisions. Methods We conducted a feasibility study in the regional cancer centre of Toulouse, France. All patient cases discussed in the MCMs for myelodysplasia from 1 January to 31 March 2016 were included. Cases of patients aged over 18 years, with a diagnosis of myelodysplasia and registered with a GP were included if patients gave informed consent. One investigator collected the data provided by GPs during three telephone or video calls: before, during, and after the MCM, respectively. Results Of 86 patient cases discussed during three months of MCMs, 44 were eligible for GP participation; 27 GPs participated in discussions of 27 patient cases. The GP’s participation in the MCM led to a change in management in five cases, with treatment intensified in four cases and de-intensified in one. Medical, social, family-related, and psychological domains were discussed with input from the GPs. Overall, all participants were satisfied with the MCMs. Conclusion Remote participation of GPs in MCMs is feasible and may result in adapting oncological and haematological management for patients. This patient-centred approach requires a specific organisation that, when implemented, satisfies the needs of all participants."
277,Cost-Effectiveness of a Multi-Disciplinary Emergency Consultation System for Suicide Attempts by Drug Overdose in Young People and Adult Populations,paper,2021-02-26,"The purpose of this study was to compare the characteristics of suicide attempts by drug overdose between young people and adults, and evaluate the cost-effectiveness of a multi-disciplinary emergency consultation system (MECS) for suicide attempters with drug overdose. It was verified by comparing and analyzing data from June 1, 2017 to May 31, 2018 (before the MECS was implemented; pre-MECS), and from June 1, 2018 to May 31, 2019 (after the MECS was implemented; post-MECS). The data were retrospectively reviewed for a total of 251 such patients with suicide attempts by drug overdose who visited the emergency room of a university hospital in Seoul during the period. The young people group were shown to be more likely to use painkillers and less likely to use psychoactive drugs for a suicide attempt (p < 0.01), had more unplanned attempts than planned ones (p < 0.01), and had lower levels of intentionality for suicide (p = 0.04) and of suicide lethality (p = 0.02), compared to the adult group. We defined suicide attempts as being “serious” when there was both high intentionality and lethality. On this basis, the young people group had less serious suicide attempts, compared to the adult group (p = 0.02). Young people in the post-MECS group had lower intensive care unit (ICU) costs (p = 0.01) and lower costs in the 6-months after the suicide attempt (p = 0.02) compared to those in the pre-MECS group. Young people, both with serious (p < 0.01) and non-serious attempts (p < 0.01) in the post-MECS group had lower ICU costs compared to those in the pre-MECS group. Adults with non-serious attempts in the post-MECS group had lower ICU costs (p < 0.01) compared to those in the pre-MECS group. 
Therefore, it can be concluded that fast and precise cooperation from the multidisciplinary departments for patients who attempted suicide by drug overdose reduced unnecessary ICU treatment and costs, especially in young attempters and those with lower levels of intentionality and lethality."
278,[Research status and reflection of the mechanism of TCM manipulation in the treatment of cervical spondylosis under the background of multi-disciplinary intersection].,paper,2024-07-25,"The study of TCM manipulation's mechanism is the key scientific issue in the current manipulation research. It is the key and difficult point on the road of modernization and internationalization of Chinese orthopedics and traumatology. Meanwhile, it is also an important way to systematically clarify the scientific connotation of TCM manipulation. At present, our country is in an important period when multi-disciplinary intersection leads knowledge production, scientific innovation, and discipline development. The trend of cross-innovation between Chinese orthopedics and traumatology and other disciplines provides the carrier and method for the study of TCM manipulation's mechanism. Cervical spondylosis is the traditional dominant disease of Chinese orthopedics and traumatology. In recent years, many scholars have applied multi-disciplinary techniques and theories to explore the mechanism of TCM manipulation by focusing on the four dimensions of muscle, bone, blood vessel and nerve. The article takes the treatment of cervical spondylosis by TCM manipulation as the research entry point, and integrates the application status and implementation strategies of various techniques and theories under the background of multi-disciplinary intersection, which is conducive to the better combination, innovation and transformation of Chinese orthopedics and traumatology with other disciplines, and provides ideas and references for systematically clarifying the scientific connotation of TCM manipulation."
279,Multi-disciplinary geophysical investigation to identify road failure mechanism,paper,2021-08-29,"Summary: A case study investigating the failure mechanism of a road in Derbyshire (UK). The road is located on a ridge, with steep slopes on either side as well as known historical mine workings in close vicinity. A multi-disciplinary geophysical survey was executed to identify the potential failure mechanism after cracks in the road surface were observed. The combined interpretation of all techniques provided a detailed image of the subsurface, allowing identification of the most likely failure mechanism, which will be used in the design of a remediation strategy."
280,Treating Trauma- Evaluation of a multi-disciplinary psychiatry service for patients post major trauma,paper,2023-03-01,"Introduction Research has shown 30-40% of people who have experienced traumatic injury are at risk of developing mental illness. Some injuries may be the result of mental ill-health, including self-inflicted injury. Furthermore, the development of psychopathology after injury appears to be a major determinant of long-term disability. Early intervention can reduce symptom severity and prevent development of mental illness. Ireland’s National Trauma System Implementation Programme, announced in April 2021, highlights the need for screening for mental disorders. The Mater Misericordiae University Hospital (MMUH) is designated as one of two national Major Trauma Centres in Ireland. Its trauma service will expand with an expectation of an additional 450-500 major trauma patients over the next three years. The Consultation Liaison Psychiatry Service (CLP) currently provides expert mental health input to medical and surgical teams, in managing a range of patients with mental illnesses or psychological difficulties, including those with experience of major trauma. Objectives To examine the current mental health service provision for trauma patients over a six-month period. We aimed to identify areas of need to inform future development of a psychiatry-led MDT service for trauma patients. Methods A review of all patients admitted on the MMUH trauma pathway between January 2021 and June 2021 was performed. The following data were recorded: demographics, mechanism of injury and information on referrals to the liaison psychiatry service. Results There were 105 trauma cases over the six-month period: 46 females and 59 males. The mean age was 58.4 years (SD 22.16). Twelve individuals were recorded as ‘No Fixed Abode’ or living in homeless accommodation (11.4%). In terms of mechanism of injury, 20 were assaulted, of which 8 were stabbing/knife injuries. 
There were 65 falls and 12 road traffic accidents. In 3 cases (2.8%), the mechanism of injury was self-inflicted. Twenty patients were admitted to critical care (19%). Of the 105 trauma patients, 19 (18%) were referred to CLP service; 2 (10.5%) were seen in the outpatient setting, the rest as inpatients (89.5%). At least one repeat review was indicated in 10 of the 19 patients (52.6%). Conclusions Trauma patients have a high rate of comorbid mental illness. Nearly 1/5 are currently referred to the CLP service, which is likely an underestimation of the actual burden of mental health disorders and could be explained by the lack of dedicated services. The liaison psychiatry team provides valuable input into the multidisciplinary care of trauma patients and the demand for its services is likely to increase with the expansion under the Major Trauma Strategy for Ireland. Disclosure of Interest None Declared"
