entity name,entity type,timestamp,description
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning,paper,,This paper presents a framework for leveraging large language models in the medical domain for zero-shot medical reasoning.
zero-shot setting,method,,A training-free approach focused on real-world application for evaluating LLMs' reasoning abilities.
"MedQA, MedMCQA, PubMedQA, and six subtasks from MMLU",domain,,Datasets used to establish the effectiveness of the proposed Multi-disciplinary Collaboration (MC) framework.
domain-specific terminologies,problem,,Unique medical and healthcare terminology that large language models struggle to process and understand effectively.
Large language models encode clinical knowledge,paper,2023,
Medchatzh: a better medical adviser learns from better instructions,paper,2023,
Disc-medllm: Bridging general large language models and real-world medical consultation,paper,2023,
Capabilities of gpt-4 on medical challenge problems,paper,2023,
Evaluation of the performance of gpt-3.5 and gpt-4 on the medical final examination,paper,2023,
Medalpaca - an open-source collection of medical conversational ai models and training data,paper,2023,
Analysis of large-language model versus human performance for genetics questions,paper,2023,
Genegpt: Augmenting large language models with domain tools for improved access to biomedical information,paper,2023,
Pharmacygpt: The ai pharmacist,paper,2023,
Aligning factual consistency for clinical studies summarization through reinforcement learning,paper,2023,
Evaluating large language models on medical evidence summarization,paper,2023,
"Summarizing, simplifying, and synthesizing medical evidence using gpt-3 (with varying success)",paper,2023,
Med-halt: Medical domain hallucination test for large language models,paper,2023,
GeneGPT,method,,guided LLMs to leverage the Web APIs of the National Center for Biotechnology Information (NCBI) to meet various biomedical information needs
Almanac,method,,a framework that is augmented with retrieval capabilities for medical guidelines and treatment recommendations
Almanac: Retrieval-augmented language models for clinical medicine,paper,2023,
KARD,method,,a method to improve small LMs on specific domain knowledge by finetuning small LMs on the rationales generated from LLMs and augmenting small LMs with external knowledge from a non-parametric memory
Knowledge-augmented reasoning distillation for small language models in knowledge-intensive tasks,paper,2023,
external tools,method,,tools used to acquire additional information for clinical reasoning
Current instruction tuning research,method,,predominantly leverages external clinical knowledge bases and self-prompted data to obtain instruction datasets
instruction datasets,method,,datasets obtained from external clinical knowledge bases and self-prompted data
Towards expert-level medical question answering with large language models,paper,2023,
MedChatZH: A better medical adviser learns from better instructions,paper,2023,
medical and biomedical literature datasets,method,,"datasets collected from medical and biomedical literature, fine-tuned with specialized or open-ended instruction data"
traditional Chinese medicine datasets,method,,datasets focusing on traditional Chinese medicine to enhance proficiency in that area
"large-scale, diverse medical instruction data",method,,large and varied datasets utilized to improve medical proficiency
role-playing,method,,
negotiation,method,,"Negotiation is mentioned as a method incorporated into multi-agent collaboration to improve performance. Specifically, it is used as part of adversarial collaboration among multiple agents."
Solo Performance Prompting (SPP),method,,combines the strengths of multiple minds to improve performance by dynamically identifying and engaging multiple personas throughout task solving
Camel,method,,leverages role-playing to enable chat agents to communicate with each other for task completion
Large Language Models in Medical Domains,domain,,"Large language models (LLMs) face significant barriers in medicine and healthcare, including challenges like domain-specific terminologies and reasoning over specialized knowledge. Despite their remarkable progress across various general domains, these unique challenges need to be addressed to effectively leverage LLMs in medical contexts."
medicine and medical reasoning,domain,,A field facing unique challenges such as domain-specific terminologies and reasoning over specialized knowledge in the context of healthcare.
enhancing medical applications of LLMs,method,,Utilizing clinical knowledge datasets and the inherent latent knowledge within LLMs to improve their performance and reasoning capabilities in medical applications without the need for additional training.
Multi-Agent Collaboration and Debate Framework,method,,"A proposed framework leveraging LLM-based agents where multiple agents engage in multi-round discussions and tit-for-tat debates, exploring cooperation and learning from multi-turn feedback."
Empirical Evaluation of ChatGPT on Requirements Information Retrieval Under Zero-Shot Setting,paper,2023-04-25,"Recently, various illustrative examples have shown the impressive ability of generative large language models (LLMs) to perform NLP related tasks. ChatGPT undoubtedly is the most representative model. We empirically evaluate ChatGPT’s performance on requirements information retrieval (IR) tasks to derive insights into designing or developing more effective requirements retrieval methods or tools based on generative LLMs. We design an evaluation framework considering four different combinations of two popular IR tasks and two common artifact types. Under zero-shot setting, evaluation results reveal ChatGPT’s promising ability to retrieve requirements relevant information (high recall) and limited ability to retrieve more specific requirements information (low precision). Our evaluation of ChatGPT on requirements IR under zero-shot setting provides preliminary evidence for designing or developing more effective requirements IR methods or tools based on generative LLMs."
Reducing Negative Effects of the Biases of Language Models in Zero-Shot Setting,paper,2023-02-27,"Pre-trained language models (PLMs) such as GPTs have been revealed to be biased towards certain target classes because of the prompt and the model's intrinsic biases. In contrast to the fully supervised scenario where there are a large number of costly labeled samples that can be used to fine-tune model parameters to correct for biases, there are no labeled samples available for the zero-shot setting. We argue that a key to calibrating the biases of a PLM on a target task in zero-shot setting lies in detecting and estimating the biases, which remains a challenge. In this paper, we first construct probing samples with randomly generated token sequences, which are simple but effective in detecting inputs for stimulating GPTs to show the biases; and we pursue in-depth research on the plausibility of utilizing class scores for the probing samples to reflect and estimate the biases of GPTs on a downstream target task. Furthermore, in order to effectively utilize the probing samples and thus reduce negative effects of the biases of GPTs, we propose a lightweight model Calibration Adapter (CA) along with a self-guided training strategy that carries out distribution-level optimization, which enables us to take advantage of the probing samples to fine-tune and select only the proposed CA, respectively, while keeping the PLM encoder frozen. To demonstrate the effectiveness of our study, we have conducted extensive experiments, where the results indicate that the calibration ability acquired by CA on the probing samples can be successfully transferred to reduce negative effects of the biases of GPTs on a downstream target task, and our approach can yield better performance than state-of-the-art (SOTA) models in zero-shot settings."
SBERTiment: A New Pipeline to Solve Aspect Based Sentiment Analysis in the Zero-Shot Setting,paper,2023-05-08,"The field of Natural Language Processing is gaining increased attention for the Aspect Based Sentiment Analysis task due to its ability to provide fine-grained information. This paper introduces SBERTiment, a novel approach to perform Aspect Based Sentiment Analysis. The method extracts relevant topics along with their sentiments from the input text by using a 2-step pipeline. In the first step, a token classification model is used to identify the relevant aspect terms and their sentiments. In the second step, a Sentence-BERT embedding model maps each aspect term to a predefined aspect category. Our approach has been tested on benchmark datasets and has achieved scores that are comparable to the best-performing methods. The pipeline is also able to perform zero-shot classification, which means it can extract information in unseen domains without additional training. When evaluated on a dataset with unseen aspect categories, SBERTiment achieved the best score among benchmark approaches."
Evaluating Prompts Across Multiple Choice Tasks In a Zero-Shot Setting,paper,2022-03-29,"Large language models have shown that impressive zero-shot performance can be achieved through natural language prompts (Radford et al., 2019; Brown et al., 2020; Sanh et al., 2021). Creating an effective prompt, however, requires significant trial and error. That prompts the question: how do the qualities of a prompt affect its performance? To this end, we collect and standardize prompts from a diverse range of tasks for use with tasks they were not designed for. We then evaluate these prompts across fixed multiple choice datasets for a quantitative analysis of how certain attributes of a prompt affect performance. We find that including the choices and using prompts not used during pre-training provide significant improvements. All experiments and code can be found at https://github.com/gabeorlanski/zero-shot-cross-task."
Attribute Prediction in the Zero-Shot Setting as Multiple Instance Learning,paper,2022,"Attribute-based representations help machine learning models perform tasks based on human understandable concepts, allowing a closer human-machine collaboration. However, learning attributes that accurately reflect the content of an image is not always straightforward, as per-image ground truth attributes are often not available. We propose applying the Multiple Instance Learning (MIL) paradigm to attribute learning (AMIL) while only using class-level labels. We allow the model to under-predict the positive attributes, which may be missing in a particular image due to occlusions or unfavorable pose, but not to over-predict the negative ones, which are almost certainly not present. We evaluate it in the zero-shot learning (ZSL) setting, where training and test classes are disjoint, and show that this also allows to profit from knowledge about the semantic relatedness of attributes. In addition, we apply the MIL assumption to ZSL classification and propose MIL-DAP, an attribute-based zero-shot classification method, based on Direct Attribute Prediction (DAP), to evaluate attribute prediction methods when no image-level data is available for evaluation. Experiments on CUB-200-2011, SUN Attributes and AwA2 show improvements on attribute detection, attribute-based zero-shot classification and weakly supervised part localization."
Zero-1-to-3: Zero-shot One Image to 3D Object,paper,2023-03-20,"We introduce Zero-1-to-3, a framework for changing the camera viewpoint of an object given just a single RGB image. To perform novel view synthesis in this under-constrained setting, we capitalize on the geometric priors that large-scale diffusion models learn about natural images. Our conditional diffusion model uses a synthetic dataset to learn controls of the relative camera viewpoint, which allow new images to be generated of the same object under a specified camera transformation. Even though it is trained on a synthetic dataset, our model retains a strong zero-shot generalization ability to out-of-distribution datasets as well as in-the-wild images, including impressionist paintings. Our viewpoint-conditioned diffusion approach can further be used for the task of 3D reconstruction from a single image. Qualitative and quantitative experiments show that our method significantly outperforms state-of-the-art single-view 3D reconstruction and novel view synthesis models by leveraging Internet-scale pre-training."
A Sand County Almanac and Sketches Here and There,paper,2020-05-01,"First published in 1949 and praised in The New York Times Book Review as ""a trenchant book, full of vigor and bite,"" A Sand County Almanac combines some of the finest nature writing since Thoreau with an outspoken and highly ethical regard for America's relationship to the land. Written with an unparalleled understanding of the ways of nature, the book includes a section on the monthly changes of the Wisconsin countryside; another part that gathers informal pieces written by Leopold over a forty-year period as he traveled through the woodlands of Wisconsin, Iowa, Arizona, Sonora, Oregon, Manitoba, and elsewhere; and a final section in which Leopold addresses the philosophical issues involved in wildlife conservation. As the forerunner of such important books as Annie Dillard's Pilgrim at Tinker Creek, Edward Abbey's Desert Solitaire, and Robert Finch's The Primal Place, this classic work remains as relevant today as it was forty years ago."
Almanac: Retrieval-Augmented Language Models for Clinical Medicine,paper,2023-03-01,"Large-language models have recently demonstrated impressive zero-shot capabilities in a variety of natural language tasks such as summarization, dialogue generation, and question-answering. Despite many promising applications in clinical medicine, adoption of these models in real-world settings has been largely limited by their tendency to generate incorrect and sometimes even toxic statements. In this study, we develop Almanac, a large language model framework augmented with retrieval capabilities for medical guideline and treatment recommendations. Performance on a novel dataset of clinical scenarios (n= 130) evaluated by a panel of 5 board-certified and resident physicians demonstrates significant increases in factuality (mean of 18% at p-value < 0.05) across all specialties, with improvements in completeness and safety. Our results demonstrate the potential for large language models to be effective tools in the clinical decision-making process, while also emphasizing the importance of careful testing and deployment to mitigate their shortcomings."
Almanac: Knowledge-Grounded Language Models for Clinical Medicine,paper,2023,"Large-language models have recently demonstrated impressive zero-shot capabilities in a variety of natural language tasks such as summarization, dialogue generation, and question-answering. Despite many promising applications in clinical medicine (e.g. medical record documentation, treatment guideline-lookup), adoption of these models in real-world settings has been largely limited by their tendency to generate factually incorrect and sometimes even toxic statements. In this paper we explore the ability of large-language models to facilitate and streamline medical guidelines and recommendation referencing: by enabling these models to access external point-of-care tools in response to physician queries, we demonstrate significantly improved factual grounding, helpfulness, and safety in a variety of clinical scenarios."
Almanac: Weak Lensing power spectra and map inference on the masked sphere,paper,2022-10-24,"We present a field-based signal extraction of weak lensing from noisy observations on the curved and masked sky. We test the analysis on a simulated Euclid-like survey, using a Euclid-like mask and noise level. To make optimal use of the information available in such a galaxy survey, we present a Bayesian method for inferring the angular power spectra of the weak lensing fields, together with an inference of the noise-cleaned tomographic weak lensing shear and convergence (projected mass) maps. The latter can be used for field-level inference with the aim of extracting cosmological parameter information including non-gaussianity of cosmic fields. We jointly infer all-sky $E$-mode and $B$-mode tomographic auto- and cross-power spectra from the masked sky, and potentially parity-violating $EB$-mode power spectra, up to a maximum multipole of $\ell_{\rm max}=2048$. We use Hamiltonian Monte Carlo sampling, inferring simultaneously the power spectra and denoised maps with a total of $\sim 16.8$ million free parameters. The main output and natural outcome is the set of samples of the posterior, which does not suffer from leakage of power from $E$ to $B$ unless reduced to point estimates. However, such point estimates of the power spectra, the mean and most likely maps, and their variances and covariances, can be computed if desired."
Almanac - Retrieval-Augmented Language Models for Clinical Medicine.,paper,2024-01-25,"BACKGROUND: Large language models (LLMs) have recently shown impressive zero-shot capabilities, whereby they can use auxiliary data, without the availability of task-specific training examples, to complete a variety of natural language tasks, such as summarization, dialogue generation, and question answering. However, despite many promising applications of LLMs in clinical medicine, adoption of these models has been limited by their tendency to generate incorrect and sometimes even harmful statements. METHODS: We tasked a panel of eight board-certified clinicians and two health care practitioners with evaluating Almanac, an LLM framework augmented with retrieval capabilities from curated medical resources for medical guideline and treatment recommendations. The panel compared responses from Almanac and standard LLMs (ChatGPT-4, Bing, and Bard) versus a novel data set of 314 clinical questions spanning nine medical specialties. RESULTS: Almanac showed a significant improvement in performance compared with the standard LLMs across axes of factuality, completeness, user preference, and adversarial safety. CONCLUSIONS: Our results show the potential for LLMs with access to domain-specific corpora to be effective in clinical decision-making. The findings also underscore the importance of carefully testing LLMs before deployment to mitigate their shortcomings. (Funded by the National Institutes of Health, National Heart, Lung, and Blood Institute.)"
A Sand County Almanac,paper,1949,
The National Cancer Institute ALMANAC: A Comprehensive Screening Resource for the Detection of Anticancer Drug Pairs with Enhanced Therapeutic Activity.,paper,2017-07-01,"To date, over 100 small-molecule oncology drugs have been approved by the FDA. Because of the inherent heterogeneity of tumors, these small molecules are often administered in combination to prevent emergence of resistant cell subpopulations. Therefore, new combination strategies to overcome drug resistance in patients with advanced cancer are needed. In this study, we performed a systematic evaluation of the therapeutic activity of over 5,000 pairs of FDA-approved cancer drugs against a panel of 60 well-characterized human tumor cell lines (NCI-60) to uncover combinations with greater than additive growth-inhibitory activity. Screening results were compiled into a database, termed the NCI-ALMANAC (A Large Matrix of Anti-Neoplastic Agent Combinations), publicly available at https://dtp.cancer.gov/ncialmanac Subsequent in vivo experiments in mouse xenograft models of human cancer confirmed combinations with greater than single-agent efficacy. Concomitant detection of mechanistic biomarkers for these combinations in vivo supported the initiation of two phase I clinical trials at the NCI to evaluate clofarabine with bortezomib and nilotinib with paclitaxel in patients with advanced cancer. Consequently, the hypothesis-generating NCI-ALMANAC web-based resource has demonstrated value in identifying promising combinations of approved drugs with potent anticancer activity for further mechanistic study and translation to clinical trials. Cancer Res; 77(13); 3564-76. ©2017 AACR."
Almanac: Diffraction & Reading Diffractively,paper,2021-02-18,
Predicting Synergism of Cancer Drug Combinations Using NCI-ALMANAC Data,paper,2018-12-21,"Background: Drug combinations are of great interest for cancer treatment. Unfortunately, the discovery of synergistic combinations by purely experimental means is only feasible on small sets of drugs. In silico modeling methods can substantially widen this search by providing tools able to predict which of all possible combinations in a large compound library are synergistic. Here we investigate to which extent drug combination synergy can be predicted by exploiting the largest available dataset to date (NCI-ALMANAC, with over 290,000 synergy determinations). Methods: Each cell line is modeled using primarily two machine learning techniques, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), on the datasets provided by NCI-ALMANAC. This large-scale predictive modeling study comprises more than 5000 pair-wise drug combinations, 60 cell lines, 4 types of models and 5 types of chemical features. The application of a powerful, yet uncommonly used, RF-specific technique for reliability prediction is also investigated. Results: The evaluation of these models shows that it is possible to predict the synergy of unseen drug combinations with high accuracy (Pearson correlations between 0.43 and 0.86 depending on the considered cell line, with XGBoost providing slightly better predictions than RF). We have also found that restricting to the most reliable synergy predictions results in at least two-fold error decrease with respect to employing the best learning algorithm without any reliability estimation. Alkylating agents, tyrosine kinase inhibitors and topoisomerase inhibitors are the drugs whose synergy with other partner drugs are better predicted by the models. Conclusions: Despite its leading size, NCI-ALMANAC comprises an extremely small part of all conceivable combinations. Given their accuracy and reliability estimation, the developed models should drastically reduce the number of required in vitro tests by predicting in silico which of the considered combinations are likely to be synergistic."
„Zwycięstwo Maryi”. Próba zdefiniowania znaczenia „proroctwa Augusta kard. Hlonda o zwycięstwie Maryi w kontekście posługi apostolskiej prymasa tysiąclecia Stefana kard. Wyszyńskiego,paper,2023-03-09,"In 2006 the Polish Catholic Church commemorated two historical events of great significance for the life of the faithful in post-war Poland, which had been liberated from Nazism but was then under the rule of communists irrevocably determined to impose a Marxist vision of society there, thereby destroying the Christian presence. These were the Act of Entrustment of the Polish Nation to the Immaculate Heart of Mary (8 September 1946), made by the Primate, Servant of God Cardinal August Hlond, and the perpetual vows to the Black Madonna of Jasna Góra (26 August 1956), made by Servant of God Cardinal Stefan Wyszyński. These two anniversaries are an occasion to examine the significance of the vision of Mary's Victory over the atheistic communist political system, to which Cardinal Hlond bore witness before his death (22 October 1948). To understand it better, we examine the kind of Marian devotion Hlond practised; the socio-political circumstances in which this vision arose; and its remarkable influence on the pastoral activity of Cardinal Wyszyński and the Polish hierarchy. Wyszyński saw the extraordinary creative power of the vision of Mary Victorious in planning and carrying out a highly effective pastoral programme of moral renewal and evangelization of Polish society, especially in preparing the Polish Nation for the Millennium of its Baptism (1966) through the Great Novena to the Madonna. This Marian style of pastoral activity, which contrary to its critics was entirely centred on Christ, contributed decisively, even in the opinion of non-Catholics, to the defence of the freedom of Polish citizens, and met with a positive response abroad. The article also addresses how this Marian vision was perceived in the Petrine ministry of John Paul II, who often spoke of Cardinal Hlond's vision of Mary's victory. Although it is not stated explicitly, a thread linking Hlond, Wyszyński, and John Paul II can be discerned: the distinctly Marian dimension of their pastoral activity."
Egy avar kori kard mint információforrás és restaurált tárgy,paper,2023-03-06,"The study discusses the manufacturing-technique and materials-science information uncovered during the restoration of the 7th-century, early Avar-period sword from Dör. The microscopic observations were in several cases supplemented by sampling and material analyses, and the disassembly and restoration were preceded by X-ray imaging. The various examinations yielded manufacturing-technique information that can enrich our knowledge of Avar material culture."
"Laudacja z okazji wręczenia kard. Zenonowi Grocholewskiemu Nagrody imienia ks. Idziego Radziszewskiego. Lublin, 27 maja 2013 roku",paper,2023-08-16,"On 27 May 2013 a ceremony was held to present Cardinal Zenon Grocholewski with the Rev. Idzi Radziszewski Award for achievements in the spirit of Christian humanism, conferred by the Scientific Society of the John Paul II Catholic University of Lublin. In his laudation, Prof. Józef Krukowski presented Cardinal Grocholewski's biography; his organizational work in the Roman Curia, especially as Prefect of the Supreme Tribunal of the Apostolic Signatura and Prefect of the Congregation for Catholic Education; and his scholarly output in canon law, philosophy of law, and the role of universities in the contemporary world."
Starania o powrót Wydziału Teologicznego na Uniwersytet Jagielloński w raportach członków Wydziału do prymasa Polski ks. kard. Stefana Wyszyńskiego (1956–1958),paper,2023-07-25,"By a decision of the Stalinist Council of Ministers of the Polish People's Republic on 11 August 1954, after more than 550 years, the Faculty of Theology was detached from the Jagiellonian University and incorporated into the Academy of Catholic Theology in Warsaw. This meant that Kraków lost its academic rights in theology. After the political upheaval of October 1956, the political authorities distanced themselves from the policies of the preceding years, and from their methods and decisions. In response, the professors of the Jagiellonian University's Faculty of Theology working at the Academy of Catholic Theology in Warsaw sought to have the Faculty of Theology restored at the Jagiellonian University. They were supported by the Primate of Poland, Cardinal Stefan Wyszyński. The author discusses the topic on the basis of more than a dozen reports sent by the Kraków priest-professors to the Primate of Poland, Cardinal S. Wyszyński, in 1956–1958, which record their efforts to restore the Faculty of Theology at the Jagiellonian University. As it turned out, the period of the so-called political thaw ended quickly, and the leadership of the ruling communist party never for a moment entertained the idea of the Faculty of Theology returning to the Jagiellonian University. The reports of the members of the former Faculty of Theology of the Jagiellonian University testify to their aspirations and their fully determined work. The author has sought to show this by presenting the successive reports chronologically as stages in a struggle with a system in which no Faculty of Theology could exist at any state university."
Wkład Prymasa Polski Stefana kard. Wyszyńskiego i papieża Jana Pawła II w normalizację stosunków między Państwem a Kościołem,paper,2023-04-14,"The aim is to show the contribution of the Primate of Poland, Cardinal Stefan Wyszyński, and Pope John Paul II to the process of normalizing relations between the State and the Catholic Church in Poland after World War II. The discussion covers three issues. The first concerns the changes introduced by the communist authorities. Of particular importance was the resolution of the Provisional Government of National Unity declaring that ""the Polish Concordat of 1925 has ceased to be in force"". This marked a shift from regulating State-Church relations through a bilateral international agreement to regulating them through acts issued unilaterally by the state authorities, which drastically restricted the Church's freedom to carry out its mission. The second concerns the principles and methods that Primate Stefan Wyszyński and Pope John Paul II applied, and the demands they put forward, in order to normalize diplomatic relations between Poland and the Holy See and to regulate relations between the State and the Church in Poland through a bilateral international agreement. The third concerns the successive stages in fulfilling these demands, from the termination of the 1925 Concordat to the conclusion of a new Concordat in 1993-1998."
Komentarze w zagranicznych środkach przekazu po zapowiedzi beatyfikacji kard. Stefana Wyszyńskiego,paper,2023-04-14,"Although three months after the announcement of the beatification of Cardinal Stefan Wyszyński the amount of new material in foreign media is not yet substantial, it is notable that he is already seen there not only as the leader of the Catholic Church in communist Poland but also as a spiritual man of deep social thought. This is evidenced by the content of publications (1), of commentaries, and of comments on those commentaries (2). Striking features include the familiarity with the person of the Primate of the Millennium, to whom the Madrid journalist José Luis Restán Martínez added the title ""the Great""; the fascination with his figure expressed by Prof. Bernardino Montejano of Argentina; the emotions surrounding the announced event, voiced on web portals, especially among the Polish diaspora, and pointing to the role of ancestors in passing on knowledge and wisdom; and the way Archbishop José H. Gomez of Los Angeles draws on the Servant of God's thought in his pastoral work. For the University of which he was a student, a doctor, and Grand Chancellor, the beatification will likewise be a great task (3)."
Pobyt i nauczanie prymasa Polski kard. Augusta Hlonda na terenie późniejszej diecezji koszalińsko-kołobrzeskiej,paper,2023-09-15,"The Primate of Poland, Cardinal August Hlond, went down in history as an outstanding figure. It fell to him to organize church structures in post-war Poland by virtue of special papal privileges of a kind never before granted to anyone in the Church. He left his mark on the history of the later Diocese of Koszalin-Kołobrzeg not only through his decisions but also by visiting its territory. This article describes those privileges, the course of the Primate's meetings with the ""German"" administrators of the existing church structures, information about the new heads of the apostolic administrations created in place of the former units, and the course of the Cardinal's visits to Pokrzywnica, Kołobrzeg, and Koszalin."
Laudacja z okazji wręczenia kard. Zenonowi Grocholewskiemu Nagrody imienia ks. Idziego Radziszewskiego wygłoszona dnia 27 maja 2013 r.,paper,2023-08-28,"On 27 May 2013, a ceremony was held to present Cardinal Zenon Grocholewski with the Rev. Idzi Radziszewski Award for achievements in the spirit of Christian humanism, conferred by the Scientific Society of the John Paul II Catholic University of Lublin. In the laudation, Professor Józef Krukowski presented Cardinal Grocholewski's biography; his organizational activity in the Roman Curia, especially as Prefect of the Supreme Tribunal of the Apostolic Signatura and Prefect of the Congregation for Catholic Education; and his scholarly achievements in canon law, the philosophy of law, and the role of universities in the modern world."
"Az alsó egyenes szemizom kard általi súlyos sérülése, klinikai képe, műtéti kezelése és posztoperatív eredményei",paper,2023,"Objective: To describe the presentation and treatment of an isolated inferior rectus muscle injury through a case report. Case report: A young male patient noticed double vision immediately after a cut injury, without loss of vision. During surgery performed the day after the injury, a partial tear of the inferior rectus muscle was observed. After surgical reconstruction, the eyes were aligned and the patient became symptom-free. Conclusions: Traumatic extraocular muscle injuries can be highly varied. Their management is often a challenging task. Isolated extraocular muscle injury occurs rarely. Performing surgery as soon as possible is important for a favorable prognosis."
ToolQA: A Dataset for LLM Question Answering with External Tools,paper,2023-06-23,"Large Language Models (LLMs) have demonstrated impressive performance in various NLP tasks, but they still suffer from challenges such as hallucination and weak numerical reasoning. To overcome these challenges, external tools can be used to enhance LLMs' question-answering abilities. However, current evaluation methods do not distinguish between questions that can be answered using LLMs' internal knowledge and those that require external information through tool use. To address this issue, we introduce a new dataset called ToolQA, which is designed to faithfully evaluate LLMs' ability to use external tools for question answering. Our development of ToolQA involved a scalable, automated process for dataset curation, along with 13 specialized tools designed for interaction with external knowledge in order to answer questions. Importantly, we strive to minimize the overlap between our benchmark data and LLMs' pre-training data, enabling a more precise evaluation of LLMs' tool-use reasoning abilities. We conducted an in-depth diagnosis of existing tool-use LLMs to highlight their strengths, weaknesses, and potential improvements. Our findings set a new benchmark for evaluating LLMs and suggest new directions for future advancements. Our data and code are freely available to the broader scientific community on GitHub."
MultiTool-CoT: GPT-3 Can Use Multiple External Tools with Chain of Thought Prompting,paper,2023-05-26,"Large language models (LLMs) have achieved impressive performance on various reasoning tasks. To further improve the performance, we propose MultiTool-CoT, a novel framework that leverages chain-of-thought (CoT) prompting to incorporate multiple external tools, such as a calculator and a knowledge retriever, during the reasoning process. We apply MultiTool-CoT to the Task 2 dataset of NumGLUE, which requires both numerical reasoning and domain-specific knowledge. The experiments show that our method significantly outperforms strong baselines and achieves state-of-the-art performance."
Multimodal interaction for science learning in preschool: Conceptual development with external tools across a science project,paper,2019-04-15,"This paper studies the scaffolding of conceptual development for children aged 4–5 years old during a science project at a Swedish preschool. It specifically examines how bodily knowledge and language are used in interaction, and how conceptual knowledge can be scaffolded with the use of external tools and artefacts. The science project was tracked for seven weeks and the analytical focus is on situations where a computer and a projected screen are used. The study shows how interactions afforded by the set-up provide a virtual-physical setting where teachers and children can interact using both language and bodily modes. As such, it provided an interactional space where teachers can scaffold children’s tactile understandings towards conceptual knowledge by building on the children’s prior experiences, and knowledge is cumulated over time during the project. This is accomplished by focusing attention on the topic and through the use of tools in interaction. Possible implications and uses for early childhood education are discussed in the light of these results."
A Study of Learning-by-Doing in MOOCs through the Integration of Third-Party External Tools: Comparison of Synchronous and Asynchronous Running Modes,paper,2018,"Many MOOCs are being designed replicating traditional passive teaching approaches but using video lectures as the means of transmitting information. However, it is well known that learning-by-doing increases retention rates and, thus, allows achieving a more effective learning. To this end, it is worth exploring which tools fit best in the context of each MOOC to enrich learners’ experience, including built-in tools already available in the MOOC platform, and third-party external tools which can be integrated in the MOOC platform. This paper presents an example of the integration of a software development tool, called Codeboard, in three MOOCs which serve as an introduction to programming with Java. We analyze the effect this tool has on learners’ interaction and engagement when running the MOOCs in synchronous (instructor-paced) or asynchronous (self-paced) modes. Results show that the overall use of the tool is similar, regardless of the course running mode, although in the case of the synchronous mode the use of the tool is concentrated in a shorter period of time. Results also show that in the synchronous mode there is a higher percentage of accesses to the tool from registered learners (who can save their advances and continue the work later); this finding suggests that learners in the synchronous running mode are more engaged with the MOOC."
CyREST: Turbocharging Cytoscape Access for External Tools via a RESTful API,paper,2015-08-05,"As bioinformatic workflows become increasingly complex and involve multiple specialized tools, so does the difficulty of reliably reproducing those workflows. Cytoscape is a critical workflow component for executing network visualization, analysis, and publishing tasks, but it can be operated only manually via a point-and-click user interface. Consequently, Cytoscape-oriented tasks are laborious and often error prone, especially with multistep protocols involving many networks. In this paper, we present the new cyREST Cytoscape app and accompanying harmonization libraries. Together, they improve workflow reproducibility and researcher productivity by enabling popular languages (e.g., Python and R, JavaScript, and C#) and tools (e.g., IPython/Jupyter Notebook and RStudio) to directly define and query networks, and perform network analysis, layouts and renderings. We describe cyREST’s API and overall construction, and present Python- and R-based examples that illustrate how Cytoscape can be integrated into large scale data analysis pipelines. cyREST is available in the Cytoscape app store (http://apps.cytoscape.org) where it has been downloaded over 1900 times since its release in late 2014."
Toolformer: Language Models Can Teach Themselves to Use Tools,paper,2023-02-09,"Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities."
A framework for installable external tools in Skyline,paper,2014-09-01,"Skyline is a Windows client application for targeted proteomics method creation and quantitative data analysis. The Skyline document model contains extensive mass spectrometry data from targeted proteomics experiments performed using selected reaction monitoring, parallel reaction monitoring and data-independent and data-dependent acquisition methods. Researchers have developed software tools that perform statistical analysis of the experimental data contained within Skyline documents. The new external tools framework allows researchers to integrate their tools into Skyline without modifying the Skyline codebase. Installed tools provide point-and-click access to downstream statistical analysis of data processed in Skyline. The framework also specifies a uniform interface to format tools for installation into Skyline. Tool developers can now easily share their tools with proteomics researchers using Skyline. Availability and implementation: Skyline is available as a single-click self-updating web installation at http://skyline.maccosslab.org. This Web site also provides access to installable external tools and documentation. Supplementary information: Supplementary data are available at Bioinformatics online."
Instruction Tuning for Large Language Models: A Survey,paper,2023-08-21,"This paper surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of (instruction, output) pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains and applications, along with an analysis on aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset, etc). We also review the potential pitfalls of IT and criticism against it, along with efforts pointing out current deficiencies of existing strategies, and suggest some avenues for fruitful research. Project page: github.com/xiaoya-li/Instruction-Tuning-Survey"
StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data,paper,2023-08-20,"The remarkable multimodal capabilities demonstrated by OpenAI's GPT-4 have sparked significant interest in the development of multimodal Large Language Models (LLMs). A primary research objective of such models is to align visual and textual modalities effectively while comprehending human instructions. Current methodologies often rely on annotations derived from benchmark datasets to construct image-dialogue datasets for training purposes, akin to instruction tuning in LLMs. However, these datasets often exhibit domain bias, potentially constraining the generative capabilities of the models. In an effort to mitigate these limitations, we propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning. This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models to yield a diverse and controllable dataset with varied image content. Additionally, datasets can be arbitrarily scaled. This not only provides greater flexibility compared to existing methodologies but also significantly enhances several model capabilities. Our research includes comprehensive experiments conducted on various datasets. The results emphasize substantial enhancements in more than ten commonly assessed capabilities. Additionally, our model achieves state-of-the-art results across multiple widely recognized multimodal benchmarks."
UMIE: Unified Multimodal Information Extraction with Instruction Tuning,paper,2024-01-05,"Multimodal information extraction (MIE) gains significant attention as the popularity of multimedia content increases. However, current MIE methods often resort to using task-specific model structures, which results in limited generalizability across tasks and underutilizes shared knowledge across MIE tasks. To address these issues, we propose UMIE, a unified multimodal information extractor to unify three MIE tasks as a generation problem using instruction tuning, being able to effectively extract both textual and visual mentions. Extensive experiments show that our single UMIE outperforms various state-of-the-art (SoTA) methods across six MIE datasets on three tasks. Furthermore, in-depth analysis demonstrates UMIE's strong generalization in the zero-shot setting, robustness to instruction variants, and interpretability. Our research serves as an initial step towards a unified MIE model and initiates the exploration into both instruction tuning and large language models within the MIE domain. Our code, data, and model are available at https://github.com/ZUCC-AI/UMIE."
Instruction Tuning-free Visual Token Complement for Multimodal LLMs,paper,2024-08-09,"As the open community of large language models (LLMs) matures, multimodal LLMs (MLLMs) have promised an elegant bridge between vision and language. However, current research is inherently constrained by challenges such as the need for high-quality instruction pairs and the loss of visual information in image-to-text training objectives. To this end, we propose a Visual Token Complement framework (VTC) that helps MLLMs regain the missing visual features and thus improve response accuracy. Specifically, our VTC integrates text-to-image generation as a guide to identifying the text-irrelevant features, and a visual selector is then developed to generate complementary visual tokens to enrich the original visual input. Moreover, an iterative strategy is further designed to extract more visual information by iteratively using the visual selector without any additional training. Notably, the training pipeline requires no additional image-text pairs, resulting in a desired instruction tuning-free property. Both qualitative and quantitative experiments demonstrate the superiority and efficiency of our VTC."
EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning,paper,2024-08-21,"Facial expression recognition (FER) is an important research topic in emotional artificial intelligence. In recent decades, researchers have made remarkable progress. However, current FER paradigms face challenges in generalization, lack semantic information aligned with natural language, and struggle to process both images and videos within a unified framework, making their application in multimodal emotion understanding and human-computer interaction difficult. Multimodal Large Language Models (MLLMs) have recently achieved success, offering advantages in addressing these issues and potentially overcoming the limitations of current FER paradigms. However, directly applying pre-trained MLLMs to FER still faces several challenges. Our zero-shot evaluations of existing open-source MLLMs on FER indicate a significant performance gap compared to GPT-4V and current supervised state-of-the-art (SOTA) methods. In this paper, we aim to enhance MLLMs' capabilities in understanding facial expressions. We first generate instruction data for five FER datasets with Gemini. We then propose a novel MLLM, named EMO-LLaMA, which incorporates facial priors from a pretrained facial analysis network to enhance human facial information. Specifically, we design a Face Info Mining module to extract both global and local facial information. Additionally, we utilize a handcrafted prompt to introduce age-gender-race attributes, considering the emotional differences across different human groups. Extensive experiments show that EMO-LLaMA achieves SOTA-comparable or competitive results across both static and dynamic FER datasets. The instruction dataset and code are available at https://github.com/xxtars/EMO-LLaMA."
Prescribing the Right Remedy: Mitigating Hallucinations in Large Vision-Language Models via Targeted Instruction Tuning,paper,2024-04-16,"Despite achieving outstanding performance on various cross-modal tasks, current large vision-language models (LVLMs) still suffer from hallucination issues, manifesting as inconsistencies between their generated responses and the corresponding images. Prior research has suggested that the low quality of instruction data, particularly the skewed balance between positive and negative samples, is a significant contributor to model hallucinations. Recently, researchers have proposed high-quality instruction datasets, such as LRV-Instruction, to mitigate model hallucination. Nonetheless, our investigation reveals that hallucinatory concepts from different LVLMs exhibit specificity, i.e. the distribution of hallucinatory concepts varies significantly across models. Existing datasets did not consider the hallucination specificity of different models in the design processes, thereby diminishing their efficacy in mitigating model hallucination. In this paper, we propose a targeted instruction data generation framework named DFTG that is tailored to the hallucination specificity of different models. Concretely, DFTG consists of two stages: hallucination diagnosis, which extracts the necessary information from the model's responses and images for hallucination diagnosis; and targeted data generation, which generates targeted instruction data based on diagnostic results. The experimental results on hallucination benchmarks demonstrate that the targeted instruction data generated by our method are more effective in mitigating hallucinations compared to previous datasets."
Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases,paper,2023-03-26,"The success of ChatGPT has recently attracted numerous efforts to replicate it, with instruction-tuning strategies being a key factor in achieving remarkable results. Instruction-tuning not only significantly enhances the model's performance and generalization but also makes the model's generated results more consistent with human speech patterns. However current research rarely studies the impact of different amounts of instruction data on model performance, especially in the real-world use cases. In this paper we explore the performance of large language models based on instruction tuning across different scales of instruction data. An evaluation dataset consisting of 12 major online use cases is constructed in the experiment. With Bloomz-7B1-mt as the base model, the results show that 1) merely increasing the amount of instruction data leads to continuous improvement in tasks such as open-ended generation, 2) in tasks such as math and code, the model performance curve remains quite flat while increasing data size. We further analyze the possible causes of these phenomena and propose potential future research directions such as effectively selecting high-quality training data, scaling base models and training methods specialized for hard tasks. We will release our training and evaluation datasets, as well as model checkpoints."
OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs,paper,2023-09-07,"Instruction-tuned Large Language Models (LLMs) have recently showcased remarkable ability to generate fitting responses to natural language instructions. However, an open research question concerns the inherent biases of trained models and their responses. For instance, if the data used to tune an LLM is dominantly written by persons with a specific political bias, we might expect generated answers to share this bias. Current research work seeks to de-bias such models, or suppress potentially biased answers. With this demonstration, we take a different view on biases in instruction-tuning: Rather than aiming to suppress them, we aim to make them explicit and transparent. To this end, we present OpinionGPT, a web demo in which users can ask questions and select all biases they wish to investigate. The demo will answer this question using a model fine-tuned on text representing each of the selected biases, allowing side-by-side comparison. To train the underlying model, we identified 11 different biases (political, geographic, gender, age) and derived an instruction-tuning corpus in which each answer was written by members of one of these demographics. This paper presents OpinionGPT, illustrates how we trained the bias-aware model and showcases the web application (available at https://opiniongpt.informatik.hu-berlin.de)."
SymNoise: Advancing Language Model Fine-tuning with Symmetric Noise,paper,2023-12-03,"In this paper, we introduce a novel fine-tuning technique for language models, which involves incorporating symmetric noise into the embedding process. This method aims to enhance the model's function by more stringently regulating its local curvature, demonstrating superior performance over the current method, NEFTune. When fine-tuning the LLaMA-2-7B model using Alpaca, standard techniques yield a 29.79% score on AlpacaEval. However, our approach, SymNoise, increases this score significantly to 69.04%, using symmetric noisy embeddings. This is a 6.7% improvement over the state-of-the-art method, NEFTune (64.69%). Furthermore, when tested on various models and stronger baseline instruction datasets, such as Evol-Instruct, ShareGPT, OpenPlatypus, SymNoise consistently outperforms NEFTune. The current literature, including NEFTune, has underscored the importance of more in-depth research into the application of noise-based strategies in the fine-tuning of language models. Our approach, SymNoise, is another significant step towards this direction, showing notable improvement over the existing state-of-the-art method."
CIDAR: Culturally Relevant Instruction Dataset For Arabic,paper,2024-02-05,"Instruction tuning has emerged as a prominent methodology for teaching Large Language Models (LLMs) to follow instructions. However, current instruction datasets predominantly cater to English or are derived from English-dominated LLMs, resulting in inherent biases toward Western culture. This bias significantly impacts the linguistic structures of non-English languages such as Arabic, which has a distinct grammar reflective of the diverse cultures across the Arab region. This paper addresses this limitation by introducing CIDAR: https://hf.co/datasets/arbml/CIDAR, the first open Arabic instruction-tuning dataset culturally-aligned by human reviewers. CIDAR contains 10,000 instruction and output pairs that represent the Arab region. We discuss the cultural relevance of CIDAR via the analysis and comparison to other models fine-tuned on other datasets. Our experiments show that CIDAR can help enrich research efforts in aligning LLMs with the Arabic culture. All the code is available at https://github.com/ARBML/CIDAR."
FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets,paper,2023-10-07,"In the swiftly expanding domain of Natural Language Processing (NLP), the potential of GPT-based models for the financial sector is increasingly evident. However, the integration of these models with financial datasets presents challenges, notably in determining their adeptness and relevance. This paper introduces a distinctive approach anchored in the Instruction Tuning paradigm for open-source large language models, specifically adapted for financial contexts. Through this methodology, we capitalize on the interoperability of open-source models, ensuring a seamless and transparent integration. We begin by explaining the Instruction Tuning paradigm, highlighting its effectiveness for immediate integration. The paper presents a benchmarking scheme designed for end-to-end training and testing, employing a cost-effective progression. Firstly, we assess basic competencies and fundamental tasks, such as Named Entity Recognition (NER) and sentiment analysis to enhance specialization. Next, we delve into a comprehensive model, executing multi-task operations by amalgamating all instructional tunings to examine versatility. Finally, we explore the zero-shot capabilities by earmarking unseen tasks and incorporating novel datasets to understand adaptability in uncharted terrains. Such a paradigm fortifies the principles of openness and reproducibility, laying a robust foundation for future investigations in open-source financial large language models (FinLLMs)."
TeaMs-RL: Teaching LLMs to Generate Better Instruction Datasets via Reinforcement Learning,paper,2024-03-13,"The development of Large Language Models (LLMs) often confronts challenges stemming from the heavy reliance on human annotators in the reinforcement learning with human feedback (RLHF) framework, or the frequent and costly external queries tied to the self-instruct paradigm. In this work, we pivot to Reinforcement Learning (RL) -- but with a twist. Diverging from the typical RLHF, which refines LLMs following instruction data training, we use RL to directly generate the foundational instruction dataset that alone suffices for fine-tuning. Our method, TeaMs-RL, uses a suite of textual operations and rules, prioritizing the diversification of training datasets. It facilitates the generation of high-quality data without excessive reliance on external advanced models, paving the way for a single fine-tuning step and negating the need for subsequent RLHF stages. Our findings highlight key advantages of our approach: reduced need for human involvement and fewer model queries (only 5.73% of the strong baseline's total), along with enhanced capabilities of LLMs in crafting and comprehending complex instructions compared to strong baselines, and substantially improved model privacy protection. Code is available at the link: https://github.com/SafeRL-Lab/TeaMs-RL"
Scaling Instruction-Finetuned Language Models,paper,2022-10-20,"Finetuning language models on a collection of datasets phrased as instructions has been shown to improve model performance and generalization to unseen tasks. In this paper we explore instruction finetuning with a particular focus on (1) scaling the number of tasks, (2) scaling the model size, and (3) finetuning on chain-of-thought data. We find that instruction finetuning with the above aspects dramatically improves performance on a variety of model classes (PaLM, T5, U-PaLM), prompting setups (zero-shot, few-shot, CoT), and evaluation benchmarks (MMLU, BBH, TyDiQA, MGSM, open-ended generation). For instance, Flan-PaLM 540B instruction-finetuned on 1.8K tasks outperforms PaLM 540B by a large margin (+9.4% on average). Flan-PaLM 540B achieves state-of-the-art performance on several benchmarks, such as 75.2% on five-shot MMLU. We also publicly release Flan-T5 checkpoints, which achieve strong few-shot performance even compared to much larger models, such as PaLM 62B. Overall, instruction finetuning is a general method for improving the performance and usability of pretrained language models."
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning,paper,2023-05-11,"Large-scale pre-training and instruction tuning have been successful at creating general-purpose language models with broad competence. However, building general-purpose vision-language models is challenging due to the rich input distributions and task diversity resulting from the additional visual input. Although vision-language pretraining has been widely studied, vision-language instruction tuning remains under-explored. In this paper, we conduct a systematic and comprehensive study on vision-language instruction tuning based on the pretrained BLIP-2 models. We gather 26 publicly available datasets, covering a wide variety of tasks and capabilities, and transform them into instruction tuning format. Additionally, we introduce an instruction-aware Query Transformer, which extracts informative features tailored to the given instruction. Trained on 13 held-in datasets, InstructBLIP attains state-of-the-art zero-shot performance across all 13 held-out datasets, substantially outperforming BLIP-2 and larger Flamingo models. Our models also lead to state-of-the-art performance when finetuned on individual downstream tasks (e.g., 90.7% accuracy on ScienceQA questions with image contexts). Furthermore, we qualitatively demonstrate the advantages of InstructBLIP over concurrent multimodal models. All InstructBLIP models are open-sourced at https://github.com/salesforce/LAVIS/tree/main/projects/instructblip."
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning,paper,2023-01-31,"We study the design decisions of publicly available instruction tuning methods, and break down the development of Flan 2022 (Chung et al., 2022). Through careful ablation studies on the Flan Collection of tasks and methods, we tease apart the effect of design decisions which enable Flan-T5 to outperform prior work by 3-17%+ across evaluation settings. We find task balancing and enrichment techniques are overlooked but critical to effective instruction tuning, and in particular, training with mixed prompt settings (zero-shot, few-shot, and chain-of-thought) actually yields stronger (2%+) performance in all settings. In further experiments, we show Flan-T5 requires less finetuning to converge higher and faster than T5 on single downstream tasks, motivating instruction-tuned models as more computationally-efficient starting checkpoints for new tasks. Finally, to accelerate research on instruction tuning, we make the Flan 2022 collection of datasets, templates, and methods publicly available at https://github.com/google-research/FLAN/tree/main/flan/v2."
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning,paper,2023-09-11,"We introduce MAmmoTH, a series of open-source large language models (LLMs) specifically tailored for general math problem-solving. The MAmmoTH models are trained on MathInstruct, our meticulously curated instruction tuning dataset. MathInstruct is compiled from 13 math datasets with intermediate rationales, six of which have rationales newly curated by us. It presents a unique hybrid of chain-of-thought (CoT) and program-of-thought (PoT) rationales, and also ensures extensive coverage of diverse fields in math. The hybrid of CoT and PoT not only unleashes the potential of tool use but also allows different thought processes for different math problems. As a result, the MAmmoTH series substantially outperform existing open-source models on nine mathematical reasoning datasets across all scales with an average accuracy gain between 16% and 32%. Remarkably, our MAmmoTH-7B model reaches 33% on MATH (a competition-level dataset), which exceeds the best open-source 7B model (WizardMath) by 23%, and the MAmmoTH-34B model achieves 44% accuracy on MATH, even surpassing GPT-4's CoT result. Our work underscores the importance of diverse problem coverage and the use of hybrid rationales in developing superior math generalist models."
How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources,paper,2023-06-07,"In this work we explore recent advances in instruction-tuning language models on a range of open instruction-following datasets. Despite recent claims that open models can be on par with state-of-the-art proprietary models, these claims are often accompanied by limited evaluation, making it difficult to compare models across the board and determine the utility of various resources. We provide a large set of instruction-tuned models from 6.7B to 65B parameters in size, trained on 12 instruction datasets ranging from manually curated (e.g., OpenAssistant) to synthetic and distilled (e.g., Alpaca) and systematically evaluate them on their factual knowledge, reasoning, multilinguality, coding, and open-ended instruction following abilities through a collection of automatic, model-based, and human-based metrics. We further introduce Tülu, our best performing instruction-tuned model suite finetuned on a combination of high-quality open resources. Our experiments show that different instruction-tuning datasets can uncover or enhance specific skills, while no single dataset (or combination) provides the best performance across all evaluations. Interestingly, we find that model and human preference-based evaluations fail to reflect differences in model capabilities exposed by benchmark-based evaluations, suggesting the need for the type of systemic evaluation performed in this work. Our evaluations show that the best model in any given evaluation reaches on average 87% of ChatGPT performance, and 73% of GPT-4 performance, suggesting that further investment in building better base models and instruction-tuning data is required to close the gap. We release our instruction-tuned models, including a fully finetuned 65B Tülu, along with our code, data, and evaluation framework at https://github.com/allenai/open-instruct to facilitate future research."
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation,paper,2024-02-28,"We introduce Bonito, an open-source model for conditional task generation that converts unannotated text into task-specific training datasets for instruction tuning. We aim to enable zero-shot task adaptation of large language models on users' specialized, private data. We train Bonito by fine-tuning a pretrained large language model on a new large-scale dataset with 1.65M examples created by remixing existing instruction tuning datasets into meta-templates. The meta-templates for a dataset produce training examples where the input is the unannotated text and the task attribute and the output consists of the instruction and the response. We use Bonito to generate synthetic tasks for seven datasets from specialized domains with unannotated text across three task types -- yes-no question answering, extractive question answering, and natural language inference -- and adapt language models. We show that Bonito significantly improves the average performance of pretrained and instruction tuned models over the de facto self supervised baseline. For example, adapting Mistral-Instruct-v2 and instruction tuned variants of Mistral and Llama2 with Bonito improves the strong zero-shot performance by 22.1 F1 points whereas the next word prediction objective undoes some of the benefits of instruction tuning and reduces the average performance by 0.8 F1 points. We conduct additional experiments with Bonito to understand the effects of the domain, the size of the training set, and the choice of alternative synthetic task generators. Overall, we show that learning with synthetic instruction tuning datasets is an effective way to adapt language models to new domains. The model, dataset, and code are available at https://github.com/BatsResearch/bonito."
Instruction Tuning for Large Language Models: A Survey,paper,2023-08-21,"This paper surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of (instruction, output) pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains and applications, along with an analysis of aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset, etc). We also review the potential pitfalls of IT and criticism against it, along with efforts pointing out current deficiencies of existing strategies, and suggest some avenues for fruitful research. Project page: github.com/xiaoya-li/Instruction-Tuning-Survey"
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest,paper,2023-07-07,"Visual instruction tuning large language model(LLM) on image-text pairs has achieved general-purpose vision-language abilities. However, the lack of region-text pairs limits their advancements to fine-grained multimodal understanding. In this paper, we propose spatial instruction tuning, which introduces the reference to the region-of-interest(RoI) in the instruction. Before sending to LLM, the reference is replaced by RoI features and interleaved with language embeddings as a sequence. Our model GPT4RoI, trained on 7 region-text pair datasets, brings an unprecedented interactive and conversational experience compared to previous image-level models. (1) Interaction beyond language: Users can interact with our model by both language and drawing bounding boxes to flexibly adjust the referring granularity. (2) Versatile multimodal abilities: A variety of attribute information within each RoI can be mined by GPT4RoI, e.g., color, shape, material, action, etc. Furthermore, it can reason about multiple RoIs based on common sense. On the Visual Commonsense Reasoning(VCR) dataset, GPT4RoI achieves a remarkable accuracy of 81.6%, surpassing all existing models by a significant margin (the second place is 75.6%) and almost reaching human-level performance of 85.0%. The code, dataset, and demo can be found at https://github.com/jshilong/GPT4RoI."
Utilization of Electronic Medical Records and Biomedical Literature to Support the Diagnosis of Rare Diseases Using Data Fusion and Collaborative Filtering Approaches,paper,2018-10-10,"Background In the United States, a rare disease is defined as one affecting no more than 200,000 patients at a given time. Patients suffering from rare diseases are often either misdiagnosed or left undiagnosed, possibly due to insufficient knowledge or experience with the rare disease on the part of clinical practitioners. With an exponentially growing volume of electronically accessible medical data, a large volume of information on thousands of rare diseases and their potentially associated diagnostic information is buried in electronic medical records (EMRs) and medical literature. Objective This study aimed to leverage information contained in heterogeneous datasets to assist rare disease diagnosis. Patients' phenotypic information in EMRs and biomedical literature could be leveraged to speed up disease diagnosis. Methods In our previous work, we advanced the use of a collaborative filtering recommendation system to support rare disease diagnostic decision making based on phenotypes derived solely from EMR data. However, the influence of using heterogeneous data with collaborative filtering was not discussed, which is an essential problem when facing large volumes of data from various resources. In this study, to further investigate the performance of collaborative filtering on heterogeneous datasets, we studied EMR data generated at Mayo Clinic as well as published article abstracts retrieved from the Semantic MEDLINE Database. Specifically, in this study, we designed different data fusion strategies from heterogeneous resources and integrated them with the collaborative filtering model.
Results We evaluated performance of the proposed system using characterizations derived from various combinations of EMR data and literature, as well as with sole EMR data. We extracted nearly 13 million EMRs from the patient cohort generated between 2010 and 2015 at Mayo Clinic and retrieved all article abstracts from the semistructured Semantic MEDLINE Database that were published till the end of 2016. We applied a collaborative filtering model and compared the performance generated by different metrics. Log likelihood ratio similarity combined with k-nearest neighbor on heterogeneous datasets showed the optimal performance in patient recommendation with area under the precision-recall curve (PRAUC) 0.475 (string match), 0.511 (systematized nomenclature of medicine [SNOMED] match), and 0.752 (Genetic and Rare Diseases Information Center [GARD] match). Log likelihood ratio similarity also performed the best with mean average precision 0.465 (string match), 0.5 (SNOMED match), and 0.749 (GARD match). Performance of rare disease prediction was also demonstrated by using the optimal algorithm. Macro-average F-measure for string, SNOMED, and GARD match were 0.32, 0.42, and 0.63, respectively. Conclusions This study demonstrated potential utilization of heterogeneous datasets in a collaborative filtering model to support rare disease diagnosis. In addition to phenotypic-based analysis, in the future, we plan to further resolve the heterogeneity issue and reduce miscommunication between EMR and literature by mining genotypic information to establish a comprehensive disease-phenotype-gene network for rare disease diagnosis."
Examining the Effect of the Ratio of Biomedical Domain to General Domain Data in Corpus in Biomedical Literature Mining,paper,2021-12-24,"Biomedical terms extracted using Word2vec, the most popular word embedding model in recent years, serve as the foundation for various natural language processing (NLP) applications, such as biomedical information retrieval, relation extraction, and recommendation systems. The objective of this study is to examine how changes in the ratio of the biomedical domain to general domain data in the corpus affect the extraction of similar biomedical terms using Word2vec. We downloaded abstracts of 214,892 articles from PubMed Central (PMC) and the 3.9 GB Billion Word (BW) benchmark corpus from the computer science community. The datasets were preprocessed and grouped into 11 corpora based on the ratio of BW to PMC, ranging from 0:10 to 10:0, and then Word2vec models were trained on these corpora. The cosine similarities between the biomedical terms obtained from the Word2vec models were then compared in each model. The results indicated that the models trained with both BW and PMC data outperformed the model trained only with medical data. The similarity between the biomedical terms extracted by the Word2vec model increased when the ratio of the biomedical domain to general domain data was 3:7 to 5:5. This study allows NLP researchers to apply Word2vec based on more information and increase the similarity of extracted biomedical terms to improve their effectiveness in NLP applications, such as biomedical information extraction."
Utilization of Electronic Medical Records and Biomedical Literature to Support Rare Disease Diagnosis (Preprint),paper,2018-06-15,"
 BACKGROUND
 In the United States, rare diseases are defined as those affecting fewer than 200,000 patients at any given time. Patients with rare diseases are frequently either misdiagnosed or left undiagnosed, possibly due in part to a lack of knowledge or experience with the rare disease on the part of care providers. With an exponentially growing volume of electronically accessible medical data, a large volume of information on thousands of rare diseases and their potentially associated diagnostic information is buried in electronic medical records (EMRs) and medical literature.
 
 
 OBJECTIVE
 We hypothesize that patients’ phenotypic information available within these heterogeneous resources (e.g., electronic medical records and biomedical literature) can be leveraged to accelerate disease diagnosis. In this study, we aimed to leverage information contained in heterogeneous datasets to assist rare disease diagnosis.
 
 
 METHODS
 In a previous study, we proposed utilizing a collaborative filtering recommendation system enriched with natural language processing and semantic techniques to assist rare disease diagnosis based on phenotypic characterizations derived solely from EMR data. In this study, in order to further investigate the performance of collaborative filtering on heterogeneous datasets, we studied EMR data generated at Mayo Clinic as well as published article abstracts retrieved from the Semantic MEDLINE Database. Specifically, in this study, we applied Tanimoto coefficient similarity, overlap coefficient similarity, Fager & McGowan coefficient similarity, and log likelihood ratio similarity with K nearest neighbor and threshold based patient neighbor algorithms on various combinations of datasets.
 
 
 RESULTS
 We evaluated different approaches to this problem using characterizations derived from various combinations of EMR data and literature, as well as with solely EMR data. We extracted 12.8 million EMRs from the Mayo Clinic unstructured patient cohort generated between 2010 and 2015 and retrieved all article abstracts from the semi-structured Semantic MEDLINE Database that were published through the end of 2016. We applied a collaborative filtering model and compared the performance generated by different metrics. Log likelihood ratio similarity combined with K nearest neighbor on heterogeneous datasets showed the optimal performance in patient recommendation with PRAUC 0.475 (string match), 0.511 (SNOMED match), and 0.752 (GARD match). Log likelihood ratio similarity also performed the best with mean average precision 0.465 (string match), 0.5 (SNOMED match), and 0.749 (GARD match). Performance of rare disease prediction was also demonstrated by using the optimal algorithm. Macro-average F-measure for string, SNOMED-CT, and GARD match were 0.32, 0.42, and 0.63, respectively.
 
 
 CONCLUSIONS
 This study demonstrated potential utilization of heterogeneous datasets in a collaborative filtering model to support rare disease diagnosis. In addition to phenotypic-based analysis, in the future, we plan to resolve the heterogeneity issue and reduce miscommunication between EMR and literature by mining genotypic information to establish a comprehensive disease-phenotype-gene network for rare disease diagnosis.
"
Classification of Medical Images in the Biomedical Literature by Jointly Using Deep and Handcrafted Visual Features,paper,2018-09-01,"The classification of medical images and illustrations from the biomedical literature is important for automated literature review, retrieval, and mining. Although deep learning is effective for large-scale image classification, it may not be the optimal choice for this task as there is only a small training dataset. We propose a combined deep and handcrafted visual feature (CDHVF) based algorithm that uses features learned by three fine-tuned and pretrained deep convolutional neural networks (DCNNs) and two handcrafted descriptors in a joint approach. We evaluated the CDHVF algorithm on the ImageCLEF 2016 Subfigure Classification dataset and it achieved an accuracy of 85.47%, which is higher than the best performance of other purely visual approaches listed in the challenge leaderboard. Our results indicate that handcrafted features complement the image representation learned by DCNNs on small training datasets and improve accuracy in certain medical image classification problems."
Health assistant: answering your questions anytime from biomedical literature,paper,2019-03-18,"MOTIVATION
With abundant medical resources, especially literature, available online, it is possible for people to understand their own health status and relevant problems autonomously. However, obtaining the most appropriate answer from an increasingly large-scale database remains a great challenge. Here, we present a biomedical question answering framework and implement a system, Health Assistant, to enable the search process.


METHODS
In Health Assistant, a search engine is first designed to rank biomedical documents based on their contents. Then various query processing and search techniques are used to find the relevant documents. Afterwards, the titles and abstracts of the top-N documents are extracted to generate candidate snippets. Finally, our custom query processing and retrieval approaches for short text are applied to locate the relevant snippets to answer the questions.


RESULTS
Our system is evaluated on the BioASQ benchmark datasets, and experimental results demonstrate the effectiveness and robustness of our system, compared to BioASQ participant systems and some state-of-the-art methods on both document retrieval and snippet retrieval tasks. A demo of our system is available at https://github.com/jinzanxia/biomedical-QA."
Trends in using deep learning algorithms in biomedical prediction systems,paper,2023-11-09,"In the domain of using DL-based methods in medical and healthcare prediction systems, the utilization of state-of-the-art deep learning (DL) methodologies assumes paramount significance. DL has attained remarkable achievements across diverse domains, rendering its efficacy particularly noteworthy in this context. The integration of DL with health and medical prediction systems enables real-time analysis of vast and intricate datasets, yielding insights that significantly enhance healthcare outcomes and operational efficiency in the industry. This comprehensive literature review systematically investigates the latest DL solutions for the challenges encountered in medical healthcare, with a specific emphasis on DL applications in the medical domain. By categorizing cutting-edge DL approaches into distinct categories, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), generative adversarial networks (GANs), long short-term memory (LSTM) models, support vector machine (SVM), and hybrid models, this study delves into their underlying principles, merits, limitations, methodologies, simulation environments, and datasets. Notably, the majority of the scrutinized articles were published in 2022, underscoring the contemporaneous nature of the research. Moreover, this review accentuates the forefront advancements in DL techniques and their practical applications within the realm of medical prediction systems, while simultaneously addressing the challenges that hinder the widespread implementation of DL in image segmentation within the medical healthcare domains. These discerned insights serve as compelling impetuses for future studies aimed at the progressive advancement of using DL-based methods in medical and health prediction systems. 
The evaluation metrics employed across the reviewed articles encompass a broad spectrum of features, encompassing accuracy, precision, specificity, F-score, adoptability, adaptability, and scalability."
Bio-Medical Multi-label Scientific Literature Classification using LWAN and Dual-attention module,paper,2022,"An enormous amount of research has been undertaken to overcome the severe impact of the COVID-19 pandemic. These scientific findings are being reported in biomedical literature at a significant rate of 10,000 articles/month. In this paper, we tackle automated topic annotation for COVID-19 literature using SPECTER, Bioformer, and PubMedBERT embeddings with Label-Wise Attention Network (LWAN) based Multi-Label Document Classification (MLDC) and a Dual-attention module. We also include literature from the cardiovascular domain to generalise our proposed approach. We achieve F1-scores of 87.71%, 72.83%, and 79.75% on the LitCovid, Obsumed, and WHO-Covid datasets. We release our code-base at https://github.com/Deepanshu-beep/MLDC_LWAN_Attention."
Ambiguity in medical concept normalization: An analysis of types and coverage in electronic health record datasets,paper,2020-12-15,"OBJECTIVES
Normalizing mentions of medical concepts to standardized vocabularies is a fundamental component of clinical text analysis. Ambiguity (words or phrases that may refer to different concepts) has been extensively researched as part of information extraction from biomedical literature, but less is known about the types and frequency of ambiguity in clinical text. This study characterizes the distribution and distinct types of ambiguity exhibited by benchmark clinical concept normalization datasets, in order to identify directions for advancing medical concept normalization research.


MATERIALS AND METHODS
We identified ambiguous strings in datasets derived from the 2 available clinical corpora for concept normalization and categorized the distinct types of ambiguity they exhibited. We then compared observed string ambiguity in the datasets with potential ambiguity in the Unified Medical Language System (UMLS) to assess how representative available datasets are of ambiguity in clinical language.


RESULTS
We found that <15% of strings were ambiguous within the datasets, while over 50% were ambiguous in the UMLS, indicating only partial coverage of clinical ambiguity. The percentage of strings in common between any pair of datasets ranged from 2% to only 36%; of these, 40% were annotated with different sets of concepts, severely limiting generalization. Finally, we observed 12 distinct types of ambiguity, distributed unequally across the available datasets, reflecting diverse linguistic and medical phenomena.


DISCUSSION
Existing datasets are not sufficient to cover the diversity of clinical concept ambiguity, limiting both training and evaluation of normalization methods for clinical text. Additionally, the UMLS offers important semantic information for building and evaluating normalization methods.


CONCLUSIONS
Our findings identify 3 opportunities for concept normalization research, including a need for ambiguity-specific clinical datasets and leveraging the rich semantics of the UMLS in new methods and evaluation measures for normalization."
Study of classification of traditional Chinese medicine datasets based on the business domain,paper,2016-05-30,"TCM informatization has accelerated during the rapid development of the traditional Chinese medicine (TCM) industry. As the business domains of TCM datasets grow increasingly rich, it is important to classify TCM industry management datasets by business domain. This study analyzes the information flow in the business domains of TCM industry management datasets, based on expert research. A combination of facet and line classification was applied, and TCM industry management data resources were then divided into eight business domains to compile the ""Classification code table of TCM industry management datasets on the business domains"". The results of this study helped the exchange of basic information among hospitals, universities, research institutes, enterprises, cultural industries, Chinese medical research centers, general administration and other institutions. They can also provide a reference for strategic decisions in TCM and health services. Key words: Business domain; TCM industry management; Datasets; Classification"
"Research on Anti-Alzheimer’s Traditional Chinese Medicine with Data Security: Datasets, Methods, and Evaluation",paper,2022-03-22,"Alzheimer’s disease (AD), a growing global health concern, has been posing a significant threat to the health of the aging population. The factors contributing to the occurrence and development of AD are extremely complex, including multiple neural networks and multiple targets, which join together to formulate enormous challenges in AD treatment. Traditional Chinese medicine (TCM) possesses the characteristics to regulate multiple targets at the same time, which is consistent with the pathogenesis of AD, moreover, clinical results in TCM treating AD reveal promising effects. In this paper, we first collected anti-Alzheimer’s prescriptions and their therapeutic effects from commonly used literature databases and expanded the data to form the anti-Alzheimer’s TCM dataset. Next, we combined machine learning models to train and analyze the dataset, which was used to predict the effectiveness of new TCM prescriptions. For the first time, we proposed to use the artificial intelligence method to train the properties of nature, flavor, and channel tropism in TCM prescriptions. The accuracy of the prediction model for the effectiveness of anti-Alzheimer’s can reach up to 85%. The experimental results demonstrated that our method can precisely predict the effectiveness of prescriptions against Alzheimer’s disease, and have great value in providing guidance for the development of new anti-Alzheimer’s drugs. Finally, we built a distributed model training architecture based on federated learning to train and predict the effectiveness of TCM prescriptions under the premise of ensuring data security."
Rule-Based Representation Learning for Traditional Chinese Medicine Knowledge Graph,paper,2023-05-23,"Traditional Chinese medicine (TCM) has a unique advantage of preventive treatment of diseases, and adopting the concept of early intervention can effectively prevent diseases. Using knowledge graph is an effective way while the knowledge in the field of TCM is huge and messy. However, the structure of the TCM knowledge graph is often relatively sparse, which makes it highly limited. To this end, a rule-based compositional representation learning (RCRL) model is proposed. RCRL uses the implicit rules in the TCM knowledge graph, which solves the problem of poor representation learning due to the sparse structure of the TCM knowledge graph to a certain extent. Extensive experiments are conducted on the TCM knowledge graph and public datasets, and they are compared with other baselines. Experimental results show that RCRL is superior to other baselines, with improved learning accuracy and interpretability, and can be used for various downstream tasks."
The Integrated Bioinformatic Assay of Genetic Expression Features and Analyses of Traditional Chinese Medicine Specific Constitution Reveal Metabolic Characteristics and Targets in Steatosis of Nonalcoholic Fatty Liver Disease,paper,2023-10-01,"Purpose In this study, our primary aim is to analyze the genetic expression features and the distribution of specific Traditional Chinese medicine (TCM) constitutions in non-alcoholic fatty liver disease (NAFLD) and to reveal the metabolic characteristics of NAFLD. Materials and Methods To reveal genetic features, we obtained gene expression data from the Gene Expression Omnibus (GEO) database of the National Center for Biotechnology Information (NCBI). The genetic data on NAFLD were analyzed by identifying differentially expressed genes (DEGs), associated pathways, co-expressed genetic networks, and gene set enrichment function. Concurrently, we assessed specific constitution distributions among local NAFLD patients through established TCM constitution models and determined the independent variables of NAFLD, including specific constitutions, via regression analyses. Results The analyses on GEO datasets showed that simple steatosis in NAFLD is strongly associated with HOMA-insulin resistance (HOMA-IR). Analyses of GEO datasets revealed significantly altered genetic expression profiles between NAFLD and normal populations. For TCM constitution analyses, we demonstrated a decline in yin-yang harmony (YYH) and yang-asthenia (YAAC) constitution, whereas there was an increase in qi-stagnation (QSC) and phlegm-dampness (PDC) in NAFLD. The binary logistic regression analysis indicated that besides other metabolic parameters, YYH, qi asthenia (QAC), YYAC, and yin-asthenia (YAC) were the independent variables of NAFLD, while YAC was an independent variable of T2D. The multilinear regression analyses suggested that NAFLD, DM, BMI, waist, TC, TG, hypertension, ALT, AST, and YAC were the significant determinants of FPG.
Conclusion This study presents a relatively comprehensive metabolic profile in steatosis of NAFLD, revealed by significant genetic expression feature alterations and different TCM constitution distribution in NAFLD. Through this method, the study intends to associate the genetic feature with the phenotype of TCM constitution. The results could be applied to assist integrative medicine research in exploring the appropriate personalized approaches for NAFLD."
A Multi-Objective Hyper-Heuristic Clustering Algorithm for Formulas in Traditional Chinese Medicine,paper,2023,"Syndrome types are important for diagnosis and treatment in traditional Chinese medicine. Syndrome types can be summarized by domain experts as formula clusters. In this paper, we propose seven feature models for the formula clustering problem based on categories, subcategories, functional tendencies and names of Chinese materia medica. A novel multi-objective clustering hyper-heuristic algorithm is obtained. In our proposed algorithm, 12 low-level heuristics are used for clustering solution perturbation by merging clusters, dividing clusters or moving points between clusters based on received solutions from the high-level heuristic. The high-level heuristic evaluates the received solutions from low-level heuristics, updates the solution pool, and selects initial solutions for the next iteration via roulette wheel selection on the Pareto front. Experimental results demonstrate that the proposed algorithm outperforms other clustering algorithms in most datasets. The initial number of clusters has less influence on the final clustering solutions for our proposed algorithm than for other clustering algorithms. For most datasets, the roulette wheel selection mechanism on the Pareto front shows higher convergence rates and accuracy than a random selection mechanism. Accuracy was higher for feature models based on functional tendencies than for the other feature models."
YiNet: An Integrated Traditional Chinese and Western Medicine Platform for Viral Infectious Diseases,paper,2023-08-13,"Viral infectious diseases (VIDs) impose a heavy burden on global public health. Traditional Chinese medicine (TCM) has previously and is currently contributing to the prevention and treatment of infectious diseases. Omics and information technology enable the precise identification of virus characteristics, virus‒host interactions and TCM mechanisms. In this study, we constructed the YiNet platform to better integrate the novel techniques and historical experience of TCM in infectious diseases. YiNet comprises three modules: knowledge base, database and toolkit. The YiNet knowledge base involves 43 VIDs, thereby systematically integrating the knowledge regarding viruses, host symptoms and TCM and Western medicine (commonly used chemical drugs, 6,899 herbs and 2,481 formulas). The YiNet database module includes multiple databases, comprising 57,340 genome sequences of 45,791 viral strains and 5,726 multi-omics datasets such as RNA-seq, ChIP-seq and ATAC-seq from different tissues and cell models. It also integrates 1,105 real-world TCM clinical cases. We adopted visual analysis tool to investigate pathogen–host–herb relationships. To explore pharmacological mechanisms for the core herbs, we added formula data mining and network pharmacology analysis pipelines and visualisation tools. YiNet can facilitate the mechanistic study of TCM and drug development for VIDs. The YiNet platform is publicly available at http://yinet.gene.ac/."
"Identification of canonical pyroptosis-related genes, associated regulation axis, and related traditional Chinese medicine in spinal cord injury",paper,2023-05-18,"Neuroinflammation plays an important role in spinal cord injury (SCI), and pyroptosis is inflammatory-related programmed cell death. Although neuroinflammation induced by pyroptosis has been reported in SCI, there is a lack of systematic research on SCI pyroptosis and its regulation mechanism. The purpose of this study was to systematically analyze the expression of pyroptosis-related genes (PRGs) in different SCI models and associated regulation axis by bioinformatics methods. We downloaded raw counts data of seven high-throughput sequencing and two microarray datasets from the GEO database, classified by species (rat and mouse) and SCI modes (moderate contusive model, aneurysm clip impact-compression model, and hemisection model), including mRNAs, miRNAs, lncRNAs, and circRNAs, basically covering the acute, subacute and chronic stages of SCI. We performed differential analysis by R (DEseq2) or GEO2R and found that the AIM2/NLRC4/NLRP3 inflammasome-related genes, GSDMD, IL1B, and IL18, were highly expressed in SCI. Based on the canonical NLRP3 inflammasome-mediated pyroptosis-related genes (NLRP3/PRGs), we constructed transcription factors (TFs)–NLRP3/PRGs, miRNAs–Nlrp3/PRGs and lncRNAs/circRNAs/mRNAs–miRNA–Nlrp3/PRGs (ceRNA) networks. In addition, we also predicted traditional Chinese medicine (TCM) and small, drug-like molecules with NLRP3/PRGs as potential targets. Finally, 39 up-regulated TFs were identified, which may regulate at least two of NLRP3/PRGs. A total of 7 down-regulated miRNAs were identified which could regulate Nlrp3/PRGs. ceRNA networks were constructed including 23 lncRNAs, 3 circRNAs, 6 mRNAs, and 44 miRNAs. A total of 24 herbs were identified that may have two NLRP3/PRGs as potential targets. This study is expected to provide new ideas and therapeutic targets for the treatment of SCI."
Bioinformatics analysis of key genes in patients with sarcoidosis and prediction of traditional Chinese Medicine,paper,2023-02-10,"Bioinformatics methods were used to analyze the key genes and related signaling pathways of sarcoidosis. RNA-seq data for sarcoidosis were downloaded from the Gene Expression Omnibus (GEO) database (GSE42826 and GSE42830), and differentially expressed genes (DEGs) were extracted from the two chip datasets. We used Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Disease Ontology (DO) analyses to analyze the DEGs, and the Cytoscape 3.8.2 software was used to visualize the DEGs. The ggpubr package was used to draw the volcano map of DEGs, and the ggplot package was used to draw bubble charts for the GO, KEGG, and DO analyses. PPI analysis was used to identify hub genes, and the hub genes were verified in the GSE19314 dataset. A total of 64 DEGs were obtained in the GSE42826 and GSE42830 datasets, of which 17 genes were down-regulated and 47 genes were up-regulated. GO analysis indicated that DEGs were mainly enriched in external stimuli, defense responses, and responses to biological stimuli in other biological processes. KEGG analysis showed that DEGs mainly affect NOD-like receptor signaling pathways, programmed cell death-ligand 1 (PD-L1) expression, programmed death-1 (PD-1) checkpoint pathways in cancer, and cytoplasmic deoxyribonucleic acid (DNA) sensing pathways. The results of DO analysis showed that DEGs were associated with bacterial infectious diseases, hepatitis, and primary bacterial infectious diseases. 8 hub genes, including C-X-C motif chemokine ligand 10 (CXCL10), interferon induced protein 44 (IFI44) and interferon induced protein with tetratricopeptide repeats 3 (IFIT3), were all significantly up-regulated in the sarcoidosis group. Further analysis showed that sarcoidosis was sensitive to pinellia, radix isatidis, ephedra and phellodendri. This work showed that 10 hub genes may become relevant targets for diagnosis and treatment of patients with sarcoidosis and provided a new idea for the pathogenesis and treatment of sarcoidosis."
Disease Markers and Therapeutic Targets for Rheumatoid Arthritis Identified by Integrating Bioinformatics Analysis with Virtual Screening of Traditional Chinese Medicine.,paper,2022-09-28,"OBJECTIVE
The aim of this study was to identify potentially important Rheumatoid arthritis (RA) targets related to immune cells based on bioinformatics analysis, and to identify small molecules of traditional Chinese medicine (TCM) associated with these targets that have potential therapeutic effects on RA.


METHODS
Gene expression profile data related to RA were downloaded from the Gene Expression Omnibus (GSE55235, GSE55457, and GSE77298), and datasets were merged by the batch effect removal method. The RA key gene set was identified by protein-protein interaction network analysis and machine learning-based feature extraction. Furthermore, immune cell infiltration analysis was carried out on all DEGs to obtain key RA markers related to immune cells. Batch molecular docking of key RA markers was performed on our previously compiled dataset of small molecules in TCM using AutoDock Vina. Moreover, in vitro experiments were performed to examine the inhibitory effect of screened compounds on the synovial cells of an RA rat model.


RESULTS
The PPI network and feature extraction with machine learning classifiers identified eight common key RA genes: MYH11, CFP, LY96, IGJ, LPL, CD48, RAC2, and CSK. RAC2 was significantly correlated with the infiltration and expression of five immune cells, with significant differences in these immune cells in the normal and RA samples. Molecular docking and in vitro experiments also showed that sanguinarine, sesamin, and honokiol could effectively inhibit the proliferation of RA rat synovial cells, effectively inhibit the secretion of TNF-α and IL-1β in synovial cells, and exert a certain inhibitory effect on expression of the target protein RAC2.


CONCLUSIONS
The core gene set of RA was screened from a new perspective, revealing biomarkers related to immune cell infiltration. Using molecular docking, we screened out TCM small molecules for the treatment of RA, providing methods and technical support for the treatment of RA with TCM."
"Panacea: A foundation model for clinical trial search, summarization, design, and recruitment",paper,2024-06-25,"Clinical trials are fundamental in developing new drugs, medical devices, and treatments. However, they are often time-consuming and have low success rates. Although there have been initial attempts to create large language models (LLMs) for clinical trial design and patient-trial matching, these models remain task-specific and not adaptable to diverse clinical trial tasks. To address this challenge, we propose a clinical trial foundation model named Panacea, designed to handle multiple tasks, including trial search, trial summarization, trial design, and patient-trial matching. We also assemble a large-scale dataset, named TrialAlign, of 793,279 trial documents and 1,113,207 trial-related scientific papers, to infuse clinical knowledge into the model by pre-training. We further curate TrialInstruct, which contains 200,866 instruction data samples for fine-tuning. These resources enable Panacea to be widely applicable for a range of clinical trial tasks based on user requirements. We evaluated Panacea on a new benchmark, named TrialPanorama, which covers eight clinical trial tasks. Our method performed the best on seven of the eight tasks compared to six cutting-edge generic or medicine-specific LLMs. Specifically, Panacea showed great potential to collaborate with human experts in crafting the design of eligibility criteria, study arms, and outcome measures, in multi-round conversations. In addition, Panacea achieved 14.42% improvement in patient-trial matching, 41.78% to 52.02% improvement in trial search, and consistently ranked at the top for five aspects of trial summarization. Our approach demonstrates the effectiveness of Panacea in clinical trials and establishes a comprehensive resource, including training data, model, and benchmark, for developing clinical trial foundation models, paving the path for AI-based clinical trial development."
ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation,paper,2023-06-16,"Large language models have exhibited exceptional performance on various Natural Language Processing (NLP) tasks, leveraging techniques such as pre-training and instruction fine-tuning. Despite these advances, their effectiveness in medical applications is limited, due to challenges such as factual inaccuracies, reasoning abilities, and lack of grounding in real-world experience. In this study, we present ClinicalGPT, a language model explicitly designed and optimized for clinical scenarios. By incorporating extensive and diverse real-world data, such as medical records, domain-specific knowledge, and multi-round dialogue consultations in the training process, ClinicalGPT is better prepared to handle multiple clinical tasks. Furthermore, we introduce a comprehensive evaluation framework that includes medical knowledge question-answering, medical exams, patient consultations, and diagnostic analysis of medical records. Our results demonstrate that ClinicalGPT significantly outperforms other models in these tasks, highlighting the effectiveness of our approach in adapting large language models to the critical domain of healthcare."
"PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance",paper,2023-06-08,"Although large language models (LLMs) have shown great performance on natural language processing (NLP) in the financial domain, there are no publicly available financially tailored LLMs, instruction tuning datasets, and evaluation benchmarks, which are critical for continually pushing forward the open-source development of financial artificial intelligence (AI). This paper introduces PIXIU, a comprehensive framework including the first financial LLM based on fine-tuning LLaMA with instruction data, the first instruction data with 136K data samples to support the fine-tuning, and an evaluation benchmark with 5 tasks and 9 datasets. We first construct the large-scale multi-task instruction data considering a variety of financial tasks, financial document types, and financial data modalities. We then propose a financial LLM called FinMA by fine-tuning LLaMA with the constructed dataset to be able to follow instructions for various financial tasks. To support the evaluation of financial LLMs, we propose a standardized benchmark that covers a set of critical financial tasks, including five financial NLP tasks and one financial prediction task. With this benchmark, we conduct a detailed analysis of FinMA and several existing LLMs, uncovering their strengths and weaknesses in handling critical financial tasks. The model, datasets, benchmark, and experimental results are open-sourced to facilitate future research in financial AI."
Large-Scale Biomedical Relation Extraction Across Diverse Relation Types: Model Development and Usability Study on COVID-19,paper,2023-04-11,"Background Biomedical relation extraction (RE) is of great importance for researchers to conduct systematic biomedical studies. It not only helps knowledge mining, such as knowledge graphs and novel knowledge discovery, but also promotes translational applications, such as clinical diagnosis, decision-making, and precision medicine. However, the relations between biomedical entities are complex and diverse, and comprehensive biomedical RE is not yet well established. Objective We aimed to investigate and improve large-scale RE with diverse relation types and conduct usability studies with application scenarios to optimize biomedical text mining. Methods Data sets containing 125 relation types with different entity semantic levels were constructed to evaluate the impact of entity semantic information on RE, and performance analysis was conducted on different model architectures and domain models. This study also proposed a continued pretraining strategy and integrated models with scripts into a tool. Furthermore, this study applied RE to the COVID-19 corpus with article topics and application scenarios of clinical interest to assess and demonstrate its biological interpretability and usability. Results The performance analysis revealed that RE achieves the best performance when the detailed semantic type is provided. For a single model, PubMedBERT with continued pretraining performed the best, with an F1-score of 0.8998. Usability studies on COVID-19 demonstrated the interpretability and usability of RE, and a relation graph database was constructed, which was used to reveal existing and novel drug paths with edge explanations. 
The models (including pretrained and fine-tuned models), integrated tool (Docker), and generated data (including the COVID-19 relation graph database and drug paths) have been made publicly available to the biomedical text mining community and clinical researchers. Conclusions This study provided a comprehensive analysis of RE with diverse relation types. Optimized RE models and tools for diverse relation types were developed, which can be widely used in biomedical text mining. Our usability studies provided a proof-of-concept demonstration of how large-scale RE can be leveraged to facilitate novel research."
Taiyi: A Bilingual Fine-Tuned Large Language Model for Diverse Biomedical Tasks,paper,2023-11-20,"OBJECTIVE
Most existing fine-tuned biomedical large language models (LLMs) focus on enhancing performance in monolingual biomedical question answering and conversation tasks. To investigate the effectiveness of the fine-tuned LLMs on diverse biomedical natural language processing (NLP) tasks in different languages, we present Taiyi, a bilingual fine-tuned LLM for diverse biomedical NLP tasks.


MATERIALS AND METHODS
We first curated a comprehensive collection of 140 existing biomedical text mining datasets (102 English and 38 Chinese datasets) across over 10 task types. Subsequently, these corpora were converted to the instruction data used to fine-tune the general LLM. During the supervised fine-tuning phase, a 2-stage strategy is proposed to optimize the model performance across various tasks.


RESULTS
Experimental results on 13 test sets, which include named entity recognition, relation extraction, text classification, and question answering tasks, demonstrate that Taiyi achieves superior performance compared to general LLMs. The case study involving additional biomedical NLP tasks further shows Taiyi's considerable potential for bilingual biomedical multitasking.


CONCLUSION
Leveraging rich high-quality biomedical corpora and developing effective fine-tuning strategies can significantly improve the performance of LLMs within the biomedical domain. Taiyi shows the bilingual multitasking capability through supervised fine-tuning. However, those tasks such as information extraction that are not generation tasks in nature remain challenging for LLM-based generative approaches, and they still underperform the conventional discriminative approaches using smaller language models."
Synthetic Augmentation with Large-scale Unconditional Pre-training,paper,2023-08-08,"Deep learning based medical image recognition systems often require a substantial amount of training data with expert annotations, which can be expensive and time-consuming to obtain. Recently, synthetic augmentation techniques have been proposed to mitigate the issue by generating realistic images conditioned on class labels. However, the effectiveness of these methods heavily depends on the representation capability of the trained generative model, which cannot be guaranteed without sufficient labeled training data. To further reduce the dependency on annotated data, we propose a synthetic augmentation method called HistoDiffusion, which can be pre-trained on large-scale unlabeled datasets and later applied to a small-scale labeled dataset for augmented training. In particular, we train a latent diffusion model (LDM) on diverse unlabeled datasets to learn common features and generate realistic images without conditional inputs. Then, we fine-tune the model with classifier guidance in latent space on an unseen labeled dataset so that the model can synthesize images of specific categories. Additionally, we adopt a selective mechanism to only add synthetic samples with high confidence of matching to target labels. We evaluate our proposed method by pre-training on three histopathology datasets and testing on a histopathology dataset of colorectal cancer (CRC) excluded from the pre-training datasets. With HistoDiffusion augmentation, the classification accuracy of a backbone classifier is remarkably improved by 6.4% using a small set of the original labels. Our code is available at https://github.com/karenyyy/HistoDiffAug."
Exploring the Impact of Instruction Data Scaling on Large Language Models: An Empirical Study on Real-World Use Cases,paper,2023-03-26,"The success of ChatGPT has recently attracted numerous efforts to replicate it, with instruction-tuning strategies being a key factor in achieving remarkable results. Instruction-tuning not only significantly enhances the model's performance and generalization but also makes the model's generated results more consistent with human speech patterns. However, current research rarely studies the impact of different amounts of instruction data on model performance, especially in real-world use cases. In this paper, we explore the performance of large language models based on instruction tuning across different scales of instruction data. An evaluation dataset consisting of 12 major online use cases is constructed in the experiment. With Bloomz-7B1-mt as the base model, the results show that 1) merely increasing the amount of instruction data leads to continuous improvement in tasks such as open-ended generation, 2) in tasks such as math and code, the model performance curve remains quite flat while increasing data size. We further analyze the possible causes of these phenomena and propose potential future research directions such as effectively selecting high-quality training data, scaling base models and training methods specialized for hard tasks. We will release our training and evaluation datasets, as well as model checkpoints."
CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval,paper,2022-02-15,"We introduce CommerceMM - a multimodal model capable of providing a diverse and granular understanding of commerce topics associated with the given piece of content (image, text, image+text), and having the capability to generalize to a wide range of tasks, including Multimodal Categorization, Image-Text Retrieval, Query-to-Product Retrieval, Image-to-Product Retrieval, etc. We follow the pre-training + fine-tuning training regime and present 5 effective pre-training tasks on image-text pairs. To embrace more common and diverse commerce data with text-to-multimodal, image-to-multimodal, and multimodal-to-multimodal mapping, we propose another 9 novel cross-modal and cross-pair retrieval tasks, called Omni-Retrieval pre-training. We also propose a novel approach of modality randomization to dynamically adjust our model under different efficiency constraints. The pre-training is conducted in an efficient manner with only two forward/backward updates for the combined 14 tasks. Extensive experiments and analysis show the effectiveness of each task. When combining all pre-training tasks, our model achieves state-of-the-art performance on 7 commerce-related downstream tasks after fine-tuning."
"RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models",paper,2023-10-01,"The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters. However, the closed-source nature of state-of-the-art LLMs and their general-purpose training limit role-playing optimization. In this paper, we introduce RoleLLM, a framework to benchmark, elicit, and enhance role-playing abilities in LLMs. RoleLLM comprises four stages: (1) Role Profile Construction for 100 roles; (2) Context-Based Instruction Generation (Context-Instruct) for role-specific knowledge extraction; (3) Role Prompting using GPT (RoleGPT) for speaking style imitation; and (4) Role-Conditioned Instruction Tuning (RoCIT) for fine-tuning open-source models along with role customization. By Context-Instruct and RoleGPT, we create RoleBench, the first systematic and fine-grained character-level benchmark dataset for role-playing with 168,093 samples. Moreover, RoCIT on RoleBench yields RoleLLaMA (English) and RoleGLM (Chinese), significantly enhancing role-playing abilities and even achieving comparable results with RoleGPT (using GPT-4)."
Character-LLM: A Trainable Agent for Role-Playing,paper,2023-10-16,"Large language models (LLMs) can be used to serve as agents to simulate human behaviors, given the powerful ability to understand human instructions and provide high-quality generated texts. Such ability stimulates us to wonder whether LLMs can simulate a person in a higher form than simple human behaviors. Therefore, we aim to train an agent with the profile, experience, and emotional states of a specific person instead of using limited prompts to instruct ChatGPT API. In this work, we introduce Character-LLM that teaches LLMs to act as specific people such as Beethoven, Queen Cleopatra, Julius Caesar, etc. Our method focuses on editing profiles as experiences of a certain character and training models to be personal simulacra with these experiences. To assess the effectiveness of our approach, we build a test playground that interviews trained agents and evaluates whether the agents memorize their characters and experiences. Experimental results show interesting observations that help build future simulacra of humankind."
Social Conflict in Role-Playing Communities: An Exploratory Qualitative Study,paper,2023-03-16,"Much of the current research in the field of role-playing studies focuses upon the positive impact that games can have on the lives of participants. Analysis of the more negative social interactions within role-playing communities becomes necessary in order to establish a more complete picture of the psychosocial effects of these games. This research describes potential problems within role-playing communities in order to aid groups experiencing cohesion difficulties.
This thematic, qualitative ethnography describes the types of social conflict occurring within role-playing groups and examines possible sources for their exacerbation. The study includes several types of role-playing from a phenomenological perspective, including tabletop, larp, and virtual gaming. Semi-structured interviews were collected from a selective sample of 30 international participants gathered from vastly different play cultures. While the types of games and methods of play contributed to conflict in some instances, striking similarities between the experiences of players across modes, cultures, and genres were observed.
Emergent themes for sources of conflict included general problems inherent to group behavior, such as schisms, Internet communication, and intimate relationships. Other sources of conflict unique to the role-playing experience included creative agenda differences, the game master/player power differential, and the phenomenon of bleed, both in- and out-of-game. Potentially conflict-inducing play styles included long-term immersion into character, campaign-style, and competitive play."
Implementation of Role-Playing Games in Overcoming Introverted Children,paper,2021,"This study aims to analyze and examine the application of role-playing games in overcoming introverted children in early childhood at RA Uswatun Hasanah, Maron, Probolinggo. This research uses a qualitative approach, while the type of research uses case studies. The data analysis technique uses data reduction, data display, and drawing conclusions or verification. The results showed that the teacher's steps in implementing role-playing games to overcome the problems of introverted children were: preparation and planning analysis, role-playing engineering, activity documentation, and activity evaluation. This research has implications for the use of role-playing, especially at RA Uswatun Hasanah. Introverted children begin to be able to mingle and even adapt to their friends, albeit slowly."
Educational Innovation in Higher Education: Use of Role Playing and Educational Video in Future Teachers’ Training,paper,2020-03-24,"Information and communication technologies (ICTs) have led to the emergence of a variety of active and innovative teaching methods. This is the case in role-playing, which consists of simulating a real-life situation, in this case the school context, in which the student takes on a certain role and interacts with other students in a fictitious situation. Framed in this way, the present study aims to show if the application of the role-playing method promotes the improvement of attitude variables and practical skills. To this end, we advocated the use of a quasi-experimental methodology, with a control and experimental group and the application of a post-test. The sample is composed of 138 students from the Master of Teachers of Compulsory Secondary Education in Ceuta (Spain). The results showed that the students positively valued the application of the method, obtaining better scores in the set of variables studied, especially in motivation, creativity and collaboration. Therefore, it continues to be observed that the application of innovative methodologies through technology promotes the increase of multiple skills in the student body. This study aimed to prove that the use of active methods provides an increase in students’ skills, and that, therefore, we must bet on the use of sustainable pedagogies in order to promote a real innovation in the classrooms."
What.Hack: Engaging Anti-Phishing Training Through a Role-playing Phishing Simulation Game,paper,2019-05-02,"Phishing attacks are a major problem, as evidenced by the DNC hackings during the 2016 US presidential election, in which staff were tricked into sharing passwords by fake Google security emails, granting access to confidential information. Vulnerabilities such as these are due in part to insufficient and tiresome user training in cybersecurity. Ideally, we would have more engaging training methods that teach cybersecurity in an active and entertaining way. To address this need, we introduce the game What.Hack, which not only teaches phishing concepts but also simulates actual phishing attacks in a role-playing game to encourage the player to practice defending themselves. Our user study shows that our game design is more engaging and effective in improving performance than a standard form of training and a competing training game design (which does not simulate phishing attempts through role-playing)."
Computer-Generated Music for Tabletop Role-Playing Games,paper,2020-08-16,"In this paper we present Bardo Composer, a system to generate background music for tabletop role-playing games. Bardo Composer uses a speech recognition system to translate player speech into text, which is classified according to a model of emotion. Bardo Composer then uses Stochastic Bi-Objective Beam Search, a variant of Stochastic Beam Search that we introduce in this paper, with a neural model to generate musical pieces conveying the desired emotion. We performed a user study with 116 participants to evaluate whether people are able to correctly identify the emotion conveyed in the pieces generated by the system. In our study we used pieces generated for Call of the Wild, a Dungeons and Dragons campaign available on YouTube. Our results show that human subjects could correctly identify the emotion of the generated music pieces as accurately as they were able to identify the emotion of pieces written by humans."
A virtual reality role-playing serious game for experiential learning,paper,2019-12-17,"Educational systems can benefit from Virtual Reality’s (VR) ability to support experiential learning. In particular, VR based games, especially role-playing serious games (RPGs), can promote learning through the simulation of various educational scenarios. This study proposes an immersive VR-RPG to educate players about the behavior of honeybees. The player adopts the role of a honeybee and experiences a virtual world mimicking the real one from the honeybee’s perspective. Unlike most studies in educational VR, we assess the impact of immersion on knowledge gain by testing the players’ knowledge on the subject before, immediately after, and one week following the use of the system. We also compare the proposed system with both a conventional and a desktop VR-RPG approach. The results indicate that students significantly gained knowledge in all methods compared to the pre-test. We found that the immersion level for both tested VR-RPGs did not have a significant effect on learning. However, the study showed an improvement in knowledge retention for the desktop VR-RPG users compared to those of the conventional method. Moreover, the results revealed that users of the immersive and desktop VR-RPGs were more motivated and engaged compared to those of the conventional method."
CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation,paper,2024-01-02,"Recently, the advent of large language models (LLMs) has revolutionized generative agents. Among them, Role-Playing Conversational Agents (RPCAs) attract considerable attention due to their ability to emotionally engage users. However, the absence of a comprehensive benchmark impedes progress in this field. To bridge this gap, we introduce CharacterEval, a Chinese benchmark for comprehensive RPCA assessment, complemented by a tailored high-quality dataset. The dataset comprises 1,785 multi-turn role-playing dialogues, encompassing 23,020 examples and featuring 77 characters derived from Chinese novels and scripts. It was carefully constructed, beginning with initial dialogue extraction via GPT-4, followed by rigorous human-led quality control, and enhanced with in-depth character profiles sourced from Baidu Baike. CharacterEval employs a multifaceted evaluation approach, encompassing thirteen targeted metrics on four dimensions. Comprehensive experiments on CharacterEval demonstrate that Chinese LLMs exhibit more promising capabilities than GPT-4 in Chinese role-playing conversation. Source code, data source and reward model will be publicly accessible at https://github.com/morecry/CharacterEval."
A New Method for Peer Matching and Negotiation of Prosumers in Peer-to-Peer Energy Markets,paper,2021-03-29,"This article presents a scalable mechanism for peer-to-peer (P2P) energy trading among prosumers in a smart grid. In the proposed mechanism, prosumers engage in a non-mediated negotiation with their peers to reach an agreement on the price and quantity of energy to be exchanged. Instead of concurrent bilateral negotiation between all peers with high overheads, an iterative peer matching process is employed to match peers for bilateral negotiation. The proposed negotiation algorithm enables prosumers to come to an agreement, given that they have no prior knowledge about the preference structure of their trading partners. A greediness factor is introduced to model the selfish behavior of prosumers in the negotiation process and to investigate its impact on the negotiation outcome. In order to recover the costs related to power losses, a transaction fee is applied to each transaction that enables the grid operator to recover incurred losses due to P2P trades. The case studies demonstrate that the proposed mechanism discourages greedy behavior of prosumers in the negotiation process as it does not increase their economic surplus. Also, it has an appropriate performance from the computation overheads and scalability perspectives."
Deal or No Deal? End-to-End Learning of Negotiation Dialogues,paper,2017-06-01,"Much of human dialogue occurs in semi-cooperative settings, where agents with different goals attempt to agree on common decisions. Negotiations require complex communication and reasoning skills, but success is easy to measure, making this an interesting task for AI. We gather a large dataset of human-human negotiations on a multi-issue bargaining task, where agents who cannot observe each other’s reward functions must reach an agreement (or a deal) via natural language dialogue. For the first time, we show it is possible to train end-to-end models for negotiation, which must learn both linguistic and reasoning skills with no annotated dialogue states. We also introduce dialogue rollouts, in which the model plans ahead by simulating possible complete continuations of the conversation, and find that this technique dramatically improves performance. Our code and dataset are publicly available."
Decoupling Strategy and Generation in Negotiation Dialogues,paper,2018-08-29,"We consider negotiation settings in which two agents use natural language to bargain on goods. Agents need to decide on both high-level strategy (e.g., proposing $50) and the execution of that strategy (e.g., generating “The bike is brand new. Selling for just $50!”). Recent work on negotiation trains neural models, but their end-to-end nature makes it hard to control their strategy, and reinforcement learning tends to lead to degenerate solutions. In this paper, we propose a modular approach based on coarse dialogue acts (e.g., propose(price=50)) that decouples strategy and generation. We show that we can flexibly set the strategy using supervised learning, reinforcement learning, or domain-specific knowledge without degeneracy, while our retrieval-based generation can maintain context-awareness and produce diverse utterances. We test our approach on the recently proposed DEALORNODEAL game, and we also collect a richer dataset based on real items on Craigslist. Human evaluation shows that our systems achieve higher task success rate and more human-like negotiation behavior than previous approaches."
Emergent Communication through Negotiation,paper,2018-02-15,"Multi-agent reinforcement learning offers a way to study how communication could emerge in communities of agents needing to solve specific problems. In this paper, we study the emergence of communication in the negotiation environment, a semi-cooperative model of agent interaction. We introduce two communication protocols -- one grounded in the semantics of the game, and one which is a priori ungrounded and is a form of cheap talk. We show that self-interested agents can use the pre-grounded communication channel to negotiate fairly, but are unable to effectively use the ungrounded channel. However, prosocial agents do learn to use cheap talk to find an optimal negotiating strategy, suggesting that cooperation is necessary for language to emerge. We also study communication behaviour in a setting where one agent interacts with agents in a community with different levels of prosociality and show how agent identifiability can aid negotiation."
Culture and Negotiation Strategy,paper,2017-03-27,"In this article the authors investigate the relationship between culture and joint gains by examining the role of information sharing and power strategies in intracultural negotiations. Previously, the authors found that the relationship between cultural values or norms and joint gains was uncertain in six cultures: France, Russia, Japan, Hong Kong, Brazil, and the United States. Of the five values and norms measured, only norms for information sharing in negotiation were directly related to joint gains. This article explores and extends prior findings by investigating the strategies used by negotiators in the same six cultures. Cultures that maximized joint gains used direct information-sharing strategies or a combination of indirect and direct strategies. Power strategies may help or hurt joint gains, depending on a culture's values and norms for power and whether or not power-based influence is used in conjunction with sufficient information exchange. The findings suggest that understanding the other party's cultural characteristics and strategies can help negotiators plan how to focus on information exchange and deal with unusual power strategies that they may encounter."
A meta-analysis on gender differences in negotiation outcomes and their moderators.,paper,2015,"This meta-analysis investigates gender differences in economic negotiation outcomes. As suggested by role congruity theory, we assume that the behaviors that increase economic negotiation outcomes are more congruent with the male as compared with the female gender role, thereby presenting challenges for women's negotiation performance and reducing their outcomes. Importantly, this main effect is predicted to be moderated by person-based, situation-based, and task-based influences that make effective negotiation behavior more congruent with the female gender role, which should in turn reduce or even reverse gender differences in negotiation outcomes. Using a multilevel modeling approach, this meta-analysis includes 123 effect sizes (overall N = 10,888, including undergraduate and graduate students as well as businesspeople). Studies were included when they enabled the calculation of an effect size reflecting gender differences in achieved economic negotiation outcomes. As predicted, men achieved better economic outcomes than women on average, but gender differences strongly depended on the context: Moderator analysis revealed that gender differences favoring men were reduced when negotiators had negotiation experience, when they received information about the bargaining range, and when they negotiated on behalf of another individual. Moreover, gender differences were reversed under conditions of the lowest predicted role incongruity for women. In conclusion, gender differences in negotiations are contextually bound and can be subject to change. Future research is needed that investigates the underlying mechanisms of new moderators revealed in the current research (e.g., experience). Implications for theoretical explanations of gender differences in negotiation outcomes, for gender inequalities in the workplace, and for future research are discussed."
Deliberative Negotiation,paper,2018-09-06,"This chapter addresses three questions: What is deliberative negotiation? How can deliberative negotiation be achieved? What does deliberative negotiation do? First, deliberative negotiation is a communication process that contributes to reaching binding decisions in democratic politics, and is characterized by justification, mutual respect, and the absence of coercion. Second, three sets of conditions—related to 1) formal institutions, 2) social context, 3) issue characteristics—conduce “deliberative moments” in a negotiation. The chapter illustrates how these conditions work, with a focus on EU negotiations. Third, we explore the impact of deliberative negotiation on delivering outcomes tout court (e.g. by offering solutions to the negotiators’ dilemma) and on producing “better” outcomes (e.g. by increasing the likelihood of overall preference satisfaction). The chapter concludes that both the process and outcome of deliberative negotiation can instil legitimacy even when other aspects of a negotiation (or of the political system itself) struggle to do so."
DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents,paper,2023-03-30,"Large language models (LLMs) have emerged as valuable tools for many natural language understanding tasks. In safety-critical applications such as healthcare, the utility of these models is governed by their ability to generate factually accurate and complete outputs. In this work, we present dialog-enabled resolving agents (DERA). DERA is a paradigm made possible by the increased conversational abilities of LLMs. It provides a simple, interpretable forum for models to communicate feedback and iteratively improve output. We frame our dialog as a discussion between two agent types – a Researcher, who processes information and identifies crucial problem components, and a Decider, who has the autonomy to integrate the Researcher’s information and makes judgments on the final output. We test DERA against three clinically-focused tasks, with GPT-4 serving as our LLM. DERA shows significant improvement over the base GPT-4 performance in both human expert preference evaluations and quantitative metrics for medical conversation summarization and care plan generation. In a new finding, we also show that GPT-4’s performance (70%) on an open-ended version of the MedQA question-answering (QA) dataset (Jin 2021; USMLE) is well above the passing level (60%), with DERA showing similar performance. We will release the open-ended MedQA dataset."
PMC-LLaMA: Further Finetuning LLaMA on Medical Papers,paper,2023,"Large Language Models (LLMs) have showcased remarkable capabilities in natural language understanding in various domains. These models can usually behave well on daily dialog, or question answering scenarios, however, in areas that value precision, for example, in medical applications, they often exhibit unsatisfactory performance due to a lack of domain-specific knowledge. In this report, we introduce PMC-LLaMA, an open-source language model that is acquired by fine-tuning an open-source language model on a total of 4.8 million biomedical academic papers for further injecting medical knowledge, enhancing its capability in the medical domain. Our preliminary evaluations are conducted on three biomedical QA datasets, including PubMedQA, MedMCQA, and USMLE, showing that our model after finetuning, i.e., PMC-LLaMA, demonstrates better understanding of biomedical domain-specific concepts, thus achieving high performance on QA benchmarks. The model and codes, along with an online demo, are publicly available."
A Comprehensive Survey on Evaluating Large Language Model Applications in the Medical Industry,paper,2024-04-24,"Since the inception of the Transformer architecture in 2017, Large Language Models (LLMs) such as GPT and BERT have evolved significantly, impacting various industries with their advanced capabilities in language understanding and generation. These models have shown potential to transform the medical field, highlighting the necessity for specialized evaluation frameworks to ensure their effective and ethical deployment. This comprehensive survey delineates the extensive application and requisite evaluation of LLMs within healthcare, emphasizing the critical need for empirical validation to fully exploit their capabilities in enhancing healthcare outcomes. Our survey is structured to provide an in-depth analysis of LLM applications across clinical settings, medical text data processing, research, education, and public health awareness. We begin by exploring the roles of LLMs in various medical applications, detailing their evaluation based on performance in tasks such as clinical diagnosis, medical text data processing, information retrieval, data analysis, and educational content generation. The subsequent sections offer a comprehensive discussion on the evaluation methods and metrics employed, including models, evaluators, and comparative experiments. We further examine the benchmarks and datasets utilized in these evaluations, providing a categorized description of benchmarks for tasks like question answering, summarization, information extraction, bioinformatics, information retrieval and general comprehensive benchmarks. This structure ensures a thorough understanding of how LLMs are assessed for their effectiveness, accuracy, usability, and ethical alignment in the medical domain. ..."
BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models,paper,2024-03-27,"Large Language Models (LLMs) like ChatGPT and GPT-4 are versatile and capable of addressing a diverse range of tasks. However, general LLMs, which are developed on open-domain data, may lack the domain-specific knowledge essential for tasks in vertical domains, such as legal, medical, etc. To address this issue, previous approaches either conduct continuous pre-training with domain-specific data or employ retrieval augmentation to support general LLMs. Unfortunately, these strategies are either cost-intensive or unreliable in practical applications. To this end, we present a novel framework named BLADE, which enhances Black-box LArge language models with small Domain-spEcific models. BLADE consists of a black-box LLM and a small domain-specific LM. The small LM preserves domain-specific knowledge and offers specialized insights, while the general LLM contributes robust language comprehension and reasoning capabilities. Specifically, our method involves three steps: 1) pre-training the small LM with domain-specific data, 2) fine-tuning this model using knowledge instruction data, and 3) joint Bayesian optimization of the general LLM and the small LM. Extensive experiments conducted on public legal and medical benchmarks reveal that BLADE significantly outperforms existing approaches. This shows the potential of BLADE as an effective and cost-efficient solution in adapting general LLMs for vertical domains."
Evaluating and Enhancing Large Language Models’ Performance in Domain-Specific Medicine: Development and Usability Study With DocOA,paper,2024-03-07,"Background The efficacy of large language models (LLMs) in domain-specific medicine, particularly for managing complex diseases such as osteoarthritis (OA), remains largely unexplored. Objective This study focused on evaluating and enhancing the clinical capabilities and explainability of LLMs in specific domains, using OA management as a case study. Methods A domain-specific benchmark framework was developed to evaluate LLMs across a spectrum from domain-specific knowledge to clinical applications in real-world clinical scenarios. DocOA, a specialized LLM designed for OA management integrating retrieval-augmented generation and instructional prompts, was developed. It can identify the clinical evidence upon which its answers are based through retrieval-augmented generation, thereby demonstrating the explainability of those answers. The study compared the performance of GPT-3.5, GPT-4, and a specialized assistant, DocOA, using objective and human evaluations. Results Results showed that general LLMs such as GPT-3.5 and GPT-4 were less effective in the specialized domain of OA management, particularly in providing personalized treatment recommendations. However, DocOA showed significant improvements. Conclusions This study introduces a novel benchmark framework that assesses the domain-specific abilities of LLMs in multiple aspects, highlights the limitations of generalized LLMs in clinical contexts, and demonstrates the potential of tailored approaches for developing domain-specific medical LLMs."
ClinicalRAG: Enhancing Clinical Decision Support through Heterogeneous Knowledge Retrieval,paper,2024,"Large Language Models (LLMs) have revolutionized text generation across diverse domains, showcasing an ability to mimic human-like text with remarkable accuracy. Yet, these models frequently encounter a significant hurdle: producing hallucinations, a flaw particularly detrimental in the healthcare domain where precision is crucial. In this paper, we introduce ClinicalRAG, a novel multi-agent pipeline to rectify this issue by incorporating heterogeneous medical knowledge—both structured and unstructured—into LLMs to bolster diagnosis accuracy. ClinicalRAG can extract related medical entities from user inputs and dynamically integrate relevant medical knowledge during the text generation process. Comparative analyses reveal that ClinicalRAG significantly outperforms knowledge-deficient methods, offering enhanced reliability in clinical decision support. This advancement marks a pivotal proof-of-concept step towards mitigating misinformation risks in healthcare applications of LLMs."
Fact Finder -- Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs,paper,2024-08-06,"Recent advancements in Large Language Models (LLMs) have showcased their proficiency in answering natural language queries. However, their effectiveness is hindered by limited domain-specific knowledge, raising concerns about the reliability of their responses. We introduce a hybrid system that augments LLMs with domain-specific knowledge graphs (KGs), thereby aiming to enhance factual correctness using a KG-based retrieval approach. We focus on a medical KG to demonstrate our methodology, which includes (1) pre-processing, (2) Cypher query generation, (3) Cypher query processing, (4) KG retrieval, and (5) LLM-enhanced response generation. We evaluate our system on a curated dataset of 69 samples, achieving a precision of 78% in retrieving correct KG nodes. Our findings indicate that the hybrid system surpasses a standalone LLM in accuracy and completeness, as verified by an LLM-as-a-Judge evaluation method. This positions the system as a promising tool for applications that demand factual correctness and completeness, such as target identification -- a critical process in pinpointing biological entities for disease treatment or crop enhancement. Moreover, its intuitive search interface and ability to provide accurate responses within seconds make it well-suited for time-sensitive, precision-focused research contexts. We publish the source code together with the dataset and the prompt templates used."
Large language models: a primer and gastroenterology applications,paper,2024-01-01,"Over the past year, the emergence of state-of-the-art large language models (LLMs) in tools like ChatGPT has ushered in a rapid acceleration in artificial intelligence (AI) innovation. These powerful AI models can generate tailored and high-quality text responses to instructions and questions without the need for labor-intensive task-specific training data or complex software engineering. As the technology continues to mature, LLMs hold immense potential for transforming clinical workflows, enhancing patient outcomes, improving medical education, and optimizing medical research. In this review, we provide a practical discussion of LLMs, tailored to gastroenterologists. We highlight the technical foundations of LLMs, emphasizing their key strengths and limitations as well as how to interact with them safely and effectively. We discuss some potential LLM use cases for clinical gastroenterology practice, education, and research. Finally, we review critical barriers to implementation and ongoing work to address these issues. This review aims to equip gastroenterologists with a foundational understanding of LLMs to facilitate a more active clinician role in the development and implementation of this rapidly emerging technology."
Harnessing Large Language Models in Medical Research and Scientific Writing: A Closer Look to The Future,paper,2023-12-09,"Large Language Models (LLMs), a form of artificial intelligence generating natural language responses based on user input, have demonstrated potential across various applications such as entertainment, education, and customer service. This review comprehensively highlights their current research status and potential applications within the medical domain, addressing the challenges and opportunities for future development and implementation. Key aspects covered include diverse data sources for training and testing, such as electronic health records and clinical trials; ethical considerations, including privacy and consent; evaluation techniques focusing on accuracy and coherence; and clinical applications ranging from diagnosis to patient education. The review concludes that LLMs hold significant promise for enhancing the quality and efficiency of medical research and scientific writing but also emphasizes the need for careful design and regulation to ensure safety and reliability."
"A Comprehensive Review of AI in Healthcare: Exploring Neural Networks in Medical Imaging, LLM-Based Interactive Response Systems, NLP-Based EHR Systems, Ethics, and Beyond",paper,2023-12-23,"The AI-based technologies used in healthcare systems have witnessed significant growth and innovation, as this growth is attributed to innovations in AI and rise in data collection in the healthcare sector. This survey paper provides a comprehensive overview of the diverse technological advancements reshaping the healthcare landscape. The reviewed topics include Medical Image Interpretation using Deep Learning, Generative AI-based Large Language Models (LLMs), Natural Language Processing for Healthcare Records to give a sense of what AI based systems look like in healthcare. For each of these topics, we've delved into their technical aspects and their applications. Through an overview of these cutting-edge technologies, this research aims to shed light on their current state, challenges, and potential implications for the future of health care. From enhancing diagnostics to improving patient care and accessibility, AI is poised to play pivotal roles in shaping the healthcare industry for years to come. Furthermore, this survey also delves into the ethical considerations surrounding these technologies."
