
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-HW16X526HL"></script>
<script>
  window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', 'G-HW16X526HL');
</script>


<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">


	<title> Measuring Uncertainty in Science Communications with Natural Language Processing (NLP)</title>
	<meta name="author" content="Jiaxin Pei">
	<meta name="description" content="We use NLP to measure scientific uncertainty in science communications. How to communicate scientific uncertainty has long been a challenging question for both scientists and journalists.">
	<meta name="keywords" content="Scientific uncertainty, uncertainty in science communication, communicating uncertainty, communicating scientific uncertainty, science communication, jiaxin pei, david jurgens, computational social science, certainty, uncertainty, science communication, science journalism, language, sociolinguistics, hedging, natural language processing, nlp, computational linguistics">
	<meta name="google-site-verification" content="vLgAQ2fv6VyLf4-eZcb7eri7fQKRf93cOb4BFHU6u9A" />
        <link rel="stylesheet" type="text/css" href="./files/screen.css">
        <link rel="stylesheet" type="text/css" href="./files/font-awesome.css">

  <style type="text/css">
code {  }
</style>

</head>
<body>
	<div id="OuterCanvas">
		<div id="InnerCanvas">
			<div id="Page">

				<div class="title">
				<span class="name" style="padding-left:-30px"><a href="https://jiaxin-pei.github.io/project_websites/certainty/files/EMNLP_2021_Certainty.pdf">Measuring Sentence-level and Aspect-level (Un)certainty in Science Communications</a></span>
				<span>
                                  <a class="pubauthor" style="padding-left:400px" target="_blank" href="https://jiaxin-pei.github.io/">Jiaxin Pei</a>, </span> &nbsp;
                                  <a class="pubauthor" target="_blank" href="http://jurgens.people.si.umich.edu/">David Jurgens</a>

				</div>







<div class="heading">
<p> Introduction </p>
</div>
<div class="entry">
<div style="padding:0px 0px 0px -10px; width:470px; float: right" align="right" class="imglink">
       <a href="./files/certainty_illustration.png" target="_blank"> <img style="align: right; float: right" src="./files/certainty_illustration.png" border="0" width="460px"> </a>
       <p style="padding-right:120px; font-size:11pt; color:#555"> Click the figure for a better resolution. </p>
  </div>

<div>
<p>
  Certainty and uncertainty are essential components of science communication. However, how to model (un)certainty has long been a challenging question in both linguistics and science communication research. This study aims to answer the following questions: (1) Are hedges a good proxy for measuring certainty in scientific texts? (2) How can certainty in science communications be modeled? (3) Does the certainty of scientific findings change in science communications? (4) What factors affect the certainty of scientific findings in news and abstracts?
</p>
<br>
<p>
  In this study, we create (i) a new dataset and method for measuring certainty in scientific findings and (ii) an NLP model for certainty prediction. We apply this model to 431K scientific findings to study a series of research questions in science communication. Our analysis shows that (1) hedges cannot fully capture either sentence-level or aspect-level certainty in scientific findings; (2) across more than 6K paired findings from news and abstracts, findings in news have lower sentence-level certainty, contradicting existing studies suggesting that journalists tend to make science sound more certain; and (3) the certainty of findings in paper abstracts varies with journal impact and team size: low-impact journals and large teams often present scientific findings with higher sentence-level certainty. However, this pattern does not persist in science news.

</p>
<br>
<p>
  As a part of the paper, we are releasing our annotated dataset for certainty, the code and the fine-tuned model for certainty prediction, and the URLs of science news and paper abstracts used in our paper, as well as the code to extract science findings.
</p>
<br>
<p>
  Please <a href="https://arxiv.org/abs/2109.14776">click here</a> to read our paper on arXiv.
</p>

</div>
<br>
</div>

<div id="Getting" class="heading">
  <p> Getting started (Code and Models)</p>
</div>
<div class="entry">

  <ul>
    <li>
      Our models for predicting sentence-level and aspect-level certainty are available via <a href="https://pypi.org/project/certainty-estimator">pip</a>.
      <br>
      <pre><code>
      pip3 install certainty-estimator
    </code></pre>
    </li>
    <li>
      After installing certainty-estimator, please check out the <a href="https://github.com/Jiaxin-Pei/certainty-estimator/blob/main/play.py">code example</a> to calculate both sentence-level and aspect-level certainty.
    </li>
    <li>
      Source code and a step-by-step tutorial are available in this <a href="https://github.com/Jiaxin-Pei/certainty-estimator/">GitHub repo</a>.
    </li>
    <li>
      Data and code for building the certainty prediction model are also available <a href="https://github.com/Jiaxin-Pei/Certainty-in-Science-Communication">on GitHub</a>.
    </li>
    <li>
      We use Hugging Face to host the pre-trained <a href="https://huggingface.co/pedropei/sentence-level-certainty">sentence-level certainty</a> and <a href="https://huggingface.co/pedropei/aspect-level-certainty">aspect-level certainty</a> models. Please check out this <a href="https://github.com/Jiaxin-Pei/certainty-estimator/blob/main/certainty_estimator/predict_certainty.py">Python file</a> for using our models with Hugging Face transformers.
    </li>
  </ul>

</div>
<div id="Data" class="heading">
<p> Data for download </p>
</div>

<div class="entry">

  <p style="margin-top:0px; margin-bottom:0px; font-size:13pt">
    <b>1. &nbsp; Annotated scientific findings</b>
    <a href="https://github.com/Jiaxin-Pei/Certainty-in-Science-Communication/tree/main/data/annotated_data">(Link)</a><br>
  </p>
  <br>
  <p style="font-size:13pt">
    This data contains 1551 findings labeled with sentence-level certainty and 1760 findings labeled with aspect-level certainty. The train/test/dev split used in our paper is also provided here.
  </p>
  <br>

  <p style="margin-top:0px; margin-bottom:0px; font-size:13pt">
    <b>2. &nbsp; Science news and paper abstract URLs</b><br>
  </p>
  <br>
  <p style="font-size:13pt">
    We release the URLs of the science news articles and paper abstracts used in our research <a href="https://github.com/Jiaxin-Pei/Certainty-in-Science-Communication/tree/main/data/">(link)</a>.
  </p>
  <br>

</div>




<div id="Highlights" class="heading">
  <p> Highlights </p>
</div>


<div class="entry">
  <div style="padding:-10px 0px 0px 0px; height:310px;width: 350px; float: right" align="right" class="imglink">
       <a href="./files/hedges_fail.png" target="_blank"> <img style="align: right; float: right" src="./files/hedges_fail.png" border="0" width="300px"> </a>
    </div>
  <p style="margin-top:0px; margin-bottom:0px; font-size:14pt">
    <b>1. &nbsp; Hedges cannot fully capture sentence-level or aspect-level certainty </b>
  </p>
  <div>
<br>
    <p style="font-size:13pt">
      Hedges are widely used as proxies for uncertainty in language. However, whether hedges can fully capture sentence-level and aspect-level certainty remains unclear. Based on the annotated data, our study first examines the extent to which hedges can explain the variance of certainty in scientific findings. Comparing the sentence-level certainty with the number of hedges (top) shows only a moderate correlation between hedging and certainty, Pearson's r=0.55, despite the widespread use of hedges as a proxy. For example, "Further research is necessary to understand whether this is a causal relationship" contains zero hedges but explicitly expresses strong uncertainty about the causal relationship, suggesting that many descriptions of certainty are not well captured by simple hedge-based lexicons. Further, authors vary in how frequently they employ hedges when describing the different aspects of certainty (bottom). This variance in their distribution suggests that hedges are not equally effective proxies for uncertainty across all aspects.
    </p>
  </div>
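    <p style="font-size:13pt">
      As a rough illustration of this comparison, the sketch below counts hedges with a small lexicon and computes Pearson's r by hand. The lexicon <code>HEDGES</code> and the helper names <code>count_hedges</code> and <code>pearson_r</code> are hypothetical, chosen only for this example; they are not the lexicon or the code used in the paper.
    </p>
<pre><code>
```python
import math
import re

# Hypothetical mini-lexicon for illustration only; NOT the lexicon used in the paper.
HEDGES = {"may", "might", "could", "possibly", "perhaps",
          "likely", "suggest", "suggests", "appear", "appears"}


def count_hedges(sentence):
    """Count the tokens of a sentence that appear in the hedge lexicon."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return sum(1 for token in tokens if token in HEDGES)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# The example sentence from the text contains no lexicon hedges,
# even though it expresses strong uncertainty.
example = ("Further research is necessary to understand "
           "whether this is a causal relationship")
print(count_hedges(example))
```
</code></pre>
    <p style="font-size:13pt">
      With real data, one would pass the per-finding hedge counts and the annotated sentence-level certainty scores to <code>pearson_r</code>.
    </p>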

<br>


  <div style="padding:-10px 0px 0px 0px; height:250px;width: 350px; float: right" align="right" class="imglink">
       <a href="./files/aspects-sentence-certainty.png" target="_blank"> <img style="align: right; float: right" src="./files/aspects-sentence-certainty.png" border="0" width="300px"> </a>
    </div>
  <p style="margin-top:0px; margin-bottom:0px; font-size:14pt">
    <b>2. &nbsp; Aspect-level certainties have different effects on the overall sentence-level certainty. </b>
  </p>
  <div>
<br>
    <p style="font-size:13pt">
      In scientific findings, different aspects can have different certainties. Does the certainty of different aspects contribute equally to the overall perceived sentence-level certainty? The answer is no. Based on the annotated data, we calculate the relative sentence-level certainty when each aspect is certain/uncertain. As shown in the plot on the right, uncertainties about <a id="small-caps">PROBABILITY</a> and <a id="small-caps">SUGGESTION</a> are associated with a sharp decrease in sentence-level certainty. However, uncertainties about <a id="small-caps">NUMBER</a> and <a id="small-caps">EXTENT</a> are associated with only a small decrease in sentence-level certainty. In short, the overall certainty of scientific findings is mainly affected by <a id="small-caps">PROBABILITY</a> and <a id="small-caps">SUGGESTION</a>, and is less affected by other aspects such as <a id="small-caps">NUMBER</a> and <a id="small-caps">EXTENT</a>. This result indicates that the descriptions of aspects vary in how they contribute to the perception of the overall certainty of scientific findings.
    </p>
  </div>
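    <p style="font-size:13pt">
      One possible reading of "relative sentence-level certainty" is the mean certainty of findings that are uncertain about a given aspect, minus the mean over all findings. The sketch below follows that assumption; it is not necessarily the exact computation used in the paper. The records in <code>findings</code> are made-up toy values, and the field names and the helper <code>relative_certainty</code> are hypothetical.
    </p>
<pre><code>
```python
from statistics import mean

# Hypothetical toy records: scores and labels are made up for illustration.
findings = [
    {"certainty": 5.0, "uncertain_aspects": set()},
    {"certainty": 2.0, "uncertain_aspects": {"PROBABILITY"}},
    {"certainty": 4.0, "uncertain_aspects": {"NUMBER"}},
    {"certainty": 3.0, "uncertain_aspects": {"PROBABILITY", "NUMBER"}},
]


def relative_certainty(records, aspect):
    """Mean certainty of findings uncertain about `aspect`, minus the overall mean.

    Assumption: "relative" means the difference from the overall mean.
    """
    overall = mean(r["certainty"] for r in records)
    subset = [r["certainty"] for r in records if aspect in r["uncertain_aspects"]]
    return mean(subset) - overall


for aspect in ("PROBABILITY", "NUMBER"):
    print(aspect, relative_certainty(findings, aspect))
```
</code></pre>
    <p style="font-size:13pt">
      A more negative value means that uncertainty about that aspect goes along with lower overall certainty in the toy data.
    </p>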

<br>

<div style="float:right;padding:0px 0px 0px 0px; height:350px; width: 350px;">
  <div style="padding:0px 0px 0px 0px; height:170px;  float: right" align="right" class="imglink">
       <a href="./files/certainty_change.png" target="_blank"> <img style="align: right; float: right" src="./files/certainty_change.png" border="0" width="300px"> </a>
  </div>
<br>
<div style="padding:0px 0px 0px 0px; width:200px; float: right" align="right" class="imglink">
       <a href="./files/aspect-certainty-source.png" target="_blank"> <img style="align: right; float: right" src="./files/aspect-certainty-source.png" border="0" width="300px"> </a>
  </div>

  </div>
</div>




<div class="entry">
  <p style="margin-top:0px; margin-bottom:0px; font-size:14pt">
    <b>3. &nbsp; Journalists may actually play down the certainty of scientific findings in science communications</b>
  </p>

<br>

  <p style="font-size:13pt">
    Whether science news makes science sound more certain has long been an important but unanswered question. Our model allows us to examine this question over a large set of scientific findings in science communications. The regression analysis indicates that <b>news descriptions have lower overall sentence-level certainty than abstract descriptions of the same finding (p &lt; 0.01)</b>. Although some studies suggest that science news tends to remove hedges and describe scientific findings with increased certainty, our study of the paired findings shows the opposite: findings in news are less certain than findings in abstracts, even when controlling for the content and many contextual factors.
    <br>
    Further analysis of aspect-level certainty reveals the mechanism behind this phenomenon: findings in abstracts are associated with greater certainty about <a id="small-caps">FRAMING</a> and <a id="small-caps">NUMBER</a>. Findings in news are associated with uncertainties about <a id="small-caps">PROBABILITY</a>, <a id="small-caps">EXTENT</a>, and <a id="small-caps">NUMBER</a>, indicating that journalists tend to play down the certainty of some aspects, especially regarding numeric information.
  </p>


</div>

<br>







<div class="entry">
   <div style="padding:10px 0px 0px 0px; width:350px; float: right" align="right" class="imglink">

 <a href="./files/abs_news_journal_impact.png" target="_blank"> <img style="align: right; float: right" src="./files/abs_news_journal_impact.png" border="0" width="300px"> </a>
 </div>

  <p style="margin-top:0px; margin-bottom:0px; font-size:14pt">
    <b>4. &nbsp; Low-impact journals often present scientific findings with higher sentence-level certainty </b>
  </p>
  <p style="font-size:13pt">
    Journal impact factor has long been considered a core factor associated with the quality of science. Do findings appearing in journals with different impact factors present certainty in different ways? The answer is yes. As shown in the plot on the right, findings in the lower-impact journals are written with the highest level of certainty, while findings appearing in relatively higher-impact journals are described with comparatively less certainty. One potential explanation for this phenomenon is that high-quality papers published in journals with stricter reviewing processes present certainty more precisely, which leads to a lower overall certainty compared with findings in low-impact journals. By comparison, the certainty of findings written by journalists is not significantly associated with journal impact factors, suggesting that the prestige of a journal does not affect how journalists present scientific findings.

  </p>
</div>

<br>

<div class="entry">

  <div style="padding:0px 0px 0px 0px; width:350px; float: right" align="right" class="imglink">
       <a href="./files/abs_news_num_authors.png" target="_blank"> <img style="align: right; float: right" src="./files/abs_news_num_authors.png" border="0" width="300px"> </a>
  </div>

  <p style="margin-top:0px; margin-bottom:0px; font-size:14pt">
    <b>5. &nbsp; Large teams often present scientific findings with higher sentence-level certainty </b>
  </p>
  <p style="font-size:13pt">
    In the era of team science, team size has been found to be associated with many core aspects of science, including quality and influence. Does the presentation of scientific certainty also vary with the size of the research team? The answer is yes. Using our data and model, we find a linear relationship between the number of authors and the overall level of certainty in scientific findings, even with controls for fields and authors. Multiple mechanisms may explain this behavior. Larger teams may be more capable of producing more certain results because more individuals participate in and check the results, or because of the scale of experiments that team science makes possible. Furthermore, our result also connects to the previous finding that small teams generate new disruptive ideas while large teams tend to develop old, existing ideas, as new ideas are often associated with more uncertainties. However, this linear trend does not persist in science news; instead, the sentence-level certainty of findings in science news stays relatively steady across different numbers of authors. While team size has been found to be associated with the novelty and impact of science, our results indicate that journalists are largely not influenced by the size of the research team when describing the certainty of its findings.

  </p>
</div>


<br>
<br>
<br>



<div id="Cite" class="heading">
<p> Citing the paper, data, or model</p>
</div>
<div class="entry">


  <div class="preformatted" style="font-family: monospace; white-space: pre; font-size: 9pt; background-color:#fefefe; width: 850px">
  @inproceedings{pei2021measuring,
           title={Measuring Sentence-level and Aspect-level (Un)certainty in Science Communications},
           author={Pei, Jiaxin and Jurgens, David},
           booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
           year={2021}
  }
  </div>
</div>



<br>

  <div class="footerSection content">
    <p> Jiaxin Pei &amp; David Jurgens </p>
    <p>Site design courtesy of <a href="http://stanford.edu/~wleif">Will Hamilton</a> via <a href="http://jason.chuang.info/" target="_blank">Jason Chuang</a> via <a href="https://nlp.stanford.edu/~jpennin/" target="_blank">Jeffrey Pennington</a></p>
  </div>


<!-- close outer blocks; there are 3 open div's -->
</div>
</div>
</div>

</body></html>
