[{"+": {}, "-": {}, "research goal": "The dataset includes headlines published by ABC news, an American news company. The two classes are generated based on the year they were published. The Group A snippets are ABC news headlines from 2007, while the Group B snippets are ABC news headlines from 2008. I am a historian writing about trends. My goal is to figure out which categories of news dominated the headlines from year to year. "}, {"+": {"focus on criminal activities and justice": {"p-value": 0.0001498901434284279, "V'": 0.06873409106964348}, "mention criminal cases and police reviews": {"p-value": 0.00011493851007143614, "V'": 0.0604967456222997}, "mention criminal activity, such as child murder and crossbow attacks": {"p-value": 0.00023989151896458493, "V'": 0.053311916568956105}, "discuss crime and justice, such as police crackdowns and court cases": {"p-value": 0.0005134192630754483, "V'": 0.06803416224584827}, "focus on criminal activity, such as thefts and arrests": {"p-value": 0.0001569409529969722, "V'": 0.06055400526387569}, "mention scandals or legal issues, such as calls for law reform or a girl being sold for sex": {"p-value": 1.0007140573730958e-06, "V'": 0.09923034109919648}, "mention disasters and crimes, such as plane accidents and assaults": {"p-value": 4.605256586854045e-06, "V'": 0.08543442113009997}, "discuss criminal activity, such as the last young offenders caught, man charged after police pursuit, or court jailing a man for bashing a partner": {"p-value": 0.00020660769138328798, "V'": 0.057621360221815326}}, "-": {}, "research goal": "The dataset includes headlines published by ABC news, an American news company. The two classes are generated based on the year they were published. The Group A snippets are ABC news headlines from 2010, while the Group B snippets are ABC news headlines from 2014. I am a historian writing about trends. My goal is to figure out which categories of news dominated the headlines from year to year. 
"}, {"+": {}, "-": {}, "research goal": "The dataset includes headlines published by ABC news, an American news company. The two classes are generated based on the year they were published. The Group A snippets are ABC news headlines from 2017, while the Group B snippets are ABC news headlines from 2016. I am a historian writing about trends. My goal is to figure out which categories of news dominated the headlines from year to year. "}, {"+": {}, "-": {"mention the Coronavirus and its impact on society": {"p-value": 1.1489623594912512e-40, "V'": 0.1516277787460145}, "reference the COVID-19 pandemic": {"p-value": 3.88361660954062e-36, "V'": 0.1358143454952579}, "mention current events, such as the bushfire outlook and coronavirus pandemic": {"p-value": 4.362027925473628e-28, "V'": 0.1982094756842066}, "discusses coronavirus-related topics": {"p-value": 9.190726624637983e-78, "V'": 0.26662880816197326}, "references to the coronavirus pandemic or related stories": {"p-value": 1.6387401285313377e-74, "V'": 0.2576656487997378}, "mention the impact of the coronavirus pandemic, such as socialising safely and government responses": {"p-value": 1.9945382587725355e-26, "V'": 0.10040907306347932}, "mention the impact of the coronavirus on everyday life": {"p-value": 1.6948409618878454e-22, "V'": 0.08558153660752349}, "discuss the impacts of the COVID-19 pandemic": {"p-value": 4.673626353559569e-10, "V'": 0.03617517032777353}, "will discuss events related to the coronavirus pandemic": {"p-value": 9.48472160775962e-42, "V'": 0.155348681126657}}, "research goal": "The dataset includes headlines published by ABC news, an American news company. The two classes are generated based on the year they were published. The Group A snippets are ABC news headlines from 2019, while the Group B snippets are ABC news headlines from 2020. I am a historian writing about trends. My goal is to figure out which categories of news dominated the headlines from year to year. 
"}, {"+": {"mentions the power and performance of the car": {"p-value": 4.491585420310596e-23, "V'": 0.41991369325088407}, "highlights the safety features of the car": {"p-value": 1.6276005242583484e-07, "V'": 0.15585826922758952}, "uses adjectives such as 'powerful', 'sporty', and 'sophisticated' to describe the product": {"p-value": 1.819750734974692e-12, "V'": 0.329815139075366}, "mentions features that enhance driving experience, such as powerful engine, adjustable suspension, and advanced safety features": {"p-value": 4.2247897235130306e-24, "V'": 0.4329002380164013}, "mentions high performance and power": {"p-value": 8.388679870184009e-21, "V'": 0.38961017266887976}, "mentions specific features that enhance safety": {"p-value": 2.1531872442185724e-08, "V'": 0.1893630623048418}, "highlights the power and speed of the automobile": {"p-value": 7.885097346568754e-18, "V'": 0.34633446580501187}, "mentions the power and strength of the product": {"p-value": 3.6222269631914483e-25, "V'": 0.4633711142901675}, "emphasizes the power and performance of the car": {"p-value": 3.667278326225798e-25, "V'": 0.4458900704217354}, "emphasizes the power and performance of the product": {"p-value": 9.27600880503174e-27, "V'": 0.5048724049706739}}, "-": {"uses words such as 'journey', 'tradition' and 'travel' to emphasize the experience": {"p-value": 1.1330652142924014e-38, "V'": 0.5437145994615143}, "mentions activities such as hiking, kayaking, snorkeling, and relaxing": {"p-value": 5.255306766245459e-08, "V'": 0.1418518571847132}, "describes breathtaking views and natural beauty": {"p-value": 1.2982957667711344e-21, "V'": 0.3096773856700855}, "emphasizes the potential experiences that can be gained": {"p-value": 1.7469516764650015e-14, "V'": 0.37779657261965466}, "mentions historical and cultural aspects of the destination": {"p-value": 3.0009723620320404e-21, "V'": 0.3203746572868198}, "mentions activities such as lounging, beach parties, and side trips": {"p-value": 
1.4015726046609285e-11, "V'": 0.21051970948484922}, "references the exotic locales and stunning scenery": {"p-value": 1.990400955800065e-30, "V'": 0.4257226920097742}, "highlights the wildlife, culture, and history of the area": {"p-value": 2.955928305733588e-23, "V'": 0.3290322124550003}, "mentions luxury amenities such as spas, pools, and restaurants": {"p-value": 7.276487503541142e-13, "V'": 0.1935482717031017}, "mentions of adventure and exploration": {"p-value": 8.969859266388766e-25, "V'": 0.44669717116085883}, "highlights the destination's beauty, such as beaches or rainforests": {"p-value": 3.5743632712780586e-33, "V'": 0.4322576811807742}, "mentions exotic, distant locations": {"p-value": 9.106403364415609e-27, "V'": 0.4298000755349499}, "emphasizes relaxation and leisure": {"p-value": 1.1059993104701467e-50, "V'": 0.6263701294295535}, "uses words such as 'vacation', 'escape', 'luxury', and 'magical'": {"p-value": 4.398261256464817e-66, "V'": 0.6814405060400744}, "focuses on the experience and the journey of the product": {"p-value": 0.00013036617461257018, "V'": 0.152095111453862}, "focuses on the beauty of the destination and its scenery": {"p-value": 8.198276314417929e-32, "V'": 0.4193550338288027}}, "research goal": "The dataset includes ad scripts from a variety of industries. The two classes are generated based on the industry of the company. The Group A snippets are ad transcripts for automobile companies, while the Group B snippets are ad transcripts for travel companies. I am an advertiser trying to learn about other industries. My goal is to figure out what different industries appeal to in ads. 
"}, {"+": {"uses phrases that evoke a sense of luxury, such as 'heavenly' and 'unforgettable'": {"p-value": 1.442843623649347e-07, "V'": 0.237480840817486}, "mentions special formulas, such as patented bonding systems or water-based creme": {"p-value": 0.0004740051533140601, "V'": 0.208218396669882}, "uses words that emphasize beauty, such as 'illuminate', 'glow', and 'shine'": {"p-value": 6.808192260939913e-26, "V'": 0.5745129645391958}, "highlights a glamorous, fashionable look": {"p-value": 6.450712400180539e-31, "V'": 0.5675773798905788}, "mentions beauty trends and being fashionable": {"p-value": 3.601908819953101e-27, "V'": 0.5890718707608005}, "highlights the romance of the product": {"p-value": 1.6777394800636558e-05, "V'": 0.2487694956077618}, "Mentions the luxurious and glamorous results of the product": {"p-value": 1.2232340780731937e-10, "V'": 0.36206362983575646}, "uses words like 'hydration', 'revitalized', and 'smoothness' to describe the benefits of the product": {"p-value": 0.0006837994024267752, "V'": 0.20194201482475477}, "uses language that promotes feeling beautiful": {"p-value": 1.6614835270805667e-15, "V'": 0.4479012494433524}, "uses words that evoke a sense of luxury and indulgence": {"p-value": 1.2565569943899937e-14, "V'": 0.4052797058793419}, "Describes the product as luxurious, indulgent, and/or elegant": {"p-value": 3.4912131752586436e-13, "V'": 0.3442177666576159}, "mentions beauty benefits": {"p-value": 1.6102964266056152e-12, "V'": 0.39554327818962925}}, "-": {"mentions the long-term benefits of the product, such as reducing symptoms or preventing illness": {"p-value": 1.5104925242067785e-09, "V'": 0.34810137939412716}, "focuses on providing relief from physical ailments and discomfort": {"p-value": 1.497427772021437e-16, "V'": 0.4705426751426908}, "Uses words such as 'protection', 'relief', and 'comfort' to describe the product": {"p-value": 1.204588486666716e-13, "V'": 0.3645064116553757}, "mentions the product's ability to reduce 
and/or prevent certain physical symptoms": {"p-value": 9.872367551823284e-13, "V'": 0.40468829397082096}, "uses words like 'safe', 'trusted' and 'proven'": {"p-value": 9.003755389599727e-08, "V'": 0.308228520872864}, "focuses on the product's ability to reduce pain and discomfort": {"p-value": 6.561654623319304e-11, "V'": 0.3445137710224272}, "mentions specific health benefits of the product": {"p-value": 5.533595863602672e-09, "V'": 0.32037039012680835}, "mentions the need for healthy living, such as diet and exercise": {"p-value": 0.0007192567223633794, "V'": 0.10232558303663085}}, "research goal": "The dataset includes ad scripts from a variety of industries. The two classes are generated based on the industry of the company. The Group A snippets are ad transcripts for beauty products, while the Group B snippets are ad transcripts for personal care products. I am an advertiser trying to learn about other industries. My goal is to figure out what different industries appeal to in ads. "}, {"+": {}, "-": {"mentions the Family Violence Prevention and Services Improvement Act of 2021": {"p-value": 2.663073991968346e-07, "V'": 0.02564092444466953}, "mentions the Violence Against Women Reauthorization Act of 2021": {"p-value": 4.681196139371884e-06, "V'": 0.0246595833884695}, "supports House passage of H.R. 6, the American Dream and Promise Act of 2021": {"p-value": 3.114634546047377e-12, "V'": 0.09807502249569282}, "mentions the need for COVID-19 response efforts": {"p-value": 7.839738595764092e-24, "V'": 0.09401687867454286}, "mentions COVID-19 response efforts": {"p-value": 5.521372322047851e-26, "V'": 0.10256424873846755}}, "research goal": "The dataset includes statements of administration policy from American presidents. The two classes are generated based on the president leading the administration. The Group A snippets are administration statements from Obama, while the Group B snippets are administration statements from Biden. 
I am a political scientist analyzing policy stances. My goal is to figure out the legislative priorities of each administration. "}, {"+": {"expresses opposition to House passage of legislation": {"p-value": 5.825247800159635e-22, "V'": 0.3628613637862179}, "mentions border security": {"p-value": 0.000399517527706151, "V'": 0.07912786406805439}, "mentions the need for a military": {"p-value": 0.0007062114729823973, "V'": 0.12539299005794063}, "mentions the importance of military defense": {"p-value": 0.00022151867041051966, "V'": 0.14623099147643487}}, "-": {"mentions the need for racial equity": {"p-value": 2.97183507922052e-10, "V'": 0.13095582406147302}, "mentions the American Rescue Plan Act of 2021": {"p-value": 0.0007193364077262267, "V'": 0.029914217684268097}, "mentions the importance of racial equity": {"p-value": 2.590328753536301e-12, "V'": 0.15982413508506502}, "acknowledges the need to provide protection and services to all victims of abuse": {"p-value": 1.0043488610578296e-15, "V'": 0.19009666644229994}, "supports efforts to promote equity for underserved communities": {"p-value": 4.128476582220432e-39, "V'": 0.4824800093471189}}, "research goal": "The dataset includes statements of administration policy from American presidents. The two classes are generated based on the president leading the administration. The Group A snippets are administration statements from Trump, while the Group B snippets are administration statements from Biden. I am a political scientist analyzing policy stances. My goal is to figure out the legislative priorities of each administration. 
"}, {"+": {"mentions the rude and unprofessional attitude of the staff": {"p-value": 7.399381245152113e-64, "V'": 0.3466537501172971}, "mentions the staff being rude or unhelpful": {"p-value": 1.7950909521802367e-50, "V'": 0.31531402911043055}, "mentions the staff being rude and unhelpful": {"p-value": 1.1440706116619233e-62, "V'": 0.3498144785485608}, "mentions the hidden fees and poor customer service at the airport": {"p-value": 1.880435602599989e-64, "V'": 0.30362494358670544}, "mentions the airline charging extra for carry-on items": {"p-value": 1.2742716119518988e-08, "V'": 0.062032608851494725}, "mentions the staff not being friendly or helpful": {"p-value": 7.941350617207764e-45, "V'": 0.2840811083701371}}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the stars given in the review. The Group A snippets rate an airline's comfort 1/5, while the Group B snippets rate an airline's comfort 5/5. I am the manager of an airport. My goal is to figure out how to improve specific aspects of the airport and airplane experience for customers. 
"}, {"+": {"mentions the lack of in-flight entertainment (movies, TV, music, etc.)": {"p-value": 8.558652557673731e-42, "V'": 0.20410931581328356}, "mentions the lack of inflight video service": {"p-value": 2.2261282850229443e-37, "V'": 0.15184786715252366}, "mentions the lack of inflight entertainment": {"p-value": 1.4776783979374083e-71, "V'": 0.2687309200556144}, "mentions the low quality of food": {"p-value": 5.036915971401661e-13, "V'": 0.12094847074928489}, "mentions a long wait times for passengers": {"p-value": 4.720437425134039e-33, "V'": 0.21092825022222794}, "mentions poor customer service": {"p-value": 3.6210936736599403e-112, "V'": 0.4485514869306326}, "mentions slow or disorganized check-in process": {"p-value": 2.545846427520715e-09, "V'": 0.05994012943917561}, "mentions the inconsistency in airplanes": {"p-value": 2.036104872355897e-21, "V'": 0.16220155698804947}}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the stars given in the review. The Group A snippets rate an airline's entertainment 1/5, while the Group B snippets rate an airline's entertainment 5/5. I am the manager of an airport. My goal is to figure out how to improve specific aspects of the airport and airplane experience for customers. "}, {"+": {"mentions the lack of snacks or beverages being offered": {"p-value": 7.971428482683046e-65, "V'": 0.26196702927654864}, "mentions the poor quality of the food": {"p-value": 1.38925672770888e-104, "V'": 0.35596322704486866}, "mentions the food being of poor quality": {"p-value": 2.679018075042686e-108, "V'": 0.36500813532818965}, "mentions the low quality of the food": {"p-value": 4.371504301968103e-107, "V'": 0.36204978668868804}, "mentions a rude ground staff": {"p-value": 1.275929347824458e-122, "V'": 0.41759114725522156}}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. 
The two classes are generated based on the stars given in the review. The Group A snippets rate an airline's food 1/5, while the Group B snippets rate an airline's food 5/5. I am the manager of an airport. My goal is to figure out how to improve specific aspects of the airport and airplane experience for customers. "}, {"+": {"mentions a delay in the flight": {"p-value": 8.205702460532431e-26, "V'": 0.2255632925236059}, "mentions the rudeness of the customer service staff": {"p-value": 6.319035113605354e-52, "V'": 0.3273688666969208}, "mentions the lack of customer service": {"p-value": 2.784288973040429e-52, "V'": 0.24775767566681905}, "mentions the rudeness of the staff": {"p-value": 5.402320141487665e-50, "V'": 0.32223617565671614}, "reports rude and unhelpful staff": {"p-value": 1.6733169241750866e-64, "V'": 0.3507531048236335}, "mentions the delay in departure or arrival": {"p-value": 4.381013402574045e-26, "V'": 0.22530701393478214}, "mentions the unfriendly cabin crew": {"p-value": 1.5400832485271266e-24, "V'": 0.22528832806603766}}, "-": {"mentions the lack of in-flight entertainment": {"p-value": 0.0007484777075468951, "V'": 0.05977454949030708}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the stars given in the review. The Group A snippets rate an airline overall 1/10, while the Group B snippets rate an airline overall 5/10. I am the manager of an airport. My goal is to figure out how to improve specific aspects of the airport and airplane experience for customers. 
"}, {"+": {"mentions the lack of inflight entertainment": {"p-value": 1.90149451722846e-10, "V'": 0.07600848170719643}, "mentions the lack of selection for breakfast": {"p-value": 1.655199871984226e-05, "V'": 0.022974516282294976}, "mentions the delay in takeoff": {"p-value": 0.00012348339948380348, "V'": 0.0468950675027037}, "mentions the lack of entertainment options": {"p-value": 2.496752851369731e-09, "V'": 0.06591114304631299}, "mentions the lack of a modern in-flight entertainment system": {"p-value": 1.3308228652010556e-10, "V'": 0.07683330944823835}, "mentions the lack of updated plane equipment": {"p-value": 9.540893979642334e-17, "V'": 0.10386124971557462}}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the stars given in the review. The Group A snippets rate an airline overall 8/10, while the Group B snippets rate an airline overall 10/10. I am the manager of an airport. My goal is to figure out how to improve specific aspects of the airport and airplane experience for customers. 
"}, {"+": {"mentions the lack of clear instructions or guidance from staff": {"p-value": 3.0112066099469907e-105, "V'": 0.6460725614006648}, "mentions the lack of customer service from the staff": {"p-value": 8.597344907926062e-209, "V'": 0.8009020627323179}, "mentions the difficulty of using airline apps or websites": {"p-value": 0.00016037748186080545, "V'": 0.055874741176545534}, "mentions the lack of customer service at check-in": {"p-value": 1.634926037155768e-59, "V'": 0.47538794631630893}, "mentions the difficulty in booking and selecting a seat": {"p-value": 1.506399688331216e-08, "V'": 0.10160802823633297}, "mentions rude and unhelpful staff at the airport terminal and during the flight": {"p-value": 2.295456594053181e-102, "V'": 0.6385382115303916}, "mentions a long wait time at check-in": {"p-value": 2.9480537980309644e-20, "V'": 0.21853838448611962}}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the stars given in the review. The Group A snippets rate an airline's service 1/5, while the Group B snippets rate an airline's service 5/5. I am the manager of an airport. My goal is to figure out how to improve specific aspects of the airport and airplane experience for customers. 
"}, {"+": {"mentions the lack of a welcome from the Cabin Chief of Staff": {"p-value": 8.534129991749927e-30, "V'": 0.34473019995894166}, "mentions rude or unprofessional staff": {"p-value": 0.0, "V'": 0.8888337837283026}, "mentions that the flight attendants were unfriendly": {"p-value": 9.838818237078997e-287, "V'": 0.6560153111316402}, "mentions the cabin crew being unfriendly and unhelpful": {"p-value": 0.0, "V'": 0.8500042790663069}, "mentions long delays and lack of communication": {"p-value": 0.0, "V'": 0.7408705285873207}, "mentions the boarding process as chaotic": {"p-value": 8.24171990916483e-14, "V'": 0.29975469451634157}, "mentions the lack of communication from the crew": {"p-value": 0.0, "V'": 0.8160162601364911}}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the stars given in the review. The Group A snippets rate an airline's staff 1/5, while the Group B snippets rate an airline's staff 5/5. I am the manager of an airport. My goal is to figure out how to improve specific aspects of the airport and airplane experience for customers. 
"}, {"+": {"mentions the lack of customer service": {"p-value": 1.2427501926828577e-275, "V'": 0.6587806929296467}, "mentions the discomfort of the seats": {"p-value": 6.385221500970279e-11, "V'": 0.10839702584287554}, "mentions a long wait time for the flight": {"p-value": 2.028859775931209e-61, "V'": 0.30113750992408217}, "mentions hidden fees such as checked baggage and seat selection": {"p-value": 7.574072464331487e-05, "V'": 0.058609183272656656}, "mentions the lack of quality customer service": {"p-value": 1.0197956902033585e-195, "V'": 0.586483983860124}, "mentions the lack of communication from the airline about the delay": {"p-value": 2.9435605922931193e-59, "V'": 0.2340083584302726}, "mentions the lack of information from staff": {"p-value": 2.0755489764161006e-69, "V'": 0.27656342980852844}, "mentions delays in flight times": {"p-value": 2.3976159717057785e-47, "V'": 0.2875758529224022}}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the stars given in the review. The Group A snippets rate an airline's value 1/5, while the Group B snippets rate an airline's value 5/5. I am the manager of an airport. My goal is to figure out how to improve specific aspects of the airport and airplane experience for customers. 
"}, {"+": {"mentions the long wait times to get through immigration or security": {"p-value": 1.05383128751069e-31, "V'": 0.2377654161170955}, "mentions the lack of seating in waiting areas": {"p-value": 4.1206163786476556e-22, "V'": 0.13287110721050882}, "mentions the lack of space in the departure lounge": {"p-value": 8.617667230553054e-15, "V'": 0.10594932763007323}, "mentions the lack of food choices": {"p-value": 3.3218417078737454e-08, "V'": 0.07850339351850968}, "Mentions the rudeness of the security staff": {"p-value": 2.0638471386657935e-42, "V'": 0.24577969037593547}, "mentions the long queues in passport control": {"p-value": 1.7758656788301565e-12, "V'": 0.12356417004050291}, "mentions the long queues at both departure and arrival": {"p-value": 3.456484420004541e-30, "V'": 0.20409520652855673}, "mentions a lack of signage or guidance": {"p-value": 3.5060817187321842e-53, "V'": 0.31491022781267997}}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the stars given in the review. The Group A snippets rate an airport's cleanliness 1/5, while the Group B snippets rate an airport's cleanliness 5/5. I am the manager of an airport. My goal is to figure out how to improve specific aspects of the airport and airplane experience for customers. 
"}, {"+": {"mentions the long line at the security checkpoint": {"p-value": 2.031072496776448e-34, "V'": 0.25627754423781623}, "mentions the lack of customer service from the staff": {"p-value": 2.1389297823411332e-182, "V'": 0.576804223052098}, "mentions a lack of automated kiosks for check-in": {"p-value": 1.9768295983833852e-05, "V'": 0.048843752643119506}, "mentions the long lines and inefficient layout": {"p-value": 4.623107126726566e-130, "V'": 0.4743041532905151}, "mentions inadequate and/or confusing signage": {"p-value": 1.1820391258998233e-29, "V'": 0.2234907594508994}, "mentions long queues for check-in": {"p-value": 6.403802997362604e-48, "V'": 0.3085804667997195}, "mentions the lack of toilet facilities": {"p-value": 0.0005214199959645043, "V'": 0.03324082811852884}, "mentions the long wait times for check-in or security": {"p-value": 3.109108347197771e-54, "V'": 0.334793197374133}}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the stars given in the review. The Group A snippets rate an airport 1/10 overall, while the Group B snippets rate an airport 5/10 overall. I am the manager of an airport. My goal is to figure out how to improve specific aspects of the airport and airplane experience for customers. 
"}, {"+": {"mentions long wait times": {"p-value": 4.429128488475441e-06, "V'": 0.08148826047981048}, "mentions the confusion in navigating the airport": {"p-value": 2.77739759531328e-22, "V'": 0.15640626920990947}, "mentions the need for an improvement in the signage or directions within the airport": {"p-value": 1.4734692328254936e-05, "V'": 0.03732355177253759}, "mentions the lack of a variety of food options in the departure lounge": {"p-value": 2.825936848330563e-08, "V'": 0.059287859157576055}, "mentions a long wait time at immigration": {"p-value": 4.782595055016742e-07, "V'": 0.058055756551580455}}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the stars given in the review. The Group A snippets rate an airport 8/10 overall, while the Group B snippets rate an airport 10/10 overall. I am the manager of an airport. My goal is to figure out how to improve specific aspects of the airport and airplane experience for customers. "}, {"+": {"mentions long queues at check-in": {"p-value": 4.667164747437843e-258, "V'": 0.6401193661006133}, "mentions long queue times for passport control": {"p-value": 4.01369177957398e-171, "V'": 0.5121175335332205}, "mentions long waits in queues": {"p-value": 1.2963047773094097e-285, "V'": 0.6869274128644547}, "mentions the long wait times at passport control": {"p-value": 4.53408871131842e-189, "V'": 0.5420830527967851}, "mentions the lack of staff, inefficient staff, or staff not being friendly": {"p-value": 4.198150996589694e-248, "V'": 0.651457492836604}, "mentions an excessive wait time": {"p-value": 4.313127461772158e-274, "V'": 0.6764378811156995}, "mentions the lack of clear signage or directions": {"p-value": 2.860809874517847e-11, "V'": 0.2015549329870463}}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. 
The two classes are generated based on the stars given in the review. The Group A snippets rate an airport's queue 1/5, while the Group B snippets rate an airport's queue 5/5. I am the manager of an airport. My goal is to figure out how to improve specific aspects of the airport and airplane experience for customers. "}, {"+": {"mentions the poor customer service": {"p-value": 0.0, "V'": 0.7245457782904381}, "mentions long queues at the check-in counter": {"p-value": 1.8424677586822017e-35, "V'": 0.2456949109159461}, "mentions the long queues at immigration": {"p-value": 2.5831633425463225e-32, "V'": 0.24239134801682566}, "mentions poor service from staff": {"p-value": 1.6724782861908925e-170, "V'": 0.5597134079952528}, "mentions the lack of air conditioning": {"p-value": 3.0937558662893685e-10, "V'": 0.0500160901790819}, "mentions the lack of sufficient seating in the departure hall": {"p-value": 1.1923595448853092e-46, "V'": 0.19903009970619318}, "mentions the poor quality of food in the cafe": {"p-value": 4.359991306807325e-39, "V'": 0.16902223414213743}, "mentions the lack of clear signage to direct passengers": {"p-value": 1.1913168682560828e-61, "V'": 0.2961215768682386}}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the stars given in the review. The Group A snippets rate an airport's shopping avenues 1/5, while the Group B snippets rate an airport's shopping avenues 5/5. I am the manager of an airport. My goal is to figure out how to improve specific aspects of the airport and airplane experience for customers. 
"}, {"+": {"mentions rudeness of staff": {"p-value": 0.0007904454045044738, "V'": 0.10461259298570563}, "Expresses dissatisfaction with flight delays with no explanation": {"p-value": 0.0006572302213565803, "V'": 0.10660646843748067}, "Complains about the lack of service and commitment for the price of airfare": {"p-value": 1.5511151551037434e-08, "V'": 0.1703809905008492}, "mentions American Airlines' staff as being inefficient or unhelpful": {"p-value": 7.897672726044281e-39, "V'": 0.3982875247094848}, "mentions American Airlines' staff as being unhelpful or rude": {"p-value": 1.0766101033890204e-38, "V'": 0.38750382998033356}, "Mentions old and run-down planes": {"p-value": 1.1924958452410967e-05, "V'": 0.11972539308102853}, "expresses frustration with American Airlines' delay in notification of flight issues": {"p-value": 1.058678796340054e-13, "V'": 0.21679774892447912}}, "-": {"mentions Delta's customer service as unhelpful or rude": {"p-value": 4.426831232204259e-12, "V'": 0.1890767198945624}, "mentions Delta Airlines' staff as being friendly and helpful": {"p-value": 3.4132037447500545e-28, "V'": 0.2832511059819941}, "mentions the competence of the flight attendants": {"p-value": 4.472534914811929e-06, "V'": 0.12600591200150707}, "mentions Delta Airlines' staff as being friendly and courteous": {"p-value": 6.692330924492009e-31, "V'": 0.3014171096605162}, "mentions the flight attendants being friendly and accommodating": {"p-value": 6.759146108775758e-07, "V'": 0.14722373303821554}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the airline the reviewer flew. The Group A snippets review American Airlines flights, while the Group B snippets review Delta Airlines flights. I am a consumer researcher at an airline company. My goal is to figure out which aspects of each airline stand out to customers. 
"}, {"+": {"mentions the lack of personal TVs": {"p-value": 7.545914277019726e-06, "V'": 0.07371569850959385}, "mentions the lack of in-seat entertainment": {"p-value": 1.6338599556540226e-06, "V'": 0.11562547157053032}, "mentions the lack of a personal screen or small screen size": {"p-value": 1.3723118702946673e-06, "V'": 0.0918682075566917}, "mentions the lack of individual entertainment at each seat": {"p-value": 7.375239123566628e-05, "V'": 0.08500829771659608}, "mentions the lack of individual entertainment systems": {"p-value": 1.2090203431401918e-06, "V'": 0.10765565294894411}}, "-": {"mentions the excellent service of the crew": {"p-value": 5.016230893384404e-08, "V'": 0.16004568202461772}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the airline the reviewer flew. The Group A snippets review American Airlines flights, while the Group B snippets review Delta Airlines flights. I am the CEO of an airline company. My goal is to figure out what services and products are only offered by the other airline. 
"}, {"+": {"expresses dissatisfaction with the cramped seating": {"p-value": 2.9863106335486303e-18, "V'": 0.22653876285859362}, "mentions difficulty with cramped seats": {"p-value": 4.138409300923749e-17, "V'": 0.2061208256656755}, "mentions the cabin crew of British Airways as being friendly and professional": {"p-value": 2.3788261680769837e-71, "V'": 0.44628321342745936}, "mentions the comfort of the seats on the British Airways A380": {"p-value": 1.2130290218605236e-38, "V'": 0.26426335826261405}, "mentions the quality of the aircraft and cabin crew": {"p-value": 5.068259411064984e-22, "V'": 0.15199231331861607}, "mentions the age of the aircraft": {"p-value": 7.665957193748271e-31, "V'": 0.21686992702072824}, "mentions the attentive and friendly cabin crew": {"p-value": 2.1355866194759358e-17, "V'": 0.2306714516433071}, "mentions British Airway's staff as being pleasant and attentive": {"p-value": 3.5842882085938204e-69, "V'": 0.4408980080127687}, "mentions the new Club Europe seats as being too small in width and pitch": {"p-value": 6.804504230409997e-15, "V'": 0.10248853018887542}}, "-": {"mentions Ryanair's strict rules and fees for not following them": {"p-value": 1.8589760027995907e-96, "V'": 0.4157900955671511}, "mentions delays and lack of communication": {"p-value": 0.00017379575067314569, "V'": 0.1033183546895835}, "mentions difficulty with online check-in process": {"p-value": 9.89220059037948e-18, "V'": 0.12090773903864516}, "mentions the lack of reasonable explanation concerning criteria for cabin luggage": {"p-value": 1.648244835074641e-19, "V'": 0.1461253278724774}, "mentions Ryanair's strictness about carry-on bags": {"p-value": 1.6186206219412547e-39, "V'": 0.21064757423751865}, "mentions high fees for luggage, boarding passes, and seating": {"p-value": 1.1811198005731716e-32, "V'": 0.24975810032898216}, "mentions Ryanair's hand-luggage policy": {"p-value": 2.738243323687118e-44, "V'": 0.22224217145210495}, "mentions having to pay additional 
fees for items like water": {"p-value": 2.8341745612161756e-18, "V'": 0.11813935429926907}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the airline the reviewer flew. The Group A snippets review British Airways flights, while the Group B snippets review Ryanair flights. I am a consumer researcher at an airline company. My goal is to figure out which aspects of each airline stand out to customers. "}, {"+": {"mentions the high quality of service": {"p-value": 5.389996835838638e-06, "V'": 0.1191791033591908}, "mentions the presence of a lounge or spa service": {"p-value": 1.5987052754288471e-24, "V'": 0.16509013507814224}, "mentions the availability of Concorde Room": {"p-value": 0.0007331273864753185, "V'": 0.02007537412853996}, "mentions the lack of legroom": {"p-value": 0.000696150682946007, "V'": 0.04011338785271077}}, "-": {"mentions the strict enforcement of the baggage policy": {"p-value": 1.0754844100429082e-21, "V'": 0.11640973132727336}, "mentions being charged for printing boarding passes or for basic services like a glass of water": {"p-value": 2.0047292185792658e-30, "V'": 0.16211481369860056}, "mentions the extra fees associated with baggages": {"p-value": 2.6046777701796877e-20, "V'": 0.1023164800587213}, "mentions the need to pay extra to print boarding passes": {"p-value": 1.0990192412570048e-26, "V'": 0.1303828724796819}, "mentions the priority boarding": {"p-value": 1.0498448423638913e-11, "V'": 0.08866053095439874}, "mentions the issue of paying extra for boarding passes": {"p-value": 4.746335632189992e-34, "V'": 0.16428575108236299}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the airline the reviewer flew. The Group A snippets review British Airways flights, while the Group B snippets review Ryanair flights. 
I am the CEO of an airline company. My goal is to figure out what services and products are only offered by the other airline. "}, {"+": {"mentions Air Canada's staff as pleasant and efficient": {"p-value": 8.864990426952033e-09, "V'": 0.14626687052202472}, "mentions Air Canada's staff as being friendly and helpful": {"p-value": 6.913976790811727e-08, "V'": 0.1397615880732054}, "mentions the new configuration on the Boeing 777-300": {"p-value": 6.333780961577129e-07, "V'": 0.0386364023733621}, "mentions the friendly airline crew": {"p-value": 1.274182665233999e-05, "V'": 0.11868211469425685}, "mentions a pleasant flight experience": {"p-value": 2.3678822478955486e-11, "V'": 0.16378913059653386}, "mentions excellent service from flight attendants": {"p-value": 6.121031149736496e-09, "V'": 0.1363344845640872}}, "-": {"comments on the lack of legroom in the economy class": {"p-value": 2.4444848159577035e-38, "V'": 0.3421188906545804}, "expresses dissatisfaction with the lack of legroom on the planes": {"p-value": 2.623458124121469e-47, "V'": 0.4163550027865003}, "mentions the limited selection of movies and TV shows available on the app": {"p-value": 0.00012583364058188119, "V'": 0.041265714808404574}, "mentions the uncomfortable seats on Air Canada Rouge flights": {"p-value": 2.629936286436405e-48, "V'": 0.4289117767648132}, "mentions lack of legroom": {"p-value": 1.3384706581588018e-45, "V'": 0.384833689526345}, "mentions lack of legroom in seats": {"p-value": 1.0654111286825105e-49, "V'": 0.4207080095731936}, "expresses frustration with limited legroom in Economy seats": {"p-value": 4.566600603797419e-51, "V'": 0.42839254169916086}, "mentions uncomfortable seats": {"p-value": 5.453662256564442e-41, "V'": 0.3989188217358174}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the airline the reviewer flew. 
The Group A snippets review Air Canada flights, while the Group B snippets review Air Canada Rouge flights. I am a consumer researcher at an airline company. My goal is to figure out which aspects of each airline stand out to customers. "}, {"+": {"mentions the quality of the food and beverage selection": {"p-value": 1.9821449096840387e-10, "V'": 0.19374818007221178}, "mentions the pleasant airport staff": {"p-value": 2.8946166327051372e-05, "V'": 0.11940621966448861}, "mentions the lie-flat beds in Business Class": {"p-value": 7.445953127203748e-07, "V'": 0.04453242943007077}, "mentions attentive and friendly crew": {"p-value": 0.00033519154258147854, "V'": 0.10565837348900164}}, "-": {"mentions the uncomfortable seating pitch": {"p-value": 6.357798321087529e-48, "V'": 0.4311212307985677}, "mentions the lack of legroom in the seats": {"p-value": 1.202167464807426e-49, "V'": 0.4369232596014487}, "mentions the lack of legroom": {"p-value": 1.735787962558252e-45, "V'": 0.4165415950411042}, "mentions the narrow seating and lack of legroom": {"p-value": 3.344855440963592e-45, "V'": 0.4190928415268922}, "mentions small seats with no legroom": {"p-value": 4.92138343212298e-52, "V'": 0.4477062447706269}, "mentions the cramped legroom": {"p-value": 8.1574707428108435e-50, "V'": 0.4391083602965472}, "mentions the lack of seat selection when booking": {"p-value": 1.5211456928517117e-05, "V'": 0.11042066739440957}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the airline the reviewer flew. The Group A snippets review Air Canada flights, while the Group B snippets review Air Canada Rouge flights. I am the CEO of an airline company. My goal is to figure out what services and products are only offered by the other airline. 
"}, {"+": {"expresses dissatisfaction with the lack of customer service from Air Canada": {"p-value": 0.0008501001228327659, "V'": 0.07581992213478303}, "mentions Jet Airways' staff as being unfriendly or unhelpful": {"p-value": 0.0006637006389304342, "V'": 0.08082876765138725}}, "-": {"mentions of rude or unfriendly staff": {"p-value": 0.00034338617633976, "V'": 0.09428380170907774}, "mentions the quality of the in seat entertainment system": {"p-value": 4.149521011754532e-13, "V'": 0.1842334354193588}, "expresses satisfaction with Emirates' seats and legroom": {"p-value": 1.1253458103592865e-18, "V'": 0.1686159330524103}, "mentions the modern aircraft used by Emirates": {"p-value": 3.443202040129635e-81, "V'": 0.4397840714472145}, "mentions the excellent entertainment system": {"p-value": 7.324938580264508e-24, "V'": 0.23013505264754036}, "mentions the high quality of food and drinks": {"p-value": 8.836219229388241e-08, "V'": 0.13536250581801476}, "mentions the generous drinks service": {"p-value": 0.0001484743528825819, "V'": 0.06620888089190646}, "mentions the high quality of Emirates service on the flight": {"p-value": 2.092282020434155e-38, "V'": 0.3185176596054451}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the airline the reviewer flew. The Group A snippets review Jet Airways flights, while the Group B snippets review Emirates flights. I am a consumer researcher at an airline company. My goal is to figure out which aspects of each airline stand out to customers. 
"}, {"+": {"mentions the lack of inflight entertainment": {"p-value": 2.1290984061123601e-13, "V'": 0.13905364037865098}, "mentions the need to pay for excess weight": {"p-value": 0.00039781266021682814, "V'": 0.022849065578950833}}, "-": {"mentions the limousine service to the airport": {"p-value": 3.8212752494053666e-07, "V'": 0.043749504763706225}, "mentions a long wait for the cabin crew to respond to requests": {"p-value": 2.398091562678831e-09, "V'": 0.11747777757678983}, "mentions the complimentary limousine service": {"p-value": 1.3619311558132533e-07, "V'": 0.04650679251289222}, "mentions the free chauffeur service": {"p-value": 3.825769110695568e-07, "V'": 0.04372766178601216}, "notes the cabin staff are unfriendly": {"p-value": 6.43873604201557e-05, "V'": 0.10160186650258779}, "Mentions the unprofessional behaviour of the cabin crew": {"p-value": 0.0001424140420859099, "V'": 0.10193431534294795}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the airline the reviewer flew. The Group A snippets review Jet Airways flights, while the Group B snippets review Emirates flights. I am the CEO of an airline company. My goal is to figure out what services and products are only offered by the other airline. 
"}, {"+": {"mentions high baggage fees for international flights": {"p-value": 5.702486059514977e-08, "V'": 0.12398571061480596}, "mentions the Spirit Airlines' fee for carry-on luggage": {"p-value": 1.798680001600635e-30, "V'": 0.32985925263669624}, "mentions Spirit Airlines' charge for carry-on baggage": {"p-value": 1.176527645051473e-29, "V'": 0.32669704641157754}, "mentions Spirit Airlines' baggage fees": {"p-value": 2.5088350397825775e-33, "V'": 0.3546415822392867}, "mentions Spirit Airlines' cramped seating": {"p-value": 9.006057282049192e-19, "V'": 0.22268518599103143}, "mentions Spirit Airlines' hidden fees": {"p-value": 2.98226708972416e-45, "V'": 0.4257177654051495}, "mentions hidden fees and other costs associated with Spirit Airlines": {"p-value": 1.0311631153857754e-10, "V'": 0.18842659819365337}, "mentions extra fees for carry-on luggage": {"p-value": 0.00024630780258782234, "V'": 0.1134061863745936}}, "-": {"mentions Frontier's communication being awful": {"p-value": 7.837057855520862e-38, "V'": 0.3886269705913319}, "mentions Frontier's fee for a second carry-on bag": {"p-value": 5.477893561489772e-13, "V'": 0.16568583004134674}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the airline the reviewer flew. The Group A snippets review Spirit flights, while the Group B snippets review Frontier flights. I am a consumer researcher at an airline company. My goal is to figure out which aspects of each airline stand out to customers. 
"}, {"+": {"mentions extra fee for checked bag": {"p-value": 0.0001215596090286681, "V'": 0.108149776875031}, "mentions the high cost of checked baggage": {"p-value": 8.651008639350397e-05, "V'": 0.10162024131718861}, "mentions the additional charges for checked and carry-on bags": {"p-value": 2.4610296886130544e-05, "V'": 0.1261184488195769}, "mentions lack of legroom in seats": {"p-value": 0.0006741478072428348, "V'": 0.052343384041969185}}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the airline the reviewer flew. The Group A snippets review Spirit flights, while the Group B snippets review Frontier flights. I am the CEO of an airline company. My goal is to figure out what services and products are only offered by the other airline. "}, {"+": {"expresses frustration with United's on-board WiFi system": {"p-value": 0.0004639827713802261, "V'": 0.027085442090380062}}, "-": {"mentions American Airlines' staff as being friendly and efficient": {"p-value": 2.889833500373635e-17, "V'": 0.15571657915503032}, "mentions American Airlines' customer service as improved": {"p-value": 5.738428326410587e-17, "V'": 0.1398750402214523}, "mentions difficulty with the American Airlines ticket and baggage agents": {"p-value": 4.649954458864407e-31, "V'": 0.25823414220618357}, "mentions the quality of in-flight entertainment on American Airlines": {"p-value": 1.3688027595264343e-18, "V'": 0.16292083333380825}, "mentions that American Airlines' planes are old and uncomfortable": {"p-value": 1.9930392705409993e-31, "V'": 0.21521999996770938}, "mentions the lack of IFE": {"p-value": 0.00034852115110402585, "V'": 0.07526666007115992}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the airline the reviewer flew. 
The Group A snippets review United Airlines flights, while the Group B snippets review American Airlines flights. I am a consumer researcher at an airline company. My goal is to figure out which aspects of each airline stand out to customers. "}, {"+": {}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the airline the reviewer flew. The Group A snippets review United Airlines flights, while the Group B snippets review American Airlines flights. I am the CEO of an airline company. My goal is to figure out what services and products are only offered by the other airline. "}, {"+": {"mentions the seat comfort": {"p-value": 0.0002620285762345703, "V'": 0.10867383609632308}, "mentions the comfort level of the seats": {"p-value": 0.0001271507567255528, "V'": 0.10704791081597698}, "mentions the comfort of the seats": {"p-value": 0.0005808458923591365, "V'": 0.0930320744428316}}, "-": {"mentions the punctuality of the flight": {"p-value": 3.294884206687001e-06, "V'": 0.12697968824122674}, "mentions the punctuality and timeliness of the flight": {"p-value": 1.6271718114046574e-05, "V'": 0.12260174130123486}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the country the reviewer is from. The Group A snippets are airline reviews from Singapore, while the Group B snippets are airline reviews from India. I am an owner of an international airline. My goal is to figure out which specific features reviewers from each country care about. 
"}, {"+": {"mentions the efficiency of the cabin crew": {"p-value": 3.4464056833765417e-10, "V'": 0.13814733313336997}, "mentions comfort of the seating": {"p-value": 3.681268407431747e-07, "V'": 0.1092063335159244}, "mentions the on-board crew's professionalism and efficiency": {"p-value": 5.309715782097333e-07, "V'": 0.10730631957083747}, "mentions the legroom and seating arrangements": {"p-value": 1.246380789070083e-06, "V'": 0.10686240747692027}, "mentions the comfort of the seats": {"p-value": 2.3707687024555608e-05, "V'": 0.08651873696816176}, "mentions the seat comfort": {"p-value": 5.0295671569791285e-06, "V'": 0.09734908425754174}, "mentions the quality of the cabin crew": {"p-value": 2.2910704595943588e-11, "V'": 0.14657761607201447}, "mentions the staff's friendliness": {"p-value": 1.578669811330955e-06, "V'": 0.1039078713981324}, "mentions the helpfulness of the ground staff": {"p-value": 0.00014559694063546422, "V'": 0.07366639230046934}, "mentions the airline crew and their politeness": {"p-value": 1.2137250580389747e-07, "V'": 0.11788393160649457}}, "-": {"mentions the mistakes of the airline": {"p-value": 3.861372197869974e-09, "V'": 0.12487211931636943}, "mentions the delays encountered with the airline": {"p-value": 1.3643436740828885e-11, "V'": 0.14385959022096645}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the country the reviewer is from. The Group A snippets are airline reviews from the UK, while the Group B snippets are airline reviews from the US. I am an owner of an international airline. My goal is to figure out which specific features reviewers from each country care about. 
"}, {"+": {"mentions the efficiency of the staff": {"p-value": 3.7693900003130764e-05, "V'": 0.058244224202139816}, "mentions the cleanliness of the cabin and the maintenance": {"p-value": 0.0009006117875558663, "V'": 0.06516275999620985}, "mentions the Qantas Club": {"p-value": 1.6150168115399127e-06, "V'": 0.026001947394935932}, "mentions the friendly flight attendants": {"p-value": 8.914862414801206e-13, "V'": 0.1583454418876929}, "mentions the overall comfort of the plane": {"p-value": 3.204974021384304e-07, "V'": 0.11218622915984122}, "mentions the professionalism of the staff": {"p-value": 2.5485720688339737e-07, "V'": 0.09058723177031214}}, "-": {"mentions the on-time performance": {"p-value": 1.466759163571637e-05, "V'": 0.09367044545878528}, "mentions the timeliness of the flight": {"p-value": 0.0009641326705029832, "V'": 0.0735074292801961}, "mentions the lack of legroom": {"p-value": 8.87361528611948e-09, "V'": 0.09891188573880227}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the country the reviewer is from. The Group A snippets are airline reviews from Australia, while the Group B snippets are airline reviews from Canada. I am an owner of an international airline. My goal is to figure out which specific features reviewers from each country care about. "}, {"+": {}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the country the reviewer is from. The Group A snippets are airline reviews from Germany, while the Group B snippets are airline reviews from France. I am an owner of an international airline. My goal is to figure out which specific features reviewers from each country care about. "}, {"+": {}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. 
The two classes are generated based on the type of seat on the reviewed plane. The Group A snippets review airplane seats in a 3x3x3 arrangement, while the Group B snippets review airplane seats in a 3x3 arrangement. I am a designer of airplane seats. My goal is to figure out what customers like and don't like about each type of seat. "}, {"+": {"notes the lack of space for passengers": {"p-value": 0.0008447379348619697, "V'": 0.12628907894120878}}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the type of seat on the reviewed plane. The Group A snippets review airplane seats in a 3x4x3 arrangement, while the Group B snippets review airplane seats in a 2x4x2 arrangement. I am a designer of airplane seats. My goal is to figure out what customers like and don't like about each type of seat. "}, {"+": {}, "-": {"mentions how the food and cabin crew were superb": {"p-value": 0.0004749280961908122, "V'": 0.1588296336731052}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the model of the plane the reviewer flew on. The Group A snippets review seats on the Airbus 340, while the Group B snippets review seats on the Airbus 330. I am a product manager at an airplane manufacturer. My goal is to figure out what customers like and don't like about each plane model. "}, {"+": {}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the model of the plane the reviewer flew on. The Group A snippets review seats on the Airbus 380, while the Group B snippets review seats on the Airbus 340. I am a product manager at an airplane manufacturer. My goal is to figure out what customers like and don't like about each plane model. 
"}, {"+": {"mentions how the seat configuration is 3x4x3, which is uncomfortable for long haul flights": {"p-value": 4.9979195569086306e-05, "V'": 0.12031520711278779}}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the model of the plane the reviewer flew on. The Group A snippets review seats on the Boeing 777, while the Group B snippets review seats on the Boeing 747. I am a product manager at an airplane manufacturer. My goal is to figure out what customers like and don't like about each plane model. "}, {"+": {"mentions the short seat pitch": {"p-value": 7.280745800248038e-08, "V'": 0.22012367901765664}, "mentions the lack of legroom": {"p-value": 9.250460820628765e-05, "V'": 0.14951268223358055}, "mentions the cramped seating space": {"p-value": 1.568393566855615e-05, "V'": 0.18819263135055697}, "mentions the cramped seating and/or lack of legroom": {"p-value": 2.145001353438624e-06, "V'": 0.2040393405135223}}, "-": {"mentions the extra width and comfort of the seat": {"p-value": 3.0079732109932336e-07, "V'": 0.1791243958909988}, "mentions improved sleep quality compared to economy": {"p-value": 4.8008636415020946e-11, "V'": 0.08349863846408456}, "mentions the level of recline": {"p-value": 6.10658631489163e-05, "V'": 0.15551536991272305}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the cabin of the customer. The Group A snippets review airplane seats for economy passengers, while the Group B snippets review airplane seats for premium passengers. I am a product manager at an airline company. My goal is to figure out the specific needs of customers in each cabin. "}, {"+": {}, "-": {}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the type of traveller. 
The Group A snippets are airline ratings from family travellers, while the Group B snippets are airline ratings from business travellers. I am a product manager at an airline company. My goal is to figure out the specific needs of different kinds of travellers. "}, {"+": {}, "-": {"mentions the long waiting times for check-in and security": {"p-value": 0.0008695211645497319, "V'": 0.17337452397709058}, "mentions difficulty navigating the airport": {"p-value": 1.3483841452208046e-05, "V'": 0.2105447949298508}, "mentions difficulty with the check-in process": {"p-value": 2.395926869133908e-07, "V'": 0.2662517142485929}, "mentions the difficulty of navigating the airport": {"p-value": 0.0005524001663356744, "V'": 0.16497032368536835}, "mentions long queues and slow processing times": {"p-value": 0.0009217067625030442, "V'": 0.166794044620996}, "complains about long queues and slow security checks": {"p-value": 0.00025522582273247604, "V'": 0.17757773099403873}, "mentions long queues and slow security checks": {"p-value": 0.0002953668656723127, "V'": 0.18400350950941402}, "mentions the long lines and lack of organization": {"p-value": 0.0005603781803129728, "V'": 0.16021053149696518}}, "research goal": "The dataset includes reviews of airlines collected from the review website Skytrax. The two classes are generated based on the type of traveller. The Group A snippets are airline ratings from solo travellers, while the Group B snippets are airline ratings from couple travellers. I am a product manager at an airline company. My goal is to figure out the specific needs of different kinds of travellers. 
"}, {"+": {}, "-": {"uses language that implies a lack of respect or understanding, such as 'backseat driving' or 'interrupting'": {"p-value": 0.00030392785308885493, "V'": 0.06123905640516758}, "uses language that is overly confrontational, such as 'demand' and 'forced'": {"p-value": 7.445554395001847e-07, "V'": 0.07400950078036339}, "uses aggressive language or tone": {"p-value": 2.7870576170169076e-05, "V'": 0.048752831073417546}, "employs aggressive language, such as 'retaliate' and 'creep up'": {"p-value": 0.00027183786341182737, "V'": 0.05731055631336224}, "uses emotionally charged language, such as 'rage' or 'fit of rage'": {"p-value": 0.0006005256548707261, "V'": 0.05799285015173472}, "uses language that is confrontational, such as 'snap' and 'shout'": {"p-value": 5.0232662167605344e-05, "V'": 0.05532047104602633}, "focuses on the other person's wrongdoings and disregards their own mistakes": {"p-value": 1.4353791649985363e-09, "V'": 0.13456610290727705}, "employs aggressive language, such as 'threatening,' 'demanding,' or 'yelling'": {"p-value": 0.0006325088751583416, "V'": 0.053101690863839535}, "describes a situation in which the author is not the aggressor": {"p-value": 6.255670312484736e-10, "V'": 0.13493600499803565}, "uses language that emphasizes the other person's wrong doing, such as 'he was being a jerk' or 'he was an asshoole'": {"p-value": 4.703175904437738e-06, "V'": 0.08202452453657672}, "uses language to emphasize the importance of the needs of the author, such as 'unimportant to me' or 'putting everyone else before her'": {"p-value": 0.00047224864605640557, "V'": 0.0745244728519096}, "uses language to emphasize the harm done to the author, such as 'increase her stress load' or 'ruining our Friday night'": {"p-value": 2.582504193472904e-13, "V'": 0.16236170181468718}}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum where people ask others whether they were in the wrong. 
The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios where the author is the asshole, while the Group B snippets describe scenarios where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the speaking style of people in the wrong. "}, {"+": {"demonstrates a willingness to learn from their mistakes and make amends": {"p-value": 5.297740154516297e-10, "V'": 0.0909797334799787}, "shows a willingness to listen to and consider the other person's opinion": {"p-value": 0.00012478128998306108, "V'": 0.05532266499838939}, "showed empathy towards someone else's feelings and situation": {"p-value": 0.0004077307930982457, "V'": 0.06752583169862272}}, "-": {"Does not take responsibility for their actions or the consequences of their actions": {"p-value": 1.8225967978709545e-09, "V'": 0.10094924833016483}, "exhibits a lack of empathy towards how their actions may affect others": {"p-value": 2.4147307385197283e-05, "V'": 0.08073439137947369}, "displays a lack of understanding of the impact of their actions on another person": {"p-value": 0.0001795284599946097, "V'": 0.07417478046147968}, "fails to recognize the effects of their behavior on the other person": {"p-value": 1.733717097675746e-05, "V'": 0.06842822211668495}, "fails to take responsibility for their own mistakes": {"p-value": 0.0004504161944233754, "V'": 0.04637626511461766}}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum where people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios where the author is the asshole, while the Group B snippets describe scenarios where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the actions which were judged as right or wrong. 
"}, {"+": {"exaggerates the situation to appear victimized": {"p-value": 1.7790163098644532e-09, "V'": 0.14348621629985098}, "uses language that is dismissive and belittling of the other person's opinion": {"p-value": 3.2105185290901474e-06, "V'": 0.11165838279815843}, "employs language that is hostile or aggressive, such as 'ranting' or 'yelling'": {"p-value": 4.8117509079206726e-11, "V'": 0.1527521089609356}, "uses emotionally charged language, such as 'furious' or 'angry'": {"p-value": 1.9125176719481273e-06, "V'": 0.11436881995503101}, "employs emotionally charged language, such as 'furious', 'put-out', and 'winded'": {"p-value": 1.252085603914803e-07, "V'": 0.11623404808102544}, "uses language of blame": {"p-value": 4.583798587328995e-06, "V'": 0.1077413197947732}, "uses language to blame the other person": {"p-value": 1.5779902765957864e-13, "V'": 0.1638480888906223}, "uses a confrontational or aggressive tone": {"p-value": 9.364963807264063e-10, "V'": 0.13138102312023525}, "attempts to shift blame to the other person": {"p-value": 1.349111543525533e-10, "V'": 0.14753559119985837}, "employs manipulative language to gain control, such as 'you owe me' or 'it's only fair'": {"p-value": 1.5275789433247145e-07, "V'": 0.10365792914632088}, "attempts to invalidate the other person's feelings or experiences, such as 'you're overreacting'": {"p-value": 0.0001254693270397457, "V'": 0.0736548670020448}, "uses aggressive language, such as 'you're wrong' or 'you're stupid'": {"p-value": 3.789036620881292e-06, "V'": 0.10634390760134749}, "uses language that is hostile or aggressive": {"p-value": 2.9126291705870867e-07, "V'": 0.12267958841780502}, "uses language that implies the other person is wrong, such as 'you're not taking care of the chores as you should'": {"p-value": 1.0021045048248998e-05, "V'": 0.10541787523493706}, "Uses blaming language, such as 'you should have' or 'you need to'": {"p-value": 0.0005819011897778472, "V'": 0.07468141100462872}, "Uses language to 
make the other person seem irrational or foolish, such as 'ridiculous' or 'insane'": {"p-value": 2.660574526293595e-05, "V'": 0.09285682346976426}, "uses language that expresses frustration, such as 'sick of' or 'fed up'": {"p-value": 2.831092067212685e-08, "V'": 0.1273735590843782}, "uses language to emphasize the lack of understanding or empathy, such as 'I really didn't'": {"p-value": 0.0003636085663808315, "V'": 0.0838876726720807}, "uses language of blame, such as 'you made me' or 'you caused this'": {"p-value": 1.2024285906568266e-07, "V'": 0.1189116450550523}, "uses language to blame the other person, such as 'it's all your fault' or 'you are the one who caused this'": {"p-value": 2.2503072169455864e-07, "V'": 0.12434760254275562}}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum where people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios where everyone is an asshole, while the Group B snippets describe scenarios where the author is the asshole. I am a sociologist studying moral judgements. My goal is to figure out the speaking style of people in the wrong. 
"}, {"+": {"involves an act of aggression or violence": {"p-value": 1.9499278184913074e-10, "V'": 0.14634640662636994}, "demonstrates an unwillingness to listen to another's perspective": {"p-value": 1.0309189680099566e-07, "V'": 0.12238551360684524}, "demonstrates a lack of respect for others' boundaries": {"p-value": 9.610949498981233e-06, "V'": 0.09080378928637978}, "refuses to take responsibility for their actions": {"p-value": 2.1573978818763574e-07, "V'": 0.12224573605559819}, "ignores the feelings of others": {"p-value": 2.1287616902059492e-09, "V'": 0.14172606209421817}, "exhibits selfishness or a lack of care for the feelings of others": {"p-value": 7.501958034048703e-05, "V'": 0.07621377937658125}, "shows a lack of empathy or understanding of the other person's perspective": {"p-value": 1.8959095524510125e-06, "V'": 0.11253658051319537}, "actions are motivated by selfishness or a lack of consideration for others": {"p-value": 6.940332778545172e-08, "V'": 0.1207278887772647}, "Exercises selfishness and disregards the feelings of others": {"p-value": 8.045724519443697e-08, "V'": 0.11345086557100148}}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios where everyone is an asshole, while the Group B snippets describe scenarios where the author is the asshole. I am a sociologist studying moral judgements. My goal is to figure out the actions which were judged as right or wrong. 
"}, {"+": {"uses language of justification and reasoning, such as 'I didn't really care', 'it's none of my business', and 'for the record'": {"p-value": 1.1357780181948258e-09, "V'": 0.13451027700830454}, "uses language that emphasizes the importance of communication, such as 'talk', 'discussed', and 'conversation'": {"p-value": 3.167784823450757e-05, "V'": 0.08977911996979926}, "Uses language that shows understanding of the other person's perspective, such as 'I understand' and 'I know'": {"p-value": 5.497971345087519e-11, "V'": 0.1144296350265313}, "Uses language that shows respect for the other person, such as 'I respect your opinion' or 'I respect your decision'": {"p-value": 0.00016679404641489808, "V'": 0.03864114917118427}, "uses language to emphasize understanding and respect for the other person's opinion, such as 'you're entitled to your opinion' and 'I respect your opinion'": {"p-value": 2.2613980444551994e-07, "V'": 0.06799850810591979}, "uses language to emphasize the importance of the speaker's own opinion, such as 'I feel' or 'my opinion is'": {"p-value": 0.0008165596644117034, "V'": 0.06307765438307822}, "mentions their own appreciation or gratitude for someone or something else": {"p-value": 0.0006809862272957289, "V'": 0.055004975551389695}, "uses language to express the author's conflicting feelings, such as 'conflicted' or 'I was really conflicted'": {"p-value": 4.7584305182762346e-07, "V'": 0.10529634608142457}, "uses language that expresses a desire for understanding, such as 'I understand' or 'I sympathize'": {"p-value": 6.991902959889311e-09, "V'": 0.11973361845961}, "describes situations where a resolution is possible": {"p-value": 0.00012058540609123398, "V'": 0.08546162704297128}, "Uses language that focuses on their own feelings, such as 'I feel' or 'it hurts'": {"p-value": 3.735570177443778e-06, "V'": 0.10032508830184561}, "uses language of understanding and empathy": {"p-value": 7.942308898920968e-17, "V'": 0.18437931289813086}, 
"expresses understanding and empathy for the other person's opinion": {"p-value": 1.642024301007968e-05, "V'": 0.05677302314814747}}, "-": {"uses language that is hostile, such as 'screamed' or 'yelled'": {"p-value": 6.48513428711905e-13, "V'": 0.11558428006145689}, "describes the emotions of the speaker, such as 'angry' or 'upset'": {"p-value": 0.0005617840410145527, "V'": 0.07700515962137228}}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios where there are \"no assholes here\", while the Group B snippets describe scenarios where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the speaking style of people in the wrong. "}, {"+": {"shows empathy towards the other parties involved": {"p-value": 2.1559316645958463e-11, "V'": 0.14507114378535324}, "tries to find a resolution without resorting to violence or aggression": {"p-value": 1.5943025558092248e-06, "V'": 0.1029439470436393}, "shows respect for other people's beliefs and opinions": {"p-value": 2.2246161514616327e-14, "V'": 0.11576090233378322}, "recognizes and respects the feelings of others": {"p-value": 5.343354761968348e-17, "V'": 0.17798151182619076}, "expresses empathy towards others": {"p-value": 4.4661774539800936e-08, "V'": 0.12128153325867663}, "shows respect for other people's privacy and boundaries": {"p-value": 1.533597980552759e-11, "V'": 0.1381092115794659}, "shows a willingness to put the needs of others ahead of their own": {"p-value": 0.00048368849415457866, "V'": 0.06790301177744804}, "acts with compassion and understanding towards others": {"p-value": 2.3353843560433366e-13, "V'": 0.1470485273409379}, "Shows empathy and understanding for others": {"p-value": 1.0302581099716107e-13, "V'": 
0.16406470724158667}, "shows a willingness to listen to and accept feedback from others": {"p-value": 1.7569529083997577e-11, "V'": 0.14069458618035857}, "Acknowledges their partner's feelings and tries to be supportive": {"p-value": 3.5133622205487254e-13, "V'": 0.15588363666779692}, "seeks to understand the other person's perspective": {"p-value": 1.953210693228804e-17, "V'": 0.18765737610057714}, "shows understanding of the other person's feelings and respects their decisions": {"p-value": 2.4565796117822765e-17, "V'": 0.17874532472355825}, "offers an explanation for their behaviour that shows understanding of its consequences": {"p-value": 2.693129861757031e-05, "V'": 0.08776046329907139}, "shows respect and understanding for conflicting opinions": {"p-value": 9.64514493034213e-15, "V'": 0.14047267545272094}, "has a willingness to listen to others' opinions": {"p-value": 5.475423760872583e-15, "V'": 0.16395360836396827}}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios where there are \"no assholes here\", while the Group B snippets describe scenarios where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the actions which were judged as right or wrong. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe work-related scenarios where the author is the asshole, while the Group B snippets describe work-related scenarios where the author is not the asshole. I am a sociologist studying moral judgements. 
My goal is to figure out the speaking style of people in the wrong. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe work-related scenarios where the author is the asshole, while the Group B snippets describe work-related scenarios where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the actions which were judged as right or wrong. "}, {"+": {"acknowledges their own mistakes and how it has hurt the other person": {"p-value": 0.0009568958100604252, "V'": 0.11136236025571988}}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe intercourse-related scenarios where the author is the asshole, while the Group B snippets describe intercourse-related scenarios where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the speaking style of people in the wrong. "}, {"+": {}, "-": {"is not open to communication or compromise": {"p-value": 0.0001779052020702931, "V'": 0.12661513751042203}}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe intercourse-related scenarios where the author is the asshole, while the Group B snippets describe intercourse-related scenarios where the author is not the asshole. I am a sociologist studying moral judgements. 
My goal is to figure out the actions which were judged as right or wrong. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios involving a former partner where the author is the asshole, while the Group B snippets describe scenarios involving a former partner where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the speaking style of people in the wrong. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios involving a former partner where the author is the asshole, while the Group B snippets describe scenarios involving a former partner where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the actions which were judged as right or wrong. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios involving racism where the author is the asshole, while the Group B snippets describe scenarios involving racism where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the speaking style of people in the wrong. 
"}, {"+": {}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios involving racism where the author is the asshole, while the Group B snippets describe scenarios involving racism where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the actions which were judged as right or wrong. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios about the author's husband where the author is the asshole, while the Group B snippets describe scenarios about the author's husband where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the speaking style of people in the wrong. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios about the author's husband where the author is the asshole, while the Group B snippets describe scenarios about the author's husband where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the actions which were judged as right or wrong. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. 
The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios about the author's wife where the author is the asshole, while the Group B snippets describe scenarios about the author's wife where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the speaking style of people in the wrong. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios about the author's wife where the author is the asshole, while the Group B snippets describe scenarios about the author's wife where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the actions which were judged as right or wrong. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios about sexuality where the author is the asshole, while the Group B snippets describe scenarios about sexuality where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the speaking style of people in the wrong. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. 
The Group A snippets describe scenarios about sexuality where the author is the asshole, while the Group B snippets describe scenarios about sexuality where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the actions which were judged as right or wrong. "}, {"+": {"expresses regret for past mistakes": {"p-value": 0.00047388334997028314, "V'": 0.0775667222683703}}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios about children where the author is the asshole, while the Group B snippets describe scenarios about children where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the speaking style of people in the wrong. "}, {"+": {"expresses a disregard for potential consequences": {"p-value": 8.877274490052777e-05, "V'": 0.07180438378931792}, "Shows a lack of empathy and understanding of the other person's perspective": {"p-value": 0.00015586521813727884, "V'": 0.08422523526867087}}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios about children where the author is the asshole, while the Group B snippets describe scenarios about children where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the actions which were judged as right or wrong. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. 
The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios about social media where the author is the asshole, while the Group B snippets describe scenarios about social media where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the speaking style of people in the wrong. "}, {"+": {}, "-": {"fails to accept responsibility for their actions": {"p-value": 0.0006499887204635848, "V'": 0.14041286410064246}}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios about social media where the author is the asshole, while the Group B snippets describe scenarios about social media where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the actions which were judged as right or wrong. "}, {"+": {}, "-": {"uses language that is dismissive or belittling, such as 'it was a bust' or 'whine'": {"p-value": 0.00022634206022514704, "V'": 0.1651753685554168}}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios involving alcohol where the author is the asshole, while the Group B snippets describe scenarios involving alcohol where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the speaking style of people in the wrong. 
"}, {"+": {"showcases a willingness to admit when they are wrong": {"p-value": 2.2783115856687676e-05, "V'": 0.09759106047964464}, "takes advantage of an opportunity created by the intoxication of another": {"p-value": 0.00014170420254773155, "V'": 0.14096321349158836}, "involves the author taking advantage of the other person's level of intoxication": {"p-value": 0.00046418994719883065, "V'": 0.0832173284781465}}, "-": {}, "research goal": "The dataset includes posts on the \"Am I The Asshole\" Subreddit, an online forum people ask others whether they were in the wrong. The two classes are generated based on whether Reddit commenters said they were in the wrong. The Group A snippets describe scenarios involving alcohol where the author is the asshole, while the Group B snippets describe scenarios involving alcohol where the author is not the asshole. I am a sociologist studying moral judgements. My goal is to figure out the actions which were judged as right or wrong. "}, {"+": {"asks for a CV in a specific language": {"p-value": 0.00051633207057131, "V'": 0.06353843303936277}, "Requires an English or Russian language CV": {"p-value": 0.0005883769680962294, "V'": 0.06625279230825262}, "mentions the need for a CV/resume": {"p-value": 0.0008181800784362123, "V'": 0.04455683608050498}}, "-": {}, "research goal": "The dataset includes job postings in Armenia. The two classes are generated based on the year the application was posted. The Group A snippets are job applications requirements from 2010 to 2012, while the Group B snippets are job applications requirements from 2013 to 2014. I am a journalist writing about the job market. My goal is to figure out how the application requirements have evolved over time. 
"}, {"+": {}, "-": {"Requires prior experience": {"p-value": 1.0433634822269462e-05, "V'": 0.09631750725469646}, "Refers to experience in the field": {"p-value": 0.00018675631230753224, "V'": 0.0736635317841654}}, "research goal": "The dataset includes job postings in Armenia. The two classes are generated based on the year the application was posted. The Group A snippets are job postings from 2004 to 2006, while the Group B snippets are job postings from 2007 to 2009. I am a journalist writing about the job market. My goal is to figure out how the application requirements have evolved over time. "}, {"+": {}, "-": {}, "research goal": "The dataset includes job postings in Armenia. The two classes are generated based on the year the application was posted. The Group A snippets are job postings from 2010 to 2012, while the Group B snippets are job postings from 2013 to 2014. I am a journalist writing about the job market. My goal is to figure out how the application requirements have evolved over time. "}, {"+": {}, "-": {"Requires knowledge of software development and coding": {"p-value": 0.0002222485913390963, "V'": 0.0795916891032406}, "Requires experience in developing software systems for financial companies": {"p-value": 5.3133691584510526e-05, "V'": 0.026512748038807165}}, "research goal": "The dataset includes job postings in Armenia. The two classes are generated based on the type of role offered. The Group A snippets are job postings for junior positions, while the Group B snippets are job postings for senior positions. I am a job seeker figuring out which role is right for me. My goal is to figure out the expectations and responsibilities of each role, such as specific skills or experiences. 
"}, {"+": {"Requires programming and application design knowledge": {"p-value": 5.094956240831759e-26, "V'": 0.3616337215717266}, "Requires knowledge of software development languages such as C# and .NET": {"p-value": 2.2978409658528043e-05, "V'": 0.06491873386650061}, "Requires knowledge and experience in programming and application design": {"p-value": 1.5123862946962974e-18, "V'": 0.30711831618221463}, "Requires experience in software design and development": {"p-value": 4.785109808128289e-10, "V'": 0.21940815299306327}, "Requires experience in designing and developing software applications": {"p-value": 2.9342545391701393e-19, "V'": 0.3112876735851004}, "Requires knowledge of programming languages": {"p-value": 3.3791848761620577e-13, "V'": 0.19404298782973042}, "Experience in coding and data structure algorithms": {"p-value": 8.61745530326581e-14, "V'": 0.231650670601532}}, "-": {"Requires experience with manual and automated testing": {"p-value": 1.5784973565004537e-30, "V'": 0.20482992199151395}, "Involves testing and evaluating software": {"p-value": 4.424493295076107e-48, "V'": 0.38322608212790993}, "Requires deep knowledge of software quality assurance": {"p-value": 2.629898678170524e-43, "V'": 0.30248461323743225}, "Requires experience in manual and automated QA activities": {"p-value": 8.452323972512711e-55, "V'": 0.3043613672920293}, "involves developing and maintaining automated tests": {"p-value": 7.334463666334791e-19, "V'": 0.15519404060552972}, "Requires experience in web-based enterprise level applications testing": {"p-value": 1.1198681512030584e-13, "V'": 0.11287432002109528}, "Requires experience in Quality Assurance": {"p-value": 4.1706510844806014e-98, "V'": 0.44603735126330124}}, "research goal": "The dataset includes job postings in Armenia. The two classes are generated based on the type of role offered. 
The Group A snippets are job postings for software positions, while the Group B snippets are job postings for quality assurance positions. I am a job seeker figuring out which role is right for me. My goal is to figure out the expectations and responsibilities of each role, such as specific skills or experiences. "}, {"+": {"argues that all lives are equal and should be treated as such": {"p-value": 1.4559985331930568e-106, "V'": 0.6397474426770813}, "highlights the importance of solidarity between different races": {"p-value": 9.341883611529428e-61, "V'": 0.50722207854209}, "acknowledges the importance of racial equality": {"p-value": 1.0871200090123145e-66, "V'": 0.5227055780579163}, "highlights need for equal treatment of all people, regardless of race": {"p-value": 4.829508767024842e-77, "V'": 0.549468829957177}, "mentions the need to treat all people with respect": {"p-value": 8.298327902227383e-81, "V'": 0.5682184859150147}}, "-": {"highlights the lack of attention given to the White Lives Matter movement": {"p-value": 3.258497957526375e-07, "V'": 0.1010489662309691}, "expresses the need for recognition of white lives": {"p-value": 8.080702925533857e-66, "V'": 0.4295891179115659}, "recalls instances of white oppression": {"p-value": 3.937862840980201e-13, "V'": 0.18769063705223968}, "criticizes those who disagree with the White Lives Matter movement": {"p-value": 2.543699876348265e-19, "V'": 0.20754060455335696}, "expresses frustration with lack of recognition of white lives": {"p-value": 1.2251328359850463e-35, "V'": 0.36514151571696674}, "expresses outrage about the lack of acknowledgement for White Lives Matter": {"p-value": 7.579832955858386e-38, "V'": 0.3196229038024676}}, "research goal": "The dataset includes Tweets about the All Lives Matter, Blue Lives Matter, and White Lives Matter movements. The two classes are generated based on the hashtags included in the Tweet. 
The Group A snippets are Tweets containing #AllLivesMatter, while the Group B snippets are Tweets in support #WhiteLivesMatter. I am a sociologist studying cultural movements. My goal is to figure out the arguments made by each movement. "}, {"+": {"mentions the need to support law enforcement": {"p-value": 1.4689767102782642e-295, "V'": 0.8990987604897803}, "mentions the sacrifices made by police officers": {"p-value": 1.2268016375336698e-13, "V'": 0.5483117558619977}, "highlights the importance of police officers": {"p-value": 2.4098870287592064e-287, "V'": 0.8911459506509438}, "acknowledges the value of police officers": {"p-value": 4.569241059113508e-304, "V'": 0.9036087870402956}, "highlights the heroism of police officers": {"p-value": 4.245611251827349e-56, "V'": 0.6541320996587052}, "expresses support for law enforcement": {"p-value": 2.222548417902014e-295, "V'": 0.8988339704379316}, "highlights the importance of law enforcement": {"p-value": 1.522855007085316e-253, "V'": 0.8693527204444276}, "references police officers and their service": {"p-value": 1.5903805990365557e-276, "V'": 0.8872086014981632}, "highlights the importance of protecting police officers": {"p-value": 5.456816695455588e-250, "V'": 0.8643209470786702}}, "-": {"highlights the need for racial justice": {"p-value": 4.04827305289566e-37, "V'": 0.40130847455455954}, "mentions a lack of recognition for the #WhiteLivesMatter movement": {"p-value": 9.39359837204091e-15, "V'": 0.15293558670911678}, "argues that #WhiteLivesMatter is a joke": {"p-value": 1.0437118928545338e-31, "V'": 0.299172445841692}, "highlights the need for equality between races": {"p-value": 2.0618225012932847e-19, "V'": 0.28379646331078934}, "questions the intent behind other movements": {"p-value": 3.6720530839208765e-25, "V'": 0.35029126890757606}, "highlights the lack of support for White Lives Matter": {"p-value": 1.013506858371539e-12, "V'": 0.14524742365453797}, "acknowledges the need for equality between races": 
{"p-value": 3.528898409957928e-05, "V'": 0.10949878677311452}, "highlights the prevalence of racism in America": {"p-value": 1.263185661729131e-41, "V'": 0.4152318846851735}}, "research goal": "The dataset includes Tweets about the All Lives Matter, Blue Lives Matter, and White Lives Matter movements. The two classes are generated based on the hashtags included in the Tweet. The Group A snippets are Tweets in support of #BlueLivesMatter, while the Group B snippets are Tweets in support of #WhiteLivesMatter. I am a sociologist studying cultural movements. My goal is to figure out the arguments made by each movement. "}, {"+": {"expresses admiration for police officers and their families": {"p-value": 2.023583333891796e-209, "V'": 0.8148997134266147}, "expresses frustration at the lack of support for police officers": {"p-value": 4.374625183207869e-06, "V'": 0.3288291953825914}, "mentions the sacrifice and heroism of police officers": {"p-value": 5.074074756331356e-193, "V'": 0.787404180994125}, "highlights the dangers and risks of being a police officer": {"p-value": 0.00035812194489386544, "V'": 0.12307691412160077}, "expresses support for enforcement of the law": {"p-value": 1.698427254061385e-181, "V'": 0.7865065880450215}, "expresses support for law enforcement": {"p-value": 6.301919537003606e-185, "V'": 0.7875952449409802}, "references police officers as heroes": {"p-value": 1.1858412960174261e-225, "V'": 0.8275123799818959}, "expresses support for police officers and their families": {"p-value": 5.02021871526594e-172, "V'": 0.7717256175209806}}, "-": {"highlights the importance of understanding the differences between the movements": {"p-value": 0.0008648329766502767, "V'": 0.037474815404981154}, "discusses the need to respect human rights": {"p-value": 9.541472248479886e-95, "V'": 0.608731900438701}, "highlights the importance of civil discourse": {"p-value": 1.8816357046620525e-05, "V'": 0.05260204234730697}, "expresses a desire for unity and understanding": 
{"p-value": 9.539439109382044e-17, "V'": 0.1856166984886213}, "highlights the need for racial justice": {"p-value": 3.0170743773454686e-32, "V'": 0.33984752747057656}, "calls for positive action instead of fighting": {"p-value": 8.012443424010032e-20, "V'": 0.24485479675394306}, "condemns racism and bigotry": {"p-value": 1.260973130820238e-64, "V'": 0.5319370470718776}, "highlights the importance of understanding various perspectives": {"p-value": 1.8754653764591505e-05, "V'": 0.06361377970375327}, "Calls for civility in movements": {"p-value": 2.9610645801534158e-08, "V'": 0.10325723014914125}, "mentions inequality and racism": {"p-value": 1.6062970734303792e-32, "V'": 0.3728869038703423}, "expresses frustration at the lack of acknowledgement of human rights": {"p-value": 2.770423962149689e-67, "V'": 0.5434930443761089}}, "research goal": "The dataset includes Tweets about the All Lives Matter, Blue Lives Matter, and White Lives Matter movements. The two classes are generated based on the hashtags included in the Tweet. The Group A snippets are Tweets in support of #BlueLivesMatter, while the Group B snippets are Tweets in support of #AllLivesMatter. I am a sociologist studying cultural movements. My goal is to figure out the arguments made by each movement. "}, {"+": {"mentions a new year\u2019s resolution": {"p-value": 0.000821347561286939, "V'": 0.04055748639551236}, "mentions the celebration of the new year": {"p-value": 0.0007826100758258839, "V'": 0.06107904568252112}}, "-": {}, "research goal": "The dataset includes headlines across time from the Examiner, a clickbait news site. The two classes are generated based on the year it was published. The Group A snippets are clickbait headlines from 2010 to 2012, while the Group B snippets are clickbait headlines from 2013 to 2015. I am a researcher studying misinformation. My goal is to figure out which specific topics dominate the news from year to year. 
"}, {"+": {"uses overly emotional language to manipulate the reader": {"p-value": 2.8615842017577107e-16, "V'": 0.28000150646982813}, "uses sweeping statements and generalizations": {"p-value": 4.20869714639867e-48, "V'": 0.47998817892197626}, "uses emotionally charged language to appeal to the reader's feelings": {"p-value": 1.037856945351609e-08, "V'": 0.1999978317519604}, "avoids acknowledging or addressing counterarguments": {"p-value": 1.6281794442891887e-17, "V'": 0.1599997939779292}, "focuses on the rights of the woman rather than the rights of the unborn child": {"p-value": 1.0640877569696117e-06, "V'": 0.1600025213446521}, "uses language that is overly simplistic or reductionist": {"p-value": 4.621204855005126e-06, "V'": 0.1600002978953397}, "uses moral absolutes without providing valid reasoning": {"p-value": 7.970130673683135e-10, "V'": 0.1957665815211188}, "uses emotionally charged language to persuade the reader": {"p-value": 0.0005694181581057445, "V'": 0.11999864982668418}, "maintains a consistent and logical argument": {"p-value": 7.945356601567913e-52, "V'": 0.39999936770199696}, "uses facts, statistics, and other evidence to back up their argument": {"p-value": 7.946385441364615e-52, "V'": 0.39999836669418665}, "uses facts and evidence to support their argument": {"p-value": 8.72320929989743e-68, "V'": 0.47999994964167747}, "cites statistics and relevant research to back up their argument": {"p-value": 2.9947289634315973e-27, "V'": 0.23999952558513515}, "clearly defines terms and provides concrete examples to clarify the argument": {"p-value": 3.937081114457978e-64, "V'": 0.5199970977520373}, "Uses logical reasoning to support the argument": {"p-value": 7.938012428483611e-52, "V'": 0.3999991393944772}, "uses logical reasoning to draw conclusions and make an argument": {"p-value": 1.6280598358688943e-17, "V'": 0.15996767006100654}, "uses evidence-based arguments to support the point": {"p-value": 2.021826134343603e-59, "V'": 0.4399991893221926}, 
"acknowledges and works to address the counterarguments": {"p-value": 1.4255813898886785e-26, "V'": 0.3600002528621117}, "uses persuasive language to appeal to the audience's emotions": {"p-value": 1.7447297550671497e-27, "V'": 0.36000070833120956}, "uses specific language to give a clear, concise argument": {"p-value": 5.6776064025753324e-09, "V'": 0.0800007741840223}, "uses logical reasoning to explain why their argument is valid": {"p-value": 8.73094902531975e-68, "V'": 0.4798823927119136}, "uses facts and statistics to back up their argument": {"p-value": 7.937847705493823e-52, "V'": 0.3999995203690728}}, "-": {"acknowledges the complexity of the subject and leaves room for further exploration": {"p-value": 0.000569271762555535, "V'": 0.12000068157753319}}, "research goal": "The dataset includes arguments on a variety of topics annotated for convincingness. The two classes are generated based on how convincing annotators judged the arguments. The Group A snippets are convincing arguments, while the Group B snippets are unconvincing arguments. I am a student writing an argumentative essay who hopes to improve their writing. My goal is to figure out the rhetorical devices used by convincing arguments. 
"}, {"+": {"uses facts and statistics to support their viewpoint": {"p-value": 6.299551004103039e-57, "V'": 0.6400001228064557}, "uses logic to explain their position": {"p-value": 4.963744599867659e-20, "V'": 0.3181982789489307}, "uses facts and statistics to support the argument": {"p-value": 6.296978350622363e-57, "V'": 0.6400001486177757}, "uses facts and statistics to back up their claims": {"p-value": 6.294408869463063e-57, "V'": 0.6400032437788064}, "uses language that is persuasive and persuasive, but not overly so": {"p-value": 1.816614091150178e-09, "V'": 0.15999900553351742}, "uses logical reasoning to explain why their position is valid": {"p-value": 4.95758582773347e-20, "V'": 0.31999895929396693}, "uses logical reasoning to explain why their point of view is valid": {"p-value": 4.9556878727841716e-20, "V'": 0.3200003332118859}, "uses logical reasoning with facts to back up the argument": {"p-value": 1.1287452610539946e-34, "V'": 0.47999943355170194}, "uses a logical approach to explain a point of view": {"p-value": 4.955041469048167e-20, "V'": 0.32000002818429885}, "uses facts and statistics to back up arguments": {"p-value": 6.30188853576718e-57, "V'": 0.6399988864415797}, "uses facts and statistics to back up their argument": {"p-value": 6.299534949240794e-57, "V'": 0.6400004120963325}, "uses logical reasoning to build a strong argument": {"p-value": 1.8152865238100512e-09, "V'": 0.16000136149149624}, "presents a clear perspective with valid reasoning": {"p-value": 1.1278324471629604e-34, "V'": 0.47999918144665354}, "uses facts and statistics to add credibility to the argument": {"p-value": 6.296948379650042e-57, "V'": 0.6400017513970683}, "uses logical arguments to support their point": {"p-value": 4.951312684385707e-20, "V'": 0.32000143554451155}}, "-": {"uses strong, authoritative language to make a point": {"p-value": 5.573305153679596e-09, "V'": 1.611333406037474e-06}}, "research goal": "The dataset includes arguments on a variety of topics 
annotated for convincingness. The two classes are generated based on how convincing annotators judged the arguments. The Group A snippets are very convincing arguments, while the Group B snippets are somewhat convincing arguments. I am a student writing an argumentative essay who hopes to improve their writing. My goal is to figure out the rhetorical devices used by convincing arguments. "}, {"+": {}, "-": {}, "research goal": "The dataset includes dialogue from Craigslist negotiations, an online seller platform. The two classes are generated based on the price of the good being sold. The Group A snippets are Craigslist negotiations for expensive bikes, while the Group B snippets are Craigslist negotiations for cheaper bikes. I am a business professor interested in negotiation styles. My goal is to figure out the speaking style of negotiators with different stakes. "}, {"+": {}, "-": {}, "research goal": "The dataset includes dialogue from Craigslist negotiations, an online seller platform. The two classes are generated based on the price of the good being sold. The Group A snippets are Craigslist negotiations for expensive cars, while the Group B snippets are Craigslist negotiations for cheaper cars. I am a business professor interested in negotiation styles. My goal is to figure out the speaking style of negotiators with different stakes. "}, {"+": {}, "-": {}, "research goal": "The dataset includes dialogue from Craigslist negotiations, an online seller platform. The two classes are generated based on the price of the good being sold. The Group A snippets are Craigslist negotiations for expensive housing, while the Group B snippets are Craigslist negotiations for cheaper housing. I am a business professor interested in negotiation styles. My goal is to figure out the speaking style of negotiators with different stakes. 
"}, {"+": {"offers a reasonable counteroffer": {"p-value": 5.705548944010182e-59, "V'": 0.5030799246221143}, "acknowledges the seller's offer": {"p-value": 7.99634328880944e-137, "V'": 0.706933164725545}, "Acknowledges the seller's offer": {"p-value": 8.567242758079585e-154, "V'": 0.7373291839829094}, "offers a counteroffer within the seller's price range": {"p-value": 1.495395384508172e-106, "V'": 0.6112727995619702}, "Is willing to compromise": {"p-value": 5.238318500957669e-97, "V'": 0.43443097375245343}, "uses a price": {"p-value": 1.1034000800236692e-05, "V'": 0.03935197605856822}}, "-": {"tries to offer too low of a price": {"p-value": 5.585425772122991e-06, "V'": 0.14493097889531675}, "uses language that is aggressive or confrontational": {"p-value": 1.0370227159113195e-07, "V'": 0.045638466152144966}, "contains words of criticism": {"p-value": 5.218988430311483e-13, "V'": 0.10911681127340256}, "asks many questions": {"p-value": 5.1861558636110625e-11, "V'": 0.13651388851150034}, "uses aggressive language": {"p-value": 0.0004797969708939708, "V'": 0.015447898755440826}, "uses language that is too demanding": {"p-value": 7.456517169522829e-23, "V'": 0.12149144185217564}, "makes offer too low": {"p-value": 1.3112933649491142e-17, "V'": 0.2383667072751448}, "doesn't show interest in seller's product": {"p-value": 6.720942864032607e-20, "V'": 0.09790679857739615}, "uses language that is too negative": {"p-value": 0.0005108672557881625, "V'": 0.017738658171266658}, "uses long sentences": {"p-value": 0.0001807262049587433, "V'": 0.09780828858238699}}, "research goal": "The dataset includes dialogue from Craigslist negotiations, an online seller platform. The two classes are generated based on whether a transaction eventually occured. The Group A snippets are succesful Craigslist negotiations, while the Group B snippets are unsuccessful Craigslist negotiations. I am a Craigslist customer who wants to negotiate well. 
My goal is to figure out the speaking style of successful conversations. "}, {"+": {"refers to Windows and Linux OS debugging skills": {"p-value": 9.064539358495835e-07, "V'": 0.2427458314624244}, "Experience in data mining (SQL, ETL, data warehouse, etc.) and using databases in a business environment with large-scale, complex datasets": {"p-value": 3.2372822723464674e-06, "V'": 0.21107688635656957}, "mentions mobile development platforms and technologies": {"p-value": 2.3466127916751313e-08, "V'": 0.1542570332684715}, "Requires a Bachelor's Degree in Computer Science, Software Engineering, or a related field": {"p-value": 3.184875953699489e-23, "V'": 0.4681225624933787}, "refers to machine learning, speech recognition and data mining techniques": {"p-value": 9.748599024035431e-08, "V'": 0.16166161828409822}, "refers to Amazon Web Services (AWS) and its related services": {"p-value": 7.240579180286071e-59, "V'": 0.6363639607281837}, "Experience with web services and cloud computing": {"p-value": 1.2401410126169037e-31, "V'": 0.573837821532277}, "mentions experience in developing distributed systems and cloud computing": {"p-value": 1.175013694448443e-25, "V'": 0.5212261088909252}, "mentions experience with developing customer-facing experiences": {"p-value": 1.4867929686734591e-05, "V'": 0.21596285555899142}, "mentions experience with virtualization, storage and customer support": {"p-value": 4.7846365431192384e-11, "V'": 0.3430797384397635}, "refers to experience with Cloud-based technologies": {"p-value": 4.2316878926375006e-20, "V'": 0.4459076130926023}, "requires knowledge of software programming languages such as Java, JavaScript, C/C++, Objective C, Python, Ruby, or C#": {"p-value": 1.2157812247821835e-37, "V'": 0.5960500047816468}, "mentions experience with enterprise-wide support that covers the entire datacenter": {"p-value": 4.89981922805216e-15, "V'": 0.40312634350705134}}, "-": {"mentions experience with medical claims processing": {"p-value": 
0.0009524055961291821, "V'": 0.07239836607034446}}, "research goal": "The dataset includes American technology job postings on dice.com. The two classes are generated based on the company offering the position. The Group A snippets are job postings for Amazon, while the Group B snippets are job postings for Dell. I am a recent STEM graduate looking for suitable jobs. My goal is to figure out what specific skills different companies require. "}, {"+": {"Refers to the necessity of understanding End to End business processes and associated technical blocks": {"p-value": 2.408753033592255e-08, "V'": 0.3432853775854509}, "mentions experience with Java/J2EE technologies": {"p-value": 3.1615913207121177e-34, "V'": 0.4397832061051838}, "Requires knowledge and experience in Big Data related technologies such as Hadoop, Hive, Greenplum, Teradata, Oracle": {"p-value": 0.0008793561065512363, "V'": 0.03703619393800921}, "Requires Java/J2EE technologies for architecture, design, and development.": {"p-value": 4.970334554112726e-33, "V'": 0.4095487711839076}, "Requires experience with full development lifecycle from inception through implementation": {"p-value": 9.154118419498459e-11, "V'": 0.3654834182324461}, "Requires BS Degree in Engineering, Computer Science, or MIS or equivalent industry experience": {"p-value": 0.0009603930421364006, "V'": 0.14147157162164406}, "mentions proficiency in Python, C++, Java, Linux/UNIX, and Shell Scripting": {"p-value": 3.4084212718496203e-37, "V'": 0.45553115334753524}}, "-": {"Mentions experience in HR business processes and organization design": {"p-value": 5.309863225721597e-07, "V'": 0.2610232041307009}, "mentions experience with HR business processes and organization design": {"p-value": 4.1673663072331395e-07, "V'": 0.2644235506927889}}, "research goal": "The dataset includes American technology job postings on dice.com. The two classes are generated based on the company offering the position. 
The Group A snippets are job postings for JP Morgan Chase, while the Group B snippets are job postings for Deloitte. I am a recent STEM graduate looking for suitable jobs. My goal is to figure out what specific skills different companies require. "}, {"+": {"mentions the ability to design, develop, integrate, and test hardware and software systems": {"p-value": 3.7979796725170567e-07, "V'": 0.22022734745450073}, "Require US citizenship and the ability to obtain a Secret Clearance": {"p-value": 0.0006669564263372044, "V'": 0.1478158631174168}, "requires knowledge of systems design, hardware, software and firmware design, integration and test": {"p-value": 1.2195132132786573e-08, "V'": 0.24431497210787317}}, "-": {"Requires extensive knowledge of CANES, legacy programs, operating systems and afloat core services": {"p-value": 5.348173128373197e-06, "V'": 0.03571465503944696}}, "research goal": "The dataset includes American technology job postings on dice.com. The two classes are generated based on the company offering the position. The Group A snippets are job postings for Northup Grumman, while the Group B snippets are job postings for Leidos. I am a recent STEM graduate looking for suitable jobs. My goal is to figure out what specific skills different companies require. 
"}, {"+": {"uses imprecise language": {"p-value": 0.00014485731122832152, "V'": 0.03374839169630353}}, "-": {"uses statements": {"p-value": 4.0169690409205994e-14, "V'": 0.038021968784554216}, "uses a straightforward language": {"p-value": 1.1773168446222588e-06, "V'": 0.06875998673540074}, "uses precise language to avoid confusion": {"p-value": 3.4716808939889433e-10, "V'": 0.0547144158781373}, "uses words that emphasize the potential benefits of their statement": {"p-value": 2.3983291415342467e-08, "V'": 0.06483081793118056}, "uses complex language": {"p-value": 1.148947751826418e-21, "V'": 0.12279337065234908}, "involves a lot of hypothetical situations": {"p-value": 3.567755215270539e-07, "V'": 0.05505151223305885}, "mentions offers, deals, or alliances": {"p-value": 7.35250935722944e-21, "V'": 0.1276314737501743}, "uses assertions of power or confidence": {"p-value": 4.716974336879269e-06, "V'": 0.05033223873591358}, "uses apologetic language": {"p-value": 9.909121733640454e-07, "V'": 0.03511094599853375}, "uses words to emphasize the importance of the statement": {"p-value": 8.95560010744345e-14, "V'": 0.0721985716667945}, "contains persuasive language": {"p-value": 1.6492558730500895e-24, "V'": 0.11480576532504488}, "uses complex and long sentences": {"p-value": 1.6069752268032672e-27, "V'": 0.15794677371397814}}, "research goal": "The dataset includes diaglogue from games of Diplomacy, which involves deception. The two classes are generated based on whether the players were telling the truth. The Group A snippets are true statements in a game, while the Group B snippets are deceptive statements in a game. I am a sociologist studying lying in games. My goal is to figure out the speaking style of liars, so I can tell who might be lying. 
"}, {"+": {"mentions feeling of hyperactivity and restlessness": {"p-value": 5.363738546987509e-06, "V'": 0.1167932160831487}}, "-": {"mentions feeling of increased love and appreciation for others": {"p-value": 2.2565117509342287e-08, "V'": 0.09374412954186648}, "mentions feeling connected with emotions": {"p-value": 0.00011042742068101253, "V'": 0.13759363125173601}, "mentions feelings of increased empathy": {"p-value": 0.00041829962383181906, "V'": 0.051689951944259925}, "mentions feeling of joy, relaxation, and contentment": {"p-value": 4.2457201380846377e-07, "V'": 0.15648460609294956}, "mentions feeling of overwhelming happiness and confidence": {"p-value": 7.57102115569423e-05, "V'": 0.11250588074918658}, "mentions feeling peaceful and content": {"p-value": 5.1218855401083064e-12, "V'": 0.16780160544232972}, "mentions feeling an elevated sense of well-being": {"p-value": 0.000836411264362648, "V'": 0.10548414778253784}, "mentions feeling of warmth and sensations of touch": {"p-value": 1.1962636029641558e-07, "V'": 0.10595031234174289}}, "research goal": "The dataset includes self-reports of various illicit drugs from Erowid.com. The two classes are generated based on the substance used during the experience. The Group A snippets are accounts of cocaine use, while the Group B snippets are accounts of using MDMA/molly. I am a medical researcher researching effects of drugs. My goal is to figure out the specific bodily experiences caused by each drug. 
"}, {"+": {"mentions feelings of creativity and openness": {"p-value": 1.1242040580260634e-08, "V'": 0.080229413396803}, "mentions feelings of paranoia or anxiety": {"p-value": 9.837241288397614e-08, "V'": 0.11392973031199316}, "mentions strong feelings of fear or paranoia": {"p-value": 2.6714543399034357e-11, "V'": 0.1298432383288828}, "mentions feeling of enhanced sensory perception": {"p-value": 1.9390884790440122e-11, "V'": 0.13848042889441145}, "mentions feeling as if all senses have been heightened": {"p-value": 1.440204635958965e-07, "V'": 0.07240634149973185}}, "-": {"mentions feeling dizzy or lightheaded": {"p-value": 3.1195044929855084e-12, "V'": 0.09040154560615372}, "mentions feeling of nausea or vomiting": {"p-value": 3.2930614624490823e-13, "V'": 0.07631885350145108}, "mentions feeling nauseous or ill": {"p-value": 4.521101790089807e-16, "V'": 0.12654812954925856}}, "research goal": "The dataset includes self-reports of various illicit drugs from Erowid.com. The two classes are generated based on the substance used during the experience. The Group A snippets are accounts of LSD (a psychedelic) use, while the Group B snippets are accounts of DXM (cough syrup) use. I am a medical researcher researching effects of drugs. My goal is to figure out the specific bodily experiences caused by each drug. 
"}, {"+": {"mentions seeing images or patterns in objects or surfaces": {"p-value": 0.00018395872361375072, "V'": 0.07213372586434552}, "mentions feeling of heightened awareness": {"p-value": 9.161326584186428e-07, "V'": 0.10098139992055366}, "mentions feeling of intense visual hallucinations": {"p-value": 5.937202657720039e-11, "V'": 0.14514410780128745}, "mentions feeling of intense energy": {"p-value": 0.0008111299204096991, "V'": 0.07504495896693131}, "mentions feeling of increased sensory input": {"p-value": 1.0885926315400502e-10, "V'": 0.1439973693962212}, "mentions feelings of enhanced emotions": {"p-value": 0.00022095339770987856, "V'": 0.08407675169912987}, "mentions feeling as if senses are enhanced": {"p-value": 1.357807015164738e-08, "V'": 0.12923380933923334}, "mentions feeling of being trapped or condemned": {"p-value": 4.707787741930068e-06, "V'": 0.08011045326178659}}, "-": {"mentions feeling a connection to nature": {"p-value": 0.00013569620015538539, "V'": 0.061281661866632045}, "mentions feeling of extreme relaxation": {"p-value": 1.157641788013073e-05, "V'": 0.05467391927942946}, "mentions a feeling of peace and relaxation": {"p-value": 0.00012128766418185569, "V'": 0.05891892444156052}}, "research goal": "The dataset includes self-reports of various illicit drugs from Erowid.com. The two classes are generated based on the substance used during the experience. The Group A snippets are accounts of LSD (a psychedelic) use, while the Group B snippets are accounts of mushroom use. I am a medical researcher researching effects of drugs. My goal is to figure out the specific bodily experiences caused by each drug. 
"}, {"+": {"mentions feeling a sense of awe and connection to the natural world": {"p-value": 2.0692529892719078e-07, "V'": 0.04392803108571463}}, "-": {"mentions feelings of nausea and/or vomiting": {"p-value": 1.5504125386040514e-09, "V'": 0.06960437631868388}, "mentions feeling of nausea or gagging when ingesting DXM": {"p-value": 2.4012355618417153e-07, "V'": 0.048015695421276025}, "mentions altered perception of motor control or coordination": {"p-value": 3.012164951160555e-13, "V'": 0.08862937715942862}, "mentions feelings of nausea or vomiting": {"p-value": 8.777706632378748e-10, "V'": 0.06888850184559792}}, "research goal": "The dataset includes self-reports of various illicit drugs from Erowid.com. The two classes are generated based on the substance used during the experience. The Group A snippets are accounts of mushroom use, while the Group B snippets are accounts of DXM (cough syrup) use. I am a medical researcher researching effects of drugs. My goal is to figure out the specific bodily experiences caused by each drug. "}, {"+": {"describes a physical altercation": {"p-value": 3.4744718108010344e-05, "V'": 0.03428943232267801}, "mentions the use of physical force": {"p-value": 1.9912654694929144e-05, "V'": 0.05628090134563714}, "refers to the use of physical force": {"p-value": 0.00018389054326048779, "V'": 0.05081890275284111}}, "-": {"mentions the defendant's nationality": {"p-value": 0.0003231968410110089, "V'": 0.03218872588591551}, "mentions a law or regulation": {"p-value": 2.6160900415938334e-07, "V'": 0.11102450956812321}, "cites a particular law or statutory provision": {"p-value": 9.275212736991166e-05, "V'": 0.07165620547473595}}, "research goal": "The dataset includes facts of cases heard before the European Court of Human Rights. The two classes are generated based on the ruling of the court. 
The Group A snippets are human rights trials where a violation was found, while the Group B snippets are human rights trials where no violation was found. I am a lawyer planning a defense for my defendant. My goal is to figure out what kinds of evidence convince the court that there is a violation. "}, {"+": {"uses a lot of sensory details to bring the story to life": {"p-value": 6.584728838973624e-25, "V'": 0.22463808078740605}, "uses a lot of descriptive adjectives": {"p-value": 2.9720850873066056e-25, "V'": 0.25330560616115405}, "uses concrete examples to illustrate their points": {"p-value": 0.0005685936205175252, "V'": 0.03477953208587714}, "uses a range of vocabulary to create interesting descriptions": {"p-value": 2.189038074267603e-44, "V'": 0.3266841192354869}, "uses descriptive language to paint a picture for the reader": {"p-value": 1.5676331140642668e-06, "V'": 0.05513430796777874}, "has detailed and precise descriptions": {"p-value": 3.186295601729257e-13, "V'": 0.10168181464524506}, "uses captivating and descriptive language": {"p-value": 1.2277808289124875e-22, "V'": 0.1626390706877341}}, "-": {}, "research goal": "The dataset includes essays from students. The two classes are generated based on grades assigned by readers. The Group A snippets are essays with good scores, while the Group B snippets are essays with bad scores. I am a student writing an academic paper. My goal is to figure out the general style of writing readers look for. 
"}, {"+": {"uses sensationalist language to emphasize certain points": {"p-value": 6.673325288414405e-16, "V'": 0.35941391228488323}, "uses exaggerated claims or statements": {"p-value": 2.799296673547675e-08, "V'": 0.1875220430660609}, "uses sensationalist headlines or language": {"p-value": 5.24336842785049e-15, "V'": 0.34895857090040955}, "uses a sensationalist tone": {"p-value": 4.631240723765773e-13, "V'": 0.3121868900328369}, "uses exaggeration to make a point": {"p-value": 7.704163648490332e-07, "V'": 0.16760431213851235}, "uses exaggerated language to paint a picture": {"p-value": 1.3163531279082892e-10, "V'": 0.25000021195485805}, "uses sensationalist language to grab attention": {"p-value": 3.4418668465108357e-18, "V'": 0.38545598893578464}}, "-": {"uses a factual, neutral tone": {"p-value": 0.00017115098649393476, "V'": 0.14536785849372813}, "uses accurate, specific language to describe events or outcomes": {"p-value": 0.000997936525294456, "V'": 0.06271354035719212}, "uses a factual and straightforward tone": {"p-value": 0.0009347746912942677, "V'": 0.08889235145407215}}, "research goal": "The dataset includes fake and legitimate news. The two classes are generated based on whether they are legitimate or fake news articles. The Group A snippets are fake news articles, while the Group B snippets are legitimate news articles. I am a content moderator looking to flag fake news. My goal is to figure out the writing style of fake news sources. 
"}, {"+": {"emphasizes the importance of market capitalism": {"p-value": 0.00015243127046420265, "V'": 0.1588328368545866}, "Emphasizes the importance of technological innovation": {"p-value": 3.618798766581991e-07, "V'": 0.20981100912611184}}, "-": {"mentions the need to protect the banking system": {"p-value": 0.0002483401030647678, "V'": 0.1275177897279024}, "mentions the distress in the mortgage sector": {"p-value": 0.000154685575737764, "V'": 0.10344523577213238}}, "research goal": "The dataset includes Federal Open Market Committee (FOMC) speeches from 1996-2020, which describe Federal Reserve policy. The two classes are generated based on who gave the speech. The Group A snippets are FOMC speeches from Chairman Greenspan, while the Group B snippets are FOMC speeches from Chairman Bernanke. I am a economist studying the stances of Fed board members. My goal is to figure out the economic ideology of different Fed chairpeople. "}, {"+": {}, "-": {}, "research goal": "The dataset includes Federal Open Market Committee (FOMC) speeches from 1996-2020, which describe Federal Reserve policy. The two classes are generated based on who gave the speech. The Group A snippets are FOMC speeches from Chairman Yellen, while the Group B snippets are FOMC speeches from Chairman Powell. I am a economist studying the stances of Fed board members. My goal is to figure out the economic ideology of different Fed chairpeople. "}, {"+": {"emphasizes the need to promote a strong economy that extends opportunity to all": {"p-value": 0.0002499950625888577, "V'": 0.27999873369427286}}, "-": {}, "research goal": "The dataset includes Federal Open Market Committee (FOMC) speeches from 1996-2020, which describe Federal Reserve policy. The two classes are generated based on who gave the speech. The Group A snippets are FOMC speeches from Jerome Powell as Fed chariman, while the Group B snippets are FOMC speeches from Jerome Powell as a governor. 
I am a economist studying the stances of Fed board members. My goal is to figure out the economic ideology of different Fed chairpeople. "}, {"+": {}, "-": {}, "research goal": "The dataset includes Federal Open Market Committee (FOMC) speeches from 1996-2020, which describe Federal Reserve policy. The two classes are generated based on who gave the speech. The Group A snippets are FOMC speeches from Fed Governor Meyer, while the Group B snippets are FOMC speeches from Fed Vice Chairman Ferguson. I am a economist studying the stances of Fed board members. My goal is to figure out the economic ideology of different Fed chairpeople. "}, {"+": {"Mentions the importance of entrepreneurship and competition in the US economy": {"p-value": 9.157012614357372e-07, "V'": 0.13180141256426153}}, "-": {"mentions risks associated with the financial crisis": {"p-value": 2.976234446759846e-09, "V'": 0.19787571670694376}, "emphasizes the role of unconventional monetary policy tools": {"p-value": 6.0313940558540845e-05, "V'": 0.07368426165454284}, "mentions the Federal Reserve's response to the financial crisis": {"p-value": 1.0851414664776499e-09, "V'": 0.17322050616814458}, "discusses the need for banks to restore balance sheets": {"p-value": 6.405012422782881e-07, "V'": 0.09215585423761233}}, "research goal": "The dataset includes Federal Open Market Committee (FOMC) speeches from 1996-2020, which describe Federal Reserve policy. The two classes are generated based on the year the speech was given. The Group A snippets are FOMC speeches from before 2006, while the Group B snippets are FOMC speeches from 2006 to 2014. I am a economic historian studying trends in Fed policy. My goal is to figure out the trends in Fed policy and priorities over the years. "}, {"+": {}, "-": {}, "research goal": "The dataset includes Federal Open Market Committee (FOMC) speeches from 1996-2020, which describe Federal Reserve policy. 
The two classes are generated based on the year the speech was given. The Group A snippets are FOMC speeches from 2014 to 2018, while the Group B snippets are FOMC speeches from after 2018. I am a economic historian studying trends in Fed policy. My goal is to figure out the trends in Fed policy and priorities over the years. "}, {"+": {"emphasizes the importance of technology in increasing productivity": {"p-value": 8.31170071891792e-06, "V'": 0.0963374790524658}}, "-": {"emphasizes the need to promote economic recovery": {"p-value": 7.821541231777702e-07, "V'": 0.1607704944549451}, "discusses the effects of the financial crisis on the economy": {"p-value": 4.68453023327747e-11, "V'": 0.24870668042884705}}, "research goal": "The dataset includes Federal Open Market Committee (FOMC) speeches from 1996-2020, which describe Federal Reserve policy. The two classes are generated based on the state of the economy during the speech. The Group A snippets are FOMC speeches during periods of high GDP growth, while the Group B snippets are FOMC speeches during periods of low GDP growth. I am an economist studying patterns of Fed behavior. My goal is to figure out what outcomes the Fed prioritizes under different economic conditions. "}, {"+": {}, "-": {"emphasizes the need for fiscal policy makers to remain prudent": {"p-value": 0.00013359371019228925, "V'": 0.17444438154586234}, "discusses the effects of the financial crisis on consumer wealth": {"p-value": 3.5010848259030016e-10, "V'": 0.18828843654497351}}, "research goal": "The dataset includes Federal Open Market Committee (FOMC) speeches from 1996-2020, which describe Federal Reserve policy. The two classes are generated based on the state of the economy during the speech. The Group A snippets are FOMC speeches during periods of high interest rates, while the Group B snippets are FOMC speeches during periods of low interest rates. I am an economist studying patterns of Fed behavior. 
My goal is to figure out what outcomes the Fed prioritizes under different economic conditions. "}, {"+": {"emphasizes the importance of financial stability": {"p-value": 8.071715026265498e-06, "V'": 0.20070051429972124}, "emphasizes the need to address the economic crisis": {"p-value": 3.767201538346685e-25, "V'": 0.41420915083006804}, "discusses the effects of the financial crisis on the economy": {"p-value": 3.347473209024735e-16, "V'": 0.28263710826964306}, "acknowledges the challenges posed by the financial crisis": {"p-value": 1.1496555931300635e-23, "V'": 0.4223270454036649}, "emphasizes the need for regulations to contain systemic risk": {"p-value": 0.00010103857854009931, "V'": 0.12125968712693991}}, "-": {"emphasizes the need for innovation and technological advancement": {"p-value": 2.857023395614859e-05, "V'": 0.09639567080943902}, "emphasizes the positive outcomes of the long expansion": {"p-value": 2.8238005151624703e-11, "V'": 0.1944536181243054}}, "research goal": "The dataset includes Federal Open Market Committee (FOMC) speeches from 1996-2020, which describe Federal Reserve policy. The two classes are generated based on the state of the economy during the speech. The Group A snippets are FOMC speeches during periods of high unemployment, while the Group B snippets are FOMC speeches during periods of low unemployment. I am an economist studying patterns of Fed behavior. My goal is to figure out what outcomes the Fed prioritizes under different economic conditions. "}, {"+": {}, "-": {"mentions luxury items, such as cars or jewellery": {"p-value": 0.000713419202542033, "V'": 0.040548803527387836}, "mentions modern technology, such as cars or phones": {"p-value": 0.000692747861593135, "V'": 0.03959332526976905}}, "research goal": "The dataset includes lyrics collected from Genius.com before 2020. The two classes are generated based on the year the song was released. 
The Group A snippets are 2000-2010 song lyrics, while the Group B snippets are 2010s song lyrics. I am a record label looking for the next big hit. My goal is to figure out the specific topics of music throughout the years. "}, {"+": {"mentions a feeling of nostalgia or longing": {"p-value": 6.842822089278107e-14, "V'": 0.09926742952459779}, "reflects on the past and mentions nostalgia": {"p-value": 2.002467773746717e-05, "V'": 0.08515413087616275}, "mentions the idea of love, such as searching for someone to love": {"p-value": 1.512363812184021e-14, "V'": 0.12257311073645923}, "focuses on feelings of nostalgia, such as memories or moments": {"p-value": 1.272704364432892e-10, "V'": 0.09525530134042329}, "mentions a sense of nostalgia or longing for the past": {"p-value": 3.15667324301349e-07, "V'": 0.06318835830278663}, "mentions a sense of longing or heartache": {"p-value": 5.031681699753658e-23, "V'": 0.19473419857333057}}, "-": {"refers to historical figures or events": {"p-value": 4.65875322473472e-05, "V'": 0.03694405190222849}, "references fame, such as celebrity or wanting to be famous": {"p-value": 4.409850910048444e-10, "V'": 0.06546541734144837}, "mentions the idea of justice, such as righting wrongs": {"p-value": 2.4485856269677477e-10, "V'": 0.05292838459441465}, "mentions feelings of anger or aggression": {"p-value": 9.894755554375372e-41, "V'": 0.22655376164943047}, "mentions a struggle or a challenge": {"p-value": 2.0057625489184173e-11, "V'": 0.12267528342073014}, "mentions of fame or celebrity status": {"p-value": 1.3374891066785415e-08, "V'": 0.05209174298317975}, "mentions violence or aggression": {"p-value": 9.678948180762689e-26, "V'": 0.16512576440202748}, "references violence or aggression": {"p-value": 4.2494425983460326e-26, "V'": 0.17225719242548218}, "includes themes of oppression and injustice": {"p-value": 1.3994391060608359e-14, "V'": 0.11870830504402397}}, "research goal": "The dataset includes lyrics collected from Genius.com before 
2020. The two classes are generated based on the year the song was released. The Group A snippets are seventies song lyrics, while the Group B snippets are eighties song lyrics. I am a record label looking for the next big hit. My goal is to figure out the specific topics of music throughout the years. "}, {"+": {"References classic horror movies, like monsters and creeps": {"p-value": 6.426157198745925e-07, "V'": 0.04258777611323801}, "uses descriptive language to paint a picture": {"p-value": 0.0008804820387891166, "V'": 0.01952150501180253}, "mentions a struggle or hardship in life": {"p-value": 0.0009048638252324376, "V'": 0.050889301392628844}}, "-": {"mentions fighting or struggle": {"p-value": 6.098004858948292e-07, "V'": 0.10183344154708174}, "mentions violence or criminal activity": {"p-value": 1.7769930601179401e-07, "V'": 0.10765626905990483}, "makes references to drugs and drug use": {"p-value": 1.5456845665137665e-07, "V'": 0.055680400253246075}, "mentions violence or aggression": {"p-value": 2.29819848531211e-05, "V'": 0.08853137167833547}, "mentions of partying, such as alcohol and drugs": {"p-value": 4.6751392392672374e-07, "V'": 0.05637660173496256}, "Features a lot of braggadocios lyrics": {"p-value": 1.0582294748199668e-20, "V'": 0.19349727517236903}, "references to violence, such as guns or killing": {"p-value": 8.892979945673971e-06, "V'": 0.08960224576824716}}, "research goal": "The dataset includes lyrics collected from Genius.com before 2020. The two classes are generated based on the year the song was released. The Group A snippets are eighties song lyrics, while the Group B snippets are nineties song lyrics. I am a record label looking for the next big hit. My goal is to figure out the specific topics of music throughout the years. 
"}, {"+": {"mentions social injustice, such as oppression or racism": {"p-value": 0.00016516031320213095, "V'": 0.06352154756779549}, "Raps about violent topics such as guns or gangs": {"p-value": 1.194763019644619e-06, "V'": 0.08621770738575657}, "references street life or gang culture": {"p-value": 4.933241468045047e-07, "V'": 0.1034187280376408}}, "-": {}, "research goal": "The dataset includes lyrics collected from Genius.com before 2020. The two classes are generated based on the year the song was released. The Group A snippets are nineties song lyrics, while the Group B snippets are 2000-2010 song lyrics. I am a record label looking for the next big hit. My goal is to figure out the specific topics of music throughout the years. "}, {"+": {"references to death and mortality": {"p-value": 8.231920956677749e-09, "V'": 0.06682499239456252}, "expressions of pain and anguish": {"p-value": 4.400149922902665e-08, "V'": 0.0821247081596014}, "focus on inner thoughts and emotions": {"p-value": 0.0009109285125111265, "V'": 0.053897027539549835}, "mentions darker themes, including death and violence": {"p-value": 3.079153943607859e-14, "V'": 0.10228788754966683}, "mentions violence, destruction, and revenge": {"p-value": 9.855058859640712e-12, "V'": 0.0740856836712333}, "mentions of death and darkness": {"p-value": 5.245924324585844e-10, "V'": 0.06800208204828037}, "mentions of death and violence": {"p-value": 9.409217415544207e-11, "V'": 0.06804755838256239}, "mentions of being in a dark place emotionally": {"p-value": 4.65870303748732e-11, "V'": 0.11288723547372581}, "mentions of inner struggles and self-reflection": {"p-value": 4.994960924058122e-06, "V'": 0.07999185288309218}}, "-": {"language of optimism": {"p-value": 8.13920838112052e-05, "V'": 0.05332100006230964}, "references to love and affection": {"p-value": 0.00030280442893960177, "V'": 0.06349255325768194}, "mentions loving and feeling joy": {"p-value": 3.268164986922409e-06, "V'": 0.06845869740308311}}, 
"research goal": "The dataset includes lyrics collected from Genius.com before 2020. The two classes are generated based on the genre of the lyrics. The Group A snippets are from the alternative genre, while the Group B snippets are from the pop genre. I am an artist trying to figure out which genre my music belongs to. My goal is to figure out the topics that define each genre. "}, {"+": {}, "-": {}, "research goal": "The dataset includes lyrics collected from Genius.com before 2020. The two classes are generated based on the genre of the lyrics. The Group A snippets are from the alternative rock genre, while the Group B snippets are from the rock genre. I am an artist trying to figure out which genre my music belongs to. My goal is to figure out the topics that define each genre. "}, {"+": {"mentions of wealth, luxury items and success": {"p-value": 0.0005752694685131078, "V'": 0.027870550155305186}, "mentions of physical attraction and sensuality": {"p-value": 0.0004902564521198683, "V'": 0.057775958947531736}, "mentions of partying and having fun": {"p-value": 0.0007851106532791857, "V'": 0.03826064596725175}, "mentions of partying and nightlife": {"p-value": 0.0001332384957728847, "V'": 0.04124895231477088}, "high-energy lyrics and upbeat rhythms": {"p-value": 0.00015537934116797395, "V'": 0.051283115545172314}}, "-": {"mentions faith and religion": {"p-value": 0.00031804568209136774, "V'": 0.03377793782201569}, "mentions spirituality and prayer": {"p-value": 0.00014512046291236813, "V'": 0.032181208265384995}}, "research goal": "The dataset includes lyrics collected from Genius.com before 2020. The two classes are generated based on the genre of the lyrics. The Group A snippets are from the R&B genre, while the Group B snippets are from the soul genre. I am an artist trying to figure out which genre my music belongs to. My goal is to figure out the topics that define each genre. 
"}, {"+": {}, "-": {"mentions of luxury items and wealth": {"p-value": 0.0006189473286773864, "V'": 0.04712944656589263}, "mentions of luxury items and money": {"p-value": 6.704585513685285e-05, "V'": 0.057152850081921475}, "referencing and bragging about money and wealth": {"p-value": 0.0007249810998968378, "V'": 0.04710262673935317}, "mentions money and material wealth": {"p-value": 0.00041257948795896153, "V'": 0.047859908514490124}, "mentions money, material possessions and wealth": {"p-value": 0.0007453861103191847, "V'": 0.0454347741437067}, "mentions of money, material wealth, and success": {"p-value": 0.0007464623033261716, "V'": 0.045276090313325135}, "mentions of money and expensive items, such as cars and jewelry": {"p-value": 9.258435744194931e-05, "V'": 0.055080141234589985}, "references to money and material possessions": {"p-value": 6.350725117925213e-05, "V'": 0.058137364580366196}}, "research goal": "The dataset includes lyrics collected from Genius.com before 2020. The two classes are generated based on the genre of the lyrics. The Group A snippets are from the rap genre, while the Group B snippets are from the trap genre. I am an artist trying to figure out which genre my music belongs to. My goal is to figure out the topics that define each genre. 
"}, {"+": {"explores themes of love, relationships, and heartbreak": {"p-value": 2.0711005071470768e-08, "V'": 0.14762401194422792}, "speaks of love, romance, and relationships": {"p-value": 1.186253705234808e-08, "V'": 0.1501410331695875}, "explores themes of romance, love and relationships": {"p-value": 9.79724290718505e-08, "V'": 0.14004173575379797}, "explores themes of love and relationships": {"p-value": 9.395567426647455e-06, "V'": 0.11708579977734151}, "references the power of love and relationships": {"p-value": 0.00015903104757449988, "V'": 0.09721076757613012}, "focuses on relationships, breakups, and love": {"p-value": 1.6625688212595333e-08, "V'": 0.14886499263445163}, "talks about love, romance and relationships": {"p-value": 3.838254578399305e-08, "V'": 0.14516902058749015}, "focuses on romantic relationships and infatuation": {"p-value": 3.9383250774263006e-07, "V'": 0.13322134390918505}, "mentions love, relationships, and heartbreak": {"p-value": 5.866085939731427e-07, "V'": 0.13171009130894906}}, "-": {"explores themes of strength, resilience, and family": {"p-value": 1.2197827110165917e-06, "V'": 0.06223226996765645}}, "research goal": "The dataset includes lyrics collected from Genius.com before 2020. The two classes are generated based on the artist. The Group A snippets are Ariana Grande song lyrics, while the Group B snippets are Beyonce song lyrics. I am a music fanatic hoping to understand each artist. My goal is to figure out the specific topics of each musician. 
"}, {"+": {"talks about the power of love and relationships": {"p-value": 0.00010860476968487056, "V'": 0.06185815141179171}, "references relationships, love, and romance": {"p-value": 1.0546395553182252e-09, "V'": 0.12263150603367035}, "Explores themes of heartbreak, relationships and love": {"p-value": 6.241271632652584e-10, "V'": 0.11306418965980089}, "talks of relationships and heartbreak": {"p-value": 6.0227183058529074e-09, "V'": 0.10161678668322685}}, "-": {"references religion and spirituality": {"p-value": 5.0079296365401294e-12, "V'": 0.07984842597287044}, "uses metaphors and allusions to explore complex topics": {"p-value": 6.532523973409481e-07, "V'": 0.037097059683408096}, "explores themes of freedom and independence": {"p-value": 1.0456818627174078e-07, "V'": 0.04523307771799452}, "mentions social issues and injustices": {"p-value": 7.150700359004575e-09, "V'": 0.07800653236722287}, "mentions the power of creativity": {"p-value": 4.0099308104357e-07, "V'": 0.03277343506623776}, "focuses on the power of faith and belief": {"p-value": 2.6434189655538147e-15, "V'": 0.07887444700237929}}, "research goal": "The dataset includes lyrics collected from Genius.com before 2020. The two classes are generated based on the artist. The Group A snippets are Drake song lyrics, while the Group B snippets are Kanye song lyrics. I am a music fanatic hoping to understand each artist. My goal is to figure out the specific topics of each musician. 
"}, {"+": {"uses aggressive language to emphasize points": {"p-value": 8.517062327375938e-10, "V'": 0.12522474977956843}, "explores the feeling of being misunderstood": {"p-value": 1.0526076178211226e-07, "V'": 0.1105580232492811}, "references violence and crime": {"p-value": 4.0674460804967035e-20, "V'": 0.16794465853910662}, "focuses heavily on anger, revenge, and violence": {"p-value": 3.4986604173283403e-23, "V'": 0.19726159387730424}, "references violence and aggression": {"p-value": 1.3743370894863842e-18, "V'": 0.17824726958119375}, "mentions a struggle against authority": {"p-value": 1.5275176385770022e-14, "V'": 0.1469701888653218}, "discusses youth and/or recklessness": {"p-value": 1.265623992589696e-06, "V'": 0.09874575015684078}, "explores themes of anxiety and fear": {"p-value": 9.568121495141154e-17, "V'": 0.12277800427235656}, "mentions violence and aggression": {"p-value": 6.091280082922981e-25, "V'": 0.20358322000196669}, "touches on dark topics such as depression, drug abuse, and violence": {"p-value": 1.3839868914456409e-30, "V'": 0.2338359196370996}, "references criminal behavior": {"p-value": 6.262784736911525e-16, "V'": 0.10895188058388977}, "mentions the difficulty of navigating life's choices and consequences": {"p-value": 2.4186551265596545e-19, "V'": 0.18576879165317045}}, "-": {"mentions money and materialism": {"p-value": 1.794711926292942e-16, "V'": 0.08606367806638957}, "talks about the power of music and its ability to connect people": {"p-value": 6.884736112950457e-11, "V'": 0.08415761794543603}, "explores themes of following your dreams and never giving up": {"p-value": 8.906283854697187e-16, "V'": 0.1372201846645088}, "mentions the idea of overcoming obstacles and reaching for the stars": {"p-value": 1.882944879930363e-12, "V'": 0.125145360607178}, "discusses the power of knowledge and stories": {"p-value": 7.983830284962715e-06, "V'": 0.039185826760124325}, "speaks of hope, faith and courage": {"p-value": 0.00017256529699199893, 
"V'": 0.061047620894889826}, "mentions dreams and achieving goals": {"p-value": 3.282734682471926e-23, "V'": 0.1398072854890554}, "mentions the power of money": {"p-value": 8.349909953312559e-15, "V'": 0.07378771973538335}}, "research goal": "The dataset includes lyrics collected from Genius.com before 2020. The two classes are generated based on the artist. The Group A snippets are Eminem song lyrics, while the Group B snippets are Logic song lyrics. I am a music fanatic hoping to understand each artist. My goal is to figure out the specific topics of each musician. "}, {"+": {"deals with the idea of identity and self-discovery": {"p-value": 0.0007704001793371693, "V'": 0.03578526394244553}, "references drugs and alcohol": {"p-value": 4.314427148247187e-05, "V'": 0.06046728527483308}, "explores themes of self-love and self-confidence": {"p-value": 0.00017422421292124718, "V'": 0.026888474826428248}}, "-": {"explores themes of love and heartbreak": {"p-value": 0.0001186162263758419, "V'": 0.04447407726496675}}, "research goal": "The dataset includes lyrics collected from Genius.com before 2020. The two classes are generated based on the artist. The Group A snippets are Kendrick Lamar song lyrics, while the Group B snippets are J. Cole song lyrics. I am a music fanatic hoping to understand each artist. My goal is to figure out the specific topics of each musician. 
"}, {"+": {"references the idea of relationships and love": {"p-value": 7.354667581666159e-06, "V'": 0.11315409042420921}, "explores themes of love, lust, and desire": {"p-value": 0.0004347181337840742, "V'": 0.09037408051769924}, "speaks of love, relationships, and heartache": {"p-value": 1.0110353316074e-07, "V'": 0.13558594872985397}, "explores themes of romance, infatuation and relationships": {"p-value": 1.0567119695822086e-06, "V'": 0.12560315355952512}, "explores themes of love and relationships": {"p-value": 5.336498483539103e-06, "V'": 0.11480577765848243}, "references to love and relationships": {"p-value": 1.2429689767549646e-06, "V'": 0.1201890036163501}, "deals with themes of love and romance": {"p-value": 6.259768046408657e-07, "V'": 0.12980814275754216}, "expresses feelings of love and longing": {"p-value": 3.081692780802019e-05, "V'": 0.10949432071348364}, "explores themes of love, relationships, and heartbreak": {"p-value": 1.1308721241683308e-06, "V'": 0.1270753655217275}, "mentions relationships and the power of love": {"p-value": 6.944348710623769e-07, "V'": 0.1309379568557485}, "explores the intensity of romantic relationships": {"p-value": 2.778792742284513e-05, "V'": 0.11017264212164579}, "mentions relationships, intimacy, and physical attraction": {"p-value": 0.00043567127127886845, "V'": 0.08254216220937116}}, "-": {"mentions family and home life": {"p-value": 2.3041890788425816e-05, "V'": 0.06427175433009241}}, "research goal": "The dataset includes lyrics collected from Genius.com before 2020. The two classes are generated based on the artist. The Group A snippets are The Weeknd song lyrics, while the Group B snippets are Post Malone song lyrics. I am a music fanatic hoping to understand each artist. My goal is to figure out the specific topics of each musician. 
"}, {"+": {"mentions the West Coast and West Coast cities": {"p-value": 1.463134024794424e-14, "V'": 0.07574203146862836}, "mentions California, Los Angeles and other West Coast cities": {"p-value": 1.292630254304285e-13, "V'": 0.07590106690989645}}, "-": {}, "research goal": "The dataset includes lyrics collected from Genius.com before 2020. The two classes are generated based on where the song was created. The Group A snippets are east coast song lyrics, while the Group B snippets are west coast song lyrics. I am a music researcher studying the emergence of music in different places. My goal is to figure out the topics that come up in music from different places. "}, {"+": {"focuses on relationships between people": {"p-value": 1.933602679909917e-36, "V'": 0.1727405672172096}, "uses words associated with nightlife": {"p-value": 1.251607494371734e-07, "V'": 0.047519837892064806}, "mentions elements of nature, such as the sky or stars": {"p-value": 4.039765961085703e-07, "V'": 0.032908561633751235}, "expresses feelings of love and affection": {"p-value": 8.18233195177641e-32, "V'": 0.12310874223685553}, "expresses feelings of joy, pride, and community": {"p-value": 0.0009516754179034719, "V'": 0.014966420792233863}}, "-": {"uses words associated with youth and rebellion": {"p-value": 6.568351996521434e-11, "V'": 0.12051780706495296}, "references historical moments or people": {"p-value": 5.2073289682317136e-05, "V'": 0.026099284008556222}, "references French culture and history": {"p-value": 2.3218303991404522e-27, "V'": 0.09251573384681425}, "uses slang terms and expressions": {"p-value": 9.289332191933654e-10, "V'": 0.1240466638591563}, "uses slang words and expressions": {"p-value": 5.044110312563981e-10, "V'": 0.12621271519108485}, "mentions French cities": {"p-value": 2.5488279873071708e-33, "V'": 0.11333598676767062}, "mentions French words and phrases": {"p-value": 0.0, "V'": 0.8941797013018553}, "contains words in French and other Romance languages": 
{"p-value": 0.0, "V'": 0.8900144373105345}, "mentions French cities and culture": {"p-value": 4.643440278358951e-38, "V'": 0.1281525968254923}, "uses words associated with French language and dialect": {"p-value": 0.0, "V'": 0.8975056332203226}, "references French history and icons": {"p-value": 1.1235656914627561e-09, "V'": 0.03170009886882618}, "mentions cultural icons from France or Europe": {"p-value": 1.1433939884546553e-10, "V'": 0.036675583071742915}}, "research goal": "The dataset includes lyrics collected from Genius.com before 2020. The two classes are generated based on where the song was created. The Group A snippets are UK lyrics, while the Group B snippets are French lyrics. I am a music researcher studying the emergence of music in different places. My goal is to figure out the topics that come up in music from different places. "}, {"+": {}, "-": {}, "research goal": "The dataset includes lyrics collected from Genius.com before 2020. The two classes are generated based on how many views the song received on Genius. The Group A snippets are highly viewed song lyrics, while the Group B snippets are song lyrics with moderate views. I am a record label looking for the next big hit. My goal is to figure out the topics that a pop audience looks for in a song. 
"}, {"+": {"references academic achievements": {"p-value": 0.00021532800331011312, "V'": 0.046328592146745076}, "references passing exams and tests": {"p-value": 1.4002671770368585e-07, "V'": 0.04984627100691061}, "involves recreational activities, such as playing video games, going to karaoke, or playing sports": {"p-value": 0.0006583801063371307, "V'": 0.05546641664357471}}, "-": {"references experiences with family members": {"p-value": 3.8848467027447636e-07, "V'": 0.0878010137851703}, "involves celebrations with family or close friends": {"p-value": 0.00046435764790179134, "V'": 0.049242218469710775}, "involves spending time with family members": {"p-value": 2.8032756018887466e-06, "V'": 0.0716753177022272}}, "research goal": "The dataset includes self-reported happy moments and demographic characteristics. The two classes are generated based on the age of the respondent. The Group A snippets discuss happy moments from 18-21 year olds, while the Group B snippets discuss happy moments from 22-25 year olds. I am a psychologist studying the effect of aging on happiness. My goal is to figure out what specific experiences make us happy as we grow older. "}, {"+": {}, "-": {}, "research goal": "The dataset includes self-reported happy moments and demographic characteristics. The two classes are generated based on the age of the respondent. The Group A snippets discuss happy moments from 22-25 year olds, while the Group B snippets discuss happy moments from 26-35 year olds. I am a psychologist studying the effect of aging on happiness. My goal is to figure out what specific experiences make us happy as we grow older. "}, {"+": {"mentions the birth of a relative": {"p-value": 5.541873820945313e-07, "V'": 0.0431095717395592}}, "-": {}, "research goal": "The dataset includes self-reported happy moments and demographic characteristics. The two classes are generated based on the age of the respondent. 
The Group A snippets discuss happy moments from 26-35 year olds, while the Group B snippets discuss happy moments from 36-45 year olds. I am a psychologist studying the effect of aging on happiness. My goal is to figure out what specific experiences make us happy as we grow older. "}, {"+": {}, "-": {}, "research goal": "The dataset includes self-reported happy moments and demographic characteristics. The two classes are generated based on the age of the respondent. The Group A snippets discuss happy moments from 36-45 year olds, while the Group B snippets discuss happy moments from 46+ year olds. I am a psychologist studying the effect of aging on happiness. My goal is to figure out what specific experiences make us happy as we grow older. "}, {"+": {"talks about spending quality time with family": {"p-value": 9.894160497968078e-99, "V'": 0.31582400265082683}, "includes activities that involve spending time with family": {"p-value": 9.865657369375549e-157, "V'": 0.43899750506913926}, "involves positive interactions with family members": {"p-value": 1.0471190954379608e-173, "V'": 0.4625705942083892}, "involves family members or relatives": {"p-value": 0.0, "V'": 0.6661379653215141}, "mentions interactions with loved ones": {"p-value": 1.1442734504177084e-24, "V'": 0.20892739781884612}, "refers to a moment of joy related to the success of another person": {"p-value": 0.0006383961745081205, "V'": 0.055402699308716566}, "mentions events that involve spending time with a romantic partner": {"p-value": 1.4764170515466147e-47, "V'": 0.19757821144637122}, "involves a romantic partner, spouse, or significant other": {"p-value": 1.0977759829015996e-71, "V'": 0.26564612776925056}, "refers to moments of joy with an immediate family member": {"p-value": 2.6408622054176595e-171, "V'": 0.45481706848780346}, "involves spending time with children": {"p-value": 2.5009146905395045e-61, "V'": 0.2248147047055158}}, "-": {"refers to a moment of joy with a co-worker or colleague": 
{"p-value": 2.3542063205139833e-12, "V'": 0.04253470828075664}, "mentions activities that involve interacting with colleagues or neighbors": {"p-value": 3.58671681101464e-129, "V'": 0.38833279762064}, "involves conversations with strangers, colleagues, or acquaintances": {"p-value": 6.220226103650143e-23, "V'": 0.09268345390647571}, "mentions events that involve visiting people": {"p-value": 7.630260259905339e-35, "V'": 0.2382114767671452}, "involves conversations with coworkers or colleagues": {"p-value": 1.387259920238662e-12, "V'": 0.043402901158111094}, "mentions events that involve spending quality time with colleagues": {"p-value": 1.2214992632323741e-20, "V'": 0.07475220979944407}, "refers to moments of joy with a work colleague or acquaintance": {"p-value": 5.559676829050811e-42, "V'": 0.1620967121656887}, "involves conversation or conversation topics": {"p-value": 2.1205662698693777e-08, "V'": 0.08089145534239184}, "mentions events that involve socializing with co-workers or friends": {"p-value": 0.0, "V'": 0.711559900062383}, "refers to a moment of joy with a long-lost friend or former flame": {"p-value": 2.685443526973103e-19, "V'": 0.08391583863049008}}, "research goal": "The dataset includes self-reported happy moments and demographic characteristics. The two classes are generated based on the type of happiness experienced. The Group A snippets discuss happy moments about affection, while the Group B snippets discuss happy moments about bonding. I am a psychologist trying to precisely define types of emotions. My goal is to figure out the salient aspects of different kinds of experiences, like the types of people or feelings. 
"}, {"+": {"refers to a moment of joy with friends or peers": {"p-value": 1.8080432133598946e-06, "V'": 0.12492571576300562}, "mentions food or drinks, either cooking or eating": {"p-value": 5.880540599087705e-25, "V'": 0.19632637229336508}, "describes a moment of contentment with food or drinks": {"p-value": 3.325197920297276e-28, "V'": 0.1818358173045555}, "occurs when a person is surprised by an unexpected event": {"p-value": 4.9798709057248884e-08, "V'": 0.05162234748171271}, "mentions positive feelings or emotions": {"p-value": 1.803090668164997e-80, "V'": 0.3842940794574452}, "mentions moments of contentment or satisfaction": {"p-value": 1.980868466483641e-88, "V'": 0.39653476327830345}, "involves unexpected or low-cost purchases": {"p-value": 3.2406560074537224e-08, "V'": 0.04539942127911256}, "mentions helping other people": {"p-value": 1.253163334762507e-11, "V'": 0.048080780156006545}, "mentions an event that involves food or eating": {"p-value": 1.3481299423431e-32, "V'": 0.2082572963706656}}, "-": {"involves playing games or sports": {"p-value": 3.1101049200185723e-23, "V'": 0.13641260972000602}, "involves spending money or other resources": {"p-value": 1.0644605243495194e-21, "V'": 0.35758890938659427}, "mentions watching a specific show or event": {"p-value": 1.3515069411230516e-87, "V'": 0.33784694081204086}, "involves interacting with technology or media": {"p-value": 5.962469182075873e-69, "V'": 0.3094185812227363}, "mentions activities that involve playing games or gambling": {"p-value": 1.8052646240833124e-24, "V'": 0.12748263889292655}, "mentions activities that involve entertainment or leisure": {"p-value": 1.2650191544472915e-135, "V'": 0.48336471889240823}, "Involves watching a movie, TV show, or video game": {"p-value": 9.672502383764584e-126, "V'": 0.4171615252083117}, "mentions events that involve watching movies or TV shows": {"p-value": 1.60701315119975e-89, "V'": 0.3298075659172229}, "mentions leisure activities or hobbies": {"p-value": 
4.806619308794188e-173, "V'": 0.5438660573083876}, "mentions using technology or electronic devices": {"p-value": 2.260910423432666e-19, "V'": 0.12905337304033757}, "involves activities that require a certain amount of effort or money": {"p-value": 4.056814370691414e-18, "V'": 0.2583976727386228}, "involves activities that are considered leisurely": {"p-value": 1.947114747143275e-111, "V'": 0.4242367609802974}}, "research goal": "The dataset includes self-reported happy moments and demographic characteristics. The two classes are generated based on the type of happiness experienced. The Group A snippets discuss happy moments about enjoying the moment, while the Group B snippets discuss happy moments about leisure. I am a psychologist trying to precisely define types of emotions. My goal is to figure out the salient aspects of different kinds of experiences, like the types of people or feelings. "}, {"+": {"mentions activities associated with traditional masculinity, such as gaming or working extra hours": {"p-value": 4.801919100615628e-07, "V'": 0.06569753955449625}, "mentions activities related to sports, gaming or technology": {"p-value": 7.770843826848439e-08, "V'": 0.07781096275147202}, "mentions achieving a goal or completing a task": {"p-value": 5.6914355483808e-09, "V'": 0.12144554599648716}, "mentions activities related to technology, such as buying a new laptop": {"p-value": 0.0001067208671210972, "V'": 0.03876864151683992}}, "-": {"mentions emotional support from family and friends": {"p-value": 5.649557116512008e-06, "V'": 0.069294030352213}, "mentions experiences related to family and children": {"p-value": 9.806049901517042e-09, "V'": 0.11328910642076567}, "mentions activities that involve spending time with family": {"p-value": 6.731732777239155e-05, "V'": 0.06518868749123316}, "mentions children or family": {"p-value": 8.054834977697783e-09, "V'": 0.10585016302839304}, "mentions acts of kindness from people close to them": {"p-value": 
8.685326163053003e-06, "V'": 0.06644562126606658}, "mentions relationships with family or friends": {"p-value": 4.016365987559832e-06, "V'": 0.09563256839137613}, "mentions activities that are relationship-oriented, such as going out for dinner with friends or attending a family event": {"p-value": 1.3666856537217874e-05, "V'": 0.08959916566479764}, "mentions experiences related to parenting or child care": {"p-value": 1.840269973069921e-11, "V'": 0.09318521640079355}}, "research goal": "The dataset includes self-reported happy moments and demographic characteristics. The two classes are generated based on the gender or familial status of the respondent. The Group A snippets discuss happy moments from males, while the Group B snippets discuss happy moments from females. I am a sociologist studying intimate relationships. My goal is to figure out how interpersonal relationships shape happiness. "}, {"+": {"mentions engagements with peers, such as going on a date or receiving a text from someone they like": {"p-value": 2.8241652322386533e-06, "V'": 0.06711557877977498}}, "-": {"mentions activities related to family life, such as parenting or celebrating holidays": {"p-value": 2.2943091337252496e-20, "V'": 0.14988773808110342}, "mentions spending quality time with family members": {"p-value": 2.061853261194184e-06, "V'": 0.06390305999523913}, "mentions activities related to family, such as spending time with children or family members": {"p-value": 1.4277460965535252e-15, "V'": 0.13759834734691634}, "mentions acts of kindness or thoughtfulness from a spouse or partner": {"p-value": 4.252155761509756e-05, "V'": 0.03171320232090528}, "mentions activities related to family or children": {"p-value": 6.173063251651788e-23, "V'": 0.1723745331159404}, "mentions activities related to family members or children": {"p-value": 4.21623308186821e-23, "V'": 0.1792523231976968}, "mentions activities related to family, such as going shopping for a child": {"p-value": 
3.5479166280650245e-11, "V'": 0.10040208736353706}, "mentions activities with children": {"p-value": 7.111499624384033e-25, "V'": 0.13986088151002263}}, "research goal": "The dataset includes self-reported happy moments and demographic characteristics. The two classes are generated based on the gender or familial status of the respondent. The Group A snippets discuss happy moments from unmarried people, while the Group B snippets discuss happy moments from married people. I am a sociologist studying intimate relationships. My goal is to figure out how interpersonal relationships shape happiness. "}, {"+": {"mentions spending quality time with family members": {"p-value": 1.201827249431838e-15, "V'": 0.1245841958256756}, "mentions activities related to parent-child bonding": {"p-value": 4.627975344935335e-27, "V'": 0.16902479946783244}, "mentions successful experiences in parenting, such as a child's first goal or performance": {"p-value": 3.323082672899985e-37, "V'": 0.16003937323464698}, "mentions activities that involve the children, such as trips to the mall or a book fair": {"p-value": 3.281729833296475e-29, "V'": 0.12559519985369053}, "mentions activities that involve the whole family, such as a father-daughter breakfast or a triathlon": {"p-value": 3.3636525495220057e-13, "V'": 0.0991767530893368}, "mentions spending time with children": {"p-value": 1.485737743197927e-54, "V'": 0.23551569062053673}, "mentions experiences with children or grandchildren": {"p-value": 4.546013184772181e-53, "V'": 0.2402751560702592}, "mentions experiences related to parenting or raising children": {"p-value": 1.9754764251557943e-56, "V'": 0.24203082321206637}, "mentions children, such as their first time playing in the rain, taking a bath, or receiving a gift": {"p-value": 1.0429603164107548e-52, "V'": 0.2346426083364822}, "mention spending time with family": {"p-value": 9.982424530879898e-18, "V'": 0.14126211692247792}, "mentions experiences with significant others, such as 
dates, gifts, or cuddling": {"p-value": 4.442581511509675e-05, "V'": 0.0866299481662331}}, "-": {"mention activities that involve friends": {"p-value": 2.7876666711052626e-05, "V'": 0.06412974209860178}}, "research goal": "The dataset includes self-reported happy moments and demographic characteristics. The two classes are generated based on the gender or familial status of the respondent. The Group A snippets discuss happy moments from parents, while the Group B snippets discuss happy moments from non-parents. I am a sociologist studying intimate relationships. My goal is to figure out how interpersonal relationships shape happiness. "}, {"+": {}, "-": {}, "research goal": "The dataset includes self-reported happy moments and demographic characteristics. The two classes are generated based on the gender or familial status of the respondent. The Group A snippets discuss happy moments from people who have separated (without divorcing), while the Group B snippets discuss happy moments from divorced people. I am a sociologist studying intimate relationships. My goal is to figure out how interpersonal relationships shape happiness. "}, {"+": {"mentioning family activities or relationships": {"p-value": 0.00042872083312113014, "V'": 0.08168184945151405}, "mention of family members": {"p-value": 0.0003098561177093079, "V'": 0.07302930962196194}}, "-": {}, "research goal": "The dataset includes self-reported happy moments and demographic characteristics. The two classes are generated based on the respondent's nationality. The Group A snippets discuss happy moments from the United States, while the Group B snippets discuss happy moments from Canada. I am a demographer comparing the values of different cultures. My goal is to figure out what specific experiences or topics make people happy across the world. 
"}, {"+": {"mentions completing work tasks": {"p-value": 4.32232124541295e-06, "V'": 0.05295829187370953}, "mentions experiences related to work, such as getting a raise or having a job": {"p-value": 9.985104320683998e-05, "V'": 0.05698394882242931}, "mentions material comforts, such as clean homes, delicious food, and comfortable clothes": {"p-value": 1.2627707670830644e-06, "V'": 0.06666111339978967}, "mentions physical activities, such as hiking and rock climbing": {"p-value": 4.850339341831594e-06, "V'": 0.04693288646824515}, "mentions activities related to work and success": {"p-value": 6.891731041866548e-06, "V'": 0.08651503800055255}}, "-": {"mentions spending time with family": {"p-value": 4.002302090866113e-05, "V'": 0.06760208930614009}, "involves spending time with family and close friends": {"p-value": 7.91803277193048e-12, "V'": 0.13972783669949834}, "mentions activities related to family and friends, such as surprise visits, date nights, and family dinners": {"p-value": 7.547365038661316e-10, "V'": 0.1265560931145316}, "mentions family activities": {"p-value": 4.147098318556859e-06, "V'": 0.081720633663393}, "includes activities related to family members, such as spending time with children and parents": {"p-value": 2.0492864816564874e-05, "V'": 0.08331496168541158}, "mention activities related to family": {"p-value": 2.982980522103364e-07, "V'": 0.0997960911858779}, "mentions family related activities, such as birthdays and marriages": {"p-value": 9.980071346220945e-06, "V'": 0.08041375526374889}, "mention activities related to family and friends": {"p-value": 1.0183924630660508e-14, "V'": 0.16054955201447624}, "mention activities related to celebrations or festivals": {"p-value": 1.3481124319806608e-06, "V'": 0.061153389856546156}, "mentions family members or relatives": {"p-value": 8.541827641961462e-06, "V'": 0.0877187292133898}, "mentions family events, such as weddings and birthdays": {"p-value": 6.557545163254866e-13, "V'": 
0.09957818256542843}, "mentions family events": {"p-value": 3.422488944945908e-10, "V'": 0.11990613256371954}, "mention events related to family members": {"p-value": 5.298328292586848e-08, "V'": 0.10703311018796069}}, "research goal": "The dataset includes self-reported happy moments and demographic characteristics. The two classes are generated based on the respondent's nationality. The Group A snippets discuss happy moments from the United States, while the Group B snippets discuss happy moments from India. I am a demographer comparing the values of different cultures. My goal is to figure out what specific experiences or topics make people happy across the world. "}, {"+": {}, "-": {"mentioning family or friends": {"p-value": 9.916733763293693e-05, "V'": 0.10671965406075762}, "mentions family activities, such as taking care of kids or hanging out with friends": {"p-value": 0.0005843498611504192, "V'": 0.0888812972413533}, "mention activities related to friends or family": {"p-value": 0.0008164770581935258, "V'": 0.09256710712254929}}, "research goal": "The dataset includes self-reported happy moments and demographic characteristics. The two classes are generated based on the respondent's nationality. The Group A snippets discuss happy moments from the United States, while the Group B snippets discuss happy moments from Venezuela. I am a demographer comparing the values of different cultures. My goal is to figure out what specific experiences or topics make people happy across the world. 
"}, {"+": {"argues that illegal immigrants should not receive the same benefits as citizens": {"p-value": 7.803122363353607e-22, "V'": 0.13254872433539824}, "Expresses a desire for a wall to be built along the southern border of the United States": {"p-value": 1.7485657685399525e-54, "V'": 0.2087065475394521}, "argues for building a wall on the southern border": {"p-value": 4.3829892206820904e-72, "V'": 0.2679878390296557}, "emphasizes the need to increase funding for border security": {"p-value": 5.476789648854039e-10, "V'": 0.08000006383084632}, "argues for strong border security to prevent illegal immigration": {"p-value": 8.980719727158853e-143, "V'": 0.5213632500325001}, "expresses concern about human trafficking and criminal cartels": {"p-value": 1.676210378712671e-39, "V'": 0.23532375895256452}, "mentions the need for a wall or barrier to secure the border": {"p-value": 2.2755169460744068e-36, "V'": 0.14215194875786016}, "calls for strong borders, families, and communities for a thriving economy": {"p-value": 6.47197741395236e-38, "V'": 0.2805210205175933}, "argues for border security": {"p-value": 1.974433937250885e-127, "V'": 0.49848655715479995}, "highlights the progress that has been made in improving the security of the border": {"p-value": 1.6742817106880835e-05, "V'": 0.05151022587020237}}, "-": {"favors the implementation of a merit-based immigration system": {"p-value": 6.705942454782031e-06, "V'": 0.05312909855609144}, "emphasizes the importance of immigration reform to benefit those who have struggled and worked hard": {"p-value": 2.2892108136290747e-21, "V'": 0.12592949926130975}, "highlights the need for the US to remain a place of refuge for those facing discrimination or violence in other parts of the world": {"p-value": 3.1488430926734306e-15, "V'": 0.06988659197465698}, "expresses the belief that immigrants should be given a pathway to legal status and be allowed to pay taxes, get a background check and strengthen our borders": {"p-value": 
3.116270310888489e-20, "V'": 0.16007028740466453}, "calls for a merit-based immigration system that is needed badly": {"p-value": 1.2220264603252027e-05, "V'": 0.056443571663190445}, "asserts that immigration does not hurt our economy, but rather grows our economy": {"p-value": 3.1684399401200246e-20, "V'": 0.11025772950294883}, "argues for a path forward to promote the fair and just treatment of immigrants": {"p-value": 2.0140509136091322e-73, "V'": 0.34639262047208474}, "argues for comprehensive immigration reform": {"p-value": 2.1544687372669482e-12, "V'": 0.14293746447317346}, "argues for a comprehensive approach to immigration reform": {"p-value": 2.07420234387993e-24, "V'": 0.1864466976843601}, "emphasizes the importance of humane immigration frameworks": {"p-value": 1.321179077766023e-77, "V'": 0.3429559981592649}, "highlights the need for comprehensive immigration reform": {"p-value": 1.2165392860489052e-05, "V'": 0.08777178678452843}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on who gave the speech. The Group A snippets are speeches given by President Trump, while the Group B snippets are speeches given by President Obama. I am a political scientist studying stances on immigration. My goal is to figure out the specific beliefs of different politicians on immigration. 
"}, {"+": {"mentions the need for a border wall": {"p-value": 2.3357842667361404e-29, "V'": 0.11954261113333685}, "mentions the wall as a solution to illegal immigration": {"p-value": 4.2945520126141604e-29, "V'": 0.11854633303206234}, "describes immigrants as violent offenders": {"p-value": 1.3209086474561444e-40, "V'": 0.16638428840129083}, "refers to illegal immigration as a 'scourge'": {"p-value": 1.4330841356803326e-15, "V'": 0.06375892624372269}, "calls for border security": {"p-value": 9.240869851038206e-70, "V'": 0.3525150379217038}, "references illegal immigrants as criminals": {"p-value": 8.902258178630598e-44, "V'": 0.21221936381480047}, "mentions the need for strict enforcement of immigration laws": {"p-value": 2.8847390897130142e-80, "V'": 0.38114837598731643}, "refers to illegal immigrants as criminals": {"p-value": 3.167551283760356e-38, "V'": 0.18547252873974643}, "focuses on the need for a wall along the Mexican border": {"p-value": 1.265951526905413e-29, "V'": 0.11553775390979297}}, "-": {"highlights the importance of legal immigration": {"p-value": 1.1215444588533574e-23, "V'": 0.2046545239153345}, "emphasizes the economic benefits of immigration": {"p-value": 8.836530480197416e-17, "V'": 0.10269742828435528}, "mentions the economic benefits of immigration": {"p-value": 5.221777084585202e-16, "V'": 0.097741655821972}, "appeals to the responsibility of all citizens to care for one another": {"p-value": 7.744384816415587e-26, "V'": 0.15104237500204212}, "encourages people to work within the legal framework to gain citizenship": {"p-value": 2.6346127552240857e-15, "V'": 0.1154410999069551}, "appeals to the values of inclusivity and diversity": {"p-value": 2.2063825552683544e-146, "V'": 0.5077081210258552}, "celebrates the contributions of immigrants to US society": {"p-value": 1.5188010079708272e-65, "V'": 0.2887063854508082}, "references the humanitarian aspect of immigration": {"p-value": 2.7721228043326266e-43, "V'": 0.2749299245793989}}, 
"research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on who gave the speech. The Group A snippets are speeches given by President Trump, while the Group B snippets are speeches given by President Obama. I am a political scientist studying stances on immigration. My goal is to figure out the stereotypes and metaphors different politicians appeal to. "}, {"+": {"believes that a merit-based immigration system should be implemented": {"p-value": 2.385015530277536e-05, "V'": 0.09148844312379799}, "argues for a merit-based immigration system": {"p-value": 0.00010968708732681998, "V'": 0.07936990627993365}}, "-": {"mentions the dangers of human trafficking": {"p-value": 2.524566781971232e-09, "V'": 0.13206746854188128}, "advocate for increased resources for border security": {"p-value": 4.446082868716641e-05, "V'": 0.15355431448234164}, "calls for increased resources to protect the homeland and secure the border": {"p-value": 4.3997504914599896e-05, "V'": 0.14454951818206002}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on who gave the speech. The Group A snippets are speeches given by President Trump, while the Group B snippets are speeches given by Vice President Pence. I am a political scientist studying stances on immigration. My goal is to figure out the specific beliefs of different politicians on immigration. 
"}, {"+": {"mentions criminal aliens and the need to remove them from the US": {"p-value": 5.321460485355926e-07, "V'": 0.17926750865714944}}, "-": {"calls for an end to human trafficking": {"p-value": 7.12572605402235e-07, "V'": 0.0930067616803156}, "emphasizes the humanitarian crisis": {"p-value": 2.1597654451049066e-23, "V'": 0.2155962414244701}, "references the humanitarian crisis at the border": {"p-value": 1.3006826630474537e-36, "V'": 0.28049540376746934}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on who gave the speech. The Group A snippets are speeches given by President Trump, while the Group B snippets are speeches given by Vice President Pence. I am a political scientist studying stances on immigration. My goal is to figure out the stereotypes and metaphors different politicians appeal to. "}, {"+": {"mentions the need to close the 'back door' on illegal immigration": {"p-value": 1.5549740905209935e-10, "V'": 0.1467684631315337}, "mentions the need for increased Border Patrol agents": {"p-value": 0.0004925413878453242, "V'": 0.04632661133547949}, "calls for the implementation of a temporary worker program": {"p-value": 9.45647124568725e-08, "V'": 0.09833474960355443}, "expresses support for a new temporary worker program": {"p-value": 3.167977462092343e-07, "V'": 0.09128205583743909}, "argues for border security": {"p-value": 5.896300729551581e-37, "V'": 0.4081659250075259}, "expresses support for a temporary worker program": {"p-value": 1.9240754541789194e-07, "V'": 0.0976528192472312}, "calls for stricter border security measures, such as fencing and technology": {"p-value": 4.198355399868365e-06, "V'": 0.07192814187592138}}, "-": {"calls for an extension of the MFN waiver to promote freedom of emigration from China": {"p-value": 0.00013235033999557224, "V'": 0.014492874708293121}, "emphasizes the fundamental importance and 
historic contributions of immigrants to the United States": {"p-value": 0.00032931615969211377, "V'": 0.10648576304985316}, "supports increased trade with Mexico to raise environmental standards": {"p-value": 0.0009454490386804093, "V'": 0.010869580170293247}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on who gave the speech. The Group A snippets are speeches given by President Bush Jr., while the Group B snippets are speeches given by President Bush Sr.. I am a political scientist studying stances on immigration. My goal is to figure out the specific beliefs of different politicians on immigration. "}, {"+": {"mentions the importance of enforcing laws and maintaining the rule of law": {"p-value": 2.367620279243281e-26, "V'": 0.35380537443075355}, "expresses support for legal immigration reform": {"p-value": 1.8791200200483687e-21, "V'": 0.3171744060331896}, "calls for securing the borders": {"p-value": 4.63012401035998e-34, "V'": 0.39175372635030864}, "mentions the need for comprehensive immigration reform": {"p-value": 1.2472971605122332e-27, "V'": 0.36205134139295614}, "emphasizes the importance of upholding laws and regulations": {"p-value": 2.614030810835802e-18, "V'": 0.2921820804920697}, "highlights the importance of enforcing U.S. borders": {"p-value": 1.1925532499902716e-34, "V'": 0.4053190630361968}, "mentions the need to strengthen border security": {"p-value": 1.1353187902044173e-33, "V'": 0.39447954236242466}}, "-": {"appeals to the personal stories of immigrants": {"p-value": 1.5866438462240494e-05, "V'": 0.10722541476512445}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on who gave the speech. 
The Group A snippets are speeches given by President Bush Jr., while the Group B snippets are speeches given by President Bush Sr.. I am a political scientist studying stances on immigration. My goal is to figure out the stereotypes and metaphors different politicians appeal to. "}, {"+": {}, "-": {"highlights the courage and will of free men to establish freedom": {"p-value": 0.00034153248571839933, "V'": 0.12032656287328336}, "highlights the importance of immigrants and refugees to the United States": {"p-value": 0.0006971618340675407, "V'": 0.11187300646724432}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on who gave the speech. The Group A snippets are speeches given by President Bush Sr., while the Group B snippets are speeches given by President Reagan. I am a political scientist studying stances on immigration. My goal is to figure out the specific beliefs of different politicians on immigration. "}, {"+": {}, "-": {"mentions the contributions of immigrants to American society": {"p-value": 7.545815705149443e-05, "V'": 0.14426229125968804}, "expresses pride in the immigrant experience": {"p-value": 0.0005070870714669294, "V'": 0.12689587100439337}, "appeals to the example of immigrants achieving their American Dream": {"p-value": 0.0001567858744214869, "V'": 0.12423354269963421}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on who gave the speech. The Group A snippets are speeches given by President Bush Sr., while the Group B snippets are speeches given by President Reagan. I am a political scientist studying stances on immigration. My goal is to figure out the stereotypes and metaphors different politicians appeal to. 
"}, {"+": {"mentions the need to address the consequences of America moving from an agricultural to an industrial society": {"p-value": 0.00012857424753995428, "V'": 0.023635980993246115}}, "-": {"calls for changes to the existing immigration laws": {"p-value": 8.293183103418047e-10, "V'": 0.13372011924339905}, "argues for providing a pathway to earning legal status for those in the country illegally": {"p-value": 1.502609613672741e-28, "V'": 0.17448943210430196}, "mentions the need for comprehensive immigration reform": {"p-value": 1.6191835306702415e-12, "V'": 0.15630302655638384}, "calls for comprehensive immigration reform to secure our borders": {"p-value": 5.884381594045673e-22, "V'": 0.18186050711247018}, "stresses the importance of having a fair and orderly immigration system": {"p-value": 1.4478563755407425e-15, "V'": 0.17541073063170892}, "advocates for the implementation of a path to citizenship": {"p-value": 4.8549308849791305e-20, "V'": 0.11731319187149619}, "calls for a comprehensive approach to reforming the immigration system": {"p-value": 1.351572471602289e-16, "V'": 0.17560336940279936}, "argues that punishing children for their parents' illegal immigration is wrong": {"p-value": 5.444897319742532e-05, "V'": 0.0652172910203377}, "argues that illegal immigration should be reduced": {"p-value": 0.0001500448033867004, "V'": 0.07116724955919843}, "mentions the need to address the deep-seated problems of the immigration system": {"p-value": 4.920868472460164e-08, "V'": 0.119748161641263}, "describes the need to secure the US borders from illegal immigration": {"p-value": 1.2919590709205088e-05, "V'": 0.08150635572175877}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on who gave the speech. The Group A snippets are speeches given by President Bush Jr., while the Group B snippets are speeches given by Bill. 
I am a political scientist studying stances on immigration. My goal is to figure out the specific beliefs of different politicians on immigration. "}, {"+": {"urges Congress to restore Medicaid benefits": {"p-value": 3.337402075558073e-05, "V'": 0.016966277760186645}}, "-": {"mentions the need for comprehensive immigration reform": {"p-value": 9.141924128316107e-47, "V'": 0.24827975955360176}, "mentions the need to earn citizenship": {"p-value": 1.7370077652540363e-08, "V'": 0.04782266739011285}, "mentions immigration reform as a priority": {"p-value": 1.6578051532932777e-20, "V'": 0.18845757420421905}, "emphasizes the importance of welcoming refugees as new neighbors": {"p-value": 2.743623524426405e-06, "V'": 0.03834527659157182}, "calls for comprehensive, commonsense immigration reform": {"p-value": 2.5611009562428404e-35, "V'": 0.22629793114154406}, "mentions the contributions of immigrants to American progress": {"p-value": 7.401477211513295e-05, "V'": 0.06486865317655233}, "mentions the economic benefits of immigration": {"p-value": 8.959848703839894e-08, "V'": 0.06467638668079527}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on who gave the speech. The Group A snippets are speeches given by President Bush Jr., while the Group B snippets are speeches given by Bill. I am a political scientist studying stances on immigration. My goal is to figure out the stereotypes and metaphors different politicians appeal to. 
"}, {"+": {"mentions the need to limit the number of low-skilled immigrants entering the country": {"p-value": 3.6849342891747916e-23, "V'": 0.13991997669966094}, "argues that the current immigration system does not serve the national interest": {"p-value": 8.816421156552785e-33, "V'": 0.21224431964697033}, "argues that a vote for the Reid-Schumer immigration bill will lower the wages of American workers": {"p-value": 4.1133636640760234e-07, "V'": 0.03819435232871622}, "argues that the President should not be allowed to use money from the Department of Homeland Security to grant amnesty": {"p-value": 1.2300115183235117e-14, "V'": 0.056026779546040927}, "argues that building a fence along the border reduces costs and saves money": {"p-value": 0.0007848560222469434, "V'": 0.010999890649005908}}, "-": {"highlights the responsibility the US has to help resolve the humanitarian problems of Southeast Asia": {"p-value": 4.08095976498378e-11, "V'": 0.0422401899321914}, "argues for an earned legalization program for undocumented people who are working and contributing": {"p-value": 0.0005201801846686399, "V'": 0.029572687662644905}, "supports providing legal immigrants with a safety net": {"p-value": 2.800765017238129e-36, "V'": 0.25022949612775947}, "calls for an increase in funding for refugee and migration assistance": {"p-value": 2.561898741884891e-05, "V'": 0.025433881362653364}, "argues for the reauthorization of Community Health Centers and Migrant Health Centers Programs": {"p-value": 3.1990323729834494e-05, "V'": 0.016921035985967198}, "highlights the success of refugees in their new communities": {"p-value": 4.7213097191857024e-05, "V'": 0.01961103337939663}, "mentions the need to improve the quality of education of low-income children": {"p-value": 8.373927465093356e-09, "V'": 0.032416490808978585}, "argues that immigration reforms are long overdue to improve the lives and working conditions of all farm workers": {"p-value": 0.0009717298754304035, "V'": 
0.010805554542766918}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on who gave the speech. The Group A snippets are speeches given by Congressperson Jeff Sessions, while the Group B snippets are speeches given by Congressperson Edward Kennedy. I am a political scientist studying stances on immigration. My goal is to figure out the specific beliefs of different politicians on immigration. "}, {"+": {"calls for stricter border control": {"p-value": 2.800162708345479e-113, "V'": 0.46820316838089193}, "emphasizes that immigration should be lawful, not illegal": {"p-value": 4.034094147558753e-62, "V'": 0.305496366087452}, "expresses concerns about illegal immigration": {"p-value": 1.201549988934516e-124, "V'": 0.4530646974761834}, "calls for an end to lawlessness": {"p-value": 3.2733291831155734e-35, "V'": 0.26982665990369786}, "focuses on the legal or unlawful status of immigrants": {"p-value": 2.5498096147517677e-50, "V'": 0.2886535550135516}, "calls for stricter enforcement of immigration laws": {"p-value": 7.3573302002098805e-109, "V'": 0.46275943856043245}, "calls for an end to illegal immigration": {"p-value": 9.491662923411583e-52, "V'": 0.2885751232365248}, "mentions the need to uphold immigration law": {"p-value": 4.666878736426286e-47, "V'": 0.22828112436347836}, "mentions the need for secure borders": {"p-value": 1.2071000382610537e-105, "V'": 0.45585157160402845}}, "-": {"appeals to Americans' sense of fairness and justice": {"p-value": 1.2455857952390273e-05, "V'": 0.0943334549605308}, "calls for an end to discrimination in immigration laws": {"p-value": 3.0536528447209364e-09, "V'": 0.12308650460147014}, "calls for humane treatment of immigrants": {"p-value": 2.8707270613608777e-54, "V'": 0.3272515077521319}, "mentions the contributions of immigrants to US culture": {"p-value": 4.361027898911471e-22, "V'": 0.11855954463812433}, 
"mentions the legacy of immigrants and their contributions to the United States": {"p-value": 1.1427009794231848e-42, "V'": 0.22802912326236482}, "mentions the contribution of immigrants to the US economy": {"p-value": 2.8553925719642773e-05, "V'": 0.04956666267588035}, "calls for a humane approach to immigration enforcement": {"p-value": 3.238926819892294e-30, "V'": 0.2363823007038514}, "highlights the success of the Special Immigrant Visas program": {"p-value": 7.660335373938111e-05, "V'": 0.02734789168047325}, "mentions the contributions of immigrants to American society": {"p-value": 1.4996593302684205e-27, "V'": 0.14888967504564501}, "mentions the historic importance of immigration to the US": {"p-value": 4.230305119534121e-24, "V'": 0.14756848667532893}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on who gave the speech. The Group A snippets are speeches given by Congressperson Jeff Sessions, while the Group B snippets are speeches given by Congressperson Edward Kennedy. I am a political scientist studying stances on immigration. My goal is to figure out the stereotypes and metaphors different politicians appeal to. 
"}, {"+": {"believes that the number of independent immigrants should increase": {"p-value": 9.049202671534276e-09, "V'": 0.06073476805920393}, "calls for immediate legalization of certain apprehended aliens": {"p-value": 1.5420358636634862e-31, "V'": 0.14456478763129788}, "mentions the Dream Act as a path to citizenship": {"p-value": 5.0044107174975916e-23, "V'": 0.09081842730299917}, "argues that the Dreamers should be given a chance to remain in the US": {"p-value": 5.923099205688837e-125, "V'": 0.4120551889118859}, "mentions the need for comprehensive immigration reform": {"p-value": 4.668818264478595e-08, "V'": 0.12494400757703081}, "mentions the DREAM Act as a solution for immigration reform": {"p-value": 3.1176718241779367e-52, "V'": 0.19661595722151207}, "argues that the DREAM Act should be passed": {"p-value": 1.4718868283865898e-70, "V'": 0.2774632856548406}, "believes that the DREAM Act should be passed to help the young people": {"p-value": 1.369789107156023e-71, "V'": 0.2615807657669671}, "argues for a path to citizenship for Dreamers": {"p-value": 5.470770120134513e-83, "V'": 0.2954993467083602}, "argues that DACA should be reinstated": {"p-value": 6.582526180439957e-68, "V'": 0.25388983597676557}, "supports protection of immigrants in the U.S. 
from criminalization": {"p-value": 2.894002054284916e-75, "V'": 0.3904008952421924}}, "-": {"shows support for the Immigration Service's long and public regulatory process": {"p-value": 2.361254828901229e-06, "V'": 0.04567719333964282}, "argues that sponsors should bear the financial burden of immigrants": {"p-value": 1.749593518650584e-11, "V'": 0.048701869696174356}, "argues for increased interdiction efforts to restrict the number of illegal aliens entering the country": {"p-value": 5.204310663828492e-25, "V'": 0.18267854108471554}, "argues that the IRCA was an effective tool to reduce illegal immigration": {"p-value": 0.000341209144702887, "V'": 0.02306055768020403}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on who gave the speech. The Group A snippets are speeches given by Congressperson Richard Durbin, while the Group B snippets are speeches given by Congressperson Mike Simpson. I am a political scientist studying stances on immigration. My goal is to figure out the specific beliefs of different politicians on immigration. 
"}, {"+": {"appeals to American ideals of unity and acceptance": {"p-value": 3.3267372779786264e-77, "V'": 0.35362157445536463}, "References the DACA program": {"p-value": 2.1764802632284487e-48, "V'": 0.18644646418265184}, "mentions the positive economic contributions of immigrants": {"p-value": 1.5915119313864283e-12, "V'": 0.07283022568041815}, "references executive orders issued by President Obama": {"p-value": 1.4314240829815286e-24, "V'": 0.10170371873223645}, "emphasizes the immigrant experience": {"p-value": 8.712552562979748e-71, "V'": 0.3818504446922762}, "mentions the civil rights implications of immigration": {"p-value": 2.4489659207558716e-54, "V'": 0.33206978534584536}, "advocates for the rights of immigrant workers": {"p-value": 7.668642043411041e-91, "V'": 0.3706202855458531}}, "-": {"mentions the decrease in illegal immigration": {"p-value": 0.0003179276815568702, "V'": 0.019165749579152024}, "calls for an end to political gimmickry in defining a refugee": {"p-value": 0.00018621743062759928, "V'": 0.023830185219876218}, "mentions the need to restrict immigration through the family reunification preference system": {"p-value": 0.00036881492216235003, "V'": 0.016889828225103445}, "mentions the need to control illegal immigration": {"p-value": 1.3133945693647851e-08, "V'": 0.12097842864734554}, "mentions the need for employers to check documents from workers": {"p-value": 0.0002246861300528624, "V'": 0.032728812967871404}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on who gave the speech. The Group A snippets are speeches given by Congressperson Richard Durbin, while the Group B snippets are speeches given by Congressperson Mike Simpson. I am a political scientist studying stances on immigration. My goal is to figure out the stereotypes and metaphors different politicians appeal to. 
"}, {"+": {"uses colloquial language": {"p-value": 1.0344681195703316e-05, "V'": 0.12279660397504776}, "uses emotive language": {"p-value": 0.0006448697440621619, "V'": 0.15140107691386553}, "uses informal language": {"p-value": 4.470618631447365e-18, "V'": 0.37012738520286464}, "uses language that is informal and conversational": {"p-value": 4.1813737236467727e-42, "V'": 0.6018344757540084}, "uses a conversational style": {"p-value": 2.1869688808917432e-47, "V'": 0.6282407612221378}, "uses direct and informal language": {"p-value": 1.8895610059479904e-32, "V'": 0.49230233906973997}}, "-": {"uses concise and direct language": {"p-value": 8.055250421742485e-07, "V'": 0.1663575859515355}, "will contain technical and/or legal terms": {"p-value": 3.698661210266061e-48, "V'": 0.6198045863461263}, "will be long": {"p-value": 1.6122224943449813e-15, "V'": 0.35530692099932293}, "will include technical language": {"p-value": 8.427152927414889e-62, "V'": 0.6673513651668415}, "uses formal language": {"p-value": 1.982469160488693e-08, "V'": 0.18662482723469775}, "uses complex or technical language": {"p-value": 3.1575931264159238e-47, "V'": 0.6189141114811373}, "uses precise language": {"p-value": 1.190147967748547e-07, "V'": 0.1748046854468761}, "contains technical language": {"p-value": 2.0648412868022698e-47, "V'": 0.6195028832552028}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on whether the statement was written or spoken. The Group A snippets are spoken speeches delivered by President Trump, while the Group B snippets are written statements by President Trump. I am a political scientist studying stances on immigration. My goal is to figure out how the speaking style of politicians changes depending on medium. 
"}, {"+": {"will include legal jargon": {"p-value": 2.8306898705034883e-10, "V'": 0.17697594485453488}, "will include official terminology and bureaucratic language": {"p-value": 1.9022854844848626e-12, "V'": 0.3096124807437636}}, "-": {"uses personal pronouns": {"p-value": 0.0005959442674612834, "V'": 0.10782003516431245}, "uses short sentences": {"p-value": 1.9580783443130203e-06, "V'": 0.1585757229814653}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on whether the statement was written or spoken. The Group A snippets are written statements by President Obama, while the Group B snippets are spoken speeches delivered by President Obama. I am a political scientist studying stances on immigration. My goal is to figure out how the speaking style of politicians changes depending on medium. "}, {"+": {"highlights the humanitarian and national security objectives of refugee admissions": {"p-value": 8.658788959052523e-07, "V'": 0.09418183107987713}}, "-": {"highlights the contributions of Italian Americans to the United States": {"p-value": 0.0006895681120668744, "V'": 0.013436908316873213}, "highlights the importance of the DREAM Act": {"p-value": 6.456916767266606e-06, "V'": 0.050788722990661256}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on when the statement was made. The Group A snippets are statements by President Obama in his second term, while the Group B snippets are statements by President Obama in his first term. I am a political scientist studying stances on immigration. My goal is to figure out how different events and eras influence the general perception of immigrants. 
"}, {"+": {"uses language that emphasizes the current state of immigration laws, and the need for their reform": {"p-value": 0.0001646019649317262, "V'": 0.09030155034559328}, "emphasizes the need for comprehensive immigration reform": {"p-value": 0.00018346847240370095, "V'": 0.09379568472935673}, "emphasizes the need for a global response to the refugee crisis": {"p-value": 1.1708717300396416e-09, "V'": 0.09511298782211494}, "emphasizes the need for a comprehensive and humane immigration system": {"p-value": 5.739607336976889e-06, "V'": 0.10711807278466734}, "emphasizes the importance of taking action to solve the immigration issue": {"p-value": 2.813514813802252e-06, "V'": 0.1125546251463414}}, "-": {}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on when the statement was made. The Group A snippets are statements by President Obama in his second term, while the Group B snippets are statements by President Obama in his first term. I am a political scientist studying stances on immigration. My goal is to figure out the dominant stereotypes and metaphors of each time era. "}, {"+": {}, "-": {}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on when the statement was made. The Group A snippets are statements by President Bush Jr. after 9/11, while the Group B snippets are statements by President Bush Jr. prior to 9/11. I am a political scientist studying stances on immigration. My goal is to figure out how different events and eras influence the general perception of immigrants. 
"}, {"+": {"emphasizes the need for a collaborative relationship between the US and Mexico": {"p-value": 9.791383233057863e-08, "V'": 0.20842141371224088}}, "-": {"mentions immigration reform as a solution": {"p-value": 6.3595601282923274e-06, "V'": 0.2370261484979758}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on when the statement was made. The Group A snippets are statements by President Bush Jr. after 9/11, while the Group B snippets are statements by President Bush Jr. prior to 9/11. I am a political scientist studying stances on immigration. My goal is to figure out the dominant stereotypes and metaphors of each time era. "}, {"+": {"describes the difficulties experienced by immigrants": {"p-value": 9.56759286383276e-07, "V'": 0.1040037841655434}, "emphasizes the importance of maintaining racial supremacy": {"p-value": 5.936049288619709e-20, "V'": 0.21147812472257596}, "focuses on the need to restrict foreign immigration": {"p-value": 1.661017788013053e-27, "V'": 0.2657226366491428}, "stresses the importance of not allowing Chinese immigrants to enter the US": {"p-value": 9.629309472580877e-22, "V'": 0.2431223432258658}, "refers to the morality of Chinese immigrants": {"p-value": 3.4183123774444592e-06, "V'": 0.10191211054324847}, "expresses concern about the number of Chinese immigrants entering the United States": {"p-value": 1.2017836043337834e-23, "V'": 0.2542313478310097}, "references the conditions of Chinese detention": {"p-value": 2.540320981394351e-05, "V'": 0.0763073356762731}, "expresses the need to protect American labor against Chinese competition": {"p-value": 7.636080932662692e-34, "V'": 0.2988155952756446}}, "-": {"highlights the benefits of repealing Chinese Exclusion Laws": {"p-value": 9.499665000476535e-11, "V'": 0.16247153416780175}, "highlights the important contribution of Chinese immigrants": {"p-value": 
6.465558190905774e-18, "V'": 0.13747449727427713}, "highlights the need to repeal the Chinese Exclusion Laws": {"p-value": 3.306155412305675e-05, "V'": 0.10860674399429615}, "acknowledges the loyalty and patriotism of Chinese immigrants": {"p-value": 5.473122547673473e-40, "V'": 0.27156784378331605}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on when the statement was made. The Group A snippets are statements about Chinese immigrants between 1873 and 1934, while the Group B snippets are statements about Chinese immigrants between 1935 and 1956. I am a political scientist studying stances on immigration. My goal is to figure out how different events and eras influence the general perception of immigrants. "}, {"+": {"emphasizes the economic and social impact of immigration": {"p-value": 3.60657503624619e-08, "V'": 0.10909908435999266}, "uses language that portrays the Chinese as a threat to American society or culture": {"p-value": 8.834684299802875e-17, "V'": 0.21700813846958056}, "emphasizes the racial and cultural differences between Chinese immigrants and other populations": {"p-value": 3.5921324472391296e-06, "V'": 0.10074836147194624}, "expresses a belief that Chinese immigrants are not assimilating into American society": {"p-value": 5.996938667450224e-20, "V'": 0.17768281028772104}, "Expresses concern about the influx of Chinese immigrants": {"p-value": 3.8688022960493916e-28, "V'": 0.2845658701390801}, "uses language that reinforces the exclusion of Chinese immigrants as a necessity": {"p-value": 2.2512909302056595e-23, "V'": 0.2587398753640049}, "emphasizes the importance of protecting American jobs from immigrants": {"p-value": 4.2761110589534e-05, "V'": 0.07010272210617445}, "mentions the risks of a large influx of Chinese immigrants": {"p-value": 2.1002133166764264e-13, "V'": 0.14678253252052934}, "expresses sympathy for the people 
of the United States, rather than the Chinese immigrants": {"p-value": 1.2195147796591916e-06, "V'": 0.11558397056449476}}, "-": {"emphasizes the loyalty of Chinese immigrants and their contribution to the Allied war effort": {"p-value": 2.6974544223750632e-12, "V'": 0.050791067498046884}, "pays homage to Chinese immigrants and their courage and resilience": {"p-value": 1.7649707372223175e-12, "V'": 0.06401482510801292}, "portrays Chinese immigrants as a brave and heroic people who deserve recognition and respect": {"p-value": 4.494220233713028e-25, "V'": 0.12111059725167508}, "uses language that portrays Chinese immigrants as hardworking and resilient": {"p-value": 0.00016027425340505634, "V'": 0.040900325455320234}, "draws a contrast between Chinese immigrants and other immigrants, such as Japanese or Mexican": {"p-value": 4.5854961681221214e-05, "V'": 0.07088005606866861}, "expresses admiration for the qualities and characteristics of Chinese immigrants": {"p-value": 8.434035002585403e-09, "V'": 0.05917777753764124}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on when the statement was made. The Group A snippets are statements about Chinese immigrants between 1873 and 1934, while the Group B snippets are statements about Chinese immigrants between 1935 and 1956. I am a political scientist studying stances on immigration. My goal is to figure out the dominant stereotypes and metaphors of each time era. 
"}, {"+": {"expresses sympathy for the discrimination against Chinese immigrants": {"p-value": 6.865499599802999e-05, "V'": 0.06551028670477099}, "Highlights the need for legal protection of Chinese students": {"p-value": 2.773966210763468e-13, "V'": 0.1280564804637651}, "mentions the role of the United States in providing refuge to immigrants": {"p-value": 4.158510268971468e-18, "V'": 0.22543883976432078}, "highlights the importance of protecting human rights for immigrants": {"p-value": 6.066513975997482e-27, "V'": 0.2743198867853804}, "recognizes the economic contributions of Chinese immigrants": {"p-value": 1.3387324241962168e-05, "V'": 0.05045442939273796}, "acknowledges the difficult conditions that Chinese immigrants may face upon returning to their home country": {"p-value": 3.375483450004037e-10, "V'": 0.08809048408289771}, "mentions the need for English language proficiency": {"p-value": 2.685646197824127e-05, "V'": 0.037809105580152526}}, "-": {"expresses fear of the 'onrush of the yellow man'": {"p-value": 6.360601849167595e-05, "V'": 0.030600100420626795}, "highlights the effects of the Chinese Exclusion Act on immigrants": {"p-value": 1.3347212865051256e-10, "V'": 0.10214391669388483}, "references the fear of the 'onrush of the yellow man'": {"p-value": 7.62859405500121e-07, "V'": 0.03617039262121896}, "refers to legal restrictions on Chinese immigrants": {"p-value": 6.821532832131838e-29, "V'": 0.2731839575598992}, "refers to the Chinese Exclusion Act": {"p-value": 1.6461723668248659e-46, "V'": 0.23552380619805605}, "mentions the need for immigration quotas for Chinese": {"p-value": 5.9523865024018e-23, "V'": 0.11621019071112373}, "expresses concern about the effects of Chinese immigrants on the American standard of living": {"p-value": 2.4898796305799855e-07, "V'": 0.05907450553937334}, "calls for a repeal of the Chinese Exclusion Act": {"p-value": 1.5763391022432332e-35, "V'": 0.14568269757435715}}, "research goal": "The dataset includes 
congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on when the statement was made. The Group A snippets are statements about Chinese immigrants between 1957 and 2020, while the Group B snippets are statements about Chinese immigrants between 1935 and 1956. I am a political scientist studying stances on immigration. My goal is to figure out how different events and eras influence the general perception of immigrants. "}, {"+": {"acknowledges the positive successes of immigrants, such as their economic self-sufficiency": {"p-value": 1.6903016224534016e-13, "V'": 0.14476118064643295}, "discusses the obstacles that immigrants face, such as language barriers or cultural adaptation difficulties": {"p-value": 5.0844368063519125e-11, "V'": 0.10814505977279266}, "describes the contributions of immigrants to society, such as their labor or their innovative ideas": {"p-value": 2.1495402444515898e-16, "V'": 0.14238160726775195}, "mentions the moral obligation of the United States to help immigrants, such as providing refuge or temporary respite": {"p-value": 2.571698268922082e-30, "V'": 0.2842525935329051}, "emphasizes the humanitarian aspect of immigration, such as the help of refugees and asylum seekers": {"p-value": 8.152609587804717e-57, "V'": 0.37331335717451186}, "acknowledges the positive accomplishments made by Chinese immigrants": {"p-value": 3.4622030857189172e-06, "V'": 0.10165461736572681}, "describes Chinese immigrants as being persecuted by the Chinese government": {"p-value": 3.6971370674006375e-17, "V'": 0.15135369434070095}, "highlights the benefits of immigration, such as job skills or economic contributions": {"p-value": 1.701138666145541e-11, "V'": 0.09932950509170042}, "expresses sympathy towards immigrants and acknowledges the difficulties they face": {"p-value": 1.319576116313379e-42, "V'": 0.3486407612973747}, "uses language to portray the positive contributions of Chinese 
immigrants to American society and culture": {"p-value": 2.219223618776676e-07, "V'": 0.10583214599784377}, "expresses gratitude for the hard work and service of Chinese immigrants": {"p-value": 3.995570767814266e-07, "V'": 0.09787852771161419}, "describes Chinese immigrants in a paternalistic way, such as describing them as needing guidance or protection": {"p-value": 1.5963575489911137e-06, "V'": 0.07579230812190182}}, "-": {"expresses concern about the legal status of immigrants": {"p-value": 5.333660190762886e-06, "V'": 0.11964996914537585}, "uses language that is exclusionary, such as referring to immigrants as 'aliens'": {"p-value": 3.700431583807049e-08, "V'": 0.0823205092480027}, "uses language that positions immigrants as a threat to society": {"p-value": 0.0009572102834992218, "V'": 0.0535691934269384}, "uses language that implies Chinese immigrants should be kept out of the United States": {"p-value": 5.612227445285636e-19, "V'": 0.17434203991314806}, "emphasizes the differences between Chinese immigrants and other racial groups": {"p-value": 1.619235213177313e-14, "V'": 0.15518828103715387}, "uses language to describe immigrants as foreign, such as using words like 'outsiders' or 'alien'": {"p-value": 1.6931611877500887e-05, "V'": 0.09989042071992837}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on when the statement was made. The Group A snippets are statements about Chinese immigrants between 1957 and 2020, while the Group B snippets are statements about Chinese immigrants between 1935 and 1956. I am a political scientist studying stances on immigration. My goal is to figure out the dominant stereotypes and metaphors of each time era. 
"}, {"+": {"views Mexican immigrants as a threat to American society": {"p-value": 2.377026768908204e-10, "V'": 0.13152631081568078}, "highlights the need to restrict foreign labor": {"p-value": 7.713991041541358e-05, "V'": 0.08374340729060703}, "references the Mexican quota system": {"p-value": 3.909534641878989e-19, "V'": 0.10285366642441399}}, "-": {"Stresses the need for humane treatment of Mexican immigrants": {"p-value": 2.1387342802865938e-05, "V'": 0.0472186583825312}, "acknowledges the economic benefits of employing Mexican immigrants": {"p-value": 9.42847595197313e-06, "V'": 0.04070564149663215}, "references the arrangement made with the Mexican Government to bring in Mexican agricultural workers": {"p-value": 2.527221658470563e-62, "V'": 0.2546407441132664}, "suggests that Mexican immigrants should be recruited to solve labor shortages": {"p-value": 2.2989144982430163e-36, "V'": 0.1614274316202558}, "refers to the Mexican labor shortage and the need for workers": {"p-value": 1.874334462756004e-31, "V'": 0.15425966297235724}, "discusses the need to bring in Mexican workers to harvest crops": {"p-value": 1.3107302905499223e-17, "V'": 0.10060206284045076}, "Highlights the need for official order in the recruitment of farm labor": {"p-value": 4.591143186246171e-63, "V'": 0.239908807037671}, "acknowledges the positive economic impact of Mexican immigration": {"p-value": 0.0006054928387968403, "V'": 0.023397589478067676}, "references the exploitation of Mexican immigrants by employers": {"p-value": 1.1905454391547449e-07, "V'": 0.09253800251195538}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on when the statement was made. The Group A snippets are statements about Mexican immigrants between 1873 and 1934, while the Group B snippets are statements about Mexican immigrants between 1935 and 1956. 
I am a political scientist studying stances on immigration. My goal is to figure out how different events and eras influence the general perception of immigrants. "}, {"+": {"portrays immigrants as a threat to American values": {"p-value": 1.2805170119625497e-06, "V'": 0.08007008049001754}, "uses language that is critical of Mexican immigration, or expresses a need to restrict it": {"p-value": 6.207472244999652e-05, "V'": 0.08561440231276257}}, "-": {"mentions Mexicans as being a source of cheap labor": {"p-value": 6.585307524018564e-27, "V'": 0.218643576981162}, "uses language that emphasizes the need for legal immigration": {"p-value": 0.00012029888360464366, "V'": 0.07815876175850611}, "describes immigration as a necessary solution to labor shortages": {"p-value": 6.182104156536032e-16, "V'": 0.07837059243986585}, "discusses the use of Mexican nationals for agricultural labor in the United States": {"p-value": 1.016531627156588e-71, "V'": 0.3682169918607235}, "emphasizes the need for government intervention to protect Mexican labor": {"p-value": 1.9494728936073146e-14, "V'": 0.12495260823019133}, "mentions the need for cooperation between the US and Mexican governments in controlling the immigration situation": {"p-value": 2.291211714864114e-24, "V'": 0.1257917959727612}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on when the statement was made. The Group A snippets are statements about Mexican immigrants between 1873 and 1934, while the Group B snippets are statements about Mexican immigrants between 1935 and 1956. I am a political scientist studying stances on immigration. My goal is to figure out the dominant stereotypes and metaphors of each time era. 
"}, {"+": {"discusses the need for educational opportunities for Mexican immigrants": {"p-value": 4.678911523874991e-12, "V'": 0.06528214273085409}, "references the impact of poverty on Mexican Americans": {"p-value": 0.0004680952615390904, "V'": 0.04136450011832955}}, "-": {"acknowledges the need for immigration law reform": {"p-value": 6.76704588866933e-13, "V'": 0.11402184961123885}, "highlights the need to import Mexican citizens to save crops": {"p-value": 1.2528106339760088e-13, "V'": 0.1261131221882846}, "expresses the need for timely regulations for obtaining Mexican labor": {"p-value": 2.7279654120801123e-35, "V'": 0.24258473718329798}, "references the need for government intervention to protect American labor": {"p-value": 3.645993071928755e-16, "V'": 0.1572311378453808}, "acknowledges the need for Mexican labor in the United States": {"p-value": 2.760069226267942e-12, "V'": 0.14364207021438707}, "mentions the haphazard procedure by which Mexicans enter the United States": {"p-value": 1.6400911976361026e-11, "V'": 0.12838217542991448}, "touches on the need for an agreement with Mexico for labor": {"p-value": 4.146835700436172e-40, "V'": 0.2617056872142559}, "discusses the use of Mexican labor for agricultural purposes": {"p-value": 4.5857538928095116e-26, "V'": 0.22140002138131515}, "refers to the difficulty of enforcing immigration laws": {"p-value": 1.8565653824403728e-07, "V'": 0.10988591025149946}, "suggests that immigration laws are discriminatory": {"p-value": 6.063168430958405e-05, "V'": 0.07394472249897283}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on when the statement was made. The Group A snippets are statements about Mexican immigrants between 1957 and 2020, while the Group B snippets are statements about Mexican immigrants between 1935 and 1956. I am a political scientist studying stances on immigration. 
My goal is to figure out how different events and eras influence the general perception of immigrants. "}, {"+": {"uses language that celebrates the accomplishments of Mexican Americans in the US": {"p-value": 6.802543519736051e-28, "V'": 0.11753071229389703}, "uses language that emphasizes Mexican Americans' loyalty to the United States": {"p-value": 6.906576402512465e-05, "V'": 0.03452431349957838}, "uses language that focuses on the plight of Mexican immigrants and their struggles in the US": {"p-value": 2.2935339845749697e-09, "V'": 0.11920019975465262}, "emphasizes the value of immigrants to society and the economy": {"p-value": 4.493048457978118e-06, "V'": 0.05530811058184111}, "portrays immigrants as people with rights and deserving of due process": {"p-value": 4.219044674157445e-14, "V'": 0.10940116663176729}, "emphasizes the American identity of Mexican Americans": {"p-value": 1.2888838418761093e-08, "V'": 0.061646924882427706}, "describes immigrants as hardworking and determined": {"p-value": 5.390344769258994e-12, "V'": 0.06902212561964208}, "emphasizes the benefits of immigrants to the US economy": {"p-value": 0.00049087586050633, "V'": 0.03445637835479529}, "focuses on the positive aspects of immigration, such as the contributions of immigrants to the economy": {"p-value": 2.516800622732996e-07, "V'": 0.0470008060065422}}, "-": {"describes the legal process of emigration and employment, such as visas and labor shortages": {"p-value": 1.9670860011017664e-18, "V'": 0.17632036779981314}, "emphasizes the importance of keeping immigrants from crossing the border": {"p-value": 2.8328994795464074e-14, "V'": 0.13322325438631952}, "mentions the need for immigration laws": {"p-value": 1.3493725809379074e-10, "V'": 0.13037259804867118}, "expresses support for bringing Mexican agricultural workers to the US": {"p-value": 3.822839441521784e-11, "V'": 0.1128993111161457}, "uses language that speaks of the need to protect American labor": {"p-value": 
3.745086418297947e-09, "V'": 0.12142490331547295}, "uses language that paints Mexican immigrants as lacking in refinement and education": {"p-value": 0.0005382452566229329, "V'": 0.042557094518235514}, "describes the need for a solution to the 'wetback problem'": {"p-value": 1.3675615636722576e-26, "V'": 0.10448585011000192}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on when the statement was made. The Group A snippets are statements about Mexican immigrants between 1957 and 2020, while the Group B snippets are statements about Mexican immigrants between 1935 and 1956. I am a political scientist studying stances on immigration. My goal is to figure out the dominant stereotypes and metaphors of each time era. "}, {"+": {"refers to the unequal treatment of Italian immigrants compared to American citizens": {"p-value": 1.1488406663531247e-07, "V'": 0.10647442440193486}, "highlights the economic burden of immigrants on society": {"p-value": 0.00036595714852668843, "V'": 0.04080475696569762}, "acknowledges the role of free white persons in immigration legislation": {"p-value": 8.302512150574766e-09, "V'": 0.0686656427992803}, "expresses concern for the cultural deterioration of immigrants in the US": {"p-value": 8.299708442756067e-18, "V'": 0.12272699064882028}, "expresses concern over the influx of immigrants from Southern Europe and Asia Minor": {"p-value": 3.2366188797266977e-22, "V'": 0.19405132160560806}}, "-": {"expresses support for Italian immigrants and the Italian economy": {"p-value": 3.566035271416564e-07, "V'": 0.07088488819929259}, "mentions the need for temporary havens for refugees": {"p-value": 3.745266084164141e-12, "V'": 0.048052626839512225}, "focuses on the need for international cooperation to solve Italian economic structural problems": {"p-value": 7.705685399041759e-07, "V'": 0.02953153938551301}, "highlights the importance 
of having assurance for Italian immigrants entering the US": {"p-value": 0.0009328157305276493, "V'": 0.06840574815913672}, "acknowledges the need to adjust immigration laws to serve the interests of immigrants": {"p-value": 2.028668954590087e-24, "V'": 0.21373782636754474}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on when the statement was made. The Group A snippets are statements about Italian immigrants between 1873 and 1934, while the Group B snippets are statements about Italian immigrants between 1935 and 1956. I am a political scientist studying stances on immigration. My goal is to figure out how different events and eras influence the general perception of immigrants. "}, {"+": {"emphasizes the need for immigrants to assimilate and learn the language to participate in American society": {"p-value": 6.604774163641636e-15, "V'": 0.09450762285997434}, "strongly emphasizes the need for immigrants to learn English in order to be accepted": {"p-value": 1.5489521836350642e-07, "V'": 0.03925380083490659}, "contrasts the American working class with immigrants, focusing on the differences between them": {"p-value": 1.2720406849539026e-42, "V'": 0.2819245943915543}, "discusses the potential harms of immigration, such as crowding and disease": {"p-value": 1.6080236319525986e-15, "V'": 0.15293931144313155}, "portrays Italian immigrants as lacking in patriotism or loyalty to the nation": {"p-value": 8.517160199603689e-22, "V'": 0.17337220949839932}, "uses language that emphasizes the need for restrictions on immigration": {"p-value": 1.1103229963366097e-13, "V'": 0.17810076683864795}, "mentions the importance of immigration in terms of population and economic growth": {"p-value": 0.0009865980605122107, "V'": 0.058913924123771}, "mentions cultural and economic differences between northern and southern Italian immigrants": {"p-value": 
6.97100775305145e-11, "V'": 0.060408223103228464}, "portrays immigration as a threat to the nation": {"p-value": 8.024877235018966e-13, "V'": 0.1408807256949875}, "describes the Italian immigrants as a source of revenue": {"p-value": 4.707587404010863e-05, "V'": 0.03774311248248952}}, "-": {"emphasizes the need for humanitarian assistance and protection for immigrants": {"p-value": 1.1501240809187902e-30, "V'": 0.2716715458648551}, "presents displaced persons as victims of war and persecution": {"p-value": 1.59134061683222e-38, "V'": 0.1738609928964228}, "emphasizes the importance of providing citizenship to Italian immigrants": {"p-value": 2.3937374611904286e-05, "V'": 0.07925722817554867}, "emphasizes the humanitarian causes of immigration": {"p-value": 7.181936593129462e-38, "V'": 0.23104851417803063}, "mentions the need for immigration reform or legislation": {"p-value": 3.904581706714713e-05, "V'": 0.0956238220303256}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on when the statement was made. The Group A snippets are statements about Italian immigrants between 1873 and 1934, while the Group B snippets are statements about Italian immigrants between 1935 and 1956. I am a political scientist studying stances on immigration. My goal is to figure out the dominant stereotypes and metaphors of each time era. 
"}, {"+": {"highlights the success of Italian-American businesspeople": {"p-value": 6.247051080341445e-06, "V'": 0.03474487771544316}, "highlights the successful achievements of Italian Americans": {"p-value": 3.101960744827298e-43, "V'": 0.25368591580430255}, "highlights the success of Italian-Americans": {"p-value": 1.3254881891868787e-42, "V'": 0.25779274666231}, "highlights the contributions that Italian immigrants have made to the US": {"p-value": 2.9721166326924317e-26, "V'": 0.19150663727119027}, "highlights the achievements of Italian Americans": {"p-value": 4.64914943309954e-56, "V'": 0.31319065755877906}, "describes the accomplishments of Italian immigrants": {"p-value": 1.7792082749937728e-36, "V'": 0.23855241048887743}, "highlights the positive contributions of Italian immigrants to the US": {"p-value": 2.3444755852798417e-38, "V'": 0.2651294937266938}, "praises the hard work and determination of Italian immigrants": {"p-value": 2.994352757526179e-32, "V'": 0.23055992516523732}, "highlights the contributions of Italian immigrants to American culture and society": {"p-value": 3.475454803054573e-36, "V'": 0.23808197736196807}, "highlights the achievements of Italian-Americans": {"p-value": 3.517649101424001e-52, "V'": 0.29964613979396476}, "acknowledges the contributions of Italian immigrants to American culture": {"p-value": 1.8196852737958968e-33, "V'": 0.23380937790795198}, "highlights the importance of family values among Italian American families": {"p-value": 2.431578756318185e-09, "V'": 0.05818853548819107}}, "-": {"expresses concern about the threat of immigration to security and population growth": {"p-value": 1.4064757288614856e-21, "V'": 0.13593070158463993}, "expresses a negative opinion of foreign-born people": {"p-value": 1.862550325939089e-19, "V'": 0.14915568100770965}, "acknowledges the pressure put on Italian citizens to vote": {"p-value": 5.488914106292334e-08, "V'": 0.0264626704290012}}, "research goal": "The dataset includes 
congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on when the statement was made. The Group A snippets are statements about Italian immigrants between 1957 and 2020, while the Group B snippets are statements about Italian immigrants between 1935 and 1956. I am a political scientist studying stances on immigration. My goal is to figure out how different events and eras influence the general perception of immigrants. "}, {"+": {"emphasizes the different roles immigrants have played in the building of America": {"p-value": 3.5736650260073628e-62, "V'": 0.3811005523879407}, "uses language that praises immigrants for their courage and strength": {"p-value": 4.767638337814764e-34, "V'": 0.24742385888781865}, "describes the courage and strength of immigrants": {"p-value": 3.6496521734529896e-37, "V'": 0.2655728099879212}, "uses language that emphasizes Italian immigrants' hard work and perseverance": {"p-value": 2.400139988553835e-40, "V'": 0.2669026384349955}, "uses language that celebrates Italian immigrants' success in various fields": {"p-value": 5.1378842365783884e-73, "V'": 0.3796609338553426}, "emphasizes the positive contributions of Italian immigrants to American society": {"p-value": 1.6423788902895668e-73, "V'": 0.39784937209546173}, "acknowledges the positive impact of Italian immigrants on American culture": {"p-value": 1.3759807494347405e-67, "V'": 0.3824582027111388}, "conveys a sense of pride in the Italian immigrant community and their accomplishments": {"p-value": 1.8894785481966274e-92, "V'": 0.45215476209760225}, "acknowledges the positive impact of Italian immigrants on the United States": {"p-value": 2.0344181875975477e-72, "V'": 0.40060410776175626}, "emphasizes the positive aspects of immigration, highlighting the contributions immigrants have made to American society": {"p-value": 3.593604197648717e-73, "V'": 0.4062535693913646}, "highlights the importance of education, 
particularly in relation to Italian immigrants": {"p-value": 1.0445673472393494e-08, "V'": 0.10027664880090631}}, "-": {"focuses on the need for laborers in other countries and the potential economic benefits of allowing Italian immigrants to work abroad": {"p-value": 1.3108964932285205e-06, "V'": 0.026467582530287637}, "highlights the need to protect the US from potential fifth columnists and Nazi sympathizers": {"p-value": 2.1130581596572105e-30, "V'": 0.1442753545931102}, "references the need to take action to protect Americans from 'enemy' aliens": {"p-value": 4.19249552137902e-21, "V'": 0.12054929329380125}, "emphasizes the need for refugees as workers": {"p-value": 3.383920010112402e-06, "V'": 0.019471454863400715}, "focuses on the idea of dual citizenship and the unique challenges it poses": {"p-value": 3.2089760770880256e-10, "V'": 0.07289985661178276}, "uses language that implies Italian immigrants are a foreign presence in the United States": {"p-value": 1.242657429933354e-27, "V'": 0.2214574780889818}, "uses language that portrays immigrants as 'outsiders' and members of a separate group": {"p-value": 3.320385006561832e-25, "V'": 0.14997431464773922}, "emphasizes the need for immigration reform and/or legislation": {"p-value": 1.8245022390221634e-26, "V'": 0.21720116067148057}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on when the statement was made. The Group A snippets are statements about Italian immigrants between 1957 and 2020, while the Group B snippets are statements about Italian immigrants between 1935 and 1956. I am a political scientist studying stances on immigration. My goal is to figure out the dominant stereotypes and metaphors of each time era. 
"}, {"+": {"emphasizes the loyalty of immigrants": {"p-value": 3.5833286442139477e-135, "V'": 0.4345597932634771}, "emphasizes the economic benefits of immigration": {"p-value": 1.2573812735094723e-10, "V'": 0.10453833869979043}, "emphasizes the positive contributions of immigrants": {"p-value": 1.867264904603425e-120, "V'": 0.405963144539603}, "references the positive contributions of immigrants": {"p-value": 6.522851273377273e-124, "V'": 0.4136152432274525}, "highlights the contributions of immigrants to American society": {"p-value": 1.7406692821103417e-105, "V'": 0.3661505715628016}, "emphasizes the positive economic impact of immigration": {"p-value": 8.31720445000428e-28, "V'": 0.14706446142725635}, "emphasizes the important role of immigrants in American history": {"p-value": 6.193388529821444e-121, "V'": 0.4397068111614988}, "mentions the shared values between immigrants and citizens": {"p-value": 3.075354347711523e-92, "V'": 0.325703297771548}}, "-": {"Focuses on the negative aspects of immigration": {"p-value": 7.595954551980102e-124, "V'": 0.464864311181609}, "highlights the danger of unrestricted immigration": {"p-value": 1.1088933153405144e-98, "V'": 0.4214459215790022}, "focuses on the negative consequences of immigration": {"p-value": 8.110561942610871e-92, "V'": 0.40265891348432703}, "highlights the dangers of immigration": {"p-value": 6.399018873884549e-106, "V'": 0.42254546559627004}, "highlights the danger of allowing criminals into the country": {"p-value": 5.0660221224005425e-59, "V'": 0.23687923262881763}, "argues that immigration harms the economy": {"p-value": 7.724932173610798e-17, "V'": 0.09916808812230513}, "highlights the burden of immigrants on the economy": {"p-value": 1.3680253371464616e-26, "V'": 0.15484987854989174}, "links immigration to crime and danger": {"p-value": 4.688745527076148e-44, "V'": 0.17162142204528913}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 
to the present. The two classes are generated based on whether the speaker is in favor of immigration. The Group A snippets are statements supportive of immigration between 1873 and 1934, while the Group B snippets are statements opposed to immigration between 1873 and 1934. I am a political scientist studying stances on immigration. My goal is to figure out what specific justifications people in favor of immigration use. "}, {"+": {"illustrates the contributions of immigrants to the country": {"p-value": 2.482491837211919e-60, "V'": 0.22601440455848487}, "emphasizes the contributions of immigrants to national wealth": {"p-value": 7.779120526934135e-46, "V'": 0.17866914472506967}, "highlights diversity as a strength": {"p-value": 6.569227360871031e-44, "V'": 0.16699329753752465}, "emphasizes the positive contributions of immigrants": {"p-value": 1.6200537083947663e-105, "V'": 0.35620243993099343}, "acknowledges the positive contributions made by immigrants": {"p-value": 4.4197838376941036e-95, "V'": 0.3304320359680766}, "highlights the cultural contributions of immigrants": {"p-value": 4.565160088980699e-39, "V'": 0.1535702142901648}, "emphasizes the positive contributions of immigrants to the United States": {"p-value": 1.850341954158243e-94, "V'": 0.32926923547495934}, "highlights the value and benefit of immigration for the United States": {"p-value": 5.0950803958003565e-85, "V'": 0.3054184021867395}, "acknowledges the contributions of immigrants to the country": {"p-value": 1.5243656845108213e-87, "V'": 0.30997011833707394}}, "-": {"expresses concerns about the number of immigrants": {"p-value": 4.905636328634142e-94, "V'": 0.38515074869417243}, "emphasizes negative consequences of immigration": {"p-value": 1.1781392575384569e-188, "V'": 0.5769056234816965}, "highlights the criminality of immigrants": {"p-value": 1.295627625653078e-125, "V'": 0.40213851845959214}, "mentions the need to protect American worker's rights": {"p-value": 4.770375665821703e-09, "V'": 
0.10988241884456548}, "highlights the dangers of illegal immigration": {"p-value": 1.1598708210052279e-136, "V'": 0.4240900532743911}, "mentions the risk to security from immigrants": {"p-value": 1.107763573428983e-125, "V'": 0.39695557086597955}, "argues that immigrants are taking away jobs from Americans": {"p-value": 5.676854156769537e-10, "V'": 0.08629433097201364}, "criticizes the inflow of immigrants": {"p-value": 5.135989962351439e-174, "V'": 0.5437338047073983}, "emphasizes the harshness of the immigration laws": {"p-value": 2.884121634466066e-87, "V'": 0.40130720249627083}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on whether the speaker is in favor of immigration. The Group A snippets are statements supportive of immigration between 1935 and 1956, while the Group B snippets are statements opposed to immigration between 1935 and 1956. I am a political scientist studying stances on immigration. My goal is to figure out what specific justifications people in favor of immigration use. 
"}, {"+": {"emphasizes the economic contributions of immigrants": {"p-value": 4.585571392200034e-08, "V'": 0.08209626494831189}, "emphasizes the importance of recognizing immigrants' contributions": {"p-value": 2.8396503231962845e-158, "V'": 0.47753124949937}, "praises the contributions of immigrants to American society": {"p-value": 4.316647958315538e-111, "V'": 0.36863321518458114}, "cites the economic benefits of immigration": {"p-value": 1.8950632367048696e-05, "V'": 0.05234155589990159}, "emphasizes the importance of providing help to immigrants": {"p-value": 3.0788159794096546e-151, "V'": 0.5006674886093032}, "emphasizes the importance of respecting immigrants' cultures": {"p-value": 4.4737337233936393e-60, "V'": 0.23003471413235335}, "emphasizes the importance of opportunity for immigrants": {"p-value": 6.68286148412017e-99, "V'": 0.3847327263101634}, "emphasizes the hard work and contribution of immigrants": {"p-value": 4.674755975930617e-126, "V'": 0.4028041634641669}, "emphasizes the positive economic contributions of immigrants": {"p-value": 6.738215253717016e-37, "V'": 0.15547365682191633}}, "-": {"warns of the dangers posed by immigrants": {"p-value": 5.208648753084949e-98, "V'": 0.33429434302211497}, "calls for stricter enforcement and control of immigration": {"p-value": 6.318314939342685e-166, "V'": 0.4691793751599135}, "expresses concerns about illegal immigration": {"p-value": 4.586557278343348e-266, "V'": 0.6582057829532818}, "highlights the dangers of open borders": {"p-value": 8.127893137003026e-181, "V'": 0.5155320054204576}, "mentions negative economic impact of immigration": {"p-value": 2.271914922935048e-12, "V'": 0.14465367151418715}, "highlights the costs of immigrants": {"p-value": 7.12075674073389e-19, "V'": 0.19363550446271333}, "highlights the potential dangers associated with illegal immigration": {"p-value": 2.4181813691008358e-141, "V'": 0.4743752395309268}, "mentions the costs of illegal immigration": {"p-value": 
2.767067291139151e-20, "V'": 0.1684509027194706}, "highlights the potential for fraudulent documents": {"p-value": 0.0004676669145222511, "V'": 0.05645256920100115}, "highlights the undesirability of immigrants": {"p-value": 6.321157186970439e-121, "V'": 0.38536622782909075}, "highlights the economic burden of immigration": {"p-value": 1.1343950224867783e-13, "V'": 0.14729634430311797}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on whether the speaker is in favor of immigration. The Group A snippets are statements supportive of immigration between 1957 and 2020, while the Group B snippets are statements opposed to immigration between 1957 and 2020. I am a political scientist studying stances on immigration. My goal is to figure out what specific justifications people in favor of immigration use. "}, {"+": {"emphasizes the unfairness of the restrictions on immigrants": {"p-value": 8.587590540695888e-05, "V'": 0.0791796745801554}}, "-": {}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on the political party of the speaker. The Group A snippets are statements by Democrats on immigration between 1873 and 1934, while the Group B snippets are statements by Republicans on immigration between 1873 and 1934. I am a political scientist studying stances on immigration. My goal is to figure out the specific policy priorities of each political party. 
"}, {"+": {"calls for stricter enforcement of immigration laws": {"p-value": 0.0004032657958799176, "V'": 0.06550205492614322}, "calls for increase in restrictions on immigration": {"p-value": 7.376166935120002e-05, "V'": 0.0711119106724386}}, "-": {"mentions the need to provide support and assistance to refugees": {"p-value": 0.0004659081651114674, "V'": 0.03342853817701706}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on the political party of the speaker. The Group A snippets are statements by Democrats on immigration between 1935 and 1956, while the Group B snippets are statements by Republicans on immigration between 1935 and 1956. I am a political scientist studying stances on immigration. My goal is to figure out the specific policy priorities of each political party. "}, {"+": {"calls for the DREAM Act to be enacted": {"p-value": 0.0009680987469745688, "V'": 0.010166334420403265}, "calls for lenient policies on political refugees and asylum seekers": {"p-value": 3.0225894154029905e-10, "V'": 0.07895270684465525}, "emphasizes the need to provide educational opportunities to immigrant children": {"p-value": 7.477712816649748e-07, "V'": 0.04038318358050917}, "highlights the importance of providing support to immigrants to help them adjust to life in the United States": {"p-value": 1.0334377196084725e-08, "V'": 0.06490377000874772}, "mentions the importance of providing assistance to refugees": {"p-value": 2.608527778069904e-05, "V'": 0.05509087577527237}, "mentions the importance of recognizing the contributions of Hispanic Americans": {"p-value": 2.4639667883160397e-11, "V'": 0.08771682596093378}}, "-": {"expresses concern about the number of people entering the country illegally": {"p-value": 1.1849581163694984e-22, "V'": 0.1709819793442094}, "calls for stricter enforcement of immigration laws": {"p-value": 2.504152924602433e-19, "V'": 
0.12537087723129356}, "Mentions the need for enforcement of immigration laws": {"p-value": 2.8327343969765664e-19, "V'": 0.1837875169575678}, "highlights the importance of border security": {"p-value": 3.6956467537615213e-22, "V'": 0.17164965938698892}, "emphasizes the need to deport criminal aliens": {"p-value": 0.00010425243853688942, "V'": 0.03006974062352576}, "calls for stricter enforcement of existing immigration laws": {"p-value": 2.4006879413523408e-17, "V'": 0.11195406112882342}, "calls for harsher penalties for illegal immigrants": {"p-value": 0.0009246109967232621, "V'": 0.022836023748506448}, "calls for a restriction on the number of immigrants allowed in the country": {"p-value": 4.4152842998921037e-07, "V'": 0.05609717073348453}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on the political party of the speaker. The Group A snippets are statements by Democrats on immigration between 1957 and 2020, while the Group B snippets are statements by Republicans on immigration between 1957 and 2020. I am a political scientist studying stances on immigration. My goal is to figure out the specific policy priorities of each political party. "}, {"+": {"expresses support for refugees": {"p-value": 3.6246951965982448e-09, "V'": 0.10711788647685988}}, "-": {"expresses concerns about the impact of illegal immigration on citizens and legal immigrants": {"p-value": 0.0005212714451839103, "V'": 0.07456293261178337}, "emphasizes the need to enforce existing immigration laws": {"p-value": 1.420616195477614e-06, "V'": 0.10349609576591984}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on the geographic region of the speaker's home. 
The Group A snippets are statements by Northerners on immigration between 1957 and 2020, while the Group B snippets are statements by Westerners on immigration between 1957 and 2020. I am a political scientist studying stances on immigration. My goal is to figure out how people from different parts of the United States feel about immigrants. "}, {"+": {"expresses support for refugees and immigrants": {"p-value": 1.3397447451685759e-20, "V'": 0.1890265513889729}, "urges the US government to provide assistance to immigrants": {"p-value": 2.2609556453721714e-13, "V'": 0.12373300869373527}, "talks about the value of immigrants in the United States": {"p-value": 7.013185230840402e-06, "V'": 0.06921810458594557}}, "-": {"highlights the need to increase funding for border security": {"p-value": 1.1248538267896993e-08, "V'": 0.06565094483629769}, "presents the militarization of borders as a solution to immigration": {"p-value": 6.58129400809442e-08, "V'": 0.06929917286326567}, "emphasizes the importance of enforcing immigration laws": {"p-value": 1.5113300631454632e-13, "V'": 0.15174398047136495}, "calls for increased enforcement of immigration laws": {"p-value": 6.118691085322197e-14, "V'": 0.14707132208680912}, "emphasizes the importance of controlling the nation's borders": {"p-value": 9.724832930688687e-19, "V'": 0.1687064422783713}, "expresses a negative attitude towards immigration": {"p-value": 3.4468662787911134e-14, "V'": 0.1570082249607453}, "expresses concern over the influx of immigrants": {"p-value": 1.390525550565865e-07, "V'": 0.110706271744903}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on the geographic region of the speaker's home. The Group A snippets are statements by Northerners on immigration between 1957 and 2020, while the Group B snippets are statements by Southerners on immigration between 1957 and 2020. 
I am a political scientist studying stances on immigration. My goal is to figure out how people from different parts of the United States feel about immigrants. "}, {"+": {"highlights the positive contributions of immigrants to the U.S. economy": {"p-value": 0.0001242056495040007, "V'": 0.04274743362769167}, "emphasizes the importance of defending the rights of immigrants": {"p-value": 4.29134375392621e-11, "V'": 0.14117388044183382}, "acknowledges the tremendous contributions of Pacific/Asian-Americans": {"p-value": 2.051515332560409e-11, "V'": 0.05340074338344038}, "voices support for immigrants on the pathway to citizenship": {"p-value": 0.0001495755916162546, "V'": 0.05299267376204797}}, "-": {"emphasizes the importance of controlling the influx of immigrants": {"p-value": 0.00023735204466423558, "V'": 0.07903184046764311}, "emphasizes stopping the 'bleeding' of border security": {"p-value": 2.574724477034049e-05, "V'": 0.065670558794968}, "emphasizes the importance of enforcing immigration laws": {"p-value": 3.4732066427260156e-07, "V'": 0.10275320318166098}, "advocates for the restriction of illegal immigration": {"p-value": 3.548333359736943e-05, "V'": 0.08714011346508704}, "displays a pro-enforcement stance on immigration": {"p-value": 2.9283275173215186e-06, "V'": 0.09962589903948349}, "emphasizes the need for strong border protection": {"p-value": 7.211785889636698e-08, "V'": 0.09662608156620203}}, "research goal": "The dataset includes congressional and presidential speeches that mention immigration from 1880 to the present. The two classes are generated based on the geographic region of the speaker's home. The Group A snippets are statements by Westerners on immigration between 1957 and 2020, while the Group B snippets are statements by Southerners on immigration between 1957 and 2020. I am a political scientist studying stances on immigration. My goal is to figure out how people from different parts of the United States feel about immigrants. 
"}, {"+": {"is related to media or the arts": {"p-value": 9.766934140006838e-08, "V'": 0.11615592222126503}, "mentions a creative product or service": {"p-value": 0.0005597190557957834, "V'": 0.06474121525381449}, "involves a creative or artistic expression": {"p-value": 4.746118259807541e-10, "V'": 0.1378282578540962}}, "-": {}, "research goal": "The dataset includes names of startups on kickstarter.com. The two classes are generated based on whether the fundraiser succeeded or failed. The Group A snippets are names of startups that succeeded, while the Group B snippets are names of startups that failed. I am an angel investor. My goal is to figure out what kinds of companies succeed. "}, {"+": {}, "-": {}, "research goal": "The dataset includes names of startups on kickstarter.com. The two classes are generated based on whether the fundraiser succeeded or failed. The Group A snippets are names of startups that succeeded, while the Group B snippets are names of startups that failed. I am an angel investor. My goal is to figure out how successful fundraisers market themselves. "}, {"+": {}, "-": {}, "research goal": "The dataset includes funny sentences generated by making one-word edits to normal statements. The two classes are generated based on how funny annotators found the sentences. The Group A snippets are sentences edited to be somewhat funny, while the Group B snippets are sentences edited to be very funny. I am a rhetoric researcher studying the nature of humor. My goal is to figure out what types of humor people find funny. 
"}, {"+": {}, "-": {"have an absurd tone": {"p-value": 0.0008750450621186712, "V'": 0.10558583933880261}, "rely on irony": {"p-value": 0.00014220581270756039, "V'": 0.11913788563972916}, "have subtle humor that relies on references and context": {"p-value": 9.960652093392383e-05, "V'": 0.11721979786104675}, "have unexpected or outlandish content": {"p-value": 0.0004479781913302305, "V'": 0.1106452095251218}}, "research goal": "The dataset includes funny sentences generated by making one-word edits to normal statements. The two classes are generated based on how funny annotators found the sentences. The Group A snippets are sentences edited to be somewhat funny, while the Group B snippets are sentences edited to be very funny. I am a rhetoric researcher studying the nature of humor. My goal is to figure out which specific writing features make a joke not funny. "}, {"+": {"are factual in nature": {"p-value": 1.4470183782996742e-05, "V'": 0.09629992722954839}, "use formal language": {"p-value": 4.016641535042802e-06, "V'": 0.08966919632619971}, "lack a comedic tone and are direct statements": {"p-value": 4.686460368963919e-15, "V'": 0.13423076758345154}, "lack a humorous element or witty twist": {"p-value": 6.181505879307262e-16, "V'": 0.09781656032666}, "are serious and literal": {"p-value": 2.0653135323267444e-08, "V'": 0.11585093365548904}}, "-": {"rely on puns": {"p-value": 9.033286964141766e-22, "V'": 0.18982606189869958}, "include a play on words": {"p-value": 2.667096821235689e-24, "V'": 0.20617444499731807}, "make humorous use of wordplay": {"p-value": 3.9498580806276527e-32, "V'": 0.24440094262707135}, "utilize puns": {"p-value": 5.4480621675757976e-36, "V'": 0.27173223703324495}, "employ irony or a comparison of two unrelated concepts": {"p-value": 2.8452374269920143e-31, "V'": 0.2532381106826885}, "Contain irony or sarcasm": {"p-value": 2.033310026480963e-30, "V'": 0.23923211718692658}, "Incorporate puns or double entendres": {"p-value": 
3.760507266956442e-39, "V'": 0.2717747657234535}, "use a clever play on words": {"p-value": 6.42925720360323e-32, "V'": 0.24145110128367803}, "employ wordplay or puns": {"p-value": 1.9608724603453795e-36, "V'": 0.27553038510861594}, "incorporate elements of surprise": {"p-value": 3.8010143729444724e-11, "V'": 0.10133811802973573}, "incorporate an element of surprise": {"p-value": 3.8307632770449417e-16, "V'": 0.14354374799232494}, "have an element of surprise or absurdity": {"p-value": 2.807710236139835e-44, "V'": 0.29929620218520564}}, "research goal": "The dataset includes funny sentences generated by making one-word edits to normal statements. The two classes are generated based on how funny annotators found the sentences. The Group A snippets are edited sentences that are not funny, while the Group B snippets are edited sentences that are funny. I am a rhetoric researcher studying the nature of humor. My goal is to figure out what types of humor people find funny. "}, {"+": {"are factual in nature": {"p-value": 1.4981022842858604e-09, "V'": 0.12547579375044748}, "have a serious tone": {"p-value": 1.6528243355854415e-17, "V'": 0.18254701709818333}, "have a serious or straightforward tone": {"p-value": 1.6654798902818114e-22, "V'": 0.2122862602740868}, "focus on topics that are serious or mundane": {"p-value": 4.570837381862413e-23, "V'": 0.21500447308004006}, "have a straightforward meaning": {"p-value": 6.933424391672324e-11, "V'": 0.1321870119979013}}, "-": {"uses unexpected turns of phrase": {"p-value": 3.727912865738155e-08, "V'": 0.06563488622100025}, "include unexpected or outlandish scenarios": {"p-value": 4.861567974005615e-22, "V'": 0.17510616674803572}, "contain unexpected elements or juxtapositions": {"p-value": 1.3320311578540112e-09, "V'": 0.08449354915957313}, "make use of unexpected phrasing or syntax": {"p-value": 7.538107816921408e-06, "V'": 0.05083084892765081}, "emphasize the incongruous aspects of the edited phrase": {"p-value": 
3.8758640255794544e-17, "V'": 0.14944153820796424}, "include absurd, non-sequitur statements": {"p-value": 2.4828118536533644e-25, "V'": 0.20655142619065073}}, "research goal": "The dataset includes funny sentences generated by making one-word edits to normal statements. The two classes are generated based on how funny annotators found the sentences. The Group A snippets are edited sentences that are not funny, while the Group B snippets are edited sentences that are funny. I am a rhetoric researcher studying the nature of humor. My goal is to figure out which specific writing features make a joke not funny. "}, {"+": {}, "-": {"mentions the need for retail industry and service experience": {"p-value": 6.915823173283567e-05, "V'": 0.12974336581262275}, "mentions the need to have knowledge of Florida civil rules and procedures": {"p-value": 7.48412068447151e-09, "V'": 0.046729084067287326}}, "research goal": "The dataset includes American job postings on monster.com. The two classes are generated based on where the job was posted. The Group A snippets are job postings in Atlanta, GA, while the Group B snippets are job postings in Tampa, FL. I am a recent graduate trying to figure out which city to look for jobs in. My goal is to figure out the requirements of jobs in different cities. "}, {"+": {"mentions the need to have a Bachelor's degree in a related field": {"p-value": 2.79446185126151e-06, "V'": 0.10551596844806707}, "mentions a Bachelor's degree in a related field": {"p-value": 2.7270355504161946e-06, "V'": 0.10391692105034323}, "mentions the need for a Bachelor\u2019s degree": {"p-value": 2.6774789104955254e-06, "V'": 0.10311429917443585}}, "-": {}, "research goal": "The dataset includes American job postings on monster.com. The two classes are generated based on where the job was posted. The Group A snippets are job postings in California, while the Group B snippets are job postings in Texas. 
I am a recent graduate trying to figure out which city to look for jobs in. My goal is to figure out the requirements of jobs in different cities. "}, {"+": {}, "-": {}, "research goal": "The dataset includes American job postings on monster.com. The two classes are generated based on where the job was posted. The Group A snippets are job postings in New York City, while the Group B snippets are job postings in San Francisco. I am a recent graduate trying to figure out which city to look for jobs in. My goal is to figure out the requirements of jobs in different cities. "}, {"+": {}, "-": {}, "research goal": "The dataset includes movie plot summaries from TMDB. The two classes are generated based on how popular the movie was. The Group A snippets describe unpopular movies, while the Group B snippets describe average movies. I am a movie director planning a new film. My goal is to figure out which genres movie-goers seem to like. "}, {"+": {}, "-": {"features a larger-than-life antagonist": {"p-value": 0.0001866741389373259, "V'": 0.10403232783137117}}, "research goal": "The dataset includes movie plot summaries from TMDB. The two classes are generated based on how popular the movie was. The Group A snippets describe unpopular movies, while the Group B snippets describe average movies. I am a movie director planning a new film. My goal is to figure out what specific plot devices are more popular to movie-goers. "}, {"+": {"features a hero with a classic 'good vs. 
evil' story": {"p-value": 0.00015656687618365004, "V'": 0.11545352226165928}, "features a strong and determined female lead": {"p-value": 0.0008001767034179574, "V'": 0.06050882775380805}, "contains an element of suspense or mystery": {"p-value": 4.1216257761918626e-08, "V'": 0.17332316584888513}, "involves a battle between good and evil": {"p-value": 2.2155496909173846e-05, "V'": 0.12846362423404717}, "features a heroic protagonist": {"p-value": 2.8059883693308375e-06, "V'": 0.13250528638905446}}, "-": {}, "research goal": "The dataset includes movie plot summaries from TMDB. The two classes are generated based on how popular the movie was. The Group A snippets describe very popular movies, while the Group B snippets describe average movies. I am a movie director planning a new film. My goal is to figure out which genres movie-goers seem to like. "}, {"+": {"features a protagonist with special abilities or powers": {"p-value": 2.1164704230690845e-05, "V'": 0.08972201308201115}, "features a protagonist with special abilities": {"p-value": 6.000841194752461e-06, "V'": 0.10201132419829884}}, "-": {}, "research goal": "The dataset includes movie plot summaries from TMDB. The two classes are generated based on how popular the movie was. The Group A snippets describe very popular movies, while the Group B snippets describe average movies. I am a movie director planning a new film. My goal is to figure out what specific plot devices are more popular to movie-goers. "}, {"+": {}, "-": {}, "research goal": "The dataset includes news headlines posted on social media platforms. The two classes are generated based on popularity, which is measured by how many times the story was shared. The Group A snippets are popular articles about the economy, while the Group B snippets are unpopular articles about the economy. I am a journalist. My goal is to figure out what specific topics cause people to read or share an article, so I know what to write about. 
"}, {"+": {}, "-": {}, "research goal": "The dataset includes news headlines posted on social media platforms. The two classes are generated based on popularity, which is measured by how many times the story was shared. The Group A snippets are popular articles about Microsoft, while the Group B snippets are unpopular articles about Microsoft. I am a journalist. My goal is to figure out what specific topics cause people to read or share an article, so I know what to write about. "}, {"+": {}, "-": {}, "research goal": "The dataset includes news headlines posted on social media platforms. The two classes are generated based on popularity, which is measured by how many times the story was shared. The Group A snippets are popular articles about Obama, while the Group B snippets are unpopular articles about Obama. I am a journalist. My goal is to figure out what specific topics cause people to read or share an article, so I know what to write about. "}, {"+": {"mentions Obama's nomination of Merrick Garland to the Supreme Court": {"p-value": 5.428223657735745e-07, "V'": 0.27822929109503836}, "mentions Obama's support of the Supreme Court": {"p-value": 5.930071627398196e-10, "V'": 0.3595111728079844}, "highlight Obama's successes": {"p-value": 8.46600464139783e-10, "V'": 0.41528715609108735}, "mentions Obama's Supreme Court nomination": {"p-value": 5.131774012560847e-11, "V'": 0.42812979651684036}, "refers to Obama's Supreme Court nominees": {"p-value": 2.4279557666698415e-11, "V'": 0.42855191404845566}, "highlights Obama's success with the Supreme Court": {"p-value": 0.00010766940079380572, "V'": 0.15817799127951004}}, "-": {"highlights Obama's lack of action against terrorism": {"p-value": 0.00014432532059908745, "V'": 0.1940324349305947}, "mentions Obama's plans to stop ISIS": {"p-value": 0.0008091481415923077, "V'": 0.12048205500271811}, "refers to Obama's failure to address public concerns": {"p-value": 5.822297839194733e-09, "V'": 0.4022137388822289}, "refers to 
Obama's efforts to combat terrorism": {"p-value": 2.796725150907531e-07, "V'": 0.32824026807108375}, "mention Obama's failures in foreign policy": {"p-value": 1.172072547735238e-08, "V'": 0.32614048793235295}}, "research goal": "The dataset includes news headlines posted on social media platforms. The two classes are generated based on the positivity or negativity, measured by whether it received more likes or dislikes. The Group A snippets are positive articles about Obama, while the Group B snippets are negative articles about Obama. I am a journalist. My goal is to figure out what specific topics are perceived positively, so I know what to write about. "}, {"+": {}, "-": {}, "research goal": "The dataset includes training examples from various natural language inference (NLI) datasets. The two classes are generated based on which dataset the example is from. The Group A snippets are ANLI dataset Round 1 premises, while the Group B snippets are  ANLI dataset Round 2 premises. I am a natural language processing researcher measuring language models on different tasks. My goal is to figure out any differences between these datasets which might influence performance. 
"}, {"+": {"has a formal tone.": {"p-value": 4.044124246655152e-34, "V'": 0.130589830393952}, "contains formal language, such as 'I urge your consideration' or 'unrestricted gift'": {"p-value": 1.1723780989167785e-22, "V'": 0.0694980822755713}, "contains complex sentence structure.": {"p-value": 0.0002763684580358296, "V'": 0.04084177088342267}, "uses formal language and terminology": {"p-value": 1.3218395558721443e-17, "V'": 0.10489832195561388}, "uses long sentences.": {"p-value": 1.9438184659040348e-08, "V'": 0.07077580546457152}, "uses complex and long sentences.": {"p-value": 1.5004331915155524e-06, "V'": 0.059492147375623805}, "uses a lot of technical language, such as phrases related to aviation": {"p-value": 3.736848972331205e-51, "V'": 0.08512283978366189}, "contains long and complex sentences.": {"p-value": 0.0009114360179624436, "V'": 0.03956479877416075}}, "-": {"mentions specific places, such as Safed or Lake Kinneret.": {"p-value": 5.0611012851441706e-11, "V'": 0.049086885406980044}, "Uses legal terms such as 'reasonable doubt'.": {"p-value": 5.121778962179283e-05, "V'": 0.006999167533717392}}, "research goal": "The dataset includes training examples from various natural language inference (NLI) datasets. The two classes are generated based on which dataset the example is from. The Group A snippets are MNLI dataset hypotheses with mismatched annotations, while the Group B snippets are MNLI dataset hypotheses with matched annotations. I am a natural language processing researcher measuring language models on different tasks. My goal is to figure out any differences between these datasets which might influence performance. 
"}, {"+": {"uses formal language and terminology.": {"p-value": 6.028458922089921e-29, "V'": 0.08586478650878959}, "has long sentences.": {"p-value": 4.517840719608884e-08, "V'": 0.026636593334426806}, "contains historical information.": {"p-value": 7.72136227075561e-117, "V'": 0.41423320263179514}, "uses precise language.": {"p-value": 1.5182297750775242e-10, "V'": 0.02948750992699467}, "contain complex language.": {"p-value": 2.908788075592453e-90, "V'": 0.3513251447282407}, "uses technical or specific language, such as scientific terms or legal jargon.": {"p-value": 9.792987923552029e-06, "V'": 0.08269719715713347}, "mentions historical events and figures.": {"p-value": 1.5792463948182648e-28, "V'": 0.21267709528336293}, "references artistic works or performances.": {"p-value": 1.2682912977572761e-89, "V'": 0.36736121069065786}, "contains a lot of technical language.": {"p-value": 6.4834586576572765e-06, "V'": 0.07095665742952637}, "uses complex language structures, such as multiple clauses.": {"p-value": 3.771670362993984e-19, "V'": 0.05619936568411965}, "uses complicated language with long words.": {"p-value": 1.428083265581468e-81, "V'": 0.3302164216340364}, "uses technical language, such as scientific and business terms.": {"p-value": 1.5392048561123706e-06, "V'": 0.07664214797290464}, "contains technical language.": {"p-value": 1.6106784302070645e-11, "V'": 0.1254835450822575}, "contains technical terms or jargon.": {"p-value": 1.481465905737355e-05, "V'": 0.07293028190916537}, "uses complex language structures.": {"p-value": 2.700987861285585e-125, "V'": 0.35345143266514145}}, "-": {"contains technical language, such as medical and legal terms.": {"p-value": 1.064999232131297e-10, "V'": 0.0906837333445124}, "contains detailed descriptions of events.": {"p-value": 6.791877673669755e-19, "V'": 0.11537236131642833}, "uses formal or legal language.": {"p-value": 0.0002268737707200299, "V'": 0.06476555505973494}, "uses abstract and figurative language.": 
{"p-value": 0.0008088002927936287, "V'": 0.023467023964478202}, "includes dialogue.": {"p-value": 6.162399613703527e-20, "V'": 0.06466273512895152}, "mention topics that are likely to be debated, such as equality and fairness": {"p-value": 9.00992860052142e-25, "V'": 0.11382909004230689}}, "research goal": "The dataset includes training examples from various natural language inference (NLI) datasets. The two classes are generated based on which dataset the example is from. The Group A snippets are ANLI dataset Round 2 premises, while the Group B snippets are  ANLI dataset Round 3 premises. I am a natural language processing researcher measuring language models on different tasks. My goal is to figure out any differences between these datasets which might influence performance. "}, {"+": {}, "-": {"uses long sentences.": {"p-value": 7.193895012180105e-185, "V'": 0.4393161125148757}, "contains complex sentences with multiple clauses.": {"p-value": 5.748012726966039e-248, "V'": 0.5304754326079757}, "contains long sentences.": {"p-value": 1.9476272824518467e-185, "V'": 0.4400479851641317}, "uses complex sentence structure with multiple clauses.": {"p-value": 7.992326854960748e-217, "V'": 0.4865572290897011}, "uses complex language.": {"p-value": 2.0134913683506818e-282, "V'": 0.6587391002842286}, "are long": {"p-value": 0.0, "V'": 0.7047054616368907}, "uses a lot of proper names and technical terms.": {"p-value": 0.0, "V'": 0.6968524120022748}, "have details, long sentences, and complex information": {"p-value": 0.0, "V'": 0.6640713324396246}, "mentions specific people, places, or events.": {"p-value": 1.5756434363500676e-239, "V'": 0.5944565540513433}, "involves complex grammar structures.": {"p-value": 8.361251097776929e-271, "V'": 0.6368394390390416}, "mentions a specific year.": {"p-value": 7.720233214615587e-215, "V'": 0.5731018686040985}, "refers to historical events or figures.": {"p-value": 2.5417193798897427e-119, "V'": 0.4226276090316059}}, "research goal": 
"The dataset includes training examples from various natural language inference (NLI) datasets. The two classes are generated based on which dataset the example is from. The Group A snippets are WANLI dataset premises, while the Group B snippets are ANLI dataset premises. I am a natural language processing researcher measuring language models on different tasks. My goal is to figure out any differences between these datasets which might influence performance. "}, {"+": {"mentions a specific year or time period.": {"p-value": 6.342807293433932e-05, "V'": 0.08200989867424285}, "mentions a specific year or date.": {"p-value": 0.00011191804463523337, "V'": 0.07299966981732403}, "mentions a specific month.": {"p-value": 0.000609303041243928, "V'": 0.045178960138364505}, "features facts and figures that can be verified easily.": {"p-value": 1.3681901712734592e-05, "V'": 0.09195336355474415}, "mentions specific dates, such as the year 2000.": {"p-value": 0.00035088686754923093, "V'": 0.06400001680551043}}, "-": {}, "research goal": "The dataset includes training examples from various natural language inference (NLI) datasets. The two classes are generated based on which dataset the example is from. The Group A snippets are ANLI dataset Round 1 hypotheses, while the Group B snippets are  ANLI dataset Round 2 hypotheses. I am a natural language processing researcher measuring language models on different tasks. My goal is to figure out any differences between these datasets which might influence performance. 
"}, {"+": {"includes long sentences.": {"p-value": 4.03168896020533e-34, "V'": 0.26182130789267377}, "uses complex vocabulary and structures.": {"p-value": 1.8959765819485096e-17, "V'": 0.09927907667754189}, "uses complex language.": {"p-value": 3.991138933813779e-19, "V'": 0.10302118589369262}, "Uses precise vocabulary, such as technical terms and jargon.": {"p-value": 1.0158225543003853e-22, "V'": 0.10798794672448518}, "mentions public services, such as fire departments.": {"p-value": 3.338392873135745e-05, "V'": 0.01693241728878603}, "expresses strong emotions, like joy or sorrow.": {"p-value": 2.0773359620749968e-10, "V'": 0.05778353207537491}, "contains long sentences.": {"p-value": 3.1203824611936664e-23, "V'": 0.21649991882438208}, "uses complex language structures.": {"p-value": 7.34616806062888e-07, "V'": 0.05881402966298794}, "contain factual statements.": {"p-value": 6.204605407463912e-26, "V'": 0.22719908574830305}}, "-": {"contains explicit references to people.": {"p-value": 1.6883041153e-314, "V'": 0.6881669113307383}, "involves characters from different professions, such as lawyers and professors.": {"p-value": 8.323001926865958e-234, "V'": 0.5914762270182821}, "Contains complex structures, such as the use of multiple conjunctions.": {"p-value": 9.78764567241587e-14, "V'": 0.14075039478896814}, "Includes complex sentence structures.": {"p-value": 5.187377505920462e-17, "V'": 0.1404762201072487}, "involves people in a professional context, such as doctors, lawyers, and bankers.": {"p-value": 1.2980009769417712e-268, "V'": 0.6499145757617937}, "mentions people or characters in a single sentence.": {"p-value": 0.0, "V'": 0.7338546479314665}, "uses legal terms and technical jargon.": {"p-value": 2.0873896489341683e-06, "V'": 0.0671423486954977}, "refers to literature, such as famous authors and artworks.": {"p-value": 3.235900420200114e-26, "V'": 0.14236716483444894}, "mentions entities (people, places, things).": {"p-value": 3.3847277570712217e-118, 
"V'": 0.37866339730349985}}, "research goal": "The dataset includes training examples from various natural language inference (NLI) datasets. The two classes are generated based on which dataset the example is from. The Group A snippets are WANLI dataset premises, while the Group B snippets are HANS dataset premises. I am a natural language processing researcher measuring language models on different tasks. My goal is to figure out any differences between these datasets which might influence performance. "}, {"+": {"require thinking about abstract concepts.": {"p-value": 7.432772277499328e-07, "V'": 0.05390773256982971}, "uses a formal language.": {"p-value": 4.925089439630876e-46, "V'": 0.2066564821587431}}, "-": {"contain complex language structures.": {"p-value": 3.942075313773655e-33, "V'": 0.17165908420204423}, "are long": {"p-value": 8.696707528724066e-16, "V'": 0.10421750316654335}, "uses a lot of complex words.": {"p-value": 5.661833982781931e-24, "V'": 0.1352773430562469}, "require domain knowledge to interpret": {"p-value": 2.9490455775944802e-09, "V'": 0.07351736392375874}, "focuses on factual information such as history and geography.": {"p-value": 0.0002966666534580401, "V'": 0.054033028202347205}, "uses complex language, with long sentences and clauses.": {"p-value": 3.3513088149731e-12, "V'": 0.12580824715576633}, "features descriptions of everyday life and activities.": {"p-value": 0.0008495879422779753, "V'": 0.026208593561209612}, "uses informal language.": {"p-value": 3.0483005883260274e-35, "V'": 0.17208704244337492}, "contain informal language, such as colloquialisms.": {"p-value": 4.429082646074853e-09, "V'": 0.04399607620470625}, "uses complex language and includes figurative language.": {"p-value": 9.59675248244594e-07, "V'": 0.05147699614337638}, "contains complex and long sentences.": {"p-value": 5.545991021197818e-05, "V'": 0.07114122345337359}, "uses complex language, with long words and sentences.": {"p-value": 2.456141255314013e-05, 
"V'": 0.07523070502996465}}, "research goal": "The dataset includes training examples from various natural language inference (NLI) datasets. The two classes are generated based on which dataset the example is from. The Group A snippets are WANLI dataset premises, while the Group B snippets are MNLI dataset premises. I am a natural language processing researcher measuring language models on different tasks. My goal is to figure out any differences between these datasets which might influence performance. "}, {"+": {"contains questions which suggest a lack of knowledge or understanding.": {"p-value": 3.0758604317345676e-05, "V'": 0.03300004732041881}, "talks about military operations.": {"p-value": 2.3663507785428197e-05, "V'": 0.03994043751628562}, "involves military action or conflict.": {"p-value": 0.00028022906344563514, "V'": 0.03399998503688176}, "have complex sentence structures.": {"p-value": 0.00027641115424056764, "V'": 0.08057872376432751}}, "-": {"uses legal or governmental terms.": {"p-value": 8.38221835296916e-08, "V'": 0.06404605255812293}}, "research goal": "The dataset includes training examples from various natural language inference (NLI) datasets. The two classes are generated based on which dataset the example is from. The Group A snippets are MNLI dataset hypotheses with mismatched annotations, while the Group B snippets are MNLI dataset hypotheses with matched annotations. I am a natural language processing researcher measuring language models on different tasks. My goal is to figure out any differences between these datasets which might influence performance. 
"}, {"+": {"contains questions.": {"p-value": 9.941714833043507e-07, "V'": 0.038495495050558375}}, "-": {"uses long and complex sentences.": {"p-value": 4.354888267123844e-108, "V'": 0.2874598911982491}, "contains technical language.": {"p-value": 1.5926184465302613e-60, "V'": 0.2952395928967009}, "uses technical language and scientific terms.": {"p-value": 2.6770931565745202e-42, "V'": 0.19857535638322582}, "mentions historical events and figures.": {"p-value": 1.1740563638264228e-88, "V'": 0.3301152423110408}, "contain long sentences.": {"p-value": 5.793933156804847e-132, "V'": 0.38990035609947427}, "contains specialized vocabulary, such as scientific, historical, and political terms.": {"p-value": 6.757138695282445e-132, "V'": 0.46275836628761463}, "involves complex sentence structures.": {"p-value": 1.2063209401210906e-99, "V'": 0.2976289331548361}, "talks about historical events or persons.": {"p-value": 1.1045601919418617e-121, "V'": 0.43696545067577186}, "contain complex sentence structures and long sentences": {"p-value": 3.408900641054719e-84, "V'": 0.2420350878253924}, "contain complex sentence structures.": {"p-value": 1.3545761321736884e-79, "V'": 0.24265183033167814}, "have a formal tone.": {"p-value": 5.363266222485977e-23, "V'": 0.08821439203514125}, "uses long sentences.": {"p-value": 6.509963313956505e-125, "V'": 0.37314436211416246}, "has formal language.": {"p-value": 1.061027308283062e-22, "V'": 0.0828053697093426}, "involve abstract concepts, such as God and faith.": {"p-value": 1.1815150232959204e-09, "V'": 0.052428633567359204}, "refers to a historical event.": {"p-value": 6.751202628439374e-63, "V'": 0.28190511293394965}, "mentions historical events or figures.": {"p-value": 1.623639956270112e-61, "V'": 0.26522221274259106}, "mentions historical events, such as the moon landing.": {"p-value": 6.413564488061069e-05, "V'": 0.03548328907251121}}, "research goal": "The dataset includes training examples from various natural language inference 
(NLI) datasets. The two classes are generated based on which dataset the example is from. The Group A snippets are WANLI dataset premises, while the Group B snippets are QNLI dataset premises. I am a natural language processing researcher measuring language models on different tasks. My goal is to figure out any differences between these datasets which might influence performance. "}, {"+": {"emphasizes the importance of nuclear disarmament": {"p-value": 7.760461085318624e-23, "V'": 0.21419458874028258}, "emphasizes the need for strengthening and universalizing the IAEA safeguards system": {"p-value": 0.0006387730468797406, "V'": 0.03893593931585185}, "emphasizes the need for states to comply with their NPT obligations": {"p-value": 2.1343272732734187e-12, "V'": 0.13923620930297967}, "urges for the strengthening of export controls to prevent the misuse of nuclear material, equipment and technology": {"p-value": 0.00022579419420106156, "V'": 0.04684572294251829}, "emphasizes the need for states to remain in compliance with their Article II obligations": {"p-value": 2.1388423036325983e-05, "V'": 0.061747570245382716}}, "-": {}, "research goal": "The dataset includes Non-Proliferation of Nuclear Weapons (NPT) conference transcripts. The two classes are generated based on which year the report was published. The Group A snippets are NPT conference reports before 2008, while the Group B snippets are NPT conference reports between 2008 and 2012. I am a political scientist studying the history of the NPT. My goal is to figure out what specific topics dominated NPT discussions across time. 
"}, {"+": {"highlights the need for strict compliance and universalization of the NPT": {"p-value": 1.85090284183889e-21, "V'": 0.11742107442435297}, "emphasizes the need for transparency and reporting of nuclear arsenals": {"p-value": 5.799658688060241e-05, "V'": 0.04660300120013689}, "emphasizes the need for nuclear-weapon-free zones to prevent nuclear proliferation": {"p-value": 0.0005400102223323628, "V'": 0.03839351976813554}, "calls for a further strengthening of the NPT by tackling the supply of nuclear technology to recipient states": {"p-value": 3.151074700290964e-05, "V'": 0.018890562743813517}, "emphasizes the importance of the Non-Proliferation and Disarmament Initiative": {"p-value": 1.1181074368760698e-24, "V'": 0.19574065938554994}, "emphasizes the need for safeguards": {"p-value": 6.69992804606588e-26, "V'": 0.22044145176901725}, "focuses on the need for the international community to display a shared responsibility in strengthening the nuclear non-proliferation regime": {"p-value": 5.3587845362138745e-39, "V'": 0.2390521537912004}, "encourages incremental steps in dealing with non-strategic nuclear weapons": {"p-value": 1.686815726205695e-33, "V'": 0.21696730889837387}, "emphasizes the need for a legally binding international instrument on the non-use of nuclear weapons against non-nuclear weapon states": {"p-value": 0.00019949774213665931, "V'": 0.04293438459402855}, "highlights the importance of the 1995 and 2000 decisions regarding disarmament and the enunciation of concrete measures to implement them": {"p-value": 2.6058574367434623e-08, "V'": 0.03761818263157006}, "highlights the need for increased transparency and confidence building measures among Member States": {"p-value": 1.7093729969398134e-48, "V'": 0.21431523507904737}}, "-": {}, "research goal": "The dataset includes Non-Proliferation of Nuclear Weapons (NPT) conference transcripts. The two classes are generated based on which year the report was published. 
The Group A snippets are NPT conference reports between 2008 and 2012, while the Group B snippets are NPT conference reports after 2012. I am a political scientist studying the history of the NPT. My goal is to figure out what specific topics dominated NPT discussions across time. "}, {"+": {"uses language that is evasive": {"p-value": 0.0005088705703306209, "V'": 0.020526846974704416}, "uses language that is exaggerated": {"p-value": 1.9778572142475085e-12, "V'": 0.05171350160296883}, "uses exaggerated language": {"p-value": 3.8089204343826126e-11, "V'": 0.047845568787500126}}, "-": {"uses language that is affirmative": {"p-value": 1.871808461489439e-06, "V'": 0.04307299130216924}, "uses truthful language": {"p-value": 1.1228580612819944e-57, "V'": 0.2685427938374122}, "uses factual language": {"p-value": 1.4236863185045458e-34, "V'": 0.2570338773753524}, "uses neutral language": {"p-value": 9.148252429408118e-22, "V'": 0.20379318390473533}}, "research goal": "The dataset includes arbitrary lies and truths from any domain generated by crowdworkers. The two classes are generated based on whether subjects were told to lie. The Group A snippets are random truth statements, while the Group B snippets are random false statements. I am a rhetoric researcher studying the effect of lying on speaking style. My goal is to figure out the specific speaking styles of liars. "}, {"+": {}, "-": {}, "research goal": "The dataset includes submissions to ICLR, a machine learning conference from 2018 to 2021. The two classes are generated based on the rating from paper reviewers. The Group A snippets are good journal submissions, while the Group B snippets are bad journal submissions. I am a young researcher who hopes to get published. My goal is to figure out the specific writing styles of good abstracts, so I know how to write my abstract. "}, {"+": {}, "-": {}, "research goal": "The dataset includes submissions to ICLR, a machine learning conference from 2018 to 2021. 
The two classes are generated based on the rating from paper reviewers. The Group A snippets are good journal submissions, while the Group B snippets are bad journal submissions. I am a young researcher picking a topic. My goal is to figure out the research areas of successful papers, so I know what to study. "}, {"+": {}, "-": {}, "research goal": "The dataset includes submissions to ICLR, a machine learning conference from 2018 to 2021. The two classes are generated based on the rating from paper reviewers. The Group A snippets are good journal submissions, while the Group B snippets are very good journal submissions. I am a young researcher who hopes to get published. My goal is to figure out the specific writing styles of good abstracts, so I know how to write my abstract. "}, {"+": {}, "-": {}, "research goal": "The dataset includes submissions to ICLR, a machine learning conference from 2018 to 2021. The two classes are generated based on the rating from paper reviewers. The Group A snippets are good journal submissions, while the Group B snippets are very good journal submissions. I am a young researcher picking a topic. My goal is to figure out the research areas of successful papers, so I know what to study. 
"}, {"+": {"Mentions the need for immigration reform for Asian Americans": {"p-value": 2.433533178763872e-10, "V'": 0.04761864987333744}, "mentions the consequences of racism and discrimination": {"p-value": 8.032067363792093e-05, "V'": 0.21765278552882933}, "mentions the difficulty of immigration for Asian Americans": {"p-value": 3.662056190743488e-08, "V'": 0.04640276094863811}, "mentions the importance of intersectionality": {"p-value": 7.019509985702417e-16, "V'": 0.3284162826731375}, "mentions the lack of access to healing services": {"p-value": 3.794503087651079e-11, "V'": 0.13408061279914005}, "mentions the lack of resources for survivors of sexual assault": {"p-value": 0.0002641357393406907, "V'": 0.10123411500471297}}, "-": {}, "research goal": "The dataset includes oral histories from the United States. The two classes are generated based on the race of the narrator. The Group A snippets are oral histories of Asian people, while the Group B snippets are oral histories of white people. I am a sociologist studying how race affects living conditions. My goal is to figure out the specific struggles of people of different races. 
"}, {"+": {"describes the challenges of navigating interracial relationships": {"p-value": 6.308962642138863e-19, "V'": 0.11525994527653982}, "mentions the intersectionality between race, gender, class, and sexuality": {"p-value": 2.1605874729792496e-05, "V'": 0.07846036415072122}, "mentions slavery and its effects on the narrator": {"p-value": 1.6166470030918354e-12, "V'": 0.0715630333772735}, "mentions experiences of racism": {"p-value": 3.544808129183042e-18, "V'": 0.13010218639762453}, "mentions the Civil Rights Movement": {"p-value": 3.656549574388803e-11, "V'": 0.08570666643181057}, "references the experience of racism": {"p-value": 6.390693090428133e-29, "V'": 0.1902307867112867}, "mentions the need to defend young black men": {"p-value": 9.70148351205975e-15, "V'": 0.0842517133477036}}, "-": {"mentions the role of women in the Women's Liberation Movement": {"p-value": 0.0009159491211547926, "V'": 0.04898499096747218}, "mentions the power structures of the workplace": {"p-value": 4.249641690416137e-08, "V'": 0.09605928347414178}, "Mentions the struggle for equal rights for women": {"p-value": 5.707369645826384e-09, "V'": 0.13592657500061078}, "mentions the role of feminism in the liberation movement": {"p-value": 1.1150071696922633e-06, "V'": 0.07857634840440599}, "mentions the importance of education and empowerment for women": {"p-value": 0.00012642009875529103, "V'": 0.08058544527868963}}, "research goal": "The dataset includes oral histories from the United States. The two classes are generated based on the race of the narrator. The Group A snippets are oral histories of black people, while the Group B snippets are oral histories of white people. I am a sociologist studying how race affects living conditions. My goal is to figure out the specific struggles of people of different races. 
"}, {"+": {"mentions experiences with education institutions": {"p-value": 3.701779504846313e-07, "V'": 0.11535272820748457}, "mentions experiences of education": {"p-value": 7.638260010014304e-07, "V'": 0.10944096485334387}}, "-": {}, "research goal": "The dataset includes oral histories from the United States. The two classes are generated based on the education level of the narrator. The Group A snippets are oral histories of people with college degrees, while the Group B snippets are oral histories of people without college degrees. I am a sociologist studying how education affects living conditions. My goal is to figure out the specific experiences and struggles of people of different education levels. "}, {"+": {"mentions a form of discrimination based on race": {"p-value": 0.00024231129177711242, "V'": 0.10160258115673779}, "mentions a family member having to take on a job to support the family": {"p-value": 0.000519153532351444, "V'": 0.017390381415496994}, "mentions the effects of World War II on the economy and social life": {"p-value": 4.237786121279633e-05, "V'": 0.02572485586950029}, "mentions traditional gender roles": {"p-value": 1.4945288556375382e-05, "V'": 0.13284624815413715}, "refers to the use of physical labor from African Americans": {"p-value": 0.0008798980467211857, "V'": 0.036858941057537495}}, "-": {"discusses the women's movement of the 1960s and 1970s": {"p-value": 1.1842071456888553e-06, "V'": 0.12755056509248558}}, "research goal": "The dataset includes oral histories from the United States. The two classes are generated based on when the narrator lived. The Group A snippets are oral histories of people born before 1930, while the Group B snippets are oral histories of people born between 1930 and 1950. I am a historian writing about everyday people in different time periods. My goal is to figure out the specific experiences and struggles of people across time. 
"}, {"+": {"mentions gender roles and their influence on the workplace": {"p-value": 5.42952232295471e-05, "V'": 0.0839436375974281}, "mentions the difficulty of being a woman in a male-dominated field": {"p-value": 0.00011302845002234334, "V'": 0.07136483415044512}}, "-": {}, "research goal": "The dataset includes oral histories from the United States. The two classes are generated based on when the narrator lived. The Group A snippets are oral histories of people born between 1930 and 1950, while the Group B snippets are oral histories of people born after 1950. I am a historian writing about everyday people in different time periods. My goal is to figure out the specific experiences and struggles of people across time. "}, {"+": {"refers to the feminist movement": {"p-value": 1.2046647043897086e-10, "V'": 0.335522605403843}, "mentions work in the city": {"p-value": 0.00016397287405361393, "V'": 0.10202182254047157}, "describes the struggles of women's rights and justice": {"p-value": 4.309192986451865e-08, "V'": 0.325586932852699}, "Mentions the struggle for equal rights and freedoms for women": {"p-value": 1.1157549569220556e-13, "V'": 0.4084332887781984}, "mentions the role of feminism in their lives": {"p-value": 1.56192502070967e-09, "V'": 0.25255120837244627}, "refers to civil rights, racial injustice, and/or discrimination": {"p-value": 5.232035705604151e-09, "V'": 0.3433383590140627}, "mentions the role of race and racism in society": {"p-value": 1.0241716120136248e-07, "V'": 0.27023211746890546}, "Mentions the history of violence against women, including rape and sexual assault": {"p-value": 6.967241707451802e-07, "V'": 0.2968143226004877}}, "-": {}, "research goal": "The dataset includes oral histories from the United States. The two classes are generated based on where the narrator lived. The Group A snippets are oral histories of people from the South, while the Group B snippets are oral histories of people not from the South. 
I am a historian studying life in different regions of the United States. My goal is to figure out the specific experiences and struggles of people in different regions of the United States. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts from individual parents in parenting Subreddits on the site Reddit. The two classes are generated based on what year the post was made. The Group A snippets are Reddit posts from 2017, while the Group B snippets are Reddit posts from 2016. I am a sociologist studying parenting. My goal is to figure out how parents' concerns and needs have changed over the years. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts from individual parents in parenting Subreddits on the site Reddit. The two classes are generated based on what year the post was made. The Group A snippets are Reddit posts from 2018, while the Group B snippets are Reddit posts from 2017. I am a sociologist studying parenting. My goal is to figure out how parents' concerns and needs have changed over the years. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts from individual parents in parenting Subreddits on the site Reddit. The two classes are generated based on what year the post was made. The Group A snippets are Reddit posts from 2019, while the Group B snippets are Reddit posts from 2018. I am a sociologist studying parenting. My goal is to figure out how parents' concerns and needs have changed over the years. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts from individual parents in parenting Subreddits on the site Reddit. The two classes are generated based on what year the post was made. The Group A snippets are Reddit posts from 2020, while the Group B snippets are Reddit posts from 2019. I am a sociologist studying parenting. My goal is to figure out how parents' concerns and needs have changed over the years. 
"}, {"+": {"expresses concern for the physical and emotional comfort of their baby": {"p-value": 2.6523212435477324e-33, "V'": 0.2358574852268247}, "expresses worries about breastfeeding and milk supply": {"p-value": 5.105674944464394e-24, "V'": 0.13109988238245043}, "discusses the physical strain of pregnancy": {"p-value": 2.881603689570481e-18, "V'": 0.13878167309044365}, "Mentions worries about how to manage a new baby": {"p-value": 4.099321538139353e-23, "V'": 0.19804714664495499}, "mentions worries about breast feeding": {"p-value": 4.2937234924803655e-20, "V'": 0.11752161204173327}, "Expresses concerns about breastfeeding": {"p-value": 4.0154365949291925e-20, "V'": 0.11512886320756849}, "Discusses difficulty balancing work and family life": {"p-value": 4.644793820400641e-22, "V'": 0.17211715897165092}, "expresses frustrations about spouse not helping out around the house": {"p-value": 1.8418733371993023e-08, "V'": 0.05057200462723011}, "discusses the challenges of parenting an only child": {"p-value": 9.0868214936119e-09, "V'": 0.04989580326225391}, "expresses concerns about the privacy of his family": {"p-value": 2.851119778312013e-26, "V'": 0.18413651935305872}, "Discusses the challenges of managing work and parenting responsibilities": {"p-value": 5.335625532385075e-74, "V'": 0.343240406341471}, "mentions difficulties with disciplining children": {"p-value": 1.1539291996569319e-17, "V'": 0.12822760645838302}, "discusses the need to introduce solid foods to a baby": {"p-value": 3.1894459180456784e-05, "V'": 0.020587027394332404}}, "-": {}, "research goal": "The dataset includes posts from individual parents in parenting Subreddits on the site Reddit. The two classes are generated based on whether the parent is a mother or father. The Group A snippets are Reddit posts from mothers, while the Group B snippets are Reddit posts from fathers. I am a sociologist studying parenting. 
My goal is to figure out the specific concerns and needs of mothers versus fathers. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts from individual parents in parenting Subreddits on the site Reddit. The two classes are generated based on how long the user has been posting on parenting subreddits. The Group A snippets are posts from parents who have been on Reddit for less than a month, while the Group B snippets are parenting posts from the first to third months on Reddit. I am a sociologist studying the process of raising a child. My goal is to figure out how different concerns and needs come up as a baby grows. "}, {"+": {}, "-": {"mentions the difficulty of getting a newborn to sleep": {"p-value": 6.441348859185843e-05, "V'": 0.04797128968759523}, "discusses the challenges of breastfeeding": {"p-value": 1.662400023664713e-05, "V'": 0.06135182617873772}, "mentions feeling overwhelmed and exhausted as a new parent": {"p-value": 6.0555265810763275e-06, "V'": 0.08517782662433496}}, "research goal": "The dataset includes posts from individual parents in parenting Subreddits on the site Reddit. The two classes are generated based on how long the user has been posting on parenting subreddits. The Group A snippets are Reddit posts from parents' first 3 months on Reddit, while the Group B snippets are parenting Reddit posts from 3 months to 1 year. I am a sociologist studying the process of raising a child. My goal is to figure out how different concerns and needs come up as a baby grows. 
"}, {"+": {}, "-": {"refers to the need for seeking help and advice from other parents": {"p-value": 0.0005867030349554812, "V'": 0.045978049793229225}, "Discusses the challenges of juggling multiple children": {"p-value": 0.00013557059607899338, "V'": 0.027402075844469367}, "mentions the importance of creating positive memories": {"p-value": 0.0009350916262251168, "V'": 0.013063022353057265}}, "research goal": "The dataset includes posts from individual parents in parenting Subreddits on the site Reddit. The two classes are generated based on how long the user has been posting on parenting subreddits. The Group A snippets are Reddit posts on parenting subreddits from accounts more than 5 years old, while the Group B snippets are Reddit posts on parenting subreddits from accounts less than 5 years old. I am a sociologist studying the process of raising a child. My goal is to figure out how different concerns and needs come up as a baby grows. "}, {"+": {"discusses the challenges of breastfeeding": {"p-value": 9.609524577672161e-15, "V'": 0.6645027723736343}, "discusses difficulty breastfeeding and maintaining milk supply": {"p-value": 7.795534100458794e-09, "V'": 0.5375174086236929}, "Discusses difficulty in breastfeeding and lack of support": {"p-value": 2.4001287959038568e-05, "V'": 0.4040419250337648}, "discusses difficulty breastfeeding": {"p-value": 4.934029012948374e-07, "V'": 0.4754692778532197}, "mentions struggles with breastfeeding and milk supply": {"p-value": 1.0102491835500963e-10, "V'": 0.5447328415037065}}, "-": {"discusses introducing solid foods to the baby": {"p-value": 3.559140074610371e-22, "V'": 0.5822508482599444}, "discusses specific types of baby formulas": {"p-value": 8.828717393957718e-06, "V'": 0.25541178195541475}, "mentions difficulty in getting baby to accept solid foods": {"p-value": 3.4728082979440016e-16, "V'": 0.5281395853224434}, "mentions concerns about balancing baby's diet": {"p-value": 7.583350590105964e-17, "V'": 
0.6818179015177326}, "discusses problems with feeding, such as vomiting and fussiness": {"p-value": 1.6719834697906378e-05, "V'": 0.4083711186476384}, "mentions introducing solids": {"p-value": 4.883653373452613e-19, "V'": 0.5663785211398479}, "mentions difficulties with picky eating": {"p-value": 3.935089943116576e-06, "V'": 0.24098203301169485}, "mentions difficulties with formula use": {"p-value": 4.277196366974811e-07, "V'": 0.4444450402918926}, "discusses the best food options and types of formula to use": {"p-value": 8.401306786914736e-15, "V'": 0.5714290032161777}, "Discusses trying to transition to formula": {"p-value": 6.656612671848702e-05, "V'": 0.33261101467052023}, "mentions difficulties introducing solid foods": {"p-value": 1.6649530511880234e-20, "V'": 0.5743157049162197}}, "research goal": "The dataset includes posts from various parenting-related Subreddits, which are forums on the site Reddit. The two classes are generated based on which Subreddit the post is from. The Group A snippets are Reddit posts from a Subreddit about breastfeeding, while the Group B snippets are Reddit posts from a Subreddit about baby food. I am a parent trying to figure out which forum to post on. My goal is to figure out the specific concerns and needs that come up on different topics. "}, {"+": {}, "-": {}, "research goal": "The dataset includes posts from various parenting-related Subreddits, which are forums on the site Reddit. The two classes are generated based on which Subreddit the post is from. The Group A snippets are Reddit posts from a Subreddit about parents asking for support, while the Group B snippets are Reddit posts from a Subreddit about interacting with children. I am a parent trying to figure out which forum to post on. My goal is to figure out the specific concerns and needs that come up on different topics. 
"}, {"+": {"Explores the challenges of making decisions as a single parent": {"p-value": 6.373951316962499e-05, "V'": 0.5000000599202429}, "mentions the difficulties of being a single parent": {"p-value": 0.00012067154952837322, "V'": 0.603897213995698}, "mentions difficulties of single parenting": {"p-value": 0.00018381628999183605, "V'": 0.5519476157405969}}, "-": {"expresses the difficulty of preparing for and understanding kinship adoption": {"p-value": 5.12412625091929e-08, "V'": 0.7662326080878896}, "mentions difficulties in establishing an identity for adopted children": {"p-value": 0.00037624183493386703, "V'": 0.42857072008998187}, "mentions challenges of adopting a child": {"p-value": 3.364578392444457e-10, "V'": 0.7857149467230582}, "discusses the challenges of adoption or step-parenting": {"p-value": 1.0366993811926485e-06, "V'": 0.7272730642491325}, "Questions about how to love an adopted child as their own": {"p-value": 6.373739365836157e-05, "V'": 0.5000002948925896}}, "research goal": "The dataset includes posts from various parenting-related Subreddits, which are forums on the site Reddit. The two classes are generated based on which Subreddit the post is from. The Group A snippets are Reddit posts from a Subreddit about single parents, while the Group B snippets are Reddit posts from a Subreddit about non-biological parents. I am a parent trying to figure out which forum to post on. My goal is to figure out the specific concerns and needs that come up on different topics. "}, {"+": {}, "-": {"emphasizes the themes of love and mortality": {"p-value": 2.3176895157046035e-05, "V'": 0.36917268006000487}}, "research goal": "The dataset includes poems from PoetryFoundation.com. The two classes are generated based on who wrote the poem. The Group A snippets are poems by Emily Dickinson, while the Group B snippets are poems by Alfred, Lord Tennyson. I am a comparative literature researcher. 
My goal is to figure out the specific topics that each poet writes about. "}, {"+": {"describes a strong emotional reaction to a natural setting": {"p-value": 3.1185130286412167e-12, "V'": 0.42515517259117513}, "uses vivid imagery to describe the beauty of nature": {"p-value": 3.85805886701433e-08, "V'": 0.32867471564799877}, "uses imagery to evoke a sense of nature and the countryside": {"p-value": 2.730761600111754e-09, "V'": 0.3978294394612427}, "expresses nostalgia for a simple, rural lifestyle": {"p-value": 8.600137969037911e-12, "V'": 0.3369562466800485}, "expresses a strong connection to nature": {"p-value": 7.123027568003783e-06, "V'": 0.3053827015194561}, "expresses love for the beauty of nature": {"p-value": 9.193984980405468e-10, "V'": 0.3721533098855212}, "focuses on the beauty of nature": {"p-value": 9.236513716917555e-08, "V'": 0.31780604799774587}, "describes nature in a romantic way": {"p-value": 2.2341875268549916e-08, "V'": 0.3834924611585253}, "uses imagery to evoke a sense of nature and the outdoors": {"p-value": 2.2239295178938688e-08, "V'": 0.37548999523883486}, "emphasizes the beauty of nature and its power to evoke emotion": {"p-value": 9.32600720334074e-10, "V'": 0.4073495363752766}, "refers to nature and natural phenomena": {"p-value": 1.6879214835232742e-05, "V'": 0.2871234491576026}}, "-": {"explores the power of love and its consequences": {"p-value": 3.2290416688606734e-06, "V'": 0.3207374966237845}, "contains references to love and romance": {"p-value": 0.0006592638338049365, "V'": 0.22929595413114934}, "contains references to love and relationships": {"p-value": 0.0001788561053944136, "V'": 0.24565219477061728}}, "research goal": "The dataset includes poems from PoetryFoundation.com. The two classes are generated based on who wrote the poem. The Group A snippets are poems by William Wordsworth, while the Group B snippets are poems by William Shakespeare. I am a comparative literature researcher. 
My goal is to figure out the specific topics that each poet writes about. "}, {"+": {}, "-": {"reflects on the effects of man-made conflict": {"p-value": 5.097453827537853e-05, "V'": 0.07190358788986018}, "mentions historical events or figures": {"p-value": 5.221042493350398e-05, "V'": 0.07349288331677906}, "references historical events or figures": {"p-value": 2.0775178038795146e-05, "V'": 0.07756815508687354}}, "research goal": "The dataset includes poems from PoetryFoundation.com. The two classes are generated based on the subject of the poem. The Group A snippets are poems that contain social commentary, while the Group B snippets are poems about history. I am a comparative literature researcher. My goal is to figure out the specific tones, themes, and motifs associated with each subject. "}, {"+": {"uses romantic language to describe a relationship": {"p-value": 2.8691639383638588e-05, "V'": 0.07026044572951193}, "focuses on romantic relationships": {"p-value": 5.444297767772506e-10, "V'": 0.09468597887499772}, "explores longing and desire between two people": {"p-value": 0.0006010975821107019, "V'": 0.06094062058328714}, "focuses on the complexities of love": {"p-value": 2.8195059080150725e-05, "V'": 0.07394091966711136}}, "-": {"explores the idea of family and belonging": {"p-value": 6.745768341380476e-19, "V'": 0.16018569132937344}, "captures the complex emotions of family relationships": {"p-value": 1.845540519336749e-14, "V'": 0.14207274711036566}, "focuses on a family member or parent": {"p-value": 1.2648024616416741e-25, "V'": 0.19055570226699658}, "depicts family relationships": {"p-value": 1.642715278480724e-23, "V'": 0.18400181377933972}, "explores the idea of family connections": {"p-value": 3.643939193093194e-25, "V'": 0.18672149385510006}, "describes complicated family dynamics": {"p-value": 3.5535309467366505e-10, "V'": 0.1125900726740835}, "explores the complexity of family relationships": {"p-value": 4.138887121946492e-17, "V'": 
0.1557934622751057}}, "research goal": "The dataset includes poems from PoetryFoundation.com. The two classes are generated based on the subject of the poem. The Group A snippets are poems about relationships, while the Group B snippets are poems about family. I am a comparative literature researcher. My goal is to figure out the specific tones, themes, and motifs associated with each subject. "}, {"+": {}, "-": {}, "research goal": "The dataset includes poems from PoetryFoundation.com. The two classes are generated based on the subject of the poem. The Group A snippets are poems about time and brevity, while the Group B snippets are poems about nature. I am a comparative literature researcher. My goal is to figure out the specific tones, themes, and motifs associated with each subject. "}, {"+": {"mentions normal heart and lungs": {"p-value": 7.3861290615881125e-06, "V'": 0.13076074900148554}}, "-": {"mentions renal growth": {"p-value": 0.0009746687123092475, "V'": 0.09113490648164141}, "mentioning asymmetry in kidney size": {"p-value": 7.321085942906273e-05, "V'": 0.0799146640736669}}, "research goal": "The dataset includes impressions and medical histories of radiology patients. The two classes are generated based on whether medical experts agreed on a diagnosis. The Group A snippets describe impressions of patients with consensus diagnoses, while the Group B snippets describe impressions of patients with conflicting diagnoses. I am a medical student training to become a radiologist. My goal is to figure out what symptoms make experts more confident in a diagnosis. "}, {"+": {}, "-": {"mentions possible round pneumonia": {"p-value": 2.1746967924338578e-05, "V'": 0.09090943925469563}, "mentions lobe": {"p-value": 7.651506144220952e-05, "V'": 0.2583725977030083}}, "research goal": "The dataset includes impressions and medical histories of radiology patients. The two classes are generated based on whether medical experts agreed on a diagnosis. 
The Group A snippets describe impressions of cough patients with consensus diagnoses, while the Group B snippets describe impressions of cough patients with conflicting diagnoses. I am a medical student training to become a radiologist. My goal is to figure out what symptoms make experts more confident in a diagnosis. "}, {"+": {"mentions a cough for several weeks": {"p-value": 0.0006009650241818522, "V'": 0.22869992579506984}}, "-": {"mentions a fever with no other symptoms": {"p-value": 8.685464015875042e-50, "V'": 0.6097556364551198}, "mentions a fever of unknown origin": {"p-value": 2.11085196007511e-16, "V'": 0.6547066004523754}, "mentions fever": {"p-value": 3.8166674975107804e-16, "V'": 0.6502235579661249}}, "research goal": "The dataset includes impressions and medical histories of radiology patients. The two classes are generated based on which diagnosis the patient was assigned. The Group A snippets describe medical histories of patients with just a cough, while the Group B snippets describe medical histories of patients with a fever. I am a medical student training to become a radiologist. My goal is to figure out which symptoms or histories suggest each diagnosis. "}, {"+": {}, "-": {"mentions a history of respiratory infection": {"p-value": 8.43192740582247e-11, "V'": 0.3214290701268862}}, "research goal": "The dataset includes impressions and medical histories of radiology patients. The two classes are generated based on which diagnosis the patient was assigned. The Group A snippets describe impressions of patients with a cough, while the Group B snippets describe impressions of patients with pneumonia, a lung inflammation. I am a medical student training to become a radiologist. My goal is to figure out which symptoms or histories suggest each diagnosis. "}, {"+": {}, "-": {}, "research goal": "The dataset includes impressions and medical histories of radiology patients. 
The two classes are generated based on which diagnosis the patient was assigned. The Group A snippets describe impressions of patients with a urinary tract infection, while the Group B snippets describe impressions of patients with vesicoureteral reflux, an abnormal flow of urine. I am a medical student training to become a radiologist. My goal is to figure out which symptoms or histories suggest each diagnosis. "}, {"+": {"mentions the lecturer's friendliness": {"p-value": 0.00029253634229848077, "V'": 0.0629133795809757}, "mentions the lecturer's friendly demeanor": {"p-value": 0.0006736797561572652, "V'": 0.06170900665919049}, "mentions the lecturer's teaching style": {"p-value": 5.175567831061183e-08, "V'": 0.10117079143526309}}, "-": {}, "research goal": "The dataset includes reviews of lecturers from RateMyProfessor.com. The two classes are generated based on whether the lecturer is male or female. The Group A snippets rate female lecturers, while the Group B snippets rate male lecturers. I am a university dean worried about gender bias on review sites. My goal is to figure out how a lecturer's gender affects what people bring up in reviews. 
"}, {"+": {"uses offensive language": {"p-value": 2.4092365325877776e-07, "V'": 0.12670651792908105}, "employs sarcasm or irony": {"p-value": 7.237540615218759e-09, "V'": 0.12560915511150367}, "Uses sarcasm to make a point": {"p-value": 6.576561065241775e-08, "V'": 0.09831281691864666}, "involves crude language": {"p-value": 3.2408407076871503e-06, "V'": 0.12397444283907591}, "Uses absurdity to create humor": {"p-value": 1.1302247272222399e-08, "V'": 0.16191573006613075}, "Uses irony to create humor": {"p-value": 3.110230163357901e-11, "V'": 0.18062980120359617}, "Uses sarcasm to create humor": {"p-value": 1.2099364179224174e-05, "V'": 0.07482817737494069}, "involves a surprise twist or unexpected result": {"p-value": 3.2003752375294817e-06, "V'": 0.07598965440067235}, "uses irony to create a humorous effect": {"p-value": 2.8230967854791894e-08, "V'": 0.1503688623584553}, "creates an unexpected twist": {"p-value": 0.00044603362741769313, "V'": 0.058807676958161886}, "Exploits the unexpected": {"p-value": 3.58730898173444e-10, "V'": 0.1393728909444044}, "Uses irony": {"p-value": 5.3623818350309745e-08, "V'": 0.14040293231868228}}, "-": {"relies on puns and wordplay": {"p-value": 7.307810591537966e-07, "V'": 0.1338272744697605}, "relies on puns": {"p-value": 6.073512093581731e-09, "V'": 0.15029893982695022}, "employs puns and word play": {"p-value": 4.610828880528404e-06, "V'": 0.12583608612701308}, "relies on puns or wordplay": {"p-value": 6.090389146886197e-07, "V'": 0.13219438349249182}, "relies on wordplay or puns": {"p-value": 1.7909114174648326e-07, "V'": 0.14062129570979676}}, "research goal": "The dataset includes jokes posted on the Reddit forum r/Jokes, a message board for sharing jokes. The two classes are generated based on whether the joke received a lot of upvotes. The Group A snippets are funny Reddit jokes, while the Group B snippets are unfunny Reddit jokes. I am an aspiring comedian. 
My goal is to figure out the specific topics and setups that people find funny, so I can write a funny joke. "}, {"+": {"mentions fear of what others think or fear of death": {"p-value": 0.0006052238280806166, "V'": 0.23831662855728059}, "mentions feelings of anxiety in social situations, such as feeling uncomfortable in company": {"p-value": 1.3299209200362228e-08, "V'": 0.41572858452147793}, "mentions difficulty in controlling thoughts, such as intrusive thoughts or difficulty turning off the mind": {"p-value": 7.065564242274528e-06, "V'": 0.3304623571521518}, "mentions the fear of being judged, or not being accepted": {"p-value": 2.905407659812225e-06, "V'": 0.34295304025872086}, "mentions feelings of insecurity or low self-esteem": {"p-value": 5.701598980366829e-08, "V'": 0.32866298237458536}, "mentions difficulty in dealing with anxiety, such as not being able to control ones own thoughts and spiraling into a cycle of negative thoughts": {"p-value": 3.320123350253584e-07, "V'": 0.35304992680712277}, "mentions difficulty in controlling emotions, such as not being able to stop feeling anxious or overwhelmed": {"p-value": 2.782426650539303e-07, "V'": 0.3353825717447531}, "mentions a fear of the unknown, such as not knowing how long a situation will last": {"p-value": 0.00021117109616520168, "V'": 0.251426164610407}}, "-": {"mentions difficulty managing work load and deadlines": {"p-value": 0.00018692301246628923, "V'": 0.1511708757085119}}, "research goal": "The dataset includes stress-related posts on Reddit. The two classes are generated based on which Subreddit the post was submitted to. The Group A snippets are posts from a Subreddit about anxiety, while the Group B snippets are posts from a Subreddit about stress. I am training to become a psychiatrist. My goal is to figure out the nuanced differences between types of stress, such as the cause. 
"}, {"+": {"mentions difficulty in relationships, such as difficulty with communication, feeling of distance, or difficulty forming relationships": {"p-value": 6.985787265458911e-09, "V'": 0.19176770686544647}, "mentions flashbacks or nightmares related to the trauma": {"p-value": 9.024234009015387e-11, "V'": 0.13082617455100695}, "mentions traumatic events from childhood, such as sexual abuse or neglect": {"p-value": 7.700127401553085e-24, "V'": 0.23510839566716504}, "mentions fear of the abuser or perpetrator returning": {"p-value": 6.86108994584658e-08, "V'": 0.1064237898280885}, "mentions difficulty with relationships due to PTSD": {"p-value": 2.2401379592322093e-11, "V'": 0.214220302533945}, "mentions a traumatic experience, such as a car accident or assault": {"p-value": 1.4586722098793165e-31, "V'": 0.32035328999628293}, "mentions difficulty in trusting others or forming relationships": {"p-value": 2.160096057076538e-06, "V'": 0.15862518460788444}, "mentions traumatic experiences such as abuse, sexual assault, or violence": {"p-value": 1.586350543034744e-45, "V'": 0.3935005836063675}, "mentions feelings of depression, guilt, or shame related to a traumatic experience": {"p-value": 4.6605043349202065e-17, "V'": 0.27755835188954964}, "mentions physical abuse, such as being hit, pushed, or sexually assaulted": {"p-value": 3.08510338264913e-14, "V'": 0.14614006334579457}, "mentions a traumatic event, such as a car accident or military service": {"p-value": 2.704379419189234e-33, "V'": 0.32459926133531436}, "mentions difficulty trusting others due to past trauma": {"p-value": 3.0959301552921092e-24, "V'": 0.3338350587648653}, "mentions difficulty in relationships, such as not understanding what another person is feeling or not knowing how to express feelings": {"p-value": 7.11098034874627e-09, "V'": 0.1951317728431201}}, "-": {"mentions fear of the outside world, such as fear of crowds, fear of the dark, or fear of being outside of the home": {"p-value": 
3.12691996464746e-07, "V'": 0.11261898759288874}, "mentions difficulty in a professional setting, such as not being able to adjust to a new job or feeling stuck in an old job": {"p-value": 7.57639905328402e-05, "V'": 0.08702418977304469}, "mentions fear of what other people think of them": {"p-value": 2.0070375011194176e-05, "V'": 0.1321964201306851}, "mentions difficulty in social situations, such as feeling anxious or scared in public or around strangers": {"p-value": 8.574056327403229e-08, "V'": 0.18006363631779826}, "mentions difficulty in managing daily tasks due to anxiety": {"p-value": 5.482959207731244e-08, "V'": 0.18054300199087547}}, "research goal": "The dataset includes stress-related posts on Reddit. The two classes are generated based on which Subreddit the post was submitted to. The Group A snippets are posts from a Subreddit about PTSD, while the Group B snippets are posts from a Subreddit about anxiety. I am training to become a psychiatrist. My goal is to figure out the nuanced differences between types of stress, such as the cause. "}, {"+": {}, "-": {}, "research goal": "The dataset includes articles from various Reuters authors. The two classes are generated based on which Reuters journalist wrote the article. The Group A snippets are news articles by Jan Lopatka, while the Group B snippets are news articles by John Mastrini. I am trying to figure out who wrote this anonymous news article. My goal is to figure out the topics that each journalist covers. "}, {"+": {"includes quotes from government officials": {"p-value": 0.0002550944138739237, "V'": 0.1848504453157782}}, "-": {"uses language related to politics, such as 'Velvet Revolution'": {"p-value": 0.0007285103980845312, "V'": 0.16059246924139992}}, "research goal": "The dataset includes articles from various Reuters authors. The two classes are generated based on which Reuters journalist wrote the article. 
The Group A snippets are news articles by Jan Lopatka, while the Group B snippets are news articles by John Mastrini. I am trying to figure out who wrote this anonymous news article. My goal is to figure out the specific writing style of each journalist. "}, {"+": {"discusses China's agricultural sector and food needs": {"p-value": 9.68840987644164e-45, "V'": 0.5211864272247316}, "focuses on the technical and scientific aspects of a situation": {"p-value": 1.7495405407563484e-06, "V'": 0.17634165710464078}, "refers to the economic impacts of a situation": {"p-value": 4.602378886297718e-05, "V'": 0.13077748419235757}, "mentions specific figures, such as grain crop quantity or imports and exports": {"p-value": 1.6155235406455468e-21, "V'": 0.4023564543472381}, "discusses the economic implications of a development": {"p-value": 1.7828823467816487e-05, "V'": 0.13010561609597593}, "touches on the economic effects of a situation, such as inflation, crop losses, or low lending rates": {"p-value": 2.1645405318786022e-06, "V'": 0.1571963636762711}, "mentions the effect of economic events, such as inflation or currency movements": {"p-value": 0.0009823982107805215, "V'": 0.14190240681089117}, "mentions the political implications of a situation, such as China's influence or the US government's stance": {"p-value": 0.0006192144095617604, "V'": 0.10417731613953218}}, "-": {"features quotes from Chinese political figures, such as Premier Li Peng or Vice-Premier Zhu Rongji": {"p-value": 5.411335296273252e-07, "V'": 0.1611105078296552}, "mentions the consequences of a political decision": {"p-value": 2.0049240428007453e-05, "V'": 0.18597223604211321}, "discusses the transition of Hong Kong to China": {"p-value": 6.604250140485736e-38, "V'": 0.4764430506836759}, "References a high-profile individual in the industry, such as a CEO or government official": {"p-value": 4.776771403683999e-22, "V'": 0.4267233462416905}, "mentions upcoming events, such as a protest or a ceremony": 
{"p-value": 4.646121758068721e-06, "V'": 0.19308574009724677}}, "research goal": "The dataset includes articles from various Reuters authors. The two classes are generated based on which Reuters journalist wrote the article. The Group A snippets are news articles by Lynne O'Donnel, while the Group B snippets are news articles by Sarah Davidson. I am trying to figure out who wrote this anonymous news article. My goal is to figure out the topics that each journalist covers. "}, {"+": {"uses technical terms such as 'futures', 'FOB' and 'inverse'": {"p-value": 1.5894190219385888e-06, "V'": 0.11905307541766413}, "uses words such as 'traders', 'buyers' and 'producers'": {"p-value": 6.332646122742724e-17, "V'": 0.36422018862012157}, "mentions specific figures, such as '1 million tonnes of corn' and '50 percent'": {"p-value": 6.092314223248685e-18, "V'": 0.3814765092546244}, "uses phrases such as 'force prices up' and 'doesn't make economic sense'": {"p-value": 1.7926488303260478e-05, "V'": 0.15052476961002076}, "uses language from the agricultural industry, such as 'fowl plague'": {"p-value": 2.873984276985396e-34, "V'": 0.4484227941482559}, "uses a lot of numbers, percentages, and statistics": {"p-value": 2.440523100226849e-10, "V'": 0.28855594581107163}, "uses complex economic terms and concepts": {"p-value": 4.188041960400209e-06, "V'": 0.20376650052461698}}, "-": {"mentions political figures or appointments": {"p-value": 2.3597525576183723e-07, "V'": 0.19432544240741198}, "mentions the effects of economic policies": {"p-value": 7.244855864944552e-05, "V'": 0.17842399697452183}}, "research goal": "The dataset includes articles from various Reuters authors. The two classes are generated based on which Reuters journalist wrote the article. The Group A snippets are news articles by Lynne O'Donnel, while the Group B snippets are news articles by Sarah Davidson. I am trying to figure out who wrote this anonymous news article. 
My goal is to figure out the specific writing style of each journalist. "}, {"+": {"mentions shareholders and their role in a situation": {"p-value": 1.2570541687260932e-05, "V'": 0.20773187216131106}, "highlights the roles of specific companies and their actions": {"p-value": 0.0007344789185657766, "V'": 0.06484543629907003}, "focuses on the financials of a company and its performance": {"p-value": 6.156225844338764e-05, "V'": 0.09953646798927085}, "focuses on the financial impact of a merger or acquisition": {"p-value": 1.7184498594437992e-33, "V'": 0.5503857810761252}, "mentions stock prices and values": {"p-value": 6.263758071021446e-07, "V'": 0.24651578554185277}, "focuses on financial performance and predictions for the future": {"p-value": 0.0005370321985252271, "V'": 0.08922681973793223}}, "-": {"mentions the effects of economic changes on businesses": {"p-value": 2.0816033239133955e-12, "V'": 0.341993405696711}, "discusses the Australian markets and businesses": {"p-value": 1.3803198829757967e-120, "V'": 0.8599913422302036}}, "research goal": "The dataset includes articles from various Reuters authors. The two classes are generated based on which Reuters journalist wrote the article. The Group A snippets are news articles by Robin Sidel, while the Group B snippets are news articles by Bernard Hickey. I am trying to figure out who wrote this anonymous news article. My goal is to figure out the topics that each journalist covers. 
"}, {"+": {"mentions industry analysts and their opinions": {"p-value": 1.1113833506554493e-07, "V'": 0.2598544978722083}, "uses terms such as 'board of directors'": {"p-value": 1.7100134399038383e-08, "V'": 0.23261600154795817}, "uses words such as 'tender' and 'shares'": {"p-value": 3.120396249814075e-06, "V'": 0.21601614047808215}, "mentions the legal implications of the acquisition": {"p-value": 3.572681210514011e-06, "V'": 0.21887923416293642}, "uses the phrase 'shareholders would be required to authorize'": {"p-value": 6.181903268946007e-08, "V'": 0.17703904562769754}, "uses the phrase 'rebuff the takeover effort'": {"p-value": 1.892862286760837e-11, "V'": 0.27235699861855805}, "mentions specific company names, such as Ridley Corp Ltd": {"p-value": 0.00023988705518857595, "V'": 0.0921268097731135}}, "-": {"uses the phrase 'margin pressure'": {"p-value": 0.0001655712847041018, "V'": 0.11537451666454483}, "uses phrases such as 'unilateral home loan interest rate cut'": {"p-value": 1.9583074124156322e-05, "V'": 0.08823457884914385}, "uses specific geographic locations, such as 'Sydney Newsroom'": {"p-value": 9.658635751800965e-37, "V'": 0.5726771530810815}, "uses the phrase 'profits before abnormals'": {"p-value": 1.61030053808077e-08, "V'": 0.14705817434825316}, "mentions the impact of globalization": {"p-value": 9.030269393418913e-10, "V'": 0.29932458718358546}}, "research goal": "The dataset includes articles from various Reuters authors. The two classes are generated based on which Reuters journalist wrote the article. The Group A snippets are news articles by Robin Sidel, while the Group B snippets are news articles by Bernard Hickey. I am trying to figure out who wrote this anonymous news article. My goal is to figure out the specific writing style of each journalist. 
"}, {"+": {"has at least five letters": {"p-value": 1.3524211914203597e-08, "V'": 0.1481771269936203}, "has words with an 'e' as the last letter": {"p-value": 5.089503772203959e-06, "V'": 0.11760716012606537}}, "-": {}, "research goal": "The dataset includes common English words. The two classes are generated based on whether the words fit a secret rule. The Group A snippets are words that pass, while the Group B snippets are words that don't pass. I am trying to solve this riddle with my friends. My goal is to figure out something about the spelling of the words that explains the difference. "}, {"+": {}, "-": {}, "research goal": "The dataset includes common English words. The two classes are generated based on whether the words fit a secret rule. The Group A snippets are words that pass, while the Group B snippets are words that don't pass. I am trying to solve this riddle with my friends. My goal is to figure out something about the spelling of the words that explains the difference. "}, {"+": {"involve cases in which the plaintiff alleges a violation of the United States Constitution": {"p-value": 0.00048333488962536765, "V'": 0.09636981499188646}}, "-": {}, "research goal": "The dataset includes facts from cases heard by the Supreme Court of the United States (SCOTUS). The two classes are generated based on the relationship between the SCOTUS decision and the decision of the lower court. The Group A snippets are facts from Supreme Court cases where a lower court's ruling was reversed, while the Group B snippets are facts from Supreme Court cases where a lower court's ruling was affirmed. I am a political scientist studying Supreme Court rulings. My goal is to figure out which policy topics cause the SCOTUS to disagree with lower courts. 
"}, {"+": {"involve cases with a high degree of complexity and/or ambiguity in the law": {"p-value": 0.0005101268299059935, "V'": 0.11409337566457411}}, "-": {"involve a challenge to state laws or regulations": {"p-value": 0.0005915943348012849, "V'": 0.11381658685844631}}, "research goal": "The dataset includes facts from cases heard by the Supreme Court of the United States (SCOTUS). The two classes are generated based on the relationship between the SCOTUS decision and the decision of the lower court. The Group A snippets are facts from Supreme Court cases where a lower court's ruling was vacated, while the Group B snippets are facts from Supreme Court cases where a lower court's ruling was reversed. I am a political scientist studying Supreme Court rulings. My goal is to figure out which policy topics cause the SCOTUS to disagree with lower courts. "}, {"+": {"involves the criminal justice system or law enforcement": {"p-value": 1.1475804266871059e-06, "V'": 0.11721237938485285}, "involves criminal law, such as murder or robbery": {"p-value": 0.00045735373055224204, "V'": 0.11373566408621683}, "involves criminal cases, such as murder": {"p-value": 0.0009783597092058766, "V'": 0.10626300657801724}, "involves questions of criminal justice, such as police brutality or prosecutorial misconduct": {"p-value": 0.0002756802840437859, "V'": 0.11910047868293216}}, "-": {}, "research goal": "The dataset includes facts from cases heard by the Supreme Court of the United States (SCOTUS). The two classes are generated based on who was president when the case was heard. The Group A snippets are facts from Supreme Court cases during the Obama presidency, while the Group B snippets are facts from Supreme Court cases during the Trump presidency. I am a political scientist studying Supreme Court rulings. My goal is to figure out which policy topics were brought up before the court during each presidency. 
"}, {"+": {"involves the Fourteenth Amendment rights of equal protection and due process": {"p-value": 0.000183540744120258, "V'": 0.08194459636528828}}, "-": {}, "research goal": "The dataset includes facts from cases heard by the Supreme Court of the United States (SCOTUS). The two classes are generated based on who was Chief Justice at the time of the case. The Group A snippets are facts from cases heard by the Rehnquist court, while the Group B snippets are facts from cases heard by the Roberts court. I am a political scientist studying Supreme Court rulings. My goal is to figure out which policy areas were brought up before the court under each Chief Justice. "}, {"+": {"involves the Armed Career Criminals Act": {"p-value": 0.0002513734233348218, "V'": 0.04987838235932596}}, "-": {}, "research goal": "The dataset includes facts from cases heard by the Supreme Court of the United States (SCOTUS). The two classes are generated based on who was Chief Justice at the time of the case. The Group A snippets are facts from criminal Supreme Court cases heard by the Roberts court, while the Group B snippets are facts from criminal Supreme Court cases heard by the Rehnquist court. I am a political scientist studying Supreme Court rulings. My goal is to figure out which policy areas were brought up before the court under each Chief Justice. "}, {"+": {"involves the right to vote": {"p-value": 2.1037489885309285e-05, "V'": 0.23599973451414596}}, "-": {}, "research goal": "The dataset includes facts from cases heard by the Supreme Court of the United States (SCOTUS). The two classes are generated based on who was Chief Justice at the time of the case. The Group A snippets are facts from civil rights Supreme Court cases heard by the Warren court, while the Group B snippets are facts from civil rights Supreme Court cases heard by the Burger court. I am a political scientist studying Supreme Court rulings. 
My goal is to figure out which policy areas were brought up before the court under each Chief Justice. "}, {"+": {"involves criminal proceedings against individuals": {"p-value": 3.6100331461481414e-05, "V'": 0.14981253548763962}}, "-": {}, "research goal": "The dataset includes facts from cases heard by the Supreme Court of the United States (SCOTUS). The two classes are generated based on who was Chief Justice at the time of the case. The Group A snippets are facts from cases heard by the Warren court, while the Group B snippets are facts from cases heard by the Burger court. I am a political scientist studying Supreme Court rulings. My goal is to figure out which policy areas were brought up before the court under each Chief Justice. "}, {"+": {"mentions the first party was misadvised by their lawyer": {"p-value": 7.283841856534753e-05, "V'": 0.02895820036086255}, "describes a violation of the law or regulation by the second party": {"p-value": 0.00035885583588524453, "V'": 0.07947580151040312}}, "-": {"refers to a statute or regulation that was enforced": {"p-value": 3.396394715135907e-05, "V'": 0.08564697245495556}}, "research goal": "The dataset includes facts from cases heard by the Supreme Court of the United States (SCOTUS). The two classes are generated based on who won the case. The Group A snippets are facts from Supreme Court cases where the first party won, while the Group B snippets are facts from Supreme Court cases where the first party lost. I am a lawyer preparing a case in front of the Supreme Court. My goal is to figure out the types of complaints that the first party is more likely to win. "}, {"+": {}, "-": {}, "research goal": "The dataset includes facts from cases heard by the Supreme Court of the United States (SCOTUS). The two classes are generated based on whether the justices agreed with one another. 
The Group A snippets are facts from Supreme Court cases with only a plurality (no majority) decision, while the Group B snippets are facts from Supreme Court cases with a majority decision. I am a political scientist studying Supreme Court rulings. My goal is to figure out which policy areas the justices can reach consensus on. "}, {"+": {"involves a challenge to the constitutionality of a state law": {"p-value": 0.000742938149388471, "V'": 0.11916591904928708}, "involve a claim of ineffective assistance of counsel": {"p-value": 1.95252483506003e-12, "V'": 0.18166581832323497}, "involves a challenge to a state statute criminalizing a certain form of behavior": {"p-value": 1.1519902053756636e-05, "V'": 0.15340520177466327}, "involves a challenge to a state law": {"p-value": 0.00042160605770042646, "V'": 0.11970889867488488}}, "-": {"involves a dispute over the interpretation of a contract": {"p-value": 9.539762527831217e-07, "V'": 0.16072652239399773}}, "research goal": "The dataset includes facts from cases heard by the Supreme Court of the United States (SCOTUS). The two classes are generated based on whether the justices agreed with one another. The Group A snippets are facts from Supreme Court cases with a unanimous decision, while the Group B snippets are facts from Supreme Court cases with only a majority decision. I am a political scientist studying Supreme Court rulings. My goal is to figure out which policy areas the justices can reach consensus on. "}, {"+": {"involves the use of the Fourth Amendment": {"p-value": 2.3104464243554997e-07, "V'": 0.18429210376825317}}, "-": {"Involves a dispute over the application of the death penalty": {"p-value": 0.00019989698643269104, "V'": 0.12314155761255846}}, "research goal": "The dataset includes facts from cases heard by the Supreme Court of the United States (SCOTUS). The two classes are generated based on who the parties are. 
The Group A snippets are facts from Supreme Court cases involving an American state as the plaintiff, while the Group B snippets are facts from Supreme Court cases involving an American state as the defendant. I am a political scientist studying Supreme Court rulings. My goal is to figure out which general policy areas come up. "}, {"+": {"involves a dispute over the interpretation of a state law": {"p-value": 4.135275633943862e-36, "V'": 0.3515362244242482}, "involves challenges to state legislation": {"p-value": 8.070510980272058e-05, "V'": 0.10925114967477034}, "involves a dispute over the regulation of environmental protection": {"p-value": 0.0006217208673323477, "V'": 0.04680763339709995}, "involves a dispute over the interpretation of constitutional rights": {"p-value": 1.4117487527574017e-05, "V'": 0.11002853477617713}}, "-": {"involves a dispute over the enforceability of a contract": {"p-value": 2.122138419555608e-06, "V'": 0.06911838927192138}, "involves a dispute over the regulation of corporate activities": {"p-value": 6.893150064453551e-36, "V'": 0.33154780629646907}, "involves a dispute over the interpretation of a federal statute": {"p-value": 1.0922583569721054e-40, "V'": 0.3620966984610261}, "involves a dispute over corporate accountability": {"p-value": 7.642706353513584e-42, "V'": 0.35434825892471233}, "involves a dispute over the implementation of federal regulations": {"p-value": 7.177541421523985e-54, "V'": 0.42417869189874385}, "involves a dispute over the use of technology": {"p-value": 0.0007557991519686022, "V'": 0.05291545820943544}}, "research goal": "The dataset includes facts from cases heard by the Supreme Court of the United States (SCOTUS). The two classes are generated based on who the parties are. The Group A snippets are facts from Supreme Court cases involving an American state as a party, while the Group B snippets are facts from Supreme Court cases without an American state as a party. 
I am a political scientist studying Supreme Court rulings. My goal is to figure out which general policy areas come up. "}, {"+": {}, "-": {}, "research goal": "The dataset includes facts from cases heard by the Supreme Court of the United States (SCOTUS). The two classes are generated based on who the parties are. The Group A snippets are facts from Supreme Court cases involving the United States as the plaintiff, while the Group B snippets are facts from Supreme Court cases involving the United States as the defendant. I am a political scientist studying Supreme Court rulings. My goal is to figure out which general policy areas come up. "}, {"+": {"involves a dispute over the interpretation of a federal law": {"p-value": 2.935980538499511e-12, "V'": 0.1747302775845525}, "involves a dispute over the interpretation of a federal statute": {"p-value": 7.328271158378707e-07, "V'": 0.13637053232027946}, "involves a dispute over federal government regulations": {"p-value": 7.011782467221373e-21, "V'": 0.2547399234633134}, "involves a dispute over the interpretation of federal statutes": {"p-value": 2.0263889025563024e-11, "V'": 0.17578068255614343}, "involves a dispute over the application of a tax": {"p-value": 0.00015448916235886401, "V'": 0.050238253173212524}}, "-": {"involves a dispute over the constitutionality of a state law": {"p-value": 4.248505410225847e-28, "V'": 0.2500289078377263}, "involves a dispute over the interpretation of a state or local law": {"p-value": 2.8602435116028505e-44, "V'": 0.3580329906535639}, "Involves a dispute over the interpretation of a state law": {"p-value": 1.231099006088388e-33, "V'": 0.3023870526289341}}, "research goal": "The dataset includes facts from cases heard by the Supreme Court of the United States (SCOTUS). The two classes are generated based on who the parties are. 
The Group A snippets are facts from Supreme Court cases involving the United States as a party, while the Group B snippets are facts from Supreme Court cases without the United States as a party. I am a political scientist studying Supreme Court rulings. My goal is to figure out which general policy areas come up. "}, {"+": {"uses specific and accurate language": {"p-value": 0.0009873432453108941, "V'": 0.11721787441630749}, "avoids overgeneralizations": {"p-value": 3.011854936717952e-08, "V'": 0.32488778995583273}}, "-": {}, "research goal": "The dataset includes short answers from students. The two classes are generated based on how good the score was. The Group A snippets are good short answers, while the Group B snippets are bad short answers. I am a student who has to write several responses to short answer questions. My goal is to figure out the specific writing style of good writers, so that I can improve my own writing. "}, {"+": {}, "-": {}, "research goal": "The dataset includes short answers from students. The two classes are generated based on how good the score was. The Group A snippets are good short answers, while the Group B snippets are average short answers. I am a student who has to write several responses to short answer questions. My goal is to figure out the specific writing style of good writers, so that I can improve my own writing. "}, {"+": {}, "-": {}, "research goal": "The dataset includes top news headlines on Reddit, an online message board. The two classes are generated based on the state of the stock market on the day of the headline. The Group A snippets are headlines on days the stock market rises, while the Group B snippets are headlines on days the stock market falls. I am a quantitative trader trying to beat the market. My goal is to figure out what specific news events cause a stock market rally. 
"}, {"+": {"mentions feeling trapped in a situation": {"p-value": 9.314460273334602e-05, "V'": 0.08190692484057815}, "mentions a feeling of hopelessness": {"p-value": 1.4933654413909226e-08, "V'": 0.0995189888661332}, "mentions an act of violence": {"p-value": 4.759814043954736e-44, "V'": 0.2965798594756074}, "mentions feeling hopeless": {"p-value": 7.1698805552182795e-12, "V'": 0.1365840438153637}, "Mentions feeling hopeless": {"p-value": 3.9188623816431794e-08, "V'": 0.09620483771434751}, "mentions feeling hopeless and helpless": {"p-value": 3.3739184009784376e-05, "V'": 0.07213192648799527}, "mentions a sense of hopelessness": {"p-value": 8.209654537338938e-11, "V'": 0.1148158376287659}}, "-": {}, "research goal": "The dataset includes posts from r/SuicideWatch and r/depression, two forums on Reddit. The two classes are generated based on the suicidal intent of the author. The Group A snippets are notes from people who are suicidal, while the Group B snippets are notes from people who are depressed, but not suicidal. I am a psychiatrist hoping to better help my patients. My goal is to figure out the specific topics or tones brought up by people who actually plan to commit suicide. "}, {"+": {}, "-": {"mentions political defections and divisions within the Indian government": {"p-value": 0.0008494831816836811, "V'": 0.05288571629397118}}, "research goal": "The dataset includes headlines from Times of India news. The two classes are generated based on the year the headline was written. The Group A snippets are Indian news headlines from 2003, while the Group B snippets are Indian news headlines from 2004. I am an Indian historian writing about recent trends. My goal is to figure out which topics dominated the news from year to year. "}, {"+": {"mentions the Noida serial killings": {"p-value": 1.0212085482599552e-05, "V'": 0.023170814559612934}}, "-": {}, "research goal": "The dataset includes headlines from Times of India news. 
The two classes are generated based on the year the headline was written. The Group A snippets are Indian news headlines from 2007, while the Group B snippets are Indian news headlines from 2008. I am an Indian historian writing about recent trends. My goal is to figure out which topics dominated the news from year to year. "}, {"+": {}, "-": {"mentions the impact of the Citizenship Amendment Act": {"p-value": 6.744894938090618e-07, "V'": 0.029520425467547864}, "mentions the implementation of policies like the GST and CAA": {"p-value": 2.044256117924353e-07, "V'": 0.03815736529861593}}, "research goal": "The dataset includes headlines from Times of India news. The two classes are generated based on the year the headline was written. The Group A snippets are Indian news headlines from 2019, while the Group B snippets are Indian news headlines from 2020. I am an Indian historian writing about recent trends. My goal is to figure out which topics dominated the news from year to year. "}, {"+": {}, "-": {}, "research goal": "The dataset includes testimonies from witnesses in real trials. The two classes are generated based on whether the testimony turned out to be deceptive. The Group A snippets are truthful testimony in criminal trials, while the Group B snippets are deceptive testimony in criminal trials. I am a judge concerned about false testimony. My goal is to figure out the specific claims of innocent people. 
"}, {"+": {"references sports, such as football or basketball": {"p-value": 7.782299857341818e-20, "V'": 0.08830245362023313}, "talks about sports or sports teams": {"p-value": 3.0768900552763234e-20, "V'": 0.09369931637150339}, "references sports, such as football, basketball, and golf": {"p-value": 8.082968812446541e-18, "V'": 0.08171935845277029}, "discusses sports, such as football, basketball and baseball": {"p-value": 7.205549259863597e-19, "V'": 0.08465808378136602}, "references professional sports, such as football, baseball, and basketball": {"p-value": 4.8343844765048995e-14, "V'": 0.06275875834367425}, "references sports, such as teams, athletes, and games": {"p-value": 5.350896228747678e-23, "V'": 0.10563898191163801}, "discusses sports or athletic activities": {"p-value": 1.9785624640641454e-18, "V'": 0.08984016580564111}, "references current events and politics": {"p-value": 2.4001149068337893e-06, "V'": 0.04468547362917911}, "mentions or references current affairs and politics": {"p-value": 1.7364657523290886e-05, "V'": 0.042960473910166296}, "discusses sports and gaming": {"p-value": 1.1632567070091724e-23, "V'": 0.11780682569207124}}, "-": {"discusses relationships, such as family, friends, and romantic partners": {"p-value": 2.211519688734741e-16, "V'": 0.10640905610175871}, "mentions of beauty, fashion, or makeup": {"p-value": 0.00021937662949796997, "V'": 0.02064962475921126}, "mentions of family, such as siblings, parents, and children": {"p-value": 0.0001638669839285328, "V'": 0.03056096022581035}, "mentions of emotions, such as love, heartache, and sadness": {"p-value": 1.238198761513058e-13, "V'": 0.10474062965794849}, "discusses relationships, such as with family, friends, and partners": {"p-value": 7.458643622171588e-17, "V'": 0.10770349213091915}, "mentions relationships, such as with friends, family, and partners": {"p-value": 1.644086465804945e-15, "V'": 0.1052547654356459}, "mentions or references to romantic relationships, such as 
marriage and dating": {"p-value": 4.7228567317257276e-09, "V'": 0.05610292555941468}, "expressions of appreciation and positivity": {"p-value": 1.894711743924453e-05, "V'": 0.06358206441623207}, "discusses topics related to fashion, makeup and beauty": {"p-value": 0.0004912897658744113, "V'": 0.02057356635682398}}, "research goal": "The dataset includes random Tweets. The two classes are generated based on whether the Twitter user is male or female. The Group A snippets are Tweets from male users, while the Group B snippets are Tweets from female users. I am a gender studies researcher. My goal is to figure out the specific topics that each gender tends to talk about. "}, {"+": {"uses sports references": {"p-value": 1.5332609610795306e-17, "V'": 0.10302411202338352}, "uses sports-related terminology": {"p-value": 1.2389851555485766e-17, "V'": 0.10435941936072546}, "uses sports-related language": {"p-value": 2.250634759038789e-19, "V'": 0.1078045490431202}, "talks about sports": {"p-value": 3.8841354278016015e-20, "V'": 0.10765091147257555}, "mentions current events": {"p-value": 0.00014360823777648216, "V'": 0.037984499550681276}, "uses references to current events": {"p-value": 0.00014009199595444601, "V'": 0.04018884960564145}, "mentions sports or activities": {"p-value": 3.6563077865421355e-16, "V'": 0.10234198211587314}}, "-": {"talks about personal topics": {"p-value": 5.626315000605152e-32, "V'": 0.2522211914745754}, "uses emotional language": {"p-value": 9.601689859034291e-12, "V'": 0.1231848232092507}, "uses emotive language": {"p-value": 7.530873838046391e-10, "V'": 0.12288155431021702}, "uses emotionally expressive language": {"p-value": 1.1617508792667334e-11, "V'": 0.11669173349136708}, "talks about emotions": {"p-value": 7.552410141468176e-11, "V'": 0.1088952364586766}}, "research goal": "The dataset includes random Tweets. The two classes are generated based on whether the Twitter user is male or female. 
The Group A snippets are Tweets from male users, while the Group B snippets are Tweets from female users. I am a gender studies researcher. My goal is to figure out the speaking style of each gender. "}, {"+": {}, "-": {}, "research goal": "The dataset includes Tweets about various rumors. The two classes are generated based on how long it's been since the rumor started. The Group A snippets are early Twitter rumors about Denzel Washington praising Trump, while the Group B snippets are later Twitter rumors about Denzel Washington praising Trump. I am a sociologist studying the patterns of rumors. My goal is to figure out the general tone of rumors as they evolve over time. "}, {"+": {"is focused on the actual event or game": {"p-value": 0.00033451237923086175, "V'": 0.4500005945934858}}, "-": {}, "research goal": "The dataset includes Tweets about various rumors. The two classes are generated based on how long it's been since the rumor started. The Group A snippets are early Twitter rumors about the Redhawks, while the Group B snippets are later Twitter rumors about the Redhawks. I am a sociologist studying the patterns of rumors. My goal is to figure out the general tone of rumors as they evolve over time. 
"}, {"+": {"shows an increased acceptance of the rumor": {"p-value": 0.00021423717127691758, "V'": 0.2729588603721602}, "features references to the new laws legalizing marijuana": {"p-value": 0.00021420876001074544, "V'": 0.27296201265052533}, "will contain retweets": {"p-value": 1.504708480447157e-08, "V'": 0.48214441766457217}, "uses language that is supportive of the rumor": {"p-value": 9.019035491137712e-05, "V'": 0.2933690144803712}, "emphasizes the novelty of the rumor": {"p-value": 5.016325640820289e-05, "V'": 0.29081680139675614}, "include excitement and anticipation": {"p-value": 4.159696289629476e-07, "V'": 1.7239580935931978e-06}, "shows excitement about the new character": {"p-value": 1.7276744666825265e-07, "V'": 2.828403266096515e-06}, "contains factual statements": {"p-value": 0.00021420590910454373, "V'": 0.2729621892720032}}, "-": {"includes positive language and enthusiasm": {"p-value": 0.0006535303393703584, "V'": 0.1836747286428644}, "includes direct questions about the rumor": {"p-value": 0.0002218603887066319, "V'": 1.2767056511674206e-06}}, "research goal": "The dataset includes Tweets about various rumors. The two classes are generated based on how long it's been since the rumor started. The Group A snippets are early Twitter rumors about a Veggietales cannabis character, while the Group B snippets are later Twitter rumors about a Veggietales cannabis character. I am a sociologist studying the patterns of rumors. My goal is to figure out the general tone of rumors as they evolve over time. 
"}, {"+": {"include facts or evidence to back up the rumor": {"p-value": 5.823819709896647e-06, "V'": 0.4468079370627908}, "uses language that is certain about the truth of the rumor": {"p-value": 0.0007666545215679397, "V'": 0.3404258960810855}, "contains references to the cost of the yacht": {"p-value": 1.4974535911918823e-05, "V'": 0.425532346612065}, "contains evidence to support the rumor": {"p-value": 1.444051054470748e-07, "V'": 0.5106386574133133}, "emphasizes the wealth of Mark Zuckerberg": {"p-value": 0.0007538288223170234, "V'": 0.2978716469398177}, "emphasizes the cost of the yacht": {"p-value": 1.4973941157398459e-05, "V'": 0.42553340513285676}}, "-": {"includes direct refutations of the rumor": {"p-value": 4.010928715462053e-08, "V'": 0.48936275811623653}, "uses language that is cautious and questioning of the rumor": {"p-value": 2.8299217273557883e-09, "V'": 0.5319144514954028}, "is skeptical in tone and expresses doubt about the rumor": {"p-value": 4.192001816141993e-12, "V'": 0.5957455188895151}, "uses language of disbelief and/or disbelief": {"p-value": 6.305830357994397e-05, "V'": 0.2765946673510653}, "uses language that is critical of the rumor": {"p-value": 2.8298268880444312e-09, "V'": 0.531914976732124}, "expresses doubt and skepticism": {"p-value": 2.4241417195869e-05, "V'": 0.361701803724825}, "contains fact-checking and questioning of the rumor": {"p-value": 4.0109747578320766e-08, "V'": 0.4893621950375066}, "uses direct language to convey doubt": {"p-value": 5.255670670536907e-06, "V'": 0.361702831843535}, "expresses doubt and skepticism about the rumor": {"p-value": 1.0762930067611023e-10, "V'": 0.5531921419686204}, "reflects surprise and wonderment": {"p-value": 0.000561396708037134, "V'": 0.2413267834740826}, "contains questions about the authenticity of the rumor": {"p-value": 1.7035005130239356e-06, "V'": 0.4045057072218097}, "uses language that expresses surprise": {"p-value": 0.0008096734017501982, "V'": 0.2553200624027522}, 
"includes speculation": {"p-value": 0.0006484514886365246, "V'": 0.31914957274908773}}, "research goal": "The dataset includes Tweets about various rumors. The two classes are generated based on how long it's been since the rumor started. The Group A snippets are early Twitter rumors about Zuckerberg buying a yacht, while the Group B snippets are later Twitter rumors about Zuckerberg buying a yacht. I am a sociologist studying the patterns of rumors. My goal is to figure out the general tone of rumors as they evolve over time. "}, {"+": {"mentions products or services": {"p-value": 8.575653877876996e-48, "V'": 0.2602162897094485}, "mentions specific brands or products": {"p-value": 0.00011144837838765398, "V'": 0.04259468733590675}, "promotes a product or service": {"p-value": 1.126122892539727e-68, "V'": 0.29131818590127245}, "contains references to financial topics such as investing or trading": {"p-value": 2.496530765804768e-60, "V'": 0.23659905003386109}, "contains references to investments and financial markets": {"p-value": 1.2463925266526876e-58, "V'": 0.23131338714114413}, "contains links to external websites": {"p-value": 3.833884473103326e-209, "V'": 0.5895774961946452}, "contains keywords related to business, finance or trading": {"p-value": 1.4509157679845e-95, "V'": 0.35406372274116515}, "contains promotional language or offers": {"p-value": 3.2490378278380084e-107, "V'": 0.39518654885535254}, "contains URLs for websites or videos": {"p-value": 5.149790917704342e-204, "V'": 0.584139047632098}, "contains references to commodities and trading": {"p-value": 6.820781930040868e-47, "V'": 0.18541847199594497}, "contains URLs or links to external websites": {"p-value": 1.0340727500173477e-213, "V'": 0.5957783467303563}, "contains references to technology": {"p-value": 2.7804207836322742e-14, "V'": 0.17776927024215194}}, "-": {"uses casual language": {"p-value": 1.1417157879922157e-61, "V'": 0.2665002747422366}, "includes images or videos": {"p-value": 
1.980854241001658e-07, "V'": 0.04561577848553408}, "contains references to personal experiences": {"p-value": 5.5429860509555806e-08, "V'": 0.030330499091605397}, "mentions locations or geographic areas": {"p-value": 1.9266055396727957e-11, "V'": 0.0805248240881565}, "involves conversations between users": {"p-value": 8.698128022389293e-46, "V'": 0.15803389415443703}, "contains references to popular culture": {"p-value": 1.15719660077974e-09, "V'": 0.05926687133771159}}, "research goal": "The dataset includes Tweets from users identified as bots or humans. The two classes are generated based on the type of user posting the Tweet. The Group A snippets were Tweeted by bots, while the Group B snippets were Tweeted by humans. I am an engineer at a social media company building a spam detector. My goal is to figure out the specific topics discussed by different kinds of users, so I can detect bots. "}, {"+": {"mentions other users in the tweet": {"p-value": 1.0234387534525444e-40, "V'": 0.30370004371672876}, "contains references to music, artists, or songs": {"p-value": 1.549150192566952e-39, "V'": 0.14175086596199374}, "contains references to music or musicians": {"p-value": 3.1751578688004426e-41, "V'": 0.1491953243418909}, "contains references to music and music videos": {"p-value": 4.8674361035453e-46, "V'": 0.15198955902438213}, "references to popular music and lyrics": {"p-value": 1.8818361704725664e-34, "V'": 0.11476511158034122}, "contains references to popular culture and music": {"p-value": 6.500819076441603e-44, "V'": 0.16542566242529938}}, "-": {"contains references to money-making opportunities": {"p-value": 1.923407061942653e-150, "V'": 0.41907316025683833}, "mentions products or services": {"p-value": 8.889415444082275e-104, "V'": 0.38565661450085986}, "references virtual goods": {"p-value": 1.2327301179951134e-08, "V'": 0.06829724692203501}, "contains references to investing or forex trading": {"p-value": 1.1184207528722442e-165, "V'": 
0.441613632652196}, "contains references to financial markets and investments": {"p-value": 7.750835094063824e-185, "V'": 0.473259732607997}, "contains references to money, investments, or finances": {"p-value": 1.867487464214208e-228, "V'": 0.556142638054606}, "contains references to financial markets or investments": {"p-value": 1.3522594672382396e-180, "V'": 0.46847612083960904}, "contains links to online stores or products": {"p-value": 3.9528653044624575e-15, "V'": 0.10004960164415125}, "contains links to external sites": {"p-value": 2.7579180706306836e-301, "V'": 0.6198405154136418}, "contains references to business or financial topics": {"p-value": 1.0666313375651934e-290, "V'": 0.621771360785819}, "contains links to external websites": {"p-value": 1.8534702335804292e-253, "V'": 0.5896263892693001}, "contains links to other websites or media": {"p-value": 2.1005679945868342e-254, "V'": 0.5903718020051172}, "contains promotional links": {"p-value": 5.366154243168054e-280, "V'": 0.6185236666907602}}, "research goal": "The dataset includes Tweets from users identified as bots or humans. The two classes are generated based on the type of user posting the Tweet. The Group A snippets were Tweeted by bots that pretend to be people, while the Group B snippets were Tweeted by weaker bots. I am an engineer at a social media company building a spam detector. My goal is to figure out the specific topics discussed by different kinds of users, so I can detect bots. "}, {"+": {"are written in an informal tone, such as using slang terms": {"p-value": 1.0412694616637158e-07, "V'": 0.6265435202573989}}, "-": {}, "research goal": "The dataset includes a collection of Tweets without emojis. The two classes are generated based on whether the Tweet has a misspelling. The Group A snippets misspell \"going\", while the Group B snippets don't misspell \"going\". I am a linguist studying English vernacular. 
My goal is to figure out the social context in which people are more likely to use a misspelled word. "}, {"+": {"mentions an activity such as shopping, work, or school": {"p-value": 0.0007529399409494078, "V'": 0.3851269509530925}, "uses slang or colloquial expressions, such as 'tapauing' or 'luh'": {"p-value": 3.4567702913346156e-07, "V'": 0.7672283346664309}, "is talking about the present or recent activities": {"p-value": 0.0006231561833842822, "V'": 0.3080671511613244}}, "-": {}, "research goal": "The dataset includes a collection of Tweets without emojis. The two classes are generated based on whether the Tweet has a misspelling. The Group A snippets misspell \"that\", while the Group B snippets don't misspell \"that\". I am a linguist studying English vernacular. My goal is to figure out the social context in which people are more likely to use a misspelled word. "}, {"+": {}, "-": {}, "research goal": "The dataset includes a collection of Tweets without emojis. The two classes are generated based on whether the Tweet has a misspelling. The Group A snippets misspell \"with\", while the Group B snippets don't misspell \"with\". I am a linguist studying English vernacular. My goal is to figure out the social context in which people are more likely to use a misspelled word. "}, {"+": {"uses informal language": {"p-value": 0.0009349588896811016, "V'": 0.10628386880391272}, "uses informal language such as slang or colloquialisms": {"p-value": 6.409566489994162e-10, "V'": 0.28105659117885906}, "uses informal language such as 'LOL' or 'LMAO'": {"p-value": 4.129810471026844e-05, "V'": 0.11985868808352365}, "contains informal language such as 'gonna' or 'wanna'": {"p-value": 0.00039607792127415017, "V'": 0.14988640733975253}, "uses informal language and slang": {"p-value": 7.94263409131706e-12, "V'": 0.3161700237281978}}, "-": {}, "research goal": "The dataset includes a collection of Tweets without emojis. 
The two classes are generated based on whether the Tweet has a misspelling. The Group A snippets misspell \"your\", while the Group B snippets don't misspell \"your\". I am a linguist studying English vernacular. My goal is to figure out the social context in which people are more likely to use a misspelled word. "}, {"+": {}, "-": {"emphasizes the need for a global anti-terrorist coalition": {"p-value": 7.7664732504292e-05, "V'": 0.0787786786250217}, "mentions the need for collective responsibility against terrorism": {"p-value": 0.0003270782536614217, "V'": 0.10369615289341623}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the year the speech was given. The Group A snippets are speeches from Russia between 2008 and 2012, while the Group B snippets are speeches from Russia between 2000 and 2008. I am a political scientist studying historical trends. My goal is to figure out the specific policy priorities and stances of countries over time. "}, {"+": {}, "-": {}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the year the speech was given. The Group A snippets are speeches from Russia between 2008 and 2012, while the Group B snippets are speeches from Russia after 2012. I am a political scientist studying historical trends. My goal is to figure out the specific policy priorities and stances of countries over time. "}, {"+": {}, "-": {"References the five principles of peaceful coexistence": {"p-value": 9.379963046827146e-05, "V'": 0.10163653557368652}, "mentions the need to uphold the principles of the United Nations Charter": {"p-value": 0.00013752346280400564, "V'": 0.2071147289253532}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the year the speech was given. 
The Group A snippets are speeches from China between 2000 and 2013, while the Group B snippets are speeches from China between 2013 and 2016. I am a political scientist studying historical trends. My goal is to figure out the specific policy priorities and stances of countries over time. "}, {"+": {}, "-": {}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the year the speech was given. The Group A snippets are speeches from China between 2013 and 2016, while the Group B snippets are speeches from China after 2016. I am a political scientist studying historical trends. My goal is to figure out the specific policy priorities and stances of countries over time. "}, {"+": {"stresses the need to confront terrorism and terrorist organizations": {"p-value": 7.717150251085255e-07, "V'": 0.13776438584804662}, "emphasizes the importance of the United Nations Charter": {"p-value": 4.1128880298303974e-05, "V'": 0.07195000877870128}, "mentions the importance of defeating terrorism": {"p-value": 7.238098619633103e-07, "V'": 0.11613856490824195}, "mentions the threats of terrorism and violence": {"p-value": 0.00020911086701498954, "V'": 0.10799340816977293}}, "-": {}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the year the speech was given. The Group A snippets are speeches from the United States between 2001 and 2008, while the Group B snippets are speeches from the United States between 2009 and 2016. I am a political scientist studying historical trends. My goal is to figure out the specific policy priorities and stances of countries over time. 
"}, {"+": {}, "-": {"emphasizes the importance of upholding national borders": {"p-value": 1.9708073240515317e-33, "V'": 0.3876915780307263}, "emphasizes the need for strong military defense": {"p-value": 2.6348585311947363e-09, "V'": 0.17347344005260423}, "emphasizes the need to protect national borders": {"p-value": 1.0834992690597518e-27, "V'": 0.32486709061946556}, "emphasizes the need for secure borders": {"p-value": 6.663781843689878e-07, "V'": 0.12455161593017311}, "mentions the need to protect national sovereignty and independence": {"p-value": 7.785342293494141e-10, "V'": 0.2830694684145948}, "discusses the need for fair and reciprocal trade": {"p-value": 3.7719984559811557e-07, "V'": 0.11585878734436338}, "mentions the need for strong economic sanctions on Iran": {"p-value": 2.6590442318232367e-06, "V'": 0.05266010673839446}, "mentions tariffs as a means to protect American interests": {"p-value": 0.0002574026452945424, "V'": 0.021582730768682987}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the year the speech was given. The Group A snippets are speeches from the United States between 2009 and 2016, while the Group B snippets are speeches from the United States after 2017. I am a political scientist studying historical trends. My goal is to figure out the specific policy priorities and stances of countries over time. 
"}, {"+": {"calls for an early and successful conclusion of negotiations between the Soviet Union and the United States on intermediate and strategic weapons": {"p-value": 3.829628776967863e-05, "V'": 0.034761877292149045}, "expresses support for the United Nations and its role in international conflict resolution": {"p-value": 2.872464536289711e-06, "V'": 0.13534360453397942}}, "-": {"calls for an end to the occupation of land by foreign forces": {"p-value": 3.852808595219794e-18, "V'": 0.23530877810690906}, "expresses support for the Palestinian people and their right to self-determination": {"p-value": 4.934476736223412e-13, "V'": 0.15475547961321148}, "condemns the actions of the Israeli government to annex Jerusalem and continue its policy of settlements": {"p-value": 2.8072390980254342e-15, "V'": 0.16904236577790446}, "urges for the resolution of the Palestinian cause": {"p-value": 5.632976566882718e-09, "V'": 0.11662529976652755}, "expresses support for the right to self-determination of the Palestinian people": {"p-value": 2.3112235959565316e-09, "V'": 0.11199138081730811}, "mentions the need for a just and comprehensive peace in the Middle East": {"p-value": 7.305076637244374e-12, "V'": 0.1671577379785949}, "calls for the recognition of the Palestinian people and the Palestine Liberation Organization": {"p-value": 9.200887135880055e-05, "V'": 0.05803215257141399}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the country giving the speech. The Group A snippets are speeches from Israel in the 80s, while the Group B snippets are speeches from members of the Arab League in the 80s. I am a political scientist studying comparative government. My goal is to figure out the specific policy priorities and stances of each country. 
"}, {"+": {"Highlights the importance of the non-proliferation treaty": {"p-value": 0.00022891763048994636, "V'": 0.03292720474687252}, "calls for rapid progress towards the elimination of nuclear weapons": {"p-value": 0.00013409421762684712, "V'": 0.02037068705914084}, "stresses the importance of international cooperation for sustainable development": {"p-value": 3.0189275950874844e-08, "V'": 0.1132546297526523}}, "-": {"calls for the implementation of Security Council resolutions in the Middle East": {"p-value": 1.9226667766415314e-09, "V'": 0.049637145582428696}, "calls for the withdrawal of foreign military forces from the region": {"p-value": 5.311246210433806e-05, "V'": 0.021576985265587875}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the country giving the speech. The Group A snippets are speeches from Israel in the 90s, while the Group B snippets are speeches from members of the Arab League in the 90s. I am a political scientist studying comparative government. My goal is to figure out the specific policy priorities and stances of each country. 
"}, {"+": {"expresses support for United Nations-led peacekeeping operations": {"p-value": 9.072261916800542e-12, "V'": 0.1466522322483584}, "encourages the United Nations to maintain a strong presence in the region": {"p-value": 3.2169078938393713e-06, "V'": 0.10409887390856518}, "mentions the need to address the spread of weapons of mass destruction": {"p-value": 2.467809776651994e-06, "V'": 0.05141114160703139}, "calls for the implementation of the Good Friday Agreement": {"p-value": 1.038575238586174e-06, "V'": 0.042839369974068564}, "mentions the importance of democratic governance, civil society and the rule of law": {"p-value": 0.0007153912817829824, "V'": 0.07456631411407078}}, "-": {"urges for the withdrawal of Israeli forces from the occupied territories": {"p-value": 1.4242030733801184e-10, "V'": 0.08826564772887556}, "refers to the peace process in the Middle East as a Syrian national priority": {"p-value": 2.5040404583599774e-15, "V'": 0.10858281468911166}, "calls for a just solution to the Palestinian refugee crisis": {"p-value": 3.723408686587322e-06, "V'": 0.05356028646746449}, "emphasizes the right of people under foreign occupation to resist occupation, in accordance with international law and the Charter of the United Nations": {"p-value": 8.119293517282432e-20, "V'": 0.14500863227188932}, "calls for a fair and just solution to the political problems in the Middle East": {"p-value": 1.1100703094880964e-09, "V'": 0.13159943562860943}, "mentions the need to end the illegal occupation of the Palestinian territories": {"p-value": 3.369540333885973e-15, "V'": 0.125178279148226}, "calls for Israel to listen to the international community on settlement issues": {"p-value": 4.1385407654005753e-11, "V'": 0.11003647125650443}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the country giving the speech. 
The Group A snippets are speeches from Israel from 2000-2009, while the Group B snippets are speeches from members of the Arab League from 2000-2009. I am a political scientist studying comparative government. My goal is to figure out the specific policy priorities and stances of each country. "}, {"+": {"emphasizes the value of collective action through the United Nations to address global challenges": {"p-value": 5.071749403178044e-22, "V'": 0.2661571526612601}, "mentions the importance of gender equality and women's empowerment": {"p-value": 1.0946877542648232e-08, "V'": 0.050532328342588306}, "highlights the importance of the European Union in promoting collective action": {"p-value": 3.59576832543016e-05, "V'": 0.04330690922250841}, "expresses support for the United Nations and its role in promoting and protecting human rights": {"p-value": 1.5680570876482694e-24, "V'": 0.29029129658237585}}, "-": {"calls for the end of occupation in Palestine": {"p-value": 1.7365944194913752e-10, "V'": 0.12915289185815132}, "condemns unilateral actions taken by Israel to perpetuate its occupation of the region": {"p-value": 1.2982285609993507e-15, "V'": 0.1586249118835512}, "mentions the importance of national autonomy and self-determination": {"p-value": 1.2542859382386605e-06, "V'": 0.11409239494118759}, "calls for the end of unilateral actions that seek to pre-empt negotiations": {"p-value": 3.519453397839336e-10, "V'": 0.14648101114084852}, "calls for the end of the occupation of Palestine by the occupying Power": {"p-value": 1.6416254585941545e-11, "V'": 0.12930123565283103}, "expresses concern about the continued occupation of Palestinian territories": {"p-value": 3.663525734921525e-09, "V'": 0.13196012633540982}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the country giving the speech. 
The Group A snippets are speeches from Israel in the 2010s, while the Group B snippets are speeches from members of the Arab League in the 2010s. I am a political scientist studying comparative government. My goal is to figure out the specific policy priorities and stances of each country. "}, {"+": {"emphasizes the importance of international co-operation and trust to achieve peace and security": {"p-value": 0.0002302124619584445, "V'": 0.10033865607628623}, "calls for a political solution to the conflict in Northern Ireland": {"p-value": 5.2058795974239165e-15, "V'": 0.07857062989037311}, "calls for a comprehensive test-ban treaty or moratorium on all nuclear tests": {"p-value": 0.0009244562581220877, "V'": 0.017376895463277194}}, "-": {"stresses the importance of the right of the Afghan people to self-determination": {"p-value": 0.0001600133275198867, "V'": 0.04537772440605382}, "Condemns the Israeli regime's disregard for international law and human rights": {"p-value": 2.1410181376562752e-16, "V'": 0.16423679557067566}, "calls for international co-operation in a framework of peace to make the Gulf region a zone free of international conflicts": {"p-value": 2.6274118138598888e-08, "V'": 0.09614042847704862}, "emphasizes the need for a resolution to the Arab-Israeli conflict": {"p-value": 8.190512268042822e-07, "V'": 0.11228278524668082}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the country giving the speech. The Group A snippets are speeches from Israel in the 80s, while the Group B snippets are speeches from members of the Gulf Cooperation Council in the 80s. I am a political scientist studying comparative government. My goal is to figure out the specific policy priorities and stances of each country. 
"}, {"+": {"acknowledges the importance of the Anglo-Irish Agreement to securing peace": {"p-value": 4.652682992959379e-24, "V'": 0.12858168521498614}, "calls for proper funding of United Nations peacekeeping operations": {"p-value": 0.000628012920540955, "V'": 0.029560305413907653}}, "-": {"calls for the implementation of Security Council resolutions regarding Iraq": {"p-value": 2.07768155916984e-11, "V'": 0.07475650413001989}, "advocates for the implementation of Security Council resolutions related to Iraq": {"p-value": 8.425869815088741e-11, "V'": 0.08924239944041923}, "argues for consideration of particular circumstances for developing countries when implementing new agreements": {"p-value": 1.2735649382294929e-05, "V'": 0.0663862913531471}, "expresses support for the Middle East Peace Process": {"p-value": 7.15058047553611e-07, "V'": 0.10154502234654539}, "highlights the need for increased economic solidarity among Gulf Cooperation Council members": {"p-value": 0.00017445702738024492, "V'": 0.035786832217852144}, "urges friendly countries to support the accession procedures of the International Organization": {"p-value": 2.751186728707979e-05, "V'": 0.043058308326331486}, "calls for the implementation of United Nations Security Council resolutions": {"p-value": 1.4912169535280029e-06, "V'": 0.09175008723380965}, "Focuses on the need for economic development and stability in the region": {"p-value": 1.7268866106972162e-08, "V'": 0.12595866737165579}, "calls for the full withdrawal of Israeli forces from occupied territories": {"p-value": 6.6580097115545135e-09, "V'": 0.03978208682440786}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the country giving the speech. The Group A snippets are speeches from Israel in the 90s, while the Group B snippets are speeches from members of the Gulf Cooperation Council in the 90s. I am a political scientist studying comparative government. 
My goal is to figure out the specific policy priorities and stances of each country. "}, {"+": {"Highlights the importance of the United Nations in international peace and security": {"p-value": 3.434467678976419e-05, "V'": 0.09211492952661932}}, "-": {"supports the right of return of Palestinian refugees": {"p-value": 0.00025195827180654216, "V'": 0.02466250380632586}, "Calls for a just and comprehensive settlement of the Middle East conflict": {"p-value": 1.5364555444153926e-09, "V'": 0.10436726165725295}, "emphasizes the need for strong diplomatic ties with other Arab countries": {"p-value": 9.605899455296434e-08, "V'": 0.045104545128890325}, "Highlights the importance of security and peace in the Middle East": {"p-value": 3.3620675039815907e-15, "V'": 0.16765462022272706}, "expresses support for a two-state solution between Israel and Palestine": {"p-value": 0.0005285045696174711, "V'": 0.02805596321973599}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the country giving the speech. The Group A snippets are speeches from Israel from 2000-2009, while the Group B snippets are speeches from members of the Gulf Cooperation Council from 2000-2009. I am a political scientist studying comparative government. My goal is to figure out the specific policy priorities and stances of each country. "}, {"+": {"expresses support for the United Nations Charter and international law": {"p-value": 1.0310492185136256e-05, "V'": 0.1263489004968243}}, "-": {"condemns the use of religion for political purposes by extremist groups": {"p-value": 0.0007370465428613914, "V'": 0.04568185150256545}, "advocates for the resolution of conflicts in the Middle East": {"p-value": 1.90068801022695e-14, "V'": 0.20135703445219724}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the country giving the speech. 
The Group A snippets are speeches from Israel in the 2010s, while the Group B snippets are speeches from members of the Gulf Cooperation Council in the 2010s. I am a political scientist studying comparative government. My goal is to figure out the specific policy priorities and stances of each country. "}, {"+": {"Recognizes the need to evolve a broad-based system for managing the world economy": {"p-value": 2.642222518274018e-06, "V'": 0.19526671328886952}}, "-": {"Stresses the importance of regional cooperation and understanding between countries in South Asia": {"p-value": 0.00030649378277398666, "V'": 0.15038236936146826}, "supports the right of the Afghan people to self-determination": {"p-value": 1.330671563998536e-08, "V'": 0.15324710992867213}, "calls for the withdrawal of Soviet troops from Afghanistan": {"p-value": 1.8060523556500172e-08, "V'": 0.12161169268822952}, "calls for the complete withdrawal of Soviet troops from Afghanistan": {"p-value": 5.050388954494611e-06, "V'": 0.09220307509805042}, "calls for immediate withdrawal of foreign troops from Afghanistan": {"p-value": 2.738485589117204e-06, "V'": 0.09593459897800825}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the country giving the speech. The Group A snippets are speeches from India in the 80s, while the Group B snippets are speeches from Pakistan in the 80s. I am a political scientist studying comparative government. My goal is to figure out the specific policy priorities and stances of each country. 
"}, {"+": {"expresses support for the work and objectives of the United Nations": {"p-value": 2.90072698224627e-05, "V'": 0.11462542362160733}, "emphasizes the need to strengthen the United Nations in order to effectively address global issues": {"p-value": 1.7934410255672386e-07, "V'": 0.11037648826081842}, "urges the international community to come together to defend itself against terrorism": {"p-value": 0.0008583863280654588, "V'": 0.031497169890858545}}, "-": {"calls for the recognition of the right of self-determination of the Kashmiri people": {"p-value": 7.865539060697855e-11, "V'": 0.06997447301158168}, "mentions the need to promote dialogue and understanding between India and Pakistan": {"p-value": 3.2315251411859984e-05, "V'": 0.04453545384861598}, "calls for the implementation of a nuclear-weapons-free zone in South Asia": {"p-value": 0.0005883550710505858, "V'": 0.020356466504173738}, "highlights the need for a South Asia nuclear-free zone": {"p-value": 2.37250761069083e-05, "V'": 0.030534159941572936}, "urges international community to support and facilitate a solution to the Kashmir issue": {"p-value": 1.1519302151504699e-12, "V'": 0.09017513230298708}, "calls for the United Nations to take cognizance of disputes between two states": {"p-value": 3.298555986205849e-10, "V'": 0.10688520322658965}, "expresses support for Kashmiri self-determination": {"p-value": 2.3315091842818405e-17, "V'": 0.11408657772197368}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the country giving the speech. The Group A snippets are speeches from India in the 90s, while the Group B snippets are speeches from Pakistan in the 90s. I am a political scientist studying comparative government. My goal is to figure out the specific policy priorities and stances of each country. 
"}, {"+": {"emphasizes the need to reform international economic governance": {"p-value": 1.2552627948973e-07, "V'": 0.11403347339534706}}, "-": {"advocates for the resolution of international disputes affecting Muslims": {"p-value": 2.0131790075135237e-15, "V'": 0.1683830542259985}, "advocates for the implementation of a comprehensive strategy against terrorism": {"p-value": 0.00014968399553572657, "V'": 0.07743650978273174}, "Highlights the need for increased security cooperation between India and Pakistan": {"p-value": 0.0003030096691324053, "V'": 0.036734957011723905}, "calls for the resolution of international disputes that affect Muslims": {"p-value": 1.7971173652741493e-14, "V'": 0.1601022942290965}, "advocates for the resolution of political disputes involving Muslim peoples": {"p-value": 1.577059872851197e-17, "V'": 0.19423456775259332}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the country giving the speech. The Group A snippets are speeches from India from 2000-2009, while the Group B snippets are speeches from Pakistan from 2000-2009. I am a political scientist studying comparative government. My goal is to figure out the specific policy priorities and stances of each country. 
"}, {"+": {"expresses commitment to the implementation of the Sustainable Development Goals": {"p-value": 0.0006114361827659296, "V'": 0.11402874822468248}, "calls for global partnership to meet Sustainable Development Goals": {"p-value": 0.000318265944586215, "V'": 0.11205828784758873}, "calls on the international community to support India's efforts to achieve a meaningful and equitable agreement on climate change": {"p-value": 0.00016605275947467976, "V'": 0.054598263910721596}}, "-": {"calls for the implementation of United Nations Security Council Resolutions on self-determination": {"p-value": 0.0008820857500481679, "V'": 0.07111154386092186}, "mentions the rights of self-determination of the Kashmiri people": {"p-value": 4.3699000013541776e-05, "V'": 0.06760671030867459}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the country giving the speech. The Group A snippets are speeches from India in the 2010s, while the Group B snippets are speeches from Pakistan in the 2010s. I am a political scientist studying comparative government. My goal is to figure out the specific policy priorities and stances of each country. "}, {"+": {"emphasizes the need for a permanent peace structure between the two Koreas": {"p-value": 5.482066530589696e-10, "V'": 0.06283548719239825}, "Prioritizes dialogue and reconciliation between the South and North of Korea": {"p-value": 6.954851909547175e-11, "V'": 0.08615653472149783}, "calls for denuclearization of the Korean Peninsula": {"p-value": 8.104369848495268e-06, "V'": 0.0431094647787499}}, "-": {}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the country giving the speech. The Group A snippets are speeches from Korea in the 90s, while the Group B snippets are speeches from Japan in the 90s. I am a political scientist studying comparative government. 
My goal is to figure out the specific policy priorities and stances of each country. "}, {"+": {"mentions inter-Korean economic interdependence as a way to resolve the political confrontation between North and South": {"p-value": 0.0006221881682133554, "V'": 0.02119439996875949}, "encourages the denuclearization of the Korean peninsula": {"p-value": 6.302177152556003e-10, "V'": 0.08945949638225549}, "calls for the denuclearization of the Korean peninsula": {"p-value": 0.0003356608268588946, "V'": 0.035141848887037926}, "mentions the need for denuclearization of the Korean peninsula": {"p-value": 3.258683160339318e-05, "V'": 0.048904979423577716}, "Recommends strategies for peaceful resolution of conflicts on the Korean peninsula": {"p-value": 6.522270513140123e-05, "V'": 0.049134730013371763}, "supports the disarmament and non-proliferation of biological and chemical weapons": {"p-value": 0.00011106753431937288, "V'": 0.08973790726005998}, "encourages the reduction of nuclear arsenals": {"p-value": 0.0001963121490009331, "V'": 0.06602542045621632}, "highlights the importance of nuclear non-proliferation and the peaceful use of nuclear energy": {"p-value": 1.0223174778761608e-05, "V'": 0.08009068676123315}, "highlights the importance of non-proliferation efforts in the Asian region": {"p-value": 0.0008868868501584758, "V'": 0.06368899954187203}}, "-": {"highlights the role of Japan in promoting peace and security": {"p-value": 3.516682933487542e-50, "V'": 0.3388061896379088}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the country giving the speech. The Group A snippets are speeches from Korea from 2000-2009, while the Group B snippets are speeches from Japan from 2000-2009. I am a political scientist studying comparative government. My goal is to figure out the specific policy priorities and stances of each country. 
"}, {"+": {"Stresses the importance of developing and utilizing international institutions to maintain a world order": {"p-value": 1.1561909482590412e-07, "V'": 0.1790239618373356}, "calls for the international community to stand with Korea in tearing down the world's last remaining wall of division": {"p-value": 1.164534727542673e-07, "V'": 0.10206730127304535}, "mentions the role of the United Nations in protecting peace": {"p-value": 1.6042365180571284e-09, "V'": 0.2098471891351661}, "proposes the transformation of the demilitarized zone on the Korean peninsula into an international peace zone": {"p-value": 9.548531976155083e-07, "V'": 0.061797930937860825}}, "-": {"Highlights the need for gender equality in access to education and health care": {"p-value": 0.0009936622755303249, "V'": 0.046145949999073704}}, "research goal": "The dataset includes speeches from debates at the United Nations. The two classes are generated based on the country giving the speech. The Group A snippets are speeches from Korea in the 2010s, while the Group B snippets are speeches from Japan in the 2010s. I am a political scientist studying comparative government. My goal is to figure out the specific policy priorities and stances of each country. "}, {"+": {"uses dismissive language such as 'whatever'": {"p-value": 0.00014574614195468036, "V'": 0.06644938266187639}}, "-": {}, "research goal": "The dataset includes expert-annotated unhealthy conversations. The two classes are generated based on the type of unhealthy tone as annotated by experts. The Group A snippets are online messages that are dismissive, while the Group B snippets are online messages that are condescending. I am a couples therapist learning about different unhealthy conversation styles. My goal is to figure out what specific topics and actions define each tone. "}, {"+": {}, "-": {}, "research goal": "The dataset includes expert-annotated unhealthy conversations. 
The two classes are generated based on the type of unhealthy tone as annotated by experts. The Group A snippets are online messages that generalize, while the Group B snippets are online messages that generalize unfairly. I am a couples therapist learning about different unhealthy conversation styles. My goal is to figure out what specific topics and actions define each tone. "}, {"+": {"uses expletives or profanity": {"p-value": 3.676010723996173e-05, "V'": 0.07457121454029275}, "refers to other people in insulting terms": {"p-value": 0.00011531539872054932, "V'": 0.06533449660530699}, "refers to people in a derogatory way": {"p-value": 0.00023199153945068654, "V'": 0.06770805746831099}, "uses aggressive language or insults": {"p-value": 3.551418881776798e-06, "V'": 0.06800967468095853}, "refers to other people with hostile language": {"p-value": 1.70299835792292e-05, "V'": 0.07291450437157987}, "Uses profanity or offensive language": {"p-value": 8.872225069205985e-06, "V'": 0.08278975519481155}}, "-": {}, "research goal": "The dataset includes expert-annotated unhealthy conversations. The two classes are generated based on the type of unhealthy tone as annotated by experts. The Group A snippets are online messages that are hostile, while the Group B snippets are online messages that are antagonizing. I am a couples therapist learning about different unhealthy conversation styles. My goal is to figure out what specific topics and actions define each tone. "}, {"+": {"Uses positive language": {"p-value": 1.9400093764612578e-13, "V'": 0.11294017235572805}, "Has a positive or cheerful tone": {"p-value": 1.0395270294071903e-10, "V'": 0.08625244369130378}, "mentions controversial topics": {"p-value": 0.00047323269757031366, "V'": 0.07477257109485175}}, "-": {}, "research goal": "The dataset includes definitions from UrbanDictionary.com, a crowdsourced English dictionary. The two classes are generated based on how many upvotes or downvotes the definition received. 
The Group A snippets are Urban Dictionary definitions in the top 1% of downvotes, while the Group B snippets are average Urban Dictionary definitions. I am a user of the site hoping to write popular definitions. My goal is to figure out what types of proposed definitions people like. "}, {"+": {"mentions positive attributes of a person or thing": {"p-value": 4.303445236073637e-05, "V'": 0.07757323351268078}, "includes positive language": {"p-value": 0.00020714481483348303, "V'": 0.06662975374758132}, "mentions positive qualities of an individual": {"p-value": 8.96701573772972e-08, "V'": 0.08382264443129776}}, "-": {}, "research goal": "The dataset includes definitions from UrbanDictionary.com, a crowdsourced English dictionary. The two classes are generated based on how many upvotes or downvotes the definition received. The Group A snippets are Urban Dictionary definitions in the top 5% of upvotes, while the Group B snippets are Urban Dictionary definitions in the top 10% of upvotes. I am a user of the site hoping to write popular definitions. My goal is to figure out what types of proposed definitions people like. "}, {"+": {}, "-": {}, "research goal": "The dataset includes definitions from UrbanDictionary.com, a crowdsourced English dictionary. The two classes are generated based on how many upvotes or downvotes the definition received. The Group A snippets are Urban Dictionary definitions in the top 1% of upvotes, while the Group B snippets are Urban Dictionary definitions in the top 5% of upvotes. I am a user of the site hoping to write popular definitions. My goal is to figure out what types of proposed definitions people like. 
"}, {"+": {"expresses admiration or appreciation": {"p-value": 2.552621870670163e-11, "V'": 0.09374983930251286}, "praises the object of the definition": {"p-value": 2.1068879070887654e-15, "V'": 0.1283006223780111}}, "-": {}, "research goal": "The dataset includes definitions from UrbanDictionary.com, a crowdsourced English dictionary. The two classes are generated based on how many upvotes or downvotes the definition received. The Group A snippets are Urban Dictionary definitions in the top 1% of upvotes, while the Group B snippets are average Urban Dictionary definitions. I am a user of the site hoping to write popular definitions. My goal is to figure out what types of proposed definitions people like. "}, {"+": {"uses positive words or phrases": {"p-value": 3.642103545084297e-05, "V'": 0.08153982558583417}, "uses positive language or humor": {"p-value": 5.074337420147415e-05, "V'": 0.07468812871413438}}, "-": {}, "research goal": "The dataset includes definitions from UrbanDictionary.com, a crowdsourced English dictionary. The two classes are generated based on how many upvotes or downvotes the definition received. The Group A snippets are Urban Dictionary definitions in the top 5% of upvotes, while the Group B snippets are Urban Dictionary definitions in the top 10% of upvotes. I am a user of the site hoping to write popular definitions. My goal is to figure out what types of proposed definitions people like. "}, {"+": {}, "-": {}, "research goal": "The dataset includes descriptions of companies that were part of the Y Combinator startup incubator. The two classes are generated based on where the startup was founded. The Group A snippets are Y Combinator startup descriptions from the Bay Area, while the Group B snippets are Y Combinator startup descriptions outside the Bay Area. I am an aspiring entreprenuer deciding whether to move to the bay. My goal is to figure out how location influences the service or product offered. 
"}, {"+": {"provides an online personality test game": {"p-value": 0.0008844757551706632, "V'": 0.030769098038447674}}, "-": {}, "research goal": "The dataset includes descriptions of companies that were part of the Y Combinator startup incubator. The two classes are generated based on the operation status of the startup. The Group A snippets are Y Combinator startup descriptions that are dead, while the Group B snippets are Y Combinator startup descriptions that are still operating. I am a venture capital firm deciding which startups to fund. My goal is to figure out what services or products are more likely to succeed. "}, {"+": {"allows users to share photos and memories": {"p-value": 6.164037201324806e-05, "V'": 0.11245590996958646}, "enables users to create and share content": {"p-value": 1.1302348716526026e-06, "V'": 0.2859152861437422}}, "-": {}, "research goal": "The dataset includes descriptions of companies that were part of the Y Combinator startup incubator. The two classes are generated based on the operation status of the startup. The Group A snippets are Y Combinator startup descriptions that have exited, while the Group B snippets are Y Combinator startup descriptions that are still operating. I am a venture capital firm deciding which startups to fund. My goal is to figure out what services or products are more likely to succeed. "}, {"+": {}, "-": {"uses cloud-based technology": {"p-value": 5.588113549047489e-09, "V'": 0.266055884876257}, "Involves cloud-based technology": {"p-value": 1.6893565886245985e-12, "V'": 0.3362767618244648}, "mentions cloud-based services": {"p-value": 2.183856051548368e-05, "V'": 0.18079324706330965}, "mentions internet of things technology": {"p-value": 2.2335930021136893e-08, "V'": 0.26140874401754355}, "mention cloud-based services": {"p-value": 2.662438102358566e-06, "V'": 0.20679791619468815}}, "research goal": "The dataset includes descriptions of companies that were part of the Y Combinator startup incubator. 
The two classes are generated based on when the startup was founded. The Group A snippets are Y Combinator startup descriptions from before 2013, while the Group B snippets are Y Combinator startup descriptions from after 2013. I am a venture capital firm studying trends in the industry. My goal is to figure out which products and services are most exciting today. "}, {"+": {}, "-": {"uses long sentences with complex syntax.": {"p-value": 5.847644610798643e-10, "V'": 0.15953322181701968}, "asks questions with a lot of qualifiers": {"p-value": 0.00048298877159170357, "V'": 0.10425575551843624}, "contains complex sentence structures, such as subordinate clauses.": {"p-value": 6.958479775897332e-10, "V'": 0.1883181912002589}, "uses long, complicated sentences with many clauses.": {"p-value": 2.8262863090610674e-06, "V'": 0.12026209738589483}, "incorporates long words that are difficult to pronounce.": {"p-value": 0.00027504476759000377, "V'": 0.0780710263217403}, "uses long and complex syntax sentences.": {"p-value": 1.8409867480003226e-08, "V'": 0.15711000377599016}, "contains incorrect information or false statements.": {"p-value": 6.854995797124224e-23, "V'": 0.29535238895986543}, "contains complex language, such as double negatives, difficult vocabulary, and subject-specific terms.": {"p-value": 2.8960785896661072e-08, "V'": 0.1579589187592848}, "uses complex language, such as long and technical words.": {"p-value": 0.0001987065075464607, "V'": 0.11251620750196872}, "contains complex grammar, such as long phrases or clauses.": {"p-value": 2.0486287693216726e-11, "V'": 0.20111232130601958}}, "research goal": "The dataset includes the input of the task1195_disflqa_disfluent_to_fluent_conversion task in the AI2-Natural Instruction dataset. The task definition is \n\n\"Given a disfluent sentence, modify the sentence to it to its equivalent fluent form, preserving the meaning of the sentence.\". 
The two classes are generated based on how well different models performed on each sample. The Group A snippets task input on the datapoints where tk11b is better than Curie, while the Group B snippets task input on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {"uses long sentences, with clauses and conjunctions.": {"p-value": 1.317640191093877e-10, "V'": 0.15592268243983376}, "contains words with multiple meanings.": {"p-value": 1.1315713348893372e-08, "V'": 0.08277070693012717}, "uses language that is too informal for the context": {"p-value": 9.741470839096511e-05, "V'": 0.04042778185044775}}, "research goal": "The dataset includes the outputs generated by tk11b on the task1195_disflqa_disfluent_to_fluent_conversion task in the AI2-Natural Instruction dataset. The task definition is \n\n\"Given a disfluent sentence, modify the sentence to it to its equivalent fluent form, preserving the meaning of the sentence.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk11b on the datapoints where tk11b is better than Curie, while the Group B snippets outputs generated by tk11b on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. 
"}, {"+": {}, "-": {"uses complex sentence structures.": {"p-value": 4.460324593520821e-13, "V'": 0.21322997801927668}, "contains complex language and sentence structures.": {"p-value": 3.476724499696047e-06, "V'": 0.14236397947034268}, "contains long and complex sentences.": {"p-value": 8.099123742553821e-13, "V'": 0.21421362315502585}, "uses long sentences and complex grammar structures.": {"p-value": 1.4232885389982898e-10, "V'": 0.1942177331363943}, "refers to many historical events and figures.": {"p-value": 0.00020048884096125722, "V'": 0.07948357213974663}, "explores historical events.": {"p-value": 0.00010095377636545687, "V'": 0.10296291815816283}, "uses long, complicated sentences.": {"p-value": 9.476013483539285e-10, "V'": 0.16213614311181973}, "contains questions that are difficult to answer.": {"p-value": 4.631320179939734e-08, "V'": 0.14105155338142028}, "uses complex sentence structure and grammar.": {"p-value": 1.819310058456305e-11, "V'": 0.19246531114071663}}, "research goal": "The dataset includes the outputs generated by Curie on the task1195_disflqa_disfluent_to_fluent_conversion task in the AI2-Natural Instruction dataset. The task definition is \n\n\"Given a disfluent sentence, modify the sentence to it to its equivalent fluent form, preserving the meaning of the sentence.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by Curie on the datapoints where tk11b is better than Curie, while the Group B snippets outputs generated by Curie on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. 
"}, {"+": {"Describes a team, such as a sports team or a business team.": {"p-value": 1.2626092484815402e-16, "V'": 0.19615357176460163}, "contains information about a team's manager, club, and season.": {"p-value": 8.629761265104518e-15, "V'": 0.13076931462917557}, "mentions a number of people and organizations.": {"p-value": 0.000949271173441193, "V'": 0.10217746361391578}, "focuses on information related to sports teams and their members.": {"p-value": 2.1011642607577585e-24, "V'": 0.23461537882520972}}, "-": {"contains technical terms, like airport information (runway length, elevation above sea level, etc.).": {"p-value": 1.77519137780497e-08, "V'": 0.0846156396344093}, "describes university affiliations and programs, such as European University Association and School of Business and Social Sciences at the Aarhus University.": {"p-value": 6.749518059749682e-06, "V'": 0.04423058344645688}, "uses a lot of jargon or complex language.": {"p-value": 7.206193922018918e-10, "V'": 0.12614948891986627}, "requires background knowledge to understand the context of the sentence.": {"p-value": 6.946681595106386e-13, "V'": 0.214093900072081}, "uses complex sentence structures with multiple clauses.": {"p-value": 5.419600523601736e-19, "V'": 0.2706414344932015}, "uses detailed descriptions, such as ingredients for a dish or surface type for a runway.": {"p-value": 2.138102478233991e-06, "V'": 0.09038472083517635}, "contains a lot of specific information about location, numbers, and dates.": {"p-value": 3.8753298240894515e-06, "V'": 0.14066855402369915}, "References a specific country or nation.": {"p-value": 1.7945983404806683e-11, "V'": 0.19038814868169862}, "involves multiple countries and locations.": {"p-value": 2.8771177745480024e-13, "V'": 0.22282592334216245}, "contains references to multiple countries or regions.": {"p-value": 3.208691786515125e-11, "V'": 0.2009734956438871}, "contains complex language.": {"p-value": 7.80505748346226e-08, "V'": 0.16537064653604355}, 
"contains complex concepts, such as architectural styles or currency.": {"p-value": 7.517942935152451e-05, "V'": 0.10863050771527924}, "Includes difficult words.": {"p-value": 1.963106651928422e-06, "V'": 0.1393559809311219}, "Involves detailed descriptions.": {"p-value": 0.00011364766116575092, "V'": 0.04423041532142463}, "uses a complex language structure.": {"p-value": 1.2677904420239726e-12, "V'": 0.2136736174290626}, "contains references to specific cultures, countries, or regions.": {"p-value": 2.54128292597258e-06, "V'": 0.11538478047199541}}, "research goal": "The dataset includes the input of the task1728_web_nlg_data_to_text task in the AI2-Natural Instruction dataset. The task definition is \n\n\"You will be given one or more triples. The second part of each triple shows the relation between the first and the third element. Your task is to write a simple and short piece of text (sentence(s)) that describes the triples in natural language.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets task input on the datapoints where tk11b is better than Curie, while the Group B snippets task input on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher. My goal is to figure out how to build a better system. "}, {"+": {"describes sports teams, such as A.S. Gubbio 1910.": {"p-value": 7.04444991131638e-28, "V'": 0.24802032953038158}}, "-": {"Refers to specific geographical locations.": {"p-value": 0.00026212600536726306, "V'": 0.08807609654895021}, "involves some geographic or cultural knowledge.": {"p-value": 4.758377680200708e-10, "V'": 0.16385635922996356}, "mentions specific locations, such as airports and monuments.": {"p-value": 0.00042657419725198063, "V'": 0.1068090518575886}}, "research goal": "The dataset includes the outputs generated by tk11b on the task1728_web_nlg_data_to_text task in the AI2-Natural Instruction dataset. 
The task definition is \n\n\"You will be given one or more triples. The second part of each triple shows the relation between the first and the third element. Your task is to write a simple and short piece of text (sentence(s)) that describes the triples in natural language.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk11b on the datapoints where tk11b is better than Curie, while the Group B snippets outputs generated by tk11b on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {"uses a lot of technical language, such as 'AIP advances'.": {"p-value": 0.0004747674203342266, "V'": 0.022569722264842173}}, "-": {"uses technical language and terminology.": {"p-value": 9.24830535933773e-17, "V'": 0.20522255417477203}, "involves geographical locations, such as states, countries, and cities.": {"p-value": 1.8879655193041463e-57, "V'": 0.44163800284053306}, "includes a lot of geographical information.": {"p-value": 1.9628461903836113e-35, "V'": 0.36090026275536646}, "contains a lot of factual information about people, places, and events.": {"p-value": 1.5364038837295607e-48, "V'": 0.40454129963364377}, "uses a descriptive style, with a focus on details.": {"p-value": 7.633016020222251e-36, "V'": 0.3609297094318495}, "talks about a geographical location, such as an island or a country.": {"p-value": 1.131838458570791e-56, "V'": 0.4434297850201442}, "uses precise language to describe objects and locations.": {"p-value": 3.897990652821421e-58, "V'": 0.41148607582782426}, "contains complex language and structure.": {"p-value": 4.1385287227225896e-22, "V'": 0.22449287985180275}, "Uses technical language or jargon.": {"p-value": 7.576284818049775e-10, "V'": 0.12624910427690866}, "contains a lot of factual information.": {"p-value": 1.2456911892576965e-43, "V'": 
0.3966378698943211}, "contains long sentences.": {"p-value": 4.182914321253901e-40, "V'": 0.3829834607306647}, "uses complex sentence structures with multiple clauses.": {"p-value": 5.90680731347043e-41, "V'": 0.38456932612649863}, "Uses a lot of geographical information.": {"p-value": 3.0137211650217216e-38, "V'": 0.37469345710873164}, "uses long sentences that describe or explain complex topics.": {"p-value": 9.019647445426984e-11, "V'": 0.17139621977503028}, "generally uses complex language.": {"p-value": 6.343834132605399e-13, "V'": 0.13942779245454628}}, "research goal": "The dataset includes the outputs generated by Curie on the task1728_web_nlg_data_to_text task in the AI2-Natural Instruction dataset. The task definition is \n\n\"You will be given one or more triples. The second part of each triple shows the relation between the first and the third element. Your task is to write a simple and short piece of text (sentence(s)) that describes the triples in natural language.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by Curie on the datapoints where tk11b is better than Curie, while the Group B snippets outputs generated by Curie on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {}, "research goal": "The dataset includes the input of the task102_commongen_sentence_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given concept set (with 3 to 5 concepts) that contain mentions of names of people, places, activities, or things. These concept sets reflect reasonable concept co-occurrences in everyday situations. All concepts given as input are separated by \"#\". Your job is to generate a sentence describing a day-to-day scene using all concepts from a given concept set.\". 
The two classes are generated based on how well different models performed on each sample. The Group A snippets task input on the datapoints where tk11b is better than Curie, while the Group B snippets task input on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {"uses complex sentence structures.": {"p-value": 4.981525582717626e-10, "V'": 0.12074564543516117}, "contains complex sentences.": {"p-value": 0.0003057458345258649, "V'": 0.11234980225825719}, "contains complex sentence structures.": {"p-value": 2.0045393688684186e-08, "V'": 0.12986777135096533}}, "research goal": "The dataset includes the outputs generated by tk11b on the task102_commongen_sentence_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given concept set (with 3 to 5 concepts) that contain mentions of names of people, places, activities, or things. These concept sets reflect reasonable concept co-occurrences in everyday situations. All concepts given as input are separated by \"#\". Your job is to generate a sentence describing a day-to-day scene using all concepts from a given concept set.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk11b on the datapoints where tk11b is better than Curie, while the Group B snippets outputs generated by tk11b on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. 
"}, {"+": {"uses long and complex sentences.": {"p-value": 8.44950303443642e-13, "V'": 0.20325150197105615}, "involves a person interacting with a physical object.": {"p-value": 7.577191179655795e-07, "V'": 0.15151088681981983}, "uses complex and long sentences.": {"p-value": 1.8383557820914137e-06, "V'": 0.10135010251896082}, "uses long and complex sentences": {"p-value": 2.716645679278479e-13, "V'": 0.20400878741605072}, "uses complex sentence structure with multiple clauses.": {"p-value": 3.017150333327998e-12, "V'": 0.1492096155255152}, "uses long sentences.": {"p-value": 5.627235679185825e-06, "V'": 0.0736821495068906}}, "-": {}, "research goal": "The dataset includes the outputs generated by Curie on the task102_commongen_sentence_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given concept set (with 3 to 5 concepts) that contain mentions of names of people, places, activities, or things. These concept sets reflect reasonable concept co-occurrences in everyday situations. All concepts given as input are separated by \"#\". Your job is to generate a sentence describing a day-to-day scene using all concepts from a given concept set.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by Curie on the datapoints where tk11b is better than Curie, while the Group B snippets outputs generated by Curie on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {}, "research goal": "The dataset includes the input of the task569_recipe_nlg_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you have to generate the title of the recipe given its required ingredients and directions.\". 
The two classes are generated based on how well different models performed on each sample. The Group A snippets task input on the datapoints where tk11b is better than Curie, while the Group B snippets task input on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {"has a variety of ingredients.": {"p-value": 5.09326597939767e-05, "V'": 0.09465813733423553}, "uses descriptive words.": {"p-value": 1.3900848595004759e-05, "V'": 0.13044131031466638}, "uses descriptive language.": {"p-value": 6.224419017671547e-06, "V'": 0.09668982294287111}, "uses a variety of words.": {"p-value": 0.0001820529908730131, "V'": 0.05231013003294375}}, "research goal": "The dataset includes the outputs generated by tk11b on the task569_recipe_nlg_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you have to generate the title of the recipe given its required ingredients and directions.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk11b on the datapoints where tk11b is better than Curie, while the Group B snippets outputs generated by tk11b on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. 
"}, {"+": {"contains complex language.": {"p-value": 0.0009110035833220563, "V'": 0.03838639494570155}, "uses long sentences.": {"p-value": 0.0005325241178288482, "V'": 0.035151712921729035}, "has descriptive and visual language.": {"p-value": 5.378877866401174e-06, "V'": 0.12039896410652842}, "uses complex language structure.": {"p-value": 8.047862513996766e-05, "V'": 0.040582477942658346}, "uses complex language with a high level of vocabulary.": {"p-value": 0.0008707719973325735, "V'": 0.03166977921009158}}, "-": {}, "research goal": "The dataset includes the outputs generated by Curie on the task569_recipe_nlg_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you have to generate the title of the recipe given its required ingredients and directions.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by Curie on the datapoints where tk11b is better than Curie, while the Group B snippets outputs generated by Curie on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. 
"}, {"+": {"contains details and descriptions.": {"p-value": 0.0009888454652029553, "V'": 0.10153923518854091}, "contains long sentences with complex grammar.": {"p-value": 3.5317396297596573e-05, "V'": 0.12780433869800917}, "contains factual information and opinion-based statements.": {"p-value": 1.1878740561442345e-05, "V'": 0.1349062881827109}, "contains entities which indicate a certain location": {"p-value": 0.0009990735846712138, "V'": 0.09399159273884372}, "refers to a well known or popular business, such as a restaurant or movie theater": {"p-value": 1.4158248773162903e-09, "V'": 0.18035160119372823}, "refers to a certain price range": {"p-value": 4.1269587946478006e-10, "V'": 0.15633509147162825}, "mentions a customer rating, such as 1 out of 5 stars": {"p-value": 0.00017663454535150495, "V'": 0.09854265787091768}, "contains detailed descriptions of the environment and location.": {"p-value": 6.298108507685052e-11, "V'": 0.14180283448576803}, "contains multiple entities, such as a person and an organization.": {"p-value": 0.0008994884281546315, "V'": 0.0960087308588794}, "focuses on a particular location such as a restaurant, pub, or cafe.": {"p-value": 6.6918169711643e-11, "V'": 0.1991788503837647}, "uses long sentences.": {"p-value": 1.899301449943352e-09, "V'": 0.18207488318339138}, "describes places and locations.": {"p-value": 5.482731029503836e-05, "V'": 0.11913868578261755}}, "-": {"describes an event with a certain date": {"p-value": 1.5332106412751608e-05, "V'": 0.10484409691381333}, "contains knowledge about sports teams and players.": {"p-value": 0.0007597290551974988, "V'": 0.08178833307468814}, "requires knowledge about a specific event, such as a sports game or movie release.": {"p-value": 9.008156103568164e-06, "V'": 0.10136405963168549}}, "research goal": "The dataset includes the input of the task1409_dart_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given triplets. 
Each triplet is in the form of [subject, predicate, object]. Your task is to generate proper sentence that utilizes these triples. The objective is to construct a sentence that (a) captures the facts specified in the triples and (b) is a well-formed sentence easily understandable by a human. All triple values need not be used directly in the sentence as long as the facts are adequately captured.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets task input on the datapoints where tk11b is better than Curie, while the Group B snippets task input on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher. My goal is to figure out how to build a better system. "}, {"+": {"contains long sentences.": {"p-value": 9.35383964971226e-13, "V'": 0.2183933188338783}, "uses complex grammar.": {"p-value": 3.216406137650956e-06, "V'": 0.10905443891524551}, "uses accurate and detailed facts.": {"p-value": 0.00042982000195617635, "V'": 0.0694089259598103}, "uses descriptive language.": {"p-value": 3.423499009509389e-18, "V'": 0.2602546887776933}, "uses complex language and sophisticated grammar": {"p-value": 2.8894258825813183e-05, "V'": 0.08610591349693496}, "incorporates multiple, complex facts into one sentence": {"p-value": 1.0112361532940825e-13, "V'": 0.2264005746677889}, "often contains long sentences.": {"p-value": 1.3466105961447809e-07, "V'": 0.1471290768439193}, "contains words that are specific to a certain culture or language.": {"p-value": 9.340394948236231e-09, "V'": 0.16051873827998844}, "expresses a degree of uncertainty or doubt": {"p-value": 0.0002912170915863937, "V'": 0.07066559286039432}, "uses a complex sentence structure with multiple clause structures.": {"p-value": 3.6124161611170556e-08, "V'": 0.1694098094280705}, "uses complex and long sentences.": {"p-value": 2.894463070583636e-10, "V'": 0.18876051411779082}, "uses long and complex sentences.": 
{"p-value": 6.052586633513649e-07, "V'": 0.13399057645359969}, "uses complex language structures, such as complex sentences.": {"p-value": 2.2532180081067465e-06, "V'": 0.14057068355798064}, "has long sentences.": {"p-value": 4.701117898801645e-10, "V'": 0.18901115206710928}}, "-": {"refers to a historical event or person.": {"p-value": 2.0679983213278708e-07, "V'": 0.14644809011006438}}, "research goal": "The dataset includes the outputs generated by tk11b on the task1409_dart_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given triplets. Each triplet is in the form of [subject, predicate, object]. Your task is to generate proper sentence that utilizes these triples. The objective is to construct a sentence that (a) captures the facts specified in the triples and (b) is a well-formed sentence easily understandable by a human. All triple values need not be used directly in the sentence as long as the facts are adequately captured.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk11b on the datapoints where tk11b is better than Curie, while the Group B snippets outputs generated by tk11b on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. 
"}, {"+": {"mentions specific restaurants, coffee shops, and food items.": {"p-value": 4.94623167165853e-09, "V'": 0.17197845735928852}, "uses a lot of adjectives to describe the food, such as 'delicious' or 'cheap'.": {"p-value": 4.0734125077823e-07, "V'": 0.0793351461566684}, "mentions the names of specific restaurants and coffee shops.": {"p-value": 4.514037453304407e-06, "V'": 0.1299795927401015}}, "-": {"uses a formal language.": {"p-value": 2.0457186937811102e-11, "V'": 0.1824028777765052}, "uses long sentences": {"p-value": 2.030481700361576e-12, "V'": 0.20875891662975032}, "mentions a historical event or person.": {"p-value": 8.114925709431376e-14, "V'": 0.20235445344655867}, "Tries to be descriptive.": {"p-value": 0.0008349261058141347, "V'": 0.10188635971044707}, "uses proper nouns": {"p-value": 6.4342024211471e-08, "V'": 0.12049760618605387}, "contains complex language, such as long sentences and difficult words.": {"p-value": 1.7846009663354356e-06, "V'": 0.10557773199433464}, "uses long, complex sentence structures.": {"p-value": 4.815323432360397e-13, "V'": 0.21609768533332363}, "contains factual information.": {"p-value": 8.849770041841096e-15, "V'": 0.13356996336694527}, "uses complex language, such as long words and phrases.": {"p-value": 7.037853186493408e-08, "V'": 0.14658786653732794}}, "research goal": "The dataset includes the outputs generated by Curie on the task1409_dart_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given triplets. Each triplet is in the form of [subject, predicate, object]. Your task is to generate proper sentence that utilizes these triples. The objective is to construct a sentence that (a) captures the facts specified in the triples and (b) is a well-formed sentence easily understandable by a human. All triple values need not be used directly in the sentence as long as the facts are adequately captured.\". 
The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by Curie on the datapoints where tk11b is better than Curie, while the Group B snippets outputs generated by Curie on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {"mentions ratings.": {"p-value": 4.911515318559698e-21, "V'": 0.2846155948806884}, "focuses on family-friendly restaurants.": {"p-value": 0.0005046910153180891, "V'": 0.0961540403216184}, "mentions family-friendly": {"p-value": 0.0005047213410848292, "V'": 0.09615362717833334}, "mentions ratings, such as '1 out of 5' or '5 out of 5'": {"p-value": 4.911824995408976e-21, "V'": 0.28461547804174786}}, "-": {"mentions a bad experience with the customer service.": {"p-value": 0.0002462646721819336, "V'": 0.048077121167816414}, "mentions a bad experience with the decor.": {"p-value": 1.2729004817624856e-09, "V'": 0.09423076206169119}, "contains adjectives.": {"p-value": 1.1212265049195086e-09, "V'": 0.16198088638552155}, "uses adjectives to describe the food, for example, 'delicious' or 'tasty'": {"p-value": 7.110285646800484e-07, "V'": 0.06730792949361301}, "mentions the location, such as 'Manhattan' or 'riverside'": {"p-value": 0.0003907970479303961, "V'": 0.10192300099450402}, "mentions cuisine, such as 'Chinese' or 'Italian'": {"p-value": 3.091409905104637e-05, "V'": 0.08846166989600246}, "uses flowery language, with descriptors such as 'acceptable' or 'bad' to describe the quality of the restaurant.": {"p-value": 1.4716463238220147e-15, "V'": 0.20859806390011984}, "mentions the location of the restaurant.": {"p-value": 4.846436490938422e-07, "V'": 0.11538460957340102}, "contains detailed descriptions of food quality and service.": {"p-value": 8.474569908127173e-16, "V'": 0.1403845031432873}}, "research goal": "The dataset includes the input of the 
task1598_nyc_long_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"The task is to write a full sentence or two using all of the information given. The sentence(s) will be a brief review of a restaurant. Use all of the information provided.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets task input on the datapoints where tk11b is better than Curie, while the Group B snippets task input on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher. My goal is to figure out how to build a better system. "}, {"+": {"Focuses on the quality of the food.": {"p-value": 0.00011624645413032864, "V'": 0.11299968884787492}, "uses complex sentence structures.": {"p-value": 7.586407375694415e-05, "V'": 0.052515554958665334}}, "-": {"mentions the good decor of the restaurant.": {"p-value": 1.8912755241881326e-10, "V'": 0.09401866546430306}, "mentions the decor of the restaurant.": {"p-value": 1.2787668665579495e-20, "V'": 0.18986701980526727}, "has complex grammar.": {"p-value": 0.0007796385311343625, "V'": 0.08727979034627961}, "contains complex language.": {"p-value": 8.648552756983946e-09, "V'": 0.08054908965771292}}, "research goal": "The dataset includes the outputs generated by tk11b on the task1598_nyc_long_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"The task is to write a full sentence or two using all of the information given. The sentence(s) will be a brief review of a restaurant. Use all of the information provided.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk11b on the datapoints where tk11b is better than Curie, while the Group B snippets outputs generated by tk11b on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. 
My goal is to figure out how to build a better system. "}, {"+": {}, "-": {"contains words or phrases that evoke negative sentiment.": {"p-value": 2.2984622660786233e-14, "V'": 0.12955002083394662}, "uses a lot of adjectives that describe the restaurant's quality.": {"p-value": 1.8594011771201646e-07, "V'": 0.10500085021632591}, "mentions the rating of the restaurant in a negative way.": {"p-value": 1.1537741190473537e-10, "V'": 0.09413145052340299}}, "research goal": "The dataset includes the outputs generated by Curie on the task1598_nyc_long_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"The task is to write a full sentence or two using all of the information given. The sentence(s) will be a brief review of a restaurant. Use all of the information provided.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by Curie on the datapoints where tk11b is better than Curie, while the Group B snippets outputs generated by Curie on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {"mentions difficult-to-understand concepts or abstract ideas.": {"p-value": 0.00017809436269377462, "V'": 0.07808471812169093}, "mentions a person with an established reputation, such as a famous writer or an actor": {"p-value": 5.9930581238440845e-05, "V'": 0.0839558375534255}, "discusses a controversial issue, such as Brexit or the glass ceiling": {"p-value": 3.474960754955778e-09, "V'": 0.13348199316037018}}, "-": {}, "research goal": "The dataset includes the input of the task1356_xlsum_title_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"Generate an appropriate title for the given text. The generated title must be short and include the main topic of the text. The preferred titles are under fifteen words.\". 
The two classes are generated based on how well different models performed on each sample. The Group A snippets task input on the datapoints where tk11b is better than Curie, while the Group B snippets task input on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher. My goal is to figure out how to build a better system. "}, {"+": {"mentions celebrities and/or scandals.": {"p-value": 1.824752102137865e-05, "V'": 0.1104837265792773}, "mentions famous celebrities or cultural icons.": {"p-value": 2.7593070813589135e-05, "V'": 0.10468437979649536}, "mentions a famous person or public figure": {"p-value": 1.9669542595108068e-06, "V'": 0.12324019623019644}, "mentions a famous celebrity or public figure.": {"p-value": 2.3453219710813295e-05, "V'": 0.10664398697186692}}, "-": {}, "research goal": "The dataset includes the outputs generated by tk11b on the task1356_xlsum_title_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"Generate an appropriate title for the given text. The generated title must be short and include the main topic of the text. The preferred titles are under fifteen words.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk11b on the datapoints where tk11b is better than Curie, while the Group B snippets outputs generated by tk11b on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. 
"}, {"+": {"uses complex sentence structure and language.": {"p-value": 0.00018667987407082232, "V'": 0.11372967592992222}}, "-": {"mentions specific people or countries": {"p-value": 2.889948254426685e-13, "V'": 0.22054300880064548}, "mentions a specific location, such as London.": {"p-value": 1.9182747490541998e-08, "V'": 0.17261492080971136}, "mentions a specific person or organization": {"p-value": 5.533311093122986e-05, "V'": 0.128280970697707}, "talks about current events and scandals": {"p-value": 2.249408541999487e-07, "V'": 0.16250516662695347}, "mentions a news-worthy event or person.": {"p-value": 2.7221214111288735e-10, "V'": 0.19950336290552845}, "mentions a specific location, such as a city, region, or country.": {"p-value": 1.402245619006567e-07, "V'": 0.16693013448636018}, "mentions a natural disaster or emergency situation.": {"p-value": 0.00039665992968295133, "V'": 0.05319252498291736}}, "research goal": "The dataset includes the outputs generated by Curie on the task1356_xlsum_title_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"Generate an appropriate title for the given text. The generated title must be short and include the main topic of the text. The preferred titles are under fifteen words.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by Curie on the datapoints where tk11b is better than Curie, while the Group B snippets outputs generated by Curie on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {}, "research goal": "The dataset includes the input of the task1659_title_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given a summary for US Congressional and California state bill, your task is to generate a Title for this bill. 
The preferred titles are under forty words and mention the purpose of the bill.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets task input on the datapoints where tk11b is better than Curie, while the Group B snippets task input on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher. My goal is to figure out how to build a better system. "}, {"+": {"includes technical words.": {"p-value": 3.337938965758098e-07, "V'": 0.13505084646161447}, "Uses complex terminology, such as legal and financial terms": {"p-value": 4.854818217629696e-18, "V'": 0.17121954398147565}, "focuses on policy changes, such as regulations and grant awards": {"p-value": 0.00018488097703353478, "V'": 0.0586691698222257}, "contains legal and/or technical terminology.": {"p-value": 6.478105695104878e-10, "V'": 0.1012225147069239}, "Uses legal jargon and terminology.": {"p-value": 2.9347079281182316e-20, "V'": 0.18502110400843552}, "discusses law, regulations, and/or court cases.": {"p-value": 1.0633843853238125e-33, "V'": 0.27228230831188216}, "uses a lot of legal and technical terms.": {"p-value": 2.1629554732291812e-19, "V'": 0.20889079010391987}, "uses technical terms": {"p-value": 2.596968952512857e-05, "V'": 0.12377274041455888}, "mentions a specific year in the title.": {"p-value": 3.604506989052565e-06, "V'": 0.13367267613443123}}, "-": {}, "research goal": "The dataset includes the outputs generated by tk11b on the task1659_title_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given a summary for US Congressional and California state bill, your task is to generate a Title for this bill. The preferred titles are under forty words and mention the purpose of the bill.\". The two classes are generated based on how well different models performed on each sample. 
The Group A snippets outputs generated by tk11b on the datapoints where tk11b is better than Curie, while the Group B snippets outputs generated by tk11b on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {"mentions the need for a specific action to improve a certain situation.": {"p-value": 2.001363042482429e-06, "V'": 0.1396869658175427}}, "-": {"discusses specific topics.": {"p-value": 0.0005378766595479185, "V'": 0.03265620966184979}, "mentions legal and regulatory requirements.": {"p-value": 4.288532368942898e-08, "V'": 0.16280492783773703}, "Contains complex legal and financial language.": {"p-value": 8.395616683934215e-05, "V'": 0.1128189719125342}, "mentions a specific law or legislation, such as the National Environmental Policy Act of 1969.": {"p-value": 6.968050255130789e-18, "V'": 0.260974303496713}, "mentions specific laws or acts.": {"p-value": 4.634485805154726e-14, "V'": 0.22422023126849577}, "contains language related to legislation, such as bills and acts.": {"p-value": 2.402893350703576e-11, "V'": 0.1939332840096204}}, "research goal": "The dataset includes the outputs generated by Curie on the task1659_title_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given a summary for US Congressional and California state bill, your task is to generate a Title for this bill. The preferred titles are under forty words and mention the purpose of the bill.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by Curie on the datapoints where tk11b is better than Curie, while the Group B snippets outputs generated by Curie on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. 
"}, {"+": {"is related to controversial topics or issues.": {"p-value": 1.0032155228130506e-19, "V'": 0.121722477358015}, "has an opinion on a controversial issue": {"p-value": 1.0145976836683069e-05, "V'": 0.04602827903289908}, "Uses emotionally charged language.": {"p-value": 1.3660175967736926e-44, "V'": 0.24467816699356462}, "involves controversial topics, such as the use of weapons.": {"p-value": 2.3539739541963657e-58, "V'": 0.3127728535793268}, "uses emotionally charged language, such as 'must be banned'": {"p-value": 1.0481425377904446e-72, "V'": 0.342031276555953}, "expresses an opinion on a controversial issue.": {"p-value": 3.360148182025925e-06, "V'": 0.04778999143393059}, "uses emotionally charged language, such as 'the worst of the worst'.": {"p-value": 2.251571465405842e-40, "V'": 0.1777374870568079}, "talks about controversial topics, such as nuclear weapons.": {"p-value": 3.1543121730510555e-43, "V'": 0.270089180901432}, "refers to a current event or issue.": {"p-value": 9.326470537417087e-08, "V'": 0.0984418608827457}, "involves moral and ethical issues.": {"p-value": 7.340373439149639e-18, "V'": 0.15484633913645085}, "expresses strong opinions on controversial topics.": {"p-value": 4.224287003134908e-10, "V'": 0.07106238757475358}, "involves topics related to human rights.": {"p-value": 7.975695952828508e-06, "V'": 0.08676422645058052}}, "-": {"talks about spending money in a positive or negative way.": {"p-value": 1.1571806515298039e-07, "V'": 0.05512369739015297}}, "research goal": "The dataset includes the input claim of the task738_perspectrum_classification task in the AI2-Natural Instruction dataset, where the ground truth label is support. The task definition is \n\n\"In this task you will be given a claim and a perspective. You should determine whether that perspective supports or undermines the claim. If the perspective could possibly convince someone with different view, it is supporting, otherwise it is undermining.\". 
The two classes are generated based on how well different models performed on each sample. The Group A snippets input claim on the datapoints where tk11b is better than Curie, while the Group B snippets input claim on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {"contains strong/emotional language.": {"p-value": 1.1956999927858587e-39, "V'": 0.23833532435438043}, "Uses emotionally charged language to evoke emotion or action.": {"p-value": 4.599681606106658e-23, "V'": 0.1578803007553194}, "mentions the difficulty in monitoring a growing area.": {"p-value": 0.0005052641849104196, "V'": 0.01322655459344016}, "contains moral judgements and opinions.": {"p-value": 5.2635552661900245e-09, "V'": 0.13397580539998988}, "argues against government policy or regulations.": {"p-value": 1.6926890446284216e-05, "V'": 0.07574515080331354}, "uses a moral standpoint, such as debates on beauty contests and animal welfare.": {"p-value": 3.031782938037752e-07, "V'": 0.11257866843168068}}, "-": {"mentions a global issue with a potential solution": {"p-value": 1.6592917363429683e-33, "V'": 0.2105351692765634}, "uses language related to government intervention and spending.": {"p-value": 0.00017194523417291612, "V'": 0.05914400147366351}, "emphasizes the importance of education for children.": {"p-value": 1.0779983247319457e-08, "V'": 0.0573782736796589}, "discusses the power of a shared culture to bring people together.": {"p-value": 6.664437500675089e-06, "V'": 0.02685115131445062}, "uses a positive or affirmative tone to support their claim.": {"p-value": 1.0718895641819226e-236, "V'": 0.6675620513923108}, "talks about the legal and fiscal benefits of marriage for gay couples.": {"p-value": 6.672640497693329e-07, "V'": 0.032546831491206844}}, "research goal": "The dataset includes the input perspective of the task738_perspectrum_classification task in the AI2-Natural 
Instruction dataset, where the ground truth label is support. The task definition is \n\n\"In this task you will be given a claim and a perspective. You should determine whether that perspective supports or undermines the claim. If the perspective could possibly convince someone with different view, it is supporting, otherwise it is undermining.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets input perspective on the datapoints where tk11b is better than Curie, while the Group B snippets input perspective on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {"uses emotionally charged language": {"p-value": 1.784877976975165e-57, "V'": 0.35539881234170034}, "uses terms which are politically charged.": {"p-value": 4.359169034873078e-37, "V'": 0.2825448997523594}, "uses strong language to support a claim.": {"p-value": 3.1900861561111383e-06, "V'": 0.04894467317375384}, "uses emotional language to evoke a reaction.": {"p-value": 4.342137301204788e-19, "V'": 0.12899693182514677}, "focuses on the drawbacks of the current system.": {"p-value": 7.844574555416023e-108, "V'": 0.5008823828785198}}, "-": {"mentions advantages of a certain action": {"p-value": 1.8053126323379522e-33, "V'": 0.25040654629200143}}, "research goal": "The dataset includes the whole input of the task738_perspectrum_classification task in the AI2-Natural Instruction dataset, where the ground truth label is support. The task definition is \n\n\"In this task you will be given a claim and a perspective. You should determine whether that perspective supports or undermines the claim. If the perspective could possibly convince someone with different view, it is supporting, otherwise it is undermining.\". The two classes are generated based on how well different models performed on each sample. 
The Group A snippets whole input on the datapoints where tk11b is better than Curie, while the Group B snippets whole input on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {"Uses language that is biased against one race/ethnicity.": {"p-value": 0.0002953611059163545, "V'": 0.029016610537400053}, "discusses current social issues, such as gender equality and smoking in public places.": {"p-value": 1.6805416574297504e-05, "V'": 0.0839906888444461}, "contains language that is critical of the current government.": {"p-value": 4.255646656707196e-11, "V'": 0.058513895006632365}, "touches on controversial topics, such as gender equality.": {"p-value": 0.00045140032394360444, "V'": 0.06484567909195232}, "refers to a controversial issue such as religion or abortion.": {"p-value": 5.531077560542097e-21, "V'": 0.18135520352256718}, "contains a lot of statistics or scientific evidence.": {"p-value": 6.0499185776004035e-06, "V'": 5.274992987281987e-07}, "contains emotionally charged language.": {"p-value": 8.736952423689563e-24, "V'": 0.1627046763398568}}, "-": {"Discusses international issues, such as the global economy or foreign relations.": {"p-value": 2.3533548309633353e-06, "V'": 0.08296138877183845}}, "research goal": "The dataset includes the input claim of the task738_perspectrum_classification task in the AI2-Natural Instruction dataset, where the ground truth label is undermine. The task definition is \n\n\"In this task you will be given a claim and a perspective. You should determine whether that perspective supports or undermines the claim. If the perspective could possibly convince someone with different view, it is supporting, otherwise it is undermining.\". The two classes are generated based on how well different models performed on each sample. 
The Group A snippets input claim on the datapoints where tk11b is better than Curie, while the Group B snippets input claim on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {"Uses language that implies a sense of obligation": {"p-value": 5.377751979630315e-06, "V'": 0.06651297207680419}, "uses positive language to promote the idea": {"p-value": 3.4373538709612496e-142, "V'": 0.4953449946601254}, "Uses language that is positive or uplifting": {"p-value": 4.180847047986255e-59, "V'": 0.24235436275510105}}, "-": {"mentions a lack of trust in the government.": {"p-value": 5.205636253263941e-12, "V'": 0.08268429162516754}, "Touches on controversial topics, such as racism or inequality.": {"p-value": 5.996841293275019e-28, "V'": 0.2357342500003019}, "discusses human rights violations.": {"p-value": 3.175804995658188e-08, "V'": 0.07769354818685864}, "Uses language which expresses doubt or uncertainty.": {"p-value": 9.979992260448755e-47, "V'": 0.2716515270086757}, "contains negative implications about a certain group of people.": {"p-value": 9.608812973035133e-27, "V'": 0.1938821866738097}, "Advocates for a change in the current system": {"p-value": 2.3630841553263237e-11, "V'": 0.14251084061169397}, "mentions controversial topics that are difficult to agree on": {"p-value": 1.6096763796926758e-51, "V'": 0.33588646636048974}, "uses scare tactics to make an argument.": {"p-value": 5.037990769407351e-56, "V'": 0.3165937337592861}, "uses rhetoric to strongly emphasize a point.": {"p-value": 1.4019381580309386e-19, "V'": 0.18371988136884107}, "uses strong emotional language": {"p-value": 2.727761678095209e-06, "V'": 0.04694262839993352}, "mentions traditionally controversial topics, such as religion.": {"p-value": 2.7580977867917934e-08, "V'": 0.0653714205936225}}, "research goal": "The dataset includes the input perspective of the 
task738_perspectrum_classification task in the AI2-Natural Instruction dataset, where the ground truth label is undermine. The task definition is \n\n\"In this task you will be given a claim and a perspective. You should determine whether that perspective supports or undermines the claim. If the perspective could possibly convince someone with different view, it is supporting, otherwise it is undermining.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets input perspective on the datapoints where tk11b is better than Curie, while the Group B snippets input perspective on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {"talks about the economic impact of a decision": {"p-value": 6.250768356595056e-06, "V'": 0.06511032699372864}, "Presents a controversial topic from a biased point of view.": {"p-value": 1.9251216253449563e-07, "V'": 0.08511289772478203}, "Uses complex and long sentences.": {"p-value": 5.060601396408765e-07, "V'": 0.11988488928518126}, "uses words or phrases to convey strong emotions.": {"p-value": 0.00015000414132094398, "V'": 0.06016137852556794}}, "research goal": "The dataset includes the whole input of the task738_perspectrum_classification task in the AI2-Natural Instruction dataset, where the ground truth label is undermine. The task definition is \n\n\"In this task you will be given a claim and a perspective. You should determine whether that perspective supports or undermines the claim. If the perspective could possibly convince someone with different view, it is supporting, otherwise it is undermining.\". The two classes are generated based on how well different models performed on each sample. 
The Group A snippets whole input on the datapoints where tk11b is better than Curie, while the Group B snippets whole input on the datapoints where tk11b is worse than Curie. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {"uses complex words.": {"p-value": 1.4259751106598113e-06, "V'": 0.14495158790900914}, "uses long sentences.": {"p-value": 4.926031373086752e-06, "V'": 0.13791033929898827}, "contains uncommon words or phrases.": {"p-value": 1.7908579493102503e-11, "V'": 0.18220245117343037}, "contains long sentences.": {"p-value": 0.0006314368216237691, "V'": 0.09869939741614864}, "utilizes complex words.": {"p-value": 0.00037218983248283786, "V'": 0.10296763867456599}, "uses background knowledge.": {"p-value": 0.0004623290457399204, "V'": 0.09552109240803885}, "contains historically specific references.": {"p-value": 0.0009676071996072525, "V'": 0.0972949930127795}, "contains complex words and/or phrases.": {"p-value": 2.0624308352088303e-05, "V'": 0.1286686480232908}, "contains long, complex sentences with multiple subjects and objects.": {"p-value": 0.0002867072484659032, "V'": 0.09456755812513673}, "uses words with multiple meanings in different contexts.": {"p-value": 7.395904017475428e-06, "V'": 0.10140600266988231}, "contains long, complex sentences.": {"p-value": 7.301899744437962e-06, "V'": 0.13433773528338355}, "uses long sentences, containing many words and complex grammar.": {"p-value": 2.9302410939250078e-05, "V'": 0.11228274543635985}, "uses outdated language, i.e. 
words or phrases not commonly used today.": {"p-value": 2.0891558858303948e-11, "V'": 0.14514676462012746}, "contains complex sentence structures, such as multiple clauses or conjunctions.": {"p-value": 2.811591192458151e-07, "V'": 0.15753364999477293}, "uses long sentences with complex grammar.": {"p-value": 2.13096614376855e-07, "V'": 0.13673802992245554}, "uses long and complex sentences.": {"p-value": 1.230897244120873e-05, "V'": 0.11582437465804496}, "uses ambiguous pronouns, such as 'it' or 'they', without context.": {"p-value": 2.2913245826214658e-08, "V'": 0.17104838376702625}}, "research goal": "The dataset includes the input of the task1195_disflqa_disfluent_to_fluent_conversion task in the AI2-Natural Instruction dataset. The task definition is \n\n\"Given a disfluent sentence, modify the sentence to it to its equivalent fluent form, preserving the meaning of the sentence.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets task input on the datapoints where tk11b is better than tk3b, while the Group B snippets task input on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {"uses complex, long-form sentences.": {"p-value": 0.0009066749696520431, "V'": 0.08960709281715515}, "involves complex grammar structures.": {"p-value": 4.749011082955822e-08, "V'": 0.08925104623777966}, "uses complex sentence structures.": {"p-value": 0.00026954217925117494, "V'": 0.11223452526416838}}, "research goal": "The dataset includes the outputs generated by tk11b on the task1195_disflqa_disfluent_to_fluent_conversion task in the AI2-Natural Instruction dataset. The task definition is \n\n\"Given a disfluent sentence, modify the sentence to it to its equivalent fluent form, preserving the meaning of the sentence.\". 
The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk11b on the datapoints where tk11b is better than tk3b, while the Group B snippets outputs generated by tk11b on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {"uses long and complex sentences.": {"p-value": 2.3680266613398457e-32, "V'": 0.3542552387313706}, "focuses on historical events such as wars or battles.": {"p-value": 9.905616635563105e-05, "V'": 0.08701890246217737}, "uses a complex sentence structure with multiple clauses.": {"p-value": 1.450902633804913e-22, "V'": 0.2537982113454991}, "Refers to historical events, such as wars and annexations.": {"p-value": 0.0002064566119616994, "V'": 0.08893332906841306}, "involves questions that require detailed knowledge of a certain topic.": {"p-value": 1.1927041971782316e-15, "V'": 0.24188249910353882}, "involves multiple topics within the same sentence.": {"p-value": 5.617232177695762e-15, "V'": 0.1741091897010399}, "has long sentences with clauses and conjunctions.": {"p-value": 1.491716464595292e-11, "V'": 0.12667760818891327}, "focuses on history and the events that happened in the past.": {"p-value": 0.00016315303482070044, "V'": 0.10887646750532615}, "uses complex sentence structure with multiple clauses.": {"p-value": 6.191422418815236e-19, "V'": 0.22275731411068206}, "expresses complex ideas using multiple clauses.": {"p-value": 1.0953310669610163e-11, "V'": 0.12720214542978467}, "contain long, complex sentences with multiple clauses.": {"p-value": 1.1407984002207243e-21, "V'": 0.2259162371502182}}, "research goal": "The dataset includes the outputs generated by tk3b on the task1195_disflqa_disfluent_to_fluent_conversion task in the AI2-Natural Instruction dataset. 
The task definition is \n\n\"Given a disfluent sentence, modify the sentence to it to its equivalent fluent form, preserving the meaning of the sentence.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk3b on the datapoints where tk11b is better than tk3b, while the Group B snippets outputs generated by tk3b on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {"uses technical terms, such as ISSN_number or LCCN_number.": {"p-value": 2.96573920421695e-09, "V'": 0.09619535649686836}}, "-": {"mentions a specific country or location.": {"p-value": 2.6288770585323915e-06, "V'": 0.1288463190720014}, "mentions a specific location.": {"p-value": 3.3000151869305344e-07, "V'": 0.1346163589888334}, "features countries and their leaders": {"p-value": 3.6938433314459314e-10, "V'": 0.17115815026349984}, "contains specific information, such as cities and countries.": {"p-value": 3.518806856090007e-07, "V'": 0.12502244158595777}, "references multiple countries.": {"p-value": 1.0372251158691155e-06, "V'": 0.13234014027047075}, "uses geographic information, such as city and country names.": {"p-value": 1.1249963876498822e-07, "V'": 0.1403846549887805}, "contains questions about a place's location or demographics.": {"p-value": 2.0030542264976925e-09, "V'": 0.17695518354403383}, "mentions geographical locations.": {"p-value": 3.8226931124169416e-07, "V'": 0.13461544086296584}, "uses specific vocabulary related to a certain domain, such as sports, geography or history.": {"p-value": 2.1282679636978142e-05, "V'": 0.10396217709764}}, "research goal": "The dataset includes the input of the task1728_web_nlg_data_to_text task in the AI2-Natural Instruction dataset. The task definition is \n\n\"You will be given one or more triples. 
The second part of each triple shows the relation between the first and the third element. Your task is to write a simple and short piece of text (sentence(s)) that describes the triples in natural language.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets task input on the datapoints where tk11b is better than tk3b, while the Group B snippets task input on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher. My goal is to figure out how to build a better system. "}, {"+": {"makes use of numerical information, such as dates, statistics, and measurements.": {"p-value": 2.151981427407103e-08, "V'": 0.14350120393447305}, "mentions dates, historical facts, and other relevant information.": {"p-value": 0.00015330459205340958, "V'": 0.0987716510682092}, "uses facts, figures, and data to give an explanation of a topic.": {"p-value": 3.254418642547788e-05, "V'": 0.12677466498535817}, "uses complex language structures, such as multiple clauses or conjunctions.": {"p-value": 2.2380006290564192e-06, "V'": 0.12896165955873165}, "uses jargon or technical language.": {"p-value": 0.0002186602163174493, "V'": 0.10291771666201513}}, "-": {"mentions geographical locations (cities, islands, etc).": {"p-value": 3.52402267772946e-05, "V'": 0.1156529024232914}, "contains specific geographical information, such as city and state.": {"p-value": 4.022388239799592e-06, "V'": 0.13138338280686623}, "describes facts about places, such as cities and countries, their leaders, and landmarks.": {"p-value": 1.9814345497350762e-11, "V'": 0.18789549930446503}, "contains geographical information, such as countries, cities, and other landmark locations.": {"p-value": 3.729484742869232e-09, "V'": 0.15694130436776188}, "mentions specific locations or landmarks.": {"p-value": 7.146225903340973e-05, "V'": 0.12041908475536267}, "uses geographic information, such as cities and countries.": {"p-value": 
7.589058877797053e-08, "V'": 0.13983029976511963}}, "research goal": "The dataset includes the outputs generated by tk11b on the task1728_web_nlg_data_to_text task in the AI2-Natural Instruction dataset. The task definition is \n\n\"You will be given one or more triples. The second part of each triple shows the relation between the first and the third element. Your task is to write a simple and short piece of text (sentence(s)) that describes the triples in natural language.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk11b on the datapoints where tk11b is better than tk3b, while the Group B snippets outputs generated by tk11b on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {"Contains detailed descriptions of places, people, and culture.": {"p-value": 0.00011026523203670478, "V'": 0.06935736220050957}, "contains geographic information, such as the location of a hospital.": {"p-value": 2.503034793207266e-06, "V'": 0.14182544474486092}, "uses factual information, such as airport locations.": {"p-value": 3.989525204209659e-07, "V'": 0.15257565461735773}, "uses geographic locations, such as city and country names.": {"p-value": 0.00029517769593539735, "V'": 0.09424830491682634}, "uses a lot of geographic locations": {"p-value": 1.8260668687740567e-10, "V'": 0.1903106311478217}, "uses complex language.": {"p-value": 3.7287682854372376e-07, "V'": 0.08368885549734323}}, "research goal": "The dataset includes the outputs generated by tk3b on the task1728_web_nlg_data_to_text task in the AI2-Natural Instruction dataset. The task definition is \n\n\"You will be given one or more triples. The second part of each triple shows the relation between the first and the third element. 
Your task is to write a simple and short piece of text (sentence(s)) that describes the triples in natural language.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk3b on the datapoints where tk11b is better than tk3b, while the Group B snippets outputs generated by tk3b on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {"contains action verbs, such as 'throw' and 'depart'": {"p-value": 3.7011028879002856e-05, "V'": 0.11539882165684329}, "contains references to transportation, such as 'train' and 'plane'": {"p-value": 5.913180179158808e-10, "V'": 0.15509234483220308}, "contains references to movement, such as 'come' and 'travel'": {"p-value": 1.5261808670636067e-06, "V'": 0.1273147519471893}, "mentions activities related to transportation, such as trains, bridges, or stations.": {"p-value": 1.2795197691183135e-09, "V'": 0.15740744496308487}, "uses verbs of motion, such as walk, travel, and cross.": {"p-value": 2.0596904842635173e-05, "V'": 0.10416693038266936}, "involves activities and/or travel, such as leaving a station by train.": {"p-value": 1.3949695854830237e-07, "V'": 0.13656756043793453}, "uses concrete language related to physical objects.": {"p-value": 0.00023800855747499393, "V'": 0.11554190504601214}, "include descriptions of motion or movement.": {"p-value": 1.1523505291123818e-07, "V'": 0.1342593425970393}, "contain verbs related to travel or exploration.": {"p-value": 1.7437642372830775e-06, "V'": 0.11113289043757074}, "mentions a journey, such as travelling with a train or an airplane.": {"p-value": 6.735242788847374e-09, "V'": 0.10880321605334502}}, "-": {}, "research goal": "The dataset includes the input of the task102_commongen_sentence_generation task in the AI2-Natural Instruction dataset. 
The task definition is \n\n\"In this task, you are given concept set (with 3 to 5 concepts) that contain mentions of names of people, places, activities, or things. These concept sets reflect reasonable concept co-occurrences in everyday situations. All concepts given as input are separated by \"#\". Your job is to generate a sentence describing a day-to-day scene using all concepts from a given concept set.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets task input on the datapoints where tk11b is better than tk3b, while the Group B snippets task input on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher. My goal is to figure out how to build a better system. "}, {"+": {"uses imagery to describe a scene, such as a boat in the harbour of the city.": {"p-value": 0.0006081302176988622, "V'": 0.11277716455358627}}, "-": {"uses passive voice instead of active voice.": {"p-value": 6.763890658541413e-05, "V'": 0.09633806836590275}, "describes a scene in a room, such as a lamp on the couch.": {"p-value": 1.0900019453096507e-05, "V'": 0.0761750367666712}, "utilizes long and complex sentences.": {"p-value": 4.738447448425211e-06, "V'": 0.08740357751706088}}, "research goal": "The dataset includes the outputs generated by tk11b on the task102_commongen_sentence_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given concept set (with 3 to 5 concepts) that contain mentions of names of people, places, activities, or things. These concept sets reflect reasonable concept co-occurrences in everyday situations. All concepts given as input are separated by \"#\". Your job is to generate a sentence describing a day-to-day scene using all concepts from a given concept set.\". The two classes are generated based on how well different models performed on each sample. 
The Group A snippets outputs generated by tk11b on the datapoints where tk11b is better than tk3b, while the Group B snippets outputs generated by tk11b on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {}, "research goal": "The dataset includes the outputs generated by tk3b on the task102_commongen_sentence_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given concept set (with 3 to 5 concepts) that contain mentions of names of people, places, activities, or things. These concept sets reflect reasonable concept co-occurrences in everyday situations. All concepts given as input are separated by \"#\". Your job is to generate a sentence describing a day-to-day scene using all concepts from a given concept set.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk3b on the datapoints where tk11b is better than tk3b, while the Group B snippets outputs generated by tk3b on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {}, "research goal": "The dataset includes the input of the task569_recipe_nlg_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you have to generate the title of the recipe given its required ingredients and directions.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets task input on the datapoints where tk11b is better than tk3b, while the Group B snippets task input on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher. My goal is to figure out how to build a better system. 
"}, {"+": {}, "-": {"uses long, complex sentences.": {"p-value": 0.00044735901944473373, "V'": 0.04894449770859191}, "uses specific instructions.": {"p-value": 1.2910615419274176e-06, "V'": 0.12754491513378663}, "uses long words.": {"p-value": 0.0001087377067099007, "V'": 0.09374071520708854}, "uses complex or long sentences.": {"p-value": 0.0007122724526229261, "V'": 0.03546501046109897}, "Uses long words and complex syntax": {"p-value": 2.9366140687948365e-05, "V'": 0.07492852799210359}}, "research goal": "The dataset includes the outputs generated by tk11b on the task569_recipe_nlg_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you have to generate the title of the recipe given its required ingredients and directions.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk11b on the datapoints where tk11b is better than tk3b, while the Group B snippets outputs generated by tk11b on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. 
"}, {"+": {"Uses complex and long sentences.": {"p-value": 1.1710659183635542e-12, "V'": 0.11488561286496983}, "uses a lot of adjectives to describe the recipe ingredients and flavor.": {"p-value": 0.00042993583560048317, "V'": 0.032053416780209426}, "contains complex ingredients.": {"p-value": 5.915277526964533e-10, "V'": 0.12962937450048684}, "uses complex vocabulary and sentence structure.": {"p-value": 9.993076438404286e-12, "V'": 0.09583261453978383}, "contains detailed descriptions of ingredients.": {"p-value": 3.2462661519070355e-16, "V'": 0.15453159793064636}, "uses complex language and long sentences.": {"p-value": 8.569060406089105e-13, "V'": 0.10108860974687595}, "uses complex language.": {"p-value": 6.348849962173354e-11, "V'": 0.08256016852951108}, "uses complicated language and/or descriptive words.": {"p-value": 3.741761300021598e-07, "V'": 0.07398449660749767}, "uses long sentences.": {"p-value": 2.0431102568888582e-10, "V'": 0.07308774245751937}, "uses long, complicated words to describe the recipe title.": {"p-value": 4.815119560925068e-10, "V'": 0.11192194432605665}, "uses long words and complex language structures.": {"p-value": 2.416447274669225e-12, "V'": 0.11917230181888622}, "uses complex words.": {"p-value": 5.363920572424951e-08, "V'": 0.12276776001172615}}, "-": {}, "research goal": "The dataset includes the outputs generated by tk3b on the task569_recipe_nlg_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you have to generate the title of the recipe given its required ingredients and directions.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk3b on the datapoints where tk11b is better than tk3b, while the Group B snippets outputs generated by tk3b on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. 
My goal is to figure out how to build a better system. "}, {"+": {"uses long sentences with complex grammar.": {"p-value": 1.8313385063220313e-12, "V'": 0.2133374425560957}, "has complex vocabulary.": {"p-value": 5.553371791688525e-09, "V'": 0.17524159832855957}, "uses descriptive words.": {"p-value": 2.3125999359698786e-13, "V'": 0.2245648483102048}, "contains details.": {"p-value": 0.0002244714017980047, "V'": 0.07012272310029788}, "contains complex grammatical structures.": {"p-value": 2.679604489660173e-20, "V'": 0.25824910259920797}, "contains complex and long words.": {"p-value": 5.0160665801649614e-09, "V'": 0.1674263239779672}, "contains complex sentence structures.": {"p-value": 8.506691039285166e-13, "V'": 0.2045578308882967}, "contains complex syntactical structures, such as multiple clauses.": {"p-value": 1.3347203092662776e-20, "V'": 0.28125991203332323}, "contains multiple facts about a single subject.": {"p-value": 2.515117072504656e-07, "V'": 0.1313139745809354}, "focuses on general facts about a location.": {"p-value": 8.021397886198929e-09, "V'": 0.17449354832397557}, "has details, including facts and information about the subject.": {"p-value": 1.0378808923600918e-07, "V'": 0.14288199456047612}, "uses complex sentence structures.": {"p-value": 1.8485311956202203e-12, "V'": 0.20276886765447055}, "utilizes complex sentence structures.": {"p-value": 4.2243770077663924e-12, "V'": 0.20083849853490698}, "uses a variety of nouns, adjectives, and verbs to describe the triplets.": {"p-value": 1.7042404823374757e-20, "V'": 0.25757957378175694}, "contains an implication of a comparison between two variables": {"p-value": 2.2053205549115469e-10, "V'": 0.19197591336541409}, "has complex sentences.": {"p-value": 7.573712474243023e-13, "V'": 0.20832349629163438}, "has a complex sentence structure": {"p-value": 1.2779039095945843e-07, "V'": 0.14099515965578047}, "includes entity details, such as location, country, etc.": {"p-value": 0.00043071021486216197, "V'": 
0.09480859207389225}}, "-": {"contains precise and accurate language about numbers and dates.": {"p-value": 2.0872018598307045e-08, "V'": 0.1701908535553085}}, "research goal": "The dataset includes the input of the task1409_dart_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given triplets. Each triplet is in the form of [subject, predicate, object]. Your task is to generate proper sentence that utilizes these triples. The objective is to construct a sentence that (a) captures the facts specified in the triples and (b) is a well-formed sentence easily understandable by a human. All triple values need not be used directly in the sentence as long as the facts are adequately captured.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets task input on the datapoints where tk11b is better than tk3b, while the Group B snippets task input on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher. My goal is to figure out how to build a better system. 
"}, {"+": {"contains references to everyday life, such as restaurants and locations.": {"p-value": 2.272713638718307e-10, "V'": 0.19313824255386491}, "uses long and complex words.": {"p-value": 2.5949189240004365e-15, "V'": 0.24144853884045608}, "contains culturally specific references, such as Indian food or Mexican sports.": {"p-value": 6.594322083743504e-10, "V'": 0.16951491183323494}, "uses a complex sentence structure.": {"p-value": 9.063378342967077e-12, "V'": 0.17768169081958884}, "uses complex noun phrases to convey information.": {"p-value": 2.5154514907433048e-14, "V'": 0.22084931117071627}, "uses complex language, i.e., words with multiple syllables.": {"p-value": 3.130330910866524e-11, "V'": 0.20275681751305602}, "refers to a specific location or place": {"p-value": 1.4965670250312952e-06, "V'": 0.13772724913939838}, "uses complex syntax, such as subordinate clauses.": {"p-value": 3.079854209103378e-18, "V'": 0.19838099912587195}, "uses long-winded sentences and complex grammar.": {"p-value": 8.044570475299648e-07, "V'": 0.09821110989911008}, "uses a lot of complex language and structure.": {"p-value": 6.473455986432432e-07, "V'": 0.06926764339620817}, "uses complex language and grammar structure.": {"p-value": 1.7172178049182214e-08, "V'": 0.12916058676617007}, "uses domain-specific words.": {"p-value": 6.218301294842935e-12, "V'": 0.1917539456627554}}, "-": {}, "research goal": "The dataset includes the outputs generated by tk11b on the task1409_dart_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given triplets. Each triplet is in the form of [subject, predicate, object]. Your task is to generate proper sentence that utilizes these triples. The objective is to construct a sentence that (a) captures the facts specified in the triples and (b) is a well-formed sentence easily understandable by a human. 
All triple values need not be used directly in the sentence as long as the facts are adequately captured.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk11b on the datapoints where tk11b is better than tk3b, while the Group B snippets outputs generated by tk11b on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {"uses complex, technical vocabulary.": {"p-value": 0.0005989534021023817, "V'": 0.05685305754958804}, "mentions a specific geographic location.": {"p-value": 0.0002082939928778952, "V'": 0.10820833092581533}, "uses technical terms, such as names of athletes, airports, etc.": {"p-value": 9.872203268884488e-13, "V'": 0.20415005463955307}, "uses a lot of numbers and facts.": {"p-value": 1.0327987792703284e-14, "V'": 0.2001714091566304}, "uses complex language and sentence structure.": {"p-value": 0.00010480766449540416, "V'": 0.07978528011490264}, "uses specific terms and numbers, such as dates and percentages.": {"p-value": 1.0289337112785572e-18, "V'": 0.25189313308690175}, "refers to historical events or people.": {"p-value": 1.543638764732735e-05, "V'": 0.11561319537645784}, "uses complex and long sentences.": {"p-value": 1.0544867211896067e-06, "V'": 0.12970434880119736}, "uses technical terms, such as names of organizations and airports.": {"p-value": 0.00032911552928344584, "V'": 0.09297977271519983}}, "research goal": "The dataset includes the outputs generated by tk3b on the task1409_dart_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given triplets. Each triplet is in the form of [subject, predicate, object]. Your task is to generate proper sentence that utilizes these triples. 
The objective is to construct a sentence that (a) captures the facts specified in the triples and (b) is a well-formed sentence easily understandable by a human. All triple values need not be used directly in the sentence as long as the facts are adequately captured.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk3b on the datapoints where tk11b is better than tk3b, while the Group B snippets outputs generated by tk3b on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {"mentions the price of the food with a negative connotation.": {"p-value": 2.3088692469308282e-14, "V'": 0.20774239351306326}, "mentions a specific location, such as an area in the city.": {"p-value": 9.713088488972157e-08, "V'": 0.1288462356553043}, "contains descriptive words and phrases.": {"p-value": 3.976154235300462e-12, "V'": 0.12626220229637575}, "uses complex words and long sentences.": {"p-value": 1.5605395151497828e-29, "V'": 0.3254629676666605}, "mentions a specific location as part of the review.": {"p-value": 5.984765180919635e-08, "V'": 0.12884616904613744}, "mentions a near location as part of the review.": {"p-value": 1.5967376775123408e-07, "V'": 0.12692297845664924}, "mentions the price of a dish.": {"p-value": 1.0696584230641908e-21, "V'": 0.28159987670129016}, "mentions the price range of the restaurant in detail": {"p-value": 2.5739371958027265e-61, "V'": 0.46346135236939656}}, "-": {"mentions bad service or quality.": {"p-value": 3.319156546736957e-10, "V'": 0.12533508753128578}, "mentions the decor of the restaurant in a negative way.": {"p-value": 2.4479369346858224e-07, "V'": 0.07307652353278682}, "mentions the decor of the restaurant as bad.": {"p-value": 2.447518166538385e-07, "V'": 0.07307697134421802}, "mentions the decor in a positive way.": {"p-value": 
0.00019688855759814194, "V'": 0.04038456681329184}, "mentions the decor of the restaurant.": {"p-value": 6.425480057810074e-11, "V'": 0.1134617854348689}, "contains detailed descriptions about the restaurant, such as the decor, quality of food, and service": {"p-value": 4.882308429783951e-12, "V'": 0.12884607992141847}}, "research goal": "The dataset includes the input of the task1598_nyc_long_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"The task is to write a full sentence or two using all of the information given. The sentence(s) will be a brief review of a restaurant. Use all of the information provided.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets task input on the datapoints where tk11b is better than tk3b, while the Group B snippets task input on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher. My goal is to figure out how to build a better system. 
"}, {"+": {"mentions the type of cuisine being served": {"p-value": 6.222368984997685e-15, "V'": 0.19416442112734666}, "mentions the location of the restaurant being in the city centre.": {"p-value": 1.273941330513658e-29, "V'": 0.3021140015531395}, "uses descriptive language.": {"p-value": 6.926654353758803e-17, "V'": 0.14444803607614487}, "uses descriptive language": {"p-value": 9.228888326716695e-15, "V'": 0.12516384980679407}, "mentions a location in the city centre.": {"p-value": 3.2981007746089258e-28, "V'": 0.29630409936846674}, "Uses complex words and/or long sentences.": {"p-value": 6.30565414560031e-14, "V'": 0.16370190187809242}, "uses fewer adjectives to describe the restaurant's features.": {"p-value": 3.549012983151043e-07, "V'": 0.06139076377506769}, "contains complex sentence structures.": {"p-value": 0.00038296857538721316, "V'": 0.03284038606355133}, "contains detailed descriptions of locations.": {"p-value": 7.735360153391514e-05, "V'": 0.07299791764317613}, "contains general descriptions.": {"p-value": 1.6775676314331147e-26, "V'": 0.22162384794038748}, "mentions high prices.": {"p-value": 0.00044209163537668733, "V'": 0.08284727094469968}, "mismatches between the quality and price of the meal.": {"p-value": 0.00038635627601696755, "V'": 0.10239332153530967}}, "-": {"uses abstract words.": {"p-value": 9.464631876025413e-05, "V'": 0.11821670883671875}, "mentions slow service.": {"p-value": 2.1158167319165537e-07, "V'": 0.06552637963961899}, "mentions bad customer service.": {"p-value": 2.115967042536413e-07, "V'": 0.0655262036517604}, "mentions bad service of the restaurant.": {"p-value": 4.093709027728331e-08, "V'": 0.07126255638651102}}, "research goal": "The dataset includes the outputs generated by tk11b on the task1598_nyc_long_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"The task is to write a full sentence or two using all of the information given. 
The sentence(s) will be a brief review of a restaurant. Use all of the information provided.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk11b on the datapoints where tk11b is better than tk3b, while the Group B snippets outputs generated by tk11b on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {"mentions the prices of the food in the restaurant.": {"p-value": 3.607681466504509e-105, "V'": 0.5561715020122866}, "mentions the cuisine served in the restaurant.": {"p-value": 7.92829899920145e-15, "V'": 0.16978111959760855}, "mentions the prices of the food or drinks.": {"p-value": 5.067133218163803e-103, "V'": 0.5512095543315993}, "uses descriptive language": {"p-value": 1.5896211660911304e-08, "V'": 0.12357061485673615}, "uses descriptive words.": {"p-value": 4.5928208543753716e-10, "V'": 0.12529785698123364}}, "-": {"Uses descriptive words, such as 'fantastic' or 'high quality'.": {"p-value": 6.871710262140037e-36, "V'": 0.2421059450767971}, "Uses positive words/phrases, such as 'child friendly' or 'high customer rating'.": {"p-value": 1.471920188760096e-57, "V'": 0.36600076843136464}, "uses long and complex sentence structure.": {"p-value": 6.652117095925733e-70, "V'": 0.43276740644202805}, "mentions the customer rating of the restaurant.": {"p-value": 1.5538128484017767e-64, "V'": 0.430041243335362}, "uses descriptive adjectives.": {"p-value": 1.2052122430283234e-14, "V'": 0.09815282622156513}, "uses positive language.": {"p-value": 2.0421585158448146e-27, "V'": 0.2037293715438938}, "mentions the quality of the food served.": {"p-value": 1.5371263376283855e-74, "V'": 0.4658040690187238}, "Uses overly simplistic language and/or descriptions.": {"p-value": 3.33128020515768e-16, "V'": 0.1981343833376615}, "mentions bad quality food and/or bad service.": {"p-value": 
7.831346957709745e-27, "V'": 0.2020710480486362}, "Uses language that is casual.": {"p-value": 7.417486628738032e-22, "V'": 0.17891449521980454}, "uses complex sentence structures.": {"p-value": 7.202283823503378e-95, "V'": 0.5424494294359463}, "has long sentences.": {"p-value": 2.079349336210042e-16, "V'": 0.12702017492246365}}, "research goal": "The dataset includes the outputs generated by tk3b on the task1598_nyc_long_text_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"The task is to write a full sentence or two using all of the information given. The sentence(s) will be a brief review of a restaurant. Use all of the information provided.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk3b on the datapoints where tk11b is better than tk3b, while the Group B snippets outputs generated by tk3b on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {}, "research goal": "The dataset includes the input of the task1356_xlsum_title_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"Generate an appropriate title for the given text. The generated title must be short and include the main topic of the text. The preferred titles are under fifteen words.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets task input on the datapoints where tk11b is better than tk3b, while the Group B snippets task input on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {}, "research goal": "The dataset includes the outputs generated by tk11b on the task1356_xlsum_title_generation task in the AI2-Natural Instruction dataset. 
The task definition is \n\n\"Generate an appropriate title for the given text. The generated title must be short and include the main topic of the text. The preferred titles are under fifteen words.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk11b on the datapoints where tk11b is better than tk3b, while the Group B snippets outputs generated by tk11b on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {"contains complex language structures.": {"p-value": 0.0009238438923979129, "V'": 0.03266242681531889}}, "-": {"references to international news items, such as whales being sighted in Thailand.": {"p-value": 0.0007547057707579146, "V'": 0.09198260960884583}}, "research goal": "The dataset includes the outputs generated by tk3b on the task1356_xlsum_title_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"Generate an appropriate title for the given text. The generated title must be short and include the main topic of the text. The preferred titles are under fifteen words.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk3b on the datapoints where tk11b is better than tk3b, while the Group B snippets outputs generated by tk3b on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {}, "-": {"mentions specific states or areas in the United States.": {"p-value": 0.0007621811170972905, "V'": 0.09088851311355739}}, "research goal": "The dataset includes the input of the task1659_title_generation task in the AI2-Natural Instruction dataset. 
The task definition is \n\n\"In this task, you are given a summary for US Congressional and California state bill, your task is to generate a Title for this bill. The preferred titles are under forty words and mention the purpose of the bill.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets task input on the datapoints where tk11b is better than tk3b, while the Group B snippets task input on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher. My goal is to figure out how to build a better system. "}, {"+": {"uses legal terminology and jargon": {"p-value": 2.2659312900300674e-07, "V'": 0.06908212400514402}, "mentions specific laws, such as the Civil Rights Act of 1997.": {"p-value": 2.2754305384947918e-12, "V'": 0.11896320561531626}, "mentions a specific year in its title.": {"p-value": 4.675505772030863e-05, "V'": 0.12450632091019942}}, "-": {}, "research goal": "The dataset includes the outputs generated by tk11b on the task1659_title_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given a summary for US Congressional and California state bill, your task is to generate a Title for this bill. The preferred titles are under forty words and mention the purpose of the bill.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk11b on the datapoints where tk11b is better than tk3b, while the Group B snippets outputs generated by tk11b on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. 
"}, {"+": {"contains legal terminology, such as 'shall' or 'shall not'.": {"p-value": 2.258962982168635e-13, "V'": 0.2125623451492461}, "contains technical, legal language.": {"p-value": 1.0544796141257079e-07, "V'": 0.11093573885634467}, "uses long sentences.": {"p-value": 1.8828720124652927e-06, "V'": 0.07997534700428155}, "uses technical language.": {"p-value": 7.35163602256008e-07, "V'": 0.12471392121683988}, "contains complex and lengthy sentences": {"p-value": 1.9079765096847476e-11, "V'": 0.1608377014771427}, "contains terminology related to financial regulation and oversight": {"p-value": 5.8855139077784064e-09, "V'": 0.1712902374473913}, "contains technical terms and jargon.": {"p-value": 1.6247816222106725e-07, "V'": 0.15086644763080792}, "contains legal language, such as terms related to taxes, fees and regulations.": {"p-value": 2.1177881050207975e-10, "V'": 0.18534435171302777}, "involves the use of technical terms or jargon.": {"p-value": 5.969828268381887e-05, "V'": 0.12023368365160969}, "contains legal language that refers to specific laws or regulations.": {"p-value": 3.1707382338224995e-10, "V'": 0.19304488966923888}}, "-": {}, "research goal": "The dataset includes the outputs generated by tk3b on the task1659_title_generation task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given a summary for US Congressional and California state bill, your task is to generate a Title for this bill. The preferred titles are under forty words and mention the purpose of the bill.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets outputs generated by tk3b on the datapoints where tk11b is better than tk3b, while the Group B snippets outputs generated by tk3b on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. 
"}, {"+": {}, "-": {"contains a request for the bot to complete a difficult task.": {"p-value": 4.412058239359951e-05, "V'": 0.24030924567377654}}, "research goal": "The dataset includes the whole input of the task1394_meta_woz_task_classification task in the AI2-Natural Instruction dataset. The task definition is \n\n\"In this task, you are given four sentences: a bot task sentence, a bot role sentence, a user task sentence and a user role sentence. Your job is to classify given sentences into one of the 47 different domains. The domains are: 'UPDATE_CALENDAR', 'PRESENT_IDEAS', 'MOVIE_LISTINGS', 'AUTO_SORT', 'GAME_RULES', 'CONTACT_MANAGER', 'BANK_BOT', 'MUSIC_SUGGESTER', 'CHECK_STATUS', 'PET_ADVICE', 'HOW_TO_BASIC', 'NAME_SUGGESTER', 'QUOTE_OF_THE_DAY_BOT', 'GUINESS_CHECK', 'INSURANCE', 'RESTAURANT_PICKER', 'MAKE_RESTAURANT_RESERVATIONS', 'WEDDING_PLANNER', 'SKI_BOT', 'HOME_BOT', 'PLAY_TIMES', 'BUS_SCHEDULE_BOT', 'WHAT_IS_IT', 'PHONE_PLAN_BOT', 'DECIDER_BOT', 'PHONE_SETTINGS', 'TIME_ZONE', 'LIBRARY_REQUEST', 'UPDATE_CONTACT', 'CATALOGUE_BOT', 'PROMPT_GENERATOR', 'SCAM_LOOKUP', 'SPORTS_INFO', 'POLICY_BOT', 'CITY_INFO', 'APARTMENT_FINDER', 'EVENT_RESERVE', 'SHOPPING', 'EDIT_PLAYLIST', 'LOOK_UP_INFO', 'ORDER_PIZZA', 'WEATHER_CHECK', 'APPOINTMENT_REMINDER', 'GEOGRAPHY', 'STORE_DETAILS', 'AGREEMENT_BOT', 'ALARM_SET'.\". The two classes are generated based on how well different models performed on each sample. The Group A snippets whole input on the datapoints where tk11b is better than tk3b, while the Group B snippets whole input on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. 
"}, {"+": {"uses words or phrases that are unclear or ambiguous.": {"p-value": 7.05950138646497e-05, "V'": 0.10074501396658472}, "contains specific questions, such as requesting a phone number or live music.": {"p-value": 1.1272760776650366e-05, "V'": 0.13448016945535168}, "contains questions.": {"p-value": 2.3584324671094587e-29, "V'": 0.3402916030275108}, "asks for information.": {"p-value": 6.031518889419806e-32, "V'": 0.34114934245216044}, "contains a request for information about a location.": {"p-value": 1.1187433817774926e-05, "V'": 0.1383319442658083}, "contains requests for the virtual assistant to provide information.": {"p-value": 1.5651846208903053e-09, "V'": 0.18854733327282747}, "uses conversational language.": {"p-value": 1.0730781880485901e-19, "V'": 0.28244554090986695}, "asks for detailed information about the restaurant.": {"p-value": 4.40731546375392e-10, "V'": 0.17023863770557804}}, "-": {"refers to a particular time frame, such as a date or time.": {"p-value": 8.223076019747075e-07, "V'": 0.13631760818531952}, "mentions a specific time period.": {"p-value": 5.532063910577411e-08, "V'": 0.14626328716730483}, "contains words or phrases about making a reservation.": {"p-value": 4.775388106427832e-06, "V'": 0.12632947661746177}, "contains words of gratitude such as 'thank you' or 'appreciate'": {"p-value": 2.7078737860487607e-06, "V'": 0.06869501469965951}, "Contains requests for specific times or locations.": {"p-value": 5.275497517012672e-08, "V'": 0.17103454260703838}, "uses specific language to describe the reservation.": {"p-value": 5.185186648251915e-06, "V'": 0.13146970179978917}}, "research goal": "The dataset includes the whole input of the task879_schema_guided_dstc8_classification task in the AI2-Natural Instruction dataset. The task definition is \n\n\"You are given a sentence from a conversation between a human and a virtual assistant. Your task is to identify whether the sentence is a question or not. Answer with Yes or No.\". 
The two classes are generated based on how well different models performed on each sample. The Group A snippets whole input on the datapoints where tk11b is better than tk3b, while the Group B snippets whole input on the datapoints where tk11b is worse than tk3b. I am a natural language processing researcher.. My goal is to figure out how to build a better system. "}, {"+": {"uses complex words and phrases.": {"p-value": 6.793248595113309e-05, "V'": 0.08706168694222527}, "contain a lot of colloquial language.": {"p-value": 2.464102641065064e-11, "V'": 0.06022565811438123}, "contains complex syntax structures": {"p-value": 3.190684227143622e-20, "V'": 0.1970551758226411}, "refers to pop culture, such as movies or television shows.": {"p-value": 0.00029749068716089175, "V'": 0.03538897826964783}, "contains a comparison or metaphor.": {"p-value": 3.26291924557731e-05, "V'": 0.04394133905361157}, "has an informal tone, such as slang or colloquial speech.": {"p-value": 1.4627766883089104e-35, "V'": 0.2360575829171132}, "uses figurative language, such as metaphors, similes, and analogies.": {"p-value": 7.64379320187409e-10, "V'": 0.09019035284858636}, "involves a comparison between two entities": {"p-value": 7.570752470502921e-05, "V'": 0.07025220730928489}}, "-": {}, "research goal": "The dataset includes a collection of sentence pairs annotated with textual entailment information from a range of genres. The two classes are generated based on whether the IID-trained model generalizes better than the OOD-trained model. The Group A snippets are input datapoints where the ground truth is entailment, the model trained with IID data predicts correctly, but the model trained with OOD data predicts incorrectly, while the Group B snippets other datapoints. I am an natural langauge processing researcher. My goal is to figure out understand what datapoints are in-distribution trained models better. 
"}, {"+": {"mentions complex topics such as economic policies or regulations.": {"p-value": 4.3404716434561544e-05, "V'": 0.05513462799647771}}, "-": {"contain difficult to parse language structures.": {"p-value": 6.8762556170442975e-09, "V'": 0.07342960245941396}}, "research goal": "The dataset includes a collection of sentence pairs annotated with textual entailment information from image https://github.com/SALT-NLP/Parenting_OnlineUsage. The two classes are generated based on whether the IID-trained model generalizes better than the OOD-trained model. The Group A snippets are input datapoints where the ground truth is non-entailment, the model trained with IID data predicts correctly, but the model trained with OOD data predicts incorrectly, while the Group B snippets other datapoints. I am an natural langauge processing researcher. My goal is to figure out understand what datapoints are in-distribution trained models better. "}, {"+": {"contain legal terminology.": {"p-value": 2.5676400861032697e-08, "V'": 0.04024559790762218}, "Involves descriptive language about a person or place.": {"p-value": 1.4292906600258086e-08, "V'": 0.09357588166507988}}, "-": {"uses military lingo and terminology.": {"p-value": 1.5277842232825892e-10, "V'": 0.06299975428587906}, "mentions a specific airplane and flight number.": {"p-value": 1.7480576193547026e-05, "V'": 0.01999988707512694}, "talks about terrorism or hijacking.": {"p-value": 4.0569319493564205e-11, "V'": 0.054000022282856615}, "mentions terrorism, warfare, or violence.": {"p-value": 0.000167585807911272, "V'": 0.04700835473649883}}, "research goal": "The dataset includes training examples from various natural language inference (NLI) datasets. The two classes are generated based on which split is the hypothesis from. The Group A snippets hypotheses from the dev matched split, while the Group B snippets hypotheses from the dev mismatched split. I am a natural language processing researcher. 
My goal is to figure out what are the differences between the two splits (distribution shift) so that I can build better models. "}, {"+": {"uses long sentences.": {"p-value": 2.8627514522912757e-37, "V'": 0.18822358327515426}, "Discusses topics related to government and policy.": {"p-value": 5.3270265625883165e-36, "V'": 0.1471350411308832}, "involves complex technical topics, such as engineering and accounting.": {"p-value": 5.824927386742392e-07, "V'": 0.02399980143358243}, "uses references to historical events or locations": {"p-value": 1.1454119893866875e-31, "V'": 0.13825875753579692}, "uses complex words and phrasings.": {"p-value": 6.534889371634861e-29, "V'": 0.1238334666410371}, "uses long and complex sentence structures.": {"p-value": 2.4007932442656872e-55, "V'": 0.3009742846192305}, "uses words related to religious beliefs or practices.": {"p-value": 0.00025958924164791556, "V'": 0.02115736440636968}, "mentions historical locations or events.": {"p-value": 3.833826877398839e-18, "V'": 0.08219021216127492}, "refers to historical events,": {"p-value": 9.68283791242753e-16, "V'": 0.06670869000309537}, "are written in casual and conversational language.": {"p-value": 5.251048547180531e-05, "V'": 0.043966672296541706}}, "-": {"involves physical activity, such as walking, playing, climbing, or biking.": {"p-value": 7.068592183837589e-101, "V'": 0.36735232853119437}, "involves physical activities such as throwing, climbing, and shopping.": {"p-value": 2.2097055449928282e-85, "V'": 0.3387992247631419}, "contains action verbs, such as jumping or running.": {"p-value": 1.6937039674169946e-92, "V'": 0.37140516398310336}, "talks about physical activities, such as surfing, racing, playing sports, etc.": {"p-value": 5.402656571842017e-69, "V'": 0.2780407625315511}, "have concrete sentences.": {"p-value": 1.1327810606089643e-47, "V'": 0.18209690704002301}, "uses specific language to describe people, e.g., 'blond hair'": {"p-value": 0.000860862962764494, "V'": 
0.07565746384111421}, "mentions everyday activities, such as playing sports or going to the park": {"p-value": 4.055347761487183e-75, "V'": 0.28933286672228975}, "uses concrete nouns, such as 'flag' or 'biker'": {"p-value": 1.9809305219276005e-106, "V'": 0.4462325048964948}, "involves physical actions like running, walking, or cooking.": {"p-value": 2.1355744899198854e-115, "V'": 0.4127522399489144}}, "research goal": "The dataset includes training examples from various natural language inference (NLI) datasets. The two classes are generated based on which dataset the hypothesis is coming from. The Group A snippets hypotheses from the MNLI in-distribution data, while the Group B snippets hypotheses from the SNLI in-distribution data. I am a natural language processing researcher. My goal is to figure out what are the differences between the two datasets (distribution shift) so that I can build better models. "}, {"+": {}, "-": {}, "research goal": "The dataset includes training examples from various natural language inference (NLI) datasets. The two classes are generated based on whether the corresponding label is entailment. The Group A snippets hypotheses from the dev matched split with label entailment, while the Group B snippets hypotheses from the dev matched split that do not have label entailment. I am a natural language processing researcher. My goal is to figure out what are the spurious correlations between the labels and the hypotheses so that I can build better models. "}, {"+": {}, "-": {}, "research goal": "The dataset includes training examples from various natural language inference (NLI) datasets. The two classes are generated based on whether the corresponding label is neutral. The Group A snippets hypotheses from the dev matched split with label neutral, while the Group B snippets hypotheses from the dev matched split that do not have label neutral. I am a natural language processing researcher. 
My goal is to figure out what are the spurious correlations between the labels and the hypotheses so that I can build better models. "}, {"+": {}, "-": {"uses a complex sentence structure.": {"p-value": 0.0006235798765867377, "V'": 0.06377366384033922}}, "research goal": "The dataset includes training examples from various natural language inference (NLI) datasets. The two classes are generated based on whether the corresponding label is contradiction. The Group A snippets hypotheses from the dev matched split with label contradiction, while the Group B snippets hypotheses from the dev matched split that do not have label contradiction. I am a natural language processing researcher. My goal is to figure out what are the spurious correlations between the labels and the hypotheses so that I can build better models. "}, {"+": {}, "-": {}, "research goal": "The dataset includes a reading comprehension dataset of yes/no questions. The two classes are generated based on whether the answer to the question is yes or no. The Group A snippets questions from the BoolQ dataset with answer yes, while the Group B snippets questions from the BoolQ dataset with answer no. I am a natural language processing researcher. My goal is to figure out what are the spurious correlations between the labels and the questions so that I can build better models. "}, {"+": {"asks a lot of questions about a particular country.": {"p-value": 2.735907126434342e-05, "V'": 0.04851655853581262}, "questions about lifestyle changes or advice.": {"p-value": 9.23855435788047e-05, "V'": 0.05688834918927371}}, "-": {}, "research goal": "The dataset includes questions from Quora.com. The two classes are generated based on whether the questions have a duplicate in the dataset. The Group A snippets questions from the QQP dataset with label duplicate, while the Group B snippets questions from the QQP dataset with label non duplicate. I am a natural language processing researcher. 
My goal is to figure out what are the spurious correlations between the labels (whether a question is duplicated) and the questions so that I can build better models. "}, {"+": {}, "-": {}, "research goal": "The dataset includes reading comprehension questions crowdsourced from Wikipedia articles. The two classes are generated based on whether the question is answerable given the paragraph. The Group A snippets questions from the SQuAD v2 dataset that are answerable, while the Group B snippets questions from the SQuAD v2 dataset that are unanswerable. I am a natural language processing researcher. My goal is to figure out what are the spurious correlations between the labels (whether a question is answerable) and the questions so that I can build better models. "}, {"+": {}, "-": {}, "research goal": "The dataset includes news articles collected from various outlets between 2015 and 2017. The two classes are generated based on which automatically generated \"cluster\" the snippet is from. The Group A snippets are from a particular cluster, while the Group B snippets are from the rest of the cluster. I am a data scientist performing unsupervised clustering. My goal is to figure out what each cluster represents. "}, {"+": {"mentions a technology innovation": {"p-value": 1.93516452299894e-08, "V'": 0.08931238183178739}, "discusses economic issues, such as business or industry.": {"p-value": 2.8454230929861434e-34, "V'": 0.2917968306512612}, "discusses a major shift in business or technology.": {"p-value": 3.4717451143726423e-15, "V'": 0.14761212390206366}, "involves business and finance": {"p-value": 1.0756612525239583e-20, "V'": 0.21104998240551404}, "concerns developments in technology and/or science.": {"p-value": 1.1343911990452869e-08, "V'": 0.10460966941950316}}, "-": {}, "research goal": "The dataset includes news articles collected from various outlets between 2015 and 2017. 
The two classes are generated based on which automatically generated \"cluster\" the snippet is from. The Group A snippets are from a particular cluster, while the Group B snippets are from the rest of the cluster. I am a data scientist performing unsupervised clustering. My goal is to figure out what each cluster represents. "}, {"+": {"mentions famous people, such as Robert Durst or Donald Trump.": {"p-value": 3.0552055864270386e-06, "V'": 0.13073190634358112}, "mentions prominent political figures, such as Trump and Ryan.": {"p-value": 0.0009558000290759068, "V'": 0.08523131919800003}, "mentions celebrities or popular figures": {"p-value": 6.475534272201713e-07, "V'": 0.13984831351674626}, "refers to a specific person, such as a president or a criminal.": {"p-value": 3.209494629222085e-06, "V'": 0.13532778469984735}}, "-": {"discusses religious issues.": {"p-value": 6.2382068785040104e-06, "V'": 0.057620173414752104}}, "research goal": "The dataset includes news articles collected from various outlets between 2015 and 2017. The two classes are generated based on which automatically generated \"cluster\" the snippet is from. The Group A snippets are from one cluster, while the Group B snippets are from a very close cluster. I am a data scientist performing unsupervised clustering. My goal is to figure out what each cluster represents. 
"}, {"+": {"refers to a particular event in history.": {"p-value": 3.582679919040533e-16, "V'": 0.4864773842844302}, "Talks about military engagements and battles.": {"p-value": 2.395161088734007e-89, "V'": 0.8888887471853749}, "refers to historical events, such as battles, wars or military operations.": {"p-value": 9.51411548349697e-66, "V'": 0.8174603459121622}, "describes military battles and campaigns.": {"p-value": 1.3560062328755867e-87, "V'": 0.8889241573280392}, "references a battle or conflict.": {"p-value": 1.1066018890752897e-62, "V'": 0.8174606298498481}, "refers to military operations and battles.": {"p-value": 7.333560282360864e-83, "V'": 0.8730161140688543}, "discusses military operations or battles.": {"p-value": 1.2389235210363637e-88, "V'": 0.888888983509173}, "references military actions, such as battles and invasions.": {"p-value": 6.006906327135719e-80, "V'": 0.865079808714221}}, "-": {}, "research goal": "The dataset includes text snippets from Wikipedia. The two classes are generated based on which automatically generated \"cluster\" the snippet is from. The Group A snippets are from a particular cluster, while the Group B snippets are from the rest of the cluster. I am a data scientist performing unsupervised clustering. My goal is to figure out what each cluster represents. 
"}, {"+": {"mentions a specific character from a manga or anime series": {"p-value": 2.2190752907479258e-23, "V'": 0.72131082693117}, "contains references to video games, such as game titles and characters": {"p-value": 1.1671131817574223e-17, "V'": 0.639343707644664}, "refers to videogames, such as levels, characters, or mechanics.": {"p-value": 1.5908503378493724e-14, "V'": 0.5737698436721823}, "Involves characters from popular culture, such as movies, tv shows, manga, etc.": {"p-value": 7.09827289580748e-21, "V'": 0.7213116663290987}, "mentions a specific game or sports.": {"p-value": 6.480268208352104e-09, "V'": 0.4590167028937895}, "mentions specific characters from a series or game.": {"p-value": 4.02719588038503e-24, "V'": 0.7540978808864203}, "mentions a video game, including characters and gameplay elements.": {"p-value": 3.01047612110956e-15, "V'": 0.5901643492268381}, "mentions characters or events from popular culture, such as movies and games.": {"p-value": 5.535181934377364e-29, "V'": 0.8032778757876782}, "talks about video games, such as characters and gameplay.": {"p-value": 5.214857683915787e-16, "V'": 0.6065613693663592}}, "-": {}, "research goal": "The dataset includes text snippets from Wikipedia. The two classes are generated based on which automatically generated \"cluster\" the snippet is from. The Group A snippets are from a particular cluster, while the Group B snippets are from the rest of the cluster. I am a data scientist performing unsupervised clustering. My goal is to figure out what each cluster represents. 
"}, {"+": {"references pop culture, such as movies, books, and television shows.": {"p-value": 2.0927315965193853e-30, "V'": 0.5819667276037502}, "refers to a type of art or artwork.": {"p-value": 0.00010523604285161488, "V'": 0.1232108479637119}, "focuses on criminal activities.": {"p-value": 3.6693699550769825e-15, "V'": 0.3606562336871572}, "refers to characters from a show or movie.": {"p-value": 3.5543888240452676e-63, "V'": 0.8117358440329893}, "refers to popular culture, such as movies and music.": {"p-value": 3.876127331935757e-28, "V'": 0.5542688490296447}}, "-": {"describes military action, such as battles and campaigns.": {"p-value": 3.4322359095869605e-65, "V'": 0.8199327875769322}, "mentions specific countries, such as Australia and New Zealand.": {"p-value": 2.5352401946846692e-06, "V'": 0.32812583818811447}, "references battles, wars, or military actions.": {"p-value": 1.0858344272000483e-57, "V'": 0.7871456575743914}, "talks about military operations and conflict.": {"p-value": 3.777128602648266e-46, "V'": 0.7217423692573024}, "references military operations and battles.": {"p-value": 1.893470024328686e-74, "V'": 0.8527187817777815}, "references military or combat operations.": {"p-value": 6.261350131341899e-72, "V'": 0.8445218749330742}, "mentions military or war activities.": {"p-value": 1.8934661633146227e-74, "V'": 0.8527188711305291}, "mentions military activities and battles.": {"p-value": 2.5840176066977673e-67, "V'": 0.8281286521269635}, "references a historical event, such as a battle or war.": {"p-value": 1.0717814340416627e-101, "V'": 0.9190735840191481}, "describes military operations, battles, or strategies.": {"p-value": 6.260336408500789e-72, "V'": 0.8445226751138964}, "refers to a specific location, such as a town or country.": {"p-value": 1.396199883450682e-26, "V'": 0.5940281634205954}, "describes a fight or a conflict.": {"p-value": 1.286479818558464e-12, "V'": 0.39708576062704176}}, "research goal": "The dataset includes text 
snippets from Wikipedia. The two classes are generated based on which automatically generated \"cluster\" the snippet is from. The Group A snippets are from one cluster, while the Group B snippets are from a somewhat close cluster. I am a data scientist performing unsupervised clustering. My goal is to figure out what each cluster represents. "}, {"+": {"uses philosophical and theoretical language.": {"p-value": 2.208002709914774e-161, "V'": 0.6385209018658776}, "talks about the power dynamics between different racial and ethnic groups.": {"p-value": 1.3173011383559505e-06, "V'": 0.0963939001824304}, "discusses the consequences of modernity and colonialism.": {"p-value": 8.877929739701101e-26, "V'": 0.2235155362888938}, "mentions legal discourse and conceptions of discrimination.": {"p-value": 3.27133505615512e-07, "V'": 0.08762154773827338}, "mentions oppression of certain people.": {"p-value": 3.369281174574713e-08, "V'": 0.12325354663208388}, "Uses philosophical language and theoretical concepts.": {"p-value": 1.82716320403587e-148, "V'": 0.6172403003220374}}, "-": {"mentions historical events, such as wars and battles.": {"p-value": 7.137623019719373e-10, "V'": 0.10545652237429652}, "refers to specific people, organizations, or countries.": {"p-value": 3.3781328219087374e-74, "V'": 0.45822673045503987}, "mentions a prominent figure or event in history": {"p-value": 3.729037321194229e-10, "V'": 0.12144871059899798}, "discusses events or policies related to the United States.": {"p-value": 7.109114659693552e-47, "V'": 0.3251009157554426}, "discusses economic issues, such as debt burdens and subsidies.": {"p-value": 9.139755226810365e-10, "V'": 0.0868338627686632}, "mentions a geopolitical event or action, such as the Cold War or the Brexit vote.": {"p-value": 1.1126465419764945e-27, "V'": 0.2178047793547367}, "talks about historical events and contexts": {"p-value": 6.947388646519714e-18, "V'": 0.21558855864256643}, "talks about economics, such as oil and energy 
production and subsidies.": {"p-value": 2.4074743890310756e-19, "V'": 0.13248430811492726}, "uses technical terms related to economics": {"p-value": 0.0002307845361261845, "V'": 0.06743674241972789}, "mentions the impact of technology or industrialization.": {"p-value": 2.655915396624671e-07, "V'": 0.08026693466374298}}, "research goal": "The dataset includes evidence compiled for American competitive policy debate, published online by debate camps. The two classes are generated based on which automatically generated \"cluster\" the snippet is from. The Group A snippets are from one cluster, while the Group B snippets are from a very close cluster. I am a data scientist performing unsupervised clustering. My goal is to figure out what each cluster represents. "}, {"+": {"talks about foreign affairs and military actions.": {"p-value": 3.6634785745490146e-190, "V'": 0.5946251015733519}, "mentions potential military conflict between US and China": {"p-value": 7.936479557498719e-05, "V'": 0.08139543921339767}, "talks about international relations and military conflicts.": {"p-value": 2.854735267679245e-173, "V'": 0.5742465996665036}, "discusses international tensions.": {"p-value": 1.7104362152845726e-129, "V'": 0.494834226234531}, "mentions the use of force or military action.": {"p-value": 2.42650705132371e-19, "V'": 0.21925612228251667}, "mentions military and defense policies.": {"p-value": 2.0550591959277565e-91, "V'": 0.40930160918801695}, "mentions international relations, such as the North Korean weapons technology.": {"p-value": 8.436056012120326e-155, "V'": 0.5089268838428415}, "mentions international players and/or organizations": {"p-value": 9.552512909673067e-195, "V'": 0.5974118521510677}, "mentions international economic environment or relations between countries": {"p-value": 3.533578597244354e-207, "V'": 0.6186051120044733}, "touches on the concept of economic prosperity and power.": {"p-value": 3.963324859388377e-14, "V'": 0.15448278984458283}}, "-": 
{"Discusses educational issues.": {"p-value": 8.665356817852794e-21, "V'": 0.09416248708988284}, "Talks about the impact of economic policies on citizens.": {"p-value": 3.2209797833994606e-07, "V'": 0.06305894349127884}, "Discusses economic reform and its potential impacts.": {"p-value": 8.660974427495324e-06, "V'": 0.06267084835165246}, "discusses the role of government in policy and education.": {"p-value": 0.00010080365153394064, "V'": 0.07328328872779355}}, "research goal": "The dataset includes evidence compiled for American competitive policy debate, published online by debate camps. The two classes are generated based on which automatically generated \"cluster\" the snippet is from. The Group A snippets are from one cluster, while the Group B snippets are from a very close cluster. I am a data scientist performing unsupervised clustering. My goal is to figure out what each cluster represents. "}, {"+": {}, "-": {"refers to a famous historical event or person when discussing a current issue.": {"p-value": 5.329096866566969e-07, "V'": 0.17670180464498214}, "involves discussions about international relations.": {"p-value": 0.00023190579615687043, "V'": 0.1310558062387177}, "discusses international relations and foreign policy.": {"p-value": 2.7315366346568894e-05, "V'": 0.14759445890835599}, "references historical events, figures, or philosophies.": {"p-value": 4.793428774015475e-07, "V'": 0.16705046937976697}}, "research goal": "The dataset includes evidence compiled for American competitive policy debate, published online by debate camps. The two classes are generated based on which automatically generated \"cluster\" the snippet is from. The Group A snippets are from one cluster, while the Group B snippets are from a somewhat close cluster. I am a data scientist performing unsupervised clustering. My goal is to figure out what each cluster represents. 
"}, {"+": {"portrays a scene from nature.": {"p-value": 4.94066791175071e-10, "V'": 0.13864198902738775}, "uses vivid and poetic language to describe an experience or event.": {"p-value": 1.0709615872728817e-91, "V'": 0.5281249730739987}, "contains references to nature, such as plants and animals.": {"p-value": 9.793272710783522e-11, "V'": 0.1597854299147926}, "refers to a specific place or location.": {"p-value": 8.865523334288645e-13, "V'": 0.15714361359219498}, "references to physical objects, such as a cup, a bed, a door, or a stool.": {"p-value": 3.015819441657147e-13, "V'": 0.18903078528671682}, "uses vivid imagery and metaphors to convey a feeling.": {"p-value": 5.040100005389598e-64, "V'": 0.45216673221889986}, "uses vivid imagery and description, such as landscapes and colors.": {"p-value": 6.604957721770652e-28, "V'": 0.28323731258608076}, "uses imagery to describe nature or landscapes.": {"p-value": 4.790077909502927e-11, "V'": 0.14711423587895703}}, "-": {}, "research goal": "The dataset includes poems from PoetryFoundation.com. The two classes are generated based on which automatically generated \"cluster\" the snippet is from. The Group A snippets are from a particular cluster, while the Group B snippets are from the rest of the cluster. I am a data scientist performing unsupervised clustering. My goal is to figure out what each cluster represents. "}, {"+": {"expresses feelings of love and/or desire.": {"p-value": 8.784402225645278e-07, "V'": 0.08827445017794877}, "uses symbolic language to describe personal experiences.": {"p-value": 6.37227578968047e-24, "V'": 0.2647056315858799}}, "-": {}, "research goal": "The dataset includes poems from PoetryFoundation.com. The two classes are generated based on which automatically generated \"cluster\" the snippet is from. The Group A snippets are from a particular cluster, while the Group B snippets are from the rest of the cluster. I am a data scientist performing unsupervised clustering. 
My goal is to figure out what each cluster represents. "}, {"+": {"features themes of loneliness and sadness.": {"p-value": 1.545749207311327e-06, "V'": 0.23352142942490506}, "focuses on familial relationships.": {"p-value": 0.0003113410338099805, "V'": 0.10101011226305896}, "uses abstract language to evoke emotion": {"p-value": 0.00028500363364581153, "V'": 0.16156186771552605}, "uses figurative language, such as metaphors and similes.": {"p-value": 0.00040806409873475207, "V'": 0.17676840092709828}}, "-": {}, "research goal": "The dataset includes poems from PoetryFoundation.com. The two classes are generated based on which automatically generated \"cluster\" the snippet is from. The Group A snippets are from a particular cluster, while the Group B snippets are from the rest of the cluster. I am a data scientist performing unsupervised clustering. My goal is to figure out what each cluster represents. "}, {"+": {"references US economic policies": {"p-value": 2.8529670267106246e-11, "V'": 0.09341028501463941}, "discusses the necessity of governmental action": {"p-value": 3.300382593099162e-06, "V'": 0.06418989748207617}, "mentions historical policies": {"p-value": 5.490349051404261e-13, "V'": 0.0858344311373418}, "Relies on current international law": {"p-value": 1.4083658381963118e-05, "V'": 0.03991821646002272}, "mention the controversial nature of engaging with Venezuela": {"p-value": 1.8950859635476503e-13, "V'": 0.0556982418036815}, "mentions the US embargo on Cuba": {"p-value": 1.6625765460662685e-12, "V'": 0.04897155505592693}, "Focuses on external threats, such as FARC and organized crime.": {"p-value": 1.0686906029981004e-09, "V'": 0.08378488219054928}, "mentions US intervention": {"p-value": 2.2779641552552218e-06, "V'": 0.06345720477593077}, "focuses on the impact of trade": {"p-value": 1.0647549233245572e-09, "V'": 0.08157327662432247}, "mentions US unilateral actions as a factor": {"p-value": 1.8039767932818892e-06, "V'": 0.07234286440635795}}, "-": 
{"mentions the implications of climate change": {"p-value": 5.978692691175868e-07, "V'": 0.042566227905571824}, "focus on technology related to the ocean": {"p-value": 7.061864091599643e-24, "V'": 0.1016166247616141}, "Emphasizes the importance of marine ecosystems": {"p-value": 9.327346959745728e-06, "V'": 0.026533443516207164}, "emphasizes climate change and its impacts": {"p-value": 1.0225483311400604e-07, "V'": 0.049113313643981395}, "focuses on the impact of environmental policies": {"p-value": 1.8835265766159466e-21, "V'": 0.13359344656743424}}, "research goal": "The dataset includes evidence compiled for American competitive policy debate, published online by debate camps. The two classes are generated based on which year the evidence was published. The Group A snippets were published in the year 2013, while the Group B snippets were published in the year 2014. I am a coach reflecting on the debate community. My goal is to figure out how debate topics have shifted over time. "}, {"+": {"focuses on educational reform": {"p-value": 2.515312076766631e-68, "V'": 0.2664190320190296}, "discusses racism within the education system": {"p-value": 1.8580962703029234e-08, "V'": 0.03489326242244949}, "focuses on education reform": {"p-value": 7.892749650681996e-67, "V'": 0.25970424613309184}, "mentions the need for reform": {"p-value": 3.706243191113532e-05, "V'": 0.0836892058192168}}, "-": {"mentions migration and immigration": {"p-value": 1.1826314697357163e-57, "V'": 0.2461766839480457}, "mentions immigration status and uncertainty": {"p-value": 1.6455016271760164e-16, "V'": 0.07523322813883193}, "Discusses the legal implications of immigration": {"p-value": 1.149589693944426e-31, "V'": 0.15505405975664072}, "emphasizes economic effects of immigration": {"p-value": 3.238714353534306e-23, "V'": 0.09896922420204055}}, "research goal": "The dataset includes evidence compiled for American competitive policy debate, published online by debate camps. 
The two classes are generated based on which year the evidence was published. The Group A snippets were published in the year 2017, while the Group B snippets were published in the year 2018. I am a coach reflecting on the debate community. My goal is to figure out how debate topics have shifted over time. "}, {"+": {"mentions the EPA": {"p-value": 3.920297463273126e-06, "V'": 0.02060830585710429}, "mentions water protection": {"p-value": 2.2235414490631094e-27, "V'": 0.10794947767852552}, "references offshore drilling": {"p-value": 1.9684676206513686e-05, "V'": 0.01766444810036565}, "mentions the environment and climate change": {"p-value": 6.6773999552422425e-37, "V'": 0.1790853167959704}, "emphasizes environmental protection": {"p-value": 1.506679000185044e-47, "V'": 0.202326637770916}, "Discusses environmental cooperation": {"p-value": 2.541751011579462e-19, "V'": 0.08439476225590178}}, "-": {"mentions the use of technology, such as AI": {"p-value": 9.626100300574164e-23, "V'": 0.11465365737817189}, "emphasizes military AI": {"p-value": 1.551444991431109e-10, "V'": 0.06042350138942011}, "mentions the use of AI weapons": {"p-value": 2.091374093774657e-06, "V'": 0.023322542454277145}, "focuses on the geopolitical implications of technological advancements": {"p-value": 2.2859811325744144e-30, "V'": 0.19920147411814104}, "emphasizes international cooperation": {"p-value": 3.7367412288548615e-37, "V'": 0.1952919139490536}, "mentions alliences or entanglement": {"p-value": 2.8886789024101102e-30, "V'": 0.20591240259529925}, "mentions the need for international cooperation": {"p-value": 3.1221614481181684e-36, "V'": 0.18169388542869724}}, "research goal": "The dataset includes evidence compiled for American competitive policy debate, published online by debate camps. The two classes are generated based on which year the evidence was published. The Group A snippets were published in the year 2021, while the Group B snippets were published in the year 2022. 
I am a coach reflecting on the debate community. My goal is to figure out how debate topics have shifted over time. "}, {"+": {"emphasizes social justice issues": {"p-value": 2.9965380010057136e-11, "V'": 0.18466469980024633}, "refers to environmental protection and sustainability.": {"p-value": 5.082333637730773e-06, "V'": 0.09156490324512906}, "discusses social justice issues": {"p-value": 2.6789333914055695e-10, "V'": 0.18054397082683582}, "Relates to environmental issues": {"p-value": 1.489731937965195e-06, "V'": 0.10244915866063659}, "discusses environmental issues": {"p-value": 1.6845373793592306e-05, "V'": 0.08777663768879161}, "focuses on structural inequalities": {"p-value": 1.833253992640728e-07, "V'": 0.12951366933380248}}, "-": {"Focuses on President Obama's legacy": {"p-value": 7.029185536033362e-25, "V'": 0.11827304150057674}, "Focuses on President Obama's political capital": {"p-value": 1.041235652220868e-24, "V'": 0.12302166284487874}, "refers to U.S. foreign policy": {"p-value": 8.634017289756975e-16, "V'": 0.2517530851077755}, "Discusses Obama's political capital (PC) and its importance in upholding deals": {"p-value": 1.108051126536433e-22, "V'": 0.10345709151441135}, "focuses primarily on US financial and economic policies": {"p-value": 1.1676160934803456e-23, "V'": 0.26435107095572985}, "refers to Hillary Clinton's political strategies": {"p-value": 1.7551989253112063e-32, "V'": 0.12751639614203414}, "references to economic reforms": {"p-value": 4.433236339763359e-06, "V'": 0.07363843588285228}}, "research goal": "The dataset includes evidence compiled for American competitive policy debate, published online by debate camps. The two classes are generated based on the category of argument. The Group A snippets are \"Affirmatives\", while the Group B snippets are \"Politics\". I am a novice to policy debate. My goal is to figure out the general topics of each category. 
"}, {"+": {"Discusses colonialism and its effects": {"p-value": 3.530900444580956e-26, "V'": 0.262079059180645}, "References the implications of violence": {"p-value": 9.197600361986932e-15, "V'": 0.19264446422830625}, "Discusses the relationship between subject and nature": {"p-value": 8.309784051372898e-06, "V'": 0.08351443398943206}, "challenges the status quo": {"p-value": 1.0107831483376692e-05, "V'": 0.11900578140218643}, "Questions the efficacy of the affirmative plan": {"p-value": 0.0002311617075781132, "V'": 0.09268282065525615}}, "-": {"emphasizes the importance of civic engagement": {"p-value": 7.45700606521251e-32, "V'": 0.2371920068813532}, "advocates for skills development": {"p-value": 4.740182916922469e-12, "V'": 0.07816548931521951}, "discusses the importance of meaningful debate": {"p-value": 5.325861450384557e-49, "V'": 0.31665848879746006}, "proposes a deliberative framework": {"p-value": 1.5700368654892815e-15, "V'": 0.15350101643427777}, "focuses on the value of the debate game": {"p-value": 8.151465133703941e-27, "V'": 0.15606852308797717}}, "research goal": "The dataset includes evidence compiled for American competitive policy debate, published online by debate camps. The two classes are generated based on the category of argument. The Group A snippets are \"Kritik Answers\", while the Group B snippets are \"Theory arguments\". I am a novice to policy debate. My goal is to figure out the general topics of each category. "}, {"+": {"addresses the potential effects of EU policies": {"p-value": 0.0002770286595135369, "V'": 0.1078709031773374}}, "-": {}, "research goal": "The dataset includes evidence compiled for American competitive policy debate, published online by debate camps. The two classes are generated based on the category of argument. The Group A snippets are \"Counterplan Answers\", while the Group B snippets are \"Disadvantage Answers\". I am a novice to policy debate. My goal is to figure out the general topics of each category. 
"}, {"+": {"discusses male-centered legal systems and their marginalization of women": {"p-value": 1.1896745596101454e-34, "V'": 0.30273086786510606}, "Discusses the effects of the environment": {"p-value": 3.2137684392619873e-06, "V'": 0.07484229433119177}, "Focuses on gender equality": {"p-value": 1.4627931023167979e-55, "V'": 0.41881808937476267}, "Argues for gender-focused solutions": {"p-value": 1.1297961652228196e-30, "V'": 0.2650969319220443}, "Focuses on gender roles and their impacts": {"p-value": 7.431354696700219e-95, "V'": 0.5686364296744718}, "highlights the risks of using gendered language": {"p-value": 8.249086840586575e-28, "V'": 0.25079941284920615}, "advocates for women's rights and equality": {"p-value": 1.2297578491801885e-53, "V'": 0.41627546238288793}}, "-": {"addresses the issue of slavery": {"p-value": 2.6799030551561014e-11, "V'": 0.15049456460138072}, "discusses the effects of slavery": {"p-value": 8.222636882422822e-08, "V'": 0.11868222302746995}, "discusses the concept of anti-blackness": {"p-value": 1.003855769398711e-149, "V'": 0.48831549295367793}, "Uses blackness as a central concept": {"p-value": 7.143909535563483e-191, "V'": 0.5725737823279451}, "centers on the limit of fictions of blackness": {"p-value": 3.510966094656952e-138, "V'": 0.45216689014177225}, "talks about race and racism.": {"p-value": 7.609512081448696e-108, "V'": 0.5560667138297246}, "mentions anti-black logic and its unethical implications": {"p-value": 1.5737616079870597e-103, "V'": 0.49280016324592024}}, "research goal": "The dataset includes evidence compiled for American competitive policy debate, published online by debate camps. The two classes are generated based on the argument made. The Group A snippets are arguments about \"feminism\", while the Group B snippets are arguments about \"afropessimism\". I am a novice to policy debate. My goal is to figure out the claims of each type of argument. 
"}, {"+": {"refers to US-Japan alliance": {"p-value": 6.791054294179255e-11, "V'": 0.18046944676940024}, "argues for the importance of the US-Japan alliance": {"p-value": 3.7138197195747684e-08, "V'": 0.1347340587396885}, "focuses on US-Japan relations": {"p-value": 9.245468731711213e-15, "V'": 0.23856596608734087}, "Focuses on alliances and deterrence": {"p-value": 6.916418157234558e-58, "V'": 0.564216976778822}, "alludes to US-Japan relations": {"p-value": 3.4194596030440294e-16, "V'": 0.2583268519470883}, "focuses on improving relations between countries": {"p-value": 2.1017571673248536e-33, "V'": 0.44478177997720897}, "Focuses on US-Japan relations": {"p-value": 7.906746715139073e-16, "V'": 0.2533991220126604}}, "-": {"Calls for an undoing of the world.": {"p-value": 2.2646949398621814e-11, "V'": 0.07844218803085719}, "discusses the power of gender to naturalize and civilize": {"p-value": 8.974951751473383e-11, "V'": 0.050504798705093515}, "refuses normative structures": {"p-value": 1.5375525830948941e-26, "V'": 0.3001023634074057}, "Rejects the idea of sovereignty": {"p-value": 8.63963752209163e-17, "V'": 0.10490560892161443}, "Explores ontological difference": {"p-value": 6.971293694004301e-152, "V'": 0.6879300533209377}, "Criticizes the power structures of whiteness": {"p-value": 1.7398421286935634e-60, "V'": 0.32228014739279004}, "emphasizes the need for reform": {"p-value": 0.0004991664565329421, "V'": 0.11384034665742923}}, "research goal": "The dataset includes evidence compiled for American competitive policy debate, published online by debate camps. The two classes are generated based on the argument made. The Group A snippets are arguments about \"consultation\", while the Group B snippets are arguments about \"queer pessimism\". I am a novice to policy debate. My goal is to figure out the claims of each type of argument. 
"}, {"+": {"mentions an alternative to the status quo": {"p-value": 1.6149765186561138e-10, "V'": 0.1511151126772854}, "Talks about the consequences of inaction": {"p-value": 6.944657901436971e-08, "V'": 0.09290309334732297}, "Addresses international cooperation and organizations": {"p-value": 7.158594937002238e-31, "V'": 0.26783762398726496}, "mentions renewable energy solutions": {"p-value": 0.0001498448448082632, "V'": 0.029498339972303576}, "Calls for increased funding and investment": {"p-value": 5.172435264521273e-16, "V'": 0.12388683994560047}, "Discusses the need for the U.S. government to take action": {"p-value": 1.7380009157894185e-28, "V'": 0.22179810123532648}, "focuses on the status quo of current affairs": {"p-value": 4.611354352819339e-13, "V'": 0.18113770531321263}, "mentions alternative policy solutions": {"p-value": 2.1691120015538915e-33, "V'": 0.26209207404523477}, "Advocates for government intervention": {"p-value": 8.716719122864566e-53, "V'": 0.374181591947961}}, "-": {"mentions the concept of hyperreality": {"p-value": 1.8244558336160582e-21, "V'": 0.08397725069212887}, "critiques American Dream and global exploitation": {"p-value": 1.309106369302686e-44, "V'": 0.3047890044383654}, "mentions postmodernism": {"p-value": 4.5112317283206826e-107, "V'": 0.3592635765778615}, "mentions the power of words": {"p-value": 1.2654415193616753e-22, "V'": 0.09145593659248825}, "discusses the need for resistance to the status quo": {"p-value": 6.934536333915564e-69, "V'": 0.3670782487693708}, "Discussion of power structures": {"p-value": 3.686049114902166e-26, "V'": 0.2879404639684198}, "Explores the concept of hyperreality": {"p-value": 9.621491841181292e-66, "V'": 0.24159664862763242}, "refusal to engage in the political system": {"p-value": 2.2081949988790466e-16, "V'": 0.10936221741514217}, "mentions Baudrillard's theories": {"p-value": 1.6001967752360395e-38, "V'": 0.14915992112586335}}, "research goal": "The dataset includes evidence compiled for 
American competitive policy debate, published online by debate camps. The two classes are generated based on the argument made. The Group A snippets are arguments about \"other ways to solve the problem\", while the Group B snippets are arguments about \"Baudrillard\". I am a novice to policy debate. My goal is to figure out the claims of each type of argument. "}, {"+": {"focuses on US federal government policies": {"p-value": 0.00048584154959565435, "V'": 0.07872557601725502}}, "-": {"discusses Cuban healthcare and its issues": {"p-value": 1.0210048113368456e-07, "V'": 0.027237097800857986}, "focuses on international power dynamics": {"p-value": 3.7255836580066167e-22, "V'": 0.2577118782070724}, "mentions influence of China": {"p-value": 1.3921917603518737e-11, "V'": 0.10478272156586985}, "focuses on Latin American engagement": {"p-value": 1.1497705167231243e-31, "V'": 0.21367008504893575}, "mentions oil exports": {"p-value": 6.598866913615059e-05, "V'": 0.035911454046348365}, "emphasizes the role of the US in Latin America": {"p-value": 2.1299505290029464e-10, "V'": 0.08058261583105658}, "mentions US involvement in foreign countries": {"p-value": 1.3793208571179726e-08, "V'": 0.10937037772844352}, "discusses international relations between countries": {"p-value": 1.3708928357031347e-23, "V'": 0.2607998893489817}, "refers to US hegemony and its effect on stability": {"p-value": 0.00097062431999582, "V'": 0.0645391926509857}, "focuses on international relations": {"p-value": 5.0521631117782545e-19, "V'": 0.23781822711298994}, "focuses on US foreign policy": {"p-value": 0.00044788899919947595, "V'": 0.0771064592133553}}, "research goal": "The dataset includes evidence compiled for American competitive policy debate, published online by debate camps. The two classes are generated based on the debate camp that published the evidence. 
The Group A snippets are pieces of evidence compiled by Northwestern (NHSI), a debate camp, while the Group B snippets are pieces of evidence compiled by Sun Country (SCDI), a debate camp. I am a debater deciding which camp to go to. My goal is to figure out what specific topics each debate camp focuses on. "}, {"+": {"discusses immigration policies and reform": {"p-value": 0.0003533160858202085, "V'": 0.13706832312681352}}, "-": {"discusses the importance of federalism": {"p-value": 3.391382000490894e-21, "V'": 0.20966241812114386}, "focuses on federalism": {"p-value": 3.877620802347219e-09, "V'": 0.22922146507587393}, "references federalism": {"p-value": 2.1508307685903054e-10, "V'": 0.22747293031780103}}, "research goal": "The dataset includes evidence compiled for American competitive policy debate, published online by debate camps. The two classes are generated based on the debate camp that published the evidence. The Group A snippets are pieces of evidence compiled by Michigan State (SDI), a debate camp, while the Group B snippets are pieces of evidence compiled by None, a debate camp. I am a debater deciding which camp to go to. My goal is to figure out what specific topics each debate camp focuses on. 
"}, {"+": {"focuses on federalism": {"p-value": 7.725724052656752e-06, "V'": 0.1320377631980495}, "mentions environmental issues": {"p-value": 4.529325256978904e-06, "V'": 0.16752343840721223}, "relies on epistemology": {"p-value": 0.00010611554130919775, "V'": 0.09178685460963676}, "talks about federalism and the balance of power between the federal government and states": {"p-value": 9.092153321675024e-09, "V'": 0.14646626435457347}, "relates to environmental issues": {"p-value": 6.08770281833643e-06, "V'": 0.17042312083884975}, "Discusses indigenous rights and sovereignty": {"p-value": 8.378411059365595e-13, "V'": 0.20250716818112013}}, "-": {"refers to space exploration": {"p-value": 2.567096869876001e-10, "V'": 0.22545456036389952}, "Focuses on India-related topics": {"p-value": 4.735663470561902e-05, "V'": 0.1131037553717583}, "references international issues": {"p-value": 1.2935648361571997e-06, "V'": 0.24234039530662377}, "mentions technology and its implications": {"p-value": 6.421422957308413e-12, "V'": 0.28551229481610396}}, "research goal": "The dataset includes evidence compiled for American competitive policy debate, published online by debate camps. The two classes are generated based on the debate camp that published the evidence. The Group A snippets are pieces of evidence compiled by Mean Green Comet, a debate camp, while the Group B snippets are pieces of evidence compiled by The Debate Intensive, a debate camp. I am a debater deciding which camp to go to. My goal is to figure out what specific topics each debate camp focuses on. 
"}, {"+": {"mentions wrong size": {"p-value": 1.1571555015716796e-25, "V'": 0.23383613778935144}, "complains about the quality of the product": {"p-value": 0.0, "V'": 0.6677952804583082}, "mentions wrong color": {"p-value": 3.1494945748255503e-75, "V'": 0.200955495658}, "mentions quality not being up to par": {"p-value": 1.1518231319967037e-250, "V'": 0.5920024162487953}, "mentions wrong size was received": {"p-value": 3.59492223646802e-98, "V'": 0.33559182913671737}, "mentions uncomfortable fit": {"p-value": 7.997113264902791e-101, "V'": 0.2907996063686366}, "reports uncomfortable fit or size": {"p-value": 3.275698861653597e-180, "V'": 0.42729380950492407}, "Too narrow for size": {"p-value": 2.8616873418987126e-07, "V'": 0.05725008637995577}, "mentions wrong size being sent": {"p-value": 5.183019254932504e-115, "V'": 0.30882146757615014}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of fashion items on Amazon giving 1 star, while the Group B snippets are reviews of fashion items on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions the poor quality": {"p-value": 7.692169569788807e-21, "V'": 0.4786321344997755}, "mentioning that the product looks cheap": {"p-value": 0.00010036621478465749, "V'": 0.08730132793818429}, "mentions that the item was not as described": {"p-value": 3.266022147326999e-08, "V'": 0.26274413232332083}, "mentions that the product was defective": {"p-value": 7.69056390067822e-21, "V'": 0.4786325261459113}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. 
The Group A snippets are reviews of fashion items on Amazon giving 1 star, while the Group B snippets are reviews of fashion items on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"not true to size": {"p-value": 2.346942510721667e-12, "V'": 0.3503254453708242}, "mentions poor construction quality": {"p-value": 2.820016956218932e-58, "V'": 0.419356100898923}, "Poorly constructed": {"p-value": 2.8223534488917765e-58, "V'": 0.4193540287534589}, "mentions poor construction": {"p-value": 2.3118322180443908e-35, "V'": 0.27956938605966314}, "Uncomfortable and do not breathe": {"p-value": 2.6330065634715306e-33, "V'": 0.3845709325205765}, "Complains about the fit of the item": {"p-value": 4.406531497631035e-34, "V'": 0.627256640726179}, "shoe size is not true to size": {"p-value": 1.026878267705202e-06, "V'": 0.2469137561443577}, "mentions the product being of poor quality": {"p-value": 1.076573224980332e-53, "V'": 0.41500726356868595}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of fashion items on Amazon giving 2 stars, while the Group B snippets are reviews of fashion items on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions the shoes being too large": {"p-value": 2.2612897819803703e-50, "V'": 0.1714399065097167}, "mentions the size being too small": {"p-value": 3.0084147131196805e-20, "V'": 0.09491643868550145}, "mentions not being true to size": {"p-value": 1.0955441421246522e-32, "V'": 0.1805052935295966}, "mentions the product being too small": {"p-value": 1.2551082902362577e-29, "V'": 0.11591367210535371}, "Product is too large or too small": {"p-value": 1.3403797637183267e-70, "V'": 0.23218557406215976}, "Unhappy with the size of the item": {"p-value": 7.186131646864822e-63, "V'": 0.1948621563039962}, "complains about the size": {"p-value": 6.560321180492653e-69, "V'": 0.19861714019286125}, "mentions fit issues": {"p-value": 2.9260862693250375e-60, "V'": 0.32964804617800103}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of fashion items on Amazon giving 4 stars, while the Group B snippets are reviews of fashion items on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"not as expected": {"p-value": 0.0, "V'": 0.8741234119661021}, "mentions poor quality": {"p-value": 0.0, "V'": 0.8374639007433595}, "mentions low quality or not worth the money": {"p-value": 0.0, "V'": 0.885414341106506}, "reports product not working as advertised": {"p-value": 0.0, "V'": 0.8960265477130646}, "mentions the product being a waste of money": {"p-value": 2.0313944407661708e-142, "V'": 0.6743783212550133}, "mentions the product being weak and not standing": {"p-value": 0.0, "V'": 0.7400615322922007}, "mentions a low quality product": {"p-value": 0.0, "V'": 0.8281281507372034}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. 
The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of beauty products on Amazon giving 1 star, while the Group B snippets are reviews of beauty products on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"does not feel like it offers a close shave": {"p-value": 1.0092831683011094e-11, "V'": 0.13480197461591154}, "mentions product being uncomfortable to use": {"p-value": 1.2510262833583682e-14, "V'": 0.1505512973793771}, "talks about lack of value for money": {"p-value": 4.928043676921881e-100, "V'": 0.39777012199915884}, "Mentions quality not being as expected": {"p-value": 4.776528588583657e-34, "V'": 0.17456680027721172}, "mentions poor quality of product": {"p-value": 4.7659653571118e-138, "V'": 0.4620182719622482}, "mentions poor quality": {"p-value": 2.0688141828764944e-150, "V'": 0.49318227892557676}, "the product is not as pictured": {"p-value": 1.0084811218730204e-46, "V'": 0.229570040624773}, "mentions a misleading title": {"p-value": 4.71853319723312e-14, "V'": 0.07472193471851457}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of beauty products on Amazon giving 1 star, while the Group B snippets are reviews of beauty products on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"complains about the product not working as intended": {"p-value": 0.0, "V'": 0.7072492432007483}, "mentions a lack of power/strength": {"p-value": 1.4419454592001923e-12, "V'": 0.11492355524971112}, "mentions that it is not worth the money": {"p-value": 3.167194291322844e-134, "V'": 0.40148738327944744}, "Complains about the product's poor quality": {"p-value": 0.0, "V'": 0.7747310312953773}, "mentions cheap build quality": {"p-value": 2.1591686781729627e-102, "V'": 0.3433215250550643}, "mentions a poor product design": {"p-value": 3.81395833e-316, "V'": 0.6792561260533766}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of beauty products on Amazon giving 2 stars, while the Group B snippets are reviews of beauty products on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions the price being too high": {"p-value": 5.9225941778777505e-08, "V'": 0.03640427212723141}, "complains about the smell": {"p-value": 7.19015570161649e-07, "V'": 0.019716876752176104}, "mentions the product being small": {"p-value": 6.479304215713477e-05, "V'": 0.028808451483584522}, "mentions the product being too costly": {"p-value": 1.3852010056535637e-06, "V'": 0.03546257419287818}, "mentions limited power": {"p-value": 3.755422960021151e-26, "V'": 0.08219629739283478}, "mentions the product does not last long": {"p-value": 7.502562322870945e-19, "V'": 0.07524672876446106}, "mentions not lasting as long as expected": {"p-value": 4.3694730965932306e-21, "V'": 0.08574187277782344}, "indicates the product is not what was expected": {"p-value": 9.67500403394024e-37, "V'": 0.08653331837788887}, "Mentions the product being too small.": {"p-value": 5.1944840686168455e-08, "V'": 0.02877987217017393}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of beauty products on Amazon giving 4 stars, while the Group B snippets are reviews of beauty products on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"Not worth the money": {"p-value": 0.0, "V'": 0.9106728117548516}, "The product did not fit as advertised.": {"p-value": 0.0, "V'": 0.8279015013545119}, "mentions poor quality or construction": {"p-value": 0.0, "V'": 0.7938723954796393}, "mentions poor quality": {"p-value": 0.0, "V'": 0.8145698616895766}, "mentions a poor quality product": {"p-value": 0.0, "V'": 0.8426407340291993}, "mentions poor quality construction": {"p-value": 0.0, "V'": 0.7017739222667012}, "mentions poor quality product": {"p-value": 0.0, "V'": 0.845347147361255}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of appliances on Amazon giving 1 star, while the Group B snippets are reviews of appliances on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions that the product is of poor quality": {"p-value": 8.671534292949001e-79, "V'": 0.3772141639109552}, "mentions poor quality materials": {"p-value": 2.59507027074627e-75, "V'": 0.37114556049426967}, "Product was not usable as advertised": {"p-value": 2.15320385144724e-107, "V'": 0.39337610233562004}, "mentions poor quality parts": {"p-value": 4.889523708369302e-72, "V'": 0.36433335388394905}, "mentions poor quality": {"p-value": 5.101035782211399e-63, "V'": 0.33213324033569974}, "mentions poor quality product": {"p-value": 1.3335858548122033e-80, "V'": 0.366808125447614}, "mentions a faulty part": {"p-value": 3.551173944958795e-59, "V'": 0.3237087012719232}, "mentions product being faulty": {"p-value": 2.674706713761959e-86, "V'": 0.35757938400418265}, "mentions a product being cheaply made": {"p-value": 3.828140044254712e-51, "V'": 0.28947339634700303}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of appliances on Amazon giving 1 star, while the Group B snippets are reviews of appliances on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions a problem with customer service": {"p-value": 1.354719497456272e-29, "V'": 0.16655186447553122}, "refers to the product being flimsy": {"p-value": 7.281777267916302e-106, "V'": 0.34381658687997807}, "Poor quality materials": {"p-value": 3.0428549512848557e-286, "V'": 0.620004371742205}, "Complains about the product breaking quickly": {"p-value": 1.0919137973681038e-121, "V'": 0.37836911634644077}, "mentions a short-lived product": {"p-value": 2.282096733360264e-36, "V'": 0.1878927499579235}, "mentions poor quality of the product": {"p-value": 0.0, "V'": 0.7376787331267122}, "mentions being dissatisfied with the quality": {"p-value": 0.0, "V'": 0.7539001827358013}, "mentions poor quality of product": {"p-value": 0.0, "V'": 0.7362762968455419}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of appliances on Amazon giving 2 stars, while the Group B snippets are reviews of appliances on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"not as bright as expected": {"p-value": 1.8199637486318714e-05, "V'": 0.01188848987716252}, "mentioning that the product is too small": {"p-value": 1.2039797518966838e-11, "V'": 0.031229892241392252}, "The filter works as expected, but is a little expensive.": {"p-value": 3.090567970421244e-18, "V'": 0.08616710551035198}, "mentions the product being too expensive": {"p-value": 2.1557509863782976e-10, "V'": 0.03670159251728602}, "notes the product was not the right color": {"p-value": 0.0008647109368479769, "V'": 0.009249612841265463}, "mentions product not working as advertised": {"p-value": 1.278717753275508e-46, "V'": 0.11368518071084283}, "complains about the price": {"p-value": 6.971730654326578e-16, "V'": 0.04311473661411694}, "mentions the product being damaged": {"p-value": 2.928954735684721e-13, "V'": 0.047095450175989664}, "mentions difficulty installing": {"p-value": 1.2796686871934654e-14, "V'": 0.049725987147712064}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of appliances on Amazon giving 4 stars, while the Group B snippets are reviews of appliances on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"Does not work or stay on": {"p-value": 0.0, "V'": 0.576381188914468}, "mentions poor quality": {"p-value": 0.0, "V'": 0.709233375241934}, "mentions poor quality of product": {"p-value": 0.0, "V'": 0.7347902606674283}, "poor quality of materials": {"p-value": 0.0, "V'": 0.7043895212764684}, "expresses dissatisfaction with the product's quality": {"p-value": 0.0, "V'": 0.8574468693860822}, "refers to the product being of bad quality": {"p-value": 0.0, "V'": 0.707337499179486}, "mentions product not being as described": {"p-value": 0.0, "V'": 0.8804645107388165}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of arts, crafts, and sewing products on Amazon giving 1 star, while the Group B snippets are reviews of arts, crafts, and sewing products on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions the product being a waste of money": {"p-value": 5.1786260233336105e-200, "V'": 0.5230987045154011}, "mentions poor quality": {"p-value": 2.5296275383197154e-85, "V'": 0.34861707312106843}, "mentions the product arriving in a poor condition": {"p-value": 3.283070084518002e-45, "V'": 0.2178785212409906}, "mentions the product not providing value for money": {"p-value": 1.2937176450785402e-81, "V'": 0.35731954816705214}, "mentions poor quality of materials": {"p-value": 2.526588184173666e-66, "V'": 0.3163459217852071}, "mentions poor quality materials": {"p-value": 6.359693284393727e-75, "V'": 0.3415930034378431}, "mentions poor quality product": {"p-value": 2.3729877362282832e-92, "V'": 0.3660767264558398}, "mentions a product not working as expected": {"p-value": 3.468697854071642e-33, "V'": 0.18422999609609003}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of arts, crafts, and sewing products on Amazon giving 1 star, while the Group B snippets are reviews of arts, crafts, and sewing products on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"product not as expected": {"p-value": 0.0, "V'": 0.782121865835891}, "mentions the product being too fragile": {"p-value": 1.573048890501413e-07, "V'": 0.18654466452612833}, "mentions poor quality": {"p-value": 0.0, "V'": 0.6791481340383293}, "mentions poor quality for the money": {"p-value": 0.0, "V'": 0.6424798267316914}, "Complains about the quality of the material.": {"p-value": 0.0, "V'": 0.7338378314956647}, "reports unsatisfactory product quality": {"p-value": 0.0, "V'": 0.8057123819923155}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. 
The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of arts, crafts, and sewing products on Amazon giving 2 stars, while the Group B snippets are reviews of arts, crafts, and sewing products on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions the color being off from expectations": {"p-value": 5.2820862467935435e-17, "V'": 0.03390526096167548}, "not as sharp as other notchers": {"p-value": 6.091515817054838e-10, "V'": 0.019740237643928224}, "mentions the product not being suitable for the intended purpose": {"p-value": 1.0307930351770149e-64, "V'": 0.13040020996471866}, "mentions poor quality of product": {"p-value": 8.499167808525499e-63, "V'": 0.15676794457047202}, "mentions difficulty of assembly": {"p-value": 0.0006562658839472789, "V'": 0.008971361163060694}, "mentions the product being too expensive": {"p-value": 2.680244184650537e-09, "V'": 0.02332009530196984}, "mentions poor quality": {"p-value": 2.348536741026681e-57, "V'": 0.12249161575722325}, "mentions difficulty using the product": {"p-value": 5.128007531937742e-105, "V'": 0.19150459793782407}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of arts, crafts, and sewing products on Amazon giving 4 stars, while the Group B snippets are reviews of arts, crafts, and sewing products on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"poor quality material used": {"p-value": 2.7368829144896208e-87, "V'": 0.5605576071089985}, "mentions poor quality materials": {"p-value": 6.813387373236933e-166, "V'": 0.6322178222401654}, "mentions poor quality": {"p-value": 0.0, "V'": 0.7751519148099576}, "mentions the item being made from cheap materials": {"p-value": 2.8941588543398005e-13, "V'": 0.32214734681329205}, "product is of poor quality": {"p-value": 0.0, "V'": 0.7705605598214484}, "complains about the product not fitting": {"p-value": 0.0, "V'": 0.8287563810945527}, "mentions the product not being what was expected": {"p-value": 0.0, "V'": 0.8887690448924285}, "mentions the item not being as described": {"p-value": 0.0, "V'": 0.8800476315219671}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of automotive on Amazon giving 1 star, while the Group B snippets are reviews of automotive on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions a poor fit": {"p-value": 1.8649324294896483e-50, "V'": 0.3122930979838306}, "complains about product quality": {"p-value": 5.049266218226231e-72, "V'": 0.31430847040045495}, "mentions poor quality of the product": {"p-value": 1.1546752395751924e-132, "V'": 0.49149750053170405}, "mentions the product arriving defective": {"p-value": 5.083337656299396e-104, "V'": 0.4020284397543327}, "mentions product not fitting": {"p-value": 7.160193534138862e-48, "V'": 0.2746453087241073}, "mentions a product not working as expected": {"p-value": 2.7238158645379516e-69, "V'": 0.31116004800803365}, "Complaints about missing pieces": {"p-value": 1.9957593635248924e-06, "V'": 0.07001418952274058}, "mentions that it is not fit for its purpose": {"p-value": 5.89332858422521e-129, "V'": 0.45174978504233787}, "mentions poor quality of product": {"p-value": 1.444383626967661e-122, "V'": 0.4730451223510702}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of automotive on Amazon giving 1 star, while the Group B snippets are reviews of automotive on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions product not working correctly": {"p-value": 3.465791313841211e-298, "V'": 0.6702675929816512}, "mentions the product not working as advertised": {"p-value": 0.0, "V'": 0.7258770443964816}, "lack of customer service from the manufacturer": {"p-value": 1.700181602192472e-05, "V'": 0.062176287722702}, "negative sentiment about the product's quality": {"p-value": 0.0, "V'": 0.7765875589301944}, "didn't get the expected results": {"p-value": 0.0, "V'": 0.7380677026552089}, "mentions poor quality of materials": {"p-value": 1.9633911121661847e-208, "V'": 0.5514510564214066}, "poor quality of construction": {"p-value": 9.392506548004812e-150, "V'": 0.4390736370565524}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of automotive on Amazon giving 2 stars, while the Group B snippets are reviews of automotive on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions the product not being a perfect fit": {"p-value": 2.381064757263427e-98, "V'": 0.28815731166432856}, "complains about the low quality": {"p-value": 1.3346921789173248e-11, "V'": 0.03449784302653445}, "implies that the product was overpriced": {"p-value": 2.643708973519129e-05, "V'": 0.018849394345780354}, "complains about the product's value in relation to its cost": {"p-value": 5.487523571519952e-17, "V'": 0.04156943647790059}, "mentions poor performance": {"p-value": 3.00969070691096e-25, "V'": 0.0822895721856605}, "Mention of short life span": {"p-value": 2.0908114771939498e-05, "V'": 0.026501939291817364}, "mentions poor quality": {"p-value": 6.811631242508899e-16, "V'": 0.05220399516133009}, "Notes the product is difficult to remove": {"p-value": 0.00018741898558704203, "V'": 0.017977004079987025}, "mentions difficulty in installation": {"p-value": 9.073257294964451e-18, "V'": 0.0671699387691719}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of automotive on Amazon giving 4 stars, while the Group B snippets are reviews of automotive on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions poor video or audio quality": {"p-value": 4.619927858364045e-95, "V'": 0.28839762723475293}, "mentions the over-orchestrated, over-produced music": {"p-value": 1.3719906821369232e-34, "V'": 0.1404169840500676}, "mentions terrible remastering with compressed dynamic range": {"p-value": 4.0669096028967793e-23, "V'": 0.08414707233962178}, "mentions poor recording quality": {"p-value": 3.999890513586697e-70, "V'": 0.22814382742599423}, "mentions poor sound quality": {"p-value": 2.5956681338259625e-69, "V'": 0.2257534964111646}, "disappointed with the remastering quality": {"p-value": 1.6785788124924327e-74, "V'": 0.2331899795805675}, "complains about poor quality": {"p-value": 0.0, "V'": 0.8675278832320742}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of CDs on Amazon giving 1 star, while the Group B snippets are reviews of CDs on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions a lack of quality": {"p-value": 1.546098432076472e-11, "V'": 0.13806346295943994}, "criticizes the lack of quality": {"p-value": 7.781351335552001e-17, "V'": 0.15183098180160137}, "criticizes the poor quality": {"p-value": 3.09842656436792e-51, "V'": 0.2848119481915563}, "mentions poor sound quality": {"p-value": 1.1878344630478454e-08, "V'": 0.08849402955821838}, "expresses dissatisfaction with the sound quality": {"p-value": 5.026871101986525e-12, "V'": 0.12572144295016796}, "mentions poor quality of sound": {"p-value": 1.140549239009849e-30, "V'": 0.2188610837164973}, "mentions sound quality is poor": {"p-value": 4.300067234054971e-09, "V'": 0.08948475997508662}, "mentions poor sound quality or production": {"p-value": 5.082771699976286e-24, "V'": 0.19564987591004926}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of CDs on Amazon giving 1 star, while the Group B snippets are reviews of CDs on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"gives a lack of appreciation for the artist": {"p-value": 0.0, "V'": 0.706344173399677}, "mentions poor sound quality": {"p-value": 2.953957101313038e-39, "V'": 0.18751687268756892}, "disappointed with the quality": {"p-value": 0.0, "V'": 0.8062526149663788}, "mentions missing features": {"p-value": 1.850786806886103e-39, "V'": 0.27526784280740413}, "Complains about the production value": {"p-value": 3.1882171973913384e-158, "V'": 0.48739335010045526}, "notes poor sound quality": {"p-value": 3.454362374993185e-70, "V'": 0.2772957831186582}, "mentions lack of originality": {"p-value": 1.181722316454016e-174, "V'": 0.5573056308798984}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of CDs on Amazon giving 2 stars, while the Group B snippets are reviews of CDs on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions the sound quality not being as good as expected": {"p-value": 1.5754106751571636e-13, "V'": 0.06019481188021206}, "mentions that the music is not as good as the previous albums": {"p-value": 4.3078760662640757e-38, "V'": 0.146528726117854}, "criticizes the sound quality": {"p-value": 7.76829804367818e-13, "V'": 0.05662267356352505}, "mentions the sound quality is not up to par": {"p-value": 2.4613591307937327e-09, "V'": 0.04494554146562293}, "mentions a lack of quality": {"p-value": 9.893023469686732e-13, "V'": 0.052502142437340804}, "mentions dated sound or context": {"p-value": 1.368284892353122e-09, "V'": 0.126430971123342}, "mentions not liking the vocal arrangements": {"p-value": 1.1876158469910664e-06, "V'": 0.02713701886301502}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. 
The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of CDs on Amazon giving 4 stars, while the Group B snippets are reviews of CDs on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"cheaply made and easily broken": {"p-value": 0.0, "V'": 0.534214163116539}, "mentions the product being cheaply made": {"p-value": 1.4214988408521705e-295, "V'": 0.47260936021566946}, "refers to the item being cheaply made": {"p-value": 2.178724618892709e-298, "V'": 0.4748002530721907}, "Complains about the poor quality of the product": {"p-value": 0.0, "V'": 0.8614477507458456}, "poor quality product": {"p-value": 0.0, "V'": 0.8447598806384623}, "Poor quality materials and construction": {"p-value": 0.0, "V'": 0.7682939271113779}, "mentions it was too cheaply made": {"p-value": 0.0, "V'": 0.5105210892517478}, "bad product quality": {"p-value": 0.0, "V'": 0.8585982129019415}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of cell phones and accessories on Amazon giving 1 star, while the Group B snippets are reviews of cell phones and accessories on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions a poor build quality": {"p-value": 9.77056351457595e-59, "V'": 0.32757509828933595}, "mentions poor quality of product": {"p-value": 8.665781926411554e-85, "V'": 0.3674807000243586}, "mentions poor quality or material": {"p-value": 1.205913001491904e-83, "V'": 0.3718722654422329}, "mentions the product not working": {"p-value": 1.0134236462486535e-85, "V'": 0.3779955827727121}, "mentions the poor quality of the product": {"p-value": 1.1487819550166626e-104, "V'": 0.4124511950434428}, "mentions poor quality and materials": {"p-value": 1.3750155902571914e-77, "V'": 0.36307837228037443}, "mentions that the item was defective": {"p-value": 4.4595993084960804e-122, "V'": 0.45380549780194956}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of cell phones and accessories on Amazon giving 1 star, while the Group B snippets are reviews of cell phones and accessories on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions poor quality": {"p-value": 0.0, "V'": 0.7005080217697177}, "mentions poor quality materials": {"p-value": 8.393045303501345e-284, "V'": 0.6169564946061754}, "mentions of poor quality product": {"p-value": 0.0, "V'": 0.7389871637150187}, "mentions poor quality material": {"p-value": 6.4083828905335e-311, "V'": 0.6446559504974168}, "mentions poor material quality": {"p-value": 1.9668652608885307e-275, "V'": 0.6117136397952905}, "mentions product not lasting long enough": {"p-value": 6.224858197857179e-78, "V'": 0.4871196299950795}, "expresses dissatisfaction with the product quality": {"p-value": 0.0, "V'": 0.7909539645947883}, "Mentions poor fit and finish": {"p-value": 2.56572380497e-312, "V'": 0.6682327267560481}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of cell phones and accessories on Amazon giving 2 stars, while the Group B snippets are reviews of cell phones and accessories on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions inadequate protection": {"p-value": 8.982574344146877e-13, "V'": 0.04446453177330725}, "mentions poor battery life": {"p-value": 0.0006225035943226415, "V'": 0.018385300553043396}, "mentions a lack of protection": {"p-value": 8.803632385301718e-07, "V'": 0.028422201336692876}, "mentions the product being too bulky": {"p-value": 1.8707597814399033e-05, "V'": 0.025912754635686444}, "mentions problems with durability": {"p-value": 5.59170774005483e-41, "V'": 0.15387947058704593}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. 
The Group A snippets are reviews of cell phones and accessories on Amazon giving 4 stars, while the Group B snippets are reviews of cell phones and accessories on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions poor sound quality": {"p-value": 1.5977886604709049e-108, "V'": 0.21535517909925253}, "mentions dissatisfaction with sound quality": {"p-value": 2.471730220853061e-146, "V'": 0.2807079403651996}, "not worth the money spent": {"p-value": 2.134830358881594e-290, "V'": 0.46179637842296073}, "mentions the product being advertised as new when it was not": {"p-value": 2.756850282822173e-10, "V'": 0.03229903979310872}, "Mentions poor sound quality": {"p-value": 2.697560352395225e-144, "V'": 0.27337711743601845}, "complains about the quality of the product": {"p-value": 0.0, "V'": 0.7128147806566518}, "The volume level is too low.": {"p-value": 2.4133695564066205e-13, "V'": 0.04061542326439028}, "mentions poor mastering quality": {"p-value": 9.39371950701232e-66, "V'": 0.13771999668443927}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of digital music on Amazon giving 1 star, while the Group B snippets are reviews of digital music on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"bad vocals": {"p-value": 4.1089372707924757e-25, "V'": 0.11708483583056163}, "complains about the poor quality of the music": {"p-value": 4.862362907436708e-97, "V'": 0.404806253402954}, "mentions that it was a waste of money": {"p-value": 3.154503942545498e-148, "V'": 0.45506798021300726}, "mentions lack of creativity": {"p-value": 6.7061198349064506e-40, "V'": 0.22961455843062345}, "mentions the artist not having any talent": {"p-value": 1.010033032542517e-76, "V'": 0.25172989143618313}, "Unsatisfied with the sound quality": {"p-value": 6.266413876214619e-33, "V'": 0.2095484179293757}, "mentions poor sound quality": {"p-value": 9.029651770720716e-16, "V'": 0.10941901544969915}, "mentions the sound quality is bad": {"p-value": 1.1957682104204062e-13, "V'": 0.0977614085240498}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of digital music on Amazon giving 1 star, while the Group B snippets are reviews of digital music on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"disappointed with the sound quality": {"p-value": 2.808160624174571e-73, "V'": 0.23344350679573078}, "mentions the sound quality is low": {"p-value": 1.0080477981576784e-33, "V'": 0.11514180494877073}, "mentions that the product was not delivered as expected": {"p-value": 1.5181369943256823e-206, "V'": 0.47876475032214505}, "sounds too soft or low.": {"p-value": 2.1955186799174252e-05, "V'": 0.0318838682551687}, "mentions the quality of the music is not as good as expected": {"p-value": 9.838591744092642e-243, "V'": 0.5656294416085162}, "criticizes the quality of sound": {"p-value": 2.243951918632993e-160, "V'": 0.43376140325697377}, "mentions low quality of sound": {"p-value": 7.977991475404461e-72, "V'": 0.21920927819354363}, "mentions poor sound quality": {"p-value": 5.086122267540525e-52, "V'": 0.1714436915534297}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of digital music on Amazon giving 2 stars, while the Group B snippets are reviews of digital music on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions poor sound quality": {"p-value": 3.86126999000074e-09, "V'": 0.020247427391515423}, "says the product is good, but not great": {"p-value": 1.6217203923497235e-52, "V'": 0.11664970201525822}, "disappointed with the sound quality": {"p-value": 1.2581162355773575e-07, "V'": 0.015293209614443647}, "mentions low-quality production": {"p-value": 1.4004302240910983e-06, "V'": 0.016599832291166405}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. 
The Group A snippets are reviews of digital music on Amazon giving 4 stars, while the Group B snippets are reviews of digital music on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions the gift card not working": {"p-value": 2.0777329860117985e-201, "V'": 0.41768040829638275}, "cards not working": {"p-value": 3.2833831511565654e-204, "V'": 0.4215850032513255}, "complains about not being able to redeem the card": {"p-value": 0.0, "V'": 0.569149109104641}, "mentions the product being a rip off": {"p-value": 1.5972834462377186e-241, "V'": 0.47120013819223383}, "Did not receive the item that was ordered": {"p-value": 1.3284105841018637e-139, "V'": 0.3218751342206229}, "complains about the lack of security": {"p-value": 2.9224318451781935e-208, "V'": 0.42949550679338966}, "Expresses disappointment about not getting a 'Merry Christmas' message": {"p-value": 6.903807032057532e-14, "V'": 0.09320868607422965}, "mentions an omitted product": {"p-value": 0.0007638203987274058, "V'": 0.041468699918183194}, "The gift card was never received": {"p-value": 1.990983426144029e-36, "V'": 0.1624383629001328}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of gift cards on Amazon giving 1 star, while the Group B snippets are reviews of gift cards on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions a lack of funds": {"p-value": 9.250016150111133e-12, "V'": 0.056898747873598346}, "mentions the card not having any money or value": {"p-value": 1.306699737572274e-35, "V'": 0.15939733167857123}, "complains about lack of value on the card": {"p-value": 9.664672811708606e-45, "V'": 0.23744174294920933}, "mentions a lack of monetary value on the card": {"p-value": 3.5231221647700525e-15, "V'": 0.08312976633898386}, "mentions problems with delivery": {"p-value": 2.9798737179758963e-05, "V'": 0.0873247388756413}, "mentions delivery delay": {"p-value": 0.00014139431855808977, "V'": 0.05272904393163416}, "mentions delay in delivery": {"p-value": 0.00029138312735694057, "V'": 0.04971808988296622}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of gift cards on Amazon giving 1 star, while the Group B snippets are reviews of gift cards on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions delays in delivery": {"p-value": 2.4059419112392147e-40, "V'": 0.1946161519619036}, "Recipient was not told who this was from": {"p-value": 1.657284307711328e-13, "V'": 0.07079211334899306}, "had difficulties with the purchase process": {"p-value": 6.443546380292358e-109, "V'": 0.40714224629106366}, "complains about the tin being damaged": {"p-value": 2.71461186198801e-08, "V'": 0.036720004321441437}, "disappointed with service or customer service": {"p-value": 0.0, "V'": 0.7307576523738487}, "Complained about the late arrival of the gift card.": {"p-value": 5.986700031297443e-40, "V'": 0.1724014170633627}, "mentions late delivery": {"p-value": 2.932283295163136e-31, "V'": 0.16927371146131912}, "Mentions a confusing or difficult process to use the gift card": {"p-value": 4.658410425745414e-55, "V'": 0.2557474507353729}, "Complains about the long delivery time": {"p-value": 8.59738460724098e-40, "V'": 0.1705363585000823}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of gift cards on Amazon giving 2 stars, while the Group B snippets are reviews of gift cards on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions not receiving the product as expected": {"p-value": 4.954621916851558e-58, "V'": 0.09039438663810448}, "mentions difficulty in finding the correct product": {"p-value": 1.1264608107517484e-06, "V'": 0.01860501146272733}, "concerns about not being able to print out the picture": {"p-value": 1.5027111598881634e-05, "V'": 0.012654565185694841}, "mentions product arriving late": {"p-value": 1.462061113996851e-07, "V'": 0.016729711424660284}, "mentions difficulty in printing": {"p-value": 2.8555036668960452e-11, "V'": 0.014592472541728812}, "Mentions difficulty or confusion in using the card": {"p-value": 5.274546971814694e-53, "V'": 0.08618526409707265}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of gift cards on Amazon giving 4 stars, while the Group B snippets are reviews of gift cards on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"tastes artificial or overly processed": {"p-value": 1.1433166374736194e-102, "V'": 0.42955006713929184}, "mentions poor taste": {"p-value": 0.0, "V'": 0.657653008404791}, "mentions bad taste": {"p-value": 0.0, "V'": 0.6419121530780741}, "tastes artificial or chemical": {"p-value": 2.1540337867071422e-16, "V'": 0.22686297478302703}, "mentions a low-quality product": {"p-value": 0.0, "V'": 0.7220829495207316}, "mentions a bad taste": {"p-value": 9.036852518492994e-304, "V'": 0.5888227861306369}, "tastes bad or artificial": {"p-value": 0.0, "V'": 0.6181698828775375}, "mentions the product not meeting expectations": {"p-value": 0.0, "V'": 0.8854921136313474}, "Mentions poor quality": {"p-value": 0.0, "V'": 0.8367666140038926}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of grocery and gourmet food on Amazon giving 1 star, while the Group B snippets are reviews of grocery and gourmet food on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions a bad taste": {"p-value": 1.1203883637655403e-50, "V'": 0.31117650389011814}, "mentions the product being a bad purchase": {"p-value": 5.48823264804371e-130, "V'": 0.48137929192805234}, "mentions an unpleasant taste": {"p-value": 1.447794297581026e-51, "V'": 0.3071394341273084}, "complains about the taste of the product": {"p-value": 3.333302730597033e-24, "V'": 0.21423333158245567}, "mentions artificial ingredients or flavors": {"p-value": 1.2003761171869166e-11, "V'": 0.09679301241778808}, "complains about the lack of flavor": {"p-value": 3.13614949521657e-08, "V'": 0.11820950564966431}, "mentions a bad taste or smell": {"p-value": 4.8800561062868856e-57, "V'": 0.3210814203462621}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of grocery and gourmet food on Amazon giving 1 star, while the Group B snippets are reviews of grocery and gourmet food on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"complains about the taste": {"p-value": 1.2880996141397197e-287, "V'": 0.6455890585409099}, "mentions the product's bland or weak taste": {"p-value": 3.5621709149852743e-181, "V'": 0.4910500178204194}, "mentions an unpleasant taste": {"p-value": 8.001435874524556e-150, "V'": 0.41336038297542643}, "not impressed with the product's taste": {"p-value": 0.0, "V'": 0.6950056817820107}, "taste/flavor is unsatisfactory": {"p-value": 0.0, "V'": 0.7162996766038046}, "mentions a lack of flavor or taste": {"p-value": 3.78194085625813e-108, "V'": 0.35961679643970895}, "tastes too strong/strange": {"p-value": 6.134788519269196e-129, "V'": 0.4046244763832809}, "mentions a bad taste": {"p-value": 1.8373557893556535e-195, "V'": 0.4988423464338557}, "Dislike the taste or smell of the product": {"p-value": 2.77e-322, "V'": 0.6610704814908975}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of grocery and gourmet food on Amazon giving 2 stars, while the Group B snippets are reviews of grocery and gourmet food on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions the price is too high": {"p-value": 6.0667748298230325e-09, "V'": 0.03285434050439684}, "price is too high": {"p-value": 7.781802728102678e-11, "V'": 0.03592505233466162}, "mentions an issue with the packaging": {"p-value": 1.159697000844063e-21, "V'": 0.07105531032166851}, "Mentions the product being too expensive": {"p-value": 5.239854058043324e-10, "V'": 0.034827508368868415}, "Notes a lack of flavor": {"p-value": 3.0743642941225476e-21, "V'": 0.05652317331568209}, "mentions product being too small": {"p-value": 0.00022305696246064383, "V'": 0.0166923813852849}, "mentions product being too expensive": {"p-value": 3.039803098951878e-09, "V'": 0.03373722572328484}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of grocery and gourmet food on Amazon giving 4 stars, while the Group B snippets are reviews of grocery and gourmet food on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions poor quality control": {"p-value": 0.0, "V'": 0.8552465624084673}, "poor quality control": {"p-value": 0.0, "V'": 0.8826200215917386}, "mentions poor quality or craftsmanship": {"p-value": 0.0, "V'": 0.858210400364723}, "mentions poor quality": {"p-value": 0.0, "V'": 0.8371357743660053}, "mentions poor quality of materials": {"p-value": 0.0, "V'": 0.7587880776715856}, "Mentions the product not living up to expectations": {"p-value": 0.0, "V'": 0.9183360822057836}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. 
The Group A snippets are reviews of industrial and scientific products on Amazon giving 1 star, while the Group B snippets are reviews of industrial and scientific products on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions the product not performing as expected": {"p-value": 9.300164684261327e-74, "V'": 0.2908632677676932}, "Complains about the product being of low quality": {"p-value": 1.4774076481033388e-139, "V'": 0.4414379908264716}, "complains about the product not working": {"p-value": 3.0130722161503785e-143, "V'": 0.45859088880421156}, "mentions poor quality materials": {"p-value": 2.766931345016139e-130, "V'": 0.46514117848003483}, "mentions poor accuracy": {"p-value": 8.85746516161479e-09, "V'": 0.16939068490621986}, "mentions poor quality": {"p-value": 1.6556370237835693e-148, "V'": 0.48109473860338187}, "complains about the quality of the product": {"p-value": 3.1273393433425573e-84, "V'": 0.3121709860358116}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of industrial and scientific products on Amazon giving 1 star, while the Group B snippets are reviews of industrial and scientific products on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions product not lasting long enough": {"p-value": 2.2082552062868398e-17, "V'": 0.2707098791638218}, "mentions poor quality materials": {"p-value": 0.0, "V'": 0.659995414103345}, "mentions product failure": {"p-value": 0.0, "V'": 0.7356671677729512}, "Poorly made": {"p-value": 0.0, "V'": 0.6911369495233098}, "complains about cheap quality": {"p-value": 0.0, "V'": 0.7479705089336123}, "complains about the quality of the product": {"p-value": 0.0, "V'": 0.7428858594675706}, "mentions poor quality": {"p-value": 0.0, "V'": 0.6955740763435906}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of industrial and scientific products on Amazon giving 2 stars, while the Group B snippets are reviews of industrial and scientific products on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions poor quality of product": {"p-value": 3.8504697382137347e-32, "V'": 0.08706052155006529}, "mentions a design flaw": {"p-value": 7.808551714563694e-80, "V'": 0.2528240694766881}, "mentions a low-quality product": {"p-value": 1.181437461250524e-19, "V'": 0.06317670256441835}, "complains about the product being too thin": {"p-value": 0.0001261418777228337, "V'": 0.011240475014422508}, "complains about the quality": {"p-value": 1.71972722938692e-30, "V'": 0.08246207490924157}, "mentions a shortcoming in product quality": {"p-value": 3.2418889357276576e-104, "V'": 0.30417317944353056}, "mentions difficulty with installation": {"p-value": 1.503775849236104e-13, "V'": 0.05119751534176707}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. 
The Group A snippets are reviews of industrial and scientific products on Amazon giving 4 stars, while the Group B snippets are reviews of industrial and scientific products on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions a burning sensation on the skin": {"p-value": 6.15794931813734e-10, "V'": 0.04078159485111893}, "Complains about the price": {"p-value": 5.585846182019785e-26, "V'": 0.09893687854785836}, "mentions the product being overpriced": {"p-value": 1.8373408204163784e-09, "V'": 0.06257783838785394}, "mentions that the product does not work as advertised": {"p-value": 0.0, "V'": 0.909379903566978}, "expresses disappointment in the product": {"p-value": 0.0, "V'": 0.9650413091054857}, "mentions bad color or color not as advertised": {"p-value": 1.4276794918023286e-91, "V'": 0.26468554182131737}, "mentions poor quality for the price": {"p-value": 0.0, "V'": 0.7221916902624564}, "mentions being overpriced": {"p-value": 5.242301613636854e-15, "V'": 0.08704306226927322}}, "-": {"The smell is unoffensive, but it fades quickly.": {"p-value": 1.5974506062423744e-10, "V'": 0.04975703194760134}}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of luxury beauty products on Amazon giving 1 star, while the Group B snippets are reviews of luxury beauty products on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions breakouts after use": {"p-value": 1.016830793905426e-11, "V'": 0.0772002577363469}, "mentions product not working after one use": {"p-value": 1.586508125053484e-58, "V'": 0.3120781035131601}, "complains about the product being watery and difficult to use": {"p-value": 2.163969375003194e-09, "V'": 0.0938453771642718}, "mentions the product not doing what it promises": {"p-value": 1.0367192600684073e-14, "V'": 0.13854433531640042}, "mentions product burning skin": {"p-value": 4.481021294861531e-06, "V'": 0.04571404351645696}}, "-": {"mentions the product being overpriced": {"p-value": 1.4026634723609647e-08, "V'": 0.07644045740110594}, "Pans the product for being too expensive": {"p-value": 6.840274676825785e-07, "V'": 0.0649008971901079}}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of luxury beauty products on Amazon giving 1 star, while the Group B snippets are reviews of luxury beauty products on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions poor coverage": {"p-value": 5.8200585646495655e-102, "V'": 0.3205909121106118}, "mentions not matching the user's complexion": {"p-value": 7.066703126783503e-14, "V'": 0.10825267152577624}, "Mentions the product not being effective.": {"p-value": 0.0, "V'": 0.7751129628307503}, "complains about the scent of the product": {"p-value": 4.4272951357068056e-16, "V'": 0.09244897334494098}, "mentions the product not being effective": {"p-value": 0.0, "V'": 0.7615254181766897}, "mentions a strong or unpleasant scent": {"p-value": 1.1748434743916222e-09, "V'": 0.0709805318679929}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. 
The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of luxury beauty products on Amazon giving 2 stars, while the Group B snippets are reviews of luxury beauty products on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions a color not suiting their skin tone": {"p-value": 1.974444944899072e-15, "V'": 0.05775641459142723}, "complains about the color not being as expected": {"p-value": 3.885337882486571e-18, "V'": 0.07129931824124248}, "mentions that the product is too expensive": {"p-value": 1.6811162505446196e-33, "V'": 0.12507507652061906}, "mentions the product being too heavy": {"p-value": 3.28432585094488e-11, "V'": 0.04059963816980634}, "mentions the heavy scent of the product": {"p-value": 3.293770194268399e-21, "V'": 0.07609694502008693}, "mentions a bad smell": {"p-value": 0.00031142696080070767, "V'": 0.013859084484210082}, "price point is too high": {"p-value": 1.0997659938893896e-39, "V'": 0.1273652231561529}, "mentions the product not lasting long": {"p-value": 4.275784149221379e-24, "V'": 0.0861970088727736}, "mentions the price as too expensive": {"p-value": 2.7455122632938617e-34, "V'": 0.11845214799151749}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of luxury beauty products on Amazon giving 4 stars, while the Group B snippets are reviews of luxury beauty products on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions the quality of the magazine has declined": {"p-value": 1.1615679430093772e-82, "V'": 0.2628276561580399}, "poor customer service": {"p-value": 9.41383056197118e-99, "V'": 0.3000707316032275}, "content is shallow and does not include in-depth articles": {"p-value": 1.3128644589646749e-55, "V'": 0.2426337809718923}, "mentions the print quality being poor": {"p-value": 0.0004938947021140197, "V'": 0.013613614107595281}, "mentions poor customer service": {"p-value": 1.9414744025470828e-48, "V'": 0.16264222657773764}, "mentions the magazine being outdated": {"p-value": 6.235655130622366e-29, "V'": 0.1058876960269317}, "mentions a lack of content": {"p-value": 6.778303278097815e-65, "V'": 0.2121936012834391}, "received their order late": {"p-value": 1.1081537609926222e-49, "V'": 0.1722127073309982}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of magazines on Amazon giving 1 star, while the Group B snippets are reviews of magazines on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions that the magazine is not worth the money spent": {"p-value": 3.7585645911608135e-23, "V'": 0.18356978328859752}, "mentions not getting the product as expected": {"p-value": 1.2737855341353398e-15, "V'": 0.15472445089508813}, "complains about the lack of content": {"p-value": 8.939054499233336e-10, "V'": 0.12796271880363042}, "complains about the lack of useful information": {"p-value": 5.958463347197004e-14, "V'": 0.1579986394866632}, "mentions unexpected delays in delivery": {"p-value": 6.449444968773844e-20, "V'": 0.15186871506193492}, "complains about the high cost of the magazine": {"p-value": 0.0003064405548891857, "V'": 0.04107863671247639}, "complains about the cost": {"p-value": 9.47641921187371e-08, "V'": 0.07420008621522234}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of magazines on Amazon giving 1 star, while the Group B snippets are reviews of magazines on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"Too many ads and not enough content": {"p-value": 1.2545623180745403e-110, "V'": 0.3815680279140073}, "not enough content compared to the amount of ads": {"p-value": 9.226923904776799e-139, "V'": 0.44125137140152054}, "mentions too many ads": {"p-value": 4.674947877089383e-29, "V'": 0.16339810442036728}, "Too many ads, not enough practical information": {"p-value": 8.361650132555007e-97, "V'": 0.36079941975645}, "not worth the price": {"p-value": 5.21251e-318, "V'": 0.6585034700738452}, "complains about too many ads": {"p-value": 2.5198090916554854e-32, "V'": 0.17436890101070918}, "mentions poor quality of paper/print": {"p-value": 6.9890685664019995e-06, "V'": 0.029369608500431403}, "mentions not being able to read the magazine on a PC": {"p-value": 9.992782739985785e-05, "V'": 0.02899541525941442}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of magazines on Amazon giving 2 stars, while the Group B snippets are reviews of magazines on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions too many ads": {"p-value": 1.2170306595355029e-20, "V'": 0.05419021914295733}, "mentions annoying advertisements": {"p-value": 4.845027849337133e-17, "V'": 0.049802787838490695}, "not enough editorial content": {"p-value": 4.389361446576487e-15, "V'": 0.04422941450375433}, "complaining about the length of time to receive the magazine": {"p-value": 2.5715504064382836e-06, "V'": 0.02219103290362144}, "complains about too many ads": {"p-value": 4.8300018078070887e-17, "V'": 0.04662756387406155}, "Too many advertisements": {"p-value": 5.593966485819952e-20, "V'": 0.05359531676497277}, "mentions problems with delivery time": {"p-value": 1.2413623196857192e-08, "V'": 0.027105436528561733}, "mentions the cost being too high": {"p-value": 0.00011888715473084107, "V'": 0.01803389455066768}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of magazines on Amazon giving 4 stars, while the Group B snippets are reviews of magazines on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions poor output quality": {"p-value": 0.0, "V'": 0.6754568801864331}, "Mentions poor quality and craftsmanship.": {"p-value": 0.0, "V'": 0.8762094864304248}, "mentions poor sound quality": {"p-value": 1.2343372543991733e-15, "V'": 0.24094474951471512}, "mentions receiving damaged goods": {"p-value": 3.075051550677577e-191, "V'": 0.48450980534634935}, "references the product being cheaply made": {"p-value": 3.817284799131011e-253, "V'": 0.5663139046567102}, "expresses dissatisfaction with sound quality": {"p-value": 4.2396681917363394e-61, "V'": 0.35021014188349053}, "mentions the product not working": {"p-value": 0.0, "V'": 0.8084756937615759}, "mentions low-quality materials": {"p-value": 2.24876e-318, "V'": 0.6398109962774117}, "mentions poor product quality": {"p-value": 0.0, "V'": 0.8780438682124074}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of music instruments on Amazon giving 1 star, while the Group B snippets are reviews of music instruments on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"high cost for low quality": {"p-value": 2.635608155434363e-31, "V'": 0.15547338863606824}, "mentions poor sound quality": {"p-value": 1.1407149635781187e-09, "V'": 0.1061696213123744}, "Mentions poor build quality": {"p-value": 4.3757174023024675e-88, "V'": 0.4051816313008531}, "mentions cheap materials or low-quality parts": {"p-value": 1.4609954704484986e-57, "V'": 0.33163880395146983}, "mentions poor quality materials": {"p-value": 1.7236487384765632e-104, "V'": 0.44634074536833246}, "mentions deceptive marketing tactics": {"p-value": 3.45681220460111e-25, "V'": 0.15714810865323114}, "criticizes poor sound quality": {"p-value": 2.9231028454113966e-11, "V'": 0.12630230022722502}, "mentions the product being of poor quality": {"p-value": 2.0501578904015607e-162, "V'": 0.5347714657329651}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of music instruments on Amazon giving 1 star, while the Group B snippets are reviews of music instruments on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"complains about not fitting the instrument": {"p-value": 1.0554790929595399e-193, "V'": 0.5393041191250136}, "mentions the product was not worth the money": {"p-value": 2.660211168553015e-167, "V'": 0.46463345416769875}, "mentions poor sound quality": {"p-value": 1.6129378455446528e-29, "V'": 0.2179600278429077}, "poor construction or quality": {"p-value": 0.0, "V'": 0.7220674272057095}, "mentions bad sound quality": {"p-value": 9.636918295495515e-14, "V'": 0.1947552480796279}, "mentions poor build quality": {"p-value": 7.4463675910506e-181, "V'": 0.5219537478976076}, "notes poor quality control": {"p-value": 0.0, "V'": 0.7221276356883664}, "mentions poor quality of product": {"p-value": 5.975758227459163e-285, "V'": 0.663103588079147}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of music instruments on Amazon giving 2 stars, while the Group B snippets are reviews of music instruments on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"complains about the price": {"p-value": 0.0005736523047132481, "V'": 0.01630244403161716}, "comes with a defective part": {"p-value": 3.72217282609417e-22, "V'": 0.0857950610973397}, "complains about the price being too high": {"p-value": 0.0005799105022996591, "V'": 0.016288646703175518}, "mentions poor build quality": {"p-value": 6.571599488159756e-18, "V'": 0.06763809224678768}, "mentions difficulty in assembly of the product": {"p-value": 6.737339166778922e-10, "V'": 0.04318384075976889}, "mentions the product being too bulky": {"p-value": 4.829528596575576e-06, "V'": 0.023425322744833423}, "mentions the product not being suitable for intended use": {"p-value": 1.3310822844383294e-26, "V'": 0.11840365324914771}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of music instruments on Amazon giving 4 stars, while the Group B snippets are reviews of music instruments on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"Complains about the quality of the product": {"p-value": 0.0, "V'": 0.9332893083578352}, "mentions product is not as described": {"p-value": 0.0, "V'": 0.9256874970076573}, "expresses disappointment with the quality": {"p-value": 0.0, "V'": 0.9283414441120398}, "mentions poor quality": {"p-value": 0.0, "V'": 0.8386033698254557}, "mentions poor design and impracticality": {"p-value": 0.0, "V'": 0.8914029866916822}, "mentions poor quality materials": {"p-value": 0.0, "V'": 0.7770612944181962}, "mentions the product not working as advertised": {"p-value": 0.0, "V'": 0.8901846719912916}, "Disappointed with the low quality of the product": {"p-value": 0.0, "V'": 0.9022971966475536}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of office products on Amazon giving 1 star, while the Group B snippets are reviews of office products on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions having to return the product": {"p-value": 2.5226340022265673e-31, "V'": 0.13178944052221728}, "mentions poor quality of product": {"p-value": 5.9989390492040095e-74, "V'": 0.3494044870129357}, "refers to a short lifespan of the product": {"p-value": 2.318150481747918e-28, "V'": 0.18854473117917236}, "mentions poor quality materials": {"p-value": 6.08448332507377e-91, "V'": 0.4074812659320543}, "mentions not being able to use the product for its intended purpose": {"p-value": 1.8249763967932042e-41, "V'": 0.2817174078244767}, "reports poor quality": {"p-value": 2.6856890314100713e-118, "V'": 0.4453635931011764}, "mentions a product failure": {"p-value": 4.492409360281571e-117, "V'": 0.4480282115982641}, "mentions poor quality and design": {"p-value": 4.9150482290231744e-54, "V'": 0.2768642580340023}, "Complains about the quality of the product": {"p-value": 6.998037365146678e-61, "V'": 0.29712467641798634}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of office products on Amazon giving 1 star, while the Group B snippets are reviews of office products on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions poor quality": {"p-value": 0.0, "V'": 0.7281722344813786}, "mentions poor quality materials": {"p-value": 5.554832907707719e-305, "V'": 0.6330125506076777}, "mentions a lack of quality": {"p-value": 0.0, "V'": 0.7278226058242518}, "mentions poor quality in the product": {"p-value": 0.0, "V'": 0.7594193230795354}, "poorly written product description": {"p-value": 3.048709573474196e-36, "V'": 0.32633643697503834}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. 
The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of office products on Amazon giving 2 stars, while the Group B snippets are reviews of office products on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"complains about the quality of the product": {"p-value": 4.2555029526869556e-75, "V'": 0.14702998482694643}, "mentions the price being too high": {"p-value": 8.66603292856007e-16, "V'": 0.035200388190510454}, "mentions poor quality": {"p-value": 6.788856537675549e-20, "V'": 0.04482630190726977}, "mentions the product being too small": {"p-value": 5.351126813991187e-16, "V'": 0.0363771229795097}, "mentions that the product is too small": {"p-value": 3.131119366060131e-16, "V'": 0.034449769000171514}, "mentions difficulty sharpening the pencils": {"p-value": 0.00017132928531346883, "V'": 0.00991295324964371}, "The pen can feel cheap in hand.": {"p-value": 0.0008406028712666741, "V'": 0.004834983902467826}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of office products on Amazon giving 4 stars, while the Group B snippets are reviews of office products on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions poor quality materials": {"p-value": 5.633111646041556e-291, "V'": 0.6492978132684109}, "mentions the product being cheaply made": {"p-value": 2.967817489872292e-138, "V'": 0.40667413976574}, "mentions the product being too flimsy": {"p-value": 2.784896297141338e-78, "V'": 0.26552374677647683}, "mentions poor quality of the product": {"p-value": 0.0, "V'": 0.7354799460029948}, "mentions difficulty in setting up and/or using the product": {"p-value": 3.4509943198939127e-100, "V'": 0.35141806380260676}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of patio products on Amazon giving 1 star, while the Group B snippets are reviews of patio products on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions inferior quality materials": {"p-value": 1.8886299200139694e-33, "V'": 0.26120940569459034}, "mentions poor quality material.": {"p-value": 3.515061787170465e-58, "V'": 0.34342295112720383}, "mentions poor quality materials": {"p-value": 8.37363722488639e-53, "V'": 0.32786078041019256}, "mentions poor quality": {"p-value": 1.1557557450546291e-61, "V'": 0.3424287536167159}, "product malfunctioned quickly": {"p-value": 1.0799664920198835e-93, "V'": 0.4163599888131226}, "mentions poor quality material": {"p-value": 3.13496524969011e-59, "V'": 0.3462127615785451}, "mentions bad design or build quality": {"p-value": 1.0468314632076889e-37, "V'": 0.26651847545208984}, "complains about the product's unreliability": {"p-value": 2.931376240844164e-83, "V'": 0.36596303902633676}, "mentions the product not working as advertised": {"p-value": 5.575879404200334e-55, "V'": 0.29240205416685827}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of patio products on Amazon giving 1 star, while the Group B snippets are reviews of patio products on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions poor workmanship": {"p-value": 1.3457593122956253e-169, "V'": 0.49729210750345704}, "mentions poor quality of materials": {"p-value": 6.959141672450284e-213, "V'": 0.5606006664469312}, "mentions product as flimsy": {"p-value": 1.1332512137551762e-81, "V'": 0.36762613136927885}, "mentions poor product quality": {"p-value": 0.0, "V'": 0.7088912376320562}, "not worth the money": {"p-value": 0.0, "V'": 0.78753746601189}, "mentions a poor product quality": {"p-value": 0.0, "V'": 0.711690451587911}, "mentions poor quality": {"p-value": 2.351252315231253e-288, "V'": 0.6463737159758028}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of patio products on Amazon giving 2 stars, while the Group B snippets are reviews of patio products on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions being flimsy and not lasting long": {"p-value": 2.3089064523779803e-22, "V'": 0.07770628765574063}, "mentions the price being too high": {"p-value": 4.4522056126400434e-05, "V'": 0.017086856279146172}, "mentions the product being flimsy or not sturdy": {"p-value": 1.474033481021803e-18, "V'": 0.06619671015211438}, "mentions that the product is too expensive": {"p-value": 0.00010872867512627427, "V'": 0.015257313194439525}, "mentions difficulty in assembly": {"p-value": 7.615563364711672e-05, "V'": 0.01545458840358976}, "refers to the product being too small": {"p-value": 5.744743591143565e-07, "V'": 0.018739881409505343}, "mentions difficulty in installation": {"p-value": 7.130748033638696e-09, "V'": 0.030745223247742808}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. 
The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of patio products on Amazon giving 4 stars, while the Group B snippets are reviews of patio products on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"not as pictured": {"p-value": 0.0, "V'": 0.6822412615335659}, "mentions product arriving damaged": {"p-value": 1.1097727930350524e-64, "V'": 0.1805857861276152}, "mentions a bad taste or smell": {"p-value": 3.590756393991282e-139, "V'": 0.33103921517983037}, "mentions inferior quality of product": {"p-value": 0.0, "V'": 0.7920970311354729}, "mentions an unpleasant taste": {"p-value": 4.377095248153415e-109, "V'": 0.2808495666855819}, "mentions product not cleaning as expected": {"p-value": 1.962979877802507e-61, "V'": 0.1712705656123895}, "mentions an allergy or sensitivity to the product.": {"p-value": 1.1338207537811803e-19, "V'": 0.11538878634040739}, "mentions poor product quality": {"p-value": 0.0, "V'": 0.8033431826325507}, "mentions the product being too small for the price": {"p-value": 4.118167742644384e-15, "V'": 0.06503420087107567}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of pantry goods on Amazon giving 1 star, while the Group B snippets are reviews of pantry goods on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions of a bad smell": {"p-value": 2.3746683312866827e-07, "V'": 0.04492683459652231}, "mentions poor design/construction": {"p-value": 3.5386895715155096e-43, "V'": 0.21334171085574755}, "mentions a bad smell": {"p-value": 4.470989371262592e-07, "V'": 0.0433399847332592}, "mentions poor quality": {"p-value": 3.1734469059665733e-150, "V'": 0.46529924279166396}, "mentions product was defective": {"p-value": 2.6188677380041267e-174, "V'": 0.49573080712553463}, "complains about taste": {"p-value": 2.0951665064231786e-08, "V'": 0.10326394308457187}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of pantry goods on Amazon giving 1 star, while the Group B snippets are reviews of pantry goods on Amazon giving 3 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"Complains about the product's flavor": {"p-value": 1.7391550536006008e-106, "V'": 0.34228845620576365}, "mentions an unpleasant taste": {"p-value": 7.590043155613053e-80, "V'": 0.2383617326015312}, "complains about the bad taste": {"p-value": 1.1211800863273917e-135, "V'": 0.3537158096993485}, "mentions poor taste": {"p-value": 1.6486790555255113e-140, "V'": 0.36421725614574546}, "mentions dissatisfaction with the product quality": {"p-value": 1.1741207e-316, "V'": 0.6322135637704753}, "mentions the taste being off or not what was expected": {"p-value": 4.527808372239361e-91, "V'": 0.3086750788425982}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. 
The Group A snippets are reviews of pantry goods on Amazon giving 2 stars, while the Group B snippets are reviews of pantry goods on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions the product being too sweet": {"p-value": 2.4353288941604037e-09, "V'": 0.018056204456294186}, "not as good as expected": {"p-value": 1.2694594848054787e-74, "V'": 0.14412152766189648}, "says the product is unhealthy due to high sodium levels": {"p-value": 0.0003799728049108022, "V'": 0.00714187170988708}, "mentions the product being too salty": {"p-value": 4.175008120318416e-05, "V'": 0.011413557581810597}, "mentions price being too high": {"p-value": 1.7396757510938754e-18, "V'": 0.05401012905135593}, "mentions product being too expensive": {"p-value": 8.71402334252364e-20, "V'": 0.056430218266905766}, "mentions not achieving advertised results": {"p-value": 1.5162198290266956e-64, "V'": 0.13117373995207243}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of pantry goods on Amazon giving 4 stars, while the Group B snippets are reviews of pantry goods on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions unsatisfactory customer service": {"p-value": 0.0, "V'": 0.728850456345884}, "mentions problems with installation and setup": {"p-value": 0.0, "V'": 0.7740620377282185}, "not user-friendly/difficult to use": {"p-value": 0.0, "V'": 0.8256516971084775}, "has difficulty with installation": {"p-value": 1.7323526987442325e-111, "V'": 0.5581073132220274}, "complains about the product being difficult to use or understand": {"p-value": 0.0, "V'": 0.8392305413036225}, "mentions poor performance": {"p-value": 0.0, "V'": 0.8478592176775458}, "mentions poor customer service": {"p-value": 0.0, "V'": 0.6684045786087982}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of software on Amazon giving 1 star, while the Group B snippets are reviews of software on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions a lack of technical support": {"p-value": 1.2668672513104776e-33, "V'": 0.22272208704398852}, "Complains about the product not working as advertised": {"p-value": 1.9660632298843817e-47, "V'": 0.22360346556474264}, "mentions being overcharged": {"p-value": 5.5866442527303996e-05, "V'": 0.04252039862267119}, "expresses difficulty with technical aspects": {"p-value": 1.3826893465420405e-21, "V'": 0.1718230850176139}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of software on Amazon giving 1 star, while the Group B snippets are reviews of software on Amazon giving 3 stars. I am a seller of various products on Amazon. 
My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"Expensive for the features it provides.": {"p-value": 0.0007169590512715373, "V'": 0.028473111843547217}, "mentions incompatibility with operating system": {"p-value": 5.417364099673332e-30, "V'": 0.22630495697094988}, "complains about the user interface": {"p-value": 1.0297662125292082e-182, "V'": 0.5456223132839633}, "mentions difficulty of installation": {"p-value": 8.278253799556205e-22, "V'": 0.1482480347776742}, "Mentions the software being unhelpful": {"p-value": 0.0, "V'": 0.7348237742202381}, "mentions not receiving the correct version": {"p-value": 6.436822207866778e-44, "V'": 0.24844922595720656}, "mentions high price": {"p-value": 4.130236368696115e-07, "V'": 0.054927978956436196}, "complains about compatibility issues": {"p-value": 8.28158460557841e-170, "V'": 0.5331719773845639}, "mentions difficulty with installation process": {"p-value": 7.335355931653595e-27, "V'": 0.18376983090356436}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of software on Amazon giving 2 stars, while the Group B snippets are reviews of software on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions the product not being worth the price": {"p-value": 2.2720289364419776e-09, "V'": 0.04129484127779325}, "mentions the product being too slow": {"p-value": 1.873529085786283e-10, "V'": 0.051370531561336266}, "mentions incompatibility issues": {"p-value": 7.927621639328522e-33, "V'": 0.2084018616141496}, "mentions long load times": {"p-value": 5.344127850747696e-10, "V'": 0.04873127991254358}, "reports on slow loading times": {"p-value": 7.517188606210251e-10, "V'": 0.044935569837086686}, "complains of lack of compatibility with other software": {"p-value": 2.628553170779579e-29, "V'": 0.14019704876645353}, "mentions difficulties with the installation process": {"p-value": 7.637234402650606e-14, "V'": 0.08271364441882498}, "mentions difficulty with setup or installation": {"p-value": 1.6823355449884274e-18, "V'": 0.11197779523283333}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of software on Amazon giving 4 stars, while the Group B snippets are reviews of software on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"mentions poor graphics": {"p-value": 2.062065988582117e-45, "V'": 0.18102270384844033}, "mentions too easy or boring gameplay": {"p-value": 8.851977908520186e-75, "V'": 0.24153025012507773}, "mentions slow loading times": {"p-value": 2.6553261471455773e-09, "V'": 0.031034787602735796}, "complains about poor graphics": {"p-value": 5.6669135444095776e-67, "V'": 0.2303954861605985}, "complains about graphics not being realistic": {"p-value": 1.4444243506251952e-44, "V'": 0.15283621971770495}, "mentions the game being too difficult or complicated to figure out": {"p-value": 5.487059302058344e-13, "V'": 0.09244305234702904}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of video games on Amazon giving 1 star, while the Group B snippets are reviews of video games on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"mentions DRM schemes": {"p-value": 4.448792151601896e-10, "V'": 0.049161879607787486}, "complains about the graphics": {"p-value": 7.062840933438234e-07, "V'": 0.0757578489077338}, "mentions poor graphics": {"p-value": 4.710279913963683e-05, "V'": 0.06339036818138022}, "mentions poor gameplay": {"p-value": 3.460111276087465e-19, "V'": 0.1917694059402527}, "mentions bugs or glitches": {"p-value": 2.080589224024593e-17, "V'": 0.1859286035907779}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of video games on Amazon giving 1 star, while the Group B snippets are reviews of video games on Amazon giving 3 stars. I am a seller of various products on Amazon. 
My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}, {"+": {"complains about the lack of replay value": {"p-value": 4.041702701346187e-158, "V'": 0.53705987762875}, "mentions poor graphics": {"p-value": 2.1377583175234305e-30, "V'": 0.17053324555425664}, "low replay value": {"p-value": 6.538711749857838e-303, "V'": 0.6901986600779445}, "refers to the game as 'lousy'": {"p-value": 3.574274180027579e-295, "V'": 0.6581610511398895}, "Quality of graphics is below par": {"p-value": 5.1683646700267636e-76, "V'": 0.3333646816453353}, "Complains about the graphics.": {"p-value": 1.895275126255607e-43, "V'": 0.24689739385431286}, "mentions struggling with character movement": {"p-value": 2.0036784911172916e-17, "V'": 0.13278970711615945}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of video games on Amazon giving 2 stars, while the Group B snippets are reviews of video games on Amazon giving 4 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. 
"}, {"+": {"complains about difficulty/challenge of the game": {"p-value": 8.383458730085701e-38, "V'": 0.13308270528141097}, "finds the game too repetitive": {"p-value": 1.2956958499002192e-35, "V'": 0.1018437951760277}, "mentions the game being too short": {"p-value": 7.262798828308685e-09, "V'": 0.03269599297905276}, "mentions the graphics needing to be updated": {"p-value": 7.834970989623179e-07, "V'": 0.028826138379445993}, "mentions slowdowns in the game": {"p-value": 5.675646362825133e-14, "V'": 0.04835185554421109}, "comments on the game being too short": {"p-value": 2.2811876653166396e-10, "V'": 0.03723008249311027}, "mildly disappointed with the graphics": {"p-value": 1.3943933319242463e-16, "V'": 0.06785662428514361}, "mentions the linearity of the game": {"p-value": 3.71982575383823e-23, "V'": 0.06976150822345475}}, "-": {}, "research goal": "The dataset includes Amazon reviews collected from various product categories. The two classes are generated based on how many stars the review gave. The Group A snippets are reviews of video games on Amazon giving 4 stars, while the Group B snippets are reviews of video games on Amazon giving 5 stars. I am a seller of various products on Amazon. My goal is to figure out which specific aspects users dislike, such as the price, features, or performance. "}]