paper_name,data_description,natural_language_query,answer,method,data_files,domain,treatment,outcome,covariates,running_var,temporal_var,state_var,mediator,instrument_var,
Synthetic Dataset,"This dataset was compiled from a Randomized Control Trial conducted by the Department of Housing and Urban Development (HUD) of the United States. The goal was to investigate the impact of a new housing subsidy policy on recipients' housing stability. Variables include the age of the recipient ('recipient_age'), their monthly income ('monthly_income'), whether they own a home ('is_homeowner', binary: 1 for homeowners, 0 for non-homeowners), whether they have dependents ('has_dependents', binary: 1 for yes, 0 for no), whether they reside in a rural area ('lives_in_rural_area', binary: 1 for rural, 0 for urban), whether they received the housing subsidy ('received_subsidy', binary: 1 for yes, 0 for no), and their self-reported housing stability ('housing_stability').",Does providing housing subsidies improve the stability of housing situations?,3.959617246,rct,rct_data_7.csv,policy,received_subsidy,housing_stability,"recipient_age, monthly_income, is_homeowner, has_dependents, lives_in_rural_area",,,,,,
Synthetic Dataset,"This dataset was gathered by a team of psychologists from the National Institute of Mental Health in a randomized experiment. The study aimed to explore the impact of a mindfulness-based intervention on the stress levels of individuals with a history of mental health challenges. The 'participant_age' represents the age of the participant. The 'stress_level' is a measure of the participant's stress level before the intervention, with higher values indicating greater stress. The 'cognitive_test_score' represents the participant's cognitive abilities. The 'mental_health_history', 'psychological_treatment_history', and 'medication_history' are binary variables indicating whether the participant has a history of mental health issues, has previously undergone psychological treatment, or has been on medication, respectively. The 'mindfulness_intervention' indicates whether the participant received the mindfulness intervention (1) or not (0). The 'post_treatment_stress_level' measures the participant's stress level after the intervention.",Can practicing mindfulness techniques enhance stress reduction in individuals?,6.114257154,rct,rct_data_29.csv,psychology,mindfulness_intervention,post_treatment_stress_level,"participant_age, stress_level, cognitive_test_score, mental_health_history, psychological_treatment_history, medication_history",,,,,,
Synthetic Dataset,This dataset was collected by the National Bureau of Economic Research (NBER) in a randomized trial to study the impact of tax credits on household annual spending. The 'age' represents the age of the head of the household. The 'income_level' is the annual household income in dollars and 'education_years' is the total years of formal education completed by the head of the household. The 'household_size' indicates the number of people living in the household. The 'is_homeowner' is a binary variable indicating whether the household owns their residence (1) or not (0). The 'has_unemployed_member' is a binary variable indicating whether the household has any member who is currently unemployed (1) or not (0). The 'received_government_benefits' denotes whether the household has received any form of government social assistance in the past year (1) or not (0). The 'tax_credit_received' indicates whether the household received a tax credit (1) or not (0). The 'annual_spending' captures the total annual spending of the household in dollars.,Will the government boost the economy by giving households a tax break?,2.531698255,rct,rct_data_48.csv,economics,tax_credit_received,annual_spending,"age, income_level, education_years, household_size, is_homeowner, has_unemployed_member, received_government_benefits",,,,,,
Synthetic Dataset,"The dataset was collected by the Justice Department as part of a randomized study on the impact of a restorative justice program on defendants' recidivism rates. The variables include the defendant's age ('defendant_age'), the number of prior convictions ('prior_convictions'), the severity of the crime ('severity_of_crime'), whether the defendant received legal aid ('legal_aid_received'), the experience of the defendant's lawyer ('lawyer_experience_years'), whether the defendant accepted a plea bargain ('plea_bargain_accepted'), whether the defendant completed community service ('community_service_completed'), and whether a victim impact statement was presented ('victim_impact_statement_presented'). The restorative justice program participation ('restorative_justice_program') is a binary variable, with 1 indicating participation in the program and 0 indicating no participation. The recidivism rate ('recidivism_rate') represents the likelihood of reoffending.",Can participating in a restorative justice program lower the likelihood of reoffending among defendants?,8.998674328,rct,rct_data_38.csv,law,restorative_justice_program,recidivism_rate,"defendant_age, prior_convictions, severity_of_crime, legal_aid_received, lawyer_experience_years, plea_bargain_accepted, community_service_completed, victim_impact_statement_presented",,,,,,
Synthetic Dataset,"The dataset is a result of a Randomized Control Trial (RCT) conducted by the Department of Education in a midwestern state in the U.S. The department aimed to study the impact of using interactive textbooks on students' final grades. The study controlled for factors such as the student's age ('student_age'), the education level of their parents ('parent_education_level'), the size of their classroom ('classroom_size'), the school funding per student ('school_funding_per_student'), whether their school is located in a rural area ('rural_school'), and whether they have access to tutoring ('access_to_tutoring'). The intervention, or treatment, was the use of interactive textbooks ('interactive_textbook_use'), where a value of 1 indicates that the student used the interactive textbooks and 0 indicates that they did not. The final grades ('final_grade') of the students at the end of the school year were recorded.",Does the use of interactive textbooks enhance students' final grades?,7.817072114,rct,rct_data_17.csv,education,interactive_textbook_use,final_grade,"student_age, parent_education_level, classroom_size, school_funding_per_student, rural_school, access_to_tutoring",,,,,,
Synthetic Dataset,"The dataset was collected during a Randomized Control Trial conducted by a leading healthcare research institute. The trial was designed to study the impact of a new treatment on the recovery rates of patients diagnosed with Chronic Obstructive Pulmonary Disease (COPD). The 'patient_age' represents the age of the patients. The 'months_since_diagnosis' indicates the number of months that have passed since the patients were diagnosed with COPD. The 'smoking_status', 'physical_activity', and 'diabetes_diagnosis' are binary variables. The 'smoking_status' indicates whether the patient is a smoker (1) or not (0), 'physical_activity' denotes whether the patient is physically active (1) or not (0), and 'diabetes_diagnosis' signifies whether the patient has been diagnosed with diabetes (1) or not (0). The 'new_treatment' indicates whether the patient received the new treatment (1) or was in the control condition (0). Finally, 'recovery_rate' represents the recovery rate of the patients on a scale of 0 to 100.",Will the introduction of a new treatment boost the recovery rates of patients suffering from Chronic Obstructive Pulmonary Disease (COPD)?,8.153864237,rct,rct_data_26.csv,healthcare,new_treatment,recovery_rate,"patient_age, months_since_diagnosis, smoking_status, physical_activity, diabetes_diagnosis",,,,,,
Synthetic Dataset,"The dataset was collected by the Department of Political Science at a national university in partnership with a non-profit organization through a randomized experiment. The aim of the study was to examine the impact of a political campaign on voter turnout in a nationwide election. The participants' ages, varying between 18 and 63 years, are recorded as 'age'. Their years of formal education, ranging from 13 to 35 years, are recorded as 'education_years'. The participants' annual income in thousands of dollars is defined as 'income'. The level of political activity, such as attending political rallies or participating in political discussions, is recorded as 'political_activity'. The level of news exposure, measured as hours spent on news consumption per week, is recorded as 'news_exposure'. Whether the participant resides in a rural area is noted as 'rural_residence' and is binary, with 1 indicating rural residence and 0 indicating urban residence. The participants' previous voting history, whether they have voted in past elections, is captured as 'previous_voting_history' and is binary, with 1 indicating they have voted before and 0 indicating they have not. The experiment involved exposing a randomly selected group of participants to a political campaign, recorded as 'campaign_exposure', where 1 indicates exposure and 0 indicates no exposure. Finally, 'voting_turnout' represents whether participants voted in the subsequent election, with 1 indicating they voted and 0 indicating they did not vote.",Can exposure to a political campaign lower voter apathy in a nationwide election?,4.46159042,rct,rct_data_39.csv,political_science,campaign_exposure,voting_turnout,"age, education_years, income, political_activity, news_exposure, rural_residence, previous_voting_history",,,,,,
Synthetic Dataset,"This dataset was compiled by the Department of Labor in a randomized trial to evaluate the impact of a new job security policy on job satisfaction among workers. The dataset considered various factors such as the individual's age, years of education, income, number of hours worked per week, and years of job experience. It also included binary variables such as union membership (1 for members, 0 for non-members) and whether the individual had undergone a job training program (1 for yes, 0 for no). The job security policy indicates whether individuals were affected by the policy (1) or not (0). Job satisfaction level was measured with higher values indicating greater satisfaction.",Does a new job security policy result in improved worker job satisfaction?,7.266382436,rct,rct_data_27.csv,labor,job_security_policy,job_satisfaction_level,"individual_age, education_years, income, weekly_working_hours, job_experience_years, union_membership, job_training_program",,,,,,
Synthetic Dataset,"This dataset was collected from a randomized control trial conducted by the Department of Sociology in collaboration with the National Institute of Informatics. The trial was designed to understand the impact of a digital literacy program on internet usage. The variables in the dataset include the age of the individuals ('individual_age'), their years of education ('years_of_education'), their employment status ('employed'), whether or not they have access to the internet ('internet_access'), and whether they live in a rural area ('rural_residence'). The 'digital_literacy_program' indicates whether or not the individual was randomly assigned to participate in the digital literacy program (1 means they participated, 0 means they did not). Finally, the 'internet_usage' represents the average number of hours the individual spends on the internet per day.",Can implementing a digital literacy program increase daily internet usage?,1.597041686,rct,rct_data_24.csv,sociology,digital_literacy_program,internet_usage,"individual_age, years_of_education, employed, internet_access, rural_residence",,,,,,
Synthetic Dataset,"This dataset is a record from a Randomized Control Trial conducted by the Department of Transportation to study the impact of a public transport subsidy policy on the residents' usage of public transport. The policy was randomly assigned to various residents. The variables in the dataset include 'resident_age' representing the age of the resident, 'income_level' capturing the yearly income of the resident, 'urban_residence' indicating whether the resident lives in an urban (1) or rural (0) area, and 'employment_status' showing whether the resident is employed (1) or unemployed (0). The 'public_transport_subsidy' is a binary indicator of whether the resident received a public transport subsidy (1) or not (0). The 'public_transport_usage' measures the number of times a resident used public transport in a month.",Does providing subsidies for public transportation enhance its usage among residents?,6.072467523,rct,rct_data_2.csv,policy,public_transport_subsidy,public_transport_usage,"resident_age, income_level, urban_residence, employment_status",,,,,,
Synthetic Dataset,"This dataset was collected by the National Bank as part of a randomized study to understand the impact of a policy change on the successful repayment of loans. The variables in the dataset include 'age' (denoting the age of the loan applicant), 'income' (denoting the annual income of the borrower), 'home_owner_status' (binary variable, where 1 indicates the borrower owns their home and 0 indicates they do not), 'previous_loan_history' (binary variable, where 1 indicates the borrower has previous loan experience and 0 indicates they do not), 'job_stability' (binary variable, where 1 signifies stable employment and 0 indicates irregular or unstable employment), 'loan_grant' (binary variable, where 1 means the new loan granting policy was applied and 0 means it was not), and 'loan_repayment' (denoting the success rate of loan repayment).",Will a change in the criteria for granting loans by the National Bank boost how successfully borrowers repay their loans?,3.490116611,rct,rct_data_4.csv,economics,loan_grant,loan_repayment,"age, income, home_owner_status, previous_loan_history, job_stability",,,,,,
Synthetic Dataset,"This dataset was collected by the Department of Safety and Public Policy (DSPP) through a randomized trial to study the effect of a new public safety funding policy on the perceived safety of residents. The neighborhoods were randomly selected to receive the additional funding ('public_safety_fund'). The age of the participant ('participant_age') and their annual income ('annual_income') are recorded. The crime rate in the neighborhood ('crime_rate') is also recorded. Home ownership status ('home_ownership'), where '0' denotes a renter and '1' denotes a homeowner, as well as employment status ('employment_status'), where '0' denotes unemployed and '1' denotes employed, are binary variables. The education level of the participant ('education_level') is another binary variable, with '0' representing 'high school or less' and '1' representing 'college degree or higher'. The perceived safety satisfaction ('safety_satisfaction') represents the level of satisfaction with community safety.",Can investing more in public safety lower residents' safety concerns in their community?,8.146638275,rct,rct_data_36.csv,policy,public_safety_fund,safety_satisfaction,"participant_age, annual_income, crime_rate, home_ownership, education_level, employment_status",,,,,,
Synthetic Dataset,"The dataset is a product of a randomized control trial conducted by the National Institute of Sociology to study the effect of social media usage on political participation. Individuals in the treatment arm were asked to participate in a new social media app designed for participating in political discourse. Individual age ('individual_age') represents the age of the respondent. Education level ('education_level') indicates the highest level of education completed by the respondent on a scale from 1 (no formal education) to 10 (doctoral degree). Household income ('household_income') reflects the total income of the respondent's household. Employment status ('employment_status') indicates the respondent's current employment situation on a scale of 1 to 5, where 1 is unemployed and 5 is full-time employment. Community involvement ('community_involvement') measures the level of respondent's involvement in local community activities. Internet access ('internet_access') and public transport access ('public_transport_access') are binary variables, with 1 indicating access and 0 indicating no access. Social media usage ('social_media_usage') is a binary variable where 1 indicates participation in the social media app and 0 indicates no use. The political participation ('political_participation') measures the level of respondent's involvement in political activities, such as voting, campaigning, or attending public meetings.",Does regular use of social media result in increased involvement in political activities among citizens?,8.108756031,rct,rct_data_16.csv,sociology,social_media_usage,political_participation,"individual_age, education_level, household_income, employment_status, community_involvement, internet_access, public_transport_access",,,,,,
Synthetic Dataset,"This dataset was collected in the context of a randomized control trial conducted by a team of psychologists at a renowned university. The study's primary aim was to evaluate the effect of a newly developed anti-anxiety medication on individuals diagnosed with anxiety disorders. The participant's age ('participant_age') and initial cognitive test score ('cognitive_test_score') are used. Whether the participant had been diagnosed with anxiety ('diagnosed_anxiety'; 1 - Yes, 0 - No) and whether they had undergone psychotherapy treatment ('psychotherapy_treatment'; 1 - Yes, 0 - No) are binary variables. The medication intervention ('medication_intervention'; 1 - Given medication, 0 - Given placebo) reflects whether the participant was administered the new anti-anxiety medication or a placebo. The post treatment anxiety score ('post_treatment_anxiety_score') is a measure of the participant's anxiety level following the intervention, with lower scores indicating reduced anxiety.",Can a new anti-anxiety medication improve anxiety reduction in individuals?,8.906176432,rct,rct_data_12.csv,psychology,medication_intervention,post_treatment_anxiety_score,"participant_age, cognitive_test_score, diagnosed_anxiety, psychotherapy_treatment",,,,,,
Synthetic Dataset,"This dataset was collected as part of a nationwide Randomized Control Trial conducted by the National Election Commission. The study aimed to evaluate the impact of a new voter education program on the voting rate, in the context of various demographic factors. The 'age' represents the age of the individual. The 'education_years' indicates the total number of years of formal education completed by the individual. The 'income_level' is the individual's income level. The 'registered_voter' is a binary variable indicating whether or not the individual is a registered voter, with 1 indicating 'Yes' and 0 indicating 'No'. The 'political_affiliation' is a binary variable where 1 represents an affiliation to a major political party and 0 represents no such affiliation. The 'urban_resident' is a binary variable indicating whether the individual resides in an urban or rural area, with 1 for 'Urban' and 0 for 'Rural'. The 'voter_education_program' is a binary variable indicating whether the participant was exposed to the voter education program, with 1 for 'Yes' and 0 for 'No'. The 'voting_rate' represents the percentage of eligible elections in which the individual voted.",Will an increase in voter turnout be achieved by introducing a new voter education program?,6.732305104,rct,rct_data_14.csv,political_science,voter_education_program,voting_rate,"age, education_years, income_level, registered_voter, political_affiliation, urban_resident",,,,,,
Synthetic Dataset,"The dataset was collected from a randomized control trial conducted by the Department of Labor, studying the effect of a minimum wage increase on workers' job satisfaction. The 'worker_age' represents the age of the worker. The 'education_years' indicates the total years of formal schooling completed by the worker. The 'previous_income' shows the worker's income before the trial began. The 'working_hours' indicates the number of hours the worker typically works per week. The 'union_membership' is a binary variable indicating whether the worker is part of a union (1) or not (0). The 'job_security' is a binary variable indicating whether the worker has a permanent contract (1) or a temporary contract (0). The 'job_training' is a binary variable indicating whether the worker has undergone job training (1) or not (0). The 'minimum_wage_increase' indicates whether the worker's wage was increased (1) or not (0). The job satisfaction ('job_satisfaction') measures the worker's job satisfaction on a scale from 0 to 100.",Can raising the minimum wage boost job satisfaction among workers?,4.986693134,rct,rct_data_6.csv,labor,minimum_wage_increase,job_satisfaction,"worker_age, education_years, previous_income, working_hours, union_membership, job_security, job_training",,,,,,
Synthetic Dataset,"This dataset was collected as part of a randomized study conducted by the National Institute of Sociology. The study aimed to measure the impact of access to broadband internet on political participation. The 'participant_age' represents the age of the study participants. The 'education_years' indicates the total years of formal education completed by the participants. The 'annual_income' indicates the annual income of the participants in USD. The 'gender' is a binary variable with 1 denoting male and 0 denoting female participants. The 'rural_residence' is a binary variable where value 1 indicates that the participant lives in a rural area and 0 indicates urban residence. The 'social_network_size' is a binary variable, with 1 representing a large social network (above median) and 0 a small social network (below median). The 'access_to_broadband' indicates whether the participant was provided with access to broadband internet (1) or not (0). The political participation ('political_participation') represents the frequency of a participant's political activities on a scale of 0 to 100.",Does providing broadband access to individuals lower political disengagement?,8.345326623,rct,rct_data_49.csv,sociology,access_to_broadband,political_participation,"participant_age, education_years, annual_income, gender, rural_residence, social_network_size",,,,,,
Synthetic Dataset,"This dataset was collected by the Department of Psychology at a top-tier research university through a randomized experiment. The study included participants of varying ages ('participant_age'), cognitive abilities ('cognitive_test_score'), histories of mood disorders ('mood_disorder_history'), family histories of mental illness ('family_history_of_mental_illness'), and stress levels ('stress_levels'). The data also record whether a participant had visited a psychologist in the past year ('psychologist_visit', binary: 1 indicates a visit, 0 indicates no visit) and whether they were on any psychiatric medication ('medication_usage', binary: 1 indicates usage, 0 indicates no usage). The 'mindfulness_intervention' variable indicates whether the participant was assigned to the mindfulness intervention group (1) or the control condition (0). The 'anxiety_levels' variable measures participants' self-reported anxiety levels post-intervention.",Could offering mindfulness-based programs result in reduced anxiety levels among individuals?,9.123597566,rct,rct_data_20.csv,psychology,mindfulness_intervention,anxiety_levels,"participant_age, cognitive_test_score, mood_disorder_history, family_history_of_mental_illness, stress_levels, psychologist_visit, medication_usage",,,,,,
Synthetic Dataset,"The Department of Education conducted a Randomized Control Trial to investigate the impact of an online supplementary education program on students' final grades. The dataset includes variables such as 'student_age' which represents the age of the students in years, 'parents_education_years' which indicates the total years of formal education completed by the parents, and 'classroom_size', indicating the number of students in the class. The 'school_funding_status' is a binary variable where 1 indicates that the school is publicly funded and 0 indicates it is privately funded. The 'tutoring_access' is binary, with 1 indicating that the student has access to tutoring services and 0 indicating otherwise. The 'online_program_participation' is a binary indicator where 1 represents participation in the online program and 0 represents non-participation. The final grade ('final_grade') represents the student's final grade in the subject.",Can participating in an online supplementary program improve students' final grades?,7.620676225,rct,rct_data_58.csv,education,online_program_participation,final_grade,"student_age, parents_education_years, classroom_size, school_funding_status, tutoring_access",,,,,,
Synthetic Dataset,"This data was collected in a randomized control trial conducted by the Department of Nutrition at a major university. The study aimed to investigate the impact of a new diet plan on the cholesterol level of patients with a high risk of heart disease. Patient age represents the age of the patients at the time of the study. BMI score indicates the body mass index of each patient. Smoking status and alcohol consumption are binary variables, indicating whether the patient is a smoker (1 for yes, 0 for no) and whether the patient drinks alcohol (1 for yes, 0 for no), respectively. Physical activity is also a binary variable, representing whether the patient engages in regular physical activity (1 for yes, 0 for no). The new_diet_plan is the treatment indicator (binary, 1 if the patient underwent the new diet plan, 0 if not). The cholesterol level shows the patients' cholesterol levels after the completion of the study.",Does following a new diet plan enhance cholesterol reduction?,4.138544101,rct,rct_data_42.csv,healthcare,new_diet_plan,cholesterol_level,"patient_age, bmi_score, smoking_status, alcohol_consumption, physical_activity",,,,,,
Synthetic Dataset,"This dataset was compiled by a team of political scientists from various universities who are studying the factors leading to successful campaign outcomes in local elections across the United States. The dataset includes candidate age ('candidate_age'), the candidate's years of political experience ('experience_years'), and the amount of money spent on the campaign ('campaign_spending'). Binary variables include whether the candidate is female ('female_candidate', binary), whether the candidate is from a minority group ('minority_candidate', binary), and whether the candidate is an incumbent ('incumbent_candidate', binary). The social media campaign usage is given by the binary variable, 'social_media_campaign'. The measure of how likely the candidate is to win the election is given by the variable 'election_won'.",Does the use of social media campaigns increase the chances of winning local elections in the United States?,1.578919284,observational,observational_data_55.csv,political_science,social_media_campaign,election_won,"candidate_age, experience_years, campaign_spending, female_candidate, minority_candidate, incumbent_candidate",,,,,,
Synthetic Dataset,"The dataset was collected by a group of university researchers aiming to study the effectiveness of cognitive behavioral therapy (CBT) in reducing anxiety levels. Participants of varying ages ('participant_age') were assessed for their stress levels ('stress_score') using a standard psychological stress test, with scores ranging from -4 to 40. They were also asked about their regular participation in mindfulness training ('mindfulness_training') and physical exercise ('physical_exercise'), which were recorded as binary variables (1 for regular participation, 0 for irregular or no participation). The participants' family history of mental health issues ('family_history') was also recorded as a binary variable (1 for presence, 0 for absence). The participants were either assigned to undergo CBT ('cognitive_behavioral_therapy') or not (1 for undergoing therapy, 0 for not). The participants' anxiety level ('anxiety_level') post-intervention was measured using a standard anxiety test.",Can undergoing cognitive behavioral therapy boost anxiety reduction?,9.501849941,observational,observational_data_37.csv,psychology,cognitive_behavioral_therapy,anxiety_level,"participant_age, stress_score, mindfulness_training, physical_exercise, family_history",,,,,,
Synthetic Dataset,"The dataset was collected by the Department of Social Services to evaluate the impact of a Universal Basic Income (UBI) program on the monthly income of recipients. The 'recipient_age' is the age of the UBI recipient. The 'previous_month_income' is the income of the recipient from the previous month before the UBI program was implemented. The 'household_size' indicates the number of individuals in the recipient's household. The 'unemployment_duration' is the number of months the recipient has been unemployed. The 'education_years' is the total years of schooling completed by the recipient. The 'rural_residence' is a binary variable indicating whether the recipient resides in a rural area (1) or not (0). The 'previous_beneficiary_status' is a binary variable indicating whether the recipient was a beneficiary of any social welfare program prior to the UBI program (1) or not (0). The 'ubi_receipt' is a binary variable indicating whether the recipient is a beneficiary of the UBI program (1) or not (0). Finally, 'current_month_income' is the recipient's income for the current month after the UBI program was implemented.",Will the introduction of a Universal Basic Income program increase income among recipients?,9.466051917,observational,observational_data_44.csv,policy,ubi_receipt,current_month_income,"recipient_age, previous_month_income, household_size, unemployment_duration, education_years, rural_residence, previous_beneficiary_status",,,,,,
Synthetic Dataset,"The Department of Education collected this dataset to study the impact of online learning platforms on students' end-of-year grades. The 'teacher_age' represents the average age of teachers in years. The 'average_study_hours' represents the average number of study hours per week per student. The 'low_income_status' is a binary variable, indicating whether the student comes from a low-income family, with 1 meaning 'Yes' and 0 meaning 'No'. The 'rural_school' is another binary variable, indicating whether the school is located in a rural area, with 1 meaning 'Yes' and 0 meaning 'No'. The 'online_learning' represents whether the student has access to an online learning platform, with 1 meaning 'Yes' and 0 meaning 'No'. The 'end_of_year_grades' represents the student's final grades at the end of the academic year.",Does introducing online learning platforms result in improved final grades?,4.351681816,observational,observational_data_58.csv,education,online_learning,end_of_year_grades,"teacher_age, average_study_hours, low_income_status, rural_school",,,,,,
Synthetic Dataset,"The Department of Justice collected this dataset as part of a nationwide study to examine the impact of alternative sentencing programs on recidivism rates among defendants. The 'defendant_age' is the age of the defendant at the time of trial, measured in years. The 'trial_length' represents the length of the trial in days. The 'legal_representation' is a binary variable indicating whether the defendant had legal representation (value of 1) or not (value of 0). The 'prior_convictions' is a binary variable indicating whether the defendant had prior convictions (value of 1) or not (value of 0). The 'violent_crime' is a binary variable indicating whether the defendant was on trial for a violent crime (value of 1) or not (value of 0). The 'alternative_sentencing' is a binary variable indicating whether the defendant was sentenced to an alternative program (value of 1) or not (value of 0). Lastly, 'recidivism_rate' is a measure of the defendant's likelihood of reoffending.",Can alternative sentencing programs improve defendant rehabilitation outcomes?,8.36667416,observational,observational_data_9.csv,law,alternative_sentencing,recidivism_rate,"defendant_age, trial_length, legal_representation, prior_convictions, violent_crime",,,,,,
Synthetic Dataset,"This dataset was collected from a real-world observation of patients in several hospitals across the country. The aim was to study the potential factors influencing heart attack occurrences and the impact of statin treatment on these occurrences. The 'age' represents the age of the patient. The 'blood_pressure' shows the systolic blood pressure of the patient. The 'cholesterol_level' represents the level of cholesterol in the patient's blood. The 'bmi' indicates the Body Mass Index of the patient. The 'smoking_status' is a binary variable where 1 indicates the patient is a smoker and 0 means the patient is a non-smoker. The 'diabetes_status' is a binary variable where 1 means the patient has diabetes and 0 means the patient doesn't. The 'family_history_of_heart_disease' is a binary variable where 1 indicates a positive family history and 0 means no family history of heart disease. The 'statin_treatment' is a binary variable where 1 means the patient has been prescribed statin treatment and 0 means they haven't. Finally, 'heart_attack_occurrence' represents the occurrence of heart attacks, where higher values indicate higher frequency or severity of heart attacks.",Does the prescription of cholesterol-lowering drugs (statins) help reduce the occurrence of heart attacks?,1.303792394,observational,observational_data_1.csv,healthcare,statin_treatment,heart_attack_occurrence,"age, blood_pressure, cholesterol_level, bmi, smoking_status, diabetes_status, family_history_heart_disease",,,,,,
Synthetic Dataset,This dataset was gathered by the Department of Housing and Urban Development to examine the relationships between socio-economic factors and home ownership in the United States. The 'age' represents the age of the respondents. The 'education_years' reflects the total years of formal education that respondents have completed. The 'annual_income' indicates the respondent's total annual income in US dollars. The 'gender' denotes whether the respondent is male (1) or female (0). The 'minority_status' indicates whether the respondent belongs to a racial or ethnic minority group (1) or not (0). The 'financial_aid' represents whether the respondent has received any form of financial aid (1) or not (0) for housing. The 'home_ownership' is the percentage of home ownership within the respondent's community.,Does receiving financial aid lower barriers to home ownership in communities?,9.934592328,observational,observational_data_16.csv,sociology,financial_aid,home_ownership,"age, education_years, annual_income, gender, minority_status",,,,,,
Synthetic Dataset,"The dataset was collected by the Department of Social Services to assess the impact of a new social welfare policy focused on displaced workers. The data was gathered from the application forms submitted by individuals for the new policy. The 'age_at_application' indicates the age of the applicant at the time of application. The 'prior_experience_years' represents the total years of work experience the applicant has before application, while 'previous_month_income' is the applicant's income in the month before the policy application. The 'dependents_count' is the number of individuals financially dependent on the applicant. The 'unemployed_status' is a binary variable indicating whether the applicant is currently unemployed (1) or not (0). The 'prior_beneficiary_status' refers to whether the applicant has benefited from any social welfare policy before (1) or not (0). The 'policy_enrollment_status' is a binary variable indicating whether the applicant enrolled in the new policy (1) or not (0). The 'post_policy_monthly_income' represents the applicant's income in the month following the policy application or its denial.",What effect does enrolling in the new social welfare policy have on displaced workers' monthly income?,3.068683162,observational,observational_data_11.csv,policy,policy_enrollment_status,post_policy_monthly_income,"age_at_application, prior_experience_years, previous_month_income, dependents_count, unemployed_status, prior_beneficiary_status",,,,,,
Synthetic Dataset,"The dataset was collected by the Department of Labor to study the effects of a recent minimum wage increase on the labor market. The 'years_of_experience' represents the number of years of work experience an individual has. The 'hours_per_week' denotes the average hours worked per week by the individual. The 'part_time' is a binary variable indicating whether the job is part-time (1) or full-time (0). The 'job_training' signifies whether the individual has undergone any job-specific training, with 1 for yes and 0 for no. The 'minimum_wage_increase' is a binary variable indicating whether the individual's wage was affected by the minimum wage increase, i.e., 1 for an increase and 0 for no change. Finally, 'monthly_income' indicates the individual's monthly income.",How does a rise in the minimum wage impact the monthly income of workers in the labor market?,3.38650213,observational,observational_data_10.csv,labor,minimum_wage_increase,monthly_income,"years_of_experience, hours_per_week, part_time, job_training",,,,,,
Synthetic Dataset,"This dataset has been compiled by the Department of Economics in collaboration with the Department of Rural Development to study the impact of a microcredit program on the monthly income of participants in rural areas. The age of participants ('age_of_participant') and their years of working experience ('years_of_experience') are collected. The gender ('gender') is a binary variable denoting whether the participant is male (1) or female (0). Whether the participant lives in a rural area ('rural_residence') is also a binary variable, with 1 indicating rural residence and 0 indicating urban residence. The 'previous_beneficiary' is binary and indicates whether the participant was a beneficiary of any other government program prior to this study (1 for yes, 0 for no). The 'microcredit_participation' is binary and shows whether the participant was part of the microcredit program (1 for yes, 0 for no). The 'monthly_income' represents the participant's monthly income.",Does participating in the microcredit program cause an increase in monthly income?,3.472725008,observational,observational_data_23.csv,economics,microcredit_participation,monthly_income,"age_of_participant, years_of_experience, gender, rural_residence, previous_beneficiary",,,,,,
Synthetic Dataset,"This dataset was collected by the National Political Studies Institute to explore the influence of political campaign exposure on voting behavior. The study aimed to understand the dynamics of voting behavior in a real-world observation scenario. The 'age' represents the age of the voting individual. The 'political_engagement_score' denotes the individual's level of political engagement on a scale of 1 to 100. The 'income' represents the individual's annual income in thousands of dollars. The 'gender' is a binary variable where 0 denotes 'male' and 1 denotes 'female'. The 'minority_status' is a binary variable, with 1 indicating that the individual belongs to a minority group and 0 indicating otherwise. The 'campaign_exposure' is a binary variable where 1 indicates that the individual was exposed to a political campaign and 0 indicates no exposure. The 'voting_behavior' is measured as the likelihood of the individual voting for a specific party, on a scale of 1 to 100.",Can exposure to a political campaign improve an individual's likelihood to vote for a particular political party?,7.703936159,observational,observational_data_47.csv,political_science,campaign_exposure,voting_behavior,"age, political_engagement_score, income, gender, minority_status",,,,,,
Synthetic Dataset,"This dataset was collected by the Department of Labor for a study on the economic effects of a job retraining program. The age of the individual is captured in the 'age' column. The 'years_of_experience' column denotes the number of years the individual has spent in their field of work, while 'previous_month_income' represents the income of the individual in the month prior to the study. The 'education_years' column marks the total years of formal education completed by the individual. Gender is a binary variable captured in the 'gender' column, with 1 indicating male and 0 indicating female. The 'rural_residence' column is a binary variable indicating whether the individual resides in a rural area (1 for yes, 0 for no). The 'job_retraining_program' column indicates whether the individual has participated in a job retraining program (1 for yes, 0 for no). The 'current_month_income' indicates the individual's income in the month after the job retraining program.",Does participating in a job retraining program lead to an increase in monthly income?,1.840863219,observational,observational_data_15.csv,economics,job_retraining_program,current_month_income,"age, years_of_experience, previous_month_income, education_years, gender, rural_residence",,,,,,
Synthetic Dataset,"The dataset was collected as part of a large-scale study conducted by the National Institute of Mental Health (NIMH) to understand the role of cognitive-behavioral therapy in anxiety management. The study collected data from a diverse population across the United States. The variables in the dataset represent key demographic information and psychological indicators. The 'age' is the age of the participant in years. The 'self_esteem_score' is a score derived from a standardized self-esteem assessment. The 'stress_level' is a score based on a standardized stress assessment test. The 'gender' (binary: 0 for male, 1 for female) indicates the gender of the participant. The 'history_of_depression' (binary: 0 for no, 1 for yes) indicates whether the participant has had a previous diagnosis of depression. The 'physical_activity' (binary: 0 for low, 1 for high) represents the level of regular physical activity of the participant. The 'cognitive_behavioral_therapy' (binary: 0 for no, 1 for yes) indicates whether the participant has undergone cognitive behavioral therapy. The 'anxiety_level' is a score based on a standardized anxiety assessment test.",Does undergoing cognitive-behavioral therapy result in decreased anxiety levels?,2.641362093,observational,observational_data_12.csv,psychology,cognitive_behavioral_therapy,anxiety_level,"age, self_esteem_score, stress_level, gender, history_of_depression, physical_activity",,,,,,
Synthetic Dataset,"The dataset was collected by the Department of Housing and Urban Development (HUD) to assess the impact of a new housing policy on citizens' quality of life. The 'age_years' refers to the age of the individual in years. The 'monthly_income_dollars' captures the individual's monthly income in dollars. The 'cost_of_living_index' denotes the cost of living index in the area where the individual resides. The 'home_ownership' is a binary variable indicating whether the individual owns a home (1) or not (0). The 'policy_beneficiary' is a binary variable that denotes whether the individual is a beneficiary of the new housing policy (1) or not (0). The 'housing_policy' is a binary variable, indicating whether the new policy was implemented in the individual's area (1) or not (0). The 'quality_of_life_index' represents the quality of life as perceived by the individual.",Can the introduction of the new housing policy help improve people's quality of life?,6.45718904,observational,observational_data_2.csv,policy,housing_policy,quality_of_life_index,"age_years, monthly_income_dollars, cost_of_living_index, home_ownership, policy_beneficiary",,,,,,
Synthetic Dataset,"This dataset was collected by the Department of Justice as part of a longitudinal study aiming to understand the factors influencing recidivism rates among defendants and the impact of alternative sentencing programs. The dataset includes information about the defendant's age ('defendant_age'), number of previous convictions ('previous_convictions'), and the complexity of their case ('case_complexity'). It also includes binary variables indicating whether the defendant was represented by a public defender ('public_defender'), whether a plea bargain was made ('plea_bargain'), and whether the charge was a felony ('felony_charge'). The 'alternative_sentencing' indicates whether the defendant was assigned to an alternative sentencing program instead of a traditional prison sentence. The 'recidivism_rate' measures the rate at which defendants committed another crime after their sentence.",Does assigning defendants to alternatives to traditional prison sentencing lower their likelihood of committing another crime?,4.18143841,observational,observational_data_13.csv,law,alternative_sentencing,recidivism_rate,"defendant_age, previous_convictions, case_complexity, public_defender, plea_bargain, felony_charge",,,,,,
Synthetic Dataset,"This dataset was collected by a team of researchers from the National Institute of Health for a study on the impact of statin treatment on the occurrence of cardiovascular events. The data was collected from several hospitals across the country and includes a variety of variables. The patient's age ('patient_age') is recorded. Body Mass Index ('bmi'), blood pressure ('blood_pressure'), and cholesterol level ('cholesterol_level') are also recorded. Smoking status ('smoking_status'), alcohol usage ('alcohol_usage'), and family history of cardiovascular diseases ('family_history') are binary variables coded as 0 for 'No' and 1 for 'Yes'. The use of statin treatment ('statin_treatment') is also a binary variable coded as 0 for 'No treatment' and 1 for 'Received treatment'. The risk of occurrence of a cardiovascular event is given by the variable 'cardiovascular_event'.",Will the use of statins boost heart attack risks?,9.02537665,observational,observational_data_26.csv,healthcare,statin_treatment,cardiovascular_event,"patient_age, bmi, blood_pressure, cholesterol_level, smoking_status, alcohol_usage, family_history",,,,,,
Synthetic Dataset,"The dataset was collected by the Department of Sociology from several communities across the nation to study the effect of a job training program on employment status, especially among disadvantaged groups. The 'age' represents the age of the individuals, ranging from their early 20s to late 50s. The 'education_years' represents the number of years of formal education completed by the individuals. The 'income' represents the individual's annual income in US dollars. The 'socioeconomic_index' represents the socioeconomic status of the individual based on factors like occupation, income, and education. The 'number_of_dependents' represents the number of individuals, such as children or elderly parents, that are financially dependent on the individual. The 'gender' is a binary variable (1 for male, 0 for female). The 'minority_status' is a binary variable indicating whether the individual is from a minority group (1 for yes, 0 for no). The 'unemployed' is a binary variable indicating the individual's employment status at the beginning of the study (1 for unemployed, 0 for employed). The 'participation_in_job_training' indicates whether the individual participated in the job training program (1 for yes, 0 for no). The 'employment_status_after_year' is a score that measures the chances of getting a job, with higher values meaning higher likelihood of being employed.",Does participating in a job training program cause increased chances of finding employment?,4.381615188,observational,observational_data_24.csv,sociology,participation_in_job_training,employment_status_after_year,"age, education_years, income, socioeconomic_index, number_of_dependents, gender, minority_status, unemployed",,,,,,
Synthetic Dataset,"This dataset was collected by a team of psychologists from various clinics across the country, who were studying the impact of mindfulness-based stress reduction (MBSR) on anxiety levels. The participant's age ('participant_age') represents the age of the individuals at the time of the study. The stress score ('stress_score') indicates the stress level of the participants before the intervention. Psychotherapy history ('psychotherapy_history') is a binary variable indicating whether the participant had undergone psychotherapy before (1) or not (0). Meditation practice ('meditation_practice') is a binary variable representing whether the participant had any prior experience with meditation (1) or not (0). Mindfulness intervention ('mindfulness_intervention') is a binary variable, indicating whether the participant underwent the MBSR intervention (1) or received standard care (0). Finally, the post-treatment anxiety score ('post_treatment_anxiety_score') represents the participant's level of anxiety after the intervention or standard care.",Can practicing mindfulness-based stress reduction techniques improve anxiety reduction?,8.999909795,observational,observational_data_45.csv,psychology,mindfulness_intervention,post_treatment_anxiety_score,"participant_age, stress_score, psychotherapy_history, meditation_practice",,,,,,
Synthetic Dataset,"This dataset has been gathered by a team of psychologists conducting an observational study in a large urban hospital. The research aims to understand the impact of mindfulness-based therapy on anxiety levels, taking into account the patient's age, stress levels, participation in other forms of therapy, and family history of mental health issues. The 'age_years' represents the age of the patient in years. The 'stress_level' reflects the perceived stress level of the individual on a scale of 2 to 39. The 'therapy_participation' is a binary variable that denotes whether the patient participates in other forms of therapy (1) or not (0). The 'family_history' is a binary variable indicating whether there is a known familial history of mental health issues (1) or not (0). The 'mindfulness_treatment' is a binary variable indicating whether the patient is receiving mindfulness-based therapy (1) or not (0). Finally, 'anxiety_level' measures the patient's anxiety level.",Does practicing mindfulness-based therapy help reduce anxiety levels?,3.9191236,observational,observational_data_20.csv,psychology,mindfulness_treatment,anxiety_level,"age_years, stress_level, therapy_participation, family_history",,,,,,
Synthetic Dataset,"This dataset was collected by the Department of Economics to investigate the impact of microloans on the income of recipients. The data is observational, having been compiled from a local microfinance institution's records. The dataset contains the following variables: 'age', which represents the age of the loan recipients; 'education_years' indicating the number of years of formal education completed by the individual; 'previous_month_income' representing the individual's income in the month prior to receiving the loan; 'job_experience_years' indicating the number of years the individual has been in their current job; 'gender' (binary), with 1 indicating male and 0 indicating female; 'minority_status' (binary), with 1 indicating that the individual belongs to a minority group and 0 indicating otherwise; 'rural_residence' (binary), with 1 indicating that the individual resides in a rural area and 0 indicating urban residence; 'loan_received' (binary), with 1 indicating that the individual has received a microloan and 0 indicating otherwise; and 'current_month_income' representing the individual's income in the month after receiving the loan.",Do small loans help boost the income of participating individuals?,1.543859694,observational,observational_data_31.csv,economics,loan_received,current_month_income,"age, education_years, previous_month_income, job_experience_years, gender, minority_status, rural_residence",,,,,,
Synthetic Dataset,"This dataset was collected by a national political research institute to examine the impact of a political campaign on voter turnout during a national election. It includes variables such as the age of the voter, the income level of the voter, the gender of the voter (with 1 referring to male and 0 to female), and the minority status of the voter (with 1 indicating that the voter belongs to a minority group and 0 indicating that they do not). The voter turnout represents the percentage of eligible voters who participated in the election. The campaign exposure indicates whether the voter was exposed to the campaign (1) or not (0). The political engagement score is a measure of the voter's interest and involvement in politics. Voters scoring 4 or higher on political engagement received campaign materials.",Did exposure to the political campaign lead to increased voter participation in the national election?,6.180614408,rdd,rdd_data_55.csv,political_science,campaign_exposure,voter_turnout,"age, income, gender, minority_status",political_engagement_score,,,,,political_science
Synthetic Dataset,"The dataset was collected by a regional health department to evaluate the effect of a new rehabilitation program on the recovery time of patients with cardiovascular diseases. The 'patient_age' represents the age of the patients, expressed in years. The 'health_index' is a scale from 1 to 11 indicating the overall health status of the patient, with higher values representing better health. The 'smoking_status' indicates whether the patient is a current smoker (1) or not (0). The 'obesity_status' indicates whether the patient is clinically obese (1) or not (0). The 'recovery_time' indicates the number of days it took for a patient to recover. The 'treatment_received' indicates whether the patient received the new rehabilitation program (1) or the standard care (0). Program eligibility was determined by income level (variable income_level); patients with an income level above 12 qualified for the enhanced rehabilitation program.",How does the new rehabilitation program affect cardiovascular disease recovery time?,9.292910523,rdd,rdd_data_1.csv,healthcare,treatment_received,recovery_time,"patient_age, health_index, smoking_status, obesity_status",income_level,,,,,healthcare
Synthetic Dataset,"This dataset was collected by a national banking institution studying the impact of a new policy that grants loans to individuals meeting credit requirements. The aim is to investigate whether this new policy is effective in improving the loan repayment rate, particularly among economically underprivileged and minority groups. The variables in the dataset include 'age_years' which represents the age of the individual, 'income_level' indicating the annual income of the individual, 'gender' denoting the gender of the individual (0-Female, 1-Male), 'minority_status' signifying whether the individual belongs to a minority group (0-No, 1-Yes), 'credit_score' indicating the credit score, 'loan_grant' indicating whether the individual was granted a loan (0-No, 1-Yes) based on the credit score, and 'loan_repayment_rate' representing the rate at which the loan was repaid by the individual. The new policy provides loans to individuals with credit scores of at least 6.",Can the new loan policy improve repayment rates for people with certain credit scores?,6.293487555,rdd,rdd_data_23.csv,economics,loan_grant,loan_repayment_rate,"age_years, income_level, gender, minority_status",credit_score,,,,,economics
Synthetic Dataset,This dataset was collected by The National Legal Aid & Defender Association to study the impact of court-appointed legal representation on sentencing outcomes in criminal court cases. The 'defendant_age' represents the age of the defendant at the time of the trial. The 'prior_convictions' represents the number of prior convictions a defendant has. The 'defendant_gender' indicates whether the defendant is male (1) or female (0). The 'minority_status' indicates whether the defendant is from a minority group (1) or not (0). The 'public_defender_assigned' indicates whether a public defender was assigned (1) or not (0). The 'sentence_length' represents the length of the sentence in months. The 'offense_severity_score' is a score assigned to the severity of the offense. Defendants with offense severity reaching 24 or above typically receive court-appointed representation.,What effect did court-appointed legal representation have on sentence length in criminal cases?,8.676534731,rdd,rdd_data_9.csv,law,public_defender_assigned,sentence_length,"defendant_age, prior_convictions, defendant_gender, minority_status",offense_severity_score,,,,,law
Synthetic Dataset,The dataset was collected by a psychological research institute to study the impact of a yoga intervention program on stress levels. The participant's age ('participant_age') is recorded in years. Pre-intervention stress levels ('stress_score_pre') were measured using a standardized stress test. Gender ('gender') is coded with '1' representing males and '0' representing females. Smoking status ('smoking_status') is coded with '1' indicating that the participant is a smoker and '0' indicating a non-smoker. Yoga participation ('yoga_participation') is coded with '1' indicating participation in the yoga program and '0' indicating no participation. The program enrolled participants who scored 10 or higher on flexibility assessments ('flexibility_score'). Post-intervention stress levels ('stress_score_post') are also measured using the same standardized stress test.,Can yoga programs boost stress reduction among adults?,7.951805044,rdd,rdd_data_37.csv,psychology,yoga_participation,stress_score_post,"participant_age, stress_score_pre, gender, smoking_status",flexibility_score,,,,,psychology
Synthetic Dataset,"The dataset was collected by a regional sociology research institute to study the impact of a community development program on neighborhood satisfaction. The variables include 'age', 'income_level', 'gender', 'minority_status', 'neighborhood_score', 'community_program_participation', and 'neighborhood_satisfaction'. The 'age' represents the age of the survey respondent. The 'income_level' denotes the annual income (in thousands) of the survey respondent. The 'gender' indicates whether the respondent is male (1) or not (0). The 'minority_status' shows whether the respondent identifies as part of a minority group (1) or not (0). The 'neighborhood_score' is a measure calculated based on factors like crime rate, school quality, and access to amenities. The 'community_program_participation' indicates whether the respondent has participated in the community development program (1) or not (0). Residents in neighborhoods scoring 15 or better were eligible for the community program. The 'neighborhood_satisfaction' is a measure of the respondent's satisfaction with their neighborhood on a scale from 1 to 20, with higher scores indicating higher satisfaction.",Will community development programs enhance neighborhood satisfaction?,3.709216763,rdd,rdd_data_32.csv,sociology,community_program_participation,neighborhood_satisfaction,"age, income_level, gender, minority_status",neighborhood_score,,,,,sociology
Synthetic Dataset,"This dataset was collected by the Department of Agriculture to evaluate the impact of a new food stamp policy on the food expenditure of families, focusing on economically disadvantaged and minority groups. The 'age' represents the age of the head of the household. The 'income_level' denotes the annual income of the household. The 'gender' indicates whether the household is female-headed (1) or male-headed (0). The 'minority_status' indicates whether the household is a minority household (1) or not (0). The 'food_stamp_program_enrollment' indicates whether households are enrolled in the food stamp program post-policy change (1) or not enrolled (0). The 'food_expenditure' represents the total monthly spending on food by the household. The 'family_size' represents the number of individuals in a household. The new policy prioritizes enrollment for families with 20 or more members.",Can the new food stamp policy lower food spending in families?,2.494622119,rdd,rdd_data_40.csv,sociology,food_stamp_program_enrollment,food_expenditure,"age, income_level, gender, minority_status",family_size,,,,,sociology
Synthetic Dataset,"The dataset was collected by the state Department of Education to evaluate the impact of a new weekend coaching program aimed at improving the academic performance of high school students, particularly those from low-income families. Students achieving a GPA below 13 were given priority access to the coaching program. The 'student_age' represents the age of the student in years. The 'parent_education_level' denotes the highest level of education attained by the student's parents, measured on a scale of 1 to 10, with 1 being no formal education and 10 being a doctoral degree. The 'low_income_status' indicates whether the student comes from a low-income family (1) or not (0). The 'gender' denotes the student's gender, with male represented as 1 and female as 0. The 'pre_program_gpa' indicates the student's Grade Point Average (GPA) before participating in the weekend coaching program. The 'program_participation' indicates whether the student participated in the weekend coaching program (1) or not (0). The 'post_program_gpa' represents the student's GPA after the coaching program.",Does participating in the weekend coaching program improve the academic performance of high school students?,1.283495731,rdd,rdd_data_5.csv,education,program_participation,post_program_gpa,"student_age, parent_education_level, low_income_status, gender",pre_program_gpa,,,,,education
Synthetic Dataset,"This dataset was collected as part of a study by a labor union to understand the impact of on-the-job training on the promotion rate among union members, taking into account variables such as their age, income level, gender, and minority status. The 'worker_age' represents the age of the workers in years. The 'income_level' represents the annual income of the workers in thousands of dollars. The 'gender' indicates whether the worker is female (1) or male (0). The 'minority_status' indicates whether the worker is from a minority group (1) or not (0). The 'on_the_job_training' indicates whether or not the worker has received on-the-job training, where 1 means the worker has received the training and 0 means they have not. The 'promotion_rate' represents the rate at which workers get promoted within the organization. The 'job_satisfaction_score' is a score out of 30, based on a survey conducted among the workers. Workers scoring 22 or above on job satisfaction were eligible for training opportunities.",How does on-the-job training affect promotion rates among union members?,6.113452441,rdd,rdd_data_19.csv,labor,on_the_job_training,promotion_rate,"worker_age, income_level, gender, minority_status",job_satisfaction_score,,,,,labor
Synthetic Dataset,"The dataset was collected by a sociological research institution to evaluate the effects of a social assistance program on social mobility, particularly for minority and low-income individuals. The 'age' represents the age of the individuals. The 'income_level' represents participants' annual income in thousands of dollars. The 'gender' indicates whether participants are female (1) or male (0). The 'minority_status' indicates whether individuals belong to a minority group (1) or not (0). The 'social_assistance_program' indicates whether the individual is enrolled in the social assistance program (1) or not (0). The 'social_mobility_index' reflects the individuals' social mobility, measured by changes in income, education, and occupation over time. 'education_level' denotes the years of education, and individuals with 14 or more years of education qualified for the assistance program.",Can social assistance programs improve social mobility for individuals?,1.513239957,rdd,rdd_data_16.csv,sociology,social_assistance_program,social_mobility_index,"age, income_level, gender, minority_status",education_level,,,,,sociology
Synthetic Dataset,"This dataset was collected by a team of psychologists from a research university, studying the impact of a new therapeutic intervention on depression symptoms. The researchers aimed to understand if the new treatment is effective, and if so, how demographic factors and cognitive abilities may influence treatment outcomes. The 'age' represents the age of the participants, measured in years. The 'cognitive_test_score' represents the score of the participants on a cognitive ability test. The 'gender' indicates whether the participant is male (1) or female (0). The 'income_level' represents whether the participant's income is above or below the median income; 1 indicates above the median, and 0 indicates below. The 'treatment_effectiveness' measures the reduction in depression symptoms after treatment. The 'received_treatment' indicates whether the participant received the new treatment (1) or not (0). The variable mental_health_score denotes scores on a mental health test, and participants scoring above 2 on mental health assessments were offered the new treatment.",Will the new psychological treatment enhance depression symptom reduction?,5.739091073,rdd,rdd_data_12.csv,psychology,received_treatment,treatment_effectiveness,"age, cognitive_test_score, gender, income_level",mental_health_score,,,,,psychology
Synthetic Dataset,"The dataset was collected by the Department of Labor to evaluate the effects of a new on-the-job training program on the monthly income of employees, with a specific focus on the role of job satisfaction in determining treatment access. The variables include 'employee_age' representing the age of the employee in years, 'weekly_work_hours' indicating the number of hours the employee works in a week, 'gender' with '1' denoting male and '0' denoting female, 'smoking_status' with '1' indicating the employee is a smoker and '0' indicating non-smoker. The 'training_program_participation' indicates whether an employee participated in the training program, with '1' denoting participation and '0' non-participation. The 'monthly_income' represents the monthly income of the employee in US dollars. The 'job_satisfaction_score' denotes the employee's job satisfaction score. Employees with job satisfaction scores of 4 or higher were invited to participate in the training program.",Can on-the-job training programs increase the income of employees?,3.953116055,rdd,rdd_data_6.csv,labor,training_program_participation,monthly_income,"employee_age, weekly_work_hours, gender, smoking_status",job_satisfaction_score,,,,,labor
Synthetic Dataset,"This dataset was collected by a national crime prevention agency to evaluate the impact of a new crime prevention policy. The data includes 'age' (denoting the age of individuals), 'income_level' (specifying the income level of individuals), 'gender' (where 1 stands for male and 0 for female), and 'minority_status' (with 1 indicating a minority individual and 0 otherwise). It also includes 'policy_enrollment' (1 denotes enrollment in the policy, 0 denotes non-enrollment), 'crime_rate' (indicating the crime rate in the community where the individual resides), and 'crime_severity_score' (used to determine enrollment in the policy). The dataset investigates the impact of the new crime prevention policy on the crime rate, considering factors like age, income level, gender, and minority status. Communities with crime severity scores above 13 were enrolled in the prevention policy.",Does the new crime prevention policy reduce community crime rates?,9.214864494,rdd,rdd_data_7.csv,policy,policy_enrollment,crime_rate,"age, income_level, gender, minority_status",crime_severity_score,,,,,policy
Synthetic Dataset,"The dataset was collected by a mental health research institution that aimed to analyze the effect of a cognitive-behavioral therapy (CBT) intervention on the anxiety levels of individuals. The study considered several factors, including participant age ('participant_age'), education years ('education_years'), gender ('gender', 1 for female and 0 for male), and smoking status ('smoking_status', 1 for smoker and 0 for non-smoker). The 'cognitive_behavioural_therapy' (1 for received CBT and 0 for did not receive CBT) indicates whether the participant underwent the therapy. The 'post_treatment_anxiety_score' represents the anxiety level after the treatment. Individuals with pre-treatment anxiety scores (pre_treatment_anxiety_score) of 22 or higher were offered CBT intervention.",How does cognitive-behavioral therapy affect anxiety levels in individuals?,4.988833901,rdd,rdd_data_29.csv,psychology,cognitive_behavioural_therapy,post_treatment_anxiety_score,"participant_age, education_years, gender, smoking_status",pre_treatment_anxiety_score,,,,,psychology
Synthetic Dataset,"This dataset was collected by a leading healthcare research institution to study the impact of a new medication on recovery time for patients suffering from a chronic lung disease. The dataset includes patient age, income level, gender (where 0 stands for male and 1 for female), smoker status (where 0 stands for non-smoker and 1 for smoker), treatment access (where 0 means no access to the new medication and 1 means access to the new medication), recovery time (representing the number of days a patient takes to recover), and severity score (representing the severity of the lung disease). Patients with disease severity scores of 24 or above were considered eligible for the new medication.",Can access to new medication improve recovery time for patients with severe lung disease?,4.85176817,rdd,rdd_data_26.csv,healthcare,treatment_access,recovery_time,"patient_age, income_level, gender, smoker_status",severity_score,,,,,healthcare
Synthetic Dataset,"The Department of Health collected this dataset to evaluate the impact of a new public health policy aiming to increase health outcomes in low-income communities. The variables in the dataset are: 'age', representing the number of years since the individual's birth; income_level detailing the annual income of the individual in thousands of dollars; gender, where 1 represents male and 0 represents female; minority, where 1 indicates the individual belongs to a minority group and 0 indicates they do not. The health_status variable represents the individual's self-reported health status on a scale from 0 (poor health) to 20 (excellent health). The health_policy is a binary variable, where 1 indicates that the individual participated in the new health policy and 0 indicates they did not. Eligibility was determined using the 'health_scores' variable, which measures the scores on an assessment test. Individuals with scores above 20 were eligible to participate.",Will the new public health policy enhance health outcomes?,1.962260886,rdd,rdd_data_2.csv,policy,health_policy_participation,health_status,"age, income_level, gender, minority_status",health_score,,,,,policy
Synthetic Dataset,"This dataset was collected by a national housing agency to evaluate the impact of a housing subsidy program on the rate of home ownership, particularly among low-income and minority groups. The 'individual_age' represents the age of the individual in years. The 'individual_income' represents the annual income of the individual in thousands. The 'gender' indicates whether the individual is female (1) or male (0). The 'minority_status' indicates whether the individual belongs to a minority group (1) or not (0). The 'housing_subsidy' indicates whether or not the individual received the housing subsidy, with 1 indicating that they did, and 0 indicating that they did not. The 'home_ownership_rate' measures the rate of homeownership in the community. Individuals with credit scores of 25 or above, as measured by the variable 'credit_score', received housing subsidies.",Can housing subsidies lower barriers to home ownership?,6.24700446,rdd,rdd_data_24.csv,sociology,housing_subsidy,home_ownership_rate,"individual_age, individual_income, gender, minority_status",credit_score,,,,,sociology
Synthetic Dataset,"This dataset was collected by a non-profit policy think tank to evaluate the effect of a new political policy on citizens' satisfaction, especially among low-income groups. 'age' represents the age of the individuals surveyed. The variable 'income_level' represents the individuals' income level in thousands of dollars, and 'gender' is coded as 1 for male and 0 for female. The variable 'minority_status' indicates whether the individual belongs to a minority group (1) or not (0). The 'political_engagement_score' is used for policy exposure determination. The 'policy_exposure' variable indicates whether there was exposure to the policy (1) or no exposure (0). The 'post_policy_satisfaction' measures the level of satisfaction with the political system after the policy's implementation. Citizens with political engagement scores of 16 or higher received information about the new policy.",Does exposure to the new political policy increase citizens' satisfaction with the political system?,9.911627356,rdd,rdd_data_22.csv,political_science,policy_exposure,post_policy_satisfaction,"age, income_level, gender, minority_status",political_engagement_score,,,,,political_science
Synthetic Dataset,"This dataset was collected by a non-profit research institution studying the effects of political campaign advertising on post-election satisfaction among voters, particularly focusing on age, education, gender, and minority status. The variables in the dataset include 'age', which represents the age of the respondent in years, 'education_years' representing the total years of formal education completed, 'gender' with 0 indicating male and 1 indicating female, and 'minority_status', where 1 indicates that the respondent belongs to a minority group and 0 otherwise. The 'political_ad_exposure' indicates whether the respondent was exposed to a specific political ad campaign (1) or not (0). The 'post_election_satisfaction' measures the level of satisfaction with election results. The variable 'political_engagement_scores' measures how active an individual is politically. Voters with engagement scores above 25 were targeted for the advertising campaign.",How does political advertisement exposure affect voters' satisfaction with election results?,7.789183287,rdd,rdd_data_14.csv,political_science,political_ad_exposure,post_election_satisfaction,"age, education_years, gender, minority_status",political_engagement_score,,,,,political_science
Synthetic Dataset,"This dataset was collected by the Department of Labor to evaluate the impact of a new employment policy aimed at helping low-income adults gain employment. The policy is designed to provide vocational training and job placement assistance to individuals whose income falls below a certain threshold. The 'income' represents the annual income of the individuals in dollars. The 'age' indicates the age of the individuals. The 'gender' indicates whether the individual is male (1) or female (0). The 'region' indicates whether the location is urban (1) or rural (0). The 'policy_participation' signifies whether the individual participated in the policy program (1) or not (0). The 'employment_rate' indicates the rate of employment among the individuals. Individuals with eligibility scores above 4, as given by the variable ""earning_threshold"", are eligible for the employment assistance program.",What effect does the new employment policy have on job finding effectiveness?,2.164321262,rdd,rdd_data_11.csv,policy,policy_participation,employment_rate,"income, age, gender, region",earning_threshold,,,,,policy
Synthetic Dataset,"The dataset was collected by a mental health facility investigating the impact of Cognitive Behavioral Therapy (CBT) on stress levels in patients diagnosed with anxiety or depression. The dataset includes variables such as 'participant_age' and 'counselling_sessions', which indicates the number of counselling sessions attended by the participant prior to the study. The dataset also includes binary variables such as 'anxiety_diagnosis' and 'depression_diagnosis', where 1 represents a positive diagnosis, and 'antidepressant_use', where 1 indicates usage. The 'cbt_intervention' is another binary variable where 1 indicates that the participant received CBT. The dataset also tracks stress scores at the middle ('mid_treatment_stress_score') and end of the treatment ('post_treatment_stress_score'), with higher scores indicating higher stress levels.",Can attending a series of cognitive-behavioral therapy sessions help reduce stress levels?,9.411756331,frontdoor,frontdoor_data_8.csv,psychology,cbt_intervention,post_treatment_stress_score,"participant_age, counselling_sessions, anxiety_diagnosis, depression_diagnosis, antidepressant_use",,,,mid_treatment_stress_score,,
Synthetic Dataset,"The Department of Labor collected this dataset to investigate the effects of a skills training program on workers' annual income. The data was collected from a policy rollout, where the skills training program was offered to a select group of workers. The variables in this dataset are 'age' (age of the worker in years), 'years_in_current_job' (the number of years the worker has been in their current job), 'gender' (binary: male or female), 'rural_residence' (binary: whether the worker lives in a rural area, 1=yes, 0=no), 'skills_training_participation' (binary: whether the worker participated in the skills training program, 1=yes, 0=no), 'new_skills_acquired' (a score indicating the level of new skills acquired by the worker), and 'annual_income' (the worker's annual income in USD).",Does the skills training program improve workers' annual earnings?,9.733818107,frontdoor,frontdoor_data_51.csv,labor,skills_training_participation,annual_income,"age, years_in_current_job, gender, rural_residence",,,,new_skills_acquired,,
Synthetic Dataset,"This dataset was collected by a large urban school district in the United States to study the impact of an online learning program on student performance. The district sought to understand the influence of various factors on student achievement, including student age ('student_age'), the number of hours a student spends studying each week ('hours_study_per_week'), whether the student comes from a low-income family ('low_income_status' where 1 indicates a low-income family and 0 otherwise), whether the student receives private tutoring ('private_tutoring' where 1 indicates private tutoring and 0 otherwise), and the education level of the student's parents ('parental_education'). The variable 'enrollment_in_elearning' represents whether the student is enrolled in an online learning program (where 1 indicates enrollment and 0 otherwise). The dataset includes the student's intermediate test score ('intermediate_test_score') and the student's final exam score ('final_exam_score').",Can enrolling in an online learning program influence a student's final exam performance?,3.450114729,frontdoor,frontdoor_data_58.csv,education,enrollment_in_elearning,final_exam_score,"student_age, hours_study_per_week, low_income_status, private_tutoring, parental_education",,,,intermediate_test_score,,
Synthetic Dataset,"This dataset was collected by the Department of Justice to study the impact of sentencing length on recidivism within five years. The 'defendant_age' represents the age of the defendant at the time of sentencing. The 'prior_convictions' indicates the number of prior convictions the defendant has. The 'charge_severity' measures the severity of the charges on a scale from 1 to 21. The 'legal_representation' is a binary variable indicating whether the defendant had legal representation (1) or not (0). The 'bail_status' is a binary variable indicating whether the defendant was out on bail (1) or not (0) during the trial. The 'diversion_program' is a binary variable indicating whether the defendant was enrolled in a diversion program (1) or not (0). The 'sentencing_length' represents the length of the sentence in months. The 'time_in_custody' represents the actual time the defendant spent in custody. Finally, 'recidivism' provides a measure of how likely an individual is to reoffend.",How does the length of a prison sentence impact the likelihood of a defendant reoffending?,8.8209702,frontdoor,frontdoor_data_9.csv,law,sentencing_length,recidivism,"defendant_age, prior_convictions, charge_severity, legal_representation, bail_status, diversion_program",,,,time_in_custody,,
Synthetic Dataset,"This dataset was collected by the Department of Social Welfare in a South Asian country, aiming to assess the impact of a government cash transfer program on household wellbeing. The dataset includes variables like 'age_of_head', which represents the age of the household head; 'income', the total annual income of the household; 'rural_residence', a binary variable indicating whether the household is located in a rural area (1) or not (0); 'previous_beneficiary_status', a binary variable indicating whether the household has previously received any social benefits (1) or not (0); and 'underemployment_status', a binary variable representing whether the household head is underemployed (1) or not (0). The 'cash_transfer_enrollment' is a binary variable indicating whether the household is enrolled in the cash transfer program (1) or not (0). The 'consumption' represents the amount of money spent by the household on goods and services, and 'household_wellbeing' is a measure of the overall wellbeing of the household, including factors like health, education, and living standards.",Will receiving government cash assistance result in improved quality of life?,3.801411928,frontdoor,frontdoor_data_48.csv,economics,cash_transfer_enrollment,household_wellbeing,"age_of_head, income, rural_residence, previous_beneficiary_status, underemployment_status",,,,consumption,,
Synthetic Dataset,"The dataset was collected by a leading political research institute to study the impact of political campaign exposure on the voting rates of citizens. The age of the citizens is represented by the variable 'age', with years being the unit of measurement. Their annual income is depicted by 'income', measured in thousands of dollars. The 'gender' is a binary variable, with 0 denoting male and 1 female. The 'rural_residence' is also binary, with 0 indicating urban and 1 rural residence. The 'party_affiliation' is a binary variable where 0 indicates affiliation with Party A and 1 with Party B. The 'policy_exposure' denotes whether a citizen was exposed to the political campaign, with 1 indicating exposure and 0 non-exposure. The 'political_engagement_index' is a measure of the citizen's engagement in political activities, and 'voting_rate' represents the percentage of eligible voters who cast a vote.",Does exposure to a political campaign serve as a key to increasing the likelihood of citizens casting their vote?,2.656985956,frontdoor,frontdoor_data_22.csv,political_science,policy_exposure,voting_rate,"age, income, gender, rural_residence, party_affiliation",,,,political_engagement_index,,
Synthetic Dataset,"This healthcare dataset was collected by a major hospital in the United States to study the impact of beta-blocker treatment on heart attack outcomes among high-risk patients. The dataset includes the following variables: 'patient_age' (representing the age of the patient), 'hospital_visits_per_year' (indicating the number of hospital visits made by the patient in a year), 'smoking_status' (binary variable where 1 indicates a current smoker and 0 indicates a non-smoker), 'heart_disease_status' (binary variable where 1 indicates diagnosis of heart disease and 0 indicates no heart disease), 'statin_prescription_status' (binary variable where 1 indicates that the patient has been prescribed statins and 0 indicates no statin prescription), 'heart_attack_outcome' (representing the severity of the heart attack outcome on a scale from 0 to 100, with higher values indicating more severe outcomes), 'beta_blocker_treatment_status' (binary where 1 indicates that the patient has been treated with beta-blockers and 0 indicates no beta-blocker treatment), and 'blood_pressure_level' (representing the patient's average blood pressure level).",Can taking beta-blockers lower the severity of heart attacks in high-risk patients?,4.901233456,frontdoor,frontdoor_data_59.csv,healthcare,beta_blocker_treatment_status,heart_attack_outcome,"patient_age, hospital_visits_per_year, smoking_status, heart_disease_status, statin_prescription_status",,,,blood_pressure_level,,
Synthetic Dataset,"The dataset is a product of a nationwide survey carried out by the Department of Psychology from a renowned university. The intent was to study the impact of social media usage on the mental health of individuals. The survey was distributed to a random selection of individuals aged 18 and above across the country. The participant's age is represented by 'participant_age'. The 'hours_of_screen_time' shows the average number of hours the participant spends daily on screens for non-work-related activities. The 'number_of_friends' indicates the number of friends the participant has in their social circle. The 'smoking_status' and 'physical_activity_status' are binary variables, with 'smoking_status' indicating whether the participant is a smoker (1) or not (0), and 'physical_activity_status' indicating whether the participant engages in regular physical activity (1) or not (0). The 'social_media_usage' indicates the average number of hours the participant spends daily on social media platforms. The 'self_esteem_score' represents the participant's self-esteem level. Lastly, 'mental_health_score' represents the participant's mental health status.",Will spending time on social media platforms enhance people's mental health?,3.156387742,frontdoor,frontdoor_data_37.csv,psychology,social_media_usage,mental_health_score,"participant_age, hours_of_screen_time, number_of_friends, smoking_status, physical_activity_status",,,,self_esteem_score,,
Synthetic Dataset,"This dataset was collected by the Bureau of Labor Statistics to examine the impact of job training programs on annual income, considering factors such as age, years of education, gender, and union membership. The 'age' represents the individual's age in years. The 'education_years' denotes the total years of formal education completed by the individual. The 'gender' is a binary variable, where 1 indicates female and 0 indicates male. The 'union_membership' is also a binary variable, with 1 denoting union membership and 0 indicating no union membership. The 'job_training' indicates whether the individual has undergone a job training program. The 'job_satisfaction' represents the level of job satisfaction on a scale of 0 to 100. The 'annual_income' represents the individual's annual income in dollars.",Does participating in a job training program lead to a higher annual income?,1.679793827,frontdoor,frontdoor_data_10.csv,labor,job_training,annual_income,"age, education_years, gender, union_membership",,,,job_satisfaction,,
Synthetic Dataset,"This dataset was collected by the Department of Sociology at a major university, investigating the impact of participation in a government-funded social program on individuals' life satisfaction. The age of the participants ('age') and their household income ('household_income') were considered. Participants were also categorized by gender (binary variable 'gender'; 0 for male and 1 for female) and by whether they live in an urban or rural area (binary variable 'urban_residence'; 0 for rural and 1 for urban). The main intervention studied was whether participants were part of a specific government-funded social program (binary variable 'participation_in_social_program'; 0 for no participation and 1 for participation). The effect of this program was studied in relation to the size of individuals' social networks ('social_network_size') and their overall life satisfaction score ('life_satisfaction_score') after a year of participation in the program.",Can enrolling in government-funded social programs improve individuals' life satisfaction?,1.861546718,frontdoor,frontdoor_data_49.csv,sociology,participation_in_social_program,life_satisfaction_score,"age, household_income, gender, urban_residence",,,,social_network_size,,
Synthetic Dataset,"The dataset was collected by a consortium of school districts aiming to understand the impact of using Teaching Assistants (TAs) in classrooms on the overall performance of students. The consortium was interested in how the use of TAs interacts with various teacher and school characteristics. The dataset includes the following variables: 'teacher_age', 'teacher_experience_years', 'average_class_size', 'teacher_gender' (binary, where 1 represents female and 0 represents male), whether the school is located in a rural area ('rural_location', binary, where 1 represents rural and 0 represents urban), use of TAs in classroom instruction ('use_of_teaching_assistants', binary, where 1 represents use of TAs and 0 represents no use of TAs), 'student_engagement_score' (a measure of student involvement in classroom activities), and the 'student_performance_score' (reflects the average grade of students in the class).",Does the introduction of teaching assistants in classrooms improve the performance of students in schools?,3.720566682,frontdoor,frontdoor_data_33.csv,education,use_of_teaching_assistants,student_performance_score,"teacher_age, teacher_experience_years, average_class_size, teacher_gender, rural_location",,,,student_engagement_score,,
Synthetic Dataset,"This political science dataset was collected by a national election commission seeking to understand the impact of exposure to political campaigns on voter turnout. The data was collected from real-world observation during a recent national election period. The variable 'age' represents the age of the individual in years. The 'income_level' indicates the individual's annual income in dollars. The 'gender' is a binary variable where 0 represents 'male' and 1 represents 'female'. The 'rural_residence' is a binary variable where 0 indicates 'urban' and 1 indicates 'rural'. The 'campaign_exposure' is a binary variable where 0 means that the individual was not exposed to the political campaign and 1 means that they were. The 'political_engagement' represents the level of political engagement of the individual on a scale from 0 to 100. Finally, 'voting_turnout' indicates the likelihood of the individual voting, also on a scale from 0 to 100.",Will being exposed to a political campaign result in people being more likely to vote in the elections?,5.043913737,frontdoor,frontdoor_data_47.csv,political_science,campaign_exposure,voting_turnout,"age, income_level, gender, rural_residence",,,,political_engagement,,
Synthetic Dataset,"The Department of Sociology at a major university collected this dataset to study the impact of a job training program on the income level and job satisfaction of participants. The dataset was gathered from real-world observation of individuals who voluntarily participated in the job training program. The 'individual_age' records the age of the participant in years, while 'income_level' denotes the annual income in thousands of dollars before enrolling in the program. The 'gender' is a binary variable indicating the gender of the participant where 1 is male and 0 is female. The 'rural_residence' is also a binary variable, denoting whether the participant lives in a rural area, with 1 indicating rural residency and 0 indicating urban residency. The 'job_training_program' indicates whether the participant underwent a job training program, where 1 indicates participation and 0 indicates no participation. The 'post_program_income' is the annual income in thousands of dollars after the participant completed the job training program. Finally, 'job_satisfaction_level' represents the participant's self-reported job satisfaction on a scale of 1 to 100.",Does enrolling in a job training program increase income?,9.238498133,frontdoor,frontdoor_data_16.csv,sociology,job_training_program,post_program_income,"individual_age, income_level, gender, rural_residence",,,,job_satisfaction_level,,
Synthetic Dataset,"This dataset was collected by a governmental social services agency in a Southeast Asian country, aiming to evaluate the impact of a cash transfer program on the quality of life of low-income families. The data was gathered from surveys distributed to households across urban and rural areas. The 'age' represents the age of the head of the household. The 'income' represents the household's monthly income in local currency units. The 'rural_residence' is a binary variable that indicates whether the household is located in a rural area (1) or an urban area (0). The 'previous_beneficiary_status' is a binary variable that indicates whether the household was a beneficiary of a similar program in the past (1) or not (0). The 'underemployment_status' is a binary variable that indicates whether any adult in the household is underemployed, meaning they are involuntarily working less than the full-time hours they desire (1), or not (0). The 'cash_transfer_program_enrollment' is a binary variable indicating whether the household is enrolled in the cash transfer program (1) or not (0). The 'household_consumption' represents the household's monthly consumption expenditure in local currency units. The 'quality_of_life_index' represents the household's quality of life, measured using a standard index that considers various aspects of health, education, and living standards.",Can joining a government cash transfer program lower barriers to quality of life?,2.782962754,frontdoor,frontdoor_data_52.csv,policy,cash_transfer_program_enrollment,quality_of_life_index,"age, income, rural_residence, previous_beneficiary_status, underemployment_status",,,,household_consumption,,
Synthetic Dataset,"The dataset was collected by a major school district in the United States to study the impact of their newly-implemented online learning program on student performance. The 'student_age' represents the age of the student in years. The 'parent_education_level' denotes the highest education level achieved by the parent(s), measured in years of schooling. The 'low_income_status' is a binary variable indicating whether the student comes from a low-income family (1) or not (0). The 'private_tutoring' is a binary variable that denotes whether the student receives private tutoring (1) or not (0). The 'enrolled_in_online_program' is a binary variable indicating whether the student is enrolled in the school's online learning program (1) or not (0). The 'study_hours_per_week' shows the average number of hours the student studies per week. Finally, 'student_performance_index' represents the student's performance index, a composite score that includes grades, standardized test scores, and other academic indicators.",Can enrolling in the school's online learning program boost student performance?,6.36686206,frontdoor,frontdoor_data_41.csv,education,enrolled_in_online_program,student_performance_index,"student_age, parent_education_level, low_income_status, private_tutoring",,,,study_hours_per_week,,
Synthetic Dataset,"The U.S. Department of Housing and Urban Development (HUD) collected this dataset to evaluate the impact of housing vouchers on housing quality. The dataset includes personal and household details like age, years of education, weekly work hours, and household income. It also contains binary variables indicating whether the person is female (1 if female, 0 otherwise) and whether the individual resides in an urban area (1 for urban, 0 for rural). The housing voucher variable ranges from 16 to 36, indicating the value of the housing voucher in thousands of dollars. The housing quality is measured on a scale from 216 to 422, with higher scores indicating better housing conditions. The city-specific housing budget, representing the budget allocated by the city for housing-related issues, is also included.",Do housing vouchers improve the quality of housing individuals live in?,9.987413008,iv,iv_data_11.csv,policy,housing_voucher,housing_quality,"age, education_years, work_hours_week, household_income, female, urban_residency",,,,,city_housing_budget,
Synthetic Dataset,"The dataset was collected by the Department of Justice to investigate the impact of the intensity of legal representation on the length of sentencing. The defendant's age ('defendant_age') represents the age of the defendant at the time of trial. The judge's experience ('judge_experience_years') is measured in years. The prior convictions ('prior_convictions') is a binary variable, where '0' denotes 'no prior convictions' and '1' denotes 'prior convictions'. The public defender assigned ('public_defender_assigned') is a binary variable, where '0' means 'no public defender was assigned' and '1' means 'a public defender was assigned'. Charge severity ('charge_severity') is a binary variable, where '0' represents 'less severe charges' and '1' represents 'more severe charges'. The number of hours of legal representation ('legal_representation_hours') represents the time spent by the lawyer on the case. The sentence length ('sentence_length') indicates the length of the defendant's sentence. The public defender's experience ('public_defender_experience') represents the experience level of the public defender assigned to the case.",Can having more hours of legal representation enhance sentence outcomes?,8.081789132,iv,iv_data_46.csv,law,legal_representation_hours,sentence_length,"defendant_age, judge_experience_years, prior_convictions, public_defender_assigned, charge_severity",,,,,public_defender_experience,
Synthetic Dataset,"The dataset was collected by a large-scale national health institution conducting a study to understand the impact of a new drug on patient recovery rates. The data was collected from different hospitals across the country in a real-world observation setup. The variables in the dataset include: 'patient_age' (representing the age of the patient), 'bmi' (Body Mass Index), 'cholesterol_level', 'blood_pressure', 'pre_existing_condition' (binary, 1 if the patient has a pre-existing condition, 0 otherwise), 'smoker_status' (binary, 1 if the patient is a smoker, 0 otherwise), 'drug_administration' (binary, 1 if the new drug was administered, 0 otherwise), 'recovery_rate' (representing the rate of recovery of the patient), and 'doctor_experience' (representing the years of experience of the doctor treating the patient).",Will the administration of a new drug result in improved patient recovery rates?,3.906515412,iv,iv_data_1.csv,healthcare,drug_administration,recovery_rate,"patient_age, bmi, cholesterol_level, blood_pressure, pre_existing_condition, smoker_status",,,,,doctor_experience,
Synthetic Dataset,"This dataset, collected by a national education research institute, explores the impact of tutoring on student academic performance. The data combines student demographic information like age ('student_age'), parents' education level ('parent_education'), and school quality index ('school_quality'). It also includes binary indicators for student gender ('gender': 1 for female, 0 for male), minority status ('minority_status': 1 for minority, 0 for non-minority), and low-income status ('low_income_status': 1 for low-income, 0 for non-low-income). The 'tutoring_hours' represents the number of hours a student received tutoring, and 'academic_performance' is an index of the student's overall academic performance. The 'tutor_availability' measures the number of hours a tutor was available in the student's school.",Does participating in more hours of tutoring cause improved academic performance among students?,3.302516124,iv,iv_data_33.csv,education,tutoring_hours,academic_performance,"student_age, parent_education, school_quality, gender, minority_status, low_income_status",,,,,tutor_availability,
Synthetic Dataset,"This dataset was collected by the Bureau of Economic Analysis as part of their research into the effects of job training programs on individual annual income. The dataset includes variables such as 'years_of_experience' (representing the individual's years of work experience), 'hourly_wage' (indicating the individual's wage per hour), 'gender' (a binary where 1 indicates female and 0 indicates male), 'full_time' (a binary denoting whether the individual works full time, with 1 for yes and 0 for no), and 'unemployment_benefits' (a binary showing whether the individual receives unemployment benefits, with 1 for yes and 0 for no). The 'job_training_program' represents the intensity of the job training program that the individual has undergone, and 'annual_income' represents the individual's total annual income. The 'regional_unemployment_rate' represents the unemployment rate in the individual's region.",Can undergoing a job training program improve the annual income of individuals?,7.750405225,iv,iv_data_15.csv,economics,job_training_program,annual_income,"years_of_experience, hourly_wage, gender, full_time, unemployment_benefits",,,,,regional_unemployment_rate,
Synthetic Dataset,"This dataset, collected by the National Institute for Child and Family Policy, evaluates the impact of child care subsidies on access to quality child care. The data was gathered from real-world observations, tracking metrics such as the age and income level of the residents, employment status, childcare expenses, whether they are single parents, whether they live in rural areas, their level of educational attainment, and whether they received childcare subsidies. The 'policy_rollout' refers to the timing and location of the introduction of the child care subsidy policy. The 'resident_age' represents the age of the resident. The 'income_level' indicates the annual income level of the resident. The 'employment_status' denotes the hours worked per week by the resident. The 'childcare_expense' represents the monthly cost of childcare for the resident. The 'single_parent', 'rural_residence', and 'educational_attainment' are binary variables, where '1' indicates 'yes' and '0' indicates 'no'. The 'child_care_access' measures the resident's access to child care services. The 'child_care_subsidy' represents the amount of subsidy received by the resident. The 'policy_rollout' measures how strongly the policy was rolled out, with higher values meaning the policy was more widely implemented.",Does receiving a subsidy for child care services enhance access to quality child care for families?,4.17723241,iv,iv_data_36.csv,policy,child_care_subsidy,child_care_access,"resident_age, income_level, employment_status, childcare_expense, single_parent, rural_residence, educational_attainment",,,,,policy_rollout,
Synthetic Dataset,"The dataset was gathered by a social research institute in a longitudinal study investigating the impact of community programs on social mobility. The variables in the dataset include 'age', representing the participant's age in years; 'education_years', denoting the number of years of formal education the participant has completed; 'income_level', indicating the participant's annual income in dollars; 'social_support_score', a self-reported score on a scale of 1-10, where a higher score indicates a higher level of perceived social support; 'gender' (binary), distinguishing between male (0) and female (1); 'minority_status' (binary), identifying whether the participant belongs to a minority group, with 0 representing non-minority and 1 representing minority; 'community_program_participation', representing the extent of the participant's involvement in community programs, measured by the number of hours spent on these programs per week; 'social_mobility_score', a composite score indicating the extent of upward social mobility the participant has experienced over the study period; and 'community_resource_availability', indicating the number of community resources, such as libraries, community centers, and job training programs, available within a five-mile radius of the participant's home.",Can participation in community programs help people climb the social ladder?,5.256720186,iv,iv_data_24.csv,sociology,community_program_participation,social_mobility_score,"age, education_years, income_level, social_support_score, gender, minority_status",,,,,community_resource_availability,
Synthetic Dataset,"This dataset was collected by a nonpartisan political research institution aiming to understand the impact of campaign funding on election results. The data was gathered from real-world observations of various election campaigns across the country. The 'age_of_candidate' represents the age of the political candidate in years. The 'candidate_education_years' indicates the total years of formal education completed by the candidate. The 'previous_campaigns' signifies the number of previous campaigns the candidate has participated in. The 'political_party_affiliation' displays a numerical code representing the candidate's political party. The 'gender' is a binary variable, where 1 represents male candidates and 0 represents female candidates. The 'incumbent' is also binary, identifying whether the candidate is the incumbent (1) or not (0). The 'minority_status' is another binary variable, indicating whether the candidate is a member of a minority group (1) or not (0). The 'campaign_funding' refers to the total funding in USD the candidate's campaign received. The 'election_result' represents the percentage of votes received by the candidate. Lastly, 'population_wealth' represents the median income of the population in the election region.",How does a rise in funding for the campaign of a political candidate affect the result in the election?,7.856778014,iv,iv_data_14.csv,political_science,campaign_funding,election_result,"age_of_candidate, candidate_education_years, previous_campaigns, political_party_affiliation, gender, incumbent, minority_status",,,,,population_wealth,
Synthetic Dataset,"The dataset was collected by a national mental health research institute to understand the impact of Cognitive Behavioral Therapy (CBT) on anxiety levels. Client age ('client_age') captures the age of the client. Stress score ('stress_score') measures the client's perceived stress level on a scale from 22 to 36, with higher scores indicating higher stress levels. Support network size ('support_network_size') represents the number of people in the client's support network. Prior therapy ('prior_therapy') is a binary variable indicating whether the client has previously undergone any form of therapy (1: Yes, 0: No). Gender ('gender') is a binary variable where 1 represents female and 0 represents male. Anxiety level ('anxiety_level') measures the client's anxiety level on a standardized scale, and therapist experience ('therapist_experience') is the years of experience of the therapist. The number of CBT sessions attended ('cbt_sessions') indicates the number of therapy sessions each client attended.",Does attending more therapy sessions enhance anxiety reduction among people with high stress?,2.726539944,iv,iv_data_45.csv,psychology,cbt_sessions,anxiety_level,"client_age, stress_score, support_network_size, prior_therapy, gender",,,,,therapist_experience,
Synthetic Dataset,The dataset was collected by the Department of Social Services to examine the impact of a Universal Basic Income (UBI) policy on the financial stability of individuals. The data was gathered from the real-world observation of the UBI policy rollout across various regions of the country. The 'age' represents the age of the individuals. The 'income' denotes the annual income of the individuals. The 'education_level' indicates the highest level of education achieved by the individuals (on a scale from 1 to 10). The 'employment_status' indicates the employment status of the individuals (on a scale from 1 to 5). The 'gender' is a binary variable where 0 represents male and 1 represents female. The 'race' is a binary variable where 0 represents Non-White and 1 represents White. The 'marital_status' is a binary variable where 0 signifies unmarried individuals and 1 signifies married individuals. The 'universal_basic_income' represents the amount of UBI received by the individuals. The 'financial_stability_score' represents the financial stability of the individuals. The 'region' represents the region where the individuals live.,Does the implementation of a Universal Basic Income policy boost the financial stability of individuals?,1.369358695,iv,iv_data_2.csv,policy,universal_basic_income,financial_stability_score,"age, income, education_level, employment_status, gender, race, marital_status",,,,,region,
Synthetic Dataset,"This dataset, collected by the Bureau of Labor Statistics, aims to understand the impact of job training program intensity on workers' annual income. The 'age' represents the age of each worker. The 'education_years' denotes the total years of formal education completed by the worker. The 'previous_income' measures the worker's income in the year before the training program. The 'working_hours_per_week' indicates the average number of hours the worker spends at work per week. The 'gender' is a binary variable, where 1 indicates male and 0 indicates female. The 'union_membership' is a binary variable, with 1 indicating that the worker is a member of a labor union and 0 meaning they are not. The 'minority_status' is a binary variable, with 1 denoting that the worker is from a minority group and 0 otherwise. The 'job_training_intensity' represents the intensity of the job training program, measured in hours per week. The 'annual_income' measures the worker's income in the year after the training program. The 'local_unemployment_rate' indicates the unemployment rate in the worker's local area.",Will a more intense job training program increase earnings among workers?,8.540080982,iv,iv_data_27.csv,labor,job_training_intensity,annual_income,"age, education_years, previous_income, working_hours_per_week, gender, union_membership, minority_status",,,,,local_unemployment_rate,
Synthetic Dataset,"This dataset was collected by the National Bureau of Economic Analysis with the aim of evaluating the effectiveness of job training programs on individuals' annual income. It includes data on age (measured in years), education (measured in years of schooling completed), gender (binary, 0 for male and 1 for female), minority status (binary, 0 for non-minority and 1 for minority), rural area residence (binary, 0 for urban and 1 for rural), job training program intensity (measured in hours), annual income (measured in dollars), and local unemployment rate (measured in percentage).",Can participating in more hours of job training programs result in higher annual income among individuals living in areas with higher unemployment rates?,2.661965759,iv,iv_data_28.csv,policy,job_training_hours,annual_income,"age, education_years, gender, minority_status, rural_area",,,,,local_unemployment_rate,
Synthetic Dataset,"The U.S. Department of Commerce collected this dataset to evaluate the impact of tariff reductions on import volumes, and subsequently, on individual consumers. The dataset includes variables such as age (representing the age of the individual), education_years (the number of years of formal education completed), personal_income (the individual's annual income), and price_index (the general level of prices in the economy). It also includes binary variables such as gender (1 for female, 0 for male), rural_residence (1 for rural, 0 for urban), and trade_agreement_participation (1 if the individual's country is part of a trade agreement, 0 otherwise). The tariff_reduction represents the percentage reduction in tariffs. The import_volume indicates the volume of goods imported by the individual. The presence_of_trade_agreement is a binary variable (1 for presence, 0 for absence) that indicates whether there is a trade agreement between the individual's country and the country of import.",Does lowering import tariffs improve the amount of goods individuals import?,1.180638644,iv,iv_data_56.csv,economics,tariff_reduction,import_volume,"age, education_years, personal_income, price_index, gender, rural_residence, trade_agreement_participation",,,,,presence_of_trade_agreement,
Synthetic Dataset,"This dataset has been compiled by the National Institute for Child and Family Policy, intending to study the impact of child care subsidies on employment rates, particularly among single parents in rural areas. The 'age' represents the age of the parent, and 'education_years' indicates the total years of formal schooling completed by the parent. The 'gender' is a binary variable where 1 represents female and 0 represents male. The binary variable 'rural_residence' indicates whether the parent lives in a rural area, with 1 indicating rural residence and 0 indicating urban residence. The 'single_parent' is another binary variable, with 1 indicating that the parent is a single parent and 0 indicating otherwise. The 'childcare_subsidy' represents the monthly amount of childcare subsidy received by the parent. The 'childcare_availability' represents the number of childcare facilities per 1,000 residents in the parent's local area. The 'employment_status' is a binary variable, where 1 indicates that the parent is employed and 0 indicates that the parent is unemployed.",Will providing child care subsidies lower unemployment among single parents in rural areas?,4.216097517,iv,iv_data_4.csv,economics,childcare_subsidy,employment_status,"age, education_years, gender, rural_residence, single_parent",,,,,childcare_availability,
Synthetic Dataset,"The dataset was collected by the National Labor Relations Board (NLRB) to study the impact of job training programs on workers' annual income. It includes real-world observations from a broad cross-section of U.S. workers. The 'education_years' represents the total years of formal schooling completed by the individual. The 'age' is the individual's age in years. The 'gender' is a binary variable, with 1 indicating female and 0 indicating male. The 'union_membership' is another binary variable, where 1 signifies union membership and 0 non-membership. The 'job_training_intensity' measures the number of hours of job training the individual received in the past year. The 'local_unemployment_rate' is the local unemployment rate in the individual's residence area. The 'annual_income' is the individual's total income over the past year.",Can job training programs boost the annual income of American workers?,7.540240261,iv,iv_data_51.csv,labor,job_training_intensity,annual_income,"education_years, age, gender, union_membership",,,,,local_unemployment_rate,
Synthetic Dataset,"The dataset was collected by the National Institute for Sociology Studies to evaluate the effect of workforce training programs on annual income. It is based on real-world observations. The 'age_years' represents the age of the individual in years. The 'education_level' signifies the highest level of education attained by the individual, measured on a scale from 1 (no formal education) to 10 (doctoral degree). The 'gender' is a binary variable where 1 represents males and 0 represents females. The 'minority_status' is another binary variable, where 1 indicates that the individual identifies as part of a racial or ethnic minority, and 0 indicates otherwise. The 'workforce_training_hours' indicates the total hours of workforce training the individual has received. The 'local_unemployment_rate' represents the unemployment rate in the individual's local area. The 'annual_income_dollars' represents the individual's annual income in dollars.",Can participating in more hours of workforce training cause a higher annual income?,2.648798203,iv,iv_data_57.csv,sociology,workforce_training_hours,annual_income_dollars,"age_years, education_level, gender, minority_status",,,,,local_unemployment_rate,
Synthetic Dataset,"This healthcare dataset was obtained from observational data collected by a consortium of hospitals across the nation. The data was collected to study the impact of surgical treatment intensity on the recovery rates of patients. The 'patient_age' signifies the age of the patient in years. The 'hospital_quality_score' represents the quality of the hospital on a scale of 22 to 37, with higher values indicating better quality. The 'smoking_status' is a binary variable where '1' indicates the patient is a smoker, and '0' indicates the patient is not a smoker. The 'insurance_status' is a binary variable, with '1' indicating that the patient holds insurance, and '0' meaning the patient is uninsured. The 'surgical_treatment_intensity' indicates the intensity of the surgical treatment received by the patient, measured on a scale of 13 to 26. The 'recovery_rate' provides the rate of recovery of the patient, with higher numbers indicating higher recovery rates. The 'surgeon_experience' represents the years of experience of the surgeon who carried out the surgical treatment.",Does the intensity of a surgical treatment influence how fast patients recover?,7.877453415,iv,iv_data_34.csv,healthcare,surgical_treatment_intensity,recovery_rate,"patient_age, hospital_quality_score, smoking_status, insurance_status",,,,,surgeon_experience,
Synthetic Dataset,"This dataset was collected as part of a randomized field experiment conducted by a large urban school district in partnership with education researchers. The study examined whether encouraging students to participate in a supplemental online math course could improve their academic achievement. Students in selected middle schools were randomly assigned to receive an official invitation to enroll in the online course, while others did not receive such encouragement. The dataset includes key background variables: parent_education_years (total years of caregiver's schooling), household_income_bracket (ordinal, 1 to 4), prior_math_score_z (last term's standardized math score), is_english_language_learner (binary, 1=ELL, 0=not ELL), has_internet_access (binary, 1=yes, 0=no), received_online_course_invitation (binary, 1=invited, 0=not invited), enrolled_in_online_course (binary, 1=enrolled, 0=not enrolled), and end_of_term_math_score (raw math achievement score at term's end).",How does online math course participation affect student test performance?,4.752452205,iv_encouragement,iv_encouragement_data_17.csv,education,enrolled_in_online_course,end_of_term_math_score,"parent_education_years, household_income_bracket, prior_math_score_z, is_english_language_learner, has_internet_access",,,,,received_online_course_invitation,
Synthetic Dataset,"This dataset was collected by a university sociology research team in partnership with a city government to examine the effects of neighborhood engagement initiatives. The study focused on a randomized rollout of invitations to participate in a new community improvement project across several urban neighborhoods. Researchers gathered data on residents' backgrounds, living situations, and social integration, as well as whether they were invited to and participated in the project, and their subsequent satisfaction with the community. The dataset includes variables: 'years_in_community' records how long each resident has lived in their neighborhood; 'household_size' indicates the number of people in each home; 'social_network_score' reflects each person's sense of social connectedness; 'single_parent' (binary: 1 = yes, 0 = no) and 'unemployed' (binary: 1 = yes, 0 = no) provide key demographic information. 'received_invitation' (binary: 1 = yes, 0 = no) marks whether an individual was invited to join the improvement project, while 'participated_in_neighborhood_project' (binary: 1 = yes, 0 = no) shows actual participation. The primary outcome, 'community_satisfaction_score', measures how satisfied residents felt about their neighborhood after the project's implementation.",What happens to community satisfaction when residents join neighborhood improvement projects?,6.08742848,iv_encouragement,iv_encouragement_data_16.csv,sociology,participated_in_neighborhood_project,community_satisfaction_score,"years_in_community, household_size, social_network_score, single_parent, unemployed",,,,,received_invitation,
Synthetic Dataset,"This dataset was collected by a university political science research team as part of a randomized field experiment during a midterm election year. The study examined how encouraging citizens to attend local town hall meetings impacts their understanding of political issues. Randomly selected registered voters in several municipalities were sent a voter mobilization mailer (voter_mobilization_mailer, binary: 1 = received mailer, 0 = did not) encouraging them to attend an upcoming town hall. The research team then tracked whether each individual attended the event (attended_town_hall, binary: 1 = attended, 0 = did not) and later administered a standardized survey measuring their political knowledge (political_knowledge_score). Covariates include education_years (total years of formal schooling), political_interest (self-rated interest in politics, 1–4), civic_participation_score (composite score of recent civic activities), identifies_with_major_party (binary: 1 = identifies with a major party, 0 = does not), and previous_voter (binary: 1 = voted in last election, 0 = did not).",Can attending town hall meetings improve citizens' political knowledge?,5.513884251,iv_encouragement,iv_encouragement_data_14.csv,political_science,attended_town_hall,political_knowledge_score,"education_years, political_interest, civic_participation_score, identifies_with_major_party, previous_voter",,,,,voter_mobilization_mailer,
Synthetic Dataset,"This dataset was collected by a city housing authority as part of a randomized policy rollout aimed at improving housing stability among public housing residents. The study examined whether offering and attending housing counseling services could help at-risk households maintain stable housing situations. The dataset includes variables tracking each household's background and circumstances, such as years spent in public housing, household size, history of eviction (binary: 1 if any eviction, 0 otherwise), single parent status (binary: 1 if single parent, 0 otherwise), and whether anyone in the household recently lost a job (binary: 1 if yes, 0 otherwise). The key policy intervention is whether the household was offered counseling (binary: 1=offered, 0=not offered), and whether they attended at least one session (binary: 1=attended, 0=not attended). The primary outcome is a housing stability score, which reflects the security and sustainability of the household's current housing situation.",How effective are housing counseling services at improving family stability?,5.612234939,iv_encouragement,iv_encouragement_data_28.csv,policy,attended_counseling,housing_stability_score,"years_in_public_housing, household_size, prior_eviction, single_parent, recent_job_loss",,,,,counseling_offer,
Synthetic Dataset,"This dataset was collected by a university psychology research team conducting a randomized controlled trial to examine the effects of peer support groups on adult psychological wellbeing. The study focused on adults from various backgrounds, some of whom were randomly selected and invited to join a weekly peer support group. Researchers gathered extensive background information, including cognitive functioning, mental health history, recent life events, and employment status. The dataset includes: cognitive_test_score (standardized score on a cognitive abilities assessment), sessions_attended_last_year (number of mental health sessions attended in the previous year), recent_life_event_score (numerical score capturing the impact of significant recent life changes), has_history_of_anxiety (binary: 1 if the participant has a diagnosed anxiety disorder, 0 otherwise), is_employed_full_time (binary: 1 if employed full time, 0 if not), invited_to_support_group (binary: 1 if the participant received an invitation to the support group, 0 if not), attended_support_group (binary: 1 if attended at least one group session, 0 if not), and psychological_wellbeing_score (score from a validated psychological wellbeing survey, where higher values indicate better wellbeing).",What is the relationship between peer support groups and psychological wellbeing?,4.519707655,iv_encouragement,iv_encouragement_data_29.csv,psychology,attended_support_group,psychological_wellbeing_score,"cognitive_test_score, sessions_attended_last_year, recent_life_event_score, has_history_of_anxiety, is_employed_full_time",,,,,invited_to_support_group,
Synthetic Dataset,"This dataset was collected by a national economic research institute to examine the impact of financial literacy seminars on household economic outcomes amid a government-led policy rollout. The study involved randomly inviting a subset of working-age adults to attend a financial literacy seminar, tracking both who was invited and who actually attended, as well as their socioeconomic backgrounds and income before and after the intervention. The variables include years_of_education (years of formal schooling), household_size (number of people in the household), pre_intervention_income_change (change in monthly income over the prior year), owns_home (binary, 1 if respondent owns their home), has_stable_job (binary, 1 if respondent has held the same job for at least a year), received_financial_literacy_invite (binary, 1 if the respondent was invited to the seminar), attended_financial_literacy (binary, 1 if the respondent attended the seminar), and post_intervention_monthly_income (average monthly income in thousands of currency units after the intervention period).",How do financial literacy seminars affect household earnings?,7.573778556,iv_encouragement,iv_encouragement_data_15.csv,economics,attended_financial_literacy,post_intervention_monthly_income,"years_of_education, household_size, pre_intervention_income_change, owns_home, has_stable_job",,,,,received_financial_literacy_invite,
Synthetic Dataset,"This dataset was collected as part of a government-backed randomized trial examining the impact of advanced job skills training on workers' earnings in the retail sector. Researchers from the National Labor Policy Institute partnered with several large retail chains to identify eligible employees, randomly assign invitations to a new training program, and track both participation and subsequent income. The data includes: education_years (total years of formal education completed), job_tenure_years (number of years the individual has worked in their current job), union_member (binary; 1 if the employee is a union member, 0 otherwise), female (binary; 1 if female, 0 if male), training_invited (binary; 1 if the employee was randomly invited to participate in the training program, 0 otherwise), training_attended (binary; 1 if the employee actually attended the training, 0 otherwise), and annual_income (individual's total wage and salary income in the year following the intervention, measured in thousands of dollars).",Is job skills training effective at increasing worker income?,9.317683973,iv_encouragement,iv_encouragement_data_9.csv,labor,training_attended,annual_income,"education_years, job_tenure_years, union_member, female",,,,,training_invited,
Synthetic Dataset,"This dataset was collected by a regional public health department conducting a randomized encouragement trial to examine the effects of a healthy living program on cardiovascular risk factors in adults. The study targeted adults aged 18 and older who were at elevated risk for hypertension and related chronic diseases. Eligible participants were randomly selected to receive an invitation to a multi-week healthy living program focused on diet, exercise, and lifestyle modification; attendance was voluntary. The dataset includes information on participants' age, baseline body mass index (BMI), and a coded smoking history index. It also contains binary indicators for prior diagnoses of hypertension and diabetes (1 for yes, 0 for no). The key variables for the study are whether the individual was invited to the program (binary: 1 for invited, 0 for not), whether they actually attended (binary: 1 for attended, 0 for not), and their follow-up systolic blood pressure measured several months after the intervention.",Can healthy living programs impact participants' blood pressure?,9.979714418,iv_encouragement,iv_encouragement_data_8.csv,healthcare,attended_health_program,followup_systolic_bp,"age_years, bmi, smoking_history, has_hypertension, has_diabetes",,,,,invited_to_health_program,
Synthetic Dataset,"This dataset was collected as part of a multi-site randomized controlled trial led by a university psychology research group, examining the impact of mindfulness-based workshops on adult life satisfaction. Researchers invited a randomly selected subset of community adults to participate in a free mindfulness program and tracked both those who were invited and those who actually attended the sessions. The dataset includes each participant's baseline cognitive test score, their composite wellbeing index, whether they had prior experience with therapy (binary: 1 = yes, 0 = no), and whether they are currently employed full-time (binary: 1 = yes, 0 = no). The variable 'invited_to_mindfulness' (binary: 1 = invited, 0 = not invited) indicates assignment to receive an invitation, while 'attended_mindfulness' (binary: 1 = attended, 0 = did not attend) captures actual participation. The primary outcome, 'life_satisfaction_score', measures participants' reported satisfaction with life after the study period.",What happens to life satisfaction when adults join mindfulness workshops?,1.948311898,iv_encouragement,iv_encouragement_data_12.csv,psychology,attended_mindfulness,life_satisfaction_score,"cognitive_test_score, wellbeing_index, prior_therapy, full_time_employed",,,,,invited_to_mindfulness,
Synthetic Dataset,"This dataset was compiled by a state criminal justice policy research group to examine the impact of providing pro bono legal aid offers to low- and moderate-income defendants facing criminal charges. Data were gathered during a randomized rollout of a legal aid program across several jurisdictions, with some eligible defendants randomly assigned to be offered free legal representation. The dataset includes each defendant's prior criminal history (prior_offenses), age at the time of the current charge (age_at_charge), income bracket (income_bracket), family responsibilities (has_dependents, binary: 1 if the defendant has dependents, 0 otherwise), and employment status at the time of charge (employed_status, binary: 1 if employed, 0 otherwise). The key program variables are whether the defendant was offered legal aid (legal_aid_offer, binary: 1 if offered, 0 otherwise), whether they accepted and received legal aid (legal_aid_uptake, binary: 1 if received, 0 otherwise), and the outcome variable: sentence length in months (sentence_length_months).",How does access to legal representation affect criminal sentence length?,6.961330166,iv_encouragement,iv_encouragement_data_13.csv,law,legal_aid_uptake,sentence_length_months,"prior_offenses, age_at_charge, income_bracket, has_dependents, employed_status",,,,,legal_aid_offer,
Synthetic Dataset,"This dataset was collected as part of a randomized field experiment conducted by a nonpartisan research institute studying the effects of civic education campaigns on adult political knowledge. Researchers randomly selected a sample of urban residents and invited a subset to participate in a series of civic education events designed to boost political awareness ahead of a national election. The dataset includes detailed background information: 'education_years' (total years of schooling), 'political_interest' (self-reported interest in politics on a 1 to 4 scale), and 'civic_participation_score' (standardized measure of recent civic activities). Binary covariates include 'registered_voter' (1 if registered, 0 if not) and 'prior_voting_history' (1 if has voted before, 0 otherwise). The 'campaign_invitation' variable indicates whether the individual was randomly invited to the campaign (1=yes, 0=no), while 'campaign_participation' records actual attendance at campaign events (1=yes, 0=no). The primary outcome, 'political_knowledge_score', measures performance on a standardized test of political knowledge administered after the campaign.",Are civic education campaigns effective at improving political knowledge?,5.893373032,iv_encouragement,iv_encouragement_data_22.csv,political_science,campaign_participation,political_knowledge_score,"education_years, political_interest, civic_participation_score, registered_voter, prior_voting_history",,,,,campaign_invitation,
Synthetic Dataset,"This dataset was collected as part of a randomized trial led by a metropolitan university's Department of Sociology in partnership with local social service agencies. The study examined whether encouraging parents to attend community-based parenting workshops could improve children's overall wellbeing in low-income urban neighborhoods. Researchers randomly selected and invited parents to participate in the workshops and tracked both those who attended and those who did not. The dataset includes detailed background information, such as parents' years of education, household size, whether the household is headed by a single parent (binary: 0 = no, 1 = yes), and whether the family receives public assistance (binary: 0 = no, 1 = yes). The key variables are whether the parent was invited to the parenting workshop (binary: 0 = not invited, 1 = invited), whether they actually attended (binary: 0 = did not attend, 1 = attended), and a quantitative measure of child wellbeing based on a standardized assessment.",What impact do parenting workshops have on children's wellbeing?,9.024814156,iv_encouragement,iv_encouragement_data_6.csv,sociology,attended_parenting_workshop,child_wellbeing_score,"education_years, household_size, single_parent_household, public_assistance_recipient",,,,,invited_to_parenting_workshop,
Synthetic Dataset,"This dataset was collected by an urban school district's research department to examine the effectiveness of a summer math enrichment program for middle school students. Eligible students were randomly assigned to receive an invitation to the program. The dataset includes key background information: parent_education_years (total years of formal education completed by the student's parent or guardian), baseline_math_gpa (student's starting math GPA), absences_last_term (number of school days missed in the previous term), free_lunch_eligible (binary, 1 = eligible for free/reduced lunch, 0 = not eligible), and english_language_learner (binary, 1 = student is an English language learner, 0 = not). The variable summer_program_invited (binary) indicates assignment to receive an invitation, while summer_program_attended (binary) reflects actual participation. The outcome, end_of_year_math_score, measures student performance on a standardized math exam at year's end.",Can summer math programs boost student performance on year-end tests?,2.130500858,iv_encouragement,iv_encouragement_data_7.csv,education,summer_program_attended,end_of_year_math_score,"parent_education_years, baseline_math_gpa, absences_last_term, free_lunch_eligible, english_language_learner",,,,,summer_program_invited,
Synthetic Dataset,"This dataset was gathered by a national economic policy evaluation unit to study the effects of a government consumption subsidy on household spending patterns. Data were collected as part of a randomized policy rollout, where selected households were offered the opportunity to receive a one-time consumption subsidy intended to stimulate local economic activity. The dataset includes background information on each household: years_employed (total years the household head has worked), household_size (number of household members), log_household_income (log-transformed annual income), owns_home (binary: 1 if the household owns their residence, 0 otherwise), and has_college_degree (binary: 1 if the household head completed college, 0 otherwise). The variable offered_consumption_subsidy (binary: 1 if the household was randomly selected to be offered the subsidy, 0 otherwise) indicates the encouragement or assignment, while used_consumption_subsidy (binary: 1 if the household actually claimed and used the subsidy, 0 otherwise) records treatment uptake. The outcome, annual_household_consumption, is measured as the household's total consumption expenditures over the year, in thousands of currency units.",How do government consumption subsidies affect household spending?,7.912072777,iv_encouragement,iv_encouragement_data_23.csv,economics,used_consumption_subsidy,annual_household_consumption,"years_employed, household_size, log_household_income, owns_home, has_college_degree",,,,,offered_consumption_subsidy,
Synthetic Dataset,"This dataset was collected by a national economic research institute to examine the impact of expanding financial inclusion in rural areas. In collaboration with local banks, some households were randomly selected and offered the opportunity to open no-fee bank accounts, while others were not. The dataset contains information on household demographic and economic characteristics, details of the encouragement intervention, banking uptake, and annual household savings. The variables are as follows: 'years_of_schooling' records the total years of formal education completed by the primary household earner; 'household_size' is the number of people in the household; 'distance_to_bank_km' measures the household's distance to the nearest bank branch in kilometers; 'owns_smartphone' (binary: 1 if at least one smartphone is owned, 0 otherwise); 'is_self_employed' (binary: 1 if the primary earner is self-employed, 0 otherwise); 'bank_account_offer' (binary: 1 if the household was randomly offered a no-fee bank account, 0 otherwise); 'opened_bank_account' (binary: 1 if the household opened a new account, 0 otherwise); and 'annual_household_savings' is the household's reported total savings at year-end.",Is easier bank account access effective at increasing rural household savings?,2.030740965,iv_encouragement,iv_encouragement_data_5.csv,economics,opened_bank_account,annual_household_savings,"years_of_schooling, household_size, distance_to_bank_km, owns_smartphone, is_self_employed",,,,,bank_account_offer,
Synthetic Dataset,"This dataset was compiled by a university psychology research group studying the impact of resilience training on adult wellbeing. Researchers conducted a randomized encouragement trial, inviting a randomly selected subset of adults from a metropolitan community to attend a structured, eight-week resilience training program. The primary goal was to examine whether attending the program led to measurable improvements in psychological wellbeing. The dataset includes the following variables: cognitive_score (baseline assessment of cognitive ability), stress_level (self-reported stress on a standardized scale), prior_therapy (binary; 1 = has previously received therapy, 0 = has not), full_time_employed (binary; 1 = employed full-time, 0 = not employed full-time), encouraged_to_participate (binary; 1 = received an official invitation to the resilience program, 0 = did not receive invitation), attended_resilience_program (binary; 1 = attended at least six sessions, 0 = attended fewer), and wellbeing_score (post-intervention measure of psychological wellbeing).",What happens to wellbeing when adults participate in resilience training?,8.501023431,iv_encouragement,iv_encouragement_data_20.csv,psychology,attended_resilience_program,wellbeing_score,"cognitive_score, stress_level, prior_therapy, full_time_employed",,,,,encouraged_to_participate,
Synthetic Dataset,"This dataset was collected by a metropolitan university's sociology department as part of a randomized policy rollout in partnership with the city council. The study examined the impact of encouraging residents to participate in community leadership workshops on subsequent levels of local civic engagement. Researchers randomly selected residents from several neighborhoods to receive invitations to a series of leadership workshops, then tracked who actually attended and followed their involvement in community activities over the following year. The dataset includes years_in_community (years living in the neighborhood), household_size (number of household members), social_trust_score (standardized trust in neighbors, ranging from -1 to 1), owns_home (binary: 1 if owns home, 0 otherwise), employed_full_time (binary: 1 if employed full-time, 0 otherwise), invited_to_community_leadership_workshop (binary: 1 if invited, 0 otherwise), attended_workshop (binary: 1 if attended, 0 otherwise), and community_engagement_index (composite score reflecting participation in local events, volunteering, and group memberships).",How does community leadership workshop participation affect civic engagement?,1.973693241,iv_encouragement,iv_encouragement_data_24.csv,sociology,attended_workshop,community_engagement_index,"years_in_community, household_size, social_trust_score, owns_home, employed_full_time",,,,,invited_to_community_leadership_workshop,
Synthetic Dataset,"This dataset was compiled by a regional health authority conducting a study to examine the impact of a health coaching program on cardiovascular risk among adults. The study invited a randomly selected group of patients from primary care clinics to participate in personalized health coaching, aiming to improve lifestyle factors and, ultimately, blood pressure control. Data was collected on each patient's age, body mass index (BMI), diabetes diagnosis (binary: 1 = diagnosed, 0 = not diagnosed), and smoking status (binary: 1 = currently smokes, 0 = does not smoke). The variable 'received_health_coaching_invite' indicates whether the patient was invited to join the program (binary), while 'attended_health_coaching' records participation (binary). The main outcome, 'systolic_blood_pressure', is the patient's systolic blood pressure measured at their follow-up appointment.",Can health coaching programs improve patients' blood pressure control?,9.203920976,iv_encouragement,iv_encouragement_data_1.csv,healthcare,attended_health_coaching,systolic_blood_pressure,"age_years, bmi, has_diabetes, is_smoker",,,,,received_health_coaching_invite,
Synthetic Dataset,"This dataset was collected by a large urban school district examining the impact of a new college readiness program for high school juniors. In a randomized controlled trial, eligible students were either invited to participate in the program or not, with invitations serving as encouragement to join. The aim was to study whether offering and attending this support program would improve academic outcomes, particularly final exam scores. The data includes background information on students (such as their primary caregiver's education in years, baseline GPA, low-income status, and English language learner status), whether they were invited and whether they participated in the program, and their end-of-year final exam score. Parent education years and baseline GPA are numeric variables, while low income status, English language learner, invitation, and participation are binary (1 = yes, 0 = no). The final exam score represents standardized performance.",What is the relationship between college readiness programs and exam performance?,8.599830412,iv_encouragement,iv_encouragement_data_25.csv,education,participated_in_college_readiness_program,final_exam_score,"parent_education_years, baseline_gpa, low_income_status, english_language_learner",,,,,invited_to_college_readiness_program,
Synthetic Dataset,"This dataset was collected by a national labor policy research institute to examine the impact of a new workplace skill training initiative on workers' earnings. Data were gathered as part of a randomized trial in which a randomly selected group of employees across various industries received invitations to participate in the training program, while others did not. The dataset includes years of formal education completed, years of job experience in the current field, an indicator for whether the worker has a college degree (binary: 1 for yes, 0 for no), union membership status (binary: 1 for member, 0 for non-member), whether the worker was invited to participate in the skill training (binary: 1 for invited, 0 for not invited), whether the worker actually attended the training (binary: 1 for attended, 0 for did not attend), and the worker's annual income in thousands of dollars. Each variable offers insight into background, encouragement, participation, and labor market outcomes.",How effective are workplace training programs at increasing employee wages?,8.409133726,iv_encouragement,iv_encouragement_data_19.csv,labor,attended_skill_training,annual_income,"education_years, job_experience_years, has_college_degree, is_union_member",,,,,invited_to_skill_training,
Synthetic Dataset,"This dataset was collected by the Department of Education to evaluate the impact of a new digital learning tool on students' performance in standardized tests. The study was conducted across a variety of schools, both public and private, and the data was collected before and after the implementation of the learning tool. The 'student_age' represents the age of the students in years. The 'teacher_experience_years' indicates the number of years of teaching experience of the students' teachers. The 'school_type' is a binary variable representing the type of school the student is enrolled in, where 1 indicates a public school, and 0 indicates a private school. The 'gender' is a binary variable where 1 represents male and 0 represents female. The 'treatment_status' is a binary variable indicating whether the student was exposed to the new digital learning tool, with 1 representing exposure and 0 representing no exposure. The 'standardized_test_score' measures the students' performance on national standardized tests. The 'post_intervention_status' is a binary variable indicating whether the data was collected before or after the implementation of the digital learning tool, with 1 indicating post-intervention and 0 indicating pre-intervention.",Does the introduction of a new digital learning tool boost students' performance on standardized tests?,8.552815012,did_canonical,did_canonical_data_33.csv,education,treatment_status,standardized_test_score,"student_age, teacher_experience_years, school_type, gender",,post_intervention_status,,,,
Synthetic Dataset,"The dataset was collected by the National Bureau of Economic Research to evaluate the impact of a new microloan program on household income growth in developing countries. The 'household_id' is a unique identifier for each household. The 'post_policy' is a binary variable indicating whether the observation was made before or after the microloan policy was implemented, with 1 indicating post-implementation. The 'loan_received' indicates whether the household received a microloan under the new policy (1 = received, 0 = not received). The 'income_growth' measures the percentage change in household income. Among the covariates, 'household_income' is the household's initial income before the policy was implemented, and 'business_experience_years' is the number of years the household has been engaged in any kind of income-generating business. The 'rural_location' is a binary variable indicating whether the household is located in a rural area (1 = rural, 0 = urban), and 'previous_loan_status' indicates whether the household had any loans prior to the new policy (1 = had loans, 0 = no previous loans).",Can the new microloan program in developing countries result in substantial income growth for rural households?,6.597524646,did_canonical,did_canonical_data_31.csv,economics,loan_received,income_growth,"household_income, business_experience_years, rural_location, previous_loan_status",,post_policy,,,,
Synthetic Dataset,"This dataset was collected by a renowned healthcare research institute in order to study the impact of a new treatment for cardiovascular diseases. The 'patient_id' is a unique identifier for each patient involved in the study. The 'patient_age' represents the age of the patient at the time of the study and 'previous_visits' indicates the number of times a patient has visited the hospital in the past year. The 'smoker_status' is a binary variable indicating whether the patient is a smoker (1) or not (0), while 'diabetes_status' is also a binary variable signifying whether the patient has diabetes (1) or not (0). The 'new_treatment' is binary, indicating whether the patient received the new treatment (1) or not (0). The 'post_treatment' is binary as well, specifying whether the data was collected before (0) or after (1) the new treatment was introduced. The 'recovery_rate' measures the patient's recovery rate after receiving the treatment.",Will the introduction of a new treatment for cardiovascular diseases result in a higher recovery rate among patients?,8.602526314,did_canonical,did_canonical_data_18.csv,healthcare,new_treatment,recovery_rate,"patient_age, previous_visits, smoker_status, diabetes_status",,post_treatment,,,,
Synthetic Dataset,"The dataset was collected by a national political research institute to assess the impact of a new voting encouragement campaign on voter turnout. The aim was to understand the effectiveness of the campaign in different demographics, and inform future campaign strategies. Age ('age') and income ('income') represent the age of the voter and their annual income. Gender ('gender') is a binary variable where 1 represents female and 0 represents male. The previous voter status ('previous_voter_status') is a binary variable indicating whether the individual voted in the previous election, with 1 denoting they did and 0 they did not. The voting campaign participation ('voting_campaign_participation') is a binary variable where 1 indicates that the voter was exposed to the campaign and 0 indicates they were not. The voter turnout ('voter_turnout') is a composite metric that describes how politically active an individual is during elections. The post-campaign period ('post_campaign_period') is a binary variable indicating whether the data was collected before (0) or after (1) the campaign. Finally, 'voter_id' is a unique identifier for each voter in the dataset.",Can the new national voting encouragement campaign make voters more active?,6.487605019,did_canonical,did_canonical_data_22.csv,political_science,voting_campaign_participation,voter_turnout,"age, income, gender, previous_voter_status",,post_campaign_period,,,,
Synthetic Dataset,This dataset was collected by the National Health Agency during the rollout of a new COVID-19 vaccine. The aim of the study was to evaluate the vaccine's protective effects against COVID-19 infection. The 'post_vaccination' is a binary indicator denoting whether the observation was made before or after the vaccination (1 indicates post-vaccination). The 'vaccine_received' is a binary indicator for whether the individual received the vaccine or not. The 'covid_infection' indicates the degree to which the individual tested positive for COVID-19 after the vaccination period. The 'age' represents the age of the individual. The 'prior_covid_exposure' indicates the number of known COVID-19 exposures the individual had prior to the study. The 'pre_existing_conditions' is a binary variable indicating whether the individual has any pre-existing health conditions (1 indicates presence of such conditions). The 'healthcare_worker_status' is a binary variable indicating whether the individual is a healthcare worker (1 indicates that they are a healthcare worker).,Does the new COVID-19 vaccine reduce infection risk among individuals?,5.520623441,did_canonical,did_canonical_data_50.csv,healthcare,vaccine_received,covid_infection,"age, prior_covid_exposure, pre_existing_conditions, healthcare_worker_status",,post_vaccination,,,,
Synthetic Dataset,"This dataset was collected by a national legal research institute to study the impact of an innovative rehabilitation program on recidivism rates. The 'defendant_id' is a unique identifier for each participant in the study. The 'defendant_age' represents the age of the defendant. The 'previous_convictions' shows the number of previous convictions for each defendant. The 'legal_representation' is a binary variable that indicates whether the defendant had legal representation (1) or not (0). The 'financial_status' is a binary variable indicating whether the defendant has a stable financial status (1) or not (0). The 'rehabilitation_program' is a binary variable, indicating whether a defendant was subjected to the new rehabilitation program (1) or not (0). The 'post_intervention' is a binary variable indicating whether the observation was made before (0) or after (1) the rehabilitation program. Finally, 'reoffending_rate' measures the rate at which defendants reoffend.",Will the recent rehabilitation program reduce the rate of defendants reoffending?,6.920863413,did_canonical,did_canonical_data_9.csv,law,rehabilitation_program,reoffending_rate,"defendant_age, previous_convictions, legal_representation, financial_status",,post_intervention,,,,
Synthetic Dataset,"This dataset was collected by a mental health research institute that implemented a mindfulness-based stress reduction program. The aim was to study the impact of the program on participants' anxiety levels. The 'age' measures the age of individuals in the study. The 'stress_level' measures the self-reported stress level of the individuals on a scale from 1-10. The 'gender' is binary, with 1 representing female and 0 representing male. The 'previous_therapy_experience' is a binary variable indicating whether the participant had any form of therapy experience prior to the study, with 1 indicating 'Yes' and 0 indicating 'No'. The 'anxiety_level' measures the self-reported anxiety level of individuals on a scale from 1-10. The 'mindfulness_program_participation' is binary and represents whether or not the individual participated in the mindfulness-based stress reduction program, with 1 indicating participation and 0 indicating no participation. Finally, 'post_treatment' is a binary variable indicating whether the observation was made before or after the mindfulness-based stress reduction program, with 1 indicating 'Post-treatment' and 0 indicating 'Pre-treatment'.",Does participating in a mindfulness-based stress reduction program result in decreased anxiety levels in individuals?,2.68558006,did_canonical,did_canonical_data_45.csv,psychology,mindfulness_program_participation,anxiety_level,"age, stress_level, gender, previous_therapy_experience",,post_treatment,,,,
Synthetic Dataset,"This dataset was collected by the National Bureau of Social Research to evaluate the impact of a new state-sponsored internet access program on social media usage in the United States. The data was collected through a combination of surveys and internet usage data, before and after the program's implementation. The 'individual_income' represents the annual income of the individual in dollars, and 'individual_education' refers to the number of years of formal education completed by the individual. The 'rural_residence' is a binary variable that indicates whether the individual lives in a rural area (1) or not (0). The 'previous_internet_access' is binary and indicates whether the individual had internet access before the program (1) or not (0). The 'social_media_usage' measures the average number of hours per day the individual spends on social media. The 'internet_access_program' is binary and indicates whether the individual was in the area where the internet access program was implemented (1) or not (0). Lastly, 'post_intervention' is a binary variable indicating whether the data refers to the period after the program was implemented (1) or before (0).",Will the state-sponsored program to increase internet access lead to a rise in social media usage in rural areas?,3.888771063,did_canonical,did_canonical_data_49.csv,sociology,internet_access_program,social_media_usage,"individual_income, individual_education, rural_residence, previous_internet_access",,post_intervention,,,,
Synthetic Dataset,"This dataset was collected by the National Election Commission to study the impact of a new policy aimed at increasing voter turnout. The policy's implementation is represented by 'policy_implementation', a binary variable where 1 indicates that a citizen was exposed to the policy and 0 if not. The 'voter_turnout' represents the percentage of eligible voters who voted. The 'age' represents the age of the citizen. The 'income_level' indicates the annual income of the citizen. The 'gender' is a binary variable where 1 represents female and 0 represents male. The 'education_level' is a binary variable, with 1 indicating that the citizen has a college degree or higher and 0 otherwise. The 'post_policy_period' is a binary variable indicating whether the observation was made before (0) or after (1) the policy implementation. The 'citizen_id' is a unique identifier for each citizen in the dataset.",Can the new policy reduce voter disengagement?,9.005021911,did_canonical,did_canonical_data_14.csv,political_science,policy_implementation,voter_turnout,"age, income_level, gender, education_level",,post_policy_period,,,,
Synthetic Dataset,"The dataset was collected by a national education department to study the effect of a new online tutoring program on student performance. The tutoring program was rolled out across schools. The 'student_id' is a unique identifier for each student in the dataset. The 'average_income' represents the average income of the student's family. The 'school_rating' indicates the rating of the school the student attends. The 'private_school' is a binary variable, with '1' indicating that the student attends a private school and '0' indicating a public school. The 'online_tutoring' is a binary variable indicating whether the student has had any online tutoring aside from the new program, with '1' for yes and '0' for no. The 'student_performance' measures the student's academic performance. The 'tutoring_program' is a binary variable that indicates whether a student has received the new tutoring program, with '1' for yes and '0' for no. The 'post_intervention' is a binary variable indicating whether the observation was made before or after the tutoring program was introduced, with '1' for after and '0' for before.",Does the introduction of the new online tutoring program improve students' academic performance?,5.089956982,did_canonical,did_canonical_data_0.csv,education,tutoring_program,student_performance,"average_income, school_rating, private_school, online_tutoring",,post_intervention,,,,
Synthetic Dataset,"This dataset was collected by the National Health Insurance Agency to evaluate the impact of a new health insurance policy on the coverage of individuals. The aim was to identify the effectiveness of the policy in increasing insurance coverage among the population. The 'income_level' represents the annual income level of the individual in dollars. The 'age' indicates the age of the individual. The 'gender' is a binary variable, where 1 indicates male and 0 indicates female. The 'previous_insurance' is a binary variable indicating whether the individual had health insurance before the new policy was implemented, with 1 indicating 'Yes' and 0 'No'. The 'new_policy' is a binary indicating whether the individual was subjected to the new policy (1 for 'Yes', 0 for 'No'). The 'insurance_coverage' represents the duration of health insurance coverage for the individual in months. The 'post_policy_period' is a binary variable indicating the time period - before or after the new policy implementation, with 1 for 'post' (after) and 0 for 'pre' (before).",Will the new health insurance policy reduce coverage gaps among populations?,4.04566925,did_canonical,did_canonical_data_2.csv,policy,new_policy,insurance_coverage,"income_level, age, gender, previous_insurance",,post_policy_period,,,,
Synthetic Dataset,"A comprehensive dataset was collected by a team of psychologists at a mental health clinic to evaluate the impact of a novel cognitive behavioral therapy (CBT) on patients' anxiety levels. The dataset includes both pre- and post-therapy data, with the 'post_therapy' indicating whether the data was collected before (0) or after (1) the therapy. The 'therapy_enrolled' represents whether the individual was enrolled in the new therapy program (1) or not (0). The 'anxiety_score' represents the patient's anxiety level on a standardized scale. The dataset also incorporates several covariates: 'prior_therapy_sessions' represents the number of therapy sessions the patient had attended before the study; 'stress_level' indicates the patient's self-reported stress level; 'gender' is a binary variable where 1 represents female and 0 represents male; 'marital_status' is a binary variable where 1 indicates married and 0 indicates unmarried.",Does the new cognitive behavioral therapy program reduce anxiety levels in patients?,5.205932586,did_canonical,did_canonical_data_12.csv,psychology,therapy_enrolled,anxiety_score,"prior_therapy_sessions, stress_level, gender, marital_status",,post_therapy,,,,
Synthetic Dataset,"The dataset was collected by the Department of Housing and Urban Development (HUD) to study the effects of a new housing subsidy policy on affordable housing availability. The 'years_of_experience' represents the number of years the head of the household has been in their current job. The 'income_level' indicates the annual income of the household. The 'residential_status' is a binary variable, where 1 denotes a homeowner and 0 indicates a renter. The 'gender' is another binary variable, with 1 representing male and 0 representing female. The 'received_subsidy' indicates whether the household received a housing subsidy under the new policy, where 1 means they did, and 0 means they did not. The 'affordable_housing_availability' represents how easily the household was able to secure affordable housing. The 'post_policy_period' is a binary variable indicating whether the data is from before (0) or after (1) the policy implementation. The 'household_id' is the identifier for each unique household in the dataset.",Can the introduction of the new housing subsidy policy increase the availability of affordable housing?,5.232908576,did_canonical,did_canonical_data_11.csv,policy,received_subsidy,affordable_housing_availability,"years_of_experience, income_level, residential_status, gender",,post_policy_period,,,,
Synthetic Dataset,"This dataset was collected by the National Labor Bureau to study the impact of a minimum wage policy on the monthly income of individuals. Each individual in the dataset is identified by 'individual_id', a unique identifier. The 'post_policy' indicates whether the data was collected before or after the policy's implementation. A binary variable 'minimum_wage_policy' indicates whether the individual was subjected to the minimum wage policy. The 'monthly_income' represents the income of the individual in the month when the data was collected. The 'education_years' denotes the total years of formal education completed by the individual. The 'work_experience_years' represents the total years of work experience of the individual. The 'urban_residence' is a binary variable indicating whether the individual resides in an urban area (1) or a rural area (0). Lastly, 'part_time_job' is a binary variable, indicating whether the individual holds a part-time job (1) or a full-time job (0).",Will the nationwide minimum wage policy result in increased monthly income for individuals?,8.503736543,did_canonical,did_canonical_data_16.csv,sociology,minimum_wage_policy,monthly_income,"education_years, work_experience_years, urban_residence, part_time_job",,post_policy,,,,
Synthetic Dataset,"The dataset was collected by a group of psychologists studying the impact of a mindfulness-based stress reduction (MBSR) program on anxiety levels. The study involved 1000 participants, each measured at two time points - before and after the intervention. The 'anxiety_level_pre' represents the participant's anxiety level before the intervention, and 'anxiety_level_post' is the anxiety level after the intervention. The 'stress_level' denotes the level of stress in the participant's life. The 'gender' is a binary variable where 1 indicates female and 0 indicates male. The 'received_previous_therapy' is a binary variable indicating whether the participant had received any form of therapy before the study, with 1 for 'yes' and 0 for 'no'. The 'received_mindfulness_training' is a binary variable, with 1 indicating that the participant received the MBSR training and 0 indicating they did not. The 'post_measurement' is a binary variable indicating whether the observation was made before (0) or after (1) the intervention. Lastly, 'participant_id' is the unique identifier assigned to each participant.",Will participating in a mindfulness-based stress reduction program reduce anxiety levels?,9.275656599,did_canonical,did_canonical_data_20.csv,psychology,received_mindfulness_training,anxiety_level_post,"anxiety_level_pre, stress_level, gender, received_previous_therapy",,post_measurement,,,,
Synthetic Dataset,"This dataset was collected by a national legal research institute aiming to evaluate the impact of a new sentencing policy on the length of court sentences. The information was gathered over a period of time before and after the implementation of the policy, from a range of court records across the country. The 'defendant_age' is the age of the defendant at the time of sentencing, while 'previous_convictions' refers to the number of previous convictions the defendant has had. The 'legal_representation' is a binary variable, with 1 indicating that the defendant had legal representation and 0 indicating otherwise. The 'income_level' is also a binary variable, where 1 indicates that the defendant's income is above the median income level, and 0 indicates an income below the median. The 'sentence_length' is the length of the sentence received in months. The 'new_sentencing_policy' is a binary variable indicating whether the sentence was given after (1) or before (0) the implementation of the new sentencing policy. The 'defendant_id' is a unique identifier for each defendant, and 'post_policy_period' is a binary variable indicating whether the data was collected after (1) or before (0) the new policy was implemented.",Can the introduction of a new sentencing policy reduce the length of court sentences?,3.182645213,did_canonical,did_canonical_data_46.csv,law,new_sentencing_policy,sentence_length,"defendant_age, previous_convictions, legal_representation, income_level",,post_policy_period,,,,
Synthetic Dataset,"This dataset, collected by the Department of Education, evaluates the impact of a new online resources program on students' academic performance. The department rolled out an initiative to provide additional online learning resources to students and investigated its effects on student performance. The dataset includes the 'teacher_experience' which represents the number of years a teacher has been teaching. It also includes 'student_income_quartile', representing the income bracket of the student's family, divided into quartiles. The 'rural_school' is a binary variable indicating whether the school is located in a rural area (1) or not (0). The 'gender' is a binary variable, where 1 represents male and 0 represents female students. The 'student_performance' measures the students' academic performance. The 'online_resources_usage' is a binary indicator of whether the student used the online resources (1) or not (0). The 'post_intervention_period' is another binary variable indicating whether the observation was made before (0) or after (1) the online resources program was introduced. The 'student_id' is a unique identifier for each student.",Does the introduction of new online learning resources improve students' academic performance?,1.928909111,did_canonical,did_canonical_data_41.csv,education,online_resources_usage,student_performance,"teacher_experience, student_income_quartile, rural_school, gender",,post_intervention_period,,,,
Synthetic Dataset,"The dataset was collected by the Department of Education over a ten-year period to study the effect of a scholarship program on students' test scores. The scholarship program ('scholarship_received') was given to students based on their financial need and academic performance. The dataset includes students' test scores ('test_scores') for that year of study, family income ('family_income'), and the students' previous test scores ('previous_test_scores'). The binary variables are whether the student attends a private school ('private_school'; 1 if private, 0 if public) and whether the school is located in an urban setting ('urban_school'; 1 if urban, 0 if rural). The 'student_id' denotes the student's unique identifier, and the 'year_of_study' variable represents the academic year when the data was collected.",Does receiving a scholarship lead to an improvement in students' academic performance?,1.676092078,did_twfe,did_twfe_data_33.csv,education,scholarship_received,test_scores,"family_income, previous_test_scores, private_school, urban_school",,year_of_study,student_id,,,
Synthetic Dataset,"This dataset was collected by the International Political Science Association over a ten-year period from multiple countries worldwide. The objective was to investigate the impact of economic stimulus policies on the countries' economic growth rates. Each 'country_id' represents a unique country, while the 'year' indicates the year in which the data was collected. The variable 'policy_implementation' is a binary indicator denoting whether a country implemented an economic stimulus policy in a given year (1 if yes, 0 if no). The 'economic_growth_rate' represents the annual growth rate of a country's GDP. The 'gdp_per_capita' represents a country's GDP per capita for a given year. The 'unemployment_rate' represents the percentage of the total workforce that is unemployed and actively seeking employment. The 'government_type' is a binary variable indicating whether a country has a democratic (1) or non-democratic (0) government. Finally, 'leader_gender' is a binary variable indicating whether the country's leader is male (1) or female (0). Different countries implemented these policies at different times during the study period.",Does the introduction of economic stimulus policies result in a significant increase in the rate of economic growth in different countries worldwide?,5.57410808,did_twfe,did_twfe_data_30.csv,political_science,policy_implementation,economic_growth_rate,"gdp_per_capita, unemployment_rate, government_type, leader_gender",,year,country_id,,,
Synthetic Dataset,"This dataset was collected by the Department of Health over a ten-year period to investigate the impact of telehealth services on patient recovery rates in various hospitals. It includes variables such as patient age, hospital capacity, rural location, and whether the patient has private insurance. The variables are 'patient_age', which represents the average age of the patients in a given hospital, and 'hospital_capacity', which denotes the number of patients a hospital can accommodate at a time. The binary variables include 'rural_location', which represents whether the hospital is located in a rural area (1) or not (0), and 'private_insurance', which denotes whether a patient has private insurance (1) or not (0). The 'telehealth_services' is also binary and indicates whether telehealth services were implemented in the hospital (1) or not (0). The 'patient_recovery_rate' represents the percentage of patients who recovered in a given year. The dataset also includes a 'hospital_id' to identify each hospital and a 'measurement_year' to denote the year of data collection. Hospitals implemented telehealth services at different times throughout the decade.",Can the introduction of telehealth services in hospitals cause improved patient recovery rates?,6.388972446,did_twfe,did_twfe_data_42.csv,healthcare,telehealth_services,patient_recovery_rate,"patient_age, hospital_capacity, rural_location, private_insurance",,measurement_year,hospital_id,,,
Synthetic Dataset,"The dataset was gathered by the Department of Labor to study the impact of minimum wage increases on employees' income across different industries over a decade. The data was collected from real-world observation. The 'previous_year_income' signifies the income of the employee from the previous year. The 'education_years' represents the total years of formal education completed by the employee. The 'gender' is a binary variable where 1 means the employee is female, and 0 means the employee is male. The 'union_membership' is another binary variable where 1 denotes the employee is part of a labor union and 0 otherwise. The 'current_year_income' represents the employee's income for the current year. The 'minimum_wage_increase' indicates whether there was a minimum wage increase in that year (1 if yes, 0 if no). The 'employee_id' is a unique identifier assigned to each employee. The 'year_of_observation' denotes the year in which the data was recorded. Different states implemented minimum wage increases at various times during the study period.",How does the increase in minimum wage over the past decade impact employees' income?,3.170260227,did_twfe,did_twfe_data_10.csv,labor,minimum_wage_increase,current_year_income,"previous_year_income, education_years, gender, union_membership",,year_of_observation,employee_id,,,
Synthetic Dataset,"This dataset was collected by the U.S. Department of Justice over a ten-year period to examine the impact of providing free legal aid to defendants on the conviction rates in U.S. courts. The 'court_id' represents the unique identifier for each court. The 'year' denotes the year in which the data was measured. The 'legal_aid_program' is a binary variable indicating whether a free legal aid program was implemented at a specific court in a certain year (1) or not (0). The 'conviction_rate' shows the yearly conviction rate in each court. The 'crime_rate' signifies the annual crime rate in the court's jurisdiction, and 'average_income' represents the average annual income of residents in the court's jurisdiction. The 'urban_location' is a binary variable, with 1 denoting that the court is in an urban area and 0 indicating a rural area. The 'public_defender_availability' is a binary variable, with 1 denoting that public defenders are widely available and 0 indicating they are not. Different courts implemented legal aid programs at different times during the study period.",Does providing free legal aid to defendants result in decreased conviction rates in U.S. courts?,5.449371806,did_twfe,did_twfe_data_46.csv,law,legal_aid_program,conviction_rate,"crime_rate, average_income, urban_location, public_defender_availability",,year,court_id,,,
Synthetic Dataset,"This dataset was collected by a team of psychologists from a large mental health institute over a decade to study the effectiveness of a cognitive-behavioral therapy (CBT) program on reducing depression symptoms. The 'participant_id' is a unique identifier for each participant. The 'year' denotes the year of observation. The 'stress_level' represents the individual's assessed stress level. The 'cognitive_ability' indicates the cognitive ability of the participant, measured using standardized cognitive tests. The 'gender' is a binary variable, where 1 indicates female and 0 indicates male. The 'previous_therapy' is another binary variable, where 1 denotes previous therapy experience and 0 denotes no previous therapy experience. The 'cbt_participation' indicates whether the participant was exposed to the CBT program at time t (1 indicates participation, 0 indicates no participation). Finally, 'depression_score' represents the severity of depression symptoms measured using a standardized depression scale. Participants were enrolled in the CBT program at different times throughout the study period.",Does the introduction of a cognitive-behavioral therapy program reduce depression symptoms among participants?,3.616407375,did_twfe,did_twfe_data_53.csv,psychology,cbt_participation,depression_score,"stress_level, cognitive_ability, gender, previous_therapy",,year,participant_id,,,
Synthetic Dataset,"This dataset was collected over a period of ten years by the Department of Health and Human Services to examine the effects of smoking bans on lung cancer rates across different states. The variable 'state' represents the individual states where the data was collected. The variable 'year' indicates the year in which the data was recorded. The 'smoking_ban' is a binary variable indicating whether or not a smoking ban was in effect in a given state at a given time (1 for 'yes', 0 for 'no'). The 'lung_cancer_rate' represents the number of new lung cancer cases per 100,000 people in a given state-year. The 'gdp_per_capita' is the state's GDP per capita. The 'population_density' is the number of people per square mile. The 'urban_area' is a binary variable indicating whether the area is predominantly urban (1) or rural (0). The 'healthcare_spending' is a binary variable indicating whether the state's healthcare spending is above the national median (1) or not (0). States implemented smoking bans at different times during the study period.",Can implementing public smoking bans result in a decrease in lung cancer rates in the United States?,5.181822313,did_twfe,did_twfe_data_36.csv,policy,smoking_ban,lung_cancer_rate,"gdp_per_capita, population_density, urban_area, healthcare_spending",,year,state,,,
Synthetic Dataset,"This dataset was collected by the US Department of Labor to investigate the effects of a job training program on the hourly wages of workers over a decade. The 'worker_id' is a unique identifier assigned to each worker. The 'survey_year' indicates the year each data point was measured. The 'job_training_program' is a binary variable indicating whether the worker participated in the job training program in a particular year, with 1 indicating participation and 0 indicating no participation. The 'hourly_wage' represents the hourly earnings of the worker. The 'education_years' represents the total years of formal education completed by the worker. The 'work_experience_years' represents the number of years the worker has been in the labor force. The 'union_membership' is a binary variable indicating whether the worker is a member of a labor union, with 1 indicating membership and 0 indicating non-membership. The variable 'gender' is a binary variable where 1 represents male and 0 represents female. Workers enrolled in training programs at various times throughout the decade.",Does participation in job training programs result in increased workers' wages?,5.814700209,did_twfe,did_twfe_data_19.csv,labor,job_training_program,hourly_wage,"education_years, work_experience_years, union_membership, gender",,survey_year,worker_id,,,
Synthetic Dataset,"The dataset was collected by the U.S. Department of Justice over a ten-year period. The 'county_code' represents the unique identification code assigned to each county. The 'year' indicates the year when the data was recorded. The 'legal_aid_availability' is a binary variable indicating whether the county provided free legal aid to defendants that year (1 if yes, 0 if no). The 'conviction_rate' represents the rate of convictions per 1000 people in the county. The 'median_income' denotes the median income of the county's residents for that year. The 'crime_rate' represents the crime rate per 1000 people in the county. The 'urban_location' is a binary variable indicating whether the county is predominantly urban (1 for yes, 0 for no). The 'public_defender_availability' is a binary variable indicating whether the county has a public defender's office (1 for yes, 0 for no). Counties introduced free legal aid programs at different times during the study period.",Does the introduction of free legal aid in some counties reduce conviction rates?,5.084855099,did_twfe,did_twfe_data_9.csv,law,legal_aid_availability,conviction_rate,"median_income, crime_rate, urban_location, public_defender_availability",,year,county_code,,,
Synthetic Dataset,"The Department of Education collected this dataset over a ten-year period to evaluate the impact of implementing e-learning platforms on average student test scores across various high schools. The 'school_id' denotes the unique identification number assigned to each school. The 'academic_year' represents the specific academic year in which the data was recorded. The 'e_learning_implemented' is a binary variable indicating whether the school implemented an e-learning platform in that particular year (1 if implemented, 0 if not). The 'average_test_score' represents the average test score of all students in the school for that academic year. The 'average_class_size' indicates the average number of students per class in the school for a given year. The 'school_funding' represents the amount of funding provided to the school in a particular academic year. The 'urban_location' is a binary variable indicating whether the school is located in an urban area (1 if yes, 0 if no). Finally, 'high_school_level' is a binary variable indicating whether the school is a high school (1 if yes, 0 if no). Schools implemented e-learning platforms at different times throughout the study period.",Can the introduction of online learning platforms improve students' test scores in high schools across the country?,8.515843723,did_twfe,did_twfe_data_58.csv,education,e_learning_implemented,average_test_score,"average_class_size, school_funding, urban_location, high_school_level",,academic_year,school_id,,,
Synthetic Dataset,"This dataset, collected by the U.S. Bureau of Justice Statistics over a ten-year period, explores the impact of prisoner rehabilitation programs on recidivism rates across different states. The 'state' represents the individual states surveyed during the study period. The 'survey_year' is the year in which the data was gathered, ranging from 0 to 9. The 'rehabilitation_program' is a binary variable indicating whether or not the state implemented a rehabilitation program for prisoners in that year (1 if the program was implemented, 0 otherwise). The 'recidivism_rate' represents the percentage of released prisoners who reoffend within three years of release. The 'unemployment_rate' represents the state's unemployment rate for the corresponding year. The 'average_income' is the state's average annual income per capita for the year. The 'urban_state' is a binary variable indicating whether the state is predominantly urban (1) or rural (0). Lastly, 'private_prison_presence' is a binary variable indicating whether there are private prisons in the state (1 if present, 0 otherwise). States implemented rehabilitation programs at different times during the study period.",Will implementing rehabilitation programs for prisoners reduce recidivism?,5.919573856,did_twfe,did_twfe_data_44.csv,policy,rehabilitation_program,recidivism_rate,"unemployment_rate, average_income, urban_state, private_prison_presence",,survey_year,state,,,
Synthetic Dataset,"This dataset was collected by a team of psychologists over a 10-year period to study the impact of a newly developed cognitive-behavioral therapy (CBT) on depression scores among participants. The 'participant_id' is a unique identifier assigned to each participant in the study. The 'year_of_study' represents the year in which the data was collected, ranging from year 0 to year 9. The 'therapy_participation' is a binary variable indicating whether a participant was enrolled in the new CBT program in that year (1 for yes, 0 for no). The 'depression_score' indicates the severity of depressive symptoms as assessed by a standardized depression scale. The 'cognitive_ability' represents the participant's score on a cognitive ability test. The 'age' indicates the participant's age at the time of the study. The 'gender' is a binary variable (1 for female, 0 for male). The 'previous_therapy_experience' is a binary variable indicating whether the participant has had any prior exposure to any form of psychotherapy (1 for yes, 0 for no). Participants enrolled in the CBT program at different times during the study period.",Does participating in a newly developed cognitive-behavioral therapy program reduce the severity of depressive symptoms?,5.22838168,did_twfe,did_twfe_data_45.csv,psychology,therapy_participation,depression_score,"cognitive_ability, age, gender, previous_therapy_experience",,year_of_study,participant_id,,,
Synthetic Dataset,"This dataset was collected by the Department of Justice over a ten-year period to investigate the impact of offering free legal aid to defendants on the conviction rate in different counties across the country. The 'county_id' is a unique identifier for each county. The 'year' denotes the year of observation. The 'legal_aid_availability' is a binary variable indicating whether free legal aid was made available in that county during the year of observation (1 for yes, 0 for no). The 'conviction_rate' represents the proportion of cases resulting in a conviction. The 'average_income' indicates the average per capita income in the county. The 'crime_rate' represents the number of reported crimes per 1,000 residents in the county. The 'urban_county' is a binary variable indicating whether the county is classified as urban (1) or rural (0). Lastly, 'public_defender_availability' is a binary variable indicating whether the county has a public defender's office (1 for yes, 0 for no). Counties implemented legal aid programs at different times during the study period.",Can providing free legal aid to defendants in various U.S. counties affect conviction rates over a decade?,3.031451993,did_twfe,did_twfe_data_3.csv,law,legal_aid_availability,conviction_rate,"average_income, crime_rate, urban_county, public_defender_availability",,year,county_id,,,
Synthetic Dataset,"This dataset was gathered by the Department of Labor over a decade to analyze the impact of job training programs on the income of employees. The 'education_years' records the total years of formal education completed by the employee. The 'previous_income' captures the employee's income from the previous year. The 'gender' is binary, with 1 indicating a female employee and 0 indicating a male employee. The 'union_membership' is also binary, with 1 representing union membership and 0 representing non-membership. The 'current_income' records the employee's income for the current year. The 'job_training' is a binary indicator of whether or not the employee participated in a job training program in the current year, with 1 indicating participation and 0 indicating non-participation. The 'employee_id' uniquely identifies each employee. The 'observation_year' denotes the year in which the data was recorded. Employees participated in job training programs at different times throughout the decade.",Will undergoing job training programs result in an increase in the yearly income of employees?,6.332315453,did_twfe,did_twfe_data_51.csv,labor,job_training,current_income,"education_years, previous_income, gender, union_membership",,observation_year,employee_id,,,