Synthify

14 Jul 2023 (modified: 01 Aug 2023) InvestinOpen 2023 OI Fund Submission
Funding Area: Critical shared infrastructure
Problem Statement: Surveys are neither representative nor accurate. Economists, computational social scientists, pollsters, and other researchers across the US spend thousands of dollars every year on survey platforms such as Prolific and MTurk to study problems in areas such as consumer behavior and psychology. Respondents on these platforms are predominantly White and far from representative of the population; further, most respondents put little thought into their answers. Drawing on a limitless sample of online, artificially intelligent personas, we have designed the GPT-powered open source survey system of the future: Synthify.
Proposed Activities: Our product addresses three major problems in survey design: representation, sample size, and cost. First, because our platform draws on thousands of online personas to synthetically complete surveys using a large language model, studies launched through it can be far more representative than those on manual survey platforms like Prolific, which are dominated by White males. For studies targeted at certain demographic groups, researchers can easily filter personas on our platform, something that is not possible on manual platforms where certain demographic groups are scarce or even nonexistent. Second, with our platform, the potential sample size for studies grows substantially. In the past, survey sample size was limited to those on the platform who were actually willing to complete the survey. With our platform, researchers can simply select from the much larger pool of those who have created an account on the site. Third, our platform saves both researchers and participants time and, hence, money. Participants only have to complete a one-time demographic survey, and researchers only have to cover the cost of a GPT call, which is usually over a hundred times cheaper than paying individual participants through Prolific.
Here is a rough timeline for Synthify:
July 2023 - October 2023: Pilot phase within the Wharton School. We have partnered with several researchers within the Wharton School to evaluate the efficacy of synthetic data when placed alongside real survey data. We are piloting our software on political survey questions by comparing the synthetic data we generate from demographics to the corresponding responses of real people. We are also applying the same methodology to consumer psychology applications.
November 2023 - February 2024: Building survey data center. Once we have solidified a core user base within Wharton, we will go public, allowing any researchers or participants (who will volunteer their demographics to represent a "persona") to register on our site. Researchers will be able to recruit an appropriate sample of users for their study, and those users' demographics will be used to synthetically complete surveys. Past surveys, responses, and useful model prompts will all be publicly available to any registered user.
March 2024 - Beyond: Transition to private model. We plan to expand beyond surveys to build a general "persona" data center where researchers can communicate with personas via our software (with explicit consent). Response data will be fed into a secure, custom large language model, developed by a Y Combinator startup we have partnered with, to generate formatted, useful data.
Expertise required: We plan to hire additional web developers to improve the user experience of our software. We also plan to create an AI research team to analyze which prompts are most useful for extracting synthetic data.
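As a rough illustration of the workflow described under Proposed Activities (filter personas by demographics, prompt a language model to answer as each persona, then compare synthetic answers with real ones), here is a minimal sketch. Everything in it is an assumption made for illustration: the `Persona` fields, the function names, the prompt wording, and the sample data are hypothetical and are not taken from the Synthify code base. The language model call itself is left out, and total variation distance is just one possible way to compare two answer distributions, not necessarily the metric used in the pilot.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona record built from a one-time demographic survey."""
    persona_id: str
    age: int
    gender: str
    ethnicity: str
    region: str


def filter_personas(personas, **criteria):
    """Keep personas whose attributes equal every given criterion."""
    return [
        p for p in personas
        if all(getattr(p, field) == value for field, value in criteria.items())
    ]


def build_prompt(persona, question, options):
    """Compose a text prompt asking a language model to answer as the persona."""
    choices = "\n".join(f"{i}. {opt}" for i, opt in enumerate(options, start=1))
    return (
        f"You are a {persona.age}-year-old {persona.ethnicity} {persona.gender} "
        f"from the {persona.region} region.\n"
        f"Question: {question}\n{choices}\n"
        "Answer with the number of one option."
    )


def total_variation_distance(synthetic, real):
    """Half the L1 distance between two answer distributions (0 = identical)."""
    syn, rl = Counter(synthetic), Counter(real)
    answers = set(syn) | set(rl)
    return 0.5 * sum(
        abs(syn[a] / len(synthetic) - rl[a] / len(real)) for a in answers
    )


# Made-up example data, for illustration only.
personas = [
    Persona("p1", 34, "woman", "Hispanic", "Southwest"),
    Persona("p2", 58, "man", "White", "Midwest"),
    Persona("p3", 27, "woman", "Black", "Southeast"),
]
sample = filter_personas(personas, gender="woman")
prompt = build_prompt(sample[0], "Do you support policy X?", ["Yes", "No"])
distance = total_variation_distance(
    ["Yes", "Yes", "No", "No"],  # placeholder synthetic answers
    ["Yes", "No", "No", "No"],   # placeholder real answers
)
```

In a real deployment, `prompt` would be sent to the chosen language model once per selected persona, and the returned answers would replace the placeholder synthetic list before computing the distance.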
Openness: The entirety of our project is open source. Anyone can create a user profile and provide their demographic information, which will be used to generate data (and is anonymized at the user's request). Any researcher can create a researcher profile and run a study to generate synthetic data. We plan to create a data center where all past synthetic surveys and useful prompts are publicly available. All user personas are also available to all researchers, with no data restrictions or paywalls, which allows fast research progress for anyone.
Challenges: One of our challenges is attracting a user base large enough for our platform to compete with conventional survey platforms, where users are paid to fill out surveys. We would need to go through the appropriate legal procedures to get authorization to pay people for providing their demographic information. In the short term, GPT calls are cheap, but eventually we will need a source of revenue or funding to keep the platform free of paywalls.
Neglectedness: We are currently supported by Microsoft for Startups, through which we have received access to Azure products in addition to OpenAI credits. If successful, this funding will increase to about 120k. We have applied to Y Combinator and VIP-X and are awaiting evaluation.
Success: We want our platform to be used in social science departments at every major university to improve representation and accuracy in research. Finding representative data has always been the hardest aspect of social science research, and our platform is designed to solve this problem. We are also looking to integrate our software into companies' existing tools that evaluate consumer behavior and preferences with online surveys. There are also many applications of our tool in applicant tracking systems (ATS) and online interviewing, where resumes need to be parsed automatically.
Total Budget: 10000
Budget File: pdf
Affiliations: Sponsored by Microsoft for Startups. Working with The Wharton School.
LMIE Carveout: N/A
Team Skills: Our greatest asset is that this team has been working together on research projects since we were 14 years old. While interning at NASA, we immediately connected, building a malaria-detection edge computing device that was awarded an international research grant and earned publication in a top electrical engineering journal. Since then, we have learned how to fully harness each other's skill sets and manage a team, working on several successful projects over the years. Our team has years of experience in machine learning, from natural language processing to species distribution modeling. We have web development experience in many different frameworks, and we all have numerous projects related to data science, with applications in economics, ecology, and aerospace. Our broad programming skill sets give us the leverage to scale rapidly. With several research publications and connections at many major US universities, we know what it takes to gain traction in academia and, most importantly, what researchers need. We also have extensive experience in pitching ideas to VCs, companies, academics, science fair judges, and anyone who will listen. We are experts in getting people excited about new, innovative products. This will enable us to grow as a company and as innovators. We are sponsored by Microsoft for Startups and are collaborating with the Wharton School. We have connections and additional funding opportunities within the M&T program alumni network through UPenn.
How Did You Hear About This Call: Word of mouth (e.g. conversations and emails from IOI staff, friends, colleagues, etc.)
Submission Number: 19