Duplicates Are Bad
When managing online sample, it’s critical to prevent duplicate respondents. Having the same respondent answer your survey multiple times damages your data in several ways. Just a few of them are:
- Each duplicate respondent effectively reduces your valid base size by one, while doubling their own weight.
- Repeat respondents are biased because they’ve already been exposed to the survey and aren’t reacting to seeing concepts for the first time.
- Previously terminated respondents might have learned from prior attempts how to navigate the survey in order to get their incentive.
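To make the first point concrete, here is a minimal sketch of the arithmetic in Python. It assumes each complete carries a reliable respondent ID, which is exactly what is hard to get in practice; the function name `duplicate_impact` and the sample IDs are hypothetical and only for illustration.

```python
from collections import Counter

def duplicate_impact(respondent_ids):
    """Summarize how repeat completes distort a sample.

    Assumes respondent_ids holds one (hypothetical) stable ID per complete.
    """
    counts = Counter(respondent_ids)
    total = len(respondent_ids)
    unique = len(counts)
    return {
        "completes": total,                    # what you paid for
        "unique_respondents": unique,          # your real base size
        "wasted_completes": total - unique,    # completes that add no new person
        # largest share of the data contributed by any one person
        "max_share_of_data": max(counts.values()) / total if total else 0.0,
    }

# Five completes, but "r2" took the survey twice.
completes = ["r1", "r2", "r3", "r2", "r4"]
summary = duplicate_impact(completes)
```

In this toy data, five completes represent only four people, and the repeat respondent accounts for two of the five records while everyone else accounts for one.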
How Do They Get In?
How do duplicates get into your survey? There are several ways the same respondent might end up trying to take the same survey multiple times.
Multiple Independent Sources
If you’re using multiple sample sources that do not filter through the same platform, they’re not talking to each other about who’s already taken your survey. One source may not even know whether its respondents are in the other’s supply. Remember, the sample industry isn’t a straight line from respondent to sample house to survey. It’s a web: a respondent signs up for a front-facing panel where they collect rewards, that panel passes the respondent to direct suppliers and sub-suppliers, who pass them to other suppliers, and eventually the respondent is sent to your survey. Some respondents belong to multiple front-facing panels, so back-end suppliers that aren’t careful might purchase the same respondent from multiple sources. So even if you tell Supplier X not to use sample from Supplier Y, Supplier X and Supplier Y might both be using Supplier Z and sending the same respondents to your survey.
Single Sourcing Can Still Be A Challenge
Even if you’re using a single sample source, some suppliers are more effective at deduplicating their own sample than others. Again, most suppliers aren’t using their own proprietary panel where each unique panelist is sitting in a database waiting to be invited to a survey. Each supplier is likely going through multiple sub-suppliers, so there’s potential for those sub-suppliers to send the same respondent to the survey. Matching respondents across multiple sources is a complex problem, and naturally some suppliers are going to be better at it than others.
And You Always Have Fraud
Finally, there’s outright fraud. Whenever there’s an opportunity to make money on the internet, there will inevitably be people trying to take advantage of that. So even if you have a single database of “unique” respondents, there will be bad actors signing up multiple times under multiple email addresses, from multiple devices, doing their best to obscure the fact that they are signed up multiple times.
How Do You Define A Duplicate?
On the flip side, how do you tell if a duplicate is really a duplicate? If multiple people in the same household are responding to the same survey, are they being flagged as duplicates? Or in a B2B setting, are multiple people taking the survey on a shared device? Too many false positives, meaning valid respondents who are wrongly blocked as duplicates, can frustrate respondents, lower incidence, and make it harder to field studies with hard-to-reach audiences.
Duplicates Can Be Valid
Additionally, there are times when you want to allow a respondent back into the survey. If a respondent drops out, you might want to allow them back in. And if you allow them back in, you need to consider whether they can pick up where they left off or need to start over. If you loosen your screening criteria or open quotas that were previously closed, you might reinvite respondents who were turned away when their quota was full, and you need to allow them back in. And if you’re doing a tracking study, you may want to exclude past participants from the most recent wave(s) while allowing past participants from older waves in.
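The re-entry cases above can be written down as an explicit rule set. The following Python sketch is one possible encoding, not a description of any particular platform: the status labels, the `PriorAttempt` record, the `allow_reentry` function, and the `lockout_waves` parameter are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PriorAttempt:
    status: str  # assumed labels: "complete", "dropout", "quota_full", "terminated"
    wave: int    # tracking wave in which the earlier attempt happened

def allow_reentry(prior: Optional[PriorAttempt], current_wave: int,
                  quotas_reopened: bool = False, lockout_waves: int = 1) -> bool:
    """Decide whether a returning respondent may enter the survey."""
    if prior is None:
        return True  # never seen before
    if prior.wave < current_wave:
        # Tracking study: only past completes are locked out, and only
        # if they fall within the most recent `lockout_waves` waves.
        if prior.status != "complete":
            return True
        return current_wave - prior.wave > lockout_waves
    # Same wave from here on.
    if prior.status == "dropout":
        return True  # let them resume or restart
    if prior.status == "quota_full":
        return quotas_reopened  # only once the quota is open again
    return False  # completes and terminates stay out

# A wave-3 complete is blocked from wave 4, but a wave-2 complete is allowed.
blocked = allow_reentry(PriorAttempt("complete", 3), current_wave=4)
allowed = allow_reentry(PriorAttempt("complete", 2), current_wave=4)
```

Whether a returning dropout resumes or starts over is a separate decision that this sketch leaves out.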
What Does Brookmark Do?
So how do you accurately identify and keep duplicate respondents out of your data without being too aggressive? It’s tempting to look at things like IP address or the IDs from the sample house to identify duplicates after the fact. But IP address isn’t a reliable indicator, since several people in one household or office can share an address. And sample house IDs are often unique per transaction, not necessarily per respondent, and wouldn’t account for respondents from other sample houses. Cookies are another common way to block duplicate respondents, but more and more users are disabling cookies on their devices. We’ve found that the best way to identify duplicates and block them is to rely on a technology partner that specializes in fraud and deduplication.
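As a toy illustration of why the after-the-fact checks fall short, the Python sketch below runs a naive "seen it before" check on different fields. The records are invented, the `person` field is a ground truth you would not have in real data, and `flag_repeats` is a hypothetical helper, not any vendor's method.

```python
# Invented records; "person" is ground truth that real data would not contain.
records = [
    {"person": "alice", "ip": "203.0.113.5",  "txn_id": "X-001"},
    {"person": "bob",   "ip": "203.0.113.5",  "txn_id": "X-002"},  # same household as alice
    {"person": "carol", "ip": "198.51.100.7", "txn_id": "X-003"},
    {"person": "carol", "ip": "192.0.2.44",   "txn_id": "Y-104"},  # carol again, other supplier
]

def flag_repeats(rows, key):
    """Return the indexes of rows whose value for `key` appeared in an earlier row."""
    seen = set()
    flagged = []
    for i, row in enumerate(rows):
        if row[key] in seen:
            flagged.append(i)
        seen.add(row[key])
    return flagged

by_ip = flag_repeats(records, "ip")        # flags bob, a valid respondent
by_txn = flag_repeats(records, "txn_id")   # flags nobody
truth = flag_repeats(records, "person")    # the one real duplicate: carol's second entry
```

In this example the IP check blocks a valid household member and misses the real duplicate, while the per-transaction ID check catches nothing at all.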
About Brookmark Research:
Brookmark is a data-driven consultancy focused on developing powerful marketing strategies to fuel the growth of our clients’ businesses. We leverage primary marketing research and first-party, third-party, and proprietary data sources, along with proprietary tools and strategy frameworks, to develop, test, and evaluate marketing-driven strategies that deliver exponential ROI. Whether you are an early-stage growth company or a Fortune 1000 company, we help successful companies accelerate growth and stagnant companies reinvigorate it.