Delivering quality data is a foremost objective at IntelliSurvey. This principle guides our teams throughout the survey life cycle. We also offer our proprietary data-cleaning system, CheatSweep™, for all surveys hosted on our platform at no additional cost. CheatSweep is an internal-facing IntelliSurvey algorithm that weighs approximately twenty criteria to assess the probability that a respondent is cheating on a survey. While we do not share CheatSweep data with our client partners, there are some concepts and terms to be aware of.
What is CheatSweep?
CheatSweep is a two-step automated system for handling data quality within the IntelliSurvey platform. These steps are called scoring and sweeping. First, CheatSweep uses a mix of signals to score the likelihood that a respondent is fraudulent. Then, that score can be used to automatically remove bad records from the sample.
How is CheatSweep deployed?
CheatSweep scoring operates at the survey level, so ample data must be collected before the algorithm can accurately estimate response scores. Scoring begins automatically once the default minimum of n=50 is met. This minimum can be adjusted, albeit rarely, for special circumstances. Once the minimum has been met, scores are reassessed continually at a set interval. Sweeping must be enabled by an IntelliSurvey Project Manager in consultation with the client. Typically, sweeping is initiated after n=100 has been achieved. Sweeping occurs while the survey is in field so that sampling can replace removed records in the desired quotas. There are two methods for sweeping.
The first (and most common) means of sweeping is based on a threshold. Assume CheatSweep scoring occurs on a 0 to 100 scale, with 100 being a high probability the respondent is fraudulent. The threshold method sets a value at which records with a CheatSweep score above it are removed (e.g., remove responses with a score above 80). The alternative sweeping technique is to remove by percentage. This tactic removes a designated percentage of responses with the highest scores (e.g., remove the worst 5% of responses). This latter technique should only be applied if the sample quality is well understood, because CheatSweep scores are not constrained to be normally distributed and the method removes a fixed share of records regardless of how many are actually suspect (e.g., when the overall sample is very good, a 5% removal means good records will be tossed).
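The difference between the two sweeping methods can be illustrated with a minimal sketch. This is not the CheatSweep implementation: the function names, the record IDs, and the sample scores are all hypothetical, and the 0 to 100 scale is the one assumed above.

```python
import math

def sweep_by_threshold(scores, threshold=80):
    """Hypothetical helper: IDs of records scoring strictly above the threshold."""
    return {rid for rid, score in scores.items() if score > threshold}

def sweep_by_percentage(scores, percent=5):
    """Hypothetical helper: IDs of the highest-scoring `percent` of records."""
    n_remove = math.floor(len(scores) * percent / 100)
    # Rank record IDs from highest score to lowest and take the top slice.
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_remove])

# Made-up "very good" sample: 100 records, every score is below 30.
clean_sample = {f"r{i}": i % 30 for i in range(100)}

removed_by_threshold = sweep_by_threshold(clean_sample, threshold=80)
removed_by_percentage = sweep_by_percentage(clean_sample, percent=5)

print(len(removed_by_threshold))   # no record exceeds 80, so nothing is removed
print(len(removed_by_percentage))  # five records are removed anyway
```

In this made-up sample the threshold method leaves every record in place, while the percentage method still discards five low-scoring records, which is the risk described above.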
CheatSweep can also accommodate special circumstances. For instance, surveys with a mixture of customer list sample and panel sample can have CheatSweep tuned to never remove customer list respondents. CheatSweep can also penalize respondents who are detected to be in other countries, by either increasing their CheatSweep score or blocking them altogether depending on the client’s preference.
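As a rough illustration of how such tuning could interact with threshold sweeping, consider the sketch below. Everything in it is an assumption made for the example: the `Respondent` fields, the `records_to_sweep` function, the size of the penalty, and the choice to let the customer list exemption take precedence over the country rule. It does not describe how CheatSweep is actually built.

```python
from dataclasses import dataclass

@dataclass
class Respondent:
    rid: str
    score: float   # assumed 0 to 100 scale
    source: str    # "panel" or "customer_list" (hypothetical labels)
    country: str   # detected country code

def records_to_sweep(respondents, threshold, allowed_countries,
                     penalty=20.0, block_foreign=False):
    """Return the IDs that a threshold sweep would remove under these rules."""
    removed = set()
    for r in respondents:
        # Assumption: customer list respondents are never removed.
        if r.source == "customer_list":
            continue
        score = r.score
        if r.country not in allowed_countries:
            if block_foreign:
                # Client preference: block out-of-country respondents outright.
                removed.add(r.rid)
                continue
            # Otherwise raise the score by the penalty, capped at 100.
            score = min(100.0, score + penalty)
        if score > threshold:
            removed.add(r.rid)
    return removed

sample = [
    Respondent("a", 90, "customer_list", "US"),  # exempt despite high score
    Respondent("b", 90, "panel", "US"),          # above threshold
    Respondent("c", 65, "panel", "BR"),          # 65 + 20 = 85, above threshold
    Respondent("d", 40, "panel", "US"),          # kept
]

swept = records_to_sweep(sample, threshold=80, allowed_countries={"US"})
print(sorted(swept))
```

With a threshold of 80 and a 20-point penalty, records "b" and "c" are removed, while "a" stays because it comes from the customer list.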
How does the survey impact data quality?
While we keep the specific elements of CheatSweep under our hats, we would be remiss not to mention the important role survey methodology plays in data quality. Appropriate screening criteria are essential for good data. Including sound survey design practices in your survey (e.g., reverse-worded items in Likert scale tables) may also improve CheatSweep scoring. CheatSweep automatically makes use of the survey elements relevant to it; no client instruction is needed.