Data Capture Services

DataForce Data Capture Services Information, Tips, and How-tos

Your Paper-based Data Collection Options Just Got Better

If your mission is to find answers to your research objectives, improve performance, or generate new ideas and improvements, then you may need to conduct research and produce statistical analysis to support that mission. Doing so often means capturing data from paper surveys, and if you don’t choose the right method for your project, you may end up over budget or facing data quality problems. To help you avoid this, we will review four options for capturing data from paper questionnaires so you can choose the best one for your project. You may be surprised by option 4.

Option 1 – Manual key entry by human operators. 

This option is great for small volumes, but manually entering data into digital format, usually through spreadsheets, can quickly become a time-consuming and tedious task. Typically, if you have more than 500 surveys, you will likely be better off automating data collection, although it really depends on the length of the survey. Manual key entry by human operators is also very prone to errors, and you will often need double key entry to ensure accuracy, which doubles your labor costs.
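Double key entry means two operators key the same forms independently, and every field where their entries disagree is checked against the paper form. Here is a minimal Python sketch of that comparison step, assuming each pass is stored as a dictionary that maps a hypothetical survey ID to its keyed field values; the function name and data layout are illustrative, not part of any DataForce product.

```python
def compare_double_entry(first_pass, second_pass):
    """Compare two independent keyings of the same surveys.

    Each pass maps a survey ID to a dict of {field name: keyed value}.
    Returns (survey_id, field, first_value, second_value) for every field
    where the two operators disagree; an adjudicator then resolves each
    discrepancy against the paper form.
    """
    discrepancies = []
    for survey_id in sorted(set(first_pass) | set(second_pass)):
        a = first_pass.get(survey_id, {})
        b = second_pass.get(survey_id, {})
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                discrepancies.append((survey_id, field, a.get(field), b.get(field)))
    return discrepancies


# Hypothetical keyings of two surveys; the operators disagree on S002, Q2.
first = {"S001": {"Q1": "2", "Q2": "4"}, "S002": {"Q1": "1", "Q2": "3"}}
second = {"S001": {"Q1": "2", "Q2": "4"}, "S002": {"Q1": "1", "Q2": "5"}}
print(compare_double_entry(first, second))  # [('S002', 'Q2', '3', '5')]
```

Because every field is keyed twice, the labor roughly doubles, which is the cost trade-off described above.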

Option 2 – Purchasing and maintaining data collection software.

This option can get very costly depending on the number of surveys you process annually. A decent standalone software package with ICR and OCR capabilities will cost around $15,000, plus annual support, which typically runs 18-20% of the license cost each year. If you have high volume (even just for short periods of time) or need a networked system, the cost can quickly climb past $100,000. This also doesn’t cover the cost of servers, the employees who need to learn and use the software, and so on. It adds up very quickly and only makes sense if you are processing hundreds of thousands or more than a million images annually. In short, purchasing and maintaining data collection software can require a high initial spend, technical know-how, staff training, and ongoing support costs that are unlikely to be reasonable for a one-time project.
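To put the support figure in perspective, here is a rough Python calculation of license plus support over a few years. It assumes support is billed every year, starting in year one, at a flat percentage of the $15,000 license price; servers, staff time, and training are deliberately left out, as noted above.

```python
def software_cost(license_price, support_rate, years):
    """Rough license-plus-support cost of owned capture software.

    Assumption: support is billed every year, starting in year one, as a
    fixed percentage of the license price. Servers, staff time, and
    training are not included.
    """
    return license_price + license_price * support_rate * years


for rate in (0.18, 0.20):
    print(f"{rate:.0%} support, 3 years: ${software_cost(15_000, rate, 3):,.0f}")
# 18% support, 3 years: $23,100
# 20% support, 3 years: $24,000
```

Even before hardware and staff, a three-year horizon adds roughly half the license price again in support.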

Option 3 – Utilize a service bureau.

This is a great option if you only have a couple of projects a year, or if you simply don’t want to manage the process. Just make sure that you understand the bureau’s processes and that they align with your expectations. Does your survey contain PII or PHI? If it does, what security measures does the bureau apply to your data? Are they using overseas resources? Are their employees trained in HIPAA, and are they compliant with HITRUST or any other privacy standard you require? Make sure you use a service bureau that has the data collection methods, technology, security, and experience to process your research surveys and deliver precise results based on your coding and output specifications.

Option 4 – Rent the software.

With advances in technology, DataForce can offer you the option to scan your surveys remotely and securely into our data collection software, which is available on a month-to-month or project-to-project basis. You can scan your surveys locally using the “rented” software and any image scanner, and either your staff or the bureau’s staff can do the verification. (Verification means reviewing fields that fall out of tolerance so that a human operator can apply the established rules.) With data collection software available on a short-term basis, paper-based data collection becomes fairly painless, and you only pay for the images that are processed. When you rent the software, you can select the most appropriate data capture methodologies and processes to complete your survey project on time and within budget.

DataForce will provide you with the training, survey scanning software, and support to gather the information your organization needs in the convenience of your own premises. No long-term commitments or costly annual maintenance. Learn how!

Creating a Data Schema

At long last, you’ve made it to the data collection stage of your survey project. It’s time to warm up the automated data collection equipment, make sure everything is programmed correctly and prepare for the results to come in.

As with each stage of survey administration, there is some prep work needed to ensure accurate outcomes. In the case of automated data collection, it all begins with the Data Schema.   

A Data Schema is a blueprint of what all the numbers mean in the data file you will get with your results (see chart below). The good news is that you get to design this to your liking.

You will assign a value to each response (e.g., “1 = Daily”; “2 = Several times a week”; etc.). We recommend that you include values for “blank” and “multi marks,” shown in the chart below as -9 and -8, respectively. You will also want to include ranges where applicable. For example, if you are surveying teenagers and asking the year they were born, you can put a range on the years you are expecting. If a year falls out of range, the automation will stop so that an operator can confirm the entry and ensure there was not a substitution error.
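Here is a minimal sketch of how a single-response answer might be coded against such a schema in Python. It assumes the -9 and -8 codes described above; the extra answer labels (“Weekly”, “Rarely”) and the 1999-2006 birth-year range are hypothetical values chosen only for illustration.

```python
BLANK = -9       # schema code for a question left blank
MULTI_MARK = -8  # schema code for more than one mark on a single-response item

# Hypothetical value map for one single-response question. The first two
# entries follow the example above; the last two are made up.
FREQUENCY = {
    "Daily": 1,
    "Several times a week": 2,
    "Weekly": 3,
    "Rarely": 4,
}


def code_single_response(marked_labels, value_map):
    """Convert the set of marked response labels into one numeric code."""
    if not marked_labels:
        return BLANK
    if len(marked_labels) > 1:
        return MULTI_MARK
    (label,) = marked_labels
    return value_map[label]


def needs_range_review(year, low, high):
    """True when a captured year falls outside the expected range, so the
    entry should stop for an operator to confirm it."""
    return not low <= year <= high


print(code_single_response({"Daily"}, FREQUENCY))            # 1
print(code_single_response(set(), FREQUENCY))                # -9
print(code_single_response({"Daily", "Weekly"}, FREQUENCY))  # -8
print(needs_range_review(1985, 1999, 2006))                  # True
```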

As part of your data schema, we highly recommend that you include a data dictionary (see the third column in the chart below). The dictionary identifies all the expected values for each question.

Data Schema

The data dictionary column allows you to easily build a query to check for values that are out of range.
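The same out-of-range check can be sketched in Python instead of as a database query. The dictionary below is hypothetical: each question lists its allowed codes, including the -9 (blank) and -8 (multi-mark) codes, and any question without a dictionary entry is simply skipped.

```python
# Hypothetical data dictionary: allowed codes per question.
DATA_DICTIONARY = {
    "Q1": {1, 2, 3, 4, -9, -8},          # single-response frequency item
    "Q2": set(range(1999, 2007)) | {-9}, # birth year, 1999-2006 or blank
}


def out_of_range(records, dictionary):
    """Return (record index, question, value) for every value that the
    data dictionary does not list for that question."""
    problems = []
    for i, record in enumerate(records):
        for question, value in record.items():
            allowed = dictionary.get(question)
            if allowed is not None and value not in allowed:
                problems.append((i, question, value))
    return problems


records = [{"Q1": 2, "Q2": 2004}, {"Q1": 7, "Q2": 1985}]
print(out_of_range(records, DATA_DICTIONARY))
# [(1, 'Q1', 7), (1, 'Q2', 1985)]
```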

Sample Data Testing

After your survey is programmed, the testing begins. Programmatic testing against the data schema ensures that your multi-modal data collection will run seamlessly and that the resulting data is delivered in a format you can use. Your data collection partner will specifically test for:

    • Coding – Did it code correctly?
    • Exporting – Did it export correctly?
    • Formatting – Can the customer work with the data as supplied, or do they need something changed or adjusted?

We start with a test that accounts for all possible survey responses. The total number of test surveys filled out equals the maximum number of response choices on the survey, plus 2. To run this test, we fill out one survey with every first response choice marked, then a second survey with every second response choice marked, and so on. We follow this up with a test for multiple marks on single-response items, another with test text entered in comment-style questions, and finally, one for “mark all that apply” questions. By testing every possible response type, we ensure that all questions are programmed correctly.
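The “one survey per response position” part of this test can be generated programmatically, so the expected codes are known before the forms are scanned. This Python sketch assumes the code for each answer equals its position (1 for the first choice, and so on) and that a question with fewer choices is left blank (-9) on the later surveys. It does not model the two additional surveys in the count above, since their content is not specified here.

```python
BLANK = -9  # schema code for a blank question


def build_test_deck(choices_per_question):
    """Build the expected codes for the 'one survey per response position' deck.

    choices_per_question maps each single-response question to its number of
    response choices. Survey k marks choice k on every question that has at
    least k choices; shorter questions are left blank on that survey.
    """
    max_choices = max(choices_per_question.values())
    deck = []
    for k in range(1, max_choices + 1):
        expected = {q: (k if k <= n else BLANK)
                    for q, n in choices_per_question.items()}
        deck.append(expected)
    return deck


deck = build_test_deck({"Q1": 4, "Q2": 2})
for number, expected in enumerate(deck, start=1):
    print(number, expected)
# 1 {'Q1': 1, 'Q2': 1}
# 2 {'Q1': 2, 'Q2': 2}
# 3 {'Q1': 3, 'Q2': -9}
# 4 {'Q1': 4, 'Q2': -9}
```

Comparing the exported data file against this expected deck, row by row, shows immediately which question was coded or exported incorrectly.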

The next test involves data sampling (i.e., collecting data from a small subset of your respondent population). We run a mixed-response test with live forms filled out by a subset of respondents to make sure nothing unexpected shows up in the way respondents fill out the forms. For example, we might see that many people are selecting multiple responses to a single-response question. This gives us the opportunity to alter the programming to capture all responses.
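One way to spot that pattern in pilot data is to measure how often each single-response question comes back coded as a multi-mark. Here is a minimal Python sketch, assuming the pilot records are already coded with the -8 multi-mark value from the data schema; the 30% review threshold in the example is a hypothetical choice, not a recommendation.

```python
from collections import Counter

MULTI_MARK = -8  # schema code for a multi-mark on a single-response item


def multi_mark_rates(pilot_records, questions):
    """Share of pilot forms coded as multi-mark, per single-response question."""
    if not pilot_records:
        return {q: 0.0 for q in questions}
    counts = Counter()
    for record in pilot_records:
        for q in questions:
            if record.get(q) == MULTI_MARK:
                counts[q] += 1
    return {q: counts[q] / len(pilot_records) for q in questions}


def questions_to_review(rates, threshold):
    """Questions whose multi-mark share is above the chosen threshold."""
    return sorted(q for q, rate in rates.items() if rate > threshold)


pilot = [
    {"Q1": 1, "Q2": -8},
    {"Q1": 2, "Q2": -8},
    {"Q1": -8, "Q2": 3},
    {"Q1": 1, "Q2": 1},
]
rates = multi_mark_rates(pilot, ["Q1", "Q2"])
print(rates)                             # {'Q1': 0.25, 'Q2': 0.5}
print(questions_to_review(rates, 0.30))  # ['Q2']
```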

The Data Schema is an essential part of data collection programming, testing and processing. By creating an impeccable blueprint and investing the time to properly test samples, you will ensure the integrity of your results and safeguard against the pain of data loss!

For more information on data schemas, data collection or any aspect of survey mail management, contact us today!

October 9th, 2018 | Data Capture Services (updated March 20, 2019)

Automated Data Collection: Which Approach is Best?

Long gone are the days when mail survey responses were collected manually and key entered into digital format. Today, the question isn’t whether you should automate, but rather which automated data collection approach you should be using.

The most commonly used data capture technologies in the survey industry today are OMR (optical mark recognition) and Image Scanning, each with inherent advantages and disadvantages. Both provide exceptional accuracy and cost efficiency, but OMR is significantly faster, while Image Scanning offers more flexibility.

Choosing a data collection technology for your project is something you will need to do early in the planning process before your survey forms are designed. Your survey research partner can help you determine which solution will work best for your unique project.

Following are detailed descriptions of these industry-best quantitative data collection technologies:

OMR Technology

OMR technology detects the absence or presence of a mark. It is the fastest data collection technology in the industry and is particularly adept at measuring the darkness of a mark to help determine whether the mark is a valid response or an erasure. OMR is commonly used in standardized school testing, such as fill-in-the-bubble test forms.
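The darkness test can be illustrated with a simple two-threshold rule: dark bubbles count as marks, faint ones as erasures or light marks, and the rest as empty. This Python sketch uses hypothetical thresholds and returns the -9/-8 codes from the data schema example in this blog for blank and multi-marked items. Actual OMR scanners use their own calibrated logic.

```python
BLANK = -9       # schema code for a blank item
MULTI_MARK = -8  # schema code for a multi-marked item


def classify_bubble(darkness, mark_threshold=0.60, erasure_floor=0.25):
    """Classify one bubble by its measured darkness (0.0 = empty, 1.0 = solid).

    Both thresholds are hypothetical; real scanners are calibrated for the
    form, paper stock, and writing instrument.
    """
    if darkness >= mark_threshold:
        return "mark"
    if darkness >= erasure_floor:
        return "erasure or light mark"
    return "empty"


def read_item(darkness_by_choice):
    """Return the one validly marked choice, or a blank/multi-mark code."""
    marks = [choice for choice, darkness in darkness_by_choice.items()
             if classify_bubble(darkness) == "mark"]
    if not marks:
        return BLANK
    if len(marks) > 1:
        return MULTI_MARK
    return marks[0]


print(read_item({1: 0.92, 2: 0.31, 3: 0.04}))  # 1 (choice 2 reads as an erasure)
print(read_item({1: 0.05, 2: 0.10, 3: 0.02}))  # -9
print(read_item({1: 0.85, 2: 0.80, 3: 0.03}))  # -8
```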

OMR forms are very specialized documents that require critical registration. This means that the forms must include precise “timing marks” along the edge of the form to let the OMR scanner know where to look for data. If this is not done correctly, data collection will be adversely affected. Therefore, you must work with a printer who has experience with OMR forms.

Color is also extremely important with OMR documents. Only colors that contain no black as part of their PMS color can be used. If a pen will be allowed, only various shades of red can be used, which further limits color choices. In addition, the paper stock requires the proper reflectance and fluorescence so that it will not read false marks during the data collection process.

As the forms are being scanned, the data is immediately written to the data file. OMR scanning has an accuracy rating of 99.9%, but only when the forms are filled out correctly. Respondents need to use the correct writing instrument and fill the bubbles completely to achieve this type of accuracy rating.

A drawback to OMR technology is that it requires you to produce pre-printed documents, which some clients have found to be inflexible, costly (especially with small quantities) and incapable of meeting design change requirements on short notice.

Image Scanning Technology

Image scanning uses ‘mark sense technology’ to detect marks on a form. While it looks a lot like OMR (collecting data from multiple-choice questions), mark sense technology works very differently. Rather than looking for marks on a form, the scanner takes a bi-tonal (black and white) image of each form field and looks for pixel differences between the scanned image and a template, revealing the marks in the process.
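The template comparison can be sketched as counting pixels that are black in the scan but white in the blank template. This Python example assumes the field has already been cropped and aligned to the template and that images are stored as grids of 0 (white) and 1 (black). The 15% cutoff is hypothetical, and production software also has to handle deskewing, noise, and registration.

```python
def field_mark_ratio(scanned, template):
    """Fraction of field pixels that are black in the scan but white in the template.

    Both arguments are bi-tonal grids (lists of rows) of 0 = white, 1 = black,
    cropped to the same field and already aligned.
    """
    added = total = 0
    for scan_row, tmpl_row in zip(scanned, template):
        for s, t in zip(scan_row, tmpl_row):
            total += 1
            if s == 1 and t == 0:
                added += 1
    return added / total if total else 0.0


def is_marked(scanned, template, min_ratio=0.15):
    """Treat the field as marked when enough new black pixels appear."""
    return field_mark_ratio(scanned, template) >= min_ratio


# A tiny 4x4 bubble: the template holds only the printed outline.
template = [[0, 1, 1, 0], [1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 1, 0]]
filled = [[0, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 1, 1, 0]]
print(field_mark_ratio(filled, template))  # 0.25
print(is_marked(filled, template))         # True
print(is_marked(template, template))       # False
```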

  • Time

Image scanning does take longer to process. This is because images are taken of each page, then processed against a pre-programmed template, called a document definition. Any fields that fall outside the tolerance are routed to a human verifier who reviews the field on screen and makes the appropriate choice based on the rules that have been established. Only after this step will the data be written to the data file. Testing has shown that image scan processing can take up to 40% longer than OMR depending on the rules established.

  • Flexibility

However, image scanning is much more flexible than OMR. The biggest advantage comes during printing, as image scanning does not require the special ink colors or critical registration that OMR must have. Forms can be printed in black and white, and images can be stored and indexed by any field collected during the scanning process.
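Indexing images by captured field values can be as simple as a lookup table from a field and its value to the matching image files. Here is a minimal Python sketch, assuming each scanned image comes with a dictionary of its captured fields; the file names and field names are hypothetical.

```python
from collections import defaultdict


def build_image_index(captured, index_fields):
    """Map (field name, value) to the scanned images that carry that value.

    `captured` is a list of (image_path, {field name: captured value}) pairs.
    """
    index = defaultdict(list)
    for image_path, fields in captured:
        for name in index_fields:
            if name in fields:
                index[(name, fields[name])].append(image_path)
    return index


captured = [
    ("batch01/0001.tif", {"respondent_id": "R100", "site": "North"}),
    ("batch01/0002.tif", {"respondent_id": "R101", "site": "North"}),
    ("batch01/0003.tif", {"respondent_id": "R102", "site": "South"}),
]
index = build_image_index(captured, ["respondent_id", "site"])
print(index[("site", "North")])          # ['batch01/0001.tif', 'batch01/0002.tif']
print(index[("respondent_id", "R102")])  # ['batch01/0003.tif']
```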

Forms can also include fields for open-ended comments (i.e., handwriting), which will be captured using a combination of ICR (Intelligent Character Recognition) software and operator review. Rules can also be established that force a field to be reviewed by an operator for editing. For example, a rule might require that all blank responses be inspected. This is a popular rule for tests administered to young students, who may have circled the choice instead of filling in the bubble. Other popular rules for human editing include double marks, light marks that do not meet the minimum threshold, missing responses, invalid IDs, out-of-range marks, and more.
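Review rules like these amount to a set of checks run on every captured field before it is written to the data file. The Python sketch below covers the rules named above for a few hypothetical field types. The field layout and the 0.40 light-mark threshold are assumptions, and an empty result means the field can be accepted without operator review.

```python
LIGHT_MARK_MIN = 0.40  # hypothetical minimum darkness for an accepted mark


def review_reasons(field, valid_ids=frozenset()):
    """Return the rules that send a captured field to an operator.

    `field` is a dict whose 'type' is 'choice', 'number', or 'id':
      choice -> 'marks': darkness (0.0-1.0) of each mark found in the field
      number -> 'value' (None if nothing was written) and 'range': (low, high)
      id     -> 'value': the recognized respondent ID
    """
    reasons = []
    if field["type"] == "choice":
        marks = field["marks"]
        if not marks:
            reasons.append("blank")  # e.g., a student circled the answer instead
        elif len(marks) > 1:
            reasons.append("double mark")
        elif marks[0] < LIGHT_MARK_MIN:
            reasons.append("light mark")
    elif field["type"] == "number":
        value = field.get("value")
        if value is None:
            reasons.append("missing response")
        else:
            low, high = field["range"]
            if not low <= value <= high:
                reasons.append("out of range")
    elif field["type"] == "id":
        if field.get("value") not in valid_ids:
            reasons.append("invalid ID")
    return reasons


print(review_reasons({"type": "choice", "marks": []}))     # ['blank']
print(review_reasons({"type": "choice", "marks": [0.9]}))  # []
print(review_reasons({"type": "number", "value": 1985, "range": (1999, 2006)}))
# ['out of range']
print(review_reasons({"type": "id", "value": "R117"}, valid_ids={"R100", "R101"}))
# ['invalid ID']
```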

Although the processing takes longer, we have found Image Scanning data to be more accurate than that of OMR because of the operator intervention with the form. While using an operator will certainly increase the cost of collecting data, the flexibility and increased accuracy may be worth it for your project!

Automated Data Collection - Quick Reference Chart


OMR and Image Scanning are the best automated data collection technologies in the industry today. Because of its inherent flexibility, Image Scanning is the more commonly used option. But for projects that can adhere to OMR’s strict requirements, there is no faster or more accurate fully automated way to collect data for multiple-choice-only questions.

Once you decide which automated data collection approach your project needs, one of the first things to prepare is a blueprint of what all the numbers mean in the data file you will get with your results. This is called a Data Schema. Check out our blog post on Creating a Data Schema.

For more information on automated data collection, data capture services, or any aspect of mail survey management, contact us today!

September 13th, 2018 | Data Capture Services (updated May 24, 2019)

The Most Important Skillsets for In-House Survey Projects

If you are planning to effectively and efficiently handle a large-scale survey project in-house, knowing the questions you want to ask your target respondents is only the beginning. You also need to have access to a wide range of specialized skillsets. This is because practical considerations are certain to arise throughout the project that will require smart planning based on experience.

Here is a quick overview of a few of the most important skillsets involved in completing a large-scale survey project in-house.

Survey Design Skills

The core objective of your survey project is to obtain answers to questions that will help you analyze the thoughts, beliefs, actions, or experiences of your respondent population. To reach that objective, you must compose questions that will elicit useful responses.

During the survey design stage, it is important to consider more than the research objectives of your survey. There are other practical considerations relating to the efficiency of the project. Page count is an excellent example. If your survey is very complex or has so many questions that it must be printed in a large, multi-page booklet, you are jeopardizing the success of your project.

Large booklets create enormous challenges. They are difficult to design, print, and distribute. Additionally, they require extra effort to dismantle and collate so that they can be processed through your data extraction system. The greater the number of pages, the greater the complexity and room for error.

Graphic Design Skills

Another important skillset is graphic design. Your in-house designer must be proficient in Adobe InDesign, which is the most popular software for laying out surveys. Even a great graphic artist’s general skillset will only get them so far. They must also have experience with the many design choices that affect your respondents’ experience while taking the survey, as well as back-end quantitative data collection issues.

The layout of the survey must make it easy and intuitive for respondents to fill in their responses. Even simple items like a date of birth or phone number raise critical design issues. Using a single line to collect this information is problematic.

That’s because it is difficult for scanning software to read information written on a line with acceptable accuracy. The better design decision is to create a series of boxes that require respondents to place one letter or number in each box. This design encourages respondents to write neatly. It also results in a more legible survey that is easier to review manually.
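Boxed fields are also easier to validate, because each box should hold exactly one character. This Python sketch assumes the recognition step (not modeled here) returns one character per box, or None for an empty or unreadable box, and that the date of birth uses eight boxes in MMDDYYYY order. Both are assumptions for illustration; any field the sketch cannot read cleanly comes back as None for operator review.

```python
from datetime import date


def assemble_boxed_field(box_chars):
    """Join the character recognized in each box; None marks an empty or unreadable box."""
    if any(ch is None for ch in box_chars):
        return None
    return "".join(box_chars)


def read_date_of_birth(box_chars):
    """Read an eight-box MMDDYYYY date of birth.

    Returns a datetime.date, or None when the field should go to an operator.
    """
    text = assemble_boxed_field(box_chars)
    if text is None or len(text) != 8 or not text.isdigit():
        return None
    try:
        return date(int(text[4:]), int(text[:2]), int(text[2:4]))
    except ValueError:  # impossible dates such as month 13 or February 30
        return None


print(read_date_of_birth(list("07142005")))  # 2005-07-14
print(read_date_of_birth(list("13012005")))  # None
print(read_date_of_birth(["0", "7", None, "4", "2", "0", "0", "5"]))  # None
```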

Printing Skills

The design of the survey affects the complexity of printing the survey. Remember that one of your goals is to automate data collection. Therefore, issues such as the number of pages, the layout of each page, and the placement of design elements are extremely important. For projects that require optical mark recognition (OMR), barcodes, or pre-slugging for accurate survey scanning and tracking, small printing errors have big consequences.

Logistics Skills

Managing the distribution, collection, and storage of your surveys presents an enormous logistical challenge. Once you’ve printed your surveys and support materials, you must then assemble the survey packets and distribute them to your respondents. You must also be prepared to receive the surveys when they are returned. The logistics skills involved include collating, packaging, labeling, addressing, bulk mailing, return-mail handling, warehousing, inventory tracking, and follow-up mailings.

Data Extraction Skills

Assuming that you have the necessary OMR and image-scanning equipment in place, you’ll also need to follow strict quality control measures to ensure the accuracy of your data. The data extraction process requires reliable data output formats, tables and rules, exceptions, electronic verification, manual verification, and image indexing to facilitate easy search and retrieval.

Do You Have Access to All of the Skillsets You Need?

As you prepare for your large-scale survey project, consider making a thorough assessment of your team’s skillsets. If you don’t have easy access to all of the skills you need to complete your project effectively and efficiently, it’s time to search for a partner that can help. Doing so will help your organization maximize its return on investment in time and money.

For more information on paper scanning services, forms processing services, or any aspect of survey mail management, contact us today!


Good, Fast, Cheap – Pick Two

The other day I got into a heated discussion with a colleague about this quote. If you Google it, you will get hundreds of pages and images. There are even businesses that post signs telling their customers they can only have two.

In my humble opinion, any organization that communicates and/or promotes this quote is preparing you, the consumer, to either:

a. Pay a lot (Fast/Good)

b. Hear “I told you so” when it doesn’t go so well (Fast/Cheap)

c. Accept the perfect excuse for why your product or project is two weeks overdue (Cheap/Good)

Personally, I would never do business with someone that would put themselves into a box. I think we should always be striving for the ‘impossible’ utopia.

Don’t get me wrong: if you are looking for a Cadillac on a Pinto budget, you will never be happy. Make sure that you are working with a vendor you trust and who understands your objectives; if you don’t trust them to give you sound advice, it is time for a change. The right relationship shouldn’t feel like a customer/vendor relationship; it should be a partnership.

Don’t limit yourself. We should always be thinking outside the box about how to do things better, faster, and more efficiently.
