

Miscellaneous tips and common obstacles to keep in mind when engaging in data preparation and analysis.

Miscellaneous Tips

The 10 Commandments of Qualitative Research

  1. Account for personal biases which may influence findings.

  2. Acknowledge biases in sampling, and reflect on data collection and analysis methods to ensure depth and relevance.

  3. Engage with other researchers to reduce bias.

  4. Keep meticulous records and clear decision trails to ensure data interpretation is consistent and transparent.

  5. Include rich verbatim descriptions of participants' accounts to support findings.

  6. Document thought processes during data analysis and interpretation.

  7. Establish a comparison case to ensure different perspectives are represented.

  8. Practice respondent validation: invite participants to confirm the accuracy and themes of their interview transcripts.

  9. Practice data triangulation using different methods to produce a more comprehensive set of findings.

  10. Love your quantitative neighbour.

Quantitative and Qualitative Conceptual Correlates

Consistency

The consistency of the analytic procedures, including accounting for personal and research-method biases that may have influenced the findings. Reflects the 'trustworthiness' with which the methods have been implemented: decisions are documented clearly and transparently, so that others can obtain comparable results.

Truth Value

Acknowledges the complexity of interaction with participants and the possible bias from the researchers' own ideas, and attempts to distance these from participants' accounts.

Transferability

Whether findings can be applied to other contexts, settings, or groups; the transferability of the findings to other settings.

Common Obstacles

What is the Pareto Principle?

80% of the effects come from 20% of the causes. In practice, this means that 80% of your data and process errors come from 20% of your sources. This is also known as the "Law of the Vital Few" or the "Principle of Factor Sparsity."

Here are the most common sources of data errors:

  • Respondents do not understand the questions

  • Questions are not suited for or adapted to the respondent

  • Staff are not properly trained on the questionnaire

  • Staff are not properly supervised or supported

  • Staff may begin estimating or guessing answers
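As a rough illustration of the principle, you can tally errors by source and see how few sources account for 80% of them. The counts below are invented for the sketch; substitute tallies from your own error log.

```python
# Hypothetical error tallies per source -- replace with your own data.
error_counts = {
    "unclear question": 120, "wrong respondent": 40, "untrained staff": 25,
    "no supervision": 8, "guessed answers": 5, "data entry typos": 2,
}

total = sum(error_counts.values())
cum = 0
vital_few = []
# Walk sources from most to least error-prone until 80% of errors are covered.
for source, n in sorted(error_counts.items(), key=lambda kv: -kv[1]):
    cum += n
    vital_few.append(source)
    if cum / total >= 0.8:
        break

print(vital_few)                                    # the "vital few" sources
print(round(len(vital_few) / len(error_counts), 2)) # share of sources they represent
```

With these made-up counts, two of the six sources (about a third) already account for 80% of the errors, which is the pattern the principle describes.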

The 10 Deadly Sins of Study Design

  1. The study was not ethically conducted. Read and review ethical protocols.

  2. Statistically underpowered. Was your sample size large enough?

  3. Inappropriate control group. Either no control group is selected or the control group is inappropriate.

  4. Not properly randomized. You can't take shortcuts with randomization.

  5. Intervention not delivered as intended.

  6. Compliance or adherence is low. Low compliance or adherence is defined as <80%.

  7. The loss to follow-up is high. High loss to follow-up is defined as >20%.

  8. The outcome is not properly assessed. Is there systematic bias, or is your data unreliable, confounded, or unrepresentative?

  9. Data management system issues. Data is not entered on time, or data quality is not checked and corrected.

  10. Data not properly analyzed. Clustering, missing data, and dirty data are not taken into account.
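For sin #2, a quick back-of-the-envelope power check is often enough to catch an underpowered design before data collection starts. The sketch below uses the standard normal-approximation formula for comparing two proportions, with a 5% two-sided alpha (z = 1.96) and 80% power (z = 0.84); the example proportions are illustrative, not from any particular study.

```python
import math

def n_per_group(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Approximate sample size per arm to detect p1 vs p2
    (two-sided alpha = 0.05, power = 0.80 by default)."""
    effect = abs(p1 - p2)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Detecting a drop from 50% to 40% needs far more participants
# than detecting a drop from 50% to 30%.
print(n_per_group(0.50, 0.40))  # 385 per arm
print(n_per_group(0.50, 0.30))  # 91 per arm
```

The point of the sketch: small effects demand large samples, so "was your sample size large enough?" should be answered with arithmetic, not intuition.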

Common Data Messes and their Solutions

Definitions of Variables Not Known

Problem: The variable names are cryptic, or the coding of variable values is not known.

Solutions:

  1. Prepare and maintain data codebooks

  2. Retain all original forms or screenshots

  3. Drop the data from your analysis.

"Heaping" of Data

Problem: Data are coarsened by rounding true values to even multiples of the reported units. Common for age, income, and weight.

Solutions:

  1. Assess the impact of the heaping on your analysis

  2. Identify the heaped data subgroup and remove it from the database

  3. Create a statistical model to "redistribute" the heaped data.
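A simple way to start step 1 (assessing the impact of heaping) is to check how often reported values land on "round" multiples. For integer ages, roughly 20% should end in 0 or 5 by chance alone; a much larger share suggests heaping. The ages below are invented for the sketch.

```python
# Hypothetical reported ages -- note the pile-up on multiples of 5.
ages = [30, 35, 35, 40, 40, 40, 42, 45, 50, 50]

# Share of values that are exact multiples of 5.
heaped_share = sum(a % 5 == 0 for a in ages) / len(ages)

print(heaped_share)        # 0.9
print(heaped_share > 0.3)  # True: well above the ~0.2 expected by chance
```

The 0.3 threshold here is an arbitrary illustration; in practice you would compare the observed share against the chance expectation for your reporting unit and sample size.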

Invalid Data

Problem: Data points fall outside the acceptable range, or are outliers.

Solutions:

  1. Review original paper forms or electronic records

  2. Delete the variable from the analysis

  3. Recode or delete values that are out of range

  4. Perform analyses with and without the "suspect" data

  5. Impute new values.
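Steps 3 and 4 can be sketched as a simple range screen followed by a with/without comparison. The acceptable range and the ages below are invented for illustration; set the range per variable from your codebook.

```python
from statistics import mean

ACCEPTABLE = (0, 110)  # assumed valid range for age -- adjust per variable

def screen(values, low, high):
    """Split values into in-range and out-of-range ("suspect") lists."""
    valid = [v for v in values if low <= v <= high]
    suspect = [v for v in values if not (low <= v <= high)]
    return valid, suspect

ages = [34, 29, 41, 250, 38, -3, 52]   # 250 and -3 are out of range
valid, suspect = screen(ages, *ACCEPTABLE)

print(sorted(suspect))       # [-3, 250]
print(round(mean(ages), 1))  # 63.0 -- mean with the suspect data
print(round(mean(valid), 1)) # 38.8 -- mean without
```

Two invalid points here shift the mean by more than 24 years, which is exactly why step 4 recommends running the analysis both with and without the suspect values before deciding what to do with them.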

Unexpected Distributions

Problem: Data are not Gaussian, or there are floor effects. High variance can be due to a particular data collector or other causes.

Solutions:

  1. Use transformations to normalize data if the distribution is valid

  2. Recode to binary or categorical data

  3. Delete data from the suspected source and repeat analysis.
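Steps 1 and 2 can be sketched with a right-skewed variable like income: a log transform pulls the long tail in, and recoding to a binary cut-point sidesteps the distribution entirely. The incomes and the cut-point are invented for illustration.

```python
import math
from statistics import mean, median

# Hypothetical right-skewed incomes (one extreme earner).
incomes = [1200, 1500, 1800, 2000, 2500, 3000, 25000]

# Step 1: log-transform to normalize. In a right-skewed sample the mean
# sits well above the median; after the transform the two are much closer.
logged = [math.log(x) for x in incomes]
print(mean(incomes) > 1.5 * median(incomes))  # True: strongly skewed
print(mean(logged) < 1.1 * median(logged))    # True: skew largely removed

# Step 2: recode to a binary indicator (cut-point of 2000 is illustrative).
above_cutoff = [int(x >= 2000) for x in incomes]
print(above_cutoff)  # [0, 0, 0, 1, 1, 1, 1]
```

The mean-versus-median comparison is a crude skewness check; only transform when the underlying distribution is valid, as step 1 above says.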

Notes on Missing Data

Data often goes missing because of:

  • Censoring of data

  • Intentional 'missing-ness' (e.g. skip pattern)

  • Respondent refusal

  • Missing completely at random

Remember: there is a cascade effect of poor quality and missing data!
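That cascade is easy to see by profiling missingness per variable before analysis: a missing value in one field knocks the whole record out of any complete-case analysis that uses it. The records below are invented for the sketch.

```python
# Hypothetical survey records; None marks a missing value.
records = [
    {"age": 34, "income": None, "weight": 70},
    {"age": None, "income": 1500, "weight": None},
    {"age": 41, "income": 1800, "weight": 65},
]

def missing_share(records, field):
    """Fraction of records with a missing value in the given field."""
    return sum(r[field] is None for r in records) / len(records)

for field in ("age", "income", "weight"):
    print(field, round(missing_share(records, field), 2))

# Complete-case analysis keeps only rows with no missing fields --
# here a third of the values gone leaves only one usable record.
complete = [r for r in records if None not in r.values()]
print(len(complete))  # 1
```

Profiling like this also helps separate the causes listed above: a block of missing values behind a skip pattern is intentional, while scattered gaps may be refusals or values missing completely at random, and each cause calls for a different handling strategy.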
