Internal sources of data are those that are internal to the organisation in question. For instance, if you are doing a research project for an organisation (or research institution) where you are an intern, and you want to reuse some of their past data, you would be using internal data sources.
The benefit of using these sources is that they are easily accessible and there is no associated financial cost of obtaining them.
External sources of data, on the other hand, are those that are external to an organisation or a research institution. This type of data has been collected by “somebody else”. The benefit of external sources is that they tend to provide more comprehensive data; however, obtaining it may sometimes take more effort (or money).
Let’s now focus on different types of internal and external secondary data sources.
There are several types of internal sources. For instance, if your research focuses on an organisation’s profitability, you might use their sales data. Each organisation keeps track of its sales records, and thus your data may provide information on sales by geographical area, types of customer, product prices, types of product packaging, time of the year, and the like.
Alternatively, you may use an organisation’s financial data. The purpose of using this data could be to conduct a cost-benefit analysis and understand the economic opportunities or outcomes of hiring more people, buying more vehicles, investing in new products, and so on.
Another type of internal data is transport data. Here, you may focus on outlining the safest and most effective transportation routes or vehicles used by an organisation.
Alternatively, you may rely on marketing data, where your goal would be to assess the benefits and outcomes of different marketing operations and strategies.
Some other ideas would be to use customer data to ascertain the ideal type of customer, or to use safety data to explore the degree to which employees comply with an organisation’s safety regulations.
The list of the types of internal sources of secondary data can be extensive; the most important thing to remember is that this data comes from the very organisation within which you are doing your research.
The list of external secondary data sources can be just as extensive. One example is the data obtained through government sources. These can include social surveys, health data, agricultural statistics, energy expenditure statistics, population censuses, import/export data, production statistics, and the like. Government agencies tend to conduct a lot of research, therefore covering almost any kind of topic you can think of.
Other external sources of secondary data are national and international institutions, including banks, trade unions, universities, health organisations, etc. As with government agencies, such institutions dedicate a lot of effort to conducting up-to-date research, so you simply need to find an organisation that has collected data on your topic of interest.
Alternatively, you may obtain your secondary data from trade, business, and professional associations. These usually have data sets on business-related topics and are likely to be willing to provide you with secondary data if they understand the importance of your research. If your research is built on past academic studies, you may also rely on scientific journals as an external data source.
Once you have specified what kind of secondary data you need, you can contact the authors of the original study.
As a final example of a secondary data source, you can rely on data from commercial research organisations. These usually focus their research on media statistics and consumer information, which may be relevant if, for example, your research is within media studies or you are investigating consumer behaviour.
TABLE 5 summarises the two sources of secondary data and associated examples:
However, there is no reason why they should, since the whole procedure of doing statistical analyses is not that difficult – you just need to know which analysis to use for which purpose and to read guidelines on how to do particular analyses (online and in books). Let’s provide specific examples.
If you are doing descriptive research, your analyses will rely on descriptive and/or frequency statistics.
Descriptive statistics include calculating means and standard deviations for continuous variables, and frequency statistics include calculating the number and percentage of answers in each category of a categorical variable.
Continuous variables are those where scores can take a wide range of numerical values. For instance, participants’ age is a continuous variable, because the scores can range from 1 year to 100 years. Here, you calculate a mean and say that your participants were, on average, 37.7 years old (for example).
Another example of a continuous variable is the total score on a questionnaire. For example, if your questionnaire assessed the degree of satisfaction with medical services, on a scale ranging from 1 (not at all) to 5 (completely), and there are ten questions on the questionnaire, you will have a final score for each participant that ranges from 10 to 50. This is a continuous variable and you can calculate the final mean score (and standard deviation) for your whole sample.
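The questionnaire example above can be sketched in a few lines of Python. This is only an illustration using the standard library; the responses are made-up values and the variable names are assumptions, not data from any real study.

```python
from statistics import mean, stdev

# Hypothetical responses: each inner list holds one participant's answers
# to the ten satisfaction questions, each rated from 1 to 5.
responses = [
    [4, 5, 3, 4, 4, 5, 3, 4, 4, 4],
    [2, 3, 2, 1, 3, 2, 2, 3, 2, 2],
    [5, 5, 4, 5, 5, 4, 5, 5, 4, 5],
    [3, 3, 4, 3, 2, 3, 4, 3, 3, 3],
]

# Final score per participant: the sum of the ten answers (possible range 10 to 50).
total_scores = [sum(answers) for answers in responses]

sample_mean = mean(total_scores)
sample_sd = stdev(total_scores)  # sample standard deviation (n - 1 in the denominator)

print(f"Total scores: {total_scores}")
print(f"M = {sample_mean:.2f}, SD = {sample_sd:.2f}")
```

In a real project you would have far more than four participants; the calculation itself stays the same.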
Categorical variables are those that do not result in final scores, but instead sort participants into specific categories. An example of a categorical variable is gender, because your participants are categorised as either male or female. Here, your final report will say something like “50 (50%) participants were male and 50 (50%) were female”.
Please note that you will have to do descriptive and frequency statistics in all types of quantitative research, even if your research is not descriptive research per se. They are needed when you describe the demographic characteristics of your sample (participants’ age, gender, education level, and the like).
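As a rough illustration of frequency statistics, the sketch below counts how many participants fall into each category and converts the counts to percentages. The list of labels is invented for the example.

```python
from collections import Counter

# Hypothetical categorical data: one gender label per participant.
genders = ["male", "female", "female", "male", "female", "male", "male", "female"]

counts = Counter(genders)
n = len(genders)

# Store each category as a (count, percentage) pair.
frequency_table = {
    category: (count, 100 * count / n) for category, count in counts.items()
}

for category, (count, percent) in frequency_table.items():
    print(f"{count} ({percent:.0f}%) participants were {category}")
```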
When doing correlational research, you will perform a correlation or a regression analysis. Correlation analysis is done when you want to see if levels of an independent variable relate to the levels of a dependent variable (for example, “is intelligence related to critical thinking?”).
Before running a correlation analysis, you will need to check if your data is normally distributed, that is, if the histogram that summarises the data has a bell-shaped curve. This can be done by creating a histogram in a statistics program, the guidelines for which you can find online. If you conclude that your data is normally distributed, you will rely on a Pearson correlation analysis; if your data is not normally distributed, you will use a Spearman correlation analysis. You can also include a covariate (such as people’s abstract reasoning) and see if a correlation exists between two variables after controlling for that covariate; this is known as a partial correlation.
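To make the difference between the two coefficients concrete, here is a minimal sketch written with the Python standard library only. The function names (`pearson_r`, `ranks`, `spearman_rho`) and the scores are assumptions made for illustration. Spearman's coefficient is computed here as a Pearson correlation on the ranked scores. The sketch returns only the coefficients; a statistics program would also report a p-value.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = mean(x), mean(y)
    covariation = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return covariation / sqrt(ss_x * ss_y)


def ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover every value tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical scores for six participants.
intelligence = [95, 100, 105, 110, 120, 130]
critical_thinking = [20, 24, 23, 30, 33, 40]

r = pearson_r(intelligence, critical_thinking)
rho = spearman_rho(intelligence, critical_thinking)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```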
Regression analysis is done when you want to see if levels of an independent variable(s) predict levels of a dependent variable (for example, “does intelligence predict critical thinking?”). Regression is useful because it allows you to control for various confounders simultaneously. Thus, you can investigate if intelligence predicts critical thinking after controlling for participants’ abstract reasoning, age, gender, educational level, and the like. You can find online resources on how to interpret a regression analysis.
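The sketch below shows the simplest case only: a regression with a single predictor, fitted by ordinary least squares using the standard library. It does not control for confounders; adding covariates requires multiple regression, which is normally run in a statistics package. The function name and the scores are hypothetical.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns the slope, the intercept, and R squared (the proportion of
    variance in y explained by x).
    """
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    intercept = my - slope * mx
    predicted = [intercept + slope * a for a in x]
    ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))
    ss_total = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_residual / ss_total


# Hypothetical scores for six participants.
intelligence = [95, 100, 105, 110, 120, 130]
critical_thinking = [20, 24, 23, 30, 33, 40]

slope, intercept, r_squared = simple_linear_regression(intelligence, critical_thinking)
print(f"critical_thinking = {intercept:.2f} + {slope:.3f} * intelligence")
print(f"R squared = {r_squared:.3f}")
```

The slope tells you how much the predicted critical thinking score changes for each additional point of intelligence in this made-up sample.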
When you are conducting experiments and quasi-experiments, you will use t-tests, ANOVA (analysis of variance), or MANCOVA (multivariate analysis of covariance).
Independent samples t-tests are used when you have one independent variable with two conditions (such as giving participants a supplement versus a placebo) and one dependent variable (such as concentration levels). This test is called “independent samples” because you have different participants in your two conditions.
As noted above, this is a between-subjects design. Thus, with an independent samples t-test you are seeking to establish if participants who were given a supplement, versus those who were given a placebo, show different concentration levels. If you have a within-subjects design, you will use a paired samples t-test. This test is called “paired” because you compare the same group of participants on two paired conditions (such as taking a supplement before versus after a meal).
Thus, with a paired samples t-test, you are establishing whether concentration levels (dependent variable) at Time 1 (taking a supplement before the meal) differ from those at Time 2 (taking a supplement after the meal).
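The two t-tests described above can be sketched as follows. The code computes only the t statistic and its degrees of freedom; a statistics program would additionally give you the p-value. The independent samples version assumes equal variances in the two groups (the classic pooled-variance form), and all function names and concentration scores are invented for the example.

```python
from math import sqrt
from statistics import mean, stdev, variance


def independent_t(group_a, group_b):
    """Independent samples t statistic (pooled variance) and degrees of freedom."""
    n1, n2 = len(group_a), len(group_b)
    df = n1 + n2 - 2
    pooled_variance = (
        (n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)
    ) / df
    standard_error = sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean(group_a) - mean(group_b)) / standard_error, df


def paired_t(first, second):
    """Paired samples t statistic and degrees of freedom.

    Both lists must list the same participants in the same order.
    """
    differences = [a - b for a, b in zip(first, second)]
    n = len(differences)
    standard_error = stdev(differences) / sqrt(n)
    return mean(differences) / standard_error, n - 1


# Between-subjects design: different participants in each condition.
supplement = [7, 8, 6, 9, 8]
placebo = [5, 6, 5, 7, 6]
t_independent, df_independent = independent_t(supplement, placebo)

# Within-subjects design: the same five participants measured twice.
before_meal = [6, 7, 5, 8, 7]
after_meal = [5, 6, 5, 6, 6]
t_paired, df_paired = paired_t(before_meal, after_meal)

print(f"Independent samples: t({df_independent}) = {t_independent:.3f}")
print(f"Paired samples: t({df_paired}) = {t_paired:.3f}")
```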
There are two main types of ANOVA. One-way ANOVA is used when you have one independent variable with more than two conditions.
For instance, you would use a one-way ANOVA in a between-subjects design, where you are testing the effects of the type of treatment (independent variable) on concentration levels (dependent variable), while having three conditions of the independent variable, such as supplement (condition 1), placebo (condition 2), and concentration training (condition 3).
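A minimal sketch of the one-way ANOVA calculation for this three-condition example is given below. It computes the F statistic from the between-groups and within-groups sums of squares; the p-value would normally come from a statistics program. The function name and the concentration scores are hypothetical.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the F statistic with its between- and within-groups degrees of freedom."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Hypothetical concentration scores, five different participants per condition.
supplement = [7, 8, 6, 9, 8]
placebo = [5, 6, 5, 7, 6]
concentration_training = [8, 9, 7, 9, 9]

f_value, df_between, df_within = one_way_anova(
    [supplement, placebo, concentration_training]
)
print(f"F({df_between}, {df_within}) = {f_value:.2f}")
```

A significant F only tells you that at least one condition differs from the others; follow-up comparisons are needed to see which ones.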
Two-way ANOVA, on the other hand, is used when you have two independent variables.
For instance, you may want to see if there is an interaction between the type of treatment (independent variable with three conditions: supplement, placebo, and concentration training) and gender (independent variable with two conditions: male and female) on participants’ concentration (dependent variable).
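Finally, MANCOVA is used when you have one or more independent variables and more than one dependent variable, while also controlling for one or more covariates. Without covariates, the same analysis is called MANOVA (multivariate analysis of variance).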
For example, you would use MANCOVA if you are testing the effect of the type of treatment (independent variable with three conditions: supplement, placebo, and concentration training) on two dependent variables (such as concentration and an ability to remember information correctly).