This glossary is based on numerous sources. It is by no means perfect but serves as a guide for my students, and where possible, explains statistical terminology in layman’s terms. Inevitably, that means that there might be compromises. As such this glossary will also be in continued development.

Contact me if you believe that there are errors or you would like an entry included. I also still need to work on some entries after W.


The numerical value of a number, disregarding its sign. The absolute values of −8 and 8 are both 8.

The probability of a Type I error, that is, the probability of rejecting the null hypothesis when it is true. (also see significance level)

A hypothesis specified before any testing is done. This is opposed to HARK’ing: *Hypothesising After the Results are Known.*

A formula that allows the reversal of conditional probabilities.
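As a minimal sketch of this reversal, the snippet below turns P(positive test | disease) into P(disease | positive test). The function name and the three input numbers are hypothetical and chosen only for illustration.

```python
def bayes_posterior(prior, p_data_given_h, p_data_given_not_h):
    """Return P(H | data) from P(H), P(data | H) and P(data | not H)."""
    # Total probability of observing the data, whether or not H is true
    p_data = p_data_given_h * prior + p_data_given_not_h * (1 - prior)
    return p_data_given_h * prior / p_data


# Hypothetical values: 1% prevalence, 90% sensitivity, 5% false positive rate
p_disease_given_positive = bayes_posterior(0.01, 0.90, 0.05)
print(round(p_disease_given_positive, 3))
```

Even with a fairly accurate test, the probability of disease given a positive result stays low here because the condition is rare.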

A statistical approach based on Bayes’ theorem, where prior information or beliefs are combined with new data to provide estimates of unknown parameters.

- In statistical research design, the probability of a Type II error, that is, the probability of failing to reject the null hypothesis when it is false. 1 − β = power.
- In regression, the term for a standardised B coefficient (slope of the regression line).

Error that is systematic and might lead to incorrect interpretation of results.

Variables that can take only two values; also called dichotomous variables.

A graph that depicts the minimum and maximum or a range of uncertainty (whiskers), lower and upper quartiles (box) and the median (horizontal line in the box) for a set of data.

A study that includes the whole population rather than a sample of that population.

If you take repeated samples from a population with finite variance and calculate their averages, then the averages will be approximately normally distributed, and the approximation improves as the sample size increases. This is called the central limit theorem. It implies that parametric test statistics become more robust to deviations from normality as the sample size increases.
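The theorem can be explored with a small simulation. This is only a sketch: the exponential population, the sample sizes and the number of repetitions are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible


def sample_means(n, n_samples=2000):
    """Means of n_samples samples of size n from a skewed (exponential) population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]


for n in (2, 30, 200):
    means = sample_means(n)
    # The mean of the sample means stays near the population mean (1.0),
    # while their spread shrinks roughly as 1 / sqrt(n).
    print(n, round(statistics.fmean(means), 3), round(statistics.stdev(means), 3))
```

Plotting a histogram of `means` for each `n` would show the shape moving from skewed towards a bell curve.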

A statistical test used to investigate whether a frequency distribution follows a specific theoretical distribution.

A statistical test used to investigate the association between two categorical variables.
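As an illustration, the test statistic compares the observed counts in a contingency table with the counts expected if the two variables were unrelated. The sketch below computes only the statistic (not the p value), and the table is made up.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical 2 x 2 table: rows are groups, columns are outcomes
table = [[30, 10], [20, 40]]
chi2 = chi_square_statistic(table)
print(round(chi2, 2))
```

For a table with r rows and c columns, the statistic is compared against a chi-square distribution with (r − 1)(c − 1) degrees of freedom.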

A statistical method used to identify groups or clusters of individuals who have common features in terms of known variables.

In research design, a variable that correlates with both the independent and dependent variables and is not in the causal pathway between them.

The extent to which a measurement, or series of measurements, adequately measures a construct (such as intelligence or personality).

Variables included in a study design not because they are the focus of interest but because they are believed to influence the variables of interest and the researcher wants to control for their effect. These should ideally be pre-specified.

A multifactorial regression model used with a time-to-event outcome. This could be, for example, the time to relapse after a substance abuse treatment or the time to finding employment after job loss. Sometimes also referred to as ‘Survival Analysis’.

The extent to which a measurement correlates with something else. For instance, how well scores on a test correlate with grades in school.

A statistic used to measure the degree of internal consistency between items in a questionnaire. Alpha will typically be between 0 and 1. Common guidelines are that values below .7 suggest inadequate reliability. Values above .9 suggest that items are very similar, in which case fewer items could perhaps be used to measure the same construct.
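A minimal sketch of the usual formula, alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the small dataset are hypothetical.

```python
import statistics


def cronbach_alpha(items):
    """items: one list of scores per item, with respondents in the same order."""
    k = len(items)
    sum_item_variances = sum(statistics.variance(item) for item in items)
    # Total score per respondent across all items
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum_item_variances / statistics.variance(totals))


# Hypothetical questionnaire: 3 items answered by 5 respondents
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(items)
print(round(alpha, 2))
```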

A study in which data is collected at a single point in time. This is contrasted with a longitudinal design.

The number of values which are free to vary in an equation or statistic.

In research design, a variable that is assumed to be influenced by another, independent, variable (or more) included in the design.

A phenomenon is deterministic when its outcome is inevitable and all observations will take a specific value. A phenomenon is stochastic when its outcome may take different values in accordance with some probability distribution.

**Dichotomous** data have two values and take the form “yes or no,” “got better or got worse.” Also known as binary variables. **Categorical** data have two or more categories such as yes, no, and undecided. Categorical data may be ordered (opposed, indifferent, in favor) or unordered. Preferences can be placed on an ordered or ordinal scale such as strongly opposed, opposed, indifferent, in favor, strongly in favor. **Metric** data can be placed on a scale that permits meaningful subtraction; for example, while “in favor” minus “indifferent” may not be meaningful, 35.6 pounds minus 30.2 pounds is.

Data that do not lie on a continuum and can only take certain values, usually counts (integers). For these types of data non-parametric statistics are often used.

Typically used in regression modelling to enable a categorical predictor variable to be included. A variable with n categories is converted into n–1 binary variables, where one category is the reference category.
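A minimal sketch of this conversion in plain Python; the function name, the `is_` prefix and the colour data are all hypothetical. Statistical software usually does this step for you.

```python
def dummy_code(values, reference):
    """Convert a categorical variable into n - 1 binary (0/1) variables."""
    # Every category except the reference gets its own indicator variable
    categories = sorted(set(values) - {reference})
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]


# Three categories, so two dummy variables; "red" is the reference category
rows = dummy_code(["red", "green", "blue", "green"], reference="red")
for row in rows:
    print(row)
```

Cases in the reference category score 0 on every dummy variable, so each regression coefficient is interpreted relative to that category.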

An error whereby a researcher assumes that a statistical pattern found at an aggregate level must translate to a lower level of analysis. Suppose that a researcher finds an association between meat consumption and wealth at the national level, i.e. richer countries consume more meat. It would then be a fallacy to assume that the same relationship holds at the individual level, i.e. richer *individuals* eat more meat. The converse is the atomistic fallacy.

A statistical method which is part of multivariate statistics used to identify unknown underlying factors within a set of data. Factor analysis is often used when researchers want to reduce the complexity of a dataset. There is a distinction between “exploratory” and “confirmatory” factor analysis. (see SEM)

A design including two or more categorical variables and their interactions. These designs can be analysed via ANOVA, among other techniques. In a *full factorial* design, all possible combinations of the variables are included in the study.

A statistical test that can be used to investigate the association between two categorical variables when the sample is small.

A graph in meta-analysis used to display individual study estimates and confidence intervals, and the pooled estimate and confidence interval.

A statistical approach where the data alone are used to provide estimates of unknown parameters. This is as opposed to Bayesian statistics. Typically some form of significance testing is used.

In meta-analysis, a simple graphical method for exploring the results from studies to see if publication bias might be present.

A special type of statistical distribution that allows accommodating many different types of data.

An alternative approach to multilevel modelling for data with a hierarchical structure or clusters, or serial measurements, that gives population average estimates.

In survival analysis (Cox regression), the ratio of hazards or risks of outcome in two groups.

Term used in meta-analysis, among other things, referring to statistical variability between estimates. In meta-analysis, when there is unexplained heterogeneity researchers often perform meta-regressions to explain some of the observed heterogeneity.

A graph depicting the frequency distribution of a variable, with the length of each bar typically representing the number of cases or the expected number of cases (‘density’).

The dictionary definition of a hypothesis is a proposition, or set of propositions, put forth as an explanation for certain phenomena. For statisticians, a simple hypothesis would be that the distribution from which an observation is drawn takes a specific form. For example, F[x] is N(0,1). In the majority of cases, a statistical hypothesis will be compound rather than simple, for example, that the distribution from which an observation is drawn has a mean of zero. Often, it is more convenient to test a null hypothesis, for example, that there is no or null difference between the parameters of two populations. There is no point in performing an experiment or conducting a survey unless one also has one or more alternate hypotheses in mind.

The number of new cases of a given condition occurring within a specific time period.

A set of separate data values that are not related to each other, such as the weight of each man in a random sample of men. Many statistical approaches require that data are independently sampled. If that is not the case, researchers often aim to model the source of the non-independence. For example, children in the same classroom might be more similar to each other than children in a random sample. Researchers could then use a multilevel model to account for such non-independence.

In research design, a variable that is believed to exert an influence on another variable, the dependent variable.

A variable for which the relationship between two other variables differs depending on the category or score of the interaction variable.

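The range of values that includes the middle 50% of values when they are arranged in ascending order, that is, the difference between the upper and lower quartiles. Outliers are often defined as those cases lying more than 1.5 × IQR below the lower quartile or above the upper quartile. Extreme values can be defined as those cases lying more than 3 × IQR beyond the quartiles.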
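A minimal sketch of flagging cases that lie far beyond the quartiles, using Python's `statistics.quantiles`. The data are made up, and the exact quartile values depend on the convention used; different software packages may give slightly different results.

```python
import statistics


def iqr_outliers(data, multiplier=1.5):
    """Return the IQR and the values lying more than multiplier * IQR beyond the quartiles."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - multiplier * iqr
    upper_fence = q3 + multiplier * iqr
    flagged = [x for x in data if x < lower_fence or x > upper_fence]
    return iqr, flagged


data = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical measurements
iqr, outliers = iqr_outliers(data)
_, extremes = iqr_outliers(data, multiplier=3)
print(iqr, outliers, extremes)
```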

A graph demonstrating survival probabilities over time.

A type of ordinal rating scale developed by the psychologist Rensis Likert. A Likert scale presents a statement and asks people to indicate their agreement or disagreement using an ordered scale.

A regression model used with a binary outcome.

A statistical test used to investigate the association between two paired proportions.

The arithmetic average of a set of numbers.

The central value of a set of numbers when they are ordered by value.

A statistical analysis which combines the results of several independent studies examining the same question. Often presented as part of a systematic review.

The most common value of a variable.

Statistical modelling approach for data with a hierarchical structure or clusters, or serial measurements. Sometimes referred to as random effects or mixed models.

Data that do not have numeric meaning and for which numeric values serve only as labels (such as gender or color). Also called categorical data.

Statistics not based on assumptions about the distribution of the population(s) from which the study data have been drawn or which make less stringent assumptions than parametric statistics.

Sampling in which the probability of selection for any unit or combination of units is unknown. An example would be convenience sampling, such as advertising a study on social media. Many statistical techniques require probability sampling for statistical inference.

A continuous probability distribution with a symmetrical bell shape, which is followed by many naturally occurring variables, for example, stature. Sometimes referred to as a Gaussian distribution.

The baseline hypothesis that is tested in a statistical significance test and which is usually of the form ‘there is no difference between samples’, ‘these samples are from the same distribution’, or ‘there is no association’.

The number of patients who need to be treated in order that one additional patient has a negative outcome.

The number of patients who need to be treated in order that one additional patient has a positive outcome.
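This number is commonly calculated as the reciprocal of the absolute difference in outcome proportions between the treated and control groups. The sketch below assumes that calculation; the function name and the proportions are hypothetical.

```python
def number_needed_to_treat(p_positive_treated, p_positive_control):
    """Reciprocal of the absolute difference in the proportion with a positive outcome."""
    absolute_difference = p_positive_treated - p_positive_control
    return 1 / absolute_difference


# Hypothetical trial: 60% improve with treatment, 40% improve without it
nnt = number_needed_to_treat(0.60, 0.40)
print(round(nnt, 1))
```

The same calculation, applied to the proportions with a negative outcome, gives the number needed to harm.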

A study in which subjects are observed, with exposures and outcomes measured, without any intervention by the researcher.

The probability of an event occurring divided by the probability of it not occurring.

A measure of the difference in odds between two groups, calculated by dividing the odds in one group by the odds in another group. The odds ratio is a measure of effect size, and formulae exist to convert it to a Pearson correlation.
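A small worked sketch of odds and the odds ratio, with made-up proportions for two hypothetical groups:

```python
def odds(p):
    """Odds of an event that occurs with probability p."""
    return p / (1 - p)


def odds_ratio(p_group_a, p_group_b):
    """Odds in group A divided by odds in group B."""
    return odds(p_group_a) / odds(p_group_b)


# Hypothetical data: the event occurs in 20% of group A and 10% of group B
print(round(odds(0.20), 3))              # odds in group A
print(round(odds(0.10), 3))              # odds in group B
ratio = odds_ratio(0.20, 0.10)
print(round(ratio, 2))
```

Note that the odds ratio here is larger than the ratio of the two probabilities (0.20 / 0.10 = 2), which is why odds ratios and risk ratios should not be used interchangeably.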

The process of specifying how a concept will be defined and measured.

A variable that can be ordered, that is, ranked in size but without the assumption of equal intervals between consecutive values. For example, the grading of Pokemon cards by experts, into Mint, Near Mint, Good, Poor.

Models can be subdivided into two components, one systematic and one random. The systematic component can be a function of certain predetermined parameters (a parametric model), be parameter-free (nonparametric), or be a mixture of the two types (semiparametric). The definitions in the following section apply to the random component.

**Parametric** statistical procedures concern the parameters of distributions of a known form. One may want to estimate the variance of a normal distribution or the number of degrees of freedom of a chi-square distribution. Student’s t, the F ratio, and maximum likelihood are typical parametric procedures.

**Nonparametric** procedures concern distributions whose form is unspecified. One might use a nonparametric procedure like the bootstrap to obtain an interval estimate for a mean or a median or to test that the distributions of observations drawn from two different populations are the same. Nonparametric procedures are often referred to as distribution-free, though not all distribution-free procedures are nonparametric in nature.

**Semiparametric** statistical procedures concern the parameters of distributions whose form is not specified. Permutation methods and U statistics are typically employed in a semiparametric context.

See SIGNIFICANCE LEVEL AND p VALUE.

A measure of the strength of the linear relationship between two continuous variables. It varies between −1 and 1 and can also be used as an indication of effect size.
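A minimal sketch of the calculation written out by hand, so that each step is visible. The function name and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))
```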

A regression model used to model rates or count data based on a Poisson distribution. An example would be when we want to model the number of driving tests an individual has completed.

A term used in Bayesian statistics. A probability distribution obtained by combining prior evidence with new information.

The probability that a statistical test will find a significant difference if a real difference of a given size exists, i.e. the null hypothesis is false. Power = 1 − β.
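To make this concrete, here is a sketch for the simplest possible case: a one-sided, one-sample z-test with known standard deviation. The function name, the effect size (in standard deviation units) and the sample sizes are assumptions for illustration; real power analyses usually rely on dedicated software.

```python
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test for a standardised effect size."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # critical value under the null
    # Probability that the test statistic exceeds the critical value
    # when the true standardised effect is effect_size
    return 1 - NormalDist().cdf(z_crit - effect_size * n ** 0.5)


power = z_test_power(effect_size=0.5, n=25)
beta = 1 - power  # probability of a Type II error
print(round(power, 2), round(beta, 2))
print(round(z_test_power(effect_size=0.5, n=50), 2))  # larger sample, more power
```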

In regression analysis, a variable which is used to predict the value of an outcome variable. See INDEPENDENT VARIABLE.

A statistical method used to reduce a dataset with many inter-correlated variables to a smaller set of uncorrelated variables that explain the overall variability almost as well.

A term used in Bayesian statistics. The distribution representing prior beliefs or existing information, which is combined with new data to provide the posterior distribution.

Sampling methods in which all combinations of members of the population have a known probability of selection.

A study in which individuals are followed (and data collected) moving forward in time.

A term used in meta-analysis. A bias that occurs when the papers which are published on a topic are an incomplete subset of all the studies which have been conducted on that topic. There are tests which allow examining the potential occurrence of publication bias (see FUNNEL PLOT).

Research that generates non-numerical data which are not analysed using statistical methods, for example recorded in-depth interviews may be examined to identify common themes.

Data which can be expressed numerically and are usually either measured or counted.

Research that generates numerical data which can be analysed using statistical methods.

Error that is due to chance. Random error makes measurement less precise but does not introduce bias. The opposite of systematic error.

A method of expressing the relationship between the magnitude of two numbers. The numbers do not need to share a common unit (for instance, number of pet dogs per 1,000 population).

A graph plotting the sensitivity against 1–specificity for a diagnostic test at different cut-off points. This relates to signal detection theory.
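A minimal sketch of how the points on such a graph are obtained: for each cut-off, classify cases as positive when their score is at or above the cut-off, then compute sensitivity and 1 − specificity. The function name, scores, labels and cut-offs are all made up.

```python
def roc_points(scores, has_condition, cutoffs):
    """Return (1 - specificity, sensitivity) pairs, one per cut-off."""
    n_positive = sum(has_condition)
    n_negative = len(has_condition) - n_positive
    points = []
    for cutoff in cutoffs:
        test_positive = [s >= cutoff for s in scores]
        true_pos = sum(t and c for t, c in zip(test_positive, has_condition))
        false_pos = sum(t and not c for t, c in zip(test_positive, has_condition))
        points.append((false_pos / n_negative, true_pos / n_positive))
    return points


scores = [0.1, 0.4, 0.35, 0.8]               # hypothetical test scores
has_condition = [False, False, True, True]   # true status of each case
points = roc_points(scores, has_condition, cutoffs=[0.3, 0.5])
print(points)
```

Lowering the cut-off raises sensitivity but also raises the false positive rate, which is the trade-off the curve displays.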

A sub-group of cases selected from a population. Often researchers want the sample to be randomly drawn from the population in order to infer patterns. For example, suppose that we want to find out whether the average height of male Northumbria students is larger than, say, 180 cm. We would then want to randomly sample individuals, for instance by using a computer to randomly select 5% of cases in a database of male student IDs. A *biased* sample would be to only sample those individuals enrolled in a sports science degree. With some exceptions, most statistical analyses rely on random sampling. If we sampled all individuals in a population, then no statistics with regards to ‘uncertainty’ would need to be calculated. In our example, if we measured all male Northumbria students, then there would be no ‘uncertainty’ and we would know the answer to our question: we would know the population average!

A way of testing assumptions made in statistical analyses by doing several analyses based on different assumptions, and comparing the results.

Repeated measurements taken over time. Such data requires longitudinal analysis, often referred to as time series analysis.

The significance level is the probability of making a Type I error. It is a characteristic of a statistical procedure.

The p value is a random variable that depends both upon the sample and the statistical procedure that is used to analyze the sample.

If one repeatedly applies a statistical procedure at a specific significance level to distinct samples taken from the same population when the hypothesis is true and all assumptions are satisfied, then the p value will be less than or equal to the significance level with the frequency given by the significance level.
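This long-run behaviour can be checked by simulation. The sketch below repeatedly draws samples from a population in which the hypothesis is true (mean 0, known standard deviation 1), applies a two-sided z-test, and counts how often the p value falls at or below the significance level. The test, sample size and number of repetitions are arbitrary choices for illustration.

```python
import random
from statistics import NormalDist, fmean

random.seed(42)  # fixed seed so the sketch is reproducible
ALPHA = 0.05     # significance level
N, REPS = 20, 5000


def two_sided_p(sample, mu0=0.0, sigma=1.0):
    """p value of a two-sided z-test of mean == mu0 with known sigma."""
    z = (fmean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Count how often the hypothesis is rejected although it is true
rejections = sum(
    two_sided_p([random.gauss(0, 1) for _ in range(N)]) <= ALPHA
    for _ in range(REPS)
)
rejection_rate = rejections / REPS
print(rejection_rate)  # should be close to ALPHA
```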

Bias due to the way a sample is selected. For example, advertising a study with potentially large financial rewards might attract participants who are disproportionally poorer than the overall population.

Data that do not follow a symmetrical distribution. This can violate the assumptions of (parametric) statistical tests.

A measure of dispersion used for continuous data. It is equal to the square root of the variance.

A measure of precision. It is the standard deviation of the sampling distribution of the sample mean.
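In practice the standard error of the mean is usually estimated from a single sample as the sample standard deviation divided by the square root of the sample size. A minimal sketch with made-up data:

```python
import statistics


def standard_error(sample):
    """Estimated standard error of the mean: sample SD / sqrt(n)."""
    return statistics.stdev(sample) / len(sample) ** 0.5


sample = [4, 8, 6, 5, 7]  # hypothetical measurements
se = standard_error(sample)
print(round(statistics.fmean(sample), 2), round(se, 3))
```

Because of the square root, quadrupling the sample size only halves the standard error.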

A graph which uses the data values themselves to depict the shape of a frequency distribution.

Error due to some cause other than chance. Systematic error can make observed values consistently higher or lower than true values and thus introduce bias. The opposite of random error.

A literature review which aims to identify and qualify all (published) research answering a given question.

A Type I error is made when the hypothesis is rejected even though it is true. A Type II error is made when the hypothesis is accepted even though an alternative hypothesis is true. Thus, the probability of a Type II error depends on the alternative.

The power of a test for a given alternative hypothesis is the probability of rejecting the original hypothesis when the alternative is true. A Type II error is made when the original hypothesis is accepted even though the alternative is true. Thus, power is one minus the probability of making a Type II error. (also see BETA)

A code or variable used to identify all the records belonging to a single unit of analysis (for instance, a student ID to identify all the courses a single student is enrolled in). Often, a unique identifier is needed to link various datasets together.

How closely a measurement actually measures what it is intended to measure.

A quantity that is measured or observed and which varies (or can vary) from case to case. The opposite is a constant.

A measure of the variability of a range of numbers, calculated as the mean squared difference from the mean. The square root of the variance is the standard deviation.

A type of selection bias resulting from collecting data from a sample of volunteers rather than the general population. Volunteers could differ in all sorts of characteristics from the overall population.

Also known as the Mann-Whitney U test or Mann-Whitney-Wilcoxon test. A statistical test comparing ordinal data from two independent groups. A non-parametric alternative to the independent samples *t*-test.

The above is compiled based on the following sources:

Boslaugh, S. (2012). *Statistics in a nutshell: A desktop quick reference* (2nd ed.). Cambridge, UK: O’Reilly.

Crawley, M. J. (2013). *The R book* (2nd ed.). New York, NY: John Wiley & Sons.

Good, P. I., & Hardin, J. W. (2012). *Common errors in statistics (and how to avoid them).* Hoboken, NJ: John Wiley & Sons.

Peacock, J. L., & Peacock, P. J. (2011). *Oxford handbook of medical statistics.* Oxford, UK: Oxford University Press.