Here are definitions of terms commonly used in evaluation planning and execution. Most definitions were drawn from glossaries produced by the U.S. Department of Education's What Works Clearinghouse, the U.S. Department of Education's Evaluation Toolkit for Magnet School Programs, the Centers for Disease Control and Prevention, or the U.S. Agency for International Development.


Baseline

Information collected before or at the start of a project or program that provides a basis for planning and/or assessing subsequent progress and impact.


Bias

The extent to which a measurement, sampling, or analytic method systematically underestimates or overestimates the true value of a variable or attribute.

Case Study

A systematic description and analysis of a single project, program, or activity.

Causal Inference

The logical process used to draw conclusions from evidence concerning what has been produced or “caused” by a program. To say that a program produced or caused a certain result means that, if the program had not been there (or if it had been there in a different form or degree), then the observed result (or level of result) would not have occurred.

Comparison Group

The group that does not receive the services, products or activities of the program being evaluated. Also called the control group.

Comparison Group Design

A study design in which outcomes for a group receiving an intervention are compared to those for a group not receiving the intervention. Examples include randomized controlled trials, quasi-experimental designs, and regression discontinuity designs.

Cost-Benefit Analysis

An evaluation of the relationship between program costs and outcomes. It can be used to compare different interventions with the same outcomes to determine efficiency.


Counterfactual

The comparison condition. The counterfactual may be receiving a different intervention or not receiving any services or intervention. For group design studies, groups receiving different dosage levels or different versions of a single intervention are not acceptable counterfactuals.

Dependent Variable

Dependent (output, outcome, response) variables are so called because they are "dependent" on the independent variable; the outcome presumably depends on how the input variables are managed or manipulated.

Descriptive Statistical Analysis

Numbers and tabulations used to summarize and present quantitative information concisely.
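As a quick illustration, a descriptive summary tabulates counts, central tendency, and spread. This sketch uses hypothetical test scores and Python's standard statistics module:

```python
import statistics

scores = [72, 85, 91, 68, 77, 85, 90]  # hypothetical test scores

# Concise numeric summary of the data
summary = {
    "n": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "stdev": statistics.stdev(scores),
    "min": min(scores),
    "max": max(scores),
}
```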

Differential Attrition

The difference in attrition rates between the intervention and comparison groups. High levels of differential attrition in a randomized controlled trial suggest that the intervention group (as analyzed) may differ from the comparison group (as analyzed) in ways other than due to the intervention.

Effect Size

A standardized measure of the magnitude of an effect. The effect size represents the change (measured in standard deviations) in an average student’s outcome that can be expected if that student is given the intervention. Because effect sizes are standardized, they can be compared across outcomes and studies.
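One common effect-size formula is the standardized mean difference (Cohen's d with a pooled standard deviation). This is a sketch with hypothetical scores, not necessarily the exact formula a given review protocol prescribes:

```python
from statistics import mean, stdev

def effect_size(intervention, comparison):
    """Standardized mean difference: the gap between group means,
    expressed in pooled standard deviation units."""
    n1, n2 = len(intervention), len(comparison)
    s1, s2 = stdev(intervention), stdev(comparison)
    pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
    return (mean(intervention) - mean(comparison)) / pooled_sd

# Hypothetical outcome scores for two groups
treated = [78, 82, 85, 90, 88]
control = [75, 80, 79, 83, 81]
d = effect_size(treated, control)  # change measured in standard deviations
```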


Effectiveness

An intervention is effective if it improves outcomes relative to what would have been seen without the intervention.


Equivalence

A demonstration of the similarity of the analysis groups at baseline. Randomized controlled trials with high attrition and quasi-experimental designs must establish that the intervention and comparison groups used in the analysis were equivalent on observable characteristics at baseline. Characteristics for which equivalence must be established are outlined in the protocol for a review.

Evaluation Design

The logic model or conceptual framework used to arrive at conclusions about outcomes.

Evaluation Plan

A written document describing the overall approach or design that will be used to guide an evaluation. It includes what will be done, how it will be done, who will do it, when it will be done, why the evaluation is being conducted, and how the findings will likely be used.

Evaluation Strategy

The method used to gather evidence about one or more outcomes of a program. An evaluation strategy is made up of an evaluation design, a data collection method, and an analysis technique.

Experimental Design

The random assignment of students, classrooms, or schools to either the intervention group (or groups) or the control group (or groups). Randomized experiments are the most effective and reliable research method available for testing causal hypotheses and for drawing causal conclusions, that is, being able to say that the intervention caused the outcomes.

External Evaluation

The evaluation of an intervention or program conducted by entities and/or individuals that are not directly related to the implementing organization.

External Validity

The degree to which findings, conclusions, and recommendations produced by an evaluation are applicable to other settings and contexts.


FERPA

The Family Educational Rights and Privacy Act (20 U.S.C. § 1232g; 34 CFR Part 99) is a Federal law that protects the privacy of student education records. The law applies to all schools that receive funds under an applicable program of the U.S. Department of Education.


Fidelity of Implementation

The extent to which an intervention or program is implemented and delivered as designed. It is one important focus of a process evaluation.

Formative Evaluation

An evaluation conducted during the course of project implementation with the aim of improving performance during the implementation phase. Related terms: process evaluation, summative evaluation.  

Impact Evaluation

A systematic study of the change that can be attributed to a particular intervention, such as a project, program or policy. Impact evaluations typically involve the collection of baseline data for both an intervention group and a comparison or control group, as well as a second round of data collection after the intervention, sometimes even years later. Related terms: outcome evaluation; rigorous evaluation.

Independent Evaluation

An evaluation carried out by entities and persons not directly involved in the design or implementation of a project or program. It is characterized by full access to information and by full autonomy in carrying out investigations and reporting findings.

Independent Variable

A variable that may influence or predict to some degree, directly or indirectly, the dependent variable. An independent variable may be able to be manipulated by the researcher (for example, introduction of an intervention in a program) or it may be a factor that cannot be manipulated (for example, the age of beneficiaries).


Intervention

An educational program, product, practice, or policy aimed at improving student outcomes.

Intervention Group

The group in an analysis that receives the intervention being tested.

Internal Evaluation

Evaluation conducted by those who are implementing and/or managing the intervention or program. Related term: self-evaluation.

Internal Validity

The degree to which conclusions about causal linkages are appropriately supported by the evidence collected.

Logic Model

A systematic and visual way to present the perceived relationships among the resources you have to operate the program, the activities you plan to do, and the changes or results you hope to achieve. Related terms: program theory; theory of action.

Logical Framework

A management tool used to improve the design and evaluation of interventions that is widely used by development agencies. It is a type of logic model that identifies strategic project elements (inputs, outputs, outcomes, impact) and their causal relationships, indicators, and the assumptions or risks that may influence success and failure. Related term: Results Framework.

Longitudinal Data

Data collected over a period of time, involving a stream of data for particular persons or entities over time.

Mixed Methods

Use of both quantitative and qualitative methods of data collection in an evaluation.


Monitoring

The performance and analysis of routine measurements to detect changes in status. Monitoring is used to inform managers about the progress of an ongoing intervention or program, and to detect problems that may be able to be addressed through corrective actions.


Outcome

Knowledge, skills, attitudes, and other desired benefits that are attained as a result of an activity. To examine the effectiveness of an intervention for the What Works Clearinghouse (WWC), eligible research must compare the outcome for a group receiving the intervention to the outcome for a group not receiving the intervention. An outcome measure is an instrument, device, or method that provides data on the outcome. An outcome domain is a group of closely related outcome measures, believed to provide information on the same underlying skill or ability.

Outcome Evaluation

This form of evaluation assesses the extent to which a program achieves its outcome-oriented objectives. It focuses on outputs and outcomes (including unintended effects) to judge program effectiveness but may also assess program processes to understand how outcomes are produced. Related terms: impact evaluation; rigorous evaluation.

Overall Attrition

The attrition rate of the combined intervention and comparison groups. High levels of overall attrition in a randomized controlled trial suggest that the intervention group (as analyzed) may differ from the comparison group (as analyzed) in ways other than due to the intervention.
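Both overall and differential attrition are simple arithmetic on counts of subjects assigned versus subjects analyzed. The counts below are hypothetical:

```python
def attrition_rate(assigned, analyzed):
    """Share of originally assigned subjects missing from the analysis."""
    return (assigned - analyzed) / assigned

# Hypothetical counts from a randomized controlled trial
intervention_assigned, intervention_analyzed = 200, 170
comparison_assigned, comparison_analyzed = 200, 190

# Overall attrition: combined groups
overall = attrition_rate(intervention_assigned + comparison_assigned,
                         intervention_analyzed + comparison_analyzed)

# Differential attrition: gap between the two groups' rates
differential = abs(attrition_rate(intervention_assigned, intervention_analyzed)
                   - attrition_rate(comparison_assigned, comparison_analyzed))
# overall = 0.10, differential = 0.10
```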

Performance Management

Systematic process of collecting and analyzing performance data to track progress towards planned results to improve resource allocation, implementation, and results. A goal is often continuous improvement.

Process Evaluation

A form of evaluation that focuses on what happens in a program as it is delivered and documents the extent to which intervention strategies and activities are executed as planned. It requires close monitoring of implementation activities and processes. This type of information can be used to adjust activities throughout a program’s lifecycle. 

Program Evaluation

The systematic collection of information about the activities, characteristics, and outcomes of programs to make judgments about the program, improve program effectiveness, and/or inform decisions about future program development.

Program Theory

A statement of the assumptions about why the intervention should affect the intended outcomes. The theory includes hypothesized links between (a) the program requirements and activities, and (b) the expected outcomes. It is depicted in the logic model. Related terms: theory of action; theory of change.  

Propensity Score Matching

A statistical process of identifying a control group (e.g. students, classrooms, schools) that is observationally similar to a specific treatment group in non-experimental settings.
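A minimal sketch of the matching step, assuming propensity scores (each unit's estimated probability of treatment) have already been computed; the unit IDs and scores are hypothetical, and greedy 1:1 nearest-neighbor matching without replacement is only one of several matching strategies:

```python
def nearest_neighbor_match(treated, controls):
    """Match each treated unit to the control unit with the closest
    propensity score (greedy 1:1 matching without replacement)."""
    available = dict(controls)            # control id -> score
    matches = {}
    for t_id, t_score in treated.items():
        best = min(available, key=lambda c: abs(available[c] - t_score))
        matches[t_id] = best
        del available[best]               # each control used at most once
    return matches

# Hypothetical precomputed propensity scores
treated_scores = {"s1": 0.82, "s2": 0.55}
control_scores = {"c1": 0.30, "c2": 0.57, "c3": 0.80}

pairs = nearest_neighbor_match(treated_scores, control_scores)
# pairs -> {"s1": "c3", "s2": "c2"}
```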

Qualitative Data

Nonnumeric data that can answer the how and why questions in an evaluation. These data are needed to triangulate (see definition for triangulation) results to obtain a complete picture of the effects of an intervention.

Quasi-experimental Design

A design in which groups are created through a process that is not random. For a quasi-experimental design to be rigorous, the intervention and comparison groups must be similar, demonstrating baseline equivalence on observed characteristics, before the intervention is started.  

Random Assignment

The process of grouping research subjects so that each subject has a fair and equal chance of receiving either the intervention being studied (by being placed in the treatment group), or not (by being placed in the "control" group). Related term: randomization.
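A minimal sketch of simple randomization into two equal groups; the student IDs are hypothetical, and real trials often use stratified or blocked randomization rather than a single shuffle:

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle subjects and split them evenly, so each subject has
    an equal chance of landing in the treatment or control group."""
    rng = random.Random(seed)   # seed makes the assignment reproducible
    pool = list(subjects)
    rng.shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]   # (treatment, control)

students = [f"student_{i}" for i in range(10)]
treatment, control = randomly_assign(students, seed=42)
```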

Randomized Controlled Trial

A design in which groups are created through a process that is random. Carried out correctly, random assignment results in groups that are similar on average in both observable and unobservable characteristics, and any differences in outcomes between the groups are due to the intervention alone.  

Regression Discontinuity Design

A design in which groups are created using a continuous scoring rule. For example, students may be assigned to a summer school program if they score below a preset point on a standardized test, or schools may be awarded a grant based on their score on an application. A regression line or curve is estimated for the intervention group and similarly for the comparison group, and an effect occurs if there is a discontinuity in the two regression lines at the cutoff.


Reliability

Consistency or dependability of data with reference to the quality of the instruments and procedures used. Data are reliable when the repeated use of the same instrument generates the same results.

Rigorous Evaluation

An evaluation that uses an experimental or quasi-experimental design to determine a program's effectiveness. Related terms: impact evaluation; outcome evaluation.

Selection Bias

A threat to internal validity that occurs when the treatment and control groups involved in the program are initially statistically unequal in terms of one or more of the factors of interest.


Stakeholders

Individuals who have an interest in a project. Examples include students, teachers, the project's source of funding, the sponsoring or host organization, internal project administrators, participants, parents, community members, and other potential program users.

Statistical Significance

A general evaluation term referencing the idea that a difference observed in a sample is unlikely to be due to chance. Statistical tests are performed to determine whether one group (e.g. the experimental group) is different from another group (e.g. the control or comparison group) on the measurable outcome variables used in a research study.
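One self-contained way to test such a difference is an exact permutation test on the difference in means: the p-value is the share of all possible regroupings that produce a difference at least as extreme as the one observed. The scores below are hypothetical, and t-tests are more common in practice:

```python
import itertools
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in means."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = group_a + group_b
    n = len(group_a)
    count = total = 0
    # Enumerate every way to split the pooled data into two groups
    for combo in itertools.combinations(range(len(pooled)), n):
        a = [pooled[i] for i in combo]
        b = [pooled[i] for i in range(len(pooled)) if i not in combo]
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            count += 1
        total += 1
    return count / total

experimental = [88, 92, 94, 91, 89]
control = [78, 80, 83, 79, 81]
p = permutation_p_value(experimental, control)  # small p: unlikely due to chance
```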

Summative Evaluation

Evaluation of an intervention or program in its later stages or after it has been completed to (a) assess its impact, (b) identify the factors that affected its performance, (c) assess the sustainability of its results, and (d) draw lessons that may inform other interventions. Related terms: outcome evaluation; formative evaluation.


Triangulation

The use of multiple data sources, observations, research methods, or theories in investigations to verify an outcome finding.

Unit of Analysis

The level at which an analysis is conducted. For example, a study that looks at student outcomes will likely conduct the analysis using student-level data.

Unit of Assignment

The level at which group assignment is conducted. For example, a study may have random assignment conducted at the school level, though analysis may be at the school or student level.


Validity

The extent to which data measures what it purports to measure and the degree to which that data provides sufficient evidence for the conclusions made by an evaluation.