
Construct tests to evaluate the learner's mastery of the learning objective. Tests should be developed in the design phase. In the past, tests were often the last items developed in an instructional program, built during the development phase after all of the training material had been completed. Many of these tests measured the instructional material itself, "nice-to-know" information, or items not directly related to the learning objectives. However, the major purpose of a test is to promote the development of the learner: it ascertains whether the desired behaviour changes have occurred following the training activities by evaluating the learner's ability to accomplish the learning objective. It also provides feedback to both the learner and the instructor.
The learning objective should be a good simulation of the conditions, behaviours, and standards of the performance needed in the real world; therefore, the evaluation at the end of the instruction should match the objective. The methodology and content of the learning program should directly support the learning objectives. The instructional media should explain, demonstrate, and provide practice. Then, when students learn, they can perform on the test, meet the objective, and perform as they must in the real world. The diagram below shows how it all flows together:

To focus the training program on the required task performance measurements, the present model is based on the following development order:

Analyze the task to determine the objective.
Develop the learning objective fully and determine whether it has any enabling objectives. If it does, state them explicitly.
List the steps required to perform the objective to standards.
Build a test instrument to determine if the learner can perform the steps that are required to reach the objective.
Construct courseware that will train the learners to perform the objective. You know the learners can perform the objective if they can meet the evaluation standards.

Using this development order, the focal point of the model is the objective. The objective specifies what behaviour must be displayed to perform the task to standards. Training is then developed to teach the steps that will best lead to the desired behaviour. This is what training is all about!

Tests are often referred to as "evaluations" or "measurements". To avoid confusion, we define the terms used in evaluating learners as follows:

Test or Test Instrument: A systematic procedure for measuring a sample of an individual's behaviour, such as a multiple-choice test or a performance test.
Evaluation: A systematic process for collecting and using information from many sources to interpret results and to make value judgments and decisions. This collection of results or scores is normally used in the final analysis of whether a learner passes or fails. In a short course the evaluation could consist of one test, while in a larger course it could consist of dozens of tests. Evaluation is also the process of determining the value and effectiveness of a learning program, module, or course.
Measurement: The process employed to obtain a quantified representation of the degree to which a learner reflects a trait or behaviour. The result is one of the many scores that an individual may achieve on a test. An evaluator is most interested in the gap between a learner's score and the maximum possible score. If the test instrument is valid, this gap represents the area that the learner did not master.
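The relationship between measurements and an evaluation can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the Measurement class and evaluate function are hypothetical names, and the 80 percent pass criterion is an assumed placeholder, since a real criterion referenced course takes its standard from the learning objective. Each measurement reports its gap (maximum score minus score), and the evaluation combines several measurements into one pass/fail decision.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """One score a learner obtained on one test instrument."""
    test_name: str
    score: int
    max_score: int

    def gap(self) -> int:
        # Points between the learner's score and the maximum possible score.
        return self.max_score - self.score


def evaluate(measurements: list[Measurement], pass_percent: int = 80) -> tuple[bool, dict[str, int]]:
    """Combine several measurements into one pass/fail evaluation.

    pass_percent is an assumed placeholder criterion. Integer arithmetic
    avoids floating-point rounding right at the pass boundary.
    """
    gaps = {m.test_name: m.gap() for m in measurements}
    passed = all(m.score * 100 >= pass_percent * m.max_score for m in measurements)
    return passed, gaps


results = [
    Measurement("Module 1 written test", 18, 20),
    Measurement("Module 2 performance test", 7, 10),
]
passed, gaps = evaluate(results)
print(passed, gaps)  # False {'Module 1 written test': 2, 'Module 2 performance test': 3}
```

Requiring every measurement to meet the criterion, rather than averaging them, keeps the sketch criterion referenced: a strong written score cannot hide a weak performance test.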

A scheme should be constructed before any test items are produced. Without an advance plan, some topics may never be tested while others are tested repeatedly. It is often easier to build test items on some topics than on others, and these easier topics tend to get over-represented. It is also easier to build test items that require the recall of simple facts than items calling for critical evaluation, integration of different facts, or application of principles to new situations. A good test or evaluation plan has a descriptive scheme that states what the learners may or may not do while taking the test. It includes the behavioural objectives, the content topics, the distribution of test items, and how the learner's test performance should be interpreted.
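A test plan can be kept as a simple mapping from each objective to its planned number of items, which makes over- and under-represented topics easy to spot while items are being drafted. The sketch below assumes hypothetical objectives borrowed from the cash-register example later in this section and a hypothetical compare_to_plan helper; it only counts items per objective and does not judge the level of thinking each item demands.

```python
from collections import Counter

# Hypothetical test plan: number of items planned for each objective.
TEST_PLAN = {
    "Identify the register keys": 3,
    "Ring up a sale": 5,
    "Apply a discount": 2,
}


def compare_to_plan(plan: dict[str, int], drafted_items: list[str]) -> dict[str, int]:
    """Return planned minus drafted item counts for each objective.

    drafted_items holds the objective targeted by each drafted test item.
    A positive value means the objective is under-represented; a negative
    value means it is over-represented (or was never in the plan).
    """
    drafted = Counter(drafted_items)
    objectives = sorted(set(plan) | set(drafted))
    # Counter returns 0 for objectives with no drafted items.
    return {obj: plan.get(obj, 0) - drafted[obj] for obj in objectives}


# Easy recall items on the register keys crowd out the harder objectives.
drafted = ["Identify the register keys"] * 6 + ["Ring up a sale"] * 2
print(compare_to_plan(TEST_PLAN, drafted))
# {'Apply a discount': 2, 'Identify the register keys': -3, 'Ring up a sale': 3}
```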

There are several varieties of tests. The most commonly used in training programs are Criterion Referenced Written Tests, Performance Tests, and Attitude Surveys. Although there are exceptions, each type of test is normally used to test one of the three learning domains. Most tasks require the use of more than one learning domain, but one domain generally stands out. The dominant domain should be the focal point of one of the following evaluations:

Criterion Referenced Test: Evaluates the cognitive domain, which includes the recall or recognition of specific facts, procedural patterns, and concepts that serve in the development of intellectual abilities and skills. These abilities and skills are often measured with a written test or a performance test. Note: A criterion referenced evaluation focuses on how well a learner performs in terms of a known standard or criterion. This differs from a norm referenced evaluation, which focuses on how well a learner performs in comparison with other learners or peers.
Performance Test: Evaluates the psychomotor domain, which involves physical movement, coordination, and use of the motor-skill areas. It is measured in terms of speed, precision, distance, procedures, or techniques in execution, and it can also be used to evaluate the cognitive domain. A performance test is also a criterion referenced test if it measures against a set standard or criterion. A performance test that determines who can perform a task the quickest would be a norm referenced performance test.
Attitude Survey: Evaluates the affective domain, which addresses the manner in which we deal with things emotionally, such as feelings, values, appreciation, enthusiasms, motivations, and attitudes. Attitudes are not observable; therefore, a representative behaviour must be measured. For example, we cannot tell whether a worker is well motivated by looking at her or testing her, but we can observe some representative behaviours, such as being on time, working well with others, and performing tasks in an excellent manner.

Whenever possible, criterion referenced performance tests should be used. Having a learner perform the task under realistic conditions is normally a better indicator of a person's ability to perform the task under actual working conditions.
If a performance test is not possible, then a criterion referenced written test should be used to measure the learners' achievements against the objectives. The test items should determine the learner's acquisition of the KSAs required to perform the task. Since a written measuring device samples only a portion of the population of behaviours, the sample must be representative of the behaviours associated with the task. Since it must be representative, it must also be comprehensive.

Open-ended question: This question allows an unrestricted answer. It is followed by enough blank space for the response.
Checklist: This question lists items and directs the learner to check those that apply to the situation.
Two-Way question: This type of question has alternate responses, such as yes/no or true/false.
Multiple-Choice question: This gives several choices, and the learner is asked to select the most correct one.
Ranking Scales: This type of question requires the learner to rank a list of items.
Essay: Requires an answer in a sentence, paragraph, or short composition. The problem with essays is the wide variance in how instructors grade them. The problem with the other types of questions (multiple choice, true/false, etc.) is that they emphasize isolated bits of information and thus measure a learner's ability to recognize the right answer, but not the ability to recall or reproduce it. In spite of this criticism, learners who score high on these types of questions also do well on essay examinations. Thus the two kinds of tests appear to measure the same type of competencies.

The most commonly used question in training environments is the multiple-choice question. Each question is called a test item. The parts of the test item are labelled as:

When writing multiple-choice questions follow these points to build a well constructed test instrument:

The stem should present the problem clearly.
Only one correct answer should be included.
Distracters should be reasonable.
'All the above' should be used sparingly. If it is used, it should be the correct answer about as often as it is a distracter. Do not use 'None of the above'.
Each item should test one central idea or principle. This enables the learner to concentrate fully on answering the question instead of analysing it. It also allows the instructor to determine exactly which principles the learner did not comprehend.
The answer and distracters for a question should be listed in a logical series: high to low, low to high, alphabetical, longest to shortest, like vs. unlike, by function, etc.
Often, test items can be improved by modifying the stem. In the two examples below, the stem has been modified to eliminate duplicate words in the distracters. This makes the question easier to read.

1. The written objectives statement should
_____a. reflect the identified needs of the learner and developer
_____b. reflect the identified needs of the learner and organization
_____c. reflect the identified needs of the developer and organization
_____d. reflect the identified needs of the learner and instructor

In the above example, all the distracters were simply chosen at random. A better example with believable distracters and numbers in sequence would be:

If an acceptable and valid distracter cannot be found, then fewer distracters should be used. Four choices are considered the standard for multiple-choice questions because they allow only a 25% chance of guessing the correct answer; however, use three choices if another believable distracter cannot be constructed. Never add a distracter just to provide four choices, as it only wastes the learner's time reading through the possible choices.
Also, notice that the layout of the above example question makes an excellent score sheet for the instructor as it gives all the required information for a full review of the evaluation.
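Some of the guidelines above are mechanical enough to check automatically. The following Python sketch uses hypothetical check_item and shared_leading_words helpers (not a standard tool) to flag an empty stem, a 'None of the above' choice, duplicate choices, fewer than three choices, and words repeated at the start of every choice that could be moved into the stem. Storing a single answer index enforces the one-correct-answer rule by construction. Judgments such as whether a distracter is believable still need a human reviewer.

```python
def shared_leading_words(options: list[str]) -> str:
    """Return the words that every choice starts with ('' if none)."""
    shared = []
    # zip stops at the shortest choice, comparing word position by position.
    for words in zip(*(opt.split() for opt in options)):
        if any(w != words[0] for w in words):
            break
        shared.append(words[0])
    return " ".join(shared)


def check_item(stem: str, options: list[str], answer_index: int) -> list[str]:
    """Return a list of guideline problems for one multiple-choice item."""
    problems = []
    if not stem.strip():
        problems.append("empty stem")
    # A single answer index means only one choice can be keyed as correct.
    if not 0 <= answer_index < len(options):
        problems.append("answer index does not point at a choice")
    if len(options) < 3:
        problems.append("fewer than three choices; consider a true-false item")
    normalized = [opt.strip().lower() for opt in options]
    if len(set(normalized)) != len(normalized):
        problems.append("duplicate choices")
    if "none of the above" in normalized:
        problems.append("uses 'None of the above'")
    shared = shared_leading_words(options)
    if shared:
        problems.append(f"move '{shared}' into the stem")
    return problems


choices = [
    "reflect the identified needs of the learner and developer",
    "reflect the identified needs of the learner and organization",
    "reflect the identified needs of the developer and organization",
    "reflect the identified needs of the learner and instructor",
]
# answer_index=1 is assumed here purely for illustration.
print(check_item("The written objectives statement should", choices, 1))
# ["move 'reflect the identified needs of the' into the stem"]
```

Applied to the example question above, the only problem reported is the phrase repeated at the start of every choice, which is the stem modification the guideline recommends.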

True-false questions provide an adequate method for testing learners when two or more distracters cannot be constructed for a multiple-choice question, or to break up the monotony of a long test. Multiple-choice questions are generally preferable, because a learner who does not know the answer has a 25 percent chance of correctly guessing a question with four choices, or approximately 33 percent for a question with three choices. With a true-false question, the odds improve to a 50 percent chance of guessing the correct answer.
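The guessing odds quoted above follow from a chance of 1/k per item when a learner picks blindly among k equally plausible choices. Extending the same arithmetic with the binomial distribution shows how the difference grows over a whole test. The sketch below assumes blind guessing on every item and a hypothetical 10-item test with a pass mark of 8; both are illustrative assumptions, not standards from this document.

```python
from fractions import Fraction
from math import comb


def guess_chance(num_choices: int) -> Fraction:
    """Chance of guessing one item correctly when all choices look equally likely."""
    return Fraction(1, num_choices)


def chance_to_pass_by_guessing(num_items: int, num_choices: int, pass_mark: int) -> float:
    """Probability of at least pass_mark correct items by blind guessing (binomial)."""
    p = guess_chance(num_choices)
    total = sum(
        comb(num_items, k) * p**k * (1 - p) ** (num_items - k)
        for k in range(pass_mark, num_items + 1)
    )
    return float(total)


for k in (4, 3, 2):
    print(k, "choices:", guess_chance(k))  # 1/4, 1/3, 1/2

print(round(chance_to_pass_by_guessing(10, 2, 8), 4))  # true-false: 0.0547
print(round(chance_to_pass_by_guessing(10, 4, 8), 6))  # four choices: 0.000416
```

Under these assumptions, a learner guessing blindly passes the true-false version about 5.5 percent of the time (56/1024), but passes the four-choice version only about 0.04 percent of the time (436/1,048,576).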
True and false questions are constructed as follows:

Question 1 is false because there should be approximately an equal number of true and false items. Question 2 is true for any type of question. Other pointers for true-false tests are:

Open-ended questions provide a better method of testing than multiple-choice or true-false questions because they allow little or no guessing; however, they take longer to construct and are more difficult to grade. Open-ended questions are constructed as follows:

Placing the blank at or near the end of a statement allows the learner to concentrate on the intent of the statement. Also, the overuse of blanks tends to create ambiguity. For example:
Poor example:

A performance test allows the learner to demonstrate a skill that has been learned in a training program. Performance tests are also criterion referenced in that they require the learner to demonstrate the required behaviour stated in the objective. For example, the learning objective "Calculate the exact price on sales using a cash register" could be tested by having the learners ring up the total for a given number of sales items using an actual cash register. The evaluator should work from a check sheet that lists all the performance steps that the learner must perform to pass the test. If the standard is met, the learner passes. If any of the steps are missed or performed incorrectly, the learner should be given additional practice and coaching and then retested.
There are three critical factors in a well conceived performance test:

The learner must know what behaviours (actions) are required in order to pass the test. This is achieved by providing adequate practice and coaching sessions throughout the learning sessions. Before the performance evaluation takes place, the steps required for a successful completion of the test must be fully understood by the learner.
The necessary equipment and scenario must be ready and in good working condition prior to the test. This is accomplished by prior planning and a commitment by the leaders of the organization to provide the necessary resources.
The evaluator must know what behaviours are to be looked for and how they are rated. The evaluator must know each step of the task and the parameters for the successful completion of each step.
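The evaluator's check sheet can be represented as a list of steps, each marked as performed to standard or not. In this minimal Python sketch, the Step class, the score_performance_test function, and the cash-register steps are all hypothetical. The learner passes only when every step meets the standard, and any missed steps are returned so the instructor knows what to coach before the retest.

```python
from dataclasses import dataclass


@dataclass
class Step:
    description: str
    performed_to_standard: bool


def score_performance_test(check_sheet: list[Step]) -> tuple[bool, list[str]]:
    """Pass only if every step on the check sheet was performed to standard.

    Also returns the missed steps, which need practice and coaching
    before the learner is retested.
    """
    missed = [step.description for step in check_sheet if not step.performed_to_standard]
    return len(missed) == 0, missed


# Hypothetical check sheet for the cash-register objective.
sheet = [
    Step("Enter every sales item", True),
    Step("Total the sale", False),
    Step("State the exact price to the customer", True),
]
print(score_performance_test(sheet))  # (False, ['Total the sale'])
```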

Attitude surveys measure the results of a training program, organization, or selected individuals. The goal might be to change the entire organization (Organizational Development) or to measure a learner's attitude in a specific area. Since attitudes are defined as latent constructs and are not observable in themselves, the developer must identify some behaviour that appears to be representative of the attitude in question. This behaviour can then be measured as an index of the attitude construct. Often, the survey must be administered several times, because an individual's attitudes can vary from day to day and sometimes even from hour to hour. Before and after measurements should be taken to show the changes in attitude. Generally, a survey is conducted one or more times to assess the attitude in a given area; then a program is undertaken to change the individuals' attitudes. After the program is completed, the survey is administered again to test its effectiveness.
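The before-and-after comparison can be sketched as the difference between the averages of repeated administrations, which smooths out day-to-day swings in attitude. This assumes each administration yields a single numeric index of the representative behaviour (for example, a hypothetical 1 to 5 rating); the attitude_change function and the sample scores are illustrative only.

```python
from statistics import mean


def attitude_change(before: list[float], after: list[float]) -> float:
    """Difference between the mean post-program and mean pre-program index.

    Each list holds one behaviour index per survey administration; averaging
    several administrations smooths out day-to-day variation in attitude.
    """
    if not before or not after:
        raise ValueError("need at least one administration before and after")
    return mean(after) - mean(before)


# Hypothetical 1-5 ratings of a representative behaviour, three surveys each.
print(attitude_change([3.0, 2.5, 3.5], [4.0, 3.5, 4.5]))  # 1.0
```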