Modification Description:1-01
We regret that the original manuscript did not fully explain the basic
principles of Classical Test Theory (CTT), Item Response Theory (IRT), and the
Rasch model. These are now explained and presented more clearly in the revised
article. We have also cited the original works in which these theories were
developed (e.g. Gulliksen 1950; Lord 1953; Rasch 1960). The specific
modifications are as follows:
2.1 Theories of Test Development
2.2.1 Classical Test Theory
The basic idea of Classical Test Theory (CTT) is to treat the score on a test
(often called the observed score) as the sum of a true score and an error
score, i.e.
X = T + E
where X is the observed score, T is the true score and E is the error score
(Gulliksen 1950). The principles and methods used in traditional testing for
reliability, validity and item analysis are based on this model. Every test
theory rests on assumptions, which can be divided into weak assumptions, which
most test data satisfy easily, and strong assumptions, which most test data do
not satisfy easily. CTT is based on three weak assumptions: (i) If a person's
particular psychological trait could be measured repeatedly, enough times,
using parallel tests, the mean of the observed scores would approach the true
score. (ii) The correlation between the true and error scores is zero. (iii)
The correlation between the error scores on parallel tests is zero. Using CTT,
the reliability, validity, difficulty and discrimination of a measurement
instrument can be evaluated (Brennan 2010). The theoretical system of
CTT is well established and has the following advantages: (i) Its theoretical
assumptions are weak and its requirements for implementation are less
stringent, so it is widely applicable. (ii) It focuses on the validity of the
measurement instrument, especially construct validity. (iii) It requires only
a small sample; a sample size of 200-500 is usually sufficient. However, CTT
also has certain shortcomings: (i) Item dependence: test scores depend on the
difficulty of the items, so scores are lower when the items are difficult,
which makes it hard to compare subjects who take different tests. (ii) Sample
dependence: item difficulty depends heavily on the sample of subjects; if the
ability of the sample is high, the item difficulty appears low. (iii) Item
difficulty and subject ability are not expressed in the same frame of
reference, so it is not possible to verify how well the items match the
subjects (Lord 1953; Hambleton and Jones 1993; Fayers 2004).
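The decomposition X = T + E and the three weak assumptions can be illustrated
with a small simulation. The sketch below is only an illustration under
assumed values: the normal distributions, the means and standard deviations,
the sample size and the function names are hypothetical and are not taken from
the study. It simulates two parallel forms that share the same true scores and
checks that the errors behave as CTT assumes, and that the correlation between
the parallel forms approximates the ratio of true-score variance to
observed-score variance, a common way of expressing reliability in CTT.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_parallel_forms(n_persons=20000, true_sd=10.0, error_sd=5.0, seed=0):
    """Simulate two parallel forms under X = T + E (all values hypothetical)."""
    rng = random.Random(seed)
    true = [rng.gauss(50.0, true_sd) for _ in range(n_persons)]
    # Errors are drawn independently of the true scores and of each other.
    err_a = [rng.gauss(0.0, error_sd) for _ in range(n_persons)]
    err_b = [rng.gauss(0.0, error_sd) for _ in range(n_persons)]
    form_a = [t + e for t, e in zip(true, err_a)]
    form_b = [t + e for t, e in zip(true, err_b)]
    return true, err_a, err_b, form_a, form_b


true, err_a, err_b, form_a, form_b = simulate_parallel_forms()

# Reliability expressed as the share of observed-score variance due to T.
reliability = statistics.pvariance(true) / statistics.pvariance(form_a)

print("mean error:", round(statistics.fmean(err_a), 3))
print("corr(T, E):", round(pearson(true, err_a), 3))
print("corr(E_a, E_b):", round(pearson(err_a, err_b), 3))
print("var(T) / var(X):", round(reliability, 3))
print("corr(form A, form B):", round(pearson(form_a, form_b), 3))
```

With these assumed standard deviations the theoretical variance ratio is
100 / (100 + 25) = 0.8, and both printed reliability values should fall close
to it.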
2.2.2 Item Response Theory
In response to the shortcomings of CTT, modern test theory has emerged. For
item analysis, its main development is Item Response Theory (IRT), which is
based on latent trait theory. A latent trait is a stable, intrinsic
characteristic (denoted θ) that is not directly observable, that governs a
subject's responses to the corresponding items, and that produces consistency
in those responses. The subject's latent trait and the responses to items
measuring that trait are related as follows: as the latent trait increases,
the probability of responding correctly to the item, P(θ), also increases
(Lord 1977). IRT comprises a large number of models, and finding the right
model for the data allows a more accurate analysis of the items. Currently,
the more commonly used models are the one-parameter logistic model (the 1PL
model or Rasch model, which has only a difficulty parameter), the
two-parameter logistic model (the 2PL model, which has difficulty and
discrimination parameters) and the three-parameter logistic model (the 3PL
model, which has difficulty, discrimination and guessing parameters). The 3PL
model equation is:
P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-D a_i (\theta - b_i)}}, \quad i = 1, 2, \ldots, n
In the formula, a, b and c are, for each item i, the item discrimination, item
difficulty and guessing parameters respectively, and D is a scaling constant
equal to 1.7. If guessing is not taken into account, then c = 0 and the model
is a 2PL model; if it is further assumed that all items have the same
discrimination but different difficulties, then a is fixed at a common value
(conventionally a = 1) and c = 0, and the model becomes a 1PL model.
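The nesting of the three logistic models can be sketched in a few lines of
code. The sketch assumes the standard logistic form of the 3PL model,
P(θ) = c + (1 - c) / (1 + exp(-D a (θ - b))), with D = 1.7; the function names
and the example parameter values are hypothetical and chosen only for
illustration.

```python
import math


def irt_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def irt_2pl(theta, a, b, D=1.7):
    """2PL model: the 3PL model with the guessing parameter fixed at 0."""
    return irt_3pl(theta, a, b, 0.0, D)


def irt_1pl(theta, b, D=1.7):
    """1PL model: the 2PL model with a common discrimination, here a = 1."""
    return irt_2pl(theta, 1.0, b, D)


# Hypothetical item: discrimination 1.2, difficulty 0.5, guessing 0.2.
for theta in (-2.0, 0.0, 0.5, 2.0):
    print(theta,
          round(irt_3pl(theta, 1.2, 0.5, 0.2), 3),
          round(irt_2pl(theta, 1.2, 0.5), 3),
          round(irt_1pl(theta, 0.5), 3))
```

When θ equals the item difficulty b, the 1PL and 2PL probabilities are 0.5,
while the 3PL probability is c + (1 - c) / 2, because c raises the lower
asymptote of the curve.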
While CTT is based on weak assumptions, IRT is based on strong assumptions, of
which three are basic: (i) Unidimensionality: all items that make up a given
test measure the same latent trait. (ii) Local independence: for subjects of a
given ability, the responses to different items are uncorrelated. (iii) The
item characteristic curve assumption: the probability of a correct response to
an item is related to the subject's ability by a specified model. The main
advantages of IRT include: (i) It addresses the sample dependence of CTT. (ii)
It addresses the item dependence of CTT. (iii) It places the subject's ability
and the difficulty of the items on the same scale for estimation, and can
verify whether the items match the subject's ability (Hambleton, Swaminathan,
and Rogers 1991). However, IRT also has certain shortcomings; for example, it
usually requires a sample size of more than 500.
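The sample dependence of the CTT difficulty index can be illustrated by
simulation. The sketch below is a hedged illustration, not an analysis from the
study: it assumes a single item whose difficulty is fixed at b = 0 on the
latent scale, a simple logistic response function, and two hypothetical groups
of subjects whose abilities are normally distributed around -1 and +1. The CTT
difficulty index (the proportion of correct responses) differs sharply between
the groups even though the item itself has not changed.

```python
import math
import random


def prob_correct(theta, b):
    """Logistic probability of a correct response given ability and difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def ctt_difficulty_index(mean_theta, b, n=5000, seed=0):
    """Proportion correct on one item in a simulated sample (hypothetical)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n):
        theta = rng.gauss(mean_theta, 1.0)
        if rng.random() < prob_correct(theta, b):
            correct += 1
    return correct / n


item_b = 0.0  # the same latent difficulty is used for both groups
p_low = ctt_difficulty_index(mean_theta=-1.0, b=item_b, seed=1)
p_high = ctt_difficulty_index(mean_theta=1.0, b=item_b, seed=2)

print("proportion correct, low-ability sample:", p_low)
print("proportion correct, high-ability sample:", p_high)
```

The simulation holds b constant by construction; it does not estimate b from
the data, so it shows only why the proportion correct is sample dependent, not
how IRT software recovers item parameters.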
2.2.3 Rasch Model
The Rasch model is one of the IRT models and was developed by the Danish
mathematician Georg Rasch. It is formulated as:
P(X = 1 \mid B, D) = \frac{e^{B - D}}{1 + e^{B - D}}
where P is the probability that an individual with ability B correctly answers
an item of difficulty D, and X denotes the random variable of item success or
failure, with X = 1 indicating success and X = 0 indicating failure. The Rasch
model assumes that the probability of success is influenced only by individual
ability and item difficulty, i.e. the probability that an individual correctly
answers a question depends only on the subject's ability and the difficulty of
the item (Rasch 1960).
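The dependence of the Rasch probability on ability and difficulty alone can be
checked numerically. The sketch below assumes the standard dichotomous Rasch
form, P(X = 1) = exp(B - D) / (1 + exp(B - D)); the function names and the
example values are hypothetical. It shows that the probability is 0.5 when
ability equals difficulty, that only the difference B - D matters, and that
the log-odds of success equal B - D.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of success for a person of given ability on a given item."""
    diff = ability - difficulty
    return math.exp(diff) / (1.0 + math.exp(diff))


def log_odds(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1.0 - p))


# Ability equal to difficulty gives an even chance of success.
print(rasch_probability(1.0, 1.0))

# Two different person-item pairs with the same difference B - D = 1.
print(rasch_probability(2.0, 1.0), rasch_probability(0.0, -1.0))

# The log-odds of success recover the difference B - D.
print(log_odds(rasch_probability(2.0, 0.5)))
```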
The Rasch model is currently the simplest model in the field of IRT. It
requires the fewest parameters to be estimated, incorporating only the
difficulty and ability parameters. The mathematical expression of the Rasch
model shows that the difficulty and ability parameters are symmetrical to each
other. Because of this symmetry, the Rasch model can transform a non-linear
data matrix of item responses into two sets of interval-scale estimates, one
for ability and one for difficulty. These properties give the Rasch model two
major advantages in practical applications. Firstly, it can transform
non-linear data into data with equal-interval meaning, i.e. construct
higher-level linear measures from lower-level data, thus providing more
informative, accurate and objective measurements (Bond and Fox 2015; Fischer
and Molenaar 2012). Secondly, the Rasch model places the subject and the item
on the same scale, which helps the researcher to assess and interpret more
accurately the fit between the target being measured and the measurement
instrument (Lunz 2010). In addition, the stability and accuracy of parameter
estimates in Rasch models are generally higher than in more complex models,
which makes the estimates less susceptible to additional factors during
parameter estimation or data transformation and can help to improve the
reliability of the measure.
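The idea of turning raw scores into a linear scale can be illustrated with the
log-odds (logit) transformation on which the Rasch scale is built. The sketch
below is only a first approximation under stated assumptions: it converts a
raw score on a hypothetical 20-item test into the log-odds of the proportion
correct, whereas an actual Rasch analysis estimates ability and difficulty
jointly by iterative procedures in dedicated software. The function name is
hypothetical.

```python
import math


def raw_score_to_logit(raw_score, n_items):
    """Log-odds of the proportion correct; undefined for zero or perfect scores."""
    if not 0 < raw_score < n_items:
        raise ValueError("logit is undefined for zero or perfect scores")
    p = raw_score / n_items
    return math.log(p / (1.0 - p))


n_items = 20  # hypothetical test length

# A one-point raw-score gain in the middle of the range ...
gain_middle = raw_score_to_logit(11, n_items) - raw_score_to_logit(10, n_items)
# ... and the same one-point gain near the top of the range.
gain_top = raw_score_to_logit(19, n_items) - raw_score_to_logit(18, n_items)

print("logit gain from 10 to 11:", round(gain_middle, 3))
print("logit gain from 18 to 19:", round(gain_top, 3))
```

Equal raw-score steps correspond to unequal steps on the logit scale, larger
near the extremes than in the middle, which is why raw scores are treated as
lower-level data and the logit scale as equal-interval.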