Computerized Adaptive Testing Environment for Clients

Emīls Kālis, State Probation Service of Latvia

September 3, 2018, Helsinki

Evaluation of programs

Why is it so hard to answer: does a program reduce recidivism?

Six steps for evaluation of programs

Step	What	Why	How
1	Content	To be sure that the program is theoretically well grounded.	Analysis of content/expertise
2	Process of implementation	To check whether organizational aspects in practice match the program requirements.	Analysis of practice
3	Inclusion of participants	To see to what extent the principle of responsivity is taken into account.	Analysis of the selection principles and instruments applied
4	Effectiveness of program	To be sure that the program produces the intended changes in participants.	Analysis of changes

From evaluation to evolution of programs

Step 5

Calibration of the program (interest of the program developer)

Are there variables that explain why some participants experience more change than others?
Are these changes stable? What means are necessary to make these changes sustainable?

The results of such an analysis can prompt changes in the content of the program or in the criteria for inclusion of participants.

Does the program help reduce recidivism?

Interest of society

Step 6 - Are the changes initiated by the program related to a reduction in recidivism?

This question can be answered only after we have positive answers to the previous five steps!

Step 4 (analysis of changes) is the most important step on the way to step 6. Yet studies often measure the effectiveness of programs by recidivism rates alone, ignoring the changes related to the specific program. In that situation it is almost impossible to replicate results, because too many variables are in play, including variables unrelated to the program, and because of questionable research designs.

Obstacles for measuring changes

Neglecting the importance of analysing change and preferring analysis of recidivism rates instead
Lack of instruments measuring changes
Use of inappropriate instruments, e.g., self-report measures

Problems with self-report measures

long lists of tiresome questions - even people with high intellectual ability sometimes find some questions hard to answer.
questionnaires work better in studies where participants have little interest in deceiving. For example, how confident would you be in a negative answer from your client (sentenced for a violent offence) to the question: How likely are you to hit the guy who insulted you?
hard to develop a multicultural measure

very hard to achieve sufficient reliability. High reliability can be achieved by asking one question many times in different words. But when many different questions are meant to capture the whole construct of interest, considerable noise from the individual questions appears.
almost impossible to develop parallel forms of a questionnaire that provide an empirically proven stable measure.

In other words, the differences between the first and the second application of a questionnaire may be barely related to the real changes of interest.
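One common way to make this point concrete is the reliable change index, which scales an observed difference by the measurement error implied by the instrument's reliability. The sketch below is only an illustration: the scores, the standard deviation of 6.0 and the two reliability values are invented for the example, and the function name is hypothetical.

```python
import math

def reliable_change_index(score_before, score_after, sd, reliability):
    """Observed change divided by the standard error of a difference score."""
    sem = sd * math.sqrt(1.0 - reliability)  # standard error of measurement
    se_diff = math.sqrt(2.0) * sem           # error of the difference of two scores
    return (score_after - score_before) / se_diff

# The same 6-point change, judged with a weak and a strong instrument
# (all numbers are assumptions chosen for illustration).
rci_low = reliable_change_index(20, 26, sd=6.0, reliability=0.60)
rci_high = reliable_change_index(20, 26, sd=6.0, reliability=0.90)

# By convention, |RCI| > 1.96 suggests change beyond measurement error.
print(round(rci_low, 2), round(rci_high, 2))
```

With the less reliable instrument the same observed difference cannot be told apart from measurement noise, which is the situation described above.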

Different approach to the problem

Principles for developing new measures

piece by piece - measuring each specific target of the program separately
suitable for clients with a wide range of intellectual abilities
applicable in a multicultural context
resistant to deception
picture-oriented items (questions) with a few standard types of question, e.g.: Please select the picture where …..

Methodology

Item response theory
- an approach that makes it possible to test measurement equivalence across different test forms and time points
Computerized adaptive testing
- a mechanism for reducing the number of items (questions). A test taker is not exposed to all test items (100) but only to the appropriate items (e.g., 30) that match his/her knowledge level. The number of items presented depends on the testing process, in which the algorithm strives to obtain a reliable measure for that particular test taker.

If one can solve 3+4 but cannot solve 38+23, the algorithm will find the limit of one's ability by giving tasks between 3+4 and 38+23.
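The adaptive loop described above can be sketched in a few lines. This is a minimal illustration in Python, not the CATEC implementation: it assumes a Rasch model, a hypothetical bank of 100 items with evenly spaced difficulties, an ability estimate computed as a posterior mean on a grid with a standard normal prior, and a test taker who answers correctly whenever the item difficulty is below 1.0. All function names, the precision target of 0.4 and the limit of 30 items are assumptions made for the example.

```python
import math

def p_correct(theta, b):
    """Rasch model: probability of a correct answer for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

GRID = [g / 10.0 for g in range(-40, 41)]  # candidate ability values from -4 to 4

def estimate_ability(responses):
    """Posterior mean and SD of ability given [(difficulty, correct), ...]."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta * theta)  # standard normal prior (unnormalised)
        for b, correct in responses:
            p = p_correct(theta, b)
            w *= p if correct else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(bank, answer, se_target=0.4, max_items=30):
    """Give items one by one until the estimate is precise enough or a limit is hit."""
    remaining = list(bank)
    responses = []
    theta, se = 0.0, 1.0
    while remaining and len(responses) < max_items and se > se_target:
        # Under the Rasch model the most informative item is the one
        # whose difficulty is closest to the current ability estimate.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        responses.append((b, answer(b)))
        theta, se = estimate_ability(responses)
    return theta, se, responses

# Hypothetical bank of 100 items and a test taker whose limit lies at difficulty 1.0.
bank = [-3.0 + 6.0 * i / 99 for i in range(100)]
theta, se, responses = run_cat(bank, answer=lambda b: b < 1.0)
print(len(responses), round(theta, 2), round(se, 2))
```

The test taker sees only a fraction of the bank, and the items actually presented cluster around the point where correct answers turn into incorrect ones, which is the "border of ability" in the arithmetic example.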

What is CATEC?

Computerized Adaptive Testing Environment for Clients (CATEC):

an independent user-interface application providing a dynamic view of test items (questions);
integrated with the case management system (PLUS) in order to verify the client and to relate test results to the particular client's case;
content and forms of questions are administered from open-source software, R: A language and environment for statistical computing;
the computerized adaptive testing process is provided by the R packages catR and catIrt.

Process of development of test

defining a target - ability/knowledge/attitude (the most important expected outcome from a program);
creation of ideas - trying to find out how the target could be measured by visual stimuli (gathering together the most creative and the most experienced workers);
creation of items (questions) - artistic work, embodying ideas;

setting up the test - technical work;
collection of data - running the test in practice;
analysis of data and development of measurement models;
applying measurement models in practice;
continuous evaluation of program, providing objective data for calibration of program and good basis for measuring recidivism.

Where are we and where are we going?

2016-2017 - development of Computerized Adaptive Testing Environment for Clients (CATEC)
2018-2019 - development of pilot tests
2020…. - integrating tests into probation practice
…….. - evaluation of programs

Pilot tests

This year we plan to launch CATEC with pilot tests:

intellectual ability: abstract reasoning - non-verbal estimate of fluid intelligence;
ability to recognize emotions.

Example of CATEC: officer login

Example of CATEC: relating client with the main system

Example of CATEC: verification of client

Example of CATEC: type of answers - multi-select

Example of CATEC: type of answers - one correct

Example of CATEC: type of answers - free text

Example of CATEC: type of answers - question-picture and answer-picture

Additional benefits from the developed tests

The developed measures can help estimate a client's risk level, replacing an arbitrarily rated risk category with a more reliable and valid measure. For example, the ability to recognize crime-risk situations can play a crucial role in desistance from crime and can be linked to risk factors such as poor cognitive skills.

What next?

Because such tests are picture-based, which diminishes the role of verbal instructions, they are multicultural-friendly. This means that tests developed in one country can be adapted relatively easily in another country.

Questions