Simplified Version of KST Analysis

One simply minimizes the value of chi-squared in an (m-1)-dimensional space, where m is the number of knowledge states proposed for the knowledge structure. One might then list all the learning pathways and perform a second chi-squared fit to optimize them. However, the second fit is normally not needed if all you want are the one, two, or three most probable pathways. No pathway optimization is included in the True BASIC program listed below; whether the program runs under your version of BASIC can only be determined by trying it.
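As a cross-check on what the program computes, here is a minimal Python sketch of the same careless-error / lucky-guess model, applied to the 7-question sample data shown below. All function and variable names here are mine, not part of the BASIC listing, and the e( ) and g( ) values are the suggested 0.1.

```python
def response_prob(r, k, e, g):
    """P(response state r | knowledge state k), both as bit strings."""
    p = 1.0
    for ri, ki, ei, gi in zip(r, k, e, g):
        if ki == "1":                                # student knows the item
            p *= (1 - ei) if ri == "1" else ei       # correct, or careless error
        else:                                        # student does not know the item
            p *= (1 - gi) if ri == "0" else gi       # wrong, or lucky guess
    return p

# Occupied response states and populations from the sample file.
responses = {"0000000": 24, "1011001": 13, "1111001": 7, "1111000": 11, "1100111": 3}
# Proposed knowledge states (their file populations are placeholders).
states = ["0000000", "1011101", "1111111"]
nq = 7
e = [0.1] * nq                                       # careless-error probabilities
g = [0.1] * nq                                       # lucky-guess probabilities
total = sum(responses.values())

# Predicted (unnormalized) population of each knowledge state.
kpop = {k: sum(response_prob(r, k, e, g) * pop for r, pop in responses.items())
        for k in states}
norm = sum(kpop.values())
calpop = {k: kpop[k] * total / norm for k in states}  # normalized to the class size

# Chi-squared: predicted vs. observed population of each knowledge state.
chisq = sum((calpop[k] - responses.get(k, 0)) ** 2 / calpop[k] for k in states)
```

Varying e( ) and g( ) by hand and re-running corresponds to the "manual" search for a better fit described in the comments below.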
 


REM Basic Language Program for a Simplified Version of
REM Knowledge Space Theory as applied to two columns of
REM data read in from a spreadsheet file called Data1. Your version of Basic may require
REM a comma in each row separating the two columns. As an example without the comma, for 5
REM response states and 3 proposed knowledge states. A real data set would have many more
REM response states and knowledge states.


0000000 24
1011001 13
1111001 7
1111000 11
1100111 3
0000000 24
1011101 0
1111111 0


REM This program version is for an nq-question test, usually with 4 < nq < 16, and with the
REM expectation of using fewer than 50 knowledge states for the fit.

REM The data are in two columns: the first n rows having the
REM response state r$ in binary notation in the first column and each
REM state's population Popr in the second column; immediately
REM following are the rows of proposed knowledge states k$ in binary notation
REM in the first column and their populations Popk in the
REM second column (n.b., one just needs placeholders such as a zero for Popk in the second column).
 

REM Lines further below set the careless error probabilities e( ) and the
REM lucky guess probabilities g( ), with a suggested value of 0.1 for
REM each question. They can be varied "manually"
REM when attempting to find a better fit.

REM Here nq is the number of questions, ns is the number of students, n is
REM the number of occupied response states, and m is the number of
REM knowledge structure states being tried. You must enter the proper values before
REM reading in the data file.

REM OPTION NOLET must precede the other statements (the assignments below omit
REM LET). The running-sum arrays Pop, Kp, Cpoptot, and ChisqT each need one
REM extra element, and v( ) must always hold 15 values.

OPTION NOLET

nq=10
ns=300
n=159
m=37
PRINT n

DIM r$(n), Popr(n)
DIM k$(m), Popk(m)
DIM Pop(n+1),e(15),g(15),v(15)
DIM u$(nq),z$(nq)
DIM Prk(n,m),Kpop(m),Kp(n+1)
DIM Calpop(m), Prob(m),Cpoptot(m+1)
DIM Chisq(m),Poprk(m),ChisqT(m+1)

REM Now the response states are read from Data1.

OPEN #3: name "Data1"
PRINT "n=",n
PRINT "m=",m
FOR i=1 to n
INPUT #3: r$(i),Popr(i)
REM PRINT i,r$(i),Popr(i)
NEXT i
PRINT "Read of response states is done!"

REM Now your guess for the knowledge structure states
REM is read from the first column of the same file and dummy
REM numbers are in the second column for these states.

FOR i=1 to m
INPUT #3: k$(i),Popk(i)
REM PRINT i,k$(i),Popk(i)
NEXT i
CLOSE #3

REM The following calculates the
REM total population and stores
REM the value as Pop(n+1).

Pop(1)=0
For j=1 to n
Pop(j+1)=Pop(j)+Popr(j)
NEXT j

PRINT "Population =",Pop(n+1)

REM The program will use up to nq of these 15 careless error probabilities. If you
REM use multiple choice questions without justification, you
REM should change these values.

e(1)=0.1
e(2)=0.1
e(3)=0.1
e(4)=0.1
e(5)=0.1
e(6)=0.1
e(7)=0.1
e(8)=0.1
e(9)=0.1
e(10)=0.1
e(11)=0.1
e(12)=0.1
e(13)=0.1
e(14)=0.1
e(15)=0.1

REM The program will use up to nq of these 15 lucky guess probabilities. If you use
REM multiple choice questions without justification, you
REM should change these values.

g(1)=0.1
g(2)=0.1
g(3)=0.1
g(4)=0.1
g(5)=0.1
g(6)=0.1
g(7)=0.1
g(8)=0.1
g(9)=0.1
g(10)=0.1
g(11)=0.1
g(12)=0.1
g(13)=0.1
g(14)=0.1
g(15)=0.1

REM Initialize 15 values of v( ) to unity. The program will reset the first nq of them.

FOR y=1 to 15
v(y)=1
NEXT y

REM The following FOR-NEXT routine calculates the probability for
REM each response state to actually be in one of the guessed knowledge states.

FOR q=1 to n

FOR p=1 to m

REM CPOS returns w exactly when character w of the string is a "1", so the
REM two loops below unpack the bits of k$(p) into u$( ) and r$(q) into z$( ).

FOR w=1 to nq
IF cpos(k$(p),"1",w)=w THEN
u$(w)="1"
ELSE
u$(w)="0"
END IF
NEXT w

FOR h=1 to nq
IF cpos(r$(q),"1",h)=h THEN
z$(h)= "1"
ELSE
z$(h)= "0"
END IF
NEXT h

FOR y=1 to nq
REM PRINT u$(y),z$(y)
IF u$(y)="1" THEN
IF z$(y)="1" THEN
v(y)=1-e(y)
ELSE
v(y)=e(y)
END IF
ELSE
IF z$(y)="0" THEN
v(y)=1-g(y)
ELSE
v(y)=g(y)
END IF
END IF
REM PRINT v(y)
NEXT y

Prk(q,p)=v(1)*v(2)*v(3)*v(4)*v(5)*v(6)*v(7)*v(8)*v(9)*v(10)*v(11)*v(12)*v(13)*v(14)*v(15)
REM PRINT "q","p","Prk(q,p)"
REM PRINT q,p,Prk(q,p)
NEXT p
REM PRINT "BREAK"
NEXT q

REM The next several FOR-NEXT loops calculate the predicted
REM knowledge state populations, normalize them, and calculate the
REM chi-squared values.

FOR p=1 to m
Kp(1)=0
FOR q=1 to n
Kp(q+1)=Kp(q)+Prk(q,p)*Popr(q)
NEXT q
Kpop(p)=Kp(n+1)
REM PRINT Kpop(p),"Kpop"
NEXT p

Cpoptot(1)=0
FOR p=1 to m
Cpoptot(p+1)=Cpoptot(p)+Kpop(p)
NEXT p
REM PRINT Cpoptot(m+1),"Cpoptot"

FOR p=1 to m
Calpop(p)=Kpop(p)*Pop(n+1)/Cpoptot(m+1)
REM PRINT Calpop(p)

Prob(p)=Calpop(p)/Pop(n+1)
REM PRINT Prob(p),"Prob"
NEXT p

REM Record the observed population of each proposed knowledge state
REM (it stays zero if the state is unoccupied).

FOR p=1 to m
FOR q=1 to n
IF r$(q)=k$(p) THEN Poprk(p)=Popr(q)
NEXT q
NEXT p
 

PRINT "k","Prob","Pred Pop","Pop","Chi Sq"
FOR p=1 to m

Chisq(p)=(Calpop(p)-Poprk(p))^2/Calpop(p)

REM Now for the output listing the knowledge state, its probability,
REM its calculated population, its actual population, and its
REM contribution to chi-squared.

PRINT k$(p),Prob(p),Calpop(p),Poprk(p),Chisq(p)
NEXT p

ChisqT(1)=0
FOR p=1 to m
ChisqT(p+1)=ChisqT(p)+Chisq(p)
NEXT p
PRINT "Total chi-squared = ",ChisqT(m+1)

END

REM This chi-squared total must be compared with the critical chi-squared
REM value at the 5% level. The number of degrees of freedom is
REM m (the number of knowledge states in the knowledge structure) plus the number of
REM careless error probabilities (equal to the number of questions) plus the number of
REM lucky guess probabilities (equal to the number of questions) minus 1. For example,
REM a 10 question test with a 30 state knowledge structure has 30 + 10 + 10 - 1 = 49 d.o.f.
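The degrees-of-freedom count just described is plain arithmetic; a trivial Python sketch (the function name is mine, and the 5% critical value itself still comes from chi-squared tables):

```python
def degrees_of_freedom(m, nq):
    # m knowledge states, plus nq careless-error and nq lucky-guess
    # probabilities, minus 1 -- the counting rule stated above.
    return m + 2 * nq - 1

# The worked example above: a 10-question test with a 30-state structure.
dof = degrees_of_freedom(30, 10)
```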

REM If you now desire to calculate an approximate value for the probabilities of the
REM numerous paths through the knowledge structure, you
REM need to write out the paths by hand, multiply their state probabilities, and
REM normalize the result to 1. Or you can try to write a computer program to do this chore!
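The pathway chore just described can be sketched in Python. The toy three-item structure and its state probabilities below are invented for illustration; they are not output of the BASIC program.

```python
# Enumerate inclusion-ordered learning paths through a small knowledge
# structure, score each path by the product of its state probabilities,
# and normalize the scores to 1.
from math import prod

def paths(states, current, full):
    """All paths from `current` to `full` stepping through immediate
    supersets present in `states`."""
    if current == full:
        return [[current]]
    out = []
    for nxt in states:
        if current < nxt and not any(current < s < nxt for s in states):
            out.extend([current] + p for p in paths(states, nxt, full))
    return out

# Toy knowledge structure over items {a, b, c} with made-up fitted
# state probabilities (they sum to 1 over the structure).
prob = {frozenset(): 0.4, frozenset("a"): 0.2, frozenset("ab"): 0.1,
        frozenset("ac"): 0.1, frozenset("abc"): 0.2}
states = set(prob)
full = frozenset("abc")

all_paths = paths(states, frozenset(), full)
scores = [prod(prob[s] for s in p) for p in all_paths]  # product of state probabilities
path_probs = [s / sum(scores) for s in scores]          # normalized to 1
```

With these numbers the structure admits two paths (adding item b or item c second), and since their raw scores are equal each ends up with probability 0.5.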

FINIS!