PISA data analysis, computing standard errors using replicate weights
In this post you can download the R code examples to compute the standard errors of the mean, standard deviation, proportions or mean differences, on the data of the PISA database, using the replicate weights method.
Tis post is related with the article computing standard errors with replicate weights in PISA database.
In this link you can download the Windows version of R program.
All examples must be run with a dataframe in which you have been removed the missing values.
Examples to compute of the standard error of the mean and standard deviation
In this link you can download the sample code for mean and standard deviation calculation. It consists of three functions that work with a dataframe. One of them computes the mean and standard deviation of a set of columns, another one perform the same calculations, but on a dataframe that contains more than one country and the third groups the data by the different levels of a number of factors before making the calculations.
Example 1, calculation of mean, standard deviation and their corresponding standard errors
The name of the first function is wght_meansd, and this is the code:
wght_meansd<-function(sdata,cmean,wght,brr) {
mmeans<-matrix(ncol=length(cmean),nrow=4);
mmeans[,]<-0;
if (is.numeric(cmean)) {
colnames(mmeans)<-names(sdata)[cmean];
}
else {
colnames(mmeans)<-cmean;
}
rownames(mmeans)<-c("MEAN","SE-MEAN","STDEV","SE-STDEV");
swght<-sum(sdata[,wght]);
for (i in 1:length(cmean)) {
mmeans[1,i]<-sum(sdata[,wght]*sdata[,cmean[i]])/swght;
mmeans[3,i]<-sqrt((sum(sdata[,wght]*(sdata[,cmean[i]]^2))/swght)-
mmeans[1,i]^2);
for (j in 1:length(brr)) {
sbrr<-sum(sdata[,brr[j]]);
mmbrj<-sum(sdata[,brr[j]]*sdata[,cmean[i]])/sbrr;
mmeans[2,i]<-mmeans[2,i] + (mmbrj - mmeans[1,i])^2;
mmeans[4,i]<-mmeans[4,i] +
(sqrt((sum(sdata[,brr[j]]*(sdata[,cmean[i]]^2))/sbrr)-mmbrj^2) -
mmeans[3,i])^2;
}
}
mmeans[2,]<-sqrt((mmeans[2,] * 4) / length(brr));
mmeans[4,]<-sqrt((mmeans[4,] * 4) / length(brr));
return(mmeans);
}
In the sdata parameter you must pass in the dataframe with the data. cmean is a vector with the indices or the names of the columns for which you want to calculate the mean and standard deviation. wght is the index or the name of the column with the student weight (W_FSTUWT), and brr is a vector with the indexes or names of the columns with the 80 replicate weights (W_FSTR1 a W_FSTR80).
To calculate the mean and standard deviation, the values of each column must be weighted using the student weight. The standard errors are obtained by calculating the mean or standard deviation using each one of the replicate weights and adding the squares of their differences with the mean calculated using the student weight. The final result is then divided by 20 and the square root is calculated.
The result is returned in a matrix with the same columns as those in the parameter cmean and with four rows, the first two corresponding to the mean and their standard error, and the last two to the standard deviations and their standard errors.
Example 2, same as above, but applied to a sample with more than one country
The function of this example is wght_meansdbycnt, and this is the code:
wght_meansdbycnt<-function(sdata,cmean,cnt,wght,brr) {
lcntrs<-vector('list',length(levels(as.factor(sdata[,cnt]))));
for (p in 1:length(levels(as.factor(sdata[,cnt])))) {
names(lcntrs)[p]<-levels(as.factor(sdata[,cnt]))[p];
}
for (p in 1:length(levels(as.factor(sdata[,cnt])))) {
mmeans<-matrix(ncol=length(cmean),nrow=4);
mmeans[,]<-0;
if (is.numeric(cmean)) {
colnames(mmeans)<-names(sdata)[cmean];
}
else {
colnames(mmeans)<-cmean;
}
rownames(mmeans)<-c("MEAN","SE-MEAN","STDEV","SE-STDEV");
rcnt <- sdata[,cnt]==levels(as.factor(sdata[,cnt]))[p];
swght<-sum(sdata[rcnt,wght]);
for (i in 1:length(cmean)) {
mmeans[1,i]<-sum(sdata[rcnt,wght]*sdata[rcnt,cmean[i]])/swght;
mmeans[3,i]<-sqrt((sum(sdata[,wght]*(sdata[,cmean[i]]^2))/swght)-
mmeans[1,i]^2);
for (j in 1:length(brr)) {
sbrr<-sum(sdata[rcnt,brr[j]]);
mmbrj<-sum(sdata[rcnt,brr[j]]*sdata[rcnt,cmean[i]])/sbrr;
mmeans[2,i]<-mmeans[2,i] + (mmbrj - mmeans[1,i])^2;
mmeans[4,i]<-mmeans[4,i] +
(sqrt((sum(sdata[rcnt,brr[j]]*(sdata[rcnt,cmean[i]]^2))/sbrr)-
mmbrj^2) - mmeans[3,i])^2;
}
}
mmeans[2,]<-sqrt((mmeans[2,] * 4) / length(brr));
mmeans[4,]<-sqrt((mmeans[4,] * 4) / length(brr));
lcntrs[[p]]<-mmeans;
}
return(lcntrs);
}
The function has an additional parameter, cnt, which is for the index or the name of the column with the country. Calculations are the same as in the previous case, but the result is returned in a list of matrices, each corresponding to one of the countries.
Example 3, compute the mean and standard deviation grouped by a set of factors
In this example we calculate the mean and standard deviation, along with their standard errors, but separating them according to the levels of a set of factors, such as sex or educational level of parents.
The function is wght_meansdfact and this is the code:
wght_meansdfact<-function(sdata,cmean,cfact,wght,brr) {
nc<-0;
for (i in 1:length(cfact)) {
nc <- nc + length(levels(as.factor(sdata[,cfact[i]])));
}
mmeans<-matrix(ncol=nc,nrow=4 * length(cmean));
mmeans[,]<-0;
cn<-c();
for (i in 1:length(cfact)) {
for (j in 1:length(levels(as.factor(sdata[,cfact[i]])))) {
cn<-c(cn, paste(names(sdata)[cfact[i]],
levels(as.factor(sdata[,cfact[i]]))[j], sep="-"));
}
}
colnames(mmeans)<-cn;
rn<-c();
for (i in 1:length(cmean)) {
rn<-c(rn, paste(names(sdata)[cmean[i]],"MEAN",sep="-"));
rn<-c(rn, paste(names(sdata)[cmean[i]],"MEAN","SE",sep="-"));
rn<-c(rn, paste(names(sdata)[cmean[i]],"STDEV",sep="-"));
rn<-c(rn, paste(names(sdata)[cmean[i]],"STDEV","SE",sep="-"));
}
rownames(mmeans)<-rn;
ic<-1;
for(f in 1:length(cfact)) {
for (l in 1:length(levels(as.factor(sdata[,cfact[f]])))) {
ir<-1;
rfact<-sdata[,cfact[f]]==levels(as.factor(sdata[,cfact[f]]))[l];
swght<-sum(sdata[rfact,wght]);
for (i in 1:length(cmean)) {
mmeans[ir,ic]<-sum(sdata[rfact,wght] *
sdata[rfact,cmean[i]])/swght;
mmeans[ir+2,ic]<-sqrt((sum(sdata[rfact,wght] *
(sdata[rfact,cmean[i]]^2))/swght)-mmeans[ir,ic]^2);
for (j in 1:length(brr)) {
sbrr<-sum(sdata[rfact,brr[j]]);
mmbrj<-sum(sdata[rfact,brr[j]]*sdata[rfact,cmean[i]])/sbrr;
mmeans[ir+1,ic]<-mmeans[ir+1,ic] + (mmbrj -
mmeans[ir,ic])^2;
mmeans[ir+3,ic]<-mmeans[ir+3,ic] +
(sqrt((sum(sdata[rfact,brr[j]] *
(sdata[rfact,cmean[i]]^2))/sbrr)-mmbrj^2) -
mmeans[ir+2,ic])^2;
}
mmeans[ir+1,ic]<-sqrt((mmeans[ir+1,ic] * 4) / length(brr));
mmeans[ir+3,ic]<-sqrt((mmeans[ir+3,ic] * 4) / length(brr));
ir<-ir+4;
}
ic<-ic + 1;
}
}
return(mmeans);
}
This time, the results are grouped using the different levels of the factors, which column index or name is passed on a vector in the cfact parameter. As a result, a matrix similar as that of the Example 1 is returned, but in each column you have the mean and standard deviation for a given level of each factor. For example, you have the mean of male students in a column and that of the female ones in another.
Examples of calculus of proportions and its standard error
The functions of these examples perform a count of levels of factors to calculate their proportions in a sample, and the standard error of those proportions. In this link you can download the sample code to calculate proportions.
In these and subsequent examples, the wght and brr parameters have the same use and meaning as in example 1 above.
Example 4, simple calculation of the proportions of a set of factors in a sample
In this example you will obtain the percentage of cases of each of the levels of a set of factors in the sample.
The function is wght_ppc, and this is the code:
wght_ppc<-function(sdata,cppc,wght,brr) {
nc<-0;
for (i in 1:length(cppc)) {
nc <- nc + length(levels(as.factor(sdata[,cppc[i]])));
}
mppc<-matrix(ncol=nc,nrow=2);
mppc[,]<-0;
cn<-c();
for (i in 1:length(cppc)) {
for (j in 1:length(levels(as.factor(sdata[,cppc[i]])))) {
cn<-c(cn, paste(names(sdata)[cppc[i]],
levels(as.factor(sdata[,cppc[i]]))[j], sep="-"));
}
}
colnames(mppc)<-cn;
rownames(mppc)<-c("PPC","SE");
swght<-sum(sdata[,wght]);
ix<-1;
for (i in 1:length(cppc)) {
for (j in 1:length(levels(as.factor(sdata[,cppc[i]])))) {
rfact<-sdata[,cppc[i]]==levels(as.factor(sdata[,cppc[i]]))[j];
mppc[1,ix]<-sum(sdata[rfact,wght]) / swght;
for (k in 1:length(brr)) {
sbrr<-sum(sdata[,brr[k]]);
ppcbrr<-sum(sdata[rfact,brr[k]]) / sbrr;
mppc[2,ix]<-mppc[2,ix] + (ppcbrr-mppc[1,ix])^2;
}
x<-ix + 1;
}
}
mppc[2,]<-sqrt((mppc[2,] * 4) / length(brr));
return(mppc);
}
To perform the count, actually what we do is add up the student weights instead of counting cases. To calculate the standard errors we employ a similar procedure as that used for the mean and standard deviation: the proportion is calculated using each of the replicate weights and we calculate the standard error by the square of the differences with the proportion obtained using the student weight.
The sdata parameter is for the dataframe with the data, and cppc is a vector with indexes or column names of the factors.
As a result you obtain a matrix with two rows. In the first of them are the proportions between 0 and 1, of the different levels of the factors, one column per level. In the second row are the corresponding standard errors.
Example 5, calculation of the proportions of a set of factors, combining their different levels
In this example, we calculate the proportions of all combinations of levels of the set of factors, rather than each level separately.
The function is wght_ppccombined, and this is the code:
wght_ppccombined<-function(sdata,cppc,wght,brr) {
nc<-1;
for (i in 1:length(cppc)) {
nc <- nc * length(levels(as.factor(sdata[,cppc[i]])));
}
mppc<-matrix(ncol=nc,nrow=2);
mppc[,]<-0;
cn<-c();
#####################################################################
# To avoid calculate the combinations in a recursive process, which would
# require another function, we will use the ccom vector to define the
# combinations. Each position of the vector corresponds with the
# corresponding factor in cppc, and contains the ordinal of one of the
# factor levels. In each round of the loop, the ccom vector contains a
# different combination of the levels of all the factors in cppc
######################################################################
ccom<-rep(1,length(cppc));
bw<-TRUE;
while(bw) {
cnc<-paste(names(sdata)[cppc[1]],
levels(as.factor(sdata[,cppc[1]]))[ccom[1]], sep="-");
for (i in 2:length(ccom)) {
cnc<-paste(cnc,names(sdata)[cppc[i]],
levels(as.factor(sdata[,cppc[i]]))[ccom[i]], sep="-");
}
cn<-c(cn, cnc);
for (i in length(cppc):1) {
if (ccom[i] < length(levels(as.factor(sdata[,cppc[i]])))) {
ccom[i] = ccom[i] + 1;
break;
}
else {
if (i == 1) {
bw<-FALSE;
break;
}
else {
ccom[i] = 1;
}
}
}
}
colnames(mppc)<-cn;
rownames(mppc)<-c("PPC","SE");
swght<-sum(sdata[,wght]);
ix<-1;
######################################################################
# perform again the previous procedure for calculate the combinations of
# the factor levels
######################################################################
bw<-TRUE;
ccom<-rep(1,length(cppc));
while(bw) {
rfact<-sdata[,cppc[1]]==levels(as.factor(sdata[,cppc[1]]))[ccom[1]];
for (i in 2:length(ccom)) {
rfact<-rfact & (sdata[,cppc[i]] ==
levels(as.factor(sdata[,cppc[i]]))[ccom[i]]);
}
mppc[1,ix]<-sum(sdata[rfact,wght]) / swght;
for (k in 1:length(brr)) {
sbrr<-sum(sdata[,brr[k]]);
ppcbrr<-sum(sdata[rfact,brr[k]]) / sbrr;
mppc[2,ix]<-mppc[2,ix] + (ppcbrr-mppc[1,ix])^2;
}
ix<-ix + 1;
for (i in length(cppc):1) {
if (ccom[i] < length(levels(as.factor(sdata[,cppc[i]])))) {
ccom[i] = ccom[i] + 1;
break;
}
else {
if (i == 1) {
bw<-FALSE;
break;
}
else {
ccom[i] = 1;
}
}
}
}
mppc[2,]<-sqrt((mppc[2,] * 4) / length(brr));
return(mppc);
}
The result is a matrix with two rows, with a column for each of the combinations of levels of all factors and two rows, one for the proportions and the other for the standard errors.
Example 6 calculating the proportions of crossing two sets of factors
In this example, we will cross the values of the levels of one set of factors with other different set and we will count the cases using their weights, along with the calculation of standard errors.
The name of the function is wght_ppccrossed, and this is the code:
wght_ppccrossed<-function(sdata,rppc,cppc,wght,brr) {
nc<-0;
for (i in 1:length(cppc)) {
nc <- nc + length(levels(as.factor(sdata[,cppc[i]])));
}
nr<-0;
for (i in 1:length(rppc)) {
nr <- nr + length(levels(as.factor(sdata[,rppc[i]])));
}
mppc<-matrix(ncol=nc,nrow=nr);
mppc[,]<-0;
seppc<-matrix(ncol=nc,nrow=nr);
seppc[,]<-0;
result<-vector('list',2);
names(result)<-c("PPC","SE");
cn<-c();
for (i in 1:length(cppc)) {
for (j in 1:length(levels(as.factor(sdata[,cppc[i]])))) {
cn<-c(cn, paste(names(sdata)[cppc[i]],
levels(as.factor(sdata[,cppc[i]]))[j], sep="-"));
}
}
colnames(mppc)<-cn;
colnames(seppc)<-cn;
cr<-c();
for (i in 1:length(rppc)) {
for (j in 1:length(levels(as.factor(sdata[,rppc[i]])))) {
cr<-c(cr, paste(names(sdata)[rppc[i]],
levels(as.factor(sdata[,rppc[i]]))[j], sep="-"));
}
}
rownames(mppc)<-cr;
rownames(seppc)<-cr;
ir<-1;
swght<-sum(sdata[,wght]);
for (r in 1:length(rppc)) {
for (l in 1:length(levels(as.factor(sdata[,rppc[r]])))) {
rfact <- (sdata[,rppc[r]] ==
levels(as.factor(sdata[,rppc[r]]))[l]);
ic<-1;
for (i in 1:length(cppc)) {
for (j in 1:length(levels(as.factor(sdata[,cppc[i]])))) {
cfact<-(sdata[,cppc[i]] ==
levels(as.factor(sdata[,cppc[i]]))[j]) & rfact;
mppc[ir,ic]<-sum(sdata[cfact,wght]) / swght;
for (k in 1:length(brr)) {
sbrr<-sum(sdata[,brr[k]]);
ppcbrr<-sum(sdata[cfact,brr[k]]) / sbrr;
seppc[ir,ic]<-seppc[ir,ic] + (ppcbrr-mppc[ir,ic])^2;
}
ic<-ic + 1;
}
}
ir<-ir + 1;
}
}
seppc[,]<-sqrt((seppc[,] * 4) / length(brr));
result[[1]]<-mppc;
result[[2]]<-seppc;
return(result);
}
In the cppc parameter we pass a vector with the indexes or names of the columns of sdata we want to cross with the columns in the rppc parameter, also with a vector with indexes or column names.
The result is a list with two matrices, the first with the proportions, between 0 and 1, result of the cross between the rows (in rppc) and columns (in cppc) of the different values of levels of the factors, the second with the corresponding standard errors.
Example 7, computing the proportions of a number of factors in a sample with several countries
This example is like Example 4, but this time we have a sample with more than one country.
The name of the function is wght_ppc_bycnt, and this is the code:
wght_ppc_bycnt<-function(sdata,cnt,cppc,wght,brr) {
nc<-0;
for (i in 1:length(cppc)) {
nc <- nc + length(levels(as.factor(sdata[,cppc[i]])));
}
nr<-length(levels(as.factor(sdata[,cnt])));
mppc<-matrix(ncol=nc,nrow=nr);
mppc[,]<-0;
seppc<-matrix(ncol=nc,nrow=nr);
seppc[,]<-0;
result<-vector('list',2);
names(result)<-c("PPC","SE");
cn<-c();
for (i in 1:length(cppc)) {
for (j in 1:length(levels(as.factor(sdata[,cppc[i]])))) {
cn<-c(cn, paste(names(sdata)[cppc[i]],
levels(as.factor(sdata[,cppc[i]]))[j], sep="-"));
}
}
colnames(mppc)<-cn;
colnames(seppc)<-cn;
cr<-c();
for (j in 1:length(levels(as.factor(sdata[,cnt])))) {
cr<-c(cr, levels(as.factor(sdata[,cnt]))[j]);
}
rownames(mppc)<-cr;
rownames(seppc)<-cr;
ir<-1;
for (l in 1:length(levels(as.factor(sdata[,cnt])))) {
rfact <- (sdata[,cnt]==levels(as.factor(sdata[,cnt]))[l]);
swght<-sum(sdata[rfact,wght]);
ic<-1;
for (i in 1:length(cppc)) {
for (j in 1:length(levels(as.factor(sdata[,cppc[i]])))) {
cfact<-(sdata[,cppc[i]] ==
levels(as.factor(sdata[,cppc[i]]))[j]) & rfact;
mppc[ir,ic]<-sum(sdata[cfact,wght]) / swght;
for (k in 1:length(brr)) {
sbrr<-sum(sdata[rfact,brr[k]]);
ppcbrr<-sum(sdata[cfact,brr[k]]) / sbrr;
seppc[ir,ic]<-seppc[ir,ic] + (ppcbrr-mppc[ir,ic])^2;
}
ic<-ic + 1;
}
}
ir<-ir + 1;
}
seppc[,]<-sqrt((seppc[,] * 4) / length(brr));
result[[1]]<-mppc;
result[[2]]<-seppc;
return(result);
}
The index or name of the column with the country must be passed in the cnt parameter. The result is a list with two matrices. The first has a row for each country and a column with the proportions for each of the levels of the factors, the second contains the corresponding standard errors.
Examples of calculation of mean differences and their standard error
In these examples we can see how to calculate a difference between two means and their standard error. In this link you can download the sample code to calculate mean differences.
In these examples the student weight and the replicate weights are passed in the wght and brr parameters, as in the previous examples.
Example 8, calculating the difference between the means of a set of columns, grouping by the levels of a set of factors
In this example, we will calculate the difference between means for each of the combinations of two levels of a set of factors. A factor may be, for example, the sex, and the function computes the difference of means of a given index between male and female students. Other factors may have more than two levels, and the difference of means will be calculated for each combination of two of these levels.
The function is wght_meandiff, and its code is:
wght_meandiff<-function(sdata,cmean,cfact,wght,brr) {
nc<-0;
for (i in 1:length(cfact)) {
for (j in 1:(length(levels(as.factor(sdata[,cfact[i]])))-1)) {
for(k in (j+1):length(levels(as.factor(sdata[,cfact[i]])))) {
nc <- nc + 1;
}
}
}
mmeans<-matrix(ncol=nc,nrow=2 * length(cmean));
mmeans[,]<-0;
cn<-c();
for (i in 1:length(cfact)) {
for (j in 1:(length(levels(as.factor(sdata[,cfact[i]])))-1)) {
for(k in (j+1):length(levels(as.factor(sdata[,cfact[i]])))) {
cn<-c(cn, paste(names(sdata)[cfact[i]],
levels(as.factor(sdata[,cfact[i]]))[j],
levels(as.factor(sdata[,cfact[i]]))[k], sep="-"));
}
}
}
colnames(mmeans)<-cn;
rn<-c();
for (i in 1:length(cmean)) {
rn<-c(rn, paste(names(sdata)[cmean[i]],"MEANDIFF",sep="-"));
rn<-c(rn, paste(names(sdata)[cmean[i]],"MEANDIFF","SE",sep="-"));
}
rownames(mmeans)<-rn;
ic<-1;
for(f in 1:length(cfact)) {
for (l in 1:(length(levels(as.factor(sdata[,cfact[f]])))-1)) {
for(k in (l+1):length(levels(as.factor(sdata[,cfact[f]])))) {
ir<-1;
rfact1<-sdata[,cfact[f]] ==
levels(as.factor(sdata[,cfact[f]]))[l];
rfact2<-sdata[,cfact[f]] ==
levels(as.factor(sdata[,cfact[f]]))[k];
swght1<-sum(sdata[rfact1,wght]);
swght2<-sum(sdata[rfact2,wght]);
for (i in 1:length(cmean)) {
mmeans[ir,ic]<-(sum(sdata[rfact1,wght] *
sdata[rfact1,cmean[i]])/swght1) -
(sum(sdata[rfact2,wght] *
sdata[rfact2,cmean[i]])/swght2);
for (j in 1:length(brr)) {
sbrr1<-sum(sdata[rfact1,brr[j]]);
sbrr2<-sum(sdata[rfact2,brr[j]]);
mmbrj<-(sum(sdata[rfact1,brr[j]] *
sdata[rfact1,cmean[i]])/sbrr1) -
(sum(sdata[rfact2,brr[j]] *
sdata[rfact2,cmean[i]])/sbrr2);
mmeans[ir+1,ic]<-mmeans[ir+1,ic] + (mmbrj -
mmeans[ir,ic])^2;
}
mmeans[ir+1,ic]<-sqrt((mmeans[ir+1,ic] * 4) / length(brr));
ir<-ir+2;
}
ic<-ic + 1;
}
}
}
return(mmeans);
}
In cmean must be passed a vector with the indexes or the names of the columns for which you want to calculate the mean, while in cfact must be passed a vector with the indexes or the names of the columns with the factors to group data according to their levels.
The result will be a matrix with a column for each difference of means of each combination of two levels of each of the factors, and two rows for each of the fields for which that we want to calculate the average, the first with the differences of means, and the second with their typical errors.
Example 9, calculating the difference of means between countries
This example shows an example of calculating the difference of means of a subset of columns of each of the combinations of two countries in the sample.
The name of the function is wght_meandiffbycnt1 and the code is as follows:
wght_meandiffbycnt1<-function(sdata,cmean,cnt,wght,brr) {
nc<-0;
for (j in 1:(length(levels(as.factor(sdata[,cnt])))-1)) {
for(k in (j+1):length(levels(as.factor(sdata[,cnt])))) {
nc <- nc + 1;
}
}
mmeans<-matrix(ncol=nc,nrow=2 * length(cmean));
mmeans[,]<-0;
cn<-c();
for (j in 1:(length(levels(as.factor(sdata[,cnt])))-1)) {
for(k in (j+1):length(levels(as.factor(sdata[,cnt])))) {
cn<-c(cn, paste(names(sdata)[cnt],
levels(as.factor(sdata[,cnt]))[j],
levels(as.factor(sdata[,cnt]))[k], sep="-"));
}
}
colnames(mmeans)<-cn;
rn<-c();
for (i in 1:length(cmean)) {
rn<-c(rn, paste(names(sdata)[cmean[i]],"MEANDIFF",sep="-"));
rn<-c(rn, paste(names(sdata)[cmean[i]],"MEANDIFF","SE",sep="-"));
}
rownames(mmeans)<-rn;
ic<-1;
for (l in 1:(length(levels(as.factor(sdata[,cnt])))-1)) {
for(k in (l+1):length(levels(as.factor(sdata[,cnt])))) {
ir<-1;
rcnt1<-sdata[,cnt]==levels(as.factor(sdata[,cnt]))[l];
rcnt2<-sdata[,cnt]==levels(as.factor(sdata[,cnt]))[k];
swght1<-sum(sdata[rcnt1,wght]);
swght2<-sum(sdata[rcnt2,wght]);
for (i in 1:length(cmean)) {
mcnt1<-sum(sdata[rcnt1,wght]*sdata[rcnt1,cmean[i]])/swght1;
mcnt2<-sum(sdata[rcnt2,wght]*sdata[rcnt2,cmean[i]])/swght2;
mmeans[ir,ic]<-mcnt1 - mcnt2;
secnt1<-0;
secnt2<-0;
for (j in 1:length(brr)) {
sbrr1<-sum(sdata[rcnt1,brr[j]]);
sbrr2<-sum(sdata[rcnt2,brr[j]]);
mbcnt1<-sum(sdata[rcnt1,brr[j]] *
sdata[rcnt1,cmean[i]])/sbrr1;
mbcnt2<-sum(sdata[rcnt2,brr[j]] *
sdata[rcnt2,cmean[i]])/sbrr2;
secnt1<-secnt1 + (mcnt1 - mbcnt1)^2;
secnt2<-secnt2 + (mcnt2 - mbcnt2)^2;
}
secnt1<-sqrt((secnt1 * 4) / length(brr));
secnt2<-sqrt((secnt2 * 4) / length(brr));
mmeans[ir+1,ic]<-sqrt(secnt1^2 + secnt2^2);
ir<-ir+2;
}
ic<-ic + 1;
}
}
return(mmeans);
}
In the cnt parameter must be passed the index or the name of the column with the country, and in cmean a vector with the indexes or the names of the columns for which we wish to obtain the mean differences. What we get is a matrix with a column with the difference of means of each combination of two countries, and two rows for each of the columns indicated in cmean, the first for the difference of means, and the second for the standard error of the difference.
When the standard error of a difference between two countries is calculated, the procedure is different of that used with data belonging to the same country. In this case, the samples of the two countries are independent, and the standard error of the difference is equal to the square root of the sum of the squared typical errors of the mean of each country. That is how the standard error is calculated in this example.
Example 10, calculating the differences of means in a dataset with multiple countries, grouping by the levels of a set of factors
In this last example, we also have a sample with several countries, but what we will calculate is the difference of means between a set of columns depending on the combination of levels of a set of factors taken two by two, as in example 4.
The function is wght_meandiffbycnt2, and the code is as follows:
wght_meandiffbycnt2<-function(sdata,cmean,cnt,cfact,wght,brr) {
lcntrs<-vector('list',length(levels(as.factor(sdata[,cnt]))));
for (p in 1:length(levels(as.factor(sdata[,cnt])))) {
names(lcntrs)[p]<-levels(as.factor(sdata[,cnt]))[p];
}
nc<-0;
for (i in 1:length(cfact)) {
for (j in 1:(length(levels(as.factor(sdata[,cfact[i]])))-1)) {
for(k in (j+1):length(levels(as.factor(sdata[,cfact[i]])))) {
nc <- nc + 1;
}
}
}
cn<-c();
for (i in 1:length(cfact)) {
for (j in 1:(length(levels(as.factor(sdata[,cfact[i]])))-1)) {
for(k in (j+1):length(levels(as.factor(sdata[,cfact[i]])))) {
cn<-c(cn, paste(names(sdata)[cfact[i]],
levels(as.factor(sdata[,cfact[i]]))[j],
levels(as.factor(sdata[,cfact[i]]))[k], sep="-"));
}
}
}
rn<-c();
for (i in 1:length(cmean)) {
rn<-c(rn, paste(names(sdata)[cmean[i]],"MEANDIFF",sep="-"));
rn<-c(rn, paste(names(sdata)[cmean[i]],"MEANDIFF","SE",sep="-"));
}
for (p in 1:length(levels(as.factor(sdata[,cnt])))) {
mmeans<-matrix(ncol=nc,nrow=2 * length(cmean));
mmeans[,]<-0;
colnames(mmeans)<-cn;
rownames(mmeans)<-rn;
ic<-1;
for(f in 1:length(cfact)) {
for (l in 1:(length(levels(as.factor(sdata[,cfact[f]])))-1)) {
for(k in (l+1):length(levels(as.factor(sdata[,cfact[f]])))) {
ir<-1;
rfact1<-(sdata[,cfact[f]] ==
levels(as.factor(sdata[,cfact[f]]))[l]) &
(sdata[,cnt]==levels(as.factor(sdata[,cnt]))[p]);
rfact2<-(sdata[,cfact[f]] ==
levels(as.factor(sdata[,cfact[f]]))[k]) &
(sdata[,cnt]==levels(as.factor(sdata[,cnt]))[p]);
swght1<-sum(sdata[rfact1,wght]);
swght2<-sum(sdata[rfact2,wght]);
for (i in 1:length(cmean)) {
mmeans[ir,ic]<-(sum(sdata[rfact1,wght] *
sdata[rfact1,cmean[i]])/swght1) -
(sum(sdata[rfact2,wght] *
sdata[rfact2,cmean[i]])/swght2);
for (j in 1:length(brr)) {
sbrr1<-sum(sdata[rfact1,brr[j]]);
sbrr2<-sum(sdata[rfact2,brr[j]]);
mmbrj<-(sum(sdata[rfact1,brr[j]] *
sdata[rfact1,cmean[i]])/sbrr1) -
(sum(sdata[rfact2,brr[j]] *
sdata[rfact2,cmean[i]])/sbrr2);
mmeans[ir+1,ic]<-mmeans[ir+1,ic] + (mmbrj -
mmeans[ir,ic])^2;
}
mmeans[ir+1,ic]<-sqrt((mmeans[ir+1,ic] * 4) /
length(brr));
ir<-ir+2;
}
ic<-ic + 1;
}
}
}
lcntrs[[p]]<-mmeans;
}
return(lcntrs);
}
The result is a list with a matrix for each country. The structure of these matrices is the same as that in example 4.