Administering standardised achievement tests to students and reporting the results has become common practice in many developing countries in recent years, as the results are widely regarded as indicators of success in schooling and/or of the acquisition of basic skills or knowledge for adult life. However, accurately measuring the cognitive development and achievement of the children in each cohort, and obtaining meaningful scores, can be a complex endeavour. Obtaining estimates of the cognitive abilities and achievements of the Young Lives study children over time is important because these variables may be considered both outcomes (a proxy for the individual's skills) and predictors of later outcomes.
In 2006, several cognitive development and achievement tests were piloted in each country prior to Round 2 of the Young Lives household and child survey. On the basis of these pilots, it was decided to administer the Peabody Picture Vocabulary Test (PPVT) and the Cognitive Developmental Assessment (CDA) to assess children's verbal and quantitative ability respectively in the Younger Cohort (aged between 4.5 and 5.5 years old at the time of Round 2). The PPVT, plus two reading and writing items from Round 1 and a mathematics achievement test, were administered to assess children's verbal and quantitative abilities respectively in the Older Cohort (aged between 11.5 and 12.5 years old at the time of Round 2).
The main concern in administering these tests and using their results is that their reliability and validity are established before the data are used for research. This is because, for the most part, these tests were not developed for the specific contexts in which they were used in Young Lives. Hence, the main goal of the analysis presented in this paper is to establish the reliability and validity of each of the tests administered to each cohort within each country. In this process, the items with the best psychometric properties differ across countries. Therefore, we do not carry out international comparisons of results in this paper but suggest that the test results should be used for analysis within countries (even within countries, different language groups rely on different combinations of items to establish their ability). However, the general construct measured by each test is the same across countries. Hence, comparing the relationships between achievement and other variables across countries remains possible.
To obtain indicators of reliability and validity, we used current standards on psychometrics to guide the analyses.4 The psychometric characteristics of each test were estimated through several methods. The reliability analysis examined how consistent the scores are for the children. In other words, the reliability index tells us how accurate and stable the scores are. We used both Classical Test Theory (CTT) and Item Response Theory (IRT) methods to estimate reliability indicators. The validity analysis aimed to evaluate the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of the tests (in this case, research). To this end, we estimated the correlation between each test score and variables such as age and educational level, and checked whether these correlations were consistent with previous empirical evidence reported in the literature (e.g. on average, children in higher grades of school should get higher results than children in lower grades or out of school, and parental education should be positively correlated with scores on the tests).
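As an illustration only, the two kinds of check described above can be sketched in a few lines of Python. The sketch assumes Cronbach's alpha as the CTT reliability index and a Pearson correlation with age as the validity check. Both are common choices, but they are assumptions made here for the example and not necessarily the exact indicators estimated in this paper. The response matrix and ages are invented, and the function names `cronbach_alpha` and `pearson_r` are hypothetical.

```python
from statistics import mean, pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of per-child item-score lists.

    Assumption: alpha stands in for 'a CTT reliability index';
    the paper does not state which index it reports.
    """
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across children
    item_vars = [pvariance([child[i] for child in responses])
                 for i in range(n_items)]
    # Variance of the total (raw) score across children
    total_var = pvariance([sum(child) for child in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Invented data: 5 children x 4 dichotomous items (1 = correct)
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
# Invented ages in months, within the Younger Cohort range
age_months = [65, 64, 59, 58, 54]

raw_scores = [sum(child) for child in responses]
alpha = cronbach_alpha(responses)
r_age = pearson_r(raw_scores, age_months)

print(f"alpha = {alpha:.2f}")
print(f"correlation of raw score with age = {r_age:.2f}")
```

In this toy example a positive correlation between raw score and age is the expected direction. In a real analysis the same check would be repeated for each test, cohort and country, since the items retained differ across them.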
The paper is organised into six sections, including this introduction. The second section presents the framework for the analysis, introducing definitions and theories of validity and reliability. Section three describes each of the cognitive development and achievement instruments used in the Young Lives project. The fourth section presents the data and methods used in this paper, and section five presents the results on the psychometric characteristics of the tests. Finally, section six offers some considerations about the use of the test scores, as well as recommendations for the assessment of cognitive development and achievement in Round 3 of Young Lives.