Recently I was very fortunate to be amongst the UKFEchat delegation who went to OfSTED to meet Lorna Fitzjohn, Director of Further Education and Skills. We had a very civil and reasonable discussion across a whole range of issues, from whether lesson plans are required (no, but evidence of planning is) to the importance of destinations over and above success rates. Yet, to my mind at least, we did not get to the heart of the matter: the clear gap between the highest levels of the 'FE blob' and the best available current evidence. A good example of this occurred in September 2014, when Dr Susan Pember, ex-head of FE and skills investment and performance at the Department for Business, Innovation and Skills (BIS) and governance adviser at the Association of Colleges (AoC), said at a seminar attended by governors and principals: “You as individuals (governors), you have all been through the school system. I would trust your instinct. If it’s boring to you and you’ve just been in 10 of them [lessons] that look really boring, trust your instincts and think, would you sign up to that course? Be a bit braver about this.”
Yet the evidence suggests this advice is incorrect. Strong et al found that the correlation coefficient for untrained observers agreeing on a lesson observation grade was 0.24. As for the accuracy of grading, Strong et al found that the probability of each observer’s grade being consistent with other evidence was less than 50%. The Measures of Effective Teaching project (2013), sponsored by the Gates Foundation and the largest study of the effective use of lesson observations, provides clear guidance on the need for rigorous training before lesson observation takes place. Finally, O’Leary’s (2014) study of the use of lesson observation within the further education sector, the largest study of lesson observations in the English education system, indicated that much lesson observation practice was detrimental and counter-productive to the development of lecturers’ professional practice.
Secondly, the evidence on lesson observations has broader implications for both the reliability and validity of inspection judgements, in particular the use of the aspect grade for the quality of teaching, learning and assessment as a limiting factor on the overall grade for effectiveness. Using a similar approach to that undertaken by Waldegrave and Simons (2014) to analyse the relationship between grades awarded in school inspections, the following table summarises the relationship between the different inspection grades awarded during 125 general further education (GFE) college inspections which took place between January 2013 and June 2014.
It can be seen from the data that the teaching, learning and assessment aspect grade corresponds most strongly with the overall grade for effectiveness, which is not surprising given the guidance in the inspection handbook. Out of the 125 GFE college inspections undertaken in this 18-month period, there was only one occasion on which the two grades differed, and on that occasion the overall grade for effectiveness was lower than the grade for teaching, learning and assessment.
However, the direct relationship between the grade for overall effectiveness and the quality of teaching, learning and assessment is not without its problems. In the further education sector, unlike in schools, individual lesson grades are still used by OfSTED inspectors to summarise judgements about the quality of teaching, learning and assessment within a lesson. Both Matt O’Leary and Rob Coe identify serious challenges with the use of observations in the grading of teaching and learning. Waldegrave and Simons (2014) cite Coe’s synthesis of a number of research studies, which raises serious questions about the validity and reliability of lesson observation grades. On validity (the agreement between a lesson observation grade and the value-added progress made by students), Coe states that in the best case there will be only 49% agreement between the two and in the worst case only 37%. As for the reliability of grades, Coe’s synthesis suggests that in the best case there will be 61% agreement between two observers and in the worst case only 45%.
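To give a feel for what a low inter-observer correlation means for grade agreement, the following is a small simulation of my own: it is an illustrative sketch, not data from any of the studies cited. It assumes each observer perceives a lesson's underlying quality plus their own random noise, bins the result into Ofsted-style grades 1 to 4, and then measures both the correlation between two observers and how often they award exactly the same grade. The noise level is an assumption chosen to produce a correlation in the same region as the figures discussed above.

```python
import random

random.seed(42)

NOISE_SD = 1.5  # assumed observer noise, relative to true quality (sd = 1)
N = 100_000     # number of simulated lessons

def observe(true_quality, noise_sd):
    # An observer perceives the true quality plus personal noise, then bins
    # the perceived value into Ofsted-style grades 1 (best) to 4 (worst).
    perceived = true_quality + random.gauss(0, noise_sd)
    if perceived > 1.0:
        return 1
    if perceived > 0.0:
        return 2
    if perceived > -1.0:
        return 3
    return 4

def pearson(xs, ys):
    # Sample Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

lessons = [random.gauss(0, 1) for _ in range(N)]
a = [observe(q, NOISE_SD) for q in lessons]  # observer A's grades
b = [observe(q, NOISE_SD) for q in lessons]  # observer B's grades

agreement = sum(x == y for x, y in zip(a, b)) / N
print(f"inter-observer correlation: {pearson(a, b):.2f}")
print(f"exact grade agreement:      {agreement:.0%}")
```

Run as written, the two observers' grades correlate only weakly and they agree on the exact grade well under half the time, which is the pattern the Strong et al and Coe figures describe: even when both observers see the same lesson, noisy individual judgement makes a single ungraded-by-training observation an unreliable basis for a grade.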
As such, it would seem that using the teaching, learning and assessment grade as the driver for the grade for overall effectiveness is not consistent with the current best available evidence, and indicates that systems of accountability within the further education sector have yet to be fully informed by what we know about school effectiveness and improvement. There are also implications for the consistency of the process of making judgements in the different sectors now covered by the Common Inspection Framework (CIF), especially given the importance attached to data in the school system, where there is a direct relationship between the grade for outcomes and the grade for overall effectiveness. In other words, we may now have an emerging CIF across the sectors, but judgements between these areas may well be inconsistent, as they reflect possibly contradictory stances in each sector's inspection handbook. Whilst on the one hand we want inspectors to exercise their judgement when making grading decisions, on the other hand it is reasonable to expect intellectual consistency between the sectors.
I am sure this is the start of an interesting and exciting dialogue between UKFEchat and OfSTED which will benefit the sector as a whole.