31 Oct 2022 · 16 min read

The Most Effective Skills Assessments are not Skills Assessments

The most effective way to measure skills is the one with the highest predictive validity for individual performance or, at minimum, task completion. Contrary to popular opinion, direct skills assessments (skills tests) are not the most effective method for assessing skills because of their inherent limitations. Before we dive into those limitations, we first have to review what it means to learn something new and to be “good” at it.

What does skill development look like?

Everyone reading this - at some point in their lives - has learned a new skill. For those who can remember learning something new, there is a simple visualization exercise that will give some structure to the complexity of learning. Try drawing the personal journey you have had when learning your latest skill (for example a language or instrument).

Take a piece of paper, or use your imagination, and draw a graph with ‘expertise’ on the vertical (y) axis and ‘time’ on the horizontal (x) axis. Now draw the line that runs from the first moment you engaged in this new skill to the level of expertise you have reached today. Imagine the phases of learning you have gone through. What happened at the very beginning, and what is happening closer to your current level? What does your graph look like? How does the line move from left to right and up and down the graph?

We’re willing to bet that - for practically all of you (there are always some people with innate natural abilities) - this graph looks a whole lot like a very big S or inverted Z.

What is less clear, and what tends to vary radically from person to person and from skill to skill, is what the top of your personal curve means in practice: your “expertise” level. Let’s take the example of running. For some of you ‘distance’ is the epitome of expertise, for others ‘speed’, and for others still ‘frequency’. Some may not agree with the significance of one or more of these achievements, whilst others find them all equally impressive. Either way, we all tend to agree that Olympic medals are pretty good indicators of expertise, and for almost all of us they are out of reach.

An interesting observation presents itself when looking at how long it takes for a person to reach a specific point or how close they get to publicly agreed-upon definitions of expertise (such as Olympic medals). Here, individuals will have radically different experiences. Some take weeks, others years, and a few will never move up the curve at all. Moreover, not everyone will be able to make it to the top of the curve or even past the inflection point. This observation has everything to do with one’s propensity for success in a specific skill (someone’s natural inclination).

[Figure: s-curve of expertise over time for an Olympic runner]

S-Curves and propensity for success

As mentioned in the previous section, people have wildly different experiences when climbing the s-curve. Some take far more time than others, and some far less. Moreover, everyone has a different peak: the maximum level they can reach.

When referring to someone’s ability to learn a skill, Talent Data Labs defines two important aspects of learning: “Velocity” (speed) and “Range” (height). The visualization above shows how a typical s-curve for an Olympic runner develops. In reality, every curve is different and can have longer or shorter tails (flat parts) and steeper or flatter slopes. On top of that, the top of most people’s s-curve isn’t even close to that of an Olympian. Both Velocity and Range are important, and each is useful for different skills and practical settings. The next section goes over both attributes in more detail.
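
To make the two attributes concrete, here is a minimal sketch that models the s-curve as a logistic function. The logistic form, the `steepness` parameter, and the two example learners are assumptions made for illustration; they are not Talent Data Labs' actual model. In this sketch, Range is the ceiling of the curve and Velocity is expressed as the time needed to reach the inflection point.

```python
import math


def expertise(t, range_, inflection_time, steepness=1.0):
    """Logistic s-curve: expertise level at time t.

    range_          -- the learner's personal peak (upper limit of the curve)
    inflection_time -- time at which half of the peak is reached; a smaller
                       value means a higher-velocity learner
    steepness       -- hypothetical shape parameter controlling the slope
    """
    return range_ / (1.0 + math.exp(-steepness * (t - inflection_time)))


# Two hypothetical learners (all numbers are invented for illustration).
fast_low_range = {"range_": 60.0, "inflection_time": 6.0}
slow_high_range = {"range_": 95.0, "inflection_time": 24.0}

for month in (0, 6, 12, 24, 48):
    a = expertise(month, **fast_low_range)
    b = expertise(month, **slow_high_range)
    print(f"month {month:2d}: fast/low-range {a:5.1f}   slow/high-range {b:5.1f}")
```

The first learner gets ahead early but levels off at a lower peak, while the second needs far longer to get going and ends up higher. A single snapshot of either curve cannot tell you which of the two learners you are looking at.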

Velocity

Velocity, or speed, governs the time it takes for a person to move up their own s-curve. High velocity means that the person reaches the top of their s-curve relatively fast. This is important when you want someone to learn something quickly, for example learning a language when moving to a new country. Velocity is less important for hobbies or other activities that are not time-dependent. In professional settings, velocity is very important when replacing someone internally through education or re-skilling. Velocity is typically measured as the amount of time someone takes to reach the inflection point of their personal s-curve; it says nothing about the range of that s-curve (proximity to Olympic medals).

Range

Range, or height, defines how far someone can go in a skill before reaching their personal peak. The absolute peak is usually the world record. Range is most important for tasks that require high specialization or innovation. Range is not very important in things like sports, because you can always compete with someone within your own range (usually called a level or rating). In professional settings, range is typically measured as the complexity level of the tasks an individual handles. Some argue that, given enough time, everyone should be able to reach the epitome of performance. Talent Data Labs believes that a large part of this is defined by someone’s innate abilities and that not everyone can become an Olympian.

Range in combination with Velocity becomes critical in urgent, high-skill scenarios. Some problems need to be solved both quickly and properly, such as discovering a cure for an unknown virus. In contrast, learning an instrument to play at home requires neither range nor velocity to be enjoyable. The relevance of range and velocity depends on your goals and their urgency. In professional settings, you lose competitiveness when you have low-range and low-velocity learners, which makes it imperative to measure or understand these concepts in hiring, re-education, and lateral moves. The next section explores how we tend to measure skills.

Measuring skills

As established before, the two most important aspects of an individual’s propensity to learn a new skill are Velocity and Range. Traditionally, and still today, hiring managers try to figure out whether someone can perform a task or handle a skill by giving them a skills test. Skills tests are typically short samples of an individual doing a generic task within a skill that’s easy to measure and compare.

Often these tests consist of multiple-choice questions about the terminology of a skill or the use of certain aspects of it in a specific environment. The LinkedIn skills tests are a good public example. If you haven’t tried them yet, kindly go here and do one. Alternatively, below you can see a sample from their Python Programming test.

[Image: sample question from the LinkedIn Python Programming skills test]

Imagine asking all your Python developers if they can figure this function out. To what extent do you think this assessment is a good method to differentiate between average, good, and great developers? Who will have the most room to grow?

Limitations of traditional skills testing

As you can see above, skills testing is quite a basic (low-resolution) tool, which causes a few limitations. We also have a more academic issue with these types of assessments in general: they only show whether someone is past the initial phase of learning (is on the starting slope of the s-curve). This implies that a score on a skills test is unlikely to predict an individual’s current level of expertise. In other words, there is limited evidence for a correlation between skills test results and expertise levels. Because concepts such as “expertise” and “predictive validity” are very hard to measure and explain outside academia, this article will focus mostly on practical issues and limitations.

Some of the most practical issues in skills testing relate to the following:
1) Skills tests cannot measure Velocity and Range, the two most essential attributes in professional settings and for solving urgent practical problems.
2) With the current skills shortages in the market, finding enough people who already have a skill is untenable, and re-educating others is more feasible. Direct skills testing is not an effective tool for identifying whose time is best invested in re-education.
3) Skills can be learned, so direct skills data does not age well. Furthermore, tools tend to change over time.
4) Similarity between skills allows for the efficient transfer of knowledge (becoming a better runner will make you a better cyclist through muscle efficiency). Testing all the skills that are related (a skill family) is not practically feasible.

The combination of practical and academic limitations in skills testing gives us a good reason to look for alternative methods of achieving results. Essentially skills testing is trying to solve a problem in the market and that problem relates to people performing poorly in tasks and roles. In other words, a good result from skills testing would be a good performance in tasks within roles. Let’s examine which methods have proven to be best at measuring performance in tasks and in roles.

The best method of skills testing according to science

According to decades’ worth of research (by Schmidt & Hunter), the best method for skills testing is in fact to have someone actually work with that skill on a real problem in a real environment. This way the exact relevant range of the s-curve is exposed and we can simply measure performance on the task in a real setting.

In practice, sampling real work usually isn’t very feasible, and employers rarely have the time or the patience to wait for such an in-depth review. Besides that, it is very costly to do with several people at once, as they all need to be paid. We won’t even mention the legal negotiations and limitations here.

So, let’s look at a more feasible alternative. The next section will propose an alternative that measures performance in tasks and roles but also removes some of the practical limitations in terms of re-skilling or re-education recommendation and analysis.

The best skills testing alternative

As established above, traditional skills testing essentially verifies whether someone has done something before (whether they have gotten onto the s-curve or not), in other words, whether someone has any significant experience in something. The first part of this segment examines how to break even with skills testing on discovering whether someone has any significant experience in a skill. The second part explores predicting performance and expertise.

A user-friendly alternative for skill testing

To break even with skills testing, the software would need to return whether someone has significant experience in something. As seen above, skills testing uses extensive surveys and Q&A to establish whether someone understands the jargon and technical requirements of a skill. This can be accomplished and verified much more easily by looking at past experience in using the skill. An interview is typically a good, user-friendly format for validating past experience, but it is not a great pre-assessment tool because it is time-consuming and costly.

To also break even on time and costs, we need to rely on software and self-assessment. In self-assessment, it is feasible to understand experience in skills by mapping the intersection of someone’s activities (past experiences) and the skills required for those activities. Doing so requires an understanding of which skills are used in which jobs, at which companies, and at what intensity. Our team has built a neural network that understands not only which skills are most relevant to each job and environment but also which skills are most similar (across over 300 million different profiles).
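
As a rough illustration of mapping past activities to skills, the sketch below swaps the neural network for a small hand-written lookup table. The table `ROLE_SKILLS`, the intensity values, the six-month threshold, and both function names are hypothetical and only serve to show the shape of the idea.

```python
from collections import defaultdict

# Hypothetical lookup: job title -> {skill: intensity between 0 and 1}.
# In the article this knowledge comes from a neural network trained on
# profiles; the values here are invented for illustration.
ROLE_SKILLS = {
    "data analyst": {"sql": 0.9, "python": 0.6, "excel": 0.7},
    "backend developer": {"python": 0.9, "sql": 0.5, "docker": 0.6},
}


def infer_skill_exposure(experiences):
    """Sum intensity-weighted months per skill over (job title, months) pairs."""
    exposure = defaultdict(float)
    for role, months in experiences:
        # Unknown job titles contribute nothing in this simplified sketch.
        for skill, intensity in ROLE_SKILLS.get(role.lower(), {}).items():
            exposure[skill] += months * intensity
    return dict(exposure)


def significant_skills(exposure, min_months=6.0):
    """Skills with enough weighted exposure to count as 'on the s-curve'."""
    return sorted(skill for skill, months in exposure.items() if months >= min_months)


# A hypothetical CV: two years as a data analyst, one year as a backend developer.
cv = [("Data Analyst", 24), ("Backend Developer", 12)]
exposure = infer_skill_exposure(cv)
print(significant_skills(exposure))
```

A lookup table like this cannot generalize to unseen job titles or related skills, which is the gap a learned model is meant to close.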

Thanks to this neural network it is possible to understand similarity both in functionality and in practice. A major feature of this software is that it automatically determines whether people are demonstrably on the s-curve in up to 99% of cases. The connections in the neural network also allow users to estimate, with reasonable reliability (~80%), whether people with no demonstrable experience in a skill will be on the s-curve. This means the software can cover transferable and similar skills. Furthermore, the software can estimate how easy it would be for an individual to learn related and similar skills.

Together, these features reliably answer the question of whether users are on the s-curve or not. This means that, with access to simple CV data or social profiles, anyone could infer the significant skills of any user. This is on par with modern skills-testing tools: you can infer skills from available data just as reliably as you can measure them through testing. But let’s not stop there and see how we can do even better.

The best method for skill testing

As established, breaking even on skills tests can be done automatically without wasting the time of either a user or manager. To do even better we need to go back to the requirements for good skills testing. The requirement for good skills testing is understanding the possible level of expertise and the time it takes to attain that level. These traits also need to predict performance. First, we’ll examine an off-the-shelf approach, and then we’ll look at doing it yourself.

Measuring the key attributes (Range and Velocity) directly is difficult. Luckily, as mentioned above, the desired outcome of a good skills assessment is clear: good skills assessments predict performance. Psychologists and other researchers have reported on countless studies that investigate performance, and Schmidt and Hunter famously reviewed decades’ worth of this research. According to this body of research, some measurements correlate strongly with performance. Performance correlates particularly well with General Mental Aptitude (GMA) (r = .55). Mental aptitude is assessed with logical testing tools, and it does not take much more effort to establish a baseline.

Increasing that predictive baseline is not very hard either. If you measure Personality on top of GMA, you can achieve a multiple R of 0.66, meaning there is a strong positive relationship between Personality + GMA and someone’s performance. Given that Personality and GMA give us a strong understanding of general performance and the impact that should have on general expertise levels, there is only one ingredient missing: time spent learning or practicing.
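
For readers unfamiliar with the term, the multiple R of two predictors can be computed from three pairwise correlations using the standard two-predictor formula. In the sketch below only the GMA correlation of .55 comes from the text above; the personality correlation of .36 and the assumption that the two predictors are uncorrelated are hypothetical inputs chosen for illustration, not reported findings.

```python
import math


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of an outcome y with two predictors.

    r_y1, r_y2 -- correlation of each predictor with the outcome
    r_12       -- correlation between the two predictors (must not be +/-1)
    """
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)


r_gma = 0.55          # GMA vs. performance, as quoted in the article
r_personality = 0.36  # hypothetical value, chosen for illustration only
r_between = 0.0       # hypothetical: predictors assumed to be uncorrelated

print(round(multiple_r(r_gma, r_personality, r_between), 2))  # prints 0.66
```

The closer the two predictors are to being uncorrelated with each other, the more the second one adds on top of the first, which is why combining different kinds of measurement tends to pay off.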

The final ingredient needed to infer the full Range and Velocity someone has in a skill is their experience in that specific skill (exposure). This can be determined automatically with our CV parsing software in combination with the neural network. We can define learning speed and range and roughly estimate total exposure time. By defining a rank order on our large database of people, we can create a norm and give a total score on exposure weighted by the Range and Velocity propensity. We can also show these components separately.

Doing this yourself is straightforward, albeit slightly laborious. Start by gathering a couple of hundred resumes of people using a skill in a job and looking at how long they have been practicing that skill. If you list all the people and the skills they have been using, you can start building an index of time spent on each skill across all the different roles. Based on the intersection between time spent in a role and performance over time, you can analyze the generalized learning curve for these skills and how far different users deviate from it. Important metrics here can include average time to performance or average time to churn. If measured well, the performance data should also reveal the true experts. This allows for labels such as high-velocity and high-range employees within a skill. You’ll have to assign these labels to your users through a script or manually.
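
The labelling step could be scripted along these lines. This is a minimal sketch: the record fields, the use of the median as the cut-off, and the 1 to 5 complexity scale are all assumptions for illustration, and a real study would need far more than four records.

```python
from statistics import median

# Hypothetical records for one skill. 'months_to_performance' is the time
# spent practicing before reaching an agreed performance bar (None if the
# bar was never reached); 'max_complexity' is the hardest task level
# handled, on an invented 1-5 scale.
people = [
    {"name": "A", "months_to_performance": 4, "max_complexity": 5},
    {"name": "B", "months_to_performance": 10, "max_complexity": 3},
    {"name": "C", "months_to_performance": 6, "max_complexity": 4},
    {"name": "D", "months_to_performance": None, "max_complexity": 2},
]


def label_people(records):
    """Label each person relative to the group's median time and complexity."""
    reached = [
        r["months_to_performance"]
        for r in records
        if r["months_to_performance"] is not None
    ]
    typical_months = median(reached)
    typical_complexity = median(r["max_complexity"] for r in records)

    labels = {}
    for r in records:
        months = r["months_to_performance"]
        labels[r["name"]] = {
            # Faster than the typical learner who reached the bar.
            "high_velocity": months is not None and months < typical_months,
            # Handles more complex tasks than the typical person.
            "high_range": r["max_complexity"] > typical_complexity,
        }
    return labels


for name, label in label_people(people).items():
    print(name, label)
```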

The second step is to run psychometric tests on the people in those roles and jobs and to create a rank order algorithm, similar to the one mentioned above, based on the high-velocity and high-range labels. This in turn gives a good index of which psychographic profiles predict high velocity and high range. You can then project these two measures onto the wider population by having people take these psychographic assessments and parsing their CV-based exposure data.

The strongest predictors of success between people and their skills can then be investigated through some common statistical tests and analyses. Investigate how the interactions between psychographic profiles, skill range, and velocity develop over time, as well as the relative exposure to the skill within the role (experience). This should define the weights between exposure, psychographics, and skills. Now implement those weighted rankings in a second-order ranking algorithm. This algorithm should order all the people you have analyzed into a single list to create a fully personalized skills norm. The norm should map exposure and psychometrics to a relative skill score. Repeat this process for all important skills. Now you can use proximity to these psychographic profiles and skill exposure as a scoring method for predicting performance. Feel free to contact the Talent Data Labs team to collaborate on these studies.
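
One simple way to turn weights into a norm is to standardize each measure, combine the results into a weighted score, and convert the rank order into percentiles. The sketch below assumes this approach; the feature names, the weights, and the three example people are hypothetical, and in practice the weights would come from your own statistical analysis.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values; returns zeros if every value is identical."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def skill_norm(people, weights):
    """Map each person to a percentile based on a weighted sum of z-scores.

    Ties are broken by input order in this simplified sketch.
    """
    names = list(people)
    composite = [0.0] * len(names)
    for feature, weight in weights.items():
        zs = z_scores([people[name][feature] for name in names])
        composite = [c + weight * z for c, z in zip(composite, zs)]
    ranked = sorted(zip(names, composite), key=lambda item: item[1])
    return {
        name: 100.0 * (position + 1) / len(ranked)
        for position, (name, _) in enumerate(ranked)
    }


# Hypothetical inputs: months of exposure to the skill and a single
# psychometric score per person. The weights are invented for illustration.
people = {
    "A": {"exposure_months": 36, "psychometric_score": 70},
    "B": {"exposure_months": 12, "psychometric_score": 50},
    "C": {"exposure_months": 24, "psychometric_score": 90},
}
weights = {"exposure_months": 0.4, "psychometric_score": 0.6}

norm = skill_norm(people, weights)
print(norm)
```

Because the score is relative, adding or removing people changes everyone's percentile, so the norm group should resemble the population you intend to score.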

Essentially, this scoring method is superior not only because it is more relevant and practically applicable, but also because you can use it on people who do not have the skill yet. Imagine a world where teaching resources are spent exclusively on the people most likely to excel at that specific skill. The world would save a great deal of resources, spare educators a lot of frustration, and keep many people from burning out.

To close off, I’ll leave you with a question:
What would it mean to your organization if you could reliably skill-test someone before you teach them that skill?

TL;DR

Skill measurement is broken. We have come to think of skills as simple traits of binary nature, meaning they are either “on” or “off” in an individual. If you look closely at yourself or your friends you will realize that skills are so much more complex than that.

The most important traits of skills are actually how good you can become at them and how fast you can get there (Range and Velocity). More importantly, because of skills shortages, we shouldn’t even care whether people are already good or bad at a skill. We should try to discover whether they can potentially become good at it.

To do skills measurement better we need to look at the intersection between the propensity to become good and exposure to that skill (experience). Commercial tools and organizations should start looking into best practices such as creating skills norms and building relative scores for tasks and jobs. Direct skills testing is very inefficient and has practical limitations. Skills inference and psychographic assessments are much better tools to predict an individual’s performance and propensity to become good.
