The Moneyball of Hiring


The business world loves a good sports analogy. And if you want to perform like a leading sports team (or build a corporate culture like one), you should be recruiting like one. So, why don’t we? Because hiring is hard, undervalued in many companies, and, worse, we think we’re good at it. But we can do better. You can hire like the Red Sox, and Applied can help.

Applied is the Moneyball of hiring: the platform uses behavioural science to assess candidates in a way that saves companies time (and therefore money), surfaces candidates who are too frequently overlooked (because of their name, their face, their education, or their experience), and helps hiring managers mitigate their biases. In this article, I’m going to talk through why and how Applied is shaking up the hiring sector in the same way that Billy Beane shook up baseball at the Oakland A’s.

Moneyball is a book by Michael Lewis (also famous for Flash Boys, The Big Short, and The Undoing Project) about how the Oakland Athletics used statistics to recruit baseball players with a better return on investment (ROI), setting an American League record winning streak along the way. That season took them to the playoffs and ultimately changed professional baseball forever. When someone refers to Moneyball outside of the book or film adaptation, they’re normally referring to using statistical analysis to uncover better ways to win at reduced cost.

Applied is a tool that debiases the hiring process, saves time, and helps companies hire more ethically and efficiently. Built on behavioural science, the Applied Sift is a process in which candidates answer scenario-based questions; their answers are then randomised, chunked by question, and set against a rubric (a review guide or scorecard) for hiring teams to assess individually. Reviewers can then assess the answers without knowing the candidate’s gender, the colour of their skin, or whether they went to an Ivy League school.
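To make those mechanics concrete, here is a minimal Python sketch of what a sift-style review pipeline could look like. It is purely illustrative, with made-up data and function names of my own, not Applied’s actual implementation: identifying details are stripped, answers are grouped (chunked) by question rather than by candidate, and the order is shuffled for each reviewer.

```python
import random

# Illustrative only: a toy sketch of a sift-style review flow,
# not Applied's actual implementation.

applications = [
    {"id": 1, "name": "Alex Smith", "email": "alex@example.com",
     "answers": {"Q1": "I would first map the stakeholders...",
                 "Q2": "My plan for the first week would be..."}},
    {"id": 2, "name": "Priya Patel", "email": "priya@example.com",
     "answers": {"Q1": "I'd start by clarifying the goal...",
                 "Q2": "I would prioritise the blocked task..."}},
]

def anonymise(apps):
    """Drop identifying fields; keep only an opaque id and the answers."""
    return [{"id": app["id"], "answers": app["answers"]} for app in apps]

def chunk_by_question(apps):
    """Group answers by question so reviewers score one question at a time."""
    chunks = {}
    for app in apps:
        for question, answer in app["answers"].items():
            chunks.setdefault(question, []).append({"id": app["id"], "answer": answer})
    return chunks

def randomise(chunks, seed):
    """Shuffle answer order per question (and per reviewer) to blunt order effects."""
    rng = random.Random(seed)
    return {q: rng.sample(answers, len(answers)) for q, answers in chunks.items()}

review_pack = randomise(chunk_by_question(anonymise(applications)), seed=42)
for question, answers in review_pack.items():
    print(question, [a["id"] for a in answers])
```

Each reviewer then scores the anonymous answers against the rubric. Because answers are read question by question, and in a different order for each reviewer, one strong early answer can’t halo the rest of a candidate’s application.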

Brad Pitt Shakeup

What is Noisy Data?

In order to fully explain how baseball changed and how hiring is changing, we need to look at noisiness in data. In science and engineering, a signal is a function that carries information to or about something. When you flick a switch in your house, it sends a signal to the light to turn on (or off). A signal in data is how useful, accurate, or predictive the data is. Noise is the inaccurate, distracting, or irrelevant data. For example, on Reddit there will be messages related to the topic of a post, and a lot more messages that are spam, trolling, or simply off-topic (noisy messages). In any dataset, there are going to be misleading points that distract you from finding an accurate answer.
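As a toy illustration of my own (not from the book): suppose the signal you care about is a player’s true on-base skill, but each stretch of games only gives you a noisy sample of it. Any single observation can mislead you; aggregating many of them lets the signal re-emerge.

```python
import random

# Toy illustration of signal vs noise; the numbers are made up for the example.
random.seed(0)

true_obp = 0.360                      # the "signal": a player's true on-base skill

def observed_obp(plate_appearances):
    """Simulate a noisy estimate: each plate appearance is a coin flip at the true rate."""
    on_base = sum(random.random() < true_obp for _ in range(plate_appearances))
    return on_base / plate_appearances

print(observed_obp(20))    # a hot (or cold) week: noisy, can sit far from 0.360
print(observed_obp(600))   # a full season: much closer to the underlying signal
```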

Brad Pitt Shakeup

Think of CVs and resumes as being full of “noise”: lots of signifiers that distract from what the candidate is really trying to signal, namely that they have the skills for the job.

In Moneyball, this is seen when the scouts are talking about the players’ stats and whether or not they can “get on base.” Brad Pitt’s character tries to make it clear that the thing that matters in winning games the next year is getting players on base. All the rest is noise: “throwing funny” isn’t a good reason not to recruit someone.

Nate Silver is a statistician who founded FiveThirtyEight and wrote The Signal and the Noise, a book about the art and science of prediction. Silver also created a statistical model called PECOTA that projects baseball players’ performance from a wide range of historical stats. He isn’t the Jonah Hill character in Moneyball, but he does cover the story in a chapter about the predictability of recruiting baseball players. I mention this because his book is going to help tease out this sports analogy and explain why Applied is the best way to assess candidates.

Adapt or Die

Your Process is Bad, and You Should Feel Bad

What do you look for, and how do you assess it? In Silver’s book, he talks about data collected from baseball scouts as being both quantitative (numbers) and qualitative (first-hand observation). Like many other statheads, economists, and scientists out there, Silver concludes that in order to make good predictions you need both. Meaning, baseball teams can’t rely on statistical models like PECOTA alone; they need scouts and recruiters to go out into the field and make observations in order to make better, holistic decisions.

Baseball is an incredibly controlled environment: there are only nine positions on the field at any one time, there are nine innings, and there are specific rules to follow. This is vastly different to corporate structures, where the rules change depending on a company’s revenue, number of employees, what industry it’s in, or what countries it operates in. Not to mention that there are hundreds of different roles and jobs with varying levels of seniority and expertise. If a game that controlled hasn’t figured out how to measure potential, or to create and rely on a predictive model for successful players, how could we expect every HR or People team to do the same for employees?

In baseball, there are also traits that aren’t easily tracked but that scouts pick up on; these are listed below. We’re going to look at both the stats and these traits, and compare examples of eligibility, sift, and interview questions.

In the book, Silver maps out the intellectual and psychological behaviours that separate good baseball players from great baseball players (if you use Applied, in the app this is called “spread”, which I’ll come back to). These skills are qualitative, which means there isn’t an objective number you can allocate to a player; i.e. they’re not quantitative like a stat in baseball or, in professional careers, a CFA certification, an MD, or a CSCS card.

Silver breaks these qualities down into five groups and I’ve matched them against how they could be processed in Applied:

Applied Skills Tagging

Job Eligibility — PECOTA Ranking

Let’s start with the easy bit: stats. In PECOTA, baseball players have ranking stats like number of runs. There really isn’t a right or wrong answer here, but the scouts will want to know what these stats are. In the platform, these could be input into the eligibility section of an application. The Applied app is definitely not built for all the stats that are in PECOTA, but to keep in line with what the leagues have tracked since their beginnings, we could include runs, hits, putouts, assists, and errors. As Silver’s book states, these aren’t leading indicators of whether a player will be good, which is why there isn’t a right or wrong answer for them. If you wanted to make the process simple in Applied, you could just stick with one stat: their PECOTA ranking.

Applied Job Eligibility

Even though PECOTA takes in stats that could be considered “CV-like,” Silver’s main point stands: past performance in certain areas does not predict future performance. “His findings are counterintuitive to most fans. ‘When you try to predict future E.R.A. (Earned Run Averages) with past E.R.A.’s, you’re making a mistake,’ Silver said. He found that the most predictive statistics, by a considerable margin, are a pitcher’s strikeout rate and walk rate. Stats like Home runs allowed, lefty-righty breakdowns and other data tell less about a pitcher’s future.”
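If you wanted to check that kind of claim on data you trust, the test is straightforward: correlate each candidate predictor from one season with the outcome in the next. Here is a minimal Python sketch; the numbers are placeholders purely so the snippet runs, and the real exercise would use historical pitching seasons.

```python
# Sketch: for each candidate predictor measured in year N, compute its
# correlation with ERA in year N+1. Placeholder numbers only.
from statistics import correlation  # Python 3.10+

next_year_era  = [3.1, 4.4, 3.8, 5.0, 2.9, 4.1]
past_era       = [3.3, 4.0, 4.2, 4.6, 3.5, 3.9]
strikeout_rate = [0.28, 0.18, 0.22, 0.15, 0.30, 0.20]   # K per batter faced
walk_rate      = [0.06, 0.10, 0.08, 0.11, 0.05, 0.09]   # BB per batter faced

for name, predictor in [("past ERA", past_era),
                        ("strikeout rate", strikeout_rate),
                        ("walk rate", walk_rate)]:
    print(f"{name}: r = {correlation(predictor, next_year_era):+.2f}")
```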

All models are wrong, but some of them are useful.

What this illustrates is that data can’t give us everything. In order to have a good dataset, you need to categorise. And when you categorise, you need to simplify. This often eliminates the much-needed nuance of the situation. When we simplify, we make shortcuts. And when we have shortcuts, we often have bias. These sorts of models aren’t perfect and they’re not always right, which is why they shouldn’t be the be-all and end-all for making decisions. We need another step, and with Applied that step is the Sift. Because you can’t eliminate bias, you have to work around it. The Sift structures assessment differently and enables teams to build interventions for the shortcuts that our minds and models make.

Application Questions — Scouts’ Observations

The Applied Sift is a different way of assessing applications: it masks the identity of candidates and reorganises their application answers to mitigate hiring managers’ biases, such as rank-order effects, among many others. The goal of sift questions is to get candidates to think as though they already have the job and are dropped into a situation where they have to react, plan, solve, decide, prioritise, etc. In his book, Silver talks about preparedness and work ethic as qualities that scouts assess against. If we take that into consideration for a baseball player’s Sift questions, a typical question for a scout to observe/assess would be “What is their pre-game routine?”, or even better, “Your pre-game routine has been interrupted by x. This means you have to change your routine and plan a different approach. What are the first steps you take? Who do you need to talk to, and is there anything you need to support you?”

After interviewing many scouts, Silver concluded that there were five characteristics they looked for. It’s rather convenient that he grouped these under the same five headings as in the spider graph above:

  • competitiveness and self-confidence
  • preparedness and work ethic
  • concentration and focus
  • adaptiveness and learning ability
  • stress management and humility

Applied Rose Graph

Regardless of whether they’re a baseball player, a salesperson, or an engineer, these are five qualities that any manager would look for when hiring someone to join their team.

But the thing is, these players aren’t going to be filling out or answering sift questions; they’re going to be playing. So why use an assessment like the Applied Sift? Scouts will take notes, make observations, and most likely fill out a report to take back to their team in order to make a final decision on who they’ll recruit. Before they set out to make their observations, they probably have a set of criteria (a rubric or scorecard) to make these notes useful to the rest of the scouts. If they were using the Applied Sift, the scout would be the person filling out a candidate’s application with their observations. Once completed, the rest of the team would act as reviewers and rank the observations based on the notes taken by the observing scout/recruiter. It would also be an interesting test to compare scouts’ notes against the rubric/review guide.

How can we know if these are the right questions or observations to be making?

At Applied we use three metrics to assess if the questions in our library are good questions. These are:

  • Maturity = How frequently a question is used — the higher the score, the better, as it means the statistics behind the question are robust
  • Agreement = Reviewers agree on which answers are good, which indicates the question is well designed
  • Spread = This metric indicates whether the question separates the field of applicants

Applied Library Questions

Applied Library Question

Coming back to the concept of spread: a lower score for the spread metric means the question may not separate good applicants from great applicants. I think that even with a library full of questions with high spread ratings, hiring managers would still get nervous about bringing in fewer people for interviews and would still want to go forward with interviewing applicants rather than relying on sift questions alone. So why are interviews so important in hiring?
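As a rough illustration of how those three metrics could be computed from a table of review scores (my own simplified sketch, not Applied’s actual formulas): maturity as a usage count, agreement as how closely reviewers score the same answer, and spread as how far apart the candidate averages sit.

```python
from statistics import mean, pstdev

# Illustrative sketch only; not Applied's actual metric definitions.
# scores[candidate] = list of scores (1-5) given by different reviewers
scores = {
    "candidate_A": [5, 5, 4],
    "candidate_B": [2, 3, 2],
    "candidate_C": [4, 4, 5],
    "candidate_D": [1, 2, 2],
}

# Maturity: here, simply how many times the question has been answered and reviewed.
maturity = len(scores)

# Agreement: reviewers disagree little about any one answer (low within-candidate stdev).
within = mean(pstdev(reviews) for reviews in scores.values())
agreement = 1 / (1 + within)          # higher = reviewers agree more

# Spread: candidate averages are well separated (high between-candidate stdev).
spread = pstdev(mean(reviews) for reviews in scores.values())

print(f"maturity={maturity}, agreement={agreement:.2f}, spread={spread:.2f}")
```

A question with low spread would leave the candidate averages bunched together, which is exactly the “doesn’t separate good from great” problem described above.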

Do They Fit This Team?

Interviews are pretty good predictors of whether or not someone will be a good candidate for the role. In Schmidt & Hunter’s study, structured interviews are third from the top in terms of predictive validity. That being said, I can’t imagine anyone trying to sit down and do hundreds of interviews in one go with every person who applied. That’s why you need some sort of process to sift candidates and make a shortlist: it saves time and surfaces the best candidates.

But why even do an interview if you have the stats (eligibility) and observations (sift assessment)?

  • You have to factor in who else is on the team and what behaviours or attributes might be missing.
  • You also can’t ask every single question upfront — it’s time-consuming for both candidates and reviewers.

Collaboration, motivation, and values are all factors in building a really great team. You’re not evaluating this person in isolation. They have to work with a group of people you already employ (or play with). Could this person have an approach that no one else on the team has? Once you know they have the essential skills to do the job (through Sift questions), interviews are a great follow-up for further observation, skills testing, and values fit.

Hiring’s Not That Hard, Right?

Hiring is not Hard

How can you not be romantic about baseball?

At the end of the film, Jonah Hill shows Brad Pitt a clip of an unlikely player hitting a home run: “Jeremy’s about to realise that the ball’s gone 60 ft over the fence. He hit a home run and didn’t even realise it.”

Getting the chance to step up to bat is one thing; hitting one out of the park when you didn’t think you could is another. Many candidates who’ve gone through the Applied platform know this exact feeling: being the underdog, the overlooked, or the misfit, and getting a fair chance not just to be considered, but to hit a home run. So many people reach out to us to share their stories and express their joy at applying through the platform, whether they got the job or not.

Regardless of whether they got the job, Applied made them feel like they hit a home run.