We're the team that works on Triplebyte's engineering assessments. Ask us anything! [11/18/20 - AMA]

Want to know how Triplebyte’s coding quizzes are created and what our scoring system means? Curious about how we use performance data in job searching/matching? Or are you just looking for some tips on what you need to know to bump yourself into the top percentile in your domain? It’s all on the table: Ask away!

The Triplebyters here to answer your questions:
@ebakan (Head of Machine Learning)
@Larry_Thomas (ML Engineer)
@aburkard (ML Engineer)
@Ian_Eisenberg (ML Engineer)
@ypeng (ML Engineer)
@Rachel_Wolford (Product Manager)

You can find some primer info on Triplebyte’s assessments here: https://triplebyte.com/blog/deep-dive-triplebyte-assessment

1 Like

Hello everyone. I have a couple of questions, and hopefully it's alright to ask them all in one reply. Here goes:

  1. You must handle a lot of data every day. Is there any chance you can tell us about some of the tools or platforms you’ve found make your lives easier?

  2. With respect to the tests and questions, how do you make sure that adding a new question won’t break your scoring model’s accuracy?

  3. I don’t know if you’re the right people to ask this, but is there any variable or characteristic that turned out to be much more important for scoring applicants than it might appear?

1 Like

Hi all! I thought it would be helpful to share some of the common questions that our Talent Support team (with input from the Assessments Squad) answers on a regular basis.

  1. Why don’t you share the exact quiz questions and solutions?

  2. If I take an automated interview, is there a secondary review? How do I trust that it is properly calibrated/assessing my skills accurately?

  3. Why does the quiz have questions in languages that I don’t know?

2 Likes

Hi, thanks for doing this!

I wanted to get everyone’s opinion on what the major differences are between iOS interviews and backend/web app interviews.

Curious about what the problems are like comparatively (are they mostly just LeetCode-style problems in Swift/Obj-C, or is it more common to build a small app or apps)?

Are there certain core APIs that are often asked about?

Are most of them relatively similar, or is it a pretty mixed bag?

Any other insights and comparisons are much appreciated! I’m mostly interested in what companies in the 30-2,000 employee range do; my interview/work experience is with firms smaller than that.

For reference: I’ve had a few of both types of interviews, but more formal ones for backend, so I’d like to know more about what iOS interviews have in store.

Thanks again!
Elliott

1 Like

Some questions seem to be about conventional best practices, where the answer is not necessarily objective. How do you address that and justify those as fair assessment questions?

What does the process of generating new questions look like?

1 Like

I’m an economics graduate student, and while I understand the math and statistics components of data analytics, I’m weaker on data structures and algorithms. Does your scoring system make any concessions for applicants who don’t have a computer science/engineering background? Or what would be your recommendation for applicants with a profile similar to mine?

2 Likes

As a science teacher, I have come to be familiar with effective learning as a transparent process of what is being taught, how to study, what to study, how to measure learning, and measuring that learning in clear projects that (as in A/B testing) demonstrate clear learning or not. What are you teaching? How are you evaluating? How do we study? What do we study? How are we tested? What is tested? What are the content-maps? The subtext here is that you have a black box with an unverifiable claim of how to interpret what goes in and comes out. Help me to understand.

1 Like

Hi Abhinav -

There are sort of two half-answers to this question.

First off, we try to stick to concepts that are relevant to practical engineering. In some cases, this means we ask things that might seem somewhat unrelated to the topic, like asking about setting up an API endpoint on our ML quiz. That’s obviously not a pure ML task - but it’s important to practical ML engineering because a model usually needs to be accessible by other developers to be useful. By the same token, we do think that things like big-O are relevant: you might not always care if an algorithm is O(n) or O(n^2), but when you do care, it’s pretty important to be able to tell the difference. We’ve written a lot about trying to rely less on credentials, but that doesn’t mean that the material in a CS curriculum is wholly meaningless - just that we shouldn’t penalize someone for learning it some other way. So in short, we do ask about algorithms and data structures, but we try to do so in a way that is accessible for engineers without a traditional academic background.
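
To make the big-O point concrete, here's a small made-up example (not an actual quiz question): two functions that solve the same problem, where the difference only matters once the input gets large.

```python
# Made-up example, not an actual quiz question: the same task, two complexities.

def has_duplicates_quadratic(values):
    # O(n^2): compares every pair of elements.
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                return True
    return False

def has_duplicates_linear(values):
    # O(n): a set gives (average) constant-time membership checks.
    seen = set()
    for v in values:
        if v in seen:
            return True
        seen.add(v)
    return False
```

Both are correct, but only one of them holds up when the list has a few million elements - that's the kind of judgment we want engineers to be able to make, regardless of where they learned it.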

The other half of the answer is that our scoring system isn’t a strict better <-> worse spectrum. We give individual scores for different areas: one score for algorithms, another for (say) back-end web development, another for (say) practical Linux, and so on. We don’t expect engineers to be skilled in every area, and you don’t have to showcase a score in an area where you scored poorly - our goal is to offer you a chance to demonstrate the areas where you do shine.

In terms of our process, my advice is simple: take a quiz you think reflects your strengths, highlight good scores relevant to the kind of role you want, and tailor how you present yourself (e.g. through your resume) accordingly.

Hope that helps!

1 Like

Hi Aram -

This is a good question, and a legitimate criticism of some of our older test content!

Today, we have a fairly robust process for developing our questions, which was put together by our in-house psychometrics guy (my colleague Larry, who you might have seen in our recent assessment deep dive). We train new question writers on a long list of criteria, which includes avoiding both actual (in the sense of contested opinion) and apparent (in the sense of writing style) subjectivity. (It also includes a number of other points, like keeping a consistent style among correct/incorrect answers, avoiding language that might be unclear to non-native English speakers, and so on.) Moreover, we do validate that questions are meaningfully predictive of downstream outcomes (e.g. interview performance) - questions that aren’t predictive quickly get filtered out of our quiz pool.
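
To give a rough sense of what "validate against downstream outcomes" means mechanically, here's a hypothetical sketch (not our actual pipeline - the data, field names, and threshold are made up): for each question, check whether candidates who answered it correctly went on to do better in later interviews, e.g. with a simple point-biserial correlation.

```python
# Hypothetical sketch of outcome-based question validation - not our real pipeline.
from statistics import mean, pstdev

def point_biserial(correct_flags, outcomes):
    """Correlation between a 0/1 'answered correctly' flag and a numeric outcome
    (e.g. a later interview score). Returns 0.0 if either side has no variance."""
    if len(set(correct_flags)) < 2 or pstdev(outcomes) == 0:
        return 0.0
    right = [o for c, o in zip(correct_flags, outcomes) if c]
    wrong = [o for c, o in zip(correct_flags, outcomes) if not c]
    p = len(right) / len(correct_flags)
    return (mean(right) - mean(wrong)) / pstdev(outcomes) * (p * (1 - p)) ** 0.5

# Made-up data: did each candidate answer question Q correctly, and how did
# they later score in an interview?
answered_correctly = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
interview_scores   = [4, 5, 2, 4, 3, 2, 5, 4, 2, 3]

r = point_biserial(answered_correctly, interview_scores)
if r < 0.15:  # illustrative threshold, not a real cutoff we use
    print(f"r={r:.2f}: question is a candidate for removal from the pool")
else:
    print(f"r={r:.2f}: question looks predictive")
```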

We didn’t always have a proper process, though, and we’re actually in the middle of auditing every single past test question to make sure they’re up to our current standards.

You should check out the article I linked above for more details, but the short version is:

  • Develop a broad overview of what we’re trying to test.
  • Refine that overview into subtopics.
  • Find subject-matter experts in each subtopic.
  • Train them on test development best practices.
  • Have them help us refine subtopics into individual questions.
  • Review their question submissions ourselves to make sure they meet our standards.
2 Likes

I can answer #2 - there’s not really any plausible way for a single question to break our model entirely even if it’s a truly awful question.

Questions get validated against downstream outcomes, so to end up getting any weight to speak of, a question would need to predict e.g. interview performance. The worst case here is a question that is predictive (it tells us something about your likelihood of passing interviews) but doesn't actually evaluate engineering skill.

A classic example would be a college admissions process that is biased by e.g. socioeconomic status - it is predictive, in the sense that wealthier students have better outcomes, but that doesn't mean they're smarter or ‘better’ students. This is a problem for predictive models in general, and a lot of ink has been spilled on the problem of avoiding training AI in a way that reproduces our own human biases and inequalities. (In fact, I think our ML team has been reading some literature on this problem lately.) In any case, we have policies in place for our question development that should help us avoid this problem, at least in its more egregious forms.

Even if we did miss such a question, a single answer just isn’t weighted that strongly in our scoring. The most any one question could do is nudge scores up or down a point, and it usually wouldn’t even do that. Moreover, the further off our estimates are, the more strongly we’d expect other questions to push us back in the correct direction.
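
To illustrate why, here's a toy sketch (emphatically not our production model - the 1PL/Rasch setup, difficulties, and scale below are illustrative assumptions): when an ability estimate is based on dozens of responses, flipping any single response only moves the estimate a little.

```python
# Toy illustration only - not Triplebyte's scoring model. A simple 1PL (Rasch)
# ability estimate over 40 questions, showing that flipping a single response
# barely moves the estimate.
import math

def estimate_ability(responses, difficulties):
    """Grid-search maximum-likelihood estimate of ability under a Rasch model."""
    def log_likelihood(theta):
        total = 0.0
        for answered_correctly, difficulty in zip(responses, difficulties):
            p_correct = 1.0 / (1.0 + math.exp(-(theta - difficulty)))
            total += math.log(p_correct if answered_correctly else 1.0 - p_correct)
        return total
    grid = [g / 100.0 for g in range(-400, 401)]  # theta from -4.0 to 4.0
    return max(grid, key=log_likelihood)

# 40 questions with difficulties spread from easy (-2) to hard (+2).
difficulties = [-2 + 4 * i / 39 for i in range(40)]
# A candidate who gets the easier half right and the harder half wrong.
responses = [1 if d < 0 else 0 for d in difficulties]

baseline = estimate_ability(responses, difficulties)
flipped = list(responses)
flipped[0] = 0  # pretend one bad question miscounts an easy item against them
shift = estimate_ability(flipped, difficulties) - baseline
print(f"ability estimate: {baseline:.2f}, shift from one flipped answer: {shift:.2f}")
```

The exact numbers don't matter; the point is just that with many responses, no single one dominates.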

That’s not to say our scoring is flawless or anything - only that it’s pretty hard for any single question to do meaningful damage even if it’s really really bad.

2 Likes

Hi Geoffrey –

Great questions about teaching and learning, and about wanting to understand the “black box” of our testing practice. I completely agree that effective teaching and learning – with which testing and evaluation are tightly integrated – are best achieved with greater transparency.

Although we are not a learning platform, we do think very carefully about how people learn engineering concepts and about the knowledge, skills, and abilities required of various engineers. In our test development process, we do our best to map those concepts to test blueprints that we can use for further question specification and writing.

Since our assessments are not tied to any specific curriculum, set of learning standards, or educational program, our process for test development generally goes as follows:

  • Identify the general engineering domain or subdomain we want to produce scores for and make inferences about
  • Survey various educational programs/curricula and job-training programs to operationalize the domain into discrete skills appropriate for an assessment
  • Work with subject-matter experts to help refine those skills even further
  • Have our subject-matter experts distill those topics into questions designed to elicit responses that reflect specific behaviors tied to our test blueprint and, ultimately, to the knowledge, skills, and abilities required of engineers.

With all that said about the general process, we agree that more transparency is super important and something we are working to improve. We recently published a blog post that provides more insight into the mechanics of our testing program, and you can expect to see more information about our development process as we continue unpacking that black box.

1 Like

  1. A mix of reasons. One, we don’t want to make our question content too easy to pull in bulk. Two, it’s usually counterproductive to fixate too much on individual questions, especially since some answers count for much more toward a final score than others (e.g. a wrong answer to a very easy question hurts you more than a wrong answer to a hard one). Three, providing detailed solutions for all 1000+ questions in our bank would take a prohibitive amount of time and energy - we’d rather put that energy into developing more content so people can highlight more of their skills.

  2. There’s some manual review to make sure the automated processes work properly, but the degree of human review goes down over time. As far as trust goes - well, that’s why we’re doing an AMA! The automated interview is as accurate as our other assessments (more so, actually, since it’s longer and uses some of our better questions), so insofar as you trust our quiz to measure accurately, you should trust the automated interview to do so, too. See our deep dive blog post (linked in the OP) or the other answers here for more on how we make sure our assessments are accurate.

  3. We don’t expect every engineer to know every language that appears. But for the most part, we’re not testing language-specific knowledge with these questions (and when we are, it’s usually marked). Instead, we’re using common syntax that we expect is understandable even to engineers without detailed familiarity with the language in question - the snippet after this list is the kind of thing we mean. Among other things, the ability to pattern-match unfamiliar code is a good proxy for familiarity with common patterns, and these questions are mostly intended to test general comfort with programming logic.
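
As a made-up illustration of what we mean by "common syntax" (this is not a real quiz question), most working engineers can follow logic like the following even if they've never written a line of the language it happens to be in:

```python
# Not a real quiz question - just the kind of generic loop-and-conditional
# logic we expect to be readable across languages.
def count_positive(values):
    count = 0
    for value in values:
        if value > 0:
            count = count + 1
    return count

print(count_positive([3, -1, 4, 0, -5, 9]))  # prints 3
```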

2 Likes

I can answer #1:

I assume you’re specifically interested in how we process and store data related to the quiz. There are two main issues we face:

  1. We need to support many simultaneous quiz takers and be able to quickly retrieve their questions/answers. A transactional database (Postgres) plus caching works fine for this.

  2. The bigger challenge is that we need to store quiz answers for all candidates, as well as their results at multiple points in time. As candidates take more assessments, our estimate of their skills shifts. However, for ML reasons, we often want to know what that estimate was at a particular point in time (say, when they received a call request from a company). Since storage is relatively cheap, the solution here is to write very verbose records to a data warehouse. We use AWS Redshift for the DB and Kinesis to stream data - roughly along the lines of the sketch below.
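
To make the "verbose point-in-time records" idea concrete, here's roughly what writing one such snapshot could look like (a hedged sketch: the stream name, record fields, and helper are made up for illustration; only the boto3 Kinesis put_record call itself is a real API).

```python
# Illustrative sketch only: field names, stream name, and this helper are
# hypothetical; the real part is the boto3 Kinesis put_record call.
import json
from datetime import datetime, timezone

import boto3

kinesis = boto3.client("kinesis")

def emit_skill_snapshot(candidate_id, skill_scores, trigger_event):
    """Write the candidate's current skill estimates as an immutable event so
    later analysis can reconstruct what we believed at this exact moment."""
    record = {
        "candidate_id": candidate_id,
        "skill_scores": skill_scores,      # e.g. {"algorithms": 3.2, "back_end": 4.1}
        "trigger_event": trigger_event,    # e.g. "quiz_completed" or "call_request_received"
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    kinesis.put_record(
        StreamName="candidate-skill-snapshots",   # hypothetical stream name
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=str(candidate_id),
    )
```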

Really, the amount of quiz data pales in comparison to other parts of the app (e.g. clickstream data), which also lives in Redshift. We use Segment as a nifty platform for connecting many disparate data sources and receivers.

1 Like