I can answer #2 - there's no realistic way for a single question to break our model entirely, even if it's a truly awful question.
Questions get validated against downstream outcomes, so to carry any weight to speak of, a question would need to predict something like interview performance. The worst case here is a question that is predictive (it tells us something about your likelihood of passing interviews) but doesn't actually evaluate engineering skill.
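To sketch what that validation looks like, here's a minimal toy version - the names, thresholds, and choice of statistical test are all my own assumptions for illustration, not our actual pipeline. The gist is that a question only earns weight if its scores actually correlate with downstream interview outcomes:

```python
# Toy sketch of outcome-based question validation. All names and
# thresholds here are illustrative assumptions, not the real pipeline.
from scipy.stats import pointbiserialr

def question_weight(question_scores, passed_interview,
                    min_corr=0.1, alpha=0.05):
    """Give a question weight only if it predicts interview outcomes.

    question_scores: per-candidate scores on this one question
    passed_interview: 1/0 downstream outcome for the same candidates
    """
    r, p_value = pointbiserialr(passed_interview, question_scores)
    if p_value > alpha or r < min_corr:
        return 0.0  # no demonstrated predictive value -> no weight
    return r        # weight scales with predictive strength
```

The catch, as noted above, is that a filter like this only checks *whether* a question predicts outcomes, not *why* - which is exactly how a biased-but-predictive question could slip through.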
A classic example would be a college admissions process biased by, say, socioeconomic status - it's predictive, in the sense that wealthier students have better outcomes, but that doesn't mean they're smarter or 'better' students. This is a problem for predictive models in general, and a lot of ink has been spilled on how to avoid training AI in a way that reproduces our own human biases and inequalities. (In fact, I think our ML team has been reading some of that literature lately.) In any case, we have policies in place for our question development that should help us avoid this problem, at least in its more egregious forms.
Even if we did miss such a question, a single answer just isn’t weighted that strongly in our scoring. The most any one question could do is nudge scores up or down a point, and it usually wouldn’t even do that. Moreover, the further off our estimates are, the more strongly we’d expect other questions to push us back in the correct direction.
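To make those two properties concrete - bounded per-question influence, and corrections that grow with the size of the error - here's a toy update rule. This is purely illustrative and not our actual scoring model:

```python
# Toy illustration: each answer nudges the running estimate by at most
# one point, and the nudge scales with how far off the estimate is.
# This is a made-up update rule, not our actual model.
def update_estimate(estimate, implied_score, rate=0.5, max_nudge=1.0):
    error = implied_score - estimate
    nudge = max(-max_nudge, min(max_nudge, rate * error))
    return estimate + nudge

estimate = 5.0
for implied in [8.0, 7.5, 1.0, 8.5, 7.0]:  # one outlier answer among many
    estimate = update_estimate(estimate, implied)
    print(round(estimate, 2))
# Output: 6.0, 6.75, 5.75, 6.75, 6.88 - the outlier costs at most one
# point, and the very next answers pull the estimate right back up.
```

A single bad question is just one outlier term in that loop: the clamp caps its damage, and because later nudges scale with the remaining error, the other questions undo it quickly.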
That's not to say our scoring is flawless - only that it's pretty hard for any single question to do meaningful damage, even if it's really, really bad.