Caught, and will attempt to answer @MikeLandau's characteristically incisive questions ...
We use the "bad movie" and "bad feedback" flags to find movies that should be considered for removal. Recently (IIRC), one of our vessel experts went through the user comments on calibration movies as another approach and selected some movies for removal. We have not actually removed those movies yet, but plan to.
The extent to which the bar drops depends on whether your incorrect response was a false negative (missed stall) or false positive (incorrectly identified stall) as well as your recent history of false positives and false negatives. In general, a false negative will probably have a greater impact on your sensitivity bar than a false positive.
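Purely as an illustration of that asymmetry (the actual update rule is internal to the platform, and the penalty weights, bonus, and history window below are invented for the sketch), an update that penalizes false negatives more heavily and compounds repeated recent mistakes might look like:

```python
from collections import deque

# Hypothetical constants -- not Stall Catchers' actual values.
FN_PENALTY = 0.10   # missed stall (false negative): larger drop
FP_PENALTY = 0.05   # wrongly flagged stall (false positive): smaller drop
WINDOW = 20         # how many recent answers to consider

def update_sensitivity(bar, outcome, history):
    """outcome: 'correct', 'fn' (missed stall), or 'fp' (false alarm)."""
    history.append(outcome)
    if len(history) > WINDOW:
        history.popleft()
    if outcome == 'correct':
        return min(1.0, bar + 0.02)
    # Repeated recent mistakes of the same kind compound the drop.
    recent = sum(1 for o in history if o == outcome)
    penalty = (FN_PENALTY if outcome == 'fn' else FP_PENALTY) * (1 + 0.1 * recent)
    return max(0.0, bar - penalty)

# Starting from the same bar, a false negative drops it further
# than a false positive does.
print(update_sensitivity(0.8, 'fn', deque()))  # drops to 0.69
print(update_sensitivity(0.8, 'fp', deque()))  # drops only to 0.745
```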
I think this question, like the others, is apt! But the logic might be circular: for the expert to have a perfect bar, the expert would only need to agree with herself, whether right or wrong. The more crowd-generated data we obtain and compare to the laboratory-based annotations, the more we realize how subjective the answers can be, especially for very challenging and possibly indeterminate vessels. There are cases of crowd-expert disagreement we've examined closely where the expert was determined to be correct (by a group of scientists) based on aspects of the vessel movie that would be virtually impossible for a non-expert to consider. For example, when it is impossible to tell by looking at the designated vessel itself, the adjoining flow context can give clues: if one of the vessels that shares a junction with the designated vessel has one end flowing and one end stalled, then the designated vessel must be flowing, because the blood has to go somewhere. Observing this disagreement between the crowd and experts, and even among experts, leads to a few considerations:
1) we need to provide advanced annotation courses to volunteers who have achieved some level of expertise (perhaps as measured by sensitivity).
2) we might want to let our high-sensitivity catchers provide a second level of vetting for the tricky vessels.
3) we need to reconsider how the crowd-generated data is being used in the lab - not as a final answer on flowing/stalled, but as a way to greatly reduce the set of vessels that need to be examined by experts.
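The junction-based inference mentioned above (a stalled neighbor forcing the designated vessel to be flowing, since the blood has to go somewhere) can be sketched as a small conservation check. Everything here is a hypothetical illustration, not code from the project:

```python
# Vessel states: 'flowing', 'stalled', or None (indeterminate).

def infer_designated(designated_state, neighbor_states):
    """Infer the designated vessel's state from the other vessels
    sharing its junction, using conservation of flow.

    neighbor_states: states of the other vessels at the junction.
    """
    if designated_state is not None:
        return designated_state  # already determined from the movie itself
    flowing = sum(1 for s in neighbor_states if s == 'flowing')
    stalled = sum(1 for s in neighbor_states if s == 'stalled')
    # Exactly one neighbor carries flow into the junction while every
    # other neighbor is stalled: the only remaining path for the blood
    # is the designated vessel, so it must be flowing.
    if flowing == 1 and stalled == len(neighbor_states) - 1:
        return 'flowing'
    return None  # still indeterminate

# One neighbor flowing, the other stalled -> the designated vessel
# must carry the flow.
print(infer_designated(None, ['flowing', 'stalled']))  # → 'flowing'
```

This is the kind of contextual reasoning an expert applies implicitly; a non-expert looking only at the designated vessel would have no way to reach it.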
We have certainly considered creating a sensitivity-based leaderboard (or something like that), and will take your comment as further support for that idea!