What machine learning methods would you use under the following conditions?
- Binary classification problem, supervised learning
- One or more continuous features, such as term frequencies
- Label A will most likely have a score between two values. Any lower may suggest label B, but any higher may suggest label B as well
- The optimum range for A is not known but could be learned from the training data
It doesn’t seem like logistic regression would work here because it models cases where every increase in a feature’s score makes the likelihood of category A or B greater, and it finds the features that best fit this pattern. What approach would you considering for this type of problem, and what factors would be most important to your decision? Are there domains where this kind of problem is often worked on? (Maybe authorship attribution in cases where the categories are “same author” and “some other author”?) Thank you for any thoughts you might have.