How many things have happened since my last post! To name a few:
- I’ve learned (mostly by accident) about a new programming language with an exotic name and many strengths: Ceylon. More about it in later posts.
- I’ve started two new courses: Functional Programming Principles in Scala and Computing for Data Analysis. I’ll dedicate two future posts to these courses.
- I’ve finished Building Machine Learning Systems with Python. Yay!
And, in fact, this post is the continuation of my review of this book. So, without further ado, I’ll continue with …
Chapter 5: classification – detecting poor answers
This chapter is a rehash of Chapter 2, but in a much more complex context: classifying answers on Q/A websites (such as Stack Overflow) as good/bad, useful/not useful, etc. The techniques this chapter describes rely, as expected, on first measuring the usefulness of an answer (not an easy problem). They then apply two algorithms (nearest neighbour and logistic regression) to train the classifier and improve its performance. More importantly, the chapter introduces the reader to debugging ML systems, an activity that requires fine-tuning and ad-hoc tweaking.
Not a bad chapter, but the problem I see is that it does not achieve its initial goal: the classifier satisfies the criteria only partially.
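To give a feel for the two algorithms mentioned above, here is a minimal pure-Python sketch of a nearest-neighbour classifier and a logistic regression classifier. It is not the book's code: the two features (number of links and word count in hundreds), the toy data, and all function names are my own assumptions, chosen only to illustrate the idea of labelling an answer as good (1) or poor (0).

```python
import math

# Hypothetical toy data, invented for illustration (not from the book).
# Each answer is ((link_count, word_count / 100), label); 1 = good, 0 = poor.
TRAIN = [
    ((0.0, 0.2), 0),
    ((0.0, 0.5), 0),
    ((1.0, 0.3), 0),
    ((2.0, 1.5), 1),
    ((3.0, 2.0), 1),
    ((1.0, 2.5), 1),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def knn_predict(train, x, k=3):
    """Nearest neighbour: majority vote among the k closest training answers."""
    nearest = sorted(train, key=lambda item: squared_distance(item[0], x))[:k]
    votes_for_good = sum(label for _, label in nearest)
    return 1 if 2 * votes_for_good > k else 0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(train, learning_rate=0.1, epochs=2000):
    """Logistic regression fitted with plain stochastic gradient descent."""
    weights = [0.0] * len(train[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in train:
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the log loss w.r.t. the linear score
            weights = [w - learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def logistic_predict(model, x):
    """Label an answer as good (1) if its predicted probability is >= 0.5."""
    weights, bias = model
    p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
    return 1 if p >= 0.5 else 0


if __name__ == "__main__":
    model = fit_logistic(TRAIN)
    for answer in [(3.0, 2.2), (0.0, 0.3)]:
        print(answer, knn_predict(TRAIN, answer), logistic_predict(model, answer))
```

On real Q/A data the hard part is exactly what the chapter points out: choosing features that capture usefulness, and then tweaking `k`, the learning rate, and the feature set when the classifier underperforms. In practice one would probably reach for a library such as scikit-learn rather than hand-rolled code like this.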