Hate Speech Detection using Machine Learning

Using advances in Natural Language Processing to detect online abuse

What is the problem?

The rise of the internet has brought everyone, from everywhere, onto the same platform. Many internet forums let people interact behind a mask of anonymity, which gives an impetus to the growth of online hate speech.

How do we know what is abuse and what is normal speech?

If we had a dataset of labelled examples, we could take a supervised approach and classify each sample as hate speech or normal. The catch, as we will see, is the data itself.

Enter Semantic Vectors

Word embeddings are, at their core, vectors that carry the meanings of words (at least the words already seen in the training set). With them, we can develop a notion of what is hateful and what is not.
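As a rough illustration of how word vectors carry meaning, here is a toy sketch. The 3-dimensional vectors and the word choices below are invented purely for illustration; real embeddings (word2vec, GloVe, or those inside BERT) have hundreds of dimensions and are learned from large corpora.

```python
import numpy as np

# Toy 3-dimensional "embeddings" -- values invented for illustration.
embeddings = {
    "idiot": np.array([0.9, 0.8, 0.1]),
    "moron": np.array([0.85, 0.75, 0.15]),
    "hello": np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; closer to 1 = more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_abusive = cosine_similarity(embeddings["idiot"], embeddings["moron"])
sim_mixed = cosine_similarity(embeddings["idiot"], embeddings["hello"])

# Words with related meanings sit close together in the vector space,
# which is what lets a classifier generalise beyond exact keyword matches.
print(sim_abusive > sim_mixed)  # True
```

This geometric closeness is what lets a model flag an insult it has never seen verbatim, as long as similar words appeared in training.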

The Model

I followed this wonderful tutorial by “author citation” to use the pre-trained BERT network. A pre-trained network is simply a model that has already been trained on some previous task. When we “transfer” such pre-trained models/weights to our own problem, we want the original task to have been similar enough to the problem at hand.

So, for a quick one-day test, I picked up a hate speech dataset and trained the final layer of this BERT model to predict the labels.
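In code, “training only the final layer” means freezing the pre-trained weights and updating just a small classification head on top. The sketch below mimics that setup in plain NumPy: a frozen random projection stands in for the BERT encoder, and the dataset, dimensions, and learning rate are all invented for illustration. (The real version would load BERT via a deep learning library; this is only a minimal sketch of the freeze-and-fine-tune pattern.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pre-trained encoder: in the real setup this would
# be BERT producing a fixed-size embedding per sentence. Here a frozen
# random projection plays that role (illustrative only).
W_frozen = 0.1 * rng.normal(size=(16, 8))  # never updated during training

def encode(x):
    """'Pre-trained' feature extractor; its weights stay fixed."""
    return np.tanh(x @ W_frozen)

# Tiny synthetic dataset: 100 samples with binary labels
# (think: 1 = hate speech, 0 = normal), invented for illustration.
X = rng.normal(size=(100, 16))
features = encode(X)
y = (features[:, 0] > 0).astype(float)  # a pattern the features can express

# Only the final classification layer (a logistic regression) is trained.
w = np.zeros(8)
b = 0.0
learning_rate = 0.5
for _ in range(300):
    logits = features @ w + b
    p = 1.0 / (1.0 + np.exp(-logits))   # sigmoid
    grad = p - y                        # gradient of log loss w.r.t. logits
    w -= learning_rate * (features.T @ grad) / len(y)
    b -= learning_rate * grad.mean()

predictions = (features @ w + b) > 0
accuracy = (predictions == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

The design choice is the key point: because the encoder is frozen, only the small head (8 weights plus a bias here) is learned, which is why even a one-day experiment on a modest dataset is feasible.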

Problems I observed

The dataset itself was limited in the kinds of abuse it covered: most of the data labelled as offensive was sexual in nature, or sometimes explicitly racist. The model could not pick up on non-obvious cases of abuse, or on trolling. So this remains a work in progress.

Link to my code: github_link