Using advances in Natural Language Processing to detect online abuse
The onset of the internet has brought everyone, from everywhere, onto the same platform. Many internet forums allow the mask of anonymity while interacting with someone, which gives an impetus to the growth of online hate speech.
If we had a dataset of labelled examples, we could take a supervised approach and classify each sample as hate speech or normal. The problem is the data: good labelled examples of abuse are hard to come by.
With the use of word embeddings, which are basically vectors that carry the meaning of a word (for words already seen in the training set), we can develop an idea of what is hateful and what is not.
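As a rough sketch of the idea (not part of my experiment), here is how pre-trained word vectors place words with similar connotations near each other. This assumes GloVe vectors loaded through the gensim library; neither is what the post itself used.

```python
# Minimal sketch: pre-trained GloVe embeddings via gensim, and
# nearest neighbours in vector space.
import gensim.downloader as api

# Downloads ~128 MB of 100-dimensional GloVe vectors on first run.
vectors = api.load("glove-wiki-gigaword-100")

# Words close to "hateful" in embedding space share its connotation,
# which is the kind of signal a classifier can build on.
for word, score in vectors.most_similar("hateful", topn=5):
    print(f"{word}: {score:.2f}")
```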
I made use of this wonderful tutorial by “author citation” to work with a pre-trained BERT network. A pre-trained network basically means the model has already been trained on some earlier task. When we try to “transfer” these pre-trained models/weights to our own problem, we hope that the earlier task was similar enough to the one at hand.
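For concreteness, here is a sketch of loading a pre-trained BERT with the Hugging Face `transformers` library. The library choice and model name are my assumptions for illustration; the tutorial's exact setup may differ.

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# num_labels=2: hate speech vs. normal. Only the encoder weights are
# pre-trained; the classification head on top starts randomly initialised.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

inputs = tokenizer("an example comment", return_tensors="pt",
                   truncation=True, padding=True)
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 2); head is untrained so far
```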
So, as a quick one-day test, I picked up a hate speech dataset and trained the final layer of this BERT model to predict the labels.
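A sketch of what "training only the final layer" can look like: freeze the BERT encoder and update just the classification head. The `train_loader` below is a hypothetical DataLoader yielding (texts, labels) batches; dataset loading is left out.

```python
import torch
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Freeze the pre-trained encoder so gradients flow only into the head.
for param in model.bert.parameters():
    param.requires_grad = False

optimizer = AdamW(model.classifier.parameters(), lr=1e-3)

model.train()
for texts, labels in train_loader:  # hypothetical DataLoader
    enc = tokenizer(list(texts), return_tensors="pt",
                    truncation=True, padding=True)
    out = model(**enc, labels=labels)  # cross-entropy computed internally
    optimizer.zero_grad()
    out.loss.backward()
    optimizer.step()
```

Freezing the encoder keeps the run cheap enough for a one-day test, at the cost of some accuracy compared with fine-tuning the whole network.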
The dataset itself was limited in the kinds of abuse it covered. Most of the data labelled as offensive was sexual in nature, or sometimes explicitly racist. The model could not pick up on non-obvious cases of abuse, or on trolling. So this remains a work in progress.
Link to my code: github_link