How to evaluate content quality with BERT
I particularly enjoyed the conversation about BERT and its potential for content quality evaluation.
“M 26:40 – .. Is Google using BERT now to better understand whether content is good?”
“J 27:00 – … It’s not so much to understand the quality of the content but more to understand what is this content about, what is this sentence about, what is this query about …”
Google has repeatedly said that BERT helps it understand natural language better. Assessing content quality the way humans do remains fairly complicated for machines.
“M 28:54 – … could Google treat that as a negative to say ‘oh this page looks like it was SEO-ed, these keywords are here for Google and make that an actual detriment to the page”
“J 29:41 – … they’re just adding thousands of variations of the same keywords to a page and then our keyword stuffing algorithm might kick in and say well actually this looks like keyword stuffing …”
On the other hand, keyword stuffing is something that is easier for machines to spot. One way to check is to see whether the text reads nonsensically.
“J 29:41 – … But I guess with regards to BERT one of the things that that could be done because a lot of these algorithms are open-sourced, there’s a lot of documentation and reference material around them, is to try things out and to take some of this SEO text and throw it into one of these algorithms and see does the primary content get pulled out, are the entities able to be recognized properly and it’s not one to one the same to how we would do it because I’m pretty sure our algorithms are based on similar ideas but probably tuned differently but it can give you some insight into is this written in such a way that it’s actually too confusing for a system to understand what it is that they’re writing about.”
This is the part that got me excited. Trying this out is a great idea and precisely what we will do in this article.
Britney Muller from Moz shared a really good idea and a Python notebook with the code to test it.
We can use BERT fine-tuned on The Corpus of Linguistic Acceptability (CoLA) dataset for single-sentence classification.
This model can help us determine which sentences are grammatically correct and which aren’t. It could be used as one of several proxies for content quality.
It is obviously not foolproof, but it can point us in the right direction.
Fine-tuning BERT on CoLA
Here are the technical steps:
- Fetch a target page and extract the text.
- Split it into sentences.
- Use our model to predict whether each sentence is grammatically correct or not.
- Calculate and report the number of grammatically correct and incorrect sentences.
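The steps above can be sketched end to end with the standard library alone. Everything here is illustrative: `TextExtractor`, `page_to_sentences`, and `is_grammatical` are hypothetical names of my own, the HTML is hardcoded rather than fetched from a live page, and the stub classifier stands in for the CoLA-fine-tuned model we build below.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping script and style blocks."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def page_to_sentences(html):
    """Extract visible text and split it into rough sentences."""
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(parser.chunks)
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def is_grammatical(sentence):
    """Stub standing in for the CoLA-fine-tuned BERT classifier."""
    return True

html = "<html><body><script>var x = 1;</script><p>This is fine. It reads well.</p></body></html>"
sentences = page_to_sentences(html)
report = {s: is_grammatical(s) for s in sentences}
print(f"{sum(report.values())}/{len(sentences)} sentences look grammatical")
```

In the real pipeline, the stub is replaced by the trained model and the HTML comes from fetching the target page.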
First, let’s build our predictive model.
Copy the notebook to your Google Drive and change the runtime type to GPU.
You should be able to run all the cells (one at a time) and see the evaluation in action.
Building the predictive model
- Download and uncompress the CoLA dataset
- Create the Ludwig model definition with the appropriate settings and hyperparameters
- Run Ludwig to train the model
- Evaluate the model with held back data in the CoLA dataset
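For reference, the model definition takes roughly this shape. The exact keys and checkpoint paths are assumptions based on Ludwig's documented BERT text encoder, so verify them against the Ludwig documentation before training:

```yaml
input_features:
  - name: sentence
    type: text
    encoder: bert
    # paths to a downloaded pre-trained BERT checkpoint (assumed layout)
    config_path: uncased_L-12_H-768_A-12/bert_config.json
    checkpoint_path: uncased_L-12_H-768_A-12/bert_model.ckpt

output_features:
  - name: label
    type: binary

training:
  batch_size: 32
  learning_rate: 0.00002
  epochs: 2
```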
You should be able to follow each of these steps in the notebook. I will explain my choices here and some of the nuances needed to make it work.
Another important step: you need to install the GPU version of TensorFlow so that training finishes quickly.
We accomplish this with the next few lines of code:
!pip install tensorflow-gpu==1.15.3
import tensorflow as tf; print(tf.__version__)
After this, you need to restart the runtime using the menu item: Runtime > Restart runtime.
Run the form again and the line that imports pandas, then continue to the step where you install Ludwig.
As you can see in the visualization above, the best combination results in a validation accuracy of 84%.
We added the same parameters to our model definition under the training section.
Next, we can train our BERT model on the CoLA dataset using a single command line.
!ludwig experiment --data_csv cola_dataset.csv --model_definition_file model_definition.yaml
We achieve a validation accuracy of 80%, slightly lower than the original notebook, but with significantly less effort!
Now, we have a powerful model that can classify sentences as grammatically correct or not.
I added extra code to the notebook to evaluate some test sentences; it flagged 92 out of 516 as grammatically incorrect.
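For context, those counts work out to roughly an 18% flag rate:

```python
# counts reported by the evaluation cell in the notebook
total_sentences = 516
flagged_incorrect = 92

rate = flagged_incorrect / total_sentences
print(f"{flagged_incorrect}/{total_sentences} sentences flagged ({rate:.1%})")
# prints "92/516 sentences flagged (17.8%)"
```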
As you can see above, the predictions on the grammatically incorrect sentences look pretty accurate.
Converting web pages to sentences to predict their grammatical correctness
Splitting text into sentences with regular expressions seems trivial, but language nuances such as abbreviations and decimal numbers make that approach impractical.
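As a quick illustration (the sample text and pattern here are mine, not from the notebook), a naive punctuation-based split mangles common abbreviations:

```python
import re

text = "Dr. Smith arrived at 3 p.m. on Tuesday. The talk went well."
# Naive rule: split on whitespace that follows sentence-ending punctuation
naive = re.split(r"(?<=[.!?])\s+", text)
print(naive)
# Four fragments come back where there are only two sentences:
# ['Dr.', 'Smith arrived at 3 p.m.', 'on Tuesday.', 'The talk went well.']
```

A proper sentence tokenizer handles these cases instead of splitting blindly on punctuation.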
As the notebook shows, the technique works quite well. Now, we just need to feed these sentences to our grammar correctness predictive model.
- ^ podcast interview with John Mueller (www.mariehaynes.com)
- ^ https://t.co/txUud2be3H (t.co)
- ^ June 4, 2020 (twitter.com)
- ^ The Corpus of Linguistic Acceptability (CoLA) dataset (nyu-mll.github.io)
- ^ Colab notebook (colab.research.google.com)
- ^ Ludwig (ludwig-ai.github.io)
- ^ all the steps (colab.research.google.com)
- ^ CSS selector (www.w3schools.com)
- ^ original notebook (colab.research.google.com)
- ^ that I created (colab.research.google.com)
- ^ Ludwig (ludwig-ai.github.io)
- ^ Google Colab (colab.research.google.com)
- ^ hyper parameters (machinelearningmastery.com)
- ^ weights and biases notebook (app.wandb.ai)
- ^ StackOverflow thread (stackoverflow.com)
- ^ article (searchengineland.com)
- ^ Twitter (twitter.com)
- ^ here (searchengineland.com)
- ^ RankSense (www.ranksense.com)