Experiments

Version 1

Solution Statement

In this experiment, we break snippet extraction into two components:

  1. Identifying sentences and complete phrases from within the text of a review.
  2. Scoring a sentence or phrase according to its "shareability".

The importance of the first step lies in the unstructured writing style of many reviewers. A reviewer could be anyone, and it is not reasonable to expect perfect grammar or sentence structure from an arbitrary person. Therefore, relying on punctuation to demarcate complete thoughts is unsound; we need to be able to parse potentially long sentences in order to uncover complete thoughts contained within. This is not even strictly a problem of poor writing. Often, highly shareable, quotable phrases are not themselves full sentences. The phrase "Mark and John did a great job" could easily come at the end of a longer sentence, but be the only part worth highlighting in a social sharing context.

Given the output of the first model, which is a set of phrases that are considered grammatically shareable, we move to the question of whether the phrase is meaningful to share. Of course, something can be grammatically correct, such as "I had a problem with my gutters", without being something to highlight in an Instagram post.

In the absence of any source of grounded objective truth for the concept of "shareability", our solution involved creating a synthetic dataset in its place. We used that dataset to score phrases via similarity.

Data

Because there was no labeled data for either model, the datasets for both had to be generated.

For the first model, which identified grammatically complete thoughts, we used the following data generation process:

  1. Gather the text of reviews that meet a minimum length requirement.
  2. Use period punctuation to separate the text into sentences. While slightly noisy, because of the aforementioned unreliability of arbitrary reviewers' grammar, we felt this had a high enough signal to serve as ground truth for "complete thoughts".
  3. Give the sentences positive labels, as these are considered grammatically complete thoughts.
  4. For each sentence, create a "non-sentence" by either:
    • Removing words from the end of the sentence.
    • Removing words from the beginning of the sentence.
    • Adding words from the end of the previous sentence to the beginning of this sentence.
    • Adding words from the beginning of the next sentence to the end of this sentence.
  5. Give the non-sentences negative labels, as these are not grammatically complete thoughts.

The result is a dataset that can be used to train a model on the binary classification task of taking in a span of text and classifying it as a complete thought. Because this dataset could be made essentially arbitrarily large, we chose 80,000 training samples, 20,000 validation samples, and 20,000 testing samples.
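As a rough illustration, the sketch below shows how the negative examples could be generated from period-split sentences. The function names, word counts, and minimum-length threshold are placeholders for illustration, not the exact values used in the experiment.

    import random

    def make_non_sentence(sentences, i):
        """Create a negative example from sentences[i] with one random perturbation."""
        words = sentences[i].split()
        strategy = random.choice(["truncate_end", "truncate_start", "prepend_prev", "append_next"])
        k = random.randint(1, max(1, len(words) // 2))  # how many words to drop or borrow

        if strategy == "truncate_end" and len(words) > k:
            return " ".join(words[:-k])
        if strategy == "truncate_start" and len(words) > k:
            return " ".join(words[k:])
        if strategy == "prepend_prev" and i > 0:
            return " ".join(sentences[i - 1].split()[-k:] + words)
        if strategy == "append_next" and i + 1 < len(sentences):
            return " ".join(words + sentences[i + 1].split()[:k])
        # Fall back to dropping the last word when the chosen strategy does not apply.
        return " ".join(words[:-1]) if len(words) > 1 else sentences[i]

    def build_pairs(review_text, min_length=20):
        """Produce (text, label) pairs: 1 = complete thought, 0 = non-sentence."""
        sentences = [s.strip() for s in review_text.split(".") if len(s.strip()) >= min_length]
        pairs = []
        for i, sentence in enumerate(sentences):
            pairs.append((sentence, 1))
            pairs.append((make_non_sentence(sentences, i), 0))
        return pairs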

For the second model, as mentioned, we needed a source of ground truth for the very subjective label of "shareability" of a text. For this, we hand-crafted a set of "anchor texts", which were short excerpts from reviews which scored highly on a varied set of positive topics. The idea was that reviews which clearly emphasize positive topics likely contain snippets of text that have social sharing value, so they could serve as sufficient ground truth until better data could be collected from clients using the service.

Models

Sentence classification used the bert-base-uncased architecture from Hugging Face. It was fine-tuned on the sentence dataset for only a single epoch, due to the size of the dataset.
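A minimal fine-tuning sketch, assuming the Hugging Face Transformers Trainer API and assuming train_pairs and validation_pairs hold the generated (text, label) samples from the step above; the hyperparameters shown are placeholders rather than the settings actually used.

    from datasets import Dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    def to_dataset(pairs):
        # pairs: list of (text, label) tuples from the data generation step.
        ds = Dataset.from_dict({"text": [t for t, _ in pairs], "label": [l for _, l in pairs]})
        return ds.map(lambda b: tokenizer(b["text"], truncation=True, padding="max_length", max_length=128),
                      batched=True)

    args = TrainingArguments(
        output_dir="sentence-classifier",
        num_train_epochs=1,  # a single epoch, given the size of the dataset
        per_device_train_batch_size=32,
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=to_dataset(train_pairs),      # e.g. 80,000 generated samples
        eval_dataset=to_dataset(validation_pairs),  # e.g. 20,000 generated samples
    )
    trainer.train()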

Shareability scoring used a Sentence Transformer to embed the resulting candidate spans alongside the "anchor" spans. The embeddings were compared using cosine similarity to arrive at a score for each candidate span.
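A sketch of this scoring step using the sentence-transformers library; the specific checkpoint name is an assumption, since the report does not name the embedding model used.

    from sentence_transformers import SentenceTransformer, util

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative checkpoint

    def score_candidates(candidates, anchors):
        """Score each candidate span by its maximum cosine similarity to any anchor text."""
        candidate_embeddings = embedder.encode(candidates, convert_to_tensor=True)
        anchor_embeddings = embedder.encode(anchors, convert_to_tensor=True)
        similarities = util.cos_sim(candidate_embeddings, anchor_embeddings)  # (num_candidates, num_anchors)
        return similarities.max(dim=1).values.tolist()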

Serving Logic

The serving logic has eight steps:

  1. Use stanza to split each review text into sentences.
  2. If none of the resulting spans meet the minimum length requirement, short-circuit and return no snippets.
  3. Apply the sentence model to all spans, returning a binary label for each span; filter the spans on this label.
  4. Apply the shareability model to all remaining spans, obtaining an embedding for each.
  5. Apply the shareability model to each anchor snippet, obtaining an embedding for each.
  6. Calculate the cosine similarity matrix between the candidate spans and anchor snippets.
  7. Take the max cosine similarity over anchor snippets for each candidate span to obtain its score.
  8. Sort the candidate spans by the score.

These steps combine the first and second models to convert the text of a review into a list of scored spans.
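A hedged end-to-end sketch of these steps, reusing the fine-tuned classifier and the score_candidates helper sketched above; MIN_LENGTH, the anchor list, and the "LABEL_1" label convention are assumptions made for illustration.

    import stanza
    from transformers import pipeline

    MIN_LENGTH = 40  # placeholder minimum span length, in characters
    ANCHOR_SNIPPETS = ["Mark and John did a great job"]  # hand-crafted anchor texts

    nlp = stanza.Pipeline(lang="en", processors="tokenize")
    classifier = pipeline("text-classification", model="sentence-classifier")

    def extract_snippets(review_text):
        # 1. Split the review text into candidate spans with stanza.
        spans = [s.text for s in nlp(review_text).sentences]

        # 2. Short-circuit if no span meets the minimum length requirement.
        spans = [s for s in spans if len(s) >= MIN_LENGTH]
        if not spans:
            return []

        # 3. Keep only spans the sentence model labels as complete thoughts.
        predictions = classifier(spans)
        spans = [s for s, p in zip(spans, predictions) if p["label"] == "LABEL_1"]
        if not spans:
            return []

        # 4-7. Embed spans and anchors, and take each span's max similarity to any anchor as its score.
        scores = score_candidates(spans, ANCHOR_SNIPPETS)

        # 8. Sort the candidate spans by score, highest first.
        return sorted(zip(spans, scores), key=lambda pair: pair[1], reverse=True)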

Potential Future Directions

The most fruitful direction for snippet development is to make heavy use of the feedback data from clients. As more feedback data is collected, we can begin to leverage:

  • Selected snippets, to form a source of ground truth for "shareability", replacing the anchor concept.
  • Edits by users, to form a source of ground truth for sentence correctness, potentially augmenting the existing dataset.
  • Social media platforms on which snippets were shared, in order to group the feedback data and potentially identify any differences in optimal sharing language between platforms.
  • The engagement generated by the social media post associated with a given snippet, as a measure of the effect of the snippet on customers.

All of these can lead to better datasets and, in turn, better models and a better service.