Experiments
Version 1
Solution Statement
To assign topic probabilities to review content, we use a zero-shot classifier built on an ANLI-trained NLI model. Each topic under consideration is manually assigned a query phrase, and the pre-trained model is employed to extract an entailment score for the review against each topic query. The post-processed entailment score is a measure of how likely the review content is to be related to the topic query. Scores are then passed back to the client for display.
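The following minimal sketch illustrates this scheme using the Hugging Face zero-shot classification pipeline. The checkpoint name, topics, and hypothesis template are illustrative assumptions, not the production configuration:

```python
# Minimal sketch of zero-shot topic scoring via NLI entailment.
# Model name, topics, and template are assumptions for illustration.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli")

review = "Check-in was slow, but the room was spotless and the staff were friendly."
topics = ["cleanliness", "staff friendliness", "wait times"]  # hypothetical topics

# multi_label=True scores each topic independently (entailment vs. contradiction),
# so one review can relate to several topics at once.
result = classifier(review, candidate_labels=topics,
                    hypothesis_template="This review is about {}.",
                    multi_label=True)
print(dict(zip(result["labels"], result["scores"])))
```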
Data
As the model employed is a zero-shot classifier, no training data was used. The model was pre-trained on several datasets, which are described in the Models section.
Models
The pre-trained model selected is a Hugging Face implementation of the RoBERTa-Large NLI model. The model is pre-trained on a combination of popular NLI datasets, including SNLI, MNLI, FEVER-NLI, and ANLI (R1, R2, R3). For information on the model, please reference the original paper. For an understanding of adversarial NLI and how the model was trained, please reference the following resources:

* ANLIzing the Adversarial Natural Language Inference Dataset
* Adversarial NLI: A New Benchmark for Natural Language Understanding
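For reference, entailment scores can also be computed directly against the NLI head rather than through the pipeline. This is a hedged sketch: the checkpoint name is an assumed public RoBERTa-Large checkpoint trained on the datasets above, and the label-index mapping should be verified against the model config:

```python
# Sketch of scoring one review against one topic query via raw NLI entailment.
# The checkpoint name is an assumption; substitute the model actually deployed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli"  # assumed
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()

def entailment_score(premise: str, hypothesis: str) -> float:
    """Return the probability that `premise` entails `hypothesis`."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    # For this checkpoint the label order is (entailment, neutral, contradiction);
    # check model.config.id2label before relying on index 0.
    return probs[0].item()

score = entailment_score(
    "The food arrived cold and the waiter was rude.",
    "This review is about customer service.",  # hypothetical topic query phrase
)
```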
Serving Logic
When a request is made to the endpoint, the review content is processed as follows for each instance (a sketch of this flow follows the list):

1. If the review is longer than 5 sentences, it is broken up into multiple batches of no more than 5 sentences each. Experimentation revealed that the accuracy of zero-shot classification diminished significantly as the length of the review increased, so this countermeasure was employed.
2. If a topic list is provided via the Parameter schema, topics are refined to only include topics present in that list. If no topic list is provided, all topics are used.
3. The review content is tokenized and encoded using the RoBERTa tokenizer.
4. The review content is then passed through the pre-trained RoBERTa model to generate the entailment scores for each topic.
    * If the review rating is 4 or 5, only entailment scores for positive topics are returned; scores for negative topics are set to -1.
    * If the review rating is 3 or fewer, only entailment scores for negative topics are returned; scores for positive topics are set to -1.
5. The entailment scores are post-processed to generate the topic probabilities for each review.
6. The topic probabilities are returned in the response.
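A hedged sketch of this flow, assuming the same checkpoint as above; the topic definitions, polarity mapping, and helper names are hypothetical:

```python
# Sketch of the serving flow: chunk long reviews, filter topics, gate by rating,
# score via zero-shot classification, and return per-topic probabilities.
import re
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli")

# Hypothetical topic -> polarity mapping.
TOPICS = {"friendly staff": "positive", "clean rooms": "positive",
          "long wait times": "negative", "noisy rooms": "negative"}

MAX_SENTENCES = 5

def chunk_review(text: str, max_sentences: int = MAX_SENTENCES) -> list:
    """Step 1: split a review into batches of at most `max_sentences` sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [" ".join(sentences[i:i + max_sentences])
            for i in range(0, len(sentences), max_sentences)]

def score_review(text: str, rating: int, topic_filter=None) -> dict:
    # Step 2: restrict to the requested topics, if a list was provided.
    topics = {t: p for t, p in TOPICS.items()
              if topic_filter is None or t in topic_filter}
    # Step 4 gate: positive topics for 4-5 star reviews, negative otherwise.
    active_polarity = "positive" if rating >= 4 else "negative"
    scores = {t: -1.0 for t in topics}  # masked topics stay at -1
    active = [t for t, p in topics.items() if p == active_polarity]
    if not active:
        return scores
    # Steps 3-4: score each chunk, keeping the max entailment score per topic.
    for chunk in chunk_review(text):
        result = classifier(chunk, candidate_labels=active,
                            hypothesis_template="This review is about {}.",
                            multi_label=True)
        for label, score in zip(result["labels"], result["scores"]):
            scores[label] = max(scores[label], score)
    # Steps 5-6: post-processed probabilities are returned to the client.
    return scores
```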
Results
N/A
Potential Future Directions
Topic Modeling: Unsupervised topic modeling could be used to generate or validate topics for the reviews. Topics which better represent the content of the reviews may produce better results.
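As a hedged illustration, a standard LDA pass over raw review text could surface candidate topics to compare against the manual list; the reviews and parameters below are illustrative:

```python
# Sketch: unsupervised topic discovery with LDA to generate or sanity-check
# the manually chosen topic inventory. Inputs and topic count are assumptions.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

reviews = ["The staff were friendly and helpful.",
           "Waited an hour for a table; service was slow.",
           "Rooms were spotless and the bed was comfortable."]

vectorizer = CountVectorizer(stop_words="english", max_features=5000)
counts = vectorizer.fit_transform(reviews)

lda = LatentDirichletAllocation(n_components=3, random_state=0)  # assumed count
lda.fit(counts)

# Print the top words per discovered topic for comparison with the manual list.
terms = vectorizer.get_feature_names_out()
for i, weights in enumerate(lda.components_):
    top = [terms[j] for j in weights.argsort()[::-1][:5]]
    print(f"topic {i}: {', '.join(top)}")
```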
Supervised Classification: A supervised classification model could be trained (or a pre-trained model could be fine-tuned) on manually labeled data to generate topic probabilities for reviews. This would likely improve performance, but it has some immediate drawbacks: it could require a large amount of labeled data, and training could take significant time depending on methodology. Steps could be taken to mitigate these drawbacks, such as using an external labeling service.
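A hedged sketch of what fine-tuning could look like with the Hugging Face Trainer; the data file, label count, and hyperparameters are assumptions:

```python
# Sketch: fine-tuning an encoder on manually labeled (review, topic) examples.
# The CSV path, topic count, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "roberta-large"
NUM_TOPICS = 12  # assumed size of the topic inventory

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_TOPICS)

# Expects columns "text" (review content) and "label" (integer topic id).
dataset = load_dataset("csv", data_files={"train": "labeled_reviews.csv"})
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True,
                         padding="max_length", max_length=256),
    batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="topic-classifier",
                           num_train_epochs=3,
                           per_device_train_batch_size=16,
                           learning_rate=2e-5),
    train_dataset=dataset["train"],
)
trainer.train()
```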
Query Phrase Tuning: Query phrases are manually selected for each topic. These phrases could be altered to improve performance through manual tuning or through a search algorithm.
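A hedged sketch of a simple search: score each candidate phrasing for a topic against a small hand-labeled set and keep the best performer. The labeled examples and candidate phrases are hypothetical:

```python
# Sketch: grid search over candidate query phrases for one topic, selecting
# the phrase with the highest accuracy on a small labeled set.
from transformers import pipeline

classifier = pipeline("zero-shot-classification",
                      model="ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli")

# (review, is_about_topic) pairs for a hypothetical "wait times" topic.
labeled = [("We stood in line for forty minutes.", True),
           ("The pool area was beautiful.", False),
           ("Service took forever at breakfast.", True)]

candidates = ["This review is about wait times.",
              "This review mentions slow service.",
              "The customer had to wait a long time."]

def accuracy(phrase: str, threshold: float = 0.5) -> float:
    """Fraction of labeled reviews the phrase classifies correctly."""
    hits = 0
    for text, label in labeled:
        score = classifier(text, candidate_labels=[phrase],
                           multi_label=True)["scores"][0]
        hits += (score >= threshold) == label
    return hits / len(labeled)

best = max(candidates, key=accuracy)
print("best query phrase:", best)
```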