Experiments

Version 1

Solution Statement

Content completion is a challenge that falls within the natural language processing domain of machine learning. In Paystone's use case, a content completion model should enable a piece of content (text) to be automatically completed (extended) by a machine learning service in a manner that sounds "natural" and does not stray from the context of the original content. Because models like this are very expensive to build from scratch, and because content completion models are usually generic enough to apply to many different specific applications, many third-party and proprietary models already exist that can be used for this problem. The content completion service uses the open-source GPT-2 language model, as well as the newer and more capable proprietary GPT-3 model by OpenAI, to complete content.

Data

Since the models used in this service are third-party models, Paystone data was not used in the creation of this service.

Models

As mentioned above, this service makes use of two third-party models to complete content: the open-source GPT-2 and the proprietary GPT-3. Both were developed by OpenAI; GPT-2 is the older model and has been open-sourced by the company, whereas GPT-3 remains proprietary. The implementation of these two models in the service is based on a probabilistic method that determines which model should be called when the end user calls the service. The reason for this is a balance of performance against cost. GPT-3 performs much better than GPT-2, but because it is proprietary, each call to it costs more than a call to GPT-2, which Paystone can host on its own. By default, GPT-3 is called 90% of the time, keeping the average quality of the content generation high while still collecting generations from GPT-2 so that its relative performance can be measured and the model can be improved in the future, allowing it to be used more often and minimize costs. The probability of choosing GPT-3 can be modified by the end user.

For a technical understanding of how these models work, please use the following resources:

  * The Hugging Face model card for GPT-2
  * The OpenAI GPT-3 API introduction

Serving Logic

The serving logic is composed of the following main steps:

  1. Clean the prompt (input content)
  2. Determine the model to call
  3. Call the appropriate model to complete the prompt

For step one, the text prompt that was used to call the service is cleaned of escape characters.
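As a rough illustration, the cleaning step might look like the Python sketch below; the exact set of escape characters handled by the service is an assumption, and `clean_prompt` is a hypothetical helper name:

```python
def clean_prompt(prompt: str) -> str:
    # Replace common escape characters with spaces, then collapse any
    # repeated whitespace left behind. The character set is illustrative.
    for escape in ("\n", "\t", "\r"):
        prompt = prompt.replace(escape, " ")
    return " ".join(prompt.split())
```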

For step two, the appropriate model to be called is determined based on the defined probability input parameter (default 90%).
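A minimal sketch of the selection logic is shown below; the parameter name `gpt3_probability` is illustrative, standing in for whatever the service's real input parameter is called:

```python
import random

def choose_model(gpt3_probability: float = 0.9) -> str:
    # Route to GPT-3 with the given probability (default 90%);
    # otherwise fall back to the cheaper, self-hosted GPT-2.
    return "gpt-3" if random.random() < gpt3_probability else "gpt-2"
```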

For the final step, the appropriate model is called with the text prompt, as well as other user-defined parameters, as inputs. The content is completed by the model and returned to the user as a response.
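The sketch below shows one way this dispatch could work, assuming the GPT-2 path runs through the Hugging Face `transformers` text-generation pipeline and the GPT-3 path goes through OpenAI's (legacy) Completion endpoint; the engine name and parameter values are assumptions, not the service's actual configuration:

```python
import openai
from transformers import pipeline

# GPT-2 is open source, so it can be served locally.
gpt2_generator = pipeline("text-generation", model="gpt2")

def complete(prompt: str, model_name: str, max_tokens: int = 64) -> str:
    if model_name == "gpt-2":
        outputs = gpt2_generator(
            prompt, max_new_tokens=max_tokens, num_return_sequences=1
        )
        return outputs[0]["generated_text"]
    # GPT-3 is proprietary, so each call goes to OpenAI's paid API.
    response = openai.Completion.create(
        engine="davinci",
        prompt=prompt,
        max_tokens=max_tokens,
    )
    return response.choices[0].text
```

Tying the three steps together, a request would flow roughly as `complete(clean_prompt(raw_prompt), choose_model())`.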

Results

Since this service uses third-party models, there are no training metrics that can be used to evaluate them. Evaluation is performed through continuous usage and feedback from users.

Potential Future Directions

  1. Fine-tuning GPT-2: As mentioned, one of the goals of using GPT-2 is to allow for future improvement. This would be done through a process called fine-tuning, which essentially "fits" the model to our current application cases, hopefully boosting its performance and allowing us to lower GPT-3 usage costs (see the sketch after this list).

  2. Fine-tuning GPT-3: Fine-tuning is not unique to GPT-2 and can also be performed on GPT-3. However, instead of reducing costs, this would increase them. The benefit would be a performance boost, and it is worth exploring to improve the quality of completed content.
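As a sketch of what direction 1 might look like, the snippet below fine-tunes GPT-2 with the Hugging Face `Trainer` API; the training file path and hyperparameters are hypothetical, not an actual Paystone dataset or configuration:

```python
from transformers import (
    DataCollatorForLanguageModeling, GPT2LMHeadModel, GPT2Tokenizer,
    TextDataset, Trainer, TrainingArguments,
)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# A plain-text corpus of domain content; the file name is hypothetical.
train_dataset = TextDataset(
    tokenizer=tokenizer,
    file_path="domain_content.txt",
    block_size=128,
)
# mlm=False gives the causal (next-token) objective GPT-2 is trained with.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-finetuned", num_train_epochs=3),
    data_collator=data_collator,
    train_dataset=train_dataset,
)
trainer.train()
```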