Service Specification

Executive Summary

The content completion service is a machine learning service that automatically completes or extends a given piece of content. For example, given the following input:

"Mark’s Window Cleaners is the best window cleaning business in the entirety of Ontario."

The content completion service extends the paragraph to:

"Mark’s Window Cleaners is the best window cleaning business in the entirety of Ontario. If you're looking for the best window cleaning business in all of Ontario, look no further than Mark's Window Cleaners. We provide top-notch service at an affordable price, and our customer satisfaction is second to none. Contact us today to schedule a free consultation, and see for yourself why we're the best in the business."

Value to Growth Platform

Content completion lets the convert team automatically generate content for a convert site, provided that context is supplied. It is a foundational capability that is expected to unlock new customer benefits in applications such as automated messaging and review replies.

Service Level Agreements (SLAs)

Throughput

Median requests per day: 4
Variability in Daily Request Distribution: Medium
Variability in Weekly Request Distribution: Medium

Latency

Median: 4.6 seconds
95th Percentile: 26.9 seconds
99th Percentile: 56 seconds
Worst Case Latency: 59 seconds
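
As a rough illustration of how a caller might use these figures, the sketch below picks a client-side timeout above the worst-case latency and computes nearest-rank percentiles from observed latencies. The function names, the 10% headroom, and the sample values are assumptions made for illustration; none of them are part of the service.

```python
import math

# Latency figures quoted in this section, in seconds.
SLA_LATENCY_SECONDS = {"median": 4.6, "p95": 26.9, "p99": 56.0, "worst_case": 59.0}


def client_timeout(worst_case_seconds, headroom=0.10):
    """Return a timeout slightly above the worst-case latency.

    The 10% headroom is an arbitrary illustrative choice.
    """
    return worst_case_seconds * (1.0 + headroom)


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


timeout = client_timeout(SLA_LATENCY_SECONDS["worst_case"])
observed = [3.9, 4.2, 4.6, 5.1, 27.0]  # made-up sample latencies
print(f"timeout={timeout:.1f}s, observed median={percentile(observed, 50)}s")
```

Comparing percentiles of observed latencies against the figures above is one way to notice when the service drifts from its stated SLA.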

Schema

A POST request must be submitted to the API. The defined schema of this service is:

-Instance
    - ID: prompt
      Description: Input text for the generation.
      Data Type: String
      Runtime Restrictions:
        - Must be greater than 10 characters

-Parameters
    - ID: seed
      Description: Set the random seed for generation.
      Data Type: Integer
      Runtime Restrictions:
        - None
    - ID: num_return_sequences
      Description: How many outputs are required.
      Data Type: Integer
      Runtime Restrictions:
        - None
    - ID: num_sentences
      Description: How many sentences in one output (only for local model).
      Data Type: Integer
      Runtime Restrictions:
        - None
    - ID: max_length
      Description: Set the maximum length of the generated text.
      Data Type: Integer
      Runtime Restrictions:
        - None
    - ID: min_length
      Description: Set the minimum length of the generated text.
      Data Type: Integer
      Runtime Restrictions:
        - None
    - ID: num_beams_gpt2
      Description: Number of beams, more is better but it will slow the generation.
      Data Type: Integer
      Runtime Restrictions:
        - None
    - ID: num_beam_groups_gpt2
      Description: Number of beam groups; used when do_sample is False. num_beams must be divisible by this value.
      Data Type: Integer
      Runtime Restrictions:
        - None
    - ID: no_repeat_ngram_size_gpt2
      Description: Size of n-grams that may not be repeated in the output (default: 2, so no 2-gram appears twice).
      Data Type: Integer
      Runtime Restrictions:
        - None
    - ID: do_sample_gpt2
      Description: If turned on, sample using top-k and top-p.
      Data Type: Boolean
      Runtime Restrictions:
        - None
    - ID: top_k_gpt2
      Description: Keeps only the k most likely next tokens and redistributes the probability mass among them. See https://huggingface.co/blog/decision-transformers.
      Data Type: Integer
      Runtime Restrictions:
        - None
    - ID: top_p_gpt2
      Description: Samples from the smallest set of tokens whose cumulative probability exceeds p.
      Data Type: Float
      Runtime Restrictions:
        - None
    - ID: num_beams_gpt3
      Description: More beams are better but it will increase cost for GPT3.
      Data Type: Integer
      Runtime Restrictions:
        - None
    - ID: temperature
      Description: Lower values make the outputs more deterministic and repetitive.
      Data Type: Float
      Runtime Restrictions:
        - Must be between 0 and 1, inclusive.
    - ID: gpt3_selection_probability
      Description: Probability of calling gpt3 api in comparison to gpt2.
      Data Type: Float
      Runtime Restrictions:
        - Must be between 0 and 1, inclusive.
    - ID: repetition_penalty_gpt3
      Description: Penalize words that were already generated or belong to the context. https://beta.openai.com/docs/api-reference/parameter-details.
      Data Type: Float
      Runtime Restrictions:
        - Must be between -2 and 2, exclusive.
    - ID: presence_penalty_gpt3
      Description: Penalize new tokens if they are already in the text. https://beta.openai.com/docs/api-reference/parameter-details.
      Data Type: Float
      Runtime Restrictions:
        - Must be between -2 and 2, exclusive.
    - ID: need_sentiment
      Description: Whether to return individual sentiment scores for the generated content.
      Data Type: Boolean
      Runtime Restrictions:
        - None
    - ID: gpt3_model
      Description: Select from text-davinci-002, text-curie-001, text-ada-001.
      Data Type: String
      Runtime Restrictions:
        - None
    - ID: user_name
      Description: Email address or name of the API caller to uniquely identify the request.
      Data Type: String
      Runtime Restrictions:
        - None

-Prediction
    - ID: generated_text
      Description: Generated Text.
      Data Type: String
      Runtime Restrictions:
        - None
    - ID: prediction_id
      Description: A unique identifier for each prediction instance used for logging. The value maps to the table in BigQuery and can be ignored by users.
      Data Type: String
      Runtime Restrictions:
        - None
    - ID: sentiment
      Description: Positive, neutral, or negative sentiment.
      Data Type: Float
      Runtime Restrictions:
        - None
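
The runtime restrictions in the schema can be checked on the client before a request is sent. The following is a minimal sketch, assuming the POST body is JSON with an "instances" list and a "parameters" object; that envelope, and the helper name build_request, are assumptions for illustration and are not confirmed by this specification. Only the field IDs and the restrictions come from the schema above.

```python
import json


def build_request(prompt, **parameters):
    """Validate inputs against the runtime restrictions and return a JSON body.

    The "instances"/"parameters" envelope is an assumed layout.
    """
    if len(prompt) <= 10:
        raise ValueError("prompt must be greater than 10 characters")

    # Parameters restricted to the closed interval [0, 1].
    for name in ("temperature", "gpt3_selection_probability"):
        value = parameters.get(name)
        if value is not None and not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1, inclusive")

    # Parameters restricted to the open interval (-2, 2).
    for name in ("repetition_penalty_gpt3", "presence_penalty_gpt3"):
        value = parameters.get(name)
        if value is not None and not -2 < value < 2:
            raise ValueError(f"{name} must be between -2 and 2, exclusive")

    body = {"instances": [{"prompt": prompt}], "parameters": parameters}
    return json.dumps(body)


payload = build_request(
    "Mark's Window Cleaners is the best window cleaning business in Ontario.",
    seed=42,
    num_return_sequences=1,
    temperature=0.7,
    need_sentiment=True,
)
print(payload)
```

Rejecting invalid values locally avoids spending a request, which can take several seconds, on input the service would refuse.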

Feedback Mechanisms

The main feedback mechanism is direct communication with end users, who are currently limited to convert team members. Understanding the difficulties they encounter when using the content completion service helps identify the specific improvements that need to be made.

In addition, response data is collected whenever a request is made to the service, along with any changes that convert team members make to the generated text. This data can be used to fine-tune either of the two models in the future.
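
One way a user's edit could be paired with the original output is sketched below using Python's standard difflib module. The record layout, the similarity field, and the function name feedback_record are hypothetical; only prediction_id and generated_text come from the schema, and the ID value shown is a placeholder.

```python
import difflib


def feedback_record(prediction_id, generated_text, edited_text):
    """Pair the model output with the user's edited version (hypothetical layout)."""
    matcher = difflib.SequenceMatcher(None, generated_text, edited_text)
    return {
        "prediction_id": prediction_id,
        "generated_text": generated_text,
        "edited_text": edited_text,
        # 1.0 means the text was kept unchanged; lower values mean heavier edits.
        "similarity": matcher.ratio(),
    }


record = feedback_record(
    "example-id",  # placeholder, not a real prediction_id
    "We provide top-notch service at an affordable price.",
    "We provide reliable service at a fair price.",
)
print(round(record["similarity"], 2))
```

A similarity score like this could help prioritize heavily edited outputs when selecting fine-tuning examples.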