MultiModalPredictor.extract_embedding#

MultiModalPredictor.extract_embedding(data: Union[DataFrame, dict, list], id_mappings: Optional[Union[Dict[str, Dict], Dict[str, Series]]] = None, return_masks: Optional[bool] = False, as_tensor: Optional[bool] = False, as_pandas: Optional[bool] = False, realtime: Optional[bool] = None, signature: Optional[str] = None)[source]#

Extract a feature embedding for each sample, i.e., each row in the provided dataframe data.

Parameters
  • data – The data to extract embeddings for. It should contain the same column names as the training dataset and follow the same format (except for the label column).

  • id_mappings – Id-to-content mappings. The contents can be text, image, etc. This is used when data contains the query/response identifiers instead of their contents.

  • return_masks – If True, also return a mask dictionary whose keys are the same as those in the features dictionary. If a sample has empty input in feature column image_0, it will have mask 0 under key image_0 (see the sketch after this parameter list).

  • as_tensor – Whether to return a PyTorch tensor.

  • as_pandas – Whether to return the output as a pandas DataFrame (True) or a numpy array (False).

  • realtime – Whether to do realtime inference, which is efficient for small data (default None). If not specified, it is inferred from the data modalities and the number of samples.

  • signature – Only used with the matcher; can be query or response (see the matcher sketch at the end of this page).
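
A minimal usage sketch combining these options follows; the tiny training set, the column names sentence and label, and the assumption that return_masks=True returns a (features, masks) pair are all illustrative, not part of this reference.

```python
import pandas as pd
from autogluon.multimodal import MultiModalPredictor

# Hypothetical setup: a text column "sentence" and a label column "label"
# (both names are assumptions for illustration).
train_data = pd.DataFrame({
    "sentence": ["good movie", "bad movie", "great plot", "dull plot"],
    "label": [1, 0, 1, 0],
})
predictor = MultiModalPredictor(label="label")
predictor.fit(train_data)

test_data = pd.DataFrame({"sentence": ["hello world", "extract my embedding"]})

# Default: embeddings as a numpy array, one row per sample.
embeddings = predictor.extract_embedding(test_data)

# as_pandas=True wraps the same embeddings in a pandas DataFrame.
emb_df = predictor.extract_embedding(test_data, as_pandas=True)

# return_masks=True additionally returns a mask dictionary; a mask of 0 under a
# feature-column key marks a sample whose input in that column was empty.
embeddings, masks = predictor.extract_embedding(test_data, return_masks=True)
```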

Returns

Array of embeddings, corresponding to each row in the given data. It will have shape (#samples, D), where the embedding dimension D is determined by the neural network's architecture.
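
For the matcher case mentioned under signature, a sketch along the following lines might apply; the problem_type, query, response, and match_label constructor arguments reflect a typical matcher setup and are assumptions here, not something this reference specifies.

```python
import pandas as pd
from autogluon.multimodal import MultiModalPredictor

# Hypothetical matcher: constructor arguments and column names are assumptions.
pair_data = pd.DataFrame({
    "sentence1": ["a cat sits", "a dog runs", "it is raining", "the sun shines"],
    "sentence2": ["a cat is sitting", "the sky is blue", "rain falls", "stocks rose"],
    "label": [1, 0, 1, 0],
})
matcher = MultiModalPredictor(
    problem_type="text_similarity",
    query="sentence1",
    response="sentence2",
    label="label",
    match_label=1,
)
matcher.fit(pair_data)

# Embed only the query side; the result has shape (#samples, D), where D is
# fixed by the underlying network's architecture.
queries = pd.DataFrame({"sentence1": ["what is autogluon?"]})
query_embeddings = matcher.extract_embedding(queries, signature="query")
assert query_embeddings.shape[0] == len(queries)
```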