MultiModalPredictor.evaluate#

MultiModalPredictor.evaluate(data: Union[DataFrame, dict, list, str], query_data: Optional[list] = None, response_data: Optional[list] = None, id_mappings: Optional[Union[Dict[str, Dict], Dict[str, Series]]] = None, metrics: Optional[Union[str, List[str]]] = None, chunk_size: Optional[int] = 1024, similarity_type: Optional[str] = 'cosine', cutoffs: Optional[List[int]] = [1, 5, 10], label: Optional[str] = None, return_pred: Optional[bool] = False, realtime: Optional[bool] = None, eval_tool: Optional[str] = None)[source]#

Evaluate model on a test dataset.

Parameters

data – A dataframe, containing the same columns as the training data. Or a str, that is a path of the annotation file for detection.
query_data – Query data used for ranking.
response_data – Response data used for ranking.
id_mappings – Id-to-content mappings. The contents can be text, image, etc. This is used when data/query_data/response_data contain the query/response identifiers instead of their contents.
metrics – A list of metric names to report. If None, we only return the score for the stored _eval_metric_name.
chunk_size – Scan the response data by chunk_size each time. Increasing the value increases the speed, but requires more memory.
similarity_type – Use what function (cosine/dot_prod) to score the similarity (default: cosine).
cutoffs – A list of cutoff values to evaluate ranking.
label – The label column name in data. Some tasks, e.g., image<–>text matching, have no label column in training data, but the label column may be still required in evaluation.
return_pred – Whether to return the prediction result of each row.
realtime – Whether to do realtime inference, which is efficient for small data (default None). If not specified, we would infer it on based on the data modalities and sample number.
eval_tool – The eval_tool for object detection. Could be “pycocotools” or “torchmetrics”.

Returns

A dictionary with the metric names and their corresponding scores.
Optionally return a dataframe of prediction results.