api proposal: send images to llm #245104

@justschen

TPI: #244511 (1.99)
TPI: #247538 (1.100, for tool calling)

reference issue: #239976

Right now, our LanguageModelChatMessage does not accept image parts in its content.

vscode/src/vscode-dts/vscode.d.ts

Lines 19747 to 19789 in a508d75

export class LanguageModelChatMessage {
/**
* Utility to create a new user message.
*
* @param content The content of the message.
* @param name The optional name of a user for the message.
*/
static User(content: string | Array<LanguageModelTextPart | LanguageModelToolResultPart>, name?: string): LanguageModelChatMessage;
/**
* Utility to create a new assistant message.
*
* @param content The content of the message.
* @param name The optional name of a user for the message.
*/
static Assistant(content: string | Array<LanguageModelTextPart | LanguageModelToolCallPart>, name?: string): LanguageModelChatMessage;
/**
* The role of this message.
*/
role: LanguageModelChatMessageRole;
/**
* A string or heterogeneous array of things that a message can contain as content. Some parts may be message-type
* specific for some models.
*/
content: Array<LanguageModelTextPart | LanguageModelToolResultPart | LanguageModelToolCallPart>;
/**
* The optional name of a user for this message.
*/
name: string | undefined;
/**
* Create a new user message.
*
* @param role The role of the message.
* @param content The content of the message.
* @param name The optional name of a user for the message.
*/
constructor(role: LanguageModelChatMessageRole, content: string | Array<LanguageModelTextPart | LanguageModelToolResultPart | LanguageModelToolCallPart>, name?: string);
}

To support vision requests to the LLM, the message content needs an additional part type that can carry image data.

I propose:

export class LanguageModelChatMessage2 {
/**
* Utility to create a new user message.
*
* @param content The content of the message.
* @param name The optional name of a user for the message.
*/
static User(content: string | Array<LanguageModelTextPart | LanguageModelToolResultPart2 | LanguageModelDataPart | LanguageModelExtraDataPart>, name?: string): LanguageModelChatMessage2;
/**
* Utility to create a new assistant message.
*
* @param content The content of the message.
* @param name The optional name of a user for the message.
*/
static Assistant(content: string | Array<LanguageModelTextPart | LanguageModelToolCallPart | LanguageModelDataPart | LanguageModelExtraDataPart>, name?: string): LanguageModelChatMessage2;
/**
* The role of this message.
*/
role: LanguageModelChatMessageRole;
/**
* A string or heterogeneous array of things that a message can contain as content. Some parts may be message-type
* specific for some models.
*/
content: Array<LanguageModelTextPart | LanguageModelToolResultPart2 | LanguageModelToolCallPart | LanguageModelDataPart | LanguageModelExtraDataPart>;
/**
* The optional name of a user for this message.
*/
name: string | undefined;
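/**
* Create a new user message.
*
* @param role The role of the message.
* @param content The content of the message.
* @param name The optional name of a user for the message.
*/
constructor(role: LanguageModelChatMessageRole, content: string | Array<LanguageModelTextPart | LanguageModelToolResultPart2 | LanguageModelToolCallPart | LanguageModelDataPart | LanguageModelExtraDataPart>, name?: string);
}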

This adds a new LanguageModelDataPart that can be sent in the content of a LanguageModelChatMessage2.

/**
 * A language model response part containing arbitrary data, returned from a {@link LanguageModelChatResponse}.
 */
export class LanguageModelDataPart {
	/**
	 * Factory function to create a `LanguageModelDataPart` for an image.
	 * @param data Binary image data
	 * @param mimeType The MIME type of the image
	 */
	static image(data: Uint8Array, mimeType: ChatImageMimeType): LanguageModelDataPart;

	static json(value: object): LanguageModelDataPart;

	static text(value: string): LanguageModelDataPart;

	/**
	 * The mime type which determines how the data property is interpreted.
	 */
	mimeType: string;

	/**
	 * The data of the part.
	 */
	data: Uint8Array;

	/**
	 * Construct a generic data part with the given content.
	 * @param data The data of the part.
	 * @param mimeType The mime type which determines how the data is interpreted.
	 */
	constructor(data: Uint8Array, mimeType: string);
}
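
To make the intended shape concrete, here is a minimal standalone sketch of how the `json` and `text` factories could wrap their input as bytes. `DataPartSketch` and `decodeText` are hypothetical names, not part of the proposal, and the `application/json` and `text/plain` MIME types are assumptions; the proposal itself does not say which MIME types these factories would use.

```typescript
// Hypothetical stand-in for the proposed LanguageModelDataPart.
// This is NOT the VS Code implementation, only an illustration of the shape.
class DataPartSketch {
	constructor(public data: Uint8Array, public mimeType: string) { }

	// Mirrors the proposed `image` factory: the bytes are stored as-is.
	static image(data: Uint8Array, mimeType: string): DataPartSketch {
		return new DataPartSketch(data, mimeType);
	}

	// Assumption: JSON values are serialized to UTF-8 and tagged 'application/json'.
	static json(value: object): DataPartSketch {
		const bytes = new TextEncoder().encode(JSON.stringify(value));
		return new DataPartSketch(bytes, 'application/json');
	}

	// Assumption: plain text is encoded as UTF-8 and tagged 'text/plain'.
	static text(value: string): DataPartSketch {
		return new DataPartSketch(new TextEncoder().encode(value), 'text/plain');
	}
}

// Helper (hypothetical) that decodes a part's bytes back into a string.
function decodeText(part: DataPartSketch): string {
	return new TextDecoder().decode(part.data);
}
```

With this shape, every part is just bytes plus a MIME type, so a consumer can decide how to interpret `data` by inspecting `mimeType` alone.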

Example usage:

const messages = [
       vscode.LanguageModelChatMessage2.User([
              // The proposed constructor takes (data, mimeType) as positional arguments.
              new vscode.LanguageModelDataPart(imageData, 'image/png'),
       ]),
       vscode.LanguageModelChatMessage2.User('Tell me about this image. Start each sentence with "MEOW"'),
];

const chatResponse = await request.model.sendRequest(messages, {}, token);
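
Because the message content becomes a heterogeneous array, whoever receives it has to tell data parts apart from text parts. The following is a rough sketch of one way to do that using structural types only. `TextPartLike`, `DataPartLike`, and `describeContent` are hypothetical names introduced for illustration and do not come from the proposal or from the `vscode` API.

```typescript
// Hypothetical structural stand-ins for the text and data parts.
interface TextPartLike { value: string; }
interface DataPartLike { data: Uint8Array; mimeType: string; }
type PartLike = TextPartLike | DataPartLike;

// Produces a short description of each part in a message's content array.
function describeContent(parts: PartLike[]): string[] {
	const out: string[] = [];
	for (const part of parts) {
		if ('mimeType' in part) {
			// Data part: treat any 'image/*' MIME type as an image.
			const kind = part.mimeType.indexOf('image/') === 0 ? 'image' : 'data';
			out.push(`${kind} (${part.mimeType}, ${part.data.length} bytes)`);
		} else {
			out.push(`text: ${part.value}`);
		}
	}
	return out;
}
```

Checking for the `mimeType` property keeps the sketch independent of any class hierarchy; an actual implementation would more likely use `instanceof` checks against the real part classes.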
