Labeler

Labeler (class)

Base class for labeling operations using an LLM.

Parameters (constructor)

llm_model (str): Name of the LLM model to use (e.g., "openai::gpt-4o-mini").
unknown_label (str, optional): Label for unclear classifications (default: "unclear").
temperature (float, optional): Temperature for the LLM model (default: 0).

Key Methods

_deserialize_label_responses(responses): Deserialize LLM responses into a DataFrame.
_run_labeling_prompts(prompts, system_prompt, max_workers=100): Run prompts concurrently and collect LLM responses.

Example

from bigdata_research_tools.labeler import Labeler

labeler = Labeler(llm_model="openai::gpt-4o-mini")
# Use labeler._run_labeling_prompts(...) and labeler._deserialize_label_responses(...)
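
The concurrent step described for _run_labeling_prompts can be illustrated with a small standalone sketch. This is not the library's implementation: run_prompts_concurrently, call_llm, and fake_llm are hypothetical names, and the thread-pool approach is an assumption about how prompts might be dispatched in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_prompts_concurrently(
    prompts: List[str],
    call_llm: Callable[[str], str],
    max_workers: int = 100,
) -> Dict[int, str]:
    """Send each prompt through call_llm on a thread pool.

    Hypothetical helper: results are keyed by the prompt's position.
    """
    if not prompts:
        return {}
    # Never start more threads than there are prompts.
    workers = min(max_workers, len(prompts))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so enumerate recovers the index.
        return dict(enumerate(pool.map(call_llm, prompts)))


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call.
    return '{"motivation": "stub", "label": "unclear"}'


responses = run_prompts_concurrently(["prompt 0", "prompt 1"], fake_llm, max_workers=4)
```

Keying responses by index keeps each raw response tied to the text it came from, which is what a later deserialization step would need.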

get_prompts_for_labeler

Generate the list of user messages, one per text, to be labeled by the labeling system.

Parameters

texts (List[str]): Texts to label.
textsconfig (Optional[List[Dict]], optional): Additional fields to include in each prompt alongside the text.

Returns

List[str]: List of prompts for the labeling system.

Example

from bigdata_research_tools.labeler import get_prompts_for_labeler

texts = ["Chunk 0 text here", "Chunk 1 text here"]
prompts = get_prompts_for_labeler(texts)
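
To make the role of textsconfig concrete, here is a minimal sketch of how per-text fields could be merged into each prompt. The function name build_prompts_sketch, the JSON layout, and the sentence_id key are assumptions for illustration only; the actual prompt format produced by get_prompts_for_labeler may differ.

```python
import json
from itertools import zip_longest
from typing import Any, Dict, List, Optional


def build_prompts_sketch(
    texts: List[str],
    textsconfig: Optional[List[Dict[str, Any]]] = None,
) -> List[str]:
    """Return one JSON-encoded user message per text (hypothetical layout)."""
    configs = textsconfig or []
    if len(configs) > len(texts):
        raise ValueError("textsconfig has more entries than texts")
    prompts = []
    # Texts without a matching config entry get no extra fields.
    for i, (text, config) in enumerate(zip_longest(texts, configs, fillvalue=None)):
        message = {"sentence_id": i, **(config or {}), "text": text}
        prompts.append(json.dumps(message))
    return prompts


example_prompts = build_prompts_sketch(
    ["Chunk 0 text here", "Chunk 1 text here"],
    [{"company": "Acme"}],
)
```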

parse_labeling_response

Parse the response from the LLM model used for labeling.

Parameters

response (str): The response from the LLM model as a raw string.

Returns

dict: Parsed dictionary with keys:
- motivation
- label

Example

from bigdata_research_tools.labeler import parse_labeling_response

response = '{"motivation": "Relevant to AI", "label": "AI"}'
parsed = parse_labeling_response(response)
print(parsed["label"])
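
LLM output is not always valid JSON, so a caller may want a defensive wrapper around parsing. The sketch below is a standalone illustration, not the behavior of parse_labeling_response: parse_response_sketch is a hypothetical name, and falling back to the unknown label on malformed input is an assumed policy.

```python
import json
from typing import Dict


def parse_response_sketch(response: str, unknown_label: str = "unclear") -> Dict[str, str]:
    """Parse a raw response into motivation and label, with a fallback."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        data = None
    # Anything that is not a JSON object with a "label" key counts as unclear.
    if not isinstance(data, dict) or "label" not in data:
        return {"motivation": "", "label": unknown_label}
    return {
        "motivation": str(data.get("motivation", "")),
        "label": str(data["label"]),
    }


good = parse_response_sketch('{"motivation": "Relevant to AI", "label": "AI"}')
bad = parse_response_sketch("not valid json")
```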

NarrativeLabeler (class)

LLM-powered labeler for narrative labeling.

Parameters (constructor)

llm_model (str): Name of the LLM model to use.
label_prompt (str, optional): Custom prompt for labeling. If omitted, the default system prompt is used.
unknown_label (str, optional): Label for unclear classifications (default: "unclear").
temperature (float, optional): Temperature for the LLM model (default: 0).

Key Methods

get_labels(theme_labels, texts, max_workers=50): Label a list of texts with the provided theme labels.
post_process_dataframe(df): Post-process the labeled DataFrame for export.

Example

from bigdata_research_tools.labeler import NarrativeLabeler

labeler = NarrativeLabeler(llm_model="openai::gpt-4o-mini")
labels_df = labeler.get_labels(
    theme_labels=["AI", "Cloud"],
    texts=["AI is transforming business.", "Cloud adoption is accelerating."]
)
processed_df = labeler.post_process_dataframe(labels_df)

ScreenerLabeler (class)

LLM-powered labeler for thematic screener labeling.

Parameters (constructor)

llm_model (str): Name of the LLM model to use.
label_prompt (str, optional): Custom prompt for labeling. If omitted, the default system prompt is used.
unknown_label (str, optional): Label for unclear classifications (default: "unclear").
temperature (float, optional): Temperature for the LLM model (default: 0).

Key Methods

get_labels(main_theme, labels, texts, max_workers=50): Label a list of texts with the provided main theme and labels.
post_process_dataframe(df): Post-process the labeled DataFrame for export, including company/entity columns and placeholder replacement.

Example

from bigdata_research_tools.labeler import ScreenerLabeler

labeler = ScreenerLabeler(llm_model="openai::gpt-4o-mini")
labels_df = labeler.get_labels(
    main_theme="AI",
    labels=["AI", "Cloud"],
    texts=["AI is transforming business.", "Cloud adoption is accelerating."]
)
processed_df = labeler.post_process_dataframe(labels_df)
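
The placeholder replacement mentioned for post_process_dataframe can be pictured with plain dictionaries. Everything in this sketch is an assumption made for illustration: the placeholder string "Target Company", the entity_name and motivation keys, and the helper restore_entity_names are hypothetical and do not describe the library's actual columns or logic.

```python
from typing import Dict, List

# Hypothetical placeholder that would stand in for the company name in labeled text.
PLACEHOLDER = "Target Company"


def restore_entity_names(
    rows: List[Dict[str, str]],
    placeholder: str = PLACEHOLDER,
) -> List[Dict[str, str]]:
    """Return copies of rows with the placeholder swapped for each row's entity name."""
    restored = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row["motivation"] = row["motivation"].replace(placeholder, row["entity_name"])
        restored.append(new_row)
    return restored


rows = [
    {"entity_name": "Acme Corp", "motivation": "Target Company is investing in AI.", "label": "AI"},
]
restored_rows = restore_entity_names(rows)
```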
