Labeler (class)

Base class for labeling operations using an LLM.

Parameters (constructor)

  • llm_model (str): Name of the LLM model to use (e.g., "openai::gpt-4o-mini").
  • unknown_label (str, optional): Label for unclear classifications (default: "unclear").
  • temperature (float, optional): Temperature for the LLM model (default: 0).

Key Methods

  • _deserialize_label_responses(responses): Deserialize LLM responses into a DataFrame.
  • _run_labeling_prompts(prompts, system_prompt, max_workers=100): Run prompts concurrently and collect LLM responses.
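
`_run_labeling_prompts` is described as running prompts concurrently with up to `max_workers` workers. The sketch below shows that fan-out pattern with a thread pool. It is not the library's implementation: `run_prompts_concurrently` and the `call_llm` callable are hypothetical stand-ins for the real LLM client.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_prompts_concurrently(
    prompts: List[str],
    system_prompt: str,
    call_llm: Callable[[str, str], str],  # hypothetical: (system_prompt, user_prompt) -> raw text
    max_workers: int = 100,
) -> List[str]:
    """Hypothetical sketch: send every prompt to the LLM in parallel, keep input order."""
    # Never start more threads than there are prompts (and always at least one).
    workers = max(1, min(max_workers, len(prompts)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so responses[i] answers prompts[i].
        return list(pool.map(lambda p: call_llm(system_prompt, p), prompts))


def fake_llm(system_prompt: str, user_prompt: str) -> str:
    # Stand-in for a real LLM call: always returns the same JSON answer.
    return '{"motivation": "stub", "label": "unclear"}'


responses = run_prompts_concurrently(["a", "b", "c"], "You are a labeler.", fake_llm, max_workers=2)
print(len(responses))  # 3
```

Because `ThreadPoolExecutor.map` returns results in input order, each response can be matched back to its prompt by position.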

Example

from bigdata_research_tools.labeler import Labeler

labeler = Labeler(llm_model="openai::gpt-4o-mini")
# Labeler is a base class: its internal helpers labeler._run_labeling_prompts(...)
# and labeler._deserialize_label_responses(...) are used by subclasses such as NarrativeLabeler.

get_prompts_for_labeler

Generate one user message (prompt) per text to be labeled by the labeling system, returned as a list.

Parameters

  • texts (List[str]): Texts to be labeled.
  • textsconfig (Optional[List[Dict]], optional): Extra fields to include in the prompts alongside each text.

Returns

  • List[str]: List of prompts for the labeling system.

Example

from bigdata_research_tools.labeler import get_prompts_for_labeler

texts = ["Chunk 0 text here", "Chunk 1 text here"]
prompts = get_prompts_for_labeler(texts)
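
The exact prompt format is not specified here. As a hypothetical sketch, each text could be serialized as a JSON user message carrying a positional id plus any `textsconfig` fields. The name `build_label_prompts` and the `sentence_id` and `headline` keys are assumptions, not the library's format.

```python
import json
from typing import Any, Dict, List, Optional


def build_label_prompts(
    texts: List[str],
    textsconfig: Optional[List[Dict[str, Any]]] = None,
) -> List[str]:
    """Hypothetical prompt builder: one JSON user message per text (assumed format)."""
    prompts = []
    for i, text in enumerate(texts):
        # A positional id lets each LLM response be matched back to its text.
        payload: Dict[str, Any] = {"sentence_id": i}
        # Assumption: textsconfig holds one dict of extra fields per text.
        if textsconfig:
            payload.update(textsconfig[i])
        payload["text"] = text
        prompts.append(json.dumps(payload))
    return prompts


prompts = build_label_prompts(
    ["Chunk 0 text here", "Chunk 1 text here"],
    textsconfig=[{"headline": "H0"}, {"headline": "H1"}],
)
print(prompts[0])  # {"sentence_id": 0, "headline": "H0", "text": "Chunk 0 text here"}
```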

parse_labeling_response

Parse the response from the LLM model used for labeling.

Parameters

  • response (str): The response from the LLM model as a raw string.

Returns

  • dict: Parsed dictionary with keys:
    • motivation
    • label

Example

from bigdata_research_tools.labeler import parse_labeling_response

response = '{"motivation": "Relevant to AI", "label": "AI"}'
parsed = parse_labeling_response(response)
print(parsed["label"])
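
How malformed model output is handled is not documented here. The following is a minimal sketch of a lenient parser, assuming the model may wrap its JSON in Markdown code fences or return invalid JSON. `parse_label_response` is hypothetical; it falls back to the "unclear" default that the constructors use for `unknown_label`.

```python
import json
import re
from typing import Dict

FENCE = "`" * 3  # a Markdown code fence, built this way so it cannot end this snippet
# Hypothetical cleanup: drop a leading fence (optionally tagged json) and a trailing fence.
FENCE_RE = re.compile(rf"^{FENCE}(?:json)?\s*|\s*{FENCE}$")


def parse_label_response(response: str, unknown_label: str = "unclear") -> Dict[str, str]:
    """Hypothetical lenient parser returning {"motivation": ..., "label": ...}."""
    cleaned = FENCE_RE.sub("", response.strip())
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    # Fall back to the unknown label when the label is missing or empty.
    return {
        "motivation": str(data.get("motivation", "")),
        "label": str(data.get("label") or unknown_label),
    }


print(parse_label_response('{"motivation": "Relevant to AI", "label": "AI"}')["label"])  # AI
```

Returning the unknown label instead of raising keeps one bad response from failing a whole concurrent batch.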

NarrativeLabeler (class)

LLM-powered labeler that labels texts against a list of narrative (theme) labels.

Parameters (constructor)

  • llm_model (str): Name of the LLM model to use.
  • label_prompt (str, optional): Custom labeling prompt (default: the built-in system prompt).
  • unknown_label (str, optional): Label for unclear classifications (default: "unclear").
  • temperature (float, optional): Temperature for the LLM model (default: 0).

Key Methods

  • get_labels(theme_labels, texts, max_workers=50): Label a list of texts with the provided theme labels.
  • post_process_dataframe(df): Post-process the labeled DataFrame for export.

Example

from bigdata_research_tools.labeler import NarrativeLabeler

labeler = NarrativeLabeler(llm_model="openai::gpt-4o-mini")
labels_df = labeler.get_labels(
    theme_labels=["AI", "Cloud"],
    texts=["AI is transforming business.", "Cloud adoption is accelerating."]
)
processed_df = labeler.post_process_dataframe(labels_df)
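
This section does not say what happens when the model returns a label outside `theme_labels`. One plausible approach, shown as a hypothetical sketch rather than the library's behavior, maps any such label to the `unknown_label` value ("unclear" by default). `normalize_labels` is an illustrative name.

```python
from typing import Dict, List


def normalize_labels(
    parsed: List[Dict[str, str]],
    theme_labels: List[str],
    unknown_label: str = "unclear",
) -> List[Dict[str, str]]:
    """Hypothetical step: keep only labels from the allowed theme set."""
    allowed = set(theme_labels)
    rows = []
    for item in parsed:
        label = item.get("label", unknown_label)
        # Any label the model invents outside the allowed set becomes the unknown label.
        rows.append({**item, "label": label if label in allowed else unknown_label})
    return rows


rows = normalize_labels(
    [{"motivation": "m1", "label": "AI"}, {"motivation": "m2", "label": "Blockchain"}],
    theme_labels=["AI", "Cloud"],
)
print([r["label"] for r in rows])  # ['AI', 'unclear']
```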

ScreenerLabeler (class)

LLM-powered labeler for thematic screening: labels texts against a main theme and a list of labels.

Parameters (constructor)

  • llm_model (str): Name of the LLM model to use.
  • label_prompt (str, optional): Custom labeling prompt (default: the built-in system prompt).
  • unknown_label (str, optional): Label for unclear classifications (default: "unclear").
  • temperature (float, optional): Temperature for the LLM model (default: 0).

Key Methods

  • get_labels(main_theme, labels, texts, max_workers=50): Label a list of texts with the provided main theme and labels.
  • post_process_dataframe(df): Post-process the labeled DataFrame for export, including company/entity columns and placeholder replacement.
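
The description of `post_process_dataframe` mentions placeholder replacement without further detail. Assuming, hypothetically, that company names were masked in the texts with placeholders such as `Target Company` and `Other Company_1`, the sketch below restores real names in a text field. `replace_placeholders`, the placeholder strings, and the company names are all illustrative assumptions.

```python
import re
from typing import Dict


def replace_placeholders(text: str, mapping: Dict[str, str]) -> str:
    """Hypothetical: swap masked placeholders back to real entity names."""
    if not mapping:
        return text
    # Longest placeholders first, so "Other Company_10" is not matched as "Other Company_1".
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # A single pass means a replaced name is never itself re-replaced.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


motivation = "Target Company partners with Other Company_1 on AI chips."
print(replace_placeholders(motivation, {"Target Company": "Acme Corp", "Other Company_1": "Globex"}))
# Acme Corp partners with Globex on AI chips.
```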

Example

from bigdata_research_tools.labeler import ScreenerLabeler

labeler = ScreenerLabeler(llm_model="openai::gpt-4o-mini")
labels_df = labeler.get_labels(
    main_theme="AI",
    labels=["AI", "Cloud"],
    texts=["AI is transforming business.", "Cloud adoption is accelerating."]
)
processed_df = labeler.post_process_dataframe(labels_df)
