LLM for Structuring Information

Related API: ccai9012.llm_utils · ccai9012.viz_utils

Overview

Category: Unstructured Text Analysis & Knowledge Structuring

Modular Components:
- Text Preprocessing Pipeline
- LLM API Calling
- LLM Embedding Extractor
- Vector Clustering
- Q&A over Documents
- Heatmap Visualization
- Wordcloud Generation

Use Cases

Code Examples

Urban Sentiment Classification

Content:
- Extract structured sentiment (location, themes, polarity) from reviews using an LLM
- Combine named entity recognition (NER) with classification
- Create sentiment maps to inform urban design
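The extraction step above can be sketched as a prompt plus strict validation of the model's reply. This is a minimal illustration, not the ccai9012.llm_utils API: the prompt wording, the field names, and the helper names `build_prompt` and `parse_sentiment` are all assumptions, and the actual LLM call is left out.

```python
import json

# Assumed label set; adjust to whatever the classification step uses.
ALLOWED_POLARITY = {"positive", "neutral", "negative"}

# Hypothetical prompt; the wording is an assumption, not the kit's own prompt.
PROMPT_TEMPLATE = (
    "Extract the location, themes, and polarity from the review below.\n"
    'Reply with one JSON object with the keys "location" (string or null), '
    '"themes" (list of strings), and "polarity" '
    '("positive", "neutral", or "negative").\n\n'
    "Review: {review}"
)


def build_prompt(review: str) -> str:
    """Fill the review text into the prompt template."""
    return PROMPT_TEMPLATE.format(review=review)


def parse_sentiment(reply: str) -> dict:
    """Parse and validate a model reply; raise ValueError on unusable output."""
    # Models often wrap JSON in prose or code fences, so keep only the
    # text between the first "{" and the last "}".
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in reply")
    record = json.loads(reply[start:end + 1])  # JSONDecodeError is a ValueError

    polarity = str(record.get("polarity", "")).lower()
    if polarity not in ALLOWED_POLARITY:
        raise ValueError(f"unexpected polarity: {polarity!r}")

    themes = record.get("themes") or []
    if not isinstance(themes, list):
        raise ValueError("themes must be a list")

    return {
        "location": record.get("location"),
        "themes": [str(theme) for theme in themes],
        "polarity": polarity,
    }
```

Validating every reply before it reaches the mapping step keeps malformed outputs from silently skewing the sentiment map; rejected replies can be retried or logged.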

Dataset:
- Yelp Open Dataset
- Source: https://business.yelp.com/data/resources/open-dataset/ (click the red "Download JSON" button, then place the downloaded file in the starter_kits/2_llm_structure_output/urban_sentiment/data folder)
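Once the file is in place, the reviews can be streamed rather than loaded all at once. The sketch below assumes the review file is newline-delimited JSON (one object per line) with `stars` and `text` fields, and that it is named `yelp_academic_dataset_review.json`; check both against the release you downloaded. `iter_reviews` is a hypothetical helper, not part of the kit.

```python
import json
from pathlib import Path
from typing import Iterator, Optional


def iter_reviews(path: Path, limit: Optional[int] = None) -> Iterator[dict]:
    """Yield up to `limit` records from a JSON Lines file, one per line."""
    count = 0
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if limit is not None and count >= limit:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines
            yield json.loads(line)
            count += 1


# Example usage (file name and field names are assumptions):
# data_dir = Path("starter_kits/2_llm_structure_output/urban_sentiment/data")
# for review in iter_reviews(data_dir / "yelp_academic_dataset_review.json", limit=5):
#     print(review.get("stars"), review.get("text", "")[:80])
```

Streaming with a `limit` makes it practical to prototype prompts on a small sample before processing the full file.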

Required Packages: LangChain, DeepSeek, transformers, pandas, json


Yelp Review heatmap.

Airbnb Reviews Analysis

Content:
- Collect Airbnb listing and review data (public dataset: Inside Airbnb)
- Classify review sentiment for each aspect (location, host, facility)
- Create an aspect-wise impression heatmap and a wordcloud of Airbnb reviews
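The aggregation behind an aspect-wise heatmap can be sketched as follows. It assumes each classified review has already been reduced to a `(group, aspect, score)` triple, where `group` might be a neighbourhood and `score` is -1, 0, or 1; that representation and the helper name `aspect_matrix` are assumptions for illustration.

```python
from collections import defaultdict

ASPECTS = ("location", "host", "facility")


def aspect_matrix(labelled):
    """Average scores per (group, aspect).

    Returns (groups, matrix), where matrix[i][j] is the mean score of
    ASPECTS[j] in groups[i], or None when no review covers that cell.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (group, aspect) -> [sum, count]
    for group, aspect, score in labelled:
        if aspect not in ASPECTS:
            continue  # ignore aspects outside the heatmap
        cell = totals[(group, aspect)]
        cell[0] += score
        cell[1] += 1

    groups = sorted({group for group, _ in totals})
    matrix = []
    for group in groups:
        row = []
        for aspect in ASPECTS:
            total, count = totals.get((group, aspect), (0.0, 0))
            row.append(total / count if count else None)
        matrix.append(row)
    return groups, matrix
```

The resulting rows and columns map directly onto heatmap axes; empty cells stay `None` so that missing data is not drawn as neutral sentiment.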

Dataset:
- Airbnb review dataset
- Source: https://insideairbnb.com/get-the-data/


Airbnb Review keywords wordcloud.

Energy Action Plan PDF Structuring

Content:
- Extract building and energy information from Energy Action Plans (PDFs)
- Summarize the content in JSON format for analysis
- Enable comparison across tribal regions over time
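Once each plan has been summarized as JSON, the comparison step can be sketched as grouping records by region and ordering them by year. The keys `region`, `year`, and `annual_kwh` are hypothetical; the real schema depends on what the extraction prompt asks for, and the values in the usage comment are placeholders rather than data from any plan.

```python
import json
from collections import defaultdict


def build_timeline(json_records, field):
    """Group one field from per-plan JSON strings by region, ordered by year.

    Records that lack the field are skipped rather than filled in.
    """
    timeline = defaultdict(dict)
    for raw in json_records:
        record = json.loads(raw)
        value = record.get(field)
        if value is None:
            continue
        timeline[record["region"]][int(record["year"])] = value
    return {
        region: dict(sorted(years.items()))
        for region, years in sorted(timeline.items())
    }


# Example usage with placeholder values:
# records = [
#     '{"region": "Region A", "year": 2021, "annual_kwh": 90}',
#     '{"region": "Region A", "year": 2019, "annual_kwh": 100}',
# ]
# build_timeline(records, "annual_kwh")
```

Skipping incomplete records keeps gaps in the source documents visible instead of masking them with default values.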

Dataset:
- Energy Action Plan documents
- Source: https://cchrc.org/

Required Packages: LangChain, PyMuPDF, pdfplumber, transformers, pandas

Literature Review of Topics

Content:
- Crawl websites for relevant papers
- Query each document in turn with a set of specific questions
- Identify insights and keywords
- Catalogue and present the findings
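The document-by-document pass above can be sketched as a nested loop that records one catalogue row per (document, question) pair. The `ask` callable stands in for whatever Q&A component is used (for example an LLM call); it and the helper names are assumptions, not part of ccai9012.

```python
from collections import Counter
from typing import Callable, Dict, List


def review_documents(
    documents: Dict[str, str],
    questions: List[str],
    ask: Callable[[str, str], str],
) -> List[dict]:
    """Ask every question of every document and collect the answers."""
    rows = []
    for doc_id, text in documents.items():
        for question in questions:
            rows.append(
                {"document": doc_id, "question": question, "answer": ask(text, question)}
            )
    return rows


def keyword_counts(rows: List[dict], keywords: List[str]) -> Counter:
    """Count how many answers mention each keyword (case-insensitive)."""
    counts = Counter()
    for row in rows:
        answer = row["answer"].lower()
        for keyword in keywords:
            if keyword.lower() in answer:
                counts[keyword] += 1
    return counts
```

Keeping the rows as plain dictionaries makes it straightforward to export the catalogue to a table or to feed the keyword counts into a wordcloud.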

Dataset: A collection of literature on a specific topic