DIMSUM AI Labs

An AI-Friendly Cantonese Corpus System and AI Agent System for Real-World Applications

Rooted in Lingnan culture and built for AI technology,
we are building a complete system that integrates "Cantonese datasets → Large Language Models → AI Agents → innovative applications".

Four key features:
* AI-Friendly: provides interfaces for connecting to both large and small models
* AI Agent-Friendly: provides interfaces for connecting to AI Agents
* Hot-Swappable Modules: new capabilities, such as a "blockchain rights confirmation module", can be added to the system dynamically
* Community Mode: the core team and the technical community co-build the main system, the corpus, Cantonese apps, and the module library
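
The hot-swappable module idea can be illustrated with a minimal plugin registry. This is a sketch under stated assumptions, not the system's actual implementation: the `Module` interface (a `name` plus a `handle(payload)` method), `ModuleRegistry`, and `RightsConfirmationModule` are all hypothetical names, and the rights-confirmation stand-in only fingerprints a contribution with SHA-256 rather than touching any blockchain.

```python
import hashlib
import json
import threading
from typing import Any, Dict, List, Protocol


class Module(Protocol):
    """Hypothetical interface that every pluggable module implements."""

    name: str

    def handle(self, payload: dict) -> Any: ...


class ModuleRegistry:
    """Tracks loaded modules and lets them be added or removed at runtime."""

    def __init__(self) -> None:
        self._modules: Dict[str, Module] = {}
        self._lock = threading.Lock()  # guards the module table across threads

    def register(self, module: Module) -> None:
        # Registering under an existing name replaces the old module (a hot swap).
        with self._lock:
            self._modules[module.name] = module

    def unregister(self, name: str) -> None:
        with self._lock:
            self._modules.pop(name, None)

    def loaded(self) -> List[str]:
        with self._lock:
            return sorted(self._modules)

    def dispatch(self, name: str, payload: dict) -> Any:
        # Look the module up under the lock, but run it outside the lock so a
        # slow module does not block other registrations or lookups.
        with self._lock:
            module = self._modules.get(name)
        if module is None:
            raise KeyError(f"no module named {name!r} is loaded")
        return module.handle(payload)


class RightsConfirmationModule:
    """Hypothetical stand-in for a "blockchain rights confirmation module".

    It only computes a SHA-256 fingerprint of the contribution; recording that
    fingerprint on a blockchain is out of scope for this sketch.
    """

    name = "rights_confirmation"

    def handle(self, payload: dict) -> str:
        canonical = json.dumps(payload, sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    registry = ModuleRegistry()
    registry.register(RightsConfirmationModule())  # capability added at runtime
    print(registry.dispatch("rights_confirmation", {"text": "早晨", "contributor": "alice"}))
    registry.unregister("rights_confirmation")  # and removed again
    print(registry.loaded())
```

Keeping the registry keyed by name means a module can be upgraded in place by registering a new object under the same name, without restarting the host system.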


Who Are Our Target Users?

* Cantonese Enthusiasts: search Cantonese-related data and use Cantonese applications
* Cantonese Learners: master Cantonese with AI assistance, drawing on abundant high-quality Cantonese materials and related applications
* Cantonese Researchers: obtain high-quality Cantonese data and materials, and use related research tools
* Corpus-Contributing Institutions: provide Cantonese corpora and maximize the value of their data
* Individual Corpus Contributors: contribute corpus data and annotations, earn corresponding rewards, and help build the Cantonese enthusiast ecosystem
* Developers: help build Cantonese infrastructure and innovative applications, publish apps to the app store, and help build the developer ecosystem

Key Questions

DIMSUM AI Labs is dedicated to exploring the following key questions.

Corpus Development

How can we develop annotation standards that allow a large-scale Autonomous Multimodal Cantonese Corpus to be built efficiently?

Application Ecosystem

How can we design an access mechanism for a Yue App Store that grows a versatile Cantonese application ecosystem on top of the Cantonese corpus?

Search Engine Innovation

How can search algorithms be improved to build a next-generation, AI-friendly search engine on the Cantonese corpus?

AI SaaS Framework

How can service models be reinvented to create a next-generation AI SaaS framework based on the Cantonese corpus?

Community Building

How can we foster an open-source community that grows into a next-generation global community of builders and researchers (a DAO for Builders & Researchers) around the Cantonese corpus?

Data Overview

* Text materials: 1,000,000+ records
* Audio and video materials: 1+ TB
* Image materials: 3,000+
* Corpora: 20+
* Applications: 10+
* Total data size: 1+ TB

System Architecture

A comprehensive four-layer architecture designed for scalable Cantonese AI development:

* L4 Application Layer (User-Facing Solutions): Cantonese AI Agents, Cantonese Apps, Cantonese Tools
* L3 API Gateway (Integration Interfaces): REST APIs, GraphQL APIs, WebSocket APIs
* L2 Core Services (AI Infrastructure): AI Search Engines, AI SaaS Framework, App Extension, LLMs
* L1 Data Foundation (Knowledge Base): Multimodal Data Repository, Hybrid Annotation System
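
The four layers and their components can be modeled as a small lookup table. The layer and component names below come from the architecture description. The dependency rule, in which a component calls only into the layer directly beneath its own (apps go through the API gateway, the gateway reaches core services, and core services read from the data foundation), is an assumption of this sketch, as is the hypothetical `can_call` helper.

```python
from enum import IntEnum
from typing import Dict


class Layer(IntEnum):
    """The four layers, numbered from the bottom (L1) to the top (L4)."""

    DATA_FOUNDATION = 1  # L1: Knowledge Base
    CORE_SERVICES = 2    # L2: AI Infrastructure
    API_GATEWAY = 3      # L3: Integration Interfaces
    APPLICATION = 4      # L4: User-Facing Solutions


# Component-to-layer mapping, taken from the architecture description.
COMPONENTS: Dict[str, Layer] = {
    "Multimodal Data Repository": Layer.DATA_FOUNDATION,
    "Hybrid Annotation System": Layer.DATA_FOUNDATION,
    "AI Search Engines": Layer.CORE_SERVICES,
    "AI SaaS Framework": Layer.CORE_SERVICES,
    "App Extension": Layer.CORE_SERVICES,
    "LLMs": Layer.CORE_SERVICES,
    "REST APIs": Layer.API_GATEWAY,
    "GraphQL APIs": Layer.API_GATEWAY,
    "WebSocket APIs": Layer.API_GATEWAY,
    "Cantonese AI Agents": Layer.APPLICATION,
    "Cantonese Apps": Layer.APPLICATION,
    "Cantonese Tools": Layer.APPLICATION,
}


def can_call(caller: str, callee: str) -> bool:
    """Assumed strict-layering rule: a component may call only into the
    layer directly beneath its own."""
    return COMPONENTS[caller] - COMPONENTS[callee] == 1


if __name__ == "__main__":
    # Print the stack from top (L4) to bottom (L1).
    for layer in sorted(Layer, reverse=True):
        names = [comp for comp, lyr in COMPONENTS.items() if lyr is layer]
        print(f"L{layer.value} {layer.name}: {', '.join(names)}")
```

Under this assumed rule, an app that needs an LLM would reach it through one of the L3 APIs rather than calling the L2 service directly, which keeps the gateway as the single integration point.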