Multi-Modal Data Lake Analytics with LLMs

This research project explores how Large Language Models (LLMs) can be used to analyze complex, multi-modal data lakes. It aims to bridge the gap between data modalities, such as text, images, and structured data, so that information can be retrieved and processed across them.

Research Objectives

  • Develop novel architectures for multi-modal data integration
  • Create efficient algorithms for cross-modal information retrieval
  • Design scalable solutions for large-scale data lake processing
  • Implement practical applications in real-world scenarios

Key Technologies

  • Large Language Models: GPT-based architectures, BERT variants
  • Computer Vision: ResNet, Vision Transformers, CLIP
  • Database Systems: Distributed data storage, query optimization
  • Machine Learning: Deep learning, transfer learning, multi-task learning

Current Progress

Current work focuses on a unified framework that processes text, images, and structured data together, so that a single analysis can draw on all three modalities.
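
To make the idea of handling several modalities in one place more concrete, here is a minimal Python sketch. It is not the project's framework. Every name in it (Item, serialize, embed, search, LAKE) is hypothetical, images are represented only by a list of tags, and plain token-count vectors stand in for learned encoders such as CLIP or BERT. It only illustrates the general pattern of mapping each modality into one shared vector space and ranking items by cosine similarity.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class Item:
    """One entry in a toy data lake (hypothetical structure)."""
    item_id: str
    modality: str    # "text", "image", or "table"
    content: object  # str for text, list of tags for image, dict for a table row


def serialize(item):
    """Flatten an item of any modality into lowercase tokens.

    Assumption: this stands in for a modality-specific encoder.
    """
    if item.modality == "text":
        text = item.content
    elif item.modality == "image":
        text = " ".join(item.content)  # image described by its tags
    elif item.modality == "table":
        text = " ".join(f"{k} {v}" for k, v in item.content.items())
    else:
        raise ValueError(f"unknown modality: {item.modality}")
    return text.lower().split()


def embed(tokens):
    """Sparse token-count vector; a placeholder for a learned embedding."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, items, k=2):
    """Return the k best (score, item_id) pairs across all modalities."""
    q = embed(query.lower().split())
    scored = [(cosine(q, embed(serialize(it))), it.item_id) for it in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up example data, one item per modality.
LAKE = [
    Item("doc-1", "text", "Quarterly sales report for the retail division"),
    Item("img-1", "image", ["cat", "sofa", "indoor"]),
    Item("row-1", "table", {"product": "laptop", "region": "europe", "sales": 1200}),
]

print(search("cat on sofa", LAKE, k=1))
print(search("sales europe", LAKE, k=2))
```

In this sketch the first query ranks the image item highest and the second ranks the table row above the text document, because all three modalities are compared in the same vector space. Replacing serialize and embed with real encoders would leave the search function unchanged.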

Supervision

This research is conducted under the guidance of Prof. Nan Tang at HKUST(GZ), with a focus on advancing the state of the art in data science and analytics.