Multi-Modal Data Lake Analytics with LLMs
This research project explores approaches to analyzing complex, multi-modal data lakes using Large Language Models (LLMs). It aims to bridge the gap between different data modalities, enabling unified information retrieval and processing across text, images, and structured data.
Research Objectives
- Develop novel architectures for multi-modal data integration
- Create efficient algorithms for cross-modal information retrieval
- Design scalable solutions for large-scale data lake processing
- Implement practical applications in real-world scenarios
Key Technologies
- Large Language Models: GPT-based architectures, BERT variants
- Computer Vision: ResNet, Vision Transformers, CLIP (see the cross-modal retrieval sketch after this list)
- Database Systems: Distributed data storage, query optimization
- Machine Learning: Deep learning, transfer learning, multi-task learning
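As one concrete illustration of how these pieces can fit together, the sketch below shows how CLIP embeddings could support cross-modal retrieval over a data lake, ranking stored images against a natural-language query. The model checkpoint, helper function, and scoring logic are illustrative assumptions, not the project's actual pipeline.

```python
# Minimal sketch of cross-modal retrieval with CLIP (illustrative assumptions only).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rank_images_by_text(query: str, image_paths: list[str]) -> list[tuple[str, float]]:
    """Rank candidate images from a data lake against a natural-language query."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(text=[query], images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_text: similarity of the single query against each candidate image.
    scores = outputs.logits_per_text.squeeze(0).tolist()
    return sorted(zip(image_paths, scores), key=lambda x: x[1], reverse=True)
```

In practice, the image embeddings would be precomputed and stored in a vector index rather than encoded per query, but the ranking idea is the same.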
Current Progress
Work is currently focused on a unified framework that processes text, images, and structured data simultaneously, enabling more comprehensive data analytics and insight generation over the data lake.
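To make the idea concrete, here is a minimal sketch of one way such a framework could place all three modalities in a shared embedding space. The encoder choice (CLIP) and the table-linearization step are assumptions made for illustration, not the framework's actual design.

```python
# Hypothetical sketch: embedding text, images, and table rows into one vector space.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_text(text: str) -> torch.Tensor:
    """Encode free text with the shared text encoder."""
    inputs = processor(text=[text], return_tensors="pt", padding=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).squeeze(0)

def embed_image(path: str) -> torch.Tensor:
    """Encode an image with the shared image encoder."""
    inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return torch.nn.functional.normalize(feats, dim=-1).squeeze(0)

def embed_row(row: dict) -> torch.Tensor:
    """Linearize a structured record into text so it shares the same embedding space."""
    return embed_text("; ".join(f"{k}: {v}" for k, v in row.items()))
```

Because all three functions return vectors in the same space, a single similarity search can answer queries that mix modalities, which is the kind of unified analytics the framework targets.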
Supervision
This research is conducted under the guidance of Prof. Nan Tang at HKUST(GZ), focusing on advancing the state-of-the-art in data science and analytics.