Responsibilities
Team Introduction:
The Risk Control R&D Team is dedicated to addressing various challenges posed by malicious activities across products. Their work spans multiple domains of risk governance such as content, transactions, traffic, and accounts. By leveraging technologies such as machine learning, multimodal models, and large models, the team strives to understand user behaviors and content, thereby identifying potential risks and issues. By continuously deepening their understanding of business and user behaviors, the team drives innovation in models and algorithms with an aim to build an industry-leading risk control algorithm system.
Project Objectives:
Optimize and enhance large models' ability to understand and reason about structured data (sequential data, graph data) based on risk control data.
Project Necessity:
Data in risk control scenarios is primarily structured, whereas the recent gains of large models have come mainly in understanding text and images. Integrating structured data that is neither text nor image with large models, so that the models can comprehend it, remains an industry-wide challenge. This involves three key difficulties:
1. How to effectively align structured information with the NLP semantic space, allowing models to simultaneously understand both data structure and semantic information.
2. How to use appropriate instructions to enable large models to interpret structural information in structured data.
3. How to endow large language models with step-by-step reasoning capabilities for graph learning downstream tasks, thereby inferring more complex relationships and attributes.
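As a rough illustration of the second difficulty, one simple baseline is to serialize a graph into a text instruction. The sketch below is only an assumption about what such a baseline might look like, not the team's method; the function name `graph_to_prompt`, the prompt wording, and the toy account/device data are all hypothetical.

```python
from typing import Dict, List, Tuple


def graph_to_prompt(
    nodes: Dict[str, str],
    edges: List[Tuple[str, str, str]],
    question: str,
) -> str:
    """Serialize a small labeled graph into a plain-text instruction prompt."""
    lines = ["You are given a graph.", "Nodes:"]
    # List nodes in sorted order so the prompt is deterministic.
    for node_id in sorted(nodes):
        lines.append(f"- {node_id} (type: {nodes[node_id]})")
    lines.append("Edges:")
    # Each edge is (source, relation, destination).
    for src, relation, dst in edges:
        lines.append(f"- {src} --{relation}--> {dst}")
    lines.append(f"Question: {question}")
    # Ask for explicit intermediate steps (the third difficulty).
    lines.append("Answer step by step.")
    return "\n".join(lines)


# Hypothetical toy data: two accounts that share one device.
nodes = {"acct_1": "account", "acct_2": "account", "dev_9": "device"}
edges = [
    ("acct_1", "logged_in_from", "dev_9"),
    ("acct_2", "logged_in_from", "dev_9"),
]
prompt = graph_to_prompt(nodes, edges, "Which accounts share a device?")
print(prompt)
```

Plain serialization of this kind discards most structural signal on large graphs, which is part of why aligning structure with the semantic space (the first difficulty) remains open.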
Project Content:
Current industry explorations of structured data include:
1. Graph data understanding (e.g., GraphGPT: Enabling large models to read graph data, SIGIR 2024).
2. Graph data RAG (e.g., Microsoft GraphRAG: Unlocking LLM discovery on narrative private data).
3. Sequential data understanding (e.g., StructGPT: A large model reasoning framework for structured data, EMNLP 2023).
However, current efforts mainly focus on understanding single-type structured data, and several challenges remain in risk control scenarios:
1. How to effectively fuse and understand various types of structured data, especially the integration of graph and sequential data.
2. How to address the challenges described in the "Project Necessity" section, particularly step-by-step reasoning for downstream tasks, which is currently underexplored, especially for reasoning over sequential data.
Research Directions:
1. Large model structured data understanding
2. Large model structured data RAG
3. Large model chain-of-thought reasoning
Qualifications
1. Holds or is currently pursuing a doctoral degree in computer science, cybersecurity, artificial intelligence, or a related field.
2. Excellent coding skills and a solid foundation in data structures and algorithms; proficiency in Python is required, and familiarity with PyTorch or TensorFlow (TF) is preferred.
3. Outstanding ability to define, analyze, and solve problems; candidates with publications in CCF-A category journals or top conferences such as AAAI, NeurIPS, SIGKDD, and SIGIR are preferred.
4. Strong resilience under pressure and excellent communication and teamwork skills; passionate about technology, willing to embrace challenges with the team, and driven to innovate.