Projects

Secure Retrieval-Augmented Generation on LLMs

May 2024 - Present

This project addresses security vulnerabilities in Retrieval-Augmented Generation (RAG) systems built on Large Language Models (LLMs).

Challenges

RAG systems are vulnerable to poisoning attacks and jailbreaking attempts
Traditional defense mechanisms often compromise model performance
Need for robust security without sacrificing utility

Approach

Implemented a novel objective function combining adversarial loss, BERTScore, and harmful loss
Developed a robust RAG system for LLMs to counter universal attacks
Created adaptive defense mechanisms that respond to attack patterns
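
To illustrate how three terms like these can be folded into one training objective, here is a minimal sketch. The function name, the weights, and the choice to use (1 - BERTScore) as a penalty are assumptions made for illustration; the exact formulation used in the project is not given here.

```python
def combined_objective(adv_loss, bert_score, harmful_loss,
                       w_adv=1.0, w_sim=1.0, w_harm=1.0):
    """Weighted sum of three terms (hypothetical weights and sign convention).

    adv_loss     -- loss measured on adversarial (e.g. poisoned) inputs
    bert_score   -- similarity in [0, 1] between the output and a reference;
                    higher is better, so it is turned into a penalty below
    harmful_loss -- penalty for harmful content in the output
    """
    similarity_penalty = 1.0 - bert_score
    return w_adv * adv_loss + w_sim * similarity_penalty + w_harm * harmful_loss


# Example: a response that stays close to the reference (BERTScore 0.9)
print(combined_objective(adv_loss=0.5, bert_score=0.9, harmful_loss=0.2))
```

The similarity term is what would keep utility in the objective: a defense that blocks attacks by degrading every answer is penalized through a low BERTScore.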

Results

Successfully prevented over 90% of state-of-the-art poisoning attacks
Blocked jailbreaking attacks on RAG-based LLMs
Maintained high performance on legitimate queries

TrojAI: Detecting Trojans in AI Models

2021 - 2023

Participated in the NIST TrojAI competition, detecting backdoored or poisoned models among more than 1,000 adversarially trained models.

Challenges

Detecting subtle backdoors inserted during the training process
Distinguishing between benign and malicious model behaviors
Handling diverse model architectures and attack vectors

Approach

Developed a two-stage detection flow: polygon trigger detection followed by Instagram-filter trigger detection
Implemented advanced feature extraction techniques to identify suspicious patterns
Created ensemble methods to improve detection accuracy

Results

Achieved 2nd place out of 16 teams in Round 3 of the NIST TrojAI competition
Exceeded target performance with cross-entropy loss < 0.3465
Successfully identified various types of backdoor triggers
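
For context on the cross-entropy figure, the sketch below computes the average binary cross-entropy of a detector's predicted trojan probabilities. The function name and the sample predictions are hypothetical. A detector that always outputs 0.5 scores ln 2 ≈ 0.6931, so 0.3465 is roughly half of that uninformed baseline.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Average binary cross-entropy; labels are 0 (clean) or 1 (trojaned)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


# Uninformed detector: always predicts 0.5
baseline = binary_cross_entropy([0, 1], [0.5, 0.5])

# Hypothetical detector that is fairly confident and correct
confident = binary_cross_entropy([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.2])

print(baseline, confident)
```

Because the metric penalizes confident mistakes heavily, a low score requires well-calibrated probabilities, not just correct labels.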

Quantization of Deep Neural Networks

Feb 2021 - May 2021

Explored the application of Binary Neural Networks (BNNs) in oblivious inference to enable privacy-preserving machine learning.

Challenges

Traditional privacy-preserving inference methods are computationally expensive
Need for efficient cryptographic protocols for neural networks
Balancing privacy, performance, and accuracy

Approach

Enabled oblivious inference on Binary Neural Networks (BNNs)
Devised lightweight cryptographic protocols tailored to BNNs
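
One commonly cited reason BNNs suit lightweight protocols is that a dot product over {-1, +1} values reduces to XNOR followed by a bit count. The sketch below shows only that plaintext arithmetic; it does not implement any cryptographic protocol, and the helper names are hypothetical.

```python
def pack_bits(values):
    """Pack a list of +1/-1 values into an int (+1 -> bit 1, -1 -> bit 0)."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed {-1, +1} vectors of length n."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask   # bit is 1 where the signs agree
    matches = bin(xnor).count("1")     # popcount
    return 2 * matches - n             # agreements minus disagreements


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
print(binary_dot(pack_bits(a), pack_bits(b), len(a)))
```

The result equals the ordinary dot product of the two sign vectors, computed with bitwise operations only.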

Results

Achieved 2x faster inference compared to standard approaches
Up to 11x faster inference for binary networks
Maintained acceptable accuracy while providing privacy guarantees

View Paper

Optimization and Acceleration of Deep Learning

2021

Explored methods to optimize and accelerate deep neural networks for deployment on various hardware platforms, particularly resource-constrained devices.

Techniques Explored

Pruning: Reduced model size by removing sections with little classification power
Tucker Decomposition: Decomposed tensors into smaller core tensors and matrices
Quantization: Reduced precision of weights and activations
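
As a rough illustration of two of these techniques, the sketch below applies magnitude-based pruning and symmetric uniform quantization to a flat list of weights. Weight magnitude as a stand-in for "little classification power", the 8-bit default, and the function names are all assumptions; Tucker decomposition is omitted because it needs a linear algebra library.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    smallest = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k]
    pruned = list(weights)
    for i in smallest:
        pruned[i] = 0.0
    return pruned


def quantize_uniform(weights, num_bits=8):
    """Symmetric uniform quantization: returns integer codes and the scale."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


weights = [0.5, -0.01, 0.2, -0.9]
pruned = magnitude_prune(weights, sparsity=0.5)   # drops the two smallest
codes, scale = quantize_uniform(pruned)
restored = [c * scale for c in codes]             # dequantized approximation
print(pruned, codes, restored)
```

Pruning shrinks the number of nonzero parameters, while quantization shrinks the bits stored per parameter, so the two are often combined.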

Results

Significant reduction in model size while maintaining accuracy
Improved inference speed on resource-constrained devices
Reduced power consumption for IoT and edge deployments

Leakage Power Minimization

2020

Used Synopsys PrimeTime and PrimeTime PX (PT-PX) for timing and power analysis, and used the results to drive gate-sizing and Vt-swapping optimizations that reduce leakage power.

Approach

Applied a sensitivity function to balance leakage power against timing slack
Calculated each cell's sensitivity as ∆leakage_power / ∆slack
Sized cells in order of their sensitivity ranking
Applied the changes through an ECO flow in the Cadence Innovus Implementation System
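
The ranking step can be sketched as a greedy selection over candidate cell swaps. This is a simplified illustration, not the actual flow: it assumes a single shared slack budget instead of per-path timing, every slack cost must be positive, and the cell names and numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    cell: str
    leakage_saved: float  # ∆leakage_power if the swap is applied
    slack_cost: float     # ∆slack consumed by the swap (must be > 0)


def sensitivity(c):
    """Leakage saved per unit of timing slack given up."""
    return c.leakage_saved / c.slack_cost


def select_swaps(candidates, slack_budget):
    """Apply the most sensitive swaps first while slack remains."""
    chosen = []
    remaining = slack_budget
    for c in sorted(candidates, key=sensitivity, reverse=True):
        if c.slack_cost <= remaining:
            chosen.append(c.cell)
            remaining -= c.slack_cost
    return chosen


candidates = [
    Candidate("U1", leakage_saved=10.0, slack_cost=0.10),
    Candidate("U2", leakage_saved=5.0, slack_cost=0.20),
    Candidate("U3", leakage_saved=8.0, slack_cost=0.05),
]
print(select_swaps(candidates, slack_budget=0.2))
```

In a real design, slack would be re-evaluated by the timing tool after each batch of swaps, since changing one cell affects every path through it.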

Results

Benchmark	Leakage Power Reduction
usb_phy	82.92%
aes_cipher_top	73.68%
mpeg2_top	71.90%

Xinqiao Zhang
