This project has no flash-attn dependency, no custom Triton kernel. Everything is implemented with FlexAttention. The code is commented, the structure is flat. Read the accompanying write-up: vLLM ...
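As a rough illustration of what a FlexAttention-based implementation looks like, here is a minimal sketch using PyTorch's `torch.nn.attention.flex_attention`; the tensor shapes and the causal mask below are illustrative assumptions, not this project's actual configuration:

```python
# Minimal FlexAttention sketch: causal attention with no flash-attn and no custom Triton kernel.
# Shapes and the causal mask are illustrative placeholders, not this project's exact setup.
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, S, D = 2, 8, 1024, 64  # batch, heads, sequence length, head dim

def causal(b, h, q_idx, kv_idx):
    # Keep only keys at or before the query position.
    return q_idx >= kv_idx

# A block mask lets FlexAttention skip fully-masked blocks entirely.
block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S)

q = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
k = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, H, S, D, device="cuda", dtype=torch.float16)

out = flex_attention(q, k, v, block_mask=block_mask)  # (B, H, S, D)
```

In practice `flex_attention` is usually wrapped in `torch.compile` so the mask and score modifications are fused into a single kernel rather than run eagerly.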
Abstract: Membership attacks pose a major threat to secure machine learning, especially when the real data are sensitive. Models tend to be overconfident in predicting labels from the ...
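As background for how overconfidence enables such attacks, here is a minimal sketch of a generic confidence-thresholding membership inference attack; it is not this paper's method, and the synthetic data and the 0.9 threshold are placeholders:

```python
# Sketch of a confidence-threshold membership inference attack.
# Idea: a model is usually more confident on examples it was trained on,
# so a high maximum softmax probability suggests "member".
# The synthetic probabilities and the threshold are illustrative placeholders.
import numpy as np

def predict_membership(probs: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """probs: (N, num_classes) softmax outputs; returns 1 = predicted member."""
    confidence = probs.max(axis=1)
    return (confidence >= threshold).astype(int)

# Example: members get peaked (confident) distributions, non-members flatter ones.
member_probs = np.random.dirichlet(np.ones(10) * 0.1, size=500)
nonmember_probs = np.random.dirichlet(np.ones(10) * 1.0, size=500)

guesses = predict_membership(np.vstack([member_probs, nonmember_probs]))
labels = np.concatenate([np.ones(500), np.zeros(500)])
print(f"attack accuracy: {(guesses == labels).mean():.2f}")
```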
An enterprise-ready Agent-to-Agent (A2A) server that provides AI-powered capabilities through a standardized protocol.
Abstract: The overwhelming scale of large language models (LLMs) exhausts the on-device communication and computation resources in vehicular networks, limiting their application in performing inference ...