Research & Papers · Other Companies

New AI model generates 45-minute lip-synced video from one photo and runs in real time
Featured
Research & Papers

A single image becomes a talking character: LPM 1.0 generates real-time video with lip sync, facial expressions, and emotional reactions. For now, it remains a research project.

The Decoder
A Coding Implementation of MolmoAct for Depth-Aware Spatial Reasoning, Visual Trajectory Tracing, and Robotic Action Prediction
Research & Papers

This tutorial provides a practical implementation guide for MolmoAct, an action-reasoning model that performs...

MarkTechPost
Researchers define what counts as a world model and text-to-video generators do not
Research & Papers

An international research team introduces OpenWorldLib to standardize world model research and establish clear...

The Decoder
Researchers from MIT, NVIDIA, and Zhejiang University Propose TriAttention: A KV Cache Compression Method That Matches Full Attention at 2.5× Higher Throughput
Research & Papers

Researchers from MIT, NVIDIA, and Zhejiang University developed TriAttention, a KV cache compression method that...

MarkTechPost
AI models would rather guess than ask for help, researchers find
Research & Papers

Researchers tested 22 multimodal language models using ProactiveBench and found that almost none ask for help when...

The Decoder
How Knowledge Distillation Compresses Ensemble Intelligence into a Single Deployable AI Model
Research & Papers

Knowledge Distillation is a technique that compresses the intelligence of multiple ensemble AI models into a single,...

MarkTechPost
Alibaba’s Tongyi Lab Releases VimRAG: a Multimodal RAG Framework that Uses a Memory Graph to Navigate Massive Visual Contexts
Research & Papers

Alibaba's Tongyi Lab has released VimRAG, a multimodal RAG framework designed to handle visual data in...

MarkTechPost
A Coding Guide to Markerless 3D Human Kinematics with Pose2Sim, RTMPose, and OpenSim
Research & Papers

A tutorial demonstrating how to build and execute a complete markerless 3D human kinematics pipeline using Pose2Sim,...

MarkTechPost
NVIDIA Releases AITune: An Open-Source Inference Toolkit That Automatically Finds the Fastest Inference Backend for Any PyTorch Model
Research & Papers

NVIDIA has released AITune, an open-source inference toolkit that automatically selects the fastest inference backend...

MarkTechPost
Five AI Compute Architectures Every Engineer Should Know: CPUs, GPUs, TPUs, NPUs, and LPUs Compared
Research & Papers

The article examines five different AI compute architectures (CPUs, GPUs, TPUs, NPUs, and LPUs), explaining how modern...

MarkTechPost
New technique makes AI models leaner and faster while they’re still learning
Research & Papers

Researchers have developed a new technique using control theory to reduce unnecessary complexity in AI models during...

MIT News AI
Sigmoid vs ReLU Activation Functions: The Inference Cost of Losing Geometric Context
Research & Papers

This article analyzes the geometric properties of sigmoid versus ReLU activation functions in deep neural networks,...

MarkTechPost
Databricks co-founder wins prestigious ACM award, says ‘AGI is here already’
Research & Papers

Matei Zaharia, Databricks co-founder, has won the top honor from the Association for Computing Machinery. He is working...

TechCrunch AI
An Implementation Guide to Running NVIDIA Transformer Engine with Mixed Precision, FP8 Checks, Benchmarking, and Fallback Execution
Research & Papers

This tutorial provides a practical implementation guide for NVIDIA Transformer Engine, focusing on mixed-precision...

MarkTechPost
Import AI 452: Scaling laws for cyberwar; rising tides of AI automation; and a puzzle over GDP forecasting
Research & Papers

The article discusses scaling laws for cyberwarfare applications of AI, explores the rising tide of AI automation's...

Import AI
Alibaba's Qwen team built HopChain to fix how AI vision models fall apart during multi-step reasoning
Research & Papers

Alibaba's Qwen team developed HopChain, a framework that improves AI vision models' multi-step reasoning by breaking...

The Decoder
How to Build a Netflix VOID Video Object Removal and Inpainting Pipeline with CogVideoX, Custom Prompting, and End-to-End Sample Inference
Research & Papers

This tutorial demonstrates how to build and implement Netflix's VOID model, an advanced video object removal and...

MarkTechPost
Inside the Creative Artificial Intelligence (AI) Stack: Where Human Vision and Artificial Intelligence Meet to Design Future Fashion
Research & Papers

The article discusses how artificial intelligence, including algorithms, neural networks, and machine learning, is...

MarkTechPost
Netflix open-sources VOID, an AI framework that erases video objects and rewrites the physics they left behind
Research & Papers

Netflix has open-sourced VOID, an AI framework capable of removing objects from videos while automatically adjusting...

The Decoder
Know3D lets users control the hidden back side of 3D objects with text prompts
Research & Papers

Know3D is a research project that uses large language models to enable users to control the appearance of hidden...

The Decoder
