Trending Research
-----------------
[VGGT: Visual Geometry Grounded Transformer](/paper/vggt-visual-geometry-grounded-transformer)
==============================================================================================
[facebookresearch/vggt](https://github.com/facebookresearch/vggt) • 14 Mar 2025
We present VGGT, a feed-forward neural network that directly infers all key 3D attributes of a scene, including camera parameters, point maps, depth maps, and 3D point tracks, from one, a few, or hundreds of its views.
[Depth Estimation](/task/depth-estimation) [Novel View Synthesis](/task/novel-view-synthesis) [**+2**](/paper/vggt-visual-geometry-grounded-transformer#tasks)
2,243 stars • 11.37 stars / hour
[Paper](/paper/vggt-visual-geometry-grounded-transformer)
[Code](/paper/vggt-visual-geometry-grounded-transformer#code)
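The pitch above is essentially an interface claim: a single feed-forward pass over anywhere from one to hundreds of views returns camera parameters, depth maps, point maps, and point tracks together, with no per-scene optimization. As a minimal sketch of that input/output contract only (the module and head names below are hypothetical stand-ins, not the facebookresearch/vggt API):

```python
# Hypothetical sketch of a single-pass, multi-view 3D interface; the real
# VGGT is a large transformer, and none of these names come from its repo.
import torch
import torch.nn as nn

class FeedForward3D(nn.Module):
    """One forward pass over N views -> per-view 3D attributes."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.backbone = nn.Conv2d(3, dim, kernel_size=3, padding=1)
        self.depth_head = nn.Conv2d(dim, 1, kernel_size=1)   # per-pixel depth
        self.point_head = nn.Conv2d(dim, 3, kernel_size=1)   # per-pixel 3D points
        self.camera_head = nn.Linear(dim, 9)                 # pose + intrinsics

    def forward(self, views: torch.Tensor) -> dict:
        b, n, c, h, w = views.shape                 # n can be 1, a few, or hundreds
        feats = self.backbone(views.flatten(0, 1))  # (b*n, dim, h, w)
        return {
            "depth":  self.depth_head(feats).view(b, n, h, w),
            "points": self.point_head(feats).view(b, n, 3, h, w),
            "camera": self.camera_head(feats.mean(dim=(-2, -1))).view(b, n, 9),
        }

preds = FeedForward3D()(torch.randn(1, 4, 3, 32, 32))  # one scene, four views
print({k: tuple(v.shape) for k, v in preds.items()})
```

The point of the sketch is the contract: every 3D attribute comes out of the same pass, for a variable number of input views.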
[Neural Fields with Thermal Activations for Arbitrary-Scale Super-Resolution](/paper/neural-fields-with-thermal-activations-for)
================================================================================================================================
[prs-eth/thera](https://github.com/prs-eth/thera) • 29 Nov 2023
We present a novel way to design neural fields such that points can be queried with an adaptive Gaussian PSF, so as to guarantee correct anti-aliasing at any desired output resolution.
[Image Super-Resolution](/task/image-super-resolution)
525 stars • 2.50 stars / hour
[Paper](/paper/neural-fields-with-thermal-activations-for)
[Code](/paper/neural-fields-with-thermal-activations-for#code)
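The mechanism named in that sentence is the interesting part: the field is read through a Gaussian PSF whose width tracks the output pixel footprint, rather than being point-sampled. In the paper this filtering follows analytically from the thermal activations; the toy sketch below only approximates a PSF-filtered read by Monte Carlo sampling over a made-up field, so treat it as the idea, not the method:

```python
# Toy illustration of PSF-filtered field queries (not the prs-eth/thera code):
# averaging samples from N(xy, sigma^2 I) approximates a Gaussian-blurred read.
import torch

def field(xy: torch.Tensor) -> torch.Tensor:
    """A toy continuous 'image' with high-frequency content, queried at (..., 2)."""
    return torch.sin(40 * xy[..., 0]) * torch.cos(40 * xy[..., 1])

def query_psf(xy: torch.Tensor, sigma: float, n_samples: int = 256) -> torch.Tensor:
    """Monte Carlo estimate of the field convolved with a Gaussian PSF at xy."""
    offsets = sigma * torch.randn(n_samples, *xy.shape)
    return field(xy + offsets).mean(dim=0)

coords = torch.rand(8, 2)                 # query points in [0, 1]^2
coarse = query_psf(coords, sigma=0.05)    # wide PSF for a low-res output grid
fine = query_psf(coords, sigma=0.005)     # narrow PSF for a high-res output grid
print(coarse.abs().mean().item(), fine.abs().mean().item())
```

Widening sigma suppresses frequencies the coarse grid cannot represent, which is the anti-aliasing behavior the abstract refers to; the high-frequency toy field comes through at small sigma and is damped at large sigma.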
[TxAgent: An AI Agent for Therapeutic Reasoning Across a Universe of Tools](/paper/txagent-an-ai-agent-for-therapeutic-reasoning)
=================================================================================================================================
[mims-harvard/TxAgent](https://github.com/mims-harvard/TxAgent) • 14 Mar 2025
TxAgent selects tools based on task objectives and executes structured function calls to solve therapeutic tasks that require clinical reasoning and cross-source validation, as sketched below.
[AI Agent](/task/ai-agent) [Decision Making](/task/decision-making)
269 stars • 2.49 stars / hour
[Paper](/paper/txagent-an-ai-agent-for-therapeutic-reasoning)
[Code](/paper/txagent-an-ai-agent-for-therapeutic-reasoning#code)
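Two mechanisms in that sentence lend themselves to a sketch: selecting a tool against a task objective, and executing a structured function call. The registry, routing rule, and tool names below are invented for illustration and are not the mims-harvard/TxAgent API:

```python
# Invented mini tool registry and router; illustrative only, not TxAgent.
from typing import Any, Callable

TOOLS: dict[str, Callable[..., str]] = {
    "drug_interactions": lambda drug: f"interaction records for {drug}: ...",
    "dosage_lookup": lambda drug, age: f"dosing guidance for {drug}, age {age}: ...",
}

def select_tool(objective: str) -> str:
    """Toy router; the real agent lets a model choose tools from its toolbox."""
    return "dosage_lookup" if "dose" in objective else "drug_interactions"

def execute_call(call: dict[str, Any]) -> str:
    """Run a structured call of the form {'tool': name, 'args': {...}}."""
    return TOOLS[call["tool"]](**call["args"])

objective = "recommend a dose of metformin for a 70-year-old patient"
call = {"tool": select_tool(objective), "args": {"drug": "metformin", "age": 70}}
print(execute_call(call))
```

The structured-call dict is the key design choice: because arguments are explicit and named, each call can be validated and cross-checked against other sources before its result enters the reasoning chain.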
[ReasonGraph: Visualisation of Reasoning Paths](/paper/reasongraph-visualisation-of-reasoning-paths)
====================================================================================================
[ZongqianLi/ReasonGraph](https://github.com/ZongqianLi/ReasonGraph) • 6 Mar 2025
The reasoning processes of Large Language Models (LLMs) are challenging to analyze due to their complexity and the lack of organized visualization tools.
344 stars • 2.14 stars / hour
[Paper](/paper/reasongraph-visualisation-of-reasoning-paths)
[Code](/paper/reasongraph-visualisation-of-reasoning-paths#code)
[Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering](/paper/reinforcement-learning-outperforms-supervised)
===========================================================================================================================================================
[xiaomi-research/r1-aqa](https://github.com/xiaomi-research/r1-aqa) • 14 Mar 2025
Recently, reinforcement learning (RL) has been shown to greatly enhance the reasoning capabilities of large language models (LLMs), and RL-based approaches have been progressively applied to visual multimodal tasks.
[Audio Question Answering](/task/audio-question-answering) [Question Answering](/task/question-answering) [**+1**](/paper/reinforcement-learning-outperforms-supervised#tasks)
168 stars • 1.81 stars / hour
[Paper](/paper/reinforcement-learning-outperforms-supervised)
[Code](/paper/reinforcement-learning-outperforms-supervised#code)
[Step-Video-TI2V Technical Report: A State-of-the-Art Text-Driven Image-to-Video Generation Model](/paper/step-video-ti2v-technical-report-a-state-of)
======================================================================================================================================================
[stepfun-ai/step-video-ti2v](https://github.com/stepfun-ai/step-video-ti2v) • 14 Mar 2025
We present Step-Video-TI2V, a state-of-the-art text-driven image-to-video generation model with 30B parameters, capable of generating videos up to 102 frames based on both text and image inputs.
[Image to Video Generation](/task/image-to-video)
86 stars • 1.62 stars / hour
[Paper](/paper/step-video-ti2v-technical-report-a-state-of)
[Code](/paper/step-video-ti2v-technical-report-a-state-of#code)
[KBLaM: Knowledge Base augmented Language Model](/paper/kblam-knowledge-base-augmented-language-model)
======================================================================================================
[microsoft/KBLaM](https://github.com/microsoft/KBLaM) • 14 Oct 2024
In this paper, we propose Knowledge Base augmented Language Model (KBLaM), a new method for augmenting Large Language Models (LLMs) with external knowledge.
[8k](/task/8k) [In-Context Learning](/task/in-context-learning) [**+6**](/paper/kblam-knowledge-base-augmented-language-model#tasks)
175 stars • 1.61 stars / hour
[Paper](/paper/kblam-knowledge-base-augmented-language-model)
[Code](/paper/kblam-knowledge-base-augmented-language-model#code)
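The snippet only says the method augments an LLM with external knowledge; at a high level, the mechanism is to encode each knowledge-base entry as a continuous key/value vector pair that attention can read alongside the normal context. The encoder and dimensions below are toy stand-ins, not the microsoft/KBLaM implementation:

```python
# Toy sketch: knowledge-base triples become key/value vectors that a query
# token attends over as extra memory. Not the microsoft/KBLaM code.
import torch
import torch.nn.functional as F

dim = 16
triples = [
    ("aspirin", "treats", "headache"),
    ("metformin", "treats", "type 2 diabetes"),
]

def encode(triple: tuple[str, str, str]) -> tuple[torch.Tensor, torch.Tensor]:
    """Stand-in encoder: a deterministic random key/value pair per triple."""
    g = torch.Generator().manual_seed(abs(hash(triple)) % (2**31))
    return torch.randn(dim, generator=g), torch.randn(dim, generator=g)

kb_keys, kb_vals = (torch.stack(t) for t in zip(*(encode(t) for t in triples)))

query = torch.randn(1, dim)                      # one query token
scores = query @ kb_keys.T / dim**0.5            # (1, num_triples)
readout = F.softmax(scores, dim=-1) @ kb_vals    # (1, dim) knowledge read
print(F.softmax(scores, dim=-1), readout.shape)
```

In the paper's framing, keeping the knowledge base in key/value vectors rather than in the prompt is what allows entries to be updated without retraining the model.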
[Data Formulator 2: Iterative Creation of Data Visualizations, with AI Transforming Data Along the Way](/paper/data-formulator-2-iteratively-creating-rich)
===========================================================================================================================================================
[microsoft/data-formulator](https://github.com/microsoft/data-formulator) • 28 Aug 2024
Data analysts often need to iterate between data transformations and chart designs to create rich visualizations for exploratory data analysis.
[Code Generation](/task/code-generation) [Navigate](/task/navigate)
9,873 stars • 1.53 stars / hour
[Paper](/paper/data-formulator-2-iteratively-creating-rich)
[Code](/paper/data-formulator-2-iteratively-creating-rich#code)
[Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens](/paper/2503-01710)
======================================================================================================================
[sparkaudio/spark-tts](https://github.com/sparkaudio/spark-tts) • 3 Mar 2025
Recent advancements in large language models (LLMs) have driven significant progress in zero-shot text-to-speech (TTS) synthesis.
[Attribute](/task/attribute) [Text to Speech](/task/text-to-speech) [**+1**](/paper/2503-01710#tasks)
5,259 stars • 1.53 stars / hour
[Paper](/paper/2503-01710)
[Code](/paper/2503-01710#code)
[LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds](/paper/lhm-large-animatable-human-reconstruction)
===================================================================================================================================
[aigc3d/LHM](https://github.com/aigc3d/LHM) • 13 Mar 2025
Animatable 3D human reconstruction from a single image is a challenging problem due to the ambiguity in decoupling geometry, appearance, and deformation.
[3D Human Reconstruction](/task/3d-human-reconstruction)
314 stars • 1.39 stars / hour
[Paper](/paper/lhm-large-animatable-human-reconstruction)
[Code](/paper/lhm-large-animatable-human-reconstruction#code)