At Use-AI.rs, we are building an open-source AI framework in Rust. Our goal is to create a concurrent, locally hostable AI agent for practical applications in production environments.

The different layers of abstraction create spaces for different professions. For Business Informatics Specialists, Mathematicians, and Computer Scientists, Use-AI.rs provides a comprehensive set of tools and libraries to build and extend AI applications.

The entrypoint of Use-AI.rs is the use_ai crate. It provides a toolset for Business Informatics Specialists and Mathematicians to test and deploy AI applications with ease. use_ai is also a good starting point for Computer Scientists, since it shows how the underlying architecture of the framework is used.

The underlying architecture is mostly of interest to Computer Scientists and can be found in the core crate, whose modules are described in the following sections.

At lib-calculator::model we find a space that is also interesting for Mathematicians, since it provides a set of easy-to-use tensor operations that can be used to build powerful AI models.

Tooling

Test AI

Use AI

Current

Use-AI is just a basic CLI tool written with Inquire. Inquire is used because we just want a nice little entry point for people who don't know the project and want to find out what can be done with their data using Use-AI.rs. Also, Inquire provides an interface that makes it easy to find the underlying functions, so you have an example of how Use-AI.rs can be used in production.

Future

The Use-AI tool is just the first tool we want to provide. In the future, the goal is to provide a whole network server layer. For the server layer we are thinking about a TCP server. The server should provide a load-test tool for a hypothetical query server. But since this is a plan for the far future, these are only ideas. If someone is found who can implement something like that, it may be implemented in parallel.

Store

Current

lib-store is the first library of the Core of Use-AI.rs. Here we find ai_config.rs and mangodb.rs.

Future

Not much will happen here. Only when the config is expanded will we see some changes. Additional DB request functions and some functions regarding configs will be added when needed.

Config

ai_config.rs is the entrypoint for the configuration file.

  • Models will be provided by lib-calculator, and
  • DataSection will be provided by lib-stage.

A hypothetical sketch of such a configuration entrypoint follows below.
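
Everything in the sketch (the serde derive and all field and type names beyond Models and DataSection) is an assumption for illustration only, not the actual contents of ai_config.rs:

#![allow(unused)]
fn main() {
use serde::Deserialize;

// Hypothetical sketch of a configuration entrypoint; not the real ai_config.rs.
#[derive(Debug, Deserialize)]
pub struct AiConfig {
    // Model definitions, consumed by lib-calculator.
    pub models: Vec<Model>,
    // Data section, consumed by lib-stage.
    pub data_section: DataSection,
}

#[derive(Debug, Deserialize)]
pub struct Model {
    pub name: String,
    pub operations: Vec<String>,
}

#[derive(Debug, Deserialize)]
pub struct DataSection {
    pub source: String,
}
}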

MongoDB

mangodb.rs operates MongoDB synchronously. Since we expect this DB to run on a system that also handles our data in parallel on the CPU, and since we don't have to worry about huge DB requests, it was decided that we only communicate with the DB synchronously, so that we don't spawn threads that would block the threads of performance-critical features.
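
As a minimal sketch of this synchronous access pattern (assuming the mongodb crate's sync feature and its 2.x-style API; the database name, collection name, and document shape are placeholders, not the actual contents of mangodb.rs):

#![allow(unused)]
fn main() {
use mongodb::bson::doc;
use mongodb::sync::Client;

fn load_config_names() -> mongodb::error::Result<Vec<String>> {
    // Blocking client: no extra threads are spawned for DB traffic.
    let client = Client::with_uri_str("mongodb://127.0.0.1:27017/")?;
    let configs = client
        .database("use_ai")
        .collection::<mongodb::bson::Document>("configs");

    // Small, synchronous requests; performance-critical work keeps its own threads.
    let mut names = Vec::new();
    for document in configs.find(doc! {}, None)? {
        if let Ok(name) = document?.get_str("name") {
            names.push(name.to_string());
        }
    }
    Ok(names)
}
}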

Stage

Current

Here is where the construction site begins. Currently, we only have a CSV pipeline to show what the plan is for lib-stage. This pipeline shows how we want to handle data on a more abstract layer. For now, we just input a CSV file and handle it from there with different operations on columns or on the whole data. This was chosen because we want to reach a first complete pass through Use-AI.rs using a simple XGBoost algorithm for now. Until we have expanded that simple XGBoost, it is called GBDT here.
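
To make "different operations on columns or on the whole data" concrete, here is a small, hypothetical staging sketch. It assumes the csv crate; the zero-default mapping for non-numeric cells and the max-scaling pass are placeholders for illustration, not the actual lib-stage pipeline:

#![allow(unused)]
fn main() {
use std::error::Error;

fn stage_csv(path: &str) -> Result<Vec<Vec<f32>>, Box<dyn Error>> {
    let mut reader = csv::Reader::from_path(path)?;

    // Read every record into a numeric table; non-numeric cells get a placeholder value.
    let mut table = Vec::new();
    for record in reader.records() {
        let row: Vec<f32> = record?
            .iter()
            .map(|cell| cell.parse::<f32>().unwrap_or(0.0))
            .collect();
        table.push(row);
    }

    // Example of a whole-data operation: scale every value by the global maximum.
    let max = table.iter().flatten().cloned().fold(f32::MIN, f32::max);
    if max > 0.0 {
        for row in &mut table {
            for value in row.iter_mut() {
                *value /= max;
            }
        }
    }
    Ok(table)
}
}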

Future

For DQNs and Q-learning we need to expand lib-stage. The goal is not to only use files; instead, this will be used to configure a data stream, which will first enable getting input from program-external drivers and will also add the possibility of communication between models. But to be able to implement such features, we first need to focus on lib-calculator. The functions as they are will be renamed in the near future.

Config

Data Input

Transfer

Data Output

Calculator

In lib-calculator the most crucial parts are the model module and the operator. In model we can easily build models with the Operation trait, which is supported by the operator module and trait. So, the operator provides functions through the PipelineExec and PipelinePush traits. The operator will be executed on the GPU's kernel. Every model needs to implement the Operation trait.

Config

Model

When you want to integrate a new model you can do that by doing these three things:

  1. Build a struct which includes tuples of MetaData and Handle. This structure is our abstract representation of the shared memory on the GPU. The macro #[operator] above your struct will allocate a Tensor on the GPU and leave you a TensorHandleRef to work with in the operation (a short sketch follows below).
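
As a sketch of such a struct (mirroring the GbdtOperator shown in the blog further down; the struct and field names here are only an example, and MetaData, Handle, and #[operator] come from lib-calculator and the proc-macro crate):

#![allow(unused)]
fn main() {
// Abstract representation of the shared GPU memory for one model.
// #[operator] allocates the tensors on the GPU and hands back TensorHandleRefs.
#[operator]
pub struct MyModelOperator {
    pub target: (MetaData, Handle),
    pub table: (MetaData, Handle),
    pub buffer: (MetaData, Handle),
}
}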

Operation

Proc Macros

Welcome, Dev!

Here you can find the blog of Use-AI.rs, where the developers who are currently actively developing Use-AI.rs discuss the technical state of the project.

Introducing Action Space

Maintainer's Comment | 17.03.2025

Since I had a major illness, I was unable to post this blog as planned. Also, I wasn't able to code at the pace and with the quality I intended, which led to incomplete code. The reason why I still decided to commit the code was that you guys can see what I've been working on and how I've been progressing. I finally got around to writing about Action Space, a procedural macro designed to compose Tensor operation sequences in Rust. This is still not working properly, but I hope to fix it soon.

Introduction

When working with AI models, especially in reinforcement learning and ensemble learning, you often end up chaining multiple operations on Tensors. Action Space is a Rust procedural macro that makes composing these tensor operations straightforward. With Action Space, the goal was to abstract Tensor operation sequences so that they can be easily composed and reused. This is intended to make it easier to manage and maintain complex Tensor operations.

Example: Composing Tensor Operations

Here's a basic example demonstrating how Action Space helps structure Tensor computations in Rust:

#![allow(unused)]
fn main() {
use cubecl::wgpu::WgpuRuntime;
use lib_proc_macros::action_space;

fn some() {
    let input: TensorHandleRef<WgpuRuntime> = build_tensor::<WgpuRuntime>();

    let result: TensorHandleRef<WgpuRuntime> = action_space!(
        WgpuRuntime,
        (input, ExecMean, output),
        ((input, output), PrepResiduals),
        (ExecSum),
        (PrepSquare),
        (ExecProd),
    );
}
}

Breaking It Down

  • The macro takes a Runtime to define the environment.
  • The first tuple takes the input before the operations are applied.
  • Operations from the PipelineExec and PipelinePush traits are applied sequentially.
  • If we want to handle the output of a specific operation differently, we can add a variable name in the tuple behind the operation.
  • When the operation needs a non-default input, we can specify it with a tuple before the operation.
  • The default case takes the output of the previous operation and passes it to the next operation.

For now, the goal here is that we can input either a 1D, 2D, or 3D Tensor.

  • A 1D Tensor results in a Scalar.
  • A 2D Tensor results in a Vector.
  • A 3D Tensor results in a Matrix.

This isn't working properly yet, and we also do not handle Runtime and Client as intended, so the plan is to fix and improve this feature next.

Short Update

Maintainer's Comment | 12.02.2025

Over the weekend I implemented the first PipelineExec to show how we want to use cubecl-reduce for the first iteration of Use-AI.rs. In the future we want to build our own reduce strategies and configurations, but this will be good enough for now. With PipelinePush we will see the first kernels, which fill tensors depending on the type of PipelinePush needed for future execution through PipelineExec. With these reusable pipeline operations we can easily compose different AI models with GPU optimisation. This most likely means that we no longer aim to lower all of lib-calculator onto the GPU; more likely, lib-calculator::operation will be the only module that runs completely in the GPU kernel.
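
The real signatures of PipelineExec and PipelinePush are not shown here and are still evolving, so the following is a purely hypothetical, self-contained sketch of the composition idea. Only the trait names come from the text above; the placeholder Tensor type, the method names, and the ExecMean example are assumptions:

#![allow(unused)]
fn main() {
// Placeholder standing in for the real GPU tensor handle (TensorHandleRef).
pub struct Tensor(pub Vec<f32>);

// Hypothetical shape of an execution step: consumes a tensor and produces a reduced one.
pub trait PipelineExec {
    fn exec(&self, input: Tensor) -> Tensor;
}

// Hypothetical shape of a push step: fills or prepares a tensor for a later PipelineExec.
pub trait PipelinePush {
    fn push(&self, input: Tensor) -> Tensor;
}

// Example of a reusable pipeline operation: a mean reduction.
pub struct ExecMean;

impl PipelineExec for ExecMean {
    fn exec(&self, input: Tensor) -> Tensor {
        let mean = input.0.iter().sum::<f32>() / input.0.len() as f32;
        Tensor(vec![mean])
    }
}
}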

Apart from this, I decided this week to use XGBoost: A Scalable Tree Boosting System (Tianqi Chen and Carlos Guestrin, 10 Jun 2016) as the paper that will guide us in implementing XGBoost.

Next week we will see more code again!

New Blog and Doc Updates

Maintainer's Comment | 03.02.2025

Welcome to the new Blog

As you may have noticed, the new blog's place will also be used as the entry point for the Use-AI.rs documentation. This was decided because it's easier to maintain good documentation within the codebase than in an external application. So, welcome to the new documentation! This blog will be useful to communicate changes and discuss development decisions for future users of the framework.

State

Today I will go through the whole project and discuss the decisions made. This will be structured from the highest abstraction used in Use-AI.rs down to the lowest abstraction layers, which reflects the chosen architecture. These discussions will also be used in the Book to introduce the idea of each component.

Use-AI

Current

Use-AI is just a basic CLI tool written with Inquire. Inquire is used because we just want a nice little entry point for people who don't know the project and want to find out what can be done with their data using Use-AI.rs. Also, Inquire provides an interface that makes it easy to find the underlying functions, so you have an example of how Use-AI.rs can be used in production.

Future

The Use-AI tool is just the first tool we want to provide. In the future, the goal is to provide a whole network server layer. For the server layer we are thinking about a TCP server. The server should provide a load-test tool for a hypothetical query server. But since this is a plan for the far future, these are only ideas. If someone is found who can implement something like that, it may be implemented in parallel.

Store

Current

lib-store is the first library of the core of Use-AI.rs. Here we find ai_config.rs and mangodb.rs.

ai_config.rs is the entrypoint for the configuration file.

  • Models will be provided by lib-calculator, and
  • DataSection will be provided by lib-stage.

mangodb.rs operates MongoDB synchronously. Since we expect this DB to run on a system that also handles our data in parallel on the CPU, and since we don't have to worry about huge DB requests, it was decided that we only communicate with the DB synchronously, so that we don't spawn threads that would block the threads of performance-critical features.

Future

Not much will happen here. Only when the config is expanded will we see some changes. Additional DB request functions and some functions regarding configs will be added when needed.

Stage

Current

Here is where the construction site begins. Currently, we only have a CSV pipeline to show what the plan is for lib-stage. This pipeline shows how we want to handle data on a more abstract layer. For now, we just input a CSV file and handle it from there with different operations on columns or on the whole data. This was chosen because we want to reach a first complete pass through Use-AI.rs using a simple XGBoost algorithm for now. Until we have expanded that simple XGBoost, it is called GBDT here.

Future

For DQNs and Q-learning we need to expand lib-stage. The goal is not to only use files; instead, this will be used to configure a data stream, which will first enable getting input from program-external drivers and will also add the possibility of communication between models. But to be able to implement such features, we first need to focus on lib-calculator. The functions as they are will be renamed in the near future.

Calculator and Proc Macros

Current

Here we find the first structures showing how the GPU backends will be called, built, and executed.

lib.rs gives us the first structure, which needs to be provided by lib-stage.

#![allow(unused)]
fn main() {
pub struct MetaData {
    pub stride: Box<[usize]>,
    pub shape: Box<[usize]>,
}
}

Every array needed for the operation needs to be provided by lib-stage as MetaData and its Vec<f32> representation.

  • pub fn build(stride: Box<[usize]>, shape: Box<[usize]>) -> MetaData {...} can be used to build MetaData.
  • pub fn handle_empty<R: Runtime>(&self) -> (&Self, Handle) {...} builds a tuple containing &self and an empty Handle of a GPU client.
  • pub fn handle_from_vec<R: Runtime>(&self, input: Vec<f32>) -> (&Self, Handle) {...} builds a tuple containing &self and a filled Handle of a GPU client (a small usage sketch follows below).
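
Whether build is a free function or an associated constructor is not shown above, so the sketch assumes MetaData::build; the WgpuRuntime client and the 2x3 row-major tensor are likewise only illustrative:

#![allow(unused)]
fn main() {
use cubecl::wgpu::WgpuRuntime;

fn stage_example() {
    // MetaData comes from lib-calculator's lib.rs (shown above).
    // MetaData for a 2x3 row-major tensor: shape [2, 3], stride [3, 1].
    let meta = MetaData::build(Box::new([3, 1]), Box::new([2, 3]));

    // Handle filled with the staged Vec<f32> data from lib-stage...
    let (meta_ref, handle) =
        meta.handle_from_vec::<WgpuRuntime>(vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0]);

    // ...and an empty Handle of the same shape, e.g. as an output buffer.
    let (_, buffer) = meta.handle_empty::<WgpuRuntime>();
}
}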

model's mod.rs contains the Operation trait.

#![allow(unused)]
fn main() {
pub trait Operation: Sized {
    type Ctx;
    type Operator;
    type Output;

    fn exec(ctx: Self::Ctx, operator: Self::Operator) -> Self::Output;
}
}

Every Operation needs a Ctx, which can be implemented through #[ctx], and an Operator, which needs to be implemented through #[operator]. The operator macro allows us to build a TensorHandleRef tuple out of GbdtHandles, which will be used to build the Operation on the GPU's kernel.

The Operator's tuple is our representation of the memory on the GPU. For now, #[operator] just builds an implementation over a structure like

#![allow(unused)]
fn main() {
pub struct GbdtOperator {
    pub target: (MetaData, Handle),
    pub table: (MetaData, Handle),
    pub buffer: (MetaData, Handle),
}
}

to build a tuple of TensorHandleRef<R>, which will be lowered with CubeCL into the kernel together with a specific Operation that contains the needed kernel of the model.
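
To show how these pieces fit together, here is a minimal, hypothetical sketch of a model implementing Operation. The Operation trait and GbdtOperator are the ones shown above; GbdtCtx, the GbdtMean model, and the empty exec body are invented placeholders rather than the real implementation:

#![allow(unused)]
fn main() {
// Hypothetical context type; in the real crate this would be covered by #[ctx].
pub struct GbdtCtx {
    pub learning_rate: f32,
}

// Placeholder model type implementing the Operation trait from above.
pub struct GbdtMean;

impl Operation for GbdtMean {
    type Ctx = GbdtCtx;
    type Operator = GbdtOperator;
    type Output = ();

    fn exec(ctx: Self::Ctx, operator: Self::Operator) -> Self::Output {
        // Here the operator's (MetaData, Handle) tuples would be turned into a
        // tuple of TensorHandleRef<R> and lowered into the GPU kernel with CubeCL.
    }
}
}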

Future

In the future we will lower all implementations of lib-calculator into the GPU kernel with CubeCL. The goal here is that lib-calculator can also help with standardization in lib-stage.

GPU-Optimized GBDT with CubeCL: A Step Towards XGBoost | 25.01.2025

To build our own XGBoost algorithm, I will start with a GPU-optimized Gradient Boosted Decision Tree (GBDT). This step will help me get a feeling for CubeCL's reduce feature and understand how to extend it for efficient GPU operations.

Current Approach

The CubeCL reduce feature processes each GPU operation individually. For each primitive operation (+, *, argmax, ...), the CPU allocates memory on the GPU, sends data for computation, retrieves the result, and repeats this for subsequent operations. This isn't optimal because of frequent memory allocations and data transfers.

Goals

The primary focus is to implement a GBDT using CubeCL’s reduce feature in its current state. This will provide a deeper understanding of its behavior and limitations. Initially, the implementation will prioritize functionality over optimization to build familiarity with CubeCL-reduce for future development.

Next Steps

In the future, CubeCL-reduce will be enhanced to support multiple GPU operations in a single step. This approach will reduce the need for repetitive memory allocations. Additionally, strategies for efficient memory management will be developed to minimize the overhead caused by data transfers between the CPU and GPU. These improvements will form the basis for creating an efficient GPU-based XGBoost algorithm.

What’s Next?

Next week, I plan to release the first version of the GBDT implementation using CubeCL reduce. While it will not yet be optimized, this version will provide valuable insights and set the stage for further enhancements.

Use AI Tool Demo | 14.01.2025

With today’s update, the Use-AI.rs tool is introduced! To get started, you need to have Rust installed and a MongoDB instance running at mongodb://127.0.0.1:27017/.

You can target other domains by modifying the connection string in mangodb.rs. While this approach is not ideal and will be improved in the future, it demonstrates the current design concept adequately for now.

To begin, navigate to crates/use-ai in your terminal. Here, you can run the exploration tool for Use-AI.rs.

Steps to Get Started:

  1. Open the Config menu.
  2. Select Import and provide the following:
    • Path: The path to the example configuration file config_GBDT_example.json.
    • Name: A name for the imported configuration.
  3. After importing, load the imported configuration for use in the Start menu.
  4. Navigate to the Start menu.
  5. Select Init Transformation to transform the provided test.csv.

Output:

The terminal will display a table representation of the transformed data, which is ready for downstream processing by the Operator.

  • Header Mapping: The headers are mapped appropriately.
  • String Records: Non-numerical string records are mapped to their numerical representation in the data.

At this stage, the transformation demonstrates the data flow up to the lowest abstraction layer (Operator), which is not yet fully implemented.

Future Plans

The next milestone in building the first complete data pipeline for Use-AI.rs is the development of a GPU-optimized GBDT algorithm for the Operator with Burn.

Disclaimer

Today’s update introduces the first major structural decision for the project.
While the crate setup will remain as it is, individual functions and types may be reorganized in future iterations. However, all components will continue to follow a similar structural approach to what exists now.

Welcome to my blog | 11.01.2025

Since I am building this framework alongside my job, I have more time to think about the project than to actually work on it. Therefore, I decided to create something akin to a blog to provide insights into the thought processes behind the scene. The goal is to make sense of the sketches I’ve created and to develop a clearer understanding of future plans.

Disclaimer:

Let’s call this a very pre-alpha stage. What this means is that not only do we not yet have a full picture of what features the final version will include, but we also haven’t finalized the naming of all structural components. To manage the complexity, I typically sketch out various structures and implement features in them to determine the optimal approach. This process won’t take too long, as I have already tested several ideas and have a good sense of the structure I want to adopt.

Project Goal:

The goal of Use-AI.rs is to create an engaging framework for the practical use of AI in production. This can be summarized as: "An engaging framework for Deep Reinforcement Learning (DRL) and Ensembled Models (EM), implemented in Rust." I have chosen DRLs and EMs because, in my view, they are among the most practical tools for applications such as robotics, IoT, and economics. While there are existing solutions, the most convenient ones are often too dependent on external ecosystems, making them less appealing for production use. Others lack sufficient flexibility and engagement. By leveraging Rust, we aim to address these shortcomings and build a future-proof solution.

Current State:

Currently, there is a JSON file called ai_config_example.json, which represents the highest level of abstraction we aim to achieve. While the JSON is not fully refined yet, we will simplify its complexity for now. As features are implemented, we can reintroduce complexity as needed. However, the ultimate goal is to ensure that even simplified configurations will continue to work in future versions, allowing users to abstract functionality to their desired depth while maintaining compatibility with the framework's layers. The reason for a configuration file is to enable the implementation of complex DRL and EM algorithms without requiring direct modifications to the framework's code. Since DRLs and EMs are essentially combinations of mathematical and logical methods, the configuration file defines the operations that will be applied to input data. At this stage, the files ai_config_example.json and data_from_csv.rs provide a glimpse of what I aim to achieve in this regard.

Future Plans:

The immediate goal is to reach the pre-alpha stage. Since I plan to update this project at least once a week, this stage should be reached relatively quickly, as I already have a clear idea of the necessary changes. In the near term, you can expect significant structural updates and renaming of components. Once these foundational changes are complete, the next step will be to introduce Burn. Initially, Burn will be used for GPU tensor operations for neural networks and some DRL functionalities. However, since Burn does not yet cover everything needed for high-performance computing (HPC) in reinforcement learning, we will later introduce CubeCL to build custom RL kernels for GPU hardware. Since Burn and CubeCL exist within the same ecosystem, this integration will allow us to create a robust and high-performance deep AI framework.