At Use-AI.rs, we are building an open-source AI framework in Rust. Our goal is to create a concurrent, locally hostable AI agent for practical applications in production environments.
The different layers of abstraction create spaces for different professions: for Business Informatic Specialists, Mathematicians, and Computer Scientists, Use-AI.rs provides a comprehensive set of tools and libraries to build and extend AI applications.
The entrypoint of Use-AI.rs is the use_ai crate. It provides a toolset for Business Informatic Specialists and Mathematicians to test and deploy AI applications with ease. use_ai is also a good starting point for Computer Scientists, since it shows how to implement the underlying architecture of the framework.
The underlying architecture is mostly of interest to Computer Scientists and can be found in the core crate. It contains the following modules:
- lib-store: A synchronous storage module built on MongoDB.
- lib-stage: Handles data processing and normalization with Rayon and ndarray.
- lib-calculator: Operates the GPU and builds AI models with CubeCL.
- lib-proc_macros: A set of procedural macros for simplifying common tasks in AI development.
lib-calculator::model is also an interesting space for Mathematicians, since it provides a set of easy-to-use tensor operations that can be used to build powerful AI models.
Tooling
Test AI
Use AI
Current
Use-AI is just a basic CLI tool written with Inquire. Inquire is used because we want a nice little entry point for people who don't know the project and want to find out what can be done with data using Use-AI.rs. Inquire also provides an interface that makes it easy to find the underlying functions, so you have an example of how Use-AI.rs can be used in production.
Future
The Use-AI tool is just the first tool we want to provide. In the future, the goal is to provide a whole network server layer; for that layer we are thinking about a TCP server. The server should provide a load-test tool for a hypothetical query server. Since this is a plan for the far future, these are only ideas. If someone is found who can implement something like that, it may be developed in parallel.
Store
Current
lib-store is the first library of the core of Use-AI.rs. Here we find ai_config.rs and mangodb.rs.
Future
Not much will happen here. We will only see changes when the config is expanded: some additional DB request functions and some config-related functions will be added when needed.
Store
Config
ai_config.rs is the entrypoint for the configuration file. Models will be provided by lib-calculator, and DataSection will be provided by lib-stage.
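To make the split more concrete, here is a minimal sketch of how ai_config.rs could deserialize such a file with serde; only the Models and DataSection names come from the description above, while every other struct, field, and the serde_json choice are assumptions for illustration.

use serde::Deserialize;

// Hypothetical configuration types; not the crate's actual definitions.
#[derive(Debug, Deserialize)]
pub struct AiConfig {
    #[serde(rename = "Models")]
    pub models: Vec<ModelConfig>, // consumed by lib-calculator
    #[serde(rename = "DataSection")]
    pub data_section: DataSection, // consumed by lib-stage
}

#[derive(Debug, Deserialize)]
pub struct ModelConfig {
    pub name: String,
    pub kind: String,
}

#[derive(Debug, Deserialize)]
pub struct DataSection {
    pub source: String,
}

// Read and parse the configuration file (assumes a JSON file on disk).
fn load_config(path: &str) -> Result<AiConfig, Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string(path)?;
    Ok(serde_json::from_str(&raw)?)
}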
MongoDB
mangodb.rs operates MongoDB synchronously.
Since we expect this DB to run on a system that also handles our data in parallel on the CPU, and since we don't have to worry about huge DB requests, we decided to communicate with the DB only synchronously, so we don't spawn threads that block the threads of performance-critical features.
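As a minimal sketch of this synchronous access pattern, the following assumes the mongodb crate with its sync feature enabled (the 2.x-style API, where options are passed as a second argument); the database and collection names are made up for illustration.

use mongodb::{bson::doc, sync::Client};

fn main() -> mongodb::error::Result<()> {
    // Connect synchronously; no async runtime and no extra worker threads are spawned by us.
    let client = Client::with_uri_str("mongodb://127.0.0.1:27017/")?;
    let configs = client
        .database("use_ai")
        .collection::<mongodb::bson::Document>("configs");

    // A small config document is written and read back on the calling thread.
    configs.insert_one(doc! { "name": "example", "model": "GBDT" }, None)?;
    let found = configs.find_one(doc! { "name": "example" }, None)?;
    println!("{:?}", found);
    Ok(())
}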
Stage
Current
Here is where the construction site begins. Currently, we only have a CSV pipeline to show what the plan is with lib-stager.
This pipeline shows how we want to handle data on a more abstract layer. For now, we just input a CSV file and handle it from there with different operations on columns or on the whole data. This was chosen because we want to reach a first complete pass through Use-AI.rs using a simple XGBoost algorithm. Until we have expanded that simple XGBoost, it is called GBDT here.
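As a rough illustration of such a pipeline's first step (reading a CSV and mapping string records to numbers, as the tool demo below describes), here is a minimal sketch using the csv and ndarray crates; the function name and the category-encoding scheme are assumptions, not the actual lib-stage implementation.

use std::collections::HashMap;
use ndarray::Array2;

/// Load a CSV file into a 2D f32 array, mapping non-numeric fields
/// to incrementally assigned category codes (illustrative only).
fn load_csv(path: &str) -> Result<Array2<f32>, Box<dyn std::error::Error>> {
    let mut reader = csv::Reader::from_path(path)?;
    let mut categories: HashMap<String, f32> = HashMap::new();
    let mut rows: Vec<Vec<f32>> = Vec::new();

    for record in reader.records() {
        let record = record?;
        let row: Vec<f32> = record
            .iter()
            .map(|field| match field.parse::<f32>() {
                Ok(value) => value,
                // Non-numeric fields get a stable numeric code per distinct string.
                Err(_) => {
                    let next = categories.len() as f32;
                    *categories.entry(field.to_string()).or_insert(next)
                }
            })
            .collect();
        rows.push(row);
    }

    let cols = rows.first().map_or(0, |r| r.len());
    let flat: Vec<f32> = rows.into_iter().flatten().collect();
    Ok(Array2::from_shape_vec((flat.len() / cols.max(1), cols), flat)?)
}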
Future
For DQNs and Q-learning we need to expand lib-stage. The goal is not to use only files; instead, this will be used to configure a data stream, which will first enable getting input from drivers external to the program and will also add the possibility of communication between models. But to be able to implement such features, we first need to focus on lib-calculator.
The functions as they are will be renamed in the near future.
Config
Data Input
Transfer
Data Output
Calculator
In lib-calculator
the most crucial parts are the module model
and the operator
.
In model
we can easily build models within the operation
trait which are supported by the opertator
module and trait.
So, the operator
provides functions through the PipelineExec
and PipelinePush
traits.
The operator
will be executed on the GPUs Kernel.
Every model
needs to implement the Operation
trait
Config
Model
When you want to integrate a new model, you can do that by doing these three things:
- Build a struct which includes tuples of MetaData and Handle. This structure is our abstract representation of the shared memory on the GPU. The macro #[operator] above your struct will allocate a Tensor on the GPU and leave you a TensorHandleRef to work with in the operation (see the sketch below).
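A minimal sketch of such a struct, assuming the #[operator] attribute from lib-proc_macros; the struct and field names, as well as the import paths, are invented for illustration.

// Import paths are assumptions: MetaData from lib-calculator, Handle from the
// CubeCL runtime, #[operator] from lib-proc_macros.
use cubecl::server::Handle;
use lib_calculator::MetaData;
use lib_proc_macros::operator;

// Each field is a (MetaData, Handle) tuple, i.e. the abstract representation
// of the shared GPU memory described above.
#[operator]
pub struct MyModelOperator {
    pub input: (MetaData, Handle),
    pub output: (MetaData, Handle),
}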
Operation
Proc Macros
Welcome, Dev!
Here you can find the blog of Use-AI.rs, where the developers currently working on Use-AI.rs discuss the technical state of the project.
Introducing Action Space
Maintainer's Comment | 17.03.2025
Since I had a major illness, I was unable to post this blog as planned. Also, I wasn't able to code at the pace and with the quality I intended, which led to incomplete code. The reason why I still decided to commit the code was that you guys can see what I've been working on and how I've been progressing. I finally got around to writing about Action Space, a procedural macro designed to compose Tensor operation sequences in Rust. This is still not working properly, but I hope to fix it soon.
Introduction
When working with AI models, especially in reinforcement learning and ensemble learning, you often end up chaining multiple operations on Tensors. Action Space is a Rust procedural macro that makes composing these tensor operations straightforward. With Action Space, the goal was to abstract Tensor operation sequences so that they can be easily composed and reused. This is intended to make it easier to manage and maintain complex Tensor operations.
Example: Composing Tensor Operations
Here's a basic example demonstrating how Action Space helps structure Tensor computations in Rust:
use cubecl::wgpu::WgpuRuntime;
use lib_proc_macros::action_space;

fn some() {
    let input: TensorHandleRef<WgpuRuntime> = build_tensor::<WgpuRuntime>();
    let result: TensorHandleRef<WgpuRuntime> = action_space!(
        WgpuRuntime,
        (input, ExecMean, output),
        ((input, output), PrepResiduals),
        (ExecSum),
        (PrepSquare),
        (ExecProd),
    );
}
Breaking It Down
- The macro takes a Runtime to define the environment.
- The first tuple takes the input before the operations are applied.
- Operations from the PipelineExec and PipelinePush traits are applied sequentially.
- If we want to handle the output of a specific operation differently, we can add a variable name in the tuple behind the operation.
- When the operation needs a non-default input, we can specify it with a tuple before the operation.
- The default case takes the output of the previous operation and passes it to the next operation.
For now, the goal here is that we can input either a 1D, 2D, or 3D Tensor.
- A 1D Tensor results in a Scalar.
- A 2D Tensor results in a Vector.
- A 3D Tensor results in a Matrix.
This isn't working properly yet, and we also do not handle Runtime and Client as intended. This results in the plan to fix and improve this feature next.
Short Update
Maintainer's Comment | 12.02.2025
Over the weekend I implemented the first PipelineExec to show how we want to use cubecl-reduce for the first iteration of Use-AI.rs.
In the future we want to build our own reduce strategies and configurations, but this will be good enough for now.
With PipelinePush we will see the first kernels, which fill Tensors depending on the type of PipelinePush needed for later execution through PipelineExec.
With these reusable pipeline operations we can easily compose different AI models with GPU optimisation.
This most likely means that we no longer aim to lower all of lib-calculator onto the GPU; more likely, lib-calculator::operation will be the only module that runs completely in the GPU kernel.
Apart from this, I decided this week to use XGBoost: A Scalable Tree Boosting System (Tianqi Chen and Carlos Guestrin, 10 Jun 2016) as the paper that will guide us in implementing XGBoost.
Next week we will see more code again!
New Blog and Doc Updates
Maintainer's Comment | 03.02.2025
Welcome to the new Blog
As you may have noticed, the new blog's home will also be used as the entry point for the Use-AI.rs documentation. This was decided because it is easier to maintain good documentation within the codebase than in an external application. So, welcome to the new documentation! This blog will be used to communicate changes and discuss development decisions with future users of the framework.
State
Today I will go through the whole project and discuss the decisions made. This will be structured from the highest abstraction used in Use-AI.rs down to the lowest abstraction layers, which reflects the chosen architecture. These discussions will also be used in the book to introduce the idea of each component.
Use-AI
Current
Use-AI is just a basic CLI tool written with Inquire. Inquire is used because we want a nice little entry point for people who don't know the project and want to find out what can be done with data using Use-AI.rs. Inquire also provides an interface that makes it easy to find the underlying functions, so you have an example of how Use-AI.rs can be used in production.
Future
The Use-AI tool is just the first tool we want to provide. In the future, the goal is to provide a whole network server layer; for that layer we are thinking about a TCP server. The server should provide a load-test tool for a hypothetical query server. Since this is a plan for the far future, these are only ideas. If someone is found who can implement something like that, it may be developed in parallel.
Store
Current
lib-store is the first library of the core of Use-AI.rs. Here we find ai_config.rs and mangodb.rs.
ai_config.rs is the entrypoint for the configuration file. Models will be provided by lib-calculator, and DataSection will be provided by lib-stage.
mangodb.rs operates MongoDB synchronously.
Since we expect this DB to run on a system that also handles our data in parallel on the CPU, and since we don't have to worry about huge DB requests, we decided to communicate with the DB only synchronously, so we don't spawn threads that block the threads of performance-critical features.
Future
Not much will happen here. We will only see changes when the config is expanded: some additional DB request functions and some config-related functions will be added when needed.
Stage
Current
Here is where the construction site begins. Currently, we only have a CSV pipeline to show what the plan is with lib-stager.
This pipeline shows how we want to handle data on a more abstract layer. For now, we just input a CSV file and handle it from there with different operations on columns or on the whole data. This was chosen because we want to reach a first complete pass through Use-AI.rs using a simple XGBoost algorithm. Until we have expanded that simple XGBoost, it is called GBDT here.
Future
For DQNs and Q-learning we need to expand lib-stage. The goal is not to use only files; instead, this will be used to configure a data stream, which will first enable getting input from drivers external to the program and will also add the possibility of communication between models. But to be able to implement such features, we first need to focus on lib-calculator.
The functions as they are will be renamed in the near future.
Calculator and Proc Macros
Current
Here we find the first structures showing how the GPU backends will be called, built, and executed.
lib.rs gives us the first structure, which needs to be provided by stage.
pub struct MetaData {
    pub stride: Box<[usize]>,
    pub shape: Box<[usize]>,
}
Every array needed for the operation needs to be provided by lib-stage as MetaData and its Vec<f32> representation.
- pub fn build(stride: Box<[usize]>, shape: Box<[usize]>) -> MetaData {...}: can be used to build MetaData.
- pub fn handle_empty<R: Runtime>(&self) -> (&Self, Handle) {...}: builds a tuple containing &self and an empty Handle of a GPU client.
- pub fn handle_from_vec<R: Runtime>(&self, input: Vec<f32>) -> (&Self, Handle) {...}: builds a tuple containing &self and a filled Handle of a GPU client (a usage sketch follows below).
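A short hypothetical usage of these helpers, choosing WgpuRuntime as in the Action Space example above; whether build is a free function or an associated function of MetaData, and the stride/shape values, are assumptions.

use cubecl::wgpu::WgpuRuntime;

fn sketch() {
    // A row-major 3x2 matrix: shape [3, 2] with stride [2, 1] (illustrative values).
    let md: MetaData = build(Box::new([2, 1]), Box::new([3, 2]));

    // An empty Handle on the GPU client, paired with the MetaData describing it.
    let (_meta, _empty) = md.handle_empty::<WgpuRuntime>();

    // A Handle filled from the Vec<f32> representation provided by lib-stage.
    let (_meta, _filled) = md.handle_from_vec::<WgpuRuntime>(vec![0.0; 6]);
}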
model's mod.rs contains the Operation trait.
pub trait Operation: Sized {
    type Ctx;
    type Operator;
    type Output;

    fn exec(ctx: Self::Ctx, operator: Self::Operator) -> Self::Output;
}
Every Operation needs a Ctx, which can be implemented through #[ctx], and an Operator, which needs to be implemented through #[operator]. The operator macro allows us to build a TensorHandleRef tuple out of GbdtHandle's, which will be used to build the Operation in the GPU kernel.
The operator's tuple is our representation of the memory on the GPU. For now, #[operator] just builds an implementation over a structure like
pub struct GbdtOperator {
    pub target: (MetaData, Handle),
    pub table: (MetaData, Handle),
    pub buffer: (MetaData, Handle),
}
to build a tuple of TensorHandleRef<R>, which will be lowered with CubeCL into the kernel with a specific Operation that contains the kernel needed by the model.
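To show how the trait and the operator fit together, here is a hypothetical sketch of a model implementing Operation against the GbdtOperator above; the Gbdt and GbdtCtx types and the placeholder body are assumptions, not the actual implementation.

// Hypothetical model type and context; not part of the real crate.
pub struct Gbdt;

pub struct GbdtCtx {
    pub learning_rate: f32,
}

impl Operation for Gbdt {
    type Ctx = GbdtCtx;
    type Operator = GbdtOperator;
    type Output = Vec<f32>;

    fn exec(ctx: Self::Ctx, operator: Self::Operator) -> Self::Output {
        // A real implementation would build TensorHandleRefs from the
        // (MetaData, Handle) tuples and launch a CubeCL kernel; this
        // placeholder only touches the inputs and returns an empty result.
        let _ = (ctx.learning_rate, operator.target, operator.table, operator.buffer);
        Vec::new()
    }
}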
Future
In the future we will lower all implementations of lib-calculator into the GPU kernel with CubeCL. The goal here is that lib-calculator can also help with standardization in lib-stage.
GPU-Optimized GBDT with CubeCL: A Step Towards XGBoost | 25.01.2025
To build our own XGBoost algorithm, I will start with a GPU-optimized Gradient Boosted Decision Tree (GBDT). This step will help me get a feeling for CubeCL's reduce feature and understand how to extend it for efficient GPU operations.
Current Approach
The CubeCL reduce feature processes each GPU operation individually. For each primitive operation (+, *, argmax, ...), the CPU allocates memory on the GPU, sends data for computation, retrieves the result, and repeats this for subsequent operations. This isn't optimal because of frequent memory allocations and data transfers.
Goals
The primary focus is to implement a GBDT using CubeCL’s reduce feature in its current state. This will provide a deeper understanding of its behavior and limitations. Initially, the implementation will prioritize functionality over optimization to build familiarity with CubeCL-reduce for future development.
Next Steps
In the future, CubeCL-reduce will be enhanced to support multiple GPU operations in a single step. This approach will reduce the need for repetitive memory allocations. Additionally, strategies for efficient memory management will be developed to minimize the overhead caused by data transfers between the CPU and GPU. These improvements will form the basis for creating an efficient GPU-based XGBoost algorithm.
What’s Next?
Next week, I plan to release the first version of the GBDT implementation using CubeCL reduce. While it will not yet be optimized, this version will provide valuable insights and set the stage for further enhancements.
Use AI Tool Demo | 14.01.2025
With today’s update, the Use-AI.rs tool is introduced! To get started, you need to have Rust installed and a MongoDB instance running at mongodb://127.0.0.1:27017/.
You can target other domains by modifying the connection string in mangodb.rs. While this approach is not ideal and will be improved in the future, it demonstrates the current design concept adequately for now.
To begin, navigate to crates/use-ai in your terminal. Here, you can run the exploration tool for Use-AI.rs.
Steps to Get Started:
- Open the Config menu.
- Select Import and provide the following:
  - Path: The path to the example configuration file config_GBDT_example.json.
  - Name: A name for the imported configuration.
- After importing, load the imported configuration for use in the Start menu.
- Navigate to the Start menu.
- Select Init Transformation to transform the provided test.csv.
Output:
The terminal will display a table representation of the transformed data, which is ready for downstream processing by the Operator.
- Header Mapping: The headers are mapped appropriately.
- String Records: Non-numerical string records are mapped to their numerical representation in the data.
At this stage, the transformation demonstrates the data flow up to the lowest abstraction layer (Operator), which is not yet fully implemented.
Future Plans
The next milestone in building the first complete data pipeline for Use-AI.rs is the development of a GPU-optimized GBDT algorithm for the Operator with Burn.
Disclaimer
Today’s update introduces the first major structural decision for the project.
While the crate setup will remain as it is, individual functions and types may be reorganized in future iterations. However, all components will continue to follow a similar structural approach to what exists now.
Welcome to my blog | 11.01.2025
Since I am building this framework alongside my job, I have more time to think about the project than to actually work on it. Therefore, I decided to create something akin to a blog to provide insights into the thought processes behind the scene. The goal is to make sense of the sketches I’ve created and to develop a clearer understanding of future plans.
Disclaimer:
Let’s call this a very pre-alpha stage. What this means is that not only do we not yet have a full picture of what features the final version will include, but we also haven’t finalized the naming of all structural components. To manage the complexity, I typically sketch out various structures and implement features in them to determine the optimal approach. This process won’t take too long, as I have already tested several ideas and have a good sense of the structure I want to adopt.
Project Goal:
The goal of Use-AI.rs is to create an engaging framework for the practical use of AI in production. This can be summarized as: "An engaging framework for Deep Reinforcement Learning (DRL) and Ensembled Models (EM), implemented in Rust." I have chosen DRLs and EMs because, in my view, they are among the most practical tools for applications such as robotics, IoT, and economics. While there are existing solutions, the most convenient ones are often too dependent on external ecosystems, making them less appealing for production use. Others lack sufficient flexibility and engagement. By leveraging Rust, we aim to address these shortcomings and build a future-proof solution.
Current State:
Currently, there is a JSON file called ai_config_example.json, which represents the highest level of abstraction we aim to achieve. While the JSON is not fully refined yet, we will simplify its complexity for now. As features are implemented, we can reintroduce complexity as needed. However, the ultimate goal is to ensure that even simplified configurations will continue to work in future versions, allowing users to abstract functionality to their desired depth while maintaining compatibility with the framework's layers. The reason for a configuration file is to enable the implementation of complex DRL and EM algorithms without requiring direct modifications to the framework's code. Since DRLs and EMs are essentially combinations of mathematical and logical methods, the configuration file defines the operations that will be applied to input data. At this stage, the files ai_config_example.json and data_from_csv.rs provide a glimpse of what I aim to achieve in this regard.
Future Plans:
The immediate goal is to reach the pre-alpha stage. Since I plan to update this project at least once a week, this stage should be reached relatively quickly, as I already have a clear idea of the necessary changes. In the near term, you can expect significant structural updates and renaming of components. Once these foundational changes are complete, the next step will be to introduce Burn. Initially, Burn will be used for GPU tensor operations for neural networks and some DRL functionalities. However, since Burn does not yet cover everything needed for high-performance computing (HPC) in reinforcement learning, we will later introduce CubeCL to build custom RL kernels for GPU hardware. Since Burn and CubeCL exist within the same ecosystem, this integration will allow us to create a robust and high-performance deep AI framework.