Ai Assistant – Devlog
Project goal-

My goal with this project is to eventually have my own locally hosted chatbot which is capable of real time TTS (Text to speech) and STT (Speech to text) along with generating its own responses with locally hosted LLMs (Large language models). I'm hoping to learn a lot more about working with AI and handling datasets throughout this project to make me a lot more adept with what will likely be the future of programming.

Processes-

The first thing that had to be done on this project was to create a discord bot that could respond to questions whilst staying completely in character. I decided to use a discord bot as a starting point as after research I feel is an excellent way to get real time user testing and experiment with the way the bot responds and remembers things. Also discord bots are capable of playing and hearing audio in voice channels which will allow me implement TTS and STT at a later date. To begin this I decided on a character for my Ai called Keith. He has a set of interests and hobbies which are included in his initial prompt along with some base information about the people he will be interacting with in the discord server.

Keith recognising the content of an image and talking with a user

Keith recognising the content of an image and talking with a user-

For the LLM we are currently using ChatGPT on the 3.5 turbo api like was done with The GitHub Button. We also added support for the GPT 4 api for image recognition so that Keith can analyse images to understand what's in them. We decided to stick with external sources for these models for now as the processing power to do them locally is very expensive along with complex to set up. Once we have the initial prototype of keith fully fleshed out we will then start shifting to locally hosted systems using docker containers and CUDA as after researching this seems like the best way.

Some of the code which Keith uses to access the current darts scores and convert them into readable information-

Some of the code which Keith uses to access the current darts scores and convert them into readable information-

Alongside his base functionality Keith also has some additional uses such as getting the live darts score. For this I made a system which scrapes the darts score from a website and formats it in a way that's easy for Keith to read. I decided to make this system as a web server because Keith will eventually be containerised into docker and it makes sense to separate things as much as possible to make debugging and fault finding easier in the long run.

Keith updating a user on the current darts scores

Keith updating a user on the current darts scores-

Evaluation-

Currently Keith is actually in a really good position. The LLM integration is working really well and it's near impossible to get him to break character. Dexter, who I am working with on this project, has also managed to train and create a working TTS voice model for Keith to use so that he can speak, and I'm currently in the process of containerising that so I can run it on my server locally.

Future Ideas-

This project has taught me so much about docker, TTS and STT systems, and about working with ai workloads in general. I'm definitely thinking now on how this could be implemented into games in the future for real time dynamic interactions with NPC’s and how else I could apply these technologies in my projects and life.