Private Alpha: Julia is currently in private alpha.

How It Works

This page provides a technical deep-dive into Julia's architecture. Understanding how Julia works under the hood can help you get the most out of her capabilities and appreciate the technology powering your conversations.

High-Level Architecture

Julia is built as a multi-channel AI assistant with a unified backend that processes messages from WhatsApp and Telegram, orchestrates AI reasoning, and executes actions through integrated tools.

The main components:

  • Channels: WhatsApp, Telegram
  • Privacy Layer: Pseudonymization
  • Orchestrator: NestJS Backend
  • Tools: Calendar, Email, etc.

Message flow: You → Channel → Privacy Layer → LLM + Tools → De-pseudonymize → Response

Message Processing Flow

Every message you send goes through a sophisticated pipeline that combines context retrieval, AI reasoning, and tool execution:

1. Message Reception (Channel Dispatcher Service)

Your message arrives via WhatsApp or Telegram webhook. The channel adapter normalizes the message format and identifies your user account through the linked channel identity.
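As a rough illustration of what normalization means here, the sketch below maps a Telegram update onto one internal message shape. The NormalizedMessage type and the adapter are illustrative assumptions, not Julia's actual code; only the Telegram field names come from the Bot API's Update object.

// Hypothetical internal shape that every channel adapter produces.
interface NormalizedMessage {
  channel: 'whatsapp' | 'telegram';
  channelUserId: string; // identity used to look up the linked Julia account
  text: string;
  receivedAt: Date;
}

// Minimal slice of a Telegram Bot API update (real payloads carry much more).
interface TelegramUpdate {
  message: { from: { id: number }; text?: string };
}

// Sketch of a Telegram adapter: pull out the sender id and the message text.
function normalizeTelegramUpdate(update: TelegramUpdate): NormalizedMessage {
  return {
    channel: 'telegram',
    channelUserId: String(update.message.from.id),
    text: update.message.text ?? '',
    receivedAt: new Date(),
  };
}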

2. User Context Loading (Context Builder Service)

The system loads your profile, recent messages, active tasks, and contacts. This context helps the AI understand who you are and what you've been working on.

3. Quota Check (Subscription Service + Redis)

Your subscription tier is checked to ensure you haven't exceeded daily or monthly message limits. Redis tracks usage in real time.
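A minimal sketch of such a check with Redis, assuming the ioredis client and an illustrative key layout (quota:<userId>:<date>); the real Subscription Service logic is not shown here.

import Redis from 'ioredis';

const redis = new Redis();

// Increment today's counter for the user and reject once the tier limit is reached.
async function checkDailyQuota(userId: string, dailyLimit: number): Promise<boolean> {
  const key = `quota:${userId}:${new Date().toISOString().slice(0, 10)}`;
  const used = await redis.incr(key);
  if (used === 1) {
    await redis.expire(key, 60 * 60 * 24); // counter disappears after a day
  }
  return used <= dailyLimit;
}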

4. Privacy Layer: Pseudonymization (Privacy Gateway Service)

Before reaching the AI, all personal data is pseudonymized. Contact names become PERSON_A, emails become EMAIL_A, etc. This ensures the LLM never sees your real personal information.
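To make the idea concrete, here is a minimal sketch of this kind of substitution; the function names and mapping structure are illustrative, not the actual Privacy Gateway Service.

// Illustrative pseudonymization: swap known personal values for stable placeholders
// and remember the mapping so responses can be translated back later.
function pseudonymize(
  text: string,
  contacts: { name: string; email: string }[],
): { text: string; mapping: Map<string, string> } {
  const mapping = new Map<string, string>(); // pseudonym -> real value
  let result = text;
  contacts.forEach((contact, i) => {
    const personAlias = `PERSON_${String.fromCharCode(65 + i)}`; // PERSON_A, PERSON_B, ...
    const emailAlias = `EMAIL_${String.fromCharCode(65 + i)}`;
    mapping.set(personAlias, contact.name);
    mapping.set(emailAlias, contact.email);
    result = result.split(contact.name).join(personAlias).split(contact.email).join(emailAlias);
  });
  return { text: result, mapping };
}

// De-pseudonymization (step 8) is the inverse: put the real values back.
function depseudonymize(text: string, mapping: Map<string, string>): string {
  let result = text;
  for (const [alias, real] of mapping) {
    result = result.split(alias).join(real);
  }
  return result;
}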

5. LLM Processing (Unified LLM Service)

Your pseudonymized message, along with context and available tools, is sent to the LLM (GPT-4o or GPT-4o-mini depending on tier). The LLM decides what actions to take.

6. Tool Execution (Tool Executor Service)

If the LLM decides to use tools (check calendar, send email, etc.), pseudonyms are converted back to real values, tools are executed, then results are re-pseudonymized.
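A rough sketch of that round-trip at the tool boundary; the wrapper and the runTool callback are hypothetical stand-ins for the Tool Executor Service.

// Hypothetical wrapper: real values go into the tool call, pseudonyms come back out.
async function executeToolSafely(
  toolName: string,
  args: Record<string, unknown>,
  mapping: Map<string, string>, // pseudonym -> real value, from the privacy layer
  runTool: (name: string, args: Record<string, unknown>) => Promise<string>,
): Promise<string> {
  // 1. Replace pseudonyms in the arguments with real values.
  let rawArgs = JSON.stringify(args);
  for (const [alias, real] of mapping) rawArgs = rawArgs.split(alias).join(real);

  // 2. Execute the tool against the real external service.
  const result = await runTool(toolName, JSON.parse(rawArgs));

  // 3. Re-pseudonymize the result before the LLM sees it.
  let safeResult = result;
  for (const [alias, real] of mapping) safeResult = safeResult.split(real).join(alias);
  return safeResult;
}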

7. Response Generation (Orchestrator Service)

The LLM generates a natural language response based on tool results and context. Memory writes are processed to store important information.

8. De-pseudonymization (Privacy Gateway Service)

Before sending the response to you, pseudonyms are converted back to real names. 'I scheduled with PERSON_A' becomes 'I scheduled with Sarah Chen'.

9. Message Delivery (Channel Service)

The response is sent back through the appropriate channel (WhatsApp or Telegram API) and the conversation is logged for future context.

AI & Function Calling

Julia uses OpenAI's function calling (tool use) capability to interact with external services. This allows the AI to take real actions rather than just generating text.

Available Tools

  • google_calendar: List, search, check availability, create events
  • outlook_calendar: Same as Google Calendar for Microsoft accounts
  • email: List, search, read, send, reply, forward emails
  • task: Create, list, and complete tasks
  • contact: Create, update, search, and list contacts
  • memory: Read/write user profile and episodic memory
  • brave_search: Search the web for current information
  • ask_clarification: Request clarification from the user
  • account_status: Get subscription and usage information
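For illustration, the sketch below declares a calendar tool in OpenAI's function-calling format using the official Node SDK. The parameter schema is an assumption for this example, not Julia's real google_calendar definition.

import OpenAI from 'openai';

const openai = new OpenAI();

// Illustrative tool declaration; the parameters are assumptions, not Julia's schema.
const tools: OpenAI.Chat.Completions.ChatCompletionTool[] = [
  {
    type: 'function',
    function: {
      name: 'google_calendar',
      description: 'List, search, check availability, and create calendar events',
      parameters: {
        type: 'object',
        properties: {
          action: { type: 'string', enum: ['list', 'search', 'check_availability', 'create'] },
          timeMin: { type: 'string', description: 'ISO 8601 start of the time window' },
          timeMax: { type: 'string', description: 'ISO 8601 end of the time window' },
        },
        required: ['action'],
      },
    },
  },
];

// The model can answer with a tool call instead of plain text.
async function demo(): Promise<void> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: 'Am I free tomorrow morning?' }],
    tools,
  });
  console.log(completion.choices[0].message.tool_calls);
}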

ReAct Agent Pattern

Julia uses a ReAct (Reasoning + Acting) agent pattern. This means the AI can:

  • Reason about what information is needed
  • Call one or more tools to gather information or take action
  • Observe the results and decide if more actions are needed
  • Iterate up to 5 times for complex multi-step tasks
  • Generate a final response based on all gathered information
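A minimal sketch of that loop, assuming the OpenAI Node SDK and a hypothetical executeTool helper standing in for the Tool Executor Service.

import OpenAI from 'openai';

const openai = new OpenAI();

// Illustrative ReAct-style loop: call the model, run any requested tools,
// feed the observations back in, and stop after at most 5 iterations.
async function reactLoop(
  messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[],
  tools: OpenAI.Chat.Completions.ChatCompletionTool[],
  executeTool: (name: string, args: string) => Promise<string>, // hypothetical helper
): Promise<string> {
  for (let step = 0; step < 5; step++) {
    const completion = await openai.chat.completions.create({ model: 'gpt-4o', messages, tools });
    const reply = completion.choices[0].message;
    const toolCalls = reply.tool_calls ?? [];

    // No tool calls means the model has produced its final answer.
    if (toolCalls.length === 0) return reply.content ?? '';

    // Otherwise run each requested tool and append the observation.
    messages.push(reply);
    for (const call of toolCalls) {
      if (call.type !== 'function') continue;
      const observation = await executeTool(call.function.name, call.function.arguments);
      messages.push({ role: 'tool', tool_call_id: call.id, content: observation });
    }
  }
  return 'I was not able to finish this request within the allowed number of steps.';
}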

Memory System

Julia's memory system enables her to learn and remember information across conversations. It combines structured profile data with semantic search over episodic memories.

Profile Memory

Structured JSON data that's always included in context:

{
  "personal": {
    "name": "Sarah",
    "timezone": "Europe/Paris"
  },
  "preferences": {
    "meetings": "morning",
    "communication": "concise"
  },
  "work": {
    "company": "TechCorp",
    "role": "Product Manager"
  }
}

Episodic Memory

Important events stored with vector embeddings for semantic search:

  • Stored as text with 1536-dim OpenAI embeddings
  • pgvector extension for efficient similarity search
  • Retrieved based on relevance to current message
  • Importance score affects retrieval priority
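As a sketch of how retrieval over such memories could work, assuming a memories table with a pgvector embedding column and the pg client (the table and column names are assumptions):

import OpenAI from 'openai';
import { Pool } from 'pg';

const openai = new OpenAI();
const pool = new Pool();

// Illustrative retrieval: embed the incoming message, then rank stored
// memories by cosine distance (pgvector's <=> operator).
async function recallMemories(message: string, userId: string): Promise<string[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small', // 1536-dimensional vectors
    input: message,
  });
  const vector = `[${response.data[0].embedding.join(',')}]`;

  const { rows } = await pool.query(
    `SELECT content
       FROM memories
      WHERE user_id = $1
      ORDER BY embedding <=> $2::vector
      LIMIT 5`,
    [userId, vector],
  );
  return rows.map((row) => row.content);
}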

Smart Context Building

When enabled, Julia uses an LLM call (GPT-4o-mini) to analyze your message and determine which context is relevant. This reduces the amount of data sent to the main LLM by 50-75%, improving speed and lowering cost while maintaining accuracy.
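One way to picture this step is a cheap classification call that returns which context blocks to include. The section names and JSON shape below are assumptions for this sketch, not Julia's actual prompt.

import OpenAI from 'openai';

const openai = new OpenAI();

// Illustrative context selection: ask a small model which context blocks
// are worth sending to the main model for this message.
async function selectRelevantContext(message: string): Promise<string[]> {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    response_format: { type: 'json_object' },
    messages: [
      {
        role: 'system',
        content:
          'Given the user message, reply with JSON like {"sections": [...]}, choosing from: ' +
          'calendar, email, tasks, contacts, episodic_memory.',
      },
      { role: 'user', content: message },
    ],
  });
  const parsed = JSON.parse(completion.choices[0].message.content ?? '{"sections":[]}');
  return parsed.sections;
}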

Technology Stack

Backend

  • NestJS (Node.js framework)
  • TypeScript
  • PostgreSQL + pgvector
  • Redis (caching, rate limiting)
  • TypeORM

AI & Integrations

  • OpenAI GPT-4o / GPT-4o-mini
  • OpenAI Embeddings (text-embedding-3-small)
  • Brave Search API
  • Google APIs (Calendar, Gmail)
  • Microsoft Graph API

Frontend

  • Next.js 14 (App Router)
  • TypeScript
  • Tailwind CSS
  • Framer Motion
  • Zustand (state management)

Infrastructure

  • Docker / Docker Compose
  • Stripe (payments)
  • WhatsApp Business API
  • Telegram Bot API