EchoAI Technical Documentation

Formal documentation of architecture, systems, and future roadmap.

1. Introduction

EchoAI is a next-generation conversational assistant built entirely on Android OS. It integrates cutting-edge voice interaction, a Deep Think reasoning engine, image processing, web search, and secure local memory management.

Designed with a focus on privacy, adaptability, and transparency, EchoAI is engineered to deliver intelligent, context-aware responses without compromising user data security.

EchoAI is the culmination of innovative mobile development and state-of-the-art AI integration, aimed at redefining conversational experiences on the go.

2. System Architecture Overview

The EchoAI platform is composed of several modular subsystems that work in tandem:

High-Level Data Flow:

  1. User input is captured as voice or text.
  2. The system processes context and detects user intent.
  3. If needed, an external API query is initiated (for vision or web search).
  4. Responses are filtered, summarized, and then integrated.
  5. Extracted facts are stored locally for memory recall.
  6. Results are presented via text display or synthesized speech output.

// Example of high-level flow:
User Input -> Intent Detection -> [Optional External Query] -> Summarization -> Memory Update -> Output
      

3. Core Subsystems

3a. Voice Interaction System

This subsystem utilizes Android’s native SpeechRecognizer to capture voice input and process speech in real time. It supports multi-language recognition and dynamic adjustments for speech rate and pitch.

Technical Detail: The system leverages partial results for immediate feedback and implements error correction algorithms for enhanced transcription accuracy.
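
As an illustration, partial results are requested through the recognizer intent. The following is a minimal sketch (locale value and listener wiring are simplified assumptions, not EchoAI's exact configuration):

// Sketch: enabling partial results on Android's SpeechRecognizer.
Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, "en-US"); // per user settings

SpeechRecognizer recognizer = SpeechRecognizer.createSpeechRecognizer(context);
recognizer.setRecognitionListener(listener); // onPartialResults() delivers interim text
recognizer.startListening(intent);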

3b. Deep Think Engine

The Deep Think Engine is engineered to handle complex queries by interfacing with GPT-based models. It constructs detailed prompts and leverages streaming responses to provide step-by-step reasoning.

Example Prompt:


"Explain the principles of quantum computing step-by-step. Finally, provide a summary starting with 'Final Answer:'"
      

3c. Vision Module

The Vision Module enables image-based interactions by employing OCR to extract text and utilizing Azure Computer Vision for object detection and scene description.

Technical Note: The module queries multiple APIs concurrently and aggregates their results into a unified descriptive output.

3d. Memory & Knowledge Management

EchoAI maintains a secure, on-device memory store for conversation history and extracted facts, ensuring context-aware responses over prolonged interactions.

Implementation Detail: Memory extraction algorithms scan conversation transcripts for key phrases (e.g., “I love”, “I like”) to dynamically update the memory store.
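
A minimal sketch of that scan, assuming a regex-based approach (the pattern shown is illustrative, not the full list, and memoryStore stands in for the on-device store):

// Sketch: scanning a transcript for preference phrases.
Pattern preference = Pattern.compile(
    "\\bI (love|like) ([^.!?]+)", Pattern.CASE_INSENSITIVE);
Matcher m = preference.matcher(transcript);
while (m.find()) {
  // e.g., "I love hiking" -> stored fact "loves: hiking"
  memoryStore.add(m.group(1).toLowerCase() + "s: " + m.group(2).trim());
}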

3e. Privacy & Security Systems

EchoAI enforces strict privacy and security measures. All sensitive data, including API keys and conversation history, is encrypted and stored locally.

Security Enhancements: Certificate pinning and TLS secure all external communications.
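
For example, with an OkHttp client (an assumption; the pin below is a placeholder, and each pinned host needs its real certificate digest):

// Sketch: certificate pinning via OkHttp's CertificatePinner.
CertificatePinner pinner = new CertificatePinner.Builder()
    .add("api.openai.com", "sha256/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=") // placeholder pin
    .build();
OkHttpClient client = new OkHttpClient.Builder()
    .certificatePinner(pinner)
    .build();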

4. Roadmap & Future Enhancements

Future Direction: Continuous improvement driven by user feedback and emerging AI advancements.

5. Developer Notes

EchoAI is developed entirely on Android OS using Code Studio and Python Studio. The project emphasizes modern encryption, asynchronous processing, and modular design to ensure a responsive and secure user experience.

6. MainActivity.java Documentation

Overview

MainActivity.java is the central hub of EchoAI. It initializes the chat interface, manages broadcast receivers, and orchestrates data flow between subsystems.

Technical Highlights:

  • Robust asynchronous communication using Handlers.
  • Dynamic ListView updates for real-time chat.
  • Secure API key management through integrated decryption routines.
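
The Handler-based pattern from the first bullet might look like this sketch (the reply helper and adapter are hypothetical names):

// Sketch: off-main-thread work posted back to the UI via a Handler.
Handler uiHandler = new Handler(Looper.getMainLooper());
new Thread(() -> {
  String reply = fetchAssistantReply(userInput); // hypothetical blocking network call
  uiHandler.post(() -> {
    chatAdapter.add(reply);            // ListView updated on the main thread
    chatAdapter.notifyDataSetChanged();
  });
}).start();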

Key Components & Workflow

The activity sets up UI components, processes user inputs through methods like sendUserInput(), and persistently stores conversation sessions.


// Pseudocode: route input by modality.
if (isVoiceInput(userInput)) {   // hypothetical helper checking the input source
  processVoiceInput();
} else {
  processTextInput();
}

Error Handling

Errors are logged via AppLogger and communicated to users via UI notifications.

7. VoiceActivity.java Documentation

Overview

VoiceActivity.java facilitates real-time voice interaction using Android’s SpeechRecognizer and TextToSpeech systems, with an Azure TTS fallback for increased reliability.

Subsystem Details

  • Speech Recognition:
    • Configured for partial results with multi-language support.
    • Handles asynchronous recognition events and error recovery.
  • Text-to-Speech:
    • Customizable via user settings (language, voice, rate).
    • Synchronizes TTS state with UI animations via broadcast receivers.
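
Applying those user settings might look like the following sketch (values are illustrative; tts is assumed to be a field so it can be referenced inside the init callback):

// Sketch: initializing TextToSpeech with user preferences.
tts = new TextToSpeech(context, status -> {
  if (status == TextToSpeech.SUCCESS) {
    tts.setLanguage(Locale.US);   // from user settings
    tts.setSpeechRate(1.1f);      // from user settings
    tts.speak("Hello!", TextToSpeech.QUEUE_FLUSH, null, "echo-utterance");
  }
});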

Integration & Robustness

The activity coordinates with external components (e.g., VisionActivity) to deliver a seamless voice experience.

8. VisionActivity.java Documentation

Overview

VisionActivity.java handles image capture and processing for OCR and object detection, merging API responses to generate a natural language description.

Technical Workflow

  • Camera Setup: Uses TextureView and the legacy Camera API for live preview.
  • Image Processing: Scales, rotates, and encodes images for API consumption.
  • Concurrent API Calls: Sends images to OCR.space and Azure Computer Vision concurrently.
  • Aggregation: Merges API responses into a unified descriptive message.
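
The concurrent calls above can be sketched with an ExecutorService (the two endpoint helpers are hypothetical wrappers around the respective HTTP requests):

// Sketch: running both vision requests in parallel and merging the results.
ExecutorService pool = Executors.newFixedThreadPool(2);
Future<String> ocrText = pool.submit(() -> callOcrSpace(imageBytes));      // hypothetical helper
Future<String> sceneInfo = pool.submit(() -> callAzureVision(imageBytes)); // hypothetical helper
String description = merge(ocrText.get(), sceneInfo.get()); // blocks until both finish
                                                            // (checked-exception handling omitted)
pool.shutdown();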

Error Handling

The activity ensures proper resource management and handles API errors gracefully, logging them via AppLogger.

9. DeepThinkActivity.java Documentation

Overview

DeepThinkActivity.java enables reflective, in-depth analysis by interfacing with the OpenAI Chat API in a streaming manner. It displays a chain-of-thought along with a final answer in a dynamic chat interface.

Architecture & Workflow

  • Constructs detailed prompts to elicit step-by-step reasoning.
  • Streams intermediate responses to update the UI in real time.
  • Stores conversation summaries for persistent history.
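
The streaming loop can be sketched as reading the API's server-sent event lines (JSON delta parsing is delegated to a hypothetical helper; IOException handling omitted):

// Sketch: consuming a streamed chat completion and updating the UI per chunk.
BufferedReader reader = new BufferedReader(
    new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8));
String line;
while ((line = reader.readLine()) != null) {
  if (line.startsWith("data: ") && !line.equals("data: [DONE]")) {
    String chunk = extractDelta(line.substring(6)); // hypothetical JSON helper
    uiHandler.post(() -> appendToChat(chunk));      // incremental UI update
  }
}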

Technical Considerations

Utilizes background threads, custom adapters, and UI handlers to maintain high responsiveness.

10. WebSearchActivity.java Documentation

Overview

WebSearchActivity.java provides a chat-like interface for performing web searches using DuckDuckGo and synthesizing detailed answers with the OpenAI Chat API.

Detailed Workflow

  1. Encode the user query and send a GET request to DuckDuckGo.
  2. Parse the JSON response to extract search summaries or related topics.
  3. Construct a composite prompt for the OpenAI API based on search results.
  4. Stream the AI response to update the UI in real time.
  5. Store a truncated summary of the interaction in conversation history.
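
Steps 1 and 2 might look like this sketch (field names follow DuckDuckGo's Instant Answer JSON; the GET helper is hypothetical):

// Sketch: querying DuckDuckGo and pulling a summary from the response.
String url = "https://api.duckduckgo.com/?format=json&q="
    + URLEncoder.encode(query, "UTF-8");
JSONObject json = new JSONObject(httpGet(url)); // hypothetical GET helper
String summary = json.optString("AbstractText");
JSONArray topics = json.optJSONArray("RelatedTopics");
if (summary.isEmpty() && topics != null && topics.length() > 0) {
  JSONObject first = topics.optJSONObject(0);
  if (first != null) summary = first.optString("Text");
}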

Error Handling

A ProgressDialog indicates processing, and Toast messages notify the user of any network or parsing errors.

11. ExploreActivity.java Documentation

Overview

ExploreActivity.java offers a dynamic discovery interface that aggregates multiple rows of interactive cards—including fixed content, dynamic user-based categories, GPT-generated ideas, trending news, and recent media.

Key Features

  • Fixed Rows: Pre-defined suggestions such as featured cards and art prompts.
  • Dynamic Generation: Fetches user-based topics and GPT-generated categories.
  • Asynchronous Loading: Background threads and handlers fetch images and API data for smooth UI performance.
  • Interactive Cards: Tappable cards launch related activities with context-specific prompts.

Implementation Details

The activity uses HorizontalScrollView and nested LinearLayout elements to organize content. GPT-generated categories are fetched asynchronously to update the UI dynamically.

12. KeystoreHelper.java Documentation

Overview

KeystoreHelper.java offers a secure interface for generating, encrypting, and decrypting keys using the Android Keystore. It supports both device-specific encryption and universal encryption modes.

Device-Specific Encryption

  • Key Generation: Generates an AES key stored in the Android Keystore using KeyGenParameterSpec.
  • Encryption: Uses AES/GCM/NoPadding to produce Base64-encoded ciphertext and IV.
  • Decryption: Reconstructs the original data using the stored key and IV.
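
Key generation along these lines might look like the following sketch (alias and key size are illustrative):

// Sketch: creating a Keystore-backed AES key for AES/GCM/NoPadding.
KeyGenerator generator = KeyGenerator.getInstance(
    KeyProperties.KEY_ALGORITHM_AES, "AndroidKeyStore");
generator.init(new KeyGenParameterSpec.Builder(
        "echoai_master_key", // illustrative alias
        KeyProperties.PURPOSE_ENCRYPT | KeyProperties.PURPOSE_DECRYPT)
    .setBlockModes(KeyProperties.BLOCK_MODE_GCM)
    .setEncryptionPaddings(KeyProperties.ENCRYPTION_PADDING_NONE)
    .setKeySize(256)
    .build());
generator.generateKey(); // key material never leaves the Keystore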

Universal Encryption

  • Uses a hardcoded Base64-encoded key for pre-encrypting data during development.
  • Allows sensitive data (e.g., API keys) to be encrypted before shipping the APK.

Technical Workflow

The helper methods abstract cryptographic operations via the Cipher API and GCMParameterSpec. For example:


// Example of device-specific encryption:
EncryptionResult result = KeystoreHelper.encrypt("sensitiveData");
        

Error Handling

Exceptions are thrown if encryption or decryption fails, ensuring that any cryptographic errors are properly propagated.

13. PreferencesHelper.java Documentation

Overview

PreferencesHelper.java manages persistent storage of encrypted API keys and their initialization vectors (IVs) using SharedPreferences. It supports multiple API keys used throughout the application.

Functional Capabilities

  • Saving Keys: Methods such as saveEncryptedApiKey() securely store keys.
  • Retrieving Keys: Methods such as getEncryptedApiKey() abstract SharedPreferences access.
  • Clearing Keys: Provides functionality to remove stored keys, enabling key rotation.

Implementation Details

Each API key type is mapped to a unique constant in SharedPreferences for uniform access and reduced redundancy.


// Example: Saving an encrypted API key:
PreferencesHelper.saveEncryptedApiKey(context, encryptedApiKey, iv);
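// Retrieving it later via the matching accessor (exact signature may vary):
String storedKey = PreferencesHelper.getEncryptedApiKey(context);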
        

Error Handling

These methods rely on Android's SharedPreferences system and assume that storage and retrieval succeed; no additional error handling is layered on top.

14. ToolsManager.java Documentation

Overview

ToolsManager.java encapsulates static helper methods for device-level operations such as flashlight control, image capture, vibration, and timer management. It leverages legacy APIs to ensure compatibility with older Android devices.

Key Responsibilities & Methods

  • turnOnFlashlight(): Activates the flashlight using the Camera API with a dummy SurfaceTexture.
  • turnOffFlashlight(): Stops the camera preview and releases camera resources.
  • takePicture(): Launches the device’s camera app via an Intent.
  • vibrateDevice(): Triggers device vibration using appropriate APIs based on Android version.
  • setTimer(): Initiates a countdown timer via CountDownTimer and executes a callback when finished.

Core Implementation Details

Hardware-specific operations are abstracted into reusable static methods. For example, a dummy preview is configured to enable torch mode, and conditional checks ensure compatibility with both legacy and modern vibration APIs.
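
The vibration check described above might be sketched as follows:

// Sketch: version-aware vibration.
Vibrator vibrator = (Vibrator) context.getSystemService(Context.VIBRATOR_SERVICE);
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
  vibrator.vibrate(VibrationEffect.createOneShot(500, VibrationEffect.DEFAULT_AMPLITUDE));
} else {
  vibrator.vibrate(500); // deprecated on API 26+, retained for legacy devices
}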


// Example usage:
if (ToolsManager.turnOnFlashlight()) {
  // Flashlight activated successfully.
}
        

Error Handling and Logging

Methods are enclosed in try-catch blocks and log errors using AppLogger.log() to ensure graceful failure and troubleshooting.

15. MemoryActivity.java Documentation

Overview

MemoryActivity.java displays and manages the assistant's memory of extracted conversational facts. It loads stored memories from SharedPreferences, displays them in a ListView, and offers functionality to clear the memory.

Key Responsibilities

  • Data Retrieval: Loads memory data (stored as a JSON array) from SharedPreferences.
  • UI Presentation: Displays memory items in a ListView using an ArrayAdapter.
  • Data Management: Provides a clear memory function that updates both the UI and persistent storage.

Technical Implementation

The activity retrieves memory data using the key ai_memory from the EchoAI_Memory SharedPreferences file. The JSON array is parsed and each memory item is added to an ArrayList that is bound to a ListView.


// Pseudocode:
String memoryJson = prefs.getString("ai_memory", "[]");
JSONArray memoryArray = new JSONArray(memoryJson);
for (int i = 0; i < memoryArray.length(); i++) {
  memoryItems.add(memoryArray.getString(i));
}
        

Error Handling

JSON parsing errors are caught and logged. Users are notified via Toast messages if memory loading fails.

16. CommandMapping.java Documentation

Overview

CommandMapping.java is a lightweight classifier that detects specific command patterns in recognized text and returns corresponding ActionType values. It supports both standard commands and additional tool commands.

Key Responsibilities

  • Pattern Matching: Utilizes case-insensitive regular expressions to detect commands.
  • Weighting: Assigns numerical weights to mappings to influence selection priority.
  • Parameter Extraction: Extracts command parameters via captured regex groups.
  • Default Fallback: Provides a default mapping for unrecognized input.

Core Components

  • Regular Expressions: Compiled patterns stored for fast matching.
  • ActionType Enum: Maps each pattern to a corresponding command action.
  • Mapping Weight: Numerical value indicating priority when multiple mappings match.

Technical Workflow

When input text is processed, each CommandMapping iterates through its regex patterns. If a match is found, parameters are extracted and the corresponding ActionType is returned. The static method getDefaultMappings() returns a list of mappings covering common commands.


// Example usage:
List<CommandMapping> mappings = CommandMapping.getDefaultMappings();
for (CommandMapping mapping : mappings) {
  if (mapping.matches(userInput)) {
    List<String> params = mapping.extractParameters(userInput);
    // Execute action with extracted parameters.
  }
}
        

Error Handling and Logging

All pattern additions and matches are logged via AppLogger.log() to facilitate debugging and ensure reliable command detection.

17. AzureTTS.java Documentation

Overview

AzureTTS.java provides methods to synthesize speech using Azure's TTS REST API. It supports both a basic single-argument method and an overloaded version that accepts an AzureTTSCallback for notifying callers when playback starts and finishes.

Key Responsibilities

  • SSML Generation: Constructs SSML payloads for Azure TTS.
  • HTTP Communication: Manages secure HTTP POST requests to Azure’s TTS endpoint.
  • Audio Playback: Saves the synthesized speech as a WAV file and plays it back using MediaPlayer.
  • Callback Integration: Notifies the caller when TTS playback starts and finishes.

Core Components & Workflow

The class builds an SSML request wrapping the text to be spoken, sends it via HTTP POST to Azure's TTS endpoint, and checks the response for audio content. The received audio is cached as a file and played with MediaPlayer, which triggers the callback methods on start and completion.


// Pseudocode for speech synthesis:
ssml = "...your text...";
POST ssml to Azure TTS endpoint;
if (response is audio) {
  save audio file;
  play audio using MediaPlayer;
  callback.onTTSStarted();
  ...
  callback.onTTSFinished();
}
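
The SSML payload itself might look like the following sketch (voice name illustrative; the escaping helper is hypothetical):

// Sketch: SSML body for the POST request.
String ssml = "<speak version='1.0' xml:lang='en-US'>"
    + "<voice name='en-US-JennyNeural'>" + escapeXml(text) + "</voice>"
    + "</speak>";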
        

Error Handling

Network and content errors are logged via AppLogger.log(), and the callback's onTTSFinished() is invoked to signal the end of processing.

Integration

AzureTTS.java is integrated within the voice subsystem to provide an alternative TTS engine, enabling dynamic selection based on user preferences.

18. ImageGen.java Documentation

Overview

ImageGen.java is an IntentService that generates images based on a given text prompt using OpenAI’s image generation API. The service loads and decrypts its API key at startup, sends a POST request with the prompt, and broadcasts the resulting image URL.

Key Responsibilities

  • API Key Management: Loads and decrypts the ImageGen API key from SharedPreferences or asset files using KeystoreHelper and PreferencesHelper.
  • HTTP Communication: Sends a JSON payload containing the prompt to the OpenAI image generation endpoint.
  • Response Handling: Parses the JSON response to extract the generated image URL and broadcasts it.
  • Service Notification: Broadcasts an intent when image generation starts and when the image URL is available.

Core Components & Workflow

Upon receiving an intent with a text prompt, the service sends a POST request to the OpenAI image generation endpoint. The JSON payload includes the prompt, the number of images to generate, and the desired size. If successful, it parses the returned JSON to extract the image URL and broadcasts it using a custom intent.


// Pseudocode for image generation:
if (prompt is not empty) {
  POST { "prompt": prompt, "n": 1, "size": "512x512" } to OpenAI image endpoint;
  if (response code is OK) {
    parse JSON to extract image URL;
    broadcast "OPENAI_IMAGE_GENERATED" with image URL;
  } else {
    log error;
  }
}
        

Error Handling

If the API call fails or no image URL is returned, the service logs the error via Log.e and broadcasts an appropriate error message.