1. Introduction
EchoAI is a next-generation conversational assistant built entirely on Android OS. It integrates cutting‐edge voice interaction, advanced deep thinking, image processing, web search capabilities, and secure local memory management.
Designed with a focus on privacy, adaptability, and transparency, EchoAI is engineered to deliver intelligent, context-aware responses without compromising user data security.
EchoAI is the culmination of innovative mobile development and state‐of‐the‐art AI integration, aimed at redefining conversational experiences on the go.
2. System Architecture Overview
The EchoAI platform is composed of several modular subsystems that work in tandem:
Voice Interaction System
Deep Think Engine
Vision Module
Web Search System
Memory & Knowledge Extraction
Secure API Gateway
High-Level Data Flow:
- User input is captured as voice or text.
- The system processes context and detects user intent.
- If needed, an external API query is initiated (for vision or web search).
- Responses are filtered, summarized, and then integrated.
- Extracted facts are stored locally for memory recall.
- Results are presented via text display or synthesized speech output.
// Example of high-level flow:
User Input -> Intent Detection -> [Optional External Query] -> Summarization -> Memory Update -> Output
3a. Voice Interaction System
This subsystem utilizes Android’s native SpeechRecognizer to capture voice input and process speech in real time. It supports multi-language recognition and dynamic adjustments for speech rate and pitch.
- Multi-language support: Automatically adapts to the device’s locale settings.
- Customizable Parameters: Allows adjustments of speech rate and pitch via user settings.
- Security: Processes all voice data locally or through secure APIs—no persistent storage is used.
Technical Detail: The system leverages partial results for immediate feedback and implements error correction algorithms for enhanced transcription accuracy.
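A minimal sketch of such a partial-results recognizer setup is shown below; the listener wiring uses the standard Android APIs, while showInterim() is a hypothetical UI hook rather than EchoAI's exact implementation.
// Sketch: SpeechRecognizer configured for partial results; showInterim() is hypothetical.
SpeechRecognizer recognizer = SpeechRecognizer.createSpeechRecognizer(context);
Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault().toString());
intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
recognizer.setRecognitionListener(new RecognitionListener() {
    @Override public void onPartialResults(Bundle partial) {
        // Surface the interim transcription for immediate feedback.
        ArrayList<String> texts =
                partial.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        if (texts != null && !texts.isEmpty()) showInterim(texts.get(0));
    }
    @Override public void onResults(Bundle results) { /* deliver final transcript */ }
    @Override public void onError(int error) { /* apply error recovery */ }
    // Remaining RecognitionListener callbacks omitted for brevity.
    @Override public void onReadyForSpeech(Bundle b) {}
    @Override public void onBeginningOfSpeech() {}
    @Override public void onRmsChanged(float rms) {}
    @Override public void onBufferReceived(byte[] buf) {}
    @Override public void onEndOfSpeech() {}
    @Override public void onEvent(int type, Bundle b) {}
});
recognizer.startListening(intent);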
3b. Deep Think Engine
The Deep Think Engine is engineered to handle complex queries by interfacing with GPT-based models. It constructs detailed prompts and leverages streaming responses to provide step-by-step reasoning.
- Chained Response Architecture: Enables the AI model to articulate intermediate reasoning before delivering a final answer.
- Contextual Analysis: Automatically identifies key contextual elements in complex queries.
- Safeguards: Implements limits to prevent runaway or excessively verbose responses.
Example Prompt:
"Explain the principles of quantum computing step-by-step. Finally, provide a summary starting with 'Final Answer:'"
3c. Vision Module
The Vision Module enables image-based interactions by employing OCR to extract text and utilizing Azure Computer Vision for object detection and scene description.
- OCR Integration: Captured images are sent to OCR.space for text extraction.
- Object Detection: Azure Computer Vision detects objects and generates descriptive captions.
- Dynamic Description: A natural language prompt is generated via the OpenAI API to summarize the detected content.
Technical Note: The module calls multiple external APIs concurrently and aggregates their results into a unified descriptive output.
3d. Web Search System
The Web Search System conducts real-time searches using DuckDuckGo and processes the results with an AI-based summarizer built on GPT models. It is designed to preserve user privacy.
- DuckDuckGo Integration: Retrieves search results in JSON format.
- AI Summarization: Condenses search results using GPT-based techniques.
- Configurable Fallback: Offers an option to disable web search integration if desired.
Example Workflow: The system encodes the query, retrieves a summary from DuckDuckGo, and then refines the information using the OpenAI API.
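A condensed sketch of that workflow, assuming the public DuckDuckGo Instant Answer endpoint and its AbstractText field; error handling is omitted for brevity.
// Sketch: encode the query, fetch JSON from DuckDuckGo, and extract a summary.
String encoded = URLEncoder.encode(userQuery, "UTF-8");
URL url = new URL("https://api.duckduckgo.com/?q=" + encoded + "&format=json&no_html=1");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
StringBuilder body = new StringBuilder();
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) body.append(line);
}
String summary = new JSONObject(body.toString()).optString("AbstractText", "");
// The summary is then refined through the OpenAI API before display.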
3e. Memory & Knowledge Management
EchoAI maintains a secure, on-device memory store for conversation history and extracted facts, ensuring context-aware responses over prolonged interactions.
- Local Storage: Encrypted JSON data is stored in SharedPreferences.
- Memory Recall: Commands allow retrieval of stored facts.
- Data Wipe: Users can completely clear conversation history.
Implementation Detail: Memory extraction algorithms scan conversation transcripts for key phrases (e.g., “I love”, “I like”) to dynamically update the memory store.
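A simplified sketch of that scan; the trigger list comes from the description above, while the helper shape and names are illustrative.
// Sketch: detect memory-worthy phrases in an utterance and store the match.
private static final Pattern MEMORY_TRIGGER =
        Pattern.compile("\\b(i love|i like)\\b.+", Pattern.CASE_INSENSITIVE);

void extractFacts(String utterance, JSONArray memoryArray) {
    Matcher m = MEMORY_TRIGGER.matcher(utterance);
    if (m.find()) {
        memoryArray.put(m.group().trim()); // e.g. stores "I love hiking"
    }
}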
3f. Privacy & Security Systems
EchoAI enforces strict privacy and security measures. All sensitive data, including API keys and conversation history, is encrypted and stored locally.
- Encryption: All encryption operations use AES-GCM.
- Key Management: API keys are managed via the Android Keystore or universal encryption methods.
- Data Isolation: Sensitive data is not transmitted without explicit user consent.
Security Enhancements: Certificate pinning and TLS secure all external communications.
4. Roadmap & Future Enhancements
- Expanded Multimodal Vision Support: Integration of advanced image and video analysis techniques.
- Offline Memory Backup: Development of secure export and restoration mechanisms for local data.
- Command Scripting: Introduction of a scripting layer for advanced user commands.
- Multi-language Mode: Enhanced support for non-English interactions (planned Q3 2025).
Future Direction: Continuous improvement driven by user feedback and emerging AI advancements.
5. Developer Notes
EchoAI is developed entirely on Android OS using Code Studio and Python Studio. The project emphasizes modern encryption, asynchronous processing, and modular design to ensure a responsive and secure user experience.
- Asynchronous Task Management: All long-running operations (e.g., API calls, image processing) are executed on background threads.
- Local Encryption: Sensitive data is secured using AES-GCM with keys managed in the Android Keystore.
- UI Responsiveness: Handlers and animation listeners ensure smooth transitions.
- Extensive Logging: Debug information is logged via AppLogger.log() to facilitate troubleshooting.
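A minimal sketch combining the threading and logging conventions listed above; callExternalApi() and updateUi() are hypothetical helpers.
// Sketch: run a long API call off the main thread and post the result back.
ExecutorService executor = Executors.newSingleThreadExecutor();
Handler mainHandler = new Handler(Looper.getMainLooper());
executor.execute(() -> {
    try {
        String response = callExternalApi(request);
        mainHandler.post(() -> updateUi(response));
    } catch (Exception e) {
        AppLogger.log("Background task failed: " + e);
    }
});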
6. MainActivity.java Documentation
Overview
MainActivity.java is the central hub of EchoAI. It initializes the chat interface, manages broadcast receivers, and orchestrates data flow between subsystems.
Technical Highlights:
- Robust asynchronous communication using Handlers.
- Dynamic ListView updates for real-time chat.
- Secure API key management through integrated decryption routines.
Key Components & Workflow
The activity sets up UI components, processes user inputs through methods like sendUserInput(), and persistently stores conversation sessions.
// Pseudocode snippet:
if (userInput.isVoice()) {
    processVoiceInput(userInput);
} else {
    processTextInput(userInput);
}
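The real-time chat updates noted above reduce to a pattern like the following; the adapter and message model names are illustrative.
// Sketch: append a message and refresh the chat ListView on the main thread.
void appendMessage(String text, boolean fromUser) {
    runOnUiThread(() -> {
        chatMessages.add(new ChatMessage(text, fromUser)); // hypothetical model class
        chatAdapter.notifyDataSetChanged();
        chatListView.smoothScrollToPosition(chatAdapter.getCount() - 1);
    });
}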
Error Handling
Errors are logged via AppLogger and communicated to users via UI notifications.
7. VoiceActivity.java Documentation
Overview
VoiceActivity.java facilitates real-time voice interaction using Android’s SpeechRecognizer and TextToSpeech systems, with an Azure TTS fallback for increased reliability.
Subsystem Details
- Speech Recognition:
  - Configured for partial results with multi-language support.
  - Handles asynchronous recognition events and error recovery.
- Text-to-Speech:
  - Customizable via user settings (language, voice, rate), as sketched below.
  - Synchronizes TTS state with UI animations via broadcast receivers.
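The user-facing TTS settings map onto the standard TextToSpeech API roughly as follows; the default values shown are illustrative.
// Sketch: initialize TextToSpeech and apply user-selected settings.
private TextToSpeech tts;

void initTts(Context context) {
    tts = new TextToSpeech(context, status -> {
        if (status == TextToSpeech.SUCCESS) {
            tts.setLanguage(Locale.getDefault());
            tts.setSpeechRate(1.0f); // from user settings
            tts.setPitch(1.0f);      // from user settings
        }
    });
}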
Integration & Robustness
Coordinates with external components (e.g., VisionActivity) to deliver a seamless voice experience.
8. VisionActivity.java Documentation
Overview
VisionActivity.java handles image capture and processing for OCR and object detection, merging API responses to generate a natural language description.
Technical Workflow
- Camera Setup: Uses TextureView and the legacy Camera API for live preview.
- Image Processing: Scales, rotates, and encodes images for API consumption.
- Concurrent API Calls: Sends images to OCR.space and Azure Computer Vision concurrently.
- Aggregation: Merges API responses into a unified descriptive message.
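The concurrent step can be sketched with a small thread pool; callOcrSpace(), callAzureVision(), buildDescription(), and imageBytes are illustrative names.
// Sketch: query both APIs in parallel, then merge once both complete.
ExecutorService pool = Executors.newFixedThreadPool(2);
Future<String> ocrText = pool.submit(() -> callOcrSpace(imageBytes));
Future<String> caption = pool.submit(() -> callAzureVision(imageBytes));
// get() blocks until each call finishes and throws checked exceptions to handle.
String description = buildDescription(ocrText.get(), caption.get());
pool.shutdown();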
Error Handling
Ensures proper resource management and graceful handling of API errors via AppLogger.
9. DeepThinkActivity.java Documentation
Overview
DeepThinkActivity.java enables reflective, in-depth analysis by interfacing with the OpenAI Chat API in a streaming manner. It displays a chain-of-thought along with a final answer in a dynamic chat interface.
Architecture & Workflow
- Constructs detailed prompts to elicit step-by-step reasoning.
- Streams intermediate responses to update the UI in real time.
- Stores conversation summaries for persistent history.
Technical Considerations
Utilizes background threads, custom adapters, and UI handlers to maintain high responsiveness.
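A condensed sketch of the streaming loop, assuming the Chat API's server-sent-event framing ("data: {...}" lines terminated by "data: [DONE]"); connection, uiHandler, and appendToChat() are illustrative names.
// Sketch: consume the streamed completion and surface each delta in the UI.
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) {
        if (!line.startsWith("data: ")) continue;
        String payload = line.substring(6).trim();
        if (payload.equals("[DONE]")) break;
        String delta = new JSONObject(payload)
                .getJSONArray("choices").getJSONObject(0)
                .getJSONObject("delta").optString("content", "");
        uiHandler.post(() -> appendToChat(delta));
    }
}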
10. WebSearchActivity.java Documentation
Overview
WebSearchActivity.java provides a chat-like interface for performing web searches using DuckDuckGo and synthesizing detailed answers with the OpenAI Chat API.
Detailed Workflow
- Encode the user query and send a GET request to DuckDuckGo.
- Parse the JSON response to extract search summaries or related topics.
- Construct a composite prompt for the OpenAI API based on search results.
- Stream the AI response to update the UI in real time.
- Store a truncated summary of the interaction in conversation history.
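The parsing and prompt-construction steps above can be sketched as follows; the AbstractText and RelatedTopics fields follow the public Instant Answer API, and the prompt wording is illustrative.
// Sketch: extract a summary from the DuckDuckGo JSON and build the composite prompt.
JSONObject result = new JSONObject(responseBody);
String summary = result.optString("AbstractText", "");
if (summary.isEmpty()) {
    JSONArray topics = result.optJSONArray("RelatedTopics");
    if (topics != null && topics.length() > 0) {
        summary = topics.getJSONObject(0).optString("Text", "");
    }
}
String prompt = "Using these search results:\n" + summary
        + "\nAnswer the user's question: " + userQuery;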
Error Handling
A ProgressDialog indicates processing, and Toast messages notify the user of any network or parsing errors.
11. ExploreActivity.java Documentation
Overview
ExploreActivity.java offers a dynamic discovery interface that aggregates multiple rows of interactive cards—including fixed content, dynamic user-based categories, GPT-generated ideas, trending news, and recent media.
Key Features
- Fixed Rows: Pre-defined suggestions such as featured cards and art prompts.
- Dynamic Generation: Fetches user-based topics and GPT-generated categories.
- Asynchronous Loading: Background threads and handlers fetch images and API data for smooth UI performance.
- Interactive Cards: Tappable cards launch related activities with context-specific prompts.
Implementation Details
The activity uses HorizontalScrollView and nested LinearLayout elements to organize content. GPT-generated categories are fetched asynchronously to update the UI dynamically.
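A minimal sketch of adding one tappable card to a row; the view ID, card title, extra key, and target activity are illustrative assumptions.
// Sketch: build a card programmatically and launch an activity on tap.
LinearLayout row = findViewById(R.id.featured_row); // row inside a HorizontalScrollView
TextView card = new TextView(this);
card.setText(cardTitle);
card.setPadding(32, 32, 32, 32);
card.setOnClickListener(v -> {
    Intent intent = new Intent(this, DeepThinkActivity.class);
    intent.putExtra("prompt", contextPrompt); // hypothetical extra key
    startActivity(intent);
});
row.addView(card);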
12. KeystoreHelper.java Documentation
Overview
KeystoreHelper.java offers a secure interface for generating, encrypting, and decrypting keys using the Android Keystore. It supports both device-specific encryption and universal encryption modes.
Device-Specific Encryption
- Key Generation: Generates an AES key stored in the Android Keystore using KeyGenParameterSpec.
- Encryption: Uses AES/GCM/NoPadding to produce Base64-encoded ciphertext and IV.
- Decryption: Reconstructs the original data using the stored key and IV.
Universal Encryption
- Uses a hardcoded Base64-encoded key for pre-encrypting data during development.
- Allows sensitive data (e.g., API keys) to be encrypted before shipping the APK.
Technical Workflow
The helper methods abstract cryptographic operations via the Cipher API and GCMParameterSpec. For example:
// Example of device-specific encryption:
EncryptionResult result = KeystoreHelper.encrypt("sensitiveData");
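Under the hood, a call of that shape maps onto the standard Keystore and Cipher APIs roughly as follows; the key alias is illustrative, and key generation runs only once per alias.
// Sketch: generate a Keystore-backed AES key and encrypt with AES/GCM/NoPadding.
KeyGenerator keyGen = KeyGenerator.getInstance(
        KeyProperties.KEY_ALGORITHM_AES, "AndroidKeyStore");
keyGen.init(new KeyGenParameterSpec.Builder("echoai_key", // illustrative alias
        KeyProperties.PURPOSE_ENCRYPT | KeyProperties.PURPOSE_DECRYPT)
        .setBlockModes(KeyProperties.BLOCK_MODE_GCM)
        .setEncryptionPaddings(KeyProperties.ENCRYPTION_PADDING_NONE)
        .build());
SecretKey key = keyGen.generateKey();

Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
cipher.init(Cipher.ENCRYPT_MODE, key);
String iv = Base64.encodeToString(cipher.getIV(), Base64.DEFAULT);
String ciphertext = Base64.encodeToString(
        cipher.doFinal(plainText.getBytes(StandardCharsets.UTF_8)), Base64.DEFAULT);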
Error Handling
Exceptions are thrown if encryption or decryption fails, ensuring that any cryptographic errors are properly propagated.
13. PreferencesHelper.java Documentation
Overview
PreferencesHelper.java manages persistent storage of encrypted API keys and their initialization vectors (IVs) using SharedPreferences. It supports multiple API keys used throughout the application.
Functional Capabilities
- Saving Keys: Methods such as saveEncryptedApiKey() securely store keys.
- Retrieving Keys: Methods such as getEncryptedApiKey() abstract SharedPreferences access.
- Clearing Keys: Provides functionality to remove stored keys, enabling key rotation.
Implementation Details
Each API key type is mapped to a unique constant in SharedPreferences for uniform access and reduced redundancy.
// Example: Saving an encrypted API key:
PreferencesHelper.saveEncryptedApiKey(context, encryptedApiKey, iv);
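Internally, such a method is a thin wrapper over SharedPreferences; the file and key names below are illustrative.
// Sketch: persist the ciphertext and IV under fixed preference keys.
public static void saveEncryptedApiKey(Context context, String encryptedKey, String iv) {
    SharedPreferences prefs =
            context.getSharedPreferences("EchoAI_Prefs", Context.MODE_PRIVATE);
    prefs.edit()
            .putString("api_key_encrypted", encryptedKey)
            .putString("api_key_iv", iv)
            .apply();
}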
Error Handling
These methods rely on Android’s robust SharedPreferences system and assume successful storage and retrieval.
14. ToolsManager.java Documentation
Overview
ToolsManager.java encapsulates static helper methods for device-level operations such as flashlight control, image capture, vibration, and timer management. It leverages legacy APIs to ensure compatibility with older Android devices.
Key Responsibilities & Methods
- turnOnFlashlight(): Activates the flashlight using the Camera API with a dummy SurfaceTexture.
- turnOffFlashlight(): Stops the camera preview and releases camera resources.
- takePicture(): Launches the device’s camera app via an Intent.
- vibrateDevice(): Triggers device vibration using appropriate APIs based on Android version.
- setTimer(): Initiates a countdown timer via CountDownTimer and executes a callback when finished.
Core Implementation Details
Hardware-specific operations are abstracted into reusable static methods. For example, a dummy preview is configured to enable torch mode, and conditional checks ensure compatibility with both legacy and modern vibration APIs.
// Example usage:
if (ToolsManager.turnOnFlashlight()) {
// Flashlight activated successfully.
}
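The version check mentioned above typically looks like the following; the 500 ms duration is illustrative.
// Sketch: use VibrationEffect on API 26+, fall back to the legacy call otherwise.
Vibrator vibrator = (Vibrator) context.getSystemService(Context.VIBRATOR_SERVICE);
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
    vibrator.vibrate(VibrationEffect.createOneShot(500, VibrationEffect.DEFAULT_AMPLITUDE));
} else {
    vibrator.vibrate(500); // deprecated, but needed on pre-O devices
}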
Error Handling and Logging
Methods are enclosed in try-catch blocks and log errors using AppLogger.log() to ensure graceful failure and aid troubleshooting.
15. MemoryActivity.java Documentation
Overview
MemoryActivity.java displays and manages the assistant's memory of extracted conversational facts. It loads stored memories from SharedPreferences, displays them in a ListView, and offers functionality to clear the memory.
Key Responsibilities
- Data Retrieval: Loads memory data (stored as a JSON array) from SharedPreferences.
- UI Presentation: Displays memory items in a ListView using an ArrayAdapter.
- Data Management: Provides a clear memory function that updates both the UI and persistent storage.
Technical Implementation
The activity retrieves memory data using the key ai_memory from the EchoAI_Memory SharedPreferences file. The JSON array is parsed and each memory item is added to an ArrayList that is bound to a ListView.
// Loading stored memories:
SharedPreferences prefs = getSharedPreferences("EchoAI_Memory", MODE_PRIVATE);
String memoryJson = prefs.getString("ai_memory", "[]");
try {
    JSONArray memoryArray = new JSONArray(memoryJson);
    for (int i = 0; i < memoryArray.length(); i++) {
        memoryItems.add(memoryArray.getString(i));
    }
} catch (JSONException e) {
    AppLogger.log("Failed to parse memory JSON: " + e);
}
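The clear-memory function then resets both sides in one step; the adapter and list names are illustrative.
// Sketch: wipe the persisted memory and refresh the bound list.
prefs.edit().remove("ai_memory").apply();
memoryItems.clear();
adapter.notifyDataSetChanged();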
Error Handling
JSON parsing errors are caught and logged. Users are notified via Toast messages if memory loading fails.
16. CommandMapping.java Documentation
Overview
CommandMapping.java is a lightweight classifier that detects specific command patterns in recognized text and returns corresponding ActionType values. It supports both standard commands and additional tool commands.
Key Responsibilities
- Pattern Matching: Utilizes case‑insensitive regular expressions to detect commands.
- Weighting: Assigns numerical weights to mappings to influence selection priority.
- Parameter Extraction: Extracts command parameters via captured regex groups.
- Default Fallback: Provides a default mapping for unrecognized input.
Core Components
- Regular Expressions: Compiled patterns stored for fast matching.
- ActionType Enum: Maps each pattern to a corresponding command action.
- Mapping Weight: Numerical value indicating priority when multiple mappings match.
Technical Workflow
When input text is processed, each CommandMapping iterates through its regex patterns. If a match is found, parameters are extracted and the corresponding ActionType is returned. The static method getDefaultMappings() returns a list of mappings covering common commands.
// Example usage:
List<CommandMapping> mappings = CommandMapping.getDefaultMappings();
for (CommandMapping mapping : mappings) {
if (mapping.matches(userInput)) {
List<String> params = mapping.extractParameters(userInput);
// Execute action with extracted parameters.
}
}
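A representative mapping definition might look like this; the constructor shape, the SET_TIMER action, and the weight value are assumptions based on the description above.
// Sketch: a weighted, case-insensitive mapping with one parameter group.
Pattern timerPattern = Pattern.compile(
        "set (?:a )?timer for (\\d+) (second|minute)s?", Pattern.CASE_INSENSITIVE);
CommandMapping timerMapping =
        new CommandMapping(timerPattern, ActionType.SET_TIMER, /* weight = */ 10);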
Error Handling and Logging
All pattern additions and matches are logged via AppLogger.log() to facilitate debugging and ensure reliable command detection.
17. AzureTTS.java Documentation
Overview
AzureTTS.java provides methods to synthesize speech using Azure's TTS REST API. It supports both a basic single-argument method and an overloaded version that accepts an AzureTTSCallback for notifying callers when playback starts and finishes.
Key Responsibilities
- SSML Generation: Constructs SSML payloads for Azure TTS.
- HTTP Communication: Manages secure HTTP POST requests to Azure’s TTS endpoint.
- Audio Playback: Saves the synthesized speech as a WAV file and plays it back using MediaPlayer.
- Callback Integration: Notifies the caller when TTS playback starts and finishes.
Core Components & Workflow
The class builds an SSML request that encapsulates the text to be spoken, sends it via an HTTP POST request to Azure's TTS endpoint, and checks for audio content in the response. The received audio file is stored in cache and played using MediaPlayer, triggering callback methods on start and completion.
// Pseudocode for speech synthesis:
ssml = "<speak>...your text...</speak>";
POST ssml to Azure TTS endpoint;
if (response is audio) {
save audio file;
play audio using MediaPlayer;
callback.onTTSStarted();
...
callback.onTTSFinished();
}
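Concretely, the payload and headers can be assembled as follows; the region, voice name, and output format are illustrative Azure values, and textToSpeak and apiKey are assumed inputs.
// Sketch: SSML body plus the headers Azure's TTS REST endpoint expects.
String ssml = "<speak version='1.0' xml:lang='en-US'>"
        + "<voice name='en-US-JennyNeural'>" + textToSpeak + "</voice></speak>";
HttpURLConnection conn = (HttpURLConnection) new URL(
        "https://eastus.tts.speech.microsoft.com/cognitiveservices/v1").openConnection();
conn.setRequestMethod("POST");
conn.setRequestProperty("Ocp-Apim-Subscription-Key", apiKey);
conn.setRequestProperty("Content-Type", "application/ssml+xml");
conn.setRequestProperty("X-Microsoft-OutputFormat", "riff-16khz-16bit-mono-pcm");
conn.setDoOutput(true);
conn.getOutputStream().write(ssml.getBytes(StandardCharsets.UTF_8));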
Error Handling
Network or content errors are logged via AppLogger.log(), and the callback's onTTSFinished() is invoked to signal processing end.
Integration
AzureTTS.java is integrated within the voice subsystem to provide an alternative TTS engine, enabling dynamic selection based on user preferences.
18. ImageGen.java Documentation
Overview
ImageGen.java is an IntentService that generates images based on a given text prompt using OpenAI’s image generation API. The service loads and decrypts its API key at startup, sends a POST request with the prompt, and broadcasts the resulting image URL.
Key Responsibilities
- API Key Management: Loads and decrypts the ImageGen API key from SharedPreferences or asset files using KeystoreHelper and PreferencesHelper.
- HTTP Communication: Sends a JSON payload containing the prompt to the OpenAI image generation endpoint.
- Response Handling: Parses the JSON response to extract the generated image URL and broadcasts it.
- Service Notification: Broadcasts an intent when image generation starts and when the image URL is available.
Core Components & Workflow
Upon receiving an intent with a text prompt, the service sends a POST request to the OpenAI image generation endpoint. The JSON payload includes the prompt, the number of images to generate, and the desired size. If successful, it parses the returned JSON to extract the image URL and broadcasts it using a custom intent.
// Pseudocode for image generation:
if (prompt is not empty) {
POST { "prompt": prompt, "n": 1, "size": "512x512" } to OpenAI image endpoint;
if (response code is OK) {
parse JSON to extract image URL;
broadcast "OPENAI_IMAGE_GENERATED" with image URL;
} else {
log error;
}
}
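On the receiving side, a broadcast receiver picks up the result; the action string mirrors the pseudocode above, while the extra key and displayImage() are assumptions.
// Sketch: receive the generated image URL broadcast by the service.
BroadcastReceiver receiver = new BroadcastReceiver() {
    @Override public void onReceive(Context context, Intent intent) {
        String imageUrl = intent.getStringExtra("image_url"); // assumed extra key
        displayImage(imageUrl); // hypothetical UI hook
    }
};
registerReceiver(receiver, new IntentFilter("OPENAI_IMAGE_GENERATED"));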
Error Handling
If the API call fails or no image URL is returned, the service logs the error via Log.e and broadcasts an appropriate error message.