1. Introduction
EchoAI is a next-generation conversational assistant built entirely on Android OS. It integrates cutting‐edge voice interaction, advanced deep thinking, image processing, web search capabilities, and secure local memory management.
Designed with a focus on privacy, adaptability, and transparency, EchoAI is engineered to deliver intelligent, context-aware responses without compromising user data security.
EchoAI is the culmination of innovative mobile development and state‐of‐the‐art AI integration, aimed at redefining conversational experiences on the go.
2. System Architecture Overview
The EchoAI platform is composed of several modular subsystems that work in tandem:
Voice Interaction System
Deep Think Engine
Vision Module
Web Search System
Memory & Knowledge Extraction
Secure API Gateway
High-Level Data Flow:
- User input is captured as voice or text.
- The system processes context and detects user intent.
- If needed, an external API query is initiated (for vision or web search).
- Responses are filtered, summarized, and then integrated.
- Extracted facts are stored locally for memory recall.
- Results are presented via text display or synthesized speech output.
// Example of high-level flow:
User Input -> Intent Detection -> [Optional External Query] -> Summarization -> Memory Update -> Output
3a. Voice Interaction System
This subsystem utilizes Android’s native SpeechRecognizer to capture voice input and process speech in real time. It supports multi-language recognition and dynamic adjustments for speech rate and pitch.
- Multi-language support: Automatically adapts to the device’s locale settings.
- Customizable Parameters: Allows adjustments of speech rate and pitch via user settings.
- Security: Processes all voice data locally or through secure APIs—no persistent storage is used.
Technical Detail: The system leverages partial results for immediate feedback and implements error correction algorithms for enhanced transcription accuracy.
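A minimal sketch of such a partial-results recognizer setup is shown below; the listener wiring uses the standard Android APIs, while showInterim() is a hypothetical UI hook rather than EchoAI's exact implementation.
// Sketch: SpeechRecognizer configured for partial results; showInterim() is hypothetical.
SpeechRecognizer recognizer = SpeechRecognizer.createSpeechRecognizer(context);
Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
        RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE, Locale.getDefault().toString());
intent.putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true);
recognizer.setRecognitionListener(new RecognitionListener() {
    @Override public void onPartialResults(Bundle partial) {
        // Surface the interim transcription for immediate feedback.
        ArrayList<String> texts =
                partial.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
        if (texts != null && !texts.isEmpty()) showInterim(texts.get(0));
    }
    @Override public void onResults(Bundle results) { /* deliver final transcript */ }
    @Override public void onError(int error) { /* apply error recovery */ }
    // Remaining RecognitionListener callbacks omitted for brevity.
    @Override public void onReadyForSpeech(Bundle b) {}
    @Override public void onBeginningOfSpeech() {}
    @Override public void onRmsChanged(float rms) {}
    @Override public void onBufferReceived(byte[] buf) {}
    @Override public void onEndOfSpeech() {}
    @Override public void onEvent(int type, Bundle b) {}
});
recognizer.startListening(intent);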
3b. Deep Think Engine
The Deep Think Engine is engineered to handle complex queries by interfacing with GPT-based models. It constructs detailed prompts and leverages streaming responses to provide step-by-step reasoning.
- Chained Response Architecture: Enables the AI model to articulate intermediate reasoning before delivering a final answer.
- Contextual Analysis: Automatically identifies key contextual elements in complex queries.
- Safeguards: Implements limits to prevent runaway or excessively verbose responses.
Example Prompt:
"Explain the principles of quantum computing step-by-step. Finally, provide a summary starting with 'Final Answer:'"
3c. Vision Module
The Vision Module enables image-based interactions by employing OCR to extract text and utilizing Azure Computer Vision for object detection and scene description.
- OCR Integration: Captured images are sent to OCR.space for text extraction.
- Object Detection: Azure Computer Vision detects objects and generates descriptive captions.
- Dynamic Description: A natural language prompt is generated via the OpenAI API to summarize the detected content.
Technical Note: The module calls multiple external APIs concurrently and aggregates their results into a unified descriptive output.
3d. Web Search System
The Web Search System conducts real-time searches using DuckDuckGo and processes the results with an AI-based summarizer built on GPT models. It is designed to preserve user privacy.
- DuckDuckGo Integration: Retrieves search results in JSON format.
- AI Summarization: Condenses search results using GPT-based techniques.
- Configurable Fallback: Offers an option to disable web search integration if desired.
Example Workflow: The system encodes the query, retrieves a summary from DuckDuckGo, and then refines the information using the OpenAI API.
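A condensed sketch of that workflow, assuming the public DuckDuckGo Instant Answer endpoint and its AbstractText field; error handling is omitted for brevity.
// Sketch: encode the query, fetch JSON from DuckDuckGo, and extract a summary.
String encoded = URLEncoder.encode(userQuery, "UTF-8");
URL url = new URL("https://api.duckduckgo.com/?q=" + encoded + "&format=json&no_html=1");
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
StringBuilder body = new StringBuilder();
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) body.append(line);
}
String summary = new JSONObject(body.toString()).optString("AbstractText", "");
// The summary is then refined through the OpenAI API before display.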
3e. Memory & Knowledge Management
EchoAI maintains a secure, on-device memory store for conversation history and extracted facts, ensuring context-aware responses over prolonged interactions.
- Local Storage: Encrypted JSON data is stored in SharedPreferences.
- Memory Recall: Commands allow retrieval of stored facts.
- Data Wipe: Users can completely clear conversation history.
Implementation Detail: Memory extraction algorithms scan conversation transcripts for key phrases (e.g., “I love”, “I like”) to dynamically update the memory store.
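A simplified sketch of that scan; the trigger list comes from the description above, while the helper shape and names are illustrative.
// Sketch: detect memory-worthy phrases in an utterance and store the match.
private static final Pattern MEMORY_TRIGGER =
        Pattern.compile("\\b(i love|i like)\\b.+", Pattern.CASE_INSENSITIVE);

void extractFacts(String utterance, JSONArray memoryArray) {
    Matcher m = MEMORY_TRIGGER.matcher(utterance);
    if (m.find()) {
        memoryArray.put(m.group().trim()); // e.g. stores "I love hiking"
    }
}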
3f. Privacy & Security Systems
EchoAI enforces strict privacy and security measures. All sensitive data, including API keys and conversation history, is encrypted and stored locally.
- Encryption: All encryption operations use AES-GCM.
- Key Management: API keys are managed via the Android Keystore or universal encryption methods.
- Data Isolation: Sensitive data is not transmitted without explicit user consent.
Security Enhancements: Certificate pinning and TLS secure all external communications.
4. Roadmap & Future Enhancements
- Expanded Multimodal Vision Support: Integration of advanced image and video analysis techniques.
- Offline Memory Backup: Development of secure export and restoration mechanisms for local data.
- Command Scripting: Introduction of a scripting layer for advanced user commands.
- Multi-language Mode: Enhanced support for non-English interactions (planned Q3 2025).
Future Direction: Continuous improvement driven by user feedback and emerging AI advancements.
5. Developer Notes
EchoAI is developed entirely on Android OS using Code Studio and Python Studio. The project emphasizes modern encryption, asynchronous processing, and modular design to ensure a responsive and secure user experience.
- Asynchronous Task Management: All long-running operations (e.g., API calls, image processing) are executed on background threads.
- Local Encryption: Sensitive data is secured using AES-GCM with keys managed in the Android Keystore.
- UI Responsiveness: Handlers and animation listeners ensure smooth transitions.
- Extensive Logging: Debug information is logged via AppLogger.log() to facilitate troubleshooting.
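A minimal sketch combining the threading and logging conventions listed above; callExternalApi() and updateUi() are hypothetical helpers.
// Sketch: run a long API call off the main thread and post the result back.
ExecutorService executor = Executors.newSingleThreadExecutor();
Handler mainHandler = new Handler(Looper.getMainLooper());
executor.execute(() -> {
    try {
        String response = callExternalApi(request);
        mainHandler.post(() -> updateUi(response));
    } catch (Exception e) {
        AppLogger.log("Background task failed: " + e);
    }
});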
6. MainActivity.java Documentation
Overview
MainActivity.java is the central hub of EchoAI. It initializes the chat interface, manages broadcast receivers, and orchestrates data flow between subsystems.
Technical Highlights:
- Robust asynchronous communication using Handlers.
- Dynamic ListView updates for real-time chat.
- Secure API key management through integrated decryption routines.
Key Components & Workflow
The activity sets up UI components, processes user inputs through methods like sendUserInput(), and persistently stores conversation sessions.
// Pseudocode snippet:
if (userInput.isVoice()) {
    processVoiceInput(userInput);
} else {
    processTextInput(userInput);
}
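The real-time chat updates noted above reduce to a pattern like the following; the adapter and message model names are illustrative.
// Sketch: append a message and refresh the chat ListView on the main thread.
void appendMessage(String text, boolean fromUser) {
    runOnUiThread(() -> {
        chatMessages.add(new ChatMessage(text, fromUser)); // hypothetical model class
        chatAdapter.notifyDataSetChanged();
        chatListView.smoothScrollToPosition(chatAdapter.getCount() - 1);
    });
}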
Error Handling
Errors are logged via AppLogger and communicated to users via UI notifications.
7. VoiceActivity.java Documentation
Overview
VoiceActivity.java facilitates real-time voice interaction using Android’s SpeechRecognizer and TextToSpeech systems, with an Azure TTS fallback for increased reliability.
Subsystem Details
- Speech Recognition:
  - Configured for partial results with multi-language support.
  - Handles asynchronous recognition events and error recovery.
- Text-to-Speech:
  - Customizable via user settings (language, voice, rate), as sketched below.
  - Synchronizes TTS state with UI animations via broadcast receivers.
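The user-facing TTS settings map onto the standard TextToSpeech API roughly as follows; the default values shown are illustrative.
// Sketch: initialize TextToSpeech and apply user-selected settings.
private TextToSpeech tts;

void initTts(Context context) {
    tts = new TextToSpeech(context, status -> {
        if (status == TextToSpeech.SUCCESS) {
            tts.setLanguage(Locale.getDefault());
            tts.setSpeechRate(1.0f); // from user settings
            tts.setPitch(1.0f);      // from user settings
        }
    });
}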
Integration & Robustness
Coordinates with external components (e.g., VisionActivity) to deliver a seamless voice experience.
8. VisionActivity.java Documentation
Overview
VisionActivity.java handles image capture and processing for OCR and object detection, merging API responses to generate a natural language description.
Technical Workflow
- Camera Setup: Uses TextureView and the legacy Camera API for live preview.
- Image Processing: Scales, rotates, and encodes images for API consumption.
- Concurrent API Calls: Sends images to OCR.space and Azure Computer Vision concurrently.
- Aggregation: Merges API responses into a unified descriptive message.
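The concurrent step can be sketched with a small thread pool; callOcrSpace(), callAzureVision(), buildDescription(), and imageBytes are illustrative names.
// Sketch: query both APIs in parallel, then merge once both complete.
ExecutorService pool = Executors.newFixedThreadPool(2);
Future<String> ocrText = pool.submit(() -> callOcrSpace(imageBytes));
Future<String> caption = pool.submit(() -> callAzureVision(imageBytes));
// get() blocks until each call finishes and throws checked exceptions to handle.
String description = buildDescription(ocrText.get(), caption.get());
pool.shutdown();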
Error Handling
Ensures proper resource management and graceful handling of API errors via AppLogger.
9. DeepThinkActivity.java Documentation
Overview
DeepThinkActivity.java enables reflective, in-depth analysis by interfacing with the OpenAI Chat API in a streaming manner. It displays a chain-of-thought along with a final answer in a dynamic chat interface.
Architecture & Workflow
- Constructs detailed prompts to elicit step-by-step reasoning.
- Streams intermediate responses to update the UI in real time.
- Stores conversation summaries for persistent history.
Technical Considerations
Utilizes background threads, custom adapters, and UI handlers to maintain high responsiveness.
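A condensed sketch of the streaming loop, assuming the Chat API's server-sent-event framing ("data: {...}" lines terminated by "data: [DONE]"); connection, uiHandler, and appendToChat() are illustrative names.
// Sketch: consume the streamed completion and surface each delta in the UI.
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) {
        if (!line.startsWith("data: ")) continue;
        String payload = line.substring(6).trim();
        if (payload.equals("[DONE]")) break;
        String delta = new JSONObject(payload)
                .getJSONArray("choices").getJSONObject(0)
                .getJSONObject("delta").optString("content", "");
        uiHandler.post(() -> appendToChat(delta));
    }
}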
10. WebSearchActivity.java Documentation
Overview
WebSearchActivity.java provides a chat-like interface for performing web searches using DuckDuckGo and synthesizing detailed answers with the OpenAI Chat API.
Detailed Workflow
- Encode the user query and send a GET request to DuckDuckGo.
- Parse the JSON response to extract search summaries or related topics.
- Construct a composite prompt for the OpenAI API based on search results.
- Stream the AI response to update the UI in real time.
- Store a truncated summary of the interaction in conversation history.
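The parsing and prompt-construction steps above can be sketched as follows; the AbstractText and RelatedTopics fields follow the public Instant Answer API, and the prompt wording is illustrative.
// Sketch: extract a summary from the DuckDuckGo JSON and build the composite prompt.
JSONObject result = new JSONObject(responseBody);
String summary = result.optString("AbstractText", "");
if (summary.isEmpty()) {
    JSONArray topics = result.optJSONArray("RelatedTopics");
    if (topics != null && topics.length() > 0) {
        summary = topics.getJSONObject(0).optString("Text", "");
    }
}
String prompt = "Using these search results:\n" + summary
        + "\nAnswer the user's question: " + userQuery;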
Error Handling
A ProgressDialog indicates processing, and Toast messages notify the user of any network or parsing errors.
11. ExploreActivity.java Documentation
Overview
ExploreActivity.java offers a dynamic discovery interface that aggregates multiple rows of interactive cards—including fixed content, dynamic user-based categories, GPT-generated ideas, trending news, and recent media.
Key Features
- Fixed Rows: Pre-defined suggestions such as featured cards and art prompts.
- Dynamic Generation: Fetches user-based topics and GPT-generated categories.
- Asynchronous Loading: Background threads and handlers fetch images and API data for smooth UI performance.
- Interactive Cards: Tappable cards launch related activities with context-specific prompts.
Implementation Details
The activity uses HorizontalScrollView and nested LinearLayout elements to organize content. GPT-generated categories are fetched asynchronously to update the UI dynamically.
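A minimal sketch of adding one tappable card to a row; the view ID, card title, extra key, and target activity are illustrative assumptions.
// Sketch: build a card programmatically and launch an activity on tap.
LinearLayout row = findViewById(R.id.featured_row); // row inside a HorizontalScrollView
TextView card = new TextView(this);
card.setText(cardTitle);
card.setPadding(32, 32, 32, 32);
card.setOnClickListener(v -> {
    Intent intent = new Intent(this, DeepThinkActivity.class);
    intent.putExtra("prompt", contextPrompt); // hypothetical extra key
    startActivity(intent);
});
row.addView(card);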
12. KeystoreHelper.java Documentation
Overview
KeystoreHelper.java offers a secure interface for generating, encrypting, and decrypting keys using the Android Keystore. It supports both device-specific encryption and universal encryption modes.
Device-Specific Encryption
- Key Generation: Generates an AES key stored in the Android Keystore using KeyGenParameterSpec.
- Encryption: Uses AES/GCM/NoPadding to produce Base64-encoded ciphertext and IV.
- Decryption: Reconstructs the original data using the stored key and IV.
Universal Encryption
- Uses a hardcoded Base64-encoded key for pre-encrypting data during development.
- Allows sensitive data (e.g., API keys) to be encrypted before shipping the APK.
Technical Workflow
The helper methods abstract cryptographic operations via the Cipher API and GCMParameterSpec. For example:
// Example of device-specific encryption:
EncryptionResult result = KeystoreHelper.encrypt("sensitiveData");
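Under the hood, a call of that shape maps onto the standard Keystore and Cipher APIs roughly as follows; the key alias is illustrative, and key generation runs only once per alias.
// Sketch: generate a Keystore-backed AES key and encrypt with AES/GCM/NoPadding.
KeyGenerator keyGen = KeyGenerator.getInstance(
        KeyProperties.KEY_ALGORITHM_AES, "AndroidKeyStore");
keyGen.init(new KeyGenParameterSpec.Builder("echoai_key", // illustrative alias
        KeyProperties.PURPOSE_ENCRYPT | KeyProperties.PURPOSE_DECRYPT)
        .setBlockModes(KeyProperties.BLOCK_MODE_GCM)
        .setEncryptionPaddings(KeyProperties.ENCRYPTION_PADDING_NONE)
        .build());
SecretKey key = keyGen.generateKey();

Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
cipher.init(Cipher.ENCRYPT_MODE, key);
String iv = Base64.encodeToString(cipher.getIV(), Base64.DEFAULT);
String ciphertext = Base64.encodeToString(
        cipher.doFinal(plainText.getBytes(StandardCharsets.UTF_8)), Base64.DEFAULT);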
Error Handling
Exceptions are thrown if encryption or decryption fails, ensuring that any cryptographic errors are properly propagated.
13. PreferencesHelper.java Documentation
Overview
PreferencesHelper.java manages persistent storage of encrypted API keys and their initialization vectors (IVs) using SharedPreferences. It supports multiple API keys used throughout the application.
Functional Capabilities
- Saving Keys: Methods such as saveEncryptedApiKey() securely store keys.
- Retrieving Keys: Methods such as getEncryptedApiKey() abstract SharedPreferences access.
- Clearing Keys: Provides functionality to remove stored keys, enabling key rotation.
Implementation Details
Each API key type is mapped to a unique constant in SharedPreferences for uniform access and reduced redundancy.
// Example: Saving an encrypted API key:
PreferencesHelper.saveEncryptedApiKey(context, encryptedApiKey, iv);
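Internally, such a method is a thin wrapper over SharedPreferences; the file and key names below are illustrative.
// Sketch: persist the ciphertext and IV under fixed preference keys.
public static void saveEncryptedApiKey(Context context, String encryptedKey, String iv) {
    SharedPreferences prefs =
            context.getSharedPreferences("EchoAI_Prefs", Context.MODE_PRIVATE);
    prefs.edit()
            .putString("api_key_encrypted", encryptedKey)
            .putString("api_key_iv", iv)
            .apply();
}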
Error Handling
These methods rely on Android’s robust SharedPreferences system and assume successful storage and retrieval.
14. ToolsManager.java Documentation
Overview
ToolsManager.java encapsulates static helper methods for device-level operations such as flashlight control, image capture, vibration, and timer management. It leverages legacy APIs to ensure compatibility with older Android devices.
Key Responsibilities & Methods
- turnOnFlashlight(): Activates the flashlight using the Camera API with a dummy SurfaceTexture.
- turnOffFlashlight(): Stops the camera preview and releases camera resources.
- takePicture(): Launches the device’s camera app via an Intent.
- vibrateDevice(): Triggers device vibration using appropriate APIs based on Android version.
- setTimer(): Initiates a countdown timer via CountDownTimer and executes a callback when finished.
Core Implementation Details
Hardware-specific operations are abstracted into reusable static methods. For example, a dummy preview is configured to enable torch mode, and conditional checks ensure compatibility with both legacy and modern vibration APIs.
// Example usage:
if (ToolsManager.turnOnFlashlight()) {
// Flashlight activated successfully.
}
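The version check mentioned above typically looks like the following; the 500 ms duration is illustrative.
// Sketch: use VibrationEffect on API 26+, fall back to the legacy call otherwise.
Vibrator vibrator = (Vibrator) context.getSystemService(Context.VIBRATOR_SERVICE);
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
    vibrator.vibrate(VibrationEffect.createOneShot(500, VibrationEffect.DEFAULT_AMPLITUDE));
} else {
    vibrator.vibrate(500); // deprecated, but needed on pre-O devices
}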
Error Handling and Logging
Methods are enclosed in try-catch blocks and log errors using AppLogger.log() to ensure graceful failure and aid troubleshooting.
15. MemoryActivity.java Documentation
Overview
MemoryActivity.java displays and manages the assistant's memory of extracted conversational facts. It loads stored memories from SharedPreferences, displays them in a ListView, and offers functionality to clear the memory.
Key Responsibilities
- Data Retrieval: Loads memory data (stored as a JSON array) from SharedPreferences.
- UI Presentation: Displays memory items in a ListView using an ArrayAdapter.
- Data Management: Provides a clear memory function that updates both the UI and persistent storage.
Technical Implementation
The activity retrieves memory data using the key ai_memory from the EchoAI_Memory SharedPreferences file. The JSON array is parsed and each memory item is added to an ArrayList that is bound to a ListView.
// Loading stored memories:
SharedPreferences prefs = getSharedPreferences("EchoAI_Memory", MODE_PRIVATE);
String memoryJson = prefs.getString("ai_memory", "[]");
try {
    JSONArray memoryArray = new JSONArray(memoryJson);
    for (int i = 0; i < memoryArray.length(); i++) {
        memoryItems.add(memoryArray.getString(i));
    }
} catch (JSONException e) {
    AppLogger.log("Failed to parse memory JSON: " + e);
}
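The clear-memory function then resets both sides in one step; the adapter and list names are illustrative.
// Sketch: wipe the persisted memory and refresh the bound list.
prefs.edit().remove("ai_memory").apply();
memoryItems.clear();
adapter.notifyDataSetChanged();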
Error Handling
JSON parsing errors are caught and logged. Users are notified via Toast messages if memory loading fails.
16. CommandMapping.java Documentation
Overview
CommandMapping.java is a lightweight classifier that detects specific command patterns in recognized text and returns corresponding ActionType values. It supports both standard commands and additional tool commands.
Key Responsibilities
- Pattern Matching: Utilizes case‑insensitive regular expressions to detect commands.
- Weighting: Assigns numerical weights to mappings to influence selection priority.
- Parameter Extraction: Extracts command parameters via captured regex groups.
- Default Fallback: Provides a default mapping for unrecognized input.
Core Components
- Regular Expressions: Compiled patterns stored for fast matching.
- ActionType Enum: Maps each pattern to a corresponding command action.
- Mapping Weight: Numerical value indicating priority when multiple mappings match.
Technical Workflow
When input text is processed, each CommandMapping iterates through its regex patterns. If a match is found, parameters are extracted and the corresponding ActionType is returned. The static method getDefaultMappings() returns a list of mappings covering common commands.
// Example usage:
List<CommandMapping> mappings = CommandMapping.getDefaultMappings();
for (CommandMapping mapping : mappings) {
if (mapping.matches(userInput)) {
List<String> params = mapping.extractParameters(userInput);
// Execute action with extracted parameters.
}
}
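A representative mapping definition might look like this; the constructor shape, the SET_TIMER action, and the weight value are assumptions based on the description above.
// Sketch: a weighted, case-insensitive mapping with one parameter group.
Pattern timerPattern = Pattern.compile(
        "set (?:a )?timer for (\\d+) (second|minute)s?", Pattern.CASE_INSENSITIVE);
CommandMapping timerMapping =
        new CommandMapping(timerPattern, ActionType.SET_TIMER, /* weight = */ 10);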
Error Handling and Logging
All pattern additions and matches are logged via AppLogger.log() to facilitate debugging and ensure reliable command detection.
17. AzureTTS.java Documentation
Overview
AzureTTS.java provides methods to synthesize speech using Azure's TTS REST API. It supports both a basic single-argument method and an overloaded version that accepts an AzureTTSCallback for notifying callers when playback starts and finishes.
Key Responsibilities
- SSML Generation: Constructs SSML payloads for Azure TTS.
- HTTP Communication: Manages secure HTTP POST requests to Azure’s TTS endpoint.
- Audio Playback: Saves the synthesized speech as a WAV file and plays it back using MediaPlayer.
- Callback Integration: Notifies the caller when TTS playback starts and finishes.
Core Components & Workflow
The class builds an SSML request that encapsulates the text to be spoken, sends it via an HTTP POST request to Azure's TTS endpoint, and checks for audio content in the response. The received audio file is stored in cache and played using MediaPlayer, triggering callback methods on start and completion.
// Pseudocode for speech synthesis:
ssml = "<speak>...your text...</speak>";
POST ssml to Azure TTS endpoint;
if (response is audio) {
save audio file;
play audio using MediaPlayer;
callback.onTTSStarted();
...
callback.onTTSFinished();
}
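Concretely, the payload and headers can be assembled as follows; the region, voice name, and output format are illustrative Azure values, and textToSpeak and apiKey are assumed inputs.
// Sketch: SSML body plus the headers Azure's TTS REST endpoint expects.
String ssml = "<speak version='1.0' xml:lang='en-US'>"
        + "<voice name='en-US-JennyNeural'>" + textToSpeak + "</voice></speak>";
HttpURLConnection conn = (HttpURLConnection) new URL(
        "https://eastus.tts.speech.microsoft.com/cognitiveservices/v1").openConnection();
conn.setRequestMethod("POST");
conn.setRequestProperty("Ocp-Apim-Subscription-Key", apiKey);
conn.setRequestProperty("Content-Type", "application/ssml+xml");
conn.setRequestProperty("X-Microsoft-OutputFormat", "riff-16khz-16bit-mono-pcm");
conn.setDoOutput(true);
conn.getOutputStream().write(ssml.getBytes(StandardCharsets.UTF_8));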
Error Handling
Network or content errors are logged via AppLogger.log(), and the callback's onTTSFinished() is invoked to signal processing end.
Integration
AzureTTS.java is integrated within the voice subsystem to provide an alternative TTS engine, enabling dynamic selection based on user preferences.
18. ImageGen.java Documentation
Overview
ImageGen.java is an IntentService that generates images based on a given text prompt using OpenAI’s image generation API. The service loads and decrypts its API key at startup, sends a POST request with the prompt, and broadcasts the resulting image URL.
Key Responsibilities
- API Key Management: Loads and decrypts the ImageGen API key from SharedPreferences or asset files using KeystoreHelper and PreferencesHelper.
- HTTP Communication: Sends a JSON payload containing the prompt to the OpenAI image generation endpoint.
- Response Handling: Parses the JSON response to extract the generated image URL and broadcasts it.
- Service Notification: Broadcasts an intent when image generation starts and when the image URL is available.
Core Components & Workflow
Upon receiving an intent with a text prompt, the service sends a POST request to the OpenAI image generation endpoint. The JSON payload includes the prompt, the number of images to generate, and the desired size. If successful, it parses the returned JSON to extract the image URL and broadcasts it using a custom intent.
// Pseudocode for image generation:
if (prompt is not empty) {
POST { "prompt": prompt, "n": 1, "size": "512x512" } to OpenAI image endpoint;
if (response code is OK) {
parse JSON to extract image URL;
broadcast "OPENAI_IMAGE_GENERATED" with image URL;
} else {
log error;
}
}
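On the receiving side, a broadcast receiver picks up the result; the action string mirrors the pseudocode above, while the extra key and displayImage() are assumptions.
// Sketch: receive the generated image URL broadcast by the service.
BroadcastReceiver receiver = new BroadcastReceiver() {
    @Override public void onReceive(Context context, Intent intent) {
        String imageUrl = intent.getStringExtra("image_url"); // assumed extra key
        displayImage(imageUrl); // hypothetical UI hook
    }
};
registerReceiver(receiver, new IntentFilter("OPENAI_IMAGE_GENERATED"));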
Error Handling
If the API call fails or no image URL is returned, the service logs the error via Log.e and broadcasts an appropriate error message.