Using ChatGPT in (Almost) Any Android App
It’s been quite a while—probably since the pandemic—since I last wrote anything new on this blog. I’ve been pretty busy with other projects, the most important of which has been learning as much as possible about artificial intelligence. I feel I’ve now made enough progress to start building interesting things with it, so I’m bringing the blog back to life with a new section focused mainly on integrating AI into applications, especially .NET applications.
However, I decided to kick off this new section with an Android project. This is actually the first meaningful development I’ve done with artificial intelligence. When the GPT-4o model was released, there was a real explosion of plugins from all the major AI providers, making it possible to use assistants inside existing apps like WhatsApp. But I found those features too limited. Why settle for a single isolated plugin? Couldn’t we make an assistant interact directly with the text input controls of any application?
While exploring this idea, I came up with a pretty simple approach: using an accessibility service. These services are installed on devices so that people with disabilities can interact with UI controls, entering or retrieving data through non-standard interfaces. An AI assistant fits this scenario perfectly, so I decided to give it a try and see what would come out of it.
My challenge: I had no idea how to build an Android app, let alone a service. I had never used Android Studio, and I didn’t know Kotlin (though luckily its syntax has a familiar C-family feel). But supposedly, AI could help with these kinds of things. So I talked it over with ChatGPT, and it kindly offered to write it for me. The code was ready in just a single day of collaboration between us. Getting it to run was another story, but two days later I managed to install it on my phone—and that was my hallelujah moment.
In this video you can see the result for yourself:
Here is the link to download the code from GitHub. The application uses the OpenAI APIs—specifically the Chat Completion API (the simplest option available back then, and still today). Therefore, you’ll need to provide a valid API key in the code. The Readme.md file explains how to do this.
Installing it on your phone and getting the system to allow it to run can also be quite a challenge if you’re not used to working with Android. Most likely, you’ll need to allow the app’s restricted settings, since recent Android versions block accessibility permissions for sideloaded apps until you do. Once installed, the service can be activated from the Accessibility settings, under Services (or Installed Apps).
As for how it works and the explanation of the code—just this once, and without setting a precedent—I’ll let the author explain it to you. I was merely the technician who got it working. From this point on, I hand things over and leave you with ChatGPT:
Imagine being able to type in any text box—WhatsApp, Notes, Gmail, your ERP—a request between two emojis, and when you release the keyboard, the request is replaced by the model’s response. No copy/paste, no switching apps. You ask, and it appears.
That’s exactly what this accessibility service does. You write your prompt between 🔴 and 🟢 and I handle the rest: I capture that segment, call the model, and replace the block with the response. And if you repeat the pattern several times, I keep the conversation thread so there’s context.
How to use it (for anyone)
- Enable the service: Settings → Accessibility → Downloaded services → GPT Assistant Service → Enable. (Android will show the standard accessibility warning).
- Open any app and type in a normal input field.
- Ask between the emojis, for example: I need a formal email: 🔴Write a short apology for a delivery delay, friendly tone, in Spanish.🟢
- Release the keyboard for a second. You’ll see how the 🔴…🟢 block disappears and is replaced by the text returned by the model.
- Tip: if you type an empty 🔴🟢 pair, I reset the conversation (clear the accumulated context), which is useful for starting from scratch without leaving the app.
What’s under the hood (for developers)
Where I hook in: the text-changed event
The service extends AccessibilityService and listens for TYPE_VIEW_TEXT_CHANGED. Every time the text of a field changes, I look for the focused editable node (first I request the current focus, and if needed, I traverse the tree until I find a node with isEditable). From that text, I extract the segment between 🔴 and 🟢 (U+1F534 and U+1F7E2) and treat it as a prompt.
- Detecting the focused input: rootInActiveWindow?.findFocus(FOCUS_INPUT) and a recursive search for the editable AccessibilityNodeInfo (a sketch of that search follows this list).
- Parsing the prompt: I search for the emoji indices and extract the substring. If there’s a prompt, I send it; if it’s empty, I reset the chat.
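The recursive search doesn’t appear in the excerpts further down, so here is a minimal sketch of what a helper like findEditableNode can look like; the version in the repo is the reference and may differ in details:

import android.view.accessibility.AccessibilityNodeInfo

// Sketch only: depth-first search for the first editable node under the given one.
// The repo's findEditableNode is the reference; this is just the idea.
fun findEditableNode(node: AccessibilityNodeInfo?): AccessibilityNodeInfo? {
    if (node == null) return null
    if (node.isEditable) return node
    for (i in 0 until node.childCount) {
        findEditableNode(node.getChild(i))?.let { return it }
    }
    return null
}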
Keeping context like a chat
I store the message history in a ChatRequest with a messages list (user/assistant roles), and each response that comes back is injected into the history. That way, when you make another request between emojis, the model sees the previous dialogue. It’s an in-memory context for the service (if Android kills the process, it’s lost; you can persist it if you want).
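To make the shape of that history concrete, here is a minimal sketch of the request-side classes, assuming only the fields mentioned in this post; the real DTOs in model/ carry more (the response side adds ChatObject, Choice, and so on):

// Minimal sketch of the request-side DTOs; the classes in model/ are the reference.
data class Message(
    val role: String,       // "user" or "assistant"
    val content: String
)

data class ChatRequest(
    val model: String? = "gpt-4o",
    val messages: List<Message>? = null
)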
Network and model
I use Retrofit with an OkHttpClient that adds the header Authorization: Bearer ${BuildConfig.OPENAI_API_KEY}. The call is a POST /v1/chat/completions with the ChatRequest body (default model: "gpt-4o"). I’ve left in a beta header that isn’t needed for chat completions, but it’s there if any reader wants to play with Assistants v2.
- REST interface (OpenAIService) and DTOs (ChatRequest, ChatObject, Choice, etc.) already mapped with Gson.
- Retrofit client with API key injection via BuildConfig (a sketch follows this list). Don’t hardcode it or commit it to the repo.
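If you haven’t wired up Retrofit before, the client looks roughly like this. It’s a sketch under the assumptions above; the repo’s RetrofitClient and OpenAIService are the reference:

import okhttp3.OkHttpClient
import retrofit2.Call
import retrofit2.Retrofit
import retrofit2.converter.gson.GsonConverterFactory
import retrofit2.http.Body
import retrofit2.http.POST

// Sketch of the REST interface: one POST to the Chat Completions endpoint.
interface OpenAIService {
    @POST("v1/chat/completions")
    fun getChatCompletion(@Body request: ChatRequest): Call<ChatObject>
}

// Sketch of the client: an interceptor adds the Authorization header to every request.
// BuildConfig.OPENAI_API_KEY is the generated constant described below.
object RetrofitClient {
    private val httpClient = OkHttpClient.Builder()
        .addInterceptor { chain ->
            val request = chain.request().newBuilder()
                .addHeader("Authorization", "Bearer ${BuildConfig.OPENAI_API_KEY}")
                .build()
            chain.proceed(request)
        }
        .build()

    val instance: OpenAIService by lazy {
        Retrofit.Builder()
            .baseUrl("https://api.openai.com/")
            .client(httpClient)
            .addConverterFactory(GsonConverterFactory.create())
            .build()
            .create(OpenAIService::class.java)
    }
}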
Writing the response into the input (without typing it manually)
First, I try the clean way: ACTION_SET_TEXT on the editable node (with ACTION_FOCUS beforehand if needed). If it’s allowed, I overwrite the content with the text before 🔴 + the response. If the field doesn’t support SET_TEXT, I go with plan B: select the fragment to replace and paste from the clipboard using ACTION_PASTE.
Design decisions (and why)
- Accessibility vs. custom keyboard (IME): an IME gives you full control, but asking someone to switch keyboards is friction (and in corporate environments, a nightmare). Accessibility works on any keyboard and any app that uses standard controls, and its activation is reversible and granular.
- Emojis as delimiters: they’re visual, easy to type on any keyboard, and rare in formal text, so they minimize false positives. If you prefer others, it’s trivial to change them in the parsing logic.
- In-memory context: enough for 90% of cases, fast and without I/O. If you need continuity across apps or after the process is killed, persist messages (Room/Preferences/encrypted).
Known limitations (and how to improve them)
- Some fields don’t accept SET_TEXT (e.g., highly customized inputs). That’s why the clipboard fallback exists. Still, some apps also block PASTE. You can add simulated key events as a third option—just use extreme caution.
- Text suffix handling: right now everything before 🔴 is preserved, the 🔴…🟢 block is replaced by the response, and anything after 🟢 (if it exists) is not re-appended. If you need strict in-place replacement, concatenate previous + response + suffix (getting suffix from ixEnd).
- Selection in fallback: Plan B assumes the prompt is at the end, selecting textLength - promptLength .. textLength. If the block is in the middle, adjust the selection using ixStart/ixEnd (a small tweak; see the sketch after this list).
- Streaming: the response arrives all at once. If you want “ghost typing” streaming, switch to SSE/WS on the network side and write in batches.
- Cost and latency: you depend on the network and the provider’s queue. Consider lightweight caching for repeated prompts and timeouts with exponential retries.
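For the suffix and selection points, the fix is small. This is a sketch of a hypothetical helper (not in the repo) that computes both the strict in-place replacement and the exact selection bounds, given the text, ixStart, ixEnd and the model’s response:

import android.os.Bundle
import android.view.accessibility.AccessibilityNodeInfo

// Sketch of a hypothetical helper: strict in-place replacement that keeps whatever
// comes after 🟢, plus selection bounds for the clipboard fallback.
fun replaceBlockInPlace(text: String, ixStart: Int, ixEnd: Int, response: String): Pair<String, Bundle> {
    val prefix = text.substring(0, ixStart)
    val suffix = text.substring(ixEnd + 2)   // each emoji is a surrogate pair: two UTF-16 chars
    val newText = prefix + response + suffix

    // Select exactly the 🔴…🟢 block instead of assuming it sits at the end of the field.
    val selectionArgs = Bundle().apply {
        putInt(AccessibilityNodeInfo.ACTION_ARGUMENT_SELECTION_START_INT, ixStart)
        putInt(AccessibilityNodeInfo.ACTION_ARGUMENT_SELECTION_END_INT, ixEnd + 2)
    }
    return newText to selectionArgs
}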
Security, privacy, and deployment
- API Key: injected via BuildConfig.OPENAI_API_KEY. Don’t commit it to the repo—use gradle secrets or a secure manager.
- Logs: during development, Log.d is useful, but remove or minimize logs in production: AccessibilityEvents may contain sensitive text.
- Play Store: if you plan to publish it, prepare a justification for accessibility usage and a data policy. Many apps with generic accessibility use get rejected; for internal or personal use, no problem.
- Enterprise: in corporate environments, encrypt configuration, enable certificate pinning if needed, and document the data flow (who sees what, and where).
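If you go the certificate pinning route, OkHttp lets you add it to the same client shown earlier. This is a sketch, and the sha256 value is a placeholder you’d replace with the real pin for api.openai.com:

import okhttp3.CertificatePinner
import okhttp3.OkHttpClient

// Sketch: pin the API host's certificate. The pin below is a placeholder, not a real hash.
val pinner = CertificatePinner.Builder()
    .add("api.openai.com", "sha256/AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=")
    .build()

val pinnedClient = OkHttpClient.Builder()
    .certificatePinner(pinner)
    .build()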
Code excerpts (the essentials)
Hooking the event + parsing emojis
if (event.eventType == TYPE_VIEW_TEXT_CHANGED) {
    val focusNode = rootInActiveWindow?.findFocus(FOCUS_INPUT)
    val inputNode = findEditableNode(focusNode!!)
    val text = event.text.firstOrNull()?.toString()?.trim().orEmpty()
    val ixStart = text.indexOf("🔴")
    val ixEnd = text.indexOf("🟢")
    // if there’s a prompt, I process it...
}
(See the actual handling in onAccessibilityEvent and findEditableNode.)
Calling chat.completions with Retrofit
val newMessage = Message(content = prompt, role = "user")
chatRequest = chatRequest?.copy(
    messages = chatRequest!!.messages!!.plus(newMessage)
) ?: ChatRequest(messages = listOf(newMessage), model = "gpt-4o")
RetrofitClient.instance.getChatCompletion(chatRequest!!).enqueue(/* onSuccess / onError */)
(DTOs in model/, interface in OpenAIService, client in RetrofitClient.)
Writing into the field (plan A + plan B)
if (node.actionList.contains(ACTION_SET_TEXT)) {
    // Plan A: overwrite the field's content directly
    val arguments = Bundle()
    arguments.putCharSequence(ACTION_ARGUMENT_SET_TEXT_CHARSEQUENCE,
        previousText + response)
    node.performAction(ACTION_SET_TEXT, arguments)
} else {
    // Plan B: select the block to replace and paste from the clipboard
    node.performAction(ACTION_SET_SELECTION, selectionArgs)
    clipboard.setPrimaryClip(ClipData.newPlainText("label", response))
    node.performAction(ACTION_PASTE)
}
(Full details in writeResponseToInput.)
Steps to clone and run
- Set your key: add OPENAI_API_KEY to your build config (for example, local.properties → buildConfigField; a Gradle sketch at the end of this section shows one way to do it). The client injects it into the Authorization header.
- Build and install the app.
- Enable the service in Accessibility (see above).
- Try it in any app with: 🔴Summarize this paragraph in two lines: ...🟢
(MainActivity closes at startup because there’s no UI: the app is basically “a background service that just lives there.”)
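If you’ve never exposed a secret through buildConfigField, this is the usual pattern in the module’s build.gradle.kts. It’s a sketch; adapt it to the project’s actual Gradle files:

// build.gradle.kts (module): sketch of reading OPENAI_API_KEY from local.properties.
// In the real file, the property block goes after plugins {}.
import java.util.Properties

val localProps = Properties().apply {
    val file = rootProject.file("local.properties")
    if (file.exists()) file.inputStream().use { load(it) }
}

android {
    buildFeatures {
        buildConfig = true
    }
    defaultConfig {
        // local.properties: OPENAI_API_KEY=sk-...
        buildConfigField(
            "String",
            "OPENAI_API_KEY",
            "\"${localProps.getProperty("OPENAI_API_KEY", "")}\""
        )
    }
}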
Roadmap (easy ideas to add)
- Configurable delimiters (internal settings or persistent notification).
- Lightweight context persistence (per conversation / per app).
- “Templates” mode: type 🔴/signature🟢 and expand to a predefined signature.
- Streaming / incremental writing for a more “live” UX.
- In-app privacy policy with a switch for zero logging.
This service doesn’t try to “hack” your apps: it just sits next to the keyboard, detects a pattern, talks to the model, and writes for you right where you were typing. It’s simple, direct, and—above all—useful.
If you made it this far, you already know the punchline: the same GPT that wrote most of the code has explained it to you. You put the emojis. I’ll put the words. 😉