7 min read
First Android App with Kotlin
Building a pose-detection camera app for 3D model generation - my first Android project.

In January 2023, I built my first Android application. It was a camera app with pose detection that captured images for 3D body model generation. First time on Android, first time with Kotlin, first time working with on-device machine learning. A lot of firsts.
What the App Needed to Do
An external company had AI that could generate accurate 3D body models from photographs, but only if the photos met strict requirements. The subject had to be in a very specific pose - arms at particular angles, feet positioned correctly, facing the right direction. Photos that didn’t match the expected pose produced garbage models.
My job was to build an app that would guide users into the correct pose and only capture the image once they’d nailed it. The app needed to:
- Show a live camera preview
- Run pose detection in real-time to track the user’s body position
- Compare detected pose against the target pose
- Provide audio and visual feedback guiding the user into position
- Automatically capture when the pose matched within acceptable tolerances
- Upload the captured image along with metadata to cloud storage
The target users weren’t technical - this needed to be simple enough that anyone could use it without instruction.
The Camera
Android camera work is more complicated than you’d expect. There are effectively three options: the long-deprecated Camera API, the lower-level Camera2, and CameraX, a Jetpack library built on top of Camera2. CameraX is the higher-level abstraction and handles most of the lifecycle complexity, so I went with that.
Setting up the preview was straightforward - bind a PreviewView to the camera lifecycle, and it handles orientation, aspect ratio, and surface management. The tricky part was running pose detection on the camera frames without killing performance.
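To give a sense of how little code the preview needs, here’s a minimal sketch of the CameraX binding, assuming a PreviewView in the layout and the camera permission already granted (the function and variable names are mine, not anything from the project):

```kotlin
import android.content.Context
import androidx.camera.core.CameraSelector
import androidx.camera.core.Preview
import androidx.camera.lifecycle.ProcessCameraProvider
import androidx.camera.view.PreviewView
import androidx.core.content.ContextCompat
import androidx.lifecycle.LifecycleOwner

fun startCameraPreview(context: Context, lifecycleOwner: LifecycleOwner, previewView: PreviewView) {
    val cameraProviderFuture = ProcessCameraProvider.getInstance(context)
    cameraProviderFuture.addListener({
        val cameraProvider = cameraProviderFuture.get()

        // The Preview use case just renders camera frames into the PreviewView's surface.
        val preview = Preview.Builder().build().also {
            it.setSurfaceProvider(previewView.surfaceProvider)
        }

        // Unbind first so rebinding after configuration changes stays clean.
        cameraProvider.unbindAll()
        cameraProvider.bindToLifecycle(
            lifecycleOwner,
            CameraSelector.DEFAULT_FRONT_CAMERA,
            preview
        )
    }, ContextCompat.getMainExecutor(context))
}
```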
CameraX has an ImageAnalysis use case that lets you process each frame. But you can’t just throw heavy ML inference at every frame - you’ll drop to single-digit FPS and the preview will stutter. I had to:
- Run inference on a background thread
- Skip frames if the previous inference was still running
- Downscale frames before sending to the ML model
- Throttle the feedback updates to the UI
Getting this flow right took trial and error. Too much processing and the app felt sluggish. Too little and the pose tracking was jerky.
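Roughly, the analysis pipeline looked like the sketch below: a single background executor, CameraX’s keep-only-latest backpressure strategy, and a flag that drops frames while inference is still busy. The target resolution and the callback shape are illustrative rather than the exact values I shipped.

```kotlin
import android.util.Size
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicBoolean

// Single-threaded executor keeps inference off the main thread.
private val analysisExecutor = Executors.newSingleThreadExecutor()

// Guard so new frames are dropped while the previous inference is still running.
private val inferenceInFlight = AtomicBoolean(false)

fun buildImageAnalysis(onFrame: (ImageProxy) -> Unit): ImageAnalysis {
    val analysis = ImageAnalysis.Builder()
        // Ask for a smaller buffer than the preview; the exact size is a hint, not a guarantee.
        .setTargetResolution(Size(480, 640))
        // Never queue frames - always hand the analyzer the most recent one only.
        .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
        .build()

    analysis.setAnalyzer(analysisExecutor) { imageProxy ->
        if (!inferenceInFlight.compareAndSet(false, true)) {
            imageProxy.close()      // previous inference still running: drop this frame
            return@setAnalyzer
        }
        try {
            onFrame(imageProxy)     // run pose detection here (synchronous in this sketch;
                                    // an async detector would close the proxy in its callback)
        } finally {
            imageProxy.close()      // must close, or CameraX stops delivering frames
            inferenceInFlight.set(false)
        }
    }
    return analysis
}
```

The analysis use case gets bound to the lifecycle alongside the preview, and throttling the UI feedback happened separately, downstream of `onFrame`.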
Pose Detection with ML Kit
Google’s ML Kit includes a pose detection API that runs on-device. It identifies 33 body landmarks (eyes, ears, shoulders, elbows, wrists, hips, knees, ankles, etc.) and returns their x/y coordinates in the image frame along with confidence scores.
The detection itself is a single function call, but interpreting the results was the interesting part. I needed to determine whether the detected pose matched the target pose closely enough.
The target pose was defined as a set of angle requirements: left elbow between 170-180 degrees (nearly straight), shoulders level within 5 degrees of horizontal, feet shoulder-width apart, arms at a certain angle from the torso. I wrote a pose comparator that calculated the relevant angles from the landmark positions and checked each against the acceptable range.
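The core of the comparator was a small angle helper, in the spirit of the one in Google’s ML Kit samples. A sketch, with an illustrative elbow check (the exact ranges were tuned later):

```kotlin
import com.google.mlkit.vision.pose.Pose
import com.google.mlkit.vision.pose.PoseLandmark
import kotlin.math.abs
import kotlin.math.atan2

// Angle at `mid`, formed by the segments mid->first and mid->last, in degrees [0, 180].
fun angleBetween(first: PoseLandmark, mid: PoseLandmark, last: PoseLandmark): Double {
    var angle = Math.toDegrees(
        (atan2(last.position.y - mid.position.y, last.position.x - mid.position.x) -
         atan2(first.position.y - mid.position.y, first.position.x - mid.position.x)).toDouble()
    )
    angle = abs(angle)
    if (angle > 180) angle = 360 - angle
    return angle
}

// Example requirement: is the left elbow nearly straight (170-180 degrees)?
fun leftElbowStraight(pose: Pose): Boolean {
    val shoulder = pose.getPoseLandmark(PoseLandmark.LEFT_SHOULDER) ?: return false
    val elbow = pose.getPoseLandmark(PoseLandmark.LEFT_ELBOW) ?: return false
    val wrist = pose.getPoseLandmark(PoseLandmark.LEFT_WRIST) ?: return false
    return angleBetween(shoulder, elbow, wrist) in 170.0..180.0
}
```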
Some challenges:
- Landmark occlusion: If the user’s arm is behind their body, ML Kit might not detect that landmark or might give it a low confidence. I had to handle missing landmarks gracefully - if a critical landmark was missing, prompt the user to adjust rather than crashing (there’s a sketch of this after the list).
- Coordinate systems: ML Kit returns coordinates in image space, but the image might be rotated or mirrored depending on camera orientation. I had to transform coordinates before calculating angles.
- Threshold tuning: Too strict and users could never match the pose. Too loose and the captured images wouldn’t work with the 3D AI. This required iteration with real users to find the sweet spot.
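For the occlusion case, a sketch of the kind of guard I mean, using ML Kit’s in-frame likelihood; the landmark subset, threshold, and prompt text here are illustrative:

```kotlin
import com.google.mlkit.vision.pose.Pose
import com.google.mlkit.vision.pose.PoseLandmark

// Landmarks the pose comparison cannot do without (illustrative subset).
private val criticalLandmarks = listOf(
    PoseLandmark.LEFT_SHOULDER, PoseLandmark.RIGHT_SHOULDER,
    PoseLandmark.LEFT_ELBOW, PoseLandmark.RIGHT_ELBOW,
    PoseLandmark.LEFT_ANKLE, PoseLandmark.RIGHT_ANKLE,
)

/**
 * Returns null when every critical landmark is visible enough to evaluate,
 * otherwise a user-facing prompt instead of letting the comparator hit nulls.
 */
fun visibilityPrompt(pose: Pose, minLikelihood: Float = 0.8f): String? {
    val missing = criticalLandmarks.any { type ->
        val landmark = pose.getPoseLandmark(type)
        landmark == null || landmark.inFrameLikelihood < minLikelihood
    }
    return if (missing) "Step back so your whole body is in the frame" else null
}
```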
Guiding the User
Static instructions weren’t enough - users needed real-time feedback on what to adjust. I built a guidance system that:
- Checked which pose requirements weren’t met
- Prioritised the most off-target one
- Played an audio prompt (“Raise your left arm”) and displayed text
- Waited for improvement before issuing the next instruction
The audio prompts used Android’s TextToSpeech engine. Nothing fancy, but it let users focus on adjusting their pose instead of reading the screen.
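A rough sketch of that prompt path, with a simple time-based throttle so the user isn’t bombarded with instructions (the class name, interval, and utterance ID are just placeholders):

```kotlin
import android.content.Context
import android.os.SystemClock
import android.speech.tts.TextToSpeech
import java.util.Locale

class GuidanceSpeaker(context: Context) : TextToSpeech.OnInitListener {
    private var ready = false
    private var lastSpokenAt = 0L
    private val tts = TextToSpeech(context, this)

    override fun onInit(status: Int) {
        if (status == TextToSpeech.SUCCESS) {
            tts.setLanguage(Locale.UK)
            ready = true
        }
    }

    /** Speak one correction, but never more often than every few seconds. */
    fun prompt(instruction: String, minIntervalMs: Long = 3000) {
        val now = SystemClock.elapsedRealtime()
        if (!ready || now - lastSpokenAt < minIntervalMs) return
        lastSpokenAt = now
        tts.speak(instruction, TextToSpeech.QUEUE_FLUSH, null, "pose-guidance")
    }

    fun shutdown() = tts.shutdown()
}
```

The guidance loop picked the most off-target requirement and called something like `speaker.prompt("Raise your left arm")`, then held off until the next evaluation showed improvement.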
Visual feedback included an overlay showing the detected skeleton and colour-coding limbs green when they were positioned correctly. Watching limbs turn green as you adjusted gave satisfying immediate feedback.
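The overlay was a custom View along these lines, assuming the landmark coordinates have already been mapped from image space into view space (the names are illustrative):

```kotlin
import android.content.Context
import android.graphics.Canvas
import android.graphics.Color
import android.graphics.Paint
import android.graphics.PointF
import android.util.AttributeSet
import android.view.View

// One drawable limb: two landmark points in view coordinates, plus whether
// that limb currently satisfies its pose requirement.
data class LimbSegment(val start: PointF, val end: PointF, val withinTolerance: Boolean)

class SkeletonOverlay(context: Context, attrs: AttributeSet? = null) : View(context, attrs) {
    private val okPaint = Paint().apply { color = Color.GREEN; strokeWidth = 8f }
    private val adjustPaint = Paint().apply { color = Color.RED; strokeWidth = 8f }
    private var segments: List<LimbSegment> = emptyList()

    fun update(newSegments: List<LimbSegment>) {
        segments = newSegments
        postInvalidate()   // safe to call from the analysis thread; schedules a redraw
    }

    override fun onDraw(canvas: Canvas) {
        super.onDraw(canvas)
        for (segment in segments) {
            val paint = if (segment.withinTolerance) okPaint else adjustPaint
            canvas.drawLine(segment.start.x, segment.start.y, segment.end.x, segment.end.y, paint)
        }
    }
}
```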
Auto-Capture Logic
When all pose requirements were satisfied, I didn’t immediately capture. Users might briefly pass through the correct pose without being stable. Instead, I required the pose to be held for about a second before triggering capture.
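The hold check boils down to a tiny state machine, roughly like this sketch (the one-second threshold is the idea, not the exact tuned value):

```kotlin
import android.os.SystemClock

/**
 * Tracks how long the target pose has been continuously held and decides
 * when to fire the capture.
 */
class AutoCaptureGate(private val requiredHoldMs: Long = 1000) {
    private var matchStartedAt: Long? = null

    /** Call once per analysed frame; returns true exactly when capture should trigger. */
    fun onPoseEvaluated(poseMatches: Boolean): Boolean {
        val now = SystemClock.elapsedRealtime()
        if (!poseMatches) {
            matchStartedAt = null          // any drift resets the timer
            return false
        }
        val start = matchStartedAt ?: now.also { matchStartedAt = it }
        if (now - start >= requiredHoldMs) {
            matchStartedAt = null          // reset so we don't fire again immediately
            return true
        }
        return false
    }
}
```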
The capture itself used CameraX’s ImageCapture use case - separate from the preview and analysis, so you get a full-resolution still image. After capture, the app showed a preview so users could verify the image looked right before uploading.
If the image was blurry (camera shake) or the pose had drifted during capture, users could retake. I added basic blur detection using edge detection on a downscaled version of the image - not perfect, but caught the obvious cases.
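The blur check was along these lines: downscale, convert to greyscale, and use the variance of a simple Laplacian as a sharpness score. This is a sketch of the approach rather than the exact code; the target width and whatever threshold you compare against both need tuning:

```kotlin
import android.graphics.Bitmap
import android.graphics.Color

/**
 * Rough sharpness score: variance of a 4-neighbour Laplacian over a downscaled
 * greyscale copy. Higher means more edge detail; below a tuned threshold,
 * treat the shot as blurry.
 */
fun sharpnessScore(source: Bitmap, targetWidth: Int = 200): Double {
    val scale = targetWidth.toDouble() / source.width
    val scaled = Bitmap.createScaledBitmap(source, targetWidth, (source.height * scale).toInt(), true)
    val w = scaled.width
    val h = scaled.height

    // Flatten to greyscale intensities.
    val pixels = IntArray(w * h)
    scaled.getPixels(pixels, 0, w, 0, 0, w, h)
    val grey = IntArray(w * h) { i ->
        val p = pixels[i]
        (Color.red(p) + Color.green(p) + Color.blue(p)) / 3
    }

    // Variance of the Laplacian response over the interior pixels.
    var sum = 0.0
    var sumSq = 0.0
    var count = 0
    for (y in 1 until h - 1) {
        for (x in 1 until w - 1) {
            val centre = grey[y * w + x]
            val laplacian = 4 * centre -
                grey[(y - 1) * w + x] - grey[(y + 1) * w + x] -
                grey[y * w + x - 1] - grey[y * w + x + 1]
            sum += laplacian
            sumSq += laplacian.toDouble() * laplacian
            count++
        }
    }
    val mean = sum / count
    return sumSq / count - mean * mean
}
```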
Data Storage
Captured images went to Firebase Storage with Firebase Firestore holding the metadata: timestamps, user IDs, image URLs, device info, and pose landmark data at the moment of capture. The external 3D AI system would pull from this database.
Designing the Firestore schema required thinking about access patterns. The 3D processing system needed to query for unprocessed images across all users. Individual users needed to see their own images. I ended up with a flat collection of image documents with composite indexes for the common queries.
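In shape, the documents and the two main queries looked something like the sketch below; the collection name, field names, and limit are illustrative stand-ins rather than the real schema:

```kotlin
import com.google.firebase.firestore.Query
import com.google.firebase.firestore.ktx.firestore
import com.google.firebase.ktx.Firebase

// Illustrative document shape; defaults give Firestore the no-arg constructor it needs.
data class CaptureDoc(
    val userId: String = "",
    val imageUrl: String = "",
    val capturedAt: Long = 0L,
    val processed: Boolean = false,
    val landmarks: List<Map<String, Any>> = emptyList(),
)

private val db = Firebase.firestore

// Write one capture record after the image upload completes.
fun saveCapture(doc: CaptureDoc) {
    db.collection("captures").add(doc)
}

// The processing system's query: oldest unprocessed captures across all users.
// Filtering on one field while ordering by another is what forces the composite index.
fun unprocessedCaptures() =
    db.collection("captures")
        .whereEqualTo("processed", false)
        .orderBy("capturedAt", Query.Direction.ASCENDING)
        .limit(50L)

// A user's own captures, for the in-app gallery.
fun capturesFor(userId: String) =
    db.collection("captures")
        .whereEqualTo("userId", userId)
        .orderBy("capturedAt", Query.Direction.DESCENDING)
```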
Firebase’s offline-first approach helped with flaky network situations. Images would upload when connectivity was available, and the app didn’t block on network operations.
Learning Kotlin Along the Way
I’d never written Kotlin before this project. Coming from TypeScript and Go, there was a learning curve, but less than expected.
Kotlin’s syntax is different - null safety is built into the type system (no more null pointer exceptions, in theory), data classes reduce boilerplate, extension functions let you add methods to existing types. But the underlying concepts were familiar. A list is a list. A class is a class. The patterns I knew (dependency injection, observers, model-view separation) worked the same way.
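For a flavour of those three features in one toy example:

```kotlin
// Data class: equals, hashCode, toString, and copy come for free.
data class Landmark(val name: String, val x: Float, val y: Float, val confidence: Float?)

// Extension function: adds behaviour to an existing type without subclassing it.
fun Landmark.isReliable(threshold: Float = 0.8f): Boolean =
    // Null safety: `confidence` is nullable, so the compiler forces us to handle null here.
    (confidence ?: 0f) >= threshold

fun main() {
    val wrist = Landmark("left_wrist", 120f, 340f, confidence = null)
    println(wrist.isReliable())                            // false - missing confidence
    println(wrist.copy(confidence = 0.95f).isReliable())   // true
}
```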
The Android-specific stuff was the harder part. Activity lifecycle, fragment management, view binding, permissions handling - this is Android knowledge more than Kotlin knowledge. I spent more time reading Android documentation than Kotlin documentation.
Android Studio helped a lot. The IDE catches Kotlin issues immediately, the layout preview shows your UI without running the app, and the debugger works well. Coming from VS Code for web work, the IDE-heaviness of Android Studio felt like a lot, but the tooling earned it.
What I’d Do Differently
Looking back, a few things I’d change:
- Better error handling during inference: Sometimes pose detection would throw exceptions under weird conditions (corrupted frame, unexpected orientation). I caught them, but the recovery wasn’t always graceful.
- More modular pose definitions: The target pose was hardcoded. If the 3D company wanted a different pose, I’d have to change code. Should have made it configurable from the server.
- Offline fallback: The app assumed network availability for uploads. A proper offline queue would have been better for users in areas with spotty connectivity.
What Stuck With Me
The project reinforced that foundational knowledge transfers. Kotlin syntax faded from memory after I stopped using it (use it or lose it), but the patterns and problem-solving approaches were the same as any other platform. If you understand why something is structured a certain way, learning the syntax is just lookup work.
It also reminded me that the ecosystem matters. Android has excellent ML tooling because Google has invested heavily in it. On-device pose detection is a few lines of code because someone built ML Kit. Choosing a platform means choosing what’s easy and what’s hard based on what libraries exist.