Petting cats with code! This article is participating in the [Cat Essay Campaign].
MLKit is Google's machine learning library for mobile devices. With just a little code, engineers can add a variety of AI capabilities, such as image labeling, text recognition, and face detection, to Android or iOS apps, and many of them run offline on the device via TensorFlow Lite.
developers.google.com/ml-kit
This article takes you through the following features of MLKit on Android:
- Image Labeling
- Object Detection
- Object Tracking
1. Image Labeling
Image recognition is an important field in computer vision; simply put, it helps you extract useful information from pictures. MLKit provides ImageLabeling, which can recognize and classify the content of an image.
For example, given a picture containing a cat, ImageLabeling can identify the cat in it and return a "Cat" label. Besides the most prominent cat, ImageLabeling also identifies everything else it can recognize in the picture, such as flowers and grass, and reports a confidence for each. The recognition result is returned as a List<ImageLabel>. With the preset default model, ImageLabeling classifies image elements into more than 400 label categories, and you can expand these categories with your own trained model.
The labels currently supported by the default model are listed at: developers.google.com/ml-kit/visi…
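For illustration, here is a minimal sketch of plugging in your own model, assuming the separate com.google.mlkit:image-labeling-custom artifact and a model file placed in /assets (custom_model.tflite is a hypothetical name):

// Assumed extra dependency: com.google.mlkit:image-labeling-custom
val localModel = LocalModel.Builder()
    .setAssetFilePath("custom_model.tflite") // hypothetical model file in /assets
    .build()
val customOptions = CustomImageLabelerOptions.Builder(localModel)
    .setConfidenceThreshold(0.7f) // ignore low-confidence labels
    .setMaxResultCount(5)         // return at most 5 labels per image
    .build()
// Same ImageLabeling entry point, now backed by the custom model
val customLabeler = ImageLabeling.getClient(customOptions)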
It’s easy to introduce MLKit’s ImageLabeling in Android; just add the dependency in Gradle:
implementation 'com.google.mlkit:image-labeling:17.0.5'
Next, let's write a small Android demo to show how it works, with a simple UI built in Compose:
@Composable
fun MLKitSample() {
    Column {
        var imageLabel by remember { mutableStateOf("") }

        // Load image from assets
        val context = LocalContext.current
        val bmp = remember(context) {
            context.assetsToBitmap("cat.png")!!
        }
        Image(bitmap = bmp.asImageBitmap(), contentDescription = "")

        val coroutineScope = rememberCoroutineScope()
        Button(
            onClick = {
                // TODO: image recognition logic, see below
            }
        ) { Text("Image Labeling") }

        Text(imageLabel, Modifier.fillMaxWidth(), textAlign = TextAlign.Center)
    }
}
Put the image resource into /assets and load it as a Bitmap:
fun Context.assetsToBitmap(fileName: String): Bitmap? =
    assets.open(fileName).use {
        BitmapFactory.decodeStream(it)
    }
Clicking the Button runs recognition on the Bitmap and updates imageLabel with the result. Take a look inside onClick:
val labeler = ImageLabeling.getClient(ImageLabelerOptions.DEFAULT_OPTIONS)
val image = InputImage.fromBitmap(bmp, 0)
labeler.process(image)
    .addOnSuccessListener { labels: List<ImageLabel> ->
        // Task completed successfully: concatenate "label : confidence" lines
        imageLabel = labels.scan("") { acc, label ->
            acc + "${label.text} : ${label.confidence}\n"
        }.last()
    }
    .addOnFailureListener {
        // Task failed with an exception
    }
An ImageLabeler handler is created first; InputImage.fromBitmap converts the Bitmap into a resource type the ImageLabeler accepts.
After processing succeeds, a list of ImageLabel is returned. Each ImageLabel represents the annotation of one recognized category, giving the category name and the confidence of its occurrence.
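If the returned list feels noisy, the default labeler can also be configured with a confidence threshold; a small sketch using ImageLabelerOptions:

// Only labels with confidence >= 0.7 will be returned
val thresholdOptions = ImageLabelerOptions.Builder()
    .setConfidenceThreshold(0.7f)
    .build()
val strictLabeler = ImageLabeling.getClient(thresholdOptions)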
2. Object Detection
Object detection is another fundamental research direction in computer vision. Note the difference between “detection” and “labeling” (see the sketch after this list):
- Detection focuses on Where, i.e. where the target is
- Labeling focuses on What, i.e. what the target is
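To make the contrast concrete, a toy illustration of the two result types, assuming labels from labeler.process above and detectedObjects from the objectDetector.process call shown below (the printed values are made up for illustration):

// Labeling answers WHAT: a category name plus a confidence
val label: ImageLabel = labels.first()
println("${label.text} (${label.confidence})") // e.g. "Cat (0.98)"

// Detection answers WHERE: a bounding box in image coordinates
val obj: DetectedObject = detectedObjects.first()
println(obj.boundingBox) // e.g. Rect(50, 80 - 420, 630)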
ImageLabeling can recognize the categories of things in an image but cannot tell where each thing is. Object detection can tell where several things are, but not precisely what they are.
ObjectDetection does provide some recognition capability, but its default model recognizes only a handful of categories and cannot classify as accurately as ImageLabeling; more accurate classification requires an additional model file. However, we can use the two APIs together to get both detection and accurate classification.
First add the ObjectDetection dependency
implementation 'com.google.mlkit:object-detection:16.2.7'
Next, building on the example above, add a Button that triggers object detection when clicked:
@Composable
fun MLKitSample() {
    Column(Modifier.fillMaxSize()) {
        val detectedObjects = remember { mutableStateListOf<DetectedObject>() }

        // Load image from assets
        val context = LocalContext.current
        val bmp = remember(context) {
            context.assetsToBitmap("dog_cat.jpg")!!
        }

        Canvas(Modifier.aspectRatio(bmp.width.toFloat() / bmp.height.toFloat())) {
            drawIntoCanvas { canvas ->
                canvas.withSave {
                    canvas.scale(size.width / bmp.width)
                    canvas.drawImage( // Draw the image
                        image = bmp.asImageBitmap(),
                        Offset(0f, 0f),
                        Paint()
                    )
                    detectedObjects.forEach {
                        canvas.drawRect( // Draw the detected bounding box
                            it.boundingBox.toComposeRect(),
                            Paint().apply {
                                color = Color.Red.copy(alpha = 0.5f)
                                style = PaintingStyle.Stroke
                                strokeWidth = bmp.width * 0.01f
                            }
                        )
                        if (it.labels.isNotEmpty()) {
                            canvas.nativeCanvas.drawText( // Draw the recognition label
                                it.labels.first().text,
                                it.boundingBox.left.toFloat(),
                                it.boundingBox.top.toFloat(),
                                android.graphics.Paint().apply {
                                    color = Color.Green.toArgb()
                                    textSize = bmp.width * 0.05f
                                }
                            )
                        }
                    }
                }
            }
        }

        val coroutineScope = rememberCoroutineScope()
        Button(
            onClick = {
                // TODO: object detection logic, see below
            }
        ) { Text("Object Detect") }
    }
}
Since the detected bounding boxes need to be drawn on top of the image, this time the UI is drawn with Canvas, which involves:
- drawImage: draws the target image
- drawRect: draws the boundary of each DetectedObject in the List<DetectedObject> that MLKit returns after successful detection
- drawText: draws the classification annotation carried by each DetectedObject
Clicking the Button runs object detection; the concrete implementation is as follows:
val options = ObjectDetectorOptions.Builder()
    .setDetectorMode(ObjectDetectorOptions.SINGLE_IMAGE_MODE)
    .enableMultipleObjects()
    .enableClassification()
    .build()
val objectDetector = ObjectDetection.getClient(options)
val image = InputImage.fromBitmap(bmp, 0)
objectDetector.process(image)
    .addOnSuccessListener { results ->
        // Task completed successfully: attach labels, then publish to the UI state
        coroutineScope.launch {
            detectedObjects.clear()
            detectedObjects.addAll(getLabels(bmp, results).toList())
        }
    }
    .addOnFailureListener { e ->
        // Task failed with an exception
        // ...
    }
ObjectDetectorOptions configures how detection is performed; the Builder supports several options:
- setDetectorMode: ObjectDetection supports several detection modes; SINGLE_IMAGE_MODE is the simplest, detecting a single still image. Other modes, such as video-stream detection, are introduced later.
- enableMultipleObjects: detection can either find only the most prominent object or all objects; we enable multi-object detection so that everything detectable is found.
- enableClassification: ObjectDetection's recognition ability is limited; its default model recognizes only five broad categories, such as plants and animals. enableClassification turns this recognition on, and its results are stored in DetectedObject.labels. Since these coarse results are not very meaningful, this example replaces them with the annotations produced by ImageLabeling.
An ObjectDetector handler is created from the ObjectDetectorOptions and starts detecting as soon as the image is passed in. getLabels is a custom method that attaches recognition labels produced by ImageLabeling. The final detection result is written into the detectedObjects MutableStateList, refreshing the Compose UI.
private fun getLabels(
    bitmap: Bitmap,
    objects: List<DetectedObject>
) = flow {
    val labeler = ImageLabeling.getClient(ImageLabelerOptions.DEFAULT_OPTIONS)
    for (obj in objects) {
        // Crop the detected region out of the original bitmap
        val bounds = obj.boundingBox
        val croppedBitmap = Bitmap.createBitmap(
            bitmap,
            bounds.left,
            bounds.top,
            bounds.width(),
            bounds.height()
        )
        // Rebuild the DetectedObject with labels from ImageLabeling
        emit(
            DetectedObject(
                obj.boundingBox,
                obj.trackingId,
                getLabel(labeler, croppedBitmap).map {
                    DetectedObject.Label(it.text, it.confidence, it.index)
                }
            )
        )
    }
}
boundingBox is used to crop the Bitmap into a small image for each DetectedObject; getLabel is then called on the crop to obtain annotations that supplement the DetectedObject instance (actually a rebuilt instance).
The ImageLabeling call inside getLabel is asynchronous, so it is wrapped as a suspend function for easier calling:
suspend fun getLabel(labeler: ImageLabeler, image: Bitmap): List<ImageLabel> =
    suspendCancellableCoroutine { cont ->
        labeler.process(InputImage.fromBitmap(image, 0))
            .addOnSuccessListener { labels ->
                // Task completed successfully
                cont.resume(labels)
            }
            .addOnFailureListener { e ->
                // Resume with the exception so the caller never hangs
                cont.resumeWithException(e)
            }
    }
3. Object Tracking
Object tracking runs ObjectDetection on video frame by frame to achieve a continuous-capture effect. In the following example, we launch a camera preview and track objects in the captured frames.
We use CameraX to start the camera, since CameraX's wrapped API is easier to use:
implementation "Androidx. Camera: the camera - camera2:1.0.0 - rc01"
implementation "Androidx. Camera: the camera - lifecycle: 1.0.0 - rc01"
implementation "Androidx. Camera: the camera - view: 1.0.0 - alpha20"
implementation "Com. Google. Accompanist: accompanist - permissions: 0.16.1"
Among these, accompanist-permissions is used to request the camera permission at runtime.
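For reference, a minimal permission gate sketched against the accompanist-permissions 0.16 API (CameraPermissionGate is a hypothetical helper name; the manifest must also declare android.permission.CAMERA):

@OptIn(ExperimentalPermissionsApi::class)
@Composable
fun CameraPermissionGate(content: @Composable () -> Unit) {
    val cameraPermission = rememberPermissionState(Manifest.permission.CAMERA)
    PermissionRequired(
        permissionState = cameraPermission,
        permissionNotGrantedContent = {
            // Permission not granted yet: let the user trigger the system dialog
            Button(onClick = { cameraPermission.launchPermissionRequest() }) {
                Text("Request camera permission")
            }
        },
        permissionNotAvailableContent = {
            Text("Camera permission denied")
        }
    ) {
        content() // Permission granted: show the camera preview
    }
}

The Box containing CameraPreview below can then be wrapped in CameraPermissionGate { ... }.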
The CameraX preview needs androidx.camera.view.PreviewView, which we integrate into Compose through AndroidView; a Canvas is layered on top of the AndroidView to draw the target borders.
The entire UI layout is as follows:
val detectedObjects = mutableStateListOf<DetectedObject>()

Box {
    CameraPreview(detectedObjects)
    Canvas(modifier = Modifier.fillMaxSize()) {
        drawIntoCanvas { canvas ->
            // Map once from the analysis frame size (480 x 640) to the canvas size
            canvas.scale(size.width / 480, size.height / 640)
            detectedObjects.forEach {
                canvas.drawRect( // Draw the border
                    it.boundingBox.toComposeRect(),
                    Paint().apply {
                        color = Color.Red
                        style = PaintingStyle.Stroke
                        strokeWidth = 5f
                    }
                )
                canvas.nativeCanvas.drawText( // Draw the tracking id
                    "TrackingId_${it.trackingId}",
                    it.boundingBox.left.toFloat(),
                    it.boundingBox.top.toFloat(),
                    android.graphics.Paint().apply {
                        color = Color.Green.toArgb()
                        textSize = 20f
                    }
                )
            }
        }
    }
}
detectedObjects holds the results of frame-by-frame, real-time ObjectDetection. CameraPreview integrates the CameraX preview through AndroidView and updates detectedObjects in real time. drawRect and drawText appeared in the previous example too, but note that drawText now draws a trackingId: ObjectDetection on video adds trackingId information to each DetectedObject. As a target moves, its bounding box changes constantly while its trackingId stays the same, which makes it easy to lock onto an individual among multiple targets.
@Composable
private fun CameraPreview(detectedObjects: SnapshotStateList<DetectedObject>) {
    val lifecycleOwner = LocalLifecycleOwner.current
    val context = LocalContext.current
    val cameraProviderFuture = remember { ProcessCameraProvider.getInstance(context) }
    val coroutineScope = rememberCoroutineScope()
    val objectAnalyzer = remember { ObjectAnalyzer(coroutineScope, detectedObjects) }
    AndroidView(
        factory = { ctx ->
            val previewView = PreviewView(ctx)
            val executor = ContextCompat.getMainExecutor(ctx)
            val imageAnalyzer = ImageAnalysis.Builder()
                .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
                .build()
                .also { it.setAnalyzer(executor, objectAnalyzer) }
            cameraProviderFuture.addListener({
                val cameraProvider = cameraProviderFuture.get()
                val preview = Preview.Builder().build().also {
                    it.setSurfaceProvider(previewView.surfaceProvider)
                }
                val cameraSelector = CameraSelector.Builder()
                    .requireLensFacing(CameraSelector.LENS_FACING_BACK)
                    .build()
                cameraProvider.unbindAll()
                cameraProvider.bindToLifecycle(
                    lifecycleOwner,
                    cameraSelector,
                    preview,
                    imageAnalyzer
                )
            }, executor)
            previewView
        },
        modifier = Modifier.fillMaxSize(),
    )
}
CameraPreview is mostly standard CameraX usage; this article won't walk through CameraX line by line and only highlights the code relevant to the topic: CameraX lets you set an ImageAnalyzer to process video frames, which is exactly what we need. Here a custom ObjectAnalyzer performs the object detection.
Finally, take a look at the implementation of ObjectAnalyzer
class ObjectAnalyzer(
    private val coroutineScope: CoroutineScope,
    private val detectedObjects: SnapshotStateList<DetectedObject>
) : ImageAnalysis.Analyzer {
    private val options = ObjectDetectorOptions.Builder()
        .setDetectorMode(ObjectDetectorOptions.STREAM_MODE)
        .build()
    private val objectDetector = ObjectDetection.getClient(options)

    @SuppressLint("UnsafeExperimentalUsageError")
    override fun analyze(imageProxy: ImageProxy) {
        val frame = InputImage.fromMediaImage(
            imageProxy.image!!,
            imageProxy.imageInfo.rotationDegrees
        )
        coroutineScope.launch {
            objectDetector.process(frame)
                .addOnSuccessListener { objects ->
                    // Task completed successfully: publish the new frame's results
                    with(detectedObjects) {
                        clear()
                        addAll(objects)
                    }
                }
                .addOnFailureListener { e ->
                    // Task failed with an exception
                    // ...
                }
                .addOnCompleteListener {
                    // Close the ImageProxy so CameraX can deliver the next frame
                    imageProxy.close()
                }
        }
    }
}
In ObjectAnalyzer, each preview frame from the camera is fed to ObjectDetection, and the detection result updates detectedObjects. Note that ObjectDetectorOptions is set to STREAM_MODE, designed specifically for video: although each frame could in theory be treated as SINGLE_IMAGE_MODE, only STREAM_MODE results carry a trackingId, and STREAM_MODE also smooths bounding-box positions so movement looks less jittery.
Finally
This article, written for the platform's cat-themed campaign, used cats as its example to introduce MLKit's image recognition capabilities. MLKit has many more practical features; its face detection, for example, is a qualitative leap over Android's built-in android.media.FaceDetector in both performance and recognition rate. Many Chinese AI companies, such as Megvii, also offer good solutions. As AI technology develops, there will be more and more application scenarios on mobile devices.
This article code: github.com/vitaviva/Je…