Notes, which ships with iOS, hides many powerful features behind its plain name, and document scanning is one I use a lot. I had long wanted to provide a similar feature in Health Notes, but kept postponing it because of the amount of knowledge involved. Recently, in my spare time, I worked through the WWDC sessions from recent years that relate to this feature and benefited a great deal. Apple already provides all the tools we need. This article shows how to implement Notes-style document scanning with the VisionKit, Vision, NaturalLanguage, and CoreSpotlight system frameworks.

The original post was published on my blog www.fatbobman.com

Welcome to subscribe to my public account: [Elbow’s Swift Notepad]

Capture images suitable for recognition with VisionKit

VisionKit introduction

VisionKit is a small framework that lets your app use the system’s document scanner. Presenting a VNDocumentCameraViewController shows a camera view that covers the entire screen. By implementing VNDocumentCameraViewControllerDelegate, you receive callbacks from the document camera, such as when a scan completes.

It gives developers the same document-scanning experience as Notes, including image capture and manipulation (perspective correction, color adjustment, and so on).

How to use VisionKit

The VisionKit framework has clear objectives, requires no configuration, and is extremely simple to use.

Request camera permission in the app

Add the NSCameraUsageDescription key to the Info.plist and describe why the app needs the camera.

Create VNDocumentCameraViewController

VNDocumentCameraViewController provides no configuration options; you simply create an instance and present it.

The following code wraps it for use in SwiftUI:

import VisionKit

struct VNCameraView: UIViewControllerRepresentable {
    @Binding var pages: [ScanPage]
    @Environment(\.dismiss) var dismiss

    typealias UIViewControllerType = VNDocumentCameraViewController

    func makeUIViewController(context: Context) -> VNDocumentCameraViewController {
        let controller = VNDocumentCameraViewController()
        controller.delegate = context.coordinator
        return controller
    }

    func updateUIViewController(_ uiViewController: VNDocumentCameraViewController, context: Context) {}

    func makeCoordinator() -> VNCameraCoordinator {
        VNCameraCoordinator(pages: $pages, dismiss: dismiss)
    }
}

struct ScanPage: Identifiable {
    let id = UUID()
    let image: UIImage
}

Implement VNDocumentCameraViewControllerDelegate

VNDocumentCameraViewControllerDelegate provides three callback methods:

  • documentCameraViewController(_ controller: VNDocumentCameraViewController, didFinishWith scan: VNDocumentCameraScan)

    Tells the delegate that the user has successfully saved a scanned document from the document camera.

  • documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController)

    Tells the delegate that the user has canceled from the document scanner camera.

  • documentCameraViewController(_ controller: VNDocumentCameraViewController, didFailWithError error: Error)

    Tells the delegate that document scanning failed while the camera view controller was active.

final class VNCameraCoordinator: NSObject, VNDocumentCameraViewControllerDelegate {
    @Binding var pages: [ScanPage]
    var dismiss: DismissAction

    func documentCameraViewController(_ controller: VNDocumentCameraViewController, didFinishWith scan: VNDocumentCameraScan) {
        // Collect every scanned page as a UIImage
        for i in 0..<scan.pageCount {
            let scanPage = ScanPage(image: scan.imageOfPage(at: i))
            pages.append(scanPage)
        }
        dismiss()
    }

    func documentCameraViewControllerDidCancel(_ controller: VNDocumentCameraViewController) {
        dismiss()
    }

    func documentCameraViewController(_ controller: VNDocumentCameraViewController, didFailWithError error: Error) {
        dismiss()
    }

    init(pages: Binding<[ScanPage]>, dismiss: DismissAction) {
        self._pages = pages
        self.dismiss = dismiss
    }
}

VisionKit lets users scan multiple pages in one session. You can query the number of pages with pageCount and retrieve each image with imageOfPage(at:).

Users should rotate the scanned images into their correct, upright orientation before moving on to text recognition.

Calling it in a view

struct ContentView: View {
    @State var scanPages = [ScanPage]()
    @State var scan = false
    var body: some View {
        VStack {
            Button("Scan") {
                scan.toggle()
            }
            List {
                ForEach(scanPages, id: \.id) { page in
                    HStack {
                        Image(uiImage: page.image)
                            .resizable()
                            .aspectRatio(contentMode: .fit)
                            .frame(height: 100)
                    }
                }
            }
            .fullScreenCover(isPresented: $scan) {
                VNCameraView(pages: $scanPages)
                    .ignoresSafeArea()
            }
        }
    }
}

At this point, you can capture scanned images just as Notes does.

Use Vision for text recognition

Vision introduction

In contrast to VisionKit’s small size, Vision is a large and powerful framework with a wide range of uses. It applies computer vision algorithms to perform a variety of tasks on input images and videos.

The Vision framework can perform face and facial-landmark detection, text detection, barcode recognition, image registration, and object tracking. Vision also lets you use custom Core ML models for tasks such as classification or object detection.

In this case, we simply use the text detection feature provided by Vision.

How to use Vision for text recognition

Vision can detect and recognize multilingual text in images, entirely on device, which protects user privacy. Vision provides two text-detection paths (algorithms): fast and accurate. The fast path is great for scenarios like reading numbers in real time; in this case, since we need to process the text of the entire document, the accurate path, which uses a neural network, is the better fit.

Whatever kind of recognition Vision performs, the general process is much the same:

  • Prepare the input image for Vision

    Vision uses VNImageRequestHandler to process image-based requests and assumes the image is upright, so pass the image’s orientation along with it (see the orientation sketch after the sample code below). In this case, we use the images provided by VNDocumentCameraViewController.

  • Create a Vision Request

    Start by creating a VNImageRequestHandler object with the image to be processed.

    Next, create a request and execute it with the handler. Each recognition type has its own VNImageBasedRequest subclass; for text recognition, it is VNRecognizeTextRequest.

    You can run multiple requests on the same image by creating all of them and handing them to a single VNImageRequestHandler instance.

  • Interpret the detection results

    You can access the results in two ways: check the request’s results property after calling perform(_:), or set a completion handler when creating the request to receive the recognized information. The result may contain multiple observations, and you loop through the observation array to process each one.

The approximate code is as follows:

import Vision

func processImage(image: UIImage) -> String {
    guard let cgImage = image.cgImage else {
        fatalError()
    }
    var result = ""
    let request = VNRecognizeTextRequest { request, _ in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        let recognizedStrings = observations.compactMap { observation in
            // Take only the top candidate for each recognized field
            observation.topCandidates(1).first?.string
        }
        result = recognizedStrings.joined(separator: " ")
    }
    request.recognitionLevel = .accurate // Use the accurate path
    request.recognitionLanguages = ["zh-Hans", "en-US"] // Set the languages to recognize

    let requestHandler = VNImageRequestHandler(cgImage: cgImage)
    do {
        try requestHandler.perform([request])
    } catch {
        print("error: \(error)")
    }
    return result
}
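As noted earlier, Vision assumes its input image is upright. When the image may carry orientation metadata, you can pass the orientation explicitly when creating the handler. The extension below is the common mapping used in Apple’s sample code, and the commented-out call is a sketch of how processImage could use it:

import UIKit
import Vision

// Maps UIImage.Orientation to the CGImagePropertyOrientation
// that VNImageRequestHandler expects
extension CGImagePropertyOrientation {
    init(_ uiOrientation: UIImage.Orientation) {
        switch uiOrientation {
        case .up: self = .up
        case .upMirrored: self = .upMirrored
        case .down: self = .down
        case .downMirrored: self = .downMirrored
        case .left: self = .left
        case .leftMirrored: self = .leftMirrored
        case .right: self = .right
        case .rightMirrored: self = .rightMirrored
        @unknown default: self = .up
        }
    }
}

// let requestHandler = VNImageRequestHandler(
//     cgImage: cgImage,
//     orientation: CGImagePropertyOrientation(image.imageOrientation)
// )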

Each recognized text field may contain multiple candidate results; topCandidates(_:) sets the maximum number of candidates returned.
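For example, a sketch of inspecting several candidates together with their confidence scores inside the request’s completion handler (VNRecognizedText exposes both string and confidence):

for observation in observations {
    // Up to three candidate transcriptions per recognized field
    for candidate in observation.topCandidates(3) {
        print(candidate.string, candidate.confidence)
    }
}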

recognitionLanguages defines the order of the languages used during language processing and text recognition. When recognizing Chinese, place Chinese first.
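To check which languages a given recognition level supports, you can query the request itself. A small sketch, using the instance method available on iOS 15 and later:

let request = VNRecognizeTextRequest()
request.recognitionLevel = .accurate
// Lists the language identifiers the accurate path can recognize
if let languages = try? request.supportedRecognitionLanguages() {
    print(languages)
}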

The document to be recognized:

This type of document is not well suited to natural language processing (unless you invest heavily in deep learning), but it is exactly the kind of content Health Notes primarily needs to preserve.

Recognition results:

InBody Moisture TinBody770) ID 15904113359 Height Year 𠳕 Sex 75 Male Test Date/Time (dialysis) 172cm (1946.07.10) 201, 10.09.16:39 Body moisture composition Body moisture composition Body moisture content ((L) 60 0O 100 110 120 130 100 170 32.5 Total body moisture 32 5T 30 0AA. Intracellular moisture () 70 10 GO 100 10% net moisture 19 9L 22 7-277 19.9 Extracellular moisture 12.6L (No.13 170 Extracellular moisture (L) HOF 00 100 110 120 13o 140 160 170 % segment water analysis 12.6 Right upper limb 1.80 L (201-279 Extracellular water ratio analysis left upper limb 2.00 L 2 07-2 79 lower per fir 16 8t 17 4 213 Extracellular water ratio 0.320 0.340 0360 0 380 0.300 0.400 0410 0 420 0.430 0 440 0 450 Right lower gum 5.65L (6 08-743 0.390 left lower limb 5.72 L (6 08-743 segment water analysis Human component analysis Protein 8.7 kg ( 9B~120 Standard inorganic salts 2.83 Hg 3.38~4 14 Right upper limb (L) 70 85 100 15 130 45 160 175 1G0 205 1.80 Body fat 30.0 xg (7.8-156 left upper limb (L) 55 70 85 100 115 130 145 175 Fat-free weight 44.0 Mg (49 8~00 9 2.00 bone mineral content 2.37 kg (279~3.41 trunk (L) 70 80 90 100 110 120 130 40 150 160 170 Muscle fat analysis 16.8 Body weight 74.0xg 55 3-74.9 Right lower limb (L) 80 90 100 110 120 130 40 150 160 170%5.65 Skeletal muscle content 23.9 kg 27 8-340 muscle mass 41.6 kg 47.0-57 4 Left lower limb (L) 70 80 90 100 110 120 130 140 150 160 170%5.72 Body fat content 30.0 kg (7.8 to 156 obesity analysis segment extracellular water ratio analysis BMI 25.0 kg/m (18.5 to 25.0 body fat percentage 40.5% (10.0 to 200 0 43 0.42 research Item - Edema basal metabolism SLAUGHTER 1321 kcal (1593 to 1865 waist-to-hip ratio 1.07 0.80 to 0.90 0.395 ABDOMINAL circumference 102.1 cm slight edema 0 39 0.389 0.393 visceral fat area 171.8 cm3 Obesity 90~110 0 38 0.379 114 % normal 0.376 body cell volume 28.5 kg (32.5~39.7 037) Upper arm girth 32.4cm 0 36 Upper arm muscle girth 27.5cm Right upper limb left upper limb trunk Right lower limb Left Lower Limb TBW/FFM 73.9% Fat loss BODY mass index Body moisture history 14.9kg /m' Fat mass index 10.1 kg/m' Body weight (kg) 86.1 79.1 81.0 79.3 73.5 74.0 Whole body Phase Angle (50xz) 4.6 Total body moisture 39.9 35.8 37.1 43.6 35. 32.5 Biological resistance resistance - Intracellular moisture (L) 23.7 22.0 22.9 26.2 Right upper limb Left upper limb 1000 Right lower limb left glue 21.1 19.9Zq) 1 MHlz/438.4 383.5 35.6 331.9 323.0 5 g.428.0 374.7 34.4 324.1 315.2 Extracellular moisture (L) 16.2 13.8 14.2 17.4 14.0 50 K1/377.9 334.7 31.0 294.0 285.0 12.6 250 H12/345.4 306.2 27.2 275.1 265.0 500 MHz 334.7 296.9 Extracellular moisture ratio 0.406 0.386 0.383 0.400 0.398 0.390 1000&h2/328.6. 291.3 23.9 265.7 255.3: 1903 28 20 01.22: 20.05 20 20 08 24 21 07 01:21 10.09 129 11 13 11.34 16.31: 1639 Ver Lookin Body120 32a6- SN. C71600359 Copyrgh(g 1296-by InBody Co. Lat Au Pghs resaned BR-Chinese-00-B-140129

The recognition result depends heavily on document print quality, shooting angle, and lighting.

Use NaturalLanguage to extract keywords from the text

Health Notes is an app centered on recording data. The document-scanning feature was added so that users can file and organize paper reports in one place. Therefore, we only need to extract suitable search keywords from the recognized text and save them.

NaturalLanguage introduction

NaturalLanguage is a framework for analyzing natural language text and inferring its language-specific metadata. It provides a variety of natural language processing (NLP) capabilities and supports many different languages and scripts. The framework segments natural language text into paragraphs, sentences, or words, and tags information about those segments, such as part of speech, lexical class, phrase, script, and language.

Using this framework, you can perform the following tasks:

  • Language Identification

    Automatically detects the language of a piece of text

  • Tokenization

    Break a piece of text into linguistic units, or tokens

  • Parts-of-speech tagging

    Mark individual words with parts of speech

  • Lemmatization

    Determine a word’s stem through morphological analysis

  • Named Entity Recognition

    Identify tokens that name people, places, or organizations
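As a small illustration of the first task above, language identification takes only a few lines (the sample string here is hypothetical):

import NaturalLanguage

// A minimal sketch of language identification with NLLanguageRecognizer
let recognizer = NLLanguageRecognizer()
recognizer.processString("体重 74.0 kg 身高 172 cm")
if let language = recognizer.dominantLanguage {
    print(language.rawValue) // e.g. "zh-Hans"
}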

The idea of extracting keywords

In this case, the format of a medical examination report is not very friendly to text recognition (users will submit many kinds of reports, which makes targeted deep learning difficult), and it is also hard to apply part-of-speech tagging or entity recognition to the results. So I only did the following:

  • Preprocessing

    Remove symbols that interfere with tokenization. In this case, because the text comes from VNRecognizeTextRequest, it contains no control characters that could crash the tokenizer.

  • Tokenization (word segmentation and removal of unwanted information)

    Create an NLTokenizer instance for word segmentation. The general code is as follows:

  let tokenizer = NLTokenizer(unit: .word) // The granularity of tokenization
  tokenizer.setLanguage(.simplifiedChinese) // The language of the text to segment
  tokenizer.string = text
  var tokenResult = [String]()
  tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { tokenRange, attribute in
      let str = String(text[tokenRange])
      // Skip numbers, stop words (stopWords is a custom lookup table), and single characters
      if attribute != .numeric, stopWords[str] == nil, str.count > 1 {
          tokenResult.append(str)
      }
      return true
  }
  • Deduplication

    Remove duplicate content.
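    A minimal sketch of order-preserving deduplication over the tokenResult array produced in the previous step:

  // Keep the first occurrence of each token, discarding later duplicates
  var seen = Set<String>()
  let keywords = tokenResult.filter { seen.insert($0).inserted }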

    After the steps above, the image shown earlier yields the following result (optimized for Spotlight):

Body moisture Height Sex male Date time dialysis composition Cell HOF analysis Upper limb ratio Right lower limb body composition protein standard inorganic salt fat weight mineral trunk musculoskeletal BMI percentage Research Item Edema basal metabolism Abdominal circumference mild visceral area obesity Normal upper arm circumference TBW FFM index history Record whole body phase biological resistors lower left MHLZ recently all VER Lookin copyrgh Lat PGHS resransom Chinese

I have no NLP knowledge or experience; the processing above is based purely on my own intuition, and corrections are welcome. Better results could likely be obtained by using the recognized line height of the text, enriching the stopWords and customWords lists, and adding collocation judgments. The quality of the scanned image has the greatest impact on the final result.

Full text retrieval with CoreSpotlight

In addition to saving the text in Core Data for in-app retrieval, we can add it to the system index so that users can search it with Spotlight.
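For reference, a minimal sketch of indexing the extracted keywords with CoreSpotlight; the title, domain identifier, and keywords variable here are all hypothetical:

import CoreSpotlight
import UniformTypeIdentifiers

let attributeSet = CSSearchableItemAttributeSet(contentType: .text)
attributeSet.title = "InBody Report" // hypothetical title
attributeSet.contentDescription = keywords.joined(separator: " ")
let item = CSSearchableItem(
    uniqueIdentifier: UUID().uuidString,
    domainIdentifier: "scanned-documents", // hypothetical domain
    attributeSet: attributeSet
)
CSSearchableIndex.default().indexSearchableItems([item]) { error in
    if let error = error {
        print("indexing error: \(error.localizedDescription)")
    }
}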

For more on adding data to Spotlight and querying Spotlight from within your app, see my other article on showing an app’s Core Data data in Spotlight.

Conclusion

A seemingly difficult feature can often be implemented with system-provided APIs, even when the developer has no prior knowledge or experience in the field. The official APIs already cover the common needs of this scenario, and Apple’s work here deserves kudos.

I hope this article has been helpful to you.

The original post was published on my blog www.fatbobman.com

Welcome to subscribe to my public account: [Elbow’s Swift Notepad]