Three tips for application data crawling: reverse engineering, OCR, packet sniffing

This post introduces three techniques for collecting data from mobile apps: reverse engineering, OCR, and packet sniffing. Each one offers a different way to see what data an app handles and how it gets it.

1. Reverse Engineering

Reverse engineering here means decompiling or disassembling an app's compiled binary to recover something close to its original source code.

Android's APK files and iOS's IPA files can be analyzed through decompilation.

This method is mainly used to understand the internal logic of an app or to identify data communication methods.

Apps protected with obfuscation, packing, or anti-tampering checks can be difficult or impractical to analyze this way.

Below are separate summaries of Android APK reverse engineering and iOS IPA reverse engineering.

1.1. Android APK Reverse Engineering

Tools: jadx, Apktool, dex2jar, etc.

1.1.1. Steps for Android APK Reverse Engineering

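  1. Obtain the APK file. Google Play does not offer a direct APK download, so a common route is to install the app on a device and pull the APK from it with adb.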

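  2. Decompile the APK file. Apktool decodes the resources and manifest and disassembles the code to smali, while jadx decompiles the DEX bytecode to readable Java.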

  3. Analyze the decompiled source code to understand the app's logic, API endpoints, data structures, etc.

  4. Analyze network packets or check logs to understand the data communication methods.
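Step 3 can be partly automated. The following is a minimal sketch, assuming the APK has already been decompiled into a directory of text files (for example by jadx or Apktool). The function name `find_urls`, the regular expression, and the list of file suffixes are illustrative choices, not part of any tool.

```python
import re
from pathlib import Path

# Matches http(s) URLs embedded in source text; deliberately simple.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)]+")

# File types a decompiler typically produces (an assumption; adjust as needed).
SOURCE_SUFFIXES = {".java", ".kt", ".smali", ".xml", ".json"}


def find_urls(decompiled_dir):
    """Map each URL found under decompiled_dir to the sorted files that contain it."""
    found = {}
    for path in Path(decompiled_dir).rglob("*"):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        # Decompiled output can contain odd bytes, so undecodable ones are skipped.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for url in URL_PATTERN.findall(text):
            found.setdefault(url, set()).add(str(path))
    return {url: sorted(files) for url, files in sorted(found.items())}
```

Calling `find_urls("decompiled/")` on a hypothetical output directory returns candidate endpoints to inspect by hand. URLs that the app assembles at runtime or hides through obfuscation will not appear in such a scan.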

1.2. iOS IPA Reverse Engineering

Tools: Hopper, IDA Pro, Ghidra, etc.

1.2.1. Steps for iOS IPA Reverse Engineering

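  1. Obtain the IPA file. Binaries distributed through the App Store are encrypted with FairPlay DRM, so a decrypted copy of the app is needed before static analysis.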

  2. Unpack the IPA file (it is a ZIP archive) and load the app's executable into the chosen disassembler or decompiler.

  3. Analyze the decompiled source to understand the app's internal logic, data communication methods, etc.
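Before opening a disassembler, it can help to look at the bundle metadata. The sketch below relies only on the facts that an IPA is a ZIP archive and that the app bundle sits at `Payload/<Name>.app/` with an `Info.plist` inside it. The function names `read_ipa_info` and `summarize` are hypothetical helpers.

```python
import plistlib
import zipfile


def read_ipa_info(ipa_path):
    """Return the parsed Info.plist of the app bundle inside an IPA archive."""
    with zipfile.ZipFile(ipa_path) as ipa:
        for name in ipa.namelist():
            parts = name.split("/")
            # The bundle's own Info.plist is exactly Payload/<Name>.app/Info.plist;
            # deeper Info.plist files belong to embedded frameworks or extensions.
            if (len(parts) == 3 and parts[0] == "Payload"
                    and parts[1].endswith(".app") and parts[2] == "Info.plist"):
                # plistlib detects XML and binary plist formats automatically.
                return plistlib.loads(ipa.read(name))
    raise ValueError("no Payload/*.app/Info.plist found in archive")


def summarize(info):
    """Pick out the keys most useful when starting an analysis."""
    keys = ("CFBundleIdentifier", "CFBundleExecutable", "CFBundleShortVersionString")
    return {key: info.get(key) for key in keys}
```

`CFBundleExecutable` gives the name of the executable file inside the bundle, which is the file to load into Hopper, IDA Pro, or Ghidra.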

2. OCR

OCR (Optical Character Recognition) extracts text from images; in this context, from screenshots of the app's screens.

The accuracy of OCR is influenced by various factors.

Text size, font, background color, etc., can affect accuracy.

Because high accuracy is not guaranteed, OCR is not recommended as a first choice.

However, it can be useful in situations where access to APIs or source code is not possible.

2.1. OCR Tools

  • Tesseract: The most widely known open-source OCR engine.
  • Google Cloud Vision API: Google's OCR service with high accuracy and support for various languages.
  • ABBYY FineReader: Commercial OCR software providing high accuracy.

2.2. Steps for Using OCR Tools

  1. Capture the screen displaying the required information in the target app.

  2. Preprocess the image if needed to improve OCR accuracy, for example by removing noise and adjusting brightness and contrast.

  3. Convert the preprocessed image to text using OCR tools.

  4. Clean up the OCR output, since the raw text is unrefined and may contain recognition errors.

  5. Analyze the refined text and store the necessary data.
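Steps 2 to 4 might look roughly like the sketch below. It assumes Tesseract is installed along with the `pytesseract` and `Pillow` packages. The specific preprocessing chain (grayscale, autocontrast, median filter) and the cleanup rules are illustrative choices and will need tuning for a given app's screens.

```python
import re


def ocr_screenshot(image_path, lang="eng"):
    """Steps 2-3: preprocess a screenshot and run Tesseract on it."""
    # Imported here so refine_text below stays usable without these packages.
    from PIL import Image, ImageFilter, ImageOps
    import pytesseract

    image = Image.open(image_path)
    image = ImageOps.grayscale(image)      # drop colour information
    image = ImageOps.autocontrast(image)   # stretch brightness and contrast
    image = image.filter(ImageFilter.MedianFilter(size=3))  # remove speckle noise
    return pytesseract.image_to_string(image, lang=lang)


def refine_text(raw):
    """Step 4: collapse runs of whitespace and drop empty lines."""
    lines = []
    for line in raw.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        if line:
            lines.append(line)
    return lines
```

A typical call would be `refine_text(ocr_screenshot("screen.png"))`, where `screen.png` is a hypothetical capture from step 1. Whitespace cleanup does not fix misread characters such as `0` versus `O`; those usually need rules specific to the data being collected.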

3. Packet Sniffing

Packet sniffing is the process of capturing and analyzing data packets in a network.

When crawling an app, this method is useful for understanding how the app communicates with its server.

Packet sniffing reveals the API endpoints an app calls, its authentication mechanisms, and its request and response formats, from which parts of the app's logic can be inferred.

3.1. Packet Sniffing Tools

  • Wireshark: The most widely used packet sniffing tool.
  • Charles Proxy: Mainly used for web development and mobile app analysis. Provides SSL proxying for analyzing HTTPS traffic.
  • mitmproxy: An open-source interactive HTTPS proxy that inspects traffic by acting as a man-in-the-middle between the app and the server.
  • tcpdump: A command-line packet capture tool useful in server environments.

3.2. Steps for Packet Sniffing

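  1. Install a packet sniffing tool and, if necessary, configure proxy settings on the app or device. To inspect HTTPS traffic through a proxy, the proxy's CA certificate must also be trusted on the device.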

  2. Capture traffic using packet sniffing tools while the app is running and exchanging data.

  3. Analyze the captured packets to identify API endpoints, protocols used, data formats transmitted, etc.

  4. Understand the requests sent to the identified API endpoints and how responses are received.

    Typically, RESTful APIs use HTTP methods like GET, POST, PUT, DELETE.

  5. Once the necessary data or patterns are identified, write crawling code based on them.
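Step 3 can be sketched in code. The example below assumes the captured traffic has been exported as a HAR file, a JSON format that several proxy tools can produce. The function name `summarize_har` is hypothetical, and only the standard HAR fields `log.entries[].request.method` and `log.entries[].request.url` are read.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def summarize_har(har_path):
    """Count captured requests per (method, host, path), ignoring query strings."""
    # utf-8-sig tolerates a byte-order mark, which some exporters write.
    with open(har_path, encoding="utf-8-sig") as f:
        har = json.load(f)
    counts = Counter()
    for entry in har["log"]["entries"]:
        request = entry["request"]
        url = urlsplit(request["url"])
        counts[(request["method"], url.netloc, url.path)] += 1
    return counts
```

Sorting the result with `summarize_har("capture.har").most_common()` puts the most frequently called endpoints first, which is usually a reasonable place to start when writing the crawler in step 5.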

4. Conclusion

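Traffic captured from API endpoints may contain sensitive information such as user login details, so it should be handled with care. API endpoints can also change whenever the app is updated. A crawler built on them should therefore be operated with countermeasures in place, such as detecting when a response no longer matches what the crawler expects.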
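One possible countermeasure against silent API changes, offered as an assumption rather than a prescribed approach, is to check each decoded response against the fields the crawler depends on and raise an alert when any are missing. The function name `missing_fields` and the example field names are hypothetical.

```python
def missing_fields(payload, expected_fields):
    """Return the expected top-level fields absent from a decoded JSON object."""
    if not isinstance(payload, dict):
        # The response is no longer an object at all, so everything is missing.
        return sorted(expected_fields)
    return sorted(set(expected_fields) - set(payload))
```

For example, `missing_fields(response_json, ["id", "name", "price"])` returns an empty list while the endpoint still matches expectations, and a non-empty list is a signal to pause the crawler and re-inspect the traffic.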
