Detailed explanation of the limitations in web scraping (crawling) using Selenium and Chromedriver, features of undetected_chromedriver as a solution, installation and usage instructions, and various related options.

Detailed explanation of the limitations of web scraping using Selenium and Chromedriver, features of undetected_chromedriver, installation and usage instructions, and various options.

5
Detailed explanation of the limitations in web scraping (crawling) using Selenium and Chromedriver, features of undetected_chromedriver as a solution, installation and usage instructions, and various related options.

What is undetected_chromdriver?

notion image

Web scraping and web automation play important roles in many data analysis and web development tasks today. Among the tools used for these tasks, Selenium and Chromedriver are the most widely used.

Selenium was originally created as a tool for automating tests of web applications, but due to its functionality and versatility, it is widely used as a web scraping tool among data analysts and web developers.

Chromedriver is a WebDriver for controlling the Google Chrome browser within the Selenium framework. This allows developers to execute automated test scripts to navigate websites or perform actions like a user (clicking, scrolling, etc.) automatically.

However, when using these basic tools to scrape websites, most modern websites have security mechanisms that detect and block such automated access. For example, when trying to scrape a site with high security using Selenium and chromedriver, the site detects and blocks these actions as it considers them to be done by bots to protect their data. In such situations, the need for undetected_chromedriver arises.

notion image

undetected_chromedriver, unlike regular Chromedriver, is a Python library that allows you to avoid detection and blocking of website automation. In other words, it makes running scraping scripts appear as if it were done by an actual human.

Through this, you can collect data provided by websites more reliably and efficiently. undetected_chromedriver enables developers familiar with Selenium and Chromedriver to bypass these detection mechanisms without additional knowledge about web scraping security.

undetected_chromedriver is a tool that helps make web scraping tasks easier and more efficient.

Installation and Usage

pip install undetected-chromedriver

Install it like this.

import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from fake_useragent import UserAgent
from selenium_stealth import stealth
from selenium.webdriver.chrome.service import Service

options = uc.ChromeOptions()
  • Various options

options = uc.ChromeOptions allows you to create options to put in.

# 팝업 차단을 활성화합니다.
options.add_argument('--disable-popup-blocking')


# WebDriver 객체 생성
driver = uc.Chrome( options = options,enable_cdp_events=True,incognito=True)

# selenium_stealth 설정
stealth(driver,
        vendor="Google Inc. ",
        platform="Win32",
        webgl_vendor="intel Inc. ",
        renderer= "Intel Iris OpenGL Engine",
        fix_hairline=True,
        )


options.add_argument('--remote-debugging-port=9222')

# 웹사이트 방문
# driver.get('[원하는 웹사이트 방문]')

# 대기 시간 설정 => 대기 시간을 설정하여, html이 렌더링 되는 시간을 벌어줍니다.
driver.implicitly_wait(2)

# 자바스크립트 코드 실행
driver.execute_script("Object.defineProperty(navigator, 'plugins', {get: function() {return[1, 2, 3, 4, 5];},});")

  • Using fake_useragent to use a different user agent string for each execution is a good practice. This allows you to change commonly used information for identifying browsers.

  • Using selenium_stealth can mask some features that some sites use to detect web scrapers.

  • Using driver.execute_script() to execute JavaScript code to override the navigator.plugins property can block another way websites detect scrapers.

  • Using driver.delete_all_cookies() to delete all cookies can help prevent websites from tracking previous visits.

By using various options like this, you can make undetected chromedriver work.

try:
    driver.find_elements(By.XPATH, "//div[@class='lister__item__inner']/a")[8].click()
    driver.implicitly_wait(2)
    driver.delete_all_cookies()
    driver.execute_script("Object.defineProperty(navigator, 'plugins', {get: function() {return[1, 2, 3, 4, 5];},});")
except:
    print("치단당함")

Insert the code in the same way you used chromedriver and selenium before.

Key Differences Between Existing ChromeDriver and undetected_chromedriver

1) Enhanced Features: undetected_chromedriver has more enhanced anti-bot inspection features compared to regular ChromeDriver. It is designed to bypass defense mechanisms some websites use to prevent automated actions like web scraping.

2) Stealth Mode: undetected_chromedriver introduces several additional technologies to better conceal automation actions using Selenium. This includes removing the navigator.webdriver flag, setting WebGL vendor, setting language plugin order, setting resolution, and more. In short, it can be thought of as adding full options to chromedriver.

3) Responsiveness: Some users find undetected_chromedriver to be faster and more responsive than the existing ChromeDriver. This can vary depending on the website, and not all websites will work perfectly.

Conclusion

As websites continue to develop and apply new defense mechanisms, ongoing research and methods for avoiding the latest defense technologies are necessary.

Also, read this article:

Automate Data Collection Now

Start in 5 minutes without coding · Experience with scraping 5,000+ websites

Get started for free →

Comments

Add Comment

Your email won't be published and will only be used for reply notifications.

Continue Reading

Get notified of new posts

We'll email you when 해시스크래퍼 기술 블로그 publishes new content.

Your email will only be used for new post notifications.