Shadow DOM is a web development technology that encapsulates the style and structure of web elements, allowing them to function independently from external influences.

How does Shadow DOM differ from an iframe?

Shadow DOM focuses on encapsulating the styles and structure of elements within a page, while an iframe embeds a separate web page inside the current one.

Can external styles affect Shadow DOM elements?

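Mostly no. Style rules from the page do not match elements inside a shadow tree, and rules inside the shadow tree do not leak out. However, inherited properties (such as color and font) and CSS custom properties still pass through the boundary, and page scripts can reach into an open shadow root through element.shadowRoot.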

Is Shadow DOM more secure than iframes?

Not inherently. Shadow DOM isolates styles and structure, but it is not a security boundary: any script on the same page can reach into an open shadow root. Iframes have their own risk, since they can expose user data if untrusted content is loaded into them.

How can I scrape elements within Shadow DOM?

To scrape elements within Shadow DOM, you need to obtain the shadow root directly with JavaScript (element.shadowRoot), because CSS selectors and XPath expressions evaluated against the main document do not reach into shadow trees.

Shadow DOM Crawling Guide: Dissecting the Hidden Elements of the Web

1. What is Shadow DOM?

Shadow DOM is a web development technology used mainly to encapsulate the style and structure of web elements. Here, 'encapsulation' means that the elements inside a Shadow DOM are kept separate from the rest of the page, so outside styles and markup do not interfere with them.

2. Difference between Shadow DOM and iframe

Shadow DOM and iframe both create isolated areas within a web page, but they serve different purposes and functions.

Below are three key differences:

2.1. Encapsulation vs Embedding

The core of Shadow DOM is 'encapsulation.' By using Shadow DOM, you can isolate the internal structure and style of web elements from the external environment. This prevents conflicts with styles or scripts of other elements within the same web page.

An iframe, on the other hand, is about 'embedding.' It inserts an entirely separate web page, with its own document, into the current page. YouTube videos and Google Maps, for example, are typically embedded with iframes.

2.2. Style and Script

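Style rules from outside do not apply to content inside the Shadow DOM, and style rules inside the Shadow DOM do not affect external elements. Scripts are only partly separated: document-level queries such as document.querySelector() do not see shadow content, but a script can still reach an open shadow root through element.shadowRoot. Inherited properties such as color and font, along with CSS custom properties, also flow from the host into the shadow tree.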

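With iframes, styles from the parent page do not apply inside the iframe, and styles inside the iframe do not affect the parent page. Scripts are separated by the same-origin policy: the parent page and the iframe can access each other's documents only when both share the same origin.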

2.3. Security

Because Shadow DOM is isolated from the rest of the page, reaching data or functionality inside it from outside takes deliberate effort. This does not necessarily enhance security, but it helps prevent accidental access to internal structures or data, such as a page-wide selector unintentionally matching a component's internals.

In the case of iframes, if external sources are untrusted pages, there is a risk of exposing user data or information, and loading iframe content via HTTP could weaken the overall page security.

For more information on security issues related to iframes and web scraping, you can refer to previous posts.

3. Shadow DOM Scraping Method

Usually, we use CSS selectors or XPath to access HTML elements on a web page. However, a selector or XPath expression evaluated against the main document does not descend into shadow trees, so it cannot reach the elements inside them. Instead, we need to obtain each shadow root directly with JavaScript (element.shadowRoot) and search from there.
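In JavaScript, entering a shadow tree is a property access: host.shadowRoot returns the root of an open shadow tree (or null if it is closed or absent), and querySelector can then be called on that root. The sketch below, assuming only open shadow roots, builds a script that performs the whole walk in a single execute_script call. The helper name build_shadow_path_script and the selectors my-app and #save are hypothetical, for illustration only.

```python
import json


def build_shadow_path_script(host_selectors, target_selector):
    """Build a JavaScript snippet for driver.execute_script().

    Each selector in host_selectors is resolved inside the previous shadow
    root (the first one in the main document); target_selector is then
    resolved inside the last shadow root.
    """
    parts = ["document"]
    for selector in host_selectors:
        # json.dumps yields a double-quoted string literal that JavaScript accepts.
        parts.append(f"?.querySelector({json.dumps(selector)})?.shadowRoot")
    parts.append(f"?.querySelector({json.dumps(target_selector)})")
    # Optional chaining (?.) makes a missing host or closed root evaluate to
    # undefined instead of throwing; Selenium hands that back to Python as None.
    return "return " + "".join(parts) + ";"
```

Usage would look like element = driver.execute_script(build_shadow_path_script(["my-app"], "#save")). This trades the per-layer round trips of a step-by-step approach for one script call, at the cost of less precise error reporting: any failed step simply yields None.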

Let's assume there is a web page structure as follows:

HTML Example:

<custom-settings>
  #shadow-root
    <settings-panel>
      #shadow-root
        <settings-content>
          #shadow-root
            <input id="customInput">
        </settings-content>
    </settings-panel>
</custom-settings>
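To try the traversal without the real site, one option is a local test page that reproduces this structure with declarative Shadow DOM (a template element with shadowrootmode="open"), which recent browsers turn into real shadow roots while parsing the HTML. A minimal sketch, assuming a browser that supports declarative Shadow DOM; the helper names shadow_host and build_fixture_html are hypothetical.

```python
def shadow_host(tag, inner_html):
    """Wrap inner_html in a declarative open shadow root hosted by <tag>."""
    return f'<{tag}><template shadowrootmode="open">{inner_html}</template></{tag}>'


def build_fixture_html():
    """Reproduce custom-settings > settings-panel > settings-content > input."""
    innermost = shadow_host("settings-content", '<input id="customInput">')
    middle = shadow_host("settings-panel", innermost)
    outer = shadow_host("custom-settings", middle)
    return f"<!DOCTYPE html>\n<html><body>{outer}</body></html>\n"
```

One way to use it is to write the string to a file (for example Path("shadow_fixture.html").write_text(build_fixture_html(), encoding="utf-8"), with a hypothetical file name) and open Path("shadow_fixture.html").resolve().as_uri() with driver.get. Browsers without declarative Shadow DOM support will leave the templates inert instead of creating shadow roots.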

As shown in the example above, Shadow DOM often has multiple layers of hierarchy. Therefore, to access <input id="customInput">, you need to navigate through the hierarchy step by step, obtaining the Shadow Root at each level.

Below is Python code that uses Selenium to access each Shadow Root in turn.

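from selenium import webdriver
from selenium.webdriver.common.by import By


def get_shadow_root(element, driver):
    # Return the element's open shadow root (None if it has none or it is closed)
    return driver.execute_script('return arguments[0].shadowRoot', element)

driver = webdriver.Chrome()  # Initialize the Selenium WebDriver
# Load the page that contains <custom-settings> first, e.g. driver.get(...)

# Access the first Shadow DOM (the host lives in the main document)
root1 = driver.find_element(By.TAG_NAME, 'custom-settings')
shadow_root1 = get_shadow_root(root1, driver)

# Access the second Shadow DOM (search inside the shadow root with a CSS selector)
root2 = shadow_root1.find_element(By.CSS_SELECTOR, 'settings-panel')
shadow_root2 = get_shadow_root(root2, driver)

# Access the third Shadow DOM
root3 = shadow_root2.find_element(By.CSS_SELECTOR, 'settings-content')
shadow_root3 = get_shadow_root(root3, driver)

# Access the target element
input_element = shadow_root3.find_element(By.CSS_SELECTOR, '#customInput')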

With this code, you can sequentially explore the hierarchy of the Shadow DOM and access the desired elements.

The key point is that each shadow root is a separate DOM tree, so you need to obtain the Shadow Root of one layer before you can search for the elements of the next layer.
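The step-by-step pattern above generalizes to any depth with a loop. A minimal sketch, assuming open shadow roots and Selenium 4; find_in_nested_shadow is a hypothetical helper name. The string "css selector" is written out because it is the value of Selenium's By.CSS_SELECTOR constant, which lets the sketch run without importing Selenium. Selenium 4 also offers a WebElement.shadow_root property that could replace the execute_script call.

```python
# Selenium's By.CSS_SELECTOR constant is the string "css selector"; it is
# spelled out here so the sketch runs without importing Selenium.
CSS_SELECTOR = "css selector"


def find_in_nested_shadow(driver, host_selectors, target_selector):
    """Descend through nested open shadow roots and return the target element.

    host_selectors lists one CSS selector per shadow host, outermost first.
    Each selector is resolved inside the previous layer's shadow root.
    """
    scope = driver  # the first host is searched for in the main document
    for selector in host_selectors:
        host = scope.find_element(CSS_SELECTOR, selector)
        # Open shadow roots are exposed as element.shadowRoot; closed ones give null.
        root = driver.execute_script("return arguments[0].shadowRoot", host)
        if root is None:
            raise LookupError(f"{selector!r} has no open shadow root")
        scope = root  # the next search happens inside this shadow tree
    return scope.find_element(CSS_SELECTOR, target_selector)
```

For the example structure, the call would be find_in_nested_shadow(driver, ["custom-settings", "settings-panel", "settings-content"], "#customInput"). Raising LookupError when a root is missing makes it clear which layer failed, which helps when a component uses a closed shadow root.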

4. Conclusion: Utilize Shadow DOM Scraping to Obtain Hidden Data Effectively

As web structures become more sophisticated, more pages will place elements inside shadow roots, 'hidden' from ordinary selectors.

Today, we compared the characteristics of and differences between iframes and Shadow DOM, and walked through a simple web scraping example. As development technologies advance, it is essential to keep researching and mastering methods for accessing the data you need.