When an element cannot be crawled: Accessing iframes
When creating a web crawler, you will encounter iframes at least once. Today, let's explore the concept, advantages and disadvantages, and use cases of iframes, as well as how to access iframe content using Selenium and Watir.
Concept of iframe
An iframe, short for "Inline Frame," is an HTML element that embeds another web page inside the current one. The embedded page is loaded as its own document within a frame on the host page, which makes it possible to combine content from different sources on a single page.
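For a crawler, the practical consequence is that the parent page's HTML contains only the iframe tag and its src URL, not the embedded page's markup. The sketch below uses only Python's standard library to list the iframes in a piece of HTML and resolve each src against the page URL. The class name IframeFinder, the sample HTML, and the example.com URLs are illustrative assumptions, not taken from any particular site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeFinder(HTMLParser):
    """Collect the attributes of every <iframe> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.iframes = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value).
        if tag == "iframe":
            self.iframes.append(dict(attrs))


# Hypothetical page URL and markup, used only for illustration.
PAGE_URL = "https://example.com/articles/1"
SAMPLE_HTML = """
<html><body>
  <h1>Parent page</h1>
  <iframe id="map" src="/embed/map.html"></iframe>
</body></html>
"""

finder = IframeFinder()
finder.feed(SAMPLE_HTML)

# Resolve relative src values so each embedded page could be fetched directly.
resolved = [
    urljoin(PAGE_URL, frame["src"])
    for frame in finder.iframes
    if frame.get("src")
]

for url in resolved:
    print(url)  # prints "https://example.com/embed/map.html"
```

Note that the heading inside the sample is part of the parent document, while anything the embedded page renders would have to be fetched from the resolved URL or reached through a browser driver.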
When iframes are used
1) Inserting content from other websites into my website
- Websites use iframes to embed YouTube videos, SlideShare slideshows, Google Maps, advertising content, and similar third-party material. Iframes can also display PDFs inline so that users do not need to download them.
2) Separating areas within a page
- Iframes can be used to separate areas within a website, such as dividing the left side into a menu area and the right side into the main content area, or separating the advertising section at the bottom of the page.
3) Loading dynamic content
- Dynamically loaded content or web apps can be displayed within the current page using iframes. This enables interaction between web pages.
Advantages and disadvantages of iframes
1) Advantages
Convenient for importing external content
- Using iframes, you can easily embed content from other websites into the current page, making it useful for integrating content from different services.
You can embed a map with the short snippet below. (Source: MDN Web Docs)
<iframe
id="inlineFrameExample"
title="Inline Frame Example"
width="300"
height="200"
src="https://www.openstreetmap.org/export/embed.html?bbox=-0.004017949104309083%2C51.47612752641776%2C0.00030577182769775396%2C51.478569861898606&layer=mapnik">
</iframe>
Useful for reusing the same content multiple times
- When you need to use the same content on various pages, using iframes can help avoid duplicate work.
Does not affect the current page
- Content within an iframe runs as its own document, separate from the page that embeds it, so changes to the embedded content generally do not affect the current page.
2) Disadvantages
Can be vulnerable to security risks
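- Embedding external content means loading someone else's page inside yours. If the embedded page is served from the same origin as the host page, or if its source is malicious or compromised, it can expose the page to security risks like XSS (Cross Site Scripting). Cross-origin iframes are isolated by the browser's same-origin policy, and the sandbox attribute can restrict them further.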
Accessibility issues may arise
- Content within iframes is separated from the original document, which can cause accessibility issues. It may inconvenience users who rely on assistive technologies like screen readers.
May have a negative impact on SEO
- In the past, iframes were considered unfavorable for SEO because search engines could not fetch their content. Today, the direct impact of iframes on SEO is hard to determine.
Search engines can detect iframes and recognize them as content from other websites, so iframes do not necessarily help or harm search engine rankings.
However, it is advisable to avoid using iframes on main pages that you want to rank higher in search results. It is better for SEO to compose main pages with unique and useful content rather than external content.
May be difficult to resize
- An iframe does not automatically resize to fit the content it loads. It keeps the width and height set on the element, which can pose challenges for responsive layout design.
User experience may be degraded
- Loading content within an iframe adds extra requests and can slow down or delay the rendering of the page, leading to a degraded user experience.
How to retrieve information from an iframe in Selenium
In Selenium, you can access iframes by following these steps.
1) Accessing the webpage
from selenium import webdriver
from selenium.webdriver.common.by import By
driver = webdriver.Chrome() # Choose a web driver (e.g. Chrome)
driver.get("URL_OF_YOUR_PAGE") # Enter the URL of the web page to access
2) Identifying the iframe element
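You can locate the iframe with the find_element method, using its id, name, or another locator. The switch_to.frame() method used in the next step also accepts the frame's name or id as a string, or its zero-based index, directly.
iframe_element = driver.find_element(By.ID, "iframe_id") # Example: find the iframe by its id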
3) Switching to the iframe
Change the context to the iframe using the switch_to.frame() method.
driver.switch_to.frame(iframe_element) # Switch into the iframe
4) Performing necessary tasks within the iframe
element = driver.find_element(By.CLASS_NAME, "element_class") # Example: find an element inside the iframe by its class
5) Returning to the original context after completing the tasks
driver.switch_to.default_content() # Return to the original context
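Steps 3 to 5 can be wrapped so that the driver always returns to the original context, even when a lookup inside the iframe raises an error. The following is a minimal sketch, not part of Selenium itself: the helper names inside_frame and texts_in_frame are hypothetical, and the code only assumes a driver object that offers find_element, find_elements, and switch_to as in the steps above.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame):
    """Switch into `frame`, then always switch back to the top-level document."""
    driver.switch_to.frame(frame)
    try:
        yield driver
    finally:
        # Runs on normal exit and on exceptions alike.
        driver.switch_to.default_content()


def texts_in_frame(driver, frame_locator, item_locator):
    """Return the text of every element matching item_locator inside the iframe.

    Both locators are (by, value) tuples, e.g. (By.ID, "iframe_id").
    """
    frame = driver.find_element(*frame_locator)
    with inside_frame(driver, frame):
        return [element.text for element in driver.find_elements(*item_locator)]
```

With the driver from step 1, a call might look like texts_in_frame(driver, (By.ID, "iframe_id"), (By.CLASS_NAME, "element_class")). The texts are read while the driver is still inside the iframe, because element references from the iframe cannot be used once the context has switched back.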
How to retrieve information from an iframe in Watir
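In Watir, you can access iframes by following these steps. Watir handles the context switch for you when you locate elements through the iframe object, so there is no explicit switching step.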
1) Accessing the webpage
require 'watir'
browser = Watir::Browser.new :chrome # Choose a browser (e.g. Chrome)
browser.goto 'URL_OF_YOUR_PAGE' # Enter the URL of the web page to access
2) Identifying the iframe element
iframe_element = browser.iframe(id: 'iframe_id') # Example: find the iframe by its id
3) Performing necessary tasks within the iframe
iframe_element.element(xpath: "//button[@class='CLASS_NAME']") # Example: find an element inside the iframe by its class
Conclusion
When creating a web crawler, if an XPath is correct but the element still cannot be found, the element may well be inside an iframe. Elements inside an iframe cannot be reached from the parent page's context, so you need to switch into the iframe first.
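One way to check whether an iframe is the cause is to search the top-level document first and then each iframe in turn. The sketch below is a hedged illustration rather than a Selenium feature: find_in_page_or_frames is a hypothetical helper, it only looks one level deep (iframes nested inside iframes are not searched), and it passes the plain string "tag name", which is the value of Selenium's By.TAG_NAME.

```python
def find_in_page_or_frames(driver, by, value):
    """Look for elements in the top-level document, then in each iframe.

    Returns (frame_index, elements). frame_index is None when the match is in
    the top-level document or when nothing was found. After a match inside an
    iframe, the driver stays switched into that iframe so the returned
    elements remain usable.
    """
    driver.switch_to.default_content()
    found = driver.find_elements(by, value)
    if found:
        return None, found

    frames = driver.find_elements("tag name", "iframe")
    for index, frame in enumerate(frames):
        driver.switch_to.frame(frame)
        found = driver.find_elements(by, value)
        if found:
            return index, found
        # Nothing here: go back to the top before trying the next iframe.
        driver.switch_to.default_content()
    return None, []
```

A call such as find_in_page_or_frames(driver, By.XPATH, "//button[@class='CLASS_NAME']") returning an index instead of None suggests that the element lives inside that iframe. Remember to call driver.switch_to.default_content() when you are done with the returned elements.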
Today, we explored the concept, advantages and disadvantages of iframes, and how to access iframes when creating web crawlers.
The methods above should help you collect data from pages that embed their content in iframes.