I have prepared an example of using Claude to crawl the real-time chart on Kream (https://kream.co.kr), a limited-edition trading platform.
1. What is Claude?
Claude is an AI chatbot service from Anthropic, an artificial intelligence startup founded by former OpenAI employees and heavily backed by Google.
Claude 2 was announced on July 11, 2023, and became available in Korea on October 17, about three months later.
Claude can chat in Korean and accepts up to 5 file attachments of up to 10MB each, including PDF, TXT, and CSV files. It supports up to 100,000 tokens per prompt, far more input than GPT-4 allows (up to 32,768).
2. Analyzing HTML with Claude
Now let's try crawling price change information from the real-time chart on Kream.
Let's copy the HTML of the <ul> tag containing the product list for analysis.
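If you don't want to copy it by hand from the browser's developer tools, here is a minimal sketch that saves the element's outerHTML to a text file you can attach to Claude. The class name trading-items-container is taken from the XPath used later in this post, and the output filename is just illustrative:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://kream.co.kr")
# grab the product-list <ul> and dump its outerHTML to a file for Claude
ul = driver.find_element(By.XPATH, "//ul[@class='trading-items-container']")
with open("kream_chart.html.txt", "w", encoding="utf-8") as f:
    f.write(ul.get_attribute("outerHTML"))
driver.quit()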
In a previous post on building a Coupang crawler bot with ChatGPT, GPT-4's token limit was the bottleneck. Can GPT-4 analyze this HTML?
It seems unable to handle it due to token limits.
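A quick way to check is to count the tokens yourself. Here is a small sketch assuming the tiktoken package (the tokenizer used by OpenAI models) and the HTML file saved earlier:

import tiktoken

# load the tokenizer GPT-4 uses and count the tokens in the saved HTML
enc = tiktoken.encoding_for_model("gpt-4")
with open("kream_chart.html.txt", encoding="utf-8") as f:
    html = f.read()
# anything above 32,768 tokens will not fit in GPT-4's context window
print(len(enc.encode(html)))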
What about Claude?
First, we attach the HTML to Claude as a .txt file for analysis.
The analysis covers not only the product information but also the HTML tag structure and the libraries the page uses, so we can tell from the results which information can be collected.
3. Extracting Web Data with Claude
Now let's ask Claude for the code to extract data.
3.1. Writing the prompt
I entered the prompt as follows.
I want to extract data from a web page with Selenium. Use Python as the language.
From the HTML above, fetch the product image URL, product name, model number, size, price, price change, and price change percentage.
All of the above fields are required, so if any of them is missing, you must return an error message telling me which data could not be found.
Find every HTML element by XPath.
Under these conditions, output the data in JSON format.
3.2. First Response
As seen in the image, code is provided to fetch data using Selenium and xpath.
However, since the latest version of Selenium does not support find_element_by_xpath, I requested the following modification.
`find_element_by_xpath` is not available in my Selenium version.
Please import `By` and modify the code to locate elements with `find_element`.
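For reference, the find_element_by_* helpers were removed in Selenium 4.3, so the two styles look like this:

# old style, removed in Selenium 4.3
element = driver.find_element_by_xpath("//ul/li")

# current style, using the By locator
from selenium.webdriver.common.by import By
element = driver.find_element(By.XPATH, "//ul/li")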
3.3. Second Response
It has been modified to use find_element and By.
Here is the complete code written by Claude.
from selenium import webdriver
from selenium.webdriver.common.by import By
import json

driver = webdriver.Chrome()
driver.get("https://www.example.com")

products = []
# each <li> under the trading list is one product
items = driver.find_elements(By.XPATH, "//ul[@class='trading-items-container']/li")

for item in items:
    try:
        img_url = item.find_element(By.XPATH, ".//picture/img").get_attribute("src")
    except:
        img_url = "Image URL not found"
    try:
        name = item.find_element(By.XPATH, ".//div[contains(@class,'name')]").text
    except:
        name = "Name not found"
    try:
        code = item.find_element(By.XPATH, ".//div[contains(@class,'style-code')]").text
    except:
        code = "Code not found"
    try:
        size = item.find_element(By.XPATH, ".//div[contains(@class,'option')]").text
    except:
        size = "Size not found"
    try:
        price = item.find_element(By.XPATH, ".//div[contains(@class,'current-price')]").text
    except:
        price = "Price not found"
    try:
        change = item.find_element(By.XPATH, ".//div[contains(@class,'change-price')]/span").text
    except:
        change = "Price change not found"
    try:
        change_pct = item.find_element(By.XPATH, ".//div[contains(@class,'change-percent')]").text.split()[0]
    except:
        change_pct = "Change percentage not found"

    # collect the fields for this product, including any "not found" messages
    product = {
        "image": img_url,
        "name": name,
        "code": code,
        "size": size,
        "price": price,
        "price_change": change,
        "price_change_pct": change_pct
    }
    products.append(product)

print(json.dumps(products, indent=2))
driver.quit()
Looking at the code, it satisfies all the requirements of the prompt, from exception handling to output in json format.
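One hedged refinement, if you want the missing-field messages to be more deliberate: catch Selenium's NoSuchElementException instead of a bare except, so that only genuinely missing elements produce the error message. A sketch for one field:

from selenium.common.exceptions import NoSuchElementException

try:
    name = item.find_element(By.XPATH, ".//div[contains(@class,'name')]").text
except NoSuchElementException:
    # report exactly which required field is missing, as the prompt demanded
    name = "Name not found"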
3.4. First Modification
It seems the only thing left to modify here is the Chrome driver setup.
I changed everything from the package imports down to launching the Chrome browser as follows.
import json
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
url = "https://kream.co.kr/trading-chart"
driver = webdriver.Chrome(service=ChromeService(ChromeDriverManager().install()))
driver.get(url)
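As an optional tweak (not part of the code above), the crawler can run without opening a browser window by adding headless Chrome options. A minimal sketch, assuming a recent version of Chrome:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")  # run Chrome without a visible window
driver = webdriver.Chrome(
    service=ChromeService(ChromeDriverManager().install()),
    options=options
)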
3.5. First Execution
Now that everything seems ready, shall we give it a try?
[
  {
    "image": "https://kream-phinf.pstatic.net/MjAyMzA0MDNfMTE1/MDAxNjgwNDkxOTQ4NjQ0.W-mLA596MM6US4gSI6jNsJ9NhCbIeUR5mg0qxQ6p_AUg.Ec1REyClrXQ38dSM40zZhVidrYf0gxBsnc7C1SqK71kg.JPEG/a_e78ed04e3a1948a09a7b26bff5b27634.jpg?type=m",
    "name": "Nike x Peaceminusone LS T-Shirt Black (DR0098-010)",
    "code": "DR0098-010",
    "size": "M",
    "price": "152,000",
    "price_change": "\u25b2",
    "price_change_pct": "+1.3%"
  },
  {
    "image": "https://kream-phinf.pstatic.net/MjAyMTAxMjBfNTAg/MDAxNjExMTIyMzAzMTY2.0QpenidH075iEa1_gVzZTKOxJPiB8W6KyX4YmiELcfIg.ydUN0Aq9T8nWA5frqrsbwnbFfsNMNHDx6ezU5WxGAfQg.PNG/p_28c110281b3f4e109e30a6247d503e2a.png?type=m",
    "name": "Nike Dunk Low Pro Premium Medium Curry",
    "code": "DD1390-100",
    "size": "255",
    "price": "107,000",
    "price_change": "\u25b2",
    "price_change_pct": "+0.9%"
  },
  {
    "image": "https://kream-phinf.pstatic.net/MjAyMTAzMDVfMjQz/MDAxNjE0OTE0NzIzMTQ4.tsuFUJtHGm4g4KE5EDikVMScORptOQqIB7afi1Nz2Qwg.sftz3YQOuw48xpNSJa1tV4uEsz5iU4mjIvpllgHWEn8g.PNG/p_8f7b72adbc924b5bbf7c670d55865e6b.png?type=m",
    "name": "Adidas Samba OG Cloud White",
    "code": "B75806",
    "size": "215",
    "price": "130,000",
    "price_change": "\u25b2",
    "price_change_pct": "+0.8%"
  }
]
<div data-v-4f90d828="" class="change-price up" style="color: rgb(241, 87, 70);">
  <span data-v-4f90d828="" class="arrow">▲</span>
  <span data-v-4f90d828="">27,000</span>
</div>
We have successfully retrieved almost all the information. However, as the HTML above shows, the change-price div contains two spans, and the XPath .//div[contains(@class,'change-price')]/span matches only the first one: we get just the \u25b2 character for the ▲ arrow, and the change amount in the second span is not retrieved.
3.6. Second Modification
Let's modify it to also retrieve the change amount:
try:
    change = item.find_element(By.XPATH, ".//div[contains(@class,'change-price')]").text.replace("\u25bc", "-").replace("\u25b2", "+")
except:
    change = "Price change not found"
3.7. Second Execution
Here are the results after running it again:
[
  {
    "image": "https://kream-phinf.pstatic.net/MjAyMTA3MzBfNTYg/MDAxNjI3NjI3OTg5ODM5.p-9pRcc3cQ_eaTl7UZdCmPJz8Astbjg2N6_HkEhLrHsg.ekjxXxJ-kcnoaZC6gCQH_-KBCUJORPTRC791UZ_C3ukg.PNG/p_e1ef5e002eda49adb7f5d0c8a41f798d.png?type=m",
    "name": "Jordan 1 x Travis Scott x Fragment Retro Low OG SP Military Blue",
    "code": "DM7866-140",
    "size": "260",
    "price": "1,875,000",
    "price_change": "+27,000",
    "price_change_pct": "+1.5%"
  },
  {
    "image": "https://kream-phinf.pstatic.net/MjAyMzA0MDNfMTE1/MDAxNjgwNDkxOTQ4NjQ0.W-mLA596MM6US4gSI6jNsJ9NhCbIeUR5mg0qxQ6p_AUg.Ec1REyClrXQ38dSM40zZhVidrYf0gxBsnc7C1SqK71kg.JPEG/a_e78ed04e3a1948a09a7b26bff5b27634.jpg?type=m",
    "name": "Nike x Peaceminusone LS T-Shirt Black (DR0098-010)",
    "code": "DR0098-010",
    "size": "M",
    "price": "152,000",
    "price_change": "+2,000",
    "price_change_pct": "+1.3%"
  },
  {
    "image": "https://kream-phinf.pstatic.net/MjAyMjEwMTJfMjQ3/MDAxNjY1NTYyODI3ODMw.YeSueVotfq5mgVXlQIVeV-uL_Q2otMk6nkvlbG2rHncg.vMJPWgosd9OjOn-s75jjEnjk44sB1gNqftQSN8CEkTAg.PNG/a_b2325461062d47e2887459a3e6928d88.png?type=m",
    "name": "Adidas Puffylette Core Black",
    "code": "GY4559",
    "size": "265",
    "price": "78,000",
    "price_change": "+1,000",
    "price_change_pct": "+1.3%"
  }
]
Now it seems to properly indicate both the change amount and its direction.
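If you want to keep the results instead of just printing them, a small addition can write the JSON to a file (the filename is illustrative); ensure_ascii=False keeps characters like ▲ readable instead of escaping them:

# save the collected products next to the script
with open("kream_products.json", "w", encoding="utf-8") as f:
    json.dump(products, f, ensure_ascii=False, indent=2)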
4. Conclusion
In this way, we have created a Kream real-time chart crawler using Claude.
With Claude, we were able to analyze long HTML documents, since it allows attaching files of up to 10MB.
While asking ChatGPT to build this crawler would have required trimming the HTML down first, Claude handled the long HTML without any issues.
Although Claude responds more slowly than ChatGPT, there were no shortcomings in its answers.
In conclusion, for creating web crawlers, Claude seems more useful than ChatGPT.