Creating a word cloud with local restaurant data in our neighborhood - Hashscraper workshop

Learn about Python data visualization libraries such as Matplotlib, Seaborn, Plotly, Bokeh, and Altair, then practice by crawling local restaurant data and visualizing it as a word cloud.


1. Data Visualization and Crawling

1.1. What is Data Visualization?

Data visualization is the process of representing complex data or patterns in an intuitive form so that they are easier to understand. The information can be shown as pictures, charts, graphs, and other forms, and an effective visualization helps reveal insights hidden in the data.

1.2. Types of Python Data Visualization Libraries

1.2.1. Matplotlib(https://matplotlib.org/)

Matplotlib is the foundational library for drawing 2D graphs in Python.

It supports a wide range of chart types, including line plots, scatter plots, bar charts, histograms, and pie charts. Its plot styling is highly flexible, and because other visualization libraries such as Seaborn are built on top of it, it is one of the most widely used.
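As a minimal sketch of the basic Matplotlib workflow, the following draws a bar chart from made-up category counts. The category names and counts are illustrative assumptions, not data from this workshop, and the Agg backend is selected only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption so this runs headless
import matplotlib.pyplot as plt

# Hypothetical category counts (not taken from the workshop data).
categories = ["Korean", "Cafe", "Chinese"]
counts = [12, 8, 5]

# Create a figure with one set of axes and draw one bar per category.
fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(categories, counts, color="steelblue")

# Label the chart so it can be read on its own.
ax.set_title("Restaurants per category (sample data)")
ax.set_xlabel("Category")
ax.set_ylabel("Count")
fig.tight_layout()
```

In a notebook you would typically end with plt.show(), or call fig.savefig(...) to write the chart to an image file.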

1.2.2. Seaborn(https://seaborn.pydata.org/)

Seaborn is a higher-level visualization library built on top of Matplotlib.

It is mainly used for statistical data visualization. Seaborn ships with built-in color palettes and themes that give more polished defaults than plain Matplotlib, and it supports charts such as heatmaps, pair plots, and violin plots.

1.2.3. Plotly(https://plotly.com/python/)

Plotly is a library for creating interactive graphs.

It is well suited to interactive plots on the web. Plotly's Python library renders charts with plotly.js, which is built on D3.js, and Plotly libraries are also available for other languages such as R and JavaScript.

1.2.4. Bokeh(https://bokeh.org/)

Bokeh is a Python library for creating interactive visualizations for web browsers.

It renders dynamic data visualizations in the browser, which makes it well suited to web applications and dashboards. Where reports need to be dynamic and responsive, Bokeh can make them more engaging.

1.2.5. Altair(https://altair-viz.github.io/)

Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite.

Altair is designed to let you focus on the data itself rather than on the mechanics of drawing. One of its key features is an API based on the intuitive and consistent grammar of Vega-Lite, which makes it quick to learn. Thanks to this structure, users can build concise, attractive visualizations without complex code.

These five are some of the representative data visualization libraries, and many others are continuously being released and developed. Rather than relying on a single library, it is recommended to combine several so that the data can be analyzed more intuitively.

Next, we introduce the word cloud, the visualization technique used in this workshop's example.

2. What is a Word Cloud?

A word cloud is a visualization technique that draws words at different sizes according to their frequency or importance in text data. The more often a word appears, the larger it is drawn, which makes the result very intuitive. The advantage of a word cloud is that you can see at a glance which words or topics matter most in the text.
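The idea that more frequent words are drawn larger can be sketched with the standard library alone. This is a simplified illustration that scales font size linearly with frequency; it is not the algorithm the wordcloud package uses, and the function name font_sizes, its size limits, and the sample words are hypothetical.

```python
from collections import Counter


def font_sizes(words, min_size=10, max_size=60):
    """Map each word to a font size that grows linearly with its frequency."""
    counts = Counter(words)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for word, n in counts.items():
        # If every word is equally frequent, give all of them the largest size.
        ratio = (n - lo) / span if span else 1.0
        sizes[word] = round(min_size + ratio * (max_size - min_size))
    return sizes


# Hypothetical sample: "cafe" appears three times, "bakery" once.
words = ["cafe", "cafe", "cafe", "korean", "korean", "bakery"]
sizes = font_sizes(words)
print(sizes)
```

The most frequent word receives max_size and the least frequent receives min_size, with everything else interpolated in between.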

2.1. Differences from Other Visualization Libraries

The biggest difference from the visualization libraries mentioned above is that a word cloud is text-based.

The libraries introduced earlier mainly work with structured data (data already organized into rows and columns, such as an Excel sheet). A word cloud instead works with unstructured text, such as comments and posts written on the web: the text is split into words, and the resulting word counts are visualized.
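Splitting free text into words before counting can be sketched as follows. This is a minimal illustration using a regular expression; the tokenize helper and the sample comments are hypothetical. It only splits on non-word characters, and Korean text usually benefits from a morphological analyzer, which is outside the scope of this sketch.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its runs of word characters as tokens."""
    # In Python 3, \w matches Unicode letters, digits, and underscore,
    # so Hangul syllables are kept together as tokens as well.
    return re.findall(r"\w+", text.lower())


# Hypothetical comments standing in for text collected from the web.
comments = [
    "Great coffee, great dessert!",
    "The coffee was okay.",
]

# Flatten all comments into one token list, then count each token.
tokens = [tok for comment in comments for tok in tokenize(comment)]
freq = Counter(tokens)
print(freq.most_common(2))
```

The resulting counts are the kind of input a word cloud turns into word sizes.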

3. Example Practice

3.1. Hashscraper Login

...

Read the data

Read the collected Excel file into a DataFrame using pandas read_excel.

import pandas as pd
my_hometown_shop = pd.read_excel('../blog/네이버 지도 수집.xlsx')
my_hometown_shop
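Before building the word cloud, it can help to confirm that the column you plan to use exists and to see how many values are missing. The sketch below uses a small in-memory DataFrame as a stand-in for the result of read_excel; the variable name sample_shops and its values are made up, and only the column name '카테고리' (category) is taken from the code later in this post.

```python
import pandas as pd

# Stand-in for the DataFrame returned by pd.read_excel(...); the values are made up.
sample_shops = pd.DataFrame({
    "카테고리": ["한식", "카페", None, "한식"],
})

# Check that the column used for the word cloud is present.
has_category = "카테고리" in sample_shops.columns

# Count missing values, then drop them so only text remains.
missing = int(sample_shops["카테고리"].isna().sum())
categories = sample_shops["카테고리"].dropna().astype(str)

print(has_category, missing, categories.tolist())
```

Missing values matter here because joining a column that contains them into a single string raises a TypeError.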

...

Apply Word Cloud

Apply the word cloud using the refined data.

from wordcloud import WordCloud
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm

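# List the Nanum fonts installed on this system to confirm the font file exists.
sys_font = fm.findSystemFonts()
nanum_fonts = [f for f in sys_font if 'Nanum' in f]

# Set the font path directly to the path mentioned earlier (a macOS location).
# A font that contains Korean glyphs is needed so Korean words render correctly.
path = '/Library/Fonts/NanumBarunpenRegular.ttf'

# Join the category values into one space-separated string, skipping missing values.
text = ' '.join(my_hometown_shop['카테고리'].dropna().astype(str))

# Note: contour_width and contour_color only take effect when a mask is supplied.
wordcloud = WordCloud(width=800, height=800,
                      background_color='white',
                      max_words=200,
                      contour_width=3,
                      contour_color='steelblue',
                      font_path=path).generate(text)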

plt.figure(figsize=(10, 10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

...

Creating a word cloud from the collected local restaurant data, as shown above, lets you see at a glance which business categories are most common in the area.

Try applying the same approach to other columns or datasets.
