
The Battle of Tokyo Wards

14 min read · Feb 5, 2021

Planning Food Tours in Tokyo Using K-Means Clustering Algorithm (Part of IBM’s “Battle of Neighborhoods” Capstone Project)

Access my project’s Notebook here.

Izakaya

Introduction: The Business Problem

As the world’s largest and most populous metropolis, Tokyo has a lot to offer in myriad ways. One of the most exciting aspects is the dining experience. Tokyo features a full spectrum of both local and regional Japanese cuisine in addition to all types of international fare. From cheap hole-in-the-wall joints in the alleyways to expensive high-class restaurants in Roppongi Hills, delicious food can be found in every corner of the city to suit virtually every budget.

A fictitious start-up tour company called 2 Rice 1 Sake reached out to me last week about an idea to launch an exciting travel concept organized around “food tours.” The client plans to attract foreign tourists, both young and old, who want to go on a fun food escapade in and around Tokyo. The firm was recently established and has few contacts on the ground. 2 Rice 1 Sake has requested an initial exploratory study on Tokyo’s different wards and their culinary landscape so that it can choose where to focus its food tours in the later stages of the project formulation. Specifically, the client wants to obtain a broad picture of the kinds of restaurants that are popular and most frequented in different wards. The client is not interested in other recreational venues like parks, game centers, or sports facilities.

To fulfill the client’s demands, I leverage Foursquare location data and deploy the K-Means clustering method to group Tokyo’s 23 wards into categories based on their restaurant venue information. K-Means clustering is suitable for this project because it takes unlabeled data and groups them by similar features, which reveals underlying patterns in Tokyo’s restaurant venues. To add depth to the picture and contribute to the client’s planning process, I also use K-Means clustering to analyze a publicly available dataset on celebrity visits to Tokyo, with the aim of grouping districts into categories based on the restaurant venues where prominent celebrities such as Tyler, the Creator and Anthony Bourdain have dined.

Proof that Tyler was there ;)

Data Requirements

List of Tokyo wards including their coordinates (longitude and latitude)

Ward data will be taken from Wikipedia to produce a Pandas DataFrame that shows key details about each of Tokyo’s 23 special wards. The coordinates of each ward will then be obtained using the geocoder class of the Geopy client.

Source: https://en.wikipedia.org/wiki/Special_wards_of_Tokyo#List_of_special_wards

Popular restaurant categories in each ward

The Foursquare API will be used to retrieve the venues located in each ward. Only restaurant venues will be kept and analyzed, given that our client prioritizes food tours above all else. We will then use K-Means clustering to cluster them into different groups for analysis.

Prominent celebrities and locations of their dine-outs in Tokyo

After the culinary features of each ward have been determined, this dataset will be imported from Kaggle and the exploratory analysis will be replicated. The dataset, which contains details about which celebrities visited Tokyo and the locations where they dined during their stay, will be merged with the previous DataFrame. We will extract the celebrities, the restaurants they dined at, and the locations of those restaurants.

Source: https://www.kaggle.com/alnguyen22/celebrities-in-tokyo

Structure of the Exploratory Study

This exploratory study consists of two parts.

Part 1 maps out the types of popular restaurant venues in Tokyo using data on Tokyo wards from Wikipedia, the geocoder class of the Geopy client, and the Foursquare API. After the initial exploratory analysis, I adopt K-Means clustering as a method for grouping types of popular restaurants into their respective categories. The result is a greater understanding of the primary clusters of restaurant venues in Tokyo that are most frequented. I provide observations and show how the clusters would help the client focus on a specific niche when formulating a marketing strategy.

Part 2 replicates the methodology of Part 1 with the dataset on celebrity visits to Tokyo obtained from Kaggle. The result of this secondary analysis complements the findings in Part 1 by providing the client with a greater understanding of the popularity of some Tokyo wards that managed to attract prominent celebrities. I reflect on the findings and show why the client might want to select areas or restaurant venues where celebrities have visited when formulating its marketing strategy.

The study ends with final reflections on the exploratory study and advances a set of recommendations for how the client should plan its food tours in Tokyo.

Methodology

Part 1: Mapping Popular Restaurant Venues in Tokyo

I: Data Importation and Wrangling

In Japan, cities are administratively subdivided into “wards,” which are local entities directly controlled by the municipal government. They handle administrative functions such as registration, health insurance, and property taxation. Tokyo is instead subdivided into “special wards,” which are city-level wards with municipal autonomy largely comparable to other forms of municipalities. For this exploratory study, we focus on Tokyo’s 23 special wards. We begin by creating a function that scrapes the names of the special wards from the table available on Wikipedia and imports them into a DataFrame, which will be cleaned and prepared for exploratory analysis.

II: Obtaining the Longitude and Latitude of Each Ward

Once we have imported the wards into a Pandas DataFrame and cleaned them, we proceed to locate the coordinates of each ward. I use the geocoder class from the Geopy client to extract the coordinates of each special ward in Tokyo. The longitude and latitude values of each ward are appended to the DataFrame we just created. We will use these new columns to plot the locations on the map and to locate nearby venues.
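The geocoding step can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the column names, the "<ward>, Tokyo, Japan" query format, the user agent string, and both function names are assumptions. The lookup is passed in as a callable so the same function works with Geopy's Nominatim geocoder or with a stand-in during testing.

```python
import pandas as pd


def add_coordinates(wards: pd.DataFrame, geocode) -> pd.DataFrame:
    """Return a copy of `wards` with Latitude and Longitude columns appended.

    `geocode` is any callable that takes an address string and returns an
    object with `.latitude` and `.longitude` attributes, or None when the
    address cannot be resolved (the shape returned by geopy's geocoders).
    """
    latitudes, longitudes = [], []
    for name in wards["Ward"]:
        # Assumed query format; the original notebook may phrase it differently.
        location = geocode(f"{name}, Tokyo, Japan")
        latitudes.append(location.latitude if location is not None else None)
        longitudes.append(location.longitude if location is not None else None)
    result = wards.copy()
    result["Latitude"] = latitudes
    result["Longitude"] = longitudes
    return result


def make_nominatim_geocoder(user_agent: str = "tokyo_ward_explorer"):
    """Build a rate-limited Nominatim lookup (needs geopy and network access)."""
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    geolocator = Nominatim(user_agent=user_agent)  # hypothetical agent name
    # Nominatim's public service expects at most about one request per second.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)
```

A call such as add_coordinates(wards_df, make_nominatim_geocoder()) would then perform the live lookups, leaving missing values for any ward the service cannot resolve.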

III: Explore Tokyo Wards

Now that we have the coordinates for each ward, we will generate visualizations of the different Tokyo wards according to their coordinates using Folium, which is a library used for visualizing geospatial data. Next, using the Foursquare API, we will first examine Setagaya Ward as a way of getting a glimpse into the types of venues in Tokyo before going on to obtain nearby venues in all of the city’s wards. Finally, because our client has prioritized restaurants over other venues, I keep only restaurant venues and examine the extent to which they cluster together.

After visualizing the different special wards in Tokyo, we briefly explore the types of venues that are in Setagaya, the most populous of the 23 wards. Known for its reputation as an upscale residential district that features a mix of greenery and funky shopping and nightlife areas, Setagaya is highly popular among students and young adults. We begin by supplying the Foursquare API credentials that allow us to access venue data. Using the same DataFrame, we then select Setagaya and extract the first 100 venues within a radius of 500 meters of the ward’s coordinates.
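As a rough sketch of the venue request described here, the snippet below builds the query parameters and flattens a response into a DataFrame. It assumes the legacy Foursquare v2 "explore" endpoint and its response layout as they existed when this study was written; Foursquare has since changed its API, so treat the URL, the version date, and the JSON structure as assumptions to verify. The function and column names are illustrative.

```python
import pandas as pd

# Legacy v2 endpoint (assumption; check against current Foursquare documentation).
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

VENUE_COLUMNS = ["Venue", "Venue Latitude", "Venue Longitude", "Venue Category"]


def build_explore_params(client_id, client_secret, lat, lng,
                         radius=500, limit=100, version="20180605"):
    """Query parameters for one explore request centered on (lat, lng)."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date; this particular value is assumed
        "ll": f"{lat},{lng}",
        "radius": radius,      # meters
        "limit": limit,        # maximum number of venues returned
    }


def venues_from_response(payload: dict) -> pd.DataFrame:
    """Flatten an explore response into one row per venue."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            categories = venue.get("categories", [])
            rows.append({
                "Venue": venue["name"],
                "Venue Latitude": venue["location"]["lat"],
                "Venue Longitude": venue["location"]["lng"],
                # Use the first listed category, if any.
                "Venue Category": categories[0]["name"] if categories else None,
            })
    return pd.DataFrame(rows, columns=VENUE_COLUMNS)
```

With the requests library, the call would look like requests.get(EXPLORE_URL, params=build_explore_params(...)).json(), whose result is then passed to venues_from_response.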

Shimokitazawa, Setagaya Ward

After obtaining the top 100 venues, we then create a function that extracts the categories of venues in Setagaya. From this initial exploratory analysis of Setagaya, we see that Foursquare provides useful glimpses into the ward’s diverse culinary landscape that help explain its reputation as a popular spot for students and young adults.

After our surface tour of Setagaya, we extend our exploration to all Tokyo wards following the same steps as in the previous section. Again, we create a function that extracts the categories of nearby venues, this time for all Tokyo wards. We also create a new DataFrame that keeps only restaurant venues.

After creating the new DataFrame, we use Matplotlib to visualize the most frequently visited restaurant categories in Tokyo. Our findings so far indicate that Ramen restaurants are the most popular and frequently visited, followed by Chinese restaurants and Japanese restaurants (I assume that these could be Japanese restaurants that do not exclusively offer Ramen but also other non-Ramen dishes).
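A minimal sketch of the filtering and counting step might look like this. It assumes a "Venue Category" column and keeps every category whose name contains the word "Restaurant"; that substring rule is an assumption, and categories named differently would be missed. The sample rows are placeholders, not Foursquare results.

```python
import pandas as pd


def restaurant_category_counts(venues: pd.DataFrame,
                               category_col: str = "Venue Category") -> pd.Series:
    """Count venues per restaurant category, most common first."""
    # Keep rows whose category name mentions "Restaurant" (assumed rule).
    is_restaurant = venues[category_col].str.contains(
        "Restaurant", case=False, na=False
    )
    return venues.loc[is_restaurant, category_col].value_counts()


# Placeholder data for illustration only.
sample_venues = pd.DataFrame({
    "Ward": ["Ward A", "Ward A", "Ward B", "Ward B", "Ward B"],
    "Venue Category": ["Ramen Restaurant", "Park", "Ramen Restaurant",
                       "Chinese Restaurant", "Game Center"],
})
counts = restaurant_category_counts(sample_venues)
```

Calling counts.plot(kind="barh") would then draw the bar chart with Matplotlib.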

IV: Cluster the Wards Using K-Means

K-Means clustering is an unsupervised machine learning algorithm that groups data points sharing similar characteristics into clusters. The user specifies the number of clusters (K) in advance. The algorithm then alternates between assigning each data point to its nearest centroid and recomputing each centroid as the mean of the points assigned to it, repeating until the assignments stabilize. From our exploratory data analysis of all restaurant venues across Tokyo’s 23 special wards, we can potentially spot the formation of some major clusters.
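To make the iteration concrete, here is a small NumPy sketch of the standard K-Means procedure (Lloyd's algorithm). It is for illustration only; the study itself presumably relies on a library implementation, and the function names and the random initialization scheme here are my own choices.

```python
import numpy as np


def nearest_centroid(points: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the closest centroid for every point (Euclidean distance)."""
    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1)


def kmeans(points: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Cluster `points` (n_samples x n_features) into k groups."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = nearest_centroid(points, centroids)
        # Update step: move each centroid to the mean of its members;
        # a cluster that lost all its members keeps its old centroid.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving
        centroids = new_centroids
    return nearest_centroid(points, centroids), centroids
```

Because the starting centroids are random, different seeds can give different groupings on less clearly separated data, which is one reason library implementations run several initializations and keep the best one.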

We can deploy K-Means clustering to find restaurant categories and wards where some of the popular venues are concentrated. This would provide helpful guidance for the client when making decisions about selecting restaurant categories that fit the concept of Tokyo food tours. We first pre-process the data using “One Hot Encoding,” which creates a binary column for each restaurant category. We then group the rows by ward and take the mean of each column, which gives the frequency of each category within a ward. Finally, we create a new DataFrame that displays the top 10 venue categories for each ward.
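The pre-processing described here can be sketched with pandas as follows. The column names ("Ward", "Venue Category"), the helper names, and the sample rows are assumptions made for illustration; the top-N helper also assumes N does not exceed the number of categories.

```python
import pandas as pd


def category_frequencies(venues: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode categories, then average per ward to get frequencies."""
    onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
    onehot.insert(0, "Ward", venues["Ward"].values)
    # The mean of a 0/1 column is the share of the ward's venues in that category.
    return onehot.groupby("Ward").mean()


def top_categories(freq: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """List the n most frequent categories for each ward."""
    rows = {
        ward: row.sort_values(ascending=False).index[:n].tolist()
        for ward, row in freq.iterrows()
    }
    columns = [f"#{i + 1} Most Common Venue" for i in range(n)]
    return pd.DataFrame.from_dict(rows, orient="index", columns=columns)


# Placeholder data for illustration only.
sample = pd.DataFrame({
    "Ward": ["Ward A"] * 3 + ["Ward B"] * 4,
    "Venue Category": ["Ramen Restaurant", "Ramen Restaurant", "Sushi Restaurant",
                       "Chinese Restaurant", "Ramen Restaurant",
                       "Ramen Restaurant", "Ramen Restaurant"],
})
freq = category_frequencies(sample)
top2 = top_categories(freq, n=2)
```

Each row of freq sums to 1, so wards with many venues and wards with few are compared on the same scale.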


After merging this DataFrame with the previously cleaned DataFrame and setting the K-value to 3, which means we want to group our data points (popular restaurant categories) into 3 major clusters, we obtain the three clusters presented in the results that follow.
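For readers who want to reproduce the clustering step, a minimal scikit-learn sketch is shown here. The frequency table is made-up placeholder data, and the function name, the "Cluster Label" column, and the fixed random seed are assumptions rather than details taken from the notebook.

```python
import pandas as pd
from sklearn.cluster import KMeans


def cluster_wards(freq: pd.DataFrame, k: int = 3, seed: int = 0) -> pd.Series:
    """Fit K-Means on per-ward category frequencies and return one label per ward."""
    model = KMeans(n_clusters=k, n_init=10, random_state=seed)
    labels = model.fit_predict(freq.to_numpy())
    return pd.Series(labels, index=freq.index, name="Cluster Label")


# Placeholder frequencies for six hypothetical wards.
freq = pd.DataFrame(
    {
        "Ramen Restaurant":    [0.8, 0.7, 0.1, 0.2, 0.1, 0.0],
        "Japanese Restaurant": [0.1, 0.2, 0.8, 0.7, 0.1, 0.2],
        "Chinese Restaurant":  [0.1, 0.1, 0.1, 0.1, 0.8, 0.8],
    },
    index=pd.Index(["Ward A", "Ward B", "Ward C", "Ward D", "Ward E", "Ward F"],
                   name="Ward"),
)
labels = cluster_wards(freq, k=3)

# Attach the labels to a ward-level table (here just the ward names).
wards = pd.DataFrame({"Ward": freq.index})
merged = wards.join(labels, on="Ward")
```

The label numbers themselves are arbitrary: K-Means may call the ramen-heavy group 0 on one run and 2 on another, so clusters should be named by inspecting their most common categories.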

Part 1 Results and Discussion

My findings after conducting K-Means clustering show that the three main clusters of the most frequented restaurant venues are 1) Ramen restaurants, 2) international restaurants, and 3) Japanese restaurants. Among international restaurants, Chinese restaurants appear to be the most frequented.

Cluster 1: Japanese Restaurants


Cluster 2: Ramen Restaurants


Cluster 3: International Restaurants (Particularly Chinese)


The visualization of the three main clusters indicates that Ramen restaurants (Cluster 1; blue) are spread out across the city and are also concentrated in major entertainment and shopping districts such as Shinjuku and Shibuya. Japanese restaurants (Cluster 2; red) that might offer additional dishes beyond Ramen seem to be located in smaller entertainment districts around the edges of the center that have less foot traffic. Some of these districts are also suburban and residential areas. International restaurants (Cluster 3; light green) are primarily in Itabashi and Bunkyo.


Further analysis of ramen restaurants showed that these venues are mostly concentrated in Ota, followed by Shinjuku, Chiyoda, Toshima, and Shibuya wards.

Since Ramen restaurants form the largest cluster, the client should consider organizing its food tours in Tokyo around them. The client might also be interested in selecting a few Japanese restaurants in the second cluster to provide customers with a wider range of local dining experiences. Alternatively, the client might find it more appealing to base the food tours exclusively on ramen.

Part 2: Mapping Celebrity Visits to Tokyo (and Where They Dined)

I: Data Importation

Tokyo’s flair has without a doubt drawn significant international attention. From global movie premieres to concerts, the city is no stranger to welcoming a host of renowned celebrities. Films that millennials have grown up with, such as Lost in Translation, Kill Bill, The Last Samurai, Resident Evil: Afterlife, and Inception, also feature scenes that were shot in Tokyo. The city has also hosted live performances from chart-topping artists such as ASAP Rocky and Tyga. Given that celebrities have large individual followings, it might be a useful exercise to consider the restaurants where they dined during their visits to Tokyo and the areas in which these venues are primarily concentrated. This would provide the client with additional options when deciding on the scope of its food tours.

Since the dataset on celebrity visits to Tokyo is available as a CSV file that is accessible via Kaggle, we do not need to perform any data wrangling. We begin by importing the CSV file and using Pandas to create a DataFrame. After creating the DataFrame, we keep only restaurant venues, since the dataset also contains non-restaurant venues such as bars and parks.

II: Obtaining the Longitude and Latitude of Each District (Within a Ward)

As in Part 1, we obtain the longitude and latitude values of each location. However, this time we extract the coordinates of each district within a given ward. For example, Shibuya Ward contains well-known commercial and residential districts such as Harajuku, Omotesando, Ebisu, and Sendagaya.

After obtaining the coordinates for each district, we then visualize their locations using Folium. We can already see some clusters forming, particularly in the major entertainment and shopping districts and in the more residential areas around the edges of these centers.

We first pre-process the celebrity visits data using One Hot Encoding and then find the mean value for the frequency of visits to each district. Subsequently, we create a new DataFrame and display the top 3 venues for each district.

For our analysis, we will focus only on the 1st most common celebrity (first column) who visited the venues, given that many of the values for the 2nd to 10th most common celebrity in every cluster were actually 0 (meaning that they never visited these locations). I set the number of most common celebrities to 10 to see what the data would look like if our dataset on celebrity visits were larger than the current one.

III: Explore and Cluster Districts using K-Means

After merging the DataFrame with the previously cleaned DataFrame and setting the K-value to 3, we get the following results:

Part 2 Results and Discussion

Cluster 1: Major Entertainment and Shopping Districts

Cluster 2: Smaller Entertainment Districts

Cluster 3: Smaller Entertainment Districts

When we visualize the three main clusters, we can see that foreign celebrities tend to visit restaurant venues located in the major entertainment and shopping districts such as Shibuya and Shinjuku (Cluster 1; blue). Clusters 2 (red) and 3 (light green) are similar in the sense that districts such as Ueno, Asakusa, and Hatanodai are smaller entertainment districts. The exception to the general pattern is Anthony Bourdain, who dined in these smaller districts. We might speculate that Bourdain preferred these venues because his TV program focused on exploring local cultures and their best-kept secrets.

Asakusa, Taito Ward

Conclusion and Recommendations

This exploratory study sought to provide marketing solutions for 2 Rice 1 Sake, an emerging client in the global tourism industry that plans to launch a novel travel concept built on taking customers on exciting food tours in and around Tokyo, Japan. This study utilized a combination of exploratory data analysis and K-Means clustering methods to analyze data on popular restaurant categories and popular celebrity locations in Tokyo with the aim of gaining insight into the city’s dining-out scene and diverse culinary landscape. The main findings from this study can be summarized in the following points:

  1. Ramen restaurants form the largest cluster among the most visited restaurant types, followed by Japanese and international restaurants respectively. Among the international restaurants, Chinese restaurants are the most popular.
  2. Ramen restaurants are mostly concentrated in Ota, followed by Shinjuku, Chiyoda, Toshima, and Shibuya wards.
  3. Most prominent foreign celebrities tend to visit restaurant venues located in the major entertainment and shopping districts such as Shibuya and Shinjuku. Only a handful visited venues in smaller entertainment districts.

There are some limitations that should be noted in this study. Before discussing the specific limitations intrinsic to the first and second part of the study, it is worthwhile to point out the methodological shortcomings of the overall study. The defined number of clusters (K value) for Part 1 and Part 2 was arbitrarily determined, which might explain why the results returned in Part 2 showed two clusters that came to be interpreted as “Smaller Entertainment Districts” instead of just one cluster of the same values. To avoid this kind of arbitrary selection of K value, this study should have transformed the data to fit a standard normal distribution and deployed the “elbow method,” which runs the K-Means algorithm for a range of possible K-values to find the optimal number of clusters.
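A sketch of how the elbow method could be applied is given here, using scikit-learn on synthetic data. Standardizing each feature to zero mean and unit variance with StandardScaler is my reading of the transformation mentioned above; the function name and the generated blobs are purely illustrative and are not the study's data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def inertia_by_k(features: np.ndarray, k_values, seed: int = 0) -> dict:
    """Within-cluster sum of squares (inertia) for each candidate K."""
    # Rescale every feature to zero mean and unit variance first.
    scaled = StandardScaler().fit_transform(features)
    return {
        k: KMeans(n_clusters=k, n_init=10, random_state=seed).fit(scaled).inertia_
        for k in k_values
    }


# Synthetic data: three well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
features = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

inertias = inertia_by_k(features, range(1, 6))
```

Plotting inertia against K (for example with Matplotlib) and looking for the point where the curve bends gives the candidate number of clusters; on this synthetic data the bend falls at K = 3, since adding a fourth cluster barely reduces the inertia.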

The limitations that are central to Parts 1 and 2 are linked to the comprehensiveness of the data at hand. First, the Foursquare data should be treated with some caution: there may be more Ramen restaurants that have not been mentioned or “checked-in” by Foursquare users, which might lead to an undercounting of these restaurants across Tokyo. Second, Foursquare did not have data on each restaurant’s menu, which limits our understanding of what selection of dishes they have to offer. Third, because we only looked at the most frequently visited types of restaurants, the ratings of each restaurant’s food quality, which could have provided the client with more specific insights, were outside the scope of this study. Similarly, because Part 2 set out to map the locations where the celebrities had dined during their stay in Tokyo, it left out analyses of the menus, prices, and ratings of each restaurant’s food quality.

Notwithstanding these notable limitations, the findings from this exploratory study provide key insights into Tokyo’s culinary landscape that would help the client develop a more comprehensive marketing strategy. To enhance the client’s planning and future decision-making process, the following recommendations should be taken into consideration:

  1. The client should seek out Ramen spots in the following wards: Ota, Shinjuku, Chiyoda, Toshima, and Shibuya.
  2. The client should also consider Japanese restaurants that offer more than Ramen dishes.
  3. The client should diversify its food tour experience in ways that complement the hunt for the most fulfilling ramen by selecting additional venues where prominent celebrities have dined out. This would be added value for the client given that celebrities have a huge following. Going forward, the client should conduct additional research on which celebrities best align with its target niche so that their followers would be attracted to joining the tour.
  4. The client should select venues in both major and lesser entertainment and shopping districts to maximize the flavor and character of the food tour.
