Blocket is a popular Swedish marketplace where individuals and businesses can buy and sell a wide range of products and goods. From a research and analysis point of view, the data from Blocket can provide valuable insights into consumer preferences and industry trends.
In this blog, we will learn how to scrape Blocket to extract mobile device data from the website. This data will help us track the popularity of different mobile brands, identify pricing patterns, and gain a deeper understanding of consumer behavior in the used mobile phone market.
Why Scrape Blocket?
Scraping data from Blocket opens a gateway to a wealth of insights that can be useful for various purposes. Here’s how delving into Blocket’s data through web scraping can benefit retailers:
- Market Intelligence: By scraping Blocket’s data, you gain access to real-world transactions and interactions. This enables you to grasp what products are in demand, what prices they fetch, and the overall pulse of the market.
- Consumer Preferences: The data extracted from Blocket offers a direct window into the choices of consumers. You can uncover what types of products gain traction, which brands are more popular, and the features that attract buyers.
- Pricing Patterns: Understanding how prices fluctuate on Blocket can aid businesses in setting competitive prices. By gauging the relationship between prices and factors like condition, brand, and features, you can make informed pricing decisions.
- Trend Spotting: Blocket’s data holds the power to reveal emerging trends. Whether it’s certain brands gaining momentum or new product categories surfacing, this information can guide your strategic planning.
- Business Strategy: For sellers, scraping Blocket provides a strategic edge by offering insights into successful selling tactics. Learning from other listings’ strengths and weaknesses can inform your own sales approach.
- Competitive Analysis: Scraping helps you monitor your competitors. By observing how others position their products, set prices, and engage with customers, you can adapt and refine your own strategies.
- Data-Driven Decisions: In today’s data-driven world, making decisions without reliable information can be risky. Scraping Blocket equips you with the data needed to back up your choices and minimize uncertainties.
Let’s dive into the scraping process.
The Attributes
To begin the scraping process, we first need to identify the attributes to be extracted. The following attributes are extracted for each mobile phone ad on the website.
- product_url: It is the unique address of the mobile ad on the Blocket website.
- product_name: It specifies the model of the mobile.
- price: It is the selling price of the mobile.
- description: It is a short detail about the device.
- seller_name: It provides the name of the individual or organization selling the device.
Required Libraries
The first step in any scraping process is to import the required libraries. We will be scraping Blocket using Selenium, which is a tool used to automate web browsers. The following libraries are imported for the scraping process.
- Selenium WebDriver is a tool used for web automation. It allows a user to automate web browser actions such as clicking a button, filling in fields, and navigating to different websites.
- ChromeDriverManager is a library that simplifies the process of downloading and installing the Chrome driver, which is required by Selenium to control the Chrome web browser.
- BeautifulSoup is a Python library that is used for parsing and pulling data out of HTML and XML files.
- The lxml library of Python is used for the processing of HTML and XML files. An ElementTree or etree is a module in lxml used to parse XML documents.
- The csv library is used to read and write tabular data in CSV format.
- The time library provides time-related functions. Here we use its sleep() method to pause the program between requests.
Scraping Process
After importing the required libraries, the next step is to initialize a few variables which we will be using later in the program. The first variable is the pagination_url. On examining the Blocket website, we found that there are 40 web pages of used mobile ads and each page consists of 40 products. We need to scrape each web page, and for that purpose we use the pagination_url variable. It is a URL we identified, to which we append the web page number. This forms the complete URL of a web page, from which we then scrape the ad URL of each product.
The scraped product links are relative rather than complete, so they cannot be opened as they are. Each one is appended to a base_url to form a complete and valid URL. The number of web pages, 40, is assigned to a variable named total_no_of_pages. We also initialize an empty list named product_list, in which we will store the link of each product from each of the 40 pages.
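A minimal sketch of this setup is shown below. The variable names follow the description above, but the values are assumptions: base_url is taken to be the site root, the path and query string in pagination_url are placeholders to be replaced with the real listing URL copied from the browser, and pages are assumed to be numbered from 1. The helper build_page_url() is a hypothetical name used only for illustration.

```python
# Site root (assumption); relative ad links are appended to it later.
base_url = "https://www.blocket.se"

# Placeholder listing URL (assumption): replace it with the real used-mobile
# listing URL, cut off just before the page number.
pagination_url = base_url + "/annonser/hela_sverige?page="

# Number of listing pages observed on the site.
total_no_of_pages = 40

# Will hold the (relative) link of every ad found on every page.
product_list = []


def build_page_url(page_number):
    """Return the complete URL of one listing page."""
    return pagination_url + str(page_number)


# Pages are assumed to be numbered 1..total_no_of_pages.
page_urls = [build_page_url(n) for n in range(1, total_no_of_pages + 1)]
```

Keeping the page count in a variable makes it easy to adjust when the number of listing pages on the site changes.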
We need to open the web browser so that Selenium can interact with it and scrape the required details. For this, we create an instance of the Chrome web driver, using ChromeDriverManager to download and install the matching driver, and assign the instance to a variable named driver. Next, we define a function named get_dom() which takes the URL of a web page as input.
In this function, the Chrome web driver first opens the URL and retrieves the page source through the driver.page_source attribute. This contains the HTML of the loaded page and is stored in a variable named page_content. We then create a BeautifulSoup object called product_soup by parsing the page source with the ‘html.parser’ HTML parser, convert it to an ElementTree object using the et.HTML() method, and return the resulting DOM tree. This DOM is a hierarchical representation of the HTML structure of the page.
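The two steps above could be sketched roughly as follows, assuming Selenium 4 and the webdriver_manager package. Two details are choices made for this sketch and not taken from the article: create_driver() is a hypothetical helper name, and get_dom() receives the driver as an explicit argument so that the parsing step can be tried without launching a browser.

```python
from bs4 import BeautifulSoup
from lxml import etree as et


def create_driver():
    """Start Chrome through Selenium.

    The Selenium imports are kept local so that get_dom() below can be
    used on a machine where no browser is installed.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    # ChromeDriverManager downloads a matching driver and returns its path.
    return webdriver.Chrome(service=Service(ChromeDriverManager().install()))


def get_dom(driver, url):
    """Open the URL in the browser and return the page as an lxml DOM tree."""
    driver.get(url)
    page_content = driver.page_source
    product_soup = BeautifulSoup(page_content, "html.parser")
    return et.HTML(str(product_soup))
```

With a real browser, this would be used as `driver = create_driver()` followed by `dom = get_dom(driver, some_url)`, after which `dom.xpath(...)` can be used to query the page.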
Extraction Process
As mentioned earlier, there are a total of 40 web pages with used mobile ads and each page consists of 40 ads. First, we will navigate through each web page and extract the links of all the products. We will be navigating through the web pages using a for loop. During each iteration of the loop, the current page number is concatenated with the pagination_url to create the URL for that page.
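This loop, together with the link collection step explained next, might look like the sketch below. It makes a few assumptions: get_dom is passed in as a callable that takes a URL and returns an object with an xpath() method, pages are numbered from 1, and the XPath expression for ad links is a placeholder that has to be checked against the live page. The function name collect_product_links() is hypothetical.

```python
def collect_product_links(get_dom, pagination_url, total_no_of_pages,
                          link_xpath='//a[contains(@href, "/annons/")]/@href'):
    """Visit every listing page and gather the ad links found on it.

    link_xpath is a placeholder (assumption); inspect the page to find the
    expression that matches the ad links.
    """
    product_list = []
    for page_number in range(1, total_no_of_pages + 1):
        # Build the URL of this listing page.
        page_url = pagination_url + str(page_number)
        page_dom = get_dom(page_url)
        # Links of all the ads on this page.
        page_product_list = page_dom.xpath(link_xpath)
        product_list += page_product_list
    return product_list
```

Passing get_dom as an argument keeps the loop independent of the browser, so it can be exercised with a stub before running it against the site.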
Next, we call the get_dom() function by passing the URL as the parameter, extract the links of all the products on that page, and store them in a list named page_product_list. This list is then added to the list product_list, which we initialized as an empty list in the beginning. When the for loop terminates, product_list holds the links of all the products on all the pages.
Now that we have the links of all the products, we can extract the required details by iterating through this list. During each iteration, a set of functions is called, each of which extracts one particular detail.
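The extraction functions themselves are not reproduced here, so the following is only a sketch of what they might look like. The function names, the XPath expressions, and the "Not available" fallback are all assumptions made for illustration; the real expressions have to be worked out by inspecting an ad page.

```python
NOT_AVAILABLE = "Not available"

# Placeholder XPath expressions (assumptions, not taken from the site).
XPATHS = {
    "product_name": "//h1/text()",
    "price": '//div[contains(@class, "Price")]/text()',
    "description": '//div[contains(@class, "Description")]//text()',
    "seller_name": '//div[contains(@class, "Seller")]//text()',
}


def _clean_texts(dom, xpath):
    """Return the non-empty, stripped text nodes matched by the XPath."""
    return [text.strip() for text in dom.xpath(xpath) if text.strip()]


def _first_text(dom, xpath):
    """Return the first matching text node, or a fallback if none match."""
    matches = _clean_texts(dom, xpath)
    return matches[0] if matches else NOT_AVAILABLE


def get_product_name(dom):
    return _first_text(dom, XPATHS["product_name"])


def get_price(dom):
    return _first_text(dom, XPATHS["price"])


def get_description(dom):
    # A description may span several text nodes, so they are joined.
    matches = _clean_texts(dom, XPATHS["description"])
    return " ".join(matches) if matches else NOT_AVAILABLE


def get_seller_name(dom):
    return _first_text(dom, XPATHS["seller_name"])
```

Returning a fallback value instead of raising an error keeps one incomplete ad from stopping the whole run.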
Writing data to a CSV File
Extracting the data is not enough. We need to store it somewhere so that we can use it for other purposes like analysis. Now we will see how to store the extracted data in a CSV file.
We open a CSV file named ‘blocket_mobile_data.csv’ in write mode and initialize a writer object named theWriter. The column names are collected in a list named heading, which is then written to the CSV file using the writerow() function.
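A small sketch of this step is given below. The names heading and theWriter follow the text; the column names are the five attributes listed earlier, and wrapping the step in a function named start_csv() is a choice made for this sketch only.

```python
import csv

# Column names, in the order the attributes were listed earlier.
heading = ["product_url", "product_name", "price", "description", "seller_name"]


def start_csv(csv_file):
    """Create the writer, write the header row, and return the writer."""
    theWriter = csv.writer(csv_file)
    theWriter.writerow(heading)
    return theWriter
```

In the scraper this would be called inside `with open("blocket_mobile_data.csv", "w", newline="", encoding="utf-8") as csv_file:`, where `newline=""` prevents blank lines between rows on Windows.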
Then we iterate through each element of the list product_list. The list elements are incomplete product links, so each one is concatenated with the base_url to form a complete and valid URL. The get_dom() function is called with this URL as the parameter and the returned DOM is stored in a variable named product_dom. Each attribute of the product is then extracted by calling the extraction functions with product_dom as the parameter, and the extracted attributes are written to the CSV file. After extracting each attribute, we call the sleep() method of the time library, which pauses the program for a few seconds. Spacing out the requests in this way reduces the chance of getting blocked during scraping. Once all the products have been processed, we call driver.quit(), which closes the web browser that was opened by the Selenium web driver.
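The loop described above could be sketched as follows. The function name write_product_rows() and its parameters are hypothetical: get_dom is a callable taking a URL, extractors is a list of functions that each take the DOM and return one value, and the default pause length is an assumption. The sketch pauses once per product page, since that is where a request is made.

```python
import time


def write_product_rows(theWriter, product_list, base_url, get_dom,
                       extractors, pause_seconds=3):
    """Visit each ad, extract its attributes, and write one CSV row per ad."""
    for product_link in product_list:
        # Turn the relative link into a complete URL.
        product_url = base_url + product_link
        product_dom = get_dom(product_url)

        # The URL comes first, followed by one value per extractor.
        row = [product_url]
        for extract in extractors:
            row.append(extract(product_dom))
        theWriter.writerow(row)

        # Pause between product pages to avoid sending requests too quickly.
        time.sleep(pause_seconds)
```

When running against the site, calling driver.quit() in a `finally` block after this function ensures that the browser is closed even if one of the pages raises an error.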
Wrapping up
In a nutshell, scraping data from Blocket’s used mobile phone section arms both buyers and sellers with vital insights. Buyers can make smarter choices, picking the right mobile device, while sellers can better understand demand and pricing.
Ready to supercharge your e-commerce data game? Turn to DataHut’s web scraping services and convert raw data into smart decisions. Contact us today and kickstart your journey towards smarter insights and success.