Pharmacy Web Scraper
This was a short project for a friend doing medical research at UIC. The research team needed a list of all pharmaceutical clinics in Cook County. Using a list from Countyoffice.org of all pharmacies in Cook County, I scraped the information they needed: pharmacy name, pharmacy address, zip code, and phone number. An example output including 4 of the 833 listed pharmacies is shown to the right.
One issue we had when scraping was that, by default, the site displayed only 50 pharmacies. A 'load more' button at the bottom of the webpage could be used to get the next 50. Initially, I tried to use Selenium to automate the process of loading more and scraping, but this proved difficult and not worth the time. We ended up getting a list of all 833 pharmacy URLs from the Chrome site elements instead. Using this list of URLs, I scraped each page for the deliverable information.
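As a rough illustration of that last step, relative links copied out of the page can be turned into absolute URLs and de-duplicated before any scraping happens. This is only a sketch under stated assumptions, not the project's code: `build_urls` is a hypothetical helper name, the `hrefs` values are made-up placeholders, and it uses the standard library's `urljoin` rather than plain string concatenation.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.countyoffice.org"


def build_urls(hrefs, base=BASE_URL):
    """Return absolute URLs for the given hrefs, in order, without duplicates."""
    seen = set()
    urls = []
    for href in hrefs:
        # urljoin resolves a relative path such as "/some-page/" against the base.
        url = urljoin(base, href.strip())
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Placeholder hrefs for illustration only; real values come from the saved page.
hrefs = ["/example-pharmacy-a/", "/example-pharmacy-b/", "/example-pharmacy-a/"]
urls = build_urls(hrefs)
print(len(urls))
```

De-duplicating up front means a link that appears twice in the saved page is only requested once.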
You can find the code for this project below.
pip install requests
pip install beautifulsoup4
from bs4 import BeautifulSoup
from IPython.display import HTML
import requests
import pandas as pd

# Display the saved listing page in the notebook.
page = HTML(filename='/content/allurls.html')

# Parse the saved page that holds the links to every pharmacy.
with open("allurls.html") as fp:
    soup = BeautifulSoup(fp, "html.parser")

pharmacy_urls = soup.find_all("div", class_="listings")
for pharmacy_url in pharmacy_urls:
    links = pharmacy_url.find_all("a")
    for link in links:
        # Build the absolute URL from the relative href.
        url_suffix = link["href"]
        all_links = "https://www.countyoffice.org" + url_suffix

        ## ALL OF THIS FOR EACH INDIVIDUAL URL
        page2 = requests.get(all_links)
        hotsoup = BeautifulSoup(page2.content, "html.parser")
        url_2 = hotsoup.find_all("div", class_="col-md-7")
        for pharm_properties in url_2:
            Name = pharm_properties.find("dd", class_="name")
            Address_Parent = pharm_properties.find("dd", class_="address")
            Address_City = Address_Parent.find("span", class_="addressLocality")
            Address_State = Address_Parent.find("span", class_="addressRegion")
            Address_Street = Address_Parent.find("span", class_="streetAddress")
            #Address = [Address_Street, Address_City, Address_State]
            Zip = Address_Parent.find("span", class_="postalCode")
            Phone = pharm_properties.find("dd", class_="telephone")

            # Print the four deliverable fields for this pharmacy.
            print(Name.text.strip())
            print(Address_Street.text.strip() + " " + Address_City.text.strip() + ", " + Address_State.text.strip())
            print(Zip.text.strip())
            print(Phone.text.strip())
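
The script above prints each field to the console. If the results need to be handed off as a file, one option is to collect each pharmacy as a dictionary and write the rows out as CSV. The following is a minimal sketch of that idea, not part of the original project: `rows_to_csv` and the `FIELDS` column names are hypothetical, and the sample record is made up for illustration.

```python
import csv
import io

# Hypothetical column names matching the four deliverable fields.
FIELDS = ["name", "address", "zip", "phone"]


def rows_to_csv(rows, out):
    """Write a header plus one CSV line per pharmacy dict to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        # Missing fields become empty cells instead of raising an error.
        writer.writerow({field: row.get(field, "") for field in FIELDS})


# Made-up record for illustration only.
rows = [
    {
        "name": "Example Pharmacy",
        "address": "123 Example St Chicago, IL",
        "zip": "60600",
        "phone": "555-0100",
    }
]

buffer = io.StringIO()
rows_to_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead of an in-memory buffer, the same function could be passed a file opened with `open("pharmacies.csv", "w", newline="")`, where the filename is again just a placeholder. The `csv` module quotes the address automatically because it contains a comma.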