Scraping Book Prices

I have many out-of-print books, so I'm planning a project to query AbeBooks.com or other booksellers for current collectible book prices. So far I've found one article on this subject, by Ricardo Avila.

According to Avila, ABE does not have a documented API, so I'm probably looking at web scraping. Does anyone have experience or advice with web scraping, especially for used book prices?

Check the legal stuff first. Some sites prohibit scraping. It's such a contentious area legally that I've never actually gone ahead and done it. (My two cents: get written permission from any site you want to scrape, and check with your lawyer too!)

As for tools, there are free libraries for both Python (for example, BeautifulSoup) and Java (for example, HtmlUnit), among others.
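For instance, a minimal BeautifulSoup sketch might look like the one below. The search URL, its parameters, and the CSS class name are only guesses for illustration; you'd have to inspect the real AbeBooks result pages (and confirm you're allowed to fetch them) before trusting any of it.

```python
# Rough sketch only: the search URL, the "tn" parameter, and the
# "item-price" class are assumptions for illustration, not the
# site's actual page structure.
import requests
from bs4 import BeautifulSoup

def fetch_prices(title):
    # Hypothetical search URL -- check the real site's URL format.
    url = "https://www.abebooks.com/servlet/SearchResults"
    resp = requests.get(url, params={"tn": title}, timeout=30)
    resp.raise_for_status()

    soup = BeautifulSoup(resp.text, "html.parser")
    prices = []
    # Hypothetical class name -- inspect the page to find the real one.
    for tag in soup.find_all("span", class_="item-price"):
        prices.append(tag.get_text(strip=True))
    return prices

if __name__ == "__main__":
    print(fetch_prices("The Moviegoer"))
```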


If a book price has been used, is it harder to scrape off the page? :stuck_out_tongue_winking_eye:

You are really hoping that the pages are generated in a way that allows you to locate the information consistently. If they are, your job is to slurp the page, suck out the good bits, and spit the rest away. Then do that all over again. Tedious, but doable.

Good regex-savvy tools will make this job easier; a rough sketch of the idea is below.
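For example, if all you want is the dollar amounts, even a crude regex pass over the saved HTML can get you started (a sketch assuming US-style prices such as $12.50):

```python
# Crude first pass: pull anything that looks like a US-dollar price
# out of raw HTML saved to disk. Assumes "$12.50"-style price text.
import re

html = open("results.html", encoding="utf-8").read()
prices = re.findall(r"\$\d{1,3}(?:,\d{3})*\.\d{2}", html)
print(prices)
```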


Although I use regex all the time, I don't see a good reason to use it here when there are free, widely used HTML libraries that actually understand HTML structure.
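A quick illustration of why, using made-up HTML: attribute order and extra attributes don't bother a parser, but they easily break a regex keyed to one exact form.

```python
# Two listings with the same meaning but slightly different markup.
import re
from bs4 import BeautifulSoup

html = (
    '<span class="price" data-currency="USD">$42.00</span>'
    '<span data-currency="USD" class="price">$17.50</span>'
)

# A regex keyed to one attribute order misses the second listing...
print(re.findall(r'<span class="price"[^>]*>([^<]+)</span>', html))
# ['$42.00']

# ...while a parser that understands the structure finds both.
soup = BeautifulSoup(html, "html.parser")
print([tag.get_text() for tag in soup.find_all("span", class_="price")])
# ['$42.00', '$17.50']
```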

Assuming you're willing to take the legal risk and you have a Mac, you could also use PDFPenPro, the PDF utility, and not write a lick of code or a regex. You can point PDFPenPro at a URL and it will crawl the entire site (to the depth you specify) and create a PDF of every page.
