
Live-Coding: Master Anti-Ban & Web Scraping Techniques with Scrapoxy

Fabien Vauchelles

from Scrapoxy (France)

About speaker

Fabien Vauchelles is an Anti-Ban Expert. With over a decade of experience in Web Scraping, Fabien's passion for code and technology helps him to bypass protections. He is the creator of Scrapoxy, a mature free and open-source proxy waterfall tailored for the Web Scraping industry.

About speaker's company

I work independently and focus solely on open-source projects. Scrapoxy is freely accessible and open to the entire community.

Abstracts

specific

The session will be a 35-minute live-coding demonstration, followed by a 10-minute Q&A.

You can preview the slides here: https://bit.ly/masteringcfp45

In this presentation, I'll take attendees on an intriguing story:

Meet Isabella, a visionary AI engineer with a head full of dreams. She wants to revolutionise the tourism industry. But there is a catch - she's missing the crucial ingredient for her AI model: data.

We’ll join Isabella on her data quest, tackling protection measures step by step with proxies, headless browsers and deobfuscation. I developed the website https://trekky-reviews.com specifically for this talk, featuring the latest techniques used by anti-bot systems.

The best part? Everyone will walk away with actionable skills to legally gather data using these cutting-edge methods.

Alternatively, I offer a more in-depth 3-hour workshop if that's preferred.

Here’s a sneak peek of the live-coding:

1. Introduction (3 mins)
To kick off the presentation, I engage the audience by asking about their experiences with coding a web scraper. This sets the stage for introducing myself and expressing my enthusiasm for web scraping.

2. Narrative (2 mins)
I share a compelling narrative with the audience: Meet Isabella, a visionary AI engineer with a head full of dreams. To build her product, she needs to collect vital data and bypass protections.

3. Legal (2 mins)
Let's take a proactive approach. Here's a straightforward decision pathway: if the data is public and non-personal, you don't have to agree to any terms and conditions (T&C) to access it, and you're not causing harm (for example, a DDoS-like load), then you're good to go!
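The decision pathway above can be read as a simple checklist where every condition must hold. Here is a minimal sketch of that reading; the function name and parameter names are hypothetical and only restate the four conditions from the outline.

```python
def may_scrape(is_public: bool,
               is_personal: bool,
               requires_accepting_terms: bool,
               causes_harm: bool) -> bool:
    """Return True only when all four conditions of the pathway are met.

    Hypothetical helper: it encodes the checklist as stated in the outline
    (public, non-personal, no T&C to accept, no harm) and nothing more.
    """
    return (
        is_public
        and not is_personal
        and not requires_accepting_terms
        and not causes_harm
    )
```

For example, `may_scrape(True, False, False, False)` passes the checklist, while flipping any single answer fails it.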

4. Website Target Structure (3 mins)
I created a dedicated website for this presentation: https://trekky-reviews.com/. The site comes in several iterations, each fortified with progressively more challenging protections. Throughout the presentation, we'll help Isabella manoeuvre through these defences.

5. Framework Presentation (2 mins)
I introduce a brief overview of the Scrapy framework and how to write a spider.

6. Live Challenge-Solving (21 mins)
Now, let’s dive into live-coding.

We will help Isabella to tackle a series of challenges:
- altering HTTP headers (3 mins)
- using Datacenter and Residential Proxies with Scrapoxy (7 mins)
- leveraging and tuning a headless browser to overcome fingerprinting (4 mins)
- code deobfuscation and crafting of anti-bot payload (7 mins)
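The first two challenges (altering HTTP headers and routing traffic through a proxy) can be sketched with Python's standard library alone. This is an illustration under stated assumptions, not the code shown in the session. The header values, the `localhost:8888` proxy endpoint, and the `USERNAME:PASSWORD` credentials are placeholders to replace with those of your own Scrapoxy project. The helper names are hypothetical too.

```python
import urllib.request

# Assumption: a Scrapoxy instance reachable locally on port 8888, with
# placeholder project credentials. Adjust to your own setup.
SCRAPOXY_PROXY = "http://USERNAME:PASSWORD@localhost:8888"

# Illustrative browser-like headers; the exact values are assumptions.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url, headers=None):
    """Build a request that carries browser-like headers instead of the
    default Python user agent."""
    return urllib.request.Request(url, headers=dict(headers or BROWSER_HEADERS))


def build_opener(proxy_url=SCRAPOXY_PROXY):
    """Build an opener that sends HTTP and HTTPS traffic through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```

To send a request, combine the two: `build_opener().open(build_request("https://trekky-reviews.com/"), timeout=30)`. HTTPS traffic through an intercepting proxy may also need certificate configuration, which this sketch leaves out.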

7. Conclusion (2 mins)
As a wrap-up, I will present upcoming challenges and potential solutions, leaving us with food for thought about the future of web scraping.

The talk was declined
