About speaker
Fabien Vauchelles is an Anti-Ban Expert with over a decade of experience in Web Scraping. His passion for code and technology helps him bypass protections. He is the creator of Scrapoxy, a mature, free, and open-source proxy waterfall tailored for the Web Scraping industry.
About speaker's company
I work independently and focus solely on open-source projects. Scrapoxy is freely accessible and open to the entire community.
This session is a workshop with progressively challenging exercises, lasting 90 to 180 minutes to fit your schedule.
You can preview the workshop here: https://github.com/fabienvauchelles/scraping-workshop
We’ll tackle protection measures step by step with proxies, headless browsers and deobfuscation. I developed the website https://trekky-reviews.com specifically for this workshop, featuring the latest techniques used by anti-bot systems.
The ideal attendance size is 30, but I can easily accommodate between 15 and 60 participants.
The best part? Everyone will walk away with actionable skills to legally gather data using these cutting-edge methods.
Alternatively, I offer a 45-minute live-coding session if that's preferred.
Here’s a sneak peek of the 2-hour workshop:
1. Introduction (4 mins)
To kick off the workshop, I engage the participants by asking about their experiences with bypassing website protection. This sets the stage for introducing myself and expressing my passion for web scraping and reverse-engineering anti-bot measures.
2. Legal (4 mins)
Let's take a proactive approach. Here's a straightforward decision pathway: if the data is public and non-personal, you don't need to agree to any terms and conditions (T&C) to access it, and you're not causing harm (such as a DDoS), then you're good to go!
3. Website Target Structure (4 mins)
I created a dedicated website for this workshop: https://trekky-reviews.com/. It features several iterations, each fortified with progressively more challenging protections. Throughout the workshop, we'll manoeuvre through these defences.
4. Framework Installation and 1st challenge (15 mins)
I will guide participants through the installation of the Scrapy framework and kickstart the first project.
5. Basic Challenge-Solving (15 mins)
Participants will engage in solving 2 challenges:
- Bypass User-Agent filtering
- Add consistent HTTP headers
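Both challenges come down to what the spider sends. Here is a minimal sketch of a Scrapy `settings.py` fragment. The browser version, OS, and language values are assumptions chosen for illustration, not what trekky-reviews.com actually checks. The point is consistency: the `Sec-CH-UA` client hints must tell the same story as the User-Agent, so both are derived from a single `CHROME_MAJOR` constant (a name introduced here).

```python
# settings.py (fragment): a browser-like, internally consistent header set.
# All values are illustrative assumptions (Chrome on Windows).

CHROME_MAJOR = 124  # assumed browser version; change it here and everything follows

# Scrapy's UserAgentMiddleware sends this value as the User-Agent header.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    f"(KHTML, like Gecko) Chrome/{CHROME_MAJOR}.0.0.0 Safari/537.36"
)

# Scrapy's DefaultHeadersMiddleware adds these to every request.
# The client hints agree with the User-Agent: same version, same OS, desktop.
DEFAULT_REQUEST_HEADERS = {
    "Accept": (
        "text/html,application/xhtml+xml,application/xml;q=0.9,"
        "image/avif,image/webp,image/apng,*/*;q=0.8"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Sec-CH-UA": (
        f'"Chromium";v="{CHROME_MAJOR}", '
        f'"Google Chrome";v="{CHROME_MAJOR}", "Not-A.Brand";v="99"'
    ),
    "Sec-CH-UA-Mobile": "?0",
    "Sec-CH-UA-Platform": '"Windows"',
}
```

Because Scrapy reads `USER_AGENT` and `DEFAULT_REQUEST_HEADERS` from the project settings, no spider code needs to change.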
6. Proxies Overview (5 mins)
I explain the different types of proxy: Datacenter, ISP, Residential, and Mobile, outlining their respective advantages and drawbacks.
7. Proxies Challenges (20 mins)
We'll set up Scrapoxy and configure the first connector. Participants will tackle 2 challenges:
- Bypass rate limiting with Datacenter proxies
- Avoid detection with ISP proxies
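Rate limits are often enforced per IP address, so spreading requests across several exit IPs lowers the load each one carries. This sketch assumes a hypothetical pool of three datacenter proxies (addresses from the 203.0.113.0/24 documentation range) and placeholder URLs; it only plans a round-robin assignment and makes no network calls. Scrapoxy performs this kind of rotation behind a single endpoint, so in the workshop you point requests at Scrapoxy instead of managing the pool yourself.

```python
from collections import Counter
from itertools import cycle

# Hypothetical datacenter proxy endpoints (documentation IPs, not real servers).
PROXIES = [
    "http://203.0.113.10:3128",
    "http://203.0.113.11:3128",
    "http://203.0.113.12:3128",
]


def assign_proxies(urls, proxies):
    """Pair each URL with the next proxy in round-robin order."""
    rotation = cycle(proxies)
    return [(url, next(rotation)) for url in urls]


def max_requests_per_ip(assignments):
    """Largest number of requests routed through any single proxy."""
    return max(Counter(proxy for _, proxy in assignments).values())


if __name__ == "__main__":
    urls = [f"https://example.com/page/{i}" for i in range(30)]  # placeholder URLs
    plan = assign_proxies(urls, PROXIES)
    print(max_requests_per_ip(plan))  # 30 requests over 3 proxies -> 10 per IP
```

In Scrapy, each pair maps onto `scrapy.Request(url, meta={"proxy": proxy})`, which the built-in `HttpProxyMiddleware` honors.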
8. Headless Browser Challenge (20 mins)
Participants will install Playwright and tackle a series of challenges, including:
- Executing JavaScript with a headless browser
- Tuning headless browser parameters (like timezone)
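Playwright lets you set such parameters per browser context. In this sketch, `Europe/Paris` and `fr-FR` are arbitrary example values; a common reason to tune them is to keep the browser's reported timezone and language consistent with the proxy's exit location. The `context_options` and `reported_timezone` helpers are hypothetical names, and Playwright is imported inside the function so the option builder can be used without it installed.

```python
def context_options(timezone_id: str = "Europe/Paris", locale: str = "fr-FR") -> dict:
    """Keyword arguments for Playwright's Browser.new_context()."""
    return {"timezone_id": timezone_id, "locale": locale}


def reported_timezone(url: str, options: dict) -> str:
    """Load a page in headless Chromium and return the timezone its JavaScript sees."""
    # Imported lazily so context_options() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            context = browser.new_context(**options)
            page = context.new_page()
            page.goto(url)  # scripts on the page run in a real browser engine
            return page.evaluate("Intl.DateTimeFormat().resolvedOptions().timeZone")
        finally:
            browser.close()


if __name__ == "__main__":
    print(reported_timezone("https://trekky-reviews.com", context_options()))
```

After `pip install playwright` and `playwright install chromium`, the script should print `Europe/Paris`: the `Intl` API now reflects the context setting rather than the host machine's clock.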
9. Code Deobfuscation (10 mins)
I'll introduce techniques for deobfuscating both strings and control flow.
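To make string deobfuscation concrete, here is a deliberately simplified, regex-based illustration of "string array" inlining, a pattern common in obfuscator output: literals are moved into an array and replaced by indexed lookups. The workshop itself uses Babel's AST tooling, which is far more robust; this sketch assumes a single `var` declaration holding double-quoted strings with no `]` inside them, and `inline_string_array` is a hypothetical helper name.

```python
import json
import re

# Assumed input shape: one declaration such as  var _0x1f = ["log", "hello"];
ARRAY_DECL = re.compile(r'var\s+(_0x[0-9a-fA-F]+)\s*=\s*(\[[^\]]*\]);')


def inline_string_array(source: str) -> str:
    """Replace lookups like _0x1f[0] or _0x1f[0x1] with the literal strings."""
    decl = ARRAY_DECL.search(source)
    if decl is None:
        return source  # nothing recognisable to inline
    name, array_literal = decl.group(1), decl.group(2)
    strings = json.loads(array_literal)  # parses as JSON under the assumption above

    # Match the array name (not the tail of a longer identifier) plus an index.
    lookup = re.compile(
        r'(?<![\w$])' + re.escape(name) + r'\[(0x[0-9a-fA-F]+|\d+)\]'
    )
    # Drop the now-unneeded declaration, then substitute every lookup.
    body = source[:decl.start()] + source[decl.end():]
    inlined = lookup.sub(lambda m: json.dumps(strings[int(m.group(1), 0)]), body)
    return inlined.strip()
```

For example, `inline_string_array('var _0x1f = ["log", "hello"];\nconsole[_0x1f[0]](_0x1f[1]);')` returns `'console["log"]("hello");'`. An AST-based pass does the same job without being fooled by strings, comments, or unusual formatting.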
10. Deobfuscation Challenge (20 mins)
After installing Babel.js, participants will start reverse-engineering a protection through deobfuscation. They will then replicate the anti-bot behaviour, including payload encryption.
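Replicating the anti-bot behaviour means porting the deobfuscated JavaScript routine into the scraper. As a purely hypothetical example (the actual scheme used on trekky-reviews.com is not described here), suppose the script serializes a payload with `JSON.stringify`, XORs it with a repeating key, and base64-encodes the result. A Python port could look like this; `encrypt_payload` and `decrypt_payload` are illustrative names.

```python
import base64
import json

# Hypothetical scheme, for illustration only:
#   token = base64(JSON.stringify(payload) XOR repeating key)


def _xor(data: bytes, key: bytes) -> bytes:
    """XOR each byte with the key, repeating the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def encrypt_payload(payload: dict, key: str) -> str:
    # Compact separators mimic JSON.stringify's output (no spaces).
    data = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(_xor(data, key.encode("utf-8"))).decode("ascii")


def decrypt_payload(token: str, key: str) -> dict:
    # Inverse operation, handy for checking tokens captured from the browser.
    data = _xor(base64.b64decode(token), key.encode("utf-8"))
    return json.loads(data.decode("utf-8"))
```

The port has to match byte for byte: Python dicts keep insertion order, as JavaScript objects do for ordinary string keys, but XOR over UTF-8 bytes only matches a `charCodeAt`-based JavaScript loop for ASCII-only payloads. Decrypting a token captured from the real browser is a quick way to confirm the port.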
11. Conclusion (3 mins)
As a wrap-up, I will present upcoming challenges and potential solutions, leaving us with food for thought about the future of protections.
The Program Committee has not yet taken a decision on this talk
Tech Internals Conf is the leading conference for developers of complex and highly loaded systems