About the speaker
Fabien Vauchelles is an Anti-Ban Expert. With over a decade of experience in Web Scraping, Fabien's passion for code and technology helps him to bypass protections. He is the creator of Scrapoxy, a mature free and open-source proxy waterfall tailored for the Web Scraping industry.
About the speaker's company
I work independently and focus solely on open-source projects. Scrapoxy is freely accessible and open to the entire community.
The session will be a 35-minute live-coding demonstration, followed by a 10-minute Q&A.
You can preview the slides here: https://bit.ly/masteringcfp45
In this presentation, I'll take attendees on an intriguing story:
Meet Isabella, a visionary AI engineer with a head full of dreams. She wants to revolutionise the tourism industry. But there is a catch - she's missing the crucial ingredient for her AI model: data.
We’ll join Isabella on her data quest, tackling protection measures step by step with proxies, headless browsers and deobfuscation. I developed the website https://trekky-reviews.com specifically for this talk, featuring the latest techniques used by anti-bot systems.
The best part? Everyone will walk away with actionable skills to legally gather data using these cutting-edge methods.
Alternatively, I offer a more in-depth 3-hour workshop if that's preferred.
Here’s a sneak peek of the live-coding:
1. Introduction (3 mins)
To kick off the presentation, I engage the audience by asking about their experiences with coding a web scraper. This sets the stage for introducing myself and expressing my enthusiasm for web scraping.
2. Narrative (2 mins)
I share a compelling narrative with the audience: Meet Isabella, a visionary AI engineer with a head full of dreams. To build her product, she needs to collect vital data and bypass protections.
3. Legal (2 mins)
Let's take a proactive approach. Here's a straightforward decision pathway: if the data is public and non-personal, you don't have to accept any terms and conditions (T&C) to access it, and your scraping causes no harm (for example, a DDoS-like load on the site), then you're good to go!
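The decision pathway above can be sketched as a small check. This is only an illustrative simplification of the four questions; the `ScrapePlan` type and the `is_good_to_go` function are hypothetical names that do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapePlan:
    """Answers to the four questions of the pathway (hypothetical type)."""
    data_is_public: bool
    data_is_personal: bool
    requires_accepting_terms: bool  # e.g. a login or sign-up that binds you to T&C
    causes_harm: bool               # e.g. a request rate that behaves like a DDoS


def is_good_to_go(plan: ScrapePlan) -> bool:
    """Return True only when every condition of the pathway is satisfied."""
    return (
        plan.data_is_public
        and not plan.data_is_personal
        and not plan.requires_accepting_terms
        and not plan.causes_harm
    )


if __name__ == "__main__":
    plan = ScrapePlan(
        data_is_public=True,
        data_is_personal=False,
        requires_accepting_terms=False,
        causes_harm=False,
    )
    print(is_good_to_go(plan))
```

A single failing condition is enough to stop: the pathway is a conjunction, so the function returns False as soon as one answer goes the wrong way.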
4. Website Target Structure (3 mins)
I created a dedicated website for this presentation: https://trekky-reviews.com/. The site comes in several iterations, each fortified with progressively more challenging protections. Throughout the presentation, we'll help Isabella manoeuvre through these defences.
5. Framework Presentation (2 mins)
I give a brief overview of the Scrapy framework and show how to write a spider.
6. Live Challenge-Solving (21 mins)
Now, let’s dive into live-coding.
We will help Isabella to tackle a series of challenges:
- altering HTTP headers (3 mins)
- using Datacenter and Residential Proxies with Scrapoxy (7 mins)
- leveraging and tuning a headless browser to overcome fingerprinting (4 mins)
- deobfuscating code and crafting an anti-bot payload (7 mins)
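As a rough idea of what the first two challenges involve, here is a minimal sketch using only the Python standard library rather than Scrapy. It sends browser-like headers and routes traffic through a proxy endpoint. The header values, the `localhost:8888` address, the `USERNAME:PASSWORD` credentials and the helper names are all assumptions for illustration; a real Scrapoxy setup and the code shown in the talk may differ.

```python
import urllib.request

# Assumption: header values picked for illustration; a real browser sends more.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

# Assumption: placeholder endpoint and credentials for a locally running proxy.
PROXY_URL = "http://USERNAME:PASSWORD@localhost:8888"


def make_request(url: str) -> urllib.request.Request:
    """Build a request that carries browser-like headers instead of the defaults."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def make_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends HTTP and HTTPS traffic through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    request = make_request("https://trekky-reviews.com/")
    opener = make_proxy_opener(PROXY_URL)
    # opener.open(request, timeout=10) would perform the call through the proxy;
    # it is left out here so the sketch runs without a proxy being available.
    print(request.get_header("User-agent"))
```

Keeping header construction separate from proxy routing mirrors the order of the challenges: headers are adjusted first, and the proxy layer is added once header changes alone are no longer enough.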
7. Conclusion (2 mins)
As a wrap-up, I will present upcoming challenges and potential solutions, leaving us with food for thought about the future of web scraping.
The Program Committee has not yet taken a decision on this talk
Tech Internals Conf is the largest conference for developers of complex and highly loaded systems.