About speaker
Fabien Vauchelles is an Anti-Ban Expert. With over a decade of experience in Web Scraping, Fabien's passion for code and technology helps him to bypass protections. He is the creator of Scrapoxy, a mature free and open-source proxy waterfall tailored for the Web Scraping industry.
About speaker's company
I work independently and focus solely on open-source projects. Scrapoxy is freely accessible and open to the entire community.
The session will be a 35-minute live-coding demonstration, followed by a 10-minute Q&A.
You can preview the slides here: https://bit.ly/masteringcfp45
In this presentation, I'll take attendees through an intriguing story:
Meet Isabella, a visionary AI engineer with a head full of dreams. She wants to revolutionise the tourism industry. But there is a catch - she's missing the crucial ingredient for her AI model: data.
We’ll join Isabella on her data quest, tackling protection measures step by step with proxies, headless browsers and deobfuscation. I developed the website https://trekky-reviews.com specifically for this talk, featuring the latest techniques used by anti-bot systems.
The best part? Everyone will walk away with actionable skills to legally gather data using these cutting-edge methods.
Alternatively, I offer a more in-depth 3-hour workshop if that's preferred.
Here’s a sneak peek of the live-coding:
1. Introduction (3 mins)
To kick off the presentation, I engage the audience by asking about their experiences with coding a web scraper. This sets the stage for introducing myself and expressing my enthusiasm for web scraping.
2. Narrative (2 mins)
I share a compelling narrative with the audience: Meet Isabella, a visionary AI engineer with a head full of dreams. To build her product, she needs to collect vital data and bypass protections.
3. Legal (2 mins)
Let's take a proactive approach. Here's a straightforward decision pathway: if the data is public and non-personal, you don't need to agree to any terms and conditions (T&C) to access it, and you're not causing harm (for example, a DDoS), then you're good to go!
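The decision pathway above can be sketched as a small predicate. This is only an illustration of the four checks named in this step; the class name, field names and function name are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScrapingTarget:
    """Answers to the four questions of the pathway (hypothetical field names)."""

    data_is_public: bool
    data_is_personal: bool
    requires_accepting_terms: bool  # e.g. a T&C checkbox or a login wall
    load_could_cause_harm: bool     # e.g. request rates that resemble a DDoS


def is_good_to_go(target: ScrapingTarget) -> bool:
    """Return True only when every condition of the pathway is satisfied."""
    return (
        target.data_is_public
        and not target.data_is_personal
        and not target.requires_accepting_terms
        and not target.load_could_cause_harm
    )


# Public, non-personal reviews, no T&C to accept, gentle request rate.
public_reviews = ScrapingTarget(
    data_is_public=True,
    data_is_personal=False,
    requires_accepting_terms=False,
    load_could_cause_harm=False,
)

# Same data, but only reachable after accepting terms.
gated_reviews = ScrapingTarget(
    data_is_public=True,
    data_is_personal=False,
    requires_accepting_terms=True,
    load_could_cause_harm=False,
)
```

A single failing condition is enough to stop: `is_good_to_go(gated_reviews)` is false even though the other three checks pass.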
4. Website Target Structure (3 mins)
I created a dedicated website for this presentation: https://trekky-reviews.com/. The site comes in several iterations, each fortified with progressively more challenging protections. Throughout the presentation, we'll help Isabella to manoeuvre through these defences.
5. Framework Presentation (2 mins)
I give a brief overview of the Scrapy framework and show how to write a spider.
6. Live Challenge-Solving (21 mins)
Now, let’s dive into live-coding.
We will help Isabella to tackle a series of challenges:
- altering HTTP headers (3 mins)
- using datacenter and residential proxies with Scrapoxy (7 mins)
- leveraging and tuning a headless browser to overcome fingerprinting (4 mins)
- deobfuscating code and crafting an anti-bot payload (7 mins)
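As a rough idea of what the first two challenges involve, here is a minimal sketch that sets browser-like headers and routes traffic through a proxy. It deliberately uses Python's standard library `urllib` rather than Scrapy so that it runs without extra dependencies; it is not the code shown in the talk. The header values, the `localhost:8888` proxy endpoint and the `USERNAME:PASSWORD` credentials are placeholder assumptions to be replaced with your own Scrapoxy project settings.

```python
import urllib.request

# Assumed browser-like headers; copy real values from your own browser.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

# Assumed proxy endpoint and placeholder credentials (not taken from the talk).
PROXY_URL = "http://USERNAME:PASSWORD@localhost:8888"


def build_request(url, headers=BROWSER_HEADERS):
    """Return a Request carrying browser-like headers instead of urllib's defaults."""
    return urllib.request.Request(url, headers=dict(headers))


def build_proxy_opener(proxy_url=PROXY_URL):
    """Return an opener that sends HTTP and HTTPS traffic through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


request = build_request("https://trekky-reviews.com/")
opener = build_proxy_opener()
# opener.open(request, timeout=10) would actually send the request;
# it is left out here so the sketch runs offline.
```

Keeping the headers and the proxy in separate helpers mirrors the order of the challenges: headers can be tuned first, and the proxy layer added once plain requests start being blocked.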
7. Conclusion (2 mins)
As a wrap-up, I will present upcoming challenges and potential solutions, leaving us with food for thought about the future of web scraping.
The Program Committee has not yet taken a decision on this talk
Tech Internals Conf is the leading conference for developers of complex and highly loaded systems