Benoit Bernard

My thoughts about programming, debugging and technology

← Newer Posts Page 2 of 4 Older Posts →

The Tale of Creating a Distributed Web Crawler

Around 6 million records with about 15 fields each. This was the dataset that I wanted to analyze for a data analysis project of mine. But »

Benoit Bernard on web, crawler, scraper, distributed, scaling, python, politeness 12 September 2017

Web Scraping and Crawling Are Perfectly Legal, Right?

"Come on, I worked so hard on this project! And this is publicly accessible data! There's certainly a way around this, right? Or else, I did »

Benoit Bernard on scraping, crawling, legal, law, lawsuit, tos, harvesting, data 18 April 2017

The Case of the Mysterious Python Crash

It was almost 11PM. My distributed web crawler had been running for a few hours when I discovered a very weird thing. One of its log »

Benoit Bernard on python, crawler, logs, linux, crash, requests, eventlet, signals, timeout 14 March 2017

Using Uber's Pyflame and Logs to Tackle Scaling Issues

Here I was again, looking at my screen in complete disbelief. This time, it was different though. My distributed web crawler seemed to be slowing down »

Benoit Bernard on python, crawler, scaling, performance, profiler, uber, pyflame, logs, mongodb, zeromq, linux 14 February 2017

Tracking Down a Freaky Python Memory Leak (Part 2)

If you read part 1 of this series, you know that my crawler was plagued by several memory leaks. Using umdh, I was able to determine »

Benoit Bernard on memory leak, python, windows, lxml, libxml2, umdh, windbg 17 January 2017
Benoit Bernard © 2023