E222: Gil Elbaz and Nova Spivack of Common Crawl on This Week in Startups



about this episode


Join our new mailing list and be the first to learn about upcoming guests!

Gil Elbaz and Nova Spivack of Common Crawl–a free and open index of over 5 billion Web pages–joined This Week in Startups to talk about web data access and democratization. Gil and Nova helped us tackle topics like search, social networking data and how ordinary users can make use of the crawl.

0:00-2:30 Today on TWiST, Gil Elbaz and Nova Spivack of Common Crawl are here to talk about web search and indexing.
2:30-4:30 How’s everything going over at Factual, Gil?
4:30-5:15 Nova, how are you doing? What are you up to?
5:15-6:00 So you come up with ideas and get funding for them, then move on to the next idea?
6:00-8:00 Thank you to Walker Corporate Law for sponsoring the program. Everyone thank @ScottEdWalker!
8:00-9:45 Let’s talk about Common Crawl–what you’re doing and why it’s important.
9:45-12:00 Why is this a big concern of Google and Yahoo?
12:00-14:00 How much of ThisWeekIn.com do you have stored there?
14:00-16:30 When you shed light on open systems and platforms, that makes them even better, yes?
16:30-19:00 What’s the future looking like for search companies, in terms of building a competitor?
19:00-21:30 Nova: Explain how partners or other people can add to the data set.
21:30-24:00 What effect does the social web have on search?
24:00-26:30 Google Plus is the most successful social network since Facebook and Twitter. Why hasn’t Facebook put out an open, searchable version?
26:30-31:30 Thank you to GoToMeeting for sponsoring the program. Sign up for your 30-day free trial at GoToMeeting.com and use the promo code ‘START.’
31:30-33:30 So why not launch this as a commercial product? Do you see a time that people would be able to pay to access this data cache?
33:30-35:00 Are there any applications that a consumer can go look at right now? (See Tineye.com.)
35:00-36:45 How often are you indexing?
36:45-40:00 Is the future of search providing an answer or is the future of search indexing the web?
40:00-41:00 Question from the chat: Does Live Matrix use Common Crawl?
41:00-43:30 What do you guys think Apple’s chances of entering the search space are?
43:30-46:15 Is Wikipedia so busy maintaining their site that they can’t delve further into search?
46:15-51:15 What are some other applications for the index (Common Crawl)?
51:15-56:45 Do you think, as a society, that we’re going to have access to these tools?
56:45-57:45 What did you see in Klout, Nova, that I didn’t see?
57:45-59:00 Nova: The Web is high school on a global scale.
59:00-1:00:45 Do you see Klout ever becoming similar to a FICO score?
1:00:45-1:01:00 If someone wants to get involved, where do they go?
1:01:00-1:01:45 Thank you to Gil and Nova for joining us today and to Walker Corporate Law and GoToMeeting for their sponsorship. We’ll see you next time!

Support This Week in Startups and independent media by joining the TWiST Producer Program at TwistList.co!

Multilingual? Translate this episode of TWiST into another language and email the transcript to translate@thisweekin.com

Keep up with the latest from our sister company, LAUNCH:

Jason: @jason
Tyler: @steepdecline
Gil: @gilelbaz
Nova: @novaspivack
Common Crawl: @commoncrawl
Walker Corporate Law: @scottedwalker
GoToMeeting: @gotomeeting

Special thanks to the members of the TWiST Backchannel Program!

Executive Producers


Associate Producers

  • Brad Pineau
  • Kat Ganesan
  • Nicholas Christian
  • Mau Frontier
  • Kyle Braatz
  • Serena Ehrlich
  • Nathan Hangen
  • JD
  • Ian Gerstel
  • Julian Hearn
  • Alex Lotoczko
  • James Kennedy
  • Benoit Curdy
  • Asher Nevins
  • Mike Kaltschnee
  • Paul Higgins
  • William Doom
  • David Lee
  • Jake Kerber
  • Sarp Coskun
  • Giuseppe Taibi
  • Tyrone Rubin
  • Keno Vigil
  • Paul Peters
  • Jamal Waring
  • Nick Ostroff
  • Alex Binkley
  • John MP Knox
  • Zon Petilla
  • Bryan McCormick
  • Marcos Trinidad
  • Allen Cordrey
  • Daniel Mich
  • Joshua Rosen
  • Grant Carlile
  • James Smith
  • Christopher Rill
  • Elliot Myhre
  • Nihon Giga
  • Nathan Gielis


  • Ryan Hoover
  • Michael Cranston
  • Josiah Thomas
  • João Fernandes
  • Petrus Theron
  • Michael Wild
  • Dale Emmons
  • Tim de Jardine
  • Alejandro Vasquez
  • Milan Babuskov
  • Chris Rowe
  • Nelson Melo
  • James Dawson
  • Toddy Mladenov
  • Daniel Torres
  • Chris Macke
  • Piotr Zuralski
  • Armand Konan
  • Brian Vogel
  • Paul D
  • Jennifer Sun
  • David Kolb
  • George Gecewicz
  • Sue Marrone
  • Eugene Granovksy
  • Will Blackton
  • Ryan Dodds
  • Brett Arp
  • Jason Cresswell
  • Edwin Orange
  • Daniel Bradley
  • Shawn Daniel
  • Priidu Kull
  • Patrick Desroches
  • Alex Lam
  • Paul Secor
  • Ryan Urabe
  • Madhu R.
  • Paul Ardeleanu
  • Ian Thomas
  • Edwin Orange
  • Sarp Coskun
  • Manny Alarcon
  • Charlie Osmond
  • Christopher Smitley
  • Roshan H.
  • Barcy Cordrey
  • Greg Dickson
  • Brett Arp
  • Hello24 Ltd.
  • Ian Gerstel
  • Taphon Maddison
  • John Bradley
  • Liron Shapira
  • Luigi Armogida
  • Dave Ferrara
  • Janus Lindau
  • Chris Mancil



See more in: 2012, Startups