10 High Order Bits from the Web 2.0 Expo in NY

Thursday, September 25th, 2008

10. Your Web App: Give it a REST

David Heinemeier Hansson’s session about making Ruby on Rails RESTful cast this battle as an epic one between the REST Rebels and the Imperial WS-* Death Star. It’s going to be a tough fight but you know who’s gonna win.

REST (Representational State Transfer) is the elegant architecture and set of conventions first presented in Roy Fielding’s PhD dissertation “Architectural Styles and the Design of Network-based Software Architectures”. It is well aligned with the HTTP protocol and much simpler to implement and use than SOAP, XML-RPC, etc.

Implementing RESTful APIs in web applications is getting really easy with leading frameworks like Rails and Cake supporting REST as a first-class citizen. The Atom format is leading the charge as a RESTful format supported by the big players: Google, Microsoft, Yahoo, Twitter, etc.
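
To make "well aligned with HTTP" concrete, here is a rough sketch of the resource convention Rails popularized: one resource, and each HTTP verb on it maps to one action. The `photos` resource and the `resolve` helper are made up for illustration; this is not code from the talk or from Rails itself.

```python
# Hypothetical "photos" resource: each (HTTP method, path shape) pair maps to one action.
ROUTES = {
    ("GET", "/photos"): "index",        # list all photos
    ("POST", "/photos"): "create",      # add a new photo
    ("GET", "/photos/:id"): "show",     # fetch one photo
    ("PUT", "/photos/:id"): "update",   # replace one photo
    ("DELETE", "/photos/:id"): "destroy",  # remove one photo
}


def resolve(method, path):
    """Return the action name for a request, treating a trailing numeric segment as :id."""
    parts = path.rstrip("/").split("/")
    if parts[-1].isdigit():
        parts[-1] = ":id"
    return ROUTES[(method.upper(), "/".join(parts))]
```

The point of the convention is that the URL names the thing and the verb names what to do with it, so a client that knows HTTP already knows most of the API.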

9. “It’s Not Information Overload, It’s Filter Failure”

Clay Shirky’s talk argues that humans have faced information overload ever since the invention of the printing press. For centuries we have been surrounded by more information than we can consume in an entire lifetime. The problem is not information overload, it’s filter failure. We need better filters.

Jay Adelson of Digg believes building better filters is exactly the mission Digg and other players in the collaborative filter space are addressing.

8. Sensor Driven Data: The Web is Getting Orwellian

With Apple putting GPS in iPhones, Google putting GPS in Android, Nikon putting GPS in the Coolpix P6000, and … you get the point. GPS, motion sensors, video recorders, microphones, and other sensors are increasingly distributed and surrounding us.

Tim O’Reilly believes a BIG revolution is happening here. Tim is really bullish on sensor driven data. Where 2.0 has its own O’Reilly Conference. This space is heating up fast.

7. JavaScript is Bringing Sexy Back to the Browser

John Resig gave a session on processing.js, his visualization engine running atop the HTML5 Canvas. The canvas has really low level functionality, à la OpenGL for 2-D surfaces, but with the right libraries in place it can lead to some truly impressive results. Flickr’s Paul Hammond gave perhaps the most compelling story of the use of JavaScript and Canvas. After building Flickr Stats, using Canvas for graph visualizations, a team member loaded a page on an iPhone. It just flat out worked.

Unfortunately my friends over at Microsoft are slowing down the progress here with no planned support for the HTML5 Canvas in IE8. Google’s excanvas gets around this for IE users by mapping to VML, but excanvas currently only works in quirks mode in IE8. Dammit Microsoft, you’ve brought IE8 a long way towards being a modern, friendly player on the web, why not support Canvas? Come on Oz, come on.

6. The Open Web is Nearing the Tipping Point

DataPortability co-founders Chris Saad and Daniela Barbosa gave a great session on the basic motivations behind the movement. The future the DataPortability group is trying to create, one that allows us to own our data, our contacts, our relationships, etc. and to move them freely and easily between the on-line systems we use, sounds truly empowering. The big players are joining the party: Microsoft, Google, Facebook, Six Apart, LinkedIn, Yahoo, Digg, Plaxo, MySpace. But Chris says “Who cares about them? This is a grassroots effort!”

Joseph Smarr, Chief Architect of Plaxo, gave another interesting session on the major components of the open web and how they fit together. OAuth, OpenID, OpenSocial, and others were covered. The feeling I walked away with is that we’re a lot closer than I thought.

5. Web Scalability thanks to Async & Danga

“You can’t drop something in 40,000 buckets, synchronously, at once”, said Digg’s Lead Architect, Joe Stump, in his session “Scaling Digg and Other Web Applications”. He was referring to what happens when Kevin Rose posts a message on Twitter. (Rose actually has nearly 65,000 followers on Twitter.) Asynchronous task queuing is how the folks at Digg, Twitter, and Flickr deal with work that is really hard to do in real time in any scalable fashion.
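
For a feel of what asynchronous task queuing looks like, here is a minimal sketch using only Python's standard library. It is an assumption-laden stand-in, not how Digg or Twitter do it (they use dedicated queue systems): `post_message`, `delivery_worker`, and the in-memory `inboxes` dict are all hypothetical names. The request path only enqueues a job and returns; a background worker does the slow per-follower writes.

```python
import queue
import threading

# Shared in-process job queue; a real deployment would use a separate queue service.
jobs = queue.Queue()


def post_message(author, text):
    """Request path: enqueue the delivery job and return right away."""
    jobs.put((author, text))
    return "accepted"


def delivery_worker(followers, inboxes):
    """Background path: drain jobs and copy each message into every follower's inbox."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel value: shut the worker down
            jobs.task_done()
            break
        author, text = job
        for follower in followers.get(author, []):
            inboxes.setdefault(follower, []).append(text)
        jobs.task_done()
```

To try it, start `delivery_worker` in a `threading.Thread`, call `post_message`, and use `jobs.join()` to wait until the queue has drained. The poster never waits on the fan-out itself.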

Just about all of Brad Fitzpatrick’s (of LiveJournal and OpenID fame) lightweight systems software, freely available at Danga.com, seems to be used by the biggest Web 2.0 players to achieve scale. That memcached, gearman, perlbal, djabberd, and mogilefs, all came out of Fitzpatrick and Danga is just incredible. No wonder Google gobbled him up from Six Apart.

4. Web 2.0 Traffic: It’s Out-of-Band

The knowledge tidbit that stuck out more in my mind than any other was that Twitter gets 10 times as much traffic through its API as it does through its website. It makes sense, I’d just never acknowledged it explicitly. Dion Hinchcliffe’s workshop painted a similar story for many other Web 2.0 successes. The canonical example is YouTube with the embedded video. The decision to put HTML snippets plainly visible, right beside the video, was perhaps their most genius move. Modern web applications and services are making themselves relevant by opening as many channels of distribution as possible through feeds, widgets, badges, and programmable APIs.

3. Cal Henderson’s PHP Tent Revival

If not for Cal Henderson I might never have touched PHP again. I’m probably going to come back to this topic in more depth in a future post, but Cal’s workshop “Scalable Web Architectures: Common Patterns and Approaches” renewed my interest in, relationship with, and respect for PHP. The funny thing is that wasn’t even the point of the talk. The succinct point made by Cal and by Joe Stump of Digg, that Languages Don’t Scale, is right on. Sure, PHP isn’t as beautiful, trendy, or well designed as Python or Ruby. However, some of the design decisions made by PHP’s Rasmus, specifically the ‘shared nothing’ness, make it a great technology for web applications. There’s a reason why Facebook, Digg, Flickr, and co. are still on it.

After Cal’s workshop I asked him: if you could do it all over again with Flickr would you choose to go with Python or Ruby? Cal’s answer: Nope, I’d do it in PHP.

2. Set Your Baby Free

By grooming and nurturing a web app internally for an extended period of time you lose a lot of value. Jason Fried’s notion of “half a product is better than a half-assed product” is so fitting here. Sandy Jen of Meebo echoed similar notions in her talk: start out with something simple, see if it works, evolve. Bring your customers into the feedback loop as quickly as possible. Joshua Schachter, founder of delicious, expressed the exact same sentiments in his talk on “Scaling and Building Social Systems”.

1. Want to Set the World on Fire? YOU Better Bring the Fire.

If you are not bringing the heat, get out of the kitchen. Passion was the common thread amongst the most inspiring talks I saw at the conference. Between Gary Vaynerchuk, Jason Fried, and Arianna Huffington the message was consistent: be passionate. I’m going to let Gary roll this one out with his amazingly energetic keynote on building personal brand…

Jay Adelson – Organizing Chaos: The Growth of Collaborative Filters

Friday, September 19th, 2008

[Live from Web 2.0 Expo 9/16 - 9/19 Follow along the other Expo Talks in RSS.]

Jay Adelson is CEO of Digg, guiding all aspects of the company’s development, growth and management. Under his leadership, Digg has grown to 26 million visitors per month, and is now considered one of the top socially focused Web sites.

Why do collaborative filters matter? How many of you have used Google? How many of you have used Digg? Any time you take the interests of a group and use them to filter and create relevance for an audience, that is collaborative filtering. Even search is, in a sense, collaborative filtering: just think about BackRub and PageRank, or clicks on a search result. This has evolved.

So what’s changed? Now you’re on the web 24 hours a day. In 2003 Berkeley said there were about 2.3 million sites added every day. Now there’s about a terabyte a day added to the net. This data is dynamic. Privacy, and our sense of privacy, has also changed. The younger generation doesn’t have the same issues with privacy that we and our parents have. My away message on AIM says “I’m at lunch”, whereas my teenage babysitter’s will say “I’m feeling down” or “I’m full”. We are moving from a seek culture to a connecting culture.

Let’s break social filtering down into three parts:

1) something like a Digg or a Zeitgeist is the same for everyone.

2) social networks where I create a subset of groups with just my friends. I can’t use my friends as a judging factor for what might be interesting to me.

3) The exciting thing, the point I can leave you with today, is the hyper-personalization opportunity. Instead of looking at a social network, look at everyone, pair you with people like you, and use that collective wisdom to surface things that are more specifically interesting to you. Since your personal data is going to move from one website to the next, you have to think about how you can take that information and deliver experiences specific to individual users. Collaborative filters are the key to the monetization of Web 2.0 applications.
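
The "pair you with people like you" idea in the third part can be sketched in a few lines. This is a toy illustration under my own assumptions, not Digg's recommendation engine: users are compared by the overlap of the stories they liked (Jaccard similarity), and stories liked by similar users are ranked by that similarity. The function names and the shape of the `likes` dict are hypothetical.

```python
def jaccard(a, b):
    """Overlap between two sets: size of the intersection over size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, likes):
    """Rank stories the target has not seen by the similarity of the users who liked them."""
    mine = likes[target]
    scores = {}
    for user, theirs in likes.items():
        if user == target:
            continue
        weight = jaccard(mine, theirs)
        for item in theirs - mine:
            scores[item] = scores.get(item, 0.0) + weight
    # Drop stories backed only by users with no overlap, then sort best first.
    ranked = [item for item, score in scores.items() if score > 0]
    return sorted(ranked, key=lambda item: (-scores[item], item))
```

Note that nothing here depends on who your friends are: the comparison runs across every user, which is the difference between part 2 and part 3 above.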

Joe Stump – Scaling Digg and Other Web Applications

Thursday, September 18th, 2008

Joe Stump is currently the Lead Architect for Digg where he spends his time partitioning data, creating internal services, and ensuring the code frameworks are in working order.

Digg by the numbers: 30,000,000 Ron Paul fans. 13,000 requests a second, bunches of servers.

“Web 2.0 sucks (for scaling).” Web 1.0 was easy: it was a land rush of just getting content on-line.

With Web 2.0 somebody had the bright idea that we would turn content over to the users. The problem is people like creating a lot of shit. Web 1.0 was easy to scale because I only needed to worry about a couple hundred thousand records. Now we’ve got a lot more to worry about. Another thing I hate is AJAX, which makes interacting with websites really easy. It gives users the ability to create shit even faster.

Making your PHP code 300% faster doesn’t matter, it’s not where your bottlenecks are. “PHP Doesn’t Scale” – Cal Henderson. PHP doesn’t scale, Java doesn’t scale, Ruby doesn’t scale – languages don’t scale. When you’re worrying about scale and storing 4 billion kitten photos: how you program it probably doesn’t matter.

What’s scaling? Scaling is specialization. As you get bigger and as you grow, the solutions being sold to you by vendors won’t cut it. You have to cut your database into different pieces and make it very specialized and specific to your needs. We’re going to talk about some of the techniques we use at Digg. Scaling is also about severe hair loss. I’m not joking. I’m going bald. It’s tough. It’s not easy. You can’t do it alone.

Often people confuse scaling out with scaling up. You get to a point where you can’t scale up anymore. At some point you can’t just buy more expensive machines. Everyone is scaling out right now with lots of crappy boxes. We expect to fail.

Your mom lied; don’t share. Decentralize, expect failures and just add boxes. Amazon is one of the best at this.

CAP Theorem says you can only pick two of the following three: strong Consistency, high Availability, Partition tolerance.

What are my options? Denormalize, eventually consistent, parallel, asynchronous, specialize.

Denormalization is necessary in partitioned solutions and it’s becoming a huge problem for Digg. If you’re not using queues and messaging systems you’re going to want to look into gearman and djabberd. You wonder why things are going slow and you realize you’re doing 5 synchronous trips to the database. You’ve got to make these calls async with either HTTP calls or gearman. One thing Digg is big on is running the numbers before you try to fix a problem. Run the numbers to make sure things actually will work. We’ll discuss a case of this.
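
Here is a small sketch of the "5 synchronous trips" problem. It assumes nothing about Digg's stack: `fetch` is a hypothetical stand-in that sleeps to simulate a database round trip, the query names are invented, and a thread pool stands in for whatever async mechanism (HTTP calls, gearman) you would really use.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical lookups needed to render one page.
QUERIES = ["user", "friends", "stories", "comments", "badges"]


def fetch(name, delay=0.05):
    """Stand-in for one database round trip that takes `delay` seconds."""
    time.sleep(delay)
    return f"{name}-data"


def load_sequential():
    """Five trips, one after another: total time is roughly the sum of the delays."""
    return [fetch(q) for q in QUERIES]


def load_parallel():
    """Issue all five trips at once: total time is roughly the slowest single trip."""
    with ThreadPoolExecutor(max_workers=len(QUERIES)) as pool:
        return list(pool.map(fetch, QUERIES))
```

Both functions return the same data in the same order; only the waiting changes. Timing the two with `time.perf_counter()` is a quick way of running the numbers before committing to a fix.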

The tools: Memcached; MogileFS (“OMG Files!”), which Digg uses for icons and photos; Gearman, which is a massively distributed fork; and the new favorite toy, MemcacheDB: “Will be the biggest new kid on the block in scaling.” Initial tests on a laptop yielded 15,000 writes a second. The developer behind it took Berkeley DB and Memcache and brought them together.

Caching techniques: cache forever and explicitly expire, and have a chain of responsibility. We had a generic expiration time on all objects at Digg. The problem is we have a lot of users, and a lot of those users are inactive. The Chain-of-Responsibility pattern creates a chain: MySQL, memcache, APC, PHP globals. You first hit globals; if it has the value you get it straight back, if not you go to the next link in the chain, and so on. This is used at Facebook and Digg. If you’re caching fairly static content you can get away with a file-based cache; if it’s something requested a bunch, go with memcache; if it’s something like a topic in Digg, we use APC.
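
The cache chain described above can be sketched roughly like this. It is a toy under stated assumptions, not Digg's code: plain dicts stand in for PHP globals, APC, and memcache, a `source` callable stands in for MySQL, and the class and method names are my own. Entries never time out; they stay until `expire` is called, matching the "cache forever and explicitly expire" approach.

```python
class CacheChain:
    """Look a key up in each cache layer, fastest first, falling back to the source."""

    def __init__(self, layers, source):
        self.layers = layers  # list of dict-like caches, fastest first
        self.source = source  # callable that loads a value from the database

    def get(self, key):
        for depth, layer in enumerate(self.layers):
            if key in layer:
                value = layer[key]
                self._backfill(key, value, depth)
                return value
        # Every layer missed: hit the source of truth and fill all layers.
        value = self.source(key)
        self._backfill(key, value, len(self.layers))
        return value

    def _backfill(self, key, value, depth):
        """Copy the value into every layer that sits in front of where it was found."""
        for layer in self.layers[:depth]:
            layer[key] = value

    def expire(self, key):
        """Explicit expiry: remove the key from every layer when the data changes."""
        for layer in self.layers:
            layer.pop(key, None)
```

For example, `CacheChain([request_globals, local_cache, shared_cache], load_from_db)` only calls `load_from_db` on the first `get` for a key, and again after someone calls `expire` for it.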

Partition your data horizontally (rows a-f on one machine) and vertically (some columns in one table, some in another table). Partition horizontally when you have so much data you need to spread it across a lot of servers. Vertical partitioning: instead of altering a table, add a new table and put the new columns in it; this avoids downtime. Abstract your data access so that the partitioning details are hidden from the user.
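
A minimal sketch of horizontal partitioning behind an abstraction, assuming a made-up split of usernames by first letter across four hosts. The shard table, the `db-default` catch-all, and the `UserStore` class are all hypothetical; real partitioning schemes are usually more involved.

```python
# Hypothetical shard map: (first letter low, first letter high, host name).
SHARDS = [("a", "f", "db1"), ("g", "m", "db2"), ("n", "s", "db3"), ("t", "z", "db4")]


def shard_for(username):
    """Pick the host that holds a user's rows, based on the first letter of the name."""
    first = username[0].lower()
    for low, high, host in SHARDS:
        if low <= first <= high:
            return host
    return "db-default"  # hypothetical catch-all for names starting with digits or symbols


class UserStore:
    """Data access layer: callers save and load users without knowing about shards."""

    def __init__(self):
        # One dict per host stands in for one database server per shard.
        self.hosts = {}

    def save(self, username, record):
        self.hosts.setdefault(shard_for(username), {})[username] = record

    def load(self, username):
        return self.hosts.get(shard_for(username), {}).get(username)
```

Because callers only ever touch `save` and `load`, the shard map can change later without touching application code, which is the point of abstracting data access.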

Green badges at Digg are the bane of Joe’s existence. It is a problem Twitter and Digg share: you take a message from one place and drop it in a bunch of other buckets. Kevin Rose has 40,000 followers. You can’t drop something into 40,000 buckets synchronously. There are 300,000 to 320,000 diggs a day. If the average person has 100 followers that’s 300,000,000 Diggs a day. The most active Diggers are the most followed Diggers. The idea of averages skews way out. “Not going to be 300 queries per second, 3,000 queries per second. 7gb of storage per day. 5tb of data across 50 to 60 servers so MySQL wasn’t going to work for us. That’s where memcachedb comes in.” The recommendation engine is a custom graph database from the R&D department and is eventually consistent. It is an example of the problems you run into at really big scale on a social website.
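
Why do averages skew way out when the most active users are also the most followed? A toy calculation shows the effect. The population below is entirely invented (nine casual users and one power user) and is only meant to illustrate the shape of the problem, not Digg's real numbers.

```python
# Hypothetical population: (diggs per day, followers) for each user.
# Nine casual users plus one power user who is both very active and heavily followed.
USERS = [(10, 10)] * 9 + [(110, 910)]


def naive_fan_out(users):
    """Estimate deliveries as total diggs times the average follower count."""
    total_diggs = sum(diggs for diggs, _ in users)
    mean_followers = sum(followers for _, followers in users) / len(users)
    return total_diggs * mean_followers


def actual_fan_out(users):
    """Count deliveries user by user: each digg goes to that user's own followers."""
    return sum(diggs * followers for diggs, followers in users)
```

With this made-up population the average follower count is exactly 100, yet `actual_fan_out(USERS)` comes out about five times higher than `naive_fan_out(USERS)`, because most of the diggs come from the account with the most followers.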

[ Follow the Feed for notes on talks from other web leaders & innovators at the Web 2.0 Expo in New York going on this week. ]