Engineering @ Facebook's Notes
Facebook's Scribe technology now open source
Here at Facebook, we're constantly facing scaling challanges because of our enormous growth. One particular problem we encountered a couple of years ago was collection of data from our servers. We were collecting a few billion messages a day (which seemed like a lot at the time) for everything from access logs to performance statistics to actions that went to News Feed. We used a variety of different technologies for the different use cases, and all of them were bursting at the seams. We decided to build a unified system (called Scribe) to handle all of these cases, and do it in a way that would scale with Facebook's growth. The system we built turned out to be enormously useful, handling over 100 use cases and tens of billions of messages a day. It has also been battle tested by just about anything that can go wrong, so I encourage you to take a look at the newly opened Scribe source and see if it might be useful for you. To give the code some context, I'm going to go through the major design decisions we made to allow the system to scale.
The first decision we made was to not lock ourselves into a particular network topology. The Scribe servers are arranged in a directed graph, but each server only knows about the next server in the graph. This flexible topology allows for things like adding an extra layer of fan-in if the system grows too large, and batching messages before sending them between datacenters, but without having any code that explicitly needs to understand datacenter topology, only a simple configuration.
The second major design decision was about reliability. We chose was a middle ground here, reliable enough that we can expect to get all of the data almost all of the time, but not reliable enough to require heavyweight protocols and disk usage. More specifically, Scribe spools data to disk on any node to handle intermittent connectivity node failure, but it doesn't sync a log file for every message, so there's a possibility of a small amount of data loss in the event of a crash or catastrophic hardware failure. Basically, this is more reliability than you get with most logging systems, but not something you should use for database transactions. As it turned out, this is a reasonable level of reliability for a lot of use cases, and has made scaling much easier. It's also the source of a lot of the hard-learned lessons: getting the system to catch up seamlessly after a significant network problem is tricky, especially when there are tens or hundreds of gigabytes of data backed up.
The final design decision was about the data model. When you're building something that looks like a logging system there are a lot of things people expect: logging levels and rules about when they get sent, timestamping and ordering of messages, schemas for common messages, etc. We decided that this was a can of worms that shouldn't be mixed up with the asynchronous and mostly reliable delivery of data, so we made the data model very simple. A message is two strings: a category and the actual message. The category is the description of what the message is about, and the expectation is that messages of the same category end up in the same place. The message is the actual data to be logged. We also don't have any a priori list of categories that must be maintained. If you create a new category it shows up at a new file. This is following the Unix philosophy of doing exactly one thing and doing it well, and it has definitely paid off in ease of use and development. We started with four or five use cases in mind and now we have hundreds, but we didn't have to modify the Scribe source for any of them.
Another choice we made early on was to build Scribe using Thrift. This sped up development enormously because a lot of the hard parts were already taken care of, and it also made the resulting system much more flexible. We currently log messages to Scribe from PHP, python, C++, and Java code, and the list of possible languages is growing all the time from the contributions of developers around the world. So Scribe has already benefitted enormously from Thrift being open, and it will be even better having Scribe open too. I hope you find it as useful as we do.
The first decision we made was to not lock ourselves into a particular network topology. The Scribe servers are arranged in a directed graph, but each server only knows about the next server in the graph. This flexible topology allows for things like adding an extra layer of fan-in if the system grows too large, and batching messages before sending them between datacenters, but without having any code that explicitly needs to understand datacenter topology, only a simple configuration.
The second major design decision was about reliability. We chose was a middle ground here, reliable enough that we can expect to get all of the data almost all of the time, but not reliable enough to require heavyweight protocols and disk usage. More specifically, Scribe spools data to disk on any node to handle intermittent connectivity node failure, but it doesn't sync a log file for every message, so there's a possibility of a small amount of data loss in the event of a crash or catastrophic hardware failure. Basically, this is more reliability than you get with most logging systems, but not something you should use for database transactions. As it turned out, this is a reasonable level of reliability for a lot of use cases, and has made scaling much easier. It's also the source of a lot of the hard-learned lessons: getting the system to catch up seamlessly after a significant network problem is tricky, especially when there are tens or hundreds of gigabytes of data backed up.
The final design decision was about the data model. When you're building something that looks like a logging system there are a lot of things people expect: logging levels and rules about when they get sent, timestamping and ordering of messages, schemas for common messages, etc. We decided that this was a can of worms that shouldn't be mixed up with the asynchronous and mostly reliable delivery of data, so we made the data model very simple. A message is two strings: a category and the actual message. The category is the description of what the message is about, and the expectation is that messages of the same category end up in the same place. The message is the actual data to be logged. We also don't have any a priori list of categories that must be maintained. If you create a new category it shows up at a new file. This is following the Unix philosophy of doing exactly one thing and doing it well, and it has definitely paid off in ease of use and development. We started with four or five use cases in mind and now we have hundreds, but we didn't have to modify the Scribe source for any of them.
Another choice we made early on was to build Scribe using Thrift. This sped up development enormously because a lot of the hard parts were already taken care of, and it also made the resulting system much more flexible. We currently log messages to Scribe from PHP, python, C++, and Java code, and the list of possible languages is growing all the time from the contributions of developers around the world. So Scribe has already benefitted enormously from Thrift being open, and it will be even better having Scribe open too. I hope you find it as useful as we do.
Written about 2 months ago


Thank you for your paitence and help,
Diane Cheek
crazyladydi57@aol.com
Scribe is software to help people make more software like facebook. My advice - learn how to use facebook by using facebook. By that I mean - ask your facebook friends, they can help!
Now give me a sewing machine and I can tell you all of it's functions and how it processes.
I wonder if there is one of the books called " How to operate facebook or my space for DUMMIES" ??
http://www.facebook.com/pa
@Michael:
I'm glad to try the FB hosting.
I'm always looking for good examples of appropriate design instead of acronym driven design, and this is a great example. Thanks for sharing!
Here's a pretty good run down of all the different things you can do on Facebook and how to use them:
http://www.facebook.com/he
Good luck and have fun!
http://sourceforge.net/pro
to use on my enterprise applications I develop, however it appears there are not releases available? When can we expect releases (even alpha or beta) available?
http://scribeserver.svn.so
http://developers.facebook
I'm interested in hearing about your experience with Scribe/fb303/Ruby.
Cheers
Chandranshu
Did you disable the "real" access logs and instead setup logging at the application level? Or did you write a small Apache (or whatever) mod to do this? I'm trying to imagine how it's setup, because I assume this is your full replacement for what someone might do with remote syslogging - right?
But Scribe logs much much more data than just Apache logs. Any Facebook feature that needs to aggregate data from many machines can simply write the data to Scribe. And we have over a hundred use cases where Scribe has come in handy.
1. w.r.t Anthony's answer of aggregating apache log, I am wondering why facebook is not using "ErrorLog syslog:localX" for apache error log aggregation and 'CustomLog "| /usr/bin/logger -p localX.notice -t something"' for access log aggregation over (r)syslog?
2. reading over the python lib. The client has a simple send_log, recv_log interface, nice. But if messaging order are important, how to do tcp like sliding window type batch sent, ack?
thanks,
-john
The primary reason we use Scribe to aggregate Apache logs is that Scribe enables us to easily aggregate logs from many,many machines into a central location. You can still use Apache’s custom logging to filter logs locally and then use Scribe to aggregate logs from multiple machines.
The send_log and recv_log functions you are referring to are not technically a part of Scribe’s interface, and you can ignore them. These functions are auto-generated by Thrift. Scribe is built on top of Thrift to support cross-language remote procedure calling. The only function in Scribe’s interface is Log(). See the scribe.thrift file to see how Scribe defines a Thrift service. And check out http://developers.facebook