#1 |
|
DevMaster Editor
Join Date: Jan 2005
Posts: 54
|
Writing Low-Pain Massively Scalable Multiplayer Servers
Author: Joel Reymont Description: This article describes an alternative approach to building massively scalable online multiplayer systems using the OpenPoker project as an example, which features built-in fault-tolerance, load balancing and unlimited scalability. |
|
|
|
|
|
#2 | |
|
New Member
Join Date: Oct 2005
Posts: 5
|
Hmm, first off, why doesn't the "Site Discussions" forum show up on the forums list?
Anyhow I noticed this info on your blog the other day, and now I see it again, so I'll write something. Quote:
This strikes me as both fanciful and incorrect. If a game server crashes, it is not able to send its latest information to a database. Also, if all transactions are replicated to all other nodes, you have two additional issues:

1) Bandwidth usage -- if EVERYTHING goes to EVERY machine then you are going to hit a bottleneck that limits your scalability.

2) Coherency -- some nodes are going to receive information before others, thus they will have a different internal gamestate during that time. Additionally, if a node sends this information individually to each other node, there is also the issue of a crash happening after it has replicated to some nodes but before it has replicated to all, e.g.:

Node 1:
Send to node 2
Send to node 3
* Crash *
(Nodes 4 and 5 never get the info)

Now your nodes are working on different copies of the information.

To solve 1), you only send info to the nodes that need it. If you have 3 database servers, that's a lot less to send to than 20 game servers. The game servers can request information as they need it -- having all machines know everything at all times misses the point of scalability altogether, as you cannot have a game bigger than any one machine can track at a given time.

To solve 2), you have to synchronize updates; whenever you broadcast info, you have to make sure everyone gets it before ANYone acts on it. E.g., send it to the main database server, which replicates on a separate gig-E interface to 2 backups, and doesn't ACK until they ACK back; and none of them treat it as live data until they have all ACKed to each other. Any other approach is just asking for concurrency/coherency/race condition problems. This is something you always have to keep in mind, because whenever a node DOES fail, it was in the middle of doing SOMETHING -- and that data may be borked. You have to have very specific policy about what happens to it.
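The "don't ACK until the backups ACK, and don't treat it as live until everyone has ACKed" rule can be sketched roughly as follows. This is a minimal in-process illustration in Python, not mnesia's protocol or anyone's production code; `Replica`, `stage`, `promote`, `discard`, and `replicate` are hypothetical names. It deliberately ignores the hard case of a node dying between individual promotions.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One database node; staged updates stay invisible until promoted."""
    name: str
    alive: bool = True
    pending: dict = field(default_factory=dict)
    live: dict = field(default_factory=dict)

    def stage(self, key, value) -> bool:
        """Hold an update as pending and ACK it; a dead node never ACKs."""
        if not self.alive:
            return False
        self.pending[key] = value
        return True

    def promote(self, key) -> None:
        """Make a fully acknowledged update visible as live data."""
        self.live[key] = self.pending.pop(key)

    def discard(self, key) -> None:
        """Drop a staged update that did not get every ACK."""
        self.pending.pop(key, None)


def replicate(primary: Replica, backups: list, key, value) -> bool:
    """Return True only if every replica ACKed; otherwise nobody goes live."""
    nodes = [primary, *backups]
    acks = [node.stage(key, value) for node in nodes]
    if all(acks):
        for node in nodes:
            node.promote(key)
        return True
    for node in nodes:
        node.discard(key)
    return False
```

With one main database server and two backups, a write succeeds only while all three are reachable; if one backup is down, the update is discarded everywhere and the previous live value stays in place.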
You also have to make sure that any major paradoxes are prevented by having complete data for a related set of objects be local to only one node at a time -- this means that for #1 above, a game server can request non-critical things like "who's on node 3" (which can change on that node before it is acted upon by this node), but can't muck with player data on that server unless it requests -- and waits for -- that player object to be sent over or otherwise designated as belonging to it. The game world therefore must be divided into smaller pieces in some way that are somewhat exclusive. For poker, you can divide your bankroll up in order to play on table X on node 1 and table Y on node 2; if you don't, this scenario can happen: Node 1 asks the database server for a balance, gets it, then accepts a bet, tells the db server to debit, only to find out you already lost your money in your game on node 2. Therefore, your basic choices narrow down to node-local data for fast access, otherwise slow network-querying for approval for every data modification (send an atomic check-and-debit request to the DB server so you aren't fooled by a check and debit in 2 steps in-between which there is another debit; also the DB server needs to log this transaction and the game it applies to BEFORE it sends back the reply, so if the game server goes down, or the DB server dies right then, the money trail does not disappear. Essentially you have to journal every operation so it can be rolled back, not just as a db.commit(), but in your game logic as well). As an additional point, having a 4 GB dataset limit for the entire game world rather limits your scalability as well. 
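A rough sketch of the atomic check-and-debit with journaling described above, in Python for illustration only. `Bank`, `check_and_debit`, and `rollback_game` are hypothetical names, the "DB server" is a single in-memory object, and the journal is a plain list rather than durable storage. The point it shows is that the balance check and the debit happen as one step, and the journal entry is written before the caller gets its answer.

```python
import threading


class Bank:
    """Hypothetical stand-in for the DB server holding player balances."""

    def __init__(self, balances):
        self._balances = dict(balances)
        self._lock = threading.Lock()
        self.journal = []  # append-only log, written before any reply

    def check_and_debit(self, player, amount, game_id):
        """Debit only if funds suffice; check and debit share one lock."""
        with self._lock:
            if self._balances.get(player, 0) < amount:
                self.journal.append(("rejected", player, amount, game_id))
                return False
            # Journal first, then apply, then reply.
            self.journal.append(("debit", player, amount, game_id))
            self._balances[player] -= amount
            return True

    def rollback_game(self, game_id):
        """Refund every journaled debit of a game that never completed."""
        with self._lock:
            for kind, player, amount, gid in self.journal:
                if kind == "debit" and gid == game_id:
                    self._balances[player] += amount
            self.journal.append(("rollback", None, 0, game_id))

    def balance(self, player):
        with self._lock:
            return self._balances.get(player, 0)
```

In the two-table scenario, a second `check_and_debit` from another node is refused once the first has taken the money, instead of both nodes seeing the same stale balance. A real system would also have to guard against rolling back the same game twice, which this sketch does not do.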
4 GB per box (Mnesia FAQ) is workable, but not when every box replicates to every other one -- that's a huge waste, N copies of the same data so 4/N GB per server means capacity declines as the server farm grows (16 servers means 256 MB of unique data per server plus 3.75 GB in copies of the other servers' stuff; or alternatively, 16 copies of the same 4 GB but adding more servers doesn't add capacity for anything but processing). You also have to consider just how much CPU is dedicated to simply reading gig-E and storing replication data coming over the wire, and what the latencies will be when you send a message to a node and there is a gigabit of information in the pipe ahead of it (1 second delay assuming you achieve 100% of the theoretical speed).

Thus the article rather glosses over the issues involved in actual usage (I presume because nobody is actually using it yet). While Erlang sounds great in theory, any load-distributing game server farm is going to be quite complex in operation. If you really want to be sure you have fault tolerance, you need to be able to specify things very precisely; you don't just "throw more boxes" at it. You need to track exactly how things fail and/or degrade, and in what order.

None of this is to slight any of the work done by the author. Some good ideas are presented. However, without a lot more detail on the specific scaling and robustness mechanisms, this appears to be usage of a neat language but without hard data on real-world usage in this example. There is no such thing as low-pain when it comes to actual operations.
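The back-of-the-envelope numbers in this post can be reproduced with a few lines of Python. The function names are made up for illustration, and the figures assume exactly what the post assumes: a 4 GB cap per box with full replication, and a gigabit link running at 100% of its theoretical rate.

```python
def unique_data_per_server_mb(cap_gb: float, servers: int) -> float:
    """Unique data each server contributes when every box holds every copy."""
    return cap_gb * 1024 / servers


def queue_delay_seconds(bits_ahead: float, link_bits_per_second: float) -> float:
    """Time a message waits behind data already queued on the link."""
    return bits_ahead / link_bits_per_second


if __name__ == "__main__":
    for n in (1, 4, 16):
        print(n, "servers:", unique_data_per_server_mb(4, n), "MB unique each")
    print("delay behind 1 Gbit on gig-E:", queue_delay_seconds(1e9, 1e9), "s")
```

For 16 servers this gives 256 MB of unique data per server, and one gigabit queued ahead of a message on a gigabit link gives a 1 second delay, matching the figures above.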
|
|
|
|
|
|
#3 | ||||||
|
New Member
Join Date: Oct 2005
Location: Sweden
Posts: 11
|
Quote:
This is true, but mnesia supports a feature called "horizontal fragmentation", which means that you can partition the data set and spread it across a cluster of computers. The default partitioning scheme is to simply hash on the key, but you can supply your own scheme. This way, mnesia is able to support very large databases with practically linear scalability (true only for data sets that lend themselves to such partitioning, but poker certainly seems to be in that category.)

If you download the Erlang/OTP source, there is an example application under lib/mnesia/examples/bench/. It is an implementation of a proprietary benchmark for Home Location Registers (HLR - the subscriber database servers in cellular networks). AFAIR, that application was tested on a cluster of 50 computers (as many workstations as they could get their hands on at the time), and scalability was linear. Not only that: transaction performance rivalled some of the best commercial cluster databases, and response times were superior to everything else we tested at the time. I think that's about as much as I'm at liberty to tell from our own experiments, but... the source for mnesia is there, and from it, I'm sure you can devise your own comparisons. Quote:
As long as you use transactions for all database accesses, you will be protected from this. Mnesia has ACID properties (ACID = "Atomicity, Consistency, Isolation, Durability" -- a common term in the database community.) This means that Mnesia takes care to avoid inconsistencies and other problems, such as deadlock and/or starvation. No updates are visible to others in the cluster, until a transaction commits. If updates are committed to disk, checkpointing will ensure that the database always comes back in a consistent state after a system crash. With mnesia, you determine per table how many replicas you want, where you want them, and whether you want a replica to be stored on disk or only in RAM. Writing your own transaction and replication engine is a scary proposition. Mnesia has been used commercially in distributed embedded systems with very high reliability requirements for about 8 years now, so if you base your logic on that, you'll be in pretty good company. Quote:
This is the sort of consistency problem that relational DBMSes were specifically designed to solve. With mnesia, you can do this within the context of a transaction. The rollback facility is built in. Quote:
See above. Also, the 4GB limitation for mnesia is for RAM-only data, and per box. With a cluster of tables, you can go far beyond that. (It should be noted that the practical limit per box is probably around 2-3 GB. The biggest mnesia database I've run continuously was at about 1 GB on one box.) Quote:
While I agree in general with this, I would guess that Joel "glosses" over some things because he feels that they have been proven in practice. For example, Nortel's SSL Accelerator (which has about 50% of the SSL Accelerator market) is designed along very similar lines. The Ericsson AXD 301, released in 1998, uses many of the same concepts, and has better than 99.999% availability including in-service upgrades. Having spent some time discussing Joel's architecture with him, I wouldn't say that he is finished (to reach "5-nines" availability, a lot of detail work is still required), but I think his basic architecture should be able to get him there. Quote:
Compared to how it's normally done, I'd say: yes there really is.

Sincerely, Ulf Wiger
Senior Software Architect, IMS Gateways, Ericsson AB
||||||
|
|
|
|
|
#4 | |||||||||
|
New Member
Join Date: Oct 2005
Posts: 5
|
Quote:
What does that mean? Processing scaled linearly? In what way? It was already mentioned that more boxes will give you more CPUs, but you will hit both DB-size and bandwidth-limits that become the limiting factors in scalability. You aren't going to scale any kind of mass-replication scheme linearly, simply because with 50 nodes, if each one modifies more than 2 MB of data per second, you will saturate your network link. Benchmarks designed to demonstrate e.g. CPU scaling are not the same as MMOGs, which are a lot messier and less predictable. Quote:
Quote:
If the local node must treat everything as a db transaction, waiting for a network response, then your performance grinds to a halt as your RAM performance is replaced with your ethernet performance. However, if a local node can update its local in-memory copy of some instance data, and in the background, propagate that change, you now have an inconsistent data set until such time as everyone (or every mirror/db) has made the same change. A node can crash at any point, including between making changes, or after making a change but before propagating it, or in the middle of propagating it so not everyone gets the message, or one of its mirrors could die and be unable to commit the change, holding up the rest of the cluster, etc., etc. And it gets much hairier if you let more than one node update a record at once. You ALSO have to track not only player data, but exactly what point you were at in your game logic, because if you e.g. start a game and take the blinds and commit those changes, then crash, are you able to continue from that point, or do you undo the blinds and start the game over again (they have already been committed)? Quote:
Quote:
Quote:
From the FAQ: Quote:
Quote:
To further illustrate the non-applicability of your claims, how about, "Google runs on Linux, so if you run on Linux you'll be in good company." That doesn't address game data availability/fault tolerance, does it? The application layer has to do that. (And Google loses and replaces many cheap nodes every day, but I don't recall reading that they re-route your request to make sure it gets the result set from that node's mirror.) Quote:
The fault-tolerance described can be done, but I don't see any details as to how it is achieved in this instance (an RDBMS is a database, not a game, and doesn't ACIDify your game logic). Modern MMOGs have good performance but poor fault-tolerance and coherency, leading to downtime and duping bugs. I look forward to seeing other people's solutions to this. |
|||||||||
|
|
|
|
|
#5 |
|
Member
Join Date: Oct 2005
Location: Florida
Posts: 78
|
mntr,
Do you have an article that describes a proper development or test environment for these systems? This would optimally include decided requirements for various server target uses. Corey
|
|
|
|
|
#6 | ||||||||||||||
|
New Member
Join Date: Oct 2005
Location: Sweden
Posts: 11
|
Quote:
It means that throughput in terms of transactions per second per cpu stayed roughly constant (e.g. going from 4 to 16 cpus, tps/cpu dropped only ca 6%, and latency increased by less than 7%) Quote:
And as I tried to point out, whether you will get good mileage out of a cluster database depends on how well your data model lends itself to partitioning. Quote:
The point of horizontal fragmentation is to avoid mass-replication. Thus, the 50-node example didn't involve mass replication at all. There is no need to keep more than 2-3 copies of each object around as regards fault-tolerance. The only strong argument for mass replication is if you want very fast local lookup of the data, and the data in question isn't updated frequently. One could perhaps argue that Joel's load-balancing scheme might not be optimal, and that it would be better to co-locate data for one game in one fragment. OTOH, it would also be prudent to note that Joel hadn't suggested a fragmented database -- I did. One important point here is that whether the database is fragmented or fully replicated can be completely transparent to the user. Quote:
True, things get more dicey. But "all" in this context means all nodes in the network that have a writable copy of the tables being modified. Also, mnesia uses optimistic two-phase commit with roll forward. Basically, a participating node may die during the two-phase commit, and mnesia will re-run the transaction and commit anyway. Quote:
The _node_ is capable of handling many concurrent transactions, and as long as they don't contend for resources, they run independently of each other. Each transaction will block its own process (Joel mentions having run 800,000 concurrent processes on his laptop) until it is able to commit. You can optionally execute synchronous transactions, meaning that you will in fact wait until the transaction has been committed and logged to disk on all participating nodes. This can be useful in order to avoid overloading other nodes through the use of asymmetric producer-consumer patterns. Quote:
Surely, but why would the local node treat everything as a db transaction? You only write enough in the distributed database in order to recover if the local node fails. Quote:
When something happens, you commit the change, then tell the clients what happened. You cannot include server-client communication inside the transaction, as it violates the Atomicity property. Quote:
So, in the internal benchmark I mentioned, we were able to run more than 5000 transactions per second on a 16-node cluster, where roughly half of the transactions were updates of some sort. If you think that raw transaction throughput is going to be the limiting factor, MySQL Cluster claims about 100,000 replicated transactions per second on a 4-node cluster, and response times of ca 5-10 ms on a replicated transaction. As far as I can tell, you base your prediction on the assumption that mass replication is the only alternative (granted, it's what Joel describes). Quote:
No, not really. The data was kept in so-called disc_copies, which means that they were both logged to disk and kept in RAM. This gives both fast lookups and persistency, but has the disadvantage that you limit the size of your database to what will fit in available RAM. Mnesia uses different storage techniques for disc_copies and disc_only_copies. The disk logging used by disc_copies is quite fast. The main drawback of mnesia when it comes to really big databases may well be that it always copies whole tables when a node restarts and needs to synchronize its tables with others. This is not an attractive approach for really huge databases. Quote:
To be fair to Joel, I don't think his intention was to write a technical manual.

Quote (mntr): Both bad examples. What does an SSL accelerator have to do with the usage profile of an MMOG?

The SSL accelerator has a similar setup with a load-balancing front-end and processing back-ends. It also needs to handle sessions and cookie persistency, which means that it has to be (at least partially) stateful. Quote:
No. The point was rather that the product in question has a control system with up to 32 processors running erlang and mnesia. It exceeds the required "5-nines availability", which means that the question of what happens when the switch dies is nearly moot. But since you ask: single node failures within the switch are handled such that transient calls are lost, but all established calls are kept. Call-handling performance scales practically linearly, and what's more -- each control processor is able to withstand 1000% signaling overload continuously without crashing and without losing calls. While all this isn't obviously applicable to game logic, it's (I guess what you refer to as an "appeal to authority") meant to offer some background on what this technology was designed to handle. You take issue with some of Joel's statements because your experience tells you it won't work (presumably based on trials using slightly different implementation techniques). Other statements pass without comment, such as "When simulating 27,000 poker games on my laptop I found that I had about 136,000 players and close to 800,000 processes in total." Try running 800,000 simultaneous processes in your favourite Java VM or e.g. C# and see what happens. And being able to support 100,000 players on one machine, you aren't going to need 50 machines anytime soon, so the network saturation problems of doing massive replication across the cluster don't seem to be a high-priority problem right now. Even so, I've tried to suggest a way to deal with it, that would require only minor changes to the design. For another erlang-based application using a setup similar to that of Joel's you can look at the EJabberd (http://ejabberd.jabber.ru/) Quote:
No, but I would say that it does say something: if Linux is good enough for Google, you can safely consider it to be a solid OS. My claim was a little bit more to the point than that: I pointed at some products that use Erlang/OTP with reasonably similar distribution patterns as Joel's approach. You question whether one can use mnesia for storage of game data, and I've referred not only to benchmarks showing that mnesia can fairly comfortably handle hundreds of transactions per second per machine with very good response times (around 4 ms -- I didn't mention that). I've also mentioned products using erlang where the reliability requirements are far higher than what ought to be reasonable to expect from a game server. I think it is perfectly relevant and true to say that "if you choose erlang/OTP for an application that requires a high degree of fault tolerance, you'll be in good company". And BTW, putting Linux underneath is fine. That's what we do. Quote:
Fair enough. I'm also curious to see that. I should perhaps mention that I've not taken part in Joel's design, other than giving some advice and in general approving of his architecture. I have no stake in his business. And as far as the other systems are concerned, you can rest assured that they are subjected to some really grueling tests. The AXD 301 has a battery of over 10,000 test cases, many of them automated, and there is certainly pulling of plugs, boards, and cables. It's the "split brain" scenarios that I find particularly stimulating -- where the system gets split down the middle and then has to (automatically, since these are unattended systems) heal again, preferably without anyone noticing. Quote:
Oh, but on the contrary, Joel does describe the details of how he intends to handle fault-tolerance. He even shows much of the most relevant source code. I get the feeling that you don't take his approach seriously because there has to be much more to it than what he describes. There are basically two (ok, three) sides to this discussion:
Sincerely, Ulf Wiger |
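The horizontal fragmentation idea discussed in this post (hash on the key, keep only 2-3 copies of each fragment instead of a copy on every node) can be illustrated with a small Python sketch. This is not mnesia's actual hashing or placement scheme; `fragment_for`, `replica_nodes`, the SHA-256 hash, and the "next nodes in the ring" placement are all assumptions made for the example.

```python
import hashlib


def fragment_for(key: str, n_fragments: int) -> int:
    """Map a record key to a fragment number by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_fragments


def replica_nodes(fragment: int, nodes: list, copies: int = 2) -> list:
    """Place a fragment on `copies` consecutive nodes, wrapping around."""
    copies = min(copies, len(nodes))
    return [nodes[(fragment + i) % len(nodes)] for i in range(copies)]


if __name__ == "__main__":
    cluster = [f"node{i}" for i in range(50)]
    frag = fragment_for("player:42", len(cluster))
    print("player:42 lives in fragment", frag, "on", replica_nodes(frag, cluster, 3))
```

With 50 nodes and 3 copies per fragment, an update to one record touches 3 machines rather than 50, which is the difference between this scheme and the mass replication criticised earlier in the thread.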
||||||||||||||
|
|
|
|
|
#7 | ||||||||||||||||||||||
|
New Member
Join Date: Oct 2005
Posts: 5
|
Quote:
Quote:
Right, which is completely unlike the article's description of every transaction being replicated to every node. My critique still stands. What you are doing is using different examples and then saying how my critique doesn't apply to them, which has nothing to do with applicability to the scenario described in the article which this forum is about. Quote:
Quote:
Quote:
Quote:
db.commit(some_data)
player.send(message)
db.commit(sent_player_a_message)

Whether this last item was committed or not has no bearing on whether or not the data was copied to a kernel buffer and sent, or whether it was received. There are many more issues like that. Similarly, there is also the question of whether that node's local database replicated that transaction to the other nodes or not before the node died. There is also the question of whether the hot-swap node replacing it is going to read the gamestate and resume and THEN get the replication data. There is also the issue of whether TCP retransmission timeouts with exponential back-off will result in a client acknowledgement coming in way late due to the amount of time it took for the hot-swap to get up and running, or if the client uses UDP, if the server actually DID receive an acknowledgement before it died but before it could store it. And on and on. Quote:
Quote:
Quote:
And that's not even considering 4GB or more, which is not at all a large dataset for an MMOG. Quote:
Quote:
Quote:
Quote:
Experience tells me that you don't store every single change in a database, because of performance. So the database does not give you the last game state, it must be reconstructed, and that is what was left out of the article. Logic tells you that nodes failing WILL result in some loss of data (packets received and being processed by the game; transient calculations, etc.) unlike the article's claim that NO data is lost in a crash, and that some nodes will receive data before others, which means inconsistent game state during that time period, which complicates recovery. As you know, ACID doesn't guarantee nothing is lost for an incomplete write, just that it will be atomic so either it saves the whole transaction or dumps the whole transaction -- further invalidating the article's claims. Quote:
Quote:
I use neither. I also don't use tons of threads. Did you know select() with a single process gives better performance than one-thread-per-connection? And it takes far less memory as well. Anyway if you want fault-tolerance you want to be able to snapshot your heap memory, so use shared memory and a memory manager that will operate on a user-defined heap, avoiding system malloc() and sbrk()-related schemes. Quote:
On the other hand, other types of MMOGs use far more data per player and far more CPU than shuffling and dealing cards, requiring a cluster, and then all these problems will apply. Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Bottom line, he's making claims without empirical data to back it up. The burden of proof falls upon the claimant. He has not proved any of this. In other news, I will write an article about my flying car, which sure sounds great in theory, and the engines are very reliable... |
||||||||||||||||||||||
|
|
|
|
|
#8 |
|
New Member
Join Date: Oct 2005
Location: Sweden
Posts: 11
|
Twice now, I've had a lengthy reply chucked by the system, because I've hit "preview" after my session has timed out. Apparently, I'm a slow learner, and didn't copy my post to backup storage before trying to preview it.
There was one core point: I've come to realize in discussions with C++ developers that there is one fundamental difference in the way we approach fault-tolerance. I've been introduced to the term "happy case programming", meaning that you write your code only to handle the normal cases. This is a bad thing, and means that your program is essentially useless for production purposes. With Erlang, this would be referred to as "Programming for the correct case", and is the preferred way of writing code. The key aspects of fault-tolerance and concurrency are intrinsic in the language, and as long as you follow some simple and intuitive rules when you design e.g. your game logic, fault-tolerance can be added/improved after the fact without having to rewrite code. Furthermore, performance tuning such as changing replication patterns and load balancing schemes can also be done afterwards without forcing a rewrite of the game logic. That's the main reason why I feel that the other examples are relevant. They prove that it is indeed possible to achieve very high levels of availability this way. This is also why I think Joel's approach holds water: he hasn't done anything yet that would preclude good scalability and fault-tolerance. Regards, Ulf Wiger |
|
|
|
|
|
#9 | ||||||
|
New Member
Join Date: Oct 2005
Posts: 5
|
Quote:
Quote:
Right, all C++ developers think exactly the same way. Quote:
Sure sounds absurd to me. Only handling the normal cases is NOT an approach to fault-tolerance. A more correct way of stating your thought is, "most programmers do not implement much in the way of fault-tolerance". Languages in which you handle memory directly also make it easier to make more mistakes, which is why you have to take special care.

The point is you use a fault-tolerant framework, you don't just randomly write C++ code. Just like you use OpenGL when you want to do 3D. The fault-tolerant framework should handle memory management and have robust data structures, which can be interrupted in the middle of an operation without leaving dangling pointers (or which automatically corrects such). It should support checkpointing, e.g. a complete memory snapshot, and in a cluster this should be synchronized to get a completely coherent gamestate cluster-wide. It should be able to handle crashes of game code by starting up another game process very quickly, not having to reload everything from disk, and track reliable and unreliable versions with a version control system automatically (e.g. version 1.2.3 crashes, go back to 1.2.2 and don't use 1.2.3 again, but 1.2.4 is okay to try).

And even with all this, you still have to be careful how you write functions -- you have to have an understanding of it, just like you can't simply use an "e-z threading library" with no understanding of deadlocking or thread starvation, at least not very successfully! Even languages that handle threading very well can still see performance go right down the drain if the user wants multiple threads to access the same resource frequently. Quote:
Let's see some examples! Game logic, please, not completely unrelated items like switches, unless they replicate incoming packets cluster-wide and guarantee delivery (so the protocol never has to worry about dropped packets). I don't think fault-tolerance is "intuitive" for programmers in general, and Erlang is no silver bullet -- look at all the things that have to be considered just for a login. Quote:
Uhh, no, not unless you guessed correctly in the first place. You may well find that your existing game logic creates too much network traffic and you have to rewrite things to improve performance. The downside to hiding implementation details is the developer doesn't realize that every time s/he requests resource X, it's a call over the network. Your choices are either to disallow this (and work ONLY on local data), or to require the developer to be aware of it by explicit mechanisms. Quote:
Uhh, yes, in the same way that using Linux doesn't preclude it -- but neither one DOES IT FOR YOU, either! "My design does not preclude adding fault-tolerance" is a completely different claim than "my design IMPLEMENTS fault-tolerance". The difference with your examples is that high-availability through a low defect rate is not the same as fault-tolerance (of defects!). Minesweeper has high availability and let's say it never crashes -- but that's not due to fault-tolerance. Perhaps your car engine is very reliable if it gets highly refined fuel, but that's not fault-tolerance. Fault-tolerance would be if it could still run with half the fuel tank being filled with sand.

Okay, let's use an easy example, which is fault-tolerant, but slow. If you have every variable stored transactionally in a database, you can resume from any point, right? If you want more than one node, you have to wait for the replication to commit on every backup node, otherwise you roll it back on all of them. That will work, but it's slow. Clients are easily handled too because they act as back-up replication of part of the data set, so you don't continue until your backups and ALL the clients respond and agree (by committing changes). This is the easy way to handle things. Everything is in effect one big node, because you make sure every operation went through smoothly.

The tricky part is how to do this in a reasonable timeframe. Everything you don't store in the DB requires reconstruction, and clients need extra logic for backing up to some previous known-good state (e.g. a checkpoint/snapshot). You also have to simply disallow certain operations because they just are not time-efficient (e.g. partitioning the data set so you only operate on your local data, and minimize the frequency of objects hopping between nodes because it's slow).
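The "store everything transactionally so you can resume from any point" approach can be sketched as a per-hand step journal. This is a single-node Python illustration under simplifying assumptions: `StepJournal`, `STEPS`, `next_step`, and `demo` are hypothetical names, the journal is a local file rather than a replicated database, and a torn final line after a crash is not handled.

```python
import json
import os
import tempfile


class StepJournal:
    """Append-only log of completed game steps, synced to disk per step."""

    def __init__(self, path):
        self.path = path

    def record(self, step, data):
        """Persist a finished step before the game moves on."""
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"step": step, "data": data}) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        """Read back every step that was durably recorded."""
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


STEPS = ["take_blinds", "deal", "betting", "showdown"]


def next_step(journal):
    """Return the first step not yet journaled, or None if the hand is done."""
    done = {entry["step"] for entry in journal.replay()}
    for step in STEPS:
        if step not in done:
            return step
    return None


def demo():
    """Take the blinds, 'crash', then ask a fresh process where to resume."""
    path = os.path.join(tempfile.mkdtemp(), "hand.log")
    StepJournal(path).record("take_blinds", {"small": 1, "big": 2})
    return next_step(StepJournal(path))  # a new object stands in for a restart
```

After the simulated crash the replacement process resumes at the deal rather than taking the blinds a second time. The cost is the one named above: a disk sync (and, in a cluster, a replication round trip) on every step.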
||||||
|
|
|
|
|
#10 | |||||||
|
New Member
Join Date: Oct 2005
Location: Sweden
Posts: 11
|
Quote:
That's not what I meant, but certainly C++ has some characteristics that cannot be overlooked, if you're going to use it. Quote:
That's not a more correct way of stating the Erlang approach to fault-tolerance. It's more along the lines of: "don't try to insert fault handling in your normal code, because you're likely to mess it up. Let the infrastructure deal with it." Quote:
Right, and here we start coming closer to the fundamental difference. To a C++ programmer, if you don't have a well conceived strategy for fault tolerance from the beginning, you don't have fault tolerance, period. You don't graft it on after the fact. Erlang/OTP is a fault-tolerant framework. The language was designed with fault-tolerance as a core requirement. It's no silver bullet, but it sure shines in this regard. Quote:
I did mention in my (lost) response that I'm not a game developer, so you're not going to get game logic examples from me (I don't have any). And you still seem to have completely misunderstood what the AXD301 is (understandable, since it's not a product in your domain.) It's no domestic ethernet switch, and the examples regarding erlang code in AXD 301 have very little to do with the switching per se. The control system in a system like that does tons of things (around 1.5 million lines of erlang code, in fact, which would correspond to some 5-10 million lines of C++, based on practical experience.) Quote:
Linux does offer fault-tolerance support in several ways. The most obvious one being memory protection between processes. Quote:
But in fact, there is lots of fault-tolerance implicit in Joel's code, because he follows the design guidelines in a fault-tolerant framework. Certain things don't have to be nailed down in order to get the game logic working, and he hasn't nailed it down. This, obviously, makes it much easier to change initial assumptions.

One such example is the massive replication scheme. The only place where that is visible in his code is where the tables are created. And in fact, it will create replicas on all nodes available at install time. If you want to add nodes in service, additional code is needed to make sure they also get replicas (if that's what's desired). The (not yet written) code could instead add fragments to the table, and/or redistribute existing fragments across the new cluster configuration. This would not affect his game logic one bit, but it (fragmentation) would force him to rewrite the load balancer (which consists of a dozen or so lines of code.) Whether you add replicas or not, the code will still work, but characteristics may not be what you expect (difficult to know without full scalability tests).

From a fault-tolerance point of view, keeping two central replicas of the data is fine, and the game servers don't need a copy of their own. Data doesn't have to be local for you to access it, but it obviously affects performance. And before you start commenting that this is irrelevant -- yes, I agree that for unlimited scalability, you can't replicate everything to everywhere, but where we disagree is on whether it invalidates his approach. I say it doesn't because his approach doesn't depend on it, and it's easy to change.

One of the things he hasn't done is write a supervisor process for his game servers. Doing so means rewriting exactly zero lines of what he's done so far, and it's just a matter of 5-10 lines extra. This is not a show stopper either. Quote:
One of my colleagues has written a PhD thesis named "Making reliable systems in the presence of software errors" (http://www.sics.se/~joe/thesis/armst...esis_2003.pdf). If you read that, you will significantly increase your understanding of how erlang programmers approach fault tolerance. And as the title suggests, what we do is not "high-availability through a low defect rate" (even though our defect rate is extremely low, as Joel's probably will be.) Sincerely, Ulf Wiger |
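To make the supervisor idea from the post above a bit more concrete, here is a minimal sketch of the restart pattern. It is written in Python rather than Erlang, it is not Joel's code, and it is not the OTP supervisor API; `supervise`, `make_flaky_game_server`, and the `max_restarts` parameter are hypothetical names chosen only for illustration. The point it tries to show is that the worker (the game server) does not need to change at all for something else to watch it and restart it.

```python
def supervise(start_worker, max_restarts=3):
    """Run start_worker(); if it raises, start it again.

    Returns (result, restarts_used). Once the restart budget is used up,
    the last exception is re-raised so a higher level can decide what to do.
    This only mimics the general idea of a supervisor; it is not OTP.
    """
    restarts = 0
    while True:
        try:
            return start_worker(), restarts
        except Exception:
            if restarts >= max_restarts:
                raise
            restarts += 1


def make_flaky_game_server(failures_before_success):
    """Build a hypothetical worker that crashes a fixed number of times."""
    state = {"calls": 0}

    def game_server():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("simulated crash")
        return "table closed normally"

    return game_server


if __name__ == "__main__":
    print(supervise(make_flaky_game_server(2)))
```

Note that the worker function knows nothing about being supervised, which is roughly why adding supervision later is described above as requiring no rewrite of the existing logic.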
#11 | |||||
|
New Member
Join Date: Oct 2005
Posts: 5
|
Quote:
So it has its uses, yes, but let's not get carried away. Quote:
Feel free to explain it, because so far, what I understand about it is that it doesn't do what a MMOG server does in any way and therefore is not an example that applies to this case. Quote:
I assume your examples are general and non-applicable because you are not aware of any specific applicable ones. Quote:
That's what this whole thing boils down to. Lack of testing. Quote:
That's what every company wants to think about its products, but "we'll probably have a low defect rate" is just marketing speak. In any case, my original post addressed various issues with the article, which you have mostly avoided and instead brought up reasons why "Erlang is not completely out of the question" versus the pertinent "is this design sound?" analysis. I find it possibly telling that the original author has been silent. Usually the purpose of publishing something about your great design is to get feedback. Perhaps he has been busy elsewhere, but if not, my guess is that my critique has pointed out some areas that still need testing, and he has some changes to make before demonstrating the capabilities discussed in the article. |
#12 | ||||
|
New Member
Join Date: Oct 2005
Location: Sweden
Posts: 11
|
Quote:
If you've followed Joel's blogs, you are of course aware of his test harness, where he runs through thousands of actual poker hands, verifying that they actually play out like they did in real life. It's also been shown elsewhere (e.g. in the AXD 301), that you can expect the defect rate in an erlang-based application to be at least 4x lower than in a comparable C++ application. You may dismiss all anecdotal evidence as either unscientific or irrelevant if the examples given don't implement MMOGs. That's your prerogative. But I would argue that it's more than just "marketing speak" in this case. Quote:
I have also said that I have no personal stake in this, and stated that I'm not a game designer. I have tried to offer information based on my own experience and expertise. Quote:
Understandable, but in this case incorrect. Joel has been silent because he moved rather suddenly and has been without an internet connection since then. Quote:
And I've argued that those changes are minor - in many cases a mere configuration issue. Again, based on my experience in areas other than game design. Now, you will have to excuse me, but I will not continue this thread unless the discussion takes a different direction. I feel that we're going around in circles. Regards, Ulf Wiger |
#13 | |
|
New Member
Join Date: Oct 2005
Posts: 7
|
I haven't read through this thread, but one comment in this article really struck a chord.
Quote:
Having worked on nearly 8 mainstream MMO games, I can safely say that I completely disagree with that statement, and that it is certainly a bold claim if you have never used your stuff to make a AAA MMORPG. Poker has a fraction of the number of objects, amount of data per object, and amount of interaction (actions a user can do) of an MMO. Just consider managing the replication of the data (what you can see and can't see, who is wearing what clothes/armor/weapon, what spell they just cast, what effect to play): the amount of data that represents each object is significantly higher than in a poker game. Chat system, replication, movement, visibility, database data, player interaction (hitting buttons to do things, chat, etc.) is a much larger scope. Scalability of a 52 card game with such a small amount of data as "5 of hearts" is a much smaller scale, so getting to support lots of players with a system like that is NOT the same as in a full scale, full featured MMORPG. Also, so what if a group of C++ programmers took more time to make openpoker? That's certainly poor justification for erlang; those guys might not have had the experience and skills to do it better (that many other people DO have). I've worked on plenty of systems just as scalable, all written in C++ by professionals, that accomplished everything you have here. I think when you present things like this you should check the attitude and just present it as an alternative and clever tech to use for your own project. But like anything else, it's not a one stop golden hammer solution for everything. Sheesh. It's good stuff no doubt, and I'm sure it's appreciated that you share it with the community, but damn dude, there are many solutions for many problems and none of them are the best for everything. Honest. |
#14 | |||
|
New Member
Join Date: Oct 2005
Location: Sweden
Posts: 11
|
Quote:
So, what would be a typical amount of data, and would you always have to replicate all in one go? After all, most of the things you listed change at a very low rate (initiated by a human being, typically one thing at a time). Basically, how many replicated changes per second would be typical in a game server? Quote:
Now, all anecdotal evidence regarding erlang argues in its favour when it comes to managing complexity, at least compared to e.g. Java and C++, while e.g. Haskell tends to give shorter programs still. Quote:
Just be aware that Joel is not exactly the first person to compare an Erlang development effort with comparable activity in C++. This ground is well covered, and it's not just his subjective opinion. Whether the average difference in productivity is 2, 4, 10 or more times higher can of course be debated (it's highly unscientific anyhow), and most certainly varies depending on the type of problem. Problems involving lots of concurrency -- esp. complex concurrency patterns(*) -- tend to be vastly easier to write in Erlang. As a result, performance in Erlang-based applications is not seldom significantly better than in the C++ applications, even though nearly all micro-benchmarks tend to suggest the opposite (failure to manage complexity is a common source of performance problems). Another observation made by several people is that you seem to need much more skilled people to succeed with C++ than with Erlang. (*) Complex concurrency patterns = mainly when you have state machines receiving input from multiple, uncoordinated sources. This causes interleaving issues and timing-dependent code unless one takes great care in designing the concurrency infrastructure (which C++ lacks btw). An interesting point to discuss is whether performance would be adequate in an erlang-based MMO, and whether one simply has to resort to C++ in order to get good enough performance. But arguing that you could handle the complexity better in C++ is simply ill-informed. There is simply no evidence to support that, and much evidence to the contrary. Regards, Ulf Wiger |
#15 |
|
New Member
Join Date: Oct 2005
Posts: 2
|
I will admit that I specifically made some bold claims in the article to stir things up. Regardless, you can shoot yourself in the foot with Erlang just as well as you can with C++ or anything else.
It seems like everyone really focused on the nitty-gritties of how to exactly implement replication and what to replicate, while missing the big picture. The big picture being that it's far easier to implement scalable systems using Erlang than C++, my particular replication approach notwithstanding. Last edited by Joel Reymont : 10-27-2005 at 04:50 PM. |
#16 | |
|
Member
Join Date: Oct 2005
Location: Florida
Posts: 78
|
Quote:
You might want to check your attitude with these comments as well. He was never attacking those who developed with C++. corey
___________________________________________
G3D 6.07 3D Engine |
#17 | |
|
New Member
Join Date: Oct 2005
Posts: 7
|
Quote:
I interpreted this part as being a little on the "windows c++ programmers suck and look how great I am in doing it in half the time" side. Maybe that wasn't the intention, but it sort of comes off that way. As for the question about replication, you say it doesn't change much, but in fact it does when things go in and out of scope. You're not replicating everything from everyone all the time, only the people that are reasonably close to you. As they move in and out of "range", let's call it, there can be huge chunks of data as, say, 50 people walk into range of you. Of course you try to send the most relevant data first and the less important data later on. But my point is, that is a lot more to manage and keep track of than any poker game. If the things people are wearing can be unique, uniquely colored, have a special symbol on them, have unique attributes, etc., it can be a ton of information to manage. Think of the example of walking in Star Wars Galaxies through the housing area. Now there is a possibility you may be able to modify this openpoker code to work with it, but to assume that because you support a poker game it will surely, by default, support a full scale MMO RPG type game? Nah, not buying it, and it's an awfully bold statement and assumption given it has not been used or tested with one. That's the point I was trying to make. To me the article sounded a lot like "look how great I am, game projects are bloated messes, and this erlang is the bomb". Please, these MMO projects are not bloated messes. Sure, many of the game designers are green, the managers that run them even more green, with unrealistic time lines and constant demos instead of actually working on the game, yes. Bloated? No one has time to bloat the codebase. The ones that do don't ship. I apologize if I came off harsh; it's been a rough week working on the latest MMO, and when I saw this I was like good grief, unless you work on these you have no clue what goes on behind the scenes and the work it takes. 
if there was a simple solution that was that easy we would be using it already, we are always looking to cut development time, looking at how to get it done faster before the axe falls on our timeline. and comparing poker to a MMO is like apples and oranges. sure they both need scalability but the similarity ends there. |
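As a rough illustration of the "range" idea described in this post (only replicate what is close to the observer, and send the most relevant data first), here is a small Python sketch. It is not taken from any shipped MMO or from OpenPoker; the function name `newly_visible`, the 2D coordinates, and the use of plain distance as the relevance measure are all assumptions made for the example.

```python
import math


def newly_visible(observer_pos, entities, radius, already_known):
    """Return ids of entities that just came into range, nearest first.

    observer_pos:  (x, y) of the player we are sending updates to
    entities:      dict mapping entity id -> (x, y)
    radius:        hypothetical replication range
    already_known: set of ids the client has already been told about
    """
    candidates = []
    for entity_id, (x, y) in entities.items():
        distance = math.hypot(x - observer_pos[0], y - observer_pos[1])
        if distance <= radius and entity_id not in already_known:
            candidates.append((distance, entity_id))
    candidates.sort()  # closest entities are treated as most relevant
    return [entity_id for _, entity_id in candidates]


if __name__ == "__main__":
    world = {"a": (1, 0), "b": (3, 4), "c": (50, 50), "d": (0, 2)}
    print(newly_visible((0, 0), world, radius=10, already_known={"d"}))
```

A real server would also have to handle entities leaving range and the per-entity payload (clothing, colors, attributes), which is where the data volumes mentioned above come from.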
#18 | ||||||
|
New Member
Join Date: Oct 2005
Location: Sweden
Posts: 11
|
Quote:
I don't think he (nor anyone else) intended to say that c++ programmers suck (my own opinion is that c++ rather requires programmers to be excellent). c++, on the other hand, is not necessarily the best programming tool around, and for some problems there are much better alternatives. Others would specifically bash c++. Browse e.g. comp.lang.functional, and you will find lots of programming language researchers claiming that "c++ is broken beyond repair". There are languages out there that are more expressive, more coherent, safer, and more productive than c++. Even if you personally don't believe that to be true, perhaps you can appreciate that many very competent people do? (Disclaimer: I'm no programming language expert, even though I try to follow developments as well as I can. I switched from C++ to Erlang about 13 years ago, and for me, it was most definitely the right choice. I've never looked back.) By all means, choose c++ because it's what most employers ask for (besides Java). You may even choose it because really excellent c++ programmers are a scarce commodity and command a very high price on the job market. It's a logical career choice. Besides, many of the niche languages have been known to have performance problems. Beware, though, that just because that's been true historically, it doesn't necessarily stay true. OCaml, for example, is a functional language with an outstanding compiler, generating code that's sometimes even faster than optimized C code. It's nearly impossible to stay current with programming language development while being up to one's ears in commercial development (I have this problem myself). But don't turn down a chance to hear what proponents of niche languages are capable of. By definition, niche languages need to be significantly better than mainstream languages for something, or there is no reason for their existence (beyond providing material for some PhD thesis.) Quote:
We seem to use the word "replicate" differently. In Erlang, the database, as well as the language, offer distribution transparency. That is, the data doesn't have to reside locally for you to access it, and you don't even have to know where it resides (hence, transparency). For performance reasons, one would like to have the load balancer direct players who interact a lot to the same machine. This is of course easy to do in poker, as players don't exactly dash between rooms. In other games, I imagine that it might be quite difficult, and some caching is probably necessary. To repeat, the concern as regards Erlang for MMO design should be performance, not complexity. Data volumes, update frequency, etc. are interesting figures to discuss. At what rate does data actually change, for example? If you have to re-distribute data dynamically at a high rate, then perhaps the built-in database support will not be adequate. Quote:
But those things you mention are pretty static attributes, so as far as distribution patterns go, they should be reasonably straightforward to handle. Besides, handling attribute lists is something that functional languages excel at from a structural point of view. 'map' and 'fold' functions, applying a higher-order function onto each object in a linked list, are very common operations. You can of course turn it around, and have a list of higher-order functions that you apply to a given object. Joel's game logic state machine works this way. Quote:
It is the bomb when it comes to massive concurrency and distribution. See for example Todd Proebsting's excellent talk, "Disruptive Programming Language Technologies" (http://ll2.ai.mit.edu/talks/proebsting.ppt). In the talk, he claimed that "Erlang is THE disruptive technology for concurrency" (I couldn't get my RealPlayer to work for some reason, so that may not be the exact quote, but close enough.) Proebsting manages the Programming Languages Group at Microsoft Research. Quote:
There are different ways to bloat code. If you're pressed for time, and your programmers/managers are green, you won't likely be able/allowed to restructure your code in order to keep it small and maintainable. This will quite often lead to code bloat, IMO. Since I work in commercial development myself (although not game design), I'm very much aware of the priorities: if it works (sort of), and customers are waiting -- ship it. Maintenance, even though it makes up ca 80% of the lifecycle cost (at least in our market), too often becomes an afterthought when there's a crunch. After all, if you can't make a sale in the first place, there will be nothing to maintain. This is an area where we feel that Erlang has been great. Reasonably mediocre designers can still write code that works in a short timeframe, and the code is not especially difficult to maintain. In our world, performance is adequate (at times even outstanding). Since the telecoms industry is extremely competitive nowadays, we don't have time to reach excellent quality through brute force alone. Getting the job done quickly and with fewer faults is a huge advantage. Still, since our products also involve a lot of low-level hardware control, we use C a lot (in a previous project, it was ca 1.5 million lines of Erlang, and about as much C -- over a 100 different board types to control). To quote our line managers, the Erlang stuff "just works". Getting the C code right easily becomes the bottleneck, both in terms of programmer time to get it right, and in terms of finding good enough programmers. Granted, real-time device programming in C is not your average C programming task, and at least up until now, using erlang in that domain has been (is still) out of the question. I apologize if I came off harse, been a rough week working on the lastest MMO and when I saw this I was like good grief, unless you work on these you have no clue what goes on behind the scenes and the work it takes. Quote:
A commendable attitude. Of course, introducing new technology in running projects is quite a challenge, and quite often, it simply isn't justifiable to change, even if one would like to. Regards, Ulf Wiger |
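The 'map' and 'fold' remarks in this post can be sketched as follows. This is Python, not Erlang, and it is not Joel's actual state machine; `apply_steps`, `post_blind`, `deal_cards`, and the dictionary-based state are hypothetical stand-ins. The sketch shows both directions mentioned above: mapping one function over a list of attributes, and folding a list of higher-order functions over a single object.

```python
from functools import reduce


def apply_steps(state, steps):
    """Fold a list of functions over one state: each step takes the
    current state and returns a new one."""
    return reduce(lambda current, step: step(current), steps, state)


def post_blind(amount):
    """Return a step that adds a blind to the pot (hypothetical rule)."""
    def step(state):
        return {**state, "pot": state["pot"] + amount}
    return step


def deal_cards(count):
    """Return a step that records dealt cards (hypothetical rule)."""
    def step(state):
        return {**state, "cards_dealt": state["cards_dealt"] + count}
    return step


# 'map': apply one function to every item in an attribute list.
items = [{"name": "cloak", "color": "red"}, {"name": "helm", "color": "blue"}]
names = list(map(lambda item: item["name"], items))

# 'fold' turned around: a list of functions applied to one object.
initial = {"pot": 0, "cards_dealt": 0}
final = apply_steps(initial, [post_blind(5), post_blind(10), deal_cards(2)])

if __name__ == "__main__":
    print(names)
    print(final)
```

Because each step returns a new state instead of mutating the old one, the list of steps can be reordered or extended without touching the individual functions.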
#19 |
|
New Member
Join Date: Oct 2005
Posts: 7
|
Thank you for the feedback, I would like to point out that people have been writing scalable, fault tolerant C applications for a few decades now.
In fact many of these "improved" languages are often written in C or C++. There are lots of examples of decent improvements though: Object Caml, Python, Java, etc. Object Caml for example is very good for networking solutions and has been used in many mainstream systems for years, but it has many limitations that prevented it from being completely useful in an MMO. I suppose if you're approaching erlang as a piece of the total system needed to build an MMO you might be on to something, but if it's being suggested that it's the entire system (like collision, physics, movement, replication, network scenegraph, etc.) I would be just a tad bit skeptical. I'm referring to those systems in terms of networking, not in terms of client side stuff btw: networked physics, collision, movement, etc. I'm genuinely curious if erlang can really work, because like I said, if there is enough evidence then I would seriously consider trying it out, as finding boxed solutions that can be easily integrated into our codebase saves tons of time, especially when you're talking about MMO projects that run 3, 4 or 5 years and cost 20, 30, 50 million to create. Granted, much of that cost is in art assets, but a good chunk is in the programming staff as well. I suppose one thought is that the pool of knowledgeable erlang programmers is most likely a lot smaller than the C/C++ pool, making it difficult to find talent, but if it's easy enough to pick up it might not be a problem. |
#20 | |||||
|
New Member
Join Date: Oct 2005
Location: Sweden
Posts: 11
|
Quote:
Indeed. Quote:
Yes. This is also true for Erlang. The Erlang VM is a sizeable C application. Quote:
Many use Erlang as a "systems programming language", providing a layer that manages the machines in a cluster, monitoring applications, load balancing, etc. Essentially, Erlang/OTP becomes your fault-tolerance middleware and provides the non-stop and healing properties to your system, allowing you to focus on the game-specific stuff -- and how much of that is done in Erlang will surely vary. Quote:
You should visit www.erlang.org, browse the erlang-questions mailing list, and perhaps subscribe and take part in the discussions. The mailing list is considered by many to be outstanding, with lots of very knowledgeable people who are more than willing to help. Some have also done some fairly interesting (AFAICT) stuff in game design. [quote=BuffDigits]especially when you talking about MMO projects that run 3 4 or 5 years and cost 20, 30, 50 million to create. granted much of that cost is in art assets but a good chunk is in the programming staff as well.[/quote] The "killer" application in Erlang (the most downloaded, at least) is Wings3D, a 3D modelling tool. Perhaps you've heard of it? Quote:
The pool is decidedly smaller, but... At the moment, I think Erlang programmers are looking for work. Especially in Sweden, some of the biggest universities teach Erlang, and you are likely to find people who actively seek companies where they are allowed to do Erlang work (that's how I came to work for Ericsson, btw.) So competition for the really great programmers is not nearly as fierce as it is for the great C++ programmers. Regards, Ulf Wiger |
#21 |
|
New Member
Join Date: Oct 2005
Location: Sweden
Posts: 11
|
Just thought I'd mention that there will be a talk with the hefty title "A Virtual World Distributed Server developed in Erlang as a Tool for analysing Needs of Massively Multiplayer Online Game Servers" at the Erlang User Conference 2005, in Stockholm next Thursday.
Regards, Ulf Wiger |
#22 | |
|
New Member
Join Date: Oct 2005
Posts: 2
|
Quote:
Could you enumerate them loosely? P.S. Apologies for hijacking the thread |
#23 |
|
New Member
Join Date: Oct 2005
Posts: 7
|
Well, let's put it another way: if you're making some small time MMO game with a group of friends on a shoestring budget, use whatever you can get your hands on.
If you're dropping 20 to 30 million to make a mainstream MMO game, requiring teams at time of ship of 100+ people (and sometimes more for 24/7 customer support, QA, publishing, network operations, etc.), then using semi-obscure packages from all over the internet can become very problematic. The game industry is a high turnover industry; you always need to be able to find good quality programmers that know their stuff. The more obscure the language you use, the more difficult that becomes. It's already reasonably hard to find competent and high quality C++ people, much harder than you think. Sure, there are plenty of meat and potatoes C++ programmers, but very few with enough background to design and develop highly scalable, low-bug, properly documented systems (at least not in the gaming circles; plenty of them in other industries). So you really need to factor that in. In addition to that, we found Caml really didn't do well unless you used the bytecode version, we found numerous bugs in the compiler (probably fixed since then), and debugging was very cumbersome. Plus now we had 3 different compilers, 2 different operating systems, and 3 different debuggers to work on the product (see where this is going?) It's a mess. Client is usually on Windows, servers on Linux; languages included C++, Python, Object Caml; compilers were msdev, gcc, the Caml bytecode compiler, etc. What a freakin' maintenance nightmare. We scrapped almost all of it, went pure C++ with some small use of 1 scripting language, saved tons of money, needed fewer programmers, and managing and maintaining the final product was much, much easier. It's easy to spiral out of control really fast, and get too hung up on what is cool or neat, when you need to focus on getting the thing done, for cheap, and on time. Sometimes these packages that "do everything for you" cost you more in the long run, but sometimes they work; you have to use your own judgement. 
But I can tell you that on a large scale game project (remember, a lot of game programmers never took a software development course; they have a tendency to be seat of the pants programmers, hack and slash, and slam out the next demo. That's a stereotype from me, they aren't all bad though), when we use 1 compiler, 1 OS, and 1 language, things get done, and everyone on the project can work on any system (since they all understand C++), so they can all help each other. Before that you had the Python gurus, the Caml gurus (1 guy), and the C++ gurus, and they didn't mix well, they couldn't help each other, and if a critical person left or was sick for a week you were screwed. So overall I'd say it's a combination of a bunch of fundamental problems we had with Caml, coupled with the complexity of putting too much stuff into the mix. Let's not forget the finger pointing too: it's Caml, no it's C++'s fault, your interface is wrong, no the right compiler flag wasn't set, and on and on. I have to admit, coming from a more professional industry myself, it never seemed to be a problem before, but the gaming industry in general is a whole different ballgame. I've never met so many cool, fun people, but also so many that are untrained, flying by the seat of their pants programmers that don't know what a sequence diagram is or how to write use cases, let alone gather proper requirements. It's amazing. I've worked for several different major publishers and 3 different game companies and been exposed to hundreds of programmers, and there are a few superstars in there, but the rest fit my description to a T. Sorry, rambling. So use your own judgement: Object Caml did not suit our needs, but I'm sure a more disciplined team could use it just fine, and reading through some of the release notes it looks like a number of our reported bugs have been fixed. But in a decade of making MMOs I've met 1 programmer that knew what the hell Caml was, and that was only because a teacher asked him to use it in college. 
But if you're interested in the types of things that can be done with Caml, check out a package called Ensemble. It's pretty nice and has lots of networking capabilities. It used to be well maintained, not sure anymore (another potential problem with stuff like this: support just disappears overnight). Well, that's probably not what you are looking for, but hopefully it's some insight into the shenanigans that go on in these major, hugely funded game projects. It's amazing anything ever ships, if you ask me. But I will say, when one ships and it's not buggy and runs smoothly, those people deserve a major round of cheers, because it's not all their fault when it fails. The time frames, the constant need to show a new demo with new stuff, the moving up of delivery dates, not having enough people soon enough, etc. all contribute to the problem, and to make it through that is sometimes a freakin' miracle. (I apologize if I offended any good game programmers out there, but if you've been around the game industry long enough it becomes pretty clear you are few and far between. Maybe it's because every wannabe programmer wants to make a game, or because the quality programmers work in mainstream industry (because you get paid more), I don't know, but every team seems to have 1 super guy and a ton of worker bees. Sure, they can't all be studs, but geez, you're talking about a lot of money and a lot of time. You need at least 1 functional expert in each of the critical areas, and that's rare (referring to multi-million dollar MMO titles).) |
#24 | |||
|
New Member
Join Date: Oct 2005
Location: Sweden
Posts: 11
|
Quote:
Agree, except for the part of there being plenty of good C++ programmers in other industries. Learning C++ to the point that you get really good at it takes a very long time. I can't help thinking that many good C++ programmers so vigorously defend their way either because they don't want to think that all that hard work was in vain, or because they have forgotten how painful it was to get where they are. And, BTW, to make really good, scalable and robust applications in C++, and to do it quickly, requires outstanding programmers. My hat's off to those who can do it - they are much better programmers than I. I have to use smarter technology to compete. Then again, the master programmers who choose to use the smartest tools can become fantastically productive. Quote:
Very true. Quote:
That's quite interesting. A sign of the game industry still being quite immature, I guess. (Not trying to offend anyone - I'm not saying that the people in the game industry are immature.) I recall similar problems in the webshop industry. Early on, many of the webshops were built by people who were great at squeezing every last ounce of functionality out of JavaScript and HTML, but hadn't the faintest idea of the logistics required to handle thousands of customers. Meanwhile, the mailorder industry slowly moved over to the web, and to them it was just a different kind of catalog - the logistics backend was already in place. Btw, the presentation at the Erlang User Conference 2005 dealt with an MMOG prototype that was very similar in structure to Joel's, even to the point of using fully replicated tables ;-). It also used a load generation tool, Tsunami (they're looking for a different name), that should be of interest to many of you. http://www.erlang.se/euc/05/ Performance Measurement and Applications Benchmarking with Erlang (http://www.erlang.se/euc/05/0930Remond.pdf) A Virtual World Distributed Server developed in Erlang as a Tool for analysing Needs of Massively Multiplayer Online Game Servers http://www.erlang.se/euc/05/1000slaski.pdf (slides) http://www.erlang.se/euc/05/Slaski.pdf (paper) Regards, Ulf Wiger |
#25 | |
|
New Member
Join Date: Oct 2005
Location: Sweden
Posts: 11
|
Regarding Mnesia and large data sets + synching whole tables over the net:
Quote:
I just thought I'd mention that I got hold of a few SPARCs with 16 GB RAM. I thought I'd push Erlang and mnesia to see where it started deteriorating. I downloaded 64-bit Erlang (those SPARCs didn't have a working C compiler) and tried some scalability tests. First, I ran a ring benchmark: create a ring of N processes, and pass a message around the ring. Calculate average time for spawn() and message passing. I was able to create a ring of 6 million processes without noticeable deterioration of spawn and send cost. Using an optimization trick called 'hibernate' (the process heap is compressed while waiting for the next message), I was able to run 20 million processes in a ring, again without deterioration (even though send times rose from 2 to 5 us/message when I introduced 'hibernate'). The cost of spawn() was ca 4 us. I then tried creating large mnesia tables. I had a couple of different examples that I alternated between. One was DNS ENUM records. I inserted 70 million ENUM records in a replicated table with checkpointing to disk. Building the table from scratch took a fairly long time, but there are some tricks to speed things up. Terminating Erlang and restarting it, reloading the table took less than 30 seconds. Reloading it across the GB Ethernet network took about the same amount of time. With a thread pool enabled for the runtime system, I didn't notice any significant reduction in responsiveness, even when the system used practically 100% CPU. In fact, timing the transactions while the system was going at full speed gave better figures than benchmarking an idle system. That's a known (if not perfectly well understood) quirk of Erlang's runtime system: it seems to perform better under load than it does when idling. An update transaction on the replicated table took ca 700 us. It doesn't block anything except the client process requesting the transaction. 
The cost of transactions, reads and updates stayed pretty much constant even as the data set approached 16 GB. A Dirty write on a RAM table takes about 15 usec. Transactions are more expensive of course. So, with 64-bit Erlang, the 4 GB limit is removed. You have to be a bit careful, though, because the data structures tend to use up much more memory. Using binaries is a pretty good option, and it's not so much less convenient, given Erlang's bit syntax. Regards, Ulf W |
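For readers who have not seen a ring benchmark, the structure described in this post (N processes in a ring, a message passed around it, average cost per hop measured) can be sketched as below. This is a Python approximation using OS threads and `queue.Queue`, not the Erlang program that produced the figures above; `run_ring` and its parameters are hypothetical names. OS threads are far heavier than Erlang processes, so this sketch is only meant to show the shape of the test, not to reproduce ring sizes in the millions, and unlike the test described above it times message passing only, not spawn cost.

```python
import queue
import threading
import time


def run_ring(n_nodes, n_laps):
    """Pass a countdown token around a ring of n_nodes threads.

    Returns (total_hops, elapsed_seconds). Each node reads its inbox and
    forwards the token, decremented, to the next node in the ring.
    """
    inboxes = [queue.Queue() for _ in range(n_nodes)]
    finished = threading.Event()

    def node(index):
        next_inbox = inboxes[(index + 1) % n_nodes]
        while True:
            remaining = inboxes[index].get()
            if remaining is None:      # shutdown signal
                return
            if remaining == 0:         # token has made all its hops
                finished.set()
            else:
                next_inbox.put(remaining - 1)

    threads = [threading.Thread(target=node, args=(i,)) for i in range(n_nodes)]
    for thread in threads:
        thread.start()

    total_hops = n_nodes * n_laps
    start = time.perf_counter()
    inboxes[0].put(total_hops)
    finished.wait()
    elapsed = time.perf_counter() - start

    for inbox in inboxes:
        inbox.put(None)
    for thread in threads:
        thread.join()
    return total_hops, elapsed


if __name__ == "__main__":
    hops, seconds = run_ring(n_nodes=50, n_laps=20)
    print(f"{hops} hops, {seconds / hops * 1e6:.1f} us per hop")
```

The per-hop figure printed here depends entirely on the machine and the Python runtime, so it should not be compared directly with the microsecond numbers quoted in the post.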