123
 123

  2010-02-09 Tue

22:54 Blob Storage in Innodb (6780 Bytes) » MySQL Performance Blog

I’m running in this misconception second time in a week or so, so it is time to blog about it.
How blobs are stored in Innodb ? This depends on 3 factors. Blob size; Full row size and Innodb row format.

But before we look into how BLOBs are really stored lets see what misconception is about. A lot of people seems to think for standard (”Antelope”) format first 768 bytes are stored in the row itself while rest is stored in external pages, which would make such blobs really bad. I even seen a solution to store several smaller blobs or varchar fields which are when concatenated to get the real data. This is not exactly what happens :)

With COMPACT and REDUNDANT row formats (used in before Innodb plugin and named “Antelope” in Innodb Plugin and XtraDB) Innodb would try to fit the whole row onto Innodb page. At least 2 rows have to fit to each page plus some page data, which makes the limit about 8000 bytes. If row fits completely Innodb will store it on the page and not use external blob storage pages. For example 7KB blob can be stored on the page. However if row does not fit on the page, for example containing two 7KB blobs Innodb will have to pick some of them and store them in external blob pages. It however will keep at least 768 bytes from each of the BLOBs on the row page itself. With two of 7KB blobs we will have one blob stored on the page completely while another will have 768 bytes stored on the row page and the remainder at external page.

Such decision to store first 768 bytes of the BLOB may look strange, especially as MySQL internally has no optimizations to read portions of the blob – it is either read completely or not at all, so the 768 bytes on the row page is a little use – if BLOB is accessed external page will always have to be read. This decision seems to be rooted in desire to keep code simple while implementing initial BLOB support for Innodb – BLOB can have prefix index and it was easier to implement index BLOBs if their prefix is always stored on the row page.

This decision also causes strange data storage “bugs” – you can store 200K BLOB easily, however you can’t store 20 of 10K blobs. Why ? Because each of them will try to store 768 bytes on the row page itself and it will not fit.

Another thing to beware with Innodb BLOB storage is the fact external blob pages are not shared among the blobs. Each blob, even if it has 1 byte which does not fit on the page will have its own 16K allocated. This can be pretty inefficient so I’d recommend avoiding multiple large blobs per row when possible. Much better decision in many cases could be combine data in the single large Blob (and potentially compress it)

If all columns do not fit to the page completely Innodb will automatically chose some of them to be on the page and some stored externally. This is not clearly documented neither can be hinted or seen. Furthermore depending on column sizes it may vary for different rows. I wish Innodb would have some way to tune it allowing me to force actively read columns for inline store while push some others to external storage. May be one day we’ll come to implementing this in XtraDB :)

So BLOB storage was not very efficient in REDUNDANT (MySQL 4.1 and below) and COMPACT (MySQL 5.0 and above) format and the fix comes with Innodb Plugin in “Barracuda” format and ROW_FORMAT=DYNAMIC. In this format Innodb stores either whole blob on the row page or only 20 bytes BLOB pointer giving preference to smaller columns to be stored on the page, which is reasonable as you can store more of them. BLOBs can have prefix index but this no more requires column prefix to be stored on the page – you can build prefix indexes on blobs which are often stored outside the page.

COMPRESSED row format is similar to DYNAMIC when it comes to handling blobs and will use the same strategy storing BLOBs completely off page. It however will always compress blobs which do not fit to the row page, even if KEY_BLOCK_SIZE is not specified and compression for normal data and index pages is not enabled.

If you’re interested to learn more about Innodb row format check out this page in Innodb docs:

It is worth to note I use BLOB here in a very general term. From storage prospective BLOB, TEXT as well as long VARCHAR are handled same way by Innodb. This is why Innodb manual calls it “long columns” rather than BLOBs.


Entry posted by peter | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks

21:03 Introducing percona-patches for 5.1 (6255 Bytes) » MySQL Performance Blog

Our patches for 5.0 have attracted significant interest.  You can read about SecondLife’s experience here, as well as what Flickr had to say on their blog.  The main improvements come in both performance gains and improvements to diagnostics (such as the improvements to the slow log output, and INDEX_STATISTICS).

Despite having many requests to port these patches to 5.1, we simply haven’t had the bandwidth as our main focus has been on developing XtraDB and XtraBackup.  Thankfully a customer (who prefers to stay unnamed) as stood up and sponsored the work to move the patches to 5.1.

To refresh, the most interesting patches are:

Two new features which not available for 5.0:

  • In slow.log for Stored Procedure call you can see profiling for each individial query from this procedure, not just call storproc()
  • With userstat you can get additional THREADS_STATISTICS which show similar information to USER/CLIENT_STATISTICS but per THREAD granularity (it’s useful if you have connection pool)

On this stage the patches are available only in source code, you
can get them from Launchpad https://code.launchpad.net/~percona-dev/percona-patches/5.1.43.  Binaries are also on the way, and will be ready soon. We are running intensive stress testing loads on them to provide stable and quality packages.

And to finalize are results for tpce-like benchmark, where I compare MySQL-5.1.43 vs percona-5.1.43.

The results made for TPCE configuration with 2000 customers and 300 tradedays and 16 concurrent users on our R900 server. The dataset is about 25GB, fully fitting into buffer_pool, so disk does not really matter, but data was stored on FusionIO 320GB MLC card.

On chart with results I show amount of TradeResults transactions per 10 sec during 3600 session (more is better)
tpce-like_2000c_300d

As you see with percona patches you can get just about 10x improvement.
Yeah, that sounds too cool, but let me explain where difference comes from.

As I mentioned in tpce workload details the load is very SELECT intensive and these SELECTS are mainly scans by secondary keys ( not Primary Keys), so it hits problems in InnoDB rw-lock implementations and in buffer_pool mutex contention, which alredy fixed in percona-patches ( and in XtraDB and InnoDB-plugin also).

So you are welcome to try it!


Entry posted by Vadim | One comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks

16:45 T-SQL Tuesday #3: SQL Server Relationships (702 Bytes) » Kalen Delaney
It's time for the third T-SQL Tuesday , managed this time by Rob Farley . I thought I wasn't going to make this deadline, since I have been swamped since the end of last week, and over the entire weekend. However, I've got a break with nothing urgent for the next hour or so, so I thought I could just write something short. My first two T-SQL Tuesday posts dealt with SQL Server history, so I will continue along those lines, and this post will just be a very short history lesson. In honor of the impending...(read more)
03:45 TEDthink (4032 Bytes) » Seth's Blog

Can you factor this? X2-4x+4

If you're like most people, you get a little queasy at the thought. And when you were in tenth grade, you surely wondered why they were bothering you.

(the answer is (x-2) times (x-2), in case you were curious.)

It turns out that the real reason you needed to do this work was to be able to play with numbers in your head. Abstract numerical thought is an important skill among educated people.

Which brings us to TED, a conference held every year in Long Beach. It's going on right now.

Dinosaur001-thumb Watch a few TED videos and try to get ahead of the speaker. They have an idea...it's probably a conceptual tricky idea, one with a lot of moving parts. And there is a lot of shorthand and arm waving ... basically, it's similar to a quadratic equation. If you need the other person to slow down and explain every little bit, you've missed the point. The point is to do abstract conceptual thought. To get in practice taking the accepted status quo and questioning it, at least for a little while, at least this or that part of it.

I think this is a skill, a rare one. The ability to be facile in the manipulation of ideas, both theoretical and established, is a valuable one, and I think the TED videos and art of reading books (at least the first ten minutes of each) are two great ways to getting better at manipulation of ideas. It takes practice, and it's worth it.

I sat in a meeting last week with someone who was 100% tactical. She couldn't let go of the urgency of the moment long enough to envision a different future, even for five minutes. The abstract conceptual part was missing from her part of the conversation.

The trick is to be able to leap to, "if we did A and B, would that get us C? Would C be a good thing? Is it possible to do A and B if we really commit?" and then move on to the next one. And that takes practice. Why wouldn't it?

BONUS: Hugh MacLeod, artist, good friend and creator of the cartoon above, has created four cube grenades about being a linchpin. These are limited editions, first come first shipped. (You can sign up for his free cartoon of the day).

  2010-02-08 Mon

12:45 More webinars coming your way (1810 Bytes) » Inside AdSense
We've just published a new webinar schedule for the next month. In our live webinars, we'll highlight changes you can make to your ad implementation in order to maximize your AdSense revenue. We're also trying something new with our upcoming webinars: we've set up open Google Moderator discussions, where you can raise questions related to the specific webinar topic or vote on questions from other participants. We'll then answer your questions live in the webinar.

To sign up for our webinars, please visit our Help Center. Even if you're not able to attend the live sessions, we'll make sure to upload each webinar recording within 24 hours of the event.

09:30 Introducing tpce-like workload for MySQL (14070 Bytes) » MySQL Performance Blog

We have been using tpcc-mysql benchmark for long time, and there many results published in our blog, but that's just single workload. That's why we are looking into different benchmarks, and one
of them is TPCE. Yasufumi made some efforts to make TPCE working with MySQL, and we are making it available for public consideration.

You can download it from our Lauchpad Percona-tools project, it's
bzr branch lp:~percona-dev/perconatools/tpcemysql

Important DISCLAIMER:
Using this package you should agree with TPC-E License Agreement,
which in human words is:

  • You can't name results as "TPC Benchmark Results"
  • You can't compare results with results published on http://www.tpc.org/ and you can't pretend the results are compatible with published by TPC.

And we are not going to do anything from that, your primary goals is XtraDB/InnoDB performance research and/or compare with available Storage Engines for MySQL.

The workload in tpce is quite different from tpcc. Tpcc is write intensive, while tpce
is read oriented.
To give more details, there is stats for 10 seconds:

CODE:
  1. | Com_select                            | 46272       |
  2. | Com_update                            | 5214        |
  3. | Com_delete                            | 385         |
  4. | Com_insert                            | 3468        |
  5. | Com_commit                            | 5404        |

The result is quite chatty,

CODE:
  1. |    |     [MEE]    | [DM] |                         [CE]                          |
  2. sec. |    TR,    MF |   DM |   BV,    CP,    MW,    SD,    TL,    TO,    TS,    TU | MEEThreads, ReqQueue
  3.       (1st line: count, 2nd line: 90%ile response [msec.])
  4.  260 |   402,    39,     0,   195,   532,   749,   588,   342,   415,   816,    88 | 30, 0
  5.           20,    60,     0,    30,    20,    20,    20,    50,    20,   310,    60
  6.  
  7.  270 |   395,    40,     0,   201,   608,   842,   608,   358,   449,   833,    89 | 30, 0
  8.           30,    40,     0,    30,    20,    20,    20,    50,    20,   300,    50

but it allows you to see count of 11 different transactions per 10 secs and 90% response time.

and final result

CODE:
  1. [TradeResult(TR) transaction]
  2. Succeed: 150243
  3. Lated:   0
  4. Retried: 3
  5. Failed:  0
  6.  
  7. 41.7342 TpsE

where you can see count of successful TR (TradeResult) transactions, and
the summary result in TpsE (transactions per seconds).

Expect our results soon!


Entry posted by Vadim | No comment

Add to: delicious | digg | reddit | netscape | Google Bookmarks

08:04 How FarmVille Scales to Harvest 75 Million Players a Month (982 Bytes) » High Scalability

If real farming was as comforting as it is in Zynga's mega-hit Farmville then my family would have probably never left those harsh North Dakota winters. None of the scary bedtime stories my Grandma used to tell about farming are true in FarmVille. Farmers make money, plants grow, and animals never visit the red barn. I guess it's just that keep-your-shoes-clean back-to-the-land charm that has helped make FarmVille the "largest game in the world" in such an astonishingly short time.

How did FarmVille scale a web application to handle 75 million players a month? Fortunately FarmVille's Luke Rajlich has agreed to let us in on a few their challenges and secrets. Here's what Luke has to say...

04:15 Frightened, clueless or uninformed? (2989 Bytes) » Seth's Blog

In the face of significant change and opportunity, people are often one of the three. If you're going to be of assistance, it helps to know which one.

Uninformed people need information and insight in order to figure out what to do next. They are approaching the problem with optimism and calm, but they need to be taught. Uninformed is not a pejorative term, it's a temporary state.

Clueless people don't know what to do and they don't know that they don't know what to do. They don't know the right questions to ask. Giving them instructions is insufficient. First, they need to be sold on what the platform even looks like.

And frightened people will resist any help you can give them, and they will blame you for the stress the change is causing. Scared people like to shoot the messenger. Duck.

The worst kind of frightened person is one with power. Someone in a mob of other frightened people, someone with a gun, someone who is the CEO. When confronted with a scared CEO, time to run. Before someone can change, they have to learn, and before they learn, they have to cease being scared.

One reason so many big ideas come from small organizations is that there is far less fear of change at the top. One mistake board members and shareholders make is that they reward the scared but hyper-confident CEO, instead of calling him on the carpet as he rages at change.

When I first encountered surfing, I was scared of it. It looks cool, but an old guy like me can get hurt. A patient instructor allayed my fears until I was willing to get started. When you first start out, the things you think are important are actually irrelevant, and it's the stuff you don't know is important that gets you thrown into the ocean. Finally, and only then, was I smart enough to actually learn.

I'm bad at surfing now, but at least I know why.

Comfort the frightened, coach the clueless and teach the uninformed.

  2010-02-07 Sun

04:45 The least I could do (1711 Bytes) » Seth's Blog

One way to think about running a successful business is to figure out what the least you can do is, and do that. That's actually what they spent most of my time at business school teaching me.

No sense putting more on that pizza, sending more staff to that event, answering the phone in fewer rings... what's the point? No sense being kind, looking people in the eye, being open or welcoming or grateful. Doing the least acceptable amount is the way to maximize short term profit.

Of course, there's a different strategy, a crazy alternative that seems to work: do the most you can do instead of the least.

Radically overdeliver.

Turns out that this is a cheap and effective marketing technique.

  2010-02-06 Sat

  2010-02-05 Fri

15:15 Automated Root Cause Analysis » I'm just a simple DBA on a complex production system