How OKR Helped Gmail Reach 1 Billion Users: an Interview with Itamar Gilad
What exactly does OKR look like in practice at the company that popularized it? I know many readers are curious about OKR at Google, so I’m happy to introduce you to Itamar Gilad, who, as former Lead Product Manager and Head of Growth at Gmail, helped, among other things, launch the Tabbed Inbox feature and grow Gmail from 400 million to 1 billion users. As a product management and strategy coach, trainer, and speaker, Itamar has developed a number of tools to help teams make better product decisions. My blog editor Melissa Suzuno recently sat down with Itamar to discuss his experiences with OKR at Google.
Tell us a little about yourself.
Itamar: I was trained as a software engineer and worked as a software developer and engineering lead/manager for five years. Then I realized that managing developers wasn’t my thing and switched to product management, which was probably a better fit, because I kept doing it for the next 15 years at various companies, including Google and Microsoft. At Google I was a product lead and Head of Growth for Gmail. Since I left Google three years ago, I’ve been working as a product management and strategy coach, writer, and speaker.
What was the OKR process like for Gmail when you were there?
Itamar: Google uses multi-level OKRs: Company, Product Area (which is similar to division—Android, Chrome, and Geo are examples of Product Areas), and Product (such as Gmail, Maps, and Chrome). Then inside the products there may be other sub-levels, but at a minimum each product team (typically 3–15 people) should have its own OKRs. At the time I was at Google some people used personal OKRs and some didn’t.
You would expect that with this many levels the process would be complicated and time-consuming with long rounds of reviews, but actually it isn’t. Different product areas and products may use different approaches. Google generally trusts employees and teams to do the right thing and rarely enforces one process across the board.
At Gmail typically during the OKR season, you would get a draft of the overall Gmail OKR for the next quarter from your managers. Some of the things in it would be derived from higher levels and some were Gmail specific. You would then discuss with your peers what OKRs were applicable to your area of responsibility and copy those parts (Objectives, Key Results, or both) into your OKR doc, but you could and should also create your own OKRs. Some of your key results might propagate up to the product OKR, product area OKR, or even the company OKRs.
For example, as Head of Growth, I would project the monthly and weekly active users of Gmail by the end of the quarter, and those numbers often appeared in Google’s OKRs. A lot of the discussion and negotiation was done over email. In a typical OKR cycle we would spend a few hours (all together) at team level and a few days higher up the hierarchy.
We also used shared OKRs to coordinate teams, groups, and products around a shared goal. It’s a very powerful tool when used right.
Can you give us an example of what that looked like in practice?
Itamar: When I joined Gmail, we were talking about driving up engagement with the product among casual users, people who use email for personal purposes. At the time, Facebook was getting a lot of engagement, and I guess that focused our attention on this key metric. They’re very different products, and in hindsight it probably wasn’t the right basis of comparison.
The Objective was easy—we wanted to raise the level of engagement of casual users. But when it came to the Key Results, it wasn’t that simple. Should it be about reading more messages? That wasn’t necessarily a good Key Result because reading more messages might be a result of getting more promotional email and social notifications. The users may be engaged, but for the wrong reasons.
Another metric we considered was messages sent. However, people don’t necessarily need to send that many personal emails these days. They have chat and social media, so a low number of emails sent doesn’t necessarily mean that email isn’t engaging.
So we realized that maybe we were asking the wrong question. We conducted research (we interviewed people and did quantitative analysis) and found that people had a lot of promotional and notification emails that cluttered up their inboxes and made it hard for them to find the messages they really cared about.
We realized the main goal we should set was not driving engagement, but driving engagement with the right type of email—enabling people to read just the emails that they really cared about and interact with the other ones in a different way.
So we set up an OKR around that. Something along these lines:
Casual users have their most important mail front-and-center
- Achieve a % of “Primary” messages read > X%
- Achieve a false positive rate < Y%
- Achieve a false negative rate < Z%
False positives are messages we thought would be important, but turned out not to be so. False negatives are messages that we thought were not important, but it turned out they were.
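As a rough illustration of how key results like these could be measured, here is a minimal Python sketch. It is not Gmail's actual method: the `Message` fields, the `key_results` function, the sample data, and the choice of the standard classifier definitions (false positive rate = FP / (FP + TN), false negative rate = FN / (FN + TP)) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # All three fields are hypothetical labels invented for this sketch.
    predicted_important: bool  # the classifier placed the message in "Primary"
    actually_important: bool   # assumed ground truth, e.g. from user feedback
    read: bool                 # the user opened the message


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def key_results(messages: list[Message]) -> dict[str, float]:
    """Compute the three example key results under the assumed definitions."""
    tp = sum(m.predicted_important and m.actually_important for m in messages)
    fp = sum(m.predicted_important and not m.actually_important for m in messages)
    fn = sum(not m.predicted_important and m.actually_important for m in messages)
    tn = sum(not m.predicted_important and not m.actually_important for m in messages)
    primary_read = sum(m.predicted_important and m.read for m in messages)
    return {
        "primary_read_rate": ratio(primary_read, tp + fp),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
    }


# Made-up sample: 4 messages in Primary (2 read, 1 not really important),
# 4 messages elsewhere (1 of which was actually important).
sample = [
    Message(True, True, True),
    Message(True, True, True),
    Message(True, True, False),
    Message(True, False, False),
    Message(False, False, False),
    Message(False, False, False),
    Message(False, False, True),
    Message(False, True, False),
]

results = key_results(sample)
print(results)
```

Each value can then be compared against the X, Y, and Z thresholds in the OKR. A team might instead define the false positive rate as the share of Primary messages that turned out to be unimportant; the interview does not say which definition was used.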
That led to a big project that ended up launching the Tabbed Inbox, which puts your email into Primary, Social, Promotions, and other tabs. Initially it was a feature we planned to launch just on desktop. As we started testing, we realized the problem was even bigger in mobile. We concluded this needed to be a shared OKR with the Android and iOS Gmail teams.
At the time, Gmail Android was part of the Android product area, a completely different division in Google that had its own set of objectives, so I reached out to them. I presented the problem and asked if we could share the OKR of relieving the pain of those casual email users, and luckily they agreed.
When I teach OKRs, I say the good thing about shared OKRs is that sometimes you get a “no” and then you know that’s something you shouldn’t focus on, at least this quarter. But in this particular case, they said yes, and we got to co-launch the new inbox across Gmail clients. We touched hundreds of millions of people with this very big and visual change that was very well received, thanks to the power of shared OKRs.
How would you describe alignment at Google, including cross-team alignment?
Itamar: Of all the companies I’ve worked with or at, Google is the one where the people are the most “in the know.” People understand what the company is trying to achieve, and that helps with cross-team communication as well as communication up and down the hierarchy. That’s partly because Google management puts a lot of effort into sharing both the goals and the motivation behind them. That doesn’t mean that we always agree (there’s often discussion, which is very important as well), but you can talk to anyone and they will probably know why their product has certain goals set.
I’d also say the process creates a lot of cross-alignment. For example, if your product is busy improving performance in emerging markets, there’s a good chance that peer products have inherited the same goal, and that makes it easier to collaborate (which is important because there are a lot of integrations between Google products and systems). If you’re all pulling in completely different directions, the discussion becomes more difficult.
How does all this information about OKRs get communicated?
Itamar: There were quarterly all-hands meetings. The company-level OKR reviews were streamed live and recorded. The CEO and senior vice presidents would go through the previous quarter’s OKRs and how we did, and then outline the OKRs for the next quarter. There was extreme transparency.
On a product level, such as for Gmail, we’d have a product-specific all hands to discuss our OKRs. On a team level, we’d also have meetings to review OKRs, so there’s a lot of communication. Once you create your OKRs, they’re all transparent, so anyone can open anyone else’s OKRs and there’s never any secrecy about them. It’s very easy to understand what people are working on.
In your view, how could Google improve the way it uses OKR?
Itamar: Google started using OKR at a time when output OKRs (OKRs that specify activities rather than outcomes) were still the norm. For this reason you could encounter output OKRs at Google during my time there, though I know things have improved since.
Output OKRs are something I encounter a lot in nearly all companies. Most people are conscious that specifying “Launch X” is not a good key result and try to avoid it, but there are many ways to disguise output OKRs as outcome OKRs—I should know, I did it, too.
The classic scenario is you fall in love with an idea, you build a project around it, and the OKR is a bit of an afterthought. We already convinced ourselves that this is the right idea, but now we need an OKR. If it’s very hard for you to write OKRs, it could be because you went one step too far and you already decided on the activities.
There are subtle ways people work around this issue. For example, take a key result like “In the next quarter we expect 45% of our users to use the new onboarding flow.” It sounds great (it’s an outcome, it’s measurable), but who actually said that this new onboarding flow is a good thing?
This OKR is actually disguising the fact that our mission is to launch the new onboarding flow. So that’s not a good outcome. A good outcome would be based on what the new onboarding flow would provide to users—would it shorten the onboarding time? Would it increase the success rate of onboarding? Any other benefit? This kind of OKR is okay only if you’ve already tested it very thoroughly and have very strong evidence that the new onboarding flow is actually helping these other key metrics.
Google also relies a lot on big themes to drive the company strategy. For example, in the past, Mobile-first/Mobile-only was about building features and capabilities for people that were mostly or only using the product on mobile. Today that seems like a no-brainer, but in the early 2010s, this was a good theme to focus on.
The themes are often reflected in the OKRs as goals. The risk there is that sometimes people build things just because they align with the theme, even if they don’t really help the users or business.
An example would be a predominantly desktop product that freezes desktop product development in order to focus on mobile-first/mobile-only users, even though there’s no evidence that this market segment is about to become important enough to justify such a big investment.
There’s an assumption that alignment with the theme is sufficient evidence, but in my view that’s rarely the case. There are many products that are built based on themes (think blockchain, chatbots, VR…) and have no merit. The right approach (and I assume the one Google believes in) is that you should let the theme direct you, but you should only set goals that make sense to your product and market.
Even when people create good, outcome-based OKRs, they often struggle with deciding what to do next, which activities to work on. And they tend to make decisions without data, based only on someone’s opinion—usually the boss’s. What should teams do after creating OKRs?
Itamar: The key problem is that, in tech, as well as in many other industries, reality is increasingly uncertain, complex, and fast changing. Markets are evolving quickly, competitors are coming in and out, technologies are changing. We never really know whether our software can do what we want it to do, how many bugs there will be—there’s a lot of uncertainty.
I see a lot of companies not dealing with this reality very well. They rely a lot on top-down planning, decision-by-committee, opinions, weak heuristics, and cognitive biases. This is why I developed the GIST framework to help companies systematically drive their products towards business and market impact. The system has four layers: Goals, Ideas, Steps, and Tasks. Goals include OKRs and metrics. Ideas are about collecting and evaluating many ideas fast using evidence. Steps are mini-projects developing the idea and testing it at the same time (implementing the principles of Lean Startup and Design Thinking), and tasks are the day-to-day activities that implement the steps. You can read more about GIST here. My book on the topic is coming out this autumn.
The Confidence Meter is a tool that I developed to allow people to evaluate the strength of the evidence in support of their ideas. For example, opinions and thematic support (the blue area on the upper right) give us weak evidence and therefore very little confidence. Tests and experiments (dark red on the upper left) give us strong evidence and much more confidence. The GIST framework guides you to test ideas iteratively in build-measure-learn loops and then re-evaluate them using this tool.
You can download the tool free (with instructions) here: confidence calculator.