by Steve Rivers
A few months ago, I wrote a column in MusicBiz regarding the use of callout and some of its pitfalls. As a PD, I used callout, along with requests and store sales, as programming weapons to triangulate hit songs in order to build power rotations. I did this to make sure my stations were playing the hits.
MusicBiz has assembled quite a panel to answer some of your callout questions: Bob Lawrence from Pinnacle Media Worldwide, Mark Ramsey from Mercury Research, Janis Kaye from Marshall/Kaye Research, Todd Wallace from Todd Wallace and Associates, Jonathon Little from Troy Research, Guy Zapoleon from Zapoleon Media Strategies and Garry Mitchell at COM Quest.
RIV: I'm guessing that a combination of avoiding telemarketers, the "no-call list", Telezappers and people living busy, complex lives have made it difficult to get people on the phones to complete a research call. How have you dealt with this problem?
GARRY: We have developed a couple tricks in the manner in which we order sample from SSI (the same company Arbitron uses to recruit diary keepers). We have also introduced some new technologies that have been effective at getting people to pick up the phone in the first place. I could tell you more about that, but then I would have to kill you. Lastly, we continue to dial like crazy. Sometimes it literally takes 10,000 dials to get an in-tab sample of 100.
MARK: It has not really been a problem for me. Sure, cooperation is down over the years, but that's true of all varieties of research across all industries.
TODD: Largely, it is a numbers game. It simply requires spending more hours making more calls to snorkel for respondents, which could mean random cold calls, or using semi-qualified leads from listener databases.
JONATHON: Troy Research has offered traditional research (AMTs, callout and perceptuals) along with all our online products during our existence. In recent years, our online products have become our primary focus (Weekly Music Tracking, 500 Song Library Tests, Focus Groups, Online Perceptuals). We began moving away from traditional callout a couple years ago as the difficulty (and the cost) of recruiting regular folks who could pass the screener and take the surveys increased. Caller ID and no-call lists were the first serious issues, but the amazing increase in young listeners with cell phones and no landlines meant callout had been eclipsed by technology. It was cost-prohibitive to get enough P1s with random phone calls. Representative samples could not be found.
RIV: How many times do you use the same respondent in callout surveys?
BOB: While we don't do callout and only offer OnlineTRACKER, we believe the same respondents should not be used with great frequency. It is important to avoid "the golden nose syndrome," which creates respondents who believe they are special and their opinions are more valuable than anyone else's. This biases the research. Online samples are much larger, making it easier to invite the freshest respondents and avoid the problem.
GUY: You should use a respondent only a few times, but these days that is becoming a virtual impossibility with the difficulty of getting a large enough callout database, so as long as you space out the times you ask this person to take the survey (say, three weeks in between), you should be able to use this person more often.
JANIS: We prefer no more than three times. If you're using 100 samples each week, then get 25% new respondents into the sample each week.
MARK: I don't do callout, but I suspect that most companies allow respondents to participate more than they will acknowledge publicly. When you get a live one, you don't want to toss it back in the ocean. On the other hand, the average callout respondent wants to participate only about three times, as I recall, before they remove themselves. I know of at least one company that cheats in this regard, and I suspect many others do likewise.
GARRY: No more than six times per year, maximum; with a rest period of a minimum of 30 days between participation.
TODD: Unlike some researchers, I have no problem with allowing the same respondent to repeat participation in a callout music survey--as long as they are qualified through a proper filter. After all, Nielsen television research has been successfully using "panels" for years. Now Arbitron has given its stamp of approval to repeating panels for its PPM initiative. If it's valid enough for Nielsen and Arbitron to base their future on, why shouldn't it be a valid method for conducting callout music research? Remember, callout music research is a survey of both popularity and TIMING--we want to know how target listeners feel about each of the songs we are checking THIS WEEK. So let them participate as long as they want; they'll tell you when they don't want to do it anymore. Some actually look forward to doing it. The net result is you'll have larger sample sizes than you would by insisting on some arbitrary "rule" that a respondent needs to "rest" for a week or two between surveys.
RIV: Are you finding yourself using listener databases from contest winners and other promotional activity to build your respondent lists?
MARK: Steve, companies have been doing that for a decade. All companies. And don't let anyone tell you different.
BOB: Databasing is clearly an important method, and using them offers a way to find the same people who agree to participate in keeping a diary. These people are also the most willing participants, who are annoyed less by the disturbing phone calls. OnlineTRACKER actually invites listeners who sign up on the station's Web site. They are, indeed, very active listeners and are truly the ones who help identify hits and stiffs faster and more reliably than any other sample. It's most like the real world, and maintaining real-world reliability is most important.
TODD: Frankly, I have always felt comfortable including listeners from other databases in the respondent stew. Again, the key--especially if you're mining respondents from a contest sign-up list--is making sure each prospective respondent is fully qualified. In other words, you want to properly filter for age, gender, station arrays, TSL and any other pet crosstabs you like to zero in on, like workplace listening or dayparts or lifegroups. Some researchers seem to think that unless respondents are gathered with absolutely "pure" random sampling, the results will be tainted or contaminated. When you think about it, the important thing is finding target-demo listeners who listen to the station array or the music life-group you want. It generally doesn't matter if you find them by way of a "pure" random sample, or a contest database, or trolling through the request-lines (as long as the respondent still fits that filter). I do believe it's a good idea to limit the composition of "impure" respondents in your sample, so it's not skewing all one direction. But I've got to say, I've done side-by-side tests of pure-sample vs. database-derived respondents and found no appreciable differences in any of the key readings.
One caveat: When you have a sample comprised of only people who've called your request line, you'll tend to see that it's a little more active than a randomly found active/passive sample. But that actually helps you get a faster fix on your music marketplace, which is, I think, a desirable skew. In this case, it's a short-cut that costs you less and arguably gives you sharper insight.
GUY: Most radio stations are going to any and all databases to pull usable respondents. In my work with radio stations, virtually every single callout company I've worked with is struggling to get database, and most have asked for the station database in advance of working with the station.
GARRY: Never. Generally, the client stations we deal with are quite exacting as far as how we recruit. Contest winners, Internet recruiting, referrals of friends and family, etc. are all verboten.
RIV: The main concern I keep hearing from programmers is the complaint about "wild swings" in weekly and bi-weekly reports on songs. How do you suggest they handle these?
JANIS: Wild swings on songs can likely be traced back to the sample in the survey changing in some way from week to week. You should examine the screening requirements.
Some companies take 100% cume and half P1s to the station. The other half could come from stations outside the format target. If the composition of these non-P1 stations leans one way one week and another the next, this could cause certain songs to "wobble" from week to week. For instance, you could have a Mainstream A/C target station, and the non-P1s could lean more toward Country one week and more toward Hot A/C or Rock the next. A Country crossover song could look great one week, then drop lower the next, depending on the lean of those non-P1 stations. We usually require that the non-station P1s be screened to like the target format, or come from the close music competitors, to ensure a consistent target sample from week to week.
BOB: The larger samples that come from online research versus callout have been proven to exhibit far less statistical wobble than in traditional callout. Whereas callout offers 80-100 people using two-week rolling averages, and may contain only 40 respondents in some cases, OnlineTRACKER has hundreds in the sample each week.
JONATHON: We saw wild swings when PDs were using 40 fresh respondents bundled into an 80 rolling average. A new sample of 40 persons is just too small. We handled that issue by combining two methodologies into a product we call Hybrid Online / Telephone Music Testing. Traditional telephone surveying is used to reach 50 potential listeners who are not already P1s to the station. At the same time during the same week, our online music tracking is used to gather feedback from the station's loyal P1s, who are almost impossible to find randomly dialing the phone. We combine those results to provide the perfect mix of music data from "P1 and Potential" listeners.
TODD: One way is to activate large enough samples to justify weekly research, instead of bi-weekly or every three weeks the way some stations do. When we started doing callout all those years ago, one of our main objectives was to pinpoint the local timing of every individual song we were playing or considering for airplay. The questions from "back in the day" were: Are the key passion indicators strong enough to justify early airplay? Has the song achieved enough critical mass in positive familiarity to justify intensified exposure in a hotter rotation? (Put the wrong song in hot rotation too early and it kills your TSL.) Is burnout becoming a significant factor? Well, when your data is a two-week rolling average, it's very difficult to get a clear answer to questions like these.
GUY: There are ways to handle this. One: I always recommend people do two-week averages of the key demos. Two: The higher the percentage of P1s, the more stability in the results (although you get wobbles in all callout). The larger the sample sizes, the fewer wobbles you would get from week to week, but I'm talking 200 people or more to get to that confidence level.
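The link between sample size and week-to-week wobble can be roughed out with the textbook margin-of-error formula for a proportion. The sketch below is an illustration only: it assumes a simple random sample and a worst-case 50/50 split on a song, neither of which is claimed by any panelist, and the function name is hypothetical.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample of n respondents (an idealization;
    real callout samples are screened and often reused).
    p=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes mentioned in the discussion: 40, 80 and 200 respondents.
for n in (40, 80, 200):
    print(f"n={n}: +/- {margin_of_error(n) * 100:.1f} points")
```

Under these assumptions, a 40-person sample carries a margin of roughly 15 points, while 200 respondents bring it down to about 7, which is one way to read the call for larger samples.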
MARK: Pay what the stuff is worth. Don't cheap it out. Buy more respondents. Pay what it takes to have it done right.
GARRY: Force their callout vendor to source all the data they provide. I can't overstress how important this is. If you're merely looking at compiled results (i.e. song scores), and you have no idea about the people who comprise that sample, you might as well be using a dartboard! Your callout provider should be sending you (at the very least), the first names, phone numbers, demo/gender and station listening information for every respondent, every cycle. If they can't (or won't), well, there's your first red flag. We have literally caught some callout companies selling in-tab samples of "80" completes to stations, when they are actually only completing 40 "new" calls (the remainder of the sample are the 40 they already sold you from last week). Hence, the nonsensical scores and wild swings week to week.
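To make the "80 completes" arithmetic concrete, here is a minimal sketch of a two-week rolling sample. The ratings and the function name are invented for illustration; no vendor's actual method is implied.

```python
def two_week_roll(weekly_ratings):
    """Pool each week's fresh respondents with the prior week's.

    weekly_ratings: list of lists, one list of song ratings per week
    (hypothetical 1-5 scores). Returns one summary per rolled report.
    """
    reports = []
    for prev, curr in zip(weekly_ratings, weekly_ratings[1:]):
        pooled = prev + curr
        reports.append({
            "reported_n": len(pooled),   # what the report calls the sample
            "fresh_n": len(curr),        # calls actually completed this week
            "mean": sum(pooled) / len(pooled),
        })
    return reports

# Hypothetical data: 40 respondents rate a song 4 one week, 40 rate it 2 the next.
week1 = [4] * 40
week2 = [2] * 40
print(two_week_roll([week1, week2]))
```

In this made-up case the report shows a sample of 80 and a middling score of 3.0, even though only 40 of those opinions are new and the fresh respondents rated the song a 2.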
RIV: It seems to always take ballads a lot longer than other types of music to "kick in" with callout. What is the average time-span today to ascertain when a song, ballad or otherwise, is a hit?
GUY: That's the way it has been since the beginning of music history. Ballads are the slowest to test and the longest hits to stick around. People don't give songs nearly the chance to become hits that they did, and a lot of that comes from expecting too much from callout. Traditional callout can't reliably tell you the potential of a hit by looking at the potential or mean score when the song is less than 70-80% familiar. That's why Internet callout and predictive systems like Promosquad HitPredictor are better ways to tell if a song has the potential to be a hit. To see if a song is a hit, it may take as long as 10 weeks. In the case of a niche format like Hot A/C, when a Mainstream Top 40 isn't also playing the song, I've seen it take up to nine months! As Richard Harker of Harker Research once pointed out to me, you need to look at the "likes" as much as the "favorites." Early on in callout, if a song isn't really negative and has enough likes (as well as favorites early on) it's worth holding on to.
BOB: We have found that, generally, four weeks is enough time to test a song to see if it is truly a hit. OnlineTRACKER? will usually be able to identify these songs even faster than callout, since the respondents are very active listeners...and finding hits and stiffs faster is always a good thing!
GARRY: I'm not sure there's any hard and fast rules on how long it takes a "hit" to surface in callout, but there are a couple tips: If the song is highly unfamiliar in callout, it's probably too soon to be testing it (we suggest a minimum of 100 daytime spins before testing new music). Also, I believe the passive nature of callout tends to generally favor the uptempo and foreground songs more so than the ballads. This is really a generalization, because then something like Mariah Carey's "We Belong Together" comes along, and is so far above and beyond the scores of anything else being tested in Top 40, that it destroys that theory!
JONATHON: We have online clients who begin testing a song when it reaches 100 spins. Others won't test until it has reached 200 spins. If a PD really believes in a ballad that still isn't showing after 200 spins, he/she may stay with it through 300 spins and test again.
RIV: Is this due to splintered exposure due to the cable music channels, iPod listeners or generally radio's reluctance to play new music first?
BOB: Radio has, indeed, become more conservative due to the need to play only hit product and take fewer risks, as a result of more choices and new technologies.
GUY: Yes, radio does not own music listening like it once did. Its share is dramatically down from 20 years ago, and that is a huge reason why getting a song familiar through radio airplay takes so much longer. The other reason is that getting traditional callout to have the accuracy it once had is impossible, since people will not get on the telephone to take a survey due to abuse by telemarketers, etc. That means the chance of getting the most accurate sample, a true random sample, is virtually nil, so we're stuck with milking a database with much smaller numbers than random calling would provide. So you see less accuracy in representing the market's population, and bigger wobbles from survey to survey.
MARK: It's due to the fact that the average radio listener is not qualified to rate new music, yet that's what we ask them to do. It would be like testing a new iPod on your dad. New music should be tested among those with a taste for it and an aptitude to tell the good (i.e., hit-likely) stuff from the bad (i.e., stiff-likely). To my knowledge, nobody...nobody...does their callout this way.
JONATHON: Personal observation: It's both. It now takes longer than ever to find out if you have a hit. There's the phenomenon of splintered exposure coupled with radio PDs/MDs who, for nearly 10 years, have based adds on criteria other than 1) fit and 2) hit potential. Now that the monthly check and/or the free concert aren't part of the add, their two primary criteria for adding a song are gone. So it's easy to just let somebody else break the new music. Man, do I miss those days when two Top 40s would battle it out to see who could make the hit first with what they considered to be a song that was a perfect fit for the station and the audience.
GARRY: Since we're primarily qualifying respondents based on terrestrial radio cume and preference listening (we've never used iPod, satellite or cable radio as screening criteria, but sometimes do use music montages or clusters), it surely doesn't help radio's cause to not be on the leading edge of playing new music. Without the exposure of new music by radio, it undoubtedly is going to take longer to see encouraging test scores from much of this new music in callout results.
RIV: What is your view on Internet "callout", yielding huge samples of what seems to be largely P1 and heavy P2s?
BOB: Online research has clear, definitive advantages. Sample size is larger, the audience is real (and we know that since they have signed up on the station's Web site), there is less wobble from week to week, and the real hits are identified much faster. Using online research enables programmers to see data immediately, even as the studies are still in progress. You are E-mailed at various intervals during the survey and you can examine data then, or wait until completion. The process is entirely customized for every programmer's needs. The cost of online research is another real advantage. Rather than costing $20-80k a year, OnlineTRACKER is offered at a fraction of traditional callout costs.
MARK: When it comes to new music, P1s and heavy P2s are all that count. These folks lead the rest literally and figuratively. We should not fight this. We should embrace it.
All callout for new music should occur online, bar none.
GARRY: Internet music and perceptual testing delivers hefty samples without any of the heavy lifting of weekly callout (that is the good news). The downside, if there is one, is that you are primarily "preachin' to the choir"--your station's P1s (namely, those who are computer-literate, have Internet access, and the time and desire to surf to your Web site). Many of our callout clients use the Internet as a valuable adjunct to weekly callout; it does provide an "early-indicator" of Positive Acceptance and Burn from your P1s, but so far we have been unable to devise a reliable formula to project those scores to represent the market as a whole. Early resistance (5+ years ago) from PDs regarding Internet testing centered primarily around a fear that competitors (or devious record company reps!) would hack into your test and skew the results, but most savvy programmers have now educated themselves better on how this stuff actually works, and that doesn't seem to be the issue it was early on.
TODD: I think it has great potential for at least two reasons, maybe three. 1) These humongous samples tend to minimize the impact of any moles in the system. In callout, if you have a couple of moles in your 60-sample survey, that could sway the results 4 or 5% (depending on how the sample is weighted). But if your main competitor slams your Internet study with 10, 20, or even 50 bogus votes, that's a very small drop in the bucket out of five or 10,000 other legitimate voters. 2) Aren't P1s and heavy P2s the most important "types" of listeners we can attract? What better way to attract them and satisfy them than to limit the snorkeling filter to only those listeners who are going to contribute the most to our TSL! Doesn't that make sense? 3) One of the side-benefits of Internet research is that regular listeners (or at least those who surf your Web site) get the impression that THEY can contribute. When listeners are persuaded to participate in your weekly Internet survey, they tend to "buy in" to the music you play (because they think they've helped select it). That's a great benefit. If you can get 10,000 of your listeners to think they have a stake in your music selection process, they are bound to tell a friend about how cool your station is.
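The "drop in the bucket" point comes down to what share of the total sample the bogus votes make up. A quick sketch, using the figures quoted above and a hypothetical function name; it ignores any weighting, which is one reason a couple of moles could move a weighted 60-person callout by more than their raw share.

```python
def bogus_share(bogus, legit):
    """Fraction of the total sample made up of bogus responses (unweighted)."""
    return bogus / (bogus + legit)

# Two moles in a 60-person callout sample (58 legitimate respondents).
callout = bogus_share(2, 58)
# Fifty bogus votes alongside 10,000 legitimate online voters.
online = bogus_share(50, 10_000)

print(f"callout: {callout:.1%} of the sample")
print(f"online:  {online:.1%} of the sample")
```

Unweighted, the two moles are a little over 3% of the callout sample, while the 50 bogus votes are about half a percent of the online one.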
JONATHON: Company President Bill Troy invented Internet Music Tracking close to 10 years ago while he was working with Kurt Hansen at Strategic Media Research in Chicago. He launched his concept with RadioResearch.Com, the world's very first online music testing product, in 1997. For the first four years, when callout was still very healthy, we did dozens of side-by-side comparisons: callout data next to online data. Callout had 70-80% P1s in its sample; online had 85-90% P1s in its sample. The data was nearly identical. In a typical 30 song list, the top 10 songs and the bottom nine or 10 songs were in the identical rank order. So both systems were demonstrating the obvious hits and the songs of least value. The 10 songs in the middle were close, but not always identical. Song 13 on one list might have been 14 on the other. Or 19 might have been 17 or 18 on the other.
Sample sizes were typically 80 in the callout survey, 100+ in the online survey. That convinced us of the validity of online data. In fact, some early believers, like Tracy Johnson at Star 100.7, after using Troy Research for a few months, began telling us that they considered the Internet survey participants more representative of their audience than callout or AMT respondents.
GUY: There are some excellent companies doing Internet research. Pinnacle Media Worldwide's Online Tracker is the service I'm most familiar with (although both Troy Research and RateTheMusic provide very good products as well). What I've seen is excellent results that give the same results that traditional callout gives six weeks later for most of the songs tested. These services work hand in hand with the radio station, to build a listener database (the same as traditional callout companies do now) to rate hooks on the internet.
Like anything else, it's how much you put into this system. If you get a minimum database of at least 10,000 people and reward them with prizes, you can get dependable results that dovetail in time with traditional callout. One pitfall: don't try to get a huge sample each week; otherwise you'll be reusing the same people each week and diminishing the accuracy of the system. Yes, it's mostly P1, but P1s give you 70% of your ratings, so what's wrong with that! I believe that traditional callout gets the 50% of your audience that doesn't hang on the Internet and is slower to adopt new music (a lot of the same folks Arbitron is using). Internet callout uses the 50% of your listeners who are most passionate about music and more on top of new trends; that's why the results are ahead of traditional callout. Some people say who cares about these folks since they don't mirror Arbitron like traditional callout does. Well, these people on the Internet are 50% of the audience and the most active people who influence the other 50% who aren't on the Internet as much. These people are the ones who will determine the future.
RIV: Regarding online research, is it your view that as long as you realize what makes up the sample--in short, if you know what you're looking at--it has value even if you are preaching to the converted?
MARK: I think what you mean is...is any sample okay as long as you view it with regard to the limitations of its composition? If that's what you mean, my answer is...no. Bad research is unambiguously bad. I would do my stuff online long before I'd trust a research firm with small samples of repeat offenders and contest entrants rolled across multiple weeks. That is an exercise in futility. The fact that stations don't budget properly for callout tells you all you need to know about how it's valued.
JONATHON: It is of great value, especially to programmers who no longer have $50,000 a year for traditional research. The P1s who make up 80-90% of most Internet music panels are very much like Arbitron diary keepers--they are opinion-givers. They are also the 30% of a station's audience who contribute 70-80% of the reported listening. Why not pay careful attention to what they want, at the same time keeping in mind that "burn" and "popularity scores" are slightly higher because the online panel is composed of a high percentage of loyal listeners?
GARRY: Yes, as long as you keep in mind that the sample is comprised of (active) P1s (with computer and Internet access), and don't try to project the results to represent the entire market (or even to replicate the passive nature of callout research), it can be a valuable tool and "early indicator".
TODD: I think so, just as long as you know what's in the sausage. With regard to whether you include "impure" respondents or not, if I were programming a current-intensive music format, I'd take the larger sample size anytime. I'd much rather field a weekly callout survey comprised of, say, 50 "pure" respondents who might be repeat panelists, plus 30 respondents from a fully re-qualified contest list, plus 30 respondents gleaned from your request lines (for a total sample of 110 target-demo respondents, who albeit might lean slightly active). I'll take that over a rolling average or bi-weekly survey of 60 "pure" randomly-acquired respondents, where the test scores are bouncing all over the place. The "wild swings" and "sample wobbles" make it very difficult to use callout research with the kind of precision that you and I are used to (especially what we used to do back in the good old days when we had the luxury of 150-sample sizes every week).
BOB: Our objective at Pinnacle Media Worldwide is to remain as goal-oriented as possible. Of course, any sample offers some telemetry, but the idea behind weekly tracking (whether callout or online) must be to identify the audience giving you most of your listening and determine what they like and what they don't. The point of research is to serve your P1s and get them to listen longer--and make every effort to identify differences in P2s and try to convert those P2s without sacrificing P1s.
GUY: No, not if you don't know how to interpret it. A bad sample is a bad sample is a bad sample, and not worth looking at. Chances are it could send you in the wrong musical direction entirely, and you're better off using your gut or other measurement tools to find the hits. I see a lot of great songs lost because callout doesn't show positive results early on with new music. When you take that together with label politics, and pressure on programmers to add newer songs, a lot of great songs get lost in the shuffle.
RIV: What is the best advice you have for PDs and MDs who continue to see shrinking budgets to best make the case for continued callout programs?
BOB: OnlineTRACKER offers the most reliable, most up-to-the-minute and most affordable answer to a problem that simply will not get better for traditional callout. Our people take the surveys when they want versus when the phone rings. At this time, for those not convinced, we would tell them to try OnlineTRACKER for 90 days and see what others have seen. Eventually, callout will become cost-prohibitive.
GUY: I know more and more people will go with Internet callout as time goes on. You can get it so much more cheaply, and eventually it will be a much more productive way to get people to respond and take music surveys. Respondents can take it at their convenience no matter what time of the day it is, and research companies will continue to find better ways to gather and verify the sample. So for now I'd say use both, traditional and Internet callout. Make sure you spend enough money to get a good sample each week (80 is barely enough to look at the total sample, with much less reliability when you split out the sample into demos and P1s). When you are setting up traditional callout, stick to your guns to avoid research companies taking shortcuts and be realistic with screeners and quotas. If you don't do this, you'll be looking at the road to bad information with bad ratings to follow.
JONATHON: It is difficult to make a case for continued callout as it has been done for 30 years. It's become very expensive, and the call center can't easily deliver a large enough sample (80 per week minimum) that's representative of a station's audience. Our hybrid telephone/online testing product can save 20-25% over regular callout, while delivering a larger sample (100 P1s and 50 non-P1s) and a more representative sample. For less than $10,000 a year a station can replace its callout with 50 weeks of Online Music Tracking featuring custom reporting (Z Scores, Standard Deviations, Net Positives, etc), trending, and suspect data detection.
MARK: Companies need to put their money where their mouth is. If you can't make your case to your bosses, then there isn't a case to be made. Give up. Do it online.
TODD: The best way to get bang for your buck is do your callout in-house. That carries with it several side benefits. Your snorkeling process can generate tracking studies (weekly, bi-weekly, or monthly ratings) with all kinds of great insightful camera angles. Plus, you can attach a few perceptual questions to your music study, to help you gain a better perspective on other programming elements. It is a pain to supervise properly, but at least you're getting regular feedback from your audience.
GARRY: Couple thoughts: 1) Source all your data. If you can't see who your research company is talking to (names and phone numbers of all respondents), your research results aren't worth diddly-squat (send me your callout money, and I'll flush it down the toilet for you); 2) Reallocate your callout resources. If you don't have enough money to deliver reliable samples of 40 in-tab each week (producing a minimum of 80 completes in adjacent weeks), then re-schedule your callout cycles to cover the most important weeks: those prior to and during the first two phases of the Spring and Fall books; 3) Supplement a more limited callout program with Internet music and perceptual research; 4) Consider getting value-added mileage out of the by-products of your weekly callout. For example, many of our clients actually market to (direct mail and telemarketing) people who cume or prefer one of their competitors, but don't qualify for their own callout (you do have access to all the names and phone numbers in your sample, don't you?--see #1, above); 5) Quit doing callout research if you don't have enough money to do it right. Some PDs believe bad research is better than no research at all, but I disagree. Most PDs' guts would do a better job of picking the hits than some of the shoddy research we see out there.
I'd like to thank our callout panel--and you reading this. I hope you've found some useful information. If you would like to speak with any of these experts, E-mail firstname.lastname@example.org and I will be happy to furnish you with an E-mail address.
Most of all, I want to encourage you to read Gerry Cagle's latest rant to remind all of us, that once upon a time programmers and music directors were excited to discover the next hit song, and jocks sold it on the air with excitement because they loved it. People actually called radio stations and begged to hear it. It didn't take plasma TVs, Best Buy coupons, or anything else to get it on the radio. Just the magic in the grooves.
All the best this year. Family. Friends. Health.