Error with stdev array - getting wrong answer

E

Ethan Brown

I'm trying to use a stdev array to calculate the stdev for rows within a list.

The formula references a template spreadsheet where it looks for IDs in column A that match the unique value in A# of the current spreadsheet. The values that I want the stdev of are located in column AO of the template

= STDEV(IF(('[Averaging Data Template.xlsx]Template'!$A$1:$A$1000=A10)*('[Averaging Data Template.xlsx]Template'!$AO$1:$AO$1000<>""),'[Averaging Data Template.xlsx]Template'!$AO$1:$AO$1000))

What is odd is that I'm getting a different value for stdev (224.1236) than if I filtered the template for only the data I wanted, pasted it in a new spreadsheet so I only had the data that matched A10, and then calculated the stdev of those values (223.5701).

Any clue why the values would be different? There are ~400 records in the template file. Some rows that match A10 do not have a value in in column AO.

And I am using ctrl+shift+enter to make it an array.
 
J

joeu2004

Ethan Brown said:
=STDEV(IF(('[Averaging Data Template.xlsx]Template'!$A$1:$A$1000=A10)
*('[Averaging Data Template.xlsx]Template'!$AO$1:$AO$1000<>""),
'[Averaging Data Template.xlsx]Template'!$AO$1:$AO$1000))

What is odd is that I'm getting a different value for stdev
(224.1236) than if I filtered the template for only the data
I wanted, pasted it in a new spreadsheet so I only had the
data that matched A10, and then calculated the stdev of those
values (223.5701). [....]
And I am using ctrl+shift+enter to make it an array.

Darn! That was my first guess: that you had not pressed ctrl+shift+Enter.
To be sure, please confirm that you see curly braces around the entire
formula in the Formula Bar (i.e. {=formula}).

On theory: some of the "numbers" in AO1:AO1000 might be text, not numeric,
even if they appear to be numeric. When you copy and pasted those cells, it
might have corrected the problem automagically.

Otherwise, I don't see any "obvious" error in the formula. So it would be
helpful if you could upload example Excel files that demonstrate the
problem. It would be even better if you abstract the problem into a single
Excel file.

Upload the file(s) to a file sharing website and post the URL of the
"shared" file in a response here. The following are some free file-sharing
websites; or use your own.

Box.Net: http://www.box.net/files
Windows Live Skydrive: http://skydrive.live.com
MediaFire: http://www.mediafire.com
FileFactory: http://www.filefactory.com
FileSavr: http://www.filesavr.com
FileDropper: http://www.filedropper.com
RapidShare: http://www.rapidshare.com

-----

Some observations unrelated to your problem....


Since STDEV ignores empty cells and cells that contain text (like the null
string), the formula can be simplified as follows:

=STDEV(IF(('[Averaging Data Template.xlsx]Template'!$A$1:$A$1000=A10),
'[Averaging Data Template.xlsx]Template'!$AO$1:$AO$1000))

Also, I suspect you should use STDEVP, not STDEV. The latter is used when
you have a sampling of data that represents a larger population, and you are
trying to estimate the population std dev.
 
E

Ethan Brown

Joeu2004,

Thanks for the quick reply. I'm still a novice with arrays so I appreciate the advice and your help. I think your point on the stdevp is a great one.

I'm not sure however that stdev in an array automatically ignores the blanks.

I took your advice and uploaded a dummmy version of the data in rapidshare. I've only copied the data for one ID, but you'll see that there are multiple rows associated with the ID, some of which has data and some of which doesn't. When I use the STDEV formula, I get one answer which requires me to use "<>" in my array formula to replicate. Am I missing something, or is there an error in the formatting which is causing me to have to use the "<>" portion of the array formula?

https://rapidshare.com/files/2781698737/Stdev_Test.xlsx?bin=1
 
J

joeu2004

Ethan Brown said:
I'm not sure however that stdev in an array automatically
ignores the blanks.

I 'spose you could ready the STDEV help page <wink>. To wit:

"If an argument is an array or reference, only numbers in that array or
reference are counted. Empty cells, logical values, text, or error values in
the array or reference are ignored."

However, for a very different reason, I was wrong with my suggestion that
you can avoid the IF condition (C1:C36<>"") in your array formulas, using
your new example.

I will discuss that below. But I am not sure that is related to your
original problem.

If you still need help, please upload an example file that demonstrates your
original problem and post the URL here.

PS: Note that your newsreader or newserver changes the subject line each
time you post. That is confusing. Please try to work around that either by
setting an appropriate option, if one is provided, or by copying the
original subject line of the message you are responding to, adding the "Re:"
prefix if necessary.



Ethan Brown said:

You have sparse values in C2:C36; and the cells without values are empty (no
constant and no formula). Also, all the values in A2:A36 match the values
in E3 and E4, which are the same.

You have the following formulas and approximate results:

F2=0.127789: =STDEV(C2:C36)
F3=0.127789: =IF(OR(COUNTIF(A1:A36,E3)=1,E3=""),"",
STDEV(IF((A1:A36=E3)*(C1:C36<>""),C1:C36)))
F4=0.202208: {=IF(OR(COUNTIF(A1:A36,E4)=1,E4=""),"",
STDEV(IF((A1:A36=E4),C1:C36)))}

0.127789 is the correct result. You can demonstrate that with the formula
=STDEV(C3,C9,C13,C23,C36).

That demonstrates that STDEV does indeed ignore empty cells.

You can demonstrate that it also ignores cells with text, notably the null
string ("") that appears blank, by changing the range in F2 to C1:C36, which
is consistent with the range in F3 and F4. Note that A1 contains text, the
column title.

-----

However, there is still the question: why do F3 and F4 return different
results?

And that begs the question: why does F3 work at all?!

Note that F3 is not array-entered, as it should normally be.

It works only by accident because the formula is in row 3, and A3=E3 and
C3<>"". Consequently, STDEV(IF((A1:A36=E3)*(C1:C36<>""),C1:C36)) is
evaluated as STDEV(C1:C36).

If you copy the F3 formula from the Formula Bar (not the cell) and paste it
into, say, F8, you would see that the formula returns an error when you
press just Enter. It returns the correct result only when you press
ctrl+shift+Enter.

It is difficult to explain why the non-array-entered formula seems to work
(misleadingly) in F3. It is the reason why named references work in normal
formulas. It is treated as what I call an array-indexed formula: the row
number of the formula determines the values used from array references in
the formula.

If you don't understand that, no matter. It is not what you want to do,
presumably. However, it might explain why some array formulas "work" (i.e.
return a value, albeit often bogus) in one context, but return an Excel
error or the wrong value in another context. That is why array formulas are
risky to use, IMHO.

Let's move on to the formula in F4....

F4 is correctly array-entered.

It returns the wrong value because for C1:C36 corresponding to the
non-matching A1:A36, Excel substitutes zero in the IF() array result instead
of FALSE or "empty" (the null string). For example, for row 2, it is
equivalent to =C2. Note that =C2 returns zero when C2 is empty.

Consequently, the array-entered formula in F4 calculates
STDEV(0,0,C3,0,...,0,C9,...) instead of STDEV(C3,C9,...). Of course, the
zero-valued data throws off the std dev computation understandably.

Again, this is a mistake with my off-the-cuff suggestion to simplify the
original formula. Presumably it has nothing to do with your original
problem.

PS: F4 __seems__ to work when not array-entered as an accident of
implementation for the same reason that F3 seems to work. It is a
misleading illusion due to its position relative to the array references.
Always array-enter formulas that include expressions of the form
STDEV(IF((A1:A36=E4),C1:C36)).
 
J

joeu2004

Errata.... I wrote regarding the normal-entered array formula in F3:
It is treated as what I call an array-indexed formula:
the row number of the formula determines the values used
from array references in the formula. [....]
it might explain why some array formulas "work" (i.e.
return a value, albeit often bogus) in one context, but
return an Excel error or the wrong value in another context.

I meant to say: that is why some array formulas that
__are_not_array-entered__ seem to "work" in one context, but not in another.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top