More Newb Questions: About XBRL Financial Reporting

Nov 6, 2012 at 5:17 PM
Edited Nov 6, 2012 at 5:18 PM

I thought I'd document my newb experiences here so other newbs might be spared the pain.

By late yesterday I was lost, though not in any way related to Gepsio. I tried the example code documented in the "Using Gepsio from C#" blog post with the URI included in the post's notes (just after the code), and the sample code worked. Next, I wanted to try it on some other data, but I didn't know how to construct a URI from the data EDGAR presents as the result of a search. So I figured I'd download the principal XML file (the first one) and start there. Loading it failed with a file-not-found exception: the XSD file was missing. So I started to download the next file in the list, labeled ".xsd", but I stopped short when I saw that it had a .xml file extension. Totally confused, I decided to back up for an orientation to EDGAR.

I spent the whole morning today on EDGAR and other web sites and found nothing helpful. So I started experimenting with the Gepsio sample program. I went back, downloaded the file labeled .xsd (but with a .xml file extension), and renamed it with a .xsd extension. Gepsio got farther. One by one, I found that Gepsio was happy only after I had downloaded all six files listed as XBRL constituents, and then it ran to completion. I moved the files to a folder of their own; Gepsio was happy, and I was happy.

Conclusion: Gepsio's XbrlDocument.Load(filename) will accept either a URI or local file name as an argument.

Question No. 1 -- Is that the extent of the possibilities?  (I can't imagine what other possibilities might exist, but I thought I'd ask.)

Question No. 2 -- What scheme does EDGAR use to construct a URI from known quantities? It seems as though a URI might be composed of the EDGAR CIK, the accession number without hyphens, the accession number with hyphens, and a standard suffix of index.htm. Please point me to where I can find out how to do this.

Nov 6, 2012 at 6:37 PM

Thank you for your observations! I will write a blog post to explain more information, so that you can have some background.

What is the URL of the EDGAR document that you were trying to load through Gepsio? I will use that document as an example in the blog post.

Thank you for considering Gepsio!

Nov 6, 2012 at 7:38 PM

Great! Here's the URL: Hah! Now that I'm looking at the URL of the EDGAR page I was looking at, and from which I downloaded the individual files, I see that it is exactly the URL that XbrlDocument.Load() should use as the argument string. IBM's CIK is the first directory following data. The next directory is the accession number without hyphens, and the one after that is the accession number with hyphens. The whole business ends with -index.htm. Is that pattern universal? Thanks for making the sun shine.
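If the pattern above holds (the thread doesn't establish that it's universal), building the URL is plain string assembly. Here is a minimal Python sketch. `edgar_index_url` is a hypothetical helper, and the accession number in the example is made up. Note that the `-index.htm` page is an HTML listing of the filing's documents, and the XBRL instance document is one of the files it links to. This sketch therefore doesn't settle which URL `XbrlDocument.Load()` actually needs.

```python
def edgar_index_url(cik: int, accession: str) -> str:
    """Build a filing-index URL from a CIK and a dashed accession number,
    following the directory pattern observed in this thread (assumed, not
    verified to hold for every filing)."""
    digits = accession.replace("-", "")
    # Accession numbers are 10 + 2 + 6 digits, e.g. 0000051143-12-000999.
    if len(digits) != 18 or not digits.isdigit():
        raise ValueError(f"unexpected accession number: {accession!r}")
    return ("https://www.sec.gov/Archives/edgar/data/"
            f"{cik}/{digits}/{accession}-index.htm")


# IBM's CIK is 51143; the accession number here is hypothetical.
print(edgar_index_url(51143, "0000051143-12-000999"))
# https://www.sec.gov/Archives/edgar/data/51143/000005114312000999/0000051143-12-000999-index.htm
```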

Nov 8, 2012 at 5:21 PM

What I learned from my last couple of work sessions with an IBM 10-Q from EDGAR:

  1. Suppose I'm interested in Revenues. It is not sufficient to get the Revenues Item and retrieve its Value. In my 10-Q, there are exactly 16 Revenues Items, and if I had taken the first one found, it would have been last year's year-to-date figure rather than this year's. You must also examine the ContextRefName to see if it's the context you are interested in. Apparently, there is a convention employed (by IBM, at least) where the ContextRefName string has three parts: (1) a "D" or "I" character coding for "Duration" or "Instant"; (2) a date, as YYYY-MM-DD; and (3) a descriptive suffix. I don't know whether this is used universally, so it's probably better practice to examine the properties of the context that ContextRef points to, querying start and end dates for durations and instant dates for instants.
  2. Some of the Item names are ridiculous. For example, the name for "Net Income Before Taxes" is IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest. I know, I know. It's undoubtedly a US-GAAP standard, and I shouldn't blame XBRL for it.
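Matching on the period rather than on the spelling of the context ID can be sketched in Python as follows. `Context`, `Fact`, and both filter functions are hypothetical stand-ins for what a parser such as Gepsio exposes. They are not Gepsio's own types, and the figures are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Context:
    """Hypothetical stand-in for an XBRL context (not Gepsio's Context class)."""
    id: str
    start: Optional[date] = None    # startDate, for duration periods
    end: Optional[date] = None      # endDate, for duration periods
    instant: Optional[date] = None  # instant, for instant periods


@dataclass(frozen=True)
class Fact:
    concept: str       # e.g. "Revenues"
    context: Context
    value: float


def facts_for_duration(facts: List[Fact], concept: str,
                       start: date, end: date) -> List[Fact]:
    """Keep facts for `concept` whose context spans exactly start..end,
    however the filer chose to spell the context ID."""
    return [f for f in facts
            if f.concept == concept
            and f.context.start == start and f.context.end == end]


def facts_at_instant(facts: List[Fact], concept: str, when: date) -> List[Fact]:
    """Same idea for balance-sheet (instant) facts."""
    return [f for f in facts
            if f.concept == concept and f.context.instant == when]


# Hypothetical data: two Revenues facts whose context IDs follow different schemes.
this_ytd = Context("D2012Q3YTD", start=date(2012, 1, 1), end=date(2012, 9, 30))
last_ytd = Context("FROM_Jan01_2011_TO_Sep30_2011",
                   start=date(2011, 1, 1), end=date(2011, 9, 30))
facts = [Fact("Revenues", last_ytd, 75.0), Fact("Revenues", this_ytd, 77.0)]

matches = facts_for_duration(facts, "Revenues", date(2012, 1, 1), date(2012, 9, 30))
print([f.value for f in matches])  # [77.0]
```

Because the filter compares dates, it doesn't care how a filer names its contexts. It still returns every Revenues fact for that period, however, including dimensional ones. That is the problem raised further down this thread.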

No new questions today.

Nov 9, 2012 at 7:31 PM
Edited Nov 9, 2012 at 7:33 PM

Well, I'm stalled, and I do have a question today, because I need someone to steer me in the right direction. Here's my problem: referring back to my previous post, I searched for all Revenues Items in the 10-Q filing and ignored every Item that didn't line up with the period of interest, 2012Q3. Still, I found four Revenues Items with different contexts, including the one I wanted. But how do I choose among the four programmatically? I thought perhaps there was a US-GAAP standard. Then, peeking inside a Microsoft filing, I found that Microsoft used a context-naming scheme completely different from IBM's. Inspecting the other members of the Context class, I saw nothing that would help me pick the context I wanted from the remaining four.

In my application, I generally need only a handful or two of the items found on a Consolidated Statement of Earnings and a Consolidated Statement of Financial Position. I thought that information might help me choose, but I could find no linkage between those statements and contexts. (I did find the statement names in a list of schemas in the fragment, but no links to specific context identifiers.)

Question No. 3 -- How do I sort this out?  (Even a reference to a web page or book would help.)

Nov 14, 2012 at 8:07 PM
Edited Nov 14, 2012 at 8:17 PM

I sought help on the xbrl-public forum and attracted the help of none other than Charles Hoffman, the "Father of XBRL." But it has all come to naught. Here is a copy of what I posted on that forum within the hour:

After spending the morning perusing the VBA code Charles referenced, I concluded that it does exactly what I was doing via Gepsio with respect to data extraction, and that it would yield exactly the same result with IBM's context conventions. If it took the first of multiple matches for Revenues, it would be fortuitous, because the first happens to be the correct one. For other facts, it would not be so lucky.

I thought perhaps I could employ some heuristics to screen the remaining selections, so I decided to check out what other companies were doing. I started with a third-quarter 10-Q for Amazon. Imagine my shock when Amazon had no Revenues fact, nor any of the other facts my application requires. They use what are, to me, facts with synonymous names but, I suspect, meaning slightly different things to a CPA. I probed no further, because that told me I would need a "profile" describing the conventions used by each company I was interested in. I decided against that, because each profile would be good only until the company decided to change its conventions. Better to do the job manually from a hard or screen copy of the report. In other words, I decided to abandon XBRL.

In case you're wondering what my application is: in the mid-1960s, I learned of a technique taught in IBM's Executive Development School for assessing return on investment and planning for it. The technique was invented by Louis R. Mobley and is founded on the DuPont equations. Initially, I plotted IBM's ROI position manually on graph paper. Then I wrote an APL program to plot it. A few years ago, I converted it to Excel VBA. My intent was to exploit XBRL as a data source, but that has come to naught. The output of the Excel VBA code is exhibited here:

I want to thank Charles and Paul for trying to help me and to emphasize that the problem is not due to XBRL or Gepsio.  Rather, I lay the blame on the lack of common standards for filings.  So, what's new?

So, I want to thank Jeffrey for his help. I find no fault with Gepsio; it worked well and as advertised. I don't foresee doing any more work with XBRL.
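The thread doesn't spell out Mobley's exact formulation. As a hedged illustration, here is the textbook DuPont identity in Python: return on investment (taken here as return on total assets) equals profit margin times asset turnover. It also shows why the application needs facts from both the statement of earnings and the statement of financial position. The function name and figures are hypothetical.

```python
def dupont_roi(net_income: float, revenue: float, total_assets: float):
    """Split return on total assets into profit margin and asset turnover,
    per the DuPont identity: ROI = margin * turnover."""
    margin = net_income / revenue        # both from the statement of earnings
    turnover = revenue / total_assets    # assets from the statement of financial position
    return margin, turnover, margin * turnover


# Hypothetical figures, in $ billions.
margin, turnover, roi = dupont_roi(net_income=10.0, revenue=100.0, total_assets=50.0)
print(f"margin={margin:.1%} turnover={turnover:.2f} roi={roi:.1%}")
# margin=10.0% turnover=2.00 roi=20.0%
```

Plotting margin against turnover, with ROI as their product, is one natural way to chart an "ROI position" from just a few XBRL facts.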

Nov 14, 2012 at 8:13 PM

Thank you for the feedback!

I apologize for the delay in my responses. I have been on vacation for a week. I see that you got your questions answered, in one way or another. I think that, as I read through what you had to say, the issues are with the inconsistencies built into the XBRL taxonomies rather than with Gepsio itself.

Thank you for considering Gepsio!

Nov 14, 2012 at 8:27 PM

I was beginning to feel it was hopeless when no one here or at xbrl-public had anything to contribute. That was at the point where, after screening for the correct time period (context), I still retrieved four facts for Revenues. But when I found out that Amazon used a separate and different term in lieu of Revenues, which I had used for IBM, I knew that my intent was doomed. Maybe I'm naive, but I can't see how the SEC can succeed with its intended use of XBRL without seriously better filing standards.

Nov 14, 2012 at 8:30 PM

I'm a software developer by trade, and not an accountant, so I am most likely speaking out of turn ... but it seems as though the XBRL technical syntax has built a design language but people haven't yet agreed on the consistent usage of the language across domains.

Nov 14, 2012 at 8:56 PM

Charles Hoffman just posted this, which doesn't agree with my assessment. But he doesn't get down to earth enough for me. Does this mean anything to you?

So you may be misinterpreting my code. I don't grab the first fact, I can actually grab the CORRECT fact every time from any filing; and in fact have done so.

STEP 1: Grab the value of the concept dei:DocumentPeriodEndDate. There is ALWAYS one concept, and in 99% of filings the filer has this CORRECT (meaning that 1% of the time the filer has this WRONG, this COULD be caught by SEC validation and COULD be CORRECT 100% of the time). This value is the most current balance sheet date.

STEP 2: Grab the concept "net cash flow" (usually us-gaap:CashAndCashEquivalentsPeriodIncreaseDecrease or us-gaap:CashPeriodIncreaseDecrease or a similar concept, you can find that 98% of the time). That gives you the candidate PERIODS for the income statement and cash flow statement. Get the contexts which have the endDate which matches the balance sheet date. Get all of them. In what I do, I want the most current period YEAR-TO-DATE numbers. So, I look for the longest period by looking at the startDate relative to the endDate; the LONGEST one is the YTD numbers. (As a side note, you can use this method to then figure out any other periods you have to work with.)

Once I know the balance sheet data and the period of the most current YTD income statement, I can reliably grab key facts from filings.

STEP 3: Specify the fact.

STEP 4: (OPTIONAL IF DIMENSIONS EXIST) You have to fiddle around with the dimensions assigned, they tend to be inconsistent, but you CAN write an algorithm to find the correct fact. Some dimensions/[Axis] are explicit, some are implied, sometimes goofy dimensions/[Axis] are added; but the set of stuff you have to ferret through is not huge. It can be done.

STEP 5: Because not all financial reports have one root reporting entity (the vast majority do), you may need to know WHICH of a set of multiple reporting entities/legal entities you want to grab. But this is a relatively small number of filings.

BOTTOM LINE: You can figure this out relatively easily, and I am not even a programmer. It is a nuisance, and better direction from the SEC would VASTLY improve this.

From there, all sorts of tricks exist to figure out all kinds of useful information. It is just that there are so many annoying things which are being done which need to be worked around.
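Hoffman's Steps 1, 2, and 4 can be sketched in Python as follows. This is a simplified reading of his description, not his code. `Fact` is a hypothetical record and the data are invented. Step 4 is reduced to "prefer the fact whose context carries no dimensions," which he warns won't cover every filing. Step 5 (multiple reporting entities) is not handled.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple

Period = Tuple[date, date]  # (startDate, endDate) of a duration context


@dataclass
class Fact:
    """Hypothetical fact record (not a Gepsio type)."""
    concept: str
    period: Period
    value: float
    # Explicit dimension members from the context: axis QName -> member QName.
    dimensions: Dict[str, str] = field(default_factory=dict)


def current_ytd_period(document_period_end: date,
                       net_cash_flow: List[Fact]) -> Optional[Period]:
    """Steps 1-2: among net-cash-flow facts whose period ends on
    dei:DocumentPeriodEndDate, return the longest period. All candidates share
    the same end date, so the longest one has the earliest start date."""
    candidates = [f.period for f in net_cash_flow if f.period[1] == document_period_end]
    return min(candidates, key=lambda p: p[0]) if candidates else None


def pick_fact(facts: List[Fact], concept: str, period: Period) -> Optional[Fact]:
    """Steps 3-4, simplified: the single fact for `concept` in `period` whose
    context has no dimensions; None if zero or several qualify."""
    remaining = [f for f in facts
                 if f.concept == concept and f.period == period and not f.dimensions]
    return remaining[0] if len(remaining) == 1 else None


# Hypothetical filing whose dei:DocumentPeriodEndDate is 2012-09-30.
end = date(2012, 9, 30)
ytd, q3 = (date(2012, 1, 1), end), (date(2012, 7, 1), end)
cash = [
    Fact("CashAndCashEquivalentsPeriodIncreaseDecrease", ytd, 1.0),
    Fact("CashAndCashEquivalentsPeriodIncreaseDecrease", q3, 0.5),
    Fact("CashAndCashEquivalentsPeriodIncreaseDecrease",
         (date(2011, 1, 1), date(2011, 9, 30)), 2.0),
]
segment_axis = "us-gaap:StatementBusinessSegmentsAxis"
facts = [
    Fact("Revenues", q3, 25.0),
    Fact("Revenues", ytd, 77.0),
    Fact("Revenues", ytd, 30.0, {segment_axis: "example:HardwareMember"}),
    Fact("Revenues", ytd, 47.0, {segment_axis: "example:ServicesMember"}),
]

period = current_ytd_period(end, cash)
print(period)  # (datetime.date(2012, 1, 1), datetime.date(2012, 9, 30))
print(pick_fact(facts, "Revenues", period).value)  # 77.0
```

The `example:` member names are hypothetical. Real filings may attach dimensions the filer considers implied, in which case the "no dimensions" rule needs a list of axes to ignore.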



He speaks of "concepts." I understand what that is conceptually, having read about it in some of his writings, but I am unable to relate it to anything I have seen in an XBRL filing or in Gepsio's document object model.

Nov 14, 2012 at 10:28 PM

Then there's this:


Looks like Charlie has provided you a sound approach.  I’ve not tried it myself but certainly looks well thought out.

At the same time, I’m curious how you could get better results using a “hard- or screen-copy of the report”. That sounds like screen scraping, which depends on captions that are used inconsistently and often have different meanings.

And the captions Amazon used are no more informative than the XBRL tagging, which has been the same since their first XBRL filing; i.e., if a profile were necessary, it would have been unchanged since July 2009, at least as it relates to this sample. Our observation is that filers don’t frequently change the elements they use, and when they do, it is obvious to the user.

The approach I would have used here is to walk the calculation tree that Amazon provides with its filing to identify the likely revenue element, which is us-gaap:SalesRevenueNet. In this case, it is pretty straightforward. I’ve seen some pretty ugly tagging, and this doesn’t qualify. This is the calculation ELR for Amazon’s statement of operations.

Alternatively, I would walk the US GAAP taxonomy calculation tree to find the best fit absent the use of the revenue element you are looking for.  Start with revenue and then stop when you find an element match among its children.  This approach may be less reliable today and we (the FASB) are working to create a unified calculation hierarchy that will better support this exercise.  We hope to publish it soon.  Some have created their own calculation hierarchies to meet their specific requirements but this does work.  We have used it with good success.

J. Louis Matherne
Chief of Taxonomy Development
Financial Accounting Standards Board | 203-956-5229 |
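Matherne's second approach, walking down from revenue until an element the filer actually used turns up, is a breadth-first search. Here is a minimal Python sketch, assuming the calculation hierarchy has already been loaded into a parent-to-children dictionary. The function name is hypothetical, and the tree fragment below is illustrative, not copied from the US GAAP taxonomy.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def first_used_descendant(calc_children: Dict[str, List[str]], start: str,
                          used_in_filing: Set[str]) -> Optional[str]:
    """Breadth-first walk down a calculation hierarchy from `start`, returning
    the first element (start itself included) that the filing reports."""
    queue = deque([start])
    seen = {start}
    while queue:
        element = queue.popleft()
        if element in used_in_filing:
            return element
        for child in calc_children.get(element, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return None


# Illustrative fragment only; NOT copied from the US GAAP taxonomy.
tree = {
    "us-gaap:Revenues": ["us-gaap:SalesRevenueNet", "us-gaap:OtherIncome"],
    "us-gaap:SalesRevenueNet": ["us-gaap:SalesRevenueGoodsNet",
                                "us-gaap:SalesRevenueServicesNet"],
}
used = {"us-gaap:SalesRevenueNet", "us-gaap:SalesRevenueGoodsNet", "us-gaap:NetIncomeLoss"}
print(first_used_descendant(tree, "us-gaap:Revenues", used))  # us-gaap:SalesRevenueNet
```

Breadth-first order matters here. It returns the used element closest to revenue rather than a deeper component such as goods-only sales.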

My point would be, if they got it right, this kind of tangle-unraveling wouldn't be necessary.

If you find this interesting, I'll keep updating; if you'd just as soon ignore it, I can too.

Nov 15, 2012 at 4:14 AM

If it's all the same to you, I'd love to continue the discussion here. I am coming to XBRL from the point of view of a software developer. I need to know more about the actual business cases and problems that arise in its use. Knowing those issues helps make things like Gepsio better.

In an earlier comment, you asked:

He speaks of "concepts." I understand what that is conceptually, having read about it in some of his writings, but I am unable to relate it to anything I have seen in an XBRL filing or in Gepsio's document object model.

Gepsio does indeed have the notion of a Concept object. I will see what I can do about writing up a small sample that illustrates its use. In short, an Item object contains a property called ContextRef, which references a Concept object that describes the concept for the item.

Nov 15, 2012 at 11:21 AM

I'll be glad to continue the discussion and experiment.

What is the name of Gepsio's concept object? I looked for it specifically in the Help file and even searched for it, but couldn't find it.

Nov 15, 2012 at 12:45 PM

I am putting together a blog post describing how to read context information from facts through Gepsio. I will also show a code sample in C# (and perhaps PowerShell). I will post the article's URL to this thread when it's ready.

Nov 15, 2012 at 12:48 PM

The code will look something like this:

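// Assumes "using JeffFerguson.Gepsio;" and that currentFragment is a loaded XbrlFragment.
foreach (var currentFact in currentFragment.Facts)
{
    if (currentFact is Item)
    {
        // This fact is a single-value item.
        var currentItem = currentFact as Item;
        var contextForCurrentItem = currentItem.ContextRef;
        // Examine properties of contextForCurrentItem here.
    }
    else if (currentFact is JeffFerguson.Gepsio.Tuple)
    {
        // This fact is a multi-value tuple. Fully qualified to avoid System.Tuple.
        var currentTuple = currentFact as JeffFerguson.Gepsio.Tuple;
    }
}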

I wrote that from memory -- I could be wrong -- but it will be something like that.

Nov 16, 2012 at 11:53 AM


That's essentially what I'm doing.  I've verified that I am dealing with an Item and not a Tuple.  After the cast, I examine the ContextRef and ignore any duration or instant that doesn't match what I'm looking for, and I continue the search.  In the dataset I'm using, after this screening, four Revenues facts remain that match the duration (for Revenues) of interest.  So, I anxiously await your introduction of "concept."

According to Hoffman (in XBRL For Dummies, p. 19), a "concept" (in the XBRL sense) means a business term like "Net Income (Loss)." My problem is that this is a definition from a 50,000-foot altitude. (I need a 3,000-foot view.) Is a concept the same as a fact?

The standard suggests a link back to the taxonomy schema that defines the concept. But I don't see any such link in either Gepsio or the raw XML. Then again, maybe I'm just lost.
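For what it's worth, in the XBRL 2.1 instance format the link to the concept is the fact's own element name. A fact such as `us-gaap:Revenues` names its concept through its namespace and local name. The filing's schema, which in turn imports the taxonomy, is reached through `link:schemaRef`, and `contextRef` points at a separate `xbrli:context`. This minimal Python sketch reads a trimmed, made-up instance with the standard library (not Gepsio) to show that. The file name, CIK, value, and us-gaap namespace URI are illustrative.

```python
import xml.etree.ElementTree as ET

LINK = "{http://www.xbrl.org/2003/linkbase}"
XLINK = "{http://www.w3.org/1999/xlink}"

# Hypothetical, trimmed instance document.
INSTANCE = """<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
    xmlns:link="http://www.xbrl.org/2003/linkbase"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:iso4217="http://www.xbrl.org/2003/iso4217"
    xmlns:us-gaap="http://fasb.org/us-gaap/2012-01-31">
  <link:schemaRef xlink:type="simple" xlink:href="example-20120930.xsd"/>
  <xbrli:context id="D2012Q3YTD">
    <xbrli:entity>
      <xbrli:identifier scheme="http://www.sec.gov/CIK">0000000000</xbrli:identifier>
    </xbrli:entity>
    <xbrli:period>
      <xbrli:startDate>2012-01-01</xbrli:startDate>
      <xbrli:endDate>2012-09-30</xbrli:endDate>
    </xbrli:period>
  </xbrli:context>
  <xbrli:unit id="USD"><xbrli:measure>iso4217:USD</xbrli:measure></xbrli:unit>
  <us-gaap:Revenues contextRef="D2012Q3YTD" unitRef="USD" decimals="-6">1000000</us-gaap:Revenues>
</xbrli:xbrl>"""


def list_facts(xml_text):
    """Return the schemaRef target plus (namespace, concept, contextRef, value)
    for each top-level item."""
    root = ET.fromstring(xml_text)
    schema = root.find(f"{LINK}schemaRef").get(f"{XLINK}href")
    facts = []
    for child in root:
        context_ref = child.get("contextRef")
        if context_ref is None:
            continue  # schemaRef, context, unit (and tuples) carry no contextRef
        namespace, concept = child.tag[1:].split("}", 1)  # tag is "{ns}local"
        facts.append((namespace, concept, context_ref, child.text))
    return schema, facts


schema, facts = list_facts(INSTANCE)
print(schema)    # example-20120930.xsd
print(facts[0])  # ('http://fasb.org/us-gaap/2012-01-31', 'Revenues', 'D2012Q3YTD', '1000000')
```

Read this way, a concept is not a fact. A fact is one reported value of a concept, in one context, with one unit.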

I've got a hunch that the four different Revenues facts, each with its own unique context ID, deal with four different parts of the filing. I know the one I want is the one associated with the "Consolidated Statement of Income." But I have no idea how to determine that linkage.
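One place to look for that linkage is the filing's presentation linkbase (one of the XBRL files downloaded earlier), where each extended link role typically corresponds to one statement or note. It lists concepts, not contexts. It can therefore say which statements and notes use Revenues, but not which of the four facts belongs to which. Here is a minimal Python sketch using only the standard library. The role URIs, file names, and trimmed linkbase are hypothetical, and presentation arcs are omitted.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

LINK = "{http://www.xbrl.org/2003/linkbase}"
XLINK = "{http://www.w3.org/1999/xlink}"

# Hypothetical, trimmed presentation linkbase (arcs omitted).
PRESENTATION = """<link:linkbase xmlns:link="http://www.xbrl.org/2003/linkbase"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:presentationLink xlink:type="extended"
      xlink:role="http://www.example.com/role/ConsolidatedStatementOfEarnings">
    <link:loc xlink:type="locator" xlink:label="rev"
        xlink:href="us-gaap-2012-01-31.xsd#us-gaap_Revenues"/>
    <link:loc xlink:type="locator" xlink:label="ni"
        xlink:href="us-gaap-2012-01-31.xsd#us-gaap_NetIncomeLoss"/>
  </link:presentationLink>
  <link:presentationLink xlink:type="extended"
      xlink:role="http://www.example.com/role/ConsolidatedStatementOfFinancialPosition">
    <link:loc xlink:type="locator" xlink:label="assets"
        xlink:href="us-gaap-2012-01-31.xsd#us-gaap_Assets"/>
  </link:presentationLink>
  <link:presentationLink xlink:type="extended"
      xlink:role="http://www.example.com/role/SegmentInformationDetails">
    <link:loc xlink:type="locator" xlink:label="rev"
        xlink:href="us-gaap-2012-01-31.xsd#us-gaap_Revenues"/>
  </link:presentationLink>
</link:linkbase>"""


def concepts_by_role(linkbase_xml):
    """Map each extended link role (statement or note) to the set of concept
    ids its locators point at (the part of xlink:href after '#')."""
    root = ET.fromstring(linkbase_xml)
    result = defaultdict(set)
    for plink in root.iter(f"{LINK}presentationLink"):
        role = plink.get(f"{XLINK}role")
        for loc in plink.iter(f"{LINK}loc"):
            result[role].add(loc.get(f"{XLINK}href").split("#", 1)[1])
    return dict(result)


def roles_containing(mapping, concept_id):
    """Sorted list of roles whose locators include `concept_id`."""
    return sorted(role for role, ids in mapping.items() if concept_id in ids)


for role in roles_containing(concepts_by_role(PRESENTATION), "us-gaap_Revenues"):
    print(role)
```

In this made-up example, Revenues shows up on both the earnings statement and a segment note, so the context still has to break the tie.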

Nov 16, 2012 at 11:56 AM


Are you still working with the instance document you mentioned before? I want to make sure that I am using the same document you're using so I can answer your question properly.

Nov 16, 2012 at 12:14 PM

Just to be sure, here ‘tis.

Yep. That looks the same as your URL.



Nov 16, 2012 at 1:04 PM

P.S., I forgot to ask: Charlie and Louis suggested chasing the calculation tree to distinguish one item from another. I’m capable of tracing trees, but I’m in the dark about what Gepsio class or classes will get me started. Do you have a suggestion?

Nov 22, 2012 at 3:31 PM

I think I've run out of things to try. I spent the morning running my code, peeking at this and that, looking for opportunities to uncover the hierarchy represented in the filing. I keep reading that the hierarchy is in the schemas. However, every reference I've seen from a specific fact points back to the whole blessed schema and not to its components. So that effort was fruitless.
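The hierarchy Charlie and Louis refer to normally lives not in the company schema itself but in the calculation linkbase that the schema references (typically the file whose name ends in `_cal.xml`). A fact points only at its concept, so the tree has to be read from that file separately. Here is a minimal Python sketch that reads the linkbase with the standard library rather than Gepsio, since this thread doesn't show which Gepsio classes, if any, expose it. The trimmed linkbase, role URI, and file names are hypothetical, and prohibited or overriding arcs are ignored.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

LINK = "{http://www.xbrl.org/2003/linkbase}"
XLINK = "{http://www.w3.org/1999/xlink}"
SUMMATION_ITEM = "http://www.xbrl.org/2003/arcrole/summation-item"

# Hypothetical, trimmed calculation linkbase.
CALCULATION = """<link:linkbase xmlns:link="http://www.xbrl.org/2003/linkbase"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <link:calculationLink xlink:type="extended"
      xlink:role="http://www.example.com/role/ConsolidatedStatementOfEarnings">
    <link:loc xlink:type="locator" xlink:label="pretax"
        xlink:href="us-gaap-2012-01-31.xsd#us-gaap_IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest"/>
    <link:loc xlink:type="locator" xlink:label="rev"
        xlink:href="us-gaap-2012-01-31.xsd#us-gaap_Revenues"/>
    <link:loc xlink:type="locator" xlink:label="costs"
        xlink:href="us-gaap-2012-01-31.xsd#us-gaap_CostsAndExpenses"/>
    <link:calculationArc xlink:type="arc"
        xlink:arcrole="http://www.xbrl.org/2003/arcrole/summation-item"
        xlink:from="pretax" xlink:to="rev" weight="1.0" order="1"/>
    <link:calculationArc xlink:type="arc"
        xlink:arcrole="http://www.xbrl.org/2003/arcrole/summation-item"
        xlink:from="pretax" xlink:to="costs" weight="-1.0" order="2"/>
  </link:calculationLink>
</link:linkbase>"""


def calculation_trees(linkbase_xml):
    """Return {role: {parent_id: [(child_id, weight), ...]}} from summation-item arcs."""
    root = ET.fromstring(linkbase_xml)
    trees = {}
    for clink in root.iter(f"{LINK}calculationLink"):
        # Locator labels are only meaningful inside their own extended link.
        concept_of = {loc.get(f"{XLINK}label"): loc.get(f"{XLINK}href").split("#", 1)[1]
                      for loc in clink.iter(f"{LINK}loc")}
        tree = defaultdict(list)
        for arc in clink.iter(f"{LINK}calculationArc"):
            if arc.get(f"{XLINK}arcrole") != SUMMATION_ITEM:
                continue
            parent = concept_of[arc.get(f"{XLINK}from")]
            child = concept_of[arc.get(f"{XLINK}to")]
            tree[parent].append((child, float(arc.get("weight"))))
        trees[clink.get(f"{XLINK}role")] = dict(tree)
    return trees


role = "http://www.example.com/role/ConsolidatedStatementOfEarnings"
for parent, children in calculation_trees(CALCULATION)[role].items():
    for child, weight in children:
        print(f"{parent} <- {weight:+} x {child}")
```

The resulting parent-to-children dictionaries are the kind of input a tree walk, such as the breadth-first search Matherne describes, can start from.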

I resorted to manually searching, by value, for all of the multiple-matching facts of interest in the human-readable .htm file. What I found is that, in most cases, the unwanted contexts are associated with notes in the .htm file. Just to make things interesting, there was an exception: for StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest, all of the unwanted contexts were members of a running calculation in a principal section of the filing. In this single case, tracing the calculation chain might help. But I still have no clue how to do that.