Caching XBRL XSD Schemas in Gepsio

Jul 5, 2012 at 7:22 PM
Edited Jul 5, 2012 at 7:35 PM

I've been using Gepsio for about a month now. Thank you very much for this open source effort. However, I found one thing missing: the ability to cache the XBRL XSD schemas locally. It hurts the bottom line of the application where I use Gepsio. I managed to add that caching today. This is how to do it:

1. Go to the project (the JeffFerguson.Gepsio project) and change the "Target Framework" to ".NET Framework 4".

2. Add the following code to XbrlSchema.cs file: 

 

using System.Net.Cache; // This line added for caching support

//...

public class XbrlSchema
{
    //...
    private XmlUrlResolver thisXmlUrlResolver;

    //...
    internal XbrlSchema(XbrlFragment ContainingXbrlFragment, string SchemaFilename, string BaseDirectory)
    {
        thisContainingXbrlFragment = ContainingXbrlFragment;
        this.Path = GetFullSchemaPath(SchemaFilename, BaseDirectory);

        try
        {
            var schemaReader = XmlTextReader.Create(this.Path);
            thisXmlSchema = XmlSchema.Read(schemaReader, null);
            thisXmlSchemaSet = new XmlSchemaSet();

            // ---- START caching with XmlUrlResolver
            thisXmlUrlResolver = new XmlUrlResolver();
            thisXmlUrlResolver.CachePolicy = new RequestCachePolicy(RequestCacheLevel.CacheIfAvailable);
            thisXmlSchemaSet.XmlResolver = thisXmlUrlResolver;
            // ---- END caching with XmlUrlResolver

            thisXmlSchemaSet.Add(thisXmlSchema);
            thisXmlSchemaSet.Compile();

            //...

 

 

Actually, when profiling with the Visual Studio profiler, I found the calls on the XmlSchemaSet object (thisXmlSchemaSet) to be the biggest bottleneck when dealing with lots of XBRL instance files, because XmlSchemaSet always downloads the XSD dependencies from xbrl.org.

A side note: the CachePolicy member of the XmlUrlResolver class is not available in .NET Framework versions earlier than 4.0. That's why you have to switch the project to .NET Framework 4.0.

Hopefully this is useful for others.

Coordinator
Jul 9, 2012 at 12:44 PM

Thank you very much for the suggestion! I, too, have been thinking about XBRL Schema processing optimizations, and this looks like a great idea. Let me do some additional testing with the conformance suite, and, if all goes well, perhaps I can fold this into the main code.

Thank you, also, for working with Gepsio. I'd enjoy hearing more about the application for which you're using Gepsio ... feel free to send me a message (click on my user name and click the "Contact" link on my profile page) and explain how you're using Gepsio. I'd love to hear about it!

Jul 9, 2012 at 1:51 PM
Edited Jul 9, 2012 at 1:52 PM

Hi Jeff, 

Probably the better option for initializing the RequestCachePolicy object is to pass RequestCacheLevel.Revalidate, because it ensures that the cached copy is always up to date: .NET checks the timestamp of the cached copy against the source on the web. This is what MSDN (http://msdn.microsoft.com/en-us/library/system.net.cache.requestcachelevel.aspx) says:

Revalidate: Satisfies a request by using the cached copy of the resource if the timestamp is the same as the timestamp of the resource on the server; otherwise, the resource is downloaded from the server, presented to the caller, and stored in the cache. A copy of a resource is only added to the cache if the response stream for the resource is retrieved and read to the end of the stream. So subsequent requests for the same resource would use a cached copy if the timestamp for the cached resource is the same as the timestamp of the resource on the server.

 

while for the previously suggested RequestCacheLevel.CacheIfAvailable it says:

CacheIfAvailable: Satisfies a request for a resource from the cache, if the resource is available; otherwise, sends a request for a resource to the server. If the requested item is available in any cache between the client and the server, the request might be satisfied by the intermediate cache. A copy of a requested resource is only added to the cache if the response stream for the resource is retrieved and read to the end of the stream. So subsequent requests for the same resource would use a cached copy.
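
As a minimal sketch of the choice discussed here, assuming the resolver is attached to the XmlSchemaSet as in the first post: the two options differ only in the RequestCacheLevel value handed to the RequestCachePolicy. SchemaSetFactory is a hypothetical helper name used for illustration and is not part of Gepsio.

```csharp
using System.Net.Cache;
using System.Xml;
using System.Xml.Schema;

// Hypothetical helper (not part of Gepsio).
public static class SchemaSetFactory
{
    // Builds an empty XmlSchemaSet whose imports and includes are fetched
    // through an XmlUrlResolver that uses the given cache level.
    public static XmlSchemaSet Create(RequestCacheLevel level)
    {
        var resolver = new XmlUrlResolver
        {
            // CacheIfAvailable: use the cached copy without asking the server.
            // Revalidate: ask the server whether the cached copy is still current.
            CachePolicy = new RequestCachePolicy(level)
        };

        return new XmlSchemaSet { XmlResolver = resolver };
    }
}
```

Calling SchemaSetFactory.Create(RequestCacheLevel.Revalidate) instead of SchemaSetFactory.Create(RequestCacheLevel.CacheIfAvailable) is then a one-argument change, which makes it easy to compare the two settings under a profiler.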

Jul 10, 2012 at 11:25 PM
Edited Jul 10, 2012 at 11:27 PM

Hi Jeff/Pinczakko,

I, too, want to cache all the schemas imported from a custom taxonomy schema file.

I am following the "Customizing the XmlUrlResolver Class" article (http://msdn.microsoft.com/en-us/library/bb669135.aspx) to intercept calls to imported schema locations by overriding the ResolveUri method. However, it only gets called once, for the calling extended taxonomy schema, and not for any of its imports, so I can neither store the imports in a cache nor retrieve them from one at that point.

My preference is to have a local cache folder (c:\localcache) for schemas coming from xbrl.fasb.org, taxonomies.xbrl.us/us-gaap and other SEC web sites.

So I don't know how to intercept calls to the schemas indicated in "import" elements.

I am working in an XBRL team where we are modeling XBRL proofing. We are redesigning XBRL processing around XDocument/XElement and LINQ to XML, which is much faster than XmlNode processing (using a DDD architectural style and repositories).

I wrote an extension to Gepsio to do more processing. What was missing was processing of the presentation linkbase, so I had to connect instance facts and contexts with presentation items.

I think Jeff is from the area where I work (Minneapolis), so Jeff, maybe we can talk offline. My email is radoslav attt everestkc dottt net

Rad

Coordinator
Jul 10, 2012 at 11:37 PM

I, too, like the idea of a local disk-persisted cache that Gepsio can control. I'd like to see Gepsio provide the option (since some people might not want caching for whatever reason) of a schema cache that can be loaded from and saved to disk, perhaps in isolated storage. I'll be looking at this issue soon, as I know that schema processing is a bottleneck that can be avoided with some caching. Schema caching makes sense ... how often will the US GAAP taxonomy change? (Not often, especially after publication.) How often does Gepsio download it? (Once per instance.)

I have a blog post planned for this issue. When I post it, I will post the blog post address here.

Keep the great discussion going! Discussions like these make Gepsio better. Thank you to all of you!

Coordinator
Jul 10, 2012 at 11:43 PM

Presentation linkbase support is coming --- that's a discussion for a separate thread. Feel free to open up a new discussion for that!

Jul 13, 2012 at 6:47 AM

Hi all, 

After testing RequestCacheLevel.Revalidate, I found that it still generates more network traffic than the RequestCacheLevel.CacheIfAvailable setting.

 

@raca:

Did you switch the Target Framework to .NET 4.0? If not, you cannot control the caching behaviour. Only .NET 4 has support for controlling caching out of the box.

Jul 13, 2012 at 7:55 AM

Yes, it is a .NET 4.0 project. I had missed a line. I worked it out myself, and when I went back and compared with your solution I realized that I had missed the fact that XmlSchemaSet can have its own XmlResolver. I overlooked this line:
thisXmlSchemaSet.XmlResolver = thisXmlUrlResolver;
So I added this line and it worked. With this I could intercept the Uri in the overridden GetEntity method of my custom class, where I could manipulate the Uri and redirect the "http:" scheme to the "file:" scheme, i.e. use the File.Open method.

My mistake was that I only used my custom XmlResolver when creating the XmlReader (for XmlSchema creation), and that resolver is not used when processing imports:

XmlReaderSettings settings = new XmlReaderSettings();
settings.XmlResolver = resolver;

XmlReader reader = XmlReader.Create(pathToExtensionSchema, settings);

This way we can cache both the custom schema and its imports.
Usually the custom schema is in a local working folder and is modified frequently,
so there is no need to cache it. However, once a filing is done, one can use the remote
SEC location to get it using the “http:” scheme.

Rad
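
The redirect from "http:" to local files described in this post could be sketched roughly as follows. This is an illustration under stated assumptions and not Rad's actual class: DiskCachingResolver and GetCachePath are hypothetical names, the cache folder layout (host name followed by URL path) is invented for the example, and nothing here expires or revalidates a cached file.

```csharp
using System;
using System.IO;
using System.Net;
using System.Xml;

// Hypothetical resolver (not part of Gepsio): serves http/https schemas from a
// local folder and downloads each one only the first time it is requested.
public class DiskCachingResolver : XmlUrlResolver
{
    private readonly string cacheRoot;

    public DiskCachingResolver(string cacheRoot)
    {
        this.cacheRoot = cacheRoot;
    }

    // Assumed layout: http://host/a/b.xsd is stored at <cacheRoot>/host/a/b.xsd.
    public string GetCachePath(Uri absoluteUri)
    {
        string relative = absoluteUri.AbsolutePath
            .TrimStart('/')
            .Replace('/', Path.DirectorySeparatorChar);
        return Path.Combine(cacheRoot, absoluteUri.Host, relative);
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        if (absoluteUri == null)
        {
            throw new ArgumentNullException("absoluteUri");
        }

        bool isWeb = absoluteUri.Scheme == Uri.UriSchemeHttp
                  || absoluteUri.Scheme == Uri.UriSchemeHttps;
        bool wantsStream = ofObjectToReturn == null || ofObjectToReturn == typeof(Stream);
        if (!isWeb || !wantsStream)
        {
            // Local files and anything else keep the default behavior.
            return base.GetEntity(absoluteUri, role, ofObjectToReturn);
        }

        string localPath = GetCachePath(absoluteUri);
        if (!File.Exists(localPath))
        {
            Directory.CreateDirectory(Path.GetDirectoryName(localPath));

            // Download to a temporary name first so that an interrupted
            // download is never mistaken for a complete cached schema.
            string tempPath = localPath + ".tmp";
            using (var client = new WebClient())
            {
                client.DownloadFile(absoluteUri, tempPath);
            }
            File.Move(tempPath, localPath);
        }

        return File.OpenRead(localPath);
    }
}
```

Under these assumptions the resolver would be attached in both places mentioned above, for example settings.XmlResolver = new DiskCachingResolver(@"c:\localcache") for the reader and the same instance for thisXmlSchemaSet.XmlResolver. Because ResolveUri is left untouched, relative imports still resolve against the original "http:" location and only the retrieval step is redirected to disk.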

Aug 20, 2013 at 8:33 AM
Hi,

I'm also having performance problems and I think it is due to the external loading of the Schema.

Is there any working example on how to implement the caching?

Best regards,

Jan
Coordinator
Aug 20, 2013 at 12:50 PM
Thanks for the additional report! I will look into this further.

On the "good news" side, I can report that I am updating Gepsio to support .NET 4.5, and, as noted above, I can look into using the RequestCacheLevel.CacheIfAvailable setting to get the schemas cached. On the "not as good news" side, I am also updating Gepsio to support Windows RT/Windows Store and Windows Phone 8, and XmlSchema doesn't exist on those platforms.

So, I will have two options:
  • use RequestCacheLevel.CacheIfAvailable only for .NET 4.5 support, and do nothing for Windows RT/Windows Store and Windows Phone 8
  • find a platform-agnostic way to cache schemas
I'll look into both options.

Thanks for trying Gepsio,
Jeff Ferguson
Aug 20, 2013 at 7:19 PM
@japel
Here is how I do the caching. Notice that I use the custom XmlCachingResolver class shown below.

I hope this will help you.
Rad
            XmlCachingResolver resolver = new XmlCachingResolver(enableHttpCaching: true);
            //resolver.Credentials = CredentialCache.DefaultCredentials;

            // Use the caching resolver when reading the schema document itself...
            XmlReaderSettings settings = new XmlReaderSettings();
            settings.XmlResolver = resolver;
            settings.CloseInput = true;

            XmlSchema xmlSchema;
            using (XmlReader reader = XmlReader.Create(_inputUri, settings))
            {
                xmlSchema = XmlSchema.Read(reader, null);
            }

            // ...and when the schema set loads the schemas it imports.
            _xmlSchemaSet = new XmlSchemaSet();
            _xmlSchemaSet.XmlResolver = resolver;
            _xmlSchemaSet.Add(xmlSchema);
            _xmlSchemaSet.Compile();
            _targetNamespace = xmlSchema.TargetNamespace;

            // Map the source URI of every loaded schema to its target namespace.
            Dictionary<string, string> documentUriNamespaceMapping = new Dictionary<string, string>();
            foreach (XmlSchema schema in _xmlSchemaSet.Schemas())
            {
                string sourceUri = schema.SourceUri;
                if (_targetNamespace == schema.TargetNamespace)
                {
                    // The extension schema itself is keyed by file name only.
                    sourceUri = Path.GetFileName(sourceUri);
                }

                documentUriNamespaceMapping[sourceUri] = schema.TargetNamespace;
            }



//http://msdn.microsoft.com/en-us/library/bb669135.aspx
//This document describes how to build a custom class for resolving XML data by extending the XmlUrlResolver class.
//The custom class in this example resolves an XML data resource from the default cache.
using System;
using System.IO;
using System.Net;
using System.Net.Cache;
using System.Xml;

public class XmlCachingResolver : XmlUrlResolver
{
    bool enableHttpCaching;
    ICredentials credentials = null;

    //resolve resources from cache (if possible) when enableHttpCaching is set to true
    //resolve resources from source when enableHttpCaching is set to false
    public XmlCachingResolver(bool enableHttpCaching)
    {
        this.enableHttpCaching = enableHttpCaching;
    }

    public override ICredentials Credentials
    {
        set
        {
            credentials = value;
            base.Credentials = value;
        }
    }

    public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
    {
        if (absoluteUri == null)
        {
            throw new ArgumentNullException("absoluteUri");
        }
        //resolve resources from cache (if possible)
        if (absoluteUri.Scheme == "http" && enableHttpCaching && (ofObjectToReturn == null || ofObjectToReturn == typeof(Stream)))
        {
            WebRequest webReq = WebRequest.Create(absoluteUri);

            webReq.CachePolicy = new HttpRequestCachePolicy(HttpRequestCacheLevel.CacheIfAvailable);
            if (credentials != null)
            {
                webReq.Credentials = credentials;
            }
            WebResponse resp = webReq.GetResponse();
            return resp.GetResponseStream();
        }
        //otherwise use the default behavior of the XmlUrlResolver class (resolve resources from source)
        else
        {
            return base.GetEntity(absoluteUri, role, ofObjectToReturn);
        }
    }
}
Coordinator
Aug 20, 2013 at 7:37 PM
All good suggestions! Thank you!

I will do my best to get this sort of thing into the next code drop. As I mentioned before, the next code drop will introduce support for not only .NET 4.5, but also for Windows RT/Windows Store and Windows Phone 8. Since XML schema is not available on all of those platforms (and I have no idea why) I will need to come up with a platform neutral way to cache the schemas. You're all absolutely right that caching is needed, so thank you for the reminders!

Thanks for using Gepsio,
Jeff Ferguson
Mar 18, 2014 at 8:18 AM
+1 on the deep validation and caching.