Reporting on InfoPath Documents

B

Bill

I'm inheriting a SharePoint-based system that has thousands of InfoPath
documents scattered across hundreds of WSS sites. Business users are now
starting to complain that they want to be able to do some ad-hoc reports on
the data contained in all of these documents. SharePoint essentially stores
these documents in a big blob of text, far from being an efficient structure
to query. What approaches have people used to report on this data? I can
think of a couple ways:

1. Promote all of the fields in the document so that SharePoint will store
them in them separately in the database. Then I can do some ETL to load this
data into a data warehouse that users can query.

2. Add logic to the InfoPath forms to write the data to a normalized
database in addition to saving the file. This way I'll have my data in an
easy format to extract to a DW.

3. Keep doing things the way I am now but create a custom ETL process that
parses the XML documents out of the SharePoint database and loads it into a
DW.

Any thoughts? Thanks in advance.
 
M

[MSFT] AlexWein

Depending on the complexity of reporting you're looking for, you may want to
go with either of the routes you mentioned:
1) Simplest scenarios - promote a few properties. This will help you get to
the flat fields and do some aggregation.
2) Moderate scenarios - save stuff necessary for reporting into a relational
database, and do reporting from there. More complex, requires writing web
services.
3) Very complex scenarios - save entire XML forms as blobs to SQL server
2005 instead of SharePoint. SQL Server 2005 supports XML data type, which
allows you to go digging through the data using XQuery (very powerful
schema-aware search and reporting).
 
B

Bill

Thank you Alex. I'll probably go with the first approach, but I'm interested
in learning more about the third approach. What's a good way to get SQL
Server 2005 to store these XML documents as an XML data type with the least
amount of effort? Should I just take the binary data in the SharePoint tables
and move it into another table with an XML data type field? I guess I could
set up a trigger to do the work for me automatically. Is anyone doing this
now?

[MSFT] AlexWein said:
Depending on the complexity of reporting you're looking for, you may want to
go with either of the routes you mentioned:
1) Simplest scenarios - promote a few properties. This will help you get to
the flat fields and do some aggregation.
2) Moderate scenarios - save stuff necessary for reporting into a relational
database, and do reporting from there. More complex, requires writing web
services.
3) Very complex scenarios - save entire XML forms as blobs to SQL server
2005 instead of SharePoint. SQL Server 2005 supports XML data type, which
allows you to go digging through the data using XQuery (very powerful
schema-aware search and reporting).

Bill said:
I'm inheriting a SharePoint-based system that has thousands of InfoPath
documents scattered across hundreds of WSS sites. Business users are now
starting to complain that they want to be able to do some ad-hoc reports on
the data contained in all of these documents. SharePoint essentially stores
these documents in a big blob of text, far from being an efficient structure
to query. What approaches have people used to report on this data? I can
think of a couple ways:

1. Promote all of the fields in the document so that SharePoint will store
them in them separately in the database. Then I can do some ETL to load this
data into a data warehouse that users can query.

2. Add logic to the InfoPath forms to write the data to a normalized
database in addition to saving the file. This way I'll have my data in an
easy format to extract to a DW.

3. Keep doing things the way I am now but create a custom ETL process that
parses the XML documents out of the SharePoint database and loads it into a
DW.

Any thoughts? Thanks in advance.
 
A

alex94040

For the third approach - implementation is eally not super-difficult;
you just need to some custom code (in a web service or inside your
form) to store the entire XML of the form in an InfoPath table.

Basically, you need to:
1) Define a schema for your data and create a sample filled out form.
2) Create a SQL table with the following structure:
ID autonumber,
XML myform [note that SQL Server 2005 natively supports XML columns]
3) Save the sample filled out form to the table you just created.
4) Setup some logic to retrieve and submit XML blobs. In the demo that
I built, I'm using a web service with two methods - getForm(formid) and
setForm(xml, formid).
5) Hook up an InfoPath form to the logic you built in step 4. In my
demo, since I'm using a web service, I just started a new form based on
that web service.
6) Configure your form to submit and query for data properly
7) Setup logic to create new records - for me it was just a third web
service method.

This method is very new, and I don't know of anyone presently using
this in a production environment; I have this setup end-to-end in a
demo - we've shown it at a couple of developer conferences this year.
Feel free to follow up with questions!

Also, et me talk a bit more about what you can achieve if you implement
this.
1) No need to shred the XML data into relational structures - this
means that when your schema changes, you don't need to re-create tables
and worry about migrating data to a new relational schema.
2) Go into your forms' data with XQuery. You could search the filled
out forms using a context-aware search - "show me all status reports
that have Joe as the author" (much better than a full-text search for
Joe for obvious reasons)
3) Taking this a step further - imagine doing joins between data you
get from forms using XQuery and your other relational tables. Very,
very powerful.
4) If you're worried about performance of #3 (and you should be :)) -
you could "promote" certain frequently used XPaths from your XMLs into
relational columns, set up indexes on those columns, and do the joins.
This would mean that you need to write a trigger to do the promotion,
though - or have this done as a part of your submit logic.

Thanks,
Alex
 
S

stryc9

Can you provide a link to a demo of this? I am thinking about trying
this out and would like to see it come together in a demo to see if it
will fit.

For the third approach - implementation is eally not super-difficult;
you just need to some custom code (in a web service or inside your
form) to store the entire XML of the form in an InfoPath table.

Basically, you need to:
1) Define a schema for your data and create a sample filled out form.
2) Create a SQL table with the following structure:
ID autonumber,
XML myform [note that SQL Server 2005 natively supports XML columns]
3) Save the sample filled out form to the table you just created.
4) Setup some logic to retrieve and submit XML blobs. In the demo that
I built, I'm using a web service with two methods - getForm(formid) and
setForm(xml, formid).
5) Hook up an InfoPath form to the logic you built in step 4. In my
demo, since I'm using a web service, I just started a new form based on
that web service.
6) Configure your form to submit and query for data properly
7) Setup logic to create new records - for me it was just a third web
service method.

This method is very new, and I don't know of anyone presently using
this in a production environment; I have this setup end-to-end in a
demo - we've shown it at a couple of developer conferences this year.
Feel free to follow up with questions!

Also, et me talk a bit more about what you can achieve if you implement
this.
1) No need to shred the XML data into relational structures - this
means that when your schema changes, you don't need to re-create tables
and worry about migrating data to a new relational schema.
2) Go into your forms' data with XQuery. You could search the filled
out forms using a context-aware search - "show me all status reports
that have Joe as the author" (much better than a full-text search for
Joe for obvious reasons)
3) Taking this a step further - imagine doing joins between data you
get from forms using XQuery and your other relational tables. Very,
very powerful.
4) If you're worried about performance of #3 (and you should be :)) -
you could "promote" certain frequently used XPaths from your XMLs into
relational columns, set up indexes on those columns, and do the joins.
This would mean that you need to write a trigger to do the promotion,
though - or have this done as a part of your submit logic.

Thanks,
Alex

Thank you Alex. I'll probably go with the first approach, but I'm interested
in learning more about the third approach. What's a good way to get SQL
Server 2005 to store these XML documents as an XML data type with the least
amount of effort? Should I just take the binary data in the SharePoint tables
and move it into another table with an XML data type field? I guess I could
set up a trigger to do the work for me automatically. Is anyone doing this
now?
 
Top