Visualizing Hydraulic Fracturing with Microsoft Power BI and FracFocus.org
This report was created with the the May 2017 release of Microsoft Power BI and the April 2017 data provided on FracFocus.org. This is not a screenshot. It is a fully interactive report ready for viewing in a mobile device or any modern browser. You can interact with it below, or view it on PowerBI.com. Choose an operator from the list to see the wells fraced by that operator since 2015 displayed on the map.
Following is a step-by-step tutorial so you can create this visualization or your own from FracFocus.org data. At the end of the blog post, I evaluate the current strengths and limitations of Microsoft Power BI today and offer some takeaways.
(Note, this report contains only those frac jobs submitted to the FracFocus.org registry as of April 2017.)
Microsoft Power BI
Microsoft announced Power BI in 2015, and has iterated rapidly on the technology and the licensing models. Microsoft is worth paying attention to in this space. First of all they are already the vendor of the worlds most established and widely used BI tool: Excel. Additionally, no other company has the combination of enterprise software support, a proprietary cloud offering, a mature and scalable data platform like SQL Server, and a broad developer community to power an integrated plaform for business analytics.
Microsoft has also introduced Power BI at a much lower introductory price point while keeping a cloud- and mobile-first mindset. There’s no question they have a lot of catching up to do with incumbents like Tableau and Tibco Spotfire (see “Needs Improvement” below). But for those needing a simple, low-cost point of entry into visual analytics with a reputable vendor, it makes sense to at least look at Power BI.
FracFocus.org is the national hydraulic fracturing chemical registry. It is administered by the Ground Water Protection Council and Interstate Oil and Gas Compact Commission. Most hydrocarbon-producing states now require the disclosure of chemicals used in hydraulic fracturing to the FracFocus.org database. The data is made available as a database back-up file monthly.
This database contains data about the frac jobs performed, and the specific chemicals used in that job. While the chemical data is interesting, everything in this report can be created from the “RegistryUpload” table that contains information about frac jobs.
Step 1 – What you need
- Download Microsoft SQL Server Express (free)
- Download Power BI Desktop (free)
- Create a free account at https://powerbi.microsoft.com/en-us/get-started/
- Download the FracFocus .bak file
Step 2 – Restore database
After installing SQL Server Express, open SQL Server Management Studio. Restore the backup file by right-clicking on “Databases” and selecting “Restore”
- Select “Device”
- Find the .bak file download location
- Choose a name for the restored database
- Click “OK”
Step 3 – Connecting to the data with Power BI
Open Power BI desktop to connect your SQL Server Express instance and query the “RegistryUpload” table. Click “Get Data” in the ribbon. Then choose “Database”, “SQL Server database”.
Then enter the name of your SQL Server Express instance and the database to which you restored the FracFocus .bak file.
Sidebar: Import vs. DirectQuery
You are presented with a choice between Import and DirectQuery. Import will extract the data from the SQL Server to build an in-memory data model. DirectQuery will leave the data in place and issue SQL queries to return only the data needed to power the visualizations.
I originally chose DirectQuery, and it worked very well when connecting live to the SQL Server. When publishing to PowerBI.com, however, I ran into two limitations.
- SQL Express running on a laptop VM does not make a reliable source for a web-based dashboard.
- Using DirectQuery from a published data source is a Power BI Pro feature. I wanted to be sure that even Power BI Free users could access this content.
This led me to changed to Import rather than DirectQuery. Since our RegistryUpload table only has 124k rows, data volume limitations were not a problem.
Important note – Although I was able to change the connection settings from DirectQuery to Import (“Edit Queries” -> “Data source settings” -> “Change connection”), I continued to see this message on PowerBI.com, even after changing the connection settings and publishing.
Somehow, Power BI still believed that file had DirectQuery content. To truly switch my connection to Import, I had to start over with a brand new Power BI Desktop file. Therefore, choose your connections carefully, considering data volumes, licensing considerations, and report usage at design time! Starting over on an advanced file could create hours of rework and frustration.
Step 4 – Enhancing the Data Model
I added JobStartYear and JobStartMonth to Power BI’s data model with the following steps:
- Click “Edit Queries”
- Open the “RegistryUpload” table
- Right-click on the JobStartDate column header
- Select “Duplicate Column”
- Find the newly created column on the far right
- Right-click on the column header
- Select “Transform” -> “Year” -> “Year”
- Rename the column to “JobStartYear”
I followed a similar process for the month of the start date to create the JobStartMonth column.
Step 5 – Creating the Frac Map Page
With the data acquired, you can use Power BI to start adding visualizations to a page.
Since the data included latitude and longitude information, the visualization process was simple. I chose the “Map” visualization (don’t confuse with “Filled Map”), then dropped the lat and longs in place.
Map default visualization
Map visualization settings
Because of bogus date data in JobStartDate, I chose to filter my map to only contains years since 2015 to 2017. Limiting the data in this way was a quick workaround, but of course, the data could be cleansed if completeness was essential.
Secondly, it’s important to note a current limitation with Power BI maps – they can only show 3,500 points at a time. There may also be a way to change this by building a custom visual but it comes with performance trade-offs. Keep in mind when using a map that if your data points exceed this maximum, points may be hidden from your map with no message of any kind.
One final map note: some of the rows use different projections (NAD83, NAD27, and WSG84). Power BI’s standard is WSG84. I did not standardize those for this demonstration, but converting them into the same projection would be part of the data cleansing effort. An online tool like this one should make that easy. There’s also a nice tutorial on DataVeld on how to do this with free desktop tools like qGIS.
The other visualizations were straightforward. In most cases I use “Count of pKey” to get a count of all the jobs found in the table. Here are the types of controls used to create each one.
Step 6 – Creating the Well Types Page
The Well Types page contains a basic stacked bar and stacked column chart. The interesting part was separating out “IndianWell” and “FederalWell” jobs from “normal” jobs, and showing them all on the same chart.
The “IndianWell” and “FederalWell” columns are stored as type bit fields in the SQL Server, which is common for true/false values. In order to add them up, we needed to convert them into a numeric 1 or 0 that Power BI can sum. To do that, I created two calculated columns like this one:
Even with this, I had to explicitly inform Power BI that I wanted to use this as a whole number in the Edit Queries screen:
Now we can add up how many Indian and federal wells are in the dataset. However, we still needed an identifier for those wells which were neither. So I created the “NotFederalNotIndian” flag:
This formula syntax gave me my biggest surprise of the project. I assumed that the formula syntax for Power BI would always follow DAX, which was modeled to be like Excel. Even the previous formula in this artcile used the IF(
<false>) format then However, it seems that Microsoft is also using a syntax called “Power Query M”, which more closely resembles F#.
One one hand, this opens exciting possibilities for extension in Power BI using a robust and established CLR language. On the other, it could also lead to tremendous confusion among semi-technical analysts in the business who know Excel but aren’t necessarily programmers.
With those new columns, I summed up the three values to produce the bar and column charts below. I recommend selecting a bar on the right to see the corresponding values light up in the column chart on the left.
Power BI is young and evolving, but it has several things going for it:
- Price – I created this solution for $0. If your data is under 1GB, you also can get started for free and share to unlimited other users for free as well.
- Attractive – Microsoft has done an excellent job making attractive visuals, and thinking web-first in their designs.
- Experience – Anyone who has been using the PowerPivot or PowerQuery add-ons for Excel will find many familiar features inside of Power BI.
- Cloud-Power – No other company has their own cloud to put behind their data visualization software, so everything about the product can be optimized to run on cloud resources. The Pro level offers cheap access to shared cloud resources, while Power BI Premium provides less-cheap dedicated cloud hardware.
- On-Premise Options – Starting in June 2017, Power BI Premium and Power BI Report Server will offer customers the opportunity to bring all of Power BI capabilities in-house.
- Tool fragmentation – I’m a professional technologist, this is all I do. But even I feel overwhelmed by the Power BI terminology. All of the following terms have a distinct, specific meaning: reports, dashboards, apps, app workspaces, workspaces, paginated reports, mobile reports, and workbooks. They need to simplify terms, or we need a printed glossary!
- Cloud confusion – The cloud brings power but also obscurity. The licensing for Power BI Pro is extremely detailed and filled with caveats. Premium is brand-new and Microsoft hasn’t finalized the licensing options yet. There are also fuzzy lines and overlaps between the Free, Pro, and Premium tiers.
- Community uptake – Many data visualization tools form around them a zealous (um rabid?) community of power users. Some seem to be more passionate about their tool than Howard Dean was about the fifty nifty United States. Power BI may be driven more by top-down decisions based on cost than by analyst-led campaigns. Will people get this excited about a less-powerful but more-affordable tool?
- Impressive things can be achieved with Power BI even today, and even at the free tier.
- Microsoft is far from done improving Power BI, so stay tuned! Follow the Power BI Blog here.
- Competition in visual analytics is intensifying. Generally, this is good for customers and will probably lower TCO. However, it will also drive up search costs and companies may incur switching costs if they don’t (or can’t) standardize on the right platform.
- We all have a lot to learn. Having more options creates opportunities, but it also generates a lot of homework.
Thank you for reading, and please find me on LinkedIn where I post all the latest about the changing world of analytics.
As time of writing, I am not a Microsoft employee or shareholder. I have received no compensation for this article. I am a free-tier member of the Microsoft Partner Network, but have no other professional affiliation with Microsoft.