This is a guest blog post from our partner, Mozart Data. To learn more about the partnership, visit: Mozart Data + Explo
Production databases and row-based options
When the goal is simply to store large amounts of data and access it for an individual account or file — like a single customer — row-based data warehouses like PostgreSQL and MySQL are great. Data is organized in rows, based on a defining feature like account or identification number. They’re incredibly fast at returning results on an individual row in the warehouse.
That speed is what makes them so useful for personalized web experiences — when a user logs into a site, their credentials pull their row from storage and all of their personalized settings load automatically.
Production databases are typically built using this row-based or row-oriented structure because they’re so efficient at performing a single task, like inserting, updating, or deleting information from one specific record.
While these databases are very good at many things that your company likely needs, they have limitations when it comes to analysis.
Columnar data warehouses
The built-in reporting in the various tools and platforms you’re using (ad platforms, CRMs, social media platforms, etc.) are all capable of the basic work of analyzing that specific data. But, things become much more complicated when you need to combine data from multiple sources for more complex (and impactful) analysis — like the kind of analytics needed for valuable customer-facing dashboards.
Once that’s the goal, you’re ready for a columnar data warehouse. These warehouses are also called column-based or column-oriented databases. When people just say “data warehouse”, they’re also typically referring to this option.
Instead of organizing information by rows, information is organized and pulled by column. How is that actually any different? For a simple example, consider the following:
A company collects basic demographic information — first name, last name, birthday, zip code — from people who make a purchase online. All of that information exists in one row, but is divided by columns. If that company wants to know what geographic locations are buying more of their product (say for a targeted promotional campaign), what they really need is zip code. That’s the only column that matters for this analysis.
Columnar data warehouses are great at rapidly pulling that specific information. Running that query — or any query focused on some, but not all information in a table — on a row-based warehouse is much slower, because all of the information in the row still has to be queried. If you’re looking to create customer-facing dashboards that update frequently with timely data, you absolutely need this capability.
Faster analysis doesn’t just save your team time and energy — it directly impacts your costs, substantially. With any large data storage solution, some of the cost comes from storage, but this is typically a small portion of actual costs. Computation costs are typically much higher.
Computation costs are the costs incurred by running queries — pulling data into a new table for analysis, reorganizing data, etc. Slower queries cost more money. For an embedded analytics example, it would be incredibly cost-prohibitive to provide timely analytics services with slow queries.
If for some reason cost isn’t a concern, you should still use a columnar data warehouse for this type of work. Querying your production database directly can slow it down to the point of interfering with function. Queries can also go awry, and a query that edits data in your live production database can have enormous cascading negative effects. Instead, pull the relevant data into your data warehouse and work with it there.
Making your data work for embedded analytics
A columnar data warehouse allows you to quickly and reliably perform the complex analysis needed for embedded analytics (like customer-facing dashboards) at a more effective cost than other solutions.
Once you’ve chosen a data warehouse, you’re opening up new opportunities for your business. The ability to combine multiple data sources for analysis gives customer-facing teams the ability to better understand the users they’re interacting with. Product-focused teams can gain insight on how products are actually used to tackle challenges like feature adoption.
If you’re interested in learning more about getting your data in shape, or think you’re ready to set up a data warehouse and the other tools you’ll need to make the most of your data, learn about the partnership with Mozart Data + Explo and request a demo.
Other Embedded Analytics Resources:
- Benefits of Embedded Analytics
- Embedded Analytics Uses Cases
- Embedded Analytics Features
- Embedded Analytics Applications
- Embedded Analytics Solution