The Citus Blog | Citus Data

Fun with SQL: Text and system functions

By Craig Kerstiens | March 13, 2019

SQL by itself is great and powerful, and Postgres supports a broad array of more modern SQL, including things like window functions and common table expressions. But rarely do I write a query where I don't want to tweak or format the data I'm getting back out of the database. Thankfully, Postgres has a rich array of functions to help with converting or formatting data. These built-in functions save me from having to do the logic elsewhere or write my own functions. In other words, I have to do less work because Postgres has already done it for me, which I'm always happy about.

We've covered a set of functions earlier; today we're going to dive deeper into some different categories of functions.

Keep reading

Approximation algorithms for your database

By Craig Kerstiens | February 28, 2019

In an earlier blog post I wrote about how breaking problems down into a MapReduce-style approach can give you much better performance. We've seen that Citus is orders of magnitude faster than single-node databases when we're able to parallelize the workload across all the cores in a cluster. And while count(*) and avg are easy to break into smaller parts, I immediately got the question: what about count distinct, or the top from a list, or median?

Exact distinct count is admittedly harder to tackle in a large distributed setup, because it requires a lot of data shuffling between nodes. Count distinct is indeed supported within Citus, but at times it can be slow when dealing with especially large datasets. Median across any moderate to large dataset can become completely prohibitive for end users. Fortunately, for nearly all of these there are approximation algorithms which provide close enough answers and do so with impressive performance characteristics.
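
To make the "easy to break into smaller parts" point concrete, here is a minimal Python sketch of an average computed MapReduce-style. It is only an illustration under stated assumptions, not how Citus implements it: the `shards` lists stand in for rows living on different nodes, and the `map_shard` and `reduce_avg` helper names are hypothetical.

```python
from typing import Iterable, List, Tuple


def map_shard(values: Iterable[float]) -> Tuple[float, int]:
    """Per-shard step: return a partial (sum, count) for one shard's rows."""
    total = 0.0
    count = 0
    for v in values:
        total += v
        count += 1
    return total, count


def reduce_avg(partials: List[Tuple[float, int]]) -> float:
    """Coordinator step: combine the partial sums and counts into one average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("cannot average zero rows")
    return total / count


# Hypothetical data: three shards of different sizes.
shards = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]

partials = [map_shard(shard) for shard in shards]
overall_avg = reduce_avg(partials)
print(partials, overall_avg)
```

Each shard ships back only two numbers, so very little data moves between nodes. Averaging the per-shard averages would give a different, wrong answer whenever shards differ in size, which is why the sum and count travel separately. An exact distinct count or median has no such small partial result, which is where the approximation algorithms come in.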

Keep reading

Thinking in MapReduce, but with SQL

By Craig Kerstiens | February 21, 2019

For those considering Citus, if your use case seems like a good fit, we are often willing to spend some time with you to help you understand the Citus database and what type of performance it can deliver. We commonly do this in a roughly two-hour pairing session with one of our engineers. We'll talk through the schema, load up some data, and run some queries. If we have time at the end, it is always fun to load the same data and queries into single-node Postgres and see how we compare. After seeing this for years, I still enjoy seeing performance speedups of 10x and 20x over a single-node database, and in some cases as high as 100x.

And the best part is that it doesn't take heavy re-architecting of data pipelines. All it takes is some data modeling and parallelization with Citus.

Keep reading

The most useful Postgres extension: pg_stat_statements

By Craig Kerstiens | February 8, 2019

Extensions are capable of extending, changing, and advancing the behavior of Postgres. How? By hooking into low-level Postgres APIs. The open source Citus database that scales out Postgres horizontally is itself implemented as a PostgreSQL extension, which allows Citus to stay current with Postgres releases without lagging behind the way forks of Postgres do. I've previously written about the various types of extensions; today, though, I want to take a deeper look at the most useful Postgres extension: pg_stat_statements.

Keep reading

Microsoft Acquires Citus Data: Creating the World’s Best Postgres Experience Together

By Umur Cubukcu | January 24, 2019

Today, I’m very excited to announce the next chapter in our company’s journey: Microsoft has acquired Citus Data.

When we founded Citus Data eight years ago, the world was different. Clouds and big data were newfangled. The common perception was that relational databases were, by design, scale up only—limiting their ability to handle cloud scale applications and big data workloads. This brought the rise of Hadoop and all the other NoSQL databases people were creating at the time. At Citus Data, we had a different idea: that we would embrace the relational database, while also extending it to make it horizontally scalable, resilient, and worry-free. That instead of re-implementing the database from scratch, we would build upon PostgreSQL and its open and extensible ecosystem.

Fast forward to 2019 and today’s news: we are thrilled to join a team who deeply understands databases and is keenly focused on meeting customers where they are. Both Citus and Microsoft share a mission of openness, empowering developers, and choice. And we both love PostgreSQL. We are excited about joining forces, and the value that doing so will create: Delivering to our community and our customers the world’s best PostgreSQL experience.

Keep reading

Contributing to Postgres

By Craig Kerstiens | January 15, 2019

About once a month I get this question: "How do I contribute to Postgres?" PostgreSQL is a great database with a solid code base, and for many of us contributing back to open source is a worthwhile cause. The thing about contributing back to Postgres is that you generally don't just jump right in and commit code on day one, so figuring out where to start can be a bit overwhelming. If you're considering getting more involved with Postgres, here are a few tips that you may find helpful.

Keep reading

10 Most Popular Citus Data Blog Posts in 2018, ft. Postgres

By Claire Giordano | January 13, 2019

Seasons each have a different feel, a different rhythm. Temperature, weather, sunlight, and traditions all vary by season. For me, summer usually includes a beach vacation. And winter brings the smell of hot apple cider on the stove, days in the mountains hoping for the next good snowstorm, and New Year’s resolutions. Somehow January is the time to pause and reflect on the accomplishments of the past year, to take stock of what worked and what didn’t. And of course there are the TOP TEN LISTS.

Spoiler alert: yes, this is a Top 10 list. If you’re a regular on the Citus Data blog, you know our Citus database engineers love PostgreSQL. And one of the open source responsibilities we take seriously is sharing learnings, how-tos, and expertise. One way we share learnings is by giving lots of conference talks (it seems like I have to update our Events page every week with new events). And another way we share our learnings is with our blog.

So just in case you missed any of our best posts from last year, here is the TOP TEN list of the most popular Citus Data blog posts published in 2018. Enjoy.

Keep reading

Fun with SQL: Self joins

By Craig Kerstiens | January 2, 2019

Various families have various traditions in the US around Christmas time. Some will play games like white elephant, where you get a mix of decent gifts as well as gag gifts... you then draw numbers and get to pick from existing presents that have been opened ("stealing" from someone else) or open an unopened one. The game is entertaining both for trying to get something you want and for sticking Aunt Jennifer with the stuffed poop emoji with a Santa hat on it.

Other traditions are a bit simpler. One that my partner's family follows is drawing names, so that each of us buys a gift for just one person. This is nice because you can put a bit of effort into that one person without being overwhelmed by tracking down things for multiple people. Each year we draw names for the next year. And by now you're probably thinking: what does any of this have to do with SQL? Well, normally when we draw names we write them on a piece of paper, someone takes a picture, and then that gets texted around to other family members. At least for me, every October I'm scrolling back through text messages to try to recall who it was I'm supposed to buy for. This year I took a little time to put everyone's name in a SQL database and write a simple query for easier recall.
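
As a rough idea of what such a lookup could look like, here is a small self-join sketch. Everything in it is an assumption for illustration: the `family` table, its `gives_to` column, and the names are hypothetical rather than the schema from the post, and it runs against an in-memory SQLite database through Python's standard library instead of Postgres so that it is self-contained.

```python
import sqlite3

# In-memory SQLite database; stands in for Postgres purely for illustration.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: each row names one person and the id of the person
# they drew, which points back into the same table.
conn.execute(
    "CREATE TABLE family (id INTEGER PRIMARY KEY, name TEXT NOT NULL, gives_to INTEGER)"
)
conn.executemany(
    "INSERT INTO family (id, name, gives_to) VALUES (?, ?, ?)",
    [(1, "Alice", 2), (2, "Bob", 3), (3, "Carol", 1)],
)

# Self join: the same table appears twice under different aliases, once as
# the giver and once as the receiver.
rows = conn.execute(
    """
    SELECT giver.name, receiver.name
    FROM family AS giver
    JOIN family AS receiver ON giver.gives_to = receiver.id
    ORDER BY giver.id
    """
).fetchall()

pairs = dict(rows)
for giver, receiver in rows:
    print(f"{giver} buys for {receiver}")
```

The two aliases are what make it a self join: without them the query could not tell the giver's row apart from the receiver's row.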

Keep reading

The perks of sharing your Citus open source stories

By Claire Giordano | December 27, 2018

Most of us who work with open source like working with open source. You get to build on what’s already been built, and you get to focus on inventing new solutions to new problems instead of reinventing the wheel on each project. Plus you get to share your work publicly (which can improve the state of the art in the industry) and you get feedback from developers outside your company. Hiring managers give it a +1 too, since sharing your code will sometimes trigger outside interest in what you’re doing and can be a big boon for recruiting. After all, “smart people like to hang out with smart people.”

Keep reading

\watch ing Star Wars in Postgres

By Will Leinweber | December 14, 2018

I recently had the honor of speaking at the last Keep Ruby Weird. A good part of the talk dealt with Postgres, and since Citus Data is not only a database company but also a Postgres company, I figured sharing those parts on the Citus Data blog would be a good idea. If you'd like to see it in talk form, or you'd also like to know how to watch movies rendered as emojis in your terminal, I encourage you to watch the talk.

Keep reading