Description
It's that time of the year again when data professionals look at their data predictions from 2022, decide what they were wrong about, and think: "this must be the year for XYZ." Aside from the fact that these types of predictions are 100% subjective and nearly impossible to verify, it's always fun to play armchair quarterback and make a forecast about the future (see why forecasts are flawed in this episode about Superforecasting). The reason predicting what will happen in 2023 is hard is that my predictions are based on what other people are talking about, not necessarily what they are doing. The only data point I have on what's actually happening within organizations is what I see happening in my own organization. So take everything with a grain of salt and let me know if these predictions resonate with you!
1) Artificial intelligence and natural language processing don't eat your lunch
How could a prediction for 2023 not include something about artificial intelligence? It seems like the tech world was mesmerized by ChatGPT in the second half of 2022, and I can't blame them. The applications and use cases are pretty slick and mind-blowing. Internally at my company, we've already started testing out this technology for summarizing meeting notes and it works out quite well and saves a human from having to manually summarize the notes. My favorite application of AI shared on Twitter (where else do you discover new technologies? Scientific journals?) is this bot that argues with a Comcast agent and successfully gets a discount on an Internet plan:
https://twitter.com/jbrowder1/status/1602353465753309195
These examples are all fun and cute and may help you save on your phone bill, but I'm more interested in how AI will be used inside organizations to improve data quality.
Data quality is always an issue when you're collecting large amounts of data in real time every day. Historically, analysts and data engineers have run SQL queries to find data with missing values or duplicate values. With AI, could some of this manual querying, along with the UPDATE and INSERT commands that follow, be replaced with a system that intelligently fills in the data for you? In a recent episode with Korhonda Randolph, Korhonda talks about fixing data by sometimes calling up customers to get their correct info, which then gets entered into a master data management system. David Yakobovitch talks about some interesting companies in episode 101 that smartly help you augment your data using AI.
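To make the manual approach concrete, here is a minimal sketch of the kind of SQL checks for missing and duplicate values mentioned above. Everything in it is an assumption for illustration: the `customers` table, its columns, the sample rows, and the `find_quality_issues` helper are hypothetical, and it uses an in-memory SQLite database rather than a real data warehouse.

```python
import sqlite3


def find_quality_issues(rows):
    """Return (ids with a missing email, emails that appear more than once).

    `rows` is a list of (id, name, email) tuples for a hypothetical
    `customers` table; this is an illustration, not a production check.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (id, name, email) VALUES (?, ?, ?)", rows
    )

    # Missing values: NULL or blank email.
    missing = [
        r[0]
        for r in conn.execute(
            "SELECT id FROM customers "
            "WHERE email IS NULL OR TRIM(email) = '' ORDER BY id"
        )
    ]

    # Duplicate values: the same email (case-insensitive) on several rows.
    duplicates = [
        (r[0], r[1])
        for r in conn.execute(
            "SELECT LOWER(email), COUNT(*) FROM customers "
            "WHERE email IS NOT NULL AND TRIM(email) <> '' "
            "GROUP BY LOWER(email) HAVING COUNT(*) > 1 "
            "ORDER BY LOWER(email)"
        )
    ]
    conn.close()
    return missing, duplicates


# Made-up sample data for the sketch.
SAMPLE_ROWS = [
    (1, "Ada", "ada@example.com"),
    (2, "Grace", None),
    (3, "Ada L.", "ADA@example.com"),
    (4, "Linus", ""),
    (5, "Edsger", "edsger@example.com"),
]

missing, duplicates = find_quality_issues(SAMPLE_ROWS)
print("Missing email:", missing)
print("Duplicate emails:", duplicates)
```

Queries like these only flag the problem rows. Deciding what the correct value should be is the part that still needs a human (or, potentially, an AI-assisted system).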
We've also seen AI helping people write code, via Codex for example. I think this might be an interesting trend to look out for as organizations' demand for data engineers outpaces supply. Could an organization cut some corners and rely on Codex to develop some of the core infrastructure for its data warehouse? Seems unlikely if you ask me, but given the current funding environment for startups,