Fermata and Staccato: Does Music Predict the Stock Market?

Spotify Wrapped is making the rounds at the end of this very *insert preferred adjective here* year. Everyone shares their top artists and songs of the year, and we rejoice in the gift that music gives us.

But, can music actually be a predictor of our moods? And if it can predict our moods, does it somehow reflect the stock market?

Do America’s Top Music Choices predict market returns?

None of this is market analysis or investment advice, and should not be interpreted as such. All of this research is from December 2018 and does not incorporate any information beyond that.


Midwest Arbitrage: Capturing One of America’s Most Undervalued Assets

This week, I partnered with a fellow Midwesterner (though a bit more Midwest than I am) and startup operator Lea Boreland to take a quantitative look at innovation and the Midwest.

This was a really fun piece to work on. I grew up in Kentucky, but now live in Los Angeles. I miss Kentucky deeply, and I remember driving by the Ford plant most days, or going on field trips where we would pass quarry after quarry. (In fact, you can feel the tremors from nearby drilling at my childhood home.)

We wanted to write a piece to highlight the potential of the Midwest – and give it a voice in the conversation about Silicon Valley and tech.

Lea and I will show that since WW2, the Midwest has stagnated in comparison to the West, but there’s massive potential to reorient the applied industrial knowledge of the region toward 21st century trades like software development and machine learning. We only need to develop the risk capital ecosystem to take advantage of this potential. 

Source: Lea Boreland

I. The Midwest was once the home of industry in America.

In 1850, textiles were the most important part of the American economy, representing 1/5th of the U.S. Industrial Production Index, followed by Transport Equipment and Machinery, and shifting into chemical and fuel products gradually over time. 

During this time, the Midwest became a hub of industry due to its proximity to natural resources (e.g., fertile soil, heavy metals) and water access. The upper Midwest specialized in mining and pineries, while the lower central states became the nation’s breadbasket. The former capacity can be traced back to Precambrian lava flows, while the latter is owed to ancient glaciers producing waterways (e.g., the Mississippi River Basin, the Great Lakes) and favorable soils (see clay loam and alfisols on the chart below).

Thanks to these ancient geologic happenings, the Midwest had both stuff to be processed and a means to transport it…at just the time when we were developing the technology to process and move stuff at scale. The region took this advantage and ran with it, leading US technological exploration into the early 20th century: Chicago was the nation’s first railroad center, and Detroit was, of course, the capital of the automobile industry. Meanwhile, at this time, the West coast was just beginning to be populated.

II. The Midwest is now underperforming the rest of the country.

Here’s the thing: when you build anything, you acquire technical debt. You forfeit optionality. The Midwest was built around manufacturing and industry, and then a new, higher yield technology came around: the semiconductor. While traditional manufacturing has been gradually replaced, automated, and/or outsourced, the tabula-rasa West Coast has been able to, from nothing, deploy the infrastructure to capture and grow the market for new technology.

This growth can be traced back not to a geologic source but to an intentional, if serendipitous, policy choice: Stanford University brought Fred Terman back as its dean of engineering (he later became provost), and Terman, who saw war-era innovations firsthand while leading military research during WWII, created the physical place (the Stanford Industrial Park) that would link public innovations with corporate America in the postwar years.

Since then, the West Coast has taken off, while the Midwest has barely gotten a handhold on the new tech-based economy. Data on major public companies helps us visualize this shift. In 2020, California is home to 550 publicly held companies, whereas Kentucky (where I am from) is home to only 13:

The returns from these public companies have also varied dramatically by region. The South and the Midwest have clearly lagged behind the West and the Northeast over the past five years, but the South has outperformed on a 20Y basis (due to oil and gas). The Midwest tells an interesting story in this graph: one of potential. One of the rules of investing is to find companies that have great potential and are trading at a low multiple – the Midwest clearly has room to the upside:

Finally, we can look at the impact of the industrial shift on employment. The graph below shows the unemployment rate by state since 1976. You can see unemployment shift from the coasts to the middle of the country over time, representing a gradual migration of human capital from the Midwest to the Northeast and the West as traditional industrial sectors decline and tech grows.

III. As a result of this slow start to the tech boom, very little VC funding is making its way to the Midwest….But there is a ton of potential to reorient the applied industrial knowledge from the Midwest’s heyday toward high-tech pursuits. 

This graph, created in November 2020, shows the Top 100 YC Companies by Location (United States only). It segments the firms by headquarters – the West (CA, WA primarily), the Northeast, the Midwest, and the South. Out of the 85 companies listed, only 3 are headquartered in the Midwest/South, 3 in the Northeast, and 2 in Utah, which we considered to be a hybrid West-South. 

Source: YC

The top quadrant represents YC firms that are in the West, across all batches and industry. There is some crossover from two firms located in Lehi Utah, Podium and Weave. Zapier is in Tulsa, Oklahoma, which is considered the South by some, the Midwest by others. EquipmentShare is in Columbia, MO, and ShipBob is in Chicago, IL. 

This pattern repeats into the W20 YCombinator class, which included 117 companies headquartered in the United States. 99 of these are based on the West Coast, 98 of which are in California and 1 in Washington. 18 companies represent the rest of the country, with 10 located in New York, NY, 3 in Massachusetts, 2 in Texas, and one each in Michigan, Chicago, and Pennsylvania. 

Source: YC
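As a quick sanity check on the W20 numbers above, here is a minimal Python sketch that tallies the reported headquarters counts. The state labels are our own bookkeeping (the post lists "Chicago", which we file under Illinois), and the West Coast grouping of California plus Washington follows the text.

```python
# W20 YC batch, US-headquartered companies, as reported in the post.
w20_by_state = {
    "California": 98,
    "Washington": 1,
    "New York": 10,
    "Massachusetts": 3,
    "Texas": 2,
    "Michigan": 1,
    "Illinois": 1,  # the post lists this one as "Chicago"
    "Pennsylvania": 1,
}

# West Coast grouping used in the text: California and Washington.
west_coast = {"California", "Washington"}

total = sum(w20_by_state.values())
west_total = sum(n for state, n in w20_by_state.items() if state in west_coast)
rest_total = total - west_total
west_share = west_total / total

print(f"West Coast: {west_total} of {total} ({west_share:.1%})")
print(f"Rest of the country: {rest_total}")
```

The counts add up to the 117 companies cited, and the West Coast share works out to roughly 85% of the batch.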

There is clearly a lot of funding going towards companies located on the West Coast, and that makes sense, because a lot of companies build on the West Coast. However, the Midwest has a plethora of untapped knowledge, both in tech and in manufacturing. A few key opportunities:

  • Connecting tech and traditional manufacturing: “Hardtech innovation” will come from the minds that gave us the previous Industrial Revolutions, but in a different space. It will be focused on smart manufacturing, IoT, smart homes, and connected devices. The applied industrial knowledge used to build automobiles is like that needed to build robotics, devices, energy tech, medical devices, and much more, as Katie Pyzyk highlighted in her recent piece.
    • Midwestern cities are “maker cities.” The knowledge of production processes and advanced tech here should not be discounted.
  • Agriculture and Energy: The Midwest is also known for agriculture. This knowledge transfers well into bioactives and other ag-engineering opportunities, as well as work in hydrogen and electric vehicles.
  • River access: The Ohio River connects most of the Midwest and South. It served as a source of economic prosperity when steamboats and barges were how products were shipped around the states. Now, there is an opportunity to harness its physical power, as the University of Cincinnati is already doing.
  • Universities: The Ohio State University, University of Michigan, and Notre Dame are just some of the higher education institutions in the area that are well-established research hubs. Another undervalued aspect of the area? The students are great too. Human capital from tiny towns and underserved areas needs to be tapped into.
Source: Web Smith

IV. VC can and should play a huge role in this.

Public projects like the Tennessee Valley Authority and the Research Triangle in Raleigh-Durham both catalyzed their respective regions. There is a big opportunity for private dollars to come in and do similar work elsewhere in the region. There are so many companies changing the world in cities like Cincinnati, Louisville, Ann Arbor, and Memphis. These cities are economic anchors to the rest of the country and provide promising potential for growth.

If we can grow them, this would also add resilience to the economy as a whole. According to the work of David Castells-Quintana and Vicente Royuela in “Agglomeration, Inequality, and Economic Growth”, it’s actually more beneficial to have a network of small economies than a series of concentrated cities:

“A more balanced urban system, in which small and medium-sized cities play a fundamental role in the mobilization of local assets to exploit local synergies, seems to be a better strategy than intense urban concentration.”

We are seeing the fallout from urban concentration now—expensive homes, high cost of living, etc. Smaller cities can exist simultaneously with larger cities to boost the growth of the entire nation. A cohesive national economy that spans across the entire country would begin to chip away at so many of the problems the United States faces as a nation. 

Venture capital has the opportunity to go into smaller cities and build from there, as Andrew Yang has done with Venture for America. The Bay Area may always be an anchor for startups, but there is opportunity to invest in traditional industries that are rapidly changing, such as manufacturing, agriculture, oil, and gas.

 The ecosystem will continue to grow, as applied knowledge catalyzes new ideas. The Midwest is the most undervalued asset in America, and the scope of opportunity is massive.

Some companies and careers in Energy Tech, as curated by Lea:

Companies and Careers in Energy Tech
  • Tennessee Valley Authority, a publicly owned development corporation
  • NextMV, a Midwest-based logistics startup
  • Nimbus, an Ann Arbor-based urban mobility company
  • Rivian, a Michigan-based EV startup
  • EquipmentShare, a Missouri-based industrial equipment rental service
  • Censys, an Ann Arbor-based cybersecurity startup
  • ShipBob, a Chicago-based e-commerce fulfillment service
  • M25, a Midwest-focused venture capital firm

A Visualization: Correlation vs Causation

There are a lot of charts floating around, discussing how x is related to y because they are “highly correlated”.

Our brains tend to make nonsensical connections to try and explain the world to us. We draw two lines and claim a relationship, thinking that we have effectively answered the question in the process.

Not quite.

Source: Etsy

Correlation is not causation.

Correlation is a relationship, in which A moves with B. There are three types:

  • Positive: A and B increase/decrease at the same time
  • Negative: an increase (decrease) in A coincides with a decrease (increase) in B
  • None: A and B are unrelated

Causation is a cause-and-effect, in which A causes B.

I’ve made a couple of graphs to illustrate the fine line between correlation and causation — and how correlation can insinuate a relationship that might not actually exist.

Correlation versus Causation

Are falling emissions levels impacting Kim Kardashian’s popularity?

Are McDonald’s restaurants causing inflation?

Did Chads drive the fall in interest rates?

Did Twitter users increase Tesla’s share price?

Is the price of bitcoin a leading indicator for avocado sales?

Did farmers stop farming because of the success of the Marvel movies?

This is an illustration to show that just because data series move together, doesn’t mean that they cause each other — or that they are “related” at all.

Most of these examples show datasets that have no direct relationship to each other — but they look like they do.
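To make the point concrete, here is a minimal Python sketch that computes the Pearson correlation coefficient for two series. Both series are made up for illustration (they are not the data behind the charts in this post); they simply both trend upward over the same period, which is enough to produce a very high correlation even though neither causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical, made-up yearly values for two unrelated things.
series_a = [10, 12, 15, 19, 22, 27, 31, 36]
series_b = [3.1, 3.0, 3.6, 4.2, 4.1, 5.0, 5.3, 6.1]

r = pearson(series_a, series_b)
print(f"correlation = {r:.2f}")
```

Any two series that share a trend will score close to 1 (or -1) here. The coefficient measures how closely the lines move together, not why.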

Correlation does not necessarily mean causation.



The Nugget Numbers: How Much Chicken is in a Chicken Nugget?

A lighter piece for a long day.

A Nugget Analysis

According to Epictetus, the right way to eat is the same as the way to live: “just, cheerful, equable, temperate, and orderly.”

More precisely:

Source: Daily Stoic

So when I came across a chicken nugget dataset on Kaggle, I was fascinated. This person had weighed their individual nuggets from McDonalds and Wendys.

Like Epictetus, I wanted to explore how “equable, temperate, and orderly” the nuggets might be – how much chicken is in the chicken nugget?

Nugget Numbers

According to this dataset, McDonald’s nuggets weigh 16.5g on average, ranging from 14.6g to 21g. Wendy’s nuggets weigh 17.2g on average, ranging from 15.9g to 18.3g. The standard deviation was 1.68g across the McDonald’s nuggets and 0.70g across the Wendy’s nuggets.

As you can tell by the very squished Wendy’s logo, Wendy’s has a more consistent nugget experience – a larger nugget AND less variation in nugget weight, as compared to McDonalds.

But which nugget actually has the MOST chicken of the nuggets (the most chicken-y chicken nugget?)

Disclaimer: I am vegan. As a vegan, I must inform you that I am vegan. It’s a vegan rule of law. Also, this is not a scientific article. These are educated guesstimates (as most things are).

The Nugget Nutrition

Looking at the nutrition data for a 6-pack of nuggets from McDonald’s and Wendy’s, some nutritional discrepancies appear. Wendy’s nuggets are higher in fat (27g versus 25g for McDonald’s), while McDonald’s nuggets have more carbs and protein.

Each food that we eat is composed of macro- and micro-nutrients. From the macros we can calculate the number of calories in each food.

  • Each gram of carbohydrates = ~4 calories
  • Each gram of protein = ~4 calories.
  • Each gram of fat = ~9 calories. 
Source: Health24

So taking our McDonalds nuggets, we can calculate the calories based on the fat, carbs, and protein that are listed.

  • McDonalds Nuggets Calories
    • (25g of fat) x (9 cals / gram) = 225 calories from Fat
    • (25g of carbs) x (4 cals / gram) = 100 calories from Carbs
    • (23g of protein) x (4 cals / gram) = 92 calories from Protein
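The arithmetic above can be sketched in a few lines of Python. The gram values are the McDonald’s figures quoted in this post and the per-gram factors are the approximate 4/4/9 rule listed earlier; the function and variable names are our own.

```python
# Approximate calories per gram of each macronutrient (the 4/4/9 rule above).
KCAL_PER_GRAM = {"fat": 9, "carbs": 4, "protein": 4}

# McDonald's 6-pack macros in grams, as quoted in the post.
mcdonalds_grams = {"fat": 25, "carbs": 25, "protein": 23}


def macro_calories(grams):
    """Calories contributed by each macro."""
    return {macro: g * KCAL_PER_GRAM[macro] for macro, g in grams.items()}


def macro_shares(calories):
    """Each macro's share of total calories, as a fraction of 1."""
    total = sum(calories.values())
    return {macro: c / total for macro, c in calories.items()}


calories = macro_calories(mcdonalds_grams)
total_calories = sum(calories.values())
shares = macro_shares(calories)

for macro in calories:
    print(f"{macro}: {calories[macro]} cal ({shares[macro]:.0%})")
print(f"total: {total_calories} cal")
```

The shares come out to roughly 54% fat, 24% carbs, and 22% protein, which matches the McDonald’s breakdown used in the rest of the piece.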

Comparing the two nuggets side-by-side:

 The graph below breaks down the calories from each macro into percentages. Wendy’s nuggets are 57% fat, 22% carbohydrates, and 21% protein. McDonald’s are 54% fat, 24% carbs, and 22% protein. Both are 100% nugget.

To compare, a regular piece of chicken is 90% protein and 10% fat. 

So, the nuggets definitely aren’t “regular” chicken, but we knew that already. But how much “regular chicken” is in the chicken nuggets?

Source: Very Well Fit

The Nugget Math: McDonald’s McNugget

McDonalds has the following macro breakdown:

McDonalds nuggets are ~24% breading (meaning it’s ~24% carbohydrates, as explained above). This is not chicken. For illustrative purposes, that leaves 76% of the nugget left to be the “chicken” in our chicken nugget.

Taking our previous all-white chicken example (chicken is 90% protein and 10% fat): we would expect this remaining 76% of the nugget to be ~8% fat and ~68% protein.

But the remaining 76% nugget is actually 54% fat and 22% protein – far from the expected 8% fat and 68% protein! Quite the nugget nonplus!

We have an excess of fat (relative to what it would be if rest of the nugget were pure chicken).

The fat surplus is ~27%, most likely from different types of vegetable and cooking oils. 

That leaves the remaining portion of the chicken nugget left to be actual chicken: a whopping ~30%.

A chicken nugget is 70% bread and fat, 30% chicken.

This actually fits Merriam-Webster’s definition of nugget well: “a small chunk or lump of another substance.” (Note that they don’t specify the substance.)

Wendy’s Nuggets

With McDonalds, we get about 30% chicken in our chicken nugget.

Looking at the ingredients, we can see that the two nuggets are relatively similar in build.

McDonald’s Chicken Nuggets vs. Wendy’s Chicken Nuggets

Is Wendy’s any better? More bang for the nugget buck?

Nope. Wendy’s nuggets are only 28% real chicken! All the math is below.

Even taking into account Wendy’s larger nugget size of 17.2g, you still get less chicken versus the 16.5g McDonalds nugget in absolute terms. To be (far too) precise, each Wendy’s nugget has 4.89g of chicken and each McDonalds nugget has 4.92g of chicken (on average).
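As a rough cross-check, the per-nugget chicken weights quoted above can be turned back into shares of each nugget. This Python sketch uses only the figures stated in this post (average weights and grams of chicken); the dictionary layout and names are our own.

```python
# Average nugget weight and estimated grams of chicken, as quoted in the post.
nuggets = {
    "McDonald's": {"avg_weight_g": 16.5, "chicken_g": 4.92},
    "Wendy's": {"avg_weight_g": 17.2, "chicken_g": 4.89},
}

# Share of each nugget that is chicken, implied by the two numbers above.
implied_share = {
    name: data["chicken_g"] / data["avg_weight_g"] for name, data in nuggets.items()
}

# Difference in chicken per nugget, in grams.
gap_g = nuggets["McDonald's"]["chicken_g"] - nuggets["Wendy's"]["chicken_g"]

for name, share in implied_share.items():
    print(f"{name}: {share:.1%} chicken")
print(f"McDonald's advantage per nugget: {gap_g:.2f} g")
```

The implied shares round to about 30% and 28%, consistent with the estimates above, and the gap is the 0.03g per nugget discussed next.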

That 0.03g adds up over time. Compounding, but for nuggets.

If you’re eating nuggets, it looks like the McNugget might be the highest value “real” chicken nugget you can get – Wendy’s might have a bigger nugget to offer – but it might not be as chicken-y as you would expect.

(Please note, this is obviously not *exact science*)


A Nugget for your Thoughts

Why am I looking at nuggets? That is a good question. I don’t have a great answer. I find datasets (like this one on Kaggle) and I try to build stories with them.

With this story, I was shocked that there wasn’t more than 50% chicken in the chicken nugget. As a vegan, it made my stomach hurt to read the ingredients – but I also think that we should actively treat our bodies with the love and care they deserve. Chicken nuggets are not good for you.

But then, one of my friends brought up an important point – what does the nugget stand for? They are non-chicken nuggets masquerading as chicken nuggets – they aren’t real.

Is reality even real? If our chicken nuggets are less than 50% chicken, are they even chicken nuggets at all?

Is Jean Baudrillard right in his work Simulacra and Simulation? Have we truly “replaced all reality and meaning with symbols and signs,” so that human experience is of a simulation of reality?

This reminded me of the fourth stage of sign-order that he highlights in his book:

This is a regime of total equivalency, where cultural products need no longer even pretend to be real in a naïve sense, because the experiences of consumers’ lives are so predominantly artificial that even claims to reality are expected to be phrased in artificial, “hyperreal” terms.

Source: Simulacra and Simulation

Our brains will fill in the blanks of our existence, perceiving things that are not there, confusing reality and representation. Perhaps these nuggets are just a symbolic representation of all that nuggets used to be, a psychological construct that we build upon in order to obtain a sense of reality. 

Welcome to the desert of the real.

This is a reference to The Matrix. In this scene, Morpheus shows Neo the world as it exists today and tells him that he had been existing in the Matrix. He says “welcome to the desert of the real”, a reference to Baudrillard’s work.

TikTok and Content Strategy: The Power of Creation

  • “How do you grow your audience?”
  • “How do you monetize content?”
  • “What’s your niche?”

These questions have haunted me for the past several weeks. After figuring out that I don’t really have a niche as a writer, I had a sort of analysis paralysis. I wrote a piece about learning in public and how the best niche was to have no niche, but it all felt wrong.

I resigned myself to the fact that I might never have a niche, and then thought about TikTok.

What creates a TikTok star? What’s the relationship between content and fame? What’s the secret?

The secret is that there is no secret.

The Strategy of No Strategy

Dixie D’Amelio was recently interviewed by Nylon. Dixie is the 8th most popular person on TikTok, with over 40 million followers. When talking about her interactions with the app:

“Honestly, hours,” she says of the time she gives (to TikTok), which, with each new follower, becomes more and more precious, to scrolling on her ‘For You page’ [TikTok’s homepage]. “I could stay up all night doing it, which is a positive and a negative thing.”

Source: Nylon

TikTok creators are interesting. A large portion of them are young and tend to approach the app like Dixie does, as she described to Nylon:

“I just have fun with it. I try to keep my content the same as it always has been because that’s why people followed me. So I try to stay with what I know. I’m still surprised when the ones that take really no effort [become popular], where people are like, ‘Oh my gosh, classic Dixie.’ [But,] honestly, the ones that I do put effort in, I’m like, ‘Oh, this is, this is too much. I’m embarrassing myself.’ And then I end up deleting it.”

Source: Nylon

She doesn’t highlight a brand strategy or a particular niche.

I’ve had two separate ‘Twitter strategy’ conversations that sum up exactly what she’s said — the best content can often be off-the-cuff ideas, quick videos, and simple takeaways.

But Nylon highlighted that the more followers she gets, the more important it is for her to stay engaged.

So I wondered: does she need to spend more time on the app, uploading content, with each new follower that she gets? Do more uploads mean more followers?

No. No, it doesn’t.

The Data

Socialblade has a list of the top TikTokers, ranked by followers, uploads, following, and likes.

  • Followers: # of people who are following the creator
  • Uploads: # of videos uploaded by the creator
  • Following: # of people that the creator is following
  • Likes: # of total likes that the creator’s videos have received

A Brief Overview

I collected the data from Socialblade, threw it into Excel, and looked it over in R, a programming language. The sample is 100 creators who, on average, have 24.9M followers, 1,408 video uploads, and 961M likes received, and who follow 1,022 people.

Breaking Down the Data: The Top 10 TikTok Stars

Here are the stats of the top 10 TikTok creators by followers:

Dixie’s little sister, Charli D’Amelio, absolutely dominates TikTok with almost 100M followers, besting #2 Addison Rae by 28.7M.

But the thing is, Charli doesn’t post that much. She doesn’t follow that many people. The explanation for her rise is way outside the scope of this article, but her strategy – post some videos, follow a few people – is interesting.

Number of Followers vs Number of Following

Charli only follows 1,065 people, far fewer than the number who follow her.

In fact, the top 10 creators only follow about 1,175 people on average – skewed slightly by Spencer X.

Number of Uploads: How Do You Grow a Following?

So Charli, the most ‘successful’ person on the app, doesn’t follow a ton of people. But does she upload a lot of videos?


She’s actually #36 in terms of number of uploaded videos. The ‘Most Active’ (most uploaded videos) title belongs to Kyle Thomas, a British creator with 17.8M followers who has posted 7,430 videos.

Kyle does not have the most followers per upload.

The people who have the most followers per upload are celebrities. They automatically generate a following because of who they are. They aren’t necessarily “creators” who depend on the native app for success. Julesleblanc is probably the closest in-app creator, with 460k followers per each of her 32 uploaded videos.

There’s a bit of skew here because most of the celebrities can get away with not uploading much content. Ariana Grande has only uploaded 5 TikToks — but has 18.2 million followers – translating to 3.6M followers per upload. Justin Bieber is in second place, with 11 uploaded videos, giving him 1.6M followers per upload.
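The followers-per-upload metric is just a division, sketched below in Python with the figures quoted in this post for Ariana Grande and Kyle Thomas. The dictionary layout and variable names are our own.

```python
# Follower and upload counts as quoted in the post.
creators = {
    "Ariana Grande": {"followers": 18_200_000, "uploads": 5},
    "Kyle Thomas": {"followers": 17_800_000, "uploads": 7_430},
}

# Followers per uploaded video for each creator.
followers_per_upload = {
    name: stats["followers"] / stats["uploads"] for name, stats in creators.items()
}

for name, ratio in followers_per_upload.items():
    print(f"{name}: {ratio:,.0f} followers per upload")
```

With nearly identical follower counts, the celebrity lands at about 3.6M followers per upload while the most active creator lands at roughly 2,400, which is the skew described above.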

So is the Kyle Thomas strategy – uploading a ton of videos – the right one to pursue?


Content Strategy: Upload More?

Kyle and Charli clearly have different strategies. Which one is better? Does uploading more videos correlate with a higher follower count?

No. It does not. The scatterplot above shows the relationship between followers and uploads. The more followers you have, the less you tend to post.

More videos don’t necessarily mean a higher follower count.

The Negative Correlation Between Followers and Uploads

In fact, there’s a negative correlation between the two. In the table below, the correlation between uploads and followers is -0.07 (-7%), meaning that creators who post more tend, weakly, to have fewer followers.

If you run a regression on the data, you get a pretty gnarly equation:

The main takeaway here is that each additional uploaded video is associated with 2,428 fewer followers, holding all other variables constant. (The uploads coefficient of 0.0023 is expressed in millions of followers here.)
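For readers who want to see how a per-upload figure falls out of a regression, here is a simplified Python sketch. It fits a one-variable least-squares line (followers in millions against uploads), whereas the regression in this post includes other variables; the six data points are hypothetical and are not the Socialblade sample.

```python
def ols_slope_intercept(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical creators: number of uploads and followers in millions.
uploads = [200, 500, 900, 1500, 3000, 7000]
followers_m = [30.0, 26.0, 27.0, 22.0, 21.0, 18.0]

slope_m, intercept_m = ols_slope_intercept(uploads, followers_m)

# The slope is in millions of followers per upload; convert to followers.
followers_per_upload = slope_m * 1_000_000

print(f"slope: {slope_m:.6f} million followers per upload")
print(f"that is about {followers_per_upload:,.0f} followers per upload")
```

A negative slope here describes how follower counts differ across creators who post more or less. On its own it does not show that posting one more video costs any particular creator followers.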

Subtracting Out Celebrities

Even if you take out all the celebrities (leaving you with n = 60), the same sort of relationship appears.

It’s actually worse with the celebrities removed – you lose more than 3,000 followers per upload!

The more you post… the more likely it is that you have fewer followers!

Of course, you can’t just not post. But in an increasingly content-heavy world, less content might be more.

The Laws of Content Supply and Demand

This is the scarcity effect — if you’re always offering up content, you reduce the gap between demand and supply. Ideally, the content demand is greater than the content supply. That keeps people coming back.

But if you’re constantly posting about your day, about something random that you see on the side of the road, people get less interested. There’s less mystery. You’ve got to keep the wonder alive.

Source: Social Network Buzz

The Strategy of Silent Strategy

This dataset does add an interesting point to the Nylon article. It is important for TikTok stars to watch other TikToks to check out the competition.

But that doesn’t mean they need to upload constantly.

As creators, we sometimes get wrapped up in our own voices. We spend too much time producing, and not enough time consuming.

What Dixie and Charli seem to do well is stay on top of the TikTok trends. Most are ‘easy’ to replicate and are quickly consumed. They also carry well across other platforms, which further enables virality, and thus follower growth.

There’s probably a more scientific relationship between type of post and followers gained. Still, it’s interesting to consider the fact that the most popular influencers of this time don’t have a massive media strategy (that we know of) – they just have fun. 

In some spaces, you’re encouraged to niche yourself into a box. And that makes sense — it’s good to gain deep expertise in one subject, because it’s an effective strategy for becoming an expert that people know that they can rely on.

But there’s also the point that some of us seem to forget: you’re supposed to “just have fun with it” too. Take advantage of trends, and show people that you’re having a good time and that you enjoy what you do. There is no secret path to success. Don’t be so busy creating a niche that you forget to create.

Natural Language Processing and Naval: The Art of the Podcast

Podcasts are a harmonic creation.

They are fascinating tools, connecting us to the speakers in a very personal way, almost like we are a part of the conversation too. I laugh out loud at some things, or nod along, reacting emphatically, despite being in a different time and place than both the interviewer and interviewee.

Podcasts are tools of connection. Almost everyone either wants to start a podcast or knows someone who has started a podcast (for better or worse).

Podcast creation is the perfect storm of an open consumer base, an ever-evolving Internet, and accessible recording software and hardware. The barrier to entry is low. In the most basic approach, all you need to have is a cellphone and Internet access.

We also like to hear from online ‘celebrities’ or people that we greatly admire, and use the content for learning and entertainment. It’s easy to go onto Spotify or Apple to get access to a wide variety of podcasts, on an unfathomably wide variety of topics.

What’s the best podcast?

When Austen, the CEO of Lambda School and someone with 130k+ followers on Twitter, tweeted this, I was curious to see what people would say. I will caveat this with the fact that most of Austen’s followers are tech / tech-adjacent people (most responses had the word ‘software’ or ‘building’ in their bio) so the responses are likely biased into the tech sphere.

Source: Austen Allred

I mined their responses in a spreadsheet.

The most recommended podcast was The Tim Ferriss Show (16% of all responses, n = 208) followed closely by Joe Rogan, as shown above.

Specific combinations were recommended as follows:

Out of the 208 responses, 8% of people specifically recommended the Joe Rogan + Naval combination. 7% specifically recommended Naval and Shane Parrish of The Knowledge Project. Another 2% recommended Naval’s own podcast, How to Get Rich.

Naval represented a total of 24% of all recommendations that specified an interviewee (n = 148, 35 responses for Naval). Peter Thiel and Derek Sivers followed, making up 5% and 4% of all responses, respectively.

The Knowledge of Naval

Naval is fascinating, and it makes sense that he is a quarter of all interviewee recommendations because he is, in his words, “a hero among young male geeks”.

Naval represents a next-level sort of thinking, and voices what a lot of us feel but don’t always vocalize (the future is entrepreneurship, power over the monkey mind, the debugging, the duality of thought, etc. etc.).

Eric Jorgenson recently wrote The Almanack of Naval Ravikant which captures a lot of his wisdom. There are several hundred threads on Twitter with key takeaways from the book, as well as from Naval’s interviews.

Even just running the frequency metrics for words from the transcript from The Knowledge Project, you can see why Naval is so admired in this space.

The word ‘think’ is mentioned 161 times in his conversation with Shane Parrish. The word ‘can’ is mentioned > 100 times, tied with ‘people’ and ‘read’. From what I can gather from Naval, those four words encapsulate a large portion of what he seems to value and care about.
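A word-frequency count like this takes only a few lines. The post's analysis was done in R; the Python sketch below shows the same idea using the standard library. The sample text is a single sentence from the Knowledge Project conversation quoted later in this post (not the full transcript), and the stopword list is a tiny illustrative one of our own.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {
    "if", "you", "be", "and", "that's", "one", "of", "the", "why", "i",
    "your", "it", "on", "then", "you're", "to", "in", "because",
}

# One sentence from the conversation, quoted later in this post.
text = (
    "If you can be more right, more rational, and that's one of the reasons "
    "why I love your blog because it really focuses on helping you be more "
    "right, better decision-making, more rational, then you're going to get "
    "nonlinear returns in your life"
)


def word_counts(text, stopwords):
    """Lowercase the text, split it into words, and count the non-stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(word for word in words if word not in stopwords)


counts = word_counts(text, STOPWORDS)
print(counts.most_common(3))
```

Even in one sentence, "more", "right", and "rational" float to the top. Run over a full transcript, the same count produces the kind of ranking described above.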

But let’s do a deeper dive.

Text Mining of Naval

I pulled Naval’s interview transcripts from The Knowledge Project, The Tim Ferriss Show, and Joe Rogan’s podcast. I used R packages for Natural Language Processing to get a sense of sentiment, emotion, and word association.

I used the NRC Word-Emotion Association Lexicon (aka EmoLex) which was developed by Saif Mohammad, a researcher at NRC. EmoLex divides the words into eight emotions: anger, fear, anticipation, trust, surprise, sadness, joy, and disgust. The words are segmented via crowdsourcing (you can read more here if you’re interested).

Source: Saif Mohammad

Emotion Association and Sentiment Analysis

The Knowledge Project

On the Knowledge Project, a lot of the conversation fell into a ‘trust’ emotion. Almost 400 words. Words like wisdom, wealth, thoughtful, etc.

Plot of Shane Parrish and Naval: Distinct Emotions

On a percentage basis, both trust and anticipation outweigh any of the other emotions.

The sentiment scatterplot is below, with most of the conversation remaining positive over time.

The average sentiment of the conversation was 0.077, the most ‘negative’ out of all of the conversations.

The most positive sentence:

If you can be more right, more rational, and that’s one of the reasons why I love your blog because it really focuses on helping you be more right, better decision-making, more rational, then you’re going to get nonlinear returns in your life

The Joe Rogan Podcast

For the Joe Rogan podcast, anticipation and joy made up a much larger share relative to trust, though trust still dominated.


There was a more even dispersion across the emotional spectrum too, with a decent amount of fear (~10%) interjected into the conversation.


An equal amount of sadness and anger was represented here too, but the conversation was relatively positive.


The conversation was very positive in terms of sentiment, scoring 0.0985, the highest average of the three.

This was the most positive sentence: Very professional, very quick, very thorough but he did more diligence on me than I did on him

The Tim Ferriss Show

Tim Ferriss and Naval had a very trustful conversation.


There was more trust and more anticipation here. According to this distribution, the podcast was very similar to the conversation Naval had with Shane Parrish.


There was an average sentiment of 0.0901.

The most positive sentence: Most of the ways we try to get peace from mind are indirect, whereas if you understand things if you see things properly you will naturally slowly develop peace from mind

Most negative: “So you have to ruthlessly, ruthlessly disappoint everybody”


The Podcasts

The Knowledge Project

The Knowledge Project has an interesting word frequency distribution, with the most common words in the podcast descriptions being ‘author’, ‘making’, ‘can’, ‘learning’, ‘life’ and ‘world’.


EconTalk

EconTalk was another top podcast, with the word ‘University’ the most prominent (likely due to the interviewees’ job titles), along with ‘argues’, ‘author’, ‘book’, and ‘policy’.


The Tim Ferriss Show

For The Tim Ferriss Show, the most common words were ‘lessons’, ‘life’, ‘master’, and ‘building’.


Joe Rogan’s episodes don’t have descriptive titles (just the names of the interviewees), so there is no word cloud for JRE.


There is power in trust. There is a thin line between a good podcast and a great podcast, and the interviewee is 85% of that. A good interviewer is just as important — someone who lets the interviewee talk, but also contributes to the conversation meaningfully.

Conveying trust + anticipation (hope) seems to be a strong combination for a successful podcast (at least for Naval).

I will also highlight that this dataset lacked diversity in terms of race, gender, age, etc. I’ve included some excellent podcasts below for even more thought and perspective. Please add any others in the comments below!

The Paradox of Societal Polarization: We are all Hedgefoxes

“The fox knows many things, but the hedgehog knows one big thing.” —


We are so quick to organize ourselves into different categories. Politics. Religion. Sports teams. Keto vs paleo vs carnivore.

We like labels. It’s an iteration of community building, which is a key part of the human psyche. We are social creatures, and that comes out in how we identify with the world around us. 

We even segment according to how we think. 


The Economic Impact of Bees and the Role of Deceit

Bees and Humans.

“If all mankind were to disappear, the world would regenerate back to the rich state of equilibrium that existed ten thousand years ago. If insects were to vanish, the environment would collapse into chaos.”

E.O. Wilson

Pollinators are responsible for 1 in every 3 bites of food. They increase the output of 87 of the leading crops worldwide. The dollar value of these crops is between $235B and $577B per year. The amount of agriculture dependent on these pollinators has increased 300% over the past 50 years.


Dating Data: An Overview of the Algorithm

Love in the time of COVID is a… challenge, to say the least.

I’ve downloaded and deleted Hinge every few months since December 2019, and decided to run the data on my most recent download to see how things were going. I did not go on any dates with anyone new due to the pandemic, but I did chat with a few.

However, I wanted to do a deep dive into the science of the algorithm – what drives this fast-growing matchmaking process?
