Databases for Data Science

Sairam Penjarla
2 min read · Mar 19


🤖As we all know, a strong foundation in databases is crucial for any data science project.

But for beginners, it’s even more important to have a good understanding of the basics of various databases. This not only helps in building solid data science projects but also sets you apart in interviews.

Here are my favorite databases and big data tools for data science beginners:


💻PostgreSQL

  • An open-source relational database management system known for its reliability and advanced features such as full-text search and concurrency control.


💻MySQL

  • A widely used open-source relational database management system with a user-friendly interface and powerful query capabilities.
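
Relational databases like the two above are queried with SQL. Here is a minimal sketch of what that looks like from Python. To keep it runnable without a server, it uses the built-in sqlite3 module as a stand-in, and the experiments table and its columns are made up for illustration. The SQL itself is standard, but connection setup and placeholder style vary by database driver.

```python
import sqlite3

# In-memory SQLite database: a stand-in so the example runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE experiments (id INTEGER PRIMARY KEY, model TEXT, accuracy REAL)"
)
conn.executemany(
    "INSERT INTO experiments (model, accuracy) VALUES (?, ?)",
    [("logreg", 0.81), ("random_forest", 0.88), ("logreg", 0.83)],
)

# Aggregate with plain SQL: best accuracy per model, highest first.
rows = conn.execute(
    "SELECT model, MAX(accuracy) AS best "
    "FROM experiments GROUP BY model ORDER BY best DESC"
).fetchall()
print(rows)
conn.close()
```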


💻Elasticsearch

  • A search and analytics engine built on top of the Lucene library. It is often used for real-time search and analytics on large-scale data sets.
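
Lucene-based search is fast largely because of a data structure called an inverted index, which maps each term to the documents that contain it. The sketch below is a toy version in plain Python, not the engine's actual API. The function names and sample documents are my own, and real engines add tokenization rules, relevance scoring, and much more.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return sorted ids of documents containing every query token (AND search)."""
    tokens = query.lower().split()
    if not tokens:
        return []
    matches = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        matches &= index.get(token, set())
    return sorted(matches)

docs = {
    1: "spark handles streaming data",
    2: "cassandra stores time series data",
    3: "spark reads data from cassandra",
}
index = build_inverted_index(docs)
print(search(index, "spark data"))      # [1, 3]
print(search(index, "cassandra data"))  # [2, 3]
```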


💻MongoDB

  • A document-based NoSQL database known for its scalability and flexibility, making it a great choice for handling large and unstructured data sets.
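
The flexibility of a document database comes from the fact that records in the same collection do not need to share a schema. Here is a rough illustration using plain Python dictionaries. This is not a database driver's API: the users list and the find_with_skill helper are hypothetical, and a real database would be queried through a driver such as pymongo.

```python
# Documents in one collection may carry different fields (no fixed schema).
users = [
    {"name": "Asha", "skills": ["python", "sql"], "city": "Hyderabad"},
    {"name": "Ben", "skills": ["r"]},
    {"name": "Chen", "skills": ["python"], "profile": {"years_experience": 4}},
]

def find_with_skill(collection, skill):
    """Return the documents whose 'skills' list contains the given skill."""
    return [doc for doc in collection if skill in doc.get("skills", [])]

python_users = [doc["name"] for doc in find_with_skill(users, "python")]
print(python_users)  # ['Asha', 'Chen']
```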

💻Apache Cassandra

  • A highly scalable, distributed NoSQL database designed for high availability and fault tolerance, making it a great choice for large-scale data processing and analytics.
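
To see where the high availability comes from, it helps to picture how a key gets assigned to several nodes at once. The sketch below is a simplified model under my own assumptions: the node names and replication factor are made up, MD5 is used only to get a stable hash, and real Cassandra uses its own partitioner and token ring rather than a simple modulo.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
REPLICATION_FACTOR = 3                            # assumed for illustration

def replicas_for(partition_key, nodes=NODES, rf=REPLICATION_FACTOR):
    """Pick rf distinct nodes: hash the key to a start node, then walk the ring."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(min(rf, len(nodes)))]

def readable_replicas(partition_key, down):
    """Replicas still able to serve the key when the nodes in `down` have failed."""
    return [node for node in replicas_for(partition_key) if node not in down]

owners = replicas_for("user:42")
print(owners)
# Even if the first owner fails, the other replicas can still answer.
print(readable_replicas("user:42", down={owners[0]}))
```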

💻Hadoop HDFS

  • A distributed file system that stores large-scale data sets across a cluster of commodity servers. It is often used in conjunction with other big data tools like MapReduce, Pig, and Hive, which handle the processing.
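
A file in HDFS is split into fixed-size blocks, and each block is copied to several machines. The sketch below models that idea in plain Python. It assumes the commonly used defaults of 128 MB blocks and a replication factor of 3, and it places replicas round-robin for simplicity. The plan_blocks function and DataNode names are hypothetical, and real HDFS placement is rack-aware.

```python
import math

BLOCK_SIZE_MB = 128  # commonly used HDFS default block size
REPLICATION = 3      # commonly used HDFS default replication factor

def plan_blocks(file_size_mb, datanodes):
    """Return (block_index, size_mb, holders) for each block of a file."""
    n_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    copies = min(REPLICATION, len(datanodes))
    plan = []
    for i in range(n_blocks):
        # The last block only holds whatever is left over.
        size = min(BLOCK_SIZE_MB, file_size_mb - i * BLOCK_SIZE_MB)
        # Simplified round-robin placement, not HDFS's real policy.
        holders = [datanodes[(i + r) % len(datanodes)] for r in range(copies)]
        plan.append((i, size, holders))
    return plan

for block in plan_blocks(300, ["dn1", "dn2", "dn3", "dn4"]):
    print(block)
```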

💻Apache Spark

  • An open-source, distributed computing system that can handle big data processing and analytics tasks, including SQL, streaming, and machine learning. It supports a variety of data sources including HDFS, HBase, and Cassandra.
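
The core idea behind this kind of distributed processing is to split the data into partitions, work on each partition independently, and then merge the partial results. Here is a small word count that mimics that pattern in plain Python. It is not Spark's API: the count_words and merge helpers are my own names, and everything runs in a single process.

```python
from collections import Counter
from functools import reduce

def count_words(partition):
    """Map step: count words within one partition of lines."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts

def merge(left, right):
    """Reduce step: combine two partial counts."""
    return left + right

partitions = [
    ["spark reads from hdfs", "spark reads from cassandra"],
    ["hdfs stores blocks"],
]

# On a cluster, each partition would be counted on a different worker.
partial_counts = [count_words(partition) for partition in partitions]
total = reduce(merge, partial_counts, Counter())
print(total["spark"], total["hdfs"], total["blocks"])  # 2 2 1
```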

There are many other databases that may suit different use cases, so beginners should always evaluate the specific requirements of their project and choose the database that best fits them. A solid understanding of different databases will also help beginners stand out in interviews.

#databases #datascience #MySQL #PostgreSQL #MongoDB #Hadoop #ApacheSpark #Elasticsearch


