Which database is suitable for fast operations with JSON?

MongoDB. JSON-like documents are the format in which it stores its data.
Plenty of databases will do. Mongo works. PostgreSQL works too, with its new JSONB type and text index. At one seminar the lead Postgres guy, Bartunov, boasted that in performance tests this data type outruns Mongo. Though who knows how to verify that on custom projects.
Which is the suitable database for storing a large JSON? [closed]
I have only one large JSON file. For example,
I need to support the following operations:
Querying for an element should return all child elements e.g. Querying for AssetID should return
Update value of elements.
I considered the following approaches:
- Graph database : I started reading about Neo4J. However, it cannot create a graph from JSON intelligently. One has to specify the node types and their hierarchy order.
- ElasticSearch : It can work by treating JSON as text and hence not efficient solution.
- Postgres : It supports querying over JSON objects, but updates and deletions won’t be efficient.
Is there any good database out there which can load data from large JSON and handle my operations?
4 Answers
If you are only working with JSON then you should really use a document oriented database as it will save you having to wrestle something sql related.
MongoDB is a good choice, supports many drivers and can deal with tree structures (Though I’m not sure about the automatic creation)
CRUD operations are simple and cover a wide range of cases.
For very large datasets on busy servers you should use the XFS file system and the WiredTiger storage engine as there are some gains in performance.
It’s well supported, and isn’t that much of a learning curve. (I came from Pure SQL without too much trouble)
You also have the option of MariaDB or MySQL which also both support JSON though I have no experience with either, and in the case of MySQL I feel it was just a ‘bolt on’ which had to be added in the face of an up-coming requirement.
This is a typical architectural question to choose the right database, wherein you have to consider quite a few important aspects such as HA, resiliency, replication, sharding, tools support, maturity, licensing, backup & restore etc.
MongoDB and Couchbase DB are the two most popular and widely used document databases. There is no straight-forward answer to choose one, as you have to do trade-off analysis. I can share my two cents, hopefully this would help you in arriving at the right decision.
Either the MongoDB or the Couchbase NoSQL document database can be considered, as JSON is a first-class citizen in both and you get really good options for performing operations on fields.
- MongoDB (CP out of CAP) prefers consistency over availability, whereas Couchbase (AP out of CAP) is a highly available database.
- A MongoDB cluster works with a master/slave architecture, whereas a Couchbase cluster works with a peer-to-peer distribution architecture.
There are many more dimensions to be considered and following links would take you in right direction.
Since you have highlighted that in your particular case you have only one large file, IMDG-based solutions (in-memory data grids such as Apache Ignite) with a single-node setup can also be considered.
List of JSON Databases
Here’s a list of database management systems (DBMS) that support JSON.
MongoDB is also a cross platform NoSQL DBMS, currently supporting Windows, Mac, Solaris, and various Linux distributions at the time of writing.
MongoDB is used by some of the largest companies in the world, including Facebook, Google, Nokia, MTV Networks, Cisco, Forbes, and many more.
Behind the scenes, MongoDB actually stores the JSON documents in a binary-encoded format called BSON. BSON extends JSON by supporting additional data types and is designed to be efficient to encode and decode across different languages.
The Couchbase Data Platform includes Couchbase Server and Couchbase Mobile. Both of these are open-source, NoSQL, multi-model, document-oriented database management systems that store JSON documents.
Couchbase refers to its platform as the industry’s first Engagement Database — a new class of database that can tap into dynamic data, at any scale and across any channel or device.
Couchbase (the company) began as NorthScale in 2009, and was subsequently renamed to Membase Incorporated in 2010. Couchbase, Inc. was then created through the merger of Membase and CouchOne in February 2011.
Couchbase customers include Amadeus, AT&T, BD (Becton, Dickinson and Company), Carrefour, Cisco, Comcast, Disney, DreamWorks Animation, eBay, Marriott, Neiman Marcus, Tesco, Tommy Hilfiger, United, Verizon, Wells Fargo, and more.
Apache CouchDB is a document oriented open source database management system that uses JSON natively.
CouchDB was first released in 2005 and later became an Apache Software Foundation project in 2008.
Couch is an acronym for cluster of unreliable commodity hardware.
CouchDB is reportedly used by companies such as Amadeus IT Group, Credit Suisse, npm, and the BBC.
Azure DocumentDB is Microsoft’s multi-tenant distributed database service for managing JSON documents at Internet scale. DocumentDB indexing enables automatic indexing of documents without requiring a schema or secondary indices. DocumentDB is designed to provide real-time consistent queries in the face of very high rates of document updates.
MarkLogic is considered a multi-model NoSQL database for its ability to store, manage, and search JSON and XML documents and semantic data (RDF triples).
MarkLogic was initially based on XML, but has since evolved to natively store JSON documents and RDF triples.
MarkLogic customers include Aetna, BBC, Boeing, Broadridge Financial Solutions, Dow Jones, McGraw Hill Financial, NBC, Wiley, U.S. Army, U.S. Navy.
OrientDB is an open source NoSQL database management system written in Java. It is a multi-model database, supporting graph, document, key/value, and object models, but the relationships are managed as in graph databases with direct connections between records.
OrientDB natively supports HTTP, the RESTful protocol, and JSON without additional libraries or components.
OrientDB clients include Comcast, Sky, Cisco, Verisign, Ericsson, United Nations, and Warner Music Group.
RethinkDB is the first open-source, scalable JSON database built from the ground up for the realtime web. RethinkDB is designed specifically to push data to applications in realtime.
Riak is a distributed NoSQL key-value data store that offers high availability, fault tolerance, operational simplicity, and scalability. In addition to the open-source version, it comes in a supported enterprise version and a cloud storage version.
Although Riak wasn’t explicitly created as a document store, it does have features that make it possible to store and query JSON objects or XML.
BaseX is a native and light-weight XML database management system and XQuery processor, developed as a community project on GitHub.
Although it’s an XML database, its JSON module contains functions to parse and serialize JSON documents.
Elasticsearch is a search engine based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License. Elasticsearch is the most popular enterprise search engine followed by Apache Solr, also based on Lucene.
Elasticsearch users include Wikimedia, Adobe Systems, Facebook, Stack Exchange, Quora, Mozilla, Netflix, and more.
MySQL is the world’s most popular open source DBMS. MySQL is used by some of the largest organisations in the world, including Facebook, Google, Twitter, Adobe, Flickr, Alcatel Lucent, Zappos, YouTube, and many more. It is also used by many smaller scale projects such as personal websites or blogs.
MySQL 5.7.8 introduced a native JSON data type that enables efficient access to data in JSON. This includes optimized storage, and automatic validation of JSON documents stored in JSON columns. Invalid JSON documents will produce an error.
Oracle Database is an object-relational database management system produced and marketed by Oracle Corporation. Oracle Database is one of the world’s most popular RDBMSs.
Although Oracle Database is an object-relational database, it does support JSON (and XML). It supports JSON natively with relational database features, including transactions, indexing, declarative querying, and views.
PostgreSQL (often referred to as Postgres), is an object-relational database management system (ORDBMS) with an emphasis on extensibility and standards-compliance.
PostgreSQL also has a number of JSON functions and operators that can be used with its two JSON data types (JSON, and JSONB).
JSON Databases Explained
JSON (JavaScript Object Notation) has become a standard data-interchange format, particularly for semi-structured data. JSON databases are part of the NoSQL family of databases that offer flexibility in storing varied data types and easily accommodating changes in data model or project requirements. The flexibility of a JSON database comes from the way data is stored—as documents instead of rigid tables. Read on to know more.
What are JSON databases?
A JSON database is a document-type NoSQL database, ideal for storing semi-structured data. It’s much more flexible compared to the row-columns format, which is fixed and expensive when it comes to implementing even small schema changes.
With relational databases, JSON data needs to be parsed or stored using the NVARCHAR column (LOB storage). However, document databases like MongoDB can store JSON data in its natural format, which is readable by humans and machines.
There are two ways to store JSON data in a JSON database:
Store the whole object in a single document.
Here, the author details are inside the book document itself. This technique is also known as embedding because the author subdocument is embedded in the book document.
Store parts of objects separately and link them using the unique identifiers (referencing).
One author may write multiple books. So, to avoid duplicating data inside all the books, we can create a separate author document and refer to it by its _id field:
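The two storage approaches described above can be sketched with plain Python dictionaries standing in for documents. This is only an illustration: the field names (`title`, `author`, `author_id`) and all sample values are assumptions, and only `_id` comes from the text. In a real document database the lookup would be done by the database or its driver; here a dictionary keyed by `_id` plays that role.

```python
# Embedding: the author subdocument lives inside the book document.
embedded_book = {
    "_id": 1,
    "title": "Example Book",  # hypothetical sample data
    "author": {"name": "Jane Doe", "country": "NZ"},
}

# Referencing: each author is stored once, and books hold only the author's _id.
authors = {
    100: {"_id": 100, "name": "Jane Doe", "country": "NZ"},
}
referenced_books = [
    {"_id": 1, "title": "Example Book", "author_id": 100},
    {"_id": 2, "title": "Another Example", "author_id": 100},
]


def resolve_author(book, authors_by_id):
    """Return a copy of the book with its author reference replaced
    by the referenced author document."""
    resolved = {k: v for k, v in book.items() if k != "author_id"}
    resolved["author"] = authors_by_id[book["author_id"]]
    return resolved
```

With referencing, a change to the author document is made in one place and is visible from every book that points to it, at the cost of an extra lookup when reading a book.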
Advantages of JSON databases
Just like traditional databases, JSON document databases manage data partitioning, indexing, clustering, replication, and data access on their own. Apart from this, JSON databases offer many advantages.
JSON databases are faster and have more storage flexibility
NoSQL databases, in general, have more storage flexibility and offer better indexing methods. In a document database, each document is handled as an individual object, and there is no fixed schema, so you can store each document in the way it can be most easily retrieved and viewed. Additionally, you can evolve your data model to adapt it to your changing application requirements. The schema versioning pattern makes use of the flexible document model to allow just that.
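As a rough sketch of the schema versioning idea, documents can carry a version field and the application can upgrade older shapes when it reads them. Everything specific here is an assumption made for illustration: the field name `schema_version`, the `phone` and `contacts` fields, and the rule that a document without a version field counts as version 1.

```python
def upgrade_to_v2(doc):
    """Return the document in the version-2 shape, leaving the input unchanged."""
    # Assumption: documents written before versioning have no version field.
    version = doc.get("schema_version", 1)
    if version == 2:
        return dict(doc)
    if version != 1:
        raise ValueError(f"unknown schema_version: {version}")

    # Version 1 stored a single "phone" string; version 2 stores a list of contacts.
    upgraded = {k: v for k, v in doc.items() if k != "phone"}
    upgraded["schema_version"] = 2
    if "phone" in doc:
        upgraded["contacts"] = [{"type": "phone", "value": doc["phone"]}]
    else:
        upgraded["contacts"] = []
    return upgraded
```

Because old and new documents can sit side by side in the same collection, the upgrade can happen lazily on read instead of through a single bulk migration.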
JSON databases provide better schema flexibility
The best part of a JSON document database is the schema flexibility—i.e., it supports whatever way you want to store your data. You can have all the information that you need to access together (embedding) in one document or take the liberty of creating separate documents and then linking them (referencing). It’s very simple to even query the nested objects inside a document, like nested arrays or embedded documents.
JSON databases can easily map to SQL structures
Many developers are familiar with SQL. By storing data in a JSON database, developers can simply map SQL columns and JSON document key names. For example, the bookName key of a document can be mapped to the book_name column of the book table. Most JSON databases automate this mapping, which saves on a developer’s learning curve and reduces the development time.
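The key-to-column mapping mentioned above can be illustrated with a small hand-written helper. This is a sketch under a stated assumption, namely that document keys use camelCase and column names use snake_case; it is not the mechanism any particular database uses, and the function names are hypothetical.

```python
import re


def camel_to_snake(key):
    """Convert a camelCase document key into a snake_case column name."""
    # Insert an underscore before each capital that follows a lowercase letter or digit.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", key).lower()


def document_to_row(doc):
    """Map a flat document to a dict keyed by column names."""
    return {camel_to_snake(key): value for key, value in doc.items()}
```

For instance, `camel_to_snake("bookName")` yields `"book_name"`, which matches the example in the text. Nested documents would need extra handling that this sketch leaves out.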
JSON databases support different index types
Due to the availability of various index types, search queries are quite fast. For example, since MongoDB has no fixed schema, you can create a wildcard index on a field or set of fields to support querying that field. There are many other types of indexes, like O2-tree and T-tree, that make NoSQL databases highly performant.
JSON databases are better suited for big data analytics
JSON databases have a flexible schema and scale well vertically and horizontally, making them suitable to store huge volumes and a variety of big data. Document databases like MongoDB have a rich query language (MQL) and aggregation pipeline, eliminating the need for ETL systems for data processing and transformation. Further, these databases can easily pass data to popular data analysis programming languages like Python and R, without additional coding steps.