Parsing large JSON files in .NET

Keywords: large JSON file .NET | Updated: 2023-09-27 18:09:59

So far I have used Json.NET's JsonConvert.DeserializeObject(json) method. It has worked quite well, and to be honest, I didn't need anything more than that.

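I am working on a background (console) application that constantly downloads JSON content from different URLs and then deserializes the result into a list of .NET objects.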

 using (WebClient client = new WebClient())
 {
      string json = client.DownloadString(stringUrl);
      var result = JsonConvert.DeserializeObject<List<Contact>>(json);
 }

The simple snippet above probably isn't perfect, but it does the job. When the file is large (15,000 contacts, a 48 MB file), JsonConvert.DeserializeObject is no longer enough: that line throws a JsonReaderException.

The downloaded JSON content is an array, and a sample looks like this. Contact is the container class for the deserialized JSON objects.

[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
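
The question does not show the Contact class itself. As a minimal sketch, assuming it has just the two string properties used later in the thread (FirstName and LastName), it might look like the following. The property names and the inline sample values are assumptions for illustration, not the asker's actual code.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

// Two-element sample in the same shape as the downloaded array (made-up values).
string json = "[{\"firstname\":\"Ann\",\"lastname\":\"Lee\"},{\"firstname\":\"Bob\",\"lastname\":\"Ray\"}]";

List<Contact> contacts = JsonConvert.DeserializeObject<List<Contact>>(json);
Console.WriteLine(contacts.Count);         // 2
Console.WriteLine(contacts[0].FirstName);  // Ann

// Hypothetical container class; property names are inferred from the sample JSON.
public class Contact
{
    [JsonProperty("firstname")]
    public string FirstName { get; set; }

    [JsonProperty("lastname")]
    public string LastName { get; set; }
}
```

Json.NET matches property names case-insensitively when deserializing, so the JsonProperty attributes are optional here; they only make the mapping explicit.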

My initial guess was that it runs out of memory. Just out of curiosity, I tried parsing it as a JArray, which caused the same exception.

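I have started digging into the Json.NET documentation and reading similar threads. Since I haven't managed to find a working solution yet, I decided to post a question here.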

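Update: when deserializing line by line, I got the same error, ending in "[. Path '', line 600003, position 1." So I downloaded two of the files and checked them in Notepad++. I noticed that if the array holds more than 12,000 elements, the array is closed with "]" after the 12,000th element and another array starts with "[". In other words, the JSON looks exactly like this: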

[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]


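As you correctly diagnosed in your update, the problem is that the JSON has a closing ] followed immediately by an opening [ that starts the next set. This makes the JSON invalid when taken as a whole, which is why Json.NET throws an error.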

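Fortunately, this problem seems to come up often enough that Json.NET has a special setting to deal with it. If you read the JSON directly with a JsonTextReader, you can set its SupportMultipleContent flag to true and then use a loop to deserialize each item individually.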

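This should let you process the non-standard JSON successfully and in a memory-efficient way, regardless of how many arrays there are or how many items each array contains.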

    using (WebClient client = new WebClient())
    using (Stream stream = client.OpenRead(stringUrl))
    using (StreamReader streamReader = new StreamReader(stream))
    using (JsonTextReader reader = new JsonTextReader(streamReader))
    {
        reader.SupportMultipleContent = true;
        var serializer = new JsonSerializer();
        while (reader.Read())
        {
            if (reader.TokenType == JsonToken.StartObject)
            {
                Contact c = serializer.Deserialize<Contact>(reader);
                Console.WriteLine(c.FirstName + " " + c.LastName);
            }
        }
    }

Full demo here: https://dotnetfiddle.net/2TQa8p
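
To try the same idea without a network call, here is a minimal offline sketch. It assumes a two-array string standing in for the malformed download and a hypothetical Contact class with FirstName and LastName properties. It first shows the whole-document parse failing, then reads the same text with SupportMultipleContent enabled.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Two arrays back to back, mimicking the malformed download (made-up values).
string json = "[{\"firstname\":\"a\",\"lastname\":\"b\"}]\n[{\"firstname\":\"c\",\"lastname\":\"d\"}]";

// Parsing the text as one document fails when the second '[' is reached.
bool failed = false;
try
{
    JsonConvert.DeserializeObject<List<Contact>>(json);
}
catch (JsonException ex)
{
    failed = true;
    Console.WriteLine("Single-document parse failed: " + ex.Message);
}

// With SupportMultipleContent the reader continues into the next root value.
var contacts = new List<Contact>();
using (var reader = new JsonTextReader(new StringReader(json)))
{
    reader.SupportMultipleContent = true;
    var serializer = new JsonSerializer();
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartObject)
        {
            // Deserializes one object; the reader is left on its EndObject token.
            contacts.Add(serializer.Deserialize<Contact>(reader));
        }
    }
}
Console.WriteLine(contacts.Count); // 2

// Hypothetical container class, assumed for this sketch.
public class Contact
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}
```

Collecting the items into a list is only for demonstration. For a genuinely large feed, handling each Contact inside the loop keeps memory use flat.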

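Json.NET supports deserializing directly from a stream. Here is a way to deserialize your JSON using a StreamReader that reads the JSON one piece at a time, instead of loading the entire JSON string into memory.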

using (WebClient client = new WebClient())
{
    using (StreamReader sr = new StreamReader(client.OpenRead(stringUrl)))
    {
        using (JsonReader reader = new JsonTextReader(sr))
        {
            JsonSerializer serializer = new JsonSerializer();
            // read the json from a stream
            // json size doesn't matter because only a small piece is read at a time from the HTTP request
            IList<Contact> result = serializer.Deserialize<List<Contact>>(reader);
        }
    }
}

Reference: JSON.NET Performance Tips

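I have done something similar in Python for a 5 GB file. I downloaded the file to a temporary location and read it line by line to build up each JSON object, similar to how SAX works.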

For C# with Json.NET, you can download the file, read it with a StreamReader, pass that reader to a JsonTextReader, and parse it into a JObject using JToken.ReadFrom(your JsonTextReader object).
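
A minimal sketch of that approach follows. It assumes the download has already been saved to a temporary file; the file name and its contents here are made up for illustration. JToken.ReadFrom is called once per object rather than once for the whole file, so only one element is materialized at a time.

```csharp
using System;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical temp file standing in for the downloaded content.
string path = Path.Combine(Path.GetTempPath(), "contacts-sample.json");
File.WriteAllText(path, "[{\"firstname\":\"a\",\"lastname\":\"b\"},{\"firstname\":\"c\",\"lastname\":\"d\"}]");

int count = 0;
using (var streamReader = new StreamReader(path))
using (var reader = new JsonTextReader(streamReader))
{
    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartObject)
        {
            // Loads only the current object; the reader ends on its EndObject token.
            var item = (JObject)JToken.ReadFrom(reader);
            Console.WriteLine((string)item["firstname"]);
            count++;
        }
    }
}
Console.WriteLine(count); // 2
File.Delete(path);
```

Calling JToken.ReadFrom at the very start of the file would instead build the entire array in memory, which is probably not what you want for multi-gigabyte input.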

This may still be relevant to some, now that the "new" System.Text.Json is out.

await using FileStream file = File.OpenRead("files/data.json");
var options = new JsonSerializerOptions {
    PropertyNamingPolicy = JsonNamingPolicy.CamelCase
};
// Switch the JsonNode type with one of your own if
// you have a specific type you want to deserialize to.
IAsyncEnumerable<JsonNode?> enumerable = JsonSerializer.DeserializeAsyncEnumerable<JsonNode>(file, options);
await foreach (JsonNode? obj in enumerable) {
    var firstname = obj?["firstname"]?.GetValue<string>();
}
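
As the comment in the snippet suggests, JsonNode can be swapped for a type of your own. Here is a minimal sketch, assuming a hypothetical Contact class and an in-memory stream in place of the file. Note that this overload of DeserializeAsyncEnumerable expects a single root-level JSON array, so the back-to-back arrays from the question would presumably still need separate handling.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;
using System.Text.Json.Serialization;

// In-memory stream standing in for a large file or HTTP response body (made-up values).
byte[] bytes = Encoding.UTF8.GetBytes(
    "[{\"firstname\":\"a\",\"lastname\":\"b\"},{\"firstname\":\"c\",\"lastname\":\"d\"}]");
using var stream = new MemoryStream(bytes);

int count = 0;
// Elements are yielded one at a time as the stream is read.
await foreach (var contact in JsonSerializer.DeserializeAsyncEnumerable<Contact>(stream))
{
    if (contact is null) continue; // a JSON null element would surface as null
    Console.WriteLine($"{contact.FirstName} {contact.LastName}");
    count++;
}
Console.WriteLine(count); // 2

// Hypothetical container class, assumed for this sketch.
public sealed class Contact
{
    [JsonPropertyName("firstname")]
    public string FirstName { get; set; }

    [JsonPropertyName("lastname")]
    public string LastName { get; set; }
}
```

The JsonPropertyName attributes make the lowercase mapping explicit, so no JsonSerializerOptions are needed in this variant.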

If you are interested in more, such as how to parse zipped JSON, there is this blog post that I wrote: Parsing 60GB Json Files using Streams in .NET