Editor's note: Since 2023, RAG has become one of the most widely adopted architectures in LLM-based AI systems. Because key features of many products (such as domain-specific Q&A and knowledge-base construction) depend heavily on RAG, optimizing its performance and improving retrieval efficiency and accuracy has become a pressing core topic of current RAG research. How to efficiently and accurately extract and use information from unstructured data such as PDFs is one of the important problems that urgently needs to be solved. This article compares the strengths and weaknesses of several solutions and focuses on how to tackle this problem.
The article first introduces rule-based parsing methods such as pypdf and points out that they cannot preserve document structure well. The author then evaluates deep-learning-based parsing methods such as Unstructured and Layout-parser, describing their advantages in extracting tables and images and preserving the document's layout structure, as well as their limitations. For PDFs with complex layouts such as double-column pages, the author proposes an improved reordering algorithm. In addition, the author explores the possibility of using multimodal large models to extract information directly from PDF documents.
The article systematically analyzes the challenges of PDF parsing, offers a set of ideas and improved algorithms, contributes valuable insights toward improving the quality of unstructured-data parsing, and points out directions for the future development of PDF parsing.
Author | Florian June
Translated by | 岳扬
For a RAG system, extracting information from documents is unavoidable. Ensuring that content is effectively extracted from the source files is crucial to the quality of the final output.
Do not underestimate the importance of this step. When using a RAG system, poor information extraction during document parsing limits the system's ability to understand and exploit the information contained in PDF files.
The position of the parsing process in a RAG system is shown in Figure 1:
Figure 1: The position of the parsing process in a RAG system. Image by author.
In real-world scenarios, unstructured data is far more abundant than structured data. But if this massive amount of data cannot be parsed, its enormous value cannot be unlocked, and PDF documents are a prime example.
PDF documents account for the majority of unstructured data, and handling them effectively also helps greatly with managing other types of unstructured documents.
This article mainly introduces methods for parsing PDF documents, including algorithms and recommendations for parsing PDFs effectively and extracting as much useful information as possible.
01 Challenges in parsing PDFs
PDF documents are representative of unstructured documents, yet extracting information from them is a challenging process.
Rather than a data format, PDF is more accurately described as a collection of printing instructions. A PDF file consists of a series of instructions that tell a PDF reader or printer where and how to place symbols and text on a screen or on paper. This is fundamentally different from file formats such as HTML and docx, which use tags (for example <div> and <p>) to organize the document's logical structure.
Layer Type | Complexity per Layer | Sequential Operations | Maximum Path Length |
---|---|---|---|
Self-Attention | O(n²·d) | O(1) | O(1) |
Recurrent | O(n·d²) | O(n) | O(n) |
Convolutional | O(k·n·d²) | O(1) | O(log_k(n)) |
Self-Attention (restricted) | O(r·n·d) | O(1) | O(n/r) |
Copy the HTML tags and save them as an HTML file. Then open it with Chrome, as shown in Figure 6:
Figure 6: Content extracted from Table 1 in Figure 3. Image by author.
As can be seen, the unstructured algorithm extracts essentially the entire table accurately.
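For reference, below is a minimal sketch of how such a table can be extracted with the unstructured library; the file name is a placeholder and parameter details may differ across versions, so treat it as an illustration rather than the exact setup used here.
# Minimal sketch: extract tables from a PDF with unstructured.
# "paper.pdf" is a placeholder file name; the hi_res strategy requires the
# layout-detection extras to be installed.
from unstructured.partition.pdf import partition_pdf

elements = partition_pdf(
    filename="paper.pdf",
    strategy="hi_res",              # layout-detection-based parsing
    infer_table_structure=True,     # keep the table's HTML representation
)

tables = [el for el in elements if el.category == "Table"]
if tables:
    # This HTML can be saved to a file and opened in a browser, as in Figure 6.
    print(tables[0].metadata.text_as_html)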
Challenge 2: How to rearrange the detected blocks? In particular, how to handle double-column PDFs
For double-column PDFs, we take the paper "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"[11] as an example. The red arrows indicate the reading order:
Figure 7: A double-column page.
After determining the layout, the unstructured framework divides each page into several rectangular blocks, as shown in Figure 8.
Figure 8: Visualization of the layout detection result. Image by author.
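If you want to inspect these blocks programmatically, one possible (version-dependent) way through unstructured's public API is sketched below; the attribute names are my assumption, and the coordinate metadata may be absent depending on the strategy and library version.
# Rough sketch: print the category, bounding-box points, and text of each
# detected block. "paper.pdf" is a placeholder file name.
from unstructured.partition.pdf import partition_pdf

elements = partition_pdf(filename="paper.pdf", strategy="hi_res")
for el in elements:
    coords = el.metadata.coordinates   # may be None in some configurations
    if coords is not None:
        print(el.category, coords.points, el.text[:60])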
The detailed information for each rectangular block can be obtained in the following format:
[
LayoutElement(bbox=Rectangle(x1=851.1539916992188,y1=181.15073777777613,x2=1467.844970703125,y2=587.8204599999975),text='These approaches have been generalized to coarser granularities,such as sentence embed-dings(Kiros et al.,2015;Logeswaran and Lee,2018)or paragraph embeddings(Le and Mikolov,2014).To train sentence representations,prior work has used objectives to rank candidate next sentences(Jernite et al.,2017;Logeswaran and Lee,2018),left-to-right generation of next sen-tence words given a representation of the previous sentence(Kiros et al.,2015),or denoising auto-encoder derived objectives(Hill et al.,2016).',source=, type='Text',prob=0.9519357085227966,image_path=None,parent=None),
LayoutElement(bbox=Rectangle(x1=196.5296173095703,y1=181.1507377777777,x2=815.468994140625,y2=512.548237777777),text='word based only on its context.Unlike left-to-right language model pre-training,the MLM ob-jective enables the representation to fuse the left and the right context,which allows us to pre-In addi-train a deep bidirectional Transformer.tion to the masked language model,we also use a“next sentence prediction”task that jointly pre-trains text-pair representations.The contributions of our paper are as follows:',source=, type='Text',prob=0.9517233967781067,image_path=None,parent=None),
LayoutElement(bbox=Rectangle(x1=200.22352600097656,y1=539.1451822222216,x2=825.0242919921875,y2=870.542682222221),text='•We demonstrate the importance of bidirectional pre-training for language representations.Un-like Radford et al.(2018),which uses unidirec-tional language models for pre-training,BERT uses masked language models to enable pre-trained deep bidirectional representations.This is also in contrast to Peters et al.(2018a),which uses a shallow concatenation of independently trained left-to-right and right-to-left LMs.',source=, type='List-item',prob=0.9414362907409668,image_path=None,parent=None),
LayoutElement(bbox=Rectangle(x1=851.8727416992188,y1=599.8257377777753,x2=1468.0499267578125,y2=1420.4982377777742),text='ELMo and its predecessor(Peters et al.,2017,2018a)generalize traditional word embedding re-search along a different dimension.They extract context-sensitive features from a left-to-right and a right-to-left language model.The contextual rep-resentation of each token is the concatenation of the left-to-right and right-to-left representations.When integrating contextual word embeddings with existing task-specific architectures,ELMo advances the state of the art for several major NLP benchmarks(Peters et al.,2018a)including ques-tion answering(Rajpurkar et al.,2016),sentiment analysis(Socher et al.,2013),and named entity recognition(Tjong Kim Sang and De Meulder,2003).Melamud et al.(2016)proposed learning contextual representations through a task to pre-dict a single word from both left and right context using LSTMs.Similar to ELMo,their model is feature-based and not deeply bidirectional.Fedus et al.(2018)shows that the cloze task can be used to improve the robustness of text generation mod-els.',source=, type='Text',prob=0.938507616519928,image_path=None,parent=None),
LayoutElement(bbox=Rectangle(x1=199.3734130859375,y1=900.5257377777765,x2=824.69873046875,y2=1156.648237777776),text='•We show that pre-trained representations reduce the need for many heavily-engineered task-specific architectures.BERT is thefirstfine-tuning based representation model that achieves state-of-the-art performance on a large suite of sentence-level and token-level tasks,outper-forming many task-specific architectures.',source=, type='List-item',prob=0.9461237788200378,image_path=None,parent=None),
LayoutElement(bbox=Rectangle(x1=195.5695343017578,y1=1185.526123046875,x2=815.9393920898438,y2=1330.3272705078125),text='•BERT advances the state of the art for eleven NLP tasks.The code and pre-trained mod-els are available at https://github.com/google-research/bert.',source=, type='List-item',prob=0.9213815927505493,image_path=None,parent=None),
LayoutElement(bbox=Rectangle(x1=195.33956909179688,y1=1360.7886962890625,x2=447.47264000000007,y2=1397.038330078125),text='2 Related Work',source=, type='Section-header',prob=0.8663332462310791,image_path=None,parent=None),
LayoutElement(bbox=Rectangle(x1=197.7477264404297,y1=1419.3353271484375,x2=817.3308715820312,y2=1527.54443359375),text='There is a long history of pre-training general lan-guage representations,and we briefly review the most widely-used approaches in this section.',source=, type='Text',prob=0.928022563457489,image_path=None,parent=None),
LayoutElement(bbox=Rectangle(x1=851.0028686523438,y1=1468.341394166663,x2=1420.4693603515625,y2=1498.6444497222187),text='2.2 Unsupervised Fine-tuning Approaches',source=, type='Section-header',prob=0.8346447348594666,image_path=None,parent=None),
LayoutElement(bbox=Rectangle(x1=853.5444444444446,y1=1526.3701822222185,x2=1470.989990234375,y2=1669.5843488888852),text='As with the feature-based approaches,thefirst works in this direction only pre-trained word em-(Col-bedding parameters from unlabeled text lobert and Weston,2008).',source=, type='Text',prob=0.9344717860221863,image_path=None,parent=None),
LayoutElement(bbox=Rectangle(x1=200.00000000000009,y1=1556.2037353515625,x2=799.1743774414062,y2=1588.031982421875),text='2.1 Unsupervised Feature-based Approaches',source=, type='Section-header',prob=0.8317819237709045,image_path=None,parent=None),
LayoutElement(bbox=Rectangle(x1=198.64227294921875,y1=1606.3146266666645,x2=815.2886352539062,y2=2125.895459999998),text='Learning widely applicable representations of words has been an active area of research for decades,including non-neural(Brown et al.,1992;Ando and Zhang,2005;Blitzer et al.,2006)and neural(Mikolov et al.,2013;Pennington et al.,2014)methods.Pre-trained word embeddings are an integral part of modern NLP systems,of-fering significant improvements over embeddings learned from scratch(Turian et al.,2010).To pre-train word embedding vectors,left-to-right lan-guage modeling objectives have been used(Mnih and Hinton,2009),as well as objectives to dis-criminate correct from incorrect words in left and right context(Mikolov et al.,2013).',source=, type='Text',prob=0.9450697302818298,image_path=None,parent=None),
LayoutElement(bbox=Rectangle(x1=853.4905395507812,y1=1681.5868488888855,x2=1467.8729248046875,y2=2125.8954599999965),text='More recently,sentence or document encoders which produce contextual token representations have been pre-trained from unlabeled text andfine-tuned for a supervised downstream task(Dai and Le,2015;Howard and Ruder,2018;Radford et al.,2018).The advantage of these approaches is that few parameters need to be learned from scratch.At least partly due to this advantage,OpenAI GPT(Radford et al.,2018)achieved pre-viously state-of-the-art results on many sentence-level tasks from the GLUE benchmark(Wang language model-Left-to-right et al.,2018a).',source=, type='Text',prob=0.9476840496063232,image_path=None,parent=None)
]
Here, (x1, y1) is the coordinate of the top-left vertex, and (x2, y2) is the coordinate of the bottom-right vertex:
(x_1, y_1) ----------
|                   |
|                   |
|                   |
---------- (x_2, y_2)
At this point, you may choose to rearrange the reading order of the page. Unstructured comes with a built-in sorting algorithm, but I found its results unsatisfactory for double-column layouts.
It is therefore necessary to design an algorithm for this case. The simplest approach is to sort first by the x coordinate of the top-left vertex and, when the x coordinates are equal, by the y coordinate. The pseudocode is as follows:
layout.sort(key=lambda z: (z.bbox.x1, z.bbox.y1, z.bbox.x2, z.bbox.y2))
However, we found that the horizontal coordinates of blocks within the same column can still vary. As shown in Figure 9, the block indicated by the purple line has a bbox.x1 that is actually further to the left. When sorting, it would be placed before the block indicated by the green line, which clearly violates the reading order.
Figure 9: The horizontal coordinates of blocks in the same column may vary. Image by author.
In this case, one feasible algorithm works as follows:
- First, take the minimum of the top-left x coordinates (bbox.x1) of all blocks to obtain x1_min.
- Then, take the maximum of the bottom-right x coordinates (bbox.x2) of all blocks to obtain x2_max.
- Next, compute the x coordinate of the page's center line:
x1_min = min([el.bbox.x1 for el in layout])
x2_max = max([el.bbox.x2 for el in layout])
mid_line_x_coordinate = (x2_max + x1_min) / 2
Next, if a block's bbox.x1 is less than mid_line_x_coordinate, the block is assigned to the left column; otherwise, it belongs to the right column.
Once classification is complete, sort the blocks within each column by their y coordinate. Finally, concatenate the right column after the left column.
left_column = []
right_column = []
for el in layout:
    if el.bbox.x1 < mid_line_x_coordinate:
        left_column.append(el)
    else:
        right_column.append(el)
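The per-column sorting and concatenation described above is then a direct continuation of the same snippet (variable names follow the code already shown; this is a sketch of the final step, not unstructured's built-in sorting):
# Sort each column from top to bottom, then append the right column after
# the left column to restore the natural reading order.
left_column.sort(key=lambda z: z.bbox.y1)
right_column.sort(key=lambda z: z.bbox.y1)
sorted_layout = left_column + right_column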
It is worth mentioning that this improved algorithm also works for parsing single-column PDFs.
Challenge 3: How to extract multi-level headings
The purpose of extracting headings, including multi-level headings, is to improve the accuracy of the answers provided by the LLM.
For example, if a user wants to know the gist of Section 2.1 in Figure 9, accurately extracting the title of Section 2.1 and sending it to the LLM together with the related content as context will greatly improve the accuracy of the final response.
The algorithm still relies on the layout blocks shown in Figure 9. We can extract the blocks with type='Section-header' and compute the height difference (bbox.y2 - bbox.y1). The block with the largest height difference corresponds to a level-1 heading, the next largest to a level-2 heading, and the next to a level-3 heading.
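A rough sketch of this idea follows, reusing the layout list from earlier. The attribute names (type, bbox, text) mirror the LayoutElement printout above, and the integer rounding is an arbitrary tolerance for detection noise rather than part of any library API.
# Treat the tallest 'Section-header' blocks as level-1 headings, the next
# tallest as level-2, and so on. Heights are rounded to absorb small
# detection noise; a looser tolerance may be needed in practice.
headers = [el for el in layout if el.type == 'Section-header']
heights = sorted({round(el.bbox.y2 - el.bbox.y1) for el in headers}, reverse=True)
for el in headers:
    level = heights.index(round(el.bbox.y2 - el.bbox.y1)) + 1
    print(f"H{level}: {el.text}")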
2.3 Parsing complex structures in PDFs with multimodal models
With the rapid development and wide adoption of multimodal models, they can also be used to parse tables. There are several options[12]:
- Retrieve the relevant images (PDF pages) and send them to GPT4-V to respond to the user's query.
- Treat each PDF page as an image and let GPT4-V perform image reasoning on every page. Build a Text Vector Store index over the image-reasoning output, then query this Image Reasoning Vector Store for answers.
- Use Table Transformer to crop table information from the retrieved images, then send these cropped images to GPT4-V to respond to the user's query.
- Apply OCR to the cropped table images and send the recognized data to GPT4 / GPT-3.5 to answer the user's question.
After testing, the third method was found to be the most effective.
In addition, we can use multimodal models to extract or summarize key information from images (since PDF files can easily be converted to images), as shown in Figure 10.
Figure 10: Extracting or summarizing key information from images. Source: GPT-4 with Vision: Complete Guide and Evaluation[13]
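To make the page-as-image idea concrete, here is a minimal sketch that renders the first page of a PDF as an image and asks a multimodal model to summarize it. The libraries (pdf2image, the openai client), the model name, and the prompt are illustrative assumptions, not the exact setup used in [12].
# Render the first PDF page to a PNG and send it to a multimodal model.
# "paper.pdf" and the model name are placeholders; pdf2image requires poppler.
import base64
import io
from pdf2image import convert_from_path
from openai import OpenAI

page = convert_from_path("paper.pdf", first_page=1, last_page=1)[0]
buf = io.BytesIO()
page.save(buf, format="PNG")
image_b64 = base64.b64encode(buf.getvalue()).decode()

client = OpenAI()  # expects OPENAI_API_KEY in the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize the key information on this page, including any tables."},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)
print(response.choices[0].message.content)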
03 Conclusion
In general, almost all unstructured documents are highly flexible and require a variety of parsing techniques. However, the industry has not yet reached a consensus on which method to use.
In this situation, it is recommended to choose the method that best fits the needs of your project and to apply tailored processing to different types of PDF files. For example, papers, books, and financial reports may each have distinctive layouts depending on their characteristics.
Nevertheless, if circumstances allow, it is still advisable to choose deep-learning-based or multimodal methods. These methods can effectively segment documents into well-defined, complete units of information, preserving the document's original meaning and structure to the greatest extent possible.
Thanks for reading!
————
Florian June
An artificial intelligence researcher who mainly writes articles about large language models, data structures and algorithms, and NLP.
END
References
[1]https://github.com/py-pdf/pypdf
[2]https://github.com/langchain-ai/langchain/blob/v0.1.1/libs/langchain/langchain/document_loaders/pdf.py
[3]https://github.com/run-llama/llama_index/blob/v0.9.32/llama_index/readers/file/docs_reader.py
[4]https://arxiv.org/pdf/1706.03762.pdf
[5]http://unstructured-io.github.io/unstructured/
[6]https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/document_loaders/pdf.py
[7]http://github.com/Layout-Parser/layout-parser
[8]https://layout-parser.github.io/platform/
[9]https://arxiv.org/pdf/2210.05391.pdf
[10]https://github.com/Unstructured-IO/unstructured
[11]https://arxiv.org/pdf/1810.04805.pdf
[12]https://docs.llamaindex.ai/en/stable/examples/multi_modal/multi_modal_pdf_tables.html
[13]https://blog.roboflow.com/gpt-4-vision/
This article was translated and published by Baihai IDP with the original author's authorization. For permission to reproduce the translation, please contact them.
Original article:
https://pub.towardsai.net/advanced-rag-02-unveiling-pdf-parsing-b84ae866344e
Copyright belongs to the original author.