Thingi10K: A Dataset of 10,000 3D-Printing Models
Qingnan Zhou, New York University
Alec Jacobson, Columbia University
Figure 1: The Thingi10K dataset contains 10,000 models from featured “things” on Thingiverse, a popular online repository.
Figure 2: Our online query interface selects subsets of Thingi10K.
objects (e.g., animals, furniture) from various internet sources for shape classification [Shilane et al. 2004]. More recently, ShapeNet collects more than three million annotated models [Chang et al. 2015]. The ShapeNetCore subset contains 57,459 single-object models with semi-automatically generated category information. Although models from these datasets resemble physical objects, their geometric characteristics suggest their intention was for visualization rather than fabrication. These datasets are not suitable for testing 3D printing techniques.

Figure 3: Percentile plots of vertex and component count.

In addition to generic datasets, a variety of specialized datasets exist. For example, Lim et al. provide 219 IKEA 3D models for pose estimation [Lim et al. 2013]. Recently, Choi et al. released a dataset of 10,000 scanned objects, with a subset of 383 successfully reconstructed 3D models [Choi et al. 2016]. The Shape Retrieval Contest releases multiple datasets each year to test retrieval algorithms, including generic [Bronstein et al. 2010b; Li et al. 2012], non-rigid humans [Pickup et al. 2014], sketch-based shapes [Li et al. 2013; Li et al. 2014], shape correspondences [Bronstein et al. 2010a], facial expressions [Nair & Cavallaro 2008; Veltkamp et al. 2011], and range scans [Dutagaci et al. 2010]. Our Thingi10K dataset complements these sources by providing a specialized dataset for 3D printing objects.

We are not the first to utilize Thingiverse models for academic purposes. To test a rapid prototyping interface, Mueller et al. consider Thingiverse models, but report that meshing artifacts required manual cleanup before processing [Mueller et al. 2014]. Beyer et al. procedurally collect 2,250 models with specific tags from Thingiverse to test a decomposition algorithm [Beyer et al. 2015]. Buehler et al. manually sift through 25,000 models from search results on Thingiverse to identify 363 models as “assistive technologies” [Buehler et al. 2015]. Beyond testing a specific routine, these works do not analyze low-level geometric characteristics of the collected models. These collected datasets are also not publicly available.

Contributions. Unlike previous datasets, our Thingi10K dataset reflects the variety, complexity and (lack of) quality of 3D printing models. It is immediately useful for testing the performance of methods for structural analysis [Stava et al. 2012; Zhou et al. 2013; Umetani & Schmidt 2013], shape optimization [Prévost et al. 2013; Bächer et al. 2014; Musialski et al. 2015], or solid geometry operations [Zhou et al. 2016]. Due to its specialized nature and correlated contextual information, we suspect the dataset is also useful for machine learning and data mining algorithms. We compare the

2 Methodology

Instead of hiring professional modelers or scanning physical objects, we leverage the availability of 3D models hosted and shared online. Among all 3D shape repositories, we select Thingiverse for its large and active user community, its vast collection of print-validated designs, and its restriction to open-source licenses.

As one of the largest online shape repositories, Thingiverse hosts more than a million user-uploaded things, 3D designs consisting of one or more 3D models (i.e., one or more mesh files). As of October 2015, Thingiverse has more than 2 million active users, with 30-40 uploads each week and 1.7 million downloads per month [MakerBot 2015]. Thanks to this community, a design is typically not only modeled virtually but also fabricated by one or more users, which provides invaluable real-world validation.

Our Thingi10K dataset consists of 10,000 models (from 2,011 things) systematically culled from Thingiverse via web crawling. Rather than randomly sampling the entire repository, which may contain bogus models uploaded by inexperienced users or for testing purposes, we focus on things featured on Thingiverse. Featured things are entirely and independently selected by Thingiverse staff based on their design, beauty and manufacturability. In a sense, these 10,000 models represent a subset of the top-quality designs on Thingiverse. Thingi10K contains every 3D model of every thing featured by Thingiverse between Sept. 16, 2009 and Nov. 15, 2015.

3 Analysis

The 10,000-model dataset comes from 2,011 unique things designed by 1,083 unique users, covering a large variety. Nearly all models are stored as .stl files (9,956); the rest are .obj (42), .ply (1), and .off (1). We analyze both geometric and contextual information of our dataset to illustrate its representational quality and diversity.
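
As a quick consistency check on the counts quoted in the Analysis paragraph, the short sketch below sums the per-format file counts and derives two averages. Only the input numbers come from the text; the variable names and the derived averages are illustrative and are not reported by the authors.

```python
# Figures quoted in the Analysis paragraph.
format_counts = {".stl": 9956, ".obj": 42, ".ply": 1, ".off": 1}
num_things = 2011
num_users = 1083

# Derived quantities (illustrative; not stated in the text).
total_models = sum(format_counts.values())
stl_share = format_counts[".stl"] / total_models
models_per_thing = total_models / num_things
things_per_user = num_things / num_users

print(total_models)               # 10000
print(f"{stl_share:.2%}")         # 99.56%
print(f"{models_per_thing:.2f}")  # 4.97
print(f"{things_per_user:.2f}")   # 1.86
```

The per-format counts add up to exactly 10,000, so the four formats account for every model in the dataset.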
Figure 4: Highest resolution models from each dataset. Panel labels: Thingi10K: 2.8M vertices; ShapeNetCore: 165K vertices; 76K vertices.

Figure 6: Models with the highest genus from each dataset. Panel labels: g=4886, g=79, g=78.

Solid: The input mesh must be a valid boundary of a subspace of R^3. Specifically, it must be PWN, self-intersection free and induce a {0, 1} winding number field.

Aspect ratio: The aspect ratio of a triangle is the ratio of its circumradius to the diameter of its incircle.

Intrinsically Delaunay: All edges must have non-negative cotangent weights [Fisher et al. 2007].

These mesh quality measures are by no means complete. Additional quality measures [Shewchuk 2002; Attene 2013] can be easily adopted.

In contrast, our dataset offers a curated collection of 3D meshes with a large range of mesh qualities. It contains a significant number of high quality models as well as a non-negligible proportion of models with common mesh quality problems. Due to its large size, our dataset is ideal for stress-testing purposes where one can easily select a subset of the data that matches any combination of mesh criteria (Section 4). Because all data are sampled from real-world models designed to be 3D printed, our dataset provides an unbiased view of the mesh qualities used in practice. Our analysis could be used to gauge the restrictions posed by various assumptions on mesh quality. For example, an algorithm assuming self-intersection-free input would automatically exclude 45% of inputs, which may not be acceptable in real-world settings.
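
The aspect-ratio and intrinsically-Delaunay measures defined above can be made concrete with a small sketch. This is a minimal illustration, not the authors' implementation: it assumes vertices are given as 3D coordinate tuples, handles interior edges only (an edge shared by exactly two triangles), and takes the cotangent weight of an edge to be half the sum of the cotangents of the two opposite angles. All function names are hypothetical.

```python
import math

def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    return math.sqrt(dot(u, u))

def aspect_ratio(a, b, c):
    """Circumradius divided by incircle diameter (1.0 for an equilateral triangle)."""
    la, lb, lc = norm(sub(b, c)), norm(sub(a, c)), norm(sub(a, b))
    area = 0.5 * norm(cross(sub(b, a), sub(c, a)))
    if area == 0.0:
        return math.inf  # degenerate triangle: treat as infinitely bad
    s = 0.5 * (la + lb + lc)                   # semi-perimeter
    circumradius = la * lb * lc / (4.0 * area)
    inradius = area / s
    return circumradius / (2.0 * inradius)

def cotangent(apex, p, q):
    """Cotangent of the angle at `apex` in triangle (apex, p, q); assumes non-zero area."""
    u, v = sub(p, apex), sub(q, apex)
    return dot(u, v) / norm(cross(u, v))

def edge_is_intrinsically_delaunay(p, q, opp1, opp2, tol=1e-12):
    """Interior edge (p, q) whose two adjacent triangles have apexes opp1 and opp2."""
    weight = 0.5 * (cotangent(opp1, p, q) + cotangent(opp2, p, q))
    return weight >= -tol

# A 3-4-5 right triangle has circumradius 2.5 and inradius 1, so its ratio is 1.25.
print(aspect_ratio((0, 0, 0), (3, 0, 0), (0, 4, 0)))
# An edge flanked by two very flat triangles has a negative cotangent weight.
print(edge_is_intrinsically_delaunay((0, 0, 0), (1, 0, 0), (0.5, 0.1, 0), (0.5, -0.1, 0)))
```

With this normalization an equilateral triangle scores exactly 1 and the value grows as a triangle becomes thinner, so a threshold on the maximum per-triangle ratio is one plausible way to flag poorly shaped meshes.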
Figure 10: Thingi10K user tags highlight the dataset’s variety. Tags shown: customizer_challenge, sketchup, halloween, plastic_valley, supportless, education, gears, camera, reprap, fun, customizer, skull, sculpture, rpg, robotics, scan, vase, tool, kitchen, household, space, iphone, fantasy, printbot, puzzle, playset, ultimaker, electronics, christmas, useful, container, castle, music, animal, lamp, holder, pla, monster, gear, mount, robot, dualstrusion, lulzbot, math, experiment, geometry, box, arduino, car.

Figure 11: A soap bubble chair is decomposed and re-oriented by its designer for support-free 3D printing.

Lastly, all things are published under one of the open source licenses. Figure 12 illustrates all licenses supported by Thingiverse.

4 Online query interface

auto-generated python script to batch download results for custom search terms.

Users of our online query interface can view all contextual and geometry model details (Figure 13). In particular, we respect the copyright of each model. On the model detail page, we clearly indicate the original author and open source license of each model. We also provide links to the original Thingiverse pages where the raw data can be obtained.

5 Conclusion

Our dataset could be used as input for stress-testing purposes as well as ground truth for learning algorithms. As for future work, we plan to update and increase the size of the dataset over time to reflect the fast-evolving nature of the 3D printing community. Specifically, we would like to include all featured things from Thingiverse and add support for users to suggest additional models for inclusion. We hope our dataset and the accompanying analysis provide an informative summary of 3D printing models and clarify the requirements for geometry processing algorithms to be robust.

Figure 14: Our web interface returns subsets of the Thingi10K dataset via text queries.
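
The kind of subset selection that the query interface performs can be mimicked offline once per-model metadata is available locally. The sketch below is only an illustration under stated assumptions: the `ModelRecord` fields, the `select` and `excluded_fraction` helpers, and the toy records are all hypothetical, and none of them reflect the interface's actual query syntax or the dataset's real statistics.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelRecord:
    # Hypothetical per-model metadata; field names are assumptions.
    model_id: int
    num_vertices: int
    num_components: int
    self_intersecting: bool
    solid: bool

def select(records, **criteria):
    """Return the records that satisfy every criterion.

    A criterion is either a plain value (tested for equality) or a
    callable predicate applied to the attribute value.
    """
    def matches(rec):
        for name, want in criteria.items():
            value = getattr(rec, name)
            ok = want(value) if callable(want) else value == want
            if not ok:
                return False
        return True
    return [rec for rec in records if matches(rec)]

def excluded_fraction(records, **criteria):
    """Fraction of records an algorithm with these input assumptions would reject."""
    return 1.0 - len(select(records, **criteria)) / len(records)

# Toy records for illustration only (not taken from Thingi10K).
records = [
    ModelRecord(1, 1200, 1, False, True),
    ModelRecord(2, 56000, 3, True, False),
    ModelRecord(3, 800, 1, True, False),
    ModelRecord(4, 240000, 1, False, True),
]

clean_single = select(records, self_intersecting=False, num_components=1)
print([rec.model_id for rec in clean_single])            # [1, 4]
print(excluded_fraction(records, self_intersecting=False))  # 0.5
small = select(records, num_vertices=lambda n: n < 10000)
print([rec.model_id for rec in small])                   # [1, 3]
```

Expressing each assumption as a keyword criterion makes it easy to combine several mesh-quality requirements and to see how much of a collection each combination rules out.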
